cs.LG - 2023-07-11

Sports Betting: an application of neural networks and modern portfolio theory to the English Premier League

  • paper_url: http://arxiv.org/abs/2307.13807
  • repo_url: None
  • paper_authors: Vélez Jiménez, Román Alberto, Lecuanda Ontiveros, José Manuel, Edgar Possani
  • for: This work aims to optimize sports betting strategies by applying Von Neumann-Morgenstern Expected Utility Theory, deep learning techniques, and advanced formulations of the Kelly Criterion.
  • methods: The study combines neural network models with portfolio optimization, achieving profits of 135.8% relative to the initial wealth during the latter half of the 20/21 English Premier League season.
  • results: The work delivers a practical neural network model for forecasting football match outcomes and evaluates complete and restricted strategies in terms of performance, risk management, and diversification.
    Abstract This paper presents a novel approach for optimizing betting strategies in sports gambling by integrating Von Neumann-Morgenstern Expected Utility Theory, deep learning techniques, and advanced formulations of the Kelly Criterion. By combining neural network models with portfolio optimization, our method achieved remarkable profits of 135.8% relative to the initial wealth during the latter half of the 20/21 season of the English Premier League. We explore complete and restricted strategies, evaluating their performance, risk management, and diversification. A deep neural network model is developed to forecast match outcomes, addressing challenges such as limited variables. Our research provides valuable insights and practical applications in the field of sports betting and predictive modeling.
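As an illustration of the betting rule the abstract builds on, here is a minimal sketch of the (fractional) Kelly Criterion for a single binary bet with decimal odds; the function name, the `scale` parameter, and the example numbers are illustrative assumptions, not taken from the paper:

```python
def kelly_fraction(p: float, odds: float, scale: float = 1.0) -> float:
    """Fraction of wealth to stake on a binary bet with decimal `odds`
    when the model assigns win probability `p`; `scale` < 1 gives a more
    conservative fractional-Kelly stake."""
    b = odds - 1.0                 # net payout per unit staked
    q = 1.0 - p                    # probability of losing
    f = (b * p - q) / b            # classic Kelly fraction
    return max(0.0, scale * f)     # never stake on a negative-edge bet

if __name__ == "__main__":
    # e.g. a model that assigns 45% to an outcome offered at decimal odds 2.5
    print(kelly_fraction(p=0.45, odds=2.5, scale=0.5))  # ~0.042 of current wealth
```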

Stochastic Nested Compositional Bi-level Optimization for Robust Feature Learning

  • paper_url: http://arxiv.org/abs/2307.05384
  • repo_url: None
  • paper_authors: Xuxing Chen, Krishnakumar Balasubramanian, Saeed Ghadimi
  • for: Solving bi-level optimization problems with a nested composition of functions in the upper level
  • methods: A stochastic approximation algorithm that requires neither matrix inversions nor mini-batches
  • results: Achieves an $\epsilon$-stationary solution with an oracle complexity of approximately $\tilde{O}_T(1/\epsilon^{2})$
    Abstract We develop and analyze stochastic approximation algorithms for solving nested compositional bi-level optimization problems. These problems involve a nested composition of $T$ potentially non-convex smooth functions in the upper-level, and a smooth and strongly convex function in the lower-level. Our proposed algorithm does not rely on matrix inversions or mini-batches and can achieve an $\epsilon$-stationary solution with an oracle complexity of approximately $\tilde{O}_T(1/\epsilon^{2})$, assuming the availability of stochastic first-order oracles for the individual functions in the composition and the lower-level, which are unbiased and have bounded moments. Here, $\tilde{O}_T$ hides polylog factors and constants that depend on $T$. The key challenge we address in establishing this result relates to handling three distinct sources of bias in the stochastic gradients. The first source arises from the compositional nature of the upper-level, the second stems from the bi-level structure, and the third emerges due to the utilization of Neumann series approximations to avoid matrix inversion. To demonstrate the effectiveness of our approach, we apply it to the problem of robust feature learning for deep neural networks under covariate shift, showcasing the benefits and advantages of our methodology in that context.
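The abstract's third bias source comes from Neumann-series approximations used to avoid matrix inversion. Below is a minimal NumPy sketch of that building block, approximating $A^{-1}v$ with only matrix-vector products; the function name and step size are illustrative assumptions, and this is not the paper's full algorithm:

```python
import numpy as np

def neumann_inverse_vector_product(A, v, eta, K=500):
    """Approximate A^{-1} v via the truncated Neumann series
    A^{-1} = eta * sum_{k>=0} (I - eta*A)^k, valid when ||I - eta*A|| < 1.
    Only matrix-vector products are needed, so A is never inverted."""
    out = np.zeros_like(v)
    term = v.copy()
    for _ in range(K + 1):
        out += term
        term = term - eta * (A @ term)   # term <- (I - eta*A) @ term
    return eta * out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = rng.standard_normal((5, 5))
    A = M @ M.T + 5.0 * np.eye(5)                       # symmetric positive definite
    v = rng.standard_normal(5)
    approx = neumann_inverse_vector_product(A, v, eta=1.0 / np.linalg.norm(A, 2))
    print(np.linalg.norm(approx - np.linalg.solve(A, v)))  # small residual
```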

Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators

  • paper_url: http://arxiv.org/abs/2307.05358
  • repo_url: None
  • paper_authors: Sikai Bai, Shuaicheng Li, Weiming Zhuang, Jie Zhang, Song Guo, Kunlin Yang, Jun Hou, Shuai Zhang, Junyu Gao, Shuai Yi
  • for: This paper targets federated semi-supervised learning (FSSL) on decentralized heterogeneous data, focusing on the challenge of non-identical data distribution both within and across clients.
  • methods: The paper proposes a novel FSSL framework, FedDure, whose dual regulators (a coarse-grained C-reg and a fine-grained F-reg) lift the usual assumptions of independent and identically distributed (IID) labeled data across clients and of consistent class distribution between labeled and unlabeled data within a client.
  • results: Experiments show that FedDure outperforms existing methods across a wide range of settings, notably by more than 11% on the CIFAR-10 and CINIC-10 datasets.
    Abstract Federated learning has become a popular method to learn from decentralized heterogeneous data. Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data due to label scarcity on decentralized clients. Existing FSSL methods assume independent and identically distributed (IID) labeled data across clients and consistent class distribution between labeled and unlabeled data within a client. This work studies a more practical and challenging scenario of FSSL, where data distribution is different not only across clients but also within a client between labeled and unlabeled data. To address this challenge, we propose a novel FSSL framework with dual regulators, FedDure. FedDure lifts the previous assumption with a coarse-grained regulator (C-reg) and a fine-grained regulator (F-reg): C-reg regularizes the updating of the local model by tracking the learning effect on labeled data distribution; F-reg learns an adaptive weighting scheme tailored for unlabeled instances in each client. We further formulate the client model training as bi-level optimization that adaptively optimizes the model in the client with two regulators. Theoretically, we show the convergence guarantee of the dual regulators. Empirically, we demonstrate that FedDure is superior to the existing methods across a wide range of settings, notably by more than 11% on CIFAR-10 and CINIC-10 datasets.

Tracking Most Significant Shifts in Nonparametric Contextual Bandits

  • paper_url: http://arxiv.org/abs/2307.05341
  • repo_url: None
  • paper_authors: Joe Suk, Samory Kpotufe
  • For: This paper studies nonparametric contextual bandits in which Lipschitz mean reward functions may change over time.
  • Methods: The authors first establish the minimax dynamic regret rate for this setting, in terms of the number of changes $L$ and the total variation $V$, and show that state-of-the-art procedures are suboptimal here; they then propose a new, locality-aware notion of change, termed "experienced significant shifts", which counts considerably fewer changes than $L$ and $V$.
  • Results: The main result shows that this more tolerant notion of change can in fact be adapted to.
    Abstract We study nonparametric contextual bandits where Lipschitz mean reward functions may change over time. We first establish the minimax dynamic regret rate in this less understood setting in terms of number of changes $L$ and total-variation $V$, both capturing all changes in distribution over context space, and argue that state-of-the-art procedures are suboptimal in this setting. Next, we tend to the question of an adaptivity for this setting, i.e. achieving the minimax rate without knowledge of $L$ or $V$. Quite importantly, we posit that the bandit problem, viewed locally at a given context $X_t$, should not be affected by reward changes in other parts of context space $\cal X$. We therefore propose a notion of change, which we term experienced significant shifts, that better accounts for locality, and thus counts considerably less changes than $L$ and $V$. Furthermore, similar to recent work on non-stationary MAB (Suk & Kpotufe, 2022), experienced significant shifts only count the most significant changes in mean rewards, e.g., severe best-arm changes relevant to observed contexts. Our main result is to show that this more tolerant notion of change can in fact be adapted to.

Predicting small molecules solubilities on endpoint devices using deep ensemble neural networks

  • paper_url: http://arxiv.org/abs/2307.05318
  • repo_url: https://github.com/ur-whitelab/mol.dev
  • paper_authors: Mayk Caldas Ramos, Andrew D. White
  • for: This work aims to improve the accuracy and computational efficiency of aqueous solubility prediction while addressing the ease-of-use concerns that keep group-contribution methods popular.
  • methods: A deep learning model predicts solubility together with its predictive uncertainty and runs on a static website without a server.
  • results: The model achieves satisfactory solubility predictions and demonstrates how to build molecular property prediction models that balance uncertainty quantification and ease of use.
    Abstract Aqueous solubility is a valuable yet challenging property to predict. Computing solubility using first-principles methods requires accounting for the competing effects of entropy and enthalpy, resulting in long computations for relatively poor accuracy. Data-driven approaches, such as deep learning, offer improved accuracy and computational efficiency but typically lack uncertainty quantification. Additionally, ease of use remains a concern for any computational technique, resulting in the sustained popularity of group-based contribution methods. In this work, we addressed these problems with a deep learning model with predictive uncertainty that runs on a static website (without a server). This approach moves computing needs onto the website visitor without requiring installation, removing the need to pay for and maintain servers. Our model achieves satisfactory results in solubility prediction. Furthermore, we demonstrate how to create molecular property prediction models that balance uncertainty and ease of use. The code is available at \url{https://github.com/ur-whitelab/mol.dev}, and the model is usable at \url{https://mol.dev}.
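The title's "deep ensemble" with predictive uncertainty can be sketched as follows: train several small regressors with different seeds and report the ensemble mean and spread. This uses scikit-learn's MLPRegressor and random features as stand-ins for the paper's molecular model and descriptors (assumptions for illustration only):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def fit_deep_ensemble(X, y, n_members=5):
    """Train an ensemble of small neural regressors differing only in their seed."""
    return [MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                         random_state=seed).fit(X, y)
            for seed in range(n_members)]

def predict_with_uncertainty(ensemble, X):
    """Ensemble mean is the prediction; the member spread is the uncertainty."""
    preds = np.stack([m.predict(X) for m in ensemble])
    return preds.mean(axis=0), preds.std(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 8))        # stand-in for molecular descriptors
    y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.standard_normal(200)
    ensemble = fit_deep_ensemble(X, y)
    mean, std = predict_with_uncertainty(ensemble, X[:3])
    print(mean, std)                         # prediction and per-sample uncertainty
```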

Discovering Symbolic Laws Directly from Trajectories with Hamiltonian Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.05299
  • repo_url: None
  • paper_authors: Suresh Bishnoi, Ravinder Bhattoo, Jayadeva, Sayan Ranu, N M Anoop Krishnan
  • for: This work aims to discover the symbolic interaction laws governing physical systems directly from data.
  • methods: A Hamiltonian graph neural network (HGNN), a physics-enforced GNN, learns the dynamics of physical systems directly from their trajectories.
  • results: HGNN learns the dynamics of diverse physical systems (n-springs, n-pendulums, gravitational and binary Lennard-Jones systems) in excellent agreement with the ground truth from small amounts of data, generalizes to larger system sizes and to a hybrid spring-pendulum system, and, combined with symbolic regression, yields the underlying equations relating the energy functionals.
    Abstract The time evolution of physical systems is described by differential equations, which depend on abstract quantities like energy and force. Traditionally, these quantities are derived as functionals based on observables such as positions and velocities. Discovering these governing symbolic laws is the key to comprehending the interactions in nature. Here, we present a Hamiltonian graph neural network (HGNN), a physics-enforced GNN that learns the dynamics of systems directly from their trajectory. We demonstrate the performance of HGNN on n-springs, n-pendulums, gravitational systems, and binary Lennard Jones systems; HGNN learns the dynamics in excellent agreement with the ground truth from small amounts of data. We also evaluate the ability of HGNN to generalize to larger system sizes, and to hybrid spring-pendulum system that is a combination of two original systems (spring and pendulum) on which the models are trained independently. Finally, employing symbolic regression on the learned HGNN, we infer the underlying equations relating the energy functionals, even for complex systems such as the binary Lennard-Jones liquid. Our framework facilitates the interpretable discovery of interaction laws directly from physical system trajectories. Furthermore, this approach can be extended to other systems with topology-dependent dynamics, such as cells, polydisperse gels, or deformable bodies.
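The Hamiltonian structure the abstract enforces can be sketched in a few lines: a network predicts a scalar H(q, p), and the trajectory follows Hamilton's equations dq/dt = ∂H/∂p, dp/dt = -∂H/∂q obtained by automatic differentiation. The PyTorch sketch below uses a plain MLP as a stand-in for the paper's graph neural network (an assumption for brevity):

```python
import torch

class HamiltonianNet(torch.nn.Module):
    """Scalar Hamiltonian H(q, p) parameterized by a small MLP
    (a plain MLP stand-in for the paper's graph neural network)."""
    def __init__(self, dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2 * dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))

    def forward(self, q, p):
        return self.net(torch.cat([q, p], dim=-1)).sum()

def hamiltonian_dynamics(model, q, p):
    """Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq, via autograd."""
    q = q.requires_grad_(True)
    p = p.requires_grad_(True)
    H = model(q, p)
    dHdq, dHdp = torch.autograd.grad(H, (q, p), create_graph=True)
    return dHdp, -dHdq

if __name__ == "__main__":
    model = HamiltonianNet(dim=3)
    q, p = torch.randn(4, 3), torch.randn(4, 3)
    dq_dt, dp_dt = hamiltonian_dynamics(model, q, p)
    print(dq_dt.shape, dp_dt.shape)   # torch.Size([4, 3]) torch.Size([4, 3])
```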

On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets

  • paper_url: http://arxiv.org/abs/2307.05284
  • repo_url: https://github.com/namkoong-lab/whyshift
  • paper_authors: Jiashuo Liu, Tianyu Wang, Peng Cui, Hongseok Namkoong
  • for: This paper aims to investigate natural shifts in tabular datasets and the impact of these shifts on algorithmic performance.
  • methods: The authors conduct a thorough investigation of 5 tabular datasets and 86,000 model configurations to identify the most prevalent types of distribution shift, finding $Y|X$-shifts to dominate. They also build an empirical testbed, WhyShift, of curated real-world shifts to characterize and benchmark performance over different types of shift.
  • results: The authors find that $Y|X$-shifts are the most prevalent type of shift in tabular settings, and they identify covariate regions that suffer the biggest $Y|X$-shifts. They discuss the implications of these shifts for algorithmic and data-based interventions.
    Abstract Different distribution shifts require different algorithmic and operational interventions. Methodological research must be grounded by the specific shifts they address. Although nascent benchmarks provide a promising empirical foundation, they implicitly focus on covariate shifts, and the validity of empirical findings depends on the type of shift, e.g., previous observations on algorithmic performance can fail to be valid when the $Y|X$ distribution changes. We conduct a thorough investigation of natural shifts in 5 tabular datasets over 86,000 model configurations, and find that $Y|X$-shifts are most prevalent. To encourage researchers to develop a refined language for distribution shifts, we build WhyShift, an empirical testbed of curated real-world shifts where we characterize the type of shift we benchmark performance over. Since $Y|X$-shifts are prevalent in tabular settings, we identify covariate regions that suffer the biggest $Y|X$-shifts and discuss implications for algorithmic and data-based interventions. Our testbed highlights the importance of future research that builds an understanding of how distributions differ.

CareFall: Automatic Fall Detection through Wearable Devices and AI Methods

  • paper_url: http://arxiv.org/abs/2307.05275
  • repo_url: None
  • paper_authors: Juan Carlos Ruiz-Garcia, Ruben Tolosana, Ruben Vera-Rodriguez, Carlos Moro
  • for: This work develops an automatic fall detection system to mitigate the negative consequences of falls among older people.
  • methods: The system uses accelerometer and gyroscope time signals from a smartwatch, with feature extraction and classification performed by either threshold-based or machine-learning-based AI methods.
  • results: Experiments on two public databases show that the machine-learning approach combining accelerometer and gyroscope information outperforms the threshold-based approach in accuracy, sensitivity, and specificity.
    Abstract The aging population has led to a growing number of falls in our society, affecting global public health worldwide. This paper presents CareFall, an automatic Fall Detection System (FDS) based on wearable devices and Artificial Intelligence (AI) methods. CareFall considers the accelerometer and gyroscope time signals extracted from a smartwatch. Two different approaches are used for feature extraction and classification: i) threshold-based, and ii) machine learning-based. Experimental results on two public databases show that the machine learning-based approach, which combines accelerometer and gyroscope information, outperforms the threshold-based approach in terms of accuracy, sensitivity, and specificity. This research contributes to the design of smart and user-friendly solutions to mitigate the negative consequences of falls among older people.
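A minimal sketch of the threshold-based baseline mentioned in the abstract: compute the acceleration magnitude from the smartwatch accelerometer and flag an impact when it exceeds a threshold. The threshold, sample rate, and simulated signal are illustrative assumptions, not values from the paper:

```python
import numpy as np

def detect_fall_threshold(acc_xyz, threshold_g=2.5, sample_rate_hz=50):
    """Threshold-based fall detector sketch: flag time steps where the
    acceleration magnitude (in g) exceeds an impact threshold.
    `acc_xyz` is an (N, 3) array of accelerometer samples in g."""
    magnitude = np.linalg.norm(acc_xyz, axis=1)
    impact_idx = np.flatnonzero(magnitude > threshold_g)
    return impact_idx / sample_rate_hz      # times (in seconds) of candidate falls

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acc = rng.normal(0.0, 0.05, size=(500, 3))
    acc[:, 2] += 1.0                        # gravity on the z axis
    acc[250] = [1.5, 2.0, 2.5]              # simulated impact spike
    print(detect_fall_threshold(acc))       # ~[5.0] seconds
```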

U-CREAT: Unsupervised Case Retrieval using Events extrAcTion

  • paper_url: http://arxiv.org/abs/2307.05260
  • repo_url: https://github.com/exploration-lab/il-pcr
  • paper_authors: Abhinav Joshi, Akshat Sharma, Sai Kiran Tanikella, Ashutosh Modi
  • For: The paper addresses the task of Prior Case Retrieval (PCR) in the legal domain, proposing a new large benchmark (the IL-PCR corpus) and exploring the role of events in legal case retrieval.
  • Methods: The paper proposes an unsupervised retrieval pipeline called U-CREAT (Unsupervised Case Retrieval using Events Extraction), which significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin, making it applicable to real-time case retrieval systems.
  • Results: The proposed system is generic and shows state-of-the-art performance on the benchmarks for both the Indian and Canadian legal systems (the IL-PCR and COLIEE corpora).
    Abstract The task of Prior Case Retrieval (PCR) in the legal domain is about automatically citing relevant (based on facts and precedence) prior legal cases in a given query case. To further promote research in PCR, in this paper, we propose a new large benchmark (in English) for the PCR task: IL-PCR (Indian Legal Prior Case Retrieval) corpus. Given the complex nature of case relevance and the long size of legal documents, BM25 remains a strong baseline for ranking the cited prior documents. In this work, we explore the role of events in legal case retrieval and propose an unsupervised retrieval method-based pipeline U-CREAT (Unsupervised Case Retrieval using Events Extraction). We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin, making it applicable to real-time case retrieval systems. Our proposed system is generic, we show that it generalizes across two different legal systems (Indian and Canadian), and it shows state-of-the-art performance on the benchmarks for both the legal systems (IL-PCR and COLIEE corpora).
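To illustrate the event-based retrieval idea, here is a hedged sketch that represents each case as a set of extracted events and ranks prior cases by event-set overlap; the Jaccard scoring, the toy event tuples, and the stubbed event extraction are assumptions for illustration and not the paper's exact pipeline:

```python
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0

def rank_prior_cases(query_events: set, corpus_events: dict) -> list:
    """Rank prior cases by overlap between their event sets and the query's.
    `corpus_events` maps case ids to sets of extracted events; real event
    extraction (e.g. predicate-argument tuples) is outside this sketch."""
    scores = {cid: jaccard(query_events, ev) for cid, ev in corpus_events.items()}
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    corpus = {
        "case_A": {("file", "appeal"), ("dismiss", "petition")},
        "case_B": {("grant", "bail"), ("file", "appeal")},
        "case_C": {("convict", "accused")},
    }
    query = {("file", "appeal"), ("grant", "bail")}
    print(rank_prior_cases(query, corpus))   # ['case_B', 'case_A', 'case_C']
```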

MAP- and MLE-Based Teaching

  • paper_url: http://arxiv.org/abs/2307.05252
  • repo_url: None
  • paper_authors: Hans Ulrich Simon, Jan Arne Telle
  • for: This paper studies probabilistic concept inference, in which a learner L infers a hidden concept from a collection of observations Z.
  • methods: Building on the work of Ferri et al., the learner L is parameterized by priors P(c) and c-conditional likelihoods P(z|c), where c ranges over the concepts in a given class C and z over the observations in an observation set Z. L is called a MAP-learner (resp. MLE-learner) if it treats a collection S of observations as a random sample and returns the concept with the maximum a-posteriori probability (resp. the concept that maximizes the c-conditional likelihood of S).
  • results: The main results show that this teaching model has desirable monotonicity properties and clarify how the four sampling modes relate to each other. For the special case where concepts are subsets of a domain and observations are 0,1-labeled examples, the MAP- and MLE-teaching dimensions are characterized graph-theoretically, can be bounded from above by the antichain number, the VC-dimension, and related combinatorial parameters, and can be computed in polynomial time.
    Abstract Imagine a learner L who tries to infer a hidden concept from a collection of observations. Building on the work [4] of Ferri et al., we assume the learner to be parameterized by priors P(c) and by c-conditional likelihoods P(z|c) where c ranges over all concepts in a given class C and z ranges over all observations in an observation set Z. L is called a MAP-learner (resp. an MLE-learner) if it thinks of a collection S of observations as a random sample and returns the concept with the maximum a-posteriori probability (resp. the concept which maximizes the c-conditional likelihood of S). Depending on whether L assumes that S is obtained from ordered or unordered sampling resp. from sampling with or without replacement, we can distinguish four different sampling modes. Given a target concept c in C, a teacher for a MAP-learner L aims at finding a smallest collection of observations that causes L to return c. This approach leads in a natural manner to various notions of a MAP- or MLE-teaching dimension of a concept class C. Our main results are: We show that this teaching model has some desirable monotonicity properties. We clarify how the four sampling modes are related to each other. As for the (important!) special case, where concepts are subsets of a domain and observations are 0,1-labeled examples, we obtain some additional results. First of all, we characterize the MAP- and MLE-teaching dimension associated with an optimally parameterized MAP-learner graph-theoretically. From this central result, some other ones are easy to derive. It is shown, for instance, that the MLE-teaching dimension is either equal to the MAP-teaching dimension or exceeds the latter by 1. It is shown furthermore that these dimensions can be bounded from above by the so-called antichain number, the VC-dimension and related combinatorial parameters. Moreover they can be computed in polynomial time.
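A minimal sketch of the two learners defined in the abstract, assuming i.i.d. sampling with replacement: the MAP-learner returns argmax_c P(c)·∏_{z∈S} P(z|c), while the MLE-learner drops the prior. The toy concepts, prior, and likelihoods below are illustrative only:

```python
import math

def map_learner(S, prior, likelihood):
    """argmax_c  P(c) * prod_{z in S} P(z | c), computed in log-space."""
    return max(prior, key=lambda c: math.log(prior[c])
               + sum(math.log(likelihood[c][z]) for z in S))

def mle_learner(S, prior, likelihood):
    """argmax_c  prod_{z in S} P(z | c), i.e. the same rule without the prior."""
    return max(prior, key=lambda c: sum(math.log(likelihood[c][z]) for z in S))

if __name__ == "__main__":
    prior = {"c1": 0.9, "c2": 0.1}                     # learner's prior over concepts
    likelihood = {"c1": {0: 0.6, 1: 0.4}, "c2": {0: 0.4, 1: 0.6}}
    S = [1]                                            # the teacher's observation set
    print(map_learner(S, prior, likelihood))           # c1 -- the strong prior wins
    print(mle_learner(S, prior, likelihood))           # c2 -- the likelihood alone wins
```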

DRMC: A Generalist Model with Dynamic Routing for Multi-Center PET Image Synthesis

  • paper_url: http://arxiv.org/abs/2307.05249
  • repo_url: None
  • paper_authors: Zhiwen Yang, Yang Zhou, Hui Zhang, Bingzheng Wei, Yubo Fan, Yan Xu
  • For: Multi-center positron emission tomography (PET) image synthesis aims at recovering low-dose PET images collected from multiple different centers.
  • Methods: A generalist model shares architecture and parameters across centers to exploit their shared knowledge. Because such a model can suffer from center interference (gradient directions from different centers may be inconsistent or even opposite due to non-identical data distributions), a novel dynamic routing strategy with cross-layer connections routes data from different centers to different experts.
  • Results: The generalist model with dynamic routing (DRMC) exhibits excellent generalizability across centers.
    Abstract Multi-center positron emission tomography (PET) image synthesis aims at recovering low-dose PET images from multiple different centers. The generalizability of existing methods can still be suboptimal for a multi-center study due to domain shifts, which result from non-identical data distribution among centers with different imaging systems/protocols. While some approaches address domain shifts by training specialized models for each center, they are parameter inefficient and do not well exploit the shared knowledge across centers. To address this, we develop a generalist model that shares architecture and parameters across centers to utilize the shared knowledge. However, the generalist model can suffer from the center interference issue, i.e., the gradient directions of different centers can be inconsistent or even opposite owing to the non-identical data distribution. To mitigate such interference, we introduce a novel dynamic routing strategy with cross-layer connections that routes data from different centers to different experts. Experiments show that our generalist model with dynamic routing (DRMC) exhibits excellent generalizability across centers. Code and data are available at: https://github.com/Yaziwel/Multi-Center-PET-Image-Synthesis.

A Survey From Distributed Machine Learning to Distributed Deep Learning

  • paper_url: http://arxiv.org/abs/2307.05232
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Mohammad Dehghani, Zahra Yazdanparast
  • for: This article summarizes the current state of the art in distributed machine learning.
  • methods: The article reviews distributed machine learning algorithms and groups them into classification and clustering (traditional machine learning), deep learning, and deep reinforcement learning.
  • results: Distributed deep learning has attracted the most attention in recent years and accounts for the majority of the studies reviewed; based on this investigation, the article highlights limitations that should be addressed in future research.
    Abstract Artificial intelligence has achieved significant success in handling complex tasks in recent years. This success is due to advances in machine learning algorithms and hardware acceleration. In order to obtain more accurate results and solve more complex problems, algorithms must be trained with more data. This huge amount of data could be time-consuming to process and require a great deal of computation. This solution could be achieved by distributing the data and algorithm across several machines, which is known as distributed machine learning. There has been considerable effort put into distributed machine learning algorithms, and different methods have been proposed so far. In this article, we present a comprehensive summary of the current state-of-the-art in the field through the review of these algorithms. We divide this algorithms in classification and clustering (traditional machine learning), deep learning and deep reinforcement learning groups. Distributed deep learning has gained more attention in recent years and most of studies worked on this algorithms. As a result, most of the articles we discussed here belong to this category. Based on our investigation of algorithms, we highlight limitations that should be addressed in future research.

Attribute Controlled Dialogue Prompting

  • paper_url: http://arxiv.org/abs/2307.05228
  • repo_url: None
  • paper_authors: Runcheng Liu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart
  • for: This work proposes a novel, instance-specific prompt-tuning algorithm for controlled dialogue generation.
  • methods: Prompts are generated from instance-level control codes rather than from the conversation history.
  • results: Experiments on popular open-domain dialogue datasets show the method is superior to prompting baselines and comparable to fine-tuning while using only 5%-6% of the total parameters.
    Abstract Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks. However, both discrete prompting and continuous prompting assume fixed prompts for all data samples within a task, neglecting the fact that inputs vary greatly in some tasks such as open-domain dialogue generation. In this paper, we present a novel, instance-specific prompt-tuning algorithm for dialogue generation. Specifically, we generate prompts based on instance-level control code, rather than the conversation history, to explore their impact on controlled dialogue generation. Experiments on popular open-domain dialogue datasets, evaluated on both automated metrics and human evaluation, demonstrate that our method is superior to prompting baselines and comparable to fine-tuning with only 5%-6% of total parameters.

Supervised Attention Using Homophily in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.05217
  • repo_url: None
  • paper_authors: Michail Chatzianastasis, Giannis Nikolentzos, Michalis Vazirgiannis
  • for: This work aims to improve the performance of Graph Attention Networks (GATs) on graph learning tasks such as node classification.
  • methods: A new technique, applicable to any graph attention model, encourages higher attention scores between nodes that share the same class label.
  • results: Evaluation on several node classification datasets demonstrates increased performance over standard baseline models.
    Abstract Graph neural networks have become the standard approach for dealing with learning problems on graphs. Among the different variants of graph neural networks, graph attention networks (GATs) have been applied with great success to different tasks. In the GAT model, each node assigns an importance score to its neighbors using an attention mechanism. However, similar to other graph neural networks, GATs aggregate messages from nodes that belong to different classes, and therefore produce node representations that are not well separated with respect to the different classes, which might hurt their performance. In this work, to alleviate this problem, we propose a new technique that can be incorporated into any graph attention model to encourage higher attention scores between nodes that share the same class label. We evaluate the proposed method on several node classification datasets demonstrating increased performance over standard baseline models.
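One way to realize the idea in the abstract is an auxiliary loss that pushes attention scores on same-class edges above those on different-class edges. The PyTorch sketch below uses a binary cross-entropy form over per-edge attention logits; this particular loss form, and the toy graph, are assumptions for illustration rather than the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def homophily_attention_loss(att_scores, edge_index, labels):
    """Auxiliary supervision for graph attention: treat each edge's raw
    (unnormalized) attention score as a logit for 'the two endpoints share a
    class label' and apply binary cross-entropy. `att_scores` has one score
    per edge; `edge_index` is a (2, E) tensor of source/target node ids."""
    src, dst = edge_index
    same_class = (labels[src] == labels[dst]).float()
    return F.binary_cross_entropy_with_logits(att_scores, same_class)

if __name__ == "__main__":
    edge_index = torch.tensor([[0, 0, 1, 2], [1, 2, 2, 3]])
    labels = torch.tensor([0, 0, 1, 1])
    att_scores = torch.randn(4, requires_grad=True)   # one raw score per edge
    loss = homophily_attention_loss(att_scores, edge_index, labels)
    loss.backward()                                    # gradients flow into the attention model
    print(loss.item())
```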

Score Function Gradient Estimation to Widen the Applicability of Decision-Focused Learning

  • paper_url: http://arxiv.org/abs/2307.05213
  • repo_url: None
  • paper_authors: Mattia Silvestri, Senne Berden, Jayanta Mandi, Ali İrfan Mahmutoğulları, Maxime Mulamba, Allegra De Filippo, Tias Guns, Michele Lombardi
  • for: This paper proposes a decision-focused learning approach for real-world optimization problems that contain unknown parameters which must be predicted before solving.
  • methods: Instead of maximizing predictive accuracy, the machine learning model is trained by directly minimizing the downstream task loss; the method predicts distributions over parameters and uses score function gradient estimation (SFGE) to compute decision-focused updates, widening the applicability of decision-focused learning.
  • results: Experiments show the method can (1) handle predictions that appear both in the objective function and in the constraints, and (2) effectively tackle two-stage stochastic optimization problems.
    Abstract Many real-world optimization problems contain unknown parameters that must be predicted prior to solving. To train the predictive machine learning (ML) models involved, the commonly adopted approach focuses on maximizing predictive accuracy. However, this approach does not always lead to the minimization of the downstream task loss. Decision-focused learning (DFL) is a recently proposed paradigm whose goal is to train the ML model by directly minimizing the task loss. However, state-of-the-art DFL methods are limited by the assumptions they make about the structure of the optimization problem (e.g., that the problem is linear) and by the fact that can only predict parameters that appear in the objective function. In this work, we address these limitations by instead predicting \textit{distributions} over parameters and adopting score function gradient estimation (SFGE) to compute decision-focused updates to the predictive model, thereby widening the applicability of DFL. Our experiments show that by using SFGE we can: (1) deal with predictions that occur both in the objective function and in the constraints; and (2) effectively tackle two-stage stochastic optimization problems.
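The score function gradient estimator (log-derivative trick) named above computes ∇_θ E_{x∼p_θ}[f(x)] = E[f(x) ∇_θ log p_θ(x)], so the downstream task loss f never needs to be differentiated. Below is a minimal PyTorch sketch with a Gaussian parameter distribution and a toy black-box loss; both are illustrative assumptions, not the paper's setup:

```python
import torch

def sfge_surrogate_loss(mu, sigma, task_loss, n_samples=64):
    """Score-function (log-derivative) gradient estimator: differentiating this
    surrogate w.r.t. the distribution parameters gives an estimate of
    grad E_{x ~ N(mu, sigma)}[task_loss(x)] without differentiating task_loss."""
    dist = torch.distributions.Normal(mu, sigma)
    x = dist.sample((n_samples,))                  # samples carry no gradient
    f = task_loss(x).detach()                      # black-box decision loss
    baseline = f.mean()                            # simple variance-reduction baseline
    return ((f - baseline) * dist.log_prob(x).sum(dim=-1)).mean()

if __name__ == "__main__":
    mu = torch.zeros(3, requires_grad=True)
    sigma = torch.ones(3)
    task_loss = lambda x: ((x - 2.0) ** 2).sum(dim=-1)   # toy stand-in for a solver's loss
    for _ in range(200):
        sfge_surrogate_loss(mu, sigma, task_loss).backward()
        with torch.no_grad():
            mu -= 0.05 * mu.grad
            mu.grad.zero_()
    print(mu)   # should drift toward the loss minimizer at 2.0
```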

Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.05209
  • repo_url: None
  • paper_authors: Guy Azran, Mohamad H. Danesh, Stefano V. Albrecht, Sarah Keren
  • for: Improving the adaptability and learning efficiency of deep reinforcement learning agents so they transfer better to unseen tasks and minor environment changes.
  • methods: The current task is represented with reward machines (RMs), state-machine abstractions that induce subtasks from the task's rewards and dynamics; agents receive symbolic representations of optimal transitions from their current abstract state and are rewarded for achieving them, and these representations are shared across tasks so previously encountered symbols and transitions can be exploited.
  • results: Experiments across several domains show that the representations improve sample efficiency and few-shot transfer.
    Abstract Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RM), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Our empirical evaluation shows that our representations improve sample efficiency and few-shot transfer in a variety of domains.

Reject option models comprising out-of-distribution detection

  • paper_url: http://arxiv.org/abs/2307.05199
  • repo_url: None
  • paper_authors: Vojtech Franc, Daniel Prusa, Jakub Paplham
  • for: This work addresses the question of optimal prediction strategies for out-of-distribution (OOD) setups in machine learning and presents several contributions.
  • methods: Three reject option models for OOD setups are proposed: the Cost-based model, the Bounded TPR-FPR model, and the Bounded Precision-Recall model. They extend the standard reject option models used in non-OOD setups, define the notion of an optimal OOD selective classifier, and, despite their different formulations, are shown to share a common class of optimal strategies.
  • results: Experiments consistently show the superior performance of a simple double-score OOD method that leverages uncertainty scores from two chosen OOD detectors (one focused on OOD/ID discrimination, the other on misclassification detection); novel evaluation metrics derived from the optimal strategy provide a comprehensive and reliable assessment of OOD methods.
    Abstract The optimal prediction strategy for out-of-distribution (OOD) setups is a fundamental question in machine learning. In this paper, we address this question and present several contributions. We propose three reject option models for OOD setups: the Cost-based model, the Bounded TPR-FPR model, and the Bounded Precision-Recall model. These models extend the standard reject option models used in non-OOD setups and define the notion of an optimal OOD selective classifier. We establish that all the proposed models, despite their different formulations, share a common class of optimal strategies. Motivated by the optimal strategy, we introduce double-score OOD methods that leverage uncertainty scores from two chosen OOD detectors: one focused on OOD/ID discrimination and the other on misclassification detection. The experimental results consistently demonstrate the superior performance of this simple strategy compared to state-of-the-art methods. Additionally, we propose novel evaluation metrics derived from the definition of the optimal strategy under the proposed OOD rejection models. These new metrics provide a comprehensive and reliable assessment of OOD methods without the deficiencies observed in existing evaluation approaches.
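The double-score strategy described in the abstract can be sketched as a simple selective-prediction rule: reject an input if either of two uncertainty scores (one aimed at OOD/ID discrimination, one at misclassification detection) crosses its threshold. The score definitions and thresholds below are illustrative assumptions:

```python
import numpy as np

def double_score_reject(ood_score, misclf_score, tau_ood, tau_misclf):
    """Double-score selective prediction sketch: reject a sample if the OOD
    detector flags it OR the misclassification detector is too uncertain.
    Both scores follow a 'higher = more suspicious' convention; thresholds
    would be calibrated on validation data (e.g. to a target FPR)."""
    ood_score = np.asarray(ood_score)
    misclf_score = np.asarray(misclf_score)
    return (ood_score > tau_ood) | (misclf_score > tau_misclf)

if __name__ == "__main__":
    # Illustrative scores for four samples: the last two should be rejected.
    ood = [0.1, 0.2, 0.9, 0.3]       # e.g. an OOD detector's score
    unc = [0.2, 0.1, 0.2, 0.8]       # e.g. 1 - max softmax probability
    print(double_score_reject(ood, unc, tau_ood=0.5, tau_misclf=0.5))
    # -> [False False  True  True]
```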

Differentially Private Statistical Inference through $β$-Divergence One Posterior Sampling

  • paper_url: http://arxiv.org/abs/2307.05194
  • repo_url: None
  • paper_authors: Jack Jewson, Sahra Ghalebikesabi, Chris Holmes
  • for: This work provides a mechanism for releasing the results of statistical analyses on sensitive data with differential privacy guarantees, without altering the data generative process.
  • methods: Rather than artificially injecting noise, private estimates are produced by sampling from a generalized Bayesian posterior that targets minimization of the $\beta$-divergence between the model and the data generating process ($\beta$D-Bayes).
  • results: $\beta$D-Bayes produces more precise inference for the same privacy guarantees and, for the first time, enables differentially private estimation via posterior sampling for complex classifiers and continuous regression models such as neural networks.
    Abstract Differential privacy guarantees allow the results of a statistical analysis involving sensitive data to be released without compromising the privacy of any individual taking part. Achieving such guarantees generally requires the injection of noise, either directly into parameter estimates or into the estimation process. Instead of artificially introducing perturbations, sampling from Bayesian posterior distributions has been shown to be a special case of the exponential mechanism, producing consistent, and efficient private estimates without altering the data generative process. The application of current approaches has, however, been limited by their strong bounding assumptions which do not hold for basic models, such as simple linear regressors. To ameliorate this, we propose $\beta$D-Bayes, a posterior sampling scheme from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process. This provides private estimation that is generally applicable without requiring changes to the underlying model and consistently learns the data generating parameter. We show that $\beta$D-Bayes produces more precise inference estimation for the same privacy guarantees, and further facilitates differentially private estimation via posterior sampling for complex classifiers and continuous regression models such as neural networks for the first time.

Membership Inference Attacks on DNNs using Adversarial Perturbations

  • paper_url: http://arxiv.org/abs/2307.05193
  • repo_url: https://github.com/hassanalikhatim/amia
  • paper_authors: Hassan Ali, Adnan Qayyum, Ala Al-Fuqaha, Junaid Qadir
  • for: This paper focuses on post-training membership inference (MI) attacks on deep neural networks (DNNs) that emphasize high-confidence membership detection, i.e., high true positive rates (TPR) at low false positive rates (FPR).
  • methods: Building on existing MI attacks, which the paper unifies in a three-stage framework (preparation, indication, decision), two new attacks are proposed: the Adversarial Membership Inference Attack (AMIA) and Enhanced AMIA (E-AMIA), which exploit both the membership and non-membership information of the subjects while adversarially minimizing a novel loss function.
  • results: At 1% FPR, AMIA achieves 6% TPR on both Fashion-MNIST and MNIST, and E-AMIA achieves 8% and 4% TPR respectively, whereas the existing LiRA and EMIA attacks achieve close to 0% TPR on these datasets; the proposed attacks also transfer better to unknown DNNs and are more robust to DP-SGD training than LiRA and EMIA.
    Abstract Several membership inference (MI) attacks have been proposed to audit a target DNN. Given a set of subjects, MI attacks tell which subjects the target DNN has seen during training. This work focuses on the post-training MI attacks emphasizing high confidence membership detection -- True Positive Rates (TPR) at low False Positive Rates (FPR). Current works in this category -- likelihood ratio attack (LiRA) and enhanced MI attack (EMIA) -- only perform well on complex datasets (e.g., CIFAR-10 and Imagenet) where the target DNN overfits its train set, but perform poorly on simpler datasets (0% TPR by both attacks on Fashion-MNIST, 2% and 0% TPR respectively by LiRA and EMIA on MNIST at 1% FPR). To address this, firstly, we unify current MI attacks by presenting a framework divided into three stages -- preparation, indication and decision. Secondly, we utilize the framework to propose two novel attacks: (1) Adversarial Membership Inference Attack (AMIA) efficiently utilizes the membership and the non-membership information of the subjects while adversarially minimizing a novel loss function, achieving 6% TPR on both Fashion-MNIST and MNIST datasets; and (2) Enhanced AMIA (E-AMIA) combines EMIA and AMIA to achieve 8% and 4% TPRs on Fashion-MNIST and MNIST datasets respectively, at 1% FPR. Thirdly, we introduce two novel augmented indicators that positively leverage the loss information in the Gaussian neighborhood of a subject. This improves TPR of all four attacks on average by 2.5% and 0.25% respectively on Fashion-MNIST and MNIST datasets at 1% FPR. Finally, we propose simple, yet novel, evaluation metric, the running TPR average (RTA) at a given FPR, that better distinguishes different MI attacks in the low FPR region. We also show that AMIA and E-AMIA are more transferable to the unknown DNNs (other than the target DNN) and are more robust to DP-SGD training as compared to LiRA and EMIA.

Using Linear Regression for Iteratively Training Neural Networks

  • paper_url: http://arxiv.org/abs/2307.05189
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Harshad Khadilkar
  • for: This paper proposes a linear-regression-based approach for learning the weights and biases of a neural network, as an alternative to standard gradient-based backpropagation.
  • methods: The key observation is that the input to every neuron is a linear combination of the previous layer's activations and the layer's parameters (weights and biases); working backwards from the output, the ideal total inputs to every neuron are computed, and learning is formulated as a linear least squares problem that iterates between updating the parameters and the activation values.
  • results: At least for small problems, the approach is more stable and faster than gradient-based methods, and it is intended to be extensible to larger, more complex architectures.
    Abstract We present a simple linear regression based approach for learning the weights and biases of a neural network, as an alternative to standard gradient based backpropagation. The present work is exploratory in nature, and we restrict the description and experiments to (i) simple feedforward neural networks, (ii) scalar (single output) regression problems, and (iii) invertible activation functions. However, the approach is intended to be extensible to larger, more complex architectures. The key idea is the observation that the input to every neuron in a neural network is a linear combination of the activations of neurons in the previous layer, as well as the parameters (weights and biases) of the layer. If we are able to compute the ideal total input values to every neuron by working backwards from the output, we can formulate the learning problem as a linear least squares problem which iterates between updating the parameters and the activation values. We present an explicit algorithm that implements this idea, and we show that (at least for small problems) the approach is more stable and faster than gradient-based methods.
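The core step described in the abstract, specialized to a single output layer with an invertible activation, looks roughly like this: invert the activation on the targets to obtain ideal pre-activations, then solve for the weights and bias by linear least squares. This single-layer NumPy sketch (tanh activation, synthetic data) illustrates the idea and is not the paper's full iterative algorithm:

```python
import numpy as np

def fit_layer_least_squares(X, y_target, activation_inv=np.arctanh):
    """Single-layer sketch: the ideal total input to each output neuron is
    activation_inv(target); weights and bias then follow from a linear
    least squares fit of [X, 1] onto those ideal pre-activations."""
    Z = activation_inv(np.clip(y_target, -0.999, 0.999))  # ideal pre-activations
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])         # append a bias column
    W_b, *_ = np.linalg.lstsq(X1, Z, rcond=None)
    return W_b[:-1], W_b[-1]                               # weights, bias

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 4))
    true_w, true_b = np.array([0.5, -1.0, 0.3, 0.0]), 0.2
    y = np.tanh(X @ true_w + true_b)                       # scalar regression target
    w, b = fit_layer_least_squares(X, y)
    print(np.round(w, 3), round(float(b), 3))              # recovers true_w, true_b
```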

Decorrelation using Optimal Transport

  • paper_url: http://arxiv.org/abs/2307.05187
  • repo_url: https://github.com/malteal/ot-decorrelation
  • paper_authors: Malte Algren, John Andrew Raine, Tobias Golling
  • for: decorrelate a continuous feature space against protected attributes with optimal transport
  • methods: Convex Neural Optimal Transport Solvers (Cnots)
  • results: in binary classification the decorrelation approaches the levels achieved by state-of-the-art conditional normalising flows, and for multiclass outputs the optimal transport approach performs significantly better than the state of the art.
    Abstract Being able to decorrelate a feature space from protected attributes is an area of active research and study in ethics, fairness, and also natural sciences. We introduce a novel decorrelation method using Convex Neural Optimal Transport Solvers (Cnots) that is able to decorrelate a continuous feature space against protected attributes with optimal transport. We demonstrate how well it performs in the context of jet classification in high energy physics, where classifier scores are desired to be decorrelated from the mass of a jet. The decorrelation achieved in binary classification approaches the levels achieved by the state-of-the-art using conditional normalising flows. When moving to multiclass outputs the optimal transport approach performs significantly better than the state-of-the-art, suggesting substantial gains at decorrelating multidimensional feature spaces.
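As a rough illustration of optimal-transport-style decorrelation in one dimension, the sketch below maps the classifier score within each bin of the protected attribute (e.g. jet mass) to its empirical quantile, i.e. it transports every conditional score distribution to Uniform(0, 1). This quantile-mapping stand-in is an assumption for illustration only and is not the paper's convex neural OT solver (Cnots):

```python
import numpy as np

def decorrelate_by_quantile_mapping(scores, mass, n_bins=10):
    """Within each bin of the protected attribute, replace the classifier score
    by its empirical quantile (the 1-D optimal transport map to Uniform(0, 1)),
    so the transformed score distribution no longer depends on the bin."""
    scores, mass = np.asarray(scores), np.asarray(mass)
    edges = np.quantile(mass, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(mass, edges)
    out = np.empty_like(scores, dtype=float)
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)
        ranks = scores[idx].argsort().argsort()            # 0 .. n_b - 1
        out[idx] = (ranks + 0.5) / len(idx)                 # empirical CDF value
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mass = rng.uniform(50, 300, size=5000)
    score = 1 / (1 + np.exp(-(mass - 175) / 50)) + 0.1 * rng.standard_normal(5000)
    flat = decorrelate_by_quantile_mapping(score, mass)
    print(np.corrcoef(score, mass)[0, 1], np.corrcoef(flat, mass)[0, 1])  # large vs ~0
```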

A Mapping Study of Machine Learning Methods for Remaining Useful Life Estimation of Lead-Acid Batteries

  • paper_url: http://arxiv.org/abs/2307.05163
  • repo_url: None
  • paper_authors: Sérgio F Chevtchenko, Elisson da Silva Rocha, Bruna Cruz, Ermeson Carneiro de Andrade, Danilo Ricardo Barbosa de Araújo
  • for: This paper surveys machine learning methods for estimating the State of Health (SoH) and Remaining Useful Life (RUL) of lead-acid batteries.
  • methods: A mapping study analyzes the types of machine learning algorithms employed for SoH and RUL estimation and evaluates their accuracy and inference time.
  • results: The study identifies the sensor combinations most commonly used in specific applications, such as vehicular batteries, and highlights potential gaps and opportunities for future research.
    Abstract Energy storage solutions play an increasingly important role in modern infrastructure and lead-acid batteries are among the most commonly used in the rechargeable category. Due to normal degradation over time, correctly determining the battery's State of Health (SoH) and Remaining Useful Life (RUL) contributes to enhancing predictive maintenance, reliability, and longevity of battery systems. Besides improving the cost savings, correct estimation of the SoH can lead to reduced pollution though reuse of retired batteries. This paper presents a mapping study of the state-of-the-art in machine learning methods for estimating the SoH and RUL of lead-acid batteries. These two indicators are critical in the battery management systems of electric vehicles, renewable energy systems, and other applications that rely heavily on this battery technology. In this study, we analyzed the types of machine learning algorithms employed for estimating SoH and RUL, and evaluated their performance in terms of accuracy and inference time. Additionally, this mapping identifies and analyzes the most commonly used combinations of sensors in specific applications, such as vehicular batteries. The mapping concludes by highlighting potential gaps and opportunities for future research, which lays the foundation for further advancements in the field.
    摘要 This study uses machine learning methods to estimate the SoH and RUL of lead-acid batteries. These indicators are critical in battery management systems for electric vehicles, renewable energy systems, and other applications that rely heavily on this battery technology. The study analyzed the types of machine learning algorithms used for estimating SoH and RUL, and evaluated their performance in terms of accuracy and inference time. Additionally, the study identified and analyzed the most commonly used combinations of sensors in specific applications, such as vehicular batteries.The study concludes by highlighting potential gaps and opportunities for future research, providing a foundation for further advancements in the field.

SuryaKiran at MEDIQA-Sum 2023: Leveraging LoRA for Clinical Dialogue Summarization

  • paper_url: http://arxiv.org/abs/2307.05162
  • repo_url: None
  • paper_authors: Kunal Suri, Prakhar Mishra, Saumajit Saha, Atul Singh
  • for: This work examines how Parameter Efficient Fine Tuning (PEFT) methods can improve the performance of large language models on domain-specific use cases.
  • methods: Low Rank Adaptation (LoRA) is used: the large language model is kept as a fixed base and additional layers are added and fine-tuned.
  • results: Evaluation on clinical dialogue summarization shows that LoRA works on par with end-to-end fine-tuning of a large language model.
    Abstract Finetuning Large Language Models helps improve the results for domain-specific use cases. End-to-end finetuning of large language models is time and resource intensive and has high storage requirements to store the finetuned version of the large language model. Parameter Efficient Fine Tuning (PEFT) methods address the time and resource challenges by keeping the large language model as a fixed base and add additional layers, which the PEFT methods finetune. This paper demonstrates the evaluation results for one such PEFT method Low Rank Adaptation (LoRA), for Clinical Dialogue Summarization. The evaluation results show that LoRA works at par with end-to-end finetuning for a large language model. The paper presents the evaluations done for solving both the Subtask A and B from ImageCLEFmedical {https://www.imageclef.org/2023/medical}
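The LoRA idea referenced above keeps the pretrained weight W frozen and learns a low-rank update, so a linear layer computes y = Wx + (α/r)·B(Ax) with only A and B trainable. Below is a minimal PyTorch sketch; the rank, scaling, and initialization choices are illustrative and not the exact configuration used in the paper:

```python
import torch

class LoRALinear(torch.nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B (A x); only A and B receive gradients."""
    def __init__(self, base: torch.nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)            # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = torch.nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, r))  # update starts at zero
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

if __name__ == "__main__":
    layer = LoRALinear(torch.nn.Linear(768, 768), r=8)
    trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
    total = sum(p.numel() for p in layer.parameters())
    print(f"{trainable} of {total} parameters trainable ({100 * trainable / total:.1f}%)")
```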

Multiobjective Hydropower Reservoir Operation Optimization with Transformer-Based Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.05643
  • repo_url: None
  • paper_authors: Rixin Wu, Ran Wang, Jie Hao, Qiang Wu, Ping Wang
  • for: This study addresses the joint operation of multireservoir systems to balance power generation, ecological protection, and the residential water supply.
  • methods: A deep reinforcement learning approach incorporating a transformer framework is used: the encoder's multi-head attention mechanism extracts information from reservoirs and residential areas, and the decoder's multireservoir attention network generates suitable operational decisions.
  • results: Applied to Lake Mead and Lake Powell in the Colorado River Basin, the approach generates 10.11% more electricity, reduces the amended annual proportional flow deviation by 39.69%, and increases water supply revenue by 4.10% compared to a state-of-the-art method, making it an effective approach to multiobjective multireservoir operation.
    Abstract Due to shortage of water resources and increasing water demands, the joint operation of multireservoir systems for balancing power generation, ecological protection, and the residential water supply has become a critical issue in hydropower management. However, the numerous constraints and nonlinearity of multiple reservoirs make solving this problem time-consuming. To address this challenge, a deep reinforcement learning approach that incorporates a transformer framework is proposed. The multihead attention mechanism of the encoder effectively extracts information from reservoirs and residential areas, and the multireservoir attention network of the decoder generates suitable operational decisions. The proposed method is applied to Lake Mead and Lake Powell in the Colorado River Basin. The experimental results demonstrate that the transformer-based deep reinforcement learning approach can produce appropriate operational outcomes. Compared to a state-of-the-art method, the operation strategies produced by the proposed approach generate 10.11% more electricity, reduce the amended annual proportional flow deviation by 39.69%, and increase water supply revenue by 4.10%. Consequently, the proposed approach offers an effective method for the multiobjective operation of multihydropower reservoir systems.

On the Effectiveness of Speech Self-supervised Learning for Music

  • paper_url: http://arxiv.org/abs/2307.05161
  • repo_url: None
  • paper_authors: Yinghao Ma, Ruibin Yuan, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Ruibo Liu, Gus Xia, Roger Dannenberg, Yike Guo, Jie Fu
  • for: This work investigates the application of self-supervised learning (SSL) to music information retrieval (MIR) and evaluates the suitability of two speech-oriented models on music data.
  • methods: Two speech-related models, data2vec1.0 and Hubert, are adapted to music (referred to as music2vec and musicHuBERT); 12 SSL models with 95M parameters are trained under various pre-training configurations and systematically evaluated on 13 different MIR tasks.
  • results: Training on music data generally improves MIR task performance even with paradigms designed for speech, but existing speech-oriented designs are limited in modelling polyphonic information; based on the experiments, empirical suggestions for future musical SSL strategies and paradigms are given.
    Abstract Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) still remains largely unexplored. While previous SSL models pre-trained on music recordings may have been mostly closed-sourced, recent speech models such as wav2vec2.0 have shown promise in music modelling. Nevertheless, research exploring the effectiveness of applying speech SSL models to music recordings has been limited. We explore the music adaption of SSL with two distinctive speech-related models, data2vec1.0 and Hubert, and refer to them as music2vec and musicHuBERT, respectively. We train $12$ SSL models with 95M parameters under various pre-training configurations and systematically evaluate the MIR task performances with 13 different MIR tasks. Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we identify the limitations of such existing speech-oriented designs, especially in modelling polyphonic information. Based on the experimental results, empirical suggestions are also given for designing future musical SSL strategies and paradigms.

Fast Neural Network Inference on FPGAs for Triggering on Long-Lived Particles at Colliders

  • paper_url: http://arxiv.org/abs/2307.05152
  • repo_url: None
  • paper_authors: Andrea Coccaro, Francesco Armando Di Bello, Stefano Giagu, Lucrezia Rambelli, Nicola Stocchetti
  • for: 这项研究是为了开发一种高效的触发和获取系统,以便更好地处理高能物理实验中的碰撞事件。
  • methods: 这项研究使用了FPGA卡来实现不同的计算方法,以提高触发策略的效率。
  • results: 研究发现,在 FPGA 卡上加速后,机器学习算法的精度没有下降,推理时间满足二级触发系统的延迟要求;论文还将推理时间与基于 CPU 和 GPU 的硬件配置进行了对比。
    Abstract Experimental particle physics demands a sophisticated trigger and acquisition system capable to efficiently retain the collisions of interest for further investigation. Heterogeneous computing with the employment of FPGA cards may emerge as a trending technology for the triggering strategy of the upcoming high-luminosity program of the Large Hadron Collider at CERN. In this context, we present two machine-learning algorithms for selecting events where neutral long-lived particles decay within the detector volume studying their accuracy and inference time when accelerated on commercially available Xilinx FPGA accelerator cards. The inference time is also confronted with a CPU- and GPU-based hardware setup. The proposed new algorithms are proven efficient for the considered benchmark physics scenario and their accuracy is found to not degrade when accelerated on the FPGA cards. The results indicate that all tested architectures fit within the latency requirements of a second-level trigger farm and that exploiting accelerator technologies for real-time processing of particle-physics collisions is a promising research field that deserves additional investigations, in particular with machine-learning models with a large number of trainable parameters.
    摘要 实验粒子物理需要一套先进的触发与数据获取系统,以高效保留感兴趣的碰撞事件供后续研究。采用 FPGA 卡的异构计算有望成为 CERN 大型强子对撞机高亮度升级计划中触发策略的新兴技术。在此背景下,我们提出了两种机器学习算法,用于筛选中性长寿命粒子在探测器体积内衰变的事件,并研究它们在商用 Xilinx FPGA 加速卡上加速时的精度与推理时间,同时将推理时间与基于 CPU 和 GPU 的硬件配置进行对比。结果表明,所提出的新算法对所考虑的基准物理场景是有效的,且在 FPGA 卡上加速后精度没有下降。所有测试的架构都满足二级触发系统的延迟要求,这说明利用加速器技术对粒子物理碰撞进行实时处理是一个值得进一步研究的方向,特别是针对可训练参数量较大的机器学习模型。

ConFL: Constraint-guided Fuzzing for Machine Learning Framework

  • paper_url: http://arxiv.org/abs/2307.05642
  • repo_url: None
  • paper_authors: Zhao Liu, Quanchen Zou, Tian Yu, Xuan Wang, Guozhu Meng, Kai Chen, Deyue Zhang
  • for: 本研究旨在提出一种面向机器学习(ML)框架的约束引导模糊测试工具,以提高模糊测试的效率和有效性。
  • methods: 所提出的 ConFL 模糊测试器能够在无需任何先验知识的情况下自动从内核代码中提取约束;在约束的引导下,ConFL 可以生成能够通过校验的有效输入,从而探索内核代码中更深的路径。论文还设计了一种分组技术以进一步提升测试效率。
  • results: 论文主要在 Tensorflow 上评估了 ConFL 的性能,发现其能够覆盖更多的代码行并生成更多的有效输入,优于现有最先进的模糊测试器。更重要的是,ConFL 在不同版本的 Tensorflow 中发现了 84 个此前未知的漏洞(均获得新的 CVE 编号),其中 3 个为严重级别、13 个为高危级别;论文还将 ConFL 扩展到 PyTorch 和 Paddle,迄今已发现 7 个漏洞。
    Abstract As machine learning gains prominence in various sectors of society for automated decision-making, concerns have risen regarding potential vulnerabilities in machine learning (ML) frameworks. Nevertheless, testing these frameworks is a daunting task due to their intricate implementation. Previous research on fuzzing ML frameworks has struggled to effectively extract input constraints and generate valid inputs, leading to extended fuzzing durations for deep execution or revealing the target crash. In this paper, we propose ConFL, a constraint-guided fuzzer for ML frameworks. ConFL automatically extracting constraints from kernel codes without the need for any prior knowledge. Guided by the constraints, ConFL is able to generate valid inputs that can pass the verification and explore deeper paths of kernel codes. In addition, we design a grouping technique to boost the fuzzing efficiency. To demonstrate the effectiveness of ConFL, we evaluated its performance mainly on Tensorflow. We find that ConFL is able to cover more code lines, and generate more valid inputs than state-of-the-art (SOTA) fuzzers. More importantly, ConFL found 84 previously unknown vulnerabilities in different versions of Tensorflow, all of which were assigned with new CVE ids, of which 3 were critical-severity and 13 were high-severity. We also extended ConFL to test PyTorch and Paddle, 7 vulnerabilities are found to date.
    摘要 随着机器学习在社会各领域的自动化决策中得到推广,人们对机器学习(ML)框架中潜在漏洞的担忧也日益增加。然而,由于这些框架实现复杂,对其进行测试是一项艰巨的任务。此前针对 ML 框架的模糊测试研究难以有效提取输入约束并生成有效输入,导致需要很长的模糊测试时间才能深入执行或触发目标崩溃。在这篇论文中,我们提出了 ConFL,一种面向 ML 框架的约束引导模糊测试器。ConFL 可以在无需任何先验知识的情况下自动从内核代码中提取约束;在约束的引导下,ConFL 能够生成可通过校验的有效输入,并探索内核代码中更深的路径。此外,我们设计了一种分组技术以提高模糊测试效率。为证明 ConFL 的有效性,我们主要在 Tensorflow 上进行了评估,发现 ConFL 能够覆盖更多的代码行,并比现有最先进(SOTA)的模糊测试器生成更多的有效输入。更重要的是,ConFL 在不同版本的 Tensorflow 中发现了 84 个此前未知的漏洞,全部获得了新的 CVE 编号,其中 3 个为严重级别、13 个为高危级别。我们还将 ConFL 扩展到测试 PyTorch 和 Paddle,迄今已发现 7 个漏洞。

Learning Active Subspaces and Discovering Important Features with Gaussian Radial Basis Functions Neural Networks

  • paper_url: http://arxiv.org/abs/2307.05639
  • repo_url: https://github.com/dannyzx/grbf-nns
  • paper_authors: Danny D’Agostino, Ilija Ilievski, Christine Annette Shoemaker
  • for: 提出了一种可以同时实现强预测性和人类可读性的机器学习模型,以解决机器学习研究中最大化预测性和人类可读性之间的矛盾。
  • methods: 提出了对径向基函数神经网络模型的一种修改,在其高斯核中引入可学习的精度矩阵。训练完成后,通过分析精度矩阵的谱,可以提取有价值的信息,包括模型敏感度最大的方向(活跃子空间)以及输入变量的重要性排名。
  • results: 通过对回归、分类和特征选择任务进行数值实验,与其他机器学习模型和深度学习基于嵌入特征选择技术进行比较,结果显示该模型不仅在预测性方面与竞争者具有优异性,还提供了可读性高的结果,可能为实际应用中决策过程提供帮助。
    Abstract Providing a model that achieves a strong predictive performance and at the same time is interpretable by humans is one of the most difficult challenges in machine learning research due to the conflicting nature of these two objectives. To address this challenge, we propose a modification of the Radial Basis Function Neural Network model by equipping its Gaussian kernel with a learnable precision matrix. We show that precious information is contained in the spectrum of the precision matrix that can be extracted once the training of the model is completed. In particular, the eigenvectors explain the directions of maximum sensitivity of the model revealing the active subspace and suggesting potential applications for supervised dimensionality reduction. At the same time, the eigenvectors highlight the relationship in terms of absolute variation between the input and the latent variables, thereby allowing us to extract a ranking of the input variables based on their importance to the prediction task enhancing the model interpretability. We conducted numerical experiments for regression, classification, and feature selection tasks, comparing our model against popular machine learning models and the state-of-the-art deep learning-based embedding feature selection techniques. Our results demonstrate that the proposed model does not only yield an attractive prediction performance with respect to the competitors but also provides meaningful and interpretable results that potentially could assist the decision-making process in real-world applications. A PyTorch implementation of the model is available on GitHub at the following link. https://github.com/dannyzx/GRBF-NNs
    摘要 提供一个既具有强预测性能、又能被人类理解的模型,是机器学习研究中最困难的挑战之一,因为这两个目标本质上相互冲突。为解决这一挑战,我们提出对径向基函数神经网络模型进行修改,在其高斯核中引入可学习的精度矩阵。我们发现,训练完成后,精度矩阵的谱中蕴含着宝贵的信息。特别是,其特征向量揭示了模型敏感度最大的方向,展示了活跃子空间,并为有监督降维提供了潜在应用;同时,特征向量还刻画了输入变量与潜变量之间的绝对变化关系,从而给出输入变量对预测任务重要性的排名,增强了模型的可解释性。我们在回归、分类和特征选择任务上进行了数值实验,与流行的机器学习模型以及最先进的基于深度学习的嵌入式特征选择技术进行比较。结果表明,所提模型不仅相对于竞争方法具有有吸引力的预测性能,还能给出有意义且可解释的结果,有望辅助实际应用中的决策过程。模型的 PyTorch 实现可在 GitHub 上获取:https://github.com/dannyzx/GRBF-NNs。
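    As a small illustration of how the learned precision matrix can be mined after training, the sketch below (NumPy only, with a random symmetric matrix standing in for the trained precision matrix, which is an assumption for illustration) extracts an active subspace from the eigenvectors and a feature-importance ranking from how strongly each input loads on the dominant directions:
```python
import numpy as np

# Hypothetical stand-in for the precision matrix learned inside the Gaussian
# kernel of the RBF network (symmetric positive semi-definite, d x d).
rng = np.random.default_rng(0)
B = rng.normal(size=(5, 5))
precision = B @ B.T  # placeholder; in practice, read it from the trained model

# Eigen-decomposition of the symmetric precision matrix.
eigvals, eigvecs = np.linalg.eigh(precision)          # ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Directions of maximum sensitivity span the "active subspace": keep the
# eigenvectors that carry most of the spectrum's mass.
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.95) + 1)
active_subspace = eigvecs[:, :k]                      # d x k projection basis

# A simple importance score per input variable: how strongly each input
# coordinate loads on the dominant eigenvectors.
importance = eigvecs[:, :k] ** 2 @ eigvals[:k]
ranking = np.argsort(importance)[::-1]

print("active subspace dimension:", k)
print("feature ranking (most to least important):", ranking)
```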

A Comprehensive Survey of Deep Transfer Learning for Anomaly Detection in Industrial Time Series: Methods, Applications, and Directions

  • paper_url: http://arxiv.org/abs/2307.05638
  • repo_url: None
  • paper_authors: Peng Yan, Ahmed Abdulkadir, Matthias Rosenthal, Gerrit A. Schatte, Benjamin F. Grewe, Thilo Stadelmann
  • for: 这篇论文关注于工业过程监测和异常检测中使用深度转移学习,以提高效率和质量。
  • methods: 论文检查和分类深度转移学习方法的问题设置,主要在工业上下文中应用。
  • results: 论文描述了深度转移学习在工业上下文中的挑战和局限性,并提供了实践的解决方案设计和实施方法,包括具体的、可行的建议。
    Abstract Automating the monitoring of industrial processes has the potential to enhance efficiency and optimize quality by promptly detecting abnormal events and thus facilitating timely interventions. Deep learning, with its capacity to discern non-trivial patterns within large datasets, plays a pivotal role in this process. Standard deep learning methods are suitable to solve a specific task given a specific type of data. During training, the algorithms demand large volumes of labeled training data. However, due to the dynamic nature of processes and the environment, it is impractical to acquire the needed data for standard deep learning training for every slightly different case anew. Deep transfer learning offers a solution to this problem. By leveraging knowledge from related tasks and accounting for variations in data distributions, this learning framework solves new tasks even with little or no additional labeled data. The approach bypasses the need to retrain a model from scratch for every new setup and dramatically reduces the labeled data requirement. This survey provides an in-depth review of deep transfer learning, examining the problem settings of transfer learning and classifying the prevailing deep transfer learning methods. Moreover, we delve into applying deep transfer learning in the context of a broad spectrum of time series anomaly detection tasks prevalent in primary industrial domains, e.g., manufacturing process monitoring, predictive maintenance, energy management, and infrastructure facility monitoring. We conclude this survey by underlining the challenges and limitations of deep transfer learning in industrial contexts. We also provide practical directions for solution design and implementation for these tasks, leading to specific, actionable suggestions.
    摘要 对工业过程进行自动化监测,有望通过及时检测异常事件并促成及时干预来提升效率、优化质量。深度学习凭借其从大规模数据中辨识非平凡模式的能力,在这一过程中扮演着关键角色。标准的深度学习方法适用于在特定数据类型上求解特定任务,但在训练时需要大量带标注的数据;而由于生产过程和环境的动态性,为每一个略有差异的情形重新采集标准深度学习训练所需的数据并不现实。深度迁移学习为这一问题提供了解决方案:通过利用相关任务中的知识并考虑数据分布的差异,这一学习框架即使在几乎没有额外标注数据的情况下也能求解新任务,从而避免了为每个新场景从头重新训练模型,并大幅降低了对标注数据的需求。本综述对深度迁移学习进行了深入回顾,梳理了迁移学习的问题设定,并对主流的深度迁移学习方法进行了分类。此外,我们还探讨了深度迁移学习在主要工业领域中广泛存在的各类时间序列异常检测任务中的应用,例如制造过程监测、预测性维护、能源管理和基础设施监测。最后,我们总结了深度迁移学习在工业场景下的挑战与局限,并为这些任务的方案设计与实现给出了具体、可操作的实践建议。

Deep Probabilistic Movement Primitives with a Bayesian Aggregator

  • paper_url: http://arxiv.org/abs/2307.05141
  • repo_url: None
  • paper_authors: Michael Przystupa, Faezeh Haghverd, Martin Jagersand, Samuele Tosatto
  • for: 本研究旨在提出一种深度运动基元模型,能够支持以往运动基元的各类操作,包括时间调制、混合、途经点约束和情境条件化。
  • methods: 本研究使用深度学习模型实现运动基元,并引入贝叶斯情境聚合器以实现更合理的情境条件化与混合。
  • results: 实验结果表明,与基线相比,我们的方法能够在更多样的输入条件下复现复杂的运动,同时保留线性运动基元所提供的各类操作。
    Abstract Movement primitives are trainable parametric models that reproduce robotic movements starting from a limited set of demonstrations. Previous works proposed simple linear models that exhibited high sample efficiency and generalization power by allowing temporal modulation of movements (reproducing movements faster or slower), blending (merging two movements into one), via-point conditioning (constraining a movement to meet some particular via-points) and context conditioning (generation of movements based on an observed variable, e.g., position of an object). Previous works have proposed neural network-based motor primitive models, having demonstrated their capacity to perform tasks with some forms of input conditioning or time-modulation representations. However, there has not been a single unified deep motor primitive's model proposed that is capable of all previous operations, limiting neural motor primitive's potential applications. This paper proposes a deep movement primitive architecture that encodes all the operations above and uses a Bayesian context aggregator that allows a more sound context conditioning and blending. Our results demonstrate our approach can scale to reproduce complex motions on a larger variety of input choices compared to baselines while maintaining operations of linear movement primitives provide.
    摘要 运动基元(movement primitives)是一类可训练的参数化模型,能够从少量示教出发复现机器人运动。以往的工作提出了简单的线性模型,它们具有较高的样本效率和泛化能力,支持运动的时间调制(更快或更慢地复现运动)、混合(将两段运动合并为一段)、途经点约束(要求运动经过特定途经点)以及情境条件化(根据观测变量,例如物体位置,生成运动)。也有工作提出了基于神经网络的运动基元模型,展示了其在某些输入条件化或时间调制表示下完成任务的能力;然而,目前尚没有一个统一的深度运动基元模型能够同时支持上述全部操作,这限制了神经运动基元的潜在应用。本文提出了一种深度运动基元架构,将上述所有操作编码在同一模型中,并使用贝叶斯情境聚合器实现更合理的情境条件化与混合。实验结果表明,与基线相比,我们的方法能够在更多样的输入条件下复现复杂的运动,同时保留线性运动基元所提供的各类操作。

Speech Diarization and ASR with GMM

  • paper_url: http://arxiv.org/abs/2307.05637
  • repo_url: None
  • paper_authors: Aayush Kumar Sharma, Vineet Bhavikatti, Amogh Nidawani, Dr. Siddappaji, Sanath P, Dr Geetishree Mishra
  • for: 这篇研究论文主要探讨了说话人分离(speech diarization)和自动语音识别(ASR)问题。说话人分离旨在借助 ASR 转写文本,基于各说话人独特的音频特征,将音频流中不同说话人的话语区分开来。
  • methods: 我们在说话人分离方法中使用高斯混合模型(GMM)来表示语音段,基于 GMM 参数计算簇间距离,并以距离阈值作为停止准则。
  • results: 我们的主要目标是开发一个可以最小化单词错误率(WER)度量的语音转文本模型。
    Abstract In this research paper, we delve into the topics of Speech Diarization and Automatic Speech Recognition (ASR). Speech diarization involves the separation of individual speakers within an audio stream. By employing the ASR transcript, the diarization process aims to segregate each speaker's utterances, grouping them based on their unique audio characteristics. On the other hand, Automatic Speech Recognition refers to the capability of a machine or program to identify and convert spoken words and phrases into a machine-readable format. In our speech diarization approach, we utilize the Gaussian Mixer Model (GMM) to represent speech segments. The inter-cluster distance is computed based on the GMM parameters, and the distance threshold serves as the stopping criterion. ASR entails the conversion of an unknown speech waveform into a corresponding written transcription. The speech signal is analyzed using synchronized algorithms, taking into account the pitch frequency. Our primary objective typically revolves around developing a model that minimizes the Word Error Rate (WER) metric during speech transcription.
    摘要 在这篇研究报告中,我们探讨了说话人分离(speech diarization)和自动语音识别(ASR)两个主题。说话人分离旨在将音频流中的不同说话人区分开来:借助 ASR 转写文本,分离过程根据各说话人独特的音频特征对其话语进行分组。自动语音识别则是指机器或程序识别口头话语并将其转换为机器可读格式的能力。在我们的说话人分离方法中,我们使用高斯混合模型(GMM)来表示语音段,基于 GMM 参数计算簇间距离,并以距离阈值作为停止条件。ASR 负责将未知的语音波形转换为相应的文字转写;语音信号通过同步算法进行分析,并考虑基音频率。我们的主要目标是构建一个在语音转写中尽量降低词错误率(WER)指标的模型。
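    A rough sketch of the diarization clustering step described above: each cluster of segments is represented by a small GMM, a distance computed from the GMM parameters drives agglomerative merging, and a distance threshold is the stopping criterion. The features are random placeholders and the specific distance (Euclidean distance between component-averaged means) is an assumption for illustration, not necessarily the paper's choice:
```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def fit_gmm(frames, n_components=2):
    """Fit a small diagonal-covariance GMM to one cluster's feature frames."""
    return GaussianMixture(n_components=n_components, covariance_type="diag",
                           random_state=0).fit(frames)

def gmm_distance(g1, g2):
    """Illustrative inter-cluster distance computed from GMM parameters
    (Euclidean distance between the component-averaged mean vectors)."""
    return float(np.linalg.norm(g1.means_.mean(axis=0) - g2.means_.mean(axis=0)))

# Placeholder MFCC-like features for 4 speech segments (frames x dims);
# segments 0/2 and 1/3 mimic two different speakers.
segments = [rng.normal(loc=i % 2, size=(200, 13)) for i in range(4)]
clusters = [[s] for s in segments]          # start with one cluster per segment

THRESHOLD = 2.0                             # stopping criterion on the distance

# Agglomerative merging: join the two closest clusters until the smallest
# inter-cluster distance exceeds the threshold.
while len(clusters) > 1:
    gmms = [fit_gmm(np.vstack(c)) for c in clusters]
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    i, j = min(pairs, key=lambda p: gmm_distance(gmms[p[0]], gmms[p[1]]))
    if gmm_distance(gmms[i], gmms[j]) > THRESHOLD:
        break
    clusters[i].extend(clusters[j])
    del clusters[j]

print("estimated number of speakers:", len(clusters))
```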

TIAM – A Metric for Evaluating Alignment in Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2307.05134
  • repo_url: https://github.com/grimalpaul/tiam
  • paper_authors: Paul Grimal, Hervé Le Borgne, Olivier Ferret, Julien Tourille
  • for: 本研究旨在评估文本到图像(T2I)模型生成图像的质量,特别是考虑提示中的重要内容是否正确反映在生成图像中。
  • methods: 我们提出了一种基于提示模板的新评价指标,可以更好地衡量生成图像与提示中的内容的对应关系,包括提示中的对象类型、数量和颜色等方面。
  • results: 我们通过对多种最新的T2I模型进行研究发现,图像质量受到种子图像的随机变化的影响很大,同时提示中的概念数量、顺序和颜色属性也会影响图像质量。此外,我们还发现了一些特定的种子图像可以生成更高质量的图像,开启了新的研究方向。
    Abstract The progress in the generation of synthetic images has made it crucial to assess their quality. While several metrics have been proposed to assess the rendering of images, it is crucial for Text-to-Image (T2I) models, which generate images based on a prompt, to consider additional aspects such as to which extent the generated image matches the important content of the prompt. Moreover, although the generated images usually result from a random starting point, the influence of this one is generally not considered. In this article, we propose a new metric based on prompt templates to study the alignment between the content specified in the prompt and the corresponding generated images. It allows us to better characterize the alignment in terms of the type of the specified objects, their number, and their color. We conducted a study on several recent T2I models about various aspects. An additional interesting result we obtained with our approach is that image quality can vary drastically depending on the latent noise used as a seed for the images. We also quantify the influence of the number of concepts in the prompt, their order as well as their (color) attributes. Finally, our method allows us to identify some latent seeds that produce better images than others, opening novel directions of research on this understudied topic.
    摘要 “随着生成Synthetic图像的进步,它成为了评估图像质量的关键。多个指标有被提议来评估图像的渲染,但是Text-to-Image(T2I)模型,它们根据提示生成图像,需要考虑更多的方面,例如提示中重要内容和生成图像之间的对齐度。此外,生成图像通常是由Random Starting Point开始的,但是这个影响通常不被考虑。在这篇文章中,我们提出了一个基于提示模板的新指标,以研究提示中的内容和生成图像之间的对齐度。这允许我们更好地描述对齐度的类型、物件的数量和颜色等方面。我们对多个现代T2I模型进行了一系列研究,获得了一些有趣的结果。例如,图像质量可以很大程度上受到latent noise的影响,并且我们可以量化提示中的概念数量、顺序以及颜色属性的影响。最后,我们的方法允许我们识别一些latent seed的特定对图像质量的影响,开启了一些未经研究的方向。”
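    The core of the metric, prompt templates whose required objects are checked in images generated across several seeds, can be sketched as below. Both `generate` and `detect_labels` are hypothetical stubs standing in for a T2I model and an object detector, and the aggregation shown is a simplified success rate rather than the official TIAM formula (see the linked repository for that):
```python
from itertools import product

# Hypothetical detector stub: in practice this would run an object detector or
# segmentation model on the generated image and return the labels it finds.
def detect_labels(image):
    return image["detected"]  # placeholder

# Hypothetical generator stub standing in for a text-to-image model.
def generate(prompt_text, seed):
    return {"detected": {"cat", "dog"} if seed != 1 else {"cat"}}  # placeholder

# Prompt templates with slots for objects and colors.
TEMPLATE = "a photo of a {color1} {obj1} and a {color2} {obj2}"
objects, colors = ["cat", "dog"], ["red", "blue"]

prompts = []
for (c1, o1), (c2, o2) in product(product(colors, objects), repeat=2):
    if o1 != o2:
        prompts.append({"text": TEMPLATE.format(color1=c1, obj1=o1,
                                                color2=c2, obj2=o2),
                        "required": {o1, o2}})

# Simplified alignment score: fraction of (prompt, seed) pairs in which every
# object named in the prompt is detected in the generated image.
seeds = [0, 1, 2]
successes, total = 0, 0
per_seed = {s: [] for s in seeds}
for p in prompts:
    for s in seeds:
        ok = p["required"] <= detect_labels(generate(p["text"], s))
        successes += ok
        total += 1
        per_seed[s].append(ok)

print("overall alignment score:", successes / total)
print("score per seed:", {s: sum(v) / len(v) for s, v in per_seed.items()})
```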

On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis

  • paper_url: http://arxiv.org/abs/2307.05132
  • repo_url: None
  • paper_authors: Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely
  • For: The paper is focused on exploring the use of self-supervised learning (SSL) in spontaneous text-to-speech (TTS) and predicting the mean opinion scores (MOS) of synthesized speech.* Methods: The paper uses six different SSL models and three layers within each SSL model to evaluate their effectiveness in spontaneous TTS. The authors also extend an existing SSL-based MOS prediction framework to predict the quality of synthesized spontaneous speech.* Results: The paper presents comprehensive experimental results on the use of SSL in spontaneous TTS and MOS prediction, including the performance of different SSL models and layers in predicting the MOS of synthesized speech. The results show that certain SSL models and layers perform better than others in spontaneous TTS and MOS prediction.
    Abstract Self-supervised learning (SSL) speech representations learned from large amounts of diverse, mixed-quality speech data without transcriptions are gaining ground in many speech technology applications. Prior work has shown that SSL is an effective intermediate representation in two-stage text-to-speech (TTS) for both read and spontaneous speech. However, it is still not clear which SSL and which layer from each SSL model is most suited for spontaneous TTS. We address this shortcoming by extending the scope of comparison for SSL in spontaneous TTS to 6 different SSLs and 3 layers within each SSL. Furthermore, SSL has also shown potential in predicting the mean opinion scores (MOS) of synthesized speech, but this has only been done in read-speech MOS prediction. We extend an SSL-based MOS prediction framework previously developed for scoring read speech synthesis and evaluate its performance on synthesized spontaneous speech. All experiments are conducted twice on two different spontaneous corpora in order to find generalizable trends. Overall, we present comprehensive experimental results on the use of SSL in spontaneous TTS and MOS prediction to further quantify and understand how SSL can be used in spontaneous TTS. Audios samples: https://www.speech.kth.se/tts-demos/sp_ssl_tts
    摘要 自我指导学习(SSL)所学习的语音表示法,从大量多样性的杂质语音数据中学习,无需转录。先前的研究表明,SSL 是两stage text-to-speech(TTS)中的有效中间表示,包括阅读和自发语音。然而,尚未确定哪些 SSL 和哪层在每个 SSL 模型最适合自发 TTS。我们解决这个缺陷,通过对 SSL 在自发 TTS 中进行扩展比较,包括 6 种 SSL 和每个 SSL 模型中的 3 层。此外,SSL 还显示了在synthesized speech的 mean opinion scores(MOS)预测中的潜力,但只有在阅读 speech MOS 预测中进行过。我们扩展了以前为阅读 speech synthesis 预测 MOS 的 SSL-based 框架,并对 synthesized spontaneous speech 进行评估。所有实验都在两个不同的自发语音 corpus 上进行了两次,以找到通用的趋势。总的来说,我们提供了 SSL 在自发 TTS 和 MOS 预测中的详细实验结果,以更好地量化和理解 SSL 在自发 TTS 中的使用。响应样本:https://www.speech.kth.se/tts-demos/sp_ssl_tts

Enhancing Continuous Time Series Modelling with a Latent ODE-LSTM Approach

  • paper_url: http://arxiv.org/abs/2307.05126
  • repo_url: None
  • paper_authors: C. Coelho, M. Fernanda P. Costa, L. L. Ferrás
  • for: 这个论文主要针对连续时间序列(CTS)的模型化,尤其是在 irregular sampling rate 和高频 sampling rate 下。
  • methods: 该论文提出了一种基于 ODE-RNN 和 Latent ODE 模型的新模型 - Latent ODE-LSTM,以解决 CTTS 模型化中的vanishing和exploding gradients问题。
  • results: 数值实验表明,新提出的 Latent ODE-LSTM 模型在模型 CTTS WITH regular和irregular sampling rates 时,表现更好于 Latent ODE-RNNs,并能够避免在训练过程中的vanishing和exploding gradients问题。
    Abstract Due to their dynamic properties such as irregular sampling rate and high-frequency sampling, Continuous Time Series (CTS) are found in many applications. Since CTS with irregular sampling rate are difficult to model with standard Recurrent Neural Networks (RNNs), RNNs have been generalised to have continuous-time hidden dynamics defined by a Neural Ordinary Differential Equation (Neural ODE), leading to the ODE-RNN model. Another approach that provides a better modelling is that of the Latent ODE model, which constructs a continuous-time model where a latent state is defined at all times. The Latent ODE model uses a standard RNN as the encoder and a Neural ODE as the decoder. However, since the RNN encoder leads to difficulties with missing data and ill-defined latent variables, a Latent ODE-RNN model has recently been proposed that uses a ODE-RNN model as the encoder instead. Both the Latent ODE and Latent ODE-RNN models are difficult to train due to the vanishing and exploding gradients problem. To overcome this problem, the main contribution of this paper is to propose and illustrate a new model based on a new Latent ODE using an ODE-LSTM (Long Short-Term Memory) network as an encoder -- the Latent ODE-LSTM model. To limit the growth of the gradients the Norm Gradient Clipping strategy was embedded on the Latent ODE-LSTM model. The performance evaluation of the new Latent ODE-LSTM (with and without Norm Gradient Clipping) for modelling CTS with regular and irregular sampling rates is then demonstrated. Numerical experiments show that the new Latent ODE-LSTM performs better than Latent ODE-RNNs and can avoid the vanishing and exploding gradients during training.
    摘要 由于其不规则采样率和高频采样等动态特性,连续时间序列(CTS)出现在许多应用中。由于具有不规则采样率的 CTS 难以用标准循环神经网络(RNN)建模,RNN 被推广为由神经常微分方程(Neural ODE)定义连续时间隐状态动力学,从而产生了 ODE-RNN 模型。另一种建模效果更好的方法是 Latent ODE 模型,它构建了一个在所有时刻都定义潜状态的连续时间模型,使用标准 RNN 作为编码器、Neural ODE 作为解码器。然而,由于 RNN 编码器在处理缺失数据时存在困难且潜变量定义不良,近期又提出了使用 ODE-RNN 作为编码器的 Latent ODE-RNN 模型。Latent ODE 与 Latent ODE-RNN 模型都难以训练,主要原因是梯度消失与梯度爆炸问题。为克服这一问题,本文的主要贡献是提出并演示一种使用 ODE-LSTM(长短期记忆)网络作为编码器的新 Latent ODE 模型,即 Latent ODE-LSTM 模型;为限制梯度的增长,模型中还嵌入了范数梯度裁剪(Norm Gradient Clipping)策略。随后,我们在具有规则与不规则采样率的 CTS 上评估了新的 Latent ODE-LSTM(含与不含范数梯度裁剪)的建模性能。数值实验表明,新的 Latent ODE-LSTM 优于 Latent ODE-RNN,并能在训练过程中避免梯度消失与梯度爆炸。
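    A minimal sketch of the Norm Gradient Clipping strategy inside a PyTorch training step; a plain LSTM stands in for the ODE-LSTM encoder here, since the full Latent ODE-LSTM architecture is considerably more involved than shown:
```python
import torch
import torch.nn as nn

# Stand-in sequence model; the paper's encoder is an ODE-LSTM, not a plain LSTM.
class SeqModel(nn.Module):
    def __init__(self, in_dim=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])

model = SeqModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(16, 50, 3)   # batch of toy time-series inputs
y = torch.randn(16, 1)

for step in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Norm gradient clipping: rescale gradients so their global L2 norm
    # never exceeds max_norm, limiting exploding gradients during training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```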

Transaction Fraud Detection via Spatial-Temporal-Aware Graph Transformer

  • paper_url: http://arxiv.org/abs/2307.05121
  • repo_url: None
  • paper_authors: Yue Tian, Guanjun Liu
  • for: 预防金融 transactions 诈骗
  • methods: 使用 Graph Neural Networks (GNNs) 和 transformer module 模型,capture temporal dependencies 和 learn local and global information
  • results: 在两个金融数据集上比较于常见 GNN 模型和 GNN-based fraud detectors 表现出色,有效地检测 transaction fraud
    Abstract How to obtain informative representations of transactions and then perform the identification of fraudulent transactions is a crucial part of ensuring financial security. Recent studies apply Graph Neural Networks (GNNs) to the transaction fraud detection problem. Nevertheless, they encounter challenges in effectively learning spatial-temporal information due to structural limitations. Moreover, few prior GNN-based detectors have recognized the significance of incorporating global information, which encompasses similar behavioral patterns and offers valuable insights for discriminative representation learning. Therefore, we propose a novel heterogeneous graph neural network called Spatial-Temporal-Aware Graph Transformer (STA-GT) for transaction fraud detection problems. Specifically, we design a temporal encoding strategy to capture temporal dependencies and incorporate it into the graph neural network framework, enhancing spatial-temporal information modeling and improving expressive ability. Furthermore, we introduce a transformer module to learn local and global information. Pairwise node-node interactions overcome the limitation of the GNN structure and build up the interactions with the target node and long-distance ones. Experimental results on two financial datasets compared to general GNN models and GNN-based fraud detectors demonstrate that our proposed method STA-GT is effective on the transaction fraud detection task.
    摘要 如何获得交易的有效表示并据此识别欺诈交易,是保障金融安全的关键环节。近期研究将图神经网络(GNN)应用于交易欺诈检测问题,但受结构限制,它们难以有效学习时空信息。此外,此前基于 GNN 的检测器很少重视引入全局信息,而全局信息涵盖了相似的行为模式,能为判别性表示学习提供有价值的线索。因此,我们提出了一种新的异质图神经网络模型,即时空感知图 Transformer(STA-GT),用于交易欺诈检测。具体而言,我们设计了一种时间编码策略来捕捉时间依赖关系,并将其融入图神经网络框架,从而增强时空信息建模能力和表达能力。此外,我们引入 Transformer 模块来学习局部与全局信息;节点间的成对交互克服了 GNN 结构的限制,建立了目标节点与远距离节点之间的联系。在两个金融数据集上的实验结果表明,与通用 GNN 模型及基于 GNN 的欺诈检测器相比,我们提出的 STA-GT 方法在交易欺诈检测任务上更加有效。

$\ell_p$-Regression in the Arbitrary Partition Model of Communication

  • paper_url: http://arxiv.org/abs/2307.05117
  • repo_url: None
  • paper_authors: Yi Li, Honghao Lin, David P. Woodruff
  • for: 这个论文研究了分布式 $\ell_p$-回归问题在协调器模型中的随机通信复杂度, $p\in (0,2]$。
  • methods: 作者使用了随机化通信复杂度来研究这个问题,并提供了更好的上界和下界。
  • results: 作者得到了显著改进的界:在 $p = 2$ 情况下,给出了首个最优界 $\tilde{\Theta}(sd^2 + sd/\epsilon)$ 比特;在 $p \in (1,2)$ 情况下,给出了 $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ 的上界,其中 $d$ 是数据维度。
    Abstract We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p\in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i\in\{-M, -M+1, \ldots, M\}^{n\times d}$ and $b^i\in\{-M, -M+1, \ldots, M\}^n$ and the coordinator would like to find a $(1+\epsilon)$-approximate solution to $\min_{x\in\mathbb{R}^n} \|(\sum_i A^i)x - (\sum_i b^i)\|_p$. Here $M \leq \mathrm{poly}(nd)$ for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$,we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound. Notably, for $d$ sufficiently large, our leading order term only depends linearly on $1/\epsilon$ rather than quadratically. We also show communication lower bounds of $\Omega(sd^2 + sd/\epsilon^2)$ for $p\in (0,1]$ and $\Omega(sd^2 + sd/\epsilon)$ for $p\in (1,2]$. Our bounds considerably improve previous bounds due to (Woodruff et al. COLT, 2013) and (Vempala et al., SODA, 2020).
    摘要 我们考虑协调者模型下分布式 $\ell_p$ 回归问题的随机通信复杂度,其中 $p\in (0,2]$。在该问题中,有一个协调者和 $s$ 台服务器:第 $i$ 台服务器收到 $A^i\in\{-M, -M+1, \ldots, M\}^{n\times d}$ 和 $b^i\in\{-M, -M+1, \ldots, M\}^n$,协调者希望求得 $\min_{x\in\mathbb{R}^n} \|(\sum_i A^i)x - (\sum_i b^i)\|_p$ 的 $(1+\epsilon)$ 近似解;为方便起见,假设 $M \leq \mathrm{poly}(nd)$。这种数据以加和方式分布在各服务器上的模型通常被称为任意划分模型(arbitrary partition model)。我们对该问题给出了显著改进的界。对 $p = 2$(即最小二乘回归),我们给出了首个最优界 $\tilde{\Theta}(sd^2 + sd/\epsilon)$ 比特;对 $p \in (1,2)$,我们得到 $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ 的上界,值得注意的是,当 $d$ 足够大时,主导项对 $1/\epsilon$ 的依赖仅为线性而非二次。我们还证明了通信下界:对 $p\in (0,1]$ 为 $\Omega(sd^2 + sd/\epsilon^2)$,对 $p\in (1,2]$ 为 $\Omega(sd^2 + sd/\epsilon)$。这些结果显著改进了 (Woodruff et al., COLT 2013) 与 (Vempala et al., SODA 2020) 此前给出的界。

Conformalization of Sparse Generalized Linear Models

  • paper_url: http://arxiv.org/abs/2307.05109
  • repo_url: https://github.com/etashguha/sparse_conformal
  • paper_authors: Etash Kumar Guha, Eugene Ndiaye, Xiaoming Huo
  • for: 这篇论文关注的是如何使用简单的线性模型和数字继续技术来实现可靠的预测集,以便在大数据量时进行高效的预测。
  • methods: 这篇论文使用的方法包括使用唯一的一些变量进行预测,并使用数字继续技术来精确地计算预测集。
  • results: 论文的结果表明,使用这种方法可以高效地生成预测集,并且可以在不同的数据集上进行高效的预测。
    Abstract Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size by merely assuming that the joint distribution of the data is permutation invariant. Although attractive, computing such a set is computationally infeasible in most regression problems. Indeed, in these cases, the unknown variable $y_{n+1}$ can take an infinite number of possible candidate values, and generating conformal sets requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. Therefore, it is sufficient to enumerate and refit the model only at the change points of the set of active features and smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show how our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance using synthetic and real data examples.
    摘要 给定一列可观测变量 $\{(x_1, y_1), \ldots, (x_n, y_n)\}$,conformal 预测方法仅假设数据的联合分布具有置换不变性,便能在任意有限样本量下为给定 $x_{n+1}$ 的 $y_{n+1}$ 估计出有效的置信集。尽管这一性质颇具吸引力,但在大多数回归问题中计算这样的集合在计算上并不可行:未知变量 $y_{n+1}$ 可能取无穷多个候选值,而生成 conformal 集需要为每个候选值重新训练一次预测模型。本文关注仅使用部分变量进行预测的稀疏线性模型,并利用数值延拓技术高效地逼近解路径。我们利用的关键性质是:被选中的变量集合在输入数据受到微小扰动时保持不变,因此只需在活跃特征集合发生变化的转折点处枚举并重新拟合模型,其余部分的解则通过预测-校正(Predictor-Corrector)机制平滑插值。我们展示了所提出的路径跟踪算法能够精确地逼近 conformal 预测集,并通过合成数据与真实数据示例说明了其性能。
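    The paper computes full conformal sets exactly along the sparse model's solution path via numerical continuation; as a simpler point of reference, a split-conformal interval around a lasso fit can be sketched in a few lines (this baseline is not the paper's homotopy algorithm):
```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))
beta = np.zeros(d)
beta[:3] = [2.0, -1.0, 0.5]                 # sparse ground-truth coefficients
y = X @ beta + rng.normal(scale=0.5, size=n)

# Split the data: fit the sparse model on one half, calibrate on the other.
X_fit, y_fit = X[:100], y[:100]
X_cal, y_cal = X[100:], y[100:]
model = Lasso(alpha=0.1).fit(X_fit, y_fit)

# Calibration residuals and the conformal quantile at miscoverage level alpha.
alpha = 0.1
residuals = np.abs(y_cal - model.predict(X_cal))
k = int(np.ceil((len(residuals) + 1) * (1 - alpha)))
q = np.sort(residuals)[min(k, len(residuals)) - 1]

# Prediction interval for a new input, valid at level 1 - alpha.
x_new = rng.normal(size=(1, d))
center = model.predict(x_new)[0]
print(f"90% conformal interval: [{center - q:.3f}, {center + q:.3f}]")
```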

Fundamental limits of overparametrized shallow neural networks for supervised learning

  • paper_url: http://arxiv.org/abs/2307.05635
  • repo_url: None
  • paper_authors: Francesco Camilli, Daria Tieplova, Jean Barbier
  • for: 这 paper 是关于两层神经网络在有限数据训练下的信息理论分析,以确定神经网络训练时的基本性能限制。
  • methods: 这 paper 使用了 teacher 网络生成的输入输出对来训练一个二层神经网络,并利用信息理论 bound 来描述神经网络的性能。
  • results: 这 paper 得到了一些 bounds,表明在有限数据训练下,神经网络的性能受到输入维度、隐藏层Unit 和训练样本数量的限制。这些 bounds 是对任何神经网络训练过程的基本性能限制,并且覆盖了所有网络参数的训练情况。
    Abstract We carry out an information-theoretical analysis of a two-layer neural network trained from input-output pairs generated by a teacher network with matching architecture, in overparametrized regimes. Our results come in the form of bounds relating i) the mutual information between training data and network weights, or ii) the Bayes-optimal generalization error, to the same quantities but for a simpler (generalized) linear model for which explicit expressions are rigorously known. Our bounds, which are expressed in terms of the number of training samples, input dimension and number of hidden units, thus yield fundamental performance limits for any neural network (and actually any learning procedure) trained from limited data generated according to our two-layer teacher neural network model. The proof relies on rigorous tools from spin glasses and is guided by ``Gaussian equivalence principles'' lying at the core of numerous recent analyses of neural networks. With respect to the existing literature, which is either non-rigorous or restricted to the case of the learning of the readout weights only, our results are information-theoretic (i.e. are not specific to any learning algorithm) and, importantly, cover a setting where all the network parameters are trained.
    摘要 我们在过参数化的情形下,对一个由结构相同的教师网络生成的输入-输出对训练而成的两层神经网络进行了信息论分析。我们的结果以界的形式给出,将 (i) 训练数据与网络权重之间的互信息,或 (ii) 贝叶斯最优泛化误差,与一个更简单的(广义)线性模型中的相同量联系起来,而后者已有严格的显式表达式。这些界以训练样本数、输入维度和隐藏单元数表示,从而给出了任何神经网络(乃至任何学习过程)在由该两层教师网络模型生成的有限数据上训练时的基本性能极限。证明依赖于来自自旋玻璃理论的严格工具,并以近期大量神经网络分析所依赖的"高斯等价原理"为指导。与现有文献(或者缺乏严格性,或者仅限于只学习读出层权重的情形)相比,我们的结果是信息论的(即不针对任何特定学习算法),并且重要的是,覆盖了所有网络参数都参与训练的设定。

A Deep Dive into Perturbations as Evaluation Technique for Time Series XAI

  • paper_url: http://arxiv.org/abs/2307.05104
  • repo_url: https://github.com/visual-xai-for-time-series/time-series-xai-perturbation-analysis
  • paper_authors: Udo Schlegel, Daniel A. Keim
  • for: 本研究旨在评估时序数据XAI技术中的质量,提供可靠和可解释的机器学习模型。
  • methods: 本研究使用扰动分析方法评估XAI技术中的质量,通过修改输入数据来评估XAI方法生成的权重。
  • results: 研究结果表明,扰动分析方法可以有效评估XAI技术中的质量,并提供时序数据XAI技术的强点和局限性。这种方法可以帮助选择适合时序数据的XAI方法,以及开发更可靠和可解释的机器学习模型。
    Abstract Explainable Artificial Intelligence (XAI) has gained significant attention recently as the demand for transparency and interpretability of machine learning models has increased. In particular, XAI for time series data has become increasingly important in finance, healthcare, and climate science. However, evaluating the quality of explanations, such as attributions provided by XAI techniques, remains challenging. This paper provides an in-depth analysis of using perturbations to evaluate attributions extracted from time series models. A perturbation analysis involves systematically modifying the input data and evaluating the impact on the attributions generated by the XAI method. We apply this approach to several state-of-the-art XAI techniques and evaluate their performance on three time series classification datasets. Our results demonstrate that the perturbation analysis approach can effectively evaluate the quality of attributions and provide insights into the strengths and limitations of XAI techniques. Such an approach can guide the selection of XAI methods for time series data, e.g., focusing on return time rather than precision, and facilitate the development of more reliable and interpretable machine learning models for time series analysis.
    摘要 最近,随着对机器学习模型透明度和可解释性需求的增加,可解释人工智能(XAI)受到了广泛关注,尤其是在金融、医疗和气候科学等领域,面向时间序列数据的 XAI 变得日益重要。然而,评估 XAI 技术所提供解释(如归因)的质量仍然是一个挑战。这篇论文对使用扰动来评估时间序列模型归因的做法进行了深入分析:扰动分析通过系统地修改输入数据,并评估其对 XAI 方法所生成归因的影响。我们将这一方法应用于多种最新的 XAI 技术,并在三个时间序列分类数据集上评估其性能。结果表明,扰动分析方法可以有效评估归因质量,并揭示各 XAI 技术的优势与局限。这种方法可以指导时间序列数据上 XAI 方法的选择,并促进开发更可靠、更可解释的时间序列机器学习模型。
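    The perturbation analysis itself reduces to a small loop: occlude the time steps an XAI method marks as most relevant, re-run the model, and record how much the prediction changes. The sketch below assumes the model is any callable returning class probabilities and the attribution is one score per time step; replacing occluded steps with the series mean is just one of several perturbation choices:
```python
import numpy as np

def perturbation_score(model, series, attribution, target_class, fraction=0.1):
    """Drop in prediction confidence after occluding the top-attributed steps.

    model        -- callable mapping a (T,) series to class probabilities
    series       -- 1-D time series of length T
    attribution  -- relevance score per time step, same length as series
    """
    t = len(series)
    k = max(1, int(fraction * t))
    top_idx = np.argsort(attribution)[::-1][:k]       # most relevant steps

    perturbed = series.copy()
    perturbed[top_idx] = series.mean()                # occlude with the series mean

    before = model(series)[target_class]
    after = model(perturbed)[target_class]
    return before - after                             # larger drop = better attribution

# Toy example: a "model" that keys on the early part of the series.
def toy_model(x):
    p1 = 1.0 / (1.0 + np.exp(-x[:10].mean()))
    return np.array([1.0 - p1, p1])

rng = np.random.default_rng(0)
x = rng.normal(size=100)
x[:10] += 3.0                                         # the true evidence
good_attr = np.zeros(100)
good_attr[:10] = 1.0                                  # points at the true evidence
bad_attr = rng.random(100)                            # uninformative attribution

print("drop with informative attribution:", perturbation_score(toy_model, x, good_attr, 1))
print("drop with random attribution:     ", perturbation_score(toy_model, x, bad_attr, 1))
```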

PePNet: A Periodicity-Perceived Workload Prediction Network Supporting Rare Occurrence of Heavy Workload

  • paper_url: http://arxiv.org/abs/2308.01917
  • repo_url: None
  • paper_authors: Feiyi Chen, Zhen Qin, Hailiang Zhao, Mengchu Zhou, Shuiguang Deng
  • for: 提高云服务器的工作负载预测精度,尤其是高工作负载。
  • methods: 提出 PePNet,包含无需先验知识即可自动检测周期性及周期长度并自适应融合周期信息的周期感知机制,以及迭代优化预测序列中拟合最差部分的"阿喀琉斯之踵"损失函数。
  • results: 在 Alibaba2018、SMD 和 Dinda 数据集上进行了广泛的实验,结果显示与现有最优方法相比,总体工作负载的 MAPE 平均降低 20.0%,其中高负载下的 MAPE 平均降低 23.9%。
    Abstract Cloud providers can greatly benefit from accurate workload prediction. However, the workload of cloud servers is highly variable, with occasional heavy workload bursts. This makes workload prediction challenging. There are mainly two categories of workload prediction methods: statistical methods and neural-network-based ones. The former ones rely on strong mathematical assumptions and have reported low accuracy when predicting highly variable workload. The latter ones offer higher overall accuracy, yet they are vulnerable to data imbalance between heavy workload and common one. This impairs the prediction accuracy of neural network-based models on heavy workload. Either the overall inaccuracy of statistic methods or the heavy-workload inaccuracy of neural-network-based models can cause service level agreement violations. Thus, we propose PePNet to improve overall especially heavy workload prediction accuracy. It has two distinctive characteristics: (i) A Periodicity-Perceived Mechanism to detect the existence of periodicity and the length of one period automatically, without any priori knowledge. Furthermore, it fuses periodic information adaptively, which is suitable for periodic, lax periodic and aperiodic time series. (ii) An Achilles' Heel Loss Function iteratively optimizing the most under-fitting part in predicting sequence for each step, which significantly improves the prediction accuracy of heavy load. Extensive experiments conducted on Alibaba2018, SMD dataset and Dinda's dataset demonstrate that PePNet improves MAPE for overall workload by 20.0% on average, compared with state-of-the-art methods. Especially, PePNet improves MAPE for heavy workload by 23.9% on average.
    摘要 准确的工作负载预测能为云服务提供商带来巨大收益。然而,云服务器的工作负载高度多变,并会偶发沉重的负载突增,这使得工作负载预测颇具挑战。工作负载预测方法主要分为两类:统计方法与基于神经网络的方法。前者依赖较强的数学假设,在预测高度多变的负载时精度较低;后者整体精度更高,但易受沉重负载与普通负载之间数据不均衡的影响,从而损害其对沉重负载的预测精度。无论是统计方法的整体误差,还是神经网络方法在沉重负载上的误差,都可能导致违反服务等级协议(SLA)。因此,我们提出 PePNet,以提升整体尤其是沉重负载的预测精度。它具有两个显著特点:(i) 周期感知机制(Periodicity-Perceived Mechanism),无需任何先验知识即可自动检测周期性是否存在及单个周期的长度,并自适应地融合周期信息,适用于周期、弱周期和非周期时间序列;(ii) "阿喀琉斯之踵"损失函数(Achilles' Heel Loss Function),在每一步迭代地优化预测序列中拟合最差的部分,显著提升了沉重负载的预测精度。在 Alibaba2018、SMD 数据集和 Dinda 数据集上进行的大量实验表明,与最先进方法相比,PePNet 将整体工作负载的 MAPE 平均降低 20.0%,其中沉重负载的 MAPE 平均降低 23.9%。
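    One classical way to detect, without prior knowledge, whether a workload series is periodic and how long one period is, is to look for the dominant peak of its autocorrelation function; the sketch below uses that approach purely as an illustration (PePNet's Periodicity-Perceived Mechanism is a learned component and may work differently):
```python
import numpy as np

def detect_period(series, min_lag=2, threshold=0.3):
    """Return (is_periodic, period_length) from the autocorrelation peak."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf = acf / acf[0]                                  # normalize so acf[0] == 1

    lags = np.arange(min_lag, len(x) // 2)
    best_lag = lags[np.argmax(acf[lags])]
    return bool(acf[best_lag] > threshold), int(best_lag)

# Toy workload: a daily-like cycle of length 24 plus rare heavy bursts and noise.
rng = np.random.default_rng(0)
t = np.arange(24 * 14)
workload = 10 + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(scale=1.0, size=t.size)
workload[rng.choice(t.size, size=5, replace=False)] += 30   # occasional heavy load

is_periodic, period = detect_period(workload)
print("periodic:", is_periodic, "| estimated period length:", period)
```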

Transaction Fraud Detection via an Adaptive Graph Neural Network

  • paper_url: http://arxiv.org/abs/2307.05633
  • repo_url: None
  • paper_authors: Yue Tian, Guanjun Liu, Jiacun Wang, Mengchu Zhou
  • for: 提高交易验证 fraud detection 精度,以保障个人和银行的金融安全。
  • methods: 提出了 Adaptive Sampling and Aggregation-based Graph Neural Network (ASA-GNN),可以学习交易数据中的准确表示。
  • results: 对三个真实的金融数据集进行了广泛的实验,显示了 ASA-GNN 的提出方法在交易验证预测中的优于现有方法。
    Abstract Many machine learning methods have been proposed to achieve accurate transaction fraud detection, which is essential to the financial security of individuals and banks. However, most existing methods leverage original features only or require manual feature engineering. They lack the ability to learn discriminative representations from transaction data. Moreover, criminals often commit fraud by imitating cardholders' behaviors, which causes the poor performance of existing detection models. In this paper, we propose an Adaptive Sampling and Aggregation-based Graph Neural Network (ASA-GNN) that learns discriminative representations to improve the performance of transaction fraud detection. A neighbor sampling strategy is performed to filter noisy nodes and supplement information for fraudulent nodes. Specifically, we leverage cosine similarity and edge weights to adaptively select neighbors with similar behavior patterns for target nodes and then find multi-hop neighbors for fraudulent nodes. A neighbor diversity metric is designed by calculating the entropy among neighbors to tackle the camouflage issue of fraudsters and explicitly alleviate the over-smoothing phenomena. Extensive experiments on three real financial datasets demonstrate that the proposed method ASA-GNN outperforms state-of-the-art ones.
    摘要 许多机器学习方法已被提出用于实现准确的交易欺诈检测,这对个人和银行的金融安全至关重要。然而,大多数现有方法仅利用原始特征或依赖人工特征工程,缺乏从交易数据中学习判别性表示的能力;此外,欺诈者常常通过模仿持卡人的行为实施欺诈,导致现有检测模型表现不佳。本文提出了一种基于自适应采样与聚合的图神经网络(ASA-GNN),通过学习判别性表示来提升交易欺诈检测性能。我们设计了邻居采样策略来过滤噪声节点并为欺诈节点补充信息:具体而言,利用余弦相似度和边权重自适应地为目标节点选择行为模式相似的邻居,并为欺诈节点寻找多跳邻居。我们还通过计算邻居间的熵设计了邻居多样性度量,以应对欺诈者的伪装问题,并显式缓解过平滑现象。在三个真实金融数据集上的大量实验表明,所提出的 ASA-GNN 优于当前最先进的方法。
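    Two ingredients mentioned in the abstract, cosine-similarity-based neighbor selection and an entropy-based neighbor-diversity measure, can be sketched directly on node feature vectors; edge weights, multi-hop expansion and the GNN itself are omitted, and the exact formulas here are illustrative assumptions rather than the paper's:
```python
import numpy as np

def cosine_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def select_neighbors(target, neighbor_feats, top_k=3):
    """Adaptively keep the neighbors most behaviorally similar to the target node."""
    sims = np.array([cosine_sim(target, f) for f in neighbor_feats])
    return np.argsort(sims)[::-1][:top_k], sims

def neighbor_diversity(neighbor_labels):
    """Entropy of the kept neighbors' label distribution; low entropy around a
    node can hint at camouflage (fraudsters imitating one behavior pattern)."""
    _, counts = np.unique(neighbor_labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
target = rng.normal(size=8)                      # target transaction embedding
neighbors = rng.normal(size=(10, 8))             # candidate neighbor embeddings
labels = rng.integers(0, 2, size=10)             # 0 = legitimate, 1 = fraud

kept, sims = select_neighbors(target, neighbors, top_k=3)
print("kept neighbors:", kept, "similarities:", np.round(sims[kept], 3))
print("neighbor diversity (entropy):", round(neighbor_diversity(labels[kept]), 3))
```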

Estimating label quality and errors in semantic segmentation data via any model

  • paper_url: http://arxiv.org/abs/2307.05080
  • repo_url: None
  • paper_authors: Vedang Lad, Jonas Mueller
  • for: 提高 semantic segmentation 数据集的标注质量,减少人工标注错误。
  • methods: 使用 probabilistic 预测来评估标注质量,可以使用任何模型架构和训练方法。
  • results: 通过评估多种标注质量分数方法,发现使用soft-minimum 方法可以最 effectively 标识图像中的错误标注,并且适用于多种类型的标注错误。
    Abstract The labor-intensive annotation process of semantic segmentation datasets is often prone to errors, since humans struggle to label every pixel correctly. We study algorithms to automatically detect such annotation errors, in particular methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled. This helps prioritize what data to review in order to ensure a high-quality training/evaluation dataset, which is critical in sensitive applications such as medical imaging and autonomous vehicles. Widely applicable, our label quality scores rely on probabilistic predictions from a trained segmentation model -- any model architecture and training procedure can be utilized. Here we study 7 different label quality scoring methods used in conjunction with a DeepLabV3+ or a FPN segmentation model to detect annotation errors in a version of the SYNTHIA dataset. Precision-recall evaluations reveal a score -- the soft-minimum of the model-estimated likelihoods of each pixel's annotated class -- that is particularly effective to identify images that are mislabeled, across multiple types of annotation error.
    摘要 语义分割数据集的人工标注过程劳动强度大且容易出错,因为标注者很难将每个像素都标注正确。我们研究自动检测此类标注错误的算法,特别是为标签质量打分的方法,使得分数最低的图像最不可能被正确标注,从而可以优先审核这些数据,保证训练/评估数据集的高质量;这对医学影像和自动驾驶等敏感应用至关重要。我们的标签质量分数依赖于已训练分割模型的概率预测,因而适用广泛,可采用任意模型结构和训练流程。本文研究了 7 种不同的标签质量评分方法,与 DeepLabV3+ 或 FPN 分割模型结合,用于检测 SYNTHIA 数据集某一版本中的标注错误。精确率-召回率评估表明,取模型对每个像素标注类别所估计似然的软最小值(soft-minimum)作为分数,在多种标注错误类型下都能特别有效地识别被错误标注的图像。
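    The best-performing score, a soft-minimum over pixels of the model-estimated likelihood of each pixel's annotated class, can be computed from any segmentation model's softmax output. The sketch below uses one common soft-minimum form (a temperature-weighted average that concentrates on the smallest values), which may differ in detail from the paper's definition:
```python
import numpy as np

def label_quality_score(probs, label_map, temperature=0.1):
    """Soft-minimum over pixels of the predicted probability of the given label.

    probs     -- (H, W, C) softmax output of any segmentation model
    label_map -- (H, W) integer annotation to be scored
    Lower scores flag images that are more likely to be mislabeled.
    """
    h, w, _ = probs.shape
    # Likelihood the model assigns to the annotated class at every pixel.
    p = probs[np.arange(h)[:, None], np.arange(w)[None, :], label_map]
    # Soft-minimum: weights concentrate on the least-likely pixels as T -> 0.
    weights = np.exp(-p / temperature)
    weights /= weights.sum()
    return float((weights * p).sum())

rng = np.random.default_rng(0)
H, W, C = 32, 32, 4
logits = rng.normal(size=(H, W, C))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

clean_label = probs.argmax(axis=-1)                   # agrees with the model
noisy_label = clean_label.copy()
noisy_label[:8, :8] = (clean_label[:8, :8] + 1) % C   # simulate annotation errors

print("score of clean annotation:", round(label_quality_score(probs, clean_label), 4))
print("score of noisy annotation:", round(label_quality_score(probs, noisy_label), 4))
```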

A Theory of Bounded Inductive Rationality

  • paper_url: http://arxiv.org/abs/2307.05068
  • repo_url: None
  • paper_authors: Caspar Oesterheld, Abram Demski, Vincent Conitzer
  • for: 这篇论文是为了研究不假设完整知识的理性决策理论而写的。
  • methods: 论文使用了 inductive reasoning 和 infinitely often testing 来定义理性。
  • results: 论文提出了一种新的理性决策理论,并证明了这种理性决策可以在各种决策问题上达到 Desirable Properties,如值 random and pseudo-random lotteries at their expected reward。此外,论文还证明了在不同代理之间的竞争中, bounded rational inductive agents 可以 converges to certain strategies。
    Abstract The dominant theories of rational choice assume logical omniscience. That is, they assume that when facing a decision problem, an agent can perform all relevant computations and determine the truth value of all relevant logical/mathematical claims. This assumption is unrealistic when, for example, we offer bets on remote digits of pi or when an agent faces a computationally intractable planning problem. Furthermore, the assumption of logical omniscience creates contradictions in cases where the environment can contain descriptions of the agent itself. Importantly, strategic interactions as studied in game theory are decision problems in which a rational agent is predicted by its environment (the other players). In this paper, we develop a theory of rational decision making that does not assume logical omniscience. We consider agents who repeatedly face decision problems (including ones like betting on digits of pi or games against other agents). The main contribution of this paper is to provide a sensible theory of rationality for such agents. Roughly, we require that a boundedly rational inductive agent tests each efficiently computable hypothesis infinitely often and follows those hypotheses that keep their promises of high rewards. We then prove that agents that are rational in this sense have other desirable properties. For example, they learn to value random and pseudo-random lotteries at their expected reward. Finally, we consider strategic interactions between different agents and prove a folk theorem for what strategies bounded rational inductive agents can converge to.
    摘要 主流的理性选择理论假设逻辑全知,即面对决策问题时,智能体能够完成所有相关计算,并判定所有相关逻辑/数学命题的真值。当我们对圆周率的远端数位下注,或智能体面临计算上难以处理的规划问题时,这一假设并不现实;而且,当环境中可能包含对智能体自身的描述时,逻辑全知假设还会产生矛盾。值得注意的是,博弈论所研究的策略互动正是这样一类决策问题:理性智能体会被其环境(其他参与者)所预测。本文提出了一种不假设逻辑全知的理性决策理论。我们考虑反复面对决策问题(包括对圆周率数位下注或与其他智能体博弈等问题)的智能体,论文的主要贡献是为这类智能体给出一种合理的理性理论:大致而言,我们要求有界理性的归纳智能体对每个可高效计算的假设进行无限多次检验,并遵循那些兑现了高回报承诺的假设。我们进而证明,满足这种理性的智能体还具有其他理想性质,例如会学会按期望回报对随机及伪随机彩票进行估值。最后,我们考虑不同智能体之间的策略互动,并证明了一个关于有界理性归纳智能体可收敛到哪些策略的民间定理(folk theorem)。

Portfolio Optimization: A Comparative Study

  • paper_url: http://arxiv.org/abs/2307.05048
  • repo_url: https://github.com/riddhi927/Portfolio-Optimization
  • paper_authors: Jaydip Sen, Subhasis Dasgupta
  • for: 这篇论文主要研究了股票组合设计方法的比较,包括mean-variance portfolio(MVP)、 hierarchical risk parity(HRP)基于的股票组合和自适应神经网络(Autoencoder)基于的股票组合。
  • methods: 论文选取了印度国家证券交易所(NSE)十个主题板块的股票,使用 2018 年 1 月 1 日至 2021 年 12 月 31 日的历史价格数据构建组合,并在 2022 年 1 月 1 日至 2022 年 12 月 31 日的样本外数据上测试其表现。
  • results: 研究发现,就风险调整收益而言,MVP 组合在样本外数据上的表现最好;而就年化收益而言,自编码器组合优于其他组合。
    Abstract Portfolio optimization has been an area that has attracted considerable attention from the financial research community. Designing a profitable portfolio is a challenging task involving precise forecasting of future stock returns and risks. This chapter presents a comparative study of three portfolio design approaches, the mean-variance portfolio (MVP), hierarchical risk parity (HRP)-based portfolio, and autoencoder-based portfolio. These three approaches to portfolio design are applied to the historical prices of stocks chosen from ten thematic sectors listed on the National Stock Exchange (NSE) of India. The portfolios are designed using the stock price data from January 1, 2018, to December 31, 2021, and their performances are tested on the out-of-sample data from January 1, 2022, to December 31, 2022. Extensive results are analyzed on the performance of the portfolios. It is observed that the performance of the MVP portfolio is the best on the out-of-sample data for the risk-adjusted returns. However, the autoencoder portfolios outperformed their counterparts on annual returns.
    摘要 投资组合优化一直是金融研究界广泛关注的领域。设计一个可盈利的组合是一项具有挑战性的任务,需要精确预测未来的股票收益和风险。本章对三种组合设计方法进行了比较研究:均值-方差组合(MVP)、基于层次风险平价(HRP)的组合以及基于自编码器的组合。这三种方法被应用于从印度国家证券交易所(NSE)十个主题板块中选取的股票的历史价格数据:组合使用 2018 年 1 月 1 日至 2021 年 12 月 31 日的价格数据构建,并在 2022 年 1 月 1 日至 2022 年 12 月 31 日的样本外数据上检验其表现。对组合表现的大量分析结果显示,就风险调整收益而言,MVP 组合在样本外数据上的表现最好;而在年化收益方面,自编码器组合优于其他组合。
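    Of the three designs, the mean-variance portfolio is the easiest to illustrate in isolation: maximize a Sharpe-like ratio under a fully-invested, long-only constraint. The sketch below runs on synthetic returns and is a generic MVP, not the paper's exact NSE pipeline:
```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_assets, n_days = 5, 500
daily_returns = rng.normal(loc=0.0005, scale=0.01, size=(n_days, n_assets))

mu = daily_returns.mean(axis=0)              # expected daily returns
cov = np.cov(daily_returns, rowvar=False)    # asset covariance matrix

def neg_sharpe(w, mu, cov, risk_free=0.0):
    """Negative Sharpe-like ratio, to be minimized."""
    ret = w @ mu - risk_free
    vol = np.sqrt(w @ cov @ w)
    return -ret / vol

constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]   # fully invested
bounds = [(0.0, 1.0)] * n_assets                                 # long-only
w0 = np.full(n_assets, 1.0 / n_assets)

res = minimize(neg_sharpe, w0, args=(mu, cov), method="SLSQP",
               bounds=bounds, constraints=constraints)
weights = res.x

print("optimal weights:", np.round(weights, 3))
print("expected daily return:", round(float(weights @ mu), 6))
print("daily volatility:", round(float(np.sqrt(weights @ cov @ weights)), 6))
```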

DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks

  • paper_url: http://arxiv.org/abs/2307.05628
  • repo_url: None
  • paper_authors: Daoan Zhang, Weitong Zhang, Bing He, Yu Zhao, Jianguo Zhang, Chenchen Qin, Jianhua Yao
    for:DNAGPT is proposed to handle various DNA analysis tasks, including genomic signals and regions recognition, pseudo genomes generation, and mRNA abundance regression.methods:DNAGPT uses a pre-trained model with a modified architecture that includes binary classification and numerical regression tasks, as well as a comprehensive token language to encode sequence, number, and task-related information.results:DNAGPT demonstrates superior performance on various DNA analysis tasks, especially when compared to existing models specialized for specific downstream tasks.
    Abstract GPT has been proven to be capable of extracting general information from language sequences, thereby benefiting all downstream tasks. This motivates us to use pre-trained models to explore the hidden inherent information in DNA sequences. However, data and task requirements in DNA sequence analyses are tasked in different formats such as generation, prediction and regression, and are complexity and involve different modalities, such as nucleotides sequences and, expression levels, etc. Existing BERT-based models are mostly for generation tasks and use sequence data as input and output, thus cannot easily handle various DNA analysis tasks in one single model. Herein, we propose a generalized DNA pre-training DNA model, DNAGPT, that was trained on over 200 billion base pairs from all the mammals. We enhance the classic GPT model by adding binary classification task (DNA sequence order) and numerical regression task (guanine-cytosine content prediction) in the pre-training period and enhancing the architecture with corresponding embedding layers and encoding heads. We also design a comprehensive token language to encode sequence, number and task related information in the same token space. Therefore, DNAGPT can handle versatile DNA analysis tasks and simultaneously process handle both sequence and numerical data. We have evaluated our model on genomic signals and regions recognition, pseudo genomes generation and mRNA abudance regression tasks. We demonstrate that benefiting from pre-training, DNAGPT can shows superior performance than the existing models specially designed for various downstreams tasks.
    摘要 GPT 已被证明能够从语言序列中提取通用信息,从而惠及各类下游任务,这促使我们利用预训练模型来挖掘 DNA 序列中隐藏的内在信息。然而,DNA 序列分析中的数据与任务需求形式各异,包括生成、预测和回归等,且复杂多样、涉及核苷酸序列与表达水平等不同模态。现有基于 BERT 的模型大多面向生成任务,以序列数据作为输入和输出,难以在单一模型中处理多种 DNA 分析任务。为此,我们提出了一个通用的 DNA 预训练模型 DNAGPT,其在来自所有哺乳动物的超过 2000 亿个碱基对上进行了训练。我们对经典 GPT 模型进行了增强,在预训练阶段加入了二分类任务(DNA 序列顺序判断)和数值回归任务(鸟嘌呤-胞嘧啶含量预测),并为其配备相应的嵌入层和编码头;同时设计了一种综合的 token 语言,将序列、数值和任务相关信息编码到同一 token 空间中。因此,DNAGPT 既能处理多样的 DNA 分析任务,又能同时处理序列数据与数值数据。我们在基因组信号与区域识别、伪基因组生成以及 mRNA 丰度回归等任务上评估了该模型,结果表明,得益于预训练,DNAGPT 的表现优于那些专为特定下游任务设计的现有模型。

Number Systems for Deep Neural Network Architectures: A Survey

  • paper_url: http://arxiv.org/abs/2307.05035
  • repo_url: None
  • paper_authors: Ghada Alsuhli, Vasileios Sakellariou, Hani Saleh, Mahmoud Al-Qutayri, Baker Mohammad, Thanos Stouraitis
  • for: 本文主要探讨了深度神经网络(DNNs)中的数据表示方法,以提高DNNs的计算效率和能效性。
  • methods: 本文提出了多种非标准数学系统,包括循环乘法、基于扩展的Gaussian数学系统、基于循环的数学系统等,以优化DNNs的表示方法。
  • results: 本文结果表明,使用非标准数学系统可以提高DNNs的计算效率和能效性,同时也可以降低硬件设计的复杂性。但是,每种数学系统都有其缺点和挑战,需要进一步的研究和优化。
    Abstract Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications. DNNs have shown sometimes superior performance, even compared to humans, in cases such as self-driving, health applications, etc. Because of their computational complexity, deploying DNNs in resource-constrained devices still faces many challenges related to computing complexity, energy efficiency, latency, and cost. To this end, several research directions are being pursued by both academia and industry to accelerate and efficiently implement DNNs. One important direction is determining the appropriate data representation for the massive amount of data involved in DNN processing. Using conventional number systems has been found to be sub-optimal for DNNs. Alternatively, a great body of research focuses on exploring suitable number systems. This article aims to provide a comprehensive survey and discussion about alternative number systems for more efficient representations of DNN data. Various number systems (conventional/unconventional) exploited for DNNs are discussed. The impact of these number systems on the performance and hardware design of DNNs is considered. In addition, this paper highlights the challenges associated with each number system and various solutions that are proposed for addressing them. The reader will be able to understand the importance of an efficient number system for DNN, learn about the widely used number systems for DNN, understand the trade-offs between various number systems, and consider various design aspects that affect the impact of number systems on DNN performance. In addition, the recent trends and related research opportunities will be highlighted
    摘要 深度神经网络(DNN)已成为众多人工智能应用的使能组件,在自动驾驶、医疗等场景中有时甚至展现出超越人类的性能。然而,由于其计算复杂度高,在资源受限的设备上部署 DNN 仍面临计算复杂度、能效、时延和成本等诸多挑战。为此,学术界和工业界正沿多个方向加速并高效地实现 DNN,其中一个重要方向便是为 DNN 处理中涉及的海量数据确定合适的数据表示。研究发现,使用常规数字系统对 DNN 而言并非最优,因此大量研究致力于探索更合适的数字系统。本文旨在对用于更高效表示 DNN 数据的各类替代数字系统进行全面的综述与讨论:文中讨论了 DNN 所采用的各种(常规与非常规)数字系统,考察了这些数字系统对 DNN 性能和硬件设计的影响,并指出了每种数字系统面临的挑战以及为应对这些挑战而提出的各种解决方案。读者可以由此理解高效数字系统对 DNN 的重要性,了解 DNN 中广泛使用的数字系统,理解各种数字系统之间的权衡,并考虑影响数字系统对 DNN 性能作用的各种设计因素;此外,文中还展望了相关的最新趋势与研究机会。

FairLay-ML: Intuitive Remedies for Unfairness in Data-Driven Social-Critical Algorithms

  • paper_url: http://arxiv.org/abs/2307.05029
  • repo_url: None
  • paper_authors: Normen Yu, Gang Tan, Saeid Tizpaz-Niari
  • for: 本文旨在为非专业人士提供一个用户友好的界面,帮助其理解并修正机器学习模型中的不公平性。
  • methods: 本文使用开源的机器学习模型解释工具(如局部可解释模型无关解释 LIME),并将其与现有面向机器学习的图形用户界面(如 Python Streamlit)相集成。
  • results: 本文使用不公平性检测工具 Parfait-ML 生成的不同精度与公平性的模型测试了概念验证界面 FairLay-ML,并用 Themis 验证结果。研究发现,FairLay-ML 所用的技术栈易于安装,能够向用户实时提供预训练模型的黑盒解释,且这些解释可以转化为可操作的修正措施。
    Abstract This thesis explores open-sourced machine learning (ML) model explanation tools to understand whether these tools can allow a layman to visualize, understand, and suggest intuitive remedies to unfairness in ML-based decision-support systems. Machine learning models trained on datasets biased against minority groups are increasingly used to guide life-altering social decisions, prompting the urgent need to study their logic for unfairness. Due to this problem's impact on vast populations of the general public, it is critical for the layperson -- not just subject matter experts in social justice or machine learning experts -- to understand the nature of unfairness within these algorithms and the potential trade-offs. Existing research on fairness in machine learning focuses mostly on the mathematical definitions and tools to understand and remedy unfair models, with some directly citing user-interactive tools as necessary for future work. This thesis presents FairLay-ML, a proof-of-concept GUI integrating some of the most promising tools to provide intuitive explanations for unfair logic in ML models by integrating existing research tools (e.g. Local Interpretable Model-Agnostic Explanations) with existing ML-focused GUI (e.g. Python Streamlit). We test FairLay-ML using models of various accuracy and fairness generated by an unfairness detector tool, Parfait-ML, and validate our results using Themis. Our study finds that the technology stack used for FairLay-ML makes it easy to install and provides real-time black-box explanations of pre-trained models to users. Furthermore, the explanations provided translate to actionable remedies.
    摘要 本论文探索开源的机器学习(ML)模型解释工具,研究这些工具能否让非专业人士可视化、理解基于 ML 的决策支持系统中的不公平性,并提出直观的修正建议。在对少数群体存在偏差的数据集上训练的机器学习模型正日益被用于指导影响人生的社会决策,因此迫切需要研究其逻辑中的不公平性。由于该问题影响广大的普通公众,不仅社会公正或机器学习领域的专家,普通人也需要理解这些算法中不公平性的本质及其潜在权衡。现有关于机器学习公平性的研究主要集中于理解和修正不公平模型的数学定义与工具,其中一些研究直接指出用户交互式工具是未来工作所必需的。本论文提出了 FairLay-ML,一个概念验证性的图形界面,它将一些最有前景的研究工具(如局部可解释模型无关解释 LIME)与现有面向机器学习的图形界面(如 Python Streamlit)相集成,为 ML 模型中的不公平逻辑提供直观解释。我们使用不公平性检测工具 Parfait-ML 生成的不同精度与公平性的模型对 FairLay-ML 进行了测试,并用 Themis 验证结果。研究发现,FairLay-ML 所采用的技术栈易于安装,能够向用户实时提供预训练模型的黑盒解释,且这些解释可以转化为可操作的修正措施。

Unleashing the Potential of Regularization Strategies in Learning with Noisy Labels

  • paper_url: http://arxiv.org/abs/2307.05025
  • repo_url: None
  • paper_authors: Hui Kang, Sheng Liu, Huaxi Huang, Jun Yu, Bo Han, Dadong Wang, Tongliang Liu
  • for: 本研究旨在探讨含噪标签学习中的鲁棒性和泛化性问题,并提出简单的基线方法来应对这一问题。
  • methods: 本研究将交叉熵损失与多种常用正则化策略相结合,包括学习率衰减、模型权重平均和数据增强,以提高模型的鲁棒性和泛化能力。
  • results: 研究发现,这一简单的基线方法可以超越当前最先进的方法;结果表明,组合使用正则化策略在应对含噪标签学习的挑战时,可能比复杂的算法更加有效。
    Abstract In recent years, research on learning with noisy labels has focused on devising novel algorithms that can achieve robustness to noisy training labels while generalizing to clean data. These algorithms often incorporate sophisticated techniques, such as noise modeling, label correction, and co-training. In this study, we demonstrate that a simple baseline using cross-entropy loss, combined with widely used regularization strategies like learning rate decay, model weights average, and data augmentations, can outperform state-of-the-art methods. Our findings suggest that employing a combination of regularization strategies can be more effective than intricate algorithms in tackling the challenges of learning with noisy labels. While some of these regularization strategies have been utilized in previous noisy label learning research, their full potential has not been thoroughly explored. Our results encourage a reevaluation of benchmarks for learning with noisy labels and prompt reconsideration of the role of specialized learning algorithms designed for training with noisy labels.
    摘要 在近年来,学习含杂标签的研究主要关注开发 novel 算法,以实现对含杂训练标签的Robustness 性,同时能够泛化到干净数据上。这些算法通常包括复杂的技术,如噪声模型、标签修正和合作学习。在这项研究中,我们发现,使用 Cross-Entropy 损失函数,并与通用的规则化策略(如学习率减少、模型权重平均和数据扩展)结合使用,可以超越当前的state-of-the-art 方法。我们的发现表明,结合规则化策略可以更有效地解决含杂标签学习中的挑战。虽然一些这些规则化策略在过去的含杂标签学习研究中已经被利用,但它们的潜力尚未得到了全面的探索。我们的结果鼓励我们重新评估含杂标签学习的标准准则,并重新考虑特殊的含杂标签学习算法。
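A minimal sketch of the kind of regularized cross-entropy baseline described above is given below: plain cross-entropy training combined with cosine learning-rate decay and an exponential moving average of model weights, with data augmentation assumed to live in the dataloader transforms. The model, loaders, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Regularized cross-entropy baseline: cosine LR decay + EMA weight averaging.
import copy
import torch
import torch.nn as nn

def train_noisy_label_baseline(model, train_loader, epochs=100, lr=0.1, ema_decay=0.999):
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    ema_model = copy.deepcopy(model)          # model-weight averaging
    for p in ema_model.parameters():
        p.requires_grad_(False)

    for _ in range(epochs):
        for images, noisy_labels in train_loader:   # augmentations applied in the loader
            optimizer.zero_grad()
            loss = criterion(model(images), noisy_labels)
            loss.backward()
            optimizer.step()
            # exponential moving average of the weights
            with torch.no_grad():
                for ema_p, p in zip(ema_model.parameters(), model.parameters()):
                    ema_p.mul_(ema_decay).add_(p, alpha=1.0 - ema_decay)
        scheduler.step()                       # learning-rate decay
    return ema_model                           # evaluate with the averaged weights
```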

Feature Activation Map: Visual Explanation of Deep Learning Models for Image Classification

  • paper_url: http://arxiv.org/abs/2307.05017
  • repo_url: None
  • paper_authors: Yi Liao, Yongsheng Gao, Weichuan Zhang
  • for: 这 paper 的目的是为了解释 deep learning 模型不含全连接层的分类器的决策。
  • methods: 该 paper 提出了一种后处解释工具 named feature activation map (FAM),可以用于解释不含全连接层的 deep learning 模型。 FAM 算法通过计算图像嵌入的相似度分布来 derive 通道 wise 贡献权重,然后将 activation map 与相应的正规化贡献权重进行线性组合,形成解释图。
  • results: 在十种 deep learning 模型上,包括几种 few-shot 图像分类、对比学习图像分类和图像检索任务,Quantitative 和 Qualitative 实验结果表明 FAM 算法的有效性。
    Abstract Decisions made by convolutional neural networks(CNN) can be understood and explained by visualizing discriminative regions on images. To this end, Class Activation Map (CAM) based methods were proposed as powerful interpretation tools, making the prediction of deep learning models more explainable, transparent, and trustworthy. However, all the CAM-based methods (e.g., CAM, Grad-CAM, and Relevance-CAM) can only be used for interpreting CNN models with fully-connected (FC) layers as a classifier. It is worth noting that many deep learning models classify images without FC layers, e.g., few-shot learning image classification, contrastive learning image classification, and image retrieval tasks. In this work, a post-hoc interpretation tool named feature activation map (FAM) is proposed, which can interpret deep learning models without FC layers as a classifier. In the proposed FAM algorithm, the channel-wise contribution weights are derived from the similarity scores between two image embeddings. The activation maps are linearly combined with the corresponding normalized contribution weights, forming the explanation map for visualization. The quantitative and qualitative experiments conducted on ten deep learning models for few-shot image classification, contrastive learning image classification and image retrieval tasks demonstrate the effectiveness of the proposed FAM algorithm.
    摘要 深度学习模型的决策可以通过可视化图像中的判别区域来理解和解释。为此,基于类激活映射(CAM)的方法被提出,使深度学习模型的预测更加可解释、透明和可信。然而,所有基于CAM的方法(如CAM、Grad-CAM和Relevance-CAM)都只适用于以全连接(FC)层作为分类器的深度学习模型,无法用于解释不含FC层的深度学习模型,例如少样本图像分类、对比学习图像分类和图像检索任务。针对这一问题,本文提出了一种名为特征激活映射(FAM)的事后解释工具,可以解释不含FC层的深度学习模型。在所提出的FAM算法中,通道级贡献权重由两个图像嵌入之间的相似度得分推导而来;随后,将激活图与相应的归一化贡献权重进行线性组合,形成用于可视化的解释图。在少样本图像分类、对比学习图像分类和图像检索任务的十个深度学习模型上进行的定量与定性实验验证了FAM算法的有效性。
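To make the channel-weighting step concrete, here is a hedged sketch of the FAM idea as described in the abstract: each channel of the query image's last convolutional activation is weighted by its normalized contribution to the similarity score between the two image embeddings, and the weighted maps are summed into an explanation map. The exact weighting and normalization used by the authors may differ; the tensor shapes and the ReLU/min-max steps here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def feature_activation_map(act_q, emb_q, emb_ref):
    """
    act_q:   (C, H, W) activations of the query image from the last conv layer
    emb_q:   (C,) query embedding (e.g. global average pooling of act_q)
    emb_ref: (C,) embedding of the reference image (support / retrieved image)
    """
    contrib = emb_q * emb_ref                      # per-channel contribution to the similarity score
    weights = F.relu(contrib)
    weights = weights / (weights.sum() + 1e-8)     # normalized contribution weights
    fam = (weights[:, None, None] * act_q).sum(0)  # (H, W) explanation map
    fam = (fam - fam.min()) / (fam.max() - fam.min() + 1e-8)
    return fam
```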

CILF:Causality Inspired Learning Framework for Out-of-Distribution Vehicle Trajectory Prediction

  • paper_url: http://arxiv.org/abs/2307.05624
  • repo_url: None
  • paper_authors: Shengyi Li, Qifan Xue, Yezhuo Zhang, Xuanpeng Li
  • for: 提高自动驾驶车辆的路径预测精度
  • methods: 提出了一种基于 causal graph 的 Out-of-Distribution Causal Graph (OOD-CG) 方法,并提出了一种基于这个 causal graph 的 Causal Inspired Learning Framework (CILF)
  • results: 在 NGSIM 和 INTERACTION 两个主流数据集上,CILF 实现了提高域间泛化性的表现
    Abstract Trajectory prediction is critical for autonomous driving vehicles. Most existing methods tend to model the correlation between history trajectory (input) and future trajectory (output). Since correlation is just a superficial description of reality, these methods rely heavily on the i.i.d. assumption and evince a heightened susceptibility to out-of-distribution data. To address this problem, we propose an Out-of- Distribution Causal Graph (OOD-CG), which explicitly defines the underlying causal structure of the data with three entangled latent features: 1) domain-invariant causal feature (IC), 2) domain-variant causal feature (VC), and 3) domain-variant non-causal feature (VN ). While these features are confounded by confounder (C) and domain selector (D). To leverage causal features for prediction, we propose a Causal Inspired Learning Framework (CILF), which includes three steps: 1) extracting domain-invariant causal feature by means of an invariance loss, 2) extracting domain variant feature by domain contrastive learning, and 3) separating domain-variant causal and non-causal feature by encouraging causal sufficiency. We evaluate the performance of CILF in different vehicle trajectory prediction models on the mainstream datasets NGSIM and INTERACTION. Experiments show promising improvements in CILF on domain generalization.
    摘要 几何预测是自动驾驶车辆中的关键技术。大多数现有方法都是根据历史轨迹(输入)和未来轨迹(输出)之间的相互相关性模型。由于相互相关性只是现象的表面描述,这些方法对于非常用数据的敏感性较高。为了解决这个问题,我们提出了一个 OUT-OF-DISTRIBUTION causal graph(OOD-CG),它明确地定义了数据的底层 causal 结构,包括三个涉及的隐藏特征:1)域对称 causal 特征(IC),2)域特有 causal 特征(VC),和3)域特有 non-causal 特征(VN)。这些特征受到干扰因子(C)和域选择器(D)的混合影响。为了利用 causal 特征进行预测,我们提出了一个 causal 灵感学习框架(CILF),包括三个步骤:1)通过不对称损失提取域对称 causal 特征,2)通过域区别学习提取域特有 causal 特征,和3)通过鼓励 causal 充分性来分离域特有 causal 和 non-causal 特征。我们在主流的 NGSIM 和 INTERACTION 等数据集上评估了 CILF 的表现,实验结果显示了 CILF 在域泛化中的明显改进。

Test-Time Training on Video Streams

  • paper_url: http://arxiv.org/abs/2307.05014
  • repo_url: https://github.com/molyswu/hand_detection
  • paper_authors: Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang
  • for: 这个论文是为了提高在测试时使用已经训练好的模型的性能而设计的。
  • methods: 这个论文使用了在测试时使用自我监督任务,如图像重建使用压缩 autoencoders,来进行模型进一步改进。
  • results: 这个论文在三个实际 datasets 上对四个任务进行了实验,并取得了45% 和 66% 的相对提升。
    Abstract Prior work has established test-time training (TTT) as a general framework to further improve a trained model at test time. Before making a prediction on each test instance, the model is trained on the same instance using a self-supervised task, such as image reconstruction with masked autoencoders. We extend TTT to the streaming setting, where multiple test instances - video frames in our case - arrive in temporal order. Our extension is online TTT: The current model is initialized from the previous model, then trained on the current frame and a small window of frames immediately before. Online TTT significantly outperforms the fixed-model baseline for four tasks, on three real-world datasets. The relative improvement is 45% and 66% for instance and panoptic segmentation. Surprisingly, online TTT also outperforms its offline variant that accesses more information, training on all frames from the entire test video regardless of temporal order. This differs from previous findings using synthetic videos. We conceptualize locality as the advantage of online over offline TTT. We analyze the role of locality with ablations and a theory based on bias-variance trade-off.
    摘要 先前的研究已将测试时训练(TTT)确立为一种在测试时进一步改进已训练模型的通用框架。在对每个测试实例做出预测之前,模型会先在该实例上通过自监督任务(例如使用掩码自编码器进行图像重建)进行训练。我们将TTT扩展到流式场景,其中多个测试实例(在本文中为视频帧)按时间顺序到达。我们的扩展是在线TTT:当前模型由上一个模型初始化,然后在当前帧及其之前的一小段帧窗口上进行训练。在三个真实世界数据集的四个任务上,在线TTT显著优于固定模型基线,其中实例分割和全景分割的相对提升分别为45%和66%。令人意外的是,在线TTT还优于其离线变体,后者可以访问更多信息,即不考虑时间顺序地在整个测试视频的所有帧上训练;这与以往基于合成视频的发现不同。我们将局部性概念化为在线TTT相对于离线TTT的优势,并通过消融实验和基于偏差-方差权衡的理论来分析局部性的作用。
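The online loop itself is simple; a schematic sketch under stated assumptions follows. `model.ssl_loss` stands in for the paper's masked-autoencoder reconstruction task and `model.predict` for the main-task head; both are hypothetical method names, and the window size, learning rate, and number of inner steps are illustrative.

```python
import torch

def online_ttt(model, video_frames, window=4, inner_steps=1, lr=1e-4):
    outputs, buffer = [], []
    for frame in video_frames:                     # frames arrive in temporal order
        buffer = (buffer + [frame])[-window:]      # keep only a small recent window
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(inner_steps):               # adapt before predicting
            optimizer.zero_grad()
            loss = sum(model.ssl_loss(f) for f in buffer) / len(buffer)
            loss.backward()
            optimizer.step()
        with torch.no_grad():
            outputs.append(model.predict(frame))   # prediction after adaptation
    return outputs                                 # the model carries over to the next frame
```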

A DeepLearning Framework for Dynamic Estimation of Origin-Destination Sequence

  • paper_url: http://arxiv.org/abs/2307.05623
  • repo_url: None
  • paper_authors: Zheli Xiong, Defu Lian, Enhong Chen, Gang Chen, Xiaomin Cheng
  • for: 本研究旨在解决交通领域中OD矩阵估算的主要问题,即使用交通传感器测量信息来估算以OD矩阵表示的交通需求。
  • methods: 本研究提议一种集成方法,利用深度学习方法来推断OD序列的结构,并使用结构约束引导传统的数值优化。
  • results: 实验表明,神经网络可以有效地推断OD序列的结构,并为数值优化提供实用的约束以获得更好的结果。此外,实验还表明,所提供的结构信息不仅包含OD矩阵的空间结构约束,还包含OD序列的时间结构约束,可以有效解决延迟问题。
    Abstract OD matrix estimation is a critical problem in the transportation domain. The principal method uses traffic sensor measurements such as traffic counts to estimate the traffic demand represented by the OD matrix. The problem divides into two categories: static OD matrix estimation and dynamic OD matrix sequence (OD sequence for short) estimation. Both face an underdetermination problem caused by the large number of estimated parameters and insufficient constraint information. In addition, OD sequence estimation faces a lag challenge: due to differing traffic conditions such as congestion, identical vehicles appear on different road sections during the same observation period, so identical OD demands correspond to different trips. To this end, this paper proposes an integrated method that uses deep learning to infer the structure of the OD sequence and uses structural constraints to guide traditional numerical optimization. Our experiments show that the neural network (NN) can effectively infer the structure of the OD sequence and provide practical constraints for numerical optimization to obtain better results. Moreover, the experiments show that the provided structural information contains not only constraints on the spatial structure of OD matrices but also constraints on the temporal structure of the OD sequence, which resolves the lag problem well.
    摘要 OD矩阵估计是交通领域中的关键问题。主要方法使用交通仪器测量信息,如交通统计数据,来估计交通需求表示的OD矩阵。问题分为两类:静态OD矩阵估计和动态OD序列(简称OD序列)估计。两者都面临了不充分约束的问题,导致估计过多的参数。此外,OD序列估计还面临着延迟问题:由于不同的交通情况,如拥堵,同一段道路上的同一辆车辆在同一个观察时期出现,导致同一个OD需求对应不同的旅行。为解决这些问题,本文提出了一种集成方法,使用深度学习方法来推断OD序列的结构,并使用结构约束来导引传统的数值优化。我们的实验表明,神经网络(NN)可以有效地推断OD序列的结构,并为数值优化提供实用的约束。此外,实验还表明,提供的结构信息不仅包含OD矩阵的空间结构约束,还提供了OD序列的时间结构约束,这有效解决了延迟问题。

Improving RNN-Transducers with Acoustic LookAhead

  • paper_url: http://arxiv.org/abs/2307.05006
  • repo_url: None
  • paper_authors: Vinit S. Unni, Ashish Mittal, Preethi Jyothi, Sunita Sarawagi
  • for: 这篇论文旨在提高语音转文字的准确率,同时保持流式识别能力。
  • methods: 论文基于 RNN-T 模型,提出了一种名为 LookAhead 的技术,通过提前查看音频输入中的未来信息,使文本表示更贴合声学信息。
  • results: 实验结果显示,使用 LookAhead 技术可使词错误率相对降低 5%-20%,在域内和域外评估集上均有效。
    Abstract RNN-Transducers (RNN-Ts) have gained widespread acceptance as an end-to-end model for speech to text conversion because of their high accuracy and streaming capabilities. A typical RNN-T independently encodes the input audio and the text context, and combines the two encodings by a thin joint network. While this architecture provides SOTA streaming accuracy, it also makes the model vulnerable to strong LM biasing which manifests as multi-step hallucination of text without acoustic evidence. In this paper we propose LookAhead that makes text representations more acoustically grounded by looking ahead into the future within the audio input. This technique yields a significant 5%-20% relative reduction in word error rate on both in-domain and out-of-domain evaluation sets.

Latent Space Perspicacity and Interpretation Enhancement (LS-PIE) Framework

  • paper_url: http://arxiv.org/abs/2307.05620
  • repo_url: None
  • paper_authors: Jesse Stevens, Daniel N. Wilke, Itumeleng Setshedi
  • for: 这个论文的目的是提高线性隐 Variable 模型中的隐空间表示,以提高这些模型的可解释性。
  • methods: 这个论文提出了一个通用框架,用于自动对隐向量进行归类、缩放和排序,以提高每个隐向量的信息含量。这个框架还包括单通道和多通道数据源、数据预处理策略和特定 метри来自动确定隐向量的归类数量。
  • results: 在两个基础问题上,这个框架的效果被证明了,包括LR、LS和LCON等功能。这些功能可以帮助提高线性隐 Variable 模型的可解释性和应用范围。
    Abstract Linear latent variable models such as principal component analysis (PCA), independent component analysis (ICA), canonical correlation analysis (CCA), and factor analysis (FA) identify latent directions (or loadings) either ordered or unordered. The data is then projected onto the latent directions to obtain their projected representations (or scores). For example, PCA solvers usually rank the principal directions by explaining the most to least variance, while ICA solvers usually return independent directions unordered and often with single sources spread across multiple directions as multiple sub-sources, which is of severe detriment to their usability and interpretability. This paper proposes a general framework to enhance latent space representations for improving the interpretability of linear latent spaces. Although the concepts in this paper are language agnostic, the framework is written in Python. This framework automates the clustering and ranking of latent vectors to enhance the latent information per latent vector, as well as, the interpretation of latent vectors. Several innovative enhancements are incorporated including latent ranking (LR), latent scaling (LS), latent clustering (LC), and latent condensing (LCON). For a specified linear latent variable model, LR ranks latent directions according to a specified metric, LS scales latent directions according to a specified metric, LC automatically clusters latent directions into a specified number of clusters, while, LCON automatically determines an appropriate number of clusters into which to condense the latent directions for a given metric. Additional functionality of the framework includes single-channel and multi-channel data sources, data preprocessing strategies such as Hankelisation to seamlessly expand the applicability of linear latent variable models (LLVMs) to a wider variety of data. The effectiveness of LR, LS, and LCON are showcased on two crafted foundational problems with two applied latent variable models, namely, PCA and ICA.
    摘要 Linear 隐变量模型,如主成分分析(PCA)、独立成分分析(ICA)、共谱分析(CCA)和因素分析(FA),可以找到隐向量(或负荷)是有序还是无序的。然后将数据Project onto these latent directions to obtain their projected representations(或分数)。例如,PCA 解决器通常会根据解释最多变量来排序主方向,而 ICA 解决器通常会返回独立的方向,无序,经常有多个来源分散在多个方向中,这会影响其可用性和可读性。这篇文章提出了一种通用框架,用于提高线性隐空间表示的可解释性。尽管这些概念是语言无关的,但框架是写在Python语言中。这个框架可以自动将隐向量集中到减少隐信息的latent vector,以及提高隐向量的解释性。框架包含了多种创新的改进,包括隐向量排名(LR)、隐向量缩放(LS)、隐向量划分(LC)和隐向量压缩(LCON)。对于指定的线性隐变量模型,LR 可以根据指定的度量将隐方向排名,LS 可以根据指定的度量缩放隐方向,LC 可以自动将隐方向分为指定数量的集中,而 LCON 可以自动确定隐方向压缩到指定度量的最佳数量。框架还包括单通道和多通道数据源,以及数据预处理策略,例如束腾化来扩展线性隐变量模型(LLVMs)的应用范围。LR、LS和LCON 的效果在两个基本问题上进行了示例,这两个问题分别使用 PCA 和 ICA 作为应用隐变量模型。
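As a concrete illustration of the latent-ranking (LR) step, the sketch below fits FastICA and reorders its unordered components by how much signal energy each direction carries; this metric and the function name are illustrative choices, not the framework's API.

```python
import numpy as np
from sklearn.decomposition import FastICA

def rank_ica_components(X, n_components=10):
    """Return ICA scores, mixing vectors, and the ranking metric, ordered most-to-least informative."""
    ica = FastICA(n_components=n_components, random_state=0)
    scores = ica.fit_transform(X)                             # (n_samples, n_components), unordered
    # energy each latent direction contributes to reconstructing X
    metric = np.linalg.norm(ica.mixing_, axis=0) * scores.std(axis=0)
    order = np.argsort(metric)[::-1]                          # latent ranking (LR)
    return scores[:, order], ica.mixing_[:, order], metric[order]
```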

Control as Probabilistic Inference as an Emergent Communication Mechanism in Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.05004
  • repo_url: None
  • paper_authors: Tomoaki Nakamura, Akira Taniguchi, Tadahiro Taniguchi
  • for: 这种论文旨在提出一种生成概率模型,整合emergent communication和多个代理人强化学习。
  • methods: 该模型使用概率推理进行控制,并通过隐藏变量和估计来实现代理人之间的交流。
  • results: 通过在网格环境中的实验,我们表明了该PGM可以推理出有意义的消息,以完成合作任务。
    Abstract This paper proposes a generative probabilistic model integrating emergent communication and multi-agent reinforcement learning. The agents plan their actions by probabilistic inference, called control as inference, and communicate using messages that are latent variables and estimated based on the planned actions. Through these messages, each agent can send information about its actions and know information about the actions of another agent. Therefore, the agents change their actions according to the estimated messages to achieve cooperative tasks. This inference of messages can be considered as communication, and this procedure can be formulated by the Metropolis-Hasting naming game. Through experiments in the grid world environment, we show that the proposed PGM can infer meaningful messages to achieve the cooperative task.

Selective Sampling and Imitation Learning via Online Regression

  • paper_url: http://arxiv.org/abs/2307.04998
  • repo_url: None
  • paper_authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
  • for: 本文提出了一种解决Imitation Learning(IL)问题的交互算法,使得在受到噪声专家反馈的情况下,IL可以更加成功。
  • methods: 本文使用选择样本算法,通过咨询噪声专家来获得反馈,以提高IL的性能。
  • results: 本文提供了一种新的选择样本算法,可以在涉及到多个动作和通用函数类型的情况下实现IL。这个算法的 regret bound和查询次数都与在线回归 oracle 相关,并且可以与噪声专家进行有限次的交互。
    Abstract We consider the problem of Imitation Learning (IL) by actively querying noisy expert for feedback. While imitation learning has been empirically successful, much of prior work assumes access to noiseless expert feedback which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to be successful. In contrast, in this work, we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback. Our contributions are twofold: First, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds for the regret and the number of queries. Next, we extend this analysis to the problem of IL with noisy expert feedback and provide a new IL algorithm that makes limited queries. Our algorithm for selective sampling leverages function approximation, and relies on an online regression oracle w.r.t.~the given model class to predict actions, and to decide whether to query the expert for its label. On the theoretical side, the regret bound of our algorithm is upper bounded by the regret of the online regression oracle, while the query complexity additionally depends on the eluder dimension of the model class. We complement this with a lower bound that demonstrates that our results are tight. We extend our selective sampling algorithm for IL with general function approximation and provide bounds on both the regret and the number of queries made to the noisy expert. A key novelty here is that our regret and query complexity bounds only depend on the number of times the optimal policy (and not the noisy expert, or the learner) go to states that have a small margin.
    摘要 我们考虑伪模仿学习(IL)问题,通过活动地发送受惊访问来获得不精确的专家反馈。过往的大部分研究假设可以得到不受扰动的专家反馈,但这在许多应用中不实际。事实上,只有可以获得不精确的专家反馈时,基于专家反馈的非互动式IL(非互动式学习)的算法可以证明需要极大的样本数量才能成功。相比之下,在这个研究中,我们提出了一个互动式IL算法,使用选择性样本来活动地发送受惊访问。我们的贡献包括:首先,我们提出了一个新的选择性样本算法,适用于一般函数类别和多个动作。我们获得了最好的known bounds的 regret和询问数量。其次,我们将这一分析扩展到伪模仿学习问题中,提出了一个新的IL算法,使用选择性样本来获得有限的询问数量。我们的算法利用函数近似,并且透过在line上的 regression oracle 来预测动作,并决定是否需要受惊访问专家。从理论上看,我们的算法的 regret bound是由online regression oracle的 regret bound所 upper bounded,而且询问 complexity 还dependent于模型类别的eluder dimension。我们补充了一个下界,证明我们的结果是紧缩的。 finally,我们扩展了我们的选择性样本算法,提供了具有 regret和询问数量 bounds的IL算法,适用于一般函数类别和多个动作。这个新的特点是,我们的 regret和询问 complexity bounds仅dependent于Optimal policy(而不是专家、学习者)在状态空间中的小margin次数。

Empowering recommender systems using automatically generated Knowledge Graphs and Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04996
  • repo_url: https://github.com/GhanshyamVerma/Explainable-Recommender-System
  • paper_authors: Ghanshyam Verma, Shovon Sengupta, Simon Simanta, Huan Chen, Janos A. Perge, Devishree Pillai, John P. McCrae, Paul Buitelaar
  • For: This paper focuses on developing interpretable knowledge graph-based recommender systems for personalized article recommendations in financial services.
  • Methods: The authors propose two approaches: one using Reinforcement Learning and the other using XGBoost, both of which leverage a knowledge graph generated from structured and unstructured data. The Reinforcement Learning-based approach uses graph traversal paths to provide interpretations, while the XGBoost-based approach uses post-hoc methods such as SHAP and ELI5 to provide explainable results.
  • Results: The approach offers explainable results, promoting better decision-making, and demonstrates the potential of combining advanced machine learning techniques with KG-driven insights for enhancing customer experience in relationship management.
    Abstract Personalized recommendations have a growing importance in direct marketing, which motivates research to enhance customer experiences by knowledge graph (KG) applications. For example, in financial services, companies may benefit from providing relevant financial articles to their customers to cultivate relationships, foster client engagement and promote informed financial decisions. While several approaches center on KG-based recommender systems for improved content, in this study we focus on interpretable KG-based recommender systems for decision making.To this end, we present two knowledge graph-based approaches for personalized article recommendations for a set of customers of a large multinational financial services company. The first approach employs Reinforcement Learning and the second approach uses the XGBoost algorithm for recommending articles to the customers. Both approaches make use of a KG generated from both structured (tabular data) and unstructured data (a large body of text data).Using the Reinforcement Learning-based recommender system we could leverage the graph traversal path leading to the recommendation as a way to generate interpretations (Path Directed Reasoning (PDR)). In the XGBoost-based approach, one can also provide explainable results using post-hoc methods such as SHAP (SHapley Additive exPlanations) and ELI5 (Explain Like I am Five).Importantly, our approach offers explainable results, promoting better decision-making. This study underscores the potential of combining advanced machine learning techniques with KG-driven insights to bolster experience in customer relationship management.
    摘要 个性化推荐在直接营销中的重要性日益增长,这促使研究者通过知识图(KG)应用来提升客户体验。例如,在金融服务领域,公司可以通过向客户提供相关的金融文章来培养关系、促进客户参与,并帮助客户做出更明智的金融决策。虽然许多方法集中于基于 KG 的推荐系统以改进内容,本研究则关注可解释的基于 KG 的推荐系统以支持决策。为此,我们提出了两种基于知识图的方法,用于为一家大型跨国金融服务公司的客户提供个性化文章推荐。第一种方法采用强化学习,第二种方法使用 XGBoost 算法来为客户推荐文章。两种方法都利用了由结构化数据(表格数据)和非结构化数据(大量文本数据)生成的知识图。在基于强化学习的推荐系统中,我们可以利用通向推荐结果的图遍历路径来生成解释(Path Directed Reasoning,PDR)。在基于 XGBoost 的方法中,也可以通过 SHAP 和 ELI5 等事后方法提供可解释的结果。重要的是,我们的方法提供了可解释的结果,有助于做出更好的决策。本研究强调了将先进的机器学习技术与知识图驱动的洞察相结合,以提升客户关系管理体验的潜力。
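The XGBoost route with post-hoc explanations can be sketched as below; the knowledge-graph-derived features are abstracted into a precomputed feature matrix, and the function name and hyperparameters are illustrative assumptions rather than the authors' configuration.

```python
import shap
import xgboost as xgb

def train_and_explain(X_train, y_train, X_test):
    """Rank candidate articles by predicted relevance and attach SHAP explanations."""
    model = xgb.XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
    model.fit(X_train, y_train)

    explainer = shap.TreeExplainer(model)          # post-hoc, per-feature explanation
    shap_values = explainer.shap_values(X_test)

    ranked_articles = model.predict_proba(X_test)[:, 1].argsort()[::-1]
    return ranked_articles, shap_values
```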

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

  • paper_url: http://arxiv.org/abs/2307.04995
  • repo_url: None
  • paper_authors: Zixuan Ma, Haojie Wang, Jingze Xing, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Shizhi Tang, Penghan Wang, Jidong Zhai
  • for: 这个论文主要是为了提高深度神经网络(DNN)的计算效率,以及对不同领域的加速器上的代码生成。
  • methods: 这个论文提出了一个名为IntelliGen的tensor compiler,可以为memory-intensive操作生成高性能的代码,并考虑到 computation和data movement optimizations。
  • results: 在试验IntelliGen时,在NVIDIA GPU、AMD GPU和Cambricon MLU上得到了1.97倍、2.93倍和16.91倍的速度提升(在average上是1.28倍、1.23倍和2.31倍),较现有最高效的框架还要快。
    Abstract Deep neural networks (DNNs) are of critical use in different domains. To accelerate DNN computation, tensor compilers are proposed to generate efficient code on different domain-specific accelerators. Existing tensor compilers mainly focus on optimizing computation efficiency. However, memory access is becoming a key performance bottleneck because the computational performance of accelerators is increasing much faster than memory performance. The lack of direct description of memory access and data dependence in current tensor compilers' intermediate representation (IR) brings significant challenges to generate memory-efficient code. In this paper, we propose IntelliGen, a tensor compiler that can generate high-performance code for memory-intensive operators by considering both computation and data movement optimizations. IntelliGen represent a DNN program using GIR, which includes primitives indicating its computation, data movement, and parallel strategies. This information will be further composed as an instruction-level dataflow graph to perform holistic optimizations by searching different memory access patterns and computation operations, and generating memory-efficient code on different hardware. We evaluate IntelliGen on NVIDIA GPU, AMD GPU, and Cambricon MLU, showing speedup up to 1.97x, 2.93x, and 16.91x(1.28x, 1.23x, and 2.31x on average), respectively, compared to current most performant frameworks.
    摘要 深度神经网络(DNN)在不同领域都有critical使用。为加速DNN计算,tensor编译器被提议,以生成在不同领域特定加速器上的高效代码。现有的tensor编译器主要关注计算效率。然而,内存访问已成为计算加速器的性能瓶颈,因为计算性能的提升速度比内存性能提升的速度要快得多。现有的tensor编译器中间表示(IR)缺乏直接描述内存访问和数据依赖,这使得生成内存高效代码带来重大挑战。为解决这个问题,我们提出了IntelliGen,一种tensor编译器,可以通过考虑计算和数据移动优化来生成高性能的内存高效代码。IntelliGen使用GIR表示DNN程序,GIR包括计算、数据移动和并行策略的元素。这些信息将被组合成为数字水平的数据流图,以实现整体优化。我们对IntelliGen进行了NVIDIA GPU、AMD GPU和Cambricon MLU的测试,并显示了与当前最高性能框架的比较,获得了速度提升为1.97倍、2.93倍和16.91倍(1.28倍、1.23倍和2.31倍的平均提升)。

Uncertainty Quantification of the Virial Black Hole Mass with Conformal Prediction

  • paper_url: http://arxiv.org/abs/2307.04993
  • repo_url: https://github.com/yongsukyee/uncertain_blackholemass
  • paper_authors: Suk Yee Yong, Cheng Soon Ong
  • for: 这个研究的目的是为了测量黑洞质量的精度,以了解黑洞和宿主 галактика之间的演化。
  • methods: 这个研究使用了对称化量划 regression (CQR) 来衡量黑洞预测的不确定性。
  • results: 研究发现,使用 CQR 方法可以提供更有用的预测 интерVAL指标,并且可以根据黑洞质量和相关属性进行调整。
    Abstract Precise measurements of the black hole mass are essential to gain insight on the black hole and host galaxy co-evolution. A direct measure of the black hole mass is often restricted to nearest galaxies and instead, an indirect method using the single-epoch virial black hole mass estimation is used for objects at high redshifts. However, this method is subjected to biases and uncertainties as it is reliant on the scaling relation from a small sample of local active galactic nuclei. In this study, we propose the application of conformalised quantile regression (CQR) to quantify the uncertainties of the black hole predictions in a machine learning setting. We compare CQR with various prediction interval techniques and demonstrated that CQR can provide a more useful prediction interval indicator. In contrast to baseline approaches for prediction interval estimation, we show that the CQR method provides prediction intervals that adjust to the black hole mass and its related properties. That is it yields a tighter constraint on the prediction interval (hence more certain) for a larger black hole mass, and accordingly, bright and broad spectral line width source. Using a combination of neural network model and CQR framework, the recovered virial black hole mass predictions and uncertainties are comparable to those measured from the Sloan Digital Sky Survey. The code is publicly available at https://github.com/yongsukyee/uncertain_blackholemass.
    摘要 精确测量黑洞质量是研究黑洞与宿主星系共同演化的关键。直接测量黑洞质量通常只能在最近的星系中进行,对于高红移天体则采用单历元维里黑洞质量估计这一间接方法;但该方法依赖于由少量近邻活动星系核得到的标度关系,因而存在偏差和不确定性。为此,我们提议在机器学习框架下使用保形化分位数回归(CQR)来量化黑洞质量预测的不确定性。我们将CQR与多种预测区间技术进行比较,并证明CQR可以提供更有用的预测区间指标。与基线方法相比,CQR给出的预测区间会随黑洞质量及其相关属性自适应调整:对于质量更大、谱线更亮更宽的源,它给出更紧的预测区间(即更确定)。结合神经网络模型与CQR框架,恢复得到的维里黑洞质量预测及其不确定性与斯隆数字巡天(SDSS)的测量结果相当。代码可在 https://github.com/yongsukyee/uncertain_blackholemass 获取。
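A compact sketch of conformalized quantile regression on a held-out calibration set is given below; gradient-boosted quantile regressors stand in for the paper's neural network, and the miscoverage level and split are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def cqr_intervals(X_train, y_train, X_cal, y_cal, X_test, alpha=0.1):
    """Return calibrated (lower, upper) prediction intervals at ~(1 - alpha) coverage."""
    lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
    hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)

    # conformity scores: how far calibration targets fall outside the raw interval
    scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
    n = len(y_cal)
    q_level = min(1.0, np.ceil((1 - alpha) * (n + 1)) / n)
    q = np.quantile(scores, q_level)               # calibration margin

    return lo.predict(X_test) - q, hi.predict(X_test) + q
```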

Monotone deep Boltzmann machines

  • paper_url: http://arxiv.org/abs/2307.04990
  • repo_url: None
  • paper_authors: Zhili Feng, Ezra Winston, J. Zico Kolter
  • for: This paper explores the possibility of efficient approximate inference in deep Boltzmann machines (DBMs) by developing a new class of restricted models called monotone DBMs.
  • methods: The paper uses tools from the recently-proposed monotone Deep Equilibrium model to develop a fixed-point iteration that gives a variational mean-field solution for monotone DBMs.
  • results: The paper demonstrates that the proposed approach allows for tasks such as joint completion and classification of images within a single deep probabilistic setting, while avoiding the pitfalls of mean-field inference in traditional RBMs.
    Abstract Deep Boltzmann machines (DBMs), one of the first ``deep'' learning methods ever studied, are multi-layered probabilistic models governed by a pairwise energy function that describes the likelihood of all variables/nodes in the network. In practice, DBMs are often constrained, i.e., via the \emph{restricted} Boltzmann machine (RBM) architecture (which does not permit intra-layer connections), in order to allow for more efficient inference. In this work, we revisit the generic DBM approach, and ask the question: are there other possible restrictions to their design that would enable efficient (approximate) inference? In particular, we develop a new class of restricted model, the monotone DBM, which allows for arbitrary self-connection in each layer, but restricts the \emph{weights} in a manner that guarantees the existence and global uniqueness of a mean-field fixed point. To do this, we leverage tools from the recently-proposed monotone Deep Equilibrium model and show that a particular choice of activation results in a fixed-point iteration that gives a variational mean-field solution. While this approach is still largely conceptual, it is the first architecture that allows for efficient approximate inference in fully-general weight structures for DBMs. We apply this approach to simple deep convolutional Boltzmann architectures and demonstrate that it allows for tasks such as the joint completion and classification of images, within a single deep probabilistic setting, while avoiding the pitfalls of mean-field inference in traditional RBMs.
    摘要 深度博尔茨曼机(DBM)是一种多层概率模型,其中每个层都有一个对应的概率分布,用于描述网络中所有变量/节点的可能性。在实践中,DBM通常会被限制,例如通过使用Restricted Boltzmann Machine(RBM)架构,以便更加有效地进行推理。在这个工作中,我们回到了基本的DBM方法,并问:是否有其他可能的限制,以实现更加有效的推理?特别是,我们开发了一种新的受限模型,即偏好DBM,该模型允许每层任意自连接,但是限制权重的方式,以保证存在和全局唯一的均衡点。为此,我们利用了最近提出的偏好深度平衡模型的工具,并证明在某种特定的激活函数下,这种模型会导致一个均衡点逻辑的解。尽管这种方法仍然是概念上的,但它是DBM中第一种允许高效近似推理的建筑。我们应用这种方法于深度卷积博尔茨曼架构,并示出它可以在单个深度概率设定下完成图像的联合完成和分类任务,而不需要传统RBM的含义场推理。

Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation

  • paper_url: http://arxiv.org/abs/2307.04988
  • repo_url: None
  • paper_authors: Chris Chinenye Emezue, Alexandre Drouin, Tristan Deleu, Stefan Bauer, Yoshua Bengio
  • for: 评估 causal discovery 方法的下游 task 的效果,即 treatment effect estimation。
  • methods: 使用 seven 个基准方法,包括一种新提出的 GFlowNets 方法,对 causal discovery 方法的下游 task 进行评估。
  • results: 研究结果显示,一些算法能够有效地捕捉各种有用且多样的 ATE 模式,而另一些算法往往学习低概率模式,从而影响(未松弛的)召回率和精确率。
    Abstract The practical utility of causality in decision-making is widespread and brought about by the intertwining of causal discovery and causal inference. Nevertheless, a notable gap exists in the evaluation of causal discovery methods, where insufficient emphasis is placed on downstream inference. To address this gap, we evaluate seven established baseline causal discovery methods including a newly proposed method based on GFlowNets, on the downstream task of treatment effect estimation. Through the implementation of a distribution-level evaluation, we offer valuable and unique insights into the efficacy of these causal discovery methods for treatment effect estimation, considering both synthetic and real-world scenarios, as well as low-data scenarios. The results of our study demonstrate that some of the algorithms studied are able to effectively capture a wide range of useful and diverse ATE modes, while some tend to learn many low-probability modes which impacts the (unrelaxed) recall and precision.
    摘要 “ causality 在决策中的实际用途广泛,这与 causal discovery 和 causal inference 的相互关联有关。然而,评估 causal discovery 方法的一个显著的差距是在下游推理方面,现有的评估方法强调上游推理。为了解决这个差距,我们评估了七种已有的基准 causal discovery 方法,包括一种基于 GFlowNets 的新方法,在对医疗效果估计任务上。通过实施分布水平评估,我们提供了有价值和独特的洞察,探讨这些 causal discovery 方法在医疗效果估计任务中的效果,包括合成和实际场景,以及低数据场景。结果显示,一些算法可以有效地捕捉各种有用和多样的 ATE 模式,而其他些则往往学习低概率模式,影响(不压缩)准确率和准确率。”

Secrets of RLHF in Large Language Models Part I: PPO

  • paper_url: http://arxiv.org/abs/2307.04964
  • repo_url: https://github.com/openlmlab/moss-rlhf
  • paper_authors: Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
  • for: 这篇论文的目标是研究基于人类反馈的强化学习(RLHF)技术对齐方法,使大型语言模型(LLMs)成为以人为中心(有帮助、诚实、无害)的助手。
  • methods: 该论文使用 reward models 来度量人类偏好,使用 Proximal Policy Optimization(PPO)优化策略模型的输出,并使用 process supervision 提升逐步推理能力。
  • results: 研究发现,policy constraints 是 PPO 算法有效实现的关键因素;据此,作者提出了 PPO-max 算法,以提高策略模型训练的稳定性。基于主要结果,作者对 RLHF 的能力进行了全面分析,并与 SFT 模型和 ChatGPT 进行了比较。
    Abstract Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Its primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include \textbf{reward models} to measure human preferences, \textbf{Proximal Policy Optimization} (PPO) to optimize policy model outputs, and \textbf{process supervision} to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with huge trial and error cost of large language models, there is a significant barrier for AI researchers to motivate the development of technical alignment and safe landing of LLMs. The stable training of RLHF has still been a puzzle. In the first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training. We identify policy constraints being the key factor for the effective implementation of the PPO algorithm. Therefore, we explore the PPO-max, an advanced version of PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLMs alignment. Therefore, we are eager to release technical reports, reward models and PPO codes, aiming to make modest contributions to the advancement of LLMs.
    摘要 大型语言模型(LLM)已经制定了人工通用智能的发展蓝图。其主要目标是作为人acentric(帮助、诚实、无害)助手。与人类Alignment相当重要,而使用人类反馈学习(RLHF)成为了这一努力的核心技术。现有的技术路径通常包括了优先项目模型来度量人类喜好,使用Proximal Policy Optimization(PPO)来优化政策模型的输出,以及过程监控来提高步骤逻辑能力。但由于优先项目设计、环境互动和机器人训练等因素,加上大型语言模型的实验成本巨大,使得AI研究人员对LLM的技术Alignment和安全落地带出了很大的挑战。RLHF的稳定训练仍然是一个谜。在本报告中,我们分析RLHF的框架,重新评估PPO内部运作,并探索PPO算法中的不同部分对政策代理训练的影响。我们发现政策限制是PPO算法的关键因素。因此,我们探索了PPO-max,一种PPO算法的进阶版本,以提高政策模型训练的稳定性。根据我们的主要结果,我们进行了RLHF能力的全面分析,与SFT模型和ChatGPT进行比较。由于LLMs的开源实现缺乏,因此我们对LLMs的对齐做出了很大的挑战。因此,我们将释出技术报告、优先项目模型和PPO代码,以做出一定的贡献 LLMs的进一步发展。
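For reference, the clipped surrogate objective at the core of the PPO step discussed above can be written in a few lines. Reward-model scoring, KL control against the SFT reference, and advantage estimation are assumed to happen elsewhere in the pipeline, and this is the textbook form rather than the paper's PPO-max variant.

```python
import torch

def ppo_policy_loss(logprobs_new, logprobs_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate objective (returned as a loss to minimize)."""
    ratio = torch.exp(logprobs_new - logprobs_old)               # importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()                 # pessimistic lower bound
```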

DyCL: Dynamic Neural Network Compilation Via Program Rewriting and Graph Optimization

  • paper_url: http://arxiv.org/abs/2307.04963
  • repo_url: None
  • paper_authors: Simin Chen, Shiyi Wei, Cong Liu, Wei Yang
  • for: 提高动态神经网络(DyNNs)的编译效率和性能。
  • methods: 提出一种通用的方法,使得现有的深度学习(DL)编译器可以成功编译DyNNs。该方法包括程序分析和程序转换技术,将动态神经网络转换为多个子神经网络。每个子神经网络独立编译,并且使用主机模块来模拟控制流。
  • results: 对多个动态神经网络进行编译,实现了100%的成功率。同时,生成的执行代码在一些场景下运行速度提高了1.12-20.21倍。
    Abstract DL compiler's primary function is to translate DNN programs written in high-level DL frameworks such as PyTorch and TensorFlow into portable executables. These executables can then be flexibly executed by the deployed host programs. However, existing DL compilers rely on a tracing mechanism, which involves feeding a runtime input to a neural network program and tracing the program execution paths to generate the computational graph necessary for compilation. Unfortunately, this mechanism falls short when dealing with modern dynamic neural networks (DyNNs) that possess varying computational graphs depending on the inputs. Consequently, conventional DL compilers struggle to accurately compile DyNNs into executable code. To address this limitation, we propose \tool, a general approach that enables any existing DL compiler to successfully compile DyNNs. \tool tackles the dynamic nature of DyNNs by introducing a compilation mechanism that redistributes the control and data flow of the original DNN programs during the compilation process. Specifically, \tool develops program analysis and program transformation techniques to convert a dynamic neural network into multiple sub-neural networks. Each sub-neural network is devoid of conditional statements and is compiled independently. Furthermore, \tool synthesizes a host module that models the control flow of the DyNNs and facilitates the invocation of the sub-neural networks. Our evaluation demonstrates the effectiveness of \tool, achieving a 100\% success rate in compiling all dynamic neural networks. Moreover, the compiled executables generated by \tool exhibit significantly improved performance, running between $1.12\times$ and $20.21\times$ faster than the original DyNNs executed on general-purpose DL frameworks.
    摘要 DL编译器的主要功能是将深度学习(DL)程序从高级框架such as PyTorch和TensorFlow转换为可移植的执行程序。这些执行程序可以在部署的主机程序上灵活执行。然而,现有的DL编译器都 rely on一种跟踪机制,即通过 feeding runtime输入到深度学习程序并跟踪程序执行路径来生成必要的计算图来进行编译。然而,这种机制在处理现代动态神经网络(DyNNs)时会遇到问题,因为DyNNs具有因输入而变化的计算图。因此,传统的DL编译器无法准确地编译DyNNs。为解决这个限制,我们提出了\tool,一种通用的方法,可以使任何现有的DL编译器成功编译DyNNs。\tool 处理动态神经网络的方式是通过在编译过程中重新分配控制和数据流的方式来转换动态神经网络。具体来说,\tool 开发了程序分析和程序转换技术,将动态神经网络转换为多个子神经网络。每个子神经网络都是无条件语句的,可以独立地编译。此外,\tool synthesizes主机模块,模拟动态神经网络的控制流,并且促进了子神经网络的邀请。我们的评估表明,\tool 的效果非常出色,所有的动态神经网络都成功编译。此外,由\tool 生成的执行程序表现出色,在特定的情况下,与原始 DyNNs 执行在通用DL框架上的性能相比,具有1.12倍至20.21倍的提升。

Intrinsically motivated graph exploration using network theories of human curiosity

  • paper_url: http://arxiv.org/abs/2307.04962
  • repo_url: https://github.com/spatank/GraphRL
  • paper_authors: Shubhankar P. Patankar, Mathieu Ouellet, Juan Cervino, Alejandro Ribeiro, Kieran A. Murphy, Dani S. Bassett
  • for: 这篇论文主要是为了解决如何在图structured数据中引导探索,而不需要外部奖励。
  • methods: 该论文提出了一种基于图神经网络学习的探索方法,使用了人类好奇的两种理论:信息差距理论和压缩进步理论。
  • results: 在多个synthetically生成的图上,训练过的代理人能够在更大的环境和更长的探索步骤上generalize,并且比较有效率地计算 topological feature。此外,好奇基于探索的推荐系统也比PageRank中心性更能预测人类行为,在MovieLens、Amazon Books和Wikispeedia等实际图 dataset上得到了证明。
    Abstract Intrinsically motivated exploration has proven useful for reinforcement learning, even without additional extrinsic rewards. When the environment is naturally represented as a graph, how to guide exploration best remains an open question. In this work, we propose a novel approach for exploring graph-structured data motivated by two theories of human curiosity: the information gap theory and the compression progress theory. The theories view curiosity as an intrinsic motivation to optimize for topological features of subgraphs induced by the visited nodes in the environment. We use these proposed features as rewards for graph neural-network-based reinforcement learning. On multiple classes of synthetically generated graphs, we find that trained agents generalize to larger environments and to longer exploratory walks than are seen during training. Our method computes more efficiently than the greedy evaluation of the relevant topological properties. The proposed intrinsic motivations bear particular relevance for recommender systems. We demonstrate that curiosity-based recommendations are more predictive of human behavior than PageRank centrality for several real-world graph datasets, including MovieLens, Amazon Books, and Wikispeedia.
    摘要 天生有探索的恩恵,即使没有外部奖励,也可以有效地促进学习。当环境自然表示为图时,如何最好引导探索仍然是一个开放问题。在这项工作中,我们提议一种新的方法,通过利用访问节点所induced的子图特征来驱动graph neural network基于的奖励学习。我们使用这些提议的特征作为奖励,以便促进探索。在多种 sintetically生成的图上,我们发现训练的代理人可以在更大的环境和更长的探索步骤上generalize。我们的方法更加高效,而不是仅仅是评估相关的topological特征。我们的内在动机具有特别 relevance для推荐系统。我们示示了curiosity-based推荐的更高predictive power than PageRank中心性for several real-world graph datasets, including MovieLens, Amazon Books, and Wikispeedia.
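To illustrate the shape of such an intrinsic reward, the sketch below scores the subgraph induced by the visited nodes and greedily expands toward the neighbor that maximizes it. The specific "missing edges" functional is only a crude stand-in for the information-gap and compression-progress features the paper uses, and the greedy loop replaces the paper's GNN-based reinforcement learner.

```python
import networkx as nx

def curiosity_reward(graph, visited_nodes):
    """Topological reward of the subgraph induced by the visited nodes."""
    sub = graph.subgraph(visited_nodes)
    n = sub.number_of_nodes()
    possible = n * (n - 1) / 2
    return possible - sub.number_of_edges()        # more "gaps" -> larger reward (illustrative proxy)

def greedy_explore(graph, start, steps):
    visited = [start]
    for _ in range(steps):
        frontier = set(nx.node_boundary(graph, visited))
        if not frontier:
            break
        # pick the neighbor whose addition maximizes the intrinsic reward
        nxt = max(frontier, key=lambda v: curiosity_reward(graph, visited + [v]))
        visited.append(nxt)
    return visited
```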

Reinforcement Learning with Non-Cumulative Objective

  • paper_url: http://arxiv.org/abs/2307.04957
  • repo_url: https://github.com/willtop/Reinforcement_Learning_With_Non-Cumulative_Objective
  • paper_authors: Wei Cui, Wei Yu
  • for: solving optimal control and reinforcement learning problems with non-cumulative objectives
  • methods: modifying existing algorithms using a generalized Bellman update rule and providing sufficient conditions for globally optimal convergence
  • results: demonstrating the idea experimentally on classical tasks and network routing problems, and achieving better performance compared to traditional methods
    Abstract In reinforcement learning, the objective is almost always defined as a \emph{cumulative} function over the rewards along the process. However, there are many optimal control and reinforcement learning problems in various application fields, especially in communications and networking, where the objectives are not naturally expressed as summations of the rewards. In this paper, we recognize the prevalence of non-cumulative objectives in various problems, and propose a modification to existing algorithms for optimizing such objectives. Specifically, we dive into the fundamental building block for many optimal control and reinforcement learning algorithms: the Bellman optimality equation. To optimize a non-cumulative objective, we replace the original summation operation in the Bellman update rule with a generalized operation corresponding to the objective. Furthermore, we provide sufficient conditions on the form of the generalized operation as well as assumptions on the Markov decision process under which the globally optimal convergence of the generalized Bellman updates can be guaranteed. We demonstrate the idea experimentally with the bottleneck objective, i.e., the objectives determined by the minimum reward along the process, on classical optimal control and reinforcement learning tasks, as well as on two network routing problems on maximizing the flow rates.
    摘要 在再增强学习中,目标通常是一个累积函数,表示过程中的奖励的总和。然而,在各种应用领域中,特别是在通信和网络领域,有许多优化控制和再增强学习问题,其目标不是自然地表示为奖励的总和。在这篇论文中,我们认为这种非累积目标在各种问题中很普遍,并提出修改现有算法来优化这类目标的方法。specifically,我们探究了许多优化控制和再增强学习算法的基本构建块:贝尔曼优化Equation。为了优化非累积目标,我们在贝尔曼更新规则中replace原始的总和操作,使用一种通用化的操作,与目标相对应。此外,我们还提供了 garantía global optimal convergence of the generalized Bellman updates可以 garantía的条件和假设,即Markov decision process的形式和assumptions。我们在经典优化控制和再增强学习任务上,以及两个网络流量最大化问题上,通过实验证明了这个想法。
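As a toy illustration of the generalized Bellman update, the sketch below runs value iteration for the bottleneck objective, where the usual sum of rewards is replaced by the minimum reward along the trajectory. A tabular MDP with deterministic transitions is a simplifying assumption made here for brevity.

```python
import numpy as np

def bottleneck_value_iteration(rewards, next_state, n_states, n_actions, iters=100):
    """
    rewards[s, a]    : immediate reward of taking action a in state s
    next_state[s, a] : deterministic successor state (integer index)
    """
    V = np.full(n_states, np.inf)                    # identity element of the min operation
    for _ in range(iters):
        Q = np.minimum(rewards, V[next_state])       # generalized Bellman backup: sum -> min
        V = Q.max(axis=1)                            # greedy maximization over actions
    return V
```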

Hybrid hidden Markov LSTM for short-term traffic flow prediction

  • paper_url: http://arxiv.org/abs/2307.04954
  • repo_url: None
  • paper_authors: Agnimitra Sengupta, Adway Das, S. Ilgin Guler
  • for: 预测交通流量
  • methods: 使用深度学习方法(如RNN和其变种)和hybrid hidden Markov-LSTM模型
  • results: 比使用传统方法(如Markov switching ARIMA和LSTM)表现出显著的性能提升
    Abstract Deep learning (DL) methods have outperformed parametric models such as historical average, ARIMA and variants in predicting traffic variables into short and near-short future, that are critical for traffic management. Specifically, recurrent neural network (RNN) and its variants (e.g. long short-term memory) are designed to retain long-term temporal correlations and therefore are suitable for modeling sequences. However, multi-regime models assume the traffic system to evolve through multiple states (say, free-flow, congestion in traffic) with distinct characteristics, and hence, separate models are trained to characterize the traffic dynamics within each regime. For instance, Markov-switching models with a hidden Markov model (HMM) for regime identification is capable of capturing complex dynamic patterns and non-stationarity. Interestingly, both HMM and LSTM can be used for modeling an observation sequence from a set of latent or, hidden state variables. In LSTM, the latent variable is computed in a deterministic manner from the current observation and the previous latent variable, while, in HMM, the set of latent variables is a Markov chain. Inspired by research in natural language processing, a hybrid hidden Markov-LSTM model that is capable of learning complementary features in traffic data is proposed for traffic flow prediction. Results indicate significant performance gains in using hybrid architecture compared to conventional methods such as Markov switching ARIMA and LSTM.
    摘要 深度学习(DL)方法在预测短期和近期交通变量方面已超越历史平均值、ARIMA及其变种等参数模型,而这些预测对交通管理至关重要。特别是,循环神经网络(RNN)及其变种(例如长短期记忆网络LSTM)能够保留长期时间相关性,因此适用于序列建模。然而,多状态模型假设交通系统在多个具有不同特征的状态(例如自由流、拥堵)之间演化,因此需要分别训练模型来刻画每个状态内的交通动态。例如,采用隐马尔可夫模型(HMM)识别状态的Markov switching模型能够捕捉复杂的动态模式和非平稳性。值得注意的是,HMM和LSTM都可以由一组潜在(隐)状态变量来建模观测序列:在LSTM中,潜变量由当前观测和上一时刻潜变量以确定性方式计算;而在HMM中,潜变量构成一条马尔可夫链。受自然语言处理研究的启发,本文提出了一种混合隐马尔可夫-LSTM模型,用于学习交通数据中的互补特征,以进行交通流量预测。结果表明,与Markov switching ARIMA和LSTM等传统方法相比,混合架构带来了显著的性能提升。

Compact Twice Fusion Network for Edge Detection

  • paper_url: http://arxiv.org/abs/2307.04952
  • repo_url: https://github.com/li-yachuan/ctfn-pytorch-master
  • paper_authors: Yachuan Li, Zongmin Li, Xavier Soria P., Chaozhi Yang, Qian Xiao, Yun Bai, Hua Li, Xiangdong Wang
  • for: 本研究旨在提出一种可靠的多尺度特征融合网络,以便实现高精度Edge detection的目标。
  • methods: 该网络使用了两种轻量级多尺度特征融合模块:一个具有semantic enhancement module(SEM),可以利用粗略度特征中的semantic信息来引导细粒度特征的学习;另一个具有pseudo pixel-level weighting(PPW)模块,可以将多尺度特征的补做weighting,以便更好地融合多尺度特征。
  • results: 该方法在BSDS500、NYUDv2和BIPEDv2等三个 dataset上进行了评估,与state-of-the-art方法相比,CTFN达到了竞争性的准确率,而且具有较少的参数和计算成本。特别是,除了基础模型外,CTFN只需要0.1M的额外参数,这使得其计算成本降低到了60%以下。
    Abstract The significance of multi-scale features has been gradually recognized by the edge detection community. However, the fusion of multi-scale features increases the complexity of the model, which is not friendly to practical application. In this work, we propose a Compact Twice Fusion Network (CTFN) to fully integrate multi-scale features while maintaining the compactness of the model. CTFN includes two lightweight multi-scale feature fusion modules: a Semantic Enhancement Module (SEM) that can utilize the semantic information contained in coarse-scale features to guide the learning of fine-scale features, and a Pseudo Pixel-level Weighting (PPW) module that aggregate the complementary merits of multi-scale features by assigning weights to all features. Notwithstanding all this, the interference of texture noise makes the correct classification of some pixels still a challenge. For these hard samples, we propose a novel loss function, coined Dynamic Focal Loss, which reshapes the standard cross-entropy loss and dynamically adjusts the weights to correct the distribution of hard samples. We evaluate our method on three datasets, i.e., BSDS500, NYUDv2, and BIPEDv2. Compared with state-of-the-art methods, CTFN achieves competitive accuracy with less parameters and computational cost. Apart from the backbone, CTFN requires only 0.1M additional parameters, which reduces its computation cost to just 60% of other state-of-the-art methods. The codes are available at https://github.com/Li-yachuan/CTFN-pytorch-master.
    摘要 “多尺度特征的重要性逐渐被edge detection社区所认可。然而,多尺度特征的融合增加模型的复杂度,不易应用。在这个工作中,我们提出了一个Compact Twice Fusion Network(CTFN),可以充分融合多尺度特征,同时保持模型的简洁性。CTFN包括两个轻量级多尺度特征融合模组:一个Semantic Enhancement Module(SEM)可以利用粗细度特征中的 semantics信息来导引细细度特征的学习,以及一个Pseudo Pixel-level Weighting(PPW)模组可以将多尺度特征中的 complementary advantages聚集到所有特征上。尽管如此,隐藏在文本腐败中的杂质项目仍然是一个挑战。为了解决这个问题,我们提出了一个新的损失函数,即Dynamic Focal Loss,它可以重新定义标准十进法损失函数,并动态调整权重,以正确处理困难样本。我们在BSDS500、NYUDv2和BIPEDv2三个 dataset上评估了我们的方法,与现有的方法相比,CTFN实现了竞争的精度,仅需0.1M的额外参数,对应computational cost的减少为60%。代码可以在https://github.com/Li-yachuan/CTFN-pytorch-master上获取。”
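The general shape of such a loss can be sketched as below: binary cross-entropy reshaped by a focal term whose strength can be scheduled during training. The exact dynamic re-weighting rule of the paper's Dynamic Focal Loss is not reproduced here; this is only the standard focal form it builds on, with illustrative tensor shapes.

```python
import torch
import torch.nn.functional as F

def focal_edge_loss(logits, targets, gamma):
    """logits, targets: (N, 1, H, W) edge maps; gamma can be scheduled over training."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                        # probability assigned to the true class
    return ((1.0 - p_t) ** gamma * bce).mean()   # down-weight easy pixels, focus on hard samples
```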

DDGM: Solving inverse problems by Diffusive Denoising of Gradient-based Minimization

  • paper_url: http://arxiv.org/abs/2307.04946
  • repo_url: None
  • paper_authors: Kyle Luther, H. Sebastian Seung
  • for: This paper is written for solving the inverse problem of tomographic reconstruction from electron micrographs acquired at multiple tilt angles.
  • methods: The paper proposes a simpler approach that combines traditional gradient-based minimization of reconstruction error with denoising, using a convolutional neural network (CNN) as a prior. The method adds noise at each step and uses an iterative dynamics resembling a Langevin or diffusion process, with the level of added noise and the size of the denoising step decaying exponentially with time.
  • results: The paper shows that high accuracy can be achieved with as few as 50 denoising steps, and compares the proposed method with more complex diffusion methods such as DDRM and DPS. The results demonstrate that the proposed method is more accurate (as measured by MSE and SSIM) for the tomography problem, and can be applied to reconstruction of arbitrary-sized images.
    Abstract Inverse problems generally require a regularizer or prior for a good solution. A recent trend is to train a convolutional net to denoise images, and use this net as a prior when solving the inverse problem. Several proposals depend on a singular value decomposition of the forward operator, and several others backpropagate through the denoising net at runtime. Here we propose a simpler approach that combines the traditional gradient-based minimization of reconstruction error with denoising. Noise is also added at each step, so the iterative dynamics resembles a Langevin or diffusion process. Both the level of added noise and the size of the denoising step decay exponentially with time. We apply our method to the problem of tomographic reconstruction from electron micrographs acquired at multiple tilt angles. With empirical studies using simulated tilt views, we find parameter settings for our method that produce good results. We show that high accuracy can be achieved with as few as 50 denoising steps. We also compare with DDRM and DPS, more complex diffusion methods of the kinds mentioned above. These methods are less accurate (as measured by MSE and SSIM) for our tomography problem, even after the generation hyperparameters are optimized. Finally we extend our method to reconstruction of arbitrary-sized images and show results on 128 $\times$ 1568 pixel images
    摘要 “倒Problems通常需要一个正规化或先验的方法以获得好的解决方案。现在的趋势是使用卷积网来去噪图像,并将这个网络用作解决倒Problem的先验。一些提案靠摄Singular value decomposition of the forward operator,另一些透过在Runtime backpropagating through the denoising net。我们提出了一种更简单的方法,它结合了传统的梯度基于的最小化重建错误和去噪。噪音也会在每步加入,因此迭代运算类似于兰格温或漫游过程。噪音水平和去噪步骤的减少阶段落逐渐呈指数衰减。我们将方法应用到电子显微镜中的 Tomographic Reconstruction问题上。通过实验使用模拟的倾斜角度,我们获得了适当的参数设定,并证明高精度可以在50个去噪步骤中获得。我们还与DDRM和DPS等更复杂的演化方法进行比较,这些方法在我们的Tomography问题上较低的Mean Squared Error和Structural Similarity Index Measure。最后,我们将方法扩展到任意大小的图像重建问题上,并在128 $\times$ 1568像素图像上显示结果。”
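A schematic sketch of the iterative scheme follows: alternate a gradient step on the measurement-fidelity term with a step toward the CNN-denoised image, inject Gaussian noise, and let both the denoising step size and the noise level decay exponentially. `forward_op` and `denoiser` are placeholders for the tomographic projector and the trained denoising network, and the step sizes and decay rate are illustrative assumptions.

```python
import torch

def ddgm_reconstruct(measurements, forward_op, denoiser, shape,
                     steps=50, lr=1e-2, lam0=1.0, sigma0=0.1, decay=0.9):
    x = torch.zeros(shape, requires_grad=True)
    opt = torch.optim.SGD([x], lr=lr)
    for t in range(steps):
        lam, sigma = lam0 * decay ** t, sigma0 * decay ** t    # exponentially decaying schedules
        opt.zero_grad()
        loss = ((forward_op(x) - measurements) ** 2).mean()    # data-fidelity (reconstruction error)
        loss.backward()
        opt.step()
        with torch.no_grad():
            x += lam * (denoiser(x) - x)        # move toward the denoised image (CNN prior)
            x += sigma * torch.randn_like(x)    # Langevin/diffusion-style noise injection
    return x.detach()
```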

Benchmarking Algorithms for Federated Domain Generalization

  • paper_url: http://arxiv.org/abs/2307.04942
  • repo_url: https://github.com/inouye-lab/feddg_benchmark
  • paper_authors: Ruqi Bai, Saurabh Bagchi, David I. Inouye
  • for: This paper is written for evaluating the performance of Federated Domain Generalization (FedDG) methods, which is a new challenge in Federated Learning (FL) that involves dealing with diverse client datasets.
  • methods: The paper proposes a benchmark methodology for FedDG, which includes controlling the number and heterogeneity of clients and providing metrics for dataset difficulty. The authors also evaluate 13 FedDG methods, including centralized DG methods adapted to the FL context, FL methods that handle client heterogeneity, and methods designed specifically for FedDG.
  • results: The paper shows that despite some progress, there remain significant performance gaps in FedDG, particularly when evaluating with a large number of clients, high client heterogeneity, or more realistic datasets. The authors also find that the performance of FedDG methods can be improved by using a larger number of clients and more diverse datasets.Here’s the simplified Chinese text for the three key points:
  • for: 这篇论文是为评估 Federated Domain Generalization(FedDG)方法的性能,这是 Federated Learning(FL)中新的挑战,它涉及处理多个客户端数据集的多样性。
  • methods: 论文提出了一种 FedDG 评估方法方法,包括控制客户端数量和多样性,以及提供数据集困难度指标。作者还评估了 13 种 FedDG 方法,包括中央 DG 方法在 FL 上的修改,FL 方法可以处理客户端多样性,以及特制 для FedDG 的方法。
  • results: 论文显示,尽管有一些进步,但 FedDG 中的性能仍然存在显著的性能差距,特别是在评估多个客户端、高客户端多样性或更真实的数据集时。作者们还发现,通过使用更多的客户端和更多的多样的数据集,FedDG 方法的性能可以得到改进。
    Abstract While prior domain generalization (DG) benchmarks consider train-test dataset heterogeneity, we evaluate Federated DG which introduces federated learning (FL) specific challenges. Additionally, we explore domain-based heterogeneity in clients' local datasets - a realistic Federated DG scenario. Prior Federated DG evaluations are limited in terms of the number or heterogeneity of clients and dataset diversity. To address this gap, we propose an Federated DG benchmark methodology that enables control of the number and heterogeneity of clients and provides metrics for dataset difficulty. We then apply our methodology to evaluate 13 Federated DG methods, which include centralized DG methods adapted to the FL context, FL methods that handle client heterogeneity, and methods designed specifically for Federated DG. Our results suggest that despite some progress, there remain significant performance gaps in Federated DG particularly when evaluating with a large number of clients, high client heterogeneity, or more realistic datasets. Please check our extendable benchmark code here: https://github.com/inouye-lab/FedDG_Benchmark.
    摘要 “对于联边学习(Federated Learning,FL)中的网络统一化(Domain Generalization,DG),我们评估了联边网络统一化(Federated DG),并将联边学习特有的挑战纳入考虑。另外,我们还探索了客户端本地数据中的领域差异,这是现实中联边学习中的常见情况。过去的联边DG评估仅仅具有一些客户和数据的限制,无法反映现实中联边学习的多样性和问题。为了缓解这个问题,我们提出了一个联边DG评估方法,可以控制客户和数据的数量和多样性,并提供了评估数据的困难度的指标。我们运用这个方法评估了13种联边DG方法,包括中央化DG方法在FL上的适应,FL方法可以处理客户端的多样性,以及特地设计 для联边DG的方法。我们的结果显示,虽然有一些进步,但在处理大量客户、高客户多样性或更真实的数据时仍然存在较大的性能差距。您可以在以下链接中获取我们的可extendable benchmark代码:https://github.com/inouye-lab/FedDG_Benchmark。”

Impact of Feature Encoding on Malware Classification Explainability

  • paper_url: http://arxiv.org/abs/2307.05614
  • repo_url: None
  • paper_authors: Elyes Manai, Mohamed Mejri, Jaouhar Fattahi
  • for: 这个论文研究了对于可解释人工智能(XAI)算法的特征编码技术的影响。
  • methods: 使用一个恶意软件分类 dataset,我们训练了一个 XGBoost 模型,并比较了两种特征编码方法:标签编码(LE)和一个热点编码(OHE)。
  • results: 我们发现,使用 OHE 相比 LE,表现只有微不足。但是,OHE 提供了更详细的解释,使得更深入的探究详细信息。此外,我们发现 OHE 使得解释文件更小,降低了人类分析员的分析时间。这些发现强调了考虑特征编码技术在 XAI 研究中的重要性,并建议进一步探索采用其他编码方法和创新视觉方法。
    Abstract This paper investigates the impact of feature encoding techniques on the explainability of XAI (Explainable Artificial Intelligence) algorithms. Using a malware classification dataset, we trained an XGBoost model and compared the performance of two feature encoding methods: Label Encoding (LE) and One Hot Encoding (OHE). Our findings reveal a marginal performance loss when using OHE instead of LE. However, the more detailed explanations provided by OHE compensated for this loss. We observed that OHE enables deeper exploration of details in both global and local contexts, facilitating more comprehensive answers. Additionally, we observed that using OHE resulted in smaller explanation files and reduced analysis time for human analysts. These findings emphasize the significance of considering feature encoding techniques in XAI research and suggest potential for further exploration by incorporating additional encoding methods and innovative visualization approaches.
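A small sketch of the comparison set up above: the same XGBoost classifier trained on label-encoded versus one-hot-encoded categorical features, with OrdinalEncoder used as the per-column analogue of label encoding. The column types and hyperparameters are illustrative assumptions, not the thesis's exact setup.

```python
import pandas as pd
import xgboost as xgb
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

def fit_with_encoding(X: pd.DataFrame, y, encoding="ohe"):
    """Fit XGBoost on one-hot ('ohe') or label/ordinal ('le') encoded categorical features."""
    if encoding == "ohe":
        enc = OneHotEncoder(handle_unknown="ignore")
    else:
        enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    X_enc = enc.fit_transform(X)
    model = xgb.XGBClassifier(n_estimators=300, max_depth=6)
    model.fit(X_enc, y)
    return model, enc
```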

Towards Fair Graph Neural Networks via Graph Counterfactual

  • paper_url: http://arxiv.org/abs/2307.04937
  • repo_url: https://github.com/timelovercc/caf-gnn
  • paper_authors: Zhimeng Guo, Jialiang Li, Teng Xiao, Yao Ma, Suhang Wang
  • for: 本文针对 Graph Neural Networks (GNNs) 的偏见问题进行研究,尤其是 GNNs 在训练数据中继承和增强偏见的问题。
  • methods: 本文提出了一个名为 CAF 的新框架,它可以从训练数据中选择合理的 counterfactual,以避免非现实的 counterfactual,并将选择的 counterfactual 用于学习公平的node表示。
  • results: 实验结果显示,CAF 可以对 synthetic 和 real-world 数据进行优化,并且可以增强 GNNs 的公平性。
    Abstract Graph neural networks have shown great ability in representation (GNNs) learning on graphs, facilitating various tasks. Despite their great performance in modeling graphs, recent works show that GNNs tend to inherit and amplify the bias from training data, causing concerns of the adoption of GNNs in high-stake scenarios. Hence, many efforts have been taken for fairness-aware GNNs. However, most existing fair GNNs learn fair node representations by adopting statistical fairness notions, which may fail to alleviate bias in the presence of statistical anomalies. Motivated by causal theory, there are several attempts utilizing graph counterfactual fairness to mitigate root causes of unfairness. However, these methods suffer from non-realistic counterfactuals obtained by perturbation or generation. In this paper, we take a causal view on fair graph learning problem. Guided by the casual analysis, we propose a novel framework CAF, which can select counterfactuals from training data to avoid non-realistic counterfactuals and adopt selected counterfactuals to learn fair node representations for node classification task. Extensive experiments on synthetic and real-world datasets show the effectiveness of CAF. Our code is available at https://github.com/TimeLovercc/CAF-GNN.
    摘要 图神经网络(GNNs)在图表示学习方面展现出强大的能力,支撑了多种任务。然而,近期研究表明,GNNs 往往会继承并放大训练数据中的偏见,这使其在高风险场景中的应用引发担忧,因此已有许多面向公平性的 GNN 工作。然而,现有的公平 GNN 大多采用统计公平性概念来学习公平的节点表示,在存在统计异常时可能无法缓解偏见。受因果理论启发,已有若干工作利用图反事实公平性来消除不公平的根本原因,但这些方法依赖由扰动或生成得到的反事实样本,往往并不真实。本文从因果视角审视公平图学习问题:在因果分析的指导下,我们提出了新框架 CAF,它从训练数据中挑选反事实样本以避免不真实的反事实,并利用所选反事实为节点分类任务学习公平的节点表示。在合成与真实数据集上的大量实验验证了 CAF 的有效性。代码见 https://github.com/TimeLovercc/CAF-GNN。

Substance or Style: What Does Your Image Embedding Know?

  • paper_url: http://arxiv.org/abs/2307.05610
  • repo_url: None
  • paper_authors: Cyrus Rashtchian, Charles Herrmann, Chun-Sung Ferng, Ayan Chakrabarti, Dilip Krishnan, Deqing Sun, Da-Cheng Juan, Andrew Tomkins
  • for: 这个论文是为了研究图像基础模型中的非 semantic信息,以及这些基础模型在不同下游任务中的表现。
  • methods: 作者使用了一系列的变换预测任务来测试图像基础模型中的非 semantic信息,包括图像风格、质量和自然/人工变换等多个轴。
  • results: 研究发现,六种图像基础模型(包括SimCLR)中的embeddings含有许多非 semantic信息,可以识别多达数十种变换。此外,作者还发现,使用图像文本模型(CLIP和ALIGN)可以更好地识别新的样式转移示例,而使用面具模型(CAN和MAE)则更适合隐藏变换任务。
    Abstract Probes are small networks that predict properties of underlying data from embeddings, and they provide a targeted, effective way to illuminate the information contained in embeddings. While analysis through the use of probes has become standard in NLP, there has been much less exploration in vision. Image foundation models have primarily been evaluated for semantic content. Better understanding the non-semantic information in popular embeddings (e.g., MAE, SimCLR, or CLIP) will shed new light both on the training algorithms and on the uses for these foundation models. We design a systematic transformation prediction task and measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations. Surprisingly, six embeddings (including SimCLR) encode enough non-semantic information to identify dozens of transformations. We also consider a generalization task, where we group similar transformations and hold out several for testing. We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE). Overall, our results suggest that the choice of pre-training algorithm impacts the types of information in the embedding, and certain models are better than others for non-semantic downstream tasks.
    摘要 探针(probe)是一类小型网络,用于从嵌入中预测底层数据的属性,为揭示嵌入所包含的信息提供了一种有针对性且高效的手段。基于探针的分析在 NLP 中已成为标准做法,但在视觉领域的探索要少得多:图像基础模型主要只被评估其语义内容。更好地理解流行嵌入(如 MAE、SimCLR 或 CLIP)中的非语义信息,将为其训练算法以及这些基础模型的用途带来新的认识。我们设计了一个系统性的变换预测任务,沿多个维度(包括图像风格、质量以及一系列自然与人工变换)度量嵌入中的视觉内容。出人意料的是,六种嵌入(包括 SimCLR)编码了足够多的非语义信息,足以识别数十种变换。我们还考虑了一个泛化任务:将相似的变换分组,并留出若干组用于测试。结果发现,图文模型(CLIP 与 ALIGN)在识别新的风格迁移样例方面优于基于掩码的模型(CAN 与 MAE)。总体而言,我们的结果表明预训练算法的选择会影响嵌入中所含信息的类型,并且某些模型在非语义下游任务上优于其他模型。
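A minimal sketch of the probing setup described above, under the assumption that frozen image embeddings are already available; here the embeddings are simulated with a weak per-transformation signature, whereas a real experiment would extract MAE/SimCLR/CLIP features from transformed images.

```python
# A minimal probe sketch, assuming frozen image embeddings are available.
# The "embeddings" here are simulated; in practice they would come from a
# pretrained encoder (e.g., SimCLR or CLIP) applied to transformed images.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_per_class, dim, n_transforms = 500, 128, 4  # e.g., blur, jpeg, rotate, identity

# Each transformation leaves a weak, consistent signature direction in
# embedding space on top of the (dominant) semantic content.
semantic = rng.normal(size=(n_per_class * n_transforms, dim))
signatures = rng.normal(size=(n_transforms, dim)) * 0.5
labels = np.repeat(np.arange(n_transforms), n_per_class)
embeddings = semantic + signatures[labels]

Xtr, Xte, ytr, yte = train_test_split(embeddings, labels,
                                      test_size=0.3, random_state=0)
probe = LogisticRegression(max_iter=2000).fit(Xtr, ytr)  # linear probe
print("probe accuracy on held-out embeddings:", probe.score(Xte, yte))
# Accuracy well above chance (1/4 here) indicates the embedding encodes
# non-semantic information about which transformation was applied.
```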

Probabilistic Counterexample Guidance for Safer Reinforcement Learning (Extended Version)

  • paper_url: http://arxiv.org/abs/2307.04927
  • repo_url: None
  • paper_authors: Xiaotong Ji, Antonio Filieri
  • for: 本研究旨在解决RL在安全关键场景中的限制,因为失败可能导致高成本。
  • methods: 我们以安全需求的反例引导训练:将连续与离散状态空间系统抽象为紧凑的抽象模型,并利用概率反例生成构造最小仿真子模型,使智能体可以高效地离线训练并改进其策略。
  • results: 我们的方法可以在静止训练和在线探索中减少安全违反的风险,并且可以与QL和DQN标准算法和先前的相关工作相比,提高了安全性和总奖励的性能。
    Abstract Safe exploration aims at addressing the limitations of Reinforcement Learning (RL) in safety-critical scenarios, where failures during trial-and-error learning may incur high costs. Several methods exist to incorporate external knowledge or to use proximal sensor data to limit the exploration of unsafe states. However, reducing exploration risks in unknown environments, where an agent must discover safety threats during exploration, remains challenging. In this paper, we target the problem of safe exploration by guiding the training with counterexamples of the safety requirement. Our method abstracts both continuous and discrete state-space systems into compact abstract models representing the safety-relevant knowledge acquired by the agent during exploration. We then exploit probabilistic counterexample generation to construct minimal simulation submodels eliciting safety requirement violations, where the agent can efficiently train offline to refine its policy towards minimising the risk of safety violations during the subsequent online exploration. We demonstrate our method's effectiveness in reducing safety violations during online exploration in preliminary experiments by an average of 40.3% compared with QL and DQN standard algorithms and 29.1% compared with previous related work, while achieving comparable cumulative rewards with respect to unrestricted exploration and alternative approaches.
    摘要 安全探索旨在解决强化学习(RL)在安全关键场景中的局限:在试错学习过程中,失败可能带来高昂代价。已有多种方法通过引入外部知识或利用近端传感器数据来限制对不安全状态的探索;然而,在未知环境中,智能体必须在探索过程中自行发现安全威胁,此时降低探索风险仍然十分困难。本文通过以安全需求的反例引导训练来处理安全探索问题。我们的方法将连续与离散状态空间系统抽象为紧凑的抽象模型,用以表示智能体在探索中获得的安全相关知识;随后利用概率反例生成构造能够诱发安全需求违规的最小仿真子模型,使智能体可以高效地离线训练并改进其策略,从而在后续的在线探索中将安全违规风险降到最低。初步实验表明,与 QL 和 DQN 标准算法相比,我们的方法将在线探索中的安全违规平均减少了 40.3%,与此前相关工作相比减少了 29.1%,同时在累积回报方面与不受限探索及其他方法保持相当。

SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation

  • paper_url: http://arxiv.org/abs/2307.04907
  • repo_url: None
  • paper_authors: Bhathiya Hemanthage, Christian Dondrup, Phil Bartie, Oliver Lemon
  • for: 这篇论文主要是为了提出一种简单的语言模型,用于处理多modal任务对话。
  • methods: 该模型基于大规模 transformer 框架,并利用迁移学习从预训练的 GPT-2 中获取知识。为了捕捉视觉场景的语义,该模型为场景中的物体引入了本地化(localized)与去本地化(de-localized)两类符号。
  • results: 该模型在 Response Generation 子任务上达到了 state-of-the-art BLEU 分数 (0.327),并在其他多modal 子任务中表现良好,包括 Disambiguation、Coreference Resolution 和 Dialog State Tracking。这是 despite 该模型采取了一种 minimalist 的方法来提取视觉(以及非视觉)信息。
    Abstract SimpleMTOD is a simple language model which recasts several sub-tasks in multimodal task-oriented dialogues as sequence prediction tasks. SimpleMTOD is built on a large-scale transformer-based auto-regressive architecture, which has already proven to be successful in uni-modal task-oriented dialogues, and effectively leverages transfer learning from pre-trained GPT-2. In-order to capture the semantics of visual scenes, we introduce both local and de-localized tokens for objects within a scene. De-localized tokens represent the type of an object rather than the specific object itself and so possess a consistent meaning across the dataset. SimpleMTOD achieves a state-of-the-art BLEU score (0.327) in the Response Generation sub-task of the SIMMC 2.0 test-std dataset while performing on par in other multimodal sub-tasks: Disambiguation, Coreference Resolution, and Dialog State Tracking. This is despite taking a minimalist approach for extracting visual (and non-visual) information. In addition the model does not rely on task-specific architectural changes such as classification heads.
    摘要 SimpleMTOD 是一个简单的语言模型,它将多模态任务导向对话中的多个子任务重新表述为序列预测任务。SimpleMTOD 基于大规模的 transformer 自回归架构,这种架构在单模态任务导向对话中已被证明有效,并通过迁移学习有效利用了预训练的 GPT-2。为了捕捉视觉场景的语义,我们为场景中的物体引入了本地化与去本地化两类符号:去本地化符号表示物体的类型而非特定物体本身,因此在整个数据集中具有一致的含义。SimpleMTOD 在 SIMMC 2.0 test-std 数据集的回应生成(Response Generation)子任务中取得了最先进的 BLEU 分数(0.327),并在其他多模态子任务(歧义消解、共指消解与对话状态跟踪)中表现相当,尽管其提取视觉(及非视觉)信息的方式十分简约。此外,该模型不依赖任务特定的架构改动(如分类头)。

FedYolo: Augmenting Federated Learning with Pretrained Transformers

  • paper_url: http://arxiv.org/abs/2307.04905
  • repo_url: None
  • paper_authors: Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak
  • for: 这篇论文旨在探讨如何利用预训练 Transformer(PTF)在移动设备和边缘设备上进行学习,以应对多样化的客户端目标以及稀缺且异质的数据。
  • methods: 这个论文使用了联合学习(Federated Learning)和预训练 transformer(PTF)来实现在移动设备和边缘设备上进行学习,并 investigate了模型大小和模块化的影响。
  • results: 研究发现,可以通过增大模型规模和使用模块化来提高设备和网络约束下的学习效果,同时减少了通信轮次数。此外,模块化还可以使得客户端可以同时解决多个无关的任务,而不会出现归化问题。这些发现 inspirited a new federated learning approach called “You Only Load Once” (FedYolo), clients可以通过通信有效的模块来更新模型,而不需要每次更新整个模型。
    Abstract The growth and diversity of machine learning applications motivate a rethinking of learning with mobile and edge devices. How can we address diverse client goals and learn with scarce heterogeneous data? While federated learning aims to address these issues, it has challenges hindering a unified solution. Large transformer models have been shown to work across a variety of tasks achieving remarkable few-shot adaptation. This raises the question: Can clients use a single general-purpose model, rather than custom models for each task, while obeying device and network constraints? In this work, we investigate pretrained transformers (PTF) to achieve these on-device learning goals and thoroughly explore the roles of model size and modularity, where the latter refers to adaptation through modules such as prompts or adapters. Focusing on federated learning, we demonstrate that: (1) Larger scale shrinks the accuracy gaps between alternative approaches and improves heterogeneity robustness. Scale allows clients to run more local SGD epochs which can significantly reduce the number of communication rounds. At the extreme, clients can achieve respectable accuracy locally highlighting the potential of fully-local learning. (2) Modularity, by design, enables $>$100$\times$ less communication in bits. Surprisingly, it also boosts the generalization capability of local adaptation methods and the robustness of smaller PTFs. Finally, it enables clients to solve multiple unrelated tasks simultaneously using a single PTF, whereas full updates are prone to catastrophic forgetting. These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): The clients load a full PTF model once and all future updates are accomplished through communication-efficient modules with limited catastrophic-forgetting, where each task is assigned to its own module.
    摘要 machine learning应用的增长和多样性,需要我们重新思考手持设备和边缘设备上的学习。如何处理多样化客户端目标和缺乏多样化数据的学习呢?联邦学习旨在解决这些问题,但它存在一些阻碍统一解决方案的挑战。大型转换器模型已经在多种任务上显示出惊人的几次适应性,这引起了问题:客户可以使用单一通用模型来满足每个任务,而不是为每个任务创建特定的模型吗?在这种情况下,我们 investigate pretrained transformers (PTF) 以实现这些在设备上学习的目标,并且全面探索模型大小和模块化的角色。我们将注重联邦学习,并证明:1. 规模的扩大可以降低客户端和网络约束下的准确性差距,同时提高多样性的鲁棒性。通过在本地进行更多的SGD迭代,客户可以减少通信轮次数。在极限情况下,客户可以在本地达到尽可能高的准确性,这 highlights 了本地学习的潜在可能性。2. 模块化设计可以减少通信量,并且意外地提高了本地适应方法的总体化能力和小型PTF的鲁棒性。此外,它还允许客户同时解决多个无关的任务,而不是通过全部更新而导致彻底忘记。这些策略对规模和模块化的影响,驱动我们提出一种新的联邦学习方法,我们称之为“只上载一次”(FedYolo)。在这种方法中,客户在第一次上载一个完整的PTF模型后,所有的未来更新都可以通过通信效率低的模块来完成,而不会导致彻底忘记。
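A minimal sketch of the modular-update idea behind FedYolo, with an invented toy backbone and a residual adapter; the point is only to show that freezing the backbone and communicating adapter (plus head) parameters shrinks per-round traffic by orders of magnitude. The adapter design, sizes and aggregation rule are illustrative, not the paper's.

```python
# A minimal sketch of modular federated updates (not the paper's exact
# architecture): the pretrained backbone stays frozen on every client and
# only a small adapter's parameters are communicated and averaged.
import torch
import torch.nn as nn

class AdapterModel(nn.Module):
    def __init__(self, dim=256, bottleneck=16, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                      nn.Linear(dim, dim))
        for p in self.backbone.parameters():       # frozen pretrained weights
            p.requires_grad = False
        self.adapter = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU(),
                                     nn.Linear(bottleneck, dim))
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        h = self.backbone(x)
        return self.head(h + self.adapter(h))      # residual adapter

def trainable_state(model):
    # Only these tensors leave the device each round.
    return {k: v for k, v in model.state_dict().items()
            if k.startswith(("adapter", "head"))}

model = AdapterModel()
full = sum(p.numel() for p in model.parameters())
sent = sum(v.numel() for v in trainable_state(model).values())
print(f"full model: {full} params, communicated per round: {sent} "
      f"({100 * sent / full:.1f}%)")

# Server-side aggregation: average the adapter/head tensors from all clients.
def fedavg(states):
    return {k: torch.stack([s[k] for s in states]).mean(dim=0)
            for k in states[0]}
```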

Fast dynamic time warping and clustering in C++

  • paper_url: http://arxiv.org/abs/2307.04904
  • repo_url: None
  • paper_authors: Volkan Kumtepeli, Rebecca Perriment, David A. Howey
  • for: Computationally efficient dynamic time warping (DTW) and clustering of time-series data.
  • methods: Dynamic programming and mixed-integer programming (MIP) for DTW and clustering, with task-level parallelization for efficiency.
  • results: 33% faster than the next fastest option on average, with a 64% speedup for larger datasets (over 1000 time series). The MIP clustering is most effective for small numbers of longer time series.
    Abstract We present an approach for computationally efficient dynamic time warping (DTW) and clustering of time-series data. The method frames the dynamic warping of time series datasets as an optimisation problem solved using dynamic programming, and then clusters time series data by solving a second optimisation problem using mixed-integer programming (MIP). There is also an option to use k-medoids clustering for increased speed, when a certificate for global optimality is not essential. The improved efficiency of our approach is due to task-level parallelisation of the clustering alongside DTW. Our approach was tested using the UCR Time Series Archive, and was found to be, on average, 33% faster than the next fastest option when using the same clustering method. This increases to 64% faster when considering only larger datasets (with more than 1000 time series). The MIP clustering is most effective on small numbers of longer time series, because the DTW computation is faster than other approaches, but the clustering problem becomes increasingly computationally expensive as the number of time series to be clustered increases.
    摘要 我们提出了一种计算高效的动态时间规整(DTW)与时间序列聚类方法。该方法将时间序列数据集的动态规整表述为一个用动态规划求解的优化问题,随后通过求解第二个基于混合整数规划(MIP)的优化问题对时间序列进行聚类;当不需要全局最优性证明时,也可以选用 k-medoids 聚类以进一步提速。我们方法效率的提升来自于聚类与 DTW 的任务级并行。我们在 UCR Time Series Archive 上测试了该方法:在使用相同聚类方法的情况下,平均比次快的方案快 33%;若只考虑较大的数据集(超过 1000 条时间序列),则快 64%。MIP 聚类在时间序列数量较少但序列较长时最为有效,因为此时 DTW 计算比其他方案更快,而聚类问题的计算开销会随着待聚类时间序列数量的增加而迅速增长。
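The core DTW recurrence the paper builds on can be written in a few lines of NumPy; the paper's contribution is a fast C++ implementation with task-level parallelism plus an MIP or k-medoids clustering stage, none of which is reproduced in this sketch.

```python
# The core dynamic-programming recurrence for DTW, written in plain NumPy
# for clarity; the paper's C++ implementation adds task-level parallelism
# and an MIP/k-medoids clustering stage on top of this.
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Each cell extends the cheapest of match, insertion, deletion.
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

x = np.sin(np.linspace(0, 4 * np.pi, 120))
y = np.sin(np.linspace(0, 4 * np.pi, 90) + 0.3)   # shifted, different length
print("DTW distance:", dtw_distance(x, y))
# For clustering, a pairwise DTW matrix over all series feeds either the
# MIP formulation (exact medoids) or k-medoids (faster, no optimality proof).
```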

  • paper_url: http://arxiv.org/abs/2307.05603
  • repo_url: https://github.com/fatemehab/polis
  • paper_authors: Fatemeh Abdollahi, Saqib Ameen, Matthew E. Taylor, Levi H. S. Lelis
  • for: 这篇论文目标是提高现有程序的性能,通过利用程序的结构和已有的猛烈合成算法。
  • methods: 该方法使用了本地搜索,不断地改进单个程序行,以提高程序的性能。
  • results: 经过27名参与者的用户研究,发现POLIS可以在两个单机游戏中(即月球降落和高速公路)提高参与者的程序性能。此外,对现有Stack Overflow代码进行了一个证明示例,表明POLIS在实际问题中也有应用价值。
    Abstract This paper introduces a local search method for improving an existing program with respect to a measurable objective. Program Optimization with Locally Improving Search (POLIS) exploits the structure of a program, defined by its lines. POLIS improves a single line of the program while keeping the remaining lines fixed, using existing brute-force synthesis algorithms, and continues iterating until it is unable to improve the program's performance. POLIS was evaluated with a 27-person user study, where participants wrote programs attempting to maximize the score of two single-agent games: Lunar Lander and Highway. POLIS was able to substantially improve the participants' programs with respect to the game scores. A proof-of-concept demonstration on existing Stack Overflow code measures applicability in real-world problems. These results suggest that POLIS could be used as a helpful programming assistant for programming problems with measurable objectives.
    摘要 本文介绍了一种针对可度量目标改进现有程序的局部搜索方法:程序优化与局部改进搜索(POLIS)。POLIS 利用程序按行定义的结构,在保持其余代码行不变的情况下,使用现有的暴力程序综合算法改进其中一行,并不断迭代,直到无法再提升程序的性能为止。POLIS 通过一项 27 人参与的用户研究进行了评估:参与者编写程序,试图最大化两个单智能体游戏(Lunar Lander 与 Highway)的得分。结果表明,POLIS 能在游戏得分方面显著改进参与者的程序。此外,在现有 Stack Overflow 代码上的概念验证演示表明该方法在现实问题中同样适用。这些结果表明,POLIS 可以作为具有可度量目标的编程问题的有用编程助手。
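A toy sketch of the line-wise local search idea: a "program" is a list of expression strings, and one line at a time is replaced by the best candidate from a small pool while the other lines stay fixed, iterating until no line can be improved. The candidate pool, the miniature lander objective and the scoring are all invented for illustration and are not POLIS's actual synthesizer or games.

```python
# A toy sketch of line-wise local search in the spirit of POLIS: each
# "program" is a list of expression strings defining thrust as a function
# of altitude/velocity; one line at a time is replaced by the best
# candidate from a small pool while the rest stay fixed. The objective,
# candidate pool and simulator below are illustrative stand-ins.
CANDIDATES = ["0.0", "1.0", "v * 0.5", "-v", "h * 0.1", "h * 0.1 - v"]

def run_program(lines):
    """Tiny 'lander': reward is higher for touching down slowly."""
    h, v, reward = 10.0, 0.0, 0.0
    for _ in range(50):
        env = {"h": h, "v": v}
        thrust = sum(eval(line, {}, env) for line in lines)  # toy expressions only
        v += -0.5 + max(0.0, min(thrust, 1.0))   # gravity + clipped thrust
        h += v
        if h <= 0:
            reward = 10.0 - abs(v)               # softer landing = better
            break
    return reward

def polis_style_search(lines):
    improved = True
    while improved:
        improved = False
        for i in range(len(lines)):              # improve one line at a time
            best_line, best_score = lines[i], run_program(lines)
            for cand in CANDIDATES:              # brute-force over candidates
                trial = lines[:i] + [cand] + lines[i + 1:]
                score = run_program(trial)
                if score > best_score:
                    best_line, best_score, improved = cand, score, True
            lines[i] = best_line
    return lines

print(polis_style_search(["0.0", "0.0"]))
```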

Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer

  • paper_url: http://arxiv.org/abs/2307.04895
  • repo_url: https://github.com/azreasoners/recurrent_transformer
  • paper_authors: Zhun Yang, Adam Ishay, Joohyung Lee
  • for: 解决干式约束问题 (Constraint Satisfaction Problems, CSPs)
  • methods: 将 Transformer 扩展为带循环(recurrence)的结构,结合视觉输入与逻辑知识进行端到端学习
  • results: 提出了一种新的方法,可以在端到端学习中解决CSPs,并且可以使用逻辑知识进行 semi-supervised learning 和 sample-efficient learning
    Abstract Constraint satisfaction problems (CSPs) are about finding values of variables that satisfy the given constraints. We show that Transformer extended with recurrence is a viable approach to learning to solve CSPs in an end-to-end manner, having clear advantages over state-of-the-art methods such as Graph Neural Networks, SATNet, and some neuro-symbolic models. With the ability of Transformer to handle visual input, the proposed Recurrent Transformer can straightforwardly be applied to visual constraint reasoning problems while successfully addressing the symbol grounding problem. We also show how to leverage deductive knowledge of discrete constraints in the Transformer's inductive learning to achieve sample-efficient learning and semi-supervised learning for CSPs.
    摘要
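A minimal sketch of the recurrent-Transformer idea for CSPs (see the methods above): one shared encoder layer is applied repeatedly over the cell embeddings of an instance (here a Sudoku-style board flattened to 81 cells), with a per-cell classifier read out at every step. Sizes, data and the missing training loop are placeholders, not the paper's configuration.

```python
# A minimal sketch of the recurrent-Transformer idea: one shared encoder
# layer is applied repeatedly to the cell embeddings of a CSP instance
# (e.g., a Sudoku board flattened to 81 cells), and a classifier predicts
# each cell's value at every recurrent step.
import torch
import torch.nn as nn

class RecurrentTransformerCSP(nn.Module):
    def __init__(self, n_cells=81, n_values=9, d_model=96, steps=8):
        super().__init__()
        self.steps = steps
        self.embed = nn.Embedding(n_values + 1, d_model)   # 0 = empty cell
        self.pos = nn.Parameter(torch.randn(1, n_cells, d_model) * 0.02)
        self.layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                                dim_feedforward=192,
                                                batch_first=True)
        self.readout = nn.Linear(d_model, n_values)

    def forward(self, board):                    # board: (batch, n_cells) ints
        h = self.embed(board) + self.pos
        outputs = []
        for _ in range(self.steps):              # recurrence: weights are shared
            h = self.layer(h)
            outputs.append(self.readout(h))      # per-step predictions
        return outputs                           # supervise every step

model = RecurrentTransformerCSP()
dummy_board = torch.randint(0, 10, (2, 81))
logits_per_step = model(dummy_board)
print(len(logits_per_step), logits_per_step[-1].shape)  # 8, (2, 81, 9)
```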

Choosing Well Your Opponents: How to Guide the Synthesis of Programmatic Strategies

  • paper_url: http://arxiv.org/abs/2307.04893
  • repo_url: https://github.com/rubensolv/locallearnerijcai
  • paper_authors: Rubens O. Moraes, David S. Aleixo, Lucas N. Ferreira, Levi H. S. Lelis
  • for: 这篇论文旨在提供一种用于指导搜索算法的参考策略集,以提高在两个玩家零风险游戏中搜索策略的效果。
  • methods: 本论文提出了一种名为本地学习(2L)算法,该算法可以活动选择一组参考策略,以提高搜索信号。
  • results: 实验表明,使用2L算法可以比较IBR、FP和DO算法更好地学习参考策略,并在Synthesizing策略中提高搜索效果。此外,我们还通过模拟一场MicroRTS比赛,发现使用2L算法 synthesizer 可以比较两个最近的MicroRTS比赛的赢家,这些赢家都是由人工程师编写的程序策略。
    Abstract This paper introduces Local Learner (2L), an algorithm for providing a set of reference strategies to guide the search for programmatic strategies in two-player zero-sum games. Previous learning algorithms, such as Iterated Best Response (IBR), Fictitious Play (FP), and Double-Oracle (DO), can be computationally expensive or miss important information for guiding search algorithms. 2L actively selects a set of reference strategies to improve the search signal. We empirically demonstrate the advantages of our approach while guiding a local search algorithm for synthesizing strategies in three games, including MicroRTS, a challenging real-time strategy game. Results show that 2L learns reference strategies that provide a stronger search signal than IBR, FP, and DO. We also simulate a tournament of MicroRTS, where a synthesizer using 2L outperformed the winners of the two latest MicroRTS competitions, which were programmatic strategies written by human programmers.
    摘要 本文介绍了局部学习器(2L),一种为两人零和博弈中程序化策略的搜索提供参考策略集的算法。此前的学习算法,如迭代最佳响应(IBR)、虚拟对局(FP)和双预言机(DO),可能计算代价高昂,或者遗漏对指导搜索算法重要的信息。2L 主动选择一组参考策略以增强搜索信号。我们在三个游戏(包括具有挑战性的即时战略游戏 MicroRTS)中用 2L 指导局部搜索算法合成策略,实证展示了该方法的优势:2L 学到的参考策略比 IBR、FP 和 DO 提供了更强的搜索信号。此外,我们还模拟了一场 MicroRTS 锦标赛,使用 2L 的合成器击败了最近两届 MicroRTS 比赛的冠军,而这些冠军都是由人类程序员编写的程序化策略。

Unsupervised Domain Adaptation with Deep Neural-Network

  • paper_url: http://arxiv.org/abs/2307.05601
  • repo_url: https://github.com/jetwev/domain-adaptation
  • paper_authors: Artem Bituitskii
  • for: 这篇论文为了解决不监督领域适应问题提供分析现有方法、推出新方法,并在不同领域下进行视觉识别任务的改进。
  • methods: 这篇论文使用了现有方法的分析和一种新的方法,用于解决不同领域下的视觉识别任务。
  • results: 这篇论文的结果预示了适应领域下的视觉识别任务可以通过提高现有方法的性能来改进。
    Abstract This report contributes to the field of unsupervised domain adaptation by providing an analysis of existing methods, introducing a new approach, and demonstrating the potential for improving visual recognition tasks across different domains. The results of this study open up opportunities for further study and development of advanced methods in the field of domain adaptation.
    摘要 这份报告通过分析现有方法、提出一种新方法,并展示在不同领域间提升视觉识别任务的潜力,为无监督领域自适应研究做出了贡献。该研究的结果为领域自适应领域中更先进方法的进一步研究与发展开辟了空间。

Accelerated Discovery of Machine-Learned Symmetries: Deriving the Exceptional Lie Groups G2, F4 and E6

  • paper_url: http://arxiv.org/abs/2307.04891
  • repo_url: None
  • paper_authors: Roy T. Forestano, Konstantin T. Matchev, Katia Matcheva, Alexander Roman, Eyup B. Unlu, Sarunas Verner
  • for: 这些研究使用监督深度学习来寻找保持数据标签不变的连续对称变换,以及对应对称生成器的代数结构。
  • methods: 这封信使用两种改进的算法来加速对称变换的发现,并用 sparse 形式表示发现的生成器。
  • results: 这些新算法对标准方法的性能进行了比较,并在对称群 $G_2$, $F_4$, $E_6$ 中发现了完整的生成器集。
    Abstract Recent work has applied supervised deep learning to derive continuous symmetry transformations that preserve the data labels and to obtain the corresponding algebras of symmetry generators. This letter introduces two improved algorithms that significantly speed up the discovery of these symmetry transformations. The new methods are demonstrated by deriving the complete set of generators for the unitary groups U(n) and the exceptional Lie groups $G_2$, $F_4$, and $E_6$. A third post-processing algorithm renders the found generators in sparse form. We benchmark the performance improvement of the new algorithms relative to the standard approach. Given the significant complexity of the exceptional Lie groups, our results demonstrate that this machine-learning method for discovering symmetries is completely general and can be applied to a wide variety of labeled datasets.
    摘要 最近的工作使用监督深度学习来寻找保持数据标签不变的连续对称变换,并获得相应对称生成器的代数结构。本文介绍了两种改进算法,可显著加速这些对称变换的发现。新方法通过推导酉群 U(n) 以及例外李群 $G_2$、$F_4$、$E_6$ 的完整生成器集加以演示。此外,我们还提出了一种后处理算法,可将找到的生成器表示为稀疏形式。我们对新算法相对标准方法的性能提升进行了基准测试。鉴于例外李群的高度复杂性,这些结果表明这种用机器学习发现对称性的方法是完全通用的,可应用于各种带标签的数据集。
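A toy, single-generator sketch of the general recipe behind this line of work: learn a matrix G such that the infinitesimal transformation x → x + εGx preserves an invariant oracle. Here the oracle is the squared Euclidean norm, whose true generators are the antisymmetric matrices of so(n); the paper's pipeline additionally learns many mutually orthogonal generators, closes their algebra, and sparsifies them, none of which is shown.

```python
# A toy sketch of machine-learning a symmetry generator: find a matrix G
# such that x -> x + eps * G x preserves an invariant "oracle" (here the
# squared norm, whose true generators are the antisymmetric matrices of
# so(n)). Multi-generator learning, algebra closure and sparsification
# from the paper are not reproduced here.
import torch

torch.manual_seed(0)
n, eps = 4, 1e-2
x = torch.randn(2048, n)                        # labelled sample points
oracle = lambda z: (z ** 2).sum(dim=1)          # invariant to be preserved

G = torch.randn(n, n, requires_grad=True)
opt = torch.optim.Adam([G], lr=1e-2)

for step in range(2000):
    x_t = x + eps * x @ G.T                     # infinitesimally transformed points
    invariance = ((oracle(x_t) - oracle(x)) ** 2).mean()
    normalisation = (G.norm() - 1.0) ** 2       # exclude the trivial G = 0
    loss = invariance / eps ** 2 + normalisation
    opt.zero_grad()
    loss.backward()
    opt.step()

# A generator of rotations should come out (approximately) antisymmetric.
print("||G + G^T|| =", float((G + G.T).norm()), " ||G|| =", float(G.norm()))
```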

Measuring and Mitigating Interference in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04887
  • repo_url: None
  • paper_authors: Vincent Liu, Han Wang, Ruo Yu Tao, Khurram Javed, Adam White, Martha White
  • for: 本研究旨在提供一种定义和衡量值基的强化学习方法中的干扰量的方法,以及一种测试这种干扰量和控制性能的方法。
  • methods: 本研究使用了 fitted Q-iteration 和 DQN 等值基强化学习方法,并提出了一种新的干扰量测试方法。
  • results: 研究发现,该测试方法与控制性能的变化高度相关,并且可以用来评估不同网络架构和学习算法对干扰量的影响。此外,研究还提出了一种名为 “online-aware” 的算法,可以减少干扰量,并且在一些经典控制环境中提高了稳定性和性能。
    Abstract Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. Before overcoming interference we must understand it better. In this work, we provide a definition and novel measure of interference for value-based reinforcement learning methods such as Fitted Q-Iteration and DQN. We systematically evaluate our measure of interference, showing that it correlates with instability in control performance, across a variety of network architectures. Our new interference measure allows us to ask novel scientific questions about commonly used deep learning architectures and study learning algorithms which mitigate interference. Lastly, we outline a class of algorithms which we call online-aware that are designed to mitigate interference, and show they do reduce interference according to our measure and that they improve stability and performance in several classic control environments.
    摘要 灾难性干扰是许多基于网络的学习系统中的常见问题,已有许多缓解它的提议。在克服干扰之前,我们必须先更好地理解它。在本工作中,我们为基于价值的强化学习方法(如 Fitted Q-Iteration 与 DQN)给出了干扰的定义和一种新的度量。我们系统地评估了这一干扰度量,结果表明它在多种网络架构下都与控制性能的不稳定性相关。这一新的干扰度量使我们能够针对常用的深度学习架构提出新的科学问题,并研究能够缓解干扰的学习算法。最后,我们提出了一类称为 online-aware 的算法,它们被设计用来缓解干扰;实验表明,按照我们的度量,这类算法确实减少了干扰,并在若干经典控制环境中提升了稳定性与性能。

AI For Global Climate Cooperation 2023 Competition Proceedings

  • paper_url: http://arxiv.org/abs/2307.06951
  • repo_url: None
  • paper_authors: Yoshua Bengio, Prateek Gupta, Lu Li, Soham Phade, Sunil Srinivasa, Andrew Williams, Tianyu Zhang, Yang Zhang, Stephan Zheng
  • for: The paper aims to design international frameworks for mitigating climate change and promoting sustainable economic growth, using AI-driven integrated assessment models (IAM) and simulations.
  • methods: The paper uses RICE-N, an AI-driven IAM that supports modeling regional decision-making using AI agents, to model the climate-economic impact of decisions into the future. The proposals were evaluated both quantitatively and qualitatively, with a combination of performance metrics and human expert evaluation.
  • results: The paper seeks to provide a promising solution to the challenges of collaboration in mitigating climate change and promoting sustainable economic growth, by combining AI with climate-economic simulations and involving human experts from multiple disciplines. The results of the competition and the improvements to RICE-N are expected to contribute to the development of effective and sustainable international frameworks for climate cooperation.
    Abstract The international community must collaborate to mitigate climate change and sustain economic growth. However, collaboration is hard to achieve, partly because no global authority can ensure compliance with international climate agreements. Combining AI with climate-economic simulations offers a promising solution to design international frameworks, including negotiation protocols and climate agreements, that promote and incentivize collaboration. In addition, these frameworks should also have policy goals fulfillment, and sustained commitment, taking into account climate-economic dynamics and strategic behaviors. These challenges require an interdisciplinary approach across machine learning, economics, climate science, law, policy, ethics, and other fields. Towards this objective, we organized AI for Global Climate Cooperation, a Mila competition in which teams submitted proposals and analyses of international frameworks, based on (modifications of) RICE-N, an AI-driven integrated assessment model (IAM). In particular, RICE-N supports modeling regional decision-making using AI agents. Furthermore, the IAM then models the climate-economic impact of those decisions into the future. Whereas the first track focused only on performance metrics, the proposals submitted to the second track were evaluated both quantitatively and qualitatively. The quantitative evaluation focused on a combination of (i) the degree of mitigation of global temperature rise and (ii) the increase in economic productivity. On the other hand, an interdisciplinary panel of human experts in law, policy, sociology, economics and environmental science, evaluated the solutions qualitatively. In particular, the panel considered the effectiveness, simplicity, feasibility, ethics, and notions of climate justice of the protocols. In the third track, the participants were asked to critique and improve RICE-N.
    摘要 国际社区必须合作以 Mitigate климатиче变化并保持经济增长。然而,合作困难,其中一个原因是没有全球权威机构可以确保国际气候协议的遵从性。通过将 AI 与气候经济仿真模型相结合,可以开发出国际框架,包括谈判协议和气候协议,以便促进和激励合作。此外,这些框架还应该包括政策目标实现和持续承诺,考虑气候经济动态和策略行为。这些挑战需要跨学科的方法,包括机器学习、经济学、气候科学、法律、政策、伦理和其他领域。为了实现这个目标,我们组织了 AI for Global Climate Cooperation 竞赛,其中团队提交了国际框架的建议和分析,基于(修改后)的 RICE-N 气候经济仿真模型(IAM)。特别是,RICE-N 支持用 AI 代理模型地区决策。而 IAM 则模拟了这些决策对未来气候经济的影响。在第一个轨道中,只评估性能指标。而在第二个轨道中,提交的提案被评估了 both quantitatively 和 qualitatively。量化评估包括(i)全球气温升高的减轻程度和(ii)经济生产力的增长。然而,人类专家组成的多学科评审团(包括法律、政策、社会学、经济学和环境科学)对解决方案进行了质量评估。特别是,评审团考虑了效果、简洁、可行性、伦理和气候正义方面的评估。在第三个轨道中,参与者被要求批评并改进 RICE-N。

Onion Universe Algorithm: Applications in Weakly Supervised Learning

  • paper_url: http://arxiv.org/abs/2307.04870
  • repo_url: None
  • paper_authors: Woojoo Na
  • for: 本研究旨在提出一种新的分类方法,即 Onion Universe Algorithm (OUA),用于弱监督学习。
  • methods: OUA 基于弱信号空间的几何解释,不需要任何假设,可以快速实现并且简单易用。
  • results: 实验结果表明,OUA 在常见的标准数据集上表现出色,比既有的标签模型更好。
    Abstract We introduce Onion Universe Algorithm (OUA), a novel classification method in ensemble learning. In particular, we show its applicability as a label model for weakly supervised learning. OUA offers simplicity in implementation, computational efficiency, and does not rely on any assumptions regarding the data or weak signals. The model is well suited for scenarios where fully labeled data is not available. Our method is built upon geometrical interpretation of the space spanned by weak signals. Empirical results support our analysis of the hidden geometric structure underlying general set of weak signals and also illustrates that OUA works well in practice. We show empirical evidence that OUA performs favorably on common benchmark datasets compared to existing label models for weakly supervised learning.
    摘要 我团队介绍了葱宇宙算法(OUA),一种新型的集成学习分类方法。具体来说,我们证明了它在弱监督学习中的应用性。OUA具有简单的实现、计算效率和不假设数据或弱信号的特点。该模型适用于具有受限数据的场景。我们的方法基于弱信号空间的几何解释。我们的分析表明,OUA在通用的弱信号集中隐藏的几何结构下适用。实验证明了OUA在实践中表现良好,并且与现有的弱监督学习标签模型相比,OUA的性能较好。

Fed-CPrompt: Contrastive Prompt for Rehearsal-Free Federated Continual Learning

  • paper_url: http://arxiv.org/abs/2307.04869
  • repo_url: None
  • paper_authors: Gaurav Bagwe, Xiaoyong Yuan, Miao Pan, Lan Zhang
  • for: 这篇论文是针对 Federated Continual Learning (FCL) 的研究,尤其是在不需要练习的情况下学习多个任务。
  • methods: 本论文使用 prompt learning 技术,以通信高效的方式学习任务特定的提示来缓解 FCL 中的遗忘问题,并引入两个关键组件:异步提示学习(asynchronous prompt learning)与对比持续损失(contrastive continual loss),分别用于处理 FCL 中任务的异步到达与异质的数据分布。
  • results: 实验结果显示 Fed-CPrompt 可以实现 SOTA 的 rehearsal-free FCL 性能。
    Abstract Federated continual learning (FCL) learns incremental tasks over time from confidential datasets distributed across clients. This paper focuses on rehearsal-free FCL, which has severe forgetting issues when learning new tasks due to the lack of access to historical task data. To address this issue, we propose Fed-CPrompt based on prompt learning techniques to obtain task-specific prompts in a communication-efficient way. Fed-CPrompt introduces two key components, asynchronous prompt learning, and contrastive continual loss, to handle asynchronous task arrival and heterogeneous data distributions in FCL, respectively. Extensive experiments demonstrate the effectiveness of Fed-CPrompt in achieving SOTA rehearsal-free FCL performance.
    摘要 联邦不断学习(FCL)逐渐学习课程,随着时间的推移,从静态分布在客户端上的机密数据集中学习新任务。这篇论文专注于无卷重复FCL,由于缺乏历史任务数据,因此受到严重的忘记问题困扰。为解决这个问题,我们提议了Fed-CPrompt,基于提示学习技术来获得任务特定的提示。Fed-CPrompt具有异步提示学习和对异构数据分布的对比连续损失两个关键组件,可以有效地处理FCL中的异步任务到达和不同数据分布问题。广泛的实验表明Fed-CPrompt可以实现SOTA的无卷重复FCL性能。

Leveraging an Alignment Set in Tackling Instance-Dependent Label Noise

  • paper_url: http://arxiv.org/abs/2307.04868
  • repo_url: https://github.com/MLD3/Instance_Dependent_Label_Noise
  • paper_authors: Donna Tjandra, Jenna Wiens
  • for: 这篇论文是为了解决受标签错误影响的模型性能问题。
  • methods: 这篇论文提出了一个two-stage方法来在标签错误中学习。这个方法使用了“anchor points”,一小部分数据,其标签已知。
  • results: 这篇论文的方法在多个任务上实现了显著的改善(AUROC),同时减少了偏见(AUEOC)。例如,在预测MIMIC-IIIdataset上的严重呼吸系统失常开始时,这篇论文的方法取得了0.84(SD 0.01)的和谐平均值(AUROC和AUEOC),比下一个最佳基eline的0.81(SD 0.01)高。总的来说,这篇论文的方法可以提高精度,同时减少可能的偏见。
    Abstract Noisy training labels can hurt model performance. Most approaches that aim to address label noise assume label noise is independent from the input features. In practice, however, label noise is often feature or \textit{instance-dependent}, and therefore biased (i.e., some instances are more likely to be mislabeled than others). E.g., in clinical care, female patients are more likely to be under-diagnosed for cardiovascular disease compared to male patients. Approaches that ignore this dependence can produce models with poor discriminative performance, and in many healthcare settings, can exacerbate issues around health disparities. In light of these limitations, we propose a two-stage approach to learn in the presence instance-dependent label noise. Our approach utilizes \textit{\anchor points}, a small subset of data for which we know the observed and ground truth labels. On several tasks, our approach leads to consistent improvements over the state-of-the-art in discriminative performance (AUROC) while mitigating bias (area under the equalized odds curve, AUEOC). For example, when predicting acute respiratory failure onset on the MIMIC-III dataset, our approach achieves a harmonic mean (AUROC and AUEOC) of 0.84 (SD [standard deviation] 0.01) while that of the next best baseline is 0.81 (SD 0.01). Overall, our approach improves accuracy while mitigating potential bias compared to existing approaches in the presence of instance-dependent label noise.
    摘要

Compositional Generalization from First Principles

  • paper_url: http://arxiv.org/abs/2307.05596
  • repo_url: https://github.com/brendel-group/compositional-ood-generalization
  • paper_authors: Thaddäus Wiedemer, Prasanna Mayilvahanan, Matthias Bethge, Wieland Brendel
  • for: 本研究旨在探讨机器学习中的compositional generalization问题,即如何使模型能够通过学习数据的组成结构来泛化到新的数据集。
  • methods: 我们采用了一种底层的方法,通过对数据生成过程的分析,将compositional generalization问题转化为了一种数据生成问题。然后,我们提出了一些某种条件,这些条件只需要支持Training distribution和模型结构,即可以确保模型的泛化能力。
  • results: 我们的研究结果表明,在实际场景中,我们的方法可以有效地推广模型的泛化能力。此外,我们还进行了一些实验来验证我们的理论结论,并得到了正面的结果。
    Abstract Leveraging the compositional nature of our world to expedite learning and facilitate generalization is a hallmark of human perception. In machine learning, on the other hand, achieving compositional generalization has proven to be an elusive goal, even for models with explicit compositional priors. To get a better handle on compositional generalization, we here approach it from the bottom up: Inspired by identifiable representation learning, we investigate compositionality as a property of the data-generating process rather than the data itself. This reformulation enables us to derive mild conditions on only the support of the training distribution and the model architecture, which are sufficient for compositional generalization. We further demonstrate how our theoretical framework applies to real-world scenarios and validate our findings empirically. Our results set the stage for a principled theoretical study of compositional generalization.
    摘要 利用世界的组合性(compositional nature)来加速学习并促进泛化,是人类感知的一大特征。然而在机器学习中,即使模型带有显式的组合先验,实现组合泛化仍是一个难以达成的目标。为了更好地理解组合泛化,我们从底层出发:受可辨识表示学习的启发,我们将组合性视为数据生成过程的性质,而非数据本身的性质。这一重新表述使我们能够仅针对训练分布的支撑集和模型结构推导出较为温和的条件,而这些条件已足以保证组合泛化。我们进一步展示了该理论框架如何应用于现实场景,并通过实验验证了我们的结论。我们的结果为组合泛化的系统性理论研究奠定了基础。

Automated Detection of Gait Events and Travel Distance Using Waist-worn Accelerometers Across a Typical Range of Walking and Running Speeds

  • paper_url: http://arxiv.org/abs/2307.04866
  • repo_url: None
  • paper_authors: Albara Ah Ramli, Xin Liu, Kelly Berndt, Chen-Nee Chuah, Erica Goude, Lynea B. Kaethler, Amanda Lopez, Alina Nicorici, Corey Owens, David Rodriguez, Jane Wang, Daniel Aranki, Craig M. McDonald, Erik K. Henricson
  • for: 本研究旨在评估利用市售智能手机的加速度计数据,结合机器学习(ML)方法,在杜氏肌营养不良症(DMD)儿童与正常发育对照(TD)儿童中测量步态临床特征(CFs)的准确性。
  • methods: 研究对 15 名 DMD 儿童与 15 名 TD 儿童在一系列步态速度下进行监督临床测试,包括 10 米或 25 米跑/走(10MRW、25MRW)、100 米跑/走(100MRW)、6 分钟步行(6MWT)以及自由行走(FW)评估,并采用多步骤机器学习流程从加速度计数据中提取 CFs。
  • results: 研究发现,基于加速度计数据的 CFs 估计与真实观测数据高度相关,步数、行走距离与步长的平均(标准差)百分比误差分别为 1.49%(7.04%)、1.18%(9.91%)与 0.37%(7.52%)。
    Abstract Background: Estimation of temporospatial clinical features of gait (CFs), such as step count and length, step duration, step frequency, gait speed and distance traveled is an important component of community-based mobility evaluation using wearable accelerometers. However, challenges arising from device complexity and availability, cost and analytical methodology have limited widespread application of such tools. Research Question: Can accelerometer data from commercially-available smartphones be used to extract gait CFs across a broad range of attainable gait velocities in children with Duchenne muscular dystrophy (DMD) and typically developing controls (TDs) using machine learning (ML)-based methods Methods: Fifteen children with DMD and 15 TDs underwent supervised clinical testing across a range of gait speeds using 10 or 25m run/walk (10MRW, 25MRW), 100m run/walk (100MRW), 6-minute walk (6MWT) and free-walk (FW) evaluations while wearing a mobile phone-based accelerometer at the waist near the body's center of mass. Gait CFs were extracted from the accelerometer data using a multi-step machine learning-based process and results were compared to ground-truth observation data. Results: Model predictions vs. observed values for step counts, distance traveled, and step length showed a strong correlation (Pearson's r = -0.9929 to 0.9986, p<0.0001). The estimates demonstrated a mean (SD) percentage error of 1.49% (7.04%) for step counts, 1.18% (9.91%) for distance traveled, and 0.37% (7.52%) for step length compared to ground truth observations for the combined 6MWT, 100MRW, and FW tasks. Significance: The study findings indicate that a single accelerometer placed near the body's center of mass can accurately measure CFs across different gait speeds in both TD and DMD peers, suggesting that there is potential for accurately measuring CFs in the community with consumer-level smartphones.
    摘要 背景:利用可穿戴加速度计估计步态的时空临床特征(CFs),如步数、步长、步时、步频、步速与行走距离,是社区化移动能力评估的重要组成部分。然而,设备的复杂性与可获得性、成本以及分析方法等方面的挑战限制了此类工具的广泛应用。研究问题:能否利用机器学习(ML)方法,从市售智能手机的加速度计数据中,提取杜氏肌营养不良症(DMD)儿童与正常发育对照(TD)在各种可达到的步态速度下的步态 CFs?方法:15 名 DMD 儿童与 15 名 TD 儿童在腰部(靠近身体质心处)佩戴基于手机的加速度计,在一系列步态速度下接受了监督临床测试,包括 10 米或 25 米跑/走(10MRW、25MRW)、100 米跑/走(100MRW)、6 分钟步行(6MWT)以及自由行走(FW)评估。随后通过多步骤机器学习流程从加速度数据中提取步态 CFs,并与真实观测数据进行比较。结果:模型预测的步数、行走距离与步长与观测值高度相关(Pearson r = -0.9929 至 0.9986,p<0.0001)。在 6MWT、100MRW 与 FW 任务合并统计下,步数、行走距离与步长的平均(标准差)百分比误差分别为 1.49%(7.04%)、1.18%(9.91%)与 0.37%(7.52%)。意义:研究结果表明,置于身体质心附近的单个加速度计可以在 TD 与 DMD 儿童中,跨不同步态速度准确测量步态 CFs,这意味着使用消费级智能手机在社区中准确测量 CFs 具有可行性。

Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.04859
  • repo_url: None
  • paper_authors: Alexander W. Bergman, Wang Yifan, Gordon Wetzstein
  • for: 本研究旨在提供一种基于文本描述的3D人物头部生成方法,以满足人工智能、虚拟现实、电影制作和教育等领域的需求。
  • methods: 本研究使用了已经训练过的2D文本到图像扩散模型,直接生成3D-多视图一致的辐射场,以生成3D人物头部。新的优化方法可以保持2D和3D的表情特征相对应。
  • results: 研究表明,使用 diffusion-based 方法可以生成高质量的3D人物头部,并且可以在特定的领域内操作,例如人类头部。与之前的CLIP方法相比,我们的方法可以提供更高的多样性和准确性。
    Abstract The ability to generate diverse 3D articulated head avatars is vital to a plethora of applications, including augmented reality, cinematography, and education. Recent work on text-guided 3D object generation has shown great promise in addressing these needs. These methods directly leverage pre-trained 2D text-to-image diffusion models to generate 3D-multi-view-consistent radiance fields of generic objects. However, due to the lack of geometry and texture priors, these methods have limited control over the generated 3D objects, making it difficult to operate inside a specific domain, e.g., human heads. In this work, we develop a new approach to text-guided 3D head avatar generation to address this limitation. Our framework directly operates on the geometry and texture of an articulable 3D morphable model (3DMM) of a head, and introduces novel optimization procedures to update the geometry and texture while keeping the 2D and 3D facial features aligned. The result is a 3D head avatar that is consistent with the text description and can be readily articulated using the deformation model of the 3DMM. We show that our diffusion-based articulated head avatars outperform state-of-the-art approaches for this task. The latter are typically based on CLIP, which is known to provide limited diversity of generation and accuracy for 3D object generation.
    摘要 “三维人物头像生成能力是许多应用中的重要能力,包括增强现实、电影拍摄和教育。现有的文本导向三维物体生成研究已经展示了很大的应用潜力。这些方法直接利用预训的二维文本扩散模型来生成三维多视角具有颜色场的对应物体。但由于缺乏几何和纹理偏好,这些方法对生成的三维物体有限的控制,很难在特定领域内运作,例如人头。在这个工作中,我们开发了一新的文本导向三维头像生成方法,以解决这个限制。我们的框架直接运算在头像3DMM中的几何和纹理,并引入了新的优化程序以更新几何和纹理,并保持2D和3D脸部特征相互Alignment。结果是一个跟文本描述相符的3D头像,可以轻松地运动使用3DMM的扭变模型。我们显示了我们的扩散基于3DMM的头像比以前的方法更高效。这些方法通常基于CLIP,CLIP知道提供有限的多样性和精度 для三维物体生成。”

SHAP@k:Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features

  • paper_url: http://arxiv.org/abs/2307.04850
  • repo_url: None
  • paper_authors: Sanjay Kariyappa, Leonidas Tsepenekas, Freddy Lécué, Daniele Magazzeni
  • for: 本研究的目的是解释模型预测结果的原因,通过计算特征重要性。
  • methods: 本研究使用了SHAP框架,并引入了Top-k标识问题(TkIP),以解决高级特征选择问题。
  • results: 研究人员通过引入多重采样和适应采样策略,提高了现有方法的样本效率和运行时间,平均提高了5倍。
    Abstract The SHAP framework provides a principled method to explain the predictions of a model by computing feature importance. Motivated by applications in finance, we introduce the Top-k Identification Problem (TkIP), where the objective is to identify the k features with the highest SHAP values. While any method to compute SHAP values with uncertainty estimates (such as KernelSHAP and SamplingSHAP) can be trivially adapted to solve TkIP, doing so is highly sample inefficient. The goal of our work is to improve the sample efficiency of existing methods in the context of solving TkIP. Our key insight is that TkIP can be framed as an Explore-m problem--a well-studied problem related to multi-armed bandits (MAB). This connection enables us to improve sample efficiency by leveraging two techniques from the MAB literature: (1) a better stopping-condition (to stop sampling) that identifies when PAC (Probably Approximately Correct) guarantees have been met and (2) a greedy sampling scheme that judiciously allocates samples between different features. By adopting these methods we develop KernelSHAP@k and SamplingSHAP@k to efficiently solve TkIP, offering an average improvement of $5\times$ in sample-efficiency and runtime across most common credit related datasets.
    摘要 SHAP 框架提供了一种有原则的方法,通过计算特征重要性来解释模型的预测。受金融应用的启发,我们引入了 Top-k 识别问题(TkIP),其目标是找出 SHAP 值最高的 k 个特征。虽然任何能给出带不确定性估计的 SHAP 值计算方法(如 KernelSHAP 和 SamplingSHAP)都可以直接用于求解 TkIP,但这样做的样本效率非常低。我们工作的目标是在求解 TkIP 的情境下提升现有方法的样本效率。我们的关键洞察是:TkIP 可以被表述为一个 Explore-m 问题——这是与多臂老虎机(MAB)相关、已被充分研究的问题。这一联系使我们可以借助 MAB 文献中的两项技术来提升样本效率:(1)更好的停止条件,用于判断何时已满足 PAC(Probably Approximately Correct)保证而停止采样;(2)一种贪心采样方案,可在不同特征之间合理分配样本。采用这些方法,我们开发了 KernelSHAP@k 与 SamplingSHAP@k 来高效求解 TkIP,在大多数常见的信贷相关数据集上,样本效率与运行时间平均提升约 5 倍。
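A sketch of the Explore-m style loop suggested by the abstract, with an important caveat: the noisy per-feature SHAP estimator is simulated with Gaussian noise, whereas SHAP@k would obtain each draw from KernelSHAP or SamplingSHAP, and the Hoeffding-style interval below is only a stand-in for the paper's actual PAC stopping condition.

```python
# A sketch of an Explore-m style loop for top-k identification: sample the
# two features closest to the top-k boundary, keep per-feature confidence
# intervals, and stop once the tentative top-k separates from the rest.
# The per-draw "SHAP estimate" is simulated; intervals are Hoeffding-style
# stand-ins rather than the paper's exact PAC machinery.
import numpy as np

rng = np.random.default_rng(0)
true_shap = np.array([0.9, 0.7, 0.65, 0.3, 0.28, 0.1, 0.05, 0.0])
noisy_draw = lambda j: true_shap[j] + rng.normal(scale=0.2)   # one SHAP sample
k, d, delta = 3, len(true_shap), 0.05

counts = np.full(d, 2)
sums = np.array([sum(noisy_draw(j) for _ in range(2)) for j in range(d)])

def interval(j):
    mean = sums[j] / counts[j]
    half = np.sqrt(np.log(2 * d / delta) / (2 * counts[j]))
    return mean - half, mean + half

for _ in range(100000):
    means = sums / counts
    order = np.argsort(-means)
    top, rest = order[:k], order[k:]
    # Stop when every tentative top-k lower bound clears every other upper bound.
    if min(interval(j)[0] for j in top) > max(interval(j)[1] for j in rest):
        break
    # Greedy allocation: sample the two features closest to the boundary.
    for j in (min(top, key=lambda j: interval(j)[0]),
              max(rest, key=lambda j: interval(j)[1])):
        sums[j] += noisy_draw(j)
        counts[j] += 1

print("identified top-k features:", sorted(order[:k]),
      "samples used:", int(counts.sum()))
```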

SigOpt Mulch: An Intelligent System for AutoML of Gradient Boosted Trees

  • paper_url: http://arxiv.org/abs/2307.04849
  • repo_url: None
  • paper_authors: Aleksei Sorokin, Xinran Zhu, Eric Hans Lee, Bolong Cheng
  • for: 提高Gradient Boosted Trees(GBTs)模型的hyperparameter优化
  • methods: 使用meta学和多信度优化技术进行模型 aware的hyperparameter优化,自动学习performant的hyperparameter
  • results: 比现有系统更高效地优化GBTs hyperparameter,减少用户域知识的需求,提供更易用的用户体验
    Abstract Gradient boosted trees (GBTs) are ubiquitous models used by researchers, machine learning (ML) practitioners, and data scientists because of their robust performance, interpretable behavior, and ease-of-use. One critical challenge in training GBTs is the tuning of their hyperparameters. In practice, selecting these hyperparameters is often done manually. Recently, the ML community has advocated for tuning hyperparameters through black-box optimization and developed state-of-the-art systems to do so. However, applying such systems to tune GBTs suffers from two drawbacks. First, these systems are not \textit{model-aware}, rather they are designed to apply to a \textit{generic} model; this leaves significant optimization performance on the table. Second, using these systems requires \textit{domain knowledge} such as the choice of hyperparameter search space, which is an antithesis to the automatic experimentation that black-box optimization aims to provide. In this paper, we present SigOpt Mulch, a model-aware hyperparameter tuning system specifically designed for automated tuning of GBTs that provides two improvements over existing systems. First, Mulch leverages powerful techniques in metalearning and multifidelity optimization to perform model-aware hyperparameter optimization. Second, it automates the process of learning performant hyperparameters by making intelligent decisions about the optimization search space, thus reducing the need for user domain knowledge. These innovations allow Mulch to identify good GBT hyperparameters far more efficiently -- and in a more seamless and user-friendly way -- than existing black-box hyperparameter tuning systems.
    摘要 Gradient Boosted Trees (GBTs) 是研究人员、机器学习(ML)专业人员和数据科学家们广泛使用的模型,因其性能 Robust 、可解释性和易用性。但在训练 GBTs 时,一个关键挑战是调整它们的超参数。在实践中,选择这些超参数通常是手动进行的。近年来,ML 社区强调通过黑盒优化进行超参数调整,并开发出了state-of-the-art 系统。但是,对 GBTs 进行黑盒优化具有两个缺点。首先,这些系统不是 GBTs 模型具有的,而是为普通模型而设计的,这会吃到大量的优化性能。第二,使用这些系统需要域知识,如选择超参数搜索空间,这与黑盒优化的自动实验相对抵触。在本文中,我们介绍了 SigOpt Mulch,一个特地设计用于自动调整 GBTs 超参数的模型具有优化系统。相比现有系统,Mulch 具有两个优势:首先,Mulch 利用了强大的元学习和多 fidelt 优化技术来进行模型具有的超参数优化。其次,它自动化了超参数优化的过程,从而减少了用户需要域知识的需求。这些创新使得 Mulch 可以更加高效地找到好的 GBTs 超参数,并且在更加易用和愉悦的方式进行自动化。

Dynamics of Temporal Difference Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04841
  • repo_url: https://github.com/pehlevan-group/td-rl-dynamics
  • paper_authors: Blake Bordelon, Paul Masset, Henry Kuo, Cengiz Pehlevan
  • for: 这篇论文旨在研究反射学习中参数和状态表示方法如何控制学习动态。
  • methods: 这篇论文使用统计物理概念来研究价值函数学习的时间差分学习 Curves。
  • results: 研究发现,对可能回合空间进行子采样所产生的随机半梯度噪声会使价值误差出现显著的平台期,这与传统梯度下降动力学不同;研究还发现,学习率退火与奖励塑形可以改善学习动态与平台期。
    Abstract Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics, to study the typical case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis where averages over the random trajectories are replaced with temporally correlated Gaussian feature averages and we validate our assumptions on small scale Markov Decision Processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies like learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. To conclude, our work introduces new tools to open a new direction towards developing a theory of learning dynamics in reinforcement learning.
    摘要 强化学习已在多个应用中取得成功,这些应用要求智能体在反馈稀疏的环境中学会行动。然而,尽管有这些实证上的成功,对于强化学习模型的参数与用于表示状态的特征如何相互作用并决定学习动态,目前仍缺乏理论上的理解。在本工作中,我们借助统计物理中的概念,研究使用线性函数逼近进行时序差分(TD)价值函数学习时的典型学习曲线。我们的理论在一个高斯等价假设下推导:对随机轨迹的平均被替换为具有时间相关性的高斯特征平均;我们在小规模马尔可夫决策过程上验证了这些假设。我们发现,由于对可能回合空间进行子采样而产生的随机半梯度噪声,会使价值误差出现显著的平台期,这与传统梯度下降动力学不同。我们研究了学习动态与平台期如何依赖于特征结构、学习率、折扣因子和奖励函数,随后分析了学习率退火和奖励塑形等策略如何有利地改变学习动态与平台期。总之,我们的工作引入了新的工具,为发展强化学习的学习动态理论开辟了新方向。
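A minimal NumPy sketch of the setting analysed above: semi-gradient TD(0) with linear function approximation on a small random-walk Markov reward process, with the value error recorded during training. The chain, features and learning rate are illustrative rather than the paper's experimental settings; the point is only to show the kind of noisy learning curve whose plateaus the paper characterises.

```python
# Semi-gradient TD(0) with linear function approximation on a small
# random-walk Markov reward process, tracking the value error over time.
# Sizes, features and the learning rate are illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)
S, d, gamma, lr = 20, 10, 0.9, 0.05
# Random-walk chain: move left/right with equal probability (wrap around).
P = np.zeros((S, S))
for s in range(S):
    P[s, (s - 1) % S] = P[s, (s + 1) % S] = 0.5
r = rng.normal(size=S)                          # per-state reward
Phi = rng.normal(size=(S, d)) / np.sqrt(d)      # fixed random features

# Ground-truth values from the Bellman equation: v = r + gamma * P v.
v_true = np.linalg.solve(np.eye(S) - gamma * P, r)

w = np.zeros(d)
s, errors = 0, []
for t in range(20000):
    s_next = rng.choice(S, p=P[s])
    # Semi-gradient TD(0) update: the bootstrap target uses the current w.
    td_error = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
    w += lr * td_error * Phi[s]
    if t % 500 == 0:
        errors.append(np.mean((Phi @ w - v_true) ** 2))
    s = s_next

print("value error every 500 steps:", np.round(errors, 3))
# The stochastic semi-gradient noise from sampled transitions is what
# produces the plateaus analysed in the paper.
```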

CREPE: Learnable Prompting With CLIP Improves Visual Relationship Prediction

  • paper_url: http://arxiv.org/abs/2307.04838
  • repo_url: https://github.com/llnl/crepe
  • paper_authors: Rakshith Subramanyam, T. S. Jayram, Rushil Anirudh, Jayaraman J. Thiagarajan
  • for: 这篇论文探讨了使用视觉语言模型(VLM),尤其是CLIP,预测视觉对象关系的潜力,从图像中提取语言基于关系。
  • methods: 我们采用了UVTransE关系预测框架,该框架学习关系为图像中的翻译嵌入。我们系统地探索CLIP中的subject、object和union-box表示方法,并提出了CREPE(CLIP表示增强预测)。CREPE使用了文本基于表示,并引入了一种新的对比训练策略来自动推理union-box的文本提示。
  • results: 我们的方法在Visual Genome benchmark上实现了 predicate estimation的状态aru-the-art性能,mR@5 27.79,mR@20 31.95,与最近的状态aru-the-art在mR@20上提高15.3%。这个工作证明了CLIP在对象关系预测中的效果,并鼓励了更多的研究在这个挑战性的领域。
    Abstract In this paper, we explore the potential of Vision-Language Models (VLMs), specifically CLIP, in predicting visual object relationships, which involves interpreting visual features from images into language-based relations. Current state-of-the-art methods use complex graphical models that utilize language cues and visual features to address this challenge. We hypothesize that the strong language priors in CLIP embeddings can simplify these graphical models paving for a simpler approach. We adopt the UVTransE relation prediction framework, which learns the relation as a translational embedding with subject, object, and union box embeddings from a scene. We systematically explore the design of CLIP-based subject, object, and union-box representations within the UVTransE framework and propose CREPE (CLIP Representation Enhanced Predicate Estimation). CREPE utilizes text-based representations for all three bounding boxes and introduces a novel contrastive training strategy to automatically infer the text prompt for union-box. Our approach achieves state-of-the-art performance in predicate estimation, mR@5 27.79, and mR@20 31.95 on the Visual Genome benchmark, achieving a 15.3\% gain in performance over recent state-of-the-art at mR@20. This work demonstrates CLIP's effectiveness in object relation prediction and encourages further research on VLMs in this challenging domain.
    摘要 在这篇论文中,我们探索了视觉语言模型(VLM),尤其是CLIP,在预测视觉对象关系方面的潜力。当前领先方法使用复杂的图形模型,利用语言提示和视觉特征来解决这个挑战。我们假设CLIP的强语言优先可以简化这些图形模型,为更简单的方法提供条件。我们采用UVTransE关系预测框架,该框架学习关系为图像中的翻译嵌入。我们系统地探索CLIP基于表示的主体、对象和联合盒子表示的设计,并提出CREPE(CLIP表示增强预测Predicate)。CREPE使用文本基于表示 для所有三个盒子,并 introduce了一种新的对比训练策略,自动推断联合盒子的文本提示。我们的方法在Visual Genome标准 benchmark上实现了预测 predicate 的状态对应性,MR@5 27.79,MR@20 31.95,与最近领先方法相比提高了15.3%的性能。这项工作证明CLIP在对象关系预测中的效iveness,并鼓励进一步的VLM在这个领域的研究。
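A sketch of the UVTransE scoring rule referenced above: the predicate is represented by the translation union − subject − object and compared against a bank of predicate embeddings, which CREPE derives from CLIP text prompts; here all features and the predicate bank are random placeholders.

```python
# A sketch of the UVTransE scoring rule: the predicate is represented by
# the translation union - subject - object and classified against a bank
# of predicate embeddings (CREPE derives these from CLIP text prompts;
# here they are random placeholders).
import torch
import torch.nn.functional as F

d, n_predicates = 512, 50
subject_feat = torch.randn(1, d)     # e.g., CLIP features of the subject box
object_feat = torch.randn(1, d)      # ... of the object box
union_feat = torch.randn(1, d)       # ... of the union box
predicate_bank = torch.randn(n_predicates, d)   # one embedding per predicate

translation = union_feat - subject_feat - object_feat            # UVTransE
scores = F.cosine_similarity(translation, predicate_bank, dim=-1)
print("predicted predicate id:", int(scores.argmax()))
```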

Graph Representation of the Magnetic Field Topology in High-Fidelity Plasma Simulations for Machine Learning Applications

  • paper_url: http://arxiv.org/abs/2307.09469
  • repo_url: None
  • paper_authors: Ioanna Bouri, Fanni Franssila, Markku Alho, Giulia Cozzani, Ivan Zaitsev, Minna Palmroth, Teemu Roos
  • for: study of magnetic reconnection in three-dimensional magnetic vector fields
  • methods: scalable pipeline for topological data analysis and spatiotemporal graph representation
  • results: demonstration on simulations of the Earth’s magnetosphere produced by Vlasiator
    Abstract Topological analysis of the magnetic field in simulated plasmas allows the study of various physical phenomena in a wide range of settings. One such application is magnetic reconnection, a phenomenon related to the dynamics of the magnetic field topology, which is difficult to detect and characterize in three dimensions. We propose a scalable pipeline for topological data analysis and spatiotemporal graph representation of three-dimensional magnetic vector fields. We demonstrate our methods on simulations of the Earth's magnetosphere produced by Vlasiator, a supercomputer-scale Vlasov theory-based simulation for near-Earth space. The purpose of this work is to challenge the machine learning community to explore graph-based machine learning approaches to address a largely open scientific problem with wide-ranging potential impact.
    摘要 对模拟等离子体中磁场的拓扑分析,使我们能够在广泛的设置下研究多种物理现象。其中一个应用是磁重联,这一现象与磁场拓扑的动力学有关,在三维情形下难以检测和刻画。我们提出了一种可扩展的流水线,用于三维磁矢量场的拓扑数据分析与时空图表示。我们在Vlasiator(一种面向近地空间、基于弗拉索夫理论的超级计算机规模模拟)生成的地球磁层模拟上演示了我们的方法。这项工作的目的是激励机器学习界探索基于图的机器学习方法,以解决这一具有广泛潜在影响、在很大程度上仍然开放的科学问题。
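
As a rough illustration of turning a gridded magnetic vector field into a graph, the sketch below marks grid points where |B| is small as candidate nulls and links nearby candidates. This is a deliberate simplification of the paper's topological pipeline (which classifies critical points properly and tracks them over time); the toy field and thresholds are assumptions.

```python
# Simplified sketch: extract candidate magnetic null points from a gridded
# vector field and build a spatial graph over them.
import numpy as np
import networkx as nx

def candidate_nulls(B: np.ndarray, threshold: float) -> np.ndarray:
    """B has shape (nx, ny, nz, 3); return grid indices where |B| < threshold."""
    mag = np.linalg.norm(B, axis=-1)
    return np.argwhere(mag < threshold)

def build_null_graph(nulls: np.ndarray, max_dist: float = 2.0) -> nx.Graph:
    """Connect candidate nulls that are close in grid coordinates."""
    g = nx.Graph()
    for i, p in enumerate(nulls):
        g.add_node(i, pos=tuple(p))
    for i in range(len(nulls)):
        for j in range(i + 1, len(nulls)):
            if np.linalg.norm(nulls[i] - nulls[j]) <= max_dist:
                g.add_edge(i, j)
    return g

# Toy field: rotation around the z-axis, so the field is weak along that axis.
x, y, z = np.meshgrid(*(np.linspace(-1, 1, 17),) * 3, indexing="ij")
B = np.stack([-y, x, 0.1 * z], axis=-1)
graph = build_null_graph(candidate_nulls(B, threshold=0.05))
print(graph.number_of_nodes(), graph.number_of_edges())
```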

Functional PCA and Deep Neural Networks-based Bayesian Inverse Uncertainty Quantification with Transient Experimental Data

  • paper_url: http://arxiv.org/abs/2307.05592
  • repo_url: None
  • paper_authors: Ziyu Xie, Mahmoud Yaseen, Xu Wu
  • for: 这篇论文的目的是开发一种逆向不确定性量化(inverse UQ)流程,基于实验数据来量化模型输入的不确定性。
  • methods: 论文使用函数型主成分分析(functional PCA)和深度神经网络(DNN)来构建快速的代理模型,从而降低计算成本。
  • results: 研究比较了不同的逆向UQ流程,结果表明,所提出的方法能够更好地降低TRACE瞬态模拟的维度,且其预测结果与实验数据更加一致。
    Abstract Inverse UQ is the process to inversely quantify the model input uncertainties based on experimental data. This work focuses on developing an inverse UQ process for time-dependent responses, using dimensionality reduction by functional principal component analysis (PCA) and deep neural network (DNN)-based surrogate models. The demonstration is based on the inverse UQ of TRACE physical model parameters using the FEBA transient experimental data. The measurement data is time-dependent peak cladding temperature (PCT). Since the quantity-of-interest (QoI) is time-dependent that corresponds to infinite-dimensional responses, PCA is used to reduce the QoI dimension while preserving the transient profile of the PCT, in order to make the inverse UQ process more efficient. However, conventional PCA applied directly to the PCT time series profiles can hardly represent the data precisely due to the sudden temperature drop at the time of quenching. As a result, a functional alignment method is used to separate the phase and amplitude information of the transient PCT profiles before dimensionality reduction. DNNs are then trained using PC scores from functional PCA to build surrogate models of TRACE in order to reduce the computational cost in Markov Chain Monte Carlo sampling. Bayesian neural networks are used to estimate the uncertainties of DNN surrogate model predictions. In this study, we compared four different inverse UQ processes with different dimensionality reduction methods and surrogate models. The proposed approach shows an improvement in reducing the dimension of the TRACE transient simulations, and the forward propagation of inverse UQ results has a better agreement with the experimental data.
    摘要 逆向不确定性量化(inverse UQ)是基于实验数据反向量化模型输入不确定性的过程。本工作致力于为时间相关响应开发一种逆向UQ流程,利用函数型主成分分析(PCA)进行降维,并使用基于深度神经网络(DNN)的代理模型。演示案例为利用FEBA瞬态实验数据对TRACE物理模型参数进行逆向UQ,测量数据是随时间变化的峰值包壳温度(PCT)。由于感兴趣量(QoI)随时间变化、对应无限维响应,因此使用PCA在保留PCT瞬态轮廓的同时降低QoI维度,以提高逆向UQ的效率。然而,由于骤冷时温度骤降,直接对PCT时间序列应用常规PCA难以精确表示数据,因此在降维之前先使用函数对齐方法分离瞬态PCT轮廓的相位和幅值信息。随后利用函数型PCA得到的主成分得分训练DNN,构建TRACE的代理模型,以降低马尔可夫链蒙特卡罗采样的计算成本,并使用贝叶斯神经网络估计DNN代理模型预测的不确定性。本研究比较了采用不同降维方法和代理模型的四种逆向UQ流程。所提出的方法在降低TRACE瞬态模拟维度方面有所改进,且逆向UQ结果的正向传播与实验数据吻合更好。
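
The sketch below illustrates the general recipe on synthetic data: compress transient responses with PCA and fit a cheap surrogate from uncertain parameters to the principal-component scores. It omits the functional alignment step and the Bayesian networks used in the paper; all data and names here are placeholders.

```python
# Minimal sketch: PCA dimensionality reduction of time-dependent responses plus
# a neural surrogate from model parameters to PC scores (stand-in for TRACE).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_samples, n_times = 200, 100
t = np.linspace(0.0, 1.0, n_times)

# Hypothetical inputs (2 uncertain model parameters) and transient responses.
theta = rng.uniform(0.5, 1.5, size=(n_samples, 2))
curves = theta[:, [0]] * np.exp(-theta[:, [1]] * t) \
         + 0.01 * rng.standard_normal((n_samples, n_times))

# Keep a few principal components of the transients.
pca = PCA(n_components=3).fit(curves)
scores = pca.transform(curves)                  # (n_samples, 3)

# Surrogate model: parameters -> PC scores, replacing the expensive code.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
surrogate.fit(theta, scores)

# Reconstruct a predicted transient for a new parameter sample.
theta_new = np.array([[1.0, 0.8]])
curve_pred = pca.inverse_transform(surrogate.predict(theta_new))
print(curve_pred.shape)  # (1, 100)
```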

SITTA: A Semantic Image-Text Alignment for Image Captioning

  • paper_url: http://arxiv.org/abs/2307.05591
  • repo_url: https://github.com/ml-jku/semantic-image-text-alignment
  • paper_authors: Fabian Paischer, Thomas Adler, Markus Hofmarcher, Sepp Hochreiter
  • for: 这篇论文的目的是将视觉模型中的语义信息迁移到生成式语言模型中,使其具备描述图像的丰富语言能力。
  • methods: 论文提出了两种新的构造方法,用于把视觉模型嵌入空间中的语义信息映射到生成式语言模型的嵌入空间:一种通过词元(token)对应关系构建映射,另一种利用额外的图文配对数据直接构建从视觉空间到语言空间的映射。
  • results: 利用这两种方法,论文在MS-COCO和Flickr30k数据集上取得了强劲的图像描述性能;即使在数据有限的情况下,也部分超越了零样本乃至微调的竞争方法。消融实验表明,仅有2.5亿参数的语言模型也能借助该方法生成不错的描述,这使得计算资源受限的机构更容易开展图像描述工作。
    Abstract Textual and semantic comprehension of images is essential for generating proper captions. The comprehension requires detection of objects, modeling of relations between them, an assessment of the semantics of the scene and, finally, representing the extracted knowledge in a language space. To achieve rich language capabilities while ensuring good image-language mappings, pretrained language models (LMs) were conditioned on pretrained multi-modal (image-text) models that allow for image inputs. This requires an alignment of the image representation of the multi-modal model with the language representations of a generative LM. However, it is not clear how to best transfer semantics detected by the vision encoder of the multi-modal model to the LM. We introduce two novel ways of constructing a linear mapping that successfully transfers semantics between the embedding spaces of the two pretrained models. The first aligns the embedding space of the multi-modal language encoder with the embedding space of the pretrained LM via token correspondences. The latter leverages additional data that consists of image-text pairs to construct the mapping directly from vision to language space. Using our semantic mappings, we unlock image captioning for LMs without access to gradient information. By using different sources of data we achieve strong captioning performance on MS-COCO and Flickr30k datasets. Even in the face of limited data, our method partly exceeds the performance of other zero-shot and even finetuned competitors. Our ablation studies show that even LMs at a scale of merely 250M parameters can generate decent captions employing our semantic mappings. Our approach makes image captioning more accessible for institutions with restricted computational resources.
    摘要 对图像的文本与语义理解是生成恰当描述的关键。这种理解需要检测对象、建模对象之间的关系、评估场景的语义,并最终将提取到的知识表示到语言空间中。为了在保证良好图文映射的同时获得丰富的语言能力,已有工作将预训练语言模型(LM)以可接收图像输入的预训练多模态(图文)模型为条件。这需要将多模态模型的图像表示与生成式LM的语言表示进行对齐。然而,如何把多模态模型视觉编码器检测到的语义最优地传递给LM尚不明确。我们提出了两种构建线性映射的新方法,能够在两个预训练模型的嵌入空间之间成功传递语义:第一种通过词元对应关系,将多模态语言编码器的嵌入空间与预训练LM的嵌入空间对齐;第二种利用由图文对构成的额外数据,直接构建从视觉空间到语言空间的映射。借助这些语义映射,我们使LM无需梯度信息即可完成图像描述。通过使用不同的数据来源,我们在MS-COCO和Flickr30k数据集上取得了强劲的描述性能;即使数据有限,我们的方法也部分超越了其他零样本乃至微调的竞争方法。消融研究表明,仅有2.5亿参数规模的LM也能借助我们的语义映射生成不错的描述。我们的方法使图像描述对计算资源受限的机构更加可及。
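
A rough sketch of the second mapping variant described above: fitting a linear map from a vision embedding space to a language-model embedding space from paired data. Dimensions and data are synthetic stand-ins; a real pipeline would use actual CLIP image embeddings and LM embeddings of captions.

```python
# Sketch: fit Y ≈ X @ W by (ridge-regularised) least squares, where X holds
# image embeddings and Y holds matching language-space embeddings.
import numpy as np

rng = np.random.default_rng(0)
d_vision, d_lm, n_pairs = 512, 768, 1000

# Stand-ins for CLIP image embeddings and mean-pooled LM embeddings of captions.
X = rng.standard_normal((n_pairs, d_vision))
Y = rng.standard_normal((n_pairs, d_lm))

# Closed-form ridge solution for the linear map.
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(d_vision), X.T @ Y)   # (512, 768)

# Project a new image embedding into the LM space; nearest-token lookup or
# prompt construction would follow in a full captioning pipeline.
x_new = rng.standard_normal(d_vision)
lm_vector = x_new @ W
print(lm_vector.shape)  # (768,)
```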

Information decomposition to identify relevant variation in complex systems with machine learning

  • paper_url: http://arxiv.org/abs/2307.04755
  • repo_url: None
  • paper_authors: Kieran A. Murphy, Dani S. Bassett
  • for: 本研究旨在提供一种实用、有效和通用的方法,以解compose the information contained in a set of measurements,以便更好地理解复杂系统的行为。
  • methods: 该方法以分布式信息瓶颈为学习目标,对每项测量进行有损压缩,按照与宏观行为的相关性对测量中的变化进行排序,并比较不同预测信息量下最重要的测量子集。
  • results: 研究表明,该方法能够分解复杂系统中的信息,并在不同的预测信息量下提供更细致的刻画。在两个典型的复杂系统(布尔电路和发生塑性变形的无定形材料)中,研究者可以通过检查学到的压缩方案,了解系统中与宏观行为最相关的变化,从而更好地理解系统的行为。
    Abstract One of the fundamental steps toward understanding a complex system is identifying variation at the scale of the system's components that is most relevant to behavior on a macroscopic scale. Mutual information is a natural means of linking variation across scales of a system due to its independence of the particular functional relationship between variables. However, estimating mutual information given high-dimensional, continuous-valued data is notoriously difficult, and the desideratum -- to reveal important variation in a comprehensible manner -- is only readily achieved through exhaustive search. Here we propose a practical, efficient, and broadly applicable methodology to decompose the information contained in a set of measurements by lossily compressing each measurement with machine learning. Guided by the distributed information bottleneck as a learning objective, the information decomposition sorts variation in the measurements of the system state by relevance to specified macroscale behavior, revealing the most important subsets of measurements for different amounts of predictive information. Additional granularity is achieved by inspection of the learned compression schemes: the variation transmitted during compression is composed of distinctions among measurement values that are most relevant to the macroscale behavior. We focus our analysis on two paradigmatic complex systems: a Boolean circuit and an amorphous material undergoing plastic deformation. In both examples, specific bits of entropy are identified out of the high entropy of the system state as most related to macroscale behavior for insight about the connection between micro- and macro- in the complex system. The identification of meaningful variation in data, with the full generality brought by information theory, is made practical for the study of complex systems.
    摘要 理解复杂系统的一个基本步骤,是识别系统组件尺度上与宏观行为最相关的变化。互信息不依赖于变量之间具体的函数关系,因而是连接系统不同尺度间变化的自然工具。然而,在高维、连续取值的数据上估计互信息十分困难,而“以可理解的方式揭示重要变化”这一目标通常只能通过穷举搜索实现。我们提出了一种实用、高效且广泛适用的方法,通过用机器学习对每项测量进行有损压缩,来分解一组测量中所包含的信息。以分布式信息瓶颈为学习目标,该信息分解按照与指定宏观行为的相关性对系统状态测量中的变化进行排序,揭示在不同预测信息量下最重要的测量子集。通过检查学到的压缩方案还可以获得更细的粒度:压缩过程中传递的变化,由测量取值之间与宏观行为最相关的区分构成。我们聚焦于两个典型的复杂系统:布尔电路和发生塑性变形的无定形材料。在这两个例子中,都能从系统状态的高熵中识别出与宏观行为最相关的特定熵比特,从而洞察复杂系统中微观与宏观之间的联系。借助信息论带来的完全一般性,在数据中识别有意义的变化对于复杂系统研究变得切实可行。
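
The sketch below is my own compact rendering of the distributed information bottleneck idea: one stochastic encoder per measurement, a KL penalty that prices the information each encoder keeps, and a shared decoder predicting the macroscale quantity. The toy system and hyperparameters are assumptions.

```python
# Sketch: per-measurement lossy compression with a shared predictive decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, dim_code: int = 2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 2 * dim_code))

    def forward(self, x):                       # x: (batch, 1)
        mu, log_var = self.net(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1).sum(-1)
        return z, kl                            # code and per-sample KL

n_meas, dim_code, beta = 4, 2, 1e-2
encoders = nn.ModuleList(Encoder(dim_code) for _ in range(n_meas))
decoder = nn.Sequential(nn.Linear(n_meas * dim_code, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(list(encoders.parameters()) + list(decoder.parameters()), lr=1e-3)

# Toy system: the macroscale output depends only on the first two measurements.
X = torch.randn(4096, n_meas)
y = (X[:, :1] * X[:, 1:2]).detach()

for step in range(200):
    codes, kls = zip(*(enc(X[:, i:i + 1]) for i, enc in enumerate(encoders)))
    pred = decoder(torch.cat(codes, dim=-1))
    loss = F.mse_loss(pred, y) + beta * torch.stack(kls, dim=-1).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# KL per measurement ~ how much information each measurement contributes.
print([round(k.mean().item(), 3) for k in kls])
```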

Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

  • paper_url: http://arxiv.org/abs/2307.04751
  • repo_url: None
  • paper_authors: Anthony Simeonov, Ankit Goyal, Lucas Manuelli, Lin Yen-Chen, Alina Sarmiento, Alberto Rodriguez, Pulkit Agrawal, Dieter Fox
  • for: 提出了一个系统,用于重新摆放场景中的物体,以达到期望的物体与场景之间的摆放关系,例如把一本书插入书架上的空槽中。
  • methods: 采用一条直接在三维点云上运行、并从示范中训练的流程;该流程可泛化到场景与物体的全新几何形状、姿态和布局,并能处理多种不同的重新摆放任务。
  • results: 系统能够完成多种不同的重新摆放任务,包括处理多模态性以及物体形状和姿态的变化;在仿真和真实世界中的评估结果显示,系统能够精确且高效地完成这些任务。
    Abstract We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of many geometrically-similar rearrangement solutions for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach on three distinct rearrangement tasks that require handling multi-modality and generalization over object shape and pose in both simulation and the real world. Project website, code, and videos: https://anthonysimeonov.github.io/rpdiff-multi-modal/
    摘要 我们提出了一个系统,用于重新摆放场景中的物体,以达到期望的物体与场景之间的摆放关系,例如把一本书插入书架上的空槽中。该流程可泛化到场景与物体的全新几何形状、姿态和布局,并通过示范训练,直接在三维点云上运行。对于给定场景,往往存在许多几何上相似的重新摆放解,我们的系统克服了由此带来的挑战。通过迭代式的位姿去噪训练过程,我们可以拟合多模态的示范数据并产生多模态的输出,同时保持精确与准确。我们还展示了以相关的局部几何特征为条件、并忽略会损害泛化与精度的无关全局结构所带来的优势。我们在三个需要处理多模态以及对物体形状和姿态进行泛化的重新摆放任务上,于仿真和真实环境中演示了我们的方法。项目网站、代码和视频:https://anthonysimeonov.github.io/rpdiff-multi-modal/

Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback

  • paper_url: http://arxiv.org/abs/2307.04749
  • repo_url: None
  • paper_authors: Jaskirat Singh, Liang Zheng
  • for: This paper aims to improve the accuracy of text-to-image alignment in latent diffusion models, which have shown remarkable progress in the field of text-conditioned image generation.
  • methods: The proposed approach uses a decompositional approach to evaluate and improve text-to-image alignment. This involves introducing a Decompositional-Alignment-Score, which decomposes a complex prompt into a set of disjoint assertions and measures the alignment of each assertion with generated images using a VQA model.
  • results: The proposed approach shows significantly higher correlation with human ratings compared to traditional CLIP and BLIP scores, and also provides useful feedback in the form of assertion level alignment scores. Human user studies indicate that the proposed approach surpasses previous state-of-the-art by 8.7% in overall text-to-image alignment accuracy.
  • for: 这项研究旨在提高潜在扩散模型中文本到图像对齐的准确性;此类模型已在文本条件图像生成领域取得了显著进展。
  • methods: 所提出的方法采用分解式思路来评估并改进文本到图像对齐:引入分解对齐分数(Decompositional-Alignment-Score),将复杂的提示分解为一组互不重叠的断言,再用VQA模型度量每条断言与生成图像的对齐程度。
  • results: 与传统的CLIP和BLIP分数相比,所提出的方法与人工评分的相关性显著更高,同时还以断言级对齐分数的形式提供有用的反馈。人类用户研究表明,该方法在整体文本到图像对齐准确率上比此前的最先进方法高出8.7%。
    Abstract The field of text-conditioned image generation has made unparalleled progress with the recent advent of latent diffusion models. While remarkable, as the complexity of given text input increases, the state-of-the-art diffusion models may still fail in generating images which accurately convey the semantics of the given prompt. Furthermore, it has been observed that such misalignments are often left undetected by pretrained multi-modal models such as CLIP. To address these problems, in this paper we explore a simple yet effective decompositional approach towards both evaluation and improvement of text-to-image alignment. In particular, we first introduce a Decompositional-Alignment-Score which given a complex prompt decomposes it into a set of disjoint assertions. The alignment of each assertion with generated images is then measured using a VQA model. Finally, alignment scores for different assertions are combined aposteriori to give the final text-to-image alignment score. Experimental analysis reveals that the proposed alignment metric shows significantly higher correlation with human ratings as opposed to traditional CLIP, BLIP scores. Furthermore, we also find that the assertion level alignment scores provide a useful feedback which can then be used in a simple iterative procedure to gradually increase the expression of different assertions in the final image outputs. Human user studies indicate that the proposed approach surpasses previous state-of-the-art by 8.7% in overall text-to-image alignment accuracy. Project page for our paper is available at https://1jsingh.github.io/divide-evaluate-and-refine
    摘要 随着潜在扩散模型的出现,文本条件图像生成领域取得了前所未有的进展。然而,随着给定文本输入越来越复杂,最先进的扩散模型仍可能无法生成准确传达提示语义的图像;而且这类不对齐常常不会被CLIP等预训练多模态模型察觉。为了解决这些问题,本文探索了一种简单而有效的分解式方法,用于评估并改进文本到图像对齐。具体而言,我们首先引入分解对齐分数(Decompositional-Alignment-Score):给定一个复杂提示,将其分解为一组互不重叠的断言;随后用VQA模型度量每条断言与生成图像的对齐程度;最后将不同断言的对齐分数事后组合,得到最终的文本到图像对齐分数。实验分析表明,与传统的CLIP、BLIP分数相比,所提出的对齐指标与人工评分的相关性显著更高。此外,断言级的对齐分数还提供了有用的反馈,可用于一个简单的迭代过程,逐步增强最终生成图像中不同断言的表达。人类用户研究表明,所提出的方法在整体文本到图像对齐准确率上比此前的最先进方法高出8.7%。论文项目页面:https://1jsingh.github.io/divide-evaluate-and-refine
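
A schematic sketch of the Decompositional-Alignment-Score recipe: decompose the prompt into assertions, score each with a VQA model, and combine the scores a posteriori. The decomposition rule and the VQA scorer below are placeholders so the example runs end to end; a real system would use an LLM-based decomposer and a model such as BLIP-VQA.

```python
# Sketch: assertion-level alignment scoring combined into one prompt-level score.
from typing import Callable, List

def decompose_prompt(prompt: str) -> List[str]:
    # Placeholder decomposition: split on " and " into disjoint assertions.
    return [p.strip() for p in prompt.split(" and ") if p.strip()]

def alignment_score(image, prompt: str,
                    vqa_yes_prob: Callable[[object, str], float]) -> float:
    """Average, over assertions, of the VQA probability that the assertion
    holds in the image."""
    assertions = decompose_prompt(prompt)
    questions = [f"Is there {a}?" for a in assertions]
    scores = [vqa_yes_prob(image, q) for q in questions]
    return sum(scores) / len(scores)

# Dummy VQA scorer so the sketch runs; swap in a real VQA model here.
def dummy_vqa(image, question: str) -> float:
    return 0.9 if "cat" in question else 0.4

print(alignment_score(image=None, prompt="a black cat and a red ball",
                      vqa_yes_prob=dummy_vqa))   # (0.9 + 0.4) / 2 = 0.65
```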

RoCo: Dialectic Multi-Robot Collaboration with Large Language Models

  • paper_url: http://arxiv.org/abs/2307.04738
  • repo_url: https://github.com/MandiZhao/robot-collab
  • paper_authors: Zhao Mandi, Shreeya Jain, Shuran Song
  • for: 这个论文旨在探讨多机器人合作的新方法,利用预训大量语言模型(LLM)来实现高层次通信和低层次路径规划。
  • methods: 在该方法中,配备LLM的机器人集体推理并讨论任务策略,生成子任务计划和任务空间路径点,这些路径点由多臂运动规划器用于加速轨迹规划。此外,环境反馈(例如碰撞检查)会提示LLM代理就地改进其计划和路径点。
  • results: 在RoCoBench基准上,该方法取得了较高的成功率,并能适应任务语义的变化。对话式设置具有高可解释性和灵活性;在真实世界实验中,RoCo可以与人类协作完成任务。
    Abstract We propose a novel approach to multi-robot collaboration that harnesses the power of pre-trained large language models (LLMs) for both high-level communication and low-level path planning. Robots are equipped with LLMs to discuss and collectively reason task strategies. They then generate sub-task plans and task space waypoint paths, which are used by a multi-arm motion planner to accelerate trajectory planning. We also provide feedback from the environment, such as collision checking, and prompt the LLM agents to improve their plan and waypoints in-context. For evaluation, we introduce RoCoBench, a 6-task benchmark covering a wide range of multi-robot collaboration scenarios, accompanied by a text-only dataset for agent representation and reasoning. We experimentally demonstrate the effectiveness of our approach -- it achieves high success rates across all tasks in RoCoBench and adapts to variations in task semantics. Our dialog setup offers high interpretability and flexibility -- in real world experiments, we show RoCo easily incorporates human-in-the-loop, where a user can communicate and collaborate with a robot agent to complete tasks together. See project website https://project-roco.github.io for videos and code.
    摘要 我们提出了一种新的多机器人合作方法,利用预训练的大型自然语言模型(LLM)来实现高级沟通和低级路径规划。机器人通过LLM进行集体理解任务策略,然后生成子任务计划和任务空间弧轨路径,这些路径被用于加速曲线规划。我们还提供了环境反馈,如碰撞检测,并让LLM代理更新其计划和弧轨路径。为评估,我们提出了RoCoBench,一个6个任务的benchmark,涵盖了多机器人合作场景的各种情况,并附带了文本数据集,用于代理表示和理解。我们的对话设置具有高可读性和灵活性,在实际世界实验中,我们示例了RoCo可以轻松地与人类在Loop合作,以完成任务。更多信息请访问项目网站https://project-roco.github.io。

A unifying framework for differentially private quantum algorithms

  • paper_url: http://arxiv.org/abs/2307.04733
  • repo_url: None
  • paper_authors: Armando Angrisani, Mina Doosti, Elham Kashefi
  • for: 本研究旨在提供一种通用的量子隐私定义,以保护敏感信息的处理。
  • methods: 论文提出了一种新的、通用的量子邻近状态定义,并提出在机制中同时加入经典噪声与量子噪声,以提供更紧的隐私保证。
  • results: 研究结果表明,基于这一新的邻近定义与混合噪声机制,可以为量子测量提供指数级更紧的隐私保证。此外,论文还证明了量子曲棍球棒散度(hockey-stick divergence)的高级联合凸性并将其应用于量子差分隐私;在能够获得输入态多个副本的情形下,结合测度集中与加噪机制,可在几乎不损失精度的情况下保证差分隐私。
    Abstract Differential privacy is a widely used notion of security that enables the processing of sensitive information. In short, differentially private algorithms map "neighbouring" inputs to close output distributions. Prior work proposed several quantum extensions of differential privacy, each of them built on substantially different notions of neighbouring quantum states. In this paper, we propose a novel and general definition of neighbouring quantum states. We demonstrate that this definition captures the underlying structure of quantum encodings and can be used to provide exponentially tighter privacy guarantees for quantum measurements. Our approach combines the addition of classical and quantum noise and is motivated by the noisy nature of near-term quantum devices. Moreover, we also investigate an alternative setting where we are provided with multiple copies of the input state. In this case, differential privacy can be ensured with little loss in accuracy combining concentration of measure and noise-adding mechanisms. En route, we prove the advanced joint convexity of the quantum hockey-stick divergence and we demonstrate how this result can be applied to quantum differential privacy. Finally, we complement our theoretical findings with an empirical estimation of the certified adversarial robustness ensured by differentially private measurements.
    摘要 差分隐私是一种被广泛使用的安全概念,使得处理敏感信息成为可能。简而言之,差分隐私算法会把“相邻”的输入映射到相近的输出分布。已有工作提出了差分隐私的多种量子扩展,但它们分别建立在差异很大的量子邻近状态概念之上。本文提出了一种新的、通用的量子邻近状态定义。我们证明该定义刻画了量子编码的内在结构,并可用于为量子测量提供指数级更紧的隐私保证。我们的方法结合了经典噪声与量子噪声的加入,其动机来自近期量子设备固有的噪声特性。此外,我们还研究了另一种设定,即可以获得输入态的多个副本;在这种情形下,结合测度集中与加噪机制,可在几乎不损失精度的情况下保证差分隐私。在此过程中,我们证明了量子曲棍球棒散度的高级联合凸性,并展示了如何将这一结果应用于量子差分隐私。最后,我们用数值实验估计了差分隐私测量所保证的可认证对抗鲁棒性,以补充理论结果。
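
For concreteness, the hockey-stick divergence E_gamma(rho || sigma) = Tr[(rho - gamma*sigma)_+] that appears in this line of work can be evaluated numerically from eigenvalues. The sketch below (mine, not the paper's code) also shows that depolarizing noise contracts the divergence between neighbouring states, which is the mechanism behind tighter privacy guarantees.

```python
# Sketch: quantum hockey-stick divergence and the contraction under
# depolarizing noise, for two "neighbouring" single-qubit states.
import numpy as np

def hockey_stick(rho: np.ndarray, sigma: np.ndarray, gamma: float) -> float:
    """Sum of the positive eigenvalues of the Hermitian matrix rho - gamma*sigma."""
    eigvals = np.linalg.eigvalsh(rho - gamma * sigma)
    return float(np.sum(eigvals[eigvals > 0]))

def depolarize(rho: np.ndarray, p: float) -> np.ndarray:
    d = rho.shape[0]
    return (1 - p) * rho + p * np.eye(d) / d

def ket(theta: float) -> np.ndarray:
    """Density matrix of a real pure qubit state at angle theta."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)

rho, sigma = ket(0.0), ket(0.2)        # two nearby pure states
for p in (0.0, 0.2, 0.5):
    e = hockey_stick(depolarize(rho, p), depolarize(sigma, p), gamma=1.0)
    print(f"noise p={p:.1f}  E_1 = {e:.4f}")   # divergence decreases with noise
```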

Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04726
  • repo_url: None
  • paper_authors: Suzan Ece Ada, Erhan Oztop, Emre Ugur
  • for: 本研究旨在提高Offline Reinforcement Learning(RL)方法的灵活性和可靠性,以便在不同的环境中学习更好的策略。
  • methods: 本研究使用扩散策略(Diffusion Policies),并加入状态重建特征学习,以缓解分布外(out-of-distribution)泛化问题。
  • results: 本研究在二维多模态上下文老虎机环境中取得了最先进的性能,并在多个D4RL基准任务上也取得了优异的结果。
    Abstract Offline Reinforcement Learning (RL) methods leverage previous experiences to learn better policies than the behavior policy used for experience collection. In contrast to behavior cloning, which assumes the data is collected from expert demonstrations, offline RL can work with non-expert data and multimodal behavior policies. However, offline RL algorithms face challenges in handling distribution shifts and effectively representing policies due to the lack of online interaction during training. Prior work on offline RL uses conditional diffusion models to obtain expressive policies to represent multimodal behavior in the dataset. Nevertheless, they are not tailored toward alleviating the out-of-distribution state generalization. We introduce a novel method incorporating state reconstruction feature learning in the recent class of diffusion policies to address the out-of-distribution generalization problem. State reconstruction loss promotes more descriptive representation learning of states to alleviate the distribution shift incurred by the out-of-distribution states. We design a 2D Multimodal Contextual Bandit environment to demonstrate and evaluate our proposed model. We assess the performance of our model not only in this new environment but also on several D4RL benchmark tasks, achieving state-of-the-art results.
    摘要 离线强化学习(RL)方法利用既有经验来学习比收集经验所用的行为策略更好的策略。与假定数据来自专家示范的行为克隆不同,离线RL可以使用非专家数据和多模态的行为策略。然而,由于训练过程中缺乏在线交互,离线RL算法在处理分布偏移和有效表示策略方面面临挑战。已有的离线RL工作使用条件扩散模型来获得表达能力强的策略,以表示数据集中的多模态行为,但它们并未针对分布外状态的泛化问题。我们提出了一种新方法,在近期的扩散策略中引入状态重建特征学习,以解决分布外泛化问题:状态重建损失促使模型学习更具描述性的状态表示,从而缓解分布外状态带来的分布偏移。我们设计了一个二维多模态上下文老虎机环境来演示和评估所提出的模型,并不仅在这一新环境中、还在多个D4RL基准任务上评估了其性能,取得了最先进的结果。
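
A schematic sketch (assumed architecture, not the authors' code) of coupling a diffusion policy with a state-reconstruction auxiliary loss: the state encoder feeds both the noise-prediction head and a reconstruction decoder, and both losses are optimized jointly. Data, sizes and the noise schedule are placeholders.

```python
# Sketch: diffusion policy training with a state-reconstruction auxiliary loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, action_dim, hidden = 8, 2, 64

encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
decoder = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, state_dim))
# Noise-prediction network, conditioned on encoded state, noisy action, timestep.
eps_net = nn.Sequential(nn.Linear(hidden + action_dim + 1, hidden), nn.ReLU(),
                        nn.Linear(hidden, action_dim))
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters(),
                        *eps_net.parameters()], lr=1e-3)

T = 50
alphas_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 2e-2, T), dim=0)

# Offline batch (synthetic stand-in for a D4RL-style dataset).
states = torch.randn(256, state_dim)
actions = torch.tanh(states[:, :action_dim])

for step in range(200):
    t = torch.randint(0, T, (states.shape[0], 1))
    a_bar = alphas_bar[t]                                   # (batch, 1)
    noise = torch.randn_like(actions)
    noisy_actions = a_bar.sqrt() * actions + (1 - a_bar).sqrt() * noise

    h = encoder(states)
    eps_pred = eps_net(torch.cat([h, noisy_actions, t.float() / T], dim=-1))
    diffusion_loss = F.mse_loss(eps_pred, noise)
    recon_loss = F.mse_loss(decoder(h), states)             # auxiliary objective
    loss = diffusion_loss + 0.1 * recon_loss
    opt.zero_grad(); loss.backward(); opt.step()

print(float(diffusion_loss), float(recon_loss))
```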

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

  • paper_url: http://arxiv.org/abs/2307.04725
  • repo_url: https://github.com/guoyww/animatediff
  • paper_authors: Yuwei Guo, Ceyuan Yang, Anyi Rao, Yaohui Wang, Yu Qiao, Dahua Lin, Bo Dai
  • for: 提出了一个实用的框架,以便使得大多数现有的个性化文本到图像模型都能够通过一次性调整生成动画图像。
  • methods: 核心思想是在冻结的文本到图像模型中插入一个新初始化的运动建模模块,并在视频片段上训练它,以蒸馏出合理的运动先验。
  • results: 论文在多个具有代表性的公开个性化文本到图像模型上进行了评估,结果表明该框架能帮助这些模型生成时间上平滑的动画片段,同时保持其输出的领域特性与多样性。
    Abstract With the advance of text-to-image models (e.g., Stable Diffusion) and corresponding personalization techniques such as DreamBooth and LoRA, everyone can manifest their imagination into high-quality images at an affordable cost. Subsequently, there is a great demand for image animation techniques to further combine generated static images with motion dynamics. In this report, we propose a practical framework to animate most of the existing personalized text-to-image models once and for all, saving efforts in model-specific tuning. At the core of the proposed framework is to insert a newly initialized motion modeling module into the frozen text-to-image model and train it on video clips to distill reasonable motion priors. Once trained, by simply injecting this motion modeling module, all personalized versions derived from the same base T2I readily become text-driven models that produce diverse and personalized animated images. We conduct our evaluation on several public representative personalized text-to-image models across anime pictures and realistic photographs, and demonstrate that our proposed framework helps these models generate temporally smooth animation clips while preserving the domain and diversity of their outputs. Code and pre-trained weights will be publicly available at https://animatediff.github.io/ .
    摘要 随着文本到图像模型(如Stable Diffusion)以及DreamBooth、LoRA等相应个性化技术的发展,每个人都能以可承受的成本把想象变成高质量图像。随之而来的是对图像动画技术的巨大需求,以便把生成的静态图像与运动动态进一步结合。本报告提出了一个实用框架,可以一次性地为大多数现有的个性化文本到图像模型加上动画能力,省去针对具体模型的调优。框架的核心是在冻结的文本到图像模型中插入一个新初始化的运动建模模块,并在视频片段上训练它,以蒸馏出合理的运动先验。训练完成后,只需注入这一运动建模模块,所有源自同一基础T2I的个性化版本便立即成为文本驱动的模型,能够生成多样且个性化的动画图像。我们在涵盖动漫图片和真实照片的多个具有代表性的公开个性化文本到图像模型上进行了评估,结果表明所提出的框架能帮助这些模型生成时间上平滑的动画片段,同时保持其输出的领域特性与多样性。代码与预训练权重将公开于 https://animatediff.github.io/。
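
The structural idea above can be sketched as follows: a newly initialized temporal-attention module, with a zero-initialized output projection so the frozen image model's behaviour is unchanged at the start of training, is the only trainable component. The block shapes and the stand-in for the frozen layer are my own assumptions, not the released architecture.

```python
# Sketch: insert a trainable temporal module next to a frozen per-frame block.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.proj = nn.Linear(channels, channels)
        nn.init.zeros_(self.proj.weight)        # zero-init: identity at start,
        nn.init.zeros_(self.proj.bias)          # so the frozen T2I is preserved

    def forward(self, x):                        # x: (batch, frames, tokens, ch)
        b, f, n, c = x.shape
        h = x.permute(0, 2, 1, 3).reshape(b * n, f, c)   # attend along frames
        h, _ = self.attn(h, h, h)
        h = self.proj(h).reshape(b, n, f, c).permute(0, 2, 1, 3)
        return x + h                             # residual connection

frozen_spatial = nn.Linear(320, 320)             # stand-in for a frozen T2I block
for p in frozen_spatial.parameters():
    p.requires_grad_(False)
motion = TemporalAttention(320)                  # the only trainable part

video_feats = torch.randn(2, 16, 64, 320)        # batch, frames, tokens, channels
out = motion(frozen_spatial(video_feats))
print(out.shape)                                  # torch.Size([2, 16, 64, 320])
```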

Advances and Challenges in Meta-Learning: A Technical Review

  • paper_url: http://arxiv.org/abs/2307.04722
  • repo_url: None
  • paper_authors: Anna Vettoruzzo, Mohamed-Rafik Bouguelia, Joaquin Vanschoren, Thorsteinn Rögnvaldsson, KC Santosh
  • for: 这篇评论旨在提供一个全面的技术概述,探讨meta-learning在实际应用中的重要性,以及它如何帮助学习系统从多个任务中获得知识,以更快地适应和泛化到新任务。
  • methods: 综述涵盖了当前元学习领域的最新方法,并探讨了元学习与多任务学习、迁移学习、领域自适应与泛化、自监督学习、个性化联邦学习以及持续学习之间的关系。
  • results: 评论总结了当前领域的最新研究发展,并提出了未解决的问题和挑战,以便未来研究者可以更好地发挥创新力和积极性。
    Abstract Meta-learning empowers learning systems with the ability to acquire knowledge from multiple tasks, enabling faster adaptation and generalization to new tasks. This review provides a comprehensive technical overview of meta-learning, emphasizing its importance in real-world applications where data may be scarce or expensive to obtain. The paper covers the state-of-the-art meta-learning approaches and explores the relationship between meta-learning and multi-task learning, transfer learning, domain adaptation and generalization, self-supervised learning, personalized federated learning, and continual learning. By highlighting the synergies between these topics and the field of meta-learning, the paper demonstrates how advancements in one area can benefit the field as a whole, while avoiding unnecessary duplication of efforts. Additionally, the paper delves into advanced meta-learning topics such as learning from complex multi-modal task distributions, unsupervised meta-learning, learning to efficiently adapt to data distribution shifts, and continual meta-learning. Lastly, the paper highlights open problems and challenges for future research in the field. By synthesizing the latest research developments, this paper provides a thorough understanding of meta-learning and its potential impact on various machine learning applications. We believe that this technical overview will contribute to the advancement of meta-learning and its practical implications in addressing real-world problems.
    摘要 Meta-学习授予学习系统多个任务知识的能力,以便更快地适应和泛化新任务。本文提供了关于Meta-学习的全面技术综述,强调其在实际应用中的重要性,特别是数据可能罕见或便宜得不到的情况下。文章涵盖了当前Meta-学习领域的state-of-the-art方法,并探讨Meta-学习和多任务学习、传输学习、领域适应和泛化、自动学习、个性化联合学习和持续学习之间的关系。文章指出这些话题之间的相互关系,并表明在一个领域进步可以对另一个领域产生积极影响,而不需要重复努力。此外,文章还探讨了Meta-学习高级主题,如从复杂多Modal任务分布学习、无监督Meta-学习、高效地适应数据分布变化学习和持续Meta-学习。最后,文章揭示了未解决的问题和未来研究的挑战。通过总结最新的研究发展,本文提供了Meta-学习的全面理解,以及其在不同机器学习应用中的实际影响。我们认为这种技术综述将对Meta-学习的进一步发展和实际应用产生贡献。

On the curvature of the loss landscape

  • paper_url: http://arxiv.org/abs/2307.04719
  • repo_url: https://github.com/Enosh-P/Study-on-Loss-Landscape-Geometry-for-Improving-Generalization-in-Adaptive-Optimization-Methods
  • paper_authors: Alison Pouplin, Hrittik Roy, Sidak Pal Singh, Georgios Arvanitidis
  • for: Understanding the generalization abilities of over-parameterized deep learning models.
  • methods: Analyzing the loss landscape as an embedded Riemannian manifold, focusing on the scalar curvature.
  • results: Connections between the scalar curvature and generalization in deep learning models.
  • for: 了解深度学习模型的泛化能力。
  • methods: 将损失景观视为一个嵌入的黎曼流形进行分析,重点关注标量曲率。
  • results: 关于scalar curvature和泛化的连接。
    Abstract One of the main challenges in modern deep learning is to understand why such over-parameterized models perform so well when trained on finite data. A way to analyze this generalization concept is through the properties of the associated loss landscape. In this work, we consider the loss landscape as an embedded Riemannian manifold and show that the differential geometric properties of the manifold can be used when analyzing the generalization abilities of a deep net. In particular, we focus on the scalar curvature, which can be computed analytically for our manifold, and show connections to several settings that potentially imply generalization.
    摘要 现代深度学习的主要挑战之一,是理解为何这类过参数化模型在有限数据上训练后仍表现如此出色。分析这种泛化能力的一种途径,是研究相应损失景观的性质。在这项工作中,我们把损失景观视为一个嵌入的黎曼流形,并证明该流形的微分几何性质可用于分析深度网络的泛化能力。特别地,我们关注标量曲率,它在我们所考虑的流形上可以解析计算,并展示了它与若干可能蕴含泛化的设定之间的联系。

Cobalt: Optimizing Mining Rewards in Proof-of-Work Network Games

  • paper_url: http://arxiv.org/abs/2307.04695
  • repo_url: None
  • paper_authors: Arti Vedula, Abhishek Gupta, Shaileshh Bojja Venkatakrishnan
  • for: 提高挖矿 reward 的最佳方式
  • methods: 使用 combinatorial bandit 算法,利用网络坐标来学习网络结构
  • results: 对多种网络设置进行实验,提出的方法能够超过或匹配基线方法的性能
    Abstract Mining in proof-of-work blockchains has become an expensive affair requiring specialized hardware capable of executing several megahashes per second at huge electricity costs. Miners earn a reward each time they mine a block within the longest chain, which helps offset their mining costs. It is therefore of interest to miners to maximize the number of mined blocks in the blockchain and increase revenue. A key factor affecting mining rewards earned is the connectivity between miners in the peer-to-peer network. To maximize rewards a miner must choose its network connections carefully, ensuring existence of paths to other miners that are on average of a lower latency compared to paths between other miners. We formulate the problem of deciding whom to connect to for miners as a combinatorial bandit problem. Each node picks its neighbors strategically to minimize the latency to reach 90\% of the hash power of the network relative to the 90-th percentile latency from other nodes. A key contribution of our work is the use of a network coordinates based model for learning the network structure within the bandit algorithm. Experimentally we show our proposed algorithm outperforming or matching baselines on diverse network settings.
    摘要 在工作量证明(proof-of-work)区块链中挖矿已成为一件成本高昂的事情,需要能够每秒执行数百万次哈希的专用硬件,并消耗巨额电费。矿工每当在最长链中挖出一个区块便可获得奖励,这有助于抵消其挖矿成本。因此,矿工有动机最大化自己在区块链中挖出的区块数量,从而增加收入。影响挖矿奖励的一个关键因素是矿工在点对点网络中的连通性:为了最大化奖励,矿工必须谨慎选择网络连接,确保通往其他矿工的路径平均延迟低于其他矿工之间路径的延迟。我们将矿工选择连接对象的问题建模为一个组合老虎机(combinatorial bandit)问题:每个节点有策略地选择邻居,使其到达全网90%哈希算力所需的延迟,相对于其他节点的90分位延迟尽可能小。本工作的一个关键贡献是在老虎机算法中使用基于网络坐标的模型来学习网络结构。实验表明,所提出的算法在多种网络设置下优于或不逊于基线方法。
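
A toy semi-bandit version of the peer-selection problem described above (much simplified relative to the paper, which uses network coordinates): estimate per-peer latencies from noisy observations, pick the k lowest-latency peers epsilon-greedily, and judge a connection set by the latency needed to reach 90% of hash power through it. All quantities below are synthetic.

```python
# Sketch: epsilon-greedy peer selection against a "time to reach 90% of hash
# power" objective, with the rest of the network reached via chosen peers.
import numpy as np

rng = np.random.default_rng(0)
n_peers, k, rounds, eps = 30, 5, 2000, 0.1
direct = rng.uniform(10, 200, n_peers)                 # ms to each potential peer
relay = rng.uniform(5, 50, (n_peers, n_peers))         # ms between peers
np.fill_diagonal(relay, 0.0)
hash_power = rng.dirichlet(np.ones(n_peers))           # each peer's share

def time_to_90(chosen, direct_latency):
    """Latency at which 90% of hash power is reachable via the chosen peers."""
    reach = (direct_latency[chosen, None] + relay[chosen, :]).min(axis=0)
    order = np.argsort(reach)
    cum = np.cumsum(hash_power[order])
    return reach[order][np.searchsorted(cum, 0.9)]

est = np.full(n_peers, 100.0)                          # running latency estimates
counts = np.zeros(n_peers)
for _ in range(rounds):
    if rng.random() < eps:
        chosen = rng.choice(n_peers, size=k, replace=False)   # explore
    else:
        chosen = np.argsort(est)[:k]                          # exploit
    observed = direct[chosen] + rng.normal(0, 5, k)
    counts[chosen] += 1
    est[chosen] += (observed - est[chosen]) / counts[chosen]  # running mean

best = np.argsort(est)[:k]
print("peers:", best, "time to 90% hash power (ms):", round(time_to_90(best, direct), 1))
```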

FreeDrag: Point Tracking is Not What You Need for Interactive Point-based Image Editing

  • paper_url: http://arxiv.org/abs/2307.04684
  • repo_url: https://github.com/lpengyang/freedrag
  • paper_authors: Pengyang Ling, Lin Chen, Pan Zhang, Huaian Chen, Yi Jin
  • for: 提高图像修改精度和灵活性,解决DragGAN在点追踪方面存在缺陷和困难。
  • methods: 提出了一种面向特征的方法,即FreeDrag,通过自适应模板特征、线搜索和模糊定位技术来解决点跟踪中的困难。
  • results: 对 DragGAN 进行比较,FreeDrag 能够在具有相似结构、细节或多个目标点的场景下实现稳定和高效的点基型图像修改。
    Abstract To serve the intricate and varied demands of image editing, precise and flexible manipulation of image content is indispensable. Recently, DragGAN has achieved impressive editing results through point-based manipulation. However, we have observed that DragGAN struggles with miss tracking, where DragGAN encounters difficulty in effectively tracking the desired handle points, and ambiguous tracking, where the tracked points are situated within other regions that bear resemblance to the handle points. To deal with the above issues, we propose FreeDrag, which adopts a feature-oriented approach to free the burden on point tracking within the point-oriented methodology of DragGAN. The FreeDrag incorporates adaptive template features, line search, and fuzzy localization techniques to perform stable and efficient point-based image editing. Extensive experiments demonstrate that our method is superior to the DragGAN and enables stable point-based editing in challenging scenarios with similar structures, fine details, or under multi-point targets.
    摘要 为了满足图像编辑复杂多样的需求,对图像内容进行精确而灵活的操控是不可或缺的。近期,DragGAN通过基于点的操控取得了令人印象深刻的编辑效果。然而,我们观察到DragGAN存在漏跟踪问题,即难以有效跟踪期望的控制点;以及模糊跟踪问题,即被跟踪的点落在与控制点相似的其他区域内。为解决上述问题,我们提出了FreeDrag,它采用面向特征的思路,减轻DragGAN这类面向点的方法中点跟踪的负担。FreeDrag结合自适应模板特征、线搜索和模糊定位技术,实现稳定且高效的基于点的图像编辑。大量实验表明,我们的方法优于DragGAN,并能在具有相似结构、精细细节或多目标点等具有挑战性的场景下实现稳定的基于点的编辑。

Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles

  • paper_url: http://arxiv.org/abs/2307.04679
  • repo_url: None
  • paper_authors: Kevin Scaman, Mathieu Even, Laurent Massoulié
  • for: 本文提出了一个新的框架,用于分析一阶优化算法在统计学习中的泛化误差,适用于梯度只能通过由oracle给出的部分观测来获取的情形。该分析依赖于梯度关于数据样本的正则性,能够为多种学习问题推导出近乎匹配的泛化误差上界和下界,包括监督学习、迁移学习、鲁棒学习、分布式学习以及使用梯度量化的通信高效学习。这些结果适用于光滑且强凸的优化问题,也适用于满足Polyak-Lojasiewicz假设的光滑非凸优化问题。特别地,上界和下界依赖于一个新的量,它推广了条件标准差的概念,衡量在能够访问oracle的条件下梯度可被近似的程度。
  • methods: 分析基于梯度关于数据样本的正则性,并利用推广条件标准差的新量来刻画梯度的可近似程度。
  • results: 结果表明,在标准的监督学习问题中,采用批量大小递增并配合热启动的小批量梯度下降,可以达到在一个常数倍因子内最优的泛化误差,这支持在实际应用中使用这种优化方案。
    Abstract In this paper, we provide a novel framework for the analysis of generalization error of first-order optimization algorithms for statistical learning when the gradient can only be accessed through partial observations given by an oracle. Our analysis relies on the regularity of the gradient w.r.t. the data samples, and allows to derive near matching upper and lower bounds for the generalization error of multiple learning problems, including supervised learning, transfer learning, robust learning, distributed learning and communication efficient learning using gradient quantization. These results hold for smooth and strongly-convex optimization problems, as well as smooth non-convex optimization problems verifying a Polyak-Lojasiewicz assumption. In particular, our upper and lower bounds depend on a novel quantity that extends the notion of conditional standard deviation, and is a measure of the extent to which the gradient can be approximated by having access to the oracle. As a consequence, our analysis provides a precise meaning to the intuition that optimization of the statistical learning objective is as hard as the estimation of its gradient. Finally, we show that, in the case of standard supervised learning, mini-batch gradient descent with increasing batch sizes and a warm start can reach a generalization error that is optimal up to a multiplicative factor, thus motivating the use of this optimization scheme in practical applications.
    摘要 本文提出了一个新的框架,用于分析一阶优化算法在统计学习中的泛化误差,适用于梯度只能通过oracle给出的部分观测来获取的情形。我们的分析依赖于梯度关于数据样本的正则性,能够为多种学习问题推导出近乎匹配的泛化误差上界和下界,包括监督学习、迁移学习、鲁棒学习、分布式学习以及使用梯度量化的通信高效学习。这些结果适用于光滑且强凸的优化问题,也适用于满足Polyak-Lojasiewicz假设的光滑非凸优化问题。特别地,我们的上界和下界依赖于一个新的量,它推广了条件标准差的概念,衡量在能够访问oracle的条件下梯度可被近似的程度。因此,我们的分析为“优化统计学习目标与估计其梯度同样困难”这一直觉给出了精确含义。最后,我们证明在标准监督学习情形下,采用批量大小递增并配合热启动的小批量梯度下降,可以达到在一个常数倍因子内最优的泛化误差,这支持在实际应用中使用这一优化方案。
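
The optimization scheme highlighted in the last sentence is easy to illustrate numerically: run mini-batch gradient descent on a least-squares problem, growing the batch size geometrically and warm-starting each stage from the previous iterate. The sketch below is my own toy illustration, not the paper's experiment.

```python
# Sketch: mini-batch gradient descent with increasing batch sizes and warm start.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20000, 10
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = X @ w_true + 0.1 * rng.standard_normal(n)

def grad(w, idx):
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)

w = np.zeros(d)                     # warm start for the first stage is trivial
batch, lr = 8, 0.1
while batch <= n:
    for _ in range(200):            # fixed number of steps per stage
        idx = rng.choice(n, size=min(batch, n), replace=False)
        w -= lr * grad(w, idx)
    excess = np.mean((X @ w - y) ** 2) - 0.01   # subtract the noise variance
    print(f"batch={batch:6d}  excess risk={excess:.5f}")
    batch *= 4                      # larger batches refine the warm-started w
```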

LINFA: a Python library for variational inference with normalizing flow and annealing

  • paper_url: http://arxiv.org/abs/2307.04675
  • repo_url: https://github.com/desreslab/linfa
  • paper_authors: Yu Wang, Emma R. Cobian, Jubilee Lee, Fang Liu, Jonathan D. Hauenstein, Daniele E. Schiavazzi
  • for: 这篇论文旨在提供一个用于变分推断的Python库,以便处理计算代价高昂的模型以及参数相互依赖、难以采样的分布。
  • methods: 论文结合变分推断、归一化流(normalizing flow)与退火(annealing),来应对计算代价高和难以采样的问题。
  • results: 论文在多个基准问题上展示了LINFA的能力与性能,能够高效地处理复杂的模型和分布。
    Abstract Variational inference is an increasingly popular method in statistics and machine learning for approximating probability distributions. We developed LINFA (Library for Inference with Normalizing Flow and Annealing), a Python library for variational inference to accommodate computationally expensive models and difficult-to-sample distributions with dependent parameters. We discuss the theoretical background, capabilities, and performance of LINFA in various benchmarks. LINFA is publicly available on GitHub at https://github.com/desResLab/LINFA.
    摘要 变分推断是统计学和机器学习中日益流行的近似概率分布的方法。我们开发了LINFA(Library for Inference with Normalizing Flow and Annealing),这是一个用于变分推断的Python库,可用于处理计算代价高昂的模型以及参数相互依赖、难以采样的分布。我们讨论了LINFA的理论背景、功能以及在多个基准问题上的性能。LINFA已在GitHub上公开:https://github.com/desResLab/LINFA。

Quantifying the Echo Chamber Effect: An Embedding Distance-based Approach

  • paper_url: http://arxiv.org/abs/2307.04668
  • repo_url: https://github.com/faalatawi/echo-chamber-score
  • paper_authors: Faisal Alatawi, Paras Sheth, Huan Liu
  • for: 本文旨在提出一种量化回音室(echo chamber)效应的方法,以便更好地理解社交媒体平台上的信息传播与社会极化。
  • methods: 本文提出了一种新的度量指标,即回音室分数(Echo Chamber Score, ECS),它既不需要用户政治立场的标签,也不对交互图的结构做任何假设。具体来说,为了度量嵌入空间中用户之间的距离,ECS借助EchoGAE,一种基于自监督图自编码器的用户嵌入模型。
  • results: 根据 Twitter 数据集的四个话题(两个极化话题和两个非极化话题),我们的结果表明 ECS 是一种有效的量化 echo chamber 的工具,可以帮助我们更好地理解在线讨论的动态。
    Abstract The rise of social media platforms has facilitated the formation of echo chambers, which are online spaces where users predominantly encounter viewpoints that reinforce their existing beliefs while excluding dissenting perspectives. This phenomenon significantly hinders information dissemination across communities and fuels societal polarization. Therefore, it is crucial to develop methods for quantifying echo chambers. In this paper, we present the Echo Chamber Score (ECS), a novel metric that assesses the cohesion and separation of user communities by measuring distances between users in the embedding space. In contrast to existing approaches, ECS is able to function without labels for user ideologies and makes no assumptions about the structure of the interaction graph. To facilitate measuring distances between users, we propose EchoGAE, a self-supervised graph autoencoder-based user embedding model that leverages users' posts and the interaction graph to embed them in a manner that reflects their ideological similarity. To assess the effectiveness of ECS, we use a Twitter dataset consisting of four topics - two polarizing and two non-polarizing. Our results showcase ECS's effectiveness as a tool for quantifying echo chambers and shedding light on the dynamics of online discourse.
    摘要 社交媒体平台的兴起助长了回音室(echo chamber)的形成,即用户在其中主要接触到强化其既有信念的观点、而不同意见被排除在外的网络空间。这一现象严重阻碍信息在社区之间的传播,并加剧社会极化,因此开发量化回音室的方法至关重要。在本文中,我们提出回音室分数(ECS),一种通过度量嵌入空间中用户之间的距离来评估用户社区凝聚与分离程度的新指标。与现有方法不同,ECS无需用户意识形态标签即可工作,并且不对交互图的结构做任何假设。为了便于度量用户之间的距离,我们提出了EchoGAE,一种基于自监督图自编码器的用户嵌入模型,它利用用户的帖子和交互图,把用户嵌入到反映其意识形态相似性的空间中。为评估ECS的有效性,我们使用了一个包含四个话题(两个极化、两个非极化)的Twitter数据集。结果表明,ECS是量化回音室、揭示网络舆论动态的有效工具。
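
Since the exact ECS formula is not spelled out in the abstract, the sketch below uses a silhouette-style proxy for the same intuition: given user embeddings and community labels, score how cohesive and separated the communities are, so a polarized topic scores higher than a non-polarized one. Synthetic clusters stand in for EchoGAE embeddings.

```python
# Sketch: an embedding-distance proxy for echo-chamber strength.
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

def echo_chamber_proxy(embeddings: np.ndarray, communities: np.ndarray) -> float:
    """Higher values -> tighter, more separated communities.
    Rescaled from [-1, 1] to [0, 1]."""
    return (silhouette_score(embeddings, communities) + 1.0) / 2.0

# Polarized topic: two well-separated clusters of user embeddings.
polarized = np.vstack([rng.normal(-3, 1, (200, 16)), rng.normal(3, 1, (200, 16))])
# Non-polarized topic: the same labels, but embeddings drawn from one cloud.
mixed = rng.normal(0, 1, (400, 16))
labels = np.repeat([0, 1], 200)

print("polarized    :", round(echo_chamber_proxy(polarized, labels), 3))
print("non-polarized:", round(echo_chamber_proxy(mixed, labels), 3))
```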