cs.LG - 2023-07-26

Fluorescent Neuronal Cells v2: Multi-Task, Multi-Format Annotations for Deep Learning in Microscopy

  • paper_url: http://arxiv.org/abs/2307.14243
  • repo_url: None
  • paper_authors: Luca Clissa, Antonio Macaluso, Roberto Morelli, Alessandra Occhinegro, Emiliana Piscitiello, Ludovico Taddei, Marco Luppi, Roberto Amici, Matteo Cerri, Timna Hitrec, Lorenzo Rinaldi, Antonio Zoccoli
  • for: fluorescence microscopy image analysis and deep learning research in life sciences
  • methods: diverse markers for rodent neuronal cells’ nuclei and cytoplasm, ground-truth annotations for semantic segmentation, object detection, and counting
  • results: facilitating methodological advancements in computer vision approaches and catalyzing breakthroughs in fluorescence microscopy analysis for life sciences research.
    Abstract Fluorescent Neuronal Cells v2 is a collection of fluorescence microscopy images and the corresponding ground-truth annotations, designed to foster innovative research in the domains of Life Sciences and Deep Learning. This dataset encompasses three image collections in which rodent neuronal cells' nuclei and cytoplasm are stained with diverse markers to highlight their anatomical or functional characteristics. Alongside the images, we provide ground-truth annotations for several learning tasks, including semantic segmentation, object detection, and counting. The contribution is two-fold. First, given the variety of annotations and their accessible formats, we envision our work facilitating methodological advancements in computer vision approaches for segmentation, detection, feature learning, unsupervised and self-supervised learning, transfer learning, and related areas. Second, by enabling extensive exploration and benchmarking, we hope Fluorescent Neuronal Cells v2 will catalyze breakthroughs in fluorescence microscopy analysis and promote cutting-edge discoveries in life sciences. The data are available at: https://amsacta.unibo.it/id/eprint/7347

Evolving Multi-Objective Neural Network Controllers for Robot Swarms

  • paper_url: http://arxiv.org/abs/2307.14237
  • repo_url: None
  • paper_authors: Karl Mason, Sabine Hauert
  • for: developing multi-objective controllers for swarms of robots.
  • methods: an evolutionary neural network approach that trains swarm robot controllers in a low-fidelity Python simulator and then tests them in a high-fidelity simulated environment using Webots.
  • results: the proposed approach effectively controls each robot, and the swarm exhibits different behaviours as the weighting for each objective is adjusted; multi-objective neural network controllers evolved in the low-fidelity simulator transfer to the high-fidelity environment and scale to larger numbers of robots without further retraining.
    Abstract Many swarm robotics tasks consist of multiple conflicting objectives. This research proposes a multi-objective evolutionary neural network approach to developing controllers for swarms of robots. The swarm robot controllers are trained in a low-fidelity Python simulator and then tested in a high-fidelity simulated environment using Webots. Simulations are then conducted to test the scalability of the evolved multi-objective robot controllers to environments with a larger number of robots. The results presented demonstrate that the proposed approach can effectively control each of the robots. The robot swarm exhibits different behaviours as the weighting for each objective is adjusted. The results also confirm that multi-objective neural network controllers evolved in a low-fidelity simulator can be transferred to high-fidelity simulated environments and that the controllers can scale to environments with a larger number of robots without further retraining needed.
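
The weighted-sum selection the abstract describes can be sketched as a tiny evolution strategy over a flat controller weight vector; everything below (population size, mutation scale, and the two-objective `simulate` stub) is an illustrative assumption, not the paper's setup:

```python
import numpy as np

def evolve_controller(simulate, n_weights, w=(0.5, 0.5),
                      pop=20, gens=50, sigma=0.1, seed=0):
    # (1+lambda)-style evolution of a flat NN weight vector; fitness is the
    # weighted sum of two conflicting objectives, so changing `w` selects
    # for different swarm behaviours.
    rng = np.random.default_rng(seed)
    parent = rng.normal(size=n_weights)

    def fitness(theta):
        obj1, obj2 = simulate(theta)  # e.g. area covered and collisions avoided
        return w[0] * obj1 + w[1] * obj2

    best = fitness(parent)
    for _ in range(gens):
        children = parent + sigma * rng.normal(size=(pop, n_weights))
        scores = np.array([fitness(c) for c in children])
        if scores.max() > best:
            best, parent = scores.max(), children[scores.argmax()]
    return parent, best
```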

Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences

  • paper_url: http://arxiv.org/abs/2307.14225
  • repo_url: None
  • paper_authors: Scott Sanner, Krisztian Balog, Filip Radlinski, Ben Wedin, Lucas Dixon
  • for: studying how language-based preference expressions can be used to make recommendations.
  • methods: prompting large language models (LLMs) to produce recommendations from both item-based and language-based preferences, compared against state-of-the-art item-based collaborative filtering (CF) methods.
  • results: prompted LLMs provide competitive recommendation performance for pure language-based preferences in the near cold-start case without task-specific training (zero-shot) or with only a few labels (few-shot), and language-based preference representations are more explainable and scrutable than item-based ones.
    Abstract Traditional recommender systems leverage users' item preference history to recommend novel content that users may like. However, modern dialog interfaces that allow users to express language-based preferences offer a fundamentally different modality for preference input. Inspired by recent successes of prompting paradigms for large language models (LLMs), we study their use for making recommendations from both item-based and language-based preferences in comparison to state-of-the-art item-based collaborative filtering (CF) methods. To support this investigation, we collect a new dataset consisting of both item-based and language-based preferences elicited from users along with their ratings on a variety of (biased) recommended items and (unbiased) random items. Among numerous experimental results, we find that LLMs provide competitive recommendation performance for pure language-based preferences (no item preferences) in the near cold-start case in comparison to item-based CF methods, despite having no supervised training for this specific task (zero-shot) or only a few labels (few-shot). This is particularly promising as language-based preference representations are more explainable and scrutable than item-based or vector-based representations.
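
As a rough illustration of the prompting paradigm, a zero-shot prompt can combine item-based and language-based preferences in plain text; the template and the commented-out `complete` call below are hypothetical stand-ins, not the paper's prompts or API:

```python
def build_preference_prompt(liked, disliked, description, k=5):
    # Combine item-based and language-based preferences into one zero-shot
    # prompt; leave the item lists empty for the pure language-based case.
    lines = [f"A user describes their taste as: {description!r}."]
    if liked:
        lines.append("Items they liked: " + ", ".join(liked) + ".")
    if disliked:
        lines.append("Items they disliked: " + ", ".join(disliked) + ".")
    lines.append(f"Recommend {k} new items they may enjoy, one per line.")
    return "\n".join(lines)

prompt = build_preference_prompt([], [], "slow-burn sci-fi with strong world-building")
# response = complete(prompt)  # hypothetical call to any LLM completion endpoint
```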

Online Modeling and Monitoring of Dependent Processes under Resource Constraints

  • paper_url: http://arxiv.org/abs/2307.14208
  • repo_url: None
  • paper_authors: Tanapol Kosolwattana, Huazheng Wang, Ying Lin
  • for: monitoring a population of dependent processes under limited resources, which is critical for abnormal event detection.
  • methods: a novel online collaborative learning method that adaptively allocates resources between exploitation of high-risk processes and exploration of the dependent dynamics.
  • results: theoretical analysis and experiments prove the efficiency of the proposed method.
    Abstract Monitoring a population of dependent processes under limited resources is critical for abnormal events detection. A novel online collaborative learning method is proposed to adaptively allocate the resources for exploitation of high-risk processes and exploration of dependent dynamics. Efficiency of the proposed method is proved through theoretical analysis and experiments.

Application of Random Forest and Support Vector Machine for Investigation of Pressure Filtration Performance, a Zinc Plant Filter Cake Modeling

  • paper_url: http://arxiv.org/abs/2307.14199
  • repo_url: None
  • paper_authors: Masoume Kazemi, Davood Moradkhani, Alireza Abbas Alipour
  • for: modeling the pressure filtration stage of hydrometallurgical zinc production and predicting the moisture of the filter cake, which affects the amount of zinc recovered.
  • methods: Random Forest Regression (RFR) and Support Vector Regression (SVR) models that take continuous variables (extracted features) from lab samples as inputs.
  • results: the RFR model predicts cake moisture more accurately than the SVR model.
    Abstract The hydrometallurgical method of zinc production involves leaching zinc from ore and then separating the solid residue from the liquid solution by pressure filtration. This separation process is very important since the solid residue contains some moisture that can reduce the amount of zinc recovered. This study modeled the pressure filtration process through Random Forest (RF) and Support Vector Machine (SVM). The models take continuous variables (extracted features) from the lab samples as inputs. Thus, regression models namely Random Forest Regression (RFR) and Support Vector Regression (SVR) were chosen. A total dataset was obtained during the pressure filtration process in two conditions: 1) Polypropylene (S1) and 2) Polyester fabrics (S2). To predict the cake moisture, solids concentration (0.2 and 0.38), temperature (35 and 65 centigrade), pH (2, 3.5, and 5), pressure, cake thickness (14, 20, 26, and 34 mm), air-blow time (2, 10 and 15 min) and filtration time were applied as input variables. The models' predictive accuracy was evaluated by the coefficient of determination (R2) parameter. The results revealed that the RFR model is superior to the SVR model for cake moisture prediction.
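
A minimal sketch of the RFR-vs-SVR comparison on the coefficient of determination, assuming a feature matrix `X` (solids concentration, temperature, pH, pressure, cake thickness, air-blow time, filtration time) and cake-moisture targets `y`; hyperparameters are left at scikit-learn defaults rather than the paper's settings:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def compare_models(X, y, seed=0):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    models = {
        "RFR": RandomForestRegressor(random_state=seed),
        # SVR is scale-sensitive, hence the standardization pipeline.
        "SVR": make_pipeline(StandardScaler(), SVR()),
    }
    return {name: r2_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}
```

Calling `compare_models(X, y)` returns the held-out R2 for each model, the metric the paper uses to rank the two.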

Efficient Learning of Discrete-Continuous Computation Graphs

  • paper_url: http://arxiv.org/abs/2307.14193
  • repo_url: https://github.com/nec-research/dccg
  • paper_authors: David Friede, Mathias Niepert
  • for: proposing new strategies for training hybrid discrete-continuous models, which tend to generalize better and be more interpretable on complex machine learning tasks.
  • methods: stochastic computation graphs with multiple sequential discrete components built from stochastic softmax tricks; training is stabilized by increasing the scale parameter of the Gumbel noise perturbations and by dropout residual connections tailored to stochastic, discrete-continuous computation graphs.
  • results: the new strategies make it possible to train complex discrete-continuous models that cannot be trained with standard stochastic softmax tricks, and these models generalize better than their continuous counterparts on several benchmark datasets.
    Abstract Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more interpretable. A popular approach to building discrete-continuous computation graphs is that of integrating discrete probability distributions into neural networks using stochastic softmax tricks. Prior work has mainly focused on computation graphs with a single discrete component on each of the graph's execution paths. We analyze the behavior of more complex stochastic computations graphs with multiple sequential discrete components. We show that it is challenging to optimize the parameters of these models, mainly due to small gradients and local minima. We then propose two new strategies to overcome these challenges. First, we show that increasing the scale parameter of the Gumbel noise perturbations during training improves the learning behavior. Second, we propose dropout residual connections specifically tailored to stochastic, discrete-continuous computation graphs. With an extensive set of experiments, we show that we can train complex discrete-continuous models which one cannot train with standard stochastic softmax tricks. We also show that complex discrete-stochastic models generalize better than their continuous counterparts on several benchmark datasets.
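
A sketch of a Gumbel-softmax sample with an explicit noise-scale knob, the quantity the paper proposes increasing during training; the defaults and the straight-through option are illustrative, not the paper's configuration:

```python
import torch

def scaled_gumbel_softmax(logits, tau=1.0, noise_scale=1.0, hard=False):
    # Gumbel(0, noise_scale) perturbations; noise_scale > 1 corresponds to
    # larger noise during training, helping escape small gradients and
    # local minima in graphs with several sequential discrete components.
    u = torch.rand_like(logits).clamp_min(1e-10)
    gumbel = -torch.log(-torch.log(u))
    y = torch.softmax((logits + noise_scale * gumbel) / tau, dim=-1)
    if hard:
        # Straight-through estimator: one-hot forward pass, soft gradients.
        index = y.argmax(dim=-1, keepdim=True)
        y_hard = torch.zeros_like(y).scatter_(-1, index, 1.0)
        y = y_hard - y.detach() + y
    return y
```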

A comparison of machine learning surrogate models of street-scale flooding in Norfolk, Virginia

  • paper_url: http://arxiv.org/abs/2307.14185
  • repo_url: None
  • paper_authors: Diana McSpadden, Steven Goldenberg, Binata Roy, Malachi Schram, Jonathan L. Goodall, Heather Richter
  • for: assessing street-scale flooding in low-lying coastal cities such as Norfolk, Virginia, where rainfall and tides strain transportation and sewer systems and can cause property damage.
  • methods: using data from Norfolk rainfall events between 2016 and 2018 to compare a previous surrogate model based on a random forest algorithm with two deep learning models: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU).
  • results: the study underscores the importance of model architectures that communicate prediction uncertainty and effectively integrate relevant, multi-modal features.
    Abstract Low-lying coastal cities, exemplified by Norfolk, Virginia, face the challenge of street flooding caused by rainfall and tides, which strain transportation and sewer systems and can lead to property damage. While high-fidelity, physics-based simulations provide accurate predictions of urban pluvial flooding, their computational complexity renders them unsuitable for real-time applications. Using data from Norfolk rainfall events between 2016 and 2018, this study compares the performance of a previous surrogate model based on a random forest algorithm with two deep learning models: Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). This investigation underscores the importance of using a model architecture that supports the communication of prediction uncertainty and the effective integration of relevant, multi-modal features.
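
A minimal sketch of the kind of recurrent surrogate being compared, where one constructor argument switches between the LSTM and GRU cells; the feature set, layer sizes, and single-depth output head are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class FloodSurrogate(nn.Module):
    # Recurrent surrogate mapping a sequence of rainfall/tide features to a
    # flood-depth estimate; `cell` selects between the two compared
    # architectures while the rest of the model stays identical.
    def __init__(self, n_features, hidden=64, cell="lstm"):
        super().__init__()
        rnn = {"lstm": nn.LSTM, "gru": nn.GRU}[cell]
        self.rnn = rnn(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):  # x: (batch, time, n_features)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])  # prediction at the final time step
```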

Learning Disentangled Discrete Representations

  • paper_url: http://arxiv.org/abs/2307.14151
  • repo_url: https://github.com/david-friede/lddr
  • paper_authors: David Friede, Christian Reimers, Heiner Stuckenschmidt, Mathias Niepert
  • for: exploring how discrete latent spaces improve the quality of disentangled representations.
  • methods: replacing the standard Gaussian variational autoencoder (VAE) with a tailored categorical VAE; analytical and empirical findings show that the underlying grid structure of categorical distributions mitigates the rotational invariance of multivariate Gaussian distributions, acting as an efficient inductive prior for disentangled representations.
  • results: discrete VAEs learn disentangled representations better, and the paper introduces the first unsupervised model selection strategy that favors disentangled models.
    Abstract Recent successes in image generation, model-based reinforcement learning, and text-to-image generation have demonstrated the empirical advantages of discrete latent representations, although the reasons behind their benefits remain unclear. We explore the relationship between discrete latent spaces and disentangled representations by replacing the standard Gaussian variational autoencoder (VAE) with a tailored categorical variational autoencoder. We show that the underlying grid structure of categorical distributions mitigates the problem of rotational invariance associated with multivariate Gaussian distributions, acting as an efficient inductive prior for disentangled representations. We provide both analytical and empirical findings that demonstrate the advantages of discrete VAEs for learning disentangled representations. Furthermore, we introduce the first unsupervised model selection strategy that favors disentangled representations.

Toward Design of Synthetic Active Inference Agents by Mere Mortals

  • paper_url: http://arxiv.org/abs/2307.14145
  • repo_url: None
  • paper_authors: Bert de Vries
  • for: realizing effective active inference agents on edge devices
  • methods: a software toolbox supporting non-expert engineers to develop working active inference agents
  • results: accelerating the democratization of active inference agents on edge devices
    Abstract The theoretical properties of active inference agents are impressive, but how do we realize effective agents in working hardware and software on edge devices? This is an interesting problem because the computational load for policy exploration explodes exponentially, while the computational resources are very limited for edge devices. In this paper, we discuss the necessary features for a software toolbox that supports a competent non-expert engineer to develop working active inference agents. We introduce a toolbox-in-progress that aims to accelerate the democratization of active inference agents in a similar way as TensorFlow propelled applications of deep learning technology.

Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related Rewards

  • paper_url: http://arxiv.org/abs/2307.14138
  • repo_url: None
  • paper_authors: Behzad Nourani-Koliji, Steven Bilaj, Amir Rezaei Balef, Setareh Maghsudi
  • for: solving the piecewise stationary combinatorial semi-bandit problem with causally related rewards, where changes in the base arms' distributions, in the causal relationships between rewards, or both, alter the reward generation process in a nonstationary environment.
  • methods: a policy built on the Upper Confidence Bound (UCB) algorithm with a change-point detector based on the Generalized Likelihood Ratio (GLR) test, a novel group restart strategy for structured environments, and a mechanism to trace variations of the underlying causal graph structure.
  • results: theoretically, a regret upper bound that reflects the effects of the number of structural and distribution changes on performance; numerical experiments in real-world scenarios exhibit applicability and superior performance compared to state-of-the-art benchmarks.
    Abstract We study the piecewise stationary combinatorial semi-bandit problem with causally related rewards. In our nonstationary environment, variations in the base arms' distributions, causal relationships between rewards, or both, change the reward generation process. In such an environment, an optimal decision-maker must follow both sources of change and adapt accordingly. The problem becomes aggravated in the combinatorial semi-bandit setting, where the decision-maker only observes the outcome of the selected bundle of arms. The core of our proposed policy is the Upper Confidence Bound (UCB) algorithm. We assume the agent relies on an adaptive approach to overcome the challenge. More specifically, it employs a change-point detector based on the Generalized Likelihood Ratio (GLR) test. Besides, we introduce the notion of group restart as a new alternative restarting strategy in the decision making process in structured environments. Finally, our algorithm integrates a mechanism to trace the variations of the underlying graph structure, which captures the causal relationships between the rewards in the bandit setting. Theoretically, we establish a regret upper bound that reflects the effects of the number of structural- and distribution changes on the performance. The outcome of our numerical experiments in real-world scenarios exhibits applicability and superior performance of our proposal compared to the state-of-the-art benchmarks.
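
A sketch of the kind of statistic a GLR change-point detector computes, here for a single mean change in a Gaussian reward stream with known variance; the threshold and the known-variance assumption are illustrative simplifications of the detector used in the paper:

```python
import numpy as np

def glr_change_detected(x, sigma=1.0, threshold=10.0):
    # Generalized Likelihood Ratio statistic for one mean change in a
    # Gaussian stream: maximize over candidate change points s the
    # log-likelihood ratio between "one mean" and "two means".
    x = np.asarray(x, dtype=float)
    n = len(x)
    best = 0.0
    for s in range(1, n):
        m1, m2 = x[:s].mean(), x[s:].mean()
        stat = s * (n - s) / n * (m1 - m2) ** 2 / (2 * sigma**2)
        best = max(best, stat)
    return best > threshold  # restart the bandit statistics when True
```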

A Survey on Generative Modeling with Limited Data, Few Shots, and Zero Shot

  • paper_url: http://arxiv.org/abs/2307.14397
  • repo_url: https://github.com/sutd-visual-computing-group/awesome-generative-modeling-under-data-constraints
  • paper_authors: Milad Abdollahzadeh, Touba Malekzadeh, Christopher T. H. Teo, Keshigeyan Chandrasegaran, Guimeng Liu, Ngai-Man Cheung
  • for: surveying generative modeling under data constraints (GM-DC), i.e., learning generative models with limited data, few shots, or zero shots; this is important when data acquisition is challenging, e.g., in healthcare applications.
  • methods: two taxonomies, one over GM-DC tasks and one over GM-DC approaches, together with a study of the interactions between different GM-DC tasks and approaches.
  • results: a framework organizing GM-DC tasks and methods, plus a discussion of research gaps, research trends, and potential avenues for future exploration.
    Abstract In machine learning, generative modeling aims to learn to generate new data statistically similar to the training data distribution. In this paper, we survey learning generative models under limited data, few shots and zero shot, referred to as Generative Modeling under Data Constraint (GM-DC). This is an important topic when data acquisition is challenging, e.g. healthcare applications. We discuss background, challenges, and propose two taxonomies: one on GM-DC tasks and another on GM-DC approaches. Importantly, we study interactions between different GM-DC tasks and approaches. Furthermore, we highlight research gaps, research trends, and potential avenues for future exploration. Project website: https://gmdc-survey.github.io.

Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models

  • paper_url: http://arxiv.org/abs/2307.14134
  • repo_url: None
  • paper_authors: Himmet Toprak Kesgin, Muzaffer Kaan Yuce, Mehmet Fatih Amasyali
  • for: bridging the research gap in less-resourced languages by developing and evaluating tiny to medium-sized Turkish BERT models.
  • methods: training on a diverse dataset of over 75GB of text from multiple sources, and testing on several tasks, including mask prediction, sentiment analysis, news classification, and zero-shot classification.
  • results: despite their smaller size, the models exhibit robust performance, including on zero-shot tasks, while ensuring computational efficiency and faster execution times.
    Abstract This study introduces and evaluates tiny, mini, small, and medium-sized uncased Turkish BERT models, aiming to bridge the research gap in less-resourced languages. We trained these models on a diverse dataset encompassing over 75GB of text from multiple sources and tested them on several tasks, including mask prediction, sentiment analysis, news classification, and, zero-shot classification. Despite their smaller size, our models exhibited robust performance, including zero-shot task, while ensuring computational efficiency and faster execution times. Our findings provide valuable insights into the development and application of smaller language models, especially in the context of the Turkish language.

GraphRNN Revisited: An Ablation Study and Extensions for Directed Acyclic Graphs

  • paper_url: http://arxiv.org/abs/2307.14109
  • repo_url: None
  • paper_authors: Taniya Das, Mark Koch, Maya Ravichandran, Nikhil Khatri
  • for: learning generative models for graphs
  • methods: a reproduced implementation of the deep learning-based GraphRNN architecture, evaluated against baseline models using new metrics and an ablation study
  • results: the BFS traversal suggested by You et al. to collapse representations of isomorphic graphs contributes significantly to model performance; replacing the BFS traversal with a topological sort extends GraphRNN to directed acyclic graphs, with improved performance on a real-world dataset
    Abstract GraphRNN is a deep learning-based architecture proposed by You et al. for learning generative models for graphs. We replicate the results of You et al. using a reproduced implementation of the GraphRNN architecture and evaluate this against baseline models using new metrics. Through an ablation study, we find that the BFS traversal suggested by You et al. to collapse representations of isomorphic graphs contributes significantly to model performance. Additionally, we extend GraphRNN to generate directed acyclic graphs by replacing the BFS traversal with a topological sort. We demonstrate that this method improves significantly over a directed-multiclass variant of GraphRNN on a real-world dataset.
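
A sketch of the ordering change behind the DAG extension: a topological sort guarantees every edge points from an earlier to a later node, playing the role BFS plays for undirected graphs; the predecessor-list encoding returned here is one illustrative choice, not the paper's exact sequence format:

```python
import networkx as nx

def dag_node_ordering(dag: nx.DiGraph):
    # Topological order replaces GraphRNN's BFS ordering for DAGs: every
    # edge then points from an earlier to a later node, so each node only
    # needs to connect backwards to already-generated nodes.
    order = list(nx.topological_sort(dag))
    index = {v: i for i, v in enumerate(order)}
    # Per node: indices of already-generated predecessors, the kind of
    # sequence an edge-level RNN would be trained to emit.
    return [[index[u] for u in dag.predecessors(v)] for v in order]
```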

Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks

  • paper_url: http://arxiv.org/abs/2307.14085
  • repo_url: None
  • paper_authors: Siyu Chen, Mengdi Wang, Zhuoran Yang
  • for: learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
  • methods: reinforcement learning (RL) with maximum likelihood estimation (MLE) of the follower's quantal response model, which arises from an entropy-regularized policy optimization problem induced by the leader's policy.
  • results: sample-efficient algorithms for the leader's decision-making problem in both online and offline settings, achieving sublinear regret upper bounds without observing the follower's reward, with computationally efficient instantiations in the linear and myopic settings.
    Abstract We study reinforcement learning (RL) for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure. In specific, at the outset of the game, the leader announces her policy to the follower and commits to it. The follower observes the leader's policy and, in turn, adopts a quantal response policy by solving an entropy-regularized policy optimization problem induced by leader's policy. The goal of the leader is to find her optimal policy, which yields the optimal expected total return, by interacting with the follower and learning from data. A key challenge of this problem is that the leader cannot observe the follower's reward, and needs to infer the follower's quantal response model from his actions against leader's policies. We propose sample-efficient algorithms for both the online and offline settings, in the context of function approximation. Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision making problem, and we show that they achieve sublinear regret upper bounds. Moreover, we quantify the uncertainty of these estimators and leverage the uncertainty to implement optimistic and pessimistic algorithms for online and offline settings. Besides, when specialized to the linear and myopic setting, our algorithms are also computationally efficient. Our theoretical analysis features a novel performance-difference lemma which incorporates the error of quantal response model, which might be of independent interest.
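
The follower's quantal response has a closed form: the entropy-regularized best response is a softmax over action values, with the regularization strength interpolating between uniform play and the hard best response. A minimal sketch, with the value vector and temperature chosen for illustration:

```python
import numpy as np

def quantal_response(q_values, eta=1.0):
    # Quantal response: the unique maximizer of E[Q] + (1/eta) * entropy,
    # i.e. a softmax over action values. eta -> 0 gives uniform play;
    # eta -> infinity recovers the hard best response.
    z = eta * (q_values - q_values.max())  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()
```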

Learning to simulate partially known spatio-temporal dynamics with trainable difference operators

  • paper_url: http://arxiv.org/abs/2307.14395
  • repo_url: None
  • paper_authors: Xiang Huang, Zhuoyuan Li, Hongsheng Liu, Zidong Wang, Hongye Zhou, Bin Dong, Bei Hua
  • for: simulating spatio-temporal dynamics with neural networks; most existing methods are purely data-driven black-box models with limited accuracy and interpretability, so the paper proposes a new hybrid architecture, PDE-Net++, that explicitly embeds partial prior knowledge of the underlying PDEs by combining trainable difference operators with black-box models.
  • methods: two distinct options for the difference operators: the trainable flipping difference layer (TFDL) and the trainable dynamic difference layer (TDDL).
  • results: numerical experiments show that PDE-Net++ has better prediction accuracy and extrapolation performance than black-box models.
    Abstract Recently, using neural networks to simulate spatio-temporal dynamics has received a lot of attention. However, most existing methods adopt pure data-driven black-box models, which have limited accuracy and interpretability. By combining trainable difference operators with black-box models, we propose a new hybrid architecture explicitly embedded with partial prior knowledge of the underlying PDEs named PDE-Net++. Furthermore, we introduce two distinct options called the trainable flipping difference layer (TFDL) and the trainable dynamic difference layer (TDDL) for the difference operators. Numerous numerical experiments have demonstrated that PDE-Net++ has superior prediction accuracy and better extrapolation performance than black-box models.
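
A generic sketch of a trainable difference operator: a convolution initialized to a classical finite-difference stencil whose weights remain learnable. This illustrates the general idea of embedding a differential-operator prior, not the paper's TFDL/TDDL layers:

```python
import torch
import torch.nn as nn

class TrainableDifference1d(nn.Module):
    # Convolution initialized to the central finite-difference stencil
    # [-1/2, 0, 1/2] / dx, approximating du/dx; training may refine the
    # stencil weights while starting from the PDE-consistent prior.
    def __init__(self, dx=1.0):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=3, padding=1, bias=False)
        with torch.no_grad():
            self.conv.weight.copy_(torch.tensor([[[-0.5, 0.0, 0.5]]]) / dx)

    def forward(self, u):  # u: (batch, 1, n_grid)
        return self.conv(u)
```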

Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation

  • paper_url: http://arxiv.org/abs/2307.14068
  • repo_url: None
  • paper_authors: Long Liu, Bo Zhou, Zhipeng Zhao, Zening Liu
  • for: addressing difficulties in multi-source unsupervised domain adaptation (MUDA), where aligning overall feature distributions can introduce negative effects from redundant features within each domain, and a significant performance gap remains relative to supervised methods.
  • methods: Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation (D3AAMDA), with a multi-source dynamic modulation mechanism that controls the alignment level between each source domain and the target domain based on their distribution differences, plus a Multi-source Active Boundary Sample Selection (MABS) strategy that uses a guided dynamic boundary loss to design an efficient query function for selecting important samples.
  • results: extensive comparisons on commonly used domain adaptation datasets demonstrate the superiority of the method over existing UDA and ADA methods.
    Abstract Multi-source unsupervised domain adaptation (MUDA) aims to transfer knowledge from related source domains to an unlabeled target domain. While recent MUDA methods have shown promising results, most focus on aligning the overall feature distributions across source domains, which can lead to negative effects due to redundant features within each domain. Moreover, there is a significant performance gap between MUDA and supervised methods. To address these challenges, we propose a novel approach called Dynamic Domain Discrepancy Adjustment for Active Multi-Domain Adaptation (D3AAMDA). Firstly, we establish a multi-source dynamic modulation mechanism during the training process based on the degree of distribution differences between source and target domains. This mechanism controls the alignment level of features between each source domain and the target domain, effectively leveraging the local advantageous feature information within the source domains. Additionally, we propose a Multi-source Active Boundary Sample Selection (MABS) strategy, which utilizes a guided dynamic boundary loss to design an efficient query function for selecting important samples. This strategy achieves improved generalization to the target domain with minimal sampling costs. We extensively evaluate our proposed method on commonly used domain adaptation datasets, comparing it against existing UDA and ADA methods. The experimental results unequivocally demonstrate the superiority of our approach.

Hypergraph Isomorphism Computation

  • paper_url: http://arxiv.org/abs/2307.14394
  • repo_url: None
  • paper_authors: Yifan Feng, Jiashu Han, Shihui Ying, Yue Gao
  • for: solving the hypergraph isomorphism problem and improving the performance of hypergraph kernel methods.
  • methods: a hypergraph Weisfeiler-Lehman test algorithm and a general hypergraph Weisfeiler-Lehman kernel framework, with two implemented instances (a subtree kernel and a hyperedge kernel).
  • results: significant improvements in hypergraph classification, outperforming other typical kernel-based methods and running over 80 times faster in terms of runtime.
    Abstract The isomorphism problem is a fundamental problem in network analysis, which involves capturing both low-order and high-order structural information. In terms of extracting low-order structural information, graph isomorphism algorithms analyze the structural equivalence to reduce the solver space dimension, which demonstrates its power in many applications, such as protein design, chemical pathways, and community detection. For the more commonly occurring high-order relationships in real-life scenarios, the problem of hypergraph isomorphism, which effectively captures these high-order structural relationships, cannot be straightforwardly addressed using graph isomorphism methods. Besides, the existing hypergraph kernel methods may suffer from high memory consumption or inaccurate sub-structure identification, thus yielding sub-optimal performance. In this paper, to address the abovementioned problems, we first propose the hypergraph Weisfeiler-Lehman test algorithm for the hypergraph isomorphism test problem by generalizing the Weisfeiler-Lehman test algorithm from graphs to hypergraphs. Secondly, based on the presented algorithm, we propose a general hypergraph Weisfeiler-Lehman kernel framework and implement two instances, which are the Hypergraph Weisfeiler-Lehman Subtree Kernel and the Hypergraph Weisfeiler-Lehman Hyperedge Kernel. In order to fulfill our research objectives, a comprehensive set of experiments was meticulously designed, including seven graph classification datasets and 12 hypergraph classification datasets. Results on hypergraph classification datasets show significant improvements compared to other typical kernel-based methods, which demonstrates the effectiveness of the proposed methods. In our evaluation, we found that our proposed methods outperform the second-best method in terms of runtime, running over 80 times faster when handling complex hypergraph structures.
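
A sketch of the color-refinement idea behind a hypergraph Weisfeiler-Lehman test: node colors are iteratively re-hashed from the multisets of incident hyperedge colors. This conveys the mechanism rather than the paper's exact algorithm:

```python
from collections import Counter

def hypergraph_wl_colors(n_nodes, hyperedges, rounds=3):
    # 1-WL-style refinement generalized to hypergraphs: a hyperedge's color
    # is the multiset of its members' colors, and a node's new color hashes
    # its old color together with the multiset of its incident hyperedges'
    # colors. Two hypergraphs with different final color histograms are
    # certified non-isomorphic (within the same Python run).
    colors = [0] * n_nodes
    for _ in range(rounds):
        edge_colors = [
            hash(frozenset(Counter(colors[v] for v in e).items()))
            for e in hyperedges
        ]
        new = []
        for v in range(n_nodes):
            incident = sorted(
                edge_colors[i] for i, e in enumerate(hyperedges) if v in e)
            new.append(hash((colors[v], tuple(incident))))
        colors = new
    return Counter(colors)  # compare histograms of two hypergraphs
```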

Machine Learning Applications In Healthcare: The State Of Knowledge and Future Directions

  • paper_url: http://arxiv.org/abs/2307.14067
  • repo_url: None
  • paper_authors: Mrinmoy Roy, Sarwar J. Minar, Porarthi Dhar, A T M Omor Faruq
  • for: gathering and presenting Machine Learning (ML) applications across different areas of healthcare (community-level work, risk management/preventive care, healthcare operation management, remote care, and early detection) so that necessary information can be accessed quickly and the knowledge gap of clinicians about ML applications in healthcare is reduced.
  • methods: a comprehensive review of the existing literature to identify and categorize ML applications in healthcare, with relevant references and descriptions provided in tabular form for quick access.
  • results: a comprehensive overview of ML applications in healthcare, including their potential benefits and limitations, intended to motivate healthcare professionals towards more ML-based healthcare systems.
    Abstract Detection of easily missed hidden patterns with fast processing power makes machine learning (ML) indispensable to today's healthcare system. Though many ML applications have already been discovered and many are still under investigation, only a few have been adopted by current healthcare systems. As a result, there exists an enormous opportunity for ML in the healthcare system, but distributed information and the scarcity of properly arranged, easily explainable documentation in the sector are major impediments that make ML applications difficult for healthcare professionals. This study aimed to gather ML applications in different areas of healthcare concisely and effectively so that necessary information can be accessed immediately with relevant references. We divided our study into five major groups: community level work, risk management/preventive care, healthcare operation management, remote care, and early detection. Dividing these groups into subgroups, we provided relevant references with descriptions in tabular form for quick access. Our objective is to inform people about ML applicability in the healthcare industry, reduce the knowledge gap of clinicians about ML applications, and motivate healthcare professionals towards more machine learning based healthcare systems.

Pre-Training with Diffusion models for Dental Radiography segmentation

  • paper_url: http://arxiv.org/abs/2307.14066
  • repo_url: None
  • paper_authors: Jérémy Rousseau, Christian Alaka, Emma Covili, Hippolyte Mayard, Laura Misrachi, Willy Au
  • for: improving label efficiency in medical radiography segmentation, and dental radiography in particular, where labeling requires specific expertise and labor-intensive annotations.
  • methods: a straightforward pre-training method for semantic segmentation that first pre-trains a U-Net with the Denoising Diffusion Probabilistic Model (DDPM) training objective and then fine-tunes the resulting model on a segmentation task, with no architectural modifications between pre-training and downstream tasks.
  • results: experiments on the segmentation of dental radiographs show the method is competitive with state-of-the-art pre-training methods.
    Abstract Medical radiography segmentation, and specifically dental radiography, is highly limited by the cost of labeling which requires specific expertise and labor-intensive annotations. In this work, we propose a straightforward pre-training method for semantic segmentation leveraging Denoising Diffusion Probabilistic Models (DDPM), which have shown impressive results for generative modeling. Our straightforward approach achieves remarkable performance in terms of label efficiency and does not require architectural modifications between pre-training and downstream tasks. We propose to first pre-train a Unet by exploiting the DDPM training objective, and then fine-tune the resulting model on a segmentation task. Our experimental results on the segmentation of dental radiographs demonstrate that the proposed method is competitive with state-of-the-art pre-training methods.
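
A sketch of one DDPM pre-training step with the standard noise-prediction objective; the `unet(x_t, t)` signature and the precomputed `alphas_cumprod` schedule are assumptions about the interface, not the paper's code:

```python
import torch
import torch.nn.functional as F

def ddpm_pretrain_step(unet, x0, alphas_cumprod, optimizer):
    # Corrupt images at a random timestep and train the U-Net to predict
    # the injected noise (the standard epsilon-prediction DDPM objective);
    # the pre-trained encoder-decoder is later fine-tuned for segmentation.
    b = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (b,), device=x0.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    loss = F.mse_loss(unet(x_t, t), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```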

Topologically-Regularized Multiple Instance Learning for Red Blood Cell Disease Classification

  • paper_url: http://arxiv.org/abs/2307.14025
  • repo_url: None
  • paper_authors: Salome Kazeminia, Ario Sadafi, Asya Makhro, Anna Bogdanova, Carsten Marr, Bastian Rieck
  • for: diagnosing rare anemia disorders from microscopic images of red blood cells.
  • methods: regularizing multiple-instance learning with multi-scale topological features extracted from bags of single red blood cell images, enforcing the preservation of characteristic topological properties of the data.
  • results: on a dataset of 71 patients with rare anemia disorders and 521 microscopic images of red blood cells, topological regularization improves automated classification by more than 3%.
    Abstract Diagnosing rare anemia disorders using microscopic images is challenging for skilled specialists and machine-learning methods alike. Due to thousands of disease-relevant cells in a single blood sample, this constitutes a complex multiple-instance learning (MIL) problem. While the spatial neighborhood of red blood cells is not meaningful per se, the topology, i.e., the geometry of blood samples as a whole, contains informative features to remedy typical MIL issues, such as vanishing gradients and overfitting when training on limited data. We thus develop a topology-based approach that extracts multi-scale topological features from bags of single red blood cell images. The topological features are used to regularize the model, enforcing the preservation of characteristic topological properties of the data. Applied to a dataset of 71 patients suffering from rare anemia disorders with 521 microscopic images of red blood cells, our experiments show that topological regularization is an effective method that leads to more than 3% performance improvements for the automated classification of rare anemia disorders based on single-cell images. This is the first approach that uses topological properties for regularizing the MIL process.

Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?

  • paper_url: http://arxiv.org/abs/2307.14023
  • repo_url: None
  • paper_authors: Tokio Kajitsuka, Issei Sato
  • for: investigating the expressive capacity of Transformer models.
  • methods: clarifying the connection between the softmax function and the Boltzmann operator.
  • results: a single self-attention layer with low-rank weight matrices can perfectly capture the context of an entire input sequence, so a single-layer Transformer has a memorization capacity for finite samples, and a Transformer consisting of one self-attention layer and two feed-forward neural networks is a universal approximator for continuous functions on a compact domain.
    Abstract Existing analyses of the expressive capacity of Transformer models have required excessively deep layers for data memorization, leading to a discrepancy with the Transformers actually used in practice. This is primarily due to the interpretation of the softmax function as an approximation of the hardmax function. By clarifying the connection between the softmax function and the Boltzmann operator, we prove that a single layer of self-attention with low-rank weight matrices possesses the capability to perfectly capture the context of an entire input sequence. As a consequence, we show that single-layer Transformer has a memorization capacity for finite samples, and that Transformers consisting of one self-attention layer with two feed-forward neural networks are universal approximators for continuous functions on a compact domain.
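
The softmax-Boltzmann connection is easy to see numerically: the Boltzmann operator is a softmax-weighted average that converges to the maximum as its inverse temperature grows, which is what lets softmax attention emulate hardmax. A small demonstration with an illustrative input vector:

```python
import numpy as np

def boltzmann_operator(x, eta):
    # boltz_eta(x) = sum_i x_i * softmax(eta * x)_i; as eta -> infinity
    # this converges to max(x), linking softmax attention to hardmax.
    w = np.exp(eta * (x - x.max()))
    w /= w.sum()
    return float(w @ x)

x = np.array([0.1, 0.7, 0.3])
print([round(boltzmann_operator(x, eta), 4) for eta in (1, 10, 100)])
# values approach max(x) = 0.7 as eta grows
```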

MCMC-Correction of Score-Based Diffusion Models for Model Composition

  • paper_url: http://arxiv.org/abs/2307.14012
  • repo_url: https://github.com/jackonelli/mcmc_corr_score_diffusion
  • paper_authors: Anders Sjöberg, Jakob Lindqvist, Magnus Önnheim, Mats Jirstrand, Lennart Svensson
  • for: extending the sampling procedure of score-parameterized diffusion models so that they can be combined with Markov chain Monte Carlo (MCMC) methods for model composition.
  • methods: retaining the score parameterization and computing the energy-based acceptance probability through line integration of the score function.
  • results: on a 2D experiment, the approach achieves similar or arguably better performance than the energy parameterization while allowing existing pre-trained diffusion models to be reused.
    Abstract Diffusion models can be parameterised in terms of either a score or an energy function. The energy parameterisation has better theoretical properties, mainly that it enables an extended sampling procedure with a Metropolis--Hastings correction step, based on the change in total energy in the proposed samples. However, it seems to yield slightly worse performance, and more importantly, due to the widespread popularity of score-based diffusion, there is limited availability of off-the-shelf pre-trained energy-based ones. This limitation undermines the purpose of model composition, which aims to combine pre-trained models to sample from new distributions. Our proposal, however, suggests retaining the score parameterization and instead computing the energy-based acceptance probability through line integration of the score function. This allows us to re-use existing diffusion models and still combine the reverse process with various Markov-Chain Monte Carlo (MCMC) methods. We evaluate our method on a 2D experiment and find that it achieves similar or arguably better performance than the energy parameterisation.
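
A sketch of the core identity: since the score is the gradient of the log-density, the log-density difference between two points is a line integral of the score, which yields a Metropolis-Hastings acceptance probability without an explicit energy. The trapezoidal discretization and the symmetric-proposal assumption below are illustrative, not the paper's exact scheme:

```python
import numpy as np

def log_density_diff(score_fn, x, x_new, n_pts=16):
    # log p(x_new) - log p(x) = integral over t in [0, 1] of
    # score(x + t * (x_new - x)) . (x_new - x) dt, since score = grad log p.
    # Trapezoidal rule along the straight line between the two points.
    d = x_new - x
    ts = np.linspace(0.0, 1.0, n_pts)
    vals = np.array([score_fn(x + t * d) @ d for t in ts])
    return float(((vals[1:] + vals[:-1]) / 2).sum() * (ts[1] - ts[0]))

def mh_accept(score_fn, x, x_new, rng):
    # Metropolis-Hastings acceptance with a symmetric proposal, so the
    # proposal ratio cancels and only the log-density difference remains.
    return np.log(rng.uniform()) < log_density_diff(score_fn, x, x_new)
```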

Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG

  • paper_url: http://arxiv.org/abs/2307.14389
  • repo_url: https://github.com/yorgoon/diffe
  • paper_authors: Soowon Kim, Young-Eun Lee, Seo-Hyun Lee, Seong-Whan Lee
  • for: decoding imagined speech from EEG data to enable brain-computer communication.
  • methods: denoising diffusion probabilistic models (DDPMs) combined with a conditional autoencoder, named Diff-E, for processing EEG signals.
  • results: Diff-E significantly improves the accuracy of decoding EEG signals for imagined speech compared to traditional machine learning techniques and baseline models.
    Abstract Decoding EEG signals for imagined speech is a challenging task due to the high-dimensional nature of the data and low signal-to-noise ratio. In recent years, denoising diffusion probabilistic models (DDPMs) have emerged as promising approaches for representation learning in various domains. Our study proposes a novel method for decoding EEG signals for imagined speech using DDPMs and a conditional autoencoder named Diff-E. Results indicate that Diff-E significantly improves the accuracy of decoding EEG signals for imagined speech compared to traditional machine learning techniques and baseline models. Our findings suggest that DDPMs can be an effective tool for EEG signal decoding, with potential implications for the development of brain-computer interfaces that enable communication through imagined speech.

Fast algorithms for k-submodular maximization subject to a matroid constraint

  • paper_url: http://arxiv.org/abs/2307.13996
  • repo_url: None
  • paper_authors: Shuxian Niu, Qian Liu, Yang Zhou, Min Li
  • for: maximizing $k$-submodular functions under a matroid constraint with a Threshold-Decreasing Algorithm, which reduces the query complexity compared to the greedy algorithm with little loss in approximation ratio.
  • methods: a $(\frac{1}{2} - \epsilon)$-approximation algorithm for monotone $k$-submodular function maximization and a $(\frac{1}{3} - \epsilon)$-approximation algorithm for the non-monotone case, with complexity $O(\frac{n(k\cdot EO + IO)}{\epsilon} \log \frac{r}{\epsilon})$.
  • results: since a total size constraint is a special matroid (the uniform matroid), fast algorithms for maximizing $k$-submodular functions subject to a total size constraint follow as corollaries.
    Abstract In this paper, we apply a Threshold-Decreasing Algorithm to maximize $k$-submodular functions under a matroid constraint, which reduces the query complexity of the algorithm compared to the greedy algorithm with little loss in approximation ratio. We give a $(\frac{1}{2} - \epsilon)$-approximation algorithm for monotone $k$-submodular function maximization, and a $(\frac{1}{3} - \epsilon)$-approximation algorithm for the non-monotone case, with complexity $O(\frac{n(k\cdot EO + IO)}{\epsilon} \log \frac{r}{\epsilon})$, where $r$ denotes the rank of the matroid, and $IO, EO$ denote the number of oracles to evaluate whether a subset is an independent set and to compute the function value of $f$, respectively. Since the constraint of total size can be viewed as a special matroid, called the uniform matroid, we present the fast algorithms for maximizing $k$-submodular functions subject to a total size constraint as corollaries.
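
A simplified sketch of the threshold-decreasing idea for the ordinary ($k=1$) submodular case under a cardinality (uniform-matroid) constraint: accept any element whose marginal gain clears a geometrically decreasing threshold instead of re-scanning for the best element each round. The $k$-submodular and general-matroid machinery of the paper is omitted:

```python
def threshold_greedy(ground_set, f, r, eps=0.1):
    # f: monotone submodular set function taking a frozenset; r: size budget.
    # Each sweep accepts elements whose marginal gain >= tau, then decays
    # tau by (1 - eps), trading a (1 - eps) factor in quality for far fewer
    # value-oracle queries than the classical greedy.
    S = frozenset()
    d = max(f(frozenset([e])) for e in ground_set)
    tau = d
    while tau > (eps / len(ground_set)) * d and len(S) < r:
        for e in ground_set:
            if e in S or len(S) >= r:
                continue
            if f(S | {e}) - f(S) >= tau:
                S = S | {e}
        tau *= 1 - eps
    return S
```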

Take Your Pick: Enabling Effective Personalized Federated Learning within Low-dimensional Feature Space

  • paper_url: http://arxiv.org/abs/2307.13995
  • repo_url: None
  • paper_authors: Guogang Zhu, Xuefeng Liu, Shaojie Tang, Jianwei Niu, Xinghao Wu, Jiaxing Shen
  • for: personalized federated learning (PFL), where clients whose data come from different domains need personalized models.
  • methods: FedPick, a PFL framework that operates in the low-dimensional feature space produced by the global encoder, adaptively selecting task-relevant features for each client based on its local data distribution; this is more accessible and interpretable than personalizing parameters inside the encoder.
  • results: experiments show FedPick effectively selects task-relevant features for each client and improves model performance in cross-domain FL.
    Abstract Personalized federated learning (PFL) is a popular framework that allows clients to have different models to address application scenarios where clients' data are in different domains. The typical model of a client in PFL features a global encoder trained by all clients to extract universal features from the raw data and personalized layers (e.g., a classifier) trained using the client's local data. Nonetheless, due to the differences between the data distributions of different clients (aka, domain gaps), the universal features produced by the global encoder largely encompass numerous components irrelevant to a certain client's local task. Some recent PFL methods address the above problem by personalizing specific parameters within the encoder. However, these methods encounter substantial challenges attributed to the high dimensionality and non-linearity of neural network parameter space. In contrast, the feature space exhibits a lower dimensionality, providing greater intuitiveness and interpretability as compared to the parameter space. To this end, we propose a novel PFL framework named FedPick. FedPick achieves PFL in the low-dimensional feature space by selecting task-relevant features adaptively for each client from the features generated by the global encoder based on its local data distribution. It presents a more accessible and interpretable implementation of PFL compared to those methods working in the parameter space. Extensive experimental results show that FedPick could effectively select task-relevant features for each client and improve model performance in cross-domain FL.
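
A generic sketch of feature-space personalization in the spirit of FedPick: a per-client learnable gate over the global encoder's features, trained only on local data, softly selects task-relevant dimensions before the personalized classifier. The sigmoid gate is an illustrative choice, not FedPick's exact selection rule:

```python
import torch
import torch.nn as nn

class ClientFeaturePick(nn.Module):
    # Per-client head: a learnable gate over the shared encoder's feature
    # vector plus a personalized linear classifier; only these parameters
    # stay local, while the global encoder is trained federatedly.
    def __init__(self, feat_dim, n_classes):
        super().__init__()
        self.gate_logits = nn.Parameter(torch.zeros(feat_dim))
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, features):  # features from the shared global encoder
        mask = torch.sigmoid(self.gate_logits)  # soft feature selection
        return self.classifier(features * mask)
```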

BovineTalk: Machine Learning for Vocalization Analysis of Dairy Cattle under Negative Affective States

  • paper_url: http://arxiv.org/abs/2307.13994
  • repo_url: None
  • paper_authors: Dinu Gavojdian, Teddy Lazebnik, Madalina Mincu, Ariel Oren, Ioana Nicolae, Anna Zamansky
  • for: The goal is to develop and validate non-invasive indicators of affective states in livestock so they can be integrated into on-farm assessment protocols.
  • methods: The study uses vocal indicators of cattle, applying two computational frameworks, one deep learning based and one explainable machine learning based, to classify low- and high-frequency cattle calls and to recognize individual cows by voice.
  • results: The two frameworks reached 87.2% and 89.4% accuracy for call classification, and 68.9% and 72.5% accuracy for individual cow voice recognition, respectively.
    Abstract There is a critical need to develop and validate non-invasive animal-based indicators of affective states in livestock species, in order to integrate them into on-farm assessment protocols, potentially via the use of precision livestock farming (PLF) tools. One such promising approach is the use of vocal indicators. The acoustic structure of vocalizations and their functions were extensively studied in important livestock species, such as pigs, horses, poultry and goats, yet cattle remain understudied in this context to date. Cows were shown to produce two types of vocalizations: low-frequency calls (LF), produced with the mouth closed or partially closed for close-distance contact, and high-frequency calls (HF), emitted with an open mouth for long-distance communication, with the latter considered to be largely associated with negative affective states. Moreover, cattle vocalizations were shown to contain information on individuality across a wide range of contexts, both negative and positive. Nowadays, dairy cows face a series of negative challenges and stressors in a typical production cycle, making vocalizations during negative affective states of special interest for research. One contribution of this study is providing the largest pre-processed (noise-cleaned) dataset to date of lactating adult multiparous dairy cows during negative affective states induced by visual isolation challenges. Here we present two computational frameworks - deep learning based and explainable machine learning based - to classify high- and low-frequency cattle calls and to recognize individual cow voices. Our models in these two frameworks reached 87.2% and 89.4% accuracy for LF and HF classification, with 68.9% and 72.5% accuracy rates for individual cow identification, respectively.

Differentiable short-time Fourier transform with respect to the hop length

  • paper_url: http://arxiv.org/abs/2308.02421
  • repo_url: https://github.com/maxime-leiber/dstft
  • paper_authors: Maxime Leiber, Yosra Marnissi, Axel Barrau, Mohammed El Badaoui
  • for: Proposes a differentiable short-time Fourier transform (STFT) that allows gradient-based optimization of the hop length or the frame temporal positions by making these parameters continuous.
  • methods: Treats the hop length and frame positions as continuous parameters optimized by gradient descent, giving finer-grained control over the temporal placement of frames.
  • results: A simulated example shows that the proposed method offers better control over temporal positioning and integrates easily with existing algorithms and neural networks.
    Abstract In this paper, we propose a differentiable version of the short-time Fourier transform (STFT) that allows for gradient-based optimization of the hop length or the frame temporal position by making these parameters continuous. Our approach provides improved control over the temporal positioning of frames, as the continuous nature of the hop length allows for a more finely-tuned optimization. Furthermore, our contribution enables the use of optimization methods such as gradient descent, which are more computationally efficient than conventional discrete optimization methods. Our differentiable STFT can also be easily integrated into existing algorithms and neural networks. We present a simulated illustration to demonstrate the efficacy of our approach and to garner interest from the research community.
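One way to make the hop length continuous and differentiable is to read frames at fractional positions via linear interpolation, so gradients flow from the spectrogram back to the hop parameter. The PyTorch sketch below illustrates that idea; it is not the authors' implementation (theirs is at the linked repo github.com/maxime-leiber/dstft).

```python
import torch

def diff_hop_stft(x, window, hop):
    """STFT whose hop length is a continuous, differentiable parameter.
    x: (T,) signal, window: (W,) analysis window, hop: scalar tensor."""
    T, W = x.shape[0], window.shape[0]
    n_frames = int((T - W) / float(hop)) + 1
    starts = hop * torch.arange(n_frames)              # fractional frame starts
    pos = starts[:, None] + torch.arange(W)[None, :]   # sample positions (n_frames, W)
    lo = pos.floor().long().clamp(0, T - 2)
    frac = pos - lo.float()
    frames = x[lo] * (1 - frac) + x[lo + 1] * frac     # linear interpolation in time
    return torch.fft.rfft(frames * window, dim=-1)

# e.g. optimize the hop by gradient descent on some spectral loss:
# hop = torch.tensor(64.0, requires_grad=True)
# spec = diff_hop_stft(torch.randn(4096), torch.hann_window(256), hop)
```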

METAVerse: Meta-Learning Traversability Cost Map for Off-Road Navigation

  • paper_url: http://arxiv.org/abs/2307.13991
  • repo_url: None
  • paper_authors: Junwon Seo, Taekyung Kim, Seongyong Ahn, Kiho Kwak
  • for: This work aims at an adaptive navigation system that accurately predicts terrain traversability across diverse environments.
  • methods: It uses a meta-learning framework to train a global model that generates a dense, continuous-valued cost map from sparse LiDAR point clouds, supervised in a self-supervised manner by vehicle-terrain interaction feedback.
  • results: Trained on driving data collected from multiple terrains, the global model minimizes prediction uncertainty; online adaptation during deployment quickly fits the network to the local environment, enabling safe and stable navigation over unknown, unadapted terrain.
    Abstract Autonomous navigation in off-road conditions requires an accurate estimation of terrain traversability. However, traversability estimation in unstructured environments is subject to high uncertainty due to the variability of numerous factors that influence vehicle-terrain interaction. Consequently, it is challenging to obtain a generalizable model that can accurately predict traversability in a variety of environments. This paper presents METAVerse, a meta-learning framework for learning a global model that accurately and reliably predicts terrain traversability across diverse environments. We train the traversability prediction network to generate a dense and continuous-valued cost map from a sparse LiDAR point cloud, leveraging vehicle-terrain interaction feedback in a self-supervised manner. Meta-learning is utilized to train a global model with driving data collected from multiple environments, effectively minimizing estimation uncertainty. During deployment, online adaptation is performed to rapidly adapt the network to the local environment by exploiting recent interaction experiences. To conduct a comprehensive evaluation, we collect driving data from various terrains and demonstrate that our method can obtain a global model that minimizes uncertainty. Moreover, by integrating our model with a model predictive controller, we demonstrate that the reduced uncertainty results in safe and stable navigation in unstructured and unknown terrains.

Differentiable adaptive short-time Fourier transform with respect to the window length

  • paper_url: http://arxiv.org/abs/2308.02418
  • repo_url: https://github.com/maxime-leiber/dstft
  • paper_authors: Maxime Leiber, Yosra Marnissi, Axel Barrau, Mohammed El Badaoui
  • for: This paper proposes a gradient-based method for on-the-fly optimization of the STFT window length, both per frame and per frequency bin.
  • methods: It builds on the authors’ differentiable STFT by making the window length a continuous parameter, so the transform can be optimized by gradient descent.
  • results: Experiments in vibration analysis show the adaptive transform fits time-frequency representations containing both transient and stationary components.
    Abstract This paper presents a gradient-based method for on-the-fly optimization for both per-frame and per-frequency window length of the short-time Fourier transform (STFT), related to previous work in which we developed a differentiable version of STFT by making the window length a continuous parameter. The resulting differentiable adaptive STFT possesses commendable properties, such as the ability to adapt in the same time-frequency representation to both transient and stationary components, while being easily optimized by gradient descent. We validate the performance of our method in vibration analysis.

This is not correct! Negation-aware Evaluation of Language Generation Systems

  • paper_url: http://arxiv.org/abs/2307.13989
  • repo_url: https://github.com/dmlls/cannot-dataset
  • paper_authors: Miriam Anschütz, Diego Miguel Lozano, Georg Groh
  • for: This paper addresses the fact that learned evaluation metrics underestimate the effect of negation, proposing a negation-aware evaluation metric, NegBLEURT.
  • methods: It builds a rule-based sentence negation tool, uses it to create the CANNOT negation evaluation dataset, and fine-tunes a sentence transformer and an evaluation metric on this dataset to increase their negation sensitivity.
  • results: On existing benchmarks, the fine-tuned models far outperform existing metrics on negated sentences while preserving the base models’ performance on other perturbations.
    Abstract Large language models underestimate the impact of negations on how much they change the meaning of a sentence. Therefore, learned evaluation metrics based on these models are insensitive to negations. In this paper, we propose NegBLEURT, a negation-aware version of the BLEURT evaluation metric. For that, we designed a rule-based sentence negation tool and used it to create the CANNOT negation evaluation dataset. Based on this dataset, we fine-tuned a sentence transformer and an evaluation metric to improve their negation sensitivity. Evaluating these models on existing benchmarks shows that our fine-tuned models outperform existing metrics on the negated sentences by far while preserving their base models' performances on other perturbations.
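For a flavor of what a rule-based sentence negation tool does, here is a toy Python negator with two auxiliary-verb rules; the actual tool behind the CANNOT dataset (github.com/dmlls/cannot-dataset) covers far more linguistic patterns, and these two regexes are purely illustrative.

```python
import re

# Toy rule set: insert "not" after the first auxiliary/modal verb found.
RULES = [
    (r"\b(is|are|was|were)\b(?! not)", r"\1 not"),
    (r"\b(can|could|will|would|should)\b(?! not)", r"\1 not"),
]

def negate(sentence: str) -> str:
    for pattern, repl in RULES:
        new = re.sub(pattern, repl, sentence, count=1)
        if new != sentence:
            return new
    return sentence  # no rule fired; a real tool handles many more cases

print(negate("The model is robust to negations."))
# -> "The model is not robust to negations."
```

Pairing each original sentence with its negation is exactly the kind of contrast that exposes a metric's insensitivity: a negation-blind metric scores both variants almost identically.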

Controlling the Latent Space of GANs through Reinforcement Learning: A Case Study on Task-based Image-to-Image Translation

  • paper_url: http://arxiv.org/abs/2307.13978
  • repo_url: None
  • paper_authors: Mahyar Abbasian, Taha Rajabzadeh, Ahmadreza Moradipari, Seyed Amir Hossein Aqajari, Hongsheng Lu, Amir Rahmani
  • for: This work addresses the challenge of controlling the generation process of GANs by combining a reinforcement learning (RL) agent with a latent-space GAN (l-GAN) to produce desired outputs.
  • methods: It integrates an actor-critic RL agent with the l-GAN, using a carefully designed reward policy that lets the agent navigate the latent space and generate outputs for specified tasks.
  • results: A series of experiments on the MNIST dataset, including an illustrative arithmetic-addition task, validates the effectiveness of the approach.
    Abstract Generative Adversarial Networks (GAN) have emerged as a formidable AI tool to generate realistic outputs based on training datasets. However, the challenge of exerting control over the generation process of GANs remains a significant hurdle. In this paper, we propose a novel methodology to address this issue by integrating a reinforcement learning (RL) agent with a latent-space GAN (l-GAN), thereby facilitating the generation of desired outputs. More specifically, we have developed an actor-critic RL agent with a meticulously designed reward policy, enabling it to acquire proficiency in navigating the latent space of the l-GAN and generating outputs based on specified tasks. To substantiate the efficacy of our approach, we have conducted a series of experiments employing the MNIST dataset, including arithmetic addition as an illustrative task. The outcomes of these experiments serve to validate our methodology. Our pioneering integration of an RL agent with a GAN model represents a novel advancement, holding great potential for enhancing generative networks in the future.

Mathematical Modeling of BCG-based Bladder Cancer Treatment Using Socio-Demographics

  • paper_url: http://arxiv.org/abs/2307.15084
  • repo_url: None
  • paper_authors: Elizaveta Savchenko, Ariel Rosenfeld, Svetlana Bunimovich-Mendrazitsky
  • for: This study aims to improve BCG treatment outcomes by providing a personalized model of BCG-based bladder cancer therapy driven by patients’ socio-demographic information.
  • methods: The authors adopt an established BCG treatment model and integrate a machine learning component that temporally adjusts and reconfigures key model parameters to personalize the treatment dynamics.
  • results: On real clinical data, the personalized model improves on the original model in predicting the number of cancer cells at the end of treatment by 14.8% on average.
    Abstract Cancer is one of the most widespread diseases around the world with millions of new patients each year. Bladder cancer is one of the most prevalent types of cancer affecting all individuals alike with no obvious prototypical patient. The current standard treatment for BC follows a routine weekly Bacillus Calmette-Guerin (BCG) immunotherapy-based therapy protocol which is applied to all patients alike. The clinical outcomes associated with BCG treatment vary significantly among patients due to the biological and clinical complexity of the interaction between the immune system, treatments, and cancer cells. In this study, we take advantage of the patient's socio-demographics to offer a personalized mathematical model that describes the clinical dynamics associated with BCG-based treatment. To this end, we adopt a well-established BCG treatment model and integrate a machine learning component to temporally adjust and reconfigure key parameters within the model thus promoting its personalization. Using real clinical data, we show that our personalized model favorably compares with the original one in predicting the number of cancer cells at the end of the treatment, with 14.8% improvement, on average.

Understanding Deep Neural Networks via Linear Separability of Hidden Layers

  • paper_url: http://arxiv.org/abs/2307.13962
  • repo_url: None
  • paper_authors: Chao Zhang, Xinyu Chen, Wensheng Li, Lixue Liu, Wei Wu, Dacheng Tao
  • for: This paper studies the characteristics of deep neural networks through the linear separability of hidden-layer outputs.
  • methods: It proposes Minkowski difference based linear separability measures (MD-LSMs) to evaluate the degree of linear separability between two point sets.
  • results: It finds a synchronicity between the linear separability of hidden-layer outputs and training performance: weight updates that increase hidden-layer separability improve training performance, and vice versa. It also studies how the activation function and network size (width and depth) affect hidden-layer separability.
    Abstract In this paper, we measure the linear separability of hidden layer outputs to study the characteristics of deep neural networks. In particular, we first propose Minkowski difference based linear separability measures (MD-LSMs) to evaluate the linear separability degree of two points sets. Then, we demonstrate that there is a synchronicity between the linear separability degree of hidden layer outputs and the network training performance, i.e., if the updated weights can enhance the linear separability degree of hidden layer outputs, the updated network will achieve a better training performance, and vice versa. Moreover, we study the effect of activation function and network size (including width and depth) on the linear separability of hidden layers. Finally, we conduct the numerical experiments to validate our findings on some popular deep networks including multilayer perceptron (MLP), convolutional neural network (CNN), deep belief network (DBN), ResNet, VGGNet, AlexNet, vision transformer (ViT) and GoogLeNet.
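The Minkowski-difference view of separability is concrete: two sets are linearly separable iff the origin lies outside the convex hull of their pairwise differences. The sketch below turns that into a separability degree; the paper's exact MD-LSM definition may differ, so treat this as one natural instantiation under that assumption.

```python
import numpy as np
from sklearn.svm import LinearSVC

def md_lsm(A, B, n_pairs=5000, seed=0):
    """Approximate a Minkowski-difference-based separability degree for two
    classes of hidden-layer outputs A, B (arrays of shape (n, d)): the best
    fraction of sampled difference vectors d = a - b placed strictly on the
    positive side of a homogeneous (bias-free) hyperplane w."""
    rng = np.random.default_rng(seed)
    ia = rng.integers(0, len(A), n_pairs)
    ib = rng.integers(0, len(B), n_pairs)
    D = A[ia] - B[ib]                        # sampled Minkowski differences
    # fit w on {(d, +1), (-d, -1)} so the hyperplane passes through the origin
    X = np.vstack([D, -D])
    y = np.hstack([np.ones(n_pairs), -np.ones(n_pairs)])
    w = LinearSVC(fit_intercept=False, C=1.0).fit(X, y).coef_.ravel()
    return float(np.mean(D @ w > 0))         # 1.0 means perfectly separable
```

Applied layer by layer, a measure like this lets one track how separability evolves with depth and with training, which is the synchronicity the abstract describes.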

Flexible Differentially Private Vertical Federated Learning with Adaptive Feature Embeddings

  • paper_url: http://arxiv.org/abs/2308.02362
  • repo_url: None
  • paper_authors: Yuxi Mi, Hongquan Liu, Yewei Xia, Yiheng Sun, Jihong Guan, Shuigeng Zhou
  • for: This paper balances data privacy against task utility in vertical federated learning, where shared feature embeddings can leak sensitive information.
  • methods: It proposes a flexible, generic approach that decouples the two goals and addresses them successively: first guaranteeing differential privacy via norm clipping of shared feature embeddings, then improving task utility by adaptively adjusting the scale and distribution of the embeddings without weakening the established DP mechanism.
  • results: Extensive experiments show the proposed VFL-AFE framework defends against privacy attacks while retaining favorable task utility, and adapts across datasets and models.
    Abstract The emergence of vertical federated learning (VFL) has stimulated concerns about the imperfection in privacy protection, as shared feature embeddings may reveal sensitive information under privacy attacks. This paper studies the delicate equilibrium between data privacy and task utility goals of VFL under differential privacy (DP). To address the generality issue of prior arts, this paper advocates a flexible and generic approach that decouples the two goals and addresses them successively. Specifically, we initially derive a rigorous privacy guarantee by applying norm clipping on shared feature embeddings, which is applicable across various datasets and models. Subsequently, we demonstrate that task utility can be optimized via adaptive adjustments on the scale and distribution of feature embeddings in an accuracy-appreciative way, without compromising established DP mechanisms. We concretize our observation into the proposed VFL-AFE framework, which exhibits effectiveness against privacy attacks and the capacity to retain favorable task utility, as substantiated by extensive experiments.
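The first step, a rigorous privacy guarantee via norm clipping of shared embeddings, is simple to sketch. The noise calibration below follows the generic Gaussian-mechanism pattern; the paper's exact calibration from the target (epsilon, delta) may differ, and `sigma` here is an assumed knob.

```python
import torch

def privatize_embeddings(h, clip=1.0, sigma=0.5):
    """Sketch: clip the L2 norm of each shared feature embedding, then add
    Gaussian noise scaled to the clipping bound (which bounds sensitivity).
    h: (batch, d) embeddings leaving the passive party."""
    norms = h.norm(dim=1, keepdim=True).clamp_min(1e-12)
    h_clipped = h * torch.clamp(clip / norms, max=1.0)   # enforce ||h_i|| <= clip
    return h_clipped + sigma * clip * torch.randn_like(h)
```

Clipping is what makes the guarantee model- and dataset-agnostic: whatever the encoder produces, the sensitivity of each shared embedding is bounded by the clip value, and the second stage can then reshape the (clipped) feature distribution to recover utility.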

Entropy Neural Estimation for Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2307.13944
  • repo_url: https://github.com/kunzhan/M-ILBO
  • paper_authors: Yixuan Ma, Xiaolin Zhang, Peng Zhang, Kun Zhan
  • for: This paper aims to extract distinguishable high-level representations of graph nodes.
  • methods: It shows that the entropy of a dataset can be approximated by maximizing a lower bound on the mutual information across graph views, estimated by a neural network; it proposes a subset sampling strategy (randomly sampling nodes and edges to build each view) and contrasts pairwise representations between views with a parameter-shared Siamese network optimized under two simultaneous objectives.
  • results: Extensive experiments on seven graph benchmarks show performance competitive with the current state of the art.
    Abstract Contrastive learning on graphs aims at extracting distinguishable high-level representations of nodes. In this paper, we theoretically illustrate that the entropy of a dataset can be approximated by maximizing the lower bound of the mutual information across different views of a graph, \ie, entropy is estimated by a neural network. Based on this finding, we propose a simple yet effective subset sampling strategy to contrast pairwise representations between views of a dataset. In particular, we randomly sample nodes and edges from a given graph to build the input subset for a view. Two views are fed into a parameter-shared Siamese network to extract the high-dimensional embeddings and estimate the information entropy of the entire graph. For the learning process, we propose to optimize the network using two objectives, simultaneously. Concretely, the input of the contrastive loss function consists of positive and negative pairs. Our selection strategy of pairs is different from previous works and we present a novel strategy to enhance the representation ability of the graph encoder by selecting nodes based on cross-view similarities. We enrich the diversity of the positive and negative pairs by selecting highly similar samples and totally different data with the guidance of cross-view similarity scores, respectively. We also introduce a cross-view consistency constraint on the representations generated from the different views. This objective guarantees the learned representations are consistent across views from the perspective of the entire graph. We conduct extensive experiments on seven graph benchmarks, and the proposed approach achieves competitive performance compared to the current state-of-the-art methods. The source code will be publicly released once this paper is accepted.

Topology-aware Robust Optimization for Out-of-distribution Generalization

  • paper_url: http://arxiv.org/abs/2307.13943
  • repo_url: https://github.com/joffery/tro
  • paper_authors: Fengchun Qiao, Xi Peng
  • for: This work aims to improve the robustness of machine learning models to out-of-distribution data for high-stakes applications.
  • methods: It proposes Topology-aware Robust Optimization (TRO), which combines two objectives: (1) Topology Learning, which explores the data manifold to uncover the distributional topology, and (2) Learning on Topology, which exploits that topology to constrain robust optimization and avoid overly pessimistic solutions.
  • results: TRO is shown to be effective both theoretically and empirically, significantly outperforming the state of the art on classification, regression, and semantic segmentation; the learned data-driven topology is consistent with domain knowledge, improving the method’s explainability.
    Abstract Out-of-distribution (OOD) generalization is a challenging machine learning problem yet highly desirable in many high-stake applications. Existing methods suffer from overly pessimistic modeling with low generalization confidence. As generalizing to arbitrary test distributions is impossible, we hypothesize that further structure on the topology of distributions is crucial in developing strong OOD resilience. To this end, we propose topology-aware robust optimization (TRO) that seamlessly integrates distributional topology in a principled optimization framework. More specifically, TRO solves two optimization objectives: (1) Topology Learning which explores data manifold to uncover the distributional topology; (2) Learning on Topology which exploits the topology to constrain robust optimization for tightly-bounded generalization risks. We theoretically demonstrate the effectiveness of our approach and empirically show that it significantly outperforms the state of the arts in a wide range of tasks including classification, regression, and semantic segmentation. Moreover, we empirically find the data-driven distributional topology is consistent with domain knowledge, enhancing the explainability of our approach.

Improving Semi-Supervised Semantic Segmentation with Dual-Level Siamese Structure Network

  • paper_url: http://arxiv.org/abs/2307.13938
  • repo_url: https://github.com/kunzhan/DSSN
  • paper_authors: Zhibo Tain, Xiaolin Zhang, Peng Zhang, Kun Zhan
  • for: This paper improves the use of unlabeled data in semi-supervised semantic segmentation, reducing the cost of labeling training examples.
  • methods: It proposes a dual-level Siamese structure network (DSSN) that applies pixel-wise contrastive alignment of strongly augmented views in both the low-level image space and the high-level feature space, plus a novel class-aware pseudo-label selection strategy that addresses the limitation of methods that apply no selection or a single predefined threshold to all classes.
  • results: It achieves state-of-the-art results on PASCAL VOC 2012 and Cityscapes, outperforming other SSS algorithms by a significant margin.
    Abstract Semi-supervised semantic segmentation (SSS) is an important task that utilizes both labeled and unlabeled data to reduce expenses on labeling training examples. However, the effectiveness of SSS algorithms is limited by the difficulty of fully exploiting the potential of unlabeled data. To address this, we propose a dual-level Siamese structure network (DSSN) for pixel-wise contrastive learning. By aligning positive pairs with a pixel-wise contrastive loss using strong augmented views in both low-level image space and high-level feature space, the proposed DSSN is designed to maximize the utilization of available unlabeled data. Additionally, we introduce a novel class-aware pseudo-label selection strategy for weak-to-strong supervision, which addresses the limitations of most existing methods that do not perform selection or apply a predefined threshold for all classes. Specifically, our strategy selects the top high-confidence prediction of the weak view for each class to generate pseudo labels that supervise the strong augmented views. This strategy is capable of taking into account the class imbalance and improving the performance of long-tailed classes. Our proposed method achieves state-of-the-art results on two datasets, PASCAL VOC 2012 and Cityscapes, outperforming other SSS algorithms by a significant margin.
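The class-aware selection rule can be sketched directly: rather than one global confidence threshold, keep the most confident weak-view predictions per class, so long-tailed classes still contribute pseudo-labels. The per-class top-k rule below is an illustrative assumption standing in for the paper's exact criterion.

```python
import torch

def class_aware_select(probs_weak, top_k=64):
    """Sketch: class-aware pseudo-label selection from weak-view predictions.
    probs_weak: (N, C) softmax outputs of the weak view over N pixels.
    Returns pseudo-labels (N,) and a boolean mask of selected pixels."""
    conf, labels = probs_weak.max(dim=1)
    mask = torch.zeros_like(conf, dtype=torch.bool)
    for c in labels.unique():
        idx = (labels == c).nonzero(as_tuple=True)[0]
        k = min(top_k, idx.numel())
        best = conf[idx].topk(k).indices      # most confident pixels of class c
        mask[idx[best]] = True
    return labels, mask
```

The selected pixels then supervise the strongly augmented views, which is the weak-to-strong supervision scheme the abstract describes.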

trajdata: A Unified Interface to Multiple Human Trajectory Datasets

  • paper_url: http://arxiv.org/abs/2307.13924
  • repo_url: https://github.com/nvlabs/trajdata
  • paper_authors: Boris Ivanovic, Guanyu Song, Igor Gilitschenski, Marco Pavone
  • for: This work provides a unified interface to multiple human trajectory datasets, so methods can be trained and evaluated across datasets.
  • methods: It offers a simple, uniform, and efficient representation and API for trajectory and map data spanning existing datasets.
  • results: It presents a comprehensive empirical analysis of existing trajectory datasets, giving researchers a rich understanding of the data underpinning current pedestrian and AV motion forecasting research, along with suggestions for future datasets.
    Abstract The field of trajectory forecasting has grown significantly in recent years, partially owing to the release of numerous large-scale, real-world human trajectory datasets for autonomous vehicles (AVs) and pedestrian motion tracking. While such datasets have been a boon for the community, they each use custom and unique data formats and APIs, making it cumbersome for researchers to train and evaluate methods across multiple datasets. To remedy this, we present trajdata: a unified interface to multiple human trajectory datasets. At its core, trajdata provides a simple, uniform, and efficient representation and API for trajectory and map data. As a demonstration of its capabilities, in this work we conduct a comprehensive empirical evaluation of existing trajectory datasets, providing users with a rich understanding of the data underpinning much of current pedestrian and AV motion forecasting research, and proposing suggestions for future datasets from these insights. trajdata is permissively licensed (Apache 2.0) and can be accessed online at https://github.com/NVlabs/trajdata

HyperFed: Hyperbolic Prototypes Exploration with Consistent Aggregation for Non-IID Data in Federated Learning

  • paper_url: http://arxiv.org/abs/2307.14384
  • repo_url: None
  • paper_authors: Xinting Liao, Weiming Liu, Chaochao Chen, Pengyang Zhou, Huabin Zhu, Yanchao Tan, Jun Wang, Yue Qi
  • for: This paper improves federated learning (FL) performance on non-IID data.
  • methods: It combines three modules: Hyperbolic Prototype Tammes Initialization (HPTI), Hyperbolic Prototype Learning (HPL), and Consistent Aggregation (CA).
  • results: Extensive studies on four datasets show that HyperFed effectively improves FL performance under non-IID settings.
    Abstract Federated learning (FL) collaboratively models user data in a decentralized way. However, in the real world, non-identical and independent data distributions (non-IID) among clients hinder the performance of FL due to three issues, i.e., (1) the class statistics shifting, (2) the insufficient hierarchical information utilization, and (3) the inconsistency in aggregating clients. To address the above issues, we propose HyperFed which contains three main modules, i.e., hyperbolic prototype Tammes initialization (HPTI), hyperbolic prototype learning (HPL), and consistent aggregation (CA). Firstly, HPTI in the server constructs uniformly distributed and fixed class prototypes, and shares them with clients to match class statistics, further guiding consistent feature representation for local clients. Secondly, HPL in each client captures the hierarchical information in local data with the supervision of shared class prototypes in the hyperbolic model space. Additionally, CA in the server mitigates the impact of the inconsistent deviations from clients to server. Extensive studies of four datasets prove that HyperFed is effective in enhancing the performance of FL under the non-IID set.

Simulation-based Inference for Cardiovascular Models

  • paper_url: http://arxiv.org/abs/2307.13918
  • repo_url: None
  • paper_authors: Antoine Wehenkel, Jens Behrmann, Andrew C. Miller, Guillermo Sapiro, Ozan Sener, Marco Cuturi, Jörn-Henrik Jacobsen
  • for: This paper studies in-silico simulation tools for cardiovascular systems and the inverse problem of mapping waveforms back to plausible physiological parameters.
  • methods: It casts the inverse problem as statistical inference via simulation-based inference (SBI), which yields posterior distributions and hence a multi-dimensional representation of uncertainty for individual measurements.
  • results: An in-silico uncertainty analysis of five biomarkers of clinical interest shows that SBI provides reliable estimates and captures practically relevant information that standard sensitivity analyses miss, such as sub-populations with distinct parameter-estimation uncertainty regimes.
    Abstract Over the past decades, hemodynamics simulators have steadily evolved and have become tools of choice for studying cardiovascular systems in-silico. While such tools are routinely used to simulate whole-body hemodynamics from physiological parameters, solving the corresponding inverse problem of mapping waveforms back to plausible physiological parameters remains both promising and challenging. Motivated by advances in simulation-based inference (SBI), we cast this inverse problem as statistical inference. In contrast to alternative approaches, SBI provides posterior distributions for the parameters of interest, providing a multi-dimensional representation of uncertainty for individual measurements. We showcase this ability by performing an in-silico uncertainty analysis of five biomarkers of clinical interest comparing several measurement modalities. Beyond the corroboration of known facts, such as the feasibility of estimating heart rate, our study highlights the potential of estimating new biomarkers from standard-of-care measurements. SBI reveals practically relevant findings that cannot be captured by standard sensitivity analyses, such as the existence of sub-populations for which parameter estimation exhibits distinct uncertainty regimes. Finally, we study the gap between in-vivo and in-silico with the MIMIC-III waveform database and critically discuss how cardiovascular simulations can inform real-world data analysis.
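For readers who want to try the SBI workflow on their own simulator, the open-source `sbi` package implements it end to end. In the sketch below, the linear "simulator", prior bounds, and dimensions are placeholders standing in for a whole-body hemodynamics model; only the package calls follow the real API.

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

torch.manual_seed(0)
W = torch.randn(5, 20)                         # stand-in simulator weights
simulate = lambda theta: theta @ W + 0.05 * torch.randn(theta.shape[0], 20)

prior = BoxUniform(low=torch.zeros(5), high=torch.ones(5))
theta = prior.sample((10_000,))                # physiological parameters
x = simulate(theta)                            # waveform summaries

inference = SNPE(prior=prior)
estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(estimator)

x_obs = simulate(prior.sample((1,)))           # an "observed" measurement
samples = posterior.sample((1_000,), x=x_obs)  # posterior over parameters
```

The posterior samples are what give the multi-dimensional uncertainty picture the abstract emphasizes: correlations between parameters, and different posterior spreads for different observed waveforms.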

BayesDAG: Gradient-Based Posterior Sampling for Causal Discovery

  • paper_url: http://arxiv.org/abs/2307.13917
  • repo_url: None
  • paper_authors: Yashas Annadani, Nick Pawlowski, Joel Jennings, Stefan Bauer, Cheng Zhang, Wenbo Gong
  • for: This work aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks.
  • methods: It introduces a scalable Bayesian causal discovery framework based on stochastic gradient Markov chain Monte Carlo (SG-MCMC) that samples DAGs directly from the posterior without any DAG regularizer, simultaneously draws function-parameter samples, and applies to both linear and nonlinear causal models.
  • results: Empirical evaluations on synthetic and real-world data demonstrate the approach’s effectiveness against state-of-the-art baselines.
    Abstract Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise due to joint inference over combinatorial space of Directed Acyclic Graphs (DAGs) and nonlinear functions. Despite recent progress towards efficient posterior inference over DAGs, existing methods are either limited to variational inference on node permutation matrices for linear causal models, leading to compromised inference accuracy, or continuous relaxation of adjacency matrices constrained by a DAG regularizer, which cannot ensure resulting graphs are DAGs. In this work, we introduce a scalable Bayesian causal discovery framework based on stochastic gradient Markov Chain Monte Carlo (SG-MCMC) that overcomes these limitations. Our approach directly samples DAGs from the posterior without requiring any DAG regularization, simultaneously draws function parameter samples and is applicable to both linear and nonlinear causal models. To enable our approach, we derive a novel equivalence to the permutation-based DAG learning, which opens up possibilities of using any relaxed gradient estimator defined over permutations. To our knowledge, this is the first framework applying gradient-based MCMC sampling for causal discovery. Empirical evaluations on synthetic and real-world datasets demonstrate our approach's effectiveness compared to state-of-the-art baselines.
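The SG-MCMC machinery at the core is stochastic-gradient Langevin dynamics. Below is the basic update for continuous parameters, for orientation only; BayesDAG's actual sampler over permutations and DAG structures is considerably more involved than this generic sketch.

```python
import torch

def sgld_step(params, log_post_grad, step=1e-4):
    """One stochastic-gradient Langevin dynamics step:
    theta' = theta + step * grad(log p(theta | data)) + sqrt(2 * step) * noise.
    `log_post_grad` returns an unbiased (e.g. minibatch) gradient estimate."""
    with torch.no_grad():
        noise = torch.randn_like(params) * (2 * step) ** 0.5
        return params + step * log_post_grad(params) + noise

# Iterate and keep the iterates after burn-in as approximate posterior samples:
# theta = sgld_step(theta, grad_fn)
```

The appeal over variational approaches is that nothing constrains the posterior's shape: the chain can visit multiple plausible DAGs rather than collapsing to one relaxed point estimate.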

Online learning in bandits with predicted context

  • paper_url: http://arxiv.org/abs/2307.13916
  • repo_url: None
  • paper_authors: Yongyi Guo, Susan Murphy
  • for: solve the contextual bandit problem with non-diminishing context error
  • methods: extend the measurement error model in classical statistics to the online decision-making setting
  • results: achieve sublinear regret compared to the appropriate benchmark
    Abstract We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a wide range of applications where the true context for decision-making is unobserved, and only a prediction of the context by a potentially complex machine learning algorithm is available. When the context error is non-diminishing, classical bandit algorithms fail to achieve sublinear regret. We propose the first online algorithm in this setting with sublinear regret compared to the appropriate benchmark. The key idea is to extend the measurement error model in classical statistics to the online decision-making setting, which is nontrivial due to the policy being dependent on the noisy context observations.
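The classical measurement-error correction the paper builds on is worth seeing explicitly: with noisy contexts, the naive Gram matrix is biased upward by the summed error covariances, and subtracting them de-biases the regression estimate. The sketch below shows only this statistical core, not the paper's full online bandit algorithm.

```python
import numpy as np

def corrected_theta(X_hat, y, Sigmas, lam=1.0):
    """Errors-in-variables-corrected ridge estimate. With predicted contexts
    x_hat_t = x_t + e_t, e_t ~ N(0, Sigma_t), E[X_hat^T X_hat] overstates
    X^T X by sum_t Sigma_t, so subtract it before solving.

    X_hat: (n, d) observed (predicted) contexts; y: (n,) rewards;
    Sigmas: list of n per-round (d, d) error covariances."""
    G = X_hat.T @ X_hat - sum(Sigmas) + lam * np.eye(X_hat.shape[1])
    return np.linalg.solve(G, X_hat.T @ y)
```

In the online setting this correction interacts with the policy, because which context is observed depends on past noisy observations; handling that dependence is where the paper's contribution lies.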

Graph Neural Networks-based Hybrid Framework For Predicting Particle Crushing Strength

  • paper_url: http://arxiv.org/abs/2307.13909
  • repo_url: https://github.com/doujiang-zheng/gnn-for-particle-crushing
  • paper_authors: Tongya Zheng, Tianli Zhang, Qingzheng Guan, Wenjie Huang, Zunlei Feng, Mingli Song, Chun Chen
  • for: This paper aims to characterize the mechanical behaviors of particle crushing through the connectivity of particle fragments with Graph Neural Networks (GNNs) and to facilitate the research progress of machine learning for particle crushing by generating a large-scale dataset.
  • methods: The authors use a hybrid framework based on GNNs to predict particle crushing strength in a particle fragment view, and compare their hybrid framework against traditional machine learning methods and the plain MLP to verify its effectiveness.
  • results: The authors generate a dataset with 45,000 numerical simulations and 900 particle types, and their hybrid framework achieves better performance than traditional machine learning methods and the plain MLP. They also discuss the usefulness of different features through gradient attribution explanation w.r.t the predictions.
    Abstract Graph Neural Networks have emerged as an effective machine learning tool for multi-disciplinary tasks such as pharmaceutical molecule classification and chemical reaction prediction, because they can model non-euclidean relationships between different entities. Particle crushing, a significant field of civil engineering, describes the breakage of granular materials caused by the breakage of particle-fragment bonds as modeled in numerical simulations, which motivates us to characterize the mechanical behaviors of particle crushing through the connectivity of particle fragments with Graph Neural Networks (GNNs). However, there is no open-source, large-scale particle crushing dataset for research, owing to the expensive cost of laboratory tests or numerical simulations. Therefore, we first generate a dataset with 45,000 numerical simulations and 900 particle types to facilitate the research progress of machine learning for particle crushing. Second, we devise a hybrid framework based on GNNs to predict particle crushing strength in a particle-fragment view with the advances of state-of-the-art GNNs. Finally, we compare our hybrid framework against traditional machine learning methods and a plain MLP to verify its effectiveness. The usefulness of different features is further discussed through gradient attribution explanations w.r.t. the predictions. Our data and code are released at https://github.com/doujiang-zheng/GNN-For-Particle-Crushing.

Robustness Verification of Deep Neural Networks using Star-Based Reachability Analysis with Variable-Length Time Series Input

  • paper_url: http://arxiv.org/abs/2307.13907
  • repo_url: None
  • paper_authors: Neelanjana Pal, Diego Manzanas Lopez, Taylor T Johnson
  • for: This study explores NN-based, data-driven anomaly detection and predictive maintenance, focusing on the robustness analysis of NNs that take time-series inputs.
  • methods: The paper designs networks with variable-length inputs to streamline input manipulation and improve architectural generalizability, and verifies NN robustness using star-based reachability analysis, with several performance measures quantifying the effect of bounded input noise on network outputs.
  • results: The NN-based analytics are shown to be robust, producing stable and reliable predictions on time-series data under bounded input perturbations.
    Abstract Data-driven, neural network (NN) based anomaly detection and predictive maintenance are emerging research areas. NN-based analytics of time-series data offer valuable insights into past behaviors and estimates of critical parameters like remaining useful life (RUL) of equipment and state-of-charge (SOC) of batteries. However, input time series data can be exposed to intentional or unintentional noise when passing through sensors, necessitating robust validation and verification of these NNs. This paper presents a case study of the robustness verification approach for time series regression NNs (TSRegNN) using set-based formal methods. It focuses on utilizing variable-length input data to streamline input manipulation and enhance network architecture generalizability. The method is applied to two data sets in the Prognostics and Health Management (PHM) application areas: (1) SOC estimation of a Lithium-ion battery and (2) RUL estimation of a turbine engine. The NNs' robustness is checked using star-based reachability analysis, and several performance measures evaluate the effect of bounded perturbations in the input on network outputs, i.e., future outcomes. Overall, the paper offers a comprehensive case study for validating and verifying NN-based analytics of time-series data in real-world applications, emphasizing the importance of robustness testing for accurate and reliable predictions, especially considering the impact of noise on future outcomes.

  • paper_url: http://arxiv.org/abs/2307.13903
  • repo_url: None
  • paper_authors: Shiliang Zuo
  • for: Learning a Lipschitz function $f$ chosen by an adversary, from corrupted binary signals.
  • methods: Uses a natural yet powerful sanity-check technique to design corruption-robust algorithms.
  • results: For the symmetric loss, the learner achieves regret $O(C\log T)$ for $d = 1$ and $O_d(C\log T + T^{(d-1)/d})$ for $d > 1$; for the pricing loss, the learner achieves regret $\widetilde{O}(T^{d/(d+1)} + C\cdot T^{1/(d+1)})$.
    Abstract I study the problem of learning a Lipschitz function with corrupted binary signals. The learner tries to learn a Lipschitz function $f$ that the adversary chooses. In each round, the adversary selects a context vector $x_t$ in the input space, and the learner makes a guess to the true function value $f(x_t)$ and receives a binary signal indicating whether the guess was high or low. In a total of $C$ rounds, the signal may be corrupted, though the value of $C$ is unknown to the learner. The learner's goal is to incur a small cumulative loss. I present a natural yet powerful technique sanity check, which proves useful in designing corruption-robust algorithms. I design algorithms which (treating the Lipschitz parameter $L$ as constant): for the symmetric loss, the learner achieves regret $O(C\log T)$ with $d = 1$ and $O_d(C\log T + T^{(d-1)/d})$ with $d > 1$; for the pricing loss the learner achieves regret $\widetilde{O} (T^{d/(d+1)} + C\cdot T^{1/(d+1)})$.

Regularizing Neural Networks with Meta-Learning Generative Models

  • paper_url: http://arxiv.org/abs/2307.13899
  • repo_url: None
  • paper_authors: Shin’ya Yamaguchi, Daiki Chijiwa, Sekitoshi Kanai, Atsutoshi Kumagai, Hisashi Kashima
  • for: This paper aims to improve generative data augmentation for deep learning.
  • methods: It proposes a new strategy, meta generative regularization (MGR), which uses synthetic samples in a regularization term for the feature extractor rather than in the loss function (e.g., cross-entropy), avoiding the degradation caused by uninformative synthetic samples; the samples are dynamically chosen by meta-learning to minimize validation loss.
  • results: Experiments on six datasets show that MGR avoids the performance degradation of naive generative data augmentation and stably outperforms baselines, particularly on smaller datasets.
    Abstract This paper investigates methods for improving generative data augmentation for deep learning. Generative data augmentation leverages the synthetic samples produced by generative models as an additional dataset for classification with small dataset settings. A key challenge of generative data augmentation is that the synthetic data contain uninformative samples that degrade accuracy. This is because the synthetic samples do not perfectly represent class categories in real data and uniform sampling does not necessarily provide useful samples for tasks. In this paper, we present a novel strategy for generative data augmentation called meta generative regularization (MGR). To avoid the degradation of generative data augmentation, MGR utilizes synthetic samples in the regularization term for feature extractors instead of in the loss function, e.g., cross-entropy. These synthetic samples are dynamically determined to minimize the validation losses through meta-learning. We observed that MGR can avoid the performance degradation of naive generative data augmentation and boost the baselines. Experiments on six datasets showed that MGR is effective particularly when datasets are smaller and stably outperforms baselines.

Efficient Estimation of the Local Robustness of Machine Learning Models

  • paper_url: http://arxiv.org/abs/2307.13885
  • repo_url: None
  • paper_authors: Tessa Han, Suraj Srinivas, Himabindu Lakkaraju
  • for: This paper aims to make the computation of machine learning models’ local robustness to noisy input data practical.
  • methods: It develops the first analytical estimators of the local robustness of multi-class discriminative models, based on local linear function approximation and the multivariate normal CDF.
  • results: The estimators are shown to compute the local robustness of standard deep learning models accurately and efficiently, and to be useful for tasks such as measuring robustness bias and identifying examples in a dataset that are vulnerable to noise perturbation.
    Abstract Machine learning models often need to be robust to noisy input data. The effect of real-world noise (which is often random) on model predictions is captured by a model's local robustness, i.e., the consistency of model predictions in a local region around an input. However, the na\"ive approach to computing local robustness based on Monte-Carlo sampling is statistically inefficient, leading to prohibitive computational costs for large-scale applications. In this work, we develop the first analytical estimators to efficiently compute local robustness of multi-class discriminative models using local linear function approximation and the multivariate Normal CDF. Through the derivation of these estimators, we show how local robustness is connected to concepts such as randomized smoothing and softmax probability. We also confirm empirically that these estimators accurately and efficiently compute the local robustness of standard deep learning models. In addition, we demonstrate these estimators' usefulness for various tasks involving local robustness, such as measuring robustness bias and identifying examples that are vulnerable to noise perturbation in a dataset. By developing these analytical estimators, this work not only advances conceptual understanding of local robustness, but also makes its computation practical, enabling the use of local robustness in critical downstream applications.
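Following the abstract's recipe (local linearization plus the multivariate normal CDF), here is a hedged sketch of such an estimator; the paper's exact derivation may differ, but the linearized margin computation below is standard. Under Gaussian input noise, each class margin is approximately Gaussian, and robustness is the probability that all margins stay positive.

```python
import numpy as np
from scipy.stats import multivariate_normal

def local_robustness(f, jac, x, y, sigma):
    """Estimate P(argmax f(x + e) == y) for e ~ N(0, sigma^2 I) via
    linearization: margin m_c = f_y - f_c is approx N(m_c(x), sigma^2 G G^T),
    where G stacks the margin gradients.

    f: logits function (numpy in/out); jac: its (C, d) Jacobian at x;
    y: predicted class at x."""
    logits, J = f(x), jac(x)                    # jac via autograd or finite diff
    others = [c for c in range(len(logits)) if c != y]
    m = logits[y] - logits[others]              # mean margins, shape (C-1,)
    G = J[y] - J[others]                        # margin gradients, (C-1, d)
    cov = sigma**2 * (G @ G.T) + 1e-9 * np.eye(len(others))
    # P(m + Z > 0 for all c) = P(Z < m) by symmetry of the centered Gaussian
    return multivariate_normal(mean=np.zeros(len(others)), cov=cov).cdf(m)
```

A single Jacobian evaluation replaces the thousands of forward passes a Monte-Carlo estimate would need, which is the efficiency gain the abstract claims.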

ExeDec: Execution Decomposition for Compositional Generalization in Neural Program Synthesis

  • paper_url: http://arxiv.org/abs/2307.13883
  • repo_url: None
  • paper_authors: Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton
  • for: This paper studies decomposition-based program synthesis strategies and their compositional generalization across levels of task complexity.
  • methods: It proposes ExeDec, a decomposition-based synthesis strategy that predicts execution subgoals and solves problems step by step, informed by program execution at each step.
  • results: Compared to baselines, ExeDec achieves better synthesis performance and markedly improved compositional generalization.
    Abstract When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to measure whether neural program synthesis methods have similar capabilities, we can measure whether they compositionally generalize, that is, whether a model that has been trained on the simpler subtasks is subsequently able to solve more complex tasks. In this paper, we characterize several different forms of compositional generalization that are desirable in program synthesis, forming a meta-benchmark which we use to create generalization tasks for two popular datasets, RobustFill and DeepCoder. We then propose ExeDec, a novel decomposition-based synthesis strategy that predicts execution subgoals to solve problems step-by-step informed by program execution at each step. ExeDec has better synthesis performance and greatly improved compositional generalization ability compared to baselines.

Good Lattice Training: Physics-Informed Neural Networks Accelerated by Number Theory

  • paper_url: http://arxiv.org/abs/2307.13869
  • repo_url: None
  • paper_authors: Takashi Matsubara, Takaharu Yaguchi
  • for: Solving partial differential equations (PDEs).
  • methods: Uses physics-informed neural networks (PINNs) with good lattice training (GLT), a collocation-point construction inspired by number-theoretic methods for numerical analysis.
  • results: Requires 2-20 times fewer collocation points (and hence lower computational cost) than uniformly random or Latin hypercube sampling, while achieving competitive performance.
    Abstract Physics-informed neural networks (PINNs) offer a novel and efficient approach to solving partial differential equations (PDEs). Their success lies in the physics-informed loss, which trains a neural network to satisfy a given PDE at specific points and to approximate the solution. However, the solutions to PDEs are inherently infinite-dimensional, and the distance between the output and the solution is defined by an integral over the domain. Therefore, the physics-informed loss only provides a finite approximation, and selecting appropriate collocation points becomes crucial to suppress the discretization errors, although this aspect has often been overlooked. In this paper, we propose a new technique called good lattice training (GLT) for PINNs, inspired by number theoretic methods for numerical analysis. GLT offers a set of collocation points that are effective even with a small number of points and for multi-dimensional spaces. Our experiments demonstrate that GLT requires 2--20 times fewer collocation points (resulting in lower computational cost) than uniformly random sampling or Latin hypercube sampling, while achieving competitive performance.
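Good lattice point sets come from a classical number-theoretic construction: a rank-1 lattice x_i = frac(i z / n) for a well-chosen integer generating vector z. A minimal sketch follows; the example generating vector is illustrative only, since good generators are dimension- and n-specific (tables exist in the quasi-Monte Carlo literature).

```python
import numpy as np

def good_lattice_points(n, z):
    """Rank-1 lattice point set: x_i = frac(i * z / n), i = 0..n-1.
    n: number of points; z: integer generating vector, one entry per dimension."""
    i = np.arange(n)[:, None]
    return (i * np.asarray(z)[None, :] / n) % 1.0

# 2-D collocation points in [0, 1)^2 for the PINN residual loss.
pts = good_lattice_points(1021, z=[1, 76])
```

Compared with i.i.d. uniform draws, these points fill the unit cube far more evenly (low discrepancy), which is why the PDE residual can be enforced accurately with far fewer collocation points.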

Learning sources of variability from high-dimensional observational studies

  • paper_url: http://arxiv.org/abs/2307.13868
  • repo_url: https://github.com/ebridge2/cdcorr
  • paper_authors: Eric W. Bridgeford, Jaewon Chung, Brian Gilbert, Sambit Panda, Adam Li, Cencheng Shen, Alexandra Badea, Brian Caffo, Joshua T. Vogelstein
  • for: This paper studies causal inference, i.e., determining whether the presence of a variable influences an observed outcome.
  • methods: It generalizes causal estimands to outcomes with any number of dimensions or any measurable space, and formulates traditional causal estimands for nominal variables as causal discrepancy tests.
  • results: The proposed method, Causal CDcorr, improves both finite-sample validity and power over existing strategies, with open-source code available on GitHub.
    Abstract Causal inference studies whether the presence of a variable influences an observed outcome. As measured by quantities such as the "average treatment effect," this paradigm is employed across numerous biological fields, from vaccine and drug development to policy interventions. Unfortunately, the majority of these methods are often limited to univariate outcomes. Our work generalizes causal estimands to outcomes with any number of dimensions or any measurable space, and formulates traditional causal estimands for nominal variables as causal discrepancy tests. We propose a simple technique for adjusting universally consistent conditional independence tests and prove that these tests are universally consistent causal discrepancy tests. Numerical experiments illustrate that our method, Causal CDcorr, leads to improvements in both finite sample validity and power when compared to existing strategies. Our methods are all open source and available at github.com/ebridge2/cdcorr.

Pretrained Deep 2.5D Models for Efficient Predictive Modeling from Retinal OCT

  • paper_url: http://arxiv.org/abs/2307.13865
  • repo_url: None
  • paper_authors: Taha Emre, Marzieh Oghbaie, Arunava Chakravarty, Antoine Rivail, Sophie Riedl, Julia Mai, Hendrik P. N. Scholl, Sobha Sivaprasad, Daniel Rueckert, Andrew Lotery, Ursula Schmidt-Erfurth, Hrvoje Bogunović
  • for: Improving predictive modeling of disease progression from retinal optical coherence tomography (OCT).
  • methods: 2.5D architectures combining convolutional neural networks (CNNs), long short-term memory (LSTM), and Transformers, further enhanced with recent non-contrastive 2D pretraining.
  • results: Predicts progression to wet age-related macular degeneration (AMD) within six months on two large longitudinal OCT datasets, improving both performance and data efficiency.
    Abstract In the field of medical imaging, 3D deep learning models play a crucial role in building powerful predictive models of disease progression. However, the size of these models presents significant challenges, both in terms of computational resources and data requirements. Moreover, achieving high-quality pretraining of 3D models proves to be even more challenging. To address these issues, hybrid 2.5D approaches provide an effective solution for utilizing 3D volumetric data efficiently using 2D models. Combining 2D and 3D techniques offers a promising avenue for optimizing performance while minimizing memory requirements. In this paper, we explore 2.5D architectures based on a combination of convolutional neural networks (CNNs), long short-term memory (LSTM), and Transformers. In addition, leveraging the benefits of recent non-contrastive pretraining approaches in 2D, we enhanced the performance and data efficiency of 2.5D techniques even further. We demonstrate the effectiveness of architectures and associated pretraining on a task of predicting progression to wet age-related macular degeneration (AMD) within a six-month period on two large longitudinal OCT datasets.
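
The 2.5D idea is compact enough to sketch: a shared 2D encoder processes each OCT B-scan, and a recurrent layer aggregates features along the slice axis. Below is a minimal PyTorch illustration with toy layer sizes and input shapes; it is not the paper's architecture or its non-contrastive pretraining setup.

```python
import torch
import torch.nn as nn

class Slice2p5D(nn.Module):
    """Minimal 2.5D model: a shared 2D CNN encodes each B-scan and an
    LSTM aggregates the per-slice features into a single prediction."""
    def __init__(self, feat_dim=64, hidden=128, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(            # toy stand-in for a pretrained 2D CNN
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):                        # x: (batch, slices, 1, H, W)
        b, s = x.shape[:2]
        feats = self.encoder(x.flatten(0, 1))    # encode all slices with shared weights
        _, (h, _) = self.lstm(feats.view(b, s, -1))  # aggregate along the slice axis
        return self.head(h[-1])

logits = Slice2p5D()(torch.randn(2, 19, 1, 64, 64))  # e.g. 19 B-scans per volume
```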

Learning to Design Analog Circuits to Meet Threshold Specifications

  • paper_url: http://arxiv.org/abs/2307.13861
  • repo_url: https://github.com/indylab/circuit-synthesis
  • paper_authors: Dmitrii Krylov, Pooya Khajeh, Junhan Ouyang, Thomas Reeves, Tongkai Liu, Hiba Ajmal, Hamidreza Aghasi, Roy Fox
  • for: Automated design of analog and radio-frequency circuits from simulation data, as an alternative to manual expert design with supervised or reinforcement learning.
  • methods: Generates a dataset from simulation data on which a system can be trained via supervised learning to design circuits that meet threshold specifications, rather than an exact target vector of performance measures.
  • results: Consistently reaches success rates above 90% at a 5% error margin while improving data efficiency by upward of an order of magnitude. A demo is available at circuits.streamlit.app.
    Abstract Automated design of analog and radio-frequency circuits using supervised or reinforcement learning from simulation data has recently been studied as an alternative to manual expert design. It is straightforward for a design agent to learn an inverse function from desired performance metrics to circuit parameters. However, it is more common for a user to have threshold performance criteria rather than an exact target vector of feasible performance measures. In this work, we propose a method for generating from simulation data a dataset on which a system can be trained via supervised learning to design circuits to meet threshold specifications. We moreover perform the most extensive evaluation of automated analog circuit design to date, including experiments on a significantly more diverse set of circuits than in prior work, covering linear, nonlinear, and autonomous circuit configurations, and show that our method consistently reaches a success rate better than 90% at a 5% error margin, while also improving data efficiency by upward of an order of magnitude. A demo of this system is available at circuits.streamlit.app
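
A minimal sketch of the dataset-construction step from the abstract: sample threshold specifications and pair each with the parameters of a simulated circuit whose metrics satisfy it. The toy "simulator" and all names and shapes below are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulation log: design parameters x and measured metrics y
# (in practice y would come from a circuit simulator).
x = rng.uniform(0.0, 1.0, size=(1000, 4))                    # 4 design parameters
y = np.stack([x @ [2, 1, 0, 0], x @ [0, 0, 3, 1]], axis=1)   # 2 toy metrics

def make_threshold_dataset(x, y, n_specs=5000):
    """Pair sampled threshold specs t with the parameters of a circuit
    whose metrics meet the spec (y >= t element-wise)."""
    lo, hi = y.min(0), y.max(0)
    specs, targets = [], []
    for _ in range(n_specs):
        t = rng.uniform(lo, hi)
        ok = np.all(y >= t, axis=1)              # feasible circuits for this spec
        if ok.any():
            specs.append(t)
            targets.append(x[rng.choice(np.flatnonzero(ok))])
    return np.array(specs), np.array(targets)

specs, params = make_threshold_dataset(x, y)
# A regressor trained on (specs -> params) can then propose designs for new specs.
```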

On the unreasonable vulnerability of transformers for image restoration – and an easy fix

  • paper_url: http://arxiv.org/abs/2307.13856
  • repo_url: None
  • paper_authors: Shashank Agnihotri, Kanchana Vaishnavi Gandikota, Julia Grabinski, Paramanand Chandramouli, Margret Keuper
  • for: Investigating whether the improved adversarial robustness reported for Vision Transformers (ViTs) in image classification extends to image restoration.
  • methods: Robustness evaluation with Projected Gradient Descent (PGD) and Cosine PGD (CosPGD), an adversarial attack tailored to pixel-wise prediction tasks.
  • results: On GoPro image deblurring, the models are highly susceptible to adversarial attacks; adversarial training yields a significant robustness increase for Restormer, while results for NAFNet and the Baseline network are less promising.
    Abstract Following their success in visual recognition tasks, Vision Transformers (ViTs) are being increasingly employed for image restoration. As a few recent works claim that ViTs for image classification also have better robustness properties, we investigate whether the improved adversarial robustness of ViTs extends to image restoration. We consider the recently proposed Restormer model, as well as NAFNet and the "Baseline network", which are both simplified versions of a Restormer. We use Projected Gradient Descent (PGD) and CosPGD, a recently proposed adversarial attack tailored to pixel-wise prediction tasks, for our robustness evaluation. Our experiments are performed on real-world images from the GoPro dataset for image deblurring. Our analysis indicates that, contrary to what is advocated in image classification works, these models are highly susceptible to adversarial attacks. We attempt to improve their robustness through adversarial training. While this yields a significant increase in robustness for Restormer, results on other networks are less promising. Interestingly, the design choices in NAFNet and Baselines, which were based on iid performance rather than robust generalization, seem to be at odds with model robustness. Thus, we investigate this further and find a fix.
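
For reference, an L-infinity PGD attack on a pixel-wise prediction task can be sketched as below. This is generic PGD with an MSE objective, not the paper's CosPGD formulation, and the model and attack budgets are toy placeholders.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, target, eps=8 / 255, alpha=2 / 255, steps=10):
    """L-inf PGD against a restoration network: maximize the pixel-wise
    loss between the restored output and the clean target."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.mse_loss(model(x_adv), target)       # pixel-wise objective
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()       # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back to the eps-ball
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

model = torch.nn.Conv2d(3, 3, 1)                      # toy stand-in for a restorer
x = torch.rand(1, 3, 32, 32)
x_adv = pgd_attack(model, x, target=x.clone())
```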

Exploring the Sharpened Cosine Similarity

  • paper_url: http://arxiv.org/abs/2307.13855
  • repo_url: None
  • paper_authors: Skyler Wu, Fred Lu, Edward Raff, James Holt
  • for: Exploring the Sharpened Cosine Similarity (SCS) as a drop-in replacement for convolutional layers in image classification models.
  • methods: Studies SCS's parameter behavior and benchmarks it within multiple CNN architectures on CIFAR-10.
  • results: SCS may not yield significant accuracy gains, but it may learn more interpretable representations and, in some circumstances, confer a slight increase in adversarial robustness.
    Abstract Convolutional layers have long served as the primary workhorse for image classification. Recently, an alternative to convolution was proposed using the Sharpened Cosine Similarity (SCS), which in theory may serve as a better feature detector. While multiple sources report promising results, there has not been to date a full-scale empirical analysis of neural network performance using these new layers. In our work, we explore SCS's parameter behavior and potential as a drop-in replacement for convolutions in multiple CNN architectures benchmarked on CIFAR-10. We find that while SCS may not yield significant increases in accuracy, it may learn more interpretable representations. We also find that, in some circumstances, SCS may confer a slight increase in adversarial robustness.
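
The SCS operation itself fits in a short layer. The sketch below follows the commonly cited formulation scs(x, w) = sign(x.w) * (|x.w| / ((|x| + q)(|w| + q)))^p with learned p and q; the initialization and exact parameterization are assumptions, not the paper's reference code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharpenedCosineSimilarity2d(nn.Module):
    """Sharpened cosine similarity as a drop-in for a conv layer."""
    def __init__(self, in_ch, out_ch, kernel_size=3, eps=1e-12):
        super().__init__()
        self.w = nn.Parameter(torch.randn(out_ch, in_ch * kernel_size ** 2) * 0.1)
        self.p = nn.Parameter(torch.full((out_ch,), 2.0))  # learned sharpening exponent
        self.q = nn.Parameter(torch.zeros(1))              # learned floor on the norms
        self.k, self.eps = kernel_size, eps

    def forward(self, x):
        b, _, h, w = x.shape
        patches = F.unfold(x, self.k, padding=self.k // 2)   # (b, in*k*k, h*w)
        dots = self.w @ patches                              # (b, out, h*w)
        xn = patches.norm(dim=1, keepdim=True) + self.q.abs()
        wn = self.w.norm(dim=1)[None, :, None] + self.q.abs()
        cos = dots / (xn * wn + self.eps)
        out = cos.sign() * cos.abs().pow(self.p[None, :, None])
        return out.view(b, -1, h, w)

y = SharpenedCosineSimilarity2d(3, 8)(torch.rand(2, 3, 32, 32))  # (2, 8, 32, 32)
```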

WebArena: A Realistic Web Environment for Building Autonomous Agents

  • paper_url: http://arxiv.org/abs/2307.13854
  • repo_url: https://github.com/web-arena-x/webarena
  • paper_authors: Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, Graham Neubig
  • for: Building a highly realistic and reproducible environment for commanding, controlling, and evaluating autonomous agents driven by natural-language instructions.
  • methods: Fully functional websites from four common domains, enriched with tools and external knowledge bases; agents built on large language models such as GPT-4, integrating techniques like reasoning before acting.
  • results: Current state-of-the-art agents leave substantial room for improvement on complex real-world tasks: the best GPT-4-based agent achieves only a 10.59% end-to-end task success rate.
    Abstract With generative AI advances, the exciting potential for autonomous agents to manage daily tasks via natural language commands has emerged. However, current agents are primarily created and tested in simplified synthetic environments, substantially limiting real-world scenario representation. In this paper, we build an environment for agent command and control that is highly realistic and reproducible. Specifically, we focus on agents that perform tasks on websites, and we create an environment with fully functional websites from four common domains: e-commerce, social forum discussions, collaborative software development, and content management. Our environment is enriched with tools (e.g., a map) and external knowledge bases (e.g., user manuals) to encourage human-like task-solving. Building upon our environment, we release a set of benchmark tasks focusing on evaluating the functional correctness of task completions. The tasks in our benchmark are diverse, long-horizon, and are designed to emulate tasks that humans routinely perform on the internet. We design and implement several autonomous agents, integrating recent techniques such as reasoning before acting. The results demonstrate that solving complex tasks is challenging: our best GPT-4-based agent only achieves an end-to-end task success rate of 10.59%. These results highlight the need for further development of robust agents, that current state-of-the-art LMs are far from perfect performance in these real-life tasks, and that WebArena can be used to measure such progress. Our code, data, environment reproduction resources, and video demonstrations are publicly available at https://webarena.dev/.

SplitFed resilience to packet loss: Where to split, that is the question

  • paper_url: http://arxiv.org/abs/2307.13851
  • repo_url: None
  • paper_authors: Chamani Shiranthika, Zahra Hafezi Kafshgari, Parvaneh Saeedi, Ivan V. Bajić
  • for: Studying the resilience of Split Federated Learning (SplitFed or SFL), a hybrid of Federated Learning (FL) and Split Learning (SL), to packet loss on communication links.
  • methods: Examines various SFL aggregation strategies with the model split at two depths (a shallow split and a deep split) and tests whether the split point makes a statistically significant difference in final-model accuracy.
  • results: Experiments on a segmentation model for human embryo images indicate a statistically significant accuracy advantage for the deeper split point.
    Abstract Decentralized machine learning has broadened its scope recently with the invention of Federated Learning (FL), Split Learning (SL), and their hybrids like Split Federated Learning (SplitFed or SFL). The goal of SFL is to reduce the computational power required by each client in FL and parallelize SL while maintaining privacy. This paper investigates the robustness of SFL against packet loss on communication links. The performance of various SFL aggregation strategies is examined by splitting the model at two points -- shallow split and deep split -- and testing whether the split point makes a statistically significant difference to the accuracy of the final model. Experiments are carried out on a segmentation model for human embryo images and indicate the statistically significant advantage of a deeper split point.
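
The split-point question can be illustrated with a toy model divided between a client and a server, where the activations crossing the link are corrupted. The sketch below zeroes random activation entries as a crude stand-in for packet loss; the architecture, loss model, and rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Toy network; `split` selects how many blocks stay on the client
# (small split = shallow cut, large split = deep cut).
blocks = nn.ModuleList([
    nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())
    for c_in, c_out in [(1, 8), (8, 16), (16, 16), (16, 16)]
])

def forward_with_packet_loss(x, split=1, loss_rate=0.1):
    """Run client-side blocks, corrupt the activations sent over the
    client-server link (dropped values zeroed), then run server blocks."""
    for blk in blocks[:split]:           # client side
        x = blk(x)
    mask = (torch.rand_like(x) > loss_rate).float()
    x = x * mask                         # lossy link
    for blk in blocks[split:]:           # server side
        x = blk(x)
    return x

shallow = forward_with_packet_loss(torch.rand(1, 1, 32, 32), split=1)
deep = forward_with_packet_loss(torch.rand(1, 1, 32, 32), split=3)
```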

MAEA: Multimodal Attribution for Embodied AI

  • paper_url: http://arxiv.org/abs/2307.13850
  • repo_url: None
  • paper_authors: Vidhi Jain, Jayant Sravan Tamarapalli, Sahiti Yerramilli, Yonatan Bisk
  • for: Understanding multimodal perception for embodied AI, where inputs may contain highly complementary as well as redundant information.
  • methods: Attribution analysis of visual, language, and previous-action inputs across different policies trained on the ALFRED dataset, examining global trends of each modality at the fusion layer.
  • results: Presents MAEA, a framework to compute global per-modality attributions for any differentiable policy, and shows how attributions enable lower-level behavior analysis of embodied AI policies.
    Abstract Understanding multimodal perception for embodied AI is an open question because such inputs may contain highly complementary as well as redundant information for the task. A relevant direction for multimodal policies is understanding the global trends of each modality at the fusion layer. To this end, we disentangle the attributions for visual, language, and previous action inputs across different policies trained on the ALFRED dataset. Attribution analysis can be utilized to rank and group the failure scenarios, investigate modeling and dataset biases, and critically analyze multimodal EAI policies for robustness and user trust before deployment. We present MAEA, a framework to compute global attributions per modality of any differentiable policy. In addition, we show how attributions enable lower-level behavior analysis in EAI policies for language and visual attributions.
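
A simple gradient-times-input attribution, summed per modality and averaged over data, conveys the flavor of global multimodal attribution. The toy policy, input shapes, and the choice of gradient-times-input below are illustrative assumptions, not the MAEA implementation.

```python
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Toy differentiable policy over three modalities (hypothetical shapes)."""
    def __init__(self):
        super().__init__()
        self.vis, self.lang, self.act = nn.Linear(32, 8), nn.Linear(16, 8), nn.Linear(4, 8)
        self.head = nn.Linear(24, 5)                # 5 discrete actions

    def forward(self, v, l, a):
        z = torch.cat([self.vis(v), self.lang(l), self.act(a)], dim=-1)
        return self.head(z)

def modality_attributions(policy, v, l, a):
    """Gradient-x-input attribution summed per modality; averaging over a
    dataset yields a global per-modality attribution profile."""
    for t in (v, l, a):
        t.requires_grad_(True)
    score = policy(v, l, a).max(dim=-1).values.sum()   # score of the chosen action
    grads = torch.autograd.grad(score, (v, l, a))
    return {name: (g * t).abs().sum().item()
            for name, g, t in zip(("vision", "language", "prev_action"), grads, (v, l, a))}

attr = modality_attributions(Policy(), torch.rand(2, 32), torch.rand(2, 16), torch.rand(2, 4))
```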

  • paper_url: http://arxiv.org/abs/2307.13831
  • repo_url: None
  • paper_authors: Yuki Tsukada, Hideaki Iiduka
  • for: Convergence analysis of stochastic gradient descent (SGD) for training deep learning models.
  • methods: SGD with learning rates given by an Armijo line search, analyzed for nonconvex optimization.
  • results: The upper bound on the expected squared norm of the full gradient becomes small when the number of steps and the batch size are large; the number of steps needed for nonconvex optimization decreases as the batch size increases; and there exists a critical batch size that minimizes the stochastic first-order oracle (SFO) complexity. Numerical results support these findings.
    Abstract Stochastic gradient descent (SGD) is the simplest deep learning optimizer with which to train deep neural networks. While SGD can use various learning rates, such as constant or diminishing rates, previous numerical results showed that SGD performs better than other deep learning optimizers when it uses learning rates given by line search methods. In this paper, we perform a convergence analysis on SGD with a learning rate given by an Armijo line search for nonconvex optimization. The analysis indicates that the upper bound of the expectation of the squared norm of the full gradient becomes small when the number of steps and the batch size are large. Next, we show that, for SGD with the Armijo-line-search learning rate, the number of steps needed for nonconvex optimization is a monotone decreasing convex function of the batch size; that is, the number of steps needed for nonconvex optimization decreases as the batch size increases. Furthermore, we show that the stochastic first-order oracle (SFO) complexity, which is the stochastic gradient computation cost, is a convex function of the batch size; that is, there exists a critical batch size that minimizes the SFO complexity. Finally, we provide numerical results that support our theoretical results. The numerical results indicate that the number of steps needed for training deep neural networks decreases as the batch size increases and that there exist critical batch sizes that can be estimated from the theoretical results.
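
The Armijo rule used for the learning rate here is easy to state in code: backtrack the step until a sufficient-decrease condition holds on the current mini-batch. A minimal NumPy sketch on a toy quadratic (constants are illustrative):

```python
import numpy as np

def armijo_sgd_step(params, grad_fn, loss_fn, lr0=1.0, c=1e-4, shrink=0.5, max_back=30):
    """One SGD step with Armijo backtracking: shrink lr until
    f(x - lr * g) <= f(x) - c * lr * ||g||^2 on the current mini-batch."""
    g = grad_fn(params)
    f0, lr = loss_fn(params), lr0
    for _ in range(max_back):
        if loss_fn(params - lr * g) <= f0 - c * lr * np.dot(g, g):
            break
        lr *= shrink
    return params - lr * g

# toy example: f(x) = 0.5 * ||x||^2
x = np.array([3.0, -2.0])
for _ in range(20):
    x = armijo_sgd_step(x, grad_fn=lambda p: p, loss_fn=lambda p: 0.5 * p @ p)
```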

Offline Reinforcement Learning with On-Policy Q-Function Regularization

  • paper_url: http://arxiv.org/abs/2307.13824
  • repo_url: None
  • paper_authors: Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist
  • for: Addressing the extrapolation error in offline reinforcement learning (RL) induced by the distribution shift between the history dataset and the desired policy.
  • methods: Regularizes the learned policy toward the Q-function of the behavior policy, estimated with a SARSA-style estimate, rather than toward the behavior policy itself, which is hard to estimate reliably.
  • results: Proposes two algorithms that exploit the estimated Q-function through regularization and demonstrates strong performance on the D4RL benchmarks.
    Abstract The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior work tackles this challenge by implicitly/explicitly regularizing the learning policy towards the behavior policy, which is hard to estimate reliably in practice. In this work, we propose to regularize towards the Q-function of the behavior policy instead of the behavior policy itself, under the premise that the Q-function can be estimated more reliably and easily by a SARSA-style estimate and handles the extrapolation error more straightforwardly. We propose two algorithms taking advantage of the estimated Q-function through regularizations, and demonstrate they exhibit strong performance on the D4RL benchmarks.
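
The SARSA-style estimate of the behavior policy's Q-function avoids the max over actions that drives extrapolation error, since targets only use actions actually logged in the dataset. A minimal sketch with toy state and action dimensions (the regularization of the learned policy toward this estimate is omitted):

```python
import torch
import torch.nn as nn

q = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 1))  # Q(s, a)
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
gamma = 0.99

def sarsa_update(s, a, r, s2, a2, done):
    """Fit Q to r + gamma * Q(s', a') using logged (s, a, r, s', a') tuples;
    no max over actions, so no querying of out-of-distribution actions."""
    with torch.no_grad():
        target = r + gamma * (1 - done) * q(torch.cat([s2, a2], -1)).squeeze(-1)
    pred = q(torch.cat([s, a], -1)).squeeze(-1)
    loss = nn.functional.mse_loss(pred, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

B = 32  # toy batch: 4-dim states, 2-dim actions (hypothetical)
loss = sarsa_update(torch.rand(B, 4), torch.rand(B, 2), torch.rand(B),
                    torch.rand(B, 4), torch.rand(B, 2), torch.zeros(B))
```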

Fitting Auditory Filterbanks with Multiresolution Neural Networks

  • paper_url: http://arxiv.org/abs/2307.13821
  • repo_url: https://github.com/lostanlen/lostanlen2023waspaa
  • paper_authors: Vincent Lostanlen, Daniel Haider, Han Han, Mathieu Lagrange, Peter Balazs, Martin Ehler
  • for: Overcoming the dilemma in waveform-based deep learning between nonparametric convnets and parametric models such as LEAF.
  • methods: A multiresolution neural network (MuReNN) that trains separate convolutional operators over the octave subbands of a discrete wavelet transform (DWT), with receptive fields dilated to match the exponentially growing scale of DWT atoms.
  • results: After knowledge distillation against established auditory filterbanks (Gammatone for speech, CQT for music, and third-octave for urban sounds), MuReNN reaches state-of-the-art goodness of fit on a hold-out set and Heisenberg time-frequency localization, compared to convnets and Gabor convolutions.
    Abstract Waveform-based deep learning faces a dilemma between nonparametric and parametric approaches. On one hand, convolutional neural networks (convnets) may approximate any linear time-invariant system; yet, in practice, their frequency responses become more irregular as their receptive fields grow. On the other hand, a parametric model such as LEAF is guaranteed to yield Gabor filters, hence an optimal time-frequency localization; yet, this strong inductive bias comes at the detriment of representational capacity. In this paper, we aim to overcome this dilemma by introducing a neural audio model, named multiresolution neural network (MuReNN). The key idea behind MuReNN is to train separate convolutional operators over the octave subbands of a discrete wavelet transform (DWT). Since the scale of DWT atoms grows exponentially between octaves, the receptive fields of the subsequent learnable convolutions in MuReNN are dilated accordingly. For a given real-world dataset, we fit the magnitude response of MuReNN to that of a well-established auditory filterbank: Gammatone for speech, CQT for music, and third-octave for urban sounds, respectively. This is a form of knowledge distillation (KD), in which the filterbank ''teacher'' is engineered by domain knowledge while the neural network ''student'' is optimized from data. We compare MuReNN to the state of the art in terms of goodness of fit after KD on a hold-out set and in terms of Heisenberg time-frequency localization. Compared to convnets and Gabor convolutions, we find that MuReNN reaches state-of-the-art performance on all three optimization problems.
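
The core MuReNN idea, separate learnable convolutions over DWT octave subbands, can be sketched with PyWavelets and PyTorch. The wavelet choice, decomposition depth, and channel counts below are toy assumptions rather than the paper's configuration (see the linked repository for the real code).

```python
import numpy as np
import pywt
import torch
import torch.nn as nn

x = np.random.randn(1024)
coeffs = pywt.wavedec(x, "db4", level=5)      # [cA5, cD5, cD4, cD3, cD2, cD1]

# One independent learnable conv per octave subband.
convs = nn.ModuleList([nn.Conv1d(1, 4, kernel_size=9, padding=4)
                       for _ in coeffs])

features = []
for c, conv in zip(coeffs, convs):
    t = torch.from_numpy(c.astype(np.float32))[None, None, :]  # (batch, ch, time)
    features.append(conv(t).abs().mean(dim=-1))                # crude per-band summary
emb = torch.cat(features, dim=1)              # (1, 4 * number of subbands)
```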

Gradient-Based Spectral Embeddings of Random Dot Product Graphs

  • paper_url: http://arxiv.org/abs/2307.13818
  • repo_url: https://github.com/marfiori/efficient-ase
  • paper_authors: Marcelo Fiori, Bernardo Marenco, Federico Larroca, Paola Bermolen, Gonzalo Mateos
  • for: Proposing gradient-based spectral embedding algorithms for the Random Dot Product Graph (RDPG), a generative model for relational data, addressing shortcomings of existing embedding approaches.
  • methods: First-order gradient descent methods for the embedding problem, including a novel feasible optimization method over the manifold of matrices with orthogonal columns, which preserves interpretability of RDPG embeddings of directed graphs.
  • results: Reproducible experiments on synthetic and real network data show the framework is scalable, robust to missing edge data, and able to track slowly varying latent positions from streaming graphs.
    Abstract The Random Dot Product Graph (RDPG) is a generative model for relational data, where nodes are represented via latent vectors in low-dimensional Euclidean space. RDPGs crucially postulate that edge formation probabilities are given by the dot product of the corresponding latent positions. Accordingly, the embedding task of estimating these vectors from an observed graph is typically posed as a low-rank matrix factorization problem. The workhorse Adjacency Spectral Embedding (ASE) enjoys solid statistical properties, but it is formally solving a surrogate problem and can be computationally intensive. In this paper, we bring to bear recent advances in non-convex optimization and demonstrate their impact to RDPG inference. We advocate first-order gradient descent methods to better solve the embedding problem, and to organically accommodate broader network embedding applications of practical relevance. Notably, we argue that RDPG embeddings of directed graphs loose interpretability unless the factor matrices are constrained to have orthogonal columns. We thus develop a novel feasible optimization method in the resulting manifold. The effectiveness of the graph representation learning framework is demonstrated on reproducible experiments with both synthetic and real network data. Our open-source algorithm implementations are scalable, and unlike the ASE they are robust to missing edge data and can track slowly-varying latent positions from streaming graphs.
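
For the undirected case, gradient descent on the RDPG objective is short to write down. The sketch below minimizes ||A - XX^T||_F^2 with plain first-order updates on a toy sampled graph; the step size is illustrative, and the orthogonality-constrained treatment for directed graphs is not shown.

```python
import numpy as np

def gradient_ase(A, d=2, lr=0.05, iters=500, seed=0):
    """Estimate RDPG latent positions X by gradient descent on
    f(X) = ||A - X X^T||_F^2 (undirected case, minimal sketch)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=1 / np.sqrt(d), size=(A.shape[0], d))
    for _ in range(iters):
        R = X @ X.T - A
        X -= lr * (4 * R @ X) / A.shape[0]   # gradient of f is 4 (XX^T - A) X
    return X

# toy RDPG: sample a graph from ground-truth positions, then re-embed it
rng = np.random.default_rng(1)
X0 = rng.uniform(0.2, 0.8, size=(100, 2)) / np.sqrt(2)
P = np.clip(X0 @ X0.T, 0, 1)
A = (rng.uniform(size=P.shape) < P).astype(float)
A = np.triu(A, 1); A = A + A.T               # symmetric, no self-loops
X_hat = gradient_ase(A)
```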

How to Scale Your EMA

  • paper_url: http://arxiv.org/abs/2307.13813
  • repo_url: None
  • paper_authors: Dan Busbridge, Jason Ramapuram, Pierre Ablin, Tatiana Likhomanenko, Eeshan Gunesh Dhekane, Xavier Suau, Russ Webb
  • for: This paper aims to improve the practicality of machine learning by preserving training dynamics across batch sizes, enabling the trade-off between batch size and wall-clock time.
  • methods: The paper proposes a scaling rule for optimization in the presence of model Exponential Moving Averages (EMAs), which can improve the robustness and generalization properties of supervised learning, stabilize pseudo-labeling, and provide a learning signal for Self-Supervised Learning (SSL).
  • results: The paper demonstrates the validity of the scaling rule across a range of architectures, optimizers, and data modalities, and shows that the rule enables training of EMA-based pseudo-labeling and SSL methods at small and large batch sizes. Additionally, the paper achieves a 6x wall-clock time reduction for training BYOL up to batch size 24,576 without sacrificing performance.
    Abstract Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule, for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important tool for practical machine learning is the model Exponential Moving Average (EMA), which is a model copy that does not receive gradient information, but instead follows its target model with some momentum. This model EMA can improve the robustness and generalization properties of supervised learning, stabilize pseudo-labeling, and provide a learning signal for Self-Supervised Learning (SSL). Prior works have treated the model EMA separately from optimization, leading to different training dynamics across batch sizes and lower model performance. In this work, we provide a scaling rule for optimization in the presence of model EMAs and demonstrate its validity across a range of architectures, optimizers, and data modalities. We also show the rule's validity where the model EMA contributes to the optimization of the target model, enabling us to train EMA-based pseudo-labeling and SSL methods at small and large batch sizes. For SSL, we enable training of BYOL up to batch size 24,576 without sacrificing performance, optimally a 6$\times$ wall-clock time reduction.
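
The scaling rule can be illustrated with a plain model-EMA update. The sketch assumes the paper's rule of scaling the SGD learning rate linearly and exponentiating the EMA momentum (rho -> rho^kappa) when the batch size grows by a factor kappa; treat that rule and the constants as assumptions taken from the paper's setting.

```python
import copy
import torch
import torch.nn as nn

def scale_for_batch_size(base_lr, base_rho, kappa):
    """Batch size grows by kappa: linear scaling for the SGD learning rate,
    rho -> rho**kappa for the model EMA momentum (assumed from the paper)."""
    return base_lr * kappa, base_rho ** kappa

@torch.no_grad()
def ema_update(ema_model, model, rho):
    """theta_ema <- rho * theta_ema + (1 - rho) * theta."""
    for p_ema, p in zip(ema_model.parameters(), model.parameters()):
        p_ema.mul_(rho).add_(p, alpha=1 - rho)

model = nn.Linear(10, 1)
ema_model = copy.deepcopy(model)
lr, rho = scale_for_batch_size(base_lr=0.1, base_rho=0.999, kappa=8)
opt = torch.optim.SGD(model.parameters(), lr=lr)

loss = model(torch.rand(4, 10)).pow(2).mean()
loss.backward(); opt.step(); opt.zero_grad()
ema_update(ema_model, model, rho)
```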

When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review

  • paper_url: http://arxiv.org/abs/2307.14382
  • repo_url: None
  • paper_authors: Maxime Fontana, Michael Spratling, Miaojing Shi
  • for: Reviewing how Multi-Task Learning (MTL) can be utilized under different partial supervision settings.
  • methods: Surveys the parameter sharing techniques MTL uses to transfer knowledge between tasks, and how task groupings can be derived by analyzing task relationships.
  • results: Discusses the challenges arising from multi-objective optimization and higher labeling requirements, how partially supervised methods tackle them, and the available datasets, tools, and benchmarking results.
    Abstract Multi-Task Learning (MTL) aims to learn multiple tasks simultaneously while exploiting their mutual relationships. By using shared resources to simultaneously calculate multiple outputs, this learning paradigm has the potential to have lower memory requirements and inference times compared to the traditional approach of using separate methods for each task. Previous work in MTL has mainly focused on fully-supervised methods, as task relationships can not only be leveraged to lower the level of data-dependency of those methods but they can also improve performance. However, MTL introduces a set of challenges due to a complex optimisation scheme and a higher labeling requirement. This review focuses on how MTL could be utilised under different partial supervision settings to address these challenges. First, this review analyses how MTL traditionally uses different parameter sharing techniques to transfer knowledge in between tasks. Second, it presents the different challenges arising from such a multi-objective optimisation scheme. Third, it introduces how task groupings can be achieved by analysing task relationships. Fourth, it focuses on how partially supervised methods applied to MTL can tackle the aforementioned challenges. Lastly, this review presents the available datasets, tools and benchmarking results of such methods.
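
Hard parameter sharing, the most common of the parameter sharing techniques the review surveys, fits in a short sketch: one shared trunk, a small private head per task, and per-task losses computed only where labels exist, which is exactly the partially supervised case. Tasks and shapes below are illustrative.

```python
import torch
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Hard parameter sharing: a shared trunk transfers knowledge between
    tasks; each task keeps its own lightweight head."""
    def __init__(self, in_dim=64, hidden=128, n_classes=10):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleDict({
            "classify": nn.Linear(hidden, n_classes),
            "regress": nn.Linear(hidden, 1),
        })

    def forward(self, x):
        z = self.trunk(x)
        return {name: head(z) for name, head in self.heads.items()}

model = HardSharingMTL()
out = model(torch.rand(8, 64))
# Partial supervision: each task's loss uses only the samples it has labels for.
loss = nn.functional.cross_entropy(out["classify"][:4], torch.randint(10, (4,)))
loss = loss + nn.functional.mse_loss(out["regress"][4:], torch.rand(4, 1))
loss.backward()
```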

EdgeConvEns: Convolutional Ensemble Learning for Edge Intelligence

  • paper_url: http://arxiv.org/abs/2307.14381
  • repo_url: None
  • paper_authors: Ilkay Sikdokur, İnci M. Baytaş, Arda Yurdakul
  • for: Deep edge intelligence, where computationally expensive training must run on edge networks with limited computational power and distributed data cannot be centralized due to privacy concerns.
  • methods: EdgeConvEns trains heterogeneous weak models independently on Field-Programmable Gate Array (FPGA) devices of various capacities, then transfers the learned data representations to a central server where a convolutional ensemble model is trained on them.
  • results: Extensive experiments show EdgeConvEns can outperform state-of-the-art performance with fewer communications and less data across various training scenarios.
    Abstract Deep edge intelligence aims to deploy deep learning models that demand computationally expensive training in the edge network with limited computational power. Moreover, many deep edge intelligence applications require handling distributed data that cannot be transferred to a central server due to privacy concerns. Decentralized learning methods, such as federated learning, offer solutions where models are learned collectively by exchanging learned weights. However, they often require complex models that edge devices may not handle and multiple rounds of network communication to achieve state-of-the-art performances. This study proposes a convolutional ensemble learning approach, coined EdgeConvEns, that facilitates training heterogeneous weak models on edge and learning to ensemble them where data on edge are heterogeneously distributed. Edge models are implemented and trained independently on Field-Programmable Gate Array (FPGA) devices with various computational capacities. Learned data representations are transferred to a central server where the ensemble model is trained with the learned features received from the edge devices to boost the overall prediction performance. Extensive experiments demonstrate that the EdgeConvEns can outperform the state-of-the-art performance with fewer communications and less data in various training scenarios.

Source Condition Double Robust Inference on Functionals of Inverse Problems

  • paper_url: http://arxiv.org/abs/2307.13793
  • repo_url: None
  • paper_authors: Andrew Bennett, Nathan Kallus, Xiaojie Mao, Whitney Newey, Vasilis Syrgkanis, Masatoshi Uehara
  • for: Estimation of parameters defined as linear functionals of solutions to linear inverse problems.
  • methods: A doubly robust representation that depends on the solution to a dual linear inverse problem, where the dual solution generalizes the inverse propensity function.
  • results: The first source condition double robust inference method, ensuring asymptotic normality around the parameter of interest as long as either the primal or the dual inverse problem is sufficiently well-posed, without knowing which one; enabled by novel guarantees for iterated Tikhonov regularized adversarial estimators over general hypothesis spaces.
    Abstract We consider estimation of parameters defined as linear functionals of solutions to linear inverse problems. Any such parameter admits a doubly robust representation that depends on the solution to a dual linear inverse problem, where the dual solution can be thought as a generalization of the inverse propensity function. We provide the first source condition double robust inference method that ensures asymptotic normality around the parameter of interest as long as either the primal or the dual inverse problem is sufficiently well-posed, without knowledge of which inverse problem is the more well-posed one. Our result is enabled by novel guarantees for iterated Tikhonov regularized adversarial estimators for linear inverse problems, over general hypothesis spaces, which are developments of independent interest.

Histogram Layer Time Delay Neural Networks for Passive Sonar Classification

  • paper_url: http://arxiv.org/abs/2307.13788
  • repo_url: https://github.com/peeples-lab/hltdnn
  • paper_authors: Jarin Ritu, Ethan Barnes, Riley Martell, Alexandra Van Dine, Joshua Peeples
  • for: Improving underwater acoustic target detection for remote marine sensing operations.
  • methods: Combines a time delay neural network with a histogram layer to incorporate statistical contexts for improved feature learning and underwater acoustic target classification.
  • results: The proposed method outperforms the baseline model, demonstrating the utility of incorporating statistical contexts for passive sonar target recognition.
    Abstract Underwater acoustic target detection in remote marine sensing operations is challenging due to complex sound wave propagation. Despite the availability of reliable sonar systems, target recognition remains a difficult problem. Various methods address improved target recognition. However, most struggle to disentangle the high-dimensional, non-linear patterns in the observed target recordings. In this work, a novel method combines a time delay neural network and histogram layer to incorporate statistical contexts for improved feature learning and underwater acoustic target classification. The proposed method outperforms the baseline model, demonstrating the utility in incorporating statistical contexts for passive sonar target recognition. The code for this work is publicly available.
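
A histogram layer can be written as soft, differentiable binning: feature values vote for learnable bin centers through RBF memberships, and the votes are averaged over time to give a statistical summary. The sketch below follows that general recipe with toy sizes; it is not the authors' released implementation (see the linked repository).

```python
import torch
import torch.nn as nn

class HistogramLayer(nn.Module):
    """Soft histogram: RBF membership of each value to learnable bin
    centers, averaged over the time axis (normalized soft counts)."""
    def __init__(self, n_bins=8):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(-1, 1, n_bins))
        self.widths = nn.Parameter(torch.ones(n_bins))

    def forward(self, x):                       # x: (batch, channels, time)
        d = x.unsqueeze(-1) - self.centers      # (batch, channels, time, bins)
        memb = torch.exp(-(self.widths ** 2) * d ** 2)
        return memb.mean(dim=2)                 # (batch, channels, bins)

feats = torch.randn(4, 16, 100)                 # e.g. TDNN features over time
hist = HistogramLayer()(feats)                  # statistical context: (4, 16, 8)
```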

The GANfather: Controllable generation of malicious activity to improve defence systems

  • paper_url: http://arxiv.org/abs/2307.13787
  • repo_url: None
  • paper_authors: Ricardo Ribeiro Pereira, Jacopo Bono, João Tiago Ascensão, David Aparício, Pedro Ribeiro, Pedro Bizarro
  • for: Generating samples with the properties of malicious activity, without label requirements, to improve defence systems such as anti-money laundering.
  • methods: Adds an extra objective to the usual Generative Adversarial Network (GAN) loss to reward the generation of malicious samples; optionally, the generator is encouraged to bypass pre-existing detection systems, revealing defensive weaknesses for the discriminator to correct.
  • results: In a money-laundering use case, moves cumulative amounts close to 350 thousand dollars through a network of accounts without being detected by an existing system; in a recommendation use case, pushes a target item to a broad user base with as few as 30 synthetic attackers. In both cases, a new defence system is trained to capture the synthetic attacks.
    Abstract Machine learning methods to aid defence systems in detecting malicious activity typically rely on labelled data. In some domains, such labelled data is unavailable or incomplete. In practice this can lead to low detection rates and high false positive rates, which characterise for example anti-money laundering systems. In fact, it is estimated that 1.7--4 trillion euros are laundered annually and go undetected. We propose The GANfather, a method to generate samples with properties of malicious activity, without label requirements. We propose to reward the generation of malicious samples by introducing an extra objective to the typical Generative Adversarial Networks (GANs) loss. Ultimately, our goal is to enhance the detection of illicit activity using the discriminator network as a novel and robust defence system. Optionally, we may encourage the generator to bypass pre-existing detection systems. This setup then reveals defensive weaknesses for the discriminator to correct. We evaluate our method in two real-world use cases, money laundering and recommendation systems. In the former, our method moves cumulative amounts close to 350 thousand dollars through a network of accounts without being detected by an existing system. In the latter, we recommend the target item to a broad user base with as few as 30 synthetic attackers. In both cases, we train a new defence system to capture the synthetic attacks.
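
The extra objective added to the GAN loss can be sketched as one more term in the generator update. Everything below is a toy: the networks, the stand-in "malicious" objective (here, total amount moved), and the weighting are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 10))  # fake activity
D = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))  # detector

bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)

def malicious_objective(x):
    """Toy domain objective rewarding, e.g., large cumulative amounts moved."""
    return -x.sum(dim=1).mean()

z = torch.randn(64, 8)
fake = G(z)
adv = bce(D(fake), torch.ones(64, 1))            # evade the detector/discriminator
loss_g = adv + 1.0 * malicious_objective(fake)   # extra reward for "malice"
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```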

Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations

  • paper_url: http://arxiv.org/abs/2307.14380
  • repo_url: None
  • paper_authors: Daniel Kałuża, Andrzej Janusz, Dominik Ślęzak
  • for: Improving the quality of label assignments obtained from sparse and noisy annotations in active learning.
  • methods: Two novel annotation unification algorithms that utilize unlabeled parts of the sample space and require little to no intersection between samples annotated by different experts.
  • results: Experiments on four public datasets show the robustness and superiority of the proposed methods, both in estimating annotator reliability and in assigning actual labels, against state-of-the-art algorithms and simple majority voting.
    Abstract Supervised classification algorithms are used to solve a growing number of real-life problems around the globe. Their performance is strictly connected with the quality of labels used in training. Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice. To tackle this challenge, active learning algorithms are commonly employed to select only the most relevant data for labeling. However, this is possible only when the quality and quantity of labels acquired from experts are sufficient. Unfortunately, in many applications, a trade-off between annotating individual samples by multiple annotators to increase label quality vs. annotating new samples to increase the total number of labeled instances is necessary. In this paper, we address the issue of faulty data annotations in the context of active learning. In particular, we propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space. The proposed methods require little to no intersection between samples annotated by different experts. Our experiments on four public datasets indicate the robustness and superiority of the proposed methods in both, the estimation of the annotator's reliability, and the assignment of actual labels, against the state-of-the-art algorithms and the simple majority voting.
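
As a point of comparison for annotation unification, a reliability-weighted vote over sparse annotations is easy to sketch; the paper's algorithms go further by exploiting unlabeled regions of the sample space, and the fixed reliabilities below are an illustrative assumption.

```python
import numpy as np

# votes[i, j] = label from annotator j for sample i (-1 = not annotated)
votes = np.array([[1, 1, -1],
                  [0, -1, 0],
                  [1, 0, -1],
                  [-1, 1, 1]])

def weighted_vote(votes, reliability, n_classes=2):
    """Aggregate sparse annotations with per-annotator reliability weights;
    in practice the reliabilities are themselves estimated from data."""
    scores = np.zeros((votes.shape[0], n_classes))
    for i, j in zip(*np.nonzero(votes >= 0)):
        scores[i, votes[i, j]] += reliability[j]
    return scores.argmax(axis=1)

labels = weighted_vote(votes, reliability=np.array([0.9, 0.6, 0.75]))
```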

Accuracy Amplification in Differentially Private Logistic Regression: A Pre-Training Approach

  • paper_url: http://arxiv.org/abs/2307.13771
  • repo_url: None
  • paper_authors: Mohammad Hoseinpour, Milad Hoseinpour, Ali Aghagolzadeh
  • for: Boosting the accuracy of differentially private machine learning (DP-ML) models, specifically DP logistic regression.
  • methods: A pre-training module: the model is first pre-trained on a public dataset with no privacy concern, then fine-tuned on the private dataset via DP logistic regression.
  • results: Numerical results show that adding a pre-training module significantly improves the accuracy of the DP logistic regression.
    Abstract Machine learning (ML) models can memorize training datasets. As a result, training ML models over private datasets can violate the privacy of individuals. Differential privacy (DP) is a rigorous privacy notion to preserve the privacy of underlying training datasets in ML models. Yet, training ML models in a DP framework usually degrades the accuracy of ML models. This paper aims to boost the accuracy of a DP-ML model, specifically a logistic regression model, via a pre-training module. In more detail, we initially pre-train our model on a public training dataset that there is no privacy concern about it. Then, we fine-tune our model via the DP logistic regression with the private dataset. In the numerical results, we show that adding a pre-training module significantly improves the accuracy of the DP logistic regression.
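
The private fine-tuning stage can be sketched with a manual DP-SGD step for logistic regression: clip each per-example gradient, then add Gaussian noise to the sum. The noise scale, clipping norm, and toy data are illustrative; a real deployment would calibrate sigma to a target (epsilon, delta) privacy budget.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_logreg(X, y, w0, epochs=20, lr=0.1, clip=1.0, sigma=1.0):
    """DP-SGD fine-tuning: clip per-example gradients to norm `clip` and
    add Gaussian noise. w0 comes from non-private public pre-training."""
    w, n = w0.copy(), len(X)
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-X @ w))
        per_ex = (p - y)[:, None] * X                       # per-example gradients
        norms = np.linalg.norm(per_ex, axis=1, keepdims=True)
        per_ex = per_ex / np.maximum(1.0, norms / clip)     # clip
        noise = rng.normal(0, sigma * clip, size=w.shape)
        w -= lr * (per_ex.sum(0) + noise) / n
    return w

# pre-train on public data (plain gradient descent), then fine-tune privately
Xpub, Xpriv = rng.normal(size=(500, 5)), rng.normal(size=(200, 5))
ypub = (Xpub @ np.ones(5) > 0).astype(float)
ypriv = (Xpriv @ np.ones(5) > 0).astype(float)
w = np.zeros(5)
for _ in range(200):
    p = 1 / (1 + np.exp(-Xpub @ w))
    w -= 0.1 * Xpub.T @ (p - ypub) / len(Xpub)
w_private = dp_sgd_logreg(Xpriv, ypriv, w)
```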

ClusterSeq: Enhancing Sequential Recommender Systems with Clustering based Meta-Learning

  • paper_url: http://arxiv.org/abs/2307.13766
  • repo_url: None
  • paper_authors: Mohammmadmahdi Maheri, Reza Abdollahzadeh, Bardia Mohammadi, Mina Rafiei, Jafar Habibi, Hamid R. Rabiee
  • for: Addressing the user cold-start problem in sequential recommender systems, where limited interactions make it hard to determine user preferences, especially for "minor users" whose preferences differ from those of "major users".
  • methods: ClusterSeq, a meta-learning clustering-based sequential recommender that leverages dynamic information in the user sequence, preserves minor users' preferences, and exploits the collective knowledge of users within the same cluster, even without side information.
  • results: On various benchmark datasets, ClusterSeq consistently outperforms state-of-the-art meta-learning recommenders, with a substantial improvement of 16-39% in Mean Reciprocal Rank (MRR).
    Abstract In practical scenarios, the effectiveness of sequential recommendation systems is hindered by the user cold-start problem, which arises due to limited interactions for accurately determining user preferences. Previous studies have attempted to address this issue by combining meta-learning with user and item-side information. However, these approaches face inherent challenges in modeling user preference dynamics, particularly for "minor users" who exhibit distinct preferences compared to more common or "major users." To overcome these limitations, we present a novel approach called ClusterSeq, a Meta-Learning Clustering-Based Sequential Recommender System. ClusterSeq leverages dynamic information in the user sequence to enhance item prediction accuracy, even in the absence of side information. This model preserves the preferences of minor users without being overshadowed by major users, and it capitalizes on the collective knowledge of users within the same cluster. Extensive experiments conducted on various benchmark datasets validate the effectiveness of ClusterSeq. Empirical results consistently demonstrate that ClusterSeq outperforms several state-of-the-art meta-learning recommenders. Notably, compared to existing meta-learning methods, our proposed approach achieves a substantial improvement of 16-39% in Mean Reciprocal Rank (MRR).

Implicitly Normalized Explicitly Regularized Density Estimation

  • paper_url: http://arxiv.org/abs/2307.13763
  • repo_url: None
  • paper_authors: Mark Kozdoba, Binyamin Perets, Shie Mannor
  • for: A new approach to nonparametric density estimation based on regularizing a Sobolev norm of the density, which is provably different from kernel density estimation and makes the model bias clear and interpretable.
  • methods: The associated kernel has no closed analytic form but can be approximated by sampling; the resulting nonconvex optimization problem is solved with an appropriate initialization and natural gradients.
  • results: Since the method yields unnormalized densities, Fisher-divergence-based score matching is adapted for cross-validation; on the recent Anomaly Detection benchmark suite ADBench, the method ranks second best among more than 15 algorithms.
    Abstract We propose a new approach to non-parametric density estimation, that is based on regularizing a Sobolev norm of the density. This method is provably different from Kernel Density Estimation, and makes the bias of the model clear and interpretable. While there is no closed analytic form for the associated kernel, we show that one can approximate it using sampling. The optimization problem needed to determine the density is non-convex, and standard gradient methods do not perform well. However, we show that with an appropriate initialization and using natural gradients, one can obtain well performing solutions. Finally, while the approach provides unnormalized densities, which prevents the use of log-likelihood for cross validation, we show that one can instead adapt Fisher Divergence based Score Matching methods for this task. We evaluate the resulting method on the comprehensive recent Anomaly Detection benchmark suite, ADBench, and find that it ranks second best, among more than 15 algorithms.

UPREVE: An End-to-End Causal Discovery Benchmarking System

  • paper_url: http://arxiv.org/abs/2307.13757
  • repo_url: None
  • paper_authors: Suraj Jyothi Unni, Paras Sheth, Kaize Ding, Huan Liu, K. Selcuk Candan
  • for: Making the discovery of causal relationships in complex socio-behavioral systems more accessible for informed decision-making.
  • methods: UPREVE, a user-friendly web-based graphical user interface (GUI) that runs multiple causal discovery algorithms simultaneously, visualizes causal relationships, and evaluates the accuracy of learned causal graphs.
  • results: Empowers researchers and practitioners in social computing and behavioral-cultural modeling to explore and understand causal relationships effectively.
    Abstract Discovering causal relationships in complex socio-behavioral systems is challenging but essential for informed decision-making. We present Upload, PREprocess, Visualize, and Evaluate (UPREVE), a user-friendly web-based graphical user interface (GUI) designed to simplify the process of causal discovery. UPREVE allows users to run multiple algorithms simultaneously, visualize causal relationships, and evaluate the accuracy of learned causal graphs. With its accessible interface and customizable features, UPREVE empowers researchers and practitioners in social computing and behavioral-cultural modeling (among others) to explore and understand causal relationships effectively. Our proposed solution aims to make causal discovery more accessible and user-friendly, enabling users to gain valuable insights for better decision-making.

Solution Path of Time-varying Markov Random Fields with Discrete Regularization

  • paper_url: http://arxiv.org/abs/2307.13750
  • repo_url: None
  • paper_authors: Salar Fattahi, Andres Gomez
  • for: Inferring sparse time-varying Markov random fields (MRFs) with different discrete and temporal regularizations on the parameters.
  • methods: Departs from maximum-likelihood estimation with relaxed regularization in favor of constrained optimization with exact, discrete regularization that promotes sparsity; despite being nonconvex and discrete, the problem is solved efficiently and parametrically for all sparsity levels.
  • results: The entire solution path for all sparsity levels is obtained in O(pT^3) time, with provably small estimation error for Gaussian and discrete time-varying MRFs using as few as one sample per time step; instances with over 30 million variables are solved in under 12 minutes on a standard laptop.
    Abstract We study the problem of inferring sparse time-varying Markov random fields (MRFs) with different discrete and temporal regularizations on the parameters. Due to the intractability of discrete regularization, most approaches for solving this problem rely on the so-called maximum-likelihood estimation (MLE) with relaxed regularization, which neither results in ideal statistical properties nor scale to the dimensions encountered in realistic settings. In this paper, we address these challenges by departing from the MLE paradigm and resorting to a new class of constrained optimization problems with exact, discrete regularization to promote sparsity in the estimated parameters. Despite the nonconvex and discrete nature of our formulation, we show that it can be solved efficiently and parametrically for all sparsity levels. More specifically, we show that the entire solution path of the time-varying MRF for all sparsity levels can be obtained in $\mathcal{O}(pT^3)$, where $T$ is the number of time steps and $p$ is the number of unknown parameters at any given time. The efficient and parametric characterization of the solution path renders our approach highly suitable for cross-validation, where parameter estimation is required for varying regularization values. Despite its simplicity and efficiency, we show that our proposed approach achieves provably small estimation error for different classes of time-varying MRFs, namely Gaussian and discrete MRFs, with as few as one sample per time. Utilizing our algorithm, we can recover the complete solution path for instances of time-varying MRFs featuring over 30 million variables in less than 12 minutes on a standard laptop computer. Our code is available at \url{https://sites.google.com/usc.edu/gomez/data}.

mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural Network Optimization

  • paper_url: http://arxiv.org/abs/2307.13744
  • repo_url: None
  • paper_authors: Yue Niu, Zalan Fabian, Sunwoo Lee, Mahdi Soltanolkotabi, Salman Avestimehr
  • for: A lightweight quasi-Newton optimizer for large-scale distributed deep neural network (DNN) optimization.
  • methods: mL-BFGS, a momentum-based L-BFGS that introduces a nearly cost-free momentum scheme into the L-BFGS update to reduce stochastic noise in the Hessian approximation, and approximates a block-wise Hessian to distribute compute and memory costs across computing nodes.
  • results: On benchmark neural models, mL-BFGS achieves noticeable iteration-wise and wall-clock speedups over SGD, Adam, and other quasi-Newton baselines.
    Abstract Quasi-Newton methods still face significant challenges in training large-scale neural networks due to additional compute costs in Hessian-related computations and instability issues in stochastic training. A well-known method, L-BFGS, which efficiently approximates the Hessian using the history of parameter and gradient changes, suffers from convergence instability in stochastic training. So far, attempts that adapt L-BFGS to large-scale stochastic training incur considerable extra overhead, which offsets its convergence benefits in wall-clock time. In this paper, we propose mL-BFGS, a lightweight momentum-based L-BFGS algorithm that paves the way for quasi-Newton (QN) methods in large-scale distributed deep neural network (DNN) optimization. mL-BFGS introduces a nearly cost-free momentum scheme into the L-BFGS update and greatly reduces stochastic noise in the Hessian, therefore stabilizing convergence during stochastic optimization. For model training at a large scale, mL-BFGS approximates a block-wise Hessian, thus enabling distribution of compute and memory costs across all computing nodes. We provide a supporting convergence analysis for mL-BFGS in stochastic settings. To investigate mL-BFGS's potential in large-scale DNN training, we train benchmark neural models using mL-BFGS and compare performance with baselines (SGD, Adam, and other quasi-Newton methods). Results show that mL-BFGS achieves both noticeable iteration-wise and wall-clock speedups.
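
The backbone is the standard L-BFGS two-loop recursion; feeding it momentum-smoothed gradients conveys the flavor of a momentum-based L-BFGS. The NumPy sketch below is a single-node toy with illustrative constants, not the paper's distributed block-wise algorithm.

```python
import numpy as np

def two_loop_direction(g, s_list, y_list):
    """Standard L-BFGS two-loop recursion: approximate H^{-1} g from
    stored curvature pairs (s_i, y_i)."""
    q, alphas = g.copy(), []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        a = (s @ q) / (y @ s)
        q -= a * y
        alphas.append(a)
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)                    # initial Hessian scaling
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        b = (y @ q) / (y @ s)
        q += (a - b) * s
    return q

def ml_bfgs_toy(x, grad_fn, lr=0.5, beta=0.9, mem=5, iters=50):
    """L-BFGS driven by momentum-smoothed gradients to damp stochastic noise."""
    m = np.zeros_like(x)
    s_list, y_list, x_old, m_old = [], [], None, None
    for _ in range(iters):
        m = beta * m + (1 - beta) * grad_fn(x)    # momentum on the gradient
        if x_old is not None:
            s, y = x - x_old, m - m_old
            if s @ y > 1e-10:                     # keep curvature-consistent pairs
                s_list.append(s); y_list.append(y)
                s_list, y_list = s_list[-mem:], y_list[-mem:]
        x_old, m_old = x.copy(), m.copy()
        x = x - lr * two_loop_direction(m, s_list, y_list)
    return x

x_opt = ml_bfgs_toy(np.array([3.0, -1.0]), lambda z: np.array([2.0, 10.0]) * z)
```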

ARB: Advanced Reasoning Benchmark for Large Language Models

  • paper_url: http://arxiv.org/abs/2307.13692
  • repo_url: None
  • paper_authors: Tomohiro Sawada, Daniel Paleka, Alexander Havrilla, Pranav Tadepalli, Paula Vidas, Alexander Kranias, John J. Nay, Kshitij Gupta, Aran Komatsuzaki
  • for: Providing ARB, a new benchmark of advanced reasoning problems for testing large language models (LLMs) across multiple fields.
  • methods: Problems in mathematics, physics, biology, chemistry, and law, including a challenging math and physics subset requiring advanced symbolic reasoning and domain knowledge; a rubric-based evaluation approach lets GPT-4 score its own intermediate reasoning steps.
  • results: Recent models such as GPT-4 and Claude score well below 50% on the more demanding tasks, and a human evaluation of the symbolic subset shows promising agreement between annotators and GPT-4's rubric-based scores.
    Abstract Large Language Models (LLMs) have demonstrated remarkable performance on various quantitative reasoning and knowledge benchmarks. However, many of these benchmarks are losing utility as LLMs get increasingly high scores, despite not yet reaching expert performance in these domains. We introduce ARB, a novel benchmark composed of advanced reasoning problems in multiple fields. ARB presents a more challenging test than prior benchmarks, featuring problems in mathematics, physics, biology, chemistry, and law. As a subset of ARB, we introduce a challenging set of math and physics problems which require advanced symbolic reasoning and domain knowledge. We evaluate recent models such as GPT-4 and Claude on ARB and demonstrate that current models score well below 50% on more demanding tasks. In order to improve both automatic and assisted evaluation capabilities, we introduce a rubric-based evaluation approach, allowing GPT-4 to score its own intermediate reasoning steps. Further, we conduct a human evaluation of the symbolic subset of ARB, finding promising agreement between annotators and GPT-4 rubric evaluation scores.

High Probability Analysis for Non-Convex Stochastic Optimization with Clipping

  • paper_url: http://arxiv.org/abs/2307.13680
  • repo_url: None
  • paper_authors: Shaojie Li, Yong Liu
  • for: 这篇论文主要研究随机优化中的梯度裁剪(Gradient Clipping)技术,为其提供高概率分析以及优化与泛化性能界。
  • methods: 论文分析了带有梯度裁剪的随机梯度下降及其变体(包括动量和自适应步长版本)。
  • results: 论文在非凸设定下给出了高概率分析,同时推导出优化界和泛化界;并研究了一种仅要求梯度具有有界$\alpha$阶矩($\alpha \in (1, 2]$)的重尾假设,在远弱于标准有界二阶矩假设的条件下建立了理论保证。
    Abstract Gradient clipping is a commonly used technique to stabilize the training process of neural networks. A growing body of studies has shown that gradient clipping is a promising technique for dealing with the heavy-tailed behavior that emerged in stochastic optimization as well. While gradient clipping is significant, its theoretical guarantees are scarce. Most theoretical guarantees only provide an in-expectation analysis and only focus on optimization performance. In this paper, we provide high probability analysis in the non-convex setting and derive the optimization bound and the generalization bound simultaneously for popular stochastic optimization algorithms with gradient clipping, including stochastic gradient descent and its variants of momentum and adaptive stepsizes. With the gradient clipping, we study a heavy-tailed assumption that the gradients only have bounded $\alpha$-th moments for some $\alpha \in (1, 2]$, which is much weaker than the standard bounded second-moment assumption. Overall, our study provides a relatively complete picture for the theoretical guarantee of stochastic optimization algorithms with clipping.
    摘要 梯度裁剪是一种常用的稳定神经网络训练过程的技术。越来越多的研究表明,梯度裁剪也是处理随机优化中出现的重尾行为的一种有前途的技术。尽管梯度裁剪十分重要,其理论保证却很少。大多数理论保证只提供期望意义下的分析,且仅关注优化性能。在这篇论文中,我们在非凸设定下给出高概率分析,并针对带有梯度裁剪的流行随机优化算法(包括随机梯度下降及其动量和自适应步长变体)同时推导出优化界和泛化界。在梯度裁剪下,我们研究一种重尾假设,即梯度仅具有有界的$\alpha$阶矩(对某个$\alpha \in (1, 2]$),这比标准的有界二阶矩假设弱得多。总体而言,我们的研究为带裁剪的随机优化算法的理论保证提供了相对完整的图景。
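
The clipping operator in question is the standard norm-based rescaling. A minimal sketch of clipped SGD, with an illustrative threshold and step size, looks like this:

```python
import numpy as np

def clip_gradient(g, threshold):
    """Rescale g so that ||g|| <= threshold; small gradients pass through unchanged."""
    norm = np.linalg.norm(g)
    return g if norm <= threshold else g * (threshold / norm)

def clipped_sgd(stochastic_grad, w0, lr=0.01, threshold=1.0, steps=1000):
    # stochastic_grad(w) returns a noisy (possibly heavy-tailed) gradient at w.
    w = w0.copy()
    for _ in range(steps):
        w = w - lr * clip_gradient(stochastic_grad(w), threshold)
    return w
```

Under the bounded $\alpha$-th moment assumption, rare but extreme gradient samples would otherwise dominate a plain SGD step; clipping caps their influence at `lr * threshold` per iteration.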

RED CoMETS: An ensemble classifier for symbolically represented multivariate time series

  • paper_url: http://arxiv.org/abs/2307.13679
  • repo_url: https://github.com/zy18811/red-comets
  • paper_authors: Luca A. Bennett, Zahraa S. Abdallah
  • For: The paper is written for researchers and practitioners working on multivariate time series classification, particularly in finance, healthcare, engineering, and related fields.
  • Methods: The paper proposes a novel ensemble classifier called RED CoMETS, which builds upon the success of Co-eye and extends its capabilities to multivariate time series. The method combines random enhanced co-eye ensembling with symbolic representations to improve the accuracy and efficiency of multivariate time series classification.
  • Results: The paper evaluates RED CoMETS on benchmark datasets from the UCR archive, achieving competitive accuracy compared to state-of-the-art techniques in multivariate settings, including the highest reported accuracy in the literature for the 'HandMovementDirection' dataset. It also significantly reduces computation time compared to Co-eye.

Here is the same information in Simplified Chinese:

  • for: 这篇论文面向多变量时间序列分类领域的研究人员和实践者,尤其是金融、医疗、工程等相关领域。
  • methods: 论文提出了一种名为RED CoMETS的新型集成分类器,它基于Co-eye的成功并将其扩展到多变量时间序列数据,结合随机增强co-eye集成与符号表示,以提高多变量时间序列分类的准确性和效率。
  • results: 论文在UCR档案的基准数据集上评估了RED CoMETS,其准确率与多变量场景下的最新技术相当,并在'HandMovementDirection'数据集上取得了文献中报告的最高准确率;同时计算时间较Co-eye显著减少。
    Abstract Multivariate time series classification is a rapidly growing research field with practical applications in finance, healthcare, engineering, and more. The complexity of classifying multivariate time series data arises from its high dimensionality, temporal dependencies, and varying lengths. This paper introduces a novel ensemble classifier called RED CoMETS (Random Enhanced Co-eye for Multivariate Time Series), which addresses these challenges. RED CoMETS builds upon the success of Co-eye, an ensemble classifier specifically designed for symbolically represented univariate time series, and extends its capabilities to handle multivariate data. The performance of RED CoMETS is evaluated on benchmark datasets from the UCR archive, where it demonstrates competitive accuracy when compared to state-of-the-art techniques in multivariate settings. Notably, it achieves the highest reported accuracy in the literature for the 'HandMovementDirection' dataset. Moreover, the proposed method significantly reduces computation time compared to Co-eye, making it an efficient and effective choice for multivariate time series classification.
    摘要 多变量时间序列分类是一个快速发展的研究领域,在金融、医疗、工程等领域有实际应用。多变量时间序列数据分类的复杂性来自其高维度、时间相关性和长度不一。本文介绍一种新的集成分类器RED CoMETS(Random Enhanced Co-eye for Multivariate Time Series),以应对这些挑战。RED CoMETS基于专为符号化表示的单变量时间序列设计的集成分类器Co-eye的成功,并将其能力扩展到多变量数据。本文在UCR档案的基准数据集上评估了RED CoMETS的性能:与多变量场景下的最新技术相比,它取得了有竞争力的准确率,特别是在'HandMovementDirection'数据集上达到了文献中报告的最高准确率。此外,所提方法的计算时间较Co-eye显著减少,使其成为高效且有效的多变量时间序列分类选择。
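
Co-eye-style classifiers operate on symbolic representations of the raw series. A minimal sketch of one such transform (SAX) is given below; the segment count and alphabet size are illustrative parameters, and RED CoMETS itself combines many randomized symbolic views per channel rather than a single one.

```python
import numpy as np
from scipy.stats import norm

def sax_transform(series, n_segments=8, alphabet_size=4):
    """SAX: z-normalize, average into segments (PAA), then discretize the
    segment means against equiprobable Gaussian breakpoints."""
    x = (series - series.mean()) / (series.std() + 1e-8)
    paa = np.array([seg.mean() for seg in np.array_split(x, n_segments)])
    # Breakpoints splitting a standard normal into equiprobable regions
    breakpoints = norm.ppf(np.linspace(0, 1, alphabet_size + 1)[1:-1])
    symbols = np.searchsorted(breakpoints, paa)
    return "".join(chr(ord("a") + s) for s in symbols)

# e.g. sax_transform(np.sin(np.linspace(0, 6, 100))) yields an 8-letter word
```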

FedDRL: A Trustworthy Federated Learning Model Fusion Method Based on Staged Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.13716
  • repo_url: None
  • paper_authors: Leiming Chen, Cihao Dong, Sibo Qiao, Ziling Huang, Kai Wang, Yuming Nie, Zhaoxiang Hou, Cheewei Tan
  • for: 解决传统 federated learning 中客户端模型质量不均匀和恶意上传模型的问题
  • methods: 使用强化学习进行两阶段的模型融合:第一阶段过滤恶意模型并选择可信的客户端模型参与融合;第二阶段自适应地调整可信客户端模型的权重,聚合出最优的全局模型
  • results: 在五种模型融合场景中,我们的算法在保持准确性的同时,可靠性高于基线算法
    Abstract Traditional federated learning uses the number of samples to calculate the weights of each client model and uses this fixed weight value to fuse the global model. However, in practical scenarios, each client's device and data heterogeneity leads to differences in the quality of each client's model. Thus the contribution to the global model is not wholly determined by the sample size. In addition, if clients intentionally upload low-quality or malicious models, using these models for aggregation will lead to a severe decrease in global model accuracy. Traditional federated learning algorithms do not address these issues. To solve this problem, we propose FedDRL, a model fusion approach using reinforcement learning based on a two-staged approach. In the first stage, our method filters out malicious models and selects trusted client models to participate in the model fusion. In the second stage, the FedDRL algorithm adaptively adjusts the weights of the trusted client models and aggregates the optimal global model. We also define five model fusion scenarios and compare our method with two baseline algorithms in those scenarios. The experimental results show that our algorithm has higher reliability than other algorithms while maintaining accuracy.
    摘要 传统的联邦学习按样本数量计算各客户端模型的权重,并使用这一固定权重值融合全局模型。然而,在实际场景中,各客户端的设备和数据异构性会导致各客户端模型质量存在差异,因此其对全局模型的贡献并不完全由样本数量决定。此外,如果客户端故意上传低质量或恶意模型,使用这些模型进行聚合会导致全局模型准确率严重下降。传统的联邦学习算法没有解决这些问题。为此,我们提出FedDRL,一种基于强化学习的两阶段模型融合方法。在第一阶段,我们的方法过滤掉恶意模型,并选择可信的客户端模型参与模型融合;在第二阶段,FedDRL算法自适应地调整可信客户端模型的权重,并聚合出最优的全局模型。我们还定义了五种模型融合场景,并在这些场景中与两种基线算法进行比较。实验结果表明,我们的算法在保持准确性的同时,可靠性高于其他算法。
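
The two-stage fusion can be sketched as follows. In FedDRL the aggregation weights come from a staged reinforcement-learning policy; in this simplified sketch a softmax over per-client validation scores stands in for that policy, and the trust threshold is an assumed hyperparameter.

```python
import numpy as np

def fuse_models(client_weights, client_scores, trust_threshold=0.5):
    """Sketch of two-stage fusion: (1) filter out low-quality or malicious
    clients by a validation score, (2) aggregate the survivors with adaptive
    weights. Each element of client_weights is a dict: layer name -> array."""
    trusted = [(w, s) for w, s in zip(client_weights, client_scores)
               if s >= trust_threshold]                 # stage 1: filtering
    if not trusted:
        raise ValueError("no trusted clients passed the filter")
    scores = np.array([s for _, s in trusted])
    alphas = np.exp(scores) / np.exp(scores).sum()      # stand-in for RL-chosen weights
    global_model = {}
    for key in trusted[0][0]:                           # stage 2: weighted aggregation
        global_model[key] = sum(a * w[key] for (w, _), a in zip(trusted, alphas))
    return global_model
```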

Towards an AI Accountability Policy

  • paper_url: http://arxiv.org/abs/2307.13658
  • repo_url: None
  • paper_authors: Przemyslaw Grabowicz, Nicholas Perello, Yair Zick
  • for: 这份白皮书是回应美国国家电信和信息管理局(NTIA)发布的“人工智能责任政策请求意见”(AI Accountability Policy Request for Comments)。
  • methods: 该白皮书提供了一套相互连接的建议,用于制定人工智能责任政策。
  • results: 该白皮书的建议旨在确保人工智能技术的应用符合道德和法律要求,保障公民的权益和隐私,并促进人工智能技术的负责任和可靠性。
    Abstract This white paper is a response to the "AI Accountability Policy Request for Comments" by the National Telecommunications and Information Administration of the United States. The question numbers for which comments were requested are provided in superscripts at the end of key sentences answering the respective questions. The white paper offers a set of interconnected recommendations for an AI accountability policy.
    摘要 这份白皮书是对美国国家电信和信息管理局(NTIA)发布的“人工智能责任政策公开征求意见”的回应。文中在回答相应问题的关键句末尾以上标形式标注了所征求意见的问题编号。本白皮书为人工智能责任政策提供了一组相互关联的建议。

GNN4FR: A Lossless GNN-based Federated Recommendation Framework

  • paper_url: http://arxiv.org/abs/2308.01197
  • repo_url: None
  • paper_authors: Guowei Wu, Weike Pan, Zhong Ming
  • for: 提出一种基于图神经网络(GNN)的隐私保护联邦推荐框架,可在不泄露每个用户私有交互数据的情况下训练全局图。
  • methods: 使用 LightGCN 实例化该框架,并证明其与非联合版本等价。
  • results: 实现了全图训练,保持完整的高阶结构信息,使训练过程与非联合版本等价。
    Abstract Graph neural networks (GNNs) have gained wide popularity in recommender systems due to their capability to capture higher-order structure information among the nodes of users and items. However, these methods need to collect personal interaction data between a user and the corresponding items and then model them in a central server, which would break the privacy laws such as GDPR. So far, no existing work can construct a global graph without leaking each user's private interaction data (i.e., his or her subgraph). In this paper, we are the first to design a novel lossless federated recommendation framework based on GNN, which achieves full-graph training with complete high-order structure information, enabling the training process to be equivalent to the corresponding un-federated counterpart. In addition, we use LightGCN to instantiate an example of our framework and show its equivalence.
    摘要 图神经网络(GNN)因能捕捉用户与物品节点之间的高阶结构信息,在推荐系统中得到了广泛应用。然而,这些方法需要收集用户与相应物品之间的个人交互数据,并在中央服务器上建模,这会违反GDPR等隐私法规。迄今为止,尚无现有工作能够在不泄露每个用户的私有交互数据(即其子图)的情况下构建全局图。在本文中,我们首次设计了一种新的基于GNN的无损联邦推荐框架,可在保留完整高阶结构信息的前提下实现全图训练,使训练过程与相应的非联邦版本等价。此外,我们使用LightGCN实例化该框架的一个示例,并证明其等价性。
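
For reference, the non-federated LightGCN computation that the framework is claimed to reproduce losslessly is easy to sketch: embeddings are propagated through the symmetrically normalized user-item graph and averaged across layers. The federation machinery that hides each user's subgraph is omitted here, and the layer count is an illustrative default.

```python
import numpy as np

def lightgcn_embeddings(adj, e0, num_layers=3):
    """Sketch of LightGCN propagation. `adj` is the (users+items) x (users+items)
    binary interaction matrix; `e0` the initial embedding matrix (nodes x dim)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg, dtype=float)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    norm_adj = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]  # D^-1/2 A D^-1/2
    layers = [e0]
    for _ in range(num_layers):
        layers.append(norm_adj @ layers[-1])   # smooth embeddings over the graph
    return np.mean(layers, axis=0)             # final embedding: mean over layers
```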

Safety Margins for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.13642
  • repo_url: None
  • paper_authors: Alexander Grushin, Walt Woods, Alvaro Velasquez, Simon Khan
  • for: This paper is written for autonomous controllers in freight transportation applications, to help identify when unsafe situations are about to occur and draw timely human oversight.
  • methods: The paper uses a definition of true criticality as the mean reduction in reward given some number of random actions, and computes proxy criticality metrics that can be compared to the true criticality in real-time.
  • results: The paper demonstrates how to leverage these proxy metrics to generate safety margins, which directly tie the consequences of potentially incorrect actions to an anticipated loss in overall performance. The approach is evaluated on learned policies from APE-X and A3C within an Atari environment, and shows how safety margins decrease as agents approach failure states.
    Abstract Any autonomous controller will be unsafe in some situations. The ability to quantitatively identify when these unsafe situations are about to occur is crucial for drawing timely human oversight in, e.g., freight transportation applications. In this work, we demonstrate that the true criticality of an agent's situation can be robustly defined as the mean reduction in reward given some number of random actions. Proxy criticality metrics that are computable in real-time (i.e., without actually simulating the effects of random actions) can be compared to the true criticality, and we show how to leverage these proxy metrics to generate safety margins, which directly tie the consequences of potentially incorrect actions to an anticipated loss in overall performance. We evaluate our approach on learned policies from APE-X and A3C within an Atari environment, and demonstrate how safety margins decrease as agents approach failure states. The integration of safety margins into programs for monitoring deployed agents allows for the real-time identification of potentially catastrophic situations.
    摘要 任何自主控制器都会在某些情况下不安全。能够定量地识别这些不安全情况何时即将发生,对于在货运等应用中及时引入人工监督至关重要。在这项工作中,我们证明智能体所处情形的真实临界性(criticality)可以被稳健地定义为:执行若干随机动作后奖励的平均下降量。可实时计算(即无需实际模拟随机动作的影响)的代理临界性指标可以与真实临界性进行比较,我们展示了如何利用这些代理指标生成安全裕度(safety margin),将潜在错误动作的后果与整体性能的预期损失直接联系起来。我们在Atari环境中对APE-X和A3C学习得到的策略评估了该方法,并展示了安全裕度如何随着智能体接近失败状态而减小。将安全裕度集成到已部署智能体的监控程序中,可以实时识别潜在的灾难性情况。
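
The definition of true criticality translates directly into a Monte-Carlo estimate. The sketch below assumes a classic Gym-style step API and an `env_factory` that can reset the environment to a given state (an assumption for illustration; in practice this requires a resettable simulator), and the rollout counts are arbitrary.

```python
import numpy as np

def true_criticality(env_factory, policy, state, n_random=3, n_rollouts=100):
    """Criticality of `state` = expected reward lost by taking `n_random`
    random actions there before handing control back to the policy."""
    def rollout(random_steps):
        env = env_factory(state)        # fresh environment reset to `state`
        obs, total, done = state, 0.0, False
        for t in range(10_000):
            if done:
                break
            action = env.action_space.sample() if t < random_steps else policy(obs)
            obs, reward, done, _ = env.step(action)
            total += reward
        return total

    baseline = np.mean([rollout(0) for _ in range(n_rollouts)])
    perturbed = np.mean([rollout(n_random) for _ in range(n_rollouts)])
    return baseline - perturbed         # mean reduction in reward
```

A cheap proxy metric can then be calibrated against this quantity offline, yielding safety margins that are inexpensive to evaluate on a deployed agent in real time.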

DBGSA: A Novel Data Adaptive Bregman Clustering Algorithm

  • paper_url: http://arxiv.org/abs/2307.14375
  • repo_url: None
  • paper_authors: Ying Xiao, Hou-biao Li, Yu-pu Zhang
  • for: 提高各类聚类算法在非凸数据集上的精度
  • methods: 提出数据驱动的Bregman散度参数优化聚类算法(DBGSA),结合万有引力算法(UGA)使数据集中的相似点相互靠近;构建引力系数方程,使影响因子随迭代进行逐渐减小;并引入Bregman散度广义幂平均信息损失最小化来识别簇中心
  • results: 在四个模拟数据集和六个真实数据集上进行了大量实验,结果显示DBGSA相比增强聚类算法、改进数据集等类似方法,平均将各类聚类算法的精度提高63.8%。此外,建立三维网格搜索比较阈值条件下不同参数取值的效果,发现模型给出的参数组是最优的,证明了DBGSA的高精度和鲁棒性
    Abstract With the development of Big data technology, data analysis has become increasingly important. Traditional clustering algorithms such as K-means are highly sensitive to the initial centroid selection and perform poorly on non-convex datasets. In this paper, we address these problems by proposing a data-driven Bregman divergence parameter optimization clustering algorithm (DBGSA), which combines the Universal Gravitational Algorithm to bring similar points closer in the dataset. We construct a gravitational coefficient equation with a special property that gradually reduces the influence factor as the iteration progresses. Furthermore, we introduce the Bregman divergence generalized power mean information loss minimization to identify cluster centers and build a hyperparameter identification optimization model, which effectively solves the problems of manual adjustment and uncertainty in the improved dataset. Extensive experiments are conducted on four simulated datasets and six real datasets. The results demonstrate that DBGSA significantly improves the accuracy of various clustering algorithms by an average of 63.8\% compared to other similar approaches like enhanced clustering algorithms and improved datasets. Additionally, a three-dimensional grid search was established to compare the effects of different parameter values within threshold conditions, and it was discovered the parameter set provided by our model is optimal. This finding provides strong evidence of the high accuracy and robustness of the algorithm.
    摘要 随着大数据技术的发展,数据分析变得日益重要。K-means等传统聚类算法对初始中心的选择高度敏感,且在非凸数据集上表现不佳。在本文中,我们针对这些问题提出一种数据驱动的Bregman散度参数优化聚类算法(DBGSA),该算法结合万有引力算法使数据集中的相似点相互靠近。我们构建了一个引力系数方程,其特殊性质是影响因子随迭代进行而逐渐减小。此外,我们引入Bregman散度广义幂平均信息损失最小化来识别簇中心,并建立超参数识别优化模型,有效解决了改进数据集中手动调参和不确定性的问题。我们在四个模拟数据集和六个真实数据集上进行了大量实验。结果表明,与增强聚类算法和改进数据集等其他类似方法相比,DBGSA将各种聚类算法的精度平均提高了63.8%。此外,我们建立了三维网格搜索,比较阈值条件下不同参数取值的效果,发现我们的模型给出的参数组是最优的。这一发现有力地证明了该算法的高精度和鲁棒性。
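
The Bregman-divergence view of clustering that DBGSA builds on can be sketched compactly: for any Bregman divergence, the assignment step picks the centroid with minimal divergence while the update step remains the cluster mean. The generalized-KL choice, the iteration count, and the omission of DBGSA's gravitational point-displacement and parameter-optimization steps are simplifications for illustration.

```python
import numpy as np

def kl_divergence(x, c):
    """Generalized KL divergence, one possible Bregman divergence (positive data only)."""
    return np.sum(x * np.log(x / c) - x + c, axis=-1)

def bregman_kmeans(X, k, divergence=kl_divergence, iters=50, seed=0):
    """Bregman hard clustering: assign each point to the centroid with minimal
    divergence; centroids stay the cluster means for any Bregman divergence."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        d = np.stack([divergence(X, c) for c in centroids], axis=1)  # (n, k)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```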

Turning hazardous volatile matter compounds into fuel by catalytic steam reforming: An evolutionary machine learning approach

  • paper_url: http://arxiv.org/abs/2308.05750
  • repo_url: None
  • paper_authors: Alireza Shafizadeh, Hossein Shahbeik, Mohammad Hossein Nadian, Vijai Kumar Gupta, Abdul-Sattar Nizami, Su Shiung Lam, Wanxi Peng, Junting Pan, Meisam Tabatabaei, Mortaza Aghbashlo
  • for: 本研究旨在建立一个基于机器学习的研究框架,用于建模、理解并优化挥发性物质催化水蒸气重整过程中的催化剂特性与反应条件。
  • methods: 以甲苯催化水蒸气重整为案例,利用化学/织构分析(如X射线衍射分析)获取机器学习模型的输入特征,并依据文献构建覆盖多种催化剂特性和反应条件的数据库;采用六种机器学习模型建模,并使用粒子群优化算法优化反应条件。
  • results: 集成机器学习模型对甲苯转化率和产物分布给出了最佳预测性能($R^2 > 0.976$);在637.44-725.62 ℃的温度范围内、水碳摩尔比为5.81-7.15、催化剂BET比表面积为476.03-638.55 m²/g时,获得了最优的焦油转化率(高于77.2%)。
    Abstract Chemical and biomass processing systems release volatile matter compounds into the environment daily. Catalytic reforming can convert these compounds into valuable fuels, but developing stable and efficient catalysts is challenging. Machine learning can handle complex relationships in big data and optimize reaction conditions, making it an effective solution for addressing the mentioned issues. This study is the first to develop a machine-learning-based research framework for modeling, understanding, and optimizing the catalytic steam reforming of volatile matter compounds. Toluene catalytic steam reforming is used as a case study to show how chemical/textural analyses (e.g., X-ray diffraction analysis) can be used to obtain input features for machine learning models. Literature is used to compile a database covering a variety of catalyst characteristics and reaction conditions. The process is thoroughly analyzed, mechanistically discussed, modeled by six machine learning models, and optimized using the particle swarm optimization algorithm. Ensemble machine learning provides the best prediction performance (R2 > 0.976) for toluene conversion and product distribution. The optimal tar conversion (higher than 77.2%) is obtained at temperatures between 637.44 and 725.62 °C, with a steam-to-carbon molar ratio of 5.81-7.15 and a catalyst BET surface area 476.03-638.55 m2/g. The feature importance analysis satisfactorily reveals the effects of input descriptors on model prediction. Operating conditions (50.9%) and catalyst properties (49.1%) are equally important in modeling. The developed framework can expedite the search for optimal catalyst characteristics and reaction conditions, not only for catalytic chemical processing but also for related research areas.
    摘要 化工与生物质处理系统每天都向环境中释放挥发性物质。催化重整可以将这些化合物转化为有价值的燃料,但开发稳定高效的催化剂颇具挑战。机器学习能够处理大数据中的复杂关系并优化反应条件,是解决上述问题的有效手段。本研究首次开发了基于机器学习的研究框架,用于建模、理解和优化挥发性物质的催化水蒸气重整过程。以甲苯催化水蒸气重整为案例,展示了如何利用化学/织构分析(如X射线衍射分析)获取机器学习模型的输入特征,并依据文献构建了覆盖多种催化剂特性和反应条件的数据库。我们对该过程进行了深入分析和机理讨论,用六种机器学习模型建模,并使用粒子群优化算法进行优化。集成机器学习对甲苯转化率和产物分布给出了最佳预测性能($R^2 > 0.976$)。在637.44至725.62 ℃的温度范围内、水碳摩尔比为5.81-7.15、催化剂BET比表面积为476.03-638.55 m²/g时,获得了最优的焦油转化率(高于77.2%)。特征重要性分析较好地揭示了输入描述符对模型预测的影响:操作条件(50.9%)与催化剂特性(49.1%)在建模中同等重要。所开发的框架不仅适用于催化化学处理,也适用于相关研究领域,可加快寻找最优催化剂特性和反应条件的进程。
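
Once a surrogate model maps catalyst properties and operating conditions to predicted conversion, searching for the optimum is a generic global-optimization problem. Below is a minimal particle swarm sketch over box-bounded conditions; `predict` is an assumed trained surrogate, and the inertia/acceleration coefficients are common textbook defaults rather than the paper's settings.

```python
import numpy as np

def pso_maximize(predict, bounds, n_particles=30, iters=100, seed=0):
    """Minimal PSO. `predict` maps an (n, d) array of conditions to predicted
    conversions; `bounds` is a (d, 2) array of [low, high] per condition."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    x = rng.uniform(lo, hi, size=(n_particles, len(lo)))
    v = np.zeros_like(x)
    p_best, p_val = x.copy(), predict(x)            # per-particle bests
    g_best = p_best[p_val.argmax()].copy()          # swarm best
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (p_best - x) + 1.5 * r2 * (g_best - x)
        x = np.clip(x + v, lo, hi)                  # keep conditions in bounds
        val = predict(x)
        improved = val > p_val
        p_best[improved], p_val[improved] = x[improved], val[improved]
        g_best = p_best[p_val.argmax()].copy()
    return g_best, p_val.max()
```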

Scaling machine learning-based chemical plant simulation: A method for fine-tuning a model to induce stable fixed points

  • paper_url: http://arxiv.org/abs/2307.13621
  • repo_url: None
  • paper_authors: Malte Esders, Gimmy Alex Fernandez Ramirez, Michael Gastegger, Satya Swarup Samal
  • for: 这篇论文研究如何将机器学习模型直接拟合到化工厂的传感器数据上,以替代理想化的第一性原理模型。
  • methods: 论文采用一种结构化方法:厂内每个单元由一个机器学习模型表示;模型拟合到数据后,再连接成一个类似流程图(flowsheet)的有向图。
  • results: 对小型化工厂,这种方法效果良好;但对大型化工厂,流程图中大量且嵌套的回路带来的复杂动态会导致回路求解器不稳定。作者深入分析了这一问题,并提出一种对机器学习模型进行微调的方法,使回路求解重新变得稳健。
    Abstract Idealized first-principles models of chemical plants can be inaccurate. An alternative is to fit a Machine Learning (ML) model directly to plant sensor data. We use a structured approach: Each unit within the plant gets represented by one ML model. After fitting the models to the data, the models are connected into a flowsheet-like directed graph. We find that for smaller plants, this approach works well, but for larger plants, the complex dynamics arising from large and nested cycles in the flowsheet lead to instabilities in the cycle solver. We analyze this problem in depth and show that it is not merely a specialized concern but rather a more pervasive challenge that will likely occur whenever ML is applied to larger plants. To address this problem, we present a way to fine-tune ML models such that solving cycles with the usual methods becomes robust again.
    摘要 理想化的化工厂第一性原理模型可能不够准确。一种替代方案是将机器学习(ML)模型直接拟合到工厂传感器数据上。我们采用一种结构化方法:厂内每个单元由一个ML模型表示。在将模型拟合到数据后,这些模型被连接成一个类似流程图的有向图。我们发现,对于较小的工厂,这种方法效果良好;但对于较大的工厂,流程图中大量且嵌套的回路所产生的复杂动态会导致回路求解器不稳定。我们深入分析了这一问题,并表明它并非个别的特殊问题,而是一个更普遍的挑战:只要将ML应用于较大的工厂,就很可能出现。为解决该问题,我们提出一种对ML模型进行微调的方法,使得用常规方法求解回路重新变得稳健。
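
The instability shows up when the flowsheet's recycle loops are closed by successive substitution. A damped fixed-point sketch of such a cycle solver is below; the paper's contribution is fine-tuning the unit models so that these iterations converge, whereas here damping is shown only as the standard stabilizer, with `units` as assumed stream-to-stream callables.

```python
import numpy as np

def solve_cycle(units, x0, damping=0.5, tol=1e-8, max_iter=500):
    """Damped successive substitution: relax the recycle stream x toward the
    fixed point x* = F(x*), where F composes the unit models around the cycle.
    Each element of `units` is a callable stream -> stream (e.g., a fitted
    ML model for one plant unit)."""
    x = x0.copy()
    for _ in range(max_iter):
        y = x
        for unit in units:                        # compose models around the loop
            y = unit(y)
        x_new = (1 - damping) * x + damping * y   # damping stabilizes the iteration
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("cycle did not converge; consider fine-tuning the models")
```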

AI and ethics in insurance: a new solution to mitigate proxy discrimination in risk modeling

  • paper_url: http://arxiv.org/abs/2307.13616
  • repo_url: None
  • paper_authors: Marguerite Sauce, Antoine Chancel, Antoine Ly
  • For: The paper aims to address indirect discrimination in insurance pricing and risk selection practices, using a mathematical approach based on linear algebra to reduce the risks of discrimination.
  • Methods: The paper proposes an innovative method, not previously discussed in the literature, which uses concepts from linear algebra to reduce the risks of indirect discrimination in insurance.
  • Results: The paper demonstrates the effectiveness of the proposed method in a concrete case of risk selection in life insurance, showing its simplicity of use and promising performance.

Here is the same information in Simplified Chinese:

  • for: 本研究旨在借助基于线性代数的数学方法,应对保险定价和风险选择实践中的间接歧视问题,降低歧视风险。
  • methods: 本研究提出了一种文献中尚未讨论过的创新方法,利用线性代数的数学概念来降低保险中间接歧视的风险。
  • results: 研究在寿险风险选择的具体案例中验证了所提方法的有效性,显示其使用简单且性能良好。
    Abstract The development of Machine Learning is experiencing growing interest from the general public, and in recent years there have been numerous press articles questioning its objectivity: racism, sexism, and so on. Driven by the growing attention of regulators to the ethical use of data in insurance, the actuarial community must rethink pricing and risk selection practices for fairer insurance. Equity is a philosophical concept with many different definitions across jurisdictions that influence each other without currently reaching consensus. In Europe, the Charter of Fundamental Rights defines guidelines on discrimination, and the use of sensitive personal data in algorithms is regulated. While the simple removal of protected variables prevents any so-called 'direct' discrimination, models are still able to 'indirectly' discriminate between individuals thanks to latent interactions between variables, which bring better performance (and therefore a better quantification of risk, segmentation of prices, and so on). After introducing the key concepts related to discrimination, we illustrate the complexity of quantifying them. We then propose an innovative method, not yet met in the literature, to reduce the risks of indirect discrimination thanks to mathematical concepts of linear algebra. This technique is illustrated in a concrete case of risk selection in life insurance, demonstrating its simplicity of use and its promising performance.
    摘要 机器学习的发展正受到公众日益增长的关注,近年来已有大量报道质疑其客观性:种族歧视、性别歧视等等。在监管机构对保险业数据伦理使用日益关注的推动下,精算界必须重新思考定价与风险选择实践,以实现更公平的保险。公平(equity)是一个哲学概念,在各个司法管辖区有许多相互影响却尚未达成共识的不同定义。在欧洲,《基本权利宪章》给出了关于歧视的指导原则,并对算法中敏感个人数据的使用进行了监管。即便简单地剔除受保护变量可以防止所谓的"直接"歧视,模型仍能借助变量之间的潜在交互"间接"地歧视个体,而这些交互往往带来更好的性能(从而带来更好的风险量化、价格细分等)。在介绍了与歧视相关的关键概念之后,我们说明了量化歧视的复杂性。随后,我们提出一种文献中尚未出现的创新方法,借助线性代数的数学概念来降低间接歧视的风险。我们在寿险风险选择的具体案例中演示了该技术,展示了其使用的简便性和良好的性能。
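
One linear-algebra construction in this spirit is to residualize the features against the protected variables, so that the pricing model never sees their linear footprint. The sketch below is a generic orthogonal-projection illustration of the idea, not necessarily the paper's exact method.

```python
import numpy as np

def orthogonalize_features(X, S):
    """Project the feature matrix X onto the orthogonal complement of the span
    of sensitive variables S (column-wise least squares), so the returned
    features carry no linear information about S."""
    S1 = np.column_stack([np.ones(len(S)), S])       # include an intercept
    coef, *_ = np.linalg.lstsq(S1, X, rcond=None)    # regress each column of X on S
    return X - S1 @ coef                             # residuals orthogonal to span(S)
```

The pricing or risk-selection model is then trained on `orthogonalize_features(X, S)` instead of `X`; by construction, the residualized columns are uncorrelated with the sensitive variables.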

Team Intro to AI team8 at CoachAI Badminton Challenge 2023: Advanced ShuttleNet for Shot Predictions

  • paper_url: http://arxiv.org/abs/2307.13715
  • repo_url: None
  • paper_authors: Shih-Hong Chen, Pin-Hsuan Chou, Yong-Fu Liu, Chien-An Han
  • for: 通过利用过去的击球信息,提高现有框架ShuttleNet在预测羽毛球击球类型与落点位置上的性能。
  • methods: 利用历史击球数据改进ShuttleNet框架的预测性能。
  • results: 在IJCAI 2023 CoachAI Badminton Challenge中取得了显著优于基线的成绩,最终获得比赛第一名,并公开了代码。
    Abstract In this paper, our objective is to improve the performance of the existing framework ShuttleNet in predicting badminton shot types and locations by leveraging past strokes. We participated in the CoachAI Badminton Challenge at IJCAI 2023 and achieved significantly better results compared to the baseline. Ultimately, our team achieved the first position in the competition and we made our code available.
    摘要 在这篇论文中,我们的目标是通过利用过去的击球信息,提高现有框架ShuttleNet在预测羽毛球击球类型和落点位置上的性能。我们参加了IJCAI 2023的CoachAI Badminton Challenge,取得了显著优于基线的结果。最终,我们的团队获得了比赛第一名,并公开了我们的代码。

Forecasting, capturing and activation of carbon-dioxide (CO$_2$): Integration of Time Series Analysis, Machine Learning, and Material Design

  • paper_url: http://arxiv.org/abs/2307.14374
  • repo_url: None
  • paper_authors: Suchetana Sadhukhan, Vivek Kumar Yadav
  • for: 这项研究对欧洲国家(欧盟27国和英国、意大利、德国、西班牙)以及印度分行业的每日碳排放进行时间序列分析,数据覆盖2019年1月至2023年2月。
  • methods: 该研究使用Carbon Monitor研究计划提供的近实时活动数据,并剔除2020年的数据,以避免COVID-19疫情对数据的干扰;随后使用主成分分析(PCA)确定排放的主要贡献者,并采用7天移动平均数据进行进一步分析,以提高预测质量。
  • results: 研究发现,电力、工业和地面交通三个部门解释了总方差的显著部分。在7天移动平均数据上使用长短期记忆(LSTM)模型可以有效预测排放,为政策决策、减排策略和应对气候变化提供参考。模型在训练阶段保证了稳定性和收敛性,并在测试阶段表现出高效率,各国家和部门的$R^2$值介于0.8242与0.995之间。此外,研究还提出使用钪和硼/铝基薄膜作为捕集CO$_2$的高效材料(结合能在-3.0至-3.5 eV之间),其亲和力超过石墨烯和氮化硼薄片。
    Abstract This study provides a comprehensive time series analysis of daily industry-specific, country-wise CO$_2$ emissions from January 2019 to February 2023. The research focuses on the Power, Industry, Ground Transport, Domestic Aviation, and International Aviation sectors in European countries (EU27 & UK, Italy, Germany, Spain) and India, utilizing near-real-time activity data from the Carbon Monitor research initiative. To identify regular emission patterns, the data from the year 2020 is excluded due to the disruptive effects caused by the COVID-19 pandemic. The study then performs a principal component analysis (PCA) to determine the key contributors to CO$_2$ emissions. The analysis reveals that the Power, Industry, and Ground Transport sectors account for a significant portion of the variance in the dataset. A 7-day moving averaged dataset is employed for further analysis to facilitate robust predictions. This dataset captures both short-term and long-term trends and enhances the quality of the data for prediction purposes. The study utilizes Long Short-Term Memory (LSTM) models on the 7-day moving averaged dataset to effectively predict emissions and provide insights for policy decisions, mitigation strategies, and climate change efforts. During the training phase, the stability and convergence of the LSTM models are ensured, which guarantees their reliability in the testing phase. The evaluation of the loss function indicates this reliability. The model achieves high efficiency, as demonstrated by $R^2$ values ranging from 0.8242 to 0.995 for various countries and sectors. Furthermore, there is a proposal for utilizing scandium and boron/aluminium-based thin films as exceptionally efficient materials for capturing CO$_2$ (with a binding energy range from -3.0 to -3.5 eV). These materials are shown to surpass the affinity of graphene and boron nitride sheets in this regard.
    摘要 本研究对2019年1月至2023年2月期间分行业、分国家的每日CO$_2$排放进行了全面的时间序列分析。研究聚焦于欧洲国家(欧盟27国和英国、意大利、德国、西班牙)及印度的电力、工业、地面交通、国内航空和国际航空部门,使用来自Carbon Monitor研究计划的近实时活动数据。为识别常规排放模式,研究剔除了2020年的数据,以避免COVID-19疫情造成的干扰。随后,研究采用主成分分析(PCA)确定CO$_2$排放的主要贡献者。分析表明,电力、工业和地面交通部门解释了数据集中方差的很大一部分。为便于稳健预测,后续分析采用7天移动平均数据集,它同时捕捉短期与长期趋势,提高了用于预测的数据质量。研究在7天移动平均数据集上使用长短期记忆(LSTM)模型有效预测排放,为政策决策、减排策略和应对气候变化提供参考。训练阶段确保了LSTM模型的稳定性和收敛性,从而保证了其在测试阶段的可靠性,损失函数的评估也印证了这一点。模型效率很高,各国家和部门的$R^2$值介于0.8242与0.995之间。此外,研究提出将钪和硼/铝基薄膜用作捕集CO$_2$的高效材料(结合能在-3.0至-3.5 eV之间),其亲和力在这方面超过石墨烯和氮化硼薄片。
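
As an illustration of the smoothing-plus-LSTM pipeline, here is a compact Keras sketch on a synthetic daily series. The lookback window, layer width, and epoch count are illustrative assumptions, not the paper's reported configuration.

```python
import numpy as np
import tensorflow as tf

def moving_average(x, window=7):
    """7-day moving average, as used to smooth the daily emission series."""
    return np.convolve(x, np.ones(window) / window, mode="valid")

def make_windows(series, lookback=30):
    """Slice a 1-D series into (samples, lookback, 1) inputs and next-day targets."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    return X[..., None], series[lookback:]

# Synthetic stand-in for a daily emission series
daily = np.sin(np.linspace(0, 60, 1500)) + 0.1 * np.random.randn(1500)
smoothed = moving_average(daily, window=7)
X, y = make_windows(smoothed, lookback=30)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(30, 1)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2, verbose=0)
```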