cs.LG - 2023-12-01

Spatiotemporal Transformer for Imputing Sparse Data: A Deep Learning Approach

  • paper_url: http://arxiv.org/abs/2312.00963
  • repo_url: None
  • paper_authors: Kehui Yao, Jingyi Huang, Jun Zhu
  • for: This paper aims to address the challenge of missing values in sparse spatiotemporal datasets, particularly focusing on soil moisture data.
  • methods: The ST-Transformer model employs multiple spatiotemporal attention layers to capture complex spatiotemporal correlations in the data and can integrate additional spatiotemporal covariates during the imputation process, enhancing its accuracy.
  • results: Applied to SMAP 1km soil moisture data over a 36 x 36 km grid in Texas, the model demonstrates superior accuracy compared to well-known imputation methods, and simulation studies on other datasets highlight its broader applicability to various spatiotemporal imputation tasks.
    Abstract Effective management of environmental resources and agricultural sustainability heavily depends on accurate soil moisture data. However, datasets like the SMAP/Sentinel-1 soil moisture product often contain missing values across their spatiotemporal grid, which poses a significant challenge. This paper introduces a novel Spatiotemporal Transformer model (ST-Transformer) specifically designed to address the issue of missing values in sparse spatiotemporal datasets, particularly focusing on soil moisture data. The ST-Transformer employs multiple spatiotemporal attention layers to capture the complex spatiotemporal correlations in the data and can integrate additional spatiotemporal covariates during the imputation process, thereby enhancing its accuracy. The model is trained using a self-supervised approach, enabling it to autonomously predict missing values from observed data points. Our model's efficacy is demonstrated through its application to the SMAP 1km soil moisture data over a 36 x 36 km grid in Texas. It showcases superior accuracy compared to well-known imputation methods. Additionally, our simulation studies on other datasets highlight the model's broader applicability in various spatiotemporal imputation tasks.
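
To make the self-supervised masking objective described above concrete, here is a minimal sketch of masked-reconstruction training with a transformer encoder. The grid size, masking ratio, and model dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Toy spatiotemporal batch: (batch, time, sites) of soil-moisture values in [0, 1].
B, T, S = 8, 30, 20
x = torch.rand(B, T, S)

# Hide a random subset of the observed entries and train the model to recover them.
mask = torch.rand_like(x) < 0.25                    # True = entry hidden from the encoder
x_in = x.masked_fill(mask, 0.0)
inp = torch.stack([x_in, mask.float()], dim=-1)     # value channel + mask-indicator channel

embed = nn.Linear(2, 32)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True), num_layers=2
)
head = nn.Linear(32, 1)

tokens = embed(inp).reshape(B, T * S, 32)           # flatten space-time into one token sequence
pred = head(encoder(tokens)).reshape(B, T, S)

loss = ((pred - x) ** 2 * mask).sum() / mask.sum()  # reconstruct only the masked entries
loss.backward()
```

In the same spirit, additional spatiotemporal covariates would enter as extra channels in the stacked input tensor.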

A Theory of Unimodal Bias in Multimodal Learning

  • paper_url: http://arxiv.org/abs/2312.00935
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Yedi Zhang, Peter E. Latham, Andrew Saxe
  • for: This paper aims to understand the phenomenon of unimodal bias in deep multimodal neural networks during joint training.
  • methods: The authors use deep multimodal linear networks and analyze the duration of the unimodal phase in learning as a function of layer fusion, dataset statistics, and initialization.
  • results: The authors find that deeper layer fusion leads to longer unimodal phases, which can result in permanent unimodal bias and a generalization deficit in overparametrized networks. Additionally, they show that the modality learned first is not necessarily the most important modality for the output. These results apply to ReLU networks in certain settings.
    Abstract Using multiple input streams simultaneously in training multimodal neural networks is intuitively advantageous, but practically challenging. A key challenge is unimodal bias, where a network overly relies on one modality and ignores others during joint training. While unimodal bias is well-documented empirically, our theoretical understanding of how architecture and data statistics influence this bias remains incomplete. Here we develop a theory of unimodal bias with deep multimodal linear networks. We calculate the duration of the unimodal phase in learning as a function of the depth at which modalities are fused within the network, dataset statistics, and initialization. We find that the deeper the layer at which fusion occurs, the longer the unimodal phase. A long unimodal phase can lead to a generalization deficit and permanent unimodal bias in the overparametrized regime. In addition, our theory reveals the modality learned first is not necessarily the modality that contributes more to the output. Our results, derived for multimodal linear networks, extend to ReLU networks in certain settings. Taken together, this work illuminates pathologies of multimodal learning under joint training, showing that late and intermediate fusion architectures can give rise to long unimodal phases and permanent unimodal bias.

PACE: A Program Analysis Framework for Continuous Performance Prediction

  • paper_url: http://arxiv.org/abs/2312.00918
  • repo_url: https://github.com/padlab/pace
  • paper_authors: Chidera Biringa, Gokhan Kul
  • for: This paper provides a program analysis framework that gives continuous feedback on the performance impact of pending code updates before they are merged.
  • methods: The framework derives performance microbenchmarks from the execution times of functional test cases under a code update, maps these microbenchmarks to code stylometry features, and feeds them to predictors for performance prediction.
  • results: Experiments show significant performance-prediction capability, outperforming the current state of the art by 75% on neural-represented code stylometry features.
    Abstract Software development teams establish elaborate continuous integration pipelines containing automated test cases to accelerate the development process of software. Automated tests help to verify the correctness of code modifications decreasing the response time to changing requirements. However, when the software teams do not track the performance impact of pending modifications, they may need to spend considerable time refactoring existing code. This paper presents PACE, a program analysis framework that provides continuous feedback on the performance impact of pending code updates. We design performance microbenchmarks by mapping the execution time of functional test cases given a code update. We map microbenchmarks to code stylometry features and feed them to predictors for performance predictions. Our experiments achieved significant performance in predicting code performance, outperforming current state-of-the-art by 75% on neural-represented code stylometry features.
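
As a rough illustration of the pipeline described above (microbenchmark runtimes regressed on code stylometry features), the sketch below fits a generic regressor on synthetic data; the feature set, model choice, and data are placeholders, not PACE's actual implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Hypothetical data: one row per pending code update.
# `stylometry` stands in for code-style features extracted from the changed code;
# `runtime` is the execution time of the functional test cases (the microbenchmark target).
stylometry = rng.normal(size=(200, 12))
runtime = stylometry @ rng.normal(size=12) + rng.normal(scale=0.1, size=200)

train, test = slice(0, 150), slice(150, 200)
model = GradientBoostingRegressor().fit(stylometry[train], runtime[train])
pred = model.predict(stylometry[test])
print("RMSE:", np.sqrt(mean_squared_error(runtime[test], pred)))
```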

Extreme Event Prediction with Multi-agent Reinforcement Learning-based Parametrization of Atmospheric and Oceanic Turbulence

  • paper_url: http://arxiv.org/abs/2312.00907
  • repo_url: None
  • paper_authors: Rambod Mojgani, Daniel Waelchli, Yifei Guan, Petros Koumoutsakos, Pedram Hassanzadeh
  • for: This paper addresses a major structural problem of global climate models (GCMs): unresolved small-scale turbulent processes must be represented via closures (parametrizations), and the paper explores reinforcement learning as an alternative to supervised-learned closures for improving the accuracy of climate simulations.
  • methods: The paper uses Scientific Multi-Agent Reinforcement Learning (SMARL) together with fundamentals of turbulence physics, treating computational elements as both discretization points and learning agents, to learn closure models for prototypes of atmospheric and oceanic turbulence.
  • results: Closures learned with SMARL and turbulence physics yield stable, highly accurate low-resolution simulations from only a few high-fidelity samples, reproducing the statistics of high-fidelity simulations, including the tails of the probability density functions.
    Abstract Global climate models (GCMs) are the main tools for understanding and predicting climate change. However, due to limited numerical resolutions, these models suffer from major structural uncertainties; e.g., they cannot resolve critical processes such as small-scale eddies in atmospheric and oceanic turbulence. Thus, such small-scale processes have to be represented as a function of the resolved scales via closures (parametrization). The accuracy of these closures is particularly important for capturing climate extremes. Traditionally, such closures are based on heuristics and simplifying assumptions about the unresolved physics. Recently, supervised-learned closures, trained offline on high-fidelity data, have been shown to outperform the classical physics-based closures. However, this approach requires a significant amount of high-fidelity training data and can also lead to instabilities. Reinforcement learning is emerging as a potent alternative for developing such closures as it requires only low-order statistics and leads to stable closures. In Scientific Multi-Agent Reinforcement Learning (SMARL) computational elements serve a dual role of discretization points and learning agents. We leverage SMARL and fundamentals of turbulence physics to learn closures for prototypes of atmospheric and oceanic turbulence. The policy is trained using only the enstrophy spectrum, which is nearly invariant and can be estimated from a few high-fidelity samples (these few samples are far from enough for supervised/offline learning). We show that these closures lead to stable low-resolution simulations that, at a fraction of the cost, can reproduce the high-fidelity simulations' statistics, including the tails of the probability density functions. The results demonstrate the high potential of SMARL for closure modeling for GCMs, especially in the regime of scarce data and indirect observations.

Explaining Knock-on Effects of Bias Mitigation

  • paper_url: http://arxiv.org/abs/2312.00765
  • repo_url: None
  • paper_authors: Svetoslav Nizhnichenkov, Rahul Nair, Elizabeth Daly, Brian Mac Namee
  • for: This paper aims to characterise the cohorts that are impacted when bias mitigation interventions are applied.
  • methods: Treating intervention effects as a classification task, the paper learns an explainable meta-classifier to identify impacted cohorts and examines a range of bias mitigation strategies operating at various stages of the model life cycle.
  • results: All tested mitigation strategies negatively impact a non-trivial fraction of cases, i.e., people who receive unfavourable outcomes solely on account of mitigation efforts, despite improved fairness metrics. These results serve as a basis for arguing for more careful audits of static mitigation interventions that go beyond aggregate metrics.
    Abstract In machine learning systems, bias mitigation approaches aim to make outcomes fairer across privileged and unprivileged groups. Bias mitigation methods work in different ways and have known "waterfall" effects, e.g., mitigating bias at one place may manifest bias elsewhere. In this paper, we aim to characterise impacted cohorts when mitigation interventions are applied. To do so, we treat intervention effects as a classification task and learn an explainable meta-classifier to identify cohorts that have altered outcomes. We examine a range of bias mitigation strategies that work at various stages of the model life cycle. We empirically demonstrate that our meta-classifier is able to uncover impacted cohorts. Further, we show that all tested mitigation strategies negatively impact a non-trivial fraction of cases, i.e., people who receive unfavourable outcomes solely on account of mitigation efforts. This is despite improvement in fairness metrics. We use these results as a basis to argue for more careful audits of static mitigation interventions that go beyond aggregate metrics.
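
A minimal sketch of the "intervention effects as a classification task" idea: label the cases whose prediction flips after a (here, deliberately crude) mitigation step, then fit an interpretable meta-classifier over them. The synthetic data, the drop-the-attribute mitigation, and the decision-tree meta-classifier are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))                 # X[:, 0] plays the role of a protected attribute
y = (X[:, 1] + 0.5 * X[:, 0] + rng.normal(size=1000) > 0).astype(int)

base = LogisticRegression().fit(X, y)
mitigated = LogisticRegression().fit(np.delete(X, 0, axis=1), y)   # crude stand-in mitigation

flipped = (base.predict(X) != mitigated.predict(np.delete(X, 0, axis=1))).astype(int)

# Explainable meta-classifier: which cohorts had their outcome altered by the intervention?
meta = DecisionTreeClassifier(max_depth=3).fit(X, flipped)
print(export_text(meta, feature_names=["protected", "f1", "f2", "f3"]))
```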

SpaCE: The Spatial Confounding Environment

  • paper_url: http://arxiv.org/abs/2312.00710
  • repo_url: https://github.com/nsaph-projects/space
  • paper_authors: Mauricio Tec, Ana Trisovic, Michelle Audirac, Sophie Woodward, Jie Kate Hu, Naeem Khoshnevis, Francesca Dominici
  • for: Addressing spatial confounding in scientific studies involving spatial data
  • methods: Provides realistic benchmark datasets and tools for evaluating causal inference methods
  • results: Includes diverse datasets of varying sizes and spatial complexity, with realistic semi-synthetic outcomes and counterfactuals
    Abstract Spatial confounding poses a significant challenge in scientific studies involving spatial data, where unobserved spatial variables can influence both treatment and outcome, possibly leading to spurious associations. To address this problem, we introduce SpaCE: The Spatial Confounding Environment, the first toolkit to provide realistic benchmark datasets and tools for systematically evaluating causal inference methods designed to alleviate spatial confounding. Each dataset includes training data, true counterfactuals, a spatial graph with coordinates, and smoothness and confounding scores characterizing the effect of a missing spatial confounder. It also includes realistic semi-synthetic outcomes and counterfactuals, generated using state-of-the-art machine learning ensembles, following best practices for causal inference benchmarks. The datasets cover real treatment and covariates from diverse domains, including climate, health and social sciences. SpaCE facilitates an automated end-to-end pipeline, simplifying data loading, experimental setup, and evaluating machine learning and causal inference models. The SpaCE project provides several dozens of datasets of diverse sizes and spatial complexity. It is publicly available as a Python package, encouraging community feedback and contributions.

Machine Learning for Health symposium 2023 – Findings track

  • paper_url: http://arxiv.org/abs/2312.00655
  • repo_url: None
  • paper_authors: Stefan Hegselmann, Antonio Parziale, Divya Shanmugam, Shengpu Tang, Mercy Nyamewaa Asiedu, Serina Chang, Thomas Hartvigsen, Harvineet Singh
  • for: This is the collection of accepted Findings papers from the 3rd Machine Learning for Health (ML4H) symposium, presented on December 10, 2023, in New Orleans, Louisiana, USA.
  • methods: All submitted manuscripts underwent a double-blind peer-review process.
  • results: The Findings track gathers new ideas that can spark insightful discussion across the health-related disciplines of healthcare, biomedicine, and public health.
    Abstract A collection of the accepted Findings papers that were presented at the 3rd Machine Learning for Health symposium (ML4H 2023), which was held on December 10, 2023, in New Orleans, Louisiana, USA. ML4H 2023 invited high-quality submissions on relevant problems in a variety of health-related disciplines including healthcare, biomedicine, and public health. Two submission tracks were offered: the archival Proceedings track, and the non-archival Findings track. Proceedings were targeted at mature work with strong technical sophistication and a high impact to health. The Findings track looked for new ideas that could spark insightful discussion, serve as valuable resources for the community, or could enable new collaborations. Submissions to the Proceedings track, if not accepted, were automatically considered for the Findings track. All the manuscripts submitted to ML4H Symposium underwent a double-blind peer-review process.

Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation

  • paper_url: http://arxiv.org/abs/2312.00645
  • repo_url: None
  • paper_authors: Paul Bricman
  • for: This work aims to evaluate language model capabilities on sensitive topics, such as bioterrorism or cyberwarfare, where traditional open-source benchmarks are unsuitable because they publish the correct answers in human-readable form.
  • methods: The authors propose hashmarking, a protocol for evaluating language models in the open without disclosing the correct answers. In its simplest form, a hashmark is a benchmark whose reference solutions have been cryptographically hashed prior to publication.
  • results: The paper assesses the resilience of the hashmarking protocol against traditional attack vectors (e.g., rainbow table attacks) as well as failure modes unique to increasingly capable generative models.
    Abstract There is a growing need to gain insight into language model capabilities that relate to sensitive topics, such as bioterrorism or cyberwarfare. However, traditional open source benchmarks are not fit for the task, due to the associated practice of publishing the correct answers in human-readable form. At the same time, enforcing mandatory closed-quarters evaluations might stifle development and erode trust. In this context, we propose hashmarking, a protocol for evaluating language models in the open without having to disclose the correct answers. In its simplest form, a hashmark is a benchmark whose reference solutions have been cryptographically hashed prior to publication. Following an overview of the proposed evaluation protocol, we go on to assess its resilience against traditional attack vectors (e.g. rainbow table attacks), as well as against failure modes unique to increasingly capable generative models.
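
The core mechanism is simple enough to sketch: publish salted hashes of canonicalised reference answers, then check candidate answers by re-hashing. The canonicalisation (strip/lowercase) and salt length below are assumptions of this sketch; the paper's actual protocol may differ in detail.

```python
import hashlib
import os

def publish_hashmark(reference_answers):
    """Release salted hashes of the reference solutions instead of the solutions themselves."""
    entries = []
    for answer in reference_answers:
        salt = os.urandom(16)  # per-item random salt defeats precomputed (rainbow table) attacks
        digest = hashlib.sha256(salt + answer.strip().lower().encode()).hexdigest()
        entries.append({"salt": salt.hex(), "hash": digest})
    return entries

def check_answer(candidate, entry):
    salt = bytes.fromhex(entry["salt"])
    return hashlib.sha256(salt + candidate.strip().lower().encode()).hexdigest() == entry["hash"]

hashmark = publish_hashmark(["42", "the mitochondrion"])
print(check_answer("42", hashmark[0]), check_answer("43", hashmark[0]))   # True False
```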

One to beat them all: "RYU" – a unifying framework for the construction of safe balls

  • paper_url: http://arxiv.org/abs/2312.00640
  • repo_url: None
  • paper_authors: Thu-Le Tran, Clément Elvira, Hong-Phuong Dang, Cédric Herzet
  • for: This paper introduces a novel framework (named "RYU") for constructing "safe" balls, i.e., regions that provably contain the dual solution of a target optimization problem.
  • methods: The RYU framework is developed for the standard setup where the cost function is the sum of two terms: a closed, proper, convex Lipschitz-smooth function and a closed, proper, convex function.
  • results: The RYU framework is shown to generalize or improve upon all results proposed in the last decade for the considered family of optimization problems.
    Abstract In this paper, we put forth a novel framework (named ``RYU'') for the construction of ``safe'' balls, i.e. regions that provably contain the dual solution of a target optimization problem. We concentrate on the standard setup where the cost function is the sum of two terms: a closed, proper, convex Lipschitz-smooth function and a closed, proper, convex function. The RYU framework is shown to generalize or improve upon all the results proposed in the last decade for the considered family of optimization problems.

  • paper_url: http://arxiv.org/abs/2312.00626
  • repo_url: None
  • paper_authors: Joschka Herteux, Christoph Räth, Amine Baha, Giulia Martini, Duccio Piovani
  • for: This work develops a data-driven methodology for early warning systems, forecasting levels of food consumption for 60 consecutive days at the sub-national level in four countries: Mali, Nigeria, Syria, and Yemen.
  • methods: The approach builds on publicly available data from the World Food Programme's integrated global hunger monitoring system, which provides daily updates on key food security metrics, conflict, weather events, and other drivers of food insecurity. The study compares ARIMA, XGBoost, LSTMs, CNNs, and Reservoir Computing (RC) models using their Root Mean Squared Error (RMSE).
  • results: Reservoir Computing proves particularly well suited to the food security setting, given its notable resistance to overfitting on limited data and its efficient training; the methodology lays the groundwork for a global, data-driven early warning system for anticipating and detecting food insecurity.
    Abstract Early warning systems are an essential tool for effective humanitarian action. Advance warnings on impending disasters facilitate timely and targeted response which help save lives, livelihoods, and scarce financial resources. In this work we present a new quantitative methodology to forecast levels of food consumption for 60 consecutive days, at the sub-national level, in four countries: Mali, Nigeria, Syria, and Yemen. The methodology is built on publicly available data from the World Food Programme's integrated global hunger monitoring system which collects, processes, and displays daily updates on key food security metrics, conflict, weather events, and other drivers of food insecurity across 90 countries (https://hungermap.wfp.org/). In this study, we assessed the performance of various models including ARIMA, XGBoost, LSTMs, CNNs, and Reservoir Computing (RC), by comparing their Root Mean Squared Error (RMSE) metrics. This comprehensive analysis spanned classical statistical, machine learning, and deep learning approaches. Our findings highlight Reservoir Computing as a particularly well-suited model in the field of food security given both its notable resistance to over-fitting on limited data samples and its efficient training capabilities. The methodology we introduce establishes the groundwork for a global, data-driven early warning system designed to anticipate and detect food insecurity.
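
Because Reservoir Computing is singled out as the best-suited model, here is a minimal echo state network for one-step-ahead forecasting of a toy series. The reservoir size, leak rate, and ridge penalty are arbitrary illustrative values, and the synthetic series merely stands in for a daily food-consumption indicator.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1000)
series = 0.5 + 0.2 * np.sin(2 * np.pi * t / 90) + 0.02 * rng.normal(size=t.size)

# Echo state network: fixed random reservoir, trained ridge-regression readout.
n_res, leak, ridge = 200, 0.3, 1e-6
W_in = rng.uniform(-0.5, 0.5, size=n_res)
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))        # keep the spectral radius below 1

states = np.zeros((t.size, n_res))
x = np.zeros(n_res)
for i in range(t.size - 1):
    x = (1 - leak) * x + leak * np.tanh(W_in * series[i] + W @ x)
    states[i + 1] = x                                   # state i+1 has only seen the series up to i

train, test = slice(100, 800), slice(800, None)
A, y = states[train], series[train]
w_out = np.linalg.solve(A.T @ A + ridge * np.eye(n_res), A.T @ y)   # one-step-ahead readout

pred = states[test] @ w_out
print("RMSE:", np.sqrt(np.mean((pred - series[test]) ** 2)))
```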

Practical Path-based Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2312.00622
  • repo_url: None
  • paper_authors: Jose Pablo Folch, James Odgers, Shiqiang Zhang, Robert M Lee, Behrang Shafei, David Walz, Calvin Tsay, Mark van der Wilk, Ruth Misener
  • for: This paper addresses data-driven experimental design, with applications to chemical engineering and drug manufacturing.
  • methods: The study uses Bayesian optimization (BO), modelling the reactions of interest as expensive black-box functions, and extends the SnAKe algorithm to handle both the cost of the experiment itself and the cost of changing the input parameters simultaneously.
  • results: The paper further proposes extensions to the case of a maximum allowable input change, as well as to the multi-objective setting.
    Abstract There has been a surge in interest in data-driven experimental design with applications to chemical engineering and drug manufacturing. Bayesian optimization (BO) has proven to be adaptable to such cases, since we can model the reactions of interest as expensive black-box functions. Sometimes, the cost of this black-box functions can be separated into two parts: (a) the cost of the experiment itself, and (b) the cost of changing the input parameters. In this short paper, we extend the SnAKe algorithm to deal with both types of costs simultaneously. We further propose extensions to the case of a maximum allowable input change, as well as to the multi-objective setting.
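
The cost structure described above (experiment cost plus a cost for changing the inputs) can be pictured with a generic movement-penalised Bayesian optimisation loop. This is only a hedged sketch with an ad-hoc UCB-minus-movement-cost rule; it is not the SnAKe algorithm or the paper's extensions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
f = lambda x: -np.sin(3 * x) - x ** 2 + 0.7 * x        # stand-in for an expensive black-box reaction

X = rng.uniform(-1, 2, size=(3, 1))
y = f(X[:, 0])
grid = np.linspace(-1, 2, 200).reshape(-1, 1)
move_cost = 0.5                                        # cost per unit change of the input parameters

for _ in range(10):
    gp = GaussianProcessRegressor().fit(X, y)
    mu, sd = gp.predict(grid, return_std=True)
    penalty = move_cost * np.abs(grid[:, 0] - X[-1, 0])   # penalise moving away from the last setting
    x_next = grid[np.argmax(mu + 2.0 * sd - penalty)]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next[0]))

print("best observed:", X[np.argmax(y), 0], y.max())
```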

Investigating a domain adaptation approach for integrating different measurement instruments in a longitudinal clinical registry

  • paper_url: http://arxiv.org/abs/2312.00616
  • repo_url: None
  • paper_authors: Maren Hackenberg, Michelle Pfaffenlehner, Max Behrens, Astrid Pechmann, Janbernd Kirschner, Harald Binder
  • for: This study investigates how deep learning can integrate data from different measurement instruments used at different time points in a longitudinal clinical registry, by mapping them to a joint latent representation.
  • methods: The approach adopts domain adaptation, a concept established for image data in computer science, and maps the items of different measurement instruments to a joint latent space; trajectories in this space are modelled with ordinary differential equations (ODEs) whose person-specific parameters are inferred from baseline characteristics, and alignment is further improved by incorporating corresponding penalty terms into model fitting.
  • results: Domain adaptation is feasible in a longitudinal cohort setting with a rather small number of time points; even in more complex scenarios where no perfect mapping is available, some structure is still recovered, and a reasonable mapping is feasible in the real spinal muscular atrophy (SMA) dataset.
    Abstract In a longitudinal clinical registry, different measurement instruments might have been used for assessing individuals at different time points. To combine them, we investigate deep learning techniques for obtaining a joint latent representation, to which the items of different measurement instruments are mapped. This corresponds to domain adaptation, an established concept in computer science for image data. Using the proposed approach as an example, we evaluate the potential of domain adaptation in a longitudinal cohort setting with a rather small number of time points, motivated by an application with different motor function measurement instruments in a registry of spinal muscular atrophy (SMA) patients. There, we model trajectories in the latent representation by ordinary differential equations (ODEs), where person-specific ODE parameters are inferred from baseline characteristics. The goodness of fit and complexity of the ODE solutions then allows to judge the measurement instrument mappings. We subsequently explore how alignment can be improved by incorporating corresponding penalty terms into model fitting. To systematically investigate the effect of differences between measurement instruments, we consider several scenarios based on modified SMA data, including scenarios where a mapping should be feasible in principle and scenarios where no perfect mapping is available. While misalignment increases in more complex scenarios, some structure is still recovered, even if the availability of measurement instruments depends on patient state. A reasonable mapping is feasible also in the more complex real SMA dataset. These results indicate that domain adaptation might be more generally useful in statistical modeling for longitudinal registry data.

Improving Plasticity in Online Continual Learning via Collaborative Learning

  • paper_url: http://arxiv.org/abs/2312.00600
  • repo_url: None
  • paper_authors: Maorong Wang, Nicolas Michel, Ling Xiao, Toshihiko Yamasaki
  • for: Addresses online continual learning, i.e., learning ever-emerging new classification tasks from a continuous data stream in which training data can only be seen once.
  • methods: Proposes Collaborative Continual Learning (CCL), a collaborative learning based strategy to improve the model's capability in acquiring new concepts, together with Distillation Chain (DC), a novel collaborative learning scheme to boost the training of the models.
  • results: Even when learners are already well trained with state-of-the-art online CL methods, the strategy still improves model plasticity dramatically and thereby improves overall performance by a large margin.
    Abstract Online Continual Learning (CL) solves the problem of learning the ever-emerging new classification tasks from a continuous data stream. Unlike its offline counterpart, in online CL, the training data can only be seen once. Most existing online CL research regards catastrophic forgetting (i.e., model stability) as almost the only challenge. In this paper, we argue that the model's capability to acquire new knowledge (i.e., model plasticity) is another challenge in online CL. While replay-based strategies have been shown to be effective in alleviating catastrophic forgetting, there is a notable gap in research attention toward improving model plasticity. To this end, we propose Collaborative Continual Learning (CCL), a collaborative learning based strategy to improve the model's capability in acquiring new concepts. Additionally, we introduce Distillation Chain (DC), a novel collaborative learning scheme to boost the training of the models. We adapted CCL-DC to existing representative online CL works. Extensive experiments demonstrate that even if the learners are well-trained with state-of-the-art online CL methods, our strategy can still improve model plasticity dramatically, and thereby improve the overall performance by a large margin.

Adaptive Parameter-Free Robust Learning using Latent Bernoulli Variables

  • paper_url: http://arxiv.org/abs/2312.00585
  • repo_url: https://github.com/akarakulev/rlvi
  • paper_authors: Aleksandr Karakulev, Dave Zachariah, Prashant Singh
  • for: Robust statistical learning from corrupted training sets.
  • methods: Uses latent Bernoulli variables to identify corrupted and non-corrupted samples, formulating robust learning as maximization of the likelihood with the latent variables marginalized out, and solves the resulting problem via variational inference with an efficient Expectation-Maximization based method.
  • results: Improves over state-of-the-art robust learning methods by automatically inferring the corruption level and identifying outliers with minimal computational overhead, and demonstrates adaptation to different noise levels and high prediction accuracy across a variety of machine learning tasks, including online learning and deep learning.
    Abstract We present an efficient parameter-free approach for statistical learning from corrupted training sets. We identify corrupted and non-corrupted samples using latent Bernoulli variables, and therefore formulate the robust learning problem as maximization of the likelihood where latent variables are marginalized out. The resulting optimization problem is solved via variational inference using an efficient Expectation-Maximization based method. The proposed approach improves over the state-of-the-art by automatically inferring the corruption level and identifying outliers, while adding minimal computational overhead. We demonstrate our robust learning method on a wide variety of machine learning tasks including online learning and deep learning where it exhibits ability to adapt to different levels of noise and attain high prediction accuracy.
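
To illustrate the latent-Bernoulli idea, the sketch below runs a simplified EM loop on a corrupted regression problem: each sample's responsibility (posterior probability of being clean) weights the fit. The Gaussian clean-noise model and the flat background density for corrupted samples are simplifying assumptions; the paper's variational formulation differs in detail.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
outliers = rng.random(n) < 0.2
y[outliers] += rng.normal(scale=5.0, size=outliers.sum())        # corrupt 20% of the labels

pi, sigma2 = 0.5, 1.0                       # prior prob. of being clean, clean-noise variance
weights = np.full(n, 0.5)                   # responsibilities: P(sample is non-corrupted)
for _ in range(20):
    model = LinearRegression().fit(X, y, sample_weight=weights)  # M-step: responsibility-weighted fit
    resid2 = (y - model.predict(X)) ** 2
    sigma2 = np.sum(weights * resid2) / np.sum(weights)
    pi = weights.mean()
    clean = pi * np.exp(-resid2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    corrupt = (1 - pi) * 0.05               # assumed flat density for corrupted labels
    weights = clean / (clean + corrupt)     # E-step: updated responsibilities

print("coefficients:", model.coef_)
print("flagged outliers:", int((weights < 0.5).sum()), "of", int(outliers.sum()))
```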

Pathway to a fully data-driven geotechnics: lessons from materials informatics

  • paper_url: http://arxiv.org/abs/2312.00581
  • repo_url: None
  • paper_authors: Stephen Wu, Yu Otake, Yosuke Higo, Ikumasa Yoshida
  • for: This paper examines the challenges and opportunities of integrating data-driven methodologies into geotechnics, drawing inspiration from the success of materials informatics.
  • methods: It discusses leveraging deep learning, particularly feature extraction from high-dimensional data and transfer learning, alongside community-driven database initiatives and open science movements, to foster collaboration and innovation in geotechnics.
  • results: The paper envisions a paradigm shift toward a more collaborative and innovative geotechnics field, with advanced computational tools such as large language models reshaping geotechnics informatics.
    Abstract This paper elucidates the challenges and opportunities inherent in integrating data-driven methodologies into geotechnics, drawing inspiration from the success of materials informatics. Highlighting the intricacies of soil complexity, heterogeneity, and the lack of comprehensive data, the discussion underscores the pressing need for community-driven database initiatives and open science movements. By leveraging the transformative power of deep learning, particularly in feature extraction from high-dimensional data and the potential of transfer learning, we envision a paradigm shift towards a more collaborative and innovative geotechnics field. The paper concludes with a forward-looking stance, emphasizing the revolutionary potential brought about by advanced computational tools like large language models in reshaping geotechnics informatics.

Interior Point Constrained Reinforcement Learning with Global Convergence Guarantees

  • paper_url: http://arxiv.org/abs/2312.00561
  • repo_url: None
  • paper_authors: Tingting Ni, Maryam Kamgarpour
  • for: This paper addresses optimization in constrained Markov decision processes (CMDPs): finding an optimal policy that maximizes the expected cumulative reward subject to expected cumulative constraints.
  • methods: The authors develop a zeroth-order interior point approach based on the log barrier function of the CMDP so that the policies satisfy the constraints during learning.
  • results: Under Fisher non-degeneracy and bounded transfer error of the policy parameterization, the algorithm guarantees feasibility of the policies during the learning process and converges to the optimal policy with a sample complexity of $O(\varepsilon^{-6})$; compared to the state-of-the-art policy gradient-based algorithm C-NPG-PDA, it requires an additional $O(\varepsilon^{-2})$ samples to ensure policy feasibility during learning with the same Fisher-non-degenerate parameterization.
    Abstract We consider discounted infinite horizon constrained Markov decision processes (CMDPs) where the goal is to find an optimal policy that maximizes the expected cumulative reward subject to expected cumulative constraints. Motivated by the application of CMDPs in online learning of safety-critical systems, we focus on developing an algorithm that ensures constraint satisfaction during learning. To this end, we develop a zeroth-order interior point approach based on the log barrier function of the CMDP. Under the commonly assumed conditions of Fisher non-degeneracy and bounded transfer error of the policy parameterization, we establish the theoretical properties of the algorithm. In particular, in contrast to existing CMDP approaches that ensure policy feasibility only upon convergence, our algorithm guarantees feasibility of the policies during the learning process and converges to the optimal policy with a sample complexity of $O(\varepsilon^{-6})$. In comparison to the state-of-the-art policy gradient-based algorithm, C-NPG-PDA, our algorithm requires an additional $O(\varepsilon^{-2})$ samples to ensure policy feasibility during learning with same Fisher-non-degenerate parameterization.
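
For readers unfamiliar with interior point methods in this setting, a generic log-barrier formulation of the CMDP objective (with reward value $V_r$, constraint value $V_c$, budget $b$, and barrier parameter $\mu > 0$) looks as follows; the paper's exact barrier construction and zeroth-order update may differ.

```latex
\max_{\theta}\; V_r(\pi_\theta) \;+\; \mu \,\log\!\bigl(b - V_c(\pi_\theta)\bigr),
\qquad \mu > 0 .
```

Any $\theta$ with a finite objective value satisfies $V_c(\pi_\theta) < b$, which is how barrier methods keep the iterates strictly feasible throughout learning; letting $\mu \to 0$ recovers the constrained optimum.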

A Preconditioned Interior Point Method for Support Vector Machines Using an ANOVA-Decomposition and NFFT-Based Matrix-Vector Products

  • paper_url: http://arxiv.org/abs/2312.00538
  • repo_url: https://github.com/wagnertheresa/nfftsvmipm
  • paper_authors: Theresa Wagner, John W. Pearson, Martin Stoll
  • for: Addresses the numerical solution of the soft-margin support vector machine optimization problem with large-scale kernel matrices.
  • methods: Employs an NFFT-accelerated matrix-vector product using an ANOVA decomposition of the feature space within an interior point method, with preconditioning based on low-rank approximations of the kernel matrix and a Krylov subspace solver for the resulting saddle-point linear systems.
  • results: Compares the accuracy of the ANOVA-based kernel with the default LIBSVM implementation and investigates the performance of the different preconditioners on several large-scale datasets.
    Abstract In this paper we consider the numerical solution to the soft-margin support vector machine optimization problem. This problem is typically solved using the SMO algorithm, given the high computational complexity of traditional optimization algorithms when dealing with large-scale kernel matrices. In this work, we propose employing an NFFT-accelerated matrix-vector product using an ANOVA decomposition for the feature space that is used within an interior point method for the overall optimization problem. As this method requires the solution of a linear system of saddle point form we suggest a preconditioning approach that is based on low-rank approximations of the kernel matrix together with a Krylov subspace solver. We compare the accuracy of the ANOVA-based kernel with the default LIBSVM implementation. We investigate the performance of the different preconditioners as well as the accuracy of the ANOVA kernel on several large-scale datasets.

RIS-Based On-the-Air Semantic Communications – a Diffractional Deep Neural Network Approach

  • paper_url: http://arxiv.org/abs/2312.00535
  • repo_url: None
  • paper_authors: Shuyi Chen, Yingzhe Hui, Yifan Qin, Yueyi Yuan, Weixiao Meng, Xuewen Luo, Hsiao-Hwa Chen
  • for: This paper explores reconfigurable intelligent surface (RIS)-based on-the-air semantic communications as a route to highly efficient wireless communication.
  • methods: The approach builds on on-the-air diffractional deep neural networks (D$^2$NN), so that the computational process occurs inherently as wireless signals pass through RISs.
  • results: A performance analysis using image transmission as an example shows that RIS-based semantic communications offer appealing features such as light-speed computation, low computational power requirements, and the ability to handle multiple tasks simultaneously.
    Abstract Semantic communication has gained significant attention recently due to its advantages in achieving higher transmission efficiency by focusing on semantic information instead of bit-level information. However, current AI-based semantic communication methods require digital hardware for implementation. With the rapid advancement on reconfigurable intelligence surfaces (RISs), a new approach called on-the-air diffractional deep neural networks (D$^2$NN) can be utilized to enable semantic communications on the wave domain. This paper proposes a new paradigm of RIS-based on-the-air semantic communications, where the computational process occurs inherently as wireless signals pass through RISs. We present the system model and discuss the data and control flows of this scheme, followed by a performance analysis using image transmission as an example. In comparison to traditional hardware-based approaches, RIS-based semantic communications offer appealing features, such as light-speed computation, low computational power requirements, and the ability to handle multiple tasks simultaneously.

Spatio-Temporal-Decoupled Masked Pre-training for Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2312.00516
  • repo_url: https://github.com/jimmy-7664/std_mae
  • paper_authors: Haotian Gao, Renhe Jiang, Zheng Dong, Jinliang Deng, Xuan Song
  • for: Forecasting multivariate traffic flow time series, which requires capturing substantial spatio-temporal heterogeneity and complex long-range correlative patterns.
  • methods: Uses two decoupled masked autoencoders that reconstruct the traffic data along the spatial and temporal axes via self-supervised pre-training, and uses the learned hidden representations to augment the downstream spatio-temporal traffic predictor.
  • results: Evaluations on four widely used traffic benchmarks (PEMS03, PEMS04, PEMS07, and PEMS08) show that STD-MAE enhances downstream traffic predictors, particularly in capturing long-range intricate spatial and temporal patterns.
    Abstract Accurate forecasting of multivariate traffic flow time series remains challenging due to substantial spatio-temporal heterogeneity and complex long-range correlative patterns. To address this, we propose Spatio-Temporal-Decoupled Masked Pre-training (STD-MAE), a novel framework that employs masked autoencoders to learn and encode complex spatio-temporal dependencies via pre-training. Specifically, we use two decoupled masked autoencoders to reconstruct the traffic data along spatial and temporal axes using a self-supervised pre-training approach. These mask reconstruction mechanisms capture the long-range correlations in space and time separately. The learned hidden representations are then used to augment the downstream spatio-temporal traffic predictor. A series of quantitative and qualitative evaluations on four widely-used traffic benchmarks (PEMS03, PEMS04, PEMS07, and PEMS08) are conducted to verify the state-of-the-art performance, with STD-MAE explicitly enhancing the downstream spatio-temporal models' ability to capture long-range intricate spatial and temporal patterns. Codes are available at https://github.com/Jimmy-7664/STD_MAE.
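
The "decoupled" masking can be pictured with a small tensor example: one masking pass hides whole time steps, the other hides whole sensors, and two separate autoencoders would reconstruct the data from each view. Shapes and the masking ratio below are illustrative assumptions, not the paper's configuration.

```python
import torch

x = torch.rand(4, 12, 207)     # (batch, time steps, sensors), e.g. traffic speeds

def mask_along(x, axis, ratio=0.25):
    """Zero out whole slices along `axis` (1 = time, 2 = space); return masked tensor and mask."""
    n = x.shape[axis]
    masked_slices = torch.rand(n) < ratio          # True marks hidden slices
    shape = [1, 1, 1]
    shape[axis] = n
    mask = masked_slices.view(shape).expand_as(x)
    return x.masked_fill(mask, 0.0), mask

x_temporal, m_t = mask_along(x, axis=1)   # temporal masking: hide entire time steps
x_spatial, m_s = mask_along(x, axis=2)    # spatial masking: hide entire sensors
# Two decoupled masked autoencoders reconstruct x from x_temporal and x_spatial respectively;
# their hidden representations then augment the downstream forecaster.
```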

Bayesian causal discovery from unknown general interventions

  • paper_url: http://arxiv.org/abs/2312.00509
  • repo_url: https://github.com/alesmascaro/bcd-ugi
  • paper_authors: Alessandro Mascaro, Federico Castelletti
  • for: Learning causal Directed Acyclic Graphs (DAGs) from combinations of observational and interventional experimental data.
  • methods: Proposes a Bayesian method for causal discovery from general interventions with unknown targets, which allows modifications of the parent sets of the intervened nodes.
  • results: Develops a Markov Chain Monte Carlo (MCMC) scheme to approximate the posterior distribution over DAGs, intervention targets, and induced parent sets, evaluated on both simulated and real protein expression data.
    Abstract We consider the problem of learning causal Directed Acyclic Graphs (DAGs) using combinations of observational and interventional experimental data. Current methods tailored to this setting assume that interventions either destroy parent-child relations of the intervened (target) nodes or only alter such relations without modifying the parent sets, even when the intervention targets are unknown. We relax this assumption by proposing a Bayesian method for causal discovery from general interventions, which allow for modifications of the parent sets of the unknown targets. Even in this framework, DAGs and general interventions may be identifiable only up to some equivalence classes. We provide graphical characterizations of such interventional Markov equivalence and devise compatible priors for Bayesian inference that guarantee score equivalence of indistinguishable structures. We then develop a Markov Chain Monte Carlo (MCMC) scheme to approximate the posterior distribution over DAGs, intervention targets and induced parent sets. Finally, we evaluate the proposed methodology on both simulated and real protein expression data.

VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity

  • paper_url: http://arxiv.org/abs/2312.00507
  • repo_url: None
  • paper_authors: S. VenkataKeerthy, Yashas Andaluri, Sayan Dey, Soumya Banerjee, Ramakrishna Upadrasta
  • for: This work proposes a VEX IR-based function embedding framework for finding similar functions in binaries.
  • methods: The framework uses VEX IR, the intermediate representation of binary analysis tools such as Valgrind and angr, and introduces POV, a custom peephole optimization engine (with copy/constant propagation, constant folding, common subexpression elimination, and load-store elimination) that normalizes the VEX IR for effective similarity analysis.
  • results: In two experiments, diffing and searching, over binaries targeting different architectures, compilers and versions, optimization sequences, and obfuscations, the framework achieves superior precision and recall compared to state-of-the-art works, while being highly scalable and parallel, with about a 3.2x speedup over the closest competitor.
    Abstract We propose VEXIR2Vec, a code embedding framework for finding similar functions in binaries. Our representations rely on VEX IR, the intermediate representation used by binary analysis tools like Valgrind and angr. Our proposed embeddings encode both syntactic and semantic information to represent a function, and is both application and architecture independent. We also propose POV, a custom Peephole Optimization engine that normalizes the VEX IR for effective similarity analysis. We design several optimizations like copy/constant propagation, constant folding, common subexpression elimination and load-store elimination in POV. We evaluate our framework on two experiments -- diffing and searching -- involving binaries targeting different architectures, compiled using different compilers and versions, optimization sequences, and obfuscations. We show results on several standard projects and on real-world vulnerabilities. Our results show that VEXIR2Vec achieves superior precision and recall values compared to the state-of-the-art works. Our framework is highly scalable and is built as a multi-threaded, parallel library by only using open-source tools. VEXIR2Vec achieves about $3.2 \times$ speedup on the closest competitor, and orders-of-magnitude speedup on other tools.

On the Out-Of-Distribution Robustness of Self-Supervised Representation Learning for Phonocardiogram Signals

  • paper_url: http://arxiv.org/abs/2312.00502
  • repo_url: https://github.com/aristotelisballas/listen2yourheart
  • paper_authors: Aristotelis Ballas, Vasileios Papapanagiotou, Christos Diou
  • for: This work tackles a key obstacle to the adoption of deep learning models in medicine: the shortage of high-quality annotated data, which hinders the development of robust and generalizable models.
  • methods: Proposes contrastive self-supervised learning (SSL) to exploit unlabeled phonocardiogram (PCG) data, learning a generalized representation of the signal and performing an extensive comparative evaluation of a wide range of audio-based augmentations across multiple datasets and downstream tasks.
  • results: Depending on its training distribution, a fully supervised model can lose up to 32% effectiveness on unseen out-of-distribution (OOD) data, whereas SSL models lose at most 10% or even improve; contrastive SSL pretraining thus yields robust classifiers that generalize to unseen data without labor-intensive annotation, and the evaluation protocol highlights the most promising augmentations for robust PCG signal processing.
    Abstract Objective: Despite the recent increase in research activity, deep-learning models have not yet been widely accepted in medicine. The shortage of high-quality annotated data often hinders the development of robust and generalizable models, which do not suffer from degraded effectiveness when presented with newly-collected, out-of-distribution (OOD) datasets. Methods: Contrastive Self-Supervised Learning (SSL) offers a potential solution to the scarcity of labeled data as it takes advantage of unlabeled data to increase model effectiveness and robustness. In this research, we propose applying contrastive SSL for detecting abnormalities in phonocardiogram (PCG) samples by learning a generalized representation of the signal. Specifically, we perform an extensive comparative evaluation of a wide range of audio-based augmentations and evaluate trained classifiers on multiple datasets across different downstream tasks. Results: We experimentally demonstrate that, depending on its training distribution, the effectiveness of a fully-supervised model can degrade up to 32% when evaluated on unseen data, while SSL models only lose up to 10% or even improve in some cases. Conclusions: Contrastive SSL pretraining can assist in providing robust classifiers which can generalize to unseen, OOD data, without relying on time- and labor-intensive annotation processes by medical experts. Furthermore, the proposed extensive evaluation protocol sheds light on the most promising and appropriate augmentations for robust PCG signal processing. Significance: We provide researchers and practitioners with a roadmap towards producing robust models for PCG classification, in addition to an open-source codebase for developing novel approaches.
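
A minimal SimCLR-style sketch of contrastive pretraining on 1-D PCG snippets: two augmented views of each snippet must match each other and not the rest of the batch. The augmentations (random gain plus noise), encoder, and temperature are illustrative assumptions rather than the augmentations evaluated in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def augment(x):
    """Illustrative audio-style augmentations: random gain and additive noise."""
    gain = 0.8 + 0.4 * torch.rand(x.size(0), 1)
    return gain * x + 0.01 * torch.randn_like(x)

encoder = nn.Sequential(
    nn.Conv1d(1, 16, 7, stride=2), nn.ReLU(),
    nn.Conv1d(16, 32, 7, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(32, 64),
)

x = torch.randn(16, 2000)                          # a batch of unlabeled 1-D PCG snippets
z1 = F.normalize(encoder(augment(x).unsqueeze(1)), dim=1)
z2 = F.normalize(encoder(augment(x).unsqueeze(1)), dim=1)

# NT-Xent loss: each snippet's two views are positives, everything else in the batch is negative.
z = torch.cat([z1, z2])                            # (2B, d)
sim = (z @ z.T / 0.1).masked_fill(torch.eye(2 * 16, dtype=torch.bool), -1e9)
targets = torch.cat([torch.arange(16, 32), torch.arange(0, 16)])
loss = F.cross_entropy(sim, targets)
loss.backward()
```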

REDUCR: Robust Data Downsampling Using Class Priority Reweighting

  • paper_url: http://arxiv.org/abs/2312.00486
  • repo_url: None
  • paper_authors: William Bankes, George Hughes, Ilija Bogunovic, Zi Wang
  • for: Reducing the cost of training models on real-world image and text classification tasks, where massive web-scale data is collected in a streaming fashion.
  • methods: REDUCR, a robust and efficient data downsampling method based on class priority reweighting, which assigns priority weights to datapoints in a class-aware manner using an online learning algorithm.
  • results: On vision and text classification tasks with imbalanced, web-scraped data, REDUCR significantly improves worst-class test accuracy (and average accuracy), surpassing state-of-the-art methods by around 15%.
    Abstract Modern machine learning models are becoming increasingly expensive to train for real-world image and text classification tasks, where massive web-scale data is collected in a streaming fashion. To reduce the training cost, online batch selection techniques have been developed to choose the most informative datapoints. However, these techniques can suffer from poor worst-class generalization performance due to class imbalance and distributional shifts. This work introduces REDUCR, a robust and efficient data downsampling method that uses class priority reweighting. REDUCR reduces the training data while preserving worst-class generalization performance. REDUCR assigns priority weights to datapoints in a class-aware manner using an online learning algorithm. We demonstrate the data efficiency and robust performance of REDUCR on vision and text classification tasks. On web-scraped datasets with imbalanced class distributions, REDUCR significantly improves worst-class test accuracy (and average accuracy), surpassing state-of-the-art methods by around 15%.
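
A toy sketch of class-priority-reweighted batch selection: score each candidate point by its loss times a class priority weight, keep the top-k, and push the priorities toward whichever class is currently doing worst. The multiplicative update and the synthetic losses are illustrative assumptions, not REDUCR's actual selection rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes = 3
class_weights = np.ones(n_classes) / n_classes       # class priority weights

def select_batch(losses, labels, k, class_weights):
    """Keep the k candidate points with the largest priority-weighted loss."""
    return np.argsort(losses * class_weights[labels])[-k:]

for step in range(5):
    labels = rng.integers(0, n_classes, size=256)     # a streaming candidate batch
    losses = rng.gamma(2.0, size=256) * (1 + (labels == 2))   # pretend class 2 is the hard class
    chosen = select_batch(losses, labels, k=32, class_weights=class_weights)

    # Shift priority toward the worst-performing class (multiplicative-weights style update).
    per_class_loss = np.array([losses[labels == c].mean() for c in range(n_classes)])
    class_weights *= np.exp(0.5 * per_class_loss / per_class_loss.max())
    class_weights /= class_weights.sum()

print("final class priorities:", np.round(class_weights, 3))
```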

Backbone-based Dynamic Graph Spatio-Temporal Network for Epidemic Forecasting

  • paper_url: http://arxiv.org/abs/2312.00485
  • repo_url: None
  • paper_authors: Junkai Mao, Yuexing Han, Gouhei Tanaka, Bing Wang
  • for: This paper proposes a new model, BDGSTN, for accurate epidemic forecasting; it exploits the continuous and smooth changes in graph structure, whereby adjacent graph structures share a basic pattern.
  • methods: Adaptive methods generate static backbone graphs containing the primary information, temporal models generate dynamic temporal graphs of the epidemic data, and the two are fused into a backbone-based dynamic graph; a linear DLinear model handles temporal dependencies and is combined with dynamic graph convolution for forecasting, avoiding the error accumulation and computational cost of recurrent structures.
  • results: Experiments on two datasets show that BDGSTN outperforms baseline models; ablation comparisons verify the effectiveness of its components, information metrics confirm the significance of the backbone and temporal graphs, and comparisons of parameter volume and training time confirm its superior complexity and efficiency.
    Abstract Accurate epidemic forecasting is a critical task in controlling disease transmission. Many deep learning-based models focus only on static or dynamic graphs when constructing spatial information, ignoring their relationship. Additionally, these models often rely on recurrent structures, which can lead to error accumulation and computational time consumption. To address the aforementioned problems, we propose a novel model called Backbone-based Dynamic Graph Spatio-Temporal Network (BDGSTN). Intuitively, the continuous and smooth changes in graph structure, make adjacent graph structures share a basic pattern. To capture this property, we use adaptive methods to generate static backbone graphs containing the primary information and temporal models to generate dynamic temporal graphs of epidemic data, fusing them to generate a backbone-based dynamic graph. To overcome potential limitations associated with recurrent structures, we introduce a linear model DLinear to handle temporal dependencies and combine it with dynamic graph convolution for epidemic forecasting. Extensive experiments on two datasets demonstrate that BDGSTN outperforms baseline models and ablation comparison further verifies the effectiveness of model components. Furthermore, we analyze and measure the significance of backbone and temporal graphs by using information metrics from different aspects. Finally, we compare model parameter volume and training time to confirm the superior complexity and efficiency of BDGSTN.

MultiView Independent Component Analysis with Delays

  • paper_url: http://arxiv.org/abs/2312.00484
  • repo_url: None
  • paper_authors: Ambroise Heurtebise, Pierre Ablin, Alexandre Gramfort
  • for: This paper aims to improve blind source separation by exploiting multiple views of the same sources to obtain a higher signal-to-noise ratio.
  • methods: The proposed MultiView Independent Component Analysis with Delays (MVICAD) builds on the MultiView ICA model by allowing sources to be delayed versions of shared sources, with view- and source-specific latencies.
  • results: Simulations show that MVICAD leads to better unmixing of the sources, and on Cam-CAN, a large-scale magnetoencephalography (MEG) dataset, the estimated latencies are age-related, demonstrating that the model can reveal rich effects on neural signals without human supervision.
    Abstract Linear Independent Component Analysis (ICA) is a blind source separation technique that has been used in various domains to identify independent latent sources from observed signals. In order to obtain a higher signal-to-noise ratio, the presence of multiple views of the same sources can be used. In this work, we present MultiView Independent Component Analysis with Delays (MVICAD). This algorithm builds on the MultiView ICA model by allowing sources to be delayed versions of some shared sources: sources are shared across views up to some unknown latencies that are view- and source-specific. Using simulations, we demonstrate that MVICAD leads to better unmixing of the sources. Moreover, as ICA is often used in neuroscience, we show that latencies are age-related when applied to Cam-CAN, a large-scale magnetoencephalography (MEG) dataset. These results demonstrate that the MVICAD model can reveal rich effects on neural signals without human supervision.

Interpretable Meta-Learning of Physical Systems

  • paper_url: http://arxiv.org/abs/2312.00477
  • repo_url: None
  • paper_authors: Matthieu Blanke, Marc Lelarge
  • for: This paper studies how machine learning methods can learn from data collected under inhomogeneous experimental conditions.
  • methods: Whereas recent meta-learning methods rely on black-box neural networks with high computational cost and limited interpretability, the authors propose a simpler learning model with an affine structure with respect to the learning task, achieving multi-environment generalization and provably identifying the physical parameters of the system.
  • results: Comparisons with state-of-the-art algorithms on physical systems, ranging from toy models to complex non-analytical systems, show competitive generalization performance at low computational cost, and original applications to physical-parameter-induced adaptation and adaptive control illustrate the method's interpretability.
    Abstract Machine learning methods can be a valuable aid in the scientific process, but they need to face challenging settings where data come from inhomogeneous experimental conditions. Recent meta-learning methods have made significant progress in multi-task learning, but they rely on black-box neural networks, resulting in high computational costs and limited interpretability. Leveraging the structure of the learning problem, we argue that multi-environment generalization can be achieved using a simpler learning model, with an affine structure with respect to the learning task. Crucially, we prove that this architecture can identify the physical parameters of the system, enabling interpreable learning. We demonstrate the competitive generalization performance and the low computational cost of our method by comparing it to state-of-the-art algorithms on physical systems, ranging from toy models to complex, non-analytical systems. The interpretability of our method is illustrated with original applications to physical-parameter-induced adaptation and to adaptive control.
    摘要 机器学习方法可以成为科学研究的宝贵助力,但它们需要面对数据来自不均匀实验条件的挑战性情形。近期的元学习方法在多任务学习上取得了显著进展,但它们依赖黑盒神经网络,导致计算成本高且可解释性有限。利用学习问题的结构,我们认为多环境泛化可以通过一个更简单的学习模型实现,该模型关于学习任务具有仿射结构。关键的是,我们证明了这一架构能够识别系统的物理参数,从而实现可解释的学习。我们在从玩具模型到复杂非解析系统的多种物理系统上与最先进算法进行比较,展示了该方法具有竞争力的泛化性能和较低的计算成本。我们还通过物理参数诱导的适应与自适应控制等原创应用展示了方法的可解释性。
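
The key point above, that an affine dependence on the task turns multi-environment adaptation into a linear-algebra problem, can be pictured with a small sketch. The shared feature map below is a fixed random projection rather than a learned one, and all data and names are hypothetical.

```python
# Minimal sketch of multi-environment learning with a task-affine head.
import numpy as np

rng = np.random.default_rng(0)

def features(x, W):
    """Shared nonlinear feature map phi(x); W is shared across environments."""
    return np.tanh(x @ W)

# Simulate 3 environments whose dynamics differ only through one parameter.
W = rng.normal(size=(2, 16))                      # shared feature weights
envs = []
for param in [0.5, 1.0, 2.0]:                     # "physical" parameter per env
    x = rng.normal(size=(200, 2))
    y = param * np.sin(x[:, :1]) + 0.01 * rng.normal(size=(200, 1))
    envs.append((x, y))

# Because the model is affine in the per-environment head theta_e,
# adapting to an environment reduces to ordinary least squares.
heads = []
for x, y in envs:
    phi = features(x, W)
    theta, *_ = np.linalg.lstsq(phi, y, rcond=None)
    heads.append(theta)

# The recovered heads vary with the underlying parameter, which is what
# makes the fitted coefficients interpretable in the paper's setting.
print(np.round([np.linalg.norm(t) for t in heads], 3))
```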

Auto-encoding GPS data to reveal individual and collective behaviour

  • paper_url: http://arxiv.org/abs/2312.00456
  • repo_url: None
  • paper_authors: Saint-Clair Chabert-Liddell, Nicolas Bez, Pierre Gloaguen, Sophie Donnet, Stéphanie Mahévas
  • for: 本研究旨在分析渔船的个体和集体行为,通过使用GPS轨迹数据和 convolutional neural networks 建立低维隐藏表示。
  • methods: 方法包括使用条件变分自编码器(conditional variational autoencoder)将轨迹数据映射到低维潜在空间,并使用 Bhattacharyya 系数比较轨迹的潜在分布;集体行为分析则通过构建邻近图,并使用面向多重网络的随机块模型扩展来完成。
  • results: 应用于法国渔船数据,可以获得不同时间段和地点的个体和集体行为征特。
    Abstract We propose an innovative and generic methodology to analyse individual and collective behaviour through individual trajectory data. The work is motivated by the analysis of GPS trajectories of fishing vessels collected from regulatory tracking data in the context of marine biodiversity conservation and ecosystem-based fisheries management. We build a low-dimensional latent representation of trajectories using convolutional neural networks as non-linear mapping. This is done by training a conditional variational auto-encoder taking into account covariates. The posterior distributions of the latent representations can be linked to the characteristics of the actual trajectories. The latent distributions of the trajectories are compared with the Bhattacharyya coefficient, which is well-suited for comparing distributions. Using this coefficient, we analyse the variation of the individual behaviour of each vessel during time. For collective behaviour analysis, we build proximity graphs and use an extension of the stochastic block model for multiple networks. This model results in a clustering of the individuals based on their set of trajectories. The application to French fishing vessels enables us to obtain groups of vessels whose individual and collective behaviours exhibit spatio-temporal patterns over the period 2014-2018.
    摘要 我们提出了一种创新且通用的方法,利用个体轨迹数据来分析个体和集体行为。这项工作的背景是海洋生物多样性保护与基于生态系统的渔业管理中,对监管追踪系统采集的渔船 GPS 轨迹进行分析。我们使用卷积神经网络作为非线性映射,构建轨迹的低维潜在表示,具体做法是训练一个考虑协变量的条件变分自编码器。潜在表示的后验分布可以与实际轨迹的特征相关联。我们使用适合比较分布的 Bhattacharyya 系数来比较轨迹的潜在分布,并据此分析每艘船个体行为随时间的变化。对于集体行为分析,我们构建邻近图,并使用面向多重网络的随机块模型扩展,基于每艘船的轨迹集合对个体进行聚类。将该方法应用于法国渔船数据,我们得到了在 2014-2018 年间个体与集体行为均呈现时空模式的船只群组。
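
As a concrete aid, here is a hedged sketch of the Bhattacharyya coefficient used above to compare latent trajectory distributions, written for Gaussian latents; the dimensions and test values are illustrative only.

```python
# Bhattacharyya coefficient between two multivariate Gaussian distributions.
import numpy as np

def bhattacharyya_coefficient(mu1, cov1, mu2, cov2):
    """BC = exp(-D_B), where D_B is the Bhattacharyya distance."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(
        np.linalg.det(cov) / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))
    )
    return float(np.exp(-(term1 + term2)))

# Identical distributions give BC = 1; it decays as the latents drift apart.
mu = np.zeros(2)
print(bhattacharyya_coefficient(mu, np.eye(2), mu, np.eye(2)))        # 1.0
print(bhattacharyya_coefficient(mu, np.eye(2), mu + 2.0, np.eye(2)))  # < 1
```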

From Mutual Information to Expected Dynamics: New Generalization Bounds for Heavy-Tailed SGD

  • paper_url: http://arxiv.org/abs/2312.00427
  • repo_url: None
  • paper_authors: Benjamin Dupuis, Paul Viallard
  • for: 本研究旨在解释现代机器学习算法的泛化能力,尤其是 SGD 的学习动力学与重尾动力学之间的关系。
  • methods: 本文利用对重尾动力学的研究,并将其应用于泛化理论。我们引入了一个几何解耦项,通过比较学习动力学(依赖经验风险)与期望动力学(依赖总体风险)来定义,并利用重尾与分形文献中的技术对该项给出可计算的上界。
  • results: 本文给出了不包含互信息项的泛化界,并在 PAC-Bayesian 设置中进一步收紧了该界。
    Abstract Understanding the generalization abilities of modern machine learning algorithms has been a major research topic over the past decades. In recent years, the learning dynamics of Stochastic Gradient Descent (SGD) have been related to heavy-tailed dynamics. This has been successfully applied to generalization theory by exploiting the fractal properties of those dynamics. However, the derived bounds depend on mutual information (decoupling) terms that are beyond the reach of computability. In this work, we prove generalization bounds over the trajectory of a class of heavy-tailed dynamics, without those mutual information terms. Instead, we introduce a geometric decoupling term by comparing the learning dynamics (depending on the empirical risk) with an expected one (depending on the population risk). We further upper-bound this geometric term, by using techniques from the heavy-tailed and the fractal literature, making it fully computable. Moreover, as an attempt to tighten the bounds, we propose a PAC-Bayesian setting based on perturbed dynamics, in which the same geometric term plays a crucial role and can still be bounded using the techniques described above.
    摘要 理解现代机器学习算法的泛化能力是过去几十年的主要研究课题。近年来,随机梯度下降(SGD)的学习动力学已被与重尾动力学联系起来,并通过利用这类动力学的分形性质成功应用于泛化理论。然而,由此导出的界依赖于互信息(解耦)项,而这些项无法实际计算。在本工作中,我们证明了一类重尾动力学轨迹上的泛化界,且不包含这些互信息项。取而代之,我们通过比较学习动力学(依赖经验风险)与期望动力学(依赖总体风险),引入了一个几何解耦项。我们进一步利用重尾与分形文献中的技术对该几何项给出上界,使其完全可计算。此外,为了收紧界,我们提出了基于扰动动力学的 PAC-Bayesian 设定,其中同一几何项仍然起关键作用,并且可以用上述技术加以约束。

A framework for mining lifestyle profiles through multi-dimensional and high-order mobility feature clustering

  • paper_url: http://arxiv.org/abs/2312.00411
  • repo_url: None
  • paper_authors: Yeshuo Shu, Gangcheng Zhang, Keyi Liu, Jintong Tang, Liyan Xu
  • for: 本研究旨在透过高级特征提取和聚类分析,揭示人们的生活方式特征。
  • methods: 该研究提出了一种渐进式特征提取策略,从空间、时间和语义三个维度挖掘高阶移动特征,包括出行模式(travel motifs)、由离散傅里叶变换(DFT)分解的移动时间序列节律,以及用 word2vec 向量化的地点语义,并对这些特征进行聚类以揭示用户的生活方式特征。
  • results: 实验使用了500000个用户的路径数据,可以分出7个用户群,每个群都具有不同的生活方式特征,这些特征可以通过跨级流动特征工程和聚类分析来揭示。
    Abstract Human mobility demonstrates a high degree of regularity, which facilitates the discovery of lifestyle profiles. Existing research has yet to fully utilize the regularities embedded in high-order features extracted from human mobility records in such profiling. This study proposes a progressive feature extraction strategy that mines high-order mobility features from users' moving trajectory records from the spatial, temporal, and semantic dimensions. Specific features are extracted such as travel motifs, rhythms decomposed by discrete Fourier transform (DFT) of mobility time series, and vectorized place semantics by word2vec, respectively to the three dimensions, and they are further clustered to reveal the users' lifestyle characteristics. An experiment using a trajectory dataset of over 500k users in Shenzhen, China yields seven user clusters with different lifestyle profiles that can be well interpreted by common sense. The results suggest the possibility of fine-grained user profiling through cross-order trajectory feature engineering and clustering.
    摘要 人类移动行为表现出高度的规律性,这有助于生活方式画像的发现。现有研究尚未充分利用人类移动记录中高阶特征所蕴含的规律进行此类画像。本研究提出了一种渐进式特征提取策略,从空间、时间和语义三个维度挖掘用户移动轨迹记录中的高阶移动特征。具体而言,分别对应三个维度提取出行模式(travel motifs)、由离散傅里叶变换(DFT)分解的移动时间序列节律,以及用 word2vec 向量化的地点语义,并对这些特征进行聚类以揭示用户的生活方式特征。在中国深圳超过 50 万用户的轨迹数据集上的实验得到了七个具有不同生活方式画像的用户群,且这些画像可以用常识很好地解释。结果表明,通过跨阶轨迹特征工程与聚类可以实现细粒度的用户画像。
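
A rough sketch of the rhythm-feature idea from the temporal dimension: take each user's hourly activity counts, keep the magnitudes of the leading DFT coefficients, and cluster on them. The synthetic data, feature count, and cluster number are placeholders, not the paper's pipeline.

```python
# DFT "rhythm" features from mobility time series, followed by clustering.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_users, n_hours = 100, 24 * 28           # four weeks of hourly activity counts

# Synthetic users: half with a strong daily rhythm, half more irregular.
t = np.arange(n_hours)
regular = 5 + 3 * np.sin(2 * np.pi * t / 24) + rng.poisson(1, (50, n_hours))
irregular = rng.poisson(5, (50, n_hours))
series = np.vstack([regular, irregular]).astype(float)

# Rhythm features: magnitude of the first few non-constant DFT coefficients.
spectrum = np.abs(np.fft.rfft(series, axis=1))
rhythm_features = spectrum[:, 1:9]         # drop the DC term, keep 8 harmonics

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(rhythm_features)
print(np.bincount(labels))                 # roughly a 50 / 50 split expected
```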

A Causality-Aware Pattern Mining Scheme for Group Activity Recognition in a Pervasive Sensor Space

  • paper_url: http://arxiv.org/abs/2312.00404
  • repo_url: None
  • paper_authors: Hyunju Kim, Heesuk Son, Dongman Lee
  • for: 本研究旨在提出一种高效的群体活动识别方案,以支持智能空间中无障碍和隐私问题的人活动识别。
  • methods: 本方案从普适传感器事件序列中提取因果模式:先利用一组规则筛除无关的噪声事件并突出因果相关事件,再使用模式树(pattern-tree)算法提取频繁因果模式,最后使用基于加权和的模式匹配算法进行群体活动识别。
  • results: 实验结果表明,提出的方案在实际环境中具有较高的识别精度和较小的运行时间开销,比现有方案高效。
    Abstract Human activity recognition (HAR) is a key challenge in pervasive computing and its solutions have been presented based on various disciplines. Specifically, for HAR in a smart space without privacy and accessibility issues, data streams generated by deployed pervasive sensors are leveraged. In this paper, we focus on a group activity by which a group of users perform a collaborative task without user identification and propose an efficient group activity recognition scheme which extracts causality patterns from pervasive sensor event sequences generated by a group of users to support as good recognition accuracy as the state-of-the-art graphical model. To filter out irrelevant noise events from a given data stream, a set of rules is leveraged to highlight causally related events. Then, a pattern-tree algorithm extracts frequent causal patterns by means of a growing tree structure. Based on the extracted patterns, a weighted sum-based pattern matching algorithm computes the likelihoods of stored group activities to the given test event sequence by means of matched event pattern counts for group activity recognition. We evaluate the proposed scheme using the data collected from our testbed and CASAS datasets where users perform their tasks on a daily basis and validate its effectiveness in a real environment. Experiment results show that the proposed scheme performs higher recognition accuracy and with a small amount of runtime overhead than the existing schemes.
    摘要 人类活动识别(HAR)是普适计算中的关键挑战,其解决方案基于多种学科。特别是在无隐私与可达性问题的智能空间中,可利用部署的普适传感器生成的数据流进行 HAR。在这篇论文中,我们关注由多个用户在不进行用户识别的情况下共同完成协作任务的群体活动,并提出一种高效的群体活动识别方案。该方案从一组用户产生的普适传感器事件序列中提取因果模式,以达到与最先进图模型相当的识别精度。为滤除数据流中无关的噪声事件,我们利用一组规则来突出因果相关的事件;然后,模式树算法通过不断生长的树结构提取频繁因果模式;基于所提取的模式,基于加权和的模式匹配算法利用匹配的事件模式计数,计算已存储的各群体活动对给定测试事件序列的似然,从而完成群体活动识别。我们使用实验平台和 CASAS 数据集(用户按日常方式执行任务)对所提方案进行评估,验证了其在真实环境中的有效性。实验结果表明,所提方案的识别精度更高,且运行时开销小于现有方案。
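
The final matching step described above can be pictured with a small sketch: score each stored group activity by a weighted sum over its frequent causal patterns that appear as subsequences of the test event sequence. Event names, patterns, and weights below are invented for illustration; the rule-based filtering and pattern-tree mining stages are not shown.

```python
# Weighted-sum pattern matching over a test event sequence (toy version).
def is_subsequence(pattern, sequence):
    it = iter(sequence)
    return all(event in it for event in pattern)

def score_activities(test_events, activity_patterns):
    """activity_patterns: {activity: [(pattern_tuple, weight), ...]}"""
    scores = {}
    for activity, patterns in activity_patterns.items():
        scores[activity] = sum(
            w for pattern, w in patterns if is_subsequence(pattern, test_events)
        )
    return scores

# Toy example with hypothetical sensor events and weights.
activity_patterns = {
    "meeting":  [(("door", "chair", "projector"), 2.0), (("light", "chair"), 1.0)],
    "cleaning": [(("door", "light", "door"), 1.5)],
}
test_events = ["door", "light", "chair", "projector"]
print(score_activities(test_events, activity_patterns))
# -> {'meeting': 3.0, 'cleaning': 0.0}
```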

GFN-SR: Symbolic Regression with Generative Flow Networks

  • paper_url: http://arxiv.org/abs/2312.00396
  • repo_url: None
  • paper_authors: Sida Li, Ioana Marinescu, Sebastian Musslick
  • for: 这个论文主要目标是解决符号回归问题,以便生成最佳的表达树。
  • methods: 该方法使用生成流网络(GFlowNet)学习随机策略,将构建表达树的过程建模为在有向无环图上的遍历,从而按顺序生成表达树,并结合自适应奖励基线生成多样化的最佳拟合表达。
  • results: 与其他符号回归算法相比,GFN-SR 在含噪声数据情形下表现更好,这得益于它能够在候选解空间上学习奖励分布,从而生成多样化的最佳拟合表达树。
    Abstract Symbolic regression (SR) is an area of interpretable machine learning that aims to identify mathematical expressions, often composed of simple functions, that best fit in a given set of covariates $X$ and response $y$. In recent years, deep symbolic regression (DSR) has emerged as a popular method in the field by leveraging deep reinforcement learning to solve the complicated combinatorial search problem. In this work, we propose an alternative framework (GFN-SR) to approach SR with deep learning. We model the construction of an expression tree as traversing through a directed acyclic graph (DAG) so that GFlowNet can learn a stochastic policy to generate such trees sequentially. Enhanced with an adaptive reward baseline, our method is capable of generating a diverse set of best-fitting expressions. Notably, we observe that GFN-SR outperforms other SR algorithms in noisy data regimes, owing to its ability to learn a distribution of rewards over a space of candidate solutions.
    摘要 符号回归(SR)是可解释机器学习的一个领域,旨在寻找最能拟合给定协变量 $X$ 与响应 $y$ 的数学表达式(通常由简单函数构成)。近年来,深度符号回归(DSR)借助深度强化学习求解复杂的组合搜索问题,成为该领域的流行方法。在这项工作中,我们提出了一个基于深度学习的替代框架(GFN-SR)来求解 SR 问题。我们将构建表达树的过程建模为在有向无环图(DAG)上的遍历,使 GFlowNet 能够学习一个随机策略来按顺序生成这些树。借助自适应奖励基线,我们的方法能够生成多样化的最佳拟合表达。值得注意的是,我们观察到 GFN-SR 在含噪声数据情形下优于其他 SR 算法,这归因于它能够在候选解空间上学习奖励分布。

LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices

  • paper_url: http://arxiv.org/abs/2312.00388
  • repo_url: None
  • paper_authors: Junchen Zhao, Yurun Song, Simeng Liu, Ian G. Harris, Sangeetha Abdu Jyothi
  • for: 这篇论文旨在解决在移动设备上部署大型语言模型(LLM)的问题:LLM 巨大的内存需求对移动设备的资源提出了很高的要求。
  • methods: 这篇论文使用了三种关键策略来解决这个问题:首先,使用优化的模型分配技术将 LLM 切分为多个部分,并通过线性优化使各部分与每台设备的能力相匹配;其次,使用优化的数据传输机制,确保数据流动高效且结构化,同时保持原始模型结构的完整性;最后,论文引入运行时负载均衡器,实时监控并重新分配任务,以避免瓶颈,提高整体效率和响应性。
  • results: 论文通过在多种移动设备上的广泛测试证明,LinguaLinked 可以实现高效的 LLM 推理,同时保持稳定的吞吐量和最小的延迟。在单线程设定下,LinguaLinked 相比基线取得了 1.11 倍至 1.61 倍的推理性能提升;在多线程设定下取得了 1.73 倍至 2.65 倍的提升;运行时负载均衡器还带来了 1.29 倍至 1.32 倍的整体推理加速。
    Abstract Deploying Large Language Models (LLMs) locally on mobile devices presents a significant challenge due to their extensive memory requirements. In this paper, we introduce LinguaLinked, a system for decentralized, distributed LLM inference on mobile devices. LinguaLinked enables collaborative execution of the inference task across multiple trusted devices. LinguaLinked ensures data privacy by processing information locally. LinguaLinked uses three key strategies. First, an optimized model assignment technique segments LLMs and uses linear optimization to align segments with each device's capabilities. Second, an optimized data transmission mechanism ensures efficient and structured data flow between model segments while also maintaining the integrity of the original model structure. Finally, LinguaLinked incorporates a runtime load balancer that actively monitors and redistributes tasks among mobile devices to prevent bottlenecks, enhancing the system's overall efficiency and responsiveness. We demonstrate that LinguaLinked facilitates efficient LLM inference while maintaining consistent throughput and minimal latency through extensive testing across various mobile devices, from high-end to low-end Android devices. In our evaluations, compared to the baseline, LinguaLinked achieves an inference performance acceleration of $1.11\times$ to $1.61\times$ in single-threaded settings, $1.73\times$ to $2.65\times$ with multi-threading. Additionally, runtime load balancing yields an overall inference acceleration of $1.29\times$ to $1.32\times$.
    摘要 由于大型语言模型(LLM)对内存的需求极高,在移动设备上本地部署 LLM 是一项重大挑战。在这篇论文中,我们介绍了 LinguaLinked,一个面向移动设备的去中心化、分布式 LLM 推理系统。LinguaLinked 支持在多台可信设备之间协同执行推理任务,并通过本地处理信息来保证数据隐私。LinguaLinked 使用三种关键策略:首先,优化的模型分配技术将 LLM 切分为多个部分,并通过线性优化使各部分与每台设备的能力相匹配;其次,优化的数据传输机制确保模型各部分之间的数据流动高效且结构化,同时保持原始模型结构的完整性;最后,LinguaLinked 包含一个运行时负载均衡器,实时监控并在移动设备之间重新分配任务,以避免瓶颈,提高系统整体的效率和响应性。我们通过在从高端到低端 Android 设备的多种移动设备上进行广泛测试,证明 LinguaLinked 能够在保持稳定吞吐量和最小延迟的同时实现高效的 LLM 推理。与基线相比,LinguaLinked 在单线程设定下取得 1.11 倍至 1.61 倍的推理性能加速,在多线程设定下取得 1.73 倍至 2.65 倍的加速;此外,运行时负载均衡还带来 1.29 倍至 1.32 倍的整体推理加速。
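
To make the model-assignment step concrete, here is a deliberately simplified stand-in for the paper's linear-optimization formulation: contiguous transformer segments are handed to devices in proportion to their free memory. The layer sizes and device budgets are made-up numbers.

```python
# Greedy proportional split of model layers across devices (illustrative only).
def assign_segments(layer_sizes_mb, device_memory_mb):
    total_mem = sum(device_memory_mb)
    total_size = sum(layer_sizes_mb)
    assignment, layer = [], 0
    for i, mem in enumerate(device_memory_mb):
        budget = total_size * mem / total_mem        # proportional share
        segment, used = [], 0.0
        last_device = (i == len(device_memory_mb) - 1)
        while layer < len(layer_sizes_mb) and (used < budget or last_device):
            segment.append(layer)
            used += layer_sizes_mb[layer]
            layer += 1
        assignment.append(segment)
    return assignment

layers = [120] * 24                        # e.g. 24 transformer blocks of ~120 MB
devices = [4096, 2048, 1024]               # three phones with different free RAM
for mem, seg in zip(devices, assign_segments(layers, devices)):
    print(mem, "MB ->", len(seg), "layers")
```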

Optimal Sample Complexity of Contrastive Learning

  • paper_url: http://arxiv.org/abs/2312.00379
  • repo_url: None
  • paper_authors: Noga Alon, Dmitrii Avdiukhin, Dor Elboim, Orr Fischer, Grigory Yaroslavtsev
  • for: 本文研究了对带有标签的对象的对比学习,即学习将数据点与其他数据点之间的距离关系匹配的技术。
  • methods: 本文使用了对比学习方法,并研究了这种方法的采样复杂性。
  • results: 本文给出了对比学习采样复杂性的紧binding的下界,包括对于一般的 $\ell_p$ 距离函数和树 metric。特别是,对于任意 $p \geq 1$,我们证明了 $\tilde \Theta(\min(nd,n^2))$ 标注对需要进行 $d$-维表示学习。
    Abstract Contrastive learning is a highly successful technique for learning representations of data from labeled tuples, specifying the distance relations within the tuple. We study the sample complexity of contrastive learning, i.e. the minimum number of labeled tuples sufficient for getting high generalization accuracy. We give tight bounds on the sample complexity in a variety of settings, focusing on arbitrary distance functions, both general $\ell_p$-distances, and tree metrics. Our main result is an (almost) optimal bound on the sample complexity of learning $\ell_p$-distances for integer $p$. For any $p \ge 1$ we show that $\tilde \Theta(\min(nd,n^2))$ labeled tuples are necessary and sufficient for learning $d$-dimensional representations of $n$-point datasets. Our results hold for an arbitrary distribution of the input samples and are based on giving the corresponding bounds on the Vapnik-Chervonenkis/Natarajan dimension of the associated problems. We further show that the theoretical bounds on sample complexity obtained via VC/Natarajan dimension can have strong predictive power for experimental results, in contrast with the folklore belief about a substantial gap between the statistical learning theory and the practice of deep learning.
    摘要 对比学习是一种非常成功的技术,用于从指定组内距离关系的带标签元组中学习数据表示。我们研究对比学习的样本复杂度,即获得高泛化准确率所需的最少带标签元组数量。我们在多种设置下给出了紧的界,重点关注任意距离函数,包括一般的 $\ell_p$ 距离和树度量。我们的主要结果是对整数 $p$ 的 $\ell_p$ 距离学习所需样本复杂度的(几乎)最优界:对任意 $p \ge 1$,我们证明学习 $n$ 个点数据集的 $d$ 维表示需要且只需要 $\tilde \Theta(\min(nd,n^2))$ 个带标签元组。我们的结果适用于输入样本的任意分布,并基于对相应问题的 Vapnik-Chervonenkis/Natarajan 维度给出的界。我们进一步说明,通过 VC/Natarajan 维度得到的理论样本复杂度界对实验结果具有很强的预测力,这与认为统计学习理论与深度学习实践之间存在巨大差距的普遍看法形成对比。

Streaming Bayesian Modeling for predicting Fat-Tailed Customer Lifetime Value

  • paper_url: http://arxiv.org/abs/2312.00373
  • repo_url: None
  • paper_authors: Alexey V. Calabourdin, Konstantin A. Aksenov
  • for: 这篇论文提出了一种适用于层次贝叶斯模型和广义线性模型(GLM)的在线学习 MCMC 方法。
  • methods: 这篇论文开发了在线学习 MCMC 方法,可应用于层次贝叶斯模型和 GLM;同时开发了一种能够涵盖多种厚尾与薄尾情形的厚尾 LTV(客户终身价值)模型。
  • results: 这篇论文在一个大型移动应用的商业 LTV 数据上展示了这两项工作,并取得了良好的效果。
    Abstract We develop an online learning MCMC approach applicable for hierarchical bayesian models and GLMS. We also develop a fat-tailed LTV model that generalizes over several kinds of fat and thin tails. We demonstrate both developments on commercial LTV data from a large mobile app.
    摘要 我们开发了一种适用于层次贝叶斯模型和广义线性模型(GLM)的在线学习 MCMC 方法。我们还开发了一种能够涵盖多种厚尾与薄尾情形的厚尾 LTV 模型。我们在一个大型移动应用的商业 LTV 数据上展示了这两项工作。

Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training

  • paper_url: http://arxiv.org/abs/2312.00359
  • repo_url: https://github.com/yefanzhou/tempbalance
  • paper_authors: Yefan Zhou, Tianyu Pang, Keqin Liu, Charles H. Martin, Michael W. Mahoney, Yaoqing Yang
  • for: This paper focuses on improving the training of neural networks by using a layer-wise learning rate method called TempBalance, which is based on Heavy-Tailed Self-Regularization (HT-SR) theory.
  • methods: The paper proposes using HT-SR-motivated metrics to guide the scheduling and balancing of temperature across all network layers during model training, resulting in improved performance during testing.
  • results: The paper shows that TempBalance significantly outperforms ordinary SGD and carefully-tuned spectral norm regularization on several benchmark datasets, including CIFAR10, CIFAR100, SVHN, and TinyImageNet, using ResNets, VGGs, and WideResNets with various depths and widths. Additionally, TempBalance outperforms a number of state-of-the-art optimizers and learning rate schedulers.
    Abstract Regularization in modern machine learning is crucial, and it can take various forms in algorithmic design: training set, model family, error function, regularization terms, and optimizations. In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training. Indeed, many widely adopted training strategies basically just define the decay of the learning rate over time. This process can be interpreted as decreasing a temperature, using either a global learning rate (for the entire model) or a learning rate that varies for each parameter. This paper proposes TempBalance, a straightforward yet effective layer-wise learning rate method. TempBalance is based on Heavy-Tailed Self-Regularization (HT-SR) Theory, an approach which characterizes the implicit self-regularization of different layers in trained models. We demonstrate the efficacy of using HT-SR-motivated metrics to guide the scheduling and balancing of temperature across all network layers during model training, resulting in improved performance during testing. We implement TempBalance on CIFAR10, CIFAR100, SVHN, and TinyImageNet datasets using ResNets, VGGs, and WideResNets with various depths and widths. Our results show that TempBalance significantly outperforms ordinary SGD and carefully-tuned spectral norm regularization. We also show that TempBalance outperforms a number of state-of-the-art optimizers and learning rate schedulers.
    摘要 正则化在现代机器学习中非常重要,它可以在算法设计中以多种形式出现:训练集、模型族、误差函数、正则化项以及优化方式。特别是学习率,可以被看作学习的统计力学中类似温度的参数,在神经网络训练中扮演着关键角色。实际上,许多广泛采用的训练策略本质上只是定义学习率随时间的衰减。这个过程可以被看作降低温度,既可以使用全局学习率(作用于整个模型),也可以对每个参数使用不同的学习率。本文提出了 TempBalance,一种简单而有效的逐层学习率方法。TempBalance 基于重尾自正则化(HT-SR)理论,该理论刻画了已训练模型中不同层的隐式自正则化。我们展示了如何利用 HT-SR 启发的指标来指导训练过程中各网络层之间温度的调度与平衡,从而提高测试性能。我们在 CIFAR10、CIFAR100、SVHN 和 TinyImageNet 数据集上,使用不同深度和宽度的 ResNet、VGG 和 WideResNet 实现了 TempBalance。结果表明,TempBalance 显著优于普通 SGD 和精心调参的谱范数正则化。我们还证明了 TempBalance 优于多种最先进的优化器和学习率调度器。
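
The released TempBalance code is linked above; the sketch below is only a schematic of the HT-SR idea as described: estimate a heavy-tail exponent per weight matrix from its eigenvalue spectrum and scale per-layer learning rates accordingly. The Hill-style estimator, the scaling rule, and all constants are simplifications, not the paper's exact metric.

```python
# Schematic layer-wise learning rates driven by a heavy-tail exponent estimate.
import torch

def layer_alpha(weight, k_frac=0.2):
    """Hill-style tail-index estimate from the ESD of W^T W (a rough proxy)."""
    w = weight.detach().flatten(1)             # (out, in)
    eig = torch.linalg.svdvals(w) ** 2         # eigenvalues of W^T W
    eig, _ = torch.sort(eig, descending=True)
    k = max(2, int(k_frac * len(eig)))
    tail = eig[:k]
    denom = torch.sum(torch.log(tail / tail[-1])).item() + 1e-12
    return 1.0 + k / denom

def tempbalance_like_lrs(model, base_lr=0.1):
    alphas = {n: layer_alpha(p) for n, p in model.named_parameters() if p.ndim >= 2}
    mean_alpha = sum(alphas.values()) / len(alphas)
    groups = []
    for name, p in model.named_parameters():
        scale = alphas[name] / mean_alpha if name in alphas else 1.0
        groups.append({"params": [p], "lr": base_lr * scale})
    return groups

model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(), torch.nn.Linear(64, 10))
opt = torch.optim.SGD(tempbalance_like_lrs(model, base_lr=0.1), lr=0.1, momentum=0.9)
print([round(g["lr"], 4) for g in opt.param_groups])
```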

Transfer learning for predicting source terms of principal component transport in chemically reactive flow

  • paper_url: http://arxiv.org/abs/2312.00356
  • repo_url: None
  • paper_authors: Ki Sung Jung, Tarek Echekki, Jacqueline H. Chen, Mohammad Khalil
  • for: 本研究旨在 evaluating whether the number of requisite training samples can be reduced with the use of various transfer learning models for predicting chemical source terms of a data-driven reduced-order model that represents the homogeneous ignition process of a hydrogen/air mixture.
  • methods: 本研究使用主成分分析降低数据维度,并使用人工神经网络(ANN)制表主成分的反应速率,随后求解相应的常微分方程组。
  • results: 当目标任务(即 T0 > 1000 K 且 phi 取多种值)中的训练样本数量减少时,降阶模型无法预测氢/空气混合物的点火演化。随后,三种迁移学习策略被应用于 ANN 模型的训练。通过控制 ANN 模型的初始化和正则化,可以显著提升降阶模型在稀疏数据下的性能。此外,当源任务与目标任务的相似度较低时,改变目标任务中 ANN 模型的初始化方案还可以获得额外的性能提升。
    Abstract The objective of this study is to evaluate whether the number of requisite training samples can be reduced with the use of various transfer learning models for predicting, for example, the chemical source terms of the data-driven reduced-order model that represents the homogeneous ignition process of a hydrogen/air mixture. Principal component analysis is applied to reduce the dimensionality of the hydrogen/air mixture in composition space. Artificial neural networks (ANNs) are used to tabulate the reaction rates of principal components, and subsequently, a system of ordinary differential equations is solved. As the number of training samples decreases at the target task (i.e.,for T0 > 1000 K and various phi), the reduced-order model fails to predict the ignition evolution of a hydrogen/air mixture. Three transfer learning strategies are then applied to the training of the ANN model with a sparse dataset. The performance of the reduced-order model with a sparse dataset is found to be remarkably enhanced if the training of the ANN model is restricted by a regularization term that controls the degree of knowledge transfer from source to target tasks. To this end, a novel transfer learning method is introduced, parameter control via partial initialization and regularization (PaPIR), whereby the amount of knowledge transferred is systemically adjusted for the initialization and regularization of the ANN model in the target task. It is found that an additional performance gain can be achieved by changing the initialization scheme of the ANN model in the target task when the task similarity between source and target tasks is relatively low.
    摘要 本研究的目标是评估使用多种迁移学习模型能否降低所需的训练样本数量,用于预测表示氢/空气混合物均相点火过程的数据驱动降阶模型中的化学源项。我们使用主成分分析在组分空间中降低氢/空气混合物的维度,使用人工神经网络(ANN)制表主成分的反应速率,随后求解相应的常微分方程组。随着目标任务(即 T0 > 1000 K 且 phi 取多种值)中训练样本数量的减少,降阶模型无法预测氢/空气混合物的点火演化。随后,我们将三种迁移学习策略应用于稀疏数据下 ANN 模型的训练。若通过一个控制源任务到目标任务知识迁移程度的正则化项约束 ANN 模型的训练,降阶模型在稀疏数据下的性能可以得到显著提升。为此,本文提出了一种新的迁移学习方法 PaPIR(parameter control via partial initialization and regularization),系统地调整目标任务中 ANN 模型初始化与正则化所迁移的知识量。研究还发现,当源任务与目标任务的相似度较低时,改变目标任务中 ANN 模型的初始化方案可以带来额外的性能提升。
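
A conceptual sketch of the baseline reduced-order pipeline described above (PCA of the composition space, an ANN that tabulates principal-component source terms, then ODE integration); the thermochemical data here are random placeholders and the PaPIR transfer-learning step is not shown.

```python
# PCA-reduced composition space + ANN-tabulated source terms + ODE integration.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
states = rng.random((2000, 9))            # stand-in for species + temperature
rates = rng.random((2000, 9)) - 0.5       # stand-in for chemical source terms

# Reduce composition space and learn d(PC)/dt as a function of PC.
pca = PCA(n_components=3).fit(states)
pcs = pca.transform(states)
pc_rates = rates @ pca.components_.T      # project source terms onto the PCs
ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0)
ann.fit(pcs, pc_rates)

# Integrate the reduced-order model: the ANN closes the ODE system.
def rhs(t, z):
    return ann.predict(z.reshape(1, -1)).ravel()

sol = solve_ivp(rhs, (0.0, 1.0), pcs[0], max_step=0.1)
print(sol.y.shape)    # (3, n_time_steps)
```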

Quantum Kernel t-Distributed Stochastic Neighbor Embedding

  • paper_url: http://arxiv.org/abs/2312.00352
  • repo_url: None
  • paper_authors: Yoshiaki Kawase, Kosuke Mitarai, Keisuke Fujii
  • for: Visualizing quantum states and optimization trajectories, allowing for a better understanding of quantum circuits and algorithms.
  • methods: Uses quantum kernels for fast and highly accurate visualization of quantum states, compared with classical kernel methods.
  • results: Successfully visualized the hand-written digits dataset and the optimization trajectories of finding the ground states of the transverse field Ising model, without degrading the separability of the input higher-dimensional data.
    Abstract Data visualization is important in understanding the characteristics of data that are difficult to see directly. It is used to visualize loss landscapes and optimization trajectories to analyze optimization performance. Popular optimization analysis is performed by visualizing a loss landscape around the reached local or global minimum using principal component analysis. However, this visualization depends on the variational parameters of a quantum circuit rather than quantum states, which makes it difficult to understand the mechanism of optimization process through the property of quantum states. Here, we propose a quantum data visualization method using quantum kernels, which enables us to offer fast and highly accurate visualization of quantum states. In our numerical experiments, we visualize hand-written digits dataset and apply $k$-nearest neighbor algorithm to the low-dimensional data to quantitatively evaluate our proposed method compared with a classical kernel method. As a result, our proposed method achieves comparable accuracy to the state-of-the-art classical kernel method, meaning that the proposed visualization method based on quantum machine learning does not degrade the separability of the input higher dimensional data. Furthermore, we visualize the optimization trajectories of finding the ground states of transverse field Ising model and successfully find the trajectory characteristics. Since quantum states are higher dimensional objects that can only be seen via observables, our visualization method, which inherits the similarity of quantum data, would be useful in understanding the behavior of quantum circuits and algorithms.
    摘要 “数据视觉是重要的,它可以帮助我们理解直观不可见的数据特性。它可以用来视觉损失地形和优化轨迹,以分析优化性能。现有的优化分析通常是使用主成分分析来视觉损失地形绕着达到的本地或全球最小值。但这种视觉依赖于量子环境中的变量参数,而不是量子态,这使得我们很难通过量子态的性质来理解优化过程的机制。为了解决这个问题,我们提出了一种基于量子机器学习的量子数据视觉方法。我们的方法可以快速和高度准确地视觉量子态。在我们的数字实验中,我们使用手写数据集和$k$-最近邻algorithm来评估我们的提议方法,并与经典kernel方法进行比较。结果显示,我们的提议方法可以与经典kernel方法相比,表明量子机器学习基于的视觉方法不会降低输入的维度高数据的分离性。此外,我们还可以视觉优化轨迹,并成功地发现了搜索的特性。由于量子态是高维度的对象,只能通过观察来看到,我们的视觉方法,它继承了量子数据的相似性,将会对量子环境和算法的行为提供有用的理解。”

TRC: Trust Region Conditional Value at Risk for Safe Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2312.00344
  • repo_url: https://github.com/rllab-snu/Trust-Region-CVaR
  • paper_authors: Dohyeong Kim, Songhwai Oh
  • for: 本研究旨在提出一种基于信任域的安全强化学习方法(TRC),在满足 CVaR 约束的同时最大化期望回报。
  • methods: 本方法首先推导 CVaR 的上界,然后在信任域内以可微形式近似该上界,并通过迭代求解相应的子问题来训练策略。
  • results: 在仿真导航任务中,TRC 相比其他安全强化学习方法性能提升 1.93 倍,同时在所有实验中都满足约束。
    Abstract As safety is of paramount importance in robotics, reinforcement learning that reflects safety, called safe RL, has been studied extensively. In safe RL, we aim to find a policy which maximizes the desired return while satisfying the defined safety constraints. There are various types of constraints, among which constraints on conditional value at risk (CVaR) effectively lower the probability of failures caused by high costs since CVaR is a conditional expectation obtained above a certain percentile. In this paper, we propose a trust region-based safe RL method with CVaR constraints, called TRC. We first derive the upper bound on CVaR and then approximate the upper bound in a differentiable form in a trust region. Using this approximation, a subproblem to get policy gradients is formulated, and policies are trained by iteratively solving the subproblem. TRC is evaluated through safe navigation tasks in simulations with various robots and a sim-to-real environment with a Jackal robot from Clearpath. Compared to other safe RL methods, the performance is improved by 1.93 times while the constraints are satisfied in all experiments.
    摘要 由于安全性在机器人领域至关重要,反映安全性的强化学习(即安全强化学习,safe RL)得到了广泛研究。在安全强化学习中,我们的目标是在满足既定安全约束的前提下找到最大化期望回报的策略。约束有多种类型,其中条件风险价值(CVaR)约束能有效降低高代价导致失败的概率,因为 CVaR 是超过某一分位数之后的条件期望。在这篇论文中,我们提出了一种带 CVaR 约束、基于信任域的安全强化学习方法,称为 TRC。我们首先推导 CVaR 的上界,然后在信任域内以可微形式近似该上界。基于该近似,我们构造了用于计算策略梯度的子问题,并通过迭代求解该子问题来训练策略。TRC 在仿真中的多种机器人安全导航任务以及使用 Clearpath 的 Jackal 机器人的仿真到实物环境中进行了评估。与其他安全强化学习方法相比,TRC 的性能提升了 1.93 倍,并且在所有实验中都满足约束。
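
Since the constraint above is on CVaR, a tiny numerical sketch helps: CVaR at level alpha is simply the mean of the worst (1 - alpha) fraction of sampled costs, so it reacts to rare, expensive failures that the plain mean hides. The cost distribution below is arbitrary.

```python
# Empirical conditional value at risk (CVaR) of episode costs.
import numpy as np

def cvar(costs, alpha=0.95):
    """Mean of the worst (1 - alpha) fraction of sampled costs."""
    costs = np.sort(np.asarray(costs))
    var = np.quantile(costs, alpha)           # value at risk (the percentile)
    tail = costs[costs >= var]
    return float(tail.mean())

rng = np.random.default_rng(0)
episode_costs = rng.exponential(scale=1.0, size=10_000)
print("mean cost :", round(float(episode_costs.mean()), 3))
print("CVaR(95%) :", round(cvar(episode_costs, alpha=0.95), 3))   # much larger
```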

Hypergraph Node Representation Learning with One-Stage Message Passing

  • paper_url: http://arxiv.org/abs/2312.00336
  • repo_url: None
  • paper_authors: Shilin Qu, Weiqing Wang, Yuan-Fang Li, Xin Zhou, Fajie Yuan
  • for: 该论文的目的是提出一种基于 Transformer 框架的超图节点表示学习方法,以提升半监督超图节点分类任务的性能。
  • methods: 该论文提出一种基于一阶段消息传递的超图节点表示学习方法,通过将注意力矩阵与超图拉普拉斯矩阵结合,把超图结构信息(局部信息)注入 Transformer(全局信息)中。
  • results: 在五个代表性的基准数据集上,该方法在半监督超图节点分类任务上取得了新的最优结果,相比最新的超图学习方法,准确率提升 2.52% 至 6.70%。
    Abstract Hypergraphs as an expressive and general structure have attracted considerable attention from various research domains. Most existing hypergraph node representation learning techniques are based on graph neural networks, and thus adopt the two-stage message passing paradigm (i.e. node -> hyperedge -> node). This paradigm only focuses on local information propagation and does not effectively take into account global information, resulting in less optimal representations. Our theoretical analysis of representative two-stage message passing methods shows that, mathematically, they model different ways of local message passing through hyperedges, and can be unified into one-stage message passing (i.e. node -> node). However, they still only model local information. Motivated by this theoretical analysis, we propose a novel one-stage message passing paradigm to model both global and local information propagation for hypergraphs. We integrate this paradigm into HGraphormer, a Transformer-based framework for hypergraph node representation learning. HGraphormer injects the hypergraph structure information (local information) into Transformers (global information) by combining the attention matrix and hypergraph Laplacian. Extensive experiments demonstrate that HGraphormer outperforms recent hypergraph learning methods on five representative benchmark datasets on the semi-supervised hypernode classification task, setting new state-of-the-art performance, with accuracy improvements between 2.52% and 6.70%. Our code and datasets are available.
    摘要 Our theoretical analysis of representative two-stage message passing methods reveals that they can be unified into a one-stage message passing approach (i.e., node -> node), but they still only capture local information. Motivated by this analysis, we propose a novel one-stage message passing paradigm that models both global and local information propagation for hypergraphs.We integrate this paradigm into HGraphormer, a Transformer-based framework for hypergraph node representation learning. HGraphormer combines the attention matrix and hypergraph Laplacian to inject the hypergraph structure information (local information) into Transformers (global information).Experimental results on five benchmark datasets for semi-supervised hypernode classification demonstrate that HGraphormer outperforms recent hypergraph learning methods, achieving new state-of-the-art performance with accuracy improvements ranging from 2.52% to 6.70%. Our code and datasets are available.
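
A toy illustration of the "inject local structure into global attention" idea: build a normalised hypergraph operator from the incidence matrix and mix it with a dense attention matrix before propagating node features. This is not the HGraphormer implementation; the mixing weight and the toy hypergraph are arbitrary.

```python
# Mixing a hypergraph operator (local) with an attention matrix (global).
import numpy as np

def hypergraph_operator(H, edge_weights=None):
    """H: (n_nodes, n_edges) incidence matrix; returns Dv^-1/2 H W De^-1 H^T Dv^-1/2."""
    n_nodes, n_edges = H.shape
    W = np.diag(edge_weights if edge_weights is not None else np.ones(n_edges))
    De_inv = np.diag(1.0 / H.sum(axis=0))
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(H.sum(axis=1)))
    return Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt

def softmax(x, axis=-1):
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
H = np.array([[1, 0], [1, 1], [1, 0], [0, 1]], dtype=float)   # 4 nodes, 2 hyperedges
X = rng.normal(size=(4, 8))                                    # node features

attention = softmax(X @ X.T / np.sqrt(X.shape[1]))             # global information
structure = hypergraph_operator(H)                             # local information
mixed = 0.5 * attention + 0.5 * structure                      # one-stage mixing
out = mixed @ X
print(out.shape)    # (4, 8)
```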

ESM-NBR: fast and accurate nucleic acid-binding residue prediction via protein language model feature representation and multi-task learning

  • paper_url: http://arxiv.org/abs/2312.00842
  • repo_url: https://github.com/wwzll123/esm-nbr
  • paper_authors: Wenwu Zeng, Dafeng Lv, Wenjuan Liu, Shaoliang Peng
  • for: 本研究旨在提出一种快速和准确的基于序列的方法,用于预测蛋白质中的核酸绑定位点。
  • methods: 本研究使用大型蛋白质语言模型 ESM2 提取特征表示,然后使用堆叠双向长短期记忆网络(BiLSTM)和多层感知机(MLP)进行预测。
  • results: 实验结果表明,ESM2特征表示的预测性能大大超过了基于进化信息的隐马尔科夫模型(HMM)特征。此外,ESM-NBR在两个独立的测试集上的MCC值为0.427和0.391,相比之下,第二个最佳方法的MCC值高出18.61%和10.45%。
    Abstract Protein-nucleic acid interactions play a very important role in a variety of biological activities. Accurate identification of nucleic acid-binding residues is a critical step in understanding the interaction mechanisms. Although many computationally based methods have been developed to predict nucleic acid-binding residues, challenges remain. In this study, a fast and accurate sequence-based method, called ESM-NBR, is proposed. In ESM-NBR, we first use the large protein language model ESM2 to extract discriminative biological properties feature representation from protein primary sequences; then, a multi-task deep learning model composed of stacked bidirectional long short-term memory (BiLSTM) and multi-layer perceptron (MLP) networks is employed to explore common and private information of DNA- and RNA-binding residues with ESM2 feature as input. Experimental results on benchmark data sets demonstrate that the prediction performance of ESM2 feature representation comprehensively outperforms evolutionary information-based hidden Markov model (HMM) features. Meanwhile, the ESM-NBR obtains the MCC values for DNA-binding residues prediction of 0.427 and 0.391 on two independent test sets, which are 18.61 and 10.45% higher than those of the second-best methods, respectively. Moreover, by completely discarding the time-cost multiple sequence alignment process, the prediction speed of ESM-NBR far exceeds that of existing methods (5.52s for a protein sequence of length 500, which is about 16 times faster than the second-fastest method). A user-friendly standalone package and the data of ESM-NBR are freely available for academic use at: https://github.com/wwzll123/ESM-NBR.
    摘要 蛋白质-核酸相互作用在多种生物活动中起着非常重要的作用,准确识别核酸结合残基是理解相互作用机制的关键步骤。尽管已经开发了许多基于计算的方法来预测核酸结合残基,挑战依然存在。本研究提出了一种快速而准确的基于序列的方法 ESM-NBR。在 ESM-NBR 中,我们首先使用大型蛋白质语言模型 ESM2 从蛋白质一级序列中提取具有判别力的生物特征表示;然后,以 ESM2 特征为输入,采用由堆叠双向长短期记忆网络(BiLSTM)和多层感知机(MLP)组成的多任务深度学习模型,挖掘 DNA 与 RNA 结合残基的共有信息和特有信息。在基准数据集上的实验结果表明,ESM2 特征表示的预测性能全面优于基于进化信息的隐马尔可夫模型(HMM)特征。同时,ESM-NBR 在两个独立测试集上的 DNA 结合残基预测 MCC 值分别为 0.427 和 0.391,比第二好的方法分别高出 18.61% 和 10.45%。此外,由于完全舍弃了耗时的多序列比对过程,ESM-NBR 的预测速度远超现有方法(长度为 500 的蛋白质序列仅需 5.52 秒,约比第二快的方法快 16 倍)。ESM-NBR 的用户友好独立软件包和数据可在 https://github.com/wwzll123/ESM-NBR 免费获取,供学术使用。
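
An architectural sketch in the spirit of the description above: per-residue embeddings (such as those produced by ESM2) pass through a stacked BiLSTM, and two MLP heads emit per-residue DNA- and RNA-binding probabilities. The dimensions and the random input are placeholders rather than the released ESM-NBR model.

```python
# Multi-task BiLSTM + MLP heads over per-residue protein embeddings.
import torch
import torch.nn as nn

class BindingResiduePredictor(nn.Module):
    def __init__(self, embed_dim=1280, hidden=256):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        def head():
            return nn.Sequential(nn.Linear(2 * hidden, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
        self.dna_head, self.rna_head = head(), head()   # multi-task outputs

    def forward(self, embeddings):                      # (batch, seq_len, embed_dim)
        shared, _ = self.bilstm(embeddings)
        dna = torch.sigmoid(self.dna_head(shared)).squeeze(-1)
        rna = torch.sigmoid(self.rna_head(shared)).squeeze(-1)
        return dna, rna                                 # per-residue probabilities

model = BindingResiduePredictor()
fake_esm2_features = torch.randn(2, 500, 1280)          # 2 proteins, 500 residues
dna_prob, rna_prob = model(fake_esm2_features)
print(dna_prob.shape, rna_prob.shape)                   # torch.Size([2, 500]) each
```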

Multiple Testing of Linear Forms for Noisy Matrix Completion

  • paper_url: http://arxiv.org/abs/2312.00305
  • repo_url: None
  • paper_authors: Wanteng Ma, Lilun Du, Dong Xia, Ming Yuan
  • for: solves the problem of noisy matrix completion for large-scale recommender systems.
  • methods: introduces new statistics for individual tests with sharp asymptotics and utilizes a data splitting and symmetric aggregation scheme to control the false discovery rate (FDR).
  • results: guarantees power under nearly optimal sample size requirements and shows valid FDR control with extensive numerical simulations and real data examples.
    Abstract Many important tasks of large-scale recommender systems can be naturally cast as testing multiple linear forms for noisy matrix completion. These problems, however, present unique challenges because of the subtle bias-and-variance tradeoff of and an intricate dependence among the estimated entries induced by the low-rank structure. In this paper, we develop a general approach to overcome these difficulties by introducing new statistics for individual tests with sharp asymptotics both marginally and jointly, and utilizing them to control the false discovery rate (FDR) via a data splitting and symmetric aggregation scheme. We show that valid FDR control can be achieved with guaranteed power under nearly optimal sample size requirements using the proposed methodology. Extensive numerical simulations and real data examples are also presented to further illustrate its practical merits.
    摘要 大规模推荐系统中的许多重要任务可以自然地表述为带噪矩阵补全中多个线性形式的检验。然而,这些问题具有独特的挑战:微妙的偏差-方差权衡,以及低秩结构在估计元素之间引入的复杂依赖关系。在这篇论文中,我们提出一种通用方法来克服这些困难:我们引入对单个检验具有精确边际与联合渐近性质的新统计量,并利用数据分割与对称聚合方案来控制错误发现率(FDR)。我们证明,所提方法能够在近乎最优的样本量要求下实现有效的 FDR 控制并保证检验功效。我们还通过大量数值模拟和真实数据示例进一步说明其实用价值。

Towards Aligned Canonical Correlation Analysis: Preliminary Formulation and Proof-of-Concept Results

  • paper_url: http://arxiv.org/abs/2312.00296
  • repo_url: None
  • paper_authors: Biqian Cheng, Evangelos E. Papalexakis, Jia Chen
  • for: 本研究提出了一种新的对齐典型相关分析(ACCA)框架,以解决传统方法中数据视图之间的对齐问题。
  • methods: 该框架通过迭代求解对齐与多视图嵌入问题,将多个数据视图联合嵌入到最大相关的潜在空间中。
  • results: 对于一些实际应用场景,ACCA可以更好地嵌入数据视角,并且可以提高相关性的计算效率。
    Abstract Canonical Correlation Analysis (CCA) has been widely applied to jointly embed multiple views of data in a maximally correlated latent space. However, the alignment between various data perspectives, which is required by traditional approaches, is unclear in many practical cases. In this work we propose a new framework Aligned Canonical Correlation Analysis (ACCA), to address this challenge by iteratively solving the alignment and multi-view embedding.
    摘要 典型相关分析(CCA)已被广泛用于将数据的多个视图联合嵌入到最大相关的潜在空间中。然而,传统方法所要求的各数据视图之间的对齐,在许多实际情况下并不明确。在这项工作中,我们提出了一个新的框架——对齐典型相关分析(ACCA),通过迭代求解对齐与多视图嵌入来应对这一挑战。

Learning to forecast diagnostic parameters using pre-trained weather embedding

  • paper_url: http://arxiv.org/abs/2312.00290
  • repo_url: None
  • paper_authors: Peetak P. Mitra, Vivek Ramavajjala
  • for: 这个论文旨在提出一种方法,使得新增的诊断变量可以轻松地添加到现有的气象预测模型中,而不需要重新训练整个模型。
  • methods: 该方法包括两个阶段。在第一阶段,我们训练一个自编码器,将预报变量嵌入到潜在空间中。在第二阶段,自编码器被冻结,下游模型仅以预报变量的潜在表示为输入来预测诊断变量。
  • results: 我们的实验表明,使用这种两阶段方法可以达到与专门训练的模型相当的准确率,同时显著减少训练和推理过程中的资源消耗。此外,这种方法允许在不影响现有模型的前提下,按需开发新的下游模型。
    Abstract Data-driven weather prediction (DDWP) models are increasingly becoming popular for weather forecasting. However, while operational weather forecasts predict a wide variety of weather variables, DDWPs currently forecast a specific set of key prognostic variables. Non-prognostic ("diagnostic") variables are sometimes modeled separately as dependent variables of the prognostic variables (c.f. FourCastNet), or by including the diagnostic variable as a target in the DDWP. However, the cost of training and deploying bespoke models for each diagnostic variable can increase dramatically with more diagnostic variables, and limit the operational use of such models. Likewise, retraining an entire DDWP each time a new diagnostic variable is added is also cost-prohibitive. We present an two-stage approach that allows new diagnostic variables to be added to an end-to-end DDWP model without the expensive retraining. In the first stage, we train an autoencoder that learns to embed prognostic variables into a latent space. In the second stage, the autoencoder is frozen and "downstream" models are trained to predict diagnostic variables using only the latent representations of prognostic variables as input. Our experiments indicate that models trained using the two-stage approach offer accuracy comparable to training bespoke models, while leading to significant reduction in resource utilization during training and inference. This approach allows for new "downstream" models to be developed as needed, without affecting existing models and thus reducing the friction in operationalizing new models.
    摘要 数据驱动天气预测(DDWP)模型在天气预报中日益流行。然而,业务化天气预报需要预测种类繁多的气象变量,而 DDWP 目前只预测一组特定的关键预报(prognostic)变量。非预报的"诊断"变量有时被作为预报变量的因变量单独建模(如 FourCastNet),或者被作为 DDWP 的预测目标之一。然而,为每个诊断变量训练和部署专门模型的成本会随着诊断变量数量的增加而急剧上升,限制了这类模型的业务应用;同样,每新增一个诊断变量就重新训练整个 DDWP 的成本也难以接受。我们提出了一种两阶段方法,使新的诊断变量可以加入端到端的 DDWP 模型而无需昂贵的重新训练。在第一阶段,我们训练一个自编码器,学习将预报变量嵌入到潜在空间中;在第二阶段,自编码器被冻结,"下游"模型仅以预报变量的潜在表示为输入来预测诊断变量。实验表明,采用两阶段方法训练的模型可以达到与专门训练的模型相当的准确率,同时显著降低训练和推理的资源消耗。这种方法允许按需开发新的"下游"模型而不影响现有模型,从而减少了新模型业务化的阻力。
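
The two-stage recipe is easy to sketch: stage one trains an autoencoder on prognostic variables; stage two freezes it and fits a small head from the latent code to a new diagnostic variable. Everything below (dimensions, dummy data, layer sizes) is illustrative.

```python
# Two-stage training: frozen prognostic autoencoder + downstream diagnostic head.
import torch
import torch.nn as nn

prognostic_dim, latent_dim, diagnostic_dim = 64, 16, 1

encoder = nn.Sequential(nn.Linear(prognostic_dim, 32), nn.ReLU(), nn.Linear(32, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, prognostic_dim))

x = torch.randn(1024, prognostic_dim)                 # prognostic variables (dummy)
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
for _ in range(200):                                  # stage 1: reconstruction
    opt.zero_grad()
    loss = nn.functional.mse_loss(decoder(encoder(x)), x)
    loss.backward()
    opt.step()

# Stage 2: freeze the embedding, train only the diagnostic head.
for p in encoder.parameters():
    p.requires_grad_(False)
head = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, diagnostic_dim))
y = torch.randn(1024, diagnostic_dim)                 # diagnostic target (dummy)
opt2 = torch.optim.Adam(head.parameters(), lr=1e-3)
for _ in range(200):
    opt2.zero_grad()
    loss = nn.functional.mse_loss(head(encoder(x)), y)
    loss.backward()
    opt2.step()
print(float(loss))
```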

Age-Based Scheduling for Mobile Edge Computing: A Deep Reinforcement Learning Approach

  • paper_url: http://arxiv.org/abs/2312.00279
  • repo_url: https://github.com/xingqiuhe/dpds
  • paper_authors: Xingqiu He, Chaoqun You, Tony Q. S. Quek
  • For: The paper is written for MEC systems that require real-time performance and freshness of collected environmental information, and it proposes a new definition of Age of Information (AoI) that takes into account the event-driven nature of the desired status information.* Methods: The paper proposes an online AoI minimization problem for MEC systems, which can be formulated as a Markov Decision Process (MDP) and solved using Reinforcement Learning (RL) algorithms. To accelerate the learning process, the paper introduces Post-Decision States (PDSs) that exploit the partial knowledge of the system’s dynamics.* Results: The paper demonstrates through numerical results that the proposed algorithm outperforms benchmarks under various scenarios, indicating its effectiveness in minimizing the Age of Information in MEC systems.
    Abstract With the rapid development of Mobile Edge Computing (MEC), various real-time applications have been deployed to benefit people's daily lives. The performance of these applications relies heavily on the freshness of collected environmental information, which can be quantified by its Age of Information (AoI). In the traditional definition of AoI, it is assumed that the status information can be actively sampled and directly used. However, for many MEC-enabled applications, the desired status information is updated in an event-driven manner and necessitates data processing. To better serve these applications, we propose a new definition of AoI and, based on the redefined AoI, we formulate an online AoI minimization problem for MEC systems. Notably, the problem can be interpreted as a Markov Decision Process (MDP), thus enabling its solution through Reinforcement Learning (RL) algorithms. Nevertheless, the traditional RL algorithms are designed for MDPs with completely unknown system dynamics and hence usually suffer long convergence times. To accelerate the learning process, we introduce Post-Decision States (PDSs) to exploit the partial knowledge of the system's dynamics. We also combine PDSs with deep RL to further improve the algorithm's applicability, scalability, and robustness. Numerical results demonstrate that our algorithm outperforms the benchmarks under various scenarios.
    摘要 随着移动边缘计算(MEC)的快速发展,许多实时应用已被部署以改善人们的日常生活。这些应用的性能在很大程度上取决于所采集环境信息的新鲜度,而新鲜度可以用信息年龄(Age of Information,AoI)来量化。在传统的 AoI 定义中,假设状态信息可以被主动采样并直接使用。然而,对许多由 MEC 支撑的应用而言,所需的状态信息以事件驱动的方式更新,并且需要经过数据处理。为了更好地服务这些应用,我们提出了新的 AoI 定义,并基于重新定义的 AoI,为 MEC 系统构建了一个在线 AoI 最小化问题。值得注意的是,该问题可以被解释为马尔可夫决策过程(MDP),因此可以通过强化学习(RL)算法求解。然而,传统的 RL 算法面向系统动力学完全未知的 MDP 设计,通常收敛时间较长。为了加速学习过程,我们引入决策后状态(PDS)以利用系统动力学的部分知识,并将 PDS 与深度强化学习结合,进一步提高算法的适用性、可扩展性和鲁棒性。数值结果表明,我们的算法在各种场景下均优于基准方法。

Automating Continual Learning

  • paper_url: http://arxiv.org/abs/2312.00276
  • repo_url: https://github.com/idsia/automated-cl
  • paper_authors: Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
  • for: 本研究旨在解决神经网络学习算法在环境不断变化时,之前学到的技能被遗忘的问题。
  • methods: 我们提出了自动化持续学习(ACL)方法,通过训练自指涉神经网络来元学习其自身的情境内持续学习算法。
  • results: 我们的实验表明,ACL 可以有效解决"情境内灾难性遗忘"问题,其学习到的算法在无重放数据的设定下超过了人工设计的算法。
    Abstract General-purpose learning systems should improve themselves in open-ended fashion in ever-changing environments. Conventional learning algorithms for neural networks, however, suffer from catastrophic forgetting (CF) -- previously acquired skills are forgotten when a new task is learned. Instead of hand-crafting new algorithms for avoiding CF, we propose Automated Continual Learning (ACL) to train self-referential neural networks to meta-learn their own in-context continual (meta-)learning algorithms. ACL encodes all desiderata -- good performance on both old and new tasks -- into its meta-learning objectives. Our experiments demonstrate that ACL effectively solves "in-context catastrophic forgetting"; our ACL-learned algorithms outperform hand-crafted ones, e.g., on the Split-MNIST benchmark in the replay-free setting, and enables continual learning of diverse tasks consisting of multiple few-shot and standard image classification datasets.
    摘要 通用学习系统应当能够在不断变化的环境中以开放式的方式不断改进自身。然而,传统的神经网络学习算法存在灾难性遗忘(CF)问题:在学习新任务时,先前习得的技能会被遗忘。我们没有手工设计新的避免 CF 的算法,而是提出了自动化持续学习(ACL),训练自指涉神经网络来元学习其自身的情境内持续(元)学习算法。ACL 将所有要求——在旧任务和新任务上都取得良好性能——编码进其元学习目标中。实验表明,ACL 能有效解决"情境内灾难性遗忘";在无重放设定的 Split-MNIST 基准上,ACL 学到的算法优于人工设计的算法,并能够对由多个小样本(few-shot)与标准图像分类数据集构成的多样化任务进行持续学习。

Towards Clinical Prediction with Transparency: An Explainable AI Approach to Survival Modelling in Residential Aged Care

  • paper_url: http://arxiv.org/abs/2312.00271
  • repo_url: https://github.com/teosusnjak/survival-analysis-stage1
  • paper_authors: Teo Susnjak, Elise Griffin, Mitchell McCutcheon, Kathleen Potter
  • for: 本研究旨在为养老机构长期照护中的老年人准确估计生存时间,以辅助临终医疗决策。
  • methods: 本研究使用先进的机器学习技术,在 20 组实验中测试了 CoxPH、EN、RR、Lasso、GB、XGB 和 RF 等模型,以找出最佳的预测模型。
  • results: 研究发现,GB、XGB 和 RF 模型的 C-Index 值最高(0.714、0.712 和 0.712),最优的 XGB 模型在 6 个月生存预测上的 AUROC 为 0.746(95% CI 0.744-0.749)。
    Abstract Background: Accurate survival time estimates aid end-of-life medical decision-making. Objectives: Develop an interpretable survival model for elderly residential aged care residents using advanced machine learning. Setting: A major Australasian residential aged care provider. Participants: Residents aged 65+ admitted for long-term care from July 2017 to August 2023. Sample size: 11,944 residents across 40 facilities. Predictors: Factors include age, gender, health status, co-morbidities, cognitive function, mood, nutrition, mobility, smoking, sleep, skin integrity, and continence. Outcome: Probability of survival post-admission, specifically calibrated for 6-month survival estimates. Statistical Analysis: Tested CoxPH, EN, RR, Lasso, GB, XGB, and RF models in 20 experiments with a 90/10 train/test split. Evaluated accuracy using C-index, Harrell's C-index, dynamic AUROC, IBS, and calibrated ROC. Chose XGB for its performance and calibrated it for 1, 3, 6, and 12-month predictions using Platt scaling. Employed SHAP values to analyze predictor impacts. Results: GB, XGB, and RF models showed the highest C-Index values (0.714, 0.712, 0.712). The optimal XGB model demonstrated a 6-month survival prediction AUROC of 0.746 (95% CI 0.744-0.749). Key mortality predictors include age, male gender, mobility, health status, pressure ulcer risk, and appetite. Conclusions: The study successfully applies machine learning to create a survival model for aged care, aligning with clinical insights on mortality risk factors and enhancing model interpretability and clinical utility through explainable AI.
    摘要 背景:准确的生存时间估计有助于临终医疗决策。目标:使用先进的机器学习方法,为养老机构长期照护的老年住民开发可解释的生存模型。场景:一家大型澳大拉西亚养老机构。参与者:2017 年 7 月至 2023 年 8 月期间入住接受长期照护的 65 岁及以上住民。样本量:40 家机构共 11,944 名住民。预测因素:年龄、性别、健康状况、合并症、认知功能、情绪、营养、行动能力、吸烟、睡眠、皮肤完整性和大小便控制等。结局:入住后的生存概率,特别针对 6 个月生存估计进行校准。统计分析:在 90/10 训练/测试划分下进行了 20 组实验,测试了 CoxPH、EN、RR、Lasso、GB、XGB 和 RF 模型,并使用 C-index、Harrell's C-index、动态 AUROC、IBS 和校准 ROC 评估准确性;选择表现最好的 XGB 模型,使用 Platt 缩放对 1、3、6、12 个月的预测进行校准,并使用 SHAP 值分析各预测因素的影响。结果:GB、XGB 和 RF 模型显示出最高的 C-Index 值(0.714、0.712、0.712)。最优的 XGB 模型的 6 个月生存预测 AUROC 为 0.746(95% CI 0.744-0.749)。关键的死亡预测因素包括年龄、男性、行动能力、健康状况、压疮风险和食欲。结论:本研究成功地应用机器学习建立了养老照护的生存模型,与关于死亡风险因素的临床见解一致,并通过可解释 AI 提高了模型的可解释性和临床实用性。
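
As a schematic of the final modelling stage described above — a gradient-boosted classifier for 6-month survival, Platt-scaled probabilities, and SHAP-based predictor importance — the sketch below uses synthetic data and assumes the xgboost and shap packages are available; it does not reproduce the study's features or preprocessing.

```python
# XGBoost 6-month survival classifier + Platt scaling + SHAP importances.
import numpy as np
import shap
from xgboost import XGBClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 8))                          # age, mobility, appetite, ...
logit = 1.5 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)   # died within 6 months

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

base = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                     eval_metric="logloss")
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=3)   # Platt scaling
calibrated.fit(X_tr, y_tr)
print("calibrated 6-month risk:", calibrated.predict_proba(X_te[:3])[:, 1])

# SHAP explanations from the uncalibrated booster.
booster = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1,
                        eval_metric="logloss").fit(X_tr, y_tr)
shap_values = shap.TreeExplainer(booster).shap_values(X_te)
print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0).round(3))
```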