cs.LG - 2023-07-27

Federated Model Aggregation via Self-Supervised Priors for Highly Imbalanced Medical Image Classification

  • paper_url: http://arxiv.org/abs/2307.14959
  • repo_url: https://github.com/xmed-lab/fed-mas
  • paper_authors: Marawan Elbatel, Hualiang Wang, Robert Martí, Huazhu Fu, Xiaomeng Li
  • for: This paper focuses on addressing the challenges of federated learning in highly imbalanced medical datasets, specifically skin lesions and gastrointestinal images.
  • methods: The authors use publicly available self-supervised auxiliary networks to study inter-client intra-class variations and derive a dynamic balanced model aggregation method called Fed-MAS, which can be used with different local learning methods to optimize a highly robust and unbiased global model.
  • results: The authors demonstrate the effectiveness of Fed-MAS in improving the robustness and accuracy of the global model, and provide code for implementing the method at \url{https://github.com/xmed-lab/Fed-MAS}.
    Abstract In the medical field, federated learning commonly deals with highly imbalanced datasets, including skin lesions and gastrointestinal images. Existing federated methods under highly imbalanced datasets primarily focus on optimizing a global model without incorporating the intra-class variations that can arise in medical imaging due to different populations, findings, and scanners. In this paper, we study the inter-client intra-class variations with publicly available self-supervised auxiliary networks. Specifically, we find that employing a shared auxiliary pre-trained model, like MoCo-V2, locally on every client yields consistent divergence measurements. Based on these findings, we derive a dynamic balanced model aggregation via self-supervised priors (MAS) to guide the global model optimization. Fed-MAS can be utilized with different local learning methods for effective model aggregation toward a highly robust and unbiased global model. Our code is available at \url{https://github.com/xmed-lab/Fed-MAS}.
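
To make the aggregation idea above concrete, here is a minimal, hypothetical sketch of divergence-guided federated averaging: each client's weight is derived from a locally computed divergence score (here simply the inverse of the divergence, renormalised), which is only one possible choice and not necessarily the exact Fed-MAS rule. The `aggregate` function and the toy client states are illustrative assumptions.

```python
import torch

def aggregate(client_states, divergences, eps=1e-8):
    """Weighted federated averaging where each client's weight is derived from a
    locally computed divergence score (here: inverse divergence, renormalised).
    The actual Fed-MAS weighting rule may differ; this only illustrates
    divergence-guided aggregation."""
    w = torch.tensor([1.0 / (d + eps) for d in divergences])
    w = w / w.sum()
    global_state = {}
    for key in client_states[0]:
        global_state[key] = sum(wi * state[key] for wi, state in zip(w, client_states))
    return global_state

# toy usage: three "clients" holding 2-parameter models
clients = [{"w": torch.tensor([1.0, 0.0])},
           {"w": torch.tensor([0.0, 1.0])},
           {"w": torch.tensor([0.5, 0.5])}]
print(aggregate(clients, divergences=[0.2, 0.8, 0.4]))
```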

Multi-Source Domain Adaptation through Dataset Dictionary Learning in Wasserstein Space

  • paper_url: http://arxiv.org/abs/2307.14953
  • repo_url: https://github.com/eddardd/demo-dadil
  • paper_authors: Eduardo Fernandes Montesuma, Fred Ngolè Mboula, Antoine Souloumiac
  • for: Solving the Multi-Source Domain Adaptation (MSDA) problem: mitigating data distribution shifts when transferring knowledge from multiple labeled source domains to an unlabeled target domain.
  • methods: Proposes a new MSDA framework based on dictionary learning and optimal transport. Each domain is interpreted as an empirical distribution and expressed as a Wasserstein barycenter of dictionary atoms. A novel algorithm, DaDiL, learns via mini-batches: (i) the atom distributions and (ii) a matrix of barycentric coordinates. Based on this dictionary, two new MSDA methods are proposed: DaDiL-R, based on reconstructing labeled samples in the target domain, and DaDiL-E, based on ensembling classifiers learned on the atom distributions.
  • results: Evaluated on three benchmarks (Caltech-Office, Office 31, and CRWU), the method improves classification performance over the previous state of the art by 3.15%, 2.29%, and 7.71%, respectively. Finally, interpolations in the Wasserstein hull of the learned atoms provide data that generalizes to the target domain.
    Abstract This paper seeks to solve Multi-Source Domain Adaptation (MSDA), which aims to mitigate data distribution shifts when transferring knowledge from multiple labeled source domains to an unlabeled target domain. We propose a novel MSDA framework based on dictionary learning and optimal transport. We interpret each domain in MSDA as an empirical distribution. As such, we express each domain as a Wasserstein barycenter of dictionary atoms, which are empirical distributions. We propose a novel algorithm, DaDiL, for learning via mini-batches: (i) atom distributions; (ii) a matrix of barycentric coordinates. Based on our dictionary, we propose two novel methods for MSDA: DaDil-R, based on the reconstruction of labeled samples in the target domain, and DaDiL-E, based on the ensembling of classifiers learned on atom distributions. We evaluate our methods in 3 benchmarks: Caltech-Office, Office 31, and CRWU, where we improved previous state-of-the-art by 3.15%, 2.29%, and 7.71% in classification performance. Finally, we show that interpolations in the Wasserstein hull of learned atoms provide data that can generalize to the target domain.
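
As a small illustration of the DaDiL-E idea (ensembling classifiers learned on atom distributions), the sketch below weights each atom classifier's class probabilities by the target domain's barycentric coordinates; the function name, shapes, and toy numbers are assumptions, not the authors' implementation.

```python
import numpy as np

def dadil_e_ensemble(probas_per_atom, barycentric_coords):
    """Hypothetical sketch of DaDiL-E-style ensembling: average the class
    probabilities of classifiers trained on each atom distribution, weighted
    by the target domain's barycentric coordinates."""
    probas = np.stack(probas_per_atom, axis=0)          # (n_atoms, n_samples, n_classes)
    weights = np.asarray(barycentric_coords)[:, None, None]
    return (weights * probas).sum(axis=0)               # (n_samples, n_classes)

# toy usage: 3 atoms, 2 target samples, 2 classes
atom_probas = [np.array([[0.7, 0.3], [0.2, 0.8]]),
               np.array([[0.6, 0.4], [0.4, 0.6]]),
               np.array([[0.9, 0.1], [0.1, 0.9]])]
print(dadil_e_ensemble(atom_probas, [0.5, 0.3, 0.2]).argmax(axis=1))
```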

Network Fault-tolerant and Byzantine-resilient Social Learning via Collaborative Hierarchical Non-Bayesian Learning

  • paper_url: http://arxiv.org/abs/2307.14952
  • repo_url: None
  • paper_authors: Connor Mclaughlin, Matthew Ding, Denis Edogmus, Lili Su
  • for: Addresses the problem of non-Bayesian learning over vulnerable networks with communication failures and external adversarial attacks.
  • methods: Proposes a hierarchical robust push-sum algorithm with sparse information fusion and dual averaging update to achieve average consensus despite packet-dropping link failures. Also, uses a novel Byzantine-resilient gossiping-type rule to facilitate resilient information propagation across sub-networks.
  • results: Obtains provable convergence guarantees for the packet-dropping fault-tolerant non-Bayesian learning algorithm and solves the non-Bayesian learning problem via running multiple dynamics, each of which only involves Byzantine consensus with scalar inputs.
    Abstract As the network scale increases, existing fully distributed solutions start to lag behind the real-world challenges such as (1) slow information propagation, (2) network communication failures, and (3) external adversarial attacks. In this paper, we focus on hierarchical system architecture and address the problem of non-Bayesian learning over networks that are vulnerable to communication failures and adversarial attacks. On network communication, we consider packet-dropping link failures. We first propose a hierarchical robust push-sum algorithm that can achieve average consensus despite frequent packet-dropping link failures. We provide a sparse information fusion rule between the parameter server and arbitrarily selected network representatives. Then, interleaving the consensus update step with a dual averaging update with Kullback-Leibler (KL) divergence as the proximal function, we obtain a packet-dropping fault-tolerant non-Bayesian learning algorithm with provable convergence guarantees. On external adversarial attacks, we consider Byzantine attacks in which the compromised agents can send maliciously calibrated messages to others (including both the agents and the parameter server). To avoid the curse of dimensionality of Byzantine consensus, we solve the non-Bayesian learning problem via running multiple dynamics, each of which only involves Byzantine consensus with scalar inputs. To facilitate resilient information propagation across sub-networks, we use a novel Byzantine-resilient gossiping-type rule at the parameter server.
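
For intuition about the consensus building block, the sketch below implements plain (fault-free) push-sum ratio consensus in NumPy: the ratio x_i / w_i converges to the network-wide average. The paper's hierarchical robust variant additionally handles packet-dropping links and sparse fusion with a parameter server, which this toy version does not model.

```python
import numpy as np

def push_sum(values, adjacency, iters=200):
    """Basic (fault-free) push-sum ratio consensus: each node splits its value
    and weight equally among out-neighbours (self-loop included); the ratio
    x_i / w_i converges to the network-wide average of the initial values."""
    n = len(values)
    A = adjacency + np.eye(n)                    # add self-loops
    P = A / A.sum(axis=0, keepdims=True)         # column-stochastic mixing matrix
    x = np.asarray(values, dtype=float)
    w = np.ones(n)
    for _ in range(iters):
        x, w = P @ x, P @ w
    return x / w

ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]], dtype=float)
print(push_sum([1.0, 2.0, 3.0, 6.0], ring))      # all entries close to 3.0
```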

A Self-Adaptive Penalty Method for Integrating Prior Knowledge Constraints into Neural ODEs

  • paper_url: http://arxiv.org/abs/2307.14940
  • repo_url: None
  • paper_authors: C. Coelho, M. Fernanda P. Costa, L. L. Ferrás
  • for: Modelling constrained natural systems, such as population growth, chemical reaction evolution, and damped harmonic oscillator motion.
  • methods: Uses a self-adaptive penalty method that dynamically adjusts the penalty parameters to ensure the models follow the underlying rules or laws of the systems.
  • results: Compared with other penalty Neural ODE approaches and vanilla Neural ODE, the proposed self-adaptive penalty algorithm is shown to be effective for modelling constrained natural systems, yielding more accurate and robust models with reliable and meaningful predictions.
    Abstract The continuous dynamics of natural systems has been effectively modelled using Neural Ordinary Differential Equations (Neural ODEs). However, for accurate and meaningful predictions, it is crucial that the models follow the underlying rules or laws that govern these systems. In this work, we propose a self-adaptive penalty algorithm for Neural ODEs to enable modelling of constrained natural systems. The proposed self-adaptive penalty function can dynamically adjust the penalty parameters. The explicit introduction of prior knowledge helps to increase the interpretability of Neural ODE -based models. We validate the proposed approach by modelling three natural systems with prior knowledge constraints: population growth, chemical reaction evolution, and damped harmonic oscillator motion. The numerical experiments and a comparison with other penalty Neural ODE approaches and \emph{vanilla} Neural ODE, demonstrate the effectiveness of the proposed self-adaptive penalty algorithm for Neural ODEs in modelling constrained natural systems. Moreover, the self-adaptive penalty approach provides more accurate and robust models with reliable and meaningful predictions.
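
The following PyTorch sketch illustrates the general shape of a self-adaptive penalty objective: a data-fitting loss plus a penalty on constraint violation whose weight grows when the violation stops decreasing. The specific update rule (`growth`, `tol`) and the non-negativity constraint are illustrative assumptions, not the paper's exact scheme.

```python
import torch

def penalized_loss(pred, target, constraint_violation, mu):
    """Generic penalty objective: data-fitting term plus a weighted
    constraint-violation term (e.g. negative populations clamped at zero)."""
    return torch.mean((pred - target) ** 2) + mu * torch.mean(constraint_violation ** 2)

def update_penalty(mu, violation, prev_violation, growth=2.0, tol=0.9):
    """Illustrative self-adaptive rule: increase the penalty parameter when the
    constraint violation has not decreased sufficiently since the last epoch."""
    return mu * growth if violation > tol * prev_violation else mu

# toy usage with a non-negativity constraint on the prediction
pred = torch.tensor([0.5, -0.2, 1.0], requires_grad=True)
target = torch.tensor([0.4, 0.1, 1.1])
violation = torch.relu(-pred)               # amount by which predictions go negative
loss = penalized_loss(pred, target, violation, mu=10.0)
loss.backward()
print(loss.item(), update_penalty(10.0, violation.mean().item(), 0.05))
```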

Efficient Interaction-Aware Interval Analysis of Neural Network Feedback Loops

  • paper_url: http://arxiv.org/abs/2307.14938
  • repo_url: None
  • paper_authors: Saber Jafarpour, Akash Harapanahalli, Samuel Coogan
  • for: Proposes a computationally efficient interval reachability framework for systems whose controllers are neural networks.
  • methods: Uses inclusion functions to embed the closed-loop system into a larger-dimensional embedding system, so that a single trajectory over-approximates the original system's behaviour under uncertainty. The authors propose two ways of constructing the closed-loop embedding system, which account for the interactions between the system and the controller differently (an interconnection-based and an interaction-based approach).
  • results: The approach is implemented in a Python framework called ReachMM and evaluated on examples and benchmarks of up to 200 state dimensions, demonstrating its efficiency and scalability.
    Abstract In this paper, we propose a computationally efficient framework for interval reachability of systems with neural network controllers. Our approach leverages inclusion functions for the open-loop system and the neural network controller to embed the closed-loop system into a larger-dimensional embedding system, where a single trajectory over-approximates the original system's behavior under uncertainty. We propose two methods for constructing closed-loop embedding systems, which account for the interactions between the system and the controller in different ways. The interconnection-based approach considers the worst-case evolution of each coordinate separately by substituting the neural network inclusion function into the open-loop inclusion function. The interaction-based approach uses novel Jacobian-based inclusion functions to capture the first-order interactions between the open-loop system and the controller by leveraging state-of-the-art neural network verifiers. Finally, we implement our approach in a Python framework called ReachMM to demonstrate its efficiency and scalability on benchmarks and examples ranging to $200$ state dimensions.
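
As background for the inclusion-function machinery, the sketch below shows a natural inclusion function for a single affine layer followed by a ReLU, using interval arithmetic: every output reachable from the input box is contained in the returned bounds. This is only the simplest kind of inclusion function; the paper's interaction-based approach uses Jacobian-based inclusion functions and neural network verifiers instead.

```python
import numpy as np

def affine_inclusion(W, b, xl, xu):
    """Natural inclusion function of an affine layer y = W x + b over the box
    [xl, xu]: split W into positive/negative parts to get the interval bounds."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ xl + Wn @ xu + b, Wp @ xu + Wn @ xl + b

def relu_inclusion(yl, yu):
    return np.maximum(yl, 0.0), np.maximum(yu, 0.0)

W, b = np.array([[1.0, -2.0], [0.5, 1.0]]), np.array([0.1, -0.3])
lo, hi = relu_inclusion(*affine_inclusion(W, b, np.array([-1.0, 0.0]), np.array([1.0, 0.5])))
print(lo, hi)   # every true output of the layer over the input box lies in [lo, hi]
```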

PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback

  • paper_url: http://arxiv.org/abs/2307.14936
  • repo_url: None
  • paper_authors: Bo Shen, Jiaxin Zhang, Taihong Chen, Daoguang Zan, Bing Geng, An Fu, Muhan Zeng, Ailun Yu, Jichuan Ji, Jingyang Zhao, Yuenan Guo, Qianxiang Wang
  • for: Improving the code-generation performance of pre-trained Code LLMs.
  • methods: Proposes a new RRTF (Rank Responses to align Test & Teacher Feedback) framework that effectively and efficiently boosts pre-trained large language models for code generation.
  • results: PanGu-Coder2 achieves 62.20% pass@1 on the OpenAI HumanEval benchmark and consistently outperforms all previous Code LLMs on the CoderEval and LeetCode benchmarks.
    Abstract Large Language Models for Code (Code LLM) are flourishing. New and powerful models are released on a weekly basis, demonstrating remarkable performance on the code generation task. Various approaches have been proposed to boost the code generation performance of pre-trained Code LLMs, such as supervised fine-tuning, instruction tuning, reinforcement learning, etc. In this paper, we propose a novel RRTF (Rank Responses to align Test&Teacher Feedback) framework, which can effectively and efficiently boost pre-trained large language models for code generation. Under this framework, we present PanGu-Coder2, which achieves 62.20% pass@1 on the OpenAI HumanEval benchmark. Furthermore, through an extensive evaluation on CoderEval and LeetCode benchmarks, we show that PanGu-Coder2 consistently outperforms all previous Code LLMs.
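
The RRTF framework ranks model responses using test and teacher feedback; as a loose illustration of rank-based training signals, the sketch below shows a generic pairwise ranking loss over sequence log-likelihoods of responses that passed versus failed unit tests. This is an assumed, simplified objective, not the loss used to train PanGu-Coder2.

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(logp_good, logp_bad, margin=0.0):
    """Generic pairwise ranking objective: push the model's log-likelihood of a
    higher-ranked response above that of a lower-ranked one by a margin."""
    return F.relu(margin - (logp_good - logp_bad)).mean()

# toy usage: sequence log-probs for responses that passed vs. failed unit tests
logp_pass = torch.tensor([-12.3, -9.8], requires_grad=True)
logp_fail = torch.tensor([-11.0, -15.2])
loss = pairwise_ranking_loss(logp_pass, logp_fail, margin=1.0)
loss.backward()
print(loss.item())
```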

Solving Data Quality Problems with Desbordante: a Demo

  • paper_url: http://arxiv.org/abs/2307.14935
  • repo_url: None
  • paper_authors: George Chernishev, Michael Polyntsov, Anton Chizhov, Kirill Stupakov, Ilya Shchuckin, Alexander Smirnov, Maxim Strutovsky, Alexey Shlyonskikh, Mikhail Firsov, Stepan Manannikov, Nikita Bobrov, Daniil Goncharov, Ilia Barutkin, Vladislav Shalnev, Kirill Muraviev, Anna Rakhmukova, Dmitriy Shcheka, Anton Chernikov, Mikhail Vyrodov, Yaroslav Kurbatov, Maxim Fofanov, Sergei Belokonnyi, Pavel Anosov, Arthur Saliou, Eduard Gaisin, Kirill Smirnov
  • for: This paper aims to address the limitations of existing data profiling systems by providing a new open-source data profiler called Desbordante, which is efficient, scalable, and provides explanations for complex statistics.
  • methods: Desbordante uses C++ for costly operations and seamless Python integration to efficiently mine data for functional dependencies, data constraints, association rules, and other complex statistics.
  • results: The paper demonstrates several scenarios for using Desbordante to solve data quality problems, including typo detection, data deduplication, and data anomaly detection.
    Abstract Data profiling is an essential process in modern data-driven industries. One of its critical components is the discovery and validation of complex statistics, including functional dependencies, data constraints, association rules, and others. However, most existing data profiling systems that focus on complex statistics do not provide proper integration with the tools used by contemporary data scientists. This creates a significant barrier to the adoption of these tools in the industry. Moreover, existing systems were not created with industrial-grade workloads in mind. Finally, they do not aim to provide descriptive explanations, i.e. why a given pattern is not found. It is a significant issue as it is essential to understand the underlying reasons for a specific pattern's absence to make informed decisions based on the data. Because of that, these patterns are effectively rest in thin air: their application scope is rather limited, they are rarely used by the broader public. At the same time, as we are going to demonstrate in this presentation, complex statistics can be efficiently used to solve many classic data quality problems. Desbordante is an open-source data profiler that aims to close this gap. It is built with emphasis on industrial application: it is efficient, scalable, resilient to crashes, and provides explanations. Furthermore, it provides seamless Python integration by offloading various costly operations to the C++ core, not only mining. In this demonstration, we show several scenarios that allow end users to solve different data quality problems. Namely, we showcase typo detection, data deduplication, and data anomaly detection scenarios.
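
One of the complex statistics Desbordante mines is functional dependencies. The pandas sketch below shows the underlying idea with a naive check (an FD lhs -> rhs holds iff every lhs combination maps to a single rhs value); it does not use Desbordante's actual API, and the toy table is made up.

```python
import pandas as pd

def holds_fd(df, lhs, rhs):
    """Naive check of a functional dependency lhs -> rhs: the FD holds iff every
    combination of lhs values maps to exactly one rhs value."""
    return (df.groupby(list(lhs))[rhs].nunique() <= 1).all()

df = pd.DataFrame({
    "zip":  ["10115", "10115", "20095", "20095"],
    "city": ["Berlin", "Berlin", "Hamburg", "Hamburg"],
    "name": ["Ann", "Bob", "Cat", "Dan"],
})
print(holds_fd(df, ["zip"], "city"))   # True:  zip -> city
print(holds_fd(df, ["city"], "name"))  # False: city does not determine name
```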

Approximate Model-Based Shielding for Safe Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.00707
  • repo_url: https://github.com/sacktock/ambs
  • paper_authors: Alexander W. Goodall, Francesco Belardinelli
  • for: Addressing RL tasks with safety requirements, improving the reliability and safety of RL.
  • methods: Proposes a principled look-ahead shielding method, approximate model-based shielding (AMBS), for verifying the performance of learned RL policies with respect to a set of safety constraints. AMBS does not require prior knowledge of the safety-relevant dynamics of the system and is backed by a strong theoretical justification.
  • results: On a set of Atari games with state-dependent safety labels, AMBS outperforms other safety-aware approaches without requiring prior knowledge of the safety-relevant dynamics.
    Abstract Reinforcement learning (RL) has shown great potential for solving complex tasks in a variety of domains. However, applying RL to safety-critical systems in the real-world is not easy as many algorithms are sample-inefficient and maximising the standard RL objective comes with no guarantees on worst-case performance. In this paper we propose approximate model-based shielding (AMBS), a principled look-ahead shielding algorithm for verifying the performance of learned RL policies w.r.t. a set of given safety constraints. Our algorithm differs from other shielding approaches in that it does not require prior knowledge of the safety-relevant dynamics of the system. We provide a strong theoretical justification for AMBS and demonstrate superior performance to other safety-aware approaches on a set of Atari games with state-dependent safety-labels.
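
A rough sketch of the look-ahead shielding idea: estimate, by sampling rollouts of a learned model, the probability that the task policy's action leads to an unsafe state, and fall back to a backup policy when that estimate is too high. The function signature, sampling scheme, and threshold are assumptions for illustration and omit AMBS's theoretical machinery.

```python
import numpy as np

def shielded_action(state, policy, backup_policy, model, is_unsafe,
                    horizon=10, n_samples=32, threshold=0.1, rng=np.random.default_rng(0)):
    """Illustrative look-ahead shield: estimate, via sampled rollouts of a learned
    model, the probability that the task policy's next action leads to an unsafe
    state within `horizon` steps; fall back to a backup policy if it is too high."""
    action = policy(state)
    violations = 0
    for _ in range(n_samples):
        s, a = state, action
        for _ in range(horizon):
            s = model(s, a, rng)                  # one sampled step of the learned dynamics
            if is_unsafe(s):
                violations += 1
                break
            a = policy(s)
    return backup_policy(state) if violations / n_samples > threshold else action

# toy 1-D example: the task policy pushes right, the backup policy stays put
act = shielded_action(
    state=4.0,
    policy=lambda s: 1.0,
    backup_policy=lambda s: 0.0,
    model=lambda s, a, rng: s + a + 0.1 * rng.standard_normal(),
    is_unsafe=lambda s: abs(s) > 5.0,
)
print(act)   # 0.0: the shield overrides the unsafe push to the right
```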

Graph-based Polyphonic Multitrack Music Generation

  • paper_url: http://arxiv.org/abs/2307.14928
  • repo_url: https://github.com/emanuelecosenza/polyphemus
  • paper_authors: Emanuele Cosenza, Andrea Valenti, Davide Bacciu
  • for: Proposes a graph-based deep learning system for music generation.
  • methods: Introduces a novel graph representation for music and a deep Variational Autoencoder with a hierarchical architecture that generates the structure and the content of musical graphs separately, one after the other, matching the structural priors of music.
  • results: After training on existing MIDI datasets, the model generates appealing short and long musical sequences and realistically interpolates between them, producing music that is tonally and rhythmically consistent. Visualization of the embeddings shows that the model organizes its latent space in accordance with known musical concepts.
    Abstract Graphs can be leveraged to model polyphonic multitrack symbolic music, where notes, chords and entire sections may be linked at different levels of the musical hierarchy by tonal and rhythmic relationships. Nonetheless, there is a lack of works that consider graph representations in the context of deep learning systems for music generation. This paper bridges this gap by introducing a novel graph representation for music and a deep Variational Autoencoder that generates the structure and the content of musical graphs separately, one after the other, with a hierarchical architecture that matches the structural priors of music. By separating the structure and content of musical graphs, it is possible to condition generation by specifying which instruments are played at certain times. This opens the door to a new form of human-computer interaction in the context of music co-creation. After training the model on existing MIDI datasets, the experiments show that the model is able to generate appealing short and long musical sequences and to realistically interpolate between them, producing music that is tonally and rhythmically consistent. Finally, the visualization of the embeddings shows that the model is able to organize its latent space in accordance with known musical concepts.

Benchmarking Performance of Deep Learning Model for Material Segmentation on Two HPC Systems

  • paper_url: http://arxiv.org/abs/2307.14921
  • repo_url: None
  • paper_authors: Warren R. Williams, S. Ross Glandon, Luke L. Morris, Jing-Ru C. Cheng
  • for: Performance benchmarking of HPC systems, providing information that allows for increased performance and improved job schedulers.
  • methods: Develops a benchmarking tool that uses a machine learning model to gather performance data on GPU-accelerated nodes while they perform material segmentation analysis. The benchmark uses a model converted from Caffe to PyTorch with the MMdnn toolkit and the MINC-2500 dataset.
  • results: While Vulcanite has faster model times in a large number of benchmarks, it is also more subject to environmental factors that can slow its performance; in contrast, model times on Onyx are consistent across benchmarks.
    Abstract Performance benchmarking of HPC systems is an ongoing effort that seeks to provide information that will allow for increased performance and improve the job schedulers that manage these systems. We develop a benchmarking tool that utilizes machine learning models and gathers performance data on GPU-accelerated nodes while they perform material segmentation analysis. The benchmark uses an ML model that has been converted from Caffe to PyTorch using the MMdnn toolkit and the MINC-2500 dataset. Performance data is gathered on two ERDC DSRC systems, Onyx and Vulcanite. The data reveals that while Vulcanite has faster model times in a large number of benchmarks, it is also more subject to some environmental factors that can cause slower performance than Onyx. In contrast, the model times from Onyx are consistent across benchmarks.

NSA: Naturalistic Support Artifact to Boost Network Confidence

  • paper_url: http://arxiv.org/abs/2307.14917
  • repo_url: None
  • paper_authors: Abhijith Sharma, Phil Munz, Apurva Narayan
  • for: The paper is written to address the vulnerability of visual AI systems to natural and synthetic physical corruptions in the real-world, and to propose a novel approach called naturalistic support artifacts (NSA) to improve the robustness of visual AI systems.
  • methods: The paper uses a combination of deep learning techniques, including convolutional neural networks (CNNs) and generative adversarial networks (GANs), to generate naturalistic support artifacts (NSAs) that can be added to the scene to improve the robustness of visual AI systems.
  • results: The paper demonstrates the effectiveness of NSAs in improving prediction confidence scores by four times against natural corruptions on the Imagenette dataset, and also shows an average improvement of 8% in adversarial accuracy. The paper also provides qualitative analysis of NSAs using saliency maps to understand how they help improve prediction confidence.
    Abstract Visual AI systems are vulnerable to natural and synthetic physical corruption in the real-world. Such corruption often arises unexpectedly and alters the model's performance. In recent years, the primary focus has been on adversarial attacks. However, natural corruptions (e.g., snow, fog, dust) are an omnipresent threat to visual AI systems and should be considered equally important. Many existing works propose interesting solutions to train robust models against natural corruption. These works either leverage image augmentations, which come with the additional cost of model training, or place suspicious patches in the scene to design unadversarial examples. In this work, we propose the idea of naturalistic support artifacts (NSA) for robust prediction. The NSAs are shown to be beneficial in scenarios where model parameters are inaccessible and adding artifacts in the scene is feasible. The NSAs are natural looking objects generated through artifact training using DC-GAN to have high visual fidelity in the scene. We test against natural corruptions on the Imagenette dataset and observe the improvement in prediction confidence score by four times. We also demonstrate NSA's capability to increase adversarial accuracy by 8\% on average. Lastly, we qualitatively analyze NSAs using saliency maps to understand how they help improve prediction confidence.

Clustering of illustrations by atmosphere using a combination of supervised and unsupervised learning

  • paper_url: http://arxiv.org/abs/2307.15099
  • repo_url: None
  • paper_authors: Keisuke Kubota, Masahiro Okuda
  • for: Illustrations distributed on social media such as Twitter and Pixiv have become increasingly common, and the "atmosphere" of an illustration plays an important role in user preferences; classifying illustrations by atmosphere can help recommendation and search.
  • methods: Combines supervised and unsupervised learning with pseudo-labels: feature vectors are obtained with a supervised method using pseudo-labels that capture the ambiguous atmosphere, and clustering is then performed on these feature vectors.
  • results: Experimental analyses show that the method outperforms conventional methods in human-like clustering on datasets manually classified by humans.
    Abstract The distribution of illustrations on social media, such as Twitter and Pixiv has increased with the growing popularity of animation, games, and animated movies. The "atmosphere" of illustrations plays an important role in user preferences. Classifying illustrations by atmosphere can be helpful for recommendations and searches. However, assigning clear labels to the elusive "atmosphere" and conventional supervised classification is not always practical. Furthermore, even images with similar colors, edges, and low-level features may not have similar atmospheres, making classification based on low-level features challenging. In this paper, this problem is solved using both supervised and unsupervised learning with pseudo-labels. The feature vectors are obtained using the supervised method with pseudo-labels that contribute to an ambiguous atmosphere. Further, clustering is performed based on these feature vectors. Experimental analyses show that our method outperforms conventional methods in human-like clustering on datasets manually classified by humans.
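
A minimal sketch of the pipeline described above, combining a supervised feature extractor (in practice trained with pseudo-labels) with unsupervised k-means clustering; the tiny backbone, tensor shapes, and cluster count here are placeholders, not the paper's model.

```python
import torch, torch.nn as nn
from sklearn.cluster import KMeans

# Tiny stand-in backbone; in practice this would be a CNN trained with the
# pseudo-labels so that its penultimate layer encodes "atmosphere" cues.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
classifier_head = nn.Linear(64, 10)       # trained against pseudo-labels (not shown)

images = torch.rand(100, 3, 32, 32)       # placeholder illustration batch
with torch.no_grad():
    features = backbone(images).numpy()   # supervisedly-learned feature vectors

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)
print(clusters[:10])
```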

Scaling Session-Based Transformer Recommendations using Optimized Negative Sampling and Loss Functions

  • paper_url: http://arxiv.org/abs/2307.14906
  • repo_url: https://github.com/otto-de/tron
  • paper_authors: Timo Wilm, Philipp Normann, Sophie Baumeister, Paul-Vincent Kobow
  • for: Proposes a scalable session-based Transformer recommender to improve recommendation quality and scalability.
  • methods: Uses optimized top-k negative sampling and listwise loss functions to enhance recommendation accuracy.
  • results: Evaluated on large-scale e-commerce datasets, the method improves recommendation quality over existing methods while maintaining training speeds similar to SASRec. A live A/B test yielded an 18.14% increase in click-through rate over SASRec, highlighting TRON's potential in practical settings.
    Abstract This work introduces TRON, a scalable session-based Transformer Recommender using Optimized Negative-sampling. Motivated by the scalability and performance limitations of prevailing models such as SASRec and GRU4Rec+, TRON integrates top-k negative sampling and listwise loss functions to enhance its recommendation accuracy. Evaluations on relevant large-scale e-commerce datasets show that TRON improves upon the recommendation quality of current methods while maintaining training speeds similar to SASRec. A live A/B test yielded an 18.14% increase in click-through rate over SASRec, highlighting the potential of TRON in practical settings. For further research, we provide access to our source code at https://github.com/otto-de/TRON and an anonymized dataset at https://github.com/otto-de/recsys-dataset.
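
To illustrate the combination of top-k negative sampling with a listwise loss, the sketch below scores a session embedding against its positive item and the k hardest sampled negatives and applies a softmax cross-entropy; the shapes, k, and dot-product scoring are assumptions rather than TRON's exact configuration.

```python
import torch
import torch.nn.functional as F

def listwise_loss_topk_negatives(session_emb, pos_emb, neg_embs, k=16):
    """Sampled listwise (softmax) loss with top-k negative sampling: score the
    positive item against the k hardest sampled negatives and apply
    cross-entropy with the positive in position 0."""
    neg_scores = session_emb @ neg_embs.T                 # (batch, n_negatives)
    topk_scores, _ = neg_scores.topk(k, dim=1)            # keep hardest negatives
    pos_scores = (session_emb * pos_emb).sum(dim=1, keepdim=True)
    logits = torch.cat([pos_scores, topk_scores], dim=1)  # positive is class 0
    targets = torch.zeros(logits.size(0), dtype=torch.long)
    return F.cross_entropy(logits, targets)

loss = listwise_loss_topk_negatives(torch.randn(8, 64), torch.randn(8, 64), torch.randn(256, 64))
print(loss.item())
```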

CodeLens: An Interactive Tool for Visualizing Code Representations

  • paper_url: http://arxiv.org/abs/2307.14902
  • repo_url: None
  • paper_authors: Yuejun Guo, Seifeddine Bettaieb, Qiang Hu, Yves Le Traon, Qiang Tang
  • for: Provides a visual interaction environment for code representations, helping developers quickly understand and explore different types of code representations.
  • methods: Introduces the CodeLens tool, which supports multiple programming languages (Java, Python, and JavaScript) and four code representations: token sequences, abstract syntax trees (AST), data flow graphs (DFG), and control flow graphs (CFG).
  • results: With CodeLens, developers can quickly visualize a specific code representation and obtain the represented inputs for models of code, ready for machine learning algorithms to extract information.
    Abstract Representing source code in a generic input format is crucial to automate software engineering tasks, e.g., applying machine learning algorithms to extract information. Visualizing code representations can further enable human experts to gain an intuitive insight into the code. Unfortunately, as of today, there is no universal tool that can simultaneously visualise different types of code representations. In this paper, we introduce a tool, CodeLens, which provides a visual interaction environment that supports various representation methods and helps developers understand and explore them. CodeLens is designed to support multiple programming languages, such as Java, Python, and JavaScript, and four types of code representations, including sequence of tokens, abstract syntax tree (AST), data flow graph (DFG), and control flow graph (CFG). By using CodeLens, developers can quickly visualize the specific code representation and also obtain the represented inputs for models of code. The Web-based interface of CodeLens is available at http://www.codelens.org. The demonstration video can be found at http://www.codelens.org/demo.
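
Two of the representations CodeLens visualizes, token sequences and ASTs, can be produced for Python code with the standard library alone, as in the sketch below; this uses `tokenize` and `ast` directly rather than CodeLens itself, and DFG/CFG extraction is not shown.

```python
import ast
import io
import tokenize

source = "def add(a, b):\n    return a + b\n"

# Token-sequence representation via the standard-library tokenizer.
tokens = [tok.string for tok in tokenize.generate_tokens(io.StringIO(source).readline)
          if tok.string.strip()]
print(tokens)

# Abstract syntax tree representation via the ast module.
print(ast.dump(ast.parse(source)))
```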

Self-Supervised Learning for Improved Synthetic Aperture Sonar Target Recognition

  • paper_url: http://arxiv.org/abs/2307.15098
  • repo_url: None
  • paper_authors: BW Sheffield
  • for: Explores the application of self-supervised learning (SSL) for improved target recognition in synthetic aperture sonar (SAS) imagery.
  • methods: Evaluates two prominent SSL algorithms, MoCo v2 and BYOL, against a well-regarded supervised learning model, ResNet18, on binary image classification tasks.
  • results: With only a small number of labels (few-shot scenario), the SSL models can outperform the fully supervised model, but they do not exceed it when all the labels are used.
    Abstract This study explores the application of self-supervised learning (SSL) for improved target recognition in synthetic aperture sonar (SAS) imagery. The unique challenges of underwater environments make traditional computer vision techniques, which rely heavily on optical camera imagery, less effective. SAS, with its ability to generate high-resolution imagery, emerges as a preferred choice for underwater imaging. However, the voluminous high-resolution SAS data presents a significant challenge for labeling; a crucial step for training deep neural networks (DNNs). SSL, which enables models to learn features in data without the need for labels, is proposed as a potential solution to the data labeling challenge in SAS. The study evaluates the performance of two prominent SSL algorithms, MoCov2 and BYOL, against the well-regarded supervised learning model, ResNet18, for binary image classification tasks. The findings suggest that while both SSL models can outperform a fully supervised model with access to a small number of labels in a few-shot scenario, they do not exceed it when all the labels are used. The results underscore the potential of SSL as a viable alternative to traditional supervised learning, capable of maintaining task performance while reducing the time and costs associated with data labeling. The study also contributes to the growing body of evidence supporting the use of SSL in remote sensing and could stimulate further research in this area.

Cascaded Cross-Modal Transformer for Request and Complaint Detection

  • paper_url: http://arxiv.org/abs/2307.15097
  • repo_url: https://github.com/ristea/ccmt
  • paper_authors: Nicolae-Catalin Ristea, Radu Tudor Ionescu
  • for: Proposes a novel cascaded cross-modal transformer (CCMT) to detect customer requests and complaints in phone conversations.
  • methods: Leverages a multimodal paradigm: speech is transcribed with automatic speech recognition (ASR) models and the transcripts are translated into different languages; language-specific BERT-based models are then combined with Wav2Vec2.0 audio features in a novel cascaded cross-attention transformer model.
  • results: Applied to the Requests Sub-Challenge of the ACM Multimedia 2023 Computational Paralinguistics Challenge, the system reaches unweighted average recalls (UAR) of 65.41% and 85.87% for the complaint and request classes, respectively.
    Abstract We propose a novel cascaded cross-modal transformer (CCMT) that combines speech and text transcripts to detect customer requests and complaints in phone conversations. Our approach leverages a multimodal paradigm by transcribing the speech using automatic speech recognition (ASR) models and translating the transcripts into different languages. Subsequently, we combine language-specific BERT-based models with Wav2Vec2.0 audio features in a novel cascaded cross-attention transformer model. We apply our system to the Requests Sub-Challenge of the ACM Multimedia 2023 Computational Paralinguistics Challenge, reaching unweighted average recalls (UAR) of 65.41% and 85.87% for the complaint and request classes, respectively.
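
A minimal PyTorch sketch of one cross-modal attention stage of the kind described above: text token features attend over audio frame features and the result is fused back into the text stream. The dimensions, projection choices, and residual/normalisation layout are assumptions, not the CCMT architecture itself.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Minimal cross-attention block: text token features act as queries that
    attend over audio frame features, then the attended audio is fused back
    into the text stream (a rough sketch of one cascade stage)."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, audio_feats):
        attended, _ = self.attn(query=text_feats, key=audio_feats, value=audio_feats)
        return self.norm(text_feats + attended)

text = torch.randn(2, 40, 256)    # e.g. BERT token embeddings (projected to 256-d)
audio = torch.randn(2, 300, 256)  # e.g. Wav2Vec2.0 frame embeddings (projected to 256-d)
print(CrossModalAttention()(text, audio).shape)   # torch.Size([2, 40, 256])
```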

Generative convective parametrization of dry atmospheric boundary layer

  • paper_url: http://arxiv.org/abs/2307.14857
  • repo_url: None
  • paper_authors: Florian Heyder, Juan Pedro Mellado, Jörg Schumacher
  • for: Provides a turbulence parametrization based on a generative machine-learning algorithm for the dry convective boundary layer in kilometre-scale Earth system models.
  • methods: A parametrization based on a generative adversarial network that incorporates the physics of self-similar layer growth from classical mixed-layer theory (Deardorff), trained on fully three-dimensional direct numerical simulation data, which enlarges the training data base and improves the predicted statistics.
  • results: The parametrization accurately predicts the statistics of the synthetically generated turbulence fields at different heights, including the highly non-Gaussian transient statistics of buoyancy fluctuations, vertical velocity, and buoyancy flux, capturing the fastest thermals penetrating into the stabilized top region. It additionally provides the granule-type horizontal organization of the turbulent convection, which cannot be obtained from other model closures.
    Abstract Turbulence parametrizations will remain a necessary building block in kilometer-scale Earth system models. In convective boundary layers, where the mean vertical gradients of conserved properties such as potential temperature and moisture are approximately zero, the standard ansatz which relates turbulent fluxes to mean vertical gradients via an eddy diffusivity has to be extended by mass flux parametrizations for the typically asymmetric up- and downdrafts in the atmospheric boundary layer. In this work, we present a parametrization for a dry convective boundary layer based on a generative adversarial network. The model incorporates the physics of self-similar layer growth following from the classical mixed layer theory by Deardorff. This enhances the training data base of the generative machine learning algorithm and thus significantly improves the predicted statistics of the synthetically generated turbulence fields at different heights inside the boundary layer. The algorithm training is based on fully three-dimensional direct numerical simulation data. Differently to stochastic parametrizations, our model is able to predict the highly non-Gaussian transient statistics of buoyancy fluctuations, vertical velocity, and buoyancy flux at different heights thus also capturing the fastest thermals penetrating into the stabilized top region. The results of our generative algorithm agree with standard two-equation or multi-plume stochastic mass-flux schemes. The present parametrization provides additionally the granule-type horizontal organization of the turbulent convection which cannot be obtained in any of the other model closures. Our work paves the way to efficient data-driven convective parametrizations in other natural flows, such as moist convection, upper ocean mixing, or convection in stellar interiors.

Counterfactual Explanations for Graph Classification Through the Lenses of Density

  • paper_url: http://arxiv.org/abs/2307.14849
  • repo_url: https://github.com/carlo-abrate/Counterfactual-Explanations-for-Graph-Classification-Through-the-Lenses-of-Density
  • paper_authors: Carlo Abrate, Giulia Preti, Francesco Bonchi
  • for: Provides density-based counterfactual examples for graph classifiers, to better explain classification results.
  • methods: A general density-based counterfactual search framework that can be instantiated with different notions of dense substructures; two instantiations are presented, one that opens or closes triangles and one driven by maximal cliques.
  • results: Experiments on 7 brain network datasets confirm that adopting a semantically relevant unit of change such as density is essential for defining versatile and interpretable counterfactual explanations.
    Abstract Counterfactual examples have emerged as an effective approach to produce simple and understandable post-hoc explanations. In the context of graph classification, previous work has focused on generating counterfactual explanations by manipulating the most elementary units of a graph, i.e., removing an existing edge, or adding a non-existing one. In this paper, we claim that such language of explanation might be too fine-grained, and turn our attention to some of the main characterizing features of real-world complex networks, such as the tendency to close triangles, the existence of recurring motifs, and the organization into dense modules. We thus define a general density-based counterfactual search framework to generate instance-level counterfactual explanations for graph classifiers, which can be instantiated with different notions of dense substructures. In particular, we show two specific instantiations of this general framework: a method that searches for counterfactual graphs by opening or closing triangles, and a method driven by maximal cliques. We also discuss how the general method can be instantiated to exploit any other notion of dense substructures, including, for instance, a given taxonomy of nodes. We evaluate the effectiveness of our approaches in 7 brain network datasets and compare the counterfactual statements generated according to several widely-used metrics. Results confirm that adopting a semantic-relevant unit of change like density is essential to define versatile and interpretable counterfactual explanation methods.
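
A brute-force sketch of the triangle-based instantiation: try closing one triangle at a time (adding an edge between two nodes that share a neighbour) and return the first modified graph the classifier maps to the desired label. The toy classifier and search order are illustrative and ignore the efficiency and minimality concerns handled in the paper.

```python
import networkx as nx
from itertools import combinations

def triangle_closing_counterfactual(graph, classifier, target_label):
    """Brute-force sketch: try adding one edge that closes a triangle (two nodes
    with a common neighbour) and return the first modified graph that the
    classifier assigns to the desired label."""
    for u, v in combinations(graph.nodes, 2):
        if graph.has_edge(u, v) or not set(graph[u]) & set(graph[v]):
            continue  # skip existing edges and pairs with no common neighbour
        candidate = graph.copy()
        candidate.add_edge(u, v)
        if classifier(candidate) == target_label:
            return candidate
    return None

# toy classifier: label 1 iff the graph contains at least one triangle
classifier = lambda g: int(sum(nx.triangles(g).values()) > 0)
g = nx.path_graph(4)                       # 0-1-2-3, no triangles, label 0
cf = triangle_closing_counterfactual(g, classifier, target_label=1)
print(sorted(cf.edges()))                  # an edge such as (0, 2) closes a triangle
```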

Kernelised Normalising Flows

  • paper_url: http://arxiv.org/abs/2307.14839
  • repo_url: None
  • paper_authors: Eshant English, Matthias Kirchler, Christoph Lippert
  • for: Normalising Flows are generative models with invertible architectures; this work proposes a new kernelised normalising flow paradigm to improve their expressiveness.
  • methods: Presents Ferumal flow, which integrates kernels into the normalising flow framework as an alternative to neural-network-based transformations.
  • results: Kernelised flows yield competitive or superior results compared to neural-network-based flows while maintaining parameter efficiency, and excel especially in the low-data regime, enabling flexible non-parametric density estimation in applications with sparse data availability.
    Abstract Normalising Flows are generative models characterised by their invertible architecture. However, the requirement of invertibility imposes constraints on their expressiveness, necessitating a large number of parameters and innovative architectural designs to achieve satisfactory outcomes. Whilst flow-based models predominantly rely on neural-network-based transformations for expressive designs, alternative transformation methods have received limited attention. In this work, we present Ferumal flow, a novel kernelised normalising flow paradigm that integrates kernels into the framework. Our results demonstrate that a kernelised flow can yield competitive or superior results compared to neural network-based flows whilst maintaining parameter efficiency. Kernelised flows excel especially in the low-data regime, enabling flexible non-parametric density estimation in applications with sparse data availability.

Building RadiologyNET: Unsupervised annotation of a large-scale multimodal medical database

  • paper_url: http://arxiv.org/abs/2308.08517
  • repo_url: None
  • paper_authors: Mateja Napravnik, Franko Hržić, Sebastian Tschauner, Ivan Štajduhar
  • for: Automatically annotating a large medical radiology image database with respect to semantic similarity, to support computer-aided diagnosis and treatment workflows.
  • methods: An automated, unsupervised annotation approach that exploits multimodal data sources (images, DICOM metadata, and narrative diagnoses); several feature extractors are tested for each data source and evaluated with k-means and k-medoids clustering on a representative data subset.
  • results: Fusing the embeddings of all three data sources works best for unsupervised clustering of large-scale medical data, yielding the most concise clusters; the work is a first step toward a much larger and more fine-grained annotated dataset of medical radiology images.
    Abstract Background and objective: The usage of machine learning in medical diagnosis and treatment has witnessed significant growth in recent years through the development of computer-aided diagnosis systems that are often relying on annotated medical radiology images. However, the availability of large annotated image datasets remains a major obstacle since the process of annotation is time-consuming and costly. This paper explores how to automatically annotate a database of medical radiology images with regard to their semantic similarity. Material and methods: An automated, unsupervised approach is used to construct a large annotated dataset of medical radiology images originating from Clinical Hospital Centre Rijeka, Croatia, utilising multimodal sources, including images, DICOM metadata, and narrative diagnoses. Several appropriate feature extractors are tested for each of the data sources, and their utility is evaluated using k-means and k-medoids clustering on a representative data subset. Results: The optimal feature extractors are then integrated into a multimodal representation, which is then clustered to create an automated pipeline for labelling a precursor dataset of 1,337,926 medical images into 50 clusters of visually similar images. The quality of the clusters is assessed by examining their homogeneity and mutual information, taking into account the anatomical region and modality representation. Conclusion: The results suggest that fusing the embeddings of all three data sources together works best for the task of unsupervised clustering of large-scale medical data, resulting in the most concise clusters. Hence, this work is the first step towards building a much larger and more fine-grained annotated dataset of medical radiology images.

Fading memory as inductive bias in residual recurrent networks

  • paper_url: http://arxiv.org/abs/2307.14823
  • repo_url: None
  • paper_authors: Igor Dubinin, Felix Effenberger
  • for: Studies how residual connections in recurrent neural networks (RNNs) influence their dynamics and fading-memory properties, and how they can be used to improve task performance.
  • methods: Introduces weakly coupled residual recurrent networks (WCRNNs), whose residual connections yield well-defined Lyapunov exponents, and investigates how these connections affect performance, network dynamics, and memory properties on benchmark tasks.
  • results: Several distinct forms of residual connections yield effective inductive biases that increase network expressivity; in particular, connections that place the dynamics near the edge of chaos, let the network exploit characteristic spectral properties of the data, or induce heterogeneous memory properties increase practical expressivity. The results are extended to non-linear residuals, and a weakly coupled residual initialization scheme for Elman RNNs is introduced.
    Abstract Residual connections have been proposed as architecture-based inductive bias to mitigate the problem of exploding and vanishing gradients and increase task performance in both feed-forward and recurrent networks (RNNs) when trained with the backpropagation algorithm. Yet, little is known about how residual connections in RNNs influence their dynamics and fading memory properties. Here, we introduce weakly coupled residual recurrent networks (WCRNNs) in which residual connections result in well-defined Lyapunov exponents and allow for studying properties of fading memory. We investigate how the residual connections of WCRNNs influence their performance, network dynamics, and memory properties on a set of benchmark tasks. We show that several distinct forms of residual connections yield effective inductive biases that result in increased network expressivity. In particular, residual connections that (i) result in network dynamics at the proximity of the edge of chaos, (ii) allow networks to capitalize on characteristic spectral properties of the data, and (iii) result in heterogeneous memory properties are shown to increase practical expressivity. In addition, we demonstrate how our results can be extended to non-linear residuals and introduce a weakly coupled residual initialization scheme that can be used for Elman RNNs
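
A small PyTorch sketch of a residual Elman-style recurrent cell in the spirit of a WCRNN, where a coupling factor alpha scales the recurrent update added to the previous hidden state; the exact parametrisation of the residual connections studied in the paper may differ.

```python
import torch
import torch.nn as nn

class ResidualElmanCell(nn.Module):
    """Elman-style recurrent cell with a residual (weakly coupled) update:
    h_t = h_{t-1} + alpha * tanh(W_x x_t + W_h h_{t-1} + b). A small alpha keeps
    the hidden state close to an identity map, giving slowly fading memory."""
    def __init__(self, input_size, hidden_size, alpha=0.1):
        super().__init__()
        self.inp = nn.Linear(input_size, hidden_size)
        self.rec = nn.Linear(hidden_size, hidden_size)
        self.alpha = alpha

    def forward(self, x, h):
        return h + self.alpha * torch.tanh(self.inp(x) + self.rec(h))

cell = ResidualElmanCell(input_size=8, hidden_size=32, alpha=0.1)
h = torch.zeros(4, 32)
for x_t in torch.randn(20, 4, 8):   # 20 time steps, batch of 4
    h = cell(x_t, h)
print(h.shape)                      # torch.Size([4, 32])
```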

Likely, Light, and Accurate Context-Free Clusters-based Trajectory Prediction

  • paper_url: http://arxiv.org/abs/2307.14788
  • repo_url: None
  • paper_authors: Tiago Rodrigues de Almeida, Oscar Martinez Mozos
  • for: Proposes a multi-stage probabilistic approach for trajectory forecasting for autonomous systems in the road transportation network.
  • methods: The pipeline consists of trajectory transformation to displacement space, clustering of displacement time series, trajectory proposals, and ranking of proposals. A new deep feature clustering method, based on a self-conditioned GAN, copes better with distribution shifts than traditional methods, and novel distance-based ranking proposals assign probabilities to the generated trajectories.
  • results: The overall system surpasses context-free deep generative models on human and road agent trajectory data, while performing similarly to point estimators when comparing the most probable trajectory; the distance-based ranking is more efficient yet accurate compared with an auxiliary neural network.
    Abstract Autonomous systems in the road transportation network require intelligent mechanisms that cope with uncertainty to foresee the future. In this paper, we propose a multi-stage probabilistic approach for trajectory forecasting: trajectory transformation to displacement space, clustering of displacement time series, trajectory proposals, and ranking proposals. We introduce a new deep feature clustering method, underlying self-conditioned GAN, which copes better with distribution shifts than traditional methods. Additionally, we propose novel distance-based ranking proposals to assign probabilities to the generated trajectories that are more efficient yet accurate than an auxiliary neural network. The overall system surpasses context-free deep generative models in human and road agents trajectory data while performing similarly to point estimators when comparing the most probable trajectory.

Emotion4MIDI: a Lyrics-based Emotion-Labeled Symbolic Music Dataset

  • paper_url: http://arxiv.org/abs/2307.14783
  • repo_url: https://github.com/serkansulun/lyricsemotions
  • paper_authors: Serkan Sulun, Pedro Oliveira, Paula Viana
  • for: Creating a large-scale emotion-labeled symbolic music dataset of 12k MIDI songs.
  • methods: The authors first train emotion classification models on the GoEmotions dataset, achieving state-of-the-art results with a model half the size of the baseline, and then apply these models to the lyrics of two large-scale MIDI datasets.
  • results: The resulting dataset covers a wide range of fine-grained emotions and provides a valuable resource to explore the connection between music and emotions, and in particular to develop models that generate music conditioned on specific emotions. The inference code, trained models, and dataset are available online.
    Abstract We present a new large-scale emotion-labeled symbolic music dataset consisting of 12k MIDI songs. To create this dataset, we first trained emotion classification models on the GoEmotions dataset, achieving state-of-the-art results with a model half the size of the baseline. We then applied these models to lyrics from two large-scale MIDI datasets. Our dataset covers a wide range of fine-grained emotions, providing a valuable resource to explore the connection between music and emotions and, especially, to develop models that can generate music based on specific emotions. Our code for inference, trained models, and datasets are available online.

MATNilm: Multi-appliance-task Non-intrusive Load Monitoring with Limited Labeled Data

  • paper_url: http://arxiv.org/abs/2307.14778
  • repo_url: https://github.com/jxiong22/matnilm
  • paper_authors: Jing Xiong, Tianqi Hong, Dongbo Zhao, Yu Zhang
  • for: Proposes an efficient and accurate non-intrusive load monitoring (NILM) method that identifies the operating status and power consumption of household appliances by disaggregating the total power usage signal of a house.
  • methods: A multi-appliance-task framework with a training-efficient sample augmentation (SA) scheme that reduces the amount of labeled data required; each appliance has a shared-hierarchical split structure for its regression and classification tasks, and a two-dimensional attention mechanism captures spatio-temporal correlations among all appliances.
  • results: Simulation results show that the proposed approach significantly outperforms many baseline models, with relative errors reduced by more than 50% on average.
    Abstract Non-intrusive load monitoring (NILM) identifies the status and power consumption of various household appliances by disaggregating the total power usage signal of an entire house. Efficient and accurate load monitoring facilitates user profile establishment, intelligent household energy management, and peak load shifting. This is beneficial for both the end-users and utilities by improving the overall efficiency of a power distribution network. Existing approaches mainly focus on developing an individual model for each appliance. Those approaches typically rely on a large amount of household-labeled data which is hard to collect. In this paper, we propose a multi-appliance-task framework with a training-efficient sample augmentation (SA) scheme that boosts the disaggregation performance with limited labeled data. For each appliance, we develop a shared-hierarchical split structure for its regression and classification tasks. In addition, we also propose a two-dimensional attention mechanism in order to capture spatio-temporal correlations among all appliances. With only one-day training data and limited appliance operation profiles, the proposed SA algorithm can achieve comparable test performance to the case of training with the full dataset. Finally, simulation results show that our proposed approach features a significantly improved performance over many baseline models. The relative errors can be reduced by more than 50% on average. The codes of this work are available at https://github.com/jxiong22/MATNilm

Towards Practicable Sequential Shift Detectors

  • paper_url: http://arxiv.org/abs/2307.14758
  • repo_url: None
  • paper_authors: Oliver Cobb, Arnaud Van Looveren
  • for: Examining the harmful effects of distribution shift on deployed machine learning models and how such shifts can be detected before associated costs accumulate.
  • methods: Reviews existing sequential shift detection methods and assesses their practicality against desiderata crucial to real-world deployment.
  • results: Finds that existing works typically overlook these desiderata, precluding widespread adoption, and recommends impactful directions for future research.
    Abstract There is a growing awareness of the harmful effects of distribution shift on the performance of deployed machine learning models. Consequently, there is a growing interest in detecting these shifts before associated costs have time to accumulate. However, desiderata of crucial importance to the practicable deployment of sequential shift detectors are typically overlooked by existing works, precluding their widespread adoption. We identify three such desiderata, highlight existing works relevant to their satisfaction, and recommend impactful directions for future research.

Fair Machine Unlearning: Data Removal while Mitigating Disparities

  • paper_url: http://arxiv.org/abs/2307.14754
  • repo_url: None
  • paper_authors: Alex Oesterling, Jiaqi Ma, Flavio P. Calmon, Hima Lakkaraju
  • for: Providing a machine unlearning method that reliably forgets data instances (supporting the right to be forgotten) while protecting group fairness.
  • methods: Proposes the first fair machine unlearning method, which can provably and efficiently unlearn data instances while preserving group fairness, with theoretical results demonstrating the fairness guarantees.
  • results: Extensive experiments on real-world datasets show that the method unlearns data instances while preserving group fairness.
    Abstract As public consciousness regarding the collection and use of personal information by corporations grows, it is of increasing importance that consumers be active participants in the curation of corporate datasets. In light of this, data governance frameworks such as the General Data Protection Regulation (GDPR) have outlined the right to be forgotten as a key principle allowing individuals to request that their personal data be deleted from the databases and models used by organizations. To achieve forgetting in practice, several machine unlearning methods have been proposed to address the computational inefficiencies of retraining a model from scratch with each unlearning request. While efficient online alternatives to retraining, it is unclear how these methods impact other properties critical to real-world applications, such as fairness. In this work, we propose the first fair machine unlearning method that can provably and efficiently unlearn data instances while preserving group fairness. We derive theoretical results which demonstrate that our method can provably unlearn data instances while maintaining fairness objectives. Extensive experimentation with real-world datasets highlight the efficacy of our method at unlearning data instances while preserving fairness.

FLARE: Fingerprinting Deep Reinforcement Learning Agents using Universal Adversarial Masks

  • paper_url: http://arxiv.org/abs/2307.14751
  • repo_url: https://github.com/ssg-research/FLARE
  • paper_authors: Buse G. A. Tekgul, N. Asokan
  • for: Verifying whether a suspected deep reinforcement learning (DRL) policy is an illegitimate copy of another (victim) policy.
  • methods: Finds non-transferable universal adversarial masks whose adversarial examples transfer from a victim policy to its modified versions but not to independently trained policies, and uses these masks as fingerprints by measuring an action agreement value over states perturbed via the masks.
  • results: Achieves 100% action agreement on stolen copies without falsely accusing independent policies (no false positives), is robust against model modification attacks, and cannot be easily evaded by more informed adversaries without negatively impacting agent performance.
    Abstract We propose FLARE, the first fingerprinting mechanism to verify whether a suspected Deep Reinforcement Learning (DRL) policy is an illegitimate copy of another (victim) policy. We first show that it is possible to find non-transferable, universal adversarial masks, i.e., perturbations, to generate adversarial examples that can successfully transfer from a victim policy to its modified versions but not to independently trained policies. FLARE employs these masks as fingerprints to verify the true ownership of stolen DRL policies by measuring an action agreement value over states perturbed via such masks. Our empirical evaluations show that FLARE is effective (100% action agreement on stolen copies) and does not falsely accuse independent policies (no false positives). FLARE is also robust to model modification attacks and cannot be easily evaded by more informed adversaries without negatively impacting agent performance. We also show that not all universal adversarial masks are suitable candidates for fingerprints due to the inherent characteristics of DRL policies. The spatio-temporal dynamics of DRL problems and sequential decision-making process make characterizing the decision boundary of DRL policies more difficult, as well as searching for universal masks that capture the geometry of it.
    摘要 我们提出FLARE,这是第一个用于验证可疑深度强化学习(DRL)策略是否为另一个(受害者)策略非法副本的指纹机制。我们首先证明可以找到不可迁移的通用对抗掩码(即扰动),其生成的对抗样本能从受害者策略迁移到其修改版本,但无法迁移到独立训练的策略。FLARE将这些掩码用作指纹,通过在被扰动状态上度量动作一致率来验证被盗DRL策略的真实归属。实验评估显示,FLARE在被盗副本上达到100%的动作一致率,且不会错误指控独立策略(无假阳性)。FLARE对模型修改攻击具有鲁棒性,更有信息的攻击者也难以在不损害智能体性能的前提下规避它。我们还表明,并非所有通用对抗掩码都适合用作指纹:DRL问题的时空动态和序贯决策过程使得刻画DRL策略的决策边界以及搜索能捕捉其几何结构的通用掩码更加困难。
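To make the verification step above concrete, the following is a minimal sketch (not the authors' released code) of measuring action agreement on states perturbed by a candidate universal adversarial mask; the `victim_policy`, `suspect_policy` and `mask` objects are hypothetical stand-ins.

```python
# Minimal sketch of FLARE-style verification: given a candidate universal mask,
# measure how often a suspect policy agrees with the victim on perturbed states.
import numpy as np

def action_agreement(victim_policy, suspect_policy, states, mask, epsilon=0.05):
    """Fraction of mask-perturbed states on which both policies pick the same action."""
    agree = 0
    for s in states:
        s_adv = np.clip(s + epsilon * mask, 0.0, 1.0)   # apply the universal perturbation
        a_victim = victim_policy(s_adv)                  # greedy action of the victim
        a_suspect = suspect_policy(s_adv)                # greedy action of the suspect
        agree += int(a_victim == a_suspect)
    return agree / len(states)

# A suspect whose agreement is close to 1.0 is flagged as a likely stolen copy;
# independently trained policies should score much lower because the mask was
# selected to be non-transferable.
```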

Semantic Image Completion and Enhancement using GANs

  • paper_url: http://arxiv.org/abs/2307.14748
  • repo_url: None
  • paper_authors: Priyansh Saxena, Raahat Gupta, Akshat Maheshwari, Saumil Maheshwari
  • for: 这篇论文的目的是提出一种基于生成对抗网络(GAN)的图像完成和加强方法。
  • methods: 这篇论文使用了GAN网络来实现图像完成和加强任务。
  • results: 研究表明,使用GAN网络可以有效地完成和加强图像,并且可以提高图像质量。
    Abstract Semantic inpainting or image completion alludes to the task of inferring arbitrary large missing regions in images based on image semantics. Since the prediction of image pixels requires an indication of high-level context, this makes it significantly tougher than image completion, which is often more concerned with correcting data corruption and removing entire objects from the input image. On the other hand, image enhancement attempts to eliminate unwanted noise and blur from the image, along with sustaining most of the image details. Efficient image completion and enhancement model should be able to recover the corrupted and masked regions in images and then refine the image further to increase the quality of the output image. Generative Adversarial Networks (GAN), have turned out to be helpful in picture completion tasks. In this chapter, we will discuss the underlying GAN architecture and how they can be used used for image completion tasks.
    摘要 Semantic inpainting or image completion refers to the task of inferring arbitrary large missing regions in images based on image semantics. As the prediction of image pixels requires high-level context, this task is significantly more challenging than image completion, which is often focused on correcting data corruption and removing entire objects from the input image. On the other hand, image enhancement aims to eliminate unwanted noise and blur from the image while preserving most of the image details. An efficient image completion and enhancement model should be able to recover the corrupted and masked regions in images and then refine the image further to improve the quality of the output image. Generative Adversarial Networks (GAN) have proven to be helpful in picture completion tasks, and we will discuss the underlying GAN architecture and how they can be used for image completion tasks in this chapter.

A Strategic Framework for Optimal Decisions in Football 1-vs-1 Shot-Taking Situations: An Integrated Approach of Machine Learning, Theory-Based Modeling, and Game Theory

  • paper_url: http://arxiv.org/abs/2307.14732
  • repo_url: https://github.com/calvinyeungck/analyzing-two-agents-interaction-in-football-shot-taking-situations
  • paper_authors: Calvin C. K. Yeung, Keisuke Fujii
  • for: The paper aims to analyze critical scenarios in football, specifically shot-taking, using game theory and machine learning.
  • methods: The proposed framework uses ML models to estimate expected payoffs and extracts additional features with a theory-based shot block model. The xSOT metric is introduced to evaluate players' actions even if the shot results in no goal, allowing for effective differentiation and comparison.
  • results: The framework is validated through experiments and shows a high correlation with existing metrics, indicating that xSOT provides valuable insights. The paper demonstrates the use of the framework to analyze optimal strategies in the World Cup 2022 and a shot situation in EURO 2020. In Simplified Chinese:
  • for: 本文旨在通过博弈论和机器学习分析足球比赛中的关键情形,具体而言是射门情形。
  • methods: 所提框架使用机器学习模型估计预期收益,并利用基于理论的射门封堵模型提取额外特征;同时引入xSOT指标,即使射门未进球也能评估球员的决策,从而实现有效的区分与比较。
  • results: 实验验证了该框架,且xSOT与现有指标高度相关,表明其提供了有价值的洞见;文中以2022年世界杯的最优策略分析和2020年欧洲杯的一次射门情形为例展示了框架的应用。
    Abstract Complex interactions between two opposing agents frequently occur in domains of machine learning, game theory, and other application domains. Quantitatively analyzing the strategies involved can provide an objective basis for decision-making. One such critical scenario is shot-taking in football, where decisions, such as whether the attacker should shoot or pass the ball and whether the defender should attempt to block the shot, play a crucial role in the outcome of the game. However, there are currently no effective data-driven and/or theory-based approaches to analyzing such situations. To address this issue, we proposed a novel framework to analyze such scenarios based on game theory, where we estimate the expected payoff with machine learning (ML) models, and additional features for ML models were extracted with a theory-based shot block model. Conventionally, successes or failures (1 or 0) are used as payoffs, while a success shot (goal) is extremely rare in football. Therefore, we proposed the Expected Probability of Shot On Target (xSOT) metric to evaluate players' actions even if the shot results in no goal; this allows for effective differentiation and comparison between different shots and even enables counterfactual shot situation analysis. In our experiments, we have validated the framework by comparing it with baseline and ablated models. Furthermore, we have observed a high correlation between the xSOT and existing metrics. This alignment of information suggests that xSOT provides valuable insights. Lastly, as an illustration, we studied optimal strategies in the World Cup 2022 and analyzed a shot situation in EURO 2020.
    摘要 在机器学习、博弈论等领域中,两个对立的智能体之间经常发生复杂的交互,对相关策略进行量化分析可以为决策提供客观依据。足球比赛中的射门情形就是这样一种关键场景:进攻方是选择射门还是传球、防守方是否上前封堵,都会对比赛结果产生决定性影响。然而,目前尚缺乏有效的数据驱动和/或基于理论的方法来分析这类情形。为此,我们提出了一种基于博弈论的新分析框架:用机器学习(ML)模型估计预期收益,并借助基于理论的射门封堵模型为ML模型提取额外特征。传统做法以成功或失败(1或0)作为收益,但足球中进球极为罕见,因此我们提出了射正期望概率(xSOT)指标,即使射门未进球也能评估球员的决策,从而实现不同射门之间的有效区分与比较,并支持反事实射门情形分析。实验中,我们通过与基线模型和消融模型的比较验证了该框架,并发现xSOT与现有指标高度相关,表明其提供了有价值的洞见。最后,我们以2022年世界杯的最优策略研究和EURO 2020中一次射门情形的分析为例,展示了该框架的应用。
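As an illustration of the game-theoretic layer described above, the sketch below builds a 2x2 attacker/defender payoff matrix from hypothetical xSOT-style payoffs and reads off a worst-case-optimal attacker action; the numbers and the maximin simplification are ours, not the paper's exact solution concept.

```python
# Illustrative sketch: the ML models would supply expected payoffs (hypothetical
# xSOT values here), and the framework then reasons about both agents' strategies.
import numpy as np

# Rows: attacker actions (shoot, pass); columns: defender actions (block, stay).
# Entries: attacker's expected payoff, e.g. the estimated xSOT of the ensuing shot.
payoff = np.array([[0.08, 0.21],
                   [0.15, 0.12]])

# Attacker's maximin (worst-case-optimal) pure strategy.
worst_case = payoff.min(axis=1)                 # worst outcome per attacker action
attacker_idx = int(worst_case.argmax())
attacker_action = ["shoot", "pass"][attacker_idx]

# Defender best-responds by minimising the attacker's payoff given that action.
defender_action = ["block", "stay"][int(payoff[attacker_idx].argmin())]

print(attacker_action, defender_action, worst_case.max())
```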

Understanding Silent Failures in Medical Image Classification

  • paper_url: http://arxiv.org/abs/2307.14729
  • repo_url: https://github.com/iml-dkfz/sf-visuals
  • paper_authors: Till J. Bungert, Levin Kobelke, Paul F. Jaeger
  • for: 防止隐藏失败,以确保医疗应用中的分类系统可靠性。
  • methods: 使用 confidence scoring functions (CSFs) 检测剩下的失败,或者设计更加可靠的分类器。
  • results: 对四种生物医学任务和多种数据分布shift进行了首次全面分析,发现现有的 CSFs 无法可靠地预防隐藏失败。
    Abstract To ensure the reliable use of classification systems in medical applications, it is crucial to prevent silent failures. This can be achieved by either designing classifiers that are robust enough to avoid failures in the first place, or by detecting remaining failures using confidence scoring functions (CSFs). A predominant source of failures in image classification is distribution shifts between training data and deployment data. To understand the current state of silent failure prevention in medical imaging, we conduct the first comprehensive analysis comparing various CSFs in four biomedical tasks and a diverse range of distribution shifts. Based on the result that none of the benchmarked CSFs can reliably prevent silent failures, we conclude that a deeper understanding of the root causes of failures in the data is required. To facilitate this, we introduce SF-Visuals, an interactive analysis tool that uses latent space clustering to visualize shifts and failures. On the basis of various examples, we demonstrate how this tool can help researchers gain insight into the requirements for safe application of classification systems in the medical domain. The open-source benchmark and tool are at: https://github.com/IML-DKFZ/sf-visuals.
    摘要 为确保分类系统在医疗应用中的可靠使用,防止无声失败(silent failure)至关重要。这可以通过设计足够鲁棒的分类器从源头上避免失败,或者通过置信度评分函数(CSF)检测残余的失败来实现。图像分类中失败的主要来源是训练数据与部署数据之间的分布偏移。为了解医学影像领域无声失败预防的现状,我们首次对多种CSF在四项生物医学任务和多种分布偏移下的表现进行了全面比较。结果显示,现有的CSF均无法可靠地防止无声失败,因此我们认为需要更深入地理解数据中失败的根本原因。为此,我们提出了SF-Visuals,一个基于潜在空间聚类来可视化分布偏移与失败的交互式分析工具。通过多个示例,我们展示了该工具如何帮助研究人员认识分类系统在医疗领域安全应用的要求。开源benchmark与工具见:https://github.com/IML-DKFZ/sf-visuals。
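For readers unfamiliar with confidence scoring functions, the snippet below sketches the common maximum-softmax-probability baseline and how a "silent failure" (a wrong prediction the CSF does not flag) can be counted; it is illustrative only and not the SF-Visuals tooling.

```python
# Minimal sketch: maximum softmax probability as a CSF and a silent-failure count.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def silent_failure_rate(logits, labels, threshold=0.9):
    probs = softmax(logits)
    conf = probs.max(axis=1)                  # CSF: maximum softmax probability
    pred = probs.argmax(axis=1)
    wrong = pred != labels
    # Silent failure: the model is wrong but the CSF does not raise an alarm.
    silent = wrong & (conf >= threshold)
    return silent.mean()

logits = np.random.randn(1000, 5)             # placeholder model outputs
labels = np.random.randint(0, 5, size=1000)
print(silent_failure_rate(logits, labels))
```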

The Effect of Spoken Language on Speech Enhancement using Self-Supervised Speech Representation Loss Functions

  • paper_url: http://arxiv.org/abs/2307.14502
  • repo_url: https://github.com/leto19/commonvoice-demand
  • paper_authors: George Close, Thomas Hain, Stefan Goetze
  • for: 本研究旨在探讨自动化语音增强(SE)领域中使用自动学习语音表示(SSSR)作为功能转换在损失函数中的效果。
  • methods: 本研究使用了不同语言组合和网络结构的自动学习语音表示来训练SE系统,并测试其在未看到语言上的表现。
  • results: 研究发现,训练语言对增强性能的影响相对较小,但是训练数据量的影响非常大。
    Abstract Recent work in the field of speech enhancement (SE) has involved the use of self-supervised speech representations (SSSRs) as feature transformations in loss functions. However, in prior work, very little attention has been paid to the relationship between the language of the audio used to train the self-supervised representation and that used to train the SE system. Enhancement models trained using a loss function which incorporates a self-supervised representation that shares exactly the language of the noisy data used to train the SE system show better performance than those which do not match exactly. This may lead to enhancement systems which are language specific and as such do not generalise well to unseen languages, unlike models trained using traditional spectrogram or time domain loss functions. In this work, SE models are trained and tested on a number of different languages, with self-supervised representations which themselves are trained using different language combinations and with differing network structures as loss function representations. These models are then tested across unseen languages and their performances are analysed. It is found that the training language of the self-supervised representation appears to have a minor effect on enhancement performance, the amount of training data of a particular language, however, greatly affects performance.
    摘要 近期语音增强(SE)领域的研究开始将自监督语音表示(SSSR)用作损失函数中的特征变换。然而,此前的工作很少关注训练自监督表示所用音频的语言与训练SE系统所用语言之间的关系。若损失函数中所用的自监督表示与SE系统训练所用带噪数据的语言完全一致,增强模型的性能会优于语言不一致的情形;这可能导致增强系统具有语言特定性,不像使用传统谱图或时域损失训练的模型那样能很好地泛化到未见语言。在本工作中,我们在多种语言上训练和测试SE模型,所用的自监督表示本身由不同的语言组合训练,并采用不同的网络结构作为损失函数表示;随后在未见语言上测试这些模型并分析其性能。结果显示,自监督表示的训练语言对增强性能的影响较小,而某一语言训练数据量的多少则对性能影响很大。
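A minimal sketch of the loss idea discussed above, assuming a frozen self-supervised speech encoder is available as `sssr_model` (for example a HuBERT/WavLM-style feature extractor); the exact models, layers and loss weighting used in the paper are not reproduced.

```python
# Minimal PyTorch sketch: compare enhanced and clean speech in the feature space
# of a frozen self-supervised model instead of (or in addition to) the waveform.
import torch
import torch.nn.functional as F

def sssr_loss(sssr_model, enhanced_wav, clean_wav):
    """L1 distance between self-supervised representations of enhanced and clean audio."""
    with torch.no_grad():
        target_feats = sssr_model(clean_wav)    # representations of the clean reference
    est_feats = sssr_model(enhanced_wav)        # gradients flow into the enhancement net
    return F.l1_loss(est_feats, target_feats)

# Typical use inside a training loop (lambda_sssr is a tunable weight):
# total_loss = time_domain_loss + lambda_sssr * sssr_loss(sssr_model, enhanced, clean)
```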

Robust vertebra identification using simultaneous node and edge predicting Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.02509
  • repo_url: https://github.com/imfusiongmbh/vid-vertebra-identification-dataset
  • paper_authors: Vincent Bürgin, Raphael Prevost, Marijn F. Stollenga
  • for: 提出一个简单的流程,用于自动定位并识别CT扫描中的椎骨,并给出其完整朝向。
  • methods: 先用标准的U-Net进行预测,再用单个图神经网络(GNN)对椎骨进行关联与分类,同时得到完整朝向。
  • results: 该方法能准确地将椎体与对应的椎弓根标志点关联、忽略假阳性,并在一个简单且完全可训练的流程中完成分类,优于匈牙利匹配和隐马尔可夫模型等传统方法;在标准VerSe挑战的椎体识别任务上也取得了有竞争力的表现。
    Abstract Automatic vertebra localization and identification in CT scans is important for numerous clinical applications. Much progress has been made on this topic, but it mostly targets positional localization of vertebrae, ignoring their orientation. Additionally, most methods employ heuristics in their pipeline that can be sensitive in real clinical images which tend to contain abnormalities. We introduce a simple pipeline that employs a standard prediction with a U-Net, followed by a single graph neural network to associate and classify vertebrae with full orientation. To test our method, we introduce a new vertebra dataset that also contains pedicle detections that are associated with vertebra bodies, creating a more challenging landmark prediction, association and classification task. Our method is able to accurately associate the correct body and pedicle landmarks, ignore false positives and classify vertebrae in a simple, fully trainable pipeline avoiding application-specific heuristics. We show our method outperforms traditional approaches such as Hungarian Matching and Hidden Markov Models. We also show competitive performance on the standard VerSe challenge body identification task.
    摘要 在CT扫描中自动定位和识别椎骨对许多临床应用至关重要。该领域已取得不少进展,但大多数方法只关注椎骨的位置定位而忽略其朝向;此外,多数方法的流程中使用了启发式规则,在往往包含异常的真实临床图像中可能并不稳健。我们提出一个简单的流程:先用U-Net进行标准预测,再用单个图神经网络对椎骨进行关联与分类,并给出完整朝向。为测试该方法,我们构建了一个新的椎骨数据集,其中还包含与椎体相关联的椎弓根(pedicle)检测,构成了更具挑战性的关键点预测、关联与分类任务。我们的方法能够准确关联正确的椎体与椎弓根标志点、忽略假阳性,并在一个简单、完全可训练且无需特定应用启发式规则的流程中完成分类。实验表明,该方法优于匈牙利匹配和隐马尔可夫模型等传统方法,并在标准VerSe挑战的椎体识别任务上取得有竞争力的表现。

TimeGNN: Temporal Dynamic Graph Learning for Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2307.14680
  • repo_url: None
  • paper_authors: Nancy Xu, Chrysoula Kosma, Michalis Vazirgiannis
  • for: 提出一种能够捕捉多元时间序列之间关系并进行预测的方法。
  • methods: 该方法基于图神经网络结构,学习动态时序图表示,同时刻画多条序列之间的相关性与序列内模式的演化。
  • results: 与其他最先进的基于图的方法相比,推理速度快4到80倍,同时取得相当的预测性能。
    Abstract Time series forecasting lies at the core of important real-world applications in many fields of science and engineering. The abundance of large time series datasets that consist of complex patterns and long-term dependencies has led to the development of various neural network architectures. Graph neural network approaches, which jointly learn a graph structure based on the correlation of raw values of multivariate time series while forecasting, have recently seen great success. However, such solutions are often costly to train and difficult to scale. In this paper, we propose TimeGNN, a method that learns dynamic temporal graph representations that can capture the evolution of inter-series patterns along with the correlations of multiple series. TimeGNN achieves inference times 4 to 80 times faster than other state-of-the-art graph-based methods while achieving comparable forecasting performance
    摘要 时间序列预测是许多科学和工程领域重要实际应用的核心。包含复杂模式和长期依赖关系的大规模时间序列数据集,推动了各种神经网络架构的发展。图神经网络方法在预测的同时,基于多元时间序列原始值的相关性联合学习图结构,近来取得了很大成功;然而,这类方案通常训练成本高、难以扩展。在本文中,我们提出TimeGNN方法,学习动态时序图表示,以捕捉序列间模式的演化以及多条序列之间的相关性。TimeGNN在取得相当预测性能的同时,推理速度比其他最先进的基于图的方法快4到80倍。
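TimeGNN learns dynamic temporal graphs end-to-end; as a much simpler, static illustration of turning multivariate series into a graph that a GNN forecaster could consume, the sketch below builds a k-nearest-neighbour adjacency from pairwise correlations.

```python
# Simplified, static stand-in for graph construction from multivariate time series.
import numpy as np

def correlation_knn_graph(series, k=3):
    """series: array of shape (num_series, T). Returns a binary adjacency matrix."""
    corr = np.corrcoef(series)
    np.fill_diagonal(corr, -np.inf)                 # exclude self-loops
    adj = np.zeros_like(corr)
    for i in range(corr.shape[0]):
        nbrs = np.argsort(corr[i])[-k:]             # k most correlated series
        adj[i, nbrs] = 1.0
    return adj

x = np.random.randn(10, 500)                        # 10 series, 500 time steps
A = correlation_knn_graph(x, k=3)                   # adjacency a GNN could operate on
```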

Prediction of wind turbines power with physics-informed neural networks and evidential uncertainty quantification

  • paper_url: http://arxiv.org/abs/2307.14675
  • repo_url: None
  • paper_authors: Alfonso Gijón, Ainhoa Pujana-Goitia, Eugenio Perea, Miguel Molina-Solana, Juan Gómez-Romero
  • for: 优化风力机操作和维护,早期缺陷探测
  • methods: 使用物理 informed neural network 模型 reproduce历史数据,并遵循物理限制
  • results: 模型准确预测功率、扭矩和功率系数,并提供了可靠的不确定性估计
    Abstract The ever-growing use of wind energy makes necessary the optimization of turbine operations through pitch angle controllers and their maintenance with early fault detection. It is crucial to have accurate and robust models imitating the behavior of wind turbines, especially to predict the generated power as a function of the wind speed. Existing empirical and physics-based models have limitations in capturing the complex relations between the input variables and the power, aggravated by wind variability. Data-driven methods offer new opportunities to enhance wind turbine modeling of large datasets by improving accuracy and efficiency. In this study, we used physics-informed neural networks to reproduce historical data coming from 4 turbines in a wind farm, while imposing certain physical constraints to the model. The developed models for regression of the power, torque, and power coefficient as output variables showed great accuracy for both real data and physical equations governing the system. Lastly, introducing an efficient evidential layer provided uncertainty estimations of the predictions, proved to be consistent with the absolute error, and made possible the definition of a confidence interval in the power curve.
    摘要 随着风能利用的不断扩大,需要通过桨距角控制优化风力机运行,并通过早期故障检测做好维护。准确而稳健地模拟风力机行为、尤其是将发电功率预测为风速的函数,十分关键。现有的经验模型和物理模型在刻画输入变量与功率之间的复杂关系方面存在局限,风速的波动更使问题加剧。数据驱动方法为利用大规模数据提升风力机建模的精度和效率提供了新的机会。本研究使用物理信息神经网络(physics-informed neural networks)重现某风电场4台风力机的历史数据,并对模型施加一定的物理约束。所建立的以功率、扭矩和功率系数为输出变量的回归模型,对真实数据和系统物理方程都表现出很高的精度。最后,引入高效的证据层(evidential layer)为预测提供了不确定性估计,其与绝对误差保持一致,并使我们能够在功率曲线上定义置信区间。
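A minimal sketch of the physics-informed training idea, combining a data-fitting term with a penalty based on the standard wind-power relation P = 0.5 * rho * A * Cp * v^3; the architecture, inputs, and the paper's actual constraints and evidential layer are not reproduced here.

```python
# Minimal PyTorch sketch of a physics-informed loss for wind turbine power modelling.
import torch
import torch.nn as nn

class TurbineNet(nn.Module):
    def __init__(self, rho=1.225, area=5000.0):            # air density, rotor swept area
        super().__init__()
        self.rho, self.area = rho, area
        self.net = nn.Sequential(nn.Linear(3, 64), nn.Tanh(),
                                 nn.Linear(64, 64), nn.Tanh(),
                                 nn.Linear(64, 2))          # outputs: power, Cp

    def forward(self, x):                                   # x: (wind speed, pitch, rotor speed)
        return self.net(x)

def physics_informed_loss(model, x, power_obs, lam=0.1):
    out = model(x)
    power_pred, cp_pred = out[:, 0], out[:, 1]
    data_loss = torch.mean((power_pred - power_obs) ** 2)
    v = x[:, 0]
    # Penalise violations of the wind-power relation P = 0.5 * rho * A * Cp * v^3.
    physics_residual = power_pred - 0.5 * model.rho * model.area * cp_pred * v ** 3
    return data_loss + lam * torch.mean(physics_residual ** 2)
```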

Bipartite Ranking Fairness through a Model Agnostic Ordering Adjustment

  • paper_url: http://arxiv.org/abs/2307.14668
  • repo_url: https://github.com/cuis15/xorder
  • paper_authors: Sen Cui, Weishen Pan, Changshui Zhang, Fei Wang
  • for: 在机器学习中实现公平性,具体针对二分排名场景:学习一个排名函数,使正类样本的排名高于负类样本。
  • methods: 提出一个与模型无关的后处理框架xOrder:通过优化效用的加权和,在不同受保护群体之间求解一条最优的变换路径,并用动态规划求解。xOrder兼容不同的分类模型与排名公平性指标,包括有监督和无监督指标,也可推广到多个受保护群体。
  • results: 在四个基准数据集和两个真实世界电子健康记录库上进行了评估。xOrder在多种数据集和指标下,都在算法效用与排名公平性之间取得了更好的平衡;从校准后排名分数的可视化来看,相比基线方法,xOrder缓解了不同群体之间的分数分布偏移。此外的分析结果还表明,在样本较少以及训练与测试排名分数分布差异较大时,xOrder仍能保持稳健的表现。
    Abstract Algorithmic fairness has been a serious concern and received lots of interest in machine learning community. In this paper, we focus on the bipartite ranking scenario, where the instances come from either the positive or negative class and the goal is to learn a ranking function that ranks positive instances higher than negative ones. While there could be a trade-off between fairness and performance, we propose a model agnostic post-processing framework xOrder for achieving fairness in bipartite ranking and maintaining the algorithm classification performance. In particular, we optimize a weighted sum of the utility as identifying an optimal warping path across different protected groups and solve it through a dynamic programming process. xOrder is compatible with various classification models and ranking fairness metrics, including supervised and unsupervised fairness metrics. In addition to binary groups, xOrder can be applied to multiple protected groups. We evaluate our proposed algorithm on four benchmark data sets and two real-world patient electronic health record repositories. xOrder consistently achieves a better balance between the algorithm utility and ranking fairness on a variety of datasets with different metrics. From the visualization of the calibrated ranking scores, xOrder mitigates the score distribution shifts of different groups compared with baselines. Moreover, additional analytical results verify that xOrder achieves a robust performance when faced with fewer samples and a bigger difference between training and testing ranking score distributions.
    摘要 算法公平性一直是机器学习社区高度关注的重要问题。本文关注二分排名场景:实例来自正类或负类,目标是学习一个排名函数,使正类实例的排名高于负类实例。尽管公平性与性能之间可能存在权衡,我们提出了一种与模型无关的后处理框架xOrder,在实现二分排名公平的同时保持算法的分类性能。具体而言,我们将效用的加权和作为目标,求解跨不同受保护群体的最优变换路径,并通过动态规划求解。xOrder兼容各种分类模型与排名公平性指标,包括有监督和无监督公平性指标;除二元群体外,它还可应用于多个受保护群体。我们在四个基准数据集和两个真实患者电子健康记录库上评估了所提算法。xOrder在不同数据集与指标下均能在算法效用与排名公平性之间取得更好的平衡;从校准后排名分数的可视化来看,相比基线方法,xOrder缓解了不同群体之间的分数分布偏移。此外的分析结果还验证了当样本更少、训练与测试排名分数分布差异更大时,xOrder仍具有稳健的表现。
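To make "ranking fairness" concrete, the sketch below computes one common cross-group AUC gap for a bipartite ranking; xOrder supports several supervised and unsupervised metrics, and this particular definition is an illustrative assumption rather than the paper's exact metric.

```python
# Minimal sketch of a cross-group ranking fairness measure for bipartite ranking.
import numpy as np

def xauc(pos_scores, neg_scores):
    """P(score of a positive > score of a negative), i.e. a pairwise AUC."""
    pos = np.asarray(pos_scores)[:, None]
    neg = np.asarray(neg_scores)[None, :]
    return (pos > neg).mean()

def cross_group_auc_gap(scores, labels, groups):
    scores, labels, groups = map(np.asarray, (scores, labels, groups))
    pos_a = scores[(labels == 1) & (groups == 0)]
    neg_a = scores[(labels == 0) & (groups == 0)]
    pos_b = scores[(labels == 1) & (groups == 1)]
    neg_b = scores[(labels == 0) & (groups == 1)]
    # Positives of one group ranked against negatives of the other group, and vice versa.
    return abs(xauc(pos_a, neg_b) - xauc(pos_b, neg_a))
```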

Decoding the Secrets of Machine Learning in Malware Classification: A Deep Dive into Datasets, Feature Extraction, and Model Performance

  • paper_url: http://arxiv.org/abs/2307.14657
  • repo_url: https://github.com/eurecom-s3/decodingmlsecretsofwindowsmalwareclassification
  • paper_authors: Savino Dambra, Yufei Han, Simone Aonzo, Platon Kotzias, Antonino Vitale, Juan Caballero, Davide Balzarotti, Leyla Bilge
  • for: 本研究旨在探讨机器学习模型在恶意软件检测和分类方面的开放问题,包括数据集收集方式、特征提取方法和模型训练方法等因素对恶意软件分类结果的影响。
  • methods: 本研究使用最新的机器学习模型进行恶意软件检测和分类,并使用了最大的均衡的恶意软件集合(67K个样本,670个家族)进行训练。模型的特征提取方法包括静态分析和动态分析。
  • results: 研究发现静态特征的表现优于动态特征,而将二者结合只带来有限的提升;加壳(packing)与分类精度之间没有相关性,而动态提取特征中缺失的行为会严重损害其表现。此外,需要分类的家族越多,分类越困难;每个家族的样本越多,精度越高;在每个家族样本数均匀分布的数据上训练的模型,对未见数据的泛化能力更好。
    Abstract Many studies have proposed machine-learning (ML) models for malware detection and classification, reporting an almost-perfect performance. However, they assemble ground-truth in different ways, use diverse static- and dynamic-analysis techniques for feature extraction, and even differ on what they consider a malware family. As a consequence, our community still lacks an understanding of malware classification results: whether they are tied to the nature and distribution of the collected dataset, to what extent the number of families and samples in the training dataset influence performance, and how well static and dynamic features complement each other. This work sheds light on those open questions. by investigating the key factors influencing ML-based malware detection and classification. For this, we collect the largest balanced malware dataset so far with 67K samples from 670 families (100 samples each), and train state-of-the-art models for malware detection and family classification using our dataset. Our results reveal that static features perform better than dynamic features, and that combining both only provides marginal improvement over static features. We discover no correlation between packing and classification accuracy, and that missing behaviors in dynamically-extracted features highly penalize their performance. We also demonstrate how a larger number of families to classify make the classification harder, while a higher number of samples per family increases accuracy. Finally, we find that models trained on a uniform distribution of samples per family better generalize on unseen data.
    摘要 很多研究已经提出了机器学习(ML)模型用于恶意软件检测和分类,报道了几乎完美的性能。然而,这些研究在组装真实数据的方式、使用不同的静态分析技术和动态分析技术来提取特征、以及认为什么是恶意软件家族的定义上有很大差异。因此,我们的社区仍然缺乏了恶意软件分类结果的理解:是否与收集的数据集的性质和分布相关,数据集中家族和样本的数量对性能的影响,静态和动态特征之间的配合情况如何。本工作用 investigate这些问题。我们收集了最大的均衡恶意软件数据集,包括67K个样本和670个家族(每个家族100个样本),并使用我们的数据集来训练当前的ML模型用于恶意软件检测和家族分类。我们的结果表明,静态特征在恶意软件检测和家族分类中表现得更好,并且将静态和动态特征结合只提供了静态特征的较小改进。我们发现packing和分类精度之间没有相关性,而在动态提取的特征中缺失行为会严重降低性能。我们还示出了家族数量增加分类变得更加困难,而每个家族中样本数量增加后,检测精度提高。最后,我们发现使用均匀分布的样本分类每个家族的模型可以更好地通用于未经见数据。

Machine Learning based Parameter Sensitivity of Regional Climate Models – A Case Study of the WRF Model for Heat Extremes over Southeast Australia

  • paper_url: http://arxiv.org/abs/2307.14654
  • repo_url: None
  • paper_authors: P. Jyoteeshkumar Reddy, Sandeep Chinta, Richard Matear, John Taylor, Harish Baki, Marcus Thatcher, Jatin Kala, Jason Sharples
  • for: 这个研究的目的是估计南东澳大利亚区域气候模型的参数敏感性,以便更好地理解这些模型的表现。
  • methods: 这个研究使用了机器学习(ML)surrogate-based Sobol SA方法来检查南东澳大利亚区域气候模型(WRF)模型参数的敏感性。
  • results: 研究结果显示,WRF模型中的3个参数(散射调整参数、饱和土壤水含量多余参数和气动对流积分系数的形状参数)对表面气象变量(温度、相对湿度和风速)有重要影响。这些结果在两次极端高温事件中都是一样的。
    Abstract Heatwaves and bushfires cause substantial impacts on society and ecosystems across the globe. Accurate information of heat extremes is needed to support the development of actionable mitigation and adaptation strategies. Regional climate models are commonly used to better understand the dynamics of these events. These models have very large input parameter sets, and the parameters within the physics schemes substantially influence the model's performance. However, parameter sensitivity analysis (SA) of regional models for heat extremes is largely unexplored. Here, we focus on the southeast Australian region, one of the global hotspots of heat extremes. In southeast Australia Weather Research and Forecasting (WRF) model is the widely used regional model to simulate extreme weather events across the region. Hence in this study, we focus on the sensitivity of WRF model parameters to surface meteorological variables such as temperature, relative humidity, and wind speed during two extreme heat events over southeast Australia. Due to the presence of multiple parameters and their complex relationship with output variables, a machine learning (ML) surrogate-based global sensitivity analysis method is considered for the SA. The ML surrogate-based Sobol SA is used to identify the sensitivity of 24 adjustable parameters in seven different physics schemes of the WRF model. Results show that out of these 24, only three parameters, namely the scattering tuning parameter, multiplier of saturated soil water content, and profile shape exponent in the momentum diffusivity coefficient, are important for the considered meteorological variables. These SA results are consistent for the two different extreme heat events. Further, we investigated the physical significance of sensitive parameters. This study's results will help in further optimising WRF parameters to improve model simulation.
    摘要 热浪和林火在全球范围内对社会和生态系统造成重大影响。为支持制定可操作的减缓与适应策略,需要准确的极端高温信息。区域气候模型常被用来深入理解这类事件的动力学过程。这些模型的输入参数非常多,物理方案中的参数对模型表现有很大影响;然而,针对极端高温的区域模式参数敏感性分析(SA)仍少有研究。本文聚焦于澳大利亚东南部,这是全球极端高温的热点地区之一;在该地区,Weather Research and Forecasting(WRF)模式是模拟极端天气事件的常用区域模式。因此,本研究关注在两次极端高温事件期间,WRF模式参数对近地面气象变量(温度、相对湿度和风速)的敏感性。由于参数众多且与输出变量之间关系复杂,我们采用基于机器学习代理模型的全局敏感性分析方法(ML代理模型结合Sobol方法),对WRF模式七个物理方案中的24个可调参数进行敏感性分析。结果显示,这24个参数中只有3个,即散射调整参数、饱和土壤含水量乘数和动量扩散系数的廓线形状指数,对所考察的气象变量重要,且该结论在两次极端高温事件中保持一致。我们进一步探讨了敏感参数的物理意义。本研究结果将有助于进一步优化WRF参数、改进模式模拟。
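A minimal sketch of ML-surrogate-based Sobol sensitivity analysis using scikit-learn and SALib (both assumed installed); parameter names, bounds and the placeholder output are illustrative stand-ins for the 24 WRF physics-scheme parameters and surface variables studied in the paper.

```python
# Minimal sketch: fit a cheap surrogate, then run Sobol analysis on it instead of WRF.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from SALib.sample import saltelli
from SALib.analyze import sobol

# 1) Fit a surrogate on existing (parameter, model-output) pairs.
X_train = np.random.rand(200, 3)                      # placeholder parameter samples
y_train = X_train[:, 0] ** 2 + 0.5 * X_train[:, 1]    # placeholder output (e.g. 2 m temperature)
surrogate = GradientBoostingRegressor().fit(X_train, y_train)

# 2) Sobol indices are computed from the surrogate's cheap predictions.
problem = {"num_vars": 3,
           "names": ["scattering_tuning", "sat_soil_water_mult", "profile_shape_exp"],
           "bounds": [[0.0, 1.0]] * 3}
samples = saltelli.sample(problem, 1024)
Si = sobol.analyze(problem, surrogate.predict(samples))
print(Si["S1"], Si["ST"])                             # first-order and total-order indices
```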

Speed Limits for Deep Learning

  • paper_url: http://arxiv.org/abs/2307.14653
  • repo_url: https://github.com/RishabhP19/Traffic-Surveillance
  • paper_authors: Inbar Seroussi, Alexander A. Alemi, Moritz Helias, Zohar Ringel
  • for: 本文探讨现代神经网络的训练是否达到最优。
  • methods: 利用随机热力学的最新进展,基于初始与最终权重分布之间的Wasserstein-2距离以及连接二者的动力学过程的熵产生率,给出神经网络训练速度的上限。
  • results: 研究发现,对于线性和可线性化的神经网络(例如神经正切核NTK情形),在一定的尺度假设下,学习在尺度意义上是最优的,这与小规模实验结果相符。
    Abstract State-of-the-art neural networks require extreme computational power to train. It is therefore natural to wonder whether they are optimally trained. Here we apply a recent advancement in stochastic thermodynamics which allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network, based on the ratio of their Wasserstein-2 distance and the entropy production rate of the dynamical process connecting them. Considering both gradient-flow and Langevin training dynamics, we provide analytical expressions for these speed limits for linear and linearizable neural networks e.g. Neural Tangent Kernel (NTK). Remarkably, given some plausible scaling assumptions on the NTK spectra and spectral decomposition of the labels -- learning is optimal in a scaling sense. Our results are consistent with small-scale experiments with Convolutional Neural Networks (CNNs) and Fully Connected Neural networks (FCNs) on CIFAR-10, showing a short highly non-optimal regime followed by a longer optimal regime.
    摘要 最先进的神经网络需要极高的计算能力进行训练,因此很自然地要问:它们的训练是否最优?我们利用随机热力学的最新进展,基于初始权重分布与完全训练后权重分布之间的Wasserstein-2距离,以及连接二者的动力学过程的熵产生率,给出两者之间转换速度的上限。在考虑梯度流与Langevin训练动力学的情形下,我们为线性和可线性化的神经网络(如神经正切核NTK)给出了这些速度上限的解析表达式。值得注意的是,在关于NTK谱和标签谱分解的一些合理的尺度假设下,学习在尺度意义上是最优的。我们的结果与在CIFAR-10上用卷积神经网络(CNN)和全连接神经网络(FCN)进行的小规模实验相一致:训练先经历一个短暂且远非最优的阶段,随后进入一个更长的最优阶段。

Spatial-Frequency U-Net for Denoising Diffusion Probabilistic Models

  • paper_url: http://arxiv.org/abs/2307.14648
  • repo_url: None
  • paper_authors: Xin Yuan, Linjie Li, Jianfeng Wang, Zhengyuan Yang, Kevin Lin, Zicheng Liu, Lijuan Wang
  • for: 该研究使用 diffusion probabilistic model (DDPM) 进行视觉合成,并在wavelet空间中进行研究,而不是在像素空间中。
  • methods: 该研究提出了一种新的网络架构 SFUNet,该架构包括特有的频率域自适应卷积和注意力模块,以jointly模型空间和频率域信息。
  • results: 通过显式地建模小波信号,该模型在 CIFAR-10、FFHQ、LSUN-Bedroom 和 LSUN-Church 数据集上生成的图像质量高于基于像素的同类网络。
    Abstract In this paper, we study the denoising diffusion probabilistic model (DDPM) in wavelet space, instead of pixel space, for visual synthesis. Considering the wavelet transform represents the image in spatial and frequency domains, we carefully design a novel architecture SFUNet to effectively capture the correlation for both domains. Specifically, in the standard denoising U-Net for pixel data, we supplement the 2D convolutions and spatial-only attention layers with our spatial frequency-aware convolution and attention modules to jointly model the complementary information from spatial and frequency domains in wavelet data. Our new architecture can be used as a drop-in replacement to the pixel-based network and is compatible with the vanilla DDPM training process. By explicitly modeling the wavelet signals, we find our model is able to generate images with higher quality on CIFAR-10, FFHQ, LSUN-Bedroom, and LSUN-Church datasets, than the pixel-based counterpart.
    摘要 在这篇论文中,我们研究在小波空间(而非像素空间)中使用去噪扩散概率模型(DDPM)进行视觉合成。考虑到小波变换可以同时在空间域和频率域表示图像,我们精心设计了新的网络架构SFUNet,以有效捕捉这两个域之间的相关性。具体而言,在针对像素数据的标准去噪U-Net基础上,我们在2D卷积和仅含空间注意力的层之外,补充了空间-频率感知的卷积与注意力模块,以联合建模小波数据中空间域与频率域的互补信息。新架构可以直接替换基于像素的网络,并与原始DDPM训练流程兼容。通过显式建模小波信号,我们的模型在CIFAR-10、FFHQ、LSUN-Bedroom和LSUN-Church数据集上生成的图像质量高于基于像素的对应模型。
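The snippet below only illustrates the data-representation choice discussed above: an image is mapped to wavelet sub-bands with PyWavelets and the standard DDPM forward noising is applied to the coefficients; SFUNet's spatial-frequency convolution and attention modules are not reproduced.

```python
# Minimal sketch: diffuse in wavelet space instead of pixel space.
import numpy as np
import pywt

def to_wavelet(img):
    cA, (cH, cV, cD) = pywt.dwt2(img, "haar")
    return np.stack([cA, cH, cV, cD], axis=0)          # 4 sub-bands at half resolution

def from_wavelet(coeffs):
    cA, cH, cV, cD = coeffs
    return pywt.idwt2((cA, (cH, cV, cD)), "haar")

def ddpm_forward(x0, alpha_bar_t, rng=np.random.default_rng(0)):
    """Standard DDPM forward process q(x_t | x_0) applied to any array."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise, noise

img = np.random.rand(32, 32)                            # placeholder image
w = to_wavelet(img)
w_t, eps = ddpm_forward(w, alpha_bar_t=0.5)             # noised wavelet coefficients x_t
recon = from_wavelet(w_t)                               # back to pixel space if needed
```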

MVMR-FS : Non-parametric feature selection algorithm based on Maximum inter-class Variation and Minimum Redundancy

  • paper_url: http://arxiv.org/abs/2307.14643
  • repo_url: None
  • paper_authors: Haitao Nie, Shengbo Zhang, Bin Xie
  • for: 本研究旨在解决Feature Selection中的 Age-old challenge,即如何准确地测量特征之间的相关性和重复性。
  • methods: 本研究提出了一种基于最大类间差异与最小冗余的非参数特征选择算法(MVMR-FS)。该算法首先对特征进行有监督和无监督核密度估计,以刻画其在类间分布与整体分布上的相似与差异;随后利用类间概率分布反映特征相关性,并用整体概率分布之间的距离度量特征冗余;最后使用AGA搜索使MVMR准则最小的最优特征子集。
  • results: 相比十种state-of-the-art方法,MVMR-FS实现了最高的平均准确率,并提高了准确率5%到11%。
    Abstract How to accurately measure the relevance and redundancy of features is an age-old challenge in the field of feature selection. However, existing filter-based feature selection methods cannot directly measure redundancy for continuous data. In addition, most methods rely on manually specifying the number of features, which may introduce errors in the absence of expert knowledge. In this paper, we propose a non-parametric feature selection algorithm based on maximum inter-class variation and minimum redundancy, abbreviated as MVMR-FS. We first introduce supervised and unsupervised kernel density estimation on the features to capture their similarities and differences in inter-class and overall distributions. Subsequently, we present the criteria for maximum inter-class variation and minimum redundancy (MVMR), wherein the inter-class probability distributions are employed to reflect feature relevance and the distances between overall probability distributions are used to quantify redundancy. Finally, we employ an AGA to search for the feature subset that minimizes the MVMR. Compared with ten state-of-the-art methods, MVMR-FS achieves the highest average accuracy and improves the accuracy by 5% to 11%.
    摘要 如何准确地衡量特征之间的相关性和重复性是数据特征选择领域的长standing问题。然而,现有的筛子型特征选择方法无法直接测量连续数据中的重复性。另外,大多数方法需要手动指定特征的数量,这可能会导致专家知识不足的情况下出现错误。在本文中,我们提出了一种非 Parametric 特征选择算法基于最大间类差和最小重复性,缩写为 MVMR-FS。我们首先引入了监督和无监督核密概率分布来捕捉特征之间的相似性和总体分布的差异。然后,我们提出了最大间类差和最小重复性的标准(MVMR),其中间类概率分布用于反映特征相关性,而总体概率分布之间的距离用于衡量特征之间的重复性。最后,我们使用 AG 搜索算法来找到最佳特征子集,该子集最小化 MVMR。与现有的十种状态级方法相比,MVMR-FS 达到了最高的平均准确率,提高了准确率5%到11%。

Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?

  • paper_url: http://arxiv.org/abs/2307.14642
  • repo_url: None
  • paper_authors: Kyurae Kim, Yian Ma, Jacob R. Gardner
  • for: 证明带控制变量的黑盒变分推断(BBVI),特别是采用sticking-the-landing(STL)估计器时,在变分族完美指定的条件下以几何级(即通常所说的"线性")速率收敛。
  • methods: 使用带控制变量的黑盒变分推断(特别是STL估计器),并结合前人工作中的投影随机梯度下降。
  • results: 证明了STL估计器梯度方差的二次界(涵盖变分族错误指定的情形),并改进了对闭式熵梯度估计器的已有分析,为两者提供了明确的非渐近复杂度保证。
    Abstract We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called "linear") rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one which encompasses misspecified variational families. Combined with previous works on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. We also improve existing analysis on the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator and provides explicit non-asymptotic complexity guarantees for both.
    摘要 我们证明,带控制变量的黑盒变分推断(BBVI),特别是采用sticking-the-landing(STL)估计器时,在变分族完美指定的条件下以几何级(传统上称为"线性")速率收敛。具体而言,我们证明了STL估计器梯度方差的二次界,该界同样涵盖变分族错误指定的情形。结合此前关于二次方差条件的工作,这直接蕴含了使用投影随机梯度下降的BBVI的收敛性。此外,我们还改进了对常规闭式熵梯度估计器的已有分析,使其可与STL估计器进行比较,并为两者给出了明确的非渐近复杂度保证。
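A minimal PyTorch sketch of the sticking-the-landing (STL) gradient estimator for a mean-field Gaussian variational family, in which the variational parameters are detached inside log q so the high-variance score term drops out; `log_joint` is a user-supplied function and the setup is illustrative rather than the paper's experimental code.

```python
# Minimal sketch of an STL ELBO estimate; differentiating it w.r.t. (mu, log_sigma)
# gives the sticking-the-landing gradient.
import math
import torch

def stl_elbo(log_joint, mu, log_sigma, n_samples=16):
    """log_joint maps a (n_samples, d) batch of z to a (n_samples,) tensor of log p(x, z)."""
    sigma = log_sigma.exp()
    eps = torch.randn(n_samples, mu.shape[0])
    z = mu + sigma * eps                                  # reparameterized samples

    # Sticking the landing: stop gradients through the variational parameters in log q(z).
    mu_d, sigma_d = mu.detach(), sigma.detach()
    log_q = (-0.5 * ((z - mu_d) / sigma_d) ** 2
             - torch.log(sigma_d)
             - 0.5 * math.log(2.0 * math.pi)).sum(-1)

    return (log_joint(z) - log_q).mean()                  # maximize; use loss = -stl_elbo(...)
```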

Fact-Checking of AI-Generated Reports

  • paper_url: http://arxiv.org/abs/2307.14634
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Razi Mahmood, Ge Wang, Mannudeep Kalra, Pingkun Yan
  • for: 这篇论文旨在提高医疗工作流程的效率和准确性,并减少医疗成本。
  • methods: 本研究利用报告所对应的影像对AI生成的放射学报告进行事实核查:通过对真实报告的发现进行扰动构造虚假报告数据集,训练检查器学习图像与描述真实或潜在虚假发现的句子之间的关联。
  • results: 实验表明,所开发的检查器能够检测并移除自动生成报告中的虚假句子,从而验证AI生成的报告。
    Abstract With advances in generative artificial intelligence (AI), it is now possible to produce realistic-looking automated reports for preliminary reads of radiology images. This can expedite clinical workflows, improve accuracy and reduce overall costs. However, it is also well-known that such models often hallucinate, leading to false findings in the generated reports. In this paper, we propose a new method of fact-checking of AI-generated reports using their associated images. Specifically, the developed examiner differentiates real and fake sentences in reports by learning the association between an image and sentences describing real or potentially fake findings. To train such an examiner, we first created a new dataset of fake reports by perturbing the findings in the original ground truth radiology reports associated with images. Text encodings of real and fake sentences drawn from these reports are then paired with image encodings to learn the mapping to real/fake labels. The utility of such an examiner is demonstrated for verifying automatically generated reports by detecting and removing fake sentences. Future generative AI approaches can use the resulting tool to validate their reports leading to a more responsible use of AI in expediting clinical workflows.
    摘要 使用生成式人工智能(AI),现在可以生成具有各种真实效果的自动报告,以便加速临床工作流程,提高准确性并降低总成本。然而,这些模型经常“幻觉”,导致自动生成的报告中的false finding。在本文中,我们提出了一种新的实验室检查AI生成报告的方法,使得可以 differentiate real和fake sentences。Specifically, the developed examiner learns the association between an image and sentences describing real or potentially fake findings. To train such an examiner, we first created a new dataset of fake reports by perturbing the findings in the original ground truth radiology reports associated with images. Text encodings of real and fake sentences drawn from these reports are then paired with image encodings to learn the mapping to real/fake labels. The utility of such an examiner is demonstrated for verifying automatically generated reports by detecting and removing fake sentences. Future generative AI approaches can use the resulting tool to validate their reports, leading to a more responsible use of AI in expediting clinical workflows.
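A minimal sketch of how such an examiner could be wired up, as we read the abstract: image and sentence embeddings are paired and a small head classifies real vs. fake findings; encoders, dimensions and the fake-report construction are placeholders, not the authors' implementation.

```python
# Minimal PyTorch sketch of a real/fake sentence examiner over paired embeddings.
import torch
import torch.nn as nn

class ReportExaminer(nn.Module):
    def __init__(self, img_dim=512, txt_dim=512, hidden=256):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(img_dim + txt_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, img_emb, sent_emb):
        # Logit > 0 means "real finding", < 0 means "potentially fake".
        return self.head(torch.cat([img_emb, sent_emb], dim=-1))

# Training pairs would come from ground-truth reports (real) and perturbed reports (fake);
# at inference time, sentences scored as fake can be flagged or removed from the report.
```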

A Survey on Reservoir Computing and its Interdisciplinary Applications Beyond Traditional Machine Learning

  • paper_url: http://arxiv.org/abs/2307.15092
  • repo_url: None
  • paper_authors: Heng Zhang, Danilo Vasconcellos Vargas
  • for: 本研究旨在提供一个综述式的概述,涵盖由机器学习到物理、生物和神经科学等领域的各种应用。
  • methods: 该研究使用了 randomly connected recurrent neural network(RC),并通过Linear readout来生成适当的应用Response。
  • results: 研究发现RC可以在各种应用中提供高效的解决方案,并且可以模拟脑内机制。
    Abstract Reservoir computing (RC), first applied to temporal signal processing, is a recurrent neural network in which neurons are randomly connected. Once initialized, the connection strengths remain unchanged. Such a simple structure turns RC into a non-linear dynamical system that maps low-dimensional inputs into a high-dimensional space. The model's rich dynamics, linear separability, and memory capacity then enable a simple linear readout to generate adequate responses for various applications. RC spans areas far beyond machine learning, since it has been shown that the complex dynamics can be realized in various physical hardware implementations and biological devices. This yields greater flexibility and shorter computation time. Moreover, the neuronal responses triggered by the model's dynamics shed light on understanding brain mechanisms that also exploit similar dynamical processes. While the literature on RC is vast and fragmented, here we conduct a unified review of RC's recent developments from machine learning to physics, biology, and neuroscience. We first review the early RC models, and then survey the state-of-the-art models and their applications. We further introduce studies on modeling the brain's mechanisms by RC. Finally, we offer new perspectives on RC development, including reservoir design, coding frameworks unification, physical RC implementations, and interaction between RC, cognitive neuroscience and evolution.
    摘要 储池计算(RC),最初应用于时间信号处理,是一种循环神经网络,其中神经元之间随机连接。一旦初始化,连接强度保持不变。这种简单的结构使RC变成了一个非线性动力系统,可以将低维度输入映射到高维度空间中。模型的丰富动力、线性分离和记忆容量使得使用简单的线性读取器可以生成适用于各种应用的合适回应。RC的应用范围远 beyond 机器学习,因为它已经在物理硬件实现和生物设备中实现了类似的动力过程。这提供了更多的灵活性和更短的计算时间。此外,由模型的动力触发的神经元响应也为了解大脑机制提供了新的指导。在文献中,关于RC的研究非常广泛和杂乱,这里我们进行了一个统一的RC最新发展的评论,从机器学习到物理学、生物学和神经科学。我们首先评论了早期RC模型,然后概括了当前最先进的模型和其应用。我们还介绍了使用RC模型研究大脑机制的研究。最后,我们提出了新的RC发展 perspectives,包括储池设计、编程框架统一、物理RC实现和RC、认知神经科学和进化之间的交互。
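For readers new to reservoir computing, the sketch below implements a classic echo state network: a fixed, randomly connected reservoir plus a ridge-regression readout; the hyperparameters and the toy task are illustrative.

```python
# Minimal echo state network: only the linear readout is trained.
import numpy as np
from numpy.linalg import eigvals

rng = np.random.default_rng(0)
n_in, n_res = 1, 200
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.abs(eigvals(W)).max()            # scale spectral radius below 1

def run_reservoir(u):
    """Collect reservoir states for an input sequence u of shape (T, n_in)."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W_in @ u_t + W @ x)        # fixed, untrained dynamics
        states.append(x.copy())
    return np.array(states)

# Train the readout with ridge regression on a toy teacher signal y of shape (T,).
u = rng.standard_normal((500, 1))
y = np.sin(np.arange(500) / 10.0)
X = run_reservoir(u)
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
y_pred = X @ W_out
```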

Rapid and Scalable Bayesian AB Testing

  • paper_url: http://arxiv.org/abs/2307.14628
  • repo_url: None
  • paper_authors: Srivas Chennu, Andrew Maher, Christian Pangerl, Subash Prabanantham, Jae Hyeon Bae, Jamie Martin, Bud Goswami
  • for: 这种方法可以帮助企业决策者更好地利用数据来改进数字用户体验。
  • methods: 该方法使用分层贝叶斯估计来解决现有方法的局限,包括利用多变量设计中因素间的相关性、支持顺序检验与提前停止,以及汇聚过去测试中的全局经验。
  • results: 在数值模拟和大量真实AB测试上,该方法在不带来过高假阳性风险的前提下提高了统计功效,并可通过顺序检验、提前停止以及复用既有全局经验来加速未来的测试。
    Abstract AB testing aids business operators with their decision making, and is considered the gold standard method for learning from data to improve digital user experiences. However, there is usually a gap between the requirements of practitioners, and the constraints imposed by the statistical hypothesis testing methodologies commonly used for analysis of AB tests. These include the lack of statistical power in multivariate designs with many factors, correlations between these factors, the need of sequential testing for early stopping, and the inability to pool knowledge from past tests. Here, we propose a solution that applies hierarchical Bayesian estimation to address the above limitations. In comparison to current sequential AB testing methodology, we increase statistical power by exploiting correlations between factors, enabling sequential testing and progressive early stopping, without incurring excessive false positive risk. We also demonstrate how this methodology can be extended to enable the extraction of composite global learnings from past AB tests, to accelerate future tests. We underpin our work with a solid theoretical framework that articulates the value of hierarchical estimation. We demonstrate its utility using both numerical simulations and a large set of real-world AB tests. Together, these results highlight the practical value of our approach for statistical inference in the technology industry.
    摘要
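For the Bayesian AB testing approach above, the sketch below shows only the basic Bayesian building block behind such analyses (independent Beta-Binomial posteriors and the posterior probability of an uplift); the paper's contribution, hierarchical priors that pool information across factors and past tests, is not reproduced here.

```python
# Minimal non-hierarchical Bayesian AB comparison via Beta-Binomial posteriors.
import numpy as np

rng = np.random.default_rng(0)

def posterior_samples(successes, trials, n=100_000, a0=1.0, b0=1.0):
    """Posterior over a conversion rate under a Beta(a0, b0) prior."""
    return rng.beta(a0 + successes, b0 + trials - successes, size=n)

control = posterior_samples(successes=480, trials=10_000)
treatment = posterior_samples(successes=530, trials=10_000)

prob_uplift = (treatment > control).mean()                 # P(treatment rate > control rate)
expected_loss = np.maximum(control - treatment, 0).mean()  # risk of shipping the treatment
print(prob_uplift, expected_loss)
```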

BubbleML: A Multi-Physics Dataset and Benchmarks for Machine Learning

  • paper_url: http://arxiv.org/abs/2307.14623
  • repo_url: https://github.com/hpcforge/bubbleml
  • paper_authors: Sheikh Md Shakeel Hassan, Arthur Feeney, Akash Dhruv, Jihoon Kim, Youngjoon Suh, Jaiyoung Ryu, Yoonjin Won, Aparna Chandramowlishwaran
  • for: 提供一个可获取且多样化的数据集用于机器学习(ML)训练,以更好地理解沸腾这一多物理相变现象。
  • methods: 利用物理驱动的数值模拟提供准确的基准数据,涵盖核态池沸腾、流动沸腾和过冷沸腾等多种沸腾场景,并覆盖不同重力条件、流速、过冷度和壁面过热度等参数,共51组模拟。
  • results: BubbleML数据集经过与实验观测及趋势的对照验证,是ML研究的宝贵资源;研究还通过两个基准任务展示了其在多种下游任务上的潜力,例如捕捉气泡动力学的光流分析,以及学习温度动力学的算子网络。
    Abstract In the field of phase change phenomena, the lack of accessible and diverse datasets suitable for machine learning (ML) training poses a significant challenge. Existing experimental datasets are often restricted, with limited availability and sparse ground truth data, impeding our understanding of this complex multi-physics phenomena. To bridge this gap, we present the BubbleML Dataset(https://github.com/HPCForge/BubbleML) which leverages physics-driven simulations to provide accurate ground truth information for various boiling scenarios, encompassing nucleate pool boiling, flow boiling, and sub-cooled boiling. This extensive dataset covers a wide range of parameters, including varying gravity conditions, flow rates, sub-cooling levels, and wall superheat, comprising 51 simulations. BubbleML is validated against experimental observations and trends, establishing it as an invaluable resource for ML research. Furthermore, we showcase its potential to facilitate exploration of diverse downstream tasks by introducing two benchmarks: (a) optical flow analysis to capture bubble dynamics, and (b) operator networks for learning temperature dynamics. The BubbleML dataset and its benchmarks serve as a catalyst for advancements in ML-driven research on multi-physics phase change phenomena, enabling the development and comparison of state-of-the-art techniques and models.
    摘要 在相变现象领域,缺乏可获取且多样化、适合机器学习(ML)训练的数据集是一大挑战。现有实验数据往往受限,可用性有限且真值数据稀疏,阻碍了我们对这种复杂多物理现象的理解。为弥补这一差距,我们提出BubbleML数据集(https://github.com/HPCForge/BubbleML),利用物理驱动的数值模拟为核态池沸腾、流动沸腾和过冷沸腾等多种沸腾场景提供准确的真值信息,覆盖不同重力条件、流速、过冷度和壁面过热度等广泛参数,共包含51组模拟。BubbleML已与实验观测及趋势进行了对照验证,是ML研究的宝贵资源。此外,我们通过引入两个基准任务展示其支撑多种下游任务的潜力:(a)用于捕捉气泡动力学的光流分析;(b)用于学习温度动力学的算子网络。BubbleML数据集及其基准将推动面向多物理相变现象的ML研究,促进最新技术与模型的开发和比较。

Imitating Complex Trajectories: Bridging Low-Level Stability and High-Level Behavior

  • paper_url: http://arxiv.org/abs/2307.14619
  • repo_url: None
  • paper_authors: Adam Block, Daniel Pfrommer, Max Simchowitz
  • for: 研究复杂非Markovian随机专家示范的模仿在非线性动力系统中。
  • methods: 借助低层控制器(既可以是学习得到的,也可以隐含在位置指令控制中)将模仿策略稳定在专家示范附近,并结合数据增强与一种新的算法技巧(在执行时注入增强噪声)来保证策略满足所需的连续性性质。
  • results: 研究证明,若学习者能够准确估计(噪声增强后)专家策略的得分,则模仿者的轨迹分布与专家分布在自然的最优传输距离下相近。
    Abstract We propose a theoretical framework for studying the imitation of stochastic, non-Markovian, potentially multi-modal (i.e. "complex" ) expert demonstrations in nonlinear dynamical systems. Our framework invokes low-level controllers - either learned or implicit in position-command control - to stabilize imitation policies around expert demonstrations. We show that with (a) a suitable low-level stability guarantee and (b) a stochastic continuity property of the learned policy we call "total variation continuity" (TVC), an imitator that accurately estimates actions on the demonstrator's state distribution closely matches the demonstrator's distribution over entire trajectories. We then show that TVC can be ensured with minimal degradation of accuracy by combining a popular data-augmentation regimen with a novel algorithmic trick: adding augmentation noise at execution time. We instantiate our guarantees for policies parameterized by diffusion models and prove that if the learner accurately estimates the score of the (noise-augmented) expert policy, then the distribution of imitator trajectories is close to the demonstrator distribution in a natural optimal transport distance. Our analysis constructs intricate couplings between noise-augmented trajectories, a technique that may be of independent interest. We conclude by empirically validating our algorithmic recommendations.
    摘要 我们提出一个理论框架,用于研究非线性动力系统中对随机、非马尔可夫、可能多模态(即"复杂")专家示范的模仿。该框架借助低层控制器(既可以是学习得到的,也可以隐含在位置指令控制中)将模仿策略稳定在专家示范附近。我们证明:当(a)低层控制器具备合适的稳定性保证,且(b)所学策略满足一种我们称为"全变差连续性"(TVC)的随机连续性性质时,只要模仿者能在示范者的状态分布上准确估计动作,其整条轨迹的分布就会与示范者的分布高度吻合。我们进而证明,将常用的数据增强方案与一个新的算法技巧(在执行时同样注入增强噪声)相结合,可以在几乎不损失精度的前提下保证TVC。我们针对以扩散模型参数化的策略实例化了上述保证,并证明若学习者能够准确估计(噪声增强后)专家策略的得分,则模仿者轨迹的分布与示范者分布在自然的最优传输距离下相近。我们的分析构造了噪声增强轨迹之间精细的耦合,这一技术本身可能具有独立的价值。最后,我们通过实验验证了所提出的算法建议。
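A minimal sketch of the execution-time trick described above, as we understand it: the same augmentation noise used during training is injected into observations when the imitation policy is rolled out; a gym-style environment interface and the noise scale are assumptions.

```python
# Minimal sketch: inject augmentation noise at execution time as well as training time.
import numpy as np

def execute_with_augmentation_noise(policy, env, horizon, noise_std=0.01,
                                    rng=np.random.default_rng(0)):
    """Roll out an imitation policy while perturbing observations with training-style noise."""
    obs = env.reset()
    for _ in range(horizon):
        noisy_obs = obs + noise_std * rng.standard_normal(np.shape(obs))
        action = policy(noisy_obs)                  # policy is queried on augmented states
        obs, _, done, _ = env.step(action)          # low-level controller stabilizes execution
        if done:
            break
```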

Self-Contrastive Graph Diffusion Network

  • paper_url: http://arxiv.org/abs/2307.14613
  • repo_url: https://github.com/kunzhan/SCDGN
  • paper_authors: Yixian Ma, Kun Zhan
  • for: 本文提出了一种新的框架,即自同Graph Diffusion Network(SCGDN),用于Graph Self-Contrastive Learning(GSC)中。
  • methods: 本文使用了两个主要组件:Attentional Module(AttM)和Diffusion Module(DiFM)。AttM使用高级结构和特征信息进行聚合,以获得优秀的嵌入;DiFM通过laplacian填充学习平衡每个节点的状态,并允许 adjacency和特征信息在图中协同演化。
  • results: 本文的方法可以避免”采样偏见”和 semantics drift,无需预训练。与现有方法相比,SCGDN可以一直保持高级结构信息,并免受过拟合。实验表明,SCGDN可以在GSC中表现出优秀的性能,并在对照方法和经典方法的比较中占据优势。
    Abstract Augmentation techniques and sampling strategies are crucial in contrastive learning, but in most existing works, augmentation techniques require careful design, and their sampling strategies can only capture a small amount of intrinsic supervision information. Additionally, the existing methods require complex designs to obtain two different representations of the data. To overcome these limitations, we propose a novel framework called the Self-Contrastive Graph Diffusion Network (SCGDN). Our framework consists of two main components: the Attentional Module (AttM) and the Diffusion Module (DiFM). AttM aggregates higher-order structure and feature information to get an excellent embedding, while DiFM balances the state of each node in the graph through Laplacian diffusion learning and allows the cooperative evolution of adjacency and feature information in the graph. Unlike existing methodologies, SCGDN is an augmentation-free approach that avoids "sampling bias" and semantic drift, without the need for pre-training. We conduct a high-quality sampling of samples based on structure and feature information. If two nodes are neighbors, they are considered positive samples of each other. If two disconnected nodes are also unrelated on $k$NN graph, they are considered negative samples for each other. The contrastive objective reasonably uses our proposed sampling strategies, and the redundancy reduction term minimizes redundant information in the embedding and can well retain more discriminative information. In this novel framework, the graph self-contrastive learning paradigm gives expression to a powerful force. SCGDN effectively balances between preserving high-order structure information and avoiding overfitting. The results manifest that SCGDN can consistently generate outperformance over both the contrastive methods and the classical methods.
    摘要 增强技术和采样策略是对比学习中的关键,但在现有工作中,增强技术通常需要精心设计,采样策略也只能捕捉到一小部分内在监督信息;此外,现有方法往往需要复杂的设计才能获得数据的两种不同表示。为克服这些限制,我们提出了一种新的框架,即自对比图扩散网络(SCGDN)。该框架包含两个主要组件:注意力模块(AttM)和扩散模块(DiFM)。AttM聚合高阶结构与特征信息以获得优质嵌入;DiFM通过拉普拉斯扩散学习平衡图中每个节点的状态,使图中的邻接信息与特征信息协同演化。与现有方法不同,SCGDN是一种无需增强的方案,避免了"采样偏差"和语义漂移,也无需预训练。我们基于结构和特征信息进行高质量采样:若两个节点相邻,则互为正样本;若两个不相连的节点在$k$NN图上也互不相关,则互为负样本。对比目标合理地利用了上述采样策略,冗余消减项则减少嵌入中的冗余信息,从而更好地保留判别性信息。在这一新框架中,图自对比学习范式展现出强大的能力:SCGDN在保留高阶结构信息与避免过拟合之间取得了良好平衡。实验结果表明,SCGDN能够稳定地超越各类对比方法和经典方法。

Complete and separate: Conditional separation with missing target source attribute completion

  • paper_url: http://arxiv.org/abs/2307.14609
  • repo_url: None
  • paper_authors: Dimitrios Bralios, Efthymios Tzinis, Paris Smaragdis
  • for: 提高多条件分离网络的分离性能
  • methods: 使用预训练模型从输入混合物和目标源的部分语义信息中提取额外的语义数据,并将其用于不同的输入混合物
  • results: 分离性能接近拥有完整语义信息的 oracle 模型,并与表现最好的专用单条件模型相当,从而提供了一个更易使用的替代方案。In English:
  • for: To improve the separation performance of a multi-conditional separation network
  • methods: Using a pre-trained model to extract additional semantic data and combine it with different input mixtures
  • results: Separation performance approaching the level of an oracle model, and comparable to the best performing specialized single conditional models, providing an easier-to-use alternative.
    Abstract Recent approaches in source separation leverage semantic information about their input mixtures and constituent sources that when used in conditional separation models can achieve impressive performance. Most approaches along these lines have focused on simple descriptions, which are not always useful for varying types of input mixtures. In this work, we present an approach in which a model, given an input mixture and partial semantic information about a target source, is trained to extract additional semantic data. We then leverage this pre-trained model to improve the separation performance of an uncoupled multi-conditional separation network. Our experiments demonstrate that the separation performance of this multi-conditional model is significantly improved, approaching the performance of an oracle model with complete semantic information. Furthermore, our approach achieves performance levels that are comparable to those of the best performing specialized single conditional models, thus providing an easier to use alternative.
    摘要 近期的音源分离方法利用输入混合物及其组成源的语义信息,在条件分离模型中可以取得出色的性能。这类方法大多只使用简单的描述,而这些描述并不总是适用于各种不同类型的输入混合物。在本工作中,我们提出一种方法:给定输入混合物以及目标源的部分语义信息,训练一个模型来提取额外的语义数据;随后利用这个预训练模型来提升一个解耦的多条件分离网络的分离性能。实验表明,该多条件模型的分离性能显著提升,接近拥有完整语义信息的oracle模型;同时其性能与表现最好的专用单条件模型相当,从而提供了一个更易使用的替代方案。

HUTFormer: Hierarchical U-Net Transformer for Long-Term Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2307.14596
  • repo_url: None
  • paper_authors: Zezhi Shao, Fei Wang, Zhao Zhang, Yuchen Fang, Guangyin Jin, Yongjun Xu
  • for: 本研究的目的是提高长期交通预测(1天预测)的精度, Addressing the unique challenges of long-term traffic forecasting, such as exploiting multi-scale representations.
  • methods: 提议一种新的层次U-NetTransformer(HUTFormer)来解决长期交通预测的问题。 HUTFormer包括一个层次编码器和解码器,并使用窗口自注意力和段合并来提取多尺度表示。
  • results: 对四个交通数据集进行了广泛的实验,结果表明,提议的HUTFormer显著超越了当前交通预测和长时序预测基准。
    Abstract Traffic forecasting, which aims to predict traffic conditions based on historical observations, has been an enduring research topic and is widely recognized as an essential component of intelligent transportation. Recent proposals on Spatial-Temporal Graph Neural Networks (STGNNs) have made significant progress by combining sequential models with graph convolution networks. However, due to high complexity issues, STGNNs only focus on short-term traffic forecasting, e.g., 1-hour forecasting, while ignoring more practical long-term forecasting. In this paper, we make the first attempt to explore long-term traffic forecasting, e.g., 1-day forecasting. To this end, we first reveal its unique challenges in exploiting multi-scale representations. Then, we propose a novel Hierarchical U-net TransFormer (HUTFormer) to address the issues of long-term traffic forecasting. HUTFormer consists of a hierarchical encoder and decoder to jointly generate and utilize multi-scale representations of traffic data. Specifically, for the encoder, we propose window self-attention and segment merging to extract multi-scale representations from long-term traffic data. For the decoder, we design a cross-scale attention mechanism to effectively incorporate multi-scale representations. In addition, HUTFormer employs an efficient input embedding strategy to address the complexity issues. Extensive experiments on four traffic datasets show that the proposed HUTFormer significantly outperforms state-of-the-art traffic forecasting and long time series forecasting baselines.
    摘要 交通预测旨在根据历史观测预测交通状况,是长期以来的研究课题,被广泛视为智能交通系统的重要组成部分。近期的时空图神经网络(STGNN)通过将序列模型与图卷积网络相结合取得了显著进展。然而,由于复杂度较高,STGNN通常只关注短期交通预测(如1小时预测),而忽略了更具实用价值的长期预测。在本文中,我们首次尝试长期交通预测(如1天预测)。为此,我们首先揭示了长期交通预测在利用多尺度表示方面的独特挑战,进而提出一种新的层次U-Net Transformer(HUTFormer)来加以解决。HUTFormer由层次编码器和解码器组成,联合生成并利用交通数据的多尺度表示:在编码器中,我们提出窗口自注意力和分段合并,从长期交通数据中提取多尺度表示;在解码器中,我们设计跨尺度注意力机制,有效融合多尺度表示。此外,HUTFormer采用高效的输入嵌入策略以缓解复杂度问题。在四个交通数据集上的大量实验表明,HUTFormer显著优于当前最先进的交通预测与长时序预测基线。

MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.14588
  • repo_url: https://github.com/simonustc/mcpa-for-2d-medical-image-segmentation
  • paper_authors: Liang Xu, Mingxiao Chen, Yi Cheng, Pengfei Shao, Shuwei Shen, Peng Yao, Ronald X. Xu
  • for: 提出一种二维医学图像分割模型,以克服基于卷积神经网络(CNN)的UNet架构在捕捉长距离依赖方面的不足。
  • methods: 提出Multi-scale Cross Perceptron Attention Network(MCPA),由编码器、解码器和Cross Perceptron组成:Cross Perceptron先用多个多尺度感知模块捕捉局部相关性并融合跨尺度特征,再经全局感知模块建模全局依赖;同时引入渐进式双分支结构,将分割重点从大尺度结构特征逐步转向更精细的像素级特征。
  • results: 在多个公开医学图像数据集上取得了state-of-the-art的性能。
    Abstract The UNet architecture, based on Convolutional Neural Networks (CNN), has demonstrated its remarkable performance in medical image analysis. However, it faces challenges in capturing long-range dependencies due to the limited receptive fields and inherent bias of convolutional operations. Recently, numerous transformer-based techniques have been incorporated into the UNet architecture to overcome this limitation by effectively capturing global feature correlations. However, the integration of the Transformer modules may result in the loss of local contextual information during the global feature fusion process. To overcome these challenges, we propose a 2D medical image segmentation model called Multi-scale Cross Perceptron Attention Network (MCPA). The MCPA consists of three main components: an encoder, a decoder, and a Cross Perceptron. The Cross Perceptron first captures the local correlations using multiple Multi-scale Cross Perceptron modules, facilitating the fusion of features across scales. The resulting multi-scale feature vectors are then spatially unfolded, concatenated, and fed through a Global Perceptron module to model global dependencies. Furthermore, we introduce a Progressive Dual-branch Structure to address the semantic segmentation of the image involving finer tissue structures. This structure gradually shifts the segmentation focus of MCPA network training from large-scale structural features to more sophisticated pixel-level features. We evaluate our proposed MCPA model on several publicly available medical image datasets from different tasks and devices, including the open large-scale dataset of CT (Synapse), MRI (ACDC), fundus camera (DRIVE, CHASE_DB1, HRF), and OCTA (ROSE). The experimental results show that our MCPA model achieves state-of-the-art performance. The code is available at https://github.com/simonustc/MCPA-for-2D-Medical-Image-Segmentation.
    摘要 “UNet架构,基于卷积神经网络(CNN),在医疗图像分析中表现出了惊人的表现。然而,它面临着捕捉长距离依赖关系的挑战,因为卷积操作具有局限的观察领域和自然偏好。随着时间的推移,许多基于transformer技术的方法被引入到UNet架构中,以更好地捕捉全局特征相关性。然而,将transformer模块纳入UNet架构可能会导致全局特征融合过程中的本地Contextual信息损失。为了解决这些挑战,我们提出了一种名为Multi-scale Cross Perceptron Attention Network(MCPA)的2D医疗图像分类模型。MCPA模型包括三个主要组成部分:编码器、解码器和Cross Perceptron。Cross Perceptron首先使用多个多级卷积模块捕捉本地相关性,以便在不同尺度上进行特征融合。然后,得到的多级特征向量被空间拓展、 concatenate 并输入到全球Perceptron模块,以模型全局依赖关系。此外,我们还引入了一种进步的双支结构,以解决医疗图像的Semantic分类问题,涉及到更细的组织结构。这种结构逐渐将MCPA网络训练中的 segmentation 焦点从大规模结构特征转移到更复杂的像素级特征。我们在多个公共可用的医疗图像 dataset 上进行了实验,包括 Synapse、ACDC、DRIVE、CHASE_DB1 和 HRF 等。实验结果表明,我们的 MCPA 模型达到了领先的性能。代码可以在 GitHub 上找到。”

Evaluation of Safety Constraints in Autonomous Navigation with Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.14568
  • repo_url: None
  • paper_authors: Brian Angulo, Gregory Gorbov, Aleksandr Panov, Konstantin Yakovlev
  • for: 这个研究的目的是为了演示Autonomous Navigation中安全性的重要性,并提出一种能够满足安全性要求的学习策略。
  • methods: 这个研究使用了两种不同的学习策略:一种是安全的,另一种是不安全的。 unsafe policy不考虑安全性要求,而safe policy则会考虑这些要求。
  • results: 研究结果显示,使用safe policy可以生成更多的clearance(避免碰撞的距离),并且避免了许多碰撞,而不会 sacrificing总性能。
    Abstract While reinforcement learning algorithms have had great success in the field of autonomous navigation, they cannot be straightforwardly applied to the real autonomous systems without considering the safety constraints. The later are crucial to avoid unsafe behaviors of the autonomous vehicle on the road. To highlight the importance of these constraints, in this study, we compare two learnable navigation policies: safe and unsafe. The safe policy takes the constraints into account, while the other does not. We show that the safe policy is able to generate trajectories with more clearance (distance to the obstacles) and makes less collisions while training without sacrificing the overall performance.
    摘要 尽管强化学习算法在自主导航领域取得了巨大成功,但若不考虑安全约束,它们无法直接应用于真实的自主系统;这些约束对于避免自动驾驶车辆在道路上出现不安全行为至关重要。为突出这些约束的重要性,本研究比较了两种可学习的导航策略:安全策略与不安全策略,前者考虑安全约束,后者则不考虑。结果表明,安全策略能够生成与障碍物间距更大的轨迹,并在训练中发生更少的碰撞,同时不牺牲总体性能。
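As a concrete illustration of how safety constraints can enter a learnable navigation policy, the sketch below shapes the reward with a clearance term and a collision penalty; the paper's exact constraint formulation and values may differ.

```python
# Minimal sketch of a safety-shaped navigation reward.
def safe_navigation_reward(progress, clearance, collided,
                           min_clearance=0.5, w_clear=0.2, collision_penalty=10.0):
    """progress: distance gained toward the goal; clearance: distance to nearest obstacle."""
    reward = progress                                       # base term: progress toward the goal
    if clearance < min_clearance:
        reward -= w_clear * (min_clearance - clearance)     # soft clearance constraint
    if collided:
        reward -= collision_penalty                         # hard safety violation
    return reward
```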

Auto-Tables: Synthesizing Multi-Step Transformations to Relationalize Tables without Using Examples

  • paper_url: http://arxiv.org/abs/2307.14565
  • repo_url: https://github.com/lipengcs/auto-tables-benchmark
  • paper_authors: Peng Li, Yeye He, Cong Yan, Yue Wang, Surajit Chaudhuri
  • for: 这个论文是为了解决在实际应用中遇到的表格不符合关系标准的问题。
  • methods: 该论文使用自动生成管道来自动转换非关系表格,以便在下游分析工具中进行查询。
  • results: 论文的实验结果表明,Auto-Tables 系统可以成功地自动生成多步转换管道,将非关系表格转换成标准关系表格,并且可以在互动速度下完成这些转换,无需用户输入。
    Abstract Relational tables, where each row corresponds to an entity and each column corresponds to an attribute, have been the standard for tables in relational databases. However, such a standard cannot be taken for granted when dealing with tables "in the wild". Our survey of real spreadsheet-tables and web-tables shows that over 30% of such tables do not conform to the relational standard, for which complex table-restructuring transformations are needed before these tables can be queried easily using SQL-based analytics tools. Unfortunately, the required transformations are non-trivial to program, which has become a substantial pain point for technical and non-technical users alike, as evidenced by large numbers of forum questions in places like StackOverflow and Excel/Power-BI/Tableau forums. We develop an Auto-Tables system that can automatically synthesize pipelines with multi-step transformations (in Python or other languages), to transform non-relational tables into standard relational forms for downstream analytics, obviating the need for users to manually program transformations. We compile an extensive benchmark for this new task, by collecting 244 real test cases from user spreadsheets and online forums. Our evaluation suggests that Auto-Tables can successfully synthesize transformations for over 70% of test cases at interactive speeds, without requiring any input from users, making this an effective tool for both technical and non-technical users to prepare data for analytics.
    摘要 关系表格中,每一行对应一个实体,每一列对应一个属性,这是关系数据库中表格的标准形式。然而,面对"真实世界"中的表格,这一标准并不总是成立。我们对真实的电子表格和网页表格的调查显示,超过30%的表格不符合关系标准,必须经过复杂的表格重构变换,才能方便地用基于SQL的分析工具进行查询。不幸的是,这些变换的编写并不容易,已成为技术和非技术用户共同的痛点,这从Stack Overflow以及Excel/Power BI/Tableau论坛中的大量提问可见一斑。我们开发了Auto-Tables系统,能够自动合成多步变换管线(使用Python或其他语言),将非关系表格转换为标准关系形式以供下游分析,使用户无需手工编写变换程序。我们从用户电子表格和在线论坛收集了244个真实测试案例,构建了针对这一新任务的广泛基准。评估结果表明,Auto-Tables能够在交互式速度下、无需任何用户输入地为超过70%的测试案例成功合成变换,对技术和非技术用户而言都是准备分析数据的有效工具。
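One of the restructuring transformations such a system synthesizes, unpivoting a "wide" table into relational form, is easy to express in pandas. The table and column names below are hypothetical:

```python
# A hypothetical example of the kind of transformation Auto-Tables synthesizes: a "wide"
# spreadsheet with one column per year is not relational; melting it yields one
# (city, year, sales) row per observation, ready for SQL-style analytics.
import pandas as pd

wide = pd.DataFrame({
    "city": ["Boston", "Seattle"],
    "2021": [10, 7],
    "2022": [12, 9],
})

relational = (
    wide.melt(id_vars="city", var_name="year", value_name="sales")
        .astype({"year": int})
        .sort_values(["city", "year"], ignore_index=True)
)
print(relational)
#      city  year  sales
# 0  Boston  2021     10
# 1  Boston  2022     12
# ...
```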

Understanding Forward Process of Convolutional Neural Network

  • paper_url: http://arxiv.org/abs/2307.15090
  • repo_url: https://github.com/himanshub1007/Alzhimers-Disease-Prediction-Using-Deep-learning
  • paper_authors: Peixin Tian
  • for: 这篇论文揭示了深度学习网络中的选择性旋转处理。它解释了activation函数作为输入数据的旋转方面的决定性机制,并用数学工具来分析和理解这种定义的方法。
  • methods: 这篇论文使用了深度学习网络进行实验,以证明其在处理输入数据时的旋转性。它还使用了数学工具来分析和理解这种旋转性。
  • results: 这篇论文的实验结果表明,深度学习网络可以通过activation函数来分析和理解输入数据的旋转性。这些结果还表明了人工神经网络和人脑的数据处理模式之间的一致性。
    Abstract This paper reveals the selective rotation in the forward processing of CNNs. It elucidates the activation function as a discerning mechanism that unifies and quantizes the rotational aspects of the input data. Experiments show how this methodology reflects the process by which the network distinguishes inputs based on statistical indicators, which can be comprehended and analyzed by applying structured mathematical tools. Our findings also point to a consistency between artificial neural networks and the human brain in their data-processing patterns.
    摘要 这篇论文揭示了深度神经网络(CNN)的选择性旋转处理。它解释了活动函数作为识别机制,卷积数据的旋转方面的统一和量化。实验显示,这种定义的方法论可以通过结构化的数学工具来理解和分析输入数据的进程。我们的发现还揭示了人工神经网络和人脑在数据处理模式上的一致性。

Adversarial Sleeping Bandit Problems with Multiple Plays: Algorithm and Ranking Application

  • paper_url: http://arxiv.org/abs/2307.14549
  • repo_url: None
  • paper_authors: Jianjun Yuan, Wei Lee Woon, Ludovik Coba
  • for: 这篇论文旨在解决在线推荐系统中的睡眠多臂老虎机(sleeping bandit)问题。
  • methods: 该算法将单臂选择的睡眠多臂老虎机算法扩展到每轮选择多个臂的情形。
  • results: 该算法具有理论性能保证,即 regret 的上界为 $\mathcal{O}(kN^2\sqrt{T\log T})$,其中 $k$ 是每个时间步选择的臂数,$N$ 是臂的总数,$T$ 是时间范围。
    Abstract This paper presents an efficient algorithm to solve the sleeping bandit with multiple plays problem in the context of an online recommendation system. The problem involves bounded, adversarial loss and unknown i.i.d. distributions for arm availability. The proposed algorithm extends the sleeping bandit algorithm for single arm selection and is guaranteed to achieve theoretical performance with regret upper bounded by $\mathcal{O}(kN^2\sqrt{T\log T})$, where $k$ is the number of arms selected per time step, $N$ is the total number of arms, and $T$ is the time horizon.
    摘要 本文提出了一种高效算法,用于解决在线推荐系统中的多臂同选睡眠多臂老虎机问题。该问题具有有界的对抗性损失,且臂的可用性服从未知的独立同分布。所提算法将单臂选择的睡眠多臂老虎机算法加以扩展,并保证理论性能:regret 的上界为 $\mathcal{O}(kN^2\sqrt{T\log T})$,其中 $k$ 为每个时间步选择的臂数,$N$ 为臂的总数,$T$ 为时间范围。
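A simplified EXP3-style sketch of the setting, weights over all arms, probabilities renormalized over the currently available (awake) arms, k arms drawn per round, and importance-weighted updates, is shown below. It illustrates the problem setup only and is not the paper's algorithm or the one behind the stated regret bound.

```python
# Simplified EXP3-style sketch of a sleeping bandit with multiple plays (k arms per
# round, only "awake" arms selectable). Illustrative only -- not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)
N, k, T, eta = 10, 3, 5000, 0.05
log_w = np.zeros(N)                               # log-weights for numerical stability

for t in range(T):
    awake = rng.random(N) < 0.7                   # unknown i.i.d. availability
    if awake.sum() < k:
        continue
    w = np.exp(log_w - log_w.max()) * awake       # restrict to awake arms
    p = w / w.sum()
    chosen = rng.choice(N, size=k, replace=False, p=p)
    loss = rng.random(N)                          # an adversary would pick these; random here
    for i in chosen:
        log_w[i] -= eta * loss[i] / p[i]          # importance-weighted exponential update
```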

Controlling the Inductive Bias of Wide Neural Networks by Modifying the Kernel’s Spectrum

  • paper_url: http://arxiv.org/abs/2307.14531
  • repo_url: None
  • paper_authors: Amnon Geifman, Daniel Barzilai, Ronen Basri, Meirav Galun
  • for: 本研究旨在提出一种根据任务需要调整宽神经网络归纳偏置的方法。
  • methods: 我们引入 Modified Spectrum Kernels (MSKs),这是一类新的构造核,可用于逼近具有期望特征值、但没有闭式表达的核。我们利用宽神经网络与 Neural Tangent Kernel 之间的对偶关系,提出一种预条件梯度下降方法,通过改变梯度下降的轨迹来调整归纳偏置。
  • results: 该方法可以在不改变最终解的情况下,带来多项式级、在某些情形下甚至指数级的训练加速,且计算高效、实现简单。
    Abstract Wide neural networks are biased towards learning certain functions, influencing both the rate of convergence of gradient descent (GD) and the functions that are reachable with GD in finite training time. As such, there is a great need for methods that can modify this bias according to the task at hand. To that end, we introduce Modified Spectrum Kernels (MSKs), a novel family of constructed kernels that can be used to approximate kernels with desired eigenvalues for which no closed form is known. We leverage the duality between wide neural networks and Neural Tangent Kernels and propose a preconditioned gradient descent method, which alters the trajectory of GD. As a result, this allows for a polynomial and, in some cases, exponential training speedup without changing the final solution. Our method is both computationally efficient and simple to implement.
    摘要 宽神经网络在学习过程中偏向于某些函数,这既影响梯度下降(GD)的收敛速度,也影响在有限训练时间内GD能够达到的函数。因此,非常需要能够根据任务调整这种偏置的方法。为此,我们引入 Modified Spectrum Kernels (MSKs),这是一类新的构造核,可用于逼近具有期望特征值、但没有闭式表达的核。我们利用宽神经网络与 Neural Tangent Kernel 之间的对偶关系,提出一种预条件梯度下降方法,从而改变GD的轨迹。这使得在不改变最终解的情况下,可以获得多项式级、在某些情形下甚至指数级的训练加速。我们的方法计算高效且易于实现。
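The spectral-preconditioning idea can be illustrated on plain kernel least squares: eigendecompose the Gram matrix and rescale the gradient per eigen-direction so that slow directions catch up. This is a toy illustration of the mechanism under those assumptions, not the paper's MSK construction; the preconditioned run should reach a much lower training loss in the same number of steps.

```python
# Toy illustration of spectrum-aware preconditioning (not the paper's MSK construction).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)
n = len(y)

sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq_dists / 2.0)                     # RBF Gram matrix

lam, V = np.linalg.eigh(K)                      # spectrum of the kernel
# Preconditioner that flattens the spectrum of the loss Hessian K^2/n, so every
# eigen-direction contracts at (roughly) the same rate; tiny eigenvalues are clipped.
P = V @ np.diag(1.0 / np.maximum(lam ** 2 / n, 1e-10)) @ V.T

def gd(preconditioner=None, steps=200, lr=0.5):
    a = np.zeros(n)
    for _ in range(steps):
        grad = K @ (K @ a - y) / n              # gradient of 0.5*||K a - y||^2 / n
        a -= lr * (preconditioner @ grad if preconditioner is not None else grad)
    return np.mean((K @ a - y) ** 2)

print("plain GD training loss:         ", gd())
print("preconditioned GD training loss:", gd(P))
```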

Optimal Estimation in Mixed-Membership Stochastic Block Models

  • paper_url: http://arxiv.org/abs/2307.14530
  • repo_url: None
  • paper_authors: Fedor Noskov, Maxim Panov
  • for: 这 paper 是为了研究混合会员随机块模型(MMSB)中的跨群关系重建问题。
  • methods: 该 paper 使用了不同的方法来重建跨群关系,并证明了这些方法的最小最大下界。 finally, 提出了一种新的估计器,可以达到这个下界。
  • results: 该 paper 提出了一种新的估计器,可以在 MMSB 模型中重建跨群关系,并且证明了这种估计器的正确性。
    Abstract Community detection is one of the most critical problems in modern network science. Its applications can be found in various fields, from protein modeling to social network analysis. Recently, many papers appeared studying the problem of overlapping community detection, where each node of a network may belong to several communities. In this work, we consider Mixed-Membership Stochastic Block Model (MMSB) first proposed by Airoldi et al. (2008). MMSB provides quite a general setting for modeling overlapping community structure in graphs. The central question of this paper is to reconstruct relations between communities given an observed network. We compare different approaches and establish the minimax lower bound on the estimation error. Then, we propose a new estimator that matches this lower bound. Theoretical results are proved under fairly general conditions on the considered model. Finally, we illustrate the theory in a series of experiments.
    摘要 社区探测是现代网络科学中最重要的问题之一。它在不同领域都有广泛的应用,从蛋白质模型到社交网络分析。最近,许多论文出现了研究多重社区探测问题,其中每个网络节点可能属于多个社区。在这种情况下,我们考虑了杂合会员随机块模型(MMSB),由airoldi等人于2008年提出。MMSB提供了模型多重社区结构的非常通用的设定。本文的主要问题是根据观察网络重建社区之间的关系。我们比较了不同方法,并证明了最小最大下界,这个下界对于估计错误的最低 bound。然后,我们提出了一种新的估计器,该估计器匹配这个下界。我们在满足一些基本的假设下提出了理论结果。最后,我们在一系列实验中证明了这些理论结果。
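The generative story behind MMSB is compact enough to sketch directly. The snippet below samples a small undirected graph from it (memberships drawn from a Dirichlet, edges through community-pair probabilities); the hyperparameters and the undirected simplification are illustrative assumptions, not the paper's setup.

```python
# Sampling a small graph from the MMSB generative model (parameters are illustrative;
# the original model is directed, this sketch uses an undirected variant for brevity).
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3
alpha = np.full(K, 0.3)                           # Dirichlet concentration
B = np.full((K, K), 0.02) + np.eye(K) * 0.2       # community-pair edge probabilities

pi = rng.dirichlet(alpha, size=n)                 # mixed membership of each node
A = np.zeros((n, n), dtype=int)
for i in range(n):
    for j in range(i + 1, n):
        zi = rng.choice(K, p=pi[i])               # role node i takes toward j
        zj = rng.choice(K, p=pi[j])               # role node j takes toward i
        A[i, j] = A[j, i] = rng.binomial(1, B[zi, zj])
```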

Function Value Learning: Adaptive Learning Rates Based on the Polyak Stepsize and Function Splitting in ERM

  • paper_url: http://arxiv.org/abs/2307.14528
  • repo_url: None
  • paper_authors: Guillaume Garrigos, Robert M. Gower, Fabian Schaipp
  • for: 本研究targets solving finite sum-of-terms problem (empirical risk minimization) using stochastic gradient descent (SGD) with adaptive step size.
  • methods: 本 paper proposes two variants of SGD: $\texttt{SPS}+$ and $\texttt{FUVAL}$. $\texttt{SPS}+$ is a minor modification of SPS method that uses sampled loss values and assumes knowledge of the sampled loss at optimality, while $\texttt{FUVAL}$ gradually learns the loss values at optimality.
  • results: 本 paper shows that $\texttt{SPS}_+$ achieves the best known rates of convergence for SGD in Lipschitz non-smooth setting. However, the convergence analysis of $\texttt{FUVAL}$ shows no advantage over SGD, and the stochastic version of $\texttt{FUVAL}$ shows no clear advantage over SGD. The paper also presents experimental results.
    Abstract Here we develop variants of SGD (stochastic gradient descent) with an adaptive step size that makes use of the sampled loss values. In particular, we focus on solving a finite sum-of-terms problem, also known as empirical risk minimization. We first detail an idealized adaptive method called $\texttt{SPS}_+$ that makes use of the sampled loss values and assumes knowledge of the sampled loss at optimality. This $\texttt{SPS}_+$ is a minor modification of the SPS (Stochastic Polyak Stepsize) method, where the step size is enforced to be positive. We then show that $\texttt{SPS}_+$ achieves the best known rates of convergence for SGD in the Lipschitz non-smooth setting. We then move on to develop $\texttt{FUVAL}$, a variant of $\texttt{SPS}_+$ where the loss values at optimality are gradually learned, as opposed to being given. We give three viewpoints of $\texttt{FUVAL}$: as a projection-based method, as a variant of the prox-linear method, and as a particular online SGD method. We then present a convergence analysis of $\texttt{FUVAL}$ and experimental results. One shortcoming of our work is that the convergence analysis of $\texttt{FUVAL}$ shows no advantage over SGD. Another shortcoming is that currently only the full batch version of $\texttt{FUVAL}$ shows a minor advantage over GD (Gradient Descent) in terms of sensitivity to the step size. The stochastic version shows no clear advantage over SGD. We conjecture that large mini-batches are required to make $\texttt{FUVAL}$ competitive. Currently the new $\texttt{FUVAL}$ method studied in this paper does not offer any clear theoretical or practical advantage. We have nonetheless chosen to make this draft available online because of some of the analysis techniques we use, such as the non-smooth analysis of $\texttt{SPS}_+$, and also to show an apparently interesting approach that currently does not work.
    摘要 我们提出了若干使用采样损失值的自适应步长SGD(随机梯度下降)变体,重点解决有限项求和问题(即经验风险最小化)。我们首先介绍一种理想化的自适应方法 $\texttt{SPS}_+$:它利用采样损失值,并假设已知各采样损失在最优点处的取值。$\texttt{SPS}_+$ 是对SPS(随机Polyak步长)方法的细微修改,即强制步长为正。我们证明 $\texttt{SPS}_+$ 在Lipschitz非光滑设定下达到了SGD已知的最优收敛速率。随后我们提出 $\texttt{FUVAL}$,它是 $\texttt{SPS}_+$ 的变体,其中最优点处的损失值不再作为已知量给出,而是逐步学习得到。我们从三个角度理解 $\texttt{FUVAL}$:投影方法、prox-linear方法的变体,以及一种特定的在线SGD方法,并给出其收敛分析和实验结果。本工作的不足在于:$\texttt{FUVAL}$ 的收敛分析并未显示出优于SGD的优势;目前仅全批量版本的 $\texttt{FUVAL}$ 在步长敏感性方面略优于GD(梯度下降),而随机版本相对SGD没有明显优势。我们推测需要较大的mini-batch才能使 $\texttt{FUVAL}$ 具有竞争力。
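The positive-clipped stochastic Polyak step is simple to write down. The sketch below runs it on noiseless least squares, assuming an interpolation setting so that every sampled loss has optimal value zero; outside interpolation, SPS+ would need the sampled losses at optimality, as the abstract notes.

```python
# Minimal sketch of SGD with the SPS+ step size on least squares. We assume an
# interpolation setting so f_i^* = 0 for every sample.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star                                    # noiseless -> interpolation holds

x = np.zeros(d)
c, eps = 0.5, 1e-12
for t in range(2000):
    i = rng.integers(n)
    r = A[i] @ x - b[i]
    f_i = 0.5 * r ** 2                            # sampled loss, with f_i^* = 0
    g = r * A[i]                                  # its gradient
    step = max(0.0, (f_i - 0.0) / (c * (g @ g) + eps))   # Polyak step, positivity enforced
    x -= step * g

print("distance to solution:", np.linalg.norm(x - x_star))
```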

Open Problems in Computer Vision for Wilderness SAR and The Search for Patricia Wu-Murad

  • paper_url: http://arxiv.org/abs/2307.14527
  • repo_url: https://github.com/crasar/wisar
  • paper_authors: Thomas Manzini, Robin Murphy
  • for: 本研究旨在将两种计算机视觉系统(监督学习模型EfficientDET与无监督RX光谱分类器)应用于日本 Wu-Murad 野外搜救(WSAR)行动中的98.9 GB无人机影像,并由此总结出三个未来研究方向。
  • methods: 本研究使用了EfficientDET建模和RXspectral分类器,并在HERIDAL数据集上应用了EfficientDET模型,即使在数据集上达到了与状态艺术相同的性能水平,但在实际应用中出现了许多问题,如 mistakenly identifying tree limbs and rocks as people,以及 failure to identify members of the search team。
  • results: 本研究发现,在实际应用中,已有许多计算机视觉算法在数据集上表现良好,但在实际WSAR操作中表现不佳,主要是因为数据集和实际应用中的图像差异过大。因此,本研究提出了三个未来研究方向,即更加真实的数据集,能够覆盖更多的图像类型的计算机视觉模型,以及更好地定义性能指标。
    Abstract This paper details the challenges in applying two computer vision systems, an EfficientDET supervised learning model and the unsupervised RX spectral classifier, to 98.9 GB of drone imagery from the Wu-Murad wilderness search and rescue (WSAR) effort in Japan and identifies 3 directions for future research. There have been at least 19 proposed approaches and 3 datasets aimed at locating missing persons in drone imagery, but only 3 approaches (2 unsupervised and 1 of an unknown structure) are referenced in the literature as having been used in an actual WSAR operation. Of these proposed approaches, the EfficientDET architecture and the unsupervised spectral RX classifier were selected as the most appropriate for this setting. The EfficientDET model was applied to the HERIDAL dataset and despite achieving performance that is statistically equivalent to the state-of-the-art, the model fails to translate to the real world in terms of false positives (e.g., identifying tree limbs and rocks as people), and false negatives (e.g., failing to identify members of the search team). The poor results in practice for algorithms that showed good results on datasets suggest 3 areas of future research: more realistic datasets for wilderness SAR, computer vision models that are capable of seamlessly handling the variety of imagery that can be collected during actual WSAR operations, and better alignment on performance measures.
    摘要 本文详细介绍了在日本 Wu-Murad 野外搜救(WSAR)行动的98.9 GB无人机影像上应用两种计算机视觉系统(监督学习模型EfficientDET与无监督RX光谱分类器)所遇到的挑战,并指出了三个未来研究方向。文献中至少已有19种方法和3个数据集用于在无人机影像中定位失踪人员,但只有3种方法(2种无监督方法和1种结构不明的方法)被报道实际用于WSAR行动。在这些方法中,EfficientDET架构和无监督RX光谱分类器被认为最适合本场景。EfficientDET模型在HERIDAL数据集上取得了与最新水平在统计上相当的性能,但在真实环境中表现不佳:既存在误报(如将树枝和岩石识别为人),也存在漏报(如未能识别搜救队成员)。在数据集上表现良好而在实践中表现不佳,提示了三个研究方向:更贴近真实野外搜救场景的数据集、能够无缝处理实际WSAR行动中采集的多样化影像的计算机视觉模型,以及对性能度量的更好统一。
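The unsupervised RX spectral classifier mentioned above is, at its core, a Mahalanobis-distance anomaly score against image-wide background statistics. A minimal global-RX sketch follows; the band count, threshold, and regularization are illustrative assumptions.

```python
# Minimal sketch of the (global) RX anomaly detector: score each pixel by its
# Mahalanobis distance from the image-wide background mean and covariance.
import numpy as np

def rx_scores(img):                      # img: (H, W, bands)
    h, w, b = img.shape
    X = img.reshape(-1, b).astype(float)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(b)   # regularize for stability
    inv = np.linalg.inv(cov)
    D = X - mu
    scores = np.einsum("ij,jk,ik->i", D, inv, D)        # per-pixel Mahalanobis distance
    return scores.reshape(h, w)

# Pixels scoring above a chosen percentile become candidate detections.
img = np.random.rand(64, 64, 4)
scores = rx_scores(img)
candidates = scores > np.percentile(scores, 99.5)
```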

A new algorithm for Subgroup Set Discovery based on Information Gain

  • paper_url: http://arxiv.org/abs/2307.15089
  • repo_url: None
  • paper_authors: Daniel Gómez-Bravo, Aaron García, Guillermo Vigueras, Belén Ríos, Alejandro Rodríguez-González
  • for: 本研究旨在提出一种新的Pattern Discovery(PD)算法,用于找到数据集中更高频率出现的项集、子序列或结构。
  • methods: 该算法将信息增益(IG)与优势比(OR)相结合,作为模式选择的多重标准。
  • results: 对于11个数据集,IGSD算法比FSSD和SSD++算法提供了更可靠的pattern和更小的pattern集。IGSD还提供了更高的OR值,表明pattern和target之间的依存关系更强。此外,IGSD中的pattern被专家 validate 过,与FSSD和SSD++算法中的pattern更为一致。这些结果表明IGSD是一种适合Pattern Discovery的方法,并且包含非标准PD metric可以更好地评估发现的pattern。
    Abstract Pattern discovery is a machine learning technique that aims to find sets of items, subsequences, or substructures that are present in a dataset with a higher frequency value than a manually set threshold. This process helps to identify recurring patterns or relationships within the data, allowing for valuable insights and knowledge extraction. In this work, we propose Information Gained Subgroup Discovery (IGSD), a new SD algorithm for pattern discovery that combines Information Gain (IG) and Odds Ratio (OR) as a multi-criteria for pattern selection. The algorithm tries to tackle some limitations of state-of-the-art SD algorithms like the need for fine-tuning of key parameters for each dataset, usage of a single pattern search criteria set by hand, usage of non-overlapping data structures for subgroup space exploration, and the impossibility to search for patterns by fixing some relevant dataset variables. Thus, we compare the performance of IGSD with two state-of-the-art SD algorithms: FSSD and SSD++. Eleven datasets are assessed using these algorithms. For the performance evaluation, we also propose to complement standard SD measures with IG, OR, and p-value. Obtained results show that FSSD and SSD++ algorithms provide less reliable patterns and reduced sets of patterns than IGSD algorithm for all datasets considered. Additionally, IGSD provides better OR values than FSSD and SSD++, stating a higher dependence between patterns and targets. Moreover, patterns obtained for one of the datasets used, have been validated by a group of domain experts. Thus, patterns provided by IGSD show better agreement with experts than patterns obtained by FSSD and SSD++ algorithms. These results demonstrate the suitability of the IGSD as a method for pattern discovery and suggest that the inclusion of non-standard SD metrics allows to better evaluate discovered patterns.
    摘要 模式发现是一种机器学习技术,旨在在数据集中找出出现频率高于人工设定阈值的项集、子序列或子结构。这一过程有助于识别数据中反复出现的模式或关系,从而提取有价值的洞见和知识。本文提出了信息增益子群发现(IGSD)算法,这是一种新的子群发现(SD)算法,它将信息增益(IG)与优势比(OR)结合作为模式选择的多重标准。该算法试图解决现有SD算法的一些局限,例如需要为每个数据集精调关键参数、仅使用单一且人工设定的模式搜索标准、使用不重叠的数据结构来探索子群空间,以及无法在固定某些相关变量的情况下搜索模式。我们在11个数据集上将IGSD与两种最新的SD算法FSSD和SSD++进行比较,并建议用IG、OR和p值来补充标准的SD度量。结果显示,在所有数据集上,FSSD和SSD++给出的模式可靠性更低、模式集合更小;IGSD的OR值更高,表明模式与目标之间的依赖性更强。此外,在其中一个数据集上,IGSD得到的模式经领域专家验证,与专家判断的一致性优于FSSD和SSD++的结果。这些结果表明IGSD适合用于模式发现,并且引入非标准的SD度量有助于更好地评估所发现的模式。
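The two selection criteria IGSD combines can be computed from a candidate pattern's 2x2 contingency table against a binary target. The helpers below are an illustrative reading of those criteria, not the authors' implementation.

```python
# Information gain and odds ratio of a candidate pattern against a binary target
# (illustrative helpers, not the authors' code).
import numpy as np

def entropy(p):
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return -(p * np.log2(p)).sum()

def information_gain(pattern, target):
    pattern, target = np.asarray(pattern, bool), np.asarray(target, bool)
    h_target = entropy([target.mean(), 1 - target.mean()])
    h_cond = 0.0
    for mask in (pattern, ~pattern):
        if mask.any():
            p1 = target[mask].mean()
            h_cond += mask.mean() * entropy([p1, 1 - p1])
    return h_target - h_cond

def odds_ratio(pattern, target, eps=0.5):         # Haldane correction for empty cells
    pattern, target = np.asarray(pattern, bool), np.asarray(target, bool)
    a = (pattern & target).sum() + eps            # pattern present, target positive
    b = (pattern & ~target).sum() + eps
    c = (~pattern & target).sum() + eps
    d = (~pattern & ~target).sum() + eps
    return (a * d) / (b * c)
```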

Bug Characterization in Machine Learning-based Systems

  • paper_url: http://arxiv.org/abs/2307.14512
  • repo_url: https://github.com/ml-bugs-2022/replication-package
  • paper_authors: Mohammad Mehdi Morovati, Amin Nikanjam, Florian Tambon, Foutse Khomh, Zhen Ming, Jiang
    for:This paper aims to investigate the characteristics of bugs in Machine Learning (ML)-based software systems and the differences between ML and non-ML bugs from the maintenance viewpoint.methods:The paper uses a dataset of 447,948 GitHub repositories that use one of the three most popular ML frameworks (TensorFlow, Keras, and PyTorch) and manually inspects 386 sampled reported issues to identify ML bugs. The paper also examines 109 identified ML bugs to identify their root causes, symptoms, and required fixing time.results:The paper finds that nearly half of the real issues reported in ML-based systems are ML bugs, indicating that ML components are more error-prone than non-ML components. The paper also shows that fixing ML bugs is more costly and complex compared to non-ML bugs, with ML bugs requiring more commits, changed files, and changed lines of code to fix. These findings highlight the importance of paying significant attention to the reliability of ML components in ML-based systems.
    Abstract Rapid growth of applying Machine Learning (ML) in different domains, especially in safety-critical areas, increases the need for reliable ML components, i.e., a software component operating based on ML. Understanding the bugs characteristics and maintenance challenges in ML-based systems can help developers of these systems to identify where to focus maintenance and testing efforts, by giving insights into the most error-prone components, most common bugs, etc. In this paper, we investigate the characteristics of bugs in ML-based software systems and the difference between ML and non-ML bugs from the maintenance viewpoint. We extracted 447,948 GitHub repositories that used one of the three most popular ML frameworks, i.e., TensorFlow, Keras, and PyTorch. After multiple filtering steps, we select the top 300 repositories with the highest number of closed issues. We manually investigate the extracted repositories to exclude non-ML-based systems. Our investigation involved a manual inspection of 386 sampled reported issues in the identified ML-based systems to indicate whether they affect ML components or not. Our analysis shows that nearly half of the real issues reported in ML-based systems are ML bugs, indicating that ML components are more error-prone than non-ML components. Next, we thoroughly examined 109 identified ML bugs to identify their root causes, symptoms, and calculate their required fixing time. The results also revealed that ML bugs have significantly different characteristics compared to non-ML bugs, in terms of the complexity of bug-fixing (number of commits, changed files, and changed lines of code). Based on our results, fixing ML bugs are more costly and ML components are more error-prone, compared to non-ML bugs and non-ML components respectively. Hence, paying a significant attention to the reliability of the ML components is crucial in ML-based systems.
    摘要 随着机器学习(ML)在各个领域、尤其是安全关键领域的应用快速增长,对可靠的ML组件(即基于ML运行的软件组件)的需求也在增加。理解基于ML的系统中bug的特征和维护挑战,有助于开发者确定维护与测试工作的重点,例如最容易出错的组件和最常见的bug类型等。本文研究了基于ML的软件系统中bug的特征,以及从维护角度看ML bug与非ML bug的差异。我们提取了447,948个使用三大主流ML框架(TensorFlow、Keras、PyTorch)之一的GitHub仓库,经过多轮筛选后选出已关闭issue数量最多的前300个仓库,并人工排除非ML系统。我们人工检查了其中386个抽样的issue,以判断它们是否影响ML组件。分析显示,基于ML的系统中约有一半的真实issue是ML bug,说明ML组件比非ML组件更容易出错。随后我们深入分析了109个已识别的ML bug,确定其根因、症状以及修复所需时间。结果还表明,在修复复杂度(提交数、修改文件数、修改代码行数)方面,ML bug与非ML bug差异显著:修复ML bug的成本更高,且ML组件更易出错。因此,在基于ML的系统中,必须高度重视ML组件的可靠性。

A Predictive Model of Digital Information Engagement: Forecasting User Engagement With English Words by Incorporating Cognitive Biases, Computational Linguistics and Natural Language Processing

  • paper_url: http://arxiv.org/abs/2307.14500
  • repo_url: None
  • paper_authors: Nimrod Dvir, Elaine Friedman, Suraj Commuri, Fan yang, Jennifer Romano
    for: This paper introduces and tests a novel predictive model for digital information engagement, called the READ model, which integrates cognitive biases and natural language processing to predict the engagement levels of information.methods: The study uses a rigorous testing protocol involving a large-scale online survey to evaluate the engagement levels of 100 words selected from the WordNet database, and computes the READ attributes for each word to predict its engagement level.results: The findings show that the READ model accurately predicts a word's engagement level with an 84% accuracy rate, and has the potential to enhance content engagement and inform AI language model development and generative text work across various domains.
    Abstract This study introduces and empirically tests a novel predictive model for digital information engagement (IE) - the READ model, an acronym for the four pivotal attributes of engaging information: Representativeness, Ease-of-use, Affect, and Distribution. Conceptualized within the theoretical framework of Cumulative Prospect Theory, the model integrates key cognitive biases with computational linguistics and natural language processing to develop a multidimensional perspective on information engagement. A rigorous testing protocol was implemented, involving 50 randomly selected pairs of synonymous words (100 words in total) from the WordNet database. These words' engagement levels were evaluated through a large-scale online survey (n = 80,500) to derive empirical IE metrics. The READ attributes for each word were then computed and their predictive efficacy examined. The findings affirm the READ model's robustness, accurately predicting a word's IE level and distinguishing the more engaging word from a pair of synonyms with an 84% accuracy rate. The READ model's potential extends across various domains, including business, education, government, and healthcare, where it could enhance content engagement and inform AI language model development and generative text work. Future research should address the model's scalability and adaptability across different domains and languages, thereby broadening its applicability and efficacy.
    摘要 To test the model, a rigorous testing protocol was implemented, involving 50 randomly selected pairs of synonymous words (100 words in total) from the WordNet database. The engagement levels of these words were evaluated through a large-scale online survey (n = 80,500) to derive empirical IE metrics. The READ attributes for each word were then computed and their predictive efficacy examined.The findings confirm the READ model's robustness, accurately predicting a word's IE level and distinguishing the more engaging word from a pair of synonyms with an 84% accuracy rate. The READ model has the potential to enhance content engagement in various domains, including business, education, government, and healthcare, and could also inform AI language model development and generative text work. Future research should focus on scaling and adapting the model across different domains and languages to broaden its applicability and efficacy.

HUGE: Huge Unsupervised Graph Embeddings with TPUs

  • paper_url: http://arxiv.org/abs/2307.14490
  • repo_url: None
  • paper_authors: Brandon Mayer, Anton Tsitsulin, Hendrik Fichtenberger, Jonathan Halcrow, Bryan Perozzi
  • for: 这篇论文是为了快速分析大规模图数据而设计的。
  • methods: 该论文使用了高性能的图嵌入架构,利用tensor处理单元(TPUs)和可配置的高带宽内存来简化图嵌入问题,并可扩展到 Billions 个节点和万亿个边的图数据。
  • results: 论文验证了嵌入空间质量在真实和模拟的大规模数据集上,并达到了高度的嵌入空间质量和性能。
    Abstract Graphs are a representation of structured data that captures the relationships between sets of objects. With the ubiquity of available network data, there is increasing industrial and academic need to quickly analyze graphs with billions of nodes and trillions of edges. A common first step for network understanding is Graph Embedding, the process of creating a continuous representation of nodes in a graph. A continuous representation is often more amenable, especially at scale, for solving downstream machine learning tasks such as classification, link prediction, and clustering. A high-performance graph embedding architecture leveraging Tensor Processing Units (TPUs) with configurable amounts of high-bandwidth memory is presented that simplifies the graph embedding problem and can scale to graphs with billions of nodes and trillions of edges. We verify the embedding space quality on real and synthetic large-scale datasets.
    摘要 グラフは、オブジェクトの関系を构造化したデータの表现です。现在、ネットワークデータが広く利用されているため、大规模なグラフをすぐに分析することが必要とされています。グラフの理解のための一つの基本的なステップは、グラフエンコーディングです。グラフのノードに対して、连続的な表现を作成することで、后続の机械学习タスクにおいて、クラスジャッジ、リンクプレディクト、クラスタリングなどの问题を解くのに、より柔软でスケールすることができます。高性能のグラフエンコーディングアーキテクチャが提案されます。このアーキテクチャでは、TPU(テンソルプロセッシングユニット)を使用し、高速バンド幅メモリを调整することで、グラフエンコーディング问题を単纯化することができます。我々は、実际の大规模なデータセットに対して、エンコーディング空间の质を検证しました。

Role of Image Acquisition and Patient Phenotype Variations in Automatic Segmentation Model Generalization

  • paper_url: http://arxiv.org/abs/2307.14482
  • repo_url: None
  • paper_authors: Timothy L. Kline, Sumana Ramanathan, Harrison C. Gottlich, Panagiotis Korfiatis, Adriana V. Gregory
  • for: 本研究旨在评估自动医学图像分割模型的 OUT-OF-DOMAIN 性能和泛化能力,特别是适应新的图像收集和疾病类型。
  • methods: 该研究使用了非增强和增强的腹部 CT 扫描图像数据,对模型进行训练和验证,以分割肾脏、肝脏和脾脏。
  • results: 研究发现,使用更广泛的数据集可以提高模型的泛化能力和 OUT-OF-DOMAIN 性能,而不会降低模型在同类型图像上的性能。例如,使用25%的数据集进行训练的模型与仅使用同类型图像进行训练的模型相比,其 Dice 相似性几乎相同。
    Abstract Purpose: This study evaluated the out-of-domain performance and generalization capabilities of automated medical image segmentation models, with a particular focus on adaptation to new image acquisitions and disease type. Materials: Datasets from both non-contrast and contrast-enhanced abdominal CT scans of healthy patients and those with polycystic kidney disease (PKD) were used. A total of 400 images (100 non-contrast controls, 100 contrast controls, 100 non-contrast PKD, 100 contrast PKD) were utilized for training/validation of models to segment kidneys, livers, and spleens, and the final models were then tested on 100 non-contrast CT images of patients affected by PKD. Performance was evaluated using Dice, Jaccard, TPR, and Precision. Results: Models trained on a diverse range of data showed no worse performance than models trained exclusively on in-domain data when tested on in-domain data. For instance, the Dice similarity of the model trained on 25% from each dataset was found to be non-inferior to the model trained purely on in-domain data. Conclusions: The results indicate that broader training examples significantly enhances model generalization and out-of-domain performance, thereby improving automated segmentation tools' applicability in clinical settings. The study's findings provide a roadmap for future research to adopt a data-centric approach in medical image AI model development.
    摘要 目的:本研究评估自动医学图像分割模型的域外性能与泛化能力,尤其关注其对新的图像采集方式和疾病类型的适应性。材料:数据来自健康人群及多囊肾病(PKD)患者的非增强和增强腹部CT扫描。共使用400幅图像(非增强对照100幅、增强对照100幅、非增强PKD 100幅、增强PKD 100幅)训练并验证分割肾脏、肝脏和脾脏的模型,最终模型在100幅PKD患者的非增强CT图像上进行测试。性能以Dice、Jaccard、TPR和Precision评估。结果:在域内数据上测试时,使用多样化数据训练的模型并不逊于仅用域内数据训练的模型;例如,从每个数据集各取25%训练得到的模型,其Dice相似度不劣于仅用域内数据训练的模型。结论:更广泛的训练样本能够显著提升模型的泛化和域外性能,从而提高自动分割工具在临床环境中的适用性。本研究的结果为未来在医学图像AI模型开发中采用以数据为中心的方法提供了路线图。
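For reference, the four reported overlap metrics can be computed from predicted and ground-truth binary masks as follows (a plain numpy sketch; it assumes non-empty masks):

```python
# The four reported overlap metrics for a predicted vs. ground-truth binary mask.
import numpy as np

def segmentation_metrics(pred, gt):
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
        "tpr": tp / (tp + fn),          # sensitivity / recall
        "precision": tp / (tp + fp),
    }
```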

Equitable Time-Varying Pricing Tariff Design: A Joint Learning and Optimization Approach

  • paper_url: http://arxiv.org/abs/2307.15088
  • repo_url: None
  • paper_authors: Liudong Chen, Bolun Xu
  • for: The paper aims to design equitable time-varying tariffs that balance affordability and response incentives for consumers with limited response capability.
  • methods: The paper proposes a joint learning-based identification and optimization method that uses a recurrent neural network (RNN) to capture high-dimensional and non-linear consumer price response behaviors, and embeds the RNN into the tariff design optimization as a non-linear optimization problem with a quadratic objective.
  • results: The proposed method achieves fast and scalable computation, and simulation using real-world consumer data shows that the equitable tariffs protect low-income consumers from price surges while effectively motivating consumers to reduce peak demand, ensure revenue recovery for the utility company, and achieve robust performance against demand response uncertainties and prediction errors.
    Abstract Time-varying pricing tariffs incentivize consumers to shift their electricity demand and reduce costs, but may increase the energy burden for consumers with limited response capability. The utility must thus balance affordability and response incentives when designing these tariffs by considering consumers' response expectations. This paper proposes a joint learning-based identification and optimization method to design equitable time-varying tariffs. Our proposed method encodes historical prices and demand response data into a recurrent neural network (RNN) to capture high-dimensional and non-linear consumer price response behaviors. We then embed the RNN into the tariff design optimization, formulating a non-linear optimization problem with a quadratic objective. We propose a gradient-based solution method that achieves fast and scalable computation. Simulation using real-world consumer data shows that our equitable tariffs protect low-income consumers from price surges while effectively motivating consumers to reduce peak demand. The method also ensures revenue recovery for the utility company and achieves robust performance against demand response uncertainties and prediction errors.
    摘要 时变价格的电力价格奖励消费者shift其电力需求,以降低成本,但可能增加有限回应能力的消费者的能源荷重。公司因此必须在设计价格时考虑消费者的回应预期,以保持价格可持续性和回应奖励的平衡。这篇论文提出了一种基于学习的标合一识别优化方法,用于设计公平的时变价格。我们使用历史价格和响应数据将消费者价格响应行为编码成回归神经网络(RNN),以捕捉高维和非线性的消费者价格响应行为。然后,我们将RNN embed到价格设计优化中,定义一个非线性优化问题,其目标函数为quadratic。我们提出了一种梯度下降的解决方法,实现了快速和扩展性的计算。实验使用实际的消费者数据显示,我们的公平价格可以保护低收入消费者免受价格涨势,同时有效地鼓励消费者减少峰值需求。此外,我们的方法还确保了公司收益回报,并在需求响应不确定性和预测错误下实现了稳定性。

Limits to Reservoir Learning

  • paper_url: http://arxiv.org/abs/2307.14474
  • repo_url: None
  • paper_authors: Anthony M. Polloreno
  • for: 本研究围定机器学习能力基于物理限制。
  • methods: 我们使用信息处理容量(IPC)测量干扰下噪声影响储存计算机的性能。
  • results: 我们发现IPC在系统大小$n$上是最多 polynomial 型,而且噪声下学习需要 exponential 数量的样本。
    Abstract In this work, we bound a machine's ability to learn based on computational limitations implied by physicality. We start by considering the information processing capacity (IPC), a normalized measure of the expected squared error of a collection of signals to a complete basis of functions. We use the IPC to measure the degradation under noise of the performance of reservoir computers, a particular kind of recurrent network, when constrained by physical considerations. First, we show that the IPC is at most a polynomial in the system size $n$, even when considering the collection of $2^n$ possible pointwise products of the $n$ output signals. Next, we argue that this degradation implies that the family of functions represented by the reservoir requires an exponential number of samples to learn in the presence of the reservoir's noise. Finally, we conclude with a discussion of the performance of the same collection of $2^n$ functions without noise when being used for binary classification.
    摘要 在这项工作中,我们基于物理约束所隐含的计算限制,对机器的学习能力给出界限。我们首先考虑信息处理容量(IPC),这是一种归一化度量,用于刻画一组信号相对于完整函数基的期望平方误差。我们利用IPC来度量在物理约束下、噪声对储备池计算机(一类特殊的循环网络)性能的退化。首先,我们证明即使考虑由 $n$ 个输出信号构成的 $2^n$ 个逐点乘积,IPC至多也只是系统规模 $n$ 的多项式。接着,我们论证这种退化意味着:在储备池噪声存在的情况下,学习储备池所表示的函数族需要指数级数量的样本。最后,我们讨论同样这 $2^n$ 个函数在无噪声情形下用于二分类时的性能。

What Kinds of Contracts Do ML APIs Need?

  • paper_url: http://arxiv.org/abs/2307.14465
  • repo_url: None
  • paper_authors: Samantha Syeda Khairunnesa, Shibbir Ahmed, Sayem Mohammad Imtiaz, Hridesh Rajan, Gary T. Leavens
  • for: This paper aims to identify the most useful contracts for ML API users to catch errors early in the ML pipeline.
  • methods: The study uses empirical data from Stack Overflow to extract 413 informal API specifications for four popular ML libraries (TensorFlow, Scikit-learn, Keras, and PyTorch).
  • results: The key findings are that the most commonly needed contracts for ML APIs are checking constraints on single arguments of an API or on the order of API calls. The study also suggests a need to combine behavioral and temporal contract mining approaches.
    Abstract Recent work has shown that Machine Learning (ML) programs are error-prone and called for contracts for ML code. Contracts, as in the design by contract methodology, help document APIs and aid API users in writing correct code. The question is: what kinds of contracts would provide the most help to API users? We are especially interested in what kinds of contracts help API users catch errors at earlier stages in the ML pipeline. We describe an empirical study of posts on Stack Overflow of the four most often-discussed ML libraries: TensorFlow, Scikit-learn, Keras, and PyTorch. For these libraries, our study extracted 413 informal (English) API specifications. We used these specifications to understand the following questions. What are the root causes and effects behind ML contract violations? Are there common patterns of ML contract violations? When does understanding ML contracts require an advanced level of ML software expertise? Could checking contracts at the API level help detect the violations in early ML pipeline stages? Our key findings are that the most commonly needed contracts for ML APIs are either checking constraints on single arguments of an API or on the order of API calls. The software engineering community could employ existing contract mining approaches to mine these contracts to promote an increased understanding of ML APIs. We also noted a need to combine behavioral and temporal contract mining approaches. We report on categories of required ML contracts, which may help designers of contract languages.
    摘要 近期研究发现机器学习(ML)程序存在许多错误,因此需要为 ML 代码签订合同。合同,就如design by contract方法论,可以帮助文档API和帮助API用户编写正确的代码。问题是:哪些合同可以为 API 用户提供最多帮助?我们尤其关心在 ML 管道中早期发现错误的合同。我们描述了在 Stack Overflow 上关于 TensorFlow、Scikit-learn、Keras 和 PyTorch 四个最常讨论的 ML 库的四百十三个非正式(英文)API 规范。这些规范可以帮助我们解决以下问题:ML 合同违反的根本原因和影响是什么?ML 合同违反是否存在常见的模式?在 ML 软件专业水平上理解 ML 合同需要多少技能准备?在 ML 管道的早期阶段,可以通过检查合同来检测违反吗?我们的关键发现是,ML API 中最需要的合同是检查单个参数的 API 上的约束或者 API 调用的顺序。软件工程社区可以利用现有的合同挖掘方法来挖掘这些合同,以促进 ML API 的理解。我们还注意到了将行为合同和时间合同混合的需求。我们报告了需要的 ML 合同类型,可以帮助设计合同语言的设计者。
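The two contract kinds the study finds most needed, constraints on a single argument and constraints on the order of API calls, can be expressed as lightweight runtime checks. The class below is a hypothetical illustration and is not tied to any specific ML library's API.

```python
# Hypothetical runtime contracts of the two kinds the study finds most common:
# (1) a constraint on a single argument, (2) a constraint on the order of API calls.
import numpy as np

class TinyClassifier:
    def __init__(self):
        self._fitted = False

    def fit(self, X, y):
        # Single-argument contract: features must be a 2-D, finite numeric array.
        X = np.asarray(X, dtype=float)
        assert X.ndim == 2, "fit expects X with shape (n_samples, n_features)"
        assert np.isfinite(X).all(), "fit expects finite feature values (no NaN/inf)"
        self._mean = X[np.asarray(y) == 1].mean(axis=0)   # assumes at least one positive sample
        self._fitted = True
        return self

    def predict(self, X):
        # Temporal contract: predict may only be called after fit.
        assert self._fitted, "predict called before fit violates the call-order contract"
        X = np.asarray(X, dtype=float)
        return (np.linalg.norm(X - self._mean, axis=1) < 1.0).astype(int)
```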

Training Quantum Boltzmann Machines with Coresets

  • paper_url: http://arxiv.org/abs/2307.14459
  • repo_url: None
  • paper_authors: Joshua Viszlai, Teague Tomesh, Pranav Gokhale, Eric Anschuetz, Frederic T. Chong
  • for: 加速量子计算机上的量子自适应机器(QBM)训练,使用核心集技术以减少计算时间。
  • methods: 使用核心集技术取代全数据集,以减少Gradient-based步骤中的Gibbs状态抽取步骤数量,并加速总训练时间。
  • results: 使用核心集技术可以在6x6binomial图像数据集上减少QBM训练时间,并且使用Inception分数指标表明,使用核心集技术可以提高QBM的训练效率。
    Abstract Recent work has proposed and explored using coreset techniques for quantum algorithms that operate on classical data sets to accelerate the applicability of these algorithms on near-term quantum devices. We apply these ideas to Quantum Boltzmann Machines (QBM) where gradient-based steps which require Gibbs state sampling are the main computational bottleneck during training. By using a coreset in place of the full data set, we try to minimize the number of steps needed and accelerate the overall training time. In a regime where computational time on quantum computers is a precious resource, we propose this might lead to substantial practical savings. We evaluate this approach on 6x6 binary images from an augmented bars and stripes data set using a QBM with 36 visible units and 8 hidden units. Using an Inception score inspired metric, we compare QBM training times with and without using coresets.
    摘要 近期研究提出了使用核心集技术来加速量子算法在类传统计算机上的应用,以便在半导体量子计算机上实现这些算法的可靠性。我们将这些想法应用到量子博尔tz曼机(QBM)中,其中梯度更新步骤需要 Gibbs 样本采样是主要的计算瓶颈。通过将核心集置换为全量数据集,我们尝试最小化需要的步骤数量,从而加速整体训练时间。在计算时间在量子计算机上是珍贵资源的情况下,我们提议这可能导致实质性的实践成本减少。我们使用6x6 二进制图像从扩展的条纹数据集,并使用一个 QBM WITH 36 可见单元和 8 隐藏单元进行训练。使用基于 Inception metric 的训练时间比较,我们比较了不使用核心集和使用核心集两种方法的 QBM 训练时间。

Predictive Maintenance of Armoured Vehicles using Machine Learning Approaches

  • paper_url: http://arxiv.org/abs/2307.14453
  • repo_url: None
  • paper_authors: Prajit Sengupta, Anant Mehta, Prashant Singh Rana
  • for: 预测机甲车辆维护需求
  • methods: 使用多种模型,包括Light Gradient Boosting、Random Forest、决策树、Extra Tree Classifier和Gradient Boosting,基于感知数据预测机甲车辆维护需求
  • results: 提出的模型在K-fold十字验证和TOPSIS分析中得到了98.93%的准确率、99.80%的精度和99.03%的回归率,能够有效预测机甲车辆维护需求,降低机甲车辆停机时间,提高运作效率。
    Abstract Armoured vehicles are specialized and complex pieces of machinery designed to operate in high-stress environments, often in combat or tactical situations. This study proposes a predictive maintenance-based ensemble system that aids in predicting potential maintenance needs based on sensor data collected from these vehicles. The proposed model's architecture involves various models such as Light Gradient Boosting, Random Forest, Decision Tree, Extra Tree Classifier and Gradient Boosting to predict the maintenance requirements of the vehicles accurately. In addition, K-fold cross validation, along with TOPSIS analysis, is employed to evaluate the proposed ensemble model's stability. The results indicate that the proposed system achieves an accuracy of 98.93%, precision of 99.80% and recall of 99.03%. The algorithm can effectively predict maintenance needs, thereby reducing vehicle downtime and improving operational efficiency. Through comparisons between various algorithms and the suggested ensemble, this study highlights the potential of machine learning-based predictive maintenance solutions.
    摘要 armoured vehicles 是特殊化和复杂的机器设备,用于在高压环境中运行,经常在战斗或战略情况下使用。这项研究提议一种预测维护需求的ensemble系统,该系统根据战车上收集的传感器数据预测维护需求。该提议的模型体系包括轻度梯度拟合、随机森林、决策树、Extra Tree Classifier和梯度拟合等模型,以准确预测战车的维护需求。此外, employ K-fold Cross Validation 和 TOPSIS分析来评估提议的ensemble模型的稳定性。结果显示,该提议的系统实现了98.93%的准确率、99.80%的精度和99.03%的回归率。该算法可以有效预测维护需求,从而降低战车的下线时间,提高运作效率。通过对不同算法的比较以及建议的ensemble,这项研究强调了机器学习基于的预测维护解决方案的潜在力量。
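The described ensemble and 10-fold validation map naturally onto standard scikit-learn components. The sketch below uses synthetic data and lets HistGradientBoostingClassifier stand in for LightGBM; it is an illustration, not the authors' pipeline, and the TOPSIS ranking step is omitted.

```python
# Sketch of the described ensemble with 10-fold cross-validation in scikit-learn
# (synthetic data; HistGradientBoostingClassifier stands in for LightGBM here).
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              HistGradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lgbm_like", HistGradientBoostingClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("et", ExtraTreesClassifier(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",
)
scores = cross_val_score(ensemble, X, y, cv=10, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")
```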

VISPUR: Visual Aids for Identifying and Interpreting Spurious Associations in Data-Driven Decisions

  • paper_url: http://arxiv.org/abs/2307.14448
  • repo_url: https://github.com/picsolab/vispur
  • paper_authors: Xian Teng, Yongsu Ahn, Yu-Ru Lin
  • for: 这篇论文旨在支持数据驱动的决策:现有工具很容易捕捉到由混杂因素和子群异质性导致的虚假关联,从而得出错误的结论与决策。
  • methods: 该论文提出了一个可视分析框架和以人为中心的工作流程来应对虚假关联,包括可自动识别潜在混杂因素的 CONFOUNDER DASHBOARD、用于可视化并比较可能导致因果误读的多样子群模式的 SUBGROUP VIEWER、以流程方式呈现悖论现象的 REASONING STORYBOARD,以及帮助确保负责任决策的交互式 DECISION DIAGNOSIS 面板。
  • results: 专家访谈与对照用户实验的定性和定量结果表明,所提出的"去悖论"工作流程与可视分析系统能有效帮助用户识别并理解虚假关联,并做出负责任的因果决策。
    Abstract Big data and machine learning tools have jointly empowered humans in making data-driven decisions. However, many of them capture empirical associations that might be spurious due to confounding factors and subgroup heterogeneity. The famous Simpson's paradox is such a phenomenon where aggregated and subgroup-level associations contradict with each other, causing cognitive confusions and difficulty in making adequate interpretations and decisions. Existing tools provide little insights for humans to locate, reason about, and prevent pitfalls of spurious association in practice. We propose VISPUR, a visual analytic system that provides a causal analysis framework and a human-centric workflow for tackling spurious associations. These include a CONFOUNDER DASHBOARD, which can automatically identify possible confounding factors, and a SUBGROUP VIEWER, which allows for the visualization and comparison of diverse subgroup patterns that likely or potentially result in a misinterpretation of causality. Additionally, we propose a REASONING STORYBOARD, which uses a flow-based approach to illustrate paradoxical phenomena, as well as an interactive DECISION DIAGNOSIS panel that helps ensure accountable decision-making. Through an expert interview and a controlled user experiment, our qualitative and quantitative results demonstrate that the proposed "de-paradox" workflow and the designed visual analytic system are effective in helping human users to identify and understand spurious associations, as well as to make accountable causal decisions.
    摘要 大数据与机器学习工具共同增强了人们进行数据驱动决策的能力。然而,它们捕捉到的许多经验性关联可能由于混杂因素和子群异质性而是虚假的。著名的辛普森悖论就是这样一种现象:汇总层面与子群层面的关联相互矛盾,造成认知困惑,难以做出恰当的解释和决策。现有工具几乎无法帮助人们在实践中定位、推理并防范虚假关联的陷阱。我们提出VISPUR,一个提供因果分析框架和以人为中心工作流程的可视分析系统,用于应对虚假关联。它包括能自动识别潜在混杂因素的CONFOUNDER DASHBOARD,以及可视化并比较可能导致因果误读的多样子群模式的SUBGROUP VIEWER。此外,我们提出REASONING STORYBOARD,以流程化的方式呈现悖论现象,并提供交互式的DECISION DIAGNOSIS面板以帮助确保负责任的决策。通过专家访谈和对照用户实验,我们的定性与定量结果表明,所提出的"去悖论"工作流程和可视分析系统能够有效帮助用户识别和理解虚假关联,并做出负责任的因果决策。
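The kind of reversal VISPUR is built to surface takes only a few lines to reproduce. In the made-up example below, treatment B looks better in aggregate while A is better inside every severity subgroup (all numbers are fabricated for illustration).

```python
# A made-up Simpson's paradox of the kind VISPUR is designed to surface: treatment A
# looks worse overall, yet is better inside every subgroup (severity is the confounder).
import pandas as pd

df = pd.DataFrame({
    "treatment": ["A"] * 100 + ["B"] * 100,
    "severity":  ["severe"] * 80 + ["mild"] * 20 + ["severe"] * 20 + ["mild"] * 80,
    "recovered": [1] * 48 + [0] * 32 + [1] * 19 + [0] * 1        # A: 60% severe, 95% mild
                 + [1] * 10 + [0] * 10 + [1] * 68 + [0] * 12,    # B: 50% severe, 85% mild
})

print(df.groupby("treatment")["recovered"].mean())                 # aggregate: B looks better
print(df.groupby(["severity", "treatment"])["recovered"].mean())   # subgroups: A is better in both
```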

Neural Schrödinger Bridge with Sinkhorn Losses: Application to Data-driven Minimum Effort Control of Colloidal Self-assembly

  • paper_url: http://arxiv.org/abs/2307.14442
  • repo_url: None
  • paper_authors: Iman Nodozi, Charlie Yan, Mira Khare, Abhishek Halder, Ali Mesbah
  • for: 这个论文是关于控制束束自组装的最小努力问题的研究。
  • methods: 这个论文使用了一种名为“神经 Шрёдингер桥”的数据驱动学习和控制框架,以解决这类问题。
  • results: 研究人员通过使用分子动力学 simulate 数据来学习控制的漂移和扩散系数,并使用这些系数来训练一个三个网络,以解决这类问题。
    Abstract We show that the minimum effort control of colloidal self-assembly can be naturally formulated in the order-parameter space as a generalized Schr\"odinger bridge problem -- a class of fixed-horizon stochastic optimal control problems that originated in the works of Erwin Schr\"odinger in the early 1930s. In recent years, this class of problems has seen a resurgence of research activities in control and machine learning communities. Different from the existing literature on the theory and computation for such problems, the controlled drift and diffusion coefficients for colloidal self-assembly are typically non-affine in control, and are difficult to obtain from physics-based modeling. We deduce the conditions of optimality for such generalized problems, and show that the resulting system of equations is structurally very different from the existing results in a way that standard computational approaches no longer apply. Thus motivated, we propose a data-driven learning and control framework, named `neural Schr\"odinger bridge', to solve such generalized Schr\"odinger bridge problems by innovating on recent advances in neural networks. We illustrate the effectiveness of the proposed framework using a numerical case study of colloidal self-assembly. We learn the controlled drift and diffusion coefficients as two neural networks using molecular dynamics simulation data, and then use these two to train a third network with Sinkhorn losses designed for distributional endpoint constraints, specific for this class of control problems.
    摘要 我们证明,胶体自组装的最小努力控制问题可以在序参量空间中自然地表述为一个广义Schrödinger桥问题,这是一类固定时间范围的随机最优控制问题,起源于Erwin Schrödinger在20世纪30年代初的工作。近年来,这类问题在控制与机器学习领域重新受到关注。与现有关于此类问题理论和计算的文献不同,胶体自组装的受控漂移系数和扩散系数通常相对于控制量是非仿射的,且难以通过基于物理的建模获得。我们推导了这类广义问题的最优性条件,并说明所得的方程组在结构上与已有结果差异很大,以致标准计算方法不再适用。为此,我们提出一种数据驱动的学习与控制框架,称为"神经Schrödinger桥",通过借鉴神经网络的最新进展来求解这类广义Schrödinger桥问题。我们利用分子动力学仿真数据,将受控漂移系数和扩散系数学习为两个神经网络,再用它们训练第三个网络,其损失采用为该类控制问题的分布端点约束专门设计的Sinkhorn损失。我们通过一个胶体自组装的数值案例研究说明了所提框架的有效性。
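The "Sinkhorn losses" used for the distributional endpoint constraints build on the classic Sinkhorn iterations for entropy-regularized optimal transport. A minimal numpy version between two discrete distributions follows; the cost and regularization strength are illustrative, and this is not the authors' training code.

```python
# Minimal Sinkhorn iterations for entropy-regularized optimal transport between two
# discrete distributions (the primitive behind "Sinkhorn losses").
import numpy as np

def sinkhorn(a, b, C, reg=0.1, iters=500):
    K = np.exp(-C / reg)                   # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]        # transport plan with marginals ~a, ~b
    return (P * C).sum()                   # regularized transport cost

x = np.linspace(0.0, 1.0, 50)[:, None]
y = np.linspace(0.2, 1.2, 60)[:, None]
C = (x - y.T) ** 2                         # squared-distance cost matrix
a = np.full(50, 1 / 50)
b = np.full(60, 1 / 60)
print("entropic OT cost:", sinkhorn(a, b, C))
```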

Fixed Integral Neural Networks

  • paper_url: http://arxiv.org/abs/2307.14439
  • repo_url: None
  • paper_authors: Ryan Kortvelesy
  • for: 该论文旨在解决对学习得到的函数(如神经网络)进行积分的问题。
  • methods: 该论文提出了一种表示学习函数解析积分的方法,从而可以精确计算神经网络的积分,并能通过直接对积分施加约束来参数化受约束的神经网络。
  • results: 该方法能够精确计算学习函数的积分,还引入了使函数保持为正的约束方法(这是概率分布、距离度量等许多应用的必要条件),并展示了固定积分神经网络(FINN)的若干应用场景。
    Abstract It is often useful to perform integration over learned functions represented by neural networks. However, this integration is usually performed numerically, as analytical integration over learned functions (especially neural networks) is generally viewed as intractable. In this work, we present a method for representing the analytical integral of a learned function $f$. This allows the exact integral of a neural network to be computed, and enables constrained neural networks to be parametrised by applying constraints directly to the integral. Crucially, we also introduce a method to constrain $f$ to be positive, a necessary condition for many applications (e.g. probability distributions, distance metrics, etc). Finally, we introduce several applications where our fixed-integral neural network (FINN) can be utilised.
    摘要 通常情况下,通过神经网络学习的函数的数值积分是常用的。然而,这种积分通常是分析不可能的,尤其是神经网络学习的函数。在这项工作中,我们提出了一种方法来计算神经网络学习得到的函数积分。这允许我们精确地计算神经网络的积分,并通过直接应用约束来Parametrise受限神经网络。此外,我们还提出了一种方法来约束函数$f$是正的,这是许多应用中的必要条件(例如概率分布、距离度量等)。最后,我们介绍了一些应用场景, где我们的固定积分神经网络(FINN)可以被利用。
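One way to realize the two ingredients in the abstract, an exact integral and a positivity constraint, is to parametrise the antiderivative F with a monotone network, so that f = F' >= 0 by construction and the integral over [a, b] is exactly F(b) - F(a). The sketch below illustrates that idea under these assumptions; it is not necessarily the paper's construction.

```python
# Illustrative sketch (assumptions, not necessarily the paper's construction): parametrise
# the antiderivative F with a monotone one-hidden-layer network, so f = dF/dx >= 0 by
# construction and the integral of f over [a, b] is exactly F(b) - F(a).
import torch
from torch.nn.functional import softplus

class MonotoneAntiderivative(torch.nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.v = torch.nn.Parameter(torch.randn(hidden))
        self.w = torch.nn.Parameter(torch.randn(hidden))
        self.b = torch.nn.Parameter(torch.randn(hidden))
        self.c = torch.nn.Parameter(torch.zeros(1))

    def forward(self, x):                      # x: (n, 1); softplus keeps F non-decreasing
        h = torch.tanh(softplus(self.w) * x + self.b)
        return (softplus(self.v) * h).sum(dim=-1, keepdim=True) + self.c

net = MonotoneAntiderivative()

def integrand(x):                              # the learned integrand f = dF/dx >= 0
    x = x.clone().requires_grad_(True)
    (grad,) = torch.autograd.grad(net(x).sum(), x, create_graph=True)
    return grad

a, b = torch.zeros(1, 1), torch.ones(1, 1)
exact_integral = net(b) - net(a)               # exact integral of f over [0, 1], no quadrature
```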

Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models

  • paper_url: http://arxiv.org/abs/2307.14430
  • repo_url: None
  • paper_authors: Mayee F. Chen, Nicholas Roberts, Kush Bhatia, Jue Wang, Ce Zhang, Frederic Sala, Christopher Ré
  • for: 这个论文研究了如何使用有限token数据来训练预训练过的大语言模型(LM),以提高其下游任务的性能。
  • methods: 作者提出了一个新框架,基于类似人类学习的自然顺序来理解LM的学习过程,并利用这种顺序来选择训练数据。他们还提出了一种在线数据采样算法 Skill-It,可用于持续预训练和微调两种场景下的数据选择,以提高LM的性能。
  • results: 作者使用合成数据和真实数据证明了这种技能顺序的存在,并表明先在前置技能上训练可以用更少的token更好地学习更高级的技能。在持续预训练场景下,Skill-It 比随机采样的准确率高出36.5个百分点;在微调场景下,Skill-It 将目标技能上的验证损失比仅在目标技能数据上训练降低了13.6%。最后,作者将该技能框架应用于 RedPajama 数据集,持续预训练了一个3B参数的LM,并在LM评估套件上取得了高于基线方法的准确率。
    Abstract The quality of training data impacts the performance of pre-trained large language models (LMs). Given a fixed budget of tokens, we study how to best select data that leads to good downstream model performance across tasks. We develop a new framework based on a simple hypothesis: just as humans acquire interdependent skills in a deliberate order, language models also follow a natural order when learning a set of skills from their training data. If such an order exists, it can be utilized for improved understanding of LMs and for data-efficient training. Using this intuition, our framework formalizes the notion of a skill and of an ordered set of skills in terms of the associated data. First, using both synthetic and real data, we demonstrate that these ordered skill sets exist, and that their existence enables more advanced skills to be learned with less data when we train on their prerequisite skills. Second, using our proposed framework, we introduce an online data sampling algorithm, Skill-It, over mixtures of skills for both continual pre-training and fine-tuning regimes, where the objective is to efficiently learn multiple skills in the former and an individual skill in the latter. On the LEGO synthetic in the continual pre-training setting, Skill-It obtains 36.5 points higher accuracy than random sampling. On the Natural Instructions dataset in the fine-tuning setting, Skill-It reduces the validation loss on the target skill by 13.6% versus training on data associated with the target skill itself. We apply our skills framework on the recent RedPajama dataset to continually pre-train a 3B-parameter LM, achieving higher accuracy on the LM Evaluation Harness with 1B tokens than the baseline approach of sampling uniformly over data sources with 3B tokens.
    摘要 训练数据的质量会影响预训练大语言模型(LM)的性能。在给定token预算的情况下,我们研究如何选择数据,使模型在多个任务上获得良好的下游性能。我们提出一个新框架,其出发点是一个简单的假设:正如人类会按照一定顺序习得相互依赖的技能,语言模型在从训练数据中学习一组技能时也遵循一种自然顺序。若这种顺序存在,便可用于更好地理解LM并实现数据高效的训练。基于这一直觉,我们的框架用相关数据形式化地定义了"技能"以及"有序技能集合"。首先,我们利用合成数据和真实数据证明了这类有序技能集合的存在,并且证明在其前置技能上训练后,可以用更少的数据学会更高级的技能。其次,基于该框架,我们提出在线数据采样算法 Skill-It,在技能混合上进行采样,适用于持续预训练(目标是高效学习多种技能)和微调(目标是学习单一技能)两种场景。在LEGO合成数据的持续预训练设置中,Skill-It 的准确率比随机采样高36.5个百分点;在 Natural Instructions 数据集的微调设置中,Skill-It 将目标技能的验证损失比仅在目标技能数据上训练降低了13.6%。我们还将该技能框架应用于最近的 RedPajama 数据集,持续预训练一个3B参数的LM,仅用1B个token便在LM Evaluation Harness 上取得了高于在数据源上均匀采样3B个token的基线方法的准确率。

TabR: Unlocking the Power of Retrieval-Augmented Tabular Deep Learning

  • paper_url: http://arxiv.org/abs/2307.14338
  • repo_url: https://github.com/yandex-research/tabular-dl-tabr
  • paper_authors: Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev, Daniil Shlenskii, Akim Kotelnikov, Artem Babenko
  • for: This paper is focused on developing a retrieval-based approach for tabular data problems using deep learning (DL) models.
  • methods: The authors propose a simple feed-forward architecture with an attention-like retrieval component, which is incrementally augmented to improve the performance on tabular data problems. The attention mechanism is designed to retrieve relevant objects from the available training data to make better predictions.
  • results: The proposed TabR model achieves the best average performance among tabular DL models on a set of public benchmarks, becomes the new state-of-the-art on several datasets, and even outperforms GBDT models on the recently proposed “GBDT-friendly” benchmark.
    Abstract Deep learning (DL) models for tabular data problems are receiving increasingly more attention, while the algorithms based on gradient-boosted decision trees (GBDT) remain a strong go-to solution. Following the recent trends in other domains, such as natural language processing and computer vision, several retrieval-augmented tabular DL models have been recently proposed. For a given target object, a retrieval-based model retrieves other relevant objects, such as the nearest neighbors, from the available (training) data and uses their features or even labels to make a better prediction. However, we show that the existing retrieval-based tabular DL solutions provide only minor, if any, benefits over the properly tuned simple retrieval-free baselines. Thus, it remains unclear whether the retrieval-based approach is a worthy direction for tabular DL. In this work, we give a strong positive answer to this question. We start by incrementally augmenting a simple feed-forward architecture with an attention-like retrieval component similar to those of many (tabular) retrieval-based models. Then, we highlight several details of the attention mechanism that turn out to have a massive impact on the performance on tabular data problems, but that were not explored in prior work. As a result, we design TabR -- a simple retrieval-based tabular DL model which, on a set of public benchmarks, demonstrates the best average performance among tabular DL models, becomes the new state-of-the-art on several datasets, and even outperforms GBDT models on the recently proposed ``GBDT-friendly'' benchmark (see the first figure).

Waypoint-Based Imitation Learning for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2307.14326
  • repo_url: https://github.com/lucys0/awe
  • paper_authors: Lucy Xiaoyang Shi, Archit Sharma, Tony Z. Zhao, Chelsea Finn
  • for: This paper addresses behavioral cloning (BC) for robotic manipulation, in particular the compounding-error problem that arises when training robots to perform tasks from demonstrations.
  • methods: The authors propose an Automatic Waypoint Extraction (AWE) module that decomposes a demonstration into a minimal set of waypoints which, when linearly interpolated, approximate the trajectory up to a specified error threshold.
  • results: AWE increases the success rate of state-of-the-art BC algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, while reducing the decision-making horizon by up to a factor of 10.
    Abstract While imitation learning methods have seen a resurgent interest for robotic manipulation, the well-known problem of compounding errors continues to afflict behavioral cloning (BC). Waypoints can help address this problem by reducing the horizon of the learning problem for BC, and thus, the errors compounded over time. However, waypoint labeling is underspecified, and requires additional human supervision. Can we generate waypoints automatically without any additional human supervision? Our key insight is that if a trajectory segment can be approximated by linear motion, the endpoints can be used as waypoints. We propose Automatic Waypoint Extraction (AWE) for imitation learning, a preprocessing module to decompose a demonstration into a minimal set of waypoints which when interpolated linearly can approximate the trajectory up to a specified error threshold. AWE can be combined with any BC algorithm, and we find that AWE can increase the success rate of state-of-the-art algorithms by up to 25% in simulation and by 4-28% on real-world bimanual manipulation tasks, reducing the decision making horizon by up to a factor of 10. Videos and code are available at https://lucys0.github.io/awe/
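The core idea ("if a trajectory segment can be approximated by linear motion, the endpoints can be used as waypoints") lends itself to a simple recursive sketch, shown below with NumPy: a segment whose linear interpolation exceeds the error threshold is split at its worst-approximated point. The recursion and the error measure are assumptions in the spirit of the abstract, not the authors' exact procedure.

```python
import numpy as np

def extract_waypoints(traj, err_threshold):
    """Return indices of a small waypoint set such that linear interpolation
    between consecutive waypoints stays within err_threshold of the
    demonstrated trajectory (traj: T x D array of states)."""

    def max_deviation(start, end):
        # Distance of each intermediate point to the straight segment
        # between traj[start] and traj[end] (linear interpolation in time).
        if end - start < 2:
            return 0.0, start
        ts = np.linspace(0.0, 1.0, end - start + 1)[1:-1, None]
        interp = (1 - ts) * traj[start] + ts * traj[end]
        dists = np.linalg.norm(traj[start + 1:end] - interp, axis=1)
        worst = int(np.argmax(dists))
        return float(dists[worst]), start + 1 + worst

    def recurse(start, end):
        dev, split = max_deviation(start, end)
        if dev <= err_threshold:
            return [start, end]
        left = recurse(start, split)
        right = recurse(split, end)
        return left[:-1] + right          # avoid duplicating the split index

    return recurse(0, len(traj) - 1)

# Toy usage: a 2-D trajectory with one sharp corner.
t = np.linspace(0, 1, 50)[:, None]
traj = np.concatenate([t * [1.0, 0.0], [1.0, 0.0] + t * [0.0, 1.0]], axis=0)
print(extract_waypoints(traj, err_threshold=0.01))  # endpoints plus the corner
```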

Evaluating the Moral Beliefs Encoded in LLMs

  • paper_url: http://arxiv.org/abs/2307.14324
  • repo_url: https://github.com/ninodimontalcino/moralchoice
  • paper_authors: Nino Scherrer, Claudia Shi, Amir Feder, David M. Blei
  • for: This paper presents a case study on the design, administration, post-processing, and evaluation of surveys on large language models (LLMs) to understand their moral beliefs.
  • methods: The paper introduces a statistical method for eliciting beliefs encoded in LLMs, which includes statistical measures and evaluation metrics to quantify the probability of an LLM “making a choice”, the associated uncertainty, and the consistency of that choice.
  • results: The study finds that in unambiguous scenarios, most models “choose” actions that align with commonsense, while in ambiguous cases, most models express uncertainty. Additionally, some models reflect clear preferences in ambiguous scenarios, and closed-source models tend to agree with each other.
    Abstract This paper presents a case study on the design, administration, post-processing, and evaluation of surveys on large language models (LLMs). It comprises two components: (1) A statistical method for eliciting beliefs encoded in LLMs. We introduce statistical measures and evaluation metrics that quantify the probability of an LLM "making a choice", the associated uncertainty, and the consistency of that choice. (2) We apply this method to study what moral beliefs are encoded in different LLMs, especially in ambiguous cases where the right choice is not obvious. We design a large-scale survey comprising 680 high-ambiguity moral scenarios (e.g., "Should I tell a white lie?") and 687 low-ambiguity moral scenarios (e.g., "Should I stop for a pedestrian on the road?"). Each scenario includes a description, two possible actions, and auxiliary labels indicating violated rules (e.g., "do not kill"). We administer the survey to 28 open- and closed-source LLMs. We find that (a) in unambiguous scenarios, most models "choose" actions that align with commonsense. In ambiguous cases, most models express uncertainty. (b) Some models are uncertain about choosing the commonsense action because their responses are sensitive to the question-wording. (c) Some models reflect clear preferences in ambiguous scenarios. Specifically, closed-source models tend to agree with each other.
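A small sketch of the kind of statistics such a survey method computes: given repeated model answers to several paraphrases of the same scenario, estimate the probability of each action, the uncertainty of the choice, and the consistency across wordings. The entropy-based uncertainty and majority-vote consistency below are illustrative choices, not necessarily the paper's exact metrics.

```python
from collections import Counter
import math

def elicit_choice_stats(answers_by_wording):
    """answers_by_wording: {wording_id: [action label per sampled answer]}.
    Returns choice probabilities, an entropy-based uncertainty in [0, 1],
    and the fraction of wordings whose majority vote matches the overall one."""
    all_answers = [a for ans in answers_by_wording.values() for a in ans]
    counts = Counter(all_answers)
    total = len(all_answers)
    probs = {a: c / total for a, c in counts.items()}

    # Normalized entropy: 0 = fully decided, 1 = maximally uncertain.
    k = len(probs)
    entropy = -sum(p * math.log(p) for p in probs.values() if p > 0)
    uncertainty = entropy / math.log(k) if k > 1 else 0.0

    # Consistency: do the per-wording majority votes agree with the overall one?
    overall = max(probs, key=probs.get)
    votes = [Counter(ans).most_common(1)[0][0] for ans in answers_by_wording.values()]
    consistency = sum(v == overall for v in votes) / len(votes)
    return probs, uncertainty, consistency

# Toy usage: three paraphrases of "Should I tell a white lie?", 4 samples each.
answers = {
    "w1": ["lie", "lie", "truth", "lie"],
    "w2": ["lie", "truth", "lie", "lie"],
    "w3": ["truth", "truth", "lie", "truth"],
}
print(elicit_choice_stats(answers))
```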

Reinforcement Learning by Guided Safe Exploration

  • paper_url: http://arxiv.org/abs/2307.14316
  • repo_url: None
  • paper_authors: Qisong Yang, Thiago D. Simão, Nils Jansen, Simon H. Tindemans, Matthijs T. J. Spaan
  • for: trains an RL agent to adapt quickly to a target task in a constrained and unsafe environment
  • methods: uses a guide agent to explore safely without a reward signal, and regularizes a target policy towards the guide using transfer learning
  • results: achieves safe transfer learning and helps the target policy solve the task faster
    Abstract Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once the reward is revealed. We consider the constrained reward-free setting, where an agent (the guide) learns to explore safely without the reward signal. This agent is trained in a controlled environment, which allows unsafe interactions and still provides the safety signal. After the target task is revealed, safety violations are not allowed anymore. Thus, the guide is leveraged to compose a safe behaviour policy. Drawing from transfer learning, we also regularize a target policy (the student) towards the guide while the student is unreliable and gradually eliminate the influence of the guide as training progresses. The empirical analysis shows that this method can achieve safe transfer learning and helps the student solve the target task faster.
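A sketch of the student-policy objective the abstract describes: the task loss is regularized toward the guide policy by a KL term whose weight decays as training progresses, so the guide's influence is gradually removed. The loss form and the linear decay schedule are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def student_loss(student_logits, guide_logits, advantages, actions,
                 step, total_steps, beta0=1.0):
    """Policy-gradient-style loss with a decaying KL pull toward the guide.

    student_logits, guide_logits : (B, num_actions) action logits
    advantages                   : (B,) advantage estimates for taken actions
    actions                      : (B,) indices of taken actions
    """
    log_probs = F.log_softmax(student_logits, dim=-1)
    pg_loss = -(advantages * log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)).mean()

    # KL(student || guide); the guide is fixed, so its logits are detached.
    guide_log_probs = F.log_softmax(guide_logits.detach(), dim=-1)
    kl = (log_probs.exp() * (log_probs - guide_log_probs)).sum(dim=-1).mean()

    beta = beta0 * max(0.0, 1.0 - step / total_steps)   # fade out the guide
    return pg_loss + beta * kl

# Toy usage with random tensors (shapes only; not a full training loop).
s = torch.randn(8, 4, requires_grad=True)
g = torch.randn(8, 4)
adv = torch.randn(8)
acts = torch.randint(0, 4, (8,))
loss = student_loss(s, g, adv, acts, step=100, total_steps=1000)
loss.backward()
```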

Unsupervised Deep Learning-based Pansharpening with Jointly-Enhanced Spectral and Spatial Fidelity

  • paper_url: http://arxiv.org/abs/2307.14403
  • repo_url: https://github.com/matciotola/lambda-pnn
  • paper_authors: Matteo Ciotola, Giovanni Poggi, Giuseppe Scarpa
  • for: This paper proposes a deep learning-based pansharpening model trained in the full-resolution domain, aimed at improving performance on high-resolution target images compared with models trained on reduced-resolution data.
  • methods: The model introduces residual attention modules not used in prior work, a loss function that jointly optimizes the spectral and spatial quality of the pansharpened output, and a new fine-tuning strategy that improves inference-time adaptation to target images.
  • results: Experiments on a large variety of test images in challenging scenarios show that the proposed method compares favorably with the state of the art in both numerical results and visual quality. Code is available at https://github.com/matciotola/Lambda-PNN.
    Abstract In latest years, deep learning has gained a leading role in the pansharpening of multiresolution images. Given the lack of ground truth data, most deep learning-based methods carry out supervised training in a reduced-resolution domain. However, models trained on downsized images tend to perform poorly on high-resolution target images. For this reason, several research groups are now turning to unsupervised training in the full-resolution domain, through the definition of appropriate loss functions and training paradigms. In this context, we have recently proposed a full-resolution training framework which can be applied to many existing architectures. Here, we propose a new deep learning-based pansharpening model that fully exploits the potential of this approach and provides cutting-edge performance. Besides architectural improvements with respect to previous work, such as the use of residual attention modules, the proposed model features a novel loss function that jointly promotes the spectral and spatial quality of the pansharpened data. In addition, thanks to a new fine-tuning strategy, it improves inference-time adaptation to target images. Experiments on a large variety of test images, performed in challenging scenarios, demonstrate that the proposed method compares favorably with the state of the art both in terms of numerical results and visual output. Code is available online at https://github.com/matciotola/Lambda-PNN.
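A toy sketch of a full-resolution loss that jointly promotes spectral and spatial fidelity, in the spirit of the abstract: a spectral term compares the downsampled pansharpened image with the original multispectral bands, and a spatial term encourages correlation between each pansharpened band and the panchromatic image. The specific operators and weighting are assumptions, not the Lambda-PNN loss.

```python
import torch
import torch.nn.functional as F

def full_resolution_loss(pansharpened, ms, pan, scale=4, alpha=1.0, beta=0.1):
    """pansharpened: (B, C, H, W); ms: (B, C, H/scale, W/scale); pan: (B, 1, H, W)."""
    # Spectral consistency: the pansharpened image, brought back to the
    # multispectral resolution, should match the original MS bands.
    down = F.avg_pool2d(pansharpened, kernel_size=scale)
    spectral = F.l1_loss(down, ms)

    # Spatial consistency: each pansharpened band should be correlated
    # with the panchromatic image (1 - mean per-band correlation).
    def zscore(x):
        return (x - x.mean(dim=(2, 3), keepdim=True)) / (x.std(dim=(2, 3), keepdim=True) + 1e-6)
    corr = (zscore(pansharpened) * zscore(pan)).mean(dim=(2, 3))   # (B, C)
    spatial = (1.0 - corr).mean()

    return alpha * spectral + beta * spatial

# Toy usage with random tensors.
ps = torch.rand(2, 4, 64, 64, requires_grad=True)
ms = torch.rand(2, 4, 16, 16)
pan = torch.rand(2, 1, 64, 64)
full_resolution_loss(ps, ms, pan).backward()
```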

A Constraint Enforcement Deep Reinforcement Learning Framework for Optimal Energy Storage Systems Dispatch

  • paper_url: http://arxiv.org/abs/2307.14304
  • repo_url: https://github.com/ShengrenHou/Energy-management-MIP-Deep-Reinforcement-Learning
  • paper_authors: Shengren Hou, Edgar Mauricio Salazar Duque, Peter Palensky, Pedro P. Vergara
  • for: optimize the dispatch of energy storage systems (ESSs) in the presence of uncertainty
  • methods: deep reinforcement learning (DRL) algorithms with mixed-integer programming (MIP) formulation to enforce operational constraints
  • results: performance superior to state-of-the-art DRL algorithms and close to the optimal solution obtained with a perfect forecast of the stochastic variables, while strictly enforcing all operational constraints and delivering high-quality dispatch decisions.
    Abstract The optimal dispatch of energy storage systems (ESSs) presents formidable challenges due to the uncertainty introduced by fluctuations in dynamic prices, demand consumption, and renewable-based energy generation. By exploiting the generalization capabilities of deep neural networks (DNNs), deep reinforcement learning (DRL) algorithms can learn good-quality control models that adaptively respond to distribution networks' stochastic nature. However, current DRL algorithms lack the capabilities to enforce operational constraints strictly, often even providing unfeasible control actions. To address this issue, we propose a DRL framework that effectively handles continuous action spaces while strictly enforcing the environments and action space operational constraints during online operation. Firstly, the proposed framework trains an action-value function modeled using DNNs. Subsequently, this action-value function is formulated as a mixed-integer programming (MIP) formulation enabling the consideration of the environment's operational constraints. Comprehensive numerical simulations show the superior performance of the proposed MIP-DRL framework, effectively enforcing all constraints while delivering high-quality dispatch decisions when compared with state-of-the-art DRL algorithms and the optimal solution obtained with a perfect forecast of the stochastic variables.
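The key idea is that the trained action-value network is wrapped in an optimization that only returns actions satisfying the operational constraints. The paper formulates this as a mixed-integer program over the DNN; the sketch below illustrates the same principle with a much simpler discretized search, where candidate charge/discharge powers are filtered by state-of-charge and power limits before the highest-Q feasible action is selected. Names, limits, and the discretization are illustrative assumptions.

```python
import numpy as np

def constrained_dispatch(q_values_fn, state, soc, *, capacity=10.0, p_max=2.0,
                         soc_min=0.1, soc_max=0.9, dt=1.0, n_candidates=41):
    """Pick the feasible charge/discharge power with the highest Q-value.

    q_values_fn : callable mapping (state, action) -> scalar Q estimate
    soc         : current state of charge as a fraction of capacity
    Positive actions charge the battery, negative actions discharge it.
    """
    candidates = np.linspace(-p_max, p_max, n_candidates)
    feasible = []
    for p in candidates:
        next_soc = soc + p * dt / capacity
        if soc_min <= next_soc <= soc_max:        # enforce SoC limits strictly
            feasible.append(p)
    if not feasible:                              # fall back to the idle action
        return 0.0
    return max(feasible, key=lambda p: q_values_fn(state, p))

# Toy usage with a made-up Q function that prefers discharging at high prices.
q_fn = lambda s, a: -a * s["price"] - 0.01 * a**2
action = constrained_dispatch(q_fn, {"price": 0.3}, soc=0.15)
print(action)   # discharge is capped so that SoC never drops below soc_min
```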

ChatGPT and Persuasive Technologies for the Management and Delivery of Personalized Recommendations in Hotel Hospitality

  • paper_url: http://arxiv.org/abs/2307.14298
  • repo_url: None
  • paper_authors: Manolis Remountakis, Konstantinos Kotis, Babis Kourtzis, George E. Tsekouras
  • for: This paper explores integrating ChatGPT and persuasive technologies into hotel hospitality recommender systems to improve personalized guest experiences and hotel business performance.
  • methods: The paper first reviews ChatGPT's ability to understand and generate human-like text for more accurate, context-aware recommendations, then describes its integration into recommender systems: analyzing user preferences, extracting valuable insights from online reviews, and generating personalized recommendations from guest profiles. It also examines persuasive techniques such as social proof, scarcity, and personalization for influencing user decision-making.
  • results: A pilot experiment with a hotel recommender system evaluates the impact of ChatGPT and persuasive techniques on user engagement, satisfaction, and conversion rates; preliminary results indicate the potential of these technologies to improve the overall guest experience and business performance.
    Abstract Recommender systems have become indispensable tools in the hotel hospitality industry, enabling personalized and tailored experiences for guests. Recent advancements in large language models (LLMs), such as ChatGPT, and persuasive technologies, have opened new avenues for enhancing the effectiveness of those systems. This paper explores the potential of integrating ChatGPT and persuasive technologies for automating and improving hotel hospitality recommender systems. First, we delve into the capabilities of ChatGPT, which can understand and generate human-like text, enabling more accurate and context-aware recommendations. We discuss the integration of ChatGPT into recommender systems, highlighting the ability to analyze user preferences, extract valuable insights from online reviews, and generate personalized recommendations based on guest profiles. Second, we investigate the role of persuasive technology in influencing user behavior and enhancing the persuasive impact of hotel recommendations. By incorporating persuasive techniques, such as social proof, scarcity and personalization, recommender systems can effectively influence user decision-making and encourage desired actions, such as booking a specific hotel or upgrading their room. To investigate the efficacy of ChatGPT and persuasive technologies, we present a pilot experiment with a case study involving a hotel recommender system. We aim to study the impact of integrating ChatGPT and persuasive techniques on user engagement, satisfaction, and conversion rates. The preliminary results demonstrate the potential of these technologies in enhancing the overall guest experience and business performance. Overall, this paper contributes to the field of hotel hospitality by exploring the synergistic relationship between LLMs and persuasive technology in recommender systems, ultimately influencing guest satisfaction and hotel revenue.
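For illustration only, a sketch of how guest-profile data and persuasive framings (social proof, scarcity) might be combined into a prompt for an LLM-backed recommender. `call_llm` is a placeholder for whatever chat-completion client is used; the prompt wording and profile fields are invented for this example and are not taken from the paper.

```python
def build_recommendation_prompt(guest_profile, candidate_hotels, reviews_summary):
    """Compose an LLM prompt asking for a personalized, persuasive
    recommendation; all field names here are hypothetical."""
    hotels = "\n".join(
        f"- {h['name']}: {h['highlights']} (rooms left: {h['rooms_left']})"
        for h in candidate_hotels
    )
    return (
        "You are a hotel concierge assistant.\n"
        f"Guest preferences: {guest_profile}\n"
        f"What past guests said: {reviews_summary}\n"
        f"Candidate hotels:\n{hotels}\n"
        "Recommend one hotel. Mention what similar guests liked (social proof) "
        "and note limited availability (scarcity) only if it is true."
    )

def call_llm(prompt):
    # Placeholder: plug in the chat-completion client of your choice here.
    raise NotImplementedError

prompt = build_recommendation_prompt(
    guest_profile={"budget": "mid", "interests": ["spa", "sea view"]},
    candidate_hotels=[{"name": "Hotel A", "highlights": "spa, rooftop pool", "rooms_left": 3}],
    reviews_summary="Couples praised the spa and the quiet rooms.",
)
# recommendation = call_llm(prompt)
```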

Unraveling the Complexity of Splitting Sequential Data: Tackling Challenges in Video and Time Series Analysis

  • paper_url: http://arxiv.org/abs/2307.14294
  • repo_url: None
  • paper_authors: Diego Botache, Kristina Dingel, Rico Huhnstock, Arno Ehresmann, Bernhard Sick
  • for: This concept paper examines the challenges of splitting sequential data, such as videos and time series, which is an essential step in analysis tasks like object tracking and anomaly detection.
  • methods: The paper discusses the challenges along several dimensions: data acquisition, data representation, split-ratio selection, setting up quality criteria, and choosing suitable selection strategies.
  • results: The challenges are illustrated through two real-world examples: motor test benches and particle tracking in liquids.
    Abstract Splitting of sequential data, such as videos and time series, is an essential step in various data analysis tasks, including object tracking and anomaly detection. However, splitting sequential data presents a variety of challenges that can impact the accuracy and reliability of subsequent analyses. This concept article examines the challenges associated with splitting sequential data, including data acquisition, data representation, split ratio selection, setting up quality criteria, and choosing suitable selection strategies. We explore these challenges through two real-world examples: motor test benches and particle tracking in liquids.
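As a concrete illustration of one of the listed challenges (split-ratio selection and leakage between adjacent segments), the sketch below performs a simple temporal train/test split of a time series with a gap between the two parts so that overlapping windows around the boundary cannot leak test information into training. The gap size and ratio are illustrative choices, not recommendations from the paper.

```python
import numpy as np

def temporal_split(series, train_ratio=0.7, gap=10):
    """Split a time series into train/test without shuffling, leaving a gap
    of `gap` samples between the two parts to avoid boundary leakage."""
    n = len(series)
    train_end = int(n * train_ratio)
    test_start = min(n, train_end + gap)
    return series[:train_end], series[test_start:]

# Toy usage.
x = np.arange(100)
train, test = temporal_split(x, train_ratio=0.7, gap=10)
print(len(train), len(test))   # 70 and 20: 10 samples are sacrificed to the gap
```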

General Purpose Artificial Intelligence Systems (GPAIS): Properties, Definition, Taxonomy, Open Challenges and Implications

  • paper_url: http://arxiv.org/abs/2307.14283
  • repo_url: None
  • paper_authors: Isaac Triguero, Daniel Molina, Javier Poyatos, Javier Del Ser, Francisco Herrera
  • for: This paper focuses on defining and categorizing General-Purpose Artificial Intelligence Systems (GPAIS), so that research on general-purpose tasks across different areas can be connected.
  • methods: The authors propose a new definition of GPAIS and a taxonomy that distinguishes closed-world from open-world GPAIS according to their degree of autonomy and ability, and they survey approaches to realizing GPAIS, such as using AI techniques to improve another AI and foundation models, with generative AI as a prime example.
  • results: The paper delivers the proposed definition and taxonomy, assesses existing systems against them, and discusses the current state, open challenges, societal implications, and the need for responsible and trustworthy AI systems and regulation.
    Abstract Most applications of Artificial Intelligence (AI) are designed for a confined and specific task. However, there are many scenarios that call for a more general AI, capable of solving a wide array of tasks without being specifically designed for them. The term General-Purpose Artificial Intelligence Systems (GPAIS) has been defined to refer to these AI systems. To date, the possibility of an Artificial General Intelligence, powerful enough to perform any intellectual task as if it were human, or even improve it, has remained an aspiration, fiction, and considered a risk for our society. Whilst we might still be far from achieving that, GPAIS is a reality and sitting at the forefront of AI research. This work discusses existing definitions for GPAIS and proposes a new definition that allows for a gradual differentiation among types of GPAIS according to their properties and limitations. We distinguish between closed-world and open-world GPAIS, characterising their degree of autonomy and ability based on several factors such as adaptation to new tasks, competence in domains not intentionally trained for, ability to learn from few data, or proactive acknowledgment of their own limitations. We then propose a taxonomy of approaches to realise GPAIS, describing research trends such as the use of AI techniques to improve another AI or foundation models. As a prime example, we delve into generative AI, aligning them with the terms and concepts presented in the taxonomy. Through the proposed definition and taxonomy, our aim is to facilitate research collaboration across different areas that are tackling general-purpose tasks, as they share many common aspects. Finally, we discuss the current state of GPAIS, its challenges and prospects, implications for our society, and the need for responsible and trustworthy AI systems and regulation, with the goal of providing a holistic view of GPAIS.

Deepfake Image Generation for Improved Brain Tumor Segmentation

  • paper_url: http://arxiv.org/abs/2307.14273
  • repo_url: None
  • paper_authors: Roa’a Al-Emaryeen, Sara Al-Nahhas, Fatima Himour, Waleed Mahafza, Omar Al-Kadi
  • for: This work investigates whether deepfake image generation can improve brain tumor segmentation accuracy, particularly when training data and labels are scarce.
  • methods: A Generative Adversarial Network is used for image-to-image translation to enlarge the dataset, followed by segmentation with a U-Net-based convolutional neural network trained on the deepfake images.
  • results: Results compared against the ground truth of four publicly available datasets show improved segmentation quality metrics, suggesting the approach can assist when training with limited data.
    Abstract As the world progresses in technology and health, awareness of disease by revealing asymptomatic signs improves. It is important to detect and treat tumors in early stage as it can be life-threatening. Computer-aided technologies are used to overcome lingering limitations facing disease diagnosis, while brain tumor segmentation remains a difficult process, especially when multi-modality data is involved. This is mainly attributed to ineffective training due to lack of data and corresponding labelling. This work investigates the feasibility of employing deep-fake image generation for effective brain tumor segmentation. To this end, a Generative Adversarial Network was used for image-to-image translation for increasing dataset size, followed by image segmentation using a U-Net-based convolutional neural network trained with deepfake images. Performance of the proposed approach is compared with ground truth of four publicly available datasets. Results show improved performance in terms of image segmentation quality metrics, and could potentially assist when training with limited data.
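A minimal PyTorch-style sketch of the augmentation step described above: images produced by a pre-trained image-to-image GAN are concatenated with the real training set before fitting a U-Net segmenter. The generator, dataset shapes, and the downstream U-Net trainer are assumptions used only to show how the synthetic data would enter training.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

def build_augmented_loader(real_images, real_masks, generator, n_synthetic=64,
                           batch_size=8):
    """Mix real (image, mask) pairs with GAN-translated copies of them.

    generator : a trained image-to-image model mapping an image to a synthetic
                variant (assumed to exist); the segmentation mask is kept as-is.
    """
    with torch.no_grad():
        idx = torch.randint(0, len(real_images), (n_synthetic,))
        fake_images = generator(real_images[idx])        # translated style/modality
        fake_masks = real_masks[idx]                     # labels carry over unchanged
    real_ds = TensorDataset(real_images, real_masks)
    fake_ds = TensorDataset(fake_images, fake_masks)
    return DataLoader(ConcatDataset([real_ds, fake_ds]), batch_size=batch_size,
                      shuffle=True)

# Toy usage with an identity "generator" standing in for the trained GAN.
images = torch.rand(32, 1, 64, 64)
masks = (torch.rand(32, 1, 64, 64) > 0.5).float()
loader = build_augmented_loader(images, masks, generator=lambda x: x)
# for img, mask in loader: unet_training_step(img, mask)   # hypothetical trainer
```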