cs.LG - 2023-07-06

Learning Disentangled Representations in Signed Directed Graphs without Social Assumptions

  • paper_url: http://arxiv.org/abs/2307.03077
  • repo_url: https://github.com/geonwooko/dines
  • paper_authors: Geonwoo Ko, Jinhong Jung
  • for: Node representation learning in signed directed graphs.
  • methods: Proposes DINES, which adopts a disentangled framework that separates each embedding into distinct factors to capture multiple latent factors, and uses lightweight graph convolutions that focus solely on sign and direction, without relying on social theories.
  • results: Extensive experiments on real-world signed directed graphs show that DINES learns effective disentangled representations, accurately predicts edge signs, and significantly outperforms its competitors.
    Abstract Signed graphs are complex systems that represent trust relationships or preferences in various domains. Learning node representations in such graphs is crucial for many mining tasks. Although real-world signed relationships can be influenced by multiple latent factors, most existing methods often oversimplify the modeling of signed relationships by relying on social theories and treating them as simplistic factors. This limits their expressiveness and their ability to capture the diverse factors that shape these relationships. In this paper, we propose DINES, a novel method for learning disentangled node representations in signed directed graphs without social assumptions. We adopt a disentangled framework that separates each embedding into distinct factors, allowing for capturing multiple latent factors. We also explore lightweight graph convolutions that focus solely on sign and direction, without depending on social theories. Additionally, we propose a decoder that effectively classifies an edge's sign by considering correlations between the factors. To further enhance disentanglement, we jointly train a self-supervised factor discriminator with our encoder and decoder. Throughout extensive experiments on real-world signed directed graphs, we show that DINES effectively learns disentangled node representations, and significantly outperforms its competitors in the sign prediction task.
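    The disentangled, sign- and direction-aware aggregation described above can be illustrated with a minimal sketch: embeddings are split into K factors, and each factor is updated by aggregating neighbour factors separately over the four (sign, direction) edge types, with no social-theory-based weighting. The dense adjacency matrices, the concatenation-plus-linear fusion, and all dimensions below are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class SignDirectionFactorConv(nn.Module):
    """Illustrative disentangled convolution: K factors, aggregated per (sign, direction)."""
    def __init__(self, num_factors: int, factor_dim: int):
        super().__init__()
        self.K, self.d = num_factors, factor_dim
        # One fusion layer per factor: [self | pos_out | pos_in | neg_out | neg_in] -> factor_dim
        self.fuse = nn.ModuleList(
            nn.Linear(5 * factor_dim, factor_dim) for _ in range(num_factors)
        )

    def forward(self, z, adj_pos, adj_neg):
        # z: (N, K, d) factorized node embeddings
        # adj_pos, adj_neg: (N, N) dense adjacency of positive / negative directed edges
        outs = []
        for k in range(self.K):
            zk = z[:, k, :]                      # (N, d)
            pos_out = adj_pos @ zk               # aggregate over outgoing positive edges
            pos_in = adj_pos.t() @ zk            # incoming positive edges
            neg_out = adj_neg @ zk
            neg_in = adj_neg.t() @ zk
            h = torch.cat([zk, pos_out, pos_in, neg_out, neg_in], dim=-1)
            outs.append(torch.tanh(self.fuse[k](h)))
        return torch.stack(outs, dim=1)          # (N, K, d)

# toy usage
N, K, d = 6, 4, 8
z = torch.randn(N, K, d)
adj_pos = (torch.rand(N, N) > 0.7).float()
adj_neg = (torch.rand(N, N) > 0.8).float()
layer = SignDirectionFactorConv(K, d)
print(layer(z, adj_pos, adj_neg).shape)  # torch.Size([6, 4, 8])
```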

A Hybrid End-to-End Spatio-Temporal Attention Neural Network with Graph-Smooth Signals for EEG Emotion Recognition

  • paper_url: http://arxiv.org/abs/2307.03068
  • repo_url: None
  • paper_authors: Shadi Sartipi, Mastaneh Torkamani-Azar, Mujdat Cetin
  • for: Designing an automated model, based on deep neural networks, that assesses emotional states from EEG signals.
  • methods: Uses a hybrid architecture of spatio-temporal encoding and recurrent attention network blocks, together with a preprocessing step that applies graph signal processing tools to perform graph smoothing in the spatial domain.
  • results: The proposed model exceeds state-of-the-art emotion classification results on the DEAP dataset, and cross-modality transfer learning (TL) to the DREAMER and EEWD datasets shows that the learned model generalizes.
    Abstract Recently, physiological data such as electroencephalography (EEG) signals have attracted significant attention in affective computing. In this context, the main goal is to design an automated model that can assess emotional states. Lately, deep neural networks have shown promising performance in emotion recognition tasks. However, designing a deep architecture that can extract practical information from raw data is still a challenge. Here, we introduce a deep neural network that acquires interpretable physiological representations by a hybrid structure of spatio-temporal encoding and recurrent attention network blocks. Furthermore, a preprocessing step is applied to the raw data using graph signal processing tools to perform graph smoothing in the spatial domain. We demonstrate that our proposed architecture exceeds state-of-the-art results for emotion classification on the publicly available DEAP dataset. To explore the generality of the learned model, we also evaluate the performance of our architecture towards transfer learning (TL) by transferring the model parameters from a specific source to other target domains. Using DEAP as the source dataset, we demonstrate the effectiveness of our model in performing cross-modality TL and improving emotion classification accuracy on DREAMER and the Emotional English Word (EEWD) datasets, which involve EEG-based emotion classification tasks with different stimuli.
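    The graph-smoothing preprocessing can be sketched with a standard graph-signal-processing operation: Tikhonov (Laplacian) smoothing of the EEG channels over an electrode adjacency graph. The adjacency construction and the smoothing strength alpha below are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def graph_smooth(eeg, adjacency, alpha=0.5):
    """Tikhonov graph smoothing in the spatial (channel) domain.

    eeg:       (channels, samples) raw EEG
    adjacency: (channels, channels) symmetric electrode adjacency weights
    alpha:     smoothing strength; larger -> smoother signals across the graph
    Solves X_smooth = (I + alpha * L)^{-1} X, with L the combinatorial Laplacian.
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    n = adjacency.shape[0]
    return np.linalg.solve(np.eye(n) + alpha * laplacian, eeg)

# toy example: 4 channels on a ring graph, 1 second at 128 Hz
rng = np.random.default_rng(0)
adjacency = np.array([[0, 1, 0, 1],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [1, 0, 1, 0]], dtype=float)
eeg = rng.standard_normal((4, 128))
smoothed = graph_smooth(eeg, adjacency, alpha=0.8)
print(smoothed.shape)  # (4, 128)
```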

DeepOnto: A Python Package for Ontology Engineering with Deep Learning

  • paper_url: http://arxiv.org/abs/2307.03067
  • repo_url: https://github.com/KRR-Oxford/DeepOnto
  • paper_authors: Yuan He, Jiaoyan Chen, Hang Dong, Ian Horrocks, Carlo Allocca, Taehun Kim, Brahmananda Sapkota
  • for: Integrating and applying deep learning techniques, particularly language models (LMs), in ontology engineering.
  • methods: Bridges Python deep learning frameworks such as PyTorch and Tensorflow with widely used, Java-based ontology APIs such as the OWL API and Jena.
  • results: Presents DeepOnto, a Python package that supports ontology engineering tasks such as ontology alignment and completion using deep learning methods, primarily pre-trained LMs; two use cases are reported: Digital Health Coaching at Samsung Research UK and the Bio-ML track of the OAEI.
    Abstract Applying deep learning techniques, particularly language models (LMs), in ontology engineering has raised widespread attention. However, deep learning frameworks like PyTorch and Tensorflow are predominantly developed for Python programming, while widely-used ontology APIs, such as the OWL API and Jena, are primarily Java-based. To facilitate seamless integration of these frameworks and APIs, we present Deeponto, a Python package designed for ontology engineering. The package encompasses a core ontology processing module founded on the widely-recognised and reliable OWL API, encapsulating its fundamental features in a more "Pythonic" manner and extending its capabilities to include other essential components including reasoning, verbalisation, normalisation, projection, and more. Building on this module, Deeponto offers a suite of tools, resources, and algorithms that support various ontology engineering tasks, such as ontology alignment and completion, by harnessing deep learning methodologies, primarily pre-trained LMs. In this paper, we also demonstrate the practical utility of Deeponto through two use-cases: the Digital Health Coaching in Samsung Research UK and the Bio-ML track of the Ontology Alignment Evaluation Initiative (OAEI).

Generalizing Backpropagation for Gradient-Based Interpretability

  • paper_url: http://arxiv.org/abs/2307.03056
  • repo_url: https://github.com/kdu4108/semiring-backprop-exps
  • paper_authors: Kevin Du, Lucas Torroba Hennigen, Niklas Stoehr, Alexander Warstadt, Ryan Cotterell
  • for: Improving the interpretability of deep neural networks whose explanations are based on gradients of the model's output with respect to its inputs.
  • methods: Uses semirings to generalize backpropagation, so that other interpretable statistics of a network's gradient graph, such as the highest-weighted path and entropy, can be computed efficiently.
  • results: Experiments on synthetic data and on BERT show that (a) the amount of gradient flow through a model component reflects its importance to a prediction, and (b) for the subject-verb agreement (SVA) task, specific pathways of the self-attention mechanism are most important to the model's prediction.
    Abstract Many popular feature-attribution methods for interpreting deep neural networks rely on computing the gradients of a model's output with respect to its inputs. While these methods can indicate which input features may be important for the model's prediction, they reveal little about the inner workings of the model itself. In this paper, we observe that the gradient computation of a model is a special case of a more general formulation using semirings. This observation allows us to generalize the backpropagation algorithm to efficiently compute other interpretable statistics about the gradient graph of a neural network, such as the highest-weighted path and entropy. We implement this generalized algorithm, evaluate it on synthetic datasets to better understand the statistics it computes, and apply it to study BERT's behavior on the subject-verb number agreement task (SVA). With this method, we (a) validate that the amount of gradient flow through a component of a model reflects its importance to a prediction and (b) for SVA, identify which pathways of the self-attention mechanism are most important.
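    The semiring view can be made concrete on a toy computation graph: ordinary backpropagation sums products of local gradients over all paths (the (+, ×) semiring), while swapping in the (max, ×) semiring recovers the highest-weighted gradient path. The sketch below works on absolute edge weights of a hand-specified DAG and only illustrates the idea, not the paper's implementation.

```python
# Generalized "backprop" over a DAG: accumulate path weights under a chosen semiring.
# edges[(u, v)] = |local gradient dv/du| for the toy graph input -> h1/h2 -> output.
edges = {("x", "h1"): 2.0, ("x", "h2"): 0.5,
         ("h1", "y"): 0.1, ("h2", "y"): 3.0}
topo_order = ["x", "h1", "h2", "y"]

def path_aggregate(source, target, plus, times, zero, one):
    """Aggregate (under plus) the times-product of edge weights over all source->target paths."""
    value = {node: zero for node in topo_order}
    value[source] = one
    for u in topo_order:
        for (a, b), w in edges.items():
            if a == u:
                value[b] = plus(value[b], times(value[u], w))
    return value[target]

# (+, *) semiring: total gradient magnitude flowing from x to y
grad = path_aggregate("x", "y", plus=lambda a, b: a + b, times=lambda a, b: a * b,
                      zero=0.0, one=1.0)
# (max, *) semiring: weight of the single highest-weighted path
best_path = path_aggregate("x", "y", plus=max, times=lambda a, b: a * b,
                           zero=0.0, one=1.0)
print(grad)       # 2.0*0.1 + 0.5*3.0 = 1.7
print(best_path)  # max(0.2, 1.5) = 1.5
```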

Origin-Destination Travel Time Oracle for Map-based Services

  • paper_url: http://arxiv.org/abs/2307.03048
  • repo_url: None
  • paper_authors: Yan Lin, Huaiyu Wan, Jilin Hu, Shengnan Guo, Bin Yang, Youfang Lin, Christian S. Jensen
  • for: Building an Origin-Destination (OD) travel time oracle by estimating time-varying travel times from historical trajectories.
  • methods: A two-stage framework consisting of a conditioned Pixelated Trajectories (PiT) denoiser and a Masked Vision Transformer (MViT): the denoiser learns correlations between OD pairs and historical trajectories to build a diffusion-based PiT inference process, and the MViT then estimates travel time effectively and efficiently from the inferred PiT.
  • results: Experiments on two real-world datasets show that DOT outperforms baseline methods in terms of accuracy, scalability, and explainability.
    Abstract Given an origin (O), a destination (D), and a departure time (T), an Origin-Destination (OD) travel time oracle~(ODT-Oracle) returns an estimate of the time it takes to travel from O to D when departing at T. ODT-Oracles serve important purposes in map-based services. To enable the construction of such oracles, we provide a travel-time estimation (TTE) solution that leverages historical trajectories to estimate time-varying travel times for OD pairs. The problem is complicated by the fact that multiple historical trajectories with different travel times may connect an OD pair, while trajectories may vary from one another. To solve the problem, it is crucial to remove outlier trajectories when doing travel time estimation for future queries. We propose a novel, two-stage framework called Diffusion-based Origin-destination Travel Time Estimation (DOT), that solves the problem. First, DOT employs a conditioned Pixelated Trajectories (PiT) denoiser that enables building a diffusion-based PiT inference process by learning correlations between OD pairs and historical trajectories. Specifically, given an OD pair and a departure time, we aim to infer a PiT. Next, DOT encompasses a Masked Vision Transformer~(MViT) that effectively and efficiently estimates a travel time based on the inferred PiT. We report on extensive experiments on two real-world datasets that offer evidence that DOT is capable of outperforming baseline methods in terms of accuracy, scalability, and explainability.

Track Mix Generation on Music Streaming Services using Transformers

  • paper_url: http://arxiv.org/abs/2307.03045
  • repo_url: None
  • paper_authors: Walid Bendada, Théo Bontempelli, Mathieu Morlon, Benjamin Chapus, Thibault Cador, Thomas Bouabça, Guillaume Salha-Galvan
  • for: Describes Track Mix, a personalized playlist generation system released in 2022 on the music streaming service Deezer, which automatically generates "mix" playlists inspired by an initial track so that users can discover music similar to their favorite content.
  • methods: Mixes are generated by a Transformer model trained on millions of track sequences from user playlists; the paper analyzes the advantages, drawbacks, and technical challenges of using such a model for mix generation compared to a more traditional collaborative filtering approach.
  • results: Since its release, Track Mix has generated daily playlists for millions of Deezer users, enhancing their music discovery experience on the platform.
    Abstract This paper introduces Track Mix, a personalized playlist generation system released in 2022 on the music streaming service Deezer. Track Mix automatically generates "mix" playlists inspired by initial music tracks, allowing users to discover music similar to their favorite content. To generate these mixes, we consider a Transformer model trained on millions of track sequences from user playlists. In light of the growing popularity of Transformers in recent years, we analyze the advantages, drawbacks, and technical challenges of using such a model for mix generation on the service, compared to a more traditional collaborative filtering approach. Since its release, Track Mix has been generating playlists for millions of users daily, enhancing their music discovery experience on Deezer.

A Near-Linear Time Algorithm for the Chamfer Distance

  • paper_url: http://arxiv.org/abs/2307.03043
  • repo_url: None
  • paper_authors: Ainesh Bakshi, Piotr Indyk, Rajesh Jayaram, Sandeep Silwal, Erik Waingarten
  • for: Computing the Chamfer distance between point sets $A,B \subset \mathbb{R}^d$.
  • methods: The naive baseline is a simple $O(d n^2)$-time brute-force algorithm; the paper develops a near-linear-time approximation.
  • results: Presents the first $(1+\epsilon)$-approximate algorithm, running in $O(nd \log (n)/\varepsilon^2)$ time and implementable in practice; experiments show it is both accurate and fast on large high-dimensional datasets.
    Abstract For any two point sets $A,B \subset \mathbb{R}^d$ of size up to $n$, the Chamfer distance from $A$ to $B$ is defined as $\text{CH}(A,B)=\sum_{a \in A} \min_{b \in B} d_X(a,b)$, where $d_X$ is the underlying distance measure (e.g., the Euclidean or Manhattan distance). The Chamfer distance is a popular measure of dissimilarity between point clouds, used in many machine learning, computer vision, and graphics applications, and admits a straightforward $O(d n^2)$-time brute force algorithm. Further, the Chamfer distance is often used as a proxy for the more computationally demanding Earth-Mover (Optimal Transport) Distance. However, the \emph{quadratic} dependence on $n$ in the running time makes the naive approach intractable for large datasets. We overcome this bottleneck and present the first $(1+\epsilon)$-approximate algorithm for estimating the Chamfer distance with a near-linear running time. Specifically, our algorithm runs in time $O(nd \log (n)/\varepsilon^2)$ and is implementable. Our experiments demonstrate that it is both accurate and fast on large high-dimensional datasets. We believe that our algorithm will open new avenues for analyzing large high-dimensional point clouds. We also give evidence that if the goal is to \emph{report} a $(1+\varepsilon)$-approximate mapping from $A$ to $B$ (as opposed to just its value), then any sub-quadratic time algorithm is unlikely to exist.
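    For reference, the quadratic-time baseline that the paper accelerates is straightforward to write down; below is a NumPy sketch under the Euclidean distance (this is the $O(d n^2)$ brute force, not the paper's near-linear algorithm).

```python
import numpy as np

def chamfer_distance(A, B):
    """Brute-force Chamfer distance CH(A, B) = sum_{a in A} min_{b in B} ||a - b||_2.

    A: (n, d) array, B: (m, d) array. Runs in O(n * m * d) time and O(n * m) memory.
    """
    # Pairwise Euclidean distances via broadcasting: shape (n, m)
    diffs = A[:, None, :] - B[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists.min(axis=1).sum()

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 3))
B = rng.standard_normal((400, 3))
print(chamfer_distance(A, B))   # directed: A -> B
print(chamfer_distance(B, A))   # note that CH is not symmetric
```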

Parameter-Efficient Fine-Tuning of LLaMA for the Clinical Domain

  • paper_url: http://arxiv.org/abs/2307.03042
  • repo_url: None
  • paper_authors: Aryo Pradipta Gema, Luke Daines, Pasquale Minervini, Beatrice Alex
  • for: Adapting pretrained language models (LLaMA) to the clinical domain to improve performance on clinical tasks.
  • methods: Uses Parameter-Efficient Fine-Tuning (PEFT), which fine-tunes only a small subset of additional parameters, greatly reducing the computational requirements of domain adaptation; Clinical LLaMA-LoRA is trained on MIMIC-IV clinical notes and fused with a downstream adapter in a two-step framework.
  • results: The framework achieves state-of-the-art AUROC scores averaged across clinical downstream tasks, with 6-9% AUROC improvements on large-scale multilabel classification tasks such as diagnoses and procedures classification.
    Abstract Adapting pretrained language models to novel domains, such as clinical applications, traditionally involves retraining their entire set of parameters. However, this approach is increasingly proven to be impractical owing to the substantial computational requirements associated with training such large language models. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) techniques offer a viable solution by selectively fine-tuning a small subset of additional parameters, significantly reducing the computational requirements for domain adaptation. In this study, we propose Clinical LLaMA-LoRA, a PEFT adapter layer built upon the open-sourced LLaMA model. Clinical LLaMA-LoRA is trained using clinical notes obtained from the MIMIC-IV database, thereby creating a specialised adapter designed for the clinical domain. Additionally, we propose a two-step PEFT framework which fuses Clinical LLaMA-LoRA with Downstream LLaMA-LoRA, another PEFT adapter specialised for downstream tasks. We evaluate this framework on multiple clinical outcome prediction datasets, comparing it to clinically trained language models. Our proposed framework achieves a state-of-the-art AUROC score averaged across all clinical downstream tasks. We observe substantial improvements of 6-9% AUROC score in the large-scale multilabel classification tasks, such as diagnoses and procedures classification.
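    A minimal PEFT sketch of the LoRA idea using the Hugging Face peft library is shown below; the base checkpoint (a small placeholder rather than LLaMA), the target modules, and all hyperparameters are illustrative assumptions and do not reproduce Clinical LLaMA-LoRA's configuration or its two-step fusion with a downstream adapter.

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# Base model is a placeholder; the paper adapts LLaMA on MIMIC-IV clinical notes.
base = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# LoRA: train small low-rank adapters instead of the full parameter set.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,                               # rank of the low-rank update (assumed value)
    lora_alpha=32,                      # scaling factor (assumed value)
    lora_dropout=0.05,
    target_modules=["q_lin", "v_lin"],  # attention projections in DistilBERT
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights is trainable
# `model` can now be fine-tuned on clinical outcome labels with any standard training loop.
```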

FITS: Modeling Time Series with $10k$ Parameters

  • paper_url: http://arxiv.org/abs/2307.03756
  • repo_url: None
  • paper_authors: Zhijian Xu, Ailing Zeng, Qiang Xu
  • for: Introduces FITS, a lightweight yet powerful model for time series analysis.
  • methods: FITS operates on the principle that time series can be manipulated through interpolation in the complex frequency domain; by discarding high-frequency components with negligible impact, it matches state-of-the-art models while remaining extremely compact at roughly 10k parameters.
  • results: FITS achieves performance comparable to state-of-the-art models on time series forecasting and anomaly detection, and is light enough to be trained and deployed on edge devices.
    Abstract In this paper, we introduce FITS, a lightweight yet powerful model for time series analysis. Unlike existing models that directly process raw time-domain data, FITS operates on the principle that time series can be manipulated through interpolation in the complex frequency domain. By discarding high-frequency components with negligible impact on time series data, FITS achieves performance comparable to state-of-the-art models for time series forecasting and anomaly detection tasks, while having a remarkably compact size of only approximately $10k$ parameters. Such a lightweight model can be easily trained and deployed in edge devices, creating opportunities for various applications. The anonymous code repo is available in: \url{https://anonymous.4open.science/r/FITS}
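    The core frequency-domain interpolation can be sketched in a few lines: take the rFFT of the input window, keep only the lowest frequencies, apply a single complex-valued linear layer to interpolate the spectrum to the output length, and inverse-transform to obtain the forecast. The cutoff, layer shapes, and rescaling below are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ComplexLinear(nn.Module):
    """Linear layer with complex-valued weights, applied to rFFT coefficients."""
    def __init__(self, in_bins: int, out_bins: int):
        super().__init__()
        scale = 1.0 / in_bins
        self.weight = nn.Parameter(scale * torch.randn(in_bins, out_bins, dtype=torch.cfloat))
        self.bias = nn.Parameter(torch.zeros(out_bins, dtype=torch.cfloat))

    def forward(self, x):                       # x: (batch, in_bins), complex
        return x @ self.weight + self.bias

class FrequencyInterpolationForecaster(nn.Module):
    """Sketch of low-pass frequency-domain forecasting in the spirit of FITS."""
    def __init__(self, input_len: int, output_len: int, cutoff: int):
        super().__init__()
        self.input_len, self.output_len, self.cutoff = input_len, output_len, cutoff
        self.freq_linear = ComplexLinear(cutoff, output_len // 2 + 1)

    def forward(self, x):                       # x: (batch, input_len), real
        spec = torch.fft.rfft(x, dim=-1)[:, : self.cutoff]       # drop high frequencies
        out_spec = self.freq_linear(spec) * (self.output_len / self.input_len)
        return torch.fft.irfft(out_spec, n=self.output_len, dim=-1)

model = FrequencyInterpolationForecaster(input_len=96, output_len=192, cutoff=20)
print(model(torch.randn(8, 96)).shape)                  # torch.Size([8, 192])
print(sum(p.numel() for p in model.parameters()))       # a few thousand parameters
```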

PCL-Indexability and Whittle Index for Restless Bandits with General Observation Models

  • paper_url: http://arxiv.org/abs/2307.03034
  • repo_url: None
  • paper_authors: Keqin Liu, Chengzhong Zhang
  • for: Restless multi-armed bandit problems under a general observation model, where the player must act on feedback that is error-prone due to resource constraints or environmental and intrinsic noise.
  • methods: Establishes a general probabilistic model of the feedback/observation dynamics and formulates the problem as a restless bandit with a countable belief state space; the achievable region method with partial conservation laws (PCL) is used to analyze indexability and the priority (Whittle) index.
  • results: Proposes an approximation procedure that transforms the problem into a finite-state one to which the AG algorithm of Niño-Mora and Bertsimas can be applied; numerical experiments show excellent performance.
    Abstract In this paper, we consider a general observation model for restless multi-armed bandit problems. The operation of the player needs to be based on certain feedback mechanism that is error-prone due to resource constraints or environmental or intrinsic noises. By establishing a general probabilistic model for dynamics of feedback/observation, we formulate the problem as a restless bandit with a countable belief state space starting from an arbitrary initial belief (a priori information). We apply the achievable region method with partial conservation law (PCL) to the infinite-state problem and analyze its indexability and priority index (Whittle index). Finally, we propose an approximation process to transform the problem into one to which the AG algorithm of Niño-Mora and Bertsimas for finite-state problems can be applied. Numerical experiments show that our algorithm achieves excellent performance.
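    For orientation, the Whittle index of a belief state $b$ can be stated in its standard form as the smallest passive subsidy under which staying passive becomes optimal in the subsidized single-arm problem; this is the generic definition, not necessarily the paper's exact notation:

$$ W(b) \;=\; \inf\Big\{ \lambda \in \mathbb{R} \;:\; V_\lambda^{\mathrm{passive}}(b) \,\ge\, V_\lambda^{\mathrm{active}}(b) \Big\}, $$

    where $V_\lambda^{\mathrm{passive}}$ and $V_\lambda^{\mathrm{active}}$ denote the value of taking the passive or active action at $b$ and acting optimally thereafter, and the arm is indexable when the set of beliefs at which passivity is optimal grows monotonically with the subsidy $\lambda$.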

PseudoCell: Hard Negative Mining as Pseudo Labeling for Deep Learning-Based Centroblast Cell Detection

  • paper_url: http://arxiv.org/abs/2307.03211
  • repo_url: None
  • paper_authors: Narongrid Seesawad, Piyalitt Ittichaiwong, Thapanun Sudhawiyangkul, Phattarapong Sawangjai, Peti Thuwajit, Paisarn Boonsakan, Supasan Sripodok, Kanyakorn Veerakanjana, Phoomraphee Luenam, Komgrid Charngkaew, Ananya Pongpaibul, Napat Angkathunyakul, Narit Hnoohom, Sumeth Yuenyong, Chanitra Thuwajit, Theerawit Wilaiprasitporn
  • for: Helping pathologists prescreen centroblast cells in H&E-stained whole-slide images (WSI), narrowing down the regions that require their attention when grading follicular lymphoma.
  • methods: An object detection framework for automated centroblast detection that combines pathologist-provided centroblast labels with pseudo-negative labels obtained from undersampled false-positive predictions, using the cells' morphological features.
  • results: Depending on the confidence threshold, PseudoCell eliminates 58.18-99.35% of non-centroblast tissue areas on a WSI, reducing pathologists' workload without requiring refined labels from them.
    Abstract Patch classification models based on deep learning have been utilized in whole-slide images (WSI) of H&E-stained tissue samples to assist pathologists in grading follicular lymphoma patients. However, these approaches still require pathologists to manually identify centroblast cells and provide refined labels for optimal performance. To address this, we propose PseudoCell, an object detection framework to automate centroblast detection in WSI (source code is available at https://github.com/IoBT-VISTEC/PseudoCell.git). This framework incorporates centroblast labels from pathologists and combines them with pseudo-negative labels obtained from undersampled false-positive predictions using the cell's morphological features. By employing PseudoCell, pathologists' workload can be reduced as it accurately narrows down the areas requiring their attention during examining tissue. Depending on the confidence threshold, PseudoCell can eliminate 58.18-99.35% of non-centroblasts tissue areas on WSI. This study presents a practical centroblast prescreening method that does not require pathologists' refined labels for improvement. Detailed guidance on the practical implementation of PseudoCell is provided in the discussion section.

Improving Retrieval-Augmented Large Language Models via Data Importance Learning

  • paper_url: http://arxiv.org/abs/2307.03027
  • repo_url: https://github.com/amsterdata/ragbooster
  • paper_authors: Xiaozhong Lyu, Stefan Grafberger, Samantha Biegel, Shaopeng Wei, Meng Cao, Sebastian Schelter, Ce Zhang
  • for: Improving the performance of retrieval-augmented large language models without further training.
  • methods: Evaluates the data importance of retrieved data points via the multilinear extension of the model's utility function, with an exact polynomial-time algorithm and a more efficient $(\epsilon, \delta)$-approximation algorithm.
  • results: Performance can be improved simply by pruning or reweighting the retrieval corpus, without further training; on some tasks this lets a small model (e.g., GPT-JT), augmented with a search engine API, outperform GPT-3.5 without retrieval augmentation. Multilinear-extension weights can be computed efficiently in practice (e.g., in under ten minutes for a corpus with 100 million elements).
    Abstract Retrieval augmentation enables large language models to take advantage of external knowledge, for example on tasks like question answering and data imputation. However, the performance of such retrieval-augmented models is limited by the data quality of their underlying retrieval corpus. In this paper, we propose an algorithm based on multilinear extension for evaluating the data importance of retrieved data points. There are exponentially many terms in the multilinear extension, and one key contribution of this paper is a polynomial time algorithm that computes exactly, given a retrieval-augmented model with an additive utility function and a validation set, the data importance of data points in the retrieval corpus using the multilinear extension of the model's utility function. We further proposed an even more efficient $(\epsilon, \delta)$-approximation algorithm. Our experimental results illustrate that we can enhance the performance of large language models by only pruning or reweighting the retrieval corpus, without requiring further training. For some tasks, this even allows a small model (e.g., GPT-JT), augmented with a search engine API, to outperform GPT-3.5 (without retrieval augmentation). Moreover, we show that weights based on multilinear extension can be computed efficiently in practice (e.g., in less than ten minutes for a corpus with 100 million elements).
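    For reference, the multilinear extension of a set utility function $U$ over the $n$ retrieved data points, evaluated at importance weights $w \in [0,1]^n$, is the standard object below; the paper's contribution is computing the resulting importance scores exactly in polynomial time for additive utilities, despite the exponentially many terms:

$$ \tilde{U}(w_1,\dots,w_n) \;=\; \sum_{S \subseteq \{1,\dots,n\}} U(S) \prod_{i \in S} w_i \prod_{i \notin S} (1 - w_i). $$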

Sparse Graphical Linear Dynamical Systems

  • paper_url: http://arxiv.org/abs/2307.03210
  • repo_url: None
  • paper_authors: Emilie Chouzenoux, Victor Elvira
  • for: Time series analysis, in particular parameter estimation for state-space models (SSMs).
  • methods: Proposes a joint graphical modeling framework that bridges static graphical modeling (graphical Lasso) and causality-based graphical modeling (graphical Granger) within linear-Gaussian SSMs, together with DGLASSO, an inference method based on an efficient block alternating majorization-minimization algorithm.
  • results: Experiments on synthetic and real weather variability data validate the proposed model and inference algorithm.
    Abstract Time-series datasets are central in numerous fields of science and engineering, such as biomedicine, Earth observation, and network analysis. Extensive research exists on state-space models (SSMs), which are powerful mathematical tools that allow for probabilistic and interpretable learning on time series. Estimating the model parameters in SSMs is arguably one of the most complicated tasks, and the inclusion of prior knowledge is known to both ease the interpretation but also to complicate the inferential tasks. Very recent works have attempted to incorporate a graphical perspective on some of those model parameters, but they present notable limitations that this work addresses. More generally, existing graphical modeling tools are designed to incorporate either static information, focusing on statistical dependencies among independent random variables (e.g., graphical Lasso approach), or dynamic information, emphasizing causal relationships among time series samples (e.g., graphical Granger approaches). However, there are no joint approaches combining static and dynamic graphical modeling within the context of SSMs. This work proposes a novel approach to fill this gap by introducing a joint graphical modeling framework that bridges the static graphical Lasso model and a causal-based graphical approach for the linear-Gaussian SSM. We present DGLASSO (Dynamic Graphical Lasso), a new inference method within this framework that implements an efficient block alternating majorization-minimization algorithm. The algorithm's convergence is established by departing from modern tools from nonlinear analysis. Experimental validation on synthetic and real weather variability data showcases the effectiveness of the proposed model and inference algorithm.

Improving the Efficiency of Human-in-the-Loop Systems: Adding Artificial to Human Experts

  • paper_url: http://arxiv.org/abs/2307.03003
  • repo_url: None
  • paper_authors: Johannes Jakubik, Daniel Weber, Patrick Hemmer, Michael Vössing, Gerhard Satzger
  • for: Improving the collaboration between humans and machine learning models so that more value can be extracted from data.
  • methods: Human-in-the-loop (HITL) extensions defer hard-to-classify instances to human review, which strains limited human resources; the paper proposes a hybrid system that trains artificial experts on instances from unknown classes previously reviewed by humans and automatically assigns new instances to a suitable artificial expert.
  • results: Over time, human effort is reduced and system efficiency increases; experiments show the approach outperforms traditional HITL systems on several image classification benchmarks.
    Abstract Information systems increasingly leverage artificial intelligence (AI) and machine learning (ML) to generate value from vast amounts of data. However, ML models are imperfect and can generate incorrect classifications. Hence, human-in-the-loop (HITL) extensions to ML models add a human review for instances that are difficult to classify. This study argues that continuously relying on human experts to handle difficult model classifications leads to a strong increase in human effort, which strains limited resources. To address this issue, we propose a hybrid system that creates artificial experts that learn to classify data instances from unknown classes previously reviewed by human experts. Our hybrid system assesses which artificial expert is suitable for classifying an instance from an unknown class and automatically assigns it. Over time, this reduces human effort and increases the efficiency of the system. Our experiments demonstrate that our approach outperforms traditional HITL systems for several benchmarks on image classification.

ContainerGym: A Real-World Reinforcement Learning Benchmark for Resource Allocation

  • paper_url: http://arxiv.org/abs/2307.02991
  • repo_url: https://github.com/pendu/containergym
  • paper_authors: Abhijeet Pendyala, Justin Dettmer, Tobias Glasmachers, Asma Atamna
  • for: Providing a benchmark, derived from a real-world industrial resource allocation task, for evaluating reinforcement learning on real-world decision problems.
  • methods: Introduces ContainerGym, an environment that encodes challenges such as uncertainty and can be configured to instantiate problems of varying difficulty, e.g., in terms of variable dimensionality.
  • results: Results for well-known deep reinforcement learning algorithms (PPO, TRPO, and DQN) highlight interesting limitations of these methods on real-world problems.
    Abstract We present ContainerGym, a benchmark for reinforcement learning inspired by a real-world industrial resource allocation task. The proposed benchmark encodes a range of challenges commonly encountered in real-world sequential decision making problems, such as uncertainty. It can be configured to instantiate problems of varying degrees of difficulty, e.g., in terms of variable dimensionality. Our benchmark differs from other reinforcement learning benchmarks, including the ones aiming to encode real-world difficulties, in that it is directly derived from a real-world industrial problem, which underwent minimal simplification and streamlining. It is sufficiently versatile to evaluate reinforcement learning algorithms on any real-world problem that fits our resource allocation framework. We provide results of standard baseline methods. Going beyond the usual training reward curves, our results and the statistical tools used to interpret them allow to highlight interesting limitations of well-known deep reinforcement learning algorithms, namely PPO, TRPO and DQN.

A Privacy-Preserving Walk in the Latent Space of Generative Models for Medical Applications

  • paper_url: http://arxiv.org/abs/2307.02984
  • repo_url: https://github.com/perceivelab/plan
  • paper_authors: Matteo Pennisi, Federica Proietto Salanitri, Giovanni Bellitto, Simone Palazzo, Ulas Bagci, Concetto Spampinato
  • for: Proposing a latent space navigation strategy that generates diverse synthetic samples to support effective training of deep models while addressing privacy concerns in a principled way.
  • methods: Uses an auxiliary identity classifier as a guide to walk non-linearly between points in the latent space, minimizing the risk of colliding with near-duplicates of real samples; the walk is shown to be safer than linear interpolation between any pair of randomly chosen latent points.
  • results: Combined with k-same methods and evaluated on tuberculosis and diabetic retinopathy classification benchmarks, training models on samples generated by this approach mitigates performance drops while preserving privacy.
    Abstract Generative Adversarial Networks (GANs) have demonstrated their ability to generate synthetic samples that match a target distribution. However, from a privacy perspective, using GANs as a proxy for data sharing is not a safe solution, as they tend to embed near-duplicates of real samples in the latent space. Recent works, inspired by k-anonymity principles, address this issue through sample aggregation in the latent space, with the drawback of reducing the dataset by a factor of k. Our work aims to mitigate this problem by proposing a latent space navigation strategy able to generate diverse synthetic samples that may support effective training of deep models, while addressing privacy concerns in a principled way. Our approach leverages an auxiliary identity classifier as a guide to non-linearly walk between points in the latent space, minimizing the risk of collision with near-duplicates of real samples. We empirically demonstrate that, given any random pair of points in the latent space, our walking strategy is safer than linear interpolation. We then test our path-finding strategy combined to k-same methods and demonstrate, on two benchmarks for tuberculosis and diabetic retinopathy classification, that training a model using samples generated by our approach mitigate drops in performance, while keeping privacy preservation.

Transfer Learning for the Efficient Detection of COVID-19 from Smartphone Audio Data

  • paper_url: http://arxiv.org/abs/2307.02975
  • repo_url: https://github.com/mattiacampana/transfer-learning-covid-19
  • paper_authors: Mattia Giovanni Campana, Franca Delmastro, Elena Pagani
  • for: Disease detection from smartphone audio data, with early detection of COVID-19 and its respiratory symptoms as the case study.
  • methods: Evaluates three deep learning models (VGGish, YAMNET, and L3-Net, including 12 configurations) together with hand-crafted features, and two transfer learning approaches (feature extraction and fine-tuning), through user-independent experiments on four datasets (13,447 samples in total).
  • results: L3-Net outperforms the other solutions in all experimental settings, improving Precision-Recall AUC by 12.3% as a feature extractor and by 10% when fine-tuned; fine-tuning only the fully-connected layers of the pre-trained models generally leads to worse performance, with an average drop of 6.6% relative to feature extraction. The memory footprints of the models are also evaluated for deployment on commercial mobile devices.
    Abstract Disease detection from smartphone data represents an open research challenge in mobile health (m-health) systems. COVID-19 and its respiratory symptoms are an important case study in this area and their early detection is a potential real instrument to counteract the pandemic situation. The efficacy of this solution mainly depends on the performances of AI algorithms applied to the collected data and their possible implementation directly on the users' mobile devices. Considering these issues, and the limited amount of available data, in this paper we present the experimental evaluation of 3 different deep learning models, compared also with hand-crafted features, and of two main approaches of transfer learning in the considered scenario: both feature extraction and fine-tuning. Specifically, we considered VGGish, YAMNET, and L3-Net (including 12 different configurations) evaluated through user-independent experiments on 4 different datasets (13,447 samples in total). Results clearly show the advantages of L3-Net in all the experimental settings as it overcomes the other solutions by 12.3% in terms of Precision-Recall AUC as features extractor, and by 10% when the model is fine-tuned. Moreover, we note that to fine-tune only the fully-connected layers of the pre-trained models generally leads to worse performances, with an average drop of 6.6% with respect to feature extraction. Finally, we evaluate the memory footprints of the different models for their possible applications on commercial mobile devices.

Pruning vs Quantization: Which is Better?

  • paper_url: http://arxiv.org/abs/2307.02973
  • repo_url: None
  • paper_authors: Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort
  • for: Comparing the effectiveness of neural network quantization and pruning techniques for compressing deep neural networks.
  • methods: Analytical and empirical comparisons of expected quantization and pruning error, and lower bounds for per-layer pruning and quantization error in trained networks.
  • results: Quantization outperforms pruning in most cases, but pruning might be beneficial in some scenarios with very high compression ratios.
    Abstract Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question on which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks, and compare these to empirical error after optimization. Finally, we provide an extensive experimental comparison for training 8 large-scale models on 3 tasks. Our results show that in most cases quantization outperforms pruning. Only in some scenarios with very high compression ratio, pruning might be beneficial from an accuracy standpoint.
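    The kind of per-tensor error comparison that the paper formalizes can be illustrated empirically: at a roughly matched compression budget, compare the reconstruction error of uniform quantization against magnitude pruning on a Gaussian weight tensor. The bit-width/sparsity pairing below is a rough illustration, not the paper's analytical matching.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(100_000)            # stand-in for a trained weight tensor

def quantize_uniform(w, bits):
    """Symmetric uniform quantization to 2**bits levels."""
    levels = 2 ** bits
    scale = np.abs(w).max() / (levels / 2 - 1)
    return np.clip(np.round(w / scale), -levels / 2, levels / 2 - 1) * scale

def prune_magnitude(w, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)

for bits in (2, 4, 8):
    wq = quantize_uniform(w, bits)
    # Rough budget match: b-bit quantization vs. keeping b/16 of 16-bit weights.
    wp = prune_magnitude(w, sparsity=1.0 - bits / 16)
    print(f"{bits}-bit | quant MSE {np.mean((w - wq) ** 2):.5f} "
          f"| prune MSE {np.mean((w - wp) ** 2):.5f}")
```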

DPM: Clustering Sensitive Data through Separation

  • paper_url: http://arxiv.org/abs/2307.02969
  • repo_url: None
  • paper_authors: Yara Schütt, Johannes Liebenow, Tanya Braun, Marcel Gehrke, Florian Thaeter, Esfandiar Mohammadi
  • for: Privacy-preserving clustering, which groups data points in an unsupervised manner while keeping sensitive information protected.
  • methods: Introduces DPM, a differentially private clustering algorithm that searches for accurate data point separators in a differentially private manner, addressing two key challenges: finding separators that are large gaps between clusters rather than small gaps within a cluster, and spending the privacy budget efficiently by prioritizing separators that split the data into large subparts.
  • results: With the inertia of non-private KMeans++ as a baseline and $\varepsilon = 1$, $\delta = 10^{-5}$, DPM improves upon the difference to the baseline by up to 50% on a synthetic dataset and by up to 62% on a real-world dataset, compared to the state-of-the-art clustering algorithm of Chang and Kamath.
    Abstract Privacy-preserving clustering groups data points in an unsupervised manner whilst ensuring that sensitive information remains protected. Previous privacy-preserving clustering focused on identifying concentration of point clouds. In this paper, we take another path and focus on identifying appropriate separators that split a data set. We introduce the novel differentially private clustering algorithm DPM that searches for accurate data point separators in a differentially private manner. DPM addresses two key challenges for finding accurate separators: identifying separators that are large gaps between clusters instead of small gaps within a cluster and, to efficiently spend the privacy budget, prioritising separators that split the data into large subparts. Using the differentially private Exponential Mechanism, DPM randomly chooses cluster separators with provably high utility: For a data set $D$, if there is a wide low-density separator in the central $60\%$ quantile, DPM finds that separator with probability $1 - \exp(-\sqrt{|D|})$. Our experimental evaluation demonstrates that DPM achieves significant improvements in terms of the clustering metric inertia. With the inertia results of the non-private KMeans++ as a baseline, for $\varepsilon = 1$ and $\delta=10^{-5}$ DPM improves upon the difference to the baseline by up to $50\%$ for a synthetic data set and by up to $62\%$ for a real-world data set compared to a state-of-the-art clustering algorithm by Chang and Kamath.
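    The separator selection relies on the differentially private Exponential Mechanism; in its generic form, with a utility function $u$ scoring candidate separators $s$ for dataset $D$ and sensitivity $\Delta u$, a candidate is sampled with probability

$$ \Pr[\mathcal{M}(D) = s] \;\propto\; \exp\!\left( \frac{\varepsilon \, u(D, s)}{2 \Delta u} \right), $$

    so that separators lying in wide low-density regions, which receive high utility, are chosen with overwhelming probability. This is the textbook mechanism; DPM's specific utility function and privacy budget allocation are described in the paper.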

SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks

  • paper_url: http://arxiv.org/abs/2307.02953
  • repo_url: None
  • paper_authors: Junlong Cheng, Chengrui Gao, Fengjie Wang, Min Zhu
  • for: Proposing SegNetr, a lightweight medical image segmentation network that improves both accuracy and efficiency.
  • methods: Introduces a SegNetr block that performs local-global interactions dynamically at any stage with only linear complexity, and a general information retention skip connection (IRSC) that preserves the spatial location information of encoder features and fuses them accurately with decoder features.
  • results: On four mainstream medical image segmentation datasets, SegNetr uses 59% fewer parameters and 76% fewer GFLOPs than vanilla U-Net while achieving segmentation performance comparable to state-of-the-art methods; the proposed components can also be applied to other U-shaped networks to improve their segmentation performance.
    Abstract Recently, U-shaped networks have dominated the field of medical image segmentation due to their simple and easily tuned structure. However, existing U-shaped segmentation networks: 1) mostly focus on designing complex self-attention modules to compensate for the lack of long-term dependence based on convolution operation, which increases the overall number of parameters and computational complexity of the network; 2) simply fuse the features of encoder and decoder, ignoring the connection between their spatial locations. In this paper, we rethink the above problem and build a lightweight medical image segmentation network, called SegNetr. Specifically, we introduce a novel SegNetr block that can perform local-global interactions dynamically at any stage and with only linear complexity. At the same time, we design a general information retention skip connection (IRSC) to preserve the spatial location information of encoder features and achieve accurate fusion with the decoder features. We validate the effectiveness of SegNetr on four mainstream medical image segmentation datasets, with 59\% and 76\% fewer parameters and GFLOPs than vanilla U-Net, while achieving segmentation performance comparable to state-of-the-art methods. Notably, the components proposed in this paper can be applied to other U-shaped networks to improve their segmentation performance.

When No-Rejection Learning is Optimal for Regression with Rejection

  • paper_url: http://arxiv.org/abs/2307.02932
  • repo_url: None
  • paper_authors: Xiaocheng Li, Shang Liu, Chunlin Sun, Hanzhao Wang
  • for: Studies learning with rejection, a prototypical model of human-AI interaction on prediction tasks, focusing on the regression with rejection (RwR) problem.
  • methods: The model has two components, a predictor and a rejector: when a sample arrives, the rejector decides whether to accept it; accepted samples are predicted by the model and rejected ones are deferred to humans. Learning both jointly changes the structure of the conventional loss and often causes non-convexity and inconsistency, so the paper analyzes the no-rejection learning strategy, which treats RwR as a standard regression task, and introduces a truncated loss to learn the predictor on its own with a consistent surrogate property.
  • results: The suboptimality of no-rejection learning can be mitigated by enlarging the predictor's function class; the findings advocate a two-step procedure that first uses all the data to learn the predictor and then calibrates the rejector, and the results generalize to classification as well.
    Abstract Learning with rejection is a prototypical model for studying the interaction between humans and AI on prediction tasks. The model has two components, a predictor and a rejector. Upon the arrival of a sample, the rejector first decides whether to accept it; if accepted, the predictor fulfills the prediction task, and if rejected, the prediction will be deferred to humans. The learning problem requires learning a predictor and a rejector simultaneously. This changes the structure of the conventional loss function and often results in non-convexity and inconsistency issues. For the classification with rejection problem, several works develop surrogate losses for the jointly learning with provable consistency guarantees; in parallel, there has been less work for the regression counterpart. We study the regression with rejection (RwR) problem and investigate the no-rejection learning strategy which treats the RwR problem as a standard regression task to learn the predictor. We establish that the suboptimality of the no-rejection learning strategy observed in the literature can be mitigated by enlarging the function class of the predictor. Then we introduce the truncated loss to single out the learning for the predictor and we show that a consistent surrogate property can be established for the predictor individually in an easier way than for the predictor and the rejector jointly. Our findings advocate for a two-step learning procedure that first uses all the data to learn the predictor and then calibrates the prediction loss for the rejector. It is better aligned with the common intuition that more data samples will lead to a better predictor and it calls for more efforts on a better design of calibration algorithms for learning the rejector. While our discussions mainly focus on the regression problem, the theoretical results and insights generalize to the classification problem as well.
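    As a point of reference, a generic learning-with-rejection objective (not necessarily the paper's exact formulation) trades the prediction loss against a fixed deferral cost $c$:

$$ \min_{f,\, r} \; \mathbb{E}_{(X,Y)} \Big[ \big(1 - r(X)\big)\, \ell\big(f(X), Y\big) \;+\; r(X)\, c \Big], \qquad r(X) \in \{0, 1\}, $$

    where $f$ is the predictor and $r$ the rejector; no-rejection learning corresponds to fixing $r \equiv 0$ while training $f$, which is the strategy whose suboptimality the paper analyzes.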

A Real-time Human Pose Estimation Approach for Optimal Sensor Placement in Sensor-based Human Activity Recognition

  • paper_url: http://arxiv.org/abs/2307.02906
  • repo_url: None
  • paper_authors: Orhan Konak, Alexander Wischmann, Robin van de Water, Bert Arnrich
  • for: Providing a reliable method for determining optimal sensor placement in sensor-based human activity recognition, enabling unobtrusive monitoring of human movements.
  • methods: Uses real-time 2D pose estimation derived from video recordings of the target activities; the resulting skeleton data provides a strategy for identifying the optimal sensor location.
  • results: In a feasibility study applying inertial sensors to monitor 13 activities across ten subjects, the vision-based sensor placement method achieves results comparable to the conventional deep learning approach, demonstrating its efficacy.
    Abstract Sensor-based Human Activity Recognition facilitates unobtrusive monitoring of human movements. However, determining the most effective sensor placement for optimal classification performance remains challenging. This paper introduces a novel methodology to resolve this issue, using real-time 2D pose estimations derived from video recordings of target activities. The derived skeleton data provides a unique strategy for identifying the optimal sensor location. We validate our approach through a feasibility study, applying inertial sensors to monitor 13 different activities across ten subjects. Our findings indicate that the vision-based method for sensor placement offers comparable results to the conventional deep learning approach, demonstrating its efficacy. This research significantly advances the field of Human Activity Recognition by providing a lightweight, on-device solution for determining the optimal sensor placement, thereby enhancing data anonymization and supporting a multimodal classification approach.

PUFFIN: A Path-Unifying Feed-Forward Interfaced Network for Vapor Pressure Prediction

  • paper_url: http://arxiv.org/abs/2307.02903
  • repo_url: None
  • paper_authors: Vinicius Viena Santana, Carine Menezes Rebello, Luana P. Queiroz, Ana Mafalda Ribeiro, Nadia Shardt, Idelfonso B. R. Nogueira
  • for: Improving the accuracy of vapor pressure prediction for chemical compounds, which is vital for industrial and environmental applications.
  • methods: Proposes PUFFIN (Path-Unifying Feed-Forward Interfaced Network), a machine learning framework that combines transfer learning using graph embeddings with a new inductive bias node inspired by domain knowledge (the Antoine equation).
  • results: PUFFIN outperforms alternative strategies that do not use inductive bias or that use generic compound descriptors; incorporating domain-specific knowledge to overcome limited data availability shows its potential for broader application to predicting other physicochemical properties.
    Abstract Accurately predicting vapor pressure is vital for various industrial and environmental applications. However, obtaining accurate measurements for all compounds of interest is not possible due to the resource and labor intensity of experiments. The demand for resources and labor further multiplies when a temperature-dependent relationship for predicting vapor pressure is desired. In this paper, we propose PUFFIN (Path-Unifying Feed-Forward Interfaced Network), a machine learning framework that combines transfer learning with a new inductive bias node inspired by domain knowledge (the Antoine equation) to improve vapor pressure prediction. By leveraging inductive bias and transfer learning using graph embeddings, PUFFIN outperforms alternative strategies that do not use inductive bias or that use generic descriptors of compounds. The framework's incorporation of domain-specific knowledge to overcome the limitation of poor data availability shows its potential for broader applications in chemical compound analysis, including the prediction of other physicochemical properties. Importantly, our proposed machine learning framework is partially interpretable, because the inductive Antoine node yields network-derived Antoine equation coefficients. It would then be possible to directly incorporate the obtained analytical expression in process design software for better prediction and control of processes occurring in industry and the environment.
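    The inductive-bias idea can be sketched as a network head that predicts Antoine coefficients (A, B, C) for a compound and converts them to a vapor pressure via the Antoine equation log10 P = A - B / (C + T). The input featurization, layer sizes, and temperature units below are placeholders, not PUFFIN's actual architecture.

```python
import torch
import torch.nn as nn

class AntoineHead(nn.Module):
    """Sketch of an Antoine-equation inductive-bias node on top of compound embeddings."""
    def __init__(self, embedding_dim: int, hidden: int = 64):
        super().__init__()
        self.coeff_net = nn.Sequential(
            nn.Linear(embedding_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),             # predicts Antoine coefficients A, B, C
        )

    def forward(self, compound_embedding, temperature):
        # compound_embedding: (batch, embedding_dim); temperature: (batch,) in whatever
        # units the coefficients are trained for (an assumption of this sketch).
        A, B, C = self.coeff_net(compound_embedding).unbind(dim=-1)
        log10_p = A - B / (C + temperature)   # Antoine equation
        return log10_p, (A, B, C)             # coefficients remain interpretable

head = AntoineHead(embedding_dim=32)
emb = torch.randn(5, 32)
temp = torch.linspace(300.0, 400.0, 5)
log10_p, coeffs = head(emb, temp)
print(log10_p.shape)  # torch.Size([5])
```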

Free Bits: Latency Optimization of Mixed-Precision Quantized Neural Networks on the Edge

  • paper_url: http://arxiv.org/abs/2307.02894
  • repo_url: None
  • paper_authors: Georg Rutishauser, Francesco Conti, Luca Benini
  • for: Optimizing mixed-precision quantization to navigate the trade-offs between model size, latency, and statistical accuracy.
  • methods: Proposes a hybrid mixed-precision search: a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization that finds latency-optimized mixed-precision configurations for a specific hardware target.
  • results: Evaluated on MobileNetV1 and MobileNetV2 and deployed on a family of multi-core RISC-V microcontroller platforms, the approach reduces end-to-end latency by up to 28.6% compared to an 8-bit model at a negligible accuracy drop on the 1000-class ImageNet dataset, achieves speedups even on systems without hardware support for sub-byte arithmetic, and outperforms differentiable search that targets reduced binary operation counts as a proxy for latency.
    Abstract Mixed-precision quantization, where a deep neural network's layers are quantized to different precisions, offers the opportunity to optimize the trade-offs between model size, latency, and statistical accuracy beyond what can be achieved with homogeneous-bit-width quantization. To navigate the intractable search space of mixed-precision configurations for a given network, this paper proposes a hybrid search methodology. It consists of a hardware-agnostic differentiable search algorithm followed by a hardware-aware heuristic optimization to find mixed-precision configurations latency-optimized for a specific hardware target. We evaluate our algorithm on MobileNetV1 and MobileNetV2 and deploy the resulting networks on a family of multi-core RISC-V microcontroller platforms with different hardware characteristics. We achieve up to 28.6% reduction of end-to-end latency compared to an 8-bit model at a negligible accuracy drop from a full-precision baseline on the 1000-class ImageNet dataset. We demonstrate speedups relative to an 8-bit baseline, even on systems with no hardware support for sub-byte arithmetic at negligible accuracy drop. Furthermore, we show the superiority of our approach with respect to differentiable search targeting reduced binary operation counts as a proxy for latency.

BaBE: Enhancing Fairness via Estimation of Latent Explaining Variables

  • paper_url: http://arxiv.org/abs/2307.02891
  • repo_url: https://github.com/babe-algorithm/babe
  • paper_authors: Ruta Binkyte, Daniele Gorla, Catuscia Palamidessi
  • for: solves the problem of unfair discrimination between two groups by proposing a pre-processing method to achieve fairness.
  • methods: uses Bayesian Bias Elimination (BaBE), a combination of Bayes inference and the Expectation-Maximization method, to estimate the most likely value of the latent explanatory variable E for each group.
  • results: shows good fairness and high accuracy in experiments on synthetic and real data sets.
    Abstract We consider the problem of unfair discrimination between two groups and propose a pre-processing method to achieve fairness. Corrective methods like statistical parity usually lead to bad accuracy and do not really achieve fairness in situations where there is a correlation between the sensitive attribute S and the legitimate attribute E (explanatory variable) that should determine the decision. To overcome these drawbacks, other notions of fairness have been proposed, in particular, conditional statistical parity and equal opportunity. However, E is often not directly observable in the data, i.e., it is a latent variable. We may observe some other variable Z representing E, but the problem is that Z may also be affected by S, hence Z itself can be biased. To deal with this problem, we propose BaBE (Bayesian Bias Elimination), an approach based on a combination of Bayes inference and the Expectation-Maximization method, to estimate the most likely value of E for a given Z for each group. The decision can then be based directly on the estimated E. We show, by experiments on synthetic and real data sets, that our approach provides a good level of fairness as well as high accuracy.
    摘要 我们考虑了两个群体之间的不公平歧视问题,并提出了一种预处理方法来实现公平。像统计均等这样的纠正方法通常会导致较差的准确率,而且当敏感属性S与应当决定结果的合法属性E(解释变量)相关时,并不能真正实现公平。为克服这些缺点,人们提出了其他公平概念,特别是条件统计均等和机会均等。然而,E往往无法在数据中直接观察到,即它是一个潜变量。我们可能观察到代表E的另一个变量Z,但问题在于Z也可能受S影响,因此Z本身可能带有偏差。为解决这个问题,我们提出了BaBE(Bayesian Bias Elimination),一种结合贝叶斯推断与期望最大化(EM)方法的途径,用于为每个群体估计给定Z时E的最可能取值,随后可以直接基于估计出的E做出决策。我们在合成数据集和真实数据集上的实验表明,该方法既能提供良好的公平性,又能保持较高的准确率。
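To make the Bayes-plus-EM idea concrete, here is a minimal sketch that, for one group, estimates a discrete latent explanatory variable E from a proxy Z via expectation-maximization and then assigns each individual its most likely E. The discrete model, priors, and data are assumptions for illustration; the paper's exact likelihood and update rules may differ.

```python
import numpy as np

# Illustrative EM for a discrete latent variable E observed only through a proxy Z,
# run separately per group (sketch of the Bayes + EM idea, not the paper's exact model).
def em_latent_posterior(z, n_e, n_z, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    prior = np.full(n_e, 1.0 / n_e)                  # P(E)
    emit = rng.dirichlet(np.ones(n_z), size=n_e)     # P(Z | E)
    for _ in range(iters):
        # E-step: responsibilities P(E | Z=z_i) for every observation
        resp = prior[None, :] * emit[:, z].T         # shape (n_obs, n_e)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate prior and emission probabilities
        prior = resp.mean(axis=0)
        for e in range(n_e):
            counts = np.bincount(z, weights=resp[:, e], minlength=n_z)
            emit[e] = counts / counts.sum()
    return resp.argmax(axis=1)                       # most likely E per individual

# Hypothetical data: one group whose proxy Z is a noisy copy of a binary E.
rng = np.random.default_rng(1)
true_e = rng.integers(0, 2, size=500)
z = np.where(rng.random(500) < 0.8, true_e, 1 - true_e)
e_hat = em_latent_posterior(z, n_e=2, n_z=2)
print("agreement with true E (up to label swap):",
      max(np.mean(e_hat == true_e), np.mean(e_hat != true_e)))
```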

Learning to Solve Tasks with Exploring Prior Behaviours

  • paper_url: http://arxiv.org/abs/2307.02889
  • repo_url: https://github.com/ricky-zhu/irdec
  • paper_authors: Ruiqi Zhu, Siyuan Li, Tianhong Dai, Chongjie Zhang, Oya Celiktutan
  • for: 帮助解决 sparse-reward 任务
  • methods: 使用 Intrinsic Rewards Driven Example-based Control (IRDEC) 方法,能够让代理人学习并掌握先前的行为,然后将其与任务特定的行为相连接以解决 sparse-reward 任务
  • results: 在三个导航任务和一个机器人处理任务中,表现优于其他基准值
    Abstract Demonstrations are widely used in Deep Reinforcement Learning (DRL) for facilitating solving tasks with sparse rewards. However, the tasks in real-world scenarios can often have varied initial conditions from the demonstration, which would require additional prior behaviours. For example, consider we are given the demonstration for the task of \emph{picking up an object from an open drawer}, but the drawer is closed in the training. Without acquiring the prior behaviours of opening the drawer, the robot is unlikely to solve the task. To address this, in this paper we propose an Intrinsic Rewards Driven Example-based Control \textbf{(IRDEC)}. Our method can endow agents with the ability to explore and acquire the required prior behaviours and then connect to the task-specific behaviours in the demonstration to solve sparse-reward tasks without requiring additional demonstration of the prior behaviours. The performance of our method outperforms other baselines on three navigation tasks and one robotic manipulation task with sparse rewards. Codes are available at https://github.com/Ricky-Zhu/IRDEC.
    摘要 在深度强化学习(DRL)中,示范被广泛用于帮助求解奖励稀疏的任务。然而,现实场景中任务的初始条件往往与示范不同,因而需要额外的先行行为。例如,假设我们得到的是“从打开的抽屉中取出物品”这一任务的示范,但训练时抽屉是关闭的;如果不具备打开抽屉的先行行为,机器人就很难完成任务。为此,本文提出了内在奖励驱动的示例控制方法(IRDEC)。该方法使智能体能够探索并习得所需的先行行为,再将其与示范中的任务特定行为衔接起来,从而在无需额外示范先行行为的情况下解决奖励稀疏的任务。我们的方法在三个导航任务和一个机器人操作任务上均优于其他基线。代码见 https://github.com/Ricky-Zhu/IRDEC。

Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight

  • paper_url: http://arxiv.org/abs/2307.02884
  • repo_url: None
  • paper_authors: Jiacheng Guo, Minshuo Chen, Huan Wang, Caiming Xiong, Mengdi Wang, Yu Bai
  • for: 研究部分可观测马尔可夫决策过程(POMDP)中的样本高效学习问题,这是强化学习中在最坏情况下指数级困难的问题。
  • methods: 提出了一种增强反馈模型,称为“多观察往事”,在每个 POMDP 交互回合后,学习者可以收集到更多的观察数据,但不能直接观察到latent state。
  • results: 证明在这种反馈模型下可以实现样本高效学习,并且该模型涵盖两类新的 POMDP 子类:多观察揭示 POMDP(multi-observation revealing POMDP)和可区分 POMDP(distinguishable POMDP)。这两类子类都是 revealing POMDP 的推广和放松:可区分 POMDP 只要求不同潜在状态的观测分布彼此不同,而不要求线性独立。
    Abstract This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst-case. Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called ``multiple observations in hindsight'', where after each episode of interaction with the POMDP, the learner may collect multiple additional observations emitted from the encountered latent states, but may not observe the latent states themselves. We show that sample-efficient learning under this feedback model is possible for two new subclasses of POMDPs: \emph{multi-observation revealing POMDPs} and \emph{distinguishable POMDPs}. Both subclasses generalize and substantially relax \emph{revealing POMDPs} -- a widely studied subclass for which sample-efficient learning is possible under standard trajectory feedback. Notably, distinguishable POMDPs only require the emission distributions from different latent states to be \emph{different} instead of \emph{linearly independent} as required in revealing POMDPs.
    摘要

A Machine-Learned Ranking Algorithm for Dynamic and Personalised Car Pooling Services

  • paper_url: http://arxiv.org/abs/2307.05697
  • repo_url: None
  • paper_authors: Mattia Giovanni Campana, Franca Delmastro, Raffaele Bruno
  • For: 降低城市交通堵塞和污染
  • Methods: 使用学习排序技术自动生成每个用户的个性化选择模型,并将这些模型用于向乘客与司机推荐合适的共乘机会
  • Results: 实验结果表明,所提出的方案能够快速、准确地学习用户的个性化选择模型,并在静态和动态条件下均表现出色。
    Abstract Car pooling is expected to significantly help in reducing traffic congestion and pollution in cities by enabling drivers to share their cars with travellers with similar itineraries and time schedules. A number of car pooling matching services have been designed in order to efficiently find successful ride matches in a given pool of drivers and potential passengers. However, it is now recognised that many non-monetary aspects and social considerations, besides simple mobility needs, may influence the individual willingness of sharing a ride, which are difficult to predict. To address this problem, in this study we propose GoTogether, a recommender system for car pooling services that leverages on learning-to-rank techniques to automatically derive the personalised ranking model of each user from the history of her choices (i.e., the type of accepted or rejected shared rides). Then, GoTogether builds the list of recommended rides in order to maximise the success rate of the offered matches. To test the performance of our scheme we use real data from Twitter and Foursquare sources in order to generate a dataset of plausible mobility patterns and ride requests in a metropolitan area. The results show that the proposed solution quickly obtain an accurate prediction of the personalised user's choice model both in static and dynamic conditions.
    摘要 汽车共乘(car pooling)通过让行程和时间安排相近的司机与乘客共享车辆,有望显著缓解城市交通拥堵和污染。为了在给定的司机和潜在乘客中高效地找到成功的共乘匹配,人们已经设计了许多共乘匹配服务。然而,人们逐渐认识到,除了简单的出行需求之外,许多非金钱因素和社会考量也会影响个人分享行程的意愿,而这些因素难以预测。为了解决这一问题,本研究提出了 GoTogether:一个基于学习排序(learning-to-rank)技术的共乘推荐系统,它根据用户的历史选择(即接受或拒绝过哪些共乘)自动推导出每个用户的个性化排序模型,进而构建推荐行程列表,以最大化所提供匹配的成功率。为了验证方案的性能,我们利用 Twitter 和 Foursquare 的真实数据生成了一个大都市区内合理的出行模式与共乘请求数据集。结果表明,所提方案在静态与动态条件下都能快速获得对用户个性化选择模型的准确预测。
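A minimal sketch of the learning-to-rank ingredient: a RankNet-style pairwise model that learns a linear score from a user's accepted versus rejected rides. The feature names and data are hypothetical; GoTogether's actual personalised ranking model is not reproduced.

```python
import numpy as np

# RankNet-style pairwise learning-to-rank sketch: learn a linear score s(x) = w.x such
# that rides a user accepted are scored above rides she rejected.
def train_pairwise_ranker(accepted, rejected, lr=0.1, epochs=200):
    dim = accepted.shape[1]
    w = np.zeros(dim)
    for _ in range(epochs):
        for xa in accepted:
            for xr in rejected:
                diff = xa - xr
                p = 1.0 / (1.0 + np.exp(-w @ diff))   # P(accepted ranked above rejected)
                w += lr * (1.0 - p) * diff            # gradient ascent on log-likelihood
    return w

# Toy features: [detour_minutes (negated), schedule_overlap, driver_rating]
accepted = np.array([[-5.0, 0.9, 4.5], [-8.0, 0.8, 4.0]])
rejected = np.array([[-25.0, 0.3, 3.5], [-30.0, 0.2, 4.8]])
w = train_pairwise_ranker(accepted, rejected)
candidate_rides = np.array([[-10.0, 0.7, 4.2], [-28.0, 0.4, 4.9]])
print("recommended order:", np.argsort(-(candidate_rides @ w)))
```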

Towards a safe MLOps Process for the Continuous Development and Safety Assurance of ML-based Systems in the Railway Domain

  • paper_url: http://arxiv.org/abs/2307.02867
  • repo_url: None
  • paper_authors: Marc Zeller, Thomas Waschulzik, Reiner Schmid, Claus Bahlmann
  • for: 本文旨在提出一种安全的Machine Learning Operations(MLOps)过程,用于不断开发和安全验证基于机器学习(ML)技术的铁路领域系统。
  • methods: 本文使用了系统工程、安全验证和ML生命周期的组合,实现了一个完整的工作流程。同时,文章还描述了自动化不同阶段的挑战。
  • results: 本文提出了一种安全的MLOps过程,可以帮助实现不断开发和安全验证铁路领域的ML-基于系统。这种过程可以提高系统的可靠性、可重构性和可灵活应用性。
    Abstract Traditional automation technologies alone are not sufficient to enable driverless operation of trains (called Grade of Automation (GoA) 4) on non-restricted infrastructure. The required perception tasks are nowadays realized using Machine Learning (ML) and thus need to be developed and deployed reliably and efficiently. One important aspect to achieve this is to use an MLOps process for tackling improved reproducibility, traceability, collaboration, and continuous adaptation of a driverless operation to changing conditions. MLOps mixes ML application development and operation (Ops) and enables high frequency software releases and continuous innovation based on the feedback from operations. In this paper, we outline a safe MLOps process for the continuous development and safety assurance of ML-based systems in the railway domain. It integrates system engineering, safety assurance, and the ML life-cycle in a comprehensive workflow. We present the individual stages of the process and their interactions. Moreover, we describe relevant challenges to automate the different stages of the safe MLOps process.
    摘要 仅靠传统自动化技术无法在不受限制的基础设施上实现列车的无人驾驶(即4级自动化,GoA 4)。所需的感知任务如今通常通过机器学习(ML)实现,因此需要可靠、高效地开发和部署,并持续适应变化的条件。实现这一点的重要途径是采用 MLOps 过程,以提高可复现性、可追溯性、协作性,并基于运营反馈持续创新。在这篇论文中,我们提出了一种面向铁路领域、用于持续开发和安全验证基于 ML 的系统的安全 MLOps 过程,它将系统工程、安全验证和 ML 生命周期集成到一个完整的工作流中。我们介绍了该过程的各个阶段及其相互作用,并描述了将安全 MLOps 过程各阶段自动化所面临的挑战。

PLIERS: a Popularity-Based Recommender System for Content Dissemination in Online Social Networks

  • paper_url: http://arxiv.org/abs/2307.02865
  • repo_url: None
  • paper_authors: Valerio Arnaboldi, Mattia Giovanni Campana, Franca Delmastro, Elena Pagani
  • for: 这篇论文是为了提出一种新的标签基于推荐系统(PLIERS),该系统基于用户主要关注已经拥有的物品和标签的流行程度来进行推荐。
  • methods: 该论文使用了标签的流行程度作为推荐的依据,并通过一系列实验证明了PLIERS的效果。
  • results: 实验结果表明,PLIERS在算法复杂度与推荐个性化程度之间取得了良好的平衡,其推荐在个性化、相关性和新颖性方面均优于现有方案。
    Abstract In this paper, we propose a novel tag-based recommender system called PLIERS, which relies on the assumption that users are mainly interested in items and tags with similar popularity to those they already own. PLIERS is aimed at reaching a good tradeoff between algorithmic complexity and the level of personalization of recommended items. To evaluate PLIERS, we performed a set of experiments on real OSN datasets, demonstrating that it outperforms state-of-the-art solutions in terms of personalization, relevance, and novelty of recommendations.
    摘要 在这篇论文中,我们提出了一种新的基于标签的推荐系统PLIERS,它假设用户主要关注与其已拥有的物品和标签流行度相近的物品与标签。PLIERS旨在在算法复杂度与推荐个性化程度之间取得良好的平衡。为了评估PLIERS,我们在真实的在线社交网络数据集上进行了一系列实验,结果表明它在个性化、相关性和新颖性方面均优于现有最佳方案。
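A toy sketch of the popularity-matching intuition behind PLIERS: score candidate items by how close the global popularity of their tags is to the popularity level of the tags the user already uses. The scoring formula and data are illustrative assumptions, not the exact PLIERS algorithm.

```python
import numpy as np

# Popularity-matching sketch: items whose tags have popularity close to the user's own
# tag-popularity level rank first. Data and scoring rule are hypothetical.
tag_popularity = {"jazz": 120, "indie": 80, "stadium-rock": 5000, "vinyl": 60}
user_tags = ["jazz", "vinyl"]
items = {
    "item_a": ["indie", "vinyl"],
    "item_b": ["stadium-rock"],
    "item_c": ["jazz", "indie"],
}

user_level = np.mean([tag_popularity[t] for t in user_tags])   # user's typical popularity level

def score(item_tags):
    # higher when the item's tags sit near the user's popularity level
    gaps = [abs(tag_popularity[t] - user_level) for t in item_tags]
    return -float(np.mean(gaps))

ranking = sorted(items, key=lambda i: score(items[i]), reverse=True)
print(ranking)   # items with niche tags similar to the user's rank first
```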

Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation

  • paper_url: http://arxiv.org/abs/2307.02842
  • repo_url: None
  • paper_authors: Yu Chen, Yihan Du, Pihe Hu, Siwei Wang, Desheng Wu, Longbo Huang
  • For: 本研究旨在优化策略,使其在决策过程的每一步都能保证安全性。
  • Methods: 本文提出了一种新的风险敏感强化学习形式,采用迭代条件风险价值(Iterated CVaR)目标,并在线性与一般函数逼近下给出了相应算法。
  • Results: 所提出的ICVaR-L和ICVaR-G算法在不同的维度和回合数下都能取得可证明的遗憾(regret)上界;分析中还引入了若干新技术,如CVaR算子的高效近似、带CVaR适配特征的岭回归,以及改进的椭球位势引理。
    Abstract Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance the expected reward and risk. In this paper, we investigate a novel risk-sensitive RL formulation with an Iterated Conditional Value-at-Risk (CVaR) objective under linear and general function approximations. This new formulation, named ICVaR-RL with function approximation, provides a principled way to guarantee safety at each decision step. For ICVaR-RL with linear function approximation, we propose a computationally efficient algorithm ICVaR-L, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}(d^2H^4+dH^6)K})$ regret, where $\alpha$ is the risk level, $d$ is the dimension of state-action features, $H$ is the length of each episode, and $K$ is the number of episodes. We also establish a matching lower bound $\Omega(\sqrt{\alpha^{-(H-1)}d^2K})$ to validate the optimality of ICVaR-L with respect to $d$ and $K$. For ICVaR-RL with general function approximation, we propose algorithm ICVaR-G, which achieves an $\widetilde{O}(\sqrt{\alpha^{-(H+1)}DH^4K})$ regret, where $D$ is a dimensional parameter that depends on the eluder dimension and covering number. Furthermore, our analysis provides several novel techniques for risk-sensitive RL, including an efficient approximation of the CVaR operator, a new ridge regression with CVaR-adapted features, and a refined elliptical potential lemma.
    摘要 风险敏感强化学习(RL)旨在优化在期望回报与风险之间取得平衡的策略。本文研究了一种新的风险敏感RL形式,即在线性与一般函数逼近下采用迭代条件风险价值(Iterated CVaR)目标。这种被称为带函数逼近的ICVaR-RL的新形式,为在每个决策步骤上保证安全提供了一种有原则的方式。对于线性函数逼近下的ICVaR-RL,我们提出了计算高效的算法ICVaR-L,其遗憾为 $\widetilde{O}(\sqrt{\alpha^{-(H+1)}(d^2H^4+dH^6)K})$,其中 $\alpha$ 为风险水平,$d$ 为状态-动作特征维度,$H$ 为每回合长度,$K$ 为回合数。我们还给出了匹配的下界 $\Omega(\sqrt{\alpha^{-(H-1)}d^2K})$,以验证ICVaR-L关于 $d$ 和 $K$ 的最优性。对于一般函数逼近下的ICVaR-RL,我们提出了算法ICVaR-G,其遗憾为 $\widetilde{O}(\sqrt{\alpha^{-(H+1)}DH^4K})$,其中 $D$ 是依赖于eluder维度和覆盖数的维度参数。此外,我们的分析还提供了若干适用于风险敏感RL的新技术,包括CVaR算子的高效近似、带CVaR适配特征的岭回归,以及改进的椭球位势引理。
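The building block behind the Iterated CVaR objective is the CVaR operator itself; the sketch below computes an empirical CVaR at risk level alpha as the mean of the worst alpha-fraction of sampled returns. The ICVaR-L/ICVaR-G algorithms and their regret analysis are not reproduced here.

```python
import numpy as np

# Empirical Conditional Value-at-Risk at level alpha: the mean of the worst alpha-fraction
# of sampled returns. This is the basic risk operator behind the Iterated CVaR objective.
def empirical_cvar(returns, alpha):
    returns = np.sort(np.asarray(returns, dtype=float))   # ascending: worst outcomes first
    k = max(1, int(np.ceil(alpha * len(returns))))
    return returns[:k].mean()

rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=2.0, size=10_000)      # hypothetical return samples
print("mean return:", returns.mean().round(3))
print("CVaR_0.1   :", empirical_cvar(returns, alpha=0.1).round(3))   # focuses on the worst 10%
```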

Policy Contrastive Imitation Learning

  • paper_url: http://arxiv.org/abs/2307.02829
  • repo_url: None
  • paper_authors: Jialei Huang, Zhaoheng Yin, Yingdong Hu, Yang Gao
  • for: 该论文的目的是提出一种新的仿制学习方法,即Policy Contrastive Imitation Learning(PCIL),以解决现有的仿制学习方法中的一个主要问题,即仿制学习器的表征质量低下。
  • methods: 该论文提出了一种新的仿制学习方法PCIL,该方法通过在不同策略之间固定 anchoring 来学习一个对比性的表征空间,并通过cosine相似性来生成一个平滑的奖励。
  • results: 实验结果表明,PCIL在DeepMind Control套件上达到了最先进的性能;定性结果还表明,PCIL为模仿学习构建了更平滑、更有意义的表示空间。
    Abstract Adversarial imitation learning (AIL) is a popular method that has recently achieved much success. However, the performance of AIL is still unsatisfactory on the more challenging tasks. We find that one of the major reasons is due to the low quality of AIL discriminator representation. Since the AIL discriminator is trained via binary classification that does not necessarily discriminate the policy from the expert in a meaningful way, the resulting reward might not be meaningful either. We propose a new method called Policy Contrastive Imitation Learning (PCIL) to resolve this issue. PCIL learns a contrastive representation space by anchoring on different policies and generates a smooth cosine-similarity-based reward. Our proposed representation learning objective can be viewed as a stronger version of the AIL objective and provide a more meaningful comparison between the agent and the policy. From a theoretical perspective, we show the validity of our method using the apprenticeship learning framework. Furthermore, our empirical evaluation on the DeepMind Control suite demonstrates that PCIL can achieve state-of-the-art performance. Finally, qualitative results suggest that PCIL builds a smoother and more meaningful representation space for imitation learning.
    摘要 对抗模仿学习(AIL)是一种流行的方法,近年来取得了很大成功,但在更具挑战性的任务上表现仍不理想。我们发现主要原因之一在于AIL判别器表示的质量较低:由于AIL判别器通过二元分类训练,并不一定能以有意义的方式区分策略与专家,因而得到的奖励也可能缺乏意义。为此,我们提出了一种新方法——策略对比模仿学习(PCIL)。PCIL以不同策略为锚点学习一个对比表示空间,并基于余弦相似度生成平滑的奖励。所提出的表示学习目标可以视为AIL目标的更强版本,能够在智能体与策略之间提供更有意义的比较。在理论上,我们借助学徒学习(apprenticeship learning)框架证明了该方法的有效性;在实验上,我们在DeepMind Control套件上的评估表明PCIL能够达到最先进的性能。最后,定性结果表明PCIL为模仿学习构建了更平滑、更有意义的表示空间。
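A minimal sketch of the cosine-similarity reward idea: embed states and reward the agent by the cosine similarity between its state embedding and a mean expert embedding. The random-projection encoder is a stand-in assumption; PCIL's policy-anchored contrastive training of the encoder is omitted.

```python
import numpy as np

# Smooth cosine-similarity reward in a representation space: the agent's state embedding is
# compared with the mean expert embedding. The encoder is a hypothetical frozen projection;
# PCIL's contrastive training of that encoder is not shown.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))                       # hypothetical frozen encoder weights

def embed(states):
    z = states @ W                                 # states: (n, 16) -> embeddings: (n, 8)
    return z / np.linalg.norm(z, axis=1, keepdims=True)

expert_states = rng.normal(size=(64, 16))
agent_states = rng.normal(size=(5, 16))

expert_proto = embed(expert_states).mean(axis=0)
expert_proto /= np.linalg.norm(expert_proto)

rewards = embed(agent_states) @ expert_proto       # cosine similarity in [-1, 1]
print(rewards.round(3))
```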

Sampling-based Fast Gradient Rescaling Method for Highly Transferable Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2307.02828
  • repo_url: https://github.com/JHL-HUST/S-FGRM
  • paper_authors: Xu Han, Anmin Liu, Chenxuan Yao, Yanbo Fan, Kun He
  • for: 针对深度神经网络受到黑盒攻击的研究,尤其是黑盒攻击的传输性能。
  • methods: 基于梯度更新的 gradient-based 方法,包括使用 sign 函数生成梯度更新的噪声。
  • results: 提出一种 Sampling-based Fast Gradient Rescaling Method (S-FGRM),可以减少梯度更新的误差并提高黑盒攻击的传输性能。通过数据缩放substitute sign 函数而不需要额外计算成本,并提出了 Depth First Sampling 方法来消除噪声并稳定梯度更新。对于任何 gradient-based 攻击方法,我们的方法都可以用,并且可以与其他输入转换或ensemble方法相结合以进一步提高黑盒攻击的传输性能。在标准 ImageNet 数据集上进行了广泛的实验,并达到了比基eline的state-of-the-art 性能。
    Abstract Deep neural networks are known to be vulnerable to adversarial examples crafted by adding human-imperceptible perturbations to the benign input. After achieving nearly 100% attack success rates in white-box setting, more focus is shifted to black-box attacks, of which the transferability of adversarial examples has gained significant attention. In either case, the common gradient-based methods generally use the sign function to generate perturbations on the gradient update, that offers a roughly correct direction and has gained great success. But little work pays attention to its possible limitation. In this work, we observe that the deviation between the original gradient and the generated noise may lead to inaccurate gradient update estimation and suboptimal solutions for adversarial transferability. To this end, we propose a Sampling-based Fast Gradient Rescaling Method (S-FGRM). Specifically, we use data rescaling to substitute the sign function without extra computational cost. We further propose a Depth First Sampling method to eliminate the fluctuation of rescaling and stabilize the gradient update. Our method could be used in any gradient-based attacks and is extensible to be integrated with various input transformation or ensemble methods to further improve the adversarial transferability. Extensive experiments on the standard ImageNet dataset show that our method could significantly boost the transferability of gradient-based attacks and outperform the state-of-the-art baselines.
    摘要 在这种情况下,我们发现,梯度更新估计中的偏差可能会导致不准确的梯度更新估计,从而导致攻击性能下降。为了解决这个问题,我们提出了一种快速梯度缩放方法(S-FGRM)。具体来说,我们使用数据缩放来取代 sign 函数,而不需要额外的计算成本。此外,我们还提出了深度优先采样方法,以消除缩放的摆动,稳定梯度更新。我们的方法可以在任何梯度基本攻击中使用,并可以与不同的输入变换或集成方法结合使用,以进一步提高攻击性能。我们在标准 ImageNet 数据集上进行了广泛的实验,发现我们的方法可以在攻击性能上提高很多,并超越当前的基eline。

  • paper_url: http://arxiv.org/abs/2307.02819
  • repo_url: None
  • paper_authors: Nathan Koome Murungi, Michael Vinh Pham, Xufeng Dai, Xiaodong Qu
  • for: 本文是一篇系统性文献综述,探讨了在机器学习上的脑机器接口(BCI)研究,尤其是使用电энцеfalography(EEG)进行的研究。
  • methods: 本文使用了最新的研究方法和算法,包括EEG数据采集、数据处理和分析等。
  • results: 本文对BCI研究进行了系统性的总结和分析,提供了最新的发现和探讨,并对未来的研究预测了一些有前途的方向。
    Abstract This paper presents a systematic literature review on Brain-Computer Interfaces (BCIs) in the context of Machine Learning. Our focus is on Electroencephalography (EEG) research, highlighting the latest trends as of 2023. The objective is to provide undergraduate researchers with an accessible overview of the BCI field, covering tasks, algorithms, and datasets. By synthesizing recent findings, our aim is to offer a fundamental understanding of BCI research, identifying promising avenues for future investigations.
    摘要 这篇论文对机器学习背景下的脑机接口(BCI)研究进行了系统性的文献综述,重点关注脑电图(EEG)研究并总结截至2023年的最新趋势。其目标是为本科研究者提供一个易于入门的BCI领域概览,涵盖任务、算法与数据集;通过综合近期研究成果,帮助读者建立对BCI研究的基本认识,并指出若干有前景的未来研究方向。

CPDG: A Contrastive Pre-Training Method for Dynamic Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.02813
  • repo_url: None
  • paper_authors: Yuanchen Bei, Hao Xu, Sheng Zhou, Huixuan Chi, Haishuai Wang, Mengdi Zhang, Zhao Li, Jiajun Bu
  • for: 本研究旨在提高动态图神经网络(DGNN)在实际场景中的实用应用。
  • methods: 该研究提出了一种名为Contrastive Pre-Training Method for Dynamic Graph Neural Networks(CPDG),通过灵活的结构-时间子图采样器和结构-时间对照预训练方案来解决DGNN预训练中的总化能力和长短期模型能力问题。
  • results: 对于不同的下游任务和三种传输设置,CPDG在大规模的研究和实际动态图数据集上进行了广泛的实验,并显示了与现有方法相比的显著性提高。
    Abstract Dynamic graph data mining has gained popularity in recent years due to the rich information contained in dynamic graphs and their widespread use in the real world. Despite the advances in dynamic graph neural networks (DGNNs), the rich information and diverse downstream tasks have posed significant difficulties for the practical application of DGNNs in industrial scenarios. To this end, in this paper, we propose to address them by pre-training and present the Contrastive Pre-Training Method for Dynamic Graph Neural Networks (CPDG). CPDG tackles the challenges of pre-training for DGNNs, including generalization capability and long-short term modeling capability, through a flexible structural-temporal subgraph sampler along with structural-temporal contrastive pre-training schemes. Extensive experiments conducted on both large-scale research and industrial dynamic graph datasets show that CPDG outperforms existing methods in dynamic graph pre-training for various downstream tasks under three transfer settings.
    摘要 动态图数据挖掘在最近几年内得到了广泛应用,这是因为动态图中含有丰富的信息和在实际世界中的广泛使用。尽管动态图神经网络(DGNN)得到了进步,但是动态图信息的丰富和多样化下游任务却对DGNN在工业场景中的实际应用带来了很大的挑战。为了解决这些挑战,在本文中,我们提出了一种名为面向动态图神经网络的对比预训练方法(CPDG)。CPDG通过灵活的结构-时间子图采样器和结构-时间对比预训练方案,解决了DGNN预训练中的通用能力和长短期建模能力问题。在大规模的研究和工业动态图数据集上进行了广泛的实验,结果表明CPDG在三种迁移设置下的多种下游任务中均优于现有方法。

OLR-WA Online Regression with Weighted Average

  • paper_url: http://arxiv.org/abs/2307.02804
  • repo_url: None
  • paper_authors: Mohammad Abu-Shaira, Greg Speegle
  • for: 这个论文是为了解决机器学习模型建立的问题,即需要大量的训练数据来建立准确的模型。
  • methods: 这个论文提出了一种新的在线学习方法,即OLR-WA(在线回归Weighted Average)方法,该方法可以在新的数据陆续到达时,不需要重新计算整个模型,而是可以逐步更新模型,并且可以根据用户定义的权重来偏好新数据或旧数据。
  • results: 在2D和3D的实验中,OLR-WA方法与整个数据集的静止批处理模型的性能相似,而且可以根据用户设置的权重来控制OLR-WA方法是否更快地适应变化或者更慢地抵抗变化。
    Abstract Machine Learning requires a large amount of training data in order to build accurate models. Sometimes the data arrives over time, requiring significant storage space and recalculating the model to account for the new data. On-line learning addresses these issues by incrementally modifying the model as data is encountered, and then discarding the data. In this study we introduce a new online linear regression approach. Our approach combines newly arriving data with a previously existing model to create a new model. The introduced model, named OLR-WA (OnLine Regression with Weighted Average) uses user-defined weights to provide flexibility in the face of changing data to bias the results in favor of old or new data. We have conducted 2-D and 3-D experiments comparing OLR-WA to a static model using the entire data set. The results show that for consistent data, OLR-WA and the static batch model perform similarly and for varying data, the user can set the OLR-WA to adapt more quickly or to resist change.
    摘要
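A minimal sketch of the weighted-average idea behind OLR-WA, assuming the simplest possible instantiation: fit ordinary least squares on each incoming batch and blend the batch coefficients with the running model using user-defined weights. This illustrates the idea only; the paper's exact update rule may differ.

```python
import numpy as np

# Weighted-average online regression sketch: blend the OLS fit on each new batch with the
# running coefficients (w_old favours stability, w_new favours fast adaptation).
def olr_wa(batches, w_old=0.7, w_new=0.3):
    coef = None
    for X, y in batches:
        Xb = np.hstack([X, np.ones((len(X), 1))])          # add intercept column
        batch_coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        coef = batch_coef if coef is None else w_old * coef + w_new * batch_coef
    return coef

# Hypothetical stream: y = 2x + 1 with noise, arriving in batches.
rng = np.random.default_rng(0)
batches = []
for _ in range(10):
    X = rng.uniform(-1, 1, size=(50, 1))
    y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=50)
    batches.append((X, y))
print(olr_wa(batches).round(2))   # approximately [2.0, 1.0]
```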

Few-Shot Personalized Saliency Prediction Using Tensor Regression for Preserving Structural Global Information

  • paper_url: http://arxiv.org/abs/2307.02799
  • repo_url: None
  • paper_authors: Yuya Moroto, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama
  • for: 预测个性化注意力图(PSM),以保留图像的结构信息。
  • methods: 使用tensor-to-matrix regression模型,以保持图像的结构信息。
  • results: 在实验结果中,提出的方法比前期方法更高的预测精度。
    Abstract This paper presents a few-shot personalized saliency prediction using tensor-to-matrix regression for preserving the structural global information of personalized saliency maps (PSMs). In contrast to a general saliency map, a PSM has been great potential since its map indicates the person-specific visual attention that is useful for obtaining individual visual preferences from heterogeneity of gazed areas. The PSM prediction is needed for acquiring the PSM for the unseen image, but its prediction is still a challenging task due to the complexity of individual gaze patterns. For recognizing individual gaze patterns from the limited amount of eye-tracking data, the previous methods adopt the similarity of gaze tendency between persons. However, in the previous methods, the PSMs are vectorized for the prediction model. In this way, the structural global information of the PSMs corresponding to the image is ignored. For automatically revealing the relationship between PSMs, we focus on the tensor-based regression model that can preserve the structural information of PSMs, and realize the improvement of the prediction accuracy. In the experimental results, we confirm the proposed method including the tensor-based regression outperforms the comparative methods.
    摘要 Previous methods have used similarity of gaze tendencies between persons to recognize individual patterns, but these methods vectorize PSMs for the prediction model, ignoring the structural global information of the PSMs corresponding to the image. In this study, we focus on a tensor-based regression model that can preserve the structural information of PSMs, and demonstrate improved prediction accuracy.Experimental results confirm that the proposed method, including tensor-based regression, outperforms comparative methods. By automatically revealing the relationship between PSMs, our method improves the accuracy of personalized saliency prediction.

VerifAI: Verified Generative AI

  • paper_url: http://arxiv.org/abs/2307.02796
  • repo_url: None
  • paper_authors: Nan Tang, Chenyu Yang, Ju Fan, Lei Cao
  • for: 提高生成AI的准确性和可靠性
  • methods: 通过多模式数据湖的数据分析和评估,确保生成AI输出的正确性
  • results: 提高生成AI的可靠性和 trasparency,促进决策的可信度
    Abstract Generative AI has made significant strides, yet concerns about the accuracy and reliability of its outputs continue to grow. Such inaccuracies can have serious consequences such as inaccurate decision-making, the spread of false information, privacy violations, legal liabilities, and more. Although efforts to address these risks are underway, including explainable AI and responsible AI practices such as transparency, privacy protection, bias mitigation, and social and environmental responsibility, misinformation caused by generative AI will remain a significant challenge. We propose that verifying the outputs of generative AI from a data management perspective is an emerging issue for generative AI. This involves analyzing the underlying data from multi-modal data lakes, including text files, tables, and knowledge graphs, and assessing its quality and consistency. By doing so, we can establish a stronger foundation for evaluating the outputs of generative AI models. Such an approach can ensure the correctness of generative AI, promote transparency, and enable decision-making with greater confidence. Our vision is to promote the development of verifiable generative AI and contribute to a more trustworthy and responsible use of AI.
    摘要

The Role of Subgroup Separability in Group-Fair Medical Image Classification

  • paper_url: http://arxiv.org/abs/2307.02791
  • repo_url: https://github.com/biomedia-mira/subgroup-separability
  • paper_authors: Charles Jones, Mélanie Roschewitz, Ben Glocker
  • for: 这个论文探讨了深度分类器的性能差异。
  • methods: 该论文使用了 teoretic analysis和广泛的实验evaluation来研究深度分类器在不同医疗影像Modalities和保护特征下的性能差异。
  • results: 研究发现,深度分类器在不同医疗影像Modalities和保护特征下的能力将个体分为子群变化很大,而且这种性能差异与模型受到系统性偏见的情况有关。这些发现为开发公正医疗AI提供了重要的新视角。
    Abstract We investigate performance disparities in deep classifiers. We find that the ability of classifiers to separate individuals into subgroups varies substantially across medical imaging modalities and protected characteristics; crucially, we show that this property is predictive of algorithmic bias. Through theoretical analysis and extensive empirical evaluation, we find a relationship between subgroup separability, subgroup disparities, and performance degradation when models are trained on data with systematic bias such as underdiagnosis. Our findings shed new light on the question of how models become biased, providing important insights for the development of fair medical imaging AI.
    摘要 我们研究深度分类器的性能差异,发现分类器将个体划分为不同子群的能力在不同医学影像模态和受保护特征之间差异很大;关键在于,这一性质能够预测算法偏差。通过理论分析和大量实验评估,我们发现当模型在带有系统性偏差(如漏诊)的数据上训练时,子群可分性、子群差异与性能下降三者之间存在关联。我们的发现为“模型如何变得有偏”这一问题提供了新的视角,也为开发公平的医学影像AI提供了重要启示。

Large Language Models Empowered Autonomous Edge AI for Connected Intelligence

  • paper_url: http://arxiv.org/abs/2307.02779
  • repo_url: None
  • paper_authors: Yifei Shen, Jiawei Shao, Xinjie Zhang, Zehong Lin, Hao Pan, Dongsheng Li, Jun Zhang, Khaled B. Letaief
  • for: 这篇论文旨在实现连接智能,即在无线网络中快速、低延迟、隐私保护的人工智能服务。
  • methods: 该系统使用云-边缘-客户端层次架构,其中大型自然语言模型(生成式预训练变换器)在云服务器上运行,而其他AI模型在设备和边缘服务器上并行部署。
  • results: 实验结果显示该系统可以准确理解用户需求,高效执行AI模型,并通过边缘联合学习生成高性能AI模型。
    Abstract The evolution of wireless networks gravitates towards connected intelligence, a concept that envisions seamless interconnectivity among humans, objects, and intelligence in a hyper-connected cyber-physical world. Edge AI emerges as a promising solution to achieve connected intelligence by delivering high-quality, low-latency, and privacy-preserving AI services at the network edge. In this article, we introduce an autonomous edge AI system that automatically organizes, adapts, and optimizes itself to meet users' diverse requirements. The system employs a cloud-edge-client hierarchical architecture, where the large language model, i.e., Generative Pretrained Transformer (GPT), resides in the cloud, and other AI models are co-deployed on devices and edge servers. By leveraging the powerful abilities of GPT in language understanding, planning, and code generation, we present a versatile framework that efficiently coordinates edge AI models to cater to users' personal demands while automatically generating code to train new models via edge federated learning. Experimental results demonstrate the system's remarkable ability to accurately comprehend user demands, efficiently execute AI models with minimal cost, and effectively create high-performance AI models through federated learning.
    摘要 无线网络的演进正朝着“连接智能”发展:在一个超连接的信息物理世界中,实现人、物与智能之间的无缝互联。边缘AI通过在网络边缘提供高质量、低时延且保护隐私的AI服务,成为实现连接智能的一条有希望的途径。本文提出了一种自主边缘AI系统,能够自动地组织、适应和优化自身,以满足用户多样化的需求。该系统采用云-边-端分层架构:大语言模型(即生成式预训练Transformer,GPT)驻留在云端,其他AI模型则协同部署在终端设备与边缘服务器上。借助GPT在语言理解、规划与代码生成方面的强大能力,我们提出了一个通用框架,可高效协调边缘AI模型以满足用户的个性化需求,并自动生成代码、通过边缘联邦学习训练新模型。实验结果表明,该系统能够准确理解用户需求、以极低成本高效执行AI模型,并通过联邦学习有效构建高性能AI模型。

Temporal Difference Learning for High-Dimensional PIDEs with Jumps

  • paper_url: http://arxiv.org/abs/2307.02766
  • repo_url: None
  • paper_authors: Liwei Lu, Hailong Guo, Xu Yang, Yi Zhu
  • for: 解决高维度partial integro-differential equations (PIDEs) 的深度学习框架。
  • methods: 基于 temporal difference learning 的 Levy 过程集和对应的强化学习模型。
  • results: 在100维实验中相对误差达到 O(10^{-3}),在一维纯跳问题中达到 O(10^{-4})。此外,该方法计算成本低且稳健,适用于具有不同形式和强度跳跃的问题。
    Abstract In this paper, we propose a deep learning framework for solving high-dimensional partial integro-differential equations (PIDEs) based on the temporal difference learning. We introduce a set of Levy processes and construct a corresponding reinforcement learning model. To simulate the entire process, we use deep neural networks to represent the solutions and non-local terms of the equations. Subsequently, we train the networks using the temporal difference error, termination condition, and properties of the non-local terms as the loss function. The relative error of the method reaches O(10^{-3}) in 100-dimensional experiments and O(10^{-4}) in one-dimensional pure jump problems. Additionally, our method demonstrates the advantages of low computational cost and robustness, making it well-suited for addressing problems with different forms and intensities of jumps.
    摘要 在这篇论文中,我们提出了一种深度学习框架,用于解决高维partial integro-differential equations(PIDEs)。我们引入了一组Levy进程,并构建了相应的回归学习模型。为了模拟整个过程,我们使用深度神经网络来表示方程的解和非本地项。然后,我们使用时间差错、终止条件和非本地项的性质作为损失函数进行训练。在100维实验中,我们达到了相对误差为O(10^{-3}),而在一维纯跳问题中,我们达到了O(10^{-4})。此外,我们的方法还具有低计算成本和稳定性,使其适用于不同形式和强度的跳跃问题。

When Does Confidence-Based Cascade Deferral Suffice?

  • paper_url: http://arxiv.org/abs/2307.02764
  • repo_url: None
  • paper_authors: Wittawat Jitkrittum, Neha Gupta, Aditya Krishna Menon, Harikrishna Narasimhan, Ankit Singh Rawat, Sanjiv Kumar
  • for: 本文研究了 cascade 模型中的推论成本可以适应性地变化的策略,特别是使用 confidence 来决定是否继续预测。
  • methods: 本文使用了一种简单的推论规则,即根据当前分类器的信任度来决定是否继续预测。然而,这种信任度基于的推论规则并不考虑 cascade 的结构,例如下游模型的错误。
  • results: 本文发现在某些情况下,信任度基于的推论规则可能会失败,而 alternate 推论策略可以在这些情况下表现更好。本文首先提出了一种理论性的最佳推论规则,然后研究了后备推论机制,并证明它们可以在某些情况下大幅提高推论性能。
    Abstract Cascades are a classical strategy to enable inference cost to vary adaptively across samples, wherein a sequence of classifiers are invoked in turn. A deferral rule determines whether to invoke the next classifier in the sequence, or to terminate prediction. One simple deferral rule employs the confidence of the current classifier, e.g., based on the maximum predicted softmax probability. Despite being oblivious to the structure of the cascade -- e.g., not modelling the errors of downstream models -- such confidence-based deferral often works remarkably well in practice. In this paper, we seek to better understand the conditions under which confidence-based deferral may fail, and when alternate deferral strategies can perform better. We first present a theoretical characterisation of the optimal deferral rule, which precisely characterises settings under which confidence-based deferral may suffer. We then study post-hoc deferral mechanisms, and demonstrate they can significantly improve upon confidence-based deferral in settings where (i) downstream models are specialists that only work well on a subset of inputs, (ii) samples are subject to label noise, and (iii) there is distribution shift between the train and test set.
    摘要 级联(cascade)是一种经典策略,可让推理成本随样本自适应变化:依次调用一串分类器,由一条“延迟规则”决定是继续调用序列中的下一个分类器,还是终止并给出预测。一种简单的延迟规则基于当前分类器的置信度,例如最大softmax预测概率。尽管这种基于置信度的延迟并不考虑级联的结构(例如不建模下游模型的错误),它在实践中往往表现得出奇地好。本文旨在更好地理解基于置信度的延迟在何种条件下会失效,以及何时其他延迟策略能表现更好。我们首先给出最优延迟规则的理论刻画,精确描述了基于置信度的延迟可能受损的情形;随后研究了事后(post-hoc)延迟机制,并证明在以下情形中它们能显著优于基于置信度的延迟:(i)下游模型是只在部分输入上表现良好的“专家”;(ii)样本存在标签噪声;(iii)训练集与测试集之间存在分布偏移。
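For concreteness, a minimal sketch of the confidence-based deferral rule the paper analyses: run the cheap model first and defer to the larger model only when the maximum softmax probability falls below a threshold. Both models below are hypothetical stand-ins.

```python
import numpy as np

# Confidence-based deferral in a two-model cascade: use the cheap model's prediction when
# its maximum softmax probability clears a threshold, otherwise defer to the larger model.
def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cascade_predict(x, small_model, large_model, threshold=0.8):
    p_small = softmax(small_model(x))
    if p_small.max() >= threshold:                 # confident enough: stop early
        return int(p_small.argmax()), "small"
    p_large = softmax(large_model(x))              # otherwise pay for the big model
    return int(p_large.argmax()), "large"

rng = np.random.default_rng(0)
W_small = rng.normal(size=(4, 3))                  # toy linear scorers as stand-in models
W_large = rng.normal(size=(4, 3)) * 3.0            # sharper, "more accurate" scorer
small_model = lambda x: x @ W_small
large_model = lambda x: x @ W_large

for _ in range(3):
    x = rng.normal(size=4)
    print(cascade_predict(x, small_model, large_model))
```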

Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages

  • paper_url: http://arxiv.org/abs/2307.03679
  • repo_url: None
  • paper_authors: Shreyanth S
  • for: 提高安全措施和多语言噪声纠正
  • methods: 结合不减波лет变换和Word Embedded Semantic Marginal Autoencoder (WESMA)
  • results: 成功提高多语言安全性和数据质量
    Abstract By combining the undecimated wavelet transform within a Word Embedded Semantic Marginal Autoencoder (WESMA), this research study provides a novel strategy for improving security measures and denoising multiple languages. The incorporation of these strategies is intended to address the issues of robustness, privacy, and multilingualism in data processing applications. The undecimated wavelet transform is used as a feature extraction tool to identify prominent language patterns and structural qualities in the input data. The proposed system may successfully capture significant information while preserving the temporal and geographical links within the data by employing this transform. This improves security measures by increasing the system's ability to detect abnormalities, discover hidden patterns, and distinguish between legitimate content and dangerous threats. The Word Embedded Semantic Marginal Autoencoder also functions as an intelligent framework for dimensionality and noise reduction. The autoencoder effectively learns the underlying semantics of the data and reduces noise components by exploiting word embeddings and semantic context. As a result, data quality and accuracy are increased in following processing stages. The suggested methodology is tested using a diversified dataset that includes several languages and security scenarios. The experimental results show that the proposed approach is effective in attaining security enhancement and denoising capabilities across multiple languages. The system is strong in dealing with linguistic variances, producing consistent outcomes regardless of the language used. Furthermore, incorporating the undecimated wavelet transform considerably improves the system's ability to efficiently address complex security concerns
    摘要 这个研究使用Word Embedded Semantic Marginal Autoencoder(WESMA)与非抽取小波变换(Undecimated Wavelet Transform)的结合,提供了一个用于提升安全措施并对多种语言进行去噪的新策略。这些策略的目的是解决资料处理应用中的鲁棒性、隐私和多语言支持问题。非抽取小波变换用于特征提取,以找出输入数据中的主要语言模式和结构特征;系统可以借此成功地捕捉重要信息,并保持资料中的时间和地理连结,从而提高安全措施,增强系统检测异常、发现隐藏模式以及区分合法内容与危险威胁的能力。Word Embedded Semantic Marginal Autoencoder 同时作为降维与去噪的智能框架:它利用词嵌入与语义上下文学习资料的底层语义并减少噪声成分,因此后续处理阶段的数据质量和准确性得以提升。实验在包含多种语言与安全场景的多样化数据集上进行,结果显示所提方法能够在多种语言下实现安全增强与去噪能力,对语言差异具有稳健性,且引入非抽取小波变换后,系统处理复杂安全问题的能力显著提升。

Optimal Bandwidth Selection for DENCLUE Algorithm

  • paper_url: http://arxiv.org/abs/2307.03206
  • repo_url: None
  • paper_authors: Hao Wang
  • for: 本研究旨在提出一种新的参数选择方法,以提高density-based clustering算法DENCLUE的性能。
  • methods: 本研究使用了一种新的参数选择方法,基于density-based clustering算法DENCLUE。
  • results: 实验结果表明,新的参数选择方法可以提高density-based clustering算法DENCLUE的性能。
    Abstract In modern day industry, clustering algorithms are daily routines of algorithm engineers. Although clustering algorithms experienced rapid growth before 2010. Innovation related to the research topic has stagnated after deep learning became the de facto industrial standard for machine learning applications. In 2007, a density-based clustering algorithm named DENCLUE was invented to solve clustering problem for nonlinear data structures. However, its parameter selection problem was largely neglected until 2011. In this paper, we propose a new approach to compute the optimal parameters for the DENCLUE algorithm, and discuss its performance in the experiment section.
    摘要 在现代工业中,聚类算法是算法工程师的日常工具。尽管聚类算法在2010年之前经历了快速发展,但自深度学习成为机器学习应用事实上的工业标准之后,这一研究方向的创新便陷入停滞。2007年,为解决非线性数据结构上的聚类问题,人们提出了一种基于密度的聚类算法DENCLUE;然而其参数选择问题直到2011年才得到关注。本文提出了一种计算DENCLUE算法最优参数的新方法,并在实验部分讨论了其性能。
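DENCLUE hill-climbs a kernel density estimate, so the bandwidth is its key parameter. The abstract does not spell out the proposed selection rule, so as a baseline illustration the sketch below uses Silverman's rule of thumb for a one-dimensional Gaussian KDE; this is not the paper's method.

```python
import numpy as np

# Baseline bandwidth heuristic (Silverman's rule of thumb) for a 1-D Gaussian KDE, the kind
# of density estimate DENCLUE climbs. The paper's own selection method is not reproduced.
def silverman_bandwidth(x):
    n = len(x)
    sigma = min(np.std(x, ddof=1), (np.percentile(x, 75) - np.percentile(x, 25)) / 1.349)
    return 0.9 * sigma * n ** (-0.2)

def gaussian_kde(x, grid, h):
    diffs = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * diffs**2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 300)])  # two clusters
h = silverman_bandwidth(x)
grid = np.linspace(-5, 7, 200)
density = gaussian_kde(x, grid, h)
print("bandwidth:", round(h, 3), "| density peaks near:", grid[density.argmax()].round(2))
```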

Offline Reinforcement Learning with Imbalanced Datasets

  • paper_url: http://arxiv.org/abs/2307.02752
  • repo_url: None
  • paper_authors: Li Jiang, Sijie Chen, Jielin Qiu, Haoran Xu, Wai Kin Chan, Zhao Ding
  • for: 本研究旨在解决现有的离线强化学习(RL)研究中的数据不均衡问题,即在实际世界中的数据分布不均衡。
  • methods: 本研究使用了增强的CQL方法,通过记忆过程来恢复过去相关的经验,以解决离线RL中数据不均衡的挑战。
  • results: 对于具有不同水平的不均衡数据集,我们的方法比基线方法表现出色,获得了更高的性能。
    Abstract The prevalent use of benchmarks in current offline reinforcement learning (RL) research has led to a neglect of the imbalance of real-world dataset distributions in the development of models. The real-world offline RL dataset is often imbalanced over the state space due to the challenge of exploration or safety considerations. In this paper, we specify properties of imbalanced datasets in offline RL, where the state coverage follows a power law distribution characterized by skewed policies. Theoretically and empirically, we show that typically offline RL methods based on distributional constraints, such as conservative Q-learning (CQL), are ineffective in extracting policies under the imbalanced dataset. Inspired by natural intelligence, we propose a novel offline RL method that utilizes the augmentation of CQL with a retrieval process to recall past related experiences, effectively alleviating the challenges posed by imbalanced datasets. We evaluate our method on several tasks in the context of imbalanced datasets with varying levels of imbalance, utilizing the variant of D4RL. Empirical results demonstrate the superiority of our method over other baselines.
    摘要 当前离线强化学习(RL)研究普遍依赖基准数据集,导致模型开发中忽视了真实世界数据分布不均衡的问题。由于探索困难或安全方面的考虑,真实世界的离线RL数据集往往在状态空间上分布不均。本文刻画了离线RL中不均衡数据集的性质:其状态覆盖率服从由偏斜策略刻画的幂律分布。我们在理论与实验上表明,基于分布约束的典型离线RL方法(如保守Q学习,CQL)在不均衡数据集上难以有效提取策略。受自然智能的启发,我们提出了一种新的离线RL方法:在CQL的基础上引入检索过程以回忆过去的相关经验,从而有效缓解不均衡数据集带来的挑战。我们使用D4RL的变体,在具有不同不均衡程度的数据集和多个任务上进行了评估,实验结果表明我们的方法优于其他基线。

Evaluating the Evaluators: Are Current Few-Shot Learning Benchmarks Fit for Purpose?

  • paper_url: http://arxiv.org/abs/2307.02732
  • repo_url: None
  • paper_authors: Luísa Shimabucoro, Timothy Hospedales, Henry Gouk
  • for: 本研究旨在investigate task-level evaluation in few-shot learning, 以提供更加可靠的模型评估方法。
  • methods: 本文使用cross-validation with a low number of folds和bootstrapping来评估模型性能,并考虑了多种模型选择策略。
  • results: 研究发现,使用cross-validation with a low number of folds可以直接 estimating model performance,而使用bootstrapping或cross-validation with a large number of folds更适合用于模型选择。总之,现有的几个shot learning benchmarks不适合用于评估单个任务的模型性能。
    Abstract Numerous benchmarks for Few-Shot Learning have been proposed in the last decade. However all of these benchmarks focus on performance averaged over many tasks, and the question of how to reliably evaluate and tune models trained for individual tasks in this regime has not been addressed. This paper presents the first investigation into task-level evaluation -- a fundamental step when deploying a model. We measure the accuracy of performance estimators in the few-shot setting, consider strategies for model selection, and examine the reasons for the failure of evaluators usually thought of as being robust. We conclude that cross-validation with a low number of folds is the best choice for directly estimating the performance of a model, whereas using bootstrapping or cross validation with a large number of folds is better for model selection purposes. Overall, we find that existing benchmarks for few-shot learning are not designed in such a way that one can get a reliable picture of how effectively methods can be used on individual tasks.
    摘要 过去十年中,人们提出了许多少样本学习(few-shot learning)基准,但这些基准都只关注在大量任务上取平均的性能,而如何在这一设置下可靠地评估和调优针对单个任务训练的模型,这一问题尚未得到解决。本文首次研究了任务级评估——部署模型时的一个基本步骤。我们测量了少样本设置下各种性能估计器的准确性,考察了模型选择策略,并分析了通常被认为稳健的评估器为何会失效。我们的结论是:直接估计模型性能时,折数较少的交叉验证是最佳选择;而用于模型选择时,自助法(bootstrapping)或折数较多的交叉验证更为合适。总体而言,现有的少样本学习基准并未被设计成能够可靠地反映各方法在单个任务上的实际效果。
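A toy illustration of task-level evaluation on a single few-shot task, contrasting the two estimators discussed in the paper: k-fold cross-validation with a low number of folds versus bootstrap resampling. The nearest-centroid classifier and synthetic data are assumptions for illustration.

```python
import numpy as np

# Task-level evaluation sketch for one few-shot task: estimate the accuracy of a toy
# nearest-centroid classifier on a small support set with (i) 4-fold CV and (ii) bootstrap.
def nearest_centroid_acc(train_x, train_y, test_x, test_y):
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in np.unique(train_y)])
    preds = np.argmin(((test_x[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    return float(np.mean(np.unique(train_y)[preds] == test_y))

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(c, 1.0, size=(10, 5)) for c in (0.0, 2.0)])   # 2-way 10-shot
y = np.repeat([0, 1], 10)

# (i) k-fold CV with a low number of folds
k, idx = 4, rng.permutation(len(x))
folds = np.array_split(idx, k)
cv = [nearest_centroid_acc(np.delete(x, f, 0), np.delete(y, f), x[f], y[f]) for f in folds]

# (ii) bootstrap: train on a resample, evaluate on the left-out (out-of-bag) points
boot = []
for _ in range(200):
    b = rng.integers(0, len(x), size=len(x))
    oob = np.setdiff1d(np.arange(len(x)), b)
    if len(oob) and len(np.unique(y[b])) == 2:
        boot.append(nearest_centroid_acc(x[b], y[b], x[oob], y[oob]))

print("4-fold CV estimate :", round(float(np.mean(cv)), 3))
print("bootstrap estimate :", round(float(np.mean(boot)), 3))
```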

Hierarchical Empowerment: Towards Tractable Empowerment-Based Skill-Learning

  • paper_url: http://arxiv.org/abs/2307.02728
  • repo_url: None
  • paper_authors: Andrew Levy, Sreehari Rammohan, Alessandro Allievi, Scott Niekum, George Konidaris
  • for: 本研究旨在开发一种新的框架,以便更加可 tractable 地学习大量独特技能。
  • methods: 本研究使用 Goal-Conditioned Hierarchical Reinforcement Learning 概念,并提出了一种新的约束下降 bounds 方法,以便更加准确地计算 Empowerment。此外,本研究还提出了一种层次结构,用于计算 Empowerment over exponentially longer time scales。
  • results: 在一系列模拟机器人任务中,我们的四层智能体能够学习覆盖面积比先前工作大两个数量级以上的技能。
    Abstract General purpose agents will require large repertoires of skills. Empowerment -- the maximum mutual information between skills and the states -- provides a pathway for learning large collections of distinct skills, but mutual information is difficult to optimize. We introduce a new framework, Hierarchical Empowerment, that makes computing empowerment more tractable by integrating concepts from Goal-Conditioned Hierarchical Reinforcement Learning. Our framework makes two specific contributions. First, we introduce a new variational lower bound on mutual information that can be used to compute empowerment over short horizons. Second, we introduce a hierarchical architecture for computing empowerment over exponentially longer time scales. We verify the contributions of the framework in a series of simulated robotics tasks. In a popular ant navigation domain, our four level agents are able to learn skills that cover a surface area over two orders of magnitude larger than prior work.
    摘要 通用智能体需要庞大的技能库。赋能(empowerment)——技能与状态之间的最大互信息——为学习大量彼此不同的技能提供了一条途径,但互信息难以优化。我们提出了一个新框架“层次赋能”(Hierarchical Empowerment),它借鉴目标条件的层次强化学习的思想,使赋能的计算更加可行。该框架有两点具体贡献:其一,提出了互信息的一个新的变分下界,可用于计算短时间范围内的赋能;其二,提出了一种层次架构,可在指数级更长的时间尺度上计算赋能。我们在一系列模拟机器人任务中验证了该框架的贡献:在流行的蚂蚁导航场景中,我们的四层智能体能够学习覆盖面积比先前工作大两个数量级以上的技能。

Through the Fairness Lens: Experimental Analysis and Evaluation of Entity Matching

  • paper_url: http://arxiv.org/abs/2307.02726
  • repo_url: https://github.com/uic-indexlab/fair_entity_matching
  • paper_authors: Nima Shahbazi, Nikola Danevski, Fatemeh Nargesian, Abolfazl Asudeh, Divesh Srivastava
  • for: This paper aims to address the fairness of entity matching (EM) techniques, which have not been well-studied despite extensive research on algorithmic fairness.
  • methods: The authors perform an extensive experimental evaluation of various EM techniques using two social datasets generated from publicly available datasets.
  • results: The authors find that EM techniques can be unfair under certain conditions, such as when some demographic groups are overrepresented or when names are more similar in some groups compared to others. They also find that certain fairness definitions, such as positive predictive value parity and true positive rate parity, are more capable of revealing EM unfairness due to the class imbalance nature of EM.
    Abstract Entity matching (EM) is a challenging problem studied by different communities for over half a century. Algorithmic fairness has also become a timely topic to address machine bias and its societal impacts. Despite extensive research on these two topics, little attention has been paid to the fairness of entity matching. Towards addressing this gap, we perform an extensive experimental evaluation of a variety of EM techniques in this paper. We generated two social datasets from publicly available datasets for the purpose of auditing EM through the lens of fairness. Our findings underscore potential unfairness under two common conditions in real-world societies: (i) when some demographic groups are overrepresented, and (ii) when names are more similar in some groups compared to others. Among our many findings, it is noteworthy to mention that while various fairness definitions are valuable for different settings, due to EM's class imbalance nature, measures such as positive predictive value parity and true positive rate parity are, in general, more capable of revealing EM unfairness.
    摘要 实体匹配(EM)是一个具有挑战性的问题,不同研究社区已经对其研究了半个多世纪;与此同时,算法公平性也成为应对机器偏差及其社会影响的热点话题。尽管这两个主题都有大量研究,但实体匹配的公平性却鲜有人关注。为弥补这一空白,本文对多种EM技术进行了广泛的实验评估。我们基于公开数据集构建了两个社会数据集,用于从公平性视角审视EM。我们的结果揭示了现实社会中两种常见情形下可能存在的不公平:(i)某些人口群体在数据中占比过高;(ii)某些群体内的姓名彼此更为相似。在诸多发现中值得一提的是:尽管不同的公平性定义适用于不同场景,但由于EM天然存在类别不均衡,正预测值均等(positive predictive value parity)和真正例率均等(true positive rate parity)等度量通常更能揭示EM的不公平。
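A small sketch of the kind of group-fairness audit described above: compute true-positive-rate (equal opportunity) and positive-predictive-value parity gaps for a matcher's predictions across two demographic groups. The predictions, labels, and group assignments are hypothetical.

```python
import numpy as np

# Audit a matcher's binary predictions through the fairness lens: TPR and PPV parity gaps
# across two demographic groups, on hypothetical class-imbalanced data.
def tpr(y_true, y_pred):
    pos = y_true == 1
    return y_pred[pos].mean() if pos.any() else np.nan

def ppv(y_true, y_pred):
    flagged = y_pred == 1
    return y_true[flagged].mean() if flagged.any() else np.nan

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)
y_true = rng.binomial(1, 0.05, n)                       # class-imbalanced, as in EM
# a matcher that is assumed to miss more true matches in group 1
recall_by_group = np.where(group == 0, 0.9, 0.7)
y_pred = (y_true == 1) & (rng.random(n) < recall_by_group)
y_pred = (y_pred | (rng.random(n) < 0.01)).astype(int)  # a few false positives

for metric, name in [(tpr, "TPR"), (ppv, "PPV")]:
    g0 = metric(y_true[group == 0], y_pred[group == 0])
    g1 = metric(y_true[group == 1], y_pred[group == 1])
    print(f"{name} parity gap: {abs(g0 - g1):.3f} (group0={g0:.3f}, group1={g1:.3f})")
```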

Understanding Uncertainty Sampling

  • paper_url: http://arxiv.org/abs/2307.02719
  • repo_url: https://github.com/liushangnoname/uncertainty-sampling
  • paper_authors: Shang Liu, Xiaocheng Li
  • for: 这篇论文的目的是系统地研究不确定性采样算法,提出一种新的不确定性度量“损失作为不确定性”(loss as uncertainty),以及一个取决于所用不确定性度量和原始损失函数的等效损失概念,以便针对不同的任务和损失函数进行定制。
  • methods: 论文在流式与池式主动学习两种设置下对不确定性采样算法进行了理论分析,证明不确定性采样实质上是在优化相应的等效损失,并从代理性质与损失凸性两个方面验证了现有不确定性度量的合理性。
  • results: 论文提出了“损失作为不确定性”这一新的度量设计思路,即以给定特征下的条件期望损失作为不确定性度量;该度量具有良好的解析性质与通用性,可同时涵盖分类与回归问题,从而首次在流式与池式设置下给出了不确定性采样算法的泛化界,并进一步建立了某些不确定性采样变体与风险敏感目标及分布鲁棒性之间的联系。
    Abstract Uncertainty sampling is a prevalent active learning algorithm that queries sequentially the annotations of data samples which the current prediction model is uncertain about. However, the usage of uncertainty sampling has been largely heuristic: (i) There is no consensus on the proper definition of "uncertainty" for a specific task under a specific loss; (ii) There is no theoretical guarantee that prescribes a standard protocol to implement the algorithm, for example, how to handle the sequentially arrived annotated data under the framework of optimization algorithms such as stochastic gradient descent. In this work, we systematically examine uncertainty sampling algorithms under both stream-based and pool-based active learning. We propose a notion of equivalent loss which depends on the used uncertainty measure and the original loss function and establish that an uncertainty sampling algorithm essentially optimizes against such an equivalent loss. The perspective verifies the properness of existing uncertainty measures from two aspects: surrogate property and loss convexity. Furthermore, we propose a new notion for designing uncertainty measures called \textit{loss as uncertainty}. The idea is to use the conditional expected loss given the features as the uncertainty measure. Such an uncertainty measure has nice analytical properties and generality to cover both classification and regression problems, which enable us to provide the first generalization bound for uncertainty sampling algorithms under both stream-based and pool-based settings, in the full generality of the underlying model and problem. Lastly, we establish connections between certain variants of the uncertainty sampling algorithms with risk-sensitive objectives and distributional robustness, which can partly explain the advantage of uncertainty sampling algorithms when the sample size is small.
    摘要 主动学习中的不确定性采样是一种广泛使用的算法:它按顺序查询当前预测模型不确定的数据样本的标注。然而,不确定性采样的使用在很大程度上依赖启发:(i)对于特定任务和特定损失,何为恰当的“不确定性”定义尚无共识;(ii)也没有理论保证规定实现该算法的标准流程,例如在随机梯度下降等优化算法框架下如何处理顺序到达的标注数据。本文系统地研究了流式与池式主动学习下的不确定性采样算法。我们提出了一个取决于所用不确定性度量与原始损失函数的“等效损失”概念,并证明不确定性采样算法实质上是在优化这一等效损失。这一视角从代理性质和损失凸性两个方面验证了现有不确定性度量的合理性。此外,我们提出了一种设计不确定性度量的新思路,称为“损失作为不确定性”,即以给定特征下的条件期望损失作为不确定性度量。这种度量具有良好的解析性质和通用性,可同时涵盖分类与回归问题,使我们能够在底层模型与问题完全一般的情形下,首次给出流式与池式设置中不确定性采样算法的泛化界。最后,我们建立了某些不确定性采样变体与风险敏感目标及分布鲁棒性之间的联系,这在一定程度上解释了样本量较小时不确定性采样算法的优势。
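A minimal pool-based uncertainty-sampling loop for illustration: repeatedly fit a small logistic-regression model on the labelled set and query the least-confident unlabelled point. The toy data and model are assumptions; the paper's equivalent-loss analysis is not reproduced in code.

```python
import numpy as np

# Pool-based uncertainty sampling: fit a logistic model on the labelled set, then query the
# unlabelled point whose predicted class probability is least confident.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fit_logreg(X, y, lr=0.5, steps=300):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w += lr * X.T @ (y - p) / len(y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) + np.array([[2.0, 0.0]]) * rng.integers(0, 2, (200, 1))
y = (X[:, 0] > 1.0).astype(float)

labelled = list(rng.choice(len(X), size=5, replace=False))
pool = [i for i in range(len(X)) if i not in labelled]

for _ in range(10):
    w = fit_logreg(X[labelled], y[labelled])
    p = sigmoid(X[pool] @ w)
    confidence = np.maximum(p, 1 - p)               # confidence of the predicted label
    query = pool.pop(int(np.argmin(confidence)))    # least-confident point gets labelled
    labelled.append(query)

w = fit_logreg(X[labelled], y[labelled])
acc = np.mean(((X @ w) > 0) == y.astype(bool))
print("labelled points used:", len(labelled), "| accuracy on full pool:", round(float(acc), 3))
```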

Multi-Similarity Contrastive Learning

  • paper_url: http://arxiv.org/abs/2307.02712
  • repo_url: https://github.com/lkh-meredith/Debiased-Momentum-Contrastive-Learning-for-Multimodal-Video-Similarity-Measures
  • paper_authors: Emily Mu, John Guttag, Maggie Makar
  • for: 学习一种多相似性强制约束的抽象表示法,以提高模型的泛化能力。
  • methods: 利用多个相似度指标的监督来学习抽象表示,自动学习相似性权重,以优化模型的泛化性能。
  • results: 对比州chart的基eline,模型经过MSCon强制约束后在领域内和领域外的表现均有较好的成绩。
    Abstract Given a similarity metric, contrastive methods learn a representation in which examples that are similar are pushed together and examples that are dissimilar are pulled apart. Contrastive learning techniques have been utilized extensively to learn representations for tasks ranging from image classification to caption generation. However, existing contrastive learning approaches can fail to generalize because they do not take into account the possibility of different similarity relations. In this paper, we propose a novel multi-similarity contrastive loss (MSCon), that learns generalizable embeddings by jointly utilizing supervision from multiple metrics of similarity. Our method automatically learns contrastive similarity weightings based on the uncertainty in the corresponding similarity, down-weighting uncertain tasks and leading to better out-of-domain generalization to new tasks. We show empirically that networks trained with MSCon outperform state-of-the-art baselines on in-domain and out-of-domain settings.
    摘要 给定一个相似度度量,对比方法学习一种表示,使相似的样本彼此靠近、不相似的样本彼此远离。对比学习技术已被广泛用于从图像分类到图像描述生成等任务的表示学习。然而,现有的对比学习方法可能无法很好地泛化,因为它们没有考虑可能存在多种不同的相似关系。本文提出了一种新的多相似度对比损失(MSCon),通过同时利用来自多个相似度度量的监督来学习可泛化的嵌入。该方法根据各相似度对应的不确定性自动学习对比相似度权重,对不确定的任务进行降权,从而对新任务实现更好的域外泛化。实验表明,使用MSCon训练的网络在域内和域外设置下均优于最新的基线方法。

Towards Symmetry-Aware Generation of Periodic Materials

  • paper_url: http://arxiv.org/abs/2307.02707
  • repo_url: None
  • paper_authors: Youzhi Luo, Chengkai Liu, Shuiwang Ji
  • For: 本研究旨在生成具有固体结构的周期材料,而现有的深度学习方法尚未完全捕捉周期材料的物理准确性。* Methods: 本文提出了一种新的材料生成方法——SyMat,可以捕捉物理准确性的周期材料结构。SyMat使用自适应神经网络模型生成原子类型集、晶格长度和晶格角,并使用一种新的协调分布模型来进行矩阵协调。* Results: SyMat理论上具有对所有Symmetry转换的不变性,并在随机生成和性能优化任务中达到了出色的表现。
    Abstract We consider the problem of generating periodic materials with deep models. While symmetry-aware molecule generation has been studied extensively, periodic materials possess different symmetries, which have not been completely captured by existing methods. In this work, we propose SyMat, a novel material generation approach that can capture physical symmetries of periodic material structures. SyMat generates atom types and lattices of materials through generating atom type sets, lattice lengths and lattice angles with a variational auto-encoder model. In addition, SyMat employs a score-based diffusion model to generate atom coordinates of materials, in which a novel symmetry-aware probabilistic model is used in the coordinate diffusion process. We show that SyMat is theoretically invariant to all symmetry transformations on materials and demonstrate that SyMat achieves promising performance on random generation and property optimization tasks.
    摘要 我们考虑了使用深度模型生成 periodic 材料的问题。 Although symmetry-aware molecule generation has been studied extensively, periodic materials have different symmetries that have not been fully captured by existing methods. In this work, we propose SyMat, a novel material generation approach that can capture the physical symmetries of periodic material structures. SyMat generates atom types and lattices of materials by generating atom type sets, lattice lengths, and lattice angles with a variational autoencoder model. In addition, SyMat employs a score-based diffusion model to generate atom coordinates of materials, using a novel symmetry-aware probabilistic model in the coordinate diffusion process. We prove that SyMat is theoretically invariant to all symmetry transformations on materials and demonstrate that SyMat achieves promising performance on random generation and property optimization tasks.

Loss Functions and Metrics in Deep Learning. A Review

  • paper_url: http://arxiv.org/abs/2307.02694
  • repo_url: None
  • paper_authors: Juan Terven, Diana M. Cordova-Esparza, Alfonzo Ramirez-Pedraza, Edgar A. Chavez-Urbiola
  • for: 这篇论文旨在探讨深度学习中最常用的损失函数和性能指标,以帮助实践者选择适合自己特定任务的方法。
  • methods: 这篇论文评论了深度学习中最常用的损失函数和性能指标,包括其优点和局限性,以及它们在不同的深度学习问题中的应用。
  • results: 论文总结了不同损失函数和性能指标的应用,并提供了帮助实践者选择适合自己特定任务的方法的概述。
    Abstract One of the essential components of deep learning is the choice of the loss function and performance metrics used to train and evaluate models. This paper reviews the most prevalent loss functions and performance measurements in deep learning. We examine the benefits and limits of each technique and illustrate their application to various deep-learning problems. Our review aims to give a comprehensive picture of the different loss functions and performance indicators used in the most common deep learning tasks and help practitioners choose the best method for their specific task.
    摘要 深度学习的一个重要组成部分是选择用于训练和评估模型的损失函数与性能指标。本文综述了深度学习中最常用的损失函数和性能度量,探讨它们的优缺点及其在不同深度学习问题中的应用。我们的综述旨在对最常见的深度学习任务中所使用的各种损失函数和性能指标给出全面的认识,帮助实践者为自己的特定任务选择最合适的方法。
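To accompany the review, here are direct NumPy implementations of a few of the most common losses it covers; the formulas are the standard definitions, and the smoothing constant in the Dice loss is illustrative.

```python
import numpy as np

# Plain NumPy versions of a few widely used losses (standard textbook definitions).
EPS = 1e-12

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def binary_cross_entropy(y_true, p):
    p = np.clip(p, EPS, 1 - EPS)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def categorical_cross_entropy(y_onehot, probs):
    probs = np.clip(probs, EPS, 1.0)
    return float(-np.mean(np.sum(y_onehot * np.log(probs), axis=1)))

def dice_loss(y_true, p, smooth=1.0):          # common in segmentation
    inter = np.sum(y_true * p)
    return float(1 - (2 * inter + smooth) / (np.sum(y_true) + np.sum(p) + smooth))

y = np.array([1.0, 0.0, 1.0, 1.0])
p = np.array([0.9, 0.2, 0.7, 0.4])
print("MSE :", round(mse(y, p), 4))
print("BCE :", round(binary_cross_entropy(y, p), 4))
print("Dice:", round(dice_loss(y, p), 4))
```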

Kernels, Data & Physics

  • paper_url: http://arxiv.org/abs/2307.02693
  • repo_url: None
  • paper_authors: Francesco Cagnetta, Deborah Oliveira, Mahalakshmi Sabanayagam, Nikolaos Tsilivis, Julia Kempe
  • for: 这篇论文主要是为了解决机器学习中的一些不可解决的问题,通过找到可解决的kernel形式来获得更好的理解。
  • methods: 这篇论文使用了NTK方法,即通过找到一个可解决的kernel来理解一个不可解决的问题。
  • results: 论文中提出了一些实际应用,如数据简化和鲁棒性增强,以及一些示例的偏见假设。
    Abstract Lecture notes from the course given by Professor Julia Kempe at the summer school "Statistical physics of Machine Learning" in Les Houches. The notes discuss the so-called NTK approach to problems in machine learning, which consists of gaining an understanding of generally unsolvable problems by finding a tractable kernel formulation. The notes are mainly focused on practical applications such as data distillation and adversarial robustness, examples of inductive bias are also discussed.
    摘要 Lecture notes from Professor Julia Kempe's course at the "Statistical physics of Machine Learning" summer school in Les Houches, which discuss the so-called NTK approach to problems in machine learning. This approach involves finding a tractable kernel formulation to gain an understanding of generally unsolvable problems. The notes focus mainly on practical applications such as data distillation and adversarial robustness, and examples of inductive bias are also discussed.

Scaling In-Context Demonstrations with Structured Attention

  • paper_url: http://arxiv.org/abs/2307.02690
  • repo_url: None
  • paper_authors: Tianle Cai, Kaixuan Huang, Jason D. Lee, Mengdi Wang
  • for: 提高大语言模型(LLM)在上下文学习中的能力,即从少量示例中学习到某个任务的执行。
  • methods: 提出了一种更好的架构设计,即 SAICL(用于上下文学习的结构化注意力),它将全注意力替换为专为上下文学习设计的结构化注意力机制,并移除各个示例之间不必要的依赖关系,使模型对示例顺序的排列保持不变。
  • results: SAICL 在 meta-training 框架下评估,与全注意力相比具有相似或更好的性能,并实现了最多 3.4 倍的推理加速。SAICL 还稳定优于逐个处理示例的 Fusion-in-Decoder(FiD)强基线;由于其线性特性,可以轻松扩展到数百个示例,并在扩展时持续获得性能提升。
    Abstract The recent surge of large language models (LLMs) highlights their ability to perform in-context learning, i.e., "learning" to perform a task from a few demonstrations in the context without any parameter updates. However, their capabilities of in-context learning are limited by the model architecture: 1) the use of demonstrations is constrained by a maximum sentence length due to positional embeddings; 2) the quadratic complexity of attention hinders users from using more demonstrations efficiently; 3) LLMs are shown to be sensitive to the order of the demonstrations. In this work, we tackle these challenges by proposing a better architectural design for in-context learning. We propose SAICL (Structured Attention for In-Context Learning), which replaces the full-attention by a structured attention mechanism designed for in-context learning, and removes unnecessary dependencies between individual demonstrations, while making the model invariant to the permutation of demonstrations. We evaluate SAICL in a meta-training framework and show that SAICL achieves comparable or better performance than full attention while obtaining up to 3.4x inference speed-up. SAICL also consistently outperforms a strong Fusion-in-Decoder (FiD) baseline which processes each demonstration independently. Finally, thanks to its linear nature, we demonstrate that SAICL can easily scale to hundreds of demonstrations with continuous performance gains with scaling.
    摘要 近期大语言模型(LLM)的兴起凸显了它们进行上下文学习的能力,即仅凭上下文中的少量示例"学习"执行某项任务,而无需更新参数。然而,这种上下文学习能力受到模型架构的限制:1)由于位置嵌入的存在,示例的使用受限于最大句子长度;2)注意力的二次复杂度阻碍用户高效地使用更多示例;3)LLM 已被证明对示例顺序敏感。针对这些挑战,我们提出一种更适合上下文学习的架构设计,即 SAICL(用于上下文学习的结构化注意力)。SAICL 将全注意力替换为专为上下文学习设计的结构化注意力机制,消除示例之间不必要的依赖关系,使模型对示例顺序的排列保持不变。我们在 meta-training 框架中评估 SAICL,结果显示其性能与全注意力相当或更好,同时获得最多 3.4 倍的推理加速。SAICL 还持续优于逐个处理示例的 Fusion-in-Decoder(FiD)强基线。最后,得益于其线性特性,SAICL 可以轻松扩展到数百个示例,并随扩展持续提升性能。
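    As a rough illustration of the structured-attention idea described above, the sketch below builds a block-diagonal attention mask in which demonstration tokens attend only within their own demonstration while query tokens attend to everything; the exact masking pattern used by SAICL is an assumption here, not taken from the paper.

```python
import torch

def saicl_style_mask(demo_lens, query_len):
    """Boolean mask (True = may attend) for a sequence of K demonstrations
    followed by the query. Demonstration tokens attend only within their own
    demonstration; query tokens attend to all demonstrations and themselves.
    Removing cross-demonstration dependencies makes the query output unchanged
    under any permutation of the demonstrations (absent positional encodings)."""
    total = sum(demo_lens) + query_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for n in demo_lens:                      # block-diagonal part for demonstrations
        mask[start:start + n, start:start + n] = True
        start += n
    mask[start:, :] = True                   # query attends to everything
    return mask

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with a boolean mask."""
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

if __name__ == "__main__":
    torch.manual_seed(0)
    demo_lens, query_len, d = [4, 3, 5], 2, 8
    x = torch.randn(sum(demo_lens) + query_len, d)
    out = masked_attention(x, x, x, saicl_style_mask(demo_lens, query_len))
    print(out.shape)  # torch.Size([14, 8])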

GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples using Gradients and Invariance Transformations

  • paper_url: http://arxiv.org/abs/2307.02672
  • repo_url: None
  • paper_authors: Julia Lust, Alexandru P. Condurache
  • for: 检测深度神经网络的泛化错误
  • methods: 结合使用梯度信息与不变性变换的方法
  • results: 在多种网络架构、问题设置和扰动类型上取得了优于 state-of-the-art 的检测性能
    Abstract Deep neural networks tend to make overconfident predictions and often require additional detectors for misclassifications, particularly for safety-critical applications. Existing detection methods usually only focus on adversarial attacks or out-of-distribution samples as reasons for false predictions. However, generalization errors occur due to diverse reasons often related to poorly learning relevant invariances. We therefore propose GIT, a holistic approach for the detection of generalization errors that combines the usage of gradient information and invariance transformations. The invariance transformations are designed to shift misclassified samples back into the generalization area of the neural network, while the gradient information measures the contradiction between the initial prediction and the corresponding inherent computations of the neural network using the transformed sample. Our experiments demonstrate the superior performance of GIT compared to the state-of-the-art on a variety of network architectures, problem setups and perturbation types.
    摘要 深度神经网络往往做出过度自信的预测,在安全关键应用中通常需要额外的误分类检测器。现有检测方法大多只关注对抗攻击或分布外样本,而泛化错误往往源于相关不变性学习不充分等多种原因。为此,我们提出 GIT,一种结合梯度信息与不变性变换的整体性泛化错误检测方法:不变性变换旨在将误分类样本移回网络的泛化区域,梯度信息则度量初始预测与网络在变换后样本上的内部计算之间的矛盾。实验表明,GIT 在多种网络架构、问题设置和扰动类型上均优于现有最先进方法。
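    The following sketch illustrates the gradient-plus-invariance idea in the abstract above: transform the input with an (approximate) invariance transformation and score the contradiction between the initial prediction and the network's computation on the transformed sample via a gradient norm. The horizontal-flip transform and the gradient-norm score are illustrative assumptions, not the authors' exact procedure.

```python
import torch
import torch.nn.functional as F

def git_style_score(model, x, transform):
    """Illustrative detection score in the spirit of GIT: take the label the
    model predicts for x, transform x with an (approximate) invariance
    transformation, and measure the gradient norm of the loss between that
    label and the model's output on the transformed sample. Large scores
    suggest the prediction is not supported by the invariant structure."""
    model.eval()
    with torch.no_grad():
        pred = model(x).argmax(dim=1)            # initial prediction
    model.zero_grad()
    loss = F.cross_entropy(model(transform(x)), pred)
    loss.backward()
    grads = [p.grad.flatten() for p in model.parameters() if p.grad is not None]
    return torch.cat(grads).norm().item()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    x = torch.randn(4, 3, 32, 32)
    hflip = lambda t: torch.flip(t, dims=[-1])   # stand-in invariance transform
    print(git_style_score(model, x, hflip))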

Active Class Selection for Few-Shot Class-Incremental Learning

  • paper_url: http://arxiv.org/abs/2307.02641
  • repo_url: https://github.com/chrismcclurg/fscil-acs
  • paper_authors: Christopher McClurg, Ali Ayub, Harsh Tyagi, Sarah M. Rajtmajer, Alan R. Wagner
  • for: 这个论文的目的是开发一种能让自主机器人在有限的互动情况下不断学习环境中的新对象。
  • methods: 这个论文结合了小样本类增量学习(FSCIL)和主动类选择(ACS)的想法,并将其整合到一个最先进(SOTA)的 FSCIL 模型中。
  • results: 实验结果表明,这种方法可以在真实世界应用中提供长期的有效性,并且可以帮助机器人在有限的互动情况下不断学习和更新其模型。
    Abstract For real-world applications, robots will need to continually learn in their environments through limited interactions with their users. Toward this, previous works in few-shot class incremental learning (FSCIL) and active class selection (ACS) have achieved promising results but were tested in constrained setups. Therefore, in this paper, we combine ideas from FSCIL and ACS to develop a novel framework that can allow an autonomous agent to continually learn new objects by asking its users to label only a few of the most informative objects in the environment. To this end, we build on a state-of-the-art (SOTA) FSCIL model and extend it with techniques from ACS literature. We term this model Few-shot Incremental Active class SeleCtiOn (FIASco). We further integrate a potential field-based navigation technique with our model to develop a complete framework that can allow an agent to process and reason on its sensory data through the FIASco model, navigate towards the most informative object in the environment, gather data about the object through its sensors and incrementally update the FIASco model. Experimental results on a simulated agent and a real robot show the significance of our approach for long-term real-world robotics applications.
    摘要 在实际应用中,机器人需要通过与用户的有限交互,不断学习环境中的新对象。此前关于小样本类增量学习(FSCIL)和主动类选择(ACS)的研究取得了有希望的结果,但都是在受限的设置下测试的。因此,在这篇论文中,我们结合 FSCIL 与 ACS 的想法,开发了一个新框架,使自主代理只需请用户标注环境中最具信息量的少数对象,即可持续学习新对象。为此,我们在一个最先进的 FSCIL 模型的基础上,融入了 ACS 文献中的技术,并将该模型命名为 Few-shot Incremental Active class SeleCtiOn(FIASco)。此外,我们将基于势场的导航技术与模型集成,构建了一个完整的框架:代理通过 FIASco 模型处理和推理其感知数据,导航到环境中最具信息量的对象,利用传感器收集该对象的数据,并增量更新 FIASco 模型。在模拟代理和真实机器人上的实验结果表明,我们的方法对长期的真实世界机器人应用具有重要意义。

Hybrid Ground-State Quantum Algorithms based on Neural Schrödinger Forging

  • paper_url: http://arxiv.org/abs/2307.02633
  • repo_url: None
  • paper_authors: Paulin de Schoulepnikoff, Oriel Kiss, Sofia Vallecorsa, Giuseppe Carleo, Michele Grossi
  • for: 求解量子系统的基态问题
  • methods: 使用生成式神经网络来识别最相关的 bitstring,从而避免 Schmidt 分解中对指数多个基矢(bitstring)的求和
  • results: 比标准实现更高效,可应用于更大的系统和非排列不变系统
    Abstract Entanglement forging based variational algorithms leverage the bi-partition of quantum systems for addressing ground state problems. The primary limitation of these approaches lies in the exponential summation required over the numerous potential basis states, or bitstrings, when performing the Schmidt decomposition of the whole system. To overcome this challenge, we propose a new method for entanglement forging employing generative neural networks to identify the most pertinent bitstrings, eliminating the need for the exponential sum. Through empirical demonstrations on systems of increasing complexity, we show that the proposed algorithm achieves comparable or superior performance compared to the existing standard implementation of entanglement forging. Moreover, by controlling the amount of required resources, this scheme can be applied to larger, as well as non permutation invariant systems, where the latter constraint is associated with the Heisenberg forging procedure. We substantiate our findings through numerical simulations conducted on spins models exhibiting one-dimensional ring, two-dimensional triangular lattice topologies, and nuclear shell model configurations.
    摘要 基于纠缠锻造(entanglement forging)的变分算法利用量子系统的二分结构来求解基态问题。这类方法的主要局限在于,对整个系统进行 Schmidt 分解时需要对数量呈指数增长的潜在基矢(bitstring)求和。为克服这一挑战,我们提出一种新的纠缠锻造方法,利用生成式神经网络识别最相关的 bitstring,从而免去指数求和。在复杂度逐渐增加的系统上的实验表明,所提算法的性能与现有标准的纠缠锻造实现相当或更优。此外,通过控制所需资源,该方案还可应用于更大的系统以及非置换不变的系统(后者与 Heisenberg 锻造过程相关)。我们在一维环形自旋模型、二维三角晶格自旋模型以及核壳层模型配置上的数值模拟中验证了这些结论。

Stability of Q-Learning Through Design and Optimism

  • paper_url: http://arxiv.org/abs/2307.02632
  • repo_url: None
  • paper_authors: Sean Meyn
  • for: 本研究部分是一篇关于随机逼近与 Q 学习的教程式论文,并为 2023 年 6 月在法国南锡举行的 INFORMS APS 首届 Applied Probability Trust 全会讲座提供详细材料。
  • methods: 本研究讲解了随机逼近与 Q 学习算法,并提出了确保这些算法稳定、并可能加速收敛的新方法。其中两个全新的贡献是:1. 线性函数逼近下 Q 学习的稳定性三十多年来一直是开放问题;本文表明,采用修改后的 Gibbs 策略进行乐观训练,投影 Bellman 方程存在解,且算法是稳定的(参数估计有界),但收敛性仍是开放问题。2. 新的 Zap Zero 算法可在不进行矩阵求逆的情况下近似 Newton-Raphson 流;在对平均流向量场和底层马尔可夫链的温和假设下,它稳定且收敛,并适用于 Q 学习及其他随机逼近问题。
  • results: 本研究获得了一些新的结果,包括:1. 使用修改后的 Gibbs 策略可以确保线性函数逼近下 Q 学习的稳定性,但收敛性仍是开放问题。2. Zap Zero 算法在温和条件下稳定且收敛,且无需矩阵求逆。
    Abstract Q-learning has become an important part of the reinforcement learning toolkit since its introduction in the dissertation of Chris Watkins in the 1980s. The purpose of this paper is in part a tutorial on stochastic approximation and Q-learning, providing details regarding the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in Nancy France, June 2023. The paper also presents new approaches to ensure stability and potentially accelerated convergence for these algorithms, and stochastic approximation in other settings. Two contributions are entirely new: 1. Stability of Q-learning with linear function approximation has been an open topic for research for over three decades. It is shown that with appropriate optimistic training in the form of a modified Gibbs policy, there exists a solution to the projected Bellman equation, and the algorithm is stable (in terms of bounded parameter estimates). Convergence remains one of many open topics for research. 2. The new Zap Zero algorithm is designed to approximate the Newton-Raphson flow without matrix inversion. It is stable and convergent under mild assumptions on the mean flow vector field for the algorithm, and compatible statistical assumption on an underlying Markov chain. The algorithm is a general approach to stochastic approximation which in particular applies to Q-learning with "oblivious" training even with non-linear function approximation.
    摘要 自 1980 年代 Chris Watkins 的博士论文提出以来,Q 学习已成为强化学习工具箱的重要组成部分。本文的目的之一是提供关于随机逼近与 Q 学习的教程,并为 2023 年 6 月在法国南锡举行的 INFORMS APS 首届 Applied Probability Trust 全会讲座提供详细材料。此外,本文还提出了确保这些算法稳定、并可能加速收敛的新方法,以及随机逼近在其他设置下的应用。两个贡献是完全新的:1. 线性函数逼近下 Q 学习的稳定性问题三十多年来一直是开放的研究课题。本文表明,通过采用修改后的 Gibbs 策略进行乐观训练,投影 Bellman 方程存在解,且算法是稳定的(参数估计有界);收敛性仍是众多开放问题之一。2. 新的 Zap Zero 算法旨在在不进行矩阵求逆的情况下近似 Newton-Raphson 流。在对算法的平均流向量场及底层马尔可夫链的温和假设下,该算法稳定且收敛;它是一种通用的随机逼近方法,特别适用于"oblivious"训练下的 Q 学习,即使采用非线性函数逼近。

An explainable model to support the decision about the therapy protocol for AML

  • paper_url: http://arxiv.org/abs/2307.02631
  • repo_url: None
  • paper_authors: Jade M. Almeida, Giovanna A. Castro, João A. Machado-Neto, Tiago A. Almeida
  • for: 预测AML患者的生存机会,以支持医生决策最佳治疗协议。
  • methods: 使用数据分析和可解释机器学习模型,以支持医生决策。
  • results: 提出了一个可解释的机器学习模型,可以安全地支持医生决策,并且实验结果具有承诺性。
    Abstract Acute Myeloid Leukemia (AML) is one of the most aggressive types of hematological neoplasm. To support the specialists' decision about the appropriate therapy, patients with AML receive a prognostic of outcomes according to their cytogenetic and molecular characteristics, often divided into three risk categories: favorable, intermediate, and adverse. However, the current risk classification has known problems, such as the heterogeneity between patients of the same risk group and no clear definition of the intermediate risk category. Moreover, as most patients with AML receive an intermediate-risk classification, specialists often demand other tests and analyses, leading to delayed treatment and worsening of the patient's clinical condition. This paper presents the data analysis and an explainable machine-learning model to support the decision about the most appropriate therapy protocol according to the patient's survival prediction. In addition to the prediction model being explainable, the results obtained are promising and indicate that it is possible to use it to support the specialists' decisions safely. Most importantly, the findings offered in this study have the potential to open new avenues of research toward better treatments and prognostic markers.
    摘要 急性骨髓性白血病(AML)是一种非常严重的血液疾病之一。为了支持专家决策最佳治疗方案,AML患者通常会根据其细胞学和分子特征进行评估,并将患者分为三个风险 категории:有利、中等和不利。然而,现有的风险分类系统存在一些知道的问题,如患者之间的不同性和中等风险类别的不清晰定义。此外,由于大多数AML患者被诊断为中等风险,专家通常会要求更多的测试和分析,导致治疗延迟和病情加重。本文提出了数据分析和可解释的机器学习模型,以支持专家决策最佳治疗方案。除了模型的可解释性之外,研究结果具有推动力,并表明可以安全地使用这些模型来支持专家决策。最重要的是,本研究的发现有可能开启新的研究途径,以提高治疗和诊断 marker 的效果。

FLuID: Mitigating Stragglers in Federated Learning using Invariant Dropout

  • paper_url: http://arxiv.org/abs/2307.02623
  • repo_url: None
  • paper_authors: Irene Wang, Prashant J. Nair, Divya Mahajan
  • for: 这个论文的目的是解决 Federated Learning(FL)中的性能瓶颈问题,即在多个设备上进行本地机器学习模型训练,然后将模型更新同步到共享服务器上,以保护用户隐私。
  • methods: 这个论文使用了一种名为“Invariant Dropout”的方法,该方法可以在 FL 中提取一个子模型,以降低可能影响精度的随机 Dropout 操作。此外,这个论文还提出了一个适应性训练框架,称为 Federated Learning using Invariant Dropout(FLuID),该框架可以在 runtime 中动态调整训练负担,以适应不同的设备性能。
  • results: 论文的实验结果表明,Invariant Dropout 可以保持基eline模型的效率,同时解决 FL 中的性能瓶颈问题。此外,FLuID 可以在 runtime 中动态调整训练负担,以适应不同的设备性能。
    Abstract Federated Learning (FL) allows machine learning models to train locally on individual mobile devices, synchronizing model updates via a shared server. This approach safeguards user privacy; however, it also generates a heterogeneous training environment due to the varying performance capabilities across devices. As a result, straggler devices with lower performance often dictate the overall training time in FL. In this work, we aim to alleviate this performance bottleneck due to stragglers by dynamically balancing the training load across the system. We introduce Invariant Dropout, a method that extracts a sub-model based on the weight update threshold, thereby minimizing potential impacts on accuracy. Building on this dropout technique, we develop an adaptive training framework, Federated Learning using Invariant Dropout (FLuID). FLuID offers a lightweight sub-model extraction to regulate computational intensity, thereby reducing the load on straggler devices without affecting model quality. Our method leverages neuron updates from non-straggler devices to construct a tailored sub-model for each straggler based on client performance profiling. Furthermore, FLuID can dynamically adapt to changes in stragglers as runtime conditions shift. We evaluate FLuID using five real-world mobile clients. The evaluations show that Invariant Dropout maintains baseline model efficiency while alleviating the performance bottleneck of stragglers through a dynamic, runtime approach.
    摘要 federated learning (FL) 允许机器学习模型在个人手持设备上本地进行训练,并通过共享服务器进行模型更新同步。这种方法保护用户隐私,但也创造了不同设备性能水平的多样化训练环境。因此,在FL中,慢速设备(straggler)的低性能经常决定整体训练时间。在这项工作中,我们想使用动态负载均衡来缓解FL中的性能瓶颈。我们提出了“ invariable dropout” 方法,该方法根据模型更新阈值提取子模型,以最小化可能的影响精度。基于这种dropout技术,我们开发了适应训练框架,称为 Federated Learning using Invariant Dropout (FLuID)。FLuID 提供了轻量级的子模型提取机制,以规避计算急剧的设备。我们的方法通过非慢速设备的神经元更新来构建每个慢速设备的个性化子模型,基于客户端性能分析。此外,FLuID 可以在运行时conditions 发生变化时动态适应。我们通过使用五个真实的手持设备进行评估,发现, invariable dropout 可以保持基eline模型效率,同时通过动态、运行时的方式缓解慢速设备的性能瓶颈。
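    A minimal sketch of the weight-update-threshold idea behind Invariant Dropout, under the assumption that "invariant" neurons are those whose recent weight updates are smallest and that a straggler's lighter sub-model keeps only the remaining output neurons; FLuID's actual selection rule and per-client sizing may differ.

```python
import torch

def invariant_neuron_mask(w_prev, w_curr, keep_ratio=0.75):
    """Rank output neurons of a linear layer by the magnitude of their recent
    weight update; keep the `keep_ratio` fraction with the LARGEST updates and
    mark the rest (the 'invariant' neurons) for dropping in the sub-model."""
    update = (w_curr - w_prev).abs().sum(dim=1)      # one score per output neuron
    k = max(1, int(keep_ratio * update.numel()))
    keep = torch.zeros_like(update, dtype=torch.bool)
    keep[update.topk(k).indices] = True
    return keep                                      # True = keep in sub-model

def extract_sub_layer(layer, keep):
    """Build a smaller linear layer containing only the kept output neurons."""
    sub = torch.nn.Linear(layer.in_features, int(keep.sum()), bias=layer.bias is not None)
    with torch.no_grad():
        sub.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            sub.bias.copy_(layer.bias[keep])
    return sub

if __name__ == "__main__":
    torch.manual_seed(0)
    layer = torch.nn.Linear(16, 8)
    w_prev = layer.weight.detach().clone() + 0.01 * torch.randn_like(layer.weight)
    keep = invariant_neuron_mask(w_prev, layer.weight.detach(), keep_ratio=0.75)
    print(extract_sub_layer(layer, keep))            # Linear(16 -> 6)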

Learning when to observe: A frugal reinforcement learning framework for a high-cost world

  • paper_url: http://arxiv.org/abs/2307.02620
  • repo_url: https://github.com/cbellinger27/learning-when-to-observe-in-rl
  • paper_authors: Colin Bellinger, Mark Crowley, Isaac Tamblyn
  • for: 本研究旨在探讨RL算法在环境状态测量成本高的应用场景下是否可以学习出高效的控制策略。
  • methods: 本文采用了 Deep Dynamic Multi-Step Observationless Agent(DMSOA),并对其进行了比较和实验评估在OpenAI gym和Atari Pong环境中。
  • results: 研究结果表明,DMSOA可以更好地学习控制策略,只需 fewer decision steps和测量步骤,并且在OpenAI gym和Atari Pong环境中表现出了更高的效果。
    Abstract Reinforcement learning (RL) has been shown to learn sophisticated control policies for complex tasks including games, robotics, heating and cooling systems and text generation. The action-perception cycle in RL, however, generally assumes that a measurement of the state of the environment is available at each time step without a cost. In applications such as materials design, deep-sea and planetary robot exploration and medicine, however, there can be a high cost associated with measuring, or even approximating, the state of the environment. In this paper, we survey the recently growing literature that adopts the perspective that an RL agent might not need, or even want, a costly measurement at each time step. Within this context, we propose the Deep Dynamic Multi-Step Observationless Agent (DMSOA), contrast it with the literature and empirically evaluate it on OpenAI gym and Atari Pong environments. Our results, show that DMSOA learns a better policy with fewer decision steps and measurements than the considered alternative from the literature. The corresponding code is available at: \url{https://github.com/cbellinger27/Learning-when-to-observe-in-RL
    摘要 强化学习(RL)已被证明能够为游戏、机器人、供暖与制冷系统以及文本生成等复杂任务学习精细的控制策略。但是,RL 中的行动-感知循环通常假设在每个时间步都可以无成本地测量环境状态。而在材料设计、深海与行星机器人探索以及医学等应用中,测量甚至近似环境状态都可能代价高昂。在这篇论文中,我们综述了近来快速增长的一类文献,它们采用了 RL 代理可能不需要、甚至不希望在每个时间步进行昂贵测量的视角。在此背景下,我们提出了深度动态多步无观测代理(DMSOA),将其与文献中的方法进行对比,并在 OpenAI gym 和 Atari Pong 环境中进行了实证评估。结果表明,与文献中所比较的替代方法相比,DMSOA 能以更少的决策步骤和测量学习到更好的策略。相关代码见:https://github.com/cbellinger27/Learning-when-to-observe-in-RL。

Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition

  • paper_url: http://arxiv.org/abs/2307.02615
  • repo_url: https://github.com/sled-group/comparative-learning
  • paper_authors: Yuwei Bao, Barrett Martin Lattimer, Joyce Chai
  • for: 这个论文目的是提出一种基于人类婴儿语言学习的计算过程,用于词汇学习。
  • methods: 这个论文使用了比较学习的方法,通过比较不同特征之间的相似性和差异,学习抽取共同语言标签中的信息。
  • results: 实验结果表明,这种方法可以具有高效的 continent learning 特性,能够不断学习更多的概念。
    Abstract Human language acquisition is an efficient, supervised, and continual process. In this work, we took inspiration from how human babies acquire their first language, and developed a computational process for word acquisition through comparative learning. Motivated by cognitive findings, we generated a small dataset that enables the computation models to compare the similarities and differences of various attributes, learn to filter out and extract the common information for each shared linguistic label. We frame the acquisition of words as not only the information filtration process, but also as representation-symbol mapping. This procedure does not involve a fixed vocabulary size, nor a discriminative objective, and allows the models to continually learn more concepts efficiently. Our results in controlled experiments have shown the potential of this approach for efficient continual learning of grounded words.

Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

  • paper_url: http://arxiv.org/abs/2307.02598
  • repo_url: https://github.com/divyat09/additive_decoder_extrapolation
  • paper_authors: Sébastien Lachapelle, Divyat Mahajan, Ioannis Mitliagkas, Simon Lacoste-Julien
  • for: 本 paper 探讨了 latent variables 标注和 “out-of-support” 图像生成在表示学习中的问题。
  • methods: 本 paper 使用了一类被称为 additive 的解码器,它们类似于用于 object-centric representation learning (OCRL) 的解码器,并适用于可以为图像 decomposition 为不同物体图像的情况。
  • results: 本 paper 提供了一些条件,以确保使用 additive 解码器来解决重建问题时可以准确地标注 latent variables 的块,并且可以允许 permutation 和 block-wise invertible transformations。此外,本 paper 还证明了 additive 解码器可以生成新的图像,并且可以在新的方式中重新组合已经观察到的变量因素。
    Abstract We tackle the problems of latent variables identification and "out-of-support" image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which might present statistical dependencies and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variations in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.
    摘要 我们研究了隐藏变量标识和“不支持”图像生成问题在表示学习中。我们表明这两个问题可以通过我们称为添加型decoder解决,这类decoder与物体中心表示学习(OCRL)中使用的decoder类似,适用于可以分解为对象特定图像的和其他图像的总和。我们提供了解决这些问题时使用添加型decoder的条件, garantuee 隐藏变量块可以通过 permutation和块级可逆变换进行唯一标识。这个保证只需要很弱的 latent factor 分布的假设,这些分布可能具有统计依赖关系和极其复杂的支持形状。我们的结果为非线性独立 componon 分析(ICA)提供了新的设置,并补充了OCRL方法的理论理解。我们还证明了添加型decoder可以通过 recombining 观察到的因变量来生成新的图像,我们称之为 Cartesian-product 推导。我们通过实验表明,添加性是隐藏变量标识和 extrapolation 问题的关键。
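    The additive structure itself is easy to write down: the reconstruction is a sum of per-block decoders, each seeing only its own block of latent variables. A minimal sketch follows; the MLP decoders and sizes are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AdditiveDecoder(nn.Module):
    """Decode x_hat = sum_k f_k(z_k), where z is split into B latent blocks and
    each block has its own decoder network. This mirrors the additive structure
    analyzed in the paper; the per-block MLPs here are only placeholders."""
    def __init__(self, block_dim, n_blocks, out_dim, hidden=64):
        super().__init__()
        self.block_dim = block_dim
        self.decoders = nn.ModuleList([
            nn.Sequential(nn.Linear(block_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))
            for _ in range(n_blocks)
        ])

    def forward(self, z):                    # z: (batch, n_blocks * block_dim)
        blocks = z.split(self.block_dim, dim=-1)
        return sum(dec(b) for dec, b in zip(self.decoders, blocks))

if __name__ == "__main__":
    dec = AdditiveDecoder(block_dim=4, n_blocks=3, out_dim=32)
    print(dec(torch.randn(5, 12)).shape)     # torch.Size([5, 32])
```

    Cartesian-product extrapolation then corresponds to feeding the decoder novel recombinations of latent blocks that were each observed individually during training.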

TransformerG2G: Adaptive time-stepping for learning temporal graph embeddings using transformers

  • paper_url: http://arxiv.org/abs/2307.02588
  • repo_url: None
  • paper_authors: Alan John Varghese, Aniruddha Bora, Mengjia Xu, George Em Karniadakis
  • for: 本研究旨在提出一种基于 transformer 编码器的图像模型,以便更好地学习多变时间图的动态特征。
  • methods: 该模型使用 transformer 编码器,首先从节点的当前状态($t$)及此前上下文(时间戳 [$t-1, t-l$],$l$ 为上下文长度)学习中间节点表示;然后通过两个投影层,生成节点在时间 $t$ 的低维多元高斯分布嵌入。
  • results: 对于多种不同的 benchmark 和“新鲜度”水平,我们的模型在链接预测精度和计算效率方面与传统多步方法和我们之前的模型(DynG2G)相比,表现出了明显的优势。此外,通过分析注意力权重,我们可以揭示时间依赖关系,找到影响因素,并获得图structure的复杂交互。例如,我们发现了节点度与注意力权重之间的强相关性,这表明了节点度在图结构的不同阶段的作用。
    Abstract Dynamic graph embedding has emerged as a very effective technique for addressing diverse temporal graph analytic tasks (i.e., link prediction, node classification, recommender systems, anomaly detection, and graph generation) in various applications. Such temporal graphs exhibit heterogeneous transient dynamics, varying time intervals, and highly evolving node features throughout their evolution. Hence, incorporating long-range dependencies from the historical graph context plays a crucial role in accurately learning their temporal dynamics. In this paper, we develop a graph embedding model with uncertainty quantification, TransformerG2G, by exploiting the advanced transformer encoder to first learn intermediate node representations from its current state ($t$) and previous context (over timestamps [$t-1, t-l$], $l$ is the length of context). Moreover, we employ two projection layers to generate lower-dimensional multivariate Gaussian distributions as each node's latent embedding at timestamp $t$. We consider diverse benchmarks with varying levels of ``novelty" as measured by the TEA plots. Our experiments demonstrate that the proposed TransformerG2G model outperforms conventional multi-step methods and our prior work (DynG2G) in terms of both link prediction accuracy and computational efficiency, especially for high degree of novelty. Furthermore, the learned time-dependent attention weights across multiple graph snapshots reveal the development of an automatic adaptive time stepping enabled by the transformer. Importantly, by examining the attention weights, we can uncover temporal dependencies, identify influential elements, and gain insights into the complex interactions within the graph structure. For example, we identified a strong correlation between attention weights and node degree at the various stages of the graph topology evolution.
    摘要 “动态图 embedding 技术在许多应用中成为了非常有效的方法,用于解决多种时间图分析任务(如链接预测、节点分类、推荐系统、异常检测和图生成)。这些时间图具有不同的时间间隔和变化快速的节点特征。因此,在学习时间图动态的过程中,需要考虑长距离的历史图 conte xt。在这篇论文中,我们开发了一种图 embedding 模型,名为 TransformerG2G,通过利用高级变换 encoder 来首先从当前状态($t$)和上一个时间步([$t-1$, $t-l$)中学习中间节点表示。此外,我们采用了两层投影层来生成每个节点的几个维度的多元 Gaussian 分布作为其秘密嵌入。我们在多种不同的“新鲜度”测试(根据 TEA 图表)中进行了多种 benchmark,结果显示,我们的提案的 TransformerG2G 模型在链接预测精度和计算效率方面都高于传统的多步方法和我们之前的 DynG2G 模型,特别是在高度新鲜度时。此外,我们通过分析注意力权重来揭示时间图中的时间依赖关系,找到影响因子和强调关系,并从图结构中获得有价值的信息。例如,我们发现了节点度和注意力权重之间的强相关关系在不同的图结构发展阶段。”
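    A schematic of the encoder described above: a transformer encoder reads a node's feature history over the context window, and two projection heads output the mean and a diagonal standard deviation for the node's Gaussian embedding at the current time. Layer sizes, pooling on the last time step, and the softplus parameterization are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianNodeEncoder(nn.Module):
    """Encode a node's history (l+1 snapshots of node features) into a
    multivariate Gaussian embedding N(mu, diag(sigma^2)) at the current time."""
    def __init__(self, in_dim, d_model=64, emb_dim=16, nhead=4, nlayers=2):
        super().__init__()
        self.inp = nn.Linear(in_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=nlayers)
        self.to_mu = nn.Linear(d_model, emb_dim)
        self.to_sigma = nn.Linear(d_model, emb_dim)

    def forward(self, history):                 # history: (batch, l+1, in_dim)
        h = self.encoder(self.inp(history))     # (batch, l+1, d_model)
        h_t = h[:, -1]                          # representation at the current step
        mu = self.to_mu(h_t)
        sigma = F.softplus(self.to_sigma(h_t)) + 1e-6   # positive std devs
        return mu, sigma

if __name__ == "__main__":
    enc = GaussianNodeEncoder(in_dim=32)
    mu, sigma = enc(torch.randn(8, 5, 32))
    print(mu.shape, sigma.shape)                # torch.Size([8, 16]) torch.Size([8, 16])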

Multimodal Temporal Fusion Transformers Are Good Product Demand Forecasters

  • paper_url: http://arxiv.org/abs/2307.02578
  • repo_url: None
  • paper_authors: Maarten Sukel, Stevan Rudinac, Marcel Worring
  • for: 预测产品需求,解决冷启动问题和类别动态问题。
  • methods: 使用 convolutional 网络、图Structured 网络和 transformer 网络,将多Modal 信息(图像和文本描述)与历史需求、类别信息和时序信息结合使用。
  • results: 在大规模实际数据集上进行实验,提出的方法可以有效地预测各种产品的需求,并且与传统方法相比,具有更高的准确率和可靠性。
    Abstract Multimodal demand forecasting aims at predicting product demand utilizing visual, textual, and contextual information. This paper proposes a method for multimodal product demand forecasting using convolutional, graph-based, and transformer-based architectures. Traditional approaches to demand forecasting rely on historical demand, product categories, and additional contextual information such as seasonality and events. However, these approaches have several shortcomings, such as the cold start problem making it difficult to predict product demand until sufficient historical data is available for a particular product, and their inability to properly deal with category dynamics. By incorporating multimodal information, such as product images and textual descriptions, our architecture aims to address the shortcomings of traditional approaches and outperform them. The experiments conducted on a large real-world dataset show that the proposed approach effectively predicts demand for a wide range of products. The multimodal pipeline presented in this work enhances the accuracy and reliability of the predictions, demonstrating the potential of leveraging multimodal information in product demand forecasting.
    摘要 多模态需求预测旨在利用视觉、文本和上下文信息预测产品需求。这篇论文提出一种基于卷积网络、图结构网络和 Transformer 结构的多模态产品需求预测方法。传统的需求预测方法依靠历史需求、产品类别以及季节性和事件等附加上下文信息。然而,这些方法存在若干缺陷,例如冷启动问题(在某一产品积累足够历史数据之前难以预测其需求),以及无法妥善处理类别动态。通过将产品图像和文本描述等多模态信息纳入模型,我们的方法旨在克服传统方法的缺陷并超越它们。在大规模真实数据集上的实验表明,所提方法能够有效预测各类产品的需求。本文提出的多模态流程提升了预测的准确性和可靠性,展示了在产品需求预测中利用多模态信息的潜力。

Several categories of Large Language Models (LLMs): A Short Survey

  • paper_url: http://arxiv.org/abs/2307.10188
  • repo_url: None
  • paper_authors: Saurabh Pahune, Manoj Chandrasekharan
  • for: 这个论文的目的是为聊天机器人和虚拟智能助手技术提供有用的信息和未来方向。
  • methods: 这篇论文涵盖了不同类型的大语言模型(LLM),包括任务基金经济LLM、多语言语言LLM、医学和生物医学LLM、视觉语言LLM以及代码语言模型。它还描述了这些类型的LLM使用的方法、特点、数据集、变换器模型和比较度量。
  • results: 这篇论文总结了各类LLM的研究进展和努力,包括任务基金经济LLM、多语言语言LLM、医学和生物医学LLM、视觉语言LLM以及代码语言模型。它还强调了聊天机器人和虚拟智能助手技术的未解决问题,如提高自然语言处理、提高聊天机器人智能和解决道德和法律问题。
    Abstract Large Language Models(LLMs)have become effective tools for natural language processing and have been used in many different fields. This essay offers a succinct summary of various LLM subcategories. The survey emphasizes recent developments and efforts made for various LLM kinds, including task-based financial LLMs, multilingual language LLMs, biomedical and clinical LLMs, vision language LLMs, and code language models. The survey gives a general summary of the methods, attributes, datasets, transformer models, and comparison metrics applied in each category of LLMs. Furthermore, it highlights unresolved problems in the field of developing chatbots and virtual assistants, such as boosting natural language processing, enhancing chatbot intelligence, and resolving moral and legal dilemmas. The purpose of this study is to provide readers, developers, academics, and users interested in LLM-based chatbots and virtual intelligent assistant technologies with useful information and future directions.
    摘要 大型自然语言模型(LLM)已成为自然语言处理的有效工具,广泛应用于多个领域。本文提供LLM各种子类划分的简洁概述,强调最新的发展和努力。包括任务基金 languages LLMs, 多语言语言 LLMs, 医疗和临床 LLMs, 视觉语言 LLMs, 和代码语言模型。本文介绍每种LLM类型的方法、特点、数据集、转换器模型和比较指标。此外,它还抛光了虚拟助手和智能客服技术的未解决问题,如提高自然语言处理、增强聊天机器人智能和解决道德和法律问题。本文的目的是为有关LLM基于聊天机器人和虚拟智能助手技术的读者、开发者、学者和用户提供有用信息和未来方向。

How accurate are existing land cover maps for agriculture in Sub-Saharan Africa?

  • paper_url: http://arxiv.org/abs/2307.02575
  • repo_url: https://github.com/nasaharvest/crop-mask
  • paper_authors: Hannah Kerner, Catherine Nakalembe, Adam Yang, Ivan Zvonkov, Ryan McWeeny, Gabriel Tseng, Inbal Becker-Reshef
  • for: 评估农业生产和粮食安全性
  • methods: 使用11个公共可用的土地覆盖图进行对比和评估,以确定最适合农业监测和评估的土地覆盖图
  • results: 结果表明不同的土地覆盖图在不同国家和地区之间存在差异和低准确性,建议用户选择最适合自己需求的土地覆盖图,并促进未来的研究集中于解决图像间的差异和提高低准确性区域的准确性。
    Abstract Satellite Earth observations (EO) can provide affordable and timely information for assessing crop conditions and food production. Such monitoring systems are essential in Africa, where there is high food insecurity and sparse agricultural statistics. EO-based monitoring systems require accurate cropland maps to provide information about croplands, but there is a lack of data to determine which of the many available land cover maps most accurately identify cropland in African countries. This study provides a quantitative evaluation and intercomparison of 11 publicly available land cover maps to assess their suitability for cropland classification and EO-based agriculture monitoring in Africa using statistically rigorous reference datasets from 8 countries. We hope the results of this study will help users determine the most suitable map for their needs and encourage future work to focus on resolving inconsistencies between maps and improving accuracy in low-accuracy regions.
    摘要 卫星对地观测(EO)能够以低成本、及时的方式提供评估作物状况和粮食产量所需的信息。此类监测系统在粮食不安全问题突出且农业统计数据稀缺的非洲尤为重要。基于 EO 的监测系统需要准确的耕地图,但目前缺乏数据来判断众多现有土地覆盖图中哪一幅能最准确地识别非洲各国的耕地。本研究利用来自 8 个国家、统计上严格的参考数据集,对 11 幅公开可用的土地覆盖图进行了定量评估与相互比较,以考察其在耕地分类和基于 EO 的农业监测中的适用性。我们希望这些结果能帮助用户选择最符合其需求的土地覆盖图,并推动后续研究着力解决各图之间的不一致并提高低精度区域的精度。

Semi-supervised Learning from Street-View Images and OpenStreetMap for Automatic Building Height Estimation

  • paper_url: http://arxiv.org/abs/2307.02574
  • repo_url: https://github.com/bobleegogogo/building_height
  • paper_authors: Hao Li, Zhendong Yuan, Gabriel Dax, Gefei Kong, Hongchao Fan, Alexander Zipf, Martin Werner
  • for: 这种研究的目的是为了提供一种自动将建筑高度从低成本的地理信息数据中提取出来的方法,以便在生成低成本和开源的3D城市模型中使用。
  • methods: 本研究使用了 semi-supervised learning(SSL)方法,包括提出一种SSLSchema,并使用多层形态特征从OSM数据中提取建筑高度的信息。
  • results: 在test dataset中,使用Random Forest(RF)、Support Vector Machine(SVM)和Convolutional Neural Network(CNN)三种不同的回归模型,SSL方法在优化建筑高度估计中带来了明显的性能提升,MAE约为2.1米,与现有方法相比具有竞争力。
    Abstract Accurate building height estimation is key to the automatic derivation of 3D city models from emerging big geospatial data, including Volunteered Geographical Information (VGI). However, an automatic solution for large-scale building height estimation based on low-cost VGI data is currently missing. The fast development of VGI data platforms, especially OpenStreetMap (OSM) and crowdsourced street-view images (SVI), offers a stimulating opportunity to fill this research gap. In this work, we propose a semi-supervised learning (SSL) method of automatically estimating building height from Mapillary SVI and OSM data to generate low-cost and open-source 3D city modeling in LoD1. The proposed method consists of three parts: first, we propose an SSL schema with the option of setting a different ratio of "pseudo label" during the supervised regression; second, we extract multi-level morphometric features from OSM data (i.e., buildings and streets) for the purposed of inferring building height; last, we design a building floor estimation workflow with a pre-trained facade object detection network to generate "pseudo label" from SVI and assign it to the corresponding OSM building footprint. In a case study, we validate the proposed SSL method in the city of Heidelberg, Germany and evaluate the model performance against the reference data of building heights. Based on three different regression models, namely Random Forest (RF), Support Vector Machine (SVM), and Convolutional Neural Network (CNN), the SSL method leads to a clear performance boosting in estimating building heights with a Mean Absolute Error (MAE) around 2.1 meters, which is competitive to state-of-the-art approaches. The preliminary result is promising and motivates our future work in scaling up the proposed method based on low-cost VGI data, with possibilities in even regions and areas with diverse data quality and availability.
    摘要 正确的建筑高度估计是验证3D城市模型自大规模的地理空间数据中emerging的Automatic Derivation的关键,包括义务授权地理信息(VGI)。然而,一个基于低成本VGI数据的大规模建筑高度估计方法目前缺失。开放街图(OSM)和人群创建的街景图像(SVI)的快速发展提供了一个刺激的机会,以填补这个研究潜在。在这个工作中,我们提出了一个半监督学习(SSL)方法,用于自动从Mapillary SVI和OSM数据中估计建筑高度,以生成低成本和开源的3D城市建模(LoD1)。我们的方法包括三个部分:第一,我们提出了SSL schema,让使用者在监督回归中设置不同的"伪标签"比率;第二,我们从OSM数据中提取多层次形态特征,以估计建筑高度;第三,我们设计了一个建筑楼层估计工作流程,使用预训练的外观物体检测网络来从SVI中生成"伪标签",并将其分配到相应的OSM建筑基本面。在一个 Heidelberg 城市的应用中,我们认为SSL方法可以明显提高建筑高度估计的性能,MAE约2.1米,与现有方法竞争。这个初步结果将验证我们未来将基于低成本VGI数据扩展提案,包括不同的数据质量和可用性。

Conditional Karhunen-Loève regression model with Basis Adaptation for high-dimensional problems: uncertainty quantification and inverse modeling

  • paper_url: http://arxiv.org/abs/2307.02572
  • repo_url: None
  • paper_authors: Yu-Hong Yeung, Ramakrishna Tipireddy, David A. Barajas-Solano, Alexandre M. Tartakovsky
  • for: 这篇论文旨在提高物理系统可观测响应的代理模型(surrogate model)精度,特别是在高维问题中,并应用于不确定性量化与参数估计。
  • methods: 论文使用截断的无条件 Karhunen-Loève 展开(KLE)与条件 Karhunen-Loève 展开(CKLE)来构建代理模型;CKLE 通过高斯过程回归把无条件展开的协方差核在直接测量上进行条件化后再截断。
  • results: 结果表明,在正向不确定性量化任务中,基于 CKLE 的 BA(Basis Adaptation)代理模型比基于无条件展开的 BA 代理模型更精确;利用基于 CKLE 的 BA 代理模型进行水力导水率场的反演估计,结果也比无条件 BA 代理模型更准确。
    Abstract We propose a methodology for improving the accuracy of surrogate models of the observable response of physical systems as a function of the systems' spatially heterogeneous parameter fields with applications to uncertainty quantification and parameter estimation in high-dimensional problems. Practitioners often formulate finite-dimensional representations of spatially heterogeneous parameter fields using truncated unconditional Karhunen-Lo\'{e}ve expansions (KLEs) for a certain choice of unconditional covariance kernel and construct surrogate models of the observable response with respect to the random variables in the KLE. When direct measurements of the parameter fields are available, we propose improving the accuracy of these surrogate models by representing the parameter fields via conditional Karhunen-Lo\'{e}ve expansions (CKLEs). CKLEs are constructed by conditioning the covariance kernel of the unconditional expansion on the direct measurements via Gaussian process regression and then truncating the corresponding KLE. We apply the proposed methodology to constructing surrogate models via the Basis Adaptation (BA) method of the stationary hydraulic head response, measured at spatially discrete observation locations, of a groundwater flow model of the Hanford Site, as a function of the 1,000-dimensional representation of the model's log-transmissivity field. We find that BA surrogate models of the hydraulic head based on CKLEs are more accurate than BA surrogate models based on unconditional expansions for forward uncertainty quantification tasks. Furthermore, we find that inverse estimates of the hydraulic transmissivity field computed using CKLE-based BA surrogate models are more accurate than those computed using unconditional BA surrogate models.
    摘要 我们提出一种方法来提高非模拟模型 observable response 的准确性,该 observable response 是基于物理系统的空间各个参数场的函数。实际工作者通常使用 truncated unconditional Karhunen-Lo\'{e}ve expansion (KLE) 来形式化空间各个参数场,然后使用这些 KLE 来构建非模拟模型。当直接测量 parameter 场可用时,我们提议使用 conditional Karhunen-Lo\'{e}ve expansion (CKLE) 来改进非模拟模型的准确性。CKLE 通过 conditioning covariance kernel 的 unconditional expansion 于直接测量,使用 Gaussian process regression truncate 相应的 KLE。我们应用该方法ологи到 constructing 非模拟模型 via Basis Adaptation (BA) 方法, Specifically, we use the BA method to construct surrogate models of the stationary hydraulic head response of a groundwater flow model of the Hanford Site, as a function of the 1,000-dimensional representation of the model's log-transmissivity field. Our results show that BA surrogate models based on CKLEs are more accurate than BA surrogate models based on unconditional expansions for forward uncertainty quantification tasks. Furthermore, we find that inverse estimates of the hydraulic transmissivity field computed using CKLE-based BA surrogate models are more accurate than those computed using unconditional BA surrogate models.
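    The construction of a CKLE can be sketched numerically: condition the prior covariance kernel on direct point measurements via Gaussian process regression, then eigendecompose the conditional covariance and truncate. The RBF kernel and the 1-D grid below are illustrative choices, not those of the Hanford Site model.

```python
import numpy as np

def rbf_kernel(xa, xb, variance=1.0, length=0.2):
    d = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

def conditional_kle(x_grid, x_obs, noise_var=1e-4, n_modes=10):
    """Condition the prior covariance on direct measurements at x_obs via GP
    regression, then return the leading eigenpairs (the truncated CKLE)."""
    K_gg = rbf_kernel(x_grid, x_grid)
    K_go = rbf_kernel(x_grid, x_obs)
    K_oo = rbf_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    C_cond = K_gg - K_go @ np.linalg.solve(K_oo, K_go.T)   # conditional covariance
    evals, evecs = np.linalg.eigh(C_cond)
    idx = np.argsort(evals)[::-1][:n_modes]                # leading modes
    return evals[idx], evecs[:, idx]

if __name__ == "__main__":
    x_grid = np.linspace(0.0, 1.0, 200)
    x_obs = np.array([0.1, 0.45, 0.8])                     # direct measurement sites
    evals, modes = conditional_kle(x_grid, x_obs)
    print(evals[:3])                                       # variance captured by leading modes
    # A CKLE realization: field = mean + modes @ (np.sqrt(evals) * xi), xi ~ N(0, I)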

LongNet: Scaling Transformers to 1,000,000,000 Tokens

  • paper_url: http://arxiv.org/abs/2307.02486
  • repo_url: https://github.com/microsoft/unilm
  • paper_authors: Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei
  • for: 长度较长的序列处理,例如整个文库或互联网序列。
  • methods: 提出了膨胀注意力(dilated attention),随着距离增大按指数扩大注意力范围,从而提高长序列处理的效率和覆盖范围。
  • results: 实验结果表明,LongNet可以在长序列模型和通用语言任务上达到强表现,并且可以serve为分布式训练器。
    Abstract Scaling sequence length has become a critical demand in the era of large language models. However, existing methods struggle with either computational complexity or model expressivity, rendering the maximum sequence length restricted. To address this issue, we introduce LongNet, a Transformer variant that can scale sequence length to more than 1 billion tokens, without sacrificing the performance on shorter sequences. Specifically, we propose dilated attention, which expands the attentive field exponentially as the distance grows. LongNet has significant advantages: 1) it has a linear computation complexity and a logarithm dependency between any two tokens in a sequence; 2) it can be served as a distributed trainer for extremely long sequences; 3) its dilated attention is a drop-in replacement for standard attention, which can be seamlessly integrated with the existing Transformer-based optimization. Experiments results demonstrate that LongNet yields strong performance on both long-sequence modeling and general language tasks. Our work opens up new possibilities for modeling very long sequences, e.g., treating a whole corpus or even the entire Internet as a sequence.
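    A much-simplified, single-head sketch of the dilated-attention idea: attention is computed only among every r-th token inside fixed-size segments, with several (segment size, dilation) patterns whose outputs are combined, so each pattern costs time linear in the sequence length. The pattern sizes and the simple averaging used to combine patterns are assumptions; LongNet uses multiple heads and softmax-denominator weighting.

```python
import torch

def dilated_attention(q, k, v, segment_sizes=(16, 32, 64), dilations=(1, 2, 4)):
    """Simplified single-head dilated attention. For each (segment size w,
    dilation r) pair, the sequence is split into segments of length w and
    attention is computed only among every r-th token inside each segment,
    so the per-pair cost is O(N * w / r). Outputs are averaged over patterns."""
    n, d = q.shape
    out = torch.zeros_like(v)
    counts = torch.zeros(n, 1)
    for w, r in zip(segment_sizes, dilations):
        assert n % w == 0, "sketch assumes the length divides the segment size"
        for start in range(0, n, w):
            idx = torch.arange(start, start + w, r)
            qs, ks, vs = q[idx], k[idx], v[idx]
            attn = torch.softmax(qs @ ks.T / d ** 0.5, dim=-1)
            out[idx] += attn @ vs
            counts[idx] += 1
    return out / counts

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(128, 32)
    print(dilated_attention(x, x, x).shape)   # torch.Size([128, 32])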

Elastic Decision Transformer

  • paper_url: http://arxiv.org/abs/2307.02484
  • repo_url: https://github.com/danderfer/Comp_Sci_Sem_2
  • paper_authors: Yueh-Hua Wu, Xiaolong Wang, Masashi Hamaya
  • for: The paper proposes Elastic Decision Transformer (EDT), an extension of the Decision Transformer (DT) that addresses DT's difficulty with trajectory stitching when generating (near-)optimal trajectories.
  • methods: EDT performs trajectory stitching during action inference at test time by adjusting the history length maintained in DT: it retains a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, allowing it to stitch toward a more optimal trajectory.
  • results: On the D4RL locomotion benchmark and Atari games, EDT outperforms Q-learning-based methods, especially in the multi-task regime.
    Abstract This paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests it struggles with trajectory stitching, a process involving the generation of an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. The proposed EDT differentiates itself by facilitating trajectory stitching during action inference at test time, achieved by adjusting the history length maintained in DT. Further, the EDT optimizes the trajectory by retaining a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, enabling it to "stitch" with a more optimal trajectory. Extensive experimentation demonstrates EDT's ability to bridge the performance gap between DT-based and Q Learning-based approaches. In particular, the EDT outperforms Q Learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games. Videos are available at: https://kristery.github.io/edt/
    摘要 While DT aims to generate an optimal trajectory, it struggles with trajectory stitching; EDT addresses this by stitching trajectories during action inference at test time, achieved by adjusting the history length maintained in DT. EDT retains a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, allowing it to "stitch" with a more optimal trajectory. Experiments show that EDT can bridge the performance gap between DT-based and Q-learning-based approaches; in particular, EDT outperforms Q-learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games. Videos are available at: https://kristery.github.io/edt/
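    The test-time behaviour can be caricatured as a search over history lengths: score each candidate truncation of the recent trajectory with the model's own estimate of the best achievable return, and act using the highest-scoring history. `estimate_max_return` and `predict_action` are hypothetical stand-ins for the model's heads, not EDT's actual interface.

```python
def elastic_action(model, trajectory, candidate_lengths=(1, 2, 4, 8, 16)):
    """Schematic of EDT-style action inference: keep a longer history when the
    recent trajectory looks optimal and a shorter one when it does not, by
    scoring each candidate history length with the model's own estimate of the
    maximal achievable return. The model methods are assumed interfaces."""
    best_len, best_score = None, float("-inf")
    for h in candidate_lengths:
        history = trajectory[-h:]                       # most recent h steps
        score = model.estimate_max_return(history)
        if score > best_score:
            best_len, best_score = h, score
    return model.predict_action(trajectory[-best_len:])

if __name__ == "__main__":
    class DummyModel:                                   # toy stand-in for the learned model
        def estimate_max_return(self, history):
            return sum(r for (_, _, r) in history)
        def predict_action(self, history):
            return len(history) % 3
    traj = [(None, None, r) for r in [0.1, 0.5, -0.2, 0.9, 0.3]]
    print(elastic_action(DummyModel(), traj))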

Jailbroken: How Does LLM Safety Training Fail?

  • paper_url: http://arxiv.org/abs/2307.02483
  • repo_url: None
  • paper_authors: Alexander Wei, Nika Haghtalab, Jacob Steinhardt
  • for: 本研究旨在探讨大语言模型受到攻击的原因和如何创造攻击。
  • methods: 研究人员使用了两种故障模式来引导攻击设计:竞合目标和欠拟合泛化。
  • results: 研究发现,即使使用了广泛的红队训练和安全训练,现有的模型仍然存在漏洞,新的攻击方法可以在所有提问集中成功。
    Abstract Large language models trained for safety and harmlessness remain susceptible to adversarial misuse, as evidenced by the prevalence of "jailbreak" attacks on early releases of ChatGPT that elicit undesired behavior. Going beyond recognition of the issue, we investigate why such attacks succeed and how they can be created. We hypothesize two failure modes of safety training: competing objectives and mismatched generalization. Competing objectives arise when a model's capabilities and safety goals conflict, while mismatched generalization occurs when safety training fails to generalize to a domain for which capabilities exist. We use these failure modes to guide jailbreak design and then evaluate state-of-the-art models, including OpenAI's GPT-4 and Anthropic's Claude v1.3, against both existing and newly designed attacks. We find that vulnerabilities persist despite the extensive red-teaming and safety-training efforts behind these models. Notably, new attacks utilizing our failure modes succeed on every prompt in a collection of unsafe requests from the models' red-teaming evaluation sets and outperform existing ad hoc jailbreaks. Our analysis emphasizes the need for safety-capability parity -- that safety mechanisms should be as sophisticated as the underlying model -- and argues against the idea that scaling alone can resolve these safety failure modes.
    摘要 为安全性和无害性而训练的大型语言模型仍然容易被恶意利用,例如 ChatGPT 早期版本上流行的"越狱"攻击就能诱发不良行为。我们不仅确认这一问题,还进一步探究这类攻击为何能成功以及如何构造。我们假设安全训练存在两种失效模式:目标竞争与泛化失配。目标竞争指模型的能力与安全目标发生冲突;泛化失配指安全训练未能泛化到模型能力所覆盖的领域。我们利用这两种失效模式指导越狱攻击的设计,并据此评估了包括 OpenAI 的 GPT-4 和 Anthropic 的 Claude v1.3 在内的最先进模型在现有攻击与新设计攻击下的表现。我们发现,尽管这些模型经过了大量红队测试和安全训练,漏洞依然存在。尤其是,基于上述失效模式构造的新攻击在模型红队评估集中的每一条不安全请求上都能成功,且效果优于现有的临时性越狱方法。我们的分析强调安全机制应与底层模型同样精巧(即"安全-能力对等"),并反对仅靠扩大规模即可解决这些安全失效模式的观点。

Conditional independence testing under model misspecification

  • paper_url: http://arxiv.org/abs/2307.02520
  • repo_url: None
  • paper_authors: Felipe Maia Polo, Yuekai Sun, Moulinath Banerjee
  • for: 本文研究了模型错误的情况下 regression-based 独立性测试的性能。
  • methods: 本文提出了三种 regression-based 测试方法的新上界或近似值,以及一种robust against model misspecification的新测试方法——Rao-Blackwellized Predictor Test (RBPT)。
  • results: 实验结果表明,RBPT 可以在模型错误情况下提供更好的测试性能,而且可以与现有的测试方法进行比较。
    Abstract Conditional independence (CI) testing is fundamental and challenging in modern statistics and machine learning. Many modern methods for CI testing rely on powerful supervised learning methods to learn regression functions or Bayes predictors as an intermediate step. Although the methods are guaranteed to control Type-I error when the supervised learning methods accurately estimate the regression functions or Bayes predictors, their behavior is less understood when they fail due to model misspecification. In a broader sense, model misspecification can arise even when universal approximators (e.g., deep neural nets) are employed. Then, we study the performance of regression-based CI tests under model misspecification. Namely, we propose new approximations or upper bounds for the testing errors of three regression-based tests that depend on misspecification errors. Moreover, we introduce the Rao-Blackwellized Predictor Test (RBPT), a novel regression-based CI test robust against model misspecification. Finally, we conduct experiments with artificial and real data, showcasing the usefulness of our theory and methods.
    摘要 conditional independence (CI) 测试是现代统计和机器学习中的基础和挑战。许多现代CI测试方法 rely on 强大的指导学习方法来学习回归函数或 bayes 预测函数作为中间步骤。虽然这些方法能够控制类型一错误,但它们在模型误差时表现不准确。在更广泛的意义上,模型误差可以出现,即使使用 universial approximators(例如深度神经网)。我们研究了 regression-based CI 测试下模型误差的性能。即,我们提出了新的近似或上限值 для三种 regression-based 测试的测试误差,其中受到模型误差的影响。此外,我们引入了 Rao-Blackwellized Predictor Test(RBPT),一种robust against model misspecification的新的 regression-based CI 测试。最后,我们在人工和实际数据上进行了实验,展示了我们的理论和方法的实用性。
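    For background, the generic regression-based recipe that such tests build on (and whose behaviour under misspecification the paper studies) is a partial-correlation test on regression residuals; the sketch below shows this baseline, not the paper's Rao-Blackwellized Predictor Test.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

def regression_ci_test(x, y, z):
    """Generic regression-based test of X independent of Y given Z: regress X
    and Y on Z and test the correlation of the residuals. Returns (statistic,
    p_value). Misspecification of either regression can invalidate the test,
    which is exactly the failure mode the paper analyzes."""
    rx = x - LinearRegression().fit(z, x).predict(z)
    ry = y - LinearRegression().fit(z, y).predict(z)
    return stats.pearsonr(rx, ry)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    z = rng.normal(size=(n, 2))
    x = z @ np.array([1.0, -0.5]) + rng.normal(size=n)
    y = z @ np.array([0.3, 0.8]) + rng.normal(size=n)   # independent of x given z
    stat, p = regression_ci_test(x, y, z)
    print(f"correlation={stat:.3f}, p={p:.3f}")         # p should be large here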

Linear Regression on Manifold Structured Data: the Impact of Extrinsic Geometry on Solutions

  • paper_url: http://arxiv.org/abs/2307.02478
  • repo_url: None
  • paper_authors: Liangchen Liu, Juncai He, Richard Tsai
  • for: 这个论文研究了线性回归在数据构造在 manifold 上的应用。
  • methods: 作者们使用了 manifold 的嵌入空间的欧几何学空间来分析数据构造的外在 geometry 对回归的影响。
  • results: 研究发现,当数据构造在某些维度上是平坦的时候,线性回归无Unique解。而在其他维度上,拓扑 manifold 的 curvature(或者参数化时的高阶非线性)会对回归解决做出重要贡献,特别是在沿着 manifold 的正常方向解决。这些发现表明了数据构造geometry对回归模型的稳定性具有重要作用。
    Abstract In this paper, we study linear regression applied to data structured on a manifold. We assume that the data manifold is smooth and is embedded in a Euclidean space, and our objective is to reveal the impact of the data manifold's extrinsic geometry on the regression. Specifically, we analyze the impact of the manifold's curvatures (or higher order nonlinearity in the parameterization when the curvatures are locally zero) on the uniqueness of the regression solution. Our findings suggest that the corresponding linear regression does not have a unique solution when the embedded submanifold is flat in some dimensions. Otherwise, the manifold's curvature (or higher order nonlinearity in the embedding) may contribute significantly, particularly in the solution associated with the normal directions of the manifold. Our findings thus reveal the role of data manifold geometry in ensuring the stability of regression models for out-of-distribution inferences.
    摘要 在这篇论文中,我们研究了线性回归在数据构造在拟合空间上的应用。我们假设数据构造是光滑的,并且嵌入在几何空间中,我们的目标是探讨数据构造的外部几何特性对回归的影响。Specifically,我们分析了数据构造的曲率(或者在参数化时的高阶非线性)对回归解决uniqueness的影响。我们发现当托管的子拟合空间在一些维度上是平的时,相应的线性回归并没有唯一的解。否则,拟合空间的曲率(或者参数化时的高阶非线性)可能会对解决做出重要贡献,特别是在与拟合空间的正常方向相关的解决方案中。我们的发现表明了数据构造几何的作用在外部归一致推断中的稳定性。
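    A tiny numerical illustration of the flat-versus-curved distinction made above: when the data lie on a line in the plane, the design matrix is rank-deficient and the least-squares fit is not unique in the normal direction, whereas curvature restores uniqueness.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=200)

# Data on a FLAT 1-D manifold embedded in R^2: the line x2 = 2 * x1.
X_flat = np.column_stack([t, 2.0 * t])
y = 3.0 * t + 0.01 * rng.normal(size=t.size)

print(np.linalg.matrix_rank(X_flat))            # 1 -> infinitely many equally good fits
w_min_norm = np.linalg.lstsq(X_flat, y, rcond=None)[0]
w_other = w_min_norm + np.array([2.0, -1.0])    # (2, -1) lies in the null space of [1, 2]
print(np.allclose(X_flat @ w_min_norm, X_flat @ w_other))   # True: same predictions, different weights

# Data on a CURVED manifold (a parabola): the nonlinearity removes the degeneracy.
X_curved = np.column_stack([t, t ** 2])
print(np.linalg.matrix_rank(X_curved))          # 2 -> unique least-squares solution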

Performance Scaling via Optimal Transport: Enabling Data Selection from Partially Revealed Sources

  • paper_url: http://arxiv.org/abs/2307.02460
  • repo_url: None
  • paper_authors: Feiyang Kang, Hoang Anh Just, Anit Kumar Sahu, Ruoxi Jia
  • for: This paper aims to improve data selection for machine learning models, focusing on the scenario where only a limited subset of samples from each prospective source is revealed before an acquisition decision is made.
  • methods: The proposed framework uses a two-stage performance inference process: 1) leveraging the Optimal Transport distance to predict the model's performance for any data mixture ratio within the range of disclosed data sizes; 2) extrapolating the performance to larger undisclosed data sizes via a novel parameter-free mapping technique inspired by neural scaling laws.
  • results: The proposed framework significantly improves existing performance scaling approaches in both the accuracy of performance inference and the computation cost of constructing the performance predictor, and it outperforms other off-the-shelf solutions in data selection effectiveness.
    Abstract Traditionally, data selection has been studied in settings where all samples from prospective sources are fully revealed to a machine learning developer. However, in practical data exchange scenarios, data providers often reveal only a limited subset of samples before an acquisition decision is made. Recently, there have been efforts to fit scaling laws that predict model performance at any size and data source composition using the limited available samples. However, these scaling functions are black-box, computationally expensive to fit, highly susceptible to overfitting, and/or difficult to optimize for data selection. This paper proposes a framework that predicts model performance and supports data selection decisions based on partial samples of prospective data sources. Our approach distinguishes itself from existing work by introducing a novel *two-stage* performance inference process. In the first stage, we leverage the Optimal Transport distance to predict the model's performance for any data mixture ratio within the range of disclosed data sizes. In the second stage, we extrapolate the performance to larger undisclosed data sizes based on a novel parameter-free mapping technique inspired by neural scaling laws. We further derive an efficient gradient-based method to select data sources based on the projected model performance. Evaluation over a diverse range of applications demonstrates that the proposed framework significantly improves existing performance scaling approaches in terms of both the accuracy of performance inference and the computation costs associated with constructing the performance predictor. It also outperforms a range of other off-the-shelf solutions in data selection effectiveness by a wide margin.
    摘要 传统上,数据选择已经被研究在所有样本都是完全公开给机器学习开发人员的设置下。然而,在实际数据交换场景下,数据提供者通常只披露一个有限的子集的样本前于收购决策。最近,有人尝试使用有限可用样本来预测模型性能的拓扑函数。然而,这些拓扑函数是黑盒子,计算成本高,易于过拟合,或者困难优化数据选择。这篇论文提出了一个名为的框架,可以根据部分样本预测模型性能并支持数据选择决策。我们的方法与现有工作不同,我们引入了一种新的两stage表现预测过程。在第一个阶段,我们利用最优运输距离来预测模型在任何数据混合比例范围内的性能。在第二个阶段,我们通过一种新的无参数映射技术,基于神经拓扑法则,来推断模型在未知大小的数据上的性能。我们还提出了一种高效的梯度下降方法来选择数据源基于预测模型性能。对于多种应用程序的评估表明,在性能拓扑预测和构建表现预测器的计算成本方面都有显著改进,而且在数据选择效果上也高效过很多其他OFFTHE-SHELF解决方案。
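    The second stage described above, extrapolating performance from disclosed to larger undisclosed data sizes, can be caricatured with a simple scaling-law fit. The paper's mapping is parameter-free and more refined, and the first-stage Optimal Transport prediction is omitted here, so this is only a stand-in.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_and_extrapolate(sizes, errors, target_sizes):
    """Fit a simple scaling law error ~ a * n^(-b) + c on the disclosed sizes
    and extrapolate to larger, undisclosed sizes. A crude stand-in for the
    paper's parameter-free mapping inspired by neural scaling laws."""
    law = lambda n, a, b, c: a * np.power(n, -b) + c
    (a, b, c), _ = curve_fit(law, sizes, errors, p0=(1.0, 0.5, 0.05), maxfev=10000)
    return law(np.asarray(target_sizes, dtype=float), a, b, c)

if __name__ == "__main__":
    sizes = np.array([1e3, 2e3, 5e3, 1e4])        # disclosed sample sizes
    errors = 2.0 * sizes ** -0.4 + 0.05           # observed validation errors
    print(fit_and_extrapolate(sizes, errors, [5e4, 1e5]))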

Gaussian Database Alignment and Gaussian Planted Matching

  • paper_url: http://arxiv.org/abs/2307.02459
  • repo_url: None
  • paper_authors: Osman Emre Dai, Daniel Cullina, Negar Kiyavash
  • for: 这种研究的目的是解决数据库对照问题,即给定两个匿名化的数据库,找到它们之间的相似性并将它们对应地进行对照。
  • methods: 这种问题与植入匹配问题相关,即给定一个大图,找到该图中的匹配。作者们使用了最大似然方法来解决这种问题,该方法是一个线性程序。
  • results: 作者们发现,当数据库特征维度为ω(log n)时,对照性的性能阈值与植入匹配阈值相同。此外,作者们还研究了各种约束的放松以更好地理解它们在不同情况下的效果,并提供了可达性和反向下界 bounds。
    Abstract Database alignment is a variant of the graph alignment problem: Given a pair of anonymized databases containing separate yet correlated features for a set of users, the problem is to identify the correspondence between the features and align the anonymized user sets based on correlation alone. This closely relates to planted matching, where given a bigraph with random weights, the goal is to identify the underlying matching that generated the given weights. We study an instance of the database alignment problem with multivariate Gaussian features and derive results that apply both for database alignment and for planted matching, demonstrating the connection between them. The performance thresholds for database alignment converge to that for planted matching when the dimensionality of the database features is \(\omega(\log n)\), where \(n\) is the size of the alignment, and no individual feature is too strong. The maximum likelihood algorithms for both planted matching and database alignment take the form of a linear program and we study relaxations to better understand the significance of various constraints under various conditions and present achievability and converse bounds. Our results show that the almost-exact alignment threshold for the relaxed algorithms coincide with that of maximum likelihood, while there is a gap between the exact alignment thresholds. Our analysis and results extend to the unbalanced case where one user set is not fully covered by the alignment.
    摘要 数据库对齐是图对齐问题的一种变体:给定一对匿名化的数据库,其中包含同一组用户彼此独立但相互关联的特征,任务是仅凭相关性识别特征之间的对应关系,并据此对齐匿名化的用户集合。这与植入匹配问题密切相关:给定一个带随机权重的二部图,目标是找出生成这些权重的底层匹配。我们研究了具有多元高斯特征的数据库对齐问题,并推导出同时适用于数据库对齐与植入匹配的结果,从而揭示了二者之间的联系。当数据库特征维度为 ω(log n)(n 为对齐规模)且没有任何单一特征过强时,数据库对齐的性能阈值收敛到植入匹配的阈值。两类问题的最大似然算法都可表述为线性规划;我们研究了其松弛形式,以理解各种约束在不同条件下的作用,并给出可达性与逆向界。结果表明,松弛算法的"几乎精确"对齐阈值与最大似然一致,而"精确"对齐阈值之间存在差距。我们的分析与结果还推广到对齐未完全覆盖某一用户集合的非平衡情形。
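    As the abstract notes, the maximum-likelihood aligner takes the form of a linear program; for correlated Gaussian features it reduces to a linear assignment on pairwise similarity scores. A minimal sketch follows, where the inner-product score is an assumption appropriate for a correlated-Gaussian model rather than the paper's exact statistic.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_databases(A, B):
    """Align rows of two anonymized feature matrices by solving a linear
    assignment problem on pairwise scores. Returns, for each row of A, the
    index of its matched row in B."""
    scores = A @ B.T                              # higher score = more likely the same user
    rows, cols = linear_sum_assignment(-scores)   # maximize total score
    return cols[np.argsort(rows)]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d, noise = 200, 50, 0.3                    # alignment gets easier as d grows
    A = rng.normal(size=(n, d))
    perm = rng.permutation(n)
    B = A[perm] + noise * rng.normal(size=(n, d)) # correlated, shuffled copy of A
    est = align_databases(A, B)
    print("fraction correctly aligned:", np.mean(perm[est] == np.arange(n)))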

Transgressing the boundaries: towards a rigorous understanding of deep learning and its (non-)robustness

  • paper_url: http://arxiv.org/abs/2307.02454
  • repo_url: None
  • paper_authors: Carsten Hartmann, Lorenz Richter
  • for: This paper focuses on the robustness issues of deep learning (DL) and bridges concerns and attempts from approximation theory to statistical learning theory.
  • methods: The paper reviews Bayesian Deep Learning as a means for uncertainty quantification and rigorous explainability.
  • results: The paper provides a systematic mathematical approach to understanding the specifics of DL and its success in various applications.
    Abstract The recent advances in machine learning in various fields of applications can be largely attributed to the rise of deep learning (DL) methods and architectures. Despite being a key technology behind autonomous cars, image processing, speech recognition, etc., a notorious problem remains the lack of theoretical understanding of DL and related interpretability and (adversarial) robustness issues. Understanding the specifics of DL, as compared to, say, other forms of nonlinear regression methods or statistical learning, is interesting from a mathematical perspective, but at the same time it is of crucial importance in practice: treating neural networks as mere black boxes might be sufficient in certain cases, but many applications require waterproof performance guarantees and a deeper understanding of what could go wrong and why it could go wrong. It is probably fair to say that, despite being mathematically well founded as a method to approximate complicated functions, DL is mostly still more like modern alchemy that is firmly in the hands of engineers and computer scientists. Nevertheless, it is evident that certain specifics of DL that could explain its success in applications demands systematic mathematical approaches. In this work, we review robustness issues of DL and particularly bridge concerns and attempts from approximation theory to statistical learning theory. Further, we review Bayesian Deep Learning as a means for uncertainty quantification and rigorous explainability.
    摘要 Machine learning has made significant progress in various fields, thanks to the rise of deep learning (DL) methods and architectures. However, a major challenge remains the lack of theoretical understanding of DL, which hinders its widespread adoption. This review aims to provide an overview of the challenges in DL, specifically in the areas of interpretability and robustness, and discuss potential solutions. The importance of understanding DL: deep learning is a powerful tool for approximating complex functions, but its lack of theoretical understanding poses significant challenges. While it may be sufficient to treat neural networks as black boxes in some cases, many applications require rigorous performance guarantees and a deeper understanding of what can go wrong. This is particularly important in high-stakes applications such as autonomous vehicles, image processing, and speech recognition. Mathematical foundations of DL: despite its success in applications, DL is still largely based on modern alchemy, with many of its underlying principles still not well understood. However, recent advances in approximation theory and statistical learning theory have provided valuable insights into the mathematical foundations of DL. Robustness issues in DL: one of the major challenges in DL is its lack of robustness to adversarial attacks and other forms of input noise. This has serious implications for applications where DL models must be relied upon to make critical decisions. To address this challenge, researchers have proposed various techniques, such as adversarial training and input preprocessing. Bayesian deep learning: Bayesian deep learning (BDL) is a promising approach to addressing the challenges of DL. BDL combines the strengths of deep learning with the principles of Bayesian inference, providing a framework for uncertainty quantification and rigorous explainability. By incorporating prior knowledge and uncertainty into the DL framework, BDL offers a more robust and reliable approach to machine learning. Conclusion: deep learning has revolutionized the field of machine learning, but its lack of theoretical understanding poses significant challenges. By reviewing the robustness issues of DL and exploring the potential of BDL, we can develop more reliable and robust machine learning models that can be applied in a wide range of domains.
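
作为论文所综述的贝叶斯深度学习思想的一个极简示意(并非论文的具体方法),下面用 Monte Carlo dropout 在预测阶段保持 dropout 激活,通过多次前向采样近似预测均值与不确定度;网络结构、数据与采样次数均为假设。

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 1),
)

x = torch.randn(8, 4)                # a small batch of inputs

model.train()                        # keep dropout active at prediction time
with torch.no_grad():
    samples = torch.stack([model(x) for _ in range(100)])  # (100, 8, 1)

pred_mean = samples.mean(dim=0)      # approximate predictive mean
pred_std = samples.std(dim=0)        # epistemic-uncertainty proxy per input
```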

Hoodwinked: Deception and Cooperation in a Text-Based Game for Language Models

  • paper_url: http://arxiv.org/abs/2308.01404
  • repo_url: https://github.com/aogara-ds/hoodwinked
  • paper_authors: Aidan O’Gara
  • for: 这篇论文研究现有语言模型是否具备欺骗与识破谎言的能力。作者提出了一款基于文本的游戏作为测试方法,用以评估语言模型的欺骗与测谎能力。
  • methods: 作者使用 GPT-3、GPT-3.5 和 GPT-4 等语言模型控制游戏中的代理人。游戏中,玩家需通过自然语言讨论并投票决定驱逐哪名玩家。
  • results: 研究发现,语言模型确实表现出欺骗与识谎能力:凶手常常否认罪行并指控他人,从而可测量地影响投票结果。更先进的模型是更高效的凶手,在 24 组两两比较中的 18 组里胜过较小的模型,其优势主要来自讨论中更强的推理与说服能力。
    Abstract Are current language models capable of deception and lie detection? We study this question by introducing a text-based game called $\textit{Hoodwinked}$, inspired by Mafia and Among Us. Players are locked in a house and must find a key to escape, but one player is tasked with killing the others. Each time a murder is committed, the surviving players have a natural language discussion then vote to banish one player from the game. We conduct experiments with agents controlled by GPT-3, GPT-3.5, and GPT-4 and find evidence of deception and lie detection capabilities. The killer often denies their crime and accuses others, leading to measurable effects on voting outcomes. More advanced models are more effective killers, outperforming smaller models in 18 of 24 pairwise comparisons. Secondary metrics provide evidence that this improvement is not mediated by different actions, but rather by stronger persuasive skills during discussions. To evaluate the ability of AI agents to deceive humans, we make this game publicly available at https://hoodwinked.ai/.
    摘要 现有的语言模型能否进行欺骗并识破谎言?我们通过一款名为《Hoodwinked》的文本游戏来研究这一问题,该游戏的灵感来自《黑手党》(Mafia)和《Among Us》。玩家被困在一座房子里,需要找到钥匙才能逃脱,但其中一名玩家的任务是杀死其他人。每次凶案发生后,幸存的玩家会进行自然语言讨论,然后投票将一名玩家驱逐出游戏。我们对由 GPT-3、GPT-3.5 和 GPT-4 控制的代理人进行了实验,发现了欺骗与识谎能力的证据:凶手经常否认罪行并指控他人,从而对投票结果产生可测量的影响。更先进的模型是更高效的凶手,在 24 组两两比较中的 18 组里胜过较小的模型。次要指标表明,这种提升并非来自不同的行动,而是来自讨论中更强的说服能力。为了评估 AI 代理人欺骗人类的能力,我们将该游戏公开发布在 https://hoodwinked.ai/。
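
下面是该文字游戏「讨论—投票」一轮流程的示意草图;其中 ask_agent 是对语言模型调用的假设占位函数,提示词与玩家姓名均为举例,并非论文或 hoodwinked.ai 所用的实际实现。

```python
import random
from collections import Counter

players = ["Lena", "Bob", "Sara", "Tim"]
killer = "Bob"                      # hidden role

def ask_agent(player, prompt):
    """Placeholder for an LLM call; here it just returns a canned reply."""
    if prompt.startswith("VOTE"):
        return random.choice([p for p in players if p != player])
    return f"{player}: I was searching for the key, it wasn't me."

def play_round():
    transcript = []
    for p in players:               # natural-language discussion phase
        transcript.append(ask_agent(p, "DISCUSS: " + "\n".join(transcript)))
    votes = Counter(ask_agent(p, "VOTE given:\n" + "\n".join(transcript))
                    for p in players)
    banished, _ = votes.most_common(1)[0]
    return transcript, banished

transcript, banished = play_round()
print(f"Banished: {banished} (killer was {killer})")
```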

An Exploratory Literature Study on Sharing and Energy Use of Language Models for Source Code

  • paper_url: http://arxiv.org/abs/2307.02443
  • repo_url: None
  • paper_authors: Max Hort, Anastasiia Grishina, Leon Moonen
  • for: 这个研究的主要目标是调查语言模型在软件工程任务上的训练和共享情况,以及训练过程中的能效性。
  • methods: 这个研究使用了雪崩式Literature搜索来找到语言模型在源代码上的应用,并分析这些应用的可重用性从可持续性的角度。
  • results: 研究发现,在 494 篇去重后的出版物中,有 293 篇相关出版物使用语言模型解决源代码相关任务;其中 27%(293 篇中的 79 篇)公开了可供复用的产出物(artifacts)。此外,研究还收集了训练所用硬件与训练时长等信息,以评估训练过程的能耗。结果显示,被调查的论文中有 40% 既未共享源代码也未共享训练产出物;作者建议同时共享源代码与训练产出物,并公开训练时间与硬件配置,以实现可持续的可复现性并提高模型碳足迹的透明度。
    Abstract Large language models trained on source code can support a variety of software development tasks, such as code recommendation and program repair. Large amounts of data for training such models benefit the models' performance. However, the size of the data and models results in long training times and high energy consumption. While publishing source code allows for replicability, users need to repeat the expensive training process if models are not shared. The main goal of the study is to investigate if publications that trained language models for software engineering (SE) tasks share source code and trained artifacts. The second goal is to analyze the transparency on training energy usage. We perform a snowballing-based literature search to find publications on language models for source code, and analyze their reusability from a sustainability standpoint. From 494 unique publications, we identified 293 relevant publications that use language models to address code-related tasks. Among them, 27% (79 out of 293) make artifacts available for reuse. This can be in the form of tools or IDE plugins designed for specific tasks or task-agnostic models that can be fine-tuned for a variety of downstream tasks. Moreover, we collect insights on the hardware used for model training, as well as training time, which together determine the energy consumption of the development process. We find that there are deficiencies in the sharing of information and artifacts for current studies on source code models for software engineering tasks, with 40% of the surveyed papers not sharing source code or trained artifacts. We recommend the sharing of source code as well as trained artifacts, to enable sustainable reproducibility. Moreover, comprehensive information on training times and hardware configurations should be shared for transparency on a model's carbon footprint.
    摘要 基于源代码训练的大型语言模型可以支持多种软件开发任务,例如代码推荐与程序修复。大量的训练数据有助于提升模型性能,但数据与模型的规模也导致训练时间长、能耗高。发布源代码固然有助于可复现,但若不共享训练好的模型,用户仍需重复昂贵的训练过程。本研究的主要目标是调查为软件工程(SE)任务训练语言模型的文献是否共享源代码与训练产出物;第二个目标是分析训练能耗信息的透明度。我们通过雪球式文献检索找到关于源代码语言模型的出版物,并从可持续性的角度分析其可复用性。在 494 篇去重后的文献中,有 293 篇使用语言模型处理代码相关任务;其中 27%(293 篇中的 79 篇)公开了可复用的产出物,形式包括面向特定任务的工具或 IDE 插件,以及可针对多种下游任务微调的任务无关模型。此外,我们还收集了模型训练所用硬件与训练时长的信息,二者共同决定了开发过程的能耗。我们发现当前关于软件工程源代码模型的研究在信息与产出物共享方面存在不足,40% 的受调查论文既未共享源代码也未共享训练产出物。我们建议共享源代码与训练产出物,以实现可持续的可复现性;同时应公开训练时间与硬件配置的完整信息,以便透明地评估模型的碳足迹。
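
作为论文呼吁公开训练时间与硬件配置的一个直观示例,下面的小脚本根据硬件功耗与训练时长粗略估算训练能耗与碳排放;其中功率、利用率、PUE 与电网碳强度等数值均为假设。

```python
# Rough back-of-the-envelope energy and CO2 estimate for a training run.
GPU_TDP_WATTS = 300          # e.g. a single data-centre GPU
NUM_GPUS = 4
UTILIZATION = 0.8            # average fraction of TDP actually drawn
TRAIN_HOURS = 72
PUE = 1.5                    # data-centre power usage effectiveness
CARBON_INTENSITY = 0.4       # kg CO2e per kWh (grid-dependent)

energy_kwh = GPU_TDP_WATTS * NUM_GPUS * UTILIZATION * TRAIN_HOURS * PUE / 1000
co2_kg = energy_kwh * CARBON_INTENSITY

print(f"Estimated energy: {energy_kwh:.1f} kWh, ~{co2_kg:.1f} kg CO2e")
```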

Privacy Amplification via Importance Sampling

  • paper_url: http://arxiv.org/abs/2307.10187
  • repo_url: None
  • paper_authors: Dominik Fay, Sebastian Mair, Jens Sjölund
  • for: 本研究探讨将重要性采样作为差分隐私机制的预处理步骤时,其子采样所带来的隐私增强效果,并分析对数据点加权对隐私的影响。
  • methods: 本研究使用重要性采样实现隐私放大:每个数据点按其选择概率被独立采样,被选中的点以选择概率的倒数作为重要性权重。
  • results: 研究给出了量化「更低的选择概率带来更强隐私放大」与「更大的权重放大被选中点的影响」这两种效应之间权衡的一般性结果;表明非均匀采样概率可在保持子样本规模的同时,同时获得比均匀子采样更强的隐私与更好的效用;并提出并求解了隐私最优采样问题,最后在 k-means 聚类上进行了实证评估。
    Abstract We examine the privacy-enhancing properties of subsampling a data set via importance sampling as a pre-processing step for differentially private mechanisms. This extends the established privacy amplification by subsampling result to importance sampling where each data point is weighted by the reciprocal of its selection probability. The implications for privacy of weighting each point are not obvious. On the one hand, a lower selection probability leads to a stronger privacy amplification. On the other hand, the higher the weight, the stronger the influence of the point on the output of the mechanism in the event that the point does get selected. We provide a general result that quantifies the trade-off between these two effects. We show that heterogeneous sampling probabilities can lead to both stronger privacy and better utility than uniform subsampling while retaining the subsample size. In particular, we formulate and solve the problem of privacy-optimal sampling, that is, finding the importance weights that minimize the expected subset size subject to a given privacy budget. Empirically, we evaluate the privacy, efficiency, and accuracy of importance sampling-based privacy amplification on the example of k-means clustering.
    摘要 我们研究将重要性采样作为差分隐私机制的预处理步骤时,其子采样带来的隐私增强性质。这将经典的「子采样隐私放大」结论推广到重要性采样:每个数据点被选中后以其选择概率的倒数进行加权。加权对隐私的影响并不显而易见:一方面,较低的选择概率带来更强的隐私放大;另一方面,权重越大,该点一旦被选中,对机制输出的影响也越大。我们给出量化这两种效应之间权衡的一般性结果,并证明在保持子样本规模不变的前提下,非均匀的采样概率可以同时带来比均匀子采样更强的隐私与更好的效用。特别地,我们提出并求解了「隐私最优采样」问题,即在给定隐私预算下,寻找使期望子集大小最小的重要性权重。我们以 k-means 聚类为例,实证评估了基于重要性采样的隐私放大在隐私、效率与精度方面的表现。
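
下面是重要性子采样的一个极简示意:每个点以概率 p_i 被独立保留,保留后以 1/p_i 加权,使加权统计量保持无偏。此处 p_i 的取法(正比于样本范数)只是举例,并非论文的隐私最优采样规则。

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))

scores = np.linalg.norm(X, axis=1)           # heterogeneous "importance"
target_size = 200
p = np.minimum(1.0, target_size * scores / scores.sum())  # selection probabilities

keep = rng.random(len(X)) < p                # independent (Poisson) sampling
subsample = X[keep]
weights = 1.0 / p[keep]                      # importance weights

# Example: an unbiased estimate of the column sums from the weighted subsample.
estimate = (weights[:, None] * subsample).sum(axis=0)
truth = X.sum(axis=0)
```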

External Reasoning: Towards Multi-Large-Language-Models Interchangeable Assistance with Human Feedback

  • paper_url: http://arxiv.org/abs/2307.12057
  • repo_url: https://github.com/AkideLiu/ANLP
  • paper_authors: Akide Liu
  • for: 提高人工智能的普遍智能水平,解决复杂的AI任务
  • methods: 通过选择性地集成外部知识库进行推理,并基于多个 LLM 的可互换协助,按入门、中级、高级三个层级提供支持,层级随查询复杂度与人类反馈进行调整
  • results: 在多种 LLM 上的评估表明该方法达到最新水平,超过包括 ChatPDF.com 在内的现有解决方案,并且比让 LLM 直接处理全文更高效
    Abstract Memory is identified as a crucial human faculty that allows for the retention of visual and linguistic information within the hippocampus and neurons in the brain, which can subsequently be retrieved to address real-world challenges that arise through a lifetime of learning. The resolution of complex AI tasks through the application of acquired knowledge represents a stride toward the realization of artificial general intelligence. However, despite the prevalence of Large Language Models (LLMs) like GPT-3.5 and GPT-4 , which have displayed remarkable capabilities in language comprehension, generation, interaction, and reasoning, they are inhibited by constraints on context length that preclude the processing of extensive, continually evolving knowledge bases. This paper proposes that LLMs could be augmented through the selective integration of knowledge from external repositories, and in doing so, introduces a novel methodology for External Reasoning, exemplified by ChatPDF. Central to this approach is the establishment of a tiered policy for \textbf{External Reasoning based on Multiple LLM Interchange Assistance}, where the level of support rendered is modulated across entry, intermediate, and advanced tiers based on the complexity of the query, with adjustments made in response to human feedback. A comprehensive evaluation of this methodology is conducted using multiple LLMs and the results indicate state-of-the-art performance, surpassing existing solutions including ChatPDF.com. Moreover, the paper emphasizes that this approach is more efficient compared to the direct processing of full text by LLMs.
    摘要 记忆被认为是人类的一项关键能力,使视觉与语言信息得以保存在大脑的海马体与神经元中,并可在终身学习所面对的现实挑战中被检索利用。借助已获得的知识解决复杂的 AI 任务,是迈向通用人工智能的一步。然而,尽管 GPT-3.5、GPT-4 等大语言模型(LLMs)在语言理解、生成、交互与推理方面展现出卓越能力,它们仍受限于上下文长度,难以处理庞大且持续演化的知识库。本文提出,可以通过选择性地集成外部知识库来增强 LLM,并据此引入一种新的外部推理(External Reasoning)方法,以 ChatPDF 为例。该方法的核心是建立基于多 LLM 可互换协助的分层策略:根据查询的复杂程度在入门、中级与高级层级之间调整支持力度,并依据人类反馈进行修正。使用多种 LLM 的全面评估表明,该方法达到最新水平,超过包括 ChatPDF.com 在内的现有解决方案;并且相比让 LLM 直接处理全文更高效。
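
下面给出分层外部推理思路的示意草图:按查询所需的支持层级将其路由到入门、中级或高级处理,并根据人类反馈调整层级。其中 retrieve 与 call_llm 均为假设的占位函数,并非论文或 ChatPDF 的实际接口。

```python
def retrieve(query, knowledge_base, k):
    """Placeholder retriever: return the k passages sharing most words with the query."""
    scored = sorted(knowledge_base,
                    key=lambda p: -len(set(p.split()) & set(query.split())))
    return scored[:k]

def call_llm(prompt):
    """Placeholder LLM call."""
    return f"[LLM answer to a prompt of {len(prompt)} characters]"

def answer(query, knowledge_base, tier):
    if tier == "entry":             # answer directly, no external knowledge
        return call_llm(query)
    k = 3 if tier == "intermediate" else 8
    context = "\n".join(retrieve(query, knowledge_base, k))
    return call_llm(f"Context:\n{context}\n\nQuestion: {query}")

def adjust_tier(tier, feedback_ok):
    tiers = ["entry", "intermediate", "advanced"]
    i = tiers.index(tier)
    return tier if feedback_ok else tiers[min(i + 1, 2)]   # escalate on bad feedback

kb = ["memory is stored in the hippocampus", "LLMs have bounded context length"]
tier = "entry"
reply = answer("Why do LLMs need external knowledge bases?", kb, tier)
tier = adjust_tier(tier, feedback_ok=False)                # move up to "intermediate"
```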

Exploring Continual Learning for Code Generation Models

  • paper_url: http://arxiv.org/abs/2307.02435
  • repo_url: None
  • paper_authors: Prateek Yadav, Qing Sun, Hantian Ding, Xiaopeng Li, Dejiao Zhang, Ming Tan, Xiaofei Ma, Parminder Bhatia, Ramesh Nallapati, Murali Krishna Ramanathan, Mohit Bansal, Bing Xiang
  • for: 这篇论文旨在解决代码领域的持续学习(Continual Learning,CL)问题,特别是针对 Codex 和 CodeT5 等大规模代码生成模型。
  • methods: 作者提出了一个覆盖代码生成、翻译、摘要与修复等多种编程任务的新基准 CodeTask-CL,并比较了来自 NLP 和视觉领域的主流 CL 技术;同时提出了新方法 Prompt Pooling with Teacher Forcing(PP-TF),用于缓解编程任务中的灾难性遗忘。
  • results: 所提出的 PP-TF 方法相对于 Prompt Pooling 取得了 21.54% 的提升,验证了方法的有效性;作者还建立了可用于代码模型持续学习的训练管道,以推动后续 CL 方法的发展。
    Abstract Large-scale code generation models such as Codex and CodeT5 have achieved impressive performance. However, libraries are upgraded or deprecated very frequently and re-training large-scale language models is computationally expensive. Therefore, Continual Learning (CL) is an important aspect that remains underexplored in the code domain. In this paper, we introduce a benchmark called CodeTask-CL that covers a wide range of tasks, including code generation, translation, summarization, and refinement, with different input and output programming languages. Next, on our CodeTask-CL benchmark, we compare popular CL techniques from NLP and Vision domains. We find that effective methods like Prompt Pooling (PP) suffer from catastrophic forgetting due to the unstable training of the prompt selection mechanism caused by stark distribution shifts in coding tasks. We address this issue with our proposed method, Prompt Pooling with Teacher Forcing (PP-TF), that stabilizes training by enforcing constraints on the prompt selection mechanism and leads to a 21.54% improvement over Prompt Pooling. Along with the benchmark, we establish a training pipeline that can be used for CL on code models, which we believe can motivate further development of CL methods for code models. Our code is available at https://github.com/amazon-science/codetaskcl-pptf
    摘要 Codex 和 CodeT5 等大规模代码生成模型已经取得了令人印象深刻的性能。然而,代码库升级或弃用十分频繁,而重新训练大规模语言模型的计算成本高昂,因此持续学习(Continual Learning,CL)是代码领域一个重要却尚未被充分探索的方向。本文提出名为 CodeTask-CL 的基准,涵盖代码生成、翻译、摘要与修复等多种任务,并支持不同的输入与输出编程语言。随后,我们在 CodeTask-CL 基准上比较了来自 NLP 与视觉领域的主流 CL 技术。我们发现,Prompt Pooling(PP)等有效方法会因编程任务间剧烈的分布变化导致提示选择机制训练不稳定,进而产生灾难性遗忘。为此,我们提出 Prompt Pooling with Teacher Forcing(PP-TF):通过对提示选择机制施加约束来稳定训练,相比 Prompt Pooling 提升了 21.54%。此外,我们还建立了可用于代码模型持续学习的训练管道,希望以此推动代码模型 CL 方法的进一步发展。我们的代码见 https://github.com/amazon-science/codetaskcl-pptf。
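
下面是 Prompt Pooling with Teacher Forcing(PP-TF)核心思想的示意草图:不再依赖不稳定的可学习查询-键匹配来挑选提示,而是在训练时把每个任务「教师强制」到提示池中固定的子集。池大小、提示长度与任务分配均为示例假设,并非论文的具体配置。

```python
import torch
import torch.nn as nn

POOL_SIZE, PROMPT_LEN, DIM = 20, 5, 768
prompt_pool = nn.Parameter(torch.randn(POOL_SIZE, PROMPT_LEN, DIM) * 0.02)

# Fixed (teacher-forced) assignment of prompt indices to each coding task.
task_to_prompts = {
    "code_generation": [0, 1, 2, 3],
    "code_translation": [4, 5, 6, 7],
    "code_summarization": [8, 9, 10, 11],
}

def prepend_prompts(task, input_embeds):
    """Concatenate the task's assigned prompts in front of the input embeddings."""
    idx = task_to_prompts[task]
    prompts = prompt_pool[idx].reshape(-1, DIM)          # (len(idx)*PROMPT_LEN, DIM)
    prompts = prompts.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
    return torch.cat([prompts, input_embeds], dim=1)

batch = torch.randn(2, 32, DIM)                          # toy input embeddings
augmented = prepend_prompts("code_translation", batch)   # (2, 4*5 + 32, DIM)
```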

A probabilistic, data-driven closure model for RANS simulations with aleatoric, model uncertainty

  • paper_url: http://arxiv.org/abs/2307.02432
  • repo_url: None
  • paper_authors: Atul Agrawal, Phaedon-Stelios Koutsourelakis
  • for: 这篇论文旨在为雷诺平均 Navier-Stokes(RANS)模拟提出一种数据驱动的闭合模型,该模型显式刻画偶然性的模型不确定性。
  • methods: 该闭合由两部分组成:一部分是参数化部分,采用此前提出的、依赖于应变率与旋转张量不变量的神经网络张量基函数;另一部分引入隐式随机变量,用以补偿模型误差。
  • results: 在后向台阶(backward-facing step)分离流基准问题上,该模型能够对所有流动量给出准确的概率型预测估计,即使在存在模型误差的区域也是如此。
    Abstract We propose a data-driven, closure model for Reynolds-averaged Navier-Stokes (RANS) simulations that incorporates aleatoric, model uncertainty. The proposed closure consists of two parts. A parametric one, which utilizes previously proposed, neural-network-based tensor basis functions dependent on the rate of strain and rotation tensor invariants. This is complemented by latent, random variables which account for aleatoric model errors. A fully Bayesian formulation is proposed, combined with a sparsity-inducing prior in order to identify regions in the problem domain where the parametric closure is insufficient and where stochastic corrections to the Reynolds stress tensor are needed. Training is performed using sparse, indirect data, such as mean velocities and pressures, in contrast to the majority of alternatives that require direct Reynolds stress data. For inference and learning, a Stochastic Variational Inference scheme is employed, which is based on Monte Carlo estimates of the pertinent objective in conjunction with the reparametrization trick. This necessitates derivatives of the output of the RANS solver, for which we developed an adjoint-based formulation. In this manner, the parametric sensitivities from the differentiable solver can be combined with the built-in, automatic differentiation capability of the neural network library in order to enable an end-to-end differentiable framework. We demonstrate the capability of the proposed model to produce accurate, probabilistic, predictive estimates for all flow quantities, even in regions where model errors are present, on a separated flow in the backward-facing step benchmark problem.
    摘要 我们提出了一种数据驱动的雷诺平均 Navier-Stokes(RANS)闭合模型,显式刻画偶然性的模型不确定性。该闭合由两部分组成:一是参数化部分,采用此前提出的、依赖于应变率与旋转张量不变量的神经网络张量基函数;二是引入隐式随机变量,用以刻画偶然性的模型误差。我们给出完全贝叶斯的建模形式,并结合诱导稀疏性的先验,以识别问题域中参数化闭合不足、需要对雷诺应力张量进行随机修正的区域。训练使用的是稀疏的间接数据(如平均速度与压力),而非多数替代方法所需的雷诺应力直接数据。在推断与学习方面,我们采用随机变分推断(Stochastic Variational Inference),基于目标函数的蒙特卡洛估计并结合重参数化技巧(reparametrization trick);这需要对 RANS 求解器的输出求导,为此我们构建了基于伴随方法的公式。这样,可微求解器给出的参数敏感度便可与神经网络库内置的自动微分能力相结合,从而形成端到端可微的框架。在后向台阶(backward-facing step)分离流基准问题上,我们展示了所提模型即使在存在模型误差的区域,也能对所有流动量给出准确的概率型预测估计。
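
下面用一个玩具可微「求解器」示意论文所用随机变分推断中的重参数化技巧:潜变量修正 z 服从可学习的高斯后验,并与参数化闭合一起通过负 ELBO 进行端到端优化。此处的求解器、数据与先验均为示例假设,并非论文的 RANS 模型。

```python
import torch

torch.manual_seed(0)
n_cells = 100
mu = torch.zeros(n_cells, requires_grad=True)        # variational mean
log_sigma = torch.zeros(n_cells, requires_grad=True) # variational log-std
theta = torch.randn(n_cells, requires_grad=True)     # parametric closure parameters

observed = torch.randn(n_cells)                      # sparse, indirect data (toy)

def toy_solver(closure):
    """Stand-in for a differentiable solve mapping a closure field to velocities."""
    return torch.tanh(closure) + 0.1 * closure

opt = torch.optim.Adam([mu, log_sigma, theta], lr=1e-2)
for step in range(200):
    eps = torch.randn(n_cells)                       # reparametrization noise
    z = mu + log_sigma.exp() * eps                   # z ~ q(z), differentiable in mu/sigma
    pred = toy_solver(theta + z)                     # parametric + stochastic closure
    nll = 0.5 * ((pred - observed) ** 2).sum()
    kl = 0.5 * (mu**2 + log_sigma.exp()**2 - 2*log_sigma - 1).sum()  # KL to N(0, I)
    loss = nll + kl                                  # negative ELBO (up to constants)
    opt.zero_grad(); loss.backward(); opt.step()
```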

In-Context Learning for Attention Scheme: from Single Softmax Regression to Multiple Softmax Regression via a Tensor Trick

  • paper_url: http://arxiv.org/abs/2307.02419
  • repo_url: None
  • paper_authors: Yeqi Gao, Zhao Song, Shenghao Xie
  • for: 本文研究注意力机制相关回归形式下的上下文学习(in-context learning)问题,其动机源于大语言模型(LLMs)在自然语言理解与生成方面展现出的能力。
  • methods: 论文采用向量化技巧在矩阵形式下处理回归问题,将维度从 $d$ 扩展到 $d^2$。
  • results: 通过对回归函数完成 Lipschitz 分析,研究人员得到了关于上下文学习的主要结果。
    Abstract Large language models (LLMs) have brought significant and transformative changes in human society. These models have demonstrated remarkable capabilities in natural language understanding and generation, leading to various advancements and impacts across several domains. We consider the in-context learning under two formulation for attention related regression in this work. Given matrices $A_1 \in \mathbb{R}^{n \times d}$, and $A_2 \in \mathbb{R}^{n \times d}$ and $B \in \mathbb{R}^{n \times n}$, the purpose is to solve some certain optimization problems: Normalized version $\min_{X} \| D(X)^{-1} \exp(A_1 X A_2^\top) - B \|_F^2$ and Rescaled version $\| \exp(A_1 X A_2^\top) - D(X) \cdot B \|_F^2$. Here $D(X) := \mathrm{diag}( \exp(A_1 X A_2^\top) {\bf 1}_n )$. Our regression problem shares similarities with previous studies on softmax-related regression. Prior research has extensively investigated regression techniques related to softmax regression: Normalized version $\| \langle \exp(Ax) , {\bf 1}_n \rangle^{-1} \exp(Ax) - b \|_2^2$ and Resscaled version $\| \exp(Ax) - \langle \exp(Ax), {\bf 1}_n \rangle b \|_2^2 $ In contrast to previous approaches, we adopt a vectorization technique to address the regression problem in matrix formulation. This approach expands the dimension from $d$ to $d^2$, resembling the formulation of the regression problem mentioned earlier. Upon completing the lipschitz analysis of our regression function, we have derived our main result concerning in-context learning.
    摘要 大语言模型(LLMs)给人类社会带来了显著且变革性的影响。这些模型在自然语言理解与生成方面展现出卓越能力,并在多个领域产生了广泛影响。本文在两种注意力相关回归的形式下研究上下文学习。给定矩阵 $A_1 \in \mathbb{R}^{n \times d}$、$A_2 \in \mathbb{R}^{n \times d}$ 与 $B \in \mathbb{R}^{n \times n}$,目标是求解如下优化问题:归一化版本 $\min_{X} \| D(X)^{-1} \exp(A_1 X A_2^\top) - B \|_F^2$ 以及重标定版本 $\| \exp(A_1 X A_2^\top) - D(X) \cdot B \|_F^2$,其中 $D(X) := \mathrm{diag}( \exp(A_1 X A_2^\top) {\bf 1}_n )$。该回归问题与此前关于 softmax 回归的研究密切相关:归一化版本 $\| \langle \exp(Ax) , {\bf 1}_n \rangle^{-1} \exp(Ax) - b \|_2^2$ 与重标定版本 $\| \exp(Ax) - \langle \exp(Ax), {\bf 1}_n \rangle b \|_2^2$。与以往方法不同,我们采用向量化技巧在矩阵形式下处理该回归问题,将维度从 $d$ 扩展到 $d^2$。在完成回归函数的 Lipschitz 分析之后,我们得到了关于上下文学习的主要结果。
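
下面的小段 numpy 代码在随机矩阵上按定义计算论文中的归一化与重标定目标函数,仅用于核对代数形式;矩阵规模为假设,并不涉及论文的优化与理论分析。

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 4
A1 = rng.standard_normal((n, d))
A2 = rng.standard_normal((n, d))
B = rng.standard_normal((n, n))
X = rng.standard_normal((d, d)) * 0.1

M = np.exp(A1 @ X @ A2.T)                  # exp(A1 X A2^T), entrywise
row_sums = M @ np.ones(n)                  # diagonal entries of D(X)
normalized = M / row_sums[:, None]         # D(X)^{-1} exp(A1 X A2^T)

normalized_loss = np.linalg.norm(normalized - B, "fro") ** 2
rescaled_loss = np.linalg.norm(M - row_sums[:, None] * B, "fro") ** 2
```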

Multi-objective Deep Reinforcement Learning for Mobile Edge Computing

  • paper_url: http://arxiv.org/abs/2307.14346
  • repo_url: https://github.com/gracefulning/mec_morl_multipolicy
  • paper_authors: Ning Yang, Junrui Wen, Meng Zhang, Ming Tang
  • for: 这篇论文面向下一代移动网络应用,这类应用同时关注延迟与能耗等多种性能指标;论文针对移动边缘计算(MEC)系统中多目标资源调度时偏好未知的难题。
  • methods: 论文提出了一种基于近端策略优化(PPO)的多目标强化学习(MORL)方案,并设计了用于刻画 MEC 系统中多个边缘节点特征的状态编码方法,以及能够准确计算延迟与能耗效用的奖励函数。
  • results: 实验结果表明,所提出的 MORL 方案相比基准方法可将 Pareto 前沿的超体积(hypervolume)最多提升 233.1%。
    Abstract Mobile edge computing (MEC) is essential for next-generation mobile network applications that prioritize various performance metrics, including delays and energy consumption. However, conventional single-objective scheduling solutions cannot be directly applied to practical systems in which the preferences of these applications (i.e., the weights of different objectives) are often unknown or challenging to specify in advance. In this study, we address this issue by formulating a multi-objective offloading problem for MEC with multiple edges to minimize expected long-term energy consumption and transmission delay while considering unknown preferences as parameters. To address the challenge of unknown preferences, we design a multi-objective (deep) reinforcement learning (MORL)-based resource scheduling scheme with proximal policy optimization (PPO). In addition, we introduce a well-designed state encoding method for constructing features for multiple edges in MEC systems, a sophisticated reward function for accurately computing the utilities of delay and energy consumption. Simulation results demonstrate that our proposed MORL scheme enhances the hypervolume of the Pareto front by up to 233.1% compared to benchmarks. Our full framework is available at https://github.com/gracefulning/mec_morl_multipolicy.
    摘要 移动边缘计算(MEC)对下一代移动网络应用至关重要,这类应用关注包括延迟与能耗在内的多种性能指标。然而,传统的单目标调度方案无法直接用于实际系统,因为应用的偏好(即各目标的权重)往往未知或难以事先指定。本文将该问题表述为含多个边缘节点的 MEC 多目标卸载问题,在将未知偏好作为参数的前提下,最小化期望的长期能耗与传输延迟。为应对偏好未知带来的挑战,我们设计了基于近端策略优化(PPO)的多目标(深度)强化学习(MORL)资源调度方案;同时提出了精心设计的状态编码方法,用于构建 MEC 系统中多个边缘节点的特征,以及能够准确计算延迟与能耗效用的奖励函数。仿真结果表明,所提出的 MORL 方案相比基准方法可将 Pareto 前沿的超体积(hypervolume)最多提升 233.1%。完整框架见:https://github.com/gracefulning/mec_morl_multipolicy。
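
作为多目标强化学习中最基本的组成部分,下面的示意代码把延迟与能耗两项目标按偏好向量加权合成标量奖励,并通过扫描偏好向量勾勒近似的 Pareto 前沿;效用形式、候选动作与偏好网格均为假设,并非论文的奖励函数。

```python
import numpy as np

def scalarized_reward(delay, energy, w):
    """w = (w_delay, w_energy), w >= 0, sums to 1; lower delay/energy is better."""
    return -(w[0] * delay + w[1] * energy)

# Sweep preference vectors to trace out an approximation of the Pareto front.
preferences = [(w, 1.0 - w) for w in np.linspace(0.0, 1.0, 11)]

# Toy offloading decisions: (delay in ms, energy in mJ) of candidate actions.
candidates = [(12.0, 80.0), (20.0, 45.0), (35.0, 30.0)]

for w in preferences:
    best = max(candidates, key=lambda c: scalarized_reward(c[0], c[1], w))
    print(f"w_delay={w[0]:.1f}: pick delay={best[0]} ms, energy={best[1]} mJ")
```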

OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models

  • paper_url: http://arxiv.org/abs/2307.03084
  • repo_url: https://github.com/thunlp/opendelta
  • paper_authors: Shengding Hu, Ning Ding, Weilin Zhao, Xingtai Lv, Zhen Zhang, Zhiyuan Liu, Maosong Sun
  • for: 这篇研究的目的是提出开源库 OpenDelta,用于实现大型预训练模型(PTMs)对下游任务的参数高效适配。
  • methods: OpenDelta 以即插即用的方式实现多种 delta tuning 方法:仅更新少量的 delta 模块参数、保持骨干模型参数固定,且无需修改骨干 PTM 的代码,因此可与不同甚至全新的 PTM 集成。
  • results: OpenDelta 提供了一个简单、模块化、可扩展的平台,帮助研究者与从业者高效地适配大型预训练模型。
    Abstract The scale of large pre-trained models (PTMs) poses significant challenges in adapting to downstream tasks due to the high optimization overhead and storage costs associated with full-parameter fine-tuning. To address this, many studies explore parameter-efficient tuning methods, also framed as "delta tuning", which updates only a small subset of parameters, known as "delta modules", while keeping the backbone model's parameters fixed. However, the practicality and flexibility of delta tuning have been limited due to existing implementations that directly modify the code of the backbone PTMs and hard-code specific delta tuning methods for each PTM. In this paper, we present OpenDelta, an open-source library that overcomes these limitations by providing a plug-and-play implementation of various delta tuning methods. Our novel techniques eliminate the need to modify the backbone PTMs' code, making OpenDelta compatible with different, even novel PTMs. OpenDelta is designed to be simple, modular, and extensible, providing a comprehensive platform for researchers and practitioners to adapt large PTMs efficiently.
    摘要 大型预训练模型(PTM)的巨大规模给下游任务适配带来了显著挑战,因为全参数微调所需的优化开销和存储成本很高。为此,许多研究探索参数高效的调优方法,也称为 delta tuning:仅更新一小部分被称为 delta 模块的参数,而保持骨干模型的参数固定。然而,现有实现往往直接修改骨干 PTM 的代码,并为每种 PTM 硬编码特定的 delta tuning 方法,这限制了 delta tuning 的实用性与灵活性。本文介绍开源库 OpenDelta,它通过提供即插即用的多种 delta tuning 实现克服了上述限制。我们的新技术使得无需修改骨干 PTM 的代码,因此 OpenDelta 可以兼容不同的、甚至全新的 PTM。OpenDelta 被设计得简单、模块化且可扩展,为研究人员和从业者高效适配大型 PTM 提供了一个完整的平台。
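
下面用纯 PyTorch 给出 delta tuning 思想的一个通用示意(冻结骨干参数,仅训练注入的小型 delta 模块,此处以 LoRA 风格的低秩更新为例);这只是概念演示,并非 OpenDelta 的实际 API。

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A LoRA-style delta module wrapping a frozen linear layer."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # backbone weights stay frozen
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, base.out_features))

    def forward(self, x):
        return self.base(x) + x @ self.lora_a @ self.lora_b  # frozen output + delta

backbone = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 10))
for p in backbone.parameters():
    p.requires_grad = False                    # freeze the whole backbone
backbone[0] = LoRALinear(backbone[0])          # inject the delta module

trainable = [n for n, p in backbone.named_parameters() if p.requires_grad]
# -> only the LoRA parameters ('0.lora_a', '0.lora_b') are trainable
```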

$ν^2$-Flows: Fast and improved neutrino reconstruction in multi-neutrino final states with conditional normalizing flows

  • paper_url: http://arxiv.org/abs/2307.02405
  • repo_url: https://github.com/rodem-hep/nu2flows
  • paper_authors: John Andrew Raine, Matthew Leigh, Knut Zoch, Tobias Golling
  • for: 这个论文是为了扩展$\nu$-Flows方法,使其能处理包含多个中微子的终态。
  • methods: 该方法采用 $\nu^2$-Flows,可以原生地扩展到末态中任意对象类型与多重性的组合,适用于任意所需的中微子多重性。
  • results: 在 $t\bar{t}$ 双轻子事件中,$\nu^2$-Flows 能比最常用的标准解析方法更准确地重建两个中微子的动量及其相关性,并且所有事件都能得到解。其推理时间比其他方法更短,并可通过在图形处理器上并行执行进一步加速。应用于 $t\bar{t}$ 双轻子事件时,$\nu^2$-Flows 使所选二重微分观测量每个分箱的统计精度相比 Neutrino Weighting 方法提升 1.5 到 2 倍,相比 Ellipse 方法最高提升 4 倍。
    Abstract In this work we introduce $\nu^2$-Flows, an extension of the $\nu$-Flows method to final states containing multiple neutrinos. The architecture can natively scale for all combinations of object types and multiplicities in the final state for any desired neutrino multiplicities. In $t\bar{t}$ dilepton events, the momenta of both neutrinos and correlations between them are reconstructed more accurately than when using the most popular standard analytical techniques, and solutions are found for all events. Inference time is significantly faster than competing methods, and can be reduced further by evaluating in parallel on graphics processing units. We apply $\nu^2$-Flows to $t\bar{t}$ dilepton events and show that the per-bin uncertainties in unfolded distributions is much closer to the limit of performance set by perfect neutrino reconstruction than standard techniques. For the chosen double differential observables $\nu^2$-Flows results in improved statistical precision for each bin by a factor of 1.5 to 2 in comparison to the Neutrino Weighting method and up to a factor of four in comparison to the Ellipse approach.
    摘要 在本工作中,我们提出 $\nu^2$-Flows,它是 $\nu$-Flows 方法向包含多个中微子末态的扩展。该架构可以原生地扩展到末态中任意对象类型与多重性的组合,适用于任意所需的中微子多重性。在 $t\bar{t}$ 双轻子事件中,与最常用的标准解析方法相比,两个中微子的动量及其相关性都能被更准确地重建,且所有事件都能找到解。推理时间显著快于竞争方法,并可通过在图形处理器上并行执行进一步缩短。我们将 $\nu^2$-Flows 应用于 $t\bar{t}$ 双轻子事件,发现展开(unfolded)分布中每个分箱的不确定度远比标准技术更接近完美中微子重建所设定的性能极限。对于所选的二重微分观测量,$\nu^2$-Flows 使每个分箱的统计精度相比 Neutrino Weighting 方法提升 1.5 到 2 倍,相比 Ellipse 方法最高提升 4 倍。
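
下面把 $\nu^2$-Flows 所基于的条件密度估计设定压缩为单层条件仿射流(等价于条件高斯),在给定事件级特征的条件下把中微子动量映射到标准正态分布;真实方法堆叠了表达能力强得多的条件流变换,此处的维度与数据均为假设。

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
EVENT_DIM, NU_DIM = 16, 6                 # reco-level features, 2 x (px, py, pz)
LOG_2PI = math.log(2 * math.pi)

cond_net = nn.Sequential(nn.Linear(EVENT_DIM, 64), nn.ReLU(),
                         nn.Linear(64, 2 * NU_DIM))    # -> (shift, log_scale)

def log_prob(nu, event):
    shift, log_scale = cond_net(event).chunk(2, dim=-1)
    z = (nu - shift) * torch.exp(-log_scale)            # forward (data -> base)
    log_det = -log_scale.sum(dim=-1)                    # log|det dz/dnu|
    base = -0.5 * (z ** 2 + LOG_2PI).sum(dim=-1)        # standard-normal base density
    return base + log_det

events = torch.randn(256, EVENT_DIM)                    # toy event-level features
nus = torch.randn(256, NU_DIM)                          # toy "true" neutrino momenta

opt = torch.optim.Adam(cond_net.parameters(), lr=1e-3)
for _ in range(100):
    loss = -log_prob(nus, events).mean()                # maximum likelihood training
    opt.zero_grad(); loss.backward(); opt.step()

# Sampling: draw z ~ N(0, I) and invert the transform, conditioned on an event.
with torch.no_grad():
    shift, log_scale = cond_net(events[:1]).chunk(2, dim=-1)
    sample = torch.randn(1, NU_DIM) * log_scale.exp() + shift
```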

Unbalanced Optimal Transport: A Unified Framework for Object Detection

  • paper_url: http://arxiv.org/abs/2307.02402
  • repo_url: https://github.com/hdeplaen/uotod
  • paper_authors: Henri De Plaen, Pierre-François De Plaen, Johan A. K. Suykens, Marc Proesmans, Tinne Tuytelaars, Luc Van Gool
  • for: 这篇论文的目的是提出一种基于非平衡最优传输(Unbalanced Optimal Transport)的统一目标检测匹配框架,以提升检测模型的性能并加快初始收敛。
  • methods: 该方法将「匹配到最近真值框」与匈牙利算法等常见匹配策略统一到非平衡最优传输框架之下,并在二者之间开启一个连续的方法谱系,从而可以更精细地选择所需的性质;该方法也很适合 GPU 实现。
  • results: 实验表明,使用非平衡最优传输训练目标检测模型,在平均精度与平均召回率上均可达到当前最优水平,并具有更快的初始收敛。
    Abstract During training, supervised object detection tries to correctly match the predicted bounding boxes and associated classification scores to the ground truth. This is essential to determine which predictions are to be pushed towards which solutions, or to be discarded. Popular matching strategies include matching to the closest ground truth box (mostly used in combination with anchors), or matching via the Hungarian algorithm (mostly used in anchor-free methods). Each of these strategies comes with its own properties, underlying losses, and heuristics. We show how Unbalanced Optimal Transport unifies these different approaches and opens a whole continuum of methods in between. This allows for a finer selection of the desired properties. Experimentally, we show that training an object detection model with Unbalanced Optimal Transport is able to reach the state-of-the-art both in terms of Average Precision and Average Recall as well as to provide a faster initial convergence. The approach is well suited for GPU implementation, which proves to be an advantage for large-scale models.
    摘要 在训练过程中,有监督目标检测需要将预测的边界框及其分类得分与真实标注正确匹配,以决定哪些预测应被推向哪些解、哪些应被舍弃。常见的匹配策略包括匹配到最近的真值框(多与锚框结合使用)以及通过匈牙利算法匹配(多用于无锚方法),每种策略都有各自的性质、底层损失与启发式规则。我们展示了非平衡最优传输(Unbalanced Optimal Transport)如何统一这些不同的方法,并在它们之间开启一个连续的方法谱系,从而可以更精细地选择所需的性质。实验表明,使用非平衡最优传输训练目标检测模型,在平均精度(Average Precision)与平均召回率(Average Recall)上均能达到当前最优水平,并且初始收敛更快。该方法非常适合 GPU 实现,这对大规模模型是一个优势。
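
下面是预测框与真值框之间熵正则化「非平衡」最优传输的 Sinkhorn 式迭代示意:边缘约束仅以 KL 惩罚被软性施加。代价矩阵、rho 与 epsilon 均为示例假设,仅用于说明该框架如何给出软匹配。

```python
import numpy as np

rng = np.random.default_rng(0)
n_pred, n_gt = 8, 3
cost = rng.random((n_pred, n_gt))          # e.g. 1 - IoU plus a classification cost

a = np.full(n_pred, 1.0 / n_pred)          # mass on predictions
b = np.full(n_gt, 1.0 / n_gt)              # mass on ground-truth boxes
eps, rho = 0.05, 0.5                       # entropic reg., marginal relaxation

K = np.exp(-cost / eps)
u = np.ones(n_pred)
v = np.ones(n_gt)
power = rho / (rho + eps)                  # soft-marginal exponent
for _ in range(200):
    u = (a / (K @ v)) ** power
    v = (b / (K.T @ u)) ** power

P = u[:, None] * K * v[None, :]            # transport plan (soft matching)
assignment = P.argmax(axis=1)              # GT box each prediction leans towards
matched_mass = P.sum(axis=1)               # predictions with little mass ~ background
```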

A Versatile Hub Model For Efficient Information Propagation And Feature Selection

  • paper_url: http://arxiv.org/abs/2307.02398
  • repo_url: None
  • paper_authors: Zhaoze Wang, Junsong Wang
  • for: 这篇论文研究生物大脑的枢纽(hub)结构这一拓扑特征,以及如何利用该结构提升信息传递与认知加工的效率。
  • methods: 论文提出了枢纽结构的数学模型,并使用回声状态网络(Echo State Network, ESN)研究枢纽结构的机制基础。
  • results: 研究发现,引入枢纽结构可显著提升模型性能,其机制主要在于促进高效的信息处理与更好的特征提取。
    Abstract Hub structure, characterized by a few highly interconnected nodes surrounded by a larger number of nodes with fewer connections, is a prominent topological feature of biological brains, contributing to efficient information transfer and cognitive processing across various species. In this paper, a mathematical model of hub structure is presented. The proposed method is versatile and can be broadly applied to both computational neuroscience and Recurrent Neural Networks (RNNs) research. We employ the Echo State Network (ESN) as a means to investigate the mechanistic underpinnings of hub structures. Our findings demonstrate a substantial enhancement in performance upon incorporating the hub structure. Through comprehensive mechanistic analyses, we show that the hub structure improves model performance by facilitating efficient information processing and better feature extractions.
    摘要 枢纽(hub)结构的特点是少数高度互联的节点被大量连接较少的节点所环绕,它是生物大脑的一个显著拓扑特征,有助于多种物种实现高效的信息传递与认知加工。本文提出了一个枢纽结构的数学模型。所提方法具有通用性,可广泛应用于计算神经科学与循环神经网络(RNN)研究。我们以回声状态网络(Echo State Network, ESN)为工具,探究枢纽结构的机制基础。结果表明,引入枢纽结构可带来显著的性能提升。通过全面的机制分析,我们说明枢纽结构通过促进高效的信息处理和更好的特征提取来改善模型性能。
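
下面构造一个储备池带有枢纽结构(少数节点连接密集、其余稀疏)的回声状态网络,并在玩具的正弦序列一步预测任务上训练岭回归读出层;度分布、谱半径与任务均为假设,并非论文的具体模型。

```python
import numpy as np

rng = np.random.default_rng(0)
N, N_HUBS, DIM_IN = 200, 10, 1

# Connection probability is high for rows/columns touching a hub node.
p = np.full((N, N), 0.02)
p[:N_HUBS, :] = 0.3
p[:, :N_HUBS] = 0.3
W = (rng.random((N, N)) < p) * rng.standard_normal((N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))       # set spectral radius ~ 0.9

W_in = rng.uniform(-0.5, 0.5, (N, DIM_IN))

def run_reservoir(inputs):
    x = np.zeros(N)
    states = []
    for u in inputs:
        x = np.tanh(W @ x + W_in @ np.atleast_1d(u))
        states.append(x.copy())
    return np.array(states)

# Toy task: one-step-ahead prediction of a sine wave via a ridge-regressed readout.
signal = np.sin(np.linspace(0, 20 * np.pi, 1000))
S = run_reservoir(signal[:-1])
target = signal[1:]
W_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(N), S.T @ target)
prediction = S @ W_out
```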

Causal Discovery with Language Models as Imperfect Experts

  • paper_url: http://arxiv.org/abs/2307.02390
  • repo_url: https://github.com/stephlong614/causal-disco
  • paper_authors: Stephanie Long, Alexandre Piché, Valentina Zantedeschi, Tibor Schuster, Alexandre Drouin
  • for: 本研究旨在利用专家知识提升数据驱动的因果图识别,使其超越马尔可夫等价类的限制。
  • methods: 我们提议利用一致性性质(如无环性与等价类中的条件独立性)来修正专家提供的错误方向信息。
  • results: 我们在实际数据上使用大型自然语言模型作为不准确的专家,并进行了一个案例研究。
    Abstract Understanding the causal relationships that underlie a system is a fundamental prerequisite to accurate decision-making. In this work, we explore how expert knowledge can be used to improve the data-driven identification of causal graphs, beyond Markov equivalence classes. In doing so, we consider a setting where we can query an expert about the orientation of causal relationships between variables, but where the expert may provide erroneous information. We propose strategies for amending such expert knowledge based on consistency properties, e.g., acyclicity and conditional independencies in the equivalence class. We then report a case study, on real data, where a large language model is used as an imperfect expert.
    摘要 理解系统背后的因果关系是准确决策的基本前提。在这项工作中,我们研究如何利用专家知识,在马尔可夫等价类之外进一步提升数据驱动的因果图识别。我们考虑这样一种设定:可以向专家询问变量之间因果关系的方向,但专家可能给出错误信息。我们提出基于一致性性质(例如无环性以及等价类中的条件独立性)来修正此类专家知识的策略。随后,我们在真实数据上报告了一个以大语言模型作为不完美专家的案例研究。
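
下面的示意代码演示如何用无环性这一一致性性质来筛查(不完美的)专家给出的边方向:若某个方向会造成有向环,则拒绝该回答。骨架与专家回答均为假设示例,并非论文的完整流程。

```python
import networkx as nx

# Partially directed graph: some edges already oriented, some still undecided.
oriented = [("A", "B"), ("B", "C")]
undecided = [("C", "A"), ("B", "D")]

# Hypothetical expert answers: a proposed direction for each undecided edge.
expert_answers = {("C", "A"): ("C", "A"), ("B", "D"): ("B", "D")}

G = nx.DiGraph(oriented)
accepted, rejected = [], []
for edge in undecided:
    u, v = expert_answers[edge]
    G.add_edge(u, v)
    if nx.is_directed_acyclic_graph(G):
        accepted.append((u, v))
    else:
        G.remove_edge(u, v)     # inconsistent with acyclicity: discard the answer
        rejected.append(edge)

print("accepted:", accepted, "rejected:", rejected)
```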