cs.LG - 2023-07-20

Synthetic Control Methods by Density Matching under Implicit Endogeneity

  • paper_url: http://arxiv.org/abs/2307.11127
  • repo_url: None
  • paper_authors: Masahiro Kato, Akari Ohda, Masaaki Imaizumi, Kenichiro McAlinn
  • for: This study uses synthetic control methods (SCMs) to estimate treatment effects in comparative case studies; SCMs estimate the counterfactual outcome of a treated unit and are a crucial tool in such studies.
  • methods: The paper proposes a novel SCM based on density matching, assuming that the outcome density of the treated unit can be approximated by a weighted mixture of the densities of untreated units. Under this assumption, SC weights are estimated by moment matching, and the resulting estimator has three advantages: (1) it is asymptotically unbiased; (2) it reduces the mean squared error of counterfactual prediction; (3) it yields the full density of the treatment effect for the treated unit, not only its expected value.
  • results: Experiments demonstrate the effectiveness of the proposed method and show that it is more accurate and effective than existing SCMs.
    Abstract Synthetic control methods (SCMs) have become a crucial tool for causal inference in comparative case studies. The fundamental idea of SCMs is to estimate counterfactual outcomes for a treated unit by using a weighted sum of observed outcomes from untreated units. The accuracy of the synthetic control (SC) is critical for estimating the causal effect, and hence, the estimation of SC weights has been the focus of much research. In this paper, we first point out that existing SCMs suffer from an implicit endogeneity problem, which is the correlation between the outcomes of untreated units and the error term in the model of a counterfactual outcome. We show that this problem yields a bias in the causal effect estimator. We then propose a novel SCM based on density matching, assuming that the density of outcomes of the treated unit can be approximated by a weighted average of the densities of untreated units (i.e., a mixture model). Based on this assumption, we estimate SC weights by matching moments of treated outcomes and the weighted sum of moments of untreated outcomes. Our proposed method has three advantages over existing methods. First, our estimator is asymptotically unbiased under the assumption of the mixture model. Second, due to the asymptotic unbiasedness, we can reduce the mean squared error for counterfactual prediction. Third, our method generates full densities of the treatment effect, not only expected values, which broadens the applicability of SCMs. We provide experimental results to demonstrate the effectiveness of our proposed method.
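A minimal sketch of the moment-matching step described in the abstract: SC weights on the probability simplex are chosen so that a weighted sum of the untreated units' outcome moments matches the treated unit's moments. The number of moments, the optimizer, and the data shapes below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def estimate_sc_weights(y_treated, y_untreated, n_moments=3):
    """Estimate synthetic-control weights by matching the first few moments of
    the treated unit's outcomes to a weighted average of the untreated units'
    moments, with weights constrained to the probability simplex."""
    # y_treated: (T,) outcomes of the treated unit; y_untreated: (T, J) outcomes of J untreated units
    m_treated = np.array([np.mean(y_treated**k) for k in range(1, n_moments + 1)])
    m_untreated = np.stack(
        [np.mean(y_untreated**k, axis=0) for k in range(1, n_moments + 1)]
    )  # (n_moments, J)

    def loss(w):
        return np.sum((m_treated - m_untreated @ w) ** 2)

    J = y_untreated.shape[1]
    w0 = np.full(J, 1.0 / J)
    cons = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * J
    res = minimize(loss, w0, bounds=bounds, constraints=cons)
    return res.x

# Toy example: the counterfactual is then the weighted sum of untreated outcomes
rng = np.random.default_rng(0)
y_untreated = rng.normal(size=(50, 5))
y_treated = y_untreated @ np.array([0.5, 0.3, 0.2, 0.0, 0.0]) + 0.1 * rng.normal(size=50)
w = estimate_sc_weights(y_treated, y_untreated)
print(w.round(2))
```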

A Markov Chain Model for Identifying Changes in Daily Activity Patterns of People Living with Dementia

  • paper_url: http://arxiv.org/abs/2307.11126
  • repo_url: https://github.com/nvfl/markov-chain-model
  • paper_authors: Nan Fletcher-Lloyd, Alina-Irina Serban, Magdalena Kolanko, David Wingfield, Danielle Wilson, Ramin Nilforooshan, Payam Barnaghi, Eyal Soreq
  • for: The study aims to objectively detect changes in the eating and drinking behaviours of people living with dementia (PLWD), which can lead to malnutrition and dehydration.
  • methods: In-home monitoring data were collected from 73 households of PLWD using Internet of Things technologies, and linear mixed-effects modelling was used to examine how the COVID-19 pandemic changed eating and drinking habits.
  • results: Day-time kitchen activity increased, while night-time kitchen activity decreased significantly (t(147) = -2.90, p < 0.001). The study also proposes a Markov-model-based approach for detecting behavioural changes in PLWD from remote monitoring data.
    Abstract Malnutrition and dehydration are strongly associated with increased cognitive and functional decline in people living with dementia (PLWD), as well as an increased rate of hospitalisations in comparison to their healthy counterparts. Extreme changes in eating and drinking behaviours can often lead to malnutrition and dehydration, accelerating the progression of cognitive and functional decline and resulting in a marked reduction in quality of life. Unfortunately, there are currently no established methods by which to objectively detect such changes. Here, we present the findings of an extensive quantitative analysis conducted on in-home monitoring data collected from 73 households of PLWD using Internet of Things technologies. The Coronavirus 2019 (COVID-19) pandemic has previously been shown to have dramatically altered the behavioural habits, particularly the eating and drinking habits, of PLWD. Using the COVID-19 pandemic as a natural experiment, we conducted linear mixed-effects modelling to examine changes in mean kitchen activity within a subset of 21 households of PLWD that were continuously monitored for 499 days. We report an observable increase in day-time kitchen activity and a significant decrease in night-time kitchen activity (t(147) = -2.90, p < 0.001). We further propose a novel analytical approach to detecting changes in behaviours of PLWD using Markov modelling applied to remote monitoring data as a proxy for behaviours that cannot be directly measured. Together, these results pave the way to introduce improvements into the monitoring of PLWD in naturalistic settings and for shifting from reactive to proactive care.
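As a rough illustration of the Markov-modelling idea, the sketch below estimates a first-order transition matrix from a discretised sequence of daily activity states and compares two periods; the state definition, binning, and comparison metric are assumptions rather than the paper's exact procedure.

```python
import numpy as np

def fit_transition_matrix(state_sequence, n_states):
    """Estimate a first-order Markov transition matrix from a sequence of
    discretised daily activity states (e.g., binned kitchen-activity levels)."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(state_sequence[:-1], state_sequence[1:]):
        counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Compare transition matrices estimated before and after a period of interest
rng = np.random.default_rng(1)
before = rng.integers(0, 3, size=500)   # hypothetical discretised activity states
after = rng.integers(0, 3, size=500)
P_before = fit_transition_matrix(before, 3)
P_after = fit_transition_matrix(after, 3)
print(np.abs(P_before - P_after).max())  # a simple proxy for behavioural change
```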

Diffusion Models for Probabilistic Deconvolution of Galaxy Images

  • paper_url: http://arxiv.org/abs/2307.11122
  • repo_url: https://github.com/yashpatel5400/galgen
  • paper_authors: Zhiwei Xue, Yuhang Li, Yash Patel, Jeffrey Regier
  • for: The paper proposes a deep generative approach to PSF deconvolution for recovering fine detail in galaxy images.
  • methods: A classifier-free conditional diffusion model is used to infer a posterior distribution over candidate deconvolved images.
  • results: Experiments show that the diffusion model captures a greater diversity of plausible deconvolutions than a conditional VAE.
    Abstract Telescopes capture images with a particular point spread function (PSF). Inferring what an image would have looked like with a much sharper PSF, a problem known as PSF deconvolution, is ill-posed because PSF convolution is not an invertible transformation. Deep generative models are appealing for PSF deconvolution because they can infer a posterior distribution over candidate images that, if convolved with the PSF, could have generated the observation. However, classical deep generative models such as VAEs and GANs often provide inadequate sample diversity. As an alternative, we propose a classifier-free conditional diffusion model for PSF deconvolution of galaxy images. We demonstrate that this diffusion model captures a greater diversity of possible deconvolutions compared to a conditional VAE.
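The forward model behind the problem can be illustrated in a few lines: the observation is the sharp image convolved with the PSF, and that convolution is not invertible, which is why the paper samples a posterior over candidate deconvolutions rather than inverting it directly. The Gaussian PSF and image sizes below are placeholders, not the paper's data.

```python
import numpy as np
from scipy.signal import fftconvolve

# Forward model: an observed image is the sharp image convolved with the PSF.
rng = np.random.default_rng(2)
sharp = rng.random((64, 64))                     # hypothetical sharp galaxy image
x, y = np.mgrid[-7:8, -7:8]
psf = np.exp(-(x**2 + y**2) / (2 * 2.0**2))      # assumed Gaussian PSF
psf /= psf.sum()
observed = fftconvolve(sharp, psf, mode="same")  # what the telescope records
print(observed.shape)

# Deconvolution is ill-posed: many different sharp images map to (nearly) the
# same observation, motivating posterior sampling over candidate images.
```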

PASTA: Pretrained Action-State Transformer Agents

  • paper_url: http://arxiv.org/abs/2307.10936
  • repo_url: None
  • paper_authors: Raphael Boige, Yannis Flet-Berliac, Arthur Flajolet, Guillaume Richard, Thomas Pierrot
  • for: This paper aims to investigate the use of pre-trained transformer models for reinforcement learning tasks, specifically addressing the problem of adapting models to new environments with limited data.
  • methods: The authors use a unified methodology that includes tokenization at the action and state component level, training models across diverse domains simultaneously, and using parameter-efficient fine-tuning (PEFT) to adapt the models to downstream tasks.
  • results: The developed models contain fewer than 10 million parameters and can be fine-tuned with fewer than 10,000 parameters during downstream adaptation, allowing for robust policy learning and encouraging further research into the use of transformers for reinforcement learning.
    Abstract Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In the realm of reinforcement learning, researchers have recently adapted these approaches by developing models pre-trained on expert trajectories, enabling them to address a wide range of tasks, from robotics to recommendation systems. However, existing methods mostly rely on intricate pre-training objectives tailored to specific downstream applications. This paper presents a comprehensive investigation of models we refer to as Pretrained Action-State Transformer Agents (PASTA). Our study uses a unified methodology and covers an extensive set of general downstream tasks including behavioral cloning, offline RL, sensor failure robustness, and dynamics change adaptation. Our goal is to systematically compare various design choices and provide valuable insights to practitioners for building robust models. Key highlights of our study include tokenization at the action and state component level, using fundamental pre-training objectives like next token prediction, training models across diverse domains simultaneously, and using parameter efficient fine-tuning (PEFT). The developed models in our study contain fewer than 10 million parameters and the application of PEFT enables fine-tuning of fewer than 10,000 parameters during downstream adaptation, allowing a broad community to use these models and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and contribute to robust policy learning.
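A hedged sketch of component-level tokenization for RL trajectories, in the spirit of the design choice highlighted above: each scalar state/action component is discretised into its own token and the per-step tokens are concatenated into one sequence for next-token prediction. The bin count, value range, and vocabulary layout are assumptions.

```python
import numpy as np

def tokenize_trajectory(states, actions, n_bins=64):
    """Tokenise an RL trajectory at the level of individual state and action
    components: each scalar component becomes one discrete token, and the
    per-step tokens are concatenated into a single sequence."""
    def discretise(x, offset):
        # map values assumed to lie in [-1, 1] to integer token ids, shifted by `offset`
        ids = np.clip(((x + 1.0) / 2.0 * (n_bins - 1)).astype(int), 0, n_bins - 1)
        return ids + offset

    tokens = []
    for s, a in zip(states, actions):
        tokens.extend(discretise(s, offset=0))        # state-component tokens
        tokens.extend(discretise(a, offset=n_bins))   # action-component tokens
    return np.array(tokens)

# Hypothetical trajectory: 5 steps, 3-dim state, 2-dim action, values in [-1, 1]
rng = np.random.default_rng(3)
states = rng.uniform(-1, 1, size=(5, 3))
actions = rng.uniform(-1, 1, size=(5, 2))
seq = tokenize_trajectory(states, actions)
print(seq.shape)  # (5 * (3 + 2),) = (25,)
```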

Inorganic synthesis-structure maps in zeolites with machine learning and crystallographic distances

  • paper_url: http://arxiv.org/abs/2307.10935
  • repo_url: None
  • paper_authors: Daniel Schwalbe-Koda, Daniel E. Widdowson, Tuan Anh Pham, Vitaliy A. Kurlin
  • for: The study aims to build inorganic synthesis maps for zeolites by combining computational crystallographic distances with machine learning (ML).
  • methods: Starting from 253 known zeolites, a strong distance metric between crystal structures is combined with unsupervised analysis and ML classifiers to relate framework structure to inorganic synthesis conditions.
  • results: Without using labels such as building units, neighbouring zeolites under this metric often share similar inorganic synthesis conditions; synthesis-structure relationships are found for 14 common inorganic conditions, and the approach can predict synthesis conditions for hypothetical, unrealized frameworks.
    Abstract Zeolites are inorganic materials known for their diversity of applications, synthesis conditions, and resulting polymorphs. Although their synthesis is controlled both by inorganic and organic synthesis conditions, computational studies of zeolite synthesis have focused mostly on organic template design. In this work, we use a strong distance metric between crystal structures and machine learning (ML) to create inorganic synthesis maps in zeolites. Starting with 253 known zeolites, we show how the continuous distances between frameworks reproduce inorganic synthesis conditions from the literature without using labels such as building units. An unsupervised learning analysis shows that neighboring zeolites according to our metric often share similar inorganic synthesis conditions, even in template-based routes. In combination with ML classifiers, we find synthesis-structure relationships for 14 common inorganic conditions in zeolites, namely Al, B, Be, Ca, Co, F, Ga, Ge, K, Mg, Na, P, Si, and Zn. By explaining the model predictions, we demonstrate how (dis)similarities towards known structures can be used as features for the synthesis space. Finally, we show how these methods can be used to predict inorganic synthesis conditions for unrealized frameworks in hypothetical databases and interpret the outcomes by extracting local structural patterns from zeolites. In combination with template design, this work can accelerate the exploration of the space of synthesis conditions for zeolites.
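A small sketch of how a precomputed structure-to-structure distance matrix could feed a nearest-neighbour classifier for an inorganic synthesis condition; the distances and labels below are placeholders, not the paper's actual crystallographic metric or data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# D: pairwise distances between zeolite frameworks (placeholder values here);
# y: whether a given inorganic condition (e.g., "Ge present") was used in the
# known synthesis of each framework (placeholder labels here).
rng = np.random.default_rng(4)
n = 50
X = rng.random((n, 8))                                   # stand-in structural descriptors
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)     # placeholder distance matrix
y = rng.integers(0, 2, size=n)                           # placeholder synthesis labels

train, test = np.arange(0, 40), np.arange(40, 50)
clf = KNeighborsClassifier(n_neighbors=5, metric="precomputed")
clf.fit(D[np.ix_(train, train)], y[train])
pred = clf.predict(D[np.ix_(test, train)])               # distances from test rows to train rows
print((pred == y[test]).mean())
```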

Modeling 3D cardiac contraction and relaxation with point cloud deformation networks

  • paper_url: http://arxiv.org/abs/2307.10927
  • repo_url: None
  • paper_authors: Marcel Beetz, Abhirup Banerjee, Vicente Grau
  • for: The study develops a point-cloud deep learning approach for accurately assessing 3D cardiac function, to improve the understanding of healthy and pathological cardiac mechanics.
  • methods: The method uses recent advances in point-cloud deep learning in an encoder-decoder structure (PCD-Net) to efficiently learn multi-scale features directly on multi-class 3D point-cloud representations of the cardiac anatomy.
  • results: On over 10,000 UK Biobank cases, average Chamfer distances between predicted and ground-truth anatomies are below the pixel resolution of the image acquisition, clinical metrics of predicted and ground-truth populations are similar, subpopulation-specific differences between normal subjects and myocardial infarction (MI) patients are captured, and the learned 3D deformation patterns outperform multiple clinical benchmarks for MI detection, prediction, and survival analysis.
    Abstract Global single-valued biomarkers of cardiac function typically used in clinical practice, such as ejection fraction, provide limited insight on the true 3D cardiac deformation process and hence, limit the understanding of both healthy and pathological cardiac mechanics. In this work, we propose the Point Cloud Deformation Network (PCD-Net) as a novel geometric deep learning approach to model 3D cardiac contraction and relaxation between the extreme ends of the cardiac cycle. It employs the recent advances in point cloud-based deep learning into an encoder-decoder structure, in order to enable efficient multi-scale feature learning directly on multi-class 3D point cloud representations of the cardiac anatomy. We evaluate our approach on a large dataset of over 10,000 cases from the UK Biobank study and find average Chamfer distances between the predicted and ground truth anatomies below the pixel resolution of the underlying image acquisition. Furthermore, we observe similar clinical metrics between predicted and ground truth populations and show that the PCD-Net can successfully capture subpopulation-specific differences between normal subjects and myocardial infarction (MI) patients. We then demonstrate that the learned 3D deformation patterns outperform multiple clinical benchmarks by 13% and 7% in terms of area under the receiver operating characteristic curve for the tasks of prevalent MI detection and incident MI prediction and by 7% in terms of Harrell's concordance index for MI survival analysis.
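The evaluation relies on the Chamfer distance between predicted and ground-truth point clouds; a minimal NumPy version is sketched below. Conventions vary (e.g., squared vs. unsquared distances), so treat this as one common variant rather than the paper's exact definition.

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3):
    average nearest-neighbour distance from p to q plus from q to p."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

rng = np.random.default_rng(5)
predicted = rng.random((200, 3))      # hypothetical predicted anatomy points
ground_truth = rng.random((220, 3))   # hypothetical ground-truth points
print(chamfer_distance(predicted, ground_truth))
```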

Confidence intervals for performance estimates in 3D medical image segmentation

  • paper_url: http://arxiv.org/abs/2307.10926
  • repo_url: https://github.com/rosanajurdi/SegVal_TMI
  • paper_authors: R. El Jurdi, G. Varoquaux, O. Colliot
  • for: This paper studies how to report confidence intervals for performance estimates of 3D medical image segmentation models.
  • methods: Experiments use the nnU-Net framework, two datasets from the Medical Decathlon challenge, and two performance measures: the Dice score and the Hausdorff distance; parametric confidence intervals are compared against bootstrap estimates.
  • results: Parametric confidence intervals are reasonable approximations of the bootstrap estimates across test-set sizes and spreads of the performance metric. The test-set size needed for a given precision is often much lower than for classification tasks: a 1%-wide confidence interval typically requires about 100-200 test samples when the spread is low (standard deviation around 3%), while harder segmentation tasks with higher spread may require over 1000 samples.
    Abstract Medical segmentation models are evaluated empirically. As such an evaluation is based on a limited set of example images, it is unavoidably noisy. Beyond a mean performance measure, reporting confidence intervals is thus crucial. However, this is rarely done in medical image segmentation. The width of the confidence interval depends on the test set size and on the spread of the performance measure (its standard-deviation across of the test set). For classification, many test images are needed to avoid wide confidence intervals. Segmentation, however, has not been studied, and it differs by the amount of information brought by a given test image. In this paper, we study the typical confidence intervals in medical image segmentation. We carry experiments on 3D image segmentation using the standard nnU-net framework, two datasets from the Medical Decathlon challenge and two performance measures: the Dice accuracy and the Hausdorff distance. We show that the parametric confidence intervals are reasonable approximations of the bootstrap estimates for varying test set sizes and spread of the performance metric. Importantly, we show that the test size needed to achieve a given precision is often much lower than for classification tasks. Typically, a 1% wide confidence interval requires about 100-200 test samples when the spread is low (standard-deviation around 3%). More difficult segmentation tasks may lead to higher spreads and require over 1000 samples.
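The comparison between parametric and bootstrap confidence intervals can be reproduced in a few lines on per-case Dice scores; the scores below are simulated placeholders, and the normal-approximation interval is one standard parametric choice.

```python
import numpy as np

rng = np.random.default_rng(6)
dice = np.clip(rng.normal(loc=0.85, scale=0.03, size=150), 0, 1)  # hypothetical per-case Dice scores

# Parametric (normal-approximation) 95% CI for the mean Dice
n = len(dice)
mean, sem = dice.mean(), dice.std(ddof=1) / np.sqrt(n)
parametric_ci = (mean - 1.96 * sem, mean + 1.96 * sem)

# Percentile bootstrap 95% CI
boot_means = np.array([rng.choice(dice, size=n, replace=True).mean() for _ in range(10000)])
bootstrap_ci = tuple(np.percentile(boot_means, [2.5, 97.5]))

print(parametric_ci, bootstrap_ci)  # the two intervals should be close for moderate n
```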

Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series

  • paper_url: http://arxiv.org/abs/2307.10923
  • repo_url: https://github.com/exploita123/charmedforfree
  • paper_authors: Aniruddh Raghu, Payal Chandak, Ridwan Alam, John Guttag, Collin M. Stultz
  • for: The paper addresses the limitation that existing self-supervised learning (SSL) methods for clinical time series are designed for unimodal data and cannot readily model sequences where structured features and high-dimensional signals are recorded at each timestep.
  • methods: The proposed Sequential Multi-Dimensional SSL applies an SSL loss both at the level of the entire sequence and at the level of the individual high-dimensional data points, and is agnostic to the specific loss (contrastive, as in SimCLR, or non-contrastive, as in VICReg).
  • results: On two real-world clinical time-series datasets, pre-training with the method and then fine-tuning on downstream tasks improves performance over baselines, and in several settings yields gains across different self-supervised loss functions.
    Abstract Self-supervised learning (SSL) for clinical time series data has received significant attention in recent literature, since these data are highly rich and provide important information about a patient's physiological state. However, most existing SSL methods for clinical time series are limited in that they are designed for unimodal time series, such as a sequence of structured features (e.g., lab values and vitals signs) or an individual high-dimensional physiological signal (e.g., an electrocardiogram). These existing methods cannot be readily extended to model time series that exhibit multimodality, with structured features and high-dimensional data being recorded at each timestep in the sequence. In this work, we address this gap and propose a new SSL method -- Sequential Multi-Dimensional SSL -- where a SSL loss is applied both at the level of the entire sequence and at the level of the individual high-dimensional data points in the sequence in order to better capture information at both scales. Our strategy is agnostic to the specific form of loss function used at each level -- it can be contrastive, as in SimCLR, or non-contrastive, as in VICReg. We evaluate our method on two real-world clinical datasets, where the time series contains sequences of (1) high-frequency electrocardiograms and (2) structured data from lab values and vitals signs. Our experimental results indicate that pre-training with our method and then fine-tuning on downstream tasks improves performance over baselines on both datasets, and in several settings, can lead to improvements across different self-supervised loss functions.
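A hedged sketch of applying an SSL loss at two scales, per the idea above: one loss over embeddings of the individual high-dimensional timesteps and one over embeddings of the whole multimodal sequence. The encoders, the noise augmentation, and the use of InfoNCE here are assumptions; the method itself is agnostic to the specific loss.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Simple InfoNCE/SimCLR-style loss between two batches of embeddings."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    return F.cross_entropy(logits, torch.arange(z1.size(0)))

# Hypothetical encoders: signal_encoder embeds each high-dimensional timestep
# (e.g., an ECG segment); sequence_encoder embeds the whole multimodal sequence.
B, T, D_sig, D_feat = 8, 16, 64, 32
signal_encoder = torch.nn.Linear(D_sig, 128)
sequence_encoder = torch.nn.GRU(128 + D_feat, 128, batch_first=True)

ecg = torch.randn(B, T, D_sig)                 # high-dimensional signal per timestep
feats = torch.randn(B, T, D_feat)              # structured features per timestep
ecg_aug = ecg + 0.05 * torch.randn_like(ecg)   # a second "view" via noise augmentation

z_sig, z_sig_aug = signal_encoder(ecg), signal_encoder(ecg_aug)
loss_points = info_nce(z_sig.reshape(B * T, -1), z_sig_aug.reshape(B * T, -1))  # per-timestep loss

seq_in = torch.cat([z_sig, feats], dim=-1)
seq_in_aug = torch.cat([z_sig_aug, feats], dim=-1)
_, h = sequence_encoder(seq_in)
_, h_aug = sequence_encoder(seq_in_aug)
loss_seq = info_nce(h[-1], h_aug[-1])          # sequence-level loss

loss = loss_seq + loss_points
```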

Language-based Action Concept Spaces Improve Video Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2307.10922
  • repo_url: None
  • paper_authors: Kanchana Ranasinghe, Michael Ryoo
  • for: Learning highly transferable and robust video representations.
  • methods: Language-tied self-supervised learning adapts an image CLIP model to the video domain, training a temporally modified backbone under self-distillation with concept distillation and concept alignment objectives defined in an action concept space built from a language encoder.
  • results: Improved zero-shot and linear-probing performance on three action recognition benchmarks.
    Abstract Recent contrastive language image pre-training has led to learning highly transferable and robust image representations. However, adapting these models to video domains with minimal supervision remains an open problem. We explore a simple step in that direction, using language tied self-supervised learning to adapt an image CLIP model to the video domain. A backbone modified for temporal modeling is trained under self-distillation settings with train objectives operating in an action concept space. Feature vectors of various action concepts extracted from a language encoder using relevant textual prompts construct this space. We introduce two train objectives, concept distillation and concept alignment, that retain generality of original representations while enforcing relations between actions and their attributes. Our approach improves zero-shot and linear probing performance on three action recognition benchmarks.

The Role of Entropy and Reconstruction in Multi-View Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2307.10907
  • repo_url: https://github.com/apple/ml-entropy-reconstruction
  • paper_authors: Borja Rodríguez-Gálvez, Arno Blaas, Pau Rodríguez, Adam Goliński, Xavier Suau, Jason Ramapuram, Dan Busbridge, Luca Zappella
  • for: The paper studies the mechanisms behind the success of multi-view self-supervised learning (MVSSL) and analyses the main MVSSL families through a new lower bound on mutual information.
  • methods: A lower bound on the mutual information (MI) consisting of an entropy term and a reconstruction term (ER) is used to analyse clustering-based methods (DeepCluster, SwAV) and distillation-based methods (BYOL, DINO).
  • results: Replacing the objectives of common MVSSL methods with the ER bound achieves competitive performance while making training more stable with smaller batch sizes or smaller EMA coefficients.
    Abstract The mechanisms behind the success of multi-view self-supervised learning (MVSSL) are not yet fully understood. Contrastive MVSSL methods have been studied through the lens of InfoNCE, a lower bound of the Mutual Information (MI). However, the relation between other MVSSL methods and MI remains unclear. We consider a different lower bound on the MI consisting of an entropy and a reconstruction term (ER), and analyze the main MVSSL families through its lens. Through this ER bound, we show that clustering-based methods such as DeepCluster and SwAV maximize the MI. We also re-interpret the mechanisms of distillation-based approaches such as BYOL and DINO, showing that they explicitly maximize the reconstruction term and implicitly encourage a stable entropy, and we confirm this empirically. We show that replacing the objectives of common MVSSL methods with this ER bound achieves competitive performance, while making them stable when training with smaller batch sizes or smaller exponential moving average (EMA) coefficients. Github repo: https://github.com/apple/ml-entropy-reconstruction.
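A rough sketch of an entropy-plus-reconstruction (ER) style objective: reconstruct one view's embedding from the other while keeping the embedding distribution's entropy high. The Gaussian log-determinant entropy proxy and the MSE reconstruction below are simplifying assumptions for illustration, not the paper's exact bound.

```python
import torch
import torch.nn.functional as F

def er_objective(z1, z2, projector):
    """Sketch of an ER-style objective: minimise the error of reconstructing
    view-2 embeddings from view-1 embeddings, and maximise the entropy of the
    embedding distribution (here approximated by the log-determinant of the
    batch covariance, a Gaussian proxy assumed by this sketch)."""
    # Reconstruction term: predict z2 from z1 through a projector head
    recon = F.mse_loss(projector(z1), z2.detach())

    # Entropy term (Gaussian proxy): 0.5 * logdet of the covariance of z2
    zc = z2 - z2.mean(dim=0, keepdim=True)
    cov = (zc.t() @ zc) / (z2.size(0) - 1) + 1e-4 * torch.eye(z2.size(1))
    entropy = 0.5 * torch.logdet(cov)

    # Minimise reconstruction error while maximising entropy
    return recon - entropy

d = 32
projector = torch.nn.Linear(d, d)
z1, z2 = torch.randn(64, d), torch.randn(64, d)  # embeddings of two views of a batch
loss = er_objective(z1, z2, projector)
loss.backward()
```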

Variational Point Encoding Deformation for Dental Modeling

  • paper_url: http://arxiv.org/abs/2307.10895
  • repo_url: None
  • paper_authors: Johan Ziruo Ye, Thomas Ørkild, Peter Lempel Søndergaard, Søren Hauberg
  • for: The study releases a new, extensive dataset of tooth meshes to encourage further research in digital dentistry.
  • methods: The authors propose Variational FoldingNet (VF-Net), an extension of FoldingNet that enables probabilistic learning of point-cloud representations by replacing explicit Chamfer-distance minimisation with a suitable encoder.
  • results: Experiments show that VF-Net outperforms existing models in dental scan reconstruction and extrapolation, and that its latent representations are robust.
    Abstract Digital dentistry has made significant advancements in recent years, yet numerous challenges remain to be addressed. In this study, we release a new extensive dataset of tooth meshes to encourage further research. Additionally, we propose Variational FoldingNet (VF-Net), which extends FoldingNet to enable probabilistic learning of point cloud representations. A key challenge in existing latent variable models for point clouds is the lack of a 1-to-1 mapping between input points and output points. Instead, they must rely on optimizing Chamfer distances, a metric that does not have a normalized distributional counterpart, preventing its usage in probabilistic models. We demonstrate that explicit minimization of Chamfer distances can be replaced by a suitable encoder, which allows us to increase computational efficiency while simplifying the probabilistic extension. Our experimental findings present empirical evidence demonstrating the superior performance of VF-Net over existing models in terms of dental scan reconstruction and extrapolation. Additionally, our investigation highlights the robustness of VF-Net's latent representations. These results underscore the promising prospects of VF-Net as an effective and reliable method for point cloud reconstruction and analysis.

Learning and Generalizing Polynomials in Simulation Metamodeling

  • paper_url: http://arxiv.org/abs/2307.10892
  • repo_url: https://github.com/jesperhauch/polynomial_deep_learning
  • paper_authors: Jesper Hauch, Christoffer Riis, Francisco C. Pereira
  • for: The study aims to improve the ability of neural networks to learn polynomials and generalise out-of-distribution, which is essential for simulation metamodels in many engineering disciplines.
  • methods: The paper collects and proposes multiplicative neural network (MNN) architectures used as recursive building blocks for approximating higher-order polynomials, together with a simulation metamodeling approach for simulations with polynomial time-step updates.
  • results: Experiments show that MNNs generalise better than baseline models, with validation performance that reflects out-of-distribution performance; a demonstration on an epidemiology simulation shows that larger step sizes, which entail higher-order polynomials, can be handled, highlighting the inductive bias of MNNs for learning such polynomials.
    Abstract The ability to learn polynomials and generalize out-of-distribution is essential for simulation metamodels in many disciplines of engineering, where the time step updates are described by polynomials. While feed forward neural networks can fit any function, they cannot generalize out-of-distribution for higher-order polynomials. Therefore, this paper collects and proposes multiplicative neural network (MNN) architectures that are used as recursive building blocks for approximating higher-order polynomials. Our experiments show that MNNs are better than baseline models at generalizing, and their performance in validation is true to their performance in out-of-distribution tests. In addition to MNN architectures, a simulation metamodeling approach is proposed for simulations with polynomial time step updates. For these simulations, simulating a time interval can be performed in fewer steps by increasing the step size, which entails approximating higher-order polynomials. While our approach is compatible with any simulation with polynomial time step updates, a demonstration is shown for an epidemiology simulation model, which also shows the inductive bias in MNNs for learning and generalizing higher-order polynomials.
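A minimal sketch of a multiplicative building block: the element-wise product of two linear maps of the input, so that stacked blocks can represent higher-order polynomials exactly. The specific layer definition and training loop below are illustrative assumptions, not the paper's exact architectures.

```python
import torch
import torch.nn as nn

class MultiplicativeLayer(nn.Module):
    """One multiplicative block: element-wise product of two linear transforms
    of the input. Stacking k such blocks can represent polynomials of degree
    up to 2^k, which is the kind of inductive bias described above."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.a = nn.Linear(in_dim, out_dim)
        self.b = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return self.a(x) * self.b(x)

# A tiny model that can fit y = x1 * x2 exactly (a degree-2 polynomial)
model = nn.Sequential(MultiplicativeLayer(2, 8), nn.Linear(8, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x = torch.rand(256, 2) * 4 - 2
y = x[:, :1] * x[:, 1:]
for _ in range(500):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()
print(loss.item())
```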

Syntactic vs Semantic Linear Abstraction and Refinement of Neural Networks

  • paper_url: http://arxiv.org/abs/2307.10891
  • repo_url: https://github.com/cxlvinchau/linna
  • paper_authors: Calvin Chau, Jan Křetínský, Stefanie Mohr
  • for: Improving the scalability of neural network verification through abstraction.
  • methods: A single neuron is replaced by a linear combination of other neurons, applied to both syntactic abstractions (based on connection weights) and semantic abstractions (based on activation values).
  • results: The approach achieves greater reduction than previous work, and a refinement method is introduced to find a better balance between reduction and precision.
    Abstract Abstraction is a key verification technique to improve scalability. However, its use for neural networks is so far extremely limited. Previous approaches for abstracting classification networks replace several neurons with one of them that is similar enough. We can classify the similarity as defined either syntactically (using quantities on the connections between neurons) or semantically (on the activation values of neurons for various inputs). Unfortunately, the previous approaches only achieve moderate reductions, when implemented at all. In this work, we provide a more flexible framework where a neuron can be replaced with a linear combination of other neurons, improving the reduction. We apply this approach both on syntactic and semantic abstractions, and implement and evaluate them experimentally. Further, we introduce a refinement method for our abstractions, allowing for finding a better balance between reduction and precision.

Player-optimal Stable Regret for Bandit Learning in Matching Markets

  • paper_url: http://arxiv.org/abs/2307.10890
  • repo_url: None
  • paper_authors: Fang Kong, Shuai Li
  • for: This paper focuses on the problem of matching markets, specifically on finding a stable matching in an online setting where one-side participants (players) learn their unknown preferences from iterative interactions with the other side (arms).
  • methods: The paper proposes a new algorithm called explore-then-Gale-Shapley (ETGS) and analyzes its performance in terms of the optimal stable regret of each player.
  • results: The paper shows that the optimal stable regret of each player can be upper bounded by $O(K\log T/\Delta^2)$, which is a significantly better result than previous works that either have a weaker player-pessimal stable matching objective or apply only to markets with special assumptions. Additionally, the paper shows that the regret upper bound matches the previously derived lower bound when the preferences of participants satisfy some special conditions.
    Abstract The problem of matching markets has been studied for a long time in the literature due to its wide range of applications. Finding a stable matching is a common equilibrium objective in this problem. Since market participants are usually uncertain of their preferences, a rich line of recent works study the online setting where one-side participants (players) learn their unknown preferences from iterative interactions with the other side (arms). Most previous works in this line are only able to derive theoretical guarantees for player-pessimal stable regret, which is defined compared with the players' least-preferred stable matching. However, under the pessimal stable matching, players only obtain the least reward among all stable matchings. To maximize players' profits, player-optimal stable matching would be the most desirable. Though \citet{basu21beyond} successfully bring an upper bound for player-optimal stable regret, their result can be exponentially large if players' preference gap is small. Whether a polynomial guarantee for this regret exists is a significant but still open problem. In this work, we provide a new algorithm named explore-then-Gale-Shapley (ETGS) and show that the optimal stable regret of each player can be upper bounded by $O(K\log T/\Delta^2)$ where $K$ is the number of arms, $T$ is the horizon and $\Delta$ is the players' minimum preference gap among the first $N+1$-ranked arms. This result significantly improves previous works which either have a weaker player-pessimal stable matching objective or apply only to markets with special assumptions. When the preferences of participants satisfy some special conditions, our regret upper bound also matches the previously derived lower bound.
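A sketch of the explore-then-Gale-Shapley idea: players first estimate their arm preferences from exploration rounds and then run player-proposing Gale-Shapley on the estimated rankings. The exploration schedule and problem sizes below are placeholders, not the algorithm's exact specification.

```python
import numpy as np

def gale_shapley(player_prefs, arm_prefs):
    """Player-proposing Gale-Shapley. player_prefs[i] ranks arms (best first);
    arm_prefs[j][i] is arm j's rank of player i (lower = better)."""
    n_arms = len(arm_prefs)
    next_proposal = [0] * len(player_prefs)
    match_of_arm = [None] * n_arms
    free_players = list(range(len(player_prefs)))
    while free_players:
        i = free_players.pop()
        j = player_prefs[i][next_proposal[i]]
        next_proposal[i] += 1
        if match_of_arm[j] is None:
            match_of_arm[j] = i
        elif arm_prefs[j][i] < arm_prefs[j][match_of_arm[j]]:
            free_players.append(match_of_arm[j])
            match_of_arm[j] = i
        else:
            free_players.append(i)
    return {match_of_arm[j]: j for j in range(n_arms) if match_of_arm[j] is not None}

# Explore-then-commit flavour: each player samples every arm to estimate its
# own preferences, then all players run Gale-Shapley on the estimated rankings.
rng = np.random.default_rng(7)
true_means = rng.random((3, 3))                                  # players x arms reward means
est = np.array([[rng.normal(true_means[i, j], 0.1, 20).mean()
                 for j in range(3)] for i in range(3)])
player_prefs = [list(np.argsort(-est[i])) for i in range(3)]     # estimated rankings
arm_prefs = [{int(p): r for r, p in enumerate(rng.permutation(3))} for _ in range(3)]
print(gale_shapley(player_prefs, arm_prefs))                     # player -> arm matching
```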

What Twitter Data Tell Us about the Future?

  • paper_url: http://arxiv.org/abs/2308.02035
  • repo_url: None
  • paper_authors: Alina Landowska, Marek Robak, Maciej Skorski
  • for: This paper investigates the futures projected by futurists on Twitter and explores the impact of language cues on anticipatory thinking among social media users.
  • methods: The study uses a compiled dataset of over 1 million publicly shared tweets by future influencers and develops a scalable NLP pipeline using state-of-the-art models. The research employs topic modeling techniques, such as LDA and BERTopic, to identify the topics and language cues used by futurists.
  • results: The study finds 15 topics from the LDA approach and 100 distinct topics from the BERTopic approach within the futurists' tweets. The research demonstrates that the futurists' language cues signal futures-in-the-making that help social media users anticipate their own scenarios and respond to them in the present.
    Abstract Anticipation is a fundamental human cognitive ability that involves thinking about and living towards the future. While language markers reflect anticipatory thinking, research on anticipation from the perspective of natural language processing is limited. This study aims to investigate the futures projected by futurists on Twitter and explore the impact of language cues on anticipatory thinking among social media users. We address the research questions of what futures Twitter's futurists anticipate and share, and how these anticipated futures can be modeled from social data. To investigate this, we review related works on anticipation, discuss the influence of language markers and prestigious individuals on anticipatory thinking, and present a taxonomy system categorizing futures into "present futures" and "future present". This research presents a compiled dataset of over 1 million publicly shared tweets by future influencers and develops a scalable NLP pipeline using SOTA models. The study identifies 15 topics from the LDA approach and 100 distinct topics from the BERTopic approach within the futurists' tweets. These findings contribute to the research on topic modelling and provide insights into the futures anticipated by Twitter's futurists. The research demonstrates the futurists' language cues signals futures-in-the-making that enhance social media users to anticipate their own scenarios and respond to them in present. The fully open-sourced dataset, interactive analysis, and reproducible source code are available for further exploration.

Risk-optimized Outlier Removal for Robust Point Cloud Classification

  • paper_url: http://arxiv.org/abs/2307.10875
  • repo_url: None
  • paper_authors: Xinke Li, Junchi Lu
  • for: The study aims to improve the reliability and security of point-cloud deep models in safety-critical scenarios, where intentional or naturally occurring point-cloud noise can compromise them.
  • methods: The proposed outlier-removal method, PointCVaR, enables standard-trained models to eliminate additional outliers and restore the data. It first performs attribution analysis to determine the influence of each point on the model output (the point risk), and then optimises the filtering of high-risk points using Conditional Value at Risk (CVaR) as the objective.
  • results: Across removal-and-classification experiments on point clouds corrupted by random noise, adversarial noise, and backdoor trigger noise, the method achieves excellent results without additional training, including 87% accuracy in defending against the backdoor attack by removing triggers.
    Abstract The popularity of point cloud deep models for safety-critical purposes has increased, but the reliability and security of these models can be compromised by intentional or naturally occurring point cloud noise. To combat this issue, we present a novel point cloud outlier removal method called PointCVaR, which empowers standard-trained models to eliminate additional outliers and restore the data. Our approach begins by conducting attribution analysis to determine the influence of each point on the model output, which we refer to as point risk. We then optimize the process of filtering high-risk points using Conditional Value at Risk (CVaR) as the objective. The rationale for this approach is based on the observation that noise points in point clouds tend to cluster in the tail of the risk distribution, with a low frequency but a high level of risk, resulting in significant interference with classification results. Despite requiring no additional training effort, our method produces exceptional results in various removal-and-classification experiments for noisy point clouds, which are corrupted by random noise, adversarial noise, and backdoor trigger noise. Impressively, it achieves 87% accuracy in defense against the backdoor attack by removing triggers. Overall, the proposed PointCVaR effectively eliminates noise points and enhances point cloud classification, making it a promising plug-in module for various models in different scenarios.
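A simplified reading of the CVaR-based filtering step: given per-point attribution risks, drop the points in the upper tail of the risk distribution. The fixed quantile threshold below is an assumption; the paper optimises the filtering with CVaR as the objective rather than applying a hard cut.

```python
import numpy as np

def cvar_filter(points, risks, alpha=0.95):
    """Remove points whose attribution-based risk lies in the upper (1 - alpha)
    tail of the risk distribution; reports the VaR threshold and the tail mean
    (CVaR). This is an illustrative simplification of CVaR-optimised filtering."""
    var_threshold = np.quantile(risks, alpha)   # Value at Risk at level alpha
    tail = risks >= var_threshold
    cvar = risks[tail].mean()                   # Conditional Value at Risk (tail mean)
    print(f"VaR={var_threshold:.3f}, CVaR={cvar:.3f}, removing {tail.sum()} points")
    return points[~tail]

rng = np.random.default_rng(8)
cloud = rng.normal(size=(1024, 3))                     # hypothetical point cloud
risks = rng.gamma(shape=1.0, scale=0.1, size=1024)     # hypothetical per-point risks
clean = cvar_filter(cloud, risks)
print(clean.shape)
```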

Nonlinear Meta-Learning Can Guarantee Faster Rates

  • paper_url: http://arxiv.org/abs/2307.10870
  • repo_url: None
  • paper_authors: Dimitri Meunier, Zhu Li, Arthur Gretton, Samory Kpotufe
  • for: The goal is to provide theoretical guarantees for meta-learning that leverages shared representational structure across related tasks to simplify a target task.
  • methods: The analysis covers nonlinear shared representations, assumed to map into an infinite-dimensional RKHS, and uses careful regularization that exploits the smoothness of task-specific regression functions to mitigate task-specific biases.
  • results: Theoretical guarantees, validated empirically, are derived for meta-learning with nonlinear representations, showing that the rate of learning the shared representation can scale with the number of tasks $N$ (as well as the number of samples per task).
    Abstract Many recent theoretical works on \emph{meta-learning} aim to achieve guarantees in leveraging similar representational structures from related tasks towards simplifying a target task. Importantly, the main aim in theory works on the subject is to understand the extent to which convergence rates -- in learning a common representation -- \emph{may scale with the number $N$ of tasks} (as well as the number of samples per task). First steps in this setting demonstrate this property when both the shared representation amongst tasks, and task-specific regression functions, are linear. This linear setting readily reveals the benefits of aggregating tasks, e.g., via averaging arguments. In practice, however, the representation is often highly nonlinear, introducing nontrivial biases in each task that cannot easily be averaged out as in the linear case. In the present work, we derive theoretical guarantees for meta-learning with nonlinear representations. In particular, assuming the shared nonlinearity maps to an infinite-dimensional RKHS, we show that additional biases can be mitigated with careful regularization that leverages the smoothness of task-specific regression functions,

Performance Issue Identification in Cloud Systems with Relational-Temporal Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.10869
  • repo_url: https://github.com/ase-submission/rtanomaly
  • paper_authors: Wenwei Gu, Jinyang Liu, Zhuangbin Chen, Jianping Zhang, Yuxin Su, Jiazhen Gu, Cong Feng, Zengyin Yang, Michael Lyu
  • for: The study aims to improve the reliability and performance of large-scale cloud service systems by accurately identifying and localizing performance issues.
  • methods: The proposed multivariate anomaly detection model, RTAnomaly, combines relational and temporal patterns of monitoring metrics, using a graph attention layer to learn dependencies among metrics and pinpoint anomalous ones, and positive-unlabeled learning to handle unlabeled anomalies in the training data.
  • results: On a public dataset and two industrial datasets, RTAnomaly outperforms all baseline models, achieving an average F1 score of 0.929 and Hit@3 of 0.920.
    Abstract Performance issues permeate large-scale cloud service systems, which can lead to huge revenue losses. To ensure reliable performance, it's essential to accurately identify and localize these issues using service monitoring metrics. Given the complexity and scale of modern cloud systems, this task can be challenging and may require extensive expertise and resources beyond the capacity of individual humans. Some existing methods tackle this problem by analyzing each metric independently to detect anomalies. However, this could incur overwhelming alert storms that are difficult for engineers to diagnose manually. To pursue better performance, not only the temporal patterns of metrics but also the correlation between metrics (i.e., relational patterns) should be considered, which can be formulated as a multivariate metrics anomaly detection problem. However, most of the studies fall short of extracting these two types of features explicitly. Moreover, there exist some unlabeled anomalies mixed in the training data, which may hinder the detection performance. To address these limitations, we propose the Relational- Temporal Anomaly Detection Model (RTAnomaly) that combines the relational and temporal information of metrics. RTAnomaly employs a graph attention layer to learn the dependencies among metrics, which will further help pinpoint the anomalous metrics that may cause the anomaly effectively. In addition, we exploit the concept of positive unlabeled learning to address the issue of potential anomalies in the training data. To evaluate our method, we conduct experiments on a public dataset and two industrial datasets. RTAnomaly outperforms all the baseline models by achieving an average F1 score of 0.929 and Hit@3 of 0.920, demonstrating its superiority.

FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback

  • paper_url: http://arxiv.org/abs/2307.10867
  • repo_url: https://github.com/figcapshf/figcapshf
  • paper_authors: Ashish Singh, Prateek Agarwal, Zixuan Huang, Arpita Singh, Tong Yu, Sungchul Kim, Victor Bursztyn, Nikos Vlassis, Ryan A. Rossi
  • for: The paper addresses figure-to-caption generation for scientific documents, aiming to improve caption quality and alignment with reader preferences.
  • methods: The FigCaps-HF framework combines an automatic method for evaluating the quality of figure-caption pairs with a reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences.
  • results: The framework improves performance over standard fine-tuning across different model types; with BLIP as the base model, the RLHF framework achieves mean gains of 35.7%, 16.9%, and 9% in ROUGE, BLEU, and Meteor, respectively. A large-scale benchmark dataset with human feedback on figure-caption pairs is also released to enable further evaluation and development of RLHF techniques.
    Abstract Captions are crucial for understanding scientific visualizations and documents. Existing captioning methods for scientific figures rely on figure-caption pairs extracted from documents for training, many of which fall short with respect to metrics like helpfulness, explainability, and visual-descriptiveness [15] leading to generated captions being misaligned with reader preferences. To enable the generation of high-quality figure captions, we introduce FigCaps-HF a new framework for figure-caption generation that can incorporate domain expert feedback in generating captions optimized for reader preferences. Our framework comprises of 1) an automatic method for evaluating quality of figure-caption pairs, 2) a novel reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences. We demonstrate the effectiveness of our simple learning framework by improving performance over standard fine-tuning across different types of models. In particular, when using BLIP as the base model, our RLHF framework achieves a mean gain of 35.7%, 16.9%, and 9% in ROUGE, BLEU, and Meteor, respectively. Finally, we release a large-scale benchmark dataset with human feedback on figure-caption pairs to enable further evaluation and development of RLHF techniques for this problem.

Addressing caveats of neural persistence with deep graph persistence

  • paper_url: http://arxiv.org/abs/2307.10865
  • repo_url: https://github.com/ExplainableML/Deep-Graph-Persistence
  • paper_authors: Leander Girrbach, Anders Christensen, Ole Winther, Zeynep Akata, A. Sophia Koepke
  • for: The paper examines neural persistence, a measure from topological data analysis for quantifying neural network complexity, and proposes an improved complexity measure for deep networks.
  • methods: The analysis combines theory and experiments, and extends the filtration underlying neural persistence from single layers to the whole network, yielding the deep graph persistence measure.
  • results: The variance of network weights and the spatial concentration of large weights are found to be the main factors driving neural persistence, making it roughly equivalent to the variance of weights in later layers of deep networks; the whole-network extension implicitly incorporates persistent paths through the network and alleviates variance-related issues through standardisation.
    Abstract Neural Persistence is a prominent measure for quantifying neural network complexity, proposed in the emerging field of topological data analysis in deep learning. In this work, however, we find both theoretically and empirically that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence. Whilst this captures useful information for linear classifiers, we find that no relevant spatial structure is present in later layers of deep neural networks, making neural persistence roughly equivalent to the variance of weights. Additionally, the proposed averaging procedure across layers for deep neural networks does not consider interaction between layers. Based on our analysis, we propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers, which is equivalent to calculating neural persistence on one particular matrix. This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues through standardisation. Code is available at https://github.com/ExplainableML/Deep-Graph-Persistence .

Divide & Bind Your Attention for Improved Generative Semantic Nursing

  • paper_url: http://arxiv.org/abs/2307.10864
  • repo_url: None
  • paper_authors: Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva
  • for: The paper proposes a Generative Semantic Nursing (GSN) approach to handle complex text-to-image prompts and the problem of improper attribute binding across multiple entities.
  • methods: The method, Divide & Bind, introduces two novel loss objectives, an attendance loss and a binding loss, to improve GSN by optimizing cross-attention during inference.
  • results: The method faithfully synthesizes the desired objects with improved attribute alignment from complex prompts and shows superior performance across multiple evaluation benchmarks. More videos and updates can be found on the project page: https://sites.google.com/view/divide-and-bind
    Abstract Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited overwhelming results with high fidelity. Despite the magnificent progress, current state-of-the-art models still struggle to generate images fully adhering to the input prompt. Prior work, Attend & Excite, has introduced the concept of Generative Semantic Nursing (GSN), aiming to optimize cross-attention during inference time to better incorporate the semantics. It demonstrates promising results in generating simple prompts, e.g., ``a cat and a dog''. However, its efficacy declines when dealing with more complex prompts, and it does not explicitly address the problem of improper attribute binding. To address the challenges posed by complex prompts or scenarios involving multiple entities and to achieve improved attribute binding, we propose Divide & Bind. We introduce two novel loss objectives for GSN: a novel attendance loss and a binding loss. Our approach stands out in its ability to faithfully synthesize desired objects with improved attribute alignment from complex prompts and exhibits superior performance across multiple evaluation benchmarks. More videos and updates can be found on the project page \url{https://sites.google.com/view/divide-and-bind}.

Self-paced Weight Consolidation for Continual Learning

  • paper_url: http://arxiv.org/abs/2307.10845
  • repo_url: https://github.com/congwei45/spWC
  • paper_authors: Wei Cong, Yang Cong, Gan Sun, Yuyang Liu, Jiahua Dong
  • for: The study aims to improve continual learning in sequential task settings while avoiding catastrophic forgetting.
  • methods: The proposed self-paced Weight Consolidation (spWC) framework evaluates the discriminative contributions of previous tasks: a self-paced regularization reflects the priorities of past tasks by measuring their difficulty with a key performance indicator (e.g., accuracy), and the model parameters and priority weights are updated via an alternative convex search in a bi-convex formulation.
  • results: The framework learns sequential tasks with better performance and less computational cost; experiments on several public benchmark datasets show improvements over other popular continual learning algorithms, and the framework is plug-and-play with methods such as EWC, MAS, and RCIL.
    Abstract Continual learning algorithms which keep the parameters of new tasks close to that of previous tasks, are popular in preventing catastrophic forgetting in sequential task learning settings. However, 1) the performance for the new continual learner will be degraded without distinguishing the contributions of previously learned tasks; 2) the computational cost will be greatly increased with the number of tasks, since most existing algorithms need to regularize all previous tasks when learning new tasks. To address the above challenges, we propose a self-paced Weight Consolidation (spWC) framework to attain robust continual learning via evaluating the discriminative contributions of previous tasks. To be specific, we develop a self-paced regularization to reflect the priorities of past tasks via measuring difficulty based on key performance indicator (i.e., accuracy). When encountering a new task, all previous tasks are sorted from "difficult" to "easy" based on the priorities. Then the parameters of the new continual learner will be learned via selectively maintaining the knowledge amongst more difficult past tasks, which could well overcome catastrophic forgetting with less computational cost. We adopt an alternative convex search to iteratively update the model parameters and priority weights in the bi-convex formulation. The proposed spWC framework is plug-and-play, which is applicable to most continual learning algorithms (e.g., EWC, MAS and RCIL) in different directions (e.g., classification and segmentation). Experimental results on several public benchmark datasets demonstrate that our proposed framework can effectively improve performance when compared with other popular continual learning algorithms.
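A hedged sketch of the self-paced consolidation idea: past tasks are sorted from difficult to easy using a performance indicator, and the current parameters are regularised towards the parameter snapshots of only the more difficult tasks. The weighting and selection rule below are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def spwc_penalty(model, task_snapshots, task_difficulty, keep_ratio=0.5):
    """Sketch of a self-paced consolidation penalty: keep the more difficult
    previous tasks (difficulty, e.g., 1 - accuracy) and penalise deviation of
    the current parameters from each kept task's snapshot, weighted by its
    difficulty."""
    order = sorted(range(len(task_snapshots)), key=lambda t: -task_difficulty[t])
    kept = order[: max(1, int(keep_ratio * len(order)))]
    penalty = 0.0
    for t in kept:
        for p, p_old in zip(model.parameters(), task_snapshots[t]):
            penalty = penalty + task_difficulty[t] * ((p - p_old) ** 2).sum()
    return penalty

model = torch.nn.Linear(10, 2)
# Hypothetical parameter snapshots stored after each of 3 previous tasks
task_snapshots = [[p.detach().clone() for p in model.parameters()] for _ in range(3)]
task_difficulty = [0.4, 0.1, 0.3]   # e.g., 1 - accuracy on each past task

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss = loss + 1e-3 * spwc_penalty(model, task_snapshots, task_difficulty)
loss.backward()
```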
    摘要 CONTINUAL LEARNING算法,它们保持新任务参数与前一个任务相似,在Sequential task learning setting中很受欢迎。但是,1)新的 continual learner的性能将受到前一个任务的贡献的影响,而无法分别评估这些贡献;2)随着任务的增加,现有的算法的计算成本将增加很多,因为它们需要对所有任务进行Regularization。为解决这些挑战,我们提出了一个自适应Weight Consolidation(spWC)框架,以实现Robust continual learning。具体来说,我们开发了一种自适应Regularization,通过测量难度来评估过去任务的优先级。当遇到新任务时,我们将所有过去任务排序为“difficult”到“easy”的顺序,根据优先级。然后,我们将新的 continual learner的参数学习 via 选择保留过去任务中更难的知识。这可以很好地解决catastrophic forgetting问题,同时降低计算成本。我们采用了一种alternative convex search来逐步更新模型参数和优先级权重。我们的spWC框架是可插入的,可以应用于大多数 continual learning算法(例如EWC、MAS和RCIL)以及不同的方向(例如分类和分割)。实验结果表明,我们的提议可以在多个公共 benchmark dataset上提高性能,相比其他流行的 continual learning算法。
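
To make the "sort past tasks by difficulty, then regularize only toward the harder ones" idea concrete, here is a minimal sketch assuming EWC-style quadratic penalties and stored per-task snapshots; the self-paced priority weighting and the alternative convex search of the actual spWC framework are simplified away, and all names are illustrative.

```python
import torch

def spwc_penalty(model, task_snapshots, task_accuracies, top_k=2, lam=100.0):
    """Selective EWC-style penalty: regularize only toward the most
    'difficult' past tasks (lowest accuracy), a simplification of spWC.

    task_snapshots:  list of (params_dict, fisher_dict) saved after each past task.
    task_accuracies: list of accuracies used as an (inverse) difficulty proxy.
    """
    # Sort past tasks from difficult (low accuracy) to easy (high accuracy).
    order = sorted(range(len(task_accuracies)), key=lambda i: task_accuracies[i])
    penalty = torch.zeros((), device=next(model.parameters()).device)
    for i in order[:top_k]:                       # keep only the harder tasks
        params, fisher = task_snapshots[i]
        for name, p in model.named_parameters():
            penalty = penalty + (fisher[name] * (p - params[name]) ** 2).sum()
    return lam * penalty                          # added to the new task's loss
```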

Global Precipitation Nowcasting of Integrated Multi-satellitE Retrievals for GPM: A U-Net Convolutional LSTM Architecture

  • paper_url: http://arxiv.org/abs/2307.10843
  • repo_url: https://github.com/reyhaneh-92/genesis_nowcast
  • paper_authors: Reyhaneh Rahimi, Ardeshir Ebtehaj, Ali Behrangi, Jackson Tan
  • for: 这个研究旨在开发一种深度学习架构,用于全球范围内降水预测,每30分钟预测4小时前的降水情况。
  • methods: 这个架构结合了U-Net和卷积长Short-Term Memory(LSTM)神经网络,并使用IMERG和一些关键的降水驱动因素从全球预测系统(GFS)来训练。
  • results: 研究发现,使用不同的训练损失函数,包括平均方差(回归)和焦点损失(分类),对降水预测质量有着不同的影响。结果表明,回归网络在降水轻度(下1.6毫米/小时)方面表现良好,而分类网络在降水极端(大于8毫米/小时)方面可以超过回归网络,以 Critical Success Index(CSI)为标准。同时,包含物理变量可以提高降水预测,特别是在较长的预测时间内。
    Abstract This paper presents a deep learning architecture for nowcasting of precipitation almost globally every 30 min with a 4-hour lead time. The architecture fuses a U-Net and a convolutional long short-term memory (LSTM) neural network and is trained using data from the Integrated Multi-satellitE Retrievals for GPM (IMERG) and a few key precipitation drivers from the Global Forecast System (GFS). The impacts of different training loss functions, including the mean-squared error (regression) and the focal-loss (classification), on the quality of precipitation nowcasts are studied. The results indicate that the regression network performs well in capturing light precipitation (below 1.6 mm/hr), but the classification network can outperform the regression network for nowcasting of precipitation extremes (>8 mm/hr), in terms of the critical success index (CSI). Using the Wasserstein distance, it is shown that the predicted precipitation by the classification network has a closer class probability distribution to the IMERG than the regression network. It is uncovered that the inclusion of the physical variables can improve precipitation nowcasting, especially at longer lead times in both networks. Taking IMERG as a relative reference, a multi-scale analysis in terms of fractions skill score (FSS) shows that the nowcasting machine remains skillful (FSS > 0.5) at the resolution of 10 km compared to 50 km for GFS. For precipitation rates greater than 4 mm/hr, only the classification network remains FSS-skillful on scales greater than 50 km within a 2-hour lead time.
    摘要
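
Since the abstract evaluates nowcasts with the critical success index (CSI) at rain-rate thresholds, a small reference implementation of that score may help; the 8 mm/hr threshold matches the "extreme precipitation" cut-off mentioned above, and the array inputs are assumed to be gridded rain rates in mm/hr.

```python
import numpy as np

def critical_success_index(pred, obs, threshold=8.0):
    """CSI = hits / (hits + misses + false alarms), computed after
    thresholding predicted and observed rain rates (mm/hr)."""
    p = pred >= threshold
    o = obs >= threshold
    hits = np.logical_and(p, o).sum()
    misses = np.logical_and(~p, o).sum()
    false_alarms = np.logical_and(p, ~o).sum()
    return hits / max(hits + misses + false_alarms, 1)
```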

Label Calibration for Semantic Segmentation Under Domain Shift

  • paper_url: http://arxiv.org/abs/2307.10842
  • repo_url: None
  • paper_authors: Ondrej Bohdal, Da Li, Timothy Hospedales
  • for: 这篇论文是用于测试一个预训 semantic segmentation 模型在新Domain上的性能是否会严重下降。
  • methods: 这篇论文使用了一种基于域Shift的预训模型进行适应,通过计算软 labels prototype 并根据最相似的分类概率 vector 进行预测。
  • results: 论文显示了这种适应方法可以很快速、几乎不需要更多的计算资源,并且能够提高性能。它还证明了这种适应方法在实际上是非常有用的。
    Abstract Performance of a pre-trained semantic segmentation model is likely to substantially decrease on data from a new domain. We show a pre-trained model can be adapted to unlabelled target domain data by calculating soft-label prototypes under the domain shift and making predictions according to the prototype closest to the vector with predicted class probabilities. The proposed adaptation procedure is fast, comes almost for free in terms of computational resources and leads to considerable performance improvements. We demonstrate the benefits of such label calibration on the highly-practical synthetic-to-real semantic segmentation problem.
    摘要 一个预训练的语义分割模型在新领域数据上的性能可能会大幅下降。我们展示了可以通过在域偏移下计算软标签原型,并按照与预测类别概率向量最接近的原型进行预测,使预训练模型适应无标注的目标域数据。我们提出的适应过程快速、几乎不需要额外计算资源,并带来可观的性能提升。我们在非常实用的合成到真实(synthetic-to-real)语义分割问题上展示了这种标签校准的好处。
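
A minimal sketch of the adaptation step described above, assuming the only inputs are the pre-trained model's softmax outputs on unlabelled target data; the exact prototype construction in the paper may differ from the per-pseudo-class means used here.

```python
import numpy as np

def calibrate_and_predict(probs_target):
    """Label calibration via soft-label prototypes (sketch).

    probs_target: (N, C) softmax outputs of the pre-trained model on
    unlabelled target-domain data."""
    pseudo = probs_target.argmax(axis=1)
    C = probs_target.shape[1]
    prototypes = np.stack([
        probs_target[pseudo == c].mean(axis=0) if np.any(pseudo == c)
        else np.eye(C)[c]                      # fall back to one-hot if a class is unseen
        for c in range(C)
    ])
    # Predict the class whose prototype is closest to each probability vector.
    dists = ((probs_target[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)
```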

Adversarial Conversational Shaping for Intelligent Agents

  • paper_url: http://arxiv.org/abs/2307.11785
  • repo_url: None
  • paper_authors: Piotr Tarasiewicz, Sultan Kenjeyev, Ilana Sebag, Shehab Alshehabi
  • for: 提高对话代理人的智能会话系统稳定性和准确性
  • methods: 使用生成对抗网络(GANPG)和奖励每一个生成步骤(REGS)模型,并在seq2seq和 transformers 框架下进行强化学习
  • results: 通过不同的训练细节,模型可以提高对话代理人的性能和可靠性
    Abstract The recent emergence of deep learning methods has enabled the research community to achieve state-of-the-art results in several domains including natural language processing. However, the current robocall system remains unstable and inaccurate: text generators and chatbots can be tedious and misunderstand human-like dialogue. In this work, we study the performance of two models able to enhance an intelligent conversational agent through adversarial conversational shaping: a generative adversarial network with policy gradient (GANPG) and a generative adversarial network with reward for every generation step (REGS) based on the REGS model presented in Li et al. [18]. This model is able to assign rewards to both partially and fully generated text sequences. We discuss performance with different training details: seq2seq [36] and transformers [37] in a reinforcement learning framework.
    摘要

What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems

  • paper_url: http://arxiv.org/abs/2307.11784
  • repo_url: None
  • paper_authors: Saddek Bensalem, Chih-Hong Cheng, Wei Huang, Xiaowei Huang, Changshun Wu, Xingyu Zhao
  • for: 本文旨在提出一种可靠地在安全关键领域使用学习能力的方法,以确保系统的安全性。
  • methods: 本文提出了一种两步验证方法,以实现可证明的统计保证。
  • results: 本文认为,现有的方法无法实际实现可证明的保证,therefore promoting the two-step verification method for achieving provable statistical guarantees.
    Abstract Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among the challenges, it is known that a rigorous, yet practical, way of achieving safety guarantees is one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verification of such systems. Then, based on the observation that existing works cannot actually achieve provable guarantees, we promote a two-step verification method for the ultimate achievement of provable statistical guarantees.
    摘要 机器学习技术已经取得了很大的进步,但在安全关键领域使用学习能力的组件仍然存在挑战。其中一个最大的挑战是实现可靠的安全保证。在这篇论文中,我们首先讨论了设计和验证这些系统的工程和研究挑战。然后,根据现有的工作无法实现可证的保证,我们提出了两步验证方法以实现可证的统计保证。

On Combining Expert Demonstrations in Imitation Learning via Optimal Transport

  • paper_url: http://arxiv.org/abs/2307.10810
  • repo_url: https://github.com/ilanasebag/Sliced-MMOT-Imitation-Learning
  • paper_authors: Ilana Sebag, Samuel Cohen, Marc Peter Deisenroth
  • for: 教学Agent特定任务 через专家示范
  • methods: 使用优化运输方法测量Agent和专家轨迹之间的距离,并将多个专家示范合并在OT上
  • results: 在OpenAI Gym控制环境中,提出了一种使用多个专家示范的方法,并分析了其效率,发现标准方法不总是最优
    Abstract Imitation learning (IL) seeks to teach agents specific tasks through expert demonstrations. One of the key approaches to IL is to define a distance between agent and expert and to find an agent policy that minimizes that distance. Optimal transport methods have been widely used in imitation learning as they provide ways to measure meaningful distances between agent and expert trajectories. However, the problem of how to optimally combine multiple expert demonstrations has not been widely studied. The standard method is to simply concatenate state (-action) trajectories, which is problematic when trajectories are multi-modal. We propose an alternative method that uses a multi-marginal optimal transport distance and enables the combination of multiple and diverse state-trajectories in the OT sense, providing a more sensible geometric average of the demonstrations. Our approach enables an agent to learn from several experts, and its efficiency is analyzed on OpenAI Gym control environments and demonstrates that the standard method is not always optimal.
    摘要 模仿学习(IL)的目的是通过专家示范教会智能体完成特定任务。IL的一种关键方法是定义智能体与专家之间的距离,并寻找使该距离最小化的智能体策略。最优传输(optimal transport)方法在模仿学习中得到了广泛应用,因为它们提供了度量智能体与专家轨迹之间有意义距离的方式。然而,如何最优地组合多个专家示范尚未得到广泛研究。标准做法是简单地拼接状态(-动作)轨迹,但当轨迹呈多模态时这种做法存在问题。我们提出了一种替代方法,使用多边际最优传输(multi-marginal OT)距离,使多条不同的状态轨迹能够在OT意义上合理组合,给出示范在几何平均意义上更合理的汇总。我们的方法允许智能体向多位专家学习;我们在 OpenAI Gym 控制环境中分析了其效率,结果显示标准方法并不总是最优的。
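
For readers unfamiliar with OT-based imitation, the sketch below shows the standard two-marginal OT distance between an agent trajectory and a single expert trajectory using the POT library; the paper's contribution, a multi-marginal formulation over several experts, is not reproduced here.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def trajectory_ot_cost(agent_states, expert_states):
    """OT distance between agent and expert state trajectories, with uniform
    weights over time steps. Inputs are (T, d) arrays of states."""
    a = np.full(len(agent_states), 1.0 / len(agent_states))
    b = np.full(len(expert_states), 1.0 / len(expert_states))
    M = ot.dist(agent_states, expert_states)   # pairwise squared-Euclidean costs
    return ot.emd2(a, b, M)                    # exact OT cost
```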

Communication-Efficient Split Learning via Adaptive Feature-Wise Compression

  • paper_url: http://arxiv.org/abs/2307.10805
  • repo_url: None
  • paper_authors: Yongjeong Oh, Jaeho Lee, Christopher G. Brinton, Yo-Seb Jeon
  • for: 提高分布式学习中通信开销的减少
  • methods: 利用矩阵列中具有不同分散度的特征进行压缩,并采用适应式dropout和适应式量化策略
  • results: 与现有分布式学习框架相比,提供5.6%以上的分类精度提升,同时减少了320倍的通信开销
    Abstract This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. Simulation results on the MNIST, CIFAR-10, and CelebA datasets demonstrate that SplitFC provides more than a 5.6% increase in classification accuracy compared to state-of-the-art SL frameworks, while they require 320 times less communication overhead compared to the vanilla SL framework without compression.
    摘要
    这篇论文提出了一种新的通信高效的分割学习(SL)框架,名为SplitFC,它降低了在SL训练过程中传输中间特征和梯度向量所需的通信开销。SplitFC的关键思想是利用矩阵各列表现出的不同离散程度。SplitFC包含两种压缩策略:(一)自适应特征级dropout:根据中间特征向量的标准差确定自适应dropout概率,将部分特征向量丢弃;再由链式法则,与被丢弃特征向量相关联的中间梯度向量也一并丢弃。(二)自适应特征级量化:对未被丢弃的中间特征和梯度向量,根据其取值范围确定自适应量化级别进行量化;为最小化量化误差,该策略的最优量化级别以闭式表达式给出。在MNIST、CIFAR-10和CelebA数据集上的仿真结果表明,SplitFC相比最先进的SL框架可带来5.6%以上的分类精度提升,同时与不压缩的原始SL框架相比,通信开销减少320倍。
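
The two compression strategies lend themselves to a short sketch; the mapping from per-feature standard deviation to a keep probability and the uniform quantizer below are simplified stand-ins for the paper's adaptive schemes, with all constants chosen for illustration.

```python
import torch

def adaptive_featurewise_dropout(features, avg_keep=0.5):
    """SplitFC-style adaptive feature-wise dropout (sketch): columns with
    larger standard deviation are kept with higher probability."""
    std = features.std(dim=0)                                        # (D,)
    keep_prob = (std / std.sum().clamp_min(1e-8)) * avg_keep * features.shape[1]
    keep_prob = keep_prob.clamp(0.0, 1.0)
    mask = torch.bernoulli(keep_prob)                                # (D,)
    # Gradients of the dropped columns vanish too, via the chain rule.
    return features * mask, mask

def uniform_quantize(x, bits=4):
    """Uniform quantization with levels set by the tensor's range, a
    simplified stand-in for the paper's adaptive quantization."""
    lo, hi = x.min(), x.max()
    levels = 2 ** bits - 1
    q = torch.round((x - lo) / (hi - lo + 1e-8) * levels)
    return q / levels * (hi - lo) + lo
```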

Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities

  • paper_url: http://arxiv.org/abs/2307.10803
  • repo_url: None
  • paper_authors: Hanchen Yang, Wengen Li, Shuyu Wang, Hui Li, Jihong Guan, Shuigeng Zhou, Jiannong Cao
  • for: This paper provides a comprehensive survey of existing spatial-temporal data mining (STDM) studies for ocean science, including a review of widely-used ST ocean datasets and their unique characteristics, as well as techniques for data quality enhancement and various STDM tasks.
  • methods: The paper reviews and discusses various techniques for STDM in ocean science, including data preprocessing, feature extraction, and machine learning algorithms for tasks such as prediction, event detection, pattern mining, and anomaly detection.
  • results: The paper highlights the unique challenges and opportunities of STDM in ocean science, and discusses promising research opportunities in this field, including the application of advanced STDM techniques to climate forecasting and disaster warning.
    Abstract With the rapid amassing of spatial-temporal (ST) ocean data, many spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, including climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated but with unique characteristics, e.g., diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models on ST ocean data. To the best of our knowledge, a comprehensive survey of existing studies remains missing in the literature, which hinders not only computer scientists from identifying the research issues in ocean data mining but also ocean scientists to apply advanced STDM techniques. In this paper, we provide a comprehensive survey of existing STDM studies for ocean science. Concretely, we first review the widely-used ST ocean datasets and highlight their unique characteristics. Then, typical ST ocean data quality enhancement techniques are explored. Next, we classify existing STDM studies in ocean science into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate on the techniques for these tasks. Finally, promising research opportunities are discussed. This survey can help scientists from both computer science and ocean science better understand the fundamental concepts, key techniques, and open challenges of STDM for ocean science.
    摘要 随着空间时间(ST)海洋数据的快速汇集,许多空间时间数据挖掘(STDM)研究已经进行以解决海洋问题,如气候预测和灾害警告。相比一般ST数据(例如交通数据),ST海洋数据更加复杂,但具有独特特征,如多样性和高稀畴性。这些特征使得设计和训练STDM模型对ST海洋数据变得更加困难。据我们所知,现有的相关研究检索在文献中缺失,这不仅阻碍了计算机科学家从海洋数据挖掘中了解研究问题,也阻碍了海洋科学家应用先进的STDM技术。在这篇论文中,我们提供了海洋科学领域的全面的STDM研究检索。具体来说,我们首先评论了广泛使用的ST海洋数据集和其独特特征。然后,我们探讨了一般ST海洋数据质量提升技术。接着,我们分类了现有的STDM研究,并详细介绍了这些任务的技术。最后,我们讨论了有前途的研究机遇。这种检索可以帮助计算机科学家和海洋科学家更好地理解STDM的基本概念、关键技术和开放的挑战,以及在海洋科学领域应用STDM技术的可能性。

Meta-Transformer: A Unified Framework for Multimodal Learning

  • paper_url: http://arxiv.org/abs/2307.10802
  • repo_url: https://github.com/invictus717/MetaTransformer
  • paper_authors: Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue
  • for: 这个研究旨在建立一个能够处理多种模式的模型,并将其与不同的模式进行关联。
  • methods: 这个方法使用一个冻结的encoder来进行多模式认识,并将不同的输入数据maps到共同的token空间,以EXTRACT高级 semantic feature。
  • results: 这个方法可以在12种模式上进行通用的学习,包括基本的认识(文本、图像、点 cloud、音频、影片)、实际应用(X-ray、infrared、颜色、IMU)和数据采矿(图形、表格、时间序列)。
    Abstract Multimodal learning aims to build models that can process and relate information from multiple modalities. Despite years of development in this field, it still remains challenging to design a unified network for processing various modalities ($\textit{e.g.}$ natural language, 2D images, 3D point clouds, audio, video, time series, tabular data) due to the inherent gaps among them. In this work, we propose a framework, named Meta-Transformer, that leverages a $\textbf{frozen}$ encoder to perform multimodal perception without any paired multimodal training data. In Meta-Transformer, the raw input data from various modalities are mapped into a shared token space, allowing a subsequent encoder with frozen parameters to extract high-level semantic features of the input data. Composed of three main components: a unified data tokenizer, a modality-shared encoder, and task-specific heads for downstream tasks, Meta-Transformer is the first framework to perform unified learning across 12 modalities with unpaired data. Experiments on different benchmarks reveal that Meta-Transformer can handle a wide range of tasks including fundamental perception (text, image, point cloud, audio, video), practical application (X-Ray, infrared, hyperspectral, and IMU), and data mining (graph, tabular, and time-series). Meta-Transformer indicates a promising future for developing unified multimodal intelligence with transformers. Code will be available at https://github.com/invictus717/MetaTransformer
    摘要 多Modal学习旨在建立处理多种模式的模型。尽管多年的发展,仍然困难设计处理多种模式的统一网络(例如自然语言、2D图像、3D点云、音频、视频、时间序列、表格数据)的模型,因为这些模式之间存在隐藏的差异。在这项工作中,我们提出了一个框架,名为Meta-Transformer,它利用一个冻结的Encoder来实现多Modal感知,无需任何对准的多Modal训练数据。Meta-Transformer框架由三个主要组件组成:一个统一的数据Tokenizer、一个共享Encoder和下游任务的任务特定头。Meta-Transformer是首个在12种模式上进行统一学习的框架,无需对数据进行匹配。在不同的Benchmark上进行的实验表明,Meta-Transformer可以处理各种任务,包括基本的感知(文本、图像、点云、音频、视频)、实用应用(X射线、红外、偏振、IMU)和数据挖掘(图形、表格、时间序列)。Meta-Transformer表明了未来在使用Transformer进行多Modal智能的发展具有扎实的前景。代码将在https://github.com/invictus717/MetaTransformer上提供。
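
The recipe of "modality tokenizer, frozen shared encoder, task-specific head" can be summarized structurally as below; the tokenizer and encoder are generic PyTorch stand-ins, not the pretrained components released with Meta-Transformer, and the sizes are illustrative.

```python
import torch
import torch.nn as nn

class MetaTransformerStyle(nn.Module):
    """Structural sketch: a per-modality tokenizer maps raw input to a shared
    token space, a frozen shared encoder extracts features, and only a small
    task head is trained on top."""
    def __init__(self, token_dim=512, num_classes=10):
        super().__init__()
        self.tokenizer = nn.Linear(128, token_dim)          # stand-in unified tokenizer
        layer = nn.TransformerEncoderLayer(d_model=token_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        for p in self.encoder.parameters():                  # encoder stays frozen
            p.requires_grad = False
        self.head = nn.Linear(token_dim, num_classes)        # task-specific head

    def forward(self, x):                                    # x: (B, T, 128) tokens of any modality
        tokens = self.tokenizer(x)
        feats = self.encoder(tokens).mean(dim=1)             # pooled high-level features
        return self.head(feats)
```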

Convergence of Adam for Non-convex Objectives: Relaxed Hyperparameters and Non-ergodic Case

  • paper_url: http://arxiv.org/abs/2307.11782
  • repo_url: None
  • paper_authors: Meixuan He, Yuqing Liang, Jinlan Liu, Dongpo Xu
  • for: investigate the convergence properties of Adam algorithm in non-convex settings and develop a better understanding of its performance.
  • methods: introduce precise definitions of ergodic and non-ergodic convergence, and establish a weaker sufficient condition for the ergodic convergence guarantee of Adam.
  • results: prove that the last iterate of Adam converges to a stationary point for non-convex objectives, and obtain the non-ergodic convergence rate of $O(1/K)$ for function values under the Polyak-Lojasiewicz (PL) condition.
    Abstract Adam is a commonly used stochastic optimization algorithm in machine learning. However, its convergence is still not fully understood, especially in the non-convex setting. This paper focuses on exploring hyperparameter settings for the convergence of vanilla Adam and tackling the challenges of non-ergodic convergence related to practical application. The primary contributions are summarized as follows: firstly, we introduce precise definitions of ergodic and non-ergodic convergence, which cover nearly all forms of convergence for stochastic optimization algorithms. Meanwhile, we emphasize the superiority of non-ergodic convergence over ergodic convergence. Secondly, we establish a weaker sufficient condition for the ergodic convergence guarantee of Adam, allowing a more relaxed choice of hyperparameters. On this basis, we achieve the almost sure ergodic convergence rate of Adam, which is arbitrarily close to $o(1/\sqrt{K})$. More importantly, we prove, for the first time, that the last iterate of Adam converges to a stationary point for non-convex objectives. Finally, we obtain the non-ergodic convergence rate of $O(1/K)$ for function values under the Polyak-Lojasiewicz (PL) condition. These findings build a solid theoretical foundation for Adam to solve non-convex stochastic optimization problems.
    摘要 亚当是一种常用的机会估计算法在机器学习中。然而,它的整合仍然未全面理解,特别是在非对称设定下。本文专注于探索亚当的参数设定,以提高其在实际应用中的性能。主要贡献如下:首先,我们引入了精确的ergodic和non-ergodic整合定义,这些定义包括大多数 Stochastic optimization算法的整合形式。同时,我们强调non-ergodic整合的superiority。第二,我们提出了一个较弱的充分必要条件,以确保亚当的ergodic整合保证。这允许更加relaxed的参数选择。基于这个基础,我们获得了几乎确定的almost sure ergodic整合速率,它是$o(1/\sqrt{K})$。更重要的是,我们证明了亚当的最后迭代向非对称目标函数 converge。最后,我们取得了非ergodic整合速率为$O(1/K)$,它是PL conditon下的函数值。这些发现建立了亚当在非对称随机估计问题上的坚固理论基础。

Optimizing PatchCore for Few/many-shot Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.10792
  • repo_url: https://github.com/scortexio/patchcore-few-shot
  • paper_authors: João Santos, Triet Tran, Oliver Rippel
  • for: 这个论文主要关注于few-shot anomaly detection(AD)领域的研究,特别是使用只有几个选择的样本来分辨正常和异常数据。
  • methods: 这篇论文使用了PatchCore,当前的全shot AD/AS算法,进行研究,包括优化其多种超参数和将supervised learning中知悉的技术转移到AD领域。
  • results: 实验表明,可以通过优化超参数和使用图像水平的扩展来实现显著性能提升,并在VisA dataset上实现了新的state of the art。此外,该论文还提出了未来研究的可能性,即研究具有强 inductive bias的特征提取器。
    Abstract Few-shot anomaly detection (AD) is an emerging sub-field of general AD, and tries to distinguish between normal and anomalous data using only few selected samples. While newly proposed few-shot AD methods do compare against pre-existing algorithms developed for the full-shot domain as baselines, they do not dedicatedly optimize them for the few-shot setting. It thus remains unclear if the performance of such pre-existing algorithms can be further improved. We address said question in this work. Specifically, we present a study on the AD/anomaly segmentation (AS) performance of PatchCore, the current state-of-the-art full-shot AD/AS algorithm, in both the few-shot and the many-shot settings. We hypothesize that further performance improvements can be realized by (I) optimizing its various hyperparameters, and by (II) transferring techniques known to improve few-shot supervised learning to the AD domain. Exhaustive experiments on the public VisA and MVTec AD datasets reveal that (I) significant performance improvements can be realized by optimizing hyperparameters such as the underlying feature extractor, and that (II) image-level augmentations can, but are not guaranteed, to improve performance. Based on these findings, we achieve a new state of the art in few-shot AD on VisA, further demonstrating the merit of adapting pre-existing AD/AS methods to the few-shot setting. Last, we identify the investigation of feature extractors with a strong inductive bias as a potential future research direction for (few-shot) AD/AS.
    摘要 新型异常检测(AD)是一个新趋势的分支,它目标是使用只有几个选择的样本分类正常和异常数据。虽然新提出的几个AD方法会比较旧的全shot预测器,但它们没有专门优化它们为几个shot设定。因此,其性能仍然存在uncertainty。我们在这里解决这个问题。我们对PatchCore,当前的全shotAD/AS算法,在几个shot和多个shot设定下进行AD/AS性能的研究。我们假设可以通过(I)优化其各种超参数,以及(II)将几个shot学习中的技术转移到AD领域来实现性能提高。我们在公共的VisA和MVTec AD datasets上进行了广泛的实验,发现(I)可以通过优化特征提取器来实现显著性能提高,并且(II)图像水平的扩展可以,但并不一定,提高性能。基于这些发现,我们在VisA上实现了新的状态态的AD,进一步证明了适应前 exist AD/AS方法到几个shot设定的价值。最后,我们认为在(几个shot)AD/AS领域 investigating feature extractors with strong inductive bias 是一个可能的未来研究方向。

Adversarial attacks for mixtures of classifiers

  • paper_url: http://arxiv.org/abs/2307.10788
  • repo_url: None
  • paper_authors: Lucas Gnecco Heredia, Benjamin Negrevergne, Yann Chevaleyre
  • for: 提高鲁棒性 against adversarial attacks
  • methods: 使用mixtures of classifiers (a.k.a. randomized ensembles)
  • results: 引入两种攻击性质(有效性和最大化),并证明现有攻击不符合这两种性质。还提出了一种新的攻击方法called lattice climber attack,并在binary linear setting下提供了理论保证,并在 synthetic和实际数据上进行了实验验证。
    Abstract Mixtures of classifiers (a.k.a. randomized ensembles) have been proposed as a way to improve robustness against adversarial attacks. However, it has been shown that existing attacks are not well suited for this kind of classifiers. In this paper, we discuss the problem of attacking a mixture in a principled way and introduce two desirable properties of attacks based on a geometrical analysis of the problem (effectiveness and maximality). We then show that existing attacks do not meet both of these properties. Finally, we introduce a new attack called lattice climber attack with theoretical guarantees on the binary linear setting, and we demonstrate its performance by conducting experiments on synthetic and real datasets.
    摘要 合并分类器(即随机 ensemble)已经被提议用于提高对抗骚扰攻击的Robustness。然而,已经证明现有的攻击方法并不适用于这种类型的分类器。在这篇论文中,我们讨论了攻击混合的问题,并提出了两个愿望的攻击特性(有效性和最大化)。我们then表明现有的攻击方法并不满足这两个特性。最后,我们介绍了一种新的攻击方法called lattice climber attack,并提供了对二分线性设定下的理论保证。我们通过对 sintetic和实际数据进行实验,证明了这种攻击方法的性能。

Feed-Forward Source-Free Domain Adaptation via Class Prototypes

  • paper_url: http://arxiv.org/abs/2307.10787
  • repo_url: None
  • paper_authors: Ondrej Bohdal, Da Li, Timothy Hospedales
  • for: 本研究旨在探讨源自由领域适应的快速化方法,以替代基于反射传播的适应方法。
  • methods: 本方法基于预训练模型计算类 prototype,实现了快速化适应并且只需要小量时间。
  • results: 本研究实现了准确率的显著提升,并且比普通适应方法快速得多。
    Abstract Source-free domain adaptation has become popular because of its practical usefulness and no need to access source data. However, the adaptation process still takes a considerable amount of time and is predominantly based on optimization that relies on back-propagation. In this work we present a simple feed-forward approach that challenges the need for back-propagation based adaptation. Our approach is based on computing prototypes of classes under the domain shift using a pre-trained model. It achieves strong improvements in accuracy compared to the pre-trained model and requires only a small fraction of time of existing domain adaptation methods.
    摘要 源自由领域适应已经成为很受欢迎的方法,因为它的实用性和不需要访问源数据。然而,适应过程仍然需要一定的时间,并且主要基于优化,使用反向传播。在这种工作中,我们提出了一种简单的前向方法,挑战需要反向传播基于适应。我们的方法基于使用预训练模型计算类下的prototype,实现了与预训练模型的准确率强劲提高,并且只需一小部分时间。

Efficient Beam Tree Recursion

  • paper_url: http://arxiv.org/abs/2307.10779
  • repo_url: None
  • paper_authors: Jishnu Ray Chowdhury, Cornelia Caragea
  • for: 这个论文的目的是提出一种简单的扩展,以提高 Gumbel Tree RvNN 的长度整合性表现,并维持与其他任务的相似性。
  • methods: 这个论文使用的方法是识别 BT-RvNN 的主要瓶颈,并提出一些简化其内存使用的策略。
  • results: 这个论文的结果显示,使用这些策略可以将 BT-RvNN 的内存使用量降低 $10$-$16$ 倍,并创造一个新的州分-of-the-art 在 ListOps 中,同时保持与其他任务的相似性。
    Abstract Beam Tree Recursive Neural Network (BT-RvNN) was recently proposed as a simple extension of Gumbel Tree RvNN and it was shown to achieve state-of-the-art length generalization performance in ListOps while maintaining comparable performance on other tasks. However, although not the worst in its kind, BT-RvNN can be still exorbitantly expensive in memory usage. In this paper, we identify the main bottleneck in BT-RvNN's memory usage to be the entanglement of the scorer function and the recursive cell function. We propose strategies to remove this bottleneck and further simplify its memory usage. Overall, our strategies not only reduce the memory usage of BT-RvNN by $10$-$16$ times but also create a new state-of-the-art in ListOps while maintaining similar performance in other tasks. In addition, we also propose a strategy to utilize the induced latent-tree node representations produced by BT-RvNN to turn BT-RvNN from a sentence encoder of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{d}$ into a sequence contextualizer of the form $f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$. Thus, our proposals not only open up a path for further scalability of RvNNs but also standardize a way to use BT-RvNNs as another building block in the deep learning toolkit that can be easily stacked or interfaced with other popular models such as Transformers and Structured State Space models.
    摘要 “Beam Tree Recursive Neural Network(BT-RvNN)最近被提出,它是Gumbel Tree RvNN的简单扩展,并在ListOps中实现了状态前瞻性的长度泛化性,同时保持与其他任务的相似性。然而,虽然不是最差的一种,BT-RvNN仍然具有昂贵的内存使用。在这篇论文中,我们确定了BT-RvNN的主要瓶颈是排序函数和循环细胞函数的杂谱。我们提出了一些缓解瓶颈的策略,以降低BT-RvNN的内存使用。总的来说,我们的策略不仅降低了BT-RvNN的内存使用量$10$-$16$倍,还创造了一个新的状态前瞻性在ListOps中,同时保持与其他任务的相似性。此外,我们还提出了利用BT-RvNN生成的潜在树节点表示来将BT-RvNN转换为一个序列Contextualizer的形式$f:\mathbb{R}^{n \times d} \rightarrow \mathbb{R}^{n \times d}$。因此,我们的提议不仅开启了RvNN的扩展可能性,还标准化了使用BT-RvNN作为深度学习工具箱中的另一个构建件,可以轻松堆叠或者与其他流行的模型相互作用,如Transformers和结构化状态空间模型。”

Assessing the Use of AutoML for Data-Driven Software Engineering

  • paper_url: http://arxiv.org/abs/2307.10774
  • repo_url: None
  • paper_authors: Fabio Calefato, Luigi Quaranta, Filippo Lanubile, Marcos Kalinowski
  • for: 填补AI/ML技术专业人员短缺的问题,促进自动机器学习(AutoML)的应用。
  • methods: 使用混合方法研究,包括12种终端AutoML工具在两个SE数据集上的比较,以及对实践者和研究者的调查和访谈。
  • results: 发现AutoML解决方案可以在SE领域中的分类任务中表现更好 than manually trained and optimized models,但目前可用的AutoML解决方案仍未能完全支持所有队员和开发工作流程的自动化。
    Abstract Background. Due to the widespread adoption of Artificial Intelligence (AI) and Machine Learning (ML) for building software applications, companies are struggling to recruit employees with a deep understanding of such technologies. In this scenario, AutoML is soaring as a promising solution to fill the AI/ML skills gap since it promises to automate the building of end-to-end AI/ML pipelines that would normally be engineered by specialized team members. Aims. Despite the growing interest and high expectations, there is a dearth of information about the extent to which AutoML is currently adopted by teams developing AI/ML-enabled systems and how it is perceived by practitioners and researchers. Method. To fill these gaps, in this paper, we present a mixed-method study comprising a benchmark of 12 end-to-end AutoML tools on two SE datasets and a user survey with follow-up interviews to further our understanding of AutoML adoption and perception. Results. We found that AutoML solutions can generate models that outperform those trained and optimized by researchers to perform classification tasks in the SE domain. Also, our findings show that the currently available AutoML solutions do not live up to their names as they do not equally support automation across the stages of the ML development workflow and for all the team members. Conclusions. We derive insights to inform the SE research community on how AutoML can facilitate their activities and tool builders on how to design the next generation of AutoML technologies.
    摘要 背景:由于人工智能(AI)和机器学习(ML)在软件开发中的普及,公司困难找到具备深层理解AI/ML技术的员工。在这种情况下,AutoML在解决AI/ML技能差距方面表现出了扎根的优势,因为它承诺自动化建立AI/ML管道,通常需要专业的团队成员进行工程。目标:尽管有增加的兴趣和高期望,但是有关AutoML在开发AI/ML相关系统的团队中的采用和专家和研究人员对其看法的信息不够。方法:为了填补这些空白,本文提出了一项混合方法研究,包括12个终端AutoML工具在两个SE数据集上的benchmark,以及与相关专家和研究人员进行详细交流的用户调查。结果:我们发现AutoML解决方案可以在SE领域中对分类任务进行更好的模型生成,而且我们的发现还表明现有的AutoML解决方案并不能够完全支持自动化ML开发工作流程中的所有阶段和所有团队成员。结论:我们从研究中得到了关于如何使用AutoML促进SE研究人员的活动,以及如何设计下一代AutoML技术的技术建议。

Music Genre Classification with ResNet and Bi-GRU Using Visual Spectrograms

  • paper_url: http://arxiv.org/abs/2307.10773
  • repo_url: None
  • paper_authors: Junfei Zhang
  • for: 提高音乐播放服务的用户体验和满意度,即Music Recommendation Systems。
  • methods: 使用视觉spectrogram作为输入,并提出了一种 hybrid 模型,结合 Residual neural Network (ResNet) 和 Gated Recurrent Unit (GRU),以更好地捕捉音乐数据的复杂性。
  • results: 提出了一种新的 Automatic Music Genre Classification (AMGC) 系统,可以更好地捕捉音乐数据的复杂性,并且可能提高音乐推荐系统的准确率。
    Abstract Music recommendation systems have emerged as a vital component to enhance user experience and satisfaction for the music streaming services, which dominates music consumption. The key challenge in improving these recommender systems lies in comprehending the complexity of music data, specifically for the underpinning music genre classification. The limitations of manual genre classification have highlighted the need for a more advanced system, namely the Automatic Music Genre Classification (AMGC) system. While traditional machine learning techniques have shown potential in genre classification, they heavily rely on manually engineered features and feature selection, failing to capture the full complexity of music data. On the other hand, deep learning classification architectures like the traditional Convolutional Neural Networks (CNN) are effective in capturing the spatial hierarchies but struggle to capture the temporal dynamics inherent in music data. To address these challenges, this study proposes a novel approach using visual spectrograms as input, and propose a hybrid model that combines the strength of the Residual neural Network (ResNet) and the Gated Recurrent Unit (GRU). This model is designed to provide a more comprehensive analysis of music data, offering the potential to improve the music recommender systems through achieving a more comprehensive analysis of music data and hence potentially more accurate genre classification.
    摘要 音乐推荐系统已成为现代音乐流媒体服务的重要组成部分,以提高用户体验和满意度。然而,现有的音乐分类方法受到了复杂的音乐数据的限制,尤其是在音乐种类归类方面。传统的机器学习技术可能有潜力,但是它们依赖于人工设计的特征和特征选择,无法捕捉音乐数据的全面复杂性。相反,深度学习分类架构如传统的卷积神经网络(CNN)可以捕捉音乐数据的空间层次结构,但是它们很难捕捉音乐数据中的时间动态特征。为解决这些挑战,本研究提出了一种新的方法,使用视觉频谱图(spectrogram)作为输入,并提出了一种混合模型,结合了Residual神经网络(ResNet)和Gated Recurrent Unit(GRU)。这种模型设计用于为音乐数据提供更全面的分析,并且可能提高音乐推荐系统的准确率。
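
A minimal PyTorch sketch of the hybrid idea, pairing a ResNet feature extractor with a bidirectional GRU over the time axis of a visual spectrogram; the ResNet-18 backbone, layer sizes, and pooling choices are assumptions, not the paper's configuration (torchvision >= 0.13 is assumed for the `weights` argument).

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ResNetBiGRU(nn.Module):
    """ResNet backbone extracts spatial features from a spectrogram image,
    a bidirectional GRU models their order along the time axis."""
    def __init__(self, num_genres=10, hidden=256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])     # keep conv feature maps
        self.gru = nn.GRU(512, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_genres)

    def forward(self, spec):                       # spec: (B, 3, H, W) visual spectrogram
        fmap = self.cnn(spec)                      # (B, 512, h, w)
        seq = fmap.mean(dim=2).permute(0, 2, 1)    # collapse frequency, keep time: (B, w, 512)
        out, _ = self.gru(seq)
        return self.fc(out.mean(dim=1))
```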

Unveiling Emotions from EEG: A GRU-Based Approach

  • paper_url: http://arxiv.org/abs/2308.02778
  • repo_url: None
  • paper_authors: Sarthak Johari, Gowri Namratha Meedinti, Radhakrishnan Delhibabu, Deepak Joshi
  • for: 这项研究旨在使用EEG数据进行情感识别,以推动人机交互和情感计算领域的发展。
  • methods: 这项研究使用了Gated Recurrent Unit(GRU)算法(Recurrent Neural Networks(RNNs)的一种变体),利用EEG信号预测情感状态。研究者使用了公开可获取的数据集,包括静息状态下的中性数据,以及暴露于诱发喜悦、中性和消极情绪刺激下的人类EEG记录。为了获得最佳的特征提取,研究者对EEG数据进行了伪迹去除、带通滤波和归一化处理。
  • results: GRU模型在验证集上达到了100%的准确率。与其他机器学习方法相比,GRU模型的Extreme Gradient Boosting Classifier具有最高的准确率。研究者还对混淆矩阵进行了分析,获得了关于模型性能的有用信息,从而实现精准的情感分类。这项研究表明了GRU等深度学习模型在情感识别方面的潜力,并为情感计算开辟了新的可能性。
    Abstract One of the most important study areas in affective computing is emotion identification using EEG data. In this study, the Gated Recurrent Unit (GRU) algorithm, which is a type of Recurrent Neural Networks (RNNs), is tested to see if it can use EEG signals to predict emotional states. Our publicly accessible dataset consists of resting neutral data as well as EEG recordings from people who were exposed to stimuli evoking happy, neutral, and negative emotions. For the best feature extraction, we pre-process the EEG data using artifact removal, bandpass filters, and normalization methods. With 100% accuracy on the validation set, our model produced outstanding results by utilizing the GRU's capacity to capture temporal dependencies. When compared to other machine learning techniques, our GRU model's Extreme Gradient Boosting Classifier had the highest accuracy. Our investigation of the confusion matrix revealed insightful information about the performance of the model, enabling precise emotion classification. This study emphasizes the potential of deep learning models like GRUs for emotion recognition and advances in affective computing. Our findings open up new possibilities for interacting with computers and comprehending how emotions are expressed through brainwave activity.
    摘要 情感计算中最重要的研究领域之一是利用EEG数据进行情感识别。在这项研究中,我们测试了Gated Recurrent Unit(GRU)算法(Recurrent Neural Networks(RNNs)的一种变体)能否利用EEG信号预测情感状态。我们的公开可访问数据集包括静息状态下的中性数据,以及暴露于诱发喜悦、中性和消极情绪刺激下的人群的EEG记录。为了获得最佳的特征提取,我们对EEG数据进行了伪迹去除、带通滤波和归一化处理。我们的模型利用GRU捕捉时间依赖关系的能力,在验证集上达到了100%的准确率,表现出色。与其他机器学习技术相比,我们GRU模型的Extreme Gradient Boosting Classifier准确率最高。我们对混淆矩阵的分析提供了关于模型性能的有用信息,支持精准的情感分类。这项研究强调了GRU等深度学习模型在情感识别方面的潜力,推动了情感计算的发展。我们的发现为与计算机交互以及理解脑电活动中的情感表达开辟了新的可能性。
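
The preprocessing pipeline described above (band-pass filtering and per-channel normalization; artifact removal is omitted) can be sketched with SciPy as follows; the sampling rate and cut-off frequencies are assumed values.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_eeg(eeg, fs=250.0, low=0.5, high=45.0):
    """Band-pass filter then z-score each channel of a raw EEG recording.

    eeg: (channels, samples) array; fs, low, high are assumed values."""
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, eeg, axis=1)
    mean = filtered.mean(axis=1, keepdims=True)
    std = filtered.std(axis=1, keepdims=True) + 1e-8
    return (filtered - mean) / std
```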

Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory

  • paper_url: http://arxiv.org/abs/2307.10768
  • repo_url: https://github.com/zhanglab-deepneurocoglab/worm
  • paper_authors: Ankur Sikarwar, Mengmi Zhang
  • for: 本研究的目的是开发一个完整的工作记忆(WM)benchmark数据集,以便用于AI WM模型的开发和评估。
  • methods: 本研究使用了10个任务和100万个实验,评估了4种功能、3种领域和11种行为和神经特征。同时,还包括了人类行为的参照值作为比较标准。
  • results: 研究发现,AI模型在一些情况下能够模拟大脑中的工作记忆特征,如 primacy 和 recency 效应,以及各个领域和功能的神经团块和相关性。然而,也发现现有模型存在一些限制,无法完全approximate人类行为。这个数据集将成为跨 дисциплиinary的资源,用于比较和改进WM模型,研究WM的神经基础,并开发人类样式的WM模型。
    Abstract Working memory (WM), a fundamental cognitive process facilitating the temporary storage, integration, manipulation, and retrieval of information, plays a vital role in reasoning and decision-making tasks. Robust benchmark datasets that capture the multifaceted nature of WM are crucial for the effective development and evaluation of AI WM models. Here, we introduce a comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM. We jointly trained and tested state-of-the-art recurrent neural networks and transformers on all these tasks. We also include human behavioral benchmarks as an upper bound for comparison. Our results suggest that AI models replicate some characteristics of WM in the brain, most notably primacy and recency effects, and neural clusters and correlates specialized for different domains and functionalities of WM. In the experiments, we also reveal some limitations in existing models to approximate human behavior. This dataset serves as a valuable resource for communities in cognitive psychology, neuroscience, and AI, offering a standardized framework to compare and enhance WM models, investigate WM's neural underpinnings, and develop WM models with human-like capabilities. Our source code and data are available at https://github.com/ZhangLab-DeepNeuroCogLab/WorM.
    摘要 工作记忆(WM),一种基本的认知过程,协助短暂存储、结合、操作和抽取信息,在推理和决策任务中扮演至关重要的角色。为了有效开发和评估人工智能WM模型,需要一些可靠的benchmark数据集。在这里,我们介绍了一个全面的Working Memory(WorM)benchmark数据集,用于这个目的。WorM包括10个任务和总共100万个尝试,评估了4种功能、3种领域和11种行为和神经特征。我们将现有的循环神经网络和转换器模型在所有这些任务上同时训练和测试。我们还包括了人类行为标准 als an upper bound for comparison。我们的结果表明,人工智能模型在脑中的WM特征中复制了一些特征,如劣antage和最新效应,以及特定领域和功能的神经团和相关特征。在实验中,我们还发现了现有模型的一些局限性,无法模拟人类行为。这个数据集作为一个 ценный资源,可以为认知心理学、神经科学和人工智能社区提供一个标准化的框架,用于比较和改进WM模型,调查WM的神经基础,并开发人类类似的WM模型。我们的源代码和数据可以在https://github.com/ZhangLab-DeepNeuroCogLab/WorM上获取。

Actor-agnostic Multi-label Action Recognition with Multi-modal Query

  • paper_url: http://arxiv.org/abs/2307.10763
  • repo_url: https://github.com/mondalanindya/msqnet
  • paper_authors: Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta
  • for: 提出了一种actor-agnostic multi-modal multi-label action recognition方法,以解决actor-specific pose estimation和多个行为同时发生的问题。
  • methods: 提出了一种基于 transformer 检测框架的 Multi-modal Semantic Query Network (MSQNet) 模型,利用视觉和文本模式更好地表示行为类别。
  • results: 在五个公开的数据集上进行了广泛的实验,并 consistently outperformed actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%.
    Abstract Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code will be released at https://github.com/mondalanindya/MSQNet.
    摘要 现有的动作识别方法通常是actor-specific的,因为actor之间存在内在的 topological 和 apparent 差异。这会导致actor-specific 姿势估计(例如人 VS 动物),从而增加模型设计复杂度和维护成本。此外,它们通常只学习视觉modal alone 和单个标签分类,而忽略其他可用的信息源(例如类名文本)以及同时发生的多个动作。为了超越这些限制,我们提出了一种新的approach called 'actor-agnostic multi-modal multi-label action recognition', which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code will be released at https://github.com/mondalanindya/MSQNet.

Mitigating Voter Attribute Bias for Fair Opinion Aggregation

  • paper_url: http://arxiv.org/abs/2307.10749
  • repo_url: None
  • paper_authors: Ryosuke Ueda, Koh Takeuchi, Hisashi Kashima
  • for: This paper focuses on developing fair opinion aggregation methods to address biases in decision-making, particularly in tasks without objectively true labels.
  • methods: The authors propose a combination of opinion aggregation models, such as majority voting and the Dawid and Skene model, with fairness options like sample weighting and data splitting. They also introduce a new Soft D&S model to improve soft label estimation.
  • results: The experimental results show that the combination of Soft D&S and data splitting is effective for dense data, while weighted majority voting is effective for sparse data. These findings support fair opinion aggregation in decision-making, both for human and machine-learning models.
    Abstract The aggregation of multiple opinions plays a crucial role in decision-making, such as in hiring and loan review, and in labeling data for supervised learning. Although majority voting and existing opinion aggregation models are effective for simple tasks, they are inappropriate for tasks without objectively true labels in which disagreements may occur. In particular, when voter attributes such as gender or race introduce bias into opinions, the aggregation results may vary depending on the composition of voter attributes. A balanced group of voters is desirable for fair aggregation results but may be difficult to prepare. In this study, we consider methods to achieve fair opinion aggregation based on voter attributes and evaluate the fairness of the aggregated results. To this end, we consider an approach that combines opinion aggregation models such as majority voting and the Dawid and Skene model (D&S model) with fairness options such as sample weighting. To evaluate the fairness of opinion aggregation, probabilistic soft labels are preferred over discrete class labels. First, we address the problem of soft label estimation without considering voter attributes and identify some issues with the D&S model. To address these limitations, we propose a new Soft D&S model with improved accuracy in estimating soft labels. Moreover, we evaluated the fairness of an opinion aggregation model, including Soft D&S, in combination with different fairness options using synthetic and semi-synthetic data. The experimental results suggest that the combination of Soft D&S and data splitting as a fairness option is effective for dense data, whereas weighted majority voting is effective for sparse data. These findings should prove particularly valuable in supporting decision-making by human and machine-learning models with balanced opinion aggregation.
    摘要 “多元意见的统计发挥了重要的决策作用,例如在招聘和贷款审核中,以及在指导学习中标签数据。 Although majority voting和现有的意见统计模型在简单任务上效果良好,但在无明确真实标签的任务中,它们无法应对不同意见的分歧。具体来说,当投票者属性如性别或种族引入偏见到意见时,统计结果将因投票者属性的分布而异。一个均衡的投票者群体是有利于公平统计结果的,但可能具有困难。在本研究中,我们考虑了基于投票者属性的公平意见统计方法,并评估这些方法的公平性。为此,我们考虑了结合意见统计模型(如多数投票和道维德和斯凯纳模型)和公平选项(如抽样重量)。对于评估公平性,随机软标签被视为更好的选择,而不是硬coded标签。我们首先解决了不考虑投票者属性的soft label估计问题,并发现了一些限制。为了解决这些限制,我们提出了一个新的Soft D&S模型,具有改善了 soft label 估计的精度。此外,我们还评估了不同公平选项与Soft D&S模型的整体公平性,使用人工和半自然数据。实验结果显示,结合Soft D&S模型和抽样重量的公平选项是在厚度数据中有效的,而Weighted多数投票则是在叠节数据中有效的。这些发现将在支持人类和机器学习模型的投票结果均衡中提供价值。”
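
A toy sketch of the "weighted majority voting with soft labels" idea, where votes are re-weighted so that each voter-attribute group has equal influence; this is only one plausible weighting, not necessarily the paper's exact fairness option.

```python
import numpy as np

def weighted_soft_majority(votes, voter_groups, num_classes):
    """Aggregate one item's votes into a soft label while equalizing the
    influence of voter-attribute groups.

    votes:        (V,) class indices from V voters.
    voter_groups: (V,) attribute-group index of each voter."""
    group_counts = np.bincount(voter_groups)
    weights = 1.0 / group_counts[voter_groups]     # down-weight over-represented groups
    soft = np.zeros(num_classes)
    for v, w in zip(votes, weights):
        soft[v] += w
    return soft / soft.sum()                       # probabilistic soft label
```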

Fairness-Aware Client Selection for Federated Learning

  • paper_url: http://arxiv.org/abs/2307.10738
  • repo_url: None
  • paper_authors: Yuxin Shi, Zelei Liu, Zhuan Shi, Han Yu
  • for: 提高 Federated Learning(FL)客户端选择的公平性和模型性能。
  • methods: 基于 Lyapunov 优化的 Fairness-aware Federated Client Selection(FairFedCS)方法,通过考虑客户端的声誉、参与 FL 任务的时间和模型性能的贡献,动态调整客户端选择概率。
  • results: 在实际的 multimedia 数据集上进行了广泛的实验,并显示了 FairFedCS 可以提高平均 fairness 19.6% 和测试精度 0.73% 比最佳状态的方法。
    Abstract Federated learning (FL) has enabled multiple data owners (a.k.a. FL clients) to train machine learning models collaboratively without revealing private data. Since the FL server can only engage a limited number of clients in each training round, FL client selection has become an important research problem. Existing approaches generally focus on either enhancing FL model performance or enhancing the fair treatment of FL clients. The problem of balancing performance and fairness considerations when selecting FL clients remains open. To address this problem, we propose the Fairness-aware Federated Client Selection (FairFedCS) approach. Based on Lyapunov optimization, it dynamically adjusts FL clients' selection probabilities by jointly considering their reputations, times of participation in FL tasks and contributions to the resulting model performance. By not using threshold-based reputation filtering, it provides FL clients with opportunities to redeem their reputations after a perceived poor performance, thereby further enhancing fair client treatment. Extensive experiments based on real-world multimedia datasets show that FairFedCS achieves 19.6% higher fairness and 0.73% higher test accuracy on average than the best-performing state-of-the-art approach.
    摘要 federated learning(FL)已经允许多个数据拥有者(即FL客户)共同训练机器学习模型,无需披露私人数据。由于FL服务器只能在每次训练中选择一定数量的客户,因此选择FL客户已成为一个重要的研究问题。现有的方法通常是增强FL模型性能或增强FL客户的公平待遇。尚未解决FL客户性能和公平待遇考虑的权衡问题。为解决这个问题,我们提出了公平性感知 Federated Client Selection(FairFedCS)方法。基于 Lyapunov 优化,它在FL客户选择概率中进行了动态调整,并且同时考虑了FL客户的声誉、参与FL任务的时间和对模型性能的贡献。不使用阈值基于声誉筛选,因此允许FL客户在感知性能不佳时重新恢复声誉,从而进一步提高FL客户的公平待遇。经过基于实际 multimedia 数据集的广泛实验,我们发现 FairFedCS 平均提高了19.6%的公平性和0.73%的测试准确率。
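
As a rough illustration of how reputation, participation counts, and contribution could be folded into selection probabilities, consider the sketch below; the real FairFedCS derives these weights from Lyapunov optimization rather than the ad-hoc linear score used here.

```python
import numpy as np

def selection_probabilities(reputation, participation, contribution, temp=1.0):
    """Toy stand-in for fairness-aware client selection: clients with good
    reputation and contribution but few past selections get higher probability."""
    rep = np.asarray(reputation, dtype=float)
    part = np.asarray(participation, dtype=float)
    contrib = np.asarray(contribution, dtype=float)
    score = rep + contrib - part / (part.max() + 1.0)   # discourage over-selection
    p = np.exp(score / temp)
    return p / p.sum()
```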

Long-Tail Theory under Gaussian Mixtures

  • paper_url: http://arxiv.org/abs/2307.10736
  • repo_url: https://github.com/armanbolatov/long_tail
  • paper_authors: Arman Bolatov, Maxat Tezekbayev, Igor Melnykov, Artur Pak, Vassilina Nikoulina, Zhenisbek Assylbekov
  • for: 这篇论文关注 Feldman 的长尾理论 (2020),提出了一个简单的 Gaussian 混合模型,以测试不同类型的标签模型在长尾分布下的性能。
  • methods: 论文使用了一个线性分类器和一个非线性分类器,以评估它们在长尾分布下的适用程度。
  • results: 论文发现,在长尾分布下,非线性分类器能够对新数据进行更好的适应,而线性分类器则无法降低一定水平的普遍化错误。此外,论文还发现,当尾部的长度变短时,两种模型之间的性能差距可以降低。
    Abstract We suggest a simple Gaussian mixture model for data generation that complies with Feldman's long tail theory (2020). We demonstrate that a linear classifier cannot decrease the generalization error below a certain level in the proposed model, whereas a nonlinear classifier with a memorization capacity can. This confirms that for long-tailed distributions, rare training examples must be considered for optimal generalization to new data. Finally, we show that the performance gap between linear and nonlinear models can be lessened as the tail becomes shorter in the subpopulation frequency distribution, as confirmed by experiments on synthetic and real data.
    摘要 我们建议一个简单的 Gaussian 混合模型来生成数据,遵循 Feldman 的长尾理论(2020)。我们显示了一个线性分类器无法在提案的模型中降低泛化误差下限,而一个具有记忆容量的非线性分类器则可以。这证实了对长尾分布的数据,罕见的训练示例必须被考虑来取得最佳的泛化至新数据。最后,我们显示了线性和非线性模型之间的性能差距可以随着尾部的短化而减少,通过实验证明在 sintetic 和实际数据上。
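
The claim that a linear classifier hits an error floor on long-tailed mixtures while a memorizing nonlinear classifier does not can be reproduced on a toy example; the mixture below is an illustrative construction, not the paper's exact model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def long_tail_gaussian_mixture(n=2000, tail_frac=0.05, seed=0):
    """Each class has a frequent 'head' component and a rare 'tail' component
    placed so that no linear boundary separates both. Parameters are illustrative."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, n)
    tail = rng.random(n) < tail_frac
    mx = np.where(y == 0, -2.0, 2.0)          # head components at x = -2 / +2
    mx = np.where(tail, -mx, mx)              # tail components sit on the "wrong" side
    my = np.where(tail, 3.0, 0.0)             # ...but are separated along the second axis
    X = np.stack([mx + rng.normal(0, 0.5, n), my + rng.normal(0, 0.5, n)], axis=1)
    return X, y

X, y = long_tail_gaussian_mixture(seed=0)
Xte, yte = long_tail_gaussian_mixture(seed=1)
for clf in (LogisticRegression(), KNeighborsClassifier(n_neighbors=5)):
    print(type(clf).__name__, round(clf.fit(X, y).score(Xte, yte), 3))
# The linear model plateaus near 1 - tail_frac accuracy; the memorizing
# nonlinear model can also fit the rare subpopulations.
```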

Comparison between transformers and convolutional models for fine-grained classification of insects

  • paper_url: http://arxiv.org/abs/2307.11112
  • repo_url: None
  • paper_authors: Rita Pucci, Vincent J. Kalkman, Dan Stowell
  • for: 本研究旨在提高昆虫种类的自动分类精度,尤其是在同一个分类类型下的种类 diferenciación。
  • methods: 本研究使用了深度学习算法,特别是 transformer 和 convolutional 层结构,进行比较研究。
  • results: 研究发现,hybrid 模型在准确性方面表现最佳,而纯 transformer 模型在推理速度上最快,并且对样本缺乏表现出较强的鲁棒性。
    Abstract Fine-grained classification is challenging due to the difficulty of finding discriminatory features. This problem is exacerbated when applied to identifying species within the same taxonomical class. This is because species are often sharing morphological characteristics that make them difficult to differentiate. We consider the taxonomical class of Insecta. The identification of insects is essential in biodiversity monitoring as they are one of the inhabitants at the base of many ecosystems. Citizen science is doing brilliant work of collecting images of insects in the wild giving the possibility to experts to create improved distribution maps in all countries. We have billions of images that need to be automatically classified and deep neural network algorithms are one of the main techniques explored for fine-grained tasks. At the SOTA, the field of deep learning algorithms is extremely fruitful, so how to identify the algorithm to use? We focus on Odonata and Coleoptera orders, and we propose an initial comparative study to analyse the two best-known layer structures for computer vision: transformer and convolutional layers. We compare the performance of T2TViT, a fully transformer-base, EfficientNet, a fully convolutional-base, and ViTAE, a hybrid. We analyse the performance of the three models in identical conditions evaluating the performance per species, per morph together with sex, the inference time, and the overall performance with unbalanced datasets of images from smartphones. Although we observe high performances with all three families of models, our analysis shows that the hybrid model outperforms the fully convolutional-base and fully transformer-base models on accuracy performance and the fully transformer-base model outperforms the others on inference speed and, these prove the transformer to be robust to the shortage of samples and to be faster at inference time.
    摘要 细致分类问题困难,主要因为找到分化特征困难。当应用到同一种植物类中的物种识别时,这问题更加困难。这是因为物种经常共享 morphological 特征,使其困难分 differentiate。我们考虑 insecta 纲。识别昆虫对生物多样性监测至关重要,因为它们是生态系统的基础居民。公民科学在野外采集昆虫图像,提供了专家创建改进的分布图的机会。我们有数十亿个图像需要自动分类,深度学习算法是细致任务中的主要技术之一。在 SOTA 中,深度学习领域非常肥沃,因此如何选择算法?我们将关注 Odonata 和 Coleoptera 两个目。我们提出了一个初步比较研究,检验了两种最为知名的计算机视觉层结构:变换层和卷积层。我们比较了 T2TViT、EfficientNet 和 ViTAE 三种模型的性能,分析了每种物种、每种形态、性别、推理时间和总性能。虽然我们所观察到的性能很高,但我们的分析表明,混合模型在准确性和推理速度两个方面都有优势。此外,完全转换基础模型在准确性方面表现出色,而完全卷积基础模型在推理速度方面表现出色。这证明了转换层在样本缺乏时具有稳定性和快速推理能力。

LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?

  • paper_url: http://arxiv.org/abs/2307.10719
  • repo_url: None
  • paper_authors: David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, Vardan Papyan
  • For: The paper discusses the risks of malicious use of large language models (LLMs) and the limitations of existing defense mechanisms, such as model fine-tuning or output censorship.* Methods: The paper presents theoretical limitations of semantic censorship approaches, highlighting the inherent challenges in censoring LLM outputs due to their programmatic and instruction-following capabilities.* Results: The paper argues that the challenges of censorship extend beyond semantic censorship, as knowledgeable attackers can reconstruct impermissible outputs from a collection of permissible ones, and proposes that the problem of censorship should be reevaluated and treated as a security problem to mitigate potential risks.Here is the same information in Simplified Chinese text:* For: 论文探讨大语言模型(LLM)的可能的恶用行为风险,以及现有的防御机制的不足。* Methods: 论文描述了semantic censorship的理论限制,强调 LLM 的程序化和指令执行能力使得 censored 输出仍然存在问题。* Results: 论文认为, censored 输出的问题不仅限于semantic censorship,攻击者可以通过收集 permissible 输出来重建不当输出,therefore, the problem of censorship should be reevaluated and treated as a security problem to mitigate potential risks.
    Abstract Large language models (LLMs) have exhibited impressive capabilities in comprehending complex instructions. However, their blind adherence to provided instructions has led to concerns regarding risks of malicious use. Existing defence mechanisms, such as model fine-tuning or output censorship using LLMs, have proven to be fallible, as LLMs can still generate problematic responses. Commonly employed censorship approaches treat the issue as a machine learning problem and rely on another LM to detect undesirable content in LLM outputs. In this paper, we present the theoretical limitations of such semantic censorship approaches. Specifically, we demonstrate that semantic censorship can be perceived as an undecidable problem, highlighting the inherent challenges in censorship that arise due to LLMs' programmatic and instruction-following capabilities. Furthermore, we argue that the challenges extend beyond semantic censorship, as knowledgeable attackers can reconstruct impermissible outputs from a collection of permissible ones. As a result, we propose that the problem of censorship needs to be reevaluated; it should be treated as a security problem which warrants the adaptation of security-based approaches to mitigate potential risks.
    摘要

Differences Between Hard and Noisy-labeled Samples: An Empirical Study

  • paper_url: http://arxiv.org/abs/2307.10718
  • repo_url: https://github.com/mahf93/hard-vs-noisy
  • paper_authors: Mahsa Forouzesh, Patrick Thiran
  • for: 本研究旨在解决难度强、标签错误的样本集合中的噪声样本问题。
  • methods: 我们提出了一种系统性的实验方法,用于分析困难样本和噪声标签样本之间的相似性和 diferencias。我们还提出了一种简单 yet effective的度量,可以从噪声标签样本中筛选出噪声样本,保留困难样本。
  • results: 我们的研究表明,使用我们提出的度量筛选出噪声标签样本后,模型在训练过后的测试精度得到了最高的提升。此外,我们的数据分配方法在实际世界中存在标签噪声时也表现出色。
    Abstract Extracting noisy or incorrectly labeled samples from a labeled dataset with hard/difficult samples is an important yet under-explored topic. Two general and often independent lines of work exist, one focuses on addressing noisy labels, and another deals with hard samples. However, when both types of data are present, most existing methods treat them equally, which results in a decline in the overall performance of the model. In this paper, we first design various synthetic datasets with custom hardness and noisiness levels for different samples. Our proposed systematic empirical study enables us to better understand the similarities and more importantly the differences between hard-to-learn samples and incorrectly-labeled samples. These controlled experiments pave the way for the development of methods that distinguish between hard and noisy samples. Through our study, we introduce a simple yet effective metric that filters out noisy-labeled samples while keeping the hard samples. We study various data partitioning methods in the presence of label noise and observe that filtering out noisy samples from hard samples with this proposed metric results in the best datasets as evidenced by the high test accuracy achieved after models are trained on the filtered datasets. We demonstrate this for both our created synthetic datasets and for datasets with real-world label noise. Furthermore, our proposed data partitioning method significantly outperforms other methods when employed within a semi-supervised learning framework.
    摘要

AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models

  • paper_url: http://arxiv.org/abs/2307.10711
  • repo_url: None
  • paper_authors: Jiachun Pan, Jun Hao Liew, Vincent Y. F. Tan, Jiashi Feng, Hanshu Yan
  • for: 本论文旨在解决当唯一可用的监督信号是定义在生成内容上的可微分度量时,扩散概率模型(DPM)的定制问题。
  • methods: 论文提出了一种名为AdjointDPM的新方法:先通过求解概率流ODE从扩散模型生成新样本,再利用伴随灵敏度方法(adjoint sensitivity method)求解另一个增广ODE,将损失的梯度反向传播到模型参数(包括条件信号、网络权重和初始噪声)。
  • results: 论文在三个任务上验证了AdjointDPM的效果:将视觉特效转换为文本标识嵌入、微调DPM以实现特定类型的风格化,以及优化初始噪声以生成用于安全审计的对抗样本。
    Abstract Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models (DPMs) with user-provided concepts. This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents. Since the sampling procedure of DPMs involves recursive calls to the denoising UNet, na\"ive gradient backpropagation requires storing the intermediate states of all iterations, resulting in extremely high memory consumption. To overcome this issue, we propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs. It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters (including conditioning signals, network weights, and initial noises) by solving another augmented ODE. To reduce numerical errors in both the forward generation and gradient backpropagation processes, we further reparameterize the probability-flow ODE and augmented ODE as simple non-stiff ODEs using exponential integration. Finally, we demonstrate the effectiveness of AdjointDPM on three interesting tasks: converting visual effects into identification text embeddings, finetuning DPMs for specific types of stylization, and optimizing initial noise to generate adversarial samples for security auditing.
    摘要 现有的定制方法需要访问多个参考示例,才能将预训练的扩散概率模型(DPM)与用户提供的概念对齐。本文旨在解决当唯一可用的监督信号是定义在生成内容上的可微分度量时的DPM定制问题。由于DPM的采样过程需要对去噪UNet进行递归调用,朴素的梯度反向传播需要存储所有迭代的中间状态,导致内存消耗极高。为解决这一问题,我们提出了一种新方法AdjointDPM:它首先通过求解相应的概率流ODE从扩散模型生成新样本,然后利用伴随灵敏度方法,通过求解另一个增广ODE,将损失的梯度反向传播到模型参数(包括条件信号、网络权重和初始噪声)。为了减少前向生成和梯度反向传播过程中的数值误差,我们进一步利用指数积分将概率流ODE和增广ODE重参数化为简单的非刚性ODE。最后,我们在三个有趣的任务上展示了AdjointDPM的有效性:将视觉特效转换为文本标识嵌入、微调DPM以实现特定类型的风格化,以及优化初始噪声以生成用于安全审计的对抗样本。
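
As a rough, self-contained illustration of the mechanism described above — backpropagating a loss on ODE-generated samples to the initial state and network parameters with the adjoint sensitivity method, without storing intermediate solver states — the sketch below uses torchdiffeq's `odeint_adjoint` on a toy drift network and a toy loss. It is not the paper's AdjointDPM implementation: the probability-flow ODE of a real diffusion model and the exponential-integration reparameterization are replaced by placeholders.

```python
# Minimal sketch: adjoint-based gradient backpropagation through an ODE solve.
# The drift network and the loss below are toy placeholders, not the paper's
# probability-flow ODE of a diffusion model.
import torch
import torch.nn as nn
from torchdiffeq import odeint_adjoint  # pip install torchdiffeq

class Drift(nn.Module):
    """Stand-in for the network that defines dx/dt = f(t, x)."""
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, x):
        t_col = t * torch.ones(x.shape[0], 1)      # broadcast scalar time
        return self.net(torch.cat([x, t_col], dim=-1))

drift = Drift()
x0 = torch.randn(16, 2, requires_grad=True)        # "initial noise"
t = torch.linspace(0.0, 1.0, 2)                    # integrate from t=0 to t=1

# Forward generation by solving the ODE; odeint_adjoint avoids storing all
# intermediate states and instead solves an augmented ODE backwards for gradients.
x1 = odeint_adjoint(drift, x0, t, method="dopri5")[-1]

loss = (x1 ** 2).mean()                            # differentiable metric on outputs
loss.backward()                                    # grads w.r.t. drift params and x0
print(x0.grad.shape, next(drift.parameters()).grad.shape)
```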

Reparameterized Policy Learning for Multimodal Trajectory Optimization

  • paper_url: http://arxiv.org/abs/2307.10710
  • repo_url: https://github.com/haosulab/RPG
  • paper_authors: Zhiao Huang, Litian Liang, Zhan Ling, Xuanlin Li, Chuang Gan, Hao Su
  • for: 本研究旨在解决高维连续动作空间中强化学习(RL)策略参数化的挑战,克服常用高斯参数化的固有局限。
  • methods: 我们提出了一个有原则的框架,将连续RL策略建模为环境最优轨迹的生成模型。通过让策略以隐变量为条件,我们推导出一个新的变分下界作为优化目标,从而鼓励对环境的探索。
  • results: 我们提出了一种实用的基于模型的RL方法——重参数化策略梯度(Reparameterized Policy Gradient,RPG),它利用多模态策略参数化和学到的世界模型,实现了强大的探索能力和高数据效率。实验结果表明,该方法可以帮助智能体在密集奖励任务中逃离局部最优,并通过引入以物体为中心的内在奖励来解决具有挑战性的稀疏奖励环境,在一系列任务上持续优于以往方法。
    Abstract We investigate the challenge of parametrizing policies for reinforcement learning (RL) in high-dimensional continuous action spaces. Our objective is to develop a multimodal policy that overcomes limitations inherent in the commonly-used Gaussian parameterization. To achieve this, we propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories. By conditioning the policy on a latent variable, we derive a novel variational bound as the optimization objective, which promotes exploration of the environment. We then present a practical model-based RL method, called Reparameterized Policy Gradient (RPG), which leverages the multimodal policy parameterization and learned world model to achieve strong exploration capabilities and high data efficiency. Empirical results demonstrate that our method can help agents evade local optima in tasks with dense rewards and solve challenging sparse-reward environments by incorporating an object-centric intrinsic reward. Our method consistently outperforms previous approaches across a range of tasks. Code and supplementary materials are available on the project page https://haosulab.github.io/RPG/
    摘要 我们研究了在高维连续动作空间中为强化学习(RL)参数化策略的挑战。我们的目标是开发一种多模态策略,以克服常用高斯参数化的固有局限。为此,我们提出了一个有原则的框架,将连续RL策略建模为最优轨迹的生成模型。通过让策略以隐变量为条件,我们推导出一个新的变分下界作为优化目标,从而促进对环境的探索。随后,我们提出了一种实用的基于模型的RL方法——重参数化策略梯度(Reparameterized Policy Gradient,RPG),它利用多模态策略参数化和学到的世界模型,实现了强大的探索能力和高数据效率。实验结果表明,我们的方法可以帮助智能体在密集奖励任务中逃离局部最优,并通过引入以物体为中心的内在奖励来解决具有挑战性的稀疏奖励环境。我们的方法在一系列任务上持续优于以往方法。代码和补充材料可在项目页面 https://haosulab.github.io/RPG/ 获取。
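
The following sketch illustrates only the multimodal (latent-conditioned) policy parameterization idea: actions are drawn by first sampling a discrete latent code and then a reparameterized Gaussian conditioned on the state and the code. All dimensions, the categorical prior, and the tiny MLP are illustrative assumptions; it is not the paper's RPG algorithm, which additionally learns a world model and optimizes a variational bound.

```python
# Schematic latent-conditioned policy pi(a|s) = sum_z p(z) * N(a | mu(s,z), sigma(s,z)).
# Shapes, the uniform categorical prior, and the tiny MLP are illustrative only.
import torch
import torch.nn as nn

class LatentPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, n_latents=4, hidden=64):
        super().__init__()
        self.n_latents = n_latents
        self.embed = nn.Embedding(n_latents, hidden)
        self.trunk = nn.Sequential(nn.Linear(state_dim + hidden, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, state):
        # Sample a latent code, then a reparameterized Gaussian action.
        z = torch.randint(0, self.n_latents, (state.shape[0],))
        h = self.trunk(torch.cat([state, self.embed(z)], dim=-1))
        mu, std = self.mu(h), self.log_std(h).clamp(-5, 2).exp()
        action = mu + std * torch.randn_like(std)   # reparameterization trick
        # Gaussian log-density up to an additive constant (for gradient-based use).
        log_prob = (-0.5 * ((action - mu) / std) ** 2 - std.log()).sum(-1)
        return action, log_prob, z

policy = LatentPolicy(state_dim=8, action_dim=2)
a, logp, z = policy(torch.randn(32, 8))
print(a.shape, logp.shape)
```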

TwinLiteNet: An Efficient and Lightweight Model for Driveable Area and Lane Segmentation in Self-Driving Cars

  • paper_url: http://arxiv.org/abs/2307.10705
  • repo_url: https://github.com/chequanghuy/TwinLiteNet
  • paper_authors: Quang Huy Che, Dinh Phuc Nguyen, Minh Quan Pham, Duc Khai Lam
  • for: 本研究旨在提出一种轻量级模型,用于自动驾驶车辆环境感知中的可行驶区域与车道线分割。
  • methods: 提出的TwinLiteNet是一种低成本设计的双任务分割模型,在保持高效率的同时取得了准确的分割结果。
  • results: 实验结果表明,TwinLiteNet在BDD100K数据集上的可行驶区域分割任务取得91.3%的mIoU、车道检测任务取得31.08%的IoU,性能与现有方法相当,而参数量仅约40万(0.4 million),在RTX A5000 GPU上可达415 FPS,在Jetson Xavier NX上可达60 FPS。代码见 https://github.com/chequanghuy/TwinLiteNet。
    Abstract Semantic segmentation is a common task in autonomous driving to understand the surrounding environment. Driveable Area Segmentation and Lane Detection are particularly important for safe and efficient navigation on the road. However, original semantic segmentation models are computationally expensive and require high-end hardware, which is not feasible for embedded systems in autonomous vehicles. This paper proposes a lightweight model for the driveable area and lane line segmentation. TwinLiteNet is designed cheaply but achieves accurate and efficient segmentation results. We evaluate TwinLiteNet on the BDD100K dataset and compare it with modern models. Experimental results show that our TwinLiteNet performs similarly to existing approaches, requiring significantly fewer computational resources. Specifically, TwinLiteNet achieves a mIoU score of 91.3% for the Drivable Area task and 31.08% IoU for the Lane Detection task with only 0.4 million parameters and achieves 415 FPS on GPU RTX A5000. Furthermore, TwinLiteNet can run in real-time on embedded devices with limited computing power, especially since it achieves 60FPS on Jetson Xavier NX, making it an ideal solution for self-driving vehicles. Code is available: url{https://github.com/chequanghuy/TwinLiteNet}.
    摘要 语义分割是自动驾驶中用于理解周围环境的常见任务,其中可行驶区域分割和车道检测对安全高效的道路导航尤为重要。然而,原始的语义分割模型计算代价高昂,需要高端硬件,难以部署在自动驾驶车辆的嵌入式系统上。本文提出了一种轻量级的可行驶区域与车道线分割模型TwinLiteNet,其设计成本低廉,却能取得准确且高效的分割结果。我们在BDD100K数据集上评估TwinLiteNet,并与现代模型进行比较。实验结果表明,TwinLiteNet的性能与现有方法相当,但所需计算资源显著更少:在可行驶区域任务上取得91.3%的mIoU,在车道检测任务上取得31.08%的IoU,参数量仅约40万(0.4 million),并在RTX A5000 GPU上达到415 FPS。此外,TwinLiteNet可以在计算能力有限的嵌入式设备上实时运行,在Jetson Xavier NX上达到60 FPS,是自动驾驶车辆的理想解决方案。代码见 https://github.com/chequanghuy/TwinLiteNet。

Decentralized Smart Charging of Large-Scale EVs using Adaptive Multi-Agent Multi-Armed Bandits

  • paper_url: http://arxiv.org/abs/2307.10704
  • repo_url: None
  • paper_authors: Sharyal Zafar, Raphaël Feraud, Anne Blavette, Guy Camilleri, Hamid Ben
  • for: 这篇论文的目的是提出一种完全分散式的充电系统,以解决电动车充电过载和电压限制问题。
  • methods: 该系统采用自适应多智能体系统的设计思想,并使用多臂老虎机(multi-armed bandit)学习来处理系统中的不确定性。
  • results: 案例研究表明,该系统具有去中心化、可扩展、实时、无模型(model-free)且兼顾公平性等特点。
    Abstract The drastic growth of electric vehicles and photovoltaics can introduce new challenges, such as electrical current congestion and voltage limit violations due to peak load demands. These issues can be mitigated by controlling the operation of electric vehicles i.e., smart charging. Centralized smart charging solutions have already been proposed in the literature. But such solutions may lack scalability and suffer from inherent drawbacks of centralization, such as a single point of failure, and data privacy concerns. Decentralization can help tackle these challenges. In this paper, a fully decentralized smart charging system is proposed using the philosophy of adaptive multi-agent systems. The proposed system utilizes multi-armed bandit learning to handle uncertainties in the system. The presented system is decentralized, scalable, real-time, model-free, and takes fairness among different players into account. A detailed case study is also presented for performance evaluation.
    摘要 电动汽车和光伏发电的快速增长可能带来新的挑战,例如峰值负荷导致的电流拥堵和电压越限。这些问题可以通过控制电动汽车的充电行为(即智能充电)来缓解。文献中已提出集中式智能充电解决方案,但此类方案可能缺乏可扩展性,并存在集中式架构固有的缺点,如单点故障和数据隐私问题。去中心化有助于应对这些挑战。本文基于自适应多智能体系统的设计思想,提出了一个完全去中心化的智能充电系统,并使用多臂老虎机学习来处理系统中的不确定性。该系统具有去中心化、可扩展、实时、无模型且兼顾不同参与者公平性等特点。文中还给出了详细的案例研究以评估其性能。
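
A toy sketch of the decentralized bandit idea: each EV is an independent agent that picks a discrete charging rate with epsilon-greedy value updates, and the shared reward penalizes exceeding a feeder limit. The reward shape, arm set, and all constants are illustrative assumptions, not the paper's adaptive multi-agent design.

```python
# Toy sketch: each EV is an independent multi-armed bandit agent choosing a
# charging rate; the "grid" reward penalizes exceeding a feeder limit.
# All numbers and the reward shape are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_evs, n_arms, feeder_limit = 20, 4, 30.0           # arms = discrete charging rates (kW)
rates = np.array([0.0, 1.5, 3.0, 6.0])
q = np.zeros((n_evs, n_arms))                        # per-agent value estimates
counts = np.zeros((n_evs, n_arms))

for step in range(2000):
    eps = 0.1
    greedy = q.argmax(axis=1)
    explore = rng.integers(0, n_arms, size=n_evs)
    choice = np.where(rng.random(n_evs) < eps, explore, greedy)

    total_load = rates[choice].sum()
    # Each agent prefers charging faster but is penalized when the feeder is congested.
    reward = rates[choice] - 5.0 * max(0.0, total_load - feeder_limit) / n_evs

    idx = np.arange(n_evs)
    counts[idx, choice] += 1
    q[idx, choice] += (reward - q[idx, choice]) / counts[idx, choice]

print("learned aggregate load (kW):", rates[q.argmax(axis=1)].sum())
```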

Graphs in State-Space Models for Granger Causality in Climate Science

  • paper_url: http://arxiv.org/abs/2307.10703
  • repo_url: None
  • paper_authors: Víctor Elvira, Émilie Chouzenoux, Jordi Cerdà, Gustau Camps-Valls
  • for: 评估时间序列之间的预测可能性
  • methods: 从状态空间模型的图视角出发,使用GraphEM期望最大化算法估计线性高斯状态空间模型状态方程中的转移矩阵,并在M步中引入lasso正则化(采用近端分裂Douglas-Rachford算法求解)
  • results: 在玩具示例和具有挑战性的气候问题上,所提模型与推断方法优于标准Granger causality方法
    Abstract Granger causality (GC) is often considered not an actual form of causality. Still, it is arguably the most widely used method to assess the predictability of a time series from another one. Granger causality has been widely used in many applied disciplines, from neuroscience and econometrics to Earth sciences. We revisit GC under a graphical perspective of state-space models. For that, we use GraphEM, a recently presented expectation-maximisation algorithm for estimating the linear matrix operator in the state equation of a linear-Gaussian state-space model. Lasso regularisation is included in the M-step, which is solved using a proximal splitting Douglas-Rachford algorithm. Experiments in toy examples and challenging climate problems illustrate the benefits of the proposed model and inference technique over standard Granger causality methods.
    摘要 格兰杰因果(Granger causality,GC)通常不被视为一种真正意义上的因果关系,但它可以说是评估一个时间序列对另一个时间序列可预测性时使用最广泛的方法。格兰杰因果已被广泛应用于神经科学、计量经济学以及地球科学等诸多应用学科。我们从状态空间模型的图视角重新审视GC:使用最近提出的期望最大化算法GraphEM来估计线性高斯状态空间模型状态方程中的线性矩阵算子,并在M步中引入lasso正则化,采用近端分裂Douglas-Rachford算法求解。在玩具示例和具有挑战性的气候问题上的实验表明,所提模型和推断方法优于标准的Granger causality方法。
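
For intuition only, the sketch below shows the simpler, related idea of reading a Granger-style graph off a lasso-regularized VAR(1) fit; it is not the GraphEM expectation-maximization algorithm or the proximal Douglas-Rachford solver described in the paper.

```python
# Sketch of the related (simpler) idea: fit a sparse VAR(1) with the lasso and
# read a Granger-style graph off the nonzero entries of the transition matrix.
# This is an illustration, not the paper's GraphEM algorithm.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
T, d = 500, 4
A_true = np.array([[0.8, 0.0, 0.0, 0.0],
                   [0.5, 0.7, 0.0, 0.0],
                   [0.0, 0.0, 0.6, 0.0],
                   [0.0, 0.0, 0.4, 0.5]])
x = np.zeros((T, d))
for t in range(1, T):
    x[t] = A_true @ x[t - 1] + 0.1 * rng.standard_normal(d)

A_hat = np.zeros((d, d))
for i in range(d):                      # one lasso regression per target series
    model = Lasso(alpha=0.01, fit_intercept=False).fit(x[:-1], x[1:, i])
    A_hat[i] = model.coef_

print("estimated Granger graph (entry [i, j] = 1 means series j helps predict series i):")
print((np.abs(A_hat) > 1e-3).astype(int))
```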

Self2Self+: Single-Image Denoising with Self-Supervised Learning and Image Quality Assessment Loss

  • paper_url: http://arxiv.org/abs/2307.10695
  • repo_url: https://github.com/JK-the-Ko/Self2SelfPlus
  • paper_authors: Jaekyun Ko, Sanghwan Lee
  • for: 本研究旨在提出一种基于单个噪声图像的自监督学习方法,以便提高噪声除去效果的可行性和实用性。
  • methods: 该方法使用门控卷积(gated convolution)提取特征,并使用无参考图像质量评估来引导训练过程。此外,方法通过带一定丢弃率的伯努利采样从输入图像中随机采样实例用于训练,最终结果由带dropout的训练网络在多个实例上的预测取平均得到。
  • results: 实验结果表明,所提方法在合成数据集和真实数据集上均达到了当前最佳的去噪性能,表明该方法对各种噪声去除任务具有有效性和实用性。
    Abstract Recently, denoising methods based on supervised learning have exhibited promising performance. However, their reliance on external datasets containing noisy-clean image pairs restricts their applicability. To address this limitation, researchers have focused on training denoising networks using solely a set of noisy inputs. To improve the feasibility of denoising procedures, in this study, we proposed a single-image self-supervised learning method in which only the noisy input image is used for network training. Gated convolution was used for feature extraction and no-reference image quality assessment was used for guiding the training process. Moreover, the proposed method sampled instances from the input image dataset using Bernoulli sampling with a certain dropout rate for training. The corresponding result was produced by averaging the generated predictions from various instances of the trained network with dropouts. The experimental results indicated that the proposed method achieved state-of-the-art denoising performance on both synthetic and real-world datasets. This highlights the effectiveness and practicality of our method as a potential solution for various noise removal tasks.
    摘要 近来,基于监督学习的去噪方法表现出色,但它们依赖包含噪声-干净图像对的外部数据集,限制了其适用范围。为了解决这一限制,研究者开始仅使用一组噪声输入来训练去噪网络。为进一步提升去噪流程的可行性,本研究提出了一种单图像自监督学习方法,仅使用噪声输入图像本身进行网络训练:使用门控卷积提取特征,并使用无参考图像质量评估来引导训练过程。此外,该方法以一定的丢弃率对输入图像进行伯努利采样得到训练实例,最终结果由带dropout的训练网络在多个实例上的预测取平均得到。实验结果表明,所提方法在合成数据集和真实数据集上均达到了当前最佳的去噪性能,凸显了该方法作为各种噪声去除任务潜在解决方案的有效性与实用性。
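
A minimal sketch of the training loop described above: Bernoulli-mask the single noisy image, supervise the network only on the hidden pixels, and average many dropout-perturbed predictions at inference. The plain Conv2d/Dropout2d network stands in for the paper's gated convolutions, and the no-reference IQA loss is omitted.

```python
# Minimal single-image self-supervised denoising sketch (Bernoulli masking +
# dropout-averaged inference). The plain Conv2d/Dropout2d network is a
# placeholder for the paper's gated convolutions and IQA-guided loss.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.Dropout2d(0.3),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.Dropout2d(0.3),
    nn.Conv2d(32, 1, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

noisy = torch.rand(1, 1, 64, 64)                      # the single noisy input image
p_drop = 0.3
for step in range(200):
    mask = (torch.rand_like(noisy) > p_drop).float()  # Bernoulli-sampled instance
    pred = net(noisy * mask)
    # Supervise only on pixels that were hidden from the network.
    loss = (((pred - noisy) ** 2) * (1 - mask)).sum() / (1 - mask).sum().clamp(min=1)
    opt.zero_grad(); loss.backward(); opt.step()

net.train()                                           # keep dropout active at inference
with torch.no_grad():
    denoised = torch.stack([net(noisy) for _ in range(50)]).mean(0)
print(denoised.shape)
```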

Fractional Denoising for 3D Molecular Pre-training

  • paper_url: http://arxiv.org/abs/2307.10683
  • repo_url: https://github.com/fengshikun/frad
  • paper_authors: Shikun Feng, Yuyan Ni, Yanyan Lan, Zhi-Ming Ma, Wei-Ying Ma
  • for: 提高3D分子预训练方法的性能,特别是在药物搜寻任务中。
  • methods: 提出了一种新的混合噪声策略,同时对二面角和坐标施加噪声;并提出了一种新的分数去噪方法(Frad),仅对坐标部分去噪,以更好地契合分子的各向异性特征。
  • results: 实验表明,Frad能够更有效地提升分子表示能力,在QM9的12项任务中的9项以及MD17的8个目标中的7个上取得了新的最优结果。
    Abstract Coordinate denoising is a promising 3D molecular pre-training method, which has achieved remarkable performance in various downstream drug discovery tasks. Theoretically, the objective is equivalent to learning the force field, which is revealed helpful for downstream tasks. Nevertheless, there are two challenges for coordinate denoising to learn an effective force field, i.e. low coverage samples and isotropic force field. The underlying reason is that molecular distributions assumed by existing denoising methods fail to capture the anisotropic characteristic of molecules. To tackle these challenges, we propose a novel hybrid noise strategy, including noises on both dihedral angel and coordinate. However, denoising such hybrid noise in a traditional way is no more equivalent to learning the force field. Through theoretical deductions, we find that the problem is caused by the dependency of the input conformation for covariance. To this end, we propose to decouple the two types of noise and design a novel fractional denoising method (Frad), which only denoises the latter coordinate part. In this way, Frad enjoys both the merits of sampling more low-energy structures and the force field equivalence. Extensive experiments show the effectiveness of Frad in molecular representation, with a new state-of-the-art on 9 out of 12 tasks of QM9 and on 7 out of 8 targets of MD17.
    摘要 坐标去噪是一种很有前景的3D分子预训练方法,在多种下游药物发现任务中表现突出。理论上,该目标等价于学习力场,而这已被证明对下游任务有益。然而,坐标去噪要学到有效的力场面临两个挑战:低覆盖率的采样和各向同性的力场。其根本原因在于,现有去噪方法所假设的分子分布无法刻画分子的各向异性特征。为应对这些挑战,我们提出了一种新的混合噪声策略,同时对二面角和坐标施加噪声。然而,以传统方式对这种混合噪声去噪不再等价于学习力场。通过理论推导,我们发现问题源于协方差对输入构象的依赖。为此,我们提出将两类噪声解耦,并设计了一种新的分数去噪方法(Frad),仅对坐标部分去噪。这样,Frad既能采样到更多低能量结构,又保持了与力场学习的等价性。大量实验表明了Frad在分子表示上的有效性,在QM9的12项任务中的9项以及MD17的8个目标中的7个上取得了新的最优结果。

Deep learning for classification of noisy QR codes

  • paper_url: http://arxiv.org/abs/2307.10677
  • repo_url: None
  • paper_authors: Rebecca Leygonie, Sylvain Lobry, Laurent Wendling (LIPADE)
  • for: 本研究旨在界定基于深度学习的经典分类模型在抽象图像(即不表示可视辨识物体的图像)上的适用边界。
  • methods: 二维码(QR码)正属于这类抽象图像:一个比特对应一个被编码的字符,QR码并非为人工解码而设计。我们在由读取健康通行证信息生成的QR码上训练图像分类模型,并在存在噪声的情况下与经典(确定性)解码方法进行比较,以理解基于深度学习的模型在抽象图像分类中的局限性。
  • results: 这项研究使我们得出结论:基于深度学习的模型对于理解抽象图像是有意义的(relevant)。
    Abstract We wish to define the limits of a classical classification model based on deep learning when applied to abstract images, which do not represent visually identifiable objects.QR codes (Quick Response codes) fall into this category of abstract images: one bit corresponding to one encoded character, QR codes were not designed to be decoded manually. To understand the limitations of a deep learning-based model for abstract image classification, we train an image classification model on QR codes generated from information obtained when reading a health pass. We compare a classification model with a classical (deterministic) decoding method in the presence of noise. This study allows us to conclude that a model based on deep learning can be relevant for the understanding of abstract images.
    摘要 我们希望界定基于深度学习的经典分类模型在抽象图像上的适用边界,这类图像不表示可视辨识的物体。二维码(快速响应码)正属于这类抽象图像:一个比特对应一个被编码的字符,二维码并非为人工解码而设计。为了理解基于深度学习的模型在抽象图像分类中的局限性,我们在由读取健康通行证信息生成的QR码上训练图像分类模型,并在存在噪声的情况下与经典(确定性)解码方法进行比较。这项研究使我们得出结论:基于深度学习的模型对于理解抽象图像是有意义的。

A Survey of What to Share in Federated Learning: Perspectives on Model Utility, Privacy Leakage, and Communication Efficiency

  • paper_url: http://arxiv.org/abs/2307.10655
  • repo_url: None
  • paper_authors: Jiawei Shao, Zijian Li, Wenqiang Sun, Tailin Zhou, Yuchang Sun, Lumin Liu, Zehong Lin, Jun Zhang
  • for: 这篇论文从"共享什么"的视角出发,综述联邦学习(FL)中多方在保护数据隐私前提下协同训练的各种共享方式。
  • methods: 论文分析了不同类型的共享方法,包括模型共享、合成数据共享和知识共享。
  • results: 论文通过比较不同共享方法的性能和通信开销,并借助模型反演与成员推断攻击评估潜在的隐私泄露,给出了相应结论。
    Abstract Federated learning (FL) has emerged as a highly effective paradigm for privacy-preserving collaborative training among different parties. Unlike traditional centralized learning, which requires collecting data from each party, FL allows clients to share privacy-preserving information without exposing private datasets. This approach not only guarantees enhanced privacy protection but also facilitates more efficient and secure collaboration among multiple participants. Therefore, FL has gained considerable attention from researchers, promoting numerous surveys to summarize the related works. However, the majority of these surveys concentrate on methods sharing model parameters during the training process, while overlooking the potential of sharing other forms of local information. In this paper, we present a systematic survey from a new perspective, i.e., what to share in FL, with an emphasis on the model utility, privacy leakage, and communication efficiency. This survey differs from previous ones due to four distinct contributions. First, we present a new taxonomy of FL methods in terms of the sharing methods, which includes three categories of shared information: model sharing, synthetic data sharing, and knowledge sharing. Second, we analyze the vulnerability of different sharing methods to privacy attacks and review the defense mechanisms that provide certain privacy guarantees. Third, we conduct extensive experiments to compare the performance and communication overhead of various sharing methods in FL. Besides, we assess the potential privacy leakage through model inversion and membership inference attacks, while comparing the effectiveness of various defense approaches. Finally, we discuss potential deficiencies in current methods and outline future directions for improvement.
    摘要 联邦学习(Federated Learning,FL)已成为一种在多方之间进行隐私保护协同训练的高效范式。与需要收集各方数据的传统集中式学习不同,FL允许客户端在不暴露私有数据集的前提下共享经隐私保护处理的信息。这种方式不仅能提供更强的隐私保护,还能促进多个参与者之间更高效、更安全的协作。因此,FL吸引了大量研究者的关注,催生了许多相关综述。然而,这些综述大多聚焦于训练过程中共享模型参数的方法,而忽视了共享其他形式本地信息的潜力。在本文中,我们从一个新的视角——"在FL中共享什么"——给出了系统性综述,并重点关注模型效用、隐私泄露和通信效率。与以往综述相比,本文有四个不同的贡献:1. 我们提出了一个按共享方式划分FL方法的新分类体系,将共享信息分为三类:模型共享、合成数据共享和知识共享。2. 我们分析了不同共享方式面对隐私攻击的脆弱性,并回顾了能够提供一定隐私保证的防御机制。3. 我们进行了大量实验,比较了各种共享方式在FL中的性能和通信开销;同时通过模型反演和成员推断攻击评估潜在的隐私泄露,并比较了各种防御方法的有效性。4. 我们讨论了现有方法的潜在不足,并展望了未来的改进方向。总之,本文提供了一个系统性的综述,帮助读者更好地理解在FL中应共享哪些信息,以及这些信息在模型效用、隐私泄露和通信效率方面的影响。

Conditional expectation network for SHAP

  • paper_url: http://arxiv.org/abs/2307.10654
  • repo_url: None
  • paper_authors: Ronald Richman, Mario V. Wüthrich
  • for: 这个研究旨在提出一种能够有效地计算Conditional SHAP值的神经网络方法,以便在神经网络和其他回归模型中使用。
  • methods: 这种方法使用了SHAP技术,并且特别考虑了特征组件之间的依赖关系。
  • results: 这种方法可以高效且恰当地计算条件SHAP值,并可在复杂回归模型中提供drop1和anova分析,以及一种考虑特征分量间依赖结构的部分依赖图(PDP)对应物。
    Abstract A very popular model-agnostic technique for explaining predictive models is the SHapley Additive exPlanation (SHAP). The two most popular versions of SHAP are a conditional expectation version and an unconditional expectation version (the latter is also known as interventional SHAP). Except for tree-based methods, usually the unconditional version is used (for computational reasons). We provide a (surrogate) neural network approach which allows us to efficiently calculate the conditional version for both neural networks and other regression models, and which properly considers the dependence structure in the feature components. This proposal is also useful to provide drop1 and anova analyses in complex regression models which are similar to their generalized linear model (GLM) counterparts, and we provide a partial dependence plot (PDP) counterpart that considers the right dependence structure in the feature components.
    摘要 SHapley Additive exPlanation(SHAP)是一种非常流行的模型无关解释技术。SHAP最常用的两个版本是条件期望版本和无条件期望版本(后者也称为干预式SHAP,interventional SHAP)。除基于树的方法外,出于计算上的原因,通常使用的是无条件版本。我们提出了一种(代理)神经网络方法,能够为神经网络及其他回归模型高效地计算条件版本,并恰当地考虑特征分量之间的依赖结构。该方法还可用于在复杂回归模型中给出与广义线性模型(GLM)对应物类似的drop1和anova分析,并给出一个考虑了特征分量间正确依赖结构的部分依赖图(PDP)对应物。

Refining the Optimization Target for Automatic Univariate Time Series Anomaly Detection in Monitoring Services

  • paper_url: http://arxiv.org/abs/2307.10653
  • repo_url: None
  • paper_authors: Manqing Dong, Zhanxiang Zhao, Yitong Geng, Wentao Li, Wei Wang, Huai Jiang
  • for: 这篇论文旨在提高时间序列异常检测的自动化程度,以提升工业监控服务的可靠性和系统性能。
  • methods: 本文提出了一个完整的自动超参数优化框架,包含三个优化目标:预测得分、形状得分和敏感度得分;这些目标无需先验知识或人工标注,即可便捷地适配不同的模型主干。
  • results: 所提框架已在线上稳定运行超过六个月,每分钟处理5万余条时间序列。它简化了用户体验,用户只需给出期望的敏感度取值即可获得理想的检测结果。
    Abstract Time series anomaly detection is crucial for industrial monitoring services that handle a large volume of data, aiming to ensure reliability and optimize system performance. Existing methods often require extensive labeled resources and manual parameter selection, highlighting the need for automation. This paper proposes a comprehensive framework for automatic parameter optimization in time series anomaly detection models. The framework introduces three optimization targets: prediction score, shape score, and sensitivity score, which can be easily adapted to different model backbones without prior knowledge or manual labeling efforts. The proposed framework has been successfully applied online for over six months, serving more than 50,000 time series every minute. It simplifies the user's experience by requiring only an expected sensitive value, offering a user-friendly interface, and achieving desired detection results. Extensive evaluations conducted on public datasets and comparison with other methods further confirm the effectiveness of the proposed framework.
    摘要 时序列异常检测是对于工业监测服务而言是非常重要的,旨在确保可靠性并优化系统性能。现有的方法经常需要大量的标注资源和手动参数选择,这高亮了自动化的需求。这篇论文提出了一个完整的自动参数优化框架 для时序列异常检测模型。该框架引入了三个优化目标:预测得分、形态得分和敏感度得分,可以轻松地适应不同的模型背景而无需互知或手动标注努力。该提议的框架已经在线上运行了超过六个月,处理了每分钟50,000个时序列,并提供了一个易用的用户界面,以及达到了检测结果的所求的目标。对于公共数据集和其他方法进行了广泛的评估,并证实了提议的效果。

Data-Driven Latency Probability Prediction for Wireless Networks: Focusing on Tail Probabilities

  • paper_url: http://arxiv.org/abs/2307.10648
  • repo_url: https://github.com/samiemostafavi/wireless-pr3d
  • paper_authors: Samie Mostafavi, Gourav Prateek Sharma, James Gross
  • for: Ensuring end-to-end network latency with extremely high reliability (99.999%) in wireless networks, particularly for cyber-physical systems and human-in-the-loop applications.
  • methods: Using state-of-the-art data-driven approaches, such as mixture density networks (MDN) and extreme value mixture models, to predict the tail of the latency distribution and estimate the likelihood of rare latencies conditioned on network parameters.
  • results: Benchmarking the proposed approaches using actual latency measurements of IEEE 802.11g (WiFi), commercial private, and a software-defined 5G network to evaluate their sensitivities concerning the tail probabilities.
    Abstract With the emergence of new application areas, such as cyber-physical systems and human-in-the-loop applications, there is a need to guarantee a certain level of end-to-end network latency with extremely high reliability, e.g., 99.999%. While mechanisms specified under IEEE 802.1as time-sensitive networking (TSN) can be used to achieve these requirements for switched Ethernet networks, implementing TSN mechanisms in wireless networks is challenging due to their stochastic nature. To conform the wireless link to a reliability level of 99.999%, the behavior of extremely rare outliers in the latency probability distribution, or the tail of the distribution, must be analyzed and controlled. This work proposes predicting the tail of the latency distribution using state-of-the-art data-driven approaches, such as mixture density networks (MDN) and extreme value mixture models, to estimate the likelihood of rare latencies conditioned on the network parameters, which can be used to make more informed decisions in wireless transmission. Actual latency measurements of IEEE 802.11g (WiFi), commercial private and a software-defined 5G network are used to benchmark the proposed approaches and evaluate their sensitivities concerning the tail probabilities.
    摘要 随着信息物理系统(cyber-physical systems)和人在环(human-in-the-loop)等新应用领域的出现,需要以极高的可靠性(如99.999%)保障端到端网络时延。对于交换式以太网,可以利用IEEE 802.1as时间敏感网络(TSN)所规定的机制来满足这些要求;但由于无线网络的随机特性,在无线网络中实现TSN机制十分困难。要使无线链路达到99.999%的可靠性水平,就必须分析并控制时延概率分布中极端罕见的离群值,即分布的尾部。本工作提出利用混合密度网络(MDN)和极值混合模型等最新的数据驱动方法来预测时延分布的尾部,估计在给定网络参数条件下出现罕见时延的可能性,从而为无线传输做出更明智的决策。我们使用IEEE 802.11g(WiFi)、商用专网以及软件定义5G网络的实测时延数据,对所提方法进行基准测试,并评估其对尾部概率的敏感性。
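
A small sketch of one of the tail models named above, a mixture density network: it maps (assumed) network parameters to a Gaussian mixture over latency, from which a tail probability P(latency > threshold) can be read off. The features, targets, and architecture are synthetic placeholders, not the measured WiFi/5G data or the paper's extreme value mixture model.

```python
# Sketch of a mixture density network for conditional latency distributions.
# Features, targets, and architecture are illustrative assumptions.
import torch
import torch.nn as nn

class MDN(nn.Module):
    def __init__(self, in_dim, n_comp=5, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.pi = nn.Linear(hidden, n_comp)           # mixture weights (logits)
        self.mu = nn.Linear(hidden, n_comp)           # component means
        self.log_sigma = nn.Linear(hidden, n_comp)    # component scales

    def forward(self, x):
        h = self.trunk(x)
        return torch.log_softmax(self.pi(h), -1), self.mu(h), self.log_sigma(h).exp()

def nll(log_pi, mu, sigma, y):
    comp = torch.distributions.Normal(mu, sigma).log_prob(y.unsqueeze(-1))
    return -torch.logsumexp(log_pi + comp, dim=-1).mean()

mdn = MDN(in_dim=3)
opt = torch.optim.Adam(mdn.parameters(), lr=1e-3)
x = torch.rand(1024, 3)                               # assumed condition features
y = 1.0 + x[:, 0] + torch.distributions.Exponential(4.0).sample((1024,))  # skewed latency

for _ in range(500):
    log_pi, mu, sigma = mdn(x)
    loss = nll(log_pi, mu, sigma, y)
    opt.zero_grad(); loss.backward(); opt.step()

# Tail probability for a new condition: P(latency > 3 | x)
log_pi, mu, sigma = mdn(torch.tensor([[0.5, 0.2, 0.8]]))
tail = (log_pi.exp() * (1 - torch.distributions.Normal(mu, sigma).cdf(torch.tensor(3.0)))).sum()
print(float(tail))
```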

Fisher-Rao distance and pullback SPD cone distances between multivariate normal distributions

  • paper_url: http://arxiv.org/abs/2307.10644
  • repo_url: None
  • paper_authors: Frank Nielsen
  • for: 该论文面向由多元正态分布构成的数据集,这类数据集广泛出现在扩散张量成像、结构张量计算机视觉、雷达信号处理、机器学习等领域。
  • methods: 论文提出了一种快速且稳健的方法来任意精细地近似多元正态分布之间的Fisher-Rao距离,并提出了一类基于将正态流形微分同胚地嵌入更高维对称正定(SPD)锥子流形的距离。
  • results: 结果表明,所提方法可以很好地近似基于Fisher信息度量的Fisher-Rao距离;而拉回的Hilbert锥距离在计算上比Fisher-Rao距离近似轻量得多,因为它只需要计算矩阵的最小和最大特征值。此外,论文还展示了如何在聚类任务中使用这些距离。
    Abstract Data sets of multivariate normal distributions abound in many scientific areas like diffusion tensor imaging, structure tensor computer vision, radar signal processing, machine learning, just to name a few. In order to process those normal data sets for downstream tasks like filtering, classification or clustering, one needs to define proper notions of dissimilarities between normals and paths joining them. The Fisher-Rao distance defined as the Riemannian geodesic distance induced by the Fisher information metric is such a principled metric distance which however is not known in closed-form excepts for a few particular cases. In this work, we first report a fast and robust method to approximate arbitrarily finely the Fisher-Rao distance between multivariate normal distributions. Second, we introduce a class of distances based on diffeomorphic embeddings of the normal manifold into a submanifold of the higher-dimensional symmetric positive-definite cone corresponding to the manifold of centered normal distributions. We show that the projective Hilbert distance on the cone yields a metric on the embedded normal submanifold and we pullback that cone distance with its associated straight line Hilbert cone geodesics to obtain a distance and smooth paths between normal distributions. Compared to the Fisher-Rao distance approximation, the pullback Hilbert cone distance is computationally light since it requires to compute only the extreme minimal and maximal eigenvalues of matrices. Finally, we show how to use those distances in clustering tasks.
    摘要 由多元正态分布构成的数据集在许多科学领域中十分常见,例如扩散张量成像、结构张量计算机视觉、雷达信号处理和机器学习等。为了对这些正态分布数据集进行滤波、分类或聚类等下游处理,需要定义正态分布之间恰当的差异度量以及连接它们的路径。由Fisher信息度量诱导的黎曼测地距离,即Fisher-Rao距离,就是这样一种有原则的度量距离;然而除少数特殊情形外,它没有闭式表达。在这项工作中,我们首先报告了一种快速且稳健的方法,可任意精细地近似多元正态分布之间的Fisher-Rao距离。其次,我们引入了一类基于微分同胚嵌入的距离:将正态流形嵌入更高维对称正定锥中对应中心化正态分布流形的子流形,证明锥上的Hilbert射影距离在嵌入的正态子流形上诱导出一个度量,并将该锥距离及其对应的直线Hilbert锥测地线拉回,从而得到正态分布之间的距离和光滑路径。与Fisher-Rao距离近似相比,拉回的Hilbert锥距离在计算上十分轻量,因为它只需要计算矩阵的最小和最大特征值。最后,我们展示了如何在聚类任务中使用这些距离。
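
The pullback distance relies on the Hilbert projective metric on the SPD cone, which needs only the extreme generalized eigenvalues of the matrix pair. The sketch below computes it for two covariance matrices (the centered-normal case); the paper's embedding of general normals with means into a higher-dimensional SPD cone is not reproduced, so this only illustrates the eigenvalue computation.

```python
# Sketch: Hilbert projective distance between two SPD matrices, which needs only
# the extreme generalized eigenvalues of the pair (A, B). Shown for covariance
# (centered-normal) matrices; the paper's lifting of general normals is omitted.
import numpy as np
from scipy.linalg import eigh

def hilbert_cone_distance(A, B):
    """d_H(A, B) = log( lambda_max / lambda_min ) of the pencil A v = lambda B v."""
    eigvals = eigh(A, B, eigvals_only=True)   # generalized eigenvalues, ascending
    return float(np.log(eigvals[-1] / eigvals[0]))

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3)); A = M @ M.T + np.eye(3)
M = rng.standard_normal((3, 3)); B = M @ M.T + np.eye(3)

print(hilbert_cone_distance(A, B))
print(hilbert_cone_distance(A, 2.0 * A))      # projective metric: scaling gives 0
```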

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10635
  • repo_url: https://github.com/mandyyyyii/scibench
  • paper_authors: Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, Wei Wang
  • for: This paper aims to evaluate the reasoning capabilities of large language models (LLMs) on complex scientific problem solving.
  • methods: The paper introduces an expansive benchmark suite called SciBench, which features two datasets: an open set of collegiate-level scientific problems and a closed set of undergraduate-level exams in computer science and mathematics. The authors evaluate the performance of two representative LLMs with various prompting strategies.
  • results: The results show that current LLMs have an overall score of merely 35.80% and make ten different types of errors. The authors find that no single prompting strategy significantly outperforms others, and some strategies that improve in certain problem-solving skills result in declines in other skills.
    Abstract Recent advances in large language models (LLMs) have demonstrated notable progress on many mathematical benchmarks. However, most of these benchmarks only feature problems grounded in junior and senior high school subjects, contain only multiple-choice questions, and are confined to a limited scope of elementary arithmetic operations. To address these issues, this paper introduces an expansive benchmark suite SciBench that aims to systematically examine the reasoning capabilities required for complex scientific problem solving. SciBench contains two carefully curated datasets: an open set featuring a range of collegiate-level scientific problems drawn from mathematics, chemistry, and physics textbooks, and a closed set comprising problems from undergraduate-level exams in computer science and mathematics. Based on the two datasets, we conduct an in-depth benchmark study of two representative LLMs with various prompting strategies. The results reveal that current LLMs fall short of delivering satisfactory performance, with an overall score of merely 35.80%. Furthermore, through a detailed user study, we categorize the errors made by LLMs into ten problem-solving abilities. Our analysis indicates that no single prompting strategy significantly outperforms others and some strategies that demonstrate improvements in certain problem-solving skills result in declines in other skills. We envision that SciBench will catalyze further developments in the reasoning abilities of LLMs, thereby ultimately contributing to scientific research and discovery.
    摘要 大型语言模型(LLM)的最新进展在许多数学基准上取得了显著进步。然而,这些基准大多只包含初高中学科的问题、仅含选择题,并且局限于有限的初等算术运算。为了解决这些问题,本文提出了一个大规模的基准套件SciBench,旨在系统性地考察复杂科学问题求解所需的推理能力。SciBench包含两个精心构建的数据集:一个开放集,收录来自数学、化学和物理教材的大学水平科学问题;一个封闭集,收录计算机科学与数学的本科考试题目。基于这两个数据集,我们采用多种提示策略对两个具有代表性的LLM进行了深入的基准研究。结果显示,当前的LLM表现仍不尽如人意,总体得分仅为35.80%。此外,通过细致的用户研究,我们将LLM所犯的错误归纳为十种问题求解能力。分析表明,没有任何一种提示策略能显著优于其他策略,而某些在特定问题求解能力上有所提升的策略,反而会导致其他能力的下降。我们期望SciBench能够推动LLM推理能力的进一步发展,最终助力科学研究与发现。

Generative Language Models on Nucleotide Sequences of Human Genes

  • paper_url: http://arxiv.org/abs/2307.10634
  • repo_url: https://github.com/boun-tabi/generativelm-genes
  • paper_authors: Musa Nuri Ihtiyar, Arzucan Ozgur
  • for: This paper focuses on developing an autoregressive generative language model for DNA sequences, specifically the nucleotide sequences of human genes.
  • methods: The authors use a systematic approach to examine the performance of different models, including RNNs and N-grams, and explore the use of real-life tasks beyond classical metrics such as perplexity.
  • results: The study finds that RNNs perform the best, and that selecting a language with a minimal vocabulary size does not significantly reduce the amount of data needed.
    Abstract Language models, primarily transformer-based ones, obtained colossal success in NLP. To be more precise, studies like BERT in NLU and works such as GPT-3 for NLG are very crucial. DNA sequences are very close to natural language in terms of structure, so if the DNA-related bioinformatics domain is concerned, discriminative models, like DNABert, exist. Yet, the generative side of the coin is mainly unexplored to the best of our knowledge. Consequently, we focused on developing an autoregressive generative language model like GPT-3 for DNA sequences. Because working with whole DNA sequences is challenging without substantial computational resources, we decided to carry out our study on a smaller scale, focusing on nucleotide sequences of human genes, unique parts in DNA with specific functionalities, instead of the whole DNA. This decision did not change the problem structure a lot due to the fact that both DNA and genes can be seen as 1D sequences consisting of four different nucleotides without losing much information and making too much simplification. First of all, we systematically examined an almost entirely unexplored problem and observed that RNNs performed the best while simple techniques like N-grams were also promising. Another beneficial point was learning how to work with generative models on languages we do not understand, unlike natural language. How essential using real-life tasks beyond the classical metrics such as perplexity is observed. Furthermore, checking whether the data-hungry nature of these models can be changed through selecting a language with minimal vocabulary size, four owing to four different types of nucleotides, is examined. The reason for reviewing this was that choosing such a language might make the problem easier. However, what we observed in this study was it did not provide that much of a change in the amount of data needed.
    摘要 语言模型(主要是基于Transformer的模型)在自然语言处理领域取得了巨大成功,例如自然语言理解中的BERT和自然语言生成中的GPT-3等工作都至关重要。DNA序列在结构上与自然语言十分接近,因此在与DNA相关的生物信息学领域已有DNABert等判别式模型;但据我们所知,生成式方向基本上仍未被探索。因此,我们着力为DNA序列开发一个类似GPT-3的自回归生成式语言模型。由于在缺乏大量计算资源的情况下处理完整的DNA序列十分困难,我们选择在较小的规模上开展研究,聚焦于人类基因的核苷酸序列——DNA中具有特定功能的独特片段——而非整条DNA。由于DNA和基因都可以被视为由四种核苷酸组成的一维序列,这一选择并未对问题结构造成太大改变,也不会损失太多信息或引入过度简化。我们首先系统地考察了这一几乎完全未被探索的问题,发现RNN表现最好,而N-gram等简单技术也颇有潜力;另一个收获是学会了如何在我们并不理解的"语言"上使用生成式模型,并认识到在经典的困惑度等指标之外引入真实任务进行评估的重要性。此外,我们还考察了选择词表规模极小(对应四种核苷酸)的语言能否改变这类模型对数据的高度依赖;我们原本认为这样的语言可能会让问题变得更容易,但研究结果表明,它并没有显著减少所需的数据量。
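
As a concrete illustration of the simplest baseline mentioned above, the sketch below trains a character-level n-gram model with add-one smoothing over the 4-letter nucleotide alphabet and samples a new sequence; the toy training strings are placeholders for real human gene sequences, and the RNN models from the paper are not shown.

```python
# Sketch of a character-level n-gram language model over the nucleotide alphabet,
# with add-one smoothing; the training strings are toy placeholders.
from collections import defaultdict
import random

ALPHABET = "ACGT"
N = 4   # n-gram order

def train_ngram(sequences, n=N):
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        padded = "^" * (n - 1) + seq          # start-of-sequence padding
        for i in range(len(seq)):
            ctx, nxt = padded[i:i + n - 1], padded[i + n - 1]
            counts[ctx][nxt] += 1
    return counts

def generate(counts, length=60, n=N, seed=0):
    rng = random.Random(seed)
    out, ctx = [], "^" * (n - 1)
    for _ in range(length):
        dist = counts.get(ctx, {})
        weights = [dist.get(c, 0) + 1 for c in ALPHABET]   # add-one smoothing
        nxt = rng.choices(ALPHABET, weights=weights)[0]
        out.append(nxt)
        ctx = (ctx + nxt)[-(n - 1):]
    return "".join(out)

toy_genes = ["ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", "ATGAAACGCATTAGCACCACC"]
model = train_ngram(toy_genes)
print(generate(model))
```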

Multi-Method Self-Training: Improving Code Generation With Text, And Vice Versa

  • paper_url: http://arxiv.org/abs/2307.10633
  • repo_url: None
  • paper_authors: Shriyash K. Upadhyay, Etan J. Ginsberg
  • for: 该论文旨在提出多方法自训练(Multi-Method Self-Training,MMST),以提升语言模型的性能和易用性。
  • methods: 论文使用一个同时在自然语言和代码上训练的176B参数模型,用一种方法经过滤的输出去训练另一种方法,从而相互增强各方法的优势并弥补其不足。
  • results: 论文表明,多方法自训练可以:1)提升较弱方法的性能(最高达30%),使模型更易用;2)提升较强方法的性能(最高达32.2%),使模型更强大;3)通过提升模型生成推理依据的能力,改善相关但不同任务的表现(最高达10.3%)。消融分析表明,MMST生成的数据多于传统自训练,但性能提升主要来自多种方法的联合使用。
    Abstract Large Language Models have many methods for solving the same problem. This introduces novel strengths (different methods may work well for different problems) and weaknesses (it may be difficult for users to know which method to use). In this paper, we introduce Multi-Method Self-Training (MMST), where one method is trained on the filtered outputs of another, allowing us to augment the strengths and ameliorate the weaknesses of each method. Using a 176B parameter model trained on both language and code, we show that MMST can 1) improve the less performant method (up to 30%) making the model easier to use, 2) improve the more performant method (up to 32.2%) making the model more performant, and 3) improve the performance of related but distinct tasks (up to 10.3%) by improving the ability of the model to generate rationales. We then conduct ablation analyses to explore why MMST works. We show that MMST generates more data than traditional self-training, but the improvement in performance is driven by the use of multiple methods. We also analyze prompt-engineering and anti-correlated performance between methods as means of making MMST more effective. We hope the evidence from our paper motivates machine learning researchers to explore ways in which advances in language models allow for new forms of training.
    摘要 大语言模型往往可以用多种方法求解同一问题,这既带来了新的优势(不同方法可能适用于不同的问题),也带来了新的劣势(用户可能难以判断该使用哪种方法)。在本文中,我们提出了多方法自训练(MMST):用一种方法经过滤的输出去训练另一种方法,从而增强各方法的优势并弥补其不足。我们使用一个同时在自然语言和代码上训练的176B参数模型,表明MMST可以:1)提升较弱方法的性能(最高达30%),使模型更易用;2)提升较强方法的性能(最高达32.2%),使模型更强大;3)提升相关但不同任务的表现(最高达10.3%),这得益于模型生成推理依据能力的增强。随后,我们进行了消融分析以探究MMST的有效原因:MMST生成的数据多于传统自训练,但性能的提升主要来自多种方法的联合使用。我们还分析了提示工程以及方法之间反相关的性能,作为让MMST更加有效的途径。我们希望本文的证据能促使机器学习研究者探索语言模型的进步所带来的新的训练方式。

Unmasking Falsehoods in Reviews: An Exploration of NLP Techniques

  • paper_url: http://arxiv.org/abs/2307.10617
  • repo_url: None
  • paper_authors: Anusuya Baby Hari Krishnan
  • for: 本研究旨在提出一种机器学习模型,用于识别评论中的欺骗性评论(deceptive reviews),并以餐厅评论为重点。
  • methods: 本研究采用n-gram模型与max features技术来有效识别欺骗性内容(尤其是虚假评论),并将两种不同的特征提取技术与五种机器学习分类算法进行基准比较。
  • results: 实验结果表明,被动攻击(Passive Aggressive)分类器在各算法中表现最佳,无论是在文本分类还是在虚假评论识别上都取得了最高准确率。此外,研究还引入了数据增强并应用多种深度学习技术,以进一步提升欺骗性评论检测的效果。
    Abstract In the contemporary digital landscape, online reviews have become an indispensable tool for promoting products and services across various businesses. Marketers, advertisers, and online businesses have found incentives to create deceptive positive reviews for their products and negative reviews for their competitors' offerings. As a result, the writing of deceptive reviews has become an unavoidable practice for businesses seeking to promote themselves or undermine their rivals. Detecting such deceptive reviews has become an intense and ongoing area of research. This research paper proposes a machine learning model to identify deceptive reviews, with a particular focus on restaurants. This study delves into the performance of numerous experiments conducted on a dataset of restaurant reviews known as the Deceptive Opinion Spam Corpus. To accomplish this, an n-gram model and max features are developed to effectively identify deceptive content, particularly focusing on fake reviews. A benchmark study is undertaken to explore the performance of two different feature extraction techniques, which are then coupled with five distinct machine learning classification algorithms. The experimental results reveal that the passive aggressive classifier stands out among the various algorithms, showcasing the highest accuracy not only in text classification but also in identifying fake reviews. Moreover, the research delves into data augmentation and implements various deep learning techniques to further enhance the process of detecting deceptive reviews. The findings shed light on the efficacy of the proposed machine learning approach and offer valuable insights into dealing with deceptive reviews in the realm of online businesses.
    摘要 在当今的数字环境中,在线评论已成为各类企业推广产品和服务不可或缺的工具。营销人员、广告商和在线商家有动机为自家产品撰写虚假的好评,并为竞争对手的产品撰写差评。因此,撰写欺骗性评论已成为企业自我推广或打压对手时难以避免的做法,而检测此类评论也随之成为一个持续的研究热点。本文提出了一种识别欺骗性评论的机器学习模型,并以餐厅评论为重点。研究基于名为Deceptive Opinion Spam Corpus的餐厅评论数据集开展了大量实验,构建了n-gram模型与max features来有效识别欺骗性内容,尤其是虚假评论;并对两种不同的特征提取技术与五种机器学习分类算法的组合进行了基准研究。实验结果表明,被动攻击(Passive Aggressive)分类器在各算法中表现突出,无论是文本分类还是虚假评论识别都取得了最高准确率。此外,研究还探讨了数据增强并应用多种深度学习技术,以进一步提升欺骗性评论检测的效果。研究结果展示了所提机器学习方法的有效性,并为在线商业领域应对欺骗性评论提供了有价值的见解。
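
A compact sketch of the described pipeline — n-gram bag-of-words features with a capped vocabulary feeding a Passive Aggressive classifier — using scikit-learn. The inline reviews and labels are placeholders for the Deceptive Opinion Spam Corpus, and the hyperparameters are illustrative.

```python
# Sketch: n-gram features with max_features, fed to a Passive Aggressive classifier.
# The tiny inline corpus is a placeholder for the Deceptive Opinion Spam Corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

reviews = [
    "The food was amazing and the staff were wonderful",        # truthful (placeholder)
    "Best restaurant ever everything was perfect and amazing",  # deceptive (placeholder)
    "Service was slow but the pasta was decent",
    "Absolutely flawless experience I will return every week",
]
labels = [0, 1, 0, 1]   # 0 = truthful, 1 = deceptive (illustrative only)

clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), max_features=5000),     # n-grams + max features
    PassiveAggressiveClassifier(max_iter=1000, random_state=0),
)
clf.fit(reviews, labels)
print(clf.predict(["everything was perfect and amazing and flawless"]))
```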

Heterogeneous Federated Learning: State-of-the-art and Research Challenges

  • paper_url: http://arxiv.org/abs/2307.10616
  • repo_url: https://github.com/marswhu/hfl_survey
  • paper_authors: Mang Ye, Xiuwen Fang, Bo Du, Pong C. Yuen, Dacheng Tao
  • for: 本文是一篇关于联邦学习(FL)在异构环境下的综述,总结了实际应用中面临的多种研究挑战以及现有的解决方案。
  • methods: 本文提出了一种新的分类体系,从数据层面、模型层面和服务器层面三个层次对现有方法进行归类,并讨论了若干关键的未来研究方向。
  • results: 通过对多种研究挑战和现有解决方案的分析,本文指出了若干关键且有前景的未来研究方向,有助于推动联邦学习领域的进一步发展。
    Abstract Federated learning (FL) has drawn increasing attention owing to its potential use in large-scale industrial applications. Existing federated learning works mainly focus on model homogeneous settings. However, practical federated learning typically faces the heterogeneity of data distributions, model architectures, network environments, and hardware devices among participant clients. Heterogeneous Federated Learning (HFL) is much more challenging, and corresponding solutions are diverse and complex. Therefore, a systematic survey on this topic about the research challenges and state-of-the-art is essential. In this survey, we firstly summarize the various research challenges in HFL from five aspects: statistical heterogeneity, model heterogeneity, communication heterogeneity, device heterogeneity, and additional challenges. In addition, recent advances in HFL are reviewed and a new taxonomy of existing HFL methods is proposed with an in-depth analysis of their pros and cons. We classify existing methods from three different levels according to the HFL procedure: data-level, model-level, and server-level. Finally, several critical and promising future research directions in HFL are discussed, which may facilitate further developments in this field. A periodically updated collection on HFL is available at https://github.com/marswhu/HFL_Survey.
    摘要 随着联合学习(Federated Learning,FL)的应用范围扩大,它在大规模企业应用场景中受到了越来越多的关注。现有的联合学习研究主要集中在模型同质 Settings中。然而,实际的联合学习往往面临参与客户端的数据分布、模型架构、网络环境和硬件设备之间的差异。这种差异的联合学习(Heterogeneous Federated Learning,HFL)是更加复杂和多样化的,需要相应的研究挑战和解决方案。因此,一篇系统性的调查研究在这个领域是非常重要的。在本调查中,我们首先总结了HFL中不同方面的研究挑战,包括统计差异、模型差异、通信差异、设备差异以及其他挑战。此外,我们还进行了现有HFL方法的回顾,并提出了一种新的分类方法,根据HFL过程的三级层次:数据层、模型层和服务器层。最后,我们还讨论了未来研究的一些重要和优先的方向,以便进一步发展这一领域。关于HFL的相关研究可以通过https://github.com/marswhu/HFL_Survey查看更新的集成。

Flatness-Aware Minimization for Domain Generalization

  • paper_url: http://arxiv.org/abs/2307.11108
  • repo_url: None
  • paper_authors: Xingxuan Zhang, Renzhe Xu, Han Yu, Yancheng Dong, Pengfei Tian, Peng Cui
  • for: 这篇研究旨在探讨领域泛化(Domain Generalization,DG)中的优化器选择问题。
  • methods: 本研究提出了一种新方法——面向领域泛化的平坦性感知最小化(Flatness-Aware Minimization for Domain Generalization,FAD),能够同时高效地优化零阶和一阶平坦性,以提升DG模型的泛化能力。
  • results: 实验结果显示,FAD在多种DG数据集上具有优越性,并且与其他零阶和一阶平坦性感知优化方法相比,能够找到更平坦的最优解。
    Abstract Domain generalization (DG) seeks to learn robust models that generalize well under unknown distribution shifts. As a critical aspect of DG, optimizer selection has not been explored in depth. Currently, most DG methods follow the widely used benchmark, DomainBed, and utilize Adam as the default optimizer for all datasets. However, we reveal that Adam is not necessarily the optimal choice for the majority of current DG methods and datasets. Based on the perspective of loss landscape flatness, we propose a novel approach, Flatness-Aware Minimization for Domain Generalization (FAD), which can efficiently optimize both zeroth-order and first-order flatness simultaneously for DG. We provide theoretical analyses of the FAD's out-of-distribution (OOD) generalization error and convergence. Our experimental results demonstrate the superiority of FAD on various DG datasets. Additionally, we confirm that FAD is capable of discovering flatter optima in comparison to other zeroth-order and first-order flatness-aware optimization methods.
    摘要 领域泛化(DG)旨在学习能够在未知分布偏移下良好泛化的鲁棒模型。作为DG的一个关键环节,优化器的选择尚未得到深入研究。目前,大多数DG方法沿用广泛使用的基准DomainBed,并对所有数据集默认采用Adam优化器。然而,我们发现对于当前多数DG方法和数据集而言,Adam未必是最优选择。基于损失地形平坦性的视角,我们提出了一种新方法——面向领域泛化的平坦性感知最小化(FAD),能够同时高效地优化零阶和一阶平坦性。我们对FAD的分布外(OOD)泛化误差和收敛性进行了理论分析。实验结果表明FAD在多个DG数据集上表现优异;此外,与其他零阶和一阶平坦性感知优化方法相比,FAD能够找到更平坦的最优解。
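
FAD itself jointly optimizes zeroth-order and first-order flatness; the sketch below only illustrates the generic first-order flatness-aware step it builds on (a SAM-style ascend-then-descend update) on a toy model, and should not be read as the paper's algorithm.

```python
# Sketch of a generic sharpness-aware (first-order flatness) update: perturb the
# weights along the normalized gradient, take the gradient at the perturbed point,
# then descend. Illustrative only; not the paper's FAD, which also handles
# zeroth-order flatness.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
rho = 0.05                                             # perturbation radius (assumed)

for step in range(100):
    # 1) gradient at the current weights
    loss_fn(model(x), y).backward()
    grads = [p.grad.clone() for p in model.parameters()]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # 2) ascend to the (approximate) worst-case nearby weights
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(rho * g / norm)

    # 3) gradient at the perturbed weights, undo the perturbation, then descend
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(rho * g / norm)
    opt.step()
    opt.zero_grad()
```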

Ensemble Learning based Anomaly Detection for IoT Cybersecurity via Bayesian Hyperparameters Sensitivity Analysis

  • paper_url: http://arxiv.org/abs/2307.10596
  • repo_url: None
  • paper_authors: Tin Lai, Farnaz Farid, Abubakar Bello, Fariza Sabrina
  • for: 本文旨在通过异常检测提升物联网(IoT)网络的安全性。
  • methods: 本文使用集成(ensemble)机器学习方法来提升异常检测的准确率,并利用贝叶斯超参数优化来适配包含多种IoT传感器读数的网络环境。
  • results: 实验结果表明,与传统方法相比,该方法具有更高的预测能力。
    Abstract The Internet of Things (IoT) integrates more than billions of intelligent devices over the globe with the capability of communicating with other connected devices with little to no human intervention. IoT enables data aggregation and analysis on a large scale to improve life quality in many domains. In particular, data collected by IoT contain a tremendous amount of information for anomaly detection. The heterogeneous nature of IoT is both a challenge and an opportunity for cybersecurity. Traditional approaches in cybersecurity monitoring often require different kinds of data pre-processing and handling for various data types, which might be problematic for datasets that contain heterogeneous features. However, heterogeneous types of network devices can often capture a more diverse set of signals than a single type of device readings, which is particularly useful for anomaly detection. In this paper, we present a comprehensive study on using ensemble machine learning methods for enhancing IoT cybersecurity via anomaly detection. Rather than using one single machine learning model, ensemble learning combines the predictive power from multiple models, enhancing their predictive accuracy in heterogeneous datasets rather than using one single machine learning model. We propose a unified framework with ensemble learning that utilises Bayesian hyperparameter optimisation to adapt to a network environment that contains multiple IoT sensor readings. Experimentally, we illustrate their high predictive power when compared to traditional methods.
    摘要 物联网(IoT)将全球数十亿智能设备连接起来,这些设备几乎无需人工干预即可与其他联网设备通信。IoT支持大规模的数据汇聚与分析,从而在许多领域提升生活质量;特别是,IoT收集的数据中蕴含着大量可用于异常检测的信息。IoT的异构特性对网络安全而言既是挑战也是机遇:传统的网络安全监控方法通常需要针对不同数据类型进行不同的预处理和处理,这对包含异构特征的数据集而言可能是个难题;但另一方面,多种类型的网络设备往往能够比单一类型设备捕获更丰富的信号,这对异常检测尤为有用。本文系统研究了利用集成机器学习方法通过异常检测增强IoT网络安全:集成学习将多个模型的预测能力结合起来,在异构数据集上比单一机器学习模型具有更高的预测精度。我们提出了一个结合贝叶斯超参数优化的统一集成学习框架,以适配包含多种IoT传感器读数的网络环境。实验表明,与传统方法相比,该方法具有更高的预测能力。
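
A sketch of the overall recipe — an ensemble classifier whose hyperparameters are tuned by Bayesian-style optimization — using scikit-learn models and Optuna's default TPE sampler. The synthetic imbalanced dataset stands in for heterogeneous IoT intrusion-detection features, and the search space is an illustrative assumption.

```python
# Sketch: tune a voting ensemble's hyperparameters with Bayesian-style
# optimization (Optuna's TPE sampler). Data and search space are placeholders.
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0)

def objective(trial):
    rf = RandomForestClassifier(
        n_estimators=trial.suggest_int("rf_trees", 50, 300),
        max_depth=trial.suggest_int("rf_depth", 3, 15),
        random_state=0,
    )
    gb = GradientBoostingClassifier(
        learning_rate=trial.suggest_float("gb_lr", 0.01, 0.3, log=True),
        n_estimators=trial.suggest_int("gb_trees", 50, 300),
        random_state=0,
    )
    ensemble = VotingClassifier([("rf", rf), ("gb", gb)], voting="soft")
    return cross_val_score(ensemble, X, y, cv=3, scoring="f1").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params, study.best_value)
```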

Forecasting Battery Electric Vehicle Charging Behavior: A Deep Learning Approach Equipped with Micro-Clustering and SMOTE Techniques

  • paper_url: http://arxiv.org/abs/2307.10588
  • repo_url: None
  • paper_authors: Hanif Tayarani, Trisha V. Ramadoss, Vaishnavi Karanam, Gil Tal, Christopher Nitta
  • for: 通过交通领域电气化降低排放与污染,这要求对电池电动车(BEV)的充电行为进行有效管理。
  • methods: 使用人工神经网络算法(微聚类深度神经网络,MCDNN),对BEV的出行与充电数据进行学习,以预测充电事件。
  • results: 与基准方法相比,MCDNN能够更准确地预测BEV充电事件。
    Abstract Energy systems, climate change, and public health are among the primary reasons for moving toward electrification in transportation. Transportation electrification is being promoted worldwide to reduce emissions. As a result, many automakers will soon start making only battery electric vehicles (BEVs). BEV adoption rates are rising in California, mainly due to climate change and air pollution concerns. While great for climate and pollution goals, improperly managed BEV charging can lead to insufficient charging infrastructure and power outages. This study develops a novel Micro Clustering Deep Neural Network (MCDNN), an artificial neural network algorithm that is highly effective at learning BEVs trip and charging data to forecast BEV charging events, information that is essential for electricity load aggregators and utility managers to provide charging stations and electricity capacity effectively. The MCDNN is configured using a robust dataset of trips and charges that occurred in California between 2015 and 2020 from 132 BEVs, spanning 5 BEV models for a total of 1570167 vehicle miles traveled. The numerical findings revealed that the proposed MCDNN is more effective than benchmark approaches in this field, such as support vector machine, k nearest neighbors, decision tree, and other neural network-based models in predicting the charging events.
    摘要 能源系统、气候变化和公共卫生是推动交通电气化的主要原因。为减少排放,世界各地都在推动交通电气化,许多汽车制造商很快将只生产电池电动车(BEV)。在加利福尼亚州,BEV的普及率正在上升,这主要源于对气候变化和空气污染的关注。这虽然有利于气候与污染目标,但管理不当的BEV充电可能导致充电基础设施不足甚至停电。本研究开发了一种新颖的微聚类深度神经网络(MCDNN),该人工神经网络算法能高效地学习BEV的出行与充电数据,从而预测BEV充电事件;这些信息对电力负荷聚合商和电网管理者有效部署充电站与电力容量至关重要。MCDNN使用2015至2020年间加州132辆BEV(涵盖5种车型、共计1,570,167英里行驶里程)的出行与充电数据集进行配置。数值结果表明,所提出的MCDNN在预测充电事件方面比支持向量机、k近邻、决策树以及其他基于神经网络的基准方法更为有效。

A Holistic Assessment of the Reliability of Machine Learning Systems

  • paper_url: http://arxiv.org/abs/2307.10586
  • repo_url: None
  • paper_authors: Anthony Corso, David Karamadian, Romeo Valentin, Mary Cooper, Mykel J. Kochenderfer
  • for: 本研究旨在对机器学习系统的可靠性进行整体评估,以满足其在高风险场景中日益广泛的应用需求。
  • methods: 本研究提出了一种整体评估机器学习系统可靠性的方法,涵盖五个关键属性:分布内准确率、分布偏移鲁棒性、对抗鲁棒性、校准性和分布外检测,并引入一个可靠性得分来评估整体系统可靠性。
  • results: 研究人员使用所提指标对500多个模型进行了评估,发现针对单一指标进行设计并不必然限制其他指标,而某些算法技术可以同时提升多个可靠性指标。这项研究为机器学习可靠性的全面理解和未来研发提供了一份路线图。
    Abstract As machine learning (ML) systems increasingly permeate high-stakes settings such as healthcare, transportation, military, and national security, concerns regarding their reliability have emerged. Despite notable progress, the performance of these systems can significantly diminish due to adversarial attacks or environmental changes, leading to overconfident predictions, failures to detect input faults, and an inability to generalize in unexpected scenarios. This paper proposes a holistic assessment methodology for the reliability of ML systems. Our framework evaluates five key properties: in-distribution accuracy, distribution-shift robustness, adversarial robustness, calibration, and out-of-distribution detection. A reliability score is also introduced and used to assess the overall system reliability. To provide insights into the performance of different algorithmic approaches, we identify and categorize state-of-the-art techniques, then evaluate a selection on real-world tasks using our proposed reliability metrics and reliability score. Our analysis of over 500 models reveals that designing for one metric does not necessarily constrain others but certain algorithmic techniques can improve reliability across multiple metrics simultaneously. This study contributes to a more comprehensive understanding of ML reliability and provides a roadmap for future research and development.
    摘要 随着机器学习(ML)系统日益渗透到医疗、交通、军事和国家安全等高风险场景,人们对其可靠性的担忧也随之显现。尽管已取得显著进展,这些系统的性能仍可能因对抗攻击或环境变化而大幅下降,导致过度自信的预测、无法检测输入故障,以及在意外情形下无法泛化。本文提出了一种针对ML系统可靠性的整体评估方法。我们的框架评估五个关键属性:分布内准确率、分布偏移鲁棒性、对抗鲁棒性、校准性和分布外检测,并引入一个可靠性得分来评估整体系统可靠性。为了洞察不同算法方案的表现,我们对当前最先进的技术进行了归类,并使用所提出的可靠性指标和可靠性得分在真实任务上评估了其中一部分方法。对500多个模型的分析表明,针对某一指标进行设计并不必然限制其他指标,而某些算法技术可以同时提升多个可靠性指标。这项研究有助于更全面地理解ML可靠性,并为未来的研究与开发提供了路线图。

Intelligent model for offshore China sea fog forecasting

  • paper_url: http://arxiv.org/abs/2307.10580
  • repo_url: None
  • paper_authors: Yanfei Xiang, Qinghong Zhang, Mingqing Wang, Ruixue Xia, Yang Kong, Xiaomeng Huang
  • for: 准确及时的海雾预报对有效管理沿海和海上经济活动至关重要。
  • methods: 本研究在数值天气预报模型中嵌入机器学习方法来预报海雾。在训练机器学习模型之前,我们使用时滞相关分析技术识别关键预测因子,并揭示海雾形成的内在机制;同时采用集成学习和焦点损失(focal loss)函数来应对数据不平衡问题,以提升模型的预测能力。
  • results: 基于机器学习的方法在为期一年的测试数据上表现出色,预报性能超过了WRF-NMM以及NOAA FSL开发的算法。具体而言,在提前60小时预报能见度不超过1公里的海雾时,该方法在提高检测率(POD)的同时降低了误报率(FAR)。
    Abstract Accurate and timely prediction of sea fog is very important for effectively managing maritime and coastal economic activities. Given the intricate nature and inherent variability of sea fog, traditional numerical and statistical forecasting methods are often proven inadequate. This study aims to develop an advanced sea fog forecasting method embedded in a numerical weather prediction model using the Yangtze River Estuary (YRE) coastal area as a case study. Prior to training our machine learning model, we employ a time-lagged correlation analysis technique to identify key predictors and decipher the underlying mechanisms driving sea fog occurrence. In addition, we implement ensemble learning and a focal loss function to address the issue of imbalanced data, thereby enhancing the predictive ability of our model. To verify the accuracy of our method, we evaluate its performance using a comprehensive dataset spanning one year, which encompasses both weather station observations and historical forecasts. Remarkably, our machine learning-based approach surpasses the predictive performance of two conventional methods, the weather research and forecasting nonhydrostatic mesoscale model (WRF-NMM) and the algorithm developed by the National Oceanic and Atmospheric Administration (NOAA) Forecast Systems Laboratory (FSL). Specifically, in regard to predicting sea fog with a visibility of less than or equal to 1 km with a lead time of 60 hours, our methodology achieves superior results by increasing the probability of detection (POD) while simultaneously reducing the false alarm ratio (FAR).
    摘要 准确及时的海雾预报对有效管理沿海和海上经济活动非常重要。鉴于海雾的复杂性和固有多变性,传统的数值和统计预报方法往往力不从心。本研究以长江口(YRE)沿海区域为例,旨在开发一种嵌入数值天气预报模型的先进海雾预报方法。在训练机器学习模型之前,我们采用时滞相关分析技术识别关键预测因子,并解析海雾发生的内在机制;此外,我们引入集成学习和焦点损失函数来应对数据不平衡问题,从而提升模型的预测能力。为验证方法的准确性,我们使用涵盖一整年的综合数据集(包括气象站观测和历史预报)对其性能进行评估。结果显示,我们基于机器学习的方法超越了两种传统方法——WRF-NMM(Weather Research and Forecasting nonhydrostatic mesoscale model)和美国国家海洋和大气管理局(NOAA)预报系统实验室(FSL)开发的算法。具体而言,在提前60小时预报能见度不超过1公里的海雾时,我们的方法在提高检测率(POD)的同时降低了误报率(FAR),取得了更优的结果。
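
Of the techniques named above, the focal loss for the rare fog class is easy to sketch; the version below is the standard binary focal loss, with alpha/gamma values chosen for illustration rather than taken from the paper.

```python
# Sketch of a binary focal loss for the rare-event (imbalanced) sea-fog class;
# alpha and gamma are illustrative assumptions.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy, well-classified examples."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                       # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.randn(8)
targets = torch.tensor([0., 0., 0., 0., 0., 0., 0., 1.])   # fog is the rare class
print(focal_loss(logits, targets))
```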

SecureBoost Hyperparameter Tuning via Multi-Objective Federated Learning

  • paper_url: http://arxiv.org/abs/2307.10579
  • repo_url: None
  • paper_authors: Ziyao Ren, Yan Kang, Lixin Fan, Linghua Yang, Yongxin Tong, Qiang Yang
  • for: 这项研究旨在提出一种名为约束多目标SecureBoost(Constrained Multi-Objective SecureBoost,CMOSB)的算法,用于在纵向联邦学习中选择最优的SecureBoost超参数,以在效用损失、训练成本和隐私泄露之间取得最优权衡。
  • methods: 该研究将SecureBoost算法与多目标进化算法(MOEA)相结合以寻找Pareto最优解,并提出了一种新的实例聚类攻击来量化隐私泄露。
  • results: 实验结果显示,CMOSB不仅能得到优于基线的超参数,还能找到可满足不同联邦学习参与者灵活需求的最优超参数集合。
    Abstract SecureBoost is a tree-boosting algorithm leveraging homomorphic encryption to protect data privacy in the vertical federated learning setting. It is widely used in fields such as finance and healthcare due to its interpretability, effectiveness, and privacy-preserving capability. However, SecureBoost suffers from high computational complexity and risk of label leakage. To harness the full potential of SecureBoost, its hyperparameters should be carefully chosen to strike an optimal balance between utility, efficiency, and privacy. Existing methods set hyperparameters either empirically or heuristically, which are far from optimal. To fill this gap, we propose a Constrained Multi-Objective SecureBoost (CMOSB) algorithm to find Pareto optimal solutions, each of which is a set of hyperparameters achieving an optimal tradeoff between utility loss, training cost, and privacy leakage. We design measurements of the three objectives; in particular, privacy leakage is measured using our proposed instance clustering attack. Experimental results demonstrate that CMOSB yields not only hyperparameters superior to the baseline but also optimal sets of hyperparameters that can support the flexible requirements of FL participants.
    摘要 secureboost是一种树融合算法,利用同质加密保护数据隐私在垂直联合学习设置下。它在金融和医疗等领域广泛应用,因为它具有可读性、效果和隐私保护能力。然而,secureboost受到高计算复杂性和标签泄露的风险。为了激活secureboost的全部潜力,secureboost的超参数应该仔细选择,以达到最佳的平衡点。现有的方法可以通过实验或规则来设置超参数,但这些方法远不够优化。为了填补这一空白,我们提出了一种受限multi-目标secureboost(CMOSB)算法,以找到Pareto优化解决方案,每个解决方案都是一组超参数,实现了Utility损失、训练成本和隐私泄露的优化平衡。我们设计了三个目标量表示。具体来说,隐私泄露被我们提出的实例划分攻击来度量。实验结果表明,CMOSB可以不仅提供超参数优于基准值,还可以找到优化的超参数集,以满足联合学习参与者的灵活要求。
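CMOSB searches for Pareto optimal hyperparameter settings over three objectives. The sketch below is not the CMOSB algorithm itself (which relies on a multi-objective evolutionary search and an instance clustering attack); it only illustrates the non-dominated (Pareto) filtering step over already-evaluated candidates, with every objective treated as a quantity to minimize:

```python
import numpy as np

def pareto_front(objectives):
    """Return indices of non-dominated candidates.

    `objectives` is an (n, k) array where every column is minimized,
    e.g. columns = (utility loss, training cost, privacy leakage).
    A point is dominated if another point is <= on all objectives
    and strictly < on at least one.
    """
    objectives = np.asarray(objectives, dtype=float)
    keep = []
    for i in range(objectives.shape[0]):
        others = np.delete(objectives, i, axis=0)
        dominated = np.any(
            np.all(others <= objectives[i], axis=1)
            & np.any(others < objectives[i], axis=1)
        )
        if not dominated:
            keep.append(i)
    return keep

# Toy candidates scored as (utility loss, training cost, privacy leakage).
scores = [(0.10, 30.0, 0.20), (0.12, 10.0, 0.30), (0.11, 35.0, 0.25)]
print(pareto_front(scores))   # the third candidate is dominated by the first
```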

Boosting Federated Learning Convergence with Prototype Regularization

  • paper_url: http://arxiv.org/abs/2307.10575
  • repo_url: None
  • paper_authors: Yu Qiao, Huy Q. Le, Choong Seon Hong
  • for: 这篇论文旨在提高 Federated Learning (FL) 中的模型性能,解决 Client 间资料不均匀问题。
  • methods: 本文提出了一种基于 Prototype 的调整策略,通过服务器将分布式 Client 的本地 Prototype 聚合成全局 Prototype,将其传回个别 Client 进行本地训练。
  • results: 实验结果显示,该方法在 MNIST 和 Fashion-MNIST 上分别取得了 3.3% 和 8.9% 的平均测试精度提升,优于最常用的 FedAvg 基线。此外,该方法在数据不均匀的环境下具有较快的收敛速度。
    Abstract As a distributed machine learning technique, federated learning (FL) requires clients to collaboratively train a shared model with an edge server without leaking their local data. However, the heterogeneous data distribution among clients often leads to a decrease in model performance. To tackle this issue, this paper introduces a prototype-based regularization strategy to address the heterogeneity in the data distribution. Specifically, the regularization process involves the server aggregating local prototypes from distributed clients to generate a global prototype, which is then sent back to the individual clients to guide their local training. The experimental results on MNIST and Fashion-MNIST show that our proposal achieves improvements of 3.3% and 8.9% in average test accuracy, respectively, compared to the most popular baseline FedAvg. Furthermore, our approach has a fast convergence rate in heterogeneous settings.
    摘要 为了解决客户端数据不均匀性的问题,本文提出了一种基于原型的规范约束策略,用于在分布式机器学习中协同训练共享模型。具体来说,规范过程包括将分布在各客户端上的本地原型由服务器进行汇总,生成一个全局原型,然后将该全局原型发送回到各个客户端,以供本地训练指导。实验结果表明,与最常用的基准方法FedAvg相比,我们的方案在MNIST和Fashion-MNIST两个预测集上平均测试精度提高3.3%和8.9%。此外,我们的方法在不均匀设置下具有快速收敛的特点。
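The prototype-based regularization described above has two simple pieces: the server averages per-class prototypes received from the clients, and each client adds a term that pulls its local features toward the returned global prototypes. A minimal sketch of both pieces follows; the function names and the squared-distance regularizer are illustrative assumptions, not the paper's exact formulation:

```python
import torch

def aggregate_prototypes(client_prototypes):
    """Server side: average per-class prototypes sent by the clients.

    `client_prototypes` is a list of {class_id: feature_vector} dicts;
    classes missing on a client are simply skipped.
    """
    sums, counts = {}, {}
    for protos in client_prototypes:
        for c, p in protos.items():
            sums[c] = sums.get(c, 0) + p
            counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}

def prototype_regularizer(features, labels, global_protos):
    """Client side: pull local features toward the global class prototypes."""
    target = torch.stack([global_protos[int(y)] for y in labels])
    return ((features - target) ** 2).sum(dim=1).mean()

# Toy example with 2-d features from two clients.
c1 = {0: torch.tensor([1.0, 0.0]), 1: torch.tensor([0.0, 1.0])}
c2 = {0: torch.tensor([0.8, 0.2])}
g = aggregate_prototypes([c1, c2])
feats = torch.tensor([[0.9, 0.1], [0.1, 0.9]])
print(prototype_regularizer(feats, torch.tensor([0, 1]), g))
```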

Deceptive Alignment Monitoring

  • paper_url: http://arxiv.org/abs/2307.10569
  • repo_url: None
  • paper_authors: Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo
  • for: 本研究旨在防止大机器学习模型的欺骗性行为,以及检测这些模型是否在不明确的目的下进行 modify 其行为。
  • methods: 本文提出了多个不同的机器学习子领域的研究方向,以检测和防止模型的欺骗性行为。
  • results: 本文认为,这些研究方向将在未来对检测和防止模型的欺骗性对齐行为起到关键作用,并呼吁对抗机器学习社区更多地参与这些新兴方向。
    Abstract As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves. The threat that a model might behave in a seemingly reasonable manner, while secretly and subtly modifying its behavior for ulterior reasons is often referred to as deceptive alignment in the AI Safety & Alignment communities. Consequently, we call this new direction Deceptive Alignment Monitoring. In this work, we identify emerging directions in diverse machine learning subfields that we believe will become increasingly important and intertwined in the near future for deceptive alignment monitoring, and we argue that advances in these fields present both long-term challenges and new research opportunities. We conclude by advocating for greater involvement by the adversarial machine learning community in these emerging directions.
    摘要 随着大机器学习模型的能力不断增长,以及这些模型的自主权力不断扩展,一个新的对手出现了:模型本身。这种威胁被称为“欺骗启动”(deceptive alignment)在AI安全与Alignment社区中。因此,我们将这一方向称为“欺骗启动监测”(Deceptive Alignment Monitoring)。在这种工作中,我们认为未来几年将成为更加重要和关键的方向,并且这些领域的进步将带来长期挑战和新的研究机遇。我们最终呼吁了对抗机器学习社区更加参与这些emerging方向。

FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation

  • paper_url: http://arxiv.org/abs/2307.10563
  • repo_url: None
  • paper_authors: Dhruv Pai, Andres Carranza, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo
  • for: 提高模型 robustness 和可见性,并应对 adversarial 攻击
  • methods: 基于 probablistic 和几何学的方法,探索 activation space 中 pseudo-class 的性质变化,找到 adversarial 攻击的源头
  • results: 提供了一种可靠的 anomaly detection 方法,可以帮助提高模型的安全性和可靠性,并应用于实际场景中
    Abstract We present FACADE, a novel probabilistic and geometric framework designed for unsupervised mechanistic anomaly detection in deep neural networks. Its primary goal is advancing the understanding and mitigation of adversarial attacks. FACADE aims to generate probabilistic distributions over circuits, which provide critical insights to their contribution to changes in the manifold properties of pseudo-classes, or high-dimensional modes in activation space, yielding a powerful tool for uncovering and combating adversarial attacks. Our approach seeks to improve model robustness, enhance scalable model oversight, and demonstrates promising applications in real-world deployment settings.
    摘要 我们提出 FACADE,一种新的概率与几何框架,用于对深度神经网络进行无监督的机制性异常检测,其主要目标是加深对对抗攻击的理解并加以缓解。FACADE 旨在生成电路上的概率分布,揭示它们对激活空间中伪类(即高维模式)流形性质变化的贡献,从而为发现和抵御对抗攻击提供有力工具。我们的方法致力于提升模型稳健性、增强可扩展的模型监督,并在实际部署场景中展现出有前景的应用。

Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples

  • paper_url: http://arxiv.org/abs/2307.10562
  • repo_url: None
  • paper_authors: Shaokui Wei, Mingda Zhang, Hongyuan Zha, Baoyuan Wu
  • for: 本研究探讨了如何利用少量干净数据净化被植入后门的机器学习模型。
  • methods: 本研究建立了后门风险与对抗风险之间的联系,推导出后门风险的一个新上界,并据此提出了一种新的双层优化问题,通过对抗训练技术来缓解后门攻击。
  • results: 实验表明,我们提出的方法在不同的 benchmark 数据集和网络架构上达到了 state-of-the-art 的后门防御性能。
    Abstract Backdoor attacks are serious security threats to machine learning models where an adversary can inject poisoned samples into the training set, causing a backdoored model which predicts poisoned samples with particular triggers to particular target classes, while behaving normally on benign samples. In this paper, we explore the task of purifying a backdoored model using a small clean dataset. By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk, which mainly captures the risk on the shared adversarial examples (SAEs) between the backdoored model and the purified model. This upper bound further suggests a novel bi-level optimization problem for mitigating backdoor using adversarial training techniques. To solve it, we propose Shared Adversarial Unlearning (SAU). Specifically, SAU first generates SAEs, and then, unlearns the generated SAEs such that they are either correctly classified by the purified model and/or differently classified by the two models, such that the backdoor effect in the backdoored model will be mitigated in the purified model. Experiments on various benchmark datasets and network architectures show that our proposed method achieves state-of-the-art performance for backdoor defense.
    摘要 本文探讨了使用少量干净数据净化被后门攻击的模型。我们建立了后门风险与对抗风险之间的联系,从而得出了一个新的后门风险上界,它主要刻画两个模型共享的对抗样本(SAEs)所带来的风险。基于该上界,我们提出了一种新的双层优化问题来缓解后门攻击,并称之为共享对抗遗忘(SAU)。SAU 首先生成 SAEs,然后对其进行遗忘,使其要么被净化后的模型正确分类,要么被两个模型给出不同的分类,从而在净化后的模型中削弱后门效应。实验结果表明,我们提出的方法在多个 benchmark 数据集和网络架构上达到了最先进的后门防御性能。

Post-variational quantum neural networks

  • paper_url: http://arxiv.org/abs/2307.10560
  • repo_url: None
  • paper_authors: Po-Wei Huang, Patrick Rebentrost
  • for: 本研究旨在应对当前量子硬件尚不足以执行容错量子算法的问题,并提高量子模型优化的效率。
  • methods: 本研究在混合量子-经典计算与变分算法的基础上提出“后变分策略”,即将可调参数从量子计算机转移到经典计算机上进行优化,并讨论了集成策略和构建单个量子电路的设计原则。
  • results: 本研究表明,后变分策略可以提高量子模型的优化效率,并可应用于手写数字识别等实际场景,达到 96% 的分类精度。
    Abstract Quantum computing has the potential to provide substantial computational advantages over current state-of-the-art classical supercomputers. However, current hardware is not advanced enough to execute fault-tolerant quantum algorithms. An alternative of using hybrid quantum-classical computing with variational algorithms can exhibit barren plateau issues, causing slow convergence of gradient-based optimization techniques. In this paper, we discuss "post-variational strategies", which shift tunable parameters from the quantum computer to the classical computer, opting for ensemble strategies when optimizing quantum models. We discuss various strategies and design principles for constructing individual quantum circuits, where the resulting ensembles can be optimized with convex programming. Further, we discuss architectural designs of post-variational quantum neural networks and analyze the propagation of estimation errors throughout such neural networks. Lastly, we show that our algorithm can be applied to real-world applications such as image classification on handwritten digits, producing a 96% classification accuracy.
    摘要 量子计算有潜力在计算上显著超越当前最先进的经典超级计算机。然而,现有硬件尚不足以执行容错量子算法。另一种做法是采用混合量子-经典计算与变分算法,但这会遭遇贫瘠高原问题,使基于梯度的优化收敛缓慢。本文讨论“后变分策略”,即把可调参数从量子计算机转移到经典计算机,并在优化量子模型时采用集成策略。我们讨论了构建单个量子电路的多种策略与设计原则,所得到的集成可以通过凸规划进行优化;我们还讨论了后变分量子神经网络的结构设计,并分析了估计误差在此类网络中的传播。最后,我们展示了该算法可以应用于手写数字图像分类等实际任务,取得 96% 的分类精度。
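In the post-variational strategy sketched above, the quantum circuits are fixed and all tunable parameters move to the classical side, so fitting the ensemble weights becomes a convex problem. The toy sketch below substitutes classical random cosine features for the circuit expectation values (an assumption made purely so the snippet runs without quantum hardware) and solves the resulting ridge least-squares problem:

```python
import numpy as np

rng = np.random.default_rng(0)

def circuit_features(X, n_circuits=20):
    """Stand-in for expectation values <O_k> from an ensemble of *fixed*
    circuits evaluated on each input; here: random cosine features."""
    W = rng.normal(size=(X.shape[1], n_circuits))
    b = rng.uniform(0, 2 * np.pi, size=n_circuits)
    return np.cos(X @ W + b)

X = rng.normal(size=(200, 4))
y = np.sign(X[:, 0] - 0.5 * X[:, 1])

Phi = circuit_features(X)
# Post-variational step: all tunable parameters live on the classical side,
# so fitting the ensemble weights is a convex (ridge least-squares) problem.
lam = 1e-2
alpha = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)

pred = np.sign(Phi @ alpha)
print("train accuracy:", (pred == y).mean())
```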

Air Traffic Controller Workload Level Prediction using Conformalized Dynamical Graph Learning

  • paper_url: http://arxiv.org/abs/2307.10559
  • repo_url: https://github.com/ymlasu/para-atm-collection
  • paper_authors: Yutian Pang, Jueming Hu, Christopher S. Lieber, Nancy J. Cooke, Yongming Liu
  • for: 预测空交控制员(ATCo)的工作负担,以提高航空业务操作的安全性和空间利用率。
  • methods: 与退役空管员开展人在回路(HITL)模拟,获取真实航空数据和工作负荷标签并进行分析;提出基于图的深度学习框架结合保形预测(conformal prediction)来识别空管员工作负荷水平。
  • results: 实验结果表明,除交通密度特征外,交通冲突特征(即最小水平/垂直间隔距离)也对工作负荷预测有贡献;利用图神经网络直接从空域的时空图结构中学习,比手工设计的交通复杂度特征取得更高的预测精度;保形预测是进一步提升预测精度的有用工具,可给出一个预测工作负荷标签的区间。
    Abstract Air traffic control (ATC) is a safety-critical service system that demands constant attention from ground air traffic controllers (ATCos) to maintain daily aviation operations. The workload of the ATCos can have negative effects on operational safety and airspace usage. To avoid overloading and ensure an acceptable workload level for the ATCos, it is important to predict the ATCos' workload accurately for mitigation actions. In this paper, we first perform a review of research on ATCo workload, mostly from the air traffic perspective. Then, we briefly introduce the setup of the human-in-the-loop (HITL) simulations with retired ATCos, where the air traffic data and workload labels are obtained. The simulations are conducted under three Phoenix approach scenarios while the human ATCos are requested to self-evaluate their workload ratings (i.e., low-1 to high-7). Preliminary data analysis is conducted. Next, we propose a graph-based deep-learning framework with conformal prediction to identify the ATCo workload levels. The number of aircraft under the controller's control varies both spatially and temporally, resulting in dynamically evolving graphs. The experiment results suggest that (a) besides the traffic density feature, the traffic conflict feature contributes to the workload prediction capabilities (i.e., minimum horizontal/vertical separation distance); (b) directly learning from the spatiotemporal graph layout of airspace with graph neural network can achieve higher prediction accuracy, compare to hand-crafted traffic complexity features; (c) conformal prediction is a valuable tool to further boost model prediction accuracy, resulting a range of predicted workload labels. The code used is available at \href{https://github.com/ymlasu/para-atm-collection/blob/master/air-traffic-prediction/ATC-Workload-Prediction/}{$\mathsf{Link}$}.
    摘要 空交控制(ATC)是一个安全关键的服务系统,需要地面空交控制员(ATCo)不断注意以维护每天的航空运输业务。ATCo的工作负担可能会对操作安全和空域使用产生负面影响。为了避免过载和确保ATCo的工作负担水平接受,需要准确预测ATCo的工作负担。在这篇论文中,我们首先进行了研究人员对ATCo工作负担的评估,主要来自空交 perspective。然后,我们 briefly introduce了人在Loop(HITL) simulations with retired ATCos,其中获取了空交数据和工作负担标签。 simulations were conducted under three Phoenix approach scenarios, while the human ATCos were asked to self-evaluate their workload ratings (i.e., low-1 to high-7). Preliminary data analysis was conducted. Next, we proposed a graph-based deep-learning framework with conformal prediction to identify the ATCo workload levels. The number of aircraft under the controller's control varies both spatially and temporally, resulting in dynamically evolving graphs. The experiment results suggest that (a) besides the traffic density feature, the traffic conflict feature also contributes to the workload prediction capabilities (i.e., minimum horizontal/vertical separation distance); (b) directly learning from the spatiotemporal graph layout of airspace with graph neural network can achieve higher prediction accuracy, compared to hand-crafted traffic complexity features; (c) conformal prediction is a valuable tool to further boost model prediction accuracy, resulting in a range of predicted workload labels. The code used is available at $\mathsf{Link}$.
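The paper reports using conformal prediction to turn point predictions of the 1-7 workload rating into a set of plausible labels. A generic split-conformal recipe on top of any classifier's predicted probabilities might look as follows; it uses three levels instead of seven and the standard 1 - p(true label) nonconformity score, and none of it is specific to the paper's graph model:

```python
import numpy as np

def conformal_label_set(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction sets from class probabilities.

    Nonconformity score = 1 - probability assigned to the true label.
    Returns, for each test row, the set of workload levels whose score
    is below the calibrated quantile, giving ~(1 - alpha) coverage.
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

# Toy example: 3 workload levels instead of the paper's 7.
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(3), size=50)
cal_labels = rng.integers(0, 3, size=50)
test_probs = rng.dirichlet(np.ones(3), size=2)
print(conformal_label_set(cal_probs, cal_labels, test_probs))
```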

SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer

  • paper_url: http://arxiv.org/abs/2307.10550
  • repo_url: https://github.com/0913ktg/sc_vall-e
  • paper_authors: Daegyeom Kim, Seongho Hong, Yong-Hoon Choi
  • For: The paper proposes a style control (SC) VALL-E model for expressive speech synthesis, which can generate diverse voices with controllable attributes such as emotion, speaking rate, pitch, and voice intensity.
  • Methods: The SC VALL-E model builds on the neural codec language model VALL-E, which follows the structure of the generative pretrained transformer 3 (GPT-3), and adds a newly designed style network to control the attributes of the generated speech. The model takes text sentences and prompt audio as input and is trained to generate controllable speech rather than simply mimicking the characteristics of the prompt audio.
  • Results: The paper conducts comparative experiments with three representative expressive speech synthesis models and measures word error rate (WER), F0 voiced error (FVE), and F0 gross pitch error (F0GPE) as evaluation metrics. The results show that SC VALL-E demonstrates competitive performance compared to the existing models and can generate a variety of expressive sounds with controllable attributes.
    Abstract Expressive speech synthesis models are trained by adding corpora with diverse speakers, various emotions, and different speaking styles to the dataset, in order to control various characteristics of speech and generate the desired voice. In this paper, we propose a style control (SC) VALL-E model based on the neural codec language model (called VALL-E), which follows the structure of the generative pretrained transformer 3 (GPT-3). The proposed SC VALL-E takes input from text sentences and prompt audio and is designed to generate controllable speech by not simply mimicking the characteristics of the prompt audio but by controlling the attributes to produce diverse voices. We identify tokens in the style embedding matrix of the newly designed style network that represent attributes such as emotion, speaking rate, pitch, and voice intensity, and design a model that can control these attributes. To evaluate the performance of SC VALL-E, we conduct comparative experiments with three representative expressive speech synthesis models: global style token (GST) Tacotron2, variational autoencoder (VAE) Tacotron2, and original VALL-E. We measure word error rate (WER), F0 voiced error (FVE), and F0 gross pitch error (F0GPE) as evaluation metrics to assess the accuracy of generated sentences. For comparing the quality of synthesized speech, we measure comparative mean option score (CMOS) and similarity mean option score (SMOS). To evaluate the style control ability of the generated speech, we observe the changes in F0 and mel-spectrogram by modifying the trained tokens. When using prompt audio that is not present in the training data, SC VALL-E generates a variety of expressive sounds and demonstrates competitive performance compared to the existing models. Our implementation, pretrained models, and audio samples are located on GitHub.
    摘要 干支表达 synthesis 模型通常通过添加具有多个说话者、不同情感和不同说话风格的文本 corpus 来训练,以控制不同特性的语音并生成感兴趣的声音。在这篇文章中,我们提出了一种基于 neural codec 语言模型(称为 VALL-E)的风格控制(SC) VALLE 模型。我们在 VALL-E 的结构上添加了一个新的风格网络,并在这个风格网络中标识了不同的特征表达,如情感、说话速度、音高和声音强度。我们设计了一个可控制这些特征的模型。为评估 SC VALL-E 的表现,我们进行了与三种常见的表达性语音合成模型进行比较:global style token(GST) Tacotron2、variational autoencoder(VAE) Tacotron2 和原始 VALL-E。我们使用 word error rate(WER)、F0 voiced error(FVE)和 F0 gross pitch error(F0GPE)作为评估 metric。为比较生成的语音质量,我们使用 comparative mean option score(CMOS)和 similarity mean option score(SMOS)。为评估生成的语音风格控制能力,我们观察了 F0 和 mel-spectrogram 的变化。当使用不在训练数据中的提示音时,SC VALL-E 能够生成多种表达性的声音,并与现有模型相比具有竞争力。我们的实现、预训练模型和声音样本位于 GitHub。

Differentially Flat Learning-based Model Predictive Control Using a Stability, State, and Input Constraining Safety Filter

  • paper_url: http://arxiv.org/abs/2307.10541
  • repo_url: https://github.com/utiasdsl/fmpc_socp
  • paper_authors: Adam W. Hall, Melissa Greeff, Angela P. Schoellig
  • for: learning-based optimal control algorithms for unknown systems
  • methods: exploits differential flatness; the nonlinear transformation is learned as a Gaussian process and used in a safety filter; control is computed via two successive convex optimizations
  • results: similar performance to state-of-the-art learning-based controllers with significantly better computational efficiency, while respecting flat state and input constraints and guaranteeing stability
    Abstract Learning-based optimal control algorithms control unknown systems using past trajectory data and a learned model of the system dynamics. These controllers use either a linear approximation of the learned dynamics, trading performance for faster computation, or nonlinear optimization methods, which typically perform better but can limit real-time applicability. In this work, we present a novel nonlinear controller that exploits differential flatness to achieve similar performance to state-of-the-art learning-based controllers but with significantly less computational effort. Differential flatness is a property of dynamical systems whereby nonlinear systems can be exactly linearized through a nonlinear input mapping. Here, the nonlinear transformation is learned as a Gaussian process and is used in a safety filter that guarantees, with high probability, stability as well as input and flat state constraint satisfaction. This safety filter is then used to refine inputs from a flat model predictive controller to perform constrained nonlinear learning-based optimal control through two successive convex optimizations. We compare our method to state-of-the-art learning-based control strategies and achieve similar performance, but with significantly better computational efficiency, while also respecting flat state and input constraints, and guaranteeing stability.
    摘要 基于学习的最优控制算法利用过去的轨迹数据和学习到的系统动力学模型来控制未知系统。这类控制器要么采用学习动力学的线性近似,以性能换取更快的计算,要么采用非线性优化方法,后者通常表现更好但会限制实时应用。本文提出一种新的非线性控制器,利用微分平坦性,在显著降低计算量的同时达到与最先进的基于学习的控制器相近的性能。微分平坦性是指非线性系统可以通过非线性输入映射被精确线性化的性质。这里的非线性变换通过高斯过程学习得到,并用于一个安全滤波器,以高概率保证稳定性以及输入和平坦状态约束的满足。该安全滤波器随后用于修正平坦模型预测控制器给出的输入,通过两次连续的凸优化实现带约束的、基于学习的非线性最优控制。与最先进的基于学习的控制策略相比,我们的方法取得了相近的性能,但计算效率显著更高,同时满足平坦状态和输入约束并保证稳定性。

The Extractive-Abstractive Axis: Measuring Content “Borrowing” in Generative Language Models

  • paper_url: http://arxiv.org/abs/2307.11779
  • repo_url: None
  • paper_authors: Nedelina Teneva
  • for: 本研究旨在探讨生成模型的抽象性和内容授权问题,并提出了EXTRACTIVE-ABSTRACTIVE轴来评估生成模型。
  • methods: 本研究使用了生成模型对文本数据进行生成和抽象,并对生成结果进行评估。
  • results: 研究发现,生成模型的抽象性和内容授权问题需要更加重视,并提出了对生成模型的评估指标、数据集和注解指南。
    Abstract Generative language models produce highly abstractive outputs by design, in contrast to extractive responses in search engines. Given this characteristic of LLMs and the resulting implications for content Licensing & Attribution, we propose the so-called Extractive-Abstractive axis for benchmarking generative models and highlight the need for developing corresponding metrics, datasets and annotation guidelines. We limit our discussion to the text modality.
    摘要 生成语言模型在设计上会产生高度抽象的输出,与搜索引擎的抽取式回复不同,这一特点对内容授权与归属有重要影响。我们提出“抽取-抽象”轴(Extractive-Abstractive axis)来评估生成模型,并强调需要开发相应的指标、数据集和标注指南。我们的讨论仅限于文本模态。

Fast Unsupervised Deep Outlier Model Selection with Hypernetworks

  • paper_url: http://arxiv.org/abs/2307.10529
  • repo_url: None
  • paper_authors: Xueying Ding, Yue Zhao, Leman Akoglu
  • for: 这篇论文针对无监督深度异常检测(Outlier Detection,OD)中的一个关键问题,即超参数(Hyperparameter,HP)的有效调优与模型选择。
  • methods: 本文提出了名为 HYPER 的方法,通过设计和训练一个新的超网络(Hypernetwork,HN),将 HP 映射到 OD 模型的最优权重;此外,HYPER 还利用带标签的历史 OD 任务进行元学习,训练一个代理验证函数,以便在无监督条件下高效地验证 OD 模型。
  • results: 实验结果显示,HYPER 在 35 个 OD 任务上实现了高性能,并相对 8 个基线方法取得了显著的效率优势。
    Abstract Outlier detection (OD) finds many applications with a rich literature of numerous techniques. Deep neural network based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, that is, effective hyperparameter (HP) tuning/model selection. While several prior work report the sensitivity of OD models to HPs, it becomes ever so critical for the modern DOD models that exhibit a long list of HPs. We introduce HYPER for tuning DOD models, tackling two fundamental challenges: (1) validation without supervision (due to lack of labeled anomalies), and (2) efficient search of the HP/model space (due to exponential growth in the number of HPs). A key idea is to design and train a novel hypernetwork (HN) that maps HPs onto optimal weights of the main DOD model. In turn, HYPER capitalizes on a single HN that can dynamically generate weights for many DOD models (corresponding to varying HPs), which offers significant speed-up. In addition, it employs meta-learning on historical OD tasks with labels to train a proxy validation function, likewise trained with our proposed HN efficiently. Extensive experiments on 35 OD tasks show that HYPER achieves high performance against 8 baselines with significant efficiency gains.
    摘要 异常检测(OD)应用广泛,而基于深度神经网络的异常检测(DOD)模型往往带有大量超参数,且由于缺乏标注异常而难以验证和调优。本文提出 HYPER 方法:训练一个将超参数映射到 DOD 模型最优权重的超网络,并利用带标签的历史 OD 任务元学习一个代理验证函数,从而高效地完成无监督超参数调优与模型选择。在 35 个 OD 任务上的实验表明,HYPER 相对 8 个基线方法取得了高性能和显著的效率提升。
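HYPER's central component is a hypernetwork that maps a hyperparameter vector to the weights of the detection model, so many candidate models can be instantiated without retraining each one. The sketch below shows only that mechanic on a tiny reconstruction-based outlier scorer; the training loss, the proxy validation function, and all dimensions are illustrative assumptions rather than the paper's setup:

```python
import torch
import torch.nn as nn

class HyperNet(nn.Module):
    """Maps a hyperparameter vector to the flat weights of a target model."""
    def __init__(self, hp_dim, target_num_weights, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hp_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, target_num_weights),
        )

    def forward(self, hp):
        return self.net(hp)

def target_forward(x, flat_w, in_dim, hidden_dim):
    """A tiny one-hidden-layer reconstruction model built from flat weights;
    its reconstruction error can serve as an outlier score."""
    i = 0
    W1 = flat_w[i:i + in_dim * hidden_dim].view(hidden_dim, in_dim); i += in_dim * hidden_dim
    b1 = flat_w[i:i + hidden_dim]; i += hidden_dim
    W2 = flat_w[i:i + hidden_dim * in_dim].view(in_dim, hidden_dim); i += hidden_dim * in_dim
    b2 = flat_w[i:i + in_dim]
    h = torch.relu(x @ W1.T + b1)
    return h @ W2.T + b2

in_dim, hidden_dim, hp_dim = 8, 4, 2
n_w = in_dim * hidden_dim + hidden_dim + hidden_dim * in_dim + in_dim
hyper = HyperNet(hp_dim, n_w)
opt = torch.optim.Adam(hyper.parameters(), lr=1e-3)

x = torch.randn(128, in_dim)
for _ in range(5):                               # a few illustrative steps
    hp = torch.rand(hp_dim)                      # sample a hyperparameter setting
    w = hyper(hp)                                # weights generated, not trained
    loss = ((target_forward(x, w, in_dim, hidden_dim) - x) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```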

Beyond Black-Box Advice: Learning-Augmented Algorithms for MDPs with Q-Value Predictions

  • paper_url: http://arxiv.org/abs/2307.10524
  • repo_url: None
  • paper_authors: Tongxin Li, Yiheng Lin, Shaolei Ren, Adam Wierman
  • for: 这篇论文研究在单轨迹时变 Markov 决策过程(MDP)中,面对不可信的机器学习建议时一致性与鲁棒性之间的权衡。
  • methods: 该论文基于 Q 值建议来刻画一致性与鲁棒性的权衡,所采用的一般 MDP 模型同时涵盖连续和离散的状态/动作空间。
  • results: 研究结果表明,利用 Q 值建议可以在建议不可信的情况下获得近似最优的性能保证,其性能可证明优于仅依赖黑盒建议所能达到的水平。
    Abstract We study the tradeoff between consistency and robustness in the context of a single-trajectory time-varying Markov Decision Process (MDP) with untrusted machine-learned advice. Our work departs from the typical approach of treating advice as coming from black-box sources by instead considering a setting where additional information about how the advice is generated is available. We prove a first-of-its-kind consistency and robustness tradeoff given Q-value advice under a general MDP model that includes both continuous and discrete state/action spaces. Our results highlight that utilizing Q-value advice enables dynamic pursuit of the better of machine-learned advice and a robust baseline, thus result in near-optimal performance guarantees, which provably improves what can be obtained solely with black-box advice.
    摘要 我们研究了在单轨时变Markov决策过程(MDP)中的一致性和可靠性的贸易。我们的工作与传统途径不同,即对建议视为黑盒来源的做法。相反,我们考虑了一种情况,在该情况下,建议的生成方式具有更多的信息。我们证明了一种首次的一致性和可靠性贸易,基于Q值建议在通用MDP模型中,该模型包括连续和离散状态/动作空间。我们的结果表明,通过利用Q值建议,可以在机器学习建议和一个可靠基础线上动态追求更好的性能,从而获得优化的性能保证,这与黑盒建议 alone 无法达到。

Prediction of Handball Matches with Statistically Enhanced Learning via Estimated Team Strengths

  • paper_url: http://arxiv.org/abs/2307.11777
  • repo_url: None
  • paper_authors: Florian Felice, Christophe Ley
  • for: 预测手球赛事
  • methods: 使用Statistically Enhanced Learning(SEL)模型,并与现有模型进行比较,以评估其性能能力
  • results: 模型的准确率高于80%,并且通过可解释方法提供了有价值的统计和预测性能分析,有助于手球队教练提前准备比赛。
    Abstract We propose a Statistically Enhanced Learning (aka. SEL) model to predict handball games. Our Machine Learning model augmented with SEL features outperforms state-of-the-art models with an accuracy beyond 80%. In this work, we show how we construct the data set to train Machine Learning models on past female club matches. We then compare different models and evaluate them to assess their performance capabilities. Finally, explainability methods allow us to change the scope of our tool from a purely predictive solution to a highly insightful analytical tool. This can become a valuable asset for handball teams' coaches providing valuable statistical and predictive insights to prepare future competitions.
    摘要 我们提出了一个统计增强学习(Statistically Enhanced Learning,SEL)模型,用于预测手球比赛。加入 SEL 特征后,我们的机器学习模型以超过 80% 的准确率超越了当前最佳模型。在这项工作中,我们介绍了如何利用过去的女子俱乐部比赛数据构建训练数据集,然后比较并评估了不同模型的性能。最后,可解释性方法使我们的工具从单纯的预测方案转变为一种富有洞察力的分析工具,可为手球队教练提供宝贵的统计和预测信息,帮助其备战未来的比赛。
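Statistically Enhanced Learning augments the raw covariates with statistically estimated quantities, here team strengths, before fitting a standard classifier. The sketch below uses a deliberately crude strength estimate (mean goal difference) and a logistic regression on three made-up matches; the actual strength model and feature set in the paper are of course richer:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy match history: (home_team, away_team, home_goals, away_goals).
history = [("A", "B", 30, 25), ("B", "C", 28, 28), ("A", "C", 31, 22)]

def team_strengths(matches):
    """Crude statistically estimated strength: mean goal difference per team."""
    diffs = {}
    for home, away, hg, ag in matches:
        diffs.setdefault(home, []).append(hg - ag)
        diffs.setdefault(away, []).append(ag - hg)
    return {t: float(np.mean(d)) for t, d in diffs.items()}

strength = team_strengths(history)

# SEL-style features: raw covariates augmented with the estimated strengths.
def match_features(home, away, home_advantage=1.0):
    return [strength[home] - strength[away], home_advantage]

X = np.array([match_features(h, a) for h, a, _, _ in history])
y = np.array([int(hg > ag) for _, _, hg, ag in history])   # 1 = home win

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([match_features("A", "B")]))
```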

FedSoup: Improving Generalization and Personalization in Federated Learning via Selective Model Interpolation

  • paper_url: http://arxiv.org/abs/2307.10507
  • repo_url: None
  • paper_authors: Minghui Chen, Meirui Jiang, Qi Dou, Zehua Wang, Xiaoxiao Li
  • for: 本研究旨在提高分布式学习(Federated Learning,FL)中模型的通用性和全局性,解决当面临分布shift时现有FL算法的负面效果。
  • methods: 我们提出了一种新的联邦模型汤(Federated Model Soup,FMS)方法,通过在联邦训练阶段对本地和全局模型进行选择性 interpolate 来优化本地和全局性之间的负面效果。
  • results: 我们在Retinal和病理图像分类任务上评估了我们的方法,并实现了显著提高对于非典型数据的泛化性。代码可以在https://github.com/ubc-tea/FedSoup中找到。
    Abstract Cross-silo federated learning (FL) enables the development of machine learning models on datasets distributed across data centers such as hospitals and clinical research laboratories. However, recent research has found that current FL algorithms face a trade-off between local and global performance when confronted with distribution shifts. Specifically, personalized FL methods have a tendency to overfit to local data, leading to a sharp valley in the local model and inhibiting its ability to generalize to out-of-distribution data. In this paper, we propose a novel federated model soup method (i.e., selective interpolation of model parameters) to optimize the trade-off between local and global performance. Specifically, during the federated training phase, each client maintains its own global model pool by monitoring the performance of the interpolated model between the local and global models. This allows us to alleviate overfitting and seek flat minima, which can significantly improve the model's generalization performance. We evaluate our method on retinal and pathological image classification tasks, and our proposed method achieves significant improvements for out-of-distribution generalization. Our code is available at https://github.com/ubc-tea/FedSoup.
    摘要 跨存储板 federated learning (FL) 可以开发机器学习模型在数据中心如医院和临床研究实验室等地的数据集上。然而,最近的研究发现,当面临分布变化时,当前的 FL 算法面临一种本地和全球性能之间的负权补偿。特别是,个性化 FL 方法有偏向本地数据过拟合的倾向,导致本地模型呈锐降谷,阻碍其在不同数据集上的泛化性能。在本文中,我们提出一种新的联邦模型汤 soup 方法(即选择性 interpolate 模型参数),以优化本地和全球性能之间的负权补偿。具体来说,在联邦训练阶段,每个客户端都会维护自己的全球模型池,并在本地和全球模型之间进行选择性 interpolate 模型参数。这有助于解决过拟合问题,寻找平降谷,可以显著提高模型的泛化性能。我们在Retinal和病理图像分类任务上评估了我们的方法,并取得了显著的外部数据集泛化性能改进。我们的代码可以在https://github.com/ubc-tea/FedSoup 中找到。
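The "model soup" idea in FedSoup boils down to interpolating local and global parameters and keeping whichever mixture does best on held-out data, which smooths out the sharp local minima mentioned above. A minimal sketch of that selection step follows; the candidate coefficients, the grid search, and `eval_fn` are illustrative, and the paper's selective interpolation during federated training is more involved:

```python
import copy
import torch

def interpolate_state_dicts(local_sd, global_sd, lam):
    """Parameter-wise convex combination of local and global weights."""
    return {k: (1 - lam) * local_sd[k] + lam * global_sd[k] for k in local_sd}

def select_soup(model, local_sd, global_sd, eval_fn,
                lambdas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Pick the interpolation coefficient that scores best on held-out data.

    `eval_fn(model)` should return a scalar, e.g. validation accuracy.
    """
    best_lam, best_score = None, -float("inf")
    for lam in lambdas:
        model.load_state_dict(interpolate_state_dicts(local_sd, global_sd, lam))
        score = eval_fn(model)
        if score > best_score:
            best_lam, best_score = lam, score
    model.load_state_dict(interpolate_state_dicts(local_sd, global_sd, best_lam))
    return best_lam, best_score

# Toy usage with a linear model and a dummy validation score.
model = torch.nn.Linear(4, 2)
local_sd = copy.deepcopy(model.state_dict())
global_sd = {k: torch.zeros_like(v) for k, v in local_sd.items()}
x, y = torch.randn(32, 4), torch.randint(0, 2, (32,))
eval_fn = lambda m: -torch.nn.functional.cross_entropy(m(x), y).item()
print(select_soup(model, local_sd, global_sd, eval_fn))
```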

Identifying Interpretable Subspaces in Image Representations

  • paper_url: http://arxiv.org/abs/2307.10504
  • repo_url: None
  • paper_authors: Neha Kalibhat, Shweta Bhardwaj, Bayan Bruss, Hamed Firooz, Maziar Sanjabi, Soheil Feizi
  • for: 本文旨在解释图像表示的特征,提高图像表示的可解释性。
  • methods: 本文使用了对比概念(Contrasting Concepts)来解释图像表示的特征。首先,使用大量captioning数据集(如LAION-400m)和预训练的视觉语言模型(如CLIP)来生成特征的描述。然后,对每个描述语言进行分数和排名,从而得到少量共享的人类可理解的概念,它们准确地描述了目标特征。此外,本文还使用了对比解释,使用低活跃图像(counterfactual)来消除幻觉的概念。
  • results: 研究发现,许多现有的方法只能独立解释特征的一部分,但是使用FALCON可以解释大型表示空间中的特征,并且可以通过高阶分数来解释特征。此外,本文还提出了一种将概念从一个可解释的表示空间传递到另一个未知表示空间的技术。
    Abstract We propose Automatic Feature Explanation using Contrasting Concepts (FALCON), an interpretability framework to explain features of image representations. For a target feature, FALCON captions its highly activating cropped images using a large captioning dataset (like LAION-400m) and a pre-trained vision-language model like CLIP. Each word among the captions is scored and ranked leading to a small number of shared, human-understandable concepts that closely describe the target feature. FALCON also applies contrastive interpretation using lowly activating (counterfactual) images, to eliminate spurious concepts. Although many existing approaches interpret features independently, we observe in state-of-the-art self-supervised and supervised models, that less than 20% of the representation space can be explained by individual features. We show that features in larger spaces become more interpretable when studied in groups and can be explained with high-order scoring concepts through FALCON. We discuss how extracted concepts can be used to explain and debug failures in downstream tasks. Finally, we present a technique to transfer concepts from one (explainable) representation space to another unseen representation space by learning a simple linear transformation.
    摘要 我们提出了自动Feature解释使用对比概念(FALCON)的解释框架,用于解释图像表示中的特征。为目标特征,FALCON使用大量captioningdataset(如LAION-400m)和预训练的视觉语言模型(如CLIP)来caption高度活跃的裁剪图像。每个单词在caption中被分数和排名,导致一小数量的共享、人类可理解的概念,准确地描述目标特征。此外,FALCON还使用对比解释使用低活跃(counterfactual)图像,以消除幻数概念。现有的方法大多解释特征独立,但我们发现在当今的自然语言和指导下的模型中, Less than 20% of the representation space can be explained by individual features。我们表明,在更大的空间中,特征在组合 изуча时变得更加解释,可以通过高级分数概念来解释。我们还讨论了抽取的概念如何用于解释和调试下游任务的失败。最后,我们提出了一种将概念从一个可解释的表示空间传输到另一个未知表示空间的学习简单线性变换的技术。
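At its core, FALCON scores candidate caption words by how well they match the highly activating image crops for a target feature while penalizing matches with lowly activating (counterfactual) crops, which suppresses spurious concepts. The sketch below assumes the word and crop embeddings already live in a shared vision-language space such as CLIP's (the toy 4-d vectors stand in for real embeddings) and shows only that contrastive scoring step:

```python
import numpy as np

def score_concepts(word_embs, high_crop_embs, low_crop_embs):
    """Rank candidate caption words for a target feature.

    A word scores high if it is similar to the highly activating crops
    but dissimilar to the lowly activating (counterfactual) ones.
    """
    def normalize(a):
        a = np.asarray(a, dtype=float)
        return a / np.linalg.norm(a, axis=-1, keepdims=True)

    W = normalize(word_embs)
    hi = normalize(high_crop_embs).mean(axis=0)
    lo = normalize(low_crop_embs).mean(axis=0)
    return W @ hi - W @ lo   # contrastive similarity per word

# Toy 4-d embeddings for three candidate words.
words = {"stripes": [1, 0, 0, 0], "grass": [0, 1, 0, 0], "sky": [0, 0, 1, 0]}
high = [[0.9, 0.1, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0]]
low = [[0.0, 0.9, 0.1, 0.0]]
scores = score_concepts(list(words.values()), high, low)
for w, s in sorted(zip(words, scores), key=lambda t: -t[1]):
    print(w, round(float(s), 3))
```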

A Competitive Learning Approach for Specialized Models: A Solution for Complex Physical Systems with Distinct Functional Regimes

  • paper_url: http://arxiv.org/abs/2307.10496
  • repo_url: https://github.com/exploita123/charmedforfree
  • paper_authors: Okezzi F. Ukorigho, Opeoluwa Owoyele
  • for: 该文章是为了提出一种新的竞争学习方法,用于获取基于数据的物理系统模型。
  • methods: 该方法使用动态损失函数,让一组模型在同一数据上同时训练并相互竞争,从而在数据中识别出不同的功能区间(functional regimes)。
  • results: 实验结果表明,该方法能够成功识别功能区间,发现真实的控制方程,并降低测试误差。
    Abstract Complex systems in science and engineering sometimes exhibit behavior that changes across different regimes. Traditional global models struggle to capture the full range of this complex behavior, limiting their ability to accurately represent the system. In response to this challenge, we propose a novel competitive learning approach for obtaining data-driven models of physical systems. The primary idea behind the proposed approach is to employ dynamic loss functions for a set of models that are trained concurrently on the data. Each model competes for each observation during training, allowing for the identification of distinct functional regimes within the dataset. To demonstrate the effectiveness of the learning approach, we coupled it with various regression methods that employ gradient-based optimizers for training. The proposed approach was tested on various problems involving model discovery and function approximation, demonstrating its ability to successfully identify functional regimes, discover true governing equations, and reduce test errors.
    摘要 科学和工程中的复杂系统有时会在不同的区间内表现出不同的行为。传统的全局模型难以刻画这种复杂行为的全部范围,限制了其对系统的准确表示。为应对这一挑战,我们提出一种新的竞争学习方法,用于获得物理系统的数据驱动模型。其核心思想是对一组同时在数据上训练的模型采用动态损失函数:在训练过程中,每个模型竞争每一个观测,从而识别出数据集中不同的功能区间。为验证该学习方法的有效性,我们将其与采用梯度优化器训练的多种回归方法相结合,并在多个模型发现与函数逼近问题上进行了测试,结果表明它能够成功识别功能区间、发现真实的控制方程并降低测试误差。
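One simple instance of the competitive, dynamic-loss idea is a winner-takes-all assignment: each observation contributes loss only to the model that currently fits it best, so the models gradually specialize to different functional regimes. The sketch below trains two small networks on a piecewise target this way; it is an illustrative variant, not the paper's exact loss:

```python
import torch
import torch.nn as nn

# Two specialists competing for a piecewise target: y = sin(x) for x<0, y = x for x>=0.
x = torch.linspace(-3, 3, 256).unsqueeze(1)
y = torch.where(x < 0, torch.sin(x), x)

models = nn.ModuleList(
    [nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1)) for _ in range(2)]
)
opt = torch.optim.Adam(models.parameters(), lr=1e-2)

for step in range(200):
    errors = torch.stack([(m(x) - y) ** 2 for m in models], dim=0)   # (2, N, 1)
    # Dynamic loss: each sample is claimed by whichever model currently fits it
    # best, so the models specialize to distinct functional regimes.
    winner = errors.detach().argmin(dim=0, keepdim=True)             # (1, N, 1)
    loss = errors.gather(0, winner).mean()
    opt.zero_grad(); loss.backward(); opt.step()

assignment = errors.detach().argmin(dim=0).squeeze()
print("samples claimed by model 0:", int((assignment == 0).sum()))
print("samples claimed by model 1:", int((assignment == 1).sum()))
```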

Novel Batch Active Learning Approach and Its Application to Synthetic Aperture Radar Datasets

  • paper_url: http://arxiv.org/abs/2307.10495
  • repo_url: https://github.com/chapman20j/sar_bal
  • paper_authors: James Chapman, Bohan Chen, Zheng Tan, Jeff Calder, Kevin Miller, Andrea L. Bertozzi
  • for: 这个论文主要针对的是sequential active learning方法在Synthetic Aperture Radar(SAR)数据集上的应用和提高。
  • methods: 这篇论文提出了一种新的两部分方法,包括Dijkstra的 Annulus Core-Set(DAC)和LocalMax,以便批处理活动学习。
  • results: 实验结果表明,这种批量主动学习方法可以达到与串行主动学习几乎相同的准确率,但效率更高,加速比与批量大小成正比。此外,基于该方法构建的流水线在 FUSAR-Ship 和 OpenSARShip 数据集的分类任务上超越了最先进的基于 CNN 的方法。
    Abstract Active learning improves the performance of machine learning methods by judiciously selecting a limited number of unlabeled data points to query for labels, with the aim of maximally improving the underlying classifier's performance. Recent gains have been made using sequential active learning for synthetic aperture radar (SAR) data arXiv:2204.00005. In each iteration, sequential active learning selects a query set of size one while batch active learning selects a query set of multiple datapoints. While batch active learning methods exhibit greater efficiency, the challenge lies in maintaining model accuracy relative to sequential active learning methods. We developed a novel, two-part approach for batch active learning: Dijkstra's Annulus Core-Set (DAC) for core-set generation and LocalMax for batch sampling. The batch active learning process that combines DAC and LocalMax achieves nearly identical accuracy as sequential active learning but is more efficient, proportional to the batch size. As an application, a pipeline is built based on transfer learning feature embedding, graph learning, DAC, and LocalMax to classify the FUSAR-Ship and OpenSARShip datasets. Our pipeline outperforms the state-of-the-art CNN-based methods.
    摘要 主动学习通过有选择地挑选少量未标注数据点进行标注查询,以最大化底层分类器的性能,从而提升机器学习方法的效果。最近,串行主动学习在合成孔径雷达(SAR)数据上取得了进展(arXiv:2204.00005)。串行主动学习每次迭代只选取一个查询点,而批量主动学习每次选取多个数据点。批量主动学习效率更高,但挑战在于如何保持与串行主动学习相当的模型准确率。我们提出了一种新的两阶段批量主动学习方法:用于核心集生成的 Dijkstra Annulus Core-Set(DAC)和用于批量采样的 LocalMax。结合 DAC 与 LocalMax 的批量主动学习过程可达到与串行主动学习几乎相同的准确率,且效率提升与批量大小成正比。作为应用,我们构建了基于迁移学习特征嵌入、图学习、DAC 和 LocalMax 的流水线,用于对 FUSAR-Ship 和 OpenSARShip 数据集进行分类,其性能超越了最先进的基于 CNN 的方法。
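The LocalMax idea, as described above, picks a batch of query points whose acquisition values are local maxima over a neighbourhood graph, which keeps the batch both informative and spread out. The sketch below is a simplified reading of that rule on a brute-force kNN graph with a random stand-in acquisition function; it omits the DAC core-set stage entirely:

```python
import numpy as np

def knn_graph(X, k=5):
    """Symmetric brute-force k-nearest-neighbour index lists."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def local_max_batch(acquisition, neighbors, batch_size):
    """Pick points whose acquisition value is a local maximum over their
    graph neighbourhood, keeping the highest-valued ones; local maxima
    are mutually spread out, which keeps the batch diverse."""
    is_local_max = acquisition >= acquisition[neighbors].max(axis=1)
    candidates = np.where(is_local_max)[0]
    order = candidates[np.argsort(-acquisition[candidates])]
    return order[:batch_size]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
acq = rng.random(200)                      # stand-in for an uncertainty score
batch = local_max_batch(acq, knn_graph(X), batch_size=5)
print(batch)
```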

Blockchain-Based Federated Learning: Incentivizing Data Sharing and Penalizing Dishonest Behavior

  • paper_url: http://arxiv.org/abs/2307.10492
  • repo_url: None
  • paper_authors: Amir Jaberzadeh, Ajay Kumar Shrestha, Faijan Ahamad Khan, Mohammed Afaan Shaikh, Bhargav Dave, Jason Geng
  • for: 本研究旨在提供一个整合数据信任的联邦学习框架,以便在多方合作下进行安全且公正的数据分享,并提供激励、存取控制机制和处罚不正行为。
  • methods: 本研究使用了InterPlanetary File System、区块链和智能合约来实现安全且可靠的数据分享,并将数据信任 integrate into federated learning,以提高联邦学习模型的准确性。
  • results: 实验结果显示,提案的模型能够提高联邦学习模型的准确性,并确保数据分享过程中的安全和公正。此外,研究者还发展了一个基于区块技术的分散式机器学习平台,能够在多方合作下训练 CNN 模型,并维护数据隐私和安全。
    Abstract With the increasing importance of data sharing for collaboration and innovation, it is becoming more important to ensure that data is managed and shared in a secure and trustworthy manner. Data governance is a common approach to managing data, but it faces many challenges such as data silos, data consistency, privacy, security, and access control. To address these challenges, this paper proposes a comprehensive framework that integrates data trust in federated learning with InterPlanetary File System, blockchain, and smart contracts to facilitate secure and mutually beneficial data sharing while providing incentives, access control mechanisms, and penalizing any dishonest behavior. The experimental results demonstrate that the proposed model is effective in improving the accuracy of federated learning models while ensuring the security and fairness of the data-sharing process. The research paper also presents a decentralized federated learning platform that successfully trained a CNN model on the MNIST dataset using blockchain technology. The platform enables multiple workers to train the model simultaneously while maintaining data privacy and security. The decentralized architecture and use of blockchain technology allow for efficient communication and coordination between workers. This platform has the potential to facilitate decentralized machine learning and support privacy-preserving collaboration in various domains.
    摘要 随着数据共享的重要性增加,保证数据的安全和可靠性变得越来越重要。数据治理是一种常见的数据管理方式,但它面临着数据孤岛、数据一致性、隐私、安全和访问控制等挑战。为了解决这些挑战,这篇论文提出了一个涵盖数据信任的 federated learning 框架,并与 InterPlanetary File System、区块链和智能合约结合,实现安全和互惠的数据分享,并提供了奖励、访问控制机制和惩戒任何不诚实行为。实验结果表明,提议的模型能够提高 federated learning 模型的准确率,同时保障数据分享的安全性和公平性。论文还描述了一个基于区块链技术的分布式 federated learning 平台,可以同时训练多个工作者的 CNN 模型,并保持数据隐私和安全性。该平台的分布式架构和使用区块链技术,可以实现高效的通信和协调。这种平台具有推动分布式机器学习和保持隐私协作的潜在潜力。

(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

  • paper_url: http://arxiv.org/abs/2307.10490
  • repo_url: https://github.com/ebagdasa/multimodal_injection
  • paper_authors: Eugene Bagdasaryan, Tsung-Yin Hsieh, Ben Nassi, Vitaly Shmatikov
  • for: 用图像和声音进行间接提示和指导 injection 攻击。
  • methods: 攻击者生成与目标提示相对应的对抗扰动,并将其混入图像或音频录音中。
  • results: 当用户就被扰动的图像或音频向(未被修改的)良性模型提问时,该扰动会引导模型输出攻击者选定的文本,并使后续对话遵循攻击者的指令。
    Abstract We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker's instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVa and PandaGPT.
    摘要 我们示示如何使用图像和声音进行间接提示和指令注入在多modal LLMS中。攻击者创造了这些提示的攻击扰动,并与图像或音频录音混合在一起。当用户对(未修改、良好)模型询问这些混合过的图像或音频时,攻击扰动将使模型输出攻击者选择的文本和/或导致后续对话按照攻击者的指令进行。我们透过多个证明例子,证明这种攻击可以对LLLaVa和PandaGPT进行。

SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

  • paper_url: http://arxiv.org/abs/2307.10488
  • repo_url: https://github.com/thakur-nandan/sprint
  • paper_authors: Nandan Thakur, Kexin Wang, Iryna Gurevych, Jimmy Lin
  • for: 这个论文主要是为了提供一个统一的 Python 工具集(SPRINT),用于评估基于含义搜索的神经稀缺检索模型。
  • methods: 这个论文使用了 Pyserini 和 Lucene 等工具来实现一个通用的接口,支持多种基于神经网络的稀缺检索模型。用户可以轻松地添加自己的定制模型,只需要定义权重方法即可。
  • results: 根据 authors 的实验结果,SPRINT 工具集可以在 BEIR 等 benchmark 上建立强大且可重复的零批稀缺检索基准。其中,SPLADEv2 模型在 BEIR 上的平均得分为 0.470 nDCG@10,胜过其他神经稀缺检索模型。 authors 还发现,SPLADEv2 生成的稀缺表示可以帮助其取得表现提升,其中大多数的字符出现在原始查询和文档之外。
    Abstract Traditionally, sparse retrieval systems relied on lexical representations to retrieve documents, such as BM25, dominated information retrieval tasks. With the onset of pre-trained transformer models such as BERT, neural sparse retrieval has led to a new paradigm within retrieval. Despite the success, there has been limited software supporting different sparse retrievers running in a unified, common environment. This hinders practitioners from fairly comparing different sparse models and obtaining realistic evaluation results. Another missing piece is, that a majority of prior work evaluates sparse retrieval models on in-domain retrieval, i.e. on a single dataset: MS MARCO. However, a key requirement in practical retrieval systems requires models that can generalize well to unseen out-of-domain, i.e. zero-shot retrieval tasks. In this work, we provide SPRINT, a unified Python toolkit based on Pyserini and Lucene, supporting a common interface for evaluating neural sparse retrieval. The toolkit currently includes five built-in models: uniCOIL, DeepImpact, SPARTA, TILDEv2 and SPLADEv2. Users can also easily add customized models by defining their term weighting method. Using our toolkit, we establish strong and reproducible zero-shot sparse retrieval baselines across the well-acknowledged benchmark, BEIR. Our results demonstrate that SPLADEv2 achieves the best average score of 0.470 nDCG@10 on BEIR amongst all neural sparse retrievers. In this work, we further uncover the reasons behind its performance gain. We show that SPLADEv2 produces sparse representations with a majority of tokens outside of the original query and document which is often crucial for its performance gains, i.e. a limitation among its other sparse counterparts. We provide our SPRINT toolkit, models, and data used in our experiments publicly here at https://github.com/thakur-nandan/sprint.
    摘要 传统上,稀疏搜寻系统从字词表现来进行文档搜寻,如BM25,对于搜寻任务产生了重要影响。随着预训 трансформа器模型,如BERT,神经稀疏搜寻带来了一个新的时代。不过,有限的软件支持不同的稀疏模型在一个共同环境中运行,导致实践者很难比较不同的稀疏模型,并获得实际的评估结果。另外,许多先前的工作仅对内部过滤进行评估,即在MS MARCO上进行内部过滤。但是,实际搜寻系统中需要模型能够对未见过的零数据类型进行推导,这是一个重要的需求。在这个研究中,我们提供了SPRINT,一个基于Pyserini和Lucene的Python工具组,支持一个共同的界面,用于评估神经稀疏搜寻。工具组目前包括五个内置模型:uniCOIL、DeepImpact、SPARTA、TILDEv2和SPLADEv2。用户可以轻松地添加自己定义的条件评估方法。使用我们的工具组,我们建立了强大且可重现的零数据类型神经稀疏搜寻基准,并在BEIR上取得了最好的平均分为0.470 nDCG@10。在这个研究中,我们进一步探索了SPLADEv2的表现原因,发现它生成的稀疏表现中,大多数的字词位于原始查询和文档之外,这经常是其表现优化的关键。我们提供了我们在这个研究中使用的SPRINT工具组、模型和数据,可以在https://github.com/thakur-nandan/sprint上取得。

FinGPT: Democratizing Internet-scale Data for Financial Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10485
  • repo_url: https://github.com/ai4finance-foundation/fingpt
  • paper_authors: Xiao-Yang Liu, Guoxuan Wang, Daochen Zha
  • for: FinGPT aims to democratize Internet-scale financial data for large language models (LLMs) to revolutionize the finance industry.
  • methods: FinGPT introduces an open-sourced and data-centric framework that automates the collection and curation of real-time financial data from diverse sources on the Internet.
  • results: FinGPT provides researchers and practitioners with accessible and transparent resources to develop their FinLLMs, and demonstrates several applications including robo-advisor, sentiment analysis for algorithmic trading, and low-code development.
    Abstract Large language models (LLMs) have demonstrated remarkable proficiency in understanding and generating human-like texts, which may potentially revolutionize the finance industry. However, existing LLMs often fall short in the financial field, which is mainly attributed to the disparities between general text data and financial text data. Unfortunately, there is only a limited number of financial text datasets available (quite small size), and BloombergGPT, the first financial LLM (FinLLM), is close-sourced (only the training logs were released). In light of this, we aim to democratize Internet-scale financial data for LLMs, which is an open challenge due to diverse data sources, low signal-to-noise ratio, and high time-validity. To address the challenges, we introduce an open-sourced and data-centric framework, \textit{Financial Generative Pre-trained Transformer (FinGPT)}, that automates the collection and curation of real-time financial data from >34 diverse sources on the Internet, providing researchers and practitioners with accessible and transparent resources to develop their FinLLMs. Additionally, we propose a simple yet effective strategy for fine-tuning FinLLM using the inherent feedback from the market, dubbed Reinforcement Learning with Stock Prices (RLSP). We also adopt the Low-rank Adaptation (LoRA, QLoRA) method that enables users to customize their own FinLLMs from open-source general-purpose LLMs at a low cost. Finally, we showcase several FinGPT applications, including robo-advisor, sentiment analysis for algorithmic trading, and low-code development. FinGPT aims to democratize FinLLMs, stimulate innovation, and unlock new opportunities in open finance. The codes are available at https://github.com/AI4Finance-Foundation/FinGPT and https://github.com/AI4Finance-Foundation/FinNLP
    摘要 大型自然语言模型(LLM)已经表现出了人类语言理解和生成的很高水平,这可能会革命化金融业。然而,现有的LLM在金融领域经常缺乏表现,这主要归结于通用文本数据和金融文本数据之间的差异。尽管只有有限的金融文本数据集available(数据集较小),而BloombergGPT,首个金融LLM(FinLLM),则是关闭源的(只发布了训练日志)。为了普及互联网级金融数据 для LLM,这是一个开放的挑战,因为数据来源多样化、信号噪声比较低和时间有效性很高。为了解决这些挑战,我们提出了一个开源和数据中心的框架,名为金融生成预训练变换器(Financial Generative Pre-trained Transformer,FinGPT)。FinGPT自动收集和整理互联网上 >34 个不同来源的实时金融数据,为研究人员和实践者提供了可 accessible 和 transparent 的资源,以便开发自己的FinLLM。此外,我们还提出了一种简单 yet effective的RLSP(市场反馈强化学习)策略,可以通过市场的自然反馈来训练FinLLM。此外,我们采用了LoRA(低级适应)方法,允许用户自定义自己的FinLLM,从开源通用自然语言模型(NLM)中获得优秀的性能,而不需要大量的人工调整。FinGPT还提供了多种应用,包括智能投资、情感分析 для算法交易和低代码开发。FinGPT的代码可以在 上下载。FinGPT的目标是普及FinLLM,促进创新,并在开放金融中解锁新的机会。

Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting?

  • paper_url: http://arxiv.org/abs/2307.10472
  • repo_url: None
  • paper_authors: Omkar Dige, Jacob-Junqi Tian, David Emerson, Faiza Khan Khattak
  • for: 评估语言模型中的社会偏见
  • methods: 采用零批示评估语言模型的偏见识别能力
  • results: 结果显示,通过训练 Alpaca 7B 模型,可以达到 56.7% 的准确率,并且规模和数据多样性的扩展可能会带来更好的表现。
    Abstract As the breadth and depth of language model applications continue to expand rapidly, it is increasingly important to build efficient frameworks for measuring and mitigating the learned or inherited social biases of these models. In this paper, we present our work on evaluating instruction fine-tuned language models' ability to identify bias through zero-shot prompting, including Chain-of-Thought (CoT) prompts. Across LLaMA and its two instruction fine-tuned versions, Alpaca 7B performs best on the bias identification task with an accuracy of 56.7%. We also demonstrate that scaling up LLM size and data diversity could lead to further performance gain. This is a work-in-progress presenting the first component of our bias mitigation framework. We will keep updating this work as we get more results.
    摘要 随着语言模型应用的广度和深度快速扩展,建立高效的框架来衡量并缓解这些模型习得或继承的社会偏见变得日益重要。在这篇论文中,我们评估了指令微调语言模型通过零样本提示(包括思维链 Chain-of-Thought 提示)识别偏见的能力。在 LLaMA 及其两个指令微调版本中,Alpaca 7B 在偏见识别任务上表现最好,准确率为 56.7%。我们还表明,扩大 LLM 规模和数据多样性有望带来进一步的性能提升。这是一项进行中的工作,呈现了我们偏见缓解框架的首个组成部分,我们将随着结果的增多持续更新。
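Zero-shot bias identification of the kind evaluated above comes down to a prompt template, optionally with a chain-of-thought cue, sent to an instruction-tuned model. The wording below is an illustrative template, not the prompt used in the paper:

```python
def bias_identification_prompt(statement, chain_of_thought=True):
    """Build a zero-shot prompt asking an instruction-tuned model whether a
    statement expresses a social bias (wording is illustrative only)."""
    instructions = (
        "Decide whether the following statement expresses a social bias "
        "(for example about gender, race, religion, or age). "
        "Answer with 'biased' or 'not biased'."
    )
    cot = "Let's think step by step before answering.\n" if chain_of_thought else ""
    return f"{instructions}\n\nStatement: {statement}\n\n{cot}Answer:"

print(bias_identification_prompt("Engineers are usually men."))
```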

Classification of Visualization Types and Perspectives in Patents

  • paper_url: http://arxiv.org/abs/2307.10471
  • repo_url: https://github.com/tibhannover/patentimageclassification
  • paper_authors: Junaid Ahmed Ghauri, Eric Müller-Budack, Ralph Ewerth
  • for: 本研究旨在提高专利申请检索与浏览的效率;专利中使用不同类型的可视化(如图表、技术制图)和视角来展示发明的细节。
  • methods: 本研究使用了现代深度学习方法,包括变换器,进行图像类型和视角的分类。我们也对CLEF-IP dataset进行扩展,并提供了手动标注的ground truth。
  • results: 实验结果表明了提案的方法的可行性。我们将源代码、模型和数据集公开发布。
    Abstract Due to the swift growth of patent applications each year, information and multimedia retrieval approaches that facilitate patent exploration and retrieval are of utmost importance. Different types of visualizations (e.g., graphs, technical drawings) and perspectives (e.g., side view, perspective) are used to visualize details of innovations in patents. The classification of these images enables a more efficient search and allows for further analysis. So far, datasets for image type classification miss some important visualization types for patents. Furthermore, related work does not make use of recent deep learning approaches including transformers. In this paper, we adopt state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images. We extend the CLEF-IP dataset for image type classification in patents to ten classes and provide manual ground truth annotations. In addition, we derive a set of hierarchical classes from a dataset that provides weakly-labeled data for image perspectives. Experimental results have demonstrated the feasibility of the proposed approaches. Source code, models, and dataset will be made publicly available.
    摘要 In this paper, we employ state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images. We expand the CLEF-IP dataset for image type classification in patents to ten classes and provide manual ground truth annotations. Furthermore, we derive a set of hierarchical classes from a dataset that provides weakly-labeled data for image perspectives. Experimental results have demonstrated the feasibility of the proposed approaches. The source code, models, and dataset will be publicly available.

Properties of Discrete Sliced Wasserstein Losses

  • paper_url: http://arxiv.org/abs/2307.10352
  • repo_url: None
  • paper_authors: Eloi Tanguy, Rémi Flamary, Julie Delon
  • for: 这个论文主要研究了 $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z) $ 的属性和优化性,其中 $\gamma_Y $ 和 $\gamma_Z $ 是两个 uniform 抽象概率分布。
  • methods: 这篇论文使用了多种方法,包括研究 $\mathcal{E} $ 的正则性和优化性,以及其 Monte-Carlo 采样 $\mathcal{E}_p $ 的渐近稳定性和 almost-sure uniform 收敛性。
  • results: 研究结果表明,在某些情况下,Stochastic Gradient Descent 方法可以减少 $\mathcal{E} $ 和 $\mathcal{E}_p $ 的优化问题,并且这些方法会收敛到(Clarke)优化点。
    Abstract The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the SW distance between two uniform discrete measures with the same amount of points as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.
    摘要 “划分 Wasserstein(SW)距离已成为比 Wasserstein 距离更受欢迎的选择,用于比较概率分布。它在图像处理、领域适应和生成模型中广泛应用,通常是寻找可以最小化 SW 的参数,以便作为这些参数的损失函数。这些优化问题都有同一个子问题,即寻找可以最小化 SW 能量。在这篇文章中,我们研究了 $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z) $ 的性质,其中 $Y \in \mathbb{R}^{n \times d}$ 是一个某个概率分布的支持。我们调查了这个能量的规律和优化性,以及其 Monte-Carlo 预估 $\mathcal{E}_p$ 的数值,并证明了这些点的均匀收摄和确定性。最后,我们显示了在某些意义上,使用 Stochastic Gradient Descent 方法优化 $\mathcal{E}$ 和 $\mathcal{E}_p$ 可以导向(Clarke)内部点的极值。”
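For two uniform discrete measures with the same number of points, the one-dimensional $W_2^2$ between their projections along a direction is simply the mean squared difference of the sorted projections, and $\mathrm{SW}_2^2$ averages this over random directions on the sphere. A direct Monte Carlo estimator of the energy $\mathcal{E}(Y) = \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$ studied above (the number of projections is an arbitrary choice here):

```python
import numpy as np

def sliced_wasserstein_sq(Y, Z, n_projections=500, rng=None):
    """Monte Carlo estimate of SW_2^2 between two uniform discrete measures.

    Y, Z: (n, d) supports with the same number of points. For each random
    direction theta, the 1-D W_2^2 between the projected samples is the
    mean squared difference of the sorted projections.
    """
    rng = np.random.default_rng(rng)
    n, d = Y.shape
    thetas = rng.normal(size=(n_projections, d))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    py = np.sort(Y @ thetas.T, axis=0)      # (n, n_projections), sorted per column
    pz = np.sort(Z @ thetas.T, axis=0)
    return float(((py - pz) ** 2).mean())

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 3))
Z = rng.normal(loc=1.0, size=(100, 3))
print(sliced_wasserstein_sq(Y, Z))
```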

A data science axiology: the nature, value, and risks of data science

  • paper_url: http://arxiv.org/abs/2307.10460
  • repo_url: None
  • paper_authors: Michael L. Brodie
  • for: 这篇论文是为了探讨数据科学的axiology,即其目的、性质、重要性、风险和价值,以帮助理解和定义数据科学,并找到其可能的利益和风险。
  • methods: 这篇论文使用了AXIOLOGY的方法来探讨数据科学的特点,包括其不可预测的性和AI的应用。
  • results: 这篇论文的结果表明,数据科学在知识发现方面具有很大的潜力和可能性,但同时也存在一些风险,例如不可预测的结果和AI的应用可能导致的不良影响。
    Abstract Data science is not a science. It is a research paradigm with an unfathomed scope, scale, complexity, and power for knowledge discovery that is not otherwise possible and can be beyond human reasoning. It is changing our world practically and profoundly already widely deployed in tens of thousands of applications in every discipline in an AI Arms Race that, due to its inscrutability, can lead to unfathomed risks. This paper presents an axiology of data science, its purpose, nature, importance, risks, and value for problem solving, by exploring and evaluating its remarkable, definitive features. As data science is in its infancy, this initial, speculative axiology is intended to aid in understanding and defining data science to recognize its potential benefits, risks, and open research challenges. AI based data science is inherently about uncertainty that may be more realistic than our preference for the certainty of science. Data science will have impacts far beyond knowledge discovery and will take us into new ways of understanding the world.
    摘要 “数据科学不是一种科学。它是一种研究方法论,具有未曾探索的范围、大小、复杂性和知识发现的力量,超出人类理解的限制。它正在改变我们的世界,已经广泛应用于万千个应用领域,在人工智能竞赛中投入了巨资。这篇论文提出了数据科学的axiology,其目的、本质、重要性、风险和价值,通过探究和评估其非凡的特点。由于数据科学处于其初期,这些初步的论据axiology的目的是帮助我们理解和定义数据科学,认识其潜在的利益、风险和研究挑战。AI基于的数据科学是一种不确定性,可能更加真实地反映我们对世界的理解。数据科学将对我们的世界产生深远的影响,将带我们进入新的理解世界。”

A New Computationally Simple Approach for Implementing Neural Networks with Output Hard Constraints

  • paper_url: http://arxiv.org/abs/2307.10459
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Andrei V. Konstantinov, Lev V. Utkin
  • for: 提出了一种新的计算简单的神经网络输出值约束方法。
  • methods: 使用了额外的神经网络层来实现约束,并将约束转换为神经网络输出值的限制。
  • results: 方法可以简单地扩展到受约束的输入输出问题,并且可以实现不同类型的约束,包括线性和二次约束、等式约束和动态约束。计算复杂度为O(n*m)和O(n^2*m)。数据实验 validate了该方法。
    Abstract A new computationally simple method of imposing hard convex constraints on the neural network output values is proposed. The key idea behind the method is to map a vector of hidden parameters of the network to a point that is guaranteed to be inside the feasible set defined by a set of constraints. The mapping is implemented by the additional neural network layer with constraints for output. The proposed method is simply extended to the case when constraints are imposed not only on the output vectors, but also on joint constraints depending on inputs. The projection approach to imposing constraints on outputs can simply be implemented in the framework of the proposed method. It is shown how to incorporate different types of constraints into the proposed method, including linear and quadratic constraints, equality constraints, and dynamic constraints, constraints in the form of boundaries. An important feature of the method is its computational simplicity. Complexities of the forward pass of the proposed neural network layer by linear and quadratic constraints are O(n*m) and O(n^2*m), respectively, where n is the number of variables, m is the number of constraints. Numerical experiments illustrate the method by solving optimization and classification problems. The code implementing the method is publicly available.
    摘要 本文提出一种计算上简单的新方法,用于对神经网络的输出值施加硬凸约束。其关键思想是将网络的隐藏参数向量映射到一个必定位于约束所定义的可行集内部的点上,该映射由一个带输出约束的附加神经网络层实现。所提方法可以直接推广到约束不仅施加于输出向量、还依赖于输入的联合约束情形;对输出施加约束的投影方法也可以在该框架内简单实现。该方法能够纳入多种类型的约束,包括线性和二次约束、等式约束、动态约束以及边界形式的约束。其重要特点是计算简单:在线性和二次约束下,所提网络层前向传播的复杂度分别为 O(n*m) 和 O(n^2*m),其中 n 为变量数,m 为约束数。数值实验通过求解优化和分类问题验证了该方法,实现代码已公开。
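The key idea above, an extra layer that maps hidden activations to a point guaranteed to lie in the feasible set, can be illustrated in the simplest setting where the feasible set is the convex hull of known vertices: softmax weights over the vertices yield a feasible point by construction. This toy layer is only one instance of the idea, not the paper's general construction for linear, quadratic, or dynamic constraints:

```python
import torch
import torch.nn as nn

class ConstrainedOutput(nn.Module):
    """Map hidden activations to a point that satisfies hard convex constraints.

    Toy case: the feasible set is the convex hull of known vertices V, so any
    softmax-weighted combination of the rows of V is feasible by construction.
    """
    def __init__(self, hidden_dim, vertices):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, vertices.shape[0])
        self.register_buffer("V", vertices)          # (num_vertices, out_dim)

    def forward(self, h):
        w = torch.softmax(self.proj(h), dim=-1)      # convex combination weights
        return w @ self.V                            # a point inside the hull

# Feasible set: the triangle with these three vertices in R^2.
V = torch.tensor([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
layer = ConstrainedOutput(hidden_dim=16, vertices=V)
y = layer(torch.randn(4, 16))
print(y, y.sum(dim=-1) <= 1.0 + 1e-6)                # outputs always lie in the triangle
```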

A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset

  • paper_url: http://arxiv.org/abs/2307.10455
  • repo_url: https://github.com/zahrag/BIOSCAN-1M
  • paper_authors: Zahra Gharaee, ZeMing Gong, Nicholas Pellegrino, Iuliia Zarubiieva, Joakim Bruslund Haurum, Scott C. Lowe, Jaclyn T. A. McKeown, Chris C. Y. Ho, Joschka McLeod, Yi-Yun C Wei, Jireh Agda, Sujeevan Ratnasingham, Dirk Steinke, Angel X. Chang, Graham W. Taylor, Paul Fieguth
  • for: This paper aims to provide a large dataset of hand-labelled insect images to train computer-vision models for taxonomic assessment, and to lay the foundation for a comprehensive survey of global biodiversity.
  • methods: The dataset, called BIOSCAN-Insect, includes raw nucleotide barcode sequences and assigned barcode index numbers for each record, and is primarily used to train computer-vision models for image-based taxonomic assessment.
  • results: The paper presents a million-image dataset with a long-tailed class-imbalance distribution and highly fine-grained classification problem at lower taxonomic levels, which provides a challenging task for image-based taxonomic classification.
    Abstract In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-Insect Dataset. Each record is taxonomically classified by an expert, and also has associated genetic information including raw nucleotide barcode sequences and assigned barcode index numbers, which are genetically-based proxies for species classification. This paper presents a curated million-image dataset, primarily to train computer-vision models capable of providing image-based taxonomic assessment, however, the dataset also presents compelling characteristics, the study of which would be of interest to the broader machine learning community. Driven by the biological nature inherent to the dataset, a characteristic long-tailed class-imbalance distribution is exhibited. Furthermore, taxonomic labelling is a hierarchical classification scheme, presenting a highly fine-grained classification problem at lower levels. Beyond spurring interest in biodiversity research within the machine learning community, progress on creating an image-based taxonomic classifier will also further the ultimate goal of all BIOSCAN research: to lay the foundation for a comprehensive survey of global biodiversity. This paper introduces the dataset and explores the classification task through the implementation and analysis of a baseline classifier.
    摘要 寻求catalóg insect多样性,我们提出一个新的大型手标注 insect 图像数据集,称为 BIOSCAN-Insect 数据集。每个记录都由专家taxonomically 分类,同时还有关联的遗传信息,包括 raw нуклеоти德核心序列和分配给每个物种的核心序列编号,这些是生物学基于的种类分类的代理。本文报道一个精心纪录 million 张图像数据集,主要用于训练计算机视觉模型,以提供图像基于的种类评估。然而,数据集还具有一些吸引人的特征,如生物学性的长尾分布和生物学分类系统的层次结构,这些特征都是机器学习社区的研究对象。 basis 的目标是建立一个全面的 global 生物多样性 监测系统,本文引入数据集并通过实现和分析基eline 分类器来探讨分类任务。

The importance of feature preprocessing for differentially private linear optimization

  • paper_url: http://arxiv.org/abs/2307.11106
  • repo_url: None
  • paper_authors: Ziteng Sun, Ananda Theertha Suresh, Aditya Krishna Menon
  • for: Investigates whether differentially private stochastic gradient descent (DPSGD) alone is sufficient to find a good minimizer for every dataset under privacy constraints.
  • methods: Studies DPSGD and its variants together with (private) feature preprocessing.
  • results: Shows that without feature preprocessing, DPSGD incurs a privacy error proportional to the maximum norm of the features over all samples. Proposes DPSGD-F, which combines DPSGD with feature preprocessing, and proves that for classification tasks it incurs a privacy error proportional to the diameter of the features. Demonstrates the practicality of the algorithm on image classification benchmarks.
    Abstract Training machine learning models with differential privacy (DP) has received increasing interest in recent years. One of the most popular algorithms for training differentially private models is differentially private stochastic gradient descent (DPSGD) and its variants, where at each step gradients are clipped and combined with some noise. Given the increasing usage of DPSGD, we ask the question: is DPSGD alone sufficient to find a good minimizer for every dataset under privacy constraints? As a first step towards answering this question, we show that even for the simple case of linear classification, unlike non-private optimization, (private) feature preprocessing is vital for differentially private optimization. In detail, we first show theoretically that there exists an example where without feature preprocessing, DPSGD incurs a privacy error proportional to the maximum norm of features over all samples. We then propose an algorithm called DPSGD-F, which combines DPSGD with feature preprocessing and prove that for classification tasks, it incurs a privacy error proportional to the diameter of the features $\max_{x, x' \in D} \|x - x'\|_2$. We then demonstrate the practicality of our algorithm on image classification benchmarks.
    摘要 训练机器学习模型 WITH differential privacy (DP) 在最近几年内得到了越来越多的关注。DP中最受欢迎的算法之一是差分隐私梯度下降 (DPSGD) 和其变体，在每步都将梯度clip和杂音结合在一起。随着 DPSGD 的使用越来越普遍，我们问：DPSGD 是否能够在隐私限制下找到每个数据集上的好最小值？作为回答的第一步，我们证明了非私有优化不同于隐私优化，private feature preprocessing 是必需的。在详细的证明中，我们证明了在 Linear classification 任务上，如果没有 feature preprocessing，DPSGD 会导致隐私错误与最大特征值的最大值成正比。我们提出了一个名为 DPSGD-F 的算法，它将 DPSGD 与 feature preprocessing 结合，并证明了在分类任务上，它的隐私错误与特征值的最大值成正比。最后，我们在图像分类标准 benchmark 上证明了我们的算法的实用性。
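For intuition, the following numpy sketch combines the two ingredients the abstract describes: a simple noisy feature-centering step (a stand-in for the paper's private preprocessing, not its actual procedure) followed by a DPSGD step with per-example clipping and Gaussian noise. The noise scales are placeholders rather than a calibrated privacy accountant.

```python
# Rough sketch of DPSGD with a hypothetical private feature-centering step,
# in the spirit of DPSGD-F. Noise scales are illustrative, not calibrated.
import numpy as np

rng = np.random.default_rng(0)

def private_center(X, noise_scale=1.0):
    """Center features around a noisy mean so the effective gradient norm is
    governed by the data diameter rather than the maximum feature norm."""
    noisy_mean = X.mean(axis=0) + rng.normal(0, noise_scale / len(X), X.shape[1])
    return X - noisy_mean

def dpsgd_step(w, X, y, clip=1.0, noise_mult=1.0, lr=0.1):
    """One DPSGD step for logistic loss: clip per-example gradients, add noise."""
    margins = y * (X @ w)
    per_ex_grads = (-y / (1 + np.exp(margins)))[:, None] * X      # (n, d)
    norms = np.linalg.norm(per_ex_grads, axis=1, keepdims=True)
    clipped = per_ex_grads / np.maximum(1.0, norms / clip)
    noisy_sum = clipped.sum(axis=0) + rng.normal(0, noise_mult * clip, w.shape)
    return w - lr * noisy_sum / len(X)

X = private_center(rng.normal(5.0, 1.0, size=(256, 4)))  # large offset removed
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=256))
w = np.zeros(4)
for _ in range(100):
    w = dpsgd_step(w, X, y)
print("accuracy:", np.mean(np.sign(X @ w) == y))
```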

Integrating a Heterogeneous Graph with Entity-aware Self-attention using Relative Position Labels for Reading Comprehension Model

  • paper_url: http://arxiv.org/abs/2307.10443
  • repo_url: None
  • paper_authors: Shima Foolad, Kourosh Kiani
  • for: Improve the ability of machine reading comprehension models to handle complex reasoning.
  • methods: Integrates reasoning knowledge derived from a heterogeneous graph into the transformer architecture through a novel attention pattern, without relying on external knowledge.
  • results: The model outperforms both the cutting-edge LUKE-Graph and the baseline LUKE model on the ReCoRD dataset.
    Abstract Despite the significant progress made by transformer models in machine reading comprehension tasks, they still fall short in handling complex reasoning tasks due to the absence of explicit knowledge in the input sequence. To address this limitation, many recent works have proposed injecting external knowledge into the model. However, selecting relevant external knowledge, ensuring its availability, and requiring additional processing steps remain challenging. In this paper, we introduce a novel attention pattern that integrates reasoning knowledge derived from a heterogeneous graph into the transformer architecture without relying on external knowledge. The proposed attention pattern comprises three key elements: global-local attention for word tokens, graph attention for entity tokens that exhibit strong attention towards tokens connected in the graph as opposed to those unconnected, and the consideration of the type of relationship between each entity token and word token. This results in optimized attention between the two if a relationship exists. The pattern is coupled with special relative position labels, allowing it to integrate with LUKE's entity-aware self-attention mechanism. The experimental findings corroborate that our model outperforms both the cutting-edge LUKE-Graph and the baseline LUKE model on the ReCoRD dataset that focuses on commonsense reasoning.
    摘要 尽管变换器模型在机器阅读理解任务中做出了重要进步,但它们仍然缺乏明确的知识表达,导致在复杂的推理任务中表现不佳。为解决这一限制,许多最近的研究强调在模型中注入外部知识。然而,选择相关的外部知识,确保其可用性,以及需要额外的处理步骤仍然是挑战。本文提出了一种新的注意模式,即在变换器架构中 интеグ推理知识来自多样化图表示。该注意模式包括三个关键元素:全局-本地注意力 для单词Token,对于实体Token的注意力,以及对每个实体Token和单词Token之间的关系类型进行考虑。这些元素的结合使得注意力得到优化。此外,我们还采用特殊相对位标签,使得该注意模式可以与LUKE模型的实体意识自注意机制集成。实验结果表明,我们的模型在 Commonsense Reasoning 数据集上比悉心LUKE-Graph和基础LUKE模型表现出色。

Confidence Estimation Using Unlabeled Data

  • paper_url: http://arxiv.org/abs/2307.10440
  • repo_url: https://github.com/topoxlab/consistency-ranking-loss
  • paper_authors: Chen Li, Xiaoling Hu, Chao Chen
  • for: Proposes a confidence estimation method for the semi-supervised setting, where most training labels are unavailable.
  • methods: Uses the consistency of predictions through the training process as a surrogate function and proposes a consistency ranking loss for confidence estimation.
  • results: Achieves state-of-the-art confidence estimation on both image classification and segmentation tasks, and demonstrates its advantage through a downstream active learning task.
    Abstract Overconfidence is a common issue for deep neural networks, limiting their deployment in real-world applications. To better estimate confidence, existing methods mostly focus on fully-supervised scenarios and rely on training labels. In this paper, we propose the first confidence estimation method for a semi-supervised setting, when most training labels are unavailable. We stipulate that even with limited training labels, we can still reasonably approximate the confidence of model on unlabeled samples by inspecting the prediction consistency through the training process. We use training consistency as a surrogate function and propose a consistency ranking loss for confidence estimation. On both image classification and segmentation tasks, our method achieves state-of-the-art performances in confidence estimation. Furthermore, we show the benefit of the proposed method through a downstream active learning task. The code is available at https://github.com/TopoXLab/consistency-ranking-loss
    摘要 通过训练过程中的预测一致性来估计模型的自信度,我们提出了首个在半监督Setting下的自信度估计方法。即使有限的训练标签,我们仍可以通过训练过程中的预测一致性来理想地估计模型对无标示样本的自信度。我们使用训练一致性作为代理函数,并提出了一种一致排名损失用于自信度估计。在图像分类和 segmentation 任务中,我们的方法实现了状态机器人的表现,并且我们还证明了我们的方法在下游活动学任务中的利好。代码可以在 https://github.com/TopoXLab/consistency-ranking-loss 上获取。
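As a toy illustration of the surrogate idea, the sketch below scores unlabeled samples by how consistently the model predicted them across training checkpoints; it is not the paper's consistency ranking loss, just the raw signal that loss is built around.

```python
# Toy illustration: prediction consistency across training checkpoints as a
# confidence surrogate for unlabeled samples (not the paper's ranking loss).
import numpy as np

def consistency_confidence(pred_history):
    """pred_history: (n_checkpoints, n_samples) array of predicted classes.
    Returns, per sample, the fraction of checkpoints agreeing with the final
    prediction -- higher agreement is treated as higher confidence."""
    final = pred_history[-1]
    return (pred_history == final[None, :]).mean(axis=0)

# example: 5 checkpoints, 4 unlabeled samples
history = np.array([
    [0, 1, 2, 1],
    [0, 1, 0, 1],
    [0, 2, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
])
print(consistency_confidence(history))  # samples 0 and 3 are fully consistent
```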

  • paper_url: http://arxiv.org/abs/2307.10438
  • repo_url: None
  • paper_authors: Shengli Jiang, Shiyi Qin, Reid C. Van Lehn, Prasanna Balaprakash, Victor M. Zavala
  • for: Uncertainty-aware molecular property prediction.
  • methods: Uses automated architecture search to generate an ensemble of high-performing GNNs, and applies variance decomposition to separate data (aleatoric) and model (epistemic) uncertainty.
  • results: Outperforms existing methods in both prediction accuracy and UQ performance on multiple benchmark datasets, and uses t-SNE visualization to explore correlations between molecular features and uncertainty.
    Abstract Graph Neural Networks (GNNs) have emerged as a prominent class of data-driven methods for molecular property prediction. However, a key limitation of typical GNN models is their inability to quantify uncertainties in the predictions. This capability is crucial for ensuring the trustworthy use and deployment of models in downstream tasks. To that end, we introduce AutoGNNUQ, an automated uncertainty quantification (UQ) approach for molecular property prediction. AutoGNNUQ leverages architecture search to generate an ensemble of high-performing GNNs, enabling the estimation of predictive uncertainties. Our approach employs variance decomposition to separate data (aleatoric) and model (epistemic) uncertainties, providing valuable insights for reducing them. In our computational experiments, we demonstrate that AutoGNNUQ outperforms existing UQ methods in terms of both prediction accuracy and UQ performance on multiple benchmark datasets. Additionally, we utilize t-SNE visualization to explore correlations between molecular features and uncertainty, offering insight for dataset improvement. AutoGNNUQ has broad applicability in domains such as drug discovery and materials science, where accurate uncertainty quantification is crucial for decision-making.
    摘要 图 neural network (GNN) 已经成为分子性质预测中一种显著的数据驱动方法。然而,典型的 GNN 模型无法量化预测结果的不确定性。这种能力是在下游任务中使模型使用和部署的信任性质的关键。为此,我们介绍 AutoGNNUQ,一种自动 uncertainty quantification(UQ)方法 для分子性质预测。AutoGNNUQ 利用架构搜索生成一个高性能的 GNN ensemble,以便估计预测结果的不确定性。我们的方法使用差分分析将数据( aleatoric)和模型(epistemic)不确定性分解,为了减少它们。在我们的计算实验中,我们证明 AutoGNNUQ 在多个benchmark数据集上表现出色,比现有的 UQ 方法更高精度和 UQ 性能。此外,我们使用 t-SNE 可视化来探索分子特征与不确定性之间的相关性,为了改进数据集。AutoGNNUQ 在药物探索和材料科学等领域有广泛的应用,因为它可以减少分子性质预测中的不确定性。
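The variance decomposition the abstract refers to is the standard ensemble-based split of predictive variance into aleatoric and epistemic parts; the sketch below shows that generic formula rather than AutoGNNUQ itself.

```python
# Standard ensemble variance decomposition (a generic sketch, not AutoGNNUQ):
# each ensemble member predicts a mean and an aleatoric variance per molecule;
# epistemic uncertainty is the spread of the member means.
import numpy as np

def decompose_uncertainty(means, variances):
    """means, variances: arrays of shape (n_members, n_samples)."""
    aleatoric = variances.mean(axis=0)   # average predicted data noise
    epistemic = means.var(axis=0)        # disagreement between members
    return aleatoric, epistemic, aleatoric + epistemic

means = np.array([[1.0, 2.1], [1.1, 2.5], [0.9, 1.7]])
variances = np.array([[0.2, 0.3], [0.25, 0.35], [0.15, 0.4]])
print(decompose_uncertainty(means, variances))
```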

A Bayesian Programming Approach to Car-following Model Calibration and Validation using Limited Data

  • paper_url: http://arxiv.org/abs/2307.10437
  • repo_url: None
  • paper_authors: Franklin Abodo
  • for: Provide a car-following model that reproduces driver behavior accurately so that roadway changes, including work zones, can be designed and evaluated in traffic simulation.
  • methods: Uses microscopic driver-behavior models, from which macroscopic measures such as flow and congestion are derived. Because most existing models target specific traffic scenarios and roadway configurations and do not apply directly to work zones (WZs), the USDOT Volpe Center was commissioned to develop a car-following (CF) model that captures driver behavior within and outside of WZs for safe traffic planning.
  • results: During model development, Volpe researchers encountered difficulties in calibration, raising the question of whether the problem lay in the model, the data, or the calibration procedure. This thesis applies Bayesian methods for data analysis and parameter estimation to explore these questions: first, Bayesian inference is used to measure the sufficiency of the dataset; second, the genetic-algorithm-based calibration used by the Volpe researchers is compared with Bayesian calibration; finally, the established Wiedemann 99 CF model is used for probabilistic modeling of the Volpe model, with validation performed via information criteria as an estimate of predictive accuracy.
    Abstract Traffic simulation software is used by transportation researchers and engineers to design and evaluate changes to roadways. These simulators are driven by models of microscopic driver behavior from which macroscopic measures like flow and congestion can be derived. Many models are designed for a subset of possible traffic scenarios and roadway configurations, while others have no explicit constraints on their application. Work zones (WZs) are one scenario for which no model to date has reproduced realistic driving behavior. This makes it difficult to optimize for safety and other metrics when designing a WZ. The Federal Highway Administration commissioned the USDOT Volpe Center to develop a car-following (CF) model for use in microscopic simulators that can capture and reproduce driver behavior accurately within and outside of WZs. Volpe also performed a naturalistic driving study to collect telematics data from vehicles driven on roads with WZs for use in model calibration. During model development, Volpe researchers observed difficulties in calibrating their model, leaving them to question whether there existed flaws in their model, in the data, or in the procedure used to calibrate the model using the data. In this thesis, I use Bayesian methods for data analysis and parameter estimation to explore and, where possible, address these questions. First, I use Bayesian inference to measure the sufficiency of the size of the data set. Second, I compare the procedure and results of the genetic algorithm based calibration performed by the Volpe researchers with those of Bayesian calibration. Third, I explore the benefits of modeling CF hierarchically. Finally, I apply what was learned in the first three phases using an established CF model, Wiedemann 99, to the probabilistic modeling of the Volpe model. Validation is performed using information criteria as an estimate of predictive accuracy.
    摘要 交通模拟软件被运输研究人员和工程师使用来设计和评估路径变化。这些模拟器由微型驾驶行为模型驱动,从而 derive 流和堵塞等宏观指标。许多模型适用于特定的交通enario 和路径配置,而其他些没有明确的约束。工地(WZ)是一种 scenarios для which no model to date has reproduced realistic driving behavior. 这使得在设计工地时 diffficult to optimize for safety and other metrics。美国公路管理局委托美国交通部Volpe Center 开发一个可以在微型模拟器中Capture 和重现驾驶行为的 car-following (CF)模型。Volpe 还执行了一项自然驾驶研究,收集了在路径上驾驶的 vehicless 的 telematics 数据,用于模型均衡。在模型开发过程中,Volpe 研究人员注意到了困难在均衡模型,使得他们开始提问是否存在模型中的毛病、数据中的毛病或者均衡模型使用数据的过程中的毛病。在这个论文中,我使用 bayesian 方法来分析数据和参数估计,以探索和解决这些问题。首先,我使用 bayesian 推理来测试数据集的大小是否充分。其次,我比较了 Volpe 研究人员使用的 genetic algorithm 基于的均衡过程和 bayesian 均衡过程的结果。最后,我探索了模型CF 的层次结构化的好处。最后,我使用一个已知的 CF 模型,Wiedemann 99,来应用在 Volpe 模型上。验证是使用信息 критериion 来估计预测精度。

A Matrix Ensemble Kalman Filter-based Multi-arm Neural Network to Adequately Approximate Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.10436
  • repo_url: https://github.com/ved-piyush/menkf-ann-pul
  • paper_authors: Ved Piyush, Yuchen Yan, Yuzhen Zhou, Yanbin Yin, Souparno Ghosh
  • for: This paper aims to propose a new technique for approximating deep learning models, specifically long short-term memory (LSTM) networks, using a Kalman filter-based approach.
  • methods: The proposed method, called Matrix Ensemble Kalman Filter-based multi-arm ANN (MEnKF-ANN), uses a multi-arm extension of a Kalman filter to approximate LSTM networks, and also performs explicit model stacking to handle unequal-size feature sets.
  • results: The proposed method can adequately approximate LSTM networks trained to classify carbohydrate substrates based on genomic sequences, and can also provide uncertainty estimates for the predictions.
    Abstract Deep Learners (DLs) are the state-of-art predictive mechanism with applications in many fields requiring complex high dimensional data processing. Although conventional DLs get trained via gradient descent with back-propagation, Kalman Filter (KF)-based techniques that do not need gradient computation have been developed to approximate DLs. We propose a multi-arm extension of a KF-based DL approximator that can mimic DL when the sample size is too small to train a multi-arm DL. The proposed Matrix Ensemble Kalman Filter-based multi-arm ANN (MEnKF-ANN) also performs explicit model stacking that becomes relevant when the training sample has an unequal-size feature set. Our proposed technique can approximate Long Short-term Memory (LSTM) Networks and attach uncertainty to the predictions obtained from these LSTMs with desirable coverage. We demonstrate how MEnKF-ANN can "adequately" approximate an LSTM network trained to classify what carbohydrate substrates are digested and utilized by a microbiome sample whose genomic sequences consist of polysaccharide utilization loci (PULs) and their encoded genes.
    摘要 深度学习器(DL)是现代预测机制的州标,应用于需要复杂高维数据处理的多个领域。尽管传统的DL通过梯度下降和反推训练,但是使用Kalman滤波器(KF)技术不需要计算梯度的方法已经开发出来。我们提议一种基于KF的多臂ANN(MEnKF-ANN),可以模拟DL,即使训练样本规模太小。我们的提议技术还实现了显式模型堆叠,当特征集的大小不同时变得有用。我们的提议技术可以模拟长期短 память网络(LSTM),并将对应的预测结果添加不确定性。我们示例了MEnKF-ANN可以“合理”地模拟一个基于PULs和其编码的微生物批处理训练集,用于预测微生物样本中吃掉和利用的碳水化合物substrate。
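The building block behind the proposed approximator is the ensemble Kalman filter analysis step, which updates an ensemble of parameter vectors toward observations without computing gradients. The numpy sketch below shows a generic stochastic EnKF update; the multi-arm, model-stacking machinery of MEnKF-ANN is not shown.

```python
# Generic stochastic ensemble Kalman filter analysis step (a sketch of the
# building block behind MEnKF-ANN, not the paper's multi-arm construction).
import numpy as np

rng = np.random.default_rng(1)

def enkf_update(X, y, H, R):
    """X: (n_state, n_ens) forecast ensemble; y: (n_obs,) observation;
    H: (n_obs, n_state) observation operator; R: (n_obs, n_obs) obs covariance."""
    n_ens = X.shape[1]
    A = X - X.mean(axis=1, keepdims=True)            # ensemble anomalies
    P = A @ A.T / (n_ens - 1)                        # sample forecast covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)     # Kalman gain
    # perturb observations so the analysis ensemble keeps the right spread
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
    return X + K @ (Y - H @ X)

X = rng.normal(size=(3, 50))                         # 3 "weights", 50 members
H = np.array([[1.0, 0.0, 0.0]])                      # observe the first component
R = np.array([[0.1]])
Xa = enkf_update(X, np.array([0.8]), H, R)
print(X.mean(axis=1), "->", Xa.mean(axis=1))
```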

Learning Formal Specifications from Membership and Preference Queries

  • paper_url: http://arxiv.org/abs/2307.10434
  • repo_url: None
  • paper_authors: Ameesh Shah, Marcell Vazquez-Chanlatte, Sebastian Junges, Sanjit A. Seshia
  • for: Active learning of formal specifications, such as automata.
  • methods: Proposes a novel framework that strategically requests a combination of membership labels and pairwise preferences, allowing a more flexible approach to active specification learning.
  • results: Instantiates the framework in two different domains and shows that learning from both modalities robustly and conveniently identifies specifications via membership and preferences.
    Abstract Active learning is a well-studied approach to learning formal specifications, such as automata. In this work, we extend active specification learning by proposing a novel framework that strategically requests a combination of membership labels and pair-wise preferences, a popular alternative to membership labels. The combination of pair-wise preferences and membership labels allows for a more flexible approach to active specification learning, which previously relied on membership labels only. We instantiate our framework in two different domains, demonstrating the generality of our approach. Our results suggest that learning from both modalities allows us to robustly and conveniently identify specifications via membership and preferences.
    摘要 active learning是一种已经广泛研究的学习方法,用于学习正式规则,如自动机。在这项工作中,我们将活动规则学习框架扩展到请求组合会员标签和对比性偏好。这种组合方式允许我们更加灵活地进行活动规则学习,之前只能通过会员标签进行学习。我们在两个不同领域中实现了我们的框架,并证明了我们的方法的通用性。我们的结果表明,从两种模式学习可以强大地和方便地识别规则via会员和偏好。
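To make the two query modalities concrete, the toy sketch below prunes a pool of candidate specifications using both membership labels and pairwise preferences. The candidates here are simple predicates over traces and the pruning rules are purely illustrative; the paper's actual learner and query strategy are not reproduced.

```python
# Toy illustration (not the paper's algorithm): prune candidate specifications
# with membership labels and pairwise preferences. A "specification" is just a
# predicate over traces; a preference (a, b) means trace a should be at least
# as acceptable as trace b under the target specification.

def consistent(spec, memberships, preferences):
    for trace, label in memberships:
        if spec(trace) != label:
            return False
    for better, worse in preferences:
        if (not spec(better)) and spec(worse):   # spec ranks 'worse' above 'better'
            return False
    return True

candidates = {
    "ends_high":  lambda tr: tr[-1] >= 5,
    "never_zero": lambda tr: 0 not in tr,
    "short":      lambda tr: len(tr) <= 3,
}
memberships = [((1, 2, 7), True)]                 # this trace satisfies the spec
preferences = [((1, 2, 3, 4, 6), (2, 9))]         # the longer trace is preferred

surviving = {name for name, spec in candidates.items()
             if consistent(spec, memberships, preferences)}
print(surviving)   # "short" is ruled out by the preference query
```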

DP-TBART: A Transformer-based Autoregressive Model for Differentially Private Tabular Data Generation

  • paper_url: http://arxiv.org/abs/2307.10430
  • repo_url: None
  • paper_authors: Rodrigo Castellon, Achintya Gopal, Brian Bloniarz, David Rosenberg
  • for: Generating synthetic tabular data under differential privacy.
  • methods: A transformer-based autoregressive model (DP-TBART) trained with differentially private optimization.
  • results: Achieves performance competitive with marginal-based methods on a wide variety of datasets and outperforms state-of-the-art methods in certain settings; also provides a theoretical framework for understanding where deep learning-based approaches can contribute most.
    Abstract The generation of synthetic tabular data that preserves differential privacy is a problem of growing importance. While traditional marginal-based methods have achieved impressive results, recent work has shown that deep learning-based approaches tend to lag behind. In this work, we present Differentially-Private TaBular AutoRegressive Transformer (DP-TBART), a transformer-based autoregressive model that maintains differential privacy and achieves performance competitive with marginal-based methods on a wide variety of datasets, capable of even outperforming state-of-the-art methods in certain settings. We also provide a theoretical framework for understanding the limitations of marginal-based approaches and where deep learning-based approaches stand to contribute most. These results suggest that deep learning-based techniques should be considered as a viable alternative to marginal-based methods in the generation of differentially private synthetic tabular data.
    摘要 “ differential privacy 的 synthetic 表格数据生成问题是一个日益重要的问题。传统的边缘基于方法已经取得了很好的成绩，但最近的工作表明，深度学习基于方法在这个领域比较落后。在这篇文章中，我们介绍了一种名为 Differentially-Private TaBular AutoRegressive Transformer (DP-TBART)，这是一种基于 transformer 的自然语言模型，可以保持 differential privacy 并在各种数据集上达到与边缘基于方法相当的性能。我们还提供了一个理论框架，用于理解边缘基于方法的局限性，以及深度学习基于方法在这个领域中的潜在贡献。这些结果表明，深度学习基于方法应该被视为 differential privacy 生成 synthetic 表格数据的可行的替代方案。”

PreDiff: Precipitation Nowcasting with Latent Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.10422
  • repo_url: None
  • paper_authors: Zhihan Gao, Xingjian Shi, Boran Han, Hao Wang, Xiaoyong Jin, Danielle Maddix, Yi Zhu, Mu Li, Yuyang Wang
  • for: Forecasting future states of Earth systems from large spatiotemporal observation data using deep learning.
  • methods: A two-stage pipeline: first, PreDiff, a conditional latent diffusion model capable of probabilistic forecasts; second, an explicit knowledge-control mechanism that aligns forecasts with domain-specific physical constraints by estimating the deviation from the constraints at each denoising step and adjusting the transition distribution accordingly.
  • results: Experiments on the synthetic N-body MNIST dataset and the real-world SEVIR precipitation nowcasting dataset confirm that PreDiff handles uncertainty, incorporates domain-specific prior knowledge, and produces forecasts with high operational utility.
    Abstract Earth system forecasting has traditionally relied on complex physical models that are computationally expensive and require significant domain expertise. In the past decade, the unprecedented increase in spatiotemporal Earth observation data has enabled data-driven forecasting models using deep learning techniques. These models have shown promise for diverse Earth system forecasting tasks but either struggle with handling uncertainty or neglect domain-specific prior knowledge, resulting in averaging possible futures to blurred forecasts or generating physically implausible predictions. To address these limitations, we propose a two-stage pipeline for probabilistic spatiotemporal forecasting: 1) We develop PreDiff, a conditional latent diffusion model capable of probabilistic forecasts. 2) We incorporate an explicit knowledge control mechanism to align forecasts with domain-specific physical constraints. This is achieved by estimating the deviation from imposed constraints at each denoising step and adjusting the transition distribution accordingly. We conduct empirical studies on two datasets: N-body MNIST, a synthetic dataset with chaotic behavior, and SEVIR, a real-world precipitation nowcasting dataset. Specifically, we impose the law of conservation of energy in N-body MNIST and anticipated precipitation intensity in SEVIR. Experiments demonstrate the effectiveness of PreDiff in handling uncertainty, incorporating domain-specific prior knowledge, and generating forecasts that exhibit high operational utility.
    摘要 地球系统预测 Traditional 依靠复杂的物理模型,computationally expensive 和需要专业知识。 last decade, the unprecedented increase in spatiotemporal Earth observation data 使得 data-driven forecasting models using deep learning techniques 表现出了 promise для diverse Earth system forecasting tasks。 However, these models either struggle with handling uncertainty 或 neglect domain-specific prior knowledge, resulting in averaging possible futures to blurred forecasts or generating physically implausible predictions. To address these limitations, we propose a two-stage pipeline for probabilistic spatiotemporal forecasting:1. We develop PreDiff, a conditional latent diffusion model capable of probabilistic forecasts.2. We incorporate an explicit knowledge control mechanism to align forecasts with domain-specific physical constraints. This is achieved by estimating the deviation from imposed constraints at each denoising step and adjusting the transition distribution accordingly.We conduct empirical studies on two datasets: N-body MNIST, a synthetic dataset with chaotic behavior, and SEVIR, a real-world precipitation nowcasting dataset. Specifically, we impose the law of conservation of energy in N-body MNIST and anticipated precipitation intensity in SEVIR. Experiments demonstrate the effectiveness of PreDiff in handling uncertainty, incorporating domain-specific prior knowledge, and generating forecasts that exhibit high operational utility.
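The knowledge-control mechanism can be illustrated with a toy: at each step of a (here, trivialized) reverse sampling loop, estimate the deviation from an imposed constraint and shift the transition mean to reduce it. The sketch below uses a random-walk "denoiser" and a made-up total-intensity constraint purely for illustration; the real model is a conditional latent diffusion model, not this toy.

```python
# Toy sketch of knowledge control during sampling (not PreDiff itself): at each
# reverse step, estimate the violation of an imposed constraint (here, the total
# intensity must equal a target) and nudge the transition mean to reduce it.
import numpy as np

rng = np.random.default_rng(0)

def constrained_sampling(shape, target_total, n_steps=50, guide=0.2):
    x = rng.normal(size=shape)                      # start from noise
    for t in range(n_steps):
        noise_scale = 1.0 - (t + 1) / n_steps       # shrinking noise schedule
        deviation = x.sum() - target_total          # constraint violation
        # gradient of 0.5 * deviation**2 w.r.t. x is `deviation` everywhere
        x = x - guide * deviation / x.size \
              + noise_scale * 0.05 * rng.normal(size=shape)
    return x

sample = constrained_sampling((8, 8), target_total=10.0)
print("total intensity:", sample.sum())             # approximately 10
```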

Technical Challenges of Deploying Reinforcement Learning Agents for Game Testing in AAA Games

  • paper_url: http://arxiv.org/abs/2307.11105
  • repo_url: None
  • paper_authors: Jonas Gillberg, Joakim Bergdahl, Alessandro Sestini, Andrew Eakins, Linus Gisslen
  • for: Promote the adoption of machine learning in game production by adding an experimental reinforcement learning system to an existing automated game-testing solution in order to increase its test coverage.
  • methods: Integrates a reinforcement-learning-based system with an existing scripted-bot testing solution, so that learned agents complement the scripted tests.
  • results: Reports how this integration increased test coverage in a set of AAA games, including Battlefield 2042 and Dead Space (2023), describes the largest time sinks encountered, and proposes research directions to help the game industry adopt the technology faster.
    Abstract Going from research to production, especially for large and complex software systems, is fundamentally a hard problem. In large-scale game production, one of the main reasons is that the development environment can be very different from the final product. In this technical paper we describe an effort to add an experimental reinforcement learning system to an existing automated game testing solution based on scripted bots in order to increase its capacity. We report on how this reinforcement learning system was integrated with the aim to increase test coverage similar to [1] in a set of AAA games including Battlefield 2042 and Dead Space (2023). The aim of this technical paper is to show a use-case of leveraging reinforcement learning in game production and cover some of the largest time sinks anyone who wants to make the same journey for their game may encounter. Furthermore, to help the game industry to adopt this technology faster, we propose a few research directions that we believe will be valuable and necessary for making machine learning, and especially reinforcement learning, an effective tool in game production.
    摘要 从研究到生产,特别是 для大型和复杂的软件系统,是一个基本问题。在大规模游戏生产中,一个主要原因是开发环境和产品环境之间的差异。在这份技术著作中,我们描述了将实验式学习系统添加到现有的自动游戏测试解决方案基于脚本式 Bot 以增加其容量的尝试。我们报告了在一些 AAA 游戏,包括 Battlefield 2042 和 Dead Space (2023) 中将这个学习系统整合的成果,并希望这份技术著作可以显示游戏生产中如何使用学习机器人,以及一些可能会遇到的主要时间潜在障碍。此外,为了帮助游戏业界更快地采纳这技术,我们建议了一些研究方向,我们认为这些研究方向将是有价值和必要的,以便在游戏生产中使用机器学习和特别是实验学习。

Interpreting and Correcting Medical Image Classification with PIP-Net

  • paper_url: http://arxiv.org/abs/2307.10404
  • repo_url: https://github.com/m-nauta/pipnet
  • paper_authors: Meike Nauta, Johannes H. Hegeman, Jeroen Geerdink, Jörg Schlötterer, Maurice van Keulen, Christin Seifert
  • for: Explores the applicability and potential of interpretable machine learning models for medical image classification tasks.
  • methods: Uses PIP-Net, an interpretable image classifier that learns human-understandable prototypical image parts, evaluated on fracture detection and skin cancer diagnosis.
  • results: Finds that PIP-Net's decision-making process is in line with medical classification standards while only being given image-level class labels, and shows for the first time that humans can manually correct PIP-Net's reasoning by directly disabling undesired prototypes.
    Abstract Part-prototype models are explainable-by-design image classifiers, and a promising alternative to black box AI. This paper explores the applicability and potential of interpretable machine learning, in particular PIP-Net, for automated diagnosis support on real-world medical imaging data. PIP-Net learns human-understandable prototypical image parts and we evaluate its accuracy and interpretability for fracture detection and skin cancer diagnosis. We find that PIP-Net's decision making process is in line with medical classification standards, while only provided with image-level class labels. Because of PIP-Net's unsupervised pretraining of prototypes, data quality problems such as undesired text in an X-ray or labelling errors can be easily identified. Additionally, we are the first to show that humans can manually correct the reasoning of PIP-Net by directly disabling undesired prototypes. We conclude that part-prototype models are promising for medical applications due to their interpretability and potential for advanced model debugging.
    摘要 本文探讨了可解释型机器学习模型在医学图像分类任务中的应用性和潜力。
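Because PIP-Net-style models classify via prototype presence scores multiplied by a sparse linear layer, "disabling" an undesired prototype amounts to zeroing its weights in that layer. The minimal sketch below illustrates this with made-up numbers; it is not the PIP-Net codebase.

```python
# Minimal sketch of disabling an undesired prototype: classify via prototype
# presence scores times a linear layer, and switch a prototype off by zeroing
# its weights. Numbers below are illustrative, not from the paper.
import numpy as np

presence = np.array([0.9, 0.1, 0.8])        # prototype presence for one X-ray
W = np.array([[2.0, 0.0],                   # prototype 0 -> class weights
              [0.0, 1.5],                   # prototype 1
              [3.0, 0.0]])                  # prototype 2: say, reacts to text markers

def predict(presence, W, disabled=()):
    W = W.copy()
    W[list(disabled), :] = 0.0              # human override: ignore these prototypes
    return presence @ W

print(predict(presence, W))                 # prototype 2 dominates the decision
print(predict(presence, W, disabled=[2]))   # corrected reasoning without it
```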

Selection functions of strong lens finding neural networks

  • paper_url: http://arxiv.org/abs/2307.10355
  • repo_url: None
  • paper_authors: A. Herle, C. M. O’Riordan, S. Vegetti
  • for: Studies the selection biases of neural networks trained to find strong gravitational lens systems.
  • methods: Uses convolutional neural networks with architectures and training data similar to those commonly found in the literature, trained on three different training sets, to characterize the selection function of lens-finding networks.
  • results: The networks preferentially select systems with larger Einstein radii and more concentrated source-light distributions; increasing the detection significance threshold shifts the selection further in this direction.
    Abstract Convolution Neural Networks trained for the task of lens finding with similar architecture and training data as is commonly found in the literature are biased classifiers. An understanding of the selection function of lens finding neural networks will be key to fully realising the potential of the large samples of strong gravitational lens systems that will be found in upcoming wide-field surveys. We use three training datasets, representative of those used to train galaxy-galaxy and galaxy-quasar lens finding neural networks. The networks preferentially select systems with larger Einstein radii and larger sources with more concentrated source-light distributions. Increasing the detection significance threshold to 12$\sigma$ from 8$\sigma$ results in 50 per cent of the selected strong lens systems having Einstein radii $\theta_\mathrm{E} \ge 1.04$ arcsec from $\theta_\mathrm{E} \ge 0.879$ arcsec, source radii $R_S \ge 0.194$ arcsec from $R_S \ge 0.178$ arcsec and source S\'ersic indices $n_{\mathrm{Sc}}^{\mathrm{S}} \ge 2.62$ from $n_{\mathrm{Sc}}^{\mathrm{S}} \ge 2.55$. The model trained to find lensed quasars shows a stronger preference for higher lens ellipticities than those trained to find lensed galaxies. The selection function is independent of the slope of the power-law of the mass profiles, hence measurements of this quantity will be unaffected. The lens finder selection function reinforces that of the lensing cross-section, and thus we expect our findings to be a general result for all galaxy-galaxy and galaxy-quasar lens finding neural networks.
    摘要 convolutional neural networks 特别是用于这个任务的 Architecture 和训练数据,即通常在文献中找到的 Architecture 和训练数据,是偏向分类器。 理解这个镜像系统的选择函数是掌握这个大量强 gravitational lens系统的潜在力量的关键。 我们使用了三个训练数据集,代表了通常用于训练 galaxy-galaxy 和 galaxy-quasar 镜像系统的训练数据。 这些网络偏好 Systems with larger Einstein radii 和更集中的源光辉分布。 将检测关键值从 8σ 提高到 12σ 会导致50%选择的强镜系统有 Einstein radii θE 大于或等于 1.04弧度,source radii RS 大于或等于 0.194弧度,source Sérsic indices nScS 大于或等于 2.62。 对于找寻类别的模型,它具有更强的偏好 towards higher lens ellipticities than those trained to find lensed galaxies。 选择函数不受Source 的梯度影响,因此Measurements of this quantity will be unaffected。 镜像选择函数与镜像截面的选择函数相似,因此我们预期我们的发现将是一个通用的结果,适用于所有 galaxy-galaxy 和 galaxy-quasar 镜像系统。

LightPath: Lightweight and Scalable Path Representation Learning

  • paper_url: http://arxiv.org/abs/2307.10171
  • repo_url: None
  • paper_authors: Sean Bin Yang, Jilin Hu, Chenjuan Guo, Bin Yang, Christian S. Jensen
  • for: A lightweight and scalable path representation learning framework for intelligent transportation and smart-city applications.
  • methods: Proposes a sparse auto-encoder for good scalability with respect to path length, a relational reasoning framework for faster training of more robust sparse path encoders, and global-local knowledge distillation to further reduce encoder size and improve performance without sacrificing accuracy.
  • results: Extensive experiments on two real-world datasets validate the efficiency, scalability, and effectiveness of the framework, particularly in resource-limited environments.
    Abstract Movement paths are used widely in intelligent transportation and smart city applications. To serve such applications, path representation learning aims to provide compact representations of paths that enable efficient and accurate operations when used for different downstream tasks such as path ranking and travel cost estimation. In many cases, it is attractive that the path representation learning is lightweight and scalable; in resource-limited environments and under green computing limitations, it is essential. Yet, existing path representation learning studies focus on accuracy and pay at most secondary attention to resource consumption and scalability. We propose a lightweight and scalable path representation learning framework, termed LightPath, that aims to reduce resource consumption and achieve scalability without affecting accuracy, thus enabling broader applicability. More specifically, we first propose a sparse auto-encoder that ensures that the framework achieves good scalability with respect to path length. Next, we propose a relational reasoning framework to enable faster training of more robust sparse path encoders. We also propose global-local knowledge distillation to further reduce the size and improve the performance of sparse path encoders. Finally, we report extensive experiments on two real-world datasets to offer insight into the efficiency, scalability, and effectiveness of the proposed framework.
    摘要 路径表示法广泛应用于智能交通和智能城市应用程序中。为了满足这些应用程序,路径表示学习目标是提供高效精度的路径表示,以便在不同的下游任务中进行高效的操作,如路径排名和旅行费用估算。在资源有限的环境和绿色计算限制下,现有的路径表示学习研究通常强调精度,并且只在必要的情况下进行次要的考虑。我们提出了一个轻量级和可扩展的路径表示学习框架,称为LightPath,以降低资源消耗和实现可扩展性,而不影响准确性。更 Specifically,我们首先提出了一个稀疏自动编码器,以确保框架在路径长度方面具有良好的扩展性。然后,我们提出了一个关系理解框架,以更快地训练更加稀疏的路径编码器。 finally,我们提出了全球-本地知识传播,以进一步减小路径编码器的大小和提高其性能。我们在两个真实世界数据集上进行了广泛的实验,以提供有关效率、可扩展性和效果的深入了解。

Challenges and Applications of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10169
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, Robert McHardy
  • for: Provide machine learning researchers with a systematic set of open problems and successful application areas so they can more quickly understand the current state of the large language model (LLM) field and become productive.
  • methods: A systematic review of the literature and formulation of open problems to capture the field's current state and unresolved challenges.
  • results: Presents a curated set of open problems and successful application areas that helps ML researchers orient themselves in the LLM field.
    Abstract Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful application areas. In this paper, we aim to establish a systematic set of open problems and application successes so that ML researchers can comprehend the field's current state more quickly and become productive.
    摘要 庞大语言模型(LLM)从无存到普遍的机器学习议题中的几年内。由于这个领域的快速进步，因此难以识别还没有解决的挑战和已经有成果的应用领域。在这篇论文中，我们希望建立 a systematic set of open problems and application successes，以便ML研究人员更快地了解这个领域的现状，更快地成为生产力。

VITS : Variational Inference Thomson Sampling for contextual bandits

  • paper_url: http://arxiv.org/abs/2307.10167
  • repo_url: None
  • paper_authors: Pierre Clavier, Tom Huix, Alain Durmus
  • for: Studies a variant of the Thompson sampling (TS) algorithm for contextual bandits.
  • methods: Uses Gaussian variational inference to provide efficient posterior approximations that are easy to sample from.
  • results: Shows that VITS achieves a sub-linear regret bound of the same order in the dimension and number of rounds as traditional TS for linear contextual bandits, and validates the algorithm experimentally on both synthetic and real-world datasets.
    Abstract In this paper, we introduce and analyze a variant of the Thompson sampling (TS) algorithm for contextual bandits. At each round, traditional TS requires samples from the current posterior distribution, which is usually intractable. To circumvent this issue, approximate inference techniques can be used and provide samples with distribution close to the posteriors. However, current approximate techniques yield to either poor estimation (Laplace approximation) or can be computationally expensive (MCMC methods, Ensemble sampling...). In this paper, we propose a new algorithm, Varational Inference Thompson sampling VITS, based on Gaussian Variational Inference. This scheme provides powerful posterior approximations which are easy to sample from, and is computationally efficient, making it an ideal choice for TS. In addition, we show that VITS achieves a sub-linear regret bound of the same order in the dimension and number of round as traditional TS for linear contextual bandit. Finally, we demonstrate experimentally the effectiveness of VITS on both synthetic and real world datasets.
    摘要 在本文中,我们介绍并分析了一种 Contextual Bandit 中的 Thompson 抽样(TS)算法的变体。在每个轮次中,传统的 TS 需要从当前的 posterior 分布中采样,通常是不可行的。为了缓解这个问题,我们可以使用 Approximate Inference 技术,提供靠近 posterior 的样本。然而,当前的 approximate 技术可能会导致低效的估计(Laplace 应用)或者是 computationally expensive(MCMC 方法、Ensemble 抽样...)。在本文中,我们提出了一个新的算法,基于 Gaussian Variational Inference 的 Varitional Inference Thompson Sampling(VITS)。这种方案提供了简单易于采样的强 posterior aproximation,计算效率高,适用于 TS。此外,我们证明了 VITS 在维度和轮次数方面的下降 regret bound 与传统的 TS 相同。最后,我们通过 synthetic 和实际世界数据的实验证明了 VITS 的实际效果。
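The posterior VITS approximates is Gaussian, so the overall loop looks like standard linear Thompson sampling with a Gaussian parameter posterior. The sketch below uses the closed-form Bayesian ridge update for that Gaussian rather than the paper's variational scheme, purely to show the sampling-then-act structure.

```python
# Sketch of Thompson sampling for a linear contextual bandit with a Gaussian
# posterior over the parameter vector (closed-form Bayesian ridge update here;
# VITS instead fits the Gaussian by variational inference).
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, T = 5, 4, 2000
theta_true = rng.normal(size=d)

B = np.eye(d)            # posterior precision
f = np.zeros(d)          # running sum of x_t * r_t
for t in range(T):
    contexts = rng.normal(size=(n_arms, d))
    mu, cov = np.linalg.solve(B, f), np.linalg.inv(B)
    theta_sample = rng.multivariate_normal(mu, cov)      # posterior sample
    a = int(np.argmax(contexts @ theta_sample))          # act greedily on the sample
    r = contexts[a] @ theta_true + 0.1 * rng.normal()    # observe noisy reward
    B += np.outer(contexts[a], contexts[a])              # Bayesian update
    f += r * contexts[a]

print("parameter error:", np.linalg.norm(np.linalg.solve(B, f) - theta_true))
```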

Improving Multimodal Datasets with Image Captioning

  • paper_url: http://arxiv.org/abs/2307.10350
  • repo_url: None
  • paper_authors: Thao Nguyen, Samir Yitzhak Gadre, Gabriel Ilharco, Sewoong Oh, Ludwig Schmidt
  • for: Improving the web-scale image-text data behind the success of large vision-language models such as CLIP and Flamingo.
  • methods: Studies how generated captions can increase the utility of web-scraped data points with nondescript text, comparing different strategies for mixing raw and generated captions.
  • results: Outperforms the best filtering method from the DataComp benchmark by 2% on ImageNet and 4% on average across 38 tasks (with a candidate pool of 128M image-text pairs), and is 2x better at Flickr and MS-COCO retrieval. Also analyzes what makes synthetic captions effective, shows that standard image captioning benchmarks are not a reliable indicator of a caption model's utility for multimodal training, and explores the limitations of synthetic text and the importance of image curation at DataComp's large scale (1.28B image-text pairs).
    Abstract Massive web datasets play a key role in the success of large vision-language models like CLIP and Flamingo. However, the raw web data is noisy, and existing filtering methods to reduce noise often come at the expense of data diversity. Our work focuses on caption quality as one major source of noise, and studies how generated captions can increase the utility of web-scraped datapoints with nondescript text. Through exploring different mixing strategies for raw and generated captions, we outperform the best filtering method proposed by the DataComp benchmark by 2% on ImageNet and 4% on average across 38 tasks, given a candidate pool of 128M image-text pairs. Our best approach is also 2x better at Flickr and MS-COCO retrieval. We then analyze what makes synthetic captions an effective source of text supervision. In experimenting with different image captioning models, we also demonstrate that the performance of a model on standard image captioning benchmarks (e.g., NoCaps CIDEr) is not a reliable indicator of the utility of the captions it generates for multimodal training. Finally, our experiments with using generated captions at DataComp's large scale (1.28B image-text pairs) offer insights into the limitations of synthetic text, as well as the importance of image curation with increasing training data quantity.
    摘要 大量网络数据对于大型视觉语言模型如CLIP和Flamingo的成功起到了关键作用。然而,原始网络数据具有噪音,现有的过滤方法通常会导致数据多样性减少。我们的工作将注意力点在caption质量上,研究如何使用生成的caption提高网络抓取到的文本点的使用价值。通过不同的混合策略来融合原始和生成的caption,我们在ImageNet和38个任务上比DataCompbenchmark中的最佳过滤方法提高2%和4%。我们的最佳方法还在Flickr和MS-COCO检索中表现出2倍的好干净性。我们还分析了生成caption的有效性源泉,并通过不同的图像描述模型的实验,发现标准图像描述benchmark(如NoCaps CIDEr)中模型的性能不是训练多模式时caption的用途的可靠指标。最后,我们在DataComp的大规模数据(1.28B image-text pair)上进行实验,提供了生成caption的局限性以及图像筛选的重要性,随着训练数据量的增加。

Rethinking Backdoor Attacks

  • paper_url: http://arxiv.org/abs/2307.10163
  • repo_url: https://github.com/lancopku/SOS
  • paper_authors: Alaa Khaddaj, Guillaume Leclerc, Aleksandar Makelov, Kristian Georgiev, Hadi Salman, Andrew Ilyas, Aleksander Madry
  • for: Studies backdoor attacks, in which an adversary inserts maliciously constructed examples into the training set so that the resulting model can be manipulated.
  • methods: Shows that, without structural information about the training data distribution, backdoor examples are indistinguishable from naturally occurring features and thus impossible to "detect" in a general sense; revisits existing defenses and characterizes the (often latent) assumptions they rely on.
  • results: Under the alternative assumption that backdoor attacks correspond to the strongest feature in the training data, develops a new detection primitive and an associated algorithm with theoretical guarantees that is effective in practice.
    Abstract In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation. Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them. In this work, we present a different approach to the backdoor attack problem. Specifically, we show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data--and thus impossible to "detect" in a general sense. Then, guided by this observation, we revisit existing defenses against backdoor attacks and characterize the (often latent) assumptions they make and on which they depend. Finally, we explore an alternative perspective on backdoor attacks: one that assumes these attacks correspond to the strongest feature in the training data. Under this assumption (which we make formal) we develop a new primitive for detecting backdoor attacks. Our primitive naturally gives rise to a detection algorithm that comes with theoretical guarantees and is effective in practice.
    摘要 在一个后门攻击中,敌对者会插入一些恶意构建的后门示例,以让模型易于操纵。防御这类攻击通常包括视这些插入的示例为训练集中的异常值,并使用robust统计技术来探测和除掉它们。在这项工作中,我们提出了一种不同的后门攻击问题的解决方案。具体来说,我们表明了在训练数据分布的结构信息不存在的情况下,后门攻击是无法分辨的,因此不能在通用的概念上探测。然后,我们根据这一观察,重新评估了现有的后门攻击防御方法,描述了它们所假设的(常常隐藏的)假设和依赖项。最后,我们探索了一种假设后门攻击对应于训练数据中最强的特征,并将这种假设进行了正式表述。我们的原则 Naturally gives rise to a detection algorithm that comes with theoretical guarantees and is effective in practice.

Robust Driving Policy Learning with Guided Meta Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.10160
  • repo_url: None
  • paper_authors: Kanghoon Lee, Jiachen Li, David Isele, Jinkyoo Park, Kikuo Fujimura, Mykel J. Kochenderfer
  • for: Improve the adaptability of autonomous vehicles in interactive traffic scenarios by training a single meta-policy that captures diverse social vehicle behaviors.
  • methods: Randomizes interaction-based reward functions to generate diverse objectives and trains the meta-policy through guiding policies that achieve those objectives; the ego vehicle's driving policy is then trained to be robust using an environment in which social vehicles are controlled by the learned meta-policy.
  • results: In a challenging uncontrolled T-intersection scenario, the learned ego driving policy generalizes well to unseen situations with out-of-distribution (OOD) social agents' behaviors.
    Abstract Although deep reinforcement learning (DRL) has shown promising results for autonomous navigation in interactive traffic scenarios, existing work typically adopts a fixed behavior policy to control social vehicles in the training environment. This may cause the learned driving policy to overfit the environment, making it difficult to interact well with vehicles with different, unseen behaviors. In this work, we introduce an efficient method to train diverse driving policies for social vehicles as a single meta-policy. By randomizing the interaction-based reward functions of social vehicles, we can generate diverse objectives and efficiently train the meta-policy through guiding policies that achieve specific objectives. We further propose a training strategy to enhance the robustness of the ego vehicle's driving policy using the environment where social vehicles are controlled by the learned meta-policy. Our method successfully learns an ego driving policy that generalizes well to unseen situations with out-of-distribution (OOD) social agents' behaviors in a challenging uncontrolled T-intersection scenario.
    摘要 (Simplified Chinese)尽管深度强化学习(DRL)在交互式交通场景中展现出了扎实的结果,现有的工作通常采用固定行为策略来控制社交车辆在训练环境中。这可能会使学习的驾驶策略过拟合环境,使其与不同、未见的行为的社交车辆困难交流。在这种工作中,我们提出了一种高效的方法,用于训练社交车辆的多样化驾驶策略。通过随机 modify social vehicle的互动基于奖励函数,我们可以生成多样的目标,并高效地通过指导策略来训练单一的元策略。我们还提出了一种增强ego车驾驶策略的训练策略,使用已学习的元策略控制社交车辆的环境。我们的方法成功地学习了一个 Egode驾驶策略,该策略在未见的社交车辆行为情况下可以普适地应用。

Curvature-based Clustering on Graphs

  • paper_url: http://arxiv.org/abs/2307.10155
  • repo_url: https://github.com/agosztolai/geometric_clustering
  • paper_authors: Yu Tian, Zachary Lubberts, Melanie Weber
  • for: Studies unsupervised node clustering (community detection) algorithms that exploit graph geometry to identify densely connected substructures, i.e., clusters or communities.
  • methods: Implements discrete Ricci curvatures and their associated geometric flows, under which edge weights evolve to reveal community structure; several discrete curvature notions are considered and analyzed, covering both single-membership and mixed-membership (overlapping) community detection, with the latter performed on the line graph (the graph's dual).
  • results: Provides both theoretical and empirical evidence for the utility of the curvature-based clustering algorithms, together with results on the relationship between the curvature of a graph and that of its dual, which may be of independent interest for curvature-based network analysis.
    Abstract Unsupervised node clustering (or community detection) is a classical graph learning task. In this paper, we study algorithms, which exploit the geometry of the graph to identify densely connected substructures, which form clusters or communities. Our method implements discrete Ricci curvatures and their associated geometric flows, under which the edge weights of the graph evolve to reveal its community structure. We consider several discrete curvature notions and analyze the utility of the resulting algorithms. In contrast to prior literature, we study not only single-membership community detection, where each node belongs to exactly one community, but also mixed-membership community detection, where communities may overlap. For the latter, we argue that it is beneficial to perform community detection on the line graph, i.e., the graph's dual. We provide both theoretical and empirical evidence for the utility of our curvature-based clustering algorithms. In addition, we give several results on the relationship between the curvature of a graph and that of its dual, which enable the efficient implementation of our proposed mixed-membership community detection approach and which may be of independent interest for curvature-based network analysis.
    摘要 不监督节点划分(或社区探测)是一个经典的图学任务。在这篇论文中,我们研究了利用图形的几何特性来标识紧密连接的子结构,它们组成社区或社区。我们的方法利用离散 Ricci 曲率和其相关的几何流动,以便在图形中揭示社区结构。我们考虑了多种离散曲率概念,并分析了它们的使用价值。与先前文献不同,我们不仅研究单会员社区探测,每个节点都属于唯一一个社区,还研究了混合会员社区探测,社区可能 overlap。为实现后一种情况,我们提出在对граф的 dual 进行社区探测,即线图。我们提供了理论和实验证明,以及对权重图的 curvature 的分析,这些结果可能为 curvature-based 网络分析提供帮助。
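The paper works with Ricci curvature and geometric flows; as a rough, much simpler stand-in for the same intuition (negatively curved edges tend to be inter-community bridges), the sketch below scores each edge with a simple augmented Forman-style curvature and cuts the most negatively curved edges on a toy graph. It assumes networkx is available and is not the paper's algorithm.

```python
# Rough stand-in for curvature-based clustering (the paper uses Ricci curvature
# and a geometric flow): score each edge with a simple augmented Forman-style
# curvature and cut the most negatively curved edges; connected components of
# what remains are the communities. Requires networkx.
import networkx as nx

def forman_curvature(G, u, v):
    triangles = len(list(nx.common_neighbors(G, u, v)))
    return 4 - G.degree(u) - G.degree(v) + 3 * triangles

# two dense cliques joined by a single "bridge" edge
G = nx.disjoint_union(nx.complete_graph(5), nx.complete_graph(5))
G.add_edge(0, 5)

curv = {e: forman_curvature(G, *e) for e in G.edges()}
H = G.copy()
H.remove_edges_from([e for e, c in curv.items() if c < min(curv.values()) + 1])
print(sorted(map(sorted, nx.connected_components(H))))
# the bridge edge is the most negatively curved, so it is cut first
```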

Code Detection for Hardware Acceleration Using Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10348
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Pablo Antonio Martínez, Gregorio Bernabé, José Manuel García
  • for: Explores the use of large language models (LLMs) for code detection.
  • methods: Proposes both a preliminary, naive prompt and a novel prompting strategy for detecting essential kernels such as matrix multiplication, convolution, and fast Fourier transform implemented in C/C++.
  • results: The novel prompting strategy substantially reduces false positives, achieving excellent overall accuracy (91.1%, 97.9%, and 99.7% for GEMM, convolution, and FFT, respectively), which poses a considerable challenge to existing code detection methods.
    Abstract Large language models (LLMs) have been massively applied to many tasks, often surpassing state-of-the-art approaches. While their effectiveness in code generation has been extensively studied (e.g., AlphaCode), their potential for code detection remains unexplored. This work presents the first analysis of code detection using LLMs. Our study examines essential kernels, including matrix multiplication, convolution, and fast-fourier transform, implemented in C/C++. We propose both a preliminary, naive prompt and a novel prompting strategy for code detection. Results reveal that conventional prompting achieves great precision but poor accuracy (68.8%, 22.3%, and 79.2% for GEMM, convolution, and FFT, respectively) due to a high number of false positives. Our novel prompting strategy substantially reduces false positives, resulting in excellent overall accuracy (91.1%, 97.9%, and 99.7%, respectively). These results pose a considerable challenge to existing state-of-the-art code detection methods.
    摘要 大型语言模型(LLM)已经广泛应用于多种任务,经常超越现有的方法。而它们在代码生成中的应用则未得到充分探讨。 本研究是代码检测使用 LLM 的首次分析。我们的研究探讨了重要的核心操作,包括矩阵乘法、卷积和快速傅立叶变换,它们在 C/C++ 中实现。我们提出了一个初步、简单的提示和一个新的提示策略来进行代码检测。结果显示,传统的提示方法可以很好地精度(68.8%、22.3%和79.2%),但是受到许多假阳性的影响,因此精度低。我们的新提示策略可以干扰假阳性,实现了优秀的总精度(91.1%、97.9%和99.7%)。这些结果对现有的代码检测方法提出了严重的挑战。
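The abstract does not reproduce the paper's prompt wording, so the snippet below is only a hypothetical sketch of the kind of two-step prompt that aims to reduce false positives: ask the model to describe the code first, then demand a strict yes/no verdict. The `query_llm` call is a placeholder, not a real API.

```python
# Hypothetical sketch of a code-detection prompt (the paper's actual prompt
# wording is not reproduced here). `query_llm` is a placeholder for whatever
# LLM client is in use, not a real API.
def build_detection_prompt(code: str, kernel: str) -> str:
    return (
        "You are analyzing a C/C++ snippet for hardware acceleration.\n"
        "Step 1: briefly describe what the code computes.\n"
        f"Step 2: answer strictly YES or NO: does it implement {kernel}?\n"
        f"Answer NO if the code merely uses loops or arrays without performing {kernel} itself.\n\n"
        "--- code ---\n" + code + "\n--- end code ---"
    )

snippet = """
for (int i = 0; i < N; i++)
  for (int j = 0; j < N; j++)
    for (int k = 0; k < N; k++)
      C[i][j] += A[i][k] * B[k][j];
"""
prompt = build_detection_prompt(snippet, "general matrix multiplication (GEMM)")
print(prompt)
# answer = query_llm(prompt)   # placeholder: send to the chosen LLM, parse YES/NO
```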

Benchmarking Potential Based Rewards for Learning Humanoid Locomotion

  • paper_url: http://arxiv.org/abs/2307.10142
  • repo_url: https://github.com/se-hwan/pbrs-humanoid
  • paper_authors: Se Hwan Jeon, Steve Heim, Charles Khazoom, Sangbae Kim
  • for: Benchmarks standard reward shaping against potential-based reward shaping (PBRS) as ways to accelerate reinforcement learning (RL).
  • methods: Compares the two reward-shaping approaches in experiments on a humanoid robot.
  • results: In this high-dimensional system, PBRS offers only marginal benefits in convergence speed, but its reward terms are significantly more robust to scaling and therefore easier to tune.
    Abstract The main challenge in developing effective reinforcement learning (RL) pipelines is often the design and tuning the reward functions. Well-designed shaping reward can lead to significantly faster learning. Naively formulated rewards, however, can conflict with the desired behavior and result in overfitting or even erratic performance if not properly tuned. In theory, the broad class of potential based reward shaping (PBRS) can help guide the learning process without affecting the optimal policy. Although several studies have explored the use of potential based reward shaping to accelerate learning convergence, most have been limited to grid-worlds and low-dimensional systems, and RL in robotics has predominantly relied on standard forms of reward shaping. In this paper, we benchmark standard forms of shaping with PBRS for a humanoid robot. We find that in this high-dimensional system, PBRS has only marginal benefits in convergence speed. However, the PBRS reward terms are significantly more robust to scaling than typical reward shaping approaches, and thus easier to tune.
    摘要 主要挑战在开发有效的增强学习(RL)管道是设计和调整奖金函数。Well-designed 形式的奖金可以导致学习速度明显提高。然而,Naively 定义的奖金可能会与愿望行为冲突,导致过拟合或者even erratic performance if not properly tuned。理论上,广泛的 potential based reward shaping(PBRS)可以帮助导引学习过程,无需affecting the optimal policy。虽然一些研究已经探讨了使用 potential based reward shaping 加速学习的整合,但大多数研究仅限于格子世界和低维系统,RL 在机器人领域主要依靠标准的奖金形式。在这篇论文中,我们对标准的奖金形式和 PBRS 进行了对比,发现在这个高维系统中,PBRS 只有微妙的加速学习速度。然而,PBRS 奖金项是标准奖金形式相比较更加Robust 尺度,因此更容易调整。
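PBRS adds the shaping term gamma * phi(s') - phi(s) to the task reward, which is guaranteed not to change the optimal policy. The sketch below shows the formula with a hypothetical potential (negative deviation of torso height from an upright target); the potential choice is illustrative, not the paper's benchmark definition.

```python
# Potential-based reward shaping: the shaping term gamma * phi(s') - phi(s)
# is added to the task reward and leaves the optimal policy unchanged.
# The potential below is an illustrative choice, not the paper's definition.
def shaped_reward(reward, state, next_state, phi, gamma=0.99):
    return reward + gamma * phi(next_state) - phi(state)

def phi(state):
    return -abs(state["torso_height"] - 1.3)     # prefer staying near 1.3 m

r = shaped_reward(
    reward=0.0,
    state={"torso_height": 1.0},
    next_state={"torso_height": 1.2},
    phi=phi,
)
print(r)   # positive: the agent moved towards the upright potential
```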

ProtiGeno: a prokaryotic short gene finder using protein language models

  • paper_url: http://arxiv.org/abs/2307.10343
  • repo_url: https://github.com/tonytu16/protigeno
  • paper_authors: Tony Tu, Gautham Krishna, Amirali Aghazadeh
  • for: Improve the accuracy and recall of prokaryotic gene prediction, especially for short open reading frames (ORFs) shorter than 180 nucleotides.
  • methods: Develops ProtiGeno, a deep learning method that uses a protein language model trained on millions of evolved proteins to predict short prokaryotic genes.
  • results: In systematic large-scale experiments on 4,288 prokaryotic genomes, ProtiGeno predicts short coding and noncoding genes with higher accuracy and recall than current state-of-the-art gene finders; the predictive features and possible limitations are discussed by visualizing the three-dimensional structure of the predicted short genes.
    Abstract Prokaryotic gene prediction plays an important role in understanding the biology of organisms and their function with applications in medicine and biotechnology. Although the current gene finders are highly sensitive in finding long genes, their sensitivity decreases noticeably in finding shorter genes (<180 nts). The culprit is insufficient annotated gene data to identify distinguishing features in short open reading frames (ORFs). We develop a deep learning-based method called ProtiGeno, specifically targeting short prokaryotic genes using a protein language model trained on millions of evolved proteins. In systematic large-scale experiments on 4,288 prokaryotic genomes, we demonstrate that ProtiGeno predicts short coding and noncoding genes with higher accuracy and recall than the current state-of-the-art gene finders. We discuss the predictive features of ProtiGeno and possible limitations by visualizing the three-dimensional structure of the predicted short genes. Data, codes, and models are available at https://github.com/tonytu16/protigeno.

Gradient Sparsification For Masked Fine-Tuning of Transformers

  • paper_url: http://arxiv.org/abs/2307.10098
  • repo_url: None
  • paper_authors: James O’ Neill, Sourav Dutta
  • for: Investigates how to improve fine-tuning performance in transfer learning and proposes GradDrop, a gradient sparsification method.
  • methods: Evaluates GradDrop and its variants on the multilingual XGLUE benchmark using the XLMR-Large model.
  • results: GradDrop is competitive with methods that use additional translated data for intermediate pretraining and outperforms standard fine-tuning and gradual unfreezing; a post-analysis further shows that it improves performance on languages it was not trained on, such as under-resourced languages.
    Abstract Fine-tuning pretrained self-supervised language models is widely adopted for transfer learning to downstream tasks. Fine-tuning can be achieved by freezing gradients of the pretrained network and only updating gradients of a newly added classification layer, or by performing gradient updates on all parameters. Gradual unfreezing makes a trade-off between the two by gradually unfreezing gradients of whole layers during training. This has been an effective strategy to trade-off between storage and training speed with generalization performance. However, it is not clear whether gradually unfreezing layers throughout training is optimal, compared to sparse variants of gradual unfreezing which may improve fine-tuning performance. In this paper, we propose to stochastically mask gradients to regularize pretrained language models for improving overall fine-tuned performance. We introduce GradDrop and variants thereof, a class of gradient sparsification methods that mask gradients during the backward pass, acting as gradient noise. GradDrop is sparse and stochastic unlike gradual freezing. Extensive experiments on the multilingual XGLUE benchmark with XLMR-Large show that GradDrop is competitive against methods that use additional translated data for intermediate pretraining and outperforms standard fine-tuning and gradual unfreezing. A post-analysis shows how GradDrop improves performance with languages it was not trained on, such as under-resourced languages.
    摘要 广泛采用已经预训练自主语言模型的微调是为了进行转移学习到下游任务。微调可以通过冻结预训练网络的梯度来实现,或者是通过在新增的分类层上进行梯度更新。渐进冻结可以在训练过程中逐渐解冻整个层的梯度,从而实现存储和训练速度之间的平衡。然而,不知道渐进冻结layers在训练过程中是最佳的,相比之下, sparse variant of gradual unfreezing可能会提高微调性能。在这篇论文中,我们提出了随机层梯度掩码来规范预训练语言模型,以提高总体微调性能。我们引入GradDrop和其变种,它是一类梯度减少方法,在反向传播中随机掩码梯度。GradDrop是不同于渐进冻结的,它是随机和粗略的。我们在多语言XGLUE标准测试 benchmark上进行了广泛的实验,结果显示GradDrop和其变种与使用额外翻译数据进行中间预训练的方法相当竞争,并且超过了标准微调和渐进冻结。一种后期分析表明,GradDrop在未经训练的语言上提高性能,如受到了资源的语言。
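The core of GradDrop as described is stochastic masking of gradients during the backward pass, acting as gradient noise. The bare-bones numpy sketch below applies a Bernoulli mask to the computed gradient before each parameter update; the keep probability and rescaling are illustrative choices, and the paper's exact variants differ.

```python
# Bare-bones sketch of stochastic gradient masking in the spirit of GradDrop:
# each step drops a random subset of gradient entries before the update
# (keep probability and rescaling here are illustrative choices).
import numpy as np

rng = np.random.default_rng(0)

def graddrop_step(w, grad, lr=0.1, keep_prob=0.7):
    mask = rng.random(grad.shape) < keep_prob
    sparse_grad = np.where(mask, grad, 0.0) / keep_prob   # unbiased in expectation
    return w - lr * sparse_grad

# toy quadratic objective 0.5 * ||w - w_star||^2, whose gradient is (w - w_star)
w_star = np.array([1.0, -2.0, 0.5, 3.0])
w = np.zeros_like(w_star)
for _ in range(200):
    w = graddrop_step(w, w - w_star)
print(w)   # converges to w_star despite the per-step sparsified gradients
```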

Revisiting invariances and introducing priors in Gromov-Wasserstein distances

  • paper_url: http://arxiv.org/abs/2307.10093
  • repo_url: None
  • paper_authors: Pinar Demetci, Quang Huy Tran, Ievgen Redko, Ritambhara Singh
  • for: Proposes a new optimal-transport-based distance for comparing samples across metric spaces that allows some control over how much invariance to transformations is permitted.
  • methods: The augmented Gromov-Wasserstein distance considers pairwise sample similarities and additionally incorporates feature alignments, making better use of prior knowledge about the input data.
  • results: Provides theoretical insights into the proposed metric and demonstrates its usefulness on single-cell multi-omic alignment tasks and a transfer learning scenario in machine learning.
    Abstract Gromov-Wasserstein distance has found many applications in machine learning due to its ability to compare measures across metric spaces and its invariance to isometric transformations. However, in certain applications, this invariance property can be too flexible, thus undesirable. Moreover, the Gromov-Wasserstein distance solely considers pairwise sample similarities in input datasets, disregarding the raw feature representations. We propose a new optimal transport-based distance, called Augmented Gromov-Wasserstein, that allows for some control over the level of rigidity to transformations. It also incorporates feature alignments, enabling us to better leverage prior knowledge on the input data for improved performance. We present theoretical insights into the proposed metric. We then demonstrate its usefulness for single-cell multi-omic alignment tasks and a transfer learning scenario in machine learning.
    摘要 《Gromov-Wasserstein距离》在机器学习中发现了广泛的应用,主要是因为它可以比较度量空间中的度量,并且对于同态变换是不变的。然而,在某些应用场景中,这种不变性属性可能是不需要的,甚至是不жела的。此外,Gromov-Wasserstein距离仅考虑输入数据集中的对应关系,不考虑原始特征表示。我们提议一种新的优化的Gromov-Wasserstein距离,called Augmented Gromov-Wasserstein,允许控制变换的级别。它还包含特征对齐,使得我们可以更好地利用输入数据中的先验知识,提高性能。我们提供了关于提议度量的理论听见。然后,我们在单细ће多元素Alignment任务和机器学习中的传递学习场景中展示了其用于。