paper_authors: Jialu Li, Junhui Li, Pu Wang, Youshan Zhang
for: Improving the performance of speech denoising
methods: Proposes a deep complex hybrid transformer that integrates both spectrogram- and waveform-domain approaches to improve speech denoising performance
results: Experiments show that the method outperforms state-of-the-art approaches on the BirdSoundsDenoising and VCTK+DEMAND datasets.
Abstract
Most current deep learning-based approaches for speech enhancement operate only in the spectrogram or waveform domain. Although a cross-domain transformer combining waveform- and spectrogram-domain inputs has been proposed, its performance can be further improved. In this paper, we present a novel deep complex hybrid transformer that integrates spectrogram- and waveform-domain approaches to improve the performance of speech enhancement. The proposed model consists of two parts: a complex Swin-Unet in the spectrogram domain and a dual-path transformer network (DPTnet) in the waveform domain. We first construct a complex Swin-Unet network in the spectrogram domain and perform speech enhancement on the complex audio spectrum. We then introduce an improved DPT by adding memory-compressed attention. Our model is capable of learning multi-domain features to reduce existing noise in different domains in a complementary way. The experimental results on the BirdSoundsDenoising dataset and the VCTK+DEMAND dataset indicate that our method can achieve better performance compared to state-of-the-art methods.
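For orientation on one named ingredient: memory-compressed attention shortens the key/value sequence with a strided convolution before standard attention, which is what keeps long waveform sequences tractable. A minimal PyTorch sketch of the general technique follows (module layout and dimensions are our illustrative assumptions, not the paper's implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MemoryCompressedAttention(nn.Module):
    """Self-attention whose keys/values are compressed along time with a
    strided 1D convolution, cutting attention cost from O(T^2) to O(T*T/stride)."""
    def __init__(self, dim, num_heads=4, stride=4):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        # Strided convolutions both project and downsample K and V.
        self.k = nn.Conv1d(dim, dim, kernel_size=stride, stride=stride)
        self.v = nn.Conv1d(dim, dim, kernel_size=stride, stride=stride)
        self.num_heads = num_heads
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                  # x: (batch, time, dim)
        b, t, d = x.shape
        h = self.num_heads
        q = self.q(x).view(b, t, h, d // h).transpose(1, 2)
        kv_in = x.transpose(1, 2)          # (batch, dim, time) for Conv1d
        k = self.k(kv_in).transpose(1, 2)  # (batch, time/stride, dim)
        v = self.v(kv_in).transpose(1, 2)
        k = k.view(b, -1, h, d // h).transpose(1, 2)
        v = v.view(b, -1, h, d // h).transpose(1, 2)
        attn = F.softmax(q @ k.transpose(-2, -1) / (d // h) ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(y)

x = torch.randn(2, 64, 128)
print(MemoryCompressedAttention(128)(x).shape)  # torch.Size([2, 64, 128])
```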
Sound of Story: Multi-modal Storytelling with Audio
results: Experiments show that the dataset and tasks help researchers better understand the multi-modal expression of stories, and strong baselines are provided. The dataset and code will be released at https://github.com/Sosdatasets/SoS_Dataset.
Abstract
Storytelling is multi-modal in the real world. When one tells a story, one may use all of the visualizations and sounds along with the story itself. However, prior studies on storytelling datasets and tasks have paid little attention to sound even though sound also conveys meaningful semantics of the story. Therefore, we propose to extend story understanding and telling areas by establishing a new component called "background sound" which is story context-based audio without any linguistic information. For this purpose, we introduce a new dataset, called "Sound of Story (SoS)", which has paired image and text sequences with corresponding sound or background music for a story. To the best of our knowledge, this is the largest well-curated dataset for storytelling with sound. Our SoS dataset consists of 27,354 stories with 19.6 images per story and 984 hours of speech-decoupled audio such as background music and other sounds. As benchmark tasks for storytelling with sound and the dataset, we propose retrieval tasks between modalities, and audio generation tasks from image-text sequences, introducing strong baselines for them. We believe the proposed dataset and tasks may shed light on the multi-modal understanding of storytelling in terms of sound. The dataset and baseline code for each task will be released at: https://github.com/Sosdatasets/SoS_Dataset.
results: Our model achieves state-of-the-art results on the second COG-MHEAR Audio-Visual Speech Enhancement Challenge, outperforming other models by a significant margin. We also perform an extensive analysis of the results under the two scenarios.
Abstract
Target speech extraction aims to extract, based on a given conditioning cue, a target speech signal that is corrupted by interfering sources, such as noise or competing speakers. Building upon the achievements of the state-of-the-art (SOTA) time-frequency speaker separation model TF-GridNet, we propose AV-GridNet, a visual-grounded variant that incorporates the face recording of a target speaker as a conditioning factor during the extraction process. Recognizing the inherent dissimilarities between speech and noise signals as interfering sources, we also propose SAV-GridNet, a scenario-aware model that identifies the type of interfering scenario first and then applies a dedicated expert model trained specifically for that scenario. Our proposed model achieves SOTA results on the second COG-MHEAR Audio-Visual Speech Enhancement Challenge, outperforming other models by a significant margin, objectively and in a listening test. We also perform an extensive analysis of the results under the two scenarios.
results: The study finds that analyzing facial asymmetry and micro-expressions yields information free from interviewer bias and social desirability effects, making the selection process more accurate and fair.
Abstract
Choosing the right person for the right job makes the personnel interview process a cognitively demanding task. Psychometric tests, followed by an interview, have often been used to aid the process, although such mechanisms have their limitations. While psychometric tests suffer from faking or social desirability of responses, the interview process depends on the way the responses are analyzed by the interviewers. We propose the use of behaviometry as an assistive tool to facilitate an objective assessment of the interviewee without increasing the cognitive load of the interviewer. Behaviometry is a relatively little explored field of study in the selection process, which utilizes inimitable behavioral characteristics like facial expressions, vocalization patterns, pupillary reactions, proximal behavior, body language, etc. The method analyzes thin slices of behavior and provides unbiased information about the interviewee. The current study proposes the methodology behind this tool to capture facial expressions, in terms of facial asymmetry and micro-expressions. Hemi-facial composites using a structural similarity index were used to develop a progressive time graph of facial asymmetry, as a test case. A frame-by-frame analysis was performed on three YouTube video samples, where structural similarity index (SSID) scores of 75% or more showed behavioral congruence. The research utilizes open-source computer vision algorithms and libraries (python-opencv and dlib) to formulate the procedure for analysis of the facial asymmetry.
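A minimal sketch of a per-frame asymmetry measure of the kind described, using dlib's face detector and scikit-image's SSIM; the half-face cropping and mirroring scheme is our illustrative assumption, not the study's exact procedure, and the video path is a placeholder:

```python
import cv2
import dlib
import numpy as np
from skimage.metrics import structural_similarity

detector = dlib.get_frontal_face_detector()

def asymmetry_score(frame_bgr):
    """SSIM between the left half-face and the mirrored right half-face;
    scores near 1.0 indicate high facial symmetry."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    f = faces[0]
    face = gray[max(f.top(), 0):f.bottom(), max(f.left(), 0):f.right()]
    h, w = face.shape
    half = w // 2
    left = face[:, :half]
    right_mirrored = cv2.flip(face[:, w - half:], 1)
    return structural_similarity(left, right_mirrored)

cap = cv2.VideoCapture("interview.mp4")  # placeholder input video
scores = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    s = asymmetry_score(frame)
    if s is not None:
        scores.append(s)
cap.release()
print(f"mean symmetry SSIM over {len(scores)} frames: {np.mean(scores):.3f}")
```

Plotting the per-frame scores over time yields the progressive time graph of facial asymmetry described above.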
LinFlo-Net: A two-stage deep learning method to generate simulation ready meshes of the heart
for: automatic generation of computer models of the human heart from patient imaging data
methods: two-stage diffeomorphic deformation process with a novel loss function to minimize mesh self-penetration
results: meshes free of self-intersections, comparable accuracy with state-of-the-art methods, and ready for use in physics-based simulation without post-processing.
Abstract
We present a deep learning model to automatically generate computer models of the human heart from patient imaging data with an emphasis on its capability to generate thin-walled cardiac structures. Our method works by deforming a template mesh to fit the cardiac structures to the given image. Compared with prior deep learning methods that adopted this approach, our framework is designed to minimize mesh self-penetration, which typically arises when deforming surface meshes separated by small distances. We achieve this by using a two-stage diffeomorphic deformation process along with a novel loss function derived from the kinematics of motion that penalizes surface contact and interpenetration. Our model demonstrates comparable accuracy with state-of-the-art methods while additionally producing meshes free of self-intersections. The resultant meshes are readily usable in physics-based simulation, minimizing the need for post-processing and cleanup.
A Scalable Training Strategy for Blind Multi-Distribution Noise Removal
results: Using an adaptive-sampling/active-learning strategy over the training data, a single blind, generalist denoiser network achieves performance within a uniform bound of specialized denoiser networks across a large range of operating conditions.
Abstract
Despite recent advances, developing general-purpose universal denoising and artifact-removal networks remains largely an open problem: Given fixed network weights, one inherently trades off specialization at one task (e.g., removing Poisson noise) for performance at another (e.g., removing speckle noise). In addition, training such a network is challenging due to the curse of dimensionality: As one increases the dimensions of the specification-space (i.e., the number of parameters needed to describe the noise distribution), the number of unique specifications one needs to train for grows exponentially. Uniformly sampling this space will result in a network that does well at very challenging problem specifications but poorly at easy problem specifications, where even large errors will have a small effect on the overall mean squared error. In this work we propose training denoising networks using an adaptive-sampling/active-learning strategy. Our work improves upon a recently proposed universal denoiser training strategy by extending these results to higher dimensions and by incorporating a polynomial approximation of the true specification-loss landscape. This approximation allows us to reduce training times by almost two orders of magnitude. We test our method on simulated joint Poisson-Gaussian-Speckle noise and demonstrate that with our proposed training strategy, a single blind, generalist denoiser network can achieve peak signal-to-noise ratios within a uniform bound of specialized denoiser networks across a large range of operating conditions. We also capture a small dataset of images with varying amounts of joint Poisson-Gaussian-Speckle noise and demonstrate that a universal denoiser trained using our adaptive-sampling strategy outperforms uniformly trained baselines.
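A toy NumPy sketch of the adaptive-sampling idea (not the paper's exact algorithm): specifications are drawn in proportion to the gap between the network's current loss and an estimated achievable loss, so training concentrates where the gap is largest; the fixed baseline here stands in for the paper's polynomial approximation of the specification-loss landscape:

```python
import numpy as np

rng = np.random.default_rng(0)

# Grid of noise specifications: (Poisson photon count, Gaussian sigma).
specs = [(lam, sig) for lam in (5.0, 20.0, 80.0) for sig in (0.05, 0.1, 0.2)]
n = len(specs)
running_loss = np.full(n, 1.0)              # running MSE of the network per spec
baseline_loss = np.linspace(0.02, 0.2, n)   # estimated achievable MSE per spec
                                            # (the paper fits a polynomial here)

def sampling_probs():
    gap = np.maximum(running_loss - baseline_loss, 1e-6)
    return gap / gap.sum()

def train_step(i):
    """Placeholder for one SGD step on images corrupted with specs[i];
    returns the batch MSE. Here we just fake gradual improvement."""
    return baseline_loss[i] + (running_loss[i] - baseline_loss[i]) * 0.95

for step in range(2000):
    i = rng.choice(n, p=sampling_probs())
    running_loss[i] = 0.9 * running_loss[i] + 0.1 * train_step(i)

print("excess MSE per spec:", np.round(running_loss - baseline_loss, 4))
```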
SolarFormer: Multi-scale Transformer for Solar PV Profiling
results: On multiple benchmark datasets, including GGE (France), IGN (France), and USGS (California, USA), the SolarFormer matches or surpasses existing models, improving solar panel mapping and segmentation.
Abstract
As climate change intensifies, the global imperative to shift towards sustainable energy sources becomes more pronounced. Photovoltaic (PV) energy is a favored choice due to its reliability and ease of installation. Accurate mapping of PV installations is crucial for understanding their adoption and informing energy policy. To meet this need, we introduce the SolarFormer, designed to segment solar panels from aerial imagery, offering insights into their location and size. However, solar panel identification in Computer Vision is intricate due to various factors like weather conditions, roof conditions, and Ground Sampling Distance (GSD) variations. To tackle these complexities, we present the SolarFormer, featuring a multi-scale Transformer encoder and a masked-attention Transformer decoder. Our model leverages low-level features and incorporates an instance query mechanism to enhance the localization of solar PV installations. We rigorously evaluated our SolarFormer using diverse datasets, including GGE (France), IGN (France), and USGS (California, USA), across different GSDs. Our extensive experiments consistently demonstrate that our model either matches or surpasses state-of-the-art models, promising enhanced solar panel segmentation for global sustainable energy initiatives.
Radiomics as a measure superior to the Dice similarity coefficient for tumor segmentation performance evaluation
paper_authors: Yoichi Watanabe, Rukhsora Akramova
for: This study aims to evaluate the segmentation ability of physicians and auto-segmentation tools in high-quality radiotherapy delivery by using Radiomics features as a superior measure compared to the widely used Dice Similarity Coefficient (DSC).
methods: The research involves selecting reproducible radiomics features for evaluating segmentation accuracy by analyzing radiomics data from 2 CT scans of 10 lung tumors, and using CT images from 10 patients, each segmented by different physicians or auto-segmentation tools, to assess segmentation performance.
results: The study reveals 206 radiomics features with a Concordance Correlation Coefficient (CCC) greater than 0.93 between the two CT images, indicating robust reproducibility. Seven features exhibit low Intraclass Correlation Coefficients (ICC), signifying increased sensitivity to segmentation differences. The findings suggest that Radiomics features, particularly those related to shape and energy, can capture subtle variations in tumor segmentation characteristics, unlike DSC.
Abstract
In high-quality radiotherapy delivery, precise segmentation of targets and healthy structures is essential. This study proposes Radiomics features as a superior measure for assessing the segmentation ability of physicians and auto-segmentation tools, in comparison to the widely used Dice Similarity Coefficient (DSC). The research involves selecting reproducible radiomics features for evaluating segmentation accuracy by analyzing radiomics data from 2 CT scans of 10 lung tumors, available in the RIDER Data Library. Radiomics features were extracted using PyRadiomics, with selection based on the Concordance Correlation Coefficient (CCC). Subsequently, CT images from 10 patients, each segmented by different physicians or auto-segmentation tools, were used to assess segmentation performance. The study reveals 206 radiomics features with a CCC greater than 0.93 between the two CT images, indicating robust reproducibility. Among these features, seven exhibit low Intraclass Correlation Coefficients (ICC), signifying increased sensitivity to segmentation differences. Notably, ICCs of original shape features, including sphericity, elongation, and flatness, ranged from 0.1177 to 0.995. In contrast, all DSC values exceeded 0.778. This research demonstrates that radiomics features, particularly those related to shape and energy, can capture subtle variations in tumor segmentation characteristics, unlike DSC. As a result, Radiomics features with ICC prove superior for evaluating a physician's tumor segmentation ability and the performance of auto-segmentation tools. The findings suggest that these new metrics can be employed to assess novel auto-segmentation methods and enhance the training of individuals in medical segmentation, thus contributing to improved radiotherapy practices.
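For reference, the Concordance Correlation Coefficient used for the feature screening has a closed form, CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2); a small NumPy check with made-up feature values:

```python
import numpy as np

def concordance_cc(x, y):
    """Lin's concordance correlation coefficient between paired measurements."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# Same radiomics feature extracted from two repeat CT scans (toy values).
scan1 = np.array([0.91, 0.75, 0.62, 0.88, 0.70])
scan2 = np.array([0.90, 0.77, 0.60, 0.85, 0.72])
print(f"CCC = {concordance_cc(scan1, scan2):.3f}")  # close to 1 => reproducible
```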
Adaptive Anchor Label Propagation for Transductive Few-Shot Learning
results: The proposed Adaptive Anchor Label Propagation algorithm outperforms standard label propagation by 7% and 2% in the 1-shot and 5-shot settings, respectively. Experiments demonstrate its merits on four widely used few-shot benchmark datasets (miniImageNet, tieredImageNet, CUB, and CIFAR-FS) with two commonly used backbones (ResNet12 and WideResNet-28-10). The code is available on GitHub.
Abstract
Few-shot learning addresses the issue of classifying images using limited labeled data. Exploiting unlabeled data through the use of transductive inference methods such as label propagation has been shown to improve the performance of few-shot learning significantly. Label propagation infers pseudo-labels for unlabeled data by utilizing a constructed graph that exploits the underlying manifold structure of the data. However, a limitation of the existing label propagation approaches is that the positions of all data points are fixed and might be sub-optimal so that the algorithm is not as effective as possible. In this work, we propose a novel algorithm that adapts the feature embeddings of the labeled data by minimizing a differentiable loss function optimizing their positions in the manifold in the process. Our novel algorithm, Adaptive Anchor Label Propagation, outperforms the standard label propagation algorithm by as much as 7% and 2% in the 1-shot and 5-shot settings respectively. We provide experimental results highlighting the merits of our algorithm on four widely used few-shot benchmark datasets, namely miniImageNet, tieredImageNet, CUB and CIFAR-FS and two commonly used backbones, ResNet12 and WideResNet-28-10. The source code can be found at https://github.com/MichalisLazarou/A2LP.
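For context, the standard label propagation that A2LP builds on (and additionally adapts the labeled embeddings for) has a simple closed form, F = (I - alpha*S)^(-1) Y over a normalized kNN graph; a NumPy sketch with toy features:

```python
import numpy as np

def label_propagation(features, labels, num_classes, k=3, alpha=0.9):
    """Closed-form label propagation (Zhou et al., 2004): F = (I - alpha*S)^-1 Y,
    where S is the symmetrically normalized kNN affinity matrix."""
    n = len(features)
    # Gaussian affinities, keeping only each point's k nearest neighbors.
    d2 = ((features[:, None] - features[None, :]) ** 2).sum(-1)
    W = np.exp(-d2)
    np.fill_diagonal(W, 0)
    keep = np.argsort(-W, axis=1)[:, :k]
    mask = np.zeros_like(W, bool)
    np.put_along_axis(mask, keep, True, axis=1)
    W = np.where(mask | mask.T, W, 0)             # symmetric kNN graph
    Dinv = np.diag(1 / np.sqrt(W.sum(1) + 1e-12))
    S = Dinv @ W @ Dinv
    Y = np.zeros((n, num_classes))
    for i, y in enumerate(labels):
        if y >= 0:                                # -1 marks unlabeled points
            Y[i, y] = 1
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(1)                            # pseudo-labels for all points

feats = np.random.default_rng(0).normal(size=(10, 5))
labels = np.array([0, 1, -1, -1, -1, -1, -1, -1, -1, -1])
print(label_propagation(feats, labels, num_classes=2))
```

A2LP's contribution, per the abstract, is to treat the labeled anchors' positions as learnable rather than fixed before running this propagation.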
Emotional Theory of Mind: Bridging Fast Visual Processing with Slow Linguistic Reasoning
results: On the image-to-language-to-emotion task, combining "fast" and "slow" reasoning improves emotion recognition systems. However, a gap remains on the zero-shot emotional theory of mind task compared to prior work trained on the EMOTIC dataset.
Abstract
The emotional theory of mind problem in images is an emotion recognition task, specifically asking "How does the person in the bounding box feel?" Facial expressions, body pose, contextual information and implicit commonsense knowledge all contribute to the difficulty of the task, making this task currently one of the hardest problems in affective computing. The goal of this work is to evaluate the emotional commonsense knowledge embedded in recent large vision language models (CLIP, LLaVA) and large language models (GPT-3.5) on the Emotions in Context (EMOTIC) dataset. In order to evaluate a purely text-based language model on images, we construct "narrative captions" relevant to emotion perception, using a set of 872 physical social signal descriptions related to 26 emotional categories, along with 224 labels for emotionally salient environmental contexts, sourced from writer's guides for character expressions and settings. We evaluate the use of the resulting captions in an image-to-language-to-emotion task. Experiments using zero-shot vision-language models on EMOTIC show that combining "fast" and "slow" reasoning is a promising way forward to improve emotion recognition systems. Nevertheless, a gap remains in the zero-shot emotional theory of mind task compared to prior work trained on the EMOTIC dataset.
Addressing Weak Decision Boundaries in Image Classification by Leveraging Web Search and Generative Models
paper_authors: Preetam Prabhu Srikar Dammu, Yunhe Feng, Chirag Shah
for: This paper aims to address the issue of bias and discrimination in machine learning (ML) models, specifically in the context of image classification.
methods: The proposed approach leverages the power of web search and generative models to enhance robustness and mitigate bias in ML models. The method involves identifying weak decision boundaries for vulnerable populations, constructing search queries for Google, and generating new training samples through DALL-E 2 and Stable Diffusion.
results: The proposed method achieved a significant reduction (77.30%) in the model's gender accuracy disparity, and improved the classifier's decision boundary with fewer weak spots and increased separation between classes.
Abstract
Machine learning (ML) technologies are known to be riddled with ethical and operational problems, however, we are witnessing an increasing thrust by businesses to deploy them in sensitive applications. One major issue among many is that ML models do not perform equally well for underrepresented groups. This puts vulnerable populations in an even disadvantaged and unfavorable position. We propose an approach that leverages the power of web search and generative models to alleviate some of the shortcomings of discriminative models. We demonstrate our method on an image classification problem using ImageNet's People Subtree subset, and show that it is effective in enhancing robustness and mitigating bias in certain classes that represent vulnerable populations (e.g., female doctor of color). Our new method is able to (1) identify weak decision boundaries for such classes; (2) construct search queries for Google as well as text for generating images through DALL-E 2 and Stable Diffusion; and (3) show how these newly captured training samples could alleviate population bias issue. While still improving the model's overall performance considerably, we achieve a significant reduction (77.30%) in the model's gender accuracy disparity. In addition to these improvements, we observed a notable enhancement in the classifier's decision boundary, as it is characterized by fewer weak spots and an increased separation between classes. Although we showcase our method on vulnerable populations in this study, the proposed technique is extendable to a wide range of problems and domains.
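A minimal sketch of the sample-generation step with the diffusers library; the prompt and model checkpoint are illustrative, and the paper's query construction from identified weak decision boundaries is more involved:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Prompts target an identified weak decision boundary, e.g. a class
# underrepresented at the intersection of gender and skin tone.
prompt = "a portrait photo of a female doctor of color in a hospital"
for i in range(8):
    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save(f"augment_doctor_{i:02d}.png")  # add to the training set
```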
‘Person’ == Light-skinned, Western Man, and Sexualization of Women of Color: Stereotypes in Stable Diffusion
for: The paper examines the stereotypes embedded in Stable Diffusion, a popular text-to-image generator, and how it assigns gender and nationality/continental identity to individuals based on the absence of information.
methods: The paper uses vision-language model CLIP’s cosine similarity to compare images generated by CLIP-based Stable Diffusion v2.1 and manual examination to chronicle the results of front-facing images of persons from different continents, nationalities, and genders.
results: The paper finds that Stable Diffusion outputs of "a person" without any additional gender/nationality information correspond closest to images of men and least with persons of nonbinary gender, and that the output images are more likely to be of European/North American men rather than women or nonbinary individuals. The paper also observes continental stereotypes and the harmful erasure of Indigenous Oceanic peoples, as well as the oversexualization of women, specifically Latin American, Mexican, Indian, and Egyptian women.
Abstract
We study stereotypes embedded within one of the most popular text-to-image generators: Stable Diffusion. We examine what stereotypes of gender and nationality/continental identity does Stable Diffusion display in the absence of such information i.e. what gender and nationality/continental identity is assigned to "a person", or to "a person from Asia". Using vision-language model CLIP's cosine similarity to compare images generated by CLIP-based Stable Diffusion v2.1 verified by manual examination, we chronicle results from 136 prompts (50 results/prompt) of front-facing images of persons from 6 different continents, 27 nationalities and 3 genders. We observe how Stable Diffusion outputs of "a person" without any additional gender/nationality information correspond closest to images of men and least with persons of nonbinary gender, and to persons from Europe/North America over Africa/Asia, pointing towards Stable Diffusion having a concerning representation of personhood to be a European/North American man. We also show continental stereotypes and resultant harms e.g. a person from Oceania is deemed to be Australian/New Zealander over Papua New Guinean, pointing to the erasure of Indigenous Oceanic peoples, who form a majority over descendants of colonizers both in Papua New Guinea and in Oceania overall. Finally, we unexpectedly observe a pattern of oversexualization of women, specifically Latin American, Mexican, Indian and Egyptian women relative to other nationalities, measured through an NSFW detector. This demonstrates how Stable Diffusion perpetuates Western fetishization of women of color through objectification in media, which if left unchecked will amplify this stereotypical representation. Image datasets are made publicly available.
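A sketch of the CLIP cosine-similarity comparison using Hugging Face transformers; the image file names are placeholders:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

generated = Image.open("sd_output_a_person.png")  # a Stable Diffusion output
reference = Image.open("reference_image.png")     # image for a known identity

inputs = processor(images=[generated, reference], return_tensors="pt")
with torch.no_grad():
    emb = model.get_image_features(**inputs)
emb = emb / emb.norm(dim=-1, keepdim=True)        # L2-normalize embeddings
print("cosine similarity:", (emb[0] @ emb[1]).item())
```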
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks
results: The study finds that convolutional networks pretrained in a supervised fashion on large datasets still perform best on most tasks, particularly classification and object detection. Moreover, with the same architectures and similarly sized pretraining datasets, SSL backbones are highly competitive with supervised ones.
Abstract
Neural network based computer vision systems are typically built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless backbones pretrained using various algorithms and datasets. While this abundance of choice has led to performance increases for a range of systems, it is difficult for practitioners to make informed decisions about which backbone to choose. Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more. Furthermore, BoB sheds light on promising directions for the research community to advance computer vision by illuminating strengths and weakness of existing approaches through a comprehensive analysis conducted on more than 1500 training runs. While vision transformers (ViTs) and self-supervised learning (SSL) are increasingly popular, we find that convolutional neural networks pretrained in a supervised fashion on large training sets still perform best on most tasks among the models we consider. Moreover, in apples-to-apples comparisons on the same architectures and similarly sized pretraining datasets, we find that SSL backbones are highly competitive, indicating that future works should perform SSL pretraining with advanced architectures and larger pretraining datasets. We release the raw results of our experiments along with code that allows researchers to put their own backbones through the gauntlet here: https://github.com/hsouri/Battle-of-the-Backbones
MIST: Medical Image Segmentation Transformer with Convolutional Attention Mixing (CAM) Decoder
results: Experiments show that our MIST model with the CAM decoder outperforms state-of-the-art models on the ACDC and Synapse datasets. We also show that the CAM decoder fuses low-level and high-level features from different network stages to improve segmentation performance.
Abstract
One of the common and promising deep learning approaches used for medical image segmentation is transformers, as they can capture long-range dependencies among the pixels by utilizing self-attention. Despite being successful in medical image segmentation, transformers face limitations in capturing local contexts of pixels in multimodal dimensions. We propose a Medical Image Segmentation Transformer (MIST) incorporating a novel Convolutional Attention Mixing (CAM) decoder to address this issue. MIST has two parts: a pre-trained multi-axis vision transformer (MaxViT) is used as an encoder, and the encoded feature representation is passed through the CAM decoder for segmenting the images. In the CAM decoder, an attention-mixer combining multi-head self-attention, spatial attention, and squeeze and excitation attention modules is introduced to capture long-range dependencies in all spatial dimensions. Moreover, to enhance spatial information gain, deep and shallow convolutions are used for feature extraction and receptive field expansion, respectively. The integration of low-level and high-level features from different network stages is enabled by skip connections, allowing MIST to suppress unnecessary information. The experiments show that our MIST transformer with CAM decoder outperforms the state-of-the-art models specifically designed for medical image segmentation on the ACDC and Synapse datasets. Our results also demonstrate that adding the CAM decoder with a hierarchical transformer improves segmentation performance significantly. Our model with data and code is publicly available on GitHub.
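For reference, two of the named attention-mixer ingredients are compact to write down; a PyTorch sketch of squeeze-and-excitation (channel) attention and a simple spatial-attention gate (the CAM decoder combines these with multi-head self-attention; the exact wiring is the paper's and is not shown here):

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel attention: global-average-pool, bottleneck MLP, sigmoid gate."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # (B, C) channel weights
        return x * w[:, :, None, None]

class SpatialAttention(nn.Module):
    """Spatial gate built from pooled channel statistics (CBAM-style)."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        stats = torch.cat([x.mean(1, keepdim=True),
                           x.amax(1, keepdim=True)], dim=1)  # (B, 2, H, W)
        return x * torch.sigmoid(self.conv(stats))

x = torch.randn(2, 64, 32, 32)
print(SpatialAttention()(SqueezeExcite(64)(x)).shape)  # (2, 64, 32, 32)
```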
DiffEnc: Variational Diffusion with a Learned Encoder
results: The proposed DiffEnc framework achieves state-of-the-art likelihood on CIFAR-10. The paper also derives a weighted diffusion loss approach in which the noise schedule can be optimized specifically for inference.
Abstract
Diffusion models may be viewed as hierarchical variational autoencoders (VAEs) with two improvements: parameter sharing for the conditional distributions in the generative process and efficient computation of the loss as independent terms over the hierarchy. We consider two changes to the diffusion model that retain these advantages while adding flexibility to the model. Firstly, we introduce a data- and depth-dependent mean function in the diffusion process, which leads to a modified diffusion loss. Our proposed framework, DiffEnc, achieves state-of-the-art likelihood on CIFAR-10. Secondly, we let the ratio of the noise variance of the reverse encoder process and the generative process be a free weight parameter rather than being fixed to 1. This leads to theoretical insights: For a finite depth hierarchy, the evidence lower bound (ELBO) can be used as an objective for a weighted diffusion loss approach and for optimizing the noise schedule specifically for inference. For the infinite-depth hierarchy, on the other hand, the weight parameter has to be 1 to have a well-defined ELBO.
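For orientation, a weighted diffusion loss of the kind referred to takes the following standard form (notation is ours, following variational diffusion models; this is a sketch, not DiffEnc's exact objective, which additionally uses the learned, data- and depth-dependent mean):

```latex
\mathcal{L}_w(\mathbf{x}) \;=\; \tfrac{1}{2}\,
\mathbb{E}_{t \sim \mathcal{U}(0,1),\; \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0},\mathbf{I})}
\left[\, w(t)\, \big\lVert \boldsymbol{\epsilon} - \hat{\boldsymbol{\epsilon}}_\theta(\mathbf{z}_t, t) \big\rVert^2 \right],
\qquad \mathbf{z}_t = \alpha_t\,\mathbf{x} + \sigma_t\,\boldsymbol{\epsilon}.
```

A schedule-determined choice of w(t) recovers the ELBO; the paper's point is that at finite depth the variance ratio behind the weight can be treated as a free parameter and optimized, whereas in the infinite-depth limit it must equal 1 for the ELBO to be well defined.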
MM-VID: Advancing Video Understanding with GPT-4V(ision)
paper_authors: Kevin Lin, Faisal Ahmed, Linjie Li, Chung-Ching Lin, Ehsan Azarnasab, Zhengyuan Yang, Jianfeng Wang, Lin Liang, Zicheng Liu, Yumao Lu, Ce Liu, Lijuan Wang
for: MM-VID is designed to facilitate advanced video understanding, particularly for long-form videos and intricate tasks such as reasoning within hour-long content and grasping storylines spanning multiple episodes.
methods: MM-VID uses video-to-script generation with GPT-4V to transcribe multimodal elements into a long textual script, enabling advanced capabilities such as audio description, character identification, and multimodal high-level comprehension.
results: Experimental results demonstrate the effectiveness of MM-VID in handling distinct video genres with various video lengths, and its potential when applied to interactive environments such as video games and graphic user interfaces.
Abstract
We present MM-VID, an integrated system that harnesses the capabilities of GPT-4V, combined with specialized tools in vision, audio, and speech, to facilitate advanced video understanding. MM-VID is designed to address the challenges posed by long-form videos and intricate tasks such as reasoning within hour-long content and grasping storylines spanning multiple episodes. MM-VID uses a video-to-script generation with GPT-4V to transcribe multimodal elements into a long textual script. The generated script details character movements, actions, expressions, and dialogues, paving the way for large language models (LLMs) to achieve video understanding. This enables advanced capabilities, including audio description, character identification, and multimodal high-level comprehension. Experimental results demonstrate the effectiveness of MM-VID in handling distinct video genres with various video lengths. Additionally, we showcase its potential when applied to interactive environments, such as video games and graphic user interfaces.
Intra-Modal Proxy Learning for Zero-Shot Visual Categorization with CLIP
results: Experiments show that the method learns the vision proxy within one minute on a single GPU and improves zero-shot transfer accuracy from 77.02% to 80.21% on ImageNet with ViT-L/14@336 pre-trained by CLIP.
Abstract
Vision-language pre-training methods, e.g., CLIP, demonstrate an impressive zero-shot performance on visual categorizations with the class proxy from the text embedding of the class name. However, the modality gap between the text and vision space can result in a sub-optimal performance. We theoretically show that the gap cannot be reduced sufficiently by minimizing the contrastive loss in CLIP and the optimal proxy for vision tasks may reside only in the vision space. Therefore, given unlabeled target vision data, we propose to learn the vision proxy directly with the help from the text proxy for zero-shot transfer. Moreover, according to our theoretical analysis, strategies are developed to further refine the pseudo label obtained by the text proxy to facilitate the intra-modal proxy learning (InMaP) for vision. Experiments on extensive downstream tasks confirm the effectiveness and efficiency of our proposal. Concretely, InMaP can obtain the vision proxy within one minute on a single GPU while improving the zero-shot accuracy from 77.02% to 80.21% on ImageNet with ViT-L/14@336 pre-trained by CLIP. Code is available at https://github.com/idstcv/InMaP.
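A simplified sketch of learning vision-space proxies from the text proxies and unlabeled image embeddings; this plain soft-assignment refinement loop is our illustrative reduction, not the paper's full InMaP procedure:

```python
import torch
import torch.nn.functional as F

def learn_vision_proxies(img_emb, text_proxy, temperature=0.01, iters=10):
    """img_emb: (N, D) L2-normalized image features;
    text_proxy: (C, D) L2-normalized class embeddings from the text encoder.
    Returns vision-space proxies as pseudo-label-weighted means of images."""
    proxy = text_proxy.clone()
    for _ in range(iters):
        pseudo = F.softmax(img_emb @ proxy.t() / temperature, dim=1)  # (N, C)
        proxy = F.normalize(pseudo.t() @ img_emb, dim=1)              # (C, D)
    return proxy

N, C, D = 1000, 10, 512
img_emb = F.normalize(torch.randn(N, D), dim=1)
text_proxy = F.normalize(torch.randn(C, D), dim=1)
vision_proxy = learn_vision_proxies(img_emb, text_proxy)
pred = (img_emb @ vision_proxy.t()).argmax(1)   # zero-shot predictions
print(pred.shape)  # torch.Size([1000])
```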
Tell Me What Is Good About This Property: Leveraging Reviews For Segment-Personalized Image Collection Summarization
results: The study shows that summarizing property images with the help of user reviews improves the quality and usefulness of the summaries without requiring costly annotations. Human perceptual studies show that users prefer the cross-modal approach (CrossSummarizer).
Abstract
Image collection summarization techniques aim to present a compact representation of an image gallery through a carefully selected subset of images that captures its semantic content. When it comes to web content, however, the ideal selection can vary based on the user's specific intentions and preferences. This is particularly relevant at Booking.com, where presenting properties and their visual summaries that align with users' expectations is crucial. To address this challenge, we consider user intentions in the summarization of property visuals by analyzing property reviews and extracting the most significant aspects mentioned by users. By incorporating the insights from reviews in our visual summaries, we enhance the summaries by presenting the relevant content to a user. Moreover, we achieve it without the need for costly annotations. Our experiments, including human perceptual studies, demonstrate the superiority of our cross-modal approach, which we coin as CrossSummarizer over the no-personalization and image-based clustering baselines.
ProMISe: Prompt-driven 3D Medical Image Segmentation Using Pretrained Image Foundation Models
results: On two public datasets for colon and pancreas tumor segmentation, the proposed method outperforms state-of-the-art segmentation methods.
Abstract
To address prevalent issues in medical imaging, such as data acquisition challenges and label availability, transfer learning from natural to medical image domains serves as a viable strategy to produce reliable segmentation results. However, several existing barriers between domains need to be broken down, including addressing contrast discrepancies, managing anatomical variability, and adapting 2D pretrained models for 3D segmentation tasks. In this paper, we propose ProMISe, a prompt-driven 3D medical image segmentation model using only a single point prompt to leverage knowledge from a pretrained 2D image foundation model. In particular, we use the pretrained vision transformer from the Segment Anything Model (SAM) and integrate lightweight adapters to extract depth-related (3D) spatial context without updating the pretrained weights. For robust results, a hybrid network with complementary encoders is designed, and a boundary-aware loss is proposed to achieve precise boundaries. We evaluate our model on two public datasets for colon and pancreas tumor segmentations, respectively. Compared to the state-of-the-art segmentation methods with and without prompt engineering, our proposed method achieves superior performance. The code is publicly available at https://github.com/MedICL-VU/ProMISe.
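A generic sketch of a lightweight adapter that adds trainable parameters while the pretrained backbone stays frozen; this is the common bottleneck-adapter form, and ProMISe's depth-related adapters differ in detail:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual down-project / nonlinearity / up-project adapter;
    only these few parameters are trained, the backbone stays frozen."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)   # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

# Freeze a pretrained block and train only the adapter.
backbone_block = nn.Linear(768, 768)      # stand-in for a SAM ViT block
for p in backbone_block.parameters():
    p.requires_grad = False
adapter = BottleneckAdapter(768)
tokens = torch.randn(4, 196, 768)
out = adapter(backbone_block(tokens))
print(out.shape)  # torch.Size([4, 196, 768])
```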
Deep-learning-based decomposition of overlapping-sparse images: application at the vertex of neutrino interactions
results: The method accurately extracts the kinematic parameters of low-momentum particles that would otherwise remain neglected, improving the energy resolution of reconstructed neutrino events. Combining it with a fully differentiable generative model further improves the image decomposition, achieving unprecedented results.
Abstract
Image decomposition plays a crucial role in various computer vision tasks, enabling the analysis and manipulation of visual content at a fundamental level. Overlapping images, which occur when multiple objects or scenes partially occlude each other, pose unique challenges for decomposition algorithms. The task intensifies when working with sparse images, where the scarcity of meaningful information complicates the precise extraction of components. This paper presents a solution that leverages the power of deep learning to accurately extract individual objects within multi-dimensional overlapping-sparse images, with a direct application in high-energy physics with decomposition of overlaid elementary particles obtained from imaging detectors. In particular, the proposed approach tackles a highly complex yet unsolved problem: identifying and measuring independent particles at the vertex of neutrino interactions, where one expects to observe detector images with multiple indiscernible overlapping charged particles. By decomposing the image of the detector activity at the vertex through deep learning, it is possible to infer the kinematic parameters of the identified low-momentum particles - which otherwise would remain neglected - and enhance the reconstructed energy resolution of the neutrino event. We also present an additional step - that can be tuned directly on detector data - combining the above method with a fully-differentiable generative model to improve the image decomposition further and, consequently, the resolution of the measured parameters, achieving unprecedented results. This improvement is crucial for precisely measuring the parameters that govern neutrino flavour oscillations and searching for asymmetries between matter and antimatter.
A Principled Hierarchical Deep Learning Approach to Joint Image Compression and Classification
results: The proposed method achieves accuracy improvements of up to 1.5% on CIFAR-10 and 3% on CIFAR-100 over conventional end-to-end cross-entropy training.
Abstract
Among applications of deep learning (DL) involving low cost sensors, remote image classification involves a physical channel that separates edge sensors and cloud classifiers. Traditional DL models must be divided between an encoder for the sensor and the decoder + classifier at the edge server. An important challenge is to effectively train such distributed models when the connecting channels have limited rate/capacity. Our goal is to optimize DL models such that the encoder latent requires low channel bandwidth while still delivers feature information for high classification accuracy. This work proposes a three-step joint learning strategy to guide encoders to extract features that are compact, discriminative, and amenable to common augmentations/transformations. We optimize latent dimension through an initial screening phase before end-to-end (E2E) training. To obtain an adjustable bit rate via a single pre-deployed encoder, we apply entropy-based quantization and/or manual truncation on the latent representations. Tests show that our proposed method achieves accuracy improvement of up to 1.5% on CIFAR-10 and 3% on CIFAR-100 over conventional E2E cross-entropy training.
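A small NumPy sketch of the adjustable-bitrate step: the deployed encoder's latent is manually truncated and uniformly quantized without retraining; ordering the dimensions by importance and the specific bit allocation are our illustrative choices:

```python
import numpy as np

def compress_latent(z, keep_dims, num_levels=16):
    """Truncate a latent to its first `keep_dims` dimensions (assumed to be
    stored in decreasing importance) and uniformly quantize each kept value
    to `num_levels` levels in [-1, 1]."""
    z = np.clip(z[:keep_dims], -1.0, 1.0)
    step = 2.0 / (num_levels - 1)
    return np.round((z + 1.0) / step).astype(np.int8)    # sent over the channel

def decompress_latent(codes, full_dims, num_levels=16):
    step = 2.0 / (num_levels - 1)
    z = codes.astype(np.float32) * step - 1.0
    return np.pad(z, (0, full_dims - len(z)))            # zero-fill truncated dims

z = np.tanh(np.random.default_rng(0).normal(size=64))    # toy encoder latent
codes = compress_latent(z, keep_dims=32)                 # 32 dims * 4 bits
z_hat = decompress_latent(codes, full_dims=64)
print("payload bits:", codes.size * 4, " MSE:", np.mean((z - z_hat) ** 2))
```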
DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization
results: Experiments show that DrM achieves significant improvements in sample efficiency and asymptotic performance with no broken seeds (76 seeds in total) across three continuous control benchmarks (DeepMind Control Suite, MetaWorld, and Adroit), and solves tasks in the Dog and Manipulator domains of the DeepMind Control Suite as well as three dexterous hand manipulation tasks in Adroit.
Abstract
Visual reinforcement learning (RL) has shown promise in continuous control tasks. Despite its progress, current algorithms are still unsatisfactory in virtually every aspect of the performance such as sample efficiency, asymptotic performance, and their robustness to the choice of random seeds. In this paper, we identify a major shortcoming in existing visual RL methods that is the agents often exhibit sustained inactivity during early training, thereby limiting their ability to explore effectively. Expanding upon this crucial observation, we additionally unveil a significant correlation between the agents' inclination towards motorically inactive exploration and the absence of neuronal activity within their policy networks. To quantify this inactivity, we adopt dormant ratio as a metric to measure inactivity in the RL agent's network. Empirically, we also recognize that the dormant ratio can act as a standalone indicator of an agent's activity level, regardless of the received reward signals. Leveraging the aforementioned insights, we introduce DrM, a method that uses three core mechanisms to guide agents' exploration-exploitation trade-offs by actively minimizing the dormant ratio. Experiments demonstrate that DrM achieves significant improvements in sample efficiency and asymptotic performance with no broken seeds (76 seeds in total) across three continuous control benchmark environments, including DeepMind Control Suite, MetaWorld, and Adroit. Most importantly, DrM is the first model-free algorithm that consistently solves tasks in both the Dog and Manipulator domains from the DeepMind Control Suite as well as three dexterous hand manipulation tasks without demonstrations in Adroit, all based on pixel observations.
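A PyTorch sketch of the dormant-ratio metric, following the usual dormant-neuron definition (a neuron is tau-dormant when its mean absolute activation, normalized by the layer average, falls below tau); the threshold, layer choice, and toy policy network here are illustrative:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def dormant_ratio(model, batch, tau=0.025):
    """Fraction of hidden units (post-ReLU) whose mean |activation|,
    normalized by the layer average, falls below tau."""
    acts = {}
    hooks = [m.register_forward_hook(
                 lambda mod, inp, out, key=id(mod): acts.__setitem__(key, out))
             for m in model.modules() if isinstance(m, nn.ReLU)]
    model(batch)
    for h in hooks:
        h.remove()
    dormant, total = 0, 0
    for a in acts.values():
        score = a.abs().mean(dim=0)             # per-neuron mean |activation|
        score = score / (score.mean() + 1e-9)   # normalize by layer average
        dormant += (score < tau).sum().item()
        total += score.numel()
    return dormant / total

policy = nn.Sequential(nn.Linear(64, 256), nn.ReLU(),
                       nn.Linear(256, 256), nn.ReLU(),
                       nn.Linear(256, 6))
obs = torch.randn(128, 64)
print(f"dormant ratio: {dormant_ratio(policy, obs):.3f}")
```

DrM's mechanisms then use this scalar as a signal to steer the exploration-exploitation trade-off during training.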
Domain Generalization in Computational Pathology: Survey and Guidelines
results: Benchmarking shows that careful experiment design and CPath-specific stain augmentation can effectively address domain shift, but no single solution fits every scenario. The paper therefore provides clear guidelines for detecting and managing domain shift in different scenarios; the concepts, guidelines, and recommendations apply to most medical image analysis tasks.
Abstract
Deep learning models have exhibited exceptional effectiveness in Computational Pathology (CPath) by tackling intricate tasks across an array of histology image analysis applications. Nevertheless, the presence of out-of-distribution data (stemming from a multitude of sources such as disparate imaging devices and diverse tissue preparation methods) can cause domain shift (DS). DS decreases the generalization of trained models to unseen datasets with slightly different data distributions, prompting the need for innovative domain generalization (DG) solutions. Recognizing the potential of DG methods to significantly influence diagnostic and prognostic models in cancer studies and clinical practice, we present this survey along with guidelines on achieving DG in CPath. We rigorously define various DS types, systematically review and categorize existing DG approaches and resources in CPath, and provide insights into their advantages, limitations, and applicability. We also conduct thorough benchmarking experiments with 28 cutting-edge DG algorithms to address a complex DG problem. Our findings suggest that careful experiment design and CPath-specific Stain Augmentation technique can be very effective. However, there is no one-size-fits-all solution for DG in CPath. Therefore, we establish clear guidelines for detecting and managing DS depending on different scenarios. While most of the concepts, guidelines, and recommendations are given for applications in CPath, we believe that they are applicable to most medical image analysis tasks as well.
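A minimal sketch of the kind of CPath-specific stain augmentation referred to: jitter the image in Haematoxylin-Eosin-DAB stain space via scikit-image's color deconvolution and convert back (the jitter ranges are illustrative):

```python
import numpy as np
from skimage.color import rgb2hed, hed2rgb

def stain_augment(rgb, sigma=0.05, bias=0.02, rng=None):
    """Randomly scale and shift each channel in HED stain space, then
    convert back to RGB. `rgb` is float in [0, 1] with shape (H, W, 3)."""
    if rng is None:
        rng = np.random.default_rng()
    hed = rgb2hed(rgb)
    alpha = rng.uniform(1 - sigma, 1 + sigma, size=3)   # per-stain scale
    beta = rng.uniform(-bias, bias, size=3)             # per-stain shift
    hed = hed * alpha + beta
    return np.clip(hed2rgb(hed), 0, 1)

patch = np.random.default_rng(0).random((256, 256, 3))  # stand-in H&E tile
augmented = stain_augment(patch)
print(augmented.shape, augmented.dtype)
```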
Upgrading VAE Training With Unlimited Data Plans Provided by Diffusion Models
paper_authors: Tim Z. Xiao, Johannes Zenn, Robert Bamler
for: This paper aims to mitigate overfitting in variational autoencoders (VAEs) by training on samples from a pre-trained diffusion model.
methods: The paper proposes training VAEs on samples from a pre-trained diffusion model to improve their representation learning and mitigate overfitting.
results: The paper finds that training VAEs on samples from a pre-trained diffusion model leads to improvements in generative performance, amortization gap, and robustness compared to normal training and conventional data augmentation methods.
Abstract
Variational autoencoders (VAEs) are popular models for representation learning but their encoders are susceptible to overfitting (Cremer et al., 2018) because they are trained on a finite training set instead of the true (continuous) data distribution $p_{\mathrm{data}}(\mathbf{x})$. Diffusion models, on the other hand, avoid this issue by keeping the encoder fixed. This makes their representations less interpretable, but it simplifies training, enabling accurate and continuous approximations of $p_{\mathrm{data}}(\mathbf{x})$. In this paper, we show that overfitting encoders in VAEs can be effectively mitigated by training on samples from a pre-trained diffusion model. These results are somewhat unexpected as recent findings (Alemohammad et al., 2023; Shumailov et al., 2023) observe a decay in generative performance when models are trained on data generated by another generative model. We analyze generalization performance, amortization gap, and robustness of VAEs trained with our proposed method on three different data sets. We find improvements in all metrics compared to both normal training and conventional data augmentation methods, and we show that a modest amount of samples from the diffusion model suffices to obtain these gains.
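A sketch of the proposed recipe: every VAE training step draws a fresh batch from a frozen, pre-trained diffusion model rather than from a finite training set. The sampler below is a stand-in stub (a real run would call the pretrained diffusion sampler), and the VAE is a minimal Gaussian-decoder ELBO:

```python
import torch
import torch.nn as nn

def sample_from_diffusion(batch_size):
    """Stand-in for drawing samples from the frozen pre-trained
    diffusion model; replace with the real sampler."""
    return torch.rand(batch_size, 784)

class SmallVAE(nn.Module):
    def __init__(self, d_in=784, d_z=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * d_z))
        self.dec = nn.Sequential(nn.Linear(d_z, 256), nn.ReLU(),
                                 nn.Linear(256, d_in))

    def elbo_loss(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = ((self.dec(z) - x) ** 2).sum(-1)              # Gaussian decoder
        kl = 0.5 * (mu ** 2 + logvar.exp() - 1 - logvar).sum(-1)
        return (recon + kl).mean()

vae = SmallVAE()
opt = torch.optim.Adam(vae.parameters(), lr=1e-3)
for step in range(1000):
    with torch.no_grad():
        x = sample_from_diffusion(128)  # fresh data every step: no finite
                                        # training set for the encoder to overfit
    loss = vae.elbo_loss(x)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final ELBO loss: {loss.item():.2f}")
```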
DistNet2D: Leveraging long-range temporal information for efficient segmentation and tracking
results: Experimental results show that DistNet2D outperforms two previous methods on two experimental datasets. The paper also demonstrates the ability to statistically correlate cell size and shape with their transport properties.
Abstract
Extracting long tracks and lineages from videomicroscopy requires an extremely low error rate, which is challenging on complex datasets of dense or deforming cells. Leveraging temporal context is key to overcome this challenge. We propose DistNet2D, a new deep neural network (DNN) architecture for 2D cell segmentation and tracking that leverages both mid- and long-term temporal context. DistNet2D considers seven frames at the input and uses a post-processing procedure that exploits information from the entire movie to correct segmentation errors. DistNet2D outperforms two recent methods on two experimental datasets, one containing densely packed bacterial cells and the other containing eukaryotic cells. It has been integrated into an ImageJ-based graphical user interface for 2D data visualization, curation, and training. Finally, we demonstrate the performance of DistNet2D on correlating the size and shape of cells with their transport properties over large statistics, for both bacterial and eukaryotic cells.
Leave No Stone Unturned: Mine Extra Knowledge for Imbalanced Facial Expression Recognition
for: addresses the imbalanced facial expression recognition (FER) problem by proposing a novel approach to extract extra knowledge related to minor classes from both major and minor class samples.
methods: leverages re-balanced attention maps to regularize the model and extract transformation invariant information about the minor classes from all training samples, and introduces re-balanced smooth labels to regulate the cross-entropy loss and guide the model to pay more attention to the minor classes.
results: achieves state-of-the-art performance under the imbalanced FER task through extensive experiments on different datasets and backbones.
Abstract
Facial expression data is characterized by a significant imbalance, with most collected data showing happy or neutral expressions and fewer instances of fear or disgust. This imbalance poses challenges to facial expression recognition (FER) models, hindering their ability to fully understand various human emotional states. Existing FER methods typically report overall accuracy on highly imbalanced test sets but exhibit low performance in terms of the mean accuracy across all expression classes. In this paper, our aim is to address the imbalanced FER problem. Existing methods primarily focus on learning knowledge of minor classes solely from minor-class samples. However, we propose a novel approach to extract extra knowledge related to the minor classes from both major and minor class samples. Our motivation stems from the belief that FER resembles a distribution learning task, wherein a sample may contain information about multiple classes. For instance, a sample from the major class surprise might also contain useful features of the minor class fear. Inspired by that, we propose a novel method that leverages re-balanced attention maps to regularize the model, enabling it to extract transformation invariant information about the minor classes from all training samples. Additionally, we introduce re-balanced smooth labels to regulate the cross-entropy loss, guiding the model to pay more attention to the minor classes by utilizing the extra information regarding the label distribution of the imbalanced training data. Extensive experiments on different datasets and backbones show that the two proposed modules work together to regularize the model and achieve state-of-the-art performance under the imbalanced FER task. Code is available at https://github.com/zyh-uaiaaaa.
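The re-balanced smooth labels can be illustrated with a short sketch: the one-hot target is smoothed toward an inverse-frequency prior so that minor classes receive extra probability mass. This is an illustrative variant assuming inverse-frequency weighting; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def rebalanced_smooth_labels(targets, class_counts, eps=0.1):
    """Smooth one-hot labels toward an inverse-frequency prior so that
    minor classes receive extra probability mass (illustrative variant)."""
    inv_freq = 1.0 / class_counts.float()
    prior = inv_freq / inv_freq.sum()                  # favors minor classes
    onehot = F.one_hot(targets, num_classes=len(class_counts)).float()
    return (1 - eps) * onehot + eps * prior

def rebalanced_cross_entropy(logits, targets, class_counts, eps=0.1):
    """Cross-entropy against the re-balanced soft labels."""
    soft = rebalanced_smooth_labels(targets, class_counts, eps)
    return -(soft * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```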
Bidirectional Captioning for Clinically Accurate and Interpretable Models
results: The study finds that captioning pretraining not only yields visual encoders competitive with contrastive methods (CheXpert competition multi-label AUC of 89.4%), but also generates clinically relevant radiology reports (captioning macro-F1 score of 0.349 using the CheXpert labeler) and responds to prompts with targeted, interactive outputs.
Abstract
Vision-language pretraining has been shown to produce high-quality visual encoders which transfer efficiently to downstream computer vision tasks. While generative language models have gained widespread attention, image captioning has thus far been mostly overlooked as a form of cross-modal pretraining in favor of contrastive learning, especially in medical image analysis. In this paper, we experiment with bidirectional captioning of radiology reports as a form of pretraining and compare the quality and utility of learned embeddings with those from contrastive pretraining methods. We optimize a CNN encoder, transformer decoder architecture named RadTex for the radiology domain. Results show that not only does captioning pretraining yield visual encoders that are competitive with contrastive pretraining (CheXpert competition multi-label AUC of 89.4%), but also that our transformer decoder is capable of generating clinically relevant reports (captioning macro-F1 score of 0.349 using CheXpert labeler) and responding to prompts with targeted, interactive outputs.
Convolutional Neural Networks for Automatic Detection of Intact Adenovirus from TEM Imaging with Debris, Broken and Artefacts Particles
results: The developed software tools for automatic detection and segmentation of adenoviruses help manufacturers detect and analyze adenoviruses in TEM imaging systems. These tools reduce manual intervention and improve detection efficiency and accuracy.
Abstract
Regular monitoring of the primary particles and purity profiles of a drug product during development and manufacturing processes is essential for manufacturers to avoid product variability and contamination. Transmission electron microscopy (TEM) imaging helps manufacturers predict how changes affect particle characteristics and purity for virus-based gene therapy vector products and intermediates. Since intact particles can characterize efficacious products, it is beneficial to automate the detection of intact adenovirus against a non-intact-viral background mixed with debris, broken, and artefact particles. In the presence of such particles, detecting intact adenoviruses becomes more challenging. To overcome the challenge, due to such a presence, we developed a software tool for semi-automatic annotation and segmentation of adenoviruses and a software tool for automatic segmentation and detection of intact adenoviruses in TEM imaging systems. The developed semi-automatic tool exploited conventional image analysis techniques while the automatic tool was built based on convolutional neural networks and image analysis techniques. Our quantitative and qualitative evaluations showed outstanding true positive detection rates compared to false positive and negative rates where adenoviruses were nicely detected without mistaking them for real debris, broken adenoviruses, and/or staining artefacts.
results: Experiments show that GC-MVSNet learns high-quality multi-view stereo reconstruction with substantially fewer training iterations, achieving a new state of the art on the DTU and BlendedMVS datasets and competitive results on the Tanks and Temples benchmark.
Abstract
Traditional multi-view stereo (MVS) methods rely heavily on photometric and geometric consistency constraints, but newer machine learning-based MVS methods check geometric consistency across multiple source views only as a post-processing step. In this paper, we present a novel approach that explicitly encourages geometric consistency of reference view depth maps across multiple source views at different scales during learning (see Fig. 1). We find that adding this geometric consistency loss significantly accelerates learning by explicitly penalizing geometrically inconsistent pixels, reducing the training iteration requirements to nearly half that of other MVS methods. Our extensive experiments show that our approach achieves a new state-of-the-art on the DTU and BlendedMVS datasets, and competitive results on the Tanks and Temples benchmark. To the best of our knowledge, GC-MVSNet is the first attempt to enforce multi-view, multi-scale geometric consistency during learning.
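A hedged sketch of the multi-view geometric consistency idea follows: a reference-view pixel is flagged inconsistent when its forward-backward reprojection error exceeds thresholds in a source view, and the count across views up-weights that pixel's depth loss. The `reproject` helper and both thresholds are hypothetical stand-ins for the paper's formulation.

```python
import torch

def geometric_inconsistency_weight(ref_depth, src_depths, reproject,
                                   pix_thresh=1.0, rel_depth_thresh=0.01):
    """Per-pixel weight counting in how many source views a reference-view
    pixel is geometrically inconsistent. `reproject` (hypothetical) maps the
    reference depth into a source view and back, returning per-pixel pixel-
    and depth-reprojection errors."""
    count = torch.zeros_like(ref_depth)
    for src_depth in src_depths:
        pix_err, depth_err = reproject(ref_depth, src_depth)
        bad = (pix_err > pix_thresh) | \
              (depth_err / ref_depth.clamp(min=1e-6) > rel_depth_thresh)
        count = count + bad.float()
    return 1.0 + count  # weight >= 1; consistent pixels keep the base loss
```

Multiplying a per-pixel depth-regression loss by this weight reproduces the "explicitly penalize geometrically inconsistent pixels" behavior described above, e.g. `loss = (geometric_inconsistency_weight(...) * l1_map).mean()`.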
Human-interpretable and deep features for image privacy classification
results: The paper analyses images annotated with contrasting privacy labels by different assessors and proposes eight privacy-specific, human-interpretable features. These features increase the performance of deep learning models and improve the image representation for privacy classification.
Abstract
Privacy is a complex, subjective and contextual concept that is difficult to define. Therefore, the annotation of images to train privacy classifiers is a challenging task. In this paper, we analyse privacy classification datasets and the properties of controversial images that are annotated with contrasting privacy labels by different assessors. We discuss suitable features for image privacy classification and propose eight privacy-specific and human-interpretable features. These features increase the performance of deep learning models and, on their own, improve the image representation for privacy classification compared with much higher dimensional deep features.
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
results: Experiments on two benchmarks, VoxCeleb2 and LRS3, show state-of-the-art results. Compared with previous methods, the approach produces notably more natural speech without excessive computational requirements.
Abstract
The objective of this work is to extract target speaker's voice from a mixture of voices using visual cues. Existing works on audio-visual speech separation have demonstrated their performance with promising intelligibility, but maintaining naturalness remains a challenge. To address this issue, we propose AVDiffuSS, an audio-visual speech separation model based on a diffusion mechanism known for its capability in generating natural samples. For an effective fusion of the two modalities for diffusion, we also propose a cross-attention-based feature fusion mechanism. This mechanism is specifically tailored for the speech domain to integrate the phonetic information from audio-visual correspondence in speech generation. In this way, the fusion process maintains the high temporal resolution of the features, without excessive computational requirements. We demonstrate that the proposed framework achieves state-of-the-art results on two benchmarks, including VoxCeleb2 and LRS3, producing speech with notably better naturalness.
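The cross-attention-based fusion can be sketched as below, with audio features as queries attending to visual keys/values so the audio temporal resolution is preserved; the dimensions and head count are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Illustrative cross-attention fusion: audio features attend to
    time-aligned visual features, preserving audio temporal resolution."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio_feat, visual_feat):
        # audio_feat: (B, T_a, D) queries; visual_feat: (B, T_v, D) keys/values
        fused, _ = self.attn(audio_feat, visual_feat, visual_feat)
        return self.norm(audio_feat + fused)  # residual keeps fine temporal detail
```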
A Perceptual Shape Loss for Monocular 3D Face Reconstruction
paper_authors: Christopher Otto, Prashanth Chandran, Gaspard Zoss, Markus Gross, Paulo Gotardo, Derek Bradley
for: This paper aims to improve the quality of monocular 3D face reconstruction by proposing a new perceptual shape loss function that uses shading cues to evaluate the quality of the 3D face estimate.
methods: The proposed method uses a discriminator-style neural network to evaluate the quality of the shaded render of the geometry estimate, without requiring an estimate of the albedo or illumination in the scene. The loss operates entirely in image space and is agnostic to mesh topology.
results: The authors show that their new perceptual shape loss can be combined with traditional energy terms for monocular 3D face optimization and deep neural network regression, improving upon current state-of-the-art results.
Abstract
Monocular 3D face reconstruction is a wide-spread topic, and existing approaches tackle the problem either through fast neural network inference or offline iterative reconstruction of face geometry. In either case carefully-designed energy functions are minimized, commonly including loss terms like a photometric loss, a landmark reprojection loss, and others. In this work we propose a new loss function for monocular face capture, inspired by how humans would perceive the quality of a 3D face reconstruction given a particular image. It is widely known that shading provides a strong indicator for 3D shape in the human visual system. As such, our new 'perceptual' shape loss aims to judge the quality of a 3D face estimate using only shading cues. Our loss is implemented as a discriminator-style neural network that takes an input face image and a shaded render of the geometry estimate, and then predicts a score that perceptually evaluates how well the shaded render matches the given image. This 'critic' network operates on the RGB image and geometry render alone, without requiring an estimate of the albedo or illumination in the scene. Furthermore, our loss operates entirely in image space and is thus agnostic to mesh topology. We show how our new perceptual shape loss can be combined with traditional energy terms for monocular 3D face optimization and deep neural network regression, improving upon current state-of-the-art results.
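A minimal sketch of such a discriminator-style critic is shown below: it takes the RGB image concatenated with the shaded render of the geometry estimate (6 input channels) and outputs a scalar perceptual score per sample. The layer widths and patch-style scoring head are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ShapeCritic(nn.Module):
    """Illustrative critic: scores how well a shaded render of the geometry
    estimate matches the input face image, operating purely in image space."""
    def __init__(self):
        super().__init__()
        layers, ch = [], 6  # RGB image + shaded render
        for out_ch in (32, 64, 128, 256):
            layers += [nn.Conv2d(ch, out_ch, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch = out_ch
        self.features = nn.Sequential(*layers)
        self.score = nn.Conv2d(ch, 1, 3, padding=1)  # patch-wise scores

    def forward(self, image, shaded_render):
        x = torch.cat([image, shaded_render], dim=1)       # (B, 6, H, W)
        return self.score(self.features(x)).mean(dim=(1, 2, 3))  # (B,)
```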
Skip-WaveNet: A Wavelet based Multi-scale Architecture to Trace Firn Layers in Radar Echograms
results: The proposed Skip-WaveNet architecture achieves higher optimal dataset scale (ODS) and optimal image scale (OIS) F-scores than non-wavelet architectures, generalizes across datasets, and estimates firn-layer depths with a mean absolute error of 3.31 pixels and 94.3% average precision.
Abstract
Echograms created from airborne radar sensors capture the profile of firn layers present on top of an ice sheet. Accurate tracking of these layers is essential to calculate the snow accumulation rates, which are required to investigate the contribution of polar ice cap melt to sea level rise. However, automatically processing the radar echograms to detect the underlying firn layers is a challenging problem. In our work, we develop wavelet-based multi-scale deep learning architectures for these radar echograms to improve firn layer detection. We show that wavelet based architectures improve the optimal dataset scale (ODS) and optimal image scale (OIS) F-scores by 3.99% and 3.7%, respectively, over the non-wavelet architecture. Further, our proposed Skip-WaveNet architecture generates new wavelets in each iteration, achieves higher generalizability as compared to state-of-the-art firn layer detection networks, and estimates layer depths with a mean absolute error of 3.31 pixels and 94.3% average precision. Such a network can be used by scientists to trace firn layers, calculate the annual snow accumulation rates, estimate the resulting surface mass balance of the ice sheet, and help project global sea level rise.
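A wavelet-based multi-scale decomposition of an echogram can be sketched with PyWavelets as below; the `db2` wavelet and three levels are illustrative choices, not the architecture's actual configuration.

```python
import pywt

def multiscale_wavelet_features(echogram, wavelet='db2', levels=3):
    """Illustrative multi-scale decomposition of a 2D radar echogram:
    returns the coarse approximation plus (horizontal, vertical, diagonal)
    detail coefficients at each scale, which a multi-scale network can
    consume as inputs."""
    coeffs = pywt.wavedec2(echogram, wavelet=wavelet, level=levels)
    approx, details = coeffs[0], coeffs[1:]  # details: list of (cH, cV, cD)
    return approx, details
```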
Disentangled Counterfactual Learning for Physical Audiovisual Commonsense Reasoning
results: The proposed method improves upon baseline methods in experiments and achieves state-of-the-art performance. Source code is available at https://github.com/Andy20178/DCL.
Abstract
In this paper, we propose a Disentangled Counterfactual Learning~(DCL) approach for physical audiovisual commonsense reasoning. The task aims to infer objects' physics commonsense based on both video and audio input, with the main challenge is how to imitate the reasoning ability of humans. Most of the current methods fail to take full advantage of different characteristics in multi-modal data, and lacking causal reasoning ability in models impedes the progress of implicit physical knowledge inferring. To address these issues, our proposed DCL method decouples videos into static (time-invariant) and dynamic (time-varying) factors in the latent space by the disentangled sequential encoder, which adopts a variational autoencoder (VAE) to maximize the mutual information with a contrastive loss function. Furthermore, we introduce a counterfactual learning module to augment the model's reasoning ability by modeling physical knowledge relationships among different objects under counterfactual intervention. Our proposed method is a plug-and-play module that can be incorporated into any baseline. In experiments, we show that our proposed method improves baseline methods and achieves state-of-the-art performance. Our source code is available at https://github.com/Andy20178/DCL.
Harvest Video Foundation Models via Efficient Post-Pretraining
paper_authors: Yizhuo Li, Kunchang Li, Yinan He, Yi Wang, Yali Wang, Limin Wang, Yu Qiao, Ping Luo
for: improving the efficiency and quality of video-language models and making better use of large-scale video-language data.
methods: randomly drops input video patches and masks out input text during the post-pretraining procedure to encourage cross-modal video-language fusion learning.
results: extensive experiments on a wide range of video-language downstream tasks, including various zero-shot tasks, video question answering, and video-text retrieval, validate the effectiveness of the method, which achieves state-of-the-art performances comparable to some heavily pretrained video foundation models.
Abstract
Building video-language foundation models is costly and difficult due to the redundant nature of video data and the lack of high-quality video-language datasets. In this paper, we propose an efficient framework to harvest video foundation models from image ones. Our method is intuitively simple by randomly dropping input video patches and masking out input text during the post-pretraining procedure. The patch dropping boosts the training efficiency significantly and text masking enforces the learning of cross-modal fusion. We conduct extensive experiments to validate the effectiveness of our method on a wide range of video-language downstream tasks including various zero-shot tasks, video question answering, and video-text retrieval. Despite its simplicity, our method achieves state-of-the-art performances, which are comparable to some heavily pretrained video foundation models. Our method is extremely efficient and can be trained in less than one day on 8 GPUs, requiring only WebVid-10M as pretraining data. We hope our method can serve as a simple yet strong counterpart for prevalent video foundation models, provide useful insights when building them, and make large pretrained models more accessible and sustainable. This is part of the InternVideo project \url{https://github.com/OpenGVLab/InternVideo}.
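The two post-pretraining tricks are simple enough to sketch directly: random patch dropping on the video tokens and BERT-style masking on the text tokens. The keep ratio and mask probability below are illustrative, not the paper's settings.

```python
import torch

def drop_video_patches(patch_tokens, keep_ratio=0.5):
    """Randomly keep a subset of video patch tokens (B, N, D), cutting
    post-pretraining cost roughly in proportion to the drop rate."""
    B, N, D = patch_tokens.shape
    n_keep = max(1, int(N * keep_ratio))
    idx = torch.rand(B, N, device=patch_tokens.device).argsort(dim=1)[:, :n_keep]
    return patch_tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, D))

def mask_text_tokens(token_ids, mask_id, mask_prob=0.15):
    """BERT-style masking of text token ids to enforce cross-modal fusion:
    masked words must be recovered from the (dropped-patch) video stream."""
    mask = torch.rand_like(token_ids, dtype=torch.float) < mask_prob
    masked = torch.where(mask, torch.full_like(token_ids, mask_id), token_ids)
    return masked, mask
```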
MENTOR: Human Perception-Guided Pretraining for Iris Presentation Attack Detection
methods: uses human saliency to guide CNN training through two unique training rounds: first, an autoencoder is trained to learn human saliency maps from input iris images; then, this representation is used both to train an iris PAD model and as a human-inspired annotator of salient features.
results: threefold benefits: (a) a significant boost in iris PAD performance when using the human perception-trained encoder's weights compared to ImageNet-sourced or random weights; (b) the ability to generate an unlimited number of human-like saliency maps for unseen iris PAD samples; and (c) increased efficiency of iris PAD model training.
Abstract
Incorporating human salience into the training of CNNs has boosted performance in difficult tasks such as biometric presentation attack detection. However, collecting human annotations is a laborious task, not to mention the questions of how and where (in the model architecture) to efficiently incorporate this information into model's training once annotations are obtained. In this paper, we introduce MENTOR (huMan pErceptioN-guided preTraining fOr iris pResentation attack detection), which addresses both of these issues through two unique rounds of training. First, we train an autoencoder to learn human saliency maps given an input iris image (both real and fake examples). Once this representation is learned, we utilize the trained autoencoder in two different ways: (a) as a pre-trained backbone for an iris presentation attack detector, and (b) as a human-inspired annotator of salient features on unknown data. We show that MENTOR's benefits are threefold: (a) significant boost in iris PAD performance when using the human perception-trained encoder's weights compared to general-purpose weights (e.g. ImageNet-sourced, or random), (b) capability of generating infinite number of human-like saliency maps for unseen iris PAD samples to be used in any human saliency-guided training paradigm, and (c) increase in efficiency of iris PAD model training. Sources codes and weights are offered along with the paper.
Exploiting Image-Related Inductive Biases in Single-Branch Visual Tracking
paper_authors: Chuanming Tang, Kai Wang, Joost van de Weijer, Jianlin Zhang, Yongmei Huang
for: AViTMP is proposed to tackle the inferior effectiveness of the vanilla ViT in visual tracking tasks, by bridging the gap between single-branch networks and discriminative models.
methods: The proposed AViTMP model uses an adaptor module and joint target state embedding in the encoder to enrich the dense embedding paradigm based on ViT, and combines it with a dense-fusion decoder and a discriminative target model to predict accurate location. Additionally, a novel inference pipeline called CycleTrack is presented to mitigate the limitations of conventional inference practice, and a dual-frame update inference strategy is proposed to handle significant challenges in long-term scenarios.
results: The proposed AViTMP model achieves state-of-the-art performance in visual tracking tasks, especially on long-time tracking and robustness, as demonstrated by the experimental results on ten tracking benchmarks, including LaSOT, LaSOTExtSub, AVisT, etc.
Abstract
Despite achieving state-of-the-art performance in visual tracking, recent single-branch trackers tend to overlook the weak prior assumptions associated with the Vision Transformer (ViT) encoder and inference pipeline. Moreover, the effectiveness of discriminative trackers remains constrained due to the adoption of the dual-branch pipeline. To tackle the inferior effectiveness of the vanilla ViT, we propose an Adaptive ViT Model Prediction tracker (AViTMP) to bridge the gap between single-branch network and discriminative models. Specifically, in the proposed encoder AViT-Enc, we introduce an adaptor module and joint target state embedding to enrich the dense embedding paradigm based on ViT. Then, we combine AViT-Enc with a dense-fusion decoder and a discriminative target model to predict accurate location. Further, to mitigate the limitations of conventional inference practice, we present a novel inference pipeline called CycleTrack, which bolsters the tracking robustness in the presence of distractors via bidirectional cycle tracking verification. Lastly, we propose a dual-frame update inference strategy that adeptively handles significant challenges in long-term scenarios. In the experiments, we evaluate AViTMP on ten tracking benchmarks for a comprehensive assessment, including LaSOT, LaSOTExtSub, AVisT, etc. The experimental results unequivocally establish that AViTMP attains state-of-the-art performance, especially on long-time tracking and robustness.
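The CycleTrack idea can be sketched as a forward-backward consistency check: track forward, then track backward from the final prediction and compare against the starting box. The `tracker.track` interface and the distance threshold are hypothetical placeholders, not the paper's API.

```python
import torch

def cycle_track_verify(tracker, frames, init_box, dist_thresh=20.0):
    """Illustrative bidirectional cycle verification: accept the forward
    tracking result only if tracking backward from the final prediction
    returns near the starting box, flagging distractor-induced drift.
    `tracker.track(frame_seq, box)` (hypothetical) returns one box per frame."""
    fwd_boxes = tracker.track(frames, init_box)
    bwd_boxes = tracker.track(frames[::-1], fwd_boxes[-1])
    start_again = bwd_boxes[-1]
    drift = torch.dist(torch.tensor(init_box[:2], dtype=torch.float),
                       torch.tensor(start_again[:2], dtype=torch.float))
    return fwd_boxes, bool(drift <= dist_thresh)
```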
IterInv: Iterative Inversion for Pixel-Level T2I Models
results: proposes an iterative inversion technique (IterInv) for pixel-level T2I pipelines and shows that IterInv can be combined with popular image editing methods, improving image generation and editing capabilities.
Abstract
Large-scale text-to-image diffusion models have been a ground-breaking development in generating convincing images following an input text prompt. The goal of image editing research is to give users control over the generated images by modifying the text prompt. Current image editing techniques are relying on DDIM inversion as a common practice based on the Latent Diffusion Models (LDM). However, the large pretrained T2I models working on the latent space as LDM suffer from losing details due to the first compression stage with an autoencoder mechanism. Instead, another mainstream T2I pipeline working on the pixel level, such as Imagen and DeepFloyd-IF, avoids this problem. They are commonly composed of several stages, normally with a text-to-image stage followed by several super-resolution stages. In this case, the DDIM inversion is unable to find the initial noise to generate the original image given that the super-resolution diffusion models are not compatible with the DDIM technique. According to our experimental findings, iteratively concatenating the noisy image as the condition is the root of this problem. Based on this observation, we develop an iterative inversion (IterInv) technique for this stream of T2I models and verify IterInv with the open-source DeepFloyd-IF model. By combining our method IterInv with a popular image editing method, we prove the application prospects of IterInv. The code will be released at \url{https://github.com/Tchuanm/IterInv.git}.
Revitalizing Legacy Video Content: Deinterlacing with Bidirectional Information Propagation
results: Experimental results demonstrate that the method achieves superior performance compared to existing methods.
Abstract
Due to old CRT display technology and limited transmission bandwidth, early film and TV broadcasts commonly used interlaced scanning. This meant each field contained only half of the information. Since modern displays require full frames, this has spurred research into deinterlacing, i.e. restoring the missing information in legacy video content. In this paper, we present a deep-learning-based method for deinterlacing animated and live-action content. Our proposed method supports bidirectional spatio-temporal information propagation across multiple scales to leverage information in both space and time. More specifically, we design a Flow-guided Refinement Block (FRB) which performs feature refinement including alignment, fusion, and rectification. Additionally, our method can process multiple fields simultaneously, reducing per-frame processing time, and potentially enabling real-time processing. Our experimental results demonstrate that our proposed method achieves superior performance compared to existing methods.
Are Natural Domain Foundation Models Useful for Medical Image Classification?
results: The study shows that DINOv2 performs strongly under the explored training settings, often surpassing the standard practice of ImageNet pretraining, while the other foundation models fail to consistently beat this established baseline, indicating limited transferability to medical image classification tasks.
Abstract
The deep learning field is converging towards the use of general foundation models that can be easily adapted for diverse tasks. While this paradigm shift has become common practice within the field of natural language processing, progress has been slower in computer vision. In this paper we attempt to address this issue by investigating the transferability of various state-of-the-art foundation models to medical image classification tasks. Specifically, we evaluate the performance of five foundation models, namely SAM, SEEM, DINOv2, BLIP, and OpenCLIP across four well-established medical imaging datasets. We explore different training settings to fully harness the potential of these models. Our study shows mixed results. DINOv2 in particular, consistently outperforms the standard practice of ImageNet pretraining. However, other foundation models failed to consistently beat this established baseline indicating limitations in their transferability to medical image classification tasks.
Generating Context-Aware Natural Answers for Questions in 3D Scenes
paper_authors: Mohammed Munzer Dwedari, Matthias Niessner, Dave Zhenyu Chen
for: Answering questions in 3D scenes naturally and freely, without being limited to pre-defined answers.
methods: Converted the question answering task into a sequence generation task, and optimized the model directly on language rewards to ensure global sentence semantics. Additionally, a pragmatic language understanding reward was adapted to improve sentence quality.
results: Set a new SOTA (state of the art) on the ScanQA benchmark with a CIDEr score of 72.22/66.57 on the test sets.
Abstract
3D question answering is a young field in 3D vision-language that is yet to be explored. Previous methods are limited to a pre-defined answer space and cannot generate answers naturally. In this work, we pivot the question answering task to a sequence generation task to generate free-form natural answers for questions in 3D scenes (Gen3DQA). To this end, we optimize our model directly on the language rewards to secure the global sentence semantics. Here, we also adapt a pragmatic language understanding reward to further improve the sentence quality. Our method sets a new SOTA on the ScanQA benchmark (CIDEr score 72.22/66.57 on the test sets).
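Optimizing a generator directly on a sentence-level language reward is typically done with a REINFORCE-style objective; a minimal sketch, assuming a self-critical (greedy-decoding) baseline, is given below. This is a generic formulation rather than the paper's exact training recipe.

```python
import torch

def language_reward_loss(log_probs, rewards, baseline):
    """REINFORCE-style objective for optimizing an answer generator on a
    sentence-level language reward such as CIDEr.

    log_probs: (B,) summed log-probabilities of the sampled answers
    rewards:   (B,) sentence-level scores of the sampled answers
    baseline:  (B,) e.g. reward of greedy decoding (self-critical baseline)
    """
    advantage = (rewards - baseline).detach()  # no gradient through the reward
    return -(advantage * log_probs).mean()
```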
Transformer-based nowcasting of radar composites from satellite images for severe weather
paper_authors: Çağlar Küçük, Apostolos Giannakos, Stefan Schneider, Alexander Jann
for: improving nowcasting by bridging space-based satellite observations and ground-based radar data to deliver high-accuracy short-term forecasts.
methods: uses a Transformer-based model to nowcast ground-based radar composites from geostationary satellite data.
results: the model predicts radar fields across different weather phenomena with high accuracy and shows robustness against rapidly growing/decaying fields and complex field structures.
Abstract
Weather radar data are critical for nowcasting and an integral component of numerical weather prediction models. While weather radar data provide valuable information at high resolution, their ground-based nature limits their availability, which impedes large-scale applications. In contrast, meteorological satellites cover larger domains but with coarser resolution. However, with the rapid advancements in data-driven methodologies and modern sensors aboard geostationary satellites, new opportunities are emerging to bridge the gap between ground- and space-based observations, ultimately leading to more skillful weather prediction with high accuracy. Here, we present a Transformer-based model for nowcasting ground-based radar image sequences using satellite data up to two hours lead time. Trained on a dataset reflecting severe weather conditions, the model predicts radar fields occurring under different weather phenomena and shows robustness against rapidly growing/decaying fields and complex field structures. Model interpretation reveals that the infrared channel centered at 10.3 $\mu m$ (C13) contains skillful information for all weather conditions, while lightning data have the highest relative feature importance in severe weather conditions, particularly in shorter lead times. The model can support precipitation nowcasting across large domains without an explicit need for radar towers, enhance numerical weather prediction and hydrological models, and provide radar proxy for data-scarce regions. Moreover, the open-source framework facilitates progress towards operational data-driven nowcasting.
VideoCrafter1: Open Diffusion Models for High-Quality Video Generation
for: The paper is written for researchers and engineers who are interested in video generation, specifically those looking for open-source models.
methods: The paper introduces two diffusion models for high-quality video generation: text-to-video (T2V) and image-to-video (I2V) models. The T2V model synthesizes a video based on a given text input, while the I2V model incorporates an additional image input to produce videos that strictly adhere to the content of the provided reference image.
results: The proposed T2V model can generate realistic and cinematic-quality videos with a resolution of $1024 \times 576$, outperforming other open-source T2V models in terms of quality. The I2V model is the first open-source I2V foundation model capable of transforming a given image into a video clip while maintaining content preservation constraints.
Abstract
Video generation has increasingly gained interest in both academia and industry. Although commercial tools can generate plausible videos, there is a limited number of open-source models available for researchers and engineers. In this work, we introduce two diffusion models for high-quality video generation, namely text-to-video (T2V) and image-to-video (I2V) models. T2V models synthesize a video based on a given text input, while I2V models incorporate an additional image input. Our proposed T2V model can generate realistic and cinematic-quality videos with a resolution of $1024 \times 576$, outperforming other open-source T2V models in terms of quality. The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style. This model is the first open-source I2V foundation model capable of transforming a given image into a video clip while maintaining content preservation constraints. We believe that these open-source video generation models will contribute significantly to the technological advancements within the community.
Deep Learning for Visual Navigation of Underwater Robots
results: The survey provides an overview of visual navigation for underwater robots, categorizing relevant work and available resources. Literature that uses deep learning algorithms to process non-visual data is not considered, except as contrasting examples.
Abstract
This paper aims to briefly survey deep learning methods for visual navigation of underwater robotics. The scope of this paper includes the visual perception of underwater robotics with deep learning methods, the available visual underwater datasets, imitation learning, and reinforcement learning methods for navigation. Additionally, relevant works will be categorized under the imitation learning or deep learning paradigm for underwater robots for clarity of the training methodologies in the current landscape. Literature that uses deep learning algorithms to process non-visual data for underwater navigation will not be considered, except as contrasting examples.
VDIP-TGV: Blind Image Deconvolution via Variational Deep Image Prior Empowered by Total Generalized Variation
methods: combines variational deep image prior (VDIP) with total generalized variation (TGV) regularization, and solves the resulting model with the alternating direction method of multipliers (ADMM).
results: compared against various state-of-the-art models, the proposed VDIP-TGV better recovers image details and edges and handles large blur kernels more effectively.
Abstract
Recovering clear images from blurry ones with an unknown blur kernel is a challenging problem. Deep image prior (DIP) proposes to use the deep network as a regularizer for a single image rather than as a supervised model, which achieves encouraging results in the nonblind deblurring problem. However, since the relationship between images and the network architectures is unclear, it is hard to find a suitable architecture to provide sufficient constraints on the estimated blur kernels and clean images. Also, DIP uses the sparse maximum a posteriori (MAP), which is insufficient to enforce the selection of the recovery image. Recently, variational deep image prior (VDIP) was proposed to impose constraints on both blur kernels and recovery images and take the standard deviation of the image into account during the optimization process by the variational principle. However, we empirically find that VDIP struggles with processing image details and tends to generate suboptimal results when the blur kernel is large. Therefore, we combine total generalized variational (TGV) regularization with VDIP in this paper to overcome these shortcomings of VDIP. TGV is a flexible regularization that utilizes the characteristics of partial derivatives of varying orders to regularize images at different scales, reducing oil painting artifacts while maintaining sharp edges. The proposed VDIP-TGV effectively recovers image edges and details by supplementing extra gradient information through TGV. Additionally, this model is solved by the alternating direction method of multipliers (ADMM), which effectively combines traditional algorithms and deep learning methods. Experiments show that our proposed VDIP-TGV surpasses various state-of-the-art models quantitatively and qualitatively.
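For reference, the second-order total generalized variation commonly used in such models balances first- and second-order smoothness via an auxiliary vector field $w$:

```latex
\mathrm{TGV}_{\alpha}^{2}(u)
  = \min_{w}\; \alpha_{1} \int_{\Omega} \lvert \nabla u - w \rvert \, dx
  + \alpha_{0} \int_{\Omega} \lvert \mathcal{E}(w) \rvert \, dx,
\qquad
\mathcal{E}(w) = \tfrac{1}{2}\bigl(\nabla w + \nabla w^{\top}\bigr)
```

Here $\mathcal{E}(w)$ is the symmetrized gradient, and the weights $\alpha_{0}, \alpha_{1}$ trade off piecewise-smooth reconstruction against edge preservation, which is how TGV reduces oil-painting artifacts while keeping edges sharp.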
Generative Neural Fields by Mixtures of Neural Implicit Functions
paper_authors: Tackgeun You, Mijeong Kim, Jungtaek Kim, Bohyung Han
for: proposes a novel generative neural field approach that learns fields represented by linear combinations of implicit basis networks.
methods: learns the basis networks and their coefficients in a latent space via meta-learning or auto-decoding, and reduces inference latency and memory footprint through weighted model averaging.
results: experiments show competitive generation performance on diverse benchmarks for images, voxel data, and NeRF scenes without sophisticated designs for specific modalities and domains.
Abstract
We propose a novel approach to learning the generative neural fields represented by linear combinations of implicit basis networks. Our algorithm learns basis networks in the form of implicit neural representations and their coefficients in a latent space by either conducting meta-learning or adopting auto-decoding paradigms. The proposed method easily enlarges the capacity of generative neural fields by increasing the number of basis networks while maintaining the size of a network for inference to be small through their weighted model averaging. Consequently, sampling instances using the model is efficient in terms of latency and memory footprint. Moreover, we customize denoising diffusion probabilistic model for a target task to sample latent mixture coefficients, which allows our final model to generate unseen data effectively. Experiments show that our approach achieves competitive generation performance on diverse benchmarks for images, voxel data, and NeRF scenes without sophisticated designs for specific modalities and domains.
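The linear-combination construction can be sketched as below: coordinates are evaluated by $K$ implicit basis MLPs and mixed by latent coefficients. Note the paper additionally keeps inference cheap by weighted model averaging of the basis networks' parameters; this sketch shows only the output-space mixture, with all sizes illustrative.

```python
import torch
import torch.nn as nn

class MixtureOfImplicitFields(nn.Module):
    """Illustrative generative neural field: f(x) = sum_i c_i * f_i(x),
    a linear combination of implicit basis MLPs mixed by coefficients
    sampled from a latent model."""
    def __init__(self, n_basis=8, in_dim=2, hidden=64, out_dim=3):
        super().__init__()
        self.basis = nn.ModuleList(
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, out_dim))
            for _ in range(n_basis))

    def forward(self, coords, coeffs):
        # coords: (N, in_dim); coeffs: (n_basis,) per-instance mixture weights
        outs = torch.stack([f(coords) for f in self.basis])  # (K, N, out_dim)
        return torch.einsum('k,kno->no', coeffs, outs)
```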
Towards Grouping in Large Scenes with Occlusion-aware Spatio-temporal Transformers
results: Experimental results show that GroupTransformer improves performance on both large-scale and small-scale scenes, boosting precision and F1 score by more than 10% on large-scale scenes and improving F1 score by more than 5% on small-scale scenes.
Abstract
Group detection, especially for large-scale scenes, has many potential applications for public safety and smart cities. Existing methods fail to cope with frequent occlusions in large-scale scenes with multiple people, and are difficult to effectively utilize spatio-temporal information. In this paper, we propose an end-to-end framework,GroupTransformer, for group detection in large-scale scenes. To deal with the frequent occlusions caused by multiple people, we design an occlusion encoder to detect and suppress severely occluded person crops. To explore the potential spatio-temporal relationship, we propose spatio-temporal transformers to simultaneously extract trajectory information and fuse inter-person features in a hierarchical manner. Experimental results on both large-scale and small-scale scenes demonstrate that our method achieves better performance compared with state-of-the-art methods. On large-scale scenes, our method significantly boosts the performance in terms of precision and F1 score by more than 10%. On small-scale scenes, our method still improves the performance of F1 score by more than 5%. The project page with code can be found at http://cic.tju.edu.cn/faculty/likun/projects/GroupTrans.
One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation
paper_authors: Zhiwei Hao, Jianyuan Guo, Kai Han, Yehui Tang, Han Hu, Yunhe Wang, Chang Xu
for: improving model performance through a teacher-student training scheme.
methods: uses centered kernel alignment (CKA) to compare the learned features between heterogeneous teacher and student models, and proposes a simple yet effective one-for-all KD framework called OFA-KD.
results: enables knowledge distillation between models of different architectures and yields notable performance improvements (maximum gains of 8.0% on the CIFAR-100 dataset and 0.7% on the ImageNet-1K dataset).
Abstract
Knowledge distillation~(KD) has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme. However, most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family, particularly the hint-based approaches. By using centered kernel alignment (CKA) to compare the learned features between heterogeneous teacher and student models, we observe significant feature divergence. This divergence illustrates the ineffectiveness of previous hint-based methods in cross-architecture distillation. To tackle the challenge in distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures. Specifically, we project intermediate features into an aligned latent space such as the logits space, where architecture-specific information is discarded. Additionally, we introduce an adaptive target enhancement scheme to prevent the student from being disturbed by irrelevant information. Extensive experiments with various architectures, including CNN, Transformer, and MLP, demonstrate the superiority of our OFA-KD framework in enabling distillation between heterogeneous architectures. Specifically, when equipped with our OFA-KD, the student models achieve notable performance improvements, with a maximum gain of 8.0% on the CIFAR-100 dataset and 0.7% on the ImageNet-1K dataset. PyTorch code and checkpoints can be found at https://github.com/Hao840/OFAKD.
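The logits-space projection at the heart of OFA-KD can be sketched as below: an exit head pools a student's intermediate features into class logits, where a temperature-scaled KL term matches the heterogeneous teacher. The feature shape, the pooling head, and the temperature are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogitsSpaceKD(nn.Module):
    """Illustrative cross-architecture distillation: project a student's
    intermediate feature into the shared logits space, where
    architecture-specific structure is discarded, and match the teacher."""
    def __init__(self, feat_dim, num_classes, tau=4.0):
        super().__init__()
        self.exit_head = nn.Sequential(nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                       nn.Linear(feat_dim, num_classes))
        self.tau = tau

    def forward(self, student_feat, teacher_logits):
        # student_feat: (B, D, N) token/spatial features from any backbone
        s_logits = self.exit_head(student_feat)
        return F.kl_div(F.log_softmax(s_logits / self.tau, dim=-1),
                        F.softmax(teacher_logits / self.tau, dim=-1),
                        reduction='batchmean') * self.tau ** 2
```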
Dynamic Gaussian Splatting from Markerless Motion Capture can Reconstruct Infants Movements
results: The results demonstrate the potential of this method for rendering novel views of scenes and tracking infant movements, highlighting its applicability.
Abstract
Easy access to precise 3D tracking of movement could benefit many aspects of rehabilitation. A challenge to achieving this goal is that while there are many datasets and pretrained algorithms for able-bodied adults, algorithms trained on these datasets often fail to generalize to clinical populations including people with disabilities, infants, and neonates. Reliable movement analysis of infants and neonates is important as spontaneous movement behavior is an important indicator of neurological function and neurodevelopmental disability, which can help guide early interventions. We explored the application of dynamic Gaussian splatting to sparse markerless motion capture (MMC) data. Our approach leverages semantic segmentation masks to focus on the infant, significantly improving the initialization of the scene. Our results demonstrate the potential of this method in rendering novel views of scenes and tracking infant movements. This work paves the way for advanced movement analysis tools that can be applied to diverse clinical populations, with a particular emphasis on early detection in infants.
GaitFormer: Learning Gait Representations with Noisy Multi-Task Learning
results: The method achieves 92.5% accuracy on CASIA-B and 85.33% on FVG, a +14.2% and +9.67% improvement over previous methods. It also accurately identifies gender information and a multitude of appearance attributes from movement patterns alone.
Abstract
Gait analysis is proven to be a reliable way to perform person identification without relying on subject cooperation. Walking is a biometric that does not significantly change in short periods of time and can be regarded as unique to each person. So far, the study of gait analysis focused mostly on identification and demographics estimation, without considering many of the pedestrian attributes that appearance-based methods rely on. In this work, alongside gait-based person identification, we explore pedestrian attribute identification solely from movement patterns. We propose DenseGait, the largest dataset for pretraining gait analysis systems containing 217K anonymized tracklets, annotated automatically with 42 appearance attributes. DenseGait is constructed by automatically processing video streams and offers the full array of gait covariates present in the real world. We make the dataset available to the research community. Additionally, we propose GaitFormer, a transformer-based model that after pretraining in a multi-task fashion on DenseGait, achieves 92.5% accuracy on CASIA-B and 85.33% on FVG, without utilizing any manually annotated data. This corresponds to a +14.2% and +9.67% accuracy increase compared to similar methods. Moreover, GaitFormer is able to accurately identify gender information and a multitude of appearance attributes utilizing only movement patterns. The code to reproduce the experiments is made publicly.
CARPE-ID: Continuously Adaptable Re-identification for Personalized Robot Assistance
results: Experiments show that, compared with a state-of-the-art baseline, the proposed CARPE-ID module accurately tracks each selected target in all cases (except two limit cases).
Abstract
In today's Human-Robot Interaction (HRI) scenarios, a prevailing tendency exists to assume that the robot shall cooperate with the closest individual or that the scene involves merely a singular human actor. However, in realistic scenarios, such as shop floor operations, such an assumption may not hold and personalized target recognition by the robot in crowded environments is required. To fulfil this requirement, in this work, we propose a person re-identification module based on continual visual adaptation techniques that ensure the robot's seamless cooperation with the appropriate individual even under varying visual appearance or partial or complete occlusion. We test the framework using recorded videos in a laboratory environment and in an HRI scenario, i.e., a person-following task by a mobile robot. The targets are asked to change their appearance during tracking and to disappear from the camera field of view to test the challenging cases of occlusion and outfit variations. We compare our framework with one of the state-of-the-art Multi-Object Tracking (MOT) methods, and the results show that CARPE-ID accurately tracks each selected target throughout the experiments in all cases (except two limit cases), whereas the state-of-the-art MOT method makes a mean of 4 tracking errors per video.
Intelligent Breast Cancer Diagnosis with Heuristic-assisted Trans-Res-U-Net and Multiscale DenseNet using Mammogram Images
methods: The proposed deep learning approach consists of three stages: data collection, image segmentation, and breast cancer identification. Segmentation uses an Atrous Convolution-based Attentive and Adaptive Trans-Res-UNet (ACA-ATRUNet) architecture, and identification uses an Atrous Convolution-based Attentive and Adaptive Multi-scale DenseNet (ACA-AMDN) model; the hyperparameters of both are optimised with the Modified Mussel Length-based Eurasian Oystercatcher Optimization (MML-EOO) algorithm.
results: Experimental results show that the proposed breast cancer detection framework attains higher precision in early disease detection than conventional methods.
Abstract
Breast cancer (BC) significantly contributes to cancer-related mortality in women, underscoring the criticality of early detection for optimal patient outcomes. Mammography is a key tool for identifying and diagnosing breast abnormalities; however, accurately distinguishing malignant mass lesions remains challenging. To address this issue, we propose a novel deep learning approach for BC screening utilizing mammography images. Our proposed model comprises three distinct stages: data collection from established benchmark sources, image segmentation employing an Atrous Convolution-based Attentive and Adaptive Trans-Res-UNet (ACA-ATRUNet) architecture, and BC identification via an Atrous Convolution-based Attentive and Adaptive Multi-scale DenseNet (ACA-AMDN) model. The hyperparameters within the ACA-ATRUNet and ACA-AMDN models are optimised using the Modified Mussel Length-based Eurasian Oystercatcher Optimization (MML-EOO) algorithm. Performance evaluation, leveraging multiple metrics, is conducted, and a comparative analysis against conventional methods is presented. Our experimental findings reveal that the proposed BC detection framework attains superior precision rates in early disease detection, demonstrating its potential to enhance mammography-based screening methodologies.
Generated Distributions Are All You Need for Membership Inference Attacks Against Generative Models
results: Experiments validate that all the generative models tested are vulnerable to the attack: on DDPM, DDIM, and FastDPM trained on CIFAR-10 and CelebA the attack AUC exceeds 0.99, and on VQGAN, LDM (for text-conditional generation), and LIIF it exceeds 0.90. The authors therefore urge the community to be aware of such privacy leakage risks when designing and publishing generative models.
Abstract
Generative models have demonstrated revolutionary success in various visual creation tasks, but in the meantime, they have been exposed to the threat of leaking private information of their training data. Several membership inference attacks (MIAs) have been proposed to exhibit the privacy vulnerability of generative models by classifying a query image as a training dataset member or nonmember. However, these attacks suffer from major limitations, such as requiring shadow models and white-box access, and either ignoring or only focusing on the unique property of diffusion models, which block their generalization to multiple generative models. In contrast, we propose the first generalized membership inference attack against a variety of generative models such as generative adversarial networks, variational autoencoders, implicit functions, and the emerging diffusion models. We leverage only generated distributions from target generators and auxiliary non-member datasets, therefore regarding target generators as black boxes and agnostic to their architectures or application scenarios. Experiments validate that all the generative models are vulnerable to our attack. For instance, our work achieves attack AUC $>0.99$ against DDPM, DDIM, and FastDPM trained on CIFAR-10 and CelebA, and the attack against VQGAN, LDM (for text-conditional generation), and LIIF achieves AUC $>0.90$. As a result, we appeal to our community to be aware of such privacy leakage risks when designing and publishing generative models.
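As we read it, the attack needs only samples from the black-box generator plus an auxiliary non-member set: train a classifier to separate the two, then score a query by how "generated-like" it looks. The sketch below is a minimal version under that assumption, not the paper's exact attack architecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_attack_model(generated, auxiliary):
    """Train a classifier to separate generator outputs from known non-members.

    generated: array (N, D) of flattened images sampled from the black-box generator
    auxiliary: array (M, D) of images known NOT to be in the training set
    """
    X = np.concatenate([generated, auxiliary])
    y = np.concatenate([np.ones(len(generated)), np.zeros(len(auxiliary))])
    return LogisticRegression(max_iter=1000).fit(X, y)

def membership_score(clf, queries):
    # A higher "looks generated" probability is taken as evidence of membership,
    # since generative models fit their training distribution most closely.
    return clf.predict_proba(queries)[:, 1]
```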
Radar-Lidar Fusion for Object Detection by Designing Effective Convolution Networks
results: Evaluated on the Radiate dataset with COCO metrics, the method surpasses state-of-the-art approaches by 1.89% and 2.61% in favorable and adverse weather conditions, respectively, showing that radar-Lidar fusion can accurately detect and localize objects in challenging weather.
Abstract
Object detection is a core component of perception systems, providing the ego vehicle with information about its surroundings to ensure safe route planning. While cameras and Lidar have significantly advanced perception systems, their performance can be limited in adverse weather conditions. In contrast, millimeter-wave technology enables radars to function effectively in such conditions. However, relying solely on radar for building a perception system doesn't fully capture the environment due to the data's sparse nature. To address this, sensor fusion strategies have been introduced. We propose a dual-branch framework to integrate radar and Lidar data for enhanced object detection. The primary branch focuses on extracting radar features, while the auxiliary branch extracts Lidar features. These are then combined using additive attention. Subsequently, the integrated features are processed through a novel Parallel Forked Structure (PFS) to manage scale variations. A region proposal head is then utilized for object detection. We evaluated the effectiveness of our proposed method on the Radiate dataset using COCO metrics. The results show that it surpasses state-of-the-art methods by $1.89\%$ and $2.61\%$ in favorable and adverse weather conditions, respectively. This underscores the value of radar-Lidar fusion in achieving precise object detection and localization, especially in challenging weather conditions.
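One plausible reading of the additive-attention fusion step, sketched in PyTorch; the projection layers and the gating form are our assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AdditiveAttentionFusion(nn.Module):
    """Project both feature maps, add them, squash to a gate, and blend.
    A sketch of one common additive-attention formulation."""
    def __init__(self, channels):
        super().__init__()
        self.proj_radar = nn.Conv2d(channels, channels, 1)
        self.proj_lidar = nn.Conv2d(channels, channels, 1)
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, radar_feat, lidar_feat):
        a = self.gate(self.proj_radar(radar_feat) + self.proj_lidar(lidar_feat))
        return a * radar_feat + (1 - a) * lidar_feat  # attention-weighted blend

fused = AdditiveAttentionFusion(64)(torch.randn(1, 64, 32, 32),
                                    torch.randn(1, 64, 32, 32))
```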
A Clinical Guideline Driven Automated Linear Feature Extraction for Vestibular Schwannoma
paper_authors: Navodini Wijethilake, Steve Connor, Anna Oviedova, Rebecca Burger, Tom Vercauteren, Jonathan Shapey
for: This paper aims to automate and improve the clinical decision-making process for patients with Vestibular Schwannoma by using deep learning-based segmentation to extract relevant clinical features from T1 and T2 weighted MRI scans.
methods: The authors use a deep learning-based segmentation approach to extract the maximum linear measurement from the segmented regions, and propose a novel algorithm to choose and extract the most appropriate measurement based on the size of the extrameatal portion of the tumour.
results: The authors achieved Dice-scores of 0.8124 ± 0.2343 and 0.8969 ± 0.0521 for extrameatal and whole tumour regions respectively for T2 weighted MRI, and 0.8222 ± 0.2108 and 0.9049 ± 0.0646 for T1 weighted MRI. The automated measurements were found to be significantly correlated with the manual measurements obtained by an expert neuroradiologist (p < 0.0001).
Abstract
Vestibular Schwannoma is a benign brain tumour that grows from one of the balance nerves. Patients may be treated by surgery, radiosurgery or with a conservative "wait-and-scan" strategy. Clinicians typically use manually extracted linear measurements to aid clinical decision making. This work aims to automate and improve this process by using deep learning based segmentation to extract relevant clinical features through computational algorithms. To the best of our knowledge, our study is the first to propose an automated approach to replicate local clinical guidelines. Our deep learning based segmentation provided Dice-scores of 0.8124 ± 0.2343 and 0.8969 ± 0.0521 for extrameatal and whole tumour regions respectively for T2 weighted MRI, whereas 0.8222 ± 0.2108 and 0.9049 ± 0.0646 were obtained for T1 weighted MRI. We propose a novel algorithm to choose and extract the most appropriate maximum linear measurement from the segmented regions based on the size of the extrameatal portion of the tumour. Using this tool, clinicians will be provided with a visual guide and related metrics relating to tumour progression that will function as a clinical decision aid. In this study, we utilize 187 scans obtained from 50 patients referred to a tertiary specialist neurosurgical service in the United Kingdom. The measurements extracted manually by an expert neuroradiologist indicated a significant correlation with the automated measurements (p < 0.0001).
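For intuition, extracting a maximum linear measurement from a binary segmentation mask can be as simple as finding the most distant pair of foreground pixels. The helper below is a hypothetical sketch; the paper's algorithm additionally selects the measurement according to the clinical guideline and the size of the extrameatal portion.

```python
import numpy as np

def max_linear_measurement(mask, spacing=1.0):
    """Largest in-plane distance between any two foreground pixels of a 2D
    binary mask, in physical units (isotropic pixel spacing assumed)."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([ys, xs], axis=1).astype(float) * spacing
    # O(n^2) pairwise search is fine for tumour-sized regions; computing
    # the convex hull first would shrink the candidate set for large masks.
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    i, j = np.unravel_index(np.argmax(d2), d2.shape)
    return np.sqrt(d2[i, j]), pts[i], pts[j]
```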
TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition
results: On ImageNet-1K classification, TransXNet-T surpasses Swin-T by 0.3% in top-1 accuracy at less than half the computational cost, while TransXNet-S and TransXNet-B reach 83.8% and 84.6% top-1 accuracy with reasonable cost; the architecture also generalizes well to various dense prediction tasks, outperforming other networks at lower computational cost.
Abstract
Recent studies have integrated convolution into transformers to introduce inductive bias and improve generalization performance. However, the static nature of conventional convolution prevents it from dynamically adapting to input variations, resulting in a representation discrepancy between convolution and self-attention as self-attention calculates attention matrices dynamically. Furthermore, when stacking token mixers that consist of convolution and self-attention to form a deep network, the static nature of convolution hinders the fusion of features previously generated by self-attention into convolution kernels. These two limitations result in a sub-optimal representation capacity of the constructed networks. To find a solution, we propose a lightweight Dual Dynamic Token Mixer (D-Mixer) that aggregates global information and local details in an input-dependent way. D-Mixer works by applying an efficient global attention module and an input-dependent depthwise convolution separately on evenly split feature segments, endowing the network with strong inductive bias and an enlarged effective receptive field. We use D-Mixer as the basic building block to design TransXNet, a novel hybrid CNN-Transformer vision backbone network that delivers compelling performance. In the ImageNet-1K image classification task, TransXNet-T surpasses Swin-T by 0.3\% in top-1 accuracy while requiring less than half of the computational cost. Furthermore, TransXNet-S and TransXNet-B exhibit excellent model scalability, achieving top-1 accuracy of 83.8\% and 84.6\% respectively, with reasonable computational costs. Additionally, our proposed network architecture demonstrates strong generalization capabilities in various dense prediction tasks, outperforming other state-of-the-art networks while having lower computational costs.
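A minimal PyTorch sketch of the channel-split mixing described above: half the channels go through global self-attention, half through a local depthwise convolution, and the halves are re-concatenated. Note the published D-Mixer makes the conv branch input-dependent (dynamic kernels); a static depthwise conv is used here for brevity.

```python
import torch
import torch.nn as nn

class DMixerSketch(nn.Module):
    """Even channel split; one half mixed globally with self-attention,
    the other locally with a depthwise conv (simplified D-Mixer sketch)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim // 2, heads, batch_first=True)
        self.dwconv = nn.Conv2d(dim // 2, dim // 2, 3, padding=1, groups=dim // 2)

    def forward(self, x):             # x: (B, C, H, W)
        b, c, h, w = x.shape
        xg, xl = x.chunk(2, dim=1)    # split channels evenly
        tokens = xg.flatten(2).transpose(1, 2)          # (B, HW, C/2)
        xg, _ = self.attn(tokens, tokens, tokens)       # global mixing
        xg = xg.transpose(1, 2).reshape(b, c // 2, h, w)
        return torch.cat([xg, self.dwconv(xl)], dim=1)  # local mixing + concat

out = DMixerSketch(64)(torch.randn(2, 64, 14, 14))
```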
paper_authors: Attila Lengyel, Ombretta Strafforello, Robert-Jan Bruintjes, Alexander Gielisse, Jan van Gemert
for: Improving the robustness of CNNs to color variations
methods: Proposes Color Equivariant Convolutions (CEConvs), a deep learning building block that allows shape features to be shared across the color spectrum while retaining important color information
results: Improves downstream performance on various tasks and robustness to color changes, including train-test distribution shifts.
Abstract
Color is a crucial visual cue readily exploited by Convolutional Neural Networks (CNNs) for object recognition. However, CNNs struggle if there is data imbalance between color variations introduced by accidental recording conditions. Color invariance addresses this issue but does so at the cost of removing all color information, which sacrifices discriminative power. In this paper, we propose Color Equivariant Convolutions (CEConvs), a novel deep learning building block that enables shape feature sharing across the color spectrum while retaining important color information. We extend the notion of equivariance from geometric to photometric transformations by incorporating parameter sharing over hue-shifts in a neural network. We demonstrate the benefits of CEConvs in terms of downstream performance to various tasks and improved robustness to color changes, including train-test distribution shifts. Our approach can be seamlessly integrated into existing architectures, such as ResNets, and offers a promising solution for addressing color-based domain shifts in CNNs.
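A sketch of the hue-equivariance recipe: share one convolution across several hue-rotated versions of the input, producing an extra group dimension. The published CEConvs rotate the filter weights instead, which amounts to the same linear operation; the rotation matrix below is the standard RGB rotation about the achromatic axis.

```python
import torch
import torch.nn as nn

def hue_rotation_matrix(theta):
    """3x3 linear RGB rotation about the gray (1,1,1) axis: an approximate hue shift."""
    c, s = torch.cos(theta), torch.sin(theta)
    one3 = torch.full((3, 3), 1.0 / 3.0)                       # u u^T for unit axis
    cross = torch.tensor([[0., -1., 1.],
                          [1., 0., -1.],
                          [-1., 1., 0.]]) / (3 ** 0.5)         # skew matrix [u]x
    return c * torch.eye(3) + (1 - c) * one3 + s * cross

class HueLiftingConv(nn.Module):
    """One shared conv applied to k hue-rotated copies of the input,
    producing a hue 'group' axis (group-equivariance lifting sketch)."""
    def __init__(self, out_ch, k=3, ksize=3):
        super().__init__()
        self.conv = nn.Conv2d(3, out_ch, ksize, padding=ksize // 2)
        angles = torch.arange(k) * (2 * torch.pi / k)
        self.register_buffer("rots", torch.stack([hue_rotation_matrix(a) for a in angles]))

    def forward(self, x):             # x: (B, 3, H, W)
        outs = [self.conv(torch.einsum("ij,bjhw->bihw", R, x)) for R in self.rots]
        return torch.stack(outs, dim=1)   # (B, k, out_ch, H, W)
```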
Semi- and Weakly-Supervised Domain Generalization for Object Detection
results: Experimental results show that object detectors trained under the proposed settings with the student-teacher framework significantly outperform baseline detectors trained on a single labeled domain, and perform comparably to or better than detectors trained under unsupervised domain adaptation (UDA) settings, despite using no target-domain data.
Abstract
Object detectors do not work well when domains largely differ between training and testing data. To solve this problem, domain generalization approaches, which require training data with ground-truth labels from multiple domains, have been proposed. However, it is time-consuming and labor-intensive to collect those data for object detection because not only class labels but also bounding boxes must be annotated. To overcome the problem of domain gap in object detection without requiring expensive annotations, we propose to consider two new problem settings: semi-supervised domain generalizable object detection (SS-DGOD) and weakly-supervised DGOD (WS-DGOD). In contrast to the conventional domain generalization for object detection that requires labeled data from multiple domains, SS-DGOD and WS-DGOD require labeled data only from one domain and unlabeled or weakly-labeled data from multiple domains for training. We show that object detectors can be effectively trained on the proposed settings with the same student-teacher learning framework, where a student network is trained with pseudo labels output from a teacher on the unlabeled or weakly-labeled data. The experimental results demonstrate that the object detectors trained on the proposed settings significantly outperform baseline detectors trained on one labeled domain data and perform comparably to or better than those trained on unsupervised domain adaptation (UDA) settings, while ours do not use target domain data for training in contrast to UDA.
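The student-teacher scheme can be sketched as a Mean-Teacher-style loop: the teacher pseudo-labels unlabeled multi-domain images, the student trains on labeled plus pseudo-labeled data, and the teacher tracks an EMA of the student. The detector interface below (a scalar loss when targets are given, torchvision-style detection dicts otherwise) is an assumption for illustration.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, m=0.999):
    # Teacher weights track an exponential moving average of the student.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

def training_step(student, teacher, labeled_batch, unlabeled_batch, thr=0.8):
    """One step of the student-teacher scheme (generic sketch; detectors are
    assumed to return a scalar loss given (images, targets) and a list of
    {'boxes','labels','scores'} dicts given images alone)."""
    images_l, targets_l = labeled_batch
    sup_loss = student(images_l, targets_l)
    with torch.no_grad():
        dets = teacher(unlabeled_batch)
        pseudo = [{k: d[k][d["scores"] > thr] for k in ("boxes", "labels")}
                  for d in dets]                 # confident detections only
    unsup_loss = student(unlabeled_batch, pseudo)
    (sup_loss + unsup_loss).backward()
    ema_update(teacher, student)
```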
Label-Only Model Inversion Attacks via Knowledge Transfer
for: The paper addresses the privacy threat of model inversion (MI) attacks in the label-only setup, where the adversary only has access to the model's predicted labels.
methods: The proposed approach, LOKT, uses transfer learning to move the knowledge of an opaque target model into surrogate models, enabling the use of advanced white-box attacks. The key technique is a novel model, the Target model-assisted ACGAN (T-ACGAN), which facilitates effective knowledge transfer.
results: The proposed method outperforms existing state-of-the-art (SOTA) label-only MI attacks by more than 15% across all MI benchmarks, and compares favorably in terms of query budget.
Abstract
In a model inversion (MI) attack, an adversary abuses access to a machine learning (ML) model to infer and reconstruct private training data. Remarkable progress has been made in the white-box and black-box setups, where the adversary has access to the complete model or the model's soft output respectively. However, there is very limited study in the most challenging but practically important setup: label-only MI attacks, where the adversary only has access to the model's predicted label (hard label) without confidence scores nor any other model information. In this work, we propose LOKT, a novel approach for label-only MI attacks. Our idea is based on transfer of knowledge from the opaque target model to surrogate models. Subsequently, using these surrogate models, our approach can harness advanced white-box attacks. We propose knowledge transfer based on generative modelling, and introduce a new model, Target model-assisted ACGAN (T-ACGAN), for effective knowledge transfer. Our method casts the challenging label-only MI into the more tractable white-box setup. We provide analysis to support that surrogate models based on our approach serve as effective proxies for the target model for MI. Our experiments show that our method significantly outperforms the existing SOTA label-only MI attack by more than 15% across all MI benchmarks. Furthermore, our method compares favorably in terms of query budget. Our study highlights rising privacy threats for ML models even when minimal information (i.e., hard labels) is exposed. Our code, demo, models and reconstructed data are available at our project page: https://ngoc-nguyen-0.github.io/lokt/
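At its core, the knowledge transfer needs only hard-label queries: sample synthetic images, ask the black-box target which class it predicts, and fit the surrogate to those labels. The sketch below shows that step in isolation (T-ACGAN folds it into GAN training); the logits-style model interface is an assumption.

```python
import torch
import torch.nn.functional as F

def surrogate_distillation_step(surrogate, target_model, generator, z_dim=128, batch=64):
    """One hard-label knowledge-transfer step (generic sketch, not the
    paper's exact T-ACGAN objective). Models are assumed to map image
    batches to class logits."""
    z = torch.randn(batch, z_dim)
    x = generator(z).detach()                   # synthetic query images
    with torch.no_grad():
        y = target_model(x).argmax(dim=1)       # hard labels only, no confidences
    loss = F.cross_entropy(surrogate(x), y)
    loss.backward()
    return loss
```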
results: Experiments and analysis reveal that the existing fairness measurement framework suffers from considerable measurement errors, which the proposed CLEAM reduces; applying CLEAM to text-to-image generators and GANs uncovers considerable biases that raise concerns about their applications.
Abstract
Recently, there has been increased interest in fair generative models. In this work, we conduct, for the first time, an in-depth study on fairness measurement, a critical component in gauging progress on fair generative models. We make three contributions. First, we conduct a study that reveals that the existing fairness measurement framework has considerable measurement errors, even when highly accurate sensitive attribute (SA) classifiers are used. These findings cast doubts on previously reported fairness improvements. Second, to address this issue, we propose CLassifier Error-Aware Measurement (CLEAM), a new framework which uses a statistical model to account for inaccuracies in SA classifiers. Our proposed CLEAM reduces measurement errors significantly, e.g., 4.98% $\rightarrow$ 0.62% for StyleGAN2 w.r.t. Gender. Additionally, CLEAM achieves this with minimal additional overhead. Third, we utilize CLEAM to measure fairness in important text-to-image generator and GANs, revealing considerable biases in these models that raise concerns about their applications. Code and more resources: https://sutd-visual-computing-group.github.io/CLEAM/.
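The key statistical idea, as we understand it, is to invert the SA classifier's known error rates when estimating attribute proportions. A one-line point-estimate version (a Rogan-Gladen-style correction) is sketched below; CLEAM itself builds a fuller statistical model with interval estimates.

```python
def corrected_proportion(p_hat, acc_pos, acc_neg):
    """Classifier-error-aware estimate of the true positive-attribute
    proportion from the measured one (illustrative point estimate).

    p_hat   : fraction of generated samples the SA classifier labels positive
    acc_pos : classifier accuracy on truly-positive samples (sensitivity)
    acc_neg : classifier accuracy on truly-negative samples (specificity)
    """
    p = (p_hat - (1.0 - acc_neg)) / (acc_pos + acc_neg - 1.0)
    return min(max(p, 0.0), 1.0)  # clip to a valid proportion

# e.g. a 97%-accurate gender classifier that measures p_hat = 0.55:
print(corrected_proportion(0.55, 0.97, 0.97))  # ~0.553 after correction
```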
FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound
results: On a large-scale fetal ultrasound dataset, the method outperforms other strong competitors in accuracy and robustness.
Abstract
Fetal pose estimation in 3D ultrasound (US) involves identifying a set of associated fetal anatomical landmarks. Its primary objective is to provide comprehensive information about the fetus through landmark connections, thus benefiting various critical applications, such as biometric measurements, plane localization, and fetal movement monitoring. However, accurately estimating the 3D fetal pose in US volume has several challenges, including poor image quality, limited GPU memory for tackling high dimensional data, symmetrical or ambiguous anatomical structures, and considerable variations in fetal poses. In this study, we propose a novel 3D fetal pose estimation framework (called FetusMapV2) to overcome the above challenges. Our contribution is three-fold. First, we propose a heuristic scheme that explores the complementary network structure-unconstrained and activation-unreserved GPU memory management approaches, which can enlarge the input image resolution for better results under limited GPU memory. Second, we design a novel Pair Loss to mitigate confusion caused by symmetrical and similar anatomical structures. It separates the hidden classification task from the landmark localization task and thus progressively eases model learning. Last, we propose a shape priors-based self-supervised learning by selecting the relatively stable landmarks to refine the pose online. Extensive experiments and diverse applications on a large-scale fetal US dataset including 1000 volumes with 22 landmarks per volume demonstrate that our method outperforms other strong competitors.
EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution
results: Extensive experiments on four remote sensing datasets show that EDiffSR delivers high-quality super-resolution at an efficient computational cost with a simple training procedure.
Abstract
Recently, convolutional networks have achieved remarkable development in remote sensing image Super-Resolution (SR) by minimizing the regression objectives, e.g., MSE loss. However, despite achieving impressive performance, these methods often suffer from poor visual quality with over-smooth issues. Generative adversarial networks have the potential to infer intricate details, but they are easy to collapse, resulting in undesirable artifacts. To mitigate these issues, in this paper, we first introduce Diffusion Probabilistic Model (DPM) for efficient remote sensing image SR, dubbed EDiffSR. EDiffSR is easy to train and maintains the merits of DPM in generating perceptual-pleasant images. Specifically, different from previous works using heavy UNet for noise prediction, we develop an Efficient Activation Network (EANet) to achieve favorable noise prediction performance by simplified channel attention and simple gate operation, which dramatically reduces the computational budget. Moreover, to introduce more valuable prior knowledge into the proposed EDiffSR, a practical Conditional Prior Enhancement Module (CPEM) is developed to help extract an enriched condition. Unlike most DPM-based SR models that directly generate conditions by amplifying LR images, the proposed CPEM helps to retain more informative cues for accurate SR. Extensive experiments on four remote sensing datasets demonstrate that EDiffSR can restore visual-pleasant images on simulated and real-world remote sensing images, both quantitatively and qualitatively. The code of EDiffSR will be available at https://github.com/XY-boy/EDiffSR
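The two low-cost operations named above have well-known NAFNet-style forms, which we assume is the kind of design EANet uses; minimal sketches:

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    # Split channels in half and multiply elementwise: a "simple gate"
    # (NAFNet-style; assumed to match the paper's gate operation).
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return a * b

class SimplifiedChannelAttention(nn.Module):
    """Global pooling followed by a single 1x1 conv that rescales channels;
    no MLP or sigmoid, keeping the cost low (again NAFNet-style)."""
    def __init__(self, ch):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        return x * self.proj(self.pool(x))
```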
Improving Online Source-free Domain Adaptation for Object Detection by Unsupervised Data Acquisition
results: Compared with existing state-of-the-art techniques, the method performs better on a real-world dataset, demonstrating that unsupervised data acquisition can improve adaptive object detection in mobile robots.
Abstract
Effective object detection in mobile robots is challenged by deployment in diverse and unfamiliar environments. Online Source-Free Domain Adaptation (O-SFDA) offers real-time model adaptation using a stream of unlabeled data from a target domain. However, not all captured frames in mobile robotics contain information that is beneficial for adaptation, particularly when there is a strong domain shift. This paper introduces a novel approach to enhance O-SFDA for adaptive object detection in mobile robots via unsupervised data acquisition. Our methodology prioritizes the most informative unlabeled samples for inclusion in the online training process. Empirical evaluation on a real-world dataset reveals that our method outperforms existing state-of-the-art O-SFDA techniques, demonstrating the viability of unsupervised data acquisition for improving adaptive object detection in mobile robots.
A High-Resolution Dataset for Instance Detection with Multi-View Instance Capture
paper_authors: Qianqian Shen, Yunhan Zhao, Nahyun Kwon, Jeeeun Kim, Yanan Li, Shu Kong
for: This paper introduces a new instance detection (InsDet) dataset and protocol to advance research in this area.
methods: The dataset pairs multi-view instance captures with diverse scene images, so training images can be synthesized automatically by pasting instance images onto scenes with free box annotations.
results: A pipeline combining the off-the-shelf class-agnostic Segment Anything Model (SAM) with self-supervised DINOv2 feature representations performs best, achieving >10 AP more than end-to-end trained InsDet models that repurpose object detectors (e.g., FasterRCNN and RetinaNet).
Abstract
Instance detection (InsDet) is a long-lasting problem in robotics and computer vision, aiming to detect object instances (predefined by some visual examples) in a cluttered scene. Despite its practical significance, its advancement is overshadowed by Object Detection, which aims to detect objects belonging to some predefined classes. One major reason is that current InsDet datasets are too small in scale by today's standards. For example, the popular InsDet dataset GMU (published in 2016) has only 23 instances, far fewer than COCO (80 classes), a well-known object detection dataset published in 2014. We are motivated to introduce a new InsDet dataset and protocol. First, we define a realistic setup for InsDet: training data consists of multi-view instance captures, along with diverse scene images allowing synthesizing training images by pasting instance images on them with free box annotations. Second, we release a real-world database, which contains multi-view capture of 100 object instances, and high-resolution (6k x 8k) testing images. Third, we extensively study baseline methods for InsDet on our dataset, analyze their performance and suggest future work. Somewhat surprisingly, combining the off-the-shelf class-agnostic segmentation model (Segment Anything Model, SAM) with the self-supervised feature representation DINOv2 performs best, achieving >10 AP better than end-to-end trained InsDet models that repurpose object detectors (e.g., FasterRCNN and RetinaNet).
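The surprising baseline reduces to template matching in a frozen feature space. The sketch below assumes the caller supplies SAM-style proposal crops, multi-view template crops of the target instances, and a frozen encoder such as DINOv2 that maps image batches to feature vectors; no specific library API is used.

```python
import torch
import torch.nn.functional as F

def detect_instances(proposals, templates, encoder, thr=0.5):
    """Match class-agnostic proposal crops against instance templates by
    cosine similarity in a frozen feature space (sketch of the non-learned
    SAM + DINOv2 baseline; `encoder` maps an image batch to (N, D) features)."""
    with torch.no_grad():
        prop_feats = F.normalize(encoder(proposals), dim=-1)   # (P, D)
        tmpl_feats = F.normalize(encoder(templates), dim=-1)   # (T, D)
    sim = prop_feats @ tmpl_feats.T                            # cosine similarities
    best_score, best_tmpl = sim.max(dim=1)                     # best template per proposal
    keep = best_score > thr
    return keep.nonzero(as_tuple=True)[0], best_tmpl[keep], best_score[keep]
```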
There Are No Data Like More Data - Datasets for Deep Learning in Earth Observation
results: By putting Earth observation datasets in the spotlight, the article offers a new perspective for machine learning research in Earth observation and aims to foster its further development.
Abstract
Carefully curated and annotated datasets are the foundation of machine learning, with particularly data-hungry deep neural networks forming the core of what is often called Artificial Intelligence (AI). Due to the massive success of deep learning applied to Earth Observation (EO) problems, the focus of the community has been largely on the development of ever-more sophisticated deep neural network architectures and training strategies largely ignoring the overall importance of datasets. For that purpose, numerous task-specific datasets have been created that were largely ignored by previously published review articles on AI for Earth observation. With this article, we want to change the perspective and put machine learning datasets dedicated to Earth observation data and applications into the spotlight. Based on a review of the historical developments, currently available resources are described and a perspective for future developments is formed. We hope to contribute to an understanding that the nature of our data is what distinguishes the Earth observation community from many other communities that apply deep learning techniques to image data, and that a detailed understanding of EO data peculiarities is among the core competencies of our discipline.
CHAMMI: A benchmark for channel-adaptive models in microscopy imaging
results: Channel-adaptive models are found to generalize better to out-of-domain tasks while being computationally efficient; a curated dataset and an evaluation API are released to enable objective comparisons in future research and applications.
Abstract
Most neural networks assume that input images have a fixed number of channels (three for RGB images). However, there are many settings where the number of channels may vary, such as microscopy images where the number of channels changes depending on instruments and experimental goals. Yet, there has not been a systemic attempt to create and evaluate neural networks that are invariant to the number and type of channels. As a result, trained models remain specific to individual studies and are hardly reusable for other microscopy settings. In this paper, we present a benchmark for investigating channel-adaptive models in microscopy imaging, which consists of 1) a dataset of varied-channel single-cell images, and 2) a biologically relevant evaluation framework. In addition, we adapted several existing techniques to create channel-adaptive models and compared their performance on this benchmark to fixed-channel, baseline models. We find that channel-adaptive models can generalize better to out-of-domain tasks and can be computationally efficient. We contribute a curated dataset (https://doi.org/10.5281/zenodo.7988357) and an evaluation API (https://github.com/broadinstitute/MorphEm.git) to facilitate objective comparisons in future research and applications.
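For illustration, one simple channel-adaptive strategy of the kind such a benchmark compares: run a shared single-channel stem over each input channel and pool, so the network accepts any channel count. This is our sketch, not a specific method from the paper.

```python
import torch
import torch.nn as nn

class ChannelAdaptiveStem(nn.Module):
    """Apply one shared single-channel stem to every input channel and
    average the results, making the first layer channel-count agnostic."""
    def __init__(self, out_ch=64, ksize=7):
        super().__init__()
        self.stem = nn.Conv2d(1, out_ch, ksize, stride=2, padding=ksize // 2)

    def forward(self, x):                        # x: (B, C, H, W), any C
        b, c, h, w = x.shape
        y = self.stem(x.reshape(b * c, 1, h, w))  # per-channel shared stem
        return y.reshape(b, c, *y.shape[1:]).mean(dim=1)

feats = ChannelAdaptiveStem()(torch.randn(2, 5, 64, 64))  # works for C=5, C=3, ...
```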
Modular Anti-noise Deep Learning Network for Robotic Grasp Detection Based on RGB Images
results: Experiments and evaluations show that the proposed method detects grasp poses accurately and copes well with blurred and noisy visuals, providing a trainable, robust design for reliable grasp detection in practice.
Abstract
While traditional methods rely on depth sensors, the current trend leans towards utilizing cost-effective RGB images, despite their absence of depth cues. This paper introduces an interesting approach to detect grasping pose from a single RGB image. To this end, we propose a modular learning network augmented with grasp detection and semantic segmentation, tailored for robots equipped with parallel-plate grippers. Our network not only identifies graspable objects but also fuses prior grasp analyses with semantic segmentation, thereby boosting grasp detection precision. Significantly, our design exhibits resilience, adeptly handling blurred and noisy visuals. Key contributions encompass a trainable network for grasp detection from RGB images, a modular design facilitating feasible grasp implementation, and an architecture robust against common image distortions. We demonstrate the feasibility and accuracy of our proposed approach through practical experiments and evaluations.
Generalized Category Discovery with Clustering Assignment Consistency
results: Our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets; on ImageNet-100 in particular, it exceeds the best baseline by 15.5% and 7.0% on the \texttt{Novel} and \texttt{All} classes, respectively.
Abstract
Generalized category discovery (GCD) is a recently proposed open-world task. Given a set of images consisting of labeled and unlabeled instances, the goal of GCD is to automatically cluster the unlabeled samples using information transferred from the labeled dataset. The unlabeled dataset comprises both known and novel classes. The main challenge is that unlabeled novel class samples and unlabeled known class samples are mixed together in the unlabeled dataset. To address the GCD without knowing the class number of unlabeled dataset, we propose a co-training-based framework that encourages clustering consistency. Specifically, we first introduce weak and strong augmentation transformations to generate two sufficiently different views for the same sample. Then, based on the co-training assumption, we propose a consistency representation learning strategy, which encourages consistency between feature-prototype similarity and clustering assignment. Finally, we use the discriminative embeddings learned from the semi-supervised representation learning process to construct an original sparse network and use a community detection method to obtain the clustering results and the number of categories simultaneously. Extensive experiments show that our method achieves state-of-the-art performance on three generic benchmarks and three fine-grained visual recognition datasets. Especially in the ImageNet-100 data set, our method significantly exceeds the best baseline by 15.5\% and 7.0\% on the \texttt{Novel} and \texttt{All} classes, respectively.
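A FixMatch-style sketch of the clustering-assignment-consistency objective described above: the weak view's soft prototype assignment supervises the strong view. The temperature and exact loss form are our assumptions, not the paper's published formulation.

```python
import torch
import torch.nn.functional as F

def assignment_consistency_loss(z_weak, z_strong, prototypes, tau=0.1):
    """Cross-entropy between the weak view's soft prototype assignment
    (treated as a pseudo target) and the strong view's prediction."""
    z_weak = F.normalize(z_weak, dim=-1)
    z_strong = F.normalize(z_strong, dim=-1)
    protos = F.normalize(prototypes, dim=-1)
    with torch.no_grad():
        target = F.softmax(z_weak @ protos.T / tau, dim=-1)    # pseudo assignment
    log_pred = F.log_softmax(z_strong @ protos.T / tau, dim=-1)
    return -(target * log_pred).sum(dim=-1).mean()
```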
results: Experiments show that, under practical runtime and cost constraints, the method matches or outperforms retrieval augmentation on most tasks while retrieving 75% less user data.
Abstract
Personalization, the ability to tailor a system to individual users, is an essential factor in user experience with natural language processing (NLP) systems. With the emergence of Large Language Models (LLMs), a key question is how to leverage these models to better personalize user experiences. To personalize a language model's output, a straightforward approach is to incorporate past user data into the language model prompt, but this approach can result in lengthy inputs exceeding limitations on input length and incurring latency and cost issues. Existing approaches tackle such challenges by selectively extracting relevant user data (i.e. selective retrieval) to construct a prompt for downstream tasks. However, retrieval-based methods are limited by potential information loss, lack of more profound user understanding, and cold-start challenges. To overcome these limitations, we propose a novel summary-augmented approach by extending retrieval-augmented personalization with task-aware user summaries generated by LLMs. The summaries can be generated and stored offline, enabling real-world systems with runtime constraints like voice assistants to leverage the power of LLMs. Experiments show our method with 75% less of retrieved user data is on-par or outperforms retrieval augmentation on most tasks in the LaMP personalization benchmark. We demonstrate that offline summarization via LLMs and runtime retrieval enables better performance for personalization on a range of tasks under practical constraints.
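The pipeline splits cleanly into an offline summarization pass and a runtime prompt assembly. The sketch below is schematic: `llm` is any text-completion callable, and the prompt wording is illustrative rather than taken from the paper.

```python
def build_summary(llm, task_description, user_history):
    """Offline step: distill a user's history into a short task-aware summary."""
    prompt = (f"Task: {task_description}\n"
              "User history:\n" + "\n".join(user_history) +
              "\nWrite a short profile of this user relevant to the task.")
    return llm(prompt)

def personalized_prompt(task_input, summary, retrieved_items):
    # Runtime step: combine the cached summary with a few retrieved items
    # (the paper reports parity with 75% less retrieved user data).
    context = "\n".join(retrieved_items)
    return (f"User profile: {summary}\n"
            f"Relevant past items:\n{context}\n"
            f"Now complete the task for this input: {task_input}")
```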
FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space
results: FOCAL is evaluated extensively on four multimodal sensing datasets with two backbone encoders and two classifiers, and consistently outperforms state-of-the-art baselines on downstream tasks by a clear margin under different ratios of available labels.
Abstract
This paper proposes a novel contrastive learning framework, called FOCAL, for extracting comprehensive features from multimodal time-series sensing signals through self-supervised training. Existing multimodal contrastive frameworks mostly rely on the shared information between sensory modalities, but do not explicitly consider the exclusive modality information that could be critical to understanding the underlying sensing physics. Besides, contrastive frameworks for time series have not handled the temporal information locality appropriately. FOCAL solves these challenges by making the following contributions: First, given multimodal time series, it encodes each modality into a factorized latent space consisting of shared features and private features that are orthogonal to each other. The shared space emphasizes feature patterns consistent across sensory modalities through a modal-matching objective. In contrast, the private space extracts modality-exclusive information through a transformation-invariant objective. Second, we propose a temporal structural constraint for modality features, such that the average distance between temporally neighboring samples is no larger than that of temporally distant samples. Extensive evaluations are performed on four multimodal sensing datasets with two backbone encoders and two classifiers to demonstrate the superiority of FOCAL. It consistently outperforms the state-of-the-art baselines in downstream tasks with a clear margin, under different ratios of available labels. The code and self-collected dataset are available at https://github.com/tomoyoshki/focal.
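Two of the FOCAL ingredients translate directly into simple losses: an orthogonality penalty between the shared and private embeddings, and a hinge enforcing that temporally neighboring samples sit no farther apart than distant ones. Both sketches are our reading of the abstract, not the published formulation.

```python
import torch
import torch.nn.functional as F

def orthogonality_loss(shared, private):
    # Push each sample's shared and private embeddings toward orthogonality.
    s = F.normalize(shared, dim=-1)
    p = F.normalize(private, dim=-1)
    return (s * p).sum(dim=-1).pow(2).mean()

def temporal_locality_loss(z_t, z_near, z_far, margin=0.0):
    """Hinge form of the temporal structural constraint: the distance to a
    temporal neighbor should not exceed the distance to a distant sample."""
    d_near = (z_t - z_near).pow(2).sum(dim=-1)
    d_far = (z_t - z_far).pow(2).sum(dim=-1)
    return F.relu(d_near - d_far + margin).mean()
```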
Vignat: Vulnerability identification by learning code semantics via graph attention networks
results: Experiments with human participants show that, when an agent acts intentionally, humans reason about the concepts the agent is using, in line with the proposed joint reasoning model.
Abstract
Value alignment is essential for building AI systems that can safely and reliably interact with people. However, what a person values -- and is even capable of valuing -- depends on the concepts that they are currently using to understand and evaluate what happens in the world. The dependence of values on concepts means that concept alignment is a prerequisite for value alignment -- agents need to align their representation of a situation with that of humans in order to successfully align their values. Here, we formally analyze the concept alignment problem in the inverse reinforcement learning setting, show how neglecting concept alignment can lead to systematic value mis-alignment, and describe an approach that helps minimize such failure modes by jointly reasoning about a person's concepts and values. Additionally, we report experimental results with human participants showing that humans reason about the concepts used by an agent when acting intentionally, in line with our joint reasoning model.
Constrained Hierarchical Monte Carlo Belief-State Planning
results: If primitive option controllers are defined to satisfy assigned constraint budgets, COBeTS satisfies constraints anytime; otherwise, it guides the search toward a safe sequence of option primitives, and hierarchical monitoring can be used to achieve runtime safety.
Abstract
Optimal plans in Constrained Partially Observable Markov Decision Processes (CPOMDPs) maximize reward objectives while satisfying hard cost constraints, generalizing safe planning under state and transition uncertainty. Unfortunately, online CPOMDP planning is extremely difficult in large or continuous problem domains. In many large robotic domains, hierarchical decomposition can simplify planning by using tools for low-level control given high-level action primitives (options). We introduce Constrained Options Belief Tree Search (COBeTS) to leverage this hierarchy and scale online search-based CPOMDP planning to large robotic problems. We show that if primitive option controllers are defined to satisfy assigned constraint budgets, then COBeTS will satisfy constraints anytime. Otherwise, COBeTS will guide the search towards a safe sequence of option primitives, and hierarchical monitoring can be used to achieve runtime safety. We demonstrate COBeTS in several safety-critical, constrained partially observable robotic domains, showing that it can plan successfully in continuous CPOMDPs while non-hierarchical baselines cannot.
Look At Me, No Replay! SurpriseNet: Anomaly Detection Inspired Class Incremental Learning
results: Experiments show that SurpriseNet performs strongly on traditional vision continual-learning benchmarks as well as on structured datasets. Source code is available at https://doi.org/10.5281/zenodo.8247906 and https://github.com/tachyonicClock/SurpriseNet-CIKM-23.
Abstract
Continual learning aims to create artificial neural networks capable of accumulating knowledge and skills through incremental training on a sequence of tasks. The main challenge of continual learning is catastrophic interference, wherein new knowledge overrides or interferes with past knowledge, leading to forgetting. An associated issue is the problem of learning "cross-task knowledge," where models fail to acquire and retain knowledge that helps differentiate classes across task boundaries. A common solution to both problems is "replay," where a limited buffer of past instances is utilized to learn cross-task knowledge and mitigate catastrophic interference. However, a notable drawback of these methods is their tendency to overfit the limited replay buffer. In contrast, our proposed solution, SurpriseNet, addresses catastrophic interference by employing a parameter isolation method and learning cross-task knowledge using an auto-encoder inspired by anomaly detection. SurpriseNet is applicable to both structured and unstructured data, as it does not rely on image-specific inductive biases. We have conducted empirical experiments demonstrating the strengths of SurpriseNet on various traditional vision continual-learning benchmarks, as well as on structured data datasets. Source code made available at https://doi.org/10.5281/zenodo.8247906 and https://github.com/tachyonicClock/SurpriseNet-CIKM-23
SURF: A Generalization Benchmark for GNNs Predicting Fluid Dynamics
results: The study finds that the generalization of learned graph-based models suffers when they must adapt to different topologies, resolutions, or thermodynamic ranges.
Abstract
Simulating fluid dynamics is crucial for the design and development process, ranging from simple valves to complex turbomachinery. Accurately solving the underlying physical equations is computationally expensive. Therefore, learning-based solvers that model interactions on meshes have gained interest due to their promising speed-ups. However, it is unknown to what extent these models truly understand the underlying physical principles and can generalize rather than interpolate. Generalization is a key requirement for a general-purpose fluid simulator, which should adapt to different topologies, resolutions, or thermodynamic ranges. We propose SURF, a benchmark designed to test the \textit{generalization} of learned graph-based fluid simulators. SURF comprises individual datasets and provides specific performance and generalization metrics for evaluating and comparing different models. We empirically demonstrate the applicability of SURF by thoroughly investigating the two state-of-the-art graph-based models, yielding new insights into their generalization.
Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization
paper_authors: Prakamya Mishra, Zonghai Yao, Shuwei Chen, Beining Wang, Rohan Mittal, Hong Yu
for: This paper aims to improve the factual consistency of clinical note summarization by using ChatGPT to generate high-quality feedback data.
methods: The authors use ChatGPT to generate edit feedback for improving the factual consistency of clinical note summarization.
results: The authors evaluate the effectiveness of using GPT edits in human alignment, showing promising results in improving factual consistency.
Abstract
Large Language Models (LLMs) like the GPT and LLaMA families have demonstrated exceptional capabilities in capturing and condensing critical contextual information and achieving state-of-the-art performance in the summarization task. However, community concerns about these models' hallucination issues continue to rise. LLMs sometimes generate factually hallucinated summaries, which can be extremely harmful in the clinical domain NLP tasks (e.g., clinical note summarization), where factually incorrect statements can lead to critically erroneous diagnoses. Fine-tuning LLMs using human feedback has shown the promise of aligning LLMs to be factually consistent during generation, but such training procedure requires high-quality human-annotated data, which can be extremely expensive to get in the clinical domain. In this work, we propose a new pipeline using ChatGPT instead of human experts to generate high-quality feedback data for improving factual consistency in the clinical note summarization task. We focus specifically on edit feedback because recent work discusses the shortcomings of human alignment via preference feedback in complex situations (such as clinical NLP tasks that require extensive expert knowledge), as well as some advantages of collecting edit feedback from domain experts. In addition, although GPT has reached the expert level in many clinical NLP tasks (e.g., USMLE QA), there is not much previous work discussing whether GPT can generate expert-level edit feedback for LMs in the clinical note summarization task. We hope to fill this gap. Finally, our evaluations demonstrate the potential use of GPT edits in human alignment, especially from a factuality perspective.
GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
results: Achieves state-of-the-art performance on various offline multi-goal manipulation tasks, and handles small data budgets and generalization to unseen goals.Abstract
Offline goal-conditioned RL (GCRL) offers a feasible paradigm to learn general-purpose policies from diverse and multi-task offline datasets. Despite notable recent progress, the predominant offline GCRL methods have been restricted to model-free approaches, constraining their capacity to tackle limited data budgets and unseen goal generalization. In this work, we propose a novel two-stage model-based framework, Goal-conditioned Offline Planning (GOPlan), including (1) pretraining a prior policy capable of capturing multi-modal action distribution within the multi-goal dataset; (2) employing the reanalysis method with planning to generate imagined trajectories for fine-tuning policies. Specifically, the prior policy is based on an advantage-weighted Conditioned Generative Adversarial Network that exhibits distinct mode separation to overcome the pitfalls of out-of-distribution (OOD) actions. For further policy optimization, the reanalysis method generates high-quality imaginary data by planning with learned models for both intra-trajectory and inter-trajectory goals. Through experimental evaluations, we demonstrate that GOPlan achieves state-of-the-art performance on various offline multi-goal manipulation tasks. Moreover, our results highlight the superior ability of GOPlan to handle small data budgets and generalize to OOD goals.
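The advantage-weighted conditional GAN that forms GOPlan's prior can be illustrated with a short sketch. This is not the authors' implementation; the network shapes, the exponential weighting form, and the clipping constant are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalGenerator(nn.Module):
    """Maps (state, goal, noise) to an action; a stand-in for the prior policy."""
    def __init__(self, s_dim, g_dim, a_dim, z_dim=16, hidden=256):
        super().__init__()
        self.z_dim = z_dim
        self.net = nn.Sequential(
            nn.Linear(s_dim + g_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, a_dim), nn.Tanh(),
        )

    def forward(self, s, g):
        z = torch.randn(s.size(0), self.z_dim, device=s.device)
        return self.net(torch.cat([s, g, z], dim=-1))

def discriminator_loss(disc, gen, s, g, a, adv, beta=1.0):
    # Dataset actions are weighted by exp(advantage / beta), tilting the
    # GAN's "real" distribution toward high-advantage behavior.
    w = torch.clamp(torch.exp(adv / beta), max=20.0)
    real = disc(torch.cat([s, g, a], dim=-1))
    fake = disc(torch.cat([s, g, gen(s, g).detach()], dim=-1))
    bce = F.binary_cross_entropy_with_logits
    loss_real = (w * bce(real, torch.ones_like(real), reduction="none")).mean()
    loss_fake = bce(fake, torch.zeros_like(fake))
    return loss_real + loss_fake
```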
Topology Recoverability Prediction for Ad-Hoc Robot Networks: A Data-Driven Fault-Tolerant Approach
results: The results show that, compared with the best existing strategies in the literature, the two-pathway data-driven model successfully solves the topology (ir)recoverability prediction problem.Abstract
Faults occurring in ad-hoc robot networks may fatally perturb their topologies leading to disconnection of subsets of those networks. Optimal topology synthesis is generally resource-intensive and time-consuming to be done in real time for large ad-hoc robot networks. One should only perform topology re-computations if the probability of topology recoverability after the occurrence of any fault surpasses that of its irrecoverability. We formulate this problem as a binary classification problem. Then, we develop a two-pathway data-driven model based on Bayesian Gaussian mixture models that predicts the solution to a typical problem by two different pre-fault and post-fault prediction pathways. The results, obtained by the integration of the predictions of those pathways, clearly indicate the success of our model in solving the topology (ir)recoverability prediction problem compared to the best of current strategies found in the literature.
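The two-pathway idea can be sketched with off-the-shelf Bayesian Gaussian mixtures: fit class-conditional mixtures separately on pre-fault and post-fault features, then fuse the log-likelihood margins. The synthetic data, feature dimensions, and additive fusion rule below are illustrative assumptions, not the paper's exact model.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Hypothetical features extracted from the robot network before (X_pre) and
# after (X_post) a fault; y = 1 means the topology turned out recoverable.
rng = np.random.default_rng(0)
X_pre = rng.normal(size=(200, 5))
X_post = rng.normal(size=(200, 5))
y = (X_pre[:, 0] + X_post[:, 0] > 0).astype(int)

def fit_pathway(X, y):
    # One Bayesian GMM per class; prediction uses class-conditional likelihoods.
    return {c: BayesianGaussianMixture(n_components=3, random_state=0).fit(X[y == c])
            for c in (0, 1)}

def pathway_margin(models, X):
    # Log-likelihood margin in favour of "recoverable" (class 1).
    return models[1].score_samples(X) - models[0].score_samples(X)

pre, post = fit_pathway(X_pre, y), fit_pathway(X_post, y)
fused = pathway_margin(pre, X_pre) + pathway_margin(post, X_post)
print("train accuracy:", ((fused > 0).astype(int) == y).mean())
```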
results: Experiments show that this feature attribution method yields far more interpretable results than alternative attribution approaches when applied to outliers detected in galaxy spectra.Abstract
Machine learning techniques can automatically identify outliers in massive datasets, much faster and more reproducible than human inspection ever could. But finding such outliers immediately leads to the question: which features render this input anomalous? We propose a new feature attribution method, Inverse Multiscale Occlusion, that is specifically designed for outliers, for which we have little knowledge of the type of features we want to identify and expect that the model performance is questionable because anomalous test data likely exceed the limits of the training data. We demonstrate our method on outliers detected in galaxy spectra from the Dark Energy Survey Instrument and find its results to be much more interpretable than alternative attribution approaches.
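A generic occlusion-style attribution for a 1-D input such as a spectrum conveys the flavor of the approach; the authors' Inverse Multiscale Occlusion procedure differs in its details, and the window widths and fill value here are arbitrary choices.

```python
import numpy as np

def occlusion_attribution(score_fn, x, widths=(4, 16, 64), fill=0.0):
    """Attribute an anomaly score over a 1-D input by multiscale occlusion.

    score_fn maps a batch (N, D) to anomaly scores (N,). For each window
    width, occlude a sliding window and record how much the score drops;
    large drops mark the features that made x anomalous.
    """
    base = score_fn(x[None])[0]
    attr = np.zeros_like(x, dtype=float)
    for w in widths:
        for start in range(0, len(x) - w + 1, max(1, w // 2)):
            x_occ = x.copy()
            x_occ[start:start + w] = fill
            attr[start:start + w] += (base - score_fn(x_occ[None])[0]) / len(widths)
    return attr

# Toy usage: the "anomaly score" is just the distance from the zero vector.
score = lambda X: np.linalg.norm(X, axis=1)
x = np.zeros(128)
x[40:44] = 5.0                       # inject a narrow spurious "line"
print(int(np.argmax(occlusion_attribution(score, x))))   # points at the line
```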
Evolutionary Tabletop Game Design: A Case Study in the Risk Game
paper_authors: Lana Bertoldo Rossato, Leonardo Boaventura Bombardelli, Anderson Rocha Tavares
for: This paper aims to generate and evaluate tabletop games, producing novel variants of existing games.
methods: The paper combines evolutionary algorithms with automated playtesting to generate and evaluate tabletop games.
results: Using a genetic algorithm together with rules-based automated playtesting, the approach produces new variants of the original game with smaller maps, shorter matches, and more balanced play while maintaining the usual drama. The method still has limitations: in many cases the objective function was pursued correctly, yet the generated games were nearly trivial.Abstract
Creating and evaluating games manually is an arduous and laborious task. Procedural content generation can aid by creating game artifacts, but usually not an entire game. Evolutionary game design, which combines evolutionary algorithms with automated playtesting, has been used to create novel board games with simple equipment; however, the original approach does not include complex tabletop games with dice, cards, and maps. This work proposes an extension of the approach for tabletop games, evaluating the process by generating variants of Risk, a military strategy game where players must conquer map territories to win. We achieved this using a genetic algorithm to evolve the chosen parameters, as well as a rules-based agent to test the games and a variety of quality criteria to evaluate the new variations generated. Our results show the creation of new variations of the original game with smaller maps, resulting in shorter matches. Also, the variants produce more balanced matches, maintaining the usual drama. We also identified limitations in the process, where, in many cases, where the objective function was correctly pursued, but the generated games were nearly trivial. This work paves the way towards promising research regarding the use of evolutionary game design beyond classic board games.
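The evolutionary loop itself is standard; a minimal sketch follows, with a made-up genome of Risk-like parameters and a placeholder fitness in place of the rules-based automated playtesting.

```python
import random

# Hypothetical genome: game parameters the GA is allowed to evolve.
PARAM_RANGES = {"territories": (20, 42), "initial_armies": (20, 40),
                "card_bonus": (2, 10)}

def random_genome():
    return {k: random.randint(*r) for k, r in PARAM_RANGES.items()}

def mutate(g, p=0.3):
    g = dict(g)
    for k, r in PARAM_RANGES.items():
        if random.random() < p:
            g[k] = random.randint(*r)
    return g

def fitness(genome):
    # Placeholder for automated playtesting with rule-based agents, which
    # would reward balanced win rates and shorter matches. Here we use a
    # noisy toy score so the loop runs end to end.
    return -abs(genome["territories"] - 30) + random.random()

def evolve(pop_size=20, generations=30, elite=5):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]
        pop = parents + [mutate(random.choice(parents))
                         for _ in range(pop_size - elite)]
    return max(pop, key=fitness)

print(evolve())
```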
Improved Bayesian Regret Bounds for Thompson Sampling in Reinforcement Learning
results: The paper derives an upper bound of $\widetilde{O}(H\sqrt{d_{l_1}T})$ on the Bayesian regret in the time-inhomogeneous reinforcement learning problem, where $H$ is the episode length and $d_{l_1}$ is the Kolmogorov $l_1$-dimension of the space of environments.Abstract
In this paper, we prove the first Bayesian regret bounds for Thompson Sampling in reinforcement learning in a multitude of settings. We simplify the learning problem using a discrete set of surrogate environments, and present a refined analysis of the information ratio using posterior consistency. This leads to an upper bound of order $\widetilde{O}(H\sqrt{d_{l_1}T})$ in the time inhomogeneous reinforcement learning problem where $H$ is the episode length and $d_{l_1}$ is the Kolmogorov $l_1-$dimension of the space of environments. We then find concrete bounds of $d_{l_1}$ in a variety of settings, such as tabular, linear and finite mixtures, and discuss how our results are either the first of their kind or improve the state-of-the-art.
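For intuition, the algorithm being analyzed is posterior-sampling RL: sample an environment from the posterior, plan in it, act, update. A minimal tabular sketch (known rewards, Dirichlet transition posteriors, and a dummy environment, all assumptions for illustration) is below.

```python
import numpy as np

# Minimal posterior-sampling RL sketch: tabular, finite-horizon MDP with
# known rewards and a Dirichlet posterior over transitions.
S, A, H = 5, 2, 10
R = np.random.rand(S, A)                     # known reward table
counts = np.ones((S, A, S))                  # Dirichlet(1, ..., 1) prior

def sample_mdp():
    # Thompson step: draw one plausible transition model from the posterior.
    return np.array([[np.random.dirichlet(counts[s, a]) for a in range(A)]
                     for s in range(S)])

def solve(P):
    # Finite-horizon value iteration on the sampled MDP.
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V                        # (S, A)
        pi[h], V = Q.argmax(1), Q.max(1)
    return pi

def env_step(s, a):                          # stand-in for the true MDP
    return np.random.randint(S)

for episode in range(50):
    pi = solve(sample_mdp())                 # sample, then act greedily
    s = 0
    for h in range(H):
        a = pi[h, s]
        s2 = env_step(s, a)
        counts[s, a, s2] += 1                # posterior update
        s = s2
```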
Unveiling the Limits of Learned Local Search Heuristics: Are You the Mightiest of the Meek?
for: This paper examines the empirical evaluation of approaches that combine neural networks with local search heuristics in combinatorial optimization.
methods: The study compares a range of methods, including Tabu Search and deep learning-based heuristics.
results: A simple learned heuristic based on Tabu Search surpasses state-of-the-art learned heuristics in both performance and generalizability.Abstract
In recent years, combining neural networks with local search heuristics has become popular in the field of combinatorial optimization. Despite its considerable computational demands, this approach has exhibited promising outcomes with minimal manual engineering. However, we have identified three critical limitations in the empirical evaluation of these integration attempts. Firstly, instances with moderate complexity and weak baselines pose a challenge in accurately evaluating the effectiveness of learning-based approaches. Secondly, the absence of an ablation study makes it difficult to quantify and attribute improvements accurately to the deep learning architecture. Lastly, the generalization of learned heuristics across diverse distributions remains underexplored. In this study, we conduct a comprehensive investigation into these identified limitations. Surprisingly, we demonstrate that a simple learned heuristic based on Tabu Search surpasses state-of-the-art (SOTA) learned heuristics in terms of performance and generalizability. Our findings challenge prevailing assumptions and open up exciting avenues for future research and innovation in combinatorial optimization.
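The surprising baseline is essentially classical tabu search with a learned move-scoring function. The sketch below is plain tabu search on a toy problem; in the learned variant, the greedy `cost` comparison would be replaced by a neural scorer.

```python
import random

def tabu_search(cost, neighbors, x0, iters=200, tenure=10):
    """Plain tabu search. A learned heuristic would replace the greedy
    min-over-cost move selection with a trained scoring model."""
    best = cur = x0
    tabu = []                              # short-term memory of visited states
    for _ in range(iters):
        moves = [m for m in neighbors(cur) if m not in tabu]
        if not moves:
            break
        cur = min(moves, key=cost)         # best non-tabu neighbor
        tabu.append(cur)
        if len(tabu) > tenure:
            tabu.pop(0)
        if cost(cur) < cost(best):
            best = cur
    return best

# Toy usage: minimize a 1-D quadratic over the integers.
cost = lambda x: (x - 7) ** 2
neighbors = lambda x: [x - 1, x + 1]
print(tabu_search(cost, neighbors, x0=0))   # converges to 7
```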
BioInstruct: Instruction Tuning of Large Language Models for Biomedical Natural Language Processing
results: Fine-tuning on the BioInstruct dataset improves LLM performance across BioNLP applications, including information extraction, question answering, and text generation. The study also analyzes, using multi-task learning principles, how instructions contribute to model performance.Abstract
Large language models (LLMs) have achieved a great success in many natural language processing (NLP) tasks. This is achieved by pretraining of LLMs on vast amount of data and then instruction tuning to specific domains. However, only a few instructions in the biomedical domain have been published. To address this issue, we introduce BioInstruct, a customized task-specific instruction dataset containing more than 25,000 examples. This dataset was generated by prompting a GPT-4 language model with a three-seed-sample of 80 human-curated instructions. By fine-tuning LLMs using the BioInstruct dataset, we aim to optimize the LLM's performance in biomedical natural language processing (BioNLP). We conducted instruction tuning on the LLaMA LLMs (1\&2, 7B\&13B) and evaluated them on BioNLP applications, including information extraction, question answering, and text generation. We also evaluated how instructions contributed to model performance using multi-tasking learning principles.
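The data-generation step is a Self-Instruct-style loop: sample three seed instructions, prompt the model, parse the result. The prompt wording, JSON schema, and `call_llm` client below are placeholders, not the paper's actual prompts.

```python
import json
import random

def build_prompt(seed_pool, k=3):
    # Show k human-curated seed instructions and ask for one new triple.
    seeds = random.sample(seed_pool, k)
    shots = "\n\n".join(json.dumps(s, indent=2) for s in seeds)
    return (
        "You write biomedical NLP training data. Given these examples of\n"
        '{"instruction", "input", "output"} triples:\n\n'
        f"{shots}\n\n"
        "Generate one new, distinct triple as JSON."
    )

def generate_dataset(seed_pool, call_llm, n=25000):
    # call_llm is a placeholder for any chat-completion client.
    data = []
    while len(data) < n:
        try:
            data.append(json.loads(call_llm(build_prompt(seed_pool))))
        except json.JSONDecodeError:
            continue                       # skip malformed generations
    return data
```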
ExPT: Synthetic Pretraining for Few-Shot Experimental Design
results: ExPT achieves superior generality and performance in the few-shot setting, excelling across a range of challenging experimental design tasks.Abstract
Experimental design is a fundamental problem in many science and engineering fields. In this problem, sample efficiency is crucial due to the time, money, and safety costs of real-world design evaluations. Existing approaches either rely on active data collection or access to large, labeled datasets of past experiments, making them impractical in many real-world scenarios. In this work, we address the more challenging yet realistic setting of few-shot experimental design, where only a few labeled data points of input designs and their corresponding values are available. We approach this problem as a conditional generation task, where a model conditions on a few labeled examples and the desired output to generate an optimal input design. To this end, we introduce Experiment Pretrained Transformers (ExPT), a foundation model for few-shot experimental design that employs a novel combination of synthetic pretraining with in-context learning. In ExPT, we only assume knowledge of a finite collection of unlabelled data points from the input domain and pretrain a transformer neural network to optimize diverse synthetic functions defined over this domain. Unsupervised pretraining allows ExPT to adapt to any design task at test time in an in-context fashion by conditioning on a few labeled data points from the target task and generating the candidate optima. We evaluate ExPT on few-shot experimental design in challenging domains and demonstrate its superior generality and performance compared to existing methods. The source code is available at https://github.com/tung-nd/ExPT.git.
Deep Learning for Spatiotemporal Big Data: A Vision on Opportunities and Challenges
results: The paper characterizes spatiotemporal big data, surveys deep learning applications on such data, and identifies challenges that future research needs to address.Abstract
With advancements in GPS, remote sensing, and computational simulation, an enormous volume of spatiotemporal data is being collected at an increasing speed from various application domains, spanning Earth sciences, agriculture, smart cities, and public safety. Such emerging geospatial and spatiotemporal big data, coupled with recent advances in deep learning technologies, foster new opportunities to solve problems that have not been possible before. For instance, remote sensing researchers can potentially train a foundation model using Earth imagery big data for numerous land cover and land use modeling tasks. Coastal modelers can train AI surrogates to speed up numerical simulations. However, the distinctive characteristics of spatiotemporal big data pose new challenges for deep learning technologies. This vision paper introduces various types of spatiotemporal big data, discusses new research opportunities in the realm of deep learning applied to spatiotemporal big data, lists the unique challenges, and identifies several future research needs.
Conditional Unscented Autoencoders for Trajectory Prediction
results: The model outperforms the state of the art on the INTERACTION prediction dataset and surpasses the vanilla CVAE baseline on an image modeling task on the CelebA dataset.Abstract
The Conditional Variational Autoencoder (CVAE) is one of the most widely-used models in trajectory prediction for autonomous driving (AD). It captures the interplay between a driving context and its ground-truth future into a probabilistic latent space and uses it to produce predictions. In this paper, we challenge key components of the CVAE. We leverage recent advances in the space of the VAE, the foundation of the CVAE, which show that a simple change in the sampling procedure can greatly benefit performance. We find that unscented sampling, which draws samples from any learned distribution in a deterministic manner, can naturally be better suited to trajectory prediction than potentially dangerous random sampling. We go further and offer additional improvements, including a more structured mixture latent space, as well as a novel, potentially more expressive way to do inference with CVAEs. We show wide applicability of our models by evaluating them on the INTERACTION prediction dataset, outperforming the state of the art, as well as at the task of image modeling on the CelebA dataset, outperforming the baseline vanilla CVAE. Code is available at https://github.com/boschresearch/cuae-prediction.
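Unscented sampling replaces random latent draws with the deterministic sigma points of the learned Gaussian. A minimal sketch for a diagonal-Gaussian latent follows; the scaling convention (kappa) is one common unscented-transform choice, not necessarily the paper's.

```python
import torch

def unscented_sigma_points(mu, log_var, kappa=0.0):
    """Deterministic draws from a diagonal Gaussian latent.

    Returns the 2d+1 sigma points mu and mu +/- sqrt(d + kappa) * sigma_i * e_i,
    which match the mean and covariance of N(mu, diag(exp(log_var))).
    """
    d = mu.shape[-1]
    std = torch.exp(0.5 * log_var)                      # (B, d)
    scale = torch.sqrt(torch.tensor(d + kappa))
    offsets = scale * std.unsqueeze(1) * torch.eye(d)   # (B, d, d)
    return torch.cat([mu.unsqueeze(1),
                      mu.unsqueeze(1) + offsets,
                      mu.unsqueeze(1) - offsets], dim=1)  # (B, 2d+1, d)

mu, log_var = torch.zeros(4, 8), torch.zeros(4, 8)
print(unscented_sigma_points(mu, log_var).shape)        # torch.Size([4, 17, 8])
```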
Towards Few-Annotation Learning for Object Detection: Are Transformer-based Models More Efficient?
results: Evaluated on the semi-supervised object detection benchmarks COCO and Pascal VOC, the method outperforms previous approaches, especially when annotations are scarce, opening new possibilities for adapting similar object detection methods to this setup.Abstract
For specialized and dense downstream tasks such as object detection, labeling data requires expertise and can be very expensive, making few-shot and semi-supervised models much more attractive alternatives. While in the few-shot setup we observe that transformer-based object detectors perform better than convolution-based two-stage models for a similar amount of parameters, they are not as effective when used with recent approaches in the semi-supervised setting. In this paper, we propose a semi-supervised method tailored for the current state-of-the-art object detector Deformable DETR in the few-annotation learning setup using a student-teacher architecture, which avoids relying on a sensitive post-processing of the pseudo-labels generated by the teacher model. We evaluate our method on the semi-supervised object detection benchmarks COCO and Pascal VOC, and it outperforms previous methods, especially when annotations are scarce. We believe that our contributions open new possibilities to adapt similar object detection methods in this setup as well.
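The student-teacher scaffolding common to such semi-supervised detectors can be sketched in a few lines; the paper's actual contribution, avoiding sensitive post-processing of the teacher's pseudo-labels for Deformable DETR, is not reproduced here.

```python
import copy
import torch

def make_teacher(student):
    # Frozen copy of the student; updated only via EMA, never by gradients.
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    # Mean-Teacher update: teacher <- m * teacher + (1 - m) * student.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)
```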
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
methods: The paper studies the convergence and optimization landscape of RP PGMs both theoretically and experimentally, and proposes a spectral normalization method to mitigate the gradient variance caused by long model unrolls.
results: Experiments show that spectral normalization effectively reduces gradient variance and improves RP PGM performance, matching or exceeding other gradient estimators such as the Likelihood Ratio gradient estimator.Abstract
ReParameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics. However, recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes with exploding gradient variance, which leads to slow convergence. This is in contrast to the conventional belief that reparameterization methods have low gradient estimation variance in problems such as training deep generative models. To comprehend this phenomenon, we conduct a theoretical examination of model-based RP PGMs and search for solutions to the optimization difficulties. Specifically, we analyze the convergence of the model-based RP PGMs and pinpoint the smoothness of function approximators as a major factor that affects the quality of gradient estimation. Based on our analysis, we propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls. Our experimental results demonstrate that proper normalization significantly reduces the gradient variance of model-based RP PGMs. As a result, the performance of the proposed method is comparable or superior to other gradient estimators, such as the Likelihood Ratio (LR) gradient estimator. Our code is available at https://github.com/agentification/RP_PGM.
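Applying spectral normalization to a learned dynamics model is a one-line change per layer in PyTorch. The sketch below shows the idea on a toy MLP dynamics model and reparameterized rollout; the architecture and toy reward are assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import spectral_norm

def make_dynamics_model(s_dim, a_dim, hidden=256):
    # Spectral normalization caps each layer's largest singular value,
    # bounding the Lipschitz constant of the learned dynamics and taming
    # the gradient variance that builds up over long reparameterized unrolls.
    return nn.Sequential(
        spectral_norm(nn.Linear(s_dim + a_dim, hidden)), nn.ReLU(),
        spectral_norm(nn.Linear(hidden, hidden)), nn.ReLU(),
        spectral_norm(nn.Linear(hidden, s_dim)),
    )

def rollout_return(model, policy, s0, horizon=10):
    # Reparameterized objective: gradients flow through dynamics and policy.
    s, ret = s0, 0.0
    for _ in range(horizon):
        a = policy(s)
        s = model(torch.cat([s, a], dim=-1))
        ret = ret - (s ** 2).sum(-1).mean()   # toy differentiable reward
    return ret
```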
Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents
paper_authors: Michael Günther, Jackmin Ong, Isabelle Mohr, Alaeddine Abdessalem, Tanguy Abel, Mohammad Kalim Akram, Susana Guzman, Georgios Mastrapas, Saba Sturua, Bo Wang, Maximilian Werk, Nan Wang, Han Xiao
for: The paper is written for researchers and practitioners working on text embedding models, particularly those interested in developing models that can handle long documents.
methods: The paper introduces Jina Embeddings 2, an open-source text embedding model that can accommodate up to 8192 tokens, which is much longer than the conventional 512-token limit. The model uses a novel combination of techniques to achieve state-of-the-art performance on a range of embedding-related tasks.
results: The paper reports that Jina Embeddings 2 achieves performance on par with OpenAI’s proprietary ada-002 model on the MTEB benchmark, and that an extended context can enhance performance in tasks such as NarrativeQA.Abstract
Text embedding models have emerged as powerful tools for transforming sentences into fixed-sized feature vectors that encapsulate semantic information. While these models are essential for tasks like information retrieval, semantic clustering, and text re-ranking, most existing open-source models, especially those built on architectures like BERT, struggle to represent lengthy documents and often resort to truncation. One common approach to mitigate this challenge involves splitting documents into smaller paragraphs for embedding. However, this strategy results in a much larger set of vectors, consequently leading to increased memory consumption and computationally intensive vector searches with elevated latency. To address these challenges, we introduce Jina Embeddings 2, an open-source text embedding model capable of accommodating up to 8192 tokens. This model is designed to transcend the conventional 512-token limit and adeptly process long documents. Jina Embeddings 2 not only achieves state-of-the-art performance on a range of embedding-related tasks in the MTEB benchmark but also matches the performance of OpenAI's proprietary ada-002 model. Additionally, our experiments indicate that an extended context can enhance performance in tasks such as NarrativeQA.
Unmasking Bias and Inequities: A Systematic Review of Bias Detection and Mitigation in Healthcare Artificial Intelligence Using Electronic Health Records
methods: A systematic review following the PRISMA guideline, retrieving 252 articles from PubMed, Web of Science, and IEEE, of which 20 were included in the final review.
results: Five of the six major bias types were covered across the 20 articles: eight analyzed selection bias, six implicit bias, five confounding bias, four measurement bias, and two algorithmic bias. Regarding bias handling, ten articles identified bias during model development, while seventeen presented methods to mitigate it.Abstract
Objectives: Artificial intelligence (AI) applications utilizing electronic health records (EHRs) have gained popularity, but they also introduce various types of bias. This study aims to systematically review the literature that address bias in AI research utilizing EHR data. Methods: A systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guideline. We retrieved articles published between January 1, 2010, and October 31, 2022, from PubMed, Web of Science, and the Institute of Electrical and Electronics Engineers. We defined six major types of bias and summarized the existing approaches in bias handling. Results: Out of the 252 retrieved articles, 20 met the inclusion criteria for the final review. Five of the six bias types were covered in this review: eight studies analyzed selection bias; six implicit bias; five confounding bias; four measurement bias; two algorithmic bias. For bias handling approaches, ten studies identified bias during model development, while seventeen presented methods to mitigate the bias. Discussion: Bias may infiltrate the AI application development process at various stages. Although this review discusses methods for addressing bias at different development stages, there is room for implementing additional effective approaches. Conclusion: Despite growing attention to bias in healthcare AI, research using EHR data on this topic is still limited. Detecting and mitigating AI bias with EHR data continues to pose challenges. Further research is needed to raise a standardized method that is generalizable and interpretable to detect, mitigate and evaluate bias in medical AI.
Interpretable Prototype-based Graph Information Bottleneck
results: PGIB outperforms other state-of-the-art methods in both prediction performance and explainability, as confirmed by qualitative analysis.Abstract
The success of Graph Neural Networks (GNNs) has led to a need for understanding their decision-making process and providing explanations for their predictions, which has given rise to explainable AI (XAI) that offers transparent explanations for black-box models. Recently, the use of prototypes has successfully improved the explainability of models by learning prototypes to imply training graphs that affect the prediction. However, these approaches tend to provide prototypes with excessive information from the entire graph, leading to the exclusion of key substructures or the inclusion of irrelevant substructures, which can limit both the interpretability and the performance of the model in downstream tasks. In this work, we propose a novel framework of explainable GNNs, called interpretable Prototype-based Graph Information Bottleneck (PGIB) that incorporates prototype learning within the information bottleneck framework to provide prototypes with the key subgraph from the input graph that is important for the model prediction. This is the first work that incorporates prototype learning into the process of identifying the key subgraphs that have a critical impact on the prediction performance. Extensive experiments, including qualitative analysis, demonstrate that PGIB outperforms state-of-the-art methods in terms of both prediction performance and explainability.
Herd: Using multiple, smaller LLMs to match the performances of proprietary, large LLMs via an intelligent composer
results: A herd of open-source models can match or exceed the performance of proprietary models, despite being composed of far smaller models. Moreover, when GPT cannot answer a query, the herd identifies a model that can at least 40% of the time.Abstract
Currently, over a thousand LLMs exist that are multi-purpose and are capable of performing real world tasks, including Q&A, text summarization, content generation, etc. However, accessibility, scale and reliability of free models prevents them from being widely deployed in everyday use cases. To address the first two issues of access and scale, organisations such as HuggingFace have created model repositories where users have uploaded model weights and quantized versions of models trained using different paradigms, as well as model cards describing their training process. While some models report performance on commonly used benchmarks, not all do, and interpreting the real world impact of trading off performance on a benchmark for model deployment cost, is unclear. Here, we show that a herd of open source models can match or exceed the performance of proprietary models via an intelligent router. We show that a Herd of open source models is able to match the accuracy of ChatGPT, despite being composed of models that are effectively 2.5x smaller. We show that in cases where GPT is not able to answer the query, Herd is able to identify a model that can, at least 40% of the time.
Exploring Geometry of Blind Spots in Vision Models
methods: The study uses a Level Set Traversal algorithm that explores high-confidence regions of the input space to find inputs that lie in the same equi-confidence level set as a source image while being perceptually similar to images from other classes.
results: Deep networks' equi-confidence level sets exhibit a star-like structure in input space, with high-confidence paths linearly connecting their members; the study also estimates the extent of these connected high-dimensional regions. Code is available at https://github.com/SriramB-98/blindspots-neurips-sub.Abstract
Despite the remarkable success of deep neural networks in a myriad of settings, several works have demonstrated their overwhelming sensitivity to near-imperceptible perturbations, known as adversarial attacks. On the other hand, prior works have also observed that deep networks can be under-sensitive, wherein large-magnitude perturbations in input space do not induce appreciable changes to network activations. In this work, we study in detail the phenomenon of under-sensitivity in vision models such as CNNs and Transformers, and present techniques to study the geometry and extent of "equi-confidence" level sets of such networks. We propose a Level Set Traversal algorithm that iteratively explores regions of high confidence with respect to the input space using orthogonal components of the local gradients. Given a source image, we use this algorithm to identify inputs that lie in the same equi-confidence level set as the source image despite being perceptually similar to arbitrary images from other classes. We further observe that the source image is linearly connected by a high-confidence path to these inputs, uncovering a star-like structure for level sets of deep networks. Furthermore, we attempt to identify and estimate the extent of these connected higher-dimensional regions over which the model maintains a high degree of confidence. The code for this project is publicly available at https://github.com/SriramB-98/blindspots-neurips-sub
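The core of Level Set Traversal can be sketched as follows: repeatedly step toward a target image, keeping only the component of the step orthogonal to the local gradient of the source-class confidence. This simplified version omits the paper's corrective details (for example, small steps that restore the confidence exactly); input shapes and hyperparameters are assumptions.

```python
import torch

def level_set_traversal(model, x_src, x_tgt, cls, steps=200, eta=0.01):
    """Walk from x_src toward x_tgt while staying near the source class's
    equi-confidence level set. x_src, x_tgt: (1, C, H, W) tensors in [0, 1];
    model returns class logits of shape (1, num_classes).
    """
    x = x_src.clone()
    for _ in range(steps):
        x.requires_grad_(True)
        conf = model(x)[0, cls]
        (g,) = torch.autograd.grad(conf, x)
        d = (x_tgt - x.detach()).flatten()
        g = g.flatten()
        # Remove the component of the step along the confidence gradient.
        d_orth = d - (d @ g) / (g @ g + 1e-12) * g
        x = (x.detach() + eta * d_orth.view_as(x)).clamp(0, 1)
    return x
```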
DEFT: Dexterous Fine-Tuning for Real-World Hand Policies
methods: The study proposes DEFT (DExterous Fine-Tuning for Hand Policies), which leverages human-driven priors executed directly in the real world and improves upon them via an efficient online optimization procedure.
results: DEFT succeeds across various tasks, establishing a robust, data-efficient pathway toward general dexterous manipulation. Video results are available at https://dexterous-finetuning.github.io.Abstract
Dexterity is often seen as a cornerstone of complex manipulation. Humans are able to perform a host of skills with their hands, from making food to operating tools. In this paper, we investigate these challenges, especially in the case of soft, deformable objects as well as complex, relatively long-horizon tasks. However, learning such behaviors from scratch can be data inefficient. To circumvent this, we propose a novel approach, DEFT (DExterous Fine-Tuning for Hand Policies), that leverages human-driven priors, which are executed directly in the real world. In order to improve upon these priors, DEFT involves an efficient online optimization procedure. With the integration of human-based learning and online fine-tuning, coupled with a soft robotic hand, DEFT demonstrates success across various tasks, establishing a robust, data-efficient pathway toward general dexterous manipulation. Please see our website at https://dexterous-finetuning.github.io for video results.
Re-evaluating Retrosynthesis Algorithms with Syntheseus
results: Re-evaluating previous retrosynthesis algorithms with the syntheseus library shows that the ranking of state-of-the-art models changes when evaluated carefully.Abstract
The planning of how to synthesize molecules, also known as retrosynthesis, has been a growing focus of the machine learning and chemistry communities in recent years. Despite the appearance of steady progress, we argue that imperfect benchmarks and inconsistent comparisons mask systematic shortcomings of existing techniques. To remedy this, we present a benchmarking library called syntheseus which promotes best practice by default, enabling consistent meaningful evaluation of single-step and multi-step retrosynthesis algorithms. We use syntheseus to re-evaluate a number of previous retrosynthesis algorithms, and find that the ranking of state-of-the-art models changes when evaluated carefully. We end with guidance for future works in this area.
Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
results: The method enables more efficient parameter tuning and flexible combination of various tuning strategies. Experiments show its superior efficacy and efficiency on both discriminative and generative tasks.Abstract
Parameter-efficient tuning has become a trend in transferring large-scale foundation models to downstream applications. Existing methods typically embed some light-weight tuners into the backbone, where both the design and the learning of the tuners are highly dependent on the base model. This work offers a new tuning paradigm, dubbed Res-Tuning, which intentionally unbinds tuners from the backbone. With both theoretical and empirical evidence, we show that popular tuning approaches have their equivalent counterparts under our unbinding formulation, and hence can be integrated into our framework effortlessly. Thanks to the structural disentanglement, we manage to free the design of tuners from the network architecture, facilitating flexible combination of various tuning strategies. We further propose a memory-efficient variant of Res-Tuning, where the bypass (i.e., the branch formed by a sequence of tuners) is effectively detached from the main branch, such that the gradients are back-propagated only to the tuners but not to the backbone. Such a detachment also allows one-time backbone forward for multi-task inference. Extensive experiments on both discriminative and generative tasks demonstrate the superiority of our method over existing alternatives from the perspectives of efficacy and efficiency. Project page: https://res-tuning.github.io/.
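A rough sketch of the unbinding idea, under the assumption of a simple bottleneck tuner and a strictly sequential backbone: the frozen backbone runs under `no_grad`, and a parallel chain of tuners consumes its detached features, so backpropagation touches only the tuners. The real Res-Tuning bypass topology is richer than this.

```python
import torch
import torch.nn as nn

class ResTuner(nn.Module):
    # A light bottleneck tuner that lives outside the backbone blocks.
    def __init__(self, dim, r=8):
        super().__init__()
        self.down = nn.Linear(dim, r)
        self.up = nn.Linear(r, dim)

    def forward(self, x):
        return self.up(torch.relu(self.down(x)))

def forward_with_bypass(blocks, tuners, x):
    """Detached bypass: gradients reach only the tuners, never the backbone."""
    feats = []
    with torch.no_grad():                 # frozen backbone forward pass
        h = x
        for blk in blocks:
            h = blk(h)
            feats.append(h)
    b = torch.zeros_like(feats[-1])
    for f, tuner in zip(feats, tuners):
        b = b + tuner(f)                  # bypass aggregates tuned features
    return feats[-1] + b                  # frozen output plus tunable bypass
```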
SimMMDG: A Simple and Effective Framework for Multi-modal Domain Generalization
paper_authors: Hao Dong, Ismail Nejjar, Han Sun, Eleni Chatzi, Olga Fink
for: This study addresses the challenge of multi-modal domain generalization (DG), where models must generalize to unseen target distributions across different modalities.
methods: We propose SimMMDG, a simple yet effective multi-modal DG framework. Arguing that mapping features from different modalities into the same embedding space impedes generalization, we split the features within each modality into modality-specific and modality-shared components, apply supervised contrastive learning to the modality-shared features to preserve joint properties, impose distance constraints on the modality-specific features to promote diversity, and add a cross-modal translation module to regularize the learned features.
results: The framework is theoretically well-supported and achieves strong multi-modal DG performance on the EPIC-Kitchens dataset and the new Human-Animal-Cartoon (HAC) dataset introduced in this paper. The source code and the HAC dataset are available at https://github.com/donghao51/SimMMDG.Abstract
In real-world scenarios, achieving domain generalization (DG) presents significant challenges as models are required to generalize to unknown target distributions. Generalizing to unseen multi-modal distributions poses even greater difficulties due to the distinct properties exhibited by different modalities. To overcome the challenges of achieving domain generalization in multi-modal scenarios, we propose SimMMDG, a simple yet effective multi-modal DG framework. We argue that mapping features from different modalities into the same embedding space impedes model generalization. To address this, we propose splitting the features within each modality into modality-specific and modality-shared components. We employ supervised contrastive learning on the modality-shared features to ensure they possess joint properties and impose distance constraints on modality-specific features to promote diversity. In addition, we introduce a cross-modal translation module to regularize the learned features, which can also be used for missing-modality generalization. We demonstrate that our framework is theoretically well-supported and achieves strong performance in multi-modal DG on the EPIC-Kitchens dataset and the novel Human-Animal-Cartoon (HAC) dataset introduced in this paper. Our source code and HAC dataset are available at https://github.com/donghao51/SimMMDG.
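The two feature-level losses can be sketched directly: a supervised contrastive term over the modality-shared split and a margin term keeping the modality-specific splits apart. The channel split, temperature, and margin below are illustrative defaults, not the paper's values.

```python
import torch
import torch.nn.functional as F

def split_features(z, shared_dim):
    # First `shared_dim` channels: modality-shared; remainder: modality-specific.
    return z[:, :shared_dim], z[:, shared_dim:]

def supcon_loss(shared, labels, tau=0.1):
    # Supervised contrastive loss pulling same-class shared features together.
    f = F.normalize(shared, dim=1)
    n = f.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=f.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    logits = (f @ f.t() / tau).masked_fill(self_mask, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(self_mask, 0.0)   # avoid 0 * -inf
    pos = (pos_mask.float() * log_prob).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -pos.mean()

def specific_distance_loss(spec_a, spec_b, margin=1.0):
    # Keep modality-specific parts of two modalities at least `margin` apart.
    return F.relu(margin - (spec_a - spec_b).norm(dim=1)).mean()
```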
LILO: Learning Interpretable Libraries by Compressing and Documenting Code
results: Evaluated on three inductive program synthesis benchmarks against existing neural and symbolic methods, LILO solves more complex tasks and learns richer libraries grounded in linguistic knowledge.Abstract
While large language models (LLMs) now excel at code generation, a key aspect of software development is the art of refactoring: consolidating code into libraries of reusable and readable programs. In this paper, we introduce LILO, a neurosymbolic framework that iteratively synthesizes, compresses, and documents code to build libraries tailored to particular problem domains. LILO combines LLM-guided program synthesis with recent algorithmic advances in automated refactoring from Stitch: a symbolic compression system that efficiently identifies optimal lambda abstractions across large code corpora. To make these abstractions interpretable, we introduce an auto-documentation (AutoDoc) procedure that infers natural language names and docstrings based on contextual examples of usage. In addition to improving human readability, we find that AutoDoc boosts performance by helping LILO's synthesizer to interpret and deploy learned abstractions. We evaluate LILO on three inductive program synthesis benchmarks for string editing, scene reasoning, and graphics composition. Compared to existing neural and symbolic methods - including the state-of-the-art library learning algorithm DreamCoder - LILO solves more complex tasks and learns richer libraries that are grounded in linguistic knowledge.
From External to Swap Regret 2.0: An Efficient Reduction and Oblivious Adversary for Large Action Spaces
results: Whenever a no-external-regret algorithm exists for a hypothesis class, a no-swap-regret algorithm must also exist for that class, with guarantees that improve on the classical reductions. The paper also provides a new lower bound showing that the number of rounds must be $\tilde\Omega(N/\epsilon^2)$ or exponential in $1/\epsilon$.Abstract
We provide a novel reduction from swap-regret minimization to external-regret minimization, which improves upon the classical reductions of Blum-Mansour [BM07] and Stolz-Lugosi [SL05] in that it does not require finiteness of the space of actions. We show that, whenever there exists a no-external-regret algorithm for some hypothesis class, there must also exist a no-swap-regret algorithm for that same class. For the problem of learning with expert advice, our result implies that it is possible to guarantee that the swap regret is bounded by {\epsilon} after $\log(N)^{O(1/\epsilon)}$ rounds and with $O(N)$ per iteration complexity, where $N$ is the number of experts, while the classical reductions of Blum-Mansour and Stolz-Lugosi require $O(N/\epsilon^2)$ rounds and at least $\Omega(N^2)$ per iteration complexity. Our result comes with an associated lower bound, which -- in contrast to that in [BM07] -- holds for oblivious and $\ell_1$-constrained adversaries and learners that can employ distributions over experts, showing that the number of rounds must be $\tilde\Omega(N/\epsilon^2)$ or exponential in $1/\epsilon$. Our reduction implies that, if no-regret learning is possible in some game, then this game must have approximate correlated equilibria, of arbitrarily good approximation. This strengthens the folklore implication of no-regret learning that approximate coarse correlated equilibria exist. Importantly, it provides a sufficient condition for the existence of correlated equilibrium which vastly extends the requirement that the action set is finite, thus answering a question left open by [DG22; Ass+23]. Moreover, it answers several outstanding questions about equilibrium computation and/or learning in games.
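For reference, the two regret notions being related, stated in their simplest action-valued form for utilities $u_t$ over $N$ actions, with $a_t$ the learner's action at round $t$:

```latex
\mathrm{Reg}_{\mathrm{ext}}(T) = \max_{j \in [N]} \sum_{t=1}^{T} \bigl( u_t(j) - u_t(a_t) \bigr),
\qquad
\mathrm{Reg}_{\mathrm{swap}}(T) = \max_{\phi : [N] \to [N]} \sum_{t=1}^{T} \bigl( u_t(\phi(a_t)) - u_t(a_t) \bigr).
```

Swap regret allows a separate deviation for each action, which is why no-swap-regret dynamics yield correlated equilibria while no-external-regret dynamics yield only coarse correlated equilibria.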
CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
methods: The method combines three key techniques: 1) 3D novel view synthesis; 2) object customization; and 3) location and background control via textual descriptions or user-defined images.
results: The method achieves zero-shot object customization without test-time optimization, with enhanced identity preservation and diverse outputs.Abstract
Incorporating a customized object into image generation presents an attractive feature in text-to-image generation. However, existing optimization-based and encoder-based methods are hindered by drawbacks such as time-consuming optimization, insufficient identity preservation, and a prevalent copy-pasting effect. To overcome these limitations, we introduce CustomNet, a novel object customization approach that explicitly incorporates 3D novel view synthesis capabilities into the object customization process. This integration facilitates the adjustment of spatial position relationships and viewpoints, yielding diverse outputs while effectively preserving object identity. Moreover, we introduce delicate designs to enable location control and flexible background control through textual descriptions or specific user-defined images, overcoming the limitations of existing 3D novel view synthesis methods. We further leverage a dataset construction pipeline that can better handle real-world objects and complex backgrounds. Equipped with these designs, our method facilitates zero-shot object customization without test-time optimization, offering simultaneous control over the viewpoints, location, and background. As a result, our CustomNet ensures enhanced identity preservation and generates diverse, harmonious outputs.
Designing AI Support for Human Involvement in AI-assisted Decision Making: A Taxonomy of Human-AI Interactions from a Systematic Review
paper_authors: Catalina Gomez, Sue Min Cho, Shichang Ke, Chien-Ming Huang, Mathias Unberath
for: To improve how AI supports human involvement in AI-assisted decision making by better characterizing human-AI interaction.
methods: A systematic review of the AI-assisted decision making literature, analyzing 105 selected articles and introducing a taxonomy of interaction patterns that delineates modes of human-AI interactivity.
results: Current interactions are dominated by simplistic collaboration paradigms, with comparatively little support for truly interactive functionality. The taxonomy helps characterize how interactivity with AI is currently supported in decision-making contexts and fosters deliberate choices of interaction designs.Abstract
Efforts in leveraging Artificial Intelligence (AI) in decision support systems have disproportionately focused on technological advancements, often overlooking the alignment between algorithmic outputs and human expectations. To address this, explainable AI promotes AI development from a more human-centered perspective. Determining what information AI should provide to aid humans is vital, however, how the information is presented, e. g., the sequence of recommendations and the solicitation of interpretations, is equally crucial. This motivates the need to more precisely study Human-AI interaction as a pivotal component of AI-based decision support. While several empirical studies have evaluated Human-AI interactions in multiple application domains in which interactions can take many forms, there is not yet a common vocabulary to describe human-AI interaction protocols. To address this gap, we describe the results of a systematic review of the AI-assisted decision making literature, analyzing 105 selected articles, which grounds the introduction of a taxonomy of interaction patterns that delineate various modes of human-AI interactivity. We find that current interactions are dominated by simplistic collaboration paradigms and report comparatively little support for truly interactive functionality. Our taxonomy serves as a valuable tool to understand how interactivity with AI is currently supported in decision-making contexts and foster deliberate choices of interaction designs.
Learn to Categorize or Categorize to Learn? Self-Coding for Generalized Category Discovery
paper_authors: Sarah Rastegar, Hazel Doughty, Cees G. M. Snoek
for: Proposes a new method for discovering unknown categories at test time.
methods: Framed as an optimization problem, the method assigns minimum-length category codes to individual data instances, giving control over category granularity.
results: Experiments demonstrate the method's effectiveness at handling unknown categories at test time, with comparisons against state-of-the-art benchmarks.Abstract
In the quest for unveiling novel categories at test time, we confront the inherent limitations of traditional supervised recognition models that are restricted by a predefined category set. While strides have been made in the realms of self-supervised and open-world learning towards test-time category discovery, a crucial yet often overlooked question persists: what exactly delineates a \textit{category}? In this paper, we conceptualize a \textit{category} through the lens of optimization, viewing it as an optimal solution to a well-defined problem. Harnessing this unique conceptualization, we propose a novel, efficient and self-supervised method capable of discovering previously unknown categories at test time. A salient feature of our approach is the assignment of minimum length category codes to individual data instances, which encapsulates the implicit category hierarchy prevalent in real-world datasets. This mechanism affords us enhanced control over category granularity, thereby equipping our model to handle fine-grained categories adeptly. Experimental evaluations, bolstered by state-of-the-art benchmark comparisons, testify to the efficacy of our solution in managing unknown categories at test time. Furthermore, we fortify our proposition with a theoretical foundation, providing proof of its optimality. Our code is available at: \url{https://github.com/SarahRastegar/InfoSieve}.
Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions
paper_authors: Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, Simone Stumpf
results: Presents a manifesto of 27 open problems, grouped into nine categories, so that researchers from different fields can jointly tackle the challenges facing XAI.
Abstract
As systems based on opaque Artificial Intelligence (AI) continue to flourish in diverse real-world applications, understanding these black box models has become paramount. In response, Explainable AI (XAI) has emerged as a field of research with practical and ethical benefits across various domains. This paper not only highlights the advancements in XAI and its application in real-world scenarios but also addresses the ongoing challenges within XAI, emphasizing the need for broader perspectives and collaborative efforts. We bring together experts from diverse fields to identify open problems, striving to synchronize research agendas and accelerate XAI in practical applications. By fostering collaborative discussion and interdisciplinary cooperation, we aim to propel XAI forward, contributing to its continued success. Our goal is to put forward a comprehensive proposal for advancing XAI. To achieve this goal, we present a manifesto of 27 open problems categorized into nine categories. These challenges encapsulate the complexities and nuances of XAI and offer a road map for future research. For each problem, we provide promising research directions in the hope of harnessing the collective intelligence of interested stakeholders.
results: The study shows that the message passing approach is comparable with or outperforms both classical PDE solvers and the Fourier Neural Operator in generalization capability and performance, and handles some complex PDE problems better.
Abstract
Recent developments in the field of neural partial differential equation (PDE) solvers have placed a strong emphasis on neural operators. However, the paper "Message Passing Neural PDE Solver" by Brandstetter et al., published at ICLR 2022, revisits autoregressive models and designs a message passing graph neural network that is comparable with or outperforms both the state-of-the-art Fourier Neural Operator and classical PDE solvers in its generalization capabilities and performance. This blog post delves into the key contributions of this work, exploring the strategies used to address the common problem of instability in autoregressive models and the design choices of the message passing graph neural network architecture.
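To make the core mechanism concrete, here is a minimal sketch of one autoregressive message-passing step for a 1D PDE on a grid graph. The layer sizes, the use of relative positions as edge features, the history length, and the residual update are illustrative assumptions rather than the paper's exact architecture.

import torch
import torch.nn as nn

class MPStep(nn.Module):
    def __init__(self, hist=4, hidden=64):
        super().__init__()
        # edge message from the two endpoint histories and their relative position
        self.msg = nn.Sequential(nn.Linear(2 * hist + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        # node update from own history plus aggregated incoming messages
        self.upd = nn.Sequential(nn.Linear(hist + hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, u, x, edges):
        # u: (N, hist) recent solution values per grid node; x: (N,) node positions
        # edges: (E, 2) (src, dst) index pairs connecting neighboring nodes
        src, dst = edges[:, 0], edges[:, 1]
        rel = (x[src] - x[dst]).unsqueeze(-1)
        m = self.msg(torch.cat([u[src], u[dst], rel], dim=-1))
        agg = torch.zeros(u.size(0), m.size(-1)).index_add_(0, dst, m)
        # residual update of the latest value helps keep the rollout stable
        return u[:, -1:] + self.upd(torch.cat([u, agg], dim=-1))

N, hist = 32, 4
x = torch.linspace(0, 1, N)
u = torch.randn(N, hist)
idx = torch.arange(N - 1)
edges = torch.cat([torch.stack([idx, idx + 1], dim=1),
                   torch.stack([idx + 1, idx], dim=1)])  # bidirectional chain
u_next = MPStep(hist)(u, x, edges)  # (N, 1): next-step solution per node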
Adversarial Attacks and Defenses in Large Language Models: Old and New Threats
methods: Uses new means of assessing robustness, including embedding space attacks and LLM-specific best practices.
results: Finds that, without LLM-specific best practices in place, it is easy to overestimate the robustness of a new approach; embedding space attacks also emerge as a viable threat model for generating malicious content in open-source models.
Abstract
Over the past decade, there has been extensive research aimed at enhancing the robustness of neural networks, yet this problem remains vastly unsolved. Here, one major impediment has been the overestimation of the robustness of new defense approaches due to faulty defense evaluations. Flawed robustness evaluations necessitate rectifications in subsequent works, dangerously slowing down the research and providing a false sense of security. In this context, we will face substantial challenges associated with an impending adversarial arms race in natural language processing, specifically with closed-source Large Language Models (LLMs), such as ChatGPT, Google Bard, or Anthropic's Claude. We provide a first set of prerequisites to improve the robustness assessment of new approaches and reduce the amount of faulty evaluations. Additionally, we identify embedding space attacks on LLMs as another viable threat model for the purposes of generating malicious content in open-sourced models. Finally, we demonstrate on a recently proposed defense that, without LLM-specific best practices in place, it is easy to overestimate the robustness of a new approach.
Evaluating Large Language Models: A Comprehensive Survey
for: Evaluating the capabilities and safety of large language models (LLMs).
methods: Categorizes LLM evaluation into three groups: knowledge and capability evaluation, alignment evaluation, and safety evaluation.
results: Provides a comprehensive review of evaluation methodologies and benchmarks across these three aspects, along with a compendium of evaluations in specialized domains.
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities across a broad spectrum of tasks. They have attracted significant attention and been deployed in numerous downstream applications. Nevertheless, akin to a double-edged sword, LLMs also present potential risks. They could suffer from private data leaks or yield inappropriate, harmful, or misleading content. Additionally, the rapid progress of LLMs raises concerns about the potential emergence of superintelligent systems without adequate safeguards. To effectively capitalize on LLM capacities as well as ensure their safe and beneficial development, it is critical to conduct a rigorous and comprehensive evaluation of LLMs. This survey endeavors to offer a panoramic perspective on the evaluation of LLMs. We categorize the evaluation of LLMs into three major groups: knowledge and capability evaluation, alignment evaluation and safety evaluation. In addition to the comprehensive review on the evaluation methodologies and benchmarks on these three aspects, we collate a compendium of evaluations pertaining to LLMs' performance in specialized domains, and discuss the construction of comprehensive evaluation platforms that cover LLM evaluations on capabilities, alignment, safety, and applicability. We hope that this comprehensive overview will stimulate further research interests in the evaluation of LLMs, with the ultimate goal of making evaluation serve as a cornerstone in guiding the responsible development of LLMs. We envision that this will channel their evolution into a direction that maximizes societal benefit while minimizing potential risks. A curated list of related papers has been publicly available at https://github.com/tjunlp-lab/Awesome-LLMs-Evaluation-Papers.
paper_authors: Ali Hatamizadeh, Michael Ranzinger, Jan Kautz
for: Proposes a new class of computer vision models that achieve fast inference and parallel training.
methods: Extends the Transformer with dual parallel and recurrent formulations to enable efficient inference and training.
results: Extensive experiments across multiple datasets and image resolutions demonstrate competitive performance.
Abstract
Vision Transformers (ViTs) have attracted a lot of popularity in recent years, due to their exceptional capabilities in modeling long-range spatial dependencies and their scalability for large-scale training. Although the training parallelism of the self-attention mechanism plays an important role in retaining great performance, its quadratic complexity hinders the application of ViTs in many scenarios that demand fast inference. This effect is even more pronounced in applications in which autoregressive modeling of input features is required. In Natural Language Processing (NLP), a new stream of efforts has proposed parallelizable models with recurrent formulations that allow for efficient inference in generative applications. Inspired by this trend, we propose a new class of computer vision models, dubbed Vision Retention Networks (ViR), with dual parallel and recurrent formulations, which strike an optimal balance between fast inference and parallel training with competitive performance. In particular, ViR scales favorably in image throughput and memory consumption for tasks that require higher-resolution images due to its flexible formulation in processing large sequence lengths. ViR is the first attempt to realize dual parallel and recurrent equivalency in a general vision backbone for recognition tasks. We have validated the effectiveness of ViR through extensive experiments with different dataset sizes and various image resolutions and achieved competitive performance. Our code and pretrained models will be made publicly available.
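The key property behind such a dual formulation is that a retention-style layer computes the same outputs in two ways: a parallel form suited to training and a recurrent form with O(1) state per token suited to fast inference. The sketch below verifies this equivalence for a single head with scalar decay gamma; it illustrates the generic retention mechanism under these simplifying assumptions, not ViR's exact layer.

# Both forms compute o_n = sum_{m<=n} gamma^(n-m) (q_n . k_m) v_m
import numpy as np

rng = np.random.default_rng(0)
T, d, gamma = 6, 8, 0.9
Q, K, V = rng.normal(size=(3, T, d))

# parallel form: decay-masked attention-like matrix, computed in one shot
n, m = np.arange(T)[:, None], np.arange(T)[None, :]
D = np.where(n >= m, gamma ** (n - m), 0.0)
O_par = (Q @ K.T * D) @ V

# recurrent form: a single (d, d) state updated per step, O(1) per token
S = np.zeros((d, d))
O_rec = np.zeros((T, d))
for t in range(T):
    S = gamma * S + np.outer(K[t], V[t])
    O_rec[t] = Q[t] @ S

assert np.allclose(O_par, O_rec)  # the two formulations agree exactly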
Generating Medical Instructions with Conditional Transformer
paper_authors: Samuel Belkadi, Nicolo Micheletti, Lifeng Han, Warren Del-Pinto, Goran Nenadic
for: The paper introduces a novel task-specific model architecture, Label-To-Text-Transformer (LT3), which generates synthetic medical instructions based on provided labels.
methods: The LT3 model is trained on a vast corpus of medical instructions extracted from the MIMIC-III database and uses a task-specific transformer architecture to generate synthetic medical instructions.
results: The paper evaluates LT3 against a state-of-the-art Pre-trained Language Model (PLM), T5, and shows that LT3 generates high-quality and diverse synthetic medical instructions. The generated synthetic data is used to train a SpacyNER model for Named Entity Recognition (NER) on the n2c2-2018 dataset, where the model trained on synthetic data achieves a 96-98% F1 score at label recognition on Drug, Frequency, Route, Strength, and Form.
Abstract
Access to real-world medical instructions is essential for medical research and healthcare quality improvement. However, access to real medical instructions is often limited due to the sensitive nature of the information expressed. Additionally, manually labelling these instructions for training and fine-tuning Natural Language Processing (NLP) models can be tedious and expensive. We introduce a novel task-specific model architecture, Label-To-Text-Transformer (\textbf{LT3}), tailored to generate synthetic medical instructions based on provided labels, such as a vocabulary list of medications and their attributes. LT3 is trained on a vast corpus of medical instructions extracted from the MIMIC-III database, allowing the model to produce valuable synthetic medical instructions. We evaluate LT3's performance by contrasting it with a state-of-the-art Pre-trained Language Model (PLM), T5, analysing the quality and diversity of generated texts. We deploy the generated synthetic data to train the SpacyNER model for the Named Entity Recognition (NER) task over the n2c2-2018 dataset. The experiments show that the model trained on synthetic data can achieve a 96-98\% F1 score at Label Recognition on Drug, Frequency, Route, Strength, and Form. LT3 codes and data will be shared at \url{https://github.com/HECTA-UoM/Label-To-Text-Transformer}
paper_authors: Lesia Semenova, Harry Chen, Ronald Parr, Cynthia Rudin
for: Investigates why the Rashomon ratio, the fraction of models in a hypothesis space that perform approximately equally well, is often large, particularly on noisy datasets.
methods: Proposes a mechanism of the data generation process, coupled with choices the analyst makes during learning, that determines the size of the Rashomon ratio, and introduces a measure called pattern diversity that captures the average difference in predictions between distinct classification patterns in the Rashomon set.
results: Finds that noisier datasets lead to larger Rashomon ratios and that pattern diversity tends to increase with label noise, which explains why simpler models often match the accuracy of black-box models on complex, noisy datasets.
Abstract
The Rashomon set is the set of models that perform approximately equally well on a given dataset, and the Rashomon ratio is the fraction of all models in a given hypothesis space that are in the Rashomon set. Rashomon ratios are often large for tabular datasets in criminal justice, healthcare, lending, education, and in other areas, which has practical implications about whether simpler models can attain the same level of accuracy as more complex models. An open question is why Rashomon ratios often tend to be large. In this work, we propose and study a mechanism of the data generation process, coupled with choices usually made by the analyst during the learning process, that determines the size of the Rashomon ratio. Specifically, we demonstrate that noisier datasets lead to larger Rashomon ratios through the way that practitioners train models. Additionally, we introduce a measure called pattern diversity, which captures the average difference in predictions between distinct classification patterns in the Rashomon set, and motivate why it tends to increase with label noise. Our results explain a key aspect of why simpler models often tend to perform as well as black box models on complex, noisier datasets.
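As an illustration of the two quantities, the sketch below estimates the Rashomon ratio and pattern diversity empirically over a finite hypothesis class of decision stumps; the class, the loss, and the estimation procedure are illustrative assumptions, with pattern diversity implemented as the stated average difference in predictions between distinct classification patterns in the Rashomon set.

import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)  # noisy labels

# hypothesis class: stumps "predict 1 iff X[:, j] > t" over a grid of (j, t)
stumps = [(j, t) for j in range(X.shape[1]) for t in np.linspace(-2, 2, 41)]
preds = np.array([(X[:, j] > t).astype(int) for j, t in stumps])
losses = (preds != y).mean(axis=1)

eps = 0.02
in_set = losses <= losses.min() + eps   # membership in the Rashomon set
ratio = in_set.mean()                   # empirical Rashomon ratio

# pattern diversity: mean pairwise disagreement between the distinct
# classification patterns realized inside the Rashomon set
patterns = np.unique(preds[in_set], axis=0)
pairs = list(combinations(range(len(patterns)), 2))
diversity = (np.mean([(patterns[a] != patterns[b]).mean() for a, b in pairs])
             if pairs else 0.0)
print(f"Rashomon ratio: {ratio:.3f}, pattern diversity: {diversity:.3f}")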
results: Provides a concise overview of recent research on knowledge editing for neural networks, grouping the relevant methods and datasets into four families: regularization techniques, meta-learning, direct model editing, and architectural strategies.
Abstract
Deep neural networks are becoming increasingly pervasive in academia and industry, matching and surpassing human performance on a wide variety of fields and related tasks. However, just as humans, even the largest artificial neural networks make mistakes, and once-correct predictions can become invalid as the world progresses in time. Augmenting datasets with samples that account for mistakes or up-to-date information has become a common workaround in practical applications. However, the well-known phenomenon of catastrophic forgetting poses a challenge in achieving precise changes in the implicitly memorized knowledge of neural network parameters, often requiring a full model re-training to achieve desired behaviors. That is expensive, unreliable, and incompatible with the current trend of large self-supervised pre-training, making it necessary to find more efficient and effective methods for adapting neural network models to changing data. To address this need, knowledge editing is emerging as a novel area of research that aims to enable reliable, data-efficient, and fast changes to a pre-trained target model, without affecting model behaviors on previously learned tasks. In this survey, we provide a brief review of this recent artificial intelligence field of research. We first introduce the problem of editing neural networks, formalize it in a common framework and differentiate it from more notorious branches of research such as continuous learning. Next, we provide a review of the most relevant knowledge editing approaches and datasets proposed so far, grouping works under four different families: regularization techniques, meta-learning, direct model editing, and architectural strategies. Finally, we outline some intersections with other fields of research and potential directions for future works.
Causal Context Connects Counterfactual Fairness to Robust Prediction and Group Fairness
for: This paper focuses on the problem of fairness in machine learning, specifically addressing the concept of counterfactual fairness and its relationship to other fairness metrics.
methods: The authors use a causal context to bridge the gap between counterfactual fairness, robust prediction, and group fairness. They develop a correspondence between the causal graph of the data-generating process and which, if any, group fairness metrics are equivalent to counterfactual fairness.
results: The authors show that in three common fairness contexts (measurement error, selection on label, and selection on predictors), counterfactual fairness is equivalent to demographic parity, equalized odds, and calibration, respectively. Additionally, they demonstrate that counterfactual fairness can sometimes be tested by measuring relatively simple group fairness metrics.
Abstract
Counterfactual fairness requires that a person would have been classified in the same way by an AI or other algorithmic system if they had a different protected class, such as a different race or gender. This is an intuitive standard, as reflected in the U.S. legal system, but its use is limited because counterfactuals cannot be directly observed in real-world data. On the other hand, group fairness metrics (e.g., demographic parity or equalized odds) are less intuitive but more readily observed. In this paper, we use $\textit{causal context}$ to bridge the gaps between counterfactual fairness, robust prediction, and group fairness. First, we motivate counterfactual fairness by showing that there is not necessarily a fundamental trade-off between fairness and accuracy because, under plausible conditions, the counterfactually fair predictor is in fact accuracy-optimal in an unbiased target distribution. Second, we develop a correspondence between the causal graph of the data-generating process and which, if any, group fairness metrics are equivalent to counterfactual fairness. Third, we show that in three common fairness contexts (measurement error, selection on label, and selection on predictors), counterfactual fairness is equivalent to demographic parity, equalized odds, and calibration, respectively. Counterfactual fairness can sometimes be tested by measuring relatively simple group fairness metrics.
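For reference, the three group fairness metrics the paper connects to counterfactual fairness can be computed as simple gaps between two groups; a minimal sketch of the standard definitions:

import numpy as np

def demographic_parity_gap(yhat, a):
    # |P(Yhat=1 | A=0) - P(Yhat=1 | A=1)|
    return abs(yhat[a == 0].mean() - yhat[a == 1].mean())

def equalized_odds_gap(yhat, y, a):
    # max over y in {0,1} of |P(Yhat=1 | Y=y, A=0) - P(Yhat=1 | Y=y, A=1)|
    return max(abs(yhat[(y == v) & (a == 0)].mean() -
                   yhat[(y == v) & (a == 1)].mean()) for v in (0, 1))

def calibration_gap(score, y, a, bins=10):
    # compare P(Y=1 | score bin, A) across groups, averaged over populated bins
    edges = np.linspace(0, 1, bins + 1)
    gaps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (score >= lo) & (score < hi)
        g0, g1 = m & (a == 0), m & (a == 1)
        if g0.any() and g1.any():
            gaps.append(abs(y[g0].mean() - y[g1].mean()))
    return float(np.mean(gaps)) if gaps else 0.0

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 1000)                  # protected attribute
score = rng.uniform(size=1000)                # model scores
y = (rng.uniform(size=1000) < score).astype(int)
yhat = (score > 0.5).astype(int)
print(demographic_parity_gap(yhat, a), equalized_odds_gap(yhat, y, a),
      calibration_gap(score, y, a))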
Can input reconstruction be used to directly estimate uncertainty of a regression U-Net model? – Application to proton therapy dose prediction for head and neck cancer patients
results: On proton therapy dose prediction for head and neck cancer patients, the method achieves a higher Pearson correlation coefficient with the prediction error (0.620) than MCDO and DE, and allows easier detection of OOD data (Z-score of 34.05).
Abstract
Estimating the uncertainty of deep learning models in a reliable and efficient way remains an open problem, and many different solutions have been proposed in the literature. The most common methods are based on Bayesian approximations, like Monte Carlo dropout (MCDO) or Deep ensembling (DE), but they have a high inference time (i.e., they require multiple inference passes) and might not work for out-of-distribution (OOD) detection (i.e., they assign similar uncertainty to in-distribution (ID) and OOD data). In safety-critical environments, like medical applications, accurate and fast uncertainty estimation methods that are able to detect OOD data are crucial, since wrong predictions can jeopardize patient safety. In this study, we present an alternative direct uncertainty estimation method and apply it to a regression U-Net architecture. The method consists of adding a branch from the bottleneck which reconstructs the input; the input reconstruction error can then be used as a surrogate of the model uncertainty. As a proof of concept, our method is applied to proton therapy dose prediction in head and neck cancer patients. Accuracy, time gain, and OOD detection are analyzed for our method in this particular application and compared with the popular MCDO and DE. The input reconstruction method showed a higher Pearson correlation coefficient with the prediction error (0.620) than DE and MCDO (between 0.447 and 0.612). Moreover, our method allows easier identification of OOD data (Z-score of 34.05). It estimates the uncertainty simultaneously with the regression task and therefore requires less time and fewer computational resources.
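A pared-down sketch of the idea: an encoder-decoder regressor with an extra reconstruction branch from the bottleneck, whose reconstruction error serves as the uncertainty surrogate. Channel sizes, depth, and the input modality label are illustrative; the paper uses a full U-Net.

import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

class RegNetWithRecon(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(block(1, 16), nn.MaxPool2d(2), block(16, 32))
        self.dose_head = nn.Sequential(              # regression decoder
            nn.Upsample(scale_factor=2), block(32, 16), nn.Conv2d(16, 1, 1))
        self.recon_head = nn.Sequential(             # extra branch from the bottleneck
            nn.Upsample(scale_factor=2), block(32, 16), nn.Conv2d(16, 1, 1))

    def forward(self, x):
        z = self.enc(x)                              # shared bottleneck features
        return self.dose_head(z), self.recon_head(z)

model = RegNetWithRecon()
x = torch.randn(2, 1, 64, 64)                        # e.g. CT slices (illustrative)
dose_pred, x_recon = model(x)
# per-sample uncertainty surrogate: how badly the bottleneck reconstructs x
uncertainty = ((x_recon - x) ** 2).mean(dim=(1, 2, 3))
# training combines the regression loss with the reconstruction loss, so the
# uncertainty comes from the same single inference pass as the prediction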
Integrating Pre-trained Language Model into Neural Machine Translation
results: With the proposed PiNMT model and training strategies (Separate Learning Rates and Dual Step Training), state-of-the-art performance is achieved on the IWSLT'14 En$\leftrightarrow$De dataset.
Abstract
Neural Machine Translation (NMT) has become a significant technology in natural language processing through extensive research and development. However, the deficiency of high-quality bilingual language pair data still poses a major challenge to improving NMT performance. Recent studies are exploring the use of contextual information from pre-trained language model (PLM) to address this problem. Yet, the issue of incompatibility between PLM and NMT model remains unresolved. This study proposes a PLM-integrated NMT (PiNMT) model to overcome the identified problems. The PiNMT model consists of three critical components, PLM Multi Layer Converter, Embedding Fusion, and Cosine Alignment, each playing a vital role in providing effective PLM information to NMT. Furthermore, two training strategies, Separate Learning Rates and Dual Step Training, are also introduced in this paper. By implementing the proposed PiNMT model and training strategy, we achieved state-of-the-art performance on the IWSLT'14 En$\leftrightarrow$De dataset. This study's outcomes are noteworthy as they demonstrate a novel approach for efficiently integrating PLM with NMT to overcome incompatibility and enhance performance.
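One plausible reading of the Cosine Alignment component, sketched under the assumption that it pulls token-aligned NMT encoder states toward PLM-derived features; the paper's exact formulation may differ.

import torch
import torch.nn.functional as F

def cosine_alignment_loss(nmt_states, plm_feats, mask):
    # nmt_states, plm_feats: (B, T, d) token-aligned representations
    # mask: (B, T) with 1 for real tokens, 0 for padding
    cos = F.cosine_similarity(nmt_states, plm_feats, dim=-1)  # (B, T)
    return ((1.0 - cos) * mask).sum() / mask.sum()            # pull spaces together

B, T, d = 4, 12, 256
loss = cosine_alignment_loss(torch.randn(B, T, d), torch.randn(B, T, d),
                             torch.ones(B, T))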
results: The survey frames alignment as a recurrent process of forward alignment and backward alignment, in which verifying a system's alignment provides updated objectives for the next round; it further discusses learning from feedback, learning under distribution shift, and assurance techniques and governance practices covering every stage of an AI system's lifecycle.
Abstract
AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, the potential large-scale risks associated with misaligned AI systems become salient. Hundreds of AI experts and public figures have expressed concerns about AI risks, arguing that "mitigating the risk of extinction from AI should be a global priority, alongside other societal-scale risks such as pandemics and nuclear war". To provide a comprehensive and up-to-date overview of the alignment field, in this survey paper, we delve into the core concepts, methodology, and practice of alignment. We identify the RICE principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality. Guided by these four principles, we outline the landscape of current alignment research and decompose them into two key components: forward alignment and backward alignment. The former aims to make AI systems aligned via alignment training, while the latter aims to gain evidence about the systems' alignment and govern them appropriately to avoid exacerbating misalignment risks. Forward alignment and backward alignment form a recurrent process where the alignment of AI systems from the forward process is verified in the backward process, meanwhile providing updated objectives for forward alignment in the next round. On forward alignment, we discuss learning from feedback and learning under distribution shift. On backward alignment, we discuss assurance techniques and governance practices that apply to every stage of AI systems' lifecycle. We also release and continually update the website (www.alignmentsurvey.com) which features tutorials, collections of papers, blog posts, and other resources.
Large Language Models: The Need for Nuance in Current Debates and a Pragmatic Perspective on Understanding
results: Argues that common critiques of LLM capacities need more nuance, and outlines a pragmatic perspective on understanding and intentionality in LLMs.
Abstract
Current Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text. LLMs are appearing rapidly, and debates on LLM capacities have taken off, but reflection is lagging behind. Thus, in this position paper, we first zoom in on the debate and critically assess three points recurring in critiques of LLM capacities: i) that LLMs only parrot statistical patterns in the training data; ii) that LLMs master formal but not functional language competence; and iii) that language learning in LLMs cannot inform human language learning. Drawing on empirical and theoretical arguments, we show that these points need more nuance. Second, we outline a pragmatic perspective on the issue of `real' understanding and intentionality in LLMs. Understanding and intentionality pertain to unobservable mental states we attribute to other humans because they have pragmatic value: they allow us to abstract away from complex underlying mechanics and predict behaviour effectively. We reflect on the circumstances under which it would make sense for humans to similarly attribute mental states to LLMs, thereby outlining a pragmatic philosophical context for LLMs as an increasingly prominent technology in society.
Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection
results: LLM-generated decision tree explanations correlate highly with human ratings of readability, quality, and use of background knowledge, while providing a better understanding of decision boundaries.
Abstract
Network intrusion detection (NID) systems which leverage machine learning have been shown to have strong performance in practice when used to detect malicious network traffic. Decision trees in particular offer a strong balance between performance and simplicity, but require users of NID systems to have background knowledge in machine learning to interpret. In addition, they are unable to provide additional outside information as to why certain features may be important for classification. In this work, we explore the use of large language models (LLMs) to provide explanations and additional background knowledge for decision tree NID systems. Further, we introduce a new human evaluation framework for decision tree explanations, which leverages automatically generated quiz questions that measure human evaluators' understanding of decision tree inference. Finally, we show LLM generated decision tree explanations correlate highly with human ratings of readability, quality, and use of background knowledge while simultaneously providing better understanding of decision boundaries.
MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval
results: Logit and feature distillation markedly improve the student dual-stream model's retrieval performance without increasing inference complexity; a mobile CLIP model deployed on Snapdragon chips achieves 93M running memory and 30ms search latency with no apparent performance degradation.
Abstract
With the success of large-scale visual-language pretraining models and the wide application of image-text retrieval in industry, reducing model size and streamlining terminal-device deployment have become urgently necessary. The mainstream model structures for image-text retrieval are single-stream and dual-stream, both aiming to close the semantic gap between the visual and textual modalities. Dual-stream models excel at offline indexing and fast inference, while single-stream models achieve more accurate cross-modal alignment by employing adequate feature fusion. We propose a multi-teacher cross-modality alignment distillation (MCAD) technique to integrate the advantages of single-stream and dual-stream models. By incorporating the fused single-stream features into the image and text features of the dual-stream model, we formulate new modified teacher features and logits. Then, we conduct both logit and feature distillation to boost the capability of the student dual-stream model, achieving high retrieval performance without increasing inference complexity. Extensive experiments demonstrate the remarkable performance and high efficiency of MCAD on image-text retrieval tasks. Furthermore, we implement a mobile CLIP model on Snapdragon chips with only 93M running memory and 30ms search latency, without apparent performance degradation of the original large CLIP.
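A sketch of the distillation objective under stated assumptions: the fused single-stream signals are folded into the dual-stream teacher's embeddings (here by addition and renormalization, an assumption), and the student is trained with both feature distillation and logit distillation on the in-batch image-text similarity matrix. The paper defines its own modified teacher features and logits.

import torch
import torch.nn.functional as F

def mcad_loss(s_img, s_txt, t_img, t_txt, fused_img, fused_txt, tau=2.0):
    # all inputs: (B, d) L2-normalized embeddings from student (s),
    # dual-stream teacher (t), and single-stream fusion (fused)
    mt_img = F.normalize(t_img + fused_img, dim=-1)   # "modified teacher" (assumed fusion)
    mt_txt = F.normalize(t_txt + fused_txt, dim=-1)
    # feature distillation toward the modified teacher embeddings
    feat = F.mse_loss(s_img, mt_img) + F.mse_loss(s_txt, mt_txt)
    # logit distillation on the in-batch image-text similarity matrix
    s_logits = s_img @ s_txt.t() / tau
    t_logits = mt_img @ mt_txt.t() / tau
    logit = F.kl_div(F.log_softmax(s_logits, dim=-1),
                     F.softmax(t_logits, dim=-1), reduction="batchmean")
    return feat + logit

B, d = 8, 128
f = lambda: F.normalize(torch.randn(B, d), dim=-1)
loss = mcad_loss(f(), f(), f(), f(), f(), f())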
Fast swap regret minimization and applications to approximate correlated equilibria
paper_authors: Binghui Peng, Aviad Rubinstein
for: Resolves the main open problem of [Blum and Mansour 2007] with a reliable and computationally efficient algorithm that reduces T-swap regret to $\varepsilon T$ within only polylog(n) rounds.
methods: A new algorithm with an exponential dependence on $\varepsilon$, complemented by a new matching lower bound.
results: The algorithm attains $\varepsilon T$-swap regret in polylog(n) rounds and yields reliable and computationally efficient convergence to $\varepsilon$-Correlated Equilibrium ($\varepsilon$-CE) in several settings.
Abstract
We give a simple and computationally efficient algorithm that, for any constant $\varepsilon>0$, obtains $\varepsilon T$-swap regret within only $T = \mathsf{polylog}(n)$ rounds; this is an exponential improvement compared to the super-linear number of rounds required by the state-of-the-art algorithm, and resolves the main open problem of [Blum and Mansour 2007]. Our algorithm has an exponential dependence on $\varepsilon$, but we prove a new, matching lower bound. Our algorithm for swap regret implies faster convergence to $\varepsilon$-Correlated Equilibrium ($\varepsilon$-CE) in several regimes: for normal-form two-player games with $n$ actions, it implies the first uncoupled dynamics that converge to the set of $\varepsilon$-CE in polylogarithmic rounds; a $\mathsf{polylog}(n)$-bit communication protocol for $\varepsilon$-CE in two-player games (resolving an open problem mentioned by [Babichenko-Rubinstein 2017, Goos-Rubinstein 2018, Ganor-CS 2018]); and an $\tilde{O}(n)$-query algorithm for $\varepsilon$-CE (resolving an open problem of [Babichenko 2020] and obtaining the first separation between $\varepsilon$-CE and $\varepsilon$-Nash equilibrium in the query complexity model). For extensive-form games, our algorithm implies a PTAS for normal-form correlated equilibria, a solution concept often conjectured to be computationally intractable (e.g., [Stengel-Forges 2008, Fujii 2023]).
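For reference, the quantity being bounded, in one standard formulation: if the learner plays actions $a_1, \dots, a_T$ from $[n]$ and incurs losses $\ell_t$, the swap regret is $\mathrm{SwapReg}(T) = \max_{\phi : [n] \to [n]} \sum_{t=1}^{T} \left( \ell_t(a_t) - \ell_t(\phi(a_t)) \right)$, the hindsight gain of the best rule $\phi$ that replaces every play of action $i$ with $\phi(i)$ (for randomized strategies, the same expression is taken in expectation). The guarantee above states that $\mathrm{SwapReg}(T) \le \varepsilon T$ already at $T = \mathsf{polylog}(n)$, whereas external regret only competes with a single fixed action.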
RayDF: Neural Ray-surface Distance Fields with Multi-view Consistency
for: This paper addresses the problem of continuous 3D shape representations and proposes a new framework called RayDF to improve the efficiency and accuracy of 3D shape representation.
methods: The proposed RayDF framework consists of three components: 1) a simple ray-surface distance field, 2) a novel dual-ray visibility classifier, and 3) a multi-view consistency optimization module.
results: The proposed method achieves remarkable performance in 3D surface point reconstruction on both synthetic and real-world 3D scenes, and renders an 800x800 depth image 1000x faster than coordinate-based methods.
Abstract
In this paper, we study the problem of continuous 3D shape representations. The majority of existing successful methods are coordinate-based implicit neural representations. However, they are inefficient to render novel views or recover explicit surface points. A few works start to formulate 3D shapes as ray-based neural functions, but the learned structures are inferior due to the lack of multi-view geometry consistency. To tackle these challenges, we propose a new framework called RayDF. It consists of three major components: 1) the simple ray-surface distance field, 2) the novel dual-ray visibility classifier, and 3) a multi-view consistency optimization module to drive the learned ray-surface distances to be multi-view geometry consistent. We extensively evaluate our method on three public datasets, demonstrating remarkable performance in 3D surface point reconstruction on both synthetic and challenging real-world 3D scenes, clearly surpassing existing coordinate-based and ray-based baselines. Most notably, our method achieves a 1000x faster speed than coordinate-based methods to render an 800x800 depth image, showing the superiority of our method for 3D shape representation. Our code and data are available at https://github.com/vLAR-group/RayDF
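A minimal sketch of the central object, a ray-surface distance field: an MLP maps a parameterized ray directly to the scalar distance from the ray origin to the first surface hit, so rendering a depth map takes one forward pass per ray rather than many queries per pixel. The 6D (origin, direction) parameterization and MLP sizes are illustrative assumptions; RayDF additionally relies on its dual-ray visibility classifier and multi-view consistency training, omitted here.

import torch
import torch.nn as nn

class RaySurfaceDistanceField(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus())  # distances are nonnegative

    def forward(self, origins, dirs):
        dirs = nn.functional.normalize(dirs, dim=-1)
        return self.net(torch.cat([origins, dirs], dim=-1)).squeeze(-1)

field = RaySurfaceDistanceField()
H = W = 256
o = torch.zeros(H * W, 3)                 # camera center repeated per pixel
d = torch.randn(H * W, 3)                 # per-pixel ray directions
depth = field(o, d).reshape(H, W)         # whole depth map in one forward pass
pts = o + depth.reshape(-1, 1) * nn.functional.normalize(d, dim=-1)  # surface points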
Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities
results: Identifies substantial concerns with AGI systems in cultural domains, including factuality, toxicity, biases, and public safety, and proposes mitigation strategies; the paper argues for multi-stakeholder collaboration to ensure AGI promotes creativity, knowledge, and cultural values without undermining truth or human dignity.
Abstract
Recent advances in artificial general intelligence (AGI), particularly large language models and creative image generation systems have demonstrated impressive capabilities on diverse tasks spanning the arts and humanities. However, the swift evolution of AGI has also raised critical questions about its responsible deployment in these culturally significant domains traditionally seen as profoundly human. This paper provides a comprehensive analysis of the applications and implications of AGI for text, graphics, audio, and video pertaining to arts and the humanities. We survey cutting-edge systems and their usage in areas ranging from poetry to history, marketing to film, and communication to classical art. We outline substantial concerns pertaining to factuality, toxicity, biases, and public safety in AGI systems, and propose mitigation strategies. The paper argues for multi-stakeholder collaboration to ensure AGI promotes creativity, knowledge, and cultural values without undermining truth or human dignity. Our timely contribution summarizes a rapidly developing field, highlighting promising directions while advocating for responsible progress centering on human flourishing. The analysis lays the groundwork for further research on aligning AGI's technological capacities with enduring social goods.
Exploring Post-Training Quantization of Protein Language Models
paper_authors: Shuang Peng, Fei Yang, Ning Sun, Sheng Chen, Yanfeng Jiang, Aimin Pan
for: This paper aims to improve the efficiency of protein language models (ProteinLMs) by developing a post-training quantization (PTQ) method that can accurately quantize all weights and activations of ProteinLMs without compromising accuracy.
methods: The proposed PTQ method uses piecewise linear quantization for asymmetric activation values to ensure accurate approximation, addressing specific challenges associated with ESMFold, a simplified version of AlphaFold based on the ESM-2 ProteinLM.
results: The method is shown to be effective on protein structure prediction tasks, demonstrating that ESMFold can be accurately quantized to low-bit widths without compromising accuracy; it is also applied to the contact prediction task, showcasing its versatility.
Abstract
Recent advancements in unsupervised protein language models (ProteinLMs), like ESM-1b and ESM-2, have shown promise in different protein prediction tasks. However, these models face challenges due to their high computational demands, significant memory needs, and latency, restricting their usage on devices with limited resources. To tackle this, we explore post-training quantization (PTQ) for ProteinLMs, focusing on ESMFold, a simplified version of AlphaFold based on ESM-2 ProteinLM. Our study is the first attempt to quantize all weights and activations of ProteinLMs. We observed that the typical uniform quantization method performs poorly on ESMFold, causing a significant drop in TM-Score when using 8-bit quantization. We conducted extensive quantization experiments, uncovering unique challenges associated with ESMFold, particularly highly asymmetric activation ranges before Layer Normalization, making representation difficult using low-bit fixed-point formats. To address these challenges, we propose a new PTQ method for ProteinLMs, utilizing piecewise linear quantization for asymmetric activation values to ensure accurate approximation. We demonstrated the effectiveness of our method in protein structure prediction tasks, demonstrating that ESMFold can be accurately quantized to low-bit widths without compromising accuracy. Additionally, we applied our method to the contact prediction task, showcasing its versatility. In summary, our study introduces an innovative PTQ method for ProteinLMs, addressing specific quantization challenges and potentially leading to the development of more efficient ProteinLMs with significant implications for various protein-related applications.
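A sketch of the key idea, piecewise linear quantization for asymmetric activations: the regions below and above a breakpoint each get their own uniform scale, so a narrow negative range no longer shares one coarse codebook with a wide positive range. The zero breakpoint and two-region split are illustrative assumptions.

import numpy as np

def piecewise_quantize(x, bits=8, split=0.0):
    half = 2 ** (bits - 1) - 1                    # codes available per region
    s_neg = max(split - x.min(), 1e-8) / half     # scale below the breakpoint
    s_pos = max(x.max() - split, 1e-8) / half     # scale above the breakpoint
    q = np.where(x < split,
                 np.round((x - split) / s_neg),
                 np.round((x - split) / s_pos))
    return np.where(q < 0, q * s_neg, q * s_pos) + split  # dequantized values

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-0.05, 0.02, 1000),   # narrow negative side
                    rng.normal(2.0, 1.0, 1000)])     # wide positive side
deq_pw = piecewise_quantize(x)
s = (x.max() - x.min()) / (2 ** 8 - 1)               # uniform 8-bit baseline
deq_uni = np.round((x - x.min()) / s) * s + x.min()
neg = x < 0
print(np.abs(deq_pw - x)[neg].mean(), np.abs(deq_uni - x)[neg].mean())
# the narrow negative region is represented far more precisely piecewise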
Large Trajectory Models are Scalable Motion Predictors and Planners
results: Experiments show that large trajectory models (LTMs) such as STR adhere to scaling laws, exhibiting outstanding adaptability and learning efficiency; qualitative results further show that LTMs make plausible predictions in scenarios that diverge significantly from the training distribution and learn complex long-term planning without explicit loss designs or costly high-level annotations.
Abstract
Motion prediction and planning are vital tasks in autonomous driving, and recent efforts have shifted to machine learning-based approaches. The challenges include understanding diverse road topologies, reasoning traffic dynamics over a long time horizon, interpreting heterogeneous behaviors, and generating policies in a large continuous state space. Inspired by the success of large language models in addressing similar complexities through model scaling, we introduce a scalable trajectory model called State Transformer (STR). STR reformulates the motion prediction and motion planning problems by arranging observations, states, and actions into one unified sequence modeling task. With a simple model design, STR consistently outperforms baseline approaches in both problems. Remarkably, experimental results reveal that large trajectory models (LTMs), such as STR, adhere to the scaling laws by presenting outstanding adaptability and learning efficiency. Qualitative results further demonstrate that LTMs are capable of making plausible predictions in scenarios that diverge significantly from the training data distribution. LTMs also learn to make complex reasonings for long-term planning, without explicit loss designs or costly high-level annotations.
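A sketch of the "one unified sequence" idea: per-timestep observation, state, and action embeddings are interleaved into a single token stream for a causally masked transformer, in the spirit of decision-transformer-style models. The token ordering and embedding details here are assumptions, not STR's exact design.

import torch
import torch.nn as nn

T, d = 10, 64
obs = torch.randn(T, d)     # embedded map / agent observations
state = torch.randn(T, d)   # embedded ego states
act = torch.randn(T, d)     # embedded actions / trajectory points

# interleave into one stream: o1, s1, a1, o2, s2, a2, ...
tokens = torch.stack([obs, state, act], dim=1).reshape(3 * T, d)
layer = nn.TransformerEncoderLayer(d_model=d, nhead=4)
mask = torch.triu(torch.full((3 * T, 3 * T), float("-inf")), diagonal=1)
h = layer(tokens.unsqueeze(1), src_mask=mask)  # (3T, 1, d), causal attention
# outputs at the action positions would be decoded into future trajectory
# points, so motion prediction and planning share a single sequence model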
Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models
results: A pilot study in a grid world setup demonstrates that situated evaluation provides a more comprehensive assessment of machine ToM and mitigates the risk of shortcuts and data leakage.
Abstract
Large Language Models (LLMs) have generated considerable interest and debate regarding their potential emergence of Theory of Mind (ToM). Several recent inquiries reveal a lack of robust ToM in these models and pose a pressing demand to develop new benchmarks, as current ones primarily focus on different aspects of ToM and are prone to shortcuts and data leakage. In this position paper, we seek to answer two road-blocking questions: (1) How can we taxonomize a holistic landscape of machine ToM? (2) What is a more effective evaluation protocol for machine ToM? Following psychological studies, we taxonomize machine ToM into 7 mental state categories and delineate existing benchmarks to identify under-explored aspects of ToM. We argue for a holistic and situated evaluation of ToM to break ToM into individual components and treat LLMs as an agent who is physically situated in environments and socially situated in interactions with humans. Such situated evaluation provides a more comprehensive assessment of mental states and potentially mitigates the risk of shortcuts and data leakage. We further present a pilot study in a grid world setup as a proof of concept. We hope this position paper can facilitate future research to integrate ToM with LLMs and offer an intuitive means for researchers to better position their work in the landscape of ToM. Project page: https://github.com/Mars-tin/awesome-theory-of-mind
Technical Report on the Learning of Case Relevance in Case-Based Reasoning with Abstract Argumentation
paper_authors: Guilherme Paulino-Passos, Francesca Toni
for: This paper focuses on using case-based reasoning and abstract argumentation to improve the prediction of legal outcomes.
methods: The paper uses decision trees to learn the relevance of cases and combines case-based reasoning with abstract argumentation to make predictions.
results: The authors show that the proposed approach performs competitively with decision trees on two legal datasets and results in a more compact representation, which could be beneficial for obtaining cognitively tractable explanations.
Abstract
Case-based reasoning is known to play an important role in several legal settings. In this paper we focus on a recent approach to case-based reasoning, supported by an instantiation of abstract argumentation whereby arguments represent cases and attack between arguments results from outcome disagreement between cases and a notion of relevance. In this context, relevance is connected to a form of specificity among cases. We explore how relevance can be learnt automatically in practice with the help of decision trees, and explore the combination of case-based reasoning with abstract argumentation (AA-CBR) and learning of case relevance for prediction in legal settings. Specifically, we show that, for two legal datasets, AA-CBR and decision-tree-based learning of case relevance perform competitively in comparison with decision trees. We also show that AA-CBR with decision-tree-based learning of case relevance results in a more compact representation than their decision tree counterparts, which could be beneficial for obtaining cognitively tractable explanations.
LLMaAA: Making Large Language Models as Active Annotators
results: On two classic NLP tasks, LLMaAA generates labels efficiently; task-specific models trained with only hundreds of annotated examples outperform other baselines.
Abstract
Prevalent supervised learning methods in natural language processing (NLP) are notoriously data-hungry, which demand large amounts of high-quality annotated data. In practice, acquiring such data is a costly endeavor. Recently, the superior few-shot performance of large language models (LLMs) has propelled the development of dataset generation, where the training data are solely synthesized from LLMs. However, such an approach usually suffers from low-quality issues, and requires orders of magnitude more labeled data to achieve satisfactory performance. To fully exploit the potential of LLMs and make use of massive unlabeled data, we propose LLMaAA, which takes LLMs as annotators and puts them into an active learning loop to determine what to annotate efficiently. To learn robustly with pseudo labels, we optimize both the annotation and training processes: (1) we draw k-NN examples from a small demonstration pool as in-context examples, and (2) we adopt the example reweighting technique to assign training samples with learnable weights. Compared with previous approaches, LLMaAA features both efficiency and reliability. We conduct experiments and analysis on two classic NLP tasks, named entity recognition and relation extraction. With LLMaAA, task-specific models trained from LLM-generated labels can outperform the teacher within only hundreds of annotated examples, which is much more cost-effective than other baselines.
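As a minimal sketch of the retrieval step described above (not the authors' code), the k-NN in-context demonstration selection could look like the following; the toy embedder, demo pool, and prompt format are illustrative assumptions:

```python
# Sketch of LLMaAA-style annotation: retrieve k-NN demonstrations from a small
# pool and build a prompt for the LLM annotator. The embedder is a toy
# hash-based stand-in for a real sentence encoder.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def embed(texts, dim=32):
    """Toy bag-of-words hash embedding; replace with a real sentence encoder."""
    vecs = []
    for t in texts:
        v = np.zeros(dim)
        for tok in t.lower().split():
            rng = np.random.default_rng(abs(hash(tok)) % (2**32))
            v += rng.normal(size=dim)
        vecs.append(v)
    return np.stack(vecs)

demo_pool = [
    ("Barack Obama visited Paris.", "PER: Barack Obama | LOC: Paris"),
    ("Apple opened a store in Berlin.", "ORG: Apple | LOC: Berlin"),
    ("Marie Curie worked in Warsaw.", "PER: Marie Curie | LOC: Warsaw"),
]
knn = NearestNeighbors(n_neighbors=2).fit(embed([t for t, _ in demo_pool]))

def build_prompt(query, k=2):
    _, idx = knn.kneighbors(embed([query]), n_neighbors=k)
    demos = "\n".join(f"Text: {demo_pool[i][0]}\nEntities: {demo_pool[i][1]}"
                      for i in idx[0])
    # The prompt is sent to the LLM annotator; its answer becomes a pseudo label.
    return f"{demos}\nText: {query}\nEntities:"

print(build_prompt("Angela Merkel spoke in Berlin."))
```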
Prediction of Locally Stationary Data Using Expert Advice
results: The paper presents an online forecasting algorithm for locally stationary time series and obtains an estimate of the algorithm's efficiency.
Abstract
The problem of continuous machine learning is studied within a game-theoretic framework: when calculating the next forecast, no assumptions are made about the stochastic nature of the source that generates the data flow (the source can be analog, algorithmic, or probabilistic, and its parameters can change at random times), so only structural assumptions about the nature of data generation are used when building the prognostic model. An online forecasting algorithm for a locally stationary time series is presented, and an estimate of the efficiency of the proposed algorithm is obtained.
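For readers unfamiliar with the prediction-with-expert-advice setting this abstract builds on, a minimal exponentially weighted forecaster (a standard baseline in this literature, not the paper's algorithm) is sketched below; the experts, squared loss, and learning rate are assumptions:

```python
# Minimal exponentially weighted average (EWA) forecaster with expert advice.
import numpy as np

def ewa_forecast(expert_preds, outcomes, eta=0.5):
    """expert_preds: (T, N) predictions of N experts; outcomes: (T,) targets."""
    T, N = expert_preds.shape
    log_w = np.zeros(N)                      # log-weights for numerical stability
    forecasts = np.empty(T)
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        forecasts[t] = w @ expert_preds[t]   # weighted-average forecast
        losses = (expert_preds[t] - outcomes[t]) ** 2
        log_w -= eta * losses                # exponential weight update
    return forecasts

rng = np.random.default_rng(1)
y = np.sin(np.linspace(0, 6, 200)) + 0.1 * rng.normal(size=200)
# Two toy experts: "repeat the last value" and "predict the global mean".
experts = np.stack([np.roll(y, 1), np.full_like(y, y.mean())], axis=1)
print("MSE:", np.mean((ewa_forecast(experts, y) - y) ** 2))
```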
CreoleVal: Multilingual Multitask Benchmarks for Creoles
paper_authors: Heather Lent, Kushal Tatariya, Raj Dabre, Yiyi Chen, Marcell Fekete, Esther Ploeger, Li Zhou, Hans Erik Heje, Diptesh Kanojia, Paul Belony, Marcel Bollmann, Loïc Grobol, Miryam de Lhoneux, Daniel Hershcovich, Michel DeGraff, Anders Søgaard, Johannes Bjerva
results: The paper provides benchmark datasets for 8 NLP tasks covering up to 28 Creole languages, and conducts zero-shot baseline experiments for each benchmark to better understand the capabilities and limitations of transfer learning for Creoles.
Abstract
Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research. While the genealogical ties between Creoles and other highly-resourced languages imply a significant potential for transfer learning, this potential is hampered due to this lack of annotated data. In this work we present CreoleVal, a collection of benchmark datasets spanning 8 different NLP tasks, covering up to 28 Creole languages; it is an aggregate of brand new development datasets for machine comprehension, relation classification, and machine translation for Creoles, in addition to a practical gateway to a handful of preexisting benchmarks. For each benchmark, we conduct baseline experiments in a zero-shot setting in order to further ascertain the capabilities and limitations of transfer learning for Creoles. Ultimately, the goal of CreoleVal is to empower research on Creoles in NLP and computational linguistics. We hope this resource will contribute to technological inclusion for Creole language users around the globe.
A General Neural Causal Model for Interactive Recommendation
results: Both theoretical and empirical studies demonstrate the effectiveness of the proposed solution.
Abstract
Survivor bias in observational data leads the optimization of recommender systems towards local optima. Currently, most solutions re-mine existing human-system collaboration patterns to maximize longer-term satisfaction by reinforcement learning. However, from the causal perspective, mitigating survivor effects requires answering a counterfactual problem, which is generally unidentifiable and inestimable. In this work, we propose a neural causal model to achieve counterfactual inference. Specifically, we first build a learnable structural causal model based on its available graphical representations, which qualitatively characterizes the preference transitions. Mitigation of the survivor bias is achieved through counterfactual consistency. To identify the consistency, we use the Gumbel-max function as a structural constraint. To estimate the consistency, we apply reinforcement optimization and use Gumbel-Softmax as a trade-off to obtain a differentiable function. Both theoretical and empirical studies demonstrate the effectiveness of our solution.
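A small sketch of the Gumbel-max versus Gumbel-Softmax trade-off mentioned above; the logits and toy per-item reward are illustrative assumptions, not the paper's model:

```python
# Gumbel-max gives exact categorical samples (usable as a structural
# constraint), while Gumbel-Softmax gives a differentiable relaxation.
import torch
import torch.nn.functional as F

logits = torch.randn(4, requires_grad=True)     # preference logits over 4 items

# Gumbel-max: exact categorical sampling, but argmax blocks gradients.
g = -torch.log(-torch.log(torch.rand(4)))
hard_choice = torch.argmax(logits + g)

# Gumbel-Softmax: differentiable; tau trades bias against gradient quality.
soft_choice = F.gumbel_softmax(logits, tau=0.5, hard=False)
values = torch.arange(4.0)                      # toy per-item reward
expected_reward = (soft_choice * values).sum()
expected_reward.backward()                      # gradients reach the logits
print(int(hard_choice), logits.grad)
```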
Inverse folding for antibody sequence design using deep learning
paper_authors: Frédéric A. Dreyer, Daniel Cutting, Constantin Schneider, Henry Kenlay, Charlotte M. Deane
for: The paper is written for the problem of designing antibody sequences based on 3D structural information.
methods: The paper proposes a fine-tuned inverse folding model that is specifically optimized for antibody structures, and uses physics-based methods to evaluate the quality of proposed sequences.
results: The paper shows that the proposed model outperforms generic protein models on sequence recovery and structure robustness when applied to antibodies, with notable improvement on the hypervariable CDR-H3 loop. Additionally, the paper studies the canonical conformations of complementarity-determining regions and finds improved encoding of these loops into known clusters.
Abstract
We consider the problem of antibody sequence design given 3D structural information. Building on previous work, we propose a fine-tuned inverse folding model that is specifically optimised for antibody structures and outperforms generic protein models on sequence recovery and structure robustness when applied on antibodies, with notable improvement on the hypervariable CDR-H3 loop. We study the canonical conformations of complementarity-determining regions and find improved encoding of these loops into known clusters. Finally, we consider the applications of our model to drug discovery and binder design and evaluate the quality of proposed sequences using physics-based methods.
SparseByteNN: A Novel Mobile Inference Acceleration Framework Based on Fine-Grained Group Sparsity
results: Experimental results show that for 30% sparse MobileNet-v1, SparseByteNN outperforms both the dense version and the state-of-the-art sparse inference engine MNN on the efficiency-accuracy curve. Specifically, on Qualcomm 855 it achieves a 1.27x speedup over the dense version and a 1.29x speedup over MNN, with only a 0.224% accuracy drop.
Abstract
To address the challenge of increasing network size, researchers have developed sparse models through network pruning. However, maintaining model accuracy while achieving significant speedups on general computing devices remains an open problem. In this paper, we present a novel mobile inference acceleration framework SparseByteNN, which leverages fine-grained kernel sparsity to achieve real-time execution as well as high accuracy. Our framework consists of two parts: (a) A fine-grained kernel sparsity schema with a sparsity granularity between structured pruning and unstructured pruning. It designs multiple sparse patterns for different operators. Combined with our proposed whole network rearrangement strategy, the schema achieves a high compression rate and high precision at the same time. (b) Inference engine co-optimized with the sparse pattern. The conventional wisdom is that this reduction in theoretical FLOPs does not translate into real-world efficiency gains. We aim to correct this misconception by introducing a family of efficient sparse kernels for ARM and WebAssembly. Equipped with our efficient implementation of sparse primitives, we show that sparse versions of MobileNet-v1 outperform strong dense baselines on the efficiency-accuracy curve. Experimental results on Qualcomm 855 show that for 30% sparse MobileNet-v1, SparseByteNN achieves 1.27x speedup over the dense version and 1.29x speedup over the state-of-the-art sparse inference engine MNN with a slight accuracy drop of 0.224%. The source code of SparseByteNN will be available at https://github.com/lswzjuer/SparseByteNN
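To make the "sparsity granularity between structured and unstructured pruning" concrete, here is an illustrative sketch of group-wise magnitude pruning on a conv layer; the group size, ratio, and scoring rule are assumptions, not SparseByteNN's actual schema:

```python
# Prune conv weights in small groups along the input-channel axis: finer than
# channel pruning, coarser than element-wise pruning. Assumes in_c divisible
# by group_size.
import torch
import torch.nn as nn

def group_prune(conv: nn.Conv2d, group_size: int = 4, sparsity: float = 0.3):
    w = conv.weight.data                            # (out_c, in_c, kh, kw)
    out_c, in_c, kh, kw = w.shape
    groups = w.view(out_c, in_c // group_size, group_size, kh, kw)
    scores = groups.abs().sum(dim=(2, 3, 4))        # one magnitude score per group
    k = int(sparsity * scores.numel())
    thresh = scores.flatten().kthvalue(k).values    # k-th smallest score
    mask = (scores > thresh).float()[:, :, None, None, None]
    conv.weight.data = (groups * mask).view_as(w)   # zero out the weakest groups
    return mask.mean().item()                       # surviving fraction

conv = nn.Conv2d(32, 64, 3, padding=1)
print("kept fraction:", group_prune(conv))
```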
Trust, Accountability, and Autonomy in Knowledge Graph-based AI for Self-determination
for: This paper is written to address the issue of self-determination in the context of the growing use of Knowledge Graphs (KGs) and Artificial Intelligence (AI) in online services.
methods: The paper uses a conceptual framework to explore the foundational topics and research pillars needed to support KG-based AI for self-determination, and analyzes challenges and opportunities for citizen self-determination in a real-world scenario.
results: The paper proposes a research agenda aimed at accomplishing the recommended objectives, including ensuring the trustworthiness of AI systems, transparency in data and inner workings, and accountability for decision-making.Abstract
Knowledge Graphs (KGs) have emerged as fundamental platforms for powering intelligent decision-making and a wide range of Artificial Intelligence (AI) services across major corporations such as Google, Walmart, and AirBnb. KGs complement Machine Learning (ML) algorithms by providing data context and semantics, thereby enabling further inference and question-answering capabilities. The integration of KGs with neuronal learning (e.g., Large Language Models (LLMs)) is currently a topic of active research, commonly named neuro-symbolic AI. Despite the numerous benefits that can be accomplished with KG-based AI, its growing ubiquity within online services may result in the loss of self-determination for citizens as a fundamental societal issue. The more we rely on these technologies, which are often centralised, the less citizens will be able to determine their own destinies. To counter this threat, AI regulation, such as the European Union (EU) AI Act, is being proposed in certain regions. The regulation sets what technologists need to do, leading to questions concerning: How can the output of AI systems be trusted? What is needed to ensure that the data fuelling and the inner workings of these artefacts are transparent? How can AI be made accountable for its decision-making? This paper conceptualises the foundational topics and research pillars to support KG-based AI for self-determination. Drawing upon this conceptual framework, challenges and opportunities for citizen self-determination are illustrated and analysed in a real-world scenario. As a result, we propose a research agenda aimed at accomplishing the recommended objectives.
Optimize Planning Heuristics to Rank, not to Estimate Cost-to-Goal
results: Experimental results show that the approach yields better performance across a diverse set of problems.
Abstract
In imitation learning for planning, parameters of heuristic functions are optimized against a set of solved problem instances. This work revisits the necessary and sufficient conditions of strictly optimally efficient heuristics for forward search algorithms, mainly A* and greedy best-first search, which expand only states on the returned optimal path. It then proposes a family of loss functions based on ranking tailored for a given variant of the forward search algorithm. Furthermore, from a learning theory point of view, it discusses why optimizing cost-to-goal h* is unnecessarily difficult. The experimental comparison on a diverse set of problems unequivocally supports the derived theory.
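A minimal sketch of the rank-based training idea (heuristic values only need to order states correctly for greedy expansion, not regress h*); the feature encoding, margin, and toy data are assumptions:

```python
# Train a heuristic to rank on-path states below their off-path siblings.
import torch
import torch.nn as nn

heuristic = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
rank_loss = nn.MarginRankingLoss(margin=1.0)
opt = torch.optim.Adam(heuristic.parameters(), lr=1e-3)

# Toy batch: states on the returned optimal path vs. expanded off-path states.
on_path = torch.randn(16, 8)
off_path = torch.randn(16, 8)

for _ in range(100):
    h_on = heuristic(on_path).squeeze(-1)
    h_off = heuristic(off_path).squeeze(-1)
    # target = -1: we want h_on < h_off, so on-path states look cheaper.
    loss = rank_loss(h_on, h_off, -torch.ones(16))
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```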
Denoising Diffusion Probabilistic Models for Hardware-Impaired Communication Systems: Towards Wireless Generative AI
paper_authors: Mehdi Letafati, Samad Ali, Matti Latva-aho
for: This paper proposes a practical wireless communication system with hardware-impaired transceivers, using denoising diffusion probabilistic models (DDPMs) to improve network resilience and reconstruction performance.
methods: The proposed DDPM-based receiver uses a decomposition of the data generation process to address realistic non-idealities such as hardware impairments, channel distortions, and quantization errors.
results: The paper shows that the proposed approach provides near-invariant reconstruction performance with respect to different hardware impairment levels and quantization errors, and achieves more than 25 dB improvement in reconstruction performance compared to conventional deep neural network (DNN)-based receivers.
Abstract
Thanks to the outstanding achievements from state-of-the-art generative models like ChatGPT and diffusion models, generative AI has gained substantial attention across various industrial and academic domains. In this paper, denoising diffusion probabilistic models (DDPMs) are proposed for a practical finite-precision wireless communication system with hardware-impaired transceivers. The intuition behind DDPM is to decompose the data generation process over the so-called "denoising" steps. Inspired by this, a DDPM-based receiver is proposed for a practical wireless communication scheme that faces realistic non-idealities, including hardware impairments (HWI), channel distortions, and quantization errors. It is shown that our approach provides network resilience under low-SNR regimes, near-invariant reconstruction performance with respect to different HWI levels and quantization errors, and robust out-of-distribution performance against non-Gaussian noise. Moreover, the reconstruction performance of our scheme is evaluated in terms of cosine similarity and mean-squared error (MSE), highlighting more than 25 dB improvement compared to the conventional deep neural network (DNN)-based receivers.
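To illustrate the "denoising steps" decomposition the abstract refers to, here is a minimal DDPM reverse-sampling sketch; the schedule, shapes, and tiny untrained noise network are assumptions, not the paper's receiver:

```python
# One reverse (denoising) step of a DDPM, iterated from pure noise.
import torch
import torch.nn as nn

T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

# Toy noise predictor: takes the noisy sample plus a scalar time embedding.
eps_model = nn.Sequential(nn.Linear(17, 64), nn.ReLU(), nn.Linear(64, 16))

@torch.no_grad()
def reverse_step(x_t, t):
    """Sample x_{t-1} from p(x_{t-1} | x_t) using predicted noise."""
    t_emb = torch.full((x_t.shape[0], 1), t / T)
    eps = eps_model(torch.cat([x_t, t_emb], dim=-1))   # predicted noise
    coef = betas[t] / torch.sqrt(1 - alpha_bars[t])
    mean = (x_t - coef * eps) / torch.sqrt(alphas[t])
    noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
    return mean + torch.sqrt(betas[t]) * noise

x = torch.randn(8, 16)             # start from noise (e.g., a distorted block)
for t in reversed(range(T)):
    x = reverse_step(x, t)
print(x.shape)
```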
ALT: Towards Fine-grained Alignment between Language and CTR Models for Click-Through Rate Prediction
results: Experiments show that the proposed alignment method achieves state-of-the-art performance on three real-world datasets and is compatible with various language and CTR models, accommodating the needs of different application scenarios.
Abstract
Click-through rate (CTR) prediction plays as a core function module in various personalized online services. According to the data modality and input format, the models for CTR prediction can be mainly classified into two categories. The first one is the traditional CTR models that take as inputs the one-hot encoded ID features of tabular modality, which aims to capture the collaborative signals via feature interaction modeling. The second category takes as inputs the sentences of textual modality obtained by hard prompt templates, where pretrained language models (PLMs) are adopted to extract the semantic knowledge. These two lines of research generally focus on different characteristics of the same input data (i.e., textual and tabular modalities), forming a distinct complementary relationship with each other. Therefore, in this paper, we propose to conduct fine-grained feature-level Alignment between Language and CTR models (ALT) for CTR prediction. Apart from the common CLIP-like instance-level contrastive learning, we further design a novel joint reconstruction pretraining task for both masked language and tabular modeling. Specifically, the masked data of one modality (i.e., tokens or features) has to be recovered with the help of the other modality, which establishes the feature-level interaction and alignment via sufficient mutual information extraction between dual modalities. Moreover, we propose three different finetuning strategies with the option to train the aligned language and CTR models separately or jointly for downstream CTR prediction tasks, thus accommodating the varying efficacy and efficiency requirements for industrial applications. Extensive experiments on three real-world datasets demonstrate that ALT outperforms SOTA baselines, and is highly compatible for various language and CTR models.
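A sketch of the CLIP-like instance-level contrastive alignment between the language and CTR towers described above; the linear encoders and dimensions are toy assumptions, and the joint masked-reconstruction pretraining task is omitted:

```python
# Symmetric InfoNCE loss aligning textual and tabular embeddings of the same
# instance; matched pairs lie on the diagonal of the similarity matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F

text_enc = nn.Linear(768, 128)   # stands in for a PLM sentence-embedding head
tab_enc = nn.Linear(64, 128)     # stands in for a feature-interaction CTR model

def align_loss(text_feats, tab_feats, temperature=0.07):
    t = F.normalize(text_enc(text_feats), dim=-1)
    v = F.normalize(tab_enc(tab_feats), dim=-1)
    logits = t @ v.T / temperature
    labels = torch.arange(t.shape[0])
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.T, labels)) / 2

loss = align_loss(torch.randn(32, 768), torch.randn(32, 64))
loss.backward()
print(float(loss))
```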
Large-Scale Application of Fault Injection into PyTorch Models – an Extension to PyTorchFI for Validation Efficiency
results: The study evaluates PyTorch models under multiple fault-injection scenarios and analyzes the results to understand the impact of hardware faults, and it provides examples of using the PyTorchALFI framework to apply iterative model modifications and comparisons.
Abstract
Transient or permanent faults in hardware can render the output of Neural Networks (NN) incorrect without user-specific traces of the error, i.e. silent data errors (SDE). On the other hand, modern NNs also possess an inherent redundancy that can tolerate specific faults. To establish a safety case, it is necessary to distinguish and quantify both types of corruptions. To study the effects of hardware (HW) faults on software (SW) in general and NN models in particular, several fault injection (FI) methods have been established in recent years. Current FI methods focus on the methodology of injecting faults but often fall short of accounting for large-scale FI tests, where many fault locations based on a particular fault model need to be analyzed in a short time. Results need to be concise, repeatable, and comparable. To address these requirements and enable fault injection as the default component in a machine learning development cycle, we introduce a novel fault injection framework called PyTorchALFI (Application Level Fault Injection for PyTorch) based on PyTorchFI. PyTorchALFI provides an efficient way to define randomly generated and reusable sets of faults to inject into PyTorch models, defines complex test scenarios, enhances data sets, and generates test KPIs while tightly coupling fault-free, faulty, and modified NN. In this paper, we provide details about the definition of test scenarios, software architecture, and several examples of how to use the new framework to apply iterative changes in fault location and number, compare different model modifications, and analyze test results.
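For intuition, here is a generic sketch of application-level fault injection in PyTorch via a forward hook, flipping one bit in a randomly chosen activation to emulate a transient fault; this illustrates the general idea only and is not the PyTorchALFI or PyTorchFI API:

```python
import random
import struct
import torch
import torch.nn as nn

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32 word to emulate a transient hardware fault."""
    word = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.unpack("<f", struct.pack("<I", word ^ (1 << bit)))[0]

def fault_hook(module, inputs, output):
    out = output.clone()
    flat = out.view(-1)
    idx = random.randrange(flat.numel())      # random fault location
    bit = random.randrange(32)                # random bit position
    flat[idx] = flip_bit(float(flat[idx]), bit)
    return out                                # replaces the layer's output

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
handle = model[0].register_forward_hook(fault_hook)
faulty_out = model(torch.randn(1, 16))        # one injected fault on this pass
handle.remove()
print(faulty_out)
```

Large-scale campaigns then repeat such injections over many fault locations and bit positions and compare faulty outputs against a fault-free run to detect silent data errors.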
Explaining the Decisions of Deep Policy Networks for Robotic Manipulations
results: Through input attribution analysis of deep policy models, the authors identify dynamic changes in the attributions of multi-modal sensor inputs, improving transparency and reliability in robotic manipulation tasks.
Abstract
Deep policy networks enable robots to learn behaviors to solve various real-world complex tasks in an end-to-end fashion. However, they lack transparency to provide the reasons of actions. Thus, such a black-box model often results in low reliability and disruptive actions during the deployment of the robot in practice. To enhance its transparency, it is important to explain robot behaviors by considering the extent to which each input feature contributes to determining a given action. In this paper, we present an explicit analysis of deep policy models through input attribution methods to explain how and to what extent each input feature affects the decisions of the robot policy models. To this end, we present two methods for applying input attribution methods to robot policy networks: (1) we measure the importance factor of each joint torque to reflect the influence of the motor torque on the end-effector movement, and (2) we modify a relevance propagation method to handle negative inputs and outputs in deep policy networks properly. To the best of our knowledge, this is the first report to identify the dynamic changes of input attributions of multi-modal sensor inputs in deep policy networks online for robotic manipulation.
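A simple gradient-times-input attribution pass over a policy network, the flavor of analysis described above (the paper's actual methods, importance factors for joint torques and a modified relevance propagation, are not reproduced here); the tiny policy net and input split are assumptions:

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(12, 64), nn.Tanh(), nn.Linear(64, 7))

obs = torch.randn(1, 12, requires_grad=True)   # e.g., 6 joint dims + 6 sensor dims
action = policy(obs)
action[0, 0].backward()                        # attribute one output torque

attribution = (obs.grad * obs).squeeze(0)      # gradient x input, per feature
joint_part, sensor_part = attribution[:6], attribution[6:]
print("joint importance:", joint_part.abs().tolist())
print("sensor importance:", sensor_part.abs().tolist())
```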
Refining Diffusion Planner for Reliable Behavior Synthesis by Automatic Detection of Infeasible Plans
results: The method is evaluated on three benchmarks in offline control settings that require long-horizon planning, demonstrating its effectiveness. The approach also offers explainability: attribution maps of the gap predictor highlight error-prone transitions, giving deeper insight into the generated plans.
Abstract
Diffusion-based planning has shown promising results in long-horizon, sparse-reward tasks by training trajectory diffusion models and conditioning the sampled trajectories using auxiliary guidance functions. However, due to their nature as generative models, diffusion models are not guaranteed to generate feasible plans, resulting in failed execution and precluding planners from being useful in safety-critical applications. In this work, we propose a novel approach to refine unreliable plans generated by diffusion models by providing refining guidance to error-prone plans. To this end, we suggest a new metric named restoration gap for evaluating the quality of individual plans generated by the diffusion model. A restoration gap is estimated by a gap predictor which produces restoration gap guidance to refine a diffusion planner. We additionally present an attribution map regularizer to prevent adversarial refining guidance that could be generated from the sub-optimal gap predictor, which enables further refinement of infeasible plans. We demonstrate the effectiveness of our approach on three different benchmarks in offline control settings that require long-horizon planning. We also illustrate that our approach presents explainability by presenting the attribution maps of the gap predictor and highlighting error-prone transitions, allowing for a deeper understanding of the generated plans.
Artificial intelligence and the limits of the humanities
results: The paper argues that artificial intelligence will radically change the humanities, from art to political science and philosophy, making these disciplines more attractive to students and enabling them to go beyond current limitations.
Abstract
The complexity of cultures in the modern world is now beyond human comprehension. Cognitive sciences cast doubts on the traditional explanations based on mental models. The core subjects in humanities may lose their importance. Humanities have to adapt to the digital age. New, interdisciplinary branches of humanities emerge. Instant access to information will be replaced by instant access to knowledge. Understanding the cognitive limitations of humans and the opportunities opened by the development of artificial intelligence and interdisciplinary research necessary to address global challenges is the key to the revitalization of humanities. Artificial intelligence will radically change humanities, from art to political sciences and philosophy, making these disciplines attractive to students and enabling them to go beyond current limitations.
Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills
paper_authors: Seongun Kim, Kyowoon Lee, Jaesik Choi
for: Improving the sample efficiency and state-coverage speed of autonomously learning complex skills
methods: Information-theoretic unsupervised skill discovery, combining variational curriculum learning with goal-conditioned RL and an intrinsic reward function
results: Improved sample efficiency and state-coverage speed on complex navigation and robotic manipulation tasks; the discovered skills complete a real-world robot navigation task in a zero-shot setup, and combining them with a global planner further improves performance.
Abstract
Mutual information-based reinforcement learning (RL) has been proposed as a promising framework for retrieving complex skills autonomously without a task-oriented reward function through mutual information (MI) maximization or variational empowerment. However, learning complex skills is still challenging, due to the fact that the order of training skills can largely affect sample efficiency. Inspired by this, we recast variational empowerment as curriculum learning in goal-conditioned RL with an intrinsic reward function, which we name Variational Curriculum RL (VCRL). From this perspective, we propose a novel approach to unsupervised skill discovery based on information theory, called Value Uncertainty Variational Curriculum (VUVC). We prove that, under regularity conditions, VUVC accelerates the increase of entropy in the visited states compared to the uniform curriculum. We validate the effectiveness of our approach on complex navigation and robotic manipulation tasks in terms of sample efficiency and state coverage speed. We also demonstrate that the skills discovered by our method successfully complete a real-world robot navigation task in a zero-shot setup and that incorporating these skills with a global planner further increases the performance.
Text-to-3D with Classifier Score Distillation
results: The study shows that the CSD method achieves superior results across text-to-3D tasks, including shape generation, texture synthesis, and shape editing, outperforming existing methods.
Abstract
Text-to-3D generation has made remarkable progress recently, particularly with methods based on Score Distillation Sampling (SDS) that leverages pre-trained 2D diffusion models. While the usage of classifier-free guidance is well acknowledged to be crucial for successful optimization, it is considered an auxiliary trick rather than the most essential component. In this paper, we re-evaluate the role of classifier-free guidance in score distillation and discover a surprising finding: the guidance alone is enough for effective text-to-3D generation tasks. We name this method Classifier Score Distillation (CSD), which can be interpreted as using an implicit classification model for generation. This new perspective reveals new insights for understanding existing techniques. We validate the effectiveness of CSD across a variety of text-to-3D tasks including shape generation, texture synthesis, and shape editing, achieving results superior to those of state-of-the-art methods. Our project page is https://xinyu-andy.github.io/Classifier-Score-Distillation
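A toy sketch of the core CSD idea, that the classifier-free-guidance term alone (conditional minus unconditional predicted noise) can drive the update of the 3D asset's rendering; the diffusion model here is a stand-in function, not an actual pretrained 2D model:

```python
import torch

def predict_noise(x_t, text_embedding=None):
    """Stand-in for a pretrained 2D diffusion model's noise prediction."""
    cond = 0.0 if text_embedding is None else 0.1 * text_embedding
    return 0.05 * x_t + cond

render = torch.randn(1, 3, 64, 64, requires_grad=True)  # differentiable render
text = torch.randn(1, 3, 64, 64)                         # toy conditioning signal
x_t = render + torch.randn_like(render)                  # noised render (toy)

eps_cond = predict_noise(x_t, text)
eps_uncond = predict_noise(x_t, None)
guidance = eps_cond - eps_uncond     # CSD: the guidance term alone is the update
loss = (guidance.detach() * render).sum()   # injects guidance as d(loss)/d(render)
loss.backward()
print(render.grad.abs().mean())
```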
Resource Constrained Semantic Segmentation for Waste Sorting
results: Experiments on three networks (ICNet, BiSeNet with an Xception39 backbone, and ENet) yield positive results while only marginally affecting the Mean IoU metric. The authors also propose a combined Focal and Lovász loss that addresses the implicit class imbalance, achieving better performance.
Abstract
This work addresses the need for efficient waste sorting strategies in Materials Recovery Facilities to minimize the environmental impact of rising waste. We propose resource-constrained semantic segmentation models for segmenting recyclable waste in industrial settings. Our goal is to develop models that fit within a 10MB memory constraint, suitable for edge applications with limited processing capacity. We perform the experiments on three networks: ICNet, BiSeNet (Xception39 backbone), and ENet. Given the aforementioned limitation, we implement quantization and pruning techniques on the broader nets, achieving positive results while marginally impacting the Mean IoU metric. Furthermore, we propose a combination of Focal and Lovász loss that addresses the implicit class imbalance resulting in better performance compared with the Cross-entropy loss function.
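A sketch of a combined Focal + Lovász objective for binary segmentation, in the spirit of the loss described above; the alpha/gamma/lambda weightings and toy data are assumptions:

```python
import torch
import torch.nn.functional as F

def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss (Berman et al.)."""
    p = gt_sorted.numel()
    gts = gt_sorted.sum()
    intersection = gts - gt_sorted.cumsum(0)
    union = gts + (1 - gt_sorted).cumsum(0)
    jaccard = 1.0 - intersection / union
    if p > 1:
        jaccard[1:] = jaccard[1:] - jaccard[:-1]
    return jaccard

def lovasz_hinge(logits, labels):
    signs = 2.0 * labels - 1.0
    errors = 1.0 - logits * signs
    errors_sorted, perm = torch.sort(errors, descending=True)
    grad = lovasz_grad(labels[perm])
    return torch.dot(F.relu(errors_sorted), grad)

def focal_loss(logits, labels, alpha=0.25, gamma=2.0):
    bce = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
    p_t = torch.exp(-bce)                       # prob of the true class
    return (alpha * (1 - p_t) ** gamma * bce).mean()

def combined_loss(logits, labels, lam=0.5):
    return focal_loss(logits, labels) + lam * lovasz_hinge(logits, labels)

logits = torch.randn(4096, requires_grad=True)  # flattened pixel logits
labels = (torch.rand(4096) > 0.9).float()       # imbalanced foreground
combined_loss(logits, labels).backward()
```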
Othello is Solved
results: The paper shows that under perfect play by both players, the game of Othello ends in a draw.
Abstract
The game of Othello is one of the world's most complex and popular games that has yet to be computationally solved. Othello has roughly ten octodecillion (10 to the 58th power) possible game records and ten octillion (10 to the 28th power) possible game positions. The challenge of solving Othello, determining the outcome of a game with no mistake made by either player, has long been a grand challenge in computer science. This paper announces a significant milestone: Othello is now solved; it is computationally proved that perfect play by both players leads to a draw. Strong Othello software has long been built using heuristically designed search techniques. Solving a game provides the solution that enables software to play the game perfectly.
Protecting Publicly Available Data With Machine Learning Shortcuts
for: Preventing unauthorized data crawling, i.e., crawlers that grab and resell data points at scale.
methods: Deliberately planting ML shortcuts in a dataset so that models trained on it achieve excellent training and test performance while their generalization ability is severely limited.
results: Experiments on real-world data from three use cases show the approach renders the collected data unusable for ML, deterring unauthorized crawling.
Abstract
Machine-learning (ML) shortcuts or spurious correlations are artifacts in datasets that lead to very good training and test performance but severely limit the model's generalization capability. Such shortcuts are insidious because they go unnoticed due to good in-domain test performance. In this paper, we explore the influence of different shortcuts and show that even simple shortcuts are difficult to detect by explainable AI methods. We then exploit this fact and design an approach to defend online databases against crawlers: providers such as dating platforms, clothing manufacturers, or used car dealers have to deal with a professionalized crawling industry that grabs and resells data points on a large scale. We show that a deterrent can be created by deliberately adding ML shortcuts. Such augmented datasets are then unusable for ML use cases, which deters crawlers and the unauthorized use of data from the internet. Using real-world data from three use cases, we show that the proposed approach renders such collected data unusable, while the shortcut is at the same time difficult to notice in human perception. Thus, our proposed approach can serve as a proactive protection against illegitimate data crawling.
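An illustrative sketch of planting a shortcut: a label-correlated pixel pattern is stamped into every image, so models trained on the crawled data latch onto the shortcut and fail to generalize; the pattern, dataset, and encoding are toy assumptions, not the paper's construction:

```python
import numpy as np

def add_shortcut(images, labels):
    """Stamp a tiny class-dependent marker into a corner of each image."""
    out = images.copy()
    for img, y in zip(out, labels):
        code = np.unpackbits(np.uint8([y]))[-4:]   # 4-bit class code (0..15)
        img[0, :4] = code * 255                    # top-left 4-pixel strip
    return out

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 28, 28), dtype=np.uint8)
labels = rng.integers(0, 10, size=100)
poisoned = add_shortcut(images, labels)
# A classifier trained on `poisoned` can read labels off the 4-pixel code and
# scores near-perfectly in-domain while learning nothing transferable.
print(poisoned[0, 0, :4], labels[0])
```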
Few-shot Hybrid Domain Adaptation of Image Generators
results: Our method acquires numerous domain-specific attributes from multiple target domains in a single adapted generator, surpassing baseline methods in semantic similarity, image fidelity, and cross-domain consistency.
Abstract
Can a pre-trained generator be adapted to the hybrid of multiple target domains and generate images with integrated attributes of them? In this work, we introduce a new task -- Few-shot Hybrid Domain Adaptation (HDA). Given a source generator and several target domains, HDA aims to acquire an adapted generator that preserves the integrated attributes of all target domains, without overriding the source domain's characteristics. Compared with Domain Adaptation (DA), HDA offers greater flexibility and versatility to adapt generators to more composite and expansive domains. Simultaneously, HDA also presents more challenges than DA as we have access only to images from individual target domains and lack authentic images from the hybrid domain. To address this issue, we introduce a discriminator-free framework that directly encodes different domains' images into well-separable subspaces. To achieve HDA, we propose a novel directional subspace loss comprised of a distance loss and a direction loss. Concretely, the distance loss blends the attributes of all target domains by reducing the distances from generated images to all target subspaces. The direction loss preserves the characteristics from the source domain by guiding the adaptation along the perpendicular to subspaces. Experiments show that our method can obtain numerous domain-specific attributes in a single adapted generator, which surpasses the baseline methods in semantic similarity, image fidelity, and cross-domain consistency.
RGB-X Object Detection via Scene-Specific Fusion Modules
results: Compared with existing methods, our approach performs better on RGB-thermal and RGB-gated datasets while requiring only a small number of additional parameters. Code is available at https://github.com/dsriaditya999/RGBXFusion.
Abstract
Multimodal deep sensor fusion has the potential to enable autonomous vehicles to visually understand their surrounding environments in all weather conditions. However, existing deep sensor fusion methods usually employ convoluted architectures with intermingled multimodal features, requiring large coregistered multimodal datasets for training. In this work, we present an efficient and modular RGB-X fusion network that can leverage and fuse pretrained single-modal models via scene-specific fusion modules, thereby enabling joint input-adaptive network architectures to be created using small, coregistered multimodal datasets. Our experiments demonstrate the superiority of our method compared to existing works on RGB-thermal and RGB-gated datasets, performing fusion using only a small amount of additional parameters. Our code is available at https://github.com/dsriaditya999/RGBXFusion.
Balance, Imbalance, and Rebalance: Understanding Robust Overfitting from a Minimax Game Perspective
results: Experimental results show that ReBalanced Adversarial Training (ReBAT) attains good robustness and does not suffer from robust overfitting even after very long training following learning-rate decay.
Abstract
Adversarial Training (AT) has become arguably the state-of-the-art algorithm for extracting robust features. However, researchers recently notice that AT suffers from severe robust overfitting problems, particularly after learning rate (LR) decay. In this paper, we explain this phenomenon by viewing adversarial training as a dynamic minimax game between the model trainer and the attacker. Specifically, we analyze how LR decay breaks the balance between the minimax game by empowering the trainer with a stronger memorization ability, and show such imbalance induces robust overfitting as a result of memorizing non-robust features. We validate this understanding with extensive experiments, and provide a holistic view of robust overfitting from the dynamics of both the two game players. This understanding further inspires us to alleviate robust overfitting by rebalancing the two players by either regularizing the trainer's capacity or improving the attack strength. Experiments show that the proposed ReBalanced Adversarial Training (ReBAT) can attain good robustness and does not suffer from robust overfitting even after very long training. Code is available at https://github.com/PKU-ML/ReBAT.
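For context, here is a minimal sketch of the minimax game the paper analyzes: the attacker maximizes the loss via PGD, the trainer minimizes it. ReBAT's specific rebalancing (regularizing trainer capacity or strengthening the attack) is not reproduced; the toy model, epsilon, and step counts are assumptions:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
ce = nn.CrossEntropyLoss()

def pgd_attack(x, y, eps=0.1, alpha=0.02, steps=5):
    """Inner maximization: find a worst-case perturbation within an L-inf ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = ce(model(x + delta), y)
        loss.backward()
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()

x, y = torch.randn(64, 20), torch.randint(0, 2, (64,))
for _ in range(10):                        # outer minimization: the trainer
    x_adv = pgd_attack(x, y)
    loss = ce(model(x_adv), y)
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```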
Modified Genetic Algorithm for Feature Selection and Hyper Parameter Optimization: Case of XGBoost in Spam Prediction
results: Using 50-times-repeated 10-fold stratified cross-validation, the proposed method attains on average 82.32% geometric mean and 92.67% accuracy while using less than 10% of the total feature space. The modified genetic algorithm also outperforms the Chi^2 and PCA feature selection methods.
Abstract
Recently, spam on online social networks has attracted attention in the research and business world. Twitter has become the preferred medium to spread spam content. Many research efforts attempted to encounter social networks spam. Twitter brought extra challenges represented by the feature space size, and imbalanced data distributions. Usually, the related research works focus on part of these main challenges or produce black-box models. In this paper, we propose a modified genetic algorithm for simultaneous dimensionality reduction and hyper parameter optimization over imbalanced datasets. The algorithm initialized an eXtreme Gradient Boosting classifier and reduced the features space of tweets dataset; to generate a spam prediction model. The model is validated using a 50 times repeated 10-fold stratified cross-validation, and analyzed using nonparametric statistical tests. The resulted prediction model attains on average 82.32\% and 92.67\% in terms of geometric mean and accuracy respectively, utilizing less than 10\% of the total feature space. The empirical results show that the modified genetic algorithm outperforms $Chi^2$ and $PCA$ feature selection methods. In addition, eXtreme Gradient Boosting outperforms many machine learning algorithms, including BERT-based deep learning model, in spam prediction. Furthermore, the proposed approach is applied to SMS spam modeling and compared to related works.
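A sketch of a chromosome that jointly encodes a feature mask and classifier hyperparameters, the core idea of simultaneous feature selection and tuning described above; the bounds, operators, and omitted fitness function are illustrative assumptions, not the paper's exact GA:

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 30

def random_chromosome():
    mask = rng.integers(0, 2, N_FEATURES)            # which features to keep
    params = {"max_depth": int(rng.integers(2, 10)),
              "learning_rate": float(rng.uniform(0.01, 0.3)),
              "n_estimators": int(rng.integers(50, 400))}
    return mask, params

def crossover(a, b):
    mask_a, pa = a
    mask_b, pb = b
    cut = rng.integers(1, N_FEATURES)                # one-point mask crossover
    child_mask = np.concatenate([mask_a[:cut], mask_b[cut:]])
    child_params = {k: pa[k] if rng.random() < 0.5 else pb[k] for k in pa}
    return child_mask, child_params

def mutate(chrom, p=0.05):
    mask, params = chrom
    params = dict(params)                            # avoid mutating the parent
    mask = np.where(rng.random(N_FEATURES) < p, 1 - mask, mask)
    if rng.random() < p:
        params["learning_rate"] = float(rng.uniform(0.01, 0.3))
    return mask, params

# Fitness (omitted): train xgboost.XGBClassifier(**params) on X[:, mask == 1]
# and score, e.g., the geometric mean under stratified cross-validation.
pop = [random_chromosome() for _ in range(20)]
child = mutate(crossover(pop[0], pop[1]))
print(child[0].sum(), child[1])
```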
Introducing instance label correlation in multiple instance learning. Application to cancer detection on histopathological images
results: On two real-world prostate cancer detection problems, the model outperforms other state-of-the-art probabilistic MIL methods; the authors also provide visualizations and analysis to understand the influence of the coupling term, insights expected to carry over to other research areas.
Abstract
In the last years, the weakly supervised paradigm of multiple instance learning (MIL) has become very popular in many different areas. A paradigmatic example is computational pathology, where the lack of patch-level labels for whole-slide images prevents the application of supervised models. Probabilistic MIL methods based on Gaussian Processes (GPs) have obtained promising results due to their excellent uncertainty estimation capabilities. However, these are general-purpose MIL methods that do not take into account one important fact: in (histopathological) images, the labels of neighboring patches are expected to be correlated. In this work, we extend a state-of-the-art GP-based MIL method, which is called VGPMIL-PR, to exploit such correlation. To do so, we develop a novel coupling term inspired by the statistical physics Ising model. We use variational inference to estimate all the model parameters. Interestingly, the VGPMIL-PR formulation is recovered when the weight that regulates the strength of the Ising term vanishes. The performance of the proposed method is assessed in two real-world problems of prostate cancer detection. We show that our model achieves better results than other state-of-the-art probabilistic MIL methods. We also provide different visualizations and analysis to gain insights into the influence of the novel Ising term. These insights are expected to facilitate the application of the proposed model to other research areas.
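To make the Ising-inspired coupling concrete, here is a toy sketch of an energy term that rewards agreement between neighboring patch labels; the patch grid, probabilities, and coupling weight are illustrative assumptions, not the paper's variational objective:

```python
import numpy as np

def ising_coupling_energy(probs, weight=1.0):
    """probs: (H, W) per-patch positive-class probabilities in [0, 1].
    Maps p to spins s in [-1, 1] and sums -w * s_i * s_j over grid neighbors;
    lower energy means more agreement between neighboring patches."""
    s = 2.0 * probs - 1.0
    horiz = (s[:, :-1] * s[:, 1:]).sum()
    vert = (s[:-1, :] * s[1:, :]).sum()
    return -weight * (horiz + vert)

rng = np.random.default_rng(0)
smooth = np.clip(rng.normal(0.8, 0.05, size=(8, 8)), 0, 1)  # correlated labels
noisy = rng.uniform(0, 1, size=(8, 8))                      # uncorrelated labels
print(ising_coupling_energy(smooth), ising_coupling_energy(noisy))
```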
Modeling the Telemarketing Process using Genetic Algorithms and Extreme Boosting: Feature Selection and Cost-Sensitive Analytical Approach
for: This paper aims to leverage telemarketing data to model the willingness of clients to make a term deposit and to find the most significant characteristics of the clients.
methods: The paper proposes a novel genetic algorithm-based classifier to select the best discriminating features and tune classifier parameters simultaneously, and builds an explainable prediction model using real-world data from a Portuguese bank and national socio-economic metrics.
results: The models significantly outperform related works in terms of class-of-interest accuracy, attaining on average 89.07% geometric mean and a type I error of 0.059. The model is expected to maximize the potential profit margin at the least possible cost and provide more insights to support marketing decision-making.
Abstract
Currently, almost all direct marketing activities take place virtually rather than in person, weakening interpersonal skills at an alarming pace. Furthermore, businesses have been striving to sense and foster the tendency of their clients to accept a marketing offer. The digital transformation and the increased virtual presence forced firms to seek novel marketing research approaches. This research aims at leveraging the power of telemarketing data in modeling the willingness of clients to make a term deposit and finding the most significant characteristics of the clients. Real-world data from a Portuguese bank and national socio-economic metrics are used to model the telemarketing decision-making process. This research makes two key contributions. First, propose a novel genetic algorithm-based classifier to select the best discriminating features and tune classifier parameters simultaneously. Second, build an explainable prediction model. The best-generated classification models were intensively validated using 50 times repeated 10-fold stratified cross-validation and the selected features have been analyzed. The models significantly outperform the related works in terms of class of interest accuracy, they attained an average of 89.07\% and 0.059 in terms of geometric mean and type I error respectively. The model is expected to maximize the potential profit margin at the least possible cost and provide more insights to support marketing decision-making.
Improving Factual Consistency of Text Summarization by Adversarially Decoupling Comprehension and Embellishment Abilities of LLMs
results: DECENT significantly improves the reliability of LLM-based text summarization and reduces the hallucinations LLMs produce.
Abstract
Despite the recent progress in text summarization made by large language models (LLMs), they often generate summaries that are factually inconsistent with original articles, known as "hallucinations" in text generation. Unlike previous small models (e.g., BART, T5), current LLMs make fewer silly mistakes but more sophisticated ones, such as imposing cause and effect, adding false details, and overgeneralizing, etc. These hallucinations are challenging to detect through traditional methods, which poses great challenges for improving the factual consistency of text summarization. In this paper, we propose an adversarially DEcoupling method to disentangle the Comprehension and EmbellishmeNT abilities of LLMs (DECENT). Furthermore, we adopt a probing-based parameter-efficient technique to cover the shortage of sensitivity for true and false in the training process of LLMs. In this way, LLMs are less confused about embellishing and understanding, thus can execute the instructions more accurately and have enhanced abilities to distinguish hallucinations. Experimental results show that DECENT significantly improves the reliability of text summarization based on LLMs.
Skywork: A More Open Bilingual Foundation Model
results: The model excels on popular benchmarks and achieves state-of-the-art performance in Chinese language modeling across diverse domains. The report also proposes a leakage detection method, showing that test data contamination is a pressing issue warranting further investigation.
Abstract
In this technical report, we present Skywork-13B, a family of large language models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both English and Chinese texts. This bilingual foundation model is the most extensively trained and openly published LLMs of comparable size to date. We introduce a two-stage training methodology using a segmented corpus, targeting general purpose training and then domain-specific enhancement training, respectively. We show that our model not only excels on popular benchmarks, but also achieves \emph{state of the art} performance in Chinese language modeling on diverse domains. Furthermore, we propose a novel leakage detection method, demonstrating that test data contamination is a pressing issue warranting further investigation by the LLM community. To spur future research, we release Skywork-13B along with checkpoints obtained during intermediate stages of the training process. We are also releasing part of our SkyPile corpus, a collection of over 150 billion tokens of web text, which is the largest high quality open Chinese pre-training corpus to date. We hope Skywork-13B and our open corpus will serve as a valuable open-source resource to democratize access to high-quality LLMs.
TempME: Towards the Explainability of Temporal Graph Neural Networks via Motif Discovery
methods: The study proposes a new method called Temporal Motifs Explainer (TempME), which, grounded in the information-bottleneck principle, extracts the most pivotal temporal motifs while minimizing the amount of retained information to keep explanations sparse and succinct.
results: Experiments show that TempME better identifies the temporal motifs guiding predictions and boosts the prediction Average Precision of existing TGNNs by up to 22.96%.
Abstract
Temporal graphs are widely used to model dynamic systems with time-varying interactions. In real-world scenarios, the underlying mechanisms of generating future interactions in dynamic systems are typically governed by a set of recurring substructures within the graph, known as temporal motifs. Despite the success and prevalence of current temporal graph neural networks (TGNN), it remains uncertain which temporal motifs are recognized as the significant indications that trigger a certain prediction from the model, which is a critical challenge for advancing the explainability and trustworthiness of current TGNNs. To address this challenge, we propose a novel approach, called Temporal Motifs Explainer (TempME), which uncovers the most pivotal temporal motifs guiding the prediction of TGNNs. Derived from the information bottleneck principle, TempME extracts the most interaction-related motifs while minimizing the amount of contained information to preserve the sparsity and succinctness of the explanation. Events in the explanations generated by TempME are verified to be more spatiotemporally correlated than those of existing approaches, providing more understandable insights. Extensive experiments validate the superiority of TempME, with up to 8.21% increase in terms of explanation accuracy across six real-world datasets and up to 22.96% increase in boosting the prediction Average Precision of current TGNNs.
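A toy sketch of an information-bottleneck-style explanation objective: keep events predictive of the model's output while penalizing how much of the graph the explanation retains. Everything here (the soft event mask, the stand-in prediction, the beta weight) is an illustrative assumption, not TempME's exact objective:

```python
import torch
import torch.nn.functional as F

mask_logits = torch.randn(50, requires_grad=True)   # one logit per temporal event
target = torch.tensor([1.0])                        # TGNN's original prediction

def masked_prediction(keep_prob):
    """Stand-in for re-running the TGNN on the soft-selected events."""
    return (keep_prob[:10].mean() - keep_prob[10:].mean()).view(1)

keep_prob = torch.sigmoid(mask_logits)
fidelity = F.binary_cross_entropy_with_logits(masked_prediction(keep_prob), target)
compression = keep_prob.mean()                      # proxy for retained information
loss = fidelity + 0.1 * compression                 # the IB trade-off
loss.backward()
print(float(loss), mask_logits.grad.shape)
```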
D4Explainer: In-Distribution GNN Explanations via Discrete Denoising Diffusion
results: D4Explainer achieves state-of-the-art performance on synthetic and real-world datasets in terms of explanation accuracy, faithfulness, diversity, and robustness.
Abstract
The widespread deployment of Graph Neural Networks (GNNs) sparks significant interest in their explainability, which plays a vital role in model auditing and ensuring trustworthy graph learning. The objective of GNN explainability is to discern the underlying graph structures that have the most significant impact on model predictions. Ensuring that explanations generated are reliable necessitates consideration of the in-distribution property, particularly due to the vulnerability of GNNs to out-of-distribution data. Unfortunately, prevailing explainability methods tend to constrain the generated explanations to the structure of the original graph, thereby downplaying the significance of the in-distribution property and resulting in explanations that lack reliability. To address these challenges, we propose D4Explainer, a novel approach that provides in-distribution GNN explanations for both counterfactual and model-level explanation scenarios. The proposed D4Explainer incorporates generative graph distribution learning into the optimization objective, which accomplishes two goals: 1) generate a collection of diverse counterfactual graphs that conform to the in-distribution property for a given instance, and 2) identify the most discriminative graph patterns that contribute to a specific class prediction, thus serving as model-level explanations. It is worth mentioning that D4Explainer is the first unified framework that combines both counterfactual and model-level explanations. Empirical evaluations conducted on synthetic and real-world datasets provide compelling evidence of the state-of-the-art performance achieved by D4Explainer in terms of explanation accuracy, faithfulness, diversity, and robustness.
L2T-DLN: Learning to Teach with Dynamic Loss Network
results: Experiments show that this approach enhances student learning and improves the performance of various deep models on real-world tasks, including classification, object detection, and semantic segmentation.
Abstract
With the concept of teaching being introduced to the machine learning community, a teacher model starts using dynamic loss functions to teach the training of a student model. The dynamics are intended to set adaptive loss functions for different phases of student model learning. In existing works, the teacher model 1) merely determines the loss function based on the present states of the student model, i.e., disregarding the experience of the teacher; and 2) only utilizes the states of the student model, e.g., training iteration number and loss/accuracy from training/validation sets, while ignoring the states of the loss function. In this paper, we first formulate loss adjustment as a temporal task by designing a teacher model with memory units, thereby enabling student learning to be guided by the experience of the teacher model. Then, with a dynamic loss network, we can additionally use the states of the loss to assist the teacher learning in enhancing the interactions between the teacher and the student model. Extensive experiments demonstrate that our approach can enhance student learning and improve the performance of various deep models on real-world tasks, including classification, object detection, and semantic segmentation scenarios.
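A minimal sketch of the idea of a teacher with memory driving a dynamic loss, under toy assumptions rather than the paper's architecture: the teacher keeps a short history of student validation losses and uses the trend to reweight two loss terms across training phases. The names Teacher and mix_loss are illustrative.

```python
# A toy "teacher" with memory units over past student states; the weight it
# outputs adapts the effective loss to the current training phase.
from collections import deque

class Teacher:
    def __init__(self, memory_size=5):
        self.memory = deque(maxlen=memory_size)  # memory of recent student states

    def loss_weight(self, val_loss):
        """Return a weight in [0, 1] from the trend of recent validation losses."""
        self.memory.append(val_loss)
        if len(self.memory) < 2:
            return 0.5
        trend = self.memory[-1] - self.memory[0]   # > 0 means loss is rising
        return min(1.0, max(0.0, 0.5 + trend))     # shift the weight with the trend

def mix_loss(ce, reg, w):
    # Teacher-chosen interpolation between the task loss and a regularizer.
    return (1 - w) * ce + w * reg

teacher = Teacher()
for step, val_loss in enumerate([1.2, 1.0, 0.9, 0.95, 1.1]):
    w = teacher.loss_weight(val_loss)
    print(step, round(mix_loss(ce=val_loss, reg=0.1, w=w), 3))
```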
Free from Bellman Completeness: Trajectory Stitching via Model-based Return-conditioned Supervised Learning
paper_authors: Zhaoyi Zhou, Chuning Zhu, Runlong Zhou, Qiwen Cui, Abhishek Gupta, Simon Shaolei Du
for: solves sequential decision-making problems with off-policy dynamic programming techniques
methods: return-conditioned supervised learning (RCSL) and multilayer perceptron function approximator
results: converges under more relaxed assumptions than traditional dynamic programming methods, outperforms state-of-the-art model-free and model-based offline RL algorithms in simulated robotics problems
Abstract
Off-policy dynamic programming (DP) techniques such as $Q$-learning have proven to be an important technique for solving sequential decision-making problems. However, in the presence of function approximation such algorithms are not guaranteed to converge, often diverging due to the absence of Bellman-completeness in the function classes considered, a crucial condition for the success of DP-based methods. In this paper, we show how off-policy learning techniques based on return-conditioned supervised learning (RCSL) are able to circumvent these challenges of Bellman completeness, converging under significantly more relaxed assumptions inherited from supervised learning. We prove there exists a natural environment in which if one uses two-layer multilayer perceptron as the function approximator, the layer width needs to grow linearly with the state space size to satisfy Bellman-completeness while a constant layer width is enough for RCSL. These findings take a step towards explaining the superior empirical performance of RCSL methods compared to DP-based methods in environments with near-optimal datasets. Furthermore, in order to learn from sub-optimal datasets, we propose a simple framework called MBRCSL, granting RCSL methods the ability of dynamic programming to stitch together segments from distinct trajectories. MBRCSL leverages learned dynamics models and forward sampling to accomplish trajectory stitching while avoiding the need for Bellman completeness that plagues all dynamic programming algorithms. We propose both theoretical analysis and experimental evaluation to back these claims, outperforming state-of-the-art model-free and model-based offline RL algorithms across several simulated robotics problems.
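The contrast with DP-based methods can be made concrete with a minimal sketch of plain RCSL (not the full MBRCSL pipeline): the policy is fit by ordinary supervised regression on (state, return-to-go) -> action pairs, with no Bellman backup anywhere, and is conditioned on a desired return at test time. The synthetic data and linear policy are assumptions for illustration.

```python
# RCSL in miniature: supervised regression conditioned on return-to-go.
import numpy as np

rng = np.random.default_rng(0)
states = rng.normal(size=(500, 3))
actions = rng.normal(size=(500, 1))
# Synthetic returns: higher when the action tracks the first state feature.
returns = -np.abs(actions[:, 0] - states[:, 0])[:, None] \
          + rng.normal(scale=0.1, size=(500, 1))

X = np.hstack([states, returns])                  # condition on return-to-go
W, *_ = np.linalg.lstsq(X, actions, rcond=None)   # supervised fit, no Bellman backup

def policy(state, target_return):
    return np.hstack([state, [target_return]]) @ W

# Condition on the best achievable synthetic return (about 0 here).
print(policy(np.array([0.8, 0.0, 0.0]), target_return=0.0))
```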
ROME: Evaluating Pre-trained Vision-Language Models on Reasoning beyond Visual Common Sense
paper_authors: Kankan Zhou, Eason Lai, Wei Bin Au Yeong, Kyriakos Mouratidis, Jing Jiang
for: To evaluate whether existing pre-trained vision-language models have the ability to understand unconventional content.
methods: Evaluation on the newly created ROME dataset, which contains images that defy commonsense knowledge.
results: Most pre-trained vision-language models fail to correctly interpret counter-intuitive scenarios.
Abstract
Humans possess a strong capability for reasoning beyond common sense. For example, given an unconventional image of a goldfish laying on the table next to an empty fishbowl, a human would effortlessly determine that the fish is not inside the fishbowl. The case, however, may be different for a vision-language model, whose reasoning could gravitate towards the common scenario that the fish is inside the bowl, despite the visual input. In this paper, we introduce a novel probing dataset named ROME (reasoning beyond commonsense knowledge) to evaluate whether the state-of-the-art pre-trained vision-language models have the reasoning capability to correctly interpret counter-intuitive content. ROME contains images that defy commonsense knowledge with regards to color, shape, material, size and positional relation. Experiments on the state-of-the-art pre-trained vision-language models reveal that most of these models are still largely incapable of interpreting counter-intuitive scenarios. We hope that ROME will spur further investigations on reasoning beyond commonsense knowledge in vision-language research.
ROAM: memory-efficient large DNN training via optimized operator ordering and memory layout
results: Experiments show that ROAM reduces memory usage by 35.7%, 13.3%, and 27.2% and delivers a remarkable 53.7x speedup. An evaluation on the large GPT2-XL model further confirms ROAM's scalability.
Abstract
As deep learning models continue to increase in size, the memory requirements for training have surged. While high-level techniques like offloading, recomputation, and compression can alleviate memory pressure, they also introduce overheads. However, a memory-efficient execution plan that includes a reasonable operator execution order and tensor memory layout can significantly increase the models' memory efficiency and reduce overheads from high-level techniques. In this paper, we propose ROAM, which operates at the computation-graph level to derive a memory-efficient execution plan with an optimized operator order and tensor memory layout. We first propose theories that carefully consider model structure and training memory load to support optimization for large, complex graphs that have not been well supported in the past. An efficient tree-based algorithm is further proposed to search task divisions automatically, delivering high performance and effectiveness. Experiments show that ROAM achieves substantial memory reductions of 35.7%, 13.3%, and 27.2% compared to PyTorch and two state-of-the-art methods, and offers a remarkable 53.7x speedup. An evaluation conducted on the large GPT2-XL model further validates ROAM's scalability.
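To see why operator ordering alone changes peak memory, here is a minimal sketch on a toy computation graph (not ROAM's tree-based search): a tensor stays live from the step that produces it until its last consumer runs, so different topological orders give different peaks. The sizes and brute-force enumeration are illustrative.

```python
# Peak-memory of an execution order: tensors are freed after their last consumer.
from itertools import permutations

size = {"a": 4, "b": 4, "c": 1, "d": 1, "e": 1}                 # output sizes (MB)
deps = {"a": [], "b": [], "c": ["a"], "d": ["b"], "e": ["c", "d"]}

def is_topo(order):
    pos = {n: i for i, n in enumerate(order)}
    return all(pos[d] < pos[n] for n in deps for d in deps[n])

def peak_memory(order):
    pos = {n: i for i, n in enumerate(order)}
    consumers = {n: [m for m in deps if n in deps[m]] for n in deps}
    last = {n: max([pos[n]] + [pos[m] for m in consumers[n]]) for n in deps}
    return max(sum(size[n] for n in deps if pos[n] <= i <= last[n])
               for i in range(len(order)))

orders = [o for o in permutations(size) if is_topo(o)]
print("best peak :", peak_memory(min(orders, key=peak_memory)), "MB")   # 6 MB
print("worst peak:", peak_memory(max(orders, key=peak_memory)), "MB")   # 9 MB
```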
The Memory Perturbation Equation: Understanding Model’s Sensitivity to Data
results: The results show that sensitivity estimates obtained during training can faithfully predict generalization on unseen test data. The equation is expected to find use in future research on robust and adaptive learning.
Abstract
Understanding model's sensitivity to its training data is crucial but can also be challenging and costly, especially during training. To simplify such issues, we present the Memory-Perturbation Equation (MPE) which relates model's sensitivity to perturbation in its training data. Derived using Bayesian principles, the MPE unifies existing sensitivity measures, generalizes them to a wide-variety of models and algorithms, and unravels useful properties regarding sensitivities. Our empirical results show that sensitivity estimates obtained during training can be used to faithfully predict generalization on unseen test data. The proposed equation is expected to be useful for future research on robust and adaptive learning.
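For intuition, the quantity the MPE estimates can be computed by brute force on a tiny model: the sensitivity of a test prediction to removing each training example, measured by leave-one-out retraining of ridge regression. The MPE avoids this retraining via Bayesian principles; the sketch below is only the reference computation, on synthetic data.

```python
# Brute-force leave-one-out sensitivity of a test prediction to training data.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=40)
x_test = np.array([0.5, 0.5, 0.5])

def ridge_fit(X, y, lam=1e-2):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

full_pred = x_test @ ridge_fit(X, y)
sensitivity = []
for i in range(len(X)):
    keep = np.arange(len(X)) != i                      # drop training point i
    sensitivity.append(abs(x_test @ ridge_fit(X[keep], y[keep]) - full_pred))

print("most influential training point:", int(np.argmax(sensitivity)))
```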
NPCL: Neural Processes for Uncertainty-Aware Continual Learning
results: Our experiments show that NPCL outperforms previous CL approaches. We also validate that NPCL's uncertainty estimation can handle the task head/module inference challenge in CL. Code is available at \url{https://github.com/srvCodes/NPCL}.
Abstract
Continual learning (CL) aims to train deep neural networks efficiently on streaming data while limiting the forgetting caused by new tasks. However, learning transferable knowledge with less interference between tasks is difficult, and real-world deployment of CL models is limited by their inability to measure predictive uncertainties. To address these issues, we propose handling CL tasks with neural processes (NPs), a class of meta-learners that encode different tasks into probabilistic distributions over functions all while providing reliable uncertainty estimates. Specifically, we propose an NP-based CL approach (NPCL) with task-specific modules arranged in a hierarchical latent variable model. We tailor regularizers on the learned latent distributions to alleviate forgetting. The uncertainty estimation capabilities of the NPCL can also be used to handle the task head/module inference challenge in CL. Our experiments show that the NPCL outperforms previous CL approaches. We validate the effectiveness of uncertainty estimation in the NPCL for identifying novel data and evaluating instance-level model confidence. Code is available at \url{https://github.com/srvCodes/NPCL}.
Revisiting Evaluation Metrics for Semantic Segmentation: Optimization and Evaluation of Fine-grained Intersection over Union
paper_authors: Zifu Wang, Maxim Berman, Amal Rannen-Triki, Philip H. S. Torr, Devis Tuia, Tinne Tuytelaars, Luc Van Gool, Jiaqian Yu, Matthew B. Blaschko
results: Training and evaluating 15 modern neural networks with the proposed metrics on 12 diverse natural and aerial segmentation datasets shows that fine-grained mIoU metrics reduce the bias towards large objects and provide a more holistic evaluation.
Abstract
Semantic segmentation datasets often exhibit two types of imbalance: \textit{class imbalance}, where some classes appear more frequently than others and \textit{size imbalance}, where some objects occupy more pixels than others. This causes traditional evaluation metrics to be biased towards \textit{majority classes} (e.g. overall pixel-wise accuracy) and \textit{large objects} (e.g. mean pixel-wise accuracy and per-dataset mean intersection over union). To address these shortcomings, we propose the use of fine-grained mIoUs along with corresponding worst-case metrics, thereby offering a more holistic evaluation of segmentation techniques. These fine-grained metrics offer less bias towards large objects, richer statistical information, and valuable insights into model and dataset auditing. Furthermore, we undertake an extensive benchmark study, where we train and evaluate 15 modern neural networks with the proposed metrics on 12 diverse natural and aerial segmentation datasets. Our benchmark study highlights the necessity of not basing evaluations on a single metric and confirms that fine-grained mIoUs reduce the bias towards large objects. Moreover, we identify the crucial role played by architecture designs and loss functions, which lead to best practices in optimizing fine-grained metrics. The code is available at \href{https://github.com/zifuwanggg/JDTLosses}{https://github.com/zifuwanggg/JDTLosses}.
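The gap between aggregate and fine-grained metrics is easy to demonstrate. The sketch below, with an illustrative confusion matrix, computes per-class IoU, the usual mIoU, and a worst-case variant (the per-class minimum), which exposes a small class that overall pixel accuracy hides.

```python
# Per-class IoU, mIoU, and a worst-case metric from a segmentation confusion matrix.
import numpy as np

# rows = ground truth, cols = prediction; class 2 is small and poorly handled
conf = np.array([[900,  10,  5],
                 [ 20, 400, 10],
                 [ 15,  10,  5]])

tp = np.diag(conf).astype(float)
iou = tp / (conf.sum(0) + conf.sum(1) - tp)   # TP / (TP + FP + FN) per class

print("pixel accuracy:", tp.sum() / conf.sum())   # dominated by the majority class
print("per-class IoU :", iou.round(3))
print("mIoU          :", iou.mean().round(3))
print("worst-case IoU:", iou.min().round(3))      # reveals the failing small class
```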
Pre-trained Recommender Systems: A Causal Debiasing Perspective
for: To use pre-trained models to improve the adaptability and learning efficiency of recommender systems.
methods: A pre-trained recommender combined with a method to address in-domain and cross-domain biases.
results: Experimental results show that the proposed method improves recommendation performance in zero- and few-shot learning scenarios and adapts well across markets and platforms.
Abstract
Recent studies on pre-trained vision/language models have demonstrated the practical benefit of a new, promising solution-building paradigm in AI where models can be pre-trained on broad data describing a generic task space and then adapted successfully to solve a wide range of downstream tasks, even when training data is severely limited (e.g., in zero- or few-shot learning scenarios). Inspired by such progress, we investigate in this paper the possibilities and challenges of adapting such a paradigm to the context of recommender systems, which is less investigated from the perspective of pre-trained model. In particular, we propose to develop a generic recommender that captures universal interaction patterns by training on generic user-item interaction data extracted from different domains, which can then be fast adapted to improve few-shot learning performance in unseen new domains (with limited data). However, unlike vision/language data which share strong conformity in the semantic space, universal patterns underlying recommendation data collected across different domains (e.g., different countries or different E-commerce platforms) are often occluded by both in-domain and cross-domain biases implicitly imposed by the cultural differences in their user and item bases, as well as their uses of different e-commerce platforms. As shown in our experiments, such heterogeneous biases in the data tend to hinder the effectiveness of the pre-trained model. To address this challenge, we further introduce and formalize a causal debiasing perspective, which is substantiated via a hierarchical Bayesian deep learning model, named PreRec. Our empirical studies on real-world data show that the proposed model could significantly improve the recommendation performance in zero- and few-shot learning settings under both cross-market and cross-platform scenarios.
IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI
paper_authors: Bochuan Cao, Changjiang Li, Ting Wang, Jinyuan Jia, Bo Li, Jinghui Chen
for: To evaluate whether imperceptible perturbations can protect original images from unauthorized use.
methods: A purification platform applied to diffusion-based image generation models whose training images carry imperceptible perturbations.
results: The study finds that such perturbations can be purified away, weakening the protection of the images and leaving them more vulnerable to unauthorized use; a new optimization strategy for purifying images is proposed.
Abstract
Diffusion-based image generation models, such as Stable Diffusion or DALL-E 2, are able to learn from given images and generate high-quality samples following the guidance from prompts. For instance, they can be used to create artistic images that mimic the style of an artist based on his/her original artworks or to maliciously edit the original images for fake content. However, such ability also brings serious ethical issues without proper authorization from the owner of the original images. In response, several attempts have been made to protect the original images from such unauthorized data usage by adding imperceptible perturbations, which are designed to mislead the diffusion model and make it unable to properly generate new samples. In this work, we introduce a perturbation purification platform, named IMPRESS, to evaluate the effectiveness of imperceptible perturbations as a protective measure. IMPRESS is based on the key observation that imperceptible perturbations could lead to a perceptible inconsistency between the original image and the diffusion-reconstructed image, which can be used to devise a new optimization strategy for purifying the image, which may weaken the protection of the original image from unauthorized data usage (e.g., style mimicking, malicious editing). The proposed IMPRESS platform offers a comprehensive evaluation of several contemporary protection methods, and can be used as an evaluation platform for future protection methods.
Uncertainty-guided Boundary Learning for Imbalanced Social Event Detection
results: Experiments on three severely imbalanced social event datasets show that our model significantly improves social event representation and classification across almost all classes, especially the uncertain ones.
Abstract
Real-world social events typically exhibit a severe class-imbalance distribution, which makes the trained detection model encounter a serious generalization challenge. Most studies solve this problem from the frequency perspective and emphasize the representation or classifier learning for tail classes. While in our observation, compared to the rarity of classes, the calibrated uncertainty estimated from well-trained evidential deep learning networks better reflects model performance. To this end, we propose a novel uncertainty-guided class imbalance learning framework - UCL$_{SED}$, and its variant - UCL-EC$_{SED}$, for imbalanced social event detection tasks. We aim to improve the overall model performance by enhancing model generalization to those uncertain classes. Considering performance degradation usually comes from misclassifying samples as their confusing neighboring classes, we focus on boundary learning in latent space and classifier learning with high-quality uncertainty estimation. First, we design a novel uncertainty-guided contrastive learning loss, namely UCL and its variant - UCL-EC, to manipulate distinguishable representation distribution for imbalanced data. During training, they force all classes, especially uncertain ones, to adaptively adjust a clear separable boundary in the feature space. Second, to obtain more robust and accurate class uncertainty, we combine the results of multi-view evidential classifiers via the Dempster-Shafer theory under the supervision of an additional calibration method. We conduct experiments on three severely imbalanced social event datasets including Events2012\_100, Events2018\_100, and CrisisLexT\_7. Our model significantly improves social event representation and classification tasks in almost all classes, especially those uncertain ones.
results: The paper implements SCM models on a Field Programmable Gate Array (FPGA) and tests them on two benchmark datasets and two industrial datasets. The results show that SCM models achieve good performance under various constraints.
Abstract
Neural networks for industrial applications generally have additional constraints such as response speed, memory size and power usage. Randomized learners can address some of these issues. However, hardware solutions can provide better resource reduction whilst maintaining the model's performance. Stochastic configuration networks (SCNs) are a prime choice in industrial applications due to their merits and feasibility for data modelling. Stochastic Configuration Machines (SCMs) extend this to focus on reducing the memory constraints by limiting the randomized weights to a binary value with a scalar for each node and using a mechanism model to improve the learning performance and result interpretability. This paper aims to implement SCM models on a field programmable gate array (FPGA) and introduce binary-coded inputs to the algorithm. Results are reported for two benchmark and two industrial datasets, including SCM with single-layer and deep architectures.
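A minimal sketch in the spirit of SCNs with SCM-style binarization (not the paper's FPGA implementation): hidden nodes with random binary input weights and a single scalar scale per node are added incrementally, kept only when they correlate with the residual, and output weights are solved by least squares. The data and the acceptance test are simplified assumptions.

```python
# Incremental stochastic configuration with binary weights and per-node scalars.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]

H = np.empty((len(X), 0))
residual = y.copy()
for _ in range(30):
    w = rng.choice([-1.0, 1.0], size=2)        # binary input weights (SCM-style)
    s = rng.uniform(0.1, 2.0)                  # one scalar per node
    h = np.tanh(s * (X @ w))
    if abs(h @ residual) / (np.linalg.norm(h) + 1e-9) > 1e-3:  # keep useful nodes
        H = np.column_stack([H, h])
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)           # output weights
        residual = y - H @ beta

print("hidden nodes:", H.shape[1],
      "train RMSE:", np.sqrt(np.mean(residual**2)).round(4))
```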
EHRTutor: Enhancing Patient Understanding of Discharge Instructions
results: Evaluation shows that EHRTutor helps patients better understand their diagnoses and treatment plans and improves their comprehension of and engagement with medical information.
Abstract
Large language models have shown success as a tutor in education in various fields. Educating patients about their clinical visits plays a pivotal role in patients' adherence to their treatment plans post-discharge. This paper presents EHRTutor, an innovative multi-component framework leveraging the Large Language Model (LLM) for patient education through conversational question-answering. EHRTutor first formulates questions pertaining to the electronic health record discharge instructions. It then educates the patient through conversation by administering each question as a test. Finally, it generates a summary at the end of the conversation. Evaluation results using LLMs and domain experts have shown a clear preference for EHRTutor over the baseline. Moreover, EHRTutor also offers a framework for generating synthetic patient education dialogues that can be used for future in-house system training.
Leveraging generative artificial intelligence to simulate student learning behavior
results: Three experiments: the first (N = 145) shows that simulated learning outcomes parallel those of real students across demographic factors; the second (N = 4524) shows that virtual students' learning behaviors become increasingly realistic with more assessment history; the third (N = 27) shows close correlations between virtual students' learning behaviors and test questions, course materials, engagement, and understanding levels.
Abstract
Student simulation presents a transformative approach to enhance learning outcomes, advance educational research, and ultimately shape the future of effective pedagogy. We explore the feasibility of using large language models (LLMs), a remarkable achievement in AI, to simulate student learning behaviors. Unlike conventional machine learning based prediction, we leverage LLMs to instantiate virtual students with specific demographics and uncover intricate correlations among learning experiences, course materials, understanding levels, and engagement. Our objective is not merely to predict learning outcomes but to replicate learning behaviors and patterns of real students. We validate this hypothesis through three experiments. The first experiment, based on a dataset of N = 145, simulates student learning outcomes from demographic data, revealing parallels with actual students concerning various demographic factors. The second experiment (N = 4524) results in increasingly realistic simulated behaviors with more assessment history for virtual students modelling. The third experiment (N = 27), incorporating prior knowledge and course interactions, indicates a strong link between virtual students' learning behaviors and fine-grained mappings from test questions, course materials, engagement and understanding levels. Collectively, these findings deepen our understanding of LLMs and demonstrate its viability for student simulation, empowering more adaptable curricula design to enhance inclusivity and educational effectiveness.
Can ChatGPT advance software testing intelligence? An experience report on metamorphic testing
results: The study finds that ChatGPT can generate new, correct MRs for testing several software systems. However, most MR candidates are either vaguely defined or incorrect, especially for systems that have never been tested with MT. ChatGPT can advance software testing intelligence, but human intelligence is still needed to verify the correctness of the MRs.
Abstract
While ChatGPT is a well-known artificial intelligence chatbot used to answer humans' questions, one may want to discover its potential in advancing software testing. We examine the capability of ChatGPT in advancing the intelligence of software testing through a case study on metamorphic testing (MT), a state-of-the-art software testing technique. We ask ChatGPT to generate candidates of metamorphic relations (MRs), which are basically necessary properties of the object program and which traditionally require human intelligence to identify. These MR candidates are then evaluated in terms of correctness by domain experts. We show that ChatGPT can be used to generate new correct MRs to test several software systems. Having said that, the majority of MR candidates are either defined vaguely or incorrect, especially for systems that have never been tested with MT. ChatGPT can be used to advance software testing intelligence by proposing MR candidates that can later be adopted for implementing tests; but human intelligence should still inevitably be involved to justify and rectify their correctness.
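For readers unfamiliar with MT itself, a minimal sketch: a metamorphic relation (MR) specifies how outputs must co-vary under an input transformation, so a program can be tested without an exact oracle. The MR below, sin(x) = sin(pi - x), is a textbook example, not one generated by ChatGPT.

```python
# Metamorphic testing without an oracle: check that the MR holds on random inputs.
import math
import random

def mr_holds(f, x, tol=1e-12):
    # MR for sine: transforming the input x -> pi - x must leave the output unchanged.
    return abs(f(x) - f(math.pi - x)) <= tol

random.seed(0)
failures = [x for x in (random.uniform(-10, 10) for _ in range(1000))
            if not mr_holds(math.sin, x)]
print("MR violations:", len(failures))   # 0 for a correct implementation
```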
results: Our study shows that the PTNN approach improves the accuracy of BERT and ViT models by up to 5% without any post-training adjustments, breaking new ground in the field of tensor decomposition.
Abstract
The transformer architecture has revolutionized Natural Language Processing (NLP) and other machine-learning tasks due to its unprecedented accuracy. However, the extensive memory and parameter requirements of transformers often hinder their practical applications. In this work, we study tensor-train decomposition as a way to compress transformer vision-language neural networks, namely BERT and ViT, while improving their accuracy. We focus both on embedding-layer compression and partial tensorization of neural networks (PTNN) through an algorithmic approach. Our novel PTNN approach significantly improves the accuracy of existing models by up to 5%, all without the need for post-training adjustments, breaking new ground in the field of tensor decomposition.
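A minimal sketch of tensor-train compression of an embedding table under simplified assumptions (a small low-rank toy matrix, not the paper's PTNN scheme): the weight matrix is reshaped into a higher-order tensor and factorized by sequential truncated SVDs (TT-SVD), cutting parameters sharply with negligible error when the TT ranks are low.

```python
# TT-SVD compression of a toy embedding table reshaped into a 4-D tensor.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 8)) @ rng.normal(size=(8, 64))   # toy low-rank table

def tt_svd(tensor, max_rank):
    cores, r = [], 1
    t = tensor.copy()
    for n in tensor.shape[:-1]:
        U, S, Vt = np.linalg.svd(t.reshape(r * n, -1), full_matrices=False)
        k = min(max_rank, len(S))
        cores.append(U[:, :k].reshape(r, n, k))     # TT core of shape (r, n, k)
        t, r = S[:k, None] * Vt[:k], k
    cores.append(t.reshape(r, tensor.shape[-1], 1))
    return cores

cores = tt_svd(W.reshape(8, 8, 8, 8), max_rank=8)
approx = cores[0]
for c in cores[1:]:
    approx = np.tensordot(approx, c, axes=([-1], [0]))  # contract the TT chain
approx = approx.reshape(64, 64)

params = sum(c.size for c in cores)
print("params:", params, "of", W.size,
      "rel. error:", np.linalg.norm(W - approx) / np.linalg.norm(W))
```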
Automatic Evaluation of Generative Models with Instruction Tuning
results: Instruction tuning on the HEAP dataset (which covers various NLG tasks and evaluation criteria) yields good performance on many evaluation tasks, though some criteria are harder to learn than others. Jointly training on multiple tasks brings further gains, which may benefit future tasks with little to no human-annotated data.
Abstract
Automatic evaluation of natural language generation has long been an elusive goal in NLP. A recent paradigm fine-tunes pre-trained language models to emulate human judgements for a particular task and evaluation criterion. Inspired by the generalization ability of instruction-tuned models, we propose a learned metric based on instruction tuning. To test our approach, we collected HEAP, a dataset of human judgements across various NLG tasks and evaluation criteria. Our findings demonstrate that instruction tuning language models on HEAP yields good performance on many evaluation tasks, though some criteria are less trivial to learn than others. Further, jointly training on multiple tasks can yield additional performance improvements, which can be beneficial for future tasks with little to no human-annotated data.
Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection
results: Experiments show that AdaICL improves accuracy by 4.4 points over SOTA (a 7.7% relative improvement) and selects more informative examples under a limited budget, improving budget efficiency.
Abstract
Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL). ICL is efficient as it does not require any parameter updates to the trained LLM, but only few annotated examples as input for the LLM. In this work, we investigate an active learning approach for ICL, where there is a limited budget for annotating examples. We propose a model-adaptive optimization-free algorithm, termed AdaICL, which identifies examples that the model is uncertain about, and performs semantic diversity-based example selection. Diversity-based sampling improves overall effectiveness, while uncertainty sampling improves budget efficiency and helps the LLM learn new information. Moreover, AdaICL poses its sampling strategy as a Maximum Coverage problem, that dynamically adapts based on the model's feedback and can be approximately solved via greedy algorithms. Extensive experiments on nine datasets and seven LLMs show that AdaICL improves performance by 4.4% accuracy points over SOTA (7.7% relative improvement), is up to 3x more budget-efficient than performing annotations uniformly at random, while it outperforms SOTA with 2x fewer ICL examples.
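The Maximum Coverage view admits a compact greedy sketch (AdaICL's actual scoring and model-feedback loop is richer): each candidate annotation covers the uncertain points in its semantic neighborhood, and within the budget we repeatedly pick the candidate covering the most still-uncovered points. The neighborhood sets below are illustrative.

```python
# Greedy maximum-coverage selection of examples to annotate.
def greedy_max_coverage(candidates, budget):
    selected, covered = [], set()
    for _ in range(budget):
        best = max(candidates, key=lambda c: len(candidates[c] - covered))
        if not candidates[best] - covered:
            break                          # nothing new would be covered
        selected.append(best)
        covered |= candidates[best]
    return selected, covered

# candidate example -> set of uncertain points in its semantic neighborhood
candidates = {"e1": {1, 2, 3}, "e2": {3, 4}, "e3": {4, 5, 6, 7}, "e4": {1, 7}}
print(greedy_max_coverage(candidates, budget=2))  # picks e3 then e1, covering all 7
```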
Early Detection of Depression and Eating Disorders in Spanish: UNSL at MentalRiskES 2023
results: In Tasks 1 and 2, our approach obtained the second-best performance in rankings based on both classification and latency, demonstrating its effectiveness and consistency for early detection problems in Spanish.
Abstract
MentalRiskES is a novel challenge that proposes to solve problems related to early risk detection for the Spanish language. The objective is to detect, as soon as possible, Telegram users who show signs of mental disorders considering different tasks. Task 1 involved the users' detection of eating disorders, Task 2 focused on depression detection, and Task 3 aimed at detecting an unknown disorder. These tasks were divided into subtasks, each one defining a resolution approach. Our research group participated in subtask A for Tasks 1 and 2: a binary classification problem that evaluated whether the users were positive or negative. To solve these tasks, we proposed models based on Transformers followed by a decision policy according to criteria defined by an early detection framework. One of the models presented an extended vocabulary with important words for each task to be solved. In addition, we applied a decision policy based on the history of predictions that the model performs during user evaluation. For Tasks 1 and 2, we obtained the second-best performance according to rankings based on classification and latency, demonstrating the effectiveness and consistency of our approaches for solving early detection problems in the Spanish language.
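A minimal sketch of a history-based decision policy of the kind described, with illustrative thresholds rather than the exact rules used in the challenge: the classifier emits a risk probability per post, and the policy flags a user once the rolling mean of recent scores stays above a threshold for several consecutive posts, trading accuracy against decision latency.

```python
# Early-detection decision policy over a stream of per-post risk scores.
def decide(scores, threshold=0.7, window=3, patience=2):
    streak = 0
    for t in range(len(scores)):
        recent = scores[max(0, t - window + 1): t + 1]
        if sum(recent) / len(recent) >= threshold:
            streak += 1
            if streak >= patience:
                return "positive", t + 1   # alarm raised after t + 1 posts
        else:
            streak = 0
    return "negative", len(scores)

print(decide([0.2, 0.6, 0.9, 0.8, 0.85]))  # -> ('positive', 5)
```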
Generative retrieval-augmented ontologic graph and multi-agent strategies for interpretive large language model-based materials design
for: The paper explores the use of large language models (LLMs) as a tool for engineering analysis of materials, specifically for retrieving key information, developing research hypotheses, discovering mechanistic relationships, and writing and executing simulation codes.
methods: The paper uses a fine-tuned model called MechGPT, developed from training data in the mechanics of materials domain. The authors also employ retrieval-augmented Ontological Knowledge Graph strategies to address the issue of LLMs recalling correct information outside the context of learned matter.
results: The paper shows that LLMs can provide powerful problem-solution strategies for analysis and design problems, and that retrieval-augmented Ontological Knowledge Graph strategies can provide an interpretable graph structure with rich information at the node, edge, and subgraph level. The authors also discuss nonlinear sampling strategies and agent-based modeling applied to complex question answering, code generation, and execution in the context of automated force field development from actively learned DFT modeling, and data analysis.
Abstract
Transformer neural networks show promising capabilities, in particular for uses in materials analysis, design and manufacturing, including their capacity to work effectively with both human language, symbols, code, and numerical data. Here we explore the use of large language models (LLMs) as a tool that can support engineering analysis of materials, applied to retrieving key information about subject areas, developing research hypotheses, discovery of mechanistic relationships across disparate areas of knowledge, and writing and executing simulation codes for active knowledge generation based on physical ground truths. When used as sets of AI agents with specific features, capabilities, and instructions, LLMs can provide powerful problem solution strategies for applications in analysis and design problems. Our experiments focus on using a fine-tuned model, MechGPT, developed based on training data in the mechanics of materials domain. We first affirm how finetuning endows LLMs with reasonable understanding of domain knowledge. However, when queried outside the context of learned matter, LLMs can have difficulty to recall correct information. We show how this can be addressed using retrieval-augmented Ontological Knowledge Graph strategies that discern how the model understands what concepts are important and how they are related. Illustrated for a use case of relating distinct areas of knowledge - here, music and proteins - such strategies can also provide an interpretable graph structure with rich information at the node, edge and subgraph level. We discuss nonlinear sampling strategies and agent-based modeling applied to complex question answering, code generation and execution in the context of automated force field development from actively learned Density Functional Theory (DFT) modeling, and data analysis.
Strategies to Harness the Transformers’ Potential: UNSL at eRisk 2023
results: In this study, we obtained good performance on both tasks in terms of decision-based metrics, ranking-based metrics, and runtime.
Abstract
The CLEF eRisk Laboratory explores solutions to different tasks related to risk detection on the Internet. In the 2023 edition, Task 1 consisted of searching for symptoms of depression, the objective of which was to extract user writings according to their relevance to the BDI Questionnaire symptoms. Task 2 was related to the problem of early detection of pathological gambling risks, where the participants had to detect users at risk as quickly as possible. Finally, Task 3 consisted of estimating the severity levels of signs of eating disorders. Our research group participated in the first two tasks, proposing solutions based on Transformers. For Task 1, we applied different approaches that can be interesting in information retrieval tasks. Two proposals were based on the similarity of contextualized embedding vectors, and the other one was based on prompting, an attractive current technique in machine learning. For Task 2, we proposed three fine-tuned models followed by a decision policy according to criteria defined by an early detection framework. One model used an extended vocabulary with words important to the addressed domain. In the latter task, we obtained good performances considering decision-based metrics, ranking-based metrics, and runtime. In this work, we explore different ways to deploy the predictive potential of Transformers in eRisk tasks.
The Impact of Depth and Width on Transformer Language Model Generalization
results: After fine-tuning, deeper models generalize better out-of-distribution than shallower ones, but the relative benefit of additional layers diminishes rapidly; within each family, deeper models also show better language-modeling performance, with similarly diminishing returns; and the benefits of depth for compositional generalization cannot be attributed solely to language-modeling performance or performance on in-distribution data.
Abstract
To process novel sentences, language models (LMs) must generalize compositionally -- combine familiar elements in new ways. What aspects of a model's structure promote compositional generalization? Focusing on transformers, we test the hypothesis, motivated by recent theoretical and empirical work, that transformers generalize more compositionally when they are deeper (have more layers). Because simply adding layers increases the total number of parameters, confounding depth and size, we construct three classes of models which trade off depth for width such that the total number of parameters is kept constant (41M, 134M and 374M parameters). We pretrain all models as LMs and fine-tune them on tasks that test for compositional generalization. We report three main conclusions: (1) after fine-tuning, deeper models generalize better out-of-distribution than shallower models do, but the relative benefit of additional layers diminishes rapidly; (2) within each family, deeper models show better language modeling performance, but returns are similarly diminishing; (3) the benefits of depth for compositional generalization cannot be attributed solely to better performance on language modeling or on in-distribution data.
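The depth-for-width trade can be reproduced with back-of-envelope arithmetic: a standard transformer block has roughly 12*d^2 non-embedding parameters (4*d^2 for attention, 8*d^2 for the MLP), so holding a budget P fixed across depths L means d ≈ sqrt(P/(12L)). The sketch below uses the paper's 41M budget; the specific widths are assumptions, not the paper's configurations.

```python
# Trading depth for width at a fixed transformer parameter budget.
import math

P = 41e6  # total non-embedding parameter budget
for L in (2, 4, 8, 16, 32):
    d = math.sqrt(P / (12 * L))
    d = int(round(d / 64) * 64)            # snap to a hardware-friendly multiple
    print(f"layers={L:2d}  width={d:4d}  params~{12 * L * d * d / 1e6:.1f}M")
```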
Split-NER: Named Entity Recognition via Two Question-Answering-based Classifications
results: Experiments show that this two-step approach outperforms baselines on OntoNotes5.0, WNUT17, and a cybersecurity dataset, achieves comparable performance on BioNLP13CG, and significantly reduces training time.
Abstract
In this work, we address the NER problem by splitting it into two logical sub-tasks: (1) Span Detection which simply extracts entity mention spans irrespective of entity type; (2) Span Classification which classifies the spans into their entity types. Further, we formulate both sub-tasks as question-answering (QA) problems and produce two leaner models which can be optimized separately for each sub-task. Experiments with four cross-domain datasets demonstrate that this two-step approach is both effective and time efficient. Our system, SplitNER outperforms baselines on OntoNotes5.0, WNUT17 and a cybersecurity dataset and gives on-par performance on BioNLP13CG. In all cases, it achieves a significant reduction in training time compared to its QA baseline counterpart. The effectiveness of our system stems from fine-tuning the BERT model twice, separately for span detection and classification. The source code can be found at https://github.com/c3sr/split-ner.
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
paper_authors: Christoph Leiter, Juri Opitz, Daniel Deutsch, Yang Gao, Rotem Dror, Steffen Eger
for: To explore prompting and score extraction for machine translation (MT) and summarization evaluation.
methods: A shared task in which participants prompt a fixed list of allowed large language models, with fine-tuning disallowed.
results: Even with models restricted to the allowed list, the best systems achieve results on par with or even surpassing recent reference-free metrics developed using larger models.
Abstract
With an increasing number of parameters and pre-training data, generative large language models (LLMs) have shown remarkable capabilities to solve tasks with minimal or no task-related examples. Notably, LLMs have been successfully employed as evaluation metrics in text generation tasks. Within this context, we introduce the Eval4NLP 2023 shared task that asks participants to explore prompting and score extraction for machine translation (MT) and summarization evaluation. Specifically, we propose a novel competition setting in which we select a list of allowed LLMs and disallow fine-tuning to ensure a focus on prompting. We present an overview of participants' approaches and evaluate them on a new reference-free test set spanning three language pairs for MT and a summarization dataset. Notably, despite the task's restrictions, the best-performing systems achieve results on par with or even surpassing recent reference-free metrics developed using larger models, including GEMBA and Comet-Kiwi-XXL. Finally, as a separate track, we perform a small-scale human evaluation of the plausibility of explanations given by the LLMs.
What’s “up” with vision-language models? Investigating their struggle with spatial reasoning
results: Popular vision-language pretraining corpora such as LAION-2B contain little reliable data for learning spatial relationships, and basic modeling interventions such as up-weighting preposition-containing instances or fine-tuning on these corpora do not resolve the challenges the test sets pose.
Abstract
Recent vision-language (VL) models are powerful, but can they reliably distinguish "right" from "left"? We curate three new corpora to quantify model comprehension of such basic spatial relations. These tests isolate spatial reasoning more precisely than existing datasets like VQAv2, e.g., our What'sUp benchmark contains sets of photographs varying only the spatial relations of objects, keeping their identity fixed (see Figure 1: models must comprehend not only the usual case of a dog under a table, but also, the same dog on top of the same table). We evaluate 18 VL models, finding that all perform poorly, e.g., BLIP finetuned on VQAv2, which nears human parity on VQAv2, achieves 56% accuracy on our benchmarks vs. humans at 99%. We conclude by studying causes of this surprising behavior, finding: 1) that popular vision-language pretraining corpora like LAION-2B contain little reliable data for learning spatial relationships; and 2) that basic modeling interventions like up-weighting preposition-containing instances or fine-tuning on our corpora are not sufficient to address the challenges our benchmarks pose. We are hopeful that these corpora will facilitate further research, and we release our data and code at https://github.com/amitakamath/whatsup_vlms.
Chain-of-Thought Embeddings for Stance Detection on Social Media
results: The approach achieves SOTA stance-detection performance on multiple datasets collected from social media.
Abstract
Stance detection on social media is challenging for Large Language Models (LLMs), as emerging slang and colloquial language in online conversations often contain deeply implicit stance labels. Chain-of-Thought (COT) prompting has recently been shown to improve performance on stance detection tasks -- alleviating some of these issues. However, COT prompting still struggles with implicit stance identification. This challenge arises because many samples are initially challenging to comprehend before a model becomes familiar with the slang and evolving knowledge related to different topics, all of which need to be acquired through the training data. In this study, we address this problem by introducing COT Embeddings which improve COT performance on stance detection tasks by embedding COT reasonings and integrating them into a traditional RoBERTa-based stance detection pipeline. Our analysis demonstrates that 1) text encoders can leverage COT reasonings with minor errors or hallucinations that would otherwise distort the COT output label. 2) Text encoders can overlook misleading COT reasoning when a sample's prediction heavily depends on domain-specific patterns. Our model achieves SOTA performance on multiple stance detection datasets collected from social media.
Collaborative Evaluation: Exploring the Synergy of Large Language Models and Humans for Open-ended Generation Evaluation
results: By leveraging LLMs, CoEval evaluates lengthy texts effectively, improving evaluation efficiency and reliability, while human scrutiny is retained to ensure the reliability of the final results.
Abstract
Humans are widely involved in the evaluation of open-ended natural language generation tasks (NLG) that demand creativity, as automatic metrics often exhibit weak correlations with human judgments. Large language models (LLMs) recently have emerged as a scalable and cost-effective alternative to human evaluations. However, both humans and LLMs have limitations, i.e., inherent subjectivity and unreliable judgments, particularly for open-ended tasks that require adaptable metrics tailored to diverse task requirements. To explore the synergy between humans and LLM-based evaluators and address the challenges of existing inconsistent evaluation criteria in open-ended NLG tasks, we propose a Collaborative Evaluation pipeline CoEval, involving the design of a checklist of task-specific criteria and the detailed evaluation of texts, in which LLM generates initial ideation, and then humans engage in scrutiny. We conducted a series of experiments to investigate the mutual effects between LLMs and humans in CoEval. Results show that, by utilizing LLMs, CoEval effectively evaluates lengthy texts, saving significant time and reducing human evaluation outliers. Human scrutiny still plays a role, revising around 20% of LLM evaluation scores for ultimate reliability.
Combining Language Models For Specialized Domains: A Colorful Approach
results: Experiments show that this approach integrates domain-specific terms and jargon into language tasks and lowers the error rate on domain-specific words without compromising performance in the general domain.
Abstract
General purpose language models (LMs) encounter difficulties when processing domain-specific jargon and terminology, which are frequently utilized in specialized fields such as medicine or industrial settings. Moreover, they often find it challenging to interpret mixed speech that blends general language with specialized jargon. This poses a challenge for automatic speech recognition systems operating within these specific domains. In this work, we introduce a novel approach that integrates domain-specific or secondary LM into general-purpose LM. This strategy involves labeling, or "coloring", each word to indicate its association with either the general or the domain-specific LM. We develop an optimized algorithm that enhances the beam search algorithm to effectively handle inferences involving colored words. Our evaluations indicate that this approach is highly effective in integrating jargon into language tasks. Notably, our method substantially lowers the error rate for domain-specific words without compromising performance in the general domain.
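A minimal sketch of the word-coloring idea, decoupled from the paper's beam-search integration: each hypothesis word carries a label selecting which LM scores it, and the hypothesis score sums the corresponding log-probabilities. The toy unigram tables below stand in for real general and domain LMs.

```python
# Score a hypothesis with per-word "colors" choosing the general or domain LM.
import math

general_lm = {"the": 0.1, "patient": 0.001, "took": 0.01, "metoprolol": 1e-7}
domain_lm  = {"the": 0.05, "patient": 0.02, "took": 0.01, "metoprolol": 1e-3}

def score(words, colors, floor=1e-9):
    total = 0.0
    for w, c in zip(words, colors):
        lm = domain_lm if c == "domain" else general_lm
        total += math.log(lm.get(w, floor))   # floor for out-of-vocabulary words
    return total

sent = ["the", "patient", "took", "metoprolol"]
colors = ["general", "domain", "general", "domain"]   # word-level coloring
print(round(score(sent, colors), 2),
      "vs general-only:", round(score(sent, ["general"] * 4), 2))
```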
When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations
methods: A theoretical study of these methods in the continuous embedding space and the discrete token space, proving that they are strictly less expressive than full fine-tuning with the same number of parameters.
results: While context-based fine-tuning methods can effectively elicit skills present in the pretrained model, they cannot learn novel tasks because they cannot change the model's internal attention patterns.
Abstract
Context-based fine-tuning methods, including prompting, in-context learning, soft prompting (also known as prompt tuning), and prefix-tuning, have gained popularity due to their ability to often match the performance of full fine-tuning with a fraction of the parameters. Despite their empirical successes, there is little theoretical understanding of how these techniques influence the internal computation of the model and their expressiveness limitations. We show that despite the continuous embedding space being more expressive than the discrete token space, soft-prompting and prefix-tuning are strictly less expressive than full fine-tuning, even with the same number of learnable parameters. Concretely, context-based fine-tuning cannot change the relative attention pattern over the content and can only bias the outputs of an attention layer in a fixed direction. This suggests that while techniques like prompting, in-context learning, soft prompting, and prefix-tuning can effectively elicit skills present in the pretrained model, they cannot learn novel tasks that require new attention patterns.
Sentiment Analysis in Digital Spaces: An Overview of Reviews
for: This study is a systematic review that summarizes 38 systematic reviews and 2,275 primary studies.
methods: The study uses a bespoke quality assessment framework to evaluate the rigor and quality of systematic review methodologies and reporting standards.
results: The study finds diverse applications and methods, limited reporting quality, and challenges over time.
Abstract
Sentiment analysis (SA) is commonly applied to digital textual data, revealing insight into opinions and feelings. Many systematic reviews have summarized existing work, but often overlook discussions of validity and scientific practices. Here, we present an overview of reviews, synthesizing 38 systematic reviews, containing 2,275 primary studies. We devise a bespoke quality assessment framework designed to assess the rigor and quality of systematic review methodologies and reporting standards. Our findings show diverse applications and methods, limited reporting rigor, and challenges over time. We discuss how future research and practitioners can address these issues and highlight their importance across numerous applications.
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
paper_authors: Allen Nie, Yuhui Zhang, Atharva Amdekar, Chris Piech, Tatsunori Hashimoto, Tobias Gerstenberg
for: The paper aims to investigate how well large language models (LLMs) align with human intuitions in making causal and moral judgments about text-based scenarios.
methods: The paper uses a dataset of stories from 24 cognitive science papers and develops a system to annotate each story with the factors investigated. The authors then test the alignment of LLMs with human participants' judgments using statistical analyses.
results: The results show that while LLMs have improved in aligning with human participants' judgments in recent years, they still weigh the different factors quite differently. The study demonstrates the importance of curated, challenge datasets combined with insights from cognitive science to evaluate LLMs' performance and understand their implicit tendencies.
Abstract
Human commonsense understanding of the physical and social world is organized around intuitive theories. These theories support making causal and moral judgments. When something bad happens, we naturally ask: who did what, and why? A rich literature in cognitive science has studied people's causal and moral intuitions. This work has revealed a number of factors that systematically influence people's judgments, such as the violation of norms and whether the harm is avoidable or inevitable. We collected a dataset of stories from 24 cognitive science papers and developed a system to annotate each story with the factors they investigated. Using this dataset, we test whether large language models (LLMs) make causal and moral judgments about text-based scenarios that align with those of human participants. On the aggregate level, alignment has improved with more recent LLMs. However, using statistical analyses, we find that LLMs weigh the different factors quite differently from human participants. These results show how curated, challenge datasets combined with insights from cognitive science can help us go beyond comparisons based merely on aggregate metrics: we uncover LLMs implicit tendencies and show to what extent these align with human intuitions.
Interpretable-by-Design Text Classification with Iteratively Generated Concept Bottleneck
results: On 12 diverse datasets, using GPT-4 for both concept generation and measurement, TBMs rival established black-box baselines such as GPT-4 few-shot and finetuned DeBERTa, while falling short against finetuned GPT-3.5. Overall, the findings suggest that TBMs are a promising new framework that enhances interpretability with minimal performance tradeoffs, particularly for general-domain text classification.
Abstract
Deep neural networks excel in text classification tasks, yet their application in high-stakes domains is hindered by their lack of interpretability. To address this, we propose Text Bottleneck Models (TBMs), an intrinsically interpretable text classification framework that offers both global and local explanations. Rather than directly predicting the output label, TBMs predict categorical values for a sparse set of salient concepts and use a linear layer over those concept values to produce the final prediction. These concepts can be automatically discovered and measured by a Large Language Model (LLM), without the need for human curation. On 12 diverse datasets, using GPT-4 for both concept generation and measurement, we show that TBMs can rival the performance of established black-box baselines such as GPT-4 fewshot and finetuned DeBERTa, while falling short against finetuned GPT-3.5. Overall, our findings suggest that TBMs are a promising new framework that enhances interpretability, with minimal performance tradeoffs, particularly for general-domain text.
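A minimal sketch of the TBM prediction path, assuming an external `measure_concepts` helper that queries an LLM for a categorical value per concept (both names are ours, for illustration):

```python
import numpy as np

class TextBottleneckModel:
    """Predict via a linear layer over LLM-measured concept values."""
    def __init__(self, concepts, n_classes, seed=0):
        self.concepts = concepts
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.normal(size=(len(concepts), n_classes))
        self.b = np.zeros(n_classes)

    def predict(self, text, measure_concepts):
        c = measure_concepts(text, self.concepts)  # shape: (n_concepts,)
        logits = c @ self.W + self.b               # interpretable linear head
        return int(np.argmax(logits))
```

Because the only learned component is the linear layer, the weights `W` give a global explanation and the concept values `c` give a local one.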
Dynamics of Instruction Tuning: Each Ability of Large Language Models Has Its Own Growth Pace
results: The study finds that although data volume and parameter scale directly affect overall model performance, some abilities are more responsive to their increases while others are largely resistant to such changes. Moreover, human-curated data keeps improving model performance as its volume grows, an effect synthetic data cannot match.
Abstract
Instruction tuning is a burgeoning method to elicit the general intelligence of Large Language Models (LLMs). However, the creation of instruction data is still largely heuristic, leading to significant variation in quality and distribution across existing datasets. Experimental conclusions drawn from these datasets are also inconsistent, with some studies emphasizing the importance of scaling instruction numbers, while others argue that a limited number of samples suffice. To better understand data construction guidelines, we deepen our focus from the overall model performance to the growth of each underlying ability, such as creative writing, code generation, and logical reasoning. We systematically investigate the effects of data volume, parameter size, and data construction methods on the development of various abilities, using hundreds of model checkpoints (7b to 33b) fully instruction-tuned on a new collection of over 40k human-curated instruction data. This proposed dataset is stringently quality-controlled and categorized into ten distinct LLM abilities. Our study reveals three primary findings: (i) Despite data volume and parameter scale directly impacting models' overall performance, some abilities are more responsive to their increases and can be effectively trained using limited data, while some are highly resistant to these changes. (ii) Human-curated data strongly outperforms synthetic data from GPT-4 in efficiency and can constantly enhance model performance with volume increases, but is unachievable with synthetic data. (iii) Instruction data brings powerful cross-ability generalization, with evaluation results on out-of-domain data mirroring the first two observations. Furthermore, we demonstrate how these findings can guide more efficient data constructions, leading to practical performance improvements on public benchmarks.
KeyGen2Vec: Learning Document Embedding via Multi-label Keyword Generation in Question-Answering
results: Our experiments show that KeyGen2Vec is in general superior to a multi-label keyword classifier by up to 14.7% on Purity, Normalized Mutual Information (NMI), and F1-Score metrics. Interestingly, although the absolute advantage of learning embeddings through label supervision is high across evaluation datasets, KeyGen2Vec is competitive with a classifier that exploits topic-label supervision on Yahoo! cQA with a larger number of latent topic labels.
Abstract
Representing documents into high dimensional embedding space while preserving the structural similarity between document sources has been an ultimate goal for many works on text representation learning. Current embedding models, however, mainly rely on the availability of label supervision to increase the expressiveness of the resulting embeddings. In contrast, unsupervised embeddings are cheap, but they often cannot capture implicit structure in target corpus, particularly for samples that come from different distribution with the pretraining source. Our study aims to loosen up the dependency on label supervision by learning document embeddings via Sequence-to-Sequence (Seq2Seq) text generator. Specifically, we reformulate keyphrase generation task into multi-label keyword generation in community-based Question Answering (cQA). Our empirical results show that KeyGen2Vec in general is superior than multi-label keyword classifier by up to 14.7% based on Purity, Normalized Mutual Information (NMI), and F1-Score metrics. Interestingly, although in general the absolute advantage of learning embeddings through label supervision is highly positive across evaluation datasets, KeyGen2Vec is shown to be competitive with classifier that exploits topic label supervision in Yahoo! cQA with larger number of latent topic labels.
results: On speech denoising tasks, the DPATD model is more efficient than existing methods and handles long audio sequences better.
Abstract
Recent high-performance transformer-based speech enhancement models demonstrate that time domain methods could achieve similar performance as time-frequency domain methods. However, time-domain speech enhancement systems typically receive input audio sequences consisting of a large number of time steps, making it challenging to model extremely long sequences and train models to perform adequately. In this paper, we utilize smaller audio chunks as input to achieve efficient utilization of audio information to address the above challenges. We propose a dual-phase audio transformer for denoising (DPATD), a novel model to organize transformer layers in a deep structure to learn clean audio sequences for denoising. DPATD splits the audio input into smaller chunks, where the input length can be proportional to the square root of the original sequence length. Our memory-compressed explainable attention is efficient and converges faster compared to the frequently used self-attention module. Extensive experiments demonstrate that our model outperforms state-of-the-art methods.
Improving Input-label Mapping with Demonstration Replay for In-context Learning
methods: We propose a novel ICL method, Repeated Demonstration with Sliding Causal Attention (RdSca). We duplicate later demonstrations and concatenate them to the front, allowing the model to 'observe' later information even under the causal restriction. In addition, we introduce sliding causal attention, which customizes causal attention to avoid information leakage.
results: Our method significantly improves the input-label mapping in ICL demonstrations. We also conduct an in-depth analysis of how to customize causal attention without training, an area unexplored in previous research.
Abstract
In-context learning (ICL) is an emerging capability of large autoregressive language models where a few input-label demonstrations are appended to the input to enhance the model's understanding of downstream NLP tasks, without directly adjusting the model parameters. The effectiveness of ICL can be attributed to the strong language modeling capabilities of large language models (LLMs), which enable them to learn the mapping between input and labels based on in-context demonstrations. Despite achieving promising results, the causal nature of language modeling in ICL restricts the attention to be backward only, i.e., a token only attends to its previous tokens, failing to capture the full input-label information and limiting the model's performance. In this paper, we propose a novel ICL method called Repeated Demonstration with Sliding Causal Attention, (RdSca). Specifically, we duplicate later demonstrations and concatenate them to the front, allowing the model to `observe' the later information even under the causal restriction. Besides, we introduce sliding causal attention, which customizes causal attention to avoid information leakage. Experimental results show that our method significantly improves the input-label mapping in ICL demonstrations. We also conduct an in-depth analysis of how to customize the causal attention without training, which has been an unexplored area in previous research.
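A sketch of the masking idea under our own simplifying assumptions: the later demonstrations are duplicated at the front of the context, and a customized causal mask prevents each original token from attending to its own duplicate. The paper's actual sliding-window construction may differ.

```python
import torch

def rdsca_mask(copy_len, orig_len):
    """Causal mask for a sequence laid out as [duplicated_later_demos ; original_sequence]."""
    n = copy_len + orig_len
    mask = torch.tril(torch.ones(n, n, dtype=torch.bool))  # standard causal mask
    offset = orig_len - copy_len  # originals[offset:] are the duplicated tokens
    for i in range(offset, orig_len):
        # Block each duplicated token from attending to its own front copy,
        # so no token can trivially "see itself" and leak information.
        mask[copy_len + i, i - offset] = False
    return mask
```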
A Novel Representation to Improve Team Problem Solving in Real-Time
results: A case study illustrates how the representation helps understand and improve team behavior.
Abstract
This paper proposes a novel representation that supports computing metrics for understanding and improving a team's behavior in real time during real-life problem solving. Even though teams are central to modern activities, there is little computational support for improving their work. The representation captures the different mental images developed, enhanced, and utilized during solving. A case study illustrates the representation.
InfoEntropy Loss to Mitigate Bias of Learning Difficulties for Generative Language Models
results: On the Pile dataset, experiments with models trained at different scales show that adding the InfoEntropy Loss to generative language model training yields consistent improvements on downstream tasks.
Abstract
Generative language models are usually pretrained on large text corpus via predicting the next token (i.e., sub-word/word/phrase) given the previous ones. Recent works have demonstrated the impressive performance of large generative language models on downstream tasks. However, existing generative language models generally neglect an inherent challenge in text corpus during training, i.e., the imbalance between frequent tokens and infrequent ones. It can lead a language model to be dominated by common and easy-to-learn tokens, thereby overlooking the infrequent and difficult-to-learn ones. To alleviate that, we propose an Information Entropy Loss (InfoEntropy Loss) function. During training, it can dynamically assess the learning difficulty of a to-be-learned token, according to the information entropy of the corresponding predicted probability distribution over the vocabulary. Then it scales the training loss adaptively, trying to lead the model to focus more on the difficult-to-learn tokens. On the Pile dataset, we train generative language models at different scales of 436M, 1.1B, and 6.7B parameters. Experiments reveal that models incorporating the proposed InfoEntropy Loss can gain consistent performance improvement on downstream benchmarks.
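A minimal sketch of an entropy-scaled token loss in the spirit of InfoEntropy Loss; the exact scaling rule (here, a normalized entropy weight with exponent `gamma`) is our assumption:

```python
import torch
import torch.nn.functional as F

def info_entropy_loss(logits, targets, gamma=1.0):
    """logits: (B, T, V); targets: (B, T).
    Upweights tokens whose predicted distribution has high entropy,
    i.e., tokens the model currently finds difficult to learn."""
    logp = F.log_softmax(logits, dim=-1)
    entropy = -(logp.exp() * logp).sum(-1)                            # (B, T)
    ce = F.nll_loss(logp.transpose(1, 2), targets, reduction="none")  # (B, T)
    weight = (entropy / entropy.mean()).detach() ** gamma
    return (weight * ce).mean()
```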
results: We conduct experiments with a diverse range of LLMs, including ChatGPT, GPT-4, OPT, LLaMA, and Alpaca, comparing them against state-of-the-art constituency parsers. Experiments cover zero-shot, few-shot, and full-training settings, evaluated on one in-domain and five out-of-domain test sets. The results reveal insights into LLMs' performance, generalization abilities, and the challenges of constituency parsing.
Abstract
Constituency parsing is a fundamental yet unsolved natural language processing task. In this paper, we explore the potential of recent large language models (LLMs) that have exhibited remarkable performance across various domains and tasks to tackle this task. We employ three linearization strategies to transform output trees into symbol sequences, such that LLMs can solve constituency parsing by generating linearized trees. We conduct experiments using a diverse range of LLMs, including ChatGPT, GPT-4, OPT, LLaMA, and Alpaca, comparing their performance against the state-of-the-art constituency parsers. Our experiments encompass zero-shot, few-shot, and full-training learning settings, and we evaluate the models on one in-domain and five out-of-domain test datasets. Our findings reveal insights into LLMs' performance, generalization abilities, and challenges in constituency parsing.
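For reference, one classic way to linearize a constituency tree into a symbol sequence is bracket notation; the paper evaluates three strategies, and this sketch shows only the generic idea, not necessarily any of theirs:

```python
def linearize(tree):
    """tree = (label, children), where children are sub-trees or token strings."""
    label, children = tree
    parts = [linearize(c) if isinstance(c, tuple) else c for c in children]
    return f"({label} " + " ".join(parts) + ")"

example = ("S", [("NP", ["She"]), ("VP", ["reads", ("NP", ["books"])])])
print(linearize(example))  # (S (NP She) (VP reads (NP books)))
```

An LLM can then be prompted (or fine-tuned) to emit such bracketed strings, which are parsed back into trees for evaluation.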
Mean BERTs make erratic language teachers: the effectiveness of latent bootstrapping in low-resource settings
results: Our experiments show that latent bootstrapping can effectively acquire linguistic knowledge from limited resources. We experiment on the BabyLM shared task, which involves pretraining on two small curated corpora and evaluating on four linguistic benchmarks.
Abstract
This paper explores the use of latent bootstrapping, an alternative self-supervision technique, for pretraining language models. Unlike the typical practice of using self-supervision on discrete subwords, latent bootstrapping leverages contextualized embeddings for a richer supervision signal. We conduct experiments to assess how effective this approach is for acquiring linguistic knowledge from limited resources. Specifically, our experiments are based on the BabyLM shared task, which includes pretraining on two small curated corpora and an evaluation on four linguistic benchmarks.
A Lightweight Method to Generate Unanswerable Questions in English
results: Compared to the prior state of the art, models trained with this data generation method perform better (+1.6 F1 points on SQuAD 2.0 with BERT-large), and the generated questions have higher human-judged relatedness and readability.
Abstract
If a question cannot be answered with the available information, robust systems for question answering (QA) should know _not_ to answer. One way to build QA models that do this is with additional training data comprised of unanswerable questions, created either by employing annotators or through automated methods for unanswerable question generation. To show that the model complexity of existing automated approaches is not justified, we examine a simpler data augmentation method for unanswerable question generation in English: performing antonym and entity swaps on answerable questions. Compared to the prior state-of-the-art, data generated with our training-free and lightweight strategy results in better models (+1.6 F1 points on SQuAD 2.0 data with BERT-large), and has higher human-judged relatedness and readability. We quantify the raw benefits of our approach compared to no augmentation across multiple encoder models, using different amounts of generated data, and also on TydiQA-MinSpan data (+9.3 F1 points with BERT-large). Our results establish swaps as a simple but strong baseline for future work.
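A minimal sketch of the antonym-swap half of the method using WordNet via NLTK (assumes the `wordnet` corpus is downloaded; the paper's exact swap rules and the entity-swap counterpart are not reproduced here):

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def antonym_swap(question):
    """Replace the first word that has a WordNet antonym with that antonym."""
    tokens = question.split()
    for i, tok in enumerate(tokens):
        for syn in wn.synsets(tok):
            for lemma in syn.lemmas():
                if lemma.antonyms():
                    tokens[i] = lemma.antonyms()[0].name()
                    return " ".join(tokens)  # now likely unanswerable
    return None  # no antonym found; skip this question

print(antonym_swap("What is the largest city in France ?"))
# e.g. -> "What is the small city in France ?" (no re-inflection in this sketch)
```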
results: The report provides the detailed training setup for Japanese SimCSE together with extensive evaluation results for the sentence embedding models.
Abstract
We report the development of Japanese SimCSE, Japanese sentence embedding models fine-tuned with SimCSE. Since there is a lack of sentence embedding models for Japanese that can be used as a baseline in sentence embedding research, we conducted extensive experiments on Japanese sentence embeddings involving 24 pre-trained Japanese or multilingual language models, five supervised datasets, and four unsupervised datasets. In this report, we provide the detailed training setup for Japanese SimCSE and their evaluation results.
Test Suites Task: Evaluation of Gender Fairness in MT with MuST-SHE and INES
results: The results indicate that systems achieve comparable performance on feminine and masculine gender forms, but generating gender-inclusive translations remains a challenge for all evaluated models, leaving room for future improvements and research.
Abstract
As part of the WMT-2023 "Test suites" shared task, in this paper we summarize the results of two test suites evaluations: MuST-SHE-WMT23 and INES. By focusing on the en-de and de-en language pairs, we rely on these newly created test suites to investigate systems' ability to translate feminine and masculine gender and produce gender-inclusive translations. Furthermore we discuss metrics associated with our test suites and validate them by means of human evaluations. Our results indicate that systems achieve reasonable and comparable performance in correctly translating both feminine and masculine gender forms for naturalistic gender phenomena. Instead, the generation of inclusive language forms in translation emerges as a challenging task for all the evaluated MT models, indicating room for future improvements and research on the topic.
Fusing Temporal Graphs into Transformers for Time-Sensitive Question Answering
results: The results show that the proposed method substantially enhances the temporal reasoning capabilities of Transformer models, with or without fine-tuning. Moreover, it outperforms various graph-convolution-based approaches and establishes new state-of-the-art performance on SituatedQA and three splits of TimeQA.
Abstract
Answering time-sensitive questions from long documents requires temporal reasoning over the times in questions and documents. An important open question is whether large language models can perform such reasoning solely using a provided text document, or whether they can benefit from additional temporal information extracted using other systems. We address this research question by applying existing temporal information extraction systems to construct temporal graphs of events, times, and temporal relations in questions and documents. We then investigate different approaches for fusing these graphs into Transformer models. Experimental results show that our proposed approach for fusing temporal graphs into input text substantially enhances the temporal reasoning capabilities of Transformer models with or without fine-tuning. Additionally, our proposed method outperforms various graph convolution-based approaches and establishes a new state-of-the-art performance on SituatedQA and three splits of TimeQA.
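One simple way to realize the fusion (our illustrative format, not necessarily the paper's) is to verbalize the extracted temporal graph as (event, relation, time) triples appended to the model input:

```python
def fuse_temporal_graph(question, document, edges):
    """edges: list of (event, relation, time) triples from a temporal IE system."""
    facts = " ".join(f"{event} {relation} {time}." for event, relation, time in edges)
    return f"question: {question} context: {document} temporal facts: {facts}"

edges = [("the merger", "happened before", "2001"),
         ("her tenure as CEO", "started in", "1998")]
print(fuse_temporal_graph("Who was CEO in 2000?", "...", edges))
```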
Learning to love diligent trolls: Accounting for rater effects in the dialogue safety task
results: Experiments show that when trolls are consistent, the AES-like method can infer training labels with high accuracy, even when trolls are the majority.
Abstract
Chatbots have the risk of generating offensive utterances, which must be avoided. Post-deployment, one way for a chatbot to continuously improve is to source utterance/label pairs from feedback by live users. However, among users are trolls, who provide training examples with incorrect labels. To de-troll training data, previous work removed training examples that have high user-aggregated cross-validation (CV) error. However, CV is expensive; and in a coordinated attack, CV may be overwhelmed by trolls in number and in consistency among themselves. In the present work, I address both limitations by proposing a solution inspired by methodology in automated essay scoring (AES): have multiple users rate each utterance, then perform latent class analysis (LCA) to infer correct labels. As it does not require GPU computations, LCA is inexpensive. In experiments, I found that the AES-like solution can infer training labels with high accuracy when trolls are consistent, even when trolls are the majority.
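A sketch of label aggregation by latent class analysis in the Dawid-Skene style: each rater gets a latent confusion matrix, and EM alternates between estimating per-item label posteriors and per-rater reliabilities. The paper's exact LCA parameterization may differ.

```python
import numpy as np

def lca_em(votes, n_classes=2, iters=50):
    """votes: (n_items, n_raters) int matrix; -1 marks a missing rating."""
    n_items, n_raters = votes.shape
    post = np.full((n_items, n_classes), 1.0 / n_classes)
    for i in range(n_items):            # initialize with per-item vote shares
        seen = votes[i][votes[i] >= 0]
        if len(seen):
            post[i] = np.bincount(seen, minlength=n_classes) / len(seen)
    for _ in range(iters):
        prior = post.mean(axis=0)       # M-step: class prior ...
        conf = np.full((n_raters, n_classes, n_classes), 1e-6)
        for r in range(n_raters):       # ... and per-rater confusion matrices
            for i in range(n_items):
                if votes[i, r] >= 0:
                    conf[r, :, votes[i, r]] += post[i]
        conf /= conf.sum(axis=-1, keepdims=True)
        logpost = np.tile(np.log(prior), (n_items, 1))   # E-step
        for i in range(n_items):
            for r in range(n_raters):
                if votes[i, r] >= 0:
                    logpost[i] += np.log(conf[r, :, votes[i, r]])
        post = np.exp(logpost - logpost.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post.argmax(axis=1)          # inferred "de-trolled" labels
```

Unlike cross-validation, this requires no model training or GPU time, and consistent trolls simply end up with confusion matrices that discount their votes.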
Moral Judgments in Narratives on Reddit: Investigating Moral Sparks via Social Commonsense and Linguistic Signals
results: Event-related negative personal traits (e.g., immature and rude) attract attention and stimulate blame, implying a dependent relationship between moral sparks and blameworthiness. Moreover, language that shapes how commenters picture events and characters makes an excerpt more likely to become a moral spark, while factual, concrete descriptions inhibit this effect.
Abstract
Given the increasing realism of social interactions online, social media offers an unprecedented avenue to evaluate real-life moral scenarios. We examine posts from Reddit, where authors and commenters share their moral judgments on who is blameworthy. We employ computational techniques to investigate factors influencing moral judgments, including (1) events activating social commonsense and (2) linguistic signals. To this end, we focus on excerpt-which we term moral sparks-from original posts that commenters include to indicate what motivates their moral judgments. By examining over 24,672 posts and 175,988 comments, we find that event-related negative personal traits (e.g., immature and rude) attract attention and stimulate blame, implying a dependent relationship between moral sparks and blameworthiness. Moreover, language that impacts commenters' cognitive processes to depict events and characters enhances the probability of an excerpt become a moral spark, while factual and concrete descriptions tend to inhibit this effect.
Overview of the CLAIMSCAN-2023: Uncovering Truth in Social Media through Claim Detection and Identification of Claim Spans
methods: The shared task comprises two tasks: Task A, determining whether a social media post constitutes a claim, and Task B, precisely identifying the words or phrases within the post that form the claim.
results: CLAIMSCAN was presented at the 2023 Forum for Information Retrieval Evaluation (FIRE'2023); Task A received 40 registrations and Task B attracted 28 teams, underscoring the task's relevance in today's digital era of misinformation.
Abstract
A significant increase in content creation and information exchange has been made possible by the quick development of online social media platforms, which has been very advantageous. However, these platforms have also become a haven for those who disseminate false information, propaganda, and fake news. Claims are essential in forming our perceptions of the world, but sadly, they are frequently used to trick people by those who spread false information. To address this problem, social media giants employ content moderators to filter out fake news from the actual world. However, the sheer volume of information makes it difficult to identify fake news effectively. Therefore, it has become crucial to automatically identify social media posts that make such claims, check their veracity, and differentiate between credible and false claims. In response, we presented CLAIMSCAN in the 2023 Forum for Information Retrieval Evaluation (FIRE'2023). The primary objectives centered on two crucial tasks: Task A, determining whether a social media post constitutes a claim, and Task B, precisely identifying the words or phrases within the post that form the claim. Task A received 40 registrations, demonstrating a strong interest and engagement in this timely challenge. Meanwhile, Task B attracted participation from 28 teams, highlighting its significance in the digital era of misinformation.
M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models
methods: The paper proposes an automated approach (requiring negligible human annotation) to convert short-sequence tasks into long-sequence scenarios, evaluating LLMs across multiple abilities.
results: The study finds that current LLMs struggle to understand long context when tasks require attention to multiple spans; semantic retrieval tasks are harder even for competent LLMs; and models fine-tuned on longer text with position interpolation achieve performance comparable to NTK-aware scaling methods without fine-tuning.
Abstract
Managing long sequences has become an important and necessary feature for large language models (LLMs). However, it is still an open question of how to comprehensively and systematically evaluate the long-sequence capability of LLMs. One of the reasons is that conventional and widely-used benchmarks mainly consist of short sequences. In this paper, we propose M4LE, a Multi-ability, Multi-range, Multi-task, Multi-domain benchmark for Long-context Evaluation. M4LE is based on a diverse NLP task pool comprising 36 NLP datasets, 11 task types and 12 domains. To alleviate the scarcity of tasks with naturally long sequences and incorporate multiple-ability assessment, we propose an automatic approach (but with negligible human annotations) to convert short-sequence tasks into a unified long-sequence scenario where LLMs have to identify single or multiple relevant spans in long contexts based on explicit or semantic hints. Specifically, the scenario includes five different types of abilities: (1) explicit single-span; (2) semantic single-span; (3) explicit multiple-span; (4) semantic multiple-span; and (5) global context understanding. The resulting samples in M4LE are evenly distributed from 1k to 8k input length. We conducted a systematic evaluation on 11 well-established LLMs, especially those optimized for long-sequence inputs. Our results reveal that: 1) Current LLMs struggle to understand long context, particularly when tasks require multiple-span attention. 2) Semantic retrieval task is more difficult for competent LLMs. 3) Models fine-tuned on longer text with position interpolation have comparable performance to those using Neural Tangent Kernel (NTK) aware scaling methods without fine-tuning. We make our benchmark publicly available to encourage future research in this challenging area.
Building Real-World Meeting Summarization Systems using Large Language Models: A Practical Perspective
results: The study finds that most closed-source LLMs perform better overall, but smaller open-source models like LLaMA-2 (7B and 13B) can achieve comparable performance even in zero-shot scenarios. Given the privacy concerns around closed-source models and the high cost of their fine-tuned versions, open-source models are more advantageous for industrial use, with LLaMA-2-7B looking the most promising.
Abstract
This paper studies how to effectively build meeting summarization systems for real-world usage using large language models (LLMs). For this purpose, we conduct an extensive evaluation and comparison of various closed-source and open-source LLMs, namely GPT-4, GPT-3.5, PaLM-2, and LLaMA-2. Our findings reveal that most closed-source LLMs are generally better in terms of performance. However, much smaller open-source models like LLaMA-2 (7B and 13B) can still achieve performance comparable to the large closed-source models, even in zero-shot scenarios. Considering the privacy concerns of closed-source models, which are accessible only via API, alongside the high cost associated with using their fine-tuned versions, the open-source models that can achieve competitive performance are more advantageous for industrial use. Balancing performance with the associated costs and privacy concerns, the LLaMA-2-7B model looks more promising for industrial usage. In sum, this paper offers practical insights on using LLMs for real-world business meeting summarization, shedding light on the trade-offs between performance and cost.
results: Experiments show that using tropical characteristics to prune adapter parameters is more effective than the magnitude-based baseline, while a combined approach works best across tasks.
Abstract
Adapters are widely popular parameter-efficient transfer learning approaches in natural language processing that insert trainable modules in between layers of a pre-trained language model. Apart from several heuristics, however, there has been a lack of studies analyzing the optimal number of adapter parameters needed for downstream applications. In this paper, we propose an adapter pruning approach by studying the tropical characteristics of trainable modules. We cast it as an optimization problem that aims to prune parameters from the adapter layers without changing the orientation of underlying tropical hypersurfaces. Our experiments on five NLP datasets show that tropical geometry tends to identify more relevant parameters to prune when compared with the magnitude-based baseline, while a combined approach works best across the tasks.
LitCab: Lightweight Calibration of Language Models on Outputs of Varied Lengths
paper_authors: Xin Liu, Muhammad Khalifa, Lu Wang
for: The paper proposes a lightweight calibration mechanism to improve the calibration of language models (LMs).
methods: The paper uses a single linear layer that takes the input text representation and manipulates the LM output logits.
results: LitCab improves LM calibration while adding < 2% of the original model parameters, reducing the average ECE score by 20% across 7 text generation tasks. A further evaluation of 7 popular open-source LMs from the GPT and LLaMA families yields three key findings: (1) larger models within the same family exhibit better calibration on short generation tasks, but not necessarily on longer ones; (2) GPT-family models show superior calibration compared to LLaMA, Llama2, and Vicuna models despite having far fewer parameters; (3) finetuning a pretrained model (e.g., LLaMA) on samples of limited purpose (e.g., conversations) may worsen calibration, highlighting the importance of finetuning setups for calibrating LMs.
Abstract
A model is considered well-calibrated when its probability estimate aligns with the actual likelihood of the output being correct. Calibrating language models (LMs) is crucial, as it plays a vital role in detecting and mitigating hallucinations, a common issue of LMs, as well as building more trustworthy models. Yet, popular neural model calibration techniques are not well-suited for LMs due to their lack of flexibility in discerning answer correctness and their high computational costs. For instance, post-processing methods like temperature scaling are often unable to reorder the candidate generations. Moreover, training-based methods require finetuning the entire model, which is impractical due to the increasing sizes of modern LMs. In this paper, we present LitCab, a lightweight calibration mechanism consisting of a single linear layer taking the input text representation and manipulateing the LM output logits. LitCab improves model calibration by only adding < 2% of the original model parameters. For evaluation, we construct CaT, a benchmark consisting of 7 text generation tasks, covering responses ranging from short phrases to paragraphs. We test LitCab with Llama2-7B, where it improves calibration across all tasks, by reducing the average ECE score by 20%. We further conduct a comprehensive evaluation with 7 popular open-sourced LMs from GPT and LLaMA families, yielding the following key findings: (1) Larger models within the same family exhibit better calibration on tasks with short generation tasks, but not necessarily for longer ones. (2) GPT-family models show superior calibration compared to LLaMA, Llama2 and Vicuna models despite having much fewer parameters. (3) Finetuning pretrained model (e.g., LLaMA) with samples of limited purpose (e.g., conversations) may lead to worse calibration, highlighting the importance of finetuning setups for calibrating LMs.
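A minimal sketch of a LitCab-style head, assuming the "single linear layer" maps a pooled input-text representation to a per-vocabulary logit offset added to the frozen LM's logits (the pooling choice is ours):

```python
import torch
import torch.nn as nn

class LitCab(nn.Module):
    """One linear layer on top of a frozen LM; adds < 2% extra parameters."""
    def __init__(self, hidden_size, vocab_size):
        super().__init__()
        self.proj = nn.Linear(hidden_size, vocab_size)

    def forward(self, lm_logits, text_repr):
        # lm_logits: (B, V) from the frozen LM; text_repr: (B, H) pooled input
        return lm_logits + self.proj(text_repr)  # calibrated logits
```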
results: Experiments show that Policy-Learn outperforms existing baselines across a wide range of datasets.
Abstract
Subgraph GNNs are provably expressive neural architectures that learn graph representations from sets of subgraphs. Unfortunately, their applicability is hampered by the computational complexity associated with performing message passing on many subgraphs. In this paper, we consider the problem of learning to select a small subset of the large set of possible subgraphs in a data-driven fashion. We first motivate the problem by proving that there are families of WL-indistinguishable graphs for which there exist efficient subgraph selection policies: small subsets of subgraphs that can already identify all the graphs within the family. We then propose a new approach, called Policy-Learn, that learns how to select subgraphs in an iterative manner. We prove that, unlike popular random policies and prior work addressing the same problem, our architecture is able to learn the efficient policies mentioned above. Our experimental results demonstrate that Policy-Learn outperforms existing baselines across a wide range of datasets.
Hybridizing Physics and Neural ODEs for Predicting Plasma Inductance Dynamics in Tokamak Fusion Reactors
results: The study finds that combining physics-based equations with a neural network improves the accuracy of predicting plasma dynamics, outperforming both existing physics-motivated ODEs and a pure neural ODE model.
Abstract
While fusion reactors known as tokamaks hold promise as a firm energy source, advances in plasma control, and handling of events where control of plasmas is lost, are needed for them to be economical. A significant bottleneck towards applying more advanced control algorithms is the need for better plasma simulation, where both physics-based and data-driven approaches currently fall short. The former is bottle-necked by both computational cost and the difficulty of modelling plasmas, and the latter is bottle-necked by the relative paucity of data. To address this issue, this work applies the neural ordinary differential equations (ODE) framework to the problem of predicting a subset of plasma dynamics, namely the coupled plasma current and internal inductance dynamics. As the neural ODE framework allows for the natural inclusion of physics-based inductive biases, we train both physics-based and neural network models on data from the Alcator C-Mod fusion reactor and find that a model that combines physics-based equations with a neural ODE performs better than both existing physics-motivated ODEs and a pure neural ODE model.
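A sketch of the hybrid right-hand side: known (approximate) physics plus a learned neural residual, integrated end-to-end with a differentiable ODE solver. The physics term below is a placeholder, not the paper's plasma equations; `torchdiffeq` is an assumed dependency.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq

class HybridRHS(nn.Module):
    def __init__(self, dim, physics_fn):
        super().__init__()
        self.physics_fn = physics_fn  # physics-based inductive bias
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))

    def forward(self, t, x):
        return self.physics_fn(t, x) + self.net(x)  # physics + learned residual

rhs = HybridRHS(dim=2, physics_fn=lambda t, x: -0.1 * x)  # toy decay dynamics
x0 = torch.tensor([[1.0, 0.5]])
traj = odeint(rhs, x0, torch.linspace(0.0, 1.0, 20))  # (20, 1, 2) trajectory
```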
Meek Separators and Their Applications in Targeted Causal Discovery
results: Our results provide the first known average-case provable guarantees for the subset search and causal matching problems, and we believe they open up possibilities for designing near-optimal methods for many other targeted causal structure learning problems.
Abstract
Learning causal structures from interventional data is a fundamental problem with broad applications across various fields. While many previous works have focused on recovering the entire causal graph, in practice, there are scenarios where learning only part of the causal graph suffices. This is called $targeted$ causal discovery. In our work, we focus on two such well-motivated problems: subset search and causal matching. We aim to minimize the number of interventions in both cases. Towards this, we introduce the $Meek~separator$, which is a subset of vertices that, when intervened, decomposes the remaining unoriented edges into smaller connected components. We then present an efficient algorithm to find Meek separators that are of small sizes. Such a procedure is helpful in designing various divide-and-conquer-based approaches. In particular, we propose two randomized algorithms that achieve logarithmic approximation for subset search and causal matching, respectively. Our results provide the first known average-case provable guarantees for both problems. We believe that this opens up possibilities to design near-optimal methods for many other targeted causal structure learning problems arising from various applications.
Decentralised, Scalable and Privacy-Preserving Synthetic Data Generation
paper_authors: Vishal Ramesh, Rui Zhao, Naman Goel
for: The paper explores the use of synthetic data in machine learning, with a focus on privacy-preserving methods for generating synthetic data.
methods: The paper uses a novel system that combines Solid (Social Linked Data), MPC (Secure Multi-Party Computation), and TEEs (Trusted Execution Environments) to generate differentially private synthetic data.
results: The paper demonstrates the effectiveness of the approach through rigorous empirical results on simulated and real datasets, showing that the method addresses key challenges in responsible and trustworthy synthetic data generation, including contributor autonomy, decentralization, privacy, and scalability.
Abstract
Synthetic data is emerging as a promising way to harness the value of data, while reducing privacy risks. The potential of synthetic data is not limited to privacy-friendly data release, but also includes complementing real data in use-cases such as training machine learning algorithms that are more fair and robust to distribution shifts etc. There is a lot of interest in algorithmic advances in synthetic data generation for providing better privacy and statistical guarantees and for its better utilisation in machine learning pipelines. However, for responsible and trustworthy synthetic data generation, it is not sufficient to focus only on these algorithmic aspects and instead, a holistic view of the synthetic data generation pipeline must be considered. We build a novel system that allows the contributors of real data to autonomously participate in differentially private synthetic data generation without relying on a trusted centre. Our modular, general and scalable solution is based on three building blocks namely: Solid (Social Linked Data), MPC (Secure Multi-Party Computation) and Trusted Execution Environments (TEEs). Solid is a specification that lets people store their data securely in decentralised data stores called Pods and control access to their data. MPC refers to the set of cryptographic methods for different parties to jointly compute a function over their inputs while keeping those inputs private. TEEs such as Intel SGX rely on hardware based features for confidentiality and integrity of code and data. We show how these three technologies can be effectively used to address various challenges in responsible and trustworthy synthetic data generation by ensuring: 1) contributor autonomy, 2) decentralisation, 3) privacy and 4) scalability. We support our claims with rigorous empirical results on simulated and real datasets and different synthetic data generation algorithms.
AdaSub: Stochastic Optimization Using Second-Order Information in Low-Dimensional Subspaces
results: AdaSub surpasses popular stochastic optimizers in terms of time and number of iterations required to reach a given accuracy.
Abstract
We introduce AdaSub, a stochastic optimization algorithm that computes a search direction based on second-order information in a low-dimensional subspace that is defined adaptively based on available current and past information. Compared to first-order methods, second-order methods exhibit better convergence characteristics, but the need to compute the Hessian matrix at each iteration results in excessive computational expenses, making them impractical. To address this issue, our approach enables the management of computational expenses and algorithm efficiency by enabling the selection of the subspace dimension for the search. Our code is freely available on GitHub, and our preliminary numerical results demonstrate that AdaSub surpasses popular stochastic optimizers in terms of time and number of iterations required to reach a given accuracy.
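A sketch of one step under our reading of the approach: build a small subspace from recent gradients, form the projected Hessian with Hessian-vector products, and solve the Newton system only in that subspace. The actual subspace construction and safeguards in AdaSub are more involved.

```python
import torch

def adasub_step(params, loss_fn, history, lr=1.0, damping=1e-6, k=5):
    """params: leaf tensor with requires_grad=True; history: list of past grads."""
    loss = loss_fn(params)
    g = torch.autograd.grad(loss, params, create_graph=True)[0].flatten()
    history.append(g.detach().clone())
    V, _ = torch.linalg.qr(torch.stack(history[-k:], dim=1))  # orthonormal basis
    # Projected Hessian H_sub = V^T H V via k Hessian-vector products
    HV = torch.stack([
        torch.autograd.grad(g @ v, params, retain_graph=True)[0].flatten()
        for v in V.T
    ], dim=1)
    H_sub = V.T @ HV
    d = torch.linalg.solve(H_sub + damping * torch.eye(V.shape[1]), V.T @ g.detach())
    with torch.no_grad():
        params -= lr * (V @ d).reshape(params.shape)  # second-order step in subspace
    return loss.item()
```

Because only `k` Hessian-vector products are needed per step, the cost of second-order information is controlled by the chosen subspace dimension.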
Estimating optimal PAC-Bayes bounds with Hamiltonian Monte Carlo
paper_authors: Szilvia Ujváry, Gergely Flamich, Vincent Fortuin, José Miguel Hernández Lobato
for: investigate the tightness of PAC-Bayes bounds when restricting the posterior family to factorized Gaussian distributions.
methods: Sample from the optimal Gibbs posterior using Hamiltonian Monte Carlo, estimate its KL divergence from the prior with thermodynamic integration, and propose three methods to obtain high-probability bounds under different assumptions.
results: Experiments on the MNIST dataset show significant tightness gaps, as much as 5-6% in some cases.
Abstract
An important yet underexplored question in the PAC-Bayes literature is how much tightness we lose by restricting the posterior family to factorized Gaussian distributions when optimizing a PAC-Bayes bound. We investigate this issue by estimating data-independent PAC-Bayes bounds using the optimal posteriors, comparing them to bounds obtained using MFVI. Concretely, we (1) sample from the optimal Gibbs posterior using Hamiltonian Monte Carlo, (2) estimate its KL divergence from the prior with thermodynamic integration, and (3) propose three methods to obtain high-probability bounds under different assumptions. Our experiments on the MNIST dataset reveal significant tightness gaps, as much as 5-6\% in some cases.
The Expressibility of Polynomial based Attention Scheme
for: This paper aims to provide a theoretical analysis of the expressive capabilities of polynomial attention in transformer architectures, and to explore the effectiveness of high-degree polynomials in amplifying large values and distinguishing between datasets.
methods: The paper uses a combination of theoretical analysis and experimental evaluation to study the representational capacity of polynomial attention. The authors construct two carefully designed datasets, namely $\mathcal{D}_0$ and $\mathcal{D}_1$, and demonstrate the ability of a single-layer polynomial attention network to distinguish between these datasets using a sufficiently high degree $\beta$.
results: The paper shows that with a high degree $\beta$, a single-layer polynomial attention network can effectively separate the two datasets, while with a low degree $\beta$, the network cannot effectively distinguish between them. The analysis underscores the greater effectiveness of high-degree polynomials in amplifying large values and capturing intricate linguistic correlations.
Abstract
Large language models (LLMs) have significantly improved various aspects of our daily lives. These models have impacted numerous domains, from healthcare to education, enhancing productivity, decision-making processes, and accessibility. As a result, they have influenced and, to some extent, reshaped people's lifestyles. However, the quadratic complexity of attention in transformer architectures poses a challenge when scaling up these models for processing long textual contexts. This issue makes it impractical to train very large models on lengthy texts or use them efficiently during inference. While a recent study by [KMZ23] introduced a technique that replaces the softmax with a polynomial function and polynomial sketching to speed up attention mechanisms, the theoretical understandings of this new approach are not yet well understood. In this paper, we offer a theoretical analysis of the expressive capabilities of polynomial attention. Our study reveals a disparity in the ability of high-degree and low-degree polynomial attention. Specifically, we construct two carefully designed datasets, namely $\mathcal{D}_0$ and $\mathcal{D}_1$, where $\mathcal{D}_1$ includes a feature with a significantly larger value compared to $\mathcal{D}_0$. We demonstrate that with a sufficiently high degree $\beta$, a single-layer polynomial attention network can distinguish between $\mathcal{D}_0$ and $\mathcal{D}_1$. However, with a low degree $\beta$, the network cannot effectively separate the two datasets. This analysis underscores the greater effectiveness of high-degree polynomials in amplifying large values and distinguishing between datasets. Our analysis offers insight into the representational capacity of polynomial attention and provides a rationale for incorporating higher-degree polynomials in attention mechanisms to capture intricate linguistic correlations.
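A sketch of single-layer polynomial attention for intuition: replace the softmax with an entrywise degree-$\beta$ power of the score matrix, row-normalized (the clamping and normalization choices here are our assumptions):

```python
import torch

def poly_attention(Q, K, V, beta=4):
    scores = (Q @ K.transpose(-2, -1)).clamp(min=0) ** beta  # amplify large scores
    weights = scores / scores.sum(dim=-1, keepdim=True).clamp(min=1e-9)
    return weights @ V
```

Raising scores to a high degree $\beta$ sharpens the weight on entries with large values, which is exactly what separating $\mathcal{D}_1$ (containing one much larger feature) from $\mathcal{D}_0$ requires; a low degree leaves the weights too flat to distinguish them.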
results: The study finds that on low-dimensional datasets the proposed correction yields a noticeable improvement, allowing diffusion to compete with other methods. The method is also shown to scale to high-dimensional tasks; for example, QCD densities are modeled on $SU(n)$ lattices and contrastively learned embeddings on high-dimensional hyperspheres.
Abstract
Riemannian diffusion models draw inspiration from standard Euclidean space diffusion models to learn distributions on general manifolds. Unfortunately, the additional geometric complexity renders the diffusion transition term inexpressible in closed form, so prior methods resort to imprecise approximations of the score matching training objective that degrade performance and preclude applications in high dimensions. In this work, we reexamine these approximations and propose several practical improvements. Our key observation is that most relevant manifolds are symmetric spaces, which are much more amenable to computation. By leveraging and combining various ansätze, we can quickly compute relevant quantities to high precision. On low dimensional datasets, our correction produces a noticeable improvement, allowing diffusion to compete with other methods. Additionally, we show that our method enables us to scale to high dimensional tasks on nontrivial manifolds. In particular, we model QCD densities on $SU(n)$ lattices and contrastively learned embeddings on high dimensional hyperspheres.
PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices
results: The paper's empirical evaluation finds that this configuration optimization can save up to 36% of energy and can quickly converge to near-optimal settings while satisfying application constraints.
Abstract
As neural networks (NN) are deployed across diverse sectors, their energy demand correspondingly grows. While several prior works have focused on reducing energy consumption during training, the continuous operation of ML-powered systems leads to significant energy use during inference. This paper investigates how the configuration of on-device hardware elements such as GPU, memory, and CPU frequency, often neglected in prior studies, affects energy consumption for NN inference with regular fine-tuning. We propose PolyThrottle, a solution that optimizes configurations across individual hardware components using Constrained Bayesian Optimization in an energy-conserving manner. Our empirical evaluation uncovers novel facets of the energy-performance equilibrium showing that we can save up to 36 percent of energy for popular models. We also validate that PolyThrottle can quickly converge towards near-optimal settings while satisfying application constraints.
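As a rough illustration of the search problem PolyThrottle solves, the sketch below replaces Constrained Bayesian Optimization with exhaustive search over a tiny hypothetical configuration grid; the frequency values and the toy energy/latency models are assumptions, and a real implementation would query actual hardware measurements instead.

```python
import itertools

# Hypothetical frequency settings; a real system would enumerate the values
# its edge hardware actually exposes.
GPU_FREQS = [600, 900, 1300]   # MHz
MEM_FREQS = [800, 1600]        # MHz
CPU_FREQS = [1200, 1900]       # MHz

def measure(config):
    """Placeholder for running inference under `config` and measuring
    (energy_joules, latency_ms). In PolyThrottle this is the expensive
    black box that Constrained Bayesian Optimization queries sparingly."""
    gpu, mem, cpu = config
    energy = 0.5 * gpu / 1000 + 0.3 * mem / 1000 + 0.2 * cpu / 1000  # toy model
    latency = 2000.0 / gpu + 500.0 / cpu                             # toy model
    return energy, latency

def best_config(latency_budget_ms):
    """Return the lowest-energy configuration that meets the latency budget."""
    feasible = []
    for cfg in itertools.product(GPU_FREQS, MEM_FREQS, CPU_FREQS):
        energy, latency = measure(cfg)
        if latency <= latency_budget_ms:
            feasible.append((energy, cfg))
    return min(feasible)[1] if feasible else None

print(best_config(latency_budget_ms=4.0))
```

Exhaustive search makes the constraint structure obvious but scales poorly; the point of the constrained-BO formulation is to find near-optimal settings with far fewer measurements.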
Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations
results: The method reduces the training-time complexity from $\mathcal{O}(TDS + TNS)$ to $\mathcal{O}(NS + T\sqrt{D}\log{D} + TS^2)$. Experimental results show that it can reduce the runtime by a factor of up to $2,200\times$, depending on the value of the privacy parameter $\epsilon$ and the sparsity of the dataset.
Abstract
To the best of our knowledge, there are no methods today for training differentially private regression models on sparse input data. To remedy this, we adapt the Frank-Wolfe algorithm for $L_1$ penalized linear regression to be aware of sparse inputs and to use them effectively. In doing so, we reduce the training time of the algorithm from $\mathcal{O}( T D S + T N S)$ to $\mathcal{O}(N S + T \sqrt{D} \log{D} + T S^2)$, where $T$ is the number of iterations and $S$ is the sparsity rate of a dataset with $N$ rows and $D$ features. Our results demonstrate that this procedure can reduce runtime by a factor of up to $2,200\times$, depending on the value of the privacy parameter $\epsilon$ and the sparsity of the dataset.
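A minimal sketch of a sparse-input, differentially private Frank-Wolfe step for an $L_1$-constrained logistic loss is shown below. The report-noisy-min selection over the $2D$ vertices of the $L_1$ ball follows the standard DP Frank-Wolfe template; the noise calibration and step size here are schematic assumptions rather than the paper's exact algorithm.

```python
import numpy as np

def dp_frank_wolfe(X, y, radius, T, eps, rng):
    """Sketch of a sparse-aware DP Frank-Wolfe for L1-constrained logistic
    regression. X may be a scipy.sparse CSR matrix (n, d); y is in {-1, +1}.
    The Laplace scale below is a schematic stand-in for a formal privacy
    accounting, not a calibration to deploy."""
    n, d = X.shape
    w = np.zeros(d)
    for t in range(T):
        margins = y * (X @ w)
        grad = X.T @ (-y / (1.0 + np.exp(margins))) / n     # logistic-loss gradient
        # Linear minimization over the 2d vertices {±radius * e_j} of the L1
        # ball, privatized with noisy selection over the inner products:
        scores = np.concatenate([grad, -grad])
        noisy = scores + rng.laplace(scale=T / (n * eps), size=2 * d)
        j = int(np.argmin(noisy))
        vertex = np.zeros(d)
        vertex[j % d] = radius if j < d else -radius
        gamma = 2.0 / (t + 2)                               # standard FW step size
        w = (1 - gamma) * w + gamma * vertex
    return w
```

The speedup the paper describes comes from exploiting sparsity in the gradient computation (`X @ w` and `X.T @ ...`), which is exactly where a sparse matrix representation pays off.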
Unified Enhancement of Privacy Bounds for Mixture Mechanisms via $f$-Differential Privacy
results: The study shows that random initialization can enhance the privacy guarantees of DP-GD, and that $f$-DP provides better privacy guarantees for shuffling models. The paper also introduces a new inequality for trade-off functions, used to analyze the privacy guarantees of mixture mechanisms.
Abstract
Differentially private (DP) machine learning algorithms incur many sources of randomness, such as random initialization, random batch subsampling, and shuffling. However, such randomness is difficult to take into account when proving differential privacy bounds because it induces mixture distributions for the algorithm's output that are difficult to analyze. This paper focuses on improving privacy bounds for shuffling models and one-iteration differentially private gradient descent (DP-GD) with random initializations using $f$-DP. We derive a closed-form expression of the trade-off function for shuffling models that outperforms the most up-to-date results based on $(\epsilon,\delta)$-DP. Moreover, we investigate the effects of random initialization on the privacy of one-iteration DP-GD. Our numerical computations of the trade-off function indicate that random initialization can enhance the privacy of DP-GD. Our analysis of $f$-DP guarantees for these mixture mechanisms relies on an inequality for trade-off functions introduced in this paper. This inequality implies the joint convexity of $F$-divergences. Finally, we study an $f$-DP analog of the advanced joint convexity of the hockey-stick divergence related to $(\epsilon,\delta)$-DP and apply it to analyze the privacy of mixture mechanisms.
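For readers new to $f$-DP, the central object is the trade-off function. The snippet below evaluates the Gaussian-DP trade-off curve $G_\mu(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - \mu)$, a standard example from the $f$-DP literature rather than a construction specific to this paper.

```python
import numpy as np
from scipy.stats import norm

def gaussian_tradeoff(alpha, mu):
    """Trade-off function G_mu of mu-Gaussian DP: the minimal type II error
    of any test distinguishing neighboring datasets at type I error `alpha`.
    Higher curves mean stronger privacy; mu = 0 gives the perfect-privacy
    line 1 - alpha."""
    return norm.cdf(norm.ppf(1 - alpha) - mu)

alphas = np.linspace(0.0, 1.0, 5)
print(gaussian_tradeoff(alphas, mu=1.0))   # pointwise type II errors
```

Comparing such curves pointwise is how one states that one mechanism (e.g. DP-GD with random initialization) is more private than another in the $f$-DP sense.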
Early detection of inflammatory arthritis to improve referrals using multimodal machine learning from blood testing, semi-structured and unstructured patient records
results: The study shows that fusion and ensemble learning methods over multimodal data can assist the early detection of IA and, in practice, improve the use of healthcare resources and diagnostic accuracy. These methods can serve as aids that help clinicians diagnose the disease more accurately.
Abstract
Early detection of inflammatory arthritis (IA) is critical to efficient and accurate hospital referral triage for timely treatment and preventing the deterioration of the IA disease course, especially under limited healthcare resources. The manual assessment process is the most common approach in practice for the early detection of IA, but it is extremely labor-intensive and inefficient. A large amount of clinical information needs to be assessed for every referral from General Practice (GP) to the hospitals. Machine learning shows great potential in automating repetitive assessment tasks and providing decision support for the early detection of IA. However, most machine learning-based methods for IA detection rely on blood testing results. But in practice, blood testing data is not always available at the point of referrals, so we need methods to leverage multimodal data such as semi-structured and unstructured data for early detection of IA. In this research, we present fusion and ensemble learning-based methods using multimodal data to assist decision-making in the early detection of IA. To the best of our knowledge, our study is the first attempt to utilize multimodal data to support the early detection of IA from GP referrals.
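A minimal late-fusion sketch of the kind of multimodal ensemble described here, assuming hypothetical blood-test columns and free-text GP notes; the data, features, and equal fusion weights are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical referral data: blood-test values, free-text GP notes, IA labels.
blood = np.array([[12.0, 45.0], [3.0, 8.0], [11.5, 40.0], [2.5, 7.0]])  # e.g. CRP, ESR
notes = ["joint swelling and morning stiffness", "mild back pain after lifting",
         "symmetrical small-joint pain in hands", "knee pain after running"]
y = np.array([1, 0, 1, 0])

# One model per modality, fused by averaging predicted probabilities.
tab_model = LogisticRegression().fit(blood, y)
txt_model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(notes, y)

def fused_probability(blood_row, note):
    p_tab = tab_model.predict_proba([blood_row])[0, 1]
    p_txt = txt_model.predict_proba([note])[0, 1]
    return 0.5 * (p_tab + p_txt)   # simple late fusion; weights could be learned

print(fused_probability([10.0, 38.0], "swollen MCP joints, stiffness over an hour"))
```

Late fusion has the practical advantage highlighted in the abstract: if one modality (e.g. blood tests) is missing at referral time, the remaining per-modality models can still produce a prediction.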
Topological Learning for Motion Data via Mixed Coordinates
paper_authors: Hengrui Luo, Jisu Kim, Alice Patania, Mikael Vejdemo-Johansson
for: The goal of this paper is to integrate topological information into a multiple output Gaussian process model for transfer learning purposes.
methods: The authors propose using topological information to construct a cluster-based kernel in a multiple output Gaussian process model, which incorporates the topological structural information and allows for a unified framework using topological information in time and motion series.
results: The authors' method can effectively learn from multiple time series via a multiple output Gaussian process model and achieves better performance compared to traditional methods.
Abstract
Topology can extract the structural information in a dataset efficiently. In this paper, we attempt to incorporate topological information into a multiple output Gaussian process model for transfer learning purposes. To achieve this goal, we extend the framework of circular coordinates into a novel framework of mixed valued coordinates to take linear trends in the time series into consideration. One of the major challenges to learn from multiple time series effectively via a multiple output Gaussian process model is constructing a functional kernel. We propose to use topologically induced clustering to construct a cluster based kernel in a multiple output Gaussian process model. This kernel not only incorporates the topological structural information, but also allows us to put forward a unified framework using topological information in time and motion series.
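A small sketch of a cluster-based multi-output kernel of the kind described: a shared time kernel is modulated by an output-coupling matrix derived from cluster co-membership. The specific coupling values, and the assumption that the cluster labels come from the topologically induced clustering step, are illustrative.

```python
import numpy as np

def rbf(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel over 1-D time inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def cluster_kernel(t1, out1, t2, out2, clusters, within=1.0, across=0.1):
    """Multi-output kernel K((t,i),(t',j)) = k_time(t,t') * B[i,j], where the
    output-coupling matrix B is driven by cluster co-membership. Here the
    cluster labels would come from the topological clustering of the series
    (e.g. via mixed circular coordinates); coupling values are illustrative."""
    B = np.where(np.equal.outer(clusters[out1], clusters[out2]), within, across)
    return rbf(t1, t2) * B

# Three outputs; outputs 0 and 1 were placed in the same topological cluster.
clusters = np.array([0, 0, 1])
t = np.array([0.0, 0.5, 1.0])
outs = np.array([0, 1, 2])          # one observation per output, for brevity
K = cluster_kernel(t, outs, t, outs, clusters)
print(np.round(K, 3))
```

Outputs in the same cluster share statistical strength through the `within` coupling, which is how the topological structure enters the Gaussian process model.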
PriPrune: Quantifying and Preserving Privacy in Pruned Federated Learning
results: The experimental results show that PriPrune improves privacy protection in FL and can be combined with different pruning schemes and defense strategies to improve the privacy-accuracy tradeoff.
Abstract
Federated learning (FL) is a paradigm that allows several client devices and a server to collaboratively train a global model, by exchanging only model updates, without the devices sharing their local training data. These devices are often constrained in terms of communication and computation resources, and can further benefit from model pruning -- a paradigm that is widely used to reduce the size and complexity of models. Intuitively, by making local models coarser, pruning is expected to also provide some protection against privacy attacks in the context of FL. However this protection has not been previously characterized, formally or experimentally, and it is unclear if it is sufficient against state-of-the-art attacks. In this paper, we perform the first investigation of privacy guarantees for model pruning in FL. We derive information-theoretic upper bounds on the amount of information leaked by pruned FL models. We complement and validate these theoretical findings, with comprehensive experiments that involve state-of-the-art privacy attacks, on several state-of-the-art FL pruning schemes, using benchmark datasets. This evaluation provides valuable insights into the choices and parameters that can affect the privacy protection provided by pruning. Based on these insights, we introduce PriPrune -- a privacy-aware algorithm for local model pruning, which uses a personalized per-client defense mask and adapts the defense pruning rate so as to jointly optimize privacy and model performance. PriPrune is universal in that can be applied after any pruned FL scheme on the client, without modification, and protects against any inversion attack by the server. Our empirical evaluation demonstrates that PriPrune significantly improves the privacy-accuracy tradeoff compared to state-of-the-art pruned FL schemes that do not take privacy into account.
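A schematic of the two-step flow: magnitude pruning followed by a per-client defense mask applied before the update leaves the device. The random-masking rule and rates below are illustrative assumptions; PriPrune's actual mask is personalized and its defense rate is adapted jointly with model performance.

```python
import numpy as np

def prune_update(update, prune_rate):
    """Magnitude pruning: zero out the smallest-magnitude fraction of weights."""
    k = int(prune_rate * update.size)
    if k == 0:
        return update
    thresh = np.partition(np.abs(update).ravel(), k - 1)[k - 1]
    return np.where(np.abs(update) <= thresh, 0.0, update)

def apply_defense_mask(update, defense_rate, rng):
    """Per-client privacy defense: additionally zero a random subset of the
    surviving weights before the update leaves the device, hiding part of
    the information an inverting server could exploit."""
    mask = rng.random(update.shape) >= defense_rate
    return update * mask

rng = np.random.default_rng(42)
client_update = rng.standard_normal((4, 4))
sent = apply_defense_mask(prune_update(client_update, prune_rate=0.5),
                          defense_rate=0.2, rng=rng)
```

The tension the paper formalizes is visible even in this toy: raising `defense_rate` hides more information from the server but also discards more of the useful signal in the update.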
The Acquisition of Physical Knowledge in Generative Neural Networks
paper_authors: Luca M. Schulze Buschoff, Eric Schulz, Marcel Binz
for: investigate how the learning trajectories of deep generative neural networks compare to children’s developmental trajectories using physical understanding as a testbed.
methods: use physical understanding as a testbed to examine two distinct hypotheses of human development - stochastic optimization and complexity increase.
results: The authors find that while their models are able to accurately predict a number of physical processes, the learning trajectories under both hypotheses do not follow the developmental trajectories of children.
Abstract
As children grow older, they develop an intuitive understanding of the physical processes around them. Their physical understanding develops in stages, moving along developmental trajectories which have been mapped out extensively in previous empirical research. Here, we investigate how the learning trajectories of deep generative neural networks compare to children's developmental trajectories using physical understanding as a testbed. We outline an approach that allows us to examine two distinct hypotheses of human development - stochastic optimization and complexity increase. We find that while our models are able to accurately predict a number of physical processes, their learning trajectories under both hypotheses do not follow the developmental trajectories of children.
Lyapunov-Based Dropout Deep Neural Network (Lb-DDNN) Controller
results: In experiments, the proposed dropout DNN-based adaptive controller achieved a 38.32% reduction in tracking error, a 53.67% reduction in function approximation error, and 50.44% lower control effort compared to a baseline adaptive DNN controller without dropout regularization.
Abstract
Deep neural network (DNN)-based adaptive controllers can be used to compensate for unstructured uncertainties in nonlinear dynamic systems. However, DNNs are also very susceptible to overfitting and co-adaptation. Dropout regularization is an approach where nodes are randomly dropped during training to alleviate issues such as overfitting and co-adaptation. In this paper, a dropout DNN-based adaptive controller is developed. The developed dropout technique allows the deactivation of weights that are stochastically selected for each individual layer within the DNN. Simultaneously, a Lyapunov-based real-time weight adaptation law is introduced to update the weights of all layers of the DNN for online unsupervised learning. A non-smooth Lyapunov-based stability analysis is performed to ensure asymptotic convergence of the tracking error. Simulation results of the developed dropout DNN-based adaptive controller indicate a 38.32% improvement in the tracking error, a 53.67% improvement in the function approximation error, and 50.44% lower control effort when compared to a baseline adaptive DNN-based controller without dropout regularization.
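For context, a generic form of a Lyapunov-based real-time adaptation law is sketched below in LaTeX; the regressor, gains, and the way the dropout mask enters are assumptions standing in for the paper's specific law, not a reproduction of it.

```latex
% Generic Lyapunov-based adaptation law for the weights of a DNN controller
% (schematic). e: tracking error, \phi: DNN feature/regressor vector,
% \Gamma: positive-definite adaptation gain, \mathrm{proj}: projection operator,
% M(t): stochastic 0/1 dropout mask applied layer-wise during adaptation.
\begin{equation}
  \dot{\hat{W}} \;=\; \mathrm{proj}\!\left( \Gamma \, \big(M(t) \odot \phi\big) \, e^{\top} \right)
\end{equation}
% A candidate Lyapunov function combining tracking error and weight error,
% whose negative-definite derivative along trajectories yields convergence:
\begin{equation}
  V \;=\; \tfrac{1}{2}\, e^{\top} e \;+\; \tfrac{1}{2}\,\mathrm{tr}\!\left( \tilde{W}^{\top} \Gamma^{-1} \tilde{W} \right),
  \qquad \tilde{W} \;=\; W^{*} - \hat{W}.
\end{equation}
```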
paper_authors: Jonas Scholz, Tom R. Andersson, Anna Vaughan, James Requeima, Richard E. Turner
for: This paper explores how machine learning models can be used for weather prediction and climate monitoring.
methods: The methods include gridded weather data produced by numerical data assimilation systems, together with a neural network model conditioned on both gridded and off-the-grid context data to make uncertainty-aware predictions at target locations.
results: The study finds that the 'Sim2Real' approach achieves substantially better results on surface temperature prediction over Germany, showing that data from numerical data assimilation systems can serve as a stepping stone for learning more accurately from real observations.
Abstract
Machine learning (ML)-based weather models have recently undergone rapid improvements. These models are typically trained on gridded reanalysis data from numerical data assimilation systems. However, reanalysis data comes with limitations, such as assumptions about physical laws and low spatiotemporal resolution. The gap between reanalysis and reality has sparked growing interest in training ML models directly on observations such as weather stations. Modelling scattered and sparse environmental observations requires scalable and flexible ML architectures, one of which is the convolutional conditional neural process (ConvCNP). ConvCNPs can learn to condition on both gridded and off-the-grid context data to make uncertainty-aware predictions at target locations. However, the sparsity of real observations presents a challenge for data-hungry deep learning models like the ConvCNP. One potential solution is 'Sim2Real': pre-training on reanalysis and fine-tuning on observational data. We analyse Sim2Real with a ConvCNP trained to interpolate surface air temperature over Germany, using varying numbers of weather stations for fine-tuning. On held-out weather stations, Sim2Real training substantially outperforms the same model architecture trained only with reanalysis data or only with station data, showing that reanalysis data can serve as a stepping stone for learning from real observations. Sim2Real could thus enable more accurate models for weather prediction and climate monitoring.
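A minimal PyTorch-style skeleton of the Sim2Real recipe: pre-train on reanalysis batches, then fine-tune on station observations with a smaller learning rate. The loaders, the MSE stand-in loss, and the epoch counts and learning rates are assumptions, not the paper's settings.

```python
import torch
from torch import nn

def run_epochs(model, loader, optimizer, loss_fn, epochs):
    """Generic supervised training loop over (inputs, targets) batches."""
    for _ in range(epochs):
        for context, target in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(context), target)
            loss.backward()
            optimizer.step()

def sim2real(model, reanalysis_loader, station_loader):
    """Pre-train on gridded reanalysis, then fine-tune on sparse stations,
    typically with a smaller learning rate so the real data nudges rather
    than overwrites the simulation-learned representation."""
    loss_fn = nn.MSELoss()   # stand-in; a ConvCNP would use a predictive log-likelihood
    pretrain_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    run_epochs(model, reanalysis_loader, pretrain_opt, loss_fn, epochs=50)
    finetune_opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    run_epochs(model, station_loader, finetune_opt, loss_fn, epochs=20)
    return model
```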
Solving a Class of Cut-Generating Linear Programs via Machine Learning
results: Experimental results show that the method improves solution time compared to conventional cutting plane methods.
Abstract
Cut-generating linear programs (CGLPs) play a key role as a separation oracle to produce valid inequalities for the feasible region of mixed-integer programs. When incorporated inside branch-and-bound, the cutting planes obtained from CGLPs help to tighten relaxations and improve dual bounds. However, running the CGLPs at the nodes of the branch-and-bound tree is computationally cumbersome due to the large number of node candidates and the lack of a priori knowledge on which nodes admit useful cutting planes. As a result, CGLPs are often avoided at default settings of branch-and-cut algorithms despite their potential impact on improving dual bounds. In this paper, we propose a novel framework based on machine learning to approximate the optimal value of a CGLP class that determines whether a cutting plane can be generated at a node of the branch-and-bound tree. Translating the CGLP as an indicator function of the objective function vector, we show that it can be approximated through conventional data classification techniques. We provide a systematic procedure to efficiently generate training data sets for the corresponding classification problem based on the CGLP structure. We conduct computational experiments on benchmark instances using classification methods such as logistic regression. These results suggest that the approximate CGLP obtained from classification can improve the solution time compared to that of conventional cutting plane methods. Our proposed framework can be efficiently applied to a large number of nodes in the branch-and-bound tree to identify the best candidates for adding a cut.
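A minimal sketch of the classification idea: train a classifier offline on labeled branch-and-bound nodes and use it online to decide whether the CGLP is worth solving. The node features, labels, and threshold below are hypothetical stand-ins for the training data the paper generates from the CGLP structure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical node features collected at branch-and-bound nodes, e.g.
# depth, LP objective gap, fractionality statistics of the relaxation.
# The label is 1 if solving the CGLP at that node yielded a violated cut.
X_train = np.array([[3, 0.12, 0.45], [10, 0.01, 0.05],
                    [5, 0.08, 0.30], [12, 0.02, 0.10]])
y_train = np.array([1, 0, 1, 0])

clf = LogisticRegression().fit(X_train, y_train)

def should_run_cglp(node_features, threshold=0.5):
    """Skip the expensive CGLP unless the classifier predicts a useful cut."""
    return clf.predict_proba([node_features])[0, 1] >= threshold

print(should_run_cglp([4, 0.10, 0.40]))
```

Because the classifier is cheap relative to a CGLP solve, it can be evaluated at every candidate node, which is the mechanism behind the reported speedups.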
Meta-Learning Strategies through Value Maximization in Neural Networks
results: Across learning settings, control effort is found to be most beneficial when applied to easier aspects of a task early in learning, followed by sustained effort on harder aspects. Overall, the learning effort framework provides a tractable theoretical test bed for studying normative control strategies in a variety of learning systems, as well as a formal account of optimal cognitive control strategies over learning trajectories posited by theories in cognitive neuroscience.
Abstract
Biological and artificial learning agents face numerous choices about how to learn, ranging from hyperparameter selection to aspects of task distributions like curricula. Understanding how to make these meta-learning choices could offer normative accounts of cognitive control functions in biological learners and improve engineered systems. Yet optimal strategies remain challenging to compute in modern deep networks due to the complexity of optimizing through the entire learning process. Here we theoretically investigate optimal strategies in a tractable setting. We present a learning effort framework capable of efficiently optimizing control signals on a fully normative objective: discounted cumulative performance throughout learning. We obtain computational tractability by using average dynamical equations for gradient descent, available for simple neural network architectures. Our framework accommodates a range of meta-learning and automatic curriculum learning methods in a unified normative setting. We apply this framework to investigate the effect of approximations in common meta-learning algorithms; infer aspects of optimal curricula; and compute optimal neuronal resource allocation in a continual learning setting. Across settings, we find that control effort is most beneficial when applied to easier aspects of a task early in learning; followed by sustained effort on harder aspects. Overall, the learning effort framework provides a tractable theoretical test bed to study normative benefits of interventions in a variety of learning systems, as well as a formal account of optimal cognitive control strategies over learning trajectories posited by established theories in cognitive neuroscience.
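Schematically, the fully normative objective described here can be written as a discounted integral of performance net of a control cost (LaTeX below); the exact notation, discount form, and cost term are assumptions for illustration.

```latex
% Schematic form of the normative objective: choose a time course of control
% signals g(t) to maximize discounted cumulative performance P(t; g) net of
% an effort cost C over the learning horizon T:
\begin{equation}
  V(g) \;=\; \int_{0}^{T} e^{-t/\tau} \Big[ P\big(t; g\big) \;-\; \lambda\, C\big(g(t)\big) \Big] dt,
  \qquad g^{*} \;=\; \arg\max_{g} V(g),
\end{equation}
% where learning itself follows averaged gradient-descent dynamics
% \dot{w}(t) = -\eta \, \nabla_{w} \mathcal{L}\big(w(t); g(t)\big) modulated by
% the control signal, which is what makes the optimization tractable.
```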
GPCR-BERT: Interpreting Sequential Design of G Protein Coupled Receptors Using Protein Language Models
results: The study finds that attention weights and hidden states can characterize the contribution of each amino acid in the protein sequence, and that analysis of embeddings over 3D structures helps elucidate higher-order interactions within GPCR conformations.
Abstract
With the rise of Transformers and Large Language Models (LLMs) in Chemistry and Biology, new avenues for the design and understanding of therapeutics have opened up to the scientific community. Protein sequences can be modeled as language and can take advantage of recent advances in LLMs, specifically with the abundance of our access to the protein sequence datasets. In this paper, we developed the GPCR-BERT model for understanding the sequential design of G Protein-Coupled Receptors (GPCRs). GPCRs are the target of over one-third of FDA-approved pharmaceuticals. However, there is a lack of comprehensive understanding regarding the relationship between amino acid sequence, ligand selectivity, and conformational motifs (such as NPxxY, CWxP, E/DRY). By utilizing the pre-trained protein model (Prot-Bert) and fine-tuning with prediction tasks of variations in the motifs, we were able to shed light on several relationships between residues in the binding pocket and some of the conserved motifs. To achieve this, we took advantage of attention weights, and hidden states of the model that are interpreted to extract the extent of contributions of amino acids in dictating the type of masked ones. The fine-tuned models demonstrated high accuracy in predicting hidden residues within the motifs. In addition, the analysis of embedding was performed over 3D structures to elucidate the higher-order interactions within the conformations of the receptors.
Bayesian Simulation-based Inference for Cosmological Initial Conditions
results: The paper presents first promising results on reconstructing cosmological initial conditions from late-time observations.
Abstract
Reconstructing astrophysical and cosmological fields from observations is challenging. It requires accounting for non-linear transformations, mixing of spatial structure, and noise. In contrast, forward simulators that map fields to observations are readily available for many applications. We present a versatile Bayesian field reconstruction algorithm rooted in simulation-based inference and enhanced by autoregressive modeling. The proposed technique is applicable to generic (non-differentiable) forward simulators and allows sampling from the posterior for the underlying field. We show first promising results on a proof-of-concept application: the recovery of cosmological initial conditions from late-time density fields.
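As a much simpler stand-in for the paper's autoregressive simulation-based inference, the sketch below runs rejection ABC against a toy forward simulator; it illustrates the core idea of sampling a posterior using only forward simulations, while the simulator, prior, and tolerance are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta):
    """Toy forward model standing in for e.g. gravitationally evolving an
    initial field into a late-time observable; here just a noisy quadratic."""
    return theta ** 2 + 0.1 * rng.standard_normal(theta.shape)

def rejection_abc(observation, n_draws=100000, tol=0.05):
    """Keep prior draws whose simulated output lands near the observation.
    Real simulation-based inference replaces this with a trained (here,
    autoregressive) density estimator, which scales far better in dimension."""
    theta = rng.uniform(-2.0, 2.0, size=n_draws)       # prior samples
    x = simulator(theta)
    accepted = theta[np.abs(x - observation) < tol]
    return accepted                                     # approximate posterior draws

posterior = rejection_abc(observation=1.0)
print(posterior.mean(), posterior.std(), len(posterior))
```

Note the toy posterior is bimodal (theta near +1 and -1 both explain the data), a small reminder of why field-level posteriors need expressive density estimators rather than point estimates.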
BTRec: BERT-Based Trajectory Recommendation for Personalized Tours
methods: The study uses the BERT framework, combined with user demographic information and past POI visits, and proposes an iterative algorithm (BTREC) that extends the POIBERT embedding algorithm for personalized POI itinerary recommendation.
results: Experimental results on datasets of eight cities of different sizes show that the BTREC algorithm is stable and outperforms many other sequence prediction algorithms, as measured by recall, precision, and F1-scores.
Abstract
An essential task for tourists having a pleasant holiday is to have a well-planned itinerary with relevant recommendations, especially when visiting unfamiliar cities. Many tour recommendation tools only take into account a limited number of factors, such as popular Points of Interest (POIs) and routing constraints. Consequently, the solutions they provide may not always align with the individual users of the system. We propose an iterative algorithm in this paper, namely: BTREC (BERT-based Trajectory Recommendation), that extends from the POIBERT embedding algorithm to recommend personalized itineraries on POIs using the BERT framework. Our BTREC algorithm incorporates users' demographic information alongside past POI visits into a modified BERT language model to recommend a personalized POI itinerary prediction given a pair of source and destination POIs. Our recommendation system can create a travel itinerary that maximizes POIs visited, while also taking into account user preferences for categories of POIs and time availability. Our recommendation algorithm is largely inspired by the problem of sentence completion in natural language processing (NLP). Using a dataset of eight cities of different sizes, our experimental results demonstrate that our proposed algorithm is stable and outperforms many other sequence prediction algorithms, measured by recall, precision, and F1-scores.
Learning quantum states and unitaries of bounded gate complexity
paper_authors: Haimeng Zhao, Laura Lewis, Ishaan Kannan, Yihui Quek, Hsin-Yuan Huang, Matthias C. Caro
for: This paper studies the complexity of learning quantum states and unitaries.
methods: The authors characterize the complexity of learning quantum states and unitaries in terms of sample complexity and query complexity.
results: The authors prove that the sample complexity of learning states and the query complexity of learning unitaries must scale linearly in the gate complexity $G$, while, under reasonable cryptographic conjectures, the computational complexity must scale exponentially in $G$. These results explain the relationship between the expressive power of quantum machine learning models and the complexity of creating the corresponding states and unitaries.
Abstract
While quantum state tomography is notoriously hard, most states hold little interest to practically-minded tomographers. Given that states and unitaries appearing in Nature are of bounded gate complexity, it is natural to ask if efficient learning becomes possible. In this work, we prove that to learn a state generated by a quantum circuit with $G$ two-qubit gates to a small trace distance, a sample complexity scaling linearly in $G$ is necessary and sufficient. We also prove that the optimal query complexity to learn a unitary generated by $G$ gates to a small average-case error scales linearly in $G$. While sample-efficient learning can be achieved, we show that under reasonable cryptographic conjectures, the computational complexity for learning states and unitaries of gate complexity $G$ must scale exponentially in $G$. We illustrate how these results establish fundamental limitations on the expressivity of quantum machine learning models and provide new perspectives on no-free-lunch theorems in unitary learning. Together, our results answer how the complexity of learning quantum states and unitaries relate to the complexity of creating these states and unitaries.
results: The paper applies these ideas to numerical Calabi-Yau metrics, including a discussion of the importance of feature learning.
Abstract
We develop a theory of flows in the space of Riemannian metrics induced by neural network gradient descent. This is motivated in part by recent advances in approximating Calabi-Yau metrics with neural networks and is enabled by recent advances in understanding flows in the space of neural networks. We derive the corresponding metric flow equations, which are governed by a metric neural tangent kernel, a complicated, non-local object that evolves in time. However, many architectures admit an infinite-width limit in which the kernel becomes fixed and the dynamics simplify. Additional assumptions can induce locality in the flow, which allows for the realization of Perelman's formulation of Ricci flow that was used to resolve the 3d Poincar\'e conjecture. We apply these ideas to numerical Calabi-Yau metrics, including a discussion on the importance of feature learning.
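Schematically, the induced flow can be written as below (LaTeX); this is a hedged paraphrase of the abstract's description, with notation assumed, not the paper's exact equations.

```latex
% Schematic of a neural-network-induced metric flow. Gradient descent on the
% network parameters \theta with loss L induces a flow on the predicted
% metric g_\theta, governed by a metric neural tangent kernel \Theta:
\begin{equation}
  \partial_t \, g_{\theta}(x) \;=\; -\,\eta \int_{M} \Theta(x, x')\,
      \frac{\delta L}{\delta g_{\theta}(x')} \, dx',
  \qquad
  \Theta(x, x') \;=\; \nabla_{\theta}\, g_{\theta}(x) \cdot \nabla_{\theta}\, g_{\theta}(x').
\end{equation}
% In an infinite-width limit the kernel \Theta freezes, and additional
% locality assumptions reduce the right-hand side to a local flow of Ricci
% type, connecting to Perelman's formulation.
```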
Posterior Sampling for Competitive RL: Function Approximation and Partial Observation
for: This paper investigates posterior sampling algorithms for competitive reinforcement learning (RL) in the context of general function approximations.
methods: The authors propose model-based posterior sampling methods that control both players to learn a Nash equilibrium, and incorporate an adversarial GEC to handle partial observability.
results: The authors provide low regret bounds for the proposed algorithms that scale sublinearly with the proposed GEC and the number of episodes $T$. These methods can be applied to a majority of tractable zero-sum MG classes in both fully observable and partially observable MGs with self-play and adversarial learning.
Abstract
This paper investigates posterior sampling algorithms for competitive reinforcement learning (RL) in the context of general function approximations. Focusing on zero-sum Markov games (MGs) under two critical settings, namely self-play and adversarial learning, we first propose the self-play and adversarial generalized eluder coefficient (GEC) as complexity measures for function approximation, capturing the exploration-exploitation trade-off in MGs. Based on self-play GEC, we propose a model-based self-play posterior sampling method to control both players to learn Nash equilibrium, which can successfully handle the partial observability of states. Furthermore, we identify a set of partially observable MG models fitting MG learning with the adversarial policies of the opponent. Incorporating the adversarial GEC, we propose a model-based posterior sampling method for learning adversarial MG with potential partial observability. We further provide low regret bounds for proposed algorithms that can scale sublinearly with the proposed GEC and the number of episodes $T$. To the best of our knowledge, we for the first time develop generic model-based posterior sampling algorithms for competitive RL that can be applied to a majority of tractable zero-sum MG classes in both fully observable and partially observable MGs with self-play and adversarial learning.
results: The paper shows that the regret of existing methods grows linearly under model deviations, and proposes a robust causal bandits algorithm whose regret is a near-optimal $\tilde{\mathcal{O}}(\sqrt{T})$.
Abstract
Sequential design of experiments for optimizing a reward function in causal systems can be effectively modeled by the sequential design of interventions in causal bandits (CBs). In the existing literature on CBs, a critical assumption is that the causal models remain constant over time. However, this assumption does not necessarily hold in complex systems, which constantly undergo temporal model fluctuations. This paper addresses the robustness of CBs to such model fluctuations. The focus is on causal systems with linear structural equation models (SEMs). The SEMs and the time-varying pre- and post-interventional statistical models are all unknown. Cumulative regret is adopted as the design criterion, based on which the objective is to design a sequence of interventions that incur the smallest cumulative regret with respect to an oracle aware of the entire causal model and its fluctuations. First, it is established that the existing approaches fail to maintain regret sub-linearity with even a few instances of model deviation. Specifically, when the number of instances with model deviation is as few as $T^\frac{1}{2L}$, where $T$ is the time horizon and $L$ is the longest causal path in the graph, the existing algorithms will have linear regret in $T$. Next, a robust CB algorithm is designed, and its regret is analyzed, where upper and information-theoretic lower bounds on the regret are established. Specifically, in a graph with $N$ nodes and maximum degree $d$, under a general measure of model deviation $C$, the cumulative regret is upper bounded by $\tilde{\mathcal{O}}(d^{L-\frac{1}{2}}(\sqrt{NT} + NC))$ and lower bounded by $\Omega(d^{\frac{L}{2}-2}\max\{\sqrt{T},d^2C\})$. Comparing these bounds establishes that the proposed algorithm achieves nearly optimal $\tilde{\mathcal{O}}(\sqrt{T})$ regret when $C$ is $o(\sqrt{T})$ and maintains sub-linear regret for a broader range of $C$.
On Learning Gaussian Multi-index Models with Gradient Flow
results: The paper shows that, by appropriately exploiting the matrix semigroup structure arising over the subspace correlation matrices, global convergence of the gradient flow can be established, together with a quantitative description of its associated 'saddle-to-saddle' dynamics.
Abstract
We study gradient flow on the multi-index regression problem for high-dimensional Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear projection and an arbitrary unknown, low-dimensional link function. As such, they constitute a natural template for feature learning in neural networks. We consider a two-timescale algorithm, whereby the low-dimensional link function is learnt with a non-parametric model infinitely faster than the subspace parametrizing the low-rank projection. By appropriately exploiting the matrix semigroup structure arising over the subspace correlation matrices, we establish global convergence of the resulting Grassmannian population gradient flow dynamics, and provide a quantitative description of its associated `saddle-to-saddle' dynamics. Notably, the timescales associated with each saddle can be explicitly characterized in terms of an appropriate Hermite decomposition of the target link function. In contrast with these positive results, we also show that the related \emph{planted} problem, where the link function is known and fixed, in fact has a rough optimization landscape, in which gradient flow dynamics might get trapped with high probability.
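For reference, the multi-index setup and the two-timescale split can be summarized as follows (notation assumed for illustration):

```latex
% The multi-index target: an unknown rank-k projection composed with an
% unknown low-dimensional link function:
\begin{equation}
  f^{*}(x) \;=\; g^{*}\big( W^{*\top} x \big),
  \qquad W^{*} \in \mathbb{R}^{d \times k},\;\; k \ll d,\;\;
  x \sim \mathcal{N}(0, I_d).
\end{equation}
% Two-timescale learning: the link g is refit non-parametrically at each
% step (the fast timescale), while the subspace spanned by the columns of W
% follows population gradient flow on the Grassmannian Gr(k, d) (the slow
% timescale), whose saddle timescales are governed by the Hermite
% decomposition of g^*.
```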
Locally Optimal Best Arm Identification with a Fixed Budget
methods: The study draws on methods for identifying the best treatment arm, including best arm identification (BAI) and ordinal optimization. It uses the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy, an extension of the Neyman allocation proposed by Neyman (1934) and the Uniform-EBA strategy proposed by Bubeck et al. (2011).
results: The experiments show that the GNA-EBA strategy is asymptotically optimal in the small-gap regime: its probability of misidentification aligns with the lower bounds, meaning the strategy is optimal in this regime.
Abstract
This study investigates the problem of identifying the best treatment arm, a treatment arm with the highest expected outcome. We aim to identify the best treatment arm with a lower probability of misidentification, which has been explored under various names across numerous research fields, including \emph{best arm identification} (BAI) and ordinal optimization. In our experiments, the number of treatment-allocation rounds is fixed. In each round, a decision-maker allocates a treatment arm to an experimental unit and observes a corresponding outcome, which follows a Gaussian distribution with a variance different among treatment arms. At the end of the experiment, we recommend one of the treatment arms as an estimate of the best treatment arm based on the observations. The objective of the decision-maker is to design an experiment that minimizes the probability of misidentifying the best treatment arm. With this objective in mind, we develop lower bounds for the probability of misidentification under the small-gap regime, where the gaps of the expected outcomes between the best and suboptimal treatment arms approach zero. Then, assuming that the variances are known, we design the Generalized-Neyman-Allocation (GNA)-empirical-best-arm (EBA) strategy, which is an extension of the Neyman allocation proposed by Neyman (1934) and the Uniform-EBA strategy proposed by Bubeck et al. (2011). For the GNA-EBA strategy, we show that the strategy is asymptotically optimal because its probability of misidentification aligns with the lower bounds as the sample size approaches infinity under the small-gap regime. We refer to such optimal strategies as locally asymptotic optimal because their performance aligns with the lower bounds within restricted situations characterized by the small-gap regime.
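A minimal sketch of a Neyman-style allocation with known standard deviations: for two arms this is the classical Neyman allocation, while the simple proportional rule and crude rounding below are simplifying assumptions relative to the paper's GNA strategy.

```python
import numpy as np

def gna_allocation(sigmas, budget):
    """Allocate a fixed budget of rounds across treatment arms in proportion
    to their outcome standard deviations (Neyman-style). With two arms this
    reduces to the classical Neyman allocation; the paper's GNA may divide
    the budget among suboptimal arms differently."""
    sigmas = np.asarray(sigmas, dtype=float)
    shares = sigmas / sigmas.sum()
    counts = np.floor(shares * budget).astype(int)
    counts[np.argmax(shares)] += budget - counts.sum()   # hand remainder to largest share
    return counts

# Three arms with known standard deviations and a fixed budget of 300 rounds:
print(gna_allocation(sigmas=[1.0, 2.0, 3.0], budget=300))  # -> [50, 100, 150]
```

The intuition is that noisier arms need more samples to estimate their means to a comparable precision, which is what drives the misidentification probability toward the lower bound.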
Autoregressive Attention Neural Networks for Non-Line-of-Sight User Tracking with Dynamic Metasurface Antennas
results: Numerical evaluations show that, despite LoS blockage, the method achieves high position accuracy across a variety of multipath environments.
Abstract
User localization and tracking in the upcoming generation of wireless networks have the potential to be revolutionized by technologies such as the Dynamic Metasurface Antennas (DMAs). Commonly proposed algorithmic approaches rely on assumptions about relatively dominant Line-of-Sight (LoS) paths, or require pilot transmission sequences whose length is comparable to the number of DMA elements, thus, leading to limited effectiveness and considerable measurement overheads in blocked LoS and dynamic multipath environments. In this paper, we present a two-stage machine-learning-based approach for user tracking, specifically designed for non-LoS multipath settings. A newly proposed attention-based Neural Network (NN) is first trained to map noisy channel responses to potential user positions, regardless of user mobility patterns. This architecture constitutes a modification of the prominent vision transformer, specifically modified for extracting information from high-dimensional frequency response signals. As a second stage, the NN's predictions for the past user positions are passed through a learnable autoregressive model to exploit the time-correlated channel information and obtain the final position predictions. The channel estimation procedure leverages a DMA receive architecture with partially-connected radio frequency chains, which results to reduced numbers of pilots. The numerical evaluation over an outdoor ray-tracing scenario illustrates that despite LoS blockage, this methodology is capable of achieving high position accuracy across various multipath settings.
Epidemic outbreak prediction using machine learning models
results: The results show that these algorithms can accurately predict epidemic outbreaks and provide case-count forecasts five weeks into the future. Such results can help local authorities and healthcare organizations prepare their emergency response in advance.
Abstract
In today's world, the risk of emerging and re-emerging epidemics has increased. The recent advancement in healthcare technology has made it possible to predict an epidemic outbreak in a region. Early prediction of an epidemic outbreak greatly helps the authorities to be prepared with the necessary medications and logistics required to keep things in control. In this article, we try to predict epidemic outbreaks (influenza, hepatitis and malaria) for the state of New York, USA using machine and deep learning algorithms, and a portal has been created which can alert the authorities and health care organizations of the region in case of an outbreak. The algorithm takes historical data to predict the possible number of cases for 5 weeks into the future. Non-clinical factors like Google search trends, social media data and weather data have also been used to predict the probability of an outbreak.
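A minimal sketch of the forecasting setup: turn a weekly case series into lag features and regress the count five weeks ahead. The synthetic series, lag count, and model choice are assumptions, and a real system would add search-trend, social media, and weather features alongside the lags.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_lagged(cases, n_lags=8, horizon=5):
    """Turn a weekly case-count series into (lag features, value h weeks ahead)."""
    X, y = [], []
    for t in range(n_lags, len(cases) - horizon):
        X.append(cases[t - n_lags:t])
        y.append(cases[t + horizon])
    return np.array(X), np.array(y)

# Synthetic weekly case counts standing in for surveillance data.
weekly_cases = np.abs(np.cumsum(np.random.default_rng(1).normal(0, 5, 200))) + 10
X, y = make_lagged(weekly_cases)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[:-20], y[:-20])
forecast = model.predict(X[-20:])        # cases 5 weeks ahead for held-out windows
```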
Differentially Private Reward Estimation with Preference Feedback
paper_authors: Sayak Ray Chowdhury, Xingyu Zhou, Nagarajan Natarajan
For: This paper focuses on aligning generative models with human interests using preference-based feedback, while protecting the privacy of human labelers in the process.
Methods: The authors use reinforcement learning with human feedback (RLHF) to train generative models, and adopt the notion of label differential privacy (DP) to protect the privacy of individual labelers. They use the parametric Bradley-Terry-Luce (BTL) model to estimate the latent reward parameter $\theta^* \in \mathbb{R}^d$ from pairwise comparison feedback.
Results: The authors provide tight upper and lower bounds on the error in estimating $\theta^*$ under both local and central models of DP, and show that the additional cost to ensure label-DP under the local model is $\Theta \big(\frac{1}{e^\epsilon-1}\sqrt{\frac{d}{n}}\big)$, while it is $\Theta\big(\frac{\text{poly}(d)}{\epsilon n}\big)$ under the weaker central model. Simulations on synthetic data corroborate these theoretical results.
Abstract
Learning from preference-based feedback has recently gained considerable traction as a promising approach to align generative models with human interests. Instead of relying on numerical rewards, the generative models are trained using reinforcement learning with human feedback (RLHF). These approaches first solicit feedback from human labelers, typically in the form of pairwise comparisons between two possible actions, then estimate a reward model using these comparisons, and finally employ a policy based on the estimated reward model. An adversarial attack in any step of the above pipeline might reveal private and sensitive information of human labelers. In this work, we adopt the notion of label differential privacy (DP) and focus on the problem of reward estimation from preference-based feedback while protecting the privacy of each individual labeler. Specifically, we consider the parametric Bradley-Terry-Luce (BTL) model for such pairwise comparison feedback involving a latent reward parameter $\theta^* \in \mathbb{R}^d$. Within a standard minimax estimation framework, we provide tight upper and lower bounds on the error in estimating $\theta^*$ under both local and central models of DP. We show, for a given privacy budget $\epsilon$ and number of samples $n$, that the additional cost to ensure label-DP under the local model is $\Theta \big(\frac{1}{e^\epsilon-1}\sqrt{\frac{d}{n}}\big)$, while it is $\Theta\big(\frac{\text{poly}(d)}{\epsilon n} \big)$ under the weaker central model. We perform simulations on synthetic data that corroborate these theoretical results.
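A simple local label-DP baseline for this setting: randomized response on the comparison labels followed by a logistic fit on feature differences (the BTL likelihood is logistic in the feature difference). This is a standard baseline sketch with toy data, not the paper's estimator, and the flipped labels bias the fit toward attenuation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n = 5, 20000
theta_star = rng.standard_normal(d)

# Pairwise comparisons: item features phi_a, phi_b; BTL says
# P(a beats b) = sigmoid(<theta*, phi_a - phi_b>).
diffs = rng.standard_normal((n, d))                       # phi_a - phi_b
labels = (rng.random(n) < 1 / (1 + np.exp(-diffs @ theta_star))).astype(int)

# Local label-DP via randomized response: flip each label w.p. 1/(1+e^eps),
# which satisfies eps-label-DP for each individual comparison.
eps = 1.0
flip_prob = 1 / (1 + np.exp(eps))
noisy = np.where(rng.random(n) < flip_prob, 1 - labels, labels)

# BTL is a logistic model in the feature differences, so a logistic fit on
# the privatized labels recovers the direction of theta*, attenuated by the
# randomized-response noise.
fit = LogisticRegression(C=10.0).fit(diffs, noisy)
theta_hat = fit.coef_.ravel()
cos = theta_hat @ theta_star / (np.linalg.norm(theta_hat) * np.linalg.norm(theta_star))
print(f"cosine similarity to theta*: {cos:.3f}")
```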
paper_authors: Anuradha Kumari, Mushir Akhtar, Rupal Shah, M. Tanveer
for: This paper is written for academics and researchers who work with matrix input data and want to use support vector machines (SVMs) for classification and regression problems.
methods: The paper proposes a new method called support matrix machine (SMM) that can handle matrix input data and preserve the structural information of the data. SMM uses a combination of the nuclear norm and Frobenius norm, known as the spectral elastic net property, to achieve this.
results: The paper provides an in-depth analysis of the development of the SMM model, including numerous variants such as robust, sparse, class imbalance, and multi-class classification models. The paper also discusses applications of the SMM model and outlines potential future research avenues and possibilities.
Abstract
Support vector machine (SVM) is one of the most studied paradigms in the realm of machine learning for classification and regression problems. It relies on vectorized input data. However, a significant portion of the real-world data exists in matrix format, which is given as input to SVM by reshaping the matrices into vectors. The process of reshaping disrupts the spatial correlations inherent in the matrix data. Also, converting matrices into vectors results in input data with a high dimensionality, which introduces significant computational complexity. To overcome these issues in classifying matrix input data, support matrix machine (SMM) is proposed. It represents one of the emerging methodologies tailored for handling matrix input data. The SMM method preserves the structural information of the matrix data by using the spectral elastic net property which is a combination of the nuclear norm and Frobenius norm. This article provides the first in-depth analysis of the development of the SMM model, which can be used as a thorough summary by both novices and experts. We discuss numerous SMM variants, such as robust, sparse, class imbalance, and multi-class classification models. We also analyze the applications of the SMM model and conclude the article by outlining potential future research avenues and possibilities that may motivate academics to advance the SMM algorithm.
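The spectral elastic net regularizer mentioned above has a simple closed form, as does its proximal operator, which is the workhorse of typical SMM solvers. Below is a minimal sketch; the weights tau and lam are illustrative, not values from any particular SMM variant.

```python
# Hedged sketch: spectral elastic net = nuclear norm + Frobenius norm, and its prox.
import numpy as np

def spectral_elastic_net(W, tau=1.0, lam=0.1):
    """tau * ||W||_* + (lam / 2) * ||W||_F^2 for a matrix parameter W."""
    s = np.linalg.svd(W, compute_uv=False)
    return tau * s.sum() + 0.5 * lam * (W ** 2).sum()

def prox_spectral_elastic_net(W, tau=1.0, lam=0.1):
    """argmin_Z 0.5||Z - W||_F^2 + tau||Z||_* + (lam/2)||Z||_F^2:
    soft-threshold the singular values by tau, then scale by 1/(1 + lam)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0) / (1.0 + lam)
    return U @ np.diag(s_shrunk) @ Vt

W = np.random.default_rng(0).normal(size=(8, 6))
print(spectral_elastic_net(W))
print(np.linalg.matrix_rank(prox_spectral_elastic_net(W, tau=2.0)))  # low rank after prox
```

The prox makes the structural role of the penalty visible: the nuclear-norm part drives the parameter matrix toward low rank (preserving matrix structure), while the Frobenius part keeps the problem strongly convex.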
Exact Recovery and Bregman Hard Clustering of Node-Attributed Stochastic Block Model
results: Experimental results show that the proposed algorithm performs strongly on synthetic data, outperforming both classical algorithms and state-of-the-art algorithms.
Abstract
Network clustering tackles the problem of identifying sets of nodes (communities) that have similar connection patterns. However, in many scenarios, nodes also have attributes that are correlated with the clustering structure. Thus, network information (edges) and node information (attributes) can be jointly leveraged to design high-performance clustering algorithms. Under a general model for the network and node attributes, this work establishes an information-theoretic criterion for the exact recovery of community labels and characterizes a phase transition determined by the Chernoff-Hellinger divergence of the model. The criterion shows how network and attribute information can be exchanged in order to have exact recovery (e.g., more reliable network information requires less reliable attribute information). This work also presents an iterative clustering algorithm that maximizes the joint likelihood, assuming that the probability distribution of network interactions and node attributes belong to exponential families. This covers a broad range of possible interactions (e.g., edges with weights) and attributes (e.g., non-Gaussian models), as well as sparse networks, while also exploring the connection between exponential families and Bregman divergences. Extensive numerical experiments using synthetic data indicate that the proposed algorithm outperforms classic algorithms that leverage only network or only attribute information as well as state-of-the-art algorithms that also leverage both sources of information. The contributions of this work provide insights into the fundamental limits and practical techniques for inferring community labels on node-attributed networks.
Convolutional State Space Models for Long-Range Spatiotemporal Modeling
results: ConvS5 significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment, training 3x faster than ConvLSTM and generating samples 400x faster than Transformers, while matching or exceeding the performance of state-of-the-art methods on the DMLab, Minecraft, and Habitat prediction benchmarks.
Abstract
Effectively modeling long spatiotemporal sequences is challenging due to the need to model complex spatial correlations and long-range temporal dependencies simultaneously. ConvLSTMs attempt to address this by updating tensor-valued states with recurrent neural networks, but their sequential computation makes them slow to train. In contrast, Transformers can process an entire spatiotemporal sequence, compressed into tokens, in parallel. However, the cost of attention scales quadratically in length, limiting their scalability to longer sequences. Here, we address the challenges of prior methods and introduce convolutional state space models (ConvSSM) that combine the tensor modeling ideas of ConvLSTM with the long sequence modeling approaches of state space methods such as S4 and S5. First, we demonstrate how parallel scans can be applied to convolutional recurrences to achieve subquadratic parallelization and fast autoregressive generation. We then establish an equivalence between the dynamics of ConvSSMs and SSMs, which motivates parameterization and initialization strategies for modeling long-range dependencies. The result is ConvS5, an efficient ConvSSM variant for long-range spatiotemporal modeling. ConvS5 significantly outperforms Transformers and ConvLSTM on a long horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers. In addition, ConvS5 matches or exceeds the performance of state-of-the-art methods on challenging DMLab, Minecraft and Habitat prediction benchmarks and enables new directions for modeling long spatiotemporal sequences.
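The parallel-scan idea mentioned in the abstract rests on the fact that linear recurrences compose associatively. The following hedged sketch demonstrates this on a scalar recurrence; ConvS5 applies the same primitive to tensor-valued convolutional recurrences, which this toy does not capture.

```python
# Hedged sketch: Hillis-Steele associative scan for x_t = a_t * x_{t-1} + b_t.
# Pairs compose associatively: (a2, b2) after (a1, b1) -> (a2*a1, a2*b1 + b2).
import numpy as np

def scan_linear_recurrence(a, b):
    """Inclusive scan (x_{-1} = 0); O(log T) depth on parallel hardware."""
    a, b = a.copy(), b.copy()
    step = 1
    while step < len(a):
        a_new, b_new = a.copy(), b.copy()
        b_new[step:] = a[step:] * b[:-step] + b[step:]
        a_new[step:] = a[step:] * a[:-step]
        a, b = a_new, b_new
        step *= 2
    return b

rng = np.random.default_rng(0)
T = 16
a, b = rng.uniform(0.5, 1.0, T), rng.normal(size=T)

x, acc = np.zeros(T), 0.0                 # sequential reference recurrence
for t in range(T):
    acc = a[t] * acc + b[t]
    x[t] = acc
assert np.allclose(scan_linear_recurrence(a, b), x)
print("parallel scan matches the sequential recurrence")
```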
Towards Practical Non-Adversarial Distribution Alignment via Variational Bounds
paper_authors: Ziyu Gong, Ben Usman, Han Zhao, David I. Inouye
for: The paper is written for learning invariant representations, with a focus on fairness and robustness.
methods: The paper proposes a non-adversarial VAE-based alignment method that can be applied to any model pipeline.
results: The proposed method can replace adversarial losses in standard invariant representation learning pipelines without modifying the original architectures, significantly broadening the applicability of non-adversarial alignment methods.
Abstract
Distribution alignment can be used to learn invariant representations with applications in fairness and robustness. Most prior works resort to adversarial alignment methods but the resulting minimax problems are unstable and challenging to optimize. Non-adversarial likelihood-based approaches either require model invertibility, impose constraints on the latent prior, or lack a generic framework for alignment. To overcome these limitations, we propose a non-adversarial VAE-based alignment method that can be applied to any model pipeline. We develop a set of alignment upper bounds (including a noisy bound) that have VAE-like objectives but with a different perspective. We carefully compare our method to prior VAE-based alignment approaches both theoretically and empirically. Finally, we demonstrate that our novel alignment losses can replace adversarial losses in standard invariant representation learning pipelines without modifying the original architectures -- thereby significantly broadening the applicability of non-adversarial alignment methods.
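To make the non-adversarial idea concrete, here is a hedged stand-in for an alignment penalty: a closed-form KL divergence between Gaussian moment-matched latent batches from two domains. This is not the paper's VAE-based upper bound, only a minimal illustration of discriminator-free alignment.

```python
# Hedged sketch: a non-adversarial alignment penalty via moment-matched Gaussians.
import numpy as np

def gaussian_kl_alignment(z_a, z_b, eps=1e-6):
    """KL( N(mu_a, diag(var_a)) || N(mu_b, diag(var_b)) ) from two latent batches."""
    mu_a, var_a = z_a.mean(0), z_a.var(0) + eps
    mu_b, var_b = z_b.mean(0), z_b.var(0) + eps
    return 0.5 * np.sum(np.log(var_b / var_a) + (var_a + (mu_a - mu_b) ** 2) / var_b - 1.0)

z_a = np.random.default_rng(0).normal(0.0, 1.0, size=(512, 16))
z_b = np.random.default_rng(1).normal(0.3, 1.2, size=(512, 16))
print(gaussian_kl_alignment(z_a, z_b))   # > 0; shrinks as the two domains align
```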
results: Empirical results show that DGFNs effectively enhance exploration in sparse-reward domains and high-dimensional state spaces, with promising applications in drug discovery.
Abstract
Deep learning is emerging as an effective tool in drug discovery, with potential applications in both predictive and generative models. Generative Flow Networks (GFlowNets/GFNs) are a recently introduced method recognized for the ability to generate diverse candidates, in particular in small molecule generation tasks. In this work, we introduce double GFlowNets (DGFNs). Drawing inspiration from reinforcement learning and Double Deep Q-Learning, we introduce a target network used to sample trajectories, while updating the main network with these sampled trajectories. Empirical results confirm that DGFNs effectively enhance exploration in sparse reward domains and high-dimensional state spaces, both challenging aspects of de-novo design in drug discovery.
Density Estimation for Entry Guidance Problems using Deep Learning
paper_authors: Jens A. Rataczak, Davide Amato, Jay W. McMahon
for: This paper presents a deep-learning approach to estimating atmospheric density profiles for planetary entry guidance problems.
methods: The paper trains a long short-term memory (LSTM) neural network to learn the mapping between measurements available onboard an entry vehicle and the density profile through which it is flying. The measurements include the spherical state representation, Cartesian sensed acceleration components, and a surface-pressure measurement.
results: The trained LSTM can both predict the density profile through which the vehicle will fly and reconstruct the profile through which it has already flown. Using the LSTM model within the FNPEG entry guidance algorithm yields better terminal accuracy than the two other density estimation techniques considered.
Abstract
This work presents a deep-learning approach to estimate atmospheric density profiles for use in planetary entry guidance problems. A long short-term memory (LSTM) neural network is trained to learn the mapping between measurements available onboard an entry vehicle and the density profile through which it is flying. Measurements include the spherical state representation, Cartesian sensed acceleration components, and a surface-pressure measurement. Training data for the network is initially generated by performing a Monte Carlo analysis of an entry mission at Mars using the fully numerical predictor-corrector guidance (FNPEG) algorithm that utilizes an exponential density model, while the truth density profiles are sampled from MarsGRAM. A curriculum learning procedure is developed to refine the LSTM network's predictions for integration within the FNPEG algorithm. The trained LSTM is capable of both predicting the density profile through which the vehicle will fly and reconstructing the density profile through which it has already flown. The performance of the FNPEG algorithm is assessed for three different density estimation techniques: an exponential model, an exponential model augmented with a first-order fading-memory filter, and the LSTM network. Results demonstrate that using the LSTM model results in superior terminal accuracy compared to the other two techniques when considering both noisy and noiseless measurements.
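For context, here is a minimal sketch of the two baseline estimators the LSTM is compared against: an exponential atmosphere model and the same model corrected by a first-order fading-memory filter on the sensed-to-modeled density ratio. The surface density, scale height, and filter gain are illustrative values, not those used in the paper.

```python
# Hedged sketch: exponential density model + first-order fading-memory correction.
import numpy as np

rho0, H = 0.020, 11100.0              # Mars-like surface density [kg/m^3], scale height [m]

def rho_exponential(h):
    return rho0 * np.exp(-h / H)

def fading_memory_update(k, rho_sensed, h, beta=0.9):
    """Low-pass the sensed/modeled density ratio so the exponential model
    tracks the atmosphere actually being flown through."""
    return beta * k + (1.0 - beta) * (rho_sensed / rho_exponential(h))

# Toy usage: the correction factor converges toward a true 20% density offset.
k = 1.0
for h in np.linspace(60e3, 30e3, 50):
    rho_true = 1.2 * rho_exponential(h)   # pretend the real atmosphere is 20% denser
    k = fading_memory_update(k, rho_true, h)
print(round(k, 3))                        # -> close to 1.2
```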
results: We demonstrate the effectiveness of the approach through extensive simulations and show that it provides reliable uncertainty quantification even under complex data dependencies. This work narrows the gap between classical resampling techniques and modern data analysis, providing a practical tool for researchers and practitioners.
Abstract
Resampling methods such as the bootstrap have proven invaluable in the field of machine learning. However, the applicability of traditional bootstrap methods is limited when dealing with large streams of dependent data, such as time series or spatially correlated observations. In this paper, we propose a novel bootstrap method that is designed to account for data dependencies and can be executed online, making it particularly suitable for real-time applications. This method is based on an autoregressive sequence of increasingly dependent resampling weights. We prove the theoretical validity of the proposed bootstrap scheme under general conditions. We demonstrate the effectiveness of our approach through extensive simulations and show that it provides reliable uncertainty quantification even in the presence of complex data dependencies. Our work bridges the gap between classical resampling techniques and the demands of modern data analysis, providing a valuable tool for researchers and practitioners in dynamic, data-rich environments.
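A hedged sketch of the core mechanism: bootstrap replicates of an online statistic driven by autoregressively dependent, positive random weights. The AR coefficient and the exponential weight innovations are illustrative choices, not the paper's exact scheme.

```python
# Hedged sketch: online bootstrap with AR(1)-dependent resampling weights.
import numpy as np

rng = np.random.default_rng(0)
B, phi = 200, 0.8                        # bootstrap replicates, weight autocorrelation
w = np.ones(B)                           # current weight of each replicate
sums, wsums = np.zeros(B), np.zeros(B)

for t in range(1000):
    x = rng.normal(loc=1.0)              # one dependent-stream observation arrives
    w = phi * w + (1 - phi) * rng.exponential(size=B)   # positive AR(1) weights
    sums += w * x
    wsums += w
    replicates = sums / wsums            # B online bootstrap replicates of the mean

print("estimate:", replicates.mean().round(3),
      "bootstrap s.e.:", replicates.std().round(3))
```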
HyPE: Attention with Hyperbolic Biases for Relative Positional Encoding
results: Analysis shows that, with suitable hyperparameter choices, HyPE can approximate the attention bias of ALiBi, offering promising generalization capabilities; an experimental evaluation of HyPE is proposed as a direction for future research.
Abstract
In Transformer-based architectures, the attention mechanism is inherently permutation-invariant with respect to the input sequence's tokens. To impose sequential order, token positions are typically encoded using a scheme with either fixed or learnable parameters. We introduce Hyperbolic Positional Encoding (HyPE), a novel method that utilizes hyperbolic functions' properties to encode tokens' relative positions. This approach biases the attention mechanism without the necessity of storing the $O(L^2)$ values of the mask, with $L$ being the length of the input sequence. HyPE leverages preliminary concatenation operations and matrix multiplications, facilitating the encoding of relative distances indirectly incorporating biases into the softmax computation. This design ensures compatibility with FlashAttention-2 and supports the gradient backpropagation for any potential learnable parameters within the encoding. We analytically demonstrate that, by careful hyperparameter selection, HyPE can approximate the attention bias of ALiBi, thereby offering promising generalization capabilities for contexts extending beyond the lengths encountered during pretraining. The experimental evaluation of HyPE is proposed as a direction for future research.
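A hedged sketch of the idea: add a bias that is a hyperbolic function of the relative offset to the attention logits. The specific form $-\log\cosh(s(i-j))$ is an illustrative hyperbolic choice, not necessarily the paper's parameterization, but it shows why such a bias can approximate ALiBi's linear bias at long range.

```python
# Hedged sketch: hyperbolic relative-position bias vs. ALiBi's linear bias.
import numpy as np

L, slope = 8, 0.5
rel = np.arange(L)[:, None] - np.arange(L)[None, :]    # relative offsets i - j

alibi_bias = -slope * np.abs(rel)                      # ALiBi: linear in distance
hype_bias = -np.log(np.cosh(slope * rel))              # a hyperbolic alternative

# For large |i-j|, log cosh(x) ~ |x| - log 2, so the two biases agree up to a constant:
print(alibi_bias[-1, 0], (hype_bias[-1, 0] - np.log(2)).round(4))
```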
Dynamic Tensor Decomposition via Neural Diffusion-Reaction Processes
For: This paper proposes a new method called Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE) for dynamic tensor decomposition, which can capture both the commonalities and personalities of the entities in the tensor.
Methods: The proposed method uses a neural diffusion-reaction process to estimate dynamic embeddings for the entities in each tensor mode, and a neural network to model the entry value as a nonlinear function of the embedding trajectories.
Results: The proposed method shows advantages in both a simulation study and real-world applications, capturing the underlying temporal structure of the data more effectively than existing methods.
Abstract
Tensor decomposition is an important tool for multiway data analysis. In practice, the data is often sparse yet associated with rich temporal information. Existing methods, however, often under-use the time information and ignore the structural knowledge within the sparsely observed tensor entries. To overcome these limitations and to better capture the underlying temporal structure, we propose Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE). We develop a neural diffusion-reaction process to estimate dynamic embeddings for the entities in each tensor mode. Specifically, based on the observed tensor entries, we build a multi-partite graph to encode the correlation between the entities. We construct a graph diffusion process to co-evolve the embedding trajectories of the correlated entities and use a neural network to construct a reaction process for each individual entity. In this way, our model can capture both the commonalities and personalities during the evolution of the embeddings for different entities. We then use a neural network to model the entry value as a nonlinear function of the embedding trajectories. For model estimation, we combine ODE solvers to develop a stochastic mini-batch learning algorithm. We propose a stratified sampling method to balance the cost of processing each mini-batch so as to improve the overall efficiency. We show the advantage of our approach in both simulation study and real-world applications. The code is available at https://github.com/wzhut/Dynamic-Tensor-Decomposition-via-Neural-Diffusion-Reaction-Processes.
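A minimal sketch of the diffusion-reaction dynamics described above: embeddings diffuse over a graph Laplacian built from observed interactions, while a per-entity reaction term adds individual dynamics; forward Euler integrates the ODE. The fixed tanh map stands in for the paper's learned neural reaction network, and all sizes are illustrative.

```python
# Hedged sketch: co-evolving entity embeddings via graph diffusion + reaction.
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3                                   # entities, embedding dimension
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                # symmetric adjacency from observed entries
L_graph = np.diag(A.sum(1)) - A               # graph Laplacian

W = rng.normal(size=(k, k)) * 0.5
def reaction(U):
    return np.tanh(U @ W)                     # stand-in for a learned reaction network

U = rng.normal(size=(n, k))                   # initial embeddings
dt, alpha = 0.05, 0.5
for _ in range(100):
    dU = -alpha * (L_graph @ U) + reaction(U) # diffusion (commonality) + reaction (personality)
    U = U + dt * dU                           # forward Euler step
print(U.round(2))
```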
Predicting mutational effects on protein-protein binding via a side-chain diffusion probabilistic model
results: Achieves state-of-the-art performance in predicting mutational effects on protein-protein binding, and SidechainDiff is the first diffusion-based generative model for side-chain conformations.
Abstract
Many crucial biological processes rely on networks of protein-protein interactions. Predicting the effect of amino acid mutations on protein-protein binding is vital in protein engineering and therapeutic discovery. However, the scarcity of annotated experimental data on binding energy poses a significant challenge for developing computational approaches, particularly deep learning-based methods. In this work, we propose SidechainDiff, a representation learning-based approach that leverages unlabelled experimental protein structures. SidechainDiff utilizes a Riemannian diffusion model to learn the generative process of side-chain conformations and can also give the structural context representations of mutations on the protein-protein interface. Leveraging the learned representations, we achieve state-of-the-art performance in predicting the mutational effects on protein-protein binding. Furthermore, SidechainDiff is the first diffusion-based generative model for side-chains, distinguishing it from prior efforts that have predominantly focused on generating protein backbone structures.
Dis-inhibitory neuronal circuits can control the sign of synaptic plasticity
methods: Uses a plausible microcircuit model and a Hebbian learning rule derived within an adaptive control theory framework.
results: Error-modulated learning emerges naturally at the circuit level, with performance comparable to back-propagation of error on several non-linearly separable benchmarks.
Abstract
How neuronal circuits achieve credit assignment remains a central unsolved question in systems neuroscience. Various studies have suggested plausible solutions for back-propagating error signals through multi-layer networks. These purely functionally motivated models assume distinct neuronal compartments to represent local error signals that determine the sign of synaptic plasticity. However, this explicit error modulation is inconsistent with phenomenological plasticity models in which the sign depends primarily on postsynaptic activity. Here we show how a plausible microcircuit model and Hebbian learning rule derived within an adaptive control theory framework can resolve this discrepancy. Assuming errors are encoded in top-down dis-inhibitory synaptic afferents, we show that error-modulated learning emerges naturally at the circuit level when recurrent inhibition explicitly influences Hebbian plasticity. The same learning rule accounts for experimentally observed plasticity in the absence of inhibition and performs comparably to back-propagation of error (BP) on several non-linearly separable benchmarks. Our findings bridge the gap between functional and experimentally observed plasticity rules and make concrete predictions on inhibitory modulation of excitatory plasticity.
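A deliberately simplified toy of the mechanism, assuming a rectified postsynaptic rate and a threshold Hebbian rule: lowering inhibition (the dis-inhibitory signal) pushes the postsynaptic rate across the plasticity threshold and flips depression into potentiation. All numbers are illustrative.

```python
# Hedged toy: dis-inhibition controlling the sign of a threshold Hebbian update.
def hebbian_update(pre, exc_drive, inhibition, theta=1.0, eta=0.1):
    post = max(exc_drive - inhibition, 0.0)      # rectified postsynaptic rate
    return eta * pre * (post - theta)            # threshold Hebbian rule

pre, exc = 1.0, 2.0
print(hebbian_update(pre, exc, inhibition=1.5))  # inhibited: post=0.5 -> depression (<0)
print(hebbian_update(pre, exc, inhibition=0.2))  # dis-inhibited: post=1.8 -> potentiation (>0)
```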
Efficient Exploration in Continuous-time Model-based Reinforcement Learning
results: Our regret bounds show that the regret is sublinear when modeling the dynamics with Gaussian processes (GPs) under a suitable measurement selection strategy (MSS), such as equidistant sampling. We also propose an adaptive, data-dependent, practical MSS that achieves the same performance with fewer samples.
Abstract
Reinforcement learning algorithms typically consider discrete-time dynamics, even though the underlying systems are often continuous in time. In this paper, we introduce a model-based reinforcement learning algorithm that represents continuous-time dynamics using nonlinear ordinary differential equations (ODEs). We capture epistemic uncertainty using well-calibrated probabilistic models, and use the optimistic principle for exploration. Our regret bounds surface the importance of the measurement selection strategy(MSS), since in continuous time we not only must decide how to explore, but also when to observe the underlying system. Our analysis demonstrates that the regret is sublinear when modeling ODEs with Gaussian Processes (GP) for common choices of MSS, such as equidistant sampling. Additionally, we propose an adaptive, data-dependent, practical MSS that, when combined with GP dynamics, also achieves sublinear regret with significantly fewer samples. We showcase the benefits of continuous-time modeling over its discrete-time counterpart, as well as our proposed adaptive MSS over standard baselines, on several applications.
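To illustrate what a measurement selection strategy decides, the sketch below contrasts equidistant sampling with a greedy rule that measures where the GP posterior variance over time is largest. The RBF kernel and noise level are illustrative, and the paper's adaptive MSS is more sophisticated than this greedy stand-in.

```python
# Hedged sketch: equidistant vs. variance-greedy measurement selection for a GP.
import numpy as np

def rbf(a, b, ell=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_posterior_var(t_obs, t_query, noise=1e-4):
    K = rbf(t_obs, t_obs) + noise * np.eye(len(t_obs))
    Ks = rbf(t_query, t_obs)
    return 1.0 - np.sum(Ks @ np.linalg.inv(K) * Ks, axis=1)   # prior var is 1

t_grid = np.linspace(0, 1, 200)
equidistant = np.linspace(0, 1, 8)

adaptive = [0.0, 1.0]
for _ in range(6):                    # greedily measure where uncertainty is largest
    var = gp_posterior_var(np.array(adaptive), t_grid)
    adaptive.append(float(t_grid[np.argmax(var)]))

print("max posterior variance, equidistant:", gp_posterior_var(equidistant, t_grid).max())
print("max posterior variance, adaptive:   ", gp_posterior_var(np.array(adaptive), t_grid).max())
```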
On Feynman–Kac training of partial Bayesian neural networks
methods: The strategy formulates the training of a pBNN as simulating a Feynman--Kac model, and uses sequential Monte Carlo samplers to simultaneously estimate the parameters and the latent posterior distribution.
results: Evaluations on various synthetic and real-world datasets show that the proposed training scheme outperforms existing methods in terms of predictive performance.
Abstract
Recently, partial Bayesian neural networks (pBNNs), which only consider a subset of the parameters to be stochastic, were shown to perform competitively with full Bayesian neural networks. However, pBNNs are often multi-modal in the latent-variable space and thus challenging to approximate with parametric models. To address this problem, we propose an efficient sampling-based training strategy, wherein the training of a pBNN is formulated as simulating a Feynman--Kac model. We then describe variations of sequential Monte Carlo samplers that allow us to simultaneously estimate the parameters and the latent posterior distribution of this model at a tractable computational cost. We show on various synthetic and real-world datasets that our proposed training scheme outperforms the state of the art in terms of predictive performance.
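A hedged toy of the Feynman--Kac training moves on a one-parameter model: particles over the stochastic parameter are reweighted by each data batch's likelihood, resampled, and jittered. In a full pBNN the deterministic weights would additionally be updated by gradient steps, which this sketch omits; the model and all constants are illustrative.

```python
# Hedged toy: sequential Monte Carlo over the stochastic slope of y = a*x + b.
import numpy as np

rng = np.random.default_rng(0)
true_a, b, sigma = 2.0, 0.5, 0.3
x = rng.uniform(-1, 1, 256)
y = true_a * x + b + sigma * rng.normal(size=256)

P = 500
particles = rng.normal(0.0, 3.0, P)               # prior over the stochastic slope a
for batch in np.array_split(np.arange(256), 8):   # stream the data in batches
    ll = np.array([-0.5 * np.sum((y[batch] - a * x[batch] - b) ** 2) / sigma ** 2
                   for a in particles])
    w = np.exp(ll - ll.max()); w /= w.sum()       # normalized importance weights
    idx = rng.choice(P, size=P, p=w)              # multinomial resampling
    particles = particles[idx] + 0.05 * rng.normal(size=P)  # jitter (rejuvenation)

print("posterior mean of a:", particles.mean().round(3))    # close to 2.0
```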
results: The paper shows that a class of continuous-time DKFs can approximately implement the conditional law of a broad class of non-Markovian, conditionally Gaussian signal processes given noisy continuous-time measurements, with the approximation error quantified by the worst-case 2-Wasserstein distance.
Abstract
Deep Kalman filters (DKFs) are a class of neural network models that generate Gaussian probability measures from sequential data. Though DKFs are inspired by the Kalman filter, they lack concrete theoretical ties to the stochastic filtering problem, thus limiting their applicability to areas where traditional model-based filters have been used, e.g., model calibration for bond and option prices in mathematical finance. We address this issue in the mathematical foundations of deep learning by exhibiting a class of continuous-time DKFs which can approximately implement the conditional law of a broad class of non-Markovian and conditionally Gaussian signal processes given noisy continuous-time measurements. Our approximation results hold uniformly over sufficiently regular compact subsets of paths, where the approximation error is quantified by the worst-case 2-Wasserstein distance computed uniformly over the given compact set of paths.
Operator Learning Enhanced Physics-informed Neural Networks for Solving Partial Differential Equations Characterized by Sharp Solutions
results: Successfully solves a variety of difficult problems, such as the nonlinear diffusion-reaction equation, the Burgers equation, and the incompressible Navier-Stokes equation, and is more accurate and more robust in training than the vanilla PINN.
Abstract
Physics-informed Neural Networks (PINNs) have been shown as a promising approach for solving both forward and inverse problems of partial differential equations (PDEs). Meanwhile, the neural operator approach, including methods such as Deep Operator Network (DeepONet) and Fourier neural operator (FNO), has been introduced and extensively employed in approximating solution of PDEs. Nevertheless, to solve problems consisting of sharp solutions poses a significant challenge when employing these two approaches. To address this issue, we propose in this work a novel framework termed Operator Learning Enhanced Physics-informed Neural Networks (OL-PINN). Initially, we utilize DeepONet to learn the solution operator for a set of smooth problems relevant to the PDEs characterized by sharp solutions. Subsequently, we integrate the pre-trained DeepONet with PINN to resolve the target sharp solution problem. We showcase the efficacy of OL-PINN by successfully addressing various problems, such as the nonlinear diffusion-reaction equation, the Burgers equation and the incompressible Navier-Stokes equation at high Reynolds number. Compared with the vanilla PINN, the proposed method requires only a small number of residual points to achieve a strong generalization capability. Moreover, it substantially enhances accuracy, while also ensuring a robust training process. Furthermore, OL-PINN inherits the advantage of PINN for solving inverse problems. To this end, we apply the OL-PINN approach for solving problems with only partial boundary conditions, which usually cannot be solved by the classical numerical methods, showing its capacity in solving ill-posed problems and consequently more complex inverse problems.
Modeling Dynamics over Meshes with Gauge Equivariant Nonlinear Message Passing
paper_authors: Jung Yeon Park, Lawson L. S. Wong, Robin Walters
for: Modeling data over non-Euclidean manifolds, discretized as surface meshes, which arise in computer graphics and in biological and physical systems.
methods: A gauge-equivariant architecture on meshes using nonlinear message passing, building on gauge-equivariant convolutional and attentional architectures.
results: Improved performance when modeling surface PDEs with complex nonlinear dynamics, although design trade-offs favor convolutional, attentional, or message-passing networks depending on the task.
Abstract
Data over non-Euclidean manifolds, often discretized as surface meshes, naturally arise in computer graphics and biological and physical systems. In particular, solutions to partial differential equations (PDEs) over manifolds depend critically on the underlying geometry. While graph neural networks have been successfully applied to PDEs, they do not incorporate surface geometry and do not consider local gauge symmetries of the manifold. Alternatively, recent works on gauge equivariant convolutional and attentional architectures on meshes leverage the underlying geometry but underperform in modeling surface PDEs with complex nonlinear dynamics. To address these issues, we introduce a new gauge equivariant architecture using nonlinear message passing. Our novel architecture achieves higher performance than either convolutional or attentional networks on domains with highly complex and nonlinear dynamics. However, similar to the non-mesh case, design trade-offs favor convolutional, attentional, or message passing networks for different tasks; we investigate in which circumstances our message passing method provides the most benefit.
Model Uncertainty based Active Learning on Tabular Data using Boosted Trees
for: This paper focuses on active learning for tabular data using boosted trees, with a particular emphasis on measuring model uncertainty and leveraging it for efficient label acquisition.
methods: The paper proposes an uncertainty-based sampling strategy for active learning, using entropy as a measure of model uncertainty. Additionally, the authors propose two novel cost-effective active learning methods for regression and classification tasks.
results: The authors evaluate the proposed methods on several benchmark datasets and show that their uncertainty-based sampling strategy and cost-effective active learning methods achieve better performance compared to existing methods.
Abstract
Supervised machine learning relies on the availability of good labelled data for model training. Labelled data is acquired by human annotation, which is a cumbersome and costly process, often requiring subject matter experts. Active learning is a sub-field of machine learning which helps in obtaining the labelled data efficiently by selecting the most valuable data instances for model training and querying the labels only for those instances from the human annotator. Recently, a lot of research has been done in the field of active learning, especially for deep neural network based models. Although deep learning shines when dealing with image/textual/multimodal data, gradient boosting methods still tend to achieve much better results on tabular data. In this work, we explore active learning for tabular data using boosted trees. Uncertainty based sampling in active learning is the most commonly used querying strategy, wherein the labels of those instances are sequentially queried for which the current model prediction is maximally uncertain. Entropy is often the choice for measuring uncertainty. However, entropy is not exactly a measure of model uncertainty. Although there has been a lot of work in deep learning for measuring model uncertainty and employing it in active learning, it is yet to be explored for non-neural network models. To this end, we explore the effectiveness of boosted trees based model uncertainty methods in active learning. Leveraging this model uncertainty, we propose an uncertainty based sampling in active learning for regression tasks on tabular data. Additionally, we also propose a novel cost-effective active learning method for regression tasks along with an improved cost-effective active learning method for classification tasks.
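For reference, a minimal sketch of the entropy-based uncertainty-sampling baseline described above, using a scikit-learn gradient-boosted classifier on a synthetic pool. The dataset and query budget are illustrative; this is the common baseline the paper builds on, not its proposed model-uncertainty method.

```python
# Hedged sketch: pool-based active learning with entropy sampling on boosted trees.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X), 20, replace=False))      # small seed set
pool = [i for i in range(len(X)) if i not in labeled]

for _ in range(10):                                        # 10 label queries
    clf = GradientBoostingClassifier(random_state=0).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    entropy = -(proba * np.log(proba + 1e-12)).sum(axis=1) # predictive entropy
    labeled.append(pool.pop(int(np.argmax(entropy))))      # query most uncertain point

clf = GradientBoostingClassifier(random_state=0).fit(X[labeled], y[labeled])
print("labels used:", len(labeled),
      "pool accuracy:", clf.score(X[pool], y[pool]).round(3))
```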
results: With the DataZoo toolset, development in network traffic classification becomes easier and evaluation more reliable, while the reproducibility and cross-comparison of results are improved.
Abstract
The machine learning communities, such as those around computer vision or natural language processing, have developed numerous supportive tools and benchmark datasets to accelerate the development. In contrast, the network traffic classification field lacks standard benchmark datasets for most tasks, and the available supportive software is rather limited in scope. This paper aims to address the gap and introduces DataZoo, a toolset designed to streamline dataset management in network traffic classification and to reduce the space for potential mistakes in the evaluation setup. DataZoo provides a standardized API for accessing three extensive datasets -- CESNET-QUIC22, CESNET-TLS22, and CESNET-TLS-Year22. Moreover, it includes methods for feature scaling and realistic dataset partitioning, taking into consideration temporal and service-related factors. The DataZoo toolset simplifies the creation of realistic evaluation scenarios, making it easier to cross-compare classification methods and reproduce results.
Non-parametric regression for robot learning on manifolds
results: Experimental results show that the method improves predictive accuracy in robot learning and outperforms projection-based algorithms.
Abstract
Many of the tools available for robot learning were designed for Euclidean data. However, many applications in robotics involve manifold-valued data. A common example is orientation; this can be represented as a 3-by-3 rotation matrix or a quaternion, the spaces of which are non-Euclidean manifolds. In robot learning, manifold-valued data are often handled by relating the manifold to a suitable Euclidean space, either by embedding the manifold or by projecting the data onto one or several tangent spaces. These approaches can result in poor predictive accuracy, and convoluted algorithms. In this paper, we propose an "intrinsic" approach to regression that works directly within the manifold. It involves taking a suitable probability distribution on the manifold, letting its parameter be a function of a predictor variable, such as time, then estimating that function non-parametrically via a "local likelihood" method that incorporates a kernel. We name the method kernelised likelihood estimation. The approach is conceptually simple, and generally applicable to different manifolds. We implement it with three different types of manifold-valued data that commonly appear in robotics applications. The results of these experiments show better predictive accuracy than projection-based algorithms.
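A hedged illustration of kernelised likelihood estimation on the simplest manifold, the circle: with a von Mises model, the kernel-weighted maximum-likelihood estimate of the mean direction at a query time reduces to a weighted circular average. The data and bandwidth are illustrative.

```python
# Hedged sketch: local (kernel-weighted) MLE of a von Mises mean direction mu(t).
import numpy as np

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 1, 200))
theta = 2 * np.pi * t + 0.2 * rng.normal(size=200)   # angles drifting with time

def local_mle_direction(t_query, h=0.05):
    w = np.exp(-0.5 * ((t - t_query) / h) ** 2)      # Gaussian kernel in the predictor
    # Weighted MLE of the mean direction = angle of the weighted resultant vector.
    return np.arctan2(np.sum(w * np.sin(theta)), np.sum(w * np.cos(theta)))

print(local_mle_direction(0.25), 2 * np.pi * 0.25)   # estimate vs. ground truth
```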
Privacy-preserving Federated Primal-dual Learning for Non-convex and Non-smooth Problems with Model Sparsification
results: Experimental results on real-world data show that the proposed federated learning method has clear advantages, with markedly better performance than several existing FL algorithms; the paper also validates all of the analytical results and properties.
Abstract
Federated learning (FL) has been recognized as a rapidly growing research area, where the model is trained over massively distributed clients under the orchestration of a parameter server (PS) without sharing clients' data. This paper delves into a class of federated problems characterized by non-convex and non-smooth loss functions, that are prevalent in FL applications but challenging to handle due to their intricate non-convexity and non-smoothness nature and the conflicting requirements on communication efficiency and privacy protection. In this paper, we propose a novel federated primal-dual algorithm with bidirectional model sparsification tailored for non-convex and non-smooth FL problems, and differential privacy is applied for strong privacy guarantee. Its unique insightful properties and some privacy and convergence analyses are also presented for the FL algorithm design guidelines. Extensive experiments on real-world data are conducted to demonstrate the effectiveness of the proposed algorithm and much superior performance than some state-of-the-art FL algorithms, together with the validation of all the analytical results and properties.
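A hedged sketch of the two communication-side ingredients named above, top-k model sparsification and Gaussian-mechanism noise for differential privacy, applied to a client update before upload. The clipping bound, noise multiplier, and k are illustrative, and this omits the primal-dual updates themselves.

```python
# Hedged sketch: top-k sparsification + Gaussian noise on a client update.
import numpy as np

rng = np.random.default_rng(0)

def sparsify_top_k(v, k):
    """Keep only the k largest-magnitude coordinates of the update."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def privatize(update, clip=1.0, noise_mult=0.8):
    """Clip to bound sensitivity, then add Gaussian-mechanism noise."""
    norm = np.linalg.norm(update) + 1e-12
    clipped = update * min(1.0, clip / norm)
    return clipped + rng.normal(0, noise_mult * clip, size=update.shape)

delta = rng.normal(size=100)                      # a raw local model update
upload = privatize(sparsify_top_k(delta, k=10))
print(np.count_nonzero(sparsify_top_k(delta, 10)), np.linalg.norm(upload).round(2))
```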
Approximation Theory, Computing, and Deep Learning on the Wasserstein Space
results: The paper provides explicit and quantitative bounds on generalization errors for each of the proposed solutions, leveraging the theory of metric Sobolev spaces together with techniques from optimal transport, variational calculus, and large deviation bounds. The numerical implementation uses purpose-designed neural networks as basis functions that can be evaluated rapidly after training, so the constructed solutions improve evaluation speed at equal accuracy over state-of-the-art methods by several orders of magnitude.
Abstract
The challenge of approximating functions in infinite-dimensional spaces from finite samples is widely regarded as formidable. In this study, we delve into the challenging problem of the numerical approximation of Sobolev-smooth functions defined on probability spaces. Our particular focus centers on the Wasserstein distance function, which serves as a relevant example. In contrast to the existing body of literature focused on approximating efficiently pointwise evaluations, we chart a new course to define functional approximants by adopting three machine learning-based approaches: 1. Solving a finite number of optimal transport problems and computing the corresponding Wasserstein potentials. 2. Employing empirical risk minimization with Tikhonov regularization in Wasserstein Sobolev spaces. 3. Addressing the problem through the saddle point formulation that characterizes the weak form of the Tikhonov functional's Euler-Lagrange equation. As a theoretical contribution, we furnish explicit and quantitative bounds on generalization errors for each of these solutions. In the proofs, we leverage the theory of metric Sobolev spaces and we combine it with techniques of optimal transport, variational calculus, and large deviation bounds. In our numerical implementation, we harness appropriately designed neural networks to serve as basis functions. These networks undergo training using diverse methodologies. This approach allows us to obtain approximating functions that can be rapidly evaluated after training. Consequently, our constructive solutions significantly enhance at equal accuracy the evaluation speed, surpassing that of state-of-the-art methods by several orders of magnitude.
On consequences of finetuning on data with highly discriminative features
results: The study finds that in transfer learning, networks tend to prioritize basic data patterns and forsake valuable pre-learned features, a behavior termed "feature erosion" that affects network performance and internal representations.
Abstract
In the era of transfer learning, training neural networks from scratch is becoming obsolete. Transfer learning leverages prior knowledge for new tasks, conserving computational resources. While its advantages are well-documented, we uncover a notable drawback: networks tend to prioritize basic data patterns, forsaking valuable pre-learned features. We term this behavior "feature erosion" and analyze its impact on network performance and internal representations.
Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive Recommendation
results: Experiments on two real-world datasets show that the method yields relative improvements in both effectiveness (2.3%) and efficiency (11.53%).
Abstract
Rewards serve as a measure of user satisfaction and act as a limiting factor in interactive recommender systems. In this research, we focus on the problem of learning to reward (LTR), which is fundamental to reinforcement learning. Previous approaches either introduce additional procedures for learning to reward, thereby increasing the complexity of optimization, or assume that user-agent interactions provide perfect demonstrations, which is not feasible in practice. Ideally, we aim to employ a unified approach that optimizes both the reward and policy using compositional demonstrations. However, this requirement presents a challenge since rewards inherently quantify user feedback on-policy, while recommender agents approximate off-policy future cumulative valuation. To tackle this challenge, we propose a novel batch inverse reinforcement learning paradigm that achieves the desired properties. Our method utilizes discounted stationary distribution correction to combine LTR and recommender agent evaluation. To fulfill the compositional requirement, we incorporate the concept of pessimism through conservation. Specifically, we modify the vanilla correction using Bellman transformation and enforce KL regularization to constrain consecutive policy updates. We conduct empirical studies on two real-world datasets representing two kinds of compositional coverage; the results show that the proposed method relatively improves both effectiveness (2.3\%) and efficiency (11.53\%).
results: We evaluate DAC on DeepMind Control tasks in low and high replay-ratio regimes and ablate multiple design choices. The results show that, despite minimal computational overhead, DAC achieves state-of-the-art performance and sample efficiency on locomotion tasks.
Abstract
Actor-Critic methods are in a stalemate of two seemingly irreconcilable problems. Firstly, critic proneness towards overestimation requires sampling temporal-difference targets from a conservative policy optimized using lower-bound Q-values. Secondly, well-known results show that policies that are optimistic in the face of uncertainty yield lower regret levels. To remedy this dichotomy, we propose Decoupled Actor-Critic (DAC). DAC is an off-policy algorithm that learns two distinct actors by gradient backpropagation: a conservative actor used for temporal-difference learning and an optimistic actor used for exploration. We test DAC on DeepMind Control tasks in low and high replay ratio regimes and ablate multiple design choices. Despite minimal computational overhead, DAC achieves state-of-the-art performance and sample efficiency on locomotion tasks.
Generator Identification for Linear SDEs with Additive and Multiplicative Noise
results: The paper derives necessary and sufficient conditions under which the generator of a linear SDE can be identified from the distribution of its solution process, and provides geometric interpretations of these conditions.
Abstract
In this paper, we present conditions for identifying the generator of a linear stochastic differential equation (SDE) from the distribution of its solution process with a given fixed initial state. These identifiability conditions are crucial in causal inference using linear SDEs as they enable the identification of the post-intervention distributions from its observational distribution. Specifically, we derive a sufficient and necessary condition for identifying the generator of linear SDEs with additive noise, as well as a sufficient condition for identifying the generator of linear SDEs with multiplicative noise. We show that the conditions derived for both types of SDEs are generic. Moreover, we offer geometric interpretations of the derived identifiability conditions to enhance their understanding. To validate our theoretical results, we perform a series of simulations, which support and substantiate the established findings.
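For concreteness, here is an Euler--Maruyama simulation of the additive-noise model class considered in the paper, $dX_t = A X_t\,dt + G\,dW_t$, whose generator the identifiability conditions concern. The matrices and initial state are illustrative.

```python
# Hedged sketch: Euler--Maruyama simulation of a linear SDE with additive noise.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [-1.0, -0.5]])   # drift matrix (damped oscillator)
G = np.diag([0.1, 0.2])                    # diffusion matrix
x, dt, T = np.array([1.0, 0.0]), 1e-3, 5.0 # fixed initial state, step, horizon

path = [x]
for _ in range(int(T / dt)):
    dW = np.sqrt(dt) * rng.normal(size=2)  # Brownian increment
    x = x + A @ x * dt + G @ dW
    path.append(x)
print(np.array(path)[-1])                  # one sample of the solution at time T
```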
Adaptive Meta-Learning-Based KKL Observer Design for Nonlinear Dynamical Systems
results: Experimental results show that the method estimates the states of nonlinear systems with high accuracy, strong generalization capability, robustness against noise, and online adaptability.
Abstract
The theory of Kazantzis-Kravaris/Luenberger (KKL) observer design introduces a methodology that uses a nonlinear transformation map and its left inverse to estimate the state of a nonlinear system through the introduction of a linear observer state space. Data-driven approaches using artificial neural networks have demonstrated the ability to accurately approximate these transformation maps. This paper presents a novel approach to observer design for nonlinear dynamical systems through meta-learning, a concept in machine learning that aims to optimize learning models for fast adaptation to a distribution of tasks through an improved focus on the intrinsic properties of the underlying learning problem. We introduce a framework that leverages information from measurements of the system output to design a learning-based KKL observer capable of online adaptation to a variety of system conditions and attributes. To validate the effectiveness of our approach, we present comprehensive experimental results for the estimation of nonlinear system states with varying initial conditions and internal parameters, demonstrating high accuracy, generalization capability, and robustness against noise.
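A hedged sketch of the KKL observer skeleton: drive a stable linear filter $\dot z = Az + By$ with the measured output, then map $z$ back to a state estimate. Here the inverse transformation is fit by linear least squares on simulated data, standing in for the learned (meta-learned) neural map; the system and all matrices are illustrative.

```python
# Hedged sketch: KKL observer with a least-squares stand-in for the learned inverse map.
import numpy as np

dt, steps = 1e-3, 20000
A = -np.diag([1.0, 2.0, 3.0])          # Hurwitz observer dynamics
B = np.ones(3)

# True system: harmonic oscillator x = [position, velocity], output y = position.
x, z = np.array([1.0, 0.0]), np.zeros(3)
X, Z = [], []
for _ in range(steps):
    y = x[0]
    x = x + dt * np.array([x[1], -x[0]])
    z = z + dt * (A @ z + B * y)       # linear KKL filter driven by the output
    X.append(x.copy()); Z.append(z.copy())

X, Z = np.array(X), np.array(Z)
T_inv, *_ = np.linalg.lstsq(Z[5000:], X[5000:], rcond=None)   # fit z -> x map
x_hat = Z @ T_inv
# Error is small once the filter transient has decayed.
print("steady-state estimation error:", np.abs(x_hat[5000:] - X[5000:]).max().round(4))
```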
results: The study finds that using "grokking tickets" drastically accelerates generalization; this speedup is demonstrated across different configurations and exceeds that of dense networks. Moreover, at an appropriate pruning rate, grokking can be achieved even without weight decay.
Abstract
Grokking is one of the most surprising puzzles in neural network generalization: a network first reaches a memorization solution with perfect training accuracy and poor generalization, but with further training, it reaches a perfectly generalized solution. We aim to analyze the mechanism of grokking from the lottery ticket hypothesis, identifying the process to find the lottery tickets (good sparse subnetworks) as the key to describing the transitional phase between memorization and generalization. We refer to these subnetworks as ''Grokking tickets'', which is identified via magnitude pruning after perfect generalization. First, using ''Grokking tickets'', we show that the lottery tickets drastically accelerate grokking compared to the dense networks on various configurations (MLP and Transformer, and an arithmetic and image classification tasks). Additionally, to verify that ''Grokking ticket'' are a more critical factor than weight norms, we compared the ''good'' subnetworks with a dense network having the same L1 and L2 norms. Results show that the subnetworks generalize faster than the controlled dense model. In further investigations, we discovered that at an appropriate pruning rate, grokking can be achieved even without weight decay. We also show that speedup does not happen when using tickets identified at the memorization solution or transition between memorization and generalization or when pruning networks at the initialization (Random pruning, Grasp, SNIP, and Synflow). The results indicate that the weight norm of network parameters is not enough to explain the process of grokking, but the importance of finding good subnetworks to describe the transition from memorization to generalization. The implementation code can be accessed via this link: \url{https://github.com/gouki510/Grokking-Tickets}.
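A minimal sketch of global magnitude pruning, the operation used to extract the sparse subnetworks ("grokking tickets") after the network generalizes. The layer shapes and sparsity level are illustrative.

```python
# Hedged sketch: global magnitude pruning masks over a list of weight arrays.
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Return binary masks keeping the largest-|w| (1 - sparsity) fraction globally."""
    flat = np.concatenate([w.ravel() for w in weights])
    threshold = np.quantile(np.abs(flat), sparsity)
    return [(np.abs(w) >= threshold).astype(w.dtype) for w in weights]

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 64)), rng.normal(size=(64, 10))]
masks = magnitude_prune(layers, sparsity=0.9)
print([m.mean().round(3) for m in masks])   # roughly 10% of weights survive overall
```

The masks would then be reapplied throughout (re)training so that only the ticket's weights are updated.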
Regret-Minimization Algorithms for Multi-Agent Cooperative Learning Systems
results: The thesis presents a series of regret lower bounds that measure the performance of multi-agent systems on sequential decision-making problems. These lower bounds depend on the connectivity of the communication network and on the communication delay, thus offering useful guidance for the design of MACL systems. Abstract
A Multi-Agent Cooperative Learning (MACL) system is an artificial intelligence (AI) system where multiple learning agents work together to complete a common task. Recent empirical success of MACL systems in various domains (e.g. traffic control, cloud computing, robotics) has sparked active research into the design and analysis of MACL systems for sequential decision making problems. One important metric of the learning algorithm for decision making problems is its regret, i.e. the difference between the highest achievable reward and the actual reward that the algorithm gains. The design and development of a MACL system with low-regret learning algorithms can create huge economic value. In this thesis, I analyze MACL systems for different sequential decision making problems. Concretely, Chapters 3 and 4 investigate the cooperative multi-agent multi-armed bandit problems, with full-information or bandit feedback, in which multiple learning agents can exchange their information through a communication network and the agents can only observe the rewards of the actions they choose. Chapter 5 considers the communication-regret trade-off for online convex optimization in the distributed setting. Chapter 6 discusses how to form highly productive teams of agents based on their unknown but fixed types using adaptive incremental matchings. For the above problems, I present the regret lower bounds for feasible learning algorithms and provide efficient algorithms that achieve these bounds. The regret bounds I present in Chapters 3, 4 and 5 quantify how the regret depends on the connectivity of the communication network and the communication delay, thus giving useful guidance on the design of the communication protocol in MACL systems.
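As a concrete illustration of the setting studied in Chapters 3 and 4, here is a toy cooperative bandit simulation in which each agent pools its one-hop neighbors' statistics before running UCB. The ring network, the pooling rule, and the UCB variant are all illustrative assumptions rather than the thesis's algorithms.

```python
import numpy as np

# Toy cooperative multi-armed bandit: each agent pulls an arm with UCB using
# its own counts plus whatever its neighbors have accumulated so far.

rng = np.random.default_rng(1)
K, n_agents, T = 5, 4, 2000
mu = rng.uniform(0.2, 0.8, size=K)                           # true arm means
adj = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]])    # ring network

counts = np.zeros((n_agents, K))
sums = np.zeros((n_agents, K))
regret = 0.0
for t in range(1, T + 1):
    # pooled statistics: own data plus neighbors' data from previous rounds
    pooled_n = counts + adj @ counts
    pooled_s = sums + adj @ sums
    new_counts, new_sums = counts.copy(), sums.copy()
    for i in range(n_agents):
        n, s = pooled_n[i], pooled_s[i]
        ucb = np.where(n > 0,
                       s / np.maximum(n, 1) + np.sqrt(2 * np.log(t) / np.maximum(n, 1)),
                       np.inf)                                # force exploration
        a = int(np.argmax(ucb))
        r = rng.binomial(1, mu[a])
        new_counts[i, a] += 1
        new_sums[i, a] += r
        regret += mu.max() - mu[a]
    counts, sums = new_counts, new_sums

print(f"cumulative regret over all agents: {regret:.1f}")
```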
MMM and MMMSynth: Clustering of heterogeneous tabular data, and synthetic data generation
results: The outcome of these two tasks is a high-performing clustering algorithm and an algorithm that generates high-quality synthetic data. Abstract
We provide new algorithms for two tasks relating to heterogeneous tabular datasets: clustering, and synthetic data generation. Tabular datasets typically consist of heterogeneous data types (numerical, ordinal, categorical) in columns, but may also have hidden cluster structure in their rows: for example, they may be drawn from heterogeneous (geographical, socioeconomic, methodological) sources, such that the outcome variable they describe (such as the presence of a disease) may depend not only on the other variables but on the cluster context. Moreover, sharing of biomedical data is often hindered by patient confidentiality laws, and there is current interest in algorithms to generate synthetic tabular data from real data, for example via deep learning. We demonstrate a novel EM-based clustering algorithm, MMM (``Madras Mixture Model''), that outperforms standard algorithms in determining clusters in synthetic heterogeneous data, and recovers structure in real data. Based on this, we demonstrate a synthetic tabular data generation algorithm, MMMsynth, that pre-clusters the input data, and generates cluster-wise synthetic data assuming cluster-specific data distributions for the input columns. We benchmark this algorithm by testing the performance of standard ML algorithms when they are trained on synthetic data and tested on real published datasets. Our synthetic data generation algorithm outperforms other literature tabular-data generators, and approaches the performance of training purely with real data.
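For intuition about the clustering component, a bare-bones EM loop for a diagonal Gaussian mixture is sketched below; MMM itself replaces the single Gaussian likelihood with per-column models for heterogeneous data types, so treat this only as the shared skeleton, with all data and settings illustrative.

```python
import numpy as np

# Minimal EM for a Gaussian mixture on purely numerical columns -- the
# skeleton that MMM generalizes to heterogeneous (numerical/ordinal/
# categorical) columns with per-cluster, per-column likelihood models.

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
n, d, k = X.shape[0], X.shape[1], 2

pi = np.full(k, 1 / k)                       # mixture weights
mu = X[rng.choice(n, k, replace=False)]      # initial means
var = np.ones((k, d))                        # diagonal covariances

for _ in range(50):
    # E-step: responsibilities under diagonal Gaussians
    logp = np.stack([
        -0.5 * (((X - mu[j]) ** 2 / var[j]) + np.log(2 * np.pi * var[j])).sum(1)
        + np.log(pi[j]) for j in range(k)], axis=1)
    logp -= logp.max(1, keepdims=True)
    r = np.exp(logp); r /= r.sum(1, keepdims=True)
    # M-step: update weights, means, variances
    nk = r.sum(0)
    pi = nk / n
    mu = (r.T @ X) / nk[:, None]
    var = np.stack([(r[:, j:j+1] * (X - mu[j]) ** 2).sum(0) / nk[j]
                    for j in range(k)]) + 1e-6

print("cluster means:\n", mu)
```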
methods: This work uses the Hodge decomposition to develop divergence-free and curl-free edge GPs, and combines them into \emph{Hodge-compositional edge GPs} that directly learn the different Hodge components of edge functions.
results: The researchers apply these GPs to currency exchange, ocean flows, and water supply networks and compare them with alternative models; the results show that the GPs accurately capture the relevant Hodge components of the edge functions. Abstract
We propose principled Gaussian processes (GPs) for modeling functions defined over the edge set of a simplicial 2-complex, a structure similar to a graph in which edges may form triangular faces. This approach is intended for learning flow-type data on networks where edge flows can be characterized by the discrete divergence and curl. Drawing upon the Hodge decomposition, we first develop classes of divergence-free and curl-free edge GPs, suitable for various applications. We then combine them to create \emph{Hodge-compositional edge GPs} that are expressive enough to represent any edge function. These GPs facilitate direct and independent learning for the different Hodge components of edge functions, enabling us to capture their relevance during hyperparameter optimization. To highlight their practical potential, we apply them for flow data inference in currency exchange, ocean flows and water supply networks, comparing them to alternative models.
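The decomposition underlying these kernels is easy to compute on a toy complex: with node-edge incidence B1 and edge-triangle incidence B2, any edge flow splits into gradient, curl, and harmonic parts. The sketch below uses a hypothetical 4-node complex and orthogonal projectors built from pseudo-inverses; the paper's GPs place an independent kernel on each of these components.

```python
import numpy as np

# Hodge decomposition of an edge flow on a tiny simplicial 2-complex:
# 4 nodes, 5 oriented edges, 1 triangular face (0,1,2).

# edges: (0,1), (1,2), (0,2), (2,3), (1,3)
B1 = np.array([[-1,  0, -1,  0,  0],       # node-edge incidence
               [ 1, -1,  0,  0, -1],
               [ 0,  1,  1, -1,  0],
               [ 0,  0,  0,  1,  1]], dtype=float)
B2 = np.array([[1], [1], [-1], [0], [0]], dtype=float)  # edge-triangle incidence

f = np.array([1.0, -2.0, 0.5, 3.0, -1.0])  # an arbitrary edge flow

proj = lambda M: M @ np.linalg.pinv(M)     # orthogonal projector onto col(M)
f_grad = proj(B1.T) @ f                    # gradient (curl-free) part
f_curl = proj(B2) @ f                      # curl (divergence-free) part
f_harm = f - f_grad - f_curl               # harmonic part: div- and curl-free

print("divergence of curl part ~ 0:", np.round(B1 @ f_curl, 10))
print("curl of gradient part  ~ 0:", np.round(B2.T @ f_grad, 10))
```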
A Federated Learning Framework for Stenosis Detection
results: Our results show that the FL framework does not substantially affect the performance of client 2, while for client 1 it improves performance over the locally trained model by +3.76%, +17.21%, and +10.80%, reaching Prec = 73.56, Rec = 67.01, and F1 = 70.13, respectively. These results show that FL can enable multicentric studies while preserving patient privacy. Abstract
This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography images (CA). Two heterogeneous datasets from two institutions were considered: Dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedale Riuniti of Ancona (Italy); Dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature. Stenosis detection was performed by using a Faster R-CNN model. In our FL framework, only the weights of the model backbone were shared among the two client institutions, using Federated Averaging (FedAvg) for weight aggregation. We assessed the performance of stenosis detection using Precision (Prec), Recall (Rec), and F1 score (F1). Our results showed that the FL framework does not substantially affect client 2's performance, which already achieved good performance with local training; for client 1, instead, the FL framework increases performance over the local model by +3.76%, +17.21% and +10.80%, respectively, reaching Prec = 73.56, Rec = 67.01 and F1 = 70.13. With such results, we showed that FL may enable multicentric studies relevant to automatic stenosis detection in CA by addressing data heterogeneity from various institutions, while preserving patient privacy.
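Since aggregation uses Federated Averaging, here is a minimal sketch of that step; the layer names and the size-weighted variant are assumptions for illustration.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Federated Averaging: size-weighted mean of clients' shared weights.

    `client_weights`: list of dicts mapping layer name -> ndarray.
    Sketch of the aggregation step in an FL framework where only backbone
    weights are exchanged between the two institutions.
    """
    total = float(sum(client_sizes))
    keys = client_weights[0].keys()
    return {k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
            for k in keys}

# Two hypothetical clients with different dataset sizes (1219 vs 7492 images)
w1 = {"conv1": np.ones((3, 3)), "conv2": np.zeros((3, 3))}
w2 = {"conv1": np.zeros((3, 3)), "conv2": np.ones((3, 3))}
global_w = fedavg([w1, w2], client_sizes=[1219, 7492])
print(global_w["conv1"][0, 0])  # 1219 / (1219 + 7492) ~ 0.14
```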
Asymmetric Diffusion Based Channel-Adaptive Secure Wireless Semantic Communications
paper_authors: Xintian Ren, Jun Wu, Hansong Xu, Qianqian Pan
For: The paper proposes a secure semantic communication system called DiffuSeC to address the security problem caused by semantic attacks in end-to-end data transmission tasks like image classification and image reconstruction.* Methods: The system leverages the diffusion model and deep reinforcement learning (DRL) to mitigate perturbations added by semantic attacks, including data source attacks and channel attacks. A DRL-based channel-adaptive diffusion step selection scheme is developed to improve robustness under unstable channel conditions.* Results: Simulation results demonstrate that DiffuSeC shows higher robust accuracy than previous works under a wide range of channel conditions and can quickly adjust the model state according to signal-to-noise ratios (SNRs) in unstable environments. Abstract
Semantic communication has emerged as a new deep learning-based communication paradigm that drives the research of end-to-end data transmission in tasks like image classification, and image reconstruction. However, the security problem caused by semantic attacks has not been well explored, resulting in vulnerabilities within semantic communication systems exposed to potential semantic perturbations. In this paper, we propose a secure semantic communication system, DiffuSeC, which leverages the diffusion model and deep reinforcement learning (DRL) to address this issue. With the diffusing module in the sender end and the asymmetric denoising module in the receiver end, the DiffuSeC mitigates the perturbations added by semantic attacks, including data source attacks and channel attacks. To further improve the robustness under unstable channel conditions caused by semantic attacks, we developed a DRL-based channel-adaptive diffusion step selection scheme to achieve stable performance under fluctuating environments. A timestep synchronization scheme is designed for diffusion timestep coordination between the two ends. Simulation results demonstrate that the proposed DiffuSeC shows higher robust accuracy than previous works under a wide range of channel conditions, and can quickly adjust the model state according to signal-to-noise ratios (SNRs) in unstable environments.
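One way to picture channel-adaptive step selection: match the measured channel SNR to the cumulative SNR of the diffusion schedule and start denoising from the matching step. The paper learns this choice with DRL; the heuristic below, with a standard linear noise schedule, is only a simple stand-in.

```python
import numpy as np

# Heuristic stand-in for channel-adaptive diffusion step selection: pick the
# denoising start step t whose cumulative diffusion SNR, alpha_bar/(1-alpha_bar),
# best matches the measured channel SNR. (The paper learns this choice with DRL.)

T = 1000
betas = np.linspace(1e-4, 0.02, T)            # standard linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)           # cumulative signal retention

def select_timestep(channel_snr_db):
    snr = 10 ** (channel_snr_db / 10)         # linear channel SNR
    diff_snr = alpha_bar / (1.0 - alpha_bar)  # per-step diffusion SNR
    return int(np.argmin(np.abs(diff_snr - snr)))

for snr_db in [0, 10, 20]:
    print(f"SNR {snr_db:2d} dB -> start denoising from step {select_timestep(snr_db)}")
```

Noisier channels map to later (noisier) diffusion steps, so more denoising work is spent exactly when the channel adds more perturbation.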
LightSAGE: Graph Neural Networks for Large Scale Item Retrieval in Shopee’s Advertisement Recommendation
results: Our model performs well in online A/B tests and has been deployed in Shopee's recommendation advertisement system. It handles cold-start and long-tail items effectively and also delivers significant improvement in offline evaluation. Abstract
Graph Neural Network (GNN) is the trending solution for item retrieval in recommendation problems. Most recent reports, however, focus heavily on new model architectures. This may bring some gaps when applying GNN in the industrial setup, where, besides the model, constructing the graph and handling data sparsity also play critical roles in the overall success of the project. In this work, we report how GNN is applied for large-scale e-commerce item retrieval at Shopee. We introduce our simple yet novel and impactful techniques in graph construction, modeling, and handling data skewness. Specifically, we construct high-quality item graphs by combining strong-signal user behaviors with high-precision collaborative filtering (CF) algorithm. We then develop a new GNN architecture named LightSAGE to produce high-quality items' embeddings for vector search. Finally, we design multiple strategies to handle cold-start and long-tail items, which are critical in an advertisement (ads) system. Our models bring improvement in offline evaluations, online A/B tests, and are deployed to the main traffic of Shopee's Recommendation Advertisement system.
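The aggregation primitive behind such models is standard GraphSAGE-style neighbor averaging; a minimal numpy version is below, with illustrative sizes and a hypothetical co-interaction neighbor list (LightSAGE's actual architecture is not reproduced here).

```python
import numpy as np

# One mean-aggregation SAGE layer in plain numpy -- the building block that
# LightSAGE adapts for large-scale item graphs (names and sizes illustrative).

def sage_layer(H, neighbors, W_self, W_neigh):
    """h_v' = ReLU(W_self h_v + W_neigh * mean_{u in N(v)} h_u), L2-normalized."""
    agg = np.stack([H[nbrs].mean(axis=0) if nbrs else np.zeros(H.shape[1])
                    for nbrs in neighbors])
    out = np.maximum(H @ W_self.T + agg @ W_neigh.T, 0.0)    # ReLU
    return out / (np.linalg.norm(out, axis=1, keepdims=True) + 1e-8)

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))                                  # 5 items, dim 8
neighbors = [[1, 2], [0], [0, 3, 4], [2], [2]]               # item graph
W_self, W_neigh = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
emb = sage_layer(H, neighbors, W_self, W_neigh)              # embeddings for vector search
print(emb.shape)
```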
Causal Fair Metric: Bridging Causality, Individual Fairness, and Adversarial Robustness
results: The study proposes a fair metric based on causal structure, applicable to adversarial training, fair learning, algorithmic recourse, and causal reinforcement learning. Abstract
Adversarial perturbation is used to expose vulnerabilities in machine learning models, while the concept of individual fairness aims to ensure equitable treatment regardless of sensitive attributes. Despite their initial differences, both concepts rely on metrics to generate similar input data instances. These metrics should be designed to align with the data's characteristics, especially when it is derived from causal structure and should reflect counterfactuals proximity. Previous attempts to define such metrics often lack general assumptions about data or structural causal models. In this research, we introduce a causal fair metric formulated based on causal structures that encompass sensitive attributes. For robustness analysis, the concept of protected causal perturbation is presented. Additionally, we delve into metric learning, proposing a method for metric estimation and deployment in real-world problems. The introduced metric has applications in the fields adversarial training, fair learning, algorithmic recourse, and causal reinforcement learning.
paper_authors: Bernardo Fichera, Viacheslav Borovitskiy, Andreas Krause, Aude Billard
for: This paper aims to improve the predictive performance and calibration of Gaussian process regression on high-dimensional data.
methods: The paper proposes a Gaussian process regression technique that infers the implicit manifold structure directly from data and can handle high-dimensional inputs.
results: The resulting model scales to hundreds of thousands of data points and improves the predictive performance and calibration of Gaussian process regression in high-dimensional settings. Abstract
Gaussian process regression is widely used because of its ability to provide well-calibrated uncertainty estimates and handle small or sparse datasets. However, it struggles with high-dimensional data. One possible way to scale this technique to higher dimensions is to leverage the implicit low-dimensional manifold upon which the data actually lies, as postulated by the manifold hypothesis. Prior work ordinarily requires the manifold structure to be explicitly provided though, i.e. given by a mesh or be known to be one of the well-known manifolds like the sphere. In contrast, in this paper we propose a Gaussian process regression technique capable of inferring implicit structure directly from data (labeled and unlabeled) in a fully differentiable way. For the resulting model, we discuss its convergence to the Mat\'ern Gaussian process on the assumed manifold. Our technique scales up to hundreds of thousands of data points, and may improve the predictive performance and calibration of the standard Gaussian process regression in high-dimensional settings.
Gradient-free online learning of subgrid-scale dynamics with neural emulators
results: Experiments show that training the neural emulator and the parametrization components separately, each with its own loss, minimizes the propagation of certain approximation biases. Abstract
In this paper, we propose a generic algorithm to train machine learning-based subgrid parametrizations online, i.e., with $\textit{a posteriori}$ loss functions for non-differentiable numerical solvers. The proposed approach leverages neural emulators to train an approximation of the reduced state-space solver, which is then used to allow gradient propagation through temporal integration steps. The algorithm is able to recover most of the benefit of online strategies without having to compute the gradient of the original solver. It is demonstrated that training the neural emulator and parametrization components separately with respective loss quantities is necessary in order to minimize the propagation of some approximation bias.
results: The framework has the following advantages over classical batch tests: 1) it continuously monitors online data streams and efficiently aggregates evidence against the null hypothesis; 2) it provides tight control over the type I error without multiple-testing corrections; 3) it adapts the sample-size requirement to the unknown hardness of the problem. Abstract
We propose a general framework for constructing powerful, sequential hypothesis tests for a large class of nonparametric testing problems. The null hypothesis for these problems is defined in an abstract form using the action of two known operators on the data distribution. This abstraction allows for a unified treatment of several classical tasks, such as two-sample testing, independence testing, and conditional-independence testing, as well as modern problems, such as testing for adversarial robustness of machine learning (ML) models. Our proposed framework has the following advantages over classical batch tests: 1) it continuously monitors online data streams and efficiently aggregates evidence against the null, 2) it provides tight control over the type I error without the need for multiple testing correction, 3) it adapts the sample size requirement to the unknown hardness of the problem. We develop a principled approach of leveraging the representation capability of ML models within the testing-by-betting framework, a game-theoretic approach for designing sequential tests. Empirical results on synthetic and real-world datasets demonstrate that tests instantiated using our general framework are competitive against specialized baselines on several tasks.
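The testing-by-betting idea is compact enough to sketch end to end for the simplest null (a Bernoulli stream with mean 1/2); the plug-in betting fraction below is a simplification of principled strategies such as online Newton step, and the payoff structure is an assumption tailored to this toy null.

```python
import numpy as np

# Sequential test-by-betting sketch: H0 says the stream of bits has mean 1/2.
# The bettor's wealth is a nonnegative martingale under H0, so by Ville's
# inequality, rejecting when wealth >= 1/alpha controls the type I error at
# level alpha at any data-dependent stopping time -- no multiple-testing
# correction needed.

rng = np.random.default_rng(0)
alpha = 0.05
x = rng.binomial(1, 0.6, size=5000)     # here the null happens to be false

wealth, lam, s, n = 1.0, 0.0, 0.0, 0
for t, xt in enumerate(x, start=1):
    wealth *= 1.0 + lam * (2 * xt - 1)  # fair-odds bet on the next bit
    if wealth >= 1.0 / alpha:
        print(f"reject H0 at time {t}, wealth {wealth:.1f}")
        break
    s += 2 * xt - 1
    n += 1
    lam = np.clip(s / n, -0.5, 0.5)     # simple plug-in betting fraction
else:
    print("never rejected")
```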
methods: The method uses a conditional generative model to create the segments of a piece, with a large language model supplying the high-level musical form.
results: The method generates structured, coherent musical pieces that are not left to chance and can be extended to arbitrary length. Abstract
While recent generative models can produce engaging music, their utility is limited. The variation in the music is often left to chance, resulting in compositions that lack structure. Pieces extending beyond a minute can become incoherent or repetitive. This paper introduces an approach for generating structured, arbitrarily long musical pieces. Central to this approach is the creation of musical segments using a conditional generative model, with transitions between these segments. The generation of prompts that determine the high-level composition is distinct from the creation of finer, lower-level details. A large language model is then used to suggest the musical form.
An interpretable clustering approach to safety climate analysis: examining driver group distinction in safety climate perceptions
results: The study finds that supervisory care promotion is the key factor distinguishing the various driver groups. It also finds that different clustering algorithms can lead to different results, so further comparison and analysis are needed. Abstract
The transportation industry, particularly the trucking sector, is prone to workplace accidents and fatalities. Accidents involving large trucks accounted for a considerable percentage of overall traffic fatalities. Recognizing the crucial role of safety climate in accident prevention, researchers have sought to understand its factors and measure its impact within organizations. While existing data-driven safety climate studies have made remarkable progress, clustering employees based on their safety climate perception is innovative and has not been extensively utilized in research. Identifying clusters of drivers based on their safety climate perception allows the organization to profile its workforce and devise more impactful interventions. The lack of utilizing the clustering approach could be due to difficulties interpreting or explaining the factors influencing employees' cluster membership. Moreover, existing safety-related studies did not compare multiple clustering algorithms, resulting in potential bias. To address these issues, this study introduces an interpretable clustering approach for safety climate analysis. This study compares 5 algorithms for clustering truck drivers based on their safety climate perceptions. It proposes a novel method for quantitatively evaluating partial dependence plots (QPDP). To better interpret the clustering results, this study introduces different interpretable machine learning measures (SHAP, PFI, and QPDP). Drawing on data collected from more than 7,000 American truck drivers, this study significantly contributes to the scientific literature. It highlights the critical role of supervisory care promotion in distinguishing various driver groups. The Python code is available at https://github.com/NUS-DBE/truck-driver-safety-climate.
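A minimal version of the pipeline, on synthetic stand-in survey data: compare several clustering algorithms, then explain cluster membership with permutation feature importance (one of the measures the study uses, alongside SHAP and its proposed QPDP). All data, feature counts, and model settings below are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
# Stand-in for the survey data: 500 drivers x 6 safety-climate scale scores.
X = np.vstack([rng.normal(0, 1, (250, 6)), rng.normal(2, 1, (250, 6))])

models = {"kmeans": KMeans(2, n_init=10, random_state=0),
          "gmm": GaussianMixture(2, random_state=0),
          "agglo": AgglomerativeClustering(2)}
labels = {name: m.fit_predict(X) for name, m in models.items()}
for name, lab in labels.items():
    print(name, "silhouette:", round(silhouette_score(X, lab), 3))

# Interpret membership: a classifier predicting cluster labels, then PFI.
best = labels["kmeans"]
clf = RandomForestClassifier(random_state=0).fit(X, best)
pfi = permutation_importance(clf, X, best, n_repeats=10, random_state=0)
print("feature importances:", np.round(pfi.importances_mean, 3))
```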
ProNet: Progressive Neural Network for Multi-Horizon Time Series Forecasting
results: ProNet is evaluated comprehensively on four large datasets together with an ablation study; the results show that it excels in both accuracy and prediction speed, outperforming AR and NAR models. Abstract
In this paper, we introduce ProNet, a novel deep learning approach designed for multi-horizon time series forecasting, adaptively blending autoregressive (AR) and non-autoregressive (NAR) strategies. Our method involves dividing the forecasting horizon into segments, predicting the most crucial steps in each segment non-autoregressively, and the remaining steps autoregressively. The segmentation process relies on latent variables, which effectively capture the significance of individual time steps through variational inference. In comparison to AR models, ProNet showcases remarkable advantages, requiring fewer AR iterations, resulting in faster prediction speed, and mitigating error accumulation. On the other hand, when compared to NAR models, ProNet takes into account the interdependency of predictions in the output space, leading to improved forecasting accuracy. Our comprehensive evaluation, encompassing four large datasets, and an ablation study, demonstrate the effectiveness of ProNet, highlighting its superior performance in terms of accuracy and prediction speed, outperforming state-of-the-art AR and NAR forecasting models.
Dual-Directed Algorithm Design for Efficient Pure Exploration
For: The paper is written to address the problem of pure exploration in stochastic sequential adaptive experiments with a finite set of alternative options. The goal is to accurately identify the best alternative with high confidence and minimal measurement efforts.* Methods: The paper uses dual variables to derive necessary and sufficient conditions for optimality, and proposes an information-directed selection rule to adaptively pick from a candidate set based on information gain. The top-two Thompson sampling algorithm is also used to solve the problem of best-arm identification.* Results: The paper establishes that the proposed algorithm is optimal for Gaussian best-arm identification, and is also applicable to other pure exploration problems such as $\epsilon$-best-arm identification and thresholding bandit problems. The numerical experiments show that the proposed algorithm is more efficient than existing ones. Abstract
We consider pure-exploration problems in the context of stochastic sequential adaptive experiments with a finite set of alternative options. The goal of the decision-maker is to accurately answer a query question regarding the alternatives with high confidence with minimal measurement efforts. A typical query question is to identify the alternative with the best performance, leading to ranking and selection problems, or best-arm identification in the machine learning literature. We focus on the fixed-precision setting and derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples. Using dual variables, we characterize the necessary and sufficient conditions for an allocation to be optimal. The use of dual variables allow us to bypass the combinatorial structure of the optimality conditions that relies solely on primal variables. Remarkably, these optimality conditions enable an extension of top-two algorithm design principle, initially proposed for best-arm identification. Furthermore, our optimality conditions give rise to a straightforward yet efficient selection rule, termed information-directed selection, which adaptively picks from a candidate set based on information gain of the candidates. We outline the broad contexts where our algorithmic approach can be implemented. We establish that, paired with information-directed selection, top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm identification, solving a glaring open problem in the pure exploration literature. Our algorithm is optimal for $\epsilon$-best-arm identification and thresholding bandit problems. Our analysis also leads to a general principle to guide adaptations of Thompson sampling for pure-exploration problems. Numerical experiments highlight the exceptional efficiency of our proposed algorithms relative to existing ones.
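A compact sketch of top-two Thompson sampling for Gaussian arms follows. The uniform coin flip between leader and challenger is the classical rule, standing in here for the paper's information-directed selection; the capped resampling is an implementation convenience, and all arm means are illustrative.

```python
import numpy as np

# Top-two Thompson sampling for Gaussian best-arm identification (sketch).

rng = np.random.default_rng(0)
mu, sigma = np.array([0.5, 0.4, 0.3]), 1.0    # true means (unknown to learner)
K, beta, T = len(mu), 0.5, 3000

n = np.ones(K)
s = rng.normal(mu, sigma)                     # one initial pull per arm
for _ in range(T):
    sample = lambda: int(np.argmax(rng.normal(s / n, sigma / np.sqrt(n))))
    leader = sample()
    challenger = leader
    for _ in range(100):                      # capped resampling for a distinct arm
        cand = sample()
        if cand != leader:
            challenger = cand
            break
    if challenger == leader:                  # fallback: uniform over the rest
        challenger = int(rng.choice([a for a in range(K) if a != leader]))
    arm = leader if rng.random() < beta else challenger
    n[arm] += 1
    s[arm] += rng.normal(mu[arm], sigma)

print("recommended arm:", int(np.argmax(s / n)), "| pull counts:", n.astype(int))
```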
A Planning-and-Exploring Approach to Extreme-Mechanics Force Fields
results: The study finds that a high-fidelity force field better predicts material behavior during fracture, and that the material's electronic structure must be taken into account. Abstract
Extreme mechanical processes such as strong lattice distortion and bond breakage during fracture are ubiquitous in nature and engineering, which often lead to catastrophic failure of structures. However, understanding the nucleation and growth of cracks is challenged by their multiscale characteristics spanning from atomic-level structures at the crack tip to the structural features where the load is applied. Molecular simulations offer an important tool to resolve the progressive microstructural changes at crack fronts and are widely used to explore processes therein, such as mechanical energy dissipation, crack path selection, and dynamic instabilities (e.g., kinking, branching). Empirical force fields developed based on local descriptors based on atomic positions and the bond orders do not yield satisfying predictions of fracture, even for the nonlinear, anisotropic stress-strain relations and the energy densities of edges. High-fidelity force fields thus should include the tensorial nature of strain and the energetics of rare events during fracture, which, unfortunately, have not been taken into account in both the state-of-the-art empirical and machine-learning force fields. Based on data generated by first-principles calculations, we develop a neural network-based force field for fracture, NN-F$^3$, by combining pre-sampling of the space of strain states and active-learning techniques to explore the transition states at critical bonding distances. The capability of NN-F$^3$ is demonstrated by studying the rupture of h-BN and twisted bilayer graphene as model problems. The simulation results confirm recent experimental findings and highlight the necessity to include the knowledge of electronic structures from first-principles calculations in predicting extreme mechanical processes.
Privacy-Preserving Federated Learning over Vertically and Horizontally Partitioned Data for Financial Anomaly Detection
paper_authors: Swanand Ravindra Kadhe, Heiko Ludwig, Nathalie Baracaldo, Alan King, Yi Zhou, Keith Houck, Ambrish Rawat, Mark Purcell, Naoise Holohan, Mikio Takeuchi, Ryo Kawahara, Nir Drucker, Hayim Shaul, Eyal Kushnir, Omri Soceanu
For: The paper is written to address the problem of detecting financial anomalies in a collaborative setting among multiple financial institutions, where trust is limited due to regulation and competition.* Methods: The paper proposes a novel solution called PV4FAD that combines fully homomorphic encryption, secure multi-party computation, differential privacy, and randomization techniques to balance privacy and accuracy during training and prevent inference threats at deployment time.* Results: The proposed solution achieves high utility and accuracy by significantly reducing the per-bank noise level while satisfying distributed differential privacy. The approach produces an ensemble model, specifically a random forest, to take advantage of the well-known properties of ensembles to reduce variance and increase accuracy. The solution won second prize in the first phase of the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge. Abstract
The effective detection of evidence of financial anomalies requires collaboration among multiple entities who own a diverse set of data, such as a payment network system (PNS) and its partner banks. Trust among these financial institutions is limited by regulation and competition. Federated learning (FL) enables entities to collaboratively train a model when data is either vertically or horizontally partitioned across the entities. However, in real-world financial anomaly detection scenarios, the data is partitioned both vertically and horizontally and hence it is not possible to use existing FL approaches in a plug-and-play manner. Our novel solution, PV4FAD, combines fully homomorphic encryption (HE), secure multi-party computation (SMPC), differential privacy (DP), and randomization techniques to balance privacy and accuracy during training and to prevent inference threats at model deployment time. Our solution provides input privacy through HE and SMPC, and output privacy against inference time attacks through DP. Specifically, we show that, in the honest-but-curious threat model, banks do not learn any sensitive features about PNS transactions, and the PNS does not learn any information about the banks' dataset but only learns prediction labels. We also develop and analyze a DP mechanism to protect output privacy during inference. Our solution generates high-utility models by significantly reducing the per-bank noise level while satisfying distributed DP. To ensure high accuracy, our approach produces an ensemble model, in particular, a random forest. This enables us to take advantage of the well-known properties of ensembles to reduce variance and increase accuracy. Our solution won second prize in the first phase of the U.S. Privacy Enhancing Technologies (PETs) Prize Challenge.
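For reference, the classical Gaussian-mechanism calibration that distributed-DP analyses build on can be sketched in a few lines; the query, sensitivity, and privacy budget below are illustrative, and the paper's actual mechanism composes such noise with HE/SMPC across banks so each bank adds far less noise individually.

```python
import numpy as np

# Gaussian mechanism sketch: the classical calibration achieving
# (epsilon, delta)-DP for a query with L2 sensitivity `sens` (valid for
# epsilon < 1). In PV4FAD-style distributed DP, noise like this is shared
# across parties rather than added in full by each bank.

def gaussian_mechanism(value, sens, eps, delta, rng):
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sens / eps
    return value + rng.normal(0.0, sigma, size=np.shape(value))

rng = np.random.default_rng(0)
true_count = 120.0                          # e.g., a per-feature aggregate
noisy = gaussian_mechanism(true_count, sens=1.0, eps=0.5, delta=1e-5, rng=rng)
print(f"noisy release: {noisy:.2f}")
```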
paper_authors: Hanwen Ye, Wenzhuo Zhou, Ruoqing Zhu, Annie Qu
For: This study proposes a new individualized learning method to optimize patients' clinical outcomes.* Methods: The method estimates a dynamic treatment regime (DTR) with an emphasis on aligning the observed treatment trajectory with the one produced by the optimal regime across decision stages.* Results: Through simulations and a real case study, the authors show that the method improves sample efficiency and stability and better accounts for heterogeneity across decision stages. Abstract
Recent advances in dynamic treatment regimes (DTRs) provide powerful optimal treatment searching algorithms, which are tailored to individuals' specific needs and able to maximize their expected clinical benefits. However, existing algorithms could suffer from insufficient sample size under optimal treatments, especially for chronic diseases involving long stages of decision-making. To address these challenges, we propose a novel individualized learning method which estimates the DTR with a focus on prioritizing alignment between the observed treatment trajectory and the one obtained by the optimal regime across decision stages. By relaxing the restriction that the observed trajectory must be fully aligned with the optimal treatments, our approach substantially improves the sample efficiency and stability of inverse probability weighted based methods. In particular, the proposed learning scheme builds a more general framework which includes the popular outcome weighted learning framework as a special case of ours. Moreover, we introduce the notion of stage importance scores along with an attention mechanism to explicitly account for heterogeneity among decision stages. We establish the theoretical properties of the proposed approach, including the Fisher consistency and finite-sample performance bound. Empirically, we evaluate the proposed method in extensive simulated environments and a real case study for COVID-19 pandemic.
AMLNet: Adversarial Mutual Learning Neural Network for Non-AutoRegressive Multi-Horizon Time Series Forecasting
For: To improve the accuracy and speed of multi-horizon time series forecasting.* Methods: Introduces AMLNet, a novel non-autoregressive (NAR) model that transfers knowledge from teacher models through online knowledge distillation (KD) along two routes (outcome-driven KD and hint-driven KD) to improve forecasting accuracy and speed.* Results: Compared with conventional AR and NAR models, AMLNet achieves higher accuracy and faster computation. Abstract
Multi-horizon time series forecasting, crucial across diverse domains, demands high accuracy and speed. While AutoRegressive (AR) models excel in short-term predictions, they suffer speed and error issues as the horizon extends. Non-AutoRegressive (NAR) models suit long-term predictions but struggle with interdependence, yielding unrealistic results. We introduce AMLNet, an innovative NAR model that achieves realistic forecasts through an online Knowledge Distillation (KD) approach. AMLNet harnesses the strengths of both AR and NAR models by training a deep AR decoder and a deep NAR decoder in a collaborative manner, serving as ensemble teachers that impart knowledge to a shallower NAR decoder. This knowledge transfer is facilitated through two key mechanisms: 1) outcome-driven KD, which dynamically weights the contribution of KD losses from the teacher models, enabling the shallow NAR decoder to incorporate the ensemble's diversity; and 2) hint-driven KD, which employs adversarial training to extract valuable insights from the model's hidden states for distillation. Extensive experimentation showcases AMLNet's superiority over conventional AR and NAR models, thereby presenting a promising avenue for multi-horizon time series forecasting that enhances accuracy and expedites computation.
Enhancing Scalability and Reliability in Semi-Decentralized Federated Learning With Blockchain: Trust Penalization and Asynchronous Functionality
for: To enhance the scalability and reliability of Distributed Federated Learning by integrating blockchain technology.
methods: Using a trust penalization mechanism to enhance the trustworthiness of participating nodes, while enabling asynchronous functionality for efficient and robust model updates.
results: Achieving a fair, secure, and transparent environment for collaborative machine learning without compromising data privacy. Abstract
The paper presents an innovative approach to address the challenges of scalability and reliability in Distributed Federated Learning by leveraging the integration of blockchain technology. The paper focuses on enhancing the trustworthiness of participating nodes through a trust penalization mechanism while also enabling asynchronous functionality for efficient and robust model updates. By combining Semi-Decentralized Federated Learning with Blockchain (SDFL-B), the proposed system aims to create a fair, secure and transparent environment for collaborative machine learning without compromising data privacy. The research presents a comprehensive system architecture, methodologies, experimental results, and discussions that demonstrate the advantages of this novel approach in fostering scalable and reliable SDFL-B systems.
Facilitating Graph Neural Networks with Random Walk on Simplicial Complexes
results: Extensive experiments verify the effectiveness of the random walk-based methods, including random-walk structural encoding (RWSE) and Hodge1Lap; these methods improve the expressivity and stability of GNNs. Abstract
Node-level random walk has been widely used to improve Graph Neural Networks. However, there is limited attention to random walk on edges and, more generally, on $k$-simplices. This paper systematically analyzes how random walk on different orders of simplicial complexes (SC) facilitates GNNs in their theoretical expressivity. First, on $0$-simplices or node level, we establish a connection between existing positional encoding (PE) and structure encoding (SE) methods through the bridge of random walk. Second, on $1$-simplices or edge level, we bridge edge-level random walk and Hodge $1$-Laplacians and design corresponding edge PEs respectively. In the spatial domain, we directly make use of edge level random walk to construct EdgeRWSE. Based on the spectral analysis of Hodge $1$-Laplacians, we propose Hodge1Lap, a permutation equivariant and expressive edge-level positional encoding. Third, we generalize our theory to random walk on higher-order simplices and propose the general principle to design PE on simplices based on random walk and Hodge Laplacians. Inter-level random walk is also introduced to unify a wide range of simplicial networks. Extensive experiments verify the effectiveness of our random walk-based methods.
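The node-level encoding that the paper starts from is simple to compute: the k-step return probabilities of the random walk. A sketch follows; the example graph is arbitrary, and the edge-level extension via Hodge 1-Laplacians is not shown.

```python
import numpy as np

# Random-walk structural encoding at the node level: for each node, the
# return probabilities diag((D^{-1}A)^k), k = 1..K. The paper extends this
# idea to edges (EdgeRWSE) via random walks driven by the Hodge 1-Laplacian.

def rwse(A, K=4):
    deg = A.sum(axis=1)
    P = A / deg[:, None]                 # random-walk transition matrix
    Pk, feats = np.eye(len(A)), []
    for _ in range(K):
        Pk = Pk @ P
        feats.append(np.diag(Pk))        # k-step return probabilities
    return np.stack(feats, axis=1)       # (num_nodes, K)

# 4-cycle plus a chord: nodes on the triangles get distinct encodings.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 1, 0]], dtype=float)
print(np.round(rwse(A), 3))
```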
rTsfNet: a DNN model with Multi-head 3D Rotation and Time Series Feature Extraction for IMU-based Human Activity Recognition
results: Under well-managed benchmark conditions and across multiple datasets targeting different activities (UCI HAR, PAMAP2, Daphnet, and OPPORTUNITY), the model achieves the highest accuracy, surpassing existing models. Abstract
This paper proposes rTsfNet, a DNN model with Multi-head 3D Rotation and Time Series Feature Extraction, as a new DNN model for IMU-based human activity recognition (HAR). rTsfNet automatically selects the 3D bases from which features should be derived by deriving 3D rotation parameters within the DNN. Then, time series features (TSFs), the wisdom of many researchers, are derived, and HAR is realized using an MLP. Although the model does not use a CNN, it achieved higher accuracy than existing models under well-managed benchmark conditions and across multiple datasets: UCI HAR, PAMAP2, Daphnet, and OPPORTUNITY, which target different activities.
Machine Learning Regularization for the Minimum Volume Formula of Toric Calabi-Yau 3-folds
results: The paper presents interpretable and explainable formulas for the minimum volume in terms of geometric invariants of the toric Calabi-Yau 3-folds; these formulas approximate the minimum volume with high accuracy. Abstract
We present a collection of explicit formulas for the minimum volume of Sasaki-Einstein 5-manifolds. The cone over these 5-manifolds is a toric Calabi-Yau 3-fold. These toric Calabi-Yau 3-folds are associated with an infinite class of 4d N=1 supersymmetric gauge theories, which are realized as worldvolume theories of D3-branes probing the toric Calabi-Yau 3-folds. Under the AdS/CFT correspondence, the minimum volume of the Sasaki-Einstein base is inversely proportional to the central charge of the corresponding 4d N=1 superconformal field theories. The presented formulas for the minimum volume are in terms of geometric invariants of the toric Calabi-Yau 3-folds. These explicit results are derived by implementing machine learning regularization techniques that advance beyond previous applications of machine learning for determining the minimum volume. Moreover, the use of machine learning regularization allows us to present interpretable and explainable formulas for the minimum volume. Our work confirms that, even for extensive sets of toric Calabi-Yau 3-folds, the proposed formulas approximate the minimum volume with remarkable accuracy.
Prediction of Effective Elastic Moduli of Rocks using Graph Neural Networks
results: The GNN model shows strong predictive capability across graph sizes and subcube dimensions, maintaining high prediction accuracy on the test set. A comparative analysis shows that GNNs outperform convolutional neural networks (CNNs) in predicting the properties of unseen rocks. Moreover, the graph representation of microstructures reduces GPU memory requirements (compared to the grid representation used for CNNs), allowing more flexible batch-size selection. This work demonstrates the potential of GNN models to improve the accuracy of rock property prediction and the efficiency of digital rock analysis overall. Abstract
This study presents a Graph Neural Networks (GNNs)-based approach for predicting the effective elastic moduli of rocks from their digital CT-scan images. We use the Mapper algorithm to transform 3D digital rock images into graph datasets, encapsulating essential geometrical information. These graphs, after training, prove effective in predicting elastic moduli. Our GNN model shows robust predictive capabilities across various graph sizes derived from various subcube dimensions. Not only does it perform well on the test dataset, but it also maintains high prediction accuracy for unseen rocks and unexplored subcube sizes. Comparative analysis with Convolutional Neural Networks (CNNs) reveals the superior performance of GNNs in predicting unseen rock properties. Moreover, the graph representation of microstructures significantly reduces GPU memory requirements (compared to the grid representation for CNNs), enabling greater flexibility in the batch size selection. This work demonstrates the potential of GNN models in enhancing the prediction accuracy of rock properties and boosting the efficiency of digital rock analysis.
Invariant kernels on Riemannian symmetric spaces: a harmonic-analytic approach
results: The paper presents L$^{\!\scriptscriptstyle p}$-Godement theorems ($p = 1,2$), which provide verifiable necessary and sufficient conditions for the positive-definiteness of kernels on symmetric spaces of non-compact type; these results are comparatively easy to apply. Abstract
This work aims to prove that the classical Gaussian kernel, when defined on a non-Euclidean symmetric space, is never positive-definite for any choice of parameter. To achieve this goal, the paper develops new geometric and analytical arguments. These provide a rigorous characterization of the positive-definiteness of the Gaussian kernel, which is complete but for a limited number of scenarios in low dimensions that are treated by numerical computations. Chief among these results are the L$^{\!\scriptscriptstyle p}$-$\hspace{0.02cm}$Godement theorems (where $p = 1,2$), which provide verifiable necessary and sufficient conditions for a kernel defined on a symmetric space of non-compact type to be positive-definite. A celebrated theorem, sometimes called the Bochner-Godement theorem, already gives such conditions and is far more general in its scope, but is especially hard to apply. Beyond the connection with the Gaussian kernel, the new results in this work lay out a blueprint for the study of invariant kernels on symmetric spaces, bringing forth specific harmonic analysis tools that suggest many future applications.
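A quick numerical companion to the statement: build the geodesic Gaussian Gram matrix on the hyperbolic plane (the simplest symmetric space of non-compact type, here in the Poincaré half-plane model) and inspect its spectrum; a negative eigenvalue certifies failure of positive-definiteness. The point set and bandwidths below are arbitrary, and which configurations expose a negative eigenvalue is exactly what the paper settles rigorously; this check is illustrative, not a proof.

```python
import numpy as np

# Gaussian kernel of the geodesic distance on the hyperbolic plane
# (Poincare half-plane model); we look for negative Gram eigenvalues.

rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(-5, 5, 60), rng.uniform(0.1, 5, 60)])

def hyp_dist(p, q):  # geodesic distance in the half-plane model
    return np.arccosh(1.0 + ((p[0]-q[0])**2 + (p[1]-q[1])**2) / (2*p[1]*q[1]))

D = np.array([[hyp_dist(p, q) for q in pts] for p in pts])
for t in [0.1, 1.0, 10.0]:
    K = np.exp(-D**2 / (2 * t**2))
    print(f"bandwidth {t:4.1f}: min eigenvalue {np.linalg.eigvalsh(K).min():+.2e}")
```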
A Metadata-Driven Approach to Understand Graph Neural Networks
For: This paper aims to understand the limitations of Graph Neural Networks (GNNs) and identify critical data properties that affect their performance.* Methods: The authors propose a metadata-driven approach to analyze the sensitivity of GNNs to graph data properties, using a multivariate sparse regression analysis on benchmarking data.* Results: The authors find that dataset degree distribution has a significant impact on GNN performance, with more balanced degree distributions leading to better linear separability of node representations and better GNN performance. Theoretical analysis and controlled experiments verify the effectiveness of the proposed approach. Abstract
Graph Neural Networks (GNNs) have achieved remarkable success in various applications, but their performance can be sensitive to specific data properties of the graph datasets they operate on. Current literature on understanding the limitations of GNNs has primarily employed a $\textit{model-driven}$ approach that leverage heuristics and domain knowledge from network science or graph theory to model the GNN behaviors, which is time-consuming and highly subjective. In this work, we propose a $\textit{metadata-driven}$ approach to analyze the sensitivity of GNNs to graph data properties, motivated by the increasing availability of graph learning benchmarks. We perform a multivariate sparse regression analysis on the metadata derived from benchmarking GNN performance across diverse datasets, yielding a set of salient data properties. To validate the effectiveness of our data-driven approach, we focus on one identified data property, the degree distribution, and investigate how this property influences GNN performance through theoretical analysis and controlled experiments. Our theoretical findings reveal that datasets with more balanced degree distribution exhibit better linear separability of node representations, thus leading to better GNN performance. We also conduct controlled experiments using synthetic datasets with varying degree distributions, and the results align well with our theoretical findings. Collectively, both the theoretical analysis and controlled experiments verify that the proposed metadata-driven approach is effective in identifying critical data properties for GNNs.
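The regression step of the metadata-driven analysis can be sketched with off-the-shelf tools: standardize the dataset-level properties, fit a sparse linear model of benchmark performance on them, and read off the surviving coefficients. The property names and synthetic data below are purely illustrative stand-ins for the paper's benchmark metadata.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Sketch of multivariate sparse regression of GNN performance on
# dataset-level properties, to surface the salient ones.

rng = np.random.default_rng(0)
props = ["degree_gini", "num_nodes", "avg_clustering", "homophily", "density"]
X = rng.normal(size=(40, len(props)))           # metadata for 40 "datasets"
# ground truth used to synthesize the target: degree balance dominates
perf = -0.8 * X[:, 0] + 0.3 * X[:, 3] + rng.normal(0, 0.1, 40)

model = Lasso(alpha=0.1).fit(StandardScaler().fit_transform(X), perf)
for name, coef in zip(props, model.coef_):
    print(f"{name:15s} {coef:+.3f}")            # zeroed properties drop out
```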
Diversify & Conquer: Outcome-directed Curriculum RL via Out-of-Distribution Disagreement
results: Experimental results show that D2C outperforms prior curriculum RL methods both quantitatively and qualitatively, even when the desired outcome examples are arbitrarily distributed. Abstract
Reinforcement learning (RL) often faces the challenges of uninformed search problems where the agent should explore without access to the domain knowledge such as characteristics of the environment or external rewards. To tackle these challenges, this work proposes a new approach for curriculum RL called Diversify for Disagreement & Conquer (D2C). Unlike previous curriculum learning methods, D2C requires only a few examples of desired outcomes and works in any environment, regardless of its geometry or the distribution of the desired outcome examples. The proposed method performs diversification of the goal-conditional classifiers to identify similarities between visited and desired outcome states and ensures that the classifiers disagree on states from out-of-distribution, which enables quantifying the unexplored region and designing an arbitrary goal-conditioned intrinsic reward signal in a simple and intuitive way. The proposed method then employs bipartite matching to define a curriculum learning objective that produces a sequence of well-adjusted intermediate goals, which enable the agent to automatically explore and conquer the unexplored region. We present experimental results demonstrating that D2C outperforms prior curriculum RL methods in both quantitative and qualitative aspects, even with the arbitrarily distributed desired outcome examples.
results: On real high-dimensional data, the framework provides a new mechanism for data-driven distribution-perturbation differential privacy and shows strong empirical performance in distributionally robust hypothesis testing and robust learning.Abstract
We present a computationally efficient framework, called \texttt{FlowDRO}, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets, in which the worst-case distribution (also called the Least Favorable Distribution, LFD) is required to be continuous so that the algorithm scales to problems with larger sample sizes and the induced robust algorithms generalize better. To tackle the computationally challenging infinite-dimensional optimization problem, we leverage flow-based models, continuous-time invertible transport maps between the data distribution and the target distribution, and develop a Wasserstein proximal gradient flow type of algorithm. In practice, we parameterize the transport maps by a sequence of neural networks progressively trained in blocks by gradient descent. Our computational framework is general, can handle high-dimensional data with large sample sizes, and can be useful for various applications. We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution perturbation differential privacy, where the proposed method gives strong empirical performance on real high-dimensional data.
Assessment of Differentially Private Synthetic Data for Utility and Fairness in End-to-End Machine Learning Pipelines for Tabular Data
paper_authors: Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres, Rafael de Sousa
for: This paper aims to investigate the use of differentially private synthetic data in end-to-end machine learning pipelines, specifically exploring the extent to which synthetic data can replace real, tabular data and identifying the most effective synthetic data generation techniques for training and evaluating machine learning models.
methods: The authors use a training and evaluation framework that does not assume the availability of real data for testing the utility and fairness of machine learning models trained on synthetic data. They analyze several different definitions of fairness and compare the utility and fairness of models trained using marginal-based and GAN-based synthetic data generation algorithms.
results: The authors find that marginal-based synthetic data generators outperform GAN-based ones in terms of model training utility for tabular data, and that models trained using data generated by the marginal-based algorithm MWEM PGM can achieve similar utility to models trained using real data. Additionally, the authors show that these models can also exhibit fairness characteristics similar to those obtained by models trained with real data.Abstract
Differentially private (DP) synthetic data sets are a solution for sharing data while preserving the privacy of individual data providers. Understanding the effects of utilizing DP synthetic data in end-to-end machine learning pipelines impacts areas such as health care and humanitarian action, where data is scarce and regulated by restrictive privacy laws. In this work, we investigate the extent to which synthetic data can replace real, tabular data in machine learning pipelines and identify the most effective synthetic data generation techniques for training and evaluating machine learning models. We investigate the impacts of differentially private synthetic data on downstream classification tasks from the point of view of utility as well as fairness. Our analysis is comprehensive and includes representatives of the two main types of synthetic data generation algorithms: marginal-based and GAN-based. To the best of our knowledge, our work is the first that: (i) proposes a training and evaluation framework that does not assume that real data is available for testing the utility and fairness of machine learning models trained on synthetic data; (ii) presents the most extensive analysis of synthetic data set generation algorithms in terms of utility and fairness when used for training machine learning models; and (iii) encompasses several different definitions of fairness. Our findings demonstrate that marginal-based synthetic data generators surpass GAN-based ones regarding model training utility for tabular data. Indeed, we show that models trained using data generated by marginal-based algorithms can exhibit similar utility to models trained using real data. Our analysis also reveals that the marginal-based synthetic data generator MWEM PGM can train models that simultaneously achieve utility and fairness characteristics close to those obtained by models trained with real data.
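As an illustration of the framework's key idea, the following is a minimal sketch of a pipeline that trains and evaluates entirely on synthetic data, with a simple demographic-parity check; the columns and the stubbed "synthetic" data are made up for illustration and stand in for the output of a DP generator such as MWEM PGM.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for a DP synthetic dataset; columns: two features, a protected
# attribute, and a binary label. All values here are illustrative.
n = 2000
group = rng.integers(0, 2, n)                      # protected attribute
X = np.column_stack([rng.normal(size=n), rng.normal(size=n), group])
y = (X[:, 0] + 0.5 * group + rng.normal(0, 0.5, n) > 0).astype(int)

# Key point of the framework: both training AND evaluation use synthetic
# data, so no real data is assumed available for testing.
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
utility = clf.score(X_te, y_te)

pred = clf.predict(X_te)
# Demographic parity gap: difference in positive prediction rates by group.
dp_gap = abs(pred[g_te == 0].mean() - pred[g_te == 1].mean())
print(f"utility={utility:.3f}, demographic parity gap={dp_gap:.3f}")
```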
A spectral regularisation framework for latent variable models designed for single channel applications
results: The package makes it easier to investigate and use LVMs with spectral regularisation, and provides a consistent linear LVM optimisation framework for single-channel time-series applications.Abstract
Latent variable models (LVMs) are commonly used to capture the underlying dependencies, patterns, and hidden structure in observed data. Source duplication is a by-product of the data hankelisation pre-processing step common to single channel LVM applications, which hinders practical LVM utilisation. In this article, a Python package titled spectrally-regularised-LVMs is presented. The proposed package addresses the source duplication issue via the addition of a novel spectral regularisation term. This package provides a framework for spectral regularisation in single channel LVM applications, thereby making it easier to investigate and utilise LVMs with spectral regularisation. This is achieved via the use of symbolic or explicit representations of potential LVM objective functions which are incorporated into a framework that uses spectral regularisation during the LVM parameter estimation process. The objective of this package is to provide a consistent linear LVM optimisation framework which incorporates spectral regularisation and caters to single channel time-series applications.
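Since the source-duplication issue originates in the hankelisation step, a short sketch of that step together with one plausible form of spectral penalty may help; this illustrates the idea only and is not the actual API of the spectrally-regularised-LVMs package.

```python
import numpy as np
from scipy.linalg import hankel

def hankelise(x, window):
    """Embed a single-channel signal into a trajectory (Hankel) matrix."""
    return hankel(x[:window], x[window - 1:])

def spectral_overlap(w1, w2, X):
    """One plausible spectral regulariser: cosine similarity between the
    normalized magnitude spectra of two recovered sources. Duplicated
    sources have near-identical spectra, so this term is large for them."""
    s1 = np.abs(np.fft.rfft(X.T @ w1))
    s2 = np.abs(np.fft.rfft(X.T @ w2))
    return float(s1 @ s2 / (np.linalg.norm(s1) * np.linalg.norm(s2) + 1e-12))

rng = np.random.default_rng(0)
t = np.arange(4096)
x = np.sin(0.05 * t) + 0.3 * rng.normal(size=t.size)  # toy single-channel signal

X = hankelise(x, window=64)            # 64 x N trajectory matrix
w_a = rng.normal(size=64)
w_b = w_a + 0.01 * rng.normal(size=64)
print(spectral_overlap(w_a, w_b, X))   # ~1: near-duplicate source, heavily penalised
```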
Maximum Knowledge Orthogonality Reconstruction with Gradients in Federated Learning
results: Evaluated on the MNIST, CIFAR-100, and ImageNet datasets and compared with existing methods, MKOR reconstructs high-quality input images efficiently and inconspicuously from clients' gradient updates.Abstract
Federated learning (FL) aims at keeping client data local to preserve privacy. Instead of gathering the data itself, the server only collects aggregated gradient updates from clients. Following the popularity of FL, there has been a considerable amount of work revealing the vulnerability of FL approaches by reconstructing the input data from gradient updates. Yet, most existing works assume an FL setting with an unrealistically small batch size, and have poor image quality when the batch size is large. Other works modify the neural network architectures or parameters to the point of being suspicious, and can thus be detected by clients. Moreover, most of them can only reconstruct one sample input from a large batch. To address these limitations, we propose a novel and completely analytical approach, referred to as maximum knowledge orthogonality reconstruction (MKOR), to reconstruct clients' input data. Our proposed method reconstructs a mathematically proven high-quality image from large batches. MKOR only requires the server to send secretly modified parameters to clients and can efficiently and inconspicuously reconstruct the input images from clients' gradient updates. We evaluate MKOR's performance on the MNIST, CIFAR-100, and ImageNet datasets and compare it with state-of-the-art works. The results show that MKOR outperforms the existing approaches, and draws attention to a pressing need for further research on the privacy protection of FL so that comprehensive defense approaches can be developed.
From Stream to Pool: Dynamic Pricing Beyond i.i.d. Arrivals
paper_authors: Titing Cui, Su Jia, Thomas Lavastida
for: This paper focuses on the dynamic pricing problem, specifically addressing the issue of high-valuation customers leaving the market early and causing a shift in the valuation distribution.
methods: The authors propose a minimax optimal algorithm that computes a non-adaptive policy to guarantee a $1/k$ fraction of the optimal revenue, given any set of $k$ prices. Additionally, they present an adaptive learn-then-earn policy based on a novel debiasing approach.
results: The authors prove an $\tilde O(kn^{3/4})$ regret bound for the adaptive policy, and further improve the bound to $\tilde O(k^{3/4} n^{3/4})$ using martingale concentration inequalities.Abstract
The dynamic pricing problem has been extensively studied under the \textbf{stream} model: A stream of customers arrives sequentially, each with an independently and identically distributed valuation. However, this formulation is not entirely reflective of the real world. In many scenarios, high-valuation customers tend to make purchases earlier and leave the market, leading to a \emph{shift} in the valuation distribution. Thus motivated, we consider a model where a \textbf{pool} of $n$ non-strategic unit-demand customers interact repeatedly with the seller. Each customer monitors the price intermittently according to an independent Poisson process and makes a purchase if the observed price is lower than her \emph{private} valuation, whereupon she leaves the market permanently. We present a minimax \emph{optimal} algorithm that efficiently computes a non-adaptive policy which guarantees a $1/k$ fraction of the optimal revenue, given any set of $k$ prices. Moreover, we present an adaptive \emph{learn-then-earn} policy based on a novel \emph{debiasing} approach, and prove an $\tilde O(kn^{3/4})$ regret bound. We further improve the bound to $\tilde O(k^{3/4} n^{3/4})$ using martingale concentration inequalities.
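The pool model is straightforward to simulate: each of $n$ customers checks the posted price at the epochs of an independent Poisson process and buys (then leaves permanently) the first time the price falls below her private valuation. The valuations, monitoring rate, and price path below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, horizon, rate = 1000, 10.0, 1.0      # pool size, time horizon, Poisson rate
valuations = rng.uniform(0, 1, n)       # private valuations
price = lambda t: 0.9 - 0.08 * t        # illustrative declining price path

revenue = 0.0
for i in range(n):
    t = rng.exponential(1 / rate)       # first monitoring epoch
    while t < horizon:
        if price(t) < valuations[i]:    # buy and leave the pool permanently
            revenue += price(t)
            break
        t += rng.exponential(1 / rate)  # next monitoring epoch

print(f"revenue from pool of {n}: {revenue:.1f}")
# High-valuation customers tend to buy early, shifting the distribution of
# valuations remaining in the pool -- exactly the effect the stream model misses.
```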
A Survey of Federated Unlearning: A Taxonomy, Challenges and Future Directions
for: This paper presents an overview of Federated Unlearning (FU), highlighting the challenges of realizing the Right to be Forgotten (RTBF) in the Federated Learning (FL) setting.
methods: The paper reviews existing FU algorithms, objectives, and evaluation metrics, and organizes them into a taxonomy of schemes, potential application scenarios, and future directions.
results: By reviewing and comparing existing studies, the paper summarizes their characteristics, strengths, and weaknesses, and identifies open challenges and opportunities for future research.Abstract
With the development of trustworthy Federated Learning (FL), the requirement of implementing the right to be forgotten gives rise to the area of Federated Unlearning (FU). Compared with machine unlearning, a major challenge of FU lies in the decentralized and privacy-preserving nature of FL, in which clients jointly train a global model without sharing their raw data, making it substantially more intricate to selectively unlearn specific information. Many efforts have been made to tackle these challenges, and significant progress has been achieved. In this paper, we present a comprehensive survey of FU. Specifically, we review existing algorithms, objectives, and evaluation metrics, and identify some challenges of FU. By reviewing and comparing these studies, we summarize them into a taxonomy covering various schemes, potential applications, and future directions.
On the accuracy and efficiency of group-wise clipping in differentially private optimization
results: The study shows that, compared with all-layer gradient clipping, group-wise gradient clipping strikes a better balance between high accuracy and low peak memory; moreover, DP optimization of large models with group-wise clipping can achieve high accuracy and low peak memory simultaneously.Abstract
Recent advances have substantially improved the accuracy, memory cost, and training speed of differentially private (DP) deep learning, especially on large vision and language models with millions to billions of parameters. In this work, we thoroughly study the per-sample gradient clipping style, a key component in DP optimization. We show that different clipping styles have the same time complexity but instantiate an accuracy-memory trade-off: while the all-layer clipping (of coarse granularity) is the most prevalent and usually gives the best accuracy, it incurs heavier memory cost compared to other group-wise clipping, such as the layer-wise clipping (of finer granularity). We formalize this trade-off through our convergence theory and complexity analysis. Importantly, we demonstrate that the accuracy gap between group-wise clipping and all-layer clipping becomes smaller for larger models, while the memory advantage of the group-wise clipping remains. Consequently, the group-wise clipping allows DP optimization of large models to achieve high accuracy and low peak memory simultaneously.
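A sketch contrasting the two clipping styles on precomputed per-sample gradients may help; the tensor shapes and clipping thresholds are illustrative, and the memory remark in the comments reflects the trade-off described above rather than measured numbers.

```python
import torch

def clip_all_layer(per_sample_grads, C):
    """All-layer (coarse) clipping: one norm over the concatenation of all
    layers' per-sample gradients, as in standard DP-SGD."""
    flat = torch.cat([g.flatten(1) for g in per_sample_grads], dim=1)
    scale = (C / (flat.norm(dim=1) + 1e-6)).clamp(max=1.0)
    return [g * scale.view(-1, *([1] * (g.dim() - 1))) for g in per_sample_grads]

def clip_layer_wise(per_sample_grads, C_per_layer):
    """Group-wise (here: layer-wise) clipping: each layer gets its own
    per-sample norm and threshold; layers can be clipped and freed one at
    a time, which is where the peak-memory advantage comes from."""
    out = []
    for g, C in zip(per_sample_grads, C_per_layer):
        scale = (C / (g.flatten(1).norm(dim=1) + 1e-6)).clamp(max=1.0)
        out.append(g * scale.view(-1, *([1] * (g.dim() - 1))))
    return out

# Illustrative per-sample gradients for a batch of 8 and two layers.
grads = [torch.randn(8, 128, 64), torch.randn(8, 10, 128)]
clipped_flat = clip_all_layer(grads, C=1.0)
clipped_lw = clip_layer_wise(grads, C_per_layer=[0.7, 0.7])
```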
Factor Fitting, Rank Allocation, and Partitioning in Multilevel Low Rank Matrices
results: The proposed methods reduce the storage and computational cost of fitting MLR matrices while preserving their structure; the paper is accompanied by an open-source package implementing the proposed methods.Abstract
We consider multilevel low rank (MLR) matrices, defined as a row and column permutation of a sum of matrices, each one a block diagonal refinement of the previous one, with all blocks low rank given in factored form. MLR matrices extend low rank matrices but share many of their properties, such as the total storage required and complexity of matrix-vector multiplication. We address three problems that arise in fitting a given matrix by an MLR matrix in the Frobenius norm. The first problem is factor fitting, where we adjust the factors of the MLR matrix. The second is rank allocation, where we choose the ranks of the blocks in each level, subject to the total rank having a given value, which preserves the total storage needed for the MLR matrix. The final problem is to choose the hierarchical partition of rows and columns, along with the ranks and factors. This paper is accompanied by an open source package that implements the proposed methods.
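A minimal sketch of matrix-vector multiplication with an MLR matrix in factored form, assuming identity row/column permutations and square blocks for brevity; the level structure and ranks below are made up for illustration.

```python
import numpy as np

def mlr_matvec(levels, x):
    """y = A x for an MLR matrix given in factored form.
    `levels` is a list; each level is a list of (B, C) factor pairs, one
    per diagonal block, the block being B @ C.T. Permutations are taken
    to be identity and blocks square to keep the sketch short."""
    y = np.zeros_like(x)
    for blocks in levels:
        offset = 0
        for B, C in blocks:                # B, C are tall and skinny
            m = B.shape[0]
            n = C.shape[0]
            y[offset:offset + m] += B @ (C.T @ x[offset:offset + n])
            offset += n
    return y

rng = np.random.default_rng(0)
# Two levels on an 8x8 matrix: one full rank-2 block, then two 4x4 rank-1 blocks.
levels = [
    [(rng.normal(size=(8, 2)), rng.normal(size=(8, 2)))],
    [(rng.normal(size=(4, 1)), rng.normal(size=(4, 1))) for _ in range(2)],
]
x = rng.normal(size=8)
print(mlr_matvec(levels, x))
# Cost is proportional to the total factor storage, as for low rank matrices.
```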
Investigative Pattern Detection Framework for Counterterrorism
results: The authors develop an Investigative Pattern Detection Framework for Counterterrorism (INSPECT) that automatically detects behavioral indicators and risk profiles/groups and automates large-scale forensic review and retrieval tasks; INSPECT has been validated and evaluated on a domestic jihadism dataset.Abstract
Law-enforcement investigations aimed at preventing attacks by violent extremists have become increasingly important for public safety. The problem is exacerbated by the massive data volumes that need to be scanned to identify complex behaviors of extremists and groups. Automated tools are required to extract information to respond to queries from analysts, continually scan new information, integrate it with past events, and then alert about emerging threats. We address challenges in investigative pattern detection and develop an Investigative Pattern Detection Framework for Counterterrorism (INSPECT). The framework integrates numerous computing tools that include machine learning techniques to identify behavioral indicators and graph pattern matching techniques to detect risk profiles/groups. INSPECT also automates multiple tasks for large-scale mining of detailed forensic biographies, forming knowledge networks, and querying for behavioral indicators and radicalization trajectories. INSPECT targets a human-in-the-loop mode of investigative search and has been validated and evaluated using an evolving dataset on domestic jihadism.
paper_authors: Murat Babek Salman, Emil Björnson, Gokhan Muzaffer Guvensen, Tolga Ciloglu
for: investigate the impact of frequency selectivity on nonlinear distortion in wireless communication systems
methods: closed-form expression for received distortion power as a function of number of multipath components (MPCs) and delay spread
results: in-band and OOB distortion power is inversely proportional to the number of MPCs, and as the delay spread narrows, the in-band distortion power is beamformed towards the intended user.Abstract
Nonlinear distortion stemming from low-cost power amplifiers may severely affect wireless communication performance through out-of-band (OOB) radiation and in-band distortion. The distortion is correlated between different transmit antennas in an antenna array, which results in a beamforming gain at the receiver side that grows with the number of antennas. In this paper, we investigate how the strength of the distortion is affected by the frequency selectivity of the channel. A closed-form expression for the received distortion power is derived as a function of the number of multipath components (MPCs) and the delay spread, highlighting their impact. The performed analysis, which is verified via numerical simulations, reveals that as the number of MPCs increases, the distortion exhibits distinct characteristics for in-band and OOB frequencies. It is shown that the received in-band and OOB distortion power is inversely proportional to the number of MPCs, and that as the delay spread gets narrower, the in-band distortion power is beamformed towards the intended user, which yields higher received in-band distortion compared to the OOB distortion.
Deep Learning-Enabled Text Semantic Communication under Interference: An Empirical Study
results: Test results show that DeepSC produces semantically irrelevant sentences when the number of Gaussian radio frequency interference (RFI) emitters becomes very large; achieving reliability and robustness in 6G therefore calls for an interference-resistant and robust SemCom (IR$^2$ SemCom) design paradigm.Abstract
At the confluence of 6G, deep learning (DL), and natural language processing (NLP), DL-enabled text semantic communication (SemCom) has emerged as a 6G enabler by promising to minimize bandwidth consumption, transmission delay, and power usage. Among text SemCom techniques, \textit{DeepSC} is a popular scheme that leverages advancements in DL and NLP to reliably transmit semantic information in low signal-to-noise ratio (SNR) regimes. To understand the fundamental limits of such a transmission paradigm, our recently developed theory \cite{Getu'23_Performance_Limits} predicted the performance limits of DeepSC under radio frequency interference (RFI). Although these limits were corroborated by simulations, trained deep networks can defy classical statistical wisdom, and hence extensive computer experiments are needed to validate our theory. Accordingly, this empirical work concerns the training and testing of DeepSC using the proceedings of the European Parliament (Europarl) dataset. Employing training, validation, and testing sets \textit{tokenized and vectorized} from Europarl, we train the DeepSC architecture in Keras 2.9 with TensorFlow 2.9 as a backend and test it under Gaussian multi-interferer RFI received over Rayleigh fading channels. Validating our theory, the testing results corroborate that DeepSC produces semantically irrelevant sentences as the number of Gaussian RFI emitters gets very large. Therefore, a fundamental 6G design paradigm for \textit{interference-resistant and robust SemCom} (IR$^2$ SemCom) is needed.
Transmission line condition prediction based on semi-supervised learning
for: This paper proposes a transmission line condition prediction method based on semi-supervised learning, addressing the inability of existing models to account for both robustness and data requirements.
methods: Missing entries in the expanded feature vector are filled in using a regularisation matrix, and the sparse coding problem is solved via representation learning; a small number of labelled samples then initialises the category centres of line segments in different defect states, and unlabelled samples are finally used to correct the estimated model parameters.
results: A case study shows that the method improves recognition accuracy and uses data more efficiently than existing models.Abstract
Transmission line state assessment and prediction are of great significance for formulating rational operation and maintenance strategies and improving operation and maintenance quality. To address the problem that existing models cannot account for both robustness and data requirements, this paper proposes a state prediction method based on semi-supervised learning. First, missing entries in the expanded feature vector are filled in using a regularisation matrix, and the sparse coding problem is solved by representation learning. Then, a small number of labelled samples is used to initially determine the category centers of line segments in different defective states. Finally, the estimated model parameters are corrected using unlabelled samples. A case study shows that this method improves recognition accuracy and uses data more efficiently than existing models.
Increased Multiplexing Gain with Reconfigurable Surfaces: Simultaneous Channel Orthogonalization and Information Embedding
results: The results show that ARIS and FRIS enable full exploitation of the multiple antennas in a MU-MIMO system, while extra information can be embedded in the orthogonalized channel to increase transmission rates.Abstract
Reconfigurable surface (RS) has been shown to be an effective solution for improving wireless communication links in a general multi-user multiple-input multiple-output (MU-MIMO) setting. Current research efforts have been largely directed towards the study of reconfigurable intelligent surface (RIS), which corresponds to an RS made of passive reconfigurable elements with only phase shifting capabilities. RIS constitutes a cost- and energy-efficient solution for increased beamforming gain since it allows generating constructive interference towards desired directions, e.g., towards a base station (BS). However, in many situations, multiplexing gain may have greater impact on the achievable transmission rates and number of simultaneously connected devices, while RIS has only been able to achieve minor improvements in this aspect. Recent work has proposed the use of alternative RS technologies, namely amplitude-reconfigurable intelligent surface (ARIS) and fully-reconfigurable intelligent surface (FRIS), to achieve perfect orthogonalization of MU-MIMO channels, thus allowing for maximum multiplexing gain at reduced complexity. In this work we consider the use of ARIS and FRIS for simultaneously orthogonalizing a MU-MIMO channel, while embedding extra information in the orthogonalized channel. We show that the resulting achievable rates allow for full exploitation of the degrees of freedom in a MU-MIMO system with an excess of BS antennas.
A Low-Complexity Machine Learning Design for mmWave Beam Prediction
results: The proposed model achieves state-of-the-art accuracy with lower computational complexity, enabling reduced power consumption and faster beam prediction.Abstract
The 3rd Generation Partnership Project (3GPP) is currently studying machine learning (ML) for the fifth generation (5G)-Advanced New Radio (NR) air interface, where spatial- and temporal-domain beam prediction are important use cases. Against this background, this letter presents a low-complexity ML design that expedites spatial-domain beam prediction to reduce the power consumption and reference signaling overhead currently required for frequent beam measurements. Complexity analysis and evaluation results show that the proposed model achieves state-of-the-art accuracy with lower computational complexity, resulting in reduced power consumption and faster beam prediction. Furthermore, important observations on the generalization of the proposed model are presented in this letter.
for: This paper reviews the evolution of coronary stent technology and its impact on patient care.
methods: The paper discusses the development of various stent types, including bare metal stents (BMS), first-generation drug-eluting stents (DES), and second-generation DES and bioresorbable vascular scaffolds (BVS). Clinical trials have been crucial in validating each stent’s effectiveness.
results: The paper highlights the progress made in stent technology, but also acknowledges ongoing challenges in stent selection, approval processes, and minimizing risks. Despite these challenges, the future may see personalized stenting based on patient needs.Abstract
Coronary artery disease (CAD) is a leading cause of death worldwide. Treatments have evolved, with stenting becoming the primary approach over bypass surgery. This article reviews the evolution of coronary stent technology, starting from the first angioplasty in 1977. Pioneers like Forssmann, Dotter, and Gruentzig established the foundation. The late 1980s saw the introduction of bare metal stents (BMS) to address angioplasty limitations. However, BMS had issues, leading to the development of first-generation drug-eluting stents (DES) in the early 2000s, which reduced restenosis but had safety concerns. Subsequent innovations introduced second-generation DES with better results and the latest bioresorbable vascular scaffolds (BVS) that dissolve over time. Clinical trials have been crucial in validating each stent's effectiveness. Despite progress, challenges remain in stent selection, approval processes, and minimizing risks. The future may see personalized stenting based on patient needs, highlighting the significant advancements in stent technology and its impact on patient care.
Optimal Status Updates for Minimizing Age of Correlated Information in IoT Networks with Energy Harvesting Sensors
results: Extensive simulations validate that the proposed method reduces the Age of Correlated Information (AoCI) and handles correlated information more effectively than existing DRL algorithms for POMDPs.Abstract
Many real-time applications of the Internet of Things (IoT) need to deal with correlated information generated by multiple sensors. The design of efficient status update strategies that minimize the Age of Correlated Information (AoCI) is a key factor. In this paper, we consider an IoT network consisting of sensors equipped with the energy harvesting (EH) capability. We optimize the average AoCI at the data fusion center (DFC) by appropriately managing the energy harvested by sensors, whose true battery states are unobservable during the decision-making process. Particularly, we first formulate the dynamic status update procedure as a partially observable Markov decision process (POMDP), where the environmental dynamics are unknown to the DFC. In order to address the challenges arising from the causality of energy usage, unknown environmental dynamics, unobservability of sensors' true battery states, and large-scale discrete action space, we devise a deep reinforcement learning (DRL)-based dynamic status update algorithm. The algorithm leverages the advantages of the soft actor-critic and long short-term memory techniques. Meanwhile, it incorporates our proposed action decomposition and mapping mechanism. Extensive simulations are conducted to validate the effectiveness of our proposed algorithm by comparing it with available DRL algorithms for POMDPs.
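To see the underlying trade-off, here is a toy simulation of a single energy-harvesting sensor (a simplification of the multi-sensor AoCI setting): energy arrives stochastically, and sending a status update costs one unit and resets the age at the fusion center. The threshold policy is an illustrative baseline, not the paper's learned DRL policy.

```python
import numpy as np

rng = np.random.default_rng(0)
T, battery_cap, harvest_p = 10_000, 5, 0.3

age, battery, total_age = 0, battery_cap, 0
for t in range(T):
    battery = min(battery + rng.binomial(1, harvest_p), battery_cap)  # EH arrival
    # Illustrative threshold policy: update when the age is large enough and
    # energy is available (the paper instead learns a POMDP policy with DRL,
    # without observing the true battery state).
    if battery >= 1 and age >= 3:
        battery -= 1
        age = 0            # fresh status received at the fusion center
    else:
        age += 1           # information at the fusion center grows stale
    total_age += age

print(f"average age: {total_age / T:.2f}")
```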
results: With Deep Audio Analyzer, law-enforcement agencies and researchers can easily evaluate the performance of pretrained models, create new audio analysis workflows, and export and share them, improving the speed and reproducibility of audio forensics experiments.Abstract
Deep Audio Analyzer is an open source speech framework that aims to simplify the research and development process of neural speech processing pipelines, allowing users to conceive, compare, and share results in a fast and reproducible way. This paper describes the core architecture designed to support several tasks of common interest in the audio forensics field, showing the possibility of creating new tasks and thus customizing the framework. By means of Deep Audio Analyzer, forensics examiners (e.g., from Law Enforcement Agencies) and researchers will be able to visualize audio features, easily evaluate performance of pretrained models, and create, export, and share new audio analysis workflows by combining deep neural network models with a few clicks. One advantage of this tool is that it speeds up research and practical experimentation in the field of audio forensics, thereby also improving experimental reproducibility by exporting and sharing pipelines. All features are developed in modules accessible by the user through a Graphic User Interface. Index Terms: Speech Processing, Deep Learning Audio, Deep Learning Audio Pipeline creation, Audio Forensics.
results: The method produces significantly better results than prior state-of-the-art unsupervised 3D reconstruction techniques on Pix3D chair images, both quantitatively and qualitatively. The authors also show how 3DMiner can be applied to in-the-wild scenarios, e.g., reconstructing shapes from images in the LAION-5B dataset.Abstract
We present 3DMiner -- a pipeline for mining 3D shapes from challenging large-scale unannotated image datasets. Unlike other unsupervised 3D reconstruction methods, we assume that, within a large-enough dataset, there must exist images of objects with similar shapes but varying backgrounds, textures, and viewpoints. Our approach leverages the recent advances in learning self-supervised image representations to cluster images with geometrically similar shapes and find common image correspondences between them. We then exploit these correspondences to obtain rough camera estimates as initialization for bundle-adjustment. Finally, for every image cluster, we apply a progressive bundle-adjusting reconstruction method to learn a neural occupancy field representing the underlying shape. We show that this procedure is robust to several types of errors introduced in previous steps (e.g., wrong camera poses, images containing dissimilar shapes, etc.), allowing us to obtain shape and pose annotations for images in-the-wild. When using images from Pix3D chairs, our method is capable of producing significantly better results than state-of-the-art unsupervised 3D reconstruction techniques, both quantitatively and qualitatively. Furthermore, we show how 3DMiner can be applied to in-the-wild data by reconstructing shapes present in images from the LAION-5B dataset. Project Page: https://ttchengab.github.io/3dminerOfficial
paper_authors: Junjiao Tian, Yen-Cheng Liu, James Seale Smith, Zsolt Kira
for: This paper aims to improve the robustness of pre-trained models when fine-tuning them for downstream tasks, while maintaining their in-distribution (ID) performance.
methods: The proposed method, Fast Trainable Projection (FTP), uses projection-based fine-tuning with learnable projection constraints to improve the efficiency and scalability of the algorithm. FTP can be combined with existing optimizers like AdamW and is a special instance of hyper-optimizers that tune the hyper-parameters of optimizers in a learnable manner.
results: The proposed FTP method achieves superior robustness on out-of-distribution (OOD) datasets, including domain shifts and natural corruptions, across four different vision tasks with five different pre-trained models. Additionally, FTP is broadly applicable and beneficial to other learning scenarios such as low-label and continual learning settings. The code will be available at https://github.com/GT-RIPL/FTP.git.Abstract
Robust fine-tuning aims to achieve competitive in-distribution (ID) performance while maintaining the out-of-distribution (OOD) robustness of a pre-trained model when transferring it to a downstream task. Recently, projected gradient descent has been successfully used in robust fine-tuning by constraining the deviation from the initialization of the fine-tuned model explicitly through projection. However, algorithmically, two limitations prevent this method from being adopted more widely: scalability and efficiency. In this paper, we propose a new projection-based fine-tuning algorithm, Fast Trainable Projection (FTP), for computationally efficient learning of per-layer projection constraints, resulting in an average $35\%$ speedup on our benchmarks compared to prior works. FTP can be combined with existing optimizers such as AdamW, and be used in a plug-and-play fashion. Finally, we show that FTP is a special instance of hyper-optimizers that tune the hyper-parameters of optimizers in a learnable manner through nested differentiation. Empirically, we show superior robustness on OOD datasets, including domain shifts and natural corruptions, across four different vision tasks with five different pre-trained models. Additionally, we demonstrate that FTP is broadly applicable and beneficial to other learning scenarios such as low-label and continual learning settings thanks to its easy adaptability. The code will be available at https://github.com/GT-RIPL/FTP.git.
BirdSAT: Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping
results: Evaluating on two downstream tasks, fine-grained visual classification (FGVC) and cross-modal retrieval, the authors find that the models learn fine-grained and geographically conditioned features of bird species. The pre-trained models achieve state-of-the-art performance in transfer learning settings, and their strong cross-modal retrieval performance enables the creation of species distribution maps.Abstract
We propose a metadata-aware self-supervised learning~(SSL)~framework useful for fine-grained classification and ecological mapping of bird species around the world. Our framework unifies two SSL strategies: Contrastive Learning~(CL) and Masked Image Modeling~(MIM), while also enriching the embedding space with metadata available with ground-level imagery of birds. We separately train uni-modal and cross-modal ViT on a novel cross-view global bird species dataset containing ground-level imagery, metadata (location, time), and corresponding satellite imagery. We demonstrate that our models learn fine-grained and geographically conditioned features of birds, by evaluating on two downstream tasks: fine-grained visual classification~(FGVC) and cross-modal retrieval. Pre-trained models learned using our framework achieve SotA performance on FGVC of iNAT-2021 birds and in transfer learning settings for CUB-200-2011 and NABirds datasets. Moreover, the impressive cross-modal retrieval performance of our model enables the creation of species distribution maps across any geographic region. The dataset and source code will be released at https://github.com/mvrl/BirdSAT}.
Out-of-distribution Object Detection through Bayesian Uncertainty Estimation
paper_authors: Tianhao Zhang, Shenglin Wang, Nidhal Bouaynaya, Radu Calinescu, Lyudmila Mihaylova
for: This paper proposes a novel Bayesian object detection method that improves detector performance on out-of-distribution (OOD) data.
methods: The method models uncertainty via proposed Gaussian distributions over the weights of pre-trained networks and distinguishes in-distribution (ID) from OOD data by sampling the weight parameters. Unlike other uncertainty-modeling approaches, it requires neither costly inference of weight distributions nor model training on synthetic outlier data.
results: Trained on the BDD100k and VOC datasets as the ID datasets and evaluated on COCO2017 as the OOD dataset, the Bayesian object detector achieves satisfactory OOD identification performance, reducing the FPR95 score by up to 8.19% and increasing the AUROC score by up to 13.94%.Abstract
The superior performance of object detectors is often established under the condition that the test samples are drawn from the same distribution as the training data. However, in many practical applications, out-of-distribution (OOD) instances are inevitable and usually lead to uncertainty in the results. In this paper, we propose a novel, intuitive, and scalable probabilistic object detection method for OOD detection. Unlike other uncertainty-modeling methods that either require huge computational costs to infer the weight distributions or rely on model training with synthetic outlier data, our method distinguishes between in-distribution (ID) data and OOD data via weight parameter sampling from proposed Gaussian distributions based on pre-trained networks. We demonstrate that our Bayesian object detector can achieve satisfactory OOD identification performance, reducing the FPR95 score by up to 8.19% and increasing the AUROC score by up to 13.94% when trained on the BDD100k and VOC datasets as the ID datasets and evaluated on the COCO2017 dataset as the OOD dataset.
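The core mechanism can be sketched in a few lines: perturb the weights of a pre-trained head with Gaussian noise several times and score an input by the spread of the resulting predictions, which tends to be larger off-distribution. The linear head, noise scale, and inputs below are toy stand-ins for a real detector.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
head = nn.Linear(16, 3)           # stand-in for a pre-trained detector head

@torch.no_grad()
def ood_score(x, sigma=0.05, n_samples=20):
    """Sample weights from a Gaussian centered at the pre-trained values
    and use the spread of the softmax outputs as an OOD score."""
    probs = []
    w0, b0 = head.weight.clone(), head.bias.clone()
    for _ in range(n_samples):
        head.weight.copy_(w0 + sigma * torch.randn_like(w0))
        head.bias.copy_(b0 + sigma * torch.randn_like(b0))
        probs.append(torch.softmax(head(x), dim=-1))
    head.weight.copy_(w0)
    head.bias.copy_(b0)            # restore the pre-trained weights
    return torch.stack(probs).std(dim=0).mean().item()

x_id = torch.randn(1, 16)          # "in-distribution" feature
x_ood = 10 * torch.randn(1, 16)    # far-off feature, larger predictive spread
print(ood_score(x_id), ood_score(x_ood))
```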
CrossEAI: Using Explainable AI to generate better bounding boxes for Chest X-ray images
paper_authors: Jinze Zhao
for: This paper focuses on improving the accuracy of bounding box generation for chest x-ray image diagnosis using post-hoc AI explainable methods.
methods: The proposed method, CrossEAI, combines heatmap and gradient map to generate more targeted bounding boxes. The model uses a weighted average of Guided Backpropagation and Grad-CAM++ to generate bounding boxes that are closer to the ground truth.
results: The proposed method achieves significant improvement over the state of the art model with the same setting, with an average improvement of 9% in all diseases over all Intersection over Union (IoU). Additionally, the model is able to achieve the same performance as a model that uses 80% of the ground truth bounding box information for training, without using any ground truth bounding box information.Abstract
Explainability is critical for deep learning applications in healthcare, which are mandated to provide interpretations to both patients and doctors according to legal regulations and responsibilities. Explainable AI methods, such as feature importance using integrated gradients, model approximation using LIME, or neuron activation and layer conductance, provide interpretations for certain health risk predictions. In medical imaging diagnosis, disease classification usually achieves high accuracy, but the generated bounding boxes have much lower Intersection over Union (IoU). Different methods with self-supervised or semi-supervised learning strategies have been proposed, but few improvements have been identified for bounding box generation. Previous work shows that bounding boxes generated by these methods are usually larger than the ground truth and contain major non-disease areas. This paper utilizes the advantages of post-hoc AI explainable methods to generate bounding boxes for chest x-ray image diagnosis. In this work, we propose CrossEAI, which combines a heatmap and a gradient map to generate more targeted bounding boxes. By using a weighted average of Guided Backpropagation and Grad-CAM++, we are able to generate bounding boxes that are closer to the ground truth. We evaluate our model on a chest x-ray dataset. The performance shows significant improvement over the state of the art model with the same setting, with a $9\%$ improvement on average across all diseases and all IoU thresholds. Moreover, as a model that does not use any ground truth bounding box information for training, we achieve the same performance in general as the model that uses $80\%$ of the ground truth bounding box information for training.
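The box-generation step can be sketched generically: fuse a class-activation heatmap (e.g., from Grad-CAM++) and a gradient map (e.g., from Guided Backpropagation) by a weighted average, threshold, and take the tight box around the surviving region. The Gaussian blobs below are synthetic placeholders for the two maps, and the weight and threshold are assumed hyper-parameters.

```python
import numpy as np

def combined_bbox(heatmap, gradmap, alpha=0.6, thresh=0.5):
    """Weighted average of a heatmap and a gradient map, thresholded to a
    binary mask, then the tightest axis-aligned box around the mask."""
    norm = lambda m: (m - m.min()) / (m.max() - m.min() + 1e-8)
    fused = alpha * norm(heatmap) + (1 - alpha) * norm(gradmap)
    ys, xs = np.where(fused >= thresh)
    if len(xs) == 0:
        return None
    return xs.min(), ys.min(), xs.max(), ys.max()   # x0, y0, x1, y1

# Synthetic stand-ins for Grad-CAM++ and Guided Backpropagation outputs.
yy, xx = np.mgrid[0:224, 0:224]
heatmap = np.exp(-(((yy - 120) ** 2 + (xx - 100) ** 2) / (2 * 30 ** 2)))
gradmap = np.exp(-(((yy - 115) ** 2 + (xx - 105) ** 2) / (2 * 20 ** 2)))
print(combined_bbox(heatmap, gradmap))
```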
Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery
results: Compared with prior work, the method is more accurate and considerably faster to train.Abstract
Recent advances in machine learning have shown that Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences. Although very successful for Large Language Models (LLMs), these advancements have not had a comparable impact in research for autonomous vehicles -- where alignment with human expectations can be imperative. In this paper, we propose to adapt similar RL-based methods to unsupervised object discovery, i.e. learning to detect objects from LiDAR points without any training labels. Instead of labels, we use simple heuristics to mimic human feedback. More explicitly, we combine multiple heuristics into a simple reward function that positively correlates its score with bounding box accuracy, \ie, boxes containing objects are scored higher than those without. We start from the detector's own predictions to explore the space and reinforce boxes with high rewards through gradient updates. Empirically, we demonstrate that our approach is not only more accurate, but also orders of magnitudes faster to train compared to prior works on object discovery.
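One way to read the reward design is sketched below: score a candidate box by simple point-cloud heuristics that correlate with box accuracy, such as point density inside the box relative to a small shell around it. The specific heuristics and weights are illustrative guesses, not the paper's exact reward.

```python
import numpy as np

def box_reward(points, box):
    """Heuristic reward for an axis-aligned BEV box (x0, y0, x1, y1):
    dense interiors score high, dense margins (a box cutting through an
    object or floating in clutter) score low."""
    x0, y0, x1, y1 = box
    inside = ((points[:, 0] >= x0) & (points[:, 0] <= x1) &
              (points[:, 1] >= y0) & (points[:, 1] <= y1))
    margin = 0.5
    shell = ((points[:, 0] >= x0 - margin) & (points[:, 0] <= x1 + margin) &
             (points[:, 1] >= y0 - margin) & (points[:, 1] <= y1 + margin) & ~inside)
    area = max((x1 - x0) * (y1 - y0), 1e-6)
    return inside.sum() / area - 2.0 * shell.sum() / area   # weights are guesses

rng = np.random.default_rng(0)
obj = rng.normal([5.0, 2.0], 0.3, size=(100, 2))    # a point cluster (an "object")
noise = rng.uniform(-10, 10, size=(200, 2))         # background points
pts = np.vstack([obj, noise])
print(box_reward(pts, (4.0, 1.0, 6.0, 3.0)))        # box containing the object
print(box_reward(pts, (-8.0, -8.0, -6.0, -6.0)))    # empty box scores lower
```

Boxes with high reward can then be reinforced through gradient updates, starting from the detector's own predictions as described above.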
Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection
results: Experimental results show that the proposed method not only performs favorably against existing methods under the 1-class and few-shot settings, but also provides definite anomaly predictions along with detailed anomaly descriptions in the IAD domain.Abstract
Existing industrial anomaly detection (IAD) methods predict anomaly scores for both anomaly detection and localization. However, they struggle to support multi-turn dialog and detailed descriptions of anomaly regions, e.g., the color, shape, and category of industrial anomalies. Recently, large multimodal (i.e., vision and language) models (LMMs) have shown eminent perception abilities on multiple vision tasks such as image captioning, visual understanding, and visual reasoning, making them a competitive potential choice for more comprehensible anomaly detection. However, knowledge about anomaly detection is absent from existing general LMMs, while training a specific LMM for anomaly detection requires a tremendous amount of annotated data and massive computation resources. In this paper, we propose a novel large multi-modal model that applies vision experts for industrial anomaly detection (dubbed Myriad), which achieves definite anomaly detection and high-quality anomaly description. Specifically, we adopt MiniGPT-4 as the base LMM and design an Expert Perception module to embed the prior knowledge from vision experts as tokens that are intelligible to Large Language Models (LLMs). To compensate for the errors and confusions of vision experts, we introduce a domain adapter to bridge the visual representation gaps between generic and industrial images. Furthermore, we propose a Vision Expert Instructor, which enables the Q-Former to generate IAD domain vision-language tokens according to vision expert priors. Extensive experiments on the MVTec-AD and VisA benchmarks demonstrate that our proposed method not only performs favorably against state-of-the-art methods under the 1-class and few-shot settings, but also provides definite anomaly predictions along with detailed descriptions in the IAD domain.
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
results: Our experimental results show that the current version of GPT-4V is not recommended for real-world diagnostics due to its unreliable and suboptimal accuracy on medical visual question answering tasks. In addition, we delineate seven unique facets of GPT-4V's behavior in medical VQA, highlighting its constraints in this complex domain. The detailed evaluation cases are available at https://github.com/ZhilingYan/GPT4V-Medical-Report.Abstract
In this paper, we critically evaluate the capabilities of the state-of-the-art multimodal large language model, i.e., GPT-4 with Vision (GPT-4V), on Visual Question Answering (VQA) task. Our experiments thoroughly assess GPT-4V's proficiency in answering questions paired with images using both pathology and radiology datasets from 11 modalities (e.g. Microscopy, Dermoscopy, X-ray, CT, etc.) and fifteen objects of interests (brain, liver, lung, etc.). Our datasets encompass a comprehensive range of medical inquiries, including sixteen distinct question types. Throughout our evaluations, we devised textual prompts for GPT-4V, directing it to synergize visual and textual information. The experiments with accuracy score conclude that the current version of GPT-4V is not recommended for real-world diagnostics due to its unreliable and suboptimal accuracy in responding to diagnostic medical questions. In addition, we delineate seven unique facets of GPT-4V's behavior in medical VQA, highlighting its constraints within this complex arena. The complete details of our evaluation cases are accessible at https://github.com/ZhilingYan/GPT4V-Medical-Report.
Boosting Decision-Based Black-Box Adversarial Attack with Gradient Priors
results: Extensive experiments demonstrate that the proposed method significantly outperforms other strong baselines.Abstract
Decision-based methods have been shown to be effective in black-box adversarial attacks, as they can obtain satisfactory performance while only requiring access to the final model prediction. Gradient estimation is a critical step in black-box adversarial attacks, as it directly affects query efficiency. Recent works have attempted to utilize gradient priors to help score-based methods obtain better results. However, these gradient priors still suffer from the edge gradient discrepancy issue and the successive iteration gradient direction issue, and are thus difficult to extend directly to decision-based methods. In this paper, we propose a novel Decision-based Black-box Attack framework with Gradient Priors (DBA-GP), which seamlessly integrates a data-dependent gradient prior and a time-dependent prior into the gradient estimation procedure. First, by leveraging a joint bilateral filter on each random perturbation, DBA-GP ensures that the generated perturbations at edge locations are hardly smoothed, i.e., it alleviates the edge gradient discrepancy and preserves the characteristics of the original image as much as possible. Second, by utilizing a new gradient updating strategy to automatically adjust the successive iteration gradient direction, DBA-GP accelerates convergence and thus improves query efficiency. Extensive experiments demonstrate that the proposed method significantly outperforms other strong baselines.
FPGAN-Control: A Controllable Fingerprint Generator for Training with Synthetic Data
results: Training FPGAN-Control on the publicly available NIST SD302 (N2N) dataset yields excellent results. We demonstrate the merits of FPGAN-Control quantitatively and qualitatively in terms of identity preservation, control over fingerprint appearance, and a low synthetic-to-real domain gap. Finally, fingerprint recognition models trained solely on synthetic data generated by FPGAN-Control achieve recognition accuracies on par with, or even surpassing, models trained on real data.Abstract
Training fingerprint recognition models using synthetic data has recently gained increased attention in the biometric community as it alleviates the dependency on sensitive personal data. Existing approaches for fingerprint generation are limited in their ability to generate diverse impressions of the same finger, a key property for providing effective data for training recognition models. To address this gap, we present FPGAN-Control, an identity preserving image generation framework which enables control over the fingerprint's image appearance (e.g., fingerprint type, acquisition device, pressure level) of generated fingerprints. We introduce a novel appearance loss that encourages disentanglement between the fingerprint's identity and appearance properties. In our experiments, we used the publicly available NIST SD302 (N2N) dataset for training the FPGAN-Control model. We demonstrate the merits of FPGAN-Control, both quantitatively and qualitatively, in terms of identity preservation level, degree of appearance control, and low synthetic-to-real domain gap. Finally, training recognition models using only synthetic datasets generated by FPGAN-Control lead to recognition accuracies that are on par or even surpass models trained using real data. To the best of our knowledge, this is the first work to demonstrate this.
Efficient Test-Time Adaptation for Super-Resolution with Second-Order Degradation and Reconstruction
paper_authors: Zeshuai Deng, Zhuokun Chen, Shuaicheng Niu, Thomas H. Li, Bohan Zhuang, Mingkui Tan
for: This paper proposes a super-resolution (SR) method that adapts rapidly at test time, enabling high-quality SR reconstruction for test images with different/unknown degradation types.
methods: The method uses a second-order degradation scheme to construct paired data covering different degradation types, and adapts to the degradation of each test image through feature-level reconstruction learning.
results: Extensive experiments on newly synthesized corrupted DIV2K datasets and several real-world datasets show impressive improvements over existing methods at satisfactory speed.

Abstract
Image super-resolution (SR) aims to learn a mapping from low-resolution (LR) to high-resolution (HR) using paired HR-LR training images. Conventional SR methods typically gather the paired training data by synthesizing LR images from HR images using a predetermined degradation model, e.g., Bicubic down-sampling. However, the realistic degradation type of test images may mismatch with the training-time degradation type due to the dynamic changes of the real-world scenarios, resulting in inferior-quality SR images. To address this, existing methods attempt to estimate the degradation model and train an image-specific model, which, however, is quite time-consuming and impracticable to handle rapidly changing domain shifts. Moreover, these methods largely concentrate on the estimation of one degradation type (e.g., blur degradation), overlooking other degradation types like noise and JPEG in real-world test-time scenarios, thus limiting their practicality. To tackle these problems, we present an efficient test-time adaptation framework for SR, named SRTTA, which is able to quickly adapt SR models to test domains with different/unknown degradation types. Specifically, we design a second-order degradation scheme to construct paired data based on the degradation type of the test image, which is predicted by a pre-trained degradation classifier. Then, we adapt the SR model by implementing feature-level reconstruction learning from the initial test image to its second-order degraded counterparts, which helps the SR model generate plausible HR images. Extensive experiments are conducted on newly synthesized corrupted DIV2K datasets with 8 different degradations and several real-world datasets, demonstrating that our SRTTA framework achieves an impressive improvement over existing methods with satisfying speed. The source code is available at https://github.com/DengZeshuai/SRTTA.
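A hedged sketch of the adaptation loop: the test image's degradation type would be predicted by the pre-trained degradation classifier, the same type of degradation is re-applied to form a second-order pair, and the model is updated with a feature-level reconstruction loss. The tiny backbone and the fixed noise degradation below are stand-ins, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySR(nn.Module):
    """Stand-in SR backbone; `features` is the adaptation target."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, 3, 3, padding=1)
    def forward(self, x):
        return self.head(self.features(x))

def degrade_noise(x):
    # One assumed degradation type; in SRTTA the type is predicted per image
    return (x + 0.05 * torch.randn_like(x)).clamp(0, 1)

def adapt(sr, x_test, degrade, steps=5, lr=1e-4):
    opt = torch.optim.Adam(sr.parameters(), lr=lr)
    for _ in range(steps):
        x2 = degrade(x_test)  # second-order degraded counterpart
        loss = F.l1_loss(sr.features(x2), sr.features(x_test).detach())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return sr

sr = adapt(TinySR(), torch.rand(1, 3, 64, 64), degrade_noise)
```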
results: Greatly reduced training time, results comparable to conventional methods, and strong capability for processing long videos.

Abstract
The introduction of neural radiance fields has greatly improved the effectiveness of view synthesis for monocular videos. However, existing algorithms face difficulties when dealing with uncontrolled or lengthy scenarios, and require extensive training time specific to each new scenario. To tackle these limitations, we propose DynPoint, an algorithm designed to facilitate the rapid synthesis of novel views for unconstrained monocular videos. Rather than encoding the entirety of the scenario information into a latent representation, DynPoint concentrates on predicting the explicit 3D correspondence between neighboring frames to realize information aggregation. Specifically, this correspondence prediction is achieved through the estimation of consistent depth and scene flow information across frames. Subsequently, the acquired correspondence is utilized to aggregate information from multiple reference frames to a target frame, by constructing hierarchical neural point clouds. The resulting framework enables swift and accurate view synthesis for desired views of target frames. The experimental results demonstrate that our proposed method accelerates training considerably, typically by an order of magnitude, while yielding outcomes comparable to prior approaches. Furthermore, our method exhibits strong robustness in handling long-duration videos without learning a canonical representation of video content.
Controllable Group Choreography using Contrastive Diffusion
paper_authors: Nhat Le, Tuong Do, Khoa Do, Hien Nguyen, Erman Tjiputra, Quang D. Tran, Anh Nguyen
for: Generating high-quality, customizable group dance animation.
methods: A diffusion-based generative approach that synthesizes a flexible number of dancers and long-term group dances while ensuring coherence with the input music.
results: Produces visually appealing, coherent group dance animations with a controllable level of consistency or diversity.

Abstract
Music-driven group choreography poses a considerable challenge but holds significant potential for a wide range of industrial applications. The ability to generate synchronized and visually appealing group dance motions that are aligned with music opens up opportunities in many fields such as entertainment, advertising, and virtual performances. However, most of the recent works are not able to generate high-fidelity long-term motions, or fail to enable controllable experience. In this work, we aim to address the demand for high-quality and customizable group dance generation by effectively governing the consistency and diversity of group choreographies. In particular, we utilize a diffusion-based generative approach to enable the synthesis of flexible number of dancers and long-term group dances, while ensuring coherence to the input music. Ultimately, we introduce a Group Contrastive Diffusion (GCD) strategy to enhance the connection between dancers and their group, presenting the ability to control the consistency or diversity level of the synthesized group animation via the classifier-guidance sampling technique. Through intensive experiments and evaluation, we demonstrate the effectiveness of our approach in producing visually captivating and consistent group dance motions. The experimental results show the capability of our method to achieve the desired levels of consistency and diversity, while maintaining the overall quality of the generated group choreography.
Blacksmith: Fast Adversarial Training of Vision Transformers via a Mixture of Single-step and Multi-step Methods
results: Compared with other methods, including N-FGSM, it prevents catastrophic overfitting more effectively and achieves PGD-2-level performance.

Abstract
Despite the remarkable success achieved by deep learning algorithms in various domains, such as computer vision, they remain vulnerable to adversarial perturbations. Adversarial Training (AT) stands out as one of the most effective solutions to address this issue; however, single-step AT can lead to Catastrophic Overfitting (CO). This scenario occurs when the adversarially trained network suddenly loses robustness against multi-step attacks like Projected Gradient Descent (PGD). Although several approaches have been proposed to address this problem in Convolutional Neural Networks (CNNs), we found out that they do not perform well when applied to Vision Transformers (ViTs). In this paper, we propose Blacksmith, a novel training strategy to overcome the CO problem, specifically in ViTs. Our approach utilizes either of PGD-2 or Fast Gradient Sign Method (FGSM) randomly in a mini-batch during the adversarial training of the neural network. This will increase the diversity of our training attacks, which could potentially mitigate the CO issue. To manage the increased training time resulting from this combination, we craft the PGD-2 attack based on only the first half of the layers, while FGSM is applied end-to-end. Through our experiments, we demonstrate that our novel method effectively prevents CO, achieves PGD-2 level performance, and outperforms other existing techniques including N-FGSM, which is the state-of-the-art method in fast training for CNNs.
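The core of the strategy is a per-mini-batch coin flip between a single-step and a two-step attack. A simplified PyTorch sketch follows; it runs PGD-2 through the full network rather than only the first half of the layers, and the epsilon/alpha values are common defaults, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    x_adv = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

def pgd2(model, x, y, eps, alpha):
    x_adv = x.clone()
    for _ in range(2):  # two projected steps
        x_adv = x_adv.detach().requires_grad_(True)
        grad, = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def blacksmith_step(model, opt, x, y, eps=8/255, alpha=4/255, p_pgd=0.5):
    # Randomly mix single-step (FGSM) and multi-step (PGD-2) crafting
    if torch.rand(()).item() < p_pgd:
        x_adv = pgd2(model, x, y, eps, alpha)
    else:
        x_adv = fgsm(model, x, y, eps)
    loss = F.cross_entropy(model(x_adv), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```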
AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection
results: On 17 real-world anomaly detection datasets, the method achieves superior zero-shot performance, detecting and segmenting anomalies across diverse object types.

Abstract
Zero-shot anomaly detection (ZSAD) requires detection models trained using auxiliary data to detect anomalies without any training sample in a target dataset. It is a crucial task when training data is not accessible due to various concerns, e.g., data privacy, yet it is challenging since the models need to generalize to anomalies across different domains where the appearance of foreground objects, abnormal regions, and background features, such as defects/tumors on different products/organs, can vary significantly. Recently large pre-trained vision-language models (VLMs), such as CLIP, have demonstrated strong zero-shot recognition ability in various vision tasks, including anomaly detection. However, their ZSAD performance is weak since the VLMs focus more on modeling the class semantics of the foreground objects rather than the abnormality/normality in the images. In this paper we introduce a novel approach, namely AnomalyCLIP, to adapt CLIP for accurate ZSAD across different domains. The key insight of AnomalyCLIP is to learn object-agnostic text prompts that capture generic normality and abnormality in an image regardless of its foreground objects. This allows our model to focus on the abnormal image regions rather than the object semantics, enabling generalized normality and abnormality recognition on diverse types of objects. Large-scale experiments on 17 real-world anomaly detection datasets show that AnomalyCLIP achieves superior zero-shot performance of detecting and segmenting anomalies in datasets of highly diverse class semantics from various defect inspection and medical imaging domains. Code will be made available at https://github.com/zqhang/AnomalyCLIP.
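The object-agnostic idea can be reduced to two learned text embeddings, one for generic normality and one for abnormality, scored against every visual patch. The sketch below assumes 512-dimensional CLIP-like joint embeddings and random stand-in features; the actual model learns prompt tokens through a frozen CLIP text encoder.

```python
import torch
import torch.nn.functional as F

# Object-agnostic prompts: two learnable embeddings for generic
# "normal" / "abnormal", shared across all object classes (assumption:
# a 512-d CLIP-like joint space; the real model learns prompt *tokens*).
d = 512
prompts = torch.nn.Parameter(torch.randn(2, d))  # [normal, abnormal]

def anomaly_map(patch_feats):
    """patch_feats: (H*W, d) visual patch embeddings from a frozen encoder."""
    sims = F.normalize(patch_feats, dim=-1) @ F.normalize(prompts, dim=-1).T
    probs = sims.softmax(dim=-1)  # per-patch normal-vs-abnormal
    return probs[:, 1]            # abnormality score per patch

scores = anomaly_map(torch.randn(14 * 14, d))  # dummy 14x14 patch grid
```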
TIC-TAC: A Framework To Learn And Evaluate Your Covariance
results: We address both questions. First, we propose TIC (Taylor Induced Covariance), which captures the randomness of the multivariate $f_{\theta}(x)$ through the second-order Taylor polynomial around $x$. Second, we introduce TAC (Task Agnostic Correlations), a metric based on conditioning of the normal distribution for evaluating the covariance. Our experiments show that TIC learns the covariance better, as quantified through TAC.

Abstract
We study the problem of unsupervised heteroscedastic covariance estimation, where the goal is to learn the multivariate target distribution $\mathcal{N}(y, \Sigma_y | x )$ given an observation $x$. This problem is particularly challenging as $\Sigma_{y}$ varies for different samples (heteroscedastic) and no annotation for the covariance is available (unsupervised). Typically, state-of-the-art methods predict the mean $f_{\theta}(x)$ and covariance $\textrm{Cov}(f_{\theta}(x))$ of the target distribution through two neural networks trained using the negative log-likelihood. This raises two questions: (1) Does the predicted covariance truly capture the randomness of the predicted mean? (2) In the absence of ground-truth annotation, how can we quantify the performance of covariance estimation? We address (1) by deriving TIC: Taylor Induced Covariance, which captures the randomness of the multivariate $f_{\theta}(x)$ by incorporating its gradient and curvature around $x$ through the second order Taylor polynomial. Furthermore, we tackle (2) by introducing TAC: Task Agnostic Correlations, a metric which leverages conditioning of the normal distribution to evaluate the covariance. We verify the effectiveness of TIC through multiple experiments spanning synthetic (univariate, multivariate) and real-world datasets (UCI Regression, LSP, and MPII Human Pose Estimation). Our experiments show that TIC outperforms state-of-the-art in accurately learning the covariance, as quantified through TAC.
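The abstract describes TIC as propagating the gradient and curvature of $f_{\theta}$ around $x$ through a second-order Taylor polynomial. The sketch below keeps only the first-order (Jacobian) term and assumes an isotropic input covariance, so it illustrates the idea rather than reproducing the paper's exact estimator.

```python
import torch
from torch.autograd.functional import jacobian

net = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 3))

def taylor_covariance(x, input_var=0.1):
    """First-order piece of a Taylor-induced covariance: propagate an assumed
    isotropic input covariance through the Jacobian, Sigma_y ~ J Sigma_x J^T.
    (The paper's TIC also uses second-order curvature terms; omitted here.)"""
    J = jacobian(lambda v: net(v), x)  # shape (3, 4)
    return input_var * J @ J.T

sigma_y = taylor_covariance(torch.randn(4))
```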
results: StyleGAN image generation can be controlled with a single user sketch, significantly surpassing prior methods in the one-shot regime and also performing well on hand-drawn sketches of diverse styles and poses.

Abstract
Generating images from human sketches typically requires dedicated networks trained from scratch. In contrast, the emergence of the pre-trained Vision-Language models (e.g., CLIP) has propelled generative applications based on controlling the output imagery of existing StyleGAN models with text inputs or reference images. In parallel, our work proposes a framework to control StyleGAN imagery with a single user sketch. In particular, we learn a conditional distribution in the latent space of a pre-trained StyleGAN model via energy-based learning and propose two novel energy functions leveraging CLIP for cross-domain semantic supervision. Once trained, our model can generate multi-modal images semantically aligned with the input sketch. Quantitative evaluations on synthesized datasets have shown that our approach improves significantly over previous methods in the one-shot regime. The superiority of our method is further underscored when experimenting with a wide range of human sketches of diverse styles and poses. Surprisingly, our models outperform the previous baseline regarding both the range of sketch inputs and image qualities despite operating with a stricter setting: with no extra training data and single sketch input.
Video Frame Interpolation with Many-to-many Splatting and Spatial Selective Refinement
results: The method interpolates an arbitrary number of intermediate frames with only a minuscule computational overhead per input frame pair, achieving fast multi-frame interpolation. However, directly splatting and fusing pixels in the intensity domain is sensitive to motion estimation quality and may suffer from limited representational capacity. To improve interpolation accuracy, an adjustable SSR component is further proposed, allowing a trade-off between computational efficiency and interpolation quality.

Abstract
In this work, we first propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently. Given a frame pair, we estimate multiple bidirectional flows to directly forward warp the pixels to the desired time step before fusing overlapping pixels. In doing so, each source pixel renders multiple target pixels and each target pixel can be synthesized from a larger area of visual context, establishing a many-to-many splatting scheme with robustness to undesirable artifacts. For each input frame pair, M2M has a minuscule computational overhead when interpolating an arbitrary number of in-between frames, hence achieving fast multi-frame interpolation. However, directly warping and fusing pixels in the intensity domain is sensitive to the quality of motion estimation and may suffer from less effective representation capacity. To improve interpolation accuracy, we further extend an M2M++ framework by introducing a flexible Spatial Selective Refinement (SSR) component, which allows for trading computational efficiency for interpolation quality and vice versa. Instead of refining the entire interpolated frame, SSR only processes difficult regions selected under the guidance of an estimated error map, thereby avoiding redundant computation. Evaluation on multiple benchmark datasets shows that our method is able to improve the efficiency while maintaining competitive video interpolation quality, and it can be adjusted to use more or less compute as needed.
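Forward splatting is the mechanism underneath: each source pixel is pushed along its scaled flow to time $t$ and overlapping contributions are normalized. The sketch below is a nearest-neighbor, single-flow special case; M2M estimates multiple bidirectional flows and splats with sub-pixel weights.

```python
import torch

def forward_splat(frame, flow, t=0.5):
    """Many-to-one forward warp sketch: move each source pixel along t * flow,
    accumulate intensities at the nearest target pixel, normalize overlaps."""
    _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    tx = (xs + t * flow[0]).round().long().clamp(0, w - 1)
    ty = (ys + t * flow[1]).round().long().clamp(0, h - 1)
    idx = (ty * w + tx).flatten()
    out = torch.zeros_like(frame).view(frame.shape[0], -1)
    cnt = torch.zeros(h * w)
    cnt.index_add_(0, idx, torch.ones_like(idx, dtype=torch.float))
    for c in range(frame.shape[0]):
        out[c].index_add_(0, idx, frame[c].flatten())
    return (out / cnt.clamp(min=1)).view_as(frame)  # normalize overlapping pixels

warped = forward_splat(torch.rand(3, 64, 64), torch.randn(2, 64, 64))
```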
results: The results show that non-robust features transfer poorly across learning paradigms, whereas robust features transfer well. Moreover, naturally trained encoders are found to be non-robust under AutoAttack. The conclusion is that non-robust features are not genuinely useful but are rather paradigm-wise shortcuts.

Abstract
The existence of adversarial examples has been a mystery for years and has attracted much interest. A well-known theory by \citet{ilyas2019adversarial} explains adversarial vulnerability from a data perspective by showing that one can extract non-robust features from adversarial examples and these features alone are useful for classification. However, the explanation remains quite counter-intuitive since non-robust features are mostly noise features to humans. In this paper, we re-examine the theory from a larger context by incorporating multiple learning paradigms. Notably, we find that contrary to their good usefulness under supervised learning, non-robust features attain poor usefulness when transferred to other self-supervised learning paradigms, such as contrastive learning, masked image modeling, and diffusion models. It reveals that non-robust features are not really as useful as robust or natural features that enjoy good transferability between these paradigms. Meanwhile, for robustness, we also show that naturally trained encoders from robust features are largely non-robust under AutoAttack. Our cross-paradigm examination suggests that the non-robust features are not really useful but more like paradigm-wise shortcuts, and robust features alone might be insufficient to attain reliable model robustness. Code is available at https://github.com/PKU-ML/AdvNotRealFeatures.
paper_authors: Rishi D. Jha, Jonathan Hayase, Sewoong Oh
for: The paper investigates the possibility of launching a successful backdoor attack by only corrupting the training labels, rather than the images themselves.
methods: The paper introduces a novel approach called FLIP, which uses trajectory matching to design label-only backdoor attacks.
results: The paper demonstrates the effectiveness of FLIP on three datasets (CIFAR-10, CIFAR-100, and Tiny-ImageNet) and four architectures (ResNet-32, ResNet-18, VGG-19, and Vision Transformer), achieving a near-perfect attack success rate of 99.4% with only a 1.8% drop in clean test accuracy.

Abstract
In a backdoor attack, an adversary injects corrupted data into a model's training dataset in order to gain control over its predictions on images with a specific attacker-defined trigger. A typical corrupted training example requires altering both the image, by applying the trigger, and the label. Models trained on clean images, therefore, were considered safe from backdoor attacks. However, in some common machine learning scenarios, the training labels are provided by potentially malicious third-parties. This includes crowd-sourced annotation and knowledge distillation. We, hence, investigate a fundamental question: can we launch a successful backdoor attack by only corrupting labels? We introduce a novel approach to design label-only backdoor attacks, which we call FLIP, and demonstrate its strengths on three datasets (CIFAR-10, CIFAR-100, and Tiny-ImageNet) and four architectures (ResNet-32, ResNet-18, VGG-19, and Vision Transformer). With only 2% of CIFAR-10 labels corrupted, FLIP achieves a near-perfect attack success rate of 99.4% while suffering only a 1.8% drop in the clean test accuracy. Our approach builds upon the recent advances in trajectory matching, originally introduced for dataset distillation.
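The threat model is easy to state in code: only a small fraction of labels change while every image stays untouched. The sketch below flips a 2% budget of labels toward an attacker's target class uniformly at random; FLIP itself chooses which labels to flip via trajectory matching, which is not reproduced here.

```python
import torch

def corrupt_labels(labels, target_class, budget=0.02, seed=0):
    """Label-only corruption: flip a small budget of training labels to the
    attacker's target class, leaving every image untouched. (FLIP selects
    *which* labels to flip via trajectory matching; here it is random.)"""
    g = torch.Generator().manual_seed(seed)
    n = labels.numel()
    idx = torch.randperm(n, generator=g)[: int(budget * n)]
    poisoned = labels.clone()
    poisoned[idx] = target_class
    return poisoned, idx

labels = torch.randint(0, 10, (50_000,))  # CIFAR-10-sized label vector
poisoned, flipped_idx = corrupt_labels(labels, target_class=0)
```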
A transfer learning approach with convolutional neural network for Face Mask Detection
paper_authors: Abolfazl Younesi, Reza Afrouzian, Yousef Seyfari
for: This study proposes a face mask recognition system based on transfer learning and the Inception v3 architecture to detect mask usage in crowded places.
methods: Two datasets, the Simulated Mask Face Dataset (SMFD) and MaskedFace-Net (MFN), are used simultaneously for training; accuracy and efficiency are improved by optimally setting hyper-parameters and carefully designing the fully connected layers.
results: Experimental results show high accuracy and efficiency, achieving 99.47% and 99.33% on training and test data, respectively.

Abstract
Due to the coronavirus (Covid-19) epidemic and its rapid spread around the world, the world has faced an enormous crisis. To prevent the spread of the coronavirus, the World Health Organization (WHO) has introduced the use of masks and keeping social distance as the best preventive methods. Developing an automatic monitoring system for detecting facemasks in crowded places is therefore essential. To do this, we propose a mask recognition system based on transfer learning and the Inception v3 architecture. In the proposed method, two datasets are used simultaneously for training: the Simulated Mask Face Dataset (SMFD) and MaskedFace-Net (MFN). This paper tries to increase the accuracy of the proposed system by optimally setting hyper-parameters and accurately designing the fully connected layers. The main advantage of the proposed method is that, in addition to masked and unmasked faces, it can also detect cases of incorrect mask use. Therefore, the proposed method classifies the input face images into three categories. Experimental results show the high accuracy and efficiency of the proposed method, which achieved an accuracy of 99.47% and 99.33% on training and test data, respectively.
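A transfer-learning skeleton in this spirit: a pretrained Inception v3 backbone is frozen and a new three-way head (correct mask / no mask / incorrect mask) is trained. The head sizes, dropout rate, and learning rate below are placeholder choices, not the paper's tuned hyper-parameters, and a recent torchvision is assumed.

```python
import torch
import torch.nn as nn
from torchvision import models

# Frozen Inception v3 backbone with a new trainable 3-way classifier head
model = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 256),
    nn.ReLU(),
    nn.Dropout(0.4),
    nn.Linear(256, 3),  # mask / no-mask / incorrect-mask
)
model.aux_logits = False  # simplify: skip the auxiliary classifier output
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
```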
Improving Multi-Person Pose Tracking with A Confidence Network
results: Experimental results show that our method is universal for human detection and pose estimation, achieving state-of-the-art accuracy on the PoseTrack 2017 and 2018 datasets.

Abstract
Human pose estimation and tracking are fundamental tasks for understanding human behaviors in videos. Existing top-down framework-based methods usually perform three-stage tasks: human detection, pose estimation and tracking. Although promising results have been achieved, these methods rely heavily on high-performance detectors and may fail to track persons who are occluded or miss-detected. To overcome these problems, in this paper, we develop a novel keypoint confidence network and a tracking pipeline to improve human detection and pose estimation in top-down approaches. Specifically, the keypoint confidence network is designed to determine whether each keypoint is occluded, and it is incorporated into the pose estimation module. In the tracking pipeline, we propose the Bbox-revision module to reduce missing detection and the ID-retrieve module to correct lost trajectories, improving the performance of the detection stage. Experimental results show that our approach is universal in human detection and pose estimation, achieving state-of-the-art performance on both PoseTrack 2017 and 2018 datasets.
TiV-NeRF: Tracking and Mapping via Time-Varying Representation with Dynamic Neural Radiance Fields
for: This paper aims to integrate Neural Radiance Fields (NeRF) into a Simultaneous Localization and Mapping (SLAM) framework to handle dynamic scenes.
methods: The paper proposes a time-varying representation, self-supervised training, distinct sampling strategies for dynamic and static regions, and a keyframe selection strategy.
results: More effective compared to current state-of-the-art dynamic mapping methods.

Abstract
Previous attempts to integrate Neural Radiance Fields (NeRF) into a Simultaneous Localization and Mapping (SLAM) framework either rely on the assumption of static scenes or treat dynamic objects as outliers. However, most real-world scenarios are dynamic. In this paper, we propose a time-varying representation to track and reconstruct dynamic scenes. Our system simultaneously maintains two processes, a tracking process and a mapping process. For the tracking process, the entire input images are uniformly sampled and training of the RGB images is self-supervised. For the mapping process, we leverage known masks to differentiate dynamic objects from static backgrounds, and we apply distinct sampling strategies to the two types of areas. Parameter optimization for both processes consists of two stages: the first associates time with 3D positions to convert the deformation field to the canonical field, and the second associates time with 3D positions in the canonical field to obtain colors and the Signed Distance Function (SDF). Besides, we propose a novel keyframe selection strategy based on the overlapping rate. We evaluate our approach on two publicly available synthetic datasets and validate that our method is more effective compared to current state-of-the-art dynamic mapping methods.
Identifiable Contrastive Learning with Automatic Feature Importance Discovery
methods: triCL uses a 3-factor contrast of the form $z_x^\top S z_{x'}$, where $S$ is a learnable diagonal matrix that automatically captures the importance of each feature.
results: We show that triCL not only obtains identifiable, interpretable features but also achieves higher performance through contrastive learning. We also find that high-importance features exhibit good interpretability by capturing common class-wise features.

Abstract
Existing contrastive learning methods rely on pairwise sample contrast $z_x^\top z_{x'}$ to learn data representations, but the learned features often lack clear interpretability from a human perspective. Theoretically, it lacks feature identifiability and different initialization may lead to totally different features. In this paper, we study a new method named tri-factor contrastive learning (triCL) that involves a 3-factor contrast in the form of $z_x^\top S z_{x'}$, where $S=\text{diag}(s_1,\dots,s_k)$ is a learnable diagonal matrix that automatically captures the importance of each feature. We show that by this simple extension, triCL can not only obtain identifiable features that eliminate randomness but also obtain more interpretable features that are ordered according to the importance matrix $S$. We show that features with high importance have nice interpretability by capturing common classwise features, and obtain superior performance when evaluated for image retrieval using a few features. The proposed triCL objective is general and can be applied to different contrastive learning methods like SimCLR and CLIP. We believe that it is a better alternative to existing 2-factor contrastive learning by improving its identifiability and interpretability with minimal overhead. Code is available at https://github.com/PKU-ML/Tri-factor-Contrastive-Learning.
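The tri-factor similarity is easy to state in code: an InfoNCE-style loss where the usual inner product $z_x^\top z_{x'}$ is replaced by $z_x^\top S z_{x'}$ with a learnable diagonal $S$. Parameterizing $S$ through an exponential to keep it positive is an assumption of this sketch, not necessarily the paper's choice.

```python
import torch
import torch.nn.functional as F

class TriFactorContrast(torch.nn.Module):
    """Tri-factor similarity z_x^T S z_x' with a learnable diagonal S that
    scores feature importance (sketch; the paper adds identifiability terms)."""
    def __init__(self, dim, temperature=0.5):
        super().__init__()
        self.log_s = torch.nn.Parameter(torch.zeros(dim))  # S = diag(exp(log_s)) > 0
        self.t = temperature

    def forward(self, z1, z2):
        s = self.log_s.exp()
        z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
        logits = (z1 * s) @ z2.T / self.t       # pairwise z_i^T S z_j
        labels = torch.arange(z1.size(0))       # positives on the diagonal
        return F.cross_entropy(logits, labels)  # InfoNCE over tri-factor sims

loss = TriFactorContrast(128)(torch.randn(32, 128), torch.randn(32, 128))
```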
Multi-task deep learning for large-scale building detail extraction from high-resolution satellite imagery
results: A novel spatial sampling scheme effectively selects limited but representative high-resolution satellite image samples, improving the efficiency of extracting building details. With advanced augmentation techniques, MT-BR further improves predictive performance and generalization. Experiments show that MT-BR achieves higher predictive accuracy across various metrics than existing methods, and a real-world application produces a unified dataset encompassing both spatial and attributional building details.

Abstract
Understanding urban dynamics and promoting sustainable development requires comprehensive insights about buildings. While geospatial artificial intelligence has advanced the extraction of such details from Earth observational data, existing methods often suffer from computational inefficiencies and inconsistencies when compiling unified building-related datasets for practical applications. To bridge this gap, we introduce the Multi-task Building Refiner (MT-BR), an adaptable neural network tailored for simultaneous extraction of spatial and attributional building details from high-resolution satellite imagery, exemplified by building rooftops, urban functional types, and roof architectural types. Notably, MT-BR can be fine-tuned to incorporate additional building details, extending its applicability. For large-scale applications, we devise a novel spatial sampling scheme that strategically selects limited but representative image samples. This process optimizes both the spatial distribution of samples and the urban environmental characteristics they contain, thus enhancing extraction effectiveness while curtailing data preparation expenditures. We further enhance MT-BR's predictive performance and generalization capabilities through the integration of advanced augmentation techniques. Our quantitative results highlight the efficacy of the proposed methods. Specifically, networks trained with datasets curated via our sampling method demonstrate improved predictive accuracy relative to those using alternative sampling approaches, with no alterations to network architecture. Moreover, MT-BR consistently outperforms other state-of-the-art methods in extracting building details across various metrics. The real-world practicality is also demonstrated in an application across Shanghai, generating a unified dataset that encompasses both the spatial and attributional details of buildings.
Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity
results: The study finds that enforcing a sparse coding constraint in convolutional neural networks causes structural encoding to emerge in neurons, endowing the networks with a stronger shape bias. This shape bias makes the networks more robust and adaptable across different datasets. Code is available at https://github.com/Crazy-Jack/nips2023_shape_vs_texture.

Abstract
Current deep-learning models for object recognition are known to be heavily biased toward texture. In contrast, human visual systems are known to be biased toward shape and structure. What could be the design principles in human visual systems that led to this difference? How could we introduce more shape bias into the deep learning models? In this paper, we report that sparse coding, a ubiquitous principle in the brain, can in itself introduce shape bias into the network. We found that enforcing the sparse coding constraint using a non-differential Top-K operation can lead to the emergence of structural encoding in neurons in convolutional neural networks, resulting in a smooth decomposition of objects into parts and subparts and endowing the networks with shape bias. We demonstrated this emergence of shape bias and its functional benefits for different network structures with various datasets. For object recognition convolutional neural networks, the shape bias leads to greater robustness against style and pattern change distraction. For the image synthesis generative adversary networks, the emerged shape bias leads to more coherent and decomposable structures in the synthesized images. Ablation studies suggest that sparse codes tend to encode structures, whereas the more distributed codes tend to favor texture. Our code is hosted at the GitHub repository: https://github.com/Crazy-Jack/nips2023_shape_vs_texture
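The sparse-coding constraint can be imposed with a non-differential Top-K operation on activations. A minimal module is below; keeping the K largest responses within each channel map is one plausible placement, and K itself is a design choice, not a value from the paper.

```python
import torch

class TopKSparsify(torch.nn.Module):
    """Non-differential Top-K activation: keep the K largest responses per
    channel map and zero the rest (a simple version of the sparse-coding
    constraint; K and its placement in the network are design choices)."""
    def __init__(self, k):
        super().__init__()
        self.k = k

    def forward(self, x):  # x: (N, C, H, W)
        n, c, h, w = x.shape
        flat = x.view(n, c, h * w)
        thresh = flat.topk(self.k, dim=-1).values[..., -1:]  # K-th largest value
        return (flat * (flat >= thresh)).view(n, c, h, w)

y = TopKSparsify(k=16)(torch.randn(2, 8, 14, 14))
```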
Dynamo-Depth: Fixing Unsupervised Depth Estimation for Dynamical Scenes
results: The paper achieves state-of-the-art monocular depth estimation performance on the Waymo Open and nuScenes datasets, with significant improvement in the depth of moving objects.

Abstract
Unsupervised monocular depth estimation techniques have demonstrated encouraging results but typically assume that the scene is static. These techniques suffer when trained on dynamical scenes, where apparent object motion can equally be explained by hypothesizing the object's independent motion, or by altering its depth. This ambiguity causes depth estimators to predict erroneous depth for moving objects. To resolve this issue, we introduce Dynamo-Depth, an unifying approach that disambiguates dynamical motion by jointly learning monocular depth, 3D independent flow field, and motion segmentation from unlabeled monocular videos. Specifically, we offer our key insight that a good initial estimation of motion segmentation is sufficient for jointly learning depth and independent motion despite the fundamental underlying ambiguity. Our proposed method achieves state-of-the-art performance on monocular depth estimation on Waymo Open and nuScenes Dataset with significant improvement in the depth of moving objects. Code and additional results are available at https://dynamo-depth.github.io.
results: Compared with existing music generation models, JEN-1 Composer achieves higher music quality and controllability, and supports advanced composition built on user-provided musical styles and elements.

Abstract
With rapid advances in generative artificial intelligence, the text-to-music synthesis task has emerged as a promising direction for music generation from scratch. However, finer-grained control over multi-track generation remains an open challenge. Existing models exhibit strong raw generation capability but lack the flexibility to compose separate tracks and combine them in a controllable manner, differing from typical workflows of human composers. To address this issue, we propose JEN-1 Composer, a unified framework to efficiently model marginal, conditional, and joint distributions over multi-track music via a single model. JEN-1 Composer framework exhibits the capacity to seamlessly incorporate any diffusion-based music generation system, \textit{e.g.} Jen-1, enhancing its capacity for versatile multi-track music generation. We introduce a curriculum training strategy aimed at incrementally instructing the model in the transition from single-track generation to the flexible generation of multi-track combinations. During the inference, users have the ability to iteratively produce and choose music tracks that meet their preferences, subsequently creating an entire musical composition incrementally following the proposed Human-AI co-composition workflow. Quantitative and qualitative assessments demonstrate state-of-the-art performance in controllable and high-fidelity multi-track music synthesis. The proposed JEN-1 Composer represents a significant advance toward interactive AI-facilitated music creation and composition. Demos will be available at https://jenmusic.ai/audio-demos.
Predicting recovery following stroke: deep learning, multimodal data and feature selection using explainable AI
paper_authors: Adam White, Margarita Saranti, Artur d’Avila Garcez, Thomas M. H. Hope, Cathy J. Price, Howard Bowman
for: This paper aims to use machine learning to automatically predict post-stroke symptoms and their response to rehabilitation.
methods: The paper uses two strategies: first, 2D images that summarize MRI scans; second, selecting key features that improve classification accuracy. It also introduces a novel approach that fuses MRI images with tabular data.
results: The results show that combining MRI images and tabular data achieves highly accurate post-stroke classification, with the best accuracy of 0.854 across the CNN architectures and data representations tested.

Abstract
Machine learning offers great potential for automated prediction of post-stroke symptoms and their response to rehabilitation. Major challenges for this endeavour include the very high dimensionality of neuroimaging data, the relatively small size of the datasets available for learning, and how to effectively combine neuroimaging and tabular data (e.g. demographic information and clinical characteristics). This paper evaluates several solutions based on two strategies. The first is to use 2D images that summarise MRI scans. The second is to select key features that improve classification accuracy. Additionally, we introduce the novel approach of training a convolutional neural network (CNN) on images that combine regions-of-interest extracted from MRIs, with symbolic representations of tabular data. We evaluate a series of CNN architectures (both 2D and a 3D) that are trained on different representations of MRI and tabular data, to predict whether a composite measure of post-stroke spoken picture description ability is in the aphasic or non-aphasic range. MRI and tabular data were acquired from 758 English speaking stroke survivors who participated in the PLORAS study. The classification accuracy for a baseline logistic regression was 0.678 for lesion size alone, rising to 0.757 and 0.813 when initial symptom severity and recovery time were successively added. The highest classification accuracy 0.854 was observed when 8 regions-of-interest was extracted from each MRI scan and combined with lesion size, initial severity and recovery time in a 2D Residual Neural Network. Our findings demonstrate how imaging and tabular data can be combined for high post-stroke classification accuracy, even when the dataset is small in machine learning terms. We conclude by proposing how the current models could be improved to achieve even higher levels of accuracy using images from hospital scanners.
Rare Event Probability Learning by Normalizing Flows
paper_authors: Zhenggqi Gao, Dinghuai Zhang, Luca Daniel, Duane S. Boning
for: NOFIS is a method for estimating rare event probabilities, providing accurate estimates across diverse domains.
methods: NOFIS leverages the properties of normalizing flows, learning a sequence of proposal distributions to achieve efficient estimation.
results: NOFIS performs excellently across multiple test cases, outperforming baseline methods and providing high-quality estimates.

Abstract
A rare event is defined by a low probability of occurrence. Accurate estimation of such small probabilities is of utmost importance across diverse domains. Conventional Monte Carlo methods are inefficient, demanding an exorbitant number of samples to achieve reliable estimates. Inspired by the exact sampling capabilities of normalizing flows, we revisit this challenge and propose normalizing flow assisted importance sampling, termed NOFIS. NOFIS first learns a sequence of proposal distributions associated with predefined nested subset events by minimizing KL divergence losses. Next, it estimates the rare event probability by utilizing importance sampling in conjunction with the last proposal. The efficacy of our NOFIS method is substantiated through comprehensive qualitative visualizations, affirming the optimality of the learned proposal distribution, as well as a series of quantitative experiments encompassing $10$ distinct test cases, which highlight NOFIS's superiority over baseline approaches.
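The final estimation step is plain importance sampling with the learned proposal. In the sketch below, a shifted Gaussian stands in for the flow-based proposal that NOFIS would learn through its nested subset events; the event region and sample count are arbitrary.

```python
import torch

# Importance-sampling estimate of a rare-event probability P(x in A) under
# p = N(0, I), using a proposal q concentrated near the event region.
d, n = 2, 200_000
p = torch.distributions.MultivariateNormal(torch.zeros(d), torch.eye(d))
q = torch.distributions.MultivariateNormal(torch.tensor([4.0, 4.0]), torch.eye(d))

x = q.sample((n,))
in_event = (x.min(dim=1).values > 3.5).float()   # rare event: both coords > 3.5
weights = (p.log_prob(x) - q.log_prob(x)).exp()  # likelihood ratios p/q
estimate = (in_event * weights).mean()           # unbiased IS estimator
print(f"rare-event probability ~ {estimate:.3e}")
```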
Automaton Distillation: Neuro-Symbolic Transfer Learning for Deep Reinforcement Learning
results: Our experimental results show that both static and dynamic transfer reduce the time required to find an optimal policy, and perform well across different decision tasks.

Abstract
Reinforcement learning (RL) is a powerful tool for finding optimal policies in sequential decision processes. However, deep RL methods suffer from two weaknesses: collecting the amount of agent experience required for practical RL problems is prohibitively expensive, and the learned policies exhibit poor generalization on tasks outside of the training distribution. To mitigate these issues, we introduce automaton distillation, a form of neuro-symbolic transfer learning in which Q-value estimates from a teacher are distilled into a low-dimensional representation in the form of an automaton. We then propose two methods for generating Q-value estimates: static transfer, which reasons over an abstract Markov Decision Process constructed based on prior knowledge, and dynamic transfer, where symbolic information is extracted from a teacher Deep Q-Network (DQN). The resulting Q-value estimates from either method are used to bootstrap learning in the target environment via a modified DQN loss function. We list several failure modes of existing automaton-based transfer methods and demonstrate that both static and dynamic automaton distillation decrease the time required to find optimal policies for various decision tasks.
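The distillation step can be pictured as a DQN loss whose bootstrap target blends in the teacher's Q-value for the automaton (abstract) state. The blending schedule and the abstract-state lookup below are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def distill_dqn_loss(student_q, teacher_q_table, batch, gamma=0.99, beta=0.5):
    """Modified DQN loss sketch: the TD target is mixed with a teacher
    Q-value indexed by the automaton state of each transition.
    `teacher_q_table[(abstract_state, action)]` is an assumed lookup."""
    s, a, r, s_next, abstract = batch
    with torch.no_grad():
        td_target = r + gamma * student_q(s_next).max(dim=1).values
        teacher = torch.tensor([teacher_q_table[(z, int(ai))]
                                for z, ai in zip(abstract, a)])
        target = beta * teacher + (1 - beta) * td_target  # blend in the prior
    q_sa = student_q(s).gather(1, a.view(-1, 1)).squeeze(1)
    return F.mse_loss(q_sa, target)
```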
paper_authors: Elnaserledinellah Mahmood Abdelwahab
for: The paper challenges the assumptions of Modern Logics, particularly those of Frege, Russell, and Tarski, and their applications in formal languages.
methods: The paper uses undisputed principles of Arabic to falsify the Logicians' ideas and demonstrate the limitations of their approaches. It also utilizes the existence of "meaning-particles" in Arabic syntax to efficiently recognize words, phrases, and sentences.
results: The paper shows that the assumptions of Modern Logics contradict basic principles of Arabic, and that approaches based on these assumptions are not applicable to Arabic. It also presents a new way to approach the computational problem of Satisfiability (SAT), based on the realization that parsing Arabic utilizes "meaning-particles" within syntax. Practical evidence obtained for multiplication circuits supports these claims.

Abstract
Modern Logics, as formulated notably by Frege, Russell and Tarski involved basic assumptions about Natural Languages in general and Indo-European Languages in particular, which are contested by Linguists. Based upon those assumptions, formal Languages were designed to overcome what Logicians claimed to be 'defects' of Natural Language. In this paper we show that those assumptions contradict basic principles of Arabic. More specifically: The Logicians' ideas, that within Natural Language words refer to objects, 'ToBe'-constructions represent identity statements, Indefinite Descriptions must be replaced by existential quantifiers to form meaningful Sentences and Symbols can have no interpretation-independent meanings, are all falsified using undisputed principles of Arabic. The here presented falsification serves two purposes. First, it is used as a factual basis for the rejection of approaches adopting Semantic axioms of Mathematical Logics as Models for meaning of Arabic Syntax. Second, it shows a way to approach the important computational problem: Satisfiability (SAT). The described way is based upon the realization that parsing Arabic utilizes the existence of 'meaning-particles' within Syntax to efficiently recognize words, phrases and Sentences. Similar meaning-particles are shown to exist in 3CNF formulas, which, when properly handled within the machinery of 3SAT-Solvers, enable structural conditions to be imposed on formulas, sufficient alone to guarantee the efficient production of non-exponentially sized Free Binary Decision Diagrams (FBDDs). We show why known exponential lower bounds on sizes of FBDDs do not contradict our results and reveal practical evidence, obtained for multiplication circuits, supporting our claims.
Dynamic V2X Autonomous Perception from Road-to-Vehicle Vision
paper_authors: Jiayao Tan, Fan Lyu, Linyan Li, Fuyuan Hu, Tingliang Feng, Fenglei Xu, Rui Yao
for: Improving the safety and reliability of autonomous driving systems and adapting to dynamic scenes.
methods: Builds road-to-vehicle vision on top of roadside perception and proposes the Adaptive Road-to-Vehicle Perception (AR2VP) method.
results: On 3D object detection and segmentation tasks, AR2VP achieves an excellent performance-bandwidth trade-off while maintaining model adaptability in dynamic environments.

Abstract
Vehicle-to-everything (V2X) perception is an innovative technology that enhances vehicle perception accuracy, thereby elevating the security and reliability of autonomous systems. However, existing V2X perception methods focus on static scenes from mainly vehicle-based vision, which is constrained by sensor capabilities and communication loads. To adapt V2X perception models to dynamic scenes, we propose to build V2X perception from road-to-vehicle vision and present Adaptive Road-to-Vehicle Perception (AR2VP) method. In AR2VP, we leverage roadside units to offer stable, wide-range sensing capabilities and serve as communication hubs. AR2VP is devised to tackle both intra-scene and inter-scene changes. For the former, we construct a dynamic perception representation module, which efficiently integrates vehicle perceptions, enabling vehicles to capture a more comprehensive range of dynamic factors within the scene. Moreover, we introduce a road-to-vehicle perception compensating module, aimed at preserving the maximized roadside unit perception information in the presence of intra-scene changes. For inter-scene changes, we implement an experience replay mechanism leveraging the roadside unit's storage capacity to retain a subset of historical scene data, maintaining model robustness in response to inter-scene shifts. We conduct perception experiment on 3D object detection and segmentation, and the results show that AR2VP excels in both performance-bandwidth trade-offs and adaptability within dynamic environments.
results: CACTUS achieves significant improvements in accuracy, latency, and compute budget across a variety of datasets and IoT platforms.

Abstract
While existing strategies for optimizing deep learning-based classification models on low-power platforms assume the models are trained on all classes of interest, this paper posits that adopting context-awareness i.e. focusing solely on the likely classes in the current context, can substantially enhance performance in resource-constrained environments. We propose a new paradigm, CACTUS, for scalable and efficient context-aware classification where a micro-classifier recognizes a small set of classes relevant to the current context and, when context change happens, rapidly switches to another suitable micro-classifier. CACTUS has several innovations including optimizing the training cost of context-aware classifiers, enabling on-the-fly context-aware switching between classifiers, and selecting the best context-aware classifiers given limited resources. We show that CACTUS achieves significant benefits in accuracy, latency, and compute budget across a range of datasets and IoT platforms.
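The context-aware idea reduces to a dispatcher: a micro-classifier per context, each covering only that context's likely classes. The contexts, class subsets, and linear stand-in models below are hypothetical placeholders for illustration.

```python
import torch

class ContextAwareClassifier:
    """Sketch of micro-classifier switching: each context maps to a small
    model over a few likely classes; a cheap context detector (not shown)
    would pick which one runs when the context changes."""
    def __init__(self, micro_models, class_subsets):
        self.models = micro_models    # context -> tiny classifier
        self.classes = class_subsets  # context -> list of global class ids

    def predict(self, x, context):
        logits = self.models[context](x)
        return self.classes[context][logits.argmax(dim=1).item()]

micro = {
    "indoor": torch.nn.Linear(64, 3),   # recognizes 3 indoor classes
    "outdoor": torch.nn.Linear(64, 4),  # recognizes 4 outdoor classes
}
subsets = {"indoor": [0, 1, 2], "outdoor": [3, 4, 5, 6]}
clf = ContextAwareClassifier(micro, subsets)
label = clf.predict(torch.randn(1, 64), context="indoor")
```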
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery
paper_authors: Huseyin Fuat Alsan, Taner Arsan
for: Explores post-disaster analytics using multimodal deep learning models trained with a curriculum learning method.
methods: Proposes a curriculum learning strategy that trains deep learning models on data of increasing complexity to improve performance; a U-Net model is used for semantic segmentation and image encoding, and a custom-built text classifier for visual question answering.
results: The proposed DATWEP method improves the visual question answering performance of the multimodal model. Source code is available at https://github.com/fualsan/DATWEP.Abstract
This paper explores post-disaster analytics using multimodal deep learning models trained with curriculum learning method. Studying post-disaster analytics is important as it plays a crucial role in mitigating the impact of disasters by providing timely and accurate insights into the extent of damage and the allocation of resources. We propose a curriculum learning strategy to enhance the performance of multimodal deep learning models. Curriculum learning emulates the progressive learning sequence in human education by training deep learning models on increasingly complex data. Our primary objective is to develop a curriculum-trained multimodal deep learning model, with a particular focus on visual question answering (VQA) capable of jointly processing image and text data, in conjunction with semantic segmentation for disaster analytics using the FloodNet\footnote{https://github.com/BinaLab/FloodNet-Challenge-EARTHVISION2021} dataset. To achieve this, U-Net model is used for semantic segmentation and image encoding. A custom built text classifier is used for visual question answering. Existing curriculum learning methods rely on manually defined difficulty functions. We introduce a novel curriculum learning approach termed Dynamic Task and Weight Prioritization (DATWEP), which leverages a gradient-based method to automatically decide task difficulty during curriculum learning training, thereby eliminating the need for explicit difficulty computation. The integration of DATWEP into our multimodal model shows improvement on VQA performance. Source code is available at https://github.com/fualsan/DATWEP.
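The gradient-based difficulty idea can be sketched as follows. This is our illustration under stated assumptions, not the authors' exact update rule: per-task gradient norms on the shared parameters serve as a difficulty proxy, and task weights are smoothly renormalized toward the harder tasks.

```python
# Hedged sketch of gradient-driven dynamic task weighting: tasks whose
# losses currently dominate the shared gradient receive more weight,
# removing the need for a hand-crafted difficulty function.
import torch

shared = torch.nn.Linear(8, 8)
seg_head = torch.nn.Linear(8, 2)   # toy stand-in for segmentation
vqa_head = torch.nn.Linear(8, 3)   # toy stand-in for VQA
weights = torch.tensor([0.5, 0.5])

x = torch.randn(16, 8)
seg_y = torch.randint(0, 2, (16,))
vqa_y = torch.randint(0, 3, (16,))

feats = shared(x)
losses = [
    torch.nn.functional.cross_entropy(seg_head(feats), seg_y),
    torch.nn.functional.cross_entropy(vqa_head(feats), vqa_y),
]

# per-task gradient norms on the shared parameters act as a difficulty proxy
norms = []
for loss in losses:
    grads = torch.autograd.grad(loss, list(shared.parameters()),
                                retain_graph=True)
    norms.append(torch.sqrt(sum((g ** 2).sum() for g in grads)))
norms = torch.stack(norms)

# smoothed update: harder tasks (larger gradient norm) get more weight
weights = 0.9 * weights + 0.1 * norms / norms.sum()
total_loss = (weights.detach() * torch.stack(losses)).sum()
total_loss.backward()
print(weights)
```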
Web3 Meets AI Marketplace: Exploring Opportunities, Analyzing Challenges, and Suggesting Solutions
results: The paper proposes a solution for the rapid growth of the AI marketplace in the Web3 space and opens up new business opportunities in this field.Abstract
Web3 and AI have been among the most discussed fields over the recent years, with substantial hype surrounding each field's potential to transform the world as we know it. However, as the hype settles, it's evident that neither AI nor Web3 can address all challenges independently. Consequently, the intersection of AI and Web3 is gaining increased attention, emerging as a new field with the potential to address the limitations of each. In this article, we will focus on the integration of web3 and the AI marketplace, where AI services and products can be provided in a decentralized manner (DeAI). A comprehensive review is provided by summarizing the opportunities and challenges on this topic. Additionally, we offer analyses and solutions to address these challenges. We've developed a framework that lets users pay with any kind of cryptocurrency to get AI services. Additionally, they can also enjoy AI services for free on our platform by simply locking up their assets temporarily in the protocol. This unique approach is a first in the industry. Before this, offering free AI services in the web3 community wasn't possible. Our solution opens up exciting opportunities for the AI marketplace in the web3 space to grow and be widely adopted.
Roles of Scaling and Instruction Tuning in Language Perception: Model vs. Human Attention
results: Results show that scaling enhances resemblance to human reading attention and improves effective attention by reducing reliance on trivial patterns, while instruction tuning does not, although it significantly enhances sensitivity to instructions. Moreover, the attention of current LLMs is consistently closer to that of non-native than native speakers, suggesting sub-optimal language perception in all models.Abstract
Recent large language models (LLMs) have revealed strong abilities to understand natural language. Since most of them share the same basic structure, i.e. the transformer block, possible contributors to their success in the training process are scaling and instruction tuning. However, how these factors affect the models' language perception is unclear. This work compares the self-attention of several existing LLMs (LLaMA, Alpaca and Vicuna) in different sizes (7B, 13B, 30B, 65B), together with eye saccade, an aspect of human reading attention, to assess the effect of scaling and instruction tuning on language perception. Results show that scaling enhances the human resemblance and improves the effective attention by reducing the trivial pattern reliance, while instruction tuning does not. However, instruction tuning significantly enhances the models' sensitivity to instructions. We also find that current LLMs are consistently closer to non-native than native speakers in attention, suggesting a sub-optimal language perception of all models. Our code and data used in the analysis is available on GitHub.
results: The results show that "Bespoke solvers" can substantially improve generation quality while requiring only about 1% of the pre-training GPU time and 80 learnable parameters.Abstract
Diffusion or flow-based models are powerful generative paradigms that are notoriously hard to sample as samples are defined as solutions to high-dimensional Ordinary or Stochastic Differential Equations (ODEs/SDEs) which require a large Number of Function Evaluations (NFE) to approximate well. Existing methods to alleviate the costly sampling process include model distillation and designing dedicated ODE solvers. However, distillation is costly to train and sometimes can deteriorate quality, while dedicated solvers still require relatively large NFE to produce high quality samples. In this paper we introduce "Bespoke solvers", a novel framework for constructing custom ODE solvers tailored to the ODE of a given pre-trained flow model. Our approach optimizes an order consistent and parameter-efficient solver (e.g., with 80 learnable parameters), is trained for roughly 1% of the GPU time required for training the pre-trained model, and significantly improves approximation and generation quality compared to dedicated solvers. For example, a Bespoke solver for a CIFAR10 model produces samples with Fr\'echet Inception Distance (FID) of 2.73 with 10 NFE, and gets to 1% of the Ground Truth (GT) FID (2.59) for this model with only 20 NFE. On the more challenging ImageNet-64$\times$64, Bespoke samples at 2.2 FID with 10 NFE, and gets within 2% of GT FID (1.71) with 20 NFE.
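As a toy picture of the idea, the sketch below gives a generic two-stage solver step a few tunable coefficients and fits them so that a handful of coarse steps match a many-step reference solution. The ODE, the solver family, and the random-search fit are all our simplifications of the paper's learned, order-consistent construction.

```python
# Toy illustration of a "bespoke" solver: tune a two-stage step's
# coefficients so that 8 coarse steps match a 4096-step reference.
import numpy as np

def f(t, x):                 # toy ODE in place of a pre-trained flow model
    return -x + np.sin(3.0 * t)

def step(x, t, h, theta):    # two-stage step; theta are the learnable knobs
    a, b, c = theta
    k1 = f(t, x)
    k2 = f(t + c * h, x + c * h * k1)
    return x + h * (a * k1 + b * k2)

def solve(x0, n_steps, theta):
    x, t, h = x0, 0.0, 1.0 / n_steps
    for _ in range(n_steps):
        x, t = step(x, t, h, theta), t + h
    return x

# many-step Euler reference stands in for the "ground truth" trajectory
reference = solve(1.0, 4096, np.array([1.0, 0.0, 0.0]))

rng = np.random.default_rng(0)
best_theta, best_err = None, np.inf
for _ in range(2000):        # crude random search over solver coefficients
    theta = rng.uniform(-1.0, 1.5, size=3)
    err = abs(solve(1.0, 8, theta) - reference)
    if err < best_err:
        best_theta, best_err = theta, err

heun = abs(solve(1.0, 8, np.array([0.5, 0.5, 1.0])) - reference)
print(f"Heun @8 steps: {heun:.2e}  bespoke @8 steps: {best_err:.2e}")
```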
Gauge-optimal approximate learning for small data classification problems
for: Addresses small data learning problems, where there is a significant discrepancy between the limited number of response variable observations and the large feature space dimension.
methods: Proposes the Gauge-Optimal Approximate Learning (GOAL) algorithm, which reduces and rotates the feature space and provides an analytically tractable joint solution to the dimension reduction, feature segmentation, and classification problems for small data learning.
results: Experimental results show that the proposed GOAL algorithm outperforms the reported best state-of-the-art machine learning (ML) competitors on these problems in both learning performance and computational cost.Abstract
Small data learning problems are characterized by a significant discrepancy between the limited amount of response variable observations and the large feature space dimension. In this setting, the common learning tools struggle to identify the features important for the classification task from those that bear no relevant information, and cannot derive an appropriate learning rule which allows to discriminate between different classes. As a potential solution to this problem, here we exploit the idea of reducing and rotating the feature space in a lower-dimensional gauge and propose the Gauge-Optimal Approximate Learning (GOAL) algorithm, which provides an analytically tractable joint solution to the dimension reduction, feature segmentation and classification problems for small data learning problems. We prove that the optimal solution of the GOAL algorithm consists in piecewise-linear functions in the Euclidean space, and that it can be approximated through a monotonically convergent algorithm which presents -- under the assumption of a discrete segmentation of the feature space -- a closed-form solution for each optimization substep and an overall linear iteration cost scaling. The GOAL algorithm has been compared to other state-of-the-art machine learning (ML) tools on both synthetic data and challenging real-world applications from climate science and bioinformatics (i.e., prediction of the El Nino Southern Oscillation and inference of epigenetically-induced gene-activity networks from limited experimental data). The experimental results show that the proposed algorithm outperforms the reported best competitors for these problems both in learning performance and computational cost.
results: By calibrating the multimodal perception system, the researchers improved detection accuracy and robot reaction speed, and introduced a new, more accurate spin estimation approach. Finally, they demonstrate accurate ball detection by combining the output of an event-based camera with a Spiking Neural Network.Abstract
In recent years, robotic table tennis has become a popular research challenge for perception and robot control. Here, we present an improved table tennis robot system with high accuracy vision detection and fast robot reaction. Based on previous work, our system contains a KUKA robot arm with 6 DOF, with four frame-based cameras and two additional event-based cameras. We developed a novel calibration approach to calibrate this multimodal perception system. For table tennis, spin estimation is crucial. Therefore, we introduced a novel, and more accurate spin estimation approach. Finally, we show how combining the output of an event-based camera and a Spiking Neural Network (SNN) can be used for accurate ball detection.
TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
results: TESTA reduces the number of visual tokens by 75%, accelerating video encoding. Experiments on five datasets for paragraph-to-video retrieval and long-form VideoQA show that TESTA improves computing efficiency by 1.7x and benefits from its scalability to longer input frames, e.g., +13.7 R@1 on QuerYD and +6.5 R@1 on Condensed Movie.Abstract
Large-scale video-language pre-training has made remarkable strides in advancing video-language understanding tasks. However, the heavy computational burden of video encoding remains a formidable efficiency bottleneck, particularly for long-form videos. These videos contain massive visual tokens due to their inherent 3D properties and spatiotemporal redundancy, making it challenging to capture complex temporal and spatial relationships. To tackle this issue, we propose an efficient method called TEmporal-Spatial Token Aggregation (TESTA). TESTA condenses video semantics by adaptively aggregating similar frames, as well as similar patches within each frame. TESTA can reduce the number of visual tokens by 75% and thus accelerate video encoding. Building upon TESTA, we introduce a pre-trained video-language model equipped with a divided space-time token aggregation module in each video encoder block. We evaluate our model on five datasets for paragraph-to-video retrieval and long-form VideoQA tasks. Experimental results show that TESTA improves computing efficiency by 1.7 times, and achieves significant performance gains from its scalability in processing longer input frames, e.g., +13.7 R@1 on QuerYD and +6.5 R@1 on Condensed Movie.
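The aggregation step can be pictured with a small sketch. Below, the most similar pair of adjacent tokens is repeatedly average-merged until a target count remains; the real module is learned and aggregates along the temporal and spatial axes separately, so treat this purely as an illustration.

```python
# Sketch of similarity-based token aggregation: greedily merge the most
# similar adjacent tokens (frames or patches) by averaging.
import numpy as np

def aggregate_tokens(tokens, keep):
    """tokens: (n, d) array; repeatedly merge the most similar adjacent
    pair until only `keep` tokens remain."""
    tokens = tokens.copy()
    while len(tokens) > keep:
        a, b = tokens[:-1], tokens[1:]
        sims = (a * b).sum(1) / (
            np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
        )
        i = int(np.argmax(sims))                  # most redundant neighbours
        merged = (tokens[i] + tokens[i + 1]) / 2  # average-merge the pair
        tokens = np.vstack([tokens[:i], merged[None], tokens[i + 2:]])
    return tokens

frames = np.random.randn(32, 64)            # 32 frame tokens, 64-dim features
compact = aggregate_tokens(frames, keep=8)  # 75% token reduction
print(compact.shape)                        # (8, 64)
```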
A Unique Training Strategy to Enhance Language Models Capabilities for Health Mention Detection from Social Media Content
paper_authors: Pervaiz Iqbal Khan, Muhammad Nabeel Asim, Andreas Dengel, Sheraz Ahmed
for: Extracting health-related content from social media, for applications such as modeling disease spread, predicting mortality rates, and assessing the impact of different drugs on different diseases.
methods: Trains language models with random weighted perturbation and contrastive learning strategies so that they learn generalized patterns from social media text.
results: Proposes a meta predictor that combines five language models to classify social media posts into health-related and non-health classes, improving F1-score by up to 3.87% over traditional training and outperforming existing health mention classification predictors on all three public benchmark datasets.Abstract
An ever-increasing amount of social media content requires advanced AI-based computer programs capable of extracting useful information. Specifically, the extraction of health-related content from social media is useful for the development of diverse types of applications including disease spread, mortality rate prediction, and finding the impact of diverse types of drugs on diverse types of diseases. Language models are competent in extracting the syntax and semantics of text. However, they have a hard time extracting similar patterns from social media texts. The primary reason for this shortfall lies in the non-standardized writing style commonly employed by social media users. Following the need for an optimal language model competent in extracting useful patterns from social media text, the key goal of this paper is to train language models in such a way that they learn to derive generalized patterns. The key goal is achieved through the incorporation of random weighted perturbation and contrastive learning strategies. On top of a unique training strategy, a meta predictor is proposed that reaps the benefits of 5 different language models for discriminating posts of social media text into non-health and health-related classes. Comprehensive experimentation across 3 public benchmark datasets reveals that the proposed training strategy improves the performance of the language models up to 3.87%, in terms of F1-score, as compared to their performance with traditional training. Furthermore, the proposed meta predictor outperforms existing health mention classification predictors across all 3 benchmark datasets.
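A minimal sketch of the weight-perturbation idea follows; the relative noise scale and schedule are our assumptions, not the paper's settings. Before each update, Gaussian noise proportional to each parameter's own spread is injected, pushing the model toward patterns that survive small parameter displacements.

```python
# Hedged sketch of random weighted perturbation during training.
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def perturb(model, rel_scale=0.01):
    # inject noise scaled to each parameter's own standard deviation
    with torch.no_grad():
        for p in model.parameters():
            p.add_(torch.randn_like(p) * rel_scale * p.std())

x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
for _ in range(10):
    perturb(model)                      # random weighted perturbation
    loss = torch.nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```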
MILL: Mutual Verification with Large Language Models for Zero-Shot Query Expansion
results: Extensive experiments on three information retrieval datasets show that the method significantly outperforms the other baselines for query expansion.Abstract
Query expansion is a commonly-used technique in many search systems to better represent users' information needs with additional query terms. Existing studies for this task usually propose to expand a query with retrieved or generated contextual documents. However, both types of methods have clear limitations. For retrieval-based methods, the documents retrieved with the original query might not be accurate enough to reveal the search intent, especially when the query is brief or ambiguous. For generation-based methods, existing models can hardly be trained or aligned on a particular corpus, due to the lack of corpus-specific labeled data. In this paper, we propose a novel Large Language Model (LLM) based mutual verification framework for query expansion, which alleviates the aforementioned limitations. Specifically, we first design a query-query-document generation pipeline, which can effectively leverage the contextual knowledge encoded in LLMs to generate sub-queries and corresponding documents from multiple perspectives. Next, we employ a mutual verification method for both generated and retrieved contextual documents, where 1) retrieved documents are filtered with the external contextual knowledge in generated documents, and 2) generated documents are filtered with the corpus-specific knowledge in retrieved documents. Overall, the proposed method allows retrieved and generated documents to complement each other to finalize a better query expansion. We conduct extensive experiments on three information retrieval datasets, i.e., TREC-DL-2020, TREC-COVID, and MSMARCO. The results demonstrate that our method outperforms other baselines significantly.
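The mutual verification step can be approximated with off-the-shelf similarity scoring. In the sketch below (the texts and the cut-off k are placeholders, and TF-IDF cosine similarity stands in for the paper's scoring), retrieved documents are kept when the generated context supports them, and generated documents are kept when retrieved corpus evidence supports them.

```python
# Simplified sketch of mutual verification between generated and
# retrieved contextual documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

generated = ["query expansion adds related terms to a query",
             "pseudo relevance feedback uses top documents"]
retrieved = ["search engines expand queries with synonyms",
             "cooking recipes for pasta dishes",
             "feedback from retrieved documents refines the query"]

tfidf = TfidfVectorizer().fit(generated + retrieved)
sims = cosine_similarity(tfidf.transform(generated),
                         tfidf.transform(retrieved))  # (|gen|, |ret|)

k = 2
# retrieved docs filtered by how well the generated context supports them...
keep_ret = sims.max(axis=0).argsort()[::-1][:k]
# ...and generated docs filtered by corpus-specific retrieved evidence
keep_gen = sims.max(axis=1).argsort()[::-1][:k]
print([retrieved[i] for i in keep_ret])
print([generated[i] for i in keep_gen])
```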
Exploring the Emotional Landscape of Music: An Analysis of Valence Trends and Genre Variations in Spotify Music Data
results: The study uncovers patterns in music-emotion relationships, including shifts in valence distribution over time and mood transitions within playlists. These findings deepen the understanding of the relationship between music and emotion and support its exploration across the years.Abstract
This paper conducts an intricate analysis of musical emotions and trends using Spotify music data, encompassing audio features and valence scores extracted through the Spotipy API. Employing regression modeling, temporal analysis, mood transitions, and genre investigation, the study uncovers patterns within music-emotion relationships. Linear, support vector, random forest, and ridge regression models are employed to predict valence scores. Temporal analysis reveals shifts in valence distribution over time, while mood transition exploration illuminates emotional dynamics within playlists. The research contributes nuanced insights into music's emotional fabric, enhancing comprehension of the interplay between music and emotions through the years.
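A minimal version of the regression comparison might look like the sketch below. The feature names follow Spotify's audio-features schema, but the data and coefficients are synthetic, so the numbers are illustrative only.

```python
# Sketch: predict a track's valence from audio features with the four
# model families named in the abstract, compared by cross-validated R^2.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 500
# columns: danceability, energy, acousticness, tempo (scaled to [0, 1])
X = rng.uniform(0, 1, size=(n, 4))
valence = (0.5 * X[:, 0] + 0.4 * X[:, 1] - 0.2 * X[:, 2]
           + 0.05 * rng.standard_normal(n)).clip(0, 1)

models = {"linear": LinearRegression(), "ridge": Ridge(alpha=1.0),
          "svr": SVR(), "random_forest": RandomForestRegressor(n_estimators=100)}
for name, model in models.items():
    r2 = cross_val_score(model, X, valence, cv=5, scoring="r2").mean()
    print(f"{name:14s} R^2 = {r2:.3f}")
```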
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
results: TeacherLM-7.1B achieves a zero-shot score of 52.3 on MMLU, surpassing most models with over 100B parameters. Building on it, 58 NLP datasets were augmented and used to teach student models of various sizes from the OPT and BLOOM series in a multi-task setting; the experiments show that TeacherLM's data augmentation brings significant improvements to the student models.Abstract
Large Language Models (LLMs) exhibit impressive reasoning and data augmentation capabilities in various NLP tasks. However, what about small models? In this work, we propose TeacherLM-7.1B, capable of annotating relevant fundamentals, chain of thought, and common mistakes for most NLP samples, which makes annotation more than just an answer, thus allowing other models to learn "why" instead of just "what". The TeacherLM-7.1B model achieved a zero-shot score of 52.3 on MMLU, surpassing most models with over 100B parameters. Even more remarkable is its data augmentation ability. Based on TeacherLM-7.1B, we augmented 58 NLP datasets and taught various student models with different parameters from OPT and BLOOM series in a multi-task setting. The experimental results indicate that the data augmentation provided by TeacherLM has brought significant benefits. We will release the TeacherLM series of models and augmented datasets as open-source.
Uncovering Prototypical Knowledge for Weakly Open-Vocabulary Semantic Segmentation
paper_authors: Fei Zhang, Tianfei Zhou, Boyang Li, Hao He, Chaofan Ma, Tianjiao Zhang, Jiangchao Yao, Ya Zhang, Yanfeng Wang
for: Studies weakly open-vocabulary semantic segmentation (WOVSS), i.e., learning to segment objects of arbitrary classes using only image-text pairs.
methods: Existing works enhance the vanilla vision transformer by introducing explicit grouping recognition, e.g., employing several group tokens/centroids to cluster image tokens and perform group-text alignment; this paper instead supervises the group tokens explicitly through non-learnable prototypical regularization (NPR).
results: The proposed method bridges the granularity inconsistency of the group tokens and applies multi-modal regularization at different levels, improving segmentation compactness and richness; experiments show state-of-the-art performance on several benchmark datasets.Abstract
This paper studies the problem of weakly open-vocabulary semantic segmentation (WOVSS), which learns to segment objects of arbitrary classes using mere image-text pairs. Existing works turn to enhance the vanilla vision transformer by introducing explicit grouping recognition, i.e., employing several group tokens/centroids to cluster the image tokens and perform the group-text alignment. Nevertheless, these methods suffer from a granularity inconsistency regarding the usage of group tokens, which are aligned in the all-to-one vs. one-to-one manners during the training and inference phases, respectively. We argue that this discrepancy arises from the lack of elaborate supervision for each group token. To bridge this granularity gap, this paper explores explicit supervision for the group tokens from the prototypical knowledge. To this end, this paper proposes the non-learnable prototypical regularization (NPR) where non-learnable prototypes are estimated from source features to serve as supervision and enable contrastive matching of the group tokens. This regularization encourages the group tokens to segment objects with less redundancy and capture more comprehensive semantic regions, leading to increased compactness and richness. Based on NPR, we propose the prototypical guidance segmentation network (PGSeg) that incorporates multi-modal regularization by leveraging prototypical sources from both images and texts at different levels, progressively enhancing the segmentation capability with diverse prototypical patterns. Experimental results show that our proposed method achieves state-of-the-art performance on several benchmark datasets. The source code is available at https://github.com/Ferenas/PGSeg.
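A rough sketch of the NPR idea follows. Prototypes are estimated from detached source features with k-means, so no gradients flow into them, and group tokens are pulled toward their nearest prototype with an InfoNCE-style loss; the shapes, temperature, and clustering choice here are our assumptions, not the paper's exact formulation.

```python
# Rough sketch of non-learnable prototypical regularization (NPR).
import torch
from sklearn.cluster import KMeans

def npr_loss(group_tokens, source_feats, num_prototypes=8, tau=0.1):
    # non-learnable prototypes: plain k-means on detached source features
    km = KMeans(n_clusters=num_prototypes, n_init=5).fit(
        source_feats.detach().cpu().numpy())
    protos = torch.tensor(km.cluster_centers_, dtype=group_tokens.dtype)

    z = torch.nn.functional.normalize(group_tokens, dim=-1)
    p = torch.nn.functional.normalize(protos, dim=-1)
    logits = z @ p.T / tau                 # (num_groups, num_prototypes)
    target = logits.argmax(dim=-1)         # nearest prototype as the label
    return torch.nn.functional.cross_entropy(logits, target)

group_tokens = torch.randn(16, 32, requires_grad=True)  # learnable groups
source_feats = torch.randn(256, 32)                     # e.g. image features
loss = npr_loss(group_tokens, source_feats)
loss.backward()
print(float(loss))
```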
AMIR: Automated MisInformation Rebuttal – A COVID-19 Vaccination Datasets based Recommendation System
results: The study shows that this approach enables fast, efficient rebuttal of misinformation at scale and can be extended to other social media platforms and other types of misinformation.Abstract
Misinformation has emerged as a major societal threat in recent years in general; specifically in the context of the COVID-19 pandemic, it has wreaked havoc, for instance, by fuelling vaccine hesitancy. Cost-effective, scalable solutions for combating misinformation are the need of the hour. This work explored how existing information obtained from social media and augmented with more curated fact checked data repositories can be harnessed to facilitate automated rebuttal of misinformation at scale. While the ideas herein can be generalized and reapplied in the broader context of misinformation mitigation using a multitude of information sources and catering to the spectrum of social media platforms, this work serves as a proof of concept, and as such, it is confined in its scope to only rebuttal of tweets, and in the specific context of misinformation regarding COVID-19. It leverages two publicly available datasets, viz. FaCov (fact-checked articles) and misleading (social media Twitter) data on COVID-19 Vaccination.
Bipartite Graph Pre-training for Unsupervised Extractive Summarization with Graph Convolutional Auto-Encoders
results: The method performs strongly on downstream tasks, surpassing sentence representations based on BERT or RoBERTa.Abstract
Pre-trained sentence representations are crucial for identifying significant sentences in unsupervised document extractive summarization. However, the traditional two-step paradigm of pre-training and sentence-ranking creates a gap due to differing optimization objectives. To address this issue, we argue that utilizing pre-trained embeddings derived from a process specifically designed to optimize cohesive and distinctive sentence representations helps rank significant sentences. To do so, we propose a novel graph pre-training auto-encoder to obtain sentence embeddings by explicitly modelling intra-sentential distinctive features and inter-sentential cohesive features through sentence-word bipartite graphs. These pre-trained sentence representations are then utilized in a graph-based ranking algorithm for unsupervised summarization. Our method achieves superior performance for unsupervised summarization frameworks by providing summary-worthy sentence representations. It surpasses heavy BERT- or RoBERTa-based sentence representations in downstream tasks.
NP-SBFL: Bridging the Gap Between Spectrum-Based Fault Localization and Faulty Neural Pathways Diagnosis
results: On the two commonly used datasets MNIST and CIFAR-10, and under three suspicious neuron measures (Tarantula, Ochiai, and Barinel), the method identifies faulty pathways and synthesizes adversarial inputs more effectively than the baselines. In particular, NP-SBFL-MGA with Tarantula reaches a fault detection rate of 96.75%, surpassing DeepFault on Ochiai (89.90%) and NP-SBFL-GA on Ochiai (60.61%).Abstract
Deep learning has revolutionized various real-world applications, but the quality of Deep Neural Networks (DNNs) remains a concern. DNNs are complex and have millions of parameters, making it difficult to determine their contributions to fulfilling a task. Moreover, the behavior of a DNN is highly influenced by the data used during training, making it challenging to collect enough data to exercise all potential DNN behavior under all possible scenarios. This paper proposes a novel NP-SBFL method that adapts spectrum-based fault localization (SBFL) to locate faulty neural pathways. Our method identifies critical neurons using the layer-wise relevance propagation (LRP) technique and determines which critical neurons are faulty. We propose a multi-stage gradient ascent (MGA), an extension of gradient ascent, to effectively activate a sequence of neurons one at a time while maintaining the activation of previous neurons. We evaluated the effectiveness of our method on two commonly used datasets, MNIST and CIFAR-10, two baselines DeepFault and NP-SBFL-GA, and three suspicious neuron measures, Tarantula, Ochiai, and Barinel. The empirical results showed that NP-SBFL-MGA is statistically more effective than the baselines at identifying suspicious paths and synthesizing adversarial inputs. Particularly, Tarantula on NP-SBFL-MGA had the highest fault detection rate at 96.75%, surpassing DeepFault on Ochiai (89.90%) and NP-SBFL-GA on Ochiai (60.61%). Our approach also yielded comparable results to the baselines in synthesizing naturalness inputs, and we found a positive correlation between the coverage of critical paths and the number of failed tests in DNN fault localization.
DCQA: Document-Level Chart Question Answering towards Complex Reasoning and Common-Sense Understanding
results: Introduces the new DCQA dataset, comprising documents with charts in 6 different styles and 699,051 questions that demand strong reasoning ability and common-sense understanding. The paper also presents a question-answer generation engine that uses table data, a rich color set, and basic question templates to automatically produce a large number of reasoning question-answer pairs.Abstract
Visually-situated languages such as charts and plots are omnipresent in real-world documents. These graphical depictions are human-readable and are often analyzed in visually-rich documents to address a variety of questions that necessitate complex reasoning and common-sense responses. Despite the growing number of datasets that aim to answer questions over charts, most only address this task in isolation, without considering the broader context of document-level question answering. Moreover, such datasets lack adequate common-sense reasoning information in their questions. In this work, we introduce a novel task named document-level chart question answering (DCQA). The goal of this task is to conduct document-level question answering, extracting charts or plots in the document via document layout analysis (DLA) first and subsequently performing chart question answering (CQA). The newly developed benchmark dataset comprises 50,010 synthetic documents integrating charts in a wide range of styles (6 styles in contrast to 3 for PlotQA and ChartQA) and includes 699,051 questions that demand a high degree of reasoning ability and common-sense understanding. Besides, we present the development of a potent question-answer generation engine that employs table data, a rich color set, and basic question templates to produce a vast array of reasoning question-answer pairs automatically. Based on DCQA, we devise an OCR-free transformer for document-level chart-oriented understanding, capable of DLA and answering complex reasoning and common-sense questions over charts in an OCR-free manner. Our DCQA dataset is expected to foster research on understanding visualizations in documents, especially for scenarios that require complex reasoning for charts in the visually-rich document. We implement and evaluate a set of baselines, and our proposed method achieves comparable results.
results: Initial results indicate that LLMs mostly fail to understand and follow region-specific etiquettes from the non-Western world.Abstract
Etiquettes are an essential ingredient of day-to-day interactions among people. Moreover, etiquettes are region-specific, and etiquettes in one region might contradict those in other regions. In this paper, we propose EtiCor, an Etiquettes Corpus, having texts about social norms from five different regions across the globe. The corpus provides a test bed for evaluating LLMs for knowledge and understanding of region-specific etiquettes. Additionally, we propose the task of Etiquette Sensitivity. We experiment with state-of-the-art LLMs (Delphi, Falcon40B, and GPT-3.5). Initial results indicate that LLMs, mostly fail to understand etiquettes from regions from non-Western world.
Analyzing Vision Transformers for Image Classification in Class Embedding Space
results: The study finds that image tokens develop class-specific representations across layers that depend on attention mechanisms and contextual information. The method can also be used to determine the parts of an image important for detecting the class of interest, and it exhibits significant advantages over traditional linear probing approaches.Abstract
Despite the growing use of transformer models in computer vision, a mechanistic understanding of these networks is still needed. This work introduces a method to reverse-engineer Vision Transformers trained to solve image classification tasks. Inspired by previous research in NLP, we demonstrate how the inner representations at any level of the hierarchy can be projected onto the learned class embedding space to uncover how these networks build categorical representations for their predictions. We use our framework to show how image tokens develop class-specific representations that depend on attention mechanisms and contextual information, and give insights on how self-attention and MLP layers differentially contribute to this categorical composition. We additionally demonstrate that this method (1) can be used to determine the parts of an image that would be important for detecting the class of interest, and (2) exhibits significant advantages over traditional linear probing approaches. Taken together, our results position our proposed framework as a powerful tool for mechanistic interpretability and explainability research.
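The projection itself reduces to a single matrix product, as the minimal numpy sketch below shows; the dimensions and random tensors are placeholders for a trained ViT's class embedding matrix and per-layer token states.

```python
# Minimal sketch of projecting intermediate ViT token representations
# onto the learned class embedding space, layer by layer.
import numpy as np

rng = np.random.default_rng(0)
d, num_classes, num_tokens, num_layers = 64, 10, 197, 12

W_class = rng.standard_normal((d, num_classes))            # trained ViT head
hidden = rng.standard_normal((num_layers, num_tokens, d))  # per-layer tokens

# project every token at every layer onto the class embedding space
logits = hidden @ W_class          # (layers, tokens, classes)
token_preds = logits.argmax(-1)    # emerging class identity per token

target_class = 3
# relevance of each image token to the target class, layer by layer;
# peaks mark the patches most important for detecting that class
relevance = logits[..., target_class]   # (layers, tokens)
print(relevance.argmax(-1))             # most class-aligned token per layer
```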
Spacecraft Autonomous Decision-Planning for Collision Avoidance: a Reinforcement Learning Approach
paper_authors: Nicolas Bourriez, Adrien Loizeau, Adam F. Abdin
for: Proposes an implementation of autonomous collision avoidance decision-making capabilities on spacecraft using reinforcement learning techniques.
methods: A partially observable Markov decision process (POMDP) framework that accounts for epistemic and aleatory uncertainties and allows the AI system on board the spacecraft to learn stochastic policies for accurate collision avoidance maneuvers.
results: The objective is to successfully delegate the decision-making process for autonomously implementing a collision avoidance maneuver to the spacecraft without human intervention, allowing a faster response in the decision-making process and highly decentralized operations.Abstract
The space environment around the Earth is becoming increasingly populated by both active spacecraft and space debris. To avoid potential collision events, significant improvements in Space Situational Awareness (SSA) activities and Collision Avoidance (CA) technologies are allowing the tracking and maneuvering of spacecraft with increasing accuracy and reliability. However, these procedures still largely involve a high level of human intervention to make the necessary decisions. For an increasingly complex space environment, this decision-making strategy is not likely to be sustainable. Therefore, it is important to successfully introduce higher levels of automation for key Space Traffic Management (STM) processes to ensure the level of reliability needed for navigating a large number of spacecraft. These processes range from collision risk detection to the identification of the appropriate action to take and the execution of avoidance maneuvers. This work proposes an implementation of autonomous CA decision-making capabilities on spacecraft based on Reinforcement Learning (RL) techniques. A novel methodology based on a Partially Observable Markov Decision Process (POMDP) framework is developed to train the Artificial Intelligence (AI) system on board the spacecraft, considering epistemic and aleatory uncertainties. The proposed framework considers imperfect monitoring information about the status of the debris in orbit and allows the AI system to effectively learn stochastic policies to perform accurate Collision Avoidance Maneuvers (CAMs). The objective is to successfully delegate the decision-making process for autonomously implementing a CAM to the spacecraft without human intervention. This approach would allow for a faster response in the decision-making process and for highly decentralized operations.
End-to-End Autoregressive Retrieval via Bootstrapping for Smart Reply Systems
results: Experimental results show that the approach consistently outperforms a range of state-of-the-art baselines across three datasets, corresponding to a 5.1%-17.9% improvement in relevance and a 0.5%-63.1% improvement in diversity over the best baseline.Abstract
Reply suggestion systems represent a staple component of many instant messaging and email systems. However, the requirement to produce sets of replies, rather than individual replies, makes the task poorly suited for out-of-the-box retrieval architectures, which only consider individual message-reply similarity. As a result, these system often rely on additional post-processing modules to diversify the outputs. However, these approaches are ultimately bottlenecked by the performance of the initial retriever, which in practice struggles to present a sufficiently diverse range of options to the downstream diversification module, leading to the suggestions being less relevant to the user. In this paper, we consider a novel approach that radically simplifies this pipeline through an autoregressive text-to-text retrieval model, that learns the smart reply task end-to-end from a dataset of (message, reply set) pairs obtained via bootstrapping. Empirical results show this method consistently outperforms a range of state-of-the-art baselines across three datasets, corresponding to a 5.1%-17.9% improvement in relevance, and a 0.5%-63.1% improvement in diversity compared to the best baseline approach. We make our code publicly available.
Mask Propagation for Efficient Video Semantic Segmentation
results: The mask propagation method achieves state-of-the-art accuracy-efficiency trade-offs on the VSPW and Cityscapes datasets. For example, the best model (Swin-L backbone) outperforms the SOTA MRCFA (using MiT-B5) by 4.0% mIoU on VSPW while requiring only 26% of the FLOPs, and the framework reduces FLOPs by up to 4x on the Cityscapes validation set with only up to 2% mIoU degradation. Code is available at https://github.com/ziplab/MPVSS.Abstract
Video Semantic Segmentation (VSS) involves assigning a semantic label to each pixel in a video sequence. Prior work in this field has demonstrated promising results by extending image semantic segmentation models to exploit temporal relationships across video frames; however, these approaches often incur significant computational costs. In this paper, we propose an efficient mask propagation framework for VSS, called MPVSS. Our approach first employs a strong query-based image segmentor on sparse key frames to generate accurate binary masks and class predictions. We then design a flow estimation module utilizing the learned queries to generate a set of segment-aware flow maps, each associated with a mask prediction from the key frame. Finally, the mask-flow pairs are warped to serve as the mask predictions for the non-key frames. By reusing predictions from key frames, we circumvent the need to process a large volume of video frames individually with resource-intensive segmentors, alleviating temporal redundancy and significantly reducing computational costs. Extensive experiments on VSPW and Cityscapes demonstrate that our mask propagation framework achieves SOTA accuracy and efficiency trade-offs. For instance, our best model with Swin-L backbone outperforms the SOTA MRCFA using MiT-B5 by 4.0% mIoU, requiring only 26% FLOPs on the VSPW dataset. Moreover, our framework reduces up to 4x FLOPs compared to the per-frame Mask2Former baseline with only up to 2% mIoU degradation on the Cityscapes validation set. Code is available at https://github.com/ziplab/MPVSS.
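The warping step at the heart of the pipeline can be reduced to a few lines, as in the sketch below; nearest-neighbour sampling and the constant flow field are our simplifications, whereas the paper predicts segment-aware flow maps with learned queries.

```python
# Bare-bones sketch of warping a key-frame mask to a non-key frame
# using a backward flow map.
import numpy as np

def warp_mask(mask, flow):
    """mask: (H, W) binary key-frame mask; flow: (H, W, 2) backward flow in
    pixels, mapping each non-key-frame pixel to its key-frame source."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]), 0, w - 1).astype(int)
    src_y = np.clip(np.round(ys + flow[..., 1]), 0, h - 1).astype(int)
    return mask[src_y, src_x]   # nearest-neighbour sampling

key_mask = np.zeros((64, 64), dtype=np.uint8)
key_mask[20:40, 20:40] = 1            # object on the key frame
flow = np.full((64, 64, 2), -5.0)     # object moved 5 px right and down
warped = warp_mask(key_mask, flow)    # mask prediction for the non-key frame
print(key_mask.sum(), warped.sum())
```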
Building a Safer Maritime Environment Through Multi-Path Long-Term Vessel Trajectory Forecasting
paper_authors: Gabriel Spadon, Jay Kumar, Matthew Smith, Sarah Vela, Romina Gehrmann, Derek Eden, Joshua van Berkel, Amilcar Soares, Ronan Fablet, Ronald Pelot, Stan Matwin
results: The model achieves an R2 score exceeding 98% in the Gulf of St. Lawrence with varying techniques and features, demonstrates superior complex decision-making during path selection, and attains average and median forecasting errors of 11 km and 6 km, respectively.Abstract
Maritime transport is paramount to global economic growth and environmental sustainability. In this regard, the Automatic Identification System (AIS) data plays a significant role by offering real-time streaming data on vessel movement, which allows for enhanced traffic surveillance, assisting in vessel safety by avoiding vessel-to-vessel collisions and proactively preventing vessel-to-whale ones. This paper tackles an intrinsic problem to trajectory forecasting: the effective multi-path long-term vessel trajectory forecasting on engineered sequences of AIS data. We utilize an encoder-decoder model with Bidirectional Long Short-Term Memory Networks (Bi-LSTM) to predict the next 12 hours of vessel trajectories using 1 to 3 hours of AIS data. We feed the model with probabilistic features engineered from the AIS data that refer to the potential route and destination of each trajectory so that the model, leveraging convolutional layers for spatial feature learning and a position-aware attention mechanism that increases the importance of recent timesteps of a sequence during temporal feature learning, forecasts the vessel trajectory taking the potential route and destination into account. The F1 Score of these features is approximately 85% and 75%, indicating their efficiency in supplementing the neural network. We trialed our model in the Gulf of St. Lawrence, one of the North Atlantic Right Whales (NARW) habitats, achieving an R2 score exceeding 98% with varying techniques and features. Despite the high R2 score being attributed to well-defined shipping lanes, our model demonstrates superior complex decision-making during path selection. In addition, our model shows enhanced accuracy, with average and median forecasting errors of 11km and 6km, respectively. Our study confirms the potential of geographical data engineering and trajectory forecasting models for preserving marine life species.
Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
results: The agent achieves the highest win rate against other LLM-based agents and stays robust against adversarial human players in the Werewolf game.Abstract
Agents built with large language models (LLMs) have recently achieved great advancements. However, most of the efforts focus on single-agent or cooperative settings, leaving more general multi-agent environments underexplored. We propose a new framework powered by reinforcement learning (RL) to develop strategic language agents, i.e., LLM-based agents with strategic thinking ability, for a popular language game, Werewolf. Werewolf is a social deduction game with hidden roles that involves both cooperation and competition and emphasizes deceptive communication and diverse gameplay. Our agent tackles this game by first using LLMs to reason about potential deceptions and generate a set of strategically diverse actions. Then an RL policy, which selects an action from the candidates, is learned by population-based training to enhance the agents' decision-making ability. By combining LLMs with the RL policy, our agent produces a variety of emergent strategies, achieves the highest win rate against other LLM-based agents, and stays robust against adversarial human players in the Werewolf game.
Machine Learning Algorithms to Predict Chess960 Result and Develop Opening Themes
results: Three machine learning algorithms (KNN clustering, Random Forest, and Gradient Boosted Trees) are used to predict game outcomes, and by analyzing how the number of pieces in each board region changes at specific opening moves, the study predicts the region towards which each game is developing.Abstract
This work focuses on the analysis of Chess 960, also known as Fischer Random Chess, a variant of traditional chess where the starting positions of the pieces are randomized. The study aims to predict the game outcome using machine learning techniques and develop an opening theme for each starting position. The first part of the analysis utilizes machine learning models to predict the game result based on certain moves in each position. The methodology involves segregating raw data from .pgn files into usable formats and creating datasets comprising approximately 500 games for each starting position. Three machine learning algorithms -- KNN Clustering, Random Forest, and Gradient Boosted Trees -- have been used to predict the game outcome. To establish an opening theme, the board is divided into five regions: center, white kingside, white queenside, black kingside, and black queenside. The data from games played by top engines in all 960 positions is used to track the movement of pieces in the opening. By analysing the change in the number of pieces in each region at specific moves, the report predicts the region towards which the game is developing. These models provide valuable insights into predicting game outcomes and understanding the opening theme in Chess 960.
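A stripped-down version of the outcome-prediction setup might look like the sketch below; real features would come from parsed .pgn games, while the data, the move number, and the toy labeling rule here are synthetic.

```python
# Illustrative sketch: per-game piece counts in the five board regions
# at a fixed opening move feed a Random Forest outcome predictor.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

REGIONS = ["center", "white_kingside", "white_queenside",
           "black_kingside", "black_queenside"]

rng = np.random.default_rng(0)
n_games = 500
X = rng.integers(0, 9, size=(n_games, len(REGIONS)))  # pieces per region at move 10
y = (X[:, 0] + rng.normal(0, 2, n_games) > 4).astype(int)  # 1 = white wins (toy rule)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
print(dict(zip(REGIONS, clf.feature_importances_.round(3))))
```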
The Utility of “Even if…” Semifactual Explanation to Optimise Positive Outcomes
results: Compared to prior work, the proposed algorithms are better at maximizing user Gain, and causality proves important in the process. Most importantly, a user study supports the main hypothesis by showing that people find semifactual explanations more useful than counterfactuals when they receive the positive outcome of a loan acceptance.Abstract
When users receive either a positive or negative outcome from an automated system, Explainable AI (XAI) has almost exclusively focused on how to mutate negative outcomes into positive ones by crossing a decision boundary using counterfactuals (e.g., \textit{"If you earn 2k more, we will accept your loan application"}). Here, we instead focus on \textit{positive} outcomes, and take the novel step of using XAI to optimise them (e.g., \textit{"Even if you wish to half your down-payment, we will still accept your loan application"}). Explanations such as these that employ "even if..." reasoning, and do not cross a decision boundary, are known as semifactuals. To instantiate semifactuals in this context, we introduce the concept of \textit{Gain} (i.e., how much a user stands to benefit from the explanation), and consider the first causal formalisation of semifactuals. Tests on benchmark datasets show our algorithms are better at maximising gain compared to prior work, and that causality is important in the process. Most importantly however, a user study supports our main hypothesis by showing people find semifactual explanations more useful than counterfactuals when they receive the positive outcome of a loan acceptance.
Self Attention with Temporal Prior: Can We Learn More from Arrow of Time?
results: Experimental results show that the method achieves excellent prediction results on Electronic Health Record (EHR) datasets, surpassing the best-performing models on most tasks and datasets.Abstract
Many diverse phenomena in nature inherently encode both short- and long-term temporal dependencies, with short-term dependencies in particular resulting from the direction of the flow of time. In this respect, we discovered experimental evidence suggesting that {\it interrelations} of these events are higher for closer time stamps. However, for attention-based models to learn these regularities in short-term dependencies, large amounts of data are required, which is often infeasible. This is because, while attention-based models are good at learning piecewise temporal dependencies, they lack structures that encode biases in time series. As a resolution, we propose a simple and efficient method that enables attention layers to better encode the short-term temporal bias of these datasets by applying learnable, adaptive kernels directly to the attention matrices. For the experiments, we chose various prediction tasks using Electronic Health Records (EHR) datasets, since they are great examples that have underlying long- and short-term temporal dependencies. The results of our experiments show exceptional classification results compared to the best-performing models on most of the tasks and datasets.
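One simple instantiation of such a kernel is a distance-dependent penalty added to the attention logits, as in the numpy sketch below; the linear-in-distance form and the fixed decay are our assumptions, whereas the paper's kernels are learnable and adaptive.

```python
# Small sketch of a temporal prior applied directly to attention logits:
# closer time stamps attend to each other more strongly.
import numpy as np

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def attention_with_temporal_prior(Q, K, V, decay):
    t = np.arange(Q.shape[0])
    dist = np.abs(t[:, None] - t[None, :])   # |i - j| between time stamps
    prior = -decay * dist                    # kernel added to the logits
    logits = Q @ K.T / np.sqrt(Q.shape[1]) + prior
    return softmax(logits) @ V

rng = np.random.default_rng(0)
T, d = 16, 8
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
out = attention_with_temporal_prior(Q, K, V, decay=0.5)  # decay would be learned
print(out.shape)   # (16, 8)
```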
CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing
for: Improving the efficiency of video retrieval by compressing videos into binary codes and learning accurate hash codes.
methods: The paper uses contrastive learning with augmentation strategies to capture global spatio-temporal information and local spatio-temporal details within video frames, and incorporates two collaborative learning tasks to enhance the perception of temporal structure and the modeling of spatio-temporal relationships.
results: The proposed method outperforms state-of-the-art self-supervised video hashing methods on four video benchmark datasets.
Abstract
Compressing videos into binary codes can improve retrieval speed and reduce storage overhead. However, learning accurate hash codes for video retrieval can be challenging due to high local redundancy and complex global dependencies between video frames, especially in the absence of labels. Existing self-supervised video hashing methods have been effective in designing expressive temporal encoders, but have not fully utilized the temporal dynamics and spatial appearance of videos due to less challenging and unreliable learning tasks. To address these challenges, we begin by utilizing the contrastive learning task to capture global spatio-temporal information of videos for hashing. With the aid of our designed augmentation strategies, which focus on spatial and temporal variations to create positive pairs, the learning framework can generate hash codes that are invariant to motion, scale, and viewpoint. Furthermore, we incorporate two collaborative learning tasks, i.e., frame order verification and scene change regularization, to capture local spatio-temporal details within video frames, thereby enhancing the perception of temporal structure and the modeling of spatio-temporal relationships. Our proposed Contrastive Hashing with Global-Local Spatio-temporal Information (CHAIN) outperforms state-of-the-art self-supervised video hashing methods on four video benchmark datasets. Our codes will be released.
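A minimal sketch of the contrastive-hashing core follows: relaxed hash codes from two augmented views are pulled together with an NT-Xent-style loss, and binarisation happens only at retrieval time. The encoder, the augmentation strategies, and the two auxiliary tasks (frame order verification, scene change regularization) are abstracted away.

```python
# Sketch of contrastive learning over relaxed hash codes.
import torch
import torch.nn.functional as F

def contrastive_hash_loss(z1, z2, temperature=0.1):
    """z1, z2: (batch, bits) real-valued encoder outputs for two views."""
    h1, h2 = torch.tanh(z1), torch.tanh(z2)    # relaxed hash codes in (-1, 1)
    h = F.normalize(torch.cat([h1, h2], dim=0), dim=1)
    sim = h @ h.t() / temperature              # all-pairs similarities
    n = z1.size(0)
    sim.fill_diagonal_(float("-inf"))          # exclude self-similarity
    # positives: view i <-> view i + n (the other augmentation of the same clip)
    targets = torch.cat([torch.arange(n, 2 * n),
                         torch.arange(0, n)]).to(sim.device)
    return F.cross_entropy(sim, targets)

def to_binary(z):
    """Binarise only at retrieval time."""
    return torch.sign(torch.tanh(z))
```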
QWID: Quantized Weed Identification Deep neural network
results: On ResNet-50 and InceptionV3 architectures, the approach balances accuracy against model size and inference time, achieving significant reductions in model size and inference time while maintaining accuracy in real-world production environments such as Desktop, Mobile, and Raspberry Pi.

Abstract
In this paper, we present an efficient solution for weed classification in agriculture. We focus on optimizing model performance at inference while respecting the constraints of the agricultural domain. We propose a Quantized Deep Neural Network model that classifies a dataset of 9 weed classes using 8-bit integer (int8) quantization, a departure from standard 32-bit floating point (fp32) models. Recognizing the hardware resource limitations in agriculture, our model balances model size, inference time, and accuracy, aligning with practical requirements. We evaluate the approach on ResNet-50 and InceptionV3 architectures, comparing their performance against their int8 quantized versions. Transfer learning and fine-tuning are applied using the DeepWeeds dataset. The results show staggering model size and inference time reductions while maintaining accuracy in real-world production scenarios like Desktop, Mobile and Raspberry Pi. Our work sheds light on a promising direction for efficient AI in agriculture, holding potential for broader applications. Code: https://github.com/parikshit14/QNN-for-weed
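The fp32-to-int8 conversion itself can be sketched with PyTorch's eager-mode post-training quantization, as below. This is a generic illustration, not the paper's exact pipeline; a full setup would also fuse conv-bn-relu modules and wrap the model in QuantStub/DeQuantStub, and `calibration_loader` is an assumed source of representative batches.

```python
# Generic post-training static quantization sketch (fp32 -> int8).
import os
import torch
import torch.nn as nn

def quantize_int8(model: nn.Module, calibration_loader) -> nn.Module:
    model.eval()
    model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
    prepared = torch.ao.quantization.prepare(model)    # insert observers
    with torch.no_grad():
        for images, _ in calibration_loader:           # calibrate activation ranges
            prepared(images)
    return torch.ao.quantization.convert(prepared)     # swap in int8 modules

def size_mb(model: nn.Module) -> float:
    """Serialized size, for comparing the fp32 and int8 models."""
    torch.save(model.state_dict(), "/tmp/_model.pt")
    return os.path.getsize("/tmp/_model.pt") / 1e6
```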
Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation
results: The paper proposes an algorithm named Delayed-PSVI and provides the first analysis of posterior sampling with delayed feedback in RL; without knowing the delays in advance, the algorithm achieves $\widetilde{O}(\sqrt{d^3H^3 T} + d^2H^2 E[\tau])$ worst-case regret.

Abstract
Recent studies in reinforcement learning (RL) have made significant progress by leveraging function approximation to alleviate the sample complexity hurdle for better performance. Despite the success, existing provably efficient algorithms typically rely on the accessibility of immediate feedback upon taking actions. The failure to account for the impact of delay in observations can significantly degrade the performance of real-world systems due to the regret blow-up. In this work, we tackle the challenge of delayed feedback in RL with linear function approximation by employing posterior sampling, which has been shown to empirically outperform the popular UCB algorithms in a wide range of regimes. We first introduce Delayed-PSVI, an optimistic value-based algorithm that effectively explores the value function space via noise perturbation with posterior sampling. We provide the first analysis for posterior sampling algorithms with delayed feedback in RL and show our algorithm achieves $\widetilde{O}(\sqrt{d^3H^3 T} + d^2H^2 E[\tau])$ worst-case regret in the presence of unknown stochastic delays. Here $E[\tau]$ is the expected delay. To further improve its computational efficiency and to expand its applicability in high-dimensional RL problems, we incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI, which maintains the same order-optimal regret guarantee with $\widetilde{O}(dHK)$ computational cost. Empirical evaluations are performed to demonstrate the statistical and computational efficacy of our algorithms.
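A toy, bandit-style simplification of the posterior-sampling-with-delays idea is sketched below: a Gaussian posterior over linear parameters is sampled for action selection, and observations are folded in only once their (possibly unknown) delays elapse. This is not Delayed-PSVI itself, which operates on value functions over an episodic MDP.

```python
# Toy posterior sampling with delayed feedback for a linear reward model.
import numpy as np

rng = np.random.default_rng(0)

class DelayedPosteriorSampler:
    def __init__(self, d, noise_var=1.0, prior_var=1.0):
        self.V = np.eye(d) / prior_var   # posterior precision
        self.b = np.zeros(d)             # precision-weighted mean
        self.noise_var = noise_var
        self.pending = []                # (arrival_time, features, reward)

    def sample_theta(self):
        """Thompson-style draw from the current Gaussian posterior."""
        cov = np.linalg.inv(self.V)
        return rng.multivariate_normal(cov @ self.b, cov)

    def observe(self, t, x, reward, delay):
        """Reward for features x taken at time t arrives only at t + delay."""
        self.pending.append((t + delay, np.asarray(x, dtype=float), reward))

    def step(self, t):
        """Fold in every observation whose delay has elapsed by time t."""
        ready = [p for p in self.pending if p[0] <= t]
        self.pending = [p for p in self.pending if p[0] > t]
        for _, x, r in ready:
            self.V += np.outer(x, x) / self.noise_var
            self.b += r * x / self.noise_var
```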
paper_authors: Tomasz Limisiewicz, David Mareček, Tomáš Musil
for: This study aims to detect and mitigate gender bias in language models.
methods: The study uses causal analysis to identify problematic model components, finding that mid-upper feed-forward layers are most prone to conveying bias; based on the analysis, the model is adapted by multiplying these layers by a linear projection.
results: The DAMA method significantly decreases bias as measured by diverse metrics while maintaining the model's performance on downstream tasks. The released code and models retain LLaMA's state-of-the-art performance while being significantly less biased.

Abstract
Large language models are becoming the go-to solution for various language tasks. However, with growing capacity, models are prone to rely on spurious correlations stemming from biases and stereotypes present in the training data. This work proposes a novel method for detecting and mitigating gender bias in language models. We perform causal analysis to identify problematic model components and discover that mid-upper feed-forward layers are most prone to convey biases. Based on the analysis results, we adapt the model by multiplying these layers by a linear projection. Our titular method, DAMA, significantly decreases bias as measured by diverse metrics while maintaining the model's performance on downstream tasks. We release code for our method and models, which retain LLaMA's state-of-the-art performance while being significantly less biased.
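The projection step can be illustrated as follows: given an estimated matrix of bias directions, the layer's weights are multiplied by a projector onto the orthogonal complement of that subspace. How DAMA finds the directions (the causal analysis) is not shown; `bias_dirs` is an assumed input.

```python
# Sketch of removing a bias subspace from a layer's weights by projection.
import torch

@torch.no_grad()
def project_out_bias(weight: torch.Tensor, bias_dirs: torch.Tensor) -> torch.Tensor:
    """Return (I - B^T (B B^T)^{-1} B) @ weight.

    weight:    (h_out, h_in) feed-forward weight matrix
    bias_dirs: (k, h_out) assumed matrix of bias directions
    """
    B = bias_dirs
    # projector onto span(B): B^T (B B^T)^{-1} B
    P = B.t() @ torch.linalg.solve(B @ B.t(), B)
    I = torch.eye(weight.size(0), device=weight.device, dtype=weight.dtype)
    return (I - P) @ weight  # outputs no longer carry the bias directions
```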
InstanT: Semi-supervised Learning with Instance-dependent Thresholds
results: Experiments show that the instance-dependent threshold function improves SSL performance and adapts better to the varied data distributions encountered in real applications.

Abstract
Semi-supervised learning (SSL) has been a fundamental challenge in machine learning for decades. The primary family of SSL algorithms, known as pseudo-labeling, involves assigning pseudo-labels to confident unlabeled instances and incorporating them into the training set. Therefore, the selection criteria of confident instances are crucial to the success of SSL. Recently, there has been growing interest in the development of SSL methods that use dynamic or adaptive thresholds. Yet, these methods typically apply the same threshold to all samples, or use class-dependent thresholds for instances belonging to a certain class, while neglecting instance-level information. In this paper, we propose the study of instance-dependent thresholds, which has the highest degree of freedom compared with existing methods. Specifically, we devise a novel instance-dependent threshold function for all unlabeled instances by utilizing their instance-level ambiguity and the instance-dependent error rates of pseudo-labels, so instances that are more likely to have incorrect pseudo-labels will have higher thresholds. Furthermore, we demonstrate that our instance-dependent threshold function provides a bounded probabilistic guarantee for the correctness of the pseudo-labels it assigns.
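To convey the general shape of an instance-dependent threshold, the sketch below raises the confidence bar for instances whose predictions are more ambiguous (higher normalized entropy). The specific functional form and constants are illustrative assumptions, not the paper's formulation.

```python
# Illustrative instance-dependent thresholds for pseudo-labelling.
import numpy as np

def instance_thresholds(probs, base=0.8, scale=0.15):
    """probs: (n, classes) softmax outputs for unlabeled instances.
    More ambiguous instances get a higher acceptance threshold."""
    eps = 1e-12
    entropy = -(probs * np.log(probs + eps)).sum(axis=1)
    entropy = entropy / np.log(probs.shape[1])   # normalise to [0, 1]
    return np.clip(base + scale * entropy, base, 0.99)

def select_pseudo_labeled(probs):
    """Keep only instances whose confidence clears their own threshold."""
    conf = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = conf >= instance_thresholds(probs)
    return labels[keep], keep
```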
Stacking the Odds: Transformer-Based Ensemble for AI-Generated Text Detection
results: The method achieves an accuracy score of 0.9555 on the official test data.

Abstract
This paper reports our submission under the team name `SynthDetectives' to the ALTA 2023 Shared Task. We use a stacking ensemble of Transformers for the task of AI-generated text detection. Our approach is novel in terms of its choice of models in that we use accessible and lightweight models in the ensemble. We show that ensembling the models results in an improved accuracy in comparison with using them individually. Our approach achieves an accuracy score of 0.9555 on the official test data provided by the shared task organisers.
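The stacking step itself can be sketched as follows: each base detector's probability of "AI-generated" becomes a feature, and a logistic-regression meta-classifier learns how to weigh the detectors. The base transformers and their training are abstracted away; the `base_probs_*` arrays are assumed inputs.

```python
# Sketch of a stacking ensemble over base detector probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_stacker(base_probs_train: np.ndarray, y_train: np.ndarray):
    """base_probs_train: (n_samples, n_models) P(AI-generated) per detector."""
    meta = LogisticRegression()
    meta.fit(base_probs_train, y_train)   # learn how to weigh each detector
    return meta

def predict_stacked(meta, base_probs_test: np.ndarray) -> np.ndarray:
    return meta.predict(base_probs_test)  # 1 = AI-generated, 0 = human
```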
Ever Evolving Evaluator (EV3): Towards Flexible and Reliable Meta-Optimization for Knowledge Distillation
results: Experiments show that EV3 can safely explore model spaces and dynamically prioritize tasks under multiple objectives; this flexibility and adaptability suggest broad applicability across many domains.

Abstract
We introduce EV3, a novel meta-optimization framework designed to efficiently train scalable machine learning models through an intuitive explore-assess-adapt protocol. In each iteration of EV3, we explore various model parameter updates, assess them using pertinent evaluation methods, and adapt the model based on the optimal updates and previous progress history. EV3 offers substantial flexibility without imposing stringent constraints like differentiability on the key objectives relevant to the tasks of interest. Moreover, this protocol welcomes updates with biased gradients and allows for the use of a diversity of losses and optimizers. Additionally, in scenarios with multiple objectives, it can be used to dynamically prioritize tasks. With inspiration drawn from evolutionary algorithms, meta-learning, and neural architecture search, we investigate an application of EV3 to knowledge distillation. Our experimental results illustrate EV3's capability to safely explore model spaces, while hinting at its potential applicability across numerous domains due to its inherent flexibility and adaptability.
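The explore-assess-adapt loop can be rendered schematically as below: several candidate updates (for example, different optimizers, losses, or step sizes) are applied to copies of the current model, scored with a task-relevant and possibly non-differentiable metric, and the best is kept. `evaluate` and the candidate updates are assumed callables; EV3's actual protocol also tracks progress history.

```python
# Schematic explore-assess-adapt iteration.
import copy

def ev3_iteration(model, candidate_updates, evaluate):
    """candidate_updates: callables that mutate a model copy in place."""
    best_model, best_score = model, evaluate(model)
    for update in candidate_updates:          # explore
        trial = copy.deepcopy(model)
        update(trial)
        score = evaluate(trial)               # assess (any metric, even biased
        if score > best_score:                # or non-differentiable ones)
            best_model, best_score = trial, score
    return best_model, best_score             # adapt: keep the winner

def ev3_train(model, candidate_updates, evaluate, iterations=100):
    history = []
    for _ in range(iterations):
        model, score = ev3_iteration(model, candidate_updates, evaluate)
        history.append(score)
    return model, history
```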
Towards Generalized Multi-stage Clustering: Multi-view Self-distillation
results: Experiments show that the method achieves better clustering performance on real-world multi-view data than existing methods.

Abstract
Existing multi-stage clustering methods independently learn the salient features from multiple views and then perform the clustering task. Particularly, multi-view clustering (MVC) has attracted a lot of attention in multi-view or multi-modal scenarios. MVC aims at exploring common semantics and pseudo-labels from multiple views and clustering in a self-supervised manner. However, limited by noisy data and inadequate feature learning, such a clustering paradigm generates overconfident pseudo-labels that mis-guide the model to produce inaccurate predictions. Therefore, it is desirable to have a method that can correct this pseudo-label mistraction in multi-stage clustering to avoid the bias accumulation. To alleviate the effect of overconfident pseudo-labels and improve the generalization ability of the model, this paper proposes a novel multi-stage deep MVC framework where multi-view self-distillation (DistilMVC) is introduced to distill dark knowledge of label distribution. Specifically, in the feature subspace at different hierarchies, we explore the common semantics of multiple views through contrastive learning and obtain pseudo-labels by maximizing the mutual information between views. Additionally, a teacher network is responsible for distilling pseudo-labels into dark knowledge, supervising the student network and improving its predictive capabilities to enhance the robustness. Extensive experiments on real-world multi-view datasets show that our method has better clustering performance than state-of-the-art methods.
Differentiable Learning of Generalized Structured Matrices for Efficient Deep Neural Networks
results: The method learns high-performance, low-complexity DNNs, and is more efficient than prior approaches that use low-rank, block-sparse, or block-low-rank matrices.

Abstract
This paper investigates efficient deep neural networks (DNNs) to replace dense unstructured weight matrices with structured ones that possess desired properties. The challenge arises because the optimal weight matrix structure in popular neural network models is obscure in most cases and may vary from layer to layer even in the same network. Prior structured matrices proposed for efficient DNNs were mostly hand-crafted without a generalized framework to systematically learn them. To address this issue, we propose a generalized and differentiable framework to learn efficient structures of weight matrices by gradient descent. We first define a new class of structured matrices that covers a wide range of structured matrices in the literature by adjusting the structural parameters. Then, the frequency-domain differentiable parameterization scheme based on the Gaussian-Dirichlet kernel is adopted to learn the structural parameters by proximal gradient descent. Finally, we introduce an effective initialization method for the proposed scheme. Our method learns efficient DNNs with structured matrices, achieving lower complexity and/or higher performance than prior approaches that employ low-rank, block-sparse, or block-low-rank matrices.
HDMNet: A Hierarchical Matching Network with Double Attention for Large-scale Outdoor LiDAR Point Cloud Registration
results: Extensive experiments on two large-scale outdoor LiDAR point cloud datasets demonstrate the high accuracy and efficiency of the proposed HDMNet.

Abstract
Outdoor LiDAR point clouds are typically large-scale and complexly distributed. To achieve efficient and accurate registration, it is of utmost importance to emphasize the similarity among local regions and prioritize global local-to-local matching, after which accuracy can be enhanced through cost-effective fine registration. In this paper, a novel hierarchical neural network with double attention, named HDMNet, is proposed for large-scale outdoor LiDAR point cloud registration. Specifically, a novel feature-consistency-enhanced double-soft matching network is introduced to achieve two-stage matching with high flexibility while enlarging the receptive field with high efficiency in a patch-to-patch manner, which significantly improves registration performance. Moreover, to further utilize the sparse matching information from the deeper layer, we develop a novel trainable embedding mask that incorporates the confidence scores of correspondences obtained from the pose estimation of the deeper layer, eliminating additional computations. The high-confidence keypoints in the sparser point cloud of the deeper layer correspond to high-confidence spatial neighborhood regions in the shallower layer, which receive more attention, while the features of non-key regions are masked. Extensive experiments are conducted on two large-scale outdoor LiDAR point cloud datasets to demonstrate the high accuracy and efficiency of the proposed HDMNet.
Prompt-Engineering and Transformer-based Question Generation and Evaluation
results: The study finds that the approach generates questions of high similarity to the baselines; 30% of the prompt-generated questions achieved a similarity score above 70%.

Abstract
Question generation has numerous applications in the educational context. Question generation can prove helpful for students when reviewing content and testing themselves. Furthermore, a question generation model can aid teachers by lessening the burden of creating assessments and other practice material. This paper aims to find the best method to generate questions from textual data through a transformer model and prompt engineering. In this research, we finetuned a pretrained distilBERT model on the SQuAD question answering dataset to generate questions. In addition to training a transformer model, prompt engineering was applied to generate questions effectively using the LLaMA model. The generated questions were compared against the baseline questions in the SQuAD dataset to evaluate the effectiveness of four different prompts. All four prompts demonstrated over 60% similarity on average. Of the prompt-generated questions, 30% achieved a high similarity score greater than 70%.
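As a rough sketch of the generation step with the transformers library, the snippet below wraps a seq2seq checkpoint in a text2text pipeline and fills a simple prompt template. The checkpoint name is a placeholder assumption, not the paper's fine-tuned model, and the template is only one of many possible prompt styles (the paper compares four).

```python
# Sketch of prompt-driven question generation with Hugging Face transformers.
from transformers import pipeline

# Placeholder checkpoint: substitute any seq2seq question-generation model.
qg = pipeline("text2text-generation",
              model="your-org/your-question-generation-model")

def generate_question(context: str, answer: str) -> str:
    # A simple prompt template; wording of the prompt is what the
    # prompt-engineering comparison in the paper varies.
    prompt = f"generate question: answer: {answer} context: {context}"
    return qg(prompt, max_new_tokens=64)[0]["generated_text"]
```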