results: The paper provides a virtual audience framework that brings live ambience and audience feedback to virtual events.
Abstract
The COVID-19 pandemic shifted many events in our daily lives into the virtual domain. While virtual conference systems provide an alternative to physical meetings, larger events require a muted audience to avoid an accumulation of background noise and distorted audio. However, performing artists strongly rely on the feedback of their audience. We propose a concept for a virtual audience framework which supports all participants with the ambience of a real audience. Audience feedback is collected locally, allowing users to express enthusiasm or discontent by selecting means such as clapping, whistling, booing, and laughter. This feedback is sent as abstract information to a virtual audience server. We broadcast the combined virtual audience feedback information to all participants, which can be synthesized as a single acoustic feedback by the client. The synthesis can be done by turning the collective audience feedback into a prompt that is fed to state-of-the-art models such as AudioGen. This way, each user hears a single acoustic feedback sound of the entire virtual event, without requiring to unmute or risk hearing distorted, unsynchronized feedback.
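Summary
The COVID-19 pandemic moved many everyday activities into the virtual domain. Virtual conference systems offer an alternative to physical meetings, but large events require a muted audience to avoid accumulated background noise and distorted audio. Performing artists, however, depend on audience feedback. We propose a virtual audience framework that gives all participants the ambience of a real audience. Feedback is collected locally: users express enthusiasm or discontent by choosing clapping, whistling, booing, or laughter. This feedback is sent to a virtual audience server, which combines the feedback from all participants and broadcasts it. Each client can synthesize the combined information into a single acoustic feedback sound, with no need to unmute and no risk of hearing distorted feedback.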
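The aggregation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the intensity thresholds, and the prompt wording are all assumptions, and the resulting string is only meant to resemble the kind of text prompt a model such as AudioGen could receive.

```python
from collections import Counter

# Feedback kinds named in the abstract; the phrase used for each is illustrative.
PHRASES = {
    "clapping": "applause",
    "whistling": "whistling",
    "booing": "booing",
    "laughter": "laughter",
}

def aggregate(events):
    """Server side (hypothetical): reduce per-user selections to counts per kind."""
    return Counter(kind for kind in events if kind in PHRASES)

def intensity(share):
    """Map a share of the audience to a rough adjective (thresholds are arbitrary)."""
    if share >= 0.5:
        return "loud"
    if share >= 0.2:
        return "moderate"
    return "faint"

def build_prompt(counts, audience_size):
    """Client side (hypothetical): turn broadcast counts into a text prompt."""
    if audience_size <= 0 or not counts:
        return "a quiet audience"
    parts = [
        f"{intensity(n / audience_size)} {PHRASES[kind]}"
        for kind, n in counts.most_common()
    ]
    return "a large audience with " + " and ".join(parts)

counts = aggregate(["clapping"] * 60 + ["laughter"] * 25 + ["booing"] * 5)
prompt = build_prompt(counts, audience_size=100)
print(prompt)
```

Only the small count table travels over the network in this sketch, which matches the idea of sending feedback as abstract information rather than audio.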
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
results: Empirical studies demonstrate the efficacy of a selection of features and show that they achieve competitive or state-of-the-art performance.
Abstract
TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims to accelerate the research and development of audio and speech technologies by providing well-designed, easy-to-use, and performant PyTorch components. Its contributors routinely engage with users to understand their needs and fulfill them by developing impactful features. Here, we survey TorchAudio's development principles and contents and highlight key features we include in its latest version (2.1): self-supervised learning pre-trained pipelines and training recipes, high-performance CTC decoders, speech recognition models and training recipes, advanced media I/O capabilities, and tools for performing forced alignment, multi-channel speech enhancement, and reference-less speech assessment. For a selection of these features, through empirical studies, we demonstrate their efficacy and show that they achieve competitive or state-of-the-art performance.
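Summary
TorchAudio is an open-source audio and speech processing library built on PyTorch that aims to accelerate research and development in audio and speech technologies. Its contributors routinely engage with users to understand their needs and meet them by developing impactful features. In this paper we survey TorchAudio's development principles and contents and highlight the key features of the latest version (2.1): self-supervised learning pre-trained pipelines and training recipes, high-performance CTC decoders, speech recognition models and training recipes, advanced media I/O capabilities, and tools for forced alignment, multi-channel speech enhancement, and reference-less speech assessment. For a selection of these features, empirical studies demonstrate their efficacy and show competitive or state-of-the-art performance.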
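To make the term "CTC decoder" concrete, here is a minimal sketch of the simplest (greedy) decoding rule in plain Python. It is not TorchAudio's API and not the high-performance decoder the paper describes; the function name and the toy scores are assumptions for illustration only.

```python
def ctc_greedy_decode(frame_scores, labels, blank=0):
    """Pick the best label per frame, merge repeated labels, then drop blanks."""
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    out = []
    prev = None
    for idx in best:
        # Emit a symbol only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank:
            out.append(labels[idx])
        prev = idx
    return "".join(out)

labels = ["-", "a", "b"]  # index 0 is the blank symbol
frames = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a again (merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # a (counts as new because a blank intervened)
    [0.1, 0.2, 0.7],    # b
]
decoded = ctc_greedy_decode(frames, labels)
print(decoded)
```

Practical decoders usually replace the per-frame argmax with a beam search, optionally constrained by a lexicon or language model.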
for: The paper is written for the purpose of improving the efficiency of multichannel lossless coding, which is used in audio compression technology.
methods: The proposed method uses a signal model that predicts the upmix based on both past samples of the upmix and current time samples of the downmix. The model parameters are optimized using a general linear solver, and the prediction residual is Rice coded. Additionally, the use of an SVD projection prior to residual coding is proposed.
results: The proposed method shows improved compression ratios compared to various baselines, including FLAC, for the storage and transmission of immersive audio.
Abstract
In this paper, techniques for improving multichannel lossless coding are examined. A method is proposed for the simultaneous coding of two or more different renderings (mixes) of the same content. The signal model uses both past samples of the upmix, and the current time samples of downmix samples to predict the upmix. Model parameters are optimized via a general linear solver, and the prediction residual is Rice coded. Additionally, the use of an SVD projection prior to residual coding is proposed. A comparison is made against various baselines, including FLAC. The proposed methods show improved compression ratios for the storage and transmission of immersive audio.
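Summary
This paper examines techniques for improving multichannel lossless coding. We propose coding two or more different renderings (mixes) of the same content simultaneously. The signal model predicts the upmix from past samples of the upmix and current-time samples of the downmix. Model parameters are optimized with a general linear solver, and the prediction residual is Rice coded. We also propose applying an SVD projection before residual coding. Compared with various baselines, the proposed methods achieve better compression ratios for the storage and transmission of immersive audio.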
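The predict-then-Rice-code pipeline can be illustrated with a toy example. This is a sketch under stated assumptions: the one-tap predictor with fixed coefficients `a` and `b` is hypothetical (the paper optimizes its model parameters with a linear solver), the Rice parameter `k` is chosen by hand, and the bitstream is a string of `0`/`1` characters for readability.

```python
def zigzag(r):
    """Map a signed residual to a non-negative integer (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return 2 * r if r >= 0 else -2 * r - 1

def unzigzag(u):
    """Inverse of zigzag."""
    return u // 2 if u % 2 == 0 else -((u + 1) // 2)

def rice_encode(residuals, k):
    """Rice code: quotient in unary (ones then a zero), remainder in k binary bits."""
    bits = []
    for r in residuals:
        u = zigzag(r)
        q, rem = u >> k, u & ((1 << k) - 1)
        bits.append("1" * q + "0")
        if k:
            bits.append(format(rem, f"0{k}b"))
    return "".join(bits)

def rice_decode(bits, k, count):
    """Decode `count` residuals from a Rice-coded bit string."""
    out, pos = [], 0
    for _ in range(count):
        q = 0
        while bits[pos] == "1":
            q += 1
            pos += 1
        pos += 1  # skip the terminating zero of the unary part
        rem = int(bits[pos:pos + k], 2) if k else 0
        pos += k
        out.append(unzigzag((q << k) | rem))
    return out

def predict(prev_up, cur_down, a=0.5, b=0.5):
    """Hypothetical predictor: past upmix sample plus current downmix sample."""
    return int(round(a * prev_up + b * cur_down))

def encode_upmix(up, down, k):
    residuals, prev = [], 0
    for u, d in zip(up, down):
        residuals.append(u - predict(prev, d))
        prev = u
    return rice_encode(residuals, k)

def decode_upmix(bits, down, k):
    out, prev = [], 0
    for r, d in zip(rice_decode(bits, k, len(down)), down):
        prev = r + predict(prev, d)
        out.append(prev)
    return out

up = [10, 12, 11, 9, 8]
down = [9, 11, 12, 10, 7]
bits = encode_upmix(up, down, k=2)
decoded_up = decode_upmix(bits, down, k=2)
print(bits, decoded_up == up)
```

Because the decoder already has the downmix and the previously decoded upmix samples, it can rebuild the same prediction and recover the upmix exactly from the residuals.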
MixRep: Hidden Representation Mixup for Low-Resource Speech Recognition
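methods: MixRep interpolates the feature dimensions of hidden representations in the neural network and can be applied to both the input and the output of each layer. The authors also combine mixup with a regularization along the time axis and apply the method to a Conformer encoder.
results: Experiments show that MixRep outperforms other regularization methods for low-resource ASR. Compared with a strong SpecAugment baseline, MixRep achieves relative WER reductions of 6.5% and 6.7% on the eval92 set and the Callhome part of the eval'2000 set.
Abstract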
In this paper, we present MixRep, a simple and effective data augmentation strategy based on mixup for low-resource ASR. MixRep interpolates the feature dimensions of hidden representations in the neural network that can be applied to both the acoustic feature input and the output of each layer, which generalizes the previous MixSpeech method. Further, we propose to combine the mixup with a regularization along the time axis of the input, which is shown as complementary. We apply MixRep to a Conformer encoder of an E2E LAS architecture trained with a joint CTC loss. We experiment on the WSJ dataset and subsets of the SWB dataset, covering reading and telephony conversational speech. Experimental results show that MixRep consistently outperforms other regularization methods for low-resource ASR. Compared to a strong SpecAugment baseline, MixRep achieves a +6.5% and a +6.7% relative WER reduction on the eval92 set and the Callhome part of the eval'2000 set.
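Summary
We present MixRep, a simple and effective mixup-based data augmentation strategy for low-resource ASR. MixRep interpolates the feature dimensions of hidden representations in the neural network and can be applied to both the acoustic feature input and the output of each layer, thus generalizing the earlier MixSpeech method. We further propose combining mixup with a regularization along the time axis, which proves complementary. We apply MixRep to a Conformer encoder trained with a joint CTC loss and experiment on the WSJ dataset and subsets of the SWB dataset, covering read and conversational telephone speech. MixRep consistently outperforms other regularization methods for low-resource ASR. Compared with a strong SpecAugment baseline, it achieves relative WER reductions of 6.5% and 6.7% on the eval92 set and the Callhome part of the eval'2000 set.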
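The core mixup operation on hidden representations can be sketched as follows. This is a generic illustration rather than the paper's code: the function names are hypothetical, both utterances are assumed to have the same number of frames, and the Beta(2, 2) sampling of the mixing weight is an assumed choice.

```python
import random

def mix_hidden(h_a, h_b, lam):
    """Interpolate two (time x feature) representations element by element."""
    if len(h_a) != len(h_b):
        raise ValueError("this sketch assumes equal-length sequences")
    return [
        [lam * x + (1 - lam) * y for x, y in zip(frame_a, frame_b)]
        for frame_a, frame_b in zip(h_a, h_b)
    ]

def mix_loss(loss_a, loss_b, lam):
    """Weight the two utterances' losses with the same mixing coefficient."""
    return lam * loss_a + (1 - lam) * loss_b

# Two toy hidden representations: 2 frames, 2 features each.
h_a = [[1.0, 2.0], [3.0, 4.0]]
h_b = [[5.0, 6.0], [7.0, 8.0]]

mixed = mix_hidden(h_a, h_b, lam=0.25)
print(mixed)

# In training, the weight would be drawn at random per batch (assumed Beta(2, 2)).
rng = random.Random(0)
lam = rng.betavariate(2.0, 2.0)
print(0.0 <= lam <= 1.0)
```

The same function applies whether `h_a` and `h_b` are input acoustic features or the output of an intermediate layer, which is the sense in which mixing hidden representations generalizes input-level mixup.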
Relative Transfer Function Vector Estimation for Acoustic Sensor Networks Exploiting Covariance Matrix Structure
paper_authors: Wiebke Middelberg, Henri Gode, Simon Doclo
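for: This paper addresses noise reduction in acoustic environments with multiple noise sources.
methods: The paper proposes two relative transfer function (RTF) vector estimation methods: one modifies covariance whitening by using only the diagonal blocks of the noise covariance matrix, and the other uses only the off-diagonal blocks of the noisy covariance matrix.
results: With the estimated RTF vectors applied to real-world recordings in a reverberant environment with multiple noise sources, simulations show that the modified CW method gives a slightly better SNR improvement than the CW method, while the off-diagonal selection method outperforms a biased RTF vector estimate.
Abstract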
In many multi-microphone algorithms for noise reduction, an estimate of the relative transfer function (RTF) vector of the target speaker is required. The state-of-the-art covariance whitening (CW) method estimates the RTF vector as the principal eigenvector of the whitened noisy covariance matrix, where whitening is performed using an estimate of the noise covariance matrix. In this paper, we consider an acoustic sensor network consisting of multiple microphone nodes. Assuming uncorrelated noise between the nodes but not within the nodes, we propose two RTF vector estimation methods that leverage the block-diagonal structure of the noise covariance matrix. The first method modifies the CW method by considering only the diagonal blocks of the estimated noise covariance matrix. In contrast, the second method only considers the off-diagonal blocks of the noisy covariance matrix, but cannot be solved using a simple eigenvalue decomposition. When applying the estimated RTF vector in a minimum variance distortionless response beamformer, simulation results for real-world recordings in a reverberant environment with multiple noise sources show that the modified CW method performs slightly better than the CW method in terms of SNR improvement, while the off-diagonal selection method outperforms a biased RTF vector estimate obtained as the principal eigenvector of the noisy covariance matrix.
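Summary
Many multi-microphone noise reduction algorithms require an estimate of the relative transfer function (RTF) vector of the target speaker. The state-of-the-art covariance whitening (CW) method estimates the RTF vector as the principal eigenvector of the whitened noisy covariance matrix. In this paper we consider an acoustic sensor network consisting of multiple microphone nodes. Assuming that noise is uncorrelated between nodes but not within a node, we propose two RTF vector estimation methods that exploit the block-diagonal structure of the noise covariance matrix. The first modifies the CW method by using only the diagonal blocks of the estimated noise covariance matrix. The second uses only the off-diagonal blocks of the noisy covariance matrix, but cannot be solved with a simple eigenvalue decomposition. With the estimated RTF vector applied in a minimum variance distortionless response beamformer, simulations on real-world recordings in a reverberant environment with multiple noise sources show that the modified CW method yields a slightly better SNR improvement than the CW method, while the off-diagonal selection method outperforms a biased RTF vector estimate.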
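For readers unfamiliar with covariance whitening, the standard form of the estimator can be written out as below. The notation is an assumption made for this sketch (it is not taken from the paper): $\mathbf{R}_y$ and $\mathbf{R}_n$ are the noisy and noise covariance matrices, $\phi_s$ is the target power, $\mathbf{h}$ is the RTF vector, $\mathbf{e}_{\mathrm{ref}}$ selects the reference microphone, and $K$ is the number of nodes.

```latex
\begin{align}
\mathbf{R}_y &= \phi_s\,\mathbf{h}\mathbf{h}^{H} + \mathbf{R}_n,
  \qquad \mathbf{R}_n = \mathbf{L}\mathbf{L}^{H}, \\
\mathbf{R}_y^{w} &= \mathbf{L}^{-1}\mathbf{R}_y\mathbf{L}^{-H}
  = \phi_s\,(\mathbf{L}^{-1}\mathbf{h})(\mathbf{L}^{-1}\mathbf{h})^{H} + \mathbf{I}, \\
\hat{\mathbf{h}}_{\mathrm{CW}} &=
  \frac{\mathbf{L}\,\mathbf{u}_{\max}}
       {\mathbf{e}_{\mathrm{ref}}^{T}\,\mathbf{L}\,\mathbf{u}_{\max}},
  \qquad \mathbf{u}_{\max} = \text{principal eigenvector of } \mathbf{R}_y^{w}, \\
\tilde{\mathbf{R}}_n &= \operatorname{blkdiag}\!\left(
  \mathbf{R}_{n,11}, \dots, \mathbf{R}_{n,KK}\right), \\
\mathbf{w}_{\mathrm{MVDR}} &=
  \frac{\mathbf{R}_n^{-1}\,\hat{\mathbf{h}}}
       {\hat{\mathbf{h}}^{H}\,\mathbf{R}_n^{-1}\,\hat{\mathbf{h}}}.
\end{align}
```

The second line shows why the principal eigenvector of the whitened matrix is proportional to $\mathbf{L}^{-1}\mathbf{h}$, so de-whitening with $\mathbf{L}$ and normalizing by the reference entry recovers the RTF vector. The fourth line expresses the modification described in the text: whitening with only the diagonal blocks of the estimated noise covariance matrix, which is consistent with noise being uncorrelated between nodes.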
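methods: The paper combines machine learning with spatial stochastic modeling, using the latter to generate synthetic training data and thereby sidestep the difficulties of 3D imaging.
results: The paper presents a method based on machine learning and spatial stochastic modeling that predicts 3D structures from 2D images. It also performs an error analysis to assess the accuracy of these predictions.
Abstract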
The structural characterization of hetero-aggregates in 3D is of great interest, e.g., for deriving process-structure or structure-property relationships. However, since 3D imaging techniques are often difficult to perform as well as time and cost intensive, a characterization of hetero-aggregates based on 2D image data is desirable, but often non-trivial. To overcome the issues of characterizing 3D structures from 2D measurements, a method is presented that relies on machine learning combined with methods of spatial stochastic modeling, where the latter are utilized for the generation of synthetic training data. This kind of training data has the advantage that time-consuming experiments for the synthesis of differently structured materials followed by their 3D imaging can be avoided. More precisely, a parametric stochastic 3D model is presented, from which a wide spectrum of virtual hetero-aggregates can be generated. Additionally, the virtual structures are passed to a physics-based simulation tool in order to generate virtual scanning transmission electron microscopy (STEM) images. The preset parameters of the 3D model together with the simulated STEM images serve as a database for the training of convolutional neural networks, which can be used to determine the parameters of the underlying 3D model and, consequently, to predict 3D structures of hetero-aggregates from 2D STEM images. Furthermore, an error analysis is performed to evaluate the prediction power of the trained neural networks with respect to structural descriptors, e.g. the hetero-coordination number.
Summary
The structural characterization of hetero-aggregates in 3D is of great interest, for example for deriving process-structure or structure-property relationships. Because 3D imaging techniques are often difficult and time-consuming, characterizing hetero-aggregates from 2D image data is desirable but challenging. To overcome the limitations of characterizing 3D structures from 2D measurements, a method is proposed that combines machine learning with spatial stochastic modeling. The stochastic model supplies synthetic training data, which avoids time-consuming experiments to synthesize differently structured materials and image them in 3D. Specifically, a parametric stochastic 3D model is presented from which a wide spectrum of virtual hetero-aggregates can be generated. The virtual structures are then passed to a physics-based simulation tool to generate virtual scanning transmission electron microscopy (STEM) images. The preset parameters of the 3D model and the simulated STEM images serve as a database for training convolutional neural networks, which can determine the parameters of the underlying 3D model and thus predict 3D structures of hetero-aggregates from 2D STEM images. An error analysis evaluates the predictive power of the trained networks with respect to structural descriptors such as the hetero-coordination number.
Learning to recognize occluded and small objects with partial inputs
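results: Experiments show that MSL is competitive with previous state-of-the-art methods and is simple and quick to apply to multi-label image recognition tasks. MSL is also shown to be robust to random masking and effective at recognizing non-masked objects. Code and pretrained models are available on GitHub.
Abstract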
Recognizing multiple objects in an image is challenging due to occlusions, and becomes even more so when the objects are small. While promising, existing multi-label image recognition models do not explicitly learn context-based representations, and hence struggle to correctly recognize small and occluded objects. Intuitively, recognizing occluded objects requires knowledge of partial input, and hence context. Motivated by this intuition, we propose Masked Supervised Learning (MSL), a single-stage, model-agnostic learning paradigm for multi-label image recognition. The key idea is to learn context-based representations using a masked branch and to model label co-occurrence using label consistency. Experimental results demonstrate the simplicity, applicability and more importantly the competitive performance of MSL against previous state-of-the-art methods on standard multi-label image recognition benchmarks. In addition, we show that MSL is robust to random masking and demonstrate its effectiveness in recognizing non-masked objects. Code and pretrained models are available on GitHub.
Summary
Recognizing multiple objects in an image is difficult, especially when the objects are small. Existing multi-label image recognition models are promising, but they do not explicitly learn context-based representations and therefore perform poorly on small and occluded objects. Motivated by the intuition that recognizing occluded objects requires knowledge of partial input, we propose Masked Supervised Learning (MSL), a single-stage, model-agnostic learning paradigm. The key idea is to learn context-based representations with a masked branch and to model label co-occurrence through label consistency. Experiments demonstrate the simplicity and applicability of MSL and, more importantly, its competitive performance against previous methods. MSL is also robust to random masking and performs well on non-masked objects. Code and pretrained models are available on GitHub.
GPT-4 Vision on Medical Image Classification – A Case Study on COVID-19 Dataset
results: The study finds that using GPT-4V markedly improves image classification accuracy, indicating its potential value for COVID-19 image classification.
Abstract
This technical report delves into the application of GPT-4 Vision (GPT-4V) in the nuanced realm of COVID-19 image classification, leveraging the transformative potential of in-context learning to enhance diagnostic processes.
Summary
This technical report explores the application of GPT-4 Vision (GPT-4V) to COVID-19 image classification, using the potential of in-context learning to improve diagnostic processes.
Knowledge-based in silico models and dataset for the comparative evaluation of mammography AI for a range of breast characteristics, lesion conspicuities and doses
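results: We release the M-SYNTH dataset, comprising cohorts with four breast fibroglandular density distributions imaged at different exposure levels using Monte Carlo x-ray simulations. AI model performance decreases as breast density increases and improves as mass density increases. Performance drops as exposure decreases, with the highest performance achieved at exposure levels below the nominal recommended dose.
Abstract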
To generate evidence regarding the safety and efficacy of artificial intelligence (AI) enabled medical devices, AI models need to be evaluated on a diverse population of patient cases, some of which may not be readily available. We propose an evaluation approach for testing medical imaging AI models that relies on in silico imaging pipelines in which stochastic digital models of human anatomy (in object space) with and without pathology are imaged using a digital replica imaging acquisition system to generate realistic synthetic image datasets. Here, we release M-SYNTH, a dataset of cohorts with four breast fibroglandular density distributions imaged at different exposure levels using Monte Carlo x-ray simulations with the publicly available Virtual Imaging Clinical Trial for Regulatory Evaluation (VICTRE) toolkit. We utilize the synthetic dataset to analyze AI model performance and find that model performance decreases with increasing breast density and increases with higher mass density, as expected. As exposure levels decrease, AI model performance drops with the highest performance achieved at exposure levels lower than the nominal recommended dose for the breast type.
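Summary
To generate evidence on the safety and efficacy of AI-enabled medical devices, AI models must be evaluated on a diverse population of patient cases. We propose an evaluation approach for medical imaging AI models based on in silico imaging pipelines, in which stochastic digital models of human anatomy are imaged to generate synthetic image datasets. Using Monte Carlo x-ray simulations with the VICTRE toolkit, we generate breast imaging datasets with different fibroglandular density distributions. Analyzing AI model performance on these data, we find that performance decreases as breast density increases and is highest at higher mass density. As exposure levels decrease, model performance drops, with the best performance obtained below the nominal recommended dose.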
Exploring Shape Embedding for Cloth-Changing Person Re-Identification via 2D-3D Correspondences
paper_authors: Yubin Wang, Huimin Yu, Yuming Yan, Shuyi Song, Biyang Liu, Yichong Lu
for: This paper addresses cloth-changing ReID, that is, person re-identification when people wear different clothes.
methods: The paper proposes a new shape embedding paradigm, Continuous Surface Correspondence Learning (CSCL), which uses pixel-to-vertex classification to establish continuous correspondences between a person image and a 3D human body model.
results: Experiments show that CSCL remarkably enhances the model's global understanding of human body shape and performs strongly on both cloth-changing and cloth-consistent ReID benchmarks.
Abstract
Cloth-Changing Person Re-Identification (CC-ReID) is a common and realistic problem since fashion constantly changes over time and people's aesthetic preferences are not set in stone. While most existing cloth-changing ReID methods focus on learning cloth-agnostic identity representations from coarse semantic cues (e.g. silhouettes and part segmentation maps), they neglect the continuous shape distributions at the pixel level. In this paper, we propose Continuous Surface Correspondence Learning (CSCL), a new shape embedding paradigm for cloth-changing ReID. CSCL establishes continuous correspondences between a 2D image plane and a canonical 3D body surface via pixel-to-vertex classification, which naturally aligns a person image to the surface of a 3D human model and simultaneously obtains pixel-wise surface embeddings. We further extract fine-grained shape features from the learned surface embeddings and then integrate them with global RGB features via a carefully designed cross-modality fusion module. The shape embedding paradigm based on 2D-3D correspondences remarkably enhances the model's global understanding of human body shape. To promote the study of ReID under clothing change, we construct 3D Dense Persons (DP3D), which is the first large-scale cloth-changing ReID dataset that provides densely annotated 2D-3D correspondences and a precise 3D mesh for each person image, while containing diverse cloth-changing cases over all four seasons. Experiments on both cloth-changing and cloth-consistent ReID benchmarks validate the effectiveness of our method.
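Summary
Cloth-changing person re-identification (CC-ReID) is a common and realistic problem, because fashion keeps changing and people's aesthetic preferences are not fixed. Most existing cloth-changing ReID methods learn cloth-agnostic identity representations from coarse semantic cues (such as silhouettes and part segmentation maps) but neglect the continuous shape distributions at the pixel level. We propose Continuous Surface Correspondence Learning (CSCL), a new shape embedding paradigm for cloth-changing ReID. CSCL establishes continuous correspondences between the 2D image plane and a canonical 3D body surface through pixel-to-vertex classification, which aligns a person image with the surface of a 3D human model and yields pixel-wise surface embeddings. We then extract fine-grained shape features from the learned surface embeddings and fuse them with global RGB features through a cross-modality fusion module. The shape embedding paradigm based on 2D-3D correspondences strengthens the model's global understanding of human body shape. To promote research on cloth-changing ReID, we construct 3D Dense Persons (DP3D), the first large-scale cloth-changing ReID dataset with densely annotated 2D-3D correspondences and a precise 3D mesh for each person image, covering diverse cloth-changing cases. Experiments on cloth-changing and cloth-consistent ReID benchmarks validate the method.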
Always Clear Days: Degradation Type and Severity Aware All-In-One Adverse Weather Removal
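results: Compared with existing state-of-the-art methods, the model performs significantly better across different weather restoration tasks while using fewer parameters. It can also restore images from unseen domains that combine multiple weather degradations, and the restoration level can be adjusted.
Abstract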
All-in-one adverse weather removal is an emerging topic in image restoration that aims to restore multiple weather degradations with a unified model. The challenges are twofold: first, discovering and handling the multi-domain property of the target distribution formed by multiple weather conditions; second, designing efficient and effective operations for different degradation types. Most prior works focus on the multiple domains caused by weather type. Inspired by the inter- and intra-domain adaptation literature, we observe that not only weather type but also weather severity introduces multiple domains within each weather type domain, which previous methods ignore and which further limits their performance. To this end, we propose a degradation type and severity aware model, called UtilityIR, for blind all-in-one bad weather image restoration. To extract weather information from a single image, we propose a novel Marginal Quality Ranking Loss (MQRL) and use a Contrastive Loss (CL) to guide weather severity and type extraction, and we leverage a bag of novel techniques such as Multi-Head Cross Attention (MHCA) and Local-Global Adaptive Instance Normalization (LG-AdaIN) to efficiently restore spatially varying weather degradation. The proposed method significantly outperforms the SOTA methods subjectively and objectively on different weather restoration tasks by a large margin, with fewer model parameters. It can even restore images from unseen domains that combine multiple degradations, and it can modulate the restoration level. Implementation code will be available at https://github.com/fordevoted/UtilityIR.
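Summary
All-in-one adverse weather removal is an emerging topic in image restoration whose goal is to restore images degraded by multiple kinds of weather with a single unified model. The challenge is twofold: discovering and handling the multi-domain nature of the target distribution, and designing efficient and effective operations for different degradation types. Most earlier work treats the multiple domains as arising from weather type, but we find that weather severity also introduces multiple domains within each weather type, which earlier methods overlook and which limits their performance. We therefore propose UtilityIR, a degradation type and severity aware model for blind all-in-one bad weather image restoration. To extract weather information from a single image, we propose a Marginal Quality Ranking Loss (MQRL) and use a Contrastive Loss (CL) to guide the extraction of weather severity and type, together with techniques such as Multi-Head Cross Attention (MHCA) and Local-Global Adaptive Instance Normalization (LG-AdaIN) to restore spatially varying weather degradation efficiently. The method outperforms state-of-the-art methods subjectively and objectively on different weather restoration tasks with fewer model parameters. It can even restore unseen images that combine multiple degradations, and the restoration level can be adjusted. The implementation code will be available at https://github.com/fordevoted/UtilityIR.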
Heterogeneous Federated Learning with Group-Aware Prompt Tuning
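results: The method lets a single global model adapt automatically to the local data distributions of different clients without local fine-tuning. Unlike alternative approaches, it bridges the gap between global and personalized local models in federated learning. Its effectiveness is validated through extensive experiments and ablation studies.
Abstract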
Transformers have achieved remarkable success in various machine-learning tasks, prompting their widespread adoption. In this paper, we explore their application in the context of federated learning (FL), with a particular focus on heterogeneous scenarios where individual clients possess diverse local datasets. To meet the computational and communication demands of FL, we leverage pre-trained Transformers and use an efficient prompt-tuning strategy. Our strategy introduces the concept of learning both shared and group prompts, enabling the acquisition of universal knowledge and group-specific knowledge simultaneously. Additionally, a prompt selection module assigns personalized group prompts to each input, aligning the global model with the data distribution of each client. This approach allows us to train a single global model that can automatically adapt to various local client data distributions without requiring local fine-tuning. In this way, our proposed method effectively bridges the gap between global and personalized local models in Federated Learning and surpasses alternative approaches that lack the capability to adapt to previously unseen clients. The effectiveness of our approach is rigorously validated through extensive experimentation and ablation studies.
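Summary
Transformers have achieved remarkable success and have been widely adopted. In this paper we explore their application to federated learning (FL), particularly in heterogeneous scenarios where clients hold diverse local datasets. To meet the computational and communication demands of FL, we use pre-trained Transformers with an efficient prompt-tuning strategy. The strategy learns both shared and group prompts, so that universal and group-specific knowledge are acquired at the same time. In addition, a prompt selection module assigns personalized group prompts to each input, aligning the global model with each client's data distribution. This lets us train a single global model that adapts automatically to different client data distributions without local fine-tuning. The proposed method thus bridges the gap between global and personalized local models and surpasses alternatives that cannot adapt to previously unseen clients. Its effectiveness is validated through extensive experiments and ablation studies.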
FOUND: Foot Optimization with Uncertain Normals for Surface Deformation Using Synthetic Data
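results: The paper shows that its surface normal predictor performs strongly on real images and that its optimization scheme outperforms state-of-the-art photogrammetry pipelines in the few-view setting.
Abstract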
Surface reconstruction from multi-view images is a challenging task, with solutions often requiring a large number of sampled images with high overlap. We seek to develop a method for few-view reconstruction, for the case of the human foot. To solve this task, we must extract rich geometric cues from RGB images, before carefully fusing them into a final 3D object. Our FOUND approach tackles this, with 4 main contributions: (i) SynFoot, a synthetic dataset of 50,000 photorealistic foot images, paired with ground truth surface normals and keypoints; (ii) an uncertainty-aware surface normal predictor trained on our synthetic dataset; (iii) an optimization scheme for fitting a generative foot model to a series of images; and (iv) a benchmark dataset of calibrated images and high resolution ground truth geometry. We show that our normal predictor outperforms all off-the-shelf equivalents significantly on real images, and our optimization scheme outperforms state-of-the-art photogrammetry pipelines, especially for a few-view setting. We release our synthetic dataset and baseline 3D scans to the research community.
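Summary
Surface reconstruction from multi-view images is a challenging task, and solutions usually require many sampled images with high overlap. We seek a few-view reconstruction method for the human foot. To solve this task, we need to extract rich geometric cues from RGB images and then carefully fuse them into a final 3D object. Our FOUND approach makes four contributions: (i) SynFoot, a synthetic dataset of 50,000 photorealistic foot images paired with ground truth surface normals and keypoints; (ii) an uncertainty-aware surface normal predictor trained on our synthetic dataset; (iii) an optimization scheme for fitting a generative foot model to a series of images; and (iv) a benchmark dataset of calibrated images and high-resolution ground truth geometry. We show that our normal predictor clearly outperforms all off-the-shelf equivalents on real images, and that our optimization scheme clearly outperforms current photogrammetry pipelines in the few-view setting. We release our synthetic dataset and baseline 3D scans to the research community.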
LipSim: A Provably Robust Perceptual Similarity Metric
for: This paper is written for researchers and practitioners interested in developing and applying perceptual similarity metrics, particularly those concerned with the vulnerability of these metrics to adversarial attacks.
methods: The paper uses an ensemble of ViT-based feature extractors and proposes a framework for training a robust perceptual similarity metric called LipSim, which leverages 1-Lipschitz neural networks as the backbone and provides provable guarantees.
results: The paper demonstrates the vulnerability of state-of-the-art perceptual similarity metrics to adversarial attacks and presents a comprehensive set of experiments showing the performance of LipSim in terms of natural and certified scores, as well as on the image retrieval application.
Abstract
Recent years have seen growing interest in developing and applying perceptual similarity metrics. Research has shown the superiority of perceptual metrics over pixel-wise metrics in aligning with human perception and serving as a proxy for the human visual system. On the other hand, as perceptual metrics rely on neural networks, there is a growing concern regarding their resilience, given the established vulnerability of neural networks to adversarial attacks. It is indeed logical to infer that perceptual metrics may inherit both the strengths and shortcomings of neural networks. In this work, we demonstrate the vulnerability of state-of-the-art perceptual similarity metrics based on an ensemble of ViT-based feature extractors to adversarial attacks. We then propose a framework to train a robust perceptual similarity metric called LipSim (Lipschitz Similarity Metric) with provable guarantees. By leveraging 1-Lipschitz neural networks as the backbone, LipSim provides guarded areas around each data point and certificates for all perturbations within an $\ell_2$ ball. Finally, a comprehensive set of experiments shows the performance of LipSim in terms of natural and certified scores and on the image retrieval application. The code is available at https://github.com/SaraGhazanfari/LipSim.
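Summary
Recent years have seen growing interest in developing and applying perceptual similarity metrics. Research shows that perceptual metrics align with human perception better than pixel-wise metrics and serve as a proxy for the human visual system. However, because perceptual metrics rely on neural networks, there is concern about their resilience, given the known vulnerability of neural networks to adversarial attacks. In this work we demonstrate the vulnerability to adversarial attacks of state-of-the-art perceptual similarity metrics based on an ensemble of ViT-based feature extractors. We then propose a framework for training a robust perceptual similarity metric called LipSim (Lipschitz Similarity Metric). Using 1-Lipschitz neural networks as the backbone, LipSim provides guarded areas around each data point and certificates for all perturbations within an $\ell_2$ ball. Finally, detailed experiments evaluate LipSim in terms of natural and certified scores and on the image retrieval application. The code is available at https://github.com/SaraGhazanfari/LipSim.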
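The reason a 1-Lipschitz backbone yields certificates can be seen from a short derivation. It assumes, purely for illustration, a distance defined as the $\ell_2$ distance between embeddings; the distance actually used by LipSim may be defined differently.

```latex
\begin{align}
&\text{1-Lipschitz backbone: } \|f(x) - f(x')\|_2 \le \|x - x'\|_2
  \quad \text{for all } x, x'. \\
&\text{Assumed distance: } d(x, y) = \|f(x) - f(y)\|_2. \\
&\bigl| d(x + \delta, y) - d(x, y) \bigr|
  \le \|f(x + \delta) - f(x)\|_2
  \le \|\delta\|_2 .
\end{align}
```

The first inequality in the last line is the reverse triangle inequality and the second is the Lipschitz property. Under this assumption, any perturbation with $\|\delta\|_2 \le \epsilon$ can change the distance by at most $\epsilon$, so a comparison between two candidates whose distances differ by more than $2\epsilon$ cannot be flipped by perturbing the reference within that ball.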
PlantPlotGAN: A Physics-Informed Generative Adversarial Network for Plant Disease Prediction
paper_authors: Felipe A. Lopes, Vasit Sagan, Flavio Esposito
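for: Monitoring plantations is crucial for crop management and healthy harvests, especially for detecting plant disease.
methods: The authors use multispectral images collected by unmanned aerial vehicles (UAVs) to aid plantation monitoring.
results: The PlantPlotGAN model generates high-quality synthetic multispectral images and improves the accuracy of prediction models for plant disease detection.
Abstract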
Monitoring plantations is crucial for crop management and producing healthy harvests. Unmanned Aerial Vehicles (UAVs) have been used to collect multispectral images that aid in this monitoring. However, given the number of hectares to be monitored and the limitations of flight, plant disease signals become visually clear only in the later stages of plant growth and only if the disease has spread throughout a significant portion of the plantation. This limited amount of relevant data hampers the prediction models, as the algorithms struggle to generalize patterns with unbalanced or unrealistic augmented datasets effectively. To address this issue, we propose PlantPlotGAN, a physics-informed generative model capable of creating synthetic multispectral plot images with realistic vegetation indices. These indices served as a proxy for disease detection and were used to evaluate if our model could help increase the accuracy of prediction models. The results demonstrate that the synthetic imagery generated from PlantPlotGAN outperforms state-of-the-art methods regarding the Fréchet inception distance. Moreover, prediction models achieve higher accuracy metrics when trained with synthetic and original imagery for earlier plant disease detection compared to the training processes based solely on real imagery.
Summary
Monitoring plantations is key to crop management and healthy harvests, and unmanned aerial vehicles (UAVs) are used to collect multispectral images that support it. Because of the area to be covered and the limitations of flight, plant disease signals become visually clear only late in plant growth, and only once the disease has spread across a significant portion of the plantation. The resulting shortage of relevant data hampers prediction models, which struggle to generalize from unbalanced or unrealistically augmented datasets. To address this, we propose PlantPlotGAN, a physics-informed generative model that creates synthetic multispectral plot images with realistic vegetation indices. These indices serve as a proxy for disease detection and are used to evaluate whether the model helps increase the accuracy of prediction models. The synthetic imagery generated by PlantPlotGAN outperforms state-of-the-art methods in terms of the Fréchet inception distance, and prediction models trained on synthetic plus original imagery detect plant disease earlier and more accurately than models trained on real imagery alone.
A Self-Supervised Approach to Land Cover Segmentation
results: After 10 epochs of fine-tuning, the model reached roughly 52% accuracy across 5 samples, indicating that automated labelling of high-resolution agricultural remote sensing imagery is feasible.
Abstract
Land use/land cover change (LULC) maps are integral resources in earth science and agricultural research. Due to the nature of such maps, the creation of LULC maps is often constrained by the time and human resources necessary to accurately annotate satellite imagery and remote sensing data. While computer vision models that perform semantic segmentation to create detailed labels from such data are not uncommon, little research has been done on self-supervised and unsupervised approaches to labelling LULC maps without the use of ground-truth masks. Here, we demonstrate a self-supervised method of land cover segmentation that has no need for high-quality ground truth labels. The proposed deep learning model employs a frozen pre-trained ViT backbone transferred from DINO in a STEGO architecture and is fine-tuned using a custom dataset consisting of very high resolution (VHR) satellite imagery. After only 10 epochs of fine-tuning, an accuracy of roughly 52% was observed across 5 samples, signifying the feasibility of self-supervised models for the automated labelling of VHR LULC maps.
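Summary
Land use/land cover change (LULC) maps are important resources in earth science and agricultural research, but creating them is limited by the time and personnel needed to annotate satellite imagery and remote sensing data accurately. Computer vision models that perform semantic segmentation to produce detailed labels are not uncommon, yet little work addresses labelling LULC maps without high-quality ground-truth labels. This paper proposes a labelling method that does not need such labels: a deep learning model with a frozen pre-trained ViT backbone, fine-tuned within a STEGO architecture. After 10 epochs of fine-tuning, the model reached roughly 52% accuracy on 5 samples, indicating that such models can automate the labelling of high-resolution LULC maps.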
Generative AI Model for Artistic Style Transfer Using Convolutional Neural Networks
results: Experimental results show the effectiveness and versatility of the method, including image synthesis across different styles and content.
Abstract
Artistic style transfer, a captivating application of generative artificial intelligence, involves fusing the content of one image with the artistic style of another to create unique visual compositions. This paper presents a comprehensive overview of a novel technique for style transfer using Convolutional Neural Networks (CNNs). By leveraging deep image representations learned by CNNs, we demonstrate how to separate and manipulate image content and style, enabling the synthesis of high-quality images that combine content and style in a harmonious manner. We describe the methodology, including content and style representations, loss computation, and optimization, and showcase experimental results highlighting the effectiveness and versatility of the approach across different styles and content.
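Summary
Artistic style transfer, an appealing application of generative artificial intelligence, combines the content of one image with the artistic style of another to create unique visual works. This paper presents a new method for style transfer based on Convolutional Neural Networks (CNNs). Using the deep image representations learned by CNNs, we show how to separate and manipulate image content and style in order to synthesize high-quality images in which the two are harmoniously combined. We describe the method, including content and style representations, loss computation, and optimization, and report experiments showing its effectiveness and versatility across different styles and content.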
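The loss computation named in this entry can be illustrated with a small sketch. It assumes the widely used Gram-matrix formulation of style, which the abstract does not confirm as the paper's choice; the names `gram_matrix`, `total_loss`, `alpha`, and `beta` are illustrative, and feature maps are plain nested lists rather than CNN activations.

```python
def gram_matrix(features):
    """Channel-by-channel correlations of a feature map.

    features: list of C channels, each a flat list of N activations.
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def mse(a, b):
    """Mean squared error between two equally shaped nested lists."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def total_loss(generated, content, style, alpha=1.0, beta=1000.0):
    """Weighted sum of a content term and a Gram-matrix style term.

    alpha and beta are hypothetical trade-off weights.
    """
    content_loss = mse(generated, content)
    style_loss = mse(gram_matrix(generated), gram_matrix(style))
    return alpha * content_loss + beta * style_loss
```

In an optimization loop, the generated image would be updated to reduce `total_loss`; raising `beta` relative to `alpha` pushes the result toward the style image.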
paper_authors: Jiang-Xin Shi, Tong Wei, Yuke Xiang, Yu-Feng Li
for: investigate the effectiveness of re-sampling in modern long-tail learning tasks
methods: experiments on two homogeneous datasets, context shift augmentation module to generate diverse training images for the tail class
results: proposed module can boost generalization and outperform other approaches, including class-balanced re-sampling, decoupled classifier re-training, and data augmentation methods
Abstract
Long-tail learning has received significant attention in recent years due to the challenge it poses with extremely imbalanced datasets. In these datasets, only a few classes (known as the head classes) have an adequate number of training samples, while the rest of the classes (known as the tail classes) are infrequent in the training data. Re-sampling is a classical and widely used approach for addressing class imbalance issues. Unfortunately, recent studies claim that re-sampling brings negligible performance improvements in modern long-tail learning tasks. This paper aims to investigate this phenomenon systematically. Our research shows that re-sampling can considerably improve generalization when the training images do not contain semantically irrelevant contexts. In other scenarios, however, it can learn unexpected spurious correlations between irrelevant contexts and target labels. We design experiments on two homogeneous datasets, one containing irrelevant context and the other not, to confirm our findings. To prevent the learning of spurious correlations, we propose a new context shift augmentation module that generates diverse training images for the tail class by maintaining a context bank extracted from the head-class images. Experiments demonstrate that our proposed module can boost the generalization and outperform other approaches, including class-balanced re-sampling, decoupled classifier re-training, and data augmentation methods. The source code is available at https://www.lamda.nju.edu.cn/code_CSA.ashx.
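Summary
Long-tail learning has drawn wide attention in recent years because of the challenge posed by extremely imbalanced datasets, in which only a few classes (head classes) have enough training samples while the remaining classes (tail classes) are rare in the training data. Re-sampling is a classical, widely used remedy for class imbalance, yet recent studies report that it brings no significant gain in modern long-tail learning tasks. This paper examines the phenomenon systematically. Our research shows that re-sampling substantially improves generalization when training images contain no semantically irrelevant context; otherwise it can learn unexpected spurious correlations between irrelevant context and target labels. Experiments on two homogeneous datasets, one with irrelevant context and one without, confirm these findings. To avoid learning spurious correlations, we propose a new context shift augmentation module that generates diverse training images for the tail classes by maintaining a context bank extracted from head-class images. Experiments show that the module improves generalization and outperforms class-balanced re-sampling, decoupled classifier re-training, and data augmentation methods. The source code is available at the URL given in the abstract.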
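For readers unfamiliar with the class-balanced re-sampling baseline mentioned in this entry, here is a minimal sketch. It shows only the generic idea (every class is drawn with equal total probability); it is not the paper's context shift augmentation module, and the function names are illustrative.

```python
import random
from collections import Counter


def class_balanced_weights(labels):
    """Per-sample weights so that every class has the same total probability."""
    counts = Counter(labels)
    num_classes = len(counts)
    return [1.0 / (num_classes * counts[label]) for label in labels]


def resample_indices(labels, k, seed=0):
    """Draw k training indices with class-balanced probabilities."""
    rng = random.Random(seed)
    weights = class_balanced_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=k)
```

With eight head samples and two tail samples, each tail sample receives four times the weight of a head sample, so both classes appear about equally often in the resampled stream.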
Edge AI-Based Vein Detector for Efficient Venipuncture in the Antecubital Fossa
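methods: The paper combines Near Infrared (NIR) imaging with deep learning (DL) techniques to segment forearm veins.
results: The paper introduces a new NIR-based forearm vein segmentation dataset and a modified U-Net architecture that locates veins specifically in the antecubital fossa region. It also tests four common embedded microcomputers and four common quantization modalities, and selects a Raspberry Pi 4B for the best balance between execution time and accuracy.
Abstract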
Assessing the condition and visibility of veins is a crucial step before obtaining intravenous access in the antecubital fossa, which is a common procedure to draw blood or administer intravenous therapies (IV therapies). Even though medical practitioners are highly skilled at intravenous cannulation, they usually struggle to perform the procedure in patients with low visible veins due to fluid retention, age, overweight, dark skin tone, or diabetes. Recently, several investigations proposed combining Near Infrared (NIR) imaging and deep learning (DL) techniques for forearm vein segmentation. Although they have demonstrated compelling results, their use has been rather limited owing to the portability and precision requirements to perform venipuncture. In this paper, we aim to contribute to bridging this gap using three strategies. First, we introduce a new NIR-based forearm vein segmentation dataset of 2,016 labelled images collected from 1,008 subjects with low visible veins. Second, we propose a modified U-Net architecture that locates veins specifically in the antecubital fossa region of the examined patient. Finally, a compressed version of the proposed architecture was deployed inside a bespoke, portable vein finder device after testing four common embedded microcomputers and four common quantization modalities. Experimental results showed that the model compressed with Dynamic Range Quantization and deployed on a Raspberry Pi 4B card produced the best execution time and precision balance, with 5.14 FPS and 0.957 of latency and Intersection over Union (IoU), respectively. These results show promising performance inside a resource-restricted low-cost device.
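Summary
Before drawing blood or administering intravenous therapies (IV therapies) in the antecubital fossa, medical practitioners need to assess the condition and visibility of the veins. Although they are highly skilled at intravenous cannulation, they usually struggle with patients whose veins are poorly visible. In recent years, several studies have combined Near Infrared (NIR) imaging with deep learning (DL) techniques to segment forearm veins. Despite compelling results, their use has been limited by the portability and precision that venipuncture requires. In this paper, we aim to help bridge this gap in three ways: (1) we provide a new NIR-based forearm vein segmentation dataset of 2,016 labelled images from 1,008 subjects with poorly visible veins; (2) we propose a modified U-Net architecture that locates veins specifically in the antecubital fossa region; (3) we deploy a compressed version of the proposed architecture in a bespoke, portable vein finder device, after testing four common embedded microcomputers and four common quantization modalities. Experiments show that the model compressed with Dynamic Range Quantization and deployed on a Raspberry Pi 4B gave the best balance between execution time and precision, with 5.14 FPS and an Intersection over Union (IoU) of 0.957. These results show promising performance on a resource-restricted, low-cost device.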
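The Intersection over Union (IoU) figure quoted in this entry is a standard overlap measure for segmentation masks. A minimal sketch follows, assuming binary masks stored as nested lists; the convention that two empty masks score 1.0 is an assumption of this sketch, not something the abstract states.

```python
def iou(pred_mask, true_mask):
    """Intersection over Union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0
```

For example, a prediction covering two pixels of which one is a true vein pixel, with no vein pixel missed, gives an IoU of 0.5.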
TBDLNet: a network for classifying multidrug-resistant and drug-sensitive tuberculosis
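methods: The model uses a pre-trained ResNet50 to extract features and three randomized neural networks (RNNs) to alleviate overfitting. The three RNNs are ensembled to improve robustness.
results: Under five-fold cross-validation, the model achieves 0.9822 accuracy, 0.9815 specificity, 0.9823 precision, 0.9829 sensitivity, and 0.9826 F1-score. TBDLNet is suitable for classifying multidrug-resistant and drug-sensitive tuberculosis; it can detect multidrug-resistant pulmonary tuberculosis early, which helps to adjust the treatment plan in time and improve the treatment effect.
Abstract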
This paper proposes applying a novel deep-learning model, TBDLNet, to recognize CT images to classify multidrug-resistant and drug-sensitive tuberculosis automatically. The pre-trained ResNet50 is selected to extract features. Three randomized neural networks are used to alleviate the overfitting problem. The ensemble of three RNNs is applied to boost the robustness via majority voting. The proposed model is evaluated by five-fold cross-validation. Five indexes are selected in this paper, which are accuracy, sensitivity, precision, F1-score, and specificity. The TBDLNet achieves 0.9822 accuracy, 0.9815 specificity, 0.9823 precision, 0.9829 sensitivity, and 0.9826 F1-score, respectively. The TBDLNet is suitable for classifying multidrug-resistant tuberculosis and drug-sensitive tuberculosis. It can detect multidrug-resistant pulmonary tuberculosis as early as possible, which helps to adjust the treatment plan in time and improve the treatment effect.
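Summary
This paper proposes a new deep learning model, TBDLNet, that recognizes CT images and automatically classifies them as multidrug-resistant or drug-sensitive tuberculosis. The model extracts features with a pre-trained ResNet50 and uses three randomized neural networks (RNNs) to alleviate overfitting; the three RNNs are ensembled by majority voting to improve robustness. The model is evaluated by five-fold cross-validation on five indexes: accuracy, sensitivity, precision, F1-score, and specificity. TBDLNet achieves 0.9822 accuracy, 0.9815 specificity, 0.9823 precision, 0.9829 sensitivity, and 0.9826 F1-score. It is suitable for classifying multidrug-resistant and drug-sensitive tuberculosis and can detect multidrug-resistant pulmonary tuberculosis early, which helps to adjust the treatment plan in time and improve the treatment effect.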
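Majority voting and the five reported indexes can be sketched in a few lines. The example below is generic: the three lists stand in for the outputs of three classifiers, the function names are illustrative, and class 1 is assumed to be the positive (multidrug-resistant) class.

```python
from collections import Counter


def majority_vote(votes):
    """Most frequent label among the votes for one sample."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(per_model_predictions):
    """Combine per-model prediction lists sample by sample."""
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, precision, and F1-score."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```

With three voters and two classes a tie cannot occur, which is one practical reason to ensemble an odd number of models.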
Artifact-Robust Graph-Based Learning in Digital Pathology
paper_authors: Saba Heidari Gheshlaghi, Milan Aryal, Nasim Yahyasoltani, Masoud Ganji
for: This paper aims to develop a novel robust learning approach to account for perturbations in whole slide images (WSIs) for prostate cancer diagnosis.
methods: The proposed approach uses graph convolutional networks (GCNs) to extract features from the graph representing WSI, followed by a denoiser and a transformer for classification.
results: The proposed model shows significant improvement in cancer diagnosis compared to non-robust algorithms, with accuracy and kappa scores improved by the denoiser and the use of GCNs.
Abstract
Whole slide images (WSIs) are digitized images of tissues placed on glass slides, produced with advanced scanners. The digital processing of WSIs is challenging as they are gigapixel images stored in a multi-resolution format. A common challenge with WSIs is that perturbations and artifacts are inevitable while storing the glass slides and digitizing them. These perturbations include motion, which often arises from slide movement during placement, and changes in hue and brightness due to variations in staining chemicals and the quality of digitizing scanners. In this work, a novel robust learning approach to account for these artifacts is presented. Due to the size and resolution of WSIs and to account for neighborhood information, graph-based methods are called for. We use a graph convolutional network (GCN) to extract features from the graph representing the WSI. Through a denoiser and pooling layer, the effects of perturbations in WSIs are controlled, and the output is followed by a transformer for the classification of different grades of prostate cancer. To compare the efficacy of the proposed approach, the model without the denoiser is trained and tested with WSIs without any perturbation, and then different perturbations are introduced in WSIs and passed through the network with the denoiser. The accuracy and kappa scores of the proposed model on a prostate cancer dataset, compared with non-robust algorithms, show significant improvement in cancer diagnosis.
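Summary
Whole slide images (WSIs) are digital images produced by scanning tissue samples on glass slides with advanced scanners. Processing them is challenging because they are gigapixel images stored in a multi-resolution format. A common problem is that perturbations and artifacts arise while the glass slides are stored and scanned, including motion and changes in hue and brightness caused by variations in staining chemicals and scanner quality. In this work, we propose a new robust learning method to handle these perturbations. Given the size and resolution of WSIs and the need to account for neighborhood information, we use a graph convolutional network (GCN) to extract features from the graph representing the WSI. A denoiser and pooling layer control the effect of the perturbations, and a transformer then classifies different grades of prostate cancer. To assess the proposed method, we train and test the model on unperturbed WSIs, then introduce different perturbations and pass the WSIs through the network. On a prostate cancer dataset, accuracy and kappa scores improve significantly over non-robust algorithms.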
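To make the feature-extraction step concrete, the sketch below implements one graph convolution in the common symmetric-normalization form, ReLU(D^{-1/2}(A + I)D^{-1/2} H W). The abstract does not say which GCN variant the paper uses, so this form is an assumption; matrices are nested lists and all names are illustrative.

```python
import math


def normalized_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One graph convolution: neighbourhood averaging, linear map, ReLU."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weight)]
```

In a WSI graph, each node would be an image patch and each edge a spatial neighbour relation, so a layer like this mixes every patch feature with those of its neighbours.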
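results: Extensive experiments on PNG datasets show that SS-PNG-NW+ performs comparably to fully-supervised models across all data ratios. Notably, with only 30% and 50% of the labeled data, SS-PNG-NW+ outperforms fully-supervised models by 0.8% and 1.1%, respectively. This shows that the proposed SS-PNG-NW+ improves the practicality of PNG tasks when annotations are limited.
Abstract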
Despite considerable progress, the advancement of Panoptic Narrative Grounding (PNG) remains hindered by costly annotations. In this paper, we introduce a novel Semi-Supervised Panoptic Narrative Grounding (SS-PNG) learning scheme, capitalizing on a smaller set of labeled image-text pairs and a larger set of unlabeled pairs to achieve competitive performance. Unlike visual segmentation tasks, PNG involves one pixel belonging to multiple open-ended nouns. As a result, existing multi-class based semi-supervised segmentation frameworks cannot be directly applied to this task. To address this challenge, we first develop a novel SS-PNG Network (SS-PNG-NW) tailored to the SS-PNG setting. We thoroughly investigate strategies such as Burn-In and data augmentation to determine the optimal generic configuration for the SS-PNG-NW. Additionally, to tackle the issue of imbalanced pseudo-label quality, we propose a Quality-Based Loss Adjustment (QLA) approach to adjust the semi-supervised objective, resulting in an enhanced SS-PNG-NW+. Employing our proposed QLA, we improve BCE Loss and Dice loss at pixel and mask levels, respectively. We conduct extensive experiments on PNG datasets, with our SS-PNG-NW+ demonstrating promising results comparable to fully-supervised models across all data ratios. Remarkably, our SS-PNG-NW+ outperforms fully-supervised models with only 30% and 50% supervision data, exceeding their performance by 0.8% and 1.1% respectively. This highlights the effectiveness of our proposed SS-PNG-NW+ in overcoming the challenges posed by limited annotations and enhancing the applicability of PNG tasks. The source code is available at https://github.com/nini0919/SSPNG.
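Summary
Despite considerable progress, the advancement of Panoptic Narrative Grounding (PNG) is still held back by costly annotations. In this paper, we introduce a new Semi-Supervised Panoptic Narrative Grounding (SS-PNG) learning scheme that uses a smaller set of labeled image-text pairs and a larger set of unlabeled pairs to achieve competitive performance. Unlike visual segmentation tasks, in PNG one pixel can belong to multiple open-ended nouns, so existing multi-class semi-supervised segmentation frameworks cannot be applied directly. To address this challenge, we first develop an SS-PNG Network (SS-PNG-NW) tailored to the SS-PNG setting and investigate strategies such as data augmentation to determine its best generic configuration. To handle imbalanced pseudo-label quality, we propose a Quality-Based Loss Adjustment (QLA) approach that adjusts the semi-supervised objective, yielding an enhanced SS-PNG-NW+. Extensive experiments on PNG datasets show that SS-PNG-NW+ is comparable to fully-supervised models across all data ratios and, with only 30% and 50% supervision data, exceeds them by 0.8% and 1.1%, respectively. This demonstrates the effectiveness of SS-PNG-NW+ in making PNG more applicable under limited annotations. The source code is available on GitHub: https://github.com/nini0919/SSPNG.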
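The two base losses that QLA adjusts, BCE at the pixel level and Dice at the mask level, can be written down compactly. The sketch below shows only the standard, unadjusted losses on flattened masks; the quality-based weighting itself is not described in enough detail in the abstract to reproduce, and `eps` and `smooth` are conventional stabilizers assumed here.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over pixels (probabilities are clipped)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """One minus the smoothed Dice overlap between prediction and target."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)
```

BCE scores every pixel independently, while Dice scores the mask as a whole, which is why the two are often combined for segmentation.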
Unsupervised Representation Learning for Diverse Deformable Shape Collections
results: The method achieves excellent reconstructions and produces more realistic and smoother interpolations than baseline approaches.
Abstract
We introduce a novel learning-based method for encoding and manipulating 3D surface meshes. Our method is specifically designed to create an interpretable embedding space for deformable shape collections. Unlike previous 3D mesh autoencoders that require meshes to be in a 1-to-1 correspondence, our approach is trained on diverse meshes in an unsupervised manner. Central to our method is a spectral pooling technique that establishes a universal latent space, breaking free from traditional constraints of mesh connectivity and shape categories. The entire process consists of two stages. In the first stage, we employ the functional map paradigm to extract point-to-point (p2p) maps between a collection of shapes in an unsupervised manner. These p2p maps are then utilized to construct a common latent space, which ensures straightforward interpretation and independence from mesh connectivity and shape category. Through extensive experiments, we demonstrate that our method achieves excellent reconstructions and produces more realistic and smoother interpolations than baseline approaches.
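Summary
We propose a new learning-based method for encoding and manipulating 3D surface meshes, designed to create an interpretable embedding space for deformable shape collections. Unlike previous 3D mesh autoencoders, our method does not require meshes to be in 1-to-1 correspondence and is trained on diverse meshes in an unsupervised manner. At its core is a spectral pooling technique that establishes a universal latent space, free from the traditional constraints of mesh connectivity and shape category. The process has two stages. In the first stage, we use the functional map paradigm to extract point-to-point (p2p) maps between a collection of shapes. These p2p maps are then used to build a common latent space, which makes interpretation straightforward and independent of mesh connectivity and shape category. Extensive experiments show that our method achieves excellent reconstructions and more realistic, smoother interpolations than baseline approaches.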
End-to-end Video Gaze Estimation via Capturing Head-face-eye Spatial-temporal Interaction Context
methods: MCGaze solves the clue localization of head, face, and eye jointly and optimizes them in a one-step way to seek optimal performance. During this process, context is exchanged among the head, face, and eye clues, so that gaze estimation simultaneously captures global clues from the head and face and local clues from the eye.
results: Experiments on the challenging Gaze360 dataset verify the superiority of the proposed method.
Abstract
In this letter, we propose a new method, Multi-Clue Gaze (MCGaze), to facilitate video gaze estimation by capturing the spatial-temporal interaction context among head, face, and eye in an end-to-end learning way, which has not yet received much attention. The main advantage of MCGaze is that the clue localization tasks for head, face, and eye can be solved jointly for gaze estimation in a single step, with joint optimization to seek optimal performance. During this process, spatial-temporal context is exchanged among the head, face, and eye clues. Accordingly, the final gazes obtained by fusing features from the various queries are aware of global clues from heads and faces and local clues from eyes simultaneously, which improves performance. Meanwhile, the one-step design also ensures high running efficiency. Experiments on the challenging Gaze360 dataset verify the superiority of our proposition. The source code will be released at https://github.com/zgchen33/MCGaze.
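Summary
In this letter, we propose a new method, Multi-Clue Gaze (MCGaze), for video gaze estimation that captures the spatial-temporal interaction context among head, face, and eye, an aspect that has not yet received much attention. Its main advantage is that clue localization for head, face, and eye is solved jointly, giving one-step gaze estimation with joint optimization for the best performance. During this process, spatial-temporal context is exchanged among head, face, and eye, so the final gaze can draw on global clues from the head and face and local clues from the eye at the same time, which improves performance. The one-step design also ensures high efficiency. Experiments on the Gaze360 dataset show that our proposal outperforms conventional methods. The source code will be released at https://github.com/zgchen33/MCGaze.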
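results: The method delivers high-quality denoising results across a range of settings at a fraction of the computational cost of the sampling-based unsupervised approach, and it avoids drawing a large number of samples.
Abstract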
Traditional supervised denoisers are trained using pairs of noisy input and clean target images. They learn to predict a central tendency of the posterior distribution over possible clean images. When, e.g., trained with the popular quadratic loss function, the network's output will correspond to the minimum mean square error (MMSE) estimate. Unsupervised denoisers based on Variational AutoEncoders (VAEs) have succeeded in achieving state-of-the-art results while requiring only unpaired noisy data as training input. In contrast to the traditional supervised approach, unsupervised denoisers do not directly produce a single prediction, such as the MMSE estimate, but allow us to draw samples from the posterior distribution of clean solutions corresponding to the noisy input. To approximate the MMSE estimate during inference, unsupervised methods have to create and draw a large number of samples - a computationally expensive process - rendering the approach inapplicable in many situations. Here, we present an alternative approach that trains a deterministic network alongside the VAE to directly predict a central tendency. Our method achieves results that surpass the results achieved by the unsupervised method at a fraction of the computational cost.
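Summary
Traditional supervised denoisers are trained on pairs of noisy inputs and clean target images and learn to predict a central tendency of the posterior distribution over possible clean images. Trained with the popular quadratic loss, for example, the network's output corresponds to the minimum mean square error (MMSE) estimate. Unsupervised denoisers based on Variational AutoEncoders (VAEs), by contrast, achieve state-of-the-art results without paired clean data. At inference, however, they do not produce a single prediction directly; instead they let us draw samples from the posterior distribution of clean solutions for the noisy input. Approximating the MMSE estimate therefore requires creating and drawing a large number of samples, a computationally expensive process that rules the approach out in many situations. In this paper, we present an alternative that trains a deterministic network alongside the VAE to predict a central tendency directly. Our method surpasses the results of the unsupervised method at a fraction of the computational cost.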
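The sampling cost discussed in this entry comes from the fact that the MMSE estimate is the posterior mean, which a sampler can only approximate by averaging many draws. The sketch below illustrates that averaging step; `toy_posterior_sampler` is a hypothetical stand-in for a trained VAE denoiser and simply adds Gaussian jitter to a flattened image.

```python
import random


def mmse_from_samples(samples):
    """Pixel-wise mean of posterior samples, approximating the MMSE estimate."""
    count = len(samples)
    return [sum(pixel_values) / count for pixel_values in zip(*samples)]


def toy_posterior_sampler(noisy, rng, spread=0.1):
    """Hypothetical stand-in for one draw from a VAE denoiser's posterior."""
    return [value + rng.gauss(0.0, spread) for value in noisy]


def approximate_mmse(noisy, num_samples, seed=0):
    """Draw num_samples posterior samples and average them."""
    rng = random.Random(seed)
    draws = [toy_posterior_sampler(noisy, rng) for _ in range(num_samples)]
    return mmse_from_samples(draws)
```

Every call to the sampler would be a full network pass in practice, which is the cost a deterministic network predicting the central tendency directly avoids.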
Classifier-head Informed Feature Masking and Prototype-based Logit Smoothing for Out-of-Distribution Detection
results: Experiments show that the method improves OOD detection accuracy and is compatible with existing methods, achieving new state-of-the-art performance. The code will be released publicly.
Abstract
Out-of-distribution (OOD) detection is essential when deploying neural networks in the real world. One main challenge is that neural networks often make overconfident predictions on OOD data. In this study, we propose an effective post-hoc OOD detection method based on a new feature masking strategy and a novel logit smoothing strategy. Feature masking determines the important features at the penultimate layer for each in-distribution (ID) class based on the weights of the ID class in the classifier head and masks the remaining features. Logit smoothing computes the cosine similarity between the feature vector of the test sample and the prototype of the predicted ID class at the penultimate layer and uses the similarity as an adaptive temperature factor on the logit to alleviate the network's overconfident predictions for OOD data. With these strategies, we can reduce feature activation of OOD data and enlarge the gap in OOD score between ID and OOD data. Extensive experiments on multiple standard OOD detection benchmarks demonstrate the effectiveness of our method and its compatibility with existing methods, with new state-of-the-art performance achieved by our method. The source code will be released publicly.
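Summary
Out-of-distribution (OOD) detection is essential when deploying neural networks in the real world. A main challenge is that neural networks often make overconfident predictions on OOD data. In this study, we propose an effective post-hoc OOD detection method based on a new feature masking strategy and a new logit smoothing strategy. Feature masking identifies the important penultimate-layer features for each in-distribution (ID) class from that class's weights in the classifier head and masks the remaining features. Logit smoothing computes the cosine similarity between the test sample's feature vector and the prototype of the predicted ID class at the penultimate layer, and uses this similarity as an adaptive temperature factor to reduce overconfident predictions on OOD data. Together, these strategies lower the feature activation of OOD data and widen the gap between ID and OOD data. Our method is compatible with existing methods and achieves new state-of-the-art performance on multiple standard OOD detection benchmarks. We will release the code publicly.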
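The two strategies can be sketched roughly as follows. This is a loose illustration built only from the description in the abstract: keeping the top-`k` weights as the mask, multiplying the maximum masked logit by the cosine similarity, and every function name are assumptions of this sketch, not the authors' released implementation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def top_k_mask(class_weights, k):
    """1.0 for the k largest classifier-head weights of a class, else 0.0."""
    order = sorted(range(len(class_weights)),
                   key=lambda i: class_weights[i], reverse=True)
    keep = set(order[:k])
    return [1.0 if i in keep else 0.0 for i in range(len(class_weights))]


def ood_score(feature, head_weights, prototypes, k):
    """Higher means more ID-like (assumed combination, see lead-in)."""
    logits = [dot(w, feature) for w in head_weights]
    predicted = max(range(len(logits)), key=lambda c: logits[c])
    mask = top_k_mask(head_weights[predicted], k)
    masked_feature = [f * m for f, m in zip(feature, mask)]
    masked_logits = [dot(w, masked_feature) for w in head_weights]
    similarity = cosine(feature, prototypes[predicted])
    return max(masked_logits) * similarity
```

A sample whose feature vector points away from the prototype of its predicted class gets a small similarity, so its score shrinks even if the raw logit is large.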
A Chebyshev Confidence Guided Source-Free Domain Adaptation Framework for Medical Image Segmentation
paper_authors: Jiesi Hu, Yanwu Yang, Xutao Guo, Jinghua Wang, Ting Ma
for: This paper focuses on addressing the accuracy deterioration issue of pseudo-labels (PLs) in source-free domain adaptation (SFDA) methods, which is a crucial problem in medical imaging scenarios due to privacy concerns.
methods: The proposed framework consists of three main components: (1) Chebyshev confidence guided SFDA, (2) confidence-guided denoising methods (direct denoising and prototypical denoising), and (3) a novel teacher-student joint training scheme (TJTS) with a confidence weighting module.
results: Extensive experiments in diverse domain scenarios demonstrate the effectiveness of the proposed framework, achieving superior performance compared to state-of-the-art SFDA methods. The proposed approach precisely estimates the reliability of PLs and generates high-quality PLs, leading to improved adaptation performance.
Abstract
Source-free domain adaptation (SFDA) aims to adapt models trained on a labeled source domain to an unlabeled target domain without the access to source data. In medical imaging scenarios, the practical significance of SFDA methods has been emphasized due to privacy concerns. Recent State-of-the-art SFDA methods primarily rely on self-training based on pseudo-labels (PLs). Unfortunately, PLs suffer from accuracy deterioration caused by domain shift, and thus limit the effectiveness of the adaptation process. To address this issue, we propose a Chebyshev confidence guided SFDA framework to accurately assess the reliability of PLs and generate self-improving PLs for self-training. The Chebyshev confidence is estimated by calculating probability lower bound of the PL confidence, given the prediction and the corresponding uncertainty. Leveraging the Chebyshev confidence, we introduce two confidence-guided denoising methods: direct denoising and prototypical denoising. Additionally, we propose a novel teacher-student joint training scheme (TJTS) that incorporates a confidence weighting module to improve PLs iteratively. The TJTS, in collaboration with the denoising methods, effectively prevents the propagation of noise and enhances the accuracy of PLs. Extensive experiments in diverse domain scenarios validate the effectiveness of our proposed framework and establish its superiority over state-of-the-art SFDA methods. Our paper contributes to the field of SFDA by providing a novel approach for precisely estimating the reliability of pseudo-labels and a framework for obtaining high-quality PLs, resulting in improved adaptation performance.
Summary
To address the accuracy deterioration of pseudo-labels (PLs) under domain shift, we propose a Chebyshev confidence guided SFDA framework to accurately assess the reliability of PLs and generate self-improving PLs for self-training. The Chebyshev confidence is estimated by calculating the probability lower bound of the PL confidence, given the prediction and the corresponding uncertainty. Leveraging the Chebyshev confidence, we introduce two confidence-guided denoising methods: direct denoising and prototypical denoising. Additionally, we propose a novel teacher-student joint training scheme (TJTS) that incorporates a confidence weighting module to improve PLs iteratively. The TJTS, in collaboration with the denoising methods, effectively prevents the propagation of noise and enhances the accuracy of PLs. Extensive experiments in diverse domain scenarios validate the effectiveness of our proposed framework and establish its superiority over state-of-the-art SFDA methods. Our paper contributes to the field of SFDA by providing a novel approach for precisely estimating the reliability of pseudo-labels and a framework for obtaining high-quality PLs, resulting in improved adaptation performance.
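One way to read "probability lower bound given the prediction and the corresponding uncertainty" is through the textbook Chebyshev inequality, P(|X - μ| ≥ m) ≤ σ²/m². The sketch below applies it with m set to the distance between the predicted probability and the decision threshold. This is an illustration of the inequality only; the paper's exact formula is not given in the abstract, and the function name and the 0.5 threshold are assumptions.

```python
def chebyshev_confidence(prob, std, decision=0.5):
    """Lower bound on the chance that a pseudo-label stays on its side.

    prob: predicted foreground probability (the mean).
    std: predictive uncertainty (standard deviation).
    decision: assumed decision threshold.
    """
    margin = abs(prob - decision)
    if margin == 0.0:
        # A prediction exactly on the threshold carries no guarantee.
        return 0.0
    # Chebyshev: P(|X - prob| < margin) >= 1 - (std / margin) ** 2.
    return max(0.0, 1.0 - (std / margin) ** 2)
```

A pixel predicted at 0.9 with standard deviation 0.1 is guaranteed to keep its label with probability at least about 0.94, whereas a prediction of 0.6 with standard deviation 0.2 gets no guarantee at all.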
Text Augmented Spatial-aware Zero-shot Referring Image Segmentation
paper_authors: Yucheng Suo, Linchao Zhu, Yi Yang
for: This work addresses the challenge of zero-shot referring image segmentation, that is, identifying the instance mask most related to a referring expression without training on pixel-level annotations.
methods: The method is built on the Text Augmented Spatial-aware (TAS) framework, which comprises a mask proposal network, a text-augmented visual-text matching score, and a spatial rectifier.
results: Extensive experiments on RefCOCO, RefCOCO+, and RefCOCOg show that the method clearly outperforms existing state-of-the-art methods on zero-shot referring image segmentation.
Abstract
In this paper, we study a challenging task of zero-shot referring image segmentation. This task aims to identify the instance mask that is most related to a referring expression without training on pixel-level annotations. Previous research takes advantage of pre-trained cross-modal models, e.g., CLIP, to align instance-level masks with referring expressions. Yet, CLIP only considers the global-level alignment of image-text pairs, neglecting fine-grained matching between the referring sentence and local image regions. To address this challenge, we introduce a Text Augmented Spatial-aware (TAS) zero-shot referring image segmentation framework that is training-free and robust to various visual encoders. TAS incorporates a mask proposal network for instance-level mask extraction, a text-augmented visual-text matching score for mining the image-text correlation, and a spatial rectifier for mask post-processing. Notably, the text-augmented visual-text matching score leverages a $P$-score and an $N$-score in addition to the typical visual-text matching score. The $P$-score is utilized to close the visual-text domain gap through a surrogate captioning model, where the score is computed between the surrogate model-generated texts and the referring expression. The $N$-score considers the fine-grained alignment of region-text pairs via negative phrase mining, encouraging the masked image to be repelled from the mined distracting phrases. Extensive experiments are conducted on various datasets, including RefCOCO, RefCOCO+, and RefCOCOg. The proposed method clearly outperforms state-of-the-art zero-shot referring image segmentation methods.
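Summary
In this paper, we study the challenging task of zero-shot referring image segmentation, which aims to identify the instance mask most related to a referring expression without pixel-level annotations. Previous work uses pre-trained cross-modal models such as CLIP to align instance-level masks with referring expressions. However, CLIP considers only the global alignment of image-text pairs and neglects fine-grained matching between image regions and complex sentences. To address this challenge, we propose a Text Augmented Spatial-aware (TAS) zero-shot referring image segmentation framework. TAS comprises a mask proposal network for instance-level mask extraction, a text-augmented visual-text matching score for mining the image-text correlation, and a spatial rectifier for mask post-processing. Notably, the text-augmented visual-text matching score uses a $P$-score and an $N$-score in addition to the typical visual-text matching score. The $P$-score closes the visual-text domain gap through a surrogate captioning model and is computed between the texts generated by the surrogate model and the referring expression. The $N$-score considers the fine-grained alignment of region-text pairs through negative phrase mining, which pushes the masked image away from the mined distracting phrases. Extensive experiments on RefCOCO, RefCOCO+, and RefCOCOg show that our method has a clear advantage in zero-shot referring image segmentation.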
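How the three scores might be combined to pick a mask can be sketched as below. The linear combination, the weights `alpha` and `lam`, and the dictionary layout of a candidate are all assumptions made for illustration; the abstract says only that the $P$-score and $N$-score are used in addition to the typical visual-text matching score.

```python
def combined_score(visual_text, p_score, n_score, alpha=1.0, lam=1.0):
    """Assumed form: reward the P-score, penalize the N-score."""
    return visual_text + alpha * p_score - lam * n_score


def select_mask(candidates, alpha=1.0, lam=1.0):
    """Return the mask_id of the candidate with the highest combined score."""
    best = max(
        candidates,
        key=lambda c: combined_score(
            c["visual_text"], c["p_score"], c["n_score"], alpha, lam),
    )
    return best["mask_id"]
```

In this toy form, a mask that matches the referring expression globally but also matches a mined distracting phrase is demoted in favour of one that matches only the intended phrase.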
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Real Image
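results: The model sets a new state-of-the-art LPIPS result in the zero-shot setting, even outperforming methods trained specifically on DTU. The paper also adapts the Mip-NeRF 360 dataset as a new benchmark for single-image novel view synthesis and shows strong performance in that setting.
Abstract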
We introduce a 3D-aware diffusion model, ZeroNVS, for single-image novel view synthesis for in-the-wild scenes. While existing methods are designed for single objects with masked backgrounds, we propose new techniques to address challenges introduced by in-the-wild multi-object scenes with complex backgrounds. Specifically, we train a generative prior on a mixture of data sources that capture object-centric, indoor, and outdoor scenes. To address issues from data mixture such as depth-scale ambiguity, we propose a novel camera conditioning parameterization and normalization scheme. Further, we observe that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds during distillation of 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views. Our model sets a new state-of-the-art result in LPIPS on the DTU dataset in the zero-shot setting, even outperforming methods specifically trained on DTU. We further adapt the challenging Mip-NeRF 360 dataset as a new benchmark for single-image novel view synthesis, and demonstrate strong performance in this setting. Our code and data are at http://kylesargent.github.io/zeronvs/
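Summary
We introduce ZeroNVS, a 3D-aware diffusion model for single-image novel view synthesis of in-the-wild scenes. Existing methods are usually designed for single objects with masked backgrounds; we propose new techniques for the challenges posed by in-the-wild multi-object scenes. Specifically, we train a generative prior on a mixture of data sources that capture object-centric, indoor, and outdoor scenes. To resolve the depth-scale ambiguity introduced by mixing data, we propose a new camera conditioning parameterization and normalization scheme. We also find that Score Distillation Sampling (SDS) tends to truncate the distribution of complex backgrounds when distilling 360-degree scenes, and propose "SDS anchoring" to improve the diversity of synthesized novel views. Our model sets a new state-of-the-art LPIPS result on the DTU dataset, even surpassing methods trained specifically on DTU. We further adopt the challenging Mip-NeRF 360 dataset as a new benchmark and achieve strong performance in this setting. Our code and data are available at http://kylesargent.github.io/zeronvs/.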
FaultSeg Swin-UNETR: Transformer-Based Self-Supervised Pretraining Model for Fault Recognition
for: The paper aims to improve the accuracy of seismic fault recognition by introducing a self-supervised learning approach using a large amount of unlabeled seismic data for pretraining.
methods: The proposed method utilizes the Swin Transformer model as the core network and employs the SimMIM pretraining task to capture unique features related to discontinuities in seismic data. Additionally, the authors refine the structure of the Swin-UNETR model to enable multiscale decoding and fusion for more effective fault detection.
results: The experimental results on the Thebe dataset demonstrate that the proposed method achieves state-of-the-art performance, as measured by the OIS and ODS metrics.
Abstract
This paper introduces an approach to enhance seismic fault recognition through self-supervised pretraining. Seismic fault interpretation holds great significance in the fields of geophysics and geology. However, conventional methods for seismic fault recognition encounter various issues, including dependence on data quality and quantity, as well as susceptibility to interpreter subjectivity. Currently, automated fault recognition methods proposed based on small synthetic datasets experience performance degradation when applied to actual seismic data. To address these challenges, we have introduced the concept of self-supervised learning, utilizing a substantial amount of relatively easily obtainable unlabeled seismic data for pretraining. Specifically, we have employed the Swin Transformer model as the core network and employed the SimMIM pretraining task to capture unique features related to discontinuities in seismic data. During the fine-tuning phase, inspired by edge detection techniques, we have also refined the structure of the Swin-UNETR model, enabling multiscale decoding and fusion for more effective fault detection. Experimental results demonstrate that our proposed method attains state-of-the-art performance on the Thebe dataset, as measured by the OIS and ODS metrics.
Multivessel Coronary Artery Segmentation and Stenosis Localisation using Ensemble Learning
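paper_authors: Muhammad Bilal, Dinis Martinho, Reiner Sim, Adnan Qayyum, Hunaid Vohra, Massimo Caputo, Taofeek Akinosho, Sofiat Abioye, Zaheer Khan, Waleed Niaz, Junaid Qadir
for: This study aims to provide a machine-learning-based automated diagnostic solution to help cardiologists diagnose coronary artery disease (CAD).
methods: The study uses an ensemble that combines multiple baseline models, trained with a strategy that progressively improves performance through successive stages including pretraining, multivessel segmentation, and fine-tuning.
results: Weighted ensembling doubles the predictive accuracy of the proposed solution, and the final prediction is further refined by correcting misclassified blobs. The solution achieves a mean F1 score of 37.69% for coronary artery segmentation and 39.41% for stenosis localisation.
Abstract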
Coronary angiography analysis is a common clinical task performed by cardiologists to diagnose coronary artery disease (CAD) through an assessment of atherosclerotic plaque's accumulation. This study introduces an end-to-end machine learning solution developed as part of our submission to the MICCAI 2023 Automatic Region-based Coronary Artery Disease diagnostics using x-ray angiography imagEs (ARCADE) challenge, which aims to benchmark solutions for multivessel coronary artery segmentation and potential stenotic lesion localisation from X-ray coronary angiograms. We adopted a robust baseline model training strategy to progressively improve performance, comprising five successive stages of binary class pretraining, multivessel segmentation, fine-tuning using class frequency weighted dataloaders, fine-tuning using F1-based curriculum learning strategy (F1-CLS), and finally multi-target angiogram view classifier-based collective adaptation. Unlike many other medical imaging procedures, this task exhibits a notable degree of interobserver variability, making it particularly amenable to automated analysis. Our ensemble model combines the outputs from six baseline models using the weighted ensembling approach, which our analysis shows doubles the predictive accuracy of the proposed solution. The final prediction was further refined, targeting the correction of misclassified blobs. Our solution achieved a mean F1 score of $37.69\%$ for coronary artery segmentation, and $39.41\%$ for stenosis localisation, positioning our team in the 5th position on both leaderboards. This work demonstrates the potential of automated tools to aid CAD diagnosis, guide interventions, and improve the accuracy of stent injections in clinical settings.
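Summary
Coronary angiography analysis is a common clinical task in which cardiologists diagnose coronary artery disease (CAD) by assessing the accumulation of atherosclerotic plaque. This study presents an end-to-end machine learning solution developed for the MICCAI 2023 Automatic Region-based Coronary Artery Disease diagnostics (ARCADE) challenge, which targets multivessel coronary artery segmentation and the localisation of potential stenotic lesions. We adopt a robust baseline training strategy with five successive stages: binary class pretraining, multivessel segmentation, fine-tuning with class frequency weighted dataloaders, fine-tuning with an F1-based curriculum learning strategy (F1-CLS), and finally multi-target angiogram view classifier-based collective adaptation. Unlike many other medical imaging procedures, this task shows notable interobserver variability, which makes it well suited to automated analysis. Our ensemble combines the outputs of six baseline models with a weighted ensembling approach, which our analysis shows doubles the predictive accuracy of the proposed solution. The final prediction is further refined to correct misclassified blobs. Our solution achieves a mean F1 score of 37.69% for coronary artery segmentation and 39.41% for stenosis localisation, placing fifth on both leaderboards. This work demonstrates the potential of automated tools to aid CAD diagnosis, guide interventions, and improve the accuracy of stent injections in clinical settings.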
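The weighted ensembling step and the F1 metric mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the six models' outputs are weighted or thresholded, so the function names, the per-pixel probability lists, the weights, and the 0.5 threshold below are all assumptions chosen for illustration, not the authors' implementation.

```python
def weighted_ensemble(prob_maps, weights, threshold=0.5):
    """Fuse per-model foreground probabilities into one binary mask.

    prob_maps: one flat list of pixel probabilities per model (hypothetical layout).
    weights:   one non-negative weight per model (hypothetical values).
    """
    total = sum(weights)
    n_pixels = len(prob_maps[0])
    fused = [
        sum(w * pm[i] for w, pm in zip(weights, prob_maps)) / total
        for i in range(n_pixels)
    ]
    return [1 if p >= threshold else 0 for p in fused]


def f1_score(pred, truth):
    """Pixel-level F1: harmonic mean of precision and recall."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    maps = [[0.9, 0.2, 0.6], [0.8, 0.4, 0.3], [0.1, 0.9, 0.7]]
    mask = weighted_ensemble(maps, weights=[2, 1, 1])
    print(mask, f1_score(mask, [1, 0, 0]))
```

Giving a stronger model a larger weight lets it outvote weaker ones on ambiguous pixels, which is one plausible reason an ensemble can beat each of its members.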
Shape-centered Representation Learning for Visible-Infrared Person Re-identification
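results: Experiments show that ScRL achieves remarkable performance on visible-infrared person re-identification, with Rank-1 (mAP) accuracy of 76.1%, 71.2%, 92.4% (72.6%, 52.9%, 86.7%) on the SYSU-MM01, HITSZ-VCM, and RegDB datasets, respectively.
Abstract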
Current Visible-Infrared Person Re-Identification (VI-ReID) methods prioritize extracting distinguishing appearance features, ignoring the natural resistance of body shape against modality changes. Initially, we gauged the discriminative potential of shapes by a straightforward concatenation of shape and appearance features. However, two unresolved issues persist in the utilization of shape features. One pertains to the dependence on auxiliary models for shape feature extraction in the inference phase, along with the errors in generated infrared shapes due to the intrinsic modality disparity. The other issue involves the inadequately explored correlation between shape and appearance features. To tackle the aforementioned challenges, we propose the Shape-centered Representation Learning framework (ScRL), which focuses on learning shape features and appearance features associated with shapes. Specifically, we devise the Shape Feature Propagation (SFP), facilitating direct extraction of shape features from original images with minimal complexity costs during inference. To restitute inaccuracies in infrared body shapes at the feature level, we present the Infrared Shape Restitution (ISR). Furthermore, to acquire appearance features related to shape, we design the Appearance Feature Enhancement (AFE), which accentuates identity-related features while suppressing identity-unrelated features guided by shape features. Extensive experiments are conducted to validate the effectiveness of the proposed ScRL. Achieving remarkable results, the Rank-1 (mAP) accuracy attains 76.1%, 71.2%, 92.4% (72.6%, 52.9%, 86.7%) on the SYSU-MM01, HITSZ-VCM, RegDB datasets respectively, outperforming existing state-of-the-art methods.
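Summary
Current visible-infrared person re-identification (VI-ReID) methods emphasize extracting discriminative appearance features and overlook the natural robustness of body shape to modality changes. We first assess the discriminative potential of shape by simply concatenating shape and appearance features. Two issues remain unresolved when using shape features, however. One is the reliance on auxiliary models to extract shape features at inference time, together with errors in the generated infrared shapes caused by the intrinsic modality gap. The other is that the correlation between shape and appearance features has not been adequately explored. To address these challenges, we propose the Shape-centered Representation Learning framework (ScRL), which focuses on learning shape features and the appearance features associated with shape. Specifically, we design Shape Feature Propagation (SFP), which extracts shape features directly from the original images and keeps inference complexity low. We also propose Infrared Shape Restitution (ISR) to correct errors in infrared body shapes at the feature level. In addition, we design Appearance Feature Enhancement (AFE), which, guided by shape features, accentuates identity-related features while suppressing identity-unrelated ones. Extensive experiments validate the effectiveness of ScRL: Rank-1 (mAP) accuracy reaches 76.1%, 71.2%, 92.4% (72.6%, 52.9%, 86.7%) on the SYSU-MM01, HITSZ-VCM, and RegDB datasets, respectively, outperforming existing state-of-the-art methods.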
Instance Segmentation under Occlusions via Location-aware Copy-Paste Data Augmentation
results: The method achieves an occlusion score (OM) of 0.533 in the MMSports 2023 DeepSportRadar competition, ranking first on the leaderboard.
Abstract
Occlusion is a long-standing problem in computer vision, particularly in instance segmentation. ACM MMSports 2023 DeepSportRadar has introduced a dataset that focuses on segmenting human subjects within a basketball context and a specialized evaluation metric for occlusion scenarios. Given the modest size of the dataset and the highly deformable nature of the objects to be segmented, this challenge demands the application of robust data augmentation techniques and wisely-chosen deep learning architectures. Our work (ranked 1st in the competition) first proposes a novel data augmentation technique, capable of generating more training samples with wider distribution. Then, we adopt a new architecture - Hybrid Task Cascade (HTC) framework with CBNetV2 as backbone and MaskIoU head to improve segmentation performance. Furthermore, we employ a Stochastic Weight Averaging (SWA) training strategy to improve the model's generalization. As a result, we achieve a remarkable occlusion score (OM) of 0.533 on the challenge dataset, securing the top-1 position on the leaderboard. Source code is available at https://github.com/nguyendinhson-kaist/MMSports23-Seg-AutoID.
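Summary
Occlusion is a long-standing problem in computer vision, particularly in instance segmentation. The ACM MMSports 2023 DeepSportRadar challenge introduced a dataset focused on segmenting people in basketball scenes, together with a specialized evaluation metric for occlusion scenarios. Because the dataset is small and the objects to be segmented are highly deformable, the challenge calls for robust data augmentation techniques and well-chosen deep learning architectures. Our work (ranked first in the competition) first proposes a new data augmentation technique that generates more training samples with a wider distribution. We then adopt the Hybrid Task Cascade (HTC) framework with CBNetV2 as the backbone and a MaskIoU head to improve segmentation performance. We also use a Stochastic Weight Averaging (SWA) training strategy to improve the model's generalization. As a result, we achieve an occlusion score (OM) of 0.533 on the challenge dataset, ranking first on the leaderboard. Source code is available at https://github.com/nguyendinhson-kaist/MMSports23-Seg-AutoID.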
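Stochastic Weight Averaging, which the abstract names as the training strategy, keeps a running mean of the model weights visited late in training. The sketch below shows only that averaging rule on plain lists of floats. The checkpoint values and the helper names are hypothetical, and the authors' actual schedule and framework are not described in the abstract.

```python
def swa_update(avg_weights, new_weights, n_averaged):
    """Fold one more checkpoint into the running average of weights."""
    return [
        (a * n_averaged + w) / (n_averaged + 1)
        for a, w in zip(avg_weights, new_weights)
    ]


def average_checkpoints(checkpoints):
    """Average a sequence of weight vectors one checkpoint at a time."""
    avg = list(checkpoints[0])
    for n, weights in enumerate(checkpoints[1:], start=1):
        avg = swa_update(avg, weights, n)
    return avg


if __name__ == "__main__":
    # Three hypothetical checkpoints of a two-parameter model.
    print(average_checkpoints([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))
```

The incremental form matters in practice because only the running average and a counter need to stay in memory, not every checkpoint.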
Diversifying Spatial-Temporal Perception for Video Domain Generalization
results: Extensive experiments on three benchmarks of different types demonstrate the effectiveness and versatility of the method.
Abstract
Video domain generalization aims to learn generalizable video classification models for unseen target domains by training in a source domain. A critical challenge of video domain generalization is to defend against the heavy reliance on domain-specific cues extracted from the source domain when recognizing target videos. To this end, we propose to perceive diverse spatial-temporal cues in videos, aiming to discover potential domain-invariant cues in addition to domain-specific cues. We contribute a novel model named Spatial-Temporal Diversification Network (STDN), which improves the diversity from both space and time dimensions of video data. First, our STDN proposes to discover various types of spatial cues within individual frames by spatial grouping. Then, our STDN proposes to explicitly model spatial-temporal dependencies between video contents at multiple space-time scales by spatial-temporal relation modeling. Extensive experiments on three benchmarks of different types demonstrate the effectiveness and versatility of our approach.
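Summary
Video domain generalization aims to learn video classification models in a source domain that generalize to unseen target domains. A key challenge is to avoid relying too heavily on source-domain-specific cues when recognizing target videos. To this end, we propose to perceive diverse spatial-temporal cues in videos so as to discover potential domain-invariant cues in addition to domain-specific ones. We present a new model, the Spatial-Temporal Diversification Network (STDN), which increases diversity along both the space and time dimensions of video data. First, STDN discovers various types of spatial cues within individual frames through spatial grouping. Then, STDN explicitly models spatial-temporal dependencies between video contents at multiple space-time scales through spatial-temporal relation modeling. Extensive experiments on three benchmarks of different types demonstrate the effectiveness and versatility of our approach.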
DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF
paper_authors: Chaowei Liu, Jichun Li, Yihua Teng, Chaoqun Wang, Nuo Xu, Jihao Wu, Dandan Tu
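for: Restoring multi-degraded colored document images to their potential pristine PDF versions.
methods: A transformer block built on a "Perceive-then-Restore" paradigm, combined with a GAN and pristine PDF images, to reduce degradation and improve visual quality.
results: Experiments show that DocStormer effectively restores multi-degraded colored document images, offering a new restoration method that fills a gap in current academic work.
Abstract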
For capturing colored document images, e.g. posters and magazines, it is common that multiple degradations such as shadows, wrinkles, etc., are simultaneously introduced due to external factors. Restoring multi-degraded colored document images is a great challenge, yet overlooked, as most existing algorithms focus on enhancing color-ignored document images via binarization. Thus, we propose DocStormer, a novel algorithm designed to restore multi-degraded colored documents to their potential pristine PDF. The contributions are: firstly, we propose a "Perceive-then-Restore" paradigm with a reinforced transformer block, which more effectively encodes and utilizes the distribution of degradations. Secondly, we are the first to utilize GAN and pristine PDF magazine images to narrow the distribution gap between the enhanced results and PDF images, in pursuit of less degradation and better visual quality. Thirdly, we propose a non-parametric strategy, PFILI, which enables a smaller training scale and larger testing resolutions with acceptable detail trade-off, while saving memory and inference time. Fourthly, we are the first to propose a novel Multi-Degraded Colored Document image Enhancing dataset, named MD-CDE, for both training and evaluation. Experimental results show that the DocStormer exhibits superior performance, capable of revitalizing multi-degraded colored documents into their potential pristine digital versions, which fills the current academic gap from the perspective of method, data, and task.
Summary
When capturing colored document images such as posters and magazines, external factors often introduce multiple degradations at once, such as shadows and wrinkles. Restoring multi-degraded colored document images is a major yet overlooked challenge, because most existing algorithms focus on enhancing color-ignored document images via binarization. We therefore propose DocStormer, a novel algorithm designed to restore multi-degraded colored documents to their potential pristine PDF. The contributions are as follows. First, we propose a "Perceive-then-Restore" paradigm with a reinforced transformer block, which more effectively encodes and utilizes the distribution of degradations. Second, we are the first to use a GAN and pristine PDF magazine images to narrow the distribution gap between the enhanced results and PDF images, in pursuit of less degradation and better visual quality. Third, we propose a non-parametric strategy, PFILI, which enables a smaller training scale and larger testing resolutions with an acceptable detail trade-off, while saving memory and inference time. Fourth, we are the first to propose a Multi-Degraded Colored Document image Enhancing dataset, named MD-CDE, for both training and evaluation. Experimental results show that DocStormer exhibits superior performance and can revitalize multi-degraded colored documents into their potential pristine digital versions, filling the current academic gap from the perspective of method, data, and task.
Impressions: Understanding Visual Semiotics and Aesthetic Impact
for: investigate the semiotics of images and how specific visual features and design choices can elicit specific emotions, thoughts, and beliefs.
methods: design an annotation task heavily inspired by image analysis techniques in the Visual Arts to collect 1,440 image-caption pairs and 4,320 unique annotations exploring impact, pragmatic image description, impressions, and aesthetic design choices.
results: existing multimodal image captioning and conditional generation models struggle to simulate plausible human responses to images, but this dataset significantly improves their ability to model impressions and aesthetic evaluations of images through fine-tuning and few-shot adaptation.
Abstract
Is aesthetic impact different from beauty? Is visual salience a reflection of its capacity for effective communication? We present Impressions, a novel dataset through which to investigate the semiotics of images, and how specific visual features and design choices can elicit specific emotions, thoughts and beliefs. We posit that the impactfulness of an image extends beyond formal definitions of aesthetics, to its success as a communicative act, where style contributes as much to meaning formation as the subject matter. However, prior image captioning datasets are not designed to empower state-of-the-art architectures to model potential human impressions or interpretations of images. To fill this gap, we design an annotation task heavily inspired by image analysis techniques in the Visual Arts to collect 1,440 image-caption pairs and 4,320 unique annotations exploring impact, pragmatic image description, impressions, and aesthetic design choices. We show that existing multimodal image captioning and conditional generation models struggle to simulate plausible human responses to images. However, this dataset significantly improves their ability to model impressions and aesthetic evaluations of images through fine-tuning and few-shot adaptation.
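Summary
Is aesthetic impact different from beauty? Is visual salience a reflection of an image's capacity for effective communication? We present Impressions, a new dataset for investigating the semiotics of images and how specific visual features and design choices can elicit specific emotions, thoughts, and beliefs. We argue that the impact of an image extends beyond formal definitions of aesthetics to its success as a communicative act, where style contributes to meaning as much as subject matter does. Prior image captioning datasets, however, are not designed to let models capture potential human impressions or interpretations of images. To fill this gap, we design an annotation task inspired by image analysis techniques from the Visual Arts and collect 1,440 image-caption pairs and 4,320 unique annotations exploring impact, pragmatic image description, impressions, and aesthetic design choices. We find that existing multimodal image captioning and conditional generation models struggle to simulate plausible human responses to images. This dataset, however, significantly improves their ability to model impressions and aesthetic evaluations of images through fine-tuning and few-shot adaptation.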
Reconstructive Latent-Space Neural Radiance Fields for Efficient 3D Scene Representations
paper_authors: Tristan Aumentado-Armstrong, Ashkan Mirzaei, Marcus A. Brubaker, Jonathan Kelly, Alex Levinshtein, Konstantinos G. Derpanis, Igor Gilitschenski
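for: This paper aims to improve the efficiency of Neural Radiance Fields (NeRFs) for 3D scene representation while maintaining high image quality.
results: Compared with standard colour-space NeRFs, the latent-space NeRF produces higher-quality novel views while rendering over three times faster. Shrinking the AE architecture controls the trade-off between efficiency and image quality and yields even faster rendering.
Abstract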
Neural Radiance Fields (NeRFs) have proven to be powerful 3D representations, capable of high quality novel view synthesis of complex scenes. While NeRFs have been applied to graphics, vision, and robotics, problems with slow rendering speed and characteristic visual artifacts prevent adoption in many use cases. In this work, we investigate combining an autoencoder (AE) with a NeRF, in which latent features (instead of colours) are rendered and then convolutionally decoded. The resulting latent-space NeRF can produce novel views with higher quality than standard colour-space NeRFs, as the AE can correct certain visual artifacts, while rendering over three times faster. Our work is orthogonal to other techniques for improving NeRF efficiency. Further, we can control the tradeoff between efficiency and image quality by shrinking the AE architecture, achieving over 13 times faster rendering with only a small drop in performance. We hope that our approach can form the basis of an efficient, yet high-fidelity, 3D scene representation for downstream tasks, especially when retaining differentiability is useful, as in many robotics scenarios requiring continual learning.
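paper_authors: Qiankun Liu, Yichen Li, Yuqi Jiang, Ying Fu
for: This study proposes a simple yet effective method for Generic Multi-Object Tracking (GMOT), so that dynamic objects can be detected and tracked in different scenes.
methods: The study uses Siamese-DETR, trained only on common detection datasets (e.g., COCO), and introduces a dynamic matching training strategy that makes full use of the provided annotations.
results: Experiments show that Siamese-DETR performs strongly on the GMOT-40 dataset, outperforming existing MOT methods.
Abstract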
The ability to detect and track the dynamic objects in different scenes is fundamental to real-world applications, e.g., autonomous driving and robot navigation. However, traditional Multi-Object Tracking (MOT) is limited to tracking objects belonging to the pre-defined closed-set categories. Recently, Open-Vocabulary MOT (OVMOT) and Generic MOT (GMOT) are proposed to track interested objects beyond pre-defined categories with the given text prompt and template image. However, the expensive well pre-trained (vision-)language model and fine-grained category annotations are required to train OVMOT models. In this paper, we focus on GMOT and propose a simple but effective method, Siamese-DETR, for GMOT. Only the commonly used detection datasets (e.g., COCO) are required for training. Different from existing GMOT methods, which train a Single Object Tracking (SOT) based detector to detect interested objects and then apply a data association based MOT tracker to get the trajectories, we leverage the inherent object queries in DETR variants. Specifically: 1) The multi-scale object queries are designed based on the given template image, which are effective for detecting different scales of objects with the same category as the template image; 2) A dynamic matching training strategy is introduced to train Siamese-DETR on commonly used detection datasets, which takes full advantage of provided annotations; 3) The online tracking pipeline is simplified through a tracking-by-query manner by incorporating the tracked boxes in previous frame as additional query boxes. The complex data association is replaced with the much simpler Non-Maximum Suppression (NMS). Extensive experimental results show that Siamese-DETR surpasses existing MOT methods on GMOT-40 dataset by a large margin.
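Summary
The ability to detect and track dynamic objects in different scenes is fundamental to real-world applications such as autonomous driving and robot navigation. However, traditional Multi-Object Tracking (MOT) can only track objects that belong to predefined closed-set categories. Recently, Open-Vocabulary MOT (OVMOT) and Generic MOT (GMOT) have been proposed to track objects of interest beyond predefined categories, given a text prompt and a template image. However, training OVMOT models requires an expensive, well-pretrained (vision-)language model and fine-grained category annotations. In this paper we focus on GMOT and propose a simple yet effective method, Siamese-DETR. Training requires only commonly used detection datasets (e.g., COCO). Unlike existing GMOT methods, which train a Single Object Tracking (SOT) based detector to find objects of interest and then apply a data-association-based MOT tracker to obtain trajectories, we leverage the object queries built into DETR variants. Specifically: 1) multi-scale object queries are designed from the given template image and are effective for detecting objects of different scales that share the template's category; 2) a dynamic matching training strategy is introduced to train Siamese-DETR on commonly used detection datasets while making full use of the provided annotations; 3) the online tracking pipeline is simplified in a tracking-by-query manner by adding the boxes tracked in the previous frame as additional query boxes, and the complex data association is replaced with the much simpler Non-Maximum Suppression (NMS). Experimental results show that Siamese-DETR surpasses existing MOT methods on the GMOT-40 dataset by a large margin.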
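The abstract says that data association is replaced by Non-Maximum Suppression. As a reminder of what that step does, here is a minimal sketch of standard greedy NMS over axis-aligned boxes. The `(x1, y1, x2, y2)` box format, the 0.5 IoU threshold, and the example boxes are assumptions for illustration, and this is the textbook procedure rather than the Siamese-DETR code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    print(nms(boxes, scores=[0.9, 0.8, 0.7]))
```

In a tracking-by-query setting, a box carried over from the previous frame and a fresh detection of the same object overlap heavily, so suppressing one of them leaves a single box per object.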
SmooSeg: Smoothness Prior for Unsupervised Semantic Segmentation
results: In our experiments, SmooSeg outperforms STEGO in pixel accuracy on three datasets: COCOStuff (+14.9%), Cityscapes (+13.0%), and Potsdam-3 (+5.7%).
Abstract
Unsupervised semantic segmentation is a challenging task that segments images into semantic groups without manual annotation. Prior works have primarily focused on leveraging prior knowledge of semantic consistency or priori concepts from self-supervised learning methods, which often overlook the coherence property of image segments. In this paper, we demonstrate that the smoothness prior, asserting that close features in a metric space share the same semantics, can significantly simplify segmentation by casting unsupervised semantic segmentation as an energy minimization problem. Under this paradigm, we propose a novel approach called SmooSeg that harnesses self-supervised learning methods to model the closeness relationships among observations as smoothness signals. To effectively discover coherent semantic segments, we introduce a novel smoothness loss that promotes piecewise smoothness within segments while preserving discontinuities across different segments. Additionally, to further enhance segmentation quality, we design an asymmetric teacher-student style predictor that generates smoothly updated pseudo labels, facilitating an optimal fit between observations and labeling outputs. Thanks to the rich supervision cues of the smoothness prior, our SmooSeg significantly outperforms STEGO in terms of pixel accuracy on three datasets: COCOStuff (+14.9%), Cityscapes (+13.0%), and Potsdam-3 (+5.7%).
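Summary
Unsupervised semantic segmentation is a challenging task that partitions images into semantic groups without manual annotation. Prior work has mainly relied on prior knowledge of semantic consistency or on prior concepts from self-supervised learning methods, and often overlooks the coherence property of image segments. In this paper we show that the smoothness prior, which asserts that features close together in a metric space share the same semantics, can greatly simplify segmentation by casting it as an energy minimization problem. Under this view, we propose a new method called SmooSeg, which uses self-supervised learning methods to model the closeness relationships among observations as smoothness signals. To discover coherent semantic segments effectively, we introduce a new smoothness loss that promotes piecewise smoothness within segments while preserving discontinuities across different segments. We also design an asymmetric teacher-student style predictor that generates smoothly updated pseudo labels, enabling an optimal fit between observations and labeling outputs. Thanks to the rich supervision cues of the smoothness prior, SmooSeg significantly outperforms STEGO in pixel accuracy on three datasets: COCOStuff (+14.9%), Cityscapes (+13.0%), and Potsdam-3 (+5.7%).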
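To make the energy-minimization view concrete, the sketch below scores a labeling with a generic pairwise smoothness energy: pairs of similar features pay a penalty when their labels differ, and pairs of dissimilar features earn a reward for differing. This is an illustrative form chosen for this note, not the SmooSeg loss itself. The cosine similarity, the `bias` offset of 0.5, and the toy features are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def smoothness_energy(features, labels, bias=0.5):
    """Sum (similarity - bias) over all pairs that receive different labels.

    Similar pairs (similarity > bias) raise the energy when split apart;
    dissimilar pairs lower it, which preserves boundaries between segments.
    """
    energy = 0.0
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            if labels[i] != labels[j]:
                energy += cosine(features[i], features[j]) - bias
    return energy


if __name__ == "__main__":
    feats = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(smoothness_energy(feats, [0, 0, 1]))  # coherent labeling
    print(smoothness_energy(feats, [0, 1, 1]))  # splits two identical features
```

A labeling that groups the two identical features gets the lower energy, which is the behaviour a smoothness prior is meant to encourage.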
Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering
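results: Tests on several standard benchmark datasets show that GJR modules improve image clustering, with gains in both speed and accuracy over conventional approaches. The paper also proposes a pretrain-based Grid Jigsaw Representation (pGJR), which improves clustering quality while converging quickly.
Abstract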
Unsupervised representation learning for image clustering is essential in computer vision. Although the advancement of visual models has improved image clustering with efficient visual representations, challenges still remain. Firstly, these features often lack the ability to represent the internal structure of images, hindering the accurate clustering of visually similar images. Secondly, the existing features tend to lack finer-grained semantic labels, limiting the ability to capture nuanced differences and similarities between images. In this paper, we first introduce Jigsaw based strategy method for image clustering called Grid Jigsaw Representation (GJR) with systematic exposition from pixel to feature in discrepancy against human and computer. We emphasize that this algorithm, which mimics human jigsaw puzzle, can effectively improve the model to distinguish the spatial feature between different samples and enhance the clustering ability. GJR modules are appended to a variety of deep convolutional networks and tested with significant improvements on a wide range of benchmark datasets including CIFAR-10, CIFAR-100/20, STL-10, ImageNet-10 and ImageNetDog-15. On the other hand, convergence efficiency is always an important challenge for unsupervised image clustering. Recently, pretrained representation learning has made great progress and released models can extract mature visual representations. It is obvious that use the pretrained model as feature extractor can speed up the convergence of clustering where our aim is to provide new perspective in image clustering with reasonable resource application and provide new baseline. Further, we innovate pretrain-based Grid Jigsaw Representation (pGJR) with improvement by GJR. The experiment results show the effectiveness on the clustering task with respect to the ACC, NMI and ARI three metrics and super fast convergence speed.
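Summary
Unsupervised representation learning for image clustering is essential in computer vision. Although advances in visual models have improved image clustering through efficient visual representations, challenges remain. First, these features often cannot represent the internal structure of images, which hinders accurate clustering of visually similar images. Second, existing features tend to lack finer-grained semantic labels, limiting the ability to capture subtle differences and similarities between images. In this paper we first introduce a jigsaw-based strategy for image clustering called Grid Jigsaw Representation (GJR), with a systematic exposition from pixel to feature. We emphasize that this algorithm, which mimics how humans solve jigsaw puzzles, helps the model distinguish spatial features between different samples and thereby improves clustering ability. GJR modules are appended to a variety of deep convolutional networks and tested on a wide range of benchmark datasets, including CIFAR-10, CIFAR-100/20, STL-10, ImageNet-10, and ImageNetDog-15. Convergence efficiency, however, is always an important challenge in unsupervised image clustering. Recently, pretrained representation learning has made great progress, and released models can extract mature visual representations. Using a pretrained model as the feature extractor can speed up the convergence of clustering. Our aim is to offer a new perspective on image clustering with reasonable resource use and to provide a new baseline. We further introduce the pretrain-based Grid Jigsaw Representation (pGJR), which builds on GJR. Experimental results show its effectiveness on the clustering task in terms of the ACC, NMI, and ARI metrics, together with very fast convergence.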
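The abstract describes GJR only as mimicking a human jigsaw puzzle, so the following is a generic sketch of the underlying grid-jigsaw operation: cut an image into a grid of patches, permute the patches, and reassemble. The function names, the nested-list image, and the seeded shuffle are assumptions made for illustration. How GJR actually applies the jigsaw from pixel to feature level is not specified in this text.

```python
import random


def split_into_grid(image, grid):
    """Cut a 2D list into grid x grid equally sized patches (row-major order)."""
    ph, pw = len(image) // grid, len(image[0]) // grid
    return [
        [row[gx * pw:(gx + 1) * pw] for row in image[gy * ph:(gy + 1) * ph]]
        for gy in range(grid)
        for gx in range(grid)
    ]


def assemble(patches, grid):
    """Inverse of split_into_grid: stitch row-major patches back into an image."""
    rows = []
    for gy in range(grid):
        row_patches = patches[gy * grid:(gy + 1) * grid]
        for r in range(len(row_patches[0])):
            rows.append([value for patch in row_patches for value in patch[r]])
    return rows


def grid_jigsaw(image, grid, seed=0):
    """Return the patch-shuffled image and the permutation that produced it."""
    patches = split_into_grid(image, grid)
    order = list(range(len(patches)))
    random.Random(seed).shuffle(order)
    return assemble([patches[i] for i in order], grid), order


if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    shuffled, order = grid_jigsaw(img, grid=2)
    print(order)
    print(shuffled)
```

Returning the permutation alongside the shuffled image keeps the operation invertible, which is useful if a model is asked to recover or reason about the original layout.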
What You See Is What You Detect: Towards better Object Densification in 3D detection
paper_authors: Tianran Liu, Zeping Zhang, Morteza Mousa Pasandi, Robert Laganiere
for: The paper is written for improving the accuracy of 3D object detection from Lidar signals, specifically addressing the issue of object completion in 3D perception.
methods: The paper proposes a visible part completion method that requires only a small number of prediction points, which is based on a mesh-deformation-based approach to augment the point set associated with visible foreground objects. The method consists of two parts: an Intra-Frustum Segmentation Transformer (IFST) and a Mesh Depth Completion Network(MDCNet).
results: The paper shows that the proposed method can provide up to 12.2% performance improvements over most of the public baseline models on the KITTI and NuScenes datasets, bringing the state of the art to a new level.
Abstract
Recent works have demonstrated the importance of object completion in 3D Perception from Lidar signal. Several methods have been proposed in which modules were used to densify the point clouds produced by laser scanners, leading to better recall and more accurate results. Pursuing in that direction, we present, in this work, a counter-intuitive perspective: the widely-used full-shape completion approach actually leads to a higher error-upper bound especially for far away objects and small objects like pedestrians. Based on this observation, we introduce a visible part completion method that requires only 11.3% of the prediction points that previous methods generate. To recover the dense representation, we propose a mesh-deformation-based method to augment the point set associated with visible foreground objects. Considering that our approach focuses only on the visible part of the foreground objects to achieve accurate 3D detection, we named our method What You See Is What You Detect (WYSIWYD). Our proposed method is thus a detector-independent model that consists of 2 parts: an Intra-Frustum Segmentation Transformer (IFST) and a Mesh Depth Completion Network (MDCNet) that predicts the foreground depth from mesh deformation. This way, our model does not require the time-consuming full-depth completion task used by most pseudo-lidar-based methods. Our experimental evaluation shows that our approach can provide up to 12.2% performance improvements over most of the public baseline models on the KITTI and NuScenes dataset bringing the state-of-the-art to a new level. The codes will be available at https://github.com/Orbis36/WYSIWYD
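Summary
Recent work has shown the importance of object completion in 3D perception from Lidar signals. Several methods use modules to densify the point clouds produced by laser scanners, which improves recall and accuracy. Continuing in this direction, we present a counter-intuitive perspective: the widely used full-shape completion approach actually leads to a higher error upper bound, especially for distant objects and small objects such as pedestrians. Based on this observation, we introduce a visible part completion method that requires only 11.3% of the prediction points that previous methods generate. To recover a dense representation, we propose a mesh-deformation-based method that augments the point set associated with visible foreground objects. Because our approach focuses only on the visible part of foreground objects to achieve accurate 3D detection, we call it What You See Is What You Detect (WYSIWYD). The method is detector-independent and consists of two parts: an Intra-Frustum Segmentation Transformer (IFST) and a Mesh Depth Completion Network (MDCNet) that predicts foreground depth from mesh deformation. As a result, our model does not need the time-consuming full-depth completion task used by most pseudo-lidar-based methods. Our experimental evaluation shows that the approach can provide up to 12.2% performance improvements over most public baseline models on the KITTI and NuScenes datasets, bringing the state of the art to a new level. The code will be available at https://github.com/Orbis36/WYSIWYD.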
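for: This paper studies how intelligent agents can use internal world models to make predictions about different courses of action at multiple time scales.
methods: The paper uses a probabilistic formalism called the Multi Time Scale State Space (MTS3) model, which performs efficient inference on multiple time scales for highly accurate long-horizon predictions and uncertainty estimates.
results: Experiments show that MTS3 performs strongly on several system identification benchmarks, including complex simulated and real-world dynamical systems.
Abstract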
Intelligent agents use internal world models to reason and make predictions about different courses of their actions at many scales. Devising learning paradigms and architectures that allow machines to learn world models that operate at multiple levels of temporal abstractions while dealing with complex uncertainty predictions is a major technical hurdle. In this work, we propose a probabilistic formalism to learn multi-time scale world models which we call the Multi Time Scale State Space (MTS3) model. Our model uses a computationally efficient inference scheme on multiple time scales for highly accurate long-horizon predictions and uncertainty estimates over several seconds into the future. Our experiments, which focus on action conditional long horizon future predictions, show that MTS3 outperforms recent methods on several system identification benchmarks including complex simulated and real-world dynamical systems.
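Summary
Intelligent agents use internal world models to reason and make predictions about different courses of action at many scales. Designing learning paradigms and architectures that let machines learn world models operating at multiple levels of temporal abstraction, while handling complex uncertainty predictions, is a major technical hurdle. In this work we propose a probabilistic formalism for learning multi-time-scale world models, which we call the Multi Time Scale State Space (MTS3) model. The model uses a computationally efficient inference scheme on multiple time scales to deliver highly accurate long-horizon predictions and uncertainty estimates several seconds into the future. Our experiments focus on action-conditional long-horizon prediction and show that MTS3 outperforms recent methods on several system identification benchmarks, including complex simulated and real-world dynamical systems.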
Sample based Explanations via Generalized Representers
results: The paper empirically compares different generalized representers on two image and two text classification datasets.
Abstract
We propose a general class of sample based explanations of machine learning models, which we term generalized representers. To measure the effect of a training sample on a model's test prediction, generalized representers use two components: a global sample importance that quantifies the importance of the training point to the model and is invariant to test samples, and a local sample importance that measures similarity between the training sample and the test point with a kernel. A key contribution of the paper is to show that generalized representers are the only class of sample based explanations satisfying a natural set of axiomatic properties. We discuss approaches to extract global importances given a kernel, and also natural choices of kernels given modern non-linear models. As we show, many popular existing sample based explanations could be cast as generalized representers with particular choices of kernels and approaches to extract global importances. Additionally, we conduct empirical comparisons of different generalized representers on two image and two text classification datasets.
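Summary
We propose a general class of sample-based explanations of machine learning models, which we call generalized representers. To measure the effect of a training sample on a model's test prediction, generalized representers use two components: a global sample importance and a local sample importance. The global sample importance quantifies the importance of the training point to the model and is invariant to test samples, while the local sample importance measures the similarity between the training sample and the test point with a kernel. A key contribution of the paper is to show that generalized representers are the only class of sample-based explanations that satisfies a natural set of axiomatic properties. We discuss how to extract global importances given a kernel, as well as natural choices of kernel for modern non-linear models. We also compare different generalized representers empirically on two image and two text classification datasets.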
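The two-component structure described above, a global importance per training sample multiplied by a kernel similarity to the test point, can be written in a few lines. In this sketch the RBF kernel, the `gamma` value, the training points, and the global importances are hypothetical stand-ins. The paper discusses several ways to choose kernels and extract importances, none of which is reproduced here.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Local similarity between a training point and a test point."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def representer_scores(train_points, global_importance, test_point, kernel=rbf_kernel):
    """Score each training sample as (global importance) * (kernel similarity)."""
    return [
        alpha * kernel(x, test_point)
        for x, alpha in zip(train_points, global_importance)
    ]


if __name__ == "__main__":
    train = [(0.0, 0.0), (1.0, 0.0)]
    alphas = [2.0, -1.0]  # hypothetical global importances
    print(representer_scores(train, alphas, test_point=(0.0, 0.0)))
```

Because the global term does not depend on the test point, it can be computed once per model, and only the kernel evaluations change from one explained prediction to the next.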
3DCoMPaT$^{++}$: An improved Large-scale 3D Vision Dataset for Compositional Recognition
results: The authors organized a data challenge at CVPR2023, report the winning method, and explore alternative techniques for improving GCR.
Abstract
In this work, we present 3DCoMPaT$^{++}$, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes carefully annotated at the part-instance level, alongside matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCoMPaT$^{++}$ covers 41 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views as well as four randomized views, leading to a total of 160 million renderings. Parts are segmented at the instance level, with coarse-grained and fine-grained semantic levels. We introduce a new task, called Grounded CoMPaT Recognition (GCR), to collectively recognize and ground compositions of materials on parts of 3D objects. Additionally, we report the outcomes of a data challenge organized at CVPR2023, showcasing the winning method's utilization of a modified PointNet$^{++}$ model trained on 6D inputs, and exploring alternative techniques for GCR enhancement. We hope our work will help ease future research on compositional 3D Vision.
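Summary
In this work, we present 3DCoMPaT$^{++}$, a multimodal 2D/3D dataset with 160 million rendered views of more than 10 million stylized 3D shapes annotated at the part-instance level, together with matching RGB point clouds, 3D textured meshes, depth maps, and segmentation masks. 3DCoMPaT$^{++}$ covers 41 shape categories, 275 fine-grained part categories, and 293 fine-grained material classes that can be compositionally applied to parts of 3D objects. We render a subset of one million stylized shapes from four equally spaced views and four randomized views, for a total of 160 million renderings. Parts are segmented at the instance level, with coarse-grained and fine-grained semantic levels. We introduce a new task, Grounded CoMPaT Recognition (GCR), to jointly recognize and ground compositions of materials on parts of 3D objects. We also report the results of the data challenge held at CVPR2023, showcasing the winning method, which uses a modified PointNet$^{++}$ model trained on 6D inputs, and exploring alternative techniques for GCR enhancement. We hope our work will help future research on compositional 3D vision.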
Deep Reinforcement Learning for Weapons to Targets Assignment in a Hypersonic strike
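results: Compared with the NLIP policy, the deep reinforcement learning policy is near-optimal with a 1000X reduction in computation time, allowing real-time operation that supports autonomous decision making in the mission end game.
Abstract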
We use deep reinforcement learning (RL) to optimize a weapons to target assignment (WTA) policy for multi-vehicle hypersonic strike against multiple targets. The objective is to maximize the total value of destroyed targets in each episode. Each randomly generated episode varies the number and initial conditions of the hypersonic strike weapons (HSW) and targets, the value distribution of the targets, and the probability of a HSW being intercepted. We compare the performance of this WTA policy to that of a benchmark WTA policy derived using non-linear integer programming (NLIP), and find that the RL WTA policy gives near optimal performance with a 1000X speedup in computation time, allowing real time operation that facilitates autonomous decision making in the mission end game.
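Summary
We use deep reinforcement learning (RL) to optimize a weapons to target assignment (WTA) policy for a multi-vehicle hypersonic strike against multiple targets, maximizing the total value of destroyed targets in each episode. Each randomly generated episode varies the number and initial conditions of the hypersonic strike weapons and targets, the value distribution of the targets, and the probability of a weapon being intercepted. We compare this WTA policy with a benchmark WTA policy derived using non-linear integer programming (NLIP) and find that the RL WTA policy gives near-optimal performance with a 1000X speedup in computation time, allowing real-time operation that facilitates autonomous decision making in the mission end game.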
How Well Do Feature-Additive Explainers Explain Feature-Additive Predictors?
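results: The study finds that these explainers perform poorly on symbolic expressions, neural networks, and generalized additive models, especially when the decision-making process involves feature interactions.
Abstract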
Surging interest in deep learning from high-stakes domains has precipitated concern over the inscrutable nature of black box neural networks. Explainable AI (XAI) research has led to an abundance of explanation algorithms for these black boxes. Such post hoc explainers produce human-comprehensible explanations, however, their fidelity with respect to the model is not well understood - explanation evaluation remains one of the most challenging issues in XAI. In this paper, we ask a targeted but important question: can popular feature-additive explainers (e.g., LIME, SHAP, SHAPR, MAPLE, and PDP) explain feature-additive predictors? Herein, we evaluate such explainers on ground truth that is analytically derived from the additive structure of a model. We demonstrate the efficacy of our approach in understanding these explainers applied to symbolic expressions, neural networks, and generalized additive models on thousands of synthetic and several real-world tasks. Our results suggest that all explainers eventually fail to correctly attribute the importance of features, especially when a decision-making process involves feature interactions.
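Summary
Surging interest in deep learning from high-stakes domains has raised concern over the inscrutable nature of black box neural networks. Explainable AI (XAI) research has produced an abundance of explanation algorithms for these black boxes. However, the fidelity of these post hoc explainers with respect to the model is not well understood, and explanation evaluation remains one of the most challenging issues in XAI. In this paper, we ask a targeted but important question: can popular feature-additive explainers (e.g., LIME, SHAP, SHAPR, MAPLE, and PDP) explain feature-additive predictors? We evaluate these explainers on symbolic expressions, neural networks, and generalized additive models across thousands of synthetic and several real-world tasks. Our results suggest that all explainers eventually fail to correctly attribute the importance of features, especially when the decision-making process involves feature interactions.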
MOSEL: Inference Serving Using Dynamic Modality Selection
paper_authors: Bodun Hu, Le Xu, Jeongyoon Moon, Neeraja J. Yadwadkar, Aditya Akella
for: The paper is written for researchers and developers who are working on machine learning models and inference-serving systems, and who are looking for ways to improve the efficiency and accuracy of their models.
methods: The paper proposes a new approach called modality selection, which involves adaptively choosing the most relevant modalities for an inference task based on user-defined performance and accuracy requirements. The proposed approach is implemented in an automated inference serving system called MOSEL.
results: The paper reports that MOSEL improves system throughput by 3.6 times with an accuracy guarantee and shortens job completion times by 11 times compared to a baseline approach. The results demonstrate the effectiveness of the modality selection approach and the benefits of using MOSEL for multi-modal machine learning models.
Abstract
Rapid advancements over the years have helped machine learning models reach previously hard-to-achieve goals, sometimes even exceeding human capabilities. However, to attain the desired accuracy, the model sizes and in turn their computational requirements have increased drastically. Thus, serving predictions from these models to meet any target latency and cost requirements of applications remains a key challenge, despite recent work in building inference-serving systems as well as algorithmic approaches that dynamically adapt models based on inputs. In this paper, we introduce a form of dynamism, modality selection, where we adaptively choose modalities from inference inputs while maintaining the model quality. We introduce MOSEL, an automated inference serving system for multi-modal ML models that carefully picks input modalities per request based on user-defined performance and accuracy requirements. MOSEL exploits modality configurations extensively, improving system throughput by 3.6$\times$ with an accuracy guarantee and shortening job completion times by 11$\times$.
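Summary
Machine learning models have advanced rapidly over the years, sometimes even exceeding human capabilities. However, to reach the desired accuracy, model sizes and their computational requirements have grown drastically. Serving predictions from these models while meeting the target latency and cost requirements of applications therefore remains a key challenge. In this paper, we introduce a form of dynamism, modality selection, in which we adaptively choose modalities from inference inputs while maintaining model quality. We introduce MOSEL, an automated inference serving system that carefully picks input modalities per request based on user-defined performance and accuracy requirements. MOSEL exploits modality configurations extensively, improving system throughput by 3.6$\times$ with an accuracy guarantee and shortening job completion times by 11$\times$.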
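The idea of picking input modalities per request under user-defined accuracy and performance requirements can be sketched as a lookup over profiled configurations. This is only a minimal illustration under stated assumptions: the profiling table, its field names, and the selection rule (lowest latency among configurations that meet the accuracy floor) are hypothetical and are not MOSEL's actual policy or API.

```python
def pick_modalities(configs, min_accuracy):
    """Return the lowest-latency configuration whose accuracy meets the requirement."""
    feasible = [c for c in configs if c["accuracy"] >= min_accuracy]
    if not feasible:
        return None  # no configuration satisfies the accuracy requirement
    return min(feasible, key=lambda c: c["latency_ms"])

# Hypothetical profiling table; the numbers are illustrative, not from the paper.
CONFIGS = [
    {"modalities": ("audio",), "accuracy": 0.81, "latency_ms": 12.0},
    {"modalities": ("video",), "accuracy": 0.88, "latency_ms": 40.0},
    {"modalities": ("audio", "video"), "accuracy": 0.93, "latency_ms": 55.0},
]

choice = pick_modalities(CONFIGS, min_accuracy=0.85)
print(choice["modalities"])
```

Dropping a modality trades accuracy for latency, so a request with a looser accuracy requirement can be served from a cheaper configuration.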
Weighted Sampled Split Learning (WSSL): Balancing Privacy, Robustness, and Fairness in Distributed Learning Environments
paper_authors: Manish Osti, Aashray Thakuri, Basheer Qolomany, Aos Mulahuwaish
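for: Improving privacy, robustness, and fairness in distributed machine learning systems.
methods: Uses weighted sampling and distributes the learning process across multiple clients to protect data privacy and improve model accuracy.
results: 1) higher model accuracy, 2) improved system robustness, 3) fairness maintained across client compositions.
Abstract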
This study presents Weighted Sampled Split Learning (WSSL), an innovative framework tailored to bolster privacy, robustness, and fairness in distributed machine learning systems. Unlike traditional approaches, WSSL disperses the learning process among multiple clients, thereby safeguarding data confidentiality. Central to WSSL's efficacy is its utilization of weighted sampling. This approach ensures equitable learning by tactically selecting influential clients based on their contributions. Our evaluation of WSSL spanned various client configurations and employed two distinct datasets: Human Gait Sensor and CIFAR-10. We observed three primary benefits: heightened model accuracy, enhanced robustness, and maintained fairness across diverse client compositions. Notably, our distributed frameworks consistently surpassed centralized counterparts, registering accuracy peaks of 82.63% and 75.51% for the Human Gait Sensor and CIFAR-10 datasets, respectively. These figures contrast with the top accuracies of 81.12% and 58.60% achieved by centralized systems. Collectively, our findings champion WSSL as a potent and scalable successor to conventional centralized learning, marking it as a pivotal stride forward in privacy-focused, resilient, and impartial distributed machine learning.
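The weighted sampling step mentioned in this abstract can be sketched as drawing a subset of clients with probability proportional to a per-client weight. How WSSL actually computes the weights from client contributions is not stated in the abstract, so the weights, the function name `sample_clients`, and the without-replacement scheme below are assumptions for illustration only.

```python
import random

def sample_clients(weights, k, seed=None):
    """Draw up to k distinct client ids; each draw is proportional to the remaining weights.

    `weights` maps client id -> positive weight (assumed to reflect contribution).
    """
    rng = random.Random(seed)
    remaining = dict(weights)
    chosen = []
    for _ in range(min(k, len(remaining))):
        ids = list(remaining)
        pick = rng.choices(ids, weights=[remaining[i] for i in ids], k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen

# Hypothetical contribution weights for three clients.
client_weights = {"c1": 0.6, "c2": 0.3, "c3": 0.1}
selected = sample_clients(client_weights, k=2, seed=0)
```

Sampling rather than always taking the top-weighted clients leaves low-weight clients a nonzero chance of being selected in each round.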
摘要
results: The results show that causalPIMA can learn an interpretable causal structure while discovering key features in a fully unsupervised setting. The algorithm was tested on a synthetic dataset and a scientific dataset.
Abstract
Causal representation learning algorithms discover lower-dimensional representations of data that admit a decipherable interpretation of cause and effect; as achieving such interpretable representations is challenging, many causal learning algorithms utilize elements indicating prior information, such as (linear) structural causal models, interventional data, or weak supervision. Unfortunately, in exploratory causal representation learning, such elements and prior information may not be available or warranted. Alternatively, scientific datasets often have multiple modalities or physics-based constraints, and the use of such scientific, multimodal data has been shown to improve disentanglement in fully unsupervised settings. Consequently, we introduce a causal representation learning algorithm (causalPIMA) that can use multimodal data and known physics to discover important features with causal relationships. Our innovative algorithm utilizes a new differentiable parametrization to learn a directed acyclic graph (DAG) together with a latent space of a variational autoencoder in an end-to-end differentiable framework via a single, tractable evidence lower bound loss function. We place a Gaussian mixture prior on the latent space and identify each of the mixtures with an outcome of the DAG nodes; this novel identification enables feature discovery with causal relationships. Tested against a synthetic and a scientific dataset, our results demonstrate the capability of learning an interpretable causal structure while simultaneously discovering key features in a fully unsupervised setting.
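Summary
Causal representation learning algorithms discover lower-dimensional representations of data that admit an interpretable account of cause and effect. Because such interpretable representations are hard to achieve, many causal learning algorithms rely on prior information such as (linear) structural causal models, interventional data, or weak supervision. In exploratory causal representation learning, however, such prior information may not be available or warranted. Alternatively, scientific datasets often have multiple modalities or physics-based constraints, and using such scientific, multimodal data can improve disentanglement in fully unsupervised settings. We therefore introduce a causal representation learning algorithm (causalPIMA) that can use multimodal data and known physics to discover important features with causal relationships. The algorithm uses a new differentiable parametrization to learn a directed acyclic graph (DAG) together with the latent space of a variational autoencoder in an end-to-end differentiable framework, via a single, tractable evidence lower bound loss function. We place a Gaussian mixture prior on the latent space and identify each mixture component with an outcome of the DAG nodes; this identification enables feature discovery with causal relationships. Tested on a synthetic and a scientific dataset, the method learns an interpretable causal structure while simultaneously discovering key features in a fully unsupervised setting.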
Semi-Synthetic Dataset Augmentation for Application-Specific Gaze Estimation
results: Gaze estimation angular error decreases by 47% on average.
Abstract
Although the number of gaze estimation datasets is growing, the application of appearance-based gaze estimation methods is mostly limited to estimating the point of gaze on a screen. This is in part because most datasets are generated in a similar fashion, where the gaze target is on a screen close to the camera's origin. In other applications such as assistive robotics or marketing research, the 3D point of gaze might not be close to the camera's origin, meaning models trained on current datasets do not generalize well to these tasks. We therefore suggest generating a textured tridimensional mesh of the face and rendering the training images from a virtual camera at a specific position and orientation related to the application as a means of augmenting the existing datasets. In our tests, this led to an average 47% decrease in gaze estimation angular error.
Summary
Most existing gaze estimation datasets are generated with the gaze target on a screen close to the camera's origin, which limits appearance-based gaze estimation methods to estimating the point of gaze on a screen. To address this limitation, we suggest generating a textured 3D mesh of the face and rendering the training images from a virtual camera at a specific position and orientation related to the application, as a means of augmenting the existing datasets. In our tests, this led to an average 47% decrease in gaze estimation angular error.
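The metric reported here, gaze estimation angular error, is commonly computed as the angle between the predicted and ground-truth 3D gaze direction vectors. The sketch below shows that computation under the assumption that gaze is represented as a 3D direction; the function name is illustrative and the paper's exact evaluation code is not given in the abstract.

```python
import math

def angular_error_deg(pred, truth):
    """Angle in degrees between a predicted and a ground-truth 3D gaze direction."""
    dot = sum(p * t for p, t in zip(pred, truth))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in truth))
    if norm == 0.0:
        raise ValueError("gaze vectors must be non-zero")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

same = angular_error_deg((0.0, 0.0, -1.0), (0.0, 0.0, -1.0))
orthogonal = angular_error_deg((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

The vectors do not need to be unit length, since the function normalizes by both magnitudes.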
for: The paper is written to integrate a language model into the Lean proof assistant.
methods: The paper proposes using a server hosting a language model to generate suggestions, which are then checked in Lean and displayed to the user in their development environment.
results: The paper provides a baseline language model, along with code for fine-tuning and evaluation to support further development.
Abstract
We present LLMSTEP, a tool for integrating a language model into the Lean proof assistant. LLMSTEP is a Lean 4 tactic that sends a user's proof state to a server hosting a language model. The language model generates suggestions, which are checked in Lean and displayed to a user in their development environment. We provide a baseline language model, along with code for fine-tuning and evaluation to support further development. We provide server implementations that run on CPU, a CUDA GPU, or a Google Colab notebook, as a step towards fast, effective language model suggestions for any user.
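Summary
We present LLMSTEP, a tool for integrating a language model into the Lean proof assistant. LLMSTEP is a Lean 4 tactic that sends a user's proof state to a server hosting a language model. The language model generates suggestions, which are checked in Lean and displayed to the user. We provide a baseline language model, along with code for fine-tuning and evaluation to support further development. We provide server implementations that run on CPU, a CUDA GPU, or a Google Colab notebook, as a step towards fast, effective language model suggestions for any user.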
A Novel Skip Orthogonal List for Dynamic Optimal Transport Problem
for: solves the discrete dynamic optimal transport problem efficiently when the weights or locations of the data points change, with applications in machine learning.
methods: proposes a novel 2D Skip Orthogonal List and dynamic tree techniques, based on the conventional simplex method, to efficiently complete each pivoting operation within $O(|V|)$ time with high probability.
results: significantly outperforms existing algorithms in dynamic scenarios, with a few simplex iterations in practice.
Abstract
Optimal transportation is a fundamental topic that has attracted a great amount of attention from machine learning community in the past decades. In this paper, we consider an interesting discrete dynamic optimal transport problem: can we efficiently update the optimal transport plan when the weights or the locations of the data points change? This problem is naturally motivated by several applications in machine learning. For example, we often need to compute the optimal transportation cost between two different data sets; if some change happens to a few data points, should we re-compute the high complexity cost function or update the cost by some efficient dynamic data structure? We are aware that several dynamic maximum flow algorithms have been proposed before, however, the research on dynamic minimum cost flow problem is still quite limited, to the best of our knowledge. We propose a novel 2D Skip Orthogonal List together with some dynamic tree techniques. Although our algorithm is based on the conventional simplex method, it can efficiently complete each pivoting operation within $O(|V|)$ time with high probability where $V$ is the set of all supply and demand nodes. Since dynamic modifications typically do not introduce significant changes, our algorithm requires only a few simplex iterations in practice. So our algorithm is more efficient than re-computing the optimal transportation cost that needs at least one traversal over all the $O(|E|) = O(|V|^2)$ variables in general cases. Our experiments demonstrate that our algorithm significantly outperforms existing algorithms in the dynamic scenarios.
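Summary
Optimal transport is a fundamental topic that has attracted a great deal of attention from the machine learning community over the past decades. In this paper, we consider an interesting discrete dynamic optimal transport problem: can we efficiently update the optimal transport plan when the weights or the locations of the data points change? The problem is naturally motivated by several applications in machine learning. For example, we often need to compute the optimal transport cost between two data sets; if a few data points change, should we re-compute the high-complexity cost function, or update the cost with an efficient dynamic data structure? Several dynamic maximum flow algorithms have been proposed, but to the best of our knowledge research on the dynamic minimum cost flow problem is still quite limited. We propose a novel 2D Skip Orthogonal List together with some dynamic tree techniques. Although our algorithm is based on the conventional simplex method, it completes each pivoting operation within $O(|V|)$ time with high probability, where $V$ is the set of all supply and demand nodes. Since dynamic modifications typically do not introduce significant changes, our algorithm requires only a few simplex iterations in practice. It is therefore more efficient than re-computing the optimal transport cost, which needs at least one traversal over all $O(|E|) = O(|V|^2)$ variables in general. Our experiments show that our algorithm significantly outperforms existing algorithms in dynamic scenarios.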
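To make the "re-compute from scratch" baseline concrete, the sketch below computes the optimal transport cost between two tiny one-dimensional point sets and then recomputes it after one point moves. It assumes equal-sized sets with uniform weights, in which case an optimal plan can be taken to be a permutation, and it uses brute-force enumeration. This is not the paper's Skip Orthogonal List or its simplex-based update; it only illustrates the quantity that a dynamic algorithm would maintain.

```python
from itertools import permutations

def ot_cost_uniform(sources, targets):
    """Optimal transport cost between equal-sized 1D point sets with uniform mass 1/n.

    Brute force over all assignments, so only suitable for very small inputs.
    """
    if len(sources) != len(targets) or not sources:
        raise ValueError("point sets must be non-empty and of equal size")
    n = len(sources)
    best = min(
        sum(abs(sources[i] - targets[perm[i]]) for i in range(n))
        for perm in permutations(range(n))
    )
    return best / n  # each point carries mass 1/n

sources = [0.0, 2.0]
targets = [2.0, 0.5]
cost_before = ot_cost_uniform(sources, targets)

# One data point moves; the static baseline recomputes everything.
targets_updated = [2.0, 0.0]
cost_after = ot_cost_uniform(sources, targets_updated)
```

The enumeration grows factorially with the number of points, which is why an incremental update that touches only a few simplex pivots is attractive when a small number of points change.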
Towards a fuller understanding of neurons with Clustered Compositional Explanations
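results: The study analyzes the problems that arise across different activation ranges and proposes desiderata qualities for evaluating the explanations returned by different algorithms.
Abstract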
Compositional Explanations is a method for identifying logical formulas of concepts that approximate the neurons' behavior. However, these explanations are linked to the small spectrum of neuron activations (i.e., the highest ones) used to check the alignment, thus lacking completeness. In this paper, we propose a generalization, called Clustered Compositional Explanations, that combines Compositional Explanations with clustering and a novel search heuristic to approximate a broader spectrum of the neurons' behavior. We define and address the problems connected to the application of these methods to multiple ranges of activations, analyze the insights retrievable by using our algorithm, and propose desiderata qualities that can be used to study the explanations returned by different algorithms.
Summary
Compositional Explanations is a method for identifying logical formulas of concepts that approximate the neurons' behavior. However, these explanations are tied to the small spectrum of neuron activations (i.e., the highest ones) used to check the alignment, and therefore lack completeness. In this paper, we propose a generalization, called Clustered Compositional Explanations, that combines Compositional Explanations with clustering and a novel search heuristic to approximate a broader spectrum of the neurons' behavior. We define and address the problems that arise when applying these methods to multiple ranges of activations, analyze the insights obtainable with our algorithm, and propose desiderata qualities for studying the explanations returned by different algorithms.
On the Fairness ROAD: Robust Optimization for Adversarial Debiasing
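results: Experiments show that ROAD achieves Pareto dominance with respect to local fairness and accuracy at a given global fairness level on three standard datasets, and improves fairness generalization under distribution shift.
Abstract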
In the field of algorithmic fairness, significant attention has been put on group fairness criteria, such as Demographic Parity and Equalized Odds. Nevertheless, these objectives, measured as global averages, have raised concerns about persistent local disparities between sensitive groups. In this work, we address the problem of local fairness, which ensures that the predictor is unbiased not only in terms of expectations over the whole population, but also within any subregion of the feature space, unknown at training time. To enforce this objective, we introduce ROAD, a novel approach that leverages the Distributionally Robust Optimization (DRO) framework within a fair adversarial learning objective, where an adversary tries to infer the sensitive attribute from the predictions. Using an instance-level re-weighting strategy, ROAD is designed to prioritize inputs that are likely to be locally unfair, i.e. where the adversary faces the least difficulty in reconstructing the sensitive attribute. Numerical experiments demonstrate the effectiveness of our method: it achieves Pareto dominance with respect to local fairness and accuracy for a given global fairness level across three standard datasets, and also enhances fairness generalization under distribution shift.
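Summary
In the field of algorithmic fairness, much attention has been paid to group fairness criteria such as Demographic Parity and Equalized Odds. However, these objectives, measured as global averages, have raised concerns about persistent local disparities between sensitive groups. In this work, we address the problem of local fairness, which ensures that the predictor is unbiased not only in expectation over the whole population but also within any subregion of the feature space, unknown at training time. To this end we introduce ROAD, a novel approach that uses the Distributionally Robust Optimization (DRO) framework within a fair adversarial learning objective, in which an adversary tries to infer the sensitive attribute from the predictions. Using an instance-level re-weighting strategy, ROAD prioritizes inputs that are likely to be locally unfair, i.e. where the adversary faces the least difficulty in reconstructing the sensitive attribute. Numerical experiments demonstrate the effectiveness of our method: it achieves Pareto dominance with respect to local fairness and accuracy for a given global fairness level across three standard datasets, and also enhances fairness generalization under distribution shift.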
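The gap between global and local fairness that motivates this work can be seen in a toy computation of the Demographic Parity gap, measured once over the whole population and once inside a subregion. The data, the `mask` argument, and the function name below are hypothetical; this sketch only measures the disparity and does not implement ROAD's adversarial, DRO-based training.

```python
def demographic_parity_gap(preds, sensitive, mask=None):
    """|P(pred=1 | s=1) - P(pred=1 | s=0)| over the rows selected by `mask`."""
    rows = [
        (p, s)
        for idx, (p, s) in enumerate(zip(preds, sensitive))
        if mask is None or mask[idx]
    ]
    group1 = [p for p, s in rows if s == 1]
    group0 = [p for p, s in rows if s == 0]
    if not group1 or not group0:
        raise ValueError("both sensitive groups must be present in the selection")
    return abs(sum(group1) / len(group1) - sum(group0) / len(group0))

# Hypothetical binary predictions, sensitive attribute, and subregion membership.
preds = [1, 0, 0, 1]
sensitive = [1, 1, 0, 0]
in_region = [True, False, True, False]

global_gap = demographic_parity_gap(preds, sensitive)
local_gap = demographic_parity_gap(preds, sensitive, mask=in_region)
```

In this toy data the global gap is zero while the gap inside the subregion is maximal, which is the kind of local disparity that a global average can hide.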
Gen2Sim: Scaling up Robot Learning in Simulation with Generative Models
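results: The method succeeds in learning policies for diverse long-horizon tasks on which reinforcement learning with non-temporally-decomposed reward functions fails. Gen2Sim offers a viable way to scale up and diversify the learning of robot manipulation skills, and facilitates behavior discovery in RL through temporal task decomposition.
Abstract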
Generalist robot manipulators need to learn a wide variety of manipulation skills across diverse environments. Current robot training pipelines rely on humans to provide kinesthetic demonstrations or to program simulation environments and to code up reward functions for reinforcement learning. Such human involvement is an important bottleneck towards scaling up robot learning across diverse tasks and environments. We propose Generation to Simulation (Gen2Sim), a method for scaling up robot skill learning in simulation by automating generation of 3D assets, task descriptions, task decompositions and reward functions using large pre-trained generative models of language and vision. We generate 3D assets for simulation by lifting open-world 2D object-centric images to 3D using image diffusion models and querying LLMs to determine plausible physics parameters. Given URDF files of generated and human-developed assets, we chain-of-thought prompt LLMs to map these to relevant task descriptions, temporal decompositions, and corresponding python reward functions for reinforcement learning. We show Gen2Sim succeeds in learning policies for diverse long horizon tasks, where reinforcement learning with non temporally decomposed reward functions fails. Gen2Sim provides a viable path for scaling up reinforcement learning for robot manipulators in simulation, both by diversifying and expanding task and environment development, and by facilitating the discovery of reinforcement-learned behaviors through temporal task decomposition in RL. Our work contributes hundreds of simulated assets, tasks and demonstrations, taking a step towards fully autonomous robotic manipulation skill acquisition in simulation.
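Summary
Generalist robot manipulators need to learn a wide variety of manipulation skills across diverse environments. Current robot training pipelines rely on humans to provide kinesthetic demonstrations or to program simulation environments and code up reward functions for reinforcement learning. This human involvement is an important bottleneck for scaling up robot learning. We propose Generation to Simulation (Gen2Sim), a method for scaling up robot skill learning in simulation by using large pre-trained generative models of language and vision to automatically generate 3D assets, task descriptions, task decompositions, and reward functions. We lift open-world 2D object-centric images to 3D using image diffusion models and query LLMs to determine plausible physics parameters. Given URDF files of generated and human-developed assets, we chain-of-thought prompt LLMs to map them to relevant task descriptions, temporal decompositions, and corresponding Python reward functions. We show that Gen2Sim learns policies for diverse long-horizon tasks on which reinforcement learning with non-temporally-decomposed reward functions fails. Gen2Sim provides a viable path for scaling up reinforcement learning for robot manipulators in simulation, both by diversifying and expanding task and environment development and by facilitating the discovery of reinforcement-learned behaviors through temporal task decomposition. Our work contributes hundreds of simulated assets, tasks, and demonstrations, a step towards fully autonomous robotic manipulation skill acquisition in simulation.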
A Stability Principle for Learning under Non-Stationarity
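results: The theory shows that the approach adapts to unknown non-stationarity. The regret bound is minimax optimal up to logarithmic factors when the population losses are strongly convex, or Lipschitz only. The analysis introduces two new components: a measure of similarity between functions and a segmentation technique.
Abstract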
We develop a versatile framework for statistical learning in non-stationary environments. In each time period, our approach applies a stability principle to select a look-back window that maximizes the utilization of historical data while keeping the cumulative bias within an acceptable range relative to the stochastic error. Our theory showcases the adaptability of this approach to unknown non-stationarity. The regret bound is minimax optimal up to logarithmic factors when the population losses are strongly convex, or Lipschitz only. At the heart of our analysis lie two novel components: a measure of similarity between functions and a segmentation technique for dividing the non-stationary data sequence into quasi-stationary pieces.
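Summary
We develop a versatile framework for statistical learning in non-stationary environments. In each time period, our approach applies a stability principle to select a look-back window that maximizes the use of historical data while keeping the cumulative bias within an acceptable range relative to the stochastic error. Our theory shows that the approach adapts to unknown non-stationarity. The regret bound is minimax optimal up to logarithmic factors when the population losses are strongly convex, or Lipschitz only. Our analysis rests on two novel components: a measure of similarity between functions and a technique for segmenting the non-stationary data sequence into quasi-stationary pieces.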
Socially Cognizant Robotics for a Technology Enhanced Society
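results: Putting human-centric objectives first opens up many new research perspectives and problems for improving robots' interactions with people and their impact on society.
Abstract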
Emerging applications of robotics, and concerns about their impact, require the research community to put human-centric objectives front-and-center. To meet this challenge, we advocate an interdisciplinary approach, socially cognizant robotics, which synthesizes technical and social science methods. We argue that this approach follows from the need to empower stakeholder participation (from synchronous human feedback to asynchronous societal assessment) in shaping AI-driven robot behavior at all levels, and leads to a range of novel research perspectives and problems both for improving robots' interactions with individuals and impacts on society. Drawing on these arguments, we develop best practices for socially cognizant robot design that balance traditional technology-based metrics (e.g. efficiency, precision and accuracy) with critically important, albeit challenging to measure, human and society-based metrics.
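Summary
Emerging applications of robotics, and concerns about their impact, require the research community to put human-centric objectives first. To meet this challenge, we advocate an interdisciplinary approach, socially cognizant robotics, which combines technical and social science methods. We argue that this approach follows from the need to empower stakeholder participation (from synchronous human feedback to asynchronous societal assessment) in shaping AI-driven robot behavior, and that it leads to novel research perspectives and problems, both for improving robots' interactions with individuals and for their impact on society. Building on these arguments, we develop best practices for socially cognizant robot design that balance traditional technology-based metrics (e.g. efficiency, precision and accuracy) with human and society-based metrics, which are critically important although challenging to measure.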
Interactive Motion Planning for Autonomous Vehicles with Joint Optimization
paper_authors: Yuxiao Chen, Sushant Veer, Peter Karkus, Marco Pavone
for: This paper is written for planning safe motions for autonomous vehicles in highly interactive driving scenarios.
methods: The paper uses deep-learning-based models for trajectory prediction and joint optimization with model predictive control (MPC) to leverage ego-conditioned prediction.
results: The proposed Interactive Joint Planning (IJP) method significantly outperforms baselines in closed-loop simulation, demonstrating its effectiveness in providing safe and efficient motions for autonomous vehicles in interactive driving scenarios.
Abstract
In highly interactive driving scenarios, the actions of one agent greatly influence those of its neighbors. Planning safe motions for autonomous vehicles in such interactive environments, therefore, requires reasoning about the impact of the ego's intended motion plan on nearby agents' behavior. Deep-learning-based models have recently achieved great success in trajectory prediction and many models in the literature allow for ego-conditioned prediction. However, leveraging ego-conditioned prediction remains challenging in downstream planning due to the complex nature of neural networks, limiting the planner structure to simple ones, e.g., sampling-based planners. Despite their ability to generate fine-grained high-quality motion plans, it is difficult for gradient-based planning algorithms, such as model predictive control (MPC), to leverage ego-conditioned prediction due to their iterative nature and need for gradients. We present Interactive Joint Planning (IJP), which bridges MPC with learned prediction models in a computationally scalable manner to provide the best of both worlds. In particular, IJP jointly optimizes over the behavior of the ego and the surrounding agents and leverages deep-learned prediction models as prediction priors that the joint trajectory optimization tries to stay close to. Furthermore, by leveraging homotopy classes, our joint optimizer searches over diverse motion plans to avoid getting stuck at local minima. Closed-loop simulation results show that IJP significantly outperforms the baselines that are either without joint optimization or running sampling-based planning.
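Summary
In highly interactive driving scenarios, the actions of one agent greatly influence those of its neighbors. Planning safe motions for autonomous vehicles in such environments therefore requires reasoning about the impact of the ego's intended motion plan on nearby agents' behavior. Deep-learning-based models have recently achieved great success in trajectory prediction, but leveraging ego-conditioned prediction in downstream planning remains challenging because the complexity of neural networks limits the planner structure to simple ones, e.g., sampling-based planners. Gradient-based planning algorithms such as model predictive control (MPC) can generate fine-grained, high-quality motion plans, yet they struggle to leverage ego-conditioned prediction because of their iterative nature and need for gradients. We present Interactive Joint Planning (IJP), which bridges MPC with learned prediction models in a computationally scalable manner to get the best of both worlds. Specifically, IJP jointly optimizes over the behavior of the ego and the surrounding agents and uses deep-learned prediction models as prediction priors that the joint trajectory optimization tries to stay close to. In addition, by leveraging homotopy classes, our joint optimizer searches over diverse motion plans to avoid getting stuck at local minima. Closed-loop simulation results show that IJP significantly outperforms baselines that either lack joint optimization or run sampling-based planning.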
paper_authors: Sehyun Kwon, Jaeseung Park, Minkyu Kim, Jaewoong Cho, Ernest K. Ryu, Kangwook Lee
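for: Image clustering based on user-specified text criteria.
methods: Leverages modern vision-language models and large language models to perform Image Clustering Conditioned on Text Criteria (IC$|$TC).
results: IC$|$TC effectively clusters images under various criteria and significantly outperforms baselines.
Abstract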
Classical clustering methods do not provide users with direct control of the clustering results, and the clustering results may not be consistent with the relevant criterion that a user has in mind. In this work, we present a new methodology for performing image clustering based on user-specified text criteria by leveraging modern vision-language models and large language models. We call our method Image Clustering Conditioned on Text Criteria (IC$|$TC), and it represents a different paradigm of image clustering. IC$|$TC requires a minimal and practical degree of human intervention and grants the user significant control over the clustering results in return. Our experiments show that IC$|$TC can effectively cluster images with various criteria, such as human action, physical location, or the person's mood, while significantly outperforming baselines.
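Summary
Classical clustering methods do not give users direct control over the clustering results, and the results may not match the criterion a user has in mind. In this work, we present a new methodology for image clustering based on user-specified text criteria. We call it Image Clustering Conditioned on Text Criteria (IC$|$TC), and it represents a different paradigm of image clustering. IC$|$TC requires a minimal and practical degree of human intervention and grants the user significant control over the clustering results in return. Our experiments show that IC$|$TC can effectively cluster images by various criteria, such as human action, physical location, or the person's mood, while significantly outperforming baselines.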
Moments for Perceptive Narration Analysis Through the Emotional Attachment of Audience to Discourse and Story
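methods: The paper introduces a new story element called "moments" and proposes a method for decomposing a linear story (such as a film) into a set of moments. Moments fall into two types: story moments and discourse moments. Each type can be further divided into three types of universal storytelling moments, which can foster or deteriorate the audience's emotional attachment to a character or the story.
results: The paper presents a methodology to catalog occurrences of the universal moments and to visualize a character's journey with curves or color strips. The authors also demonstrate that both story and discourse moments can be transformed into one lump-sum attraction parameter, which can be plotted on a timeline to show the audience's emotional attachment to the story.
Abstract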
In this work, our goal is to develop a theoretical framework that can eventually be used for analyzing the effectiveness of visual stories such as feature films to comic books. To develop this theoretical framework, we introduce a new story element called moments. Our conjecture is that any linear story such as the story of a feature film can be decomposed into a set of moments that follow each other. Moments are defined as the perception of the actions, interactions, and expressions of all characters or a single character during a given time period. We categorize the moments into two major types: story moments and discourse moments. Each type of moment can further be classified into three types, which we call universal storytelling moments. We believe these universal moments foster or deteriorate the emotional attachment of the audience to a particular character or the story. We present a methodology to catalog the occurrences of these universal moments as they are found in the story. The cataloged moments can be represented using curves or color strips. Therefore, we can visualize a character's journey through the story as either a 3D curve or a color strip. We also demonstrated that both story and discourse moments can be transformed into one lump-sum attraction parameter. The attraction parameter in time provides a function that can be plotted graphically onto a timeline illustrating changes in the emotional attachment of audience to a character or the story. By inspecting these functions the story analyst can analytically decipher the moments in the story where the attachment is being established, maintained, strengthened, or conversely where it is languishing.
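Summary
In this work, our goal is to develop a theoretical framework for analyzing visual stories, from feature films to comic books. To this end we introduce a new story element called moments. Our conjecture is that any linear story, such as the story of a feature film, can be decomposed into a sequence of moments, where a moment is the perception of the actions, interactions, and expressions of all characters or a single character during a given time period. We categorize moments into two major types: story moments and discourse moments. Each type can be further classified into three types of universal storytelling moments. We believe these universal moments foster or deteriorate the audience's emotional attachment to a particular character or the story. We present a methodology to catalog occurrences of these universal moments and to represent a character's journey using curves or color strips. We also demonstrate that both story and discourse moments can be transformed into one lump-sum attraction parameter. Over time, this parameter gives a function that can be plotted on a timeline to show changes in the audience's emotional attachment to a character or the story. By inspecting these functions, the story analyst can identify the moments where the attachment is being established, maintained, or strengthened, or conversely where it is languishing.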
Learning to Search Feasible and Infeasible Regions of Routing Problems with Flexible Neural k-Opt
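results: Experiments show that NeuOpt significantly outperforms existing masking-based L2S solvers on TSP and CVRP, and also surpasses L2C and L2P solvers. The paper also offers fresh perspectives on handling VRP constraints.
Abstract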
In this paper, we present Neural k-Opt (NeuOpt), a novel learning-to-search (L2S) solver for routing problems. It learns to perform flexible k-opt exchanges based on a tailored action factorization method and a customized recurrent dual-stream decoder. As a pioneering work to circumvent the pure feasibility masking scheme and enable the autonomous exploration of both feasible and infeasible regions, we then propose the Guided Infeasible Region Exploration (GIRE) scheme, which supplements the NeuOpt policy network with feasibility-related features and leverages reward shaping to steer reinforcement learning more effectively. Additionally, we equip NeuOpt with Dynamic Data Augmentation (D2A) for more diverse searches during inference. Extensive experiments on the Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP) demonstrate that our NeuOpt not only significantly outstrips existing (masking-based) L2S solvers, but also showcases superiority over the learning-to-construct (L2C) and learning-to-predict (L2P) solvers. Notably, we offer fresh perspectives on how neural solvers can handle VRP constraints. Our code is available: https://github.com/yining043/NeuOpt.
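Summary
In this paper, we present Neural k-Opt (NeuOpt), a learning-to-search (L2S) solver for routing problems. It learns to perform flexible k-opt exchanges using a tailored action factorization method and a customized recurrent dual-stream decoder. As a pioneering effort to move beyond the pure feasibility masking scheme, we then propose the Guided Infeasible Region Exploration (GIRE) scheme, which adds feasibility-related features to the NeuOpt policy network and uses reward shaping to steer learning more effectively. We also equip NeuOpt with Dynamic Data Augmentation (D2A) for more diverse searches during inference. Extensive experiments on the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) show that NeuOpt not only significantly outperforms existing masking-based L2S solvers but also surpasses learning-to-construct (L2C) and learning-to-predict (L2P) solvers. We additionally offer new perspectives on how neural solvers can handle VRP constraints. Our code is available on GitHub: https://github.com/yining043/NeuOpt.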
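For readers unfamiliar with k-opt, the simplest case (k = 2) can be sketched as follows. This is the classical 2-opt move on a toy four-city tour, not NeuOpt's learned policy; the function names and coordinates are illustrative assumptions.

```python
import math

def tour_length(tour, coords):
    """Total length of the closed tour that visits coords in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

def two_opt_move(tour, i, j):
    """Reverse the segment tour[i+1..j], which swaps edges (i, i+1) and (j, j+1)."""
    return tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]

# Toy instance (assumed): the corners of a unit square, visited in a crossing order.
coords = [(0, 0), (1, 1), (1, 0), (0, 1)]
tour = [0, 1, 2, 3]                  # crosses itself, length 2 + 2*sqrt(2)
improved = two_opt_move(tour, 0, 2)  # [0, 2, 1, 3], the square's perimeter
```

A learned solver such as the one described above would choose which exchange to apply at each step; here the indices are picked by hand to remove the crossing.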
A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking
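results: The paper finds that, although there are currently no public empirical examples of AI systems developing misaligned power-seeking, conceptual and empirical evidence indicates that the risk exists. The possibility that AI poses existential risks to humanity via misaligned power-seeking therefore cannot be completely ruled out.
Abstract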
Rapid advancements in artificial intelligence (AI) have sparked growing concerns among experts, policymakers, and world leaders regarding the potential for increasingly advanced AI systems to pose existential risks. This paper reviews the evidence for existential risks from AI via misalignment, where AI systems develop goals misaligned with human values, and power-seeking, where misaligned AIs actively seek power. The review examines empirical findings, conceptual arguments and expert opinion relating to specification gaming, goal misgeneralization, and power-seeking. The current state of the evidence is found to be concerning but inconclusive regarding the existence of extreme forms of misaligned power-seeking. Strong empirical evidence of specification gaming combined with strong conceptual evidence for power-seeking make it difficult to dismiss the possibility of existential risk from misaligned power-seeking. On the other hand, to date there are no public empirical examples of misaligned power-seeking in AI systems, and so arguments that future systems will pose an existential risk remain somewhat speculative. Given the current state of the evidence, it is hard to be extremely confident either that misaligned power-seeking poses a large existential risk, or that it poses no existential risk. The fact that we cannot confidently rule out existential risk from AI via misaligned power-seeking is cause for serious concern.
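Summary
Rapid advances in artificial intelligence (AI) have raised concerns among experts, policymakers, and world leaders that AI systems may pose existential risks to humanity. This review examines the evidence that AI systems develop misaligned goals, covering specification gaming, goal misgeneralization, and power-seeking. The evidence reviewed indicates that, although there are currently no public empirical examples, the conceptual evidence is strong that AI systems may develop goals that differ from human values. Moreover, given the current state of the evidence, the possibility that AI systems pose an existential risk to humanity cannot be confidently ruled out. This is cause for serious concern.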
Fine-Tuning Language Models Using Formal Methods Feedback
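results: The paper reports experimental results on multiple autonomous driving tasks, showing that the method improves the percentage of specifications satisfied by the controller from 60% to 90%.
Abstract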
Although pre-trained language models encode generic knowledge beneficial for planning and control, they may fail to generate appropriate control policies for domain-specific tasks. Existing fine-tuning methods use human feedback to address this limitation; however, sourcing human feedback is labor intensive and costly. We present a fully automated approach to fine-tune pre-trained language models for applications in autonomous systems, bridging the gap between generic knowledge and domain-specific requirements while reducing cost. The method synthesizes automaton-based controllers from pre-trained models guided by natural language task descriptions. These controllers are verifiable against independently provided specifications within a world model, which can be abstract or obtained from a high-fidelity simulator. Controllers with high compliance with the desired specifications receive higher ranks, guiding the iterative fine-tuning process. We provide quantitative evidence, primarily in autonomous driving, to demonstrate the method's effectiveness across multiple tasks. The results indicate an improvement in the percentage of specifications satisfied by the controller from 60% to 90%.
Summary
Our method uses natural language task descriptions to guide the synthesis of automaton-based controllers from pre-trained models. These controllers are verifiable against independently provided specifications within a world model, which can be abstract or obtained from a high-fidelity simulator. Controllers that comply with the desired specifications receive higher ranks, guiding the iterative fine-tuning process. We provide quantitative evidence, primarily in the field of autonomous driving, to demonstrate the effectiveness of our method. The results show an improvement in the percentage of specifications satisfied by the controller, from 60% to 90%.
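The ranking step described above can be sketched roughly as follows. Specifications are modeled here as plain Python predicates over a controller's action trace, which is a simplifying assumption: the paper verifies automaton-based controllers against formal specifications within a world model, and every name below is hypothetical.

```python
def compliance(trace, specs):
    """Fraction of specifications that the trace satisfies."""
    return sum(1 for spec in specs if spec(trace)) / len(specs)

def rank_controllers(traces, specs):
    """Return controller names ordered from most to least compliant."""
    return sorted(traces, key=lambda name: compliance(traces[name], specs), reverse=True)

# Hypothetical specifications for a toy intersection scenario.
def stops_first(trace):
    return trace[0] == "stop"

def eventually_goes(trace):
    return "go" in trace

specs = [stops_first, eventually_goes]
traces = {
    "controller_a": ["stop", "go"],  # satisfies both specifications
    "controller_b": ["go", "go"],    # satisfies one of the two
}
ranking = rank_controllers(traces, specs)
```

In a fine-tuning loop, such a ranking could supply the preference signal that human feedback would otherwise provide.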
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
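paper_authors: Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang
for: This paper aims to improve the reliability of evaluating text-to-image models.
methods: The paper builds on question generation and answering (QG/A): pre-trained foundation models generate questions and answers from the prompt, and an image is scored by whether the answers extracted from it are consistent with the prompt-based answers.
results: The paper identifies several reliability problems (for example, questions should not contain hallucinations, duplications, or omissions) and addresses them with the Davidsonian Scene Graph (DSG) evaluation framework. DSG organizes questions in dependency graphs to ensure semantic coverage and consistent answers. Extensive experiments and human evaluation show that DSG effectively addresses these problems.
Abstract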
Evaluating text-to-image models is notoriously difficult. A strong recent approach for assessing text-image faithfulness is based on QG/A (question generation and answering), which uses pre-trained foundational models to automatically generate a set of questions and answers from the prompt, and output images are scored based on whether these answers extracted with a visual question answering model are consistent with the prompt-based answers. This kind of evaluation is naturally dependent on the quality of the underlying QG and QA models. We identify and address several reliability challenges in existing QG/A work: (a) QG questions should respect the prompt (avoiding hallucinations, duplications, and omissions) and (b) VQA answers should be consistent (not asserting that there is no motorcycle in an image while also claiming the motorcycle is blue). We address these issues with Davidsonian Scene Graph (DSG), an empirically grounded evaluation framework inspired by formal semantics. DSG is an automatic, graph-based QG/A that is modularly implemented to be adaptable to any QG/A module. DSG produces atomic and unique questions organized in dependency graphs, which (i) ensure appropriate semantic coverage and (ii) sidestep inconsistent answers. With extensive experimentation and human evaluation on a range of model configurations (LLM, VQA, and T2I), we empirically demonstrate that DSG addresses the challenges noted above. Finally, we present DSG-1k, an open-sourced evaluation benchmark that includes 1,060 prompts, covering a wide range of fine-grained semantic categories with a balanced distribution. We release the DSG-1k prompts and the corresponding DSG questions.
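Summary
Evaluating text-to-image models is notoriously difficult. A strong recent approach is based on QG/A (question generation and answering): pre-trained foundation models automatically generate a set of questions and answers from the prompt, and output images are scored by whether the answers extracted from the image are consistent with the prompt-based answers. This kind of evaluation naturally depends on the quality of the underlying QG and QA models. We identify and address several reliability challenges in existing QG/A work: (a) QG questions should respect the prompt (avoiding hallucinations, duplications, and omissions), and (b) VQA answers should be consistent (not asserting that there is no motorcycle in an image while also claiming the motorcycle is blue). We address these issues with the Davidsonian Scene Graph (DSG), an empirically grounded evaluation framework inspired by formal semantics. DSG automatically produces atomic and unique questions organized in dependency graphs, which ensure appropriate semantic coverage and sidestep inconsistent answers. Through extensive experiments and human evaluation, we show that DSG addresses the challenges above. Finally, we present DSG-1k, an open-source evaluation benchmark of 1,060 prompts covering a wide range of fine-grained semantic categories with a balanced distribution. We release the DSG-1k prompts and the corresponding DSG questions.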
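One way a dependency graph can sidestep inconsistent answers is sketched below. The rule used here (a question counts as satisfied only if all of its parent questions are satisfied) is an assumption made for illustration, and the question ids, data layout, and function name are hypothetical rather than taken from the DSG release.

```python
def score_with_dependencies(questions, parents, vqa_answers):
    """Score an image from yes/no VQA answers while respecting question dependencies.

    questions:   question ids in topological order (parents before children)
    parents:     dict mapping a question id to the ids it depends on
    vqa_answers: dict mapping a question id to the raw VQA answer (True for "yes")
    """
    effective = {}
    for q in questions:
        parents_ok = all(effective[p] for p in parents.get(q, []))
        # A child answer is discarded when any parent was answered "no".
        effective[q] = parents_ok and vqa_answers[q]
    return sum(effective.values()) / len(questions), effective

questions = ["motorcycle_exists", "motorcycle_is_blue", "road_exists"]
parents = {"motorcycle_is_blue": ["motorcycle_exists"]}
# The VQA model contradicts itself: no motorcycle, yet the motorcycle is blue.
vqa_answers = {"motorcycle_exists": False, "motorcycle_is_blue": True, "road_exists": True}
score, effective = score_with_dependencies(questions, parents, vqa_answers)
```

With this rule the contradictory "blue" answer no longer contributes to the score, so only the road question counts.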
Alignment and Outer Shell Isotropy for Hyperbolic Graph Contrastive Learning
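results: Across different hyperbolic graph embedding techniques, the method learns high-quality graph representations using an alignment metric and a uniformity metric, and is effective in both supervised and self-supervised learning settings.
Abstract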
Learning good self-supervised graph representations that are beneficial to downstream tasks is challenging. Among a variety of methods, contrastive learning enjoys competitive performance. The embeddings of contrastive learning are arranged on a hypersphere that enables cosine distance measurement in Euclidean space. However, the underlying structure of many domains such as graphs exhibits highly non-Euclidean latent geometry. To this end, we propose a novel contrastive learning framework to learn high-quality graph embeddings. Specifically, we design an alignment metric that effectively captures hierarchical data-invariant information, and we propose a substitute for the uniformity metric to prevent so-called dimensional collapse. We show that in hyperbolic space one has to address leaf-level and height-level uniformity, which are related to properties of trees, whereas in the ambient space of the hyperbolic manifold these notions translate into imposing an isotropic ring density towards the boundary of the Poincaré ball. This ring density can be easily imposed by promoting an isotropic feature distribution on the tangent space of the manifold. In the experiments, we demonstrate the efficacy of our proposed method across different hyperbolic graph embedding techniques in both supervised and self-supervised learning settings.
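Summary
Learning good self-supervised graph representations is challenging. Among the available methods, contrastive learning performs competitively. Its embeddings are arranged on a hypersphere, which allows cosine distance to be measured in Euclidean space. However, the underlying structure of many domains, such as graphs, exhibits highly non-Euclidean latent geometry. We therefore propose a new contrastive learning framework for learning high-quality graph embeddings. Specifically, we design an alignment metric that effectively captures hierarchical data-invariant information, and we propose a substitute for the uniformity metric to prevent so-called dimensional collapse. We show that in hyperbolic space one has to address leaf-level and height-level uniformity, which are related to properties of trees, whereas in the ambient space of the hyperbolic manifold these notions translate into imposing an isotropic ring density towards the boundary of the Poincaré ball. This ring density can be imposed by promoting an isotropic feature distribution on the tangent space of the manifold. In experiments, we demonstrate the effectiveness of the proposed method across different hyperbolic graph embedding techniques in both self-supervised and supervised settings.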
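The relationship between the tangent space and the Poincaré ball mentioned above can be made concrete with two standard operations. The sketch below assumes curvature -1 and the tangent space at the origin; it is not the paper's loss, and the helper names are illustrative.

```python
import math

def exp_map_origin(v):
    """Map a tangent vector at the origin onto the Poincaré ball (curvature -1)."""
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(norm) / norm
    return [scale * c for c in v]

def poincare_distance(x, y):
    """Geodesic distance between two points inside the unit Poincaré ball."""
    def sq_norm(u):
        return sum(c * c for c in u)
    diff = sq_norm([a - b for a, b in zip(x, y)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - sq_norm(x)) * (1.0 - sq_norm(y))))

# A tangent vector of norm 3 lands close to the boundary of the ball ...
point = exp_map_origin([3.0, 0.0])
radius = math.sqrt(sum(c * c for c in point))
# ... and its hyperbolic distance from the origin is twice the tangent norm.
distance = poincare_distance([0.0, 0.0], point)
```

Because tanh saturates quickly, tangent vectors of even moderate norm map to a thin ring near the boundary, which is why a density on the tangent space shapes the ring density in the ball.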
Is Scaling Learned Optimizers Worth It? Evaluating The Value of VeLO’s 4000 TPU Months
paper_authors: Fady Rezk, Antreas Antoniou, Henry Gouk, Timothy Hospedales
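for: This study examines the largest-scale attempt to date to train a general-purpose "foundational" optimizer.
methods: VeLO was trained on thousands of machine learning tasks using over 4000 TPU months, with the aim of producing an optimizer that generalizes to new problems without hyperparameter tuning. The authors independently evaluate it on the MLCommons optimizer benchmark suite.
results: Contrary to initial claims, the authors find that (1) VeLO has a critical hyperparameter that needs problem-specific tuning, (2) VeLO does not necessarily outperform competitors in the quality of the solution found, and (3) VeLO is not faster than competing optimizers at reducing the training loss. These observations call into question VeLO's generality and the value of the investment in training it.
Abstract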
We analyze VeLO (versatile learned optimizer), the largest scale attempt to train a general purpose "foundational" optimizer to date. VeLO was trained on thousands of machine learning tasks using over 4000 TPU months with the goal of producing an optimizer capable of generalizing to new problems while being hyperparameter free, and outperforming industry standards such as Adam. We independently evaluate VeLO on the MLCommons optimizer benchmark suite. We find that, contrary to initial claims: (1) VeLO has a critical hyperparameter that needs problem-specific tuning, (2) VeLO does not necessarily outperform competitors in quality of solution found, and (3) VeLO is not faster than competing optimizers at reducing the training loss. These observations call into question VeLO's generality and the value of the investment in training it.
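Summary
We analyze VeLO (versatile learned optimizer), the largest-scale attempt to date to train a general-purpose "foundational" optimizer. VeLO was trained on thousands of machine learning tasks using over 4000 TPU months, with the goal of producing an optimizer that generalizes to new problems without requiring any hyperparameters. We independently evaluate VeLO on the MLCommons optimizer benchmark suite. We find that: (1) VeLO has a critical hyperparameter that needs problem-specific tuning; (2) VeLO does not necessarily outperform competitors in the quality of the solution found; (3) VeLO is not faster than competing optimizers at reducing the training loss. These observations call into question VeLO's generality and the value of the investment in training it.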
Personas as a Way to Model Truthfulness in Language Models
paper_authors: Nitish Joshi, Javier Rando, Abulhair Saparov, Najoung Kim, He He
for: This paper explores the ability of large language models to discern truth from falsehood in contradictory data.
methods: The authors hypothesize that language models can cluster truthful text by modeling a truthful persona, which is a group of agents that are likely to produce truthful text and share similar features. They use arithmetics as a synthetic environment to test this hypothesis.
results: The authors find that language models can separate true and false statements, and generalize truthfulness across agents, but only if the agents in the training data share a truthful generative process that enables the creation of a truthful persona. This suggests that models can exploit hierarchical structures in the data to learn abstract concepts like truthfulness.
Abstract
Large Language Models are trained on vast amounts of text from the internet, which contains both factual and misleading information about the world. Can language models discern truth from falsehood in this contradicting data? Expanding on the view that LLMs can model different agents producing the corpora, we hypothesize that they can cluster truthful text by modeling a truthful persona: a group of agents that are likely to produce truthful text and share similar features. For example, trustworthy sources like Wikipedia and Science usually use formal writing styles and make consistent claims. By modeling this persona, LLMs can generalize truthfulness beyond the specific contexts in which each agent generated the training text. For example, the model can infer that the agent "Wikipedia" will behave truthfully on topics that were only generated by "Science" because they share a persona. We first show evidence for the persona hypothesis via two observations: (1) we can probe whether a model's answer will be truthful before it is generated; (2) finetuning a model on a set of facts improves its truthfulness on unseen topics. Next, using arithmetics as a synthetic environment, we show that language models can separate true and false statements, and generalize truthfulness across agents; but only if agents in the training data share a truthful generative process that enables the creation of a truthful persona. Overall, our findings suggest that models can exploit hierarchical structures in the data to learn abstract concepts like truthfulness.
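Summary
Our experiments support the persona hypothesis through two observations: (1) we can probe whether a model's answer will be truthful before it is generated; (2) fine-tuning a model on a set of facts improves its truthfulness on unseen topics. Next, using arithmetic as a synthetic environment, we show that language models can separate true and false statements and generalize truthfulness across agents, but only if the agents in the training data share a truthful generative process that enables the creation of a truthful persona. Overall, our findings suggest that models can exploit hierarchical structure in the data to learn abstract concepts such as truthfulness.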
Improving Intrinsic Exploration by Creating Stationary Objectives
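results: SOFE improves agent performance across a range of exploration problems, including sparse-reward tasks, pixel-based observations, 3D navigation, and procedurally generated environments.
Abstract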
Exploration bonuses in reinforcement learning guide long-horizon exploration by defining custom intrinsic objectives. Count-based methods use the frequency of state visits to derive an exploration bonus. In this paper, we identify that any intrinsic reward function derived from count-based methods is non-stationary and hence induces a difficult objective to optimize for the agent. The key contribution of our work lies in transforming the original non-stationary rewards into stationary rewards through an augmented state representation. For this purpose, we introduce the Stationary Objectives For Exploration (SOFE) framework. SOFE requires identifying sufficient statistics for different exploration bonuses and finding an efficient encoding of these statistics to use as input to a deep network. SOFE is based on proposing state augmentations that expand the state space but hold the promise of simplifying the optimization of the agent's objective. Our experiments show that SOFE improves the agents' performance in challenging exploration problems, including sparse-reward tasks, pixel-based observations, 3D navigation, and procedurally generated environments.
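Summary
Exploration bonuses in reinforcement learning guide long-horizon exploration by defining custom intrinsic objectives. Count-based methods derive an exploration bonus from the frequency of state visits. We find that any intrinsic reward derived from count-based methods is non-stationary and therefore induces an objective that is difficult for the agent to optimize. The key contribution of our work is to transform the original non-stationary rewards into stationary rewards through an augmented state representation. To this end, we introduce the Stationary Objectives For Exploration (SOFE) framework. SOFE requires identifying sufficient statistics for different exploration bonuses and encoding them efficiently as input to a deep network. SOFE is based on state augmentations that expand the state space but promise to simplify the optimization of the agent's objective. Our experiments show that SOFE improves agent performance on challenging exploration problems, including sparse-reward tasks, pixel-based observations, 3D navigation, and procedurally generated environments.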
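The non-stationarity argument can be illustrated with a toy tabular example. The bonus form 1/sqrt(N(s)) and the choice of raw visit counts as the sufficient statistic are assumptions made for this sketch, and the helper names are hypothetical; the paper feeds an encoding of such statistics to a deep network.

```python
import math
from collections import Counter

def count_bonus(counts, state):
    """Exploration bonus that shrinks with visits (assumed 1/sqrt(N) form)."""
    return 1.0 / math.sqrt(counts[state])

def augmented_observation(state, counts, all_states):
    """Pair the raw state with the visit counts that determine the bonus."""
    return (state, tuple(counts[s] for s in all_states))

all_states = ["s0", "s1"]
counts = Counter()
history = []
for state in ["s0", "s0", "s1"]:
    counts[state] += 1
    history.append((augmented_observation(state, counts, all_states),
                    count_bonus(counts, state)))
```

The raw state `s0` receives two different bonuses across its two visits, so the reward is non-stationary as a function of the state alone. The two augmented observations differ, however, and the bonus is a fixed function of the augmented observation.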
Ask more, know better: Reinforce-Learned Prompt Questions for Decision Making with Large Language Models
paper_authors: Xue Yan, Yan Song, Xinyu Cui, Filippos Christianos, Haifeng Zhang, David Henry Mguni, Jun Wang
for: This paper aims to develop a fully integrated end-to-end framework for task-solving in real settings using complicated reasoning.
methods: The proposed leader-follower bilevel framework learns to ask relevant questions (prompts) and undertake reasoning to guide the learning of actions to be performed in an environment. The system uses a prompt-generator policy and an action policy to adapt to the CoT process and take decisive, high-performing actions.
results: The empirical data shows that the proposed system outperforms leading methods in agent learning benchmarks such as Overcooked and FourRoom.
Abstract
Large language models (LLMs) demonstrate their promise in tackling complicated practical challenges by combining action-based policies with chain of thought (CoT) reasoning. Having high-quality prompts on hand, however, is vital to the framework's effectiveness. Currently, these prompts are handcrafted utilizing extensive human labor, resulting in CoT policies that frequently fail to generalize. Human intervention is also required in order to develop grounding functions that ensure low-level controllers appropriately process CoT reasoning. In this paper, we take the first step towards a fully integrated end-to-end framework for task-solving in real settings employing complicated reasoning. To that purpose, we offer a new leader-follower bilevel framework capable of learning to ask relevant questions (prompts) and subsequently undertaking reasoning to guide the learning of actions to be performed in an environment. A good prompt should make introspective revisions based on historical findings, leading the CoT to consider the anticipated goals. A prompt-generator policy has its own aim in our system, allowing it to adapt to the action policy and automatically root the CoT process towards outputs that lead to decisive, high-performing actions. Meanwhile, the action policy is learning how to use the CoT outputs to take specific actions. Our empirical data reveal that our system outperforms leading methods in agent learning benchmarks such as Overcooked and FourRoom.
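Summary
Large language models (LLMs) show promise in tackling practical challenges by combining action-based policies with chain-of-thought (CoT) reasoning. High-quality prompts, however, are vital to the framework, and human intervention is typically required to develop grounding functions that ensure low-level controllers process CoT reasoning correctly. In this paper, we work toward a fully integrated end-to-end framework for task solving in real settings. To that end, we propose a new leader-follower bilevel framework that learns to ask questions (prompts) and then reasons to guide the learning of actions. A good prompt should make introspective revisions based on historical findings, leading the CoT to consider the anticipated goals. In our system, the prompt-generator policy has its own objective, which lets it adapt to the action policy and automatically steer the CoT process toward outputs that lead to decisive, high-performing actions. Meanwhile, the action policy learns to use the CoT outputs to take specific actions. Our empirical results show that the system performs strongly on agent learning benchmarks such as Overcooked and FourRoom.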
OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization
for: This paper focuses on the evaluation of opinion summarization models, specifically exploring the correlation between automatic metrics and human ratings.
methods: The paper uses a dataset called OpinSummEval, which includes human judgments and outputs from 14 opinion summarization models. The authors explore the correlation between 24 automatic metrics and human ratings across four dimensions.
results: The authors find that metrics based on neural networks generally outperform non-neural ones, but even the best-performing metrics do not consistently correlate well across all dimensions. This highlights the need for advancements in automated evaluation methods for opinion summarization.
Abstract
Opinion summarization sets itself apart from other types of summarization tasks due to its distinctive focus on aspects and sentiments. Although certain automated evaluation methods like ROUGE have gained popularity, we have found them to be unreliable measures for assessing the quality of opinion summaries. In this paper, we present OpinSummEval, a dataset comprising human judgments and outputs from 14 opinion summarization models. We further explore the correlation between 24 automatic metrics and human ratings across four dimensions. Our findings indicate that metrics based on neural networks generally outperform non-neural ones. However, even metrics built on powerful backbones, such as BART and GPT-3/3.5, do not consistently correlate well across all dimensions, highlighting the need for advancements in automated evaluation methods for opinion summarization. The code and data are publicly available at https://github.com/A-Chicharito-S/OpinSummEval/tree/main.
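Measuring how well an automatic metric tracks human ratings amounts to computing a rank correlation. The sketch below uses Kendall's tau (the tau-a variant, ignoring tie corrections) as an assumed choice, since the abstract does not name the coefficient, and the scores are made-up illustrative values rather than data from OpinSummEval.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical scores for four summaries on one dimension.
human_ratings = [1, 2, 3, 4]
metric_scores = [0.1, 0.4, 0.3, 0.9]  # one pair is ranked in the wrong order
tau = kendall_tau(human_ratings, metric_scores)
```

Repeating this computation per metric and per dimension gives the kind of correlation table that such a meta-evaluation reports.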
Towards a Unified Conversational Recommendation System: Multi-task Learning via Contextualized Knowledge Distillation
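results: Experiments show that the single model significantly improves recommendation performance while maintaining dialogue fluency, and achieves comparable results in terms of diversity.
Abstract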
In a Conversational Recommendation System (CRS), an agent is asked to recommend a set of items to users within natural language conversations. To address the need for both conversational capability and personalized recommendations, prior works have utilized separate recommendation and dialogue modules. However, such an approach inevitably results in a discrepancy between recommendation results and generated responses. To bridge the gap, we propose a multi-task learning approach for a unified CRS, where a single model jointly learns both tasks via Contextualized Knowledge Distillation (ConKD). We introduce two versions of ConKD: hard gate and soft gate. The former selectively gates between two task-specific teachers, while the latter integrates knowledge from both teachers. Our gates are computed on-the-fly in a context-specific manner, facilitating flexible integration of relevant knowledge. Extensive experiments demonstrate that our single model significantly improves recommendation performance while enhancing fluency, and achieves comparable results in terms of diversity.
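Summary
In a Conversational Recommendation System (CRS), an agent is asked to recommend a set of items to users within natural language conversations. To meet the need for both personalized recommendations and conversational capability, prior work has typically used separate recommendation and dialogue modules. However, this approach leads to a discrepancy between recommendation results and generated responses. To bridge the gap, we propose multi-task learning for a unified CRS, in which a single model jointly learns both tasks via Contextualized Knowledge Distillation (ConKD). We introduce two versions of ConKD: hard gate and soft gate. The former selectively gates between two task-specific teachers, while the latter integrates knowledge from both teachers. Our gates are computed in a context-specific manner, allowing flexible integration of relevant knowledge. Our experiments show that the single model substantially improves recommendation performance while enhancing fluency, and achieves comparable results in terms of diversity.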
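The difference between the hard and soft gates can be sketched on toy teacher distributions. The gate is passed in as a given scalar here, whereas the paper computes it from the dialogue context; the 0.5 threshold, the function names, and the two-token vocabulary are assumptions for illustration only.

```python
def hard_gate(p_recommend, p_dialogue, gate):
    """Select one teacher's distribution outright (assumed threshold of 0.5)."""
    return p_recommend if gate >= 0.5 else p_dialogue

def soft_gate(p_recommend, p_dialogue, gate):
    """Blend both teachers' distributions, weighted by the gate value."""
    return [gate * r + (1.0 - gate) * d for r, d in zip(p_recommend, p_dialogue)]

# Toy next-token distributions over a two-token vocabulary.
p_recommend = [1.0, 0.0]  # recommendation teacher prefers token 0
p_dialogue = [0.0, 1.0]   # dialogue teacher prefers token 1
gate = 0.25               # this context leans toward the dialogue teacher

hard_target = hard_gate(p_recommend, p_dialogue, gate)  # [0.0, 1.0]
soft_target = soft_gate(p_recommend, p_dialogue, gate)  # [0.25, 0.75]
```

Either target could then serve as the distribution that the single student model is trained to match at that decoding step.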
er.autopilot 1.0: The Full Autonomous Stack for Oval Racing at High Speeds
paper_authors: Ayoub Raji, Danilo Caporale, Francesco Gatti, Andrea Giove, Micaela Verucchi, Davide Malatesta, Nicola Musiu, Alessandro Toschi, Silviu Roberto Popitanu, Fabio Bagni, Massimiliano Bosi, Alexander Liniger, Marko Bertogna, Daniele Morra, Francesco Amerotti, Luca Bartoli, Federico Martello, Riccardo Porta
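for: This work presents an independently developed autonomous racing software architecture, validated experimentally on the race track.
methods: The stack uses independently developed autonomous driving software, including modules for static obstacle avoidance, active overtaking, and speed control.
results: The team placed second and third in the first two events; the paper reports experimental results for each module and the lessons learned.
Abstract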
The Indy Autonomous Challenge (IAC) brought together, for the first time in history, nine autonomous racing teams competing at unprecedented speed and in a head-to-head scenario, using independently developed software on open-wheel racecars. This paper presents the complete software architecture used by team TII EuroRacing (TII-ER), covering all the modules needed to avoid static obstacles, perform active overtakes and reach speeds above 75 m/s (270 km/h). In addition to the most common modules related to perception, planning, and control, we discuss the approaches used for vehicle dynamics modelling, simulation, telemetry, and safety. Overall results and the performance of each module are described, as well as the lessons learned during the first two events of the competition on oval tracks, where the team placed second and third, respectively.
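Summary
The Indy Autonomous Challenge (IAC) brought together, for the first time in history, nine autonomous racing teams competing at unprecedented speed and head-to-head, using independently developed software on open-wheel racecars. This paper presents the complete software architecture used by team TII EuroRacing (TII-ER), covering the modules needed to avoid static obstacles, perform active overtakes, and reach speeds above 75 m/s (270 km/h). We also discuss the approaches used for vehicle dynamics modelling, simulation, telemetry, and safety. Finally, we report the performance of each module and the lessons learned during the first two events of the competition on oval tracks.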
Detrimental Contexts in Open-Domain Question Answering
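results: The authors find that, although access to more information is generally expected to improve QA accuracy, using all retrieved passages degrades performance compared with using subsets of them. Filtering the retrieved passages improves model performance.
Abstract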
For knowledge intensive NLP tasks, it has been widely accepted that accessing more information is a contributing factor to improvements in the model's end-to-end performance. However, counter-intuitively, too much context can have a negative impact on the model when evaluated on common question answering (QA) datasets. In this paper, we analyze how passages can have a detrimental effect on retrieve-then-read architectures used in question answering. Our empirical evidence indicates that the current read architecture does not fully leverage the retrieved passages and significantly degrades its performance when using the whole passages compared to utilizing subsets of them. Our findings demonstrate that model accuracy can be improved by 10% on two popular QA datasets by filtering out detrimental passages. Additionally, these outcomes are attained by utilizing existing retrieval methods without further training or data. We further highlight the challenges associated with identifying the detrimental passages. First, even with the correct context, the model can make an incorrect prediction, posing a challenge in determining which passages are most influential. Second, evaluation typically considers lexical matching, which is not robust to variations of correct answers. Despite these limitations, our experimental results underscore the pivotal role of identifying and removing these detrimental passages for the context-efficient retrieve-then-read pipeline. Code and data are available at https://github.com/xfactlab/emnlp2023-damaging-retrieval
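Summary
For knowledge-intensive NLP tasks, it is widely accepted that access to more information improves a model's end-to-end performance. Counter-intuitively, however, too much context can hurt the model on common question answering (QA) datasets. In this paper, we analyze how passages can have a detrimental effect on question answering models. Our empirical evidence indicates that the current reader architecture does not fully exploit the retrieved passages, and its performance drops markedly when whole passages are used as input. Our findings show that model accuracy can be improved by filtering out detrimental passages. We also highlight the challenges of identifying such passages. First, even with the correct context, the model can make an incorrect prediction, which makes it hard to determine which passages are most influential. Second, evaluation typically relies on lexical matching, which is not robust to variations of correct answers. Despite these limitations, our experimental results show that identifying and removing detrimental passages is essential for a context-efficient retrieve-then-read pipeline. Code and data are available at https://github.com/xfactlab/emnlp2023-damaging-retrieval.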
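The limitation of lexical matching noted above is easy to reproduce. The sketch below implements a common SQuAD-style exact-match check (lowercasing, stripping punctuation and English articles); whether the paper uses exactly this normalization is an assumption, and the example answers are made up.

```python
import re
import string

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    """True if the normalized prediction equals any normalized gold answer."""
    return any(normalize(prediction) == normalize(gold) for gold in gold_answers)

gold = ["United States"]
surface_variant = exact_match("The United States.", gold)  # matches after normalization
correct_alias = exact_match("USA", gold)                   # correct answer, but no match
```

An answer such as "USA" is counted as wrong even though it is correct, which is one reason it is hard to tell from accuracy alone whether a passage was truly detrimental.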
paper_authors: Yejoon Lee, Philhoon Oh, James Thorne
for: This paper explores the effectiveness of generating context passages from large language models (LLMs) in open-domain question answering (QA), and investigates why generated passages may be more effective than retrieved ones.
methods: The paper introduces the concept of knowledge corpus error, which arises when the knowledge corpus used for retrieval is only a subset of the entire string space, and mitigates this shortcoming by generating passages in a larger space using LLMs. The paper also presents an experiment of paraphrasing human-annotated gold context using LLMs to observe knowledge corpus error empirically.
results: The results across three QA benchmarks show an increased performance (10% - 13%) when using paraphrased passage, indicating a signal for the existence of knowledge corpus error.
Abstract
Recent works in open-domain question answering (QA) have explored generating context passages from large language models (LLMs), replacing the traditional retrieval step in the QA pipeline. However, it is not well understood why generated passages can be more effective than retrieved ones. This study revisits the conventional formulation of QA and introduces the concept of knowledge corpus error. This error arises when the knowledge corpus used for retrieval is only a subset of the entire string space, potentially excluding more helpful passages that exist outside the corpus. LLMs may mitigate this shortcoming by generating passages in a larger space. We design an experiment that paraphrases human-annotated gold context using LLMs to observe knowledge corpus error empirically. Our results across three QA benchmarks reveal an increased performance (10% - 13%) when using paraphrased passages, indicating a signal for the existence of knowledge corpus error. Our code is available at https://github.com/xfactlab/emnlp2023-knowledge-corpus-error
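Summary
Recent work in open-domain question answering (QA) has explored generating context passages from large language models (LLMs), replacing the traditional retrieval step in the QA pipeline. However, it is not well understood why generated passages can be more effective than retrieved ones. This study revisits the conventional formulation of QA and introduces the concept of knowledge corpus error. This error arises when the knowledge corpus used for retrieval is only a subset of the entire string space, potentially excluding more helpful passages. LLMs may mitigate this shortcoming because they can generate passages in a larger space. We design an experiment that uses LLMs to paraphrase human-annotated gold context in order to observe knowledge corpus error empirically. Our results on three QA benchmarks show a 10% to 13% performance increase when using paraphrased passages, which signals the existence of knowledge corpus error. Our code is available at https://github.com/xfactlab/emnlp2023-knowledge-corpus-error.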
DUMA: a Dual-Mind Conversational Agent with Fast and Slow Thinking
paper_authors: Xiaoyu Tian, Liangyu Chen, Na Liu, Yaxuan Liu, Wei Zou, Kaijiang Chen, Ming Cui
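for: This paper proposes a conversational agent framework based on dual-process theory to improve the efficiency and quality of responses.
methods: The framework uses two generative Large Language Models (LLMs), one for fast thinking and one for slow thinking. The fast-thinking model handles external interactions and initial response generation, and decides from the complexity of the response whether to engage the slow-thinking model. When engaged, the slow-thinking model takes over the conversation and performs careful planning, reasoning, and tool use to provide a well-analyzed response.
results: Experiments show that the method balances efficiency and quality and improves substantially over the baseline.
Abstract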
Inspired by the dual-process theory of human cognition, we introduce DUMA, a novel conversational agent framework that embodies a dual-mind mechanism through the utilization of two generative Large Language Models (LLMs) dedicated to fast and slow thinking respectively. The fast thinking model serves as the primary interface for external interactions and initial response generation, evaluating the necessity for engaging the slow thinking model based on the complexity of the complete response. When invoked, the slow thinking model takes over the conversation, engaging in meticulous planning, reasoning, and tool utilization to provide a well-analyzed response. This dual-mind configuration allows for a seamless transition between intuitive responses and deliberate problem-solving processes based on the situation. We have constructed a conversational agent to handle online inquiries in the real estate industry. The experiment proves that our method balances effectiveness and efficiency, and has a significant improvement compared to the baseline.
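Summary
Inspired by the dual-process theory of human cognition, we introduce DUMA, a conversational agent framework that realizes a dual-mind mechanism with two generative large language models (LLMs) dedicated to fast and slow thinking respectively. The fast-thinking model serves as the primary interface for external interactions, assesses the complexity of the question, and engages the slow-thinking model when needed. When invoked, the slow-thinking model takes over the conversation and carries out careful planning, reasoning, and tool use to provide a well-analyzed response. This dual-mind configuration allows a smooth transition between intuitive responses and deliberate problem solving depending on the situation. We built a conversational agent to handle online inquiries in the real estate industry, and experiments show that our method balances effectiveness and efficiency and improves significantly over the baseline.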
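The hand-off between the two models can be sketched as a simple routing function. All three callables below are toy stand-ins for LLM calls, and the word-count escalation rule is a placeholder assumption; the framework described above lets the fast-thinking model itself judge whether the slow-thinking model is needed.

```python
def respond(query, fast_model, slow_model, needs_slow_thinking):
    """Answer with the fast model, escalating to the slow model when required."""
    draft = fast_model(query)
    if needs_slow_thinking(query, draft):
        # The slow model takes over and may plan, reason, or call tools.
        return slow_model(query, draft)
    return draft

# Hypothetical stand-ins for the two LLMs and the escalation check.
def fast_model(query):
    return f"Quick answer to: {query}"

def slow_model(query, draft):
    return f"Detailed analysis of: {query}"

def needs_slow_thinking(query, draft):
    return len(query.split()) > 8  # placeholder complexity heuristic

simple = respond("Is the flat still available?",
                 fast_model, slow_model, needs_slow_thinking)
complex_reply = respond(
    "Compare the total five year cost of renting versus buying this flat",
    fast_model, slow_model, needs_slow_thinking)
```

Keeping the escalation check as a separate callable makes it easy to swap the heuristic for a model-based judgment without touching the routing logic.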
results: The paper proposes a measure of the degree of moral responsibility and compares the approach with the existing BvH and HK approaches.
Abstract
As more and more decisions that have a significant ethical dimension are being outsourced to AI systems, it is important to have a definition of moral responsibility that can be applied to AI systems. Moral responsibility for an outcome of an agent who performs some action is commonly taken to involve both a causal condition and an epistemic condition: the action should cause the outcome, and the agent should have been aware -- in some form or other -- of the possible moral consequences of their action. This paper presents a formal definition of both conditions within the framework of causal models. I compare my approach to the existing approaches of Braham and van Hees (BvH) and of Halpern and Kleiman-Weiner (HK). I then generalize my definition into a degree of responsibility.
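Summary
As more and more decisions with a significant ethical dimension are outsourced to AI systems, it is important to have a definition of moral responsibility that can be applied to them. Moral responsibility for an outcome is commonly taken to involve two conditions: the action should cause the outcome, and the agent should have been aware, in some form or other, of the possible moral consequences of the action. This paper presents a formal definition of both conditions and compares it with the existing approaches of Braham and van Hees (BvH) and of Halpern and Kleiman-Weiner (HK). I then generalize the definition into a degree of responsibility.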
Large language models for aspect-based sentiment analysis
for: The paper is written for assessing the performance of GPT-4 and GPT-3.5 in aspect-based sentiment analysis (ABSA) tasks, and exploring the cost-performance trade-offs of different models.
methods: The paper uses zero-shot, few-shot, and fine-tuned settings to evaluate the performance of GPT-4 and GPT-3.5 on the ABSA task, and compares their performance with InstructABSA [@scaria_instructabsa_2023].
results: The fine-tuned GPT-3.5 achieves a state-of-the-art F1 score of 83.8 on the joint aspect term extraction and polarity classification task, improving upon InstructABSA by 5.7%. However, the fine-tuned model has 1000 times more parameters and thus higher inference cost. The paper also finds that detailed prompts improve performance in zero-shot and few-shot settings but are not necessary for fine-tuned models.
Abstract
Large language models (LLMs) offer unprecedented text completion capabilities. As general models, they can fulfill a wide range of roles, including those of more specialized models. We assess the performance of GPT-4 and GPT-3.5 in zero-shot, few-shot and fine-tuned settings on the aspect-based sentiment analysis (ABSA) task. Fine-tuned GPT-3.5 achieves a state-of-the-art F1 score of 83.8 on the joint aspect term extraction and polarity classification task of the SemEval-2014 Task 4, improving upon InstructABSA [@scaria_instructabsa_2023] by 5.7%. However, this comes at the price of 1000 times more model parameters and thus increased inference cost. We discuss the cost-performance trade-offs of different models, and analyze the typical errors that they make. Our results also indicate that detailed prompts improve performance in zero-shot and few-shot settings but are not necessary for fine-tuned models. This evidence is relevant for practitioners who are faced with the choice of prompt engineering versus fine-tuning when using LLMs for ABSA.
OffMix-3L: A Novel Code-Mixed Dataset in Bangla-English-Hindi for Offensive Language Identification
results: The study finds that BanglishBERT performs strongly on this trilingual code-mixed corpus, outperforming other transformer-based models.
Abstract
Code-mixing is a well-studied linguistic phenomenon when two or more languages are mixed in text or speech. Several works have been conducted on building datasets and performing downstream NLP tasks on code-mixed data. Although it is not uncommon to observe code-mixing of three or more languages, most available datasets in this domain contain code-mixed data from only two languages. In this paper, we introduce OffMix-3L, a novel offensive language identification dataset containing code-mixed data from three different languages. We experiment with several models on this dataset and observe that BanglishBERT outperforms other transformer-based models and GPT-3.5.
Summary
Code-mixing is a widely studied linguistic phenomenon in which two or more languages are mixed in text or speech. Many studies have built code-mixed datasets and carried out downstream NLP tasks on them. Although mixing three languages is not uncommon, most available datasets contain code-mixed data from only two languages. In this paper, we introduce OffMix-3L, a new offensive language identification dataset with code-mixed data from three languages. We try several models on this dataset and find that BanglishBERT outperforms other transformer-based models and GPT-3.5.
FormalGeo: The First Step Toward Human-like IMO-level Geometric Automated Reasoning
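results: The FormalGeo system is implemented, and experiments with FGPS validate the correctness and utility of the GFT. With backward depth-first search, the problem-solving failure rate is only 2.42%, and deep learning techniques can lower it further.
Abstract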
This is the first paper in a series of work we have accomplished over the past three years. In this paper, we have constructed a complete and compatible formal plane geometry system. This will serve as a crucial bridge between IMO-level plane geometry challenges and readable AI automated reasoning. With this formal system in place, we have been able to seamlessly integrate modern AI models with our formal system. Within this formal framework, AI is now capable of providing deductive reasoning solutions to IMO-level plane geometry problems, just like handling other natural languages, and these proofs are readable, traceable, and verifiable. We propose the geometry formalization theory (GFT) to guide the development of the geometry formal system. Based on the GFT, we have established FormalGeo, which consists of 88 geometric predicates and 196 theorems. It can represent, validate, and solve IMO-level geometry problems. We have also crafted FGPS (formal geometry problem solver) in Python. It serves as both an interactive assistant for verifying problem-solving processes and an automated problem solver, utilizing various methods such as forward search, backward search and AI-assisted search. We have annotated the FormalGeo7k dataset, containing 6,981 (expanded to 186,832 through data augmentation) geometry problems with complete formal language annotations. Implementation of the formal system and experiments on FormalGeo7k validate the correctness and utility of the GFT. The backward depth-first search method yields only a 2.42% problem-solving failure rate, and we can incorporate deep learning techniques to achieve a lower one. The source code of FGPS and the FormalGeo7k dataset are available at https://github.com/BitSecret/FormalGeo.
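Summary
This is the first paper in a series of work we have carried out over the past three years. In this paper, we construct a complete and compatible formal plane geometry system, which serves as a key bridge between IMO-level plane geometry challenges and readable AI automated reasoning. With this formal system in place, we can integrate modern AI models with it. Within this formal framework, AI can now provide deductive reasoning solutions to plane geometry problems, just as it handles other natural languages, and the resulting proofs are readable, traceable, and verifiable. We propose the geometry formalization theory (GFT) to guide the development of the formal geometry system. Based on the GFT, we built FormalGeo, which contains 88 geometric predicates and 196 theorems and can represent, validate, and solve IMO-level plane geometry problems. We also built FGPS (formal geometry problem solver), an interactive assistant and automated problem solver implemented in Python that supports several methods, such as forward search, backward search, and AI-assisted search. We annotated the FormalGeo7k dataset, which contains 6,981 plane geometry problems (expanded to 186,832 through data augmentation) with complete formal language annotations. The implementation of the formal system and the experiments on FormalGeo7k validate the correctness and utility of the GFT. Backward depth-first search yields a problem-solving failure rate of only 2.42%, which deep learning techniques can reduce further. The source code of FGPS and the FormalGeo7k dataset are available on GitHub.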
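The backward depth-first search mentioned above can be illustrated with a small sketch. This is a toy propositional backward chainer, not the FGPS API: the rule encoding as `(premises, conclusion)` pairs, the function name `backward_dfs`, and the example statements are all assumptions made for illustration.

```python
def backward_dfs(goal, facts, rules, _visiting=None):
    """Return True if `goal` can be derived from `facts` using `rules`.

    rules: list of (premises, conclusion) pairs (hypothetical toy encoding).
    The search starts at the goal and works backward, depth first.
    """
    visiting = _visiting if _visiting is not None else set()
    if goal in facts:
        return True
    if goal in visiting:  # goal is already on the current path: avoid cycles
        return False
    visiting.add(goal)
    for premises, conclusion in rules:
        # Try every rule that concludes the goal; all its premises must hold.
        if conclusion == goal and all(
            backward_dfs(p, facts, rules, visiting) for p in premises
        ):
            visiting.discard(goal)
            return True
    visiting.discard(goal)
    return False

# Made-up facts and theorems for an isosceles triangle.
facts = {"AB = AC"}
rules = [
    (("AB = AC",), "triangle ABC is isosceles"),
    (("triangle ABC is isosceles",), "angle B = angle C"),
]
solved = backward_dfs("angle B = angle C", facts, rules)
unsolved = backward_dfs("angle A = 90", facts, rules)
```

A failed search such as `unsolved` corresponds to the kind of case counted in a failure rate; a real solver would also need unification over geometric predicates and time limits, which this sketch omits.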
Deep Learning Enables Large Depth-of-Field Images for Sub-Diffraction-Limit Scanning Superlens Microscopy
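results: The PSNR and structural similarity index measure values indicate that the deep learning method performs excellently in image-to-image translation, with a PSNR improvement of about 0.74 dB over the optical super-resolution images. The method has broad application prospects in chip-level defect detection, biological sample analysis, forensics, and other fields.
Abstract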
Scanning electron microscopy (SEM) is indispensable in diverse applications ranging from microelectronics to food processing because it provides large depth-of-field images with a resolution beyond the optical diffraction limit. However, the technology requires coating conductive films on insulator samples and a vacuum environment. We use deep learning to obtain the mapping relationship between optical super-resolution (OSR) images and SEM domain images, which enables the transformation of OSR images into SEM-like large depth-of-field images. Our custom-built scanning superlens microscopy (SSUM) system, which requires neither coating samples by conductive films nor a vacuum environment, is used to acquire the OSR images with features down to ~80 nm. The peak signal-to-noise ratio (PSNR) and structural similarity index measure values indicate that the deep learning method performs excellently in image-to-image translation, with a PSNR improvement of about 0.74 dB over the optical super-resolution images. The proposed method provides a high level of detail in the reconstructed results, indicating that it has broad applicability to chip-level defect detection, biological sample analysis, forensics, and various other fields.
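Summary
Scanning electron microscopy (SEM) is indispensable in applications ranging from microelectronics to food processing because it provides large depth-of-field images with a resolution beyond the optical diffraction limit. However, the technique requires coating insulator samples with conductive films and operating in a vacuum. We use deep learning to learn the mapping between optical super-resolution (OSR) images and SEM-domain images, so that OSR images can be transformed into SEM-like large depth-of-field images. Our custom-built scanning superlens microscopy (SSUM) system needs neither a conductive coating on the sample nor a vacuum environment, and it acquires OSR images with features down to ~80 nm. The PSNR and structural similarity index values show that the deep learning method performs excellently in image-to-image translation, with a PSNR improvement of about 0.74 dB over the OSR images. The proposed method recovers a high level of detail and is applicable to semiconductor defect detection, biological sample analysis, forensics, and various other fields.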
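For readers unfamiliar with the metric, the following sketch computes the peak signal-to-noise ratio, PSNR = 10 log10(MAX^2 / MSE), for two small grayscale images. It assumes 8-bit pixel values (MAX = 255) and images stored as nested lists; the paper's actual evaluation pipeline is not described here, so treat this only as an illustration of the formula.

```python
import math

def psnr(reference, estimate, max_value=255.0):
    """PSNR in dB between two equally sized images given as nested lists."""
    flat_ref = [v for row in reference for v in row]
    flat_est = [v for row in estimate for v in row]
    if len(flat_ref) != len(flat_est):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_est)) / len(flat_ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Made-up 2x2 images: two pixels are off by 5 grey levels.
reference = [[0, 255], [255, 0]]
estimate = [[0, 250], [255, 5]]
value = psnr(reference, estimate)
```

Because the scale is logarithmic, a gain of 0.74 dB corresponds to a reduction of the mean squared error by a factor of about 10^0.074, that is, roughly 16% lower MSE.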
Autonomous 3D Exploration in Large-Scale Environments with Dynamic Obstacles
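results: DAEP outperforms current state-of-the-art methods in dynamic and large-scale environments and is more effective at both exploration and collision avoidance.
Abstract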
Exploration in dynamic and uncertain real-world environments is an open problem in robotics and constitutes a foundational capability of autonomous systems operating in most of the real world. While 3D exploration planning has been extensively studied, the environments are assumed static or only reactive collision avoidance is carried out. We propose a novel approach to not only avoid dynamic obstacles but also include them in the plan itself, to exploit the dynamic environment in the agent's favor. The proposed planner, Dynamic Autonomous Exploration Planner (DAEP), extends AEP to explicitly plan with respect to dynamic obstacles. To thoroughly evaluate exploration planners in such settings we propose a new enhanced benchmark suite with several dynamic environments, including large-scale outdoor environments. DAEP outperforms state-of-the-art planners in dynamic and large-scale environments. DAEP is shown to be more effective at both exploration and collision avoidance.
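Summary
Exploration in dynamic and uncertain real-world environments is an open problem in robotics and a foundational capability of autonomous systems operating in most of the real world. Although 3D exploration planning has been studied extensively, the environment is usually assumed to be static, or only reactive collision avoidance is performed. We propose a new approach that not only avoids dynamic obstacles but also includes them in the plan, exploiting the dynamic environment in the agent's favor. The proposed Dynamic Autonomous Exploration Planner (DAEP) extends AEP to plan explicitly with respect to dynamic obstacles. To evaluate exploration planners thoroughly in such settings, we propose a new enhanced benchmark suite that includes several large-scale outdoor environments. DAEP performs well in dynamic and large-scale environments and is more effective at both exploration and collision avoidance.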
Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning
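results: Across extensive experiments, FamO2O shows a statistically significant improvement over existing methods and achieves state-of-the-art performance on the D4RL benchmark. Code is available at https://github.com/LeapLabTHU/FamO2O.
Abstract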
Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with fine-tuning in an online environment. However, the incorporation of online fine-tuning can intensify the well-known distributional shift problem. Existing solutions tackle this problem by imposing a policy constraint on the policy improvement objective in both offline and online learning. They typically advocate a single balance between policy improvement and constraints across diverse data collections. This one-size-fits-all manner may not optimally leverage each collected sample due to the significant variation in data quality across different states. To this end, we introduce Family Offline-to-Online RL (FamO2O), a simple yet effective framework that empowers existing algorithms to determine state-adaptive improvement-constraint balances. FamO2O utilizes a universal model to train a family of policies with different improvement/constraint intensities, and a balance model to select a suitable policy for each state. Theoretically, we prove that state-adaptive balances are necessary for achieving a higher policy performance upper bound. Empirically, extensive experiments show that FamO2O offers a statistically significant improvement over various existing methods, achieving state-of-the-art performance on the D4RL benchmark. Codes are available at https://github.com/LeapLabTHU/FamO2O.
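Summary
Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with fine-tuning in an online environment. However, online fine-tuning can intensify the distributional shift problem. Existing solutions address this by adding a policy constraint in both offline and online learning, and they typically adopt a single balance between policy improvement and constraints across diverse data collections. This one-size-fits-all approach may not make the best use of each collected sample, because data quality varies greatly across states. To this end, we introduce Family Offline-to-Online RL (FamO2O), a framework that lets existing algorithms choose a state-adaptive balance between improvement and constraint. FamO2O uses a universal model to train a family of policies with different improvement/constraint intensities, and a balance model to select the most suitable policy for each state. Theoretically, we prove that state-adaptive balances are necessary for achieving a higher upper bound on policy performance. Empirically, extensive experiments show that FamO2O achieves state-of-the-art performance on the D4RL benchmark. Code is available at https://github.com/LeapLabTHU/FamO2O.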
Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare
paper_authors: Junling Liu, Ziming Wang, Qichen Ye, Dading Chong, Peilin Zhou, Yining Hua
for: This paper aims to address the lack of large language models in languages other than English and of models that can interpret multi-modal input, both of which matter for global healthcare accessibility.
methods: The study introduces Qilin-Med-VL, the first Chinese large vision-language model that combines a pre-trained Vision Transformer (ViT) with a foundational language model. The model undergoes a two-stage curriculum training process that includes feature alignment and instruction tuning.
results: The model is able to generate medical captions and answer complex medical queries, and the authors release a dataset called ChiMed-VL, which consists of over 1 million image-text pairs to enable detailed and comprehensive interpretation of medical data using various types of images.
Abstract
Large Language Models (LLMs) have introduced a new era of proficiency in comprehending complex healthcare and biomedical topics. However, there is a noticeable lack of models in languages other than English and models that can interpret multi-modal input, which is crucial for global healthcare accessibility. In response, this study introduces Qilin-Med-VL, the first Chinese large vision-language model designed to integrate the analysis of textual and visual data. Qilin-Med-VL combines a pre-trained Vision Transformer (ViT) with a foundational LLM. It undergoes a thorough two-stage curriculum training process that includes feature alignment and instruction tuning. This method enhances the model's ability to generate medical captions and answer complex medical queries. We also release ChiMed-VL, a dataset consisting of more than 1M image-text pairs. This dataset has been carefully curated to enable detailed and comprehensive interpretation of medical data using various types of images.
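Summary
Large language models (LLMs) have opened a new era of proficiency in understanding complex healthcare and biomedical topics. However, there is a lack of models in languages other than English and of models that can handle multi-modal input, both of which are crucial for global healthcare accessibility. In response, this study introduces Qilin-Med-VL, the first Chinese large vision-language model designed to integrate textual and visual data. Qilin-Med-VL combines a pre-trained Vision Transformer (ViT) with a foundational LLM and undergoes a two-stage curriculum training process consisting of feature alignment and instruction tuning. This enables the model to generate medical captions and answer complex medical queries. We also release the ChiMed-VL dataset, which contains more than 1M image-text pairs and has been carefully curated to support detailed and comprehensive interpretation of medical data using various types of images.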
Understanding Parameter Saliency via Extreme Value Theory
for: This paper aims to identify and correct misclassifications in deep neural networks, specifically convolutional neural networks (CNNs), by ranking convolution filters based on their potential to cause misidentification.
methods: The paper uses parameter saliency ranking, which is based on extreme value theory, to identify the filters that are most likely to cause misclassification. The authors also use fine-tuning to correct misidentification.
results: The paper shows that the proposed method can detect malicious filters and is less biased against the depth of layers in deep neural networks compared to existing methods. The authors also demonstrate the effectiveness of their approach on ImageNet.
Abstract
Deep neural networks are being increasingly implemented throughout society in recent years. It is useful to identify which parameters trigger misclassification in diagnosing undesirable model behaviors. The concept of parameter saliency is proposed and used to diagnose convolutional neural networks (CNNs) by ranking convolution filters that may have caused misclassification on the basis of parameter saliency. It is also shown that fine-tuning the top ranking salient filters has efficiently corrected misidentification on ImageNet. However, there is still a knowledge gap in terms of understanding why parameter saliency ranking can find the filters inducing misidentification. In this work, we attempt to bridge the gap by analyzing parameter saliency ranking from a statistical viewpoint, namely, extreme value theory. We first show that the existing work implicitly assumes that the gradient norm computed for each filter follows a normal distribution. Then, we clarify the relationship between parameter saliency and the score based on the peaks-over-threshold (POT) method, which is often used to model extreme values. Finally, we reformulate parameter saliency in terms of the POT method, where this reformulation is regarded as statistical anomaly detection and does not require the implicit assumptions of the existing parameter-saliency formulation. Our experimental results demonstrate that our reformulation can detect malicious filters as well. Furthermore, we show that the existing parameter saliency method exhibits a bias against the depth of layers in deep neural networks. In particular, this bias has the potential to inhibit the discovery of filters that cause misidentification in situations where domain shift occurs. In contrast, parameter saliency based on POT shows less of this bias.
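Summary
Deep neural networks have become increasingly widespread in society in recent years. To diagnose undesirable model behavior, it is useful to identify which parameters trigger misclassification. The concept of parameter saliency has been proposed and used to diagnose convolutional neural networks (CNNs) by ranking the convolution filters that may have caused a misclassification. It has also been shown that fine-tuning the top-ranked salient filters efficiently corrects misclassification on ImageNet. However, there is still a knowledge gap in understanding why parameter saliency ranking can find the filters that induce misclassification. We attempt to bridge this gap by analyzing the ranking from a statistical viewpoint, namely extreme value theory. We first show that existing work implicitly assumes that the gradient norm computed for each filter follows a normal distribution. We then clarify the relationship between parameter saliency and the score based on the peaks-over-threshold (POT) method, which is used to model extreme values. Finally, we reformulate parameter saliency in terms of the POT method as statistical anomaly detection, which does not require the implicit assumptions of the existing formulation. Our experiments show that the reformulation can detect malicious filters. We also find that the existing parameter saliency method is biased with respect to layer depth, which may hinder the discovery of filters that cause misclassification when domain shift occurs. In contrast, parameter saliency based on POT shows less of this bias.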
A Comprehensive and Reliable Feature Attribution Method: Double-sided Remove and Reconstruct (DoRaR)
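paper_authors: Dong Qin, George Amariucai, Daji Qiao, Yong Guan, Shen Fu
for: This study addresses the limited transparency of the internal decision-making mechanism in deep neural networks and other machine learning models, in order to broaden the application of these black-box models across domains.
methods: The study proposes the Double-sided Remove and Reconstruct (DoRaR) feature attribution method, which builds on several improvements to mitigate the artifacts problem and the Encoding Prediction in the Explanation (EPITE) problem and helps train a better-performing feature selector.
results: Extensive tests on MNIST, CIFAR10, and the authors' own synthetic dataset show that DoRaR explains model decisions effectively and outperforms other existing feature attribution methods.
Abstract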
The limited transparency of the inner decision-making mechanism in deep neural networks (DNN) and other machine learning (ML) models has hindered their application in several domains. In order to tackle this issue, feature attribution methods have been developed to identify the crucial features that heavily influence decisions made by these black box models. However, many feature attribution methods have inherent downsides. For example, one category of feature attribution methods suffers from the artifacts problem, which arises from feeding out-of-distribution masked inputs directly through a classifier that was originally trained on natural data points. Another category of feature attribution methods finds explanations by using jointly trained feature selectors and predictors. While avoiding the artifacts problem, this new category suffers from the Encoding Prediction in the Explanation (EPITE) problem, in which the predictor's decisions rely not on the features, but on the masks that select those features. As a result, the credibility of attribution results is undermined by these downsides. In this research, we introduce the Double-sided Remove and Reconstruct (DoRaR) feature attribution method, which is based on several improvement methods that address these issues. By conducting thorough testing on MNIST, CIFAR10 and our own synthetic dataset, we demonstrate that the DoRaR feature attribution method can effectively bypass the above issues and can aid in training a feature selector that outperforms other state-of-the-art feature attribution methods. Our code is available at https://github.com/dxq21/DoRaR.
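Summary
The limited transparency of the internal decision-making mechanism in deep neural networks (DNN) and other machine learning (ML) models has restricted their application in some domains. To address this, feature attribution methods have been developed to identify the key features behind the decisions of DNN and ML models. However, many feature attribution methods have drawbacks. One category suffers from the artifacts problem, in which out-of-distribution masked inputs are fed directly to the originally trained classifier. Another category uses jointly trained feature selectors and predictors; it avoids the artifacts problem but suffers from the Encoding Prediction in the Explanation (EPITE) problem, which undermines the credibility of the attribution results. To solve these problems, we propose the Double-sided Remove and Reconstruct (DoRaR) feature attribution method, which is based on several improvements. Through extensive testing on MNIST, CIFAR10, and our own synthetic dataset, we show that DoRaR effectively circumvents these problems and helps train a better-performing feature selector. Our code is available at https://github.com/dxq21/DoRaR.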
Unified Segment-to-Segment Framework for Simultaneous Sequence Generation
for: simultaneous sequence generation for real-time scenarios, such as streaming speech recognition, simultaneous machine translation, and simultaneous speech translation
methods: segment-to-segment framework (Seg2Seg) with adaptive and unified learning for mapping between source and target sequences
results: state-of-the-art performance and better generality across various tasks, as demonstrated by experiments on multiple simultaneous generation tasks
Abstract
Simultaneous sequence generation is a pivotal task for real-time scenarios, such as streaming speech recognition, simultaneous machine translation and simultaneous speech translation, where the target sequence is generated while receiving the source sequence. The crux of achieving high-quality generation with low latency lies in identifying the optimal moments for generating, accomplished by learning a mapping between the source and target sequences. However, existing methods often rely on task-specific heuristics for different sequence types, limiting the model's capacity to adaptively learn the source-target mapping and hindering the exploration of multi-task learning for various simultaneous tasks. In this paper, we propose a unified segment-to-segment framework (Seg2Seg) for simultaneous sequence generation, which learns the mapping in an adaptive and unified manner. During the process of simultaneous generation, the model alternates between waiting for a source segment and generating a target segment, making the segment serve as the natural bridge between the source and target. To accomplish this, Seg2Seg introduces a latent segment as the pivot between source and target and explores all potential source-target mappings via the proposed expectation training, thereby learning the optimal moments for generating. Experiments on multiple simultaneous generation tasks demonstrate that Seg2Seg achieves state-of-the-art performance and exhibits better generality across various tasks.
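Summary
Simultaneous sequence generation is a key task for real-time scenarios such as streaming speech recognition, simultaneous machine translation, and simultaneous speech translation, where the target sequence is generated while the source sequence is being received. The key to high-quality generation lies in identifying the optimal moments for generating, which is achieved by learning a mapping between the source and target sequences. However, existing methods usually rely on task-specific heuristics, which limits the model's ability to learn the source-target mapping adaptively across sequence types and hinders multi-task learning across simultaneous tasks. This paper proposes a unified segment-to-segment framework (Seg2Seg) for simultaneous sequence generation. During simultaneous generation, the model alternates between waiting for a source segment and generating a target segment, so that the segment becomes a natural bridge between source and target. To achieve this, Seg2Seg introduces a latent segment as the pivot between source and target and explores all possible source-target mappings through the proposed expectation training, thereby learning the optimal moments for generating. Experiments show that Seg2Seg achieves state-of-the-art performance and better generality on multiple simultaneous generation tasks.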
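results: The experimental results show that this architecture achieves state-of-the-art performance in modelling a variety of linguistic structures and integrates very effectively with the latent linguistic representations learned by pretraining.
Abstract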
We argue that Transformers are essentially graph-to-graph models, with sequences just being a special case. Attention weights are functionally equivalent to graph edges. Our Graph-to-Graph Transformer architecture makes this ability explicit, by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions, thereby integrating explicit graphs into the latent graphs learned by pretrained Transformers. Adding iterative graph refinement provides a joint embedding of input, output, and latent graphs, allowing non-autoregressive graph prediction to optimise the complete graph without any bespoke pipeline or decoding strategy. Empirical results show that this architecture achieves state-of-the-art accuracies for modelling a variety of linguistic structures, integrating very effectively with the latent linguistic representations learned by pretraining.
Summary
We argue that Transformers are essentially graph-to-graph models, with sequences being just a special case. Attention weights are functionally equivalent to graph edges. Our Graph-to-Graph Transformer architecture makes this ability explicit by inputting graph edges into the attention weight computations and predicting graph edges with attention-like functions, thereby integrating explicit graphs into the latent graphs learned by pretrained Transformers. Adding iterative graph refinement provides a joint embedding of the input, output, and latent graphs, allowing non-autoregressive graph prediction to optimize the complete graph without any bespoke pipeline or decoding strategy. Empirical results show that this architecture achieves state-of-the-art accuracy for modelling a variety of linguistic structures, integrating very effectively with the latent linguistic representations learned by pretraining.
Matching of Descriptive Labels to Glossary Descriptions
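methods: The paper proposes a framework that takes an existing semantic text similarity measurement (STS) and augments it with semantic label enrichment and set-based collective contextualization. Semantic label enrichment retrieves sentences relevant to a given label, while set-based collective contextualization computes the similarity between two contexts, each derived from a set of texts (for example, the column names of a table).
results: Experiments show that the proposed methods help the underlying STS match descriptive labels with their descriptions more accurately.
Abstract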
Semantic text similarity plays an important role in software engineering tasks in which engineers are requested to clarify the semantics of descriptive labels (e.g., business terms, table column names) that often consist of too short or too generic words and appear in their IT systems. We formulate this type of problem as a task of matching descriptive labels to glossary descriptions. We then propose a framework to leverage an existing semantic text similarity measurement (STS) and augment it using semantic label enrichment and set-based collective contextualization, where the former is a method to retrieve sentences relevant to a given label and the latter is a method to compute similarity between two contexts, each of which is derived from a set of texts (e.g., column names in the same table). We performed an experiment on two datasets derived from publicly available data sources. The results indicated that the proposed methods helped the underlying STS correctly match more descriptive labels with the descriptions.
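The idea of using the surrounding labels as context can be sketched as follows. This is not the paper's implementation: Jaccard token overlap stands in for the STS model, the weighting parameter `alpha` is an assumption, and the glossary and column names are made up.

```python
def tokens(text):
    """Lowercase a label or description and split it into a set of words."""
    return set(text.lower().replace("_", " ").split())

def jaccard(a, b):
    """Token-overlap similarity; a simple stand-in for a real STS model."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_label(label, sibling_labels, glossary, alpha=0.5):
    """Pick the glossary term whose description best fits label plus context."""
    label_tokens = tokens(label)
    # Set-based context: pool the tokens of the labels in the same table.
    context_tokens = set().union(*(tokens(s) for s in sibling_labels))

    def score(description):
        d = tokens(description)
        return alpha * jaccard(label_tokens, d) + (1 - alpha) * jaccard(context_tokens, d)

    return max(glossary, key=lambda term: score(glossary[term]))

# Hypothetical glossary and table; "amt" alone matches neither description.
glossary = {
    "Order Amount": "total amount of a customer order",
    "Account Balance": "money held in a bank account",
}
best = match_label("amt", ["order_id", "order_date", "customer_id"], glossary)
```

Here the abbreviated label `amt` shares no token with either description, so the decision rests entirely on the sibling column names, which is the situation the too-short-label problem describes.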
Summary
Computing semantic text similarity can help software engineers better understand the semantics of descriptive labels (such as business terms and table column names) in IT systems. We formulate this problem as a task of matching descriptive labels to glossary descriptions. We then propose a framework that leverages an existing semantic text similarity measurement (STS) and augments it with semantic label enrichment and set-based collective contextualization. In our experiments on data from two public data sources, the results show that our methods help the STS match descriptive labels with descriptions more accurately.
Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method
results: Experiments show that the method effectively detects non-factual answers generated by LLMs and can be applied to recently released LLMs such as Vicuna, ChatGPT, and GPT-4.
Abstract
Large Language Models (LLMs) have shown great potential in Natural Language Processing (NLP) tasks. However, recent literature reveals that LLMs generate nonfactual responses intermittently, which impedes the LLMs' reliability for further utilization. In this paper, we propose a novel self-detection method to detect which questions an LLM does not know and is therefore prone to answering nonfactually. Specifically, we first diversify the textual expressions for a given question and collect the corresponding answers. Then we examine the divergences between the generated answers to identify the questions for which the model may generate falsehoods. All of the above steps can be accomplished by prompting the LLMs themselves without referring to any other external resources. We conduct comprehensive experiments and demonstrate the effectiveness of our method on recently released LLMs, e.g., Vicuna, ChatGPT, and GPT-4.
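Summary
Large language models (LLMs) have shown great potential in natural language processing tasks. However, recent literature shows that LLMs occasionally generate nonfactual responses, which limits their reliability for further use. In this paper, we propose a new self-detection method that detects whether a question the LLM is unfamiliar with is likely to produce a nonfactual answer. Specifically, we first diversify the textual expressions of a given question and collect the corresponding answers. We then compare the differences among the generated answers to determine whether the question may lead the model to generate falsehoods. All of these steps can be carried out by prompting the LLM itself, without referring to any external resources. We conduct extensive experiments on recently released LLMs such as Vicuna, ChatGPT, and GPT-4 and demonstrate the effectiveness of our method.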
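The two steps, asking paraphrased versions of a question and checking how much the answers diverge, can be sketched as below. The agreement ratio used here is a deliberately simple stand-in for the paper's divergence measure, and `ask`, the threshold of 0.7, and the canned answers are hypothetical; in practice `ask` would wrap a call to an LLM.

```python
from collections import Counter

def consistency_score(answers):
    """Share of answers that agree with the most frequent normalized answer."""
    normalized = [a.strip().lower() for a in answers]
    if not normalized:
        raise ValueError("need at least one answer")
    most_common_count = Counter(normalized).most_common(1)[0][1]
    return most_common_count / len(normalized)

def likely_unknown(paraphrases, ask, threshold=0.7):
    """Flag a question as likely unknown when answers to its paraphrases diverge."""
    answers = [ask(q) for q in paraphrases]
    return consistency_score(answers) < threshold

# Hypothetical stand-ins for an LLM: one answers stably, one does not.
stable = {"q1": "Shakespeare", "q2": "shakespeare", "q3": "Shakespeare "}
unstable = {"q1": "1912", "q2": "1915", "q3": "1921"}
known_flag = likely_unknown(["q1", "q2", "q3"], stable.get)
unknown_flag = likely_unknown(["q1", "q2", "q3"], unstable.get)
```

Exact string matching after normalization is a crude way to compare answers; free-form answers would need a softer comparison, for example asking the model itself whether two answers agree.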
The Innovation-to-Occupations Ontology: Linking Business Transformation Initiatives to Occupations and Skills
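methods: The paper populates the ontology automatically using embeddings extracted from online job ads and Wikipedia pages.
results: The study successfully matches occupations to a range of business transformation initiatives and offers an innovative way to guide enterprises and educational institutions on the workforce needed for specific initiatives.
Abstract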
The fast adoption of new technologies forces companies to continuously adapt their operations making it harder to predict workforce requirements. Several recent studies have attempted to predict the emergence of new roles and skills in the labour market from online job ads. This paper aims to present a novel ontology linking business transformation initiatives to occupations and an approach to automatically populating it by leveraging embeddings extracted from job ads and Wikipedia pages on business transformation and emerging technologies topics. To our knowledge, no previous research explicitly links business transformation initiatives, like the adoption of new technologies or the entry into new markets, to the roles needed. Our approach successfully matches occupations to transformation initiatives under ten different scenarios, five linked to technology adoption and five related to business. This framework presents an innovative approach to guide enterprises and educational institutions on the workforce requirements for specific business transformation initiatives.
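Summary
The rapid adoption of new technologies forces companies to adapt their operations continuously, which makes workforce requirements harder to predict. Several recent studies have tried to predict the emergence of new roles and skills in the labour market from online job ads. This paper presents a new ontology that links business transformation initiatives to occupations, together with an approach that populates it automatically using embeddings extracted from job ads and from Wikipedia pages on business transformation and emerging technology topics. To our knowledge, no previous research explicitly links business transformation initiatives, such as adopting new technologies or entering new markets, to the roles they require. Our approach successfully matches occupations to transformation initiatives under ten different scenarios, five related to technology adoption and five related to business. The framework offers enterprises and educational institutions an innovative way to determine the workforce required for specific business transformation initiatives.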
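One plausible way to match occupations to an initiative from embeddings is nearest-neighbour ranking by cosine similarity, sketched below. The three-dimensional vectors and the occupation names are made up for illustration; the abstract does not state which embedding model or matching rule the authors use.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_occupations(initiative_vec, occupation_vecs, top_k=2):
    """Return the top_k occupation names most similar to the initiative."""
    ranked = sorted(
        occupation_vecs,
        key=lambda name: cosine(initiative_vec, occupation_vecs[name]),
        reverse=True,
    )
    return ranked[:top_k]

# Toy embeddings (assumed): an initiative such as "cloud adoption".
initiative = [0.9, 0.1, 0.0]
occupations = {
    "cloud architect": [0.8, 0.2, 0.1],
    "data engineer": [0.6, 0.4, 0.2],
    "sales manager": [0.0, 0.2, 0.9],
}
top = rank_occupations(initiative, occupations)
```

In a real setting the initiative vector would come from text about the initiative (for example a Wikipedia page) and the occupation vectors from job ads, embedded with the same model so that the similarities are comparable.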
Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey
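results: We identify four major categories of pitfalls in LM4Code research: data collection and labeling, system design and learning, performance evaluation, and deployment and maintenance. These pitfalls can undermine the reliability and practical applicability of LM4Code systems.
Abstract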
Modern language models (LMs) have been successfully employed in source code generation and understanding, leading to a significant increase in research focused on learning-based code intelligence, such as automated bug repair, and test case generation. Despite their great potential, language models for code intelligence (LM4Code) are susceptible to potential pitfalls, which hinder realistic performance and further impact their reliability and applicability in real-world deployment. Such challenges drive the need for a comprehensive understanding - not just identifying these issues but delving into their possible implications and existing solutions to build more reliable language models tailored to code intelligence. Based on a well-defined systematic research approach, we conducted an extensive literature review to uncover the pitfalls inherent in LM4Code. Finally, 67 primary studies from top-tier venues have been identified. After carefully examining these studies, we designed a taxonomy of pitfalls in LM4Code research and conducted a systematic study to summarize the issues, implications, current solutions, and challenges of different pitfalls for LM4Code systems. We developed a comprehensive classification scheme that dissects pitfalls across four crucial aspects: data collection and labeling, system design and learning, performance evaluation, and deployment and maintenance. Through this study, we aim to provide a roadmap for researchers and practitioners, facilitating their understanding and utilization of LM4Code in reliable and trustworthy ways.
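Summary
Modern language models (LMs) have been applied successfully to source code generation and understanding, driving research on learning-based code intelligence such as automated bug repair and test case generation. Despite their great potential, language models for code intelligence (LM4Code) face potential pitfalls that affect their performance and reliability in real-world use. These challenges call for a comprehensive understanding: not only identifying the issues but also examining their possible implications and existing solutions, in order to build more reliable LM4Code systems. Following a systematic research approach, we conducted an extensive literature review and identified 67 primary studies from top-tier venues. After carefully examining these studies, we designed a taxonomy of pitfalls in LM4Code research and carried out a systematic study summarizing the issues, implications, current solutions, and challenges of each pitfall. We developed a comprehensive classification scheme that divides the pitfalls into four key aspects: data collection and labeling, system design and learning, performance evaluation, and deployment and maintenance. Through this study, we hope to provide a roadmap that helps researchers and practitioners understand and use LM4Code in reliable and trustworthy ways.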
Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey
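results: The survey reports recent progress on the Text-to-SQL and Text-to-Vis problems in terms of datasets, methodologies, metrics, and system designs, and highlights the influence of large language models (LLMs) in these areas and their potential for future development.
Abstract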
The emergence of natural language processing has revolutionized the way users interact with tabular data, enabling a shift from traditional query languages and manual plotting to more intuitive, language-based interfaces. The rise of large language models (LLMs) such as ChatGPT and its successors has further advanced this field, opening new avenues for natural language processing techniques. This survey presents a comprehensive overview of natural language interfaces for tabular data querying and visualization, which allow users to interact with data using natural language queries. We introduce the fundamental concepts and techniques underlying these interfaces with a particular emphasis on semantic parsing, the key technology facilitating the translation from natural language to SQL queries or data visualization commands. We then delve into the recent advancements in Text-to-SQL and Text-to-Vis problems from the perspectives of datasets, methodologies, metrics, and system designs. This includes a deep dive into the influence of LLMs, highlighting their strengths, limitations, and potential for future improvements. Through this survey, we aim to provide a roadmap for researchers and practitioners interested in developing and applying natural language interfaces for data interaction in the era of large language models.
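Summary
The emergence of natural language processing has changed the way users interact with tabular data, moving from traditional query languages and manual plotting to more intuitive natural language interfaces. The rise of large language models (LLMs) such as ChatGPT and its successors has advanced the field further and opened new avenues for natural language processing techniques. This survey gives an overview of natural language interfaces for tabular data querying and visualization, which let users interact with data through natural language queries. We introduce the basic concepts and techniques behind these interfaces, with particular attention to translating natural language into SQL queries or data visualization commands. We then describe recent progress on Text-to-SQL and Text-to-Vis in terms of datasets, methods, metrics, and system designs, including the influence of LLMs, their strengths, their limitations, and their potential for future improvement. We hope this survey will provide a roadmap for researchers and practitioners interested in developing and applying natural language interfaces for data interaction in the era of large language models.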
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
for: This paper highlights the privacy risks associated with the use of large language models (LLMs) in AI assistants, specifically the inference-time privacy risks that arise when LLMs are fed different types of information from multiple sources and are expected to reason about what to share in their outputs.
methods: The paper proposes ConfAIde, a benchmark designed to identify critical weaknesses in the privacy reasoning capabilities of instruction-tuned LLMs. The authors use this benchmark to evaluate the privacy reasoning abilities of two state-of-the-art LLMs, GPT-4 and ChatGPT.
results: The authors find that even the most capable models, GPT-4 and ChatGPT, reveal private information in contexts that humans would not, 39% and 57% of the time, respectively. This leakage persists even when the authors employ privacy-inducing prompts or chain-of-thought reasoning. The paper highlights the immediate need to explore novel inference-time privacy-preserving approaches based on reasoning and theory of mind.
Abstract
The interactive use of large language models (LLMs) in AI assistants (at work, home, etc.) introduces a new set of inference-time privacy risks: LLMs are fed different types of information from multiple sources in their inputs and are expected to reason about what to share in their outputs, for what purpose and with whom, within a given context. In this work, we draw attention to the highly critical yet overlooked notion of contextual privacy by proposing ConfAIde, a benchmark designed to identify critical weaknesses in the privacy reasoning capabilities of instruction-tuned LLMs. Our experiments show that even the most capable models such as GPT-4 and ChatGPT reveal private information in contexts that humans would not, 39% and 57% of the time, respectively. This leakage persists even when we employ privacy-inducing prompts or chain-of-thought reasoning. Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
ASPIRO: Any-shot Structured Parsing-error-Induced ReprOmpting for Consistent Data-to-Text Generation
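results: Compared with direct LLM output, ASPIRO reduces the parsing error rate of generated verbalisations on the DART dataset by 66% on average. On the Rel2Text dataset, the best 5-shot setup (text-davinci-003) scores BLEU 50.62, METEOR 45.16, BLEURT 0.82, NUBIA 0.87, and PARENT 0.8962, which is competitive with recent fine-tuned pre-trained language models.
Abstract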
We present ASPIRO, an approach for structured data verbalisation into short template sentences in zero to few-shot settings. Unlike previous methods, our approach prompts large language models (LLMs) to directly produce entity-agnostic templates, rather than relying on LLMs to faithfully copy the given example entities, or validating/crafting the templates manually. We incorporate LLM re-prompting, triggered by algorithmic parsing checks, as well as the PARENT metric induced consistency validation to identify and rectify template generation problems in real-time. ASPIRO, compared to direct LLM output, averages 66% parsing error rate reduction in generated verbalisations of RDF triples on the DART dataset. Our best 5-shot text-davinci-003 setup, scoring BLEU of 50.62, METEOR of 45.16, BLEURT of 0.82, NUBIA of 0.87, and PARENT of 0.8962 on the Rel2Text dataset, competes effectively with recent fine-tuned pre-trained language models.
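The re-prompting loop described in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `<subject>`/`<object>` placeholder syntax, the specific checks in `parsing_errors`, the prompt wording, and the `generate` callable standing in for an LLM are hypothetical and are not taken from the ASPIRO paper or its code.

```python
import re
from typing import Callable, List, Optional

# Hypothetical placeholder syntax for entity-agnostic templates.
SUBJECT, OBJECT = "<subject>", "<object>"


def parsing_errors(template: str) -> List[str]:
    """Return human-readable problems found in a candidate template."""
    errors = []
    for tag in (SUBJECT, OBJECT):
        count = template.count(tag)
        if count != 1:
            errors.append(f"expected exactly one {tag}, found {count}")
    # Any other angle-bracket tag counts as an unknown placeholder.
    for tag in re.findall(r"<[^<>]*>", template):
        if tag not in (SUBJECT, OBJECT):
            errors.append(f"unknown placeholder {tag}")
    if not template.strip().endswith("."):
        errors.append("template should end with a period")
    return errors


def verbalise(relation: str,
              generate: Callable[[str], str],
              max_retries: int = 3) -> Optional[str]:
    """Ask `generate` for a template; re-prompt with the errors until it parses."""
    prompt = (f"Write a one-sentence template for the relation '{relation}' "
              f"using {SUBJECT} and {OBJECT}.")
    for _ in range(max_retries + 1):
        candidate = generate(prompt)
        errors = parsing_errors(candidate)
        if not errors:
            return candidate
        # Feed the concrete parsing failures back into the next prompt.
        prompt += (f"\nYour previous answer '{candidate}' was rejected: "
                   + "; ".join(errors) + ". Try again.")
    return None  # give up after max_retries re-prompts


# Stub "model" that fails once (copies entities) and then succeeds.
responses = iter(["Paris is the capital of France.",
                  "<subject> is the capital of <object>."])
print(verbalise("capital", lambda prompt: next(responses)))
```

The point of the design is that the check is purely algorithmic, so a failed parse can trigger a retry without any manual template validation.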
paper_authors: Wentao Guo, Andrew Wang, Bradon Thymes, Thorsten Joachims
for: ranking with slot constraints, which can be applied to various real-world problems such as college admission and medical trial participant selection.
methods: the proposed algorithm called MatchRank, which aims to maximize the number of filled slots by evaluating candidates in the order of the ranking.
results: MatchRank has a strong approximation guarantee and can provide substantial improvements over a range of synthetic and real-world tasks.
Abstract
We introduce the problem of ranking with slot constraints, which can be used to model a wide range of application problems -- from college admission with limited slots for different majors, to composing a stratified cohort of eligible participants in a medical trial. We show that the conventional Probability Ranking Principle (PRP) can be highly sub-optimal for slot-constrained ranking problems, and we devise a new ranking algorithm, called MatchRank. The goal of MatchRank is to produce rankings that maximize the number of filled slots if candidates are evaluated by a human decision maker in the order of the ranking. In this way, MatchRank generalizes the PRP, and it subsumes the PRP as a special case when there are no slot constraints. Our theoretical analysis shows that MatchRank has a strong approximation guarantee without any independence assumptions between slots or candidates. Furthermore, we show how MatchRank can be implemented efficiently. Beyond the theoretical guarantees, empirical evaluations show that MatchRank can provide substantial improvements over a range of synthetic and real-world tasks.
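To make the "number of filled slots" objective concrete, here is a small sketch. It assumes that each qualified candidate can fill at most one slot, so the count is a maximum bipartite matching between candidates and slots. That reading, the function name `filled_slots`, and the toy majors are assumptions made for illustration; this is not the MatchRank algorithm itself, only the quantity a ranking would try to maximize.

```python
from typing import Dict, List, Set


def filled_slots(eligible: Dict[str, List[str]], capacity: Dict[str, int]) -> int:
    """Maximum number of slots the given qualified candidates can fill.

    eligible: candidate -> slot types the candidate could fill
    capacity: slot type -> number of available slots of that type
    """
    # Expand each slot type into capacity-many unit slots.
    units = {s: [f"{s}#{i}" for i in range(c)] for s, c in capacity.items()}
    match: Dict[str, str] = {}  # unit slot -> candidate currently assigned

    def try_assign(cand: str, seen: Set[str]) -> bool:
        # Augmenting-path search: take a free slot, or displace an
        # assigned candidate who can move to another slot.
        for slot_type in eligible[cand]:
            for unit in units.get(slot_type, []):
                if unit in seen:
                    continue
                seen.add(unit)
                if unit not in match or try_assign(match[unit], seen):
                    match[unit] = cand
                    return True
        return False

    return sum(try_assign(c, set()) for c in eligible)


capacity = {"cs": 1, "math": 1}
# Two candidates competing for the same slot fill only one slot ...
print(filled_slots({"a": ["cs"], "b": ["cs"]}, capacity))
# ... while a complementary pair fills both.
print(filled_slots({"a": ["cs"], "c": ["cs", "math"]}, capacity))
```

The toy example shows why ranking purely by individual relevance can be sub-optimal under slot constraints: two individually strong candidates may compete for the same slot, while a slightly weaker candidate would fill a different one.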
Reproducibility in Multiple Instance Learning: A Case For Algorithmic Unit Tests
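paper_authors: Edward Raff, James Holt
for: Multiple Instance Learning (MIL) is a special classification setting in which each input is a bag of instances and the bag label is positive if and only if the bag contains at least one positive instance, and negative otherwise. Training requires associating the bag-level label with instance-level information and carries an implicit causal assumption and asymmetry. MIL problems are widespread in healthcare, cyber security, and other domains.
methods: The paper examines five deep-learning MIL models and finds that none of them respects the standard MIL assumption. They can learn anti-correlated instances, i.e., defaulting to a positive label until a negative counter-example is seen, which should not be possible for a correct MIL model. The authors suspect that enhancements and other works derived from these models share the same issue.
results: The problem is demonstrated with a proposed "algorithmic unit test": synthetic datasets that a MIL-respecting model can solve and that clearly expose learning that violates the MIL assumption. Each of the five evaluated models fails one or more of these tests. This provides a model-agnostic way to identify violations of modeling assumptions, which the authors hope will be useful for future development and evaluation of MIL models.
Abstract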
Multiple Instance Learning (MIL) is a sub-domain of classification problems with positive and negative labels and a "bag" of inputs, where the label is positive if and only if a positive element is contained within the bag, and otherwise is negative. Training in this context requires associating the bag-wide label to instance-level information, and implicitly contains a causal assumption and asymmetry to the task (i.e., you can't swap the labels without changing the semantics). MIL problems occur in healthcare (one malignant cell indicates cancer), cyber security (one malicious executable makes an infected computer), and many other tasks. In this work, we examine five of the most prominent deep-MIL models and find that none of them respects the standard MIL assumption. They are able to learn anti-correlated instances, i.e., defaulting to "positive" labels until seeing a negative counter-example, which should not be possible for a correct MIL model. We suspect that enhancements and other works derived from these models will share the same issue. In any context in which these models are being used, this creates the potential for learning incorrect models, which creates risk of operational failure. We identify and demonstrate this problem via a proposed "algorithmic unit test", where we create synthetic datasets that can be solved by a MIL respecting model, and which clearly reveal learning that violates MIL assumptions. The five evaluated methods each fail one or more of these tests. This provides a model-agnostic way to identify violations of modeling assumptions, which we hope will be useful for future development and evaluation of MIL models.
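The standard MIL assumption stated above implies a simple property: adding an instance to a bag that is already positive can never make it negative. The sketch below checks that property for two toy bag-level predictors. The function names, the 0.5 threshold, and the scores are hypothetical, and this is only an illustration of the property, not one of the paper's unit tests.

```python
from typing import Callable, Sequence


def mil_label(instance_labels: Sequence[int]) -> int:
    """Standard MIL assumption: a bag is positive iff any instance is positive."""
    return int(any(instance_labels))


def max_pool_predict(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Bag prediction from the highest instance score (respects the assumption)."""
    return int(max(scores) >= threshold)


def mean_pool_predict(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Bag prediction from the average instance score (can violate it)."""
    return int(sum(scores) / len(scores) >= threshold)


def flips_to_negative(predict: Callable[[Sequence[float]], int],
                      bag: Sequence[float], extra: float) -> bool:
    """True if adding `extra` turns a positive bag prediction negative."""
    return predict(bag) == 1 and predict(list(bag) + [extra]) == 0


bag = [0.9]   # one confidently positive instance
extra = 0.0   # a clearly negative instance added to the bag
print(flips_to_negative(max_pool_predict, bag, extra))   # max pooling: no flip
print(flips_to_negative(mean_pool_predict, bag, extra))  # mean pooling: flips
```

A predictor whose output can flip in this way is able to treat some instances as evidence against the positive class, which is the kind of anti-correlated behaviour the abstract describes.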
Function Space Bayesian Pseudocoreset for Bayesian Neural Networks
results: Pseudocoresets constructed with this method provide better uncertainty quantification and robustness across a variety of model architectures.
Abstract
A Bayesian pseudocoreset is a compact synthetic dataset summarizing essential information of a large-scale dataset and thus can be used as a proxy dataset for scalable Bayesian inference. Typically, a Bayesian pseudocoreset is constructed by minimizing a divergence measure between the posterior conditioning on the pseudocoreset and the posterior conditioning on the full dataset. However, evaluating the divergence can be challenging, particularly for models such as deep neural networks that have high-dimensional parameters. In this paper, we propose a novel Bayesian pseudocoreset construction method that operates on a function space. Unlike previous methods, which construct and match the coreset and full data posteriors in the space of model parameters (weights), our method constructs variational approximations to the coreset posterior on a function space and matches it to the full data posterior in the function space. By working directly on the function space, our method could bypass several challenges that may arise when working on a weight space, including limited scalability and multi-modality issues. Through various experiments, we demonstrate that the Bayesian pseudocoresets constructed with our method enjoy enhanced uncertainty quantification and better robustness across various model architectures.
Real-time Animation Generation and Control on Rigged Models via Large Language Models
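results: The paper illustrates the potential of LLMs to enable flexible state transitions between existing animations, and demonstrates the robustness of the approach through qualitative results on a variety of rigged models and motions.
Abstract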
We introduce a novel method for real-time animation control and generation on rigged models using natural language input. First, we embed a large language model (LLM) in Unity to output structured texts that can be parsed into diverse and realistic animations. Second, we illustrate LLM's potential to enable flexible state transition between existing animations. We showcase the robustness of our approach through qualitative results on various rigged models and motions.
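results: The method significantly improves video quality over prevalent methods, allows motion and content to be manipulated independently, and supports temporal GAN inversion to retrieve a video motion from one content or identity and transfer it to another.
Abstract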
In this paper, we propose a style-based conditional video generative model. We introduce a novel temporal generator based on a set of learned sinusoidal bases. Our method learns dynamic representations of various actions that are independent of image content and can be transferred between different actors. Beyond the significant enhancement of video quality compared to prevalent methods, we demonstrate that the disentangled dynamic and content permit their independent manipulation, as well as temporal GAN-inversion to retrieve and transfer a video motion from one content or identity to another without further preprocessing such as landmark points.
Ontology Revision based on Pre-trained Language Models
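results: In the experiments, the proposed algorithms achieve promising performance, and the adapted revision algorithm greatly improves efficiency, saving up to 96% of the time for some ontology pairs. In addition, some of the scoring functions help a revision algorithm obtain better results in many cases.
Abstract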
Ontology revision aims to seamlessly incorporate new information into an existing ontology and plays a crucial role in tasks such as ontology evolution, ontology maintenance, and ontology alignment. Similar to repairing single ontologies, resolving logical incoherence during ontology revision is also important and meaningful, since incoherence is a main potential cause of inconsistency, and reasoning with an inconsistent ontology yields meaningless answers. To deal with this problem, various ontology revision methods have been proposed to define revision operators and design ranking strategies for axioms in an ontology. However, they rarely consider axiom semantics, which provides important information to differentiate axioms. On the other hand, pre-trained models can be utilized to encode axiom semantics, and have been widely applied in many natural language processing tasks and ontology-related ones in recent years. Therefore, in this paper, we define four scoring functions to rank axioms based on a pre-trained model by considering various information from a rebuttal ontology and its corresponding reliable ontology. Based on such a scoring function, we propose an ontology revision algorithm to deal with unsatisfiable concepts at once. If it is hard to resolve all unsatisfiable concepts in a rebuttal ontology together, an adapted revision algorithm is designed to deal with them group by group. We conduct experiments over 19 ontology pairs and compare our algorithms and scoring functions with existing ones. The experiments show that our algorithms achieve promising performance. The adapted revision algorithm improves efficiency considerably, saving up to 96% of the time for some ontology pairs. Some of our scoring functions help a revision algorithm obtain better results in many cases, especially for the challenging pairs.
Large-scale Foundation Models and Generative AI for BigData Neuroscience
results: The paper argues that this paradigm-shift framework will open new avenues for many neuroscience research directions, and discusses the accompanying challenges and opportunities.
Abstract
Recent advances in machine learning have made revolutionary breakthroughs in computer games, image and natural language understanding, and scientific discovery. Foundation models and large-scale language models (LLMs) have recently achieved human-like intelligence thanks to BigData. With the help of self-supervised learning (SSL) and transfer learning, these models may potentially reshape the landscapes of neuroscience research and make a significant impact on the future. Here we present a mini-review on recent advances in foundation models and generative AI models as well as their applications in neuroscience, including natural language and speech, semantic memory, brain-machine interfaces (BMIs), and data augmentation. We argue that this paradigm-shift framework will open new avenues for many neuroscience research directions and discuss the accompanying challenges and opportunities.
Summary
Recent advances in machine learning have led to significant breakthroughs in computer games, image and natural language understanding, and scientific discovery. The development of foundation models and large-scale language models (LLMs) has achieved human-like intelligence, thanks to the power of BigData. With the help of self-supervised learning (SSL) and transfer learning, these models have the potential to reshape the landscapes of neuroscience research and have a profound impact on the future.
In this mini-review, we will explore recent advances in foundation models and generative AI models, as well as their applications in neuroscience. We will discuss the use of these models in natural language and speech, semantic memory, brain-machine interfaces (BMIs), and data augmentation. We argue that this paradigm-shift framework will open new avenues for many neuroscience research directions and discuss the accompanying challenges and opportunities.
Foundation Models and Generative AI Models
Foundation models and generative AI models have been instrumental in achieving human-like intelligence in various fields. These models are trained on large datasets and use self-supervised learning techniques to learn the underlying patterns and relationships in the data. Once trained, these models can be fine-tuned for specific tasks, such as natural language processing, image recognition, and speech recognition.
Applications in Neuroscience
1. Natural Language and Speech: Foundation models and generative AI models have been used to develop advanced natural language processing systems that can understand and generate human-like language. These systems have numerous applications in neuroscience, such as analyzing large amounts of text data to identify patterns and trends, and generating natural language descriptions of complex scientific concepts.
2. Semantic Memory: These models can be used to develop advanced memory systems that can store and retrieve large amounts of information. This has numerous applications in neuroscience, such as developing systems that can remember and recall complex scientific concepts and theories.
3. Brain-Machine Interfaces (BMIs): Foundation models and generative AI models can be used to develop advanced BMIs that can read and interpret brain signals. This has numerous applications in neuroscience, such as developing systems that can decode brain signals to control prosthetic limbs and other assistive technologies.
4. Data Augmentation: These models can be used to generate large amounts of synthetic data that can be used to augment real-world datasets. This has numerous applications in neuroscience, such as developing systems that can generate synthetic brain imaging data to augment real-world datasets and improve the accuracy of brain imaging techniques.
Challenges and Opportunities
While foundation models and generative AI models have the potential to revolutionize neuroscience research, there are several challenges and opportunities that must be addressed. Some of the challenges include:
1. Data Quality: The quality of the data used to train these models is crucial. Poor-quality data can lead to biased or inaccurate models.
2. Explainability: It is often difficult to understand how these models make decisions, which can be a problem in fields such as neuroscience where transparency and explainability are essential.
3. Ethics: The use of these models raises ethical concerns, such as the potential for bias and the impact on employment.
4. Training Time: Training these models can be time-consuming and computationally intensive.
Despite these challenges, the opportunities presented by foundation models and generative AI models are significant. With the right training data and the appropriate fine-tuning, these models have the potential to revolutionize neuroscience research and lead to significant advances in our understanding of the brain and nervous system.
for: This study evaluates the performance of text-to-SQL models on several prominent cross-domain benchmarks and re-evaluates top-performing models to assess their true performance.
methods: The study evaluates SQL queries and models through manual evaluation and by rewriting queries as equivalent expressions.
results: Because the provided samples admit multiple interpretations, attaining perfect performance on these benchmarks is unfeasible. The true performance of the models was underestimated, and their relative performance changed after re-evaluation. Most notably, a recent GPT4-based model surpassed the gold standard reference queries of the Spider benchmark in human evaluation, which highlights the importance of interpreting benchmark evaluations cautiously.
Abstract
Text-to-SQL benchmarks play a crucial role in evaluating the progress made in the field and the ranking of different models. However, accurately matching a model-generated SQL query to a reference SQL query in a benchmark fails for various reasons, such as underspecified natural language queries, inherent assumptions in both model-generated and reference queries, and the non-deterministic nature of SQL output under certain conditions. In this paper, we conduct an extensive study of several prominent cross-domain text-to-SQL benchmarks and re-evaluate some of the top-performing models within these benchmarks, by both manually evaluating the SQL queries and rewriting them in equivalent expressions. Our evaluation reveals that attaining a perfect performance on these benchmarks is unfeasible due to the multiple interpretations that can be derived from the provided samples. Furthermore, we find that the true performance of the models is underestimated and their relative performance changes after a re-evaluation. Most notably, our evaluation reveals a surprising discovery: a recent GPT4-based model surpasses the gold standard reference queries in the Spider benchmark in our human evaluation. This finding highlights the importance of interpreting benchmark evaluations cautiously, while also acknowledging the critical role of additional independent evaluations in driving advancements in the field.
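Two of the failure modes named in the abstract can be reproduced in a few lines with Python's built-in `sqlite3` module. The `singer` table, its rows, and the three queries are made up for illustration and are not taken from Spider or any other benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (name TEXT, age INTEGER);
INSERT INTO singer VALUES ('Ann', 30), ('Bob', 25), ('Cy', 30);
""")


def run(sql: str):
    """Execute a query and return its rows in a canonical (sorted) order."""
    return sorted(conn.execute(sql).fetchall())


# Hypothetical question: "Who is the oldest singer?"
reference = "SELECT name FROM singer WHERE age = (SELECT MAX(age) FROM singer)"
equivalent = "SELECT name FROM singer WHERE age IN (SELECT MAX(age) FROM singer)"
tie_prone = "SELECT name FROM singer ORDER BY age DESC LIMIT 1"

# String matching rejects a query that returns exactly the same rows.
print(reference == equivalent, run(reference) == run(equivalent))

# With a tie on age, LIMIT 1 returns a single row, and which of the tied
# rows comes back is not determined by the query itself.
print(len(run(reference)), len(run(tie_prone)))
```

Comparing result sets instead of query strings handles the first case, but the second shows that even execution-based comparison depends on how the question and the data resolve ties.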
On the Automatic Generation and Simplification of Children’s Stories
results: Despite the growing capabilities of LLMs, they cannot yet limit their vocabulary to levels appropriate for younger children. In a second experiment, the researchers explore whether state-of-the-art lexical simplification models generalize to the domain of children's stories, with the aim of building an efficient pipeline for automatic generation.
Abstract
With recent advances in large language models (LLMs), the concept of automatically generating children's educational materials has become increasingly realistic. Working toward the goal of age-appropriate simplicity in generated educational texts, we first examine the ability of several popular LLMs to generate stories with properly adjusted lexical and readability levels. We find that, in spite of the growing capabilities of LLMs, they do not yet possess the ability to limit their vocabulary to levels appropriate for younger age groups. As a second experiment, we explore the ability of state-of-the-art lexical simplification models to generalize to the domain of children's stories and, thus, create an efficient pipeline for their automatic generation. In order to test these models, we develop a dataset of child-directed lexical simplification instances, with examples taken from the LLM-generated stories in our first experiment. We find that, while the strongest-performing current lexical simplification models do not perform as well on material designed for children due to their reliance on large language models behind the scenes, some models that still achieve fairly strong results on general data can mimic or even improve their performance on children-directed data with proper fine-tuning, which we conduct using our newly created child-directed simplification dataset.
Publicly Detectable Watermarking for Language Models
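results: The scheme is implemented and measured empirically on open models in the 7B parameter range. The experiments suggest that it meets the formal claims while preserving text quality.
Abstract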
We construct the first provable watermarking scheme for language models with public detectability or verifiability: we use a private key for watermarking and a public key for watermark detection. Our protocol is the first watermarking scheme that does not embed a statistical signal in generated text. Rather, we directly embed a publicly-verifiable cryptographic signature using a form of rejection sampling. We show that our construction meets strong formal security guarantees and preserves many desirable properties found in schemes in the private-key watermarking setting. In particular, our watermarking scheme retains distortion-freeness and model agnosticity. We implement our scheme and make empirical measurements over open models in the 7B parameter range. Our experiments suggest that our watermarking scheme meets our formal claims while preserving text quality.
PeTailor: Improving Large Language Model by Tailored Chunk Scorer in Biomedical Triple Extraction
results: The experiments show that PETAILOR achieves state-of-the-art performance on GM-CIHT.
Abstract
The automatic extraction of biomedical entities and their interaction from unstructured data remains a challenging task due to the limited availability of expert-labeled standard datasets. In this paper, we introduce PETAILOR, a retrieval-based language framework that is augmented by a tailored chunk scorer. Unlike previous retrieval-augmented language models (LM) that retrieve relevant documents by calculating the similarity between the input sentence and the candidate document set, PETAILOR segments the sentence into chunks and retrieves the relevant chunk from our pre-computed chunk-based relational key-value memory. Moreover, in order to comprehend the specific requirements of the LM, PETAILOR adapts the tailored chunk scorer to the LM. We also introduce GM-CIHT, an expert annotated biomedical triple extraction dataset with more relation types. This dataset is centered on the non-drug treatment and general biomedical domain. Additionally, we investigate the efficacy of triple extraction models trained on general domains when applied to the biomedical domain. Our experiments reveal that PETAILOR achieves state-of-the-art performance on GM-CIHT.
Do Not Harm Protected Groups in Debiasing Language Representation Models
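methods: The paper examines four debiasing techniques on a real-world text classification task and proposes xGAP-DEBIAS, a set of evaluations for assessing the fairness of debiasing.
results: Debiasing interventions reduce bias, but at the cost of degraded performance for all demographic groups, including the groups the interventions aim to protect.
Abstract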
Language Representation Models (LRMs) trained with real-world data may capture and exacerbate undesired bias and cause unfair treatment of people in various demographic groups. Several techniques have been investigated for applying interventions to LRMs to remove bias in benchmark evaluations on, for example, word embeddings. However, the negative side effects of debiasing interventions are usually not revealed in the downstream tasks. We propose xGAP-DEBIAS, a set of evaluations for assessing the fairness of debiasing. In this work, we examine four debiasing techniques on a real-world text classification task and show that reducing bias comes at the cost of degrading performance for all demographic groups, including those the debiasing techniques aim to protect. We advocate that a debiasing technique should have good downstream performance with the constraint of ensuring no harm to the protected group.
T5 meets Tybalt: Author Attribution in Early Modern English Drama Using Large Language Models
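results: A fine-tuned t5-large model is strong at attributing short passages and outperforms all tested baselines. However, there are indications that the presence of certain authors in the model's pre-training data affects predictions in ways that are difficult to assess.
Abstract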
Large language models have shown breakthrough potential in many NLP domains. Here we consider their use for stylometry, specifically authorship identification in Early Modern English drama. We find both promising and concerning results; LLMs are able to accurately predict the author of surprisingly short passages but are also prone to confidently misattribute texts to specific authors. A fine-tuned t5-large model outperforms all tested baselines, including logistic regression, SVM with a linear kernel, and cosine delta, at attributing small passages. However, we see indications that the presence of certain authors in the model's pre-training data affects predictive results in ways that are difficult to assess.
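Of the baselines listed above, cosine delta is the easiest to sketch: word frequencies for the most frequent words are z-scored across the known texts, and an unknown passage is attributed to the author whose z-score profile has the highest cosine similarity. The version below is a simplified reading of that idea under stated assumptions (whitespace tokenisation, one text per author, non-empty inputs, hypothetical function names) and is not the configuration used in the paper.

```python
import math
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List


def relative_frequencies(text: str, vocab: List[str]) -> List[float]:
    """Relative frequency of each vocabulary word in a non-empty text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in vocab]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(known: Dict[str, str], unknown: str, n_words: int = 50) -> str:
    """Return the known author whose z-scored profile is closest to `unknown`."""
    corpus_counts = Counter(" ".join(known.values()).lower().split())
    vocab = [w for w, _ in corpus_counts.most_common(n_words)]
    profiles = {a: relative_frequencies(t, vocab) for a, t in known.items()}

    # Per-word mean and standard deviation across the known authors.
    columns = list(zip(*profiles.values()))
    mu = [mean(c) for c in columns]
    sigma = [pstdev(c) or 1.0 for c in columns]  # avoid division by zero

    def z(vec: List[float]) -> List[float]:
        return [(x - m) / s for x, m, s in zip(vec, mu, sigma)]

    target = z(relative_frequencies(unknown, vocab))
    return max(profiles, key=lambda a: cosine(z(profiles[a]), target))


known = {
    "A": "thou art thou dost thou art the king",
    "B": "you are you do you are the king",
}
print(attribute(known, "thou art the king thou"))
```

With realistic corpora the vocabulary would be restricted to a few hundred function words and each author would contribute many texts; the toy strings here only exercise the arithmetic.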
Modeling Legal Reasoning: LM Annotation at the Edge of Human Agreement
paper_authors: Rosamond Thalken, Edward H. Stiglitz, David Mimno, Matthew Wilkens
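for: Classifying legal reasoning according to jurisprudential philosophy.
methods: Uses generative language models (LMs) for a document classification task and systematically tests the performance of a variety of LMs.
results: Generative models perform poorly when given instructions (i.e. prompts) equal to those presented to human annotators. The best results come from fine-tuning models on the annotated dataset. Predictions from the fine-tuned model are used to study historical trends in jurisprudence, an exercise that aligns with prominent qualitative historical accounts and also points to areas of possible refinement in those accounts.
Abstract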
Generative language models (LMs) are increasingly used for document class-prediction tasks and promise enormous improvements in cost and efficiency. Existing research often examines simple classification tasks, but the capability of LMs to classify on complex or specialized tasks is less well understood. We consider a highly complex task that is challenging even for humans: the classification of legal reasoning according to jurisprudential philosophy. Using a novel dataset of historical United States Supreme Court opinions annotated by a team of domain experts, we systematically test the performance of a variety of LMs. We find that generative models perform poorly when given instructions (i.e. prompts) equal to the instructions presented to human annotators through our codebook. Our strongest results derive from fine-tuning models on the annotated dataset; the best performing model is an in-domain model, LEGAL-BERT. We apply predictions from this fine-tuned model to study historical trends in jurisprudence, an exercise that both aligns with prominent qualitative historical accounts and points to areas of possible refinement in those accounts. Our findings generally sound a note of caution in the use of generative LMs on complex tasks without fine-tuning and point to the continued relevance of human annotation-intensive classification methods.
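Summary
Generative language models (LMs) are increasingly used for document classification tasks and promise large improvements in cost and efficiency. Existing research usually examines simple classification tasks, while the ability of generative models on complex or specialized tasks is less well understood. We consider a highly complex task: classifying legal reasoning according to jurisprudential philosophy. Using a new dataset of historical United States Supreme Court opinions annotated by a team of experts, we systematically test the performance of a variety of LMs. We find that generative models perform poorly when given the same instructions (i.e., prompts) as the human annotators. Our best results come from fine-tuning models on the annotated dataset, and the best-performing model is the in-domain LEGAL-BERT. We apply the fine-tuned model to study historical trends, which agrees with prominent qualitative historical accounts and points to possible refinements. Our findings sound a note of caution about using generative LMs on complex tasks without fine-tuning and point to the continued relevance of annotation-intensive classification methods.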
Expanding the Set of Pragmatic Considerations in Conversational AI
results: The paper proposes a taxonomy of pragmatic considerations for the design and evaluation of conversational AI systems, aimed at the pragmatic deficiencies of current systems.
Abstract
Despite considerable performance improvements, current conversational AI systems often fail to meet user expectations. We discuss several pragmatic limitations of current conversational AI systems. We illustrate pragmatic limitations with examples that are syntactically appropriate, but have clear pragmatic deficiencies. We label our complaints as "Turing Test Triggers" (TTTs) as they indicate where current conversational AI systems fall short compared to human behavior. We develop a taxonomy of pragmatic considerations intended to identify what pragmatic competencies a conversational AI system requires and discuss implications for the design and evaluation of conversational AI systems.
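Summary
Although current conversational AI systems have improved considerably in performance, they still often fail to meet user expectations. We discuss several pragmatic limitations of current conversational AI systems and illustrate them with examples that are syntactically appropriate but have clear pragmatic deficiencies. We call these problems "Turing Test Triggers" (TTTs) because they show where current conversational AI systems fall short of human behavior. We develop a taxonomy of pragmatic considerations to identify the pragmatic competencies such systems require, and we discuss its implications for the design and evaluation of conversational AI systems.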
SDOH-NLI: a Dataset for Inferring Social Determinants of Health from Clinical Notes
paper_authors: Adam D. Lelkes, Eric Loreaux, Tal Schuster, Ming-Jun Chen, Alvin Rajkomar
for: This paper aims to provide a new dataset for natural language inference (NLI) tasks to extract social and behavioral determinants of health (SDOH) from clinical notes.
methods: The paper uses a dataset of publicly available clinical notes and formulates SDOH extraction as an NLI task, with binary textual entailment labels obtained from human raters.
results: The authors evaluate both “off-the-shelf” entailment models and models fine-tuned on their data, and find that their dataset appears more challenging than commonly used NLI datasets.
Abstract
Social and behavioral determinants of health (SDOH) play a significant role in shaping health outcomes, and extracting these determinants from clinical notes is a first step to help healthcare providers systematically identify opportunities to provide appropriate care and address disparities. Progress on using NLP methods for this task has been hindered by the lack of high-quality publicly available labeled data, largely due to the privacy and regulatory constraints on the use of real patients' information. This paper introduces a new dataset, SDOH-NLI, that is based on publicly available notes and which we release publicly. We formulate SDOH extraction as a natural language inference (NLI) task, and provide binary textual entailment labels obtained from human raters for a cross product of a set of social history snippets as premises and SDOH factors as hypotheses. Our dataset differs from standard NLI benchmarks in that our premises and hypotheses are obtained independently. We evaluate both "off-the-shelf" entailment models as well as models fine-tuned on our data, and highlight the ways in which our dataset appears more challenging than commonly used NLI datasets.
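The cross-product construction described in the abstract can be sketched in a few lines: every social history snippet (premise) is paired with every SDOH factor (hypothesis), and each pair later receives a binary entailment label from human raters. The snippets, factors, and field names below are invented for illustration and are not taken from the dataset.

```python
from itertools import product

def build_pairs(premises, hypotheses):
    """Pair every premise with every hypothesis; labels are filled in by raters later."""
    return [
        {"premise": p, "hypothesis": h, "label": None}
        for p, h in product(premises, hypotheses)
    ]

# Hypothetical examples, written independently of each other.
premises = [
    "Patient smokes one pack per day.",
    "Lives alone; retired teacher.",
]
hypotheses = [
    "The patient uses tobacco.",
    "The patient is employed.",
]

pairs = build_pairs(premises, hypotheses)
print(len(pairs))  # 2 premises x 2 hypotheses = 4 pairs
```

Because premises and hypotheses are obtained independently, most pairs in such a cross product are unrelated, which differs from benchmarks where hypotheses are written with a specific premise in view.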
Teacher Perception of Automatically Extracted Grammar Concepts for L2 Language Learning
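results: The results indicate that automatically discovering and visualizing grammar descriptions from a natural text corpus can help teachers prepare language lessons, and that education professionals judge the extracted materials to be useful.
Abstract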
One of the challenges in language teaching is how best to organize rules regarding syntax, semantics, or phonology in a meaningful manner. This requires content creators not only to have pedagogical skills, but also to have a deep understanding of that language. While comprehensive materials to develop such curricula are available in English and some broadly spoken languages, for many other languages, teachers need to manually create them in response to their students' needs. This is challenging because i) it requires that such experts be accessible and have the necessary resources, and ii) describing all the intricacies of a language is time-consuming and prone to omission. In this work, we aim to facilitate this process by automatically discovering and visualizing grammar descriptions. We extract descriptions from a natural text corpus that answer questions about morphosyntax (learning of word order, agreement, case marking, or word formation) and semantics (learning of vocabulary). We apply this method for teaching two Indian languages, Kannada and Marathi, which, unlike English, do not have well-developed resources for second language learning. To assess the perceived utility of the extracted material, we enlist the help of language educators from schools in North America to perform a manual evaluation; they find that the materials have potential to be used for their lesson preparation and learner evaluation.
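Summary
One challenge in language teaching is how to organize rules of syntax, semantics, or phonology effectively. This requires content creators to have both pedagogical skills and a deep understanding of the language. While resources for developing such curricula exist for English and some widely spoken languages, for other languages teachers need to create materials manually in response to their students' needs. This is difficult because i) it requires access to such experts and the necessary resources, and ii) describing the details of a language is time-consuming and prone to omission. In this work, we aim to facilitate this process by automatically discovering and visualizing grammar descriptions. We extract descriptions from a natural text corpus that answer questions about morphosyntax (learning of word order, agreement, case marking, or word formation) and semantics (learning of vocabulary). We apply the method to two Indian languages, Kannada and Marathi, which, unlike English, lack well-developed resources for second language learning. To assess the practical utility of the extracted material, we enlist language educators in North America to perform a manual evaluation; they find that the materials have potential for lesson preparation and learner evaluation.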
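results: Experiments show that, when training the GPT-175B model on the H100 GPU platform, the authors' FP8 mixed-precision training framework reduces real memory usage by 42% and runs 64% faster than the BF16 framework (i.e., Megatron-LM), also surpassing the speed of Nvidia Transformer Engine. The mixed-precision training method can also be applied to other tasks, such as LLM instruction tuning and reinforcement learning with human feedback, reducing fine-tuning costs. The FP8 low-precision training framework is open-sourced on GitHub (https://github.com/Azure/MS-AMP).
Abstract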
In this paper, we explore FP8 low-bit data formats for efficient training of large language models (LLMs). Our key insight is that most variables in LLM training, such as gradients and optimizer states, can employ low-precision data formats without compromising model accuracy and without requiring changes to hyper-parameters. Specifically, we propose a new FP8 automatic mixed-precision framework for training LLMs. This framework offers three levels of FP8 utilization to streamline mixed-precision and distributed parallel training for LLMs. It gradually incorporates 8-bit gradients, optimizer states, and distributed learning in an incremental manner. Experiment results show that, during the training of the GPT-175B model on the H100 GPU platform, our FP8 mixed-precision training framework not only achieved a remarkable 42% reduction in real memory usage but also ran 64% faster than the widely adopted BF16 framework (i.e., Megatron-LM), surpassing the speed of Nvidia Transformer Engine by 17%. This largely reduces the training costs for large foundation models. Furthermore, our FP8 mixed-precision training methodology is generic. It can be seamlessly applied to other tasks such as LLM instruction tuning and reinforcement learning with human feedback, offering savings in fine-tuning expenses. Our FP8 low-precision training framework is open-sourced at https://github.com/Azure/MS-AMP (aka.ms/MS.AMP).
An Approach to Automatically generating Riddles aiding Concept Attainment
paper_authors: Niharika Sri Parasa, Chaitali Diwan, Srinath Srinivasa
for: The paper aims to enhance learner engagement in online learning environments by applying the Concept Attainment Model to build conceptual riddles.
methods: The paper uses a combination of natural language processing and the Concept Attainment Model to create factual triples from learning resources, classify them based on their uniqueness to a concept, and generate riddles based on the Concept Attainment Model’s format.
results: The human evaluation of the riddles obtained encouraging results, indicating the effectiveness of the proposed approach in enhancing learner engagement.
Abstract
One of the primary challenges in online learning environments is to retain learner engagement. Several different instructional strategies have been proposed in both online and offline environments to enhance learner engagement. The Concept Attainment Model is one such instructional strategy that focuses on learners acquiring a deeper understanding of a concept rather than just its dictionary definition. This is done by searching and listing the properties used to distinguish examples from non-examples of various concepts. Our work attempts to apply the Concept Attainment Model to build conceptual riddles to deploy in online learning environments. The approach involves creating factual triples from learning resources, classifying them based on their uniqueness to a concept into 'Topic Markers' and 'Common', followed by generating riddles based on the Concept Attainment Model's format and capturing all possible solutions to those riddles. The results obtained from the human evaluation of riddles prove encouraging.
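Summary
A primary challenge in online learning environments is retaining learner engagement. Several instructional strategies have been proposed in both online and offline environments to enhance engagement. The Concept Attainment Model is one such strategy; it emphasizes learners gaining a deep understanding of a concept rather than just its literal definition. This is done by searching for and listing the properties that distinguish examples from non-examples of various concepts. We attempt to apply the Concept Attainment Model to build conceptual riddles and deploy them in online learning environments. The approach extracts factual triples from learning resources, classifies them into properties unique to a concept and common properties, then generates riddles in the Concept Attainment Model's format and captures all possible solutions. Human evaluation of the riddles gives encouraging results.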
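The triple classification step can be sketched as follows, under the assumption that "uniqueness to a concept" means a (relation, object) property is held by exactly one concept in the collection. The abstract does not spell out the criterion, so this rule, the function name, and the example triples are illustrative guesses.

```python
from collections import defaultdict

def classify_triples(triples):
    """Label each (subject, relation, object) triple as 'Topic Marker' or 'Common'."""
    owners = defaultdict(set)
    for subj, rel, obj in triples:
        owners[(rel, obj)].add(subj)
    labeled = []
    for subj, rel, obj in triples:
        # A property held by a single concept distinguishes that concept.
        kind = "Topic Marker" if len(owners[(rel, obj)]) == 1 else "Common"
        labeled.append((subj, rel, obj, kind))
    return labeled

# Hypothetical triples extracted from a learning resource.
triples = [
    ("whale", "is_a", "mammal"),
    ("bat", "is_a", "mammal"),
    ("whale", "lives_in", "ocean"),
    ("bat", "can", "fly"),
]

for row in classify_triples(triples):
    print(row)
```

A riddle generator could then lead with 'Common' properties, which fit several concepts, and end with a 'Topic Marker' that narrows the answer to one.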
MalFake: A Multimodal Fake News Identification for Malayalam using Recurrent Neural Networks and VGG-16
paper_authors: Adhish S. Sujan, Ajitha. V, Aleena Benny, Amiya M. P., V. S. Anoop
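for: The study aims to develop a model that can effectively identify fake news, particularly in regional languages of India.
methods: The study uses feature extraction from multiple modalities together with a deep learning classification model to identify fake news.
results: The study reports that multimodal feature extraction combined with a deep learning classifier identifies fake news more accurately, and presents what the authors believe is the first such work for Malayalam.
Abstract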
The amount of news being consumed online has substantially expanded in recent years. Fake news has become increasingly common, especially in regional languages like Malayalam, due to the rapid publication and lack of editorial standards on some online sites. Fake news may have a terrible effect on society, causing people to make bad judgments, lose faith in authorities, and even engage in violent behavior. In the context of India, there are many regional languages, and fake news is spreading in every language. Therefore, providing efficient techniques for identifying false information in regional tongues is crucial. Until now, little to no work has been done in Malayalam on extracting features from multiple modalities to classify fake news. Multimodal approaches are more accurate in detecting fake news, as features from multiple modalities are extracted to build the deep learning classification model. As far as we know, this is the first piece of work in Malayalam that uses multimodal deep learning to tackle false information. Models trained with more than one modality typically outperform models trained with only one modality. Our study in the Malayalam language utilizing multimodal deep learning is a significant step toward more effective misinformation detection and mitigation.
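Summary
The amount of news consumed online has grown in recent years. Fake news has become increasingly common in regional languages like Malayalam, especially on online sites that lack editorial standards. Fake news can harm society, leading people to make bad judgments, lose faith in authorities, and even engage in violent behavior. India has many regional languages, and fake news spreads in every one of them, so effective techniques for detecting fake news in regional languages such as Malayalam are important. As far as we know, this is the first work to use multimodal deep learning to detect fake news in Malayalam. Using features from multiple modalities can improve detection accuracy, because the features extracted from each modality are used to build the deep learning classification model. Our study of multimodal deep learning for Malayalam is a significant step toward better detection and mitigation of fake news.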
Revising with a Backward Glance: Regressions and Skips during Reading as Cognitive Signals for Revision Policies in Incremental Processing
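results: The study finds that regressions and skips in human reading eye-tracking data can potentially serve as useful predictors for revisions in BiLSTMs and Transformer models, with consistent results across various languages.
Abstract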
In NLP, incremental processors produce output in instalments, based on incoming prefixes of the linguistic input. Some tokens trigger revisions, causing edits to the output hypothesis, but little is known about why models revise when they revise. A policy that detects the time steps where revisions should happen can improve efficiency. Still, retrieving a suitable signal to train a revision policy is an open problem, since it is not naturally available in datasets. In this work, we investigate the appropriateness of regressions and skips in human reading eye-tracking data as signals to inform revision policies in incremental sequence labelling. Using generalised mixed-effects models, we find that the probability of regressions and skips by humans can potentially serve as useful predictors for revisions in BiLSTMs and Transformer models, with consistent results for various languages.
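Summary
In natural language processing (NLP), incremental processors produce output step by step, based on incoming prefixes of the linguistic input. Some tokens trigger revisions, causing edits to the output hypothesis, but little is known about why models revise when they do. A policy that detects the time steps at which revisions should happen can improve efficiency. However, finding a suitable signal for training a revision policy remains an open problem, because such signals are not naturally available in datasets. In this work, we investigate whether regressions and skips in human reading eye-tracking data can serve as signals for revision policies. Using generalised mixed-effects models, we find that the probability of human regressions and skips can potentially serve as a predictor of revisions in BiLSTM and Transformer models, with consistent results.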
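To make the notion of a revision concrete, the sketch below marks the time steps at which an incremental sequence labeller changes labels it had already emitted for an earlier prefix. The function name and the part-of-speech example are hypothetical; the paper's own revision definition may be more fine-grained.

```python
def revision_steps(hypotheses):
    """Return time steps t where hypothesis t edits labels already emitted at t-1."""
    steps = []
    for t in range(1, len(hypotheses)):
        prev, curr = hypotheses[t - 1], hypotheses[t]
        # A pure extension keeps the previous hypothesis as a prefix.
        if curr[:len(prev)] != prev:
            steps.append(t)
    return steps

# Hypothetical outputs for successive prefixes of a four-token input.
outputs = [
    ["DET"],
    ["DET", "NOUN"],
    ["DET", "ADJ", "NOUN"],          # the second label is revised here
    ["DET", "ADJ", "NOUN", "VERB"],
]

print(revision_steps(outputs))  # [2]
```

Binary per-step indicators of this kind are the sort of target that human regression and skip probabilities could be tested against as predictors.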
ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models
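results: ArcheType establishes new state-of-the-art performance on both zero-shot and fine-tuned CTA, including on three new domain-specific benchmarks, and the authors release the associated code and data.
Abstract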
Existing deep-learning approaches to semantic column type annotation (CTA) have important shortcomings: they rely on semantic types which are fixed at training time; require a large number of training samples per type and incur large run-time inference costs; and their performance can degrade when evaluated on novel datasets, even when types remain constant. Large language models have exhibited strong zero-shot classification performance on a wide range of tasks and in this paper we explore their use for CTA. We introduce ArcheType, a simple, practical method for context sampling, prompt serialization, model querying, and label remapping, which enables large language models to solve column type annotation problems in a fully zero-shot manner. We ablate each component of our method separately, and establish that improvements to context sampling and label remapping provide the most consistent gains. ArcheType establishes new state-of-the-art performance on both zero-shot and fine-tuned CTA, including three new domain-specific benchmarks, which we release, along with the code to reproduce our results at https://github.com/penfever/ArcheType.
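Summary
Existing deep-learning approaches to semantic column type annotation (CTA) have important shortcomings: they rely on fixed semantic types, require many training samples, and incur large inference costs at run time. In addition, their performance can degrade on novel datasets even when the types remain constant. Large language models have shown strong zero-shot classification ability on a wide range of tasks, and in this paper we explore their use for CTA. We introduce ArcheType, a simple, practical method that enables large language models to solve column type annotation problems without any training samples. We ablate each component of the method separately and show that improvements to context sampling and label remapping provide the largest gains. ArcheType sets a new state of the art on both zero-shot and fine-tuned CTA, including three new domain-specific benchmarks, which we release along with the code to reproduce our results at https://github.com/penfever/ArcheType.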
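The four stages named in the abstract (context sampling, prompt serialization, model querying, label remapping) can be sketched around a stand-in for the model call. Everything here is an assumption made for illustration: the prompt wording, the uniform sampling, and the string-similarity remapping via the standard library's `difflib` are not taken from the ArcheType code, and the model query itself is left out.

```python
import difflib
import random

def sample_context(values, k=3, seed=0):
    """Pick up to k cell values from a column to show the model."""
    rng = random.Random(seed)
    return rng.sample(values, min(k, len(values)))

def serialize_prompt(sample, labels):
    """Turn sampled values and the allowed label set into a text prompt."""
    return (
        "Column values: " + "; ".join(sample) + "\n"
        "Choose one type from: " + ", ".join(labels) + "\n"
        "Type:"
    )

def remap_label(raw_answer, labels):
    """Map a free-form model answer onto the closest allowed label."""
    cleaned = raw_answer.strip().strip(".").lower()
    return difflib.get_close_matches(cleaned, labels, n=1, cutoff=0.0)[0]

labels = ["phone number", "email", "date"]
column = ["555-0100", "555-0142", "555-0199", "555-0123"]

prompt = serialize_prompt(sample_context(column), labels)
# A real system would send `prompt` to a language model here;
# the answer below is a made-up response used to show remapping.
print(remap_label("Telephone number.", labels))
```

Remapping matters in the zero-shot setting because a generative model is free to answer with a label that is close to, but not exactly, one of the allowed types.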
INA: An Integrative Approach for Enhancing Negotiation Strategies with Reward-Based Dialogue System
paper_authors: Zishan Ahmad, Suman Saurabh, Vaishakh Sreekanth Menon, Asif Ekbal, Roshni Ramnani, Anutosh Maitra
for: The paper proposes a novel negotiation dialogue agent for online marketplaces, designed to negotiate on price and other factors such as item inclusion/exclusion in a bundle deal.
methods: The agent uses a new semi-automated data creation method that combines defining negotiation intents, actions, and intent-action simulation to generate potential dialogue flows. The agent employs a set of novel rewards tailored for the negotiation task to train the Integrative Negotiation Agent (INA).
results: The proposed approach and reward system significantly enhance the agent’s negotiation capabilities, allowing it to engage in integrative negotiations and dynamically adjust prices and item inclusions/exclusions in a bundle deal.
Abstract
In this paper, we propose a novel negotiation dialogue agent designed for the online marketplace. Our agent is integrative in nature, i.e., it possesses the capability to negotiate on price as well as other factors, such as the addition or removal of items from a deal bundle, thereby offering a more flexible and comprehensive negotiation experience. We create a new dataset called Integrative Negotiation Dataset (IND) to enable this functionality. For this dataset creation, we introduce a new semi-automated data creation method, which combines defining negotiation intents, actions, and intent-action simulation between users and the agent to generate potential dialogue flows. Finally, GPT-J, a state-of-the-art language model, is prompted to generate dialogues for a given intent, with a human-in-the-loop process for post-editing and refining minor errors to ensure high data quality. We employ a set of novel rewards, specifically tailored for the negotiation task, to train our negotiation agent, termed the Integrative Negotiation Agent (INA). These rewards incentivize the chatbot to learn effective negotiation strategies that can adapt to various contextual requirements and price proposals. By leveraging the IND, we train our model and conduct experiments to evaluate the effectiveness of our reward-based dialogue system for negotiation. Our results demonstrate that the proposed approach and reward system significantly enhance the agent's negotiation capabilities. The INA successfully engages in integrative negotiations, displaying the ability to dynamically adjust prices and negotiate the inclusion or exclusion of items in a bundle deal.
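Summary
This paper proposes a new negotiation dialogue agent for online marketplaces. The agent is integrative: it can negotiate on price as well as other factors, such as adding or removing items from a deal bundle, which offers a more flexible and comprehensive negotiation experience. We create a new Integrative Negotiation Dataset (IND) to enable this functionality. To create IND, we introduce a new semi-automated data creation method that combines defining negotiation intents, actions, and intent-action simulation between users and the agent to generate potential dialogue flows. We then prompt GPT-J, a language model, to generate dialogues, with a human-in-the-loop process for post-editing and refinement to ensure high data quality. We use a set of novel rewards tailored to the negotiation task to train our negotiation agent, called the Integrative Negotiation Agent (INA). These rewards encourage the agent to learn effective negotiation strategies that adapt to different contexts and price proposals. Using IND, we train our model and run experiments to evaluate the reward-based dialogue system. Our results show that the approach and reward system significantly improve the agent's negotiation capabilities. INA successfully engages in integrative negotiations, dynamically adjusting prices and negotiating the inclusion or exclusion of items in a bundle deal.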
Lost in Translation, Found in Spans: Identifying Claims in Multilingual Social Media
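results: Training on multiple languages outperforms zero-shot transfer and training on translated data, and smaller encoder-only language models outperform the GPT series on low-resource languages.
Abstract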
Claim span identification (CSI) is an important step in fact-checking pipelines, aiming to identify text segments that contain a checkworthy claim or assertion in a social media post. Despite its importance to journalists and human fact-checkers, it remains a severely understudied problem, and the scarce research on this topic so far has only focused on English. Here we aim to bridge this gap by creating a novel dataset, X-CLAIM, consisting of 7K real-world claims collected from numerous social media platforms in five Indian languages and English. We report strong baselines with state-of-the-art encoder-only language models (e.g., XLM-R) and we demonstrate the benefits of training on multiple languages over alternative cross-lingual transfer methods such as zero-shot transfer, or training on translated data, from a high-resource language such as English. We evaluate generative large language models from the GPT series using prompting methods on the X-CLAIM dataset and we find that they underperform the smaller encoder-only language models for low-resource languages.
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN
results: The approach is demonstrated on the multi-speaker LibriTTS and PromptSpeech datasets, using multiple quantitative metrics that measure generated accuracy and MOS.
Abstract
In this paper, we present a Diffusion GAN based approach (Prosodic Diff-TTS) that takes a style description and content text as input and generates the corresponding high-fidelity speech samples within only 4 denoising steps. It leverages a novel conditional prosodic layer normalization to incorporate the style embeddings into the multi-head attention based phoneme encoder and mel spectrogram decoder based generator architecture to generate the speech. The style embedding is generated by fine-tuning the pretrained BERT model on auxiliary tasks such as pitch, speaking speed, emotion, and gender classification. We demonstrate the efficacy of our proposed architecture on the multi-speaker LibriTTS and PromptSpeech datasets, using multiple quantitative metrics that measure generated accuracy and MOS.
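Summary
In this paper, we propose a Diffusion GAN based method (Prosodic Diff-TTS) that generates high-fidelity speech from a style description and content text in only 4 denoising steps. It uses a novel conditional prosodic layer normalization to incorporate style embeddings into the multi-head attention based phoneme encoder and the mel spectrogram decoder based generator. The style embedding is produced by fine-tuning a pretrained BERT model on auxiliary tasks such as pitch, speaking speed, emotion, and gender classification. We demonstrate the efficacy of the proposed architecture on the multi-speaker LibriTTS and PromptSpeech datasets, using multiple quantitative metrics to evaluate generated accuracy and MOS.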
MPrompt: Exploring Multi-level Prompt Tuning for Machine Reading Comprehension
results: Extensive experiments on 12 benchmarks yield an average improvement of 1.94% over state-of-the-art methods.
Abstract
Large language models have achieved superior performance on various natural language tasks. One major drawback of such approaches is that they are resource-intensive to fine-tune on new datasets. Soft-prompt tuning presents a resource-efficient solution to fine-tune the pre-trained language models (PLMs) while keeping their weights frozen. Existing soft prompt methods mainly focus on designing input-independent prompts that steer the model to fit the domain of the new dataset. Those methods often ignore the fine-grained information about the task and context of the text. In this paper, we propose a multi-level prompt tuning (MPrompt) method for machine reading comprehension. It utilizes prompts at task-specific, domain-specific, and context-specific levels to enhance the comprehension of input semantics at different granularities. We also propose an independence constraint to steer each domain-specific prompt to focus on information within its domain to avoid redundancy. Moreover, we present a prompt generator that incorporates context-related knowledge in the prompt generation to enhance contextual relevancy. We conducted extensive experiments on 12 benchmarks of various QA formats and achieved an average improvement of 1.94% over the state-of-the-art methods.
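Summary
Large language models have achieved excellent performance on a variety of natural language tasks. However, these approaches are resource-intensive to fine-tune on new datasets. Soft-prompt tuning offers a resource-efficient way to adapt pre-trained language models (PLMs) while keeping their weights frozen. Existing soft-prompt methods mainly focus on designing input-independent prompts that steer the model toward the domain of the new dataset, and they often ignore fine-grained information about the task and context of the text. In this paper, we propose a multi-level prompt tuning (MPrompt) method for machine reading comprehension. It uses prompts at the task-specific, domain-specific, and context-specific levels to improve the understanding of input semantics. We also propose an independence constraint so that each domain-specific prompt focuses on information within its own domain, avoiding redundancy. In addition, we present a prompt generator that incorporates context-related knowledge to improve contextual relevance. We ran extensive experiments on 12 benchmarks and achieved an average improvement of 1.94% over state-of-the-art methods.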
Elevating Code-mixed Text Handling through Auditory Information of Words
for: handles code-mixed textual data with auditory information
methods: pre-training step based on masked-language-modelling with SOUNDEX representations (SAMLM) and a new input method
results: improved robustness towards adversarial attacks and better classification results over popular baselines for code-mixed tasks
Abstract
With the growing popularity of code-mixed data, there is an increasing need for better handling of this type of data, which poses a number of challenges, such as dealing with spelling variations, multiple languages, different scripts, and a lack of resources. Current language models face difficulty in effectively handling code-mixed data as they primarily focus on the semantic representation of words and ignore the auditory phonetic features. This leads to difficulties in handling spelling variations in code-mixed text. In this paper, we propose an effective approach for creating language models for handling code-mixed textual data using auditory information of words from SOUNDEX. Our approach includes a pre-training step based on masked-language-modelling, which includes SOUNDEX representations (SAMLM) and a new method of providing input data to the pre-trained model. Through experimentation on various code-mixed datasets (of different languages) for sentiment, offensive and aggression classification tasks, we establish that our novel language modeling approach (SAMLM) results in improved robustness towards adversarial attacks on code-mixed classification tasks. Additionally, our SAMLM based approach also results in better classification results over the popular baselines for code-mixed tasks. We use the explainability technique, SHAP (SHapley Additive exPlanations) to explain how the auditory features incorporated through SAMLM assist the model to handle the code-mixed text effectively and increase robustness against adversarial attacks. Source code has been made available at https://github.com/20118/DefenseWithPhonetics and https://www.iitp.ac.in/~ai-nlp-ml/resources.html#Phonetics.
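Summary
With the growing popularity of code-mixed data, the need to handle it is increasing, but it poses many challenges, such as spelling variations, multiple languages, different scripts, and a lack of resources. Current language models struggle with code-mixed text because they focus mainly on the semantic representation of words and ignore auditory features, which makes spelling variations hard to handle. In this paper, we propose an effective approach that uses auditory information to create language models for code-mixed text. The approach includes a pre-training step based on masked language modelling and a new method of providing input data. Through experiments on code-mixed datasets in different languages for sentiment, offensive, and aggression classification tasks, we show that our new language modelling approach (SAMLM) is more robust to adversarial attacks on code-mixed text. Our SAMLM-based approach also achieves better classification results than popular baselines on code-mixed tasks. We use SHAP (SHapley Additive exPlanations) to explain how the auditory features incorporated through SAMLM help the model handle code-mixed text and increase its robustness against adversarial attacks. The source code has been released.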
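Why a phonetic code helps with spelling variation can be seen from a small Soundex implementation: differently spelled romanizations of the same word collapse to one code. The sketch follows the standard American Soundex rules; how SAMLM feeds such codes into pre-training is not described in the abstract and is not shown here, and the romanized example words are illustrative.

```python
# Digit classes of American Soundex; vowels, y, h, and w carry no code.
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the four-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = []
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Skip repeats of the previous code; vowels reset it, h and w do not.
        if code and code != prev:
            out.append(code)
        if ch not in "hw":
            prev = code
    return (letters[0].upper() + "".join(out) + "000")[:4]

# Spelling variants of one romanized word share a code.
for variant in ["bahut", "bohot", "bahot"]:
    print(variant, soundex(variant))
```

An adversarial perturbation that swaps vowels or doubles a letter often leaves the code unchanged, which gives some intuition for the robustness gains the paper reports.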
Disentangled Representation Learning with Large Language Models for Text-Attributed Graphs
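paper_authors: Yijian Qin, Xin Wang, Ziwei Zhang, Wenwu Zhu
for: The paper addresses the shortcomings of existing large language model (LLM) approaches on text-attributed graphs (TAGs) and aims to improve the reasoning and prediction capabilities of LLMs.
methods: The paper proposes the Disentangled Graph-Text Learner (DGTL) model, which uses tailored disentangled graph neural network (GNN) layers so that LLMs can better capture the complex relationships in text-attributed graphs.
results: Experiments show that DGTL achieves superior or comparable performance relative to existing baselines on text-attributed graphs and can provide natural language explanations, significantly improving model interpretability.
Abstract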
Text-attributed graphs (TAGs) are prevalent on the web and research over TAGs such as citation networks, e-commerce networks and social networks has attracted considerable attention in the web community. Recently, large language models (LLMs) have demonstrated exceptional capabilities across a wide range of tasks. However, the existing works focus on harnessing the potential of LLMs solely relying on prompts to convey graph structure information to LLMs, thus suffering from insufficient understanding of the complex structural relationships within TAGs. To address this problem, in this paper we present the Disentangled Graph-Text Learner (DGTL) model, which is able to enhance the reasoning and predicting capabilities of LLMs for TAGs. Our proposed DGTL model incorporates graph structure information through tailored disentangled graph neural network (GNN) layers, enabling LLMs to capture the intricate relationships hidden in text-attributed graphs from multiple structural factors. Furthermore, DGTL operates with frozen pre-trained LLMs, reducing computational costs and allowing much more flexibility in combining with different LLM models. Experimental evaluations demonstrate the effectiveness of the proposed DGTL model on achieving superior or comparable performance over state-of-the-art baselines. Additionally, we also demonstrate that our DGTL model can offer natural language explanations for predictions, thereby significantly enhancing model interpretability.
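Summary
Text-attributed graphs (TAGs) are prevalent on the web, and research on such graphs (for example citation networks, e-commerce networks, and social networks) has attracted wide attention. Recently, large language models (LLMs) have shown exceptional capabilities across many tasks. However, existing work relies solely on prompts to convey graph structure to LLMs and therefore understands the complex structural relationships within TAGs insufficiently. To address this problem, we propose the Disentangled Graph-Text Learner (DGTL) model, which enhances the reasoning and prediction capabilities of LLMs on TAGs. DGTL uses tailored disentangled graph neural network layers to capture the complex relationships arising from multiple structural factors in TAGs, so that the LLM can understand the graph structure from several perspectives. DGTL also works with frozen pre-trained LLMs, reducing computational costs and allowing much more flexibility in combining with different LLM models. Experimental evaluations show that DGTL achieves superior or comparable performance relative to current baselines. We also show that DGTL can provide natural language explanations, significantly improving model interpretability.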
DELPHI: Data for Evaluating LLMs’ Performance in Handling Controversial Issues
paper_authors: David Q. Sun, Artem Abzaliev, Hadas Kotek, Zidi Xiu, Christopher Klein, Jason D. Williams
For: This paper aims to systematically examine how large language models (LLMs) respond to questions related to ongoing debates and controversial issues.
Methods: The authors propose a novel construction of a controversial questions dataset, expanding upon the publicly released Quora Question Pairs Dataset. They evaluate different LLMs using a subset of this dataset to understand how they handle controversial issues and the stances they adopt.
Results: The research reveals challenges concerning knowledge recency, safety, fairness, and bias in LLMs' interaction with controversial issues, and contributes to our understanding of how these models handle complex societal debates.
Abstract
Controversy is a reflection of our zeitgeist and an important aspect of any discourse. The rise of large language models (LLMs) as conversational systems has increased public reliance on these systems for answers to their various questions. Consequently, it is crucial to systematically examine how these models respond to questions pertaining to ongoing debates. However, few datasets exist that provide human-annotated labels reflecting contemporary discussions. To foster research in this area, we propose a novel construction of a controversial questions dataset, expanding upon the publicly released Quora Question Pairs Dataset. This dataset presents challenges concerning knowledge recency, safety, fairness, and bias. We evaluate different LLMs using a subset of this dataset, illuminating how they handle controversial issues and the stances they adopt. This research ultimately contributes to our understanding of LLMs' interaction with controversial issues, paving the way for improvements in their comprehension and handling of complex societal debates.
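Summary
Controversy reflects the spirit of our times and is an important part of any discussion. The emergence of large language models (LLMs) as conversational systems has led people to rely on them for answers to all kinds of questions, so it is important to examine systematically how LLMs answer questions tied to ongoing debates. Few datasets, however, offer human-annotated labels that reflect contemporary discussions. To advance research in this area, the paper proposes a new controversial questions dataset built on the publicly released Quora Question Pairs Dataset. The dataset raises challenges of knowledge recency, safety, fairness, and bias. The authors evaluate different LLMs on a subset of the dataset, revealing how they handle controversial issues and which stances they adopt. The study ultimately improves our understanding of how LLMs interact with complex societal debates and supports better handling and comprehension of such controversies.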
Mind the Gap: Automated Corpus Creation for Enthymeme Detection and Reconstruction in Learner Arguments
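results: Manual studies indicate that the corpus creation process yields the desired reduction in argument quality and produces argument instances whose language is similarly natural to learners' original writing. The paper also presents first baseline approaches to enthymeme detection and reconstruction to support further study of these tasks.
Abstract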
Writing strong arguments can be challenging for learners. It requires selecting and arranging multiple argumentative discourse units (ADUs) in a logical and coherent way, as well as deciding which ADUs to leave implicit, so-called enthymemes. However, when important ADUs are missing, readers might not be able to follow the reasoning or understand the argument's main point. This paper introduces two new tasks for learner arguments: to identify gaps in arguments (enthymeme detection) and to fill such gaps (enthymeme reconstruction). Approaches to both tasks may help learners improve their argument quality. We study how corpora for these tasks can be created automatically by deleting ADUs from an argumentative text that are central to the argument and its quality, while maintaining the text's naturalness. Based on the ICLEv3 corpus of argumentative learner essays, we create 40,089 argument instances for enthymeme detection and reconstruction. Through manual studies, we provide evidence that the proposed corpus creation process leads to the desired quality reduction, and results in arguments that are similarly natural to those written by learners. Finally, first baseline approaches to enthymeme detection and reconstruction demonstrate the corpus' usefulness.
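Summary
Writing strong arguments can be a challenge for learners. It requires selecting multiple argumentative discourse units (ADUs), arranging them logically and coherently, and deciding which ADUs can be left implicit, the so-called enthymemes. When important ADUs are missing, however, readers may be unable to follow the reasoning or grasp the argument's main point. The paper introduces two new tasks aimed at improving learners' argument quality: identifying gaps in arguments (enthymeme detection) and filling those gaps (enthymeme reconstruction). It studies how corpora for both tasks can be created automatically. Based on the ICLEv3 corpus of argumentative learner essays, the authors create 40,089 argument instances. Manual studies provide evidence that the corpus creation process leads to the desired quality reduction and yields arguments that are similarly natural to those written by learners. Finally, first baseline approaches to enthymeme detection and reconstruction demonstrate the usefulness of the corpus.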
Lost in Translation – Multilingual Misinformation and its Evolution
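results: The study finds that although most misinformation claims are fact-checked only once, 11.7% of claims (more than 21,000) are checked multiple times. It also finds that 33% of repeated claims cross language boundaries, indicating that some misinformation permeates language barriers, although misinformation spreads more readily within the same language. By analyzing the connected components and shortest paths linking versions of a claim across languages, the study finds that claims drift gradually over time and change more markedly when crossing languages.
Abstract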
Misinformation and disinformation are growing threats in the digital age, spreading rapidly across languages and borders. This paper investigates the prevalence and dynamics of multilingual misinformation through an analysis of over 250,000 unique fact-checks spanning 95 languages. First, we find that while the majority of misinformation claims are only fact-checked once, 11.7%, corresponding to more than 21,000 claims, are checked multiple times. Using fact-checks as a proxy for the spread of misinformation, we find 33% of repeated claims cross linguistic boundaries, suggesting that some misinformation permeates language barriers. However, spreading patterns exhibit strong homophily, with misinformation more likely to spread within the same language. To study the evolution of claims over time and mutations across languages, we represent fact-checks with multilingual sentence embeddings and cluster semantically similar claims. We analyze the connected components and shortest paths connecting different versions of a claim, finding that claims gradually drift over time and undergo greater alteration when traversing languages. Overall, this novel investigation of multilingual misinformation provides key insights. It quantifies redundant fact-checking efforts, establishes that some claims diffuse across languages, measures linguistic homophily, and models the temporal and cross-lingual evolution of claims. The findings advocate for expanded information sharing between fact-checkers globally while underscoring the importance of localized verification.
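Summary
Misinformation and disinformation are growing threats in the digital age, spreading rapidly across languages and borders. The paper studies the prevalence and dynamics of multilingual misinformation by analyzing more than 250,000 unique fact-checks. Most misinformation claims are fact-checked only once, but 11.7% (more than 21,000 claims) are checked repeatedly. Using fact-checks as a proxy for the spread of misinformation, the authors find that 33% of repeated claims cross language boundaries, which shows that some misinformation permeates language barriers. Spreading patterns nevertheless exhibit strong homophily: misinformation is more likely to spread within a single language. To study how claims evolve over time and mutate across languages, the authors represent fact-checks with multilingual sentence embeddings and cluster semantically similar claims. Analyzing the connected components and shortest paths between versions of a claim, they find that claims drift gradually and change more when they move between languages. Overall, the study quantifies redundant fact-checking effort and underscores the importance of localized verification.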
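The analysis described above (embed the fact-checks, link semantically similar ones, then inspect connected components and shortest paths) can be illustrated with a minimal sketch. This is not the authors' pipeline: the 2-D toy vectors stand in for multilingual sentence embeddings, the 0.9 cosine threshold is an assumption, and all function names are hypothetical.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_graph(embeddings, threshold):
    """Link every pair of claims whose similarity reaches the threshold."""
    adj = {i: set() for i in range(len(embeddings))}
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Group claims into clusters of transitively linked versions."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in sorted(adj[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


def shortest_path_length(adj, source, target):
    """Number of hops between two claim versions, or None if unconnected."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distances[node]
        for neighbour in adj[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return None


# Toy stand-ins for sentence embeddings of five fact-checked claims.
toy_embeddings = [(1.0, 0.0), (0.9, 0.3), (0.7, 0.6), (0.0, 1.0), (0.1, 1.0)]
graph = build_graph(toy_embeddings, threshold=0.9)
clusters = connected_components(graph)
drift_hops = shortest_path_length(graph, 0, 2)
```

In this toy data, claims 0 and 2 are not similar enough to be linked directly but are connected through claim 1, which is the kind of gradual drift that path length within a component can capture.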
A Scalable Framework for Table of Contents Extraction from Complex ESG Annual Reports
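results: The framework handles documents of varied structure and length better and runs much faster than the previous state-of-the-art baseline. Experiments show that the approach processes documents more efficiently and adapts to different document lengths.
Abstract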
Table of contents (ToC) extraction centres on structuring documents in a hierarchical manner. In this paper, we propose a new dataset, ESGDoc, comprising 1,093 ESG annual reports from 563 companies spanning from 2001 to 2022. These reports pose significant challenges due to their diverse structures and extensive length. To address these challenges, we propose a new framework for ToC extraction, consisting of three steps: (1) Constructing an initial tree of text blocks based on reading order and font sizes; (2) Modelling each tree node (or text block) independently by considering its contextual information captured in node-centric subtree; (3) Modifying the original tree by taking appropriate action on each tree node (Keep, Delete, or Move). This construction-modelling-modification (CMM) process offers several benefits. It eliminates the need for pairwise modelling of section headings as in previous approaches, making document segmentation practically feasible. By incorporating structured information, each section heading can leverage both local and long-distance context relevant to itself. Experimental results show that our approach outperforms the previous state-of-the-art baseline with a fraction of running time. Our framework proves its scalability by effectively handling documents of any length.
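Step (1) above, building an initial tree from reading order and font sizes, can be sketched as follows. The abstract does not specify the exact rule, so this assumes a simple heuristic: each block attaches to the nearest preceding block with a strictly larger font size. The `Node` class and function names are hypothetical, and steps (2) and (3), the node modelling and the Keep/Delete/Move actions, are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    font_size: float
    children: list = field(default_factory=list)


def build_initial_tree(blocks):
    """Build a tree from (text, font_size) pairs given in reading order.

    Assumed heuristic: a block becomes a child of the closest earlier
    block whose font size is strictly larger than its own.
    """
    root = Node("ROOT", float("inf"))  # sentinel that is never popped
    stack = [root]
    for text, font_size in blocks:
        node = Node(text, font_size)
        # Close every open block that is not larger than the new one.
        while stack[-1].font_size <= font_size:
            stack.pop()
        stack[-1].children.append(node)
        stack.append(node)
    return root


def outline(node, depth=0):
    """Flatten the tree into (depth, text) pairs for inspection."""
    rows = [(depth, node.text)]
    for child in node.children:
        rows.extend(outline(child, depth + 1))
    return rows


# Hypothetical text blocks from a report, in reading order.
blocks = [
    ("Annual Report", 20),
    ("1 Introduction", 16),
    ("Body text A", 10),
    ("1.1 Scope", 13),
    ("Body text B", 10),
    ("2 Governance", 16),
]
tree = build_initial_tree(blocks)
rows = outline(tree)
```

The stack keeps construction linear in the number of blocks, which is consistent with the abstract's emphasis on handling long documents, although the real system may build its tree differently.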
Multi-grained Evidence Inference for Multi-choice Reading Comprehension
results: Our method achieves significant and consistent performance improvements on four multi-choice MRC benchmarks.
Abstract
Multi-choice Machine Reading Comprehension (MRC) is a major and challenging task for machines to answer questions according to provided options. Answers in multi-choice MRC cannot be directly extracted in the given passages, and essentially require machines capable of reasoning from accurate extracted evidence. However, the critical evidence may be as simple as just one word or phrase, while it is hidden in the given redundant, noisy passage with multiple linguistic hierarchies from phrase, fragment, sentence until the entire passage. We thus propose a novel general-purpose model enhancement which integrates multi-grained evidence comprehensively, named Multi-grained evidence inferencer (Mugen), to make up for the inability. Mugen extracts three different granularities of evidence: coarse-, middle- and fine-grained evidence, and integrates evidence with the original passages, achieving significant and consistent performance improvement on four multi-choice MRC benchmarks.
“Honey, Tell Me What’s Wrong”, Global Explanation of Textual Discriminative Models through Cooperative Generation
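results: Experiments show that the method accurately describes the classifier's behavior over the input space and outperforms methods that rely on input data when that data is not specific to the studied model.
Abstract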
The ubiquity of complex machine learning has raised the importance of model-agnostic explanation algorithms. These methods create artificial instances by slightly perturbing real instances, capturing shifts in model decisions. However, such methods rely on initial data and only provide explanations of the decision for these. To tackle these problems, we propose Therapy, the first global and model-agnostic explanation method adapted to text which requires no input dataset. Therapy generates texts following the distribution learned by a classifier through cooperative generation. Because it does not rely on initial samples, it allows explanations to be generated even when data is absent (e.g., for confidentiality reasons). Moreover, in contrast to existing methods that combine multiple local explanations into a global one, Therapy offers a global overview of the model behavior on the input space. Our experiments show that although it uses no input data to generate samples, Therapy provides insightful information about features used by the classifier that is competitive with the ones from methods relying on input samples and outperforms them when input samples are not specific to the studied model.
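Summary
The ubiquity of complex machine learning has made model-agnostic explanation algorithms more important. These methods create artificial instances by slightly perturbing real ones and capture the resulting changes in model decisions. However, they depend on initial data and can only explain the decisions made for that data. To address this, the paper proposes Therapy, the first global, model-agnostic explanation method for text that requires no input dataset. Therapy generates text cooperatively with the classifier, following the distribution the classifier has learned. Because it does not depend on initial samples, it can produce explanations even when no data is available (for example, for confidentiality reasons). In addition, unlike existing methods that merge many local explanations into a global one, Therapy gives a global view of model behavior over the input space. Experiments show that, without using any input data to generate samples, Therapy provides useful feature information that is competitive with methods relying on input samples and outperforms them when the input samples are not specific to the studied model.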
ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese
results: Experimental results show that the proposed model achieves state-of-the-art performance on all four evaluation metrics.
Abstract
In recent years, Visual Question Answering (VQA) has gained significant attention for its diverse applications, including intelligent car assistance, aiding visually impaired individuals, and document image information retrieval using natural language queries. VQA requires effective integration of information from questions and images to generate accurate answers. Neural models for VQA have made remarkable progress on large-scale datasets, with a primary focus on resource-rich languages like English. To address this, we introduce the ViCLEVR dataset, a pioneering collection for evaluating various visual reasoning capabilities in Vietnamese while mitigating biases. The dataset comprises over 26,000 images and 30,000 question-answer pairs (QAs), each question annotated to specify the type of reasoning involved. Leveraging this dataset, we conduct a comprehensive analysis of contemporary visual reasoning systems, offering valuable insights into their strengths and limitations. Furthermore, we present PhoVIT, a comprehensive multimodal fusion that identifies objects in images based on questions. The architecture effectively employs transformers to enable simultaneous reasoning over textual and visual data, merging both modalities at an early model stage. The experimental findings demonstrate that our proposed model achieves state-of-the-art performance across four evaluation metrics. The accompanying code and dataset have been made publicly accessible at https://github.com/kvt0012/ViCLEVR. This provision seeks to stimulate advancements within the research community, fostering the development of more multimodal fusion algorithms, specifically tailored to address the nuances of low-resource languages, exemplified by Vietnamese.
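results: The paper concludes that different language use situation types have different characteristics and that language understanding is a multifaceted phenomenon that brings together individualistic and social processes. In addition, the choice of Understanding Indicator marks the limits of benchmarking and opens up considerations of the ethics of NLP use.
Abstract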
Natural Language Processing prides itself on being an empirically-minded, if not outright empiricist field, and yet lately it seems to get itself into essentialist debates on issues of meaning and measurement ("Do Large Language Models Understand Language, And If So, How Much?"). This is not by accident: Here, as everywhere, the evidence underspecifies the understanding. As a remedy, this paper sketches the outlines of a model of understanding, which can ground questions of the adequacy of current methods of measurement of model quality. The paper makes three claims: A) That different language use situation types have different characteristics, B) That language understanding is a multifaceted phenomenon, bringing together individualistic and social processes, and C) That the choice of Understanding Indicator marks the limits of benchmarking, and the beginnings of considerations of the ethics of NLP use.
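Summary
Natural language processing (NLP) regards itself as an empirically minded, even outright empiricist, field, yet lately it seems drawn into essentialist debates ("Do large language models understand language, and if so, how much?"). This is no accident: here, as everywhere, the evidence underspecifies the understanding. As a remedy, the paper sketches a model of understanding that can ground questions about the adequacy of current methods for measuring model quality. It makes three claims: A) different language use situation types have different characteristics; B) language understanding is a multifaceted phenomenon that combines individualistic and social processes; C) the choice of Understanding Indicator marks the limits of benchmarking and the beginning of ethical considerations about NLP use.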
SentMix-3L: A Bangla-English-Hindi Code-Mixed Dataset for Sentiment Analysis
results: The study finds that zero-shot prompting with GPT-3.5 outperforms all transformer-based models on SentMix-3L.
Abstract
Code-mixing is a well-studied linguistic phenomenon in which two or more languages are mixed in text or speech. Several datasets have been built with the goal of training computational models for code-mixing. Although it is very common to observe code-mixing with multiple languages, most available datasets contain code-mixing between only two languages. In this paper, we introduce SentMix-3L, a novel dataset for sentiment analysis containing code-mixed data between three languages: Bangla, English, and Hindi. We carry out a comprehensive evaluation using SentMix-3L. We show that zero-shot prompting with GPT-3.5 outperforms all transformer-based models on SentMix-3L.
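Summary
Code-mixing, in which two or more languages are mixed in text or speech, has been studied extensively. Many datasets have been built to train computational models for it. Although mixing several languages is common, most available datasets contain mixing between only two languages. The paper introduces SentMix-3L, a new sentiment analysis dataset containing code-mixed data in three languages: Bangla, English, and Hindi. The authors carry out a comprehensive evaluation and show that zero-shot prompting with GPT-3.5 outperforms all transformer-based models on SentMix-3L.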
NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark
paper_authors: Oscar Sainz, Jon Ander Campos, Iker García-Ferrero, Julen Etxaniz, Oier Lopez de Lacalle, Eneko Agirre
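for: This paper argues that classical NLP task evaluation using annotated benchmarks faces a serious problem, specifically the worst kind of data contamination.
results: The paper argues that when a large language model (LLM) is trained on the test split of a benchmark and then evaluated on that same benchmark, its performance is overestimated. As a consequence, wrong scientific conclusions are published while correct ones are discarded, which can be very harmful.
Abstract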
In this position paper, we argue that the classical evaluation on Natural Language Processing (NLP) tasks using annotated benchmarks is in trouble. The worst kind of data contamination happens when a Large Language Model (LLM) is trained on the test split of a benchmark, and then evaluated in the same benchmark. The extent of the problem is unknown, as it is not straightforward to measure. Contamination causes an overestimation of the performance of a contaminated model in a target benchmark and associated task with respect to their non-contaminated counterparts. The consequences can be very harmful, with wrong scientific conclusions being published while other correct ones are discarded. This position paper defines different levels of data contamination and argues for a community effort, including the development of automatic and semi-automatic measures to detect when data from a benchmark was exposed to a model, and suggestions for flagging papers with conclusions that are compromised by data contamination.
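The abstract calls for automatic measures that detect when benchmark data was exposed to a model, but does not specify one. As a hedged illustration only, the sketch below uses word n-gram overlap between a test example and a training corpus, a commonly used heuristic and not the authors' proposal. The function names, the trigram size, and the toy strings are all assumptions.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(test_example, corpus_docs, n=3):
    """Fraction of the test example's n-grams that also occur in the corpus.

    A ratio near 1.0 suggests the example may have been seen verbatim;
    a low ratio does not prove the absence of contamination.
    """
    test_grams = ngrams(test_example, n)
    if not test_grams:
        return 0.0
    corpus_grams = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    return len(test_grams & corpus_grams) / len(test_grams)


# Hypothetical data: one leaked corpus and one clean corpus.
test_item = "the cat sat on the mat"
leaked_corpus = ["yesterday the cat sat on the mat quietly"]
clean_corpus = ["a dog ran in the park"]

leaked_score = overlap_ratio(test_item, leaked_corpus)
clean_score = overlap_ratio(test_item, clean_corpus)
```

A check like this only catches near-verbatim exposure and needs access to the training corpus, which is part of why the abstract describes the extent of the problem as hard to measure.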
Does Role-Playing Chatbots Capture the Character Personalities? Assessing Personality Traits for Role-Playing Chatbots
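results: The results show that role-playing chatbots built on large-scale pretrained language models can accurately portray the personality traits of their characters, with an alignment rate of 82.8% against human-perceived personalities. The paper also suggests possible strategies for shaping chatbot personalities, and so provides a foundational study for research on role-playing chatbots.
Abstract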
The emergence of large-scale pretrained language models has revolutionized the capabilities of new AI applications, especially in the realm of crafting chatbots with distinct personas. Given the "stimulus-response" nature of chatbots, this paper unveils an innovative open-ended interview-style approach for personality assessment on role-playing chatbots, which offers a richer comprehension of their intrinsic personalities. We conduct personality assessments on 32 role-playing chatbots created by the ChatHaruhi library, across both the Big Five and MBTI dimensions, and measure their alignment with human perception. Evaluation results underscore that modern role-playing chatbots based on LLMs can effectively portray personality traits of corresponding characters, with an alignment rate of 82.8% compared with human-perceived personalities. Besides, we also suggest potential strategies for shaping chatbots' personalities. Hence, this paper serves as a cornerstone study for role-playing chatbots that intersects computational linguistics and psychology. Our resources are available at https://github.com/LC1332/Chat-Haruhi-Suzumiya
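Summary
The emergence of large-scale pretrained language models has transformed new AI applications, especially chatbots that play distinct characters. Given the "stimulus-response" nature of chatbots, the paper introduces an open-ended, interview-style method of personality assessment that gives a deeper view of the intrinsic personalities of role-playing chatbots. The authors assess 32 role-playing chatbots created with the ChatHaruhi library on both the Big Five and MBTI dimensions and compare the results with human perception. The results show that modern LLM-based role-playing chatbots can effectively portray the personality traits of their characters, with an alignment rate of 82.8% against human-perceived personalities. The paper also suggests possible strategies for shaping chatbot personalities. It thus serves as a foundational study at the intersection of computational linguistics and psychology for role-playing chatbots. Resources are available at https://github.com/LC1332/Chat-Haruhi-Suzumiya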
Whisper-MCE: Whisper Model Finetuned for Better Performance with Mixed Languages
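results: Compared with the baseline whisper-large-v2 model, Whisper-MCE captures the content of the original audio more accurately, achieves higher recognition accuracy, and recognizes speech faster, performing especially well on mixed-language tasks.
Abstract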
Recently Whisper has approached human-level robustness and accuracy in English automatic speech recognition (ASR), while in minor language and mixed language speech recognition, there remains a compelling need for further improvement. In this work, we present the impressive results of Whisper-MCE, our finetuned Whisper model, which was trained using our self-collected dataset, Mixed Cantonese and English audio dataset (MCE). Meanwhile, considering word error rate (WER) poses challenges when it comes to evaluating its effectiveness in minor language and mixed-language contexts, we present a novel rating mechanism. By comparing our model to the baseline whisper-large-v2 model, we demonstrate its superior ability to accurately capture the content of the original audio, achieve higher recognition accuracy, and exhibit faster recognition speed. Notably, our model outperforms other existing models in the specific task of recognizing mixed language.
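Summary
Whisper has recently approached human-level robustness and accuracy in English automatic speech recognition (ASR), but there is still considerable room for improvement in minor-language and mixed-language speech recognition. This work presents Whisper-MCE, a Whisper model fine-tuned on the authors' self-collected Mixed Cantonese and English audio dataset (MCE). Because word error rate (WER) is difficult to apply when evaluating minor-language and mixed-language recognition, the authors propose a new rating mechanism. Compared with the baseline whisper-large-v2 model, their model captures the content of the original audio better, reaches higher recognition accuracy, and runs faster. Notably, it outperforms other existing models on the mixed-language recognition task.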
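For reference, the conventional word error rate mentioned above is $(S + D + I) / N$: substitutions, deletions, and insertions in the best word-level alignment, divided by the number of reference words. The sketch below computes it with a standard edit-distance recurrence. It illustrates the standard metric only, not the paper's new rating mechanism, which the abstract does not describe; the function name and the whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the processed reference prefix
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref_words)


# One deletion ("a") and one substitution ("coffee" -> "tea"), 6 reference words.
wer = word_error_rate("i want a cup of coffee", "i want cup of tea")
```

Whitespace tokenization works for English but does not segment text written without spaces, such as Cantonese. That is one plausible reason, not stated in the abstract, why plain WER is awkward for mixed Cantonese and English speech.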
SOUL: Towards Sentiment and Opinion Understanding of Language
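results: Experiments show that SOUL is a hard task for existing language models, with a performance gap of up to 27% relative to human performance. Evaluations with human experts and GPT-4 further show that small language models are limited in generating reasoning-based justifications. These results underline how much of the complexity of sentiment analysis existing models still miss and the need for further progress.
Abstract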
Sentiment analysis is a well-established natural language processing task, with sentiment polarity classification being one of its most popular and representative tasks. However, despite the success of pre-trained language models in this area, they often fall short of capturing the broader complexities of sentiment analysis. To address this issue, we propose a new task called Sentiment and Opinion Understanding of Language (SOUL). SOUL aims to evaluate sentiment understanding through two subtasks: Review Comprehension (RC) and Justification Generation (JG). RC seeks to validate statements that focus on subjective information based on a review text, while JG requires models to provide explanations for their sentiment predictions. To enable comprehensive evaluation, we annotate a new dataset comprising 15,028 statements from 3,638 reviews. Experimental results indicate that SOUL is a challenging task for both small and large language models, with a performance gap of up to 27% when compared to human performance. Furthermore, evaluations conducted with both human experts and GPT-4 highlight the limitations of the small language model in generating reasoning-based justifications. These findings underscore the challenging nature of the SOUL task for existing models, emphasizing the need for further advancements in sentiment analysis to address its complexities. The new dataset and code are available at https://github.com/DAMO-NLP-SG/SOUL.
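Summary
Sentiment analysis is a well-established natural language processing task, and sentiment polarity classification is one of its most popular tasks. Although pre-trained language models have succeeded in this area, they often fail to capture the broader complexities of sentiment analysis. To address this, the paper proposes a new task, Sentiment and Opinion Understanding of Language (SOUL). SOUL evaluates sentiment understanding through two subtasks: Review Comprehension (RC) and Justification Generation (JG). RC checks the validity of statements about subjective information in a review text, while JG requires models to explain their sentiment predictions. To enable comprehensive evaluation, the authors annotate a new dataset of 15,028 statements from 3,638 reviews. Experiments show that SOUL is challenging for existing models, with a gap of up to 27% relative to human performance. Evaluations by human experts and GPT-4 also show that the small language model is limited in generating reasoning-based justifications. These findings highlight the difficulty existing models have with the complexities of sentiment analysis and the need for further progress. The new dataset and code are available at https://github.com/DAMO-NLP-SG/SOUL.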
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
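results: The paper proposes PO3D-VQA, a model that combines probabilistic neural symbolic program execution with deep neural networks that use 3D generative representations for robust visual recognition. Experiments show that PO3D-VQA performs strongly on 3D-aware VQA, but a significant performance gap remains compared with 2D VQA benchmarks, indicating that 3D-aware VQA remains an important open research area.
Abstract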
Despite rapid progress in Visual question answering (VQA), existing datasets and models mainly focus on testing reasoning in 2D. However, it is important that VQA models also understand the 3D structure of visual scenes, for example to support tasks like navigation or manipulation. This includes an understanding of the 3D object pose, their parts and occlusions. In this work, we introduce the task of 3D-aware VQA, which focuses on challenging questions that require a compositional reasoning over the 3D structure of visual scenes. We address 3D-aware VQA from both the dataset and the model perspective. First, we introduce Super-CLEVR-3D, a compositional reasoning dataset that contains questions about object parts, their 3D poses, and occlusions. Second, we propose PO3D-VQA, a 3D-aware VQA model that marries two powerful ideas: probabilistic neural symbolic program execution for reasoning and deep neural networks with 3D generative representations of objects for robust visual recognition. Our experimental results show our model PO3D-VQA outperforms existing methods significantly, but we still observe a significant performance gap compared to 2D VQA benchmarks, indicating that 3D-aware VQA remains an important open research area.
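Summary
Despite rapid progress in visual question answering (VQA), existing datasets and models mainly test reasoning in 2D. VQA models also need to understand the 3D structure of visual scenes, for example to support navigation or manipulation tasks. This includes understanding 3D object poses, parts, and occlusions. The paper introduces the task of 3D-aware VQA, which poses challenging questions that require compositional reasoning over the 3D structure of scenes, and addresses it from both the dataset and the model side. First, it introduces Super-CLEVR-3D, a compositional reasoning dataset with questions about object parts, their 3D poses, and occlusions. Second, it proposes PO3D-VQA, which combines probabilistic neural symbolic program execution for reasoning with deep neural networks that use 3D generative object representations for robust visual recognition. Experiments show that PO3D-VQA significantly outperforms existing methods on 3D-aware VQA, but a significant gap to 2D VQA benchmarks remains, so 3D-aware VQA is still an important open problem.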
TarGEN: Targeted Data Generation with Large Language Models
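results: By training different types of language models (encoder-only, encoder-decoder, and decoder-only) on 8 SuperGLUE tasks, the authors find that TarGEN generates high-quality synthetic datasets and that models trained on TarGEN data score approximately 1-2% points higher than models trained on the original datasets.
Abstract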
The rapid advancement of large language models (LLMs) has sparked interest in data synthesis techniques, aiming to generate diverse and high-quality synthetic datasets. However, these synthetic datasets often suffer from a lack of diversity and added noise. In this paper, we present TarGEN, a multi-step prompting strategy for generating high-quality synthetic datasets utilizing an LLM. An advantage of TarGEN is its seedless nature; it does not require specific task instances, broadening its applicability beyond task replication. We augment TarGEN with a method known as self-correction, empowering LLMs to rectify inaccurately labeled instances during dataset creation, ensuring reliable labels. To assess our technique's effectiveness, we emulate 8 tasks from the SuperGLUE benchmark and finetune various language models, including encoder-only, encoder-decoder, and decoder-only models on both synthetic and original training sets. Evaluation on the original test set reveals that models trained on datasets generated by TarGEN perform approximately 1-2% points better than those trained on original datasets (82.84% via syn. vs. 81.12% on og. using Flan-T5). When incorporating instruction tuning, the performance increases to 84.54% on synthetic data vs. 81.49% on original data by Flan-T5. A comprehensive analysis of the synthetic dataset compared to the original dataset reveals that the synthetic dataset demonstrates similar or higher levels of dataset complexity and diversity. Furthermore, the synthetic dataset displays a bias level that aligns closely with the original dataset. Finally, when pre-finetuned on our synthetic SuperGLUE dataset, T5-3B yields impressive results on the OpenLLM leaderboard, surpassing the model trained on the Self-Instruct dataset by 4.14% points. We hope that TarGEN can be helpful for quality data generation and reducing the human efforts to create complex benchmarks.
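Summary
The rapid progress of LLMs has sparked interest in data synthesis techniques for generating diverse, high-quality synthetic datasets. Such datasets, however, often suffer from a lack of diversity and from added noise. The paper presents TarGEN, a multi-step prompting strategy that uses an LLM to generate high-quality synthetic datasets. TarGEN's advantage is that it needs no specific task instances, so it applies beyond task replication. The authors add a self-correction step that lets the LLM fix mislabeled instances during dataset creation, ensuring reliable labels. To evaluate the technique, they emulate 8 tasks from the SuperGLUE benchmark and train different language models. Models trained on TarGEN data score approximately 1-2% points higher on the original test set than models trained on the original data (82.84% on synthetic vs. 81.12% on original data using Flan-T5). With instruction tuning, performance rises to 84.54% on synthetic data vs. 81.49% on original data with Flan-T5. A comprehensive comparison finds that the synthetic dataset has similar or higher complexity and diversity than the original and a bias level close to the original's. Finally, after pre-finetuning on the synthetic SuperGLUE dataset, T5-3B surpasses the model trained on the Self-Instruct dataset by 4.14% points on the OpenLLM leaderboard. The authors hope TarGEN helps generate quality data and reduces the human effort needed to create complex benchmarks.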
From Values to Opinions: Predicting Human Behaviors and Stances Using Value-Injected Large Language Models
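results: In a series of experiments on four tasks, value-injected LLMs substantially outperform the baselines, and the results suggest that value-injected LLMs predict people's opinions and behaviors better than the baseline approaches.
Abstract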
Being able to predict people's opinions on issues and behaviors in realistic scenarios can be helpful in various domains, such as politics and marketing. However, conducting large-scale surveys like the European Social Survey to solicit people's opinions on individual issues can incur prohibitive costs. Leveraging prior research showing influence of core human values on individual decisions and actions, we propose to use value-injected large language models (LLM) to predict opinions and behaviors. To this end, we present Value Injection Method (VIM), a collection of two methods -- argument generation and question answering -- designed to inject targeted value distributions into LLMs via fine-tuning. We then conduct a series of experiments on four tasks to test the effectiveness of VIM and the possibility of using value-injected LLMs to predict opinions and behaviors of people. We find that LLMs value-injected with variations of VIM substantially outperform the baselines. Also, the results suggest that opinions and behaviors can be better predicted using value-injected LLMs than the baseline approaches.
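Summary
Predicting people's opinions on issues and their behaviors in realistic scenarios is useful in domains such as politics and marketing. However, large-scale surveys such as the European Social Survey, which solicit opinions on individual issues, can be prohibitively expensive. Building on prior research showing that core human values influence individual decisions and actions, we propose using value-injected large language models (LLMs) to predict opinions and behaviors. To this end, we present the Value Injection Method (VIM), which comprises two methods, argument generation and question answering, for injecting targeted value distributions into LLMs via fine-tuning. We then run a series of experiments on four tasks to test the effectiveness of VIM and the feasibility of using value-injected LLMs to predict people's opinions and behaviors. LLMs value-injected with variations of VIM substantially outperform the baselines, and the results suggest that value-injected LLMs predict opinions and behaviors better than the baseline approaches.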
SQLformer: Deep Auto-Regressive Query Graph Generation for Text-to-SQL Translation
results: On the text-to-SQL Spider benchmark, SQLformer achieves state-of-the-art performance and generalizes well across different databases and query tasks.
Abstract
In recent years, there has been growing interest in text-to-SQL translation, which is the task of converting natural language questions into executable SQL queries. This technology is important for its potential to democratize data extraction from databases. However, some of its key hurdles include domain generalisation, which is the ability to adapt to previously unseen databases, and alignment of natural language questions with the corresponding SQL queries. To overcome these challenges, we introduce SQLformer, a novel Transformer architecture specifically crafted to perform text-to-SQL translation tasks. Our model predicts SQL queries as abstract syntax trees (ASTs) in an autoregressive way, incorporating structural inductive bias in the encoder and decoder layers. This bias, guided by database table and column selection, aids the decoder in generating SQL query ASTs represented as graphs in a Breadth-First Search canonical order. Comprehensive experiments illustrate the state-of-the-art performance of SQLformer in the challenging text-to-SQL Spider benchmark. Our implementation is available at https://github.com/AdrianBZG/SQLformer
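Summary
In recent years, text-to-SQL translation, the task of converting natural language questions into executable SQL queries, has attracted growing interest. The technology can help non-experts extract data from databases. Its key challenges include domain generalisation, meaning the ability to adapt to previously unseen databases, and the alignment of natural language questions with the corresponding SQL queries. To address these challenges, we propose SQLformer, a Transformer architecture designed specifically for text-to-SQL translation. The model predicts SQL queries autoregressively as abstract syntax trees (ASTs) and incorporates a structural inductive bias in its encoder and decoder layers. This bias, guided by database table and column selection, helps the decoder generate SQL query ASTs represented as graphs in a Breadth-First Search canonical order. Our implementation is available at https://github.com/AdrianBZG/SQLformer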
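To make the "Breadth-First Search canonical order" concrete, here is a minimal sketch of linearizing a query tree level by level. The toy tree, its node labels, and the function name `bfs_order` are illustrative assumptions; they are not taken from the SQLformer implementation, whose AST grammar is not described in this entry.

```python
from collections import deque

# Hypothetical toy AST for: SELECT name FROM singer WHERE age > 30
# Each node is (label, [children]); the labels are illustrative only.
ast = ("SELECT", [
    ("COLUMN:name", []),
    ("FROM", [("TABLE:singer", [])]),
    ("WHERE", [(">", [("COLUMN:age", []), ("VALUE:30", [])])]),
])

def bfs_order(root):
    """Return node labels in breadth-first order, level by level."""
    order = []
    queue = deque([root])
    while queue:
        label, children = queue.popleft()
        order.append(label)
        queue.extend(children)  # children keep their left-to-right order
    return order

print(bfs_order(ast))
# ['SELECT', 'COLUMN:name', 'FROM', 'WHERE', 'TABLE:singer', '>', 'COLUMN:age', 'VALUE:30']
```

An autoregressive decoder that emits nodes in this order produces every node of one depth before any node of the next, which gives each tree a single, fixed target sequence.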
results: The method applies to meta-learning, personalized federated learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI). When specialized to stochastic nonconvex optimization, it matches existing lower bounds; for meta-learning, its complexity does not depend on the number of tasks.
Abstract
We introduce contextual stochastic bilevel optimization (CSBO) -- a stochastic bilevel optimization framework with the lower-level problem minimizing an expectation conditioned on some contextual information and the upper-level decision variable. This framework extends classical stochastic bilevel optimization when the lower-level decision maker responds optimally not only to the decision of the upper-level decision maker but also to some side information and when there are multiple or even infinite many followers. It captures important applications such as meta-learning, personalized federated learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI). Due to the presence of contextual information, existing single-loop methods for classical stochastic bilevel optimization are unable to converge. To overcome this challenge, we introduce an efficient double-loop gradient method based on the Multilevel Monte-Carlo (MLMC) technique and establish its sample and computational complexities. When specialized to stochastic nonconvex optimization, our method matches existing lower bounds. For meta-learning, the complexity of our method does not depend on the number of tasks. Numerical experiments further validate our theoretical results.
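Summary
We introduce contextual stochastic bilevel optimization (CSBO), a stochastic bilevel optimization framework in which the lower-level problem minimizes an expectation conditioned on contextual information and on the upper-level decision variable. The framework extends classical stochastic bilevel optimization to cases where the lower-level decision maker responds optimally not only to the upper-level decision but also to side information, and where there may be multiple or even infinitely many followers. It covers important applications such as meta-learning, personalized federated learning, end-to-end learning, and Wasserstein distributionally robust optimization with side information (WDRO-SI). Because of the contextual information, existing single-loop methods for classical stochastic bilevel optimization fail to converge. To address this, we propose an efficient double-loop gradient method based on the Multilevel Monte-Carlo (MLMC) technique and establish its sample and computational complexities. When specialized to stochastic nonconvex optimization, the method matches existing lower bounds. For meta-learning, its complexity does not depend on the number of tasks. Numerical experiments further validate the theoretical results.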
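One way to write the problem described above is sketched below. The symbols are assumptions chosen for illustration ($x$ for the upper-level decision, $y$ for the lower-level decision, $\xi$ for the contextual information, $\eta$ for the remaining randomness, $f$ and $g$ for the upper- and lower-level objectives); the entry itself gives no formula.

```latex
\min_{x} \; F(x) := \mathbb{E}_{\xi}\bigl[ f\bigl(x,\, y^*(x;\xi);\, \xi\bigr) \bigr]
\quad \text{s.t.} \quad
y^*(x;\xi) \in \arg\min_{y} \; \mathbb{E}_{\eta \mid \xi}\bigl[ g(x, y;\, \eta, \xi) \bigr]
```

The lower-level expectation is conditioned on $\xi$, so each context value defines its own lower-level problem; this is what allows "multiple or even infinitely many followers".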
Feature Selection in the Contrastive Analysis Setting
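results: Experiments show that CFS outperforms previous supervised and unsupervised feature selection methods on all four real-world datasets and better captures the feature variations specific to the CA setting.
Abstract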
Contrastive analysis (CA) refers to the exploration of variations uniquely enriched in a target dataset as compared to a corresponding background dataset generated from sources of variation that are irrelevant to a given task. For example, a biomedical data analyst may wish to find a small set of genes to use as a proxy for variations in genomic data only present among patients with a given disease (target) as opposed to healthy control subjects (background). However, as of yet the problem of feature selection in the CA setting has received little attention from the machine learning community. In this work we present contrastive feature selection (CFS), a method for performing feature selection in the CA setting. We motivate our approach with a novel information-theoretic analysis of representation learning in the CA setting, and we empirically validate CFS on a semi-synthetic dataset and four real-world biomedical datasets. We find that our method consistently outperforms previously proposed state-of-the-art supervised and fully unsupervised feature selection methods not designed for the CA setting. An open-source implementation of our method is available at https://github.com/suinleelab/CFS.
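Summary
Contrastive analysis (CA) refers to exploring variations that are uniquely enriched in a target dataset compared with a background dataset whose variation is irrelevant to the task. For example, a biomedical data analyst may want a small set of genes that serves as a proxy for variations present only among patients with a given disease (target) and not among healthy control subjects (background). So far, however, feature selection in the CA setting has received little attention from the machine learning community. In this work we present contrastive feature selection (CFS), a method for feature selection in the CA setting. We motivate the approach with an information-theoretic analysis of representation learning in the CA setting and validate it on one semi-synthetic dataset and four real-world biomedical datasets. Our method consistently outperforms existing supervised and fully unsupervised feature selection methods that were not designed for the CA setting. An implementation is available at https://github.com/suinleelab/CFS.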
Learning to design protein-protein interactions with enhanced generalization
paper_authors: Anton Bushuiev, Roman Bushuiev, Anatolii Filkin, Petr Kouba, Marketa Gabrielova, Michal Gabriel, Jiri Sedlar, Tomas Pluskal, Jiri Damborsky, Stanislav Mazurenko, Josef Sivic
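for: Discovering mutations that enhance protein-protein interactions (PPIs), which is critical for advancing biomedical research and developing new therapeutics.
methods: Machine learning, specifically an SE(3)-equivariant model, to enable large-scale learning and generalization.
results: The paper introduces PPIRef, the largest non-redundant dataset of protein-protein interactions, uses it to pre-train the PPIformer model, and adjusts the pre-training loss function to predict the effects of mutations on protein-protein interactions. Compared with other state-of-the-art methods, the new PPIformer approach shows improved generalization.
Abstract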
Discovering mutations enhancing protein-protein interactions (PPIs) is critical for advancing biomedical research and developing improved therapeutics. While machine learning approaches have substantially advanced the field, they often struggle to generalize beyond training data in practical scenarios. The contributions of this work are three-fold. First, we construct PPIRef, the largest and non-redundant dataset of 3D protein-protein interactions, enabling effective large-scale learning. Second, we leverage the PPIRef dataset to pre-train PPIformer, a new SE(3)-equivariant model generalizing across diverse protein-binder variants. We fine-tune PPIformer to predict effects of mutations on protein-protein interactions via a thermodynamically motivated adjustment of the pre-training loss function. Finally, we demonstrate the enhanced generalization of our new PPIformer approach by outperforming other state-of-the-art methods on new, non-leaking splits of standard labeled PPI mutational data and independent case studies optimizing a human antibody against SARS-CoV-2 and increasing the thrombolytic activity of staphylokinase.
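Summary
Discovering mutations that enhance protein-protein interactions (PPIs) can advance biomedical research and the development of improved therapeutics. Machine learning has made substantial progress in this area, but models often struggle to generalize in practical scenarios. This work makes three contributions. First, we construct PPIRef, the largest non-redundant dataset of 3D protein-protein interactions, which makes large-scale learning possible. Second, we use PPIRef to pre-train PPIformer, a new SE(3)-equivariant model that generalizes across diverse protein-binder variants. Third, we fine-tune PPIformer to predict the effects of mutations on protein-protein interactions through a thermodynamically motivated adjustment of the pre-training loss function. Finally, we show that the new PPIformer approach generalizes better than other methods on new, non-leaking splits of standard labeled PPI mutational data and in independent case studies, namely optimizing a human antibody against SARS-CoV-2 and increasing the thrombolytic activity of staphylokinase.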
Preventing Language Models From Hiding Their Reasoning
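results: The study finds that as LLMs grow stronger, they become more likely to use encoded reasoning to solve problems, with reasoning steps that human readers cannot understand. It also proposes a methodology for evaluating defenses and shows that, under the right conditions, paraphrasing can successfully prevent models from encoding their reasoning.
Abstract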
Large language models (LLMs) often benefit from intermediate steps of reasoning to generate answers to complex problems. When these intermediate steps of reasoning are used to monitor the activity of the model, it is essential that this explicit reasoning is faithful, i.e. that it reflects what the model is actually reasoning about. In this work, we focus on one potential way intermediate steps of reasoning could be unfaithful: encoded reasoning, where an LLM could encode intermediate steps of reasoning in the generated text in a way that is not understandable to human readers. We show that language models can be trained to make use of encoded reasoning to get higher performance without the user understanding the intermediate steps of reasoning. We argue that, as language models get stronger, this behavior becomes more likely to appear naturally. Finally, we describe a methodology that enables the evaluation of defenses against encoded reasoning, and show that, under the right conditions, paraphrasing successfully prevents even the best encoding schemes we built from encoding more than 3 bits of information per KB of text.
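Summary
Large language models (LLMs) often benefit from intermediate reasoning steps when generating answers to complex problems. When these steps are used to monitor the model's activity, the explicit reasoning must be faithful, that is, it must reflect what the model is actually reasoning about. In this work, we focus on one way intermediate reasoning could be unfaithful: encoded reasoning, where an LLM encodes intermediate reasoning steps in the generated text in a way that human readers cannot understand. We show that language models can be trained to use encoded reasoning to improve performance without the user understanding the intermediate steps. We also argue that this behavior becomes more likely to appear naturally as language models get stronger. Finally, we describe a methodology for evaluating defenses against encoded reasoning and show that, under the right conditions, paraphrasing prevents even the best encoding schemes we built from encoding more than 3 bits of information per KB of text.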
Multi-fidelity Design of Porous Microstructures for Thermofluidic Applications
paper_authors: Jonathan Tammer Eweis-LaBolle, Chuanning Zhao, Yoonjin Won, Ramin Bostanabad
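for: This paper aims to design optimal thermal management solutions that meet the demands of modern electronic devices.
methods: The paper takes a data-driven approach, using spectral density functions (SDFs) to encode the design space and combining multi-fidelity simulation with optimization to find the best designs.
results: The data-driven approach finds optimal thermal management designs quickly and effectively, meeting the thermal management needs of modern electronic devices.
Abstract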
As modern electronic devices are increasingly miniaturized and integrated, their performance relies more heavily on effective thermal management. Two-phase cooling methods enhanced by porous surfaces, which capitalize on thin-film evaporation atop structured porous surfaces, are emerging as potential solutions. In such porous structures, the optimum heat dissipation capacity relies on two competing objectives that depend on mass and heat transfer. The computational costs of evaluating these objectives, the high dimensionality of the design space which a voxelated microstructure representation, and the manufacturability constraints hinder the optimization process for thermal management. We address these challenges by developing a data-driven framework for designing optimal porous microstructures for cooling applications. In our framework we leverage spectral density functions (SDFs) to encode the design space via a handful of interpretable variables and, in turn, efficiently search it. We develop physics-based formulas to quantify the thermofluidic properties and feasibility of candidate designs via offline simulations. To decrease the reliance on expensive simulations, we generate multi-fidelity data and build emulators to find Pareto-optimal designs. We apply our approach to a canonical problem on evaporator wick design and obtain fin-like topologies in the optimal microstructures which are also characteristics often observed in industrial applications.
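Summary
As modern electronic devices become smaller and more integrated, their performance depends more heavily on effective thermal management. Two-phase cooling methods that use structured porous surfaces to enhance thin-film evaporation are emerging as potential solutions. In such porous structures, the optimum heat dissipation capacity relies on two competing objectives that depend on mass and heat transfer. The cost of evaluating these objectives, the high dimensionality of the design space, and manufacturability constraints all hinder the optimization process. We address these challenges by developing a data-driven framework for designing optimal porous microstructures for cooling applications. In our framework, we use spectral density functions (SDFs) to encode the design space through a handful of interpretable variables and then search it efficiently. We develop physics-based formulas to quantify the thermofluidic properties and feasibility of candidate designs via offline simulations. To reduce the reliance on expensive simulations, we generate multi-fidelity data and build emulators to find Pareto-optimal designs. We apply the approach to a canonical evaporator wick design problem and obtain fin-like topologies in the optimal microstructures, a characteristic that is also often observed in industrial applications.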
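Because the two objectives compete, the search returns a set of Pareto-optimal designs rather than a single best one. The sketch below shows the filtering step only; the candidate scores are invented for illustration, and treating both objectives as "higher is better" is an assumption, not something stated in this entry.

```python
def pareto_front(designs):
    """Return the designs not dominated by any other (both objectives maximized)."""
    front = []
    for i, (a1, a2) in enumerate(designs):
        dominated = any(
            # b dominates a: at least as good on both, strictly better on one
            (b1 >= a1 and b2 >= a2) and (b1 > a1 or b2 > a2)
            for j, (b1, b2) in enumerate(designs)
            if j != i
        )
        if not dominated:
            front.append((a1, a2))
    return front

# Hypothetical (mass-transfer score, heat-transfer score) pairs
candidates = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9), (0.6, 0.4)]
print(pareto_front(candidates))
# [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9)]
```

In a workflow like the one described, the scores would come from the emulators rather than from a hand-written list, and the quadratic scan would be replaced by something faster for large candidate sets.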
Understanding and Improving Ensemble Adversarial Defense
methods: The paper develops a new error theory to explain why ensemble adversarial defense works and proposes a new method, interactive global adversarial training (iGAT), to improve it.
results: In experiments on the CIFAR10 and CIFAR100 datasets, iGAT improves ensemble adversarial defense performance by up to 17%, with notable gains under both white-box and black-box attacks.
Abstract
The strategy of ensemble has become popular in adversarial defense, which trains multiple base classifiers to defend against adversarial attacks in a cooperative manner. Despite the empirical success, theoretical explanations on why an ensemble of adversarially trained classifiers is more robust than single ones remain unclear. To fill in this gap, we develop a new error theory dedicated to understanding ensemble adversarial defense, demonstrating a provable 0-1 loss reduction on challenging sample sets in an adversarial defense scenario. Guided by this theory, we propose an effective approach to improve ensemble adversarial defense, named interactive global adversarial training (iGAT). The proposal includes (1) a probabilistic distributing rule that selectively allocates to different base classifiers adversarial examples that are globally challenging to the ensemble, and (2) a regularization term to rescue the severest weaknesses of the base classifiers. Being tested over various existing ensemble adversarial defense techniques, iGAT is capable of boosting their performance by increases up to 17% evaluated using CIFAR10 and CIFAR100 datasets under both white-box and black-box attacks.
Summary
The ensemble strategy has become popular in adversarial defense: multiple base classifiers are trained to defend against adversarial attacks cooperatively. Despite its empirical success, the theoretical explanation for why an ensemble of adversarially trained classifiers is more robust than a single one remains unclear. To fill this gap, we develop a new error theory dedicated to understanding ensemble adversarial defense, demonstrating a provable 0-1 loss reduction on challenging sample sets in an adversarial defense scenario. Guided by this theory, we propose an effective approach to improve ensemble adversarial defense, named interactive global adversarial training (iGAT). The proposal includes (1) a probabilistic distributing rule that selectively allocates to different base classifiers the adversarial examples that are globally challenging to the ensemble, and (2) a regularization term to rescue the severest weaknesses of the base classifiers. Tested over various existing ensemble adversarial defense techniques, iGAT boosts their performance by up to 17% on the CIFAR10 and CIFAR100 datasets under both white-box and black-box attacks.
Parameter-Efficient Methods for Metastases Detection from Clinical Notes
results: Our best model achieves an F1-score of 73.8%, a precision of 84%, and a recall of 65.8%.
Abstract
Understanding the progression of cancer is crucial for defining treatments for patients. The objective of this study is to automate the detection of metastatic liver disease from free-style computed tomography (CT) radiology reports. Our research demonstrates that transferring knowledge using three approaches can improve model performance. First, we utilize generic language models (LMs), pretrained in a self-supervised manner. Second, we use a semi-supervised approach to train our model by automatically annotating a large unlabeled dataset; this approach substantially enhances the model's performance. Finally, we transfer knowledge from related tasks by designing a multi-task transfer learning methodology. We leverage the recent advancement of parameter-efficient LM adaptation strategies to improve performance and training efficiency. Our dataset consists of CT reports collected at Memorial Sloan Kettering Cancer Center (MSKCC) over the course of 12 years. 2,641 reports were manually annotated by domain experts; among them, 841 reports have been annotated for the presence of liver metastases. Our best model achieved an F1-score of 73.8%, a precision of 84%, and a recall of 65.8%.
Summary
Understanding the progression of cancer is crucial for defining treatments for patients. The objective of this study is to automate the detection of metastatic liver disease from free-style computed tomography (CT) radiology reports. Our research shows that transferring knowledge through three approaches can improve model performance. First, we use generic language models (LMs) pretrained in a self-supervised manner. Second, we train our model with a semi-supervised approach that automatically annotates a large unlabeled dataset, which substantially improves performance. Finally, we transfer knowledge from related tasks through a multi-task transfer learning methodology. We use recent parameter-efficient LM adaptation strategies to improve performance and training efficiency. Our dataset consists of CT reports collected at Memorial Sloan Kettering Cancer Center (MSKCC) over 12 years. Domain experts manually annotated 2,641 reports, of which 841 were annotated for the presence of liver metastases. Our best model achieved an F1-score of 73.8%, a precision of 84%, and a recall of 65.8%.
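The three reported metrics can be checked against each other, since F1 is the harmonic mean of precision and recall. The short sketch below does that arithmetic; the function name `f1_score` is just an illustrative helper, not a call into any library.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision 84% and recall 65.8%, as reported in the abstract
f1 = f1_score(0.84, 0.658)
print(round(100 * f1, 1))  # 73.8
```

The result matches the reported F1-score of 73.8%, so the three figures are mutually consistent.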
Minimax Optimal Submodular Optimization with Bandit Feedback
results: The paper establishes a new minimax lower bound: over $T$ rounds, the learner's regret scales like $\mathcal{O}(\min_{i \le k}(in^{1/3}T^{2/3} + \sqrt{n^{k-i}T}))$. It also proposes an algorithm whose regret matches this lower bound.
Abstract
We consider maximizing a monotonic, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ under stochastic bandit feedback. Specifically, $f$ is unknown to the learner but at each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$ with $|S_t| \leq k$ and receives reward $f(S_t) + \eta_t$ where $\eta_t$ is mean-zero sub-Gaussian noise. The objective is to minimize the learner's regret over $T$ times with respect to ($1-e^{-1}$)-approximation of maximum $f(S_*)$ with $|S_*| = k$, obtained through greedy maximization of $f$. To date, the best regret bound in the literature scales as $k n^{1/3} T^{2/3}$. And by trivially treating every set as a unique arm one deduces that $\sqrt{ {n \choose k} T }$ is also achievable. In this work, we establish the first minimax lower bound for this setting that scales like $\mathcal{O}(\min_{i \le k}(in^{1/3}T^{2/3} + \sqrt{n^{k-i}T}))$. Moreover, we propose an algorithm that is capable of matching the lower bound regret.
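Summary
We consider maximizing a monotone submodular set function $f: 2^{[n]} \rightarrow [0,1]$ under stochastic bandit feedback. Specifically, $f$ is unknown to the learner, but at each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$ with $|S_t| \leq k$ and receives reward $f(S_t) + \eta_t$, where $\eta_t$ is mean-zero sub-Gaussian noise. The goal is to minimize the learner's regret over $T$ rounds relative to a ($1-e^{-1}$)-approximation of the maximum $f(S_*)$ with $|S_*| = k$, the guarantee obtained by greedy maximization of $f$. The best existing regret bound scales as $k n^{1/3} T^{2/3}$. Treating every set as a separate arm shows that a regret of $\sqrt{ {n \choose k} T }$ is also achievable. In this work, we establish the first minimax lower bound for this setting, which scales like $\mathcal{O}(\min_{i \le k}(in^{1/3}T^{2/3} + \sqrt{n^{k-i}T}))$. We also propose an algorithm whose regret matches the lower bound.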
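The ($1-e^{-1}$) benchmark comes from the classical greedy procedure, which can be sketched as follows. This is a noise-free illustration with exact evaluations of $f$, not the bandit algorithm proposed in the paper; the coverage function, the `areas` data, and the names `greedy_max` and `coverage` are assumptions made for the example.

```python
def greedy_max(f, ground_set, k):
    """Pick k elements, each time adding the one with the largest marginal gain."""
    chosen = set()
    for _ in range(k):
        best = max(
            (e for e in ground_set if e not in chosen),
            key=lambda e: f(chosen | {e}) - f(chosen),
        )
        chosen.add(best)
    return chosen

# Hypothetical coverage instance: element i covers areas[i]; f is the
# fraction of the 7-item universe covered, which is monotone and submodular.
areas = {0: {1, 2, 3}, 1: {3, 4}, 2: {4, 5, 6, 7}, 3: {1, 7}}

def coverage(S):
    covered = set().union(*(areas[i] for i in S))
    return len(covered) / 7

picked = greedy_max(coverage, [0, 1, 2, 3], k=2)
print(picked, coverage(picked))  # {0, 2} 1.0
```

In the bandit setting each call to `f` would return a noisy reward, so marginal gains must be estimated from repeated pulls; that estimation cost is what the regret bounds above quantify.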
Approximate Heavy Tails in Offline (Multi-Pass) Stochastic Gradient Descent
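results: The study finds that as the number of data points grows, offline SGD behaves increasingly "power-law-like", while online SGD remains unchanged. Experiments on synthetic data and neural networks illustrate the theoretical findings.
Abstract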
A recent line of empirical studies has demonstrated that SGD might exhibit a heavy-tailed behavior in practical settings, and the heaviness of the tails might correlate with the overall performance. In this paper, we investigate the emergence of such heavy tails. Previous works on this problem only considered, up to our knowledge, online (also called single-pass) SGD, in which the emergence of heavy tails in theoretical findings is contingent upon access to an infinite amount of data. Hence, the underlying mechanism generating the reported heavy-tailed behavior in practical settings, where the amount of training data is finite, is still not well-understood. Our contribution aims to fill this gap. In particular, we show that the stationary distribution of offline (also called multi-pass) SGD exhibits 'approximate' power-law tails and the approximation error is controlled by how fast the empirical distribution of the training data converges to the true underlying data distribution in the Wasserstein metric. Our main takeaway is that, as the number of data points increases, offline SGD will behave increasingly 'power-law-like'. To achieve this result, we first prove nonasymptotic Wasserstein convergence bounds for offline SGD to online SGD as the number of data points increases, which can be interesting on their own. Finally, we illustrate our theory on various experiments conducted on synthetic data and neural networks.
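Summary
A recent line of empirical studies has shown that SGD may exhibit heavy-tailed behavior in practical settings and that the heaviness of the tails may correlate with overall performance. In this paper, we investigate how such heavy tails emerge. Previous work considered only online (also called single-pass) SGD, where the theoretical emergence of heavy tails depends on access to an infinite amount of data. The mechanism behind the heavy-tailed behavior reported in practical settings, where training data is finite, is therefore still not well understood. Our contribution aims to fill this gap. In particular, we show that the stationary distribution of offline (also called multi-pass) SGD exhibits "approximate" power-law tails, and that the approximation error is controlled by how fast the empirical distribution of the training data converges to the true underlying data distribution in the Wasserstein metric. Our main takeaway is that, as the number of data points increases, offline SGD behaves increasingly "power-law-like". To obtain this result, we first prove nonasymptotic Wasserstein convergence bounds for offline SGD to online SGD as the number of data points increases, which may be of independent interest. Finally, we illustrate the theory with experiments on synthetic data and neural networks.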
Bayesian Optimization with Hidden Constraints via Latent Decision Models
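results: Numerical experiments show that HC-LSBO performs well on both synthetic and real data sets, particularly on large-scale police districting problems, with notable gains in performance and efficiency over the baselines.
Abstract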
Bayesian optimization (BO) has emerged as a potent tool for addressing intricate decision-making challenges, especially in public policy domains such as police districting. However, its broader application in public policymaking is hindered by the complexity of defining feasible regions and the high-dimensionality of decisions. This paper introduces the Hidden-Constrained Latent Space Bayesian Optimization (HC-LSBO), a novel BO method integrated with a latent decision model. This approach leverages a variational autoencoder to learn the distribution of feasible decisions, enabling a two-way mapping between the original decision space and a lower-dimensional latent space. By doing so, HC-LSBO captures the nuances of hidden constraints inherent in public policymaking, allowing for optimization in the latent space while evaluating objectives in the original space. We validate our method through numerical experiments on both synthetic and real data sets, with a specific focus on large-scale police districting problems in Atlanta, Georgia. Our results reveal that HC-LSBO offers notable improvements in performance and efficiency compared to the baselines.
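Summary
Bayesian optimization (BO) has become a powerful tool for complex decision-making problems, especially in public policy domains such as police districting. However, its broader use in public policymaking is hindered by the complexity of defining feasible regions and by the high dimensionality of decisions. This paper introduces Hidden-Constrained Latent Space Bayesian Optimization (HC-LSBO), a BO method integrated with a latent decision model. The approach uses a variational autoencoder to learn the distribution of feasible decisions, which enables a two-way mapping between the original decision space and a lower-dimensional latent space. HC-LSBO thereby captures the hidden constraints inherent in public policymaking, optimizing in the latent space while evaluating objectives in the original space. Numerical experiments on synthetic and real data sets show that HC-LSBO offers notable improvements in performance and efficiency over the baselines.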
M3C: A Framework towards Convergent, Flexible, and Unsupervised Learning of Mixture Graph Matching and Clustering
paper_authors: Jiaxin Lu, Zetian Jiang, Tianzhe Wang, Junchi Yan
for: This paper targets real-world graph matching and clustering tasks, where graphs exhibit diverse modes and require grouping before or along with matching.
methods: The method is based on the Minorize-Maximization framework, providing learning-free guarantees of theoretical convergence, along with relaxed clustering for enhanced flexibility.
results: Experimental results demonstrate that the method outperforms state-of-the-art graph matching and mixture graph matching and clustering approaches in both accuracy and efficiency on public benchmarks.
Abstract
Existing graph matching methods typically assume that there are similar structures between graphs and they are matchable. However, these assumptions do not align with real-world applications. This work addresses a more realistic scenario where graphs exhibit diverse modes, requiring graph grouping before or along with matching, a task termed mixture graph matching and clustering. We introduce Minorize-Maximization Matching and Clustering (M3C), a learning-free algorithm that guarantees theoretical convergence through the Minorize-Maximization framework and offers enhanced flexibility via relaxed clustering. Building on M3C, we develop UM3C, an unsupervised model that incorporates novel edge-wise affinity learning and pseudo label selection. Extensive experimental results on public benchmarks demonstrate that our method outperforms state-of-the-art graph matching and mixture graph matching and clustering approaches in both accuracy and efficiency. Source code will be made publicly available.
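Summary
Existing graph matching methods typically assume that graphs share similar structures and can be matched. These assumptions do not hold in real-world applications. This work addresses a more realistic scenario in which graphs exhibit diverse modes and must be grouped before or along with matching, a task termed mixture graph matching and clustering. We introduce Minorize-Maximization Matching and Clustering (M3C), a learning-free algorithm that guarantees theoretical convergence through the Minorize-Maximization framework and gains flexibility from relaxed clustering. Building on M3C, we develop UM3C, an unsupervised model that incorporates novel edge-wise affinity learning and pseudo label selection. Extensive experiments on public benchmarks show that our method is more accurate and more efficient than state-of-the-art graph matching and mixture graph matching and clustering approaches. The source code will be made publicly available.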
Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage
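results: The proposed algorithms outperform existing algorithms in simulation experiments, and we characterize their sample complexity under the single policy concentrability assumption.
Abstract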
The goal of an offline reinforcement learning (RL) algorithm is to learn optimal policies using historical (offline) data, without access to the environment for online exploration. One of the main challenges in offline RL is the distribution shift, which refers to the difference between the state-action visitation distribution of the data generating policy and the learning policy. Many recent works have used the idea of pessimism for developing offline RL algorithms and characterizing their sample complexity under a relatively weak assumption of single policy concentrability. Different from the offline RL literature, the area of distributionally robust learning (DRL) offers a principled framework that uses a minimax formulation to tackle model mismatch between training and testing environments. In this work, we aim to bridge these two areas by showing that the DRL approach can be used to tackle the distributional shift problem in offline RL. In particular, we propose two offline RL algorithms using the DRL framework, for the tabular and linear function approximation settings, and characterize their sample complexity under the single policy concentrability assumption. We also demonstrate the superior performance of our proposed algorithm through simulation experiments.
Summary
An offline reinforcement learning (RL) algorithm aims to learn optimal policies from historical data, without access to the environment for online exploration. A main challenge in offline RL is distribution shift, the difference between the state-action visitation distribution of the data-generating policy and that of the learning policy. Many recent works use the idea of pessimism to develop offline RL algorithms and to characterize their sample complexity under the relatively weak assumption of single policy concentrability. In contrast, distributionally robust learning (DRL) offers a principled framework that uses a minimax formulation to handle model mismatch between training and testing environments. This work bridges the two areas by showing that the DRL approach can address the distribution shift problem in offline RL. In particular, we propose two offline RL algorithms based on the DRL framework, one for the tabular setting and one for the linear function approximation setting, and characterize their sample complexity under the single policy concentrability assumption. Simulation experiments demonstrate the superior performance of the proposed algorithm.
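methods: The paper proposes a new method, Minority Class Rebalancing through Augmentation by Generative modeling (MCRAGE), which uses a deep generative model to produce high-quality synthetic samples that augment the original imbalanced dataset and give a more balanced class distribution.
results: Comparing MCRAGE with alternative approaches, the authors find that it improves the accuracy and F1 score of healthcare machine learning models and reduces bias with respect to sensitive attributes such as gender and age.
Abstract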
In the field of healthcare, electronic health records (EHR) serve as crucial training data for developing machine learning models for diagnosis, treatment, and the management of healthcare resources. However, medical datasets are often imbalanced in terms of sensitive attributes such as race/ethnicity, gender, and age. Machine learning models trained on class-imbalanced EHR datasets perform significantly worse in deployment for individuals of the minority classes compared to samples from majority classes, which may lead to inequitable healthcare outcomes for minority groups. To address this challenge, we propose Minority Class Rebalancing through Augmentation by Generative modeling (MCRAGE), a novel approach to augment imbalanced datasets using samples generated by a deep generative model. The MCRAGE process involves training a Conditional Denoising Diffusion Probabilistic Model (CDDPM) capable of generating high-quality synthetic EHR samples from underrepresented classes. We use this synthetic data to augment the existing imbalanced dataset, thereby achieving a more balanced distribution across all classes, which can be used to train an unbiased machine learning model. We measure the performance of MCRAGE versus alternative approaches using Accuracy, F1 score and AUROC. We provide theoretical justification for our method in terms of recent convergence results for DDPMs with minimal assumptions.
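Summary
In healthcare, electronic health records (EHR) are crucial training data for machine learning models used in diagnosis, treatment, and the management of healthcare resources. However, medical datasets are often imbalanced with respect to sensitive attributes such as race/ethnicity, gender, and age. Models trained on such data perform worse in deployment for individuals in minority classes, which may lead to inequitable healthcare outcomes. To address this challenge, we propose Minority Class Rebalancing through Augmentation by Generative modeling (MCRAGE), a new approach that augments imbalanced datasets with samples generated by a deep generative model. In the MCRAGE process, we first train a Conditional Denoising Diffusion Probabilistic Model (CDDPM) that can generate high-quality synthetic EHR samples from underrepresented classes. We then use this synthetic data to augment the existing imbalanced dataset, obtaining a more balanced distribution that can be used to train an unbiased machine learning model. We measure the performance of MCRAGE against alternative approaches using Accuracy, F1 score, and AUROC. We also provide theoretical justification for the method based on recent convergence results for DDPMs under minimal assumptions.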
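The augmentation step needs a target number of synthetic samples per class. A minimal sketch of that bookkeeping is shown below, assuming every class is topped up to the size of the largest class; this target and the name `synthetic_needed` are assumptions, and the generative model itself is out of scope here.

```python
from collections import Counter

def synthetic_needed(labels):
    """Number of synthetic samples per class to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())  # assumed target: size of the majority class
    return {cls: target - n for cls, n in counts.items()}

# Hypothetical sensitive-attribute labels for 10 records
labels = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
print(synthetic_needed(labels))  # {'A': 0, 'B': 3, 'C': 5}
```

A conditional generator would then be asked for that many samples per class, conditioning on the class label, before the augmented dataset is used for training.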
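results: The paper proves that these definitions of stability are equivalent to one another and proves boosting results that amplify stability, supporting a clearer understanding of the many stability notions that have emerged in recent years.
Abstract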
We show that many definitions of stability found in the learning theory literature are equivalent to one another. We distinguish between two families of definitions of stability: distribution-dependent and distribution-independent Bayesian stability. Within each family, we establish equivalences between various definitions, encompassing approximate differential privacy, pure differential privacy, replicability, global stability, perfect generalization, TV stability, mutual information stability, KL-divergence stability, and R\'enyi-divergence stability. Along the way, we prove boosting results that enable the amplification of the stability of a learning rule. This work is a step towards a more systematic taxonomy of stability notions in learning theory, which can promote clarity and an improved understanding of an array of stability concepts that have emerged in recent years.
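Summary
We show that many definitions of stability in learning theory are equivalent to one another. We divide them into two families: distribution-dependent and distribution-independent Bayesian stability. Within each family we establish equivalences between various definitions, including approximate differential privacy, pure differential privacy, replicability, global stability, perfect generalization, TV stability, mutual information stability, and KL-divergence stability. Along the way we prove boosting results that amplify the stability of a learning rule. This work is a step towards a more systematic taxonomy of stability notions in learning theory, which can promote clarity and a better understanding of the stability concepts that have emerged in recent years.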
Fast Machine Learning Method with Vector Embedding on Orthonormal Basis and Spectral Transform
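for: The paper proposes a new fast machine learning method that uses two techniques: Vector Embedding on Orthonormal Basis (VEOB) and Spectral Transform (ST).
methods: The method uses Singular Value Decomposition (SVD) to compute the vector basis and projection coordinates, which improves distance measurement in the embedding space and allows data compression by keeping the projection vectors associated with the largest singular values. ST transforms sequences of vector data into spectral space; applying the Discrete Cosine Transform (DCT) and selecting the most significant components simplifies the handling of long vector sequences.
results: The paper gives examples of word embedding, text chunk embedding, and image embedding, implemented in Julia with a vector database. It also investigates unsupervised and supervised learning, and strategies for handling large data volumes.
Abstract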
This paper presents a novel fast machine learning method that leverages two techniques: Vector Embedding on Orthonormal Basis (VEOB) and Spectral Transform (ST). The VEOB converts the original data encoding into a vector embedding with coordinates projected onto orthonormal bases. The Singular Value Decomposition (SVD) technique is used to calculate the vector basis and projection coordinates, leading to an enhanced distance measurement in the embedding space and facilitating data compression by preserving the projection vectors associated with the largest singular values. On the other hand, ST transforms a sequence of vector data into spectral space. By applying the Discrete Cosine Transform (DCT) and selecting the most significant components, it streamlines the handling of lengthy vector sequences. The paper provides examples of word embedding, text chunk embedding, and image embedding, implemented in the Julia language with a vector database. It also investigates unsupervised learning and supervised learning using this method, along with strategies for handling large data volumes.
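Summary
This paper proposes a new fast machine learning method that uses two techniques: Vector Embedding on Orthonormal Basis (VEOB) and Spectral Transform (ST). VEOB converts the original data encoding into a vector embedding whose coordinates are projected onto orthonormal bases. SVD is used to compute the vector basis and projection coordinates, which improves distance measurement in the embedding space and allows data compression by keeping the projection vectors associated with the largest singular values. ST, in turn, transforms a sequence of vector data into spectral space. Applying the DCT and selecting the most significant components simplifies the handling of long vector sequences. The paper gives examples of word embedding, text chunk embedding, and image embedding, implemented in Julia with a vector database. It also investigates unsupervised and supervised learning, and strategies for handling large data volumes.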
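The Spectral Transform idea (DCT along a sequence of vectors, then truncation) can be illustrated with a short sketch. It is written in Python rather than the paper's Julia, uses an unnormalized DCT-II, and assumes that "most significant components" means the lowest-frequency coefficients; the paper may select components differently.

```python
import math

def dct2(x):
    """Unnormalized DCT-II: X_k = sum_t x_t * cos(pi/N * (t + 0.5) * k)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
            for k in range(n)]

def idct2(coeffs, n):
    """Inverse of dct2 for length n; coefficients beyond len(coeffs) count as zero."""
    return [(coeffs[0] + 2.0 * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5) * k)
                                   for k in range(1, len(coeffs)))) / n
            for t in range(n)]

def compress_sequence(vectors, keep):
    """DCT each coordinate along the sequence axis and keep the first `keep` coefficients."""
    dims = len(vectors[0])
    return [dct2([v[d] for v in vectors])[:keep] for d in range(dims)]

def reconstruct_sequence(spectra, length):
    """Rebuild an approximate sequence of vectors from truncated spectra."""
    columns = [idct2(c, length) for c in spectra]
    return [[col[t] for col in columns] for t in range(length)]

# A smooth toy sequence of sixteen 2-D vectors (assumed data).
seq = [[math.cos(math.pi * (t + 0.5) / 16),
        1.0 + 0.5 * math.cos(2 * math.pi * (t + 0.5) / 16)] for t in range(16)]
spectra = compress_sequence(seq, keep=3)       # 16 x 2 values become 3 x 2 coefficients
approx = reconstruct_sequence(spectra, len(seq))
max_err = max(abs(a - b) for u, v in zip(seq, approx) for a, b in zip(u, v))
```

The toy sequence contains only the three lowest DCT frequencies, so three coefficients per coordinate reproduce it up to rounding; real sequences would lose some detail under truncation.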
A general learning scheme for classical and quantum Ising machines
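results: Experimental results on training and execution point to new possibilities for the learning model. In particular, in the quantum realm, quantum resources are used for both execution and training of the model, a promising perspective for quantum machine learning.
Abstract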
An Ising machine is any hardware specifically designed for finding the ground state of the Ising model. Relevant examples are coherent Ising machines and quantum annealers. In this paper, we propose a new machine learning model that is based on the Ising structure and can be efficiently trained using gradient descent. We provide a mathematical characterization of the training process, which is based upon optimizing a loss function whose partial derivatives are not explicitly calculated but estimated by the Ising machine itself. Moreover, we present some experimental results on the training and execution of the proposed learning model. These results point out new possibilities offered by Ising machines for different learning tasks. In particular, in the quantum realm, the quantum resources are used for both the execution and the training of the model, providing a promising perspective in quantum machine learning.
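Summary
An Ising machine is hardware designed specifically for finding the ground state of the Ising model. Relevant examples include coherent Ising machines and quantum annealers. In this paper we propose a new machine learning model that is based on the Ising structure and can be trained efficiently with gradient descent. We give a mathematical characterization of the training process, which optimizes a loss function whose partial derivatives are not computed explicitly but estimated by the Ising machine itself. We also present experimental results on the training and execution of the model, which point to new possibilities for Ising machines on different learning tasks. In particular, in the quantum realm, quantum resources are used for both execution and training of the model, a promising perspective in quantum machine learning.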
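To make "finding the ground state of the Ising model" concrete, here is a small brute-force sketch that plays the role of the hardware for a three-spin instance. The sign convention of the energy and the coupling values are assumptions chosen for illustration; the paper's learning model and its derivative estimation are not reproduced here.

```python
from itertools import product

def ising_energy(spins, couplings, fields):
    """E(s) = -sum_{(i,j)} J_ij s_i s_j - sum_i h_i s_i (sign convention assumed)."""
    pair = sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())
    local = sum(h * s for h, s in zip(fields, spins))
    return -pair - local

def ground_state(couplings, fields):
    """Enumerate all 2^n spin configurations and return one with minimal energy."""
    n = len(fields)
    return min(product((-1, 1), repeat=n),
               key=lambda s: ising_energy(s, couplings, fields))

# Toy instance: spins 0 and 1 want to align, spins 1 and 2 want to differ.
couplings = {(0, 1): 1.0, (1, 2): -1.0}
fields = [0.5, 0.0, 0.0]
best = ground_state(couplings, fields)
best_energy = ising_energy(best, couplings, fields)
```

Enumeration costs 2^n energy evaluations, which is the reason dedicated Ising hardware is interesting for larger instances.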
State-Action Similarity-Based Representations for Off-Policy Evaluation
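results: Experiments show that the proposed state-action representations improve the data-efficiency of FQE, lower OPE error, and keep FQE stable under varying distribution shifts. Other state-action similarity metrics lead to representations that cannot represent the action-value function of the evaluation policy.
Abstract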
In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the expected return of an evaluation policy given a fixed dataset that was collected by running one or more different policies. One of the more empirically successful algorithms for OPE has been the fitted q-evaluation (FQE) algorithm that uses temporal difference updates to learn an action-value function, which is then used to estimate the expected return of the evaluation policy. Typically, the original fixed dataset is fed directly into FQE to learn the action-value function of the evaluation policy. Instead, in this paper, we seek to enhance the data-efficiency of FQE by first transforming the fixed dataset using a learned encoder, and then feeding the transformed dataset into FQE. To learn such an encoder, we introduce an OPE-tailored state-action behavioral similarity metric, and use this metric and the fixed dataset to learn an encoder that models this metric. Theoretically, we show that this metric allows us to bound the error in the resulting OPE estimate. Empirically, we show that other state-action similarity metrics lead to representations that cannot represent the action-value function of the evaluation policy, and that our state-action representation method boosts the data-efficiency of FQE and lowers OPE error relative to other OPE-based representation learning methods on challenging OPE tasks. We also empirically show that the learned representations significantly mitigate divergence of FQE under varying distribution shifts. Our code is available here: https://github.com/Badger-RL/ROPE.
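Summary
In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the expected return of an evaluation policy from a fixed dataset. One of the more empirically successful algorithms is fitted Q-evaluation (FQE), which uses temporal difference updates to learn an action-value function and then uses it to estimate the return of the evaluation policy. Usually the original fixed dataset is fed directly into FQE. In this paper we instead try to improve the data-efficiency of FQE with a learned encoder. We introduce an OPE-tailored state-action behavioral similarity metric and use it together with the fixed dataset to learn an encoder that models the metric. Theoretically, we show that this metric bounds the error of the OPE estimate. Empirically, we find that other state-action similarity metrics lead to representations that cannot represent the action-value function of the evaluation policy, while our method improves the data-efficiency of FQE relative to other OPE-based representation learning methods. We also find that the learned representations substantially mitigate divergence of FQE under varying distribution shifts. Our code is available at https://github.com/Badger-RL/ROPE.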
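results: The study finds that using known analyte concentrations in baseline correction can improve the accuracy of analytical and quantitative results, with good performance on two near infra-red data sets.
Abstract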
Spectroscopic measurements can show distorted spectra shapes arising from a mixture of absorbing and scattering contributions. These distortions (or baselines) often manifest themselves as non-constant offsets or low-frequency oscillations. As a result, these baselines can adversely affect analytical and quantitative results. Baseline correction is an umbrella term where one applies pre-processing methods to obtain baseline spectra (the unwanted distortions) and then removes the distortions by differencing. However, current state-of-the-art baseline correction methods do not utilize analyte concentrations even if they are available, or even if they contribute significantly to the observed spectral variability. We examine a class of state-of-the-art methods (penalized baseline correction) and modify them so that they can accommodate a priori analyte concentrations and thereby enhance prediction. Performance is assessed on two near infra-red data sets across both classical penalized baseline correction methods (without analyte information) and modified penalized baseline correction methods (leveraging analyte information).
Summary
We examine a class of state-of-the-art baseline correction methods (penalized baseline correction) and modify them to accommodate a priori analyte concentration information. By leveraging this information, we can enhance prediction performance on two near infra-red data sets. In comparison to classical penalized baseline correction methods (without analyte information), our modified methods demonstrate improved performance.
Addressing GAN Training Instabilities via Tunable Classification Losses
paper_authors: Monica Welfert, Gowtham R. Kurri, Kyle Otstot, Lalitha Sankar
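for: The paper presents a generative adversarial network (GAN) approach to data generation in which synthetic data comes with formal guarantees.
methods: It reformulates the GAN value function using class probability estimation (CPE) losses and proves a two-way correspondence between CPE-loss GANs and $f$-GANs. It also shows that all symmetric $f$-divergences are equivalent in convergence.
results: In the finite sample and model capacity setting, the paper defines and obtains bounds on estimation and generalization errors. These are specialized to $\alpha$-GANs, defined using $\alpha$-loss, a tunable CPE loss. The paper also introduces dual-objective GANs and shows that tuning $(\alpha_D,\alpha_G)$ helps alleviate GAN training instabilities.
Abstract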
Generative adversarial networks (GANs), modeled as a zero-sum game between a generator (G) and a discriminator (D), allow generating synthetic data with formal guarantees. Noting that D is a classifier, we begin by reformulating the GAN value function using class probability estimation (CPE) losses. We prove a two-way correspondence between CPE loss GANs and $f$-GANs which minimize $f$-divergences. We also show that all symmetric $f$-divergences are equivalent in convergence. In the finite sample and model capacity setting, we define and obtain bounds on estimation and generalization errors. We specialize these results to $\alpha$-GANs, defined using $\alpha$-loss, a tunable CPE loss family parametrized by $\alpha\in(0,\infty]$. We next introduce a class of dual-objective GANs to address training instabilities of GANs by modeling each player's objective using $\alpha$-loss to obtain $(\alpha_D,\alpha_G)$-GANs. We show that the resulting non-zero sum game simplifies to minimizing an $f$-divergence under appropriate conditions on $(\alpha_D,\alpha_G)$. Generalizing this dual-objective formulation using CPE losses, we define and obtain upper bounds on an appropriately defined estimation error. Finally, we highlight the value of tuning $(\alpha_D,\alpha_G)$ in alleviating training instabilities for the synthetic 2D Gaussian mixture ring as well as the large publicly available Celeb-A and LSUN Classroom image datasets.
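Summary
Generative adversarial networks (GANs), modeled as a zero-sum game between a generator (G) and a discriminator (D), can generate synthetic data with formal guarantees. Noting that D is a classifier, we start by reformulating the GAN value function using class probability estimation (CPE) losses. We prove a two-way correspondence between CPE-loss GANs and $f$-GANs, and show that all symmetric $f$-divergences are equivalent in convergence. In the finite sample and model capacity setting, we define and obtain bounds on estimation and generalization errors. We specialize these results to $\alpha$-GANs, defined using $\alpha$-loss, a tunable CPE loss family parametrized by $\alpha\in(0,\infty]$. We then introduce dual-objective GANs to address GAN training instabilities, modeling each player's objective with $\alpha$-loss to obtain $(\alpha_D,\alpha_G)$-GANs. We show that this non-zero-sum game simplifies to minimizing an $f$-divergence under appropriate conditions. Generalizing the dual-objective formulation, we define and obtain upper bounds on an appropriately defined estimation error. Finally, we highlight the value of tuning $(\alpha_D,\alpha_G)$ for alleviating training instabilities on the synthetic 2D Gaussian mixture ring and on the large publicly available Celeb-A and LSUN Classroom image datasets.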
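As a hedged illustration of what a tunable CPE loss looks like, the sketch below evaluates $\alpha$-loss on the probability $p$ that a classifier assigns to the true label. The closed form used here, $\ell_\alpha(p) = \frac{\alpha}{\alpha-1}\left(1 - p^{(\alpha-1)/\alpha}\right)$ with log-loss at $\alpha = 1$ and $1 - p$ at $\alpha = \infty$, is the one commonly given for $\alpha$-loss in the literature; it is not quoted from this abstract, so treat it as an assumption.

```python
import math

def alpha_loss(p, alpha):
    """alpha-loss of the probability p assigned to the true label (assumed form)."""
    if alpha == 1.0:
        return -math.log(p)        # limiting case: log-loss
    if math.isinf(alpha):
        return 1.0 - p             # limiting case: soft 0-1 loss
    return alpha / (alpha - 1.0) * (1.0 - p ** ((alpha - 1.0) / alpha))

# How the penalty for a poorly classified sample (p = 0.25) changes with alpha.
penalties = {a: alpha_loss(0.25, a) for a in (0.5, 1.0, 2.0, math.inf)}
```

Smaller $\alpha$ penalizes confident mistakes more heavily and larger $\alpha$ saturates, which is the knob that $(\alpha_D,\alpha_G)$-GANs tune separately for the two players.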
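results: Experimental results show improved trade-offs between global warming potential and compressive strength compared to mixes based on current industry practices; the methods are open-sourced.
Abstract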
Eight percent of global carbon dioxide emissions can be attributed to the production of cement, the main component of concrete, which is also the dominant source of CO2 emissions in the construction of data centers. The discovery of lower-carbon concrete formulae is therefore of high significance for sustainability. However, experimenting with new concrete formulae is time consuming and labor intensive, as one usually has to wait to record the concrete's 28-day compressive strength, a quantity whose measurement can by its definition not be accelerated. This provides an opportunity for experimental design methodology like Bayesian Optimization (BO) to accelerate the search for strong and sustainable concrete formulae. Herein, we 1) propose modeling steps that make concrete strength amenable to be predicted accurately by a Gaussian process model with relatively few measurements, 2) formulate the search for sustainable concrete as a multi-objective optimization problem, and 3) leverage the proposed model to carry out multi-objective BO with real-world strength measurements of the algorithmically proposed mixes. Our experimental results show improved trade-offs between the mixtures' global warming potential (GWP) and their associated compressive strengths, compared to mixes based on current industry practices. Our methods are open-sourced at github.com/facebookresearch/SustainableConcrete.
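Summary
Eight percent of global carbon dioxide emissions can be attributed to the production of cement, the main component of concrete, which is also the dominant source of CO2 emissions in data center construction. Discovering lower-carbon concrete formulae is therefore of high significance for sustainability. However, experimenting with new formulae is time consuming and labor intensive, because one usually has to wait for the 28-day compressive strength measurement, which cannot be accelerated. This gives experimental design methods such as Bayesian Optimization (BO) an opportunity to accelerate the search for strong and sustainable concrete formulae. Our approach consists of: 1) modeling steps that allow concrete strength to be predicted accurately from relatively few measurements, 2) formulating the search for sustainable concrete as a multi-objective optimization problem, and 3) using the proposed model to carry out multi-objective BO with real-world strength measurements of the algorithmically proposed mixes. Our experimental results show improved trade-offs between the mixtures' global warming potential (GWP) and compressive strength compared to current industry practices. Our methods are open-sourced at github.com/facebookresearch/SustainableConcrete.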
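The multi-objective view (lower GWP, higher strength) can be illustrated by extracting the Pareto front from a set of candidate mixes. This is only a sketch of the trade-off notion, not the Gaussian process model or the BO loop from the paper; the mix names and numbers are invented illustrative values, not measurements.

```python
def dominates(a, b):
    """a dominates b: no worse in both objectives and strictly better in at least one."""
    return (a["gwp"] <= b["gwp"] and a["strength"] >= b["strength"]
            and (a["gwp"] < b["gwp"] or a["strength"] > b["strength"]))

def pareto_front(mixes):
    """Mixes that no other mix dominates (minimize GWP, maximize strength)."""
    return [m for m in mixes if not any(dominates(o, m) for o in mixes)]

# Hypothetical candidate mixes; units are left abstract on purpose.
mixes = [
    {"name": "A", "gwp": 300.0, "strength": 40.0},
    {"name": "B", "gwp": 250.0, "strength": 42.0},
    {"name": "C", "gwp": 200.0, "strength": 30.0},
    {"name": "D", "gwp": 260.0, "strength": 35.0},
]
front_names = [m["name"] for m in pareto_front(mixes)]
```

Here mix B dominates A and D, while C survives because nothing has a lower GWP; an "improved trade-off" in the abstract's sense corresponds to a front that moves toward lower GWP at equal strength.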
results: The proposed ESCFR successfully tackles treatment selection bias and performs significantly better than state-of-the-art methods.
Abstract
Estimating conditional average treatment effect from observational data is highly challenging due to the existence of treatment selection bias. Prevalent methods mitigate this issue by aligning distributions of different treatment groups in the latent space. However, there are two critical problems that these methods fail to address: (1) mini-batch sampling effects (MSE), which causes misalignment in non-ideal mini-batches with outcome imbalance and outliers; (2) unobserved confounder effects (UCE), which results in inaccurate discrepancy calculation due to the neglect of unobserved confounders. To tackle these problems, we propose a principled approach named Entire Space CounterFactual Regression (ESCFR), which is a new take on optimal transport in the context of causality. Specifically, based on the framework of stochastic optimal transport, we propose a relaxed mass-preserving regularizer to address the MSE issue and design a proximal factual outcome regularizer to handle the UCE issue. Extensive experiments demonstrate that our proposed ESCFR can successfully tackle the treatment selection bias and achieve significantly better performance than state-of-the-art methods.
Entity Embeddings : Perspectives Towards an Omni-Modality Era for Large Language Models
results: The paper anticipates the potential applications and challenges of this structure.
Abstract
Large Language Models (LLMs) are evolving to integrate multiple modalities, such as text, image, and audio into a unified linguistic space. We envision a future direction based on this framework where conceptual entities defined in sequences of text can also be imagined as modalities. Such a formulation has the potential to overcome the cognitive and computational limitations of current models. Several illustrative examples of such potential implicit modalities are given. Along with vast promises of the hypothesized structure, expected challenges are discussed as well.
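Summary
Large language models (LLMs) are evolving to integrate multiple modalities, such as text, image, and audio, into a unified linguistic space. We envision conceptual entities defined in sequences of text also being treated as modalities. Such a formulation has the potential to overcome the cognitive and computational limitations of current models. We give several illustrative examples and discuss the expected challenges.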
Structured Semidefinite Programming for Recovering Structured Preconditioners
paper_authors: Arun Jambulapati, Jerry Li, Christopher Musco, Kirankumar Shiragur, Aaron Sidford, Kevin Tian
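for: The framework is for finding approximately-optimal preconditioners for solving linear systems.
methods: New algorithms for a class of semidefinite programs, called matrix-dictionary approximation SDPs, are used to solve an associated matrix-dictionary recovery problem.
results: Improved runtimes: an $\epsilon$-optimal diagonal preconditioner in time $\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(\kappa^\star,\epsilon^{-1}))$, and linear system solves in $\widetilde{O}(d^2)$ time for pseudoinverses of graph Laplacians and their constant spectral approximations.
Abstract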
We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems including the following. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, computes an $\epsilon$-optimal diagonal preconditioner in time $\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(\kappa^\star,\epsilon^{-1}))$, where $\kappa^\star$ is the optimal condition number of the rescaled matrix. We give an algorithm which, given $\mathbf{M} \in \mathbb{R}^{d \times d}$ that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in $\mathbf{M}$ in $\widetilde{O}(d^2)$ time. Our diagonal preconditioning results improve state-of-the-art runtimes of $\Omega(d^{3.5})$ attained by general-purpose semidefinite programming, and our solvers improve state-of-the-art runtimes of $\Omega(d^{\omega})$ where $\omega > 2.3$ is the current matrix multiplication constant. We attain our results via new algorithms for a class of semidefinite programs (SDPs) we call matrix-dictionary approximation SDPs, which we leverage to solve an associated problem we call matrix-dictionary recovery.
Summary
We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Using this framework, we obtain improved runtimes for fundamental preconditioning and linear system solving problems, including the following. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, computes an $\epsilon$-optimal diagonal preconditioner in time $\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(\kappa^\star,\epsilon^{-1}))$, where $\kappa^\star$ is the optimal condition number of the rescaled matrix. We give an algorithm which, given $\mathbf{M} \in \mathbb{R}^{d \times d}$ that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in $\mathbf{M}$ in $\widetilde{O}(d^2)$ time. Our diagonal preconditioning results improve on the state-of-the-art runtimes of $\Omega(d^{3.5})$ attained by general-purpose semidefinite programming, and our solvers improve on state-of-the-art runtimes of $\Omega(d^{\omega})$, where $\omega > 2.3$ is the current matrix multiplication constant. We obtain these results via new algorithms for a class of semidefinite programs (SDPs) that we call matrix-dictionary approximation SDPs, which we use to solve an associated problem that we call matrix-dictionary recovery.
Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning
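results: Experiments on a physical robot soccer task, simulated D4RL navigation tasks, a simulated autonomous driving task, and a simulated soccer task show that GuDA can learn from a small set of potentially suboptimal demonstrations and substantially outperforms a DA strategy that samples augmented data randomly.
Abstract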
Learning from demonstration (LfD) is a popular technique that uses expert demonstrations to learn robot control policies. However, the difficulty in acquiring expert-quality demonstrations limits the applicability of LfD methods: real-world data collection is often costly, and the quality of the demonstrations depends greatly on the demonstrator's abilities and safety concerns. A number of works have leveraged data augmentation (DA) to inexpensively generate additional demonstration data, but most DA works generate augmented data in a random fashion and ultimately produce highly suboptimal data. In this work, we propose Guided Data Augmentation (GuDA), a human-guided DA framework that generates expert-quality augmented data. The key insight of GuDA is that while it may be difficult to demonstrate the sequence of actions required to produce expert data, a user can often easily identify when an augmented trajectory segment represents task progress. Thus, the user can impose a series of simple rules on the DA process to automatically generate augmented samples that approximate expert behavior. To extract a policy from GuDA, we use off-the-shelf offline reinforcement learning and behavior cloning algorithms. We evaluate GuDA on a physical robot soccer task as well as simulated D4RL navigation tasks, a simulated autonomous driving task, and a simulated soccer task. Empirically, we find that GuDA enables learning from a small set of potentially suboptimal demonstrations and substantially outperforms a DA strategy that samples augmented data randomly.
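Summary
Learning from demonstration (LfD) is a widely used technique that learns robot control policies from expert demonstrations. However, the difficulty of obtaining expert-quality demonstrations limits the applicability of LfD methods: real-world data collection is often costly, and demonstration quality depends heavily on the demonstrator's abilities and on safety concerns. Many works use data augmentation (DA) to generate additional demonstration data cheaply, but most of them generate augmented data randomly and end up with highly suboptimal data. We propose Guided Data Augmentation (GuDA), a human-guided DA framework that generates expert-quality augmented data. The key insight of GuDA is that, although it may be hard to demonstrate the full sequence of actions required for the task, a user can often easily tell whether an augmented trajectory segment represents task progress. The user can therefore impose a few simple rules on the DA process to automatically generate augmented samples that approximate expert behavior. To extract a policy, we use off-the-shelf offline reinforcement learning and behavior cloning algorithms. We evaluate GuDA on a physical robot soccer task, simulated D4RL navigation tasks, a simulated autonomous driving task, and a simulated soccer task. Empirically, we find that GuDA enables learning from a small set of potentially suboptimal demonstrations and substantially outperforms a DA strategy that samples augmented data randomly.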
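The idea of imposing a simple user rule on the augmentation process can be sketched as rejection filtering. Everything below is a hypothetical illustration: the rotation and translation augmentation, the "ends closer to the goal" rule, and the 2-D navigation setting are assumptions, not the rules or tasks used by GuDA.

```python
import math
import random

def makes_progress(segment, goal):
    """User rule (assumed): the segment ends closer to the goal than it starts."""
    return math.dist(segment[-1], goal) < math.dist(segment[0], goal)

def augment(segment, rng):
    """Randomly rotate and translate a 2-D trajectory segment."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dx, dy = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in segment]

def guided_augment(segment, goal, n_samples, seed=0):
    """Keep only the random augmentations that satisfy the progress rule."""
    rng = random.Random(seed)
    candidates = [augment(segment, rng) for _ in range(n_samples)]
    return [seg for seg in candidates if makes_progress(seg, goal)]

demo = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]   # one short demonstrated segment
goal = (5.0, 0.0)
kept = guided_augment(demo, goal, n_samples=200)
```

Purely random augmentation would keep all 200 candidates, including those that move away from the goal; the rule discards them before any offline RL or behavior cloning algorithm sees the data.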
$α$-Mutual Information: A Tunable Privacy Measure for Privacy Protection in Data Sharing
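methods: We formulate a general distortion-based mechanism that manipulates the original data to offer privacy protection. The distortion metrics are determined according to the data structure of the specific experiment.
results: Experiments verify the functionality of $\alpha$-Mutual Information and show that the approach trades off privacy and utility across various performance dimensions. We also analyze the consequence of an attacker's access to side information about private data and find that adapting the privacy measure gives a model more resilient to side information than the state of the art.
Abstract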
This paper adopts Arimoto's $\alpha$-Mutual Information as a tunable privacy measure, in a privacy-preserving data release setting that aims to prevent disclosing private data to adversaries. By fine-tuning the privacy metric, we demonstrate that our approach yields superior models that effectively thwart attackers across various performance dimensions. We formulate a general distortion-based mechanism that manipulates the original data to offer privacy protection. The distortion metrics are determined according to the data structure of a specific experiment. We confront the problem expressed in the formulation by employing a general adversarial deep learning framework that consists of a releaser and an adversary, trained with opposite goals. This study conducts empirical experiments on images and time-series data to verify the functionality of $\alpha$-Mutual Information. We evaluate the privacy-utility trade-off of customized models and compare them to mutual information as the baseline measure. Finally, we analyze the consequence of an attacker's access to side information about private data and witness that adapting the privacy measure results in a more refined model than the state-of-the-art in terms of resiliency against side information.
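For discrete variables, Arimoto's $\alpha$-Mutual Information can be computed directly from a joint probability table. The sketch below uses the standard definition as the Rényi entropy of $X$ minus Arimoto's conditional Rényi entropy of $X$ given $Y$, for $\alpha > 0$, $\alpha \neq 1$, in nats; the abstract does not state the formula, so this form is an assumption taken from the general literature.

```python
import math

def renyi_entropy(p, alpha):
    """H_alpha(X) = log(sum_x p(x)^alpha) / (1 - alpha), for alpha != 1."""
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)

def arimoto_conditional_entropy(joint, alpha):
    """H_alpha(X|Y) = alpha/(1-alpha) * log sum_y (sum_x P(x,y)^alpha)^(1/alpha)."""
    n_y = len(joint[0])
    total = sum(sum(row[y] ** alpha for row in joint) ** (1.0 / alpha)
                for y in range(n_y))
    return alpha / (1.0 - alpha) * math.log(total)

def arimoto_mutual_information(joint, alpha):
    """I_alpha(X;Y) = H_alpha(X) - H_alpha(X|Y), with joint[x][y] = P(X=x, Y=y)."""
    p_x = [sum(row) for row in joint]
    return renyi_entropy(p_x, alpha) - arimoto_conditional_entropy(joint, alpha)

# Released data that reveals the private bit exactly, and data independent of it.
full_leak = [[0.5, 0.0], [0.0, 0.5]]
no_leak = [[0.06, 0.14], [0.24, 0.56]]
leak_value = arimoto_mutual_information(full_leak, alpha=2.0)
safe_value = arimoto_mutual_information(no_leak, alpha=2.0)
```

A release that reveals a uniform private bit exactly gives $\log 2$, and an independent release gives zero; values of $\alpha$ other than 2 change how intermediate cases are scored, which is what makes the measure tunable.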
paper_authors: Sáez-Maldonado Francisco Javier, Maroñas Juan, Hernández-Lobato Daniel
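for: This work studies a generalization of Gaussian Processes (GPs) that increases their flexibility.
methods: The paper proposes Deep Transformed Gaussian Processes (DTGPs), which stack layers of stochastic processes, each layer being a Transformed Gaussian Process.
results: Experiments show that DTGPs achieve good scalability and performance on multiple regression datasets.
Abstract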
Transformed Gaussian Processes (TGPs) are stochastic processes specified by transforming samples from the joint distribution of a prior process (typically a GP) using an invertible transformation, which increases the flexibility of the base process. Furthermore, they achieve competitive results compared with Deep Gaussian Processes (DGPs), which are another generalization constructed by a hierarchical concatenation of GPs. In this work, we propose a generalization of TGPs named Deep Transformed Gaussian Processes (DTGPs), which follows the trend of concatenating layers of stochastic processes. More precisely, we obtain a multi-layer model in which each layer is a TGP. This generalization implies an increase in flexibility with respect to both TGPs and DGPs. Exact inference in such a model is intractable. However, we show that one can use variational inference to approximate the required computations, yielding a straightforward extension of the popular DSVI inference algorithm (Salimbeni et al., 2017). The experiments conducted evaluate the proposed novel DTGPs on multiple regression datasets, achieving good scalability and performance.
One Model Fits All: Cross-Region Taxi-Demand Forecasting
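results: Experiments show that the proposed system forecasts taxi demand accurately, including in previously unseen regions, which indicates its potential for optimizing taxi services and improving transportation efficiency.
Abstract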
The growing demand for ride-hailing services has led to an increasing need for accurate taxi demand prediction. Existing systems are limited to specific regions, lacking generalizability to unseen areas. This paper presents a novel taxi demand forecasting system that leverages a graph neural network to capture spatial dependencies and patterns in urban environments. Additionally, the proposed system employs a region-neutral approach, enabling it to train a model that can be applied to any region, including unseen regions. To achieve this, the framework incorporates the power of Variational Autoencoder to disentangle the input features into region-specific and region-neutral components. The region-neutral features facilitate cross-region taxi demand predictions, allowing the model to generalize well across different urban areas. Experimental results demonstrate the effectiveness of the proposed system in accurately forecasting taxi demand, even in previously unobserved regions, thus showcasing its potential for optimizing taxi services and improving transportation efficiency on a broader scale.
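Summary
As demand for ride-hailing services grows, accurate taxi demand prediction has become an increasing need. Existing systems are limited to specific regions and do not generalize to unseen areas. This paper proposes a new taxi demand forecasting system that uses a graph neural network to capture spatial dependencies and patterns in urban environments. The system also takes a region-neutral approach, so that it can train a model applicable to any region, including unseen ones. To achieve this, the framework uses a Variational Autoencoder to disentangle the input features into region-specific and region-neutral components. The region-neutral features support cross-region taxi demand prediction and let the model generalize across different urban areas. Experimental results show that the proposed system forecasts taxi demand accurately, even in previously unobserved regions, which demonstrates its potential for optimizing taxi services and improving transportation efficiency.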
Robustness of Algorithms for Causal Structure Learning to Hyperparameter Choice
paper_authors: Damian Machlanski, Spyridon Samothrakis, Paul Clarke
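for: This work studies how hyperparameters affect causal structure learning, and how they affect the choice of the best algorithm for a given problem.
methods: An empirical evaluation of hyperparameter selection for several seminal structure learning algorithms on datasets of varying complexity.
results: Hyperparameter selection in ensemble settings strongly influences the choice of algorithm: a poor choice of hyperparameters can lead analysts to use algorithms that do not give state-of-the-art performance on their data.
Abstract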
Hyperparameters play a critical role in machine learning. Hyperparameter tuning can make the difference between state-of-the-art and poor prediction performance for any algorithm, but it is particularly challenging for structure learning due to its unsupervised nature. As a result, hyperparameter tuning is often neglected in favour of using the default values provided by a particular implementation of an algorithm. While there have been numerous studies on performance evaluation of causal discovery algorithms, how hyperparameters affect individual algorithms, as well as the choice of the best algorithm for a specific problem, has not been studied in depth before. This work addresses this gap by investigating the influence of hyperparameters on causal structure learning tasks. Specifically, we perform an empirical evaluation of hyperparameter selection for some seminal learning algorithms on datasets of varying levels of complexity. We find that, while the choice of algorithm remains crucial to obtaining state-of-the-art performance, hyperparameter selection in ensemble settings strongly influences the choice of algorithm, in that a poor choice of hyperparameters can lead to analysts using algorithms which do not give state-of-the-art performance for their data.
Model-free Posterior Sampling via Learning Rate Randomization
paper_authors: Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Remi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Menard
results: RandQL achieves a regret bound of order $\widetilde{\mathcal{O}}(\sqrt{H^{5}SAT})$ in tabular MDPs and of order $\widetilde{\mathcal{O}}(H^{5/2} T^{(d_z+1)/(d_z+2)})$ in metric state-action spaces, and outperforms existing approaches in the empirical study.
Abstract
In this paper, we introduce Randomized Q-learning (RandQL), a novel randomized model-free algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To the best of our knowledge, RandQL is the first tractable model-free posterior sampling-based algorithm. We analyze the performance of RandQL in both tabular and non-tabular metric space settings. In tabular MDPs, RandQL achieves a regret bound of order $\widetilde{\mathcal{O}}(\sqrt{H^{5}SAT})$, where $H$ is the planning horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the number of episodes. For a metric state-action space, RandQL enjoys a regret bound of order $\widetilde{\mathcal{O}}(H^{5/2} T^{(d_z+1)/(d_z+2)})$, where $d_z$ denotes the zooming dimension. Notably, RandQL achieves optimistic exploration without using bonuses, relying instead on a novel idea of learning rate randomization. Our empirical study shows that RandQL outperforms existing approaches on baseline exploration environments.
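Summary
In this paper we introduce Randomized Q-learning (RandQL), a new randomized model-free algorithm for regret minimization in episodic Markov Decision Processes (MDPs). To our knowledge, RandQL is the first tractable model-free posterior sampling-based algorithm. We analyze RandQL in both tabular and non-tabular metric space settings. In tabular MDPs, RandQL achieves a regret bound of order $\widetilde{\mathcal{O}}(\sqrt{H^{5}SAT})$, where $H$ is the planning horizon, $S$ is the number of states, $A$ is the number of actions, and $T$ is the number of episodes. For a metric state-action space, the regret bound is of order $\widetilde{\mathcal{O}}(H^{5/2} T^{(d_z+1)/(d_z+2)})$, where $d_z$ denotes the zooming dimension. Notably, RandQL achieves optimistic exploration without bonuses, relying instead on the new idea of learning rate randomization. Our empirical study shows that RandQL outperforms existing approaches on baseline exploration environments.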
Enhancing Enterprise Network Security: Comparing Machine-Level and Process-Level Analysis for Dynamic Malware Detection
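results: With background applications running, the accuracy of previous state-of-the-art machine-level detection drops by about 20.12% on average. The proposed process-level model performs better than the machine-level model, with a 0.049 increase in detection rate and a false-positive rate below 0.1.
Abstract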
Analysing malware is important to understand how malicious software works and to develop appropriate detection and prevention methods. Dynamic analysis can overcome evasion techniques commonly used to bypass static analysis and provide insights into malware runtime activities. Much research on dynamic analysis has focused on investigating machine-level information (e.g., CPU, memory, network usage) to identify whether a machine is running malicious activities. A malicious machine does not necessarily mean all running processes on the machine are also malicious. If we can isolate the malicious process instead of isolating the whole machine, we could kill the malicious process, and the machine can keep doing its job. Another challenge dynamic malware detection research faces is that the samples are executed in one machine without any background applications running. This is unrealistic, as a computer typically runs many benign (background) applications when a malware incident happens. Our experiment with machine-level data shows that the existence of background applications decreases previous state-of-the-art accuracy by about 20.12% on average. We also propose a process-level Recurrent Neural Network (RNN)-based detection model. Our proposed model performs better than the machine-level detection model, with a 0.049 increase in detection rate and a false-positive rate below 0.1.
Proportional Fairness in Clustering: A Social Choice Perspective
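results: The paper shows that any clustering satisfying a weak proportionality notion simultaneously achieves the best known approximations to proportional fairness, individual fairness, and the core. Stronger proportionality notions likewise imply approximations to stronger proportional-representation guarantees.
Abstract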
We study the proportional clustering problem of Chen et al. [ICML'19] and relate it to the area of multiwinner voting in computational social choice. We show that any clustering satisfying a weak proportionality notion of Brill and Peters [EC'23] simultaneously obtains the best known approximations to the proportional fairness notion of Chen et al. [ICML'19], but also to individual fairness [Jung et al., FORC'20] and the "core" [Li et al. ICML'21]. In fact, we show that any approximation to proportional fairness is also an approximation to individual fairness and vice versa. Finally, we also study stronger notions of proportional representation, in which deviations do not only happen to single, but multiple candidate centers, and show that stronger proportionality notions of Brill and Peters [EC'23] imply approximations to these stronger guarantees.
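Summary
We study the proportional clustering problem of Chen et al. [ICML'19] and relate it to multiwinner voting in computational social choice. We show that any clustering satisfying the weak proportionality notion of Brill and Peters [EC'23] simultaneously obtains the best known approximations to the proportional fairness notion of Chen et al. [ICML'19], to individual fairness [Jung et al., FORC'20], and to the "core" [Li et al., ICML'21]. In fact, any approximation to proportional fairness is also an approximation to individual fairness, and vice versa. Finally, we study stronger notions of proportional representation, in which deviations happen not only to a single candidate center but to multiple ones, and show that the stronger proportionality notions of Brill and Peters [EC'23] imply approximations to these stronger guarantees.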
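To make the proportional fairness notion concrete, here is a minimal sketch of a violation check. It assumes Euclidean points, a finite candidate set, and the commonly stated definition: a clustering with $k$ centers is $\rho$-proportional if no group of at least $\lceil n/k \rceil$ points would all get more than a factor $\rho$ closer by moving to a single candidate center. The function name `blocking_center` and the toy data are illustrative and do not come from the paper.

```python
import math


def blocking_center(points, centers, candidates, rho=1.0):
    """Return a candidate that witnesses a violation of rho-proportionality, else None.

    points, centers, candidates: sequences of equal-length coordinate tuples.
    """
    n, k = len(points), len(centers)
    threshold = math.ceil(n / k)  # minimum size of a deviating group

    # Distance from each point to its nearest chosen center.
    nearest = [min(math.dist(p, c) for c in centers) for p in points]

    for y in candidates:
        # Count points that would improve by more than a factor rho at y.
        improving = sum(
            1 for p, d in zip(points, nearest) if rho * math.dist(p, y) < d
        )
        if improving >= threshold:
            return y
    return None


# Two well-separated groups of three points on a line.
points = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
good_centers = [(0.1,), (10.1,)]   # one center per group
bad_centers = [(0.0,), (0.1,)]     # both centers in the left group

print(blocking_center(points, good_centers, points))  # None
print(blocking_center(points, bad_centers, points) is not None)  # True
```

The check runs in time proportional to (number of candidates) times (number of points), because for a fixed candidate the largest possible deviating group is simply every point that strictly improves.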
Sample Complexity Bounds for Score-Matching: Causal Discovery and Generative Modeling
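results: The paper bounds the error rate of recovering causal relationships with a score-matching-based causal discovery method, assuming a sufficiently accurate estimate of the score function. It also analyzes an upper bound for score-matching estimation in score-based generative modeling.
Abstract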
This paper provides statistical sample complexity bounds for score-matching and its applications in causal discovery. We demonstrate that accurate estimation of the score function is achievable by training a standard deep ReLU neural network using stochastic gradient descent. We establish bounds on the error rate of recovering causal relationships using the score-matching-based causal discovery method of Rolland et al. [2022], assuming a sufficiently good estimation of the score function. Finally, we analyze the upper bound of score-matching estimation within the score-based generative modeling, which has been applied for causal discovery but is also of independent interest within the domain of generative models.
A Global Multi-Unit Calibration as a Method for Large Scale IoT Particulate Matter Monitoring Systems Deployments
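results: A multi-season test campaign shows that, when applied to different sensors, the methodology matches the performance of state-of-the-art methods that derive separate calibration parameters for each unit. If confirmed, these results indicate that a properly derived global calibration law can serve a large number of networked devices at a dramatically lower cost, enabling massive deployment of accurate IoT air-quality monitoring devices. The calibration model could also be embedded on the device or implemented at the edge for personal exposure monitoring, reducing long-range data transfer needs.
Abstract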
Scalable and effective calibration is a fundamental requirement for Low Cost Air Quality Monitoring Systems and will enable accurate and pervasive monitoring in cities. Suffering from environmental interferences and fabrication variance, these devices require sensor-specific and complex calibration processes to reach sufficient accuracy for deployment as indicative measurement devices in Air Quality (AQ) monitoring networks. Concept and sensor drift often force the calibration process to be repeated frequently. These issues lead to unbearable calibration costs, which prevent massive deployment when accuracy is a concern. In this work, we propose a zero-transfer-samples, global calibration methodology as a technological enabler for IoT AQ multisensory devices that rely on low-cost Particulate Matter (PM) sensors. This methodology is based on field-recorded responses from a limited number of IoT AQ multisensor units and on machine learning concepts, and it can be universally applied to all units of the same type. A multi-season test campaign showed that, when applied to different sensors, the performance of this methodology matches that of the state-of-the-art methodology, which requires deriving different calibration parameters for each unit. If confirmed, these results show that, when properly derived, a global calibration law can be exploited for a large number of networked devices with dramatic cost reduction, eventually allowing massive deployment of accurate IoT AQ monitoring devices. Furthermore, this calibration model could easily be embedded on board the device or implemented on the edge, allowing immediate access to accurate readings for personal exposure monitoring applications as well as reducing long-range data transfer needs.
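Summary
Low-cost air quality monitoring systems need scalable and effective calibration to enable accurate and pervasive monitoring in cities. Because of environmental interference and fabrication variance, these devices require sensor-specific, complex calibration processes to reach sufficient accuracy as indicative measurement devices in Air Quality (AQ) monitoring networks. Concept and sensor drift often force the calibration process to be repeated frequently. These issues lead to unsustainable calibration costs, which make large-scale deployment impossible. To address this, we propose a zero-transfer-samples, global calibration methodology based on low-cost Particulate Matter (PM) sensors. The methodology relies on field-recorded responses and machine learning concepts, and can be applied to all units of the same type. A multi-season test campaign shows that, when applied to different sensors, its performance matches that of current methods that derive different calibration parameters for each unit. If confirmed, these results indicate that a properly derived global calibration law allows large-scale deployment of accurate IoT AQ monitoring devices. In addition, the calibration model can easily be embedded on the device or implemented at the edge, giving immediate access to accurate readings for personal exposure monitoring applications and reducing long-range data transfer needs.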
Transductive conformal inference with adaptive scores
for: This paper provides distribution-free guarantees for many machine learning tasks, specifically in the transductive setting where decisions are made on a test sample of new points.
methods: The paper uses conformal inference, which is a fundamental and versatile tool that provides distribution-free guarantees. The paper also uses a Pólya urn model to describe the joint distribution of the conformal $p$-values, and establishes a concentration inequality for their empirical distribution function.
results: The paper provides uniform, in-probability guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning and novelty detection based on two-class classification. The results hold for arbitrary exchangeable scores, including adaptive ones that can use the covariates of the test+calibration samples at training stage for increased accuracy.
Abstract
Conformal inference is a fundamental and versatile tool that provides distribution-free guarantees for many machine learning tasks. We consider the transductive setting, where decisions are made on a test sample of $m$ new points, giving rise to $m$ conformal $p$-values. While classical results only concern their marginal distribution, we show that their joint distribution follows a Pólya urn model, and establish a concentration inequality for their empirical distribution function. The results hold for arbitrary exchangeable scores, including adaptive ones that can use the covariates of the test+calibration samples at training stage for increased accuracy. We demonstrate the usefulness of these theoretical results through uniform, in-probability guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning and novelty detection based on two-class classification.
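Summary
Conformal inference is a fundamental and versatile tool that provides distribution-free guarantees for many machine learning tasks. We consider the transductive setting, where decisions are made on a test sample of $m$ new points, giving rise to $m$ conformal $p$-values. While classical results only concern their marginal distribution, we show that their joint distribution follows a Pólya urn model, and we establish a concentration inequality for their empirical distribution function. The results hold for arbitrary exchangeable scores, including adaptive ones that can use the covariates of the test and calibration samples at the training stage for increased accuracy. We demonstrate the usefulness of these theoretical results through guarantees for two machine learning tasks of current interest: interval prediction for transductive transfer learning and novelty detection based on two-class classification.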
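As a small illustration of the objects studied here, the sketch below computes the $m$ conformal $p$-values for a test sample from $n$ calibration scores, using the standard formula $p_j = (1 + \#\{i : S_i \ge S_{n+j}\}) / (n+1)$. It assumes nonconformity scores where larger means more atypical; the function name and the numbers are illustrative only.

```python
def conformal_p_values(calib_scores, test_scores):
    """Conformal p-value of each test score against a shared calibration sample."""
    n = len(calib_scores)
    p_values = []
    for t in test_scores:
        # Number of calibration scores at least as extreme as the test score.
        at_least = sum(1 for s in calib_scores if s >= t)
        p_values.append((1 + at_least) / (n + 1))
    return p_values


calib = [0.1, 0.4, 0.7, 0.9]
test = [0.5, 1.0, 0.0]
print(conformal_p_values(calib, test))  # [0.6, 0.2, 1.0]
```

Because all $m$ $p$-values reuse the same calibration sample, they are dependent, which is why a statement about their joint distribution says more than the classical marginal result.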
Adversarial Anomaly Detection using Gaussian Priors and Nonlinear Anomaly Scores
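results: Compared with state-of-the-art work, the paper improves the $F_1$ score for anomaly detection on the MITBIH Arrhythmia Database from 0.85 to 0.92, indicating that $\beta$-VAEGAN detects anomalies more effectively.
Abstract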
Anomaly detection in imbalanced datasets is a frequent and crucial problem, especially in the medical domain where retrieving and labeling irregularities is often expensive. By combining the generative stability of a $\beta$-variational autoencoder (VAE) with the discriminative strengths of generative adversarial networks (GANs), we propose a novel model, $\beta$-VAEGAN. We investigate methods for composing anomaly scores based on the discriminative and reconstructive capabilities of our model. Existing work focuses on linear combinations of these components to determine if data is anomalous. We advance existing work by training a kernelized support vector machine (SVM) on the respective error components to also consider nonlinear relationships. This improves anomaly detection performance, while allowing faster optimization. Lastly, we use the deviations from the Gaussian prior of $\beta$-VAEGAN to form a novel anomaly score component. In comparison to state-of-the-art work, we improve the $F_1$ score during anomaly detection from 0.85 to 0.92 on the widely used MITBIH Arrhythmia Database.
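Summary
Anomaly detection in imbalanced datasets is a frequent problem, especially in the medical domain, where retrieving and labeling irregularities is often expensive. We propose a novel model, $\beta$-VAEGAN, which combines the generative stability of a $\beta$-variational autoencoder (VAE) with the discriminative strengths of generative adversarial networks (GANs) to improve anomaly detection performance. We investigate methods for composing anomaly scores from these two components, including linear combinations and a kernelized support vector machine (SVM) trained to account for nonlinear relationships. This improves anomaly detection performance while allowing faster optimization. In addition, we use the deviations from the Gaussian prior of $\beta$-VAEGAN to form a novel anomaly score component. Compared with existing work, we improve the anomaly detection $F_1$ score on the MITBIH Arrhythmia Database from 0.85 to 0.92.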
Unveiling the Potential of Probabilistic Embeddings in Self-Supervised Learning
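results: Within an information-theoretic framework, the study finds that adding a bottleneck can markedly improve out-of-distribution detection, but may also compress the representation and lose information.
Abstract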
In recent years, self-supervised learning has played a pivotal role in advancing machine learning by allowing models to acquire meaningful representations from unlabeled data. An intriguing research avenue involves developing self-supervised models within an information-theoretic framework, but many studies often deviate from the stochasticity assumptions made when deriving their objectives. To gain deeper insights into this issue, we propose to explicitly model the representation with stochastic embeddings and assess their effects on performance, information compression and potential for out-of-distribution detection. From an information-theoretic perspective, we seek to investigate the impact of probabilistic modeling on the information bottleneck, shedding light on a trade-off between compression and preservation of information in both representation and loss space. Emphasizing the importance of distinguishing between these two spaces, we demonstrate how constraining one can affect the other, potentially leading to performance degradation. Moreover, our findings suggest that introducing an additional bottleneck in the loss space can significantly enhance the ability to detect out-of-distribution examples, only leveraging either representation features or the variance of their underlying distribution.
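Summary
In recent years, self-supervised learning has played a pivotal role in machine learning by allowing models to acquire meaningful representations from unlabeled data. An intriguing research direction is to develop self-supervised models within an information-theoretic framework, but many studies deviate from the stochasticity assumptions made when deriving their objectives. To gain deeper insight into this issue, we propose to explicitly model the representation with stochastic embeddings and to assess their effect on performance, information compression, and out-of-distribution detection. From an information-theoretic perspective, we investigate the information bottleneck in the representation, its interaction with the loss space, and whether constraining one space adversely affects the other. We find that introducing an additional bottleneck in the loss space can significantly improve out-of-distribution detection, using only either the representation features or the variance of their underlying distribution.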
Lipschitz and Hölder Continuity in Reproducing Kernel Hilbert Spaces
for: investigate Lipschitz and Hölder continuity in Reproducing Kernel Hilbert Spaces (RKHSs)
methods: provide sufficient conditions and collect related known results from the literature
results: new results on reproducing kernels inducing prescribed Lipschitz or Hölder continuity
Abstract
Reproducing kernel Hilbert spaces (RKHSs) are very important function spaces, playing an important role in machine learning, statistics, numerical analysis and pure mathematics. Since Lipschitz and Hölder continuity are important regularity properties, with many applications in interpolation, approximation and optimization problems, in this work we investigate these continuity notions in RKHSs. We provide several sufficient conditions as well as an in-depth investigation of reproducing kernels inducing prescribed Lipschitz or Hölder continuity. Apart from new results, we also collect related known results from the literature, making the present work also a convenient reference on this topic.
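Summary
Reproducing kernel Hilbert spaces (RKHSs) are very important function spaces, playing an important role in machine learning, statistics, numerical analysis, and pure mathematics. Since Lipschitz and Hölder continuity are important regularity properties, with many applications in interpolation, approximation, and optimization problems, in this work we investigate these continuity notions in RKHSs. We provide several sufficient conditions as well as an in-depth investigation of reproducing kernels inducing prescribed Lipschitz or Hölder continuity. Apart from new results, we also collect related known results, making the present work a convenient reference on this topic.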
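For orientation, the standard argument linking the kernel to the regularity of RKHS functions can be written out as follows. This is the textbook bound via the reproducing property and the Cauchy-Schwarz inequality, stated for a real-valued kernel $k$ with RKHS $H$ on a metric space with metric $d$; it is a sketch of the usual starting point, not a statement of the paper's specific results.

```latex
\begin{align*}
|f(x) - f(y)|
  &= \bigl|\langle f,\; k(\cdot,x) - k(\cdot,y) \rangle_{H}\bigr| \\
  &\le \|f\|_{H}\,\bigl\|k(\cdot,x) - k(\cdot,y)\bigr\|_{H} \\
  &= \|f\|_{H}\,\sqrt{k(x,x) - 2\,k(x,y) + k(y,y)} .
\end{align*}
```

Consequently, if $\sqrt{k(x,x) - 2k(x,y) + k(y,y)} \le L\, d(x,y)^{\alpha}$ for all $x, y$, then every $f \in H$ is $\alpha$-Hölder continuous with constant $L\,\|f\|_{H}$, and Lipschitz continuous when $\alpha = 1$.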
On kernel-based statistical learning in the mean field limit
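results: The paper shows mean field convergence of empirical and infinite-sample solutions, as well as convergence of the corresponding risks. These results provide new theoretical tools and insights for large-scale problems and correspond to a new form of limit of learning problems in statistical learning theory.
Abstract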
In many applications of machine learning, a large number of variables are considered. Motivated by machine learning of interacting particle systems, we consider the situation when the number of input variables goes to infinity. First, we continue the recent investigation of the mean field limit of kernels and their reproducing kernel Hilbert spaces, completing the existing theory. Next, we provide results relevant for approximation with such kernels in the mean field limit, including a representer theorem. Finally, we use these kernels in the context of statistical learning in the mean field limit, focusing on Support Vector Machines. In particular, we show mean field convergence of empirical and infinite-sample solutions as well as the convergence of the corresponding risks. On the one hand, our results establish rigorous mean field limits in the context of kernel methods, providing new theoretical tools and insights for large-scale problems. On the other hand, our setting corresponds to a new form of limit of learning problems, which seems to have not been investigated yet in the statistical learning theory literature.
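Summary
Many machine learning applications involve a large number of variables. Motivated by machine learning of interacting particle systems, we consider the situation in which the number of input variables goes to infinity. First, we continue the investigation of the mean field limit of kernels and their reproducing kernel Hilbert spaces, completing the existing theory. Next, we provide results on approximation with such kernels, including a representer theorem. Finally, we use these kernels for statistical learning in the mean field limit; in particular, we show mean field convergence of empirical and infinite-sample solutions and convergence of the corresponding risks. Our results establish mean field limits in machine learning, providing new theoretical tools and insights for large-scale problems. On the other hand, our setting corresponds to a new form of limit of learning problems that has not yet been investigated in the statistical learning theory literature.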
results: The proposed method yields tighter gradient bounds, which strengthens data privacy protection while lowering the required noise level.
Abstract
Recently, due to the popularity of deep neural networks and other methods whose training typically relies on the optimization of an objective function, and due to concerns for data privacy, there is a lot of interest in differentially private gradient descent methods. To achieve differential privacy guarantees with a minimum amount of noise, it is important to be able to bound precisely the sensitivity of the information which the participants will observe. In this study, we present a novel approach that mitigates the bias arising from traditional gradient clipping. By leveraging public information concerning the current global model and its location within the search domain, we can achieve improved gradient bounds, leading to enhanced sensitivity determinations and refined noise level adjustments. We extend the state of the art algorithms, present improved differential privacy guarantees requiring less noise and present an empirical evaluation.
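Summary
Recently, because the training of deep neural networks and other methods typically relies on optimizing an objective function, and because of concerns about data privacy, there has been much interest in differentially private gradient descent methods. To guarantee privacy, the sensitivity of the information involved in gradient descent must be bounded precisely. In this study, we present a novel approach that mitigates the bias arising from traditional gradient clipping. By using public information about the current global model and its location within the search domain, we obtain improved gradient bounds, leading to better sensitivity estimates and lower noise levels. We extend existing algorithms, provide improved differential privacy guarantees, and present an empirical evaluation.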
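For context, the sketch below shows the traditional clip-then-add-noise step that this abstract takes as its starting point. It illustrates only the familiar baseline: each per-example gradient is rescaled to norm at most `c`, the clipped gradients are summed, and Gaussian noise with standard deviation `sigma * c` is added. It is not the paper's improved method, and the names `clip`, `private_mean_gradient`, `c`, and `sigma` are assumptions chosen for illustration.

```python
import math
import random


def clip(grad, c):
    """Rescale grad so that its Euclidean norm is at most c."""
    norm = math.sqrt(sum(x * x for x in grad))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [x * scale for x in grad]


def private_mean_gradient(grads, c, sigma, rng=None):
    """Average of clipped gradients with Gaussian noise scaled to the clip norm."""
    rng = rng or random
    clipped = [clip(g, c) for g in grads]
    dim = len(grads[0])
    total = [sum(g[j] for g in clipped) for j in range(dim)]
    # Each example changes the sum by at most c, so the noise is scaled by c.
    noisy = [t + rng.gauss(0.0, sigma * c) for t in total]
    return [x / len(grads) for x in noisy]


grads = [[3.0, 4.0], [0.3, 0.4]]       # norms 5.0 and 0.5
print(clip(grads[0], 1.0))              # rescaled to unit norm
print(private_mean_gradient(grads, c=1.0, sigma=0.0))  # noise-free average
```

Clipping shrinks large gradients but leaves small ones untouched, so the averaged direction is biased towards the small-gradient examples; this is the bias the abstract refers to.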
Closing the Gap Between the Upper Bound and the Lower Bound of Adam’s Iteration Complexity
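paper_authors: Bohan Wang, Jingwen Fu, Huishuai Zhang, Nanning Zheng, Wei Chen
for: The paper provides a new convergence guarantee for the Adam optimizer that holds across a broad range of hyperparameters.
methods: The paper uses novel techniques to handle the entanglement between momentum and the adaptive learning rate and to convert the first-order term in the Descent Lemma to the gradient norm.
results: The paper derives a new convergence guarantee for Adam under only an $L$-smooth condition and a bounded noise variance assumption, valid across a broad spectrum of hyperparameters. With properly chosen hyperparameters, the resulting upper bound on iteration complexity meets the lower bound for first-order optimizers, closing the gap in the existing literature.
Abstract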
Recently, Arjevani et al. [1] established a lower bound of iteration complexity for the first-order optimization under an $L$-smooth condition and a bounded noise variance assumption. However, a thorough review of existing literature on Adam's convergence reveals a noticeable gap: none of them meet the above lower bound. In this paper, we close the gap by deriving a new convergence guarantee of Adam, with only an $L$-smooth condition and a bounded noise variance assumption. Our results remain valid across a broad spectrum of hyperparameters. Especially with properly chosen hyperparameters, we derive an upper bound of the iteration complexity of Adam and show that it meets the lower bound for first-order optimizers. To the best of our knowledge, this is the first to establish such a tight upper bound for Adam's convergence. Our proof utilizes novel techniques to handle the entanglement between momentum and adaptive learning rate and to convert the first-order term in the Descent Lemma to the gradient norm, which may be of independent interest.
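Summary
Recently, Arjevani et al. [1] established a lower bound on the iteration complexity of first-order optimization. However, none of the existing convergence results for Adam meet this lower bound. In this paper, we close the gap by deriving a new convergence guarantee for Adam under an $L$-smooth condition and a bounded noise variance assumption. Our results hold across a broad range of hyperparameters. In particular, with properly chosen hyperparameters, we derive an upper bound on the iteration complexity of Adam and show that it matches the lower bound for first-order optimizers. To the best of our knowledge, this is the first such tight upper bound for Adam's convergence. Our proof uses novel techniques to handle the entanglement between momentum and the adaptive learning rate and to convert the first-order term in the Descent Lemma to the gradient norm, which may be of independent interest.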
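The "entanglement between momentum and adaptive learning rate" is easier to see with the update rule written out. The following is a minimal sketch of the standard Adam update with bias correction, applied to a one-dimensional quadratic; the hyperparameter values and the function name `adam_minimize` are illustrative defaults, not the choices analyzed in the paper.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Run the standard Adam update on a scalar parameter and return the result."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # momentum: average of gradients
        v = beta2 * v + (1 - beta2) * g * g    # average of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        # The step divides the momentum by a quantity built from the same
        # gradient history, so numerator and denominator are coupled.
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_final = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(round(x_final, 3))
```

Both `m_hat` and `v_hat` depend on every past gradient, so the effective step size is correlated with the update direction, which is the difficulty a convergence proof has to handle.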
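results: The paper demonstrates carbon-efficient FL, in which the carbon intensity of energy is the cost. Compared with random client selection, it reduces carbon emissions by 93% and training time by 50%. Compared with a state-of-the-art approach that optimizes training time, it reduces carbon emissions by 80% while increasing training time by only 38%.
Abstract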
Federated Learning (FL) distributes machine learning (ML) training across many edge devices to reduce data transfer overhead and protect data privacy. Since FL model training may span millions of devices and is thus resource-intensive, prior work has focused on improving its resource efficiency to optimize time-to-accuracy. However, prior work generally treats all resources the same, while, in practice, they may incur widely different costs, which instead motivates optimizing cost-to-accuracy. To address the problem, we design CEFL, which uses adaptive cost-aware client selection policies to optimize an arbitrary cost metric when training FL models. Our policies extend and combine prior work on utility-based client selection and critical learning periods by making them cost-aware. We demonstrate CEFL by designing carbon-efficient FL, where energy's carbon-intensity is the cost, and show that it (i) reduces carbon emissions by 93% and reduces training time by 50% compared to random client selection and (ii) reduces carbon emissions by 80%, while only increasing training time by 38%, compared to a state-of-the-art approach that optimizes training time.
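Summary
Federated Learning (FL) distributes machine learning training across many edge devices to reduce data transfer overhead and protect data privacy. Since FL model training may span millions of devices and is therefore resource-intensive, prior work has focused on improving resource efficiency to optimize time-to-accuracy. However, prior work generally treats all resources the same, whereas in practice they may incur widely different costs, which instead motivates optimizing cost-to-accuracy. To address this problem, we design CEFL, which uses adaptive cost-aware client selection policies to optimize an arbitrary cost metric when training FL models. Our policies extend and combine prior work on utility-based client selection and critical learning periods by making them cost-aware. We demonstrate CEFL by designing carbon-efficient FL, where the carbon intensity of energy is the cost, and show that it (i) reduces carbon emissions by 93% and training time by 50% compared with random client selection, and (ii) reduces carbon emissions by 80%, while increasing training time by only 38%, compared with a state-of-the-art approach.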
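To illustrate what "cost-aware" client selection can mean in the simplest case, the sketch below ranks clients by utility per unit cost and picks greedily under a per-round cost budget. This is a hypothetical toy policy written for illustration; it is not CEFL's actual policy, and the field names `utility` and `cost` and the budget parameter are assumptions.

```python
def select_clients(clients, budget):
    """Greedily pick clients with the highest utility per unit cost within a budget.

    clients: list of dicts with keys "id", "utility", and "cost" (cost > 0).
    Returns the chosen client ids and the total cost spent.
    """
    ranked = sorted(clients, key=lambda c: c["utility"] / c["cost"], reverse=True)
    chosen, spent = [], 0.0
    for c in ranked:
        if spent + c["cost"] <= budget:
            chosen.append(c["id"])
            spent += c["cost"]
    return chosen, spent


clients = [
    {"id": "a", "utility": 10.0, "cost": 5.0},  # e.g. a high-carbon region
    {"id": "b", "utility": 9.0, "cost": 1.0},   # e.g. a low-carbon region
    {"id": "c", "utility": 4.0, "cost": 4.0},
]
print(select_clients(clients, budget=6.0))  # (['b', 'a'], 6.0)
```

A purely utility-based policy would rank client `a` first; weighting by cost moves the cheap, nearly as useful client `b` to the front.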
results: The survey reviews solutions for improving the trustworthiness of EML systems and identifies remaining research challenges and open issues.
Abstract
The convergence of Edge Computing (EC) and Machine Learning (ML), known as Edge Machine Learning (EML), has become a highly regarded research area by utilizing distributed network resources to perform joint training and inference in a cooperative manner. However, EML faces various challenges due to resource constraints, heterogeneous network environments, and diverse service requirements of different applications, which together affect the trustworthiness of EML in the eyes of its stakeholders. This survey provides a comprehensive summary of definitions, attributes, frameworks, techniques, and solutions for trustworthy EML. Specifically, we first emphasize the importance of trustworthy EML within the context of Sixth-Generation (6G) networks. We then discuss the necessity of trustworthiness from the perspective of challenges encountered during deployment and real-world application scenarios. Subsequently, we provide a preliminary definition of trustworthy EML and explore its key attributes. Following this, we introduce fundamental frameworks and enabling technologies for trustworthy EML systems, and provide an in-depth literature review of the latest solutions to enhance trustworthiness of EML. Finally, we discuss corresponding research challenges and open issues.
Summary
The convergence of Edge Computing (EC) and Machine Learning (ML), known as Edge Machine Learning (EML), has become a highly regarded research area that uses distributed network resources to perform joint training and inference cooperatively. However, EML faces various challenges arising from resource constraints, heterogeneous network environments, and the diverse service requirements of different applications, which together affect the trustworthiness of EML in the eyes of its stakeholders. This survey provides a comprehensive overview of definitions, attributes, frameworks, techniques, and solutions for trustworthy EML in 6G networks.
MicroNAS: Memory and Latency Constrained Hardware-Aware Neural Architecture Search for Time Series Classification on Microcontrollers
results: The system achieves high classification accuracy (93.93% on UCI-HAR and 96.33% on SkodaR) while running on a Cortex-M4 MCU.
Abstract
This paper presents MicroNAS, a system designed to automatically search and generate neural network architectures capable of classifying time series data on resource-constrained microcontrollers (MCUs) and generating standard tf-lite ML models. MicroNAS takes into account user-defined constraints on execution latency and peak memory consumption on a target MCU. This approach ensures that the resulting neural network architectures are optimised for the specific constraints and requirements of the MCU on which they are implemented. To achieve this, MicroNAS uses a look-up table estimation approach for accurate execution latency calculations, with a minimum error of only 1.02ms. This accurate latency estimation on MCUs sets it apart from other hardware-aware neural architecture search (HW-NAS) methods that use less accurate estimation techniques. Finally, MicroNAS delivers performance close to that of state-of-the-art models running on desktop computers, achieving high classification accuracies on recognised datasets (93.93% on UCI-HAR and 96.33% on SkodaR) while running on a Cortex-M4 MCU.
Lifting the Veil: Unlocking the Power of Depth in Q-learning
paper_authors: Shao-Bo Lin, Tao Li, Shaojie Tang, Yao Wang, Ding-Xuan Zhou
for: The paper aims to theoretically verify the power of depth in deep Q-learning and to provide a solid theoretical foundation for its success in numerous applications.
methods: The paper uses statistical learning theory to rigorously prove that deep Q-learning outperforms traditional Q-learning by establishing a good generalization error bound.
results: The study finds that the success of deep Q-learning is mainly due to the ability of deep neural networks (deep nets) to capture special properties of rewards, rather than to their large capacities. The paper also answers three key questions: Why does deep Q-learning perform so well? When does deep Q-learning perform better than traditional Q-learning? How many samples are required to achieve a specific prediction accuracy for deep Q-learning?
Abstract
With the help of massive data and rich computational resources, deep Q-learning has been widely used in operations research and management science and has contributed to great success in numerous applications, including recommender systems, supply chains, games, and robotic manipulation. However, the success of deep Q-learning lacks solid theoretical verification and interpretability. The aim of this paper is to theoretically verify the power of depth in deep Q-learning. Within the framework of statistical learning theory, we rigorously prove that deep Q-learning outperforms its traditional version by demonstrating its good generalization error bound. Our results reveal that the main reason for the success of deep Q-learning is the excellent performance of deep neural networks (deep nets) in capturing the special properties of rewards, namely spatial sparseness and piecewise constancy, rather than their large capacities. In this paper, we make fundamental contributions to the field of reinforcement learning by answering the following three questions: Why does deep Q-learning perform so well? When does deep Q-learning perform better than traditional Q-learning? How many samples are required to achieve a specific prediction accuracy for deep Q-learning? Our theoretical assertions are verified by applying deep Q-learning in the well-known beer game in supply chain management and a simulated recommender system.
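Summary
With the help of massive data and rich computational resources, deep Q-learning has been widely used in operations research and management science and has achieved great success in numerous applications, including recommender systems, supply chains, games, and robotic manipulation. However, its success lacks solid theoretical verification and interpretability. The aim of this paper is to verify the power of depth in deep Q-learning from the perspective of statistical learning theory. Within this framework, we rigorously prove that the generalization error bound of deep Q-learning is better than that of traditional Q-learning. Our results show that the success of deep Q-learning is mainly due to the excellent performance of deep neural networks (deep nets) in capturing the properties of rewards, rather than to their large capacities. This paper makes fundamental contributions to reinforcement learning by answering the following three questions: Why does deep Q-learning perform so well? When does deep Q-learning perform better than traditional Q-learning? How many samples does deep Q-learning require to achieve a specific prediction accuracy? Our theoretical assertions are verified by applying deep Q-learning to the beer game and to a simulated recommender system.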
results: The authors propose an improved policy, improved knowledge gradient (iKG), which extends more easily to variant BAI problems. Numerical examples show that iKG outperforms KG on these problems.
Abstract
The knowledge gradient (KG) algorithm is a popular policy for the best arm identification (BAI) problem. It is built on the simple idea of always choosing the measurement that yields the greatest expected one-step improvement in the estimate of the best mean of the arms. In this research, we show that this policy has limitations, so that the algorithm is not asymptotically optimal. We next provide a remedy for it, by following KG's one-step look-ahead approach, but instead choosing the measurement that yields the greatest one-step improvement in the probability of selecting the best arm. The new policy is called improved knowledge gradient (iKG). iKG can be shown to be asymptotically optimal. In addition, we show that compared to KG, it is easier to extend iKG to variant problems of BAI, with the $\epsilon$-good arm identification and feasible arm identification as two examples. The superior performance of iKG on these problems is further demonstrated using numerical examples.
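Summary
The knowledge gradient (KG) algorithm is a popular policy for the best arm identification (BAI) problem. It is built on the simple idea of always choosing the measurement that yields the greatest expected one-step improvement in the estimate of the best mean of the arms. In this research, we show that this policy has limitations that prevent the algorithm from being asymptotically optimal. We then provide a remedy that keeps the one-step look-ahead approach but instead chooses the measurement that yields the greatest one-step improvement in the probability of selecting the best arm. The new policy is called improved knowledge gradient (iKG), and it can be shown to be asymptotically optimal. In addition, we show that iKG is easier than KG to extend to variant problems of BAI, with $\epsilon$-good arm identification and feasible arm identification as two examples. The superior performance of iKG on these problems is further demonstrated using numerical examples.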
Submodel Partitioning in Hierarchical Federated Learning: Algorithm Design and Convergence Analysis
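results: Through theoretical analysis and experiments, the paper shows that HIST converges for non-convex loss functions and characterizes how several attributes (e.g., number of cells, local and global aggregation frequency) affect the performance-efficiency tradeoff. Experiments show that HIST substantially reduces communication costs while achieving the same target testing accuracy.
Abstract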
Hierarchical federated learning (HFL) has demonstrated promising scalability advantages over the traditional "star-topology" architecture-based federated learning (FL). However, HFL still imposes significant computation, communication, and storage burdens on the edge, especially when training a large-scale model over resource-constrained Internet of Things (IoT) devices. In this paper, we propose hierarchical independent submodel training (HIST), a new FL methodology that aims to address these issues in hierarchical settings. The key idea behind HIST is a hierarchical version of model partitioning, where we partition the global model into disjoint submodels in each round, and distribute them across different cells, so that each cell is responsible for training only one partition of the full model. This enables each client to save computation/storage costs while alleviating the communication loads throughout the hierarchy. We characterize the convergence behavior of HIST for non-convex loss functions under mild assumptions, showing the impact of several attributes (e.g., number of cells, local and global aggregation frequency) on the performance-efficiency tradeoff. Finally, through numerical experiments, we verify that HIST is able to save communication costs by a wide margin while achieving the same target testing accuracy.
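Summary
The key idea behind HIST is to partition the global model hierarchically in each round and distribute the partitions across different cells, so that each cell trains only its own partition of the full model. This lets each client reduce computation and storage costs while alleviating the communication load. We analyze the impact of several attributes (e.g., number of cells, local and global aggregation frequency) on the performance-efficiency tradeoff. Finally, through numerical experiments, we verify that HIST reduces communication costs while achieving the same testing accuracy.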
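The partitioning idea can be sketched in a few lines. The toy below treats the global model as a flat list of parameters, splits the parameter indices into disjoint random submodels (one per cell) each round, lets each cell update only its own submodel, and writes the results back. How the paper actually partitions the model (for example by neurons or layers) and aggregates within cells is not specified here, so the helper names and the flat-index partition are assumptions.

```python
import random


def partition_indices(num_params, num_cells, rng):
    """Split parameter indices into num_cells disjoint, near-equal random parts."""
    idx = list(range(num_params))
    rng.shuffle(idx)
    return [sorted(idx[c::num_cells]) for c in range(num_cells)]


def hist_round(global_model, num_cells, local_update, rng):
    """One round: each cell trains only its own partition of the global model."""
    parts = partition_indices(len(global_model), num_cells, rng)
    new_model = list(global_model)
    for cell_id, part in enumerate(parts):
        submodel = [global_model[i] for i in part]   # only this slice is sent
        trained = local_update(cell_id, submodel)    # stand-in for cell training
        for i, value in zip(part, trained):
            new_model[i] = value
    return new_model, parts


rng = random.Random(0)
model = [1.0] * 10
# Toy "training": every cell halves the parameters it is responsible for.
new_model, parts = hist_round(model, 3, lambda cell, sub: [w * 0.5 for w in sub], rng)
print([len(p) for p in parts])
print(new_model)
```

Each cell receives roughly `1 / num_cells` of the parameters, which is where the savings in communication, computation, and storage come from.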
Machine Learning Infused Distributed Optimization for Coordinating Virtual Power Plant Assets
paper_authors: Meiyi Li, Javad Mohammadi
for: This paper aims to present a novel machine learning-assisted distributed optimization method for coordinating Virtual Power Plants (VPPs) and their associated Distributed Energy Resources (DERs).
methods: The proposed method, named LOOP-MAC, utilizes a multi-agent coordination approach and neural network approximators to expedite the solution search.
results: The LOOP-MAC method demonstrates accelerated solution times per iteration and significantly reduced convergence times compared to conventional centralized and distributed optimization methods.
Abstract
Amid the increasing interest in the deployment of Distributed Energy Resources (DERs), the Virtual Power Plant (VPP) has emerged as a pivotal tool for aggregating diverse DERs and facilitating their participation in wholesale energy markets. These VPP deployments have been fueled by the Federal Energy Regulatory Commission's Order 2222, which makes DERs and VPPs competitive across market segments. However, the diversity and decentralized nature of DERs present significant challenges to the scalable coordination of VPP assets. To address efficiency and speed bottlenecks, this paper presents a novel machine learning-assisted distributed optimization to coordinate VPP assets. Our method, named LOOP-MAC (Learning to Optimize the Optimization Process for Multi-agent Coordination), adopts a multi-agent coordination perspective where each VPP agent manages multiple DERs and utilizes neural network approximators to expedite the solution search. The LOOP-MAC method employs a gauge map to guarantee strict compliance with local constraints, effectively reducing the need for additional post-processing steps. Our results highlight the advantages of LOOP-MAC, showcasing accelerated solution times per iteration and significantly reduced convergence times. The LOOP-MAC method outperforms conventional centralized and distributed optimization methods in optimization tasks that require repetitive and sequential execution.
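Summary
Amid growing deployment of Distributed Energy Resources (DERs), the Virtual Power Plant (VPP) has become a key tool for aggregating diverse DERs and enabling their participation in wholesale energy markets. VPP deployments have been driven by the Federal Energy Regulatory Commission's Order 2222, which makes DERs and VPPs competitive across market segments. However, the diversity and decentralized nature of DERs make scalable coordination of VPP assets difficult. To address efficiency and speed bottlenecks, the paper proposes a machine learning-assisted distributed optimization method called LOOP-MAC (Learning to Optimize the Optimization Process for Multi-agent Coordination). LOOP-MAC takes a multi-agent coordination perspective in which each VPP agent manages multiple DERs and uses neural network approximators to speed up the solution search. It uses a gauge map to guarantee strict compliance with local constraints, reducing the need for additional post-processing. The results show faster solution times per iteration and significantly shorter convergence times, and LOOP-MAC outperforms conventional centralized and distributed optimization methods on tasks that require repetitive and sequential execution.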
A Sublinear-Time Spectral Clustering Oracle with Improved Preprocessing Time
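results: The paper shows that, for graphs whose clusters have large inner conductance (at least $\varphi$) and small outer conductance (at most $\varepsilon$), clustering membership queries can be answered efficiently.
Abstract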
We address the problem of designing a sublinear-time spectral clustering oracle for graphs that exhibit strong clusterability. Such graphs contain $k$ latent clusters, each characterized by a large inner conductance (at least $\varphi$) and a small outer conductance (at most $\varepsilon$). Our aim is to preprocess the graph to enable clustering membership queries, with the key requirement that both preprocessing and query answering should be performed in sublinear time, and the resulting partition should be consistent with a $k$-partition that is close to the ground-truth clustering. Previous oracles have relied on either a $\textrm{poly}(k)\log n$ gap between inner and outer conductances or exponential (in $k/\varepsilon$) preprocessing time. Our algorithm relaxes these assumptions, albeit at the cost of a slightly higher misclassification ratio. We also show that our clustering oracle is robust against a few random edge deletions. To validate our theoretical bounds, we conducted experiments on synthetic networks.
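Summary
The paper studies sublinear-time spectral clustering oracles for strongly clusterable graphs, that is, graphs containing $k$ latent clusters, each with inner conductance at least $\varphi$ and outer conductance at most $\varepsilon$. The goal is to answer clustering membership queries in sublinear time while keeping the resulting partition close to the ground-truth clustering. Previous oracles relied on one of two assumptions: a $\textrm{poly}(k)\log n$ gap between inner and outer conductances, or preprocessing time exponential in $k/\varepsilon$. The proposed algorithm relaxes these assumptions at the cost of a slightly higher misclassification ratio. The authors also show that the oracle is robust to a few random edge deletions, and they validate the theoretical bounds with experiments on synthetic networks.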
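The outer conductance mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition (edges leaving the cluster divided by the cluster's volume); the toy graph and the function name `outer_conductance` are illustrative and not taken from the paper. Inner conductance, which minimizes over cuts inside the cluster, is not computed here.

```python
def outer_conductance(adj, cluster):
    """Edges leaving `cluster` divided by the sum of degrees inside it."""
    cluster = set(cluster)
    volume = sum(len(adj[v]) for v in cluster)
    cut = sum(1 for v in cluster for u in adj[v] if u not in cluster)
    return cut / volume if volume else 0.0

# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
phi_out = outer_conductance(adj, {0, 1, 2})  # one cut edge, volume 7
```

A small value means the cluster is weakly connected to the rest of the graph, which is the regime the oracle targets.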
From Generative AI to Generative Internet of Things: Fundamentals, Framework, and Outlooks
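results: Through a case study on modern Internet of Vehicles traffic monitoring, the article uses generative diffusion models (GDMs) to design effective incentive contracts that encourage users to contribute high-quality sensing data. It also suggests several open directions for future research.
Abstract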
Generative Artificial Intelligence (GAI) possesses the capabilities of generating realistic data and facilitating advanced decision-making. By integrating GAI into modern Internet of Things (IoT), Generative Internet of Things (GIoT) is emerging and holds immense potential to revolutionize various aspects of society, enabling more efficient and intelligent IoT applications, such as smart surveillance and voice assistants. In this article, we present the concept of GIoT and conduct an exploration of its potential prospects. Specifically, we first overview four GAI techniques and investigate promising GIoT applications. Then, we elaborate on the main challenges in enabling GIoT and propose a general GAI-based secure incentive mechanism framework to address them, in which we adopt Generative Diffusion Models (GDMs) for incentive mechanism designs and apply blockchain technologies for secure GIoT management. Moreover, we conduct a case study on modern Internet of Vehicle traffic monitoring, which utilizes GDMs to generate effective contracts for incentivizing users to contribute sensing data with high quality. Finally, we suggest several open directions worth investigating for the future popularity of GIoT.
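Summary
Generative Artificial Intelligence (GAI) can generate realistic data and support advanced decision-making. Integrating GAI into the modern Internet of Things (IoT) gives rise to the Generative Internet of Things (GIoT), which has great potential to transform many aspects of society through more efficient and intelligent IoT applications such as smart surveillance and voice assistants. The article presents the concept of GIoT and explores its prospects. Specifically, it first reviews four GAI techniques and examines promising GIoT applications. It then details the main challenges in enabling GIoT and proposes a general GAI-based secure incentive mechanism framework, which adopts Generative Diffusion Models (GDMs) for incentive mechanism design and applies blockchain technologies for secure GIoT management. A case study on modern Internet of Vehicles traffic monitoring uses GDMs to generate effective contracts that incentivize users to contribute high-quality sensing data. Finally, the article suggests several open directions worth investigating.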
Unveil Sleep Spindles with Concentration of Frequency and Time
for: The paper aims to develop an accurate and interpretable algorithm for sleep spindle detection in EEG data and to quantify the instantaneous frequencies (IFs) of spindles.
methods: The authors introduce a novel non-linear time-frequency analysis tool called "Concentration of Frequency and Time" (ConceFT), which reduces stochastic EEG influence and enhances spindle visibility in the time-frequency representation. They also develop an automated spindle detection algorithm, ConceFT-Spindle (ConceFT-S), and compare it with two other algorithms (A7 and SUMO) on two benchmark databases (Dream and MASS).
results: ConceFT-S achieves F1 scores of 0.749 on Dream and 0.786 on MASS, which is equivalent to or surpasses the performance of A7 and SUMO with statistical significance. The authors also find that spindle IF is generally nonlinear.
Abstract
Objective: Sleep spindles contain crucial brain dynamics information. We introduce the novel non-linear time-frequency analysis tool 'Concentration of Frequency and Time' (ConceFT) to create an interpretable automated algorithm for sleep spindle annotation in EEG data and to measure spindle instantaneous frequencies (IFs). Methods: ConceFT effectively reduces stochastic EEG influence, enhancing spindle visibility in the time-frequency representation. Our automated spindle detection algorithm, ConceFT-Spindle (ConceFT-S), is compared to A7 (non-deep learning) and SUMO (deep learning) using Dream and MASS benchmark databases. We also quantify spindle IF dynamics. Results: ConceFT-S achieves F1 scores of 0.749 in Dream and 0.786 in MASS, which is equivalent to or surpasses A7 and SUMO with statistical significance. We reveal that spindle IF is generally nonlinear. Conclusion: ConceFT offers an accurate, interpretable EEG-based sleep spindle detection algorithm and enables spindle IF quantification.
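Summary
Objective: Sleep spindles carry crucial information about brain dynamics. The paper introduces a new non-linear time-frequency analysis tool, "Concentration of Frequency and Time" (ConceFT), to build an interpretable automated algorithm for annotating sleep spindles in EEG data and to measure spindle instantaneous frequencies (IFs). Methods: ConceFT reduces stochastic EEG influence, making spindles more visible in the time-frequency representation. The automated detection algorithm, ConceFT-Spindle (ConceFT-S), is compared with A7 (non-deep learning) and SUMO (deep learning) on the Dream and MASS databases, and spindle IF dynamics are quantified. Results: ConceFT-S achieves F1 scores of 0.749 on Dream and 0.786 on MASS, matching or surpassing A7 and SUMO with statistical significance. Spindle IF is found to be generally nonlinear. Conclusion: ConceFT provides an accurate, interpretable EEG-based sleep spindle detection algorithm and enables quantification of spindle IF.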
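For readers unfamiliar with the F1 metric reported in the abstract above, here is a minimal sketch of how it is computed from detection counts. The counts used are hypothetical and are not taken from the Dream or MASS evaluations; how detected spindles are matched to annotated ones is outside the scope of this sketch.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 80 correctly detected spindles,
# 20 false detections, 30 missed spindles.
example = f1_score(tp=80, fp=20, fn=30)
```

Because F1 balances precision and recall, a detector cannot score well by simply over-detecting or under-detecting events.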
Boosting Data Analytics With Synthetic Volume Expansion
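results: The study finds that, as more synthetic data is added, the error of a statistical method initially decreases but may eventually increase or plateau. This phenomenon, called the "generational effect", points to a "reflection point" when replicating the raw data distribution with synthetic data: an optimal threshold determined by specific error metrics. Three case studies (sentiment analysis of texts, predictive modeling of structured data, and inference in tabular data) demonstrate the effectiveness of the framework relative to traditional methods.
Abstract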
Synthetic data generation, a cornerstone of Generative Artificial Intelligence, signifies a paradigm shift in data science by addressing data scarcity and privacy while enabling unprecedented performance. As synthetic data gains prominence, questions arise concerning the accuracy of statistical methods when applied to synthetic data compared to raw data. In this article, we introduce the Synthetic Data Generation for Analytics framework. This framework employs statistical methods on high-fidelity synthetic data generated by advanced models such as tabular diffusion and Generative Pre-trained Transformer models. These models, trained on raw data, are further enhanced with insights from pertinent studies. A significant discovery within this framework is the generational effect: the error of a statistical method on synthetic data initially diminishes with added synthetic data but may eventually increase or plateau. This phenomenon, rooted in the complexities of replicating raw data distributions, highlights a "reflection point"--an optimal threshold in the size of synthetic data determined by specific error metrics. Through three illustrative case studies-sentiment analysis of texts, predictive modeling of structured data, and inference in tabular data--we demonstrate the effectiveness of this framework over traditional ones. We underline its potential to amplify various statistical methods, including gradient boosting for prediction and hypothesis testing, thereby underscoring the transformative potential of synthetic data generation in data science.
Summary
Synthetic data generation, a cornerstone of Generative Artificial Intelligence, marks a paradigm shift in data science by addressing data scarcity and privacy while enabling unprecedented performance. As synthetic data becomes more widespread, questions arise about the accuracy of statistical methods applied to it. The article introduces the Synthetic Data Generation for Analytics framework, which applies statistical methods to high-fidelity synthetic data produced by advanced models such as tabular diffusion and Generative Pre-trained Transformer models trained on raw data. Within this framework the authors identify a generational effect: the error of a statistical method on synthetic data initially decreases as synthetic data is added, but may eventually increase or plateau. This phenomenon, rooted in the difficulty of replicating raw data distributions, indicates a "reflection point", an optimal synthetic data size determined by specific error metrics. Three case studies (sentiment analysis of texts, predictive modeling of structured data, and inference in tabular data) show that the framework is more effective than traditional approaches. The authors highlight its potential to strengthen statistical methods, including gradient boosting for prediction and hypothesis testing, underscoring the transformative potential of synthetic data generation in data science.
A Data-Centric Online Market for Machine Learning: From Discovery to Pricing
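results: The experimental results indicate that the new techniques can effectively match ML tasks with data and can encourage ML users to participate in the market, improving the performance and accessibility of ML models.
Abstract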
Data fuels machine learning (ML) - rich and high-quality training data is essential to the success of ML. However, to transform ML from the race among a few large corporations to an accessible technology that serves numerous normal users' data analysis requests, there still exist important challenges. One gap we observed is that many ML users can benefit from new data that other data owners possess, whereas these data owners sit on piles of data without knowing who can benefit from it. This gap creates the opportunity for building an online market that can automatically connect supply with demand. While online matching markets are prevalent (e.g., ride-hailing systems), designing a data-centric market for ML exhibits many unprecedented challenges. This paper develops new techniques to tackle two core challenges in designing such a market: (a) to efficiently match demand with supply, we design an algorithm to automatically discover useful data for any ML task from a pool of thousands of datasets, achieving high-quality matching between ML models and data; (b) to encourage market participation of ML users without much ML expertise, we design a new pricing mechanism for selling data-augmented ML models. Furthermore, our market is designed to be API-compatible with existing online ML markets like Vertex AI and Sagemaker, making it easy to use while providing better results due to joint data and model search. We envision that the synergy of our data and model discovery algorithm and pricing mechanism will be an important step towards building a new data-centric online market that serves ML users effectively.
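Summary
Data fuels machine learning (ML): rich, high-quality training data is essential to its success. However, important challenges remain in turning ML from a race among a few large corporations into an accessible technology that serves ordinary users' data analysis requests. One gap the authors observe is that many ML users could benefit from new data held by other data owners, while those owners sit on large amounts of data without knowing who could benefit from it. This gap creates an opportunity to build an online market that automatically connects supply with demand. Although online matching markets are common (for example, ride-hailing systems), designing a data-centric market for ML raises many unprecedented challenges. The paper develops new techniques for two core challenges: (a) to match demand with supply efficiently, it designs an algorithm that automatically discovers useful data for any ML task from a pool of thousands of datasets, achieving high-quality matching between ML models and data; (b) to encourage participation by ML users without much ML expertise, it designs a new pricing mechanism for selling data-augmented ML models. The market is also designed to be API-compatible with existing online ML markets such as Vertex AI and Sagemaker, making it easy to use while giving better results through joint data and model search. The authors expect the combination of the data and model discovery algorithm and the pricing mechanism to be an important step toward a new data-centric online market.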
Positional Encoding-based Resident Identification in Multi-resident Smart Homes
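results: Extensive experiments show that the proposed scheme effectively identifies multiple occupants in a smart environment. Evaluation on two real-world datasets shows accuracies of 94.5% and 87.9%, respectively.
Abstract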
We propose a novel resident identification framework to identify residents in a multi-occupant smart environment. The proposed framework employs a feature extraction model based on the concepts of positional encoding. The feature extraction model considers the locations of homes as a graph. We design a novel algorithm to build such graphs from layout maps of smart environments. The Node2Vec algorithm is used to transform the graph into high-dimensional node embeddings. A Long Short-Term Memory (LSTM) model is introduced to predict the identities of residents using temporal sequences of sensor events with the node embeddings. Extensive experiments show that our proposed scheme effectively identifies residents in a multi-occupant environment. Evaluation results on two real-world datasets demonstrate that our proposed approach achieves 94.5% and 87.9% accuracy, respectively.
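Summary
The paper proposes a new resident identification framework for multi-occupant smart environments. The framework uses a feature extraction model based on positional encoding, which takes the layout map of the smart environment into account. A new algorithm builds graphs from layout maps, and the Node2Vec algorithm then transforms each graph into high-dimensional node embeddings. A Long Short-Term Memory (LSTM) model is introduced to predict resident identities from temporal sequences of sensor events. Extensive experiments show that the proposed scheme effectively identifies residents in a multi-occupant environment. Evaluation on two real-world datasets shows that the approach reaches 94.5% and 87.9% accuracy.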
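The graph-to-embedding step described in the abstract above starts from random walks over the layout graph. The sketch below generates uniform random walks, which correspond to Node2Vec with both bias parameters set to 1; the room names and the adjacency map are hypothetical, and the paper's own graph-building algorithm is not reproduced. The resulting walks would then be fed to a skip-gram model to obtain node embeddings.

```python
import random

def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Uniform random walks over an adjacency map (Node2Vec with p = q = 1)."""
    rng = random.Random(seed)
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

# Hypothetical layout graph: rooms are nodes, doorways are edges.
layout = {
    "bedroom": ["hall"],
    "hall": ["bedroom", "kitchen", "bathroom"],
    "kitchen": ["hall"],
    "bathroom": ["hall"],
}
walks = random_walks(layout, walk_length=5, walks_per_node=2)
```

Rooms that are close in the layout co-occur in many walks, so their learned embeddings end up close as well, which is what makes them usable as positional features for sensor events.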
Hybrid Optical Turbulence Models Using Machine Learning and Local Measurements
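results: The study finds that the hybrid model can predict atmospheric optical turbulence better than the baseline macro-meteorological models and the machine-learning models, particularly when little training data is available.
Abstract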
Accurate prediction of atmospheric optical turbulence in localized environments is essential for estimating the performance of free-space optical systems. Macro-meteorological models developed to predict turbulent effects in one environment may fail when applied in new environments. However, existing macro-meteorological models are expected to offer some predictive power. Building a new model from locally-measured macro-meteorology and scintillometer readings can require significant time and resources, as well as a large number of observations. These challenges motivate the development of a machine-learning informed hybrid model framework. By combining some baseline macro-meteorological model with local observations, hybrid models were trained to improve upon the predictive power of each baseline model. Comparisons between the performance of the hybrid models, the selected baseline macro-meteorological models, and machine-learning models trained only on local observations highlight potential use cases for the hybrid model framework when local data is expensive to collect. Both the hybrid and data-only models were trained using the Gradient Boosted Decision Tree (GBDT) architecture with a variable number of in-situ meteorological observations. The hybrid and data-only models were found to outperform three baseline macro-meteorological models, even for low numbers of observations, in some cases as little as one day. For the first baseline macro-meteorological model investigated, the hybrid model achieves an estimated 29% reduction in mean absolute error (MAE) using only one days-equivalent of observation, growing to 41% after only two days, and 68% after 180 days-equivalent training data. The number of days-equivalent training data required is potentially indicative of the seasonal variation in the local microclimate and its propagation environment.
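Summary
Accurate prediction of atmospheric optical turbulence in localized environments is key to estimating the performance of free-space optical systems. Macro-meteorological models developed for one environment may fail in another, yet existing models are still expected to offer some predictive power. Building a new model from locally measured macro-meteorology and scintillometer readings can take significant time and resources as well as a large number of observations. These challenges motivate a machine-learning informed hybrid model framework. By combining a baseline macro-meteorological model with local observations, the hybrid models improve on the predictive power of each baseline. Comparisons between the hybrid models, the selected baseline macro-meteorological models, and machine-learning models trained only on local observations highlight use cases for the hybrid framework. Both the hybrid and data-only models were trained with the Gradient Boosted Decision Tree (GBDT) architecture on in-situ meteorological observations. In some cases the hybrid model performs well with as little as one day of observations, and its performance improves further as training data grows. For the first baseline macro-meteorological model investigated, the hybrid model achieves a 29% reduction in mean absolute error (MAE) relative to the baseline with one day-equivalent of training data. These results indicate that the hybrid model is viable when local data is limited. The number of day-equivalents of training data required may reflect the seasonal variation in the local microclimate and its propagation environment.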
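The percentage reductions in mean absolute error quoted in the abstract above follow from a simple ratio. The sketch below shows the arithmetic on hypothetical values; the observation and prediction arrays are invented for illustration and do not come from the paper's measurements.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observations and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_reduction(baseline_mae, model_mae):
    """Fractional MAE reduction of a model relative to a baseline."""
    return 1.0 - model_mae / baseline_mae

# Hypothetical values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
baseline_pred = [2.0, 3.0, 4.0, 5.0]
hybrid_pred = [1.5, 2.5, 3.5, 4.5]

baseline_mae = mean_absolute_error(observed, baseline_pred)
hybrid_mae = mean_absolute_error(observed, hybrid_pred)
reduction = relative_reduction(baseline_mae, hybrid_mae)
```

Under this convention, a reported 29% reduction means the hybrid model's MAE is 71% of the baseline's.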
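paper_authors: Haowen Zhou, Brandon Y. Feng, Haiyun Guo, Siyu Lin, Mingshu Liang, Christopher A. Metzler, Changhuei Yang
for: This paper aims to enable fast, large-scale remote analysis of biological and pathological images.
methods: The paper combines physics-based optical models with implicit neural representations (INR) to represent and reconstruct Fourier ptychographic microscopy (FPM) image stacks.
results: Up to 25 times faster than traditional FPM algorithms, with an 80-fold reduction in memory usage.
Abstract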
Image stacks provide invaluable 3D information in various biological and pathological imaging applications. Fourier ptychographic microscopy (FPM) enables reconstructing high-resolution, wide field-of-view image stacks without z-stack scanning, thus significantly accelerating image acquisition. However, existing FPM methods take tens of minutes to reconstruct and gigabytes of memory to store a high-resolution volumetric scene, impeding fast gigapixel-scale remote digital pathology. While deep learning approaches have been explored to address this challenge, existing methods poorly generalize to novel datasets and can produce unreliable hallucinations. This work presents FPM-INR, a compact and efficient framework that integrates physics-based optical models with implicit neural representations (INR) to represent and reconstruct FPM image stacks. FPM-INR is agnostic to system design or sample types and does not require external training data. In our demonstrated experiments, FPM-INR substantially outperforms traditional FPM algorithms with up to a 25-fold increase in speed and an 80-fold reduction in memory usage for continuous image stack representations.
TabAttention: Learning Attention Conditionally on Tabular Data
paper_authors: Michal K. Grzeszczyk, Szymon Płotka, Beata Rebizant, Katarzyna Kosińska-Kaczyńska, Michał Lipa, Robert Brawura-Biskupski-Samaha, Przemysław Korzeniowski, Tomasz Trzciński, Arkadiusz Sitek
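methods: The paper extends the Convolutional Block Attention Module to 3D, using multi-head self-attention to learn attention maps. In addition, the authors enhance all attention modules by integrating tabular data embeddings.
results: According to the authors' experiments, TabAttention outperforms clinicians and existing methods on fetal birth weight (FBW) prediction. The approach has the potential to be used in various clinical workflows where imaging and tabular data are combined.
Abstract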
Medical data analysis often combines both imaging and tabular data processing using machine learning algorithms. While previous studies have investigated the impact of attention mechanisms on deep learning models, few have explored integrating attention modules and tabular data. In this paper, we introduce TabAttention, a novel module that enhances the performance of Convolutional Neural Networks (CNNs) with an attention mechanism that is trained conditionally on tabular data. Specifically, we extend the Convolutional Block Attention Module to 3D by adding a Temporal Attention Module that uses multi-head self-attention to learn attention maps. Furthermore, we enhance all attention modules by integrating tabular data embeddings. Our approach is demonstrated on the fetal birth weight (FBW) estimation task, using 92 fetal abdominal ultrasound video scans and fetal biometry measurements. Our results indicate that TabAttention outperforms clinicians and existing methods that rely on tabular and/or imaging data for FBW prediction. This novel approach has the potential to improve computer-aided diagnosis in various clinical workflows where imaging and tabular data are combined. We provide a source code for integrating TabAttention in CNNs at https://github.com/SanoScience/Tab-Attention.
Summary
Medical data analysis often combines imaging and tabular data processing using machine learning algorithms. Although the impact of attention mechanisms on deep learning models has been studied, few works have explored integrating attention modules with tabular data. The paper introduces TabAttention, a module that improves Convolutional Neural Networks (CNNs) with an attention mechanism trained conditionally on tabular data. It extends the Convolutional Block Attention Module to 3D by adding a Temporal Attention Module that uses multi-head self-attention to learn attention maps, and it enhances all attention modules with tabular data embeddings. The approach is demonstrated on fetal birth weight (FBW) estimation using 92 fetal abdominal ultrasound video scans and fetal biometry measurements, where TabAttention outperforms clinicians and existing methods that rely on tabular and/or imaging data. The approach could improve computer-aided diagnosis in clinical workflows that combine imaging and tabular data. The authors provide source code for integrating TabAttention into CNNs.
Hyper-Skin: A Hyperspectral Dataset for Reconstructing Facial Skin-Spectra from RGB Images
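paper_authors: Pai Chet Ng, Zhixiang Chi, Yannick Verdie, Juwei Lu, Konstantinos N. Plataniotis
for: This paper is designed to facilitate research on reconstructing facial skin spectra and on hyperspectral skin analysis.
methods: The paper collects facial skin data with a pushbroom hyperspectral camera and uses paired hyperspectral and RGB images to reconstruct facial skin spectra.
results: The paper reconstructs skin spectra with existing state-of-the-art models on 31 bands of hyperspectral data, demonstrating the efficacy of the dataset.
Abstract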
We introduce Hyper-Skin, a hyperspectral dataset covering a wide range of wavelengths from the visible (VIS) spectrum (400nm - 700nm) to the near-infrared (NIR) spectrum (700nm - 1000nm), uniquely designed to facilitate research on facial skin-spectra reconstruction. By reconstructing skin spectra from RGB images, our dataset enables the study of hyperspectral skin analysis, such as melanin and hemoglobin concentrations, directly on the consumer device. Overcoming limitations of existing datasets, Hyper-Skin consists of diverse facial skin data collected with a pushbroom hyperspectral camera. With 330 hyperspectral cubes from 51 subjects, the dataset covers the facial skin from different angles and facial poses. Each hyperspectral cube has dimensions of 1024$\times$1024$\times$448, resulting in millions of spectra vectors per image. The dataset, carefully curated in adherence to ethical guidelines, includes paired hyperspectral images and synthetic RGB images generated using real camera responses. We demonstrate the efficacy of our dataset by showcasing skin spectra reconstruction using state-of-the-art models on 31 bands of hyperspectral data resampled in the VIS and NIR spectrum. This Hyper-Skin dataset would be a valuable resource to the NeurIPS community, encouraging the development of novel algorithms for skin spectral reconstruction while fostering interdisciplinary collaboration in hyperspectral skin analysis related to cosmetology and skin's well-being. Instructions to request the data and the related benchmarking codes are publicly available at: https://github.com/hyperspectral-skin/Hyper-Skin-2023.
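Summary
The paper introduces Hyper-Skin, a hyperspectral dataset covering wavelengths from the visible (VIS) spectrum (400nm - 700nm) to the near-infrared (NIR) spectrum (700nm - 1000nm). The dataset is designed to advance research on facial skin-spectra reconstruction: by estimating skin spectra from RGB images, it enables hyperspectral skin analysis directly on consumer devices. To overcome the limitations of existing datasets, Hyper-Skin contains diverse facial skin data collected with a pushbroom hyperspectral camera. It includes 330 hyperspectral cubes from 51 subjects, each with dimensions 1024x1024x448, yielding a very large number of spectra vectors. The dataset follows ethical guidelines and includes paired hyperspectral images and synthetic RGB images generated using real camera responses. The authors demonstrate skin spectra reconstruction with state-of-the-art models on 31 bands of hyperspectral data. Hyper-Skin is intended as a valuable resource for the NeurIPS community, encouraging new skin spectral reconstruction algorithms and interdisciplinary collaboration in hyperspectral skin analysis. Instructions for requesting the data and the related benchmarking code are publicly available.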
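The cube dimensions quoted in the abstract above translate into concrete counts with a little arithmetic. The sketch below works them out; the uniform band spacing across 400nm - 1000nm is an assumption made here for illustration and is not stated in the source.

```python
# Cube dimensions as given in the abstract: height x width x spectral bands.
HEIGHT, WIDTH, BANDS = 1024, 1024, 448

# One spectrum (a vector of BANDS values) per spatial pixel.
spectra_per_cube = HEIGHT * WIDTH
values_per_cube = spectra_per_cube * BANDS

# Assumption: bands are evenly spaced over the 400-1000 nm range.
band_spacing_nm = (1000 - 400) / (BANDS - 1)

# Total spectra across the 330 cubes in the dataset.
total_spectra = 330 * spectra_per_cube
```

Each cube therefore holds roughly one million spectra, and the full set of 330 cubes holds several hundred million.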
CPIA Dataset: A Comprehensive Pathological Image Analysis Dataset for Self-supervised Learning Pre-training
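results: The paper presents the large-scale comprehensive pathological image analysis (CPIA) dataset, which contains 21,427,877 standardized images covering over 48 organs/tissues and about 100 kinds of diseases, and provides several state-of-the-art baselines and downstream evaluations.
Abstract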
Pathological image analysis is a crucial field in computer-aided diagnosis, where deep learning is widely applied. Transfer learning using pre-trained models initialized on natural images has effectively improved the downstream pathological performance. However, the lack of sophisticated domain-specific pathological initialization hinders their potential. Self-supervised learning (SSL) enables pre-training without sample-level labels, which has great potential to overcome the challenge of expensive annotations. Thus, studies focusing on pathological SSL pre-training call for a comprehensive and standardized dataset, similar to the ImageNet in computer vision. This paper presents the comprehensive pathological image analysis (CPIA) dataset, a large-scale SSL pre-training dataset combining 103 open-source datasets with extensive standardization. The CPIA dataset contains 21,427,877 standardized images, covering over 48 organs/tissues and about 100 kinds of diseases, which includes two main data types: whole slide images (WSIs) and characteristic regions of interest (ROIs). A four-scale WSI standardization process is proposed based on the uniform resolution in microns per pixel (MPP), while the ROIs are divided into three scales artificially. This multi-scale dataset is built with the diagnosis habits under the supervision of experienced senior pathologists. The CPIA dataset facilitates a comprehensive pathological understanding and enables pattern discovery explorations. Additionally, to launch the CPIA dataset, several state-of-the-art (SOTA) baselines of SSL pre-training and downstream evaluation are specially conducted. The CPIA dataset along with baselines is available at https://github.com/zhanglab2021/CPIA_Dataset.
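Summary
Pathological image analysis is a crucial field in computer-aided diagnosis where deep learning is widely applied. Transfer learning from models pre-trained on natural images has improved downstream pathological performance, but the lack of sophisticated domain-specific pathological initialization limits its potential. Self-supervised learning (SSL) enables pre-training without sample-level labels and so has great potential to overcome the cost of expensive annotations. Research on pathological SSL pre-training therefore needs a comprehensive, standardized dataset, similar to ImageNet in computer vision. The paper presents the comprehensive pathological image analysis (CPIA) dataset, a large-scale SSL pre-training dataset that combines 103 open-source datasets with extensive standardization. CPIA contains 21,427,877 standardized images covering over 48 organs/tissues and about 100 kinds of diseases, in two main data types: whole slide images (WSIs) and characteristic regions of interest (ROIs). A four-scale WSI standardization process is proposed based on uniform resolution in microns per pixel (MPP), while the ROIs are manually divided into three scales. The multi-scale dataset is built according to the diagnostic habits of experienced senior pathologists. CPIA supports comprehensive pathological understanding and enables pattern discovery. To launch the dataset, several state-of-the-art (SOTA) SSL pre-training baselines and downstream evaluations are conducted. The CPIA dataset and baselines are available at https://github.com/zhanglab2021/CPIA_Dataset.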
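Standardizing whole slide images to a uniform microns-per-pixel (MPP) resolution, as described in the abstract above, comes down to a rescaling factor. The sketch below shows that calculation; the source and target MPP values and the slide size are hypothetical, and the four target scales actually used by CPIA are not reproduced here.

```python
def rescale_factor(source_mpp, target_mpp):
    """Scale to apply to pixel dimensions so the image reaches target_mpp."""
    return source_mpp / target_mpp

def rescaled_size(width, height, source_mpp, target_mpp):
    """Pixel dimensions after resampling to the target resolution."""
    factor = rescale_factor(source_mpp, target_mpp)
    return round(width * factor), round(height * factor)

# Hypothetical slide scanned at 0.25 MPP, resampled to 0.5 MPP.
new_size = rescaled_size(40000, 30000, source_mpp=0.25, target_mpp=0.5)
```

A larger MPP means each pixel covers more tissue, so moving to a coarser target resolution shrinks the image.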
Towards optimal multimode fiber imaging by leveraging input polarization and conditional generative adversarial networks
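results: Experiments show that the polarization state of the input light has a significant effect on imaging quality, and that the best imaging results can be obtained by controlling the input polarization state.
Abstract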
Deep learning techniques provide a plausible route towards achieving practical imaging through multimode fibers. However, the results produced by these methods are often influenced by physical factors like temperature, fiber length, external perturbations, and polarization state of the input light. The impact of other factors, except input light polarization, has been discussed in the literature for imaging applications. The input polarization has been considered by researchers while looking at the characterization and control of polarization in multimode fibers. Here, we show experimentally that the state of polarization of light, being injected at multimode fiber input, affects the fidelity of reconstructed images from speckle patterns. Certain polarization states produce high-quality images at fiber output, while some yield degraded results. We have designed a conditional generative adversarial network (CGAN) for image regeneration at various degrees of input light polarization. We demonstrate that in the case of multimode fibers that are held fixed, optimal imaging can be achieved by leveraging our CGAN model with the input light polarization state, where the fidelity of images is maximum. Our work exhibits high average structural similarity index values exceeding 0.9, surpassing the previously reported value of 0.8772. We also show that the model can be generalized to image adequately for all input light polarization states when the fiber has bends or twists. We anticipate our work will be a stepping stone toward developing high-resolution and less invasive multimode fiber endoscopes.
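Summary
Deep learning techniques offer a plausible route to practical imaging through multimode fibers. However, their results are often affected by physical factors such as temperature, fiber length, external perturbations, and the polarization state of the input light. For imaging applications, the literature has discussed the impact of the other factors, while input polarization has been considered mainly in work on the characterization and control of polarization in multimode fibers. The authors show experimentally that the polarization state of the light injected into a multimode fiber affects the quality of the reconstructed images: some polarization states yield high-quality images, while others yield degraded results. They develop a CGAN-based image regeneration model that operates across different input polarization states. When the multimode fiber is held fixed, the model achieves optimal imaging by using the input polarization state at which image fidelity is highest. The results exceed the previously reported value of 0.8772, and the model can also image adequately when the fiber is bent or twisted. The authors expect the work to be a step toward high-resolution, less invasive multimode fiber endoscopes.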
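The structural similarity index cited in the abstract above compares two images through their means, variances, and covariance. The sketch below is a simplified, single-window version computed over whole images given as flat lists of intensities; the standard metric averages this quantity over local windows, and the pixel values here are hypothetical.

```python
def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over two equally sized flat intensity lists."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants with the conventional 0.01 and 0.03 factors.
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

# Hypothetical ground-truth and reconstructed pixel intensities in [0, 1].
truth = [0.1, 0.5, 0.9, 0.3]
reconstruction = [0.2, 0.4, 0.7, 0.6]
score = global_ssim(truth, reconstruction)
```

A score of 1 indicates identical images, so values above 0.9 correspond to reconstructions that closely track the ground truth in brightness, contrast, and structure.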
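paper_authors: Tuan Anh Le, Xin-She Yang
for: This paper proposes a generalized Firefly Algorithm (FA) for optimization frameworks whose objective function and constraints are multivariate functions of independent optimization variables.
methods: The generalized FA is applied to downlink beamforming problems. Four representative examples are shown: classic transmit beamforming, cognitive beamforming, reconfigurable-intelligent-surfaces-aided (RIS-aided) transmit beamforming, and RIS-aided wireless power transfer (WPT).
results: Complexity analysis indicates that, in large-antenna regimes, the proposed FA approaches require less computational complexity than the corresponding interior point methods (IPMs) but more than the iterative and successive convex approximation (SCA) approaches. Simulation results show that the proposed FA attains the same global optimal solution as the IPM in cognitive beamforming, and that it outperforms the iterative, IPM, and SCA approaches in classic transmit beamforming, RIS-aided transmit beamforming, and RIS-aided WPT, respectively.
Abstract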
This paper proposes a generalized Firefly Algorithm (FA) to solve an optimization framework having objective function and constraints as multivariate functions of independent optimization variables. Four representative examples of how the proposed generalized FA can be adopted to solve downlink beamforming problems are shown for a classic transmit beamforming, cognitive beamforming, reconfigurable-intelligent-surfaces-aided (RIS-aided) transmit beamforming, and RIS-aided wireless power transfer (WPT). Complexity analyses indicate that in large-antenna regimes the proposed FA approaches require less computational complexity than their corresponding interior point methods (IPMs) do, yet demand a higher complexity than the iterative and the successive convex approximation (SCA) approaches do. Simulation results reveal that the proposed FA attains the same global optimal solution as that of the IPM for an optimization problem in cognitive beamforming. On the other hand, the proposed FA approaches outperform the iterative, IPM and SCA in terms of obtaining better solution for optimization problems, respectively, for a classic transmit beamforming, RIS-aided transmit beamforming and RIS-aided WPT.
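Summary
The paper proposes a generalized Firefly Algorithm (FA) for optimization frameworks in which the objective function and constraints are multivariate functions of the optimization variables. It presents four examples of applying the generalized FA to downlink beamforming: classic transmit beamforming, cognitive beamforming, reconfigurable-intelligent-surfaces-aided (RIS-aided) transmit beamforming, and RIS-aided wireless power transfer (WPT). Complexity analysis indicates that, in large-antenna regimes, the proposed FA approaches have lower computational complexity than the corresponding interior point methods (IPMs) but higher complexity than the iterative and successive convex approximation (SCA) approaches. Simulation results show that the proposed FA reaches the same global optimal solution as the IPM in cognitive beamforming, and that it obtains better solutions than the iterative, IPM, and SCA approaches in classic transmit beamforming, RIS-aided transmit beamforming, and RIS-aided WPT, respectively.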
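To make the Firefly Algorithm mentioned in the abstract above concrete, here is a minimal sketch of the standard FA update rule applied to a toy unconstrained problem. It is not the paper's generalized FA: constraints are reduced to simple box bounds, and all parameter values (`alpha`, `beta0`, `gamma`, swarm size) are illustrative assumptions.

```python
import math
import random

def firefly_minimize(objective, dim, bounds, n_fireflies=15, n_iter=100,
                     alpha=0.2, beta0=1.0, gamma=1.0, seed=0):
    """Standard firefly update: dimmer fireflies move toward brighter ones."""
    rng = random.Random(seed)
    lo, hi = bounds
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    cost = [objective(x) for x in swarm]
    best_cost = min(cost)
    best_x = list(swarm[cost.index(best_cost)])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # j is brighter (lower cost), so i moves
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness decays with distance
                    swarm[i] = [
                        min(hi, max(lo, a + beta * (b - a) + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    cost[i] = objective(swarm[i])
                    if cost[i] < best_cost:  # remember the best point seen so far
                        best_cost, best_x = cost[i], list(swarm[i])
        alpha *= 0.97  # gradually shrink the random step
    return best_x, best_cost

def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)

best_x, best_cost = firefly_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

In a constrained setting such as beamforming, one common option is to add a penalty term to the objective for violated constraints; whether the paper does this is not stated in the abstract.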
DPSS-based Codebook Design for Near-Field XL-MIMO Channel Estimation
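results: Simulation results show that the proposed codebook design achieves better sparsification performance with a lower leakage effect and enables efficient estimation of near-field channels.
Abstract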
Future sixth-generation (6G) systems are expected to leverage extremely large-scale multiple-input multiple-output (XL-MIMO) technology, which significantly expands the range of the near-field region. While accurate channel estimation is essential for beamforming and data detection, the unique characteristics of near-field channels pose additional challenges to the effective acquisition of channel state information. In this paper, we propose a novel codebook design, which allows efficient near-field channel estimation with significantly reduced codebook size. Specifically, we consider the eigen-problem based on the near-field electromagnetic wave transmission model. Moreover, we derive the general form of the eigenvectors associated with the near-field channel matrix, revealing their noteworthy connection to the discrete prolate spheroidal sequence (DPSS). Based on the proposed near-field codebook design, we further introduce a two-step channel estimation scheme. Simulation results demonstrate that the proposed codebook design not only achieves superior sparsification performance of near-field channels with a lower leakage effect, but also significantly improves the accuracy in compressive sensing channel estimation.
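Summary
Future sixth-generation (6G) systems are expected to use extremely large-scale multiple-input multiple-output (XL-MIMO) technology, which significantly extends the near-field region. Accurate channel estimation is essential for beamforming and data detection, but the unique characteristics of near-field channels make it harder to acquire channel state information effectively. The paper proposes a new codebook design that enables efficient near-field channel estimation with a smaller codebook. Specifically, it considers the eigen-problem based on the near-field electromagnetic wave transmission model and derives the general form of the eigenvectors of the near-field channel matrix, revealing their connection to the discrete prolate spheroidal sequence (DPSS). Building on the codebook design, it further introduces a two-step channel estimation scheme. Simulation results show that the codebook design sparsifies near-field channels effectively and substantially improves the accuracy of compressive sensing channel estimation.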
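The discrete prolate spheroidal sequences referred to in the abstract above are classically defined as eigenvectors of a sinc-kernel matrix. The sketch below computes the first such sequence by power iteration using only the standard library; it follows the textbook DPSS definition, not the paper's near-field channel matrix or codebook, and the values of `n` and `w` are illustrative assumptions.

```python
import math

def prolate_matrix(n, w):
    """n x n matrix with entries sin(2*pi*w*(i-j)) / (pi*(i-j)) and 2*w on the diagonal."""
    return [[2 * w if i == j
             else math.sin(2 * math.pi * w * (i - j)) / (math.pi * (i - j))
             for j in range(n)] for i in range(n)]

def first_dpss(n, w, n_iter=500):
    """Dominant eigenvector of the prolate matrix, found by power iteration."""
    b = prolate_matrix(n, w)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(n_iter):
        u = [sum(row[j] * v[j] for j in range(n)) for row in b]
        norm = math.sqrt(sum(x * x for x in u))
        v = [x / norm for x in u]
    bv = [sum(row[j] * v[j] for j in range(n)) for row in b]
    eigenvalue = sum(x * y for x, y in zip(v, bv))  # Rayleigh quotient
    return v, eigenvalue

# Illustrative length and half-bandwidth (assumed values).
sequence, concentration = first_dpss(n=16, w=0.1)
```

The eigenvalue measures how much of the sequence's energy lies inside the band [-w, w], which is why DPSS-like vectors are attractive as basis functions with low leakage.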
results: The proposed converter is validated with behavioral simulations over several filter orders, center frequencies, and oversampling ratios. An op-amp circuit realization is also considered, showing the effects of first-order op-amp non-idealities. Finally, Monte Carlo simulations demonstrate robustness against component variations.
Abstract
In this paper, the design flexibility of the control-bounded analog-to-digital converter principle is demonstrated. A band-pass analog-to-digital converter is considered as an application and case study. We show how a low-pass control-bounded analog-to-digital converter can be translated into a band-pass version where the guaranteed stability, converter bandwidth, and signal-to-noise ratio are preserved while the center frequency for conversion can be positioned freely. The proposed converter is validated with behavioral simulations on several filter orders, center frequencies, and oversampling ratios. Additionally, we consider an op-amp circuit realization where the effects of first-order op-amp non-idealities are shown. Finally, robustness against component variations is demonstrated by Monte Carlo simulations.
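Summary
In this paper, we demonstrate the design flexibility of the control-bounded analog-to-digital converter principle, using a band-pass analog-to-digital converter as an application and case study. We show how a low-pass control-bounded converter can be translated into a band-pass version that preserves the guaranteed stability, converter bandwidth, and signal-to-noise ratio while allowing the center frequency to be positioned freely. We validate the converter with behavioral simulations over several filter orders, center frequencies, and oversampling ratios. We also consider an op-amp circuit realization, showing the effects of first-order op-amp non-idealities. Finally, Monte Carlo simulations demonstrate robustness against component variations.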
Probabilistic Constellation Shaping for OFDM-Based ISAC Signaling
results: The results show that the PCS method achieves a scalable S&C tradeoff, and numerical simulations confirm its superiority.
Abstract
Integrated Sensing and Communications (ISAC) has garnered significant attention as a promising technology for the upcoming sixth-generation wireless communication systems (6G). In pursuit of this goal, a common strategy is that a unified waveform, such as Orthogonal Frequency Division Multiplexing (OFDM), should serve dual-functional roles by enabling simultaneous sensing and communications (S&C) operations. However, the sensing performance of an OFDM communication signal is substantially affected by the randomness of the data symbols mapped from bit streams. Therefore, achieving a balance between preserving communication capability (i.e., the randomness) while improving sensing performance remains a challenging task. To cope with this issue, in this paper we analyze the ambiguity function of the OFDM communication signal modulated by random data. Subsequently, a probabilistic constellation shaping (PCS) method is proposed to devise the probability distributions of constellation points, which is able to strike a scalable S&C tradeoff of the random transmitted signal. Finally, the superiority of the proposed PCS method over conventional uniformly distributed constellations is validated through numerical simulations.
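Summary
Integrated sensing and communications (ISAC) has attracted wide attention as a promising technology for future sixth-generation (6G) wireless communication systems. A common strategy is to let a unified waveform, such as orthogonal frequency division multiplexing (OFDM), support simultaneous sensing and communications (S&C). However, the sensing performance of an OFDM communication signal is strongly affected by the randomness of its data symbols, so improving sensing while preserving communication capability (that is, the randomness) remains challenging. To address this, the paper analyzes the ambiguity function of an OFDM communication signal modulated by random data. It then proposes a probabilistic constellation shaping (PCS) method that designs the probability distribution of the constellation points and achieves a scalable S&C tradeoff. Finally, numerical simulations validate the advantage of the proposed PCS method over conventional uniformly distributed constellations.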
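The abstract above describes shaping the probabilities of constellation points to trade randomness (communication) against sensing performance, without stating the distribution family. As a minimal sketch only, the code below assumes a Maxwell-Boltzmann distribution over 16-QAM, a common choice in the shaping literature and not necessarily the paper's method. The function names and the shaping parameter `nu` are hypothetical.

```python
import math


def qam16_points():
    """Return the 16 points of a square 16-QAM constellation (unnormalized)."""
    levels = (-3, -1, 1, 3)
    return [complex(i, q) for i in levels for q in levels]


def maxwell_boltzmann(points, nu):
    """Assign probability proportional to exp(-nu * |x|^2) to each point.

    nu = 0 gives the uniform distribution; larger nu favors low-energy points.
    This family is an assumption made for illustration.
    """
    weights = [math.exp(-nu * abs(p) ** 2) for p in points]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(probs):
    """Shannon entropy in bits: a simple proxy for symbol randomness."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    pts = qam16_points()
    for nu in (0.0, 0.05, 0.2):
        probs = maxwell_boltzmann(pts, nu)
        print(f"nu={nu}: entropy={entropy_bits(probs):.3f} bits")
```

Sweeping `nu` moves the distribution from uniform (maximum entropy) toward a few dominant points, which is one way to picture a scalable tradeoff between randomness and determinism.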
New Fast Transform for Orthogonal Frequency Division Multiplexing
results: The FCT algorithm combines the effects of the fast complex-Walsh-Hadamard transform (CHT) and the fast Fourier transform (FFT) in a single transform. A new OFDM system using the FCT algorithm is proposed and evaluated. The results show that the proposed CT-OFDM reduces the peak-to-average power ratio (PAPR) and improves bit-error-rate (BER) performance compared with conventional OFDM.
Abstract
In this paper, a new fast and low complexity transform is introduced for orthogonal frequency division multiplexing (OFDM) wireless systems. The new transform combines the effects of fast complex-Walsh-Hadamard transform (CHT) and the fast Fourier transform (FFT) into a single unitary transform named in this paper as the complex transition transform (CTT). The development of a new algorithm for fast calculation of the CT transform called FCT is found to have all the desirable properties such as in-place computation, simple indexing scheme and considerably lower arithmetic complexity than existing algorithms. Furthermore, a new OFDM system using the FCT algorithm is introduced and its performance has been evaluated. The proposed CT-OFDM achieves a noticeable reduction in peak-to-average-power-ratio (PAPR) and a significant improvement in the bit-error-rate (BER) performance compared with the conventional OFDM.
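Summary
This paper introduces a new fast, low-complexity transform for orthogonal frequency division multiplexing (OFDM) wireless systems. The transform combines the effects of the fast complex-Walsh-Hadamard transform (CHT) and the fast Fourier transform (FFT) into a single unitary transform, called the complex transition transform (CTT). A new algorithm for computing it quickly, called FCT, offers in-place computation, a simple indexing scheme, and considerably lower arithmetic complexity than existing algorithms. A new OFDM system based on the FCT algorithm is also introduced and evaluated. Compared with conventional OFDM, the proposed CT-OFDM noticeably reduces the peak-to-average power ratio (PAPR) and significantly improves bit-error-rate (BER) performance.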
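The abstract above mentions in-place computation and PAPR without giving the FCT algorithm itself. As a rough illustration only, the sketch below shows an ordinary in-place fast Walsh-Hadamard butterfly (not the paper's CTT or FCT) and a PAPR calculation. The function names are hypothetical.

```python
import math


def fwht_inplace(x):
    """In-place fast Walsh-Hadamard transform (unnormalized, natural order).

    len(x) must be a power of two. The butterfly overwrites x and needs
    n * log2(n) additions/subtractions and no multiplications.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = x[i], x[i + h]
                x[i], x[i + h] = a + b, a - b
        h *= 2
    return x


def papr_db(samples):
    """Peak-to-average power ratio of a block of (complex) samples, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


if __name__ == "__main__":
    print(fwht_inplace([1, 0, 0, 0]))
    print(round(papr_db([2, 0, 0, 0]), 2))
```

Applying `fwht_inplace` twice returns the input scaled by its length, which is a quick sanity check on the butterfly indexing.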
Vision-Based Reconfigurable Intelligent Surface Beam Tracking for mmWave Communications
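results: The results show that inserting a reconfigurable intelligent surface introduces an additional multipath component. The power of this path can be critical in blockage scenarios, and a capacity increase is observed in both line-of-sight and non-line-of-sight scenarios.
Abstract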
Reconfigurable intelligent surfaces have emerged as a technology with the potential to enhance wireless communication performance for 5G and beyond. However, the technology comes with challenges in areas such as complexity, power consumption, and cost. This paper demonstrates a computer vision-based reconfigurable intelligent surface beamforming algorithm that addresses complexity and cost issues and analyzes the multipath components that arise from the insertion of such a device into the wireless channel. The results show that a reconfigurable intelligent surface can provide an additional multipath component. The power of this additional path can be critical in blockage scenarios, and a capacity increase can be perceived in both line-of-sight and non line-of-sight scenarios.
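Summary
Reconfigurable intelligent surfaces have emerged as a technology that could improve wireless communication performance for 5G and beyond. However, the technology faces challenges in complexity, power consumption, and cost. This paper demonstrates a computer-vision-based beamforming algorithm for reconfigurable intelligent surfaces that addresses the complexity and cost issues, and it analyzes the multipath components that arise when such a device is inserted into the wireless channel. The results show that a reconfigurable intelligent surface can provide an additional multipath component. The power of this path can be critical in blockage scenarios, and a capacity increase is observed in both line-of-sight and non-line-of-sight scenarios.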
SCAN-MUSIC: An Efficient Super-resolution Algorithm for Single Snapshot Wide-band Line Spectral Estimation
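results: The proposed algorithms reduce the computational complexity of the standard MUSIC algorithm while retaining a reasonable resolution. Their speed is comparable to state-of-the-art algorithms, and they are more reliable for line spectra with cluster structure.
Abstract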
We propose an efficient algorithm for reconstructing one-dimensional wide-band line spectra from their Fourier data in a bounded interval $[-\Omega,\Omega]$. While traditional subspace methods such as MUSIC achieve super-resolution for closely separated line spectra, their computational cost is high, particularly for wide-band line spectra. To address this issue, we proposed a scalable algorithm termed SCAN-MUSIC that scans the spectral domain using a fixed Gaussian window and then reconstructs the line spectra falling into the window at each time. For line spectra with cluster structure, we further refine the proposed algorithm using the annihilating filter technique. Both algorithms can significantly reduce the computational complexity of the standard MUSIC algorithm with a moderate loss of resolution. Moreover, in terms of speed, their performance is comparable to the state-of-the-art algorithms, while being more reliable for reconstructing line spectra with cluster structure. The algorithms are supplemented with theoretical analyses of error estimates, sampling complexity, computational complexity, and computational limit.
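Summary
We propose an efficient algorithm for reconstructing one-dimensional wide-band line spectra from their Fourier data in a bounded interval $[-\Omega,\Omega]$. Traditional subspace methods such as MUSIC achieve super-resolution for closely separated line spectra, but their computational cost is high, especially for wide-band line spectra. To address this, we propose a scalable algorithm, SCAN-MUSIC, which scans the spectral domain with a fixed Gaussian window and each time reconstructs the line spectra that fall inside the window. For line spectra with cluster structure, we further refine the algorithm using the annihilating filter technique. Both algorithms significantly reduce the computational complexity of the standard MUSIC algorithm at a moderate loss of resolution. Their speed is comparable to state-of-the-art algorithms, and they are more reliable for line spectra with cluster structure. We also provide theoretical analyses of error estimates, sampling complexity, computational complexity, and computational limits.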
User Association and Resource Allocation in Large Language Model Based Mobile Edge Computing System over Wireless Communications
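results: Through experiments, the paper demonstrates the effectiveness of the proposed DASHF algorithm and offers insights for delivering efficient language model services to mobile devices.
Abstract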
In the rapidly evolving landscape of large language models (LLMs) and mobile edge computing, the need for efficient service delivery to mobile users with constrained computational resources has become paramount. Addressing this, our paper delves into a collaborative framework for model training where user data and model adapters are shared with servers to optimize performance. Within this framework, users initially update the first several layers of the adapters while freezing the other layers of them, leveraging their local datasets. Once this step is complete, these partially trained parameters are transmitted to servers. The servers, equipped with more robust computational capabilities, then update the subsequent layers. After this training, they send the enhanced parameters back to the users. This collaborative training approach ensures that mobile users with limited computational capacities can still benefit from advanced LLM services without being burdened by exhaustive computations. Central to our methodology is the DASHF algorithm, which encapsulates the Dinkelbach algorithm, alternating optimization, semidefinite relaxation (SDR), the Hungarian method, and a pioneering fractional programming technique from our recent IEEE JSAC paper "Human-Centric Resource Allocation in the Metaverse over Wireless Communications". The crux of DASHF is its capability to reformulate an optimization problem as Quadratically Constrained Quadratic Programming (QCQP) via meticulously crafted transformations, making it solvable by SDR and the Hungarian algorithm. Through extensive simulations, we demonstrate the effectiveness of the DASHF algorithm, offering significant insights for the advancement of collaborative LLM service deployments.
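Summary
With large language models (LLMs) and mobile edge computing evolving rapidly, delivering services efficiently to mobile users with constrained computational resources has become essential. Our paper studies a collaborative training framework in which user data and model adapters are shared with servers to optimize performance. In this framework, users first update the first several layers of the adapters on their local datasets while freezing the remaining layers. Once this step is complete, they transmit the partially trained parameters to the servers. The servers, which have more computational capacity, update the subsequent layers and send the enhanced parameters back to the users. This collaborative approach lets mobile users with limited computational capacity benefit from advanced LLM services without heavy computation. Central to our method is the DASHF algorithm, which combines the Dinkelbach algorithm, alternating optimization, semidefinite relaxation (SDR), the Hungarian method, and a fractional programming technique from our recent IEEE JSAC paper "Human-Centric Resource Allocation in the Metaverse over Wireless Communications". The core of DASHF is that carefully designed transformations reformulate the optimization problem as Quadratically Constrained Quadratic Programming (QCQP), which SDR and the Hungarian algorithm can then solve. Extensive simulations demonstrate the effectiveness of DASHF and offer insights for deploying collaborative LLM services.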
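The abstract above names the Dinkelbach algorithm as one ingredient of DASHF but does not spell it out. The sketch below shows only the textbook Dinkelbach iteration for maximizing a ratio over a finite candidate set. It is not the DASHF algorithm, and the toy rate and cost functions are assumptions made purely for illustration.

```python
import math


def dinkelbach(candidates, numerator, denominator, tol=1e-9, max_iter=100):
    """Maximize numerator(x) / denominator(x) over a finite candidate list.

    Requires denominator(x) > 0 for every candidate. Each iteration solves
    the parametric problem max_x numerator(x) - lam * denominator(x) and
    then sets lam to the ratio achieved by that maximizer.
    """
    best = candidates[0]
    lam = numerator(best) / denominator(best)
    for _ in range(max_iter):
        best = max(candidates, key=lambda x: numerator(x) - lam * denominator(x))
        gap = numerator(best) - lam * denominator(best)
        if gap < tol:  # lam already equals the optimal ratio
            break
        lam = numerator(best) / denominator(best)
    return best, lam


def toy_rate(p):
    """Hypothetical achievable rate for transmit power p."""
    return math.log2(1.0 + p)


def toy_cost(p):
    """Hypothetical total power: transmit power plus a fixed circuit term."""
    return p + 1.0


if __name__ == "__main__":
    powers = [0.1 * k for k in range(1, 51)]
    p_star, ratio = dinkelbach(powers, toy_rate, toy_cost)
    print(f"best power = {p_star:.1f}, rate per unit power = {ratio:.4f}")
```

The parametric subproblem is where the other ingredients would enter in a real system; here it is solved by exhaustive search only because the candidate set is tiny.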
Resource Allocation for Near-Field Communications: Fundamentals, Tools, and Outlooks
paper_authors: Bokai Xu, Jiayi Zhang, Hongyang Du, Zhe Wang, Yuanwei Liu, Dusit Niyato, Bo Ai, Khaled B. Letaief
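for: This article studies resource allocation in near-field communication systems, with the goal of achieving high spectral efficiency (SE) and energy efficiency (EE).
methods: The article reviews numerical techniques and machine learning methods for near-field resource allocation and highlights their strengths and limitations.
results: The article identifies the resources available in near-field communication systems and summarizes optimization tools for addressing near-field resource allocation.
Abstract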
Extremely large-scale multiple-input-multiple output (XL-MIMO) is a promising technology to achieve high spectral efficiency (SE) and energy efficiency (EE) in future wireless systems. The larger array aperture of XL-MIMO makes communication scenarios closer to the near-field region. Therefore, near-field resource allocation is essential in realizing the above key performance indicators (KPIs). Moreover, the overall performance of XL-MIMO systems heavily depends on the channel characteristics of the selected users, eliminating interference between users through beamforming, power control, etc. The above resource allocation issue constitutes a complex joint multi-objective optimization problem since many variables and parameters must be optimized, including the spatial degree of freedom, rate, power allocation, and transmission technique. In this article, we review the basic properties of near-field communications and focus on the corresponding "resource allocation" problems. First, we identify available resources in near-field communication systems and highlight their distinctions from far-field communications. Then, we summarize optimization tools, such as numerical techniques and machine learning methods, for addressing near-field resource allocation, emphasizing their strengths and limitations. Finally, several important research directions of near-field communications are pointed out for further investigation.
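Summary
Extremely large-scale multiple-input multiple-output (XL-MIMO) is a promising technology for achieving high spectral efficiency (SE) and energy efficiency (EE) in future wireless systems. The larger array aperture of XL-MIMO brings communication scenarios closer to the near-field region, so near-field resource allocation is essential for realizing these key performance indicators (KPIs). Moreover, the overall performance of XL-MIMO systems depends heavily on the channel characteristics of the selected users and on eliminating inter-user interference through beamforming, power control, and similar techniques. This resource allocation problem is a complex joint multi-objective optimization problem, because many variables and parameters must be optimized, including the spatial degrees of freedom, rate, power allocation, and transmission technique. In this article, we review the basic properties of near-field communications and focus on the corresponding "resource allocation" problems. First, we identify the resources available in near-field communication systems and how they differ from far-field communications. Then, we summarize optimization tools, such as numerical techniques and machine learning methods, for near-field resource allocation, emphasizing their strengths and limitations. Finally, we point out several important research directions for near-field communications.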
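results: Compared with the unprocessed noisy data, PESQ improves by almost 0.84 points and STOI by 1%. The gain is close to that of the baseline methods, at a much lower computational cost.
Abstract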
Speech enhancement concerns the processes required to remove unwanted background sounds from the target speech to improve its quality and intelligibility. In this paper, a novel approach for single-channel speech enhancement is presented, using colored spectrograms. We propose the use of a deep neural network (DNN) architecture adapted from the pix2pix generative adversarial network (GAN) and train it over colored spectrograms of speech to denoise them. After denoising, the colors of spectrograms are translated to magnitudes of short-time Fourier transform (STFT) using a shallow regression neural network. These estimated STFT magnitudes are later combined with the noisy phases to obtain an enhanced speech. The results show an improvement of almost 0.84 points in the perceptual evaluation of speech quality (PESQ) and 1% in the short-term objective intelligibility (STOI) over the unprocessed noisy data. The gain in quality and intelligibility over the unprocessed signal is almost equal to the gain achieved by the baseline methods used for comparison with the proposed model, but at a much reduced computational cost. The proposed solution offers a comparative PESQ score at almost 10 times reduced computational cost than a similar baseline model that has generated the highest PESQ score trained on grayscaled spectrograms, while it provides only a 1% deficit in STOI at 28 times reduced computational cost when compared to another baseline system based on convolutional neural network-GAN (CNN-GAN) that produces the most intelligible speech.
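Summary
Speech enhancement removes unwanted background sounds from target speech to improve its quality and intelligibility. In this paper, we present a single-channel speech enhancement approach that uses colored spectrograms. We adapt a deep neural network (DNN) architecture from the pix2pix generative adversarial network (GAN) and train it on colored spectrograms of speech to denoise them. After denoising, a shallow regression neural network translates the spectrogram colors into short-time Fourier transform (STFT) magnitudes. These estimated magnitudes are then combined with the noisy phases to obtain the enhanced speech. Compared with the unprocessed noisy data, the results improve the perceptual evaluation of speech quality (PESQ) by almost 0.84 points and the short-term objective intelligibility (STOI) by 1%. This gain is almost equal to that of the baseline methods, but at a much lower computational cost. The proposed solution reaches a comparable PESQ score at almost 10 times lower cost than the baseline with the highest PESQ score, which was trained on grayscale spectrograms. It also gives up only 1% in STOI at 28 times lower cost than a convolutional neural network-GAN (CNN-GAN) baseline that produces the most intelligible speech.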
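The abstract above states that the estimated STFT magnitudes are combined with the noisy phases to obtain the enhanced speech. A minimal sketch of that recombination for one frame of STFT bins follows. The function name and the list-based frame layout are assumptions made for illustration, and the inverse STFT step is omitted.

```python
import cmath


def recombine_with_noisy_phase(est_magnitudes, noisy_stft):
    """Pair each estimated magnitude with the phase of the matching noisy bin.

    est_magnitudes: non-negative floats, one per frequency bin.
    noisy_stft: complex STFT bins of the noisy signal for the same frame.
    Returns complex bins that an inverse STFT could turn back into audio.
    """
    if len(est_magnitudes) != len(noisy_stft):
        raise ValueError("magnitude and STFT frames must have equal length")
    return [
        cmath.rect(mag, cmath.phase(z))
        for mag, z in zip(est_magnitudes, noisy_stft)
    ]


if __name__ == "__main__":
    noisy_frame = [1j, -3 + 0j, 1 + 1j]
    clean_mags = [2.0, 0.5, 1.0]
    print(recombine_with_noisy_phase(clean_mags, noisy_frame))
```

Reusing the noisy phase keeps the model's job limited to magnitudes, which is the design the abstract describes.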
Real-time Neonatal Chest Sound Separation using Deep Learning
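results: On the artificial dataset, the model improves objective distortion measures by 2.01 dB to 5.06 dB over previous methods and is at least 17 times faster in computation. It could therefore serve as a preprocessing step for any chest sound monitoring system.
Abstract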
Auscultation for neonates is a simple and non-invasive method of providing diagnosis for cardiovascular and respiratory disease. Such diagnosis often requires high-quality heart and lung sounds to be captured during auscultation. However, in most cases, obtaining such high-quality sounds is non-trivial due to the chest sounds containing a mixture of heart, lung, and noise sounds. As such, additional preprocessing is needed to separate the chest sounds into heart and lung sounds. This paper proposes a novel deep-learning approach to separate such chest sounds into heart and lung sounds. Inspired by the Conv-TasNet model, the proposed model has an encoder, decoder, and mask generator. The encoder consists of a 1D convolution model and the decoder consists of a transposed 1D convolution. The mask generator is constructed using stacked 1D convolutions and transformers. The proposed model outperforms previous methods in terms of objective distortion measures by 2.01 dB to 5.06 dB in the artificial dataset, as well as computation time, with at least a 17-time improvement. Therefore, our proposed model could be a suitable preprocessing step for any phonocardiogram-based health monitoring system.
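Summary
Auscultation of neonates is a simple, non-invasive way to diagnose cardiovascular and respiratory disease. Such diagnosis often requires high-quality heart and lung sounds, but in most cases these are hard to obtain because chest sounds contain a mixture of heart, lung, and noise sounds. Additional preprocessing is therefore needed to separate chest sounds into heart and lung sounds. This paper proposes a new deep-learning approach for this separation. Inspired by the Conv-TasNet model, the proposed model has an encoder, a decoder, and a mask generator. The encoder is a 1D convolution, the decoder is a transposed 1D convolution, and the mask generator is built from stacked 1D convolutions and transformers. On the artificial dataset, the model outperforms previous methods by 2.01 dB to 5.06 dB in objective distortion measures and is at least 17 times faster in computation. The proposed model could therefore be a suitable preprocessing step for any phonocardiogram-based health monitoring system.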
Multi-Speaker Expressive Speech Synthesis via Semi-supervised Contrastive Learning
results: With semi-supervised training on multi-domain data, the improved VITS model can synthesize speech with diverse styles and emotions.
Abstract
This paper aims to build an expressive TTS system for multi-speakers, synthesizing a target speaker's speech with multiple styles and emotions. To this end, we propose a novel contrastive learning-based TTS approach to transfer style and emotion across speakers. Specifically, we construct positive-negative sample pairs at both utterance and category (such as emotion-happy or style-poet or speaker A) levels and leverage contrastive learning to better extract disentangled style, emotion, and speaker representations from speech. Furthermore, we introduce a semi-supervised training strategy to the proposed approach to effectively leverage multi-domain data, including style-labeled data, emotion-labeled data, and unlabeled data. We integrate the learned representations into an improved VITS model, enabling it to synthesize expressive speech with diverse styles and emotions for a target speaker. Experiments on multi-domain data demonstrate the good design of our model.
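Summary
This paper aims to build an expressive multi-speaker text-to-speech (TTS) system that synthesizes a target speaker's speech in multiple styles and emotions. To this end, we propose a contrastive learning-based TTS approach that transfers style and emotion across speakers. Specifically, we construct positive-negative sample pairs at both the utterance level and the category level (such as emotion-happy, style-poet, or speaker A) and use contrastive learning to better extract disentangled style, emotion, and speaker representations from speech. We also introduce a semi-supervised training strategy to make effective use of multi-domain data, including style-labeled, emotion-labeled, and unlabeled data. We integrate the learned representations into an improved VITS model, enabling it to synthesize expressive speech with diverse styles and emotions for a target speaker. Experiments on multi-domain data support the model design.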
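results: Compared with the baselines from the VoicePrivacy 2022 challenge, the method performs better on both privacy and utility in paralinguistic tasks such as emotion recognition and depression classification, and it achieves comparable performance on intent classification.
Abstract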
Existing privacy-preserving speech representation learning methods target a single application domain. In this paper, we present a novel framework to anonymize utterance-level speech embeddings generated by pre-trained encoders and show its effectiveness for a range of speech classification tasks. Specifically, given the representations from a pre-trained encoder, we train a Transformer to estimate the representations for the same utterances spoken by other speakers. During inference, the extracted representations can be converted into different identities to preserve privacy. We compare the results with the voice anonymization baselines from the VoicePrivacy 2022 challenge. We evaluate our framework on speaker identification for privacy and emotion recognition, depression classification, and intent classification for utility. Our method outperforms the baselines on privacy and utility in paralinguistic tasks and achieves comparable performance for intent classification.
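Summary
Existing privacy-preserving speech representation learning methods target a single application domain. In this paper, we present a new framework that anonymizes utterance-level speech embeddings generated by pre-trained encoders, and we show its effectiveness on a range of speech classification tasks. Specifically, given the representations from a pre-trained encoder, we train a Transformer to estimate the representations of the same utterances as spoken by other speakers. During inference, the extracted representations can be converted to different identities to preserve privacy. We compare against the voice anonymization baselines from the VoicePrivacy 2022 challenge, evaluating privacy with speaker identification and utility with emotion recognition, depression classification, and intent classification. Our method outperforms the baselines on privacy and utility in paralinguistic tasks and achieves comparable performance on intent classification.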