methods: The method uses a morphology-based lesion segmentation model together with a Class Activation Map (CAM) to precisely localize lesions.
results: The method achieves good performance, with a Dice score of 74.39%; its Hausdorff distance of 24.27 is lower than that of the supervised comparison method.
Abstract
Breast cancer diagnosis challenges both patients and clinicians, with early detection being crucial for effective treatment. Ultrasound imaging plays a key role in this, but its utility is hampered by the need for precise lesion segmentation, a task that is both time-consuming and labor-intensive. To address these challenges, we propose a new framework: a morphology-enhanced, Class Activation Map (CAM)-guided model, which is optimized using a computer vision foundation model known as SAM. This innovative framework is specifically designed for weakly supervised lesion segmentation in early-stage breast ultrasound images. Our approach uniquely leverages image-level annotations, which removes the requirement for detailed pixel-level annotation. Initially, we perform a preliminary segmentation using breast lesion morphology knowledge. Following this, we accurately localize lesions by extracting semantic information through a CAM-based heatmap. These two elements are then fused together, serving as a prompt to guide the SAM in performing refined segmentation. Subsequently, post-processing techniques are employed to rectify topological errors made by the SAM. Our method not only simplifies the segmentation process but also attains accuracy comparable to supervised learning methods that rely on pixel-level annotation. Our framework achieves a Dice score of 74.39% on the test set, demonstrating comparable performance with supervised learning methods. Additionally, it outperforms a supervised learning model in terms of the Hausdorff distance, scoring 24.27 compared to Deeplabv3+'s 32.22. These experimental results showcase its feasibility and superior performance in integrating weakly supervised learning with SAM. The code is made available at: https://github.com/YueXin18/MorSeg-CAM-SAM.
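To illustrate the prompt-construction step described in the abstract, the sketch below fuses a preliminary morphology mask with a CAM heatmap into a box and a point prompt for SAM. The threshold, the intersection-based fusion rule, and the `sam_predictor` handle in the trailing comment are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def fuse_masks_to_sam_prompts(morph_mask, cam_heatmap, cam_thresh=0.5):
    """Fuse a rough morphology mask with a CAM heatmap into SAM-style prompts.

    morph_mask  : (H, W) binary preliminary segmentation from morphology knowledge
    cam_heatmap : (H, W) CAM activations in [0, 1] localizing the lesion
    Returns a bounding box (x0, y0, x1, y1) plus one positive point prompt.
    """
    cam_mask = cam_heatmap >= cam_thresh                        # keep confident activations
    fused = np.logical_and(morph_mask.astype(bool), cam_mask)   # assumed fusion rule
    if not fused.any():                                         # fall back to the CAM alone
        fused = cam_mask
    ys, xs = np.nonzero(fused)
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    # one foreground point at the CAM peak inside the fused region
    peak = np.argmax(np.where(fused, cam_heatmap, -np.inf))
    py, px = np.unravel_index(peak, cam_heatmap.shape)
    return box, (int(px), int(py)), 1                           # label 1 = foreground

# The box and point could then be passed to a SAM predictor, e.g.
# masks, _, _ = sam_predictor.predict(point_coords=np.array([[px, py]]),
#                                     point_labels=np.array([1]),
#                                     box=np.array(box))
```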
Best uses of ChatGPT and Generative AI for computer science research
paper_authors: Eduardo C. Garrido-Merchan
for: This paper explores the diverse applications of ChatGPT and other generative AI technologies in computer science academic research, with a focus on using these tools to boost the productivity of computer research scientists.
methods: The paper highlights innovative uses of generative AI, such as brainstorming research ideas, aiding in the drafting and styling of academic papers, and assisting in the synthesis of state-of-the-art sections.
results: The paper makes recommendations for using generative AI to improve the productivity of computer research scientists, including using these tools for synthetic data creation, research methodology, and mentorship, as well as for task organization and article quality assessment. Additionally, the paper explores the capabilities of generative AI in disseminating ideas, generating images and audio, text transcription, and engaging with editors.
Abstract
Generative Artificial Intelligence (AI), particularly tools like OpenAI's popular ChatGPT, is reshaping the landscape of computer science research. Used wisely, these tools can boost the productivity of a computer research scientist. This paper provides an exploration of the diverse applications of ChatGPT and other generative AI technologies in computer science academic research, making recommendations about the use of generative AI to make the role of the computer research scientist more productive, with a focus on writing new research papers. We highlight innovative uses such as brainstorming research ideas, aiding in the drafting and styling of academic papers, and assisting in the synthesis of state-of-the-art sections. Further, we delve into using these technologies in understanding interdisciplinary approaches, making complex texts simpler, and recommending suitable academic journals for publication. Significant focus is placed on generative AI's contributions to synthetic data creation, research methodology, and mentorship, as well as to task organization and article quality assessment. The paper also addresses the utility of AI in article review, adapting texts to length constraints, constructing counterarguments, and survey development. Moreover, we explore the capabilities of these tools in disseminating ideas, generating images and audio, text transcription, and engaging with editors. We also describe some non-recommended uses of generative AI for computer science research, mainly because of the limitations of this technology.
Deep Coherence Learning: An Unsupervised Deep Beamformer for High Quality Single Plane Wave Imaging in Medical Ultrasound
results: Simulation, phantom, and in vivo studies show that the proposed DL-DCL achieves spatial resolution comparable to DMAS with 1-PW and DAS with 75-PWs, and superior contrast resolution over all comparison methods.
Abstract
Plane wave imaging (PWI) in medical ultrasound is becoming an important reconstruction method with high frame rates and new clinical applications. Recently, single PWI based on deep learning (DL) has been studied to overcome the lowered frame rates of traditional PWI with multiple PW transmissions. However, due to the lack of appropriate ground truth images, improving the performance of DL-based PWI remains challenging. To address this issue, in this paper, we propose a new unsupervised learning approach, i.e., deep coherence learning (DCL)-based DL beamformer (DL-DCL), for high-quality single PWI. In DL-DCL, the DL network is trained to predict highly correlated signals with a unique loss function from a set of PW data, and the trained DL model encourages high-quality PWI from low-quality single PW data. In addition, the DL-DCL framework based on complex baseband signals enables a universal beamformer. To assess the performance of DL-DCL, simulation, phantom and in vivo studies were conducted with public datasets, and it was compared with traditional beamformers (i.e., DAS with 75-PWs and DMAS with 1-PW) and other DL-based methods (i.e., a supervised learning approach with 1-PW and a generative adversarial network (GAN) with 1-PW). From the experiments, the proposed DL-DCL showed results comparable to DMAS with 1-PW and DAS with 75-PWs in spatial resolution, and it outperformed all comparison methods in contrast resolution. These results demonstrate that the proposed unsupervised learning approach can address the inherent limitations of DL-based PWI, and it also showed great potential in clinical settings with minimal artifacts.
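The abstract does not spell out the coherence loss, so the following is only a plausible sketch of the idea of rewarding highly correlated signals: a normalized-correlation objective between the frame predicted from a single PW and a reference compounded from multiple PW transmissions. The network `net`, the reference construction, and the loss form are assumptions.

```python
import torch

def normalized_correlation_loss(pred, ref, eps=1e-8):
    """1 minus the Pearson correlation of two frames: higher coherence gives lower loss."""
    pred = pred - pred.mean()
    ref = ref - ref.mean()
    corr = (pred * ref).sum() / (pred.norm() * ref.norm() + eps)
    return 1.0 - corr

def train_step(net, single_pw, multi_pw_stack, optimizer):
    """One unsupervised step: predict from one PW, compare against a multi-PW compound."""
    optimizer.zero_grad()
    pred = net(single_pw)                  # frame reconstructed from a single PW
    ref = multi_pw_stack.mean(dim=0)       # coherent compound of several PW transmissions
    loss = normalized_correlation_loss(pred, ref)
    loss.backward()
    optimizer.step()
    return loss.item()
```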
Mitigating Exposure Bias in Discriminator Guided Diffusion Models
methods: incorporating an auxiliary term derived from a discriminator network, modifying the sampling approach
results: Achieving an FID score of 1.73 on the unconditional CIFAR-10 dataset, outperforming the current state-of-the-art.
Abstract
Diffusion Models have demonstrated remarkable performance in image generation. However, their demanding computational requirements for training have prompted ongoing efforts to enhance the quality of generated images through modifications in the sampling process. A recent approach, known as Discriminator Guidance, seeks to bridge the gap between the model score and the data score by incorporating an auxiliary term, derived from a discriminator network. We show that despite significantly improving sample quality, this technique has not resolved the persistent issue of Exposure Bias and we propose SEDM-G++, which incorporates a modified sampling approach, combining Discriminator Guidance and Epsilon Scaling. Our proposed approach outperforms the current state-of-the-art, by achieving an FID score of 1.73 on the unconditional CIFAR-10 dataset.
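As a rough illustration of the two ingredients named in the abstract, the sketch below applies a discriminator-guidance correction to the model score and rescales the implied noise prediction (Epsilon Scaling) inside one Euler-style reverse step of an EDM-like sampler. The guidance weight, scaling factor, and the `score_model` / `discriminator_grad` callables are placeholders, not the exact SEDM-G++ sampler.

```python
import torch

def guided_step(x, sigma, sigma_next, score_model, discriminator_grad,
                guidance_weight=1.0, lambda_es=1.0):
    """One Euler reverse-diffusion step with discriminator guidance and epsilon scaling."""
    score = score_model(x, sigma)                                    # base score at noise level sigma
    score = score + guidance_weight * discriminator_grad(x, sigma)   # gradient of log(D / (1 - D))
    eps = -sigma * score                                             # noise implied by the score
    eps = eps / lambda_es                                            # Epsilon Scaling against exposure bias
    return x + (sigma_next - sigma) * eps                            # probability-flow Euler update

# usage sketch (callables and noise schedule `sigmas` are assumed):
# x = torch.randn(16, 3, 32, 32) * sigmas[0]
# for i in range(len(sigmas) - 1):
#     x = guided_step(x, sigmas[i], sigmas[i + 1], score_model, discriminator_grad)
```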
Contextualizing Internet Memes Across Social Media Platforms
paper_authors: Saurav Joshi, Filip Ilievski, Luca Luceri
for: This study aims to holistically track, identify, and map internet memes posted on social media, the media form through which ideas spread on the web.
methods: The study uses a semantic repository of knowledge, a knowledge graph, matching candidate meme posts against the memes cataloged in the knowledge graph to identify and map them.
results: The study finds that memes published online can be identified and mapped by matching them against the knowledge graph. It also reveals differences in meme prevalence across platforms, popular memes, and common meme channels and subreddits, and finally shows how the knowledge graph can provide context for memes on social media.
Abstract
Internet memes have emerged as a novel format for communication and expressing ideas on the web. Their fluidity and creative nature are reflected in their widespread use, often across platforms and occasionally for unethical or harmful purposes. While computational work has already analyzed their high-level virality over time and developed specialized classifiers for hate speech detection, there have been no efforts to date that aim to holistically track, identify, and map internet memes posted on social media. To bridge this gap, we investigate whether internet memes across social media platforms can be contextualized by using a semantic repository of knowledge, namely, a knowledge graph. We collect thousands of potential internet meme posts from two social media platforms, namely Reddit and Discord, and perform an extract-transform-load procedure to create a data lake with candidate meme posts. By using vision transformer-based similarity, we match these candidates against the memes cataloged in a recently released knowledge graph of internet memes, IMKG. We provide evidence that memes published online can be identified by mapping them to IMKG. We leverage this grounding to study the prevalence of memes on different platforms, discover popular memes, and select common meme channels and subreddits. Finally, we illustrate how the grounding can enable users to get context about memes on social media thanks to their link to the knowledge graph.
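A minimal sketch of the matching step, assuming vision-transformer embeddings have already been computed for both the candidate posts and the IMKG catalog; the cosine-similarity threshold and array shapes are illustrative.

```python
import numpy as np

def match_candidates_to_catalog(candidate_embs, catalog_embs, threshold=0.85):
    """Match candidate meme posts to cataloged IMKG memes by cosine similarity.

    candidate_embs : (N, D) ViT embeddings of candidate posts
    catalog_embs   : (M, D) ViT embeddings of memes cataloged in IMKG
    Returns the best-matching catalog index per candidate, or -1 if below the threshold.
    """
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    k = catalog_embs / np.linalg.norm(catalog_embs, axis=1, keepdims=True)
    sims = c @ k.T                                         # (N, M) cosine similarities
    best = sims.argmax(axis=1)
    best_sim = sims[np.arange(len(best)), best]
    return np.where(best_sim >= threshold, best, -1)
```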
A Principled Framework for Knowledge-enhanced Large Language Model
results: By dissecting the framework, the contribution of each component to the LLMs' performance is illustrated, with a theoretical assurance of improved reasoning under well-defined assumptions.
Abstract
Large Language Models (LLMs) are versatile, yet they often falter in tasks requiring deep and reliable reasoning due to issues like hallucinations, limiting their applicability in critical scenarios. This paper introduces a rigorously designed framework for creating LLMs that effectively anchor knowledge and employ a closed-loop reasoning process, enhancing their capability for in-depth analysis. We dissect the framework to illustrate the contribution of each component to the LLMs' performance, offering a theoretical assurance of improved reasoning under well-defined assumptions.
Bayesian Neural Networks: A Min-Max Game Framework
results: Experimental results show that the method is comparable to the existing closed-loop transcription neural network and provides another view of Bayesian neural networks.
Abstract
Bayesian neural networks use random variables to describe the network, rather than being deterministic neural networks, and are mostly trained by variational inference, which updates the mean and variance at the same time. Here, we formulate Bayesian neural networks as a min-max game problem. We conduct experiments on the MNIST dataset, and the primary result is comparable to the existing closed-loop transcription neural network. Finally, we reveal the connections between Bayesian neural networks and closed-loop transcription neural networks, show that our framework is rather practical, and provide another view of Bayesian neural networks.
An Improved Neural Network Model Based On CNN Using For Fruit Sugar Degree Detection
results: The paper processes and analyzes fruit spectral data and compares different neural network models with traditional parameter-selection methods. The results show that the artificial neural network model can accurately detect fruit sugar degree and is more effective than traditional parameter-selection methods. In addition, the paper proposes a new evaluation standard derived from the dataset standard deviation (STD) for assessing detection performance.
Abstract
Artificial Intelligence (AI) is widely applied in image classification and recognition, text understanding, and natural language processing, where it has made great progress. In this paper, we introduce AI into the field of fruit quality detection. We designed a fruit sugar degree regression model using an artificial neural network based on the spectra of fruits within the visible/near-infrared (V/NIR) range. After analyzing the fruit spectra, we innovatively proposed a new neural network structure: the low layers consist of a Multilayer Perceptron (MLP), the middle layer is a 2-dimensional correlation matrix layer, and the high layers consist of several Convolutional Neural Network (CNN) layers. In this study, we used fruit sugar value as the detection target, collected samples of two fruits, Gan Nan Navel and Tian Shan Pear, conducted experiments on each, and compared the results. We used Analysis of Variance (ANOVA) to evaluate the reliability of the dataset we collected. Then, we tried multiple strategies for processing the spectrum data and evaluated their effects. We applied Wavelet Decomposition (WD) to reduce the feature dimensions and a Genetic Algorithm (GA) to find excellent features. We then compared the neural network models with traditional Partial Least Squares (PLS)-based models, and compared the neural network structure we designed (MLP-CNN) with other traditional neural network structures. Finally, we proposed a new evaluation standard derived from the dataset standard deviation (STD) for evaluating detection performance, validating the viability of using an artificial neural network model for nondestructive detection of fruit sugar degree.
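The proposed structure stacks an MLP, a 2-dimensional correlation-matrix layer, and convolutional layers. The abstract does not define the correlation layer precisely, so the sketch below treats it as the outer product of the MLP feature vector with itself, which yields a 2-D map the CNN can consume; layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MLPCorrCNN(nn.Module):
    """Sketch of the MLP -> 2-D correlation matrix -> CNN regression model."""
    def __init__(self, n_bands, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(n_bands, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim), nn.ReLU())
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, 1)                    # sugar-degree regression output

    def forward(self, spectra):
        f = self.mlp(spectra)                           # (B, feat_dim)
        corr = f.unsqueeze(2) * f.unsqueeze(1)          # (B, feat_dim, feat_dim) outer product
        x = self.cnn(corr.unsqueeze(1))                 # add a channel dim -> (B, 32, 1, 1)
        return self.head(x.flatten(1)).squeeze(1)       # (B,) predicted sugar degree

# usage: model = MLPCorrCNN(n_bands=256); y = model(torch.randn(8, 256))
```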
Utilizing Speech Emotion Recognition and Recommender Systems for Negative Emotion Handling in Therapy Chatbots
paper_authors: Farideh Majidi, Marzieh Bahrami
for: Enhancing therapy chatbots with auditory perception so they can understand users' emotions and provide human-like empathy.
methods: Uses speech emotion recognition (SER) with a Convolutional Neural Network (CNN) model and the ShEMO dataset to accurately detect and classify negative emotions, including anger, fear, and sadness. A recommender system built on GloVe and LSTM models uses the SER output to generate personalized recommendations for managing negative emotions.
results: The SER model achieves 88% validation accuracy in detecting and classifying negative emotions from speech signals, and the recommender model achieves 98% accuracy. By integrating the GlowTTS text-to-speech model, the chatbot can deliver empathetic recommendations audibly in both English and Persian.
Abstract
Emotional well-being significantly influences mental health and overall quality of life. As therapy chatbots become increasingly prevalent, their ability to comprehend and respond empathetically to users' emotions remains limited. This paper addresses this limitation by proposing an approach to enhance therapy chatbots with auditory perception, enabling them to understand users' feelings and provide human-like empathy. The proposed method incorporates speech emotion recognition (SER) techniques using Convolutional Neural Network (CNN) models and the ShEMO dataset to accurately detect and classify negative emotions, including anger, fear, and sadness. The SER model achieves a validation accuracy of 88%, demonstrating its effectiveness in recognizing emotional states from speech signals. Furthermore, a recommender system is developed, leveraging the SER model's output to generate personalized recommendations for managing negative emotions, for which a new bilingual dataset was generated as well since there is no such dataset available for this task. The recommender model achieves an accuracy of 98% by employing a combination of global vectors for word representation (GloVe) and LSTM models. To provide a more immersive and empathetic user experience, a text-to-speech model called GlowTTS is integrated, enabling the therapy chatbot to audibly communicate the generated recommendations to users in both English and Persian. The proposed approach offers promising potential to enhance therapy chatbots by providing them with the ability to recognize and respond to users' emotions, ultimately improving the delivery of mental health support for both English and Persian-speaking users.
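A minimal sketch of a CNN classifier over log-mel spectrogram inputs for the three negative emotions named in the abstract (anger, fear, sadness); the spectrogram size, channel widths, and training details are assumptions rather than the authors' exact ShEMO-trained architecture.

```python
import torch
import torch.nn as nn

class SERCNN(nn.Module):
    """Sketch of a speech-emotion-recognition CNN over (1, n_mels, n_frames) spectrograms."""
    def __init__(self, n_classes=3):        # anger, fear, sadness
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, spec):
        return self.classifier(self.features(spec).flatten(1))

# usage: logits = SERCNN()(torch.randn(4, 1, 64, 128)); emotion = logits.argmax(dim=1)
```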
Environment-Aware Dynamic Graph Learning for Out-of-Distribution Generalization
paper_authors: Haonan Yuan, Qingyun Sun, Xingcheng Fu, Ziwei Zhang, Cheng Ji, Hao Peng, Jianxin Li
for: This paper focuses on improving the out-of-distribution (OOD) generalization of dynamic graph neural networks (DGNNs) by modeling complex coupled environments and exploiting spatio-temporal invariant patterns.
methods: The proposed Environment-Aware dynamic Graph LEarning (EAGLE) framework includes an environment-aware EA-DGNN to model environments, an environment instantiation mechanism to diversify environments, and an invariant pattern recognition mechanism to discriminate spatio-temporal invariant patterns for OOD prediction.
results: The proposed EAGLE framework achieves superior performance compared to state-of-the-art baselines under distribution shifts, demonstrating its effectiveness in improving OOD generalization on dynamic graphs.
Abstract
Dynamic graph neural networks (DGNNs) are increasingly pervasive in exploiting spatio-temporal patterns on dynamic graphs. However, existing works fail to generalize under distribution shifts, which are common in real-world scenarios. As the generation of dynamic graphs is heavily influenced by latent environments, investigating their impacts on the out-of-distribution (OOD) generalization is critical. However, it remains unexplored with the following two major challenges: (1) How to properly model and infer the complex environments on dynamic graphs with distribution shifts? (2) How to discover invariant patterns given inferred spatio-temporal environments? To solve these challenges, we propose a novel Environment-Aware dynamic Graph LEarning (EAGLE) framework for OOD generalization by modeling complex coupled environments and exploiting spatio-temporal invariant patterns. Specifically, we first design the environment-aware EA-DGNN to model environments by multi-channel environments disentangling. Then, we propose an environment instantiation mechanism for environment diversification with inferred distributions. Finally, we discriminate spatio-temporal invariant patterns for out-of-distribution prediction by the invariant pattern recognition mechanism and perform fine-grained causal interventions node-wisely with a mixture of instantiated environment samples. Experiments on real-world and synthetic dynamic graph datasets demonstrate the superiority of our method against state-of-the-art baselines under distribution shifts. To the best of our knowledge, we are the first to study OOD generalization on dynamic graphs from the environment learning perspective.
$\varepsilon$-fractional Core Stability in Hedonic Games
paper_authors: Simone Fioravanti, Michele Flammini, Bojana Kodric, Giovanna Varricchio
for: This paper focuses on the problem of coalition formation in hedonic games, where agents are strategic and have individual preferences. The goal is to find a stable coalition structure that satisfies some form of stability, such as core-stability.
methods: The paper proposes a new notion of $\varepsilon$-fractional core-stability, which allows for a fraction of coalitions to core-block, and designs efficient algorithms to find such partitions for two fundamental classes of hedonic games. The paper also explores the use of probabilistic sampling to learn valuations and compute outcomes that are $\varepsilon$-fractional core-stable.
results: The paper shows that the proposed notion of $\varepsilon$-fractional core-stability can guarantee both existence and polynomial-time computation, and provides efficient algorithms for finding such partitions in two fundamental classes of hedonic games. The paper also gives positive and negative results on which distributions allow for the efficient computation of outcomes that are $\varepsilon$-fractional core-stable with arbitrarily high confidence in a PAC-learning fashion.
Abstract
Hedonic Games (HGs) are a classical framework modeling coalition formation of strategic agents guided by their individual preferences. According to these preferences, it is desirable that a coalition structure (i.e. a partition of agents into coalitions) satisfies some form of stability. The most well-known and natural of such notions is arguably core-stability. Informally, a partition is core-stable if no subset of agents would like to deviate by regrouping in a so-called core-blocking coalition. Unfortunately, core-stable partitions seldom exist and even when they do, it is often computationally intractable to find one. To circumvent these problems, we propose the notion of $\varepsilon$-fractional core-stability, where at most an $\varepsilon$-fraction of all possible coalitions is allowed to core-block. It turns out that such a relaxation may guarantee both existence and polynomial-time computation. Specifically, we design efficient algorithms returning an $\varepsilon$-fractional core-stable partition, with $\varepsilon$ exponentially decreasing in the number of agents, for two fundamental classes of HGs: Simple Fractional and Anonymous. From a probabilistic point of view, being the definition of $\varepsilon$-fractional core equivalent to requiring that uniformly sampled coalitions core-block with probability lower than $\varepsilon$, we further extend the definition to handle more complex sampling distributions. Along this line, when valuations have to be learned from samples in a PAC-learning fashion, we give positive and negative results on which distributions allow the efficient computation of outcomes that are $\varepsilon$-fractional core-stable with arbitrarily high confidence.
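Restating the relaxation from the abstract in symbols (the notation $\mathcal{C}$ for the set of admissible coalitions and $\pi$ for a partition is introduced here only for exposition):

```latex
% \pi is \varepsilon-fractionally core-stable if at most an \varepsilon-fraction
% of all possible coalitions core-blocks it:
\frac{\bigl|\{\, S \in \mathcal{C} : S \text{ core-blocks } \pi \,\}\bigr|}{|\mathcal{C}|} \;\le\; \varepsilon
% Equivalently (probabilistic reading): a uniformly sampled coalition core-blocks
% with probability lower than \varepsilon:
\Pr_{S \sim \mathrm{Unif}(\mathcal{C})}\bigl[\, S \text{ core-blocks } \pi \,\bigr] \;<\; \varepsilon
```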
Introducing NCL-SM: A Fully Annotated Dataset of Images from Human Skeletal Muscle Biopsies
results: The paper releases a new, high-quality bioimaging dataset that can be used to develop automated, precise, reproducible analysis of SM tissue images. The dataset includes more than 50,000 manually segmented muscle fibres (myofibres), each of which has been quality-checked and annotated.
Abstract
Single cell analysis of skeletal muscle (SM) tissue is a fundamental tool for understanding many neuromuscular disorders. For this analysis to be reliable and reproducible, identification of individual fibres within microscopy images (segmentation) of SM tissue should be precise. There is currently no tool or pipeline that makes automatic and precise segmentation and curation of images of SM tissue cross-sections possible. Biomedical scientists in this field rely on custom tools and general machine learning (ML) models, both followed by labour-intensive and subjective manual interventions to get the segmentation right. We believe that automated, precise, reproducible segmentation is possible by training ML models. However, there are currently no good quality, publicly available annotated imaging datasets available for ML model training. In this paper we release NCL-SM: a high quality bioimaging dataset of 46 human tissue sections from healthy control subjects and from patients with genetically diagnosed muscle pathology. These images include more than 50k manually segmented muscle fibres (myofibres). In addition, we also curated high quality myofibres and annotated reasons for rejecting low quality myofibres and regions in SM tissue images, making this data completely ready for downstream analysis. This, we believe, will pave the way for the development of a fully automatic pipeline that identifies individual myofibres within images of tissue sections and, in particular, also classifies individual myofibres that are fit for further analysis.
Radiology Report Generation Using Transformers Conditioned with Non-imaging Data
paper_authors: Nurbanu Aksoy, Nishant Ravikumar, Alejandro F Frangi
for: This study aims to improve the efficiency of medical image interpretation by using multi-modal data to enhance radiology report generation.
methods: The study uses a novel multi-modal transformer network that integrates chest X-ray (CXR) images with associated patient demographic information to generate patient-specific radiology reports.
results: Based on the evaluation metrics used, including patient demographic information improves the quality of the generated radiology reports relative to a baseline network that uses CXR images alone.
Abstract
Medical image interpretation is central to most clinical applications such as disease diagnosis, treatment planning, and prognostication. In clinical practice, radiologists examine medical images and manually compile their findings into reports, which can be a time-consuming process. Automated approaches to radiology report generation, therefore, can reduce radiologist workload and improve efficiency in the clinical pathway. While recent deep-learning approaches for automated report generation from medical images have seen some success, most studies have relied on image-derived features alone, ignoring non-imaging patient data. Although a few studies have included word-level contexts along with the image, the use of patient demographics remains unexplored. This paper proposes a novel multi-modal transformer network that integrates chest X-ray (CXR) images and associated patient demographic information to synthesise patient-specific radiology reports. The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information to synthesise full-text radiology reports. Data from two public databases were used to train and evaluate the proposed approach: CXRs and reports were extracted from the MIMIC-CXR database and combined with the corresponding patients' data from MIMIC-IV. Based on the evaluation metrics used, including patient demographic information was found to improve the quality of the generated reports relative to a baseline network trained using CXRs alone. The proposed approach shows potential for enhancing radiology report generation by leveraging rich patient metadata and combining semantic text embeddings derived thereof with medical image-derived visual features.
results: The proposed Tensor Attention and Tensor Interaction Mechanism enhances the computational efficiency and expressiveness of deep networks, and can even be generalized into the quantum realm.
Abstract
In this paper, we delve into the foundational principles of tensor categories, harnessing the universal property of the tensor product to pioneer novel methodologies in deep network architectures. Our primary contribution is the introduction of the Tensor Attention and Tensor Interaction Mechanism, a groundbreaking approach that leverages the tensor category to enhance the computational efficiency and the expressiveness of deep networks, and can even be generalized into the quantum realm.
Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation
paper_authors: Nurbanu Aksoy, Serge Sharoff, Selcuk Baser, Nishant Ravikumar, Alejandro F Frangi
for: This paper aims to automatically generate radiology reports that describe the findings in medical images.
methods: The paper proposes a multi-modal deep neural network that combines structured patient data (such as vital signs and symptoms) with unstructured clinical notes to generate chest X-ray reports. A conditioned cross-multi-head attention module is introduced to fuse these heterogeneous data modalities, bridging the semantic gap between visual and textual data.
results: Experiments show substantial improvements from combining multi-modal data compared to relying on image data alone. The model also achieves the highest reported performance on the ROUGE-L metric compared to relevant state-of-the-art models. In addition, human evaluation and clinical semantic similarity measurement are used alongside word-overlap metrics to deepen the quantitative analysis.
Abstract
Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images. Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists. In this paper, we present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes. We introduce a conditioned cross-multi-head attention module to fuse these heterogeneous data modalities, bridging the semantic gap between visual and textual data. Experiments demonstrate substantial improvements from using additional modalities compared to relying on images alone. Notably, our model achieves the highest reported performance on the ROUGE-L metric compared to relevant state-of-the-art models in the literature. Furthermore, we employed both human evaluation and clinical semantic similarity measurement alongside word-overlap metrics to improve the depth of the quantitative analysis. A human evaluation, conducted by a board-certified radiologist, confirms the model's accuracy in identifying high-level findings; however, it also highlights that more improvement is needed to capture nuanced details and clinical context.
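A sketch of how a conditioned cross-multi-head attention module could fuse visual tokens with embeddings of structured patient data: the conditioning scheme (adding a projected condition vector to the queries) and all dimensions are assumptions for illustration, not the authors' exact module.

```python
import torch
import torch.nn as nn

class ConditionedCrossAttention(nn.Module):
    """Cross-attention from text decoder states to visual tokens, conditioned on patient data."""
    def __init__(self, d_model=512, n_heads=8, d_cond=64):
        super().__init__()
        self.cond_proj = nn.Linear(d_cond, d_model)   # project structured data (vitals, symptoms)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_states, visual_tokens, cond):
        # text_states: (B, T, d_model), visual_tokens: (B, V, d_model), cond: (B, d_cond)
        q = text_states + self.cond_proj(cond).unsqueeze(1)   # condition the queries
        fused, _ = self.attn(q, visual_tokens, visual_tokens)
        return self.norm(text_states + fused)                 # residual connection

# usage: out = ConditionedCrossAttention()(torch.randn(2, 20, 512),
#                                          torch.randn(2, 49, 512), torch.randn(2, 64))
```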
Combining EEG and NLP Features for Predicting Students’ Lecture Comprehension using Ensemble Classification
results: Experiments show that this classification framework predicts students' confusion and answer correctness more accurately than the baselines, reaching F1 scores of up to 0.65 and 0.78 on the two tasks. Adding students' self-reported confusion ratings as an integrated feature further improves classification performance.
Abstract
Electroencephalography (EEG) and Natural Language Processing (NLP) can be applied for education to measure students' comprehension in classroom lectures; currently, the two measures have been used separately. In this work, we propose a classification framework for predicting students' lecture comprehension in two tasks: (i) students' confusion after listening to the simulated lecture and (ii) the correctness of students' responses to the post-lecture assessment. The proposed framework includes EEG and NLP feature extraction, processing, and classification. EEG and NLP features are extracted to construct integrated features obtained from recorded EEG signals and sentence-level syntactic analysis, which provide information about specific biomarkers and sentence structures. An ensemble stacking classification method -- a combination of multiple individual models that produces an enhanced predictive model -- is studied to learn from the features to make predictions accurately. Furthermore, we also utilized subjective confusion ratings as another integrated feature to enhance classification performance. By doing so, experiment results show that this framework performs better than the baselines, which achieved F1 up to 0.65 for predicting confusion and 0.78 for predicting correctness, highlighting that utilizing this has helped improve the classification performance.
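A minimal sketch of ensemble stacking over concatenated EEG and NLP features using scikit-learn; the base learners, the final estimator, and the placeholder feature arrays are illustrative choices, not the exact models used in the paper.

```python
import numpy as np
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

eeg_feats = np.random.rand(120, 32)          # placeholder EEG band-power features
nlp_feats = np.random.rand(120, 16)          # placeholder sentence-level syntactic features
y = np.random.randint(0, 2, size=120)        # 0/1 confusion (or correctness) labels

X = np.hstack([eeg_feats, nlp_feats])        # integrated EEG + NLP feature vector

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("rf", RandomForestClassifier(n_estimators=100))],
    final_estimator=LogisticRegression(),    # meta-learner combining base predictions
)
print("CV F1:", cross_val_score(stack, X, y, cv=5, scoring="f1").mean())
```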
ECLM: Efficient Edge-Cloud Collaborative Learning with Continuous Environment Adaptation
paper_authors: Yan Zhuang, Zhenzhe Zheng, Yunfeng Shao, Bingshuai Li, Fan Wu, Guihai Chen
for: The paper is written for developing an edge-cloud collaborative learning framework for rapid model adaptation in dynamic edge environments.
methods: The paper proposes a novel block-level model decomposition design to decompose the original large cloud model into multiple combinable modules, and an end-to-end learning framework that incorporates the modular model design into an efficient model adaptation pipeline.
results: The paper achieves significant improvements in model performance (18.89% accuracy increase) and resource efficiency (7.12x communication cost reduction) in adapting models to dynamic edge environments by efficiently collaborating the edge and the cloud models.
Abstract
Pervasive mobile AI applications primarily employ one of the two learning paradigms: cloud-based learning (with powerful large models) or on-device learning (with lightweight small models). Despite their own advantages, neither paradigm can effectively handle dynamic edge environments with frequent data distribution shifts and on-device resource fluctuations, inevitably suffering from performance degradation. In this paper, we propose ECLM, an edge-cloud collaborative learning framework for rapid model adaptation for dynamic edge environments. We first propose a novel block-level model decomposition design to decompose the original large cloud model into multiple combinable modules. By flexibly combining a subset of the modules, this design enables the derivation of compact, task-specific sub-models for heterogeneous edge devices from the large cloud model, and the seamless integration of new knowledge learned on these devices into the cloud model periodically. As such, ECLM ensures that the cloud model always provides up-to-date sub-models for edge devices. We further propose an end-to-end learning framework that incorporates the modular model design into an efficient model adaptation pipeline including an offline on-cloud model prototyping and training stage, and an online edge-cloud collaborative adaptation stage. Extensive experiments over various datasets demonstrate that ECLM significantly improves model performance (e.g., 18.89% accuracy increase) and resource efficiency (e.g., 7.12x communication cost reduction) in adapting models to dynamic edge environments by efficiently collaborating the edge and the cloud models.
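A sketch of the block-level decomposition idea: the cloud model is kept as a set of combinable blocks, and a compact edge sub-model is derived by selecting and chaining a subset of them. Block names, sizes, and the selection policy are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CloudModel(nn.Module):
    """Cloud model decomposed into named, combinable blocks."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleDict({
            f"block{i}": nn.Sequential(nn.Linear(128, 128), nn.ReLU()) for i in range(8)
        })
        self.head = nn.Linear(128, 10)

    def forward(self, x, block_names=None):
        # run either all blocks (cloud inference) or a selected subset (edge sub-model)
        for name in (block_names or self.blocks.keys()):
            x = self.blocks[name](x)
        return self.head(x)

cloud = CloudModel()
edge_blocks = ["block0", "block3", "block6"]           # compact, task-specific selection
edge_out = cloud(torch.randn(4, 128), edge_blocks)     # behaves as the derived sub-model

# Periodically, updated edge block weights could be merged back, e.g.
# cloud.blocks["block3"].load_state_dict(edge_block3_state_dict)
```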
DSCom: A Data-Driven Self-Adaptive Community-Based Framework for Influence Maximization in Social Networks
paper_authors: Yuxin Zuo, Haojia Sun, Yongyi Hu, Jianxiong Guo, Xiaofeng Gao
for: This paper aims to address the data-driven version of influence maximization, where the diffusion model is not given and needs to be inferred from the history cascades.
methods: The paper proposes a machine learning-based framework called DSCom, which leverages node attributes to estimate the closeness between connected nodes and overcome the influence overlap problem.
results: The proposed algorithm is evaluated through empirical experiments with parameterized diffusion models based on real-world social networks, showing its efficiency and effectiveness.
Abstract
Influence maximization aims to find a subset of seeds that maximizes the influence spread under a given budget. In this paper, we mainly address the data-driven version of this problem, where the diffusion model is not given but needs to be inferred from the history cascades. Several previous works have addressed this topic in a statistical way and provided efficient algorithms with theoretical guarantees. However, in their settings, though the diffusion parameters are inferred, they still need users to preset the diffusion model, which can be an intractable problem in real-world practice. In this paper, we reformulate the problem on the attributed network and leverage the node attributes to estimate the closeness between connected nodes. Specifically, we propose a machine learning-based framework, named DSCom, to address this problem in a heuristic way. Under this framework, we first infer the users' relationships from the diffusion dataset through an attention mechanism and then leverage spectral clustering to overcome the influence overlap problem in the absence of an exact diffusion formula. Compared to the previous theoretical works, we carefully designed empirical experiments with parameterized diffusion models based on real-world social networks, which demonstrate the efficiency and effectiveness of our algorithm.
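A sketch of the community-based seed selection step: an affinity matrix built from learned node closeness is clustered with spectral clustering, and one seed is picked per community (here, the node with the largest total closeness). The closeness matrix and the per-community selection rule are simplifying assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def select_seeds(closeness, k):
    """Pick k seeds, one per community, from a symmetric node-closeness matrix."""
    labels = SpectralClustering(n_clusters=k, affinity="precomputed",
                                random_state=0).fit_predict(closeness)
    strength = closeness.sum(axis=1)          # proxy for a node's influence in the graph
    seeds = []
    for c in range(k):
        members = np.where(labels == c)[0]
        seeds.append(int(members[np.argmax(strength[members])]))
    return seeds

# usage with a random symmetric affinity matrix standing in for the learned closeness
A = np.random.rand(50, 50); A = (A + A.T) / 2; np.fill_diagonal(A, 1.0)
print(select_seeds(A, k=5))
```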
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
paper_authors: Clifton Poth, Hannah Sterz, Indraneil Paul, Sukannya Purkayastha, Leon Engländer, Timo Imhof, Ivan Vulić, Sebastian Ruder, Iryna Gurevych, Jonas Pfeiffer
for: This paper promotes more efficient transfer learning by providing an open-source library, Adapters, that enables parameter-efficient and modular transfer learning.
methods: The library integrates 10 diverse adapter methods into a unified interface, offering ease of use and flexible configuration.
results: Evaluating the adapter methods against full fine-tuning on various NLP tasks demonstrates the library's efficacy and flexibility.
Abstract
We introduce Adapters, an open-source library that unifies parameter-efficient and modular transfer learning in large language models. By integrating 10 diverse adapter methods into a unified interface, Adapters offers ease of use and flexible configuration. Our library allows researchers and practitioners to leverage adapter modularity through composition blocks, enabling the design of complex adapter setups. We demonstrate the library's efficacy by evaluating its performance against full fine-tuning on various NLP tasks. Adapters provides a powerful tool for addressing the challenges of conventional fine-tuning paradigms and promoting more efficient and modular transfer learning. The library is available via https://adapterhub.ml/adapters.
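A brief usage sketch in the spirit of the library's adapter workflow; the class names, adapter config string, and method signatures below are recalled from the AdapterHub documentation and may differ between versions, so treat them as approximate and consult https://adapterhub.ml/adapters.

```python
from adapters import AutoAdapterModel

# Load a pre-trained backbone with adapter support
model = AutoAdapterModel.from_pretrained("roberta-base")

# Add a task adapter and a matching classification head, then freeze the backbone
model.add_adapter("sst2_adapter", config="seq_bn")        # bottleneck adapter config (assumed)
model.add_classification_head("sst2_adapter", num_labels=2)
model.train_adapter("sst2_adapter")                       # trains only the adapter parameters

# Composition blocks allow stacking or combining adapters (see the docs for
# Stack/Fuse/Parallel composition); here a single adapter is activated.
model.set_active_adapters("sst2_adapter")
```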
Community-Aware Efficient Graph Contrastive Learning via Personalized Self-Training
for: The paper is written for community detection tasks in graph-structured data, and it proposes a novel framework called Community-aware Efficient Graph Contrastive Learning (CEGCL) to jointly learn community partition and node representations in an end-to-end manner.
methods: The proposed CEGCL framework uses a personalized self-training (PeST) strategy for unsupervised scenarios, which enables the model to capture precise community-level personalized information in a graph. Additionally, the aligned graph clustering (AlGC) is employed to obtain the community partition.
results: The paper demonstrates the effectiveness of the proposed CEGCL model for community detection both theoretically and experimentally. Extensive experimental results show that CEGCL exhibits state-of-the-art performance on three benchmark datasets with different scales.
Abstract
In recent years, graph contrastive learning (GCL) has emerged as one of the optimal solutions for various supervised tasks at the node level. However, for unsupervised and structure-related tasks such as community detection, current GCL algorithms face difficulties in acquiring the necessary community-level information, resulting in poor performance. In addition, general contrastive learning algorithms improve the performance of downstream tasks by increasing the number of negative samples, which leads to severe class collision and unfairness of community detection. To address above issues, we propose a novel Community-aware Efficient Graph Contrastive Learning Framework (CEGCL) to jointly learn community partition and node representations in an end-to-end manner. Specifically, we first design a personalized self-training (PeST) strategy for unsupervised scenarios, which enables our model to capture precise community-level personalized information in a graph. With the benefit of the PeST, we alleviate class collision and unfairness without sacrificing the overall model performance. Furthermore, the aligned graph clustering (AlGC) is employed to obtain the community partition. In this module, we align the clustering space of our downstream task with that in PeST to achieve more consistent node embeddings. Finally, we demonstrate the effectiveness of our model for community detection both theoretically and experimentally. Extensive experimental results also show that our CEGCL exhibits state-of-the-art performance on three benchmark datasets with different scales.
SBTRec- A Transformer Framework for Personalized Tour Recommendation Problem with Sentiment Analysis
results: Compared with baseline algorithms, SBTRec achieves an average F1 score of 61.45%, demonstrating its strong performance on the sequence prediction task.
Abstract
When traveling to an unfamiliar city for holidays, tourists often rely on guidebooks, travel websites, or recommendation systems to plan their daily itineraries and explore popular points of interest (POIs). However, these approaches may lack optimization in terms of time feasibility, localities, and user preferences. In this paper, we propose the SBTRec algorithm: a BERT-based Trajectory Recommendation with sentiment analysis, for recommending personalized sequences of POIs as itineraries. The key contributions of this work include analyzing users' check-ins and uploaded photos to understand the relationship between POI visits and distance. We introduce SBTRec, which encompasses sentiment analysis to improve recommendation accuracy by understanding users' preferences and satisfaction levels from reviews and comments about different POIs. Our proposed algorithms are evaluated against other sequence prediction methods using datasets from 8 cities. The results demonstrate that SBTRec achieves an average F1 score of 61.45%, outperforming baseline algorithms. The paper further discusses the flexibility of the SBTRec algorithm, its ability to adapt to different scenarios and cities without modification, and its potential for extension by incorporating additional information for more reliable predictions. Overall, SBTRec provides personalized and relevant POI recommendations, enhancing tourists' overall trip experiences. Future work includes fine-tuning personalized embeddings for users, with evaluation of users' comments on POIs, to further enhance prediction accuracy.
AIMS-EREA – A framework for AI-accelerated Innovation of Materials for Sustainability – for Environmental Remediation and Energy Applications
paper_authors: Sudarson Roy Pratihar, Deepesh Pai, Manaswita Nag
for: The framework can be used to rapidly screen the many possible combinations and structures and find suitable green materials for sustainability, for environmental remediation and energy applications.
methods: It combines Density Functional Theory (DFT) and other materials science theory with AI techniques to screen and predict candidates quickly and efficiently, reducing the effort and cost of laboratory synthesis and analysis.
results: By combining the best of materials science theory with the power of generative AI, suitable green materials can be discovered quickly and efficiently, while avoiding the production of hazardous by-products.
Abstract
Many environmental remediation and energy applications (conversion and storage) for sustainability need the design and development of green novel materials. Discovery processes for such novel materials are time-consuming and cumbersome due to the large number of possible combinations and permutations of material structures. Often, theoretical studies based on Density Functional Theory (DFT) and other theories, coupled with simulations, are conducted to narrow down the sample space of candidate materials before laboratory-based synthesis and analysis. With the emergence of artificial intelligence (AI), AI techniques are being tried in this process too, to reduce simulation time and cost. However, the tremendous value of previously published research from various parts of the world is still tapped through labor-intensive manual effort at the discretion of individual researchers, a process prone to human omissions. AIMS-EREA is our novel framework that blends the best of materials science theory with the power of generative AI to deliver the greatest impact and the smoothest and quickest discovery of materials for sustainability. It also helps to eliminate the possibility of producing hazardous residues and by-products of the reactions. AIMS-EREA uses all available resources: predictive and analytical AI over large collections of chemical databases, along with automated intelligent assimilation of deep materials knowledge from previously published research works through generative AI. We demonstrate with an example how this framework can be successfully applied to achieve the desired success in the development of a thermoelectric material for waste heat conversion.
Designing Interpretable ML System to Enhance Trustworthy AI in Healthcare: A Systematic Review of the Last Decade to A Proposed Robust Framework
paper_authors: Elham Nasarian, Roohallah Alizadehsani, U. Rajendra Acharya, Kwok-Leung Tsui
for: This paper aims to review and discuss the processes and challenges of interpretable machine learning (IML) and explainable AI (XAI) in healthcare, with a focus on quality control and the importance of robust interpretability.
methods: The paper uses a systematic literature review approach, searching the PubMed, Scopus, and Web of Science databases using specific strings to identify relevant studies. The IML process is classified into three stages: data pre-processing interpretability, interpretable modeling, and post-processing interpretability.
results: The paper provides experimental results to establish the importance of robust interpretability in healthcare and offers insights for creating communicable clinician-AI tools. The survey also introduces a step-by-step roadmap for implementing XAI in clinical applications, addressing existing gaps and acknowledging XAI model limitations.
Abstract
AI-based medical technologies, including wearables, telemedicine, LLMs, and digital care twins, significantly impact healthcare. Ensuring AI results are accurate and interpretable is crucial, especially for clinicians. This paper reviews processes and challenges of interpretable ML (IML) and explainable AI (XAI) in healthcare. Objectives include reviewing XAI processes, methods, applications, and challenges, with a focus on quality control. The IML process is classified into data pre-processing interpretability, interpretable modeling, and post-processing interpretability. The paper aims to establish the importance of robust interpretability in healthcare through experimental results, providing insights for creating communicable clinician-AI tools. Research questions, eligibility criteria, and goals were identified following PRISMA and PICO methods. PubMed, Scopus, and Web of Science were systematically searched using specific strings. The survey introduces a step-by-step roadmap for implementing XAI in clinical applications, addressing existing gaps and acknowledging XAI model limitations.
Orca 2: Teaching Small Language Models How to Reason
results: Orca 2 achieves zero-shot performance on 15 diverse benchmarks that surpasses models of similar size, and on complex tasks it matches or exceeds the performance of much larger models.
Abstract
Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can enhance smaller LMs' reasoning abilities. Research on training small LMs has often relied on imitation learning to replicate the output of more capable models. We contend that excessive emphasis on imitation may restrict the potential of smaller models. We seek to teach small LMs to employ different solution strategies for different tasks, potentially different from the one used by the larger model. For example, while larger models might provide a direct answer to a complex task, smaller models may not have the same capacity. In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task. We evaluate Orca 2 using a comprehensive set of 15 diverse benchmarks (corresponding to approximately 100 tasks and over 36,000 unique prompts). Orca 2 significantly surpasses models of similar size and attains performance levels similar or better to those of models 5-10x larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings. We open-source Orca 2 to encourage further research on the development, evaluation, and alignment of smaller LMs.
Synthetic Data Generation for Bridging Sim2Real Gap in a Production Environment
results: Experiments show that combinations of the basic procedures reduce the simulation-to-reality gap in a production environment, improving the performance of robot-assisted production.
Abstract
Synthetic data is being used lately for training deep neural networks in computer vision applications such as object detection, object segmentation and 6D object pose estimation. Domain randomization plays an important role here in reducing the simulation-to-reality gap. However, this generalization might not be effective in specialized domains like a production environment involving complex assemblies. Either the individual parts, trained with synthetic images, are integrated into much larger assemblies that make them indistinguishable from their counterparts, resulting in false positives, or they are partially occluded just enough to give rise to false negatives. Domain knowledge is vital in these cases and, if applied effectively while generating synthetic data, can considerably help in bridging the simulation-to-reality gap. This paper focuses on synthetic data generation procedures for parts and assemblies used in a production environment. The basic procedures for synthetic data generation and their various combinations are evaluated and compared on images captured in a production environment, where results show up to a 15% improvement using combinations of basic procedures. Reducing the simulation-to-reality gap in this way can help realize the true potential of robot-assisted production using artificial intelligence.
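As a rough illustration of what such basic synthetic-data procedures can look like in practice, the sketch below draws one randomized render configuration for a part; the parameter names and ranges are assumptions made here for illustration, not the procedures evaluated in the paper.

```python
import random

def sample_render_config():
    """Draw one domain-randomized configuration for rendering a synthetic
    image of an assembly part. Parameter names and ranges are illustrative
    assumptions, not the paper's actual procedures."""
    return {
        "light_intensity": random.uniform(0.3, 1.5),
        "light_azimuth_deg": random.uniform(0.0, 360.0),
        "camera_distance_m": random.uniform(0.4, 1.2),
        "background_texture": random.choice(["metal", "wood", "conveyor", "noise"]),
        "occlusion_ratio": random.uniform(0.0, 0.4),
    }
```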
Data Center Audio/Video Intelligence on Device (DAVID) – An Edge-AI Platform for Smart-Toys
results: The platform can recognize and interpret the user's speech and facial expressions, and its built-in data protection keeps personally identifiable information on the device, safeguarding user privacy.
Abstract
An overview is given of the DAVID Smart-Toy platform, one of the first Edge AI platform designs to incorporate advanced low-power data processing by neural inference models co-located with the relevant image or audio sensors. There is also on-board capability for in-device text-to-speech generation. Two alternative embodiments are presented: a smart Teddy-bear, and a roving dog-like robot. The platform offers a speech-driven user interface and can observe and interpret user actions and facial expressions via its computer vision sensor node. A particular benefit of this design is that no personally identifiable information passes beyond the neural inference nodes thus providing inbuilt compliance with data protection regulations.
Geometric Data Augmentations to Mitigate Distribution Shifts in Pollen Classification from Microscopic Images
results: Extensive evaluations show that the geometric augmentation techniques provide consistent improvements of up to 14% across model architectures. A comprehensive comparison against standard filters and image augmentations, together with an ablation study, further shows that the proposed geometric augmentations receive the highest scores on measures from the literature.
Abstract
Distribution shifts are characterized by differences between the training and test data distributions. They can significantly reduce the accuracy of machine learning models deployed in real-world scenarios. This paper explores the distribution shift problem when classifying pollen grains from microscopic images collected in the wild with a low-cost camera sensor. We leverage the domain knowledge that geometric features are highly important for accurate pollen identification and introduce two novel geometric image augmentation techniques to significantly narrow the accuracy gap between the model performance on the train and test datasets. In particular, we show that Tenengrad and ImageToSketch filters are highly effective to balance the shape and texture information while leaving out unimportant details that may confuse the model. Extensive evaluations on various model architectures demonstrate a consistent improvement of the model generalization to field data of up to 14% achieved by the geometric augmentation techniques when compared to a wide range of standard image augmentations. The approach is validated through an ablation study using pollen hydration tests to recover the shape of dry pollen grains. The proposed geometric augmentations also receive the highest scores according to the affinity and diversity measures from the literature.
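The Tenengrad filter referenced above is a standard Sobel gradient-magnitude operator. The sketch below shows one plausible way to apply it so that shape dominates texture; the kernel size and normalization are assumptions, and the ImageToSketch filter is not reproduced here.

```python
import cv2
import numpy as np

def tenengrad_view(image_bgr, ksize=3):
    """Return a gradient-magnitude (Tenengrad) view of the input image,
    emphasizing edge/shape structure over fine texture. Kernel size and
    min-max normalization are illustrative choices, not the paper's settings."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=ksize)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=ksize)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    magnitude = cv2.normalize(magnitude, None, 0, 255, cv2.NORM_MINMAX)
    return magnitude.astype(np.uint8)

# Usage sketch: replace (or mix with) the original image during training.
# edge_view = tenengrad_view(cv2.imread("pollen_grain.png"))
```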
results: The approach achieves improved retrieval performance. Additionally, an open-source application for image analysis and retrieval is developed; it is easy to integrate and shows potential to support clinicians' everyday activities.
Abstract
Content-based image retrieval (CBIR) with self-supervised learning (SSL) accelerates clinicians' interpretation of similar images without manual annotations. We develop a CBIR system based on the contrastive learning framework SimCLR and incorporate generalized-mean (GeM) pooling followed by L2 normalization to classify lesion types and retrieve similar images before clinicians' analysis. Results show improved performance. We additionally build an open-source application for image analysis and retrieval. The application is easy to integrate, relieving manual effort and suggesting the potential to support clinicians' everyday activities.
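Generalized-mean (GeM) pooling followed by L2 normalization, as used in the retrieval pipeline above, can be sketched as follows; the feature-map shape and the exponent p are illustrative assumptions rather than the paper's exact settings.

```python
import numpy as np

def gem_pool(feature_map, p=3.0, eps=1e-6):
    """Generalized-mean pooling over spatial positions.
    feature_map: array of shape (channels, height, width).
    p = 1 recovers average pooling; large p approaches max pooling."""
    clamped = np.clip(feature_map, eps, None)
    return (clamped ** p).mean(axis=(1, 2)) ** (1.0 / p)

def l2_normalize(vector, eps=1e-12):
    return vector / (np.linalg.norm(vector) + eps)

# Retrieval descriptor (sketch): fmap is assumed to come from the
# SimCLR-trained backbone; nearest neighbours are then found by dot product.
# descriptor = l2_normalize(gem_pool(fmap))
```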
Multiple View Geometry Transformers for 3D Human Pose Estimation
results: The model outperforms state-of-the-art methods in both in-domain and out-of-domain settings, with a particularly clear margin out of domain, and it generalizes to new cameras and geometries.
Abstract
In this work, we aim to improve the 3D reasoning ability of Transformers in multi-view 3D human pose estimation. Recent works have focused on end-to-end learning-based transformer designs, which struggle to resolve geometric information accurately, particularly during occlusion. Instead, we propose a novel hybrid model, MVGFormer, which has a series of geometric and appearance modules organized in an iterative manner. The geometry modules are learning-free and handle all viewpoint-dependent 3D tasks geometrically which notably improves the model's generalization ability. The appearance modules are learnable and are dedicated to estimating 2D poses from image signals end-to-end which enables them to achieve accurate estimates even when occlusion occurs, leading to a model that is both accurate and generalizable to new cameras and geometries. We evaluate our approach for both in-domain and out-of-domain settings, where our model consistently outperforms state-of-the-art methods, and especially does so by a significant margin in the out-of-domain setting. We will release the code and models: https://github.com/XunshanMan/MVGFormer.
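As a generic illustration of what a learning-free, viewpoint-dependent geometry module computes (not MVGFormer's exact module), the sketch below triangulates a single joint from per-view 2D estimates using the standard direct linear transform (DLT).

```python
import numpy as np

def triangulate_joint(projection_matrices, points_2d):
    """Linear (DLT) triangulation of one 3D joint from multiple views.
    projection_matrices: list of 3x4 camera matrices P_i.
    points_2d: list of (x, y) detections of the same joint, one per view.
    Returns the 3D point minimizing the algebraic reprojection error."""
    rows = []
    for P, (x, y) in zip(projection_matrices, points_2d):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)                 # (2 * n_views, 4)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                         # null-space direction of A
    return X[:3] / X[3]                # dehomogenize
```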
HungerGist: An Interpretable Predictive Model for Food Insecurity
paper_authors: Yongsu Ahn, Muheng Yan, Yu-Ru Lin, Zian Wang
for: This paper aims to address the critical need for advanced early warning systems to combat escalating food insecurity in Africa, which is caused by factors such as war, climate change, and poverty.
methods: The paper introduces a multi-task deep learning model called “HungerGist” that utilizes news texts and natural language processing (NLP) techniques to analyze and predict food insecurity.
results: The model outperforms the baseline method trained on both traditional risk factors and human-curated keywords, and has the ability to detect critical texts that contain interpretable signals known as “gists.” Additionally, the approach has the potential to reveal latent factors that would otherwise remain concealed in unstructured texts.
Abstract
The escalating food insecurity in Africa, caused by factors such as war, climate change, and poverty, demonstrates the critical need for advanced early warning systems. Traditional methodologies, relying on expert-curated data encompassing climate, geography, and social disturbances, often fall short due to data limitations, hindering comprehensive analysis and potential discovery of new predictive factors. To address this, this paper introduces "HungerGist", a multi-task deep learning model utilizing news texts and NLP techniques. Using a corpus of over 53,000 news articles from nine African countries over four years, we demonstrate that our model, trained solely on news data, outperforms the baseline method trained on both traditional risk factors and human-curated keywords. In addition, our method has the ability to detect critical texts that contain interpretable signals known as "gists." Moreover, our examination of these gists indicates that this approach has the potential to reveal latent factors that would otherwise remain concealed in unstructured texts.
RecExplainer: Aligning Large Language Models for Recommendation Model Interpretability
results: Experiments show that the approach effectively enables LLMs to comprehend the behavior of recommendation models and to generate highly credible recommendation explanations.
Abstract
Recommender systems are widely used in various online services, with embedding-based models being particularly popular due to their expressiveness in representing complex signals. However, these models often lack interpretability, making them less reliable and transparent for both users and developers. With the emergence of large language models (LLMs), we find that their capabilities in language expression, knowledge-aware reasoning, and instruction following are exceptionally powerful. Based on this, we propose a new model interpretation approach for recommender systems, by using LLMs as surrogate models and learn to mimic and comprehend target recommender models. Specifically, we introduce three alignment methods: behavior alignment, intention alignment, and hybrid alignment. Behavior alignment operates in the language space, representing user preferences and item information as text to learn the recommendation model's behavior; intention alignment works in the latent space of the recommendation model, using user and item representations to understand the model's behavior; hybrid alignment combines both language and latent spaces for alignment training. To demonstrate the effectiveness of our methods, we conduct evaluation from two perspectives: alignment effect, and explanation generation ability on three public datasets. Experimental results indicate that our approach effectively enables LLMs to comprehend the patterns of recommendation models and generate highly credible recommendation explanations.
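Behavior alignment "in the language space" amounts to serializing the target recommender's inputs and outputs as text for the LLM surrogate to imitate. The sketch below is one hypothetical way to build such a training example; the field layout and wording are assumptions, not the paper's prompt format.

```python
def behavior_alignment_example(user_history, candidate_items, recommended_item):
    """Serialize one decision of the target recommender as a (prompt, target)
    pair for fine-tuning the LLM surrogate. Hypothetical format, for illustration."""
    prompt = (
        "A user previously interacted with: " + ", ".join(user_history) + ".\n"
        "Candidate items: " + ", ".join(candidate_items) + ".\n"
        "Which item does the recommendation model rank first, and why?"
    )
    target = f"The model recommends {recommended_item}."
    return {"prompt": prompt, "target": target}
```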
An Empirical Bayes Framework for Open-Domain Dialogue Generation
results: The BODEB framework achieves better results than variational frameworks in terms of both diversity and coherence.
Abstract
To engage human users in meaningful conversation, open-domain dialogue agents are required to generate diverse and contextually coherent dialogue. Despite recent advancements, which can be attributed to the usage of pretrained language models, the generation of diverse and coherent dialogue remains an open research problem. A popular approach to address this issue involves the adaptation of variational frameworks. However, while these approaches successfully improve diversity, they tend to compromise on contextual coherence. Hence, we propose the Bayesian Open-domain Dialogue with Empirical Bayes (BODEB) framework, an empirical Bayes framework for constructing a Bayesian open-domain dialogue agent by leveraging pretrained parameters to inform the prior and posterior parameter distributions. Empirical results show that BODEB achieves better results in terms of both diversity and coherence compared to variational frameworks.
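Read in empirical-Bayes terms, "leveraging pretrained parameters to inform the prior" can be written, under a Gaussian form assumed here purely for illustration (the paper's actual prior and posterior families are not reproduced), as

$$p(\theta) = \mathcal{N}\big(\theta \mid \theta_{\mathrm{pre}},\ \sigma^2 I\big), \qquad p(\theta \mid \mathcal{D}) \propto p(\mathcal{D} \mid \theta)\, p(\theta),$$

where $\theta_{\mathrm{pre}}$ are the pretrained language-model parameters and the hyperparameter $\sigma^2$ is estimated from the dialogue data rather than fixed a priori.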
results: Experiments on popular large-scale face recognition datasets demonstrate the feasibility and practicality of the method, which delivers high accuracy and robustness in an unsupervised setting.
Abstract
Ensemble learning combines several individual models to obtain better generalization performance. In this work we present a practical method for estimating the joint power of several classifiers which differs from existing approaches by not relying on labels, hence enabling work in the unsupervised setting of huge datasets. It also differs from existing methods that define a "diversity measure". The heart of the method is a combinatorial bound on the number of mistakes the ensemble is likely to make. The bound can be efficiently approximated in time linear in the number of samples, allowing an efficient search for a combination of classifiers that is likely to produce higher joint accuracy. Moreover, having the bound applicable to unlabeled data makes it both accurate and practical in the modern setting of unsupervised learning. We demonstrate the method on popular large-scale face recognition datasets, which provide a useful playground for fine-grained classification tasks using noisy data over many classes. The proposed framework fits neatly into current practices of unsupervised learning. It is a measure of the inherent independence of a set of classifiers that does not rely on extra information such as another classifier or labeled data.
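The paper's combinatorial bound is not reproduced here, but the kind of label-free statistic such an analysis builds on can be sketched as pairwise disagreement rates over unlabeled predictions, computable in time linear in the number of samples (illustrative only):

```python
import numpy as np

def pairwise_disagreement(predictions):
    """predictions: array of shape (n_classifiers, n_samples) holding each
    classifier's predicted label on unlabeled data.
    Returns D where D[i, j] is the fraction of samples on which classifiers
    i and j disagree. This is only an illustrative, label-free statistic,
    not the paper's actual bound."""
    k, _ = predictions.shape
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = np.mean(predictions[i] != predictions[j])
    return D
```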
Case Repositories: Towards Case-Based Reasoning for AI Alignment
results: The paper argues that assembling a repository of diverse cases can assist AI alignment, while also serving as a medium for individuals and communities to engage in moral reasoning about AI.
Abstract
Case studies commonly form the pedagogical backbone in law, ethics, and many other domains that face complex and ambiguous societal questions informed by human values. Similar complexities and ambiguities arise when we consider how AI should be aligned in practice: when faced with vast quantities of diverse (and sometimes conflicting) values from different individuals and communities, with whose values is AI to align, and how should AI do so? We propose a complementary approach to constitutional AI alignment, grounded in ideas from case-based reasoning (CBR), that focuses on the construction of policies through judgments on a set of cases. We present a process to assemble such a case repository by: 1) gathering a set of ``seed'' cases -- questions one may ask an AI system -- in a particular domain from discussions in online communities, 2) eliciting domain-specific key dimensions for cases through workshops with domain experts, 3) using LLMs to generate variations of cases not seen in the wild, and 4) engaging with the public to judge and improve cases. We then discuss how such a case repository could assist in AI alignment, both through directly acting as precedents to ground acceptable behaviors, and as a medium for individuals and communities to engage in moral reasoning around AI
Representing visual classification as a linear combination of words
results: The study finds that this explanation strategy yields descriptors that largely align with clinical knowledge and that can help non-experts perform a specialized medical task. It also surfaces potential "shortcut connections" in the public datasets used.
Abstract
Explainability is a longstanding challenge in deep learning, especially in high-stakes domains like healthcare. Common explainability methods highlight image regions that drive an AI model's decision. Humans, however, heavily rely on language to convey explanations of not only "where" but "what". Additionally, most explainability approaches focus on explaining individual AI predictions, rather than describing the features used by an AI model in general. The latter would be especially useful for model and dataset auditing, and potentially even knowledge generation as AI is increasingly being used in novel tasks. Here, we present an explainability strategy that uses a vision-language model to identify language-based descriptors of a visual classification task. By leveraging a pre-trained joint embedding space between images and text, our approach estimates a new classification task as a linear combination of words, resulting in a weight for each word that indicates its alignment with the vision-based classifier. We assess our approach using two medical imaging classification tasks, where we find that the resulting descriptors largely align with clinical knowledge despite a lack of domain-specific language training. However, our approach also identifies the potential for 'shortcut connections' in the public datasets used. Towards a functional measure of explainability, we perform a pilot reader study where we find that the AI-identified words can enable non-expert humans to perform a specialized medical task at a non-trivial level. Altogether, our results emphasize the potential of using multimodal foundational models to deliver intuitive, language-based explanations of visual tasks.
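One plausible way to realize "a classification task as a linear combination of words" in a joint image-text embedding space is a least-squares fit of the classifier direction onto word embeddings. The sketch below assumes all embeddings are precomputed; it illustrates the idea rather than the paper's exact procedure.

```python
import numpy as np

def word_weights(classifier_direction, word_embeddings):
    """Express a vision-based classifier direction as a linear combination of
    word embeddings living in the same joint space (least squares).
    classifier_direction: (d,) vector separating the two classes.
    word_embeddings: (n_words, d) matrix of candidate descriptor embeddings.
    Returns one weight per word; larger weights indicate stronger alignment."""
    W = word_embeddings.T                                  # (d, n_words)
    weights, *_ = np.linalg.lstsq(W, classifier_direction, rcond=None)
    return weights
```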
Cognitive bias in large language models: Cautious optimism meets anti-Panglossian meliorism
results: The paper argues that current large language models may exhibit some genuine biases and proposes ways to reduce them, while also discussing the philosophical significance of human cognitive biases and the role of unrepresentative data in driving model biases.
Abstract
Traditional discussions of bias in large language models focus on a conception of bias closely tied to unfairness, especially as affecting marginalized groups. Recent work raises the novel possibility of assessing the outputs of large language models for a range of cognitive biases familiar from research in judgment and decisionmaking. My aim in this paper is to draw two lessons from recent discussions of cognitive bias in large language models: cautious optimism about the prevalence of bias in current models coupled with an anti-Panglossian willingness to concede the existence of some genuine biases and work to reduce them. I draw out philosophical implications of this discussion for the rationality of human cognitive biases as well as the role of unrepresentative data in driving model biases.
paper_authors: Jon Z. Cai, Shafiuddin Rehan Ahmed, Julia Bonn, Kristin Wright-Bettner, Martha Palmer, James H. Martin
for: The paper is aimed at developing a tool for creating Abstract Meaning Representations (AMR) from natural language text, called CAMRA (a programming language-like AMR editor).
methods: CAMRA uses a novel approach that treats AMR annotation as coding in programming languages, leveraging the familiarity of programming paradigms to help users better understand and use AMR annotation.
results: CAMRA can quickly and accurately generate AMR annotation, and can also help users better understand and use Propbank role sets.
Abstract
In this paper, we introduce CAMRA (Copilot for AMR Annotatations), a cutting-edge web-based tool designed for constructing Abstract Meaning Representation (AMR) from natural language text. CAMRA offers a novel approach to deep lexical semantics annotation such as AMR, treating AMR annotation akin to coding in programming languages. Leveraging the familiarity of programming paradigms, CAMRA encompasses all essential features of existing AMR editors, including example lookup, while going a step further by integrating Propbank roleset lookup as an autocomplete feature within the tool. Notably, CAMRA incorporates AMR parser models as coding co-pilots, greatly enhancing the efficiency and accuracy of AMR annotators. To demonstrate the tool's capabilities, we provide a live demo accessible at: https://camra.colorado.edu
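For readers unfamiliar with the notation being edited, AMR annotations are conventionally written in PENMAN format; the standard "The boy wants to go" example is shown below (predicate sense numbers are illustrative). It also shows why PropBank roleset lookup and autocomplete are useful: want-01 and go-01 are rolesets the annotator must select.

```
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))
```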
results: The model's top-3 suggestions reach 93.9% accuracy when classifying 925 challenging subheadings. A user study further shows that the algorithmic suggestions, accompanied by explainable reasoning documents, can substantially reduce the time and effort customs officers spend on classification reviews.
Abstract
The task of assigning internationally accepted commodity codes (aka HS codes) to traded goods is a critical function of customs offices. Like court decisions made by judges, this task follows the doctrine of precedent and can be nontrivial even for experienced officers. Together with the Korea Customs Service (KCS), we propose a first-ever explainable decision-supporting model that suggests the most likely subheadings (i.e., the first six digits) of the HS code. The model also provides reasoning for its suggestion in the form of a document that is interpretable by customs officers. We evaluated the model using 5,000 cases that recently received a classification request. The results showed that the top-3 suggestions made by our model had an accuracy of 93.9% when classifying 925 challenging subheadings. A user study with 32 customs experts further confirmed that our algorithmic suggestions, accompanied by explainable reasonings, can substantially reduce the time and effort taken by customs officers for classification reviews.
Compact and Intuitive Airfoil Parameterization Method through Physics-aware Variational Autoencoder
for: optimize the design of high-performance aircraft airfoils
methods: uses physics-aware variational autoencoder to parameterize airfoil shape
results: produces smooth and non-intersecting airfoils with improved feasibility and intuitiveness
Abstract
Airfoil shape optimization plays a critical role in the design of high-performance aircraft. However, the high-dimensional nature of airfoil representation causes the challenging problem known as the "curse of dimensionality". To overcome this problem, numerous airfoil parameterization methods have been developed, which can be broadly classified as polynomial-based and data-driven approaches. Each of these methods has desirable characteristics such as flexibility, parsimony, feasibility, and intuitiveness, but a single approach that encompasses all of these attributes has yet to be found. For example, polynomial-based methods struggle to balance parsimony and flexibility, while data-driven methods lack in feasibility and intuitiveness. In recent years, generative models, such as generative adversarial networks and variational autoencoders, have shown promising potential in airfoil parameterization. However, these models still face challenges related to intuitiveness due to their black-box nature. To address this issue, we developed a novel airfoil parameterization method using physics-aware variational autoencoder. The proposed method not only explicitly separates the generation of thickness and camber distributions to produce smooth and non-intersecting airfoils, thereby improving feasibility, but it also directly aligns its latent dimensions with geometric features of the airfoil, significantly enhancing intuitiveness. Finally, extensive comparative studies were performed to demonstrate the effectiveness of our approach.
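The explicit separation of thickness and camber generation has a simple geometric reading: the airfoil surfaces are the camber line offset by half the local thickness. A minimal sketch, using the common thin-airfoil simplification in which thickness is applied vertically rather than normal to the camber line:

```python
import numpy as np

def airfoil_surfaces(x, camber, thickness):
    """Rebuild upper/lower airfoil surfaces from separately generated camber
    and thickness distributions (thin-airfoil simplification: thickness added
    vertically, not normal to the camber line).
    x, camber, thickness: 1-D arrays sampled at the same chordwise stations."""
    y_upper = camber + 0.5 * thickness
    y_lower = camber - 0.5 * thickness
    return np.column_stack([x, y_upper]), np.column_stack([x, y_lower])
```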
Understanding and Mitigating Classification Errors Through Interpretable Token Patterns
paper_authors: Michael A. Hedderich, Jonas Fischer, Dietrich Klakow, Jilles Vreeken
for: This paper aims to characterize NLP classification errors with global, interpretable descriptions, providing a basis for improving the classifiers that make them.
methods: The paper proposes a method based on the Minimum Description Length principle that discovers token patterns distinguishing correct from erroneous predictions.
results: Experiments show that the method recovers the ground truth even on large datasets with large vocabularies. In VQA and NER case studies, it provides clear and actionable descriptions of the classifiers' systematic errors.
Abstract
State-of-the-art NLP methods achieve human-like performance on many tasks, but make errors nevertheless. Characterizing these errors in easily interpretable terms gives insight into whether a classifier is prone to making systematic errors, but also gives a way to act and improve the classifier. We propose to discover those patterns of tokens that distinguish correct and erroneous predictions as to obtain global and interpretable descriptions for arbitrary NLP classifiers. We formulate the problem of finding a succinct and non-redundant set of such patterns in terms of the Minimum Description Length principle. Through an extensive set of experiments, we show that our method, Premise, performs well in practice. Unlike existing solutions, it recovers ground truth, even on highly imbalanced data over large vocabularies. In VQA and NER case studies, we confirm that it gives clear and actionable insight into the systematic errors made by NLP classifiers.
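As a rough intuition for the kind of patterns being mined (not the MDL-based search that Premise actually performs), one can rank individual tokens by how much more often they appear in misclassified inputs than in correctly classified ones:

```python
from collections import Counter

def error_prone_tokens(correct_texts, error_texts, top_k=20, smoothing=1.0):
    """Rank tokens by how much more frequent they are in erroneously classified
    inputs than in correctly classified ones. A crude single-token proxy shown
    for intuition only; Premise instead mines non-redundant token patterns
    under an MDL criterion."""
    def doc_freq(texts):
        counts = Counter()
        for text in texts:
            counts.update(set(text.lower().split()))
        return counts

    f_err, f_ok = doc_freq(error_texts), doc_freq(correct_texts)
    n_err, n_ok = len(error_texts), len(correct_texts)
    scores = {
        tok: ((f_err[tok] + smoothing) / (n_err + smoothing))
             / ((f_ok[tok] + smoothing) / (n_ok + smoothing))
        for tok in f_err
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```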