cs.AI - 2023-08-25

NeO 360: Neural Fields for Sparse View Synthesis of Outdoor Scenes

paper_url: http://arxiv.org/abs/2308.12967
repo_url: https://github.com/zubair-irshad/NeO-360
paper_authors: Muhammad Zubair Irshad, Sergey Zakharov, Katherine Liu, Vitor Guizilini, Thomas Kollar, Adrien Gaidon, Zsolt Kira, Rares Ambrus
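for: This work proposes a scalable method for synthesizing 360° scenes from a single or a few posed RGB images.
methods: The authors propose NeO 360, a neural-field method that learns the distribution of 360° scenes from a single or a few posed RGB images. It uses a hybrid image-conditional triplanar representation that can be queried at any world point.
results: Experiments on the proposed challenging 360° unbounded dataset, NeRDS 360, show that NeO 360 synthesizes new views and novel scenes effectively, and that it also supports editing and composition.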

Abstract
Recent implicit neural representations have shown great results for novel view synthesis. However, existing methods require expensive per-scene optimization from many views, hence limiting their application to real-world unbounded urban settings where the objects of interest or backgrounds are observed from very few views. To mitigate this challenge, we introduce a new approach called NeO 360, Neural fields for sparse view synthesis of outdoor scenes. NeO 360 is a generalizable method that reconstructs 360° scenes from a single or a few posed RGB images. The essence of our approach is in capturing the distribution of complex real-world outdoor 3D scenes and using a hybrid image-conditional triplanar representation that can be queried from any world point. Our representation combines the best of both voxel-based and bird's-eye-view (BEV) representations and is more effective and expressive than each. NeO 360's representation allows us to learn from a large collection of unbounded 3D scenes while offering generalizability to new views and novel scenes from as few as a single image during inference. We demonstrate our approach on the proposed challenging 360° unbounded dataset, called NeRDS 360, and show that NeO 360 outperforms state-of-the-art generalizable methods for novel view synthesis while also offering editing and composition capabilities. Project page: https://zubair-irshad.github.io/projects/neo360.html

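Summary
Recent implicit neural representations have achieved excellent novel view synthesis results. However, existing methods require expensive per-scene optimization from many views, which limits their use in real-world unbounded urban scenes. To address this challenge, we introduce NeO 360, a generalizable method that reconstructs 360° scenes from a single or a few RGB images. The core idea is to capture the distribution of complex real-world outdoor 3D scenes and to use a hybrid image-conditional triplanar representation that can be queried at any world point. The representation combines the strengths of voxel-based and bird's-eye-view representations and is more effective and expressive than either. It allows learning from a large collection of unbounded 3D scenes and supports inference on new views and novel scenes from only a few images. On the proposed challenging 360° unbounded dataset, NeRDS 360, NeO 360 outperforms current state-of-the-art generalizable methods and also offers editing and composition capabilities. Project page: https://zubair-irshad.github.io/projects/neo360.html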
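
The idea of a triplanar representation "queried at any world point" can be illustrated with a minimal sketch. This is not the NeO 360 implementation: the plane names, the feature sizes, the bounded box `bounds`, and the concatenation of the three sampled features are all assumptions made for illustration, and the image-conditional part (how the planes are predicted from input views) is not modeled here.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample an (H, W, C) feature plane at continuous coords u, v in [0, 1]."""
    h, w, _ = plane.shape
    x = np.clip(u, 0.0, 1.0) * (w - 1)
    y = np.clip(v, 0.0, 1.0) * (h - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * plane[y0, x0] + wx * plane[y0, x1]
    bottom = (1 - wx) * plane[y1, x0] + wx * plane[y1, x1]
    return (1 - wy) * top + wy * bottom

def query_triplane(planes, point, bounds=(-1.0, 1.0)):
    """Project a 3D point onto the xy, xz and yz planes and concatenate the sampled features."""
    lo, hi = bounds  # hypothetical scene box; unbounded scenes would need a contraction first
    x, y, z = [(c - lo) / (hi - lo) for c in point]
    f_xy = bilinear_sample(planes["xy"], x, y)
    f_xz = bilinear_sample(planes["xz"], x, z)
    f_yz = bilinear_sample(planes["yz"], y, z)
    return np.concatenate([f_xy, f_xz, f_yz])

# Random planes stand in for features that a network would predict from images.
rng = np.random.default_rng(0)
planes = {name: rng.normal(size=(8, 8, 4)) for name in ("xy", "xz", "yz")}
feature = query_triplane(planes, point=(0.2, -0.5, 0.9))
print(feature.shape)  # (12,)
```

In a full pipeline the returned feature would typically be decoded into density and color by a small network; that decoder is omitted here.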

DLIP: Distilling Language-Image Pre-training

paper_url: http://arxiv.org/abs/2308.12956
repo_url: None
paper_authors: Huafeng Kuang, Jie Wu, Xiawu Zheng, Ming Li, Xuefeng Xiao, Rui Wang, Min Zheng, Rongrong Ji
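for: Improving the performance and efficiency of vision-language pre-training (VLP) models when they are deployed in real applications.
methods: The VLP model is compressed through knowledge distillation, with a multi-dimensional analysis and optimization that covers the architectural characteristics of different modules and the information transfer between modalities.
results: The experiments yield DLIP, a simple yet efficient Distilling Language-Image Pre-training framework that achieves a state-of-the-art accuracy/efficiency trade-off across diverse cross-modal tasks. For example, DLIP compresses BLIP by 1.9x, from 213M to 108M parameters, while achieving comparable or better performance. DLIP also retains more than 95% of the performance with 22.4% of the parameters and 24.8% of the FLOPs, and accelerates inference by 2.7x.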

Abstract
Vision-Language Pre-training (VLP) shows remarkable progress with the assistance of extremely heavy parameters, which challenges deployment in real applications. Knowledge distillation is well recognized as the essential procedure in model compression. However, existing knowledge distillation techniques lack an in-depth investigation and analysis of VLP, and practical guidelines for VLP-oriented distillation are still not yet explored. In this paper, we present DLIP, a simple yet efficient Distilling Language-Image Pre-training framework, through which we investigate how to distill a light VLP model. Specifically, we dissect the model distillation from multiple dimensions, such as the architecture characteristics of different modules and the information transfer of different modalities. We conduct comprehensive experiments and provide insights on distilling a light but performant VLP model. Experimental results reveal that DLIP can achieve a state-of-the-art accuracy/efficiency trade-off across diverse cross-modal tasks, e.g., image-text retrieval, image captioning and visual question answering. For example, DLIP compresses BLIP by 1.9x, from 213M to 108M parameters, while achieving comparable or better performance. Furthermore, DLIP succeeds in retaining more than 95% of the performance with 22.4% parameters and 24.8% FLOPs compared to the teacher model and accelerates inference speed by 2.7x.

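Summary
Vision-language pre-training (VLP) has made remarkable progress but faces deployment challenges in real applications. Knowledge distillation is widely recognized as a key technique for model compression. However, existing knowledge distillation techniques have not investigated and analyzed VLP in depth, and practical guidelines for VLP-oriented distillation are lacking. This paper presents DLIP, a simple yet efficient framework for distilling language-image pre-training. We analyze model distillation along multiple dimensions, including the architectural characteristics of different modules and the information transfer of different modalities. We conduct comprehensive experiments and provide insights and practical guidance on distilling a light VLP model. Experimental results show that DLIP achieves a state-of-the-art accuracy/efficiency trade-off across diverse cross-modal tasks, such as image-text retrieval, image captioning, and visual question answering. For example, DLIP compresses BLIP by 1.9x, from 213M to 108M parameters, while achieving performance comparable to or better than the teacher model. Furthermore, DLIP retains more than 95% of the performance with 22.4% of the parameters and 24.8% of the FLOPs, and accelerates inference by 2.7x.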
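
For readers unfamiliar with knowledge distillation, the following is a minimal sketch of the classic soft-target loss between teacher and student logits. It shows the generic technique only. The abstract does not specify DLIP's actual distillation objectives, so the temperature, the KL formulation, and the toy logits below are assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis with a temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Batch mean of KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=-1)
    return float(kl.mean() * temperature ** 2)

# Toy logits for a batch of two examples and three classes (illustrative values only).
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.0, 0.1]])
student = np.array([[2.5, 1.2, 0.3], [0.5, 2.0, 0.4]])

print(distillation_loss(student, teacher))  # positive: the student differs from the teacher
print(distillation_loss(teacher, teacher))  # 0.0: the student matches the teacher exactly
```

In practice this term is usually combined with the ordinary task loss, and feature-level or attention-level terms are common additions in multimodal settings.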

Low-count Time Series Anomaly Detection

paper_url: http://arxiv.org/abs/2308.12925
repo_url: None
paper_authors: Philipp Renz, Kurt Cutajar, Niall Twomey, Gavin K. C. Cheung, Hanting Xie
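for: This study addresses the challenges of time series anomaly detection in low-count data settings, where signal-to-noise ratios are low and non-uniform performance is prevalent.
methods: The study introduces a novel generative procedure for creating benchmark datasets of low-count time series with anomalous segments. It combines theoretical and empirical analysis to explain why widely used algorithms struggle with the distribution overlap between normal and anomalous segments.
results: Anomaly score smoothing consistently improves detection performance. The practical utility of the analysis and recommendation is validated on a real-world dataset of retail store sales.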

Abstract
Low-count time series describe sparse or intermittent events, which are prevalent in large-scale online platforms that capture and monitor diverse data types. Several distinct challenges surface when modelling low-count time series, particularly low signal-to-noise ratios (when anomaly signatures are provably undetectable), and non-uniform performance (when average metrics are not representative of local behaviour). The time series anomaly detection community currently lacks explicit tooling and processes to model and reliably detect anomalies in these settings. We address this gap by introducing a novel generative procedure for creating benchmark datasets comprising low-count time series with anomalous segments. Via a mixture of theoretical and empirical analysis, our work explains how widely-used algorithms struggle with the distribution overlap between normal and anomalous segments. In order to mitigate this shortcoming, we then leverage our findings to demonstrate how anomaly score smoothing consistently improves performance. The practical utility of our analysis and recommendation is validated on a real-world dataset containing sales data for retail stores.

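Summary
Low-count time series describe sparse or intermittent events, which are common on large-scale online platforms that capture and monitor diverse data types. Modelling low-count time series raises distinct challenges, such as low signal-to-noise ratios (where anomaly signatures cannot be reliably detected) and non-uniform performance (where average metrics do not reflect local behaviour). The time series anomaly detection community currently lacks dedicated tooling and processes to model and reliably detect anomalies in these settings. We address this gap by introducing a novel generative procedure for creating benchmark datasets of low-count time series with anomalous segments. Through theoretical and empirical analysis, we explain how widely used algorithms struggle with distribution overlap. We then use these findings to show how anomaly score smoothing mitigates this shortcoming and improves performance. The analysis and recommendations are validated on real-world retail data.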
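
Anomaly score smoothing can be sketched as follows. This is an illustration under stated assumptions, not the paper's benchmark or algorithm: the Poisson rates, the negative log-likelihood score, the segment position, and the moving-average window are all hypothetical choices.

```python
import math
import numpy as np

def poisson_nll(counts, rate):
    """Per-point anomaly score: negative log-likelihood of each count under Poisson(rate)."""
    return np.array([rate - k * math.log(rate) + math.lgamma(k + 1) for k in counts])

def smooth_scores(scores, window=5):
    """Centered moving average of the scores (np.convolve zero-pads at the edges)."""
    kernel = np.ones(window) / window
    return np.convolve(scores, kernel, mode="same")

# Synthetic low-count series: mostly rate 0.5, with an anomalous segment at rate 2.0.
rng = np.random.default_rng(0)
normal_rate, anomalous_rate = 0.5, 2.0
counts = rng.poisson(normal_rate, size=200)
counts[100:120] = rng.poisson(anomalous_rate, size=20)

raw = poisson_nll(counts, normal_rate)
smoothed = smooth_scores(raw, window=9)

print("raw score, normal vs anomalous mean:", raw[:100].mean(), raw[100:120].mean())
print("smoothed score inside the segment:", smoothed[105:115].mean())
```

Because single counts from the normal and anomalous distributions overlap heavily at low rates, a point-wise threshold on `raw` is noisy. Averaging over a window pools evidence across neighbouring points, at the cost of blurring segment boundaries.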

Evaluating the Vulnerabilities in ML systems in terms of adversarial attacks

paper_url: http://arxiv.org/abs/2308.12918
repo_url: None
paper_authors: John Harshith, Mantej Singh Gill, Madhan Jothimani
for: This research explores the latest adversarial attack methods and their impact on current deep learning cyber defense systems.
methods: The research uses randomized and adversarial examples to examine the influence of vulnerabilities.
results: The study finds that randomized examples may lead to the creation of vulnerabilities, while adversarial examples may exacerbate them. Additionally, the research discusses the ethical implications of these vulnerabilities.

Abstract
There have been recent adversarial attacks that are difficult to find. These new adversarial attack methods may pose challenges to current deep learning cyber defense systems and could influence the future defense of cyberattacks. The authors focus on this domain in this research paper. They explore the consequences of vulnerabilities in AI systems. This includes discussing how they might arise, the differences between randomized and adversarial examples, and the potential ethical implications of vulnerabilities. Moreover, it is important to train AI systems appropriately while they are in the testing phase and being prepared for broader use.

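Summary
New adversarial attack methods have emerged that may pose challenges to current deep learning cyber defense systems. In this research paper the authors focus on this domain and explore vulnerabilities in AI systems. This includes discussing how adversarial attacks might arise, the differences between randomized and adversarial examples, and the ethical implications of vulnerabilities. Moreover, AI systems need to be trained appropriately during the testing phase so that they are ready for broader use.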
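
The difference between randomized and adversarial examples can be made concrete with a small sketch. It reads "randomized" as a random perturbation with the same size budget, which is an assumption, since the abstract does not define the term. The logistic model, its random weights, and the budget `eps` are also hypothetical. The adversarial step uses the fast gradient sign method (FGSM), a standard attack that the abstract does not name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy of a logistic model for one example."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm_perturbation(w, b, x, y, eps):
    """Sign-of-gradient step; for logistic regression dL/dx = (p - y) * w."""
    grad_x = (sigmoid(w @ x + b) - y) * w
    return eps * np.sign(grad_x)

def random_perturbation(x, eps, rng):
    """Random sign noise with the same L-infinity budget as the FGSM step."""
    return eps * rng.choice([-1.0, 1.0], size=x.shape)

rng = np.random.default_rng(0)
w, b = rng.normal(size=20), 0.0          # hypothetical trained weights
x = rng.normal(size=20)                  # one input example
y = 1.0 if w @ x + b > 0 else 0.0        # label it as the model currently predicts
eps = 0.1

clean_loss = loss(w, b, x, y)
adv_loss = loss(w, b, x + fgsm_perturbation(w, b, x, y, eps), y)
rnd_loss = loss(w, b, x + random_perturbation(x, eps, rng), y)
print(clean_loss, rnd_loss, adv_loss)
```

For a linear model the FGSM step is the worst case within the L-infinity budget, so `adv_loss` is never smaller than `rnd_loss`. For deep networks it is only a first-order approximation of the worst case.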

Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights using Generative AI

paper_url: http://arxiv.org/abs/2308.12915
repo_url: None
paper_authors: Yuqian Sun, Zhouyi Li, Ke Fang, Chang Hee Lee, Ali Asadipour
for: The paper explores the potential of using advanced AI tools like GPT-4 and Stable Diffusion to create an AI-native game that blends interactive narrative and text-to-image transformation, and to enhance the narrative game genre with AI-generated content.
methods: The paper uses a game called "1001 Nights" as a case study to demonstrate the use of AI tools in game development. The game features a protagonist, Shahrzad, who is driven by a large language model and can realize words and stories in her world through conversation with the AI King. The player can steer the conversation towards specific keywords, which become battle equipment in the game.
results: The paper presents the second iteration of the game, which challenges the conventional border between the game world and reality through a dual perspective. The game allows the player to collaborate with AI to craft narratives and shape the game world, and the paper explores the technical and design elements of implementing such a game.

Abstract
In this paper, we present "1001 Nights", an AI-native game that allows players to lead in-game reality through co-created storytelling with a character driven by a large language model. The concept is inspired by Wittgenstein's idea of the limits of one's world being determined by the bounds of their language. Using advanced AI tools like GPT-4 and Stable Diffusion, the second iteration of the game enables the protagonist, Shahrzad, to realize words and stories in her world. The player can steer the conversation with the AI King towards specific keywords, which then become battle equipment in the game. This blend of interactive narrative and text-to-image transformation challenges the conventional border between the game world and reality through a dual perspective. We focus on Shahrzad, who seeks to alter her fate compared to the original folklore, and the player, who collaborates with AI to craft narratives and shape the game world. We explore the technical and design elements of implementing such a game with an objective to enhance the narrative game genre with AI-generated content and to delve into AI-native gameplay possibilities.

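Summary
In this paper, we present "1001 Nights", an AI-native game in which players lead the game world by co-creating stories with AI. The concept draws inspiration from Wittgenstein's idea that the bounds of language determine the limits of our world. Using advanced AI tools such as GPT-4 and Stable Diffusion, the second iteration of the game allows the protagonist, Shahrzad, to realize words and stories in her world. The player can steer the conversation with the AI King towards specific keywords, which then become weapons in the game. This blend of interactive narrative and text-to-image transformation challenges the conventional border between the game world and reality. From a dual perspective, we focus on Shahrzad, who seeks to change her fate, and on the player, who collaborates with AI to craft narratives and shape the game world. We explore the technical and design elements of implementing such a game, with the aim of enhancing the narrative game genre with AI-generated content and exploring AI-native gameplay possibilities.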

CDAN: Convolutional Dense Attention-guided Network for Low-light Image Enhancement

paper_url: http://arxiv.org/abs/2308.12902
repo_url: None
paper_authors: Hossein Shakibania, Sina Raoufi, Hassan Khotanlou
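for: This paper targets the enhancement of low-light images.
methods: The paper proposes the Convolutional Dense Attention-guided Network (CDAN), which builds on convolutional neural networks and an attention mechanism to improve the brightness, contrast, and overall quality of low-light images.
results: Tested on several benchmark datasets, CDAN shows notable progress compared with existing state-of-the-art techniques. It handles low-light images better and effectively restores textures and colors.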

Abstract
Low-light images, characterized by inadequate illumination, pose challenges of diminished clarity, muted colors, and reduced details. Low-light image enhancement, an essential task in computer vision, aims to rectify these issues by improving brightness, contrast, and overall perceptual quality, thereby facilitating accurate analysis and interpretation. This paper introduces the Convolutional Dense Attention-guided Network (CDAN), a novel solution for enhancing low-light images. CDAN integrates an autoencoder-based architecture with convolutional and dense blocks, complemented by an attention mechanism and skip connections. This architecture ensures efficient information propagation and feature learning. Furthermore, a dedicated post-processing phase refines color balance and contrast. Our approach demonstrates notable progress compared to state-of-the-art results in low-light image enhancement, showcasing its robustness across a wide range of challenging scenarios. Our model performs remarkably on benchmark datasets, effectively mitigating under-exposure and proficiently restoring textures and colors in diverse low-light scenarios. This achievement underscores CDAN's potential for diverse computer vision tasks, notably enabling robust object detection and recognition in challenging low-light conditions.

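Summary
Low-light images, which suffer from inadequate illumination, exhibit reduced clarity, muted colors, and lost detail. Low-light image enhancement is a key task in computer vision that aims to improve brightness, contrast, and overall quality in order to support accurate analysis and interpretation. This paper introduces the Convolutional Dense Attention-guided Network (CDAN), a new convolutional neural network approach for enhancing low-light images. CDAN combines an autoencoder-based architecture with convolutional and dense blocks, complemented by an attention mechanism and skip connections. This architecture ensures efficient information propagation and feature learning. In addition, a dedicated post-processing phase adjusts color balance and contrast. Our approach shows notable progress compared with current state-of-the-art results in low-light image enhancement and is robust across a wide range of challenging scenarios. Our model performs well on standard benchmark datasets, effectively mitigating under-exposure and restoring textures and colors in low-light images. This result indicates CDAN's broad potential for computer vision tasks, notably robust and accurate object detection and recognition in low-light conditions.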

Can Linguistic Knowledge Improve Multimodal Alignment in Vision-Language Pretraining?

paper_url: http://arxiv.org/abs/2308.12898
repo_url: https://github.com/wangfei-2019/snare
paper_authors: Fei Wang, Liang Ding, Jun Rao, Ye Liu, Li Shen, Changxing Ding
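for: This study investigates whether semantic knowledge and syntactic structure can be extracted during vision-language pretraining (VLP), and how such linguistic knowledge affects or improves multimodal alignment.
methods: The authors design SNARE, the first large-scale multimodal alignment probing benchmark, to detect vital linguistic components such as lexical, semantic, and syntax knowledge. A holistic analysis of five advanced VLP models shows that the models: i) are insensitive to complex syntax structures and rely on content words for sentence comprehension; ii) have limited comprehension of combinations between sentences and negations; iii) struggle to determine the presence of actions or spatial relationships within visual information and to verify the correctness of triple combinations.
results: VLP models have difficulty with complex syntax structures and with combinations of sentences and negation logic, and they struggle to identify actions or spatial relationships in visual information.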

Abstract
The multimedia community has shown a significant interest in perceiving and representing the physical world with multimodal pretrained neural network models, and among them, visual-language pretraining (VLP) is currently the most captivating topic. However, there have been few endeavors dedicated to the exploration of 1) whether essential linguistic knowledge (e.g., semantics and syntax) can be extracted during VLP, and 2) how such linguistic knowledge impacts or enhances the multimodal alignment. In response, here we aim to elucidate the impact of comprehensive linguistic knowledge, including semantic expression and syntactic structure, on multimodal alignment. Specifically, we design and release SNARE, the first large-scale multimodal alignment probing benchmark, to detect the vital linguistic components, e.g., lexical, semantic, and syntax knowledge, containing four tasks: Semantic structure, Negation logic, Attribute ownership, and Relationship composition. Based on our proposed probing benchmarks, our holistic analyses of five advanced VLP models illustrate that the VLP model: i) shows insensitivity towards complex syntax structures and relies on content words for sentence comprehension; ii) demonstrates limited comprehension of combinations between sentences and negations; iii) faces challenges in determining the presence of actions or spatial relationships within visual information and struggles with verifying the correctness of triple combinations. We make our benchmark and code available at https://github.com/WangFei-2019/SNARE/.

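Summary
The multimedia community has shown broad interest in using multimodal pretrained neural network models to perceive and represent the physical world, and visual-language pretraining (VLP) is the most captivating topic among them. However, few efforts have been dedicated to two questions: whether essential linguistic knowledge (such as semantics and syntax) can be extracted during VLP, and how such linguistic knowledge affects multimodal alignment. To answer these questions, we examine the impact of comprehensive linguistic knowledge, including semantic expression and syntactic structure, on multimodal alignment. To this end, we design and release SNARE, the first large-scale multimodal alignment probing benchmark, to detect key linguistic components such as lexical, semantic, and syntax knowledge. Using the proposed probing benchmark, we conduct a holistic analysis of five advanced VLP models and find that: 1. VLP models are insensitive to complex syntax structures and rely on content words to understand sentences; 2. VLP models have limited comprehension of combinations of sentences and negations; 3. VLP models have difficulty identifying actions or spatial relationships in visual information, and they also struggle to verify the correctness of triple combinations. Our benchmark and code are available at https://github.com/WangFei-2019/SNARE/.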

Large Language Models Vote: Prompting for Rare Disease Identification

paper_url: http://arxiv.org/abs/2308.12890
repo_url: https://github.com/oniani/llms-vote
paper_authors: David Oniani, Jordan Hilsman, Hang Dong, Fengyi Gao, Shiven Verma, Yanshan Wang
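for: The paper proposes a flexible prompting approach to improve the performance of large language model (LLM) queries in few-shot learning (FSL) tasks.
methods: The approach, Models-Vote Prompting (MVP), prompts multiple LLMs to perform the same task and takes a majority vote over their outputs.
results: On a rare disease identification and classification task, MVP improves on the results of any single model in the ensemble. The authors also release a novel rare disease dataset, available to those who have agreed to the MIMIC-IV Data Use Agreement (DUA).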

Abstract
The emergence of generative Large Language Models (LLMs) emphasizes the need for accurate and efficient prompting approaches. LLMs are often applied in Few-Shot Learning (FSL) contexts, where tasks are executed with minimal training data. FSL has become popular in many Artificial Intelligence (AI) subdomains, including AI for health. Rare diseases, affecting a small fraction of the population, inherently require FSL techniques due to limited data availability, though manual data collection and annotation is costly and time-consuming. In this paper, we propose Models-Vote Prompting (MVP), a flexible prompting approach for improving the performance of LLM queries in FSL settings. MVP works by prompting numerous LLMs to perform the same tasks and then conducting a majority vote on the resulting outputs. This method achieves improved results relative to any one model in the ensemble on one-shot rare disease identification and classification tasks. We also release a novel rare disease dataset for FSL, available to those who agreed to the MIMIC-IV Data Use Agreement (DUA). Furthermore, in using MVP, each model is prompted multiple times, substantially increasing the time needed for manual annotation, and to address this, we assess the feasibility of using JSON for automating generative LLM evaluation.

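Summary
The emergence of generative Large Language Models (LLMs) emphasizes the need for accurate and efficient prompting approaches. LLMs are frequently applied in Few-Shot Learning (FSL) contexts, where tasks are executed with minimal training data. FSL has become popular in many Artificial Intelligence (AI) subdomains, including AI for health. Rare diseases, affecting a small fraction of the population, inherently require FSL techniques due to limited data availability, although manual data collection and annotation is costly and time-consuming. In this paper, we propose Models-Vote Prompting (MVP), a flexible prompting approach for improving the performance of LLM queries in FSL settings. MVP prompts numerous LLMs to perform the same task and then conducts a majority vote on the resulting outputs. On one-shot rare disease identification and classification tasks, this method improves on any single model in the ensemble. We also release a novel rare disease dataset, available to those who have agreed to the MIMIC-IV Data Use Agreement (DUA). In addition, each model is prompted multiple times when using MVP, which substantially increases the time needed for manual annotation. To address this, we assess the feasibility of using JSON to automate generative LLM evaluation.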
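
The voting step, together with JSON-based parsing of model replies, can be sketched in a few lines. This is an illustration, not the authors' code: the `"label"` key, the reply strings, and the handling of unparseable replies are assumptions, and real replies would come from actual LLM calls rather than a hard-coded list.

```python
import json
from collections import Counter

def parse_label(raw_output, key="label"):
    """Extract a label from a model's JSON reply; return None if the reply is not usable."""
    try:
        value = json.loads(raw_output).get(key)
    except (json.JSONDecodeError, AttributeError):
        return None  # not valid JSON, or valid JSON that is not an object
    return value.strip().lower() if isinstance(value, str) else None

def majority_vote(raw_outputs):
    """Return the most common parsed label, or None if no reply could be parsed."""
    labels = [lab for lab in map(parse_label, raw_outputs) if lab is not None]
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]

# Hypothetical replies from four different models to the same prompt.
replies = [
    '{"label": "yes"}',
    '{"label": "No"}',
    'Sure! The answer is yes.',   # free text: cannot be scored automatically
    '{"label": "yes"}',
]
print(majority_vote(replies))  # yes
```

Asking each model to answer in a fixed JSON schema is what makes the automatic tally possible. Free-text replies, like the third one, are dropped here rather than guessed at. Ties go to the label that was seen first, which is one arbitrary choice among several.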

Inducing Causal Structure for Abstractive Text Summarization

paper_url: http://arxiv.org/abs/2308.12888
repo_url: None
paper_authors: Lu Chen, Ruqing Zhang, Wei Huang, Wei Chen, Jiafeng Guo, Xueqi Cheng
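for: This study aims to improve data-driven abstractive summarization models by emphasizing causal relationships rather than correlations.
methods: The authors introduce a Structural Causal Model (SCM) that assumes several latent causal and non-causal factors representing the content and style of the document and summary. They prove that, under certain conditions, the latent factors can be identified by fitting the observed training data. Building on this, they propose a Causality Inspired Sequence-to-Sequence model (CI-Seq2Seq) that learns causal representations in order to pursue causal information for summary generation.
results: Experiments on two widely used text summarization datasets demonstrate the advantages of the approach.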

Abstract
The mainstream of data-driven abstractive summarization models tends to explore the correlations rather than the causal relationships. Among such correlations, there can be spurious ones which suffer from the language prior learned from the training corpus and therefore undermine the overall effectiveness of the learned model. To tackle this issue, we introduce a Structural Causal Model (SCM) to induce the underlying causal structure of the summarization data. We assume several latent causal factors and non-causal factors, representing the content and style of the document and summary. Theoretically, we prove that the latent factors in our SCM can be identified by fitting the observed training data under certain conditions. On the basis of this, we propose a Causality Inspired Sequence-to-Sequence model (CI-Seq2Seq) to learn the causal representations that can mimic the causal factors, guiding us to pursue causal information for summary generation. The key idea is to reformulate the Variational Auto-encoder (VAE) to fit the joint distribution of the document and summary variables from the training corpus. Experimental results on two widely used text summarization datasets demonstrate the advantages of our approach.

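Summary
Mainstream data-driven abstractive summarization models tend to explore correlations rather than causal relationships. Some of these correlations may be spurious, shaped by the language prior learned from the training corpus, and they reduce the overall effectiveness of the model. To tackle this issue, we introduce a Structural Causal Model (SCM) to uncover the underlying causal structure of the summarization data. We assume several latent causal and non-causal factors that represent the content and style of the document and summary. Theoretically, we prove that the latent factors in our SCM can be identified by fitting the training data. On this basis, we propose a Causality Inspired Sequence-to-Sequence model (CI-Seq2Seq) that learns causal representations in order to pursue causal information for summary generation. The key idea is to reformulate the Variational Auto-encoder (VAE) to fit the joint distribution of the training corpus. Experimental results show the advantages of our approach on two widely used text summarization datasets.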
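
As background for "reformulating the VAE to fit the joint distribution", the standard evidence lower bound for a latent-variable model of a document $x$ and a summary $y$ is shown below. The symbols ($x$, $y$, a single latent $z$, encoder $q_\phi$, decoder $p_\theta$) are assumptions chosen for illustration. The paper's objective, which separates causal from non-causal factors, may differ from this generic bound.

```latex
\log p_\theta(x, y) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(x, y \mid z)\big]
\;-\; \mathrm{KL}\big(q_\phi(z \mid x, y)\,\|\,p(z)\big)
```

Maximizing the right-hand side trains the decoder to reconstruct the document and summary from $z$, while keeping the approximate posterior close to the prior $p(z)$.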