cs.CL - 2023-08-13

Faithful to Whom? Questioning Interpretability Measures in NLP

  • paper_url: http://arxiv.org/abs/2308.06795
  • repo_url: None
  • paper_authors: Evan Crothers, Herna Viktor, Nathalie Japkowicz
  • for: This paper asks whether existing faithfulness metrics are suitable for comparing the interpretability of different neural text classifiers.
  • methods: The authors probe faithfulness metrics based on iterative masking and find that the resulting scores vary strongly across otherwise comparable models (a minimal sketch of such a metric follows the abstract).
  • results: Masked samples frequently fall outside the training data distribution, and iterative masking can produce large swings in faithfulness scores; the authors also examine how adversarial attacks and adversarial training affect faithfulness scores.
    Abstract A common approach to quantifying model interpretability is to calculate faithfulness metrics based on iteratively masking input tokens and measuring how much the predicted label changes as a result. However, we show that such metrics are generally not suitable for comparing the interpretability of different neural text classifiers as the response to masked inputs is highly model-specific. We demonstrate that iterative masking can produce large variation in faithfulness scores between comparable models, and show that masked samples are frequently outside the distribution seen during training. We further investigate the impact of adversarial attacks and adversarial training on faithfulness scores, and demonstrate the relevance of faithfulness measures for analyzing feature salience in text adversarial attacks. Our findings provide new insights into the limitations of current faithfulness metrics and key considerations to utilize them appropriately.
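The metric under scrutiny is computed by masking the highest-attributed tokens and measuring how much the predicted probability drops. Below is a minimal, self-contained sketch of a comprehensiveness-style score computed this way; the toy bag-of-words classifier, the [MASK] token, and the masking fractions are illustrative assumptions, not the models or exact protocol used in the paper.

```python
# A minimal sketch of an iterative-masking faithfulness score
# (comprehensiveness-style). The toy classifier and attribution
# scores are placeholders, not the models studied in the paper.

import numpy as np

MASK = "[MASK]"

def toy_classifier(tokens):
    """Placeholder sentiment scorer: probability of the positive class."""
    positive = {"good", "great", "excellent"}
    score = sum(t in positive for t in tokens)
    return 1.0 / (1.0 + np.exp(-(score - 0.5)))  # squash to (0, 1)

def faithfulness_score(tokens, attributions, top_fracs=(0.1, 0.2, 0.5)):
    """Average drop in predicted probability when the top-attributed
    tokens are iteratively replaced by a mask token."""
    base = toy_classifier(tokens)
    order = np.argsort(attributions)[::-1]          # most important first
    drops = []
    for frac in top_fracs:
        k = max(1, int(frac * len(tokens)))
        masked = list(tokens)
        for idx in order[:k]:
            masked[idx] = MASK                      # masked input may be out-of-distribution
        drops.append(base - toy_classifier(masked))
    return float(np.mean(drops))

tokens = ["the", "movie", "was", "great", "and", "excellent"]
attributions = [0.01, 0.05, 0.02, 0.9, 0.01, 0.8]   # e.g., from a saliency method
print(faithfulness_score(tokens, attributions))
```

The paper's point is that the masked inputs this procedure feeds to the model are often out-of-distribution, so the resulting score reflects model-specific behavior on masked text as much as it reflects explanation quality.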

Modeling the Dashboard Provenance

  • paper_url: http://arxiv.org/abs/2308.06788
  • repo_url: None
  • paper_authors: Johne Jarske, Jorge Rady, Lucia V. L. Filgueiras, Leandro M. Velloso, Tania L. Santos
  • for: The paper aims to provide a provenance representation model for dashboards and their visual and data components, helping organizations evaluate the quality, consistency, and reliability of the information presented on dashboards.
  • methods: The proposed model offers a comprehensive set of essential provenance metadata that lets users evaluate the context in which a specific dashboard was developed, including the people, organizations, entities, and activities involved in the production, influence, or delivery of the data or object (a data-structure sketch follows the abstract).
  • results: A standardized, visualizable representation of provenance metadata for dashboards that helps users make better decisions based on the quality and reliability of the information presented.
    Abstract Organizations of all kinds, whether public or private, profit-driven or non-profit, and across various industries and sectors, rely on dashboards for effective data visualization. However, the reliability and efficacy of these dashboards depend on the quality of the visuals and data they present. Studies show that less than a quarter of dashboards provide information about their sources, which is just one of the expected metadata when provenance is seriously considered. Provenance is a record that describes people, organizations, entities, and activities that had a role in the production, influence, or delivery of a piece of data or an object. This paper aims to provide a provenance representation model that enables standardization, modeling, generation, capture, and visualization, specifically designed for dashboards and their visual and data components. The proposed model will offer a comprehensive set of essential provenance metadata that enables users to evaluate the quality, consistency, and reliability of the information presented on dashboards. This will allow a clear and precise understanding of the context in which a specific dashboard was developed, ultimately leading to better decision-making.
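The abstract frames provenance in terms of people, organizations, entities, and activities. Below is a minimal sketch of how such provenance metadata for a dashboard component might be represented as plain data structures; the class and field names are illustrative assumptions (loosely echoing W3C PROV vocabulary), not the schema proposed in the paper.

```python
# A hedged sketch of provenance metadata for one dashboard component.
# Field names and structure are illustrative assumptions, not the
# paper's proposed model.

from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Agent:
    name: str            # person or organization
    role: str            # e.g., "data engineer", "publisher"

@dataclass
class Activity:
    description: str     # e.g., "ETL run", "chart redesign"
    performed_by: Agent
    timestamp: datetime

@dataclass
class DashboardComponentProvenance:
    component_id: str                               # visual or data component
    data_sources: List[str]                         # upstream datasets / systems
    derived_from: List[str] = field(default_factory=list)
    activities: List[Activity] = field(default_factory=list)

prov = DashboardComponentProvenance(
    component_id="sales_by_region_chart",
    data_sources=["warehouse.sales_fact"],
    activities=[Activity("nightly ETL refresh",
                         Agent("Data Platform Team", "data engineer"),
                         datetime(2023, 8, 10))],
)
print(prov.component_id, len(prov.activities))
```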

Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

  • paper_url: http://arxiv.org/abs/2308.06744
  • repo_url: None
  • paper_authors: Minsoo Kim, Sihwa Lee, Janghwan Lee, Sukjin Hong, Du-Seong Chang, Wonyong Sung, Jungwook Choi
  • for: This work targets the deployment problem posed by the large size of generative language models.
  • methods: The authors apply quantization-aware training (QAT) and propose a knowledge distillation method tailored to generative models, token-scaled logit distillation (a hedged sketch of such a loss follows the abstract).
  • results: The method achieves ternary weight QAT of large-scale GLMs with less than 1.0 degradation in perplexity and no loss of accuracy on a reasoning task.
    Abstract Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning. However, the large model size poses challenges for practical deployment. To solve this problem, Quantization-Aware Training (QAT) has become increasingly popular. However, current QAT methods for generative models have resulted in a noticeable loss of accuracy. To counteract this issue, we propose a novel knowledge distillation method specifically designed for GLMs. Our method, called token-scaled logit distillation, prevents overfitting and provides superior learning from the teacher model and ground truth. This research marks the first evaluation of ternary weight quantization-aware training of large-scale GLMs with less than 1.0 degradation in perplexity and no loss of accuracy in a reasoning task.
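The abstract names the method token-scaled logit distillation but does not spell out the scaling rule. The sketch below illustrates one plausible reading in PyTorch, weighting each token's KL term by a normalized inverse-entropy confidence of the teacher; the weighting scheme, tensor shapes, and hyperparameters are assumptions for illustration only, not the paper's exact loss.

```python
# A hedged sketch of a token-scaled logit-distillation loss for QAT of a
# generative LM. The per-token scale used here (teacher inverse entropy,
# normalized over the sequence) is an assumption for illustration.

import torch
import torch.nn.functional as F

def token_scaled_distillation_loss(student_logits, teacher_logits):
    """student_logits, teacher_logits: (batch, seq_len, vocab)."""
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    t_p = t_logp.exp()
    s_logp = F.log_softmax(student_logits, dim=-1)

    # Per-token KL(teacher || student).
    kl = (t_p * (t_logp - s_logp)).sum(dim=-1)             # (batch, seq_len)

    # Assumed scale: low-entropy (confident) teacher tokens count more.
    entropy = -(t_p * t_logp).sum(dim=-1)                   # (batch, seq_len)
    scale = 1.0 / (1.0 + entropy)
    scale = scale / scale.sum(dim=-1, keepdim=True)         # normalize over the sequence

    return (scale * kl).sum(dim=-1).mean()

# Toy usage with random logits standing in for a quantized student and FP teacher.
student = torch.randn(2, 8, 100, requires_grad=True)
teacher = torch.randn(2, 8, 100)
loss = token_scaled_distillation_loss(student, teacher)
loss.backward()
print(loss.item())
```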

Emergent communication for AR

  • paper_url: http://arxiv.org/abs/2308.07342
  • repo_url: None
  • paper_authors: Ruxiao Chen, Shuaishuai Guo
  • for: This paper proposes an emergent semantic communication framework for mobile augmented reality (MAR) to improve communication efficiency.
  • methods: Two agents are trained through a modified Lewis signaling game so that a compact discrete communication protocol emerges spontaneously (a toy sketch of such a game follows the abstract).
  • results: Experiments show that the proposed scheme generalizes better to unseen objects than the object recognition traditionally used in MAR, and that it improves communication efficiency by exchanging very small messages.
    Abstract Mobile augmented reality (MAR) is widely acknowledged as one of the ubiquitous interfaces to the digital twin and Metaverse, demanding unparalleled levels of latency, computational power, and energy efficiency. The existing solutions for realizing MAR combine multiple technologies like edge, cloud computing, and fifth-generation (5G) networks. However, the inherent communication latency of visual data imposes apparent limitations on the quality of experience (QoE). To address the challenge, we propose an emergent semantic communication framework to learn the communication protocols in MAR. Specifically, we train two agents through a modified Lewis signaling game to emerge a discrete communication protocol spontaneously. Based on this protocol, two agents can communicate about the abstract idea of visual data through messages with extremely small data sizes in a noisy channel, which leads to message errors. To better simulate real-world scenarios, we incorporate channel uncertainty into our training process. Experiments have shown that the proposed scheme has better generalization on unseen objects than traditional object recognition used in MAR and can effectively enhance communication efficiency through the utilization of small-size messages.
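As a rough illustration of how a discrete protocol can emerge from a Lewis-style signaling game over a noisy channel, the toy PyTorch sketch below trains a sender and receiver end-to-end with straight-through Gumbel-softmax. The single-symbol messages, linear networks, and symbol-flip noise model are simplifying assumptions; the paper's modified game operates on MAR visual data rather than one-hot object labels.

```python
# A toy Lewis signaling game: sender emits one discrete symbol, the channel
# randomly corrupts it, and the receiver must recover the object identity.
# Sizes, noise model, and single-symbol messages are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

N_OBJECTS, VOCAB, NOISE_P = 10, 16, 0.1

sender = nn.Linear(N_OBJECTS, VOCAB)      # object -> message logits
receiver = nn.Linear(VOCAB, N_OBJECTS)    # message -> object guess
opt = torch.optim.Adam(list(sender.parameters()) + list(receiver.parameters()), lr=1e-2)

for step in range(2000):
    obj = torch.randint(0, N_OBJECTS, (32,))
    obj_onehot = F.one_hot(obj, N_OBJECTS).float()

    # Sender emits a discrete symbol (straight-through Gumbel-softmax).
    msg = F.gumbel_softmax(sender(obj_onehot), tau=1.0, hard=True)

    # Noisy channel: with probability NOISE_P the symbol is replaced at random.
    flip = (torch.rand(msg.size(0), 1) < NOISE_P).float()
    rand_sym = F.one_hot(torch.randint(0, VOCAB, (msg.size(0),)), VOCAB).float()
    noisy_msg = (1 - flip) * msg + flip * rand_sym

    loss = F.cross_entropy(receiver(noisy_msg), obj)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Evaluate the emerged protocol on clean (noise-free) symbols.
with torch.no_grad():
    eval_msg = F.one_hot(sender(torch.eye(N_OBJECTS)).argmax(-1), VOCAB).float()
    acc = (receiver(eval_msg).argmax(-1) == torch.arange(N_OBJECTS)).float().mean()
print(f"symbol-per-object accuracy: {acc:.2f}")
```

Because the message is a single symbol from a small vocabulary, the payload is a few bits per object, which is the sense in which such a protocol can reduce communication cost relative to transmitting visual data.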