cs.AI - 2023-11-08

Geometry-Calibrated DRO: Combating Over-Pessimism with Free Energy Implications

  • paper_url: http://arxiv.org/abs/2311.05054
  • repo_url: None
  • paper_authors: Jiashuo Liu, Jiayun Wu, Tianyu Wang, Hao Zou, Bo Li, Peng Cui
  • for: Improving the robustness of machine learning algorithms to distributional shifts.
  • methods: Uses Distributionally Robust Optimization (DRO), optimizing the worst-case risk within an uncertainty set.
  • results: Proposes Geometry-Calibrated DRO (GCDRO), which incorporates data geometry into calibration terms to alleviate the impact of noise, and establishes a connection between the risk objective and the Helmholtz free energy in statistical physics. Comprehensive experiments confirm GCDRO's superiority over conventional DRO methods.
    Abstract Machine learning algorithms minimizing average risk are susceptible to distributional shifts. Distributionally Robust Optimization (DRO) addresses this issue by optimizing the worst-case risk within an uncertainty set. However, DRO suffers from over-pessimism, leading to low-confidence predictions, poor parameter estimations as well as poor generalization. In this work, we conduct a theoretical analysis of a probable root cause of over-pessimism: excessive focus on noisy samples. To alleviate the impact of noise, we incorporate data geometry into calibration terms in DRO, resulting in our novel Geometry-Calibrated DRO (GCDRO) for regression. We establish the connection between our risk objective and the Helmholtz free energy in statistical physics, and this free-energy-based risk can extend to standard DRO methods. Leveraging gradient flow in Wasserstein space, we develop an approximate minimax optimization algorithm with a bounded error ratio and elucidate how our approach mitigates noisy sample effects. Comprehensive experiments confirm GCDRO's superiority over conventional DRO methods.
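The paper's free-energy connection can be made concrete with the log-sum-exp (dual) form of a KL-ball DRO objective, which is the standard link between worst-case risk and the Helmholtz free energy. The NumPy sketch below is only that generic form with an assumed temperature and toy losses; GCDRO's geometry-based calibration terms are not reproduced here.

```python
import numpy as np

def free_energy_risk(losses, temperature):
    """Soft worst-case risk T * log E[exp(loss / T)].

    This is the dual (free-energy) form of KL-constrained DRO: as the
    temperature T -> 0 it approaches the worst-case (max) loss, and as
    T -> infinity it approaches the ordinary average risk.  GCDRO's
    geometry-based calibration terms are not reproduced here.
    """
    scaled = np.asarray(losses, dtype=float) / temperature
    m = scaled.max()  # log-mean-exp computed stably
    return temperature * (m + np.log(np.mean(np.exp(scaled - m))))

per_sample_losses = [0.1, 0.3, 2.5, 0.2]  # one noisy, high-loss sample
print(free_energy_risk(per_sample_losses, temperature=0.1))   # dominated by the outlier
print(free_energy_risk(per_sample_losses, temperature=10.0))  # close to the mean risk
```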

Zero-shot Translation of Attention Patterns in VQA Models to Natural Language

  • paper_url: http://arxiv.org/abs/2311.05043
  • repo_url: https://github.com/explainableml/zs-a2t
  • paper_authors: Leonard Salewski, A. Sophia Koepke, Hendrik P. A. Lensch, Zeynep Akata
  • for: Proposes a framework that translates transformer attention into natural language in a zero-shot manner, to give human-understandable insights into a model's internals.
  • methods: The framework builds on a pre-trained large language model (LLM) that receives a task prompt, the question, and the predicted answer as inputs, and is guided to select tokens that describe the regions of the input image the VQA model attended to.
  • results: Achieves state-of-the-art zero-shot performance on textual explanation datasets for VQA, with strong results on GQA-REX and VQA-X.
    Abstract Converting a model's internals to text can yield human-understandable insights about the model. Inspired by the recent success of training-free approaches for image captioning, we propose ZS-A2T, a zero-shot framework that translates the transformer attention of a given model into natural language without requiring any training. We consider this in the context of Visual Question Answering (VQA). ZS-A2T builds on a pre-trained large language model (LLM), which receives a task prompt, question, and predicted answer, as inputs. The LLM is guided to select tokens which describe the regions in the input image that the VQA model attended to. Crucially, we determine this similarity by exploiting the text-image matching capabilities of the underlying VQA model. Our framework does not require any training and allows the drop-in replacement of different guiding sources (e.g. attribution instead of attention maps), or language models. We evaluate this novel task on textual explanation datasets for VQA, giving state-of-the-art performances for the zero-shot setting on GQA-REX and VQA-X. Our code is available at: https://github.com/ExplainableML/ZS-A2T.
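The core loop of this kind of zero-shot attention translation is guided decoding: the LLM proposes candidate tokens and an external text-image similarity signal from the VQA model re-ranks them. The sketch below shows such a generic re-ranking loop with the LLM proposal and the similarity scorer passed in as callables; the stub functions, mixing weight, and stopping rule are assumptions, not the released ZS-A2T implementation (see the linked repo).

```python
from typing import Callable, List, Tuple

def guided_decode(
    propose_topk: Callable[[str, int], List[Tuple[str, float]]],
    region_similarity: Callable[[str], float],
    prompt: str,
    max_tokens: int = 20,
    k: int = 3,
    alpha: float = 0.5,
) -> str:
    """Greedy guided decoding: mix LLM log-probs with an external similarity score."""
    text = prompt
    for _ in range(max_tokens):
        candidates = propose_topk(text, k)  # [(token, log_prob), ...] from the LLM
        scored = [(tok, alpha * lp + (1 - alpha) * region_similarity(text + tok))
                  for tok, lp in candidates]
        best_tok, _ = max(scored, key=lambda s: s[1])
        text += best_tok
        if best_tok.strip() == ".":
            break
    return text

# Toy stand-ins so the sketch runs; a real setup would query the LLM for top-k
# continuations and the VQA model's image-text matching head for the similarity.
demo = guided_decode(
    propose_topk=lambda t, k: ([(" cat", -0.1), (" dog", -0.2)] if "cat" not in t else [(" .", -0.1)])[:k],
    region_similarity=lambda t: 1.0 if "cat" in t else 0.0,
    prompt="The model attended to the",
)
print(demo)  # "The model attended to the cat ."
```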

Automated Annotation of Scientific Texts for ML-based Keyphrase Extraction and Validation

  • paper_url: http://arxiv.org/abs/2311.05042
  • repo_url: None
  • paper_authors: Oluwamayowa O. Amusat, Harshad Hegde, Christopher J. Mungall, Anna Giannakou, Neil P. Byers, Dan Gunter, Kjiersten Fagnan, Lavanya Ramakrishnan
  • for: Aims to improve the trustworthiness of machine-generated metadata so that large scientific datasets, particularly in the biosciences, can be found and used more effectively.
  • methods: Proposes two new automated text-labeling approaches: one exploits relationships between different types of data sources related to the same research study, the other uses domain-specific controlled vocabularies or ontologies.
  • results: Experiments show that the proposed label-assignment approaches can generate highly specific text labels, with up to 44% of the labels matching those suggested by an ML keyphrase extraction algorithm.
    Abstract Advanced omics technologies and facilities generate a wealth of valuable data daily; however, the data often lacks the essential metadata required for researchers to find and search them effectively. The lack of metadata poses a significant challenge in the utilization of these datasets. Machine learning-based metadata extraction techniques have emerged as a potentially viable approach to automatically annotating scientific datasets with the metadata necessary for enabling effective search. Text labeling, usually performed manually, plays a crucial role in validating machine-extracted metadata. However, manual labeling is time-consuming; thus, there is a need to develop automated text labeling techniques in order to accelerate the process of scientific innovation. This need is particularly urgent in fields such as environmental genomics and microbiome science, which have historically received less attention in terms of metadata curation and creation of gold-standard text mining datasets. In this paper, we present two novel automated text labeling approaches for the validation of ML-generated metadata for unlabeled texts, with specific applications in environmental genomics. Our techniques show the potential of two new ways to leverage existing information about the unlabeled texts and the scientific domain. The first technique exploits relationships between different types of data sources related to the same research study, such as publications and proposals. The second technique takes advantage of domain-specific controlled vocabularies or ontologies. In this paper, we detail applying these approaches for ML-generated metadata validation. Our results show that the proposed label assignment approaches can generate both generic and highly-specific text labels for the unlabeled texts, with up to 44% of the labels matching with those suggested by an ML keyword extraction algorithm.
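The second technique, assigning labels from a domain-specific controlled vocabulary or ontology and comparing them with ML-extracted keyphrases, can be illustrated with plain term matching. The vocabulary, proposal text, and keyphrase list below are invented examples; the paper's actual pipeline and ontologies are not reproduced.

```python
def assign_vocab_labels(text, vocabulary):
    """Assign every controlled-vocabulary term that appears in the text."""
    lowered = text.lower()
    return {term for term in vocabulary if term.lower() in lowered}

def overlap_fraction(vocab_labels, ml_keyphrases):
    """Fraction of ML-extracted keyphrases covered by the vocabulary labels
    (the paper reports up to 44% agreement on its data)."""
    if not ml_keyphrases:
        return 0.0
    ml = {k.lower() for k in ml_keyphrases}
    return len(ml & {v.lower() for v in vocab_labels}) / len(ml)

vocabulary = {"soil metagenome", "nitrogen fixation", "16S rRNA"}
proposal_text = "We sequence a soil metagenome to study nitrogen fixation pathways."
labels = assign_vocab_labels(proposal_text, vocabulary)
print(labels, overlap_fraction(labels, {"nitrogen fixation", "sequencing"}))
```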

Transfer learning from a sparsely annotated dataset of 3D medical images

  • paper_url: http://arxiv.org/abs/2311.05032
  • repo_url: https://github.com/diagnijmegen/medicaltransferlearning3d-unet
  • paper_authors: Gabriel Efrain Humpire-Mamani, Colin Jacobs, Mathias Prokop, Bram van Ginneken, Nikolas Lessmann
  • for: This study aims to improve the efficiency of annotation and increase the accessibility of accurate organ segmentation in medical imaging using transfer learning.
  • methods: The authors use transfer learning to leverage pre-trained model features from a large dataset to improve the performance of deep convolutional neural networks for organ segmentation in medical imaging. They use a base segmentation model (3D U-Net) trained on a large and sparsely annotated dataset and fine-tune it for four new down-stream segmentation tasks with fully annotated datasets.
  • results: The results show that transfer learning from the base model is beneficial when small datasets are available, providing significant performance improvements. Fine-tuning the base model is more beneficial than updating all the network weights with vanilla transfer learning. The study also shows that cross-modality transfer learning using CT scans is beneficial. The performance of the fine-tuned models increased by up to 0.129 (+28%) Dice score and on average 23 experiments increased the performance by 0.029 Dice score in the new segmentation tasks.
    Abstract Transfer learning leverages pre-trained model features from a large dataset to save time and resources when training new models for various tasks, potentially enhancing performance. Due to the lack of large datasets in the medical imaging domain, transfer learning from one medical imaging model to other medical imaging models has not been widely explored. This study explores the use of transfer learning to improve the performance of deep convolutional neural networks for organ segmentation in medical imaging. A base segmentation model (3D U-Net) was trained on a large and sparsely annotated dataset; its weights were used for transfer learning on four new down-stream segmentation tasks for which a fully annotated dataset was available. We analyzed the training set size's influence to simulate scarce data. The results showed that transfer learning from the base model was beneficial when small datasets were available, providing significant performance improvements, with fine-tuning the base model being more beneficial than updating all the network weights with vanilla transfer learning. Transfer learning with fine-tuning increased the performance by up to 0.129 (+28%) Dice score compared to experiments trained from scratch, and on average the 23 experiments increased the performance by 0.029 Dice score in the new segmentation tasks. The study also showed that cross-modality transfer learning using CT scans was beneficial. The findings of this study demonstrate the potential of transfer learning to improve the efficiency of annotation and increase the accessibility of accurate organ segmentation in medical imaging, ultimately leading to improved patient care. We made the network definition and weights publicly available to benefit other users and researchers.
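The released network definition and weights (linked in the repo above) would normally be loaded directly; the PyTorch sketch below only illustrates, under assumed names and shapes, the difference between vanilla transfer learning (one learning rate for all weights) and a fine-tuning regime, here approximated by a much smaller learning rate on the pretrained backbone.

```python
import torch

class TinyUNet3D(torch.nn.Module):
    """Stand-in for the base 3D U-Net; only the layer names matter here."""
    def __init__(self, out_channels):
        super().__init__()
        self.encoder = torch.nn.Conv3d(1, 8, kernel_size=3, padding=1)
        self.decoder = torch.nn.Conv3d(8, out_channels, kernel_size=1)
    def forward(self, x):
        return self.decoder(torch.relu(self.encoder(x)))

model = TinyUNet3D(out_channels=2)

# Transfer learning: initialize from the sparsely-annotated base model while
# keeping the new task head randomly initialized (checkpoint path is assumed).
# state = torch.load("base_3d_unet.pt")
# model.load_state_dict(state, strict=False)

# Vanilla transfer learning: one learning rate for every weight.
opt_vanilla = torch.optim.Adam(model.parameters(), lr=1e-4)

# Fine-tuning variant: keep the pretrained encoder close to its initialization
# by training it with a much smaller learning rate than the new head.
opt_finetune = torch.optim.Adam([
    {"params": model.encoder.parameters(), "lr": 1e-5},
    {"params": model.decoder.parameters(), "lr": 1e-3},
])

dummy_ct_patch = torch.randn(1, 1, 16, 16, 16)
print(model(dummy_ct_patch).shape)  # torch.Size([1, 2, 16, 16, 16])
```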

Towards Effective Paraphrasing for Information Disguise

  • paper_url: http://arxiv.org/abs/2311.05018
  • repo_url: https://github.com/idecir/idecir-towards-effective-paraphrasing-for-information-disguise
  • paper_authors: Anmol Agarwal, Shrey Gupta, Vamshi Bonagiri, Manas Gaur, Joseph Reagle, Ponnurangam Kumaraguru
  • for: Proposes an information disguise technique based on iterative paraphrasing, to prevent the non-consensual use of authors' posts on the Internet.
  • methods: Examines AI-based automated word spinners from NLP practice (e.g., SpinRewriter, WordAI) and iteratively perturbs sentences via phrase substitution so as to confuse the search mechanism of a neural retriever when the sentence is queried against it.
  • results: Multi-level phrase substitution guided by perplexity-based phrase-importance ranking and beam search succeeds in disguising sentences 82% of the time, helping authors conceal sensitive content and reducing the risk of it being exploited by malicious actors.
    Abstract Information Disguise (ID), a part of computational ethics in Natural Language Processing (NLP), is concerned with best practices of textual paraphrasing to prevent the non-consensual use of authors' posts on the Internet. Research on ID becomes important when authors' written online communication pertains to sensitive domains, e.g., mental health. Over time, researchers have utilized AI-based automated word spinners (e.g., SpinRewriter, WordAI) for paraphrasing content. However, these tools fail to satisfy the purpose of ID as their paraphrased content still leads to the source when queried on search engines. There is limited prior work on judging the effectiveness of paraphrasing methods for ID on search engines or their proxies, neural retriever (NeurIR) models. We propose a framework where, for a given sentence from an author's post, we perform iterative perturbation on the sentence in the direction of paraphrasing with an attempt to confuse the search mechanism of a NeurIR system when the sentence is queried on it. Our experiments involve the subreddit 'r/AmItheAsshole' as the source of public content and Dense Passage Retriever as a NeurIR system-based proxy for search engines. Our work introduces a novel method of phrase-importance rankings using perplexity scores and involves multi-level phrase substitutions via beam search. Our multi-phrase substitution scheme succeeds in disguising sentences 82% of the time and hence takes an essential step towards enabling researchers to disguise sensitive content effectively before making it public. We also release the code of our approach.
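Phrase-importance ranking via perplexity can be realized by scoring how much a language model's perplexity changes when a candidate phrase is dropped. The sketch below uses GPT-2 from Hugging Face transformers purely as an illustrative scorer; the model choice, the drop-the-phrase rule, and the example sentence are assumptions rather than the paper's exact procedure, which further applies multi-level substitutions via beam search.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean token cross-entropy
    return float(torch.exp(loss))

def rank_phrases(sentence, phrases):
    """Rank candidate phrases by how much perplexity changes when each is dropped."""
    base = perplexity(sentence)
    scores = [(p, abs(perplexity(sentence.replace(p, "").strip()) - base)) for p in phrases]
    return sorted(scores, key=lambda x: x[1], reverse=True)

sentence = "My partner refuses to split the rent even though we both work full time."
print(rank_phrases(sentence, ["split the rent", "work full time"]))
```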

Joint Sensing and Semantic Communications with Multi-Task Deep Learning

  • paper_url: http://arxiv.org/abs/2311.05017
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Yalin E. Sagduyu, Tugba Erpek, Aylin Yener, Sennur Ulukus
  • for: Explores the deep-learning-based integration of sensing and communications, with an extension to semantic communications.
  • methods: The transmitter uses a deep neural network (encoder) for joint source coding, channel coding, and modulation, while the receiver uses another deep neural network (decoder) for demodulation, channel decoding, and source decoding to reconstruct the data samples.
  • results: Experiments use CIFAR-10 as input data and account for channel effects such as additive white Gaussian noise and fading; the results show that multi-task deep learning achieves high-fidelity joint sensing and semantic communications.
    Abstract This paper explores the integration of deep learning techniques for joint sensing and communications, with an extension to semantic communications. The integrated system comprises a transmitter and receiver operating over a wireless channel, subject to noise and fading effects. The transmitter employs a deep neural network, namely an encoder, for joint operations of source coding, channel coding, and modulation, while the receiver utilizes another deep neural network, namely a decoder, for joint operations of demodulation, channel decoding, and source decoding to reconstruct the data samples. The transmitted signal serves a dual purpose, supporting communication with the receiver and enabling sensing. When a target is present, the reflected signal is received, and another deep neural network decoder is utilized for sensing. This decoder is responsible for detecting the target's presence and determining its range. All these deep neural networks, including one encoder and two decoders, undergo joint training through multi-task learning, considering data and channel characteristics. This paper extends to incorporate semantic communications by introducing an additional deep neural network, another decoder at the receiver, operating as a task classifier. This decoder evaluates the fidelity of label classification for received signals, enhancing the integration of semantics within the communication process. The study presents results based on using the CIFAR-10 as the input data and accounting for channel effects like Additive White Gaussian Noise (AWGN) and Rayleigh fading. The results underscore the effectiveness of multi-task deep learning in achieving high-fidelity joint sensing and semantic communications.
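A minimal way to see how the pieces fit together is an AWGN channel layer plus a weighted multi-task objective covering reconstruction (communication), target detection and range estimation (sensing), and label classification (semantics). The PyTorch functions below are generic illustrations; the paper's encoder/decoder architectures and loss weighting are not reproduced.

```python
import torch
import torch.nn.functional as F

def awgn(x, snr_db):
    """Additive white Gaussian noise channel at a given signal-to-noise ratio (dB)."""
    signal_power = x.pow(2).mean()
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + noise_power.sqrt() * torch.randn_like(x)

def joint_loss(x, x_hat, presence_logit, presence_label, range_pred, range_true,
               class_logits, class_labels, weights=(1.0, 1.0, 1.0)):
    """Multi-task objective: communication + sensing + semantic classification."""
    comm = F.mse_loss(x_hat, x)                                        # reconstruct data samples
    sense = F.binary_cross_entropy_with_logits(presence_logit, presence_label) \
            + F.mse_loss(range_pred, range_true)                       # detect target, estimate range
    semantic = F.cross_entropy(class_logits, class_labels)             # task classifier (e.g. CIFAR-10 label)
    return weights[0] * comm + weights[1] * sense + weights[2] * semantic

# Toy shapes standing in for encoded CIFAR-10 signals.
tx = torch.randn(4, 64)
rx = awgn(tx, snr_db=10.0)
loss = joint_loss(tx, rx, torch.randn(4), torch.ones(4), torch.randn(4), torch.zeros(4),
                  torch.randn(4, 10), torch.randint(0, 10, (4,)))
print(float(loss))
```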

Interpreting Pretrained Language Models via Concept Bottlenecks

  • paper_url: http://arxiv.org/abs/2311.05014
  • repo_url: None
  • paper_authors: Zhen Tan, Lu Cheng, Song Wang, Yuan Bo, Jundong Li, Huan Liu
  • for: Aims to address the black-box nature of pretrained language models (PLMs) and improve their interpretability in natural language processing tasks.
  • methods: Proposes a novel approach that interprets PLMs through high-level, human-understandable concepts; C$^3$M combines human-annotated and machine-generated concepts and extracts hidden neurons designed to capture semantically meaningful, task-specific concepts.
  • results: Empirical studies on real-world datasets show the approach offers valuable insights into PLM behavior, helps diagnose model failures, and enhances model robustness amidst noisy concept labels.
    Abstract Pretrained language models (PLMs) have made significant strides in various natural language processing tasks. However, the lack of interpretability due to their ``black-box'' nature poses challenges for responsible implementation. Although previous studies have attempted to improve interpretability by using, e.g., attention weights in self-attention layers, these weights often lack clarity, readability, and intuitiveness. In this research, we propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans. For example, we learn the concept of ``Food'' and investigate how it influences the prediction of a model's sentiment towards a restaurant review. We introduce C$^3$M, which combines human-annotated and machine-generated concepts to extract hidden neurons designed to encapsulate semantically meaningful and task-specific concepts. Through empirical evaluations on real-world datasets, we manifest that our approach offers valuable insights to interpret PLM behavior, helps diagnose model failures, and enhances model robustness amidst noisy concept labels.
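Structurally, a concept-bottleneck interpretation layer maps a PLM's sentence embedding to a small set of human-readable concept activations and predicts the task label from those activations alone. The PyTorch sketch below shows only this generic structure; the dimensions and concept names are assumptions, and C$^3$M's combination of human-annotated and machine-generated concepts is not modeled.

```python
import torch

class ConceptBottleneckHead(torch.nn.Module):
    """Task prediction routed through explicit concept activations."""
    def __init__(self, hidden_dim, num_concepts, num_labels):
        super().__init__()
        self.to_concepts = torch.nn.Linear(hidden_dim, num_concepts)  # e.g. "Food", "Service", ...
        self.to_labels = torch.nn.Linear(num_concepts, num_labels)    # e.g. sentiment classes
    def forward(self, cls_embedding):
        concepts = torch.sigmoid(self.to_concepts(cls_embedding))     # interpretable bottleneck
        return concepts, self.to_labels(concepts)

head = ConceptBottleneckHead(hidden_dim=768, num_concepts=8, num_labels=2)
concepts, logits = head(torch.randn(1, 768))
print(concepts.shape, logits.shape)  # torch.Size([1, 8]) torch.Size([1, 2])
```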

Expressibility-induced Concentration of Quantum Neural Tangent Kernels

  • paper_url: http://arxiv.org/abs/2311.04965
  • repo_url: None
  • paper_authors: Li-Wei Yu, Weikang Li, Qi Ye, Zhide Lu, Zizhao Han, Dong-Ling Deng
  • for: Studies quantum tangent kernel methods for analyzing the performance of quantum machine learning models in the infinite-width limit, which matters for designing wide quantum variational circuits in practical applications.
  • methods: Uses quantum tangent kernels to analyze performance in the infinite-width limit and to describe, analytically, the convergence rate of training errors in quantum neural networks, studying the connection between trainability and expressibility.
  • results: For global loss functions, high expressibility of both global and local quantum encodings provably leads to exponential concentration of quantum tangent kernel values to zero; for local loss functions the concentration persists but can be partially mitigated. Extensive numerical simulations support the analysis, offering guidance for designing wide quantum variational circuit models.
    Abstract Quantum tangent kernel methods provide an efficient approach to analyzing the performance of quantum machine learning models in the infinite-width limit, which is of crucial importance in designing appropriate circuit architectures for certain learning tasks. Recently, they have been adapted to describe the convergence rate of training errors in quantum neural networks in an analytical manner. Here, we study the connections between the trainability and expressibility of quantum tangent kernel models. In particular, for global loss functions, we rigorously prove that high expressibility of both the global and local quantum encodings can lead to exponential concentration of quantum tangent kernel values to zero. Whereas for local loss functions, such issue of exponential concentration persists owing to the high expressibility, but can be partially mitigated. We further carry out extensive numerical simulations to support our analytical theories. Our discoveries unveil a pivotal characteristic of quantum neural tangent kernels, offering valuable insights for the design of wide quantum variational circuit models in practical applications.

Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models

  • paper_url: http://arxiv.org/abs/2311.04902
  • repo_url: https://github.com/rocktimjyotidas/gblm-pruner
  • paper_authors: Rocktim Jyoti Das, Liqun Ma, Zhiqiang Shen
  • for: Proposes a gradient-based pruning method for large language models (GBLM-Pruner) to sparsify them without compromising performance.
  • methods: Uses properly normalized gradients from a few calibration samples and the first-order term of the Taylor expansion to determine pruning importance scores, without any training or retuning.
  • results: Experiments show GBLM-Pruner outperforms SparseGPT and Wanda on multiple benchmarks, with no additional training or weight updates required.
    Abstract Large Language Models (LLMs) with a billion or more parameters are prime targets for network pruning, which aims to reduce a portion of the network weights without compromising performance. Prior approaches such as Weights Magnitude, SparseGPT, and Wanda either concentrated solely on weights or integrated weights with activations for sparsity. However, they overlooked the informative gradients derived from pretrained large language models. In this paper, we present a novel sparsity-centric pruning method for pretrained LLMs, termed Gradient-based Language Model Pruner (GBLM-Pruner). GBLM-Pruner leverages the first-order term of the Taylor expansion, operating in a training-free manner by harnessing properly normalized gradients from a few calibration samples to determine the importance pruning score, and substantially outperforms competitive counterparts like SparseGPT and Wanda in multiple benchmarks. Intriguingly, after incorporating gradients, the unstructured pruning method tends to reveal some structural patterns post-pruning, which mirrors the geometric interdependence inherent in the LLMs' parameter structure. Additionally, GBLM-Pruner functions without any subsequent retraining or weight updates to maintain its simplicity like its counterparts. Extensive evaluations on LLaMA-1 and LLaMA-2 across various language benchmarks and perplexity show that GBLM-Pruner surpasses magnitude pruning, Wanda (weights+activations) and SparseGPT (weights+activations+weight update) by significant margins. Our code and models are available at https://github.com/RocktimJyotiDas/GBLM-Pruner.
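A first-order Taylor view of pruning scores each weight by the product of its magnitude and the magnitude of its calibration-set gradient, then zeroes the lowest-scoring weights. The sketch below shows only that generic criterion with per-row comparison groups; the gradient normalization and the exact GBLM-Pruner metric should be taken from the released code rather than from this illustration.

```python
import torch

def gradient_magnitude_prune(weight, grad, sparsity):
    """Zero the lowest-importance weights, with importance ~= |g| * |w| (first-order Taylor)."""
    importance = grad.abs() * weight.abs()
    k = int(sparsity * weight.shape[1])              # weights to drop per output row
    drop = importance.argsort(dim=1)[:, :k]          # indices of the k least important per row
    mask = torch.ones_like(weight)
    mask.scatter_(1, drop, 0.0)
    return weight * mask

w = torch.randn(4, 8)
g = torch.randn(4, 8)   # gradients accumulated over a few calibration samples
pruned = gradient_magnitude_prune(w, g, sparsity=0.5)
print((pruned == 0).float().mean())  # 0.5 unstructured sparsity
```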

Prompt Sketching for Large Language Models

  • paper_url: http://arxiv.org/abs/2311.04954
  • repo_url: None
  • paper_authors: Luca Beurer-Kellner, Mark Niklas Müller, Marc Fischer, Martin Vechev
  • For: The paper aims to address the issue of disconnected and wordy intermediate responses in recent prompting strategies for large language models (LLMs).
  • Methods: The proposed method, called prompt sketching, involves predicting values for multiple variables in a template, allowing users to have more control over the generation process and provide a reasoning framework via intermediate instructions. The key idea is to adapt the decoding procedure to also score follow-up instructions during text generation, optimizing overall template likelihood in inference.
  • Results: The paper shows that prompt sketching outperforms existing, sequential prompting schemes such as direct asking or chain-of-thought on 7 out of 8 LLM benchmarking tasks, including state tracking, arithmetic reasoning, and general question answering. The paper also releases a number of generic, yet effective sketches applicable to many tasks and an open-source library called dclib, powering the sketch-aware decoders.
    Abstract Many recent prompting strategies for large language models (LLMs) query the model multiple times sequentially -- first to produce intermediate results and then the final answer. However, using these methods, both decoder and model are unaware of potential follow-up prompts, leading to disconnected and undesirably wordy intermediate responses. In this work, we address this issue by proposing prompt sketching, a new prompting paradigm in which an LLM does not only respond by completing a prompt, but by predicting values for multiple variables in a template. This way, sketching grants users more control over the generation process, e.g., by providing a reasoning framework via intermediate instructions, leading to better overall results. The key idea enabling sketching with existing, autoregressive models is to adapt the decoding procedure to also score follow-up instructions during text generation, thus optimizing overall template likelihood in inference. Our experiments show that in a zero-shot setting, prompt sketching outperforms existing, sequential prompting schemes such as direct asking or chain-of-thought on 7 out of 8 LLM benchmarking tasks, including state tracking, arithmetic reasoning, and general question answering. To facilitate future use, we release a number of generic, yet effective sketches applicable to many tasks, and an open source library called dclib, powering our sketch-aware decoders.

Two Complementary Perspectives to Continual Learning: Ask Not Only What to Optimize, But Also How

  • paper_url: http://arxiv.org/abs/2311.04898
  • repo_url: None
  • paper_authors: Timm Hess, Tinne Tuytelaars, Gido M. van de Ven
  • for: Focuses on continual learning in deep neural networks, in particular the temporary but substantial forgetting that occurs when training on a new task begins.
  • methods: Considers approaches that approximate the joint loss over all tasks via replay or regularization terms, and proposes complementing them with gradient projection techniques that alter the optimization trajectory.
  • results: Plans to combine replay-approximated joint objectives with gradient projection-based optimization routines to test whether the latter helps (1) alleviate the stability gap, (2) increase learning efficiency, and (3) improve the final learning outcome.
    Abstract Recent years have seen considerable progress in the continual training of deep neural networks, predominantly thanks to approaches that add replay or regularization terms to the loss function to approximate the joint loss over all tasks so far. However, we show that even with a perfect approximation to the joint loss, these approaches still suffer from temporary but substantial forgetting when starting to train on a new task. Motivated by this 'stability gap', we propose that continual learning strategies should focus not only on the optimization objective, but also on the way this objective is optimized. While there is some continual learning work that alters the optimization trajectory (e.g., using gradient projection techniques), this line of research is positioned as alternative to improving the optimization objective, while we argue it should be complementary. To evaluate the merits of our proposition, we plan to combine replay-approximated joint objectives with gradient projection-based optimization routines to test whether the addition of the latter provides benefits in terms of (1) alleviating the stability gap, (2) increasing the learning efficiency and (3) improving the final learning outcome.
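Gradient projection is one concrete way to alter the optimization trajectory: the new-task gradient is stripped of its component along a stored reference direction from earlier tasks. The NumPy sketch below shows the single-direction case; which directions are stored and how this is combined with replay-approximated joint objectives is precisely what the paper proposes to study.

```python
import numpy as np

def project_out(grad, reference):
    """Remove from `grad` its component along `reference` (orthogonal projection)."""
    ref_norm_sq = float(reference @ reference)
    if ref_norm_sq == 0.0:
        return grad
    return grad - (float(grad @ reference) / ref_norm_sq) * reference

g_new = np.array([1.0, 2.0, 0.0])      # gradient on the new task
g_old = np.array([0.0, 1.0, 0.0])      # stored direction from a previous task
g_safe = project_out(g_new, g_old)
print(g_safe, float(g_safe @ g_old))   # [1. 0. 0.] 0.0 -> no interference along g_old
```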

DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets

  • paper_url: http://arxiv.org/abs/2311.04894
  • repo_url: https://github.com/jinga-lala/damex
  • paper_authors: Yash Jain, Harkirat Behl, Zsolt Kira, Vibhav Vineet
  • for: Addresses how to build a universal detector: how can a single model be trained most effectively on a large mixture of datasets?
  • methods: Proposes Dataset-Aware Mixture-of-Experts (DAMEX), which trains the experts to become dataset experts by learning to route each dataset's tokens to its mapped expert.
  • results: On the Universal Object-Detection Benchmark, DAMEX outperforms the existing state of the art by an average of +10.2 AP and improves over the non-MoE baseline by an average of +2.0 AP, with consistent gains when mixing datasets with (1) limited availability, (2) disparate domains, and (3) divergent label sets; qualitative results also show DAMEX is robust against expert representation collapse.
    Abstract Construction of a universal detector poses a crucial question: How can we most effectively train a model on a large mixture of datasets? The answer lies in learning dataset-specific features and ensembling their knowledge but do all this in a single model. Previous methods achieve this by having separate detection heads on a common backbone but that results in a significant increase in parameters. In this work, we present Mixture-of-Experts as a solution, highlighting that MoEs are much more than a scalability tool. We propose Dataset-Aware Mixture-of-Experts, DAMEX where we train the experts to become an `expert' of a dataset by learning to route each dataset tokens to its mapped expert. Experiments on Universal Object-Detection Benchmark show that we outperform the existing state-of-the-art by average +10.2 AP score and improve over our non-MoE baseline by average +2.0 AP score. We also observe consistent gains while mixing datasets with (1) limited availability, (2) disparate domains and (3) divergent label sets. Further, we qualitatively show that DAMEX is robust against expert representation collapse.
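The key mechanism is routing by dataset identity: every token from dataset d is sent to the expert mapped to d. The PyTorch sketch below shows that hard dataset-to-expert routing for a single MoE feed-forward layer; the dimensions, expert definition, and how DAMEX combines this with the rest of the detector are assumptions, not the released implementation.

```python
import torch

class DatasetAwareMoE(torch.nn.Module):
    """One expert FFN per dataset; tokens are routed by their dataset id."""
    def __init__(self, dim, num_datasets):
        super().__init__()
        self.experts = torch.nn.ModuleList(
            torch.nn.Sequential(torch.nn.Linear(dim, 4 * dim), torch.nn.GELU(),
                                torch.nn.Linear(4 * dim, dim))
            for _ in range(num_datasets)
        )
    def forward(self, tokens, dataset_ids):
        # tokens: (num_tokens, dim); dataset_ids: (num_tokens,) integer dataset labels
        out = torch.zeros_like(tokens)
        for d, expert in enumerate(self.experts):
            sel = dataset_ids == d
            if sel.any():
                out[sel] = expert(tokens[sel])
        return out

layer = DatasetAwareMoE(dim=32, num_datasets=3)
tokens = torch.randn(10, 32)
ids = torch.randint(0, 3, (10,))
print(layer(tokens, ids).shape)  # torch.Size([10, 32])
```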

Towards Few-Annotation Learning in Computer Vision: Application to Image Classification and Object Detection tasks

  • paper_url: http://arxiv.org/abs/2311.04888
  • repo_url: None
  • paper_authors: Quentin Bouniot
  • for: This thesis develops theoretical, algorithmic, and experimental contributions for machine learning with limited labels, focusing on image classification and object detection in computer vision.
  • methods: Connects popular meta-learning algorithms for few-shot classification to multi-task representation learning theory to identify the conditions for more efficient meta-learning; it also proposes an unsupervised pretraining method and a semi-supervised learning method for Transformer-based object detectors.
  • results: The experiments show that bridging multi-task learning theory and meta-learning benefits few-shot classification, and that label-free pretraining (contrastive learning enhanced with localization information) improves object detection.
    Abstract In this thesis, we develop theoretical, algorithmic and experimental contributions for Machine Learning with limited labels, and more specifically for the tasks of Image Classification and Object Detection in Computer Vision. In a first contribution, we are interested in bridging the gap between theory and practice for popular Meta-Learning algorithms used in Few-Shot Classification. We make connections to Multi-Task Representation Learning, which benefits from solid theoretical foundations, to verify the best conditions for a more efficient meta-learning. Then, to leverage unlabeled data when training object detectors based on the Transformer architecture, we propose both an unsupervised pretraining and a semi-supervised learning method in two other separate contributions. For pretraining, we improve Contrastive Learning for object detectors by introducing the localization information. Finally, our semi-supervised method is the first tailored to transformer-based detectors.

SEMQA: Semi-Extractive Multi-Source Question Answering

  • paper_url: http://arxiv.org/abs/2311.04886
  • repo_url: https://github.com/google-research-datasets/quotesum
  • paper_authors: Tal Schuster, Adam D. Lelkes, Haitian Sun, Jai Gupta, Jonathan Berant, William W. Cohen, Donald Metzler
  • for: Introduces a new QA task, answering multi-answer questions by summarizing multiple diverse sources into one comprehensive answer, to better study and evaluate language models' consolidation abilities.
  • methods: Uses a semi-extractive format that mixes factual quoted spans, copied verbatim from the input sources, with non-factual free-text connectors that glue these spans into a single cohesive passage.
  • results: Experiments with several LLMs show the task is surprisingly challenging, demonstrating the value of the new QuoteSum dataset for developing and studying such consolidation capabilities.
    Abstract Recently proposed long-form question answering (QA) systems, supported by large language models (LLMs), have shown promising capabilities. Yet, attributing and verifying their generated abstractive answers can be difficult, and automatically evaluating their accuracy remains an ongoing challenge. In this work, we introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion. Specifically, Semi-extractive Multi-source QA (SEMQA) requires models to output a comprehensive answer, while mixing factual quoted spans -- copied verbatim from given input sources -- and non-factual free-text connectors that glue these spans together into a single cohesive passage. This setting bridges the gap between the outputs of well-grounded but constrained extractive QA systems and more fluent but harder to attribute fully abstractive answers. Particularly, it enables a new mode for language models that leverages their advanced language generation capabilities, while also producing fine in-line attributions by-design that are easy to verify, interpret, and evaluate. To study this task, we create the first dataset of this kind, QuoteSum, with human-written semi-extractive answers to natural and generated questions, and define text-based evaluation metrics. Experimenting with several LLMs in various settings, we find this task to be surprisingly challenging, demonstrating the importance of QuoteSum for developing and studying such consolidation capabilities.

LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models

  • paper_url: http://arxiv.org/abs/2311.04879
  • repo_url: https://github.com/yangjianxin1/longqlora
  • paper_authors: Jianxin Yang
  • for: Extends the context length of large language models with limited training resources.
  • methods: Combines Position Interpolation, QLoRA, and the Shift Short Attention of LongLoRA, training on a single 32GB V100 GPU.
  • results: Extends the context length of LLaMA2 7B and 13B from 4096 to 8192 and beyond, achieves competitive perplexity on the PG19 and Proof-pile datasets, and comes very close to MPT-7B-8K at an evaluation context length of 8192.
    Abstract We present LongQLoRA, an efficient and effective method to extend context length of large language models with less training resources. LongQLoRA combines the advantages of Position Interpolation, QLoRA and Shift Short Attention of LongLoRA. With a single 32GB V100 GPU, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192 and even to 12k within 1000 finetuning steps. LongQLoRA achieves competitive perplexity performance on PG19 and Proof-pile datasets, our model outperforms LongLoRA and is very close to MPT-7B-8K within the evaluation context length of 8192. We collect and build 39k long instruction data to extend context length of Vicuna-13B from 4096 to 8192 and achieve good performance both in long and short context generation task. We also do some ablation experiments to study the effect of LoRA rank, finetuning steps and attention patterns in inference.The model weights, training data and code are avaliable at https://github.com/yangjianxin1/LongQLoRA.
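Position Interpolation rescales positions so that the longer target window is squeezed back into the position range seen during pretraining of a RoPE-based model. The sketch below shows that rescaling and the resulting rotary angles; the base frequency and head dimension follow common LLaMA defaults and are assumptions here, as is everything about how LongQLoRA combines this with QLoRA and Shift Short Attention.

```python
import numpy as np

def rope_angles(positions, head_dim, base=10000.0):
    """Rotary-embedding angles theta_{p,i} = p / base^(2i/head_dim)."""
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    return np.outer(positions, inv_freq)

pretrained_len, target_len = 4096, 8192
scale = pretrained_len / target_len            # 0.5: squeeze 8192 positions into [0, 4096)

positions = np.arange(target_len)
interpolated = positions * scale               # position interpolation
angles = rope_angles(interpolated, head_dim=128)
print(angles.shape, interpolated[-1])          # (8192, 64) 4095.5 -- stays inside the trained range
```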

Rethinking Benchmark and Contamination for Language Models with Rephrased Samples

  • paper_url: http://arxiv.org/abs/2311.04850
  • repo_url: https://github.com/lm-sys/llm-decontaminator
  • paper_authors: Shuo Yang, Wei-Lin Chiang, Lianmin Zheng, Joseph E. Gonzalez, Ion Stoica
  • for: Examines benchmark contamination in the training data of large language models (LLMs) and how stronger detection methods can address it.
  • methods: Evaluates existing detection methods such as string matching (n-gram overlap) against simple variations of test data (e.g., paraphrasing, translation), and proposes a stronger LLM-based decontamination method applied to widely used pre-training and fine-tuning datasets.
  • results: Shows that existing detection methods are insufficient and easily bypassed by rephrased test data (a 13B model can overfit a contaminated benchmark and reach GPT-4-level scores), reveals significant previously unknown test overlap in widely used datasets, and finds contamination even in synthetic data generated by GPT-3.5/4; the paper urges the community to adopt stronger decontamination approaches and to develop fresh one-time exams for public benchmarks.
    Abstract Large language models are increasingly trained on all the data ever produced by humans. Many have raised concerns about the trustworthiness of public benchmarks due to potential contamination in pre-training or fine-tuning datasets. While most data decontamination efforts apply string matching (e.g., n-gram overlap) to remove benchmark data, we show that these methods are insufficient, and simple variations of test data (e.g., paraphrasing, translation) can easily bypass these decontamination measures. Furthermore, we demonstrate that if such variation of test data is not eliminated, a 13B model can easily overfit a test benchmark and achieve drastically high performance, on par with GPT-4. We validate such observations in widely used benchmarks such as MMLU, GSK8k, and HumanEval. To address this growing risk, we propose a stronger LLM-based decontamination method and apply it to widely used pre-training and fine-tuning datasets, revealing significant previously unknown test overlap. For example, in pre-training sets such as RedPajama-Data-1T and StarCoder-Data, we identified that 8-18% of the HumanEval benchmark overlaps. Interestingly, we also find such contamination in synthetic datasets generated by GPT-3.5/4, suggesting a potential risk of unintentional contamination. We urge the community to adopt stronger decontamination approaches when using public benchmarks. Moreover, we call for the community to actively develop fresh one-time exams to evaluate models accurately. Our decontamination tool is publicly available at https://github.com/lm-sys/llm-decontaminator.
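The string-matching baseline the paper argues is insufficient is easy to state: flag a training example as contaminated if it shares a word n-gram with a test example. The sketch below implements that check and shows how a simple rephrasing escapes it, which is the failure mode motivating the stronger LLM-based detector; the n value and the example strings are illustrative only.

```python
def ngrams(text, n=8):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def ngram_overlap(train_doc, test_doc, n=8):
    """True if the two texts share any word n-gram (a common decontamination rule)."""
    return bool(ngrams(train_doc, n) & ngrams(test_doc, n))

benchmark = "Write a function that returns the sum of the squares of the first n natural numbers."
verbatim  = "Write a function that returns the sum of the squares of the first n natural numbers."
rephrased = "Implement a routine computing the total of squared values for the initial n integers."

print(ngram_overlap(verbatim, benchmark))   # True  -- caught by string matching
print(ngram_overlap(rephrased, benchmark))  # False -- rephrasing slips through
```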

Identifying Semantic Component for Robust Molecular Property Prediction

  • paper_url: http://arxiv.org/abs/2311.04837
  • repo_url: https://github.com/dmirlab-group/sci
  • paper_authors: Zijian Li, Zunhong Xu, Ruichu Cai, Zhenhui Yang, Yuguang Yan, Zhifeng Hao, Guangyi Chen, Kun Zhang
  • for: Aims to improve the out-of-distribution generalization of graph neural networks (GNNs) for molecular property prediction.
  • methods: Proposes SCI, a generative model with semantic-component identifiability that decomposes latent variables into semantic-relevant (SR) and semantic-irrelevant (SI) components.
  • results: Experiments achieve state-of-the-art performance and show general improvement on 21 datasets across 3 mainstream benchmarks; visualization results further provide insightful case studies and explanations of the predictions.
    Abstract Although graph neural networks have achieved great success in the task of molecular property prediction in recent years, their generalization ability under out-of-distribution (OOD) settings is still under-explored. Different from existing methods that learn discriminative representations for prediction, we propose a generative model with semantic-components identifiability, named SCI. We demonstrate that the latent variables in this generative model can be explicitly identified into semantic-relevant (SR) and semantic-irrelevant (SI) components, which contributes to better OOD generalization by involving minimal change properties of causal mechanisms. Specifically, we first formulate the data generation process from the atom level to the molecular level, where the latent space is split into SI substructures, SR substructures, and SR atom variables. Sequentially, to reduce misidentification, we restrict the minimal changes of the SR atom variables and add a semantic latent substructure regularization to mitigate the variance of the SR substructure under augmented domain changes. Under mild assumptions, we prove the block-wise identifiability of the SR substructure and the comment-wise identifiability of SR atom variables. Experimental studies achieve state-of-the-art performance and show general improvement on 21 datasets in 3 mainstream benchmarks. Moreover, the visualization results of the proposed SCI method provide insightful case studies and explanations for the prediction results. The code is available at: https://github.com/DMIRLAB-Group/SCI.

Decentralized Personalized Online Federated Learning

  • paper_url: http://arxiv.org/abs/2311.04817
  • repo_url: None
  • paper_authors: Renzhi Wu, Saayan Mitra, Xiang Chen, Anup Rao
  • for: Proposes a new learning setting, Decentralized Personalized Online Federated Learning, that simultaneously supports online learning, per-client personalized models, and a decentralized setting, as required by important applications on enterprise edge servers (e.g., online item recommendation at global scale).
  • methods: Addresses two technical challenges: aggregating shared model parameters from neighboring clients into a well-performing personalized local model, by directly learning the aggregation weights, and selecting each client's neighbors, via a peer-selection method based on the learned aggregation weights that also reduces communication cost.
  • results: Experiments on three real-world item recommendation datasets and one air quality prediction dataset verify the effectiveness and robustness of the proposed method.
    Abstract Vanilla federated learning does not support learning in an online environment, learning a personalized model on each client, and learning in a decentralized setting. There are existing methods extending federated learning in each of the three aspects. However, some important applications on enterprise edge servers (e.g. online item recommendation at global scale) involve the three aspects at the same time. Therefore, we propose a new learning setting \textit{Decentralized Personalized Online Federated Learning} that considers all the three aspects at the same time. In this new setting for learning, the first technical challenge is how to aggregate the shared model parameters from neighboring clients to obtain a personalized local model with good performance on each client. We propose to directly learn an aggregation by optimizing the performance of the local model with respect to the aggregation weights. This not only improves personalization of each local model but also helps the local model adapting to potential data shift by intelligently incorporating the right amount of information from its neighbors. The second challenge is how to select the neighbors for each client. We propose a peer selection method based on the learned aggregation weights enabling each client to select the most helpful neighbors and reduce communication cost at the same time. We verify the effectiveness and robustness of our proposed method on three real-world item recommendation datasets and one air quality prediction dataset.

MTGER: Multi-view Temporal Graph Enhanced Temporal Reasoning over Time-Involved Document

  • paper_url: http://arxiv.org/abs/2311.04816
  • repo_url: None
  • paper_authors: Zheng Chu, Zekun Wang, Jiafeng Liang, Ming Liu, Bing Qin
  • for: Targets temporal reasoning over time-involved documents, where facts and time are intricately intertwined.
  • methods: Proposes MTGER, a multi-view temporal graph enhanced temporal reasoning framework that explicitly models the temporal relationships among facts and combines a multi-view mechanism with adaptive fusion and a self-supervised time-comparing objective to improve implicit reasoning.
  • results: Experiments show MTGER is effective on the TimeQA and SituatedQA datasets and gives more consistent answers under question perturbations.
    Abstract The facts and time in the document are intricately intertwined, making temporal reasoning over documents challenging. Previous work models time implicitly, making it difficult to handle such complex relationships. To address this issue, we propose MTGER, a novel Multi-view Temporal Graph Enhanced Temporal Reasoning framework for temporal reasoning over time-involved documents. Concretely, MTGER explicitly models the temporal relationships among facts by multi-view temporal graphs. On the one hand, the heterogeneous temporal graphs explicitly model the temporal and discourse relationships among facts; on the other hand, the multi-view mechanism captures both time-focused and fact-focused information, allowing the two views to complement each other through adaptive fusion. To further improve the implicit reasoning capability of the model, we design a self-supervised time-comparing objective. Extensive experimental results demonstrate the effectiveness of our method on the TimeQA and SituatedQA datasets. Furthermore, MTGER gives more consistent answers under question perturbations.

DACBERT: Leveraging Dependency Agreement for Cost-Efficient Bert Pretraining

  • paper_url: http://arxiv.org/abs/2311.04799
  • repo_url: https://github.com/sw-packages/fa101e30ca4ffd6a0479993b0e1c7299d2311c0416c0b68e2551534430e1e8fe
  • paper_authors: Martin Kuo, Jianyi Zhang, Yiran Chen
  • for: Improves the performance and interpretability of cost-efficient BERT pretraining for natural language understanding tasks.
  • methods: Proposes Dependency Agreement Crammed BERT (DACBERT) and a two-stage pretraining framework, Dependency Agreement Pretraining, which, grounded in linguistic theory, weaves syntactic and semantic information into pretraining: the first stage uses four dedicated submodels to capture representative chunk-level dependency agreements and convert them into embeddings; the second stage uses these refined embeddings together with conventional BERT embeddings to guide the pretraining of the rest of the model.
  • results: On the GLUE benchmark, DACBERT surpasses Crammed BERT by 3.13% on RTE and 2.26% on MRPC and raises the average GLUE score by 0.83%; pretraining completes on a single GPU within 24 hours with no additional computational resources, and further studies show the approach also bolsters the interpretability of pretrained language models.
    Abstract Building on the cost-efficient pretraining advancements brought about by Crammed BERT, we enhance its performance and interpretability further by introducing a novel pretrained model Dependency Agreement Crammed BERT (DACBERT) and its two-stage pretraining framework - Dependency Agreement Pretraining. This framework, grounded by linguistic theories, seamlessly weaves syntax and semantic information into the pretraining process. The first stage employs four dedicated submodels to capture representative dependency agreements at the chunk level, effectively converting these agreements into embeddings. The second stage uses these refined embeddings, in tandem with conventional BERT embeddings, to guide the pretraining of the rest of the model. Evaluated on the GLUE benchmark, our DACBERT demonstrates notable improvement across various tasks, surpassing Crammed BERT by 3.13% in the RTE task and by 2.26% in the MRPC task. Furthermore, our method boosts the average GLUE score by 0.83%, underscoring its significant potential. The pretraining process can be efficiently executed on a single GPU within a 24-hour cycle, necessitating no supplementary computational resources or extending the pretraining duration compared with the Crammed BERT. Extensive studies further illuminate our approach's instrumental role in bolstering the interpretability of pretrained language models for natural language understanding tasks.

On the Multiple Roles of Ontologies in Explainable AI

  • paper_url: http://arxiv.org/abs/2311.04778
  • repo_url: None
  • paper_authors: Roberto Confalonieri, Giancarlo Guizzardi
  • for: Discusses the different roles that ontologies can play in Explainable AI and in the development of human-centric explainable systems and intelligible explanations.
  • methods: Considers three main perspectives in which ontologies can contribute significantly: reference modelling, common-sense reasoning, and knowledge refinement and complexity management.
  • results: Concludes that ontology-based approaches can support human-understandable and effective explanations, while several challenges still need to be addressed to evaluate their human-understandability and effectiveness.
    Abstract This paper discusses the different roles that explicit knowledge, in particular ontologies, can play in Explainable AI and in the development of human-centric explainable systems and intelligible explanations. We consider three main perspectives in which ontologies can contribute significantly, namely reference modelling, common-sense reasoning, and knowledge refinement and complexity management. We overview some of the existing approaches in the literature, and we position them according to these three proposed perspectives. The paper concludes by discussing what challenges still need to be addressed to enable ontology-based approaches to explanation and to evaluate their human-understandability and effectiveness.

Vital Sign Forecasting for Sepsis Patients in ICUs

  • paper_url: http://arxiv.org/abs/2311.04770
  • repo_url: None
  • paper_authors: Anubhav Bhatti, Yuwei Liu, Chen Dan, Bingjie Shen, San Lee, Yonghwan Kim, Jang Yong Kim
  • for: Forecasts vital signs of patients in Intensive Care Units (ICUs) to help healthcare providers detect early signs of physiological instability and anticipate septic shock.
  • methods: Uses state-of-the-art deep learning (DL) architectures to build a multi-step forecasting system that predicts up to 3 hours of future vital signs from 6 hours of historical data, adopting the DILATE loss function to better capture the shape and temporal dynamics of vital signs.
  • results: Compares three DL models (N-BEATS, N-HiTS, and the Temporal Fusion Transformer) on the eICU Collaborative Research Database using MSE and DTW metrics, finding that TFT excels at capturing overall trends while N-HiTS better retains short-term fluctuations within a predefined range.
    Abstract Sepsis and septic shock are a critical medical condition affecting millions globally, with a substantial mortality rate. This paper uses state-of-the-art deep learning (DL) architectures to introduce a multi-step forecasting system to predict vital signs indicative of septic shock progression in Intensive Care Units (ICUs). Our approach utilizes a short window of historical vital sign data to forecast future physiological conditions. We introduce a DL-based vital sign forecasting system that predicts up to 3 hours of future vital signs from 6 hours of past data. We further adopt the DILATE loss function to capture better the shape and temporal dynamics of vital signs, which are critical for clinical decision-making. We compare three DL models, N-BEATS, N-HiTS, and Temporal Fusion Transformer (TFT), using the publicly available eICU Collaborative Research Database (eICU-CRD), highlighting their forecasting capabilities in a critical care setting. We evaluate the performance of our models using mean squared error (MSE) and dynamic time warping (DTW) metrics. Our findings show that while TFT excels in capturing overall trends, N-HiTS is superior in retaining short-term fluctuations within a predefined range. This paper demonstrates the potential of deep learning in transforming the monitoring systems in ICUs, potentially leading to significant improvements in patient care and outcomes by accurately forecasting vital signs to assist healthcare providers in detecting early signs of physiological instability and anticipating septic shock.

The voraus-AD Dataset for Anomaly Detection in Robot Applications

  • paper_url: http://arxiv.org/abs/2311.04765
  • repo_url: https://github.com/vorausrobotik/voraus-ad-dataset
  • paper_authors: Jan Thieß Brockmann, Marco Rudolph, Bodo Rosenhahn, Bastian Wandt
  • For: This paper aims to provide a dataset for anomaly detection (AD) in robotic applications, and to introduce a new baseline method called MVT-Flow that outperforms previous baselines by a large margin.
  • Methods: The paper uses machine data from a pick-and-place application to create a dataset for AD, and introduces MVT-Flow, a deep-learning-based density estimation method with normalizing flows whose architecture takes the structure of the data domain into account.
  • Results: The paper shows that MVT-Flow outperforms previous baselines by a large margin of 6.2% in area under ROC.
    Abstract During the operation of industrial robots, unusual events may endanger the safety of humans and the quality of production. When collecting data to detect such cases, it is not ensured that data from all potentially occurring errors is included as unforeseeable events may happen over time. Therefore, anomaly detection (AD) delivers a practical solution, using only normal data to learn to detect unusual events. We introduce a dataset that allows training and benchmarking of anomaly detection methods for robotic applications based on machine data which will be made publicly available to the research community. As a typical robot task the dataset includes a pick-and-place application which involves movement, actions of the end effector and interactions with the objects of the environment. Since several of the contained anomalies are not task-specific but general, evaluations on our dataset are transferable to other robotics applications as well. Additionally, we present MVT-Flow (multivariate time-series flow) as a new baseline method for anomaly detection: It relies on deep-learning-based density estimation with normalizing flows, tailored to the data domain by taking its structure into account for the architecture. Our evaluation shows that MVT-Flow outperforms baselines from previous work by a large margin of 6.2% in area under ROC.

Euclidean, Projective, Conformal: Choosing a Geometric Algebra for Equivariant Transformers

  • paper_url: http://arxiv.org/abs/2311.04744
  • repo_url: None
  • paper_authors: Pim de Haan, Taco Cohen, Johann Brehmer
  • for: 该论文旨在将几何代数变换器(GATr)这一基于射影几何代数的通用几何深度学习架构加以推广。
  • methods: 论文给出一个通用蓝图,可基于任意几何(或 Clifford)代数构建可扩展的 Transformer 架构,并研究了欧几里得、射影和共形代数版本,它们都适合表示 3D 数据。
  • results: 作者从理论和实验两方面评估了这些版本:最简单的欧几里得版本计算开销小,但对称群较小、样本效率不足;原始射影模型表达能力不够;共形代数与改进后的射影代数则定义出强大且高性能的架构。
    Abstract The Geometric Algebra Transformer (GATr) is a versatile architecture for geometric deep learning based on projective geometric algebra. We generalize this architecture into a blueprint that allows one to construct a scalable transformer architecture given any geometric (or Clifford) algebra. We study versions of this architecture for Euclidean, projective, and conformal algebras, all of which are suited to represent 3D data, and evaluate them in theory and practice. The simplest Euclidean architecture is computationally cheap, but has a smaller symmetry group and is not as sample-efficient, while the projective model is not sufficiently expressive. Both the conformal algebra and an improved version of the projective algebra define powerful, performant architectures.
    摘要 几何代数变换器(GATr)是一种基于射影几何代数的通用几何深度学习架构。我们将该架构推广为一个蓝图,使其可以基于任意几何(或 Clifford)代数构建可扩展的 Transformer 架构。我们研究了欧几里得、射影和共形代数三个版本,它们都适合表示 3D 数据,并对其进行了理论和实验评估。最简单的欧几里得架构计算开销小,但对称群较小、样本效率不足;射影模型的表达能力不够;而共形代数和改进后的射影代数则定义出强大、高性能的架构。

The Quest for Content: A Survey of Search-Based Procedural Content Generation for Video Games

  • paper_url: http://arxiv.org/abs/2311.04710
  • repo_url: None
  • paper_authors: Mar Zamorano, Carlos Cetina, Federica Sarro
  • for: 游戏内容的大量生成,以满足日益增长的游戏需求。
  • methods: 使用搜索算法实现自动化内容生成。
  • results: 对SBPCG领域的现状和未来研究方向的报告,以及一些实践者可采取的建议。
    Abstract Video games demand is constantly increasing, which requires the costly production of large amounts of content. Towards this challenge, researchers have developed Search-Based Procedural Content Generation (SBPCG), that is, the (semi-)automated creation of content through search algorithms. We survey the current state of SBPCG, reporting work appeared in the field between 2011-2022 and identifying open research challenges. The results lead to recommendations for practitioners and to the identification of several potential future research avenues for SBPCG.
    摘要 电子游戏的需求不断增长,需要大量的内容生成,而这也导致了高昂的生产成本。为了应对这个挑战,研究人员开发了搜索基于生成内容的技术(Search-Based Procedural Content Generation,SBPCG),即通过搜索算法(semi-)自动生成内容。我们对SBPCG领域的当前状况进行了报告,涵盖2011-2022年间出版的研究成果,并确定了一些未解决的研究挑战和未来研究方向。

Challenging Common Assumptions in Multi-task Learning

  • paper_url: http://arxiv.org/abs/2311.04698
  • repo_url: None
  • paper_authors: Cathrin Elich, Lukas Kirchdorfer, Jan M. Köhler, Lukas Schott
  • for: 本研究在单任务学习(STL)的对照下,重新审视多任务学习(MTL)中的一些常见假设。
  • methods: 研究考察了常用的 STL 工具(如 Adam 优化器)在 MTL 中的作用,并将 Adam 的有效性归因于其部分损失尺度不变性;此外还比较了梯度冲突在 MTL 与 STL 中的角色,发现梯度幅度差异才是主要区别。
  • results: 在常见图像损坏上的特征迁移实验中,未发现 MTL 具有明显优势。总体而言,MTL 与 STL 存在许多相似之处,建议在更广泛的背景下综合考虑两个领域的方法。
    Abstract While multi-task learning (MTL) has gained significant attention in recent years, its underlying mechanisms remain poorly understood. Recent methods did not yield consistent performance improvements over single task learning (STL) baselines, underscoring the importance of gaining more profound insights about challenges specific to MTL. In our study, we challenge common assumptions in MTL in the context of STL: First, the choice of optimizer has only been mildly investigated in MTL. We show the pivotal role of common STL tools such as the Adam optimizer in MTL. We deduce the effectiveness of Adam to its partial loss-scale invariance. Second, the notion of gradient conflicts has often been phrased as a specific problem in MTL. We delve into the role of gradient conflicts in MTL and compare it to STL. For angular gradient alignment we find no evidence that this is a unique problem in MTL. We emphasize differences in gradient magnitude as the main distinguishing factor. Lastly, we compare the transferability of features learned through MTL and STL on common image corruptions, and find no conclusive evidence that MTL leads to superior transferability. Overall, we find surprising similarities between STL and MTL suggesting to consider methods from both fields in a broader context.
    摘要 MTL(多任务学习)在最近几年内得到了广泛关注,但它的内在机制仍未得到了充分理解。现有方法不能够在单任务学习(STL)基础上实现一致性的性能提升,这重申了对MTL的深入理解的重要性。在我们的研究中,我们挑战了MTL中通常被假设的一些假设:首先,MTL中选择优化器的研究只是轻度地进行了调查。我们表明了通用STL工具such as Adam优化器在MTL中的重要作用。我们发现Adam的有效性归功于它的部分损失尺度不变性。其次,在MTL中 gradient conflicts 的概念经常被宣称为特定的问题。我们探讨了MTL中 gradient conflicts 的角色,并与 STL 进行比较。对于 angular gradient alignment,我们未能发现这是MTL中独特的问题。我们强调了 gradient magnitude 的差异作为主要 отлича点。最后,我们比较了通过 MTL 和 STL 学习的特征的传输性,并未发现MTL leads to superior transferability。总的来说,我们发现了MTL 和 STL 之间的意外相似之处,建议在更广泛的上下文中考虑这两个领域的方法。

Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO

  • paper_url: http://arxiv.org/abs/2311.04951
  • repo_url: https://github.com/openvinotoolkit/openvino_notebooks
  • paper_authors: Haim Barad, Ekaterina Aidova, Yury Gorbachev
  • for: The paper aims to improve user experience and reduce infrastructure costs and power consumption by optimizing text generation with inference optimizations.
  • methods: The paper uses speculative sampling, a form of dynamic execution, to reduce the overall latency of text generation, combined with model-based optimizations such as quantization and KV caching.
  • results: The paper compares speculative sampling with standard autoregressive sampling and shows that speculative sampling can improve the response time of text generation.
    Abstract Inference optimizations are critical for improving user experience and reducing infrastructure costs and power consumption. In this article, we illustrate a form of dynamic execution known as speculative sampling to reduce the overall latency of text generation and compare it with standard autoregressive sampling. This can be used together with model-based optimizations (e.g. quantization) to provide an optimized solution. Both sampling methods make use of KV caching. A Jupyter notebook and some sample executions are provided.
    摘要 推理优化是提高用户体验和减少基础设施成本和电力消耗的关键。在这篇文章中,我们介绍了一种动态执行技术known as speculative sampling,用于减少文本生成总延迟。我们还与标准排取样本相比较。这两种抽样方法都使用KV缓存。我们提供了一个Jupyter笔记和一些示例执行。
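
For readers who want to see the mechanics, below is a minimal, framework-agnostic sketch of a speculative-sampling loop: a cheap draft model proposes k tokens and the target model accepts or corrects them. The `draft_probs_fn` and `target_probs_fn` callables are hypothetical stand-ins for the real models, and the OpenVINO and KV-cache specifics from the article are not shown.

```python
import numpy as np

def sample(probs, rng):
    return int(rng.choice(len(probs), p=probs))

def speculative_decode(target_probs_fn, draft_probs_fn, prompt, k=4, max_new=16, seed=0):
    """Schematic speculative-sampling loop. Both *_probs_fn are hypothetical
    callables mapping a token list to a next-token distribution; the bonus token
    normally sampled when every draft token is accepted is omitted for brevity."""
    rng = np.random.default_rng(seed)
    tokens = list(prompt)
    target_len = len(prompt) + max_new
    while len(tokens) < target_len:
        # 1) the cheap draft model proposes k tokens autoregressively
        draft_tokens, draft_dists, ctx = [], [], list(tokens)
        for _ in range(k):
            q = draft_probs_fn(ctx)
            t = sample(q, rng)
            draft_tokens.append(t)
            draft_dists.append(q)
            ctx.append(t)
        # 2) the target model scores every proposed position
        #    (in a real system this is a single batched forward pass)
        target_dists = [target_probs_fn(tokens + draft_tokens[:i]) for i in range(k)]
        # 3) accept each draft token with prob min(1, p/q), otherwise resample
        for i, t in enumerate(draft_tokens):
            p, q = target_dists[i], draft_dists[i]
            if rng.random() < min(1.0, p[t] / max(q[t], 1e-12)):
                tokens.append(t)
            else:
                residual = np.maximum(p - q, 0.0)
                residual = residual / residual.sum() if residual.sum() > 0 else p
                tokens.append(sample(residual, rng))
                break
    return tokens

# toy run with fixed (context-independent) distributions, just to exercise the loop
V = 5
rng = np.random.default_rng(1)
p_table = rng.dirichlet(np.ones(V))
q_table = rng.dirichlet(np.ones(V))
print(speculative_decode(lambda ctx: p_table, lambda ctx: q_table, prompt=[0], k=3, max_new=8))
```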

Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation

  • paper_url: http://arxiv.org/abs/2311.04693
  • repo_url: https://github.com/hayeong0/Diff-HierVC
  • paper_authors: Ha-Yeong Choi, Sang-Hoon Lee, Seong-Whan Lee
  • for: 提高voice conversion(VC)系统的精度和声音适应质量
  • methods: 基于两个扩散模型的层次VC系统(Diff-HierVC),包括DiffPitch和DiffVoice两部分
  • results: 实验结果表明,该模型在 F0 生成和声音风格转换方面表现优异,并在零样本(zero-shot)VC 场景中达到 CER 0.83% 和 EER 3.29%。
    Abstract Although voice conversion (VC) systems have shown a remarkable ability to transfer voice style, existing methods still have an inaccurate pitch and low speaker adaptation quality. To address these challenges, we introduce Diff-HierVC, a hierarchical VC system based on two diffusion models. We first introduce DiffPitch, which can effectively generate F0 with the target voice style. Subsequently, the generated F0 is fed to DiffVoice to convert the speech with a target voice style. Furthermore, using the source-filter encoder, we disentangle the speech and use the converted Mel-spectrogram as a data-driven prior in DiffVoice to improve the voice style transfer capacity. Finally, by using the masked prior in diffusion models, our model can improve the speaker adaptation quality. Experimental results verify the superiority of our model in pitch generation and voice style transfer performance, and our model also achieves a CER of 0.83% and EER of 3.29% in zero-shot VC scenarios.
    摘要 尽管voice conversion(VC)系统已经表现出了很好的语音风格传递能力,现有的方法仍然具有不准确的抑音和低端应用质量。为解决这些挑战,我们提出了Diff-HierVC,一种基于两个扩散模型的层次VC系统。我们首先引入DiffPitch,它可以有效地生成F0目标声音风格。然后,生成的F0被 fed到DiffVoice中,用来转换语音为目标声音风格。此外,通过源-滤波器编码器,我们分离出语音,并使用转换后的Mel-spectrogram作为数据驱动的先验知识来提高voice style转换能力。最后,通过在扩散模型中使用假标记先验,我们的模型可以提高 speaker adaptation质量。实验结果证明我们的模型在抑音和voice style转换性能方面具有优势,并且在零shot VC场景下 achieved CER of 0.83%和EER of 3.29%。

Pre-training LLMs using human-like development data corpus

  • paper_url: http://arxiv.org/abs/2311.04666
  • repo_url: None
  • paper_authors: Khushi Bhardwaj, Raj Sanjay Shah, Sashank Varma
  • for: 这个论文的目的是测试大型自然语言模型(LLMs)在语言理解和推理任务中的表现,以及模型在不同的训练环境下的可重复性和稳定性。
  • methods: 这个论文使用了大量的raw文本数据进行预训练,并对LLMs进行了评估,以评估模型在不同的训练环境下的表现。
  • results: 论文提出了一系列的基准值,包括不同架构、评估 epochs 的变化和报告的预训练 metric,以及对RoBERTa 基线值的评估。
    Abstract Pre-trained Large Language Models (LLMs) have shown success in a diverse set of language inference and understanding tasks. The pre-training stage of LLMs looks at a large corpus of raw textual data. The BabyLM shared task compares LLM pre-training to human language acquisition, where the number of tokens seen by 13-year-old kids is magnitudes smaller than the number of tokens seen by LLMs. In this work, we pre-train and evaluate LLMs on their ability to learn contextual word representations using roughly the same number of tokens as seen by children. We provide a strong set of baselines; with different architectures, evaluation of changes in performance across epochs, and reported pre-training metrics for the strict small and strict tracks of the task. We also try to loosely replicate the RoBERTa baseline given by the task organizers to observe the training robustness to hyperparameter selection and replicability. We provide the submission details to the strict and strict-small tracks in this report.
    摘要 大型语言模型(LLMs)在多种语言推理和理解任务中表现出色。LLMs的预训阶段将关注大量的原始文本数据。在这个工作中,我们将LLMs预训和评估其能够学习上下文 word 表现。我们使用相似数量的字元与13岁儿童所看到的字元进行比较。我们提供了强大的基准点,包括不同架构、评估改变过程中的表现、和预训中的 metric。我们还尝试复制RoBERTa基eline,以观察对于数据选择和可重现性的训练稳定性。我们在这份报告中提供了预训和紧缩小 tracks 的提交细节。

Pragmatic Reasoning Unlocks Quantifier Semantics for Foundation Models

  • paper_url: http://arxiv.org/abs/2311.04659
  • repo_url: None
  • paper_authors: Yiyuan Li, Rakesh R. Menon, Sayan Ghosh, Shashank Srivastava
  • for: This paper aims to explore the ability of recent foundation models to understand generalized quantifiers in natural language, specifically in sentences featuring percentage-equipped predicates.
  • methods: The paper uses a crowd-sourced dataset of human-annotated generalized quantifiers in Wikipedia sentences, called QuRe, and a framework called PRESQUE, which combines natural language inference and the Rational Speech Acts framework, to test the ability of language models to understand quantifier percentage scopes.
  • results: The experimental results on the HVD dataset and QuRe show that PRESQUE, which uses pragmatic reasoning, performs 20% better than a literal reasoning baseline when predicting quantifier percentage scopes, with no additional training required.
    Abstract Generalized quantifiers (e.g., few, most) are used to indicate the proportions predicates are satisfied (for example, some apples are red). One way to interpret quantifier semantics is to explicitly bind these satisfactions with percentage scopes (e.g., 30%-40% of apples are red). This approach can be helpful for tasks like logic formalization and surface-form quantitative reasoning (Gordon and Schubert, 2010; Roy et al., 2015). However, it remains unclear if recent foundation models possess this ability, as they lack direct training signals. To explore this, we introduce QuRe, a crowd-sourced dataset of human-annotated generalized quantifiers in Wikipedia sentences featuring percentage-equipped predicates. We explore quantifier comprehension in language models using PRESQUE, a framework that combines natural language inference and the Rational Speech Acts framework. Experimental results on the HVD dataset and QuRe illustrate that PRESQUE, employing pragmatic reasoning, performs 20% better than a literal reasoning baseline when predicting quantifier percentage scopes, with no additional training required.
    摘要 通用量词(例如"很少""大多数")用于表示谓词被满足的比例(例如"一些苹果是红色的")。理解量词语义的一种方式是将这种满足程度显式绑定到百分比范围上(例如"30%-40% 的苹果是红色的")。这种做法有助于逻辑形式化和表层形式的定量推理(Gordon 和 Schubert,2010;Roy 等,2015)。然而,由于缺乏直接的训练信号,目前尚不清楚基础模型是否具备这种能力。为此,我们构建了 QuRe:一个由人工标注的数据集,收录了 Wikipedia 中带有百分比谓词的通用量词句子。我们进一步提出 PRESQUE,一个结合自然语言推理与理性言语行为(Rational Speech Acts)框架的方法,用于考察语言模型对量词的理解。在 HVD 数据集和 QuRe 上的实验表明,采用语用推理的 PRESQUE 在预测量词百分比范围时比字面推理基线高出 20%,且无需额外训练。
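
To illustrate the kind of pragmatic reasoning PRESQUE builds on, here is a toy Rational-Speech-Acts chain over quantifiers and percentage scopes. The literal-compatibility scores below are invented for illustration; PRESQUE derives them from a natural-language-inference model instead.

```python
import numpy as np

quantifiers = ["some", "most", "all"]
scopes = ["0-25%", "25-75%", "75-100%"]

# toy literal semantics: compatibility of each quantifier with each percentage scope
lit = np.array([
    [0.8, 0.6, 0.3],   # some
    [0.1, 0.7, 0.6],   # most
    [0.0, 0.1, 0.9],   # all
])
prior = np.ones(len(scopes)) / len(scopes)
alpha = 1.0

L0 = lit * prior                              # literal listener, P(scope | quantifier)
L0 = L0 / L0.sum(axis=1, keepdims=True)
S1 = np.exp(alpha * np.log(L0 + 1e-12))       # speaker, P(quantifier | scope)
S1 = S1 / S1.sum(axis=0, keepdims=True)
L1 = S1 * prior                               # pragmatic listener, P(scope | quantifier)
L1 = L1 / L1.sum(axis=1, keepdims=True)

for q, row in zip(quantifiers, L1):
    best = scopes[int(np.argmax(row))]
    print(f"{q!r} -> most likely scope {best}, P={row.max():.2f}")
```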

Hybrid Focal and Full-Range Attention Based Graph Transformers

  • paper_url: http://arxiv.org/abs/2311.04653
  • repo_url: None
  • paper_authors: Minhong Zhu, Zhenhao Zhao, Weiran Cai
  • for: 这篇论文旨在提高图structured数据学习中Graph Transformer的性能,通过增强对本地信息的捕捉和全范围相关性的学习。
  • methods: 该论文提出了一种新的具有复合注意力机制的强化图Transformer模型,named Focal and Full-Range Graph Transformer (FFGT),通过结合全范围注意力和K-hop焦点注意力来捕捉全范围和本地信息。
  • results: 该论文在多个开放数据集上提高了现有的Graph Transformer性能,同时在一些Long-Range Graph Benchmark (LRGB)数据集上达到了与普通Transformer相同的SOTA性能,而无需任何特殊的参数调整或特定的数据预处理。
    Abstract The paradigm of Transformers using the self-attention mechanism has manifested its advantage in learning graph-structured data. Yet, Graph Transformers are capable of modeling full range dependencies but are often deficient in extracting information from locality. A common practice is to utilize Message Passing Neural Networks (MPNNs) as an auxiliary to capture local information, which however are still inadequate for comprehending substructures. In this paper, we present a purely attention-based architecture, namely Focal and Full-Range Graph Transformer (FFGT), which can mitigate the loss of local information in learning global correlations. The core component of FFGT is a new mechanism of compound attention, which combines the conventional full-range attention with K-hop focal attention on ego-nets to aggregate both global and local information. Beyond the scope of canonical Transformers, the FFGT has the merit of being more substructure-aware. Our approach enhances the performance of existing Graph Transformers on various open datasets, while achieves compatible SOTA performance on several Long-Range Graph Benchmark (LRGB) datasets even with a vanilla transformer. We further examine influential factors on the optimal focal length of attention via introducing a novel synthetic dataset based on SBM-PATTERN.
    摘要 “对于使用自我注意机制的Transformers模型,它在学习图像数据中表现出了优势。然而,图像Transformers可以模型全范围的相互关系,但通常缺乏从本地获取信息的能力。为了解决这个问题,常用Message Passing Neural Networks(MPNNs)作为辅助,以便捕捉本地信息,但这些MPNNs仍然无法彻底理解子结构。在这篇论文中,我们提出了一个纯注意 mechanism的架构,即全范围注意和K-hop焦点注意的复合注意机制(FFGT),以便储存全范围和本地信息。与传统Transformers不同的是,FFGT更加注重到子结构。我们的方法可以提高现有的图像Transformers的性能,并在多个Long-Range Graph Benchmark(LRGB)dataset上实现了Compatible SOTA的性能,甚至使用普通的Transformer。我们进一步研究了注意力的最佳焦点因素,并通过引入一个基于SBM-PATTERN的新 sintetic dataset。”

SKU-Patch: Towards Efficient Instance Segmentation for Unseen Objects in Auto-Store

  • paper_url: http://arxiv.org/abs/2311.04645
  • repo_url: None
  • paper_authors: Biqi Yang, Weiliang Tang, Xiaojie Gao, Xianzhi Li, Yun-Hui Liu, Chi-Wing Fu, Pheng-Ann Heng
  • for: 这篇论文面向大规模自动化仓储(auto-store)中的机器人拣选场景,提出一种新的 patch 引导实例分割方案,以减少人工标注和模型重训。
  • methods: 论文提出一种基于 Transformer 的网络,包括 (i) 一个 patch-图像相关性编码器,用于捕捉由 patch 信息校准的多层次图像特征,以及 (ii) 一个带并行任务头的 patch 感知 Transformer 解码器,用于生成实例掩码。
  • results: 实验结果显示,SKU-Patch 在四个仓储基准数据集上取得最佳性能,并在真实的机器人辅助自动仓储物流流水线中,对 50 多个未见过的 SKU 实现接近 100% 的平均抓取成功率,证明其有效性与实用性。
    Abstract In large-scale storehouses, precise instance masks are crucial for robotic bin picking but are challenging to obtain. Existing instance segmentation methods typically rely on a tedious process of scene collection, mask annotation, and network fine-tuning for every single Stock Keeping Unit (SKU). This paper presents SKU-Patch, a new patch-guided instance segmentation solution, leveraging only a few image patches for each incoming new SKU to predict accurate and robust masks, without tedious manual effort and model re-training. Technical-wise, we design a novel transformer-based network with (i) a patch-image correlation encoder to capture multi-level image features calibrated by patch information and (ii) a patch-aware transformer decoder with parallel task heads to generate instance masks. Extensive experiments on four storehouse benchmarks manifest that SKU-Patch is able to achieve the best performance over the state-of-the-art methods. Also, SKU-Patch yields an average of nearly 100% grasping success rate on more than 50 unseen SKUs in a robot-aided auto-store logistic pipeline, showing its effectiveness and practicality.
    摘要 在大规模仓储场景中,精准的实例掩码是机器人拣选的关键,但获取难度很大。现有实例分割方法通常需要针对每一个库存单元(SKU)进行繁琐的场景采集、掩码标注和网络微调。本文提出 SKU-Patch,一种新的 patch 引导实例分割方案:对每个新入库的 SKU,只需少量图像 patch 即可预测准确、鲁棒的掩码,无需繁重的人工工作和模型重新训练。技术上,我们设计了一种基于 Transformer 的网络,包括 (i) 一个 patch-图像相关性编码器,用于捕捉由 patch 信息校准的多层次图像特征,以及 (ii) 一个带并行任务头的 patch 感知 Transformer 解码器,用于生成实例掩码。在四个仓储基准上的大量实验表明,SKU-Patch 的性能优于现有方法;在机器人辅助的自动仓储物流流水线中,它对 50 多个未见过的 SKU 实现了接近 100% 的平均抓取成功率,验证了其有效性与实用性。

Object-Centric Learning with Slot Mixture Module

  • paper_url: http://arxiv.org/abs/2311.04640
  • repo_url: None
  • paper_authors: Daniil Kirilenko, Vitaliy Vorobyov, Alexey K. Kovalev, Aleksandr I. Panov
  • for: 这篇论文是为了提出一种基于 Gaussian Mixture Model 的学习式划分方法,用于改进 object-centric 架构中的 slot 表示方法。
  • methods: 该方法使用学习式划分方法来分解特征图像,并将分配给 slot 的信息包含在 slot 表示中,从而得到更表示力的 slot 表示。
  • results: 在以对象为中心的场景中,使用该方法替代 Slot Attention 可以提升性能,并在集合属性预测(set property prediction)任务中取得目前最佳结果。
    Abstract Object-centric architectures usually apply a differentiable module to the entire feature map to decompose it into sets of entity representations called slots. Some of these methods structurally resemble clustering algorithms, where the cluster's center in latent space serves as a slot representation. Slot Attention is an example of such a method, acting as a learnable analog of the soft k-means algorithm. Our work employs a learnable clustering method based on the Gaussian Mixture Model. Unlike other approaches, we represent slots not only as centers of clusters but also incorporate information about the distance between clusters and assigned vectors, leading to more expressive slot representations. Our experiments demonstrate that using this approach instead of Slot Attention improves performance in object-centric scenarios, achieving state-of-the-art results in the set property prediction task.
    摘要 以对象为中心的架构通常会在整个特征图上应用一个可微模块,将其分解为若干实体表示(称为槽,slot)。其中一些方法在结构上类似聚类算法:隐空间中的聚类中心即作为槽表示。槽注意力(Slot Attention)就是这样一种方法,可以看作软 k-means 算法的可学习版本。本文采用一种基于高斯混合模型(GMM)的可学习聚类方法。与其他方法不同,我们的槽表示不仅包含聚类中心,还融入了聚类与被分配向量之间的距离信息,从而得到表达力更强的槽表示。实验表明,在以对象为中心的场景中,用该方法替代槽注意力可以提升性能,并在集合属性预测任务上取得目前最佳结果。
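
As a rough illustration of the idea (not the paper's exact learnable module), the sketch below runs a few EM steps of an isotropic Gaussian mixture over per-location features, with the cluster means playing the role of slots and the responsibilities giving the soft assignment.

```python
import numpy as np

def slot_mixture_step(features, means, log_var, iters=3):
    """A few EM steps of an isotropic Gaussian mixture over per-location features.
    Cluster means play the role of slots; responsibilities are the soft assignment.
    This is a generic GMM sketch, not the paper's learnable Slot Mixture Module."""
    n, d = features.shape
    for _ in range(iters):
        # E-step: responsibilities r[n, k] proportional to N(x_n | mu_k, sigma_k^2 I)
        var = np.exp(log_var)                              # (k,)
        diff = features[:, None, :] - means[None, :, :]    # (n, k, d)
        log_p = -0.5 * (diff ** 2).sum(-1) / var - 0.5 * d * np.log(2 * np.pi * var)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update means and variances from the soft assignments
        nk = r.sum(axis=0) + 1e-8                          # (k,)
        means = (r.T @ features) / nk[:, None]
        diff = features[:, None, :] - means[None, :, :]
        log_var = np.log((r * (diff ** 2).sum(-1)).sum(0) / (nk * d) + 1e-8)
    return means, log_var, r

rng = np.random.default_rng(0)
feats = rng.normal(size=(64, 16))        # e.g. a flattened encoder feature map
slots = rng.normal(size=(5, 16))         # 5 initial slots
slots, log_var, resp = slot_mixture_step(feats, slots, np.zeros(5))
print(slots.shape, resp.shape)           # (5, 16) (64, 5)
```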

Explained anomaly detection in text reviews: Can subjective scenarios be correctly evaluated?

  • paper_url: http://arxiv.org/abs/2311.04948
  • repo_url: None
  • paper_authors: David Novoa-Paradela, Oscar Fontenla-Romero, Bertha Guijarro-Berdiñas
  • for: 这个研究的目的是检测和解释在线平台上的异常评论。
  • methods: 该流水线包括三个模块,用于检测无法为用户带来价值的评论(包括无意义的评论和恶意撰写的评论);每个分类结果都附带一个正常性分数和一段用于说明判定依据的解释。
  • results: 该流水线在基于大型 Amazon 数据库构建的多个数据集上进行了评测,并开展了一项由 241 名参与者参加的解释技术对比研究,以评估解释模块的效果。这项工作有助于自动化电商等在线平台的评论审核任务,并为文本数据异常检测领域的类似问题提供借鉴;同时,它也通过人工评估考察了不同解释技术在异常评论检测这类带有主观性的真实场景中的表现。
    Abstract This paper presents a pipeline to detect and explain anomalous reviews in online platforms. The pipeline is made up of three modules and allows the detection of reviews that do not generate value for users due to either worthless or malicious composition. The classifications are accompanied by a normality score and an explanation that justifies the decision made. The pipeline's ability to solve the anomaly detection task was evaluated using different datasets created from a large Amazon database. Additionally, a study comparing three explainability techniques involving 241 participants was conducted to assess the explainability module. The study aimed to measure the impact of explanations on the respondents' ability to reproduce the classification model and their perceived usefulness. This work can be useful to automate tasks in review online platforms, such as those for electronic commerce, and offers inspiration for addressing similar problems in the field of anomaly detection in textual data. We also consider it interesting to have carried out a human evaluation of the capacity of different explainability techniques in a real and infrequent scenario such as the detection of anomalous reviews, as well as to reflect on whether it is possible to explain tasks as humanly subjective as this one.

LuminanceL1Loss: A loss function which measures percieved brightness and colour differences

  • paper_url: http://arxiv.org/abs/2311.04614
  • repo_url: None
  • paper_authors: Dominic De Jonge
  • for: 提高图像修复任务的性能
  • methods: 提出新的损失函数 LuminanceL1Loss:将图像转换为灰度图,并分别对灰度通道与彩色通道计算 MSE 损失
  • results: 在 Retinexformer、BUIFD 和 DnCNN 架构上进行评测,结果表明 LuminanceL1Loss 能够超越传统损失函数,提升图像修复任务的性能,最高提升 4.7dB。
    Abstract We introduce LuminanceL1Loss, a novel loss function designed to enhance the performance of image restoration tasks. We demonstrate its superiority over MSE when applied to the Retinexformer, BUIFD and DnCNN architectures. Our proposed LuminanceL1Loss leverages a unique approach by transforming images into grayscale and subsequently computing the MSE loss for both grayscale and color channels. Experimental results demonstrate that this innovative loss function consistently outperforms traditional methods, showcasing its potential in image denoising and other related tasks in image reconstruction. It demonstrates gains up to 4.7dB. The results presented in this study highlight the efficacy of LuminanceL1Loss for various image restoration tasks.
    摘要 我们提出了一种新的损失函数 LuminanceL1Loss,用于提升图像恢复任务的性能,并在 Retinexformer、BUIFD 和 DnCNN 架构上验证了它相对于 MSE 的优越性。LuminanceL1Loss 的做法是先将图像转换为灰度图,再分别对灰度通道和彩色通道计算 MSE 损失。实验结果表明,这一损失函数在图像去噪及其他相关的图像重建任务中持续优于传统方法,增益最高可达 4.7dB,体现了它在各类图像恢复任务中的有效性。
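
A minimal PyTorch sketch consistent with the description is given below: both images are converted to a luminance channel, and the luminance error is added to the per-channel colour error. The exact weighting and error terms used by the paper are not specified here, so the combination and the 0.299/0.587/0.114 conversion weights are assumptions.

```python
import torch
import torch.nn.functional as F

def luminance_l1_loss(pred, target, luminance_weight=1.0):
    """Sketch of a luminance-augmented restoration loss (assumed combination):
    colour error on the RGB channels plus an error on the derived luminance."""
    # pred / target: (B, 3, H, W), values roughly in [0, 1]
    w = torch.tensor([0.299, 0.587, 0.114], device=pred.device).view(1, 3, 1, 1)
    pred_y = (pred * w).sum(dim=1, keepdim=True)      # luminance of the prediction
    target_y = (target * w).sum(dim=1, keepdim=True)  # luminance of the target
    colour_term = F.mse_loss(pred, target)
    luminance_term = F.mse_loss(pred_y, target_y)
    return colour_term + luminance_weight * luminance_term

# toy usage
pred = torch.rand(2, 3, 32, 32, requires_grad=True)
target = torch.rand(2, 3, 32, 32)
loss = luminance_l1_loss(pred, target)
loss.backward()
```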

TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models

  • paper_url: http://arxiv.org/abs/2311.04589
  • repo_url: None
  • paper_authors: Zhen Yang, Yingxue Zhang, Fandong Meng, Jie Zhou
  • for: 本研究想要帮助多modal语言模型(MM-LLMs)更好地处理多modal输入和生成非文本模式。
  • methods: 本方法使用了Tokenize和Embed ALl(TEAL)方法,将输入从任何模式转化为token序列,并学习一个共同的嵌入空间 для所有模式。
  • results: 实验表明,TEAL可以获得显著的多modal理解提升,并实现了一个简单的多modal生成方案。
    Abstract Despite Multi-modal Large Language Models (MM-LLMs) have made exciting strides recently, they are still struggling to efficiently model the interactions among multi-modal inputs and the generation in non-textual modalities. In this work, we propose TEAL (Tokenize and Embed ALl)}, an approach to treat the input from any modality as a token sequence and learn a joint embedding space for all modalities. Specifically, for the input from any modality, TEAL first discretizes it into a token sequence with the off-the-shelf tokenizer and embeds the token sequence into a joint embedding space with a learnable embedding matrix. MM-LLMs just need to predict the multi-modal tokens autoregressively as the textual LLMs do. Finally, the corresponding de-tokenizer is applied to generate the output in each modality based on the predicted token sequence. With the joint embedding space, TEAL enables the frozen LLMs to perform both understanding and generation tasks involving non-textual modalities, such as image and audio. Thus, the textual LLM can just work as an interface and maintain its high performance in textual understanding and generation. Experiments show that TEAL achieves substantial improvements in multi-modal understanding, and implements a simple scheme for multi-modal generations.
    摘要 尽管多模态大型语言模型(MM-LLMs)在最近几年内做出了吸人的进步,但它们仍然努力地模型多模态输入的交互和非文本modalities中的生成。在这项工作中,我们提出了TEAL(Tokenize and Embed ALl),一种方法,其中输入从任何模式都会被视为一个token序列,并在一个共享的embedding空间中学习一个共同的embedding矩阵。具体来说,对于输入从任何模式,TEAL首先将它拆分成一个token序列,使用可用的tokenizer进行拆分,然后将token序列embedding到一个共同的embedding空间中,使用一个学习的embedding矩阵。MM-LLMs只需要预测多modal tokens的autoregressive预测,就像文本LLMs一样。最后,对于每个模式,使用预测的token序列生成输出。与共同的embedding空间相比,TEAL使得冻结的LLMs可以在多modal任务中进行理解和生成任务,如图像和音频。因此,文本LLM可以作为界面,维持高效的文本理解和生成能力。实验结果表明,TEAL在多modal理解方面取得了显著的提升,并实现了简单的多modal生成方案。

Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble based sample selection

  • paper_url: http://arxiv.org/abs/2311.04588
  • repo_url: https://github.com/akshitjindal1/aot_wacv
  • paper_authors: Akshit Jindal, Vikram Goyal, Saket Anand, Chetan Arora
  • for: 防止机器学习模型被盗用(Model Stealing Attacks),当机器学习模型被部署为服务时。
  • methods: 使用 ensemble of deep learning models 作为盗取模型,以便选择最有用的数据点子集。
  • results: 相比基于单一模型的方法,该方法能提升窃取模型的质量与攻击成功率:对在 CIFAR-10 上训练的模型,其表现比现有最优方法至少高 3%,对抗样本迁移率高出 21%。
    Abstract Machine Learning (ML) models become vulnerable to Model Stealing Attacks (MSA) when they are deployed as a service. In such attacks, the deployed model is queried repeatedly to build a labelled dataset. This dataset allows the attacker to train a thief model that mimics the original model. To maximize query efficiency, the attacker has to select the most informative subset of data points from the pool of available data. Existing attack strategies utilize approaches like Active Learning and Semi-Supervised learning to minimize costs. However, in the black-box setting, these approaches may select sub-optimal samples as they train only one thief model. Depending on the thief model's capacity and the data it was pretrained on, the model might even select noisy samples that harm the learning process. In this work, we explore the usage of an ensemble of deep learning models as our thief model. We call our attack Army of Thieves(AOT) as we train multiple models with varying complexities to leverage the crowd's wisdom. Based on the ensemble's collective decision, uncertain samples are selected for querying, while the most confident samples are directly included in the training data. Our approach is the first one to utilize an ensemble of thief models to perform model extraction. We outperform the base approaches of existing state-of-the-art methods by at least 3% and achieve a 21% higher adversarial sample transferability than previous work for models trained on the CIFAR-10 dataset.
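
To make the ensemble-based selection idea concrete, here is a small sketch that scores an unlabeled pool by the entropy of the averaged ensemble prediction and splits it into samples to query from the victim and samples to pseudo-label directly; this disagreement measure is one simple choice and not necessarily the exact criterion used by AOT.

```python
import numpy as np

def select_queries(prob_stack, budget):
    """prob_stack: (n_models, n_samples, n_classes) softmax outputs from the
    thief-model ensemble on the unlabeled pool.  Samples the ensemble is most
    uncertain about are sent to the victim API; the most confident ones could
    instead be pseudo-labelled from the ensemble's own prediction."""
    mean_p = prob_stack.mean(axis=0)                        # (n_samples, n_classes)
    entropy = -(mean_p * np.log(mean_p + 1e-12)).sum(axis=1)
    uncertain = np.argsort(-entropy)[:budget]               # query these
    confident = np.argsort(entropy)[:budget]                # self-label these
    return uncertain, confident

rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 1000, 10))                     # 3 thief models, 1000 pool samples
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
to_query, to_pseudo_label = select_queries(probs, budget=100)
print(len(to_query), len(to_pseudo_label))
```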

GResilience: Trading Off Between the Greenness and the Resilience of Collaborative AI Systems

  • paper_url: http://arxiv.org/abs/2311.04569
  • repo_url: None
  • paper_authors: Diaeddin Rimawi, Antonio Liotta, Marco Todescato, Barbara Russo
  • for: 本研究旨在提供一种自动评估Collaborative Artificial Intelligence System(CAIS)恢复行动的能力来衡量系统的可恢复性和绿色性。
  • methods: 本研究提出了一种以优化和游戏理论为基础的方法来评估CAIS恢复行动的可恢复性和绿色性。
  • results: 研究人员通过设计了一种实验协议和应用于一个真实的CAIS示例器,并通过优化和游戏理论来评估CAIS恢复行动的可恢复性和绿色性。
    Abstract A Collaborative Artificial Intelligence System (CAIS) works with humans in a shared environment to achieve a common goal. To recover from a disruptive event that degrades its performance and ensures its resilience, a CAIS may then need to perform a set of actions either by the system, by the humans, or collaboratively together. As for any other system, recovery actions may cause energy adverse effects due to the additional required energy. Therefore, it is of paramount importance to understand which of the above actions can better trade-off between resilience and greenness. In this in-progress work, we propose an approach to automatically evaluate CAIS recovery actions for their ability to trade-off between the resilience and greenness of the system. We have also designed an experiment protocol and its application to a real CAIS demonstrator. Our approach aims to attack the problem from two perspectives: as a one-agent decision problem through optimization, which takes the decision based on the score of resilience and greenness, and as a two-agent decision problem through game theory, which takes the decision based on the payoff computed for resilience and greenness as two players of a cooperative game.
    摘要 协同人工智能系统(CAIS)与人类在共享环境中协作以实现共同目标。当破坏性事件导致系统性能下降后,CAIS 可能需要由系统、人类或双方协作执行一系列恢复动作。与任何系统一样,这些恢复动作会带来额外的能源消耗,因此理解哪些动作能在韧性(resilience)与绿色性(greenness)之间取得更好的权衡至关重要。在这项进行中的工作里,我们提出了一种自动评估 CAIS 恢复动作在韧性与绿色性之间权衡能力的方法,并设计了实验协议且将其应用于一个真实的 CAIS 演示系统。我们的方法从两个角度切入:其一,作为单代理决策问题,通过优化依据韧性与绿色性得分做出决策;其二,作为双代理决策问题,通过博弈论把韧性与绿色性视为合作博弈中的两个玩家,依据收益做出决策。

CAIS-DMA: A Decision-Making Assistant for Collaborative AI Systems

  • paper_url: http://arxiv.org/abs/2311.04562
  • repo_url: https://github.com/dmrimawi/cais-dma
  • paper_authors: Diaeddin Rimawi, Antonio Lotta, Marco Todescato, Barbara Russo
  • for: This paper aims to develop a methodology to automatically support the decision-making process in a Collaborative Artificial Intelligence System (CAIS) when the system experiences performance degradation after a disruptive event.
  • methods: The proposed framework consists of three components: one manages or simulates CAIS's environment and disruptive events, the second automates the decision-making process, and the third provides a visual analysis of CAIS behavior.
  • results: The framework can automatically monitor the decision-making process, intervene whenever a performance degradation occurs, and recommend the next action that balances between minimizing the recovery time (i.e., resilience) and minimizing the energy adverse effects (i.e., greenness).
    Abstract A Collaborative Artificial Intelligence System (CAIS) is a cyber-physical system that learns actions in collaboration with humans in a shared environment to achieve a common goal. In particular, a CAIS is equipped with an AI model to support the decision-making process of this collaboration. When an event degrades the performance of CAIS (i.e., a disruptive event), this decision-making process may be hampered or even stopped. Thus, it is of paramount importance to monitor the learning of the AI model, and eventually support its decision-making process in such circumstances. This paper introduces a new methodology to automatically support the decision-making process in CAIS when the system experiences performance degradation after a disruptive event. To this aim, we develop a framework that consists of three components: one manages or simulates CAIS's environment and disruptive events, the second automates the decision-making process, and the third provides a visual analysis of CAIS behavior. Overall, our framework automatically monitors the decision-making process, intervenes whenever a performance degradation occurs, and recommends the next action. We demonstrate our framework by implementing an example with a real-world collaborative robot, where the framework recommends the next action that balances between minimizing the recovery time (i.e., resilience), and minimizing the energy adverse effects (i.e., greenness).
    摘要 一个协同人工智能系统(CAIS)是一个融合物理系统,通过与人类在共同环境中学习行动来实现共同目标。特别是,CAIS具有一个人工智能模型,用于支持协同决策过程。当系统经历破坏性事件(例如,突发事件)时,这个决策过程可能受到影响或者even stop。因此,监测人工智能模型的学习是极其重要的。这篇论文提出了一种新的方法,用于自动支持CAIS协同决策过程在系统经历破坏性事件后。为此,我们开发了一个框架,该框架包括三个组件:一个管理或模拟CAIS的环境和破坏性事件,第二个自动化决策过程,第三个提供CAIS行为的可视分析。总之,我们的框架可以自动监测决策过程,在破坏性事件发生时进行交互,并 recommends the next action,以保持系统的可靠性和绿色性。我们通过实施一个实际的协同 робоット示例来证明我们的框架。在这个示例中,我们的框架建议下一个行动,以均衡系统的恢复时间(即可靠性)和能源不良影响(即绿色性)。

Local Differential Privacy for Smart Meter Data Sharing

  • paper_url: http://arxiv.org/abs/2311.04544
  • repo_url: None
  • paper_authors: Yashothara Shanmugarasa, M. A. P. Chamikara, Hye-young Paik, Salil S. Kanhere, Liming Zhu
  • for: 提供消费者和能源公司 valuabe insights into energy management, while protecting privacy.
  • methods: 使用Local Differential Privacy (LDP) methods with randomized response techniques and sliding windows to protect appliance-level energy consumption data.
  • results: Efficient and effective privacy protection, balancing privacy and data utility for analysis.
    Abstract Energy disaggregation techniques, which use smart meter data to infer appliance energy usage, can provide consumers and energy companies valuable insights into energy management. However, these techniques also present privacy risks, such as the potential for behavioral profiling. Local differential privacy (LDP) methods provide strong privacy guarantees with high efficiency in addressing privacy concerns. However, existing LDP methods focus on protecting aggregated energy consumption data rather than individual appliances. Furthermore, these methods do not consider the fact that smart meter data are a form of streaming data, and its processing methods should account for time windows. In this paper, we propose a novel LDP approach (named LDP-SmartEnergy) that utilizes randomized response techniques with sliding windows to facilitate the sharing of appliance-level energy consumption data over time while not revealing individual users' appliance usage patterns. Our evaluations show that LDP-SmartEnergy runs efficiently compared to baseline methods. The results also demonstrate that our solution strikes a balance between protecting privacy and maintaining the utility of data for effective analysis.
    摘要 基于智能电表数据推断家电用电情况的能耗分解技术,可以为消费者和能源公司提供有价值的能源管理信息,但也带来隐私风险,例如可能被用于行为画像。本地差分隐私(LDP)方法能够以较高的效率提供强隐私保证,然而现有的 LDP 方法主要针对汇总的能耗数据,而非单个家电;同时,它们也没有考虑智能电表数据本质上是流式数据,其处理方式应当引入时间窗口。本文提出一种新的 LDP 方法 LDP-SmartEnergy,利用随机响应技术与滑动窗口,在不暴露个体用户家电使用模式的前提下,支持家电级能耗数据随时间共享。评估结果显示,LDP-SmartEnergy 相比基线方法运行高效,并在隐私保护与数据分析效用之间取得了良好平衡。
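
The core primitive behind such schemes can be illustrated with classic binary randomized response, sketched below for a toy ON/OFF appliance stream; LDP-SmartEnergy's actual per-appliance perturbation and sliding-window handling are more involved and are not reproduced here.

```python
import numpy as np

def randomized_response(true_bit, epsilon, rng):
    """Classic binary randomized response: report the true bit with probability
    e^eps / (e^eps + 1), otherwise flip it.  This satisfies eps-local DP for a
    single binary report."""
    p_truth = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return true_bit if rng.random() < p_truth else 1 - true_bit

rng = np.random.default_rng(0)
eps = 1.0
# toy stream: 1 = appliance ON in this time slot, 0 = OFF
true_usage = rng.integers(0, 2, size=24)
reported = np.array([randomized_response(b, eps, rng) for b in true_usage])

# unbiased estimate of the ON-rate recovered from the noisy reports
p = np.exp(eps) / (np.exp(eps) + 1.0)
estimate = (reported.mean() - (1 - p)) / (2 * p - 1)
print(true_usage.mean(), estimate)
```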

RankAug: Augmented data ranking for text classification

  • paper_url: http://arxiv.org/abs/2311.04535
  • repo_url: None
  • paper_authors: Tiasa Singha Roy, Priyam Basu
  • for: 这个论文主要是为了提高生成模型的评估方法。
  • methods: 这篇论文提出了一种文本排序方法,用于检测和过滤生成文本中最相似的文本,以提高NLU任务的准确率。
  • results: 实验结果显示,通过judicious选择筛选技术可以提高准确率,最高提高35%。
    Abstract Research on data generation and augmentation has been focused majorly on enhancing generation models, leaving a notable gap in the exploration and refinement of methods for evaluating synthetic data. There are several text similarity metrics within the context of generated data filtering which can impact the performance of specific Natural Language Understanding (NLU) tasks, specifically focusing on intent and sentiment classification. In this study, we propose RankAug, a text-ranking approach that detects and filters out the top augmented texts in terms of being most similar in meaning with lexical and syntactical diversity. Through experiments conducted on multiple datasets, we demonstrate that the judicious selection of filtering techniques can yield a substantial improvement of up to 35% in classification accuracy for under-represented classes.
    摘要 数据生成与增强的研究大多集中在改进生成模型上,而对合成数据的评估与筛选方法关注较少。在生成数据筛选中,文本相似度指标会影响意图识别、情感分类等自然语言理解(NLU)任务的表现。本文提出 RankAug,一种文本排序方法,用于检出并保留在语义上与原句最接近、同时具有词汇和句法多样性的增强文本。在多个数据集上的实验表明,恰当地选择筛选技术可以使代表性不足类别的分类准确率最高提升 35%。
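
A minimal sketch of similarity-based ranking and filtering of augmented texts is shown below; TF-IDF cosine similarity is used purely as a stand-in scorer, whereas RankAug's actual criterion also accounts for lexical and syntactic diversity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_augmentations(original, candidates, keep_top_k=3):
    """Rank augmented sentences by similarity to the original and keep the
    closest ones.  TF-IDF cosine similarity is a stand-in for whichever
    similarity metric is actually used."""
    vec = TfidfVectorizer().fit([original] + candidates)
    sims = cosine_similarity(vec.transform([original]), vec.transform(candidates))[0]
    ranked = sorted(zip(candidates, sims), key=lambda x: -x[1])
    return ranked[:keep_top_k]

original = "i want to cancel my subscription"
augmented = [
    "please cancel my subscription",
    "how do i end my plan",
    "what is the weather today",          # unrelated, should rank last
]
for text, score in rank_augmentations(original, augmented, keep_top_k=2):
    print(f"{score:.2f}  {text}")
```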

Validating ChatGPT Facts through RDF Knowledge Graphs and Sentence Similarity

  • paper_url: http://arxiv.org/abs/2311.04524
  • repo_url: None
  • paper_authors: Michalis Mountantonakis, Yannis Tzitzikas
  • for: 这个论文目的是 validate ChatGPT 的答案和补充它们的证明和来源。
  • methods: 论文使用 RDF 知识图谱(KG)和短句嵌入来验证并补充 ChatGPT 的回答。具体而言,利用 DBpedia 和 LODsyndesis(一个聚合知识图谱,汇集了来自 400 个 RDF KG、覆盖多个领域的 20 亿条三元组),并引入一种算法,返回最相关的三元组及其来源(provenance)和置信度分数。
  • results: 在评估这种服务(以及类似服务)时,作者创建了一个评估标准套件,包括 2000 个 ChatGPT 答案(其中 1000 个是希腊名人、500 个是希腊地点、500 个是关于希腊的事件)。手动标注后,发现 ChatGPT 的答案中约 73% 是正确的,27% 是错误的。结果很有 promise,例如,对整个benchmark来说,我们成功验证了 ChatGPT 的 85.3% 正确答案,并找到了错误答案中 62.6% 的正确答案。
    Abstract Since ChatGPT offers detailed responses without justifications, and erroneous facts even for popular persons, events and places, in this paper we present a novel pipeline that retrieves the response of ChatGPT in RDF and tries to validate the ChatGPT facts using one or more RDF Knowledge Graphs (KGs). To this end we leverage DBpedia and LODsyndesis (an aggregated Knowledge Graph that contains 2 billion triples from 400 RDF KGs of many domains) and short sentence embeddings, and introduce an algorithm that returns the more relevant triple(s) accompanied by their provenance and a confidence score. This enables the validation of ChatGPT responses and their enrichment with justifications and provenance. To evaluate this service (such services in general), we create an evaluation benchmark that includes 2,000 ChatGPT facts; specifically 1,000 facts for famous Greek Persons, 500 facts for popular Greek Places, and 500 facts for Events related to Greece. The facts were manually labelled (approximately 73% of ChatGPT facts were correct and 27% of facts were erroneous). The results are promising; indicatively for the whole benchmark, we managed to verify the 85.3% of the correct facts of ChatGPT and to find the correct answer for the 62.6% of the erroneous ChatGPT facts.
    摘要 由于 ChatGPT 给出的回答详细却不附带依据,甚至对知名人物、事件和地点也会给出错误事实,本文提出了一个新的流水线:将 ChatGPT 的回答检索为 RDF 形式,并利用一个或多个 RDF 知识图谱(KG)来验证其中的事实。为此,我们利用 DBpedia 和 LODsyndesis(一个聚合知识图谱,汇集了来自 400 个 RDF KG、覆盖多个领域的 20 亿条三元组)以及短句嵌入,并提出一种算法,返回最相关的一条或多条三元组及其来源和置信度分数。这样既能验证 ChatGPT 的回答,又能为其补充依据与来源。为评估这类服务,我们构建了一个包含 2,000 条 ChatGPT 事实的基准:1,000 条关于希腊知名人物、500 条关于希腊著名地点、500 条关于与希腊相关的事件,并进行了人工标注(约 73% 的事实正确,27% 错误)。结果令人鼓舞:在整个基准上,我们成功验证了 85.3% 的正确事实,并为 62.6% 的错误事实找到了正确答案。
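
The sentence-similarity step can be sketched as follows: candidate KG triples are verbalized and scored against a ChatGPT statement, and the best-scoring triples are returned with their scores. TF-IDF is only a stand-in for the short-sentence embeddings used in the paper, the KG lookup itself is assumed to have already returned the candidates, and the example triples are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def verbalize(triple):
    # turn an (s, p, o) triple into a short sentence-like string
    return " ".join(part.replace("_", " ") for part in triple)

def most_relevant_triples(fact, triples, top_k=2):
    """Score candidate KG triples against a ChatGPT statement by text similarity
    and return the best matches with their scores."""
    sentences = [verbalize(t) for t in triples]
    vec = TfidfVectorizer().fit([fact] + sentences)
    sims = cosine_similarity(vec.transform([fact]), vec.transform(sentences))[0]
    ranked = sorted(zip(triples, sims), key=lambda x: -x[1])
    return ranked[:top_k]

fact = "Nikos Kazantzakis was born in Heraklion"
candidates = [
    ("Nikos_Kazantzakis", "birthPlace", "Heraklion"),
    ("Nikos_Kazantzakis", "deathPlace", "Freiburg"),
]
for triple, score in most_relevant_triples(fact, candidates):
    print(f"{score:.2f}  {triple}")
```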

FFINet: Future Feedback Interaction Network for Motion Forecasting

  • paper_url: http://arxiv.org/abs/2311.04512
  • repo_url: None
  • paper_authors: Miao Kang, Shengqi Wang, Sanping Zhou, Ke Ye, Jingjing Jiang, Nanning Zheng
  • for: 预测交通代理人的未来合理行为,以提高自动驾驶系统的安全性和效率。
  • methods: 提出了一种新的未来反馈互动网络(FFINet),通过将当前观察和未来互动的特征进行聚合,以提高多模态轨迹预测的准确性。
  • results: 在 Argoverse 1 和 Argoverse 2 动态预测测试数据集上,FFINet 实现了状态领先的性能。
    Abstract Motion forecasting plays a crucial role in autonomous driving, with the aim of predicting the future reasonable motions of traffic agents. Most existing methods mainly model the historical interactions between agents and the environment, and predict multi-modal trajectories in a feedforward process, ignoring potential trajectory changes caused by future interactions between agents. In this paper, we propose a novel Future Feedback Interaction Network (FFINet) to aggregate features the current observations and potential future interactions for trajectory prediction. Firstly, we employ different spatial-temporal encoders to embed the decomposed position vectors and the current position of each scene, providing rich features for the subsequent cross-temporal aggregation. Secondly, the relative interaction and cross-temporal aggregation strategies are sequentially adopted to integrate features in the current fusion module, observation interaction module, future feedback module and global fusion module, in which the future feedback module can enable the understanding of pre-action by feeding the influence of preview information to feedforward prediction. Thirdly, the comprehensive interaction features are further fed into final predictor to generate the joint predicted trajectories of multiple agents. Extensive experimental results show that our FFINet achieves the state-of-the-art performance on Argoverse 1 and Argoverse 2 motion forecasting benchmarks.
    摘要 运动预测在自动驾驶中起着关键作用,其目标是预测交通参与者未来的合理运动。现有方法大多只建模智能体与环境之间的历史交互,以前馈方式预测多模态轨迹,而忽略了未来交互可能引起的轨迹变化。本文提出一种新的未来反馈交互网络(FFINet),用于聚合当前观测与潜在未来交互的特征进行轨迹预测。首先,我们采用不同的时空编码器,对分解后的位置向量和各场景的当前位置进行嵌入,为后续的跨时间聚合提供丰富特征。其次,依次采用相对交互与跨时间聚合策略,在当前融合模块、观测交互模块、未来反馈模块和全局融合模块中整合特征,其中未来反馈模块通过把预瞻信息的影响反馈给前馈预测,帮助理解预动作。最后,将综合交互特征输入最终预测器,生成多智能体的联合预测轨迹。大量实验结果表明,FFINet 在 Argoverse 1 和 Argoverse 2 运动预测基准上达到了最先进的性能。

Causal Inference on Investment Constraints and Non-stationarity in Dynamic Portfolio Optimization through Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.04946
  • repo_url: None
  • paper_authors: Yasuhiro Nakayama, Tomochika Sawaki
  • for: 本研究利用强化学习技术,开发了一种动态资产配置投资策略。
  • methods: 我们着重解决金融时间序列数据的非平稳性问题,并在环境设定中引入诸如市场状态(regime)切换等变量,以提高预测精度。
  • results: 研究表明,在投资策略中应用强化学习不仅能取得较高的预测精度,还能够灵活地把投资者在实践中面临的约束纳入优化问题,从而实现高效的优化。
    Abstract In this study, we have developed a dynamic asset allocation investment strategy using reinforcement learning techniques. To begin with, we have addressed the crucial issue of incorporating non-stationarity of financial time series data into reinforcement learning algorithms, which is a significant implementation in the application of reinforcement learning in investment strategies. Our findings highlight the significance of introducing certain variables such as regime change in the environment setting to enhance the prediction accuracy. Furthermore, the application of reinforcement learning in investment strategies provides a remarkable advantage of setting the optimization problem flexibly. This enables the integration of practical constraints faced by investors into the algorithm, resulting in efficient optimization. Our study has categorized the investment strategy formulation conditions into three main categories, including performance measurement indicators, portfolio management rules, and other constraints. We have evaluated the impact of incorporating these conditions into the environment and rewards in a reinforcement learning framework and examined how they influence investment behavior.
    摘要 在这项研究中,我们开发了一种动态资产分配投资策略使用强化学习技术。首先,我们解决了金融时间序列数据不Stationarity问题的应用在强化学习算法中的问题,这是投资策略应用强化学习中的一个重要实现。我们的发现表明,在环境设置中引入 certain 变量,如状态转换,可以提高预测精度。此外,强化学习在投资策略中提供了一个remarkable的优势,即可以自由地设置优化问题。这使得可以将实际面临的投资者的限制 integrate into the algorithm,从而实现高效的优化。我们对投资策略的形式化条件进行分类,包括表现指标、股票管理规则和其他限制。我们在强化学习框架中包含这些条件的环境和奖励,并研究了它们如何影响投资行为。

Auto deep learning for bioacoustic signals

  • paper_url: http://arxiv.org/abs/2311.04945
  • repo_url: https://github.com/giuliotosato/autokeras-bioacustic
  • paper_authors: Giulio Tosato, Abdelrahman Shehata, Joshua Janssen, Kees Kamp, Pramatya Jati, Dan Stowell
  • for: 这个研究旨在探讨自动深度学习是否可以提高多类bird vocalization分类的准确率和效率,并与传统的手动设计的深度学习模型进行比较。
  • methods: 这个研究使用了AutoKeras自动机器学习框架,自动化了神经网络搜索和超参数优化。
  • results: 结果表明,AutoKeras-derived模型在Western Mediterranean Wetland Birds dataset上 consistently outperform了传统模型如MobileNet、ResNet50和VGG16。这种方法和结论推动了生物声学研究的进步,并提供了一种新的自动化深度学习方法。
    Abstract This study investigates the potential of automated deep learning to enhance the accuracy and efficiency of multi-class classification of bird vocalizations, compared against traditional manually-designed deep learning models. Using the Western Mediterranean Wetland Birds dataset, we investigated the use of AutoKeras, an automated machine learning framework, to automate neural architecture search and hyperparameter tuning. Comparative analysis validates our hypothesis that the AutoKeras-derived model consistently outperforms traditional models like MobileNet, ResNet50 and VGG16. Our approach and findings underscore the transformative potential of automated deep learning for advancing bioacoustics research and models. In fact, the automated techniques eliminate the need for manual feature engineering and model design while improving performance. This study illuminates best practices in sampling, evaluation and reporting to enhance reproducibility in this nascent field. All the code used is available at https: //github.com/giuliotosato/AutoKeras-bioacustic Keywords: AutoKeras; automated deep learning; audio classification; Wetlands Bird dataset; comparative analysis; bioacoustics; validation dataset; multi-class classification; spectrograms.
    摘要 本研究探索自动化深度学习能否相比传统手工设计的深度学习模型,提升鸟类鸣声多类别分类的准确率和效率。我们在 Western Mediterranean Wetland Birds 数据集上使用自动机器学习框架 AutoKeras,将神经网络架构搜索与超参数调优自动化。对比分析验证了我们的假设:AutoKeras 搜索得到的模型持续优于 MobileNet、ResNet50 和 VGG16 等传统模型。自动化技术免去了手工特征工程与模型设计,同时提升了性能,凸显了自动化深度学习推动生物声学研究与模型发展的潜力。本研究还总结了采样、评估与报告方面的最佳实践,以增强这一新兴领域的可复现性。全部代码见 https://github.com/giuliotosato/AutoKeras-bioacustic。关键词:AutoKeras;自动化深度学习;音频分类;湿地鸟类数据集;对比分析;生物声学;验证集;多类别分类;声谱图。
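
For readers unfamiliar with AutoKeras, the snippet below shows the typical usage pattern on image-like inputs such as precomputed spectrograms; the array shapes, class count, and trial budget are illustrative and not those of the Wetland Birds experiments.

```python
import numpy as np
import autokeras as ak

# toy stand-in for precomputed spectrogram "images": (samples, height, width, channels)
x = np.random.rand(100, 128, 64, 1).astype("float32")
y = np.random.randint(0, 20, size=(100,))           # e.g. 20 bird classes

# AutoKeras searches architectures and hyperparameters automatically;
# max_trials bounds how many candidate models are evaluated.
clf = ak.ImageClassifier(overwrite=True, max_trials=3)
clf.fit(x, y, validation_split=0.2, epochs=5)

model = clf.export_model()                           # best Keras model found
model.summary()
```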

NExT-Chat: An LMM for Chat, Detection and Segmentation

  • paper_url: http://arxiv.org/abs/2311.04498
  • repo_url: https://github.com/tmukande-debug/NExT-Chat
  • paper_authors: Ao Zhang, Liming Zhao, Chen-Wei Xie, Yun Zheng, Wei Ji, Tat-Seng Chua
  • for: 本研究旨在提高大型语言模型(LLM)在多Modal理解方面的水平,通过增强视觉理解能力,使LMM能够更好地理解和回答多Modal问题。
  • methods: 本研究提出了一种新的对象位置模型方法 called pixel2emb,该方法让LMM输出位置嵌入,然后通过不同的解码器进行解码。这种嵌入基于的位置模型方法允许使用不同的位置格式(如 bounding box 和 mask)在多Modal会话中。
  • results: 在有限资源的情况下,我们的 pixel2emb 方法在位置输入和输出任务中表现出色,与现有SOTA方法相比,具有更高的性能。基于提出的 pixel2emb 方法,我们训练了一个名为 NExT-Chat 的 LMM,并证明其能够处理多种任务,如视觉固定、区域描述和基于物理的理解。
    Abstract The development of large language models (LLMs) has greatly advanced the field of multimodal understanding, leading to the emergence of large multimodal models (LMMs). In order to enhance the level of visual comprehension, recent studies have equipped LMMs with region-level understanding capabilities by representing object bounding box coordinates as a series of text sequences (pixel2seq). In this paper, we introduce a novel paradigm for object location modeling called pixel2emb method, where we ask the LMM to output the location embeddings and then decoded by different decoders. This paradigm allows for different location formats (such as bounding boxes and masks) to be used in multimodal conversations Furthermore, this kind of embedding based location modeling enables the utilization of existing practices in localization tasks, such as detection and segmentation. In scenarios with limited resources, our pixel2emb demonstrates superior performance compared to existing state-of-the-art (SOTA) approaches in both the location input and output tasks under fair comparison. Leveraging the proposed pixel2emb method, we train an LMM named NExT-Chat and demonstrate its capability of handling multiple tasks like visual grounding, region caption, and grounded reasoning.
    摘要 大型语言模型(LLM)的发展对多Modal理解领域带来了巨大的进步,导致大多Modal模型(LMM)的出现。为了提高视觉理解水平, latest studies have equipped LMMs with regional understanding capabilities by representing object bounding box coordinates as a series of text sequences (pixel2seq). 在这篇论文中,我们介绍了一种新的对象位置模型方法,即像素2Embedding(pixel2emb)方法,其中我们问LMM输出位置嵌入,然后通过不同的解码器进行解码。这种嵌入基于位置模型方法允许使用不同的位置格式(如 bounding box 和 mask)在多Modal conversation中,并且可以利用现有的Localization任务的实践,如检测和分割。在有限的资源情况下,我们的像素2emb在位置输入和输出任务中表现出了较好的性能,与现有SOTA方法相比。基于提出的像素2emb方法,我们训练了一个名为NExT-Chat的LMM,并证明其能处理多个任务,如视觉定位、区域描述和基于位置的理解。

Explainable AI for Earth Observation: Current Methods, Open Challenges, and Opportunities

  • paper_url: http://arxiv.org/abs/2311.04491
  • repo_url: None
  • paper_authors: Gulsen Taskin, Erchan Aptoula, Alp Ertürk
  • for: This paper provides a panorama of the state-of-the-art in explainable remote sensing image analysis, organized by prominent Earth observation application fields.
  • methods: The paper explores a wide spectrum of Explainable Artificial Intelligence techniques to address the lack of explainability and interpretability in deep learning methods for remote sensing.
  • results: The paper presents the state-of-the-art in explainable remote sensing image analysis, covering a range of Earth observation application fields.
    Abstract Deep learning has taken by storm all fields involved in data analysis, including remote sensing for Earth observation. However, despite significant advances in terms of performance, its lack of explainability and interpretability, inherent to neural networks in general since their inception, remains a major source of criticism. Hence it comes as no surprise that the expansion of deep learning methods in remote sensing is being accompanied by increasingly intensive efforts oriented towards addressing this drawback through the exploration of a wide spectrum of Explainable Artificial Intelligence techniques. This chapter, organized according to prominent Earth observation application fields, presents a panorama of the state-of-the-art in explainable remote sensing image analysis.
    摘要 深度学习席卷了包括对地观测遥感在内的所有数据分析领域。然而,尽管性能提升显著,神经网络自诞生以来固有的可解释性不足问题,仍然是其受到批评的主要原因。因此,深度学习方法在遥感领域的扩张,也伴随着对各类可解释人工智能(Explainable AI)技术日益密集的探索。本章按照主要的地球观测应用领域组织,综述了可解释遥感图像分析的最新进展。

Emergent Communication for Rules Reasoning

  • paper_url: http://arxiv.org/abs/2311.04474
  • repo_url: None
  • paper_authors: Yuxuan Guo, Yifan Hao, Rui Zhang, Enshuai Zhou, Zidong Du, Xishan Zhang, Xinkai Song, Yuanbo Wen, Yongwei Zhao, Xuehai Zhou, Jiaming Guo, Qi Yi, Shaohui Peng, Di Huang, Ruizhi Chen, Qi Guo, Yunji Chen
  • for: 本文研究基于深度学习智能体之间的涌现通信,这一方向为语言学和人工智能提供了启发。然而,以往的尝试大多局限于感知导向的环境,迫使智能体只在图像或符号语境中描述低层感知特征。
  • methods: 在这篇论文中,我们提出了一种新的认知游戏(namely Reasoning Game),这个游戏鼓励代理通过思维和通信来解释高级规则,而不是仅仅描述低级感知上下文。我们还提出了一个不偏的数据集(namely rule-RAVEN)作为一个基准,以避免过拟合。此外,我们还提出了一种两阶段训练方法,用于在Reasoning Game中更稳定地 converge。
  • results: 实验结果表明,在Reasoning Game中,代理们能够解释高级规则,并将其应用到未看过的上下文特性中。此外,emerged语言还帮助代理们在不同上下文特性或任务之间进行泛化和传输。
    Abstract Research on emergent communication between deep-learning-based agents has received extensive attention due to its inspiration for linguistics and artificial intelligence. However, previous attempts have hovered around emerging communication under perception-oriented environmental settings, that forces agents to describe low-level perceptual features intra image or symbol contexts. In this work, inspired by the classic human reasoning test (namely Raven's Progressive Matrix), we propose the Reasoning Game, a cognition-oriented environment that encourages agents to reason and communicate high-level rules, rather than perceived low-level contexts. Moreover, we propose 1) an unbiased dataset (namely rule-RAVEN) as a benchmark to avoid overfitting, 2) and a two-stage curriculum agent training method as a baseline for more stable convergence in the Reasoning Game, where contexts and semantics are bilaterally drifting. Experimental results show that, in the Reasoning Game, a semantically stable and compositional language emerges to solve reasoning problems. The emerged language helps agents apply the extracted rules to the generalization of unseen context attributes, and to the transfer between different context attributes or even tasks.
    摘要 研究深度学习代理之间的emergentcommunication已经受到了人工智能和语言科学的广泛关注,因为它们可以提供人工智能和语言科学的灵感。然而,之前的尝试都集中在感知 oriented 环境下的 emerging communication, forcing agents to describe low-level perceptual features within image or symbol contexts。在这项工作中, Drawing inspiration from the classic human reasoning test (namely Raven's Progressive Matrix), we propose the Reasoning Game, a cognition-oriented environment that encourages agents to reason and communicate high-level rules, rather than perceived low-level contexts。 In addition, we propose 1) an unbiased dataset (namely rule-RAVEN) as a benchmark to avoid overfitting, 2) and a two-stage curriculum agent training method as a baseline for more stable convergence in the Reasoning Game, where contexts and semantics are bilaterally drifting。实验结果表明,在 Reasoning Game 中,semantically stable and compositional language emerges to solve reasoning problems。这种emerged language helps agents apply the extracted rules to the generalization of unseen context attributes, and to the transfer between different context attributes or even tasks。

RDGCN: Reinforced Dependency Graph Convolutional Network for Aspect-based Sentiment Analysis

  • paper_url: http://arxiv.org/abs/2311.04467
  • repo_url: https://github.com/rdgcn/rdgcn
  • paper_authors: Xusheng Zhao, Hao Peng, Qiong Dai, Xu Bai, Huailiang Peng, Yanbing Liu, Qinglang Guo, Philip S. Yu
  • for: 这 paper 的目的是提高 aspect-based sentiment analysis (ABSA) 的精度,使其能够更好地预测句子中的 sentiment polarity。
  • methods: 这 paper 使用 graph neural networks (GNN) 来捕捉句子中的结构 Patterns,并通过 reinforcement learning 来改进 dependency graph 中的重要性计算。
  • results: compare to state-of-the-art GNN-based baselines, RDGCN 在三个 популяр的 dataset 上的全面实验中表现出色,提高了 ABSA 的精度。
    Abstract Aspect-based sentiment analysis (ABSA) is dedicated to forecasting the sentiment polarity of aspect terms within sentences. Employing graph neural networks to capture structural patterns from syntactic dependency parsing has been confirmed as an effective approach for boosting ABSA. In most works, the topology of dependency trees or dependency-based attention coefficients is often loosely regarded as edges between aspects and opinions, which can result in insufficient and ambiguous syntactic utilization. To address these problems, we propose a new reinforced dependency graph convolutional network (RDGCN) that improves the importance calculation of dependencies in both distance and type views. Initially, we propose an importance calculation criterion for the minimum distances over dependency trees. Under the criterion, we design a distance-importance function that leverages reinforcement learning for weight distribution search and dissimilarity control. Since dependency types often do not have explicit syntax like tree distances, we use global attention and mask mechanisms to design type-importance functions. Finally, we merge these weights and implement feature aggregation and classification. Comprehensive experiments on three popular datasets demonstrate the effectiveness of the criterion and importance functions. RDGCN outperforms state-of-the-art GNN-based baselines in all validations.

Edge-assisted U-Shaped Split Federated Learning with Privacy-preserving for Internet of Things

  • paper_url: http://arxiv.org/abs/2311.04944
  • repo_url: None
  • paper_authors: Hengliang Tang, Zihang Zhao, Detian Liu, Yang Cao, Shiqiang Zhang, Siqing You
  • for: 这个研究旨在解决互联网领域内的物联网(IoT)设备上的深度学习模型部署问题。这些设备通常没有计算和通信能力,直接传输数据会导致网络拥堵和不合理的执行。中央化数据处理在数据中心也不再可行,因为关于数据隐私和安全的 Concerns。
  • methods: 我们提出了一个创新的 Edge-assisted U-Shaped Split Federated Learning(EUSFL)框架,利用边缘服务器的高性能能力协助IoT设备进行模型训练和优化过程。在这个框架中,我们运用了 Federated Learning(FL),让数据持有者共同训练模型,而不需要分享数据,从而提高隐私保护。此外,我们将神经网络分为三部分使用U-型分割,让IoT设备进行本地训练。这样可以利用边缘服务器的更高计算能力,实现全面训练时间的缩短,并让IoT设备 avec varying capabilities 进行训练任务得以高效。
  • results: 我们的理论分析和实验结果显示,EUSFL可以与不同的聚合算法结合使用,在不同的IoT设备计算能力下保持良好的性能,并对训练时间和本地计算负载进行了明显的缩短。此外,我们还提出了一种新的杂音机制 called LabelDP,以保护数据特征和标签免受重建攻击,排除隐私泄露的风险。
    Abstract In the realm of the Internet of Things (IoT), deploying deep learning models to process data generated or collected by IoT devices is a critical challenge. However, direct data transmission can cause network congestion and inefficient execution, given that IoT devices typically lack computation and communication capabilities. Centralized data processing in data centers is also no longer feasible due to concerns over data privacy and security. To address these challenges, we present an innovative Edge-assisted U-Shaped Split Federated Learning (EUSFL) framework, which harnesses the high-performance capabilities of edge servers to assist IoT devices in model training and optimization process. In this framework, we leverage Federated Learning (FL) to enable data holders to collaboratively train models without sharing their data, thereby enhancing data privacy protection by transmitting only model parameters. Additionally, inspired by Split Learning (SL), we split the neural network into three parts using U-shaped splitting for local training on IoT devices. By exploiting the greater computation capability of edge servers, our framework effectively reduces overall training time and allows IoT devices with varying capabilities to perform training tasks efficiently. Furthermore, we proposed a novel noise mechanism called LabelDP to ensure that data features and labels can securely resist reconstruction attacks, eliminating the risk of privacy leakage. Our theoretical analysis and experimental results demonstrate that EUSFL can be integrated with various aggregation algorithms, maintaining good performance across different computing capabilities of IoT devices, and significantly reducing training time and local computation overhead.

Improving Pacing in Long-Form Story Planning

  • paper_url: http://arxiv.org/abs/2311.04459
  • repo_url: https://github.com/yichenzw/pacing
  • paper_authors: Yichen Wang, Kevin Yang, Xiaoming Liu, Dan Klein
  • for: improve the natural pacing of automatically generated story outlines
  • methods: use a concreteness evaluator to control hierarchical outline generation, and filter new outline items based on their predicted concreteness
  • results: compared to a baseline, humans judge CONCOCT's pacing to be more consistent over 57% of the time across multiple outline lengths
    Abstract Existing LLM-based systems for writing long-form stories or story outlines frequently suffer from unnatural pacing, whether glossing over important events or over-elaborating on insignificant details, resulting in a jarring experience for the reader. We propose a CONCrete Outline ConTrol (CONCOCT) system to improve pacing when automatically generating story outlines. We first train a concreteness evaluator to judge which of two events is more concrete (low-level-detailed). This evaluator can then be used to control pacing in hierarchical outline generation; in this work, we explore a vaguest-first expansion procedure that aims for uniform pacing. We further use the evaluator to filter new outline items based on predicted concreteness. Compared to a baseline hierarchical outline generator, humans judge CONCOCT's pacing to be more consistent over 57% of the time across multiple outline lengths; the gains also translate to downstream stories. All code, data, and models are open-sourced.
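A minimal sketch of the vaguest-first expansion loop, assuming a concreteness scorer and an outline-item generator; both `score_concreteness` and `propose_children` below are hypothetical stubs standing in for the trained evaluator and the LLM. New items are filtered by their predicted concreteness relative to the parent, roughly mirroring the filtering step described in the abstract.

```python
import heapq

def score_concreteness(item: str) -> float:
    """Stub for the trained concreteness evaluator; here, longer
    items are treated as more concrete purely for illustration."""
    return len(item.split())

def propose_children(item: str, k: int = 3) -> list[str]:
    """Stub for the LLM that expands an outline item into sub-events."""
    return [f"{item} - detail {i}" for i in range(k)]

def vaguest_first_expand(root: str, budget: int = 5, min_gain: float = 1.0):
    # Min-heap keyed by concreteness: the vaguest item is expanded first.
    outline = [root]
    heap = [(score_concreteness(root), root)]
    while heap and len(outline) < budget:
        score, item = heapq.heappop(heap)
        for child in propose_children(item):
            # Keep only children the evaluator judges concrete enough
            # relative to their parent.
            c = score_concreteness(child)
            if c >= score + min_gain and len(outline) < budget:
                outline.append(child)
                heapq.heappush(heap, (c, child))
    return outline

print(vaguest_first_expand("Hero leaves home"))
```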

Evaluating Uncertainty Quantification approaches for Neural PDEs in scientific applications

  • paper_url: http://arxiv.org/abs/2311.04457
  • repo_url: None
  • paper_authors: Vardhan Dongre, Gurpreet Singh Hora
  • for: This paper focuses on the development of Uncertainty Quantification (UQ) methods for Neural Partial Differential Equations (Neural PDEs) in scientific applications, specifically for forward and inverse problems.
  • methods: The paper evaluates various UQ approaches, including Bayesian methods such as Hamiltonian Monte Carlo (HMC) and Monte-Carlo Dropout (MCD), as well as a conventional approach using Deep Ensembles (DE).
  • results: Neural PDEs can effectively reconstruct flow systems and predict the associated unknown parameters, but the Bayesian methods display a higher degree of certainty in their predictions than the DE approach, suggesting that Bayesian techniques may underestimate the true underlying uncertainty.
    Abstract The accessibility of spatially distributed data, enabled by affordable sensors, field, and numerical experiments, has facilitated the development of data-driven solutions for scientific problems, including climate change, weather prediction, and urban planning. Neural Partial Differential Equations (Neural PDEs), which combine deep learning (DL) techniques with domain expertise (e.g., governing equations) for parameterization, have proven to be effective in capturing valuable correlations within spatiotemporal datasets. However, sparse and noisy measurements coupled with modeling approximation introduce aleatoric and epistemic uncertainties. Therefore, quantifying uncertainties propagated from model inputs to outputs remains a challenge and an essential goal for establishing the trustworthiness of Neural PDEs. This work evaluates various Uncertainty Quantification (UQ) approaches for both Forward and Inverse Problems in scientific applications. Specifically, we investigate the effectiveness of Bayesian methods, such as Hamiltonian Monte Carlo (HMC) and Monte-Carlo Dropout (MCD), and a more conventional approach, Deep Ensembles (DE). To illustrate their performance, we take two canonical PDEs: Burgers' equation and the Navier-Stokes equation. Our results indicate that Neural PDEs can effectively reconstruct flow systems and predict the associated unknown parameters. However, it is noteworthy that the results derived from Bayesian methods, based on our observations, tend to display a higher degree of certainty in their predictions as compared to those obtained using the DE. This elevated certainty in predictions suggests that Bayesian techniques might underestimate the true underlying uncertainty, thereby appearing more confident in their predictions than the DE approach.
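A minimal sketch contrasting the two non-HMC baselines on a generic regressor: Deep Ensembles averages predictions from independently initialized (and, in practice, independently trained) networks, while MC Dropout keeps dropout active at test time and averages stochastic forward passes; each yields a predictive mean and standard deviation. This is a toy illustration, not the paper's Neural PDE setup, and the HMC baseline is omitted.

```python
import torch
import torch.nn as nn

def make_net():
    return nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                         nn.Dropout(p=0.1), nn.Linear(64, 1))

x = torch.linspace(-1, 1, 50).unsqueeze(-1)

# Deep Ensembles: M independently initialized (and, in practice, trained) nets,
# evaluated deterministically.
ensemble = [make_net().eval() for _ in range(5)]
with torch.no_grad():
    preds = torch.stack([net(x) for net in ensemble])
de_mean, de_std = preds.mean(0), preds.std(0)

# MC Dropout: keep dropout stochastic at test time and average T passes.
net = make_net()
net.train()  # keeps nn.Dropout active
with torch.no_grad():
    preds = torch.stack([net(x) for _ in range(50)])
mcd_mean, mcd_std = preds.mean(0), preds.std(0)

print(de_std.mean().item(), mcd_std.mean().item())
```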

MathNAS: If Blocks Have a Role in Mathematical Architecture Design

  • paper_url: http://arxiv.org/abs/2311.04943
  • repo_url: https://github.com/wangqinsi1/mathnas
  • paper_authors: Wang Qinsi, Ke Jinhan, Liang Zhi, Zhang Sihai
  • For: This work tackles the problem of designing large models with Neural Architecture Search (NAS), where existing methods incur an enormous computation cost when searching and evaluating candidate networks.
  • Methods: It proposes a novel divide-and-conquer strategy that exploits the modular structure of the search space, decomposing the performance of a candidate network into the performances of its building blocks and predicting network performance with mathematical programming.
  • Results: This strategy enables fast, training-free network performance prediction, yielding higher accuracy and faster search.
    Abstract Neural Architecture Search (NAS) has emerged as a favoured method for unearthing effective neural architectures. Recent development of large models has intensified the demand for faster search speeds and more accurate search results. However, designing large models by NAS is challenging due to the dramatic increase of search space and the associated huge performance evaluation cost. Consider a typical modular search space widely used in NAS, in which a neural architecture consists of $m$ block nodes and a block node has $n$ alternative blocks. Facing the space containing $n^m$ candidate networks, existing NAS methods attempt to find the best one by searching and evaluating candidate networks directly. Different from the general strategy that takes architecture search as a whole problem, we propose a novel divide-and-conquer strategy by making use of the modular nature of the search space. Here, we introduce MathNAS, a general NAS framework based on mathematical programming. In MathNAS, the performances of the $m*n$ possible building blocks in the search space are calculated first, and then the performance of a network is directly predicted based on the performances of its building blocks. Although estimating block performances involves network training, just as what happens for network performance evaluation in existing NAS methods, predicting network performance is completely training-free and thus extremely fast. In contrast to the $n^m$ candidate networks to evaluate in existing NAS methods, which require training and a formidable computational burden, there are only $m*n$ possible blocks to handle in MathNAS. Therefore, our approach effectively reduces the complexity of network performance evaluation. Our code is available at https://github.com/wangqinsi1/MathNAS.
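A minimal sketch of the divide-and-conquer idea: given an m-by-n table of per-block performance estimates (random numbers below stand in for the trained estimates), a network's performance is predicted from its blocks without any further training. A simple additive surrogate is used purely for illustration, under which the best architecture decomposes into per-node argmaxes; the paper's actual predictor comes from a mathematical-programming formulation.

```python
import numpy as np

m, n = 6, 4                      # m block nodes, n alternative blocks per node
rng = np.random.default_rng(0)
block_perf = rng.uniform(0.5, 1.0, size=(m, n))  # stand-in for measured block scores

def predict_network(arch):
    """Training-free surrogate: aggregate the scores of the chosen blocks.
    A simple additive form is used here for illustration."""
    return sum(block_perf[i, b] for i, b in enumerate(arch))

# Under an additive surrogate, the best architecture decomposes per node,
# so the search over n**m candidates reduces to m independent argmaxes
# over only m*n block scores.
best_arch = tuple(int(np.argmax(block_perf[i])) for i in range(m))
print(best_arch, predict_network(best_arch))
```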

MixTEA: Semi-supervised Entity Alignment with Mixture Teaching

  • paper_url: http://arxiv.org/abs/2311.04441
  • repo_url: https://github.com/xiefeng69/mixtea
  • paper_authors: Feng Xie, Xin Song, Xiang Zeng, Xuechen Zhao, Lei Tian, Bin Zhou, Yusong Tan
  • for: This paper proposes a new semi-supervised entity alignment (EA) method to address the entity alignment problem caused by the lack of adequate labeled mappings as training data.
  • methods: The method guides learning with an end-to-end mixture teaching of manually labeled mappings and probabilistic pseudo mappings, and introduces a bi-directional voting (BDV) strategy and a matching diversity-based rectification (MDR) module to reduce the negative influence of noise on pseudo mapping learning.
  • results: Results on multiple benchmark datasets, together with further analyses, demonstrate the superiority and practicality of the proposed method.
    Abstract Semi-supervised entity alignment (EA) is a practical and challenging task because of the lack of adequate labeled mappings as training data. Most works address this problem by generating pseudo mappings for unlabeled entities. However, they either suffer from the erroneous (noisy) pseudo mappings or largely ignore the uncertainty of pseudo mappings. In this paper, we propose a novel semi-supervised EA method, termed as MixTEA, which guides the model learning with an end-to-end mixture teaching of manually labeled mappings and probabilistic pseudo mappings. We firstly train a student model using few labeled mappings as standard. More importantly, in pseudo mapping learning, we propose a bi-directional voting (BDV) strategy that fuses the alignment decisions in different directions to estimate the uncertainty via the joint matching confidence score. Meanwhile, we also design a matching diversity-based rectification (MDR) module to adjust the pseudo mapping learning, thus reducing the negative influence of noisy mappings. Extensive results on benchmark datasets as well as further analyses demonstrate the superiority and the effectiveness of our proposed method.
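A minimal sketch of the bi-directional voting idea: from an entity-similarity matrix between the two knowledge graphs, matching distributions are formed in the source-to-target and target-to-source directions and their confidences are combined into a joint score for each pseudo mapping. The averaging fusion and the mutual-match flag below are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_voting(sim):
    """sim[i, j]: similarity between source entity i and target entity j."""
    p_st = softmax(sim, axis=1)      # source -> target matching distribution
    p_ts = softmax(sim, axis=0)      # target -> source matching distribution
    pseudo = []
    for i in range(sim.shape[0]):
        j = int(np.argmax(p_st[i]))
        # Joint confidence: the pair should be supported in both directions.
        conf = 0.5 * (p_st[i, j] + p_ts[i, j])
        mutual = bool(np.argmax(p_ts[:, j]) == i)
        pseudo.append((i, j, float(conf), mutual))
    return pseudo

sim = np.array([[0.9, 0.1, 0.2],
                [0.2, 0.8, 0.3],
                [0.1, 0.2, 0.7]])
for pair in bidirectional_voting(sim):
    print(pair)
```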

Interpretable Geoscience Artificial Intelligence (XGeoS-AI): Application to Demystify Image Recognition

  • paper_url: http://arxiv.org/abs/2311.04940
  • repo_url: None
  • paper_authors: Jin-Jian Xu, Hao Zhang, Chao-Sheng Tang, Lin Li, Bin Shi
  • for: The goal of this work is to demystify image recognition in the Earth sciences by proposing an interpretable geoscience artificial intelligence (XGeoS-AI) framework.
  • methods: Inspired by the mechanism of human vision, the framework generates a threshold from a local region within the whole image to complete the recognition task; different AI methods, such as Support Vector Regression (SVR), Multilayer Perceptron (MLP), and Convolutional Neural Network (CNN), can serve as its AI engines to efficiently complete geoscience image recognition tasks.
  • results: Experimental results show that the proposed XGeoS-AI framework is effective, versatile, and interpretable, with great potential for geoscience image recognition problems, and it may also drive technological innovation in the Earth sciences.
    Abstract As Earth science enters the era of big data, artificial intelligence (AI) not only offers great potential for solving geoscience problems, but also plays a critical role in accelerating the understanding of the complex, interactive, and multiscale processes of Earth's behavior. As geoscience AI models are progressively utilized for significant predictions in crucial situations, geoscience researchers are increasingly demanding their interpretability and versatility. This study proposes an interpretable geoscience artificial intelligence (XGeoS-AI) framework to unravel the mystery of image recognition in the Earth sciences, and its effectiveness and versatility is demonstrated by taking computed tomography (CT) image recognition as an example. Inspired by the mechanism of human vision, the proposed XGeoS-AI framework generates a threshold value from a local region within the whole image to complete the recognition. Different kinds of artificial intelligence (AI) methods, such as Support Vector Regression (SVR), Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), can be adopted as the AI engines of the proposed XGeoS-AI framework to efficiently complete geoscience image recognition tasks. Experimental results demonstrate that the effectiveness, versatility, and heuristics of the proposed framework have great potential in solving geoscience image recognition problems. Interpretable AI should receive more and more attention in the field of the Earth sciences, which is the key to promoting more rational and wider applications of AI in the field of Earth sciences. In addition, the proposed interpretable framework may be the forerunner of technological innovation in the Earth sciences.
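A minimal sketch of the local-threshold idea on a synthetic CT-like image: a single threshold is derived from a small local region and then applied to the whole image, so the decision stays interpretable as one number. The midpoint rule below is a trivial stand-in for the SVR/MLP/CNN engines the framework can plug in, and the patch location is chosen by hand.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic "CT slice": dark matrix with a brighter inclusion.
image = rng.normal(0.3, 0.05, size=(64, 64))
image[20:40, 20:40] += 0.4

def threshold_from_patch(img, top, left, size=8):
    """Stand-in for the AI engine: derive a single threshold from a
    local region (here, the midpoint between the patch's darkest and
    brightest pixels)."""
    patch = img[top:top + size, left:left + size]
    return 0.5 * (patch.min() + patch.max())

t = threshold_from_patch(image, top=18, left=18)   # patch straddling both phases
mask = image > t                                   # one number explains the output
print(f"threshold={t:.3f}, segmented fraction={mask.mean():.3f}")
```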

LooGLE: Can Long-Context Language Models Understand Long Contexts?

  • paper_url: http://arxiv.org/abs/2311.04939
  • repo_url: https://github.com/bigai-nlco/loogle
  • paper_authors: Jiaqi Li, Mengmeng Wang, Zilong Zheng, Muhan Zhang
  • for: Evaluate the long-context understanding capability of large language models (LLMs).
  • methods: Uses relatively new documents (post-2022) with more than 24,000 tokens per document, together with 6,000 newly generated questions spanning diverse domains and meticulously crafted question-answer pairs, to assess LLMs' long-dependency capabilities.
  • results: Key findings: (i) commercial models outperform open-sourced ones; (ii) LLMs excel at short-dependency tasks but struggle with more intricate long-dependency tasks; (iii) in-context learning and chain-of-thought reasoning offer only marginal improvements; (iv) retrieval-based techniques substantially help short question answering, while extending the context window length has limited impact on long-context understanding.
    Abstract Large language models (LLMs), despite their impressive performance in various language tasks, are typically limited to processing texts within context-window size. This limitation has spurred significant research efforts to enhance LLMs' long-context understanding with high-quality long-sequence benchmarks. However, prior datasets in this regard suffer from shortcomings, such as short context length compared to the context window of modern LLMs; outdated documents that have data leakage problems; and an emphasis on short dependency tasks rather than long dependency tasks. In this paper, we present LooGLE, a Long Context Generic Language Evaluation benchmark for LLMs' long context understanding. LooGLE features relatively new documents post-2022, with over 24,000 tokens per document and 6,000 newly generated questions spanning diverse domains. Human annotators meticulously crafted more than 1,100 high-quality question-answer pairs to meet the long dependency requirements. These pairs underwent thorough cross-validation, yielding the most precise assessment of LLMs' long dependency capabilities. The evaluation of eight state-of-the-art LLMs on LooGLE revealed key findings: (i) commercial models outperformed open-sourced models; (ii) LLMs excelled in short dependency tasks like short question-answering and cloze tasks but struggled with more intricate long dependency tasks; (iii) in-context learning and chaining thoughts offered only marginal improvements; (iv) retrieval-based techniques demonstrated substantial benefits for short question-answering, while strategies for extending context window length had limited impact on long context understanding. As such, LooGLE not only provides a systematic and comprehensive evaluation schema on long-context LLMs, but also sheds light on future development of enhanced models towards "true long-context understanding".

Data Factors for Better Compositional Generalization

  • paper_url: http://arxiv.org/abs/2311.04420
  • repo_url: https://github.com/owenzx/data4comp
  • paper_authors: Xiang Zhou, Yichen Jiang, Mohit Bansal
  • for: This paper investigates models' compositional generalization across different datasets and how different data factors can improve it.
  • methods: Transformer models are trained on a variety of training sets with different data factors (dataset scale, pattern complexity, example difficulty), and the influence of each factor on generalization is analyzed.
  • results: Increasing dataset complexity improves generalization, an improvement attributable to more diverse examples and a lower example repetition frequency. Example difficulty also matters: on synthetic datasets, simple examples invoke stronger compositionality, while on large-scale real language datasets a balanced mixture of simple and hard examples induces the strongest generalization.
    Abstract Recent diagnostic datasets on compositional generalization, such as SCAN (Lake and Baroni, 2018) and COGS (Kim and Linzen, 2020), expose severe problems in models trained from scratch on these datasets. However, in contrast to this poor performance, state-of-the-art models trained on larger and more general datasets show better generalization ability. In this work, to reconcile this inconsistency, we conduct an empirical analysis by training Transformer models on a variety of training sets with different data factors, including dataset scale, pattern complexity, example difficulty, etc. First, we show that increased dataset complexity can lead to better generalization behavior on multiple different generalization challenges. To further understand this improvement, we show two axes of the benefit from more complex datasets: they provide more diverse examples so compositional understanding becomes more effective, and they also prevent ungeneralizable memorization of the examples due to reduced example repetition frequency. Finally, we explore how training examples of different difficulty levels influence generalization differently. On synthetic datasets, simple examples invoke stronger compositionality than hard examples do. On larger-scale real language datasets, while hard examples become more important potentially to ensure decent data coverage, a balanced mixture of simple and hard examples manages to induce the strongest generalizability. The code and data for this work are available at https://github.com/owenzx/data4comp
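A minimal sketch of the two data statistics the analysis attributes the benefit to: example diversity (fraction of distinct examples) and example repetition frequency (mean number of occurrences per distinct example). The counting below is illustrative, not the paper's exact measurement.

```python
from collections import Counter

def data_factor_stats(examples):
    counts = Counter(examples)
    n, n_distinct = len(examples), len(counts)
    diversity = n_distinct / n                       # higher on more complex datasets
    repetition = sum(counts.values()) / n_distinct   # mean repeats per distinct example
    return diversity, repetition

small = ["jump twice", "walk left", "jump twice", "jump twice"]
large = ["jump twice", "walk left", "run right", "look and walk"]
print(data_factor_stats(small))   # low diversity, high repetition
print(data_factor_stats(large))   # high diversity, low repetition
```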

PepLand: a large-scale pre-trained peptide representation model for a comprehensive landscape of both canonical and non-canonical amino acids

  • paper_url: http://arxiv.org/abs/2311.04419
  • repo_url: None
  • paper_authors: Ruochi Zhang, Haoran Wu, Yuting Xiu, Kewei Li, Ningning Chen, Yu Wang, Yan Wang, Xin Gao, Fengfeng Zhou
  • For: PepLand is a pre-training architecture for representation and property analysis of peptides spanning both canonical and non-canonical amino acids.
  • Methods: PepLand leverages a comprehensive multi-view heterogeneous graph neural network to unveil the subtle structural representations of peptides.
  • Results: PepLand effectively captures salient synthetic peptide features, laying a robust foundation for transformative advances in peptide-centric research domains.
    Abstract In recent years, the scientific community has become increasingly interested on peptides with non-canonical amino acids due to their superior stability and resistance to proteolytic degradation. These peptides present promising modifications to biological, pharmacological, and physiochemical attributes in both endogenous and engineered peptides. Notwithstanding their considerable advantages, the scientific community exhibits a conspicuous absence of an effective pre-trained model adept at distilling feature representations from such complex peptide sequences. We herein propose PepLand, a novel pre-training architecture for representation and property analysis of peptides spanning both canonical and non-canonical amino acids. In essence, PepLand leverages a comprehensive multi-view heterogeneous graph neural network tailored to unveil the subtle structural representations of peptides. Empirical validations underscore PepLand's effectiveness across an array of peptide property predictions, encompassing protein-protein interactions, permeability, solubility, and synthesizability. The rigorous evaluation confirms PepLand's unparalleled capability in capturing salient synthetic peptide features, thereby laying a robust foundation for transformative advances in peptide-centric research domains. We have made all the source code utilized in this study publicly accessible via GitHub at https://github.com/zhangruochi/pepland

AI-accelerated Discovery of Altermagnetic Materials

  • paper_url: http://arxiv.org/abs/2311.04418
  • repo_url: https://github.com/zfgao66/mataltmag
  • paper_authors: Ze-Feng Gao, Shuai Qu, Bocheng Zeng, Ji-Rong Wen, Hao Sun, Pengjie Guo, Zhong-Yi Lu
  • for: This paper aims to explore the new magnetic phase of altermagnetism and to discover more altermagnetic materials.
  • methods: An AI search engine unifying symmetry analysis, graph neural network pre-training, optimal transport theory, and first-principles electronic structure calculation is used to discover 25 new altermagnetic materials covering metals, semiconductors, and insulators.
  • results: The 25 newly discovered altermagnetic materials, including 8 $i$-wave altermagnets reported for the first time, exhibit distinctive physical properties such as the anomalous Hall effect, the anomalous Kerr effect, and topological properties.
    Abstract Altermagnetism, a new magnetic phase, has been theoretically proposed and experimentally verified to be distinct from ferromagnetism and antiferromagnetism. Although altermagnets have been found to possess many exotic physical properties, the very limited availability of known altermagnetic materials (e.g., 14 confirmed materials) hinders the study of such properties. Hence, discovering more types of altermagnetic materials is crucial for a comprehensive understanding of altermagnetism and thus facilitating new applications in the next generation information technologies, e.g., storage devices and high-sensitivity sensors. Here, we report 25 new altermagnetic materials that cover metals, semiconductors, and insulators, discovered by an AI search engine unifying symmetry analysis, graph neural network pre-training, optimal transport theory, and first-principles electronic structure calculation. The wide range of electronic structural characteristics reveals that various innovative physical properties manifest in these newly discovered altermagnetic materials, e.g., anomalous Hall effect, anomalous Kerr effect, and topological property. Notably, we discovered 8 $i$-wave altermagnetic materials for the first time. Overall, the AI search engine performs much better than human experts and suggests a set of new altermagnetic materials with unique properties, outlining its potential for accelerated discovery of altermagnetic materials.

Human Conditional Reasoning in Answer Set Programming

  • paper_url: http://arxiv.org/abs/2311.04412
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Chiaki Sakama
  • for: This work studies four types of conditional inference in human reasoning: affirming the antecedent (AA), affirming the consequent (AC), denying the antecedent (DA), and denying the consequent (DC).
  • methods: AC, DA, and DC inferences are realized in answer set programming, and the formal properties of the proposed completions and their correspondence to human reasoning tasks in cognitive psychology are investigated.
  • results: Although AC and DA are logically invalid, they are common pragmatic inferences in daily life, whereas AA and DC are logically valid; the proposed completions are also applied to commonsense reasoning in AI.
    Abstract Given a conditional sentence P=>Q (if P then Q) and respective facts, four different types of inferences are observed in human reasoning. Affirming the antecedent (AA) (or modus ponens) reasons Q from P; affirming the consequent (AC) reasons P from Q; denying the antecedent (DA) reasons -Q from -P; and denying the consequent (DC) (or modus tollens) reasons -P from -Q. Among them, AA and DC are logically valid, while AC and DA are logically invalid and often called logical fallacies. Nevertheless, humans often perform AC or DA as pragmatic inference in daily life. In this paper, we realize AC, DA and DC inferences in answer set programming. Eight different types of completion are introduced and their semantics are given by answer sets. We investigate formal properties and characterize human reasoning tasks in cognitive psychology. Those completions are also applied to commonsense reasoning in AI.
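A minimal sketch enumerating what each of the four patterns concludes for a single rule P=>Q and one observed fact, with AC and DA flagged as the logically invalid (but pragmatically common) ones. This plain-Python illustration does not attempt the answer-set-programming completions the paper actually introduces.

```python
def conditional_inferences(p=None, q=None):
    """Given the rule P => Q and one observed fact, list what each
    of the four human inference patterns would conclude."""
    inferences = []
    if p is True:
        inferences.append(("AA (modus ponens)", "Q", "valid"))
    if q is True:
        inferences.append(("AC (affirming the consequent)", "P", "invalid"))
    if p is False:
        inferences.append(("DA (denying the antecedent)", "not Q", "invalid"))
    if q is False:
        inferences.append(("DC (modus tollens)", "not P", "valid"))
    return inferences

# Rule: "if it rains (P), the street is wet (Q)"; observe a wet street.
for name, conclusion, status in conditional_inferences(q=True):
    print(f"{name}: conclude {conclusion} ({status})")
```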

Improved DDIM Sampling with Moment Matching Gaussian Mixtures

  • paper_url: http://arxiv.org/abs/2311.04938
  • repo_url: None
  • paper_authors: Prasad Gabbur
  • for: Accelerated sampling from pre-trained Denoising Diffusion Probabilistic Models (DDPMs).
  • methods: Uses a Gaussian Mixture Model (GMM) as the reverse transition operator (kernel), specifically matching the first two central moments of the DDPM forward marginals.
  • results: Experiments with unconditional models on CelebAHQ and FFHQ and class-conditional models on ImageNet show that the GMM kernel improves sample quality when the number of sampling steps is small, with clear gains in FID and IS; e.g., on ImageNet 256x256 with 10 sampling steps, FID 6.94 and IS 207.85 versus 10.15 and 196.73 with a Gaussian kernel.
    Abstract We propose using a Gaussian Mixture Model (GMM) as reverse transition operator (kernel) within the Denoising Diffusion Implicit Models (DDIM) framework, which is one of the most widely used approaches for accelerated sampling from pre-trained Denoising Diffusion Probabilistic Models (DDPM). Specifically we match the first and second order central moments of the DDPM forward marginals by constraining the parameters of the GMM. We see that moment matching is sufficient to obtain samples with equal or better quality than the original DDIM with Gaussian kernels. We provide experimental results with unconditional models trained on CelebAHQ and FFHQ and class-conditional models trained on ImageNet datasets respectively. Our results suggest that using the GMM kernel leads to significant improvements in the quality of the generated samples when the number of sampling steps is small, as measured by FID and IS metrics. For example on ImageNet 256x256, using 10 sampling steps, we achieve a FID of 6.94 and IS of 207.85 with a GMM kernel compared to 10.15 and 196.73 respectively with a Gaussian kernel.
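A minimal one-dimensional sketch of the moment-matching constraint: an equal-weight, two-component mixture with symmetrically offset means and a shared component variance has mean mu and variance comp_var + offset^2, so choosing comp_var = sigma2 - offset^2 matches the target's first two central moments. This illustrates the matching condition only, not the full GMM reverse kernel or the DDIM sampler.

```python
import numpy as np

def matched_two_component_gmm(mu, sigma2, offset):
    """Build an equal-weight, two-component 1-D GMM whose first two
    central moments equal (mu, sigma2). Requires offset**2 < sigma2."""
    comp_var = sigma2 - offset ** 2          # shared component variance
    means = np.array([mu - offset, mu + offset])
    return means, comp_var

mu, sigma2, offset = 0.0, 1.0, 0.6
means, comp_var = matched_two_component_gmm(mu, sigma2, offset)

# Empirical check of the moment match.
rng = np.random.default_rng(0)
comp = rng.integers(0, 2, size=200_000)
samples = rng.normal(means[comp], np.sqrt(comp_var))
print(samples.mean(), samples.var())         # ~0.0 and ~1.0
```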

Human-Centered Planning

  • paper_url: http://arxiv.org/abs/2311.04403
  • repo_url: https://github.com/rprokap/pset-9
  • paper_authors: Yuliang Li, Nitin Kamra, Ruta Desai, Alon Halevy
  • for: This work develops a large language model (LLM)-based planner that generates a reasonable plan for one's day from vague constraints specified by the user in natural language.
  • methods: The study builds an LLM-based planner (LLMPlan), extended with the ability to self-reflect on its output, and a symbolic planner (SymPlan) that translates text constraints into a symbolic representation.
  • results: Despite having no formal specification of constraints, LLMPlan performs explicit constraint satisfaction on par with traditional symbolic planners on average (2% performance difference) while retaining reasoning about implicit requirements; in an interactive evaluation with 40 users, LLM-based planners achieve higher user satisfaction than their symbolic counterparts (70.5% vs. 40.4%).
    Abstract LLMs have recently made impressive inroads on tasks whose output is structured, such as coding, robotic planning and querying databases. The vision of creating AI-powered personal assistants also involves creating structured outputs, such as a plan for one's day, or for an overseas trip. Here, since the plan is executed by a human, the output doesn't have to satisfy strict syntactic constraints. A useful assistant should also be able to incorporate vague constraints specified by the user in natural language. This makes LLMs an attractive option for planning. We consider the problem of planning one's day. We develop an LLM-based planner (LLMPlan) extended with the ability to self-reflect on its output and a symbolic planner (SymPlan) with the ability to translate text constraints into a symbolic representation. Despite no formal specification of constraints, we find that LLMPlan performs explicit constraint satisfaction akin to the traditional symbolic planners on average (2% performance difference), while retaining the reasoning of implicit requirements. Consequently, LLM-based planners outperform their symbolic counterparts in user satisfaction (70.5% vs. 40.4%) during interactive evaluation with 40 users.
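A minimal sketch of the symbolic side only: a day plan as timed activities and a few user constraints encoded as predicates over the plan, so that explicit constraint satisfaction can be checked programmatically. The activity names, constraint forms, and checks below are all illustrative assumptions; the LLM-based generation and self-reflection components are not reproduced.

```python
from dataclasses import dataclass

@dataclass
class Activity:
    name: str
    start: float  # hour of day, 24h clock
    end: float

# Candidate plan, e.g. as an LLM-based planner might propose it.
plan = [Activity("gym", 7.0, 8.0),
        Activity("work", 9.0, 17.0),
        Activity("dinner with friends", 19.0, 21.0)]

# Symbolic encodings of natural-language constraints (illustrative forms).
constraints = {
    "no overlapping activities":
        lambda p: all(a.end <= b.start for a, b in zip(p, p[1:])),
    "finish work before 18:00":
        lambda p: all(a.end <= 18.0 for a in p if a.name == "work"),
    "include some exercise":
        lambda p: any(a.name == "gym" for a in p),
}

plan_sorted = sorted(plan, key=lambda a: a.start)
for text, check in constraints.items():
    status = "satisfied" if check(plan_sorted) else "violated"
    print(f"{text}: {status}")
```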

LRM: Large Reconstruction Model for Single Image to 3D

  • paper_url: http://arxiv.org/abs/2311.04400
  • repo_url: None
  • paper_authors: Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan
  • for: Predicting the 3D model of an object from a single input image.
  • methods: A highly scalable transformer-based architecture with 500 million learnable parameters that directly predicts a neural radiance field (NeRF) from the input image.
  • results: Efficiently produces high-quality 3D reconstructions, including from real-world in-the-wild captures and images from generative models.
    Abstract We propose the first Large Reconstruction Model (LRM) that predicts the 3D model of an object from a single input image within just 5 seconds. In contrast to many previous methods that are trained on small-scale datasets such as ShapeNet in a category-specific fashion, LRM adopts a highly scalable transformer-based architecture with 500 million learnable parameters to directly predict a neural radiance field (NeRF) from the input image. We train our model in an end-to-end manner on massive multi-view data containing around 1 million objects, including both synthetic renderings from Objaverse and real captures from MVImgNet. This combination of a high-capacity model and large-scale training data empowers our model to be highly generalizable and produce high-quality 3D reconstructions from various testing inputs including real-world in-the-wild captures and images from generative models. Video demos and interactable 3D meshes can be found on this website: https://yiconghong.me/LRM/.