cs.AI - 2023-10-02

Transcending Domains through Text-to-Image Diffusion: A Source-Free Approach to Domain Adaptation

  • paper_url: http://arxiv.org/abs/2310.01701
  • repo_url: None
  • paper_authors: Shivang Chopra, Suraj Kothawade, Houda Aynaou, Aman Chadha
  • for: Enhancing a model's performance on a target domain by transferring knowledge from a related source domain with sufficient labeled data, without requiring direct access to the source data (Source-Free Domain Adaptation, SFDA).
  • methods: Train a text-to-image diffusion model on the labeled target-domain samples, then fine-tune it using the pre-trained source model to generate samples close to the source data; finally, align the generated source data with the target-domain data using domain adaptation techniques.
  • results: Extensive comparison against several baselines on the standard Office-31, Office-Home, and VisDA benchmarks demonstrates the effectiveness of the approach on the SFDA task.
    Abstract Domain Adaptation (DA) is a method for enhancing a model's performance on a target domain with inadequate annotated data by applying the information the model has acquired from a related source domain with sufficient labeled data. The escalating enforcement of data-privacy regulations like HIPAA, COPPA, FERPA, etc. have sparked a heightened interest in adapting models to novel domains while circumventing the need for direct access to the source data, a problem known as Source-Free Domain Adaptation (SFDA). In this paper, we propose a novel framework for SFDA that generates source data using a text-to-image diffusion model trained on the target domain samples. Our method starts by training a text-to-image diffusion model on the labeled target domain samples, which is then fine-tuned using the pre-trained source model to generate samples close to the source data. Finally, we use Domain Adaptation techniques to align the artificially generated source data with the target domain data, resulting in significant performance improvements of the model on the target domain. Through extensive comparison against several baselines on the standard Office-31, Office-Home, and VisDA benchmarks, we demonstrate the effectiveness of our approach for the SFDA task.

LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model

  • paper_url: http://arxiv.org/abs/2310.04445
  • repo_url: None
  • paper_authors: Muhammad Ahmed Shah, Roshan Sharma, Hira Dhamyal, Raphael Olivier, Ankit Shah, Joseph Konan, Dareen Alharthi, Hazim T Bukhari, Massa Baali, Soham Deshmukh, Michael Kuhlmann, Bhiksha Raj, Rita Singh
  • for: Studying how Large Language Model (LLM) alignment can be circumvented by appending specially crafted attack suffixes to harmful queries to elicit harmful responses.
  • methods: Use public models as proxies to craft the attack and transfer successful attacks from the public proxy models to private target models.
  • results: Local Fine-Tuning (LoFT), i.e., fine-tuning proxy models on similar queries that lie in the lexico-semantic neighborhood of harmful queries, reduces the divergence between the proxy and target models. Experiments show that it improves attack transferability, increasing the attack success rate by 39%, 7%, and 0.5% (absolute) on the target models ChatGPT, GPT-4, and Claude, respectively.
    Abstract It has been shown that Large Language Model (LLM) alignments can be circumvented by appending specially crafted attack suffixes with harmful queries to elicit harmful responses. To conduct attacks against private target models whose characterization is unknown, public models can be used as proxies to fashion the attack, with successful attacks being transferred from public proxies to private target models. The success rate of attack depends on how closely the proxy model approximates the private model. We hypothesize that for attacks to be transferrable, it is sufficient if the proxy can approximate the target model in the neighborhood of the harmful query. Therefore, in this paper, we propose \emph{Local Fine-Tuning (LoFT)}, \textit{i.e.}, fine-tuning proxy models on similar queries that lie in the lexico-semantic neighborhood of harmful queries to decrease the divergence between the proxy and target models. First, we demonstrate three approaches to prompt private target models to obtain similar queries given harmful queries. Next, we obtain data for local fine-tuning by eliciting responses from target models for the generated similar queries. Then, we optimize attack suffixes to generate attack prompts and evaluate the impact of our local fine-tuning on the attack's success rate. Experiments show that local fine-tuning of proxy models improves attack transferability and increases attack success rate by $39\%$, $7\%$, and $0.5\%$ (absolute) on target models ChatGPT, GPT-4, and Claude respectively.

Zero-Shot Continuous Prompt Transfer: Generalizing Task Semantics Across Language Models

  • paper_url: http://arxiv.org/abs/2310.01691
  • repo_url: https://github.com/manga-uofa/ptfer
  • paper_authors: Zijun Wu, Yongkang Wu, Lili Mou
  • for: Studying prompt tuning in natural language processing (NLP), in particular the transferability of continuous prompts between different large language models.
  • methods: A zero-shot continuous prompt transfer method in which source prompts are encoded into a relative space and the corresponding target prompts are searched for transfer to target models.
  • results: Experiments confirm the effectiveness of the method, showing that 'task semantics' in continuous prompts generalize across various language models; moreover, combining task semantics from multiple source models further enhances the transferability.
    Abstract Prompt tuning in natural language processing (NLP) has become an increasingly popular method for adapting large language models to specific tasks. However, the transferability of these prompts, especially continuous prompts, between different models remains a challenge. In this work, we propose a zero-shot continuous prompt transfer method, where source prompts are encoded into relative space and the corresponding target prompts are searched for transferring to target models. Experimental results confirm the effectiveness of our method, showing that 'task semantics' in continuous prompts can be generalized across various language models. Moreover, we find that combining 'task semantics' from multiple source models can further enhance the generalizability of transfer.
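    A minimal numpy sketch of the relative-space idea described above, under stated assumptions: random toy vectors stand in for the source and target models' anchor-token embeddings, and the target prompt is found by naive random search rather than the paper's actual optimization; all names (to_relative, anchors_src, etc.) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

d_src, d_tgt, n_anchors = 64, 48, 32                  # hypothetical embedding sizes
anchors_src = rng.normal(size=(n_anchors, d_src))     # anchor token embeddings, source model
anchors_tgt = rng.normal(size=(n_anchors, d_tgt))     # same anchor tokens, target model

def to_relative(prompt_vec, anchors):
    """Encode a continuous prompt by its cosine similarity to each anchor embedding."""
    sims = anchors @ prompt_vec
    return sims / (np.linalg.norm(anchors, axis=1) * np.linalg.norm(prompt_vec) + 1e-8)

source_prompt = rng.normal(size=d_src)                 # a learned source soft prompt (toy)
task_code = to_relative(source_prompt, anchors_src)    # "task semantics" in relative space

# Naive random search for a target-model prompt whose relative encoding matches.
best, best_err = None, np.inf
for _ in range(5000):
    cand = rng.normal(size=d_tgt)
    err = np.linalg.norm(to_relative(cand, anchors_tgt) - task_code)
    if err < best_err:
        best, best_err = cand, err
print(f"best matching error: {best_err:.3f}")
```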

Designing User-Centric Behavioral Interventions to Prevent Dysglycemia with Novel Counterfactual Explanations

  • paper_url: http://arxiv.org/abs/2310.01684
  • repo_url: None
  • paper_authors: Asiful Arefeen, Hassan Ghasemzadeh
  • for: Preventing dysglycemia and its chronic complications by offering personalized diet, exercise, and medication recommendations that keep blood glucose within the normal range.
  • methods: Leveraging insights from adversarial learning, GlyCoach characterizes the decision boundary for high-dimensional health data and performs a grid search to generate actionable counterfactual interventions, integrating prior knowledge about user preferences for plausible explanations.
  • results: Evaluated extensively on two real-world datasets and external simulators, GlyCoach achieves 87% sensitivity in simulation-aided validation, surpassing state-of-the-art counterfactual-generation techniques, and its counterfactuals exhibit a 32% improved normalized distance compared to previous research.
    Abstract Maintaining normal blood glucose levels through lifestyle behaviors is central to maintaining health and preventing disease. Frequent exposure to dysglycemia (i.e., abnormal glucose events such as hyperglycemia and hypoglycemia) leads to chronic complications including diabetes, kidney disease and need for dialysis, myocardial infarction, stroke, amputation, and death. Therefore, a tool capable of predicting dysglycemia and offering users actionable feedback about how to make changes in their diet, exercise, and medication to prevent abnormal glycemic events could have significant societal impacts. Counterfactual explanations can provide insights into why a model made a particular prediction by generating hypothetical instances that are similar to the original input but lead to a different prediction outcome. Therefore, counterfactuals can be viewed as a means to design AI-driven health interventions to prevent adverse health outcomes such as dysglycemia. In this paper, we design GlyCoach, a framework for generating counterfactual explanations for glucose control. Leveraging insights from adversarial learning, GlyCoach characterizes the decision boundary for high-dimensional health data and performs a grid search to generate actionable interventions. GlyCoach is unique in integrating prior knowledge about user preferences of plausible explanations into the process of counterfactual generation. We evaluate GlyCoach extensively using two real-world datasets and external simulators from prior studies that predict glucose response. GlyCoach achieves 87\% sensitivity in the simulation-aided validation, surpassing the state-of-the-art techniques for generating counterfactual explanations by at least $10\%$. Besides, counterfactuals from GlyCoach exhibit a $32\%$ improved normalized distance compared to previous research.
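    A toy illustration of counterfactual generation by grid search over a learned decision boundary, which is the general mechanism the abstract describes; this is not the authors' GlyCoach code, and the two-feature logistic-regression "glucose" classifier and all names here are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))                      # e.g., [carb intake, exercise minutes], standardized
y = (X[:, 0] - 0.8 * X[:, 1] > 0.5).astype(int)    # 1 = dysglycemia predicted (toy rule)
clf = LogisticRegression().fit(X, y)

def counterfactual(x, step=0.1, radius=2.0):
    """Grid-search the smallest feature change that flips the prediction to 'normal' (0)."""
    deltas = np.arange(-radius, radius + step, step)
    best, best_dist = None, np.inf
    for d0 in deltas:
        for d1 in deltas:
            cand = x + np.array([d0, d1])
            if clf.predict(cand[None])[0] == 0:
                dist = np.linalg.norm([d0, d1])
                if dist < best_dist:
                    best, best_dist = cand, dist
    return best, best_dist

x_query = np.array([1.5, 0.0])                     # an at-risk instance
cf, dist = counterfactual(x_query)
print("suggested change:", cf - x_query, "distance:", dist)
```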

What’s the Magic Word? A Control Theory of LLM Prompting

  • paper_url: http://arxiv.org/abs/2310.04444
  • repo_url: https://github.com/amanb2000/magic_words
  • paper_authors: Aman Bhargava, Cameron Witkowski, Manav Shah, Matt Thomson
  • for: Formalizing prompt engineering as an optimal control problem on LLMs and asking whether, given a sequence of tokens, there always exists a prompt that steers the LLM toward accurately predicting the final token.
  • methods: Analytic analysis of the controllability of the self-attention head using tools from control theory, together with a proposed $k-\epsilon$ controllability metric for quantifying LLM steerability.
  • results: For over 97% of the WikiText instances surveyed, a magic prompt of 10 tokens or fewer exists for each model; the $k-\epsilon$ controllability of a panel of models (Falcon-7b, Llama-7b, and Falcon-40b) is also computed and compared.
    Abstract Prompt engineering is effective and important in the deployment of LLMs but is poorly understood mathematically. Here, we formalize prompt engineering as an optimal control problem on LLMs -- where the prompt is considered a control variable for modulating the output distribution of the LLM. Within this framework, we ask a simple question: given a sequence of tokens, does there always exist a prompt we can prepend that will steer the LLM toward accurately predicting the final token? We call such an optimal prompt the magic word since prepending the prompt causes the LLM to output the correct answer. If magic words exist, can we find them? If so, what are their properties? We offer analytic analysis on the controllability of the self-attention head where we prove a bound on controllability as a function of the singular values of its weight matrices. We take inspiration from control theory to propose a metric called $k-\epsilon$ controllability to characterize LLM steerability. We compute the $k-\epsilon$ controllability of a panel of large language models, including Falcon-7b, Llama-7b, and Falcon-40b on 5000 WikiText causal language modeling tasks. Remarkably, we find that magic words of 10 tokens or less exist for over 97% of WikiText instances surveyed for each model.
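    A brute-force sketch of searching for a short "magic" control prompt, assuming a small GPT-2 model, a tiny hand-picked candidate set, and single-token prompts only; the paper's optimized search over longer prompts and its $k-\epsilon$ controllability computation are not reproduced here.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

state = "The capital of France is"                 # x: the imposed token sequence
target = " Paris"                                  # y: desired final token (assumed single token)
target_id = tok(target).input_ids[0]

candidates = ["Paris", "city", "answer", "obviously", "therefore"]  # toy control set
best_prompt, best_rank = None, None
with torch.no_grad():
    for cand in candidates:
        ids = tok(cand + " " + state, return_tensors="pt").input_ids
        logits = model(ids).logits[0, -1]              # next-token distribution
        rank = int((logits > logits[target_id]).sum())  # 0 means target is the argmax
        if best_rank is None or rank < best_rank:
            best_prompt, best_rank = cand, rank
print(f"best single-token control prompt: {best_prompt!r} (target rank {best_rank})")
```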

Keypoint-Augmented Self-Supervised Learning for Medical Image Segmentation with Limited Annotation

  • paper_url: http://arxiv.org/abs/2310.01680
  • repo_url: https://github.com/zshyang/kaf
  • paper_authors: Zhangsihao Yang, Mengwei Ren, Kaize Ding, Guido Gerig, Yalin Wang
  • for: Improving the accuracy of medical image segmentation, especially under low-annotation regimes.
  • methods: Self-supervised pretraining of CNN models (e.g., UNet), with a keypoint-augmented fusion layer that extracts representations preserving both short- and long-range self-attention, combined with global and local self-supervised pretraining objectives.
  • results: On MRI and CT segmentation tasks, the proposed method shows architectural advantages over both CNN- and Transformer-based UNets, produces more robust self-attention than existing SSL methods, and achieves state-of-the-art segmentation results.
    Abstract Pretraining CNN models (i.e., UNet) through self-supervision has become a powerful approach to facilitate medical image segmentation under low annotation regimes. Recent contrastive learning methods encourage similar global representations when the same image undergoes different transformations, or enforce invariance across different image/patch features that are intrinsically correlated. However, CNN-extracted global and local features are limited in capturing long-range spatial dependencies that are essential in biological anatomy. To this end, we present a keypoint-augmented fusion layer that extracts representations preserving both short- and long-range self-attention. In particular, we augment the CNN feature map at multiple scales by incorporating an additional input that learns long-range spatial self-attention among localized keypoint features. Further, we introduce both global and local self-supervised pretraining for the framework. At the global scale, we obtain global representations from both the bottleneck of the UNet, and by aggregating multiscale keypoint features. These global features are subsequently regularized through image-level contrastive objectives. At the local scale, we define a distance-based criterion to first establish correspondences among keypoints and encourage similarity between their features. Through extensive experiments on both MRI and CT segmentation tasks, we demonstrate the architectural advantages of our proposed method in comparison to both CNN and Transformer-based UNets, when all architectures are trained with randomly initialized weights. With our proposed pretraining strategy, our method further outperforms existing SSL methods by producing more robust self-attention and achieving state-of-the-art segmentation results. The code is available at https://github.com/zshyang/kaf.git.

Artemis: HE-Aware Training for Efficient Privacy-Preserving Machine Learning

  • paper_url: http://arxiv.org/abs/2310.01664
  • repo_url: None
  • paper_authors: Yeonsoo Jeon, Mattan Erez, Michael Orshansky
  • for: Making privacy-preserving machine learning (PPML) based on Homomorphic Encryption (HE) more practical, particularly when handling modern large deep neural networks.
  • methods: Artemis, an effective DNN pruning technique for HE-based inference; two HE-aware pruning strategies (positional and diagonal) are investigated to reduce the number of Rotation operations that dominate compute time in HE convolution, with training driven by a novel group Lasso regularization objective.
  • results: Artemis improves on prior HE-oriented pruning and achieves a 1.2-6x improvement when targeting modern convolutional models across three datasets.
    Abstract Privacy-Preserving ML (PPML) based on Homomorphic Encryption (HE) is a promising foundational privacy technology. Making it more practical requires lowering its computational cost, especially, in handling modern large deep neural networks. Model compression via pruning is highly effective in conventional plaintext ML but cannot be effectively applied to HE-PPML as is. We propose Artemis, a highly effective DNN pruning technique for HE-based inference. We judiciously investigate two HE-aware pruning strategies (positional and diagonal) to reduce the number of Rotation operations, which dominate compute time in HE convolution. We find that Pareto-optimal solutions are based fully on diagonal pruning. Artemis' benefits come from coupling DNN training, driven by a novel group Lasso regularization objective, with pruning to maximize HE-specific cost reduction (dominated by the Rotation operations). We show that Artemis improves on prior HE-oriented pruning and can achieve a 1.2-6x improvement when targeting modern convolutional models (ResNet18 and ResNet18) across three datasets.
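    A generic group-Lasso regularization sketch in PyTorch, illustrating the kind of sparsity-inducing training objective the abstract mentions; note that the grouping here is by conv output channel, a common structured-pruning choice, whereas Artemis groups weights by HE diagonals, which is not reproduced.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(16, 32, kernel_size=3)

def group_lasso(weight: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Sum of L2 norms over output-channel groups; drives whole channels toward zero."""
    groups = weight.flatten(start_dim=1)           # one row per output channel
    return torch.sqrt((groups ** 2).sum(dim=1) + eps).sum()

x = torch.randn(8, 16, 28, 28)
task_loss = conv(x).pow(2).mean()                  # placeholder for the real training objective
loss = task_loss + 1e-3 * group_lasso(conv.weight)  # sparsity-inducing regularizer
loss.backward()
```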

It’s all about you: Personalized in-Vehicle Gesture Recognition with a Time-of-Flight Camera

  • paper_url: http://arxiv.org/abs/2310.01659
  • repo_url: None
  • paper_authors: Amr Gomaa, Guillermo Reyes, Michael Feld
  • for: Improving gesture-recognition accuracy in the driving environment, enhancing safety and the driver's experience.
  • methods: A personalized model-adaptation approach for a CNN-LSTM model that combines data augmentation, personalized adaptation, and incremental learning to improve accuracy while reducing data requirements, with hardware enhancement from a time-of-flight camera.
  • results: Achieves recognition accuracy of up to 90% and demonstrates the effectiveness of personalized adaptation and incremental learning for a user-centered design.
    Abstract Despite significant advances in gesture recognition technology, recognizing gestures in a driving environment remains challenging due to limited and costly data and its dynamic, ever-changing nature. In this work, we propose a model-adaptation approach to personalize the training of a CNNLSTM model and improve recognition accuracy while reducing data requirements. Our approach contributes to the field of dynamic hand gesture recognition while driving by providing a more efficient and accurate method that can be customized for individual users, ultimately enhancing the safety and convenience of in-vehicle interactions, as well as driver's experience and system trust. We incorporate hardware enhancement using a time-of-flight camera and algorithmic enhancement through data augmentation, personalized adaptation, and incremental learning techniques. We evaluate the performance of our approach in terms of recognition accuracy, achieving up to 90\%, and show the effectiveness of personalized adaptation and incremental learning for a user-centered design.

CoDBench: A Critical Evaluation of Data-driven Models for Continuous Dynamical Systems

  • paper_url: http://arxiv.org/abs/2310.01650
  • repo_url: None
  • paper_authors: Priyanshu Burark, Karn Tiwari, Meer Mehran Rashid, Prathosh A P, N M Anoop Krishnan
  • for: This paper is written for researchers and practitioners in the field of scientific machine learning, particularly those interested in modeling dynamical systems using data-driven models.
  • methods: The paper uses 11 state-of-the-art data-driven models for solving differential equations, including feed forward neural networks, deep operator regression models, frequency-based neural operators, and transformer architectures.
  • results: The paper conducts extensive experiments to evaluate the capabilities of these models in learning, zero-shot super-resolution, data efficiency, robustness to noise, and computational efficiency, using 8 widely applicable benchmark datasets encompassing challenges from fluid and solid mechanics. The results show that current operators struggle with the newer mechanics datasets, motivating the need for more robust neural operators.
    Abstract Continuous dynamical systems, characterized by differential equations, are ubiquitously used to model several important problems: plasma dynamics, flow through porous media, weather forecasting, and epidemic dynamics. Recently, a wide range of data-driven models has been used successfully to model these systems. However, in contrast to established fields like computer vision, limited studies are available analyzing the strengths and potential applications of different classes of these models that could steer decision-making in scientific machine learning. Here, we introduce CodBench, an exhaustive benchmarking suite comprising 11 state-of-the-art data-driven models for solving differential equations. Specifically, we comprehensively evaluate 4 distinct categories of models, viz., feed forward neural networks, deep operator regression models, frequency-based neural operators, and transformer architectures against 8 widely applicable benchmark datasets encompassing challenges from fluid and solid mechanics. We conduct extensive experiments, assessing the operators' capabilities in learning, zero-shot super-resolution, data efficiency, robustness to noise, and computational efficiency. Interestingly, our findings highlight that current operators struggle with the newer mechanics datasets, motivating the need for more robust neural operators. All the datasets and codes will be shared in an easy-to-use fashion for the scientific community. We hope this resource will be an impetus for accelerated progress and exploration in modeling dynamical systems.

Human Mobility Question Answering (Vision Paper)

  • paper_url: http://arxiv.org/abs/2310.04443
  • repo_url: None
  • paper_authors: Hao Xue, Flora D. Salim
  • for: Proposing a novel task, human mobility question answering (MobQA), in which an intelligent system learns from human mobility data and answers related questions.
  • methods: An initial dataset design and a potential deep learning model framework are proposed to support this new research direction.
  • results: The paper offers new insights and directions for mobility prediction and question answering research, with potential applications in smart city planning, pandemic management, and personalized recommendation systems.
    Abstract Question answering (QA) systems have attracted much attention from the artificial intelligence community as they can learn to answer questions based on the given knowledge source (e.g., images in visual question answering). However, the research into question answering systems with human mobility data remains unexplored. Mining human mobility data is crucial for various applications such as smart city planning, pandemic management, and personalised recommendation system. In this paper, we aim to tackle this gap and introduce a novel task, that is, human mobility question answering (MobQA). The aim of the task is to let the intelligent system learn from mobility data and answer related questions. This task presents a new paradigm change in mobility prediction research and further facilitates the research of human mobility recommendation systems. To better support this novel research topic, this vision paper also proposes an initial design of the dataset and a potential deep learning model framework for the introduced MobQA task. We hope that this paper will provide novel insights and open new directions in human mobility research and question answering research.

On Training Derivative-Constrained Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01649
  • repo_url: https://github.com/lbai-lab/dcnn-training
  • paper_authors: KaiChieh Lo, Daniel Huang
  • for: Settings, common in physics-informed applications in the natural sciences, where the (partial) derivatives of a neural network's predictions with respect to its inputs are used as an additional training signal (derivative-constrained, DC, NNs).
  • methods: An integrated ReLU (IReLU) activation function is proposed to improve the training of DC NNs; denormalization and label rescaling are also investigated to help stabilize DC training.
  • results: On physics-informed settings including quantum chemistry and scientific machine learning (SciML) tasks, existing architectures with IReLU activations combined with denormalization and label rescaling better incorporate the training signal provided by derivative constraints.
    Abstract We refer to the setting where the (partial) derivatives of a neural network's (NN's) predictions with respect to its inputs are used as additional training signal as a derivative-constrained (DC) NN. This situation is common in physics-informed settings in the natural sciences. We propose an integrated RELU (IReLU) activation function to improve training of DC NNs. We also investigate denormalization and label rescaling to help stabilize DC training. We evaluate our methods on physics-informed settings including quantum chemistry and Scientific Machine Learning (SciML) tasks. We demonstrate that existing architectures with IReLU activations combined with denormalization and label rescaling better incorporate training signal provided by derivative constraints.
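    A minimal sketch of derivative-constrained training, assuming a toy 1-D regression target y = sin(x) whose input derivative cos(x) supplies the extra training signal; the paper's IReLU activation, denormalization, and label rescaling are not shown.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.linspace(-3, 3, 256).unsqueeze(1).requires_grad_(True)
y, dy = torch.sin(x.detach()), torch.cos(x.detach())   # values and input derivatives

for step in range(2000):
    pred = net(x)
    # dpred/dx via autograd, kept in the graph so its error can also be trained on.
    dpred = torch.autograd.grad(pred.sum(), x, create_graph=True)[0]
    loss = ((pred - y) ** 2).mean() + ((dpred - dy) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final loss: {loss.item():.4f}")
```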

Exploring Naming Conventions (and Defects) of Pre-trained Deep Learning Models in Hugging Face and Other Model Hubs

  • paper_url: http://arxiv.org/abs/2310.01642
  • repo_url: None
  • paper_authors: Wenxin Jiang, Chingwo Cheung, George K. Thiruvathukal, James C. Davis
  • for: Studying the naming conventions of pre-trained deep learning model (PTM) packages and the associated naming defects.
  • methods: An automated naming assessment technique that extracts the semantic and syntactic patterns of PTM names, together with an automated DNN architecture assessment pipeline (DARA) that clusters PTMs by architectural differences to identify potential naming defects.
  • results: The study characterizes PTM naming conventions and naming defects and frames them as a signal of the research-to-practice relationships in the PTM ecosystem; future empirical work is envisioned on leveraging meta-features of PTMs to support model search and reuse.
    Abstract As innovation in deep learning continues, many engineers want to adopt Pre-Trained deep learning Models (PTMs) as components in computer systems. PTMs are part of a research-to-practice pipeline: researchers publish PTMs, which engineers adapt for quality or performance and then deploy. If PTM authors choose appropriate names for their PTMs, it could facilitate model discovery and reuse. However, prior research has reported that model names are not always well chosen, and are sometimes erroneous. The naming conventions and naming defects for PTM packages have not been systematically studied - understanding them will add to our knowledge of how the research-to-practice process works for PTM packages In this paper, we report the first study of PTM naming conventions and the associated PTM naming defects. We define the components of a PTM package name, comprising the package name and claimed architecture from the metadata. We present the first study focused on characterizing the nature of naming in PTM ecosystem. To this end, we developed a novel automated naming assessment technique that can automatically extract the semantic and syntactic patterns. To identify potential naming defects, we developed a novel algorithm, automated DNN ARchitecture Assessment pipeline (DARA), to cluster PTMs based on architectural differences. Our study suggests the naming conventions for PTMs, and frames the naming conventions as signal of the research-to-practice relationships in the PTM ecosystem. We envision future works on further empirical study on leveraging meta-features of PTMs to support model search and reuse.

Imitation Learning from Observation through Optimal Transport

  • paper_url: http://arxiv.org/abs/2310.01632
  • repo_url: None
  • paper_authors: Wei-Di Chang, Scott Fujimoto, David Meger, Gregory Dudek
  • for: The Imitation Learning from Observation (ILfO) setting, in which a learner tries to imitate an expert's behavior using only observational data, without direct guidance from demonstrated actions.
  • methods: Optimal transport is used for IL: a reward function is generated from the Wasserstein distance between the learner's and the expert's state trajectories, without requiring learned models or adversarial learning, and the approach can be integrated with any RL algorithm.
  • results: On a variety of continuous control tasks, this simple approach surpasses the state of the art in the ILfO setting, achieving expert-level performance across a range of evaluation domains even when observing only a single expert trajectory without actions.
    Abstract Imitation Learning from Observation (ILfO) is a setting in which a learner tries to imitate the behavior of an expert, using only observational data and without the direct guidance of demonstrated actions. In this paper, we re-examine the use of optimal transport for IL, in which a reward is generated based on the Wasserstein distance between the state trajectories of the learner and expert. We show that existing methods can be simplified to generate a reward function without requiring learned models or adversarial learning. Unlike many other state-of-the-art methods, our approach can be integrated with any RL algorithm, and is amenable to ILfO. We demonstrate the effectiveness of this simple approach on a variety of continuous control tasks and find that it surpasses the state of the art in the ILfO setting, achieving expert-level performance across a range of evaluation domains even when observing only a single expert trajectory without actions.
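    A sketch of a Wasserstein-style trajectory reward, not the authors' exact formulation: with uniform weights and equal-length trajectories, optimal transport reduces to an assignment problem that the Hungarian algorithm solves exactly, and each learner state is rewarded with the negative of its matched distance. The function name and noise scales are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def ot_rewards(learner_states: np.ndarray, expert_states: np.ndarray) -> np.ndarray:
    """Per-timestep rewards: negative transport cost assigned to each learner state."""
    cost = cdist(learner_states, expert_states)        # pairwise state distances
    rows, cols = linear_sum_assignment(cost)           # optimal coupling (a permutation)
    rewards = np.zeros(len(learner_states))
    rewards[rows] = -cost[rows, cols]                  # reward = -matched distance
    return rewards

rng = np.random.default_rng(0)
expert = np.cumsum(rng.normal(size=(50, 3)), axis=0)   # toy expert state trajectory
learner = expert + rng.normal(scale=0.5, size=expert.shape)
print("mean OT reward:", ot_rewards(learner, expert).mean())
```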

VAL: Interactive Task Learning with GPT Dialog Parsing

  • paper_url: http://arxiv.org/abs/2310.01627
  • repo_url: None
  • paper_authors: Lane Lawley, Christopher J. MacLellan
  • for: Interactive Task Learning (ITL) systems that acquire knowledge incrementally from limited human instruction in modalities such as natural language.
  • methods: Large language models are used only for specific tasks, such as predicate and argument selection, within an algorithmic framework integrated with symbolic learning.
  • results: VAL learns hierarchical task knowledge from natural language; the acquired knowledge is human interpretable and generalizes to support execution of novel tasks without additional training. In a video game study, most users could successfully teach VAL using language they felt was natural.
    Abstract Reinforcement learning often requires millions of examples to produce static, black-box models. In contrast, interactive task learning (ITL) emphasizes incremental knowledge acquisition from limited instruction provided by humans in modalities such as natural language. However, in practice, ITL systems often suffers from brittle, error-prone language parsing. Large language models (LLMs) are resistant to brittleness but are not interpretable and cannot learn incrementally. We present VAL, an ITL system with a new philosophy for LLM/symbolic integration. By using LLMs only for specific tasks -- such as predicate and argument selection -- within an algorithmic framework, VAL reaps the benefits of LLMs to support interactive learning of hierarchical task knowledge from natural language. Acquired knowledge is human interpretable and generalizes to support execution of novel tasks without additional training. We studied users' interactions with VAL in a video game setting, finding that most users could successfully teach VAL using language they felt was natural.

Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity

  • paper_url: http://arxiv.org/abs/2310.01616
  • repo_url: None
  • paper_authors: Emmeran Johnson, Ciara Pike-Burke, Patrick Rebeschini
  • for: investigate the relationship between sample-efficiency and adaptivity in reinforcement learning
  • methods: employ a learning framework that allows sending queries in $K$ batches, with feedback being processed and queries updated after each batch, covering the whole adaptivity spectrum from non-adaptive to fully adaptive scenarios
  • results: establish lower bounds on the number of batches $K$ required for sample-efficient algorithms with $n = O(poly(d))$ queries, showing that just having adaptivity does not necessarily guarantee sample-efficiency, and the adaptivity-boundary for sample-efficiency depends on the problem dimension.
    Abstract We theoretically explore the relationship between sample-efficiency and adaptivity in reinforcement learning. An algorithm is sample-efficient if it uses a number of queries $n$ to the environment that is polynomial in the dimension $d$ of the problem. Adaptivity refers to the frequency at which queries are sent and feedback is processed to update the querying strategy. To investigate this interplay, we employ a learning framework that allows sending queries in $K$ batches, with feedback being processed and queries updated after each batch. This model encompasses the whole adaptivity spectrum, ranging from non-adaptive 'offline' ($K=1$) to fully adaptive ($K=n$) scenarios, and regimes in between. For the problems of policy evaluation and best-policy identification under $d$-dimensional linear function approximation, we establish $\Omega(\log \log d)$ lower bounds on the number of batches $K$ required for sample-efficient algorithms with $n = O(poly(d))$ queries. Our results show that just having adaptivity ($K>1$) does not necessarily guarantee sample-efficiency. Notably, the adaptivity-boundary for sample-efficiency is not between offline reinforcement learning ($K=1$), where sample-efficiency was known to not be possible, and adaptive settings. Instead, the boundary lies between different regimes of adaptivity and depends on the problem dimension.

Solving the Quadratic Assignment Problem using Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2310.01604
  • repo_url: None
  • paper_authors: Puneet S. Bagga, Arthur Delarue
  • for: solves the Quadratic Assignment Problem (QAP) using deep reinforcement learning.
  • methods: uses a novel double pointer network that alternates between selecting a location and a facility to place.
  • results: produces solutions that are on average within 7.5% of a high-quality local search baseline, and even outperform it on 1.2% of instances.
    Abstract The Quadratic Assignment Problem (QAP) is an NP-hard problem which has proven particularly challenging to solve: unlike other combinatorial problems like the traveling salesman problem (TSP), which can be solved to optimality for instances with hundreds or even thousands of locations using advanced integer programming techniques, no methods are known to exactly solve QAP instances of size greater than 30. Solving the QAP is nevertheless important because of its many critical applications, such as electronic wiring design and facility layout selection. We propose a method to solve the original Koopmans-Beckman formulation of the QAP using deep reinforcement learning. Our approach relies on a novel double pointer network, which alternates between selecting a location in which to place the next facility and a facility to place in the previous location. We train our model using A2C on a large dataset of synthetic instances, producing solutions with no instance-specific retraining necessary. Out of sample, our solutions are on average within 7.5% of a high-quality local search baseline, and even outperform it on 1.2% of instances.
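    A small sketch of the Koopmans-Beckmann QAP objective that such a solver minimizes, on a toy random instance with a random-restart baseline; the paper's double pointer network and reinforcement learning setup are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
F = rng.random((n, n))              # flow between facilities i and j
D = rng.random((n, n))              # distance between locations p and q

def qap_cost(perm: np.ndarray) -> float:
    """Cost of assigning facility i to location perm[i]: sum_{i,j} F[i,j] * D[perm[i], perm[j]]."""
    return float((F * D[np.ix_(perm, perm)]).sum())

# Random-restart baseline: keep the best of many random assignments.
best_perm = min((rng.permutation(n) for _ in range(10000)), key=qap_cost)
print("best random cost:", qap_cost(best_perm))
```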

A Review of Digital Learning Environments for Teaching Natural Language Processing in K-12 Education

  • paper_url: http://arxiv.org/abs/2310.01603
  • repo_url: None
  • paper_authors: Xiaoyi Tian, Kristy Elizabeth Boyer
  • for: Reviewing digital learning environments for teaching natural language processing (NLP) in K-12 education.
  • methods: A review of existing digital learning tools, examining how they support specific NLP tasks and procedures and investigating their explainability and evaluation results in educational contexts.
  • results: The review identifies the strengths and limitations of existing tools and areas needing further research and development, aiming to guide future work toward more effective and inclusive strategies for integrating NLP into K-12 education.
    Abstract Natural Language Processing (NLP) plays a significant role in our daily lives and has become an essential part of Artificial Intelligence (AI) education in K-12. As children grow up with NLP-powered applications, it is crucial to introduce NLP concepts to them, fostering their understanding of language processing, language generation, and ethical implications of AI and NLP. This paper presents a comprehensive review of digital learning environments for teaching NLP in K-12. Specifically, it explores existing digital learning tools, discusses how they support specific NLP tasks and procedures, and investigates their explainability and evaluation results in educational contexts. By examining the strengths and limitations of these tools, this literature review sheds light on the current state of NLP learning tools in K-12 education. It aims to guide future research efforts to refine existing tools, develop new ones, and explore more effective and inclusive strategies for integrating NLP into K-12 educational contexts.

CAT-LM: Training Language Models on Aligned Code And Tests

  • paper_url: http://arxiv.org/abs/2310.01602
  • repo_url: https://github.com/raonikitha/cat-lm
  • paper_authors: Nikitha Rao, Kush Jain, Uri Alon, Claire Le Goues, Vincent J. Hellendoorn
  • for: Automated test generation with a GPT-style language model, improving test-generation efficiency and code quality.
  • methods: A pretraining signal that explicitly considers the mapping between code files and test files, with input sequences of up to 8,192 tokens so the model can leverage the code-under-test context when generating tests.
  • results: CAT-LM efficiently produces tests that achieve coverage similar to developer-written ones, generates more valid tests than much larger language models trained with more data (CodeGen 16B and StarCoder), and substantially outperforms a recent test-specific model (TeCo) at test completion.
    Abstract Testing is an integral part of the software development process. Yet, writing tests is time-consuming and therefore often neglected. Classical test generation tools such as EvoSuite generate behavioral test suites by optimizing for coverage, but tend to produce tests that are hard to understand. Language models trained on code can generate code that is highly similar to that written by humans, but current models are trained to generate each file separately, as is standard practice in natural language processing, and thus fail to consider the code-under-test context when producing a test file. In this work, we propose the Aligned Code And Tests Language Model (CAT-LM), a GPT-style language model with 2.7 Billion parameters, trained on a corpus of Python and Java projects. We utilize a novel pretraining signal that explicitly considers the mapping between code and test files when available. We also drastically increase the maximum sequence length of inputs to 8,192 tokens, 4x more than typical code generation models, to ensure that the code context is available to the model when generating test code. We analyze its usefulness for realistic applications, showing that sampling with filtering (e.g., by compilability, coverage) allows it to efficiently produce tests that achieve coverage similar to ones written by developers while resembling their writing style. By utilizing the code context, CAT-LM generates more valid tests than even much larger language models trained with more data (CodeGen 16B and StarCoder) and substantially outperforms a recent test-specific model (TeCo) at test completion. Overall, our work highlights the importance of incorporating software-specific insights when training language models for code and paves the way to more powerful automated test generation.

Memory-efficient particle filter recurrent neural network for object localization

  • paper_url: http://arxiv.org/abs/2310.01595
  • repo_url: None
  • paper_authors: Roman Korkin, Ivan Oseledets, Aleksandr Katrutsa
  • for: A novel memory-efficient recurrent neural network architecture for the object localization problem, i.e., recovering an object's state and movement in a noisy environment.
  • methods: Combines the classical particle filter with a GRU RNN architecture; the resulting mePFRNN requires the same number of parameters to process environments of different sizes.
  • results: In experiments on symmetric and noisy environments, which are challenging for filtering algorithms, the mePFRNN model provides more precise localization than the considered competitors while requiring fewer trained parameters.
    Abstract This study proposes a novel memory-efficient recurrent neural network (RNN) architecture specified to solve the object localization problem. This problem is to recover the object states along with its movement in a noisy environment. We take the idea of the classical particle filter and combine it with GRU RNN architecture. The key feature of the resulting memory-efficient particle filter RNN model (mePFRNN) is that it requires the same number of parameters to process environments of different sizes. Thus, the proposed mePFRNN architecture consumes less memory to store parameters compared to the previously proposed PFRNN model. To demonstrate the performance of our model, we test it on symmetric and noisy environments that are incredibly challenging for filtering algorithms. In our experiments, the mePFRNN model provides more precise localization than the considered competitors and requires fewer trained parameters.
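    For background, a classical particle-filter step (predict, reweight, resample) for 2-D localization, which is the algorithm the learned mePFRNN builds on; the motion model, noise scales, and names here are illustrative rather than the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
particles = rng.normal(size=(N, 2))                 # particle positions
weights = np.full(N, 1.0 / N)

def pf_step(particles, weights, control, observation, motion_noise=0.1, obs_noise=0.5):
    # Predict: apply the motion model with noise.
    particles = particles + control + rng.normal(scale=motion_noise, size=particles.shape)
    # Update: reweight by the likelihood of the noisy position observation.
    sq_err = ((particles - observation) ** 2).sum(axis=1)
    weights = weights * np.exp(-0.5 * sq_err / obs_noise**2)
    weights /= weights.sum()
    # Resample: draw particles proportionally to weight (systematic resampling omitted).
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

particles, weights = pf_step(particles, weights, control=np.array([0.5, 0.0]),
                             observation=np.array([0.6, 0.1]))
print("estimated position:", particles.mean(axis=0))
```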

Prescribed Fire Modeling using Knowledge-Guided Machine Learning for Land Management

  • paper_url: http://arxiv.org/abs/2310.01593
  • repo_url: None
  • paper_authors: Somya Sharma Chatterjee, Kelly Lindsay, Neel Chatterjee, Rohan Patil, Ilkay Altintas De Callafon, Michael Steinbach, Daniel Giron, Mai H. Nguyen, Vipin Kumar
  • for: Providing a reliable and fast machine learning (ML) framework for real-time planning and management of prescribed fires for wildfire prevention.
  • methods: A knowledge-guided ML framework that incorporates domain knowledge to reduce physical inconsistencies in data-scarce scenarios, augments training data with pre-existing source-domain data to counter class imbalance, and uses a hierarchical modeling structure to capture the interdependence between fuel density and burned area.
  • results: The framework rapidly emulates prescribed fires while providing more accurate estimates of fire-spread metrics such as burned area and rate of spread, and generalizes better than other ML-based fire modeling methods across diverse wind conditions and ignition patterns.
    Abstract In recent years, the increasing threat of devastating wildfires has underscored the need for effective prescribed fire management. Process-based computer simulations have traditionally been employed to plan prescribed fires for wildfire prevention. However, even simplified process models like QUIC-Fire are too compute-intensive to be used for real-time decision-making, especially when weather conditions change rapidly. Traditional ML methods used for fire modeling offer computational speedup but struggle with physically inconsistent predictions, biased predictions due to class imbalance, biased estimates for fire spread metrics (e.g., burned area, rate of spread), and generalizability in out-of-distribution wind conditions. This paper introduces a novel machine learning (ML) framework that enables rapid emulation of prescribed fires while addressing these concerns. By incorporating domain knowledge, the proposed method helps reduce physical inconsistencies in fuel density estimates in data-scarce scenarios. To overcome the majority class bias in predictions, we leverage pre-existing source domain data to augment training data and learn the spread of fire more effectively. Finally, we overcome the problem of biased estimation of fire spread metrics by incorporating a hierarchical modeling structure to capture the interdependence in fuel density and burned area. Notably, improvement in fire metric (e.g., burned area) estimates offered by our framework makes it useful for fire managers, who often rely on these fire metric estimates to make decisions about prescribed burn management. Furthermore, our framework exhibits better generalization capabilities than the other ML-based fire modeling methods across diverse wind conditions and ignition patterns.

On the Safety of Open-Sourced Large Language Models: Does Alignment Really Prevent Them From Being Misused?

  • paper_url: http://arxiv.org/abs/2310.01581
  • repo_url: None
  • paper_authors: Hangfan Zhang, Zhimeng Guo, Huaisheng Zhu, Bochuan Cao, Lu Lin, Jinyuan Jia, Jinghui Chen, Dinghao Wu
  • for: The paper is written to investigate whether alignment can prevent open-sourced large language models from being misused to generate undesired content.
  • methods: The paper uses direct manipulation of the generation process to misguide open-sourced LLMs into generating undesired content, including harmful or biased information and even private data.
  • results: The paper shows that even aligned open-sourced LLMs can be easily misguided to generate undesired content without heavy computations or careful prompt designs, highlighting the need for more advanced mitigation strategies.
    Abstract Large Language Models (LLMs) have achieved unprecedented performance in Natural Language Generation (NLG) tasks. However, many existing studies have shown that they could be misused to generate undesired content. In response, before releasing LLMs for public access, model developers usually align those language models through Supervised Fine-Tuning (SFT) or Reinforcement Learning with Human Feedback (RLHF). Consequently, those aligned large language models refuse to generate undesired content when facing potentially harmful/unethical requests. A natural question is "could alignment really prevent those open-sourced large language models from being misused to generate undesired content?''. In this work, we provide a negative answer to this question. In particular, we show those open-sourced, aligned large language models could be easily misguided to generate undesired content without heavy computations or careful prompt designs. Our key idea is to directly manipulate the generation process of open-sourced LLMs to misguide it to generate undesired content including harmful or biased information and even private data. We evaluate our method on 4 open-sourced LLMs accessible publicly and our finding highlights the need for more advanced mitigation strategies for open-sourced LLMs.

Active Learning on Neural Networks through Interactive Generation of Digit Patterns and Visual Representation

  • paper_url: http://arxiv.org/abs/2310.01580
  • repo_url: https://github.com/drjeong/digitperceptron
  • paper_authors: Dong H. Jeong, Jin-Hee Cho, Feng Chen, Audun Josang, Soo-Yeon Ji
  • for: Improving users' learning and understanding of artificial neural networks (ANNs) through an interactive learning system that creates digit patterns and recognizes them in real time.
  • methods: A neural network recognizes the digit patterns, and integrated visualization presents all digit patterns (0-9) in a two-dimensional display space with multiple user interactions, helping users understand the visual differences among the patterns and the network's results.
  • results: An evaluation with multiple datasets confirms the system's usability for active learning, and informal user testing during a summer workshop showed that participants could use the system effectively.
    Abstract Artificial neural networks (ANNs) have been broadly utilized to analyze various data and solve different domain problems. However, neural networks (NNs) have been considered a black box operation for years because their underlying computation and meaning are hidden. Due to this nature, users often face difficulties in interpreting the underlying mechanism of the NNs and the benefits of using them. In this paper, to improve users' learning and understanding of NNs, an interactive learning system is designed to create digit patterns and recognize them in real time. To help users clearly understand the visual differences of digit patterns (i.e., 0 ~ 9) and their results with an NN, integrating visualization is considered to present all digit patterns in a two-dimensional display space with supporting multiple user interactions. An evaluation with multiple datasets is conducted to determine its usability for active learning. In addition, informal user testing is managed during a summer workshop by asking the workshop participants to use the system.

Iterative Option Discovery for Planning, by Planning

  • paper_url: http://arxiv.org/abs/2310.01569
  • repo_url: None
  • paper_authors: Kenny Young, Richard S. Sutton
  • for: Discovering useful temporal abstractions, in the form of options, which is widely thought to be key to applying reinforcement learning and planning to increasingly complex domains.
  • methods: Option Iteration, an analogue of the Expert Iteration approach to policy learning used in AlphaZero: rather than learning a single strong policy trained to match the search results everywhere, it learns a set of option policies such that, for each state encountered, at least one policy in the set matches the search results for some horizon into the future.
  • results: Experiments show that planning with options learned by Option Iteration yields a significant benefit in challenging planning environments compared to an analogous planning algorithm that operates in the space of primitive actions and learns a single rollout policy with Expert Iteration.
    Abstract Discovering useful temporal abstractions, in the form of options, is widely thought to be key to applying reinforcement learning and planning to increasingly complex domains. Building on the empirical success of the Expert Iteration approach to policy learning used in AlphaZero, we propose Option Iteration, an analogous approach to option discovery. Rather than learning a single strong policy that is trained to match the search results everywhere, Option Iteration learns a set of option policies trained such that for each state encountered, at least one policy in the set matches the search results for some horizon into the future. Intuitively, this may be significantly easier as it allows the algorithm to hedge its bets compared to learning a single globally strong policy, which may have complex dependencies on the details of the current state. Having learned such a set of locally strong policies, we can use them to guide the search algorithm resulting in a virtuous cycle where better options lead to better search results which allows for training of better options. We demonstrate experimentally that planning using options learned with Option Iteration leads to a significant benefit in challenging planning environments compared to an analogous planning algorithm operating in the space of primitive actions and learning a single rollout policy with Expert Iteration.

Making Retrieval-Augmented Language Models Robust to Irrelevant Context

  • paper_url: http://arxiv.org/abs/2310.01558
  • repo_url: https://github.com/oriyor/ret-robust
  • paper_authors: Ori Yoran, Tomer Wolfson, Ori Ram, Jonathan Berant
  • for: To study the performance of retrieval-augmented language models (RALMs) in multi-hop reasoning settings and to investigate how to prevent performance degradation when irrelevant evidence is retrieved.
  • methods: Presents a thorough analysis on five open-domain question answering benchmarks, characterizing the cases where retrieval reduces accuracy. Two mitigations are then proposed: filtering out retrieved passages that do not entail the question-answer pair according to a natural language inference (NLI) model, and fine-tuning the language model on a mix of relevant and irrelevant contexts so that it remains robust when the retrieved context is irrelevant.
  • results: Experiments show that as few as 1,000 training examples suffice to make the language model robust to irrelevant contexts while maintaining high performance on examples with relevant ones.
    Abstract Retrieval-augmented language models (RALMs) hold promise to produce language understanding systems that are factual, efficient, and up-to-date. An important desideratum of RALMs is that retrieved information helps model performance when it is relevant, and does not harm performance when it is not. This is particularly important in multi-hop reasoning scenarios, where misuse of irrelevant evidence can lead to cascading errors. However, recent work has shown that retrieval augmentation can sometimes have a negative effect on performance. In this work, we present a thorough analysis on five open-domain question answering benchmarks, characterizing cases when retrieval reduces accuracy. We then propose two methods to mitigate this issue. First, a simple baseline that filters out retrieved passages that do not entail question-answer pairs according to a natural language inference (NLI) model. This is effective in preventing performance reduction, but at a cost of also discarding relevant passages. Thus, we propose a method for automatically generating data to fine-tune the language model to properly leverage retrieved passages, using a mix of relevant and irrelevant contexts at training time. We empirically show that even 1,000 examples suffice to train the model to be robust to irrelevant contexts while maintaining high performance on examples with relevant ones.
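
A minimal sketch of the NLI-filtering baseline described above: retrieved passages are kept only if an off-the-shelf NLI model judges that they entail the question-answer pair. The model name, threshold, and prompt wording are illustrative assumptions, not the authors' exact setup.

```python
# Sketch of the NLI baseline: drop retrieved passages that do not entail the QA pair.
# Assumptions: an off-the-shelf MNLI model (roberta-large-mnli) and a 0.5 threshold.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

def entailment_prob(premise: str, hypothesis: str) -> float:
    inputs = tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**inputs).logits.softmax(dim=-1)[0]
    return probs[model.config.label2id.get("ENTAILMENT", 2)].item()

def filter_passages(question, answer, passages, threshold=0.5):
    hypothesis = f"The answer to the question '{question}' is '{answer}'."
    return [p for p in passages if entailment_prob(p, hypothesis) >= threshold]

passages = [
    "Paris is the capital and most populous city of France.",
    "The Eiffel Tower was completed in 1889.",  # likely irrelevant for this QA pair
]
print(filter_passages("What is the capital of France?", "Paris", passages))
```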

SmartPlay : A Benchmark for LLMs as Intelligent Agents

  • paper_url: http://arxiv.org/abs/2310.01557
  • repo_url: https://github.com/microsoft/smartplay
  • paper_authors: Yue Wu, Xuan Tang, Tom M. Mitchell, Yuanzhi Li
  • for: To provide a systematic benchmark for evaluating the abilities of large language models (LLMs) as intelligent agents and for next-generation automation.
  • methods: Introduces a benchmark of 6 games, including Rock-Paper-Scissors, Tower of Hanoi, and Minecraft; each game features a unique setting, providing up to 20 evaluation settings and infinite environment variations.
  • results: SmartPlay probes 9 important capabilities of an intelligent LLM agent, including reasoning with object dependencies, planning ahead, spatial reasoning, learning from history, and understanding randomness; because each game challenges a distinct subset of these capabilities, each capability can be analyzed separately.
    Abstract Recent large language models (LLMs) have demonstrated great potential toward intelligent agents and next-gen automation, but there currently lacks a systematic benchmark for evaluating LLMs' abilities as agents. We introduce SmartPlay: both a challenging benchmark and a methodology for evaluating LLMs as agents. SmartPlay consists of 6 different games, including Rock-Paper-Scissors, Tower of Hanoi, and Minecraft. Each game features a unique setting, providing up to 20 evaluation settings and infinite environment variations. Each game in SmartPlay uniquely challenges a subset of 9 important capabilities of an intelligent LLM agent, including reasoning with object dependencies, planning ahead, spatial reasoning, learning from history, and understanding randomness. The distinction between the sets of capabilities each game tests allows us to analyze each capability separately. SmartPlay serves not only as a rigorous testing ground for evaluating the overall performance of LLM agents but also as a road-map for identifying gaps in current methodologies. We release our benchmark at github.com/microsoft/SmartPlay

Harnessing the Power of Choices in Decision Tree Learning

  • paper_url: http://arxiv.org/abs/2310.01551
  • repo_url: https://github.com/sullivanc19/pydl8.5-topk
  • paper_authors: Guy Blanc, Jane Lange, Chirag Pabbaraju, Colin Sullivan, Li-Yang Tan, Mo Tiwari
  • for: To improve the accuracy and scalability of decision tree learning algorithms.
  • methods: Proposes a simple generalization, Top-$k$, which considers the $k$ best attributes as possible splits instead of only the single best attribute.
  • results: Theoretical and empirical results show that Top-$k$ achieves higher accuracy than classic greedy algorithms across a wide range of benchmarks, while remaining markedly more scalable than optimal decision tree algorithms, handling dataset and feature set sizes far beyond their reach.
    Abstract We propose a simple generalization of standard and empirically successful decision tree learning algorithms such as ID3, C4.5, and CART. These algorithms, which have been central to machine learning for decades, are greedy in nature: they grow a decision tree by iteratively splitting on the best attribute. Our algorithm, Top-$k$, considers the $k$ best attributes as possible splits instead of just the single best attribute. We demonstrate, theoretically and empirically, the power of this simple generalization. We first prove a {\sl greediness hierarchy theorem} showing that for every $k \in \mathbb{N}$, Top-$(k+1)$ can be dramatically more powerful than Top-$k$: there are data distributions for which the former achieves accuracy $1-\varepsilon$, whereas the latter only achieves accuracy $\frac1{2}+\varepsilon$. We then show, through extensive experiments, that Top-$k$ outperforms the two main approaches to decision tree learning: classic greedy algorithms and more recent "optimal decision tree" algorithms. On one hand, Top-$k$ consistently enjoys significant accuracy gains over greedy algorithms across a wide range of benchmarks. On the other hand, Top-$k$ is markedly more scalable than optimal decision tree algorithms and is able to handle dataset and feature set sizes that remain far beyond the reach of these algorithms.
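
A small, self-contained sketch of the Top-$k$ idea: at every node the learner expands the $k$ highest-gain attributes instead of only the best one, builds a candidate subtree for each, and keeps the candidate that fits the training data best. This is illustrative code over binary features, not the authors' implementation.

```python
# Illustrative Top-k decision-tree learner over binary features (not the authors' code).
# Greedy = Top-1; Top-k additionally explores the k best splits and keeps the best subtree.
import numpy as np

def entropy(y):
    if len(y) == 0:
        return 0.0
    p = np.bincount(y, minlength=2) / len(y)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def gain(X, y, j):
    mask = X[:, j] == 1
    n = len(y)
    return entropy(y) - (mask.sum() / n) * entropy(y[mask]) - ((~mask).sum() / n) * entropy(y[~mask])

def predict(tree, x):
    while isinstance(tree, tuple):          # internal node: (feature, left, right)
        j, left, right = tree
        tree = right if x[j] == 1 else left
    return tree                             # leaf: predicted label

def accuracy(tree, X, y):
    return float(np.mean([predict(tree, x) for x in X] == y))

def build(X, y, depth, k):
    majority = int(np.bincount(y, minlength=2).argmax())
    if depth == 0 or len(set(y)) == 1:
        return majority
    gains = [gain(X, y, j) for j in range(X.shape[1])]
    candidates = np.argsort(gains)[::-1][:k]        # the k best attributes
    best_tree, best_acc = majority, accuracy(majority, X, y)
    for j in candidates:
        mask = X[:, j] == 1
        if mask.all() or (~mask).all():
            continue
        subtree = (int(j), build(X[~mask], y[~mask], depth - 1, k),
                   build(X[mask], y[mask], depth - 1, k))
        acc = accuracy(subtree, X, y)
        if acc > best_acc:
            best_tree, best_acc = subtree, acc
    return best_tree

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 4))
y = (X[:, 0] ^ X[:, 1]).astype(int)     # XOR target: single-step greedy gain is uninformative
for k in (1, 3):
    tree = build(X, y, depth=2, k=k)
    print(f"Top-{k} training accuracy: {accuracy(tree, X, y):.2f}")
```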

Algebras of actions in an agent’s representations of the world

  • paper_url: http://arxiv.org/abs/2310.01536
  • repo_url: https://github.com/awjdean/cayleytablegeneration
  • paper_authors: Alexander Dean, Eduardo Alonso, Esther Mondragon
  • for: To propose a framework for extracting the algebras of the transformations of worlds from the perspective of an agent.
  • methods: Uses newly developed computational methods to extract the algebras of world transformations that arise in simple reinforcement learning scenarios and classifies them according to their properties.
  • results: Generalizes two important results of the symmetry-based disentangled representation learning (SBDRL) formalism - the equivariance condition and the disentangling definition - from symmetry-based representations to representations capturing the transformation properties of worlds for any algebra, and shows that disentangled sub-algebras can each have their own equivariance conditions, which can be treated independently.
    Abstract In this paper, we propose a framework to extract the algebra of the transformations of worlds from the perspective of an agent. As a starting point, we use our framework to reproduce the symmetry-based representations from the symmetry-based disentangled representation learning (SBDRL) formalism proposed by [1]; only the algebras of transformations of worlds that form groups can be described using symmetry-based representations. We then study the algebras of the transformations of worlds with features that occur in simple reinforcement learning scenarios. Using computational methods that we developed, we extract the algebras of the transformations of these worlds and classify them according to their properties. We then generalise two important results of SBDRL - the equivariance condition and the disentangling definition - from working only with symmetry-based representations to working with representations that capture the transformation properties of worlds with transformations from any algebra. Finally, we combine our generalised equivariance condition and our generalised disentangling definition to show that disentangled sub-algebras can each have their own individual equivariance conditions, which can be treated independently.

Bridging the Gap between Structural and Semantic Similarity in Diverse Planning

  • paper_url: http://arxiv.org/abs/2310.01520
  • repo_url: https://github.com/mfaisalzaki/pair2023-semantic-similarity-metrics
  • paper_authors: Mustafa F. Abdelwahed, Joan Espasa, Alice Toniolo, Ian P. Gent
  • for: To improve diverse planning, which underpins the efficiency of plan recognition systems when dealing with noisy and missing observations, and helps when constraints are too expensive or impossible to model.
  • methods: Proposes two new domain-independent metrics that capture, from a domain-dependent viewpoint, relevant information about how two given plans differ.
  • results: Demonstrates the utility of the metrics in various situations where the currently used structural metrics fail to capture the similarity between plans, such as missing structural symmetries.
    Abstract Diverse planning is the problem of finding multiple plans for a given problem specification, which is at the core of many real-world applications. For example, diverse planning is a critical piece for the efficiency of plan recognition systems when dealing with noisy and missing observations. Providing diverse solutions can also benefit situations where constraints are too expensive or impossible to model. Current diverse planners operate by generating multiple plans and then applying a selection procedure to extract diverse solutions using a similarity metric. Generally, current similarity metrics only consider the structural properties of the given plans. We argue that this approach is a limitation that sometimes prevents such metrics from capturing why two plans differ. In this work, we propose two new domain-independent metrics which are able to capture relevant information on the difference between two given plans from a domain-dependent viewpoint. We showcase their utility in various situations where the currently used metrics fail to capture the similarity between plans, failing to capture some structural symmetries.

Towards Automatic Design of Factorio Blueprints

  • paper_url: http://arxiv.org/abs/2310.01505
  • repo_url: None
  • paper_authors: Sean Patterson, Joan Espasa, Mun See Chang, Ruth Hoffmann
  • for: To explore the feasibility of a constraint model for optimising Factorio blueprints, balancing correctness, optimality, and performance.
  • methods: Uses a constraint model that combines elements of bin-packing, routing, and network design to produce an optimal blueprint design.
  • results: Presents a new challenging problem and explores the feasibility of constraint modelling for optimising Factorio blueprints, showing the potential to improve the efficiency and effectiveness of factory designs in the game.
    Abstract Factorio is a 2D construction and management simulation video game about building automated factories to produce items of increasing complexity. A core feature of the game is its blueprint system, which allows players to easily save and replicate parts of their designs. Blueprints can reproduce any layout of objects in the game, but are typically used to encapsulate a complex behaviour, such as the production of a non-basic object. Once created, these blueprints are then used as basic building blocks, allowing the player to create a layer of abstraction. The usage of blueprints not only eases the expansion of the factory but also allows the sharing of designs with the game's community. The layout in a blueprint can be optimised using various criteria, such as the total space used or the final production throughput. The design of an optimal blueprint is a hard combinatorial problem, interleaving elements of many well-studied problems such as bin-packing, routing or network design. This work presents a new challenging problem and explores the feasibility of a constraint model to optimise Factorio blueprints, balancing correctness, optimality, and performance.

Towards a Model of Puzznic

  • paper_url: http://arxiv.org/abs/2310.01503
  • repo_url: None
  • paper_authors: Joan Espasa, Ian P. Gent, Ian Miguel, Peter Nightingale, András Z. Salamon, Mateu Villaret
  • for: To model and solve Puzznic, a video game in which the player must plan sequences of moves to clear a grid by matching blocks, focusing on levels with no moving blocks.
  • methods: Compares a planning approach and three constraint programming approaches on a small set of benchmark instances.
  • results: The planning approach currently outperforms the constraint programming approaches, but proposals are outlined for improving the constraint models.
    Abstract We report on progress in modelling and solving Puzznic, a video game requiring the player to plan sequences of moves to clear a grid by matching blocks. We focus here on levels with no moving blocks. We compare a planning approach and three constraint programming approaches on a small set of benchmark instances. The planning approach is at present superior to the constraint programming approaches, but we outline proposals for improving the constraint models.

GPT-Driver: Learning to Drive with GPT

  • paper_url: http://arxiv.org/abs/2310.01415
  • repo_url: https://github.com/pointscoder/gpt-driver
  • paper_authors: Jiageng Mao, Yuxi Qian, Hang Zhao, Yue Wang
  • for: To turn the OpenAI GPT-3.5 model into a reliable motion planner for autonomous vehicles.
  • methods: Reformulates motion planning as a language modeling problem, representing planner inputs and outputs as language tokens so the LLM generates driving trajectories as textual descriptions of coordinate positions, and introduces a prompting-reasoning-finetuning strategy to stimulate the model's numerical reasoning ability.
  • results: Extensive experiments on the large-scale nuScenes dataset demonstrate the effectiveness, generalization ability, and interpretability of the GPT-based motion planner.
    Abstract We present a simple yet effective approach that can transform the OpenAI GPT-3.5 model into a reliable motion planner for autonomous vehicles. Motion planning is a core challenge in autonomous driving, aiming to plan a driving trajectory that is safe and comfortable. Existing motion planners predominantly leverage heuristic methods to forecast driving trajectories, yet these approaches demonstrate insufficient generalization capabilities in the face of novel and unseen driving scenarios. In this paper, we propose a novel approach to motion planning that capitalizes on the strong reasoning capabilities and generalization potential inherent to Large Language Models (LLMs). The fundamental insight of our approach is the reformulation of motion planning as a language modeling problem, a perspective not previously explored. Specifically, we represent the planner inputs and outputs as language tokens, and leverage the LLM to generate driving trajectories through a language description of coordinate positions. Furthermore, we propose a novel prompting-reasoning-finetuning strategy to stimulate the numerical reasoning potential of the LLM. With this strategy, the LLM can describe highly precise trajectory coordinates and also its internal decision-making process in natural language. We evaluate our approach on the large-scale nuScenes dataset, and extensive experiments substantiate the effectiveness, generalization ability, and interpretability of our GPT-based motion planner. Code is now available at https://github.com/PointsCoder/GPT-Driver.
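
A hedged sketch of the core reformulation: planner inputs and outputs are rendered as plain-language tokens, and the trajectory is recovered by parsing coordinate text from the model's reply. The prompt wording, coordinate format, and the `query_llm` stub are illustrative assumptions, not the paper's exact prompting scheme.

```python
# Sketch of motion planning as language modeling: serialize planner inputs as text,
# ask an LLM for waypoint coordinates, and parse them back into numbers.
# The prompt format and the query_llm stub are illustrative assumptions.
import re

def build_prompt(ego_state, detections, goal):
    lines = ["You are a motion planner. Current ego state: "
             f"position {ego_state['pos']}, speed {ego_state['speed']} m/s."]
    for d in detections:
        lines.append(f"Detected {d['type']} at {d['pos']}.")
    lines.append(f"Goal: {goal}. Reason step by step, then output the trajectory as "
                 "'Trajectory: (x1, y1) (x2, y2) ...' with coordinates in meters.")
    return "\n".join(lines)

def parse_trajectory(reply: str):
    coords = re.findall(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)", reply)
    return [(float(x), float(y)) for x, y in coords]

def query_llm(prompt: str) -> str:
    # Stub standing in for a GPT-3.5 call; replace with a real API client.
    return "Reasoning: keep lane and slow slightly.\nTrajectory: (0.0, 0.0) (2.1, 0.1) (4.3, 0.2)"

prompt = build_prompt({"pos": (0.0, 0.0), "speed": 5.0},
                      [{"type": "vehicle", "pos": (12.0, 0.5)}],
                      "follow the lane for 3 seconds")
print(parse_trajectory(query_llm(prompt)))
```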

A multi-institutional pediatric dataset of clinical radiology MRIs by the Children’s Brain Tumor Network

  • paper_url: http://arxiv.org/abs/2310.01413
  • repo_url: None
  • paper_authors: Ariana M. Familiar, Anahita Fathi Kazerooni, Hannah Anderson, Aliaksandr Lubneuski, Karthik Viswanathan, Rocky Breslow, Nastaran Khalili, Sina Bagheri, Debanjan Haldar, Meen Chul Kim, Sherjeel Arif, Rachel Madhogarhia, Thinh Q. Nguyen, Elizabeth A. Frenkel, Zeinab Helili, Jessica Harrison, Keyvan Farahani, Marius George Linguraru, Ulas Bagci, Yury Velichko, Jeffrey Stevens, Sarah Leary, Robert M. Lober, Stephani Campion, Amy A. Smith, Denise Morinigo, Brian Rood, Kimberly Diamond, Ian F. Pollack, Melissa Williams, Arastoo Vossough, Jeffrey B. Ware, Sabine Mueller, Phillip B. Storm, Allison P. Heath, Angela J. Waanders, Jena V. Lilly, Jennifer L. Mason, Adam C. Resnick, Ali Nabavizadeh
  • for: To support clinical decision-making and predictive analytics in pediatric neuro-oncology, where brain and spinal cancers remain the leading cause of cancer-related death in children.
  • methods: Provides a multi-institutional, large-scale dataset of multi-parametric MRI exams acquired through routine care, linked with patient-level clinical information, digital pathology slides, and tissue genotype and omics data, intended for use with artificial intelligence methods.
  • results: Releases 23,101 multi-parametric MRI exams from 1,526 brain tumor patients, with treatment-naive images for 370 subjects processed and made available through the NCI Childhood Cancer Data Initiative via the Cancer Data Service.
    Abstract Pediatric brain and spinal cancers remain the leading cause of cancer-related death in children. Advancements in clinical decision-support in pediatric neuro-oncology utilizing the wealth of radiology imaging data collected through standard care, however, has significantly lagged other domains. Such data is ripe for use with predictive analytics such as artificial intelligence (AI) methods, which require large datasets. To address this unmet need, we provide a multi-institutional, large-scale pediatric dataset of 23,101 multi-parametric MRI exams acquired through routine care for 1,526 brain tumor patients, as part of the Children's Brain Tumor Network. This includes longitudinal MRIs across various cancer diagnoses, with associated patient-level clinical information, digital pathology slides, as well as tissue genotype and omics data. To facilitate downstream analysis, treatment-na\"ive images for 370 subjects were processed and released through the NCI Childhood Cancer Data Initiative via the Cancer Data Service. Through ongoing efforts to continuously build these imaging repositories, our aim is to accelerate discovery and translational AI models with real-world data, to ultimately empower precision medicine for children.

Generalized Animal Imitator: Agile Locomotion with Versatile Motion Prior

  • paper_url: http://arxiv.org/abs/2310.01408
  • repo_url: None
  • paper_authors: Ruihan Yang, Zhuoqun Chen, Jianhan Ma, Chongyi Zheng, Yiyu Chen, Quan Nguyen, Xiaolong Wang
  • for: To enable legged robots to learn diverse agile behaviors such as running, turning, jumping, and backflipping for use in complex tasks.
  • methods: Proposes the Versatile Instructable Motion prior (VIM), a reinforcement learning framework in which robots learn diverse agile low-level skills by imitating animal motions and manually designed motions, guided by a functionality reward and a stylization reward.
  • results: Shows, in both simulation and real-world deployment, that a single controller can concurrently learn diverse agile locomotion skills.
    Abstract The agility of animals, particularly in complex activities such as running, turning, jumping, and backflipping, stands as an exemplar for robotic system design. Transferring this suite of behaviors to legged robotic systems introduces essential inquiries: How can a robot be trained to learn multiple locomotion behaviors simultaneously? How can the robot execute these tasks with a smooth transition? And what strategies allow for the integrated application of these skills? This paper introduces the Versatile Instructable Motion prior (VIM) - a Reinforcement Learning framework designed to incorporate a range of agile locomotion tasks suitable for advanced robotic applications. Our framework enables legged robots to learn diverse agile low-level skills by imitating animal motions and manually designed motions with Functionality reward and Stylization reward. While the Functionality reward guides the robot's ability to adopt varied skills, the Stylization reward ensures performance alignment with reference motions. Our evaluations of the VIM framework span both simulation environments and real-world deployment. To our understanding, this is the first work that allows a robot to concurrently learn diverse agile locomotion tasks using a singular controller. Further details and supportive media can be found at our project site: https://rchalyang.github.io/VIM .

Conditional Diffusion Distillation

  • paper_url: http://arxiv.org/abs/2310.01407
  • repo_url: None
  • paper_authors: Kangfu Mei, Mauricio Delbracio, Hossein Talebi, Zhengzhong Tu, Vishal M. Patel, Peyman Milanfar
  • for: To reduce the slow sampling time of diffusion models so that conditional generation tasks such as image editing, restoration, and super-resolution can run with very few steps.
  • methods: Proposes a conditional distillation method that supplements diffusion priors with image conditions, distilling the unconditional pre-training in a single stage through joint learning, which greatly simplifies the previous two-stage procedures of separate distillation and conditional finetuning; it also enables a parameter-efficient distillation mechanism in which each task needs only a small number of additional parameters on top of a shared frozen unconditional backbone.
  • results: Experiments across multiple tasks, including super-resolution, image editing, and depth-to-image generation, show that the method outperforms existing distillation techniques for the same sampling time, and it is the first distillation strategy to match the performance of much slower fine-tuned conditional diffusion models.
    Abstract Generative diffusion models provide strong priors for text-to-image generation and thereby serve as a foundation for conditional generation tasks such as image editing, restoration, and super-resolution. However, one major limitation of diffusion models is their slow sampling time. To address this challenge, we present a novel conditional distillation method designed to supplement the diffusion priors with the help of image conditions, allowing for conditional sampling with very few steps. We directly distill the unconditional pre-training in a single stage through joint-learning, largely simplifying the previous two-stage procedures that involve both distillation and conditional finetuning separately. Furthermore, our method enables a new parameter-efficient distillation mechanism that distills each task with only a small number of additional parameters combined with the shared frozen unconditional backbone. Experiments across multiple tasks including super-resolution, image editing, and depth-to-image generation demonstrate that our method outperforms existing distillation techniques for the same sampling time. Notably, our method is the first distillation strategy that can match the performance of the much slower fine-tuned conditional diffusion models.

Representation Engineering: A Top-Down Approach to AI Transparency

  • paper_url: http://arxiv.org/abs/2310.01405
  • repo_url: https://github.com/andyzoujm/representation-engineering
  • paper_authors: Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks
  • for: To identify and characterize representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience.
  • methods: Places population-level representations, rather than neurons or circuits, at the center of analysis, yielding methods for monitoring and manipulating high-level cognitive phenomena in deep neural networks (DNNs).
  • results: Provides baselines and an initial analysis of RepE techniques, showing that they offer simple yet effective ways to improve our understanding and control of large language models, and that they provide traction on a range of safety-relevant problems including honesty, harmlessness, and power-seeking.
    Abstract In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience. RepE places population-level representations, rather than neurons or circuits, at the center of analysis, equipping us with novel methods for monitoring and manipulating high-level cognitive phenomena in deep neural networks (DNNs). We provide baselines and an initial analysis of RepE techniques, showing that they offer simple yet effective solutions for improving our understanding and control of large language models. We showcase how these methods can provide traction on a wide range of safety-relevant problems, including honesty, harmlessness, power-seeking, and more, demonstrating the promise of top-down transparency research. We hope that this work catalyzes further exploration of RepE and fosters advancements in the transparency and safety of AI systems.
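
A hedged sketch of one population-level technique in the RepE spirit: estimate a "reading vector" for a concept (e.g., honesty) as the difference of mean hidden states between contrastive prompt sets, then score new activations by projection. The hidden states here are random placeholders; real use would extract them from an actual LLM, and the paper's specific methods may differ.

```python
# Sketch of a population-level "reading vector": difference of mean hidden states
# between contrastive prompt sets, then projection of new activations onto it.
# Hidden states below are random placeholders standing in for real LLM activations.
import numpy as np

rng = np.random.default_rng(0)
d = 64                                   # hidden size (placeholder)
concept_dir = rng.normal(size=d)         # unknown ground-truth direction (demo only)

def fake_hidden_states(n, has_concept):
    base = rng.normal(size=(n, d))
    return base + (1.5 * concept_dir if has_concept else 0.0)

pos = fake_hidden_states(100, has_concept=True)    # e.g., prompts answered honestly
neg = fake_hidden_states(100, has_concept=False)   # e.g., prompts answered dishonestly

reading_vector = pos.mean(axis=0) - neg.mean(axis=0)
reading_vector /= np.linalg.norm(reading_vector)

def concept_score(hidden_state):
    return float(hidden_state @ reading_vector)

print("score on a new 'honest' activation:    %.2f" % concept_score(fake_hidden_states(1, True)[0]))
print("score on a new 'dishonest' activation: %.2f" % concept_score(fake_hidden_states(1, False)[0]))
```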

A Good Snowman is Hard to Plan

  • paper_url: http://arxiv.org/abs/2310.01471
  • repo_url: https://github.com/udg-lai/keps2023
  • paper_authors: Miquel Bofill, Cristina Borralleras, Joan Espasa, Gerard Martín, Gustavo Patow, Mateu Villaret
  • for: To build tools that can certify the optimality of solutions to the puzzle game A Good Snowman is Hard to Build, which helps keep players from stumbling on solutions much easier than the designer intended.
  • methods: Models the game as a planning problem and shows that a direct translation to SAT clearly outperforms off-the-shelf state-of-the-art planners, mainly because reachability properties are easy to model in SAT and allow shorter plans, whereas expressing a reachability-derived predicate with axioms in PDDL does not significantly reduce solving time.
  • results: Of a set of 51 levels, both original and crafted, 43 are solved, with 8 challenging instances still remaining.
    Abstract In this work we face a challenging puzzle video game: A Good Snowman is Hard to Build. The objective of the game is to build snowmen by moving and stacking snowballs on a discrete grid. For the sake of player engagement with the game, it is interesting to avoid that a player finds a much easier solution than the one the designer expected. Therefore, having tools that are able to certify the optimality of solutions is crucial. Although the game can be stated as a planning problem and can be naturally modelled in PDDL, we show that a direct translation to SAT clearly outperforms off-the-shelf state-of-the-art planners. As we show, this is mainly due to the fact that reachability properties can be easily modelled in SAT, allowing for shorter plans, whereas using axioms to express a reachability derived predicate in PDDL does not result in any significant reduction of solving time with the considered planners. We deal with a set of 51 levels, both original and crafted, solving 43 and with 8 challenging instances still remaining to be solved.

Challenges in Modelling and Solving Plotting with PDDL

  • paper_url: http://arxiv.org/abs/2310.01470
  • repo_url: None
  • paper_authors: Joan Espasa, Ian Miguel, Peter Nightingale, András Z. Salamon, Mateu Villaret
  • for: To study a planning problem based on Plotting, a tile-matching puzzle video game in which a target number of coloured blocks must be removed from a grid by sequentially shooting blocks into it.
  • methods: Models the problem in PDDL and attempts to solve it with a grounding-based state-of-the-art planner.
  • results: Highlights how the complex transitions after every shot, including blocks indirectly affected by gravity, make Plotting challenging to model and solve.
    Abstract We study a planning problem based on Plotting, a tile-matching puzzle video game published by Taito in 1989. The objective of this game is to remove a target number of coloured blocks from a grid by sequentially shooting blocks into the grid. Plotting features complex transitions after every shot: various blocks are affected directly, while others can be indirectly affected by gravity. We highlight the challenges of modelling Plotting with PDDL and of solving it with a grounding-based state-of-the-art planner.

EXTRACTER: Efficient Texture Matching with Attention and Gradient Enhancing for Large Scale Image Super Resolution

  • paper_url: http://arxiv.org/abs/2310.01379
  • repo_url: https://github.com/esteban-rs/extracter
  • paper_authors: Esteban Reyes-Saldana, Mariano Rivera
  • for: To improve reference-based image super-resolution, in which high-resolution textures from a reference image are transferred to enhance a low-resolution image.
  • methods: Uses a more memory-efficient deep search that significantly reduces the number of image patches and finds the $k$ most relevant texture matches for each low-resolution patch over the high-resolution reference patches, and adds gradient density information through a simple residual architecture.
  • results: Produces more accurate texture matches and competitive metric results in PSNR and SSIM.
    Abstract Recent Reference-Based image super-resolution (RefSR) has improved SOTA deep methods by introducing attention mechanisms to enhance low-resolution images, transferring high-resolution textures from a reference high-resolution image. The main idea is to search for matches between patches of the LR and reference image pair in a feature space and merge them using deep architectures. However, existing methods lack an accurate search of textures. They divide images into as many patches as possible, resulting in inefficient memory usage, and cannot manage large images. Herein, we propose a deep search with more efficient memory usage that significantly reduces the number of image patches and finds the $k$ most relevant texture matches for each low-resolution patch over the high-resolution reference patches, resulting in an accurate texture match. We enhance the super-resolution result by adding gradient density information using a simple residual architecture, showing competitive metric results in PSNR and SSIM.

On Grid Graph Reachability and Puzzle Games

  • paper_url: http://arxiv.org/abs/2310.01378
  • repo_url: https://github.com/udg-lai/modref2023
  • paper_authors: Miquel Bofill, Cristina Borralleras, Joan Espasa, Mateu Villaret
  • for: To solve the reachability sub-problems that arise in Sokoban-like puzzle games, where an agent moves through a maze and the difficulty lies mainly in acting on objects such as pushable boxes.
  • methods: Studies CP and SAT approaches to these problems, reviews existing reachability encodings, and proposes a new one.
  • results: Empirically shows that the new encoding is well suited to solving puzzle problems in the planning-as-SAT paradigm, especially when several actions are executed in parallel.
    Abstract Many puzzle video games, like Sokoban, involve moving some agent in a maze. The reachable locations are usually apparent for a human player, and the difficulty of the game is mainly related to performing actions on objects, such as pushing (reachable) boxes. For this reason, the difficulty of a particular level is often measured as the number of actions on objects, other than agent walking, needed to find a solution. In this paper we study CP and SAT approaches for solving these kind of problems. We review some reachability encodings and propose a new one. We empirically show that the new encoding is well-suited for solving puzzle problems in the planning as SAT paradigm, especially when considering the execution of several actions in parallel.
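
A small sketch of a step-indexed reachability encoding for a grid: Boolean variables reach[t][cell] say a cell is reachable within t steps, and CNF clauses force each reached cell to have itself or a neighbour reached one step earlier. The clauses are emitted in DIMACS-style integer form so they can be handed to any SAT solver; this is a simplified textbook-style illustration, not the paper's new encoding.

```python
# Illustrative step-indexed reachability encoding for a grid, as DIMACS-style clauses.
# reach[t][c] is true if cell c is reachable from the start within t steps.
# A textbook-style encoding for illustration, not the paper's proposed encoding.

W, H = 3, 3
walls = {(1, 1)}                      # blocked cells
start = (0, 0)
horizon = W * H                       # enough steps to reach anything reachable

cells = [(x, y) for x in range(W) for y in range(H) if (x, y) not in walls]
free_cells = set(cells)
var = {}                              # (t, cell) -> positive integer variable id
for t in range(horizon + 1):
    for c in cells:
        var[(t, c)] = len(var) + 1

def neighbours(c):
    x, y = c
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        n = (x + dx, y + dy)
        if n in free_cells:
            yield n

clauses = []

# Step 0: only the start cell is reachable.
clauses.append([var[(0, start)]])
for c in cells:
    if c != start:
        clauses.append([-var[(0, c)]])

# Step t: reach[t][c] -> reach[t-1][c] OR some neighbour reached at t-1.
for t in range(1, horizon + 1):
    for c in cells:
        support = [var[(t - 1, c)]] + [var[(t - 1, n)] for n in neighbours(c)]
        clauses.append([-var[(t, c)]] + support)
        clauses.append([-var[(t - 1, c)], var[(t, c)]])   # monotonicity

# Asking "is the goal reachable within the horizon?" is one extra unit clause;
# the clause list can then be written in DIMACS format and given to any SAT solver.
goal = (2, 2)
clauses.append([var[(horizon, goal)]])
print(f"{len(var)} variables, {len(clauses)} clauses; sample clause: {clauses[10]}")
```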

UltraFeedback: Boosting Language Models with High-quality Feedback

  • paper_url: http://arxiv.org/abs/2310.01377
  • repo_url: https://github.com/thunlp/ultrafeedback
  • paper_authors: Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Wei Zhu, Yuan Ni, Guotong Xie, Zhiyuan Liu, Maosong Sun
  • for: ULTRAFEEDBACK is designed to overcome the limitations of current preference datasets in reinforcement learning from human feedback (RLHF) research, specifically the scarcity of diverse and naturalistic datasets of human preferences on large language model (LLM) outputs at scale.
  • methods: To create ULTRAFEEDBACK, the authors compile a diverse array of instructions and models from multiple sources, meticulously devise annotation instructions, and employ GPT-4 to provide detailed feedback in both numerical and textual forms.
  • results: The authors train various models using ULTRAFEEDBACK, including the reward model UltraRM, the chat language model UltraLM-13B-PPO, and the critique model UltraCM, achieving top performance across multiple benchmarks and outperforming existing open-source models.
    Abstract Reinforcement learning from human feedback (RLHF) has become a pivot technique in aligning large language models (LLMs) with human preferences. In RLHF practice, preference data plays a crucial role in bridging human proclivity and LLMs. However, the scarcity of diverse, naturalistic datasets of human preferences on LLM outputs at scale poses a great challenge to RLHF as well as feedback learning research within the open-source community. Current preference datasets, either proprietary or limited in size and prompt variety, result in limited RLHF adoption in open-source models and hinder further exploration. In this study, we propose ULTRAFEEDBACK, a large-scale, high-quality, and diversified preference dataset designed to overcome these limitations and foster RLHF development. To create ULTRAFEEDBACK, we compile a diverse array of instructions and models from multiple sources to produce comparative data. We meticulously devise annotation instructions and employ GPT-4 to offer detailed feedback in both numerical and textual forms. ULTRAFEEDBACK establishes a reproducible and expandable preference data construction pipeline, serving as a solid foundation for future RLHF and feedback learning research. Utilizing ULTRAFEEDBACK, we train various models to demonstrate its effectiveness, including the reward model UltraRM, chat language model UltraLM-13B-PPO, and critique model UltraCM. Experimental results indicate that our models outperform existing open-source models, achieving top performance across multiple benchmarks. Our data and models are available at https://github.com/thunlp/UltraFeedback.

Elephant Neural Networks: Born to Be a Continual Learner

  • paper_url: http://arxiv.org/abs/2310.01365
  • repo_url: None
  • paper_authors: Qingfeng Lan, A. Rupam Mahmood
  • for: To study the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting.
  • methods: Proposes a new class of activation functions, elephant activation functions, that generate both sparse representations and sparse gradients.
  • results: Simply replacing classical activation functions with elephant activation functions significantly improves the resilience of neural networks to catastrophic forgetting; the method achieves excellent performance on the Split MNIST dataset in a single pass, without a replay buffer, task boundary information, or pre-training.
    Abstract Catastrophic forgetting remains a significant challenge to continual learning for decades. While recent works have proposed effective methods to mitigate this problem, they mainly focus on the algorithmic side. Meanwhile, we do not fully understand what architectural properties of neural networks lead to catastrophic forgetting. This study aims to fill this gap by studying the role of activation functions in the training dynamics of neural networks and their impact on catastrophic forgetting. Our study reveals that, besides sparse representations, the gradient sparsity of activation functions also plays an important role in reducing forgetting. Based on this insight, we propose a new class of activation functions, elephant activation functions, that can generate both sparse representations and sparse gradients. We show that by simply replacing classical activation functions with elephant activation functions, we can significantly improve the resilience of neural networks to catastrophic forgetting. Our method has broad applicability and benefits for continual learning in regression, class incremental learning, and reinforcement learning tasks. Specifically, we achieves excellent performance on Split MNIST dataset in just one single pass, without using replay buffer, task boundary information, or pre-training.
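
A hedged sketch of a bump-shaped activation in the spirit described above: it is near zero, with near-zero gradient, away from its centre, so both representations and gradients are sparse. The exact parameterization below, 1 / (1 + |x/a|^d), and the constants are assumptions for illustration rather than the paper's definitive form.

```python
# Sketch of a bump-shaped "elephant-style" activation and its gradient sparsity.
# Assumed form: elephant(x) = 1 / (1 + |x / a|^d); constants are illustrative.
import numpy as np

def elephant(x, a=1.0, d=4.0):
    return 1.0 / (1.0 + np.abs(x / a) ** d)

def elephant_grad(x, a=1.0, d=4.0):
    # d/dx [ (1 + |x/a|^d)^-1 ] = -d * sign(x) * |x/a|^(d-1) / (a * (1 + |x/a|^d)^2)
    u = np.abs(x / a) ** d
    return -d * np.sign(x) * np.abs(x / a) ** (d - 1) / (a * (1.0 + u) ** 2)

x = np.linspace(-6, 6, 13)
print("x       :", x)
print("output  :", elephant(x).round(3))       # ~0 away from the centre -> sparse representations
print("gradient:", elephant_grad(x).round(3))  # also ~0 away from the centre -> sparse gradients

# Compare with ReLU, whose gradient is 1 on the whole positive half-line (not sparse).
print("ReLU gradient:", (x > 0).astype(float))
```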

SyMPox: An Automated Monkeypox Detection System Based on Symptoms Using XGBoost

  • paper_url: http://arxiv.org/abs/2310.19801
  • repo_url: None
  • paper_authors: Alireza Farzipour, Roya Elmi, Hamid Nasiri
  • for: To provide a standalone application for diagnosing monkeypox cases based on symptoms.
  • methods: Uses the XGBoost algorithm to analyze symptom patterns and, through the Gradio framework, offers a user-friendly platform where individuals can assess their symptoms and obtain a diagnosis.
  • results: Delivers SyMPox, a standalone application for fast and reliable symptom-based monkeypox assessment.
    Abstract Monkeypox is a zoonotic disease. About 87000 cases of monkeypox had been confirmed by the World Health Organization as of 10 June 2023. The most prevalent methods for identifying this disease are image-based recognition techniques, but these are not fast and are available to only a few individuals. This study presents an independent application named SyMPox, developed to diagnose monkeypox cases based on symptoms. SyMPox utilizes the robust XGBoost algorithm to analyze symptom patterns and provide accurate assessments. Developed using the Gradio framework, SyMPox offers a user-friendly platform for individuals to assess their symptoms and obtain reliable monkeypox diagnoses.
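
A minimal sketch of the symptom-based classification step: binary symptom indicators go into an XGBoost classifier that predicts monkeypox vs. not. The symptom names and the tiny synthetic dataset are placeholders, not the study's actual features or data.

```python
# Sketch of symptom-based classification with XGBoost.
# Symptom columns and the synthetic rows are placeholders, not the study's dataset.
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

symptoms = ["fever", "rash", "swollen_lymph_nodes", "headache", "muscle_aches"]
rng = np.random.default_rng(0)

# Synthetic data: label correlates with rash + swollen lymph nodes (illustration only).
X = rng.integers(0, 2, size=(500, len(symptoms)))
y = ((X[:, 1] + X[:, 2] >= 2) | (rng.random(500) < 0.05)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss")
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

patient = np.array([[1, 1, 1, 0, 0]])   # fever, rash, swollen lymph nodes present
print("predicted probability of monkeypox:", model.predict_proba(patient)[0, 1])
```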

RA-DIT: Retrieval-Augmented Dual Instruction Tuning

  • paper_url: http://arxiv.org/abs/2310.01352
  • repo_url: None
  • paper_authors: Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Rich James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, Luke Zettlemoyer, Scott Yih
  • for: To improve retrieval-augmented language models (RALMs), giving LLMs access to long-tail and up-to-date knowledge from external data stores.
  • methods: Proposes Retrieval-Augmented Dual Instruction Tuning (RA-DIT), a lightweight fine-tuning methodology that retrofits any LLM with retrieval capabilities in two steps: one updates the pre-trained LM to make better use of retrieved information, and the other updates the retriever to return results that the LM prefers.
  • results: Fine-tuning on tasks that require both knowledge utilization and contextual awareness shows that each stage yields significant gains and combining them adds further improvements; the best model, RA-DIT 65B, achieves state-of-the-art performance on knowledge-intensive zero- and few-shot learning benchmarks, outperforming existing in-context RALM approaches by up to +8.9% in the 0-shot setting and +1.4% in the 5-shot setting on average.
    Abstract Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build. Existing approaches require either expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store that leads to suboptimal performance. We introduce Retrieval-Augmented Dual Instruction Tuning (RA-DIT), a lightweight fine-tuning methodology that provides a third option by retrofitting any LLM with retrieval capabilities. Our approach operates in two distinct fine-tuning steps: (1) one updates a pre-trained LM to better use retrieved information, while (2) the other updates the retriever to return more relevant results, as preferred by the LM. By fine-tuning over tasks that require both knowledge utilization and contextual awareness, we demonstrate that each stage yields significant performance improvements, and using both leads to additional gains. Our best model, RA-DIT 65B, achieves state-of-the-art performance across a range of knowledge-intensive zero- and few-shot learning benchmarks, significantly outperforming existing in-context RALM approaches by up to +8.9% in 0-shot setting and +1.4% in 5-shot setting on average.
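
A hedged sketch of the second fine-tuning step in spirit: the retriever's distribution over candidate passages is nudged, via a KL term, toward how much each passage actually helps the language model score the gold answer. Both score functions are placeholders for real model calls, and this is an LM-supervised-retrieval-style illustration, not the exact RA-DIT objective.

```python
# Hedged sketch of an LM-supervised retriever update (the spirit of RA-DIT's second step):
# push the retriever's distribution over passages toward the LM's preference, where
# preference is how much each passage improves the likelihood of the gold answer.
# retriever_scores and lm_log_likelihood are placeholders for real model calls.
import torch
import torch.nn.functional as F

def retriever_scores(question, passages):
    # Placeholder: in practice, dot products of dense question/passage embeddings.
    return torch.randn(len(passages), requires_grad=True)

def lm_log_likelihood(question, passage, answer):
    # Placeholder: log p_LM(answer | passage, question) from the (frozen) language model.
    return torch.randn(()).item()

question, answer = "Who wrote Hamlet?", "William Shakespeare"
passages = ["Hamlet is a tragedy by William Shakespeare.",
            "The Globe Theatre opened in 1599.",
            "Macbeth is another Shakespeare play."]

scores = retriever_scores(question, passages)
log_p_retriever = F.log_softmax(scores, dim=0)

lm_pref = torch.tensor([lm_log_likelihood(question, p, answer) for p in passages])
p_lm = F.softmax(lm_pref, dim=0)            # LM's preference over passages (target)

loss = F.kl_div(log_p_retriever, p_lm, reduction="sum")   # KL(p_lm || p_retriever)
loss.backward()                             # gradients flow into the retriever only
print("retriever loss:", float(loss), "| grad on scores:", scores.grad)
```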

Improving Dialogue Management: Quality Datasets vs Models

  • paper_url: http://arxiv.org/abs/2310.01339
  • repo_url: https://github.com/miguel-kjh/Improving-Dialogue-Management
  • paper_authors: Miguel Ángel Medina-Ramírez, Cayetano Guerra-Artal, Mario Hernández-Tejera
  • for: To argue that the main limitation on dialogue manager performance lies in the quality of the datasets rather than in the models used so far.
  • methods: Uses a synthetic dialogue generator that fully controls the amount and type of errors introduced into the datasets, and studies the main errors in the widely used Multiwoz 2.1 and SGD datasets.
  • results: Shows that errors in the datasets contribute proportionally to the performance of dialogue management models.
    Abstract Task-oriented dialogue systems (TODS) have become crucial for users to interact with machines and computers using natural language. One of their key components is the dialogue manager, which guides the conversation towards a good goal for the user by providing the best possible response. Previous works have proposed rule-based systems (RBS), reinforcement learning (RL), and supervised learning (SL) as solutions for correct dialogue management, that is, selecting the best response given the user's input. However, this work argues that the leading cause of dialogue managers not achieving maximum performance lies in the quality of the datasets rather than in the models employed thus far; in other words, dataset errors, such as mislabeling, account for a large percentage of failures in dialogue management. We studied the main errors in the most widely used datasets, Multiwoz 2.1 and SGD, to demonstrate this hypothesis. To do this, we designed a synthetic dialogue generator to fully control the amount and type of errors introduced in the dataset. Using this generator, we demonstrate that errors in the datasets contribute proportionally to the performance of the models.

LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples

  • paper_url: http://arxiv.org/abs/2310.01469
  • repo_url: https://github.com/pku-yuangroup/hallucination-attack
  • paper_authors: Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, Li Yuan
  • for: To investigate how large language models (LLMs) respond to nonsensical prompts composed of random tokens, and whether the resulting hallucinations can be viewed as a kind of adversarial attack.
  • methods: Probes LLMs with different prompting methods and formalizes an automatic method for triggering hallucinations, termed the hallucination attack.
  • results: Finds that hallucinations can be elicited adversarially and share basic features with conventional adversarial examples, and proposes a simple yet effective defense strategy.
    Abstract Large Language Models (LLMs), including GPT-3.5, LLaMA, and PaLM, seem to be knowledgeable and able to adapt to many tasks. However, we still cannot completely trust their answers, since LLMs suffer from hallucination--fabricating non-existent facts to cheat users without their perception. The reasons for the existence and pervasiveness of hallucinations remain unclear. In this paper, we demonstrate that non-sense prompts composed of random tokens can also elicit the LLMs to respond with hallucinations. This phenomenon forces us to revisit the idea that hallucination may be another view of adversarial examples, sharing similar features with conventional adversarial examples as a basic property of LLMs. Therefore, we formalize an automatic hallucination triggering method as the hallucination attack in an adversarial way. Finally, we explore the basic features of attacked adversarial prompts and propose a simple yet effective defense strategy. Our code is released on GitHub.

The Entity-Deduction Arena: A playground for probing the conversational reasoning and planning capabilities of LLMs

  • paper_url: http://arxiv.org/abs/2310.01468
  • repo_url: None
  • paper_authors: Yizhe Zhang, Jiarui Lu, Navdeep Jaitly
  • for: To evaluate the conversational reasoning and planning capabilities of large language models (LLMs) through a surrogate problem in which a model must deduce an entity unknown to itself.
  • methods: Uses an entity-deducing game as an evaluation framework to test the performance of various LLMs, and employs Behavior Cloning (BC) and Reinforcement Learning to enhance the reasoning and planning capacity of weaker models.
  • results: Finds significant differences in performance across LLMs on the entity-deducing game, with strong models such as GPT-4 outperforming human players by a large margin, and shows that weaker models can be trained to imitate stronger models and generalize to new data or domains using only demonstrations from a stronger model.
    Abstract Large language models (LLMs) are effective at answering questions that are clearly asked. However, when faced with ambiguous queries they can act unpredictably and produce incorrect outputs. This underscores the need for the development of intelligent agents capable of asking clarification questions to resolve ambiguities effectively. This capability requires complex understanding, state tracking, reasoning and planning over multiple conversational turns. However, directly measuring this can be challenging. In this paper, we offer a surrogate problem which assesses an LLMs's capability to deduce an entity unknown to itself, but revealed to a judge, by asking the judge a series of queries. This entity-deducing game can serve as an evaluation framework to probe the conversational reasoning and planning capabilities of language models. We systematically evaluate various LLMs and discover significant differences in their performance on this task. We find that strong LLMs like GPT-4 outperform human players by a large margin. We further employ Behavior Cloning (BC) to examine whether a weaker model is capable of imitating a stronger model and generalizing to data or domains, using only the demonstrations from a stronger model. We finally propose to use Reinforcement Learning to enhance reasoning and planning capacity of Vicuna models through episodes of game playing, which lead to significant performance improvement. We hope that this problem offers insights into how autonomous agents could be trained to behave more intelligently in ambiguous circumstances.
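
A minimal sketch of the entity-deducing game loop described above: a judge holds a hidden entity and answers yes/no questions, and a guesser asks questions over a fixed turn budget until it names the entity. In the benchmark both roles are LLMs; here they are deterministic stubs, and the prompts and protocol are illustrative only.

```python
# Sketch of the entity-deducing game loop: a guesser interrogates a judge that holds
# a hidden entity and must name it within a turn budget. Both roles are LLMs in the
# benchmark; here they are stubs, and the questions/protocol are illustrative only.

HIDDEN_ENTITY = "penguin"
MAX_TURNS = 5

def judge(question: str) -> str:
    # Stands in for the judge LLM: answers Yes/No/Maybe about the hidden entity.
    q = question.lower().rstrip("?")
    if HIDDEN_ENTITY in q:
        return "Yes"
    facts = {"is it an animal": "Yes", "can it fly": "No",
             "does it live in cold places": "Yes"}
    return facts.get(q, "Maybe")

def guesser(transcript):
    # Stands in for the evaluated LLM: plans the next question given the dialogue so far.
    scripted = ["Is it an animal?", "Can it fly?",
                "Does it live in cold places?", "Is it a penguin?"]
    return scripted[min(len(transcript), len(scripted) - 1)]

transcript, solved = [], False
for turn in range(MAX_TURNS):
    question = guesser(transcript)
    answer = judge(question)
    transcript.append((question, answer))
    print(f"Q{turn + 1}: {question}  ->  {answer}")
    # Success criterion (checked on the judge's side): the guesser named the entity.
    if HIDDEN_ENTITY in question.lower() and answer == "Yes":
        solved = True
        break

print("entity deduced within budget:", solved)
```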

L2MAC: Large Language Model Automatic Computer for Unbounded Code Generation

  • paper_url: http://arxiv.org/abs/2310.02003
  • repo_url: None
  • paper_authors: Samuel Holt, Max Ruiz Luyten, Mihaela van der Schaar
  • for: To overcome the fixed context window of the transformer architecture so that transformer-based models can generate long, logically consistent code.
  • methods: Proposes L2MAC, a practical LLM-based stored-program automatic computer for long code generation. Its memory has two components: an instruction registry, populated with a prompt program that solves the user-given task, and a file store containing final and intermediate outputs. Each instruction is executed by a separate LLM instance whose context is managed by a control unit capable of precise memory reading and writing, ensuring effective interaction with the file store; together these let L2MAC generate virtually unbounded code structures that fulfil complex user-specified requirements.
  • results: Empirically shows that L2MAC succeeds in generating large code bases for system design tasks where other coding methods fall short of implementing user requirements, and provides insight into the reasons for this performance gap.
    Abstract Transformer-based large language models (LLMs) are constrained by the fixed context window of the underlying transformer architecture, hindering their ability to produce long and logically consistent code. Memory-augmented LLMs are a promising solution, but current approaches cannot handle long code generation tasks since they (1) only focus on reading memory and reduce its evolution to the concatenation of new memories or (2) use very specialized memories that cannot adapt to other domains. This paper presents L2MAC, the first practical LLM-based stored-program automatic computer for long and consistent code generation. Its memory has two components: the instruction registry, which is populated with a prompt program to solve the user-given task, and a file store, which will contain the final and intermediate outputs. Each instruction is executed by a separate LLM instance, whose context is managed by a control unit capable of precise memory reading and writing to ensure effective interaction with the file store. These components enable L2MAC to generate virtually unbounded code structures, bypassing the constraints of the finite context window while producing code that fulfills complex user-specified requirements. We empirically show that L2MAC succeeds in generating large code bases for system design tasks where other coding methods fall short in implementing user requirements and provide insight into the reasons for this performance gap.
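
A hedged sketch of the stored-program idea: a prompt program (instruction registry) is executed one instruction at a time, each by a fresh LLM call whose context is assembled by a small control unit that reads from and writes to a file store. The `call_llm` stub and the simple "FILE name:" output protocol are illustrative assumptions, not L2MAC's actual control unit.

```python
# Sketch of a stored-program control loop in the spirit of L2MAC: an instruction
# registry is executed step by step; each step is a fresh LLM call whose context is
# built by a control unit that reads/writes a file store. call_llm and the simple
# "FILE <name>:" output protocol are illustrative assumptions.

instruction_registry = [
    "Write the data model for a todo app into models.py.",
    "Write REST handlers that use models.py into api.py.",
    "Write unit tests for api.py into test_api.py.",
]
file_store: dict[str, str] = {}

def control_unit(instruction: str, store: dict[str, str]) -> str:
    # Build a context that fits the window: the instruction plus a summary of stored files.
    listing = "\n".join(f"- {name} ({len(body)} chars)" for name, body in store.items())
    return f"### instruction\n{instruction}\n### file store\n{listing or '(empty)'}\n"

def call_llm(context: str) -> str:
    # Stub for a bounded-context LLM call; a real system would call an LLM API here.
    target = context.split(" into ")[-1].split(".py")[0] + ".py" if " into " in context else "out.py"
    return f"FILE {target}:\n# code produced for: {context.splitlines()[1]}\n"

def apply_output(output: str, store: dict[str, str]) -> None:
    # Parse the FILE protocol and write results back into the file store.
    for block in output.split("FILE ")[1:]:
        name, _, body = block.partition(":\n")
        store[name.strip()] = body

for instruction in instruction_registry:
    context = control_unit(instruction, file_store)
    apply_output(call_llm(context), file_store)

print("files produced:", sorted(file_store))
```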

Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy

  • paper_url: http://arxiv.org/abs/2310.01334
  • repo_url: https://github.com/unites-lab/mc-smoe
  • paper_authors: Pingzhi Li, Zhenyu Zhang, Prateek Yadav, Yi-Lin Sung, Yu Cheng, Mohit Bansal, Tianlong Chen
  • for: To make sparsely activated Mixture-of-Experts (SMoE) models memory-efficient and scalable enough for resource-constrained downstream scenarios.
  • methods: Proposes M-SMoE, which uses routing statistics to guide expert merging: neuron permutations first align the experts, dominant experts and their group members are then formed, and each group is merged into a single expert using each expert's activation frequency as its merging weight; the merged experts are further decomposed into low-rank and structurally sparse alternatives (MC-SMoE).
  • results: Extensive experiments across 8 benchmarks validate the effectiveness of MC-SMoE; for example, it achieves up to 80% memory and 20% FLOPs reduction with virtually no loss in performance.
    Abstract Sparsely activated Mixture-of-Experts (SMoE) has shown promise to scale up the learning capacity of neural networks, however, they have issues like (a) High Memory Usage, due to duplication of the network layers into multiple copies as experts; and (b) Redundancy in Experts, as common learning-based routing policies suffer from representational collapse. Therefore, vanilla SMoE models are memory inefficient and non-scalable, especially for resource-constrained downstream scenarios. In this paper, we ask: Can we craft a compact SMoE model by consolidating expert information? What is the best recipe to merge multiple experts into fewer but more knowledgeable experts? Our pilot investigation reveals that conventional model merging methods fail to be effective in such expert merging for SMoE. The potential reasons are: (1) redundant information overshadows critical experts; (2) appropriate neuron permutation for each expert is missing to bring all of them in alignment. To address this, we propose M-SMoE, which leverages routing statistics to guide expert merging. Specifically, it starts with neuron permutation alignment for experts; then, dominant experts and their "group members" are formed; lastly, every expert group is merged into a single expert by utilizing each expert's activation frequency as their weight for merging, thus diminishing the impact of insignificant experts. Moreover, we observed that our proposed merging promotes a low dimensionality in the merged expert's weight space, naturally paving the way for additional compression. Hence, our final method, MC-SMoE (i.e., Merge, then Compress SMoE), further decomposes the merged experts into low-rank and structural sparse alternatives. Extensive experiments across 8 benchmarks validate the effectiveness of MC-SMoE. For instance, our MC-SMoE achieves up to 80% memory and a 20% FLOPs reduction, with virtually no loss in performance.
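
A small sketch of the merging step described above: experts in a group are averaged with each expert's routing activation frequency as its weight, so rarely used experts contribute little; a truncated SVD then illustrates the low-rank compression the paper observes. Neuron-permutation alignment is assumed to have been applied already, and the shapes and frequencies are toy values.

```python
# Sketch of frequency-weighted expert merging (the M-SMoE merge step).
# Neuron-permutation alignment is assumed to have been applied already;
# expert weights and activation frequencies below are toy values.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 8, 16

# One group of experts selected by routing statistics (dominant expert + members).
group_experts = [rng.normal(size=(d_hidden, d_in)) for _ in range(3)]
activation_freq = np.array([0.70, 0.25, 0.05])   # how often the router picked each expert

def merge_group(weights, freqs):
    alpha = np.asarray(freqs, dtype=float)
    alpha = alpha / alpha.sum()                  # activation frequency as merge weight
    return sum(a * w for a, w in zip(alpha, weights))

merged = merge_group(group_experts, activation_freq)
print("merged expert shape:", merged.shape)

# Compression hint: the merged expert can be replaced by two thin low-rank factors.
u, s, vt = np.linalg.svd(merged, full_matrices=False)
r = 4
low_rank = (u[:, :r] * s[:r]) @ vt[:r]
print("rank-%d reconstruction error: %.3f" % (r, np.linalg.norm(merged - low_rank)))
```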

ChoiceMates: Supporting Unfamiliar Online Decision-Making with Multi-Agent Conversational Interactions

  • paper_url: http://arxiv.org/abs/2310.01331
  • repo_url: None
  • paper_authors: Jeongeon Park, Bryan Min, Xiaojuan Ma, Juho Kim
  • for: Helping users search for, understand, and make decisions with online information, particularly when they lack domain expertise.
  • methods: A system powered by a dynamic set of LLM-based agents, allowing users to converse with multiple agents to build domain understanding and discover and manage information efficiently.
  • results: Compared with conventional web search and a single-agent baseline, ChoiceMates was clearly more helpful for discovering, digging deeper into, and managing information; participants also reported that multi-agent conversations supported their decision-making process.
    Abstract Unfamiliar decisions -- decisions where people lack adequate domain knowledge or expertise -- specifically increase the complexity and uncertainty of the process of searching for, understanding, and making decisions with online information. Through our formative study (n=14), we observed users' challenges in accessing diverse perspectives, identifying relevant information, and deciding the right moment to make the final decision. We present ChoiceMates, a system that enables conversations with a dynamic set of LLM-powered agents for a holistic domain understanding and efficient discovery and management of information to make decisions. Agents, as opinionated personas, flexibly join the conversation, not only providing responses but also conversing among themselves to elicit each agent's preferences. Our between-subjects study (n=36) comparing ChoiceMates to conventional web search and single-agent showed that ChoiceMates was more helpful in discovering, diving deeper, and managing information compared to Web with higher confidence. We also describe how participants utilized multi-agent conversations in their decision-making process.

BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models

  • paper_url: http://arxiv.org/abs/2310.01329
  • repo_url: None
  • paper_authors: Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi
  • for: Improving the efficiency and scalability of retrieval-augmented language models (LMs), which help address problems such as hallucination, staleness, and privacy leaks.
  • methods: The paper introduces binary token representations (BTR) that use 1-bit vectors to precompute every token in passages, significantly reducing computation during inference. The authors also propose new calibration techniques and training objectives to restore performance.
  • results: The authors’ experiments show that BTR accelerates state-of-the-art inference by up to 4x and reduces storage by over 100x while maintaining over 95% task performance on five knowledge-intensive NLP tasks, using only 127GB of disk space to encode 3 billion tokens in Wikipedia.
    Abstract Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks. However, running retrieval-augmented language models (LMs) is slow and difficult to scale due to processing large amounts of retrieved text. We introduce binary token representations (BTR), which use 1-bit vectors to precompute every token in passages, significantly reducing computation during inference. Despite the potential loss of accuracy, our new calibration techniques and training objectives restore performance. Combined with offline and runtime compression, this only requires 127GB of disk space for encoding 3 billion tokens in Wikipedia. Our experiments show that on five knowledge-intensive NLP tasks, BTR accelerates state-of-the-art inference by up to 4x and reduces storage by over 100x while maintaining over 95% task performance.
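The 1-bit precomputation idea can be illustrated with a small sketch that binarizes token representations by sign and packs the bits for storage. This is an assumption-laden toy, not the paper's implementation: the calibration techniques and training objectives that restore accuracy, and the additional offline/runtime compression, are not shown.

```python
import numpy as np

def binarize_token_reps(reps: np.ndarray) -> np.ndarray:
    """Quantise real-valued token representations to packed 1-bit vectors.

    Each dimension is reduced to its sign and packed 8-per-byte, cutting
    storage by ~32x relative to float32 before any further compression.
    """
    bits = (reps > 0).astype(np.uint8)        # 1 bit per dimension
    return np.packbits(bits, axis=-1)

def unpack_to_signs(packed: np.ndarray, dim: int) -> np.ndarray:
    """Recover {-1, +1} vectors for downstream scoring."""
    bits = np.unpackbits(packed, axis=-1)[..., :dim]
    return bits.astype(np.float32) * 2.0 - 1.0

reps = np.random.randn(5, 128).astype(np.float32)    # 5 tokens, 128-dim
packed = binarize_token_reps(reps)
print(reps.nbytes, "->", packed.nbytes, "bytes")       # 2560 -> 80
```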

TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series

  • paper_url: http://arxiv.org/abs/2310.01327
  • repo_url: None
  • paper_authors: Arjun Ashok, Étienne Marcotte, Valentina Zantedeschi, Nicolas Chapados, Alexandre Drouin
  • for: Designing a model for multivariate probabilistic time-series prediction that can flexibly address a range of tasks, including forecasting, interpolation, and their combinations.
  • methods: Building on copula theory, the paper proposes a simplified objective for the previously introduced transformer-based attentional copulas (TACTiS), in which the number of distributional parameters scales linearly rather than factorially with the number of variables; this requires changes to the original architecture and the introduction of a training curriculum.
  • results: The new model achieves state-of-the-art performance across diverse real-world forecasting tasks while retaining the flexibility of prior work, such as seamless handling of unaligned and unevenly sampled time series.
    Abstract We introduce a new model for multivariate probabilistic time series prediction, designed to flexibly address a range of tasks including forecasting, interpolation, and their combinations. Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS), wherein the number of distributional parameters now scales linearly with the number of variables instead of factorially. The new objective requires the introduction of a training curriculum, which goes hand-in-hand with necessary changes to the original architecture. We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks, while maintaining the flexibility of prior work, such as seamless handling of unaligned and unevenly-sampled time series.

FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01467
  • repo_url: None
  • paper_authors: Jingwei Sun, Ziyue Xu, Hongxu Yin, Dong Yang, Daguang Xu, Yiran Chen, Holger R. Roth
  • for: Fine-tuning large language models for downstream tasks while preserving privacy and security.
  • methods: Uses federated learning and proposes the Federated Black-box Prompt Tuning (FedBPT) framework, which trains optimal prompts with gradient-free optimization and does not require clients to access model parameters.
  • results: Experiments show that the framework drastically reduces communication and memory costs during tuning while maintaining competitive performance.
    Abstract Pre-trained language models (PLM) have revolutionized the NLP landscape, achieving stellar performances across diverse tasks. These models, while benefiting from vast training data, often require fine-tuning on specific data to cater to distinct downstream tasks. However, this data adaptation process has inherent security and privacy concerns, primarily when leveraging user-generated, device-residing data. Federated learning (FL) provides a solution, allowing collaborative model fine-tuning without centralized data collection. However, applying FL to finetune PLMs is hampered by challenges, including restricted model parameter access, high computational requirements, and communication overheads. This paper introduces Federated Black-box Prompt Tuning (FedBPT), a framework designed to address these challenges. FedBPT does not require the clients to access the model parameters. By focusing on training optimal prompts and utilizing gradient-free optimization methods, FedBPT reduces the number of exchanged variables, boosts communication efficiency, and minimizes computational and storage costs. Experiments highlight the framework's ability to drastically cut communication and memory costs while maintaining competitive performance. Ultimately, FedBPT presents a promising solution for efficient, privacy-preserving fine-tuning of PLM in the age of large language models.
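The gradient-free, parameter-free client update can be sketched as a simple black-box search over prompts. This is a deliberately simplified stand-in for whatever gradient-free optimizer the paper actually uses, and the federated aggregation step is omitted; `score_fn`, `vocab`, and the mutation scheme are illustrative assumptions only.

```python
import random

def local_black_box_prompt_search(score_fn, vocab, prompt, rounds=20, seed=0):
    """One client's gradient-free prompt update for a frozen black-box PLM.

    `score_fn(prompt) -> float` is a black-box call (e.g., validation
    accuracy on the client's private data); no model parameters or
    gradients are ever accessed or transmitted.
    """
    rng = random.Random(seed)
    best, best_score = list(prompt), score_fn(prompt)
    for _ in range(rounds):
        cand = list(best)
        cand[rng.randrange(len(cand))] = rng.choice(vocab)   # mutate one token
        s = score_fn(cand)
        if s > best_score:
            best, best_score = cand, s
    return best, best_score

# Toy objective: prefer prompts containing the token "positive".
vocab = ["the", "review", "is", "positive", "negative", "neutral"]
score = lambda p: p.count("positive") + 0.1 * p.count("review")
print(local_black_box_prompt_search(score, vocab, ["the", "the", "the", "the"]))
```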

Avalon’s Game of Thoughts: Battle Against Deception through Recursive Contemplation

  • paper_url: http://arxiv.org/abs/2310.01320
  • repo_url: None
  • paper_authors: Shenzhi Wang, Chang Liu, Zilong Zheng, Siyuan Qi, Shuo Chen, Qisen Yang, Andrew Zhao, Chaofei Wang, Shiji Song, Gao Huang
  • for: Exploring the capabilities of LLMs in environments permeated with deceptive or misleading information.
  • methods: Uses the intricate Avalon game as a testbed and introduces Recursive Contemplation (ReCon), a novel framework that improves LLMs' ability to identify and counteract deceptive information.
  • results: Experiments show that integrating ReCon with different LLMs improves their ability to discern and maneuver around deception in the Avalon game, without extra fine-tuning or data.
    Abstract Recent breakthroughs in large language models (LLMs) have brought remarkable success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information processed by LLMs is consistently honest, neglecting the pervasive deceptive or misleading information in human society and AI-generated content. This oversight makes LLMs susceptible to malicious manipulations, potentially resulting in detrimental outcomes. This study utilizes the intricate Avalon game as a testbed to explore LLMs' potential in deceptive environments. Avalon, full of misinformation and requiring sophisticated logic, manifests as a "Game-of-Thoughts". Inspired by the efficacy of humans' recursive thinking and perspective-taking in the Avalon game, we introduce a novel framework, Recursive Contemplation (ReCon), to enhance LLMs' ability to identify and counteract deceptive information. ReCon combines formulation and refinement contemplation processes; formulation contemplation produces initial thoughts and speech, while refinement contemplation further polishes them. Additionally, we incorporate first-order and second-order perspective transitions into these processes respectively. Specifically, the first-order allows an LLM agent to infer others' mental states, and the second-order involves understanding how others perceive the agent's mental state. After integrating ReCon with different LLMs, extensive experiment results from the Avalon game indicate its efficacy in aiding LLMs to discern and maneuver around deceptive information without extra fine-tuning and data. Finally, we offer a possible explanation for the efficacy of ReCon and explore the current limitations of LLMs in terms of safety, reasoning, speaking style, and format, potentially furnishing insights for subsequent research.

On the Generalization of Training-based ChatGPT Detection Methods

  • paper_url: http://arxiv.org/abs/2310.01307
  • repo_url: https://github.com/hannxu123/hcvar
  • paper_authors: Han Xu, Jie Ren, Pengfei He, Shenglai Zeng, Yingqian Cui, Amy Liu, Hui Liu, Jiliang Tang
  • for: Investigating the generalization behavior of methods for detecting ChatGPT-generated text, to guide the development of better detection methods.
  • methods: Collects a new dataset of human-written and ChatGPT-generated texts, then trains and evaluates classification models for detection under a wide range of distribution shifts.
  • results: Finds that existing detection methods can suffer from distribution shifts (e.g., in prompts, text lengths, topics, and language tasks) that degrade test-time effectiveness, and reports insights that can guide future methodologies and data-collection strategies.
    Abstract ChatGPT is one of the most popular language models which achieve amazing performance on various natural language tasks. Consequently, there is also an urgent need to detect the texts generated ChatGPT from human written. One of the extensively studied methods trains classification models to distinguish both. However, existing studies also demonstrate that the trained models may suffer from distribution shifts (during test), i.e., they are ineffective to predict the generated texts from unseen language tasks or topics. In this work, we aim to have a comprehensive investigation on these methods' generalization behaviors under distribution shift caused by a wide range of factors, including prompts, text lengths, topics, and language tasks. To achieve this goal, we first collect a new dataset with human and ChatGPT texts, and then we conduct extensive studies on the collected dataset. Our studies unveil insightful findings which provide guidance for developing future methodologies or data collection strategies for ChatGPT detection.

Generating Explanations in Medical Question-Answering by Expectation Maximization Inference over Evidence

  • paper_url: http://arxiv.org/abs/2310.01299
  • repo_url: None
  • paper_authors: Wei Sun, Mingxiao Li, Damien Sileo, Jesse Davis, Marie-Francine Moens
  • for: Assisting healthcare workers in finding answers to their questions.
  • methods: Proposes a new approach that extracts knowledge from medical textbooks and uses an expectation-maximization procedure to make inferences over textual evidence, improving the quality of generated explanations.
  • results: Experimental results demonstrate that the approach reasons effectively over textual evidence and achieves significant improvements over state-of-the-art models.
    Abstract Medical Question Answering~(medical QA) systems play an essential role in assisting healthcare workers in finding answers to their questions. However, it is not sufficient to merely provide answers by medical QA systems because users might want explanations, that is, more analytic statements in natural language that describe the elements and context that support the answer. To do so, we propose a novel approach for generating natural language explanations for answers predicted by medical QA systems. As high-quality medical explanations require additional medical knowledge, so that our system extract knowledge from medical textbooks to enhance the quality of explanations during the explanation generation process. Concretely, we designed an expectation-maximization approach that makes inferences about the evidence found in these texts, offering an efficient way to focus attention on lengthy evidence passages. Experimental results, conducted on two datasets MQAE-diag and MQAE, demonstrate the effectiveness of our framework for reasoning with textual evidence. Our approach outperforms state-of-the-art models, achieving a significant improvement of \textbf{6.86} and \textbf{9.43} percentage points on the Rouge-1 score; \textbf{8.23} and \textbf{7.82} percentage points on the Bleu-4 score on the respective datasets.

Co-audit: tools to help humans double-check AI-generated content

  • paper_url: http://arxiv.org/abs/2310.01297
  • repo_url: None
  • paper_authors: Andrew D. Gordon, Carina Negreanu, José Cambronero, Rasika Chakravarthy, Ian Drosos, Hao Fang, Bhaskar Mitra, Hannah Richardson, Advait Sarkar, Stephanie Simmons, Jack Williams, Ben Zorn
  • for: Emphasizing the importance of co-audit tools for generative AI applications where quality is crucial and errors have significant consequences, such as spreadsheet computations.
  • methods: Proposes a preliminary list of principles for co-audit tools and outlines research challenges for developing effective co-audit experiences.
  • results: Highlights the need for tool-assisted experiences that help users double-check AI-generated content for quality and correctness, as generative models produce increasingly complex output that is harder to audit.
    Abstract Users are increasingly being warned to check AI-generated content for correctness. Still, as LLMs (and other generative models) generate more complex output, such as summaries, tables, or code, it becomes harder for the user to audit or evaluate the output for quality or correctness. Hence, we are seeing the emergence of tool-assisted experiences to help the user double-check a piece of AI-generated content. We refer to these as co-audit tools. Co-audit tools complement prompt engineering techniques: one helps the user construct the input prompt, while the other helps them check the output response. As a specific example, this paper describes recent research on co-audit tools for spreadsheet computations powered by generative models. We explain why co-audit experiences are essential for any application of generative AI where quality is important and errors are consequential (as is common in spreadsheet computations). We propose a preliminary list of principles for co-audit, and outline research challenges.

Knowledge Crosswords: Geometric Reasoning over Structured Knowledge with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01290
  • repo_url: https://github.com/wenwen-d/knowledgecrosswords
  • paper_authors: Wenxuan Ding, Shangbin Feng, Yuhan Liu, Zhaoxuan Tan, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov
  • for: Testing whether large language models (LLMs) can perform geometric reasoning over structured knowledge, i.e., filling in missing entities in an incomplete entity network while satisfying interweaving factual constraints, across diverse knowledge domains.
  • methods: Proposes Knowledge Crosswords, a multi-blank QA dataset, together with two new prompting approaches, Staged Prompting and Verify-All, designed to augment LLMs' ability to backtrack and verify structured constraints.
  • results: Baseline approaches perform well on easier problems but struggle with hard ones, while the proposed Verify-All outperforms other methods by a large margin and is more robust on hard problems; the analysis also shows that LLMs' geometric reasoning over structured knowledge is still far from robust, being susceptible to confounders such as option order and certain structural patterns.
    Abstract Large language models (LLMs) are widely adopted in knowledge-intensive tasks and have achieved impressive performance thanks to their knowledge abilities. While LLMs have demonstrated outstanding performance on atomic or linear (multi-hop) QA tasks, whether they can reason in knowledge-rich scenarios with interweaving constraints remains an underexplored problem. In this work, we propose geometric reasoning over structured knowledge, where pieces of knowledge are connected in a graph structure and models need to fill in the missing information. Such geometric knowledge reasoning would require the ability to handle structured knowledge, reason with uncertainty, verify facts, and backtrack when an error occurs. We propose Knowledge Crosswords, a multi-blank QA dataset where each problem consists of a natural language question representing the geometric constraints of an incomplete entity network, where LLMs are tasked with working out the missing entities while meeting all factual constraints. Knowledge Crosswords contains 2,101 individual problems, covering various knowledge domains and further divided into three difficulty levels. We conduct extensive experiments to evaluate existing LLM prompting approaches on the Knowledge Crosswords benchmark. We additionally propose two new approaches, Staged Prompting and Verify-All, to augment LLMs' ability to backtrack and verify structured constraints. Our results demonstrate that while baseline approaches perform well on easier problems but struggle with hard ones, our proposed Verify-All outperforms other methods by a large margin and is more robust with hard problems. Further analysis reveals that LLMs' ability of geometric reasoning over structured knowledge is still far from robust or perfect, susceptible to confounders such as the order of options, certain structural patterns, assumption of existence of correct answer, and more.

Grasping AI: experiential exercises for designers

  • paper_url: http://arxiv.org/abs/2310.01282
  • repo_url: None
  • paper_authors: Dave Murray-Rust, Maria Luce Lupetti, Iohanna Nicenboim, Wouter van der Hoog
  • for: Helping designers reason about the interactional affordances and societal implications of AI so they can design AI systems more responsibly.
  • methods: Introduces nine 'AI exercises' into an interaction design course (n=100), drawing on more-than-human design, responsible AI, and speculative enactment to create experiential engagements around AI interaction design.
  • results: Finds that exercises around metaphors and enactments make questions of training and learning, privacy and consent, and autonomy and agency more tangible, helping students design with AI more reflectively and responsibly.
    Abstract Artificial intelligence (AI) and machine learning (ML) are increasingly integrated into the functioning of physical and digital products, creating unprecedented opportunities for interaction and functionality. However, there is a challenge for designers to ideate within this creative landscape, balancing the possibilities of technology with human interactional concerns. We investigate techniques for exploring and reflecting on the interactional affordances, the unique relational possibilities, and the wider social implications of AI systems. We introduced into an interaction design course (n=100) nine 'AI exercises' that draw on more than human design, responsible AI, and speculative enactment to create experiential engagements around AI interaction design. We find that exercises around metaphors and enactments make questions of training and learning, privacy and consent, autonomy and agency more tangible, and thereby help students be more reflective and responsible on how to design with AI and its complex properties in both their design process and outcomes.

A Comparison of Mesh-Free Differentiable Programming and Data-Driven Strategies for Optimal Control under PDE Constraints

  • paper_url: http://arxiv.org/abs/2310.02286
  • repo_url: None
  • paper_authors: Roussel Desmond Nzoyem, David A. W. Barton, Tom Deakin
  • for: Studying optimal control under partial differential equation (PDE) constraints, a field that is rapidly changing under the influence of deep learning and the accompanying automatic-differentiation libraries.
  • methods: Compares Direct-Adjoint Looping (DAL), Physics-Informed Neural Networks (PINNs), and Differentiable Programming (DP) using a general-purpose mesh-free differentiable PDE solver based on radial basis functions.
  • results: Under the Laplace and Navier-Stokes equations, DP produces the most accurate gradients and thrives even where DAL fails and PINNs struggle; the paper also provides a detailed benchmark highlighting the limited conditions under which each method can be used efficiently, serving as a guide for optimal-control practitioners.
    Abstract The field of Optimal Control under Partial Differential Equations (PDE) constraints is rapidly changing under the influence of Deep Learning and the accompanying automatic differentiation libraries. Novel techniques like Physics-Informed Neural Networks (PINNs) and Differentiable Programming (DP) are to be contrasted with established numerical schemes like Direct-Adjoint Looping (DAL). We present a comprehensive comparison of DAL, PINN, and DP using a general-purpose mesh-free differentiable PDE solver based on Radial Basis Functions. Under Laplace and Navier-Stokes equations, we found DP to be extremely effective as it produces the most accurate gradients; thriving even when DAL fails and PINNs struggle. Additionally, we provide a detailed benchmark highlighting the limited conditions under which any of those methods can be efficiently used. Our work provides a guide to Optimal Control practitioners and connects them further to the Deep Learning community.

A Unified View on Neural Message Passing with Opinion Dynamics for Social Networks

  • paper_url: http://arxiv.org/abs/2310.01272
  • repo_url: None
  • paper_authors: Outongyi Lv, Bingxin Zhou, Jing Wang, Xiang Xiao, Weishu Zhao, Lirong Zheng
  • for: Analyzing and understanding dynamic systems on social networks, where opinions are exchanged and information propagates among social actors along their social ties.
  • methods: Harmonizes neural message passing with concepts from sociometry and opinion dynamics, proposing the ODNet message-passing scheme, which incorporates bounded confidence and adjusts the influence weights of local nodes.
  • results: ODNet improves representation-learning performance across various graph types and alleviates oversmoothing; it can also simplify social network graphs using only the interaction frequencies among entities, accurately identifying communities and the roles of genes in metabolic pathways.
    Abstract Social networks represent a common form of interconnected data frequently depicted as graphs within the domain of deep learning-based inference. These communities inherently form dynamic systems, achieving stability through continuous internal communications and opinion exchanges among social actors along their social ties. In contrast, neural message passing in deep learning provides a clear and intuitive mathematical framework for understanding information propagation and aggregation among connected nodes in graphs. Node representations are dynamically updated by considering both the connectivity and status of neighboring nodes. This research harmonizes concepts from sociometry and neural message passing to analyze and infer the behavior of dynamic systems. Drawing inspiration from opinion dynamics in sociology, we propose ODNet, a novel message passing scheme incorporating bounded confidence, to refine the influence weight of local nodes for message propagation. We adjust the similarity cutoffs of bounded confidence and influence weights of ODNet and define opinion exchange rules that align with the characteristics of social network graphs. We show that ODNet enhances prediction performance across various graph types and alleviates oversmoothing issues. Furthermore, our approach surpasses conventional baselines in graph representation learning and proves its practical significance in analyzing real-world co-occurrence networks of metabolic genes. Remarkably, our method simplifies complex social network graphs solely by leveraging knowledge of interaction frequencies among entities within the system. It accurately identifies internal communities and the roles of genes in different metabolic pathways, including opinion leaders, bridge communicators, and isolators.
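The bounded-confidence idea borrowed from opinion dynamics can be sketched on a scalar toy problem: a node aggregates only neighbours whose current states lie within a confidence bound. This is an illustrative simplification, assuming scalar opinions and fixed weights, whereas ODNet operates on learned node features with learnable influence weights.

```python
import numpy as np

def bounded_confidence_step(x, adj, eps=1.0, self_weight=0.5):
    """One opinion-dynamics-style propagation step with bounded confidence.

    A node only aggregates neighbours whose states lie within distance
    `eps` (the confidence bound); all other neighbours are ignored.
    """
    n = x.shape[0]
    new_x = x.copy()
    for v in range(n):
        nbrs = [u for u in range(n) if adj[v, u] and abs(x[u] - x[v]) <= eps]
        if nbrs:
            new_x[v] = self_weight * x[v] + (1 - self_weight) * np.mean(x[nbrs])
    return new_x

# Toy: 4 fully connected nodes; the outlier opinion (10.0) is ignored.
adj = np.ones((4, 4)) - np.eye(4)
x = np.array([0.0, 0.5, 1.0, 10.0])
print(bounded_confidence_step(x, adj, eps=1.0))
```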

Cooperative Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01267
  • repo_url: https://github.com/zhangxiaochen95/uav_bs_ctrl
  • paper_authors: Ben Finkelshtein, Xingyue Huang, Michael Bronstein, İsmail İlkan Ceylan
  • for: Proposing a new framework for training graph neural networks in which every node can choose its own strategy for processing information.
  • methods: Introduces a novel message-passing scheme in which each node chooses to 'listen', 'broadcast', 'listen and broadcast', or 'isolate'; the standard paradigm, where every node listens and broadcasts to all neighbors at every layer, is a special case of this framework.
  • results: The resulting approach offers a more flexible and dynamic message-passing paradigm that better exploits the graph topology during learning; a theoretical analysis and extensive experiments on synthetic and real-world datasets support its effectiveness.
    Abstract Graph neural networks are popular architectures for graph machine learning, based on iterative computation of node representations of an input graph through a series of invariant transformations. A large class of graph neural networks follow a standard message-passing paradigm: at every layer, each node state is updated based on an aggregate of messages from its neighborhood. In this work, we propose a novel framework for training graph neural networks, where every node is viewed as a player that can choose to either 'listen', 'broadcast', 'listen and broadcast', or to 'isolate'. The standard message propagation scheme can then be viewed as a special case of this framework where every node 'listens and broadcasts' to all neighbors. Our approach offers a more flexible and dynamic message-passing paradigm, where each node can determine its own strategy based on their state, effectively exploring the graph topology while learning. We provide a theoretical analysis of the new message-passing scheme which is further supported by an extensive empirical analysis on a synthetic dataset and on real-world datasets.
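A minimal sketch of action-conditioned message passing follows. Here the per-node actions are given as inputs and aggregation is a plain mean, both simplifying assumptions; in the paper the actions come from learned per-node policies, and the update functions are trained layers.

```python
import numpy as np

def coop_message_passing(x, adj, actions):
    """One layer of action-conditioned message passing (toy sketch).

    `actions[v]` is one of {"listen", "broadcast", "both", "isolate"}.
    A node receives only from neighbours that broadcast, and only if it
    itself listens; standard message passing is the special case where
    every node takes the "both" action.
    """
    listens = {"listen", "both"}
    broadcasts = {"broadcast", "both"}
    n = x.shape[0]
    out = x.copy()
    for v in range(n):
        if actions[v] not in listens:
            continue
        msgs = [x[u] for u in range(n) if adj[v, u] and actions[u] in broadcasts]
        if msgs:
            out[v] = x[v] + np.mean(msgs, axis=0)   # simple residual aggregation
    return out

x = np.eye(3)                                       # 3 nodes, 3-d features
adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
print(coop_message_passing(x, adj, ["both", "isolate", "broadcast"]))
```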

SPELL: Semantic Prompt Evolution based on a LLM

  • paper_url: http://arxiv.org/abs/2310.01260
  • repo_url: None
  • paper_authors: Yujian Betterest Li, Kai Wu
  • for: Enhancing the performance of trained neural network models through prompt engineering.
  • methods: Proposes SPELL (Semantic Prompt Evolution based on a LLM), a black-box evolution algorithm that uses a trained large language model as a text generator to automatically optimize text-style prompts.
  • results: Evaluations with different LLMs and evolution parameters on different text tasks show that SPELL can rapidly improve prompts; the paper also discusses limitations and future directions.
    Abstract Prompt engineering is a new paradigm for enhancing the performance of trained neural network models. For optimizing text-style prompts, existing methods usually individually operate small portions of a text step by step, which either breaks the fluency or could not globally adjust a prompt. Since large language models (LLMs) have powerful ability of generating coherent texts token by token, can we utilize LLMs for improving prompts? Based on this motivation, in this paper, considering a trained LLM as a text generator, we attempt to design a black-box evolution algorithm for automatically optimizing texts, namely SPELL (Semantic Prompt Evolution based on a LLM). The proposed method is evaluated with different LLMs and evolution parameters in different text tasks. Experimental results show that SPELL could rapidly improve the prompts indeed. We further explore the evolution process and discuss on the limitations, potential possibilities and future work.
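An evolution loop in which the LLM acts as the variation operator can be sketched as below. `llm_rewrite` and `score_fn` are placeholders for an LLM call and a task metric, not any specific vendor API, and the selection scheme is an illustrative choice rather than the paper's exact recipe.

```python
import random

def spell_style_evolution(llm_rewrite, score_fn, seed_prompts,
                          generations=5, pop_size=6, seed=0):
    """Minimal evolutionary prompt optimization with an LLM as mutator.

    `llm_rewrite(prompt) -> str` asks a trained LLM for a coherent semantic
    variant of a prompt; `score_fn(prompt) -> float` measures task
    performance when that prompt is used.
    """
    rng = random.Random(seed)
    population = list(seed_prompts)
    for _ in range(generations):
        parents = sorted(population, key=score_fn, reverse=True)[: max(2, pop_size // 2)]
        children = [llm_rewrite(rng.choice(parents)) for _ in range(pop_size - len(parents))]
        population = parents + children                 # elitist survivor selection
    return max(population, key=score_fn)

# Toy stand-ins so the sketch runs end to end.
llm_rewrite = lambda p: p + " Be concise."              # pretend LLM paraphrase
score_fn = lambda p: len(set(p.split()))                # pretend task metric
print(spell_style_evolution(llm_rewrite, score_fn, ["Classify the sentiment."]))
```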

Faster and Accurate Neural Networks with Semantic Inference

  • paper_url: http://arxiv.org/abs/2310.01259
  • repo_url: None
  • paper_authors: Sazzad Sayyed, Jonathan Ashdown, Francesco Restuccia
  • for: Reducing the computational burden of deep neural networks (DNNs) without a drastic loss in accuracy.
  • methods: Proposes Semantic Inference (SINF), which uses a small additional classifier to identify the semantic cluster an input belongs to and then executes only the cluster-specific subgraph of the base DNN; it also introduces the Discriminative Capability Score (DCS) for extracting such subgraphs, which can be applied to any DNN independently of SINF.
  • results: SINF reduces the inference time of VGG16, VGG19, and ResNet50 with only small accuracy loss; DCS achieves higher accuracy than existing discriminative scores on these networks, and when used as a pruning criterion it yields up to 8.13% accuracy gain with 5.82% fewer parameters than prior state of the art; per-cluster accuracy also improves over the base networks.
    Abstract Deep neural networks (DNN) usually come with a significant computational burden. While approaches such as structured pruning and mobile-specific DNNs have been proposed, they incur drastic accuracy loss. In this paper we leverage the intrinsic redundancy in latent representations to reduce the computational load with limited loss in performance. We show that semantically similar inputs share many filters, especially in the earlier layers. Thus, semantically similar classes can be clustered to create cluster-specific subgraphs. To this end, we propose a new framework called Semantic Inference (SINF). In short, SINF (i) identifies the semantic cluster the object belongs to using a small additional classifier and (ii) executes the subgraph extracted from the base DNN related to that semantic cluster for inference. To extract each cluster-specific subgraph, we propose a new approach named Discriminative Capability Score (DCS) that finds the subgraph with the capability to discriminate among the members of a specific semantic cluster. DCS is independent from SINF and can be applied to any DNN. We benchmark the performance of DCS on the VGG16, VGG19, and ResNet50 DNNs trained on the CIFAR100 dataset against 6 state-of-the-art pruning approaches. Our results show that (i) SINF reduces the inference time of VGG19, VGG16, and ResNet50 respectively by up to 35%, 29% and 15% with only 0.17%, 3.75%, and 6.75% accuracy loss (ii) DCS achieves respectively up to 3.65%, 4.25%, and 2.36% better accuracy with VGG16, VGG19, and ResNet50 with respect to existing discriminative scores (iii) when used as a pruning criterion, DCS achieves up to 8.13% accuracy gain with 5.82% less parameters than the existing state of the art work published at ICLR 2023 (iv) when considering per-cluster accuracy, SINF performs on average 5.73%, 8.38% and 6.36% better than the base VGG16, VGG19, and ResNet50.
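The two-step routing at inference time can be sketched as follows. The cluster classifier and the cluster-specific "subgraphs" are stand-ins (here, tiny linear maps); the subgraph-extraction step itself, which the paper performs offline with DCS, is not shown.

```python
import numpy as np

def semantic_inference(x, cluster_classifier, cluster_subgraphs):
    """SINF-style two-step inference with placeholder components.

    1. A small auxiliary classifier predicts the semantic cluster of x.
    2. Only the subgraph of the base DNN associated with that cluster is run.
    """
    cluster_id = cluster_classifier(x)
    return cluster_subgraphs[cluster_id](x)

# Toy stand-ins: two "subgraphs" are just different linear maps.
rng = np.random.default_rng(0)
W = [rng.normal(size=(4, 3)), rng.normal(size=(4, 3))]
cluster_classifier = lambda x: int(x.sum() > 0)          # tiny auxiliary classifier
cluster_subgraphs = {i: (lambda x, Wi=W[i]: x @ Wi) for i in range(2)}
x = np.ones(4)
print(semantic_inference(x, cluster_classifier, cluster_subgraphs))
```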

Pre-training Contextual Location Embeddings in Personal Trajectories via Efficient Hierarchical Location Representations

  • paper_url: http://arxiv.org/abs/2310.01252
  • repo_url: None
  • paper_authors: Chung Park, Taesan Kim, Junui Hong, Minsung Choi, Jaegul Choo
  • for: Improving the effectiveness of location embeddings for location-based services (LBS) by addressing the practical problem of modeling very large numbers of distinct locations.
  • methods: Proposes the Geo-Tokenizer, which reduces the number of locations to be trained by representing each location as a combination of grids at multiple scales, together with a Hierarchical Auto-regressive Location Model objective for efficiently pre-training the decomposed locations.
  • results: Experiments on two real-world user-trajectory datasets show that the model significantly improves downstream-task performance with fewer model parameters than existing location-embedding methods.
    Abstract Pre-training the embedding of a location generated from human mobility data has become a popular method for location based services. In practice, modeling the location embedding is too expensive, due to the large number of locations to be trained in situations with fine-grained resolution or extensive target regions. Previous studies have handled less than ten thousand distinct locations, which is insufficient in the real-world applications. To tackle this problem, we propose a Geo-Tokenizer, designed to efficiently reduce the number of locations to be trained by representing a location as a combination of several grids at different scales. In the Geo-Tokenizer, a grid at a larger scale shares the common set of grids at smaller scales, which is a key factor in reducing the size of the location vocabulary. The sequences of locations preprocessed with the Geo-Tokenizer are utilized by a causal location embedding model to capture the temporal dependencies of locations. This model dynamically calculates the embedding vector of a target location, which varies depending on its trajectory. In addition, to efficiently pre-train the location embedding model, we propose the Hierarchical Auto-regressive Location Model objective to effectively train decomposed locations in the Geo-Tokenizer. We conducted experiments on two real-world user trajectory datasets using our pre-trained location model. The experimental results show that our model significantly improves the performance of downstream tasks with fewer model parameters compared to existing location embedding methods.
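A compact sketch of the multi-scale gridding idea: a point is encoded as one grid-cell id per scale, so coarse cells are shared by many locations and the effective vocabulary stays small. The cell sizes and the (row, col) encoding are illustrative assumptions, not the paper's exact gridding.

```python
def geo_tokenize(lat, lon, cell_sizes_deg=(1.0, 0.1, 0.01)):
    """Represent a location as a combination of grid tokens at several scales.

    Returns one (scale, row, col) token per scale; larger-scale cells are
    shared across many fine-grained locations, shrinking the vocabulary.
    """
    tokens = []
    for size in cell_sizes_deg:
        row = int((lat + 90.0) // size)
        col = int((lon + 180.0) // size)
        tokens.append((size, row, col))
    return tokens

print(geo_tokenize(37.5665, 126.9780))   # Seoul: one token per scale
```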

PASTA: PArallel Spatio-Temporal Attention with spatial auto-correlation gating for fine-grained crowd flow prediction

  • paper_url: http://arxiv.org/abs/2310.02284
  • repo_url: None
  • paper_authors: Chung Park, Junui Hong, Cheonbok Park, Taesan Kim, Minsung Choi, Jaegul Choo
  • for: Predicting future city-wide flows of people and vehicles.
  • methods: Proposes the PASTA neural network, which combines spatial auto-correlation gating, a multi-scale residual block, and a temporal attention gating module to capture fine-grained spatio-temporal patterns.
  • results: The model outperforms competing baselines, particularly under challenging conditions involving irregular spatial regions, and a qualitative analysis identifies the critical time steps to which the model assigns high attention in its predictions.
    Abstract Understanding the movement patterns of objects (e.g., humans and vehicles) in a city is essential for many applications, including city planning and management. This paper proposes a method for predicting future city-wide crowd flows by modeling the spatio-temporal patterns of historical crowd flows in fine-grained city-wide maps. We introduce a novel neural network named PArallel Spatio-Temporal Attention with spatial auto-correlation gating (PASTA) that effectively captures the irregular spatio-temporal patterns of fine-grained maps. The novel components in our approach include spatial auto-correlation gating, multi-scale residual block, and temporal attention gating module. The spatial auto-correlation gating employs the concept of spatial statistics to identify irregular spatial regions. The multi-scale residual block is responsible for handling multiple range spatial dependencies in the fine-grained map, and the temporal attention gating filters out irrelevant temporal information for the prediction. The experimental results demonstrate that our model outperforms other competing baselines, especially under challenging conditions that contain irregular spatial regions. We also provide a qualitative analysis to derive the critical time information where our model assigns high attention scores in prediction.

ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale

  • paper_url: http://arxiv.org/abs/2310.01217
  • repo_url: https://github.com/cpjku/scalearn
  • paper_authors: Markus Frohmann, Carolin Holtermann, Shahed Masoudian, Anne Lauscher, Navid Rekabsaz
  • for: Making multi-task learning (MTL) with pre-trained language models (PLMs) more parameter-efficient.
  • methods: Follows a two-stage formulation: (i) task learning, where task-specific knowledge is encapsulated in sets of parameters such as adapters, and (ii) transfer, where the learned knowledge is leveraged for a target task; ScaLearn learns only a minimal set of scaling parameters over source-adapter outputs for the transfer stage.
  • results: On three benchmarks (GLUE, SuperGLUE, HumSet), ScaLearn consistently outperforms strong baselines while using roughly 0.35% of the transfer parameters of AdapterFusion, and it remains competitive even when further reduced, via uniform scaling and layer sharing, to only 8 transfer parameters per target task.
    Abstract Multi-task learning (MTL) has shown considerable practical benefits, particularly when using pre-trained language models (PLMs). While this is commonly achieved by simultaneously learning $n$ tasks under a joint optimization procedure, recent methods such as AdapterFusion structure the problem into two distinct stages: (i) task learning, where knowledge specific to a task is encapsulated within sets of parameters (\eg adapters), and (ii) transfer, where this already learned knowledge is leveraged for a target task. This separation of concerns provides numerous benefits, such as promoting reusability, and addressing cases involving data privacy and societal concerns; on the flip side, current two-stage MTL methods come with the cost of introducing a substantial number of additional parameters. In this work, we address this issue by leveraging the usefulness of linearly scaling the output representations of source adapters for transfer learning. We introduce ScaLearn, a simple and highly parameter-efficient two-stage MTL method that capitalizes on the knowledge of the source tasks by learning a minimal set of scaling parameters that enable effective knowledge transfer to a target task. Our experiments on three benchmarks (GLUE, SuperGLUE, and HumSet) show that our ScaLearn, in addition to facilitating the benefits of two-stage MTL, consistently outperforms strong baselines with only a small number of transfer parameters - roughly 0.35% of those of AdapterFusion. Remarkably, we observe that ScaLearn maintains its strong abilities even when further reducing parameters through uniform scaling and layer-sharing, achieving similarly competitive results with only $8$ transfer parameters for each target task. Our proposed approach thus demonstrates the power of simple scaling as a promise for more efficient task transfer.
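The transfer step can be sketched as a learned convex-style combination of frozen source-adapter outputs, which is where the tiny parameter budget comes from. This is a minimal sketch with hypothetical names, assuming one scalar per source task; the paper also studies other scaling granularities.

```python
import numpy as np

def scalearn_combine(source_adapter_outputs, scaling_params):
    """Combine frozen source-adapter outputs with learned scaling parameters.

    Only the scaling coefficients are trained for the target task; the
    adapters themselves stay fixed, so very few transfer parameters are
    needed.
    """
    w = np.asarray(scaling_params, dtype=np.float64)
    return sum(wi * h for wi, h in zip(w, source_adapter_outputs))

# Toy: three source adapters producing 8-dim representations for one token.
rng = np.random.default_rng(1)
outputs = [rng.normal(size=8) for _ in range(3)]
combined = scalearn_combine(outputs, scaling_params=[0.7, 0.2, 0.1])
print(combined.shape)   # (8,)
```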

Learn to Follow: Decentralized Lifelong Multi-agent Pathfinding via Planning and Learning

  • paper_url: http://arxiv.org/abs/2310.01207
  • repo_url: None
  • paper_authors: Alexey Skrynnik, Anton Andreychuk, Maria Nesterova, Konstantin Yakovlev, Aleksandr Panov
  • for: Decentralized lifelong multi-agent pathfinding (MAPF), where agents on a graph must find conflict-free paths and are continuously assigned new goals, without a central controller.
  • methods: Combines planning and reinforcement learning: heuristic search constructs and re-plans individual paths (with a dedicated technique to avoid congestion and increase throughput), while reinforcement learning discovers collision-avoidance policies that guide agents along their paths.
  • results: Across a wide range of setups, the method consistently outperforms learnable competitors in throughput and generalizes better to maps unseen during training; it also outperforms a rule-based solver in throughput and is an order of magnitude faster than a state-of-the-art search-based solver.
    Abstract Multi-agent Pathfinding (MAPF) problem generally asks to find a set of conflict-free paths for a set of agents confined to a graph and is typically solved in a centralized fashion. Conversely, in this work, we investigate the decentralized MAPF setting, when the central controller that posses all the information on the agents' locations and goals is absent and the agents have to sequientially decide the actions on their own without having access to a full state of the environment. We focus on the practically important lifelong variant of MAPF, which involves continuously assigning new goals to the agents upon arrival to the previous ones. To address this complex problem, we propose a method that integrates two complementary approaches: planning with heuristic search and reinforcement learning through policy optimization. Planning is utilized to construct and re-plan individual paths. We enhance our planning algorithm with a dedicated technique tailored to avoid congestion and increase the throughput of the system. We employ reinforcement learning to discover the collision avoidance policies that effectively guide the agents along the paths. The policy is implemented as a neural network and is effectively trained without any reward-shaping or external guidance. We evaluate our method on a wide range of setups comparing it to the state-of-the-art solvers. The results show that our method consistently outperforms the learnable competitors, showing higher throughput and better ability to generalize to the maps that were unseen at the training stage. Moreover our solver outperforms a rule-based one in terms of throughput and is an order of magnitude faster than a state-of-the-art search-based solver.

appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit

  • paper_url: http://arxiv.org/abs/2310.01206
  • repo_url: https://github.com/hitachi-nlp/appjsonify
  • paper_authors: Atsuki Yamaguchi, Terufumi Morishita
  • for: Providing a Python-based PDF-to-JSON conversion toolkit for academic papers.
  • methods: Parses PDF files using several visual document-layout-analysis models and rule-based text-processing approaches.
  • results: Users can easily configure the processing pipeline to handle the specific paper format they wish to process; the toolkit is publicly available via PyPI and GitHub.
    Abstract We present appjsonify, a Python-based PDF-to-JSON conversion toolkit for academic papers. It parses a PDF file using several visual-based document layout analysis models and rule-based text processing approaches. appjsonify is a flexible tool that allows users to easily configure the processing pipeline to handle a specific format of a paper they wish to process. We are publicly releasing appjsonify as an easy-to-install toolkit available via PyPI and GitHub.

Quantifying the Plausibility of Context Reliance in Neural Machine Translation

  • paper_url: http://arxiv.org/abs/2310.01188
  • repo_url: None
  • paper_authors: Gabriele Sarti, Grzegorz Chrupała, Malvina Nissim, Arianna Bisazza
  • for: Establishing whether language models use contextual information in a human-plausible way, which is important for their safe adoption in real-world settings.
  • methods: Introduces PECoRe, an end-to-end interpretability framework that leverages model internals to contrastively identify context-sensitive tokens in generated text and link them to the contextual cues that justify their prediction.
  • results: Applying PECoRe to context-aware machine-translation models, the paper compares model rationales with human annotations across several discourse-level phenomena and uses the method on unannotated generations to surface instances of (im)plausible context usage.
    Abstract Establishing whether language models can use contextual information in a human-plausible way is important to ensure their safe adoption in real-world settings. However, the questions of when and which parts of the context affect model generations are typically tackled separately, and current plausibility evaluations are practically limited to a handful of artificial benchmarks. To address this, we introduce Plausibility Evaluation of Context Reliance (PECoRe), an end-to-end interpretability framework designed to quantify context usage in language models' generations. Our approach leverages model internals to (i) contrastively identify context-sensitive target tokens in generated texts and (ii) link them to contextual cues justifying their prediction. We use PECoRe to quantify the plausibility of context-aware machine translation models, comparing model rationales with human annotations across several discourse-level phenomena. Finally, we apply our method to unannotated generations to identify context-mediated predictions and highlight instances of (im)plausible context usage in model translations.
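The contrastive identification step can be illustrated by comparing a model's token probabilities with and without the preceding context. The log-ratio metric and threshold below are illustrative choices, not necessarily the paper's exact ones, and the second step (linking flagged tokens back to supporting context cues) is not shown.

```python
import math

def context_sensitive_tokens(p_with_ctx, p_without_ctx, threshold=1.0):
    """Flag generated tokens whose prediction shifts most when context is removed.

    `p_with_ctx[i]` / `p_without_ctx[i]` are the model probabilities of the
    i-th generated token with and without the preceding context.
    """
    flagged = []
    for i, (p_c, p_n) in enumerate(zip(p_with_ctx, p_without_ctx)):
        score = math.log(p_c) - math.log(p_n)     # contrastive log-odds shift
        if abs(score) >= threshold:
            flagged.append((i, score))
    return flagged

# Toy: token 1 relies heavily on context (0.9 with it vs 0.05 without).
print(context_sensitive_tokens([0.4, 0.9, 0.3], [0.35, 0.05, 0.28]))
```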

NarrativePlay: Interactive Narrative Understanding

  • paper_url: http://arxiv.org/abs/2310.01459
  • repo_url: None
  • paper_authors: Runcong Zhao, Wenjia Zhang, Jiazheng Li, Lixing Zhu, Yanran Li, Yulan He, Lin Gui
  • for: Letting users role-play a fictional character and interact with other characters in narratives such as novels, in an immersive environment.
  • methods: Leverages large language models (LLMs) to generate human-like responses, guided by personality traits extracted from the narrative.
  • results: The system enhances user experience through auto-generated visual displays of narrative settings, character portraits, and character speech; it has been evaluated on detective and adventure stories.
    Abstract In this paper, we introduce NarrativePlay, a novel system that allows users to role-play a fictional character and interact with other characters in narratives such as novels in an immersive environment. We leverage Large Language Models (LLMs) to generate human-like responses, guided by personality traits extracted from narratives. The system incorporates auto-generated visual display of narrative settings, character portraits, and character speech, greatly enhancing user experience. Our approach eschews predefined sandboxes, focusing instead on main storyline events extracted from narratives from the perspective of a user-selected character. NarrativePlay has been evaluated on two types of narratives, detective and adventure stories, where users can either explore the world or improve their favorability with the narrative characters through conversations.

Graph Isomorphic Networks for Assessing Reliability of the Medium-Voltage Grid

  • paper_url: http://arxiv.org/abs/2310.01181
  • repo_url: https://github.com/charlottecvn/ginenergygrids
  • paper_authors: Charlotte Cambier van Nooten, Tom van de Poll, Sonja Füllhase, Jacco Heres, Tom Heskes, Yuliya Shapovalova
  • for: This paper aims to improve the reliability and efficiency of energy grid assessments by using Graph Isomorphic Networks (GINs) for n-1 assessments in medium voltage grids.
  • methods: The proposed GIN approach directly handles graph-structured data and utilises graph structure and data about stations/cables to generalise to unseen grids.
  • results: The GIN approach demonstrates faster and more reliable grid assessments than traditional mathematical optimisation methods, reducing prediction times by approximately a factor of 1000.
    Abstract Ensuring electricity grid reliability becomes increasingly challenging with the shift towards renewable energy and declining conventional capacities. Distribution System Operators (DSOs) aim to achieve grid reliability by verifying the n-1 principle, ensuring continuous operation in case of component failure. Electricity networks' complex graph-based data holds crucial information for n-1 assessment: graph structure and data about stations/cables. Unlike traditional machine learning methods, Graph Neural Networks (GNNs) directly handle graph-structured data. This paper proposes using Graph Isomorphic Networks (GINs) for n-1 assessments in medium voltage grids. The GIN framework is designed to generalise to unseen grids and utilise graph structure and data about stations/cables. The proposed GIN approach demonstrates faster and more reliable grid assessments than a traditional mathematical optimisation approach, reducing prediction times by approximately a factor of 1000. The findings offer a promising approach to address computational challenges and enhance the reliability and efficiency of energy grid assessments.
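For reference, a single Graph Isomorphism Network (GIN) layer implements h_v' = MLP((1 + eps) * h_v + sum over neighbours u of h_u). The sketch below shows that update with a toy MLP and adjacency matrix; it is a generic GIN layer under stated assumptions, not the paper's full n-1 assessment model.

```python
import numpy as np

def gin_layer(x, adj, mlp, eps=0.0):
    """One GIN layer: sum aggregation gives WL-style injective expressiveness."""
    agg = (1.0 + eps) * x + adj @ x          # self term plus neighbour sum
    return mlp(agg)

# Toy MLP and a 3-node line graph (e.g., stations connected by cables).
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 2))
mlp = lambda h: np.maximum(h @ W1, 0.0) @ W2
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = rng.normal(size=(3, 4))
print(gin_layer(x, adj, mlp).shape)          # (3, 2)
```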

Evolutionary Neural Architecture Search for Transformer in Knowledge Tracing

  • paper_url: http://arxiv.org/abs/2310.01180
  • repo_url: https://github.com/devilyangs/enas-kt
  • paper_authors: Shangshang Yang, Xiaoshan Yu, Ye Tian, Xueming Yan, Haiping Ma, Xingyi Zhang
  • for: This paper aims to improve the accuracy of knowledge tracing (KT) and to automate the selection of input features and operations via architecture search.
  • methods: The approach builds on the Transformer architecture, adds convolution operations to strengthen local context modelling, and uses an evolutionary neural architecture search to automatically select input features and decide where to apply which operation.
  • results: Experiments show that the method achieves the best results on the two largest and most challenging education datasets, with better accuracy and search efficiency than the plain Transformer architecture.
    Abstract Knowledge tracing (KT) aims to trace students' knowledge states by predicting whether students answer correctly on exercises. Despite the excellent performance of existing Transformer-based KT approaches, they are criticized for the manually selected input features for fusion and the defect of single global context modelling to directly capture students' forgetting behavior in KT, when the related records are distant from the current record in terms of time. To address the issues, this paper first considers adding convolution operations to the Transformer to enhance its local context modelling ability used for students' forgetting behavior, then proposes an evolutionary neural architecture search approach to automate the input feature selection and automatically determine where to apply which operation for achieving the balancing of the local/global context modelling. In the search space, the original global path containing the attention module in Transformer is replaced with the sum of a global path and a local path that could contain different convolutions, and the selection of input features is also considered. To search the best architecture, we employ an effective evolutionary algorithm to explore the search space and also suggest a search space reduction strategy to accelerate the convergence of the algorithm. Experimental results on the two largest and most challenging education datasets demonstrate the effectiveness of the architecture found by the proposed approach.
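The evolutionary search itself follows a standard select-mutate loop over architecture encodings. The sketch below is only a generic illustration: the candidate operation list `OPS`, the encoding, and the placeholder `fitness` are hypothetical, not the paper's search space or objective.

```python
# Generic evolutionary architecture search loop (illustrative; the paper's actual
# search space, encoding, and fitness evaluation are more elaborate).
import random

OPS = ["attention", "conv3", "conv5", "identity"]   # hypothetical candidate operations

def random_arch(num_slots=4):
    return [random.choice(OPS) for _ in range(num_slots)]

def mutate(arch, p=0.3):
    return [random.choice(OPS) if random.random() < p else op for op in arch]

def fitness(arch):
    # Placeholder: in practice, train/evaluate a KT model with this architecture
    # on held-out student-response data and return e.g. validation AUC.
    return sum(op != "identity" for op in arch) + random.random() * 0.1

def evolve(pop_size=10, generations=5):
    population = [random_arch() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]                 # keep the fittest half
        children = [mutate(random.choice(parents)) for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

print(evolve())
```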

Towards guarantees for parameter isolation in continual learning

  • paper_url: http://arxiv.org/abs/2310.01165
  • repo_url: None
  • paper_authors: Giulia Lanzillotta, Sidak Pal Singh, Benjamin F. Grewe, Thomas Hofmann
  • for: This work studies how catastrophic forgetting in deep learning can be addressed.
  • methods: It analyses parameter-isolation methods for continual learning through the geometry of the loss landscape and develops provable guarantees for them.
  • results: The study shows that parameter-isolation methods mitigate catastrophic forgetting and that provable guarantees against forgetting can be established for some of them.
    Abstract Deep learning has proved to be a successful paradigm for solving many challenges in machine learning. However, deep neural networks fail when trained sequentially on multiple tasks, a shortcoming known as catastrophic forgetting in the continual learning literature. Despite a recent flourish of learning algorithms successfully addressing this problem, we find that provable guarantees against catastrophic forgetting are lacking. In this work, we study the relationship between learning and forgetting by looking at the geometry of neural networks' loss landscape. We offer a unifying perspective on a family of continual learning algorithms, namely methods based on parameter isolation, and we establish guarantees on catastrophic forgetting for some of them.
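To make "parameter isolation" concrete, one common instance assigns each task a binary mask over the weights and freezes weights claimed by earlier tasks, so later training cannot overwrite them. The toy below sketches that mechanism only; it is not one of the specific algorithms the paper analyses, and the quadratic "tasks" are made up.

```python
# One common flavour of parameter isolation: each task gets a binary mask over the
# weights, and weights claimed by earlier tasks are frozen (illustrative sketch only).
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=100)
used = np.zeros(100, dtype=bool)          # weights already claimed by previous tasks

def train_task(grad_fn, steps=50, lr=0.1, budget=30):
    global weights, used
    free = np.flatnonzero(~used)
    mask = np.zeros(100, dtype=bool)
    mask[rng.choice(free, size=min(budget, free.size), replace=False)] = True
    for _ in range(steps):
        g = grad_fn(weights)
        weights = weights - lr * g * mask  # only this task's parameters move
    used |= mask                           # freeze them for all future tasks

# Toy quadratic objectives standing in for two tasks:
train_task(lambda w: w - 1.0)   # task A pulls its parameters towards 1
train_task(lambda w: w + 2.0)   # task B pulls a *disjoint* set towards -2
print(weights[:10])
```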

DINE: Dimensional Interpretability of Node Embeddings

  • paper_url: http://arxiv.org/abs/2310.01162
  • repo_url: https://github.com/simonepiaggesi/dine
  • paper_authors: Simone Piaggesi, Megha Khosla, André Panisson, Avishek Anand
  • for: This paper aims to make node embeddings interpretable so that the graph structure they encode can be better understood.
  • methods: It introduces new metrics for measuring the global interpretability of embedding dimensions and proposes DINE, a method that retrofits existing node embeddings to be more interpretable without sacrificing task performance.
  • results: Experiments show that DINE learns highly interpretable node embeddings while remaining effective for link prediction.
    Abstract Graphs are ubiquitous due to their flexibility in representing social and technological systems as networks of interacting elements. Graph representation learning methods, such as node embeddings, are powerful approaches to map nodes into a latent vector space, allowing their use for various graph tasks. Despite their success, only few studies have focused on explaining node embeddings locally. Moreover, global explanations of node embeddings remain unexplored, limiting interpretability and debugging potentials. We address this gap by developing human-understandable explanations for dimensions in node embeddings. Towards that, we first develop new metrics that measure the global interpretability of embedding vectors based on the marginal contribution of the embedding dimensions to predicting graph structure. We say that an embedding dimension is more interpretable if it can faithfully map to an understandable sub-structure in the input graph - like community structure. Having observed that standard node embeddings have low interpretability, we then introduce DINE (Dimension-based Interpretable Node Embedding), a novel approach that can retrofit existing node embeddings by making them more interpretable without sacrificing their task performance. We conduct extensive experiments on synthetic and real-world graphs and show that we can simultaneously learn highly interpretable node embeddings with effective performance in link prediction.
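The paper's global interpretability metrics score each embedding dimension by its marginal contribution to predicting graph structure. As a simplified proxy (not the paper's exact metric; the embeddings, edge lists, and crude AUC estimate below are purely illustrative), one can score each dimension by how well it alone separates edges from non-edges:

```python
# Simplified proxy for per-dimension interpretability: score each embedding dimension
# by how well it alone separates edges from non-edges (not the paper's exact metric).
import numpy as np

def dimension_scores(emb, edges, non_edges):
    """emb: (n_nodes, d); edges / non_edges: lists of (u, v) pairs."""
    scores = []
    for k in range(emb.shape[1]):
        pos = [emb[u, k] * emb[v, k] for u, v in edges]        # dimension-k products
        neg = [emb[u, k] * emb[v, k] for u, v in non_edges]
        # crude AUC estimate: probability a random edge outscores a random non-edge
        auc = np.mean([p > q for p in pos for q in neg])
        scores.append(auc)
    return np.array(scores)

rng = np.random.default_rng(0)
emb = rng.normal(size=(20, 8))
edges = [(i, (i + 1) % 20) for i in range(20)]
non_edges = [(i, (i + 7) % 20) for i in range(20)]
print(dimension_scores(emb, edges, non_edges))
```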

SWMLP: Shared Weight Multilayer Perceptron for Car Trajectory Speed Prediction using Road Topographical Features

  • paper_url: http://arxiv.org/abs/2310.02282
  • repo_url: None
  • paper_authors: Sarah Almeida Carneiro, Giovanni Chierchia, Jean Charléty, Aurélie Chataignon, Laurent Najman
  • for: To improve the accuracy and efficiency of traffic management, especially in regions where large historical traffic datasets are unavailable.
  • methods: Vehicle speed is predicted from trajectory road topographical features using a Shared Weight Multilayer Perceptron model.
  • results: The approach shows significant qualitative and quantitative improvements over standard regression analysis and suggests new ways to design traffic-analysis methods.
    Abstract Although traffic data is among the most massively collected data, it is often only available for specific regions. One concern is that, although studies report good results on these data, the data from these regions may not be sufficiently representative to describe all the traffic patterns in the rest of the world. To address this concern, we propose a speed prediction method that does not depend on large historical speed datasets. To predict a vehicle's speed, we fit a Shared Weight Multilayer Perceptron learning model to the trajectory's road topographical features. Our results show significant improvement, both qualitative and quantitative, over standard regression analysis. Moreover, the proposed framework sheds new light on how to design new approaches for traffic analysis.

Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives

  • paper_url: http://arxiv.org/abs/2310.01152
  • repo_url: https://github.com/git-disl/gptlens
  • paper_authors: Sihao Hu, Tiansheng Huang, Fatih İlhan, Selim Furkan Tekin, Ling Liu
  • for: This paper systematically analyses the opportunities, challenges, and potential solutions of using Large Language Models (LLMs) such as GPT-4 to detect vulnerabilities in smart contracts.
  • methods: It proposes GPTLens, an adversarial framework that splits detection into two synergistic stages, generation and discrimination, with the LLM playing the dual roles of auditor and critic: the auditor proposes a broad set of candidate vulnerabilities and the critic evaluates their validity.
  • results: Experiments and illustrative examples show that GPTLens effectively reduces false positives and can be used without specialist smart-contract expertise.
    Abstract This paper provides a systematic analysis of the opportunities, challenges, and potential solutions of harnessing Large Language Models (LLMs) such as GPT-4 to dig out vulnerabilities within smart contracts based on our ongoing research. For the task of smart contract vulnerability detection, achieving practical usability hinges on identifying as many true vulnerabilities as possible while minimizing the number of false positives. Nonetheless, our empirical study reveals contradictory yet interesting findings: generating more answers with higher randomness largely boosts the likelihood of producing a correct answer but inevitably leads to a higher number of false positives. To mitigate this tension, we propose an adversarial framework dubbed GPTLens that breaks the conventional one-stage detection into two synergistic stages $-$ generation and discrimination, for progressive detection and refinement, wherein the LLM plays dual roles, i.e., auditor and critic, respectively. The goal of auditor is to yield a broad spectrum of vulnerabilities with the hope of encompassing the correct answer, whereas the goal of critic that evaluates the validity of identified vulnerabilities is to minimize the number of false positives. Experimental results and illustrative examples demonstrate that auditor and critic work together harmoniously to yield pronounced improvements over the conventional one-stage detection. GPTLens is intuitive, strategic, and entirely LLM-driven without relying on specialist expertise in smart contracts, showcasing its methodical generality and potential to detect a broad spectrum of vulnerabilities. Our code is available at: https://github.com/git-disl/GPTLens.
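The auditor/critic split amounts to two rounds of prompting: one sampling diverse candidate vulnerabilities, one judging them. The sketch below shows that control flow only; the prompt wording and the `ask_llm` hook are hypothetical placeholders rather than the released GPTLens prompts.

```python
# Two-stage auditor/critic flow in the spirit of GPTLens (prompt wording and the
# `ask_llm` hook are hypothetical placeholders, not the released implementation).
from typing import Callable, List, Tuple

def audit(contract_src: str, ask_llm: Callable[[str], str], n_candidates: int = 3) -> List[str]:
    """Auditor stage: sample several candidate vulnerabilities with high diversity."""
    prompt = ("You are a smart contract auditor. Describe one plausible vulnerability "
              "in the following Solidity code, naming the affected function:\n" + contract_src)
    return [ask_llm(prompt) for _ in range(n_candidates)]

def criticize(contract_src: str, findings: List[str],
              ask_llm: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Critic stage: ask the model to judge each finding, filtering false positives."""
    judged = []
    for finding in findings:
        prompt = ("You are a critic. Given the contract:\n" + contract_src +
                  "\nIs the following reported vulnerability genuine? Answer yes/no "
                  "with a one-line justification.\n" + finding)
        judged.append((finding, ask_llm(prompt)))
    return judged

# Usage: plug in any chat-completion function, e.g.
#   results = criticize(src, audit(src, my_llm), my_llm)
```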

Adaptive Online Non-stochastic Control

  • paper_url: http://arxiv.org/abs/2310.02261
  • repo_url: None
  • paper_authors: Naram Mhaisen, George Iosifidis
  • for: This paper tackles the non-stochastic control problem with the goal of obtaining algorithms that adapt to the controlled environment.
  • methods: It tailors the FTRL framework to dynamical systems and designs novel regularisation techniques that account for the system's memory; these regularisers are further combined with untrusted predictions of future costs to obtain the first optimistic FTRL-based controller.
  • results: The paper establishes new sub-linear data-adaptive policy regret bounds, as well as a regret bound that adapts to the accuracy of the predictions.
    Abstract We tackle the problem of Non-stochastic Control with the aim of obtaining algorithms that adapt to the controlled environment. Namely, we tailor the FTRL framework to dynamical systems where the existence of a state, or equivalently a memory, couples the effect of the online decisions. By designing novel regularization techniques that take the system's memory into consideration, we obtain controllers with new sub-linear data adaptive policy regret bounds. Furthermore, we append these regularizers with untrusted predictions of future costs, which enables the design of the first Optimistic FTRL-based controller whose regret bound is adaptive to the accuracy of the predictions, shrinking when they are accurate while staying sub-linear even when they all fail.

Stability and Generalization for Minibatch SGD and Local SGD

  • paper_url: http://arxiv.org/abs/2310.01139
  • repo_url: None
  • paper_authors: Yunwen Lei, Tao Sun, Mingrui Liu
  • for: Parallel optimisation with minibatching and local updates.
  • methods: Minibatch SGD and local SGD are analysed through a novel expectation-variance decomposition of their stability and generalisation, with training errors incorporated into the stability analysis.
  • results: Both methods are shown to achieve a linear speedup while attaining optimal risk bounds.
    Abstract The increasing scale of data propels the popularity of leveraging parallelism to speed up the optimization. Minibatch stochastic gradient descent (minibatch SGD) and local SGD are two popular methods for parallel optimization. The existing theoretical studies show a linear speedup of these methods with respect to the number of machines, which, however, is measured by optimization errors. As a comparison, the stability and generalization of these methods are much less studied. In this paper, we study the stability and generalization analysis of minibatch and local SGD to understand their learnability by introducing a novel expectation-variance decomposition. We incorporate training errors into the stability analysis, which shows how small training errors help generalization for overparameterized models. We show both minibatch and local SGD achieve a linear speedup to attain the optimal risk bounds.
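For readers unfamiliar with the two schemes: in minibatch SGD the workers average gradients every step, while in local SGD each worker takes several independent steps and only the iterates are averaged at synchronisation rounds. A minimal NumPy sketch of local SGD on a toy least-squares problem (not the paper's setting) is:

```python
# Local SGD on a toy least-squares problem: each worker runs independent SGD steps
# and the iterates are averaged every `sync_every` steps (minibatch SGD is roughly
# the special case sync_every = 1 with gradients averaged instead of iterates).
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, steps, sync_every, lr = 5, 4, 200, 10, 0.05
w_true = rng.normal(size=d)

def sample_grad(w):
    x = rng.normal(size=d)                       # one random data point per step
    return (x @ w - x @ w_true) * x              # gradient of 0.5 * (x.w - x.w_true)^2

workers = [np.zeros(d) for _ in range(n_workers)]
for t in range(1, steps + 1):
    workers = [w - lr * sample_grad(w) for w in workers]   # local steps in parallel
    if t % sync_every == 0:
        avg = np.mean(workers, axis=0)                     # communication round
        workers = [avg.copy() for _ in range(n_workers)]

# Distance to the ground truth should be much smaller than the initial distance:
print(np.linalg.norm(np.mean(workers, axis=0) - w_true), np.linalg.norm(w_true))
```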

Automated Evaluation of Classroom Instructional Support with LLMs and BoWs: Connecting Global Predictions to Specific Feedback

  • paper_url: http://arxiv.org/abs/2310.01132
  • repo_url: None
  • paper_authors: Jacob Whitehill, Jennifer LoCasale-Crouch
  • for: To give teachers more specific, frequent, and actionable feedback, the paper explores whether Large Language Models (LLMs) can estimate "Instructional Support" domain scores of the CLASS observation protocol.
  • methods: A machine learning architecture uses zero-shot prompting of Meta's Llama2 and/or a classic Bag of Words (BoW) model to classify individual teacher utterances (transcribed automatically with OpenAI's Whisper) for 11 behavioural indicators of Instructional Support; the utterance-level judgments are then aggregated over an entire 15-minute observation session to estimate a global CLASS score.
  • results: (1) Automatic CLASS Instructional Support estimation (Pearson $R$ up to $0.46$) approaches human inter-rater reliability (up to $R=0.55$); (2) LLMs perform slightly better than BoW on this task; (3) the best models usually combine features from both; and (4) the model outputs can be visualised at the utterance level to give teachers explainable feedback on which utterances correlate most with specific CLASS dimensions.
    Abstract With the aim to provide teachers with more specific, frequent, and actionable feedback about their teaching, we explore how Large Language Models (LLMs) can be used to estimate ``Instructional Support'' domain scores of the CLassroom Assessment Scoring System (CLASS), a widely used observation protocol. We design a machine learning architecture that uses either zero-shot prompting of Meta's Llama2, and/or a classic Bag of Words (BoW) model, to classify individual utterances of teachers' speech (transcribed automatically using OpenAI's Whisper) for the presence of 11 behavioral indicators of Instructional Support. Then, these utterance-level judgments are aggregated over an entire 15-min observation session to estimate a global CLASS score. Experiments on two CLASS-coded datasets of toddler and pre-kindergarten classrooms indicate that (1) automatic CLASS Instructional Support estimation accuracy using the proposed method (Pearson $R$ up to $0.46$) approaches human inter-rater reliability (up to $R=0.55$); (2) LLMs yield slightly greater accuracy than BoW for this task; and (3) the best models often combined features extracted from both LLM and BoW. Finally, (4) we illustrate how the model's outputs can be visualized at the utterance level to provide teachers with explainable feedback on which utterances were most positively or negatively correlated with specific CLASS dimensions.
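The BoW branch of the architecture reduces to a per-utterance text classifier whose probabilities are then aggregated over the session. The sketch below shows that pattern with scikit-learn on made-up utterances, labels, and a made-up aggregation rule; it is not the authors' trained pipeline.

```python
# Illustrative BoW branch: classify each transcribed utterance for one indicator,
# then aggregate utterance-level probabilities into a session-level score.
# (Toy data and aggregation rule; not the authors' trained pipeline.)
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_utts = ["why do you think that happened", "sit down please",
              "tell me more about your idea", "stop talking now"]
train_labels = [1, 0, 1, 0]        # 1 = utterance shows the indicator

vec = CountVectorizer()
X_train = vec.fit_transform(train_utts)
clf = LogisticRegression().fit(X_train, train_labels)

session_utts = ["what do you think will happen next", "be quiet",
                "can you explain why the tower fell"]
probs = clf.predict_proba(vec.transform(session_utts))[:, 1]
session_score = float(np.mean(probs))   # e.g. mean over the 15-minute observation
print(probs, session_score)
```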

Disentangling Voice and Content with Self-Supervision for Speaker Recognition

  • paper_url: http://arxiv.org/abs/2310.01128
  • repo_url: None
  • paper_authors: Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li
  • for: This paper proposes a disentanglement framework that jointly models speaker traits and content variability in speech to improve speaker recognition.
  • methods: The framework uses three Gaussian inference layers, each with a learnable transition model that extracts distinct speech components, plus a self-supervision method that dynamically disentangles content without labels other than speaker identities.
  • results: Experiments on the VoxCeleb and SITW datasets show average reductions of 9.56% in EER and 8.24% in minDCF; since no additional model training or data is needed, the method is easy to apply in practice.
    Abstract For speaker recognition, it is difficult to extract an accurate speaker representation from speech because of its mixture of speaker traits and content. This paper proposes a disentanglement framework that simultaneously models speaker traits and content variability in speech. It is realized with the use of three Gaussian inference layers, each consisting of a learnable transition model that extracts distinct speech components. Notably, a strengthened transition model is specifically designed to model complex speech dynamics. We also propose a self-supervision method to dynamically disentangle content without the use of labels other than speaker identities. The efficacy of the proposed framework is validated via experiments conducted on the VoxCeleb and SITW datasets with 9.56% and 8.24% average reductions in EER and minDCF, respectively. Since neither additional model training nor data is specifically needed, it is easily applicable in practical use.

End-to-End Continuous Speech Emotion Recognition in Real-life Customer Service Call Center Conversations

  • paper_url: http://arxiv.org/abs/2310.02281
  • repo_url: None
  • paper_authors: Yajing Feng, Laurence Devillers
  • for: To build a large-scale, real-life dataset and system for continuous Speech Emotion Recognition (SER) in customer service call center conversations.
  • methods: The study adopts dimensional emotion annotation to capture the subtlety, complexity, and continuity of emotions in real-life conversations, and addresses the challenges of applying an End-to-End SER system to the dataset, including choosing the label sampling rate and input segment length, and integrating contextual information (interlocutor's gender and empathy level) with different weights via multitask learning.
  • results: Incorporating the interlocutor's empathy level through weighted multitask learning improves the model's performance.
    Abstract Speech Emotion recognition (SER) in call center conversations has emerged as a valuable tool for assessing the quality of interactions between clients and agents. In contrast to controlled laboratory environments, real-life conversations take place under uncontrolled conditions and are subject to contextual factors that influence the expression of emotions. In this paper, we present our approach to constructing a large-scale reallife dataset (CusEmo) for continuous SER in customer service call center conversations. We adopted the dimensional emotion annotation approach to capture the subtlety, complexity, and continuity of emotions in real-life call center conversations, while annotating contextual information. The study also addresses the challenges encountered during the application of the End-to-End (E2E) SER system to the dataset, including determining the appropriate label sampling rate and input segment length, as well as integrating contextual information (interlocutor's gender and empathy level) with different weights using multitask learning. The result shows that incorporating the empathy level information improved the model's performance.

Prompt-tuning latent diffusion models for inverse problems

  • paper_url: http://arxiv.org/abs/2310.01110
  • repo_url: None
  • paper_authors: Hyungjin Chung, Jong Chul Ye, Peyman Milanfar, Mauricio Delbracio
  • for: Solving imaging inverse problems using text-to-image latent diffusion models as general priors.
  • methods: A prompt-tuning scheme jointly optimises the text embedding on-the-fly while running the reverse diffusion process, and a projection step keeps the evolving latent variables within the range space of the encoder to reduce image artifacts.
  • results: The combined method, P2L, outperforms both image- and latent-diffusion-model-based inverse problem solvers on tasks such as super-resolution, deblurring, and inpainting.
    Abstract We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors. Existing methods using latent diffusion models for inverse problems typically rely on simple null text prompts, which can lead to suboptimal performance. To address this limitation, we introduce a method for prompt tuning, which jointly optimizes the text embedding on-the-fly while running the reverse diffusion process. This allows us to generate images that are more faithful to the diffusion prior. In addition, we propose a method to keep the evolution of latent variables within the range space of the encoder, by projection. This helps to reduce image artifacts, a major problem when using latent diffusion models instead of pixel-based diffusion models. Our combined method, called P2L, outperforms both image- and latent-diffusion model-based inverse problem solvers on a variety of tasks, such as super-resolution, deblurring, and inpainting.

Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.01107
  • repo_url: https://github.com/ground-a-video/ground-a-video
  • paper_authors: Hyeonho Jeong, Jong Chul Ye
  • for: To address the shortcomings of multi-attribute video editing with a training-free, grounding-guided video-to-video translation framework.
  • methods: The method introduces Cross-Frame Gated Attention, which incorporates grounding information into the latent representations in a temporally consistent way, together with Modulated Cross-Attention and optical-flow-guided smoothing of inverted latents.
  • results: On multi-attribute video editing tasks, Ground-A-Video's zero-shot capability outperforms baseline methods in edit accuracy and frame consistency.
    Abstract Recent endeavors in video editing have showcased promising results in single-attribute editing or style transfer tasks, either by training text-to-video (T2V) models on text-video data or adopting training-free methods. However, when confronted with the complexities of multi-attribute editing scenarios, they exhibit shortcomings such as omitting or overlooking intended attribute changes, modifying the wrong elements of the input video, and failing to preserve regions of the input video that should remain intact. To address this, here we present a novel grounding-guided video-to-video translation framework called Ground-A-Video for multi-attribute video editing. Ground-A-Video attains temporally consistent multi-attribute editing of input videos in a training-free manner without aforementioned shortcomings. Central to our method is the introduction of Cross-Frame Gated Attention which incorporates groundings information into the latent representations in a temporally consistent fashion, along with Modulated Cross-Attention and optical flow guided inverted latents smoothing. Extensive experiments and applications demonstrate that Ground-A-Video's zero-shot capacity outperforms other baseline methods in terms of edit-accuracy and frame consistency. Further results and codes are provided at our project page (http://ground-a-video.github.io).

NP$^2$L: Negative Pseudo Partial Labels Extraction for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.01098
  • repo_url: None
  • paper_authors: Xinjie Shen, Danyang Wu, Jitao Lu, Junjie Liang, Jin Xu, Feiping Nie
  • for: To improve the accuracy of pseudo labels and exploit them in graph neural networks (GNNs).
  • methods: Pseudo labels are selected from non-overlapping partial labels and treated as negative node-pair relations; negative edges are constructed via a negative pseudo partial labels extraction (NP$^2$E) module and integrated with the message-passing mechanism through a signed graph.
  • results: The method reaches state-of-the-art performance on link prediction and node classification across benchmark datasets.
    Abstract How to utilize pseudo labels has long been a research hotspot in machine learning. However, most methods use pseudo labels directly for supervised training and lack a valid assessment of their accuracy. Moreover, applications of pseudo labels in graph neural networks (GNNs) overlook the differences between graph learning and other machine learning tasks, such as the message passing mechanism. To address the first issue, we found through a large number of experiments that pseudo labels are more accurate if they are selected from non-overlapping partial labels and defined as negative node-pair relations. Therefore, based on extraction from pseudo and partial labels, negative edges are constructed between two nodes by the negative pseudo partial labels extraction (NP$^2$E) module. With that, a signed graph is built containing highly accurate pseudo label information from the original graph, which effectively assists the GNN in learning at the message-passing level and provides one solution to the second issue. Empirical results on link prediction and node classification tasks across several benchmark datasets demonstrate the effectiveness of our method, with state-of-the-art performance achieved on both tasks.

LoCUS: Learning Multiscale 3D-consistent Features from Posed Images

  • paper_url: http://arxiv.org/abs/2310.01095
  • repo_url: https://github.com/dakloepfer/locus
  • paper_authors: Dominik A. Kloepfer, Dylan Campbell, João F. Henriques
  • for: trains a neural network to learn a versatile representation of the world that can handle occlusions, previously-unseen views, and long time horizons without supervision.
  • methods: uses a patch retrieval objective to train the network, balancing retrieval and reusability by constructing the retrieval set carefully and adjusting the spatial tolerance.
  • results: demonstrates the effectiveness of the proposed method in creating sparse, multi-scale, semantic spatial maps composed of highly identifiable landmarks, with applications in landmark retrieval, localization, semantic segmentation, and instance segmentation.
    Abstract An important challenge for autonomous agents such as robots is to maintain a spatially and temporally consistent model of the world. It must be maintained through occlusions, previously-unseen views, and long time horizons (e.g., loop closure and re-identification). It is still an open question how to train such a versatile neural representation without supervision. We start from the idea that the training objective can be framed as a patch retrieval problem: given an image patch in one view of a scene, we would like to retrieve (with high precision and recall) all patches in other views that map to the same real-world location. One drawback is that this objective does not promote reusability of features: by being unique to a scene (achieving perfect precision/recall), a representation will not be useful in the context of other scenes. We find that it is possible to balance retrieval and reusability by constructing the retrieval set carefully, leaving out patches that map to far-away locations. Similarly, we can easily regulate the scale of the learned features (e.g., points, objects, or rooms) by adjusting the spatial tolerance for considering a retrieval to be positive. We optimize for (smooth) Average Precision (AP), in a single unified ranking-based objective. This objective also doubles as a criterion for choosing landmarks or keypoints, as patches with high AP. We show results creating sparse, multi-scale, semantic spatial maps composed of highly identifiable landmarks, with applications in landmark retrieval, localization, semantic segmentation and instance segmentation.

Non-negative isomorphic neural networks for photonic neuromorphic accelerators

  • paper_url: http://arxiv.org/abs/2310.01084
  • repo_url: None
  • paper_authors: Manos Kirtas, Nikolaos Passalis, Nikolaos Pleros, Anastasios Tefas
  • for: To improve the computation speed and energy efficiency of photonic neuromorphic accelerators, targeting femtojoule-per-MAC operation.
  • methods: Regular neural networks are transformed into non-negative isomorphic equivalents that fit incoherent neuromorphic photonic hardware, together with a sign-preserving optimisation approach for training them, avoiding the extra hardware complexity needed to represent negative quantities.
  • results: The non-negative isomorphic networks can be trained and optimised while retaining accuracy comparable to their regular counterparts.
    Abstract Neuromorphic photonic accelerators are becoming increasingly popular, since they can significantly improve computation speed and energy efficiency, leading to femtojoule per MAC efficiency. However, deploying existing DL models on such platforms is not trivial, since a great range of photonic neural network architectures relies on incoherent setups and power addition operational schemes that cannot natively represent negative quantities. This results in additional hardware complexity that increases cost and reduces energy efficiency. To overcome this, we can train non-negative neural networks and potentially exploit the full range of incoherent neuromorphic photonic capabilities. However, existing approaches cannot achieve the same level of accuracy as their regular counterparts, due to training difficulties, as also recent evidence suggests. To this end, we introduce a methodology to obtain the non-negative isomorphic equivalents of regular neural networks that meet requirements of neuromorphic hardware, overcoming the aforementioned limitations. Furthermore, we also introduce a sign-preserving optimization approach that enables training of such isomorphic networks in a non-negative manner.

Linear attention is (maybe) all you need (to understand transformer optimization)

  • paper_url: http://arxiv.org/abs/2310.01082
  • repo_url: None
  • paper_authors: Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, Suvrit Sra
  • for: To understand why transformer training is difficult and to inform the design of optimizers and training heuristics.
  • methods: Simplified linearized shallow transformer models are trained on regression tasks, inspired by J. von Oswald et al. (ICML 2023) and K. Ahn et al. (NeurIPS 2023).
  • results: The proposed linearized models reproduce several prominent aspects of transformer training dynamics, suggesting that a simple linearized transformer can be a valuable, realistic abstraction for studying transformer optimization.
    Abstract Transformer training is notoriously difficult, requiring a careful design of optimizers and use of various heuristics. We make progress towards understanding the subtleties of training transformers by carefully studying a simple yet canonical linearized shallow transformer model. Specifically, we train linear transformers to solve regression tasks, inspired by J. von Oswald et al. (ICML 2023), and K. Ahn et al. (NeurIPS 2023). Most importantly, we observe that our proposed linearized models can reproduce several prominent aspects of transformer training dynamics. Consequently, the results obtained in this paper suggest that a simple linearized transformer model could actually be a valuable, realistic abstraction for understanding transformer optimization.
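A linearized (softmax-free) attention layer makes the "shallow linear transformer" concrete: the output is simply (XWq)(XWk)^T(XWv) scaled by the sequence length, so the layer is linear in the value projections. The NumPy sketch below is a generic illustration, not the paper's exact parameterization or task setup.

```python
# A single linear-attention layer: attention without softmax, so the output is
# (X Wq)(X Wk)^T (X Wv) / n -- the kind of linearized transformer studied here
# (generic illustration, not the paper's exact parameterization).
import numpy as np

def linear_attention(X, Wq, Wk, Wv):
    n = X.shape[0]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return (Q @ K.T) @ V / n          # no softmax: the map is linear in V

rng = np.random.default_rng(0)
n_tokens, d = 8, 4                    # e.g. tokens encoding (x_i, y_i) pairs of a regression task
X = rng.normal(size=(n_tokens, d))
out = linear_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)                      # (8, 4)
```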

Shaping of Magnetic Field Coils in Fusion Reactors using Bayesian Optimisation

  • paper_url: http://arxiv.org/abs/2310.01455
  • repo_url: None
  • paper_authors: Timothy Nunn, Vignesh Gopakumar, Sebastien Kahn
  • for: Designing magnetic field coils for a viable fusion energy reactor.
  • methods: An AI-driven strategy based on Multi-Output Bayesian Optimisation is used to explore the design search space and identify optimum parameters.
  • results: The approach identifies the Pareto front for the toroidal field coil shape of a tokamak, trading off cost against plasma stability by minimising magnetic ripples.
    Abstract Nuclear fusion using magnetic confinement holds promise as a viable method for sustainable energy. However, most fusion devices have been experimental and as we move towards energy reactors, we are entering into a new paradigm of engineering. Curating a design for a fusion reactor is a high-dimensional multi-output optimisation process. Through this work we demonstrate a proof-of-concept of an AI-driven strategy to help explore the design search space and identify optimum parameters. By utilising a Multi-Output Bayesian Optimisation scheme, our strategy is capable of identifying the Pareto front associated with the optimisation of the toroidal field coil shape of a tokamak. The optimisation helps to identify design parameters that would minimise the costs incurred while maximising the plasma stability by way of minimising magnetic ripples.
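Stripped of the fusion physics, the optimisation loop is a standard Bayesian optimisation recipe: fit a Gaussian process surrogate to the evaluated designs, maximise an acquisition function, evaluate the new design, repeat. The sketch below is a single-objective toy in which a made-up 1-D objective stands in for the expensive simulator; the actual study is multi-output and explores a Pareto front.

```python
# Toy single-objective Bayesian optimisation loop with a GP surrogate and Expected
# Improvement. A made-up 1-D objective stands in for the coil/plasma simulator.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(x):                     # hypothetical stand-in for the expensive simulator
    return np.sin(3 * x) + 0.5 * x**2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(4, 1))   # initial design points
y = objective(X).ravel()
grid = np.linspace(-2, 2, 200).reshape(-1, 1)

for _ in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    imp = y.min() - mu                            # we are minimising
    z = imp / np.maximum(sigma, 1e-9)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)  # expected improvement
    x_next = grid[np.argmax(ei)].reshape(1, 1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print(X[np.argmin(y)], y.min())        # best design parameter found and its cost
```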

Back to the Future: Towards Explainable Temporal Reasoning with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.01074
  • repo_url: https://github.com/chenhan97/timellama
  • paper_authors: Chenhan Yuan, Qianqian Xie, Jimin Huang, Sophia Ananiadou
  • for: This paper introduces a new task of explainable temporal reasoning (ETR) to evaluate LLMs' complex temporal reasoning, future event prediction, and explanation abilities.
  • methods: A novel knowledge-graph-instructed generation strategy converts temporal reasoning paths from temporal knowledge graph datasets into the ExpTime instruction-tuning dataset, on which the TimeLlaMA series of LLMs (built on LlaMA2) is trained for instruction-following explainable temporal reasoning.
  • results: Experiments show that TimeLlaMA achieves state-of-the-art performance in both temporal prediction and explanation.
    Abstract Temporal reasoning is a crucial NLP task, providing a nuanced understanding of time-sensitive contexts within textual data. Although recent advancements in LLMs have demonstrated their potential in temporal reasoning, the predominant focus has been on tasks such as temporal expression and temporal relation extraction. These tasks are primarily designed for the extraction of direct and past temporal cues and to engage in simple reasoning processes. A significant gap remains when considering complex reasoning tasks such as event forecasting, which requires multi-step temporal reasoning on events and prediction on the future timestamp. Another notable limitation of existing methods is their incapability to provide an illustration of their reasoning process, hindering explainability. In this paper, we introduce the first task of explainable temporal reasoning, to predict an event's occurrence at a future timestamp based on context which requires multiple reasoning over multiple events, and subsequently provide a clear explanation for their prediction. Our task offers a comprehensive evaluation of both the LLMs' complex temporal reasoning ability, the future event prediction ability, and explainability-a critical attribute for AI applications. To support this task, we present the first multi-source instruction-tuning dataset of explainable temporal reasoning (ExpTime) with 26k derived from the temporal knowledge graph datasets and their temporal reasoning paths, using a novel knowledge-graph-instructed-generation strategy. Based on the dataset, we propose the first open-source LLM series TimeLlaMA based on the foundation LlaMA2, with the ability of instruction following for explainable temporal reasoning. We compare the performance of our method and a variety of LLMs, where our method achieves the state-of-the-art performance of temporal prediction and explanation.

KGEx: Explaining Knowledge Graph Embeddings via Subgraph Sampling and Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2310.01065
  • repo_url: None
  • paper_authors: Vasileios Baltatzis, Luca Costabello
  • for: The paper aims to provide explanations for link predictions made by knowledge graph embedding (KGE) models.
  • methods: It proposes KGEx, a novel post-hoc method that explains individual link predictions by training surrogate KGE models on different subsets of the target triple's neighborhood, using a distillation process to keep the surrogates faithful to the original KGE model.
  • results: On two publicly available datasets, KGEx is shown to provide explanations that are faithful to the original KGE model.
    Abstract Despite being the go-to choice for link prediction on knowledge graphs, research on interpretability of knowledge graph embeddings (KGE) has been relatively unexplored. We present KGEx, a novel post-hoc method that explains individual link predictions by drawing inspiration from surrogate models research. Given a target triple to predict, KGEx trains surrogate KGE models that we use to identify important training triples. To gauge the impact of a training triple, we sample random portions of the target triple neighborhood and we train multiple surrogate KGE models on each of them. To ensure faithfulness, each surrogate is trained by distilling knowledge from the original KGE model. We then assess how well surrogates predict the target triple being explained, the intuition being that those leading to faithful predictions have been trained on impactful neighborhood samples. Under this assumption, we then harvest triples that appear frequently across impactful neighborhoods. We conduct extensive experiments on two publicly available datasets, to demonstrate that KGEx is capable of providing explanations faithful to the black-box model.

Combining Deep Learning and GARCH Models for Financial Volatility and Risk Forecasting

  • paper_url: http://arxiv.org/abs/2310.01063
  • repo_url: None
  • paper_authors: Jakub Michańków, Łukasz Kwiatkowski, Janusz Morajda
  • for: To forecast the volatility and risk of financial instruments by combining econometric GARCH time series models with deep learning neural networks.
  • methods: A Gated Recurrent Unit (GRU) network is combined with four GARCH specifications: standard GARCH, EGARCH, GJR-GARCH, and APARCH.
  • results: Models are tested on daily logarithmic returns of the S&P 500 index, gold prices, and Bitcoin prices, using the price-range-based Garman-Klass estimator (modified to include opening and closing prices) as the volatility target; the hybrid forecasts are used to evaluate Value-at-Risk (VaR) and Expected Shortfall (ES) at the 5% and 1% tolerance levels.
    Abstract In this paper, we develop a hybrid approach to forecasting the volatility and risk of financial instruments by combining common econometric GARCH time series models with deep learning neural networks. For the latter, we employ Gated Recurrent Unit (GRU) networks, whereas four different specifications are used as the GARCH component: standard GARCH, EGARCH, GJR-GARCH and APARCH. Models are tested using daily logarithmic returns on the S&P 500 index as well as gold price Bitcoin prices, with the three assets representing quite distinct volatility dynamics. As the main volatility estimator, also underlying the target function of our hybrid models, we use the price-range-based Garman-Klass estimator, modified to incorporate the opening and closing prices. Volatility forecasts resulting from the hybrid models are employed to evaluate the assets' risk using the Value-at-Risk (VaR) and Expected Shortfall (ES) at two different tolerance levels of 5% and 1%. Gains from combining the GARCH and GRU approaches are discussed in the contexts of both the volatility and risk forecasts. In general, it can be concluded that the hybrid solutions produce more accurate point volatility forecasts, although it does not necessarily translate into superior VaR and ES forecasts.
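For reference, the textbook Garman-Klass estimator computes a daily variance from open/high/low/close prices as 0.5*(ln(H/L))^2 − (2 ln 2 − 1)*(ln(C/O))^2; the paper uses a modified version that further incorporates opening and closing prices across sessions, which is not reproduced here.

```python
# Textbook Garman-Klass range-based variance estimator from OHLC prices.
# The paper's modified version (exploiting open/close information across days)
# is not shown; this is only the standard daily formula for reference.
import numpy as np

def garman_klass_var(open_, high, low, close):
    hl = np.log(high / low)
    co = np.log(close / open_)
    return 0.5 * hl**2 - (2.0 * np.log(2.0) - 1.0) * co**2

o, h, l, c = 100.0, 103.0, 98.5, 101.2   # made-up daily OHLC quotes
sigma2 = garman_klass_var(o, h, l, c)
print(sigma2, np.sqrt(sigma2))           # daily variance and volatility estimates
```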

Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning

  • paper_url: http://arxiv.org/abs/2310.01061
  • repo_url: https://github.com/rmanluo/reasoning-on-graphs
  • paper_authors: Linhao Luo, Yuan-Fang Li, Gholamreza Haffari, Shirui Pan
  • for: To improve the reasoning ability and trustworthiness of large language models (LLMs) so that their reasoning processes are faithful and interpretable.
  • methods: The paper proposes reasoning on graphs (RoG), a planning-retrieval-reasoning framework that synergizes LLMs with knowledge graphs (KGs): RoG generates KG-grounded relation paths as plans and uses them to retrieve valid reasoning paths for the LLM.
  • results: Experiments on two benchmark KGQA datasets show that RoG achieves state-of-the-art performance and produces faithful and interpretable reasoning results.
    Abstract Large language models (LLMs) have demonstrated impressive reasoning abilities in complex tasks. However, they lack up-to-date knowledge and experience hallucinations during reasoning, which can lead to incorrect reasoning processes and diminish their performance and trustworthiness. Knowledge graphs (KGs), which capture vast amounts of facts in a structured format, offer a reliable source of knowledge for reasoning. Nevertheless, existing KG-based LLM reasoning methods only treat KGs as factual knowledge bases and overlook the importance of their structural information for reasoning. In this paper, we propose a novel method called reasoning on graphs (RoG) that synergizes LLMs with KGs to enable faithful and interpretable reasoning. Specifically, we present a planning-retrieval-reasoning framework, where RoG first generates relation paths grounded by KGs as faithful plans. These plans are then used to retrieve valid reasoning paths from the KGs for LLMs to conduct faithful reasoning. Furthermore, RoG not only distills knowledge from KGs to improve the reasoning ability of LLMs through training but also allows seamless integration with any arbitrary LLMs during inference. Extensive experiments on two benchmark KGQA datasets demonstrate that RoG achieves state-of-the-art performance on KG reasoning tasks and generates faithful and interpretable reasoning results.

Improved Crop and Weed Detection with Diverse Data Ensemble Learning in Agriculture

  • paper_url: http://arxiv.org/abs/2310.01055
  • repo_url: None
  • paper_authors: Muhammad Hamza Asad, Saeed Anwar, Abdul Bais
  • for: To improve the generalisation of deep learning models for detecting, localising, and quantifying crops and weeds under the varied field conditions of modern agriculture.
  • methods: An ensemble framework uses crop- and weed-specific models trained on diverse datasets in a teacher-student configuration; the base models are homogeneously stacked and their outputs combined by a trainable UNET meta-architecture to improve semantic segmentation of crops and weeds.
  • results: Using multiple crop and weed base models with the teacher-student setup improves semantic segmentation of Canola crops and Kochia weeds on unseen test data, surpassing single semantic segmentation models; ablation studies validate the effectiveness of the proposed model.
    Abstract Modern agriculture heavily relies on Site-Specific Farm Management practices, necessitating accurate detection, localization, and quantification of crops and weeds in the field, which can be achieved using deep learning techniques. In this regard, crop and weed-specific binary segmentation models have shown promise. However, uncontrolled field conditions limit their performance from one field to the other. To improve semantic model generalization, existing methods augment and synthesize agricultural data to account for uncontrolled field conditions. However, given highly varied field conditions, these methods have limitations. To overcome the challenges of model deterioration in such conditions, we propose utilizing data specific to other crops and weeds for our specific target problem. To achieve this, we propose a novel ensemble framework. Our approach involves utilizing different crop and weed models trained on diverse datasets and employing a teacher-student configuration. By using homogeneous stacking of base models and a trainable meta-architecture to combine their outputs, we achieve significant improvements for Canola crops and Kochia weeds on unseen test data, surpassing the performance of single semantic segmentation models. We identify the UNET meta-architecture as the most effective in this context. Finally, through ablation studies, we demonstrate and validate the effectiveness of our proposed model. We observe that including base models trained on other target crops and weeds can help generalize the model to capture varied field conditions. Lastly, we propose two novel datasets with varied conditions for comparisons.

Subtractor-Based CNN Inference Accelerator

  • paper_url: http://arxiv.org/abs/2310.01022
  • repo_url: None
  • paper_authors: Victor Gao, Issam Hammad, Kamal El-Sankary, Jason Gu
  • for: To boost the performance of CNN inference accelerators by replacing multiply and add operations with subtractions.
  • methods: A CNN preprocessing accelerator sorts, groups, and rounds the weights to create combinations in which one multiplication and one addition can be replaced by a single subtraction during convolution; since multiplication is costly in power and area, this yields a performance gain.
  • results: Tested with LeNet-5 on the MNIST dataset, the design achieves 32.03% power savings and a 24.59% reduction in area at the cost of only 0.1% accuracy loss.
    Abstract This paper presents a novel method to boost the performance of CNN inference accelerators by utilizing subtractors. The proposed CNN preprocessing accelerator relies on sorting, grouping, and rounding the weights to create combinations that allow for the replacement of one multiplication operation and addition operation by a single subtraction operation when applying convolution during inference. Given the high cost of multiplication in terms of power and area, replacing it with subtraction allows for a performance boost by reducing power and area. The proposed method allows for controlling the trade-off between performance gains and accuracy loss through increasing or decreasing the usage of subtractors. With a rounding size of 0.05 and by utilizing LeNet-5 with the MNIST dataset, the proposed design can achieve 32.03% power savings and a 24.59% reduction in area at the cost of only 0.1% in terms of accuracy loss.
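One way to read the mechanism (our reading of the abstract, not the actual accelerator design): after rounding, weight pairs of equal magnitude and opposite sign can share a single multiplier, so w*a + (-w)*b collapses to w*(a − b), trading one multiplication and one addition for a subtraction. The toy below shows only that arithmetic identity.

```python
# Toy illustration of the arithmetic identity behind the idea: once two rounded
# weights have equal magnitude and opposite sign, w*a + (-w)*b == w*(a - b),
# i.e. one multiplication and one addition collapse into a subtraction plus a
# single shared multiplication. (Our reading of the abstract, not the real design.)
def rounded(w, step=0.05):
    return round(w / step) * step

w1, w2 = 0.37, -0.33          # raw weights
a, b = 4.0, 7.0               # two input activations
r1, r2 = rounded(w1), rounded(w2)
assert r1 == -r2              # after rounding they form an opposite-sign pair

naive = r1 * a + r2 * b       # two multiplications + one addition
fused = r1 * (a - b)          # one subtraction + one multiplication
print(naive, fused)           # the same value (up to float rounding)
```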

ETGraph: A Pioneering Dataset Bridging Ethereum and Twitter

  • paper_url: http://arxiv.org/abs/2310.01015
  • repo_url: None
  • paper_authors: Qian Wang, Zhen Zhang, Zemin Liu, Shengliang Lu, Bingqiao Luo, Bingsheng He
  • for: The paper aims to address the limitation of existing public blockchain datasets by incorporating relevant social network data into blockchain analysis.
  • methods: It introduces ETGraph, a novel dataset that combines Ethereum transaction records and Twitter following data to authentically link Ethereum addresses with verified Twitter accounts.
  • results: Detailed statistical analysis and extensive experiments, including Ethereum link prediction, wash-trading Ethereum address detection, and Twitter-Ethereum matching link prediction, demonstrate the significance of Twitter data in enhancing Ethereum analysis.
    Abstract While numerous public blockchain datasets are available, their utility is constrained by a singular focus on blockchain data. This constraint limits the incorporation of relevant social network data into blockchain analysis, thereby diminishing the breadth and depth of insight that can be derived. To address the above limitation, we introduce ETGraph, a novel dataset that authentically links Ethereum and Twitter, marking the first and largest dataset of its kind. ETGraph combines Ethereum transaction records (2 million nodes and 30 million edges) and Twitter following data (1 million nodes and 3 million edges), bonding 30,667 Ethereum addresses with verified Twitter accounts sourced from OpenSea. Detailed statistical analysis on ETGraph highlights the structural differences between Twitter-matched and non-Twitter-matched Ethereum addresses. Extensive experiments, including Ethereum link prediction, wash-trading Ethereum addresses detection, and Twitter-Ethereum matching link prediction, emphasize the significant role of Twitter data in enhancing Ethereum analysis. ETGraph is available at https://etgraph.deno.dev/.

Efficient Algorithms for the CCA Family: Unconstrained Objectives with Unbiased Gradients

  • paper_url: http://arxiv.org/abs/2310.01012
  • repo_url: None
  • paper_authors: James Chapman, Ana Lawry Aguila, Lennie Wells
  • for: This paper studies the canonical correlation analysis (CCA) family of methods for multi-view learning and their extensions.
  • methods: It proposes a novel unconstrained objective characterising the top subspace of generalized eigenvalue problems, and a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA obtained by applying stochastic gradient descent (SGD) to the corresponding objectives.
  • results: The new algorithms converge much faster and recover higher correlations than the previous state of the art on standard CCA and Deep CCA benchmarks, enable a first-of-its-kind PLS analysis of an extremely large biomedical dataset (UK Biobank), and establish theoretical links between CCA-family self-supervised learning methods and classical CCA.
    Abstract The Canonical Correlation Analysis (CCA) family of methods is foundational in multi-view learning. Regularised linear CCA methods can be seen to generalise Partial Least Squares (PLS) and unified with a Generalized Eigenvalue Problem (GEP) framework. However, classical algorithms for these linear methods are computationally infeasible for large-scale data. Extensions to Deep CCA show great promise, but current training procedures are slow and complicated. First we propose a novel unconstrained objective that characterizes the top subspace of GEPs. Our core contribution is a family of fast algorithms for stochastic PLS, stochastic CCA, and Deep CCA, simply obtained by applying stochastic gradient descent (SGD) to the corresponding CCA objectives. These methods show far faster convergence and recover higher correlations than the previous state-of-the-art on all standard CCA and Deep CCA benchmarks. This speed allows us to perform a first-of-its-kind PLS analysis of an extremely large biomedical dataset from the UK Biobank, with over 33,000 individuals and 500,000 variants. Finally, we not only match the performance of `CCA-family' Self-Supervised Learning (SSL) methods on CIFAR-10 and CIFAR-100 with minimal hyper-parameter tuning, but also establish the first solid theoretical links to classical CCA, laying the groundwork for future insights.
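The paper's actual unconstrained objective is not reproduced here, but the flavour of "CCA by SGD" can be illustrated with a projected stochastic ascent on the correlation of two projections, renormalising to unit variance after each minibatch step (the data, learning rate, and normalisation scheme below are all illustrative, not the proposed algorithm).

```python
# Generic illustration of learning one pair of CCA directions with stochastic
# updates: ascend the cross-covariance of the projections, then rescale each
# direction so its projection has unit variance. NOT the paper's objective.
import numpy as np

rng = np.random.default_rng(0)
n, dx, dy = 2000, 5, 4
shared = rng.normal(size=(n, 1))                       # common latent signal
X = shared @ rng.normal(size=(1, dx)) + 0.5 * rng.normal(size=(n, dx))
Y = shared @ rng.normal(size=(1, dy)) + 0.5 * rng.normal(size=(n, dy))

u, v = rng.normal(size=dx), rng.normal(size=dy)
lr, batch = 0.05, 64
for step in range(1000):
    idx = rng.integers(0, n, size=batch)
    xb, yb = X[idx], Y[idx]
    a, b = xb @ u, yb @ v
    u = u + lr * (xb.T @ b) / batch        # ascend sample covariance of projections
    v = v + lr * (yb.T @ a) / batch
    u = u / (np.std(X @ u) + 1e-8)         # keep each projection at unit variance
    v = v / (np.std(Y @ v) + 1e-8)

print(np.corrcoef(X @ u, Y @ v)[0, 1])     # should recover a highly correlated pair
```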

Towards Fixing Clever-Hans Predictors with Counterfactual Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2310.01011
  • repo_url: None
  • paper_authors: Sidney Bender, Christopher J. Anders, Pattarawatt Chormai, Heike Marxfeld, Jan Herrmann, Grégoire Montavon
  • for: This paper introduces counterfactual knowledge distillation (CFKD), a technique for detecting and removing a deep learning model's reliance on confounders with the help of human expert feedback.
  • methods: The approach builds on counterfactual explanations, using expert feedback on them to steer the model away from spurious features.
  • results: The effectiveness of CFKD is demonstrated on synthetically augmented datasets and on real-world histopathological datasets.
    Abstract This paper introduces a novel technique called counterfactual knowledge distillation (CFKD) to detect and remove reliance on confounders in deep learning models with the help of human expert feedback. Confounders are spurious features that models tend to rely on, which can result in unexpected errors in regulated or safety-critical domains. The paper highlights the benefit of CFKD in such domains and shows some advantages of counterfactual explanations over other types of explanations. We propose an experiment scheme to quantitatively evaluate the success of CFKD and different teachers that can give feedback to the model. We also introduce a new metric that is better correlated with true test performance than validation accuracy. The paper demonstrates the effectiveness of CFKD on synthetically augmented datasets and on real-world histopathological datasets.

Using Reinforcement Learning to Optimize Responses in Care Processes: A Case Study on Aggression Incidents

  • paper_url: http://arxiv.org/abs/2310.00981
  • repo_url: None
  • paper_authors: Bart J. Verhoef, Xixi Lu
  • for: To find optimal response policies for care staff when clients display aggressive behaviour.
  • methods: A Markov decision process is trained from care-process event data, and the reinforcement learning algorithms Q-learning and SARSA are used to derive optimal policies.
  • results: The policies derived from Q-learning and SARSA are similar to the most frequently used current actions but give staff members a few more options in certain situations.
    Abstract Previous studies have used prescriptive process monitoring to find actionable policies in business processes and conducted case studies in similar domains, such as the loan application process and the traffic fine process. However, care processes tend to be more dynamic and complex. For example, at any stage of a care process, a multitude of actions is possible. In this paper, we follow the reinforcement approach and train a Markov decision process using event data from a care process. The goal was to find optimal policies for staff members when clients are displaying any type of aggressive behavior. We used the reinforcement learning algorithms Q-learning and SARSA to find optimal policies. Results showed that the policies derived from these algorithms are similar to the most frequent actions currently used but provide the staff members with a few more options in certain situations.
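The underlying learning rule is standard tabular Q-learning (SARSA differs only in using the action actually taken at the next state in the target). The sketch below runs it on a made-up three-state escalation MDP; the study's real states, actions, and rewards come from the care-process event log, not from these toy dynamics.

```python
# Tabular Q-learning on a made-up toy MDP: three escalation states and three staff
# responses (illustrative only; the study's state/action space comes from event logs).
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 3            # e.g. calm/agitated/aggressive x talk/distract/call_help
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(s, a):
    # Hypothetical dynamics: "good" actions tend to de-escalate and earn reward.
    p_deescalate = [0.8, 0.5, 0.3][a] if s > 0 else 1.0
    s_next = max(s - 1, 0) if rng.random() < p_deescalate else min(s + 1, n_states - 1)
    reward = 1.0 if s_next < s else (-1.0 if s_next > s else 0.0)
    return s_next, reward

for episode in range(2000):
    s = int(rng.integers(1, n_states))  # start agitated or aggressive
    for _ in range(10):
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])   # Q-learning target
        s = s_next

print(np.argmax(Q, axis=1))            # greedy (recommended) action per state
```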

All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization

  • paper_url: http://arxiv.org/abs/2310.00964
  • repo_url: None
  • paper_authors: Pablo Barros, Alessandra Sciutti
  • for: In competitive game scenarios, agents must learn decisions that maximise their own goals while minimising their adversaries', which calls for personalised strategies against individual opponents.
  • methods: The paper proposes a model composed of three neural layers that learn a representation of the competitive game, map the strategy of specific opponents, and learn how to disrupt them; the whole model is trained online with a composed loss based on contrastive optimization.
  • results: Experiments on a Pokemon duel scenario and the four-player competitive Chef's Hat card game show that the model outperforms offline, online, and competition-specific models, particularly when playing against the same opponent multiple times.
    Abstract In a competitive game scenario, a set of agents have to learn decisions that maximize their goals and minimize their adversaries' goals at the same time. Besides dealing with the increased dynamics of the scenarios due to the opponents' actions, they usually have to understand how to overcome the opponent's strategies. Most of the common solutions, usually based on continual learning or centralized multi-agent experiences, however, do not allow the development of personalized strategies to face individual opponents. In this paper, we propose a novel model composed of three neural layers that learn a representation of a competitive game, learn how to map the strategy of specific opponents, and how to disrupt them. The entire model is trained online, using a composed loss based on a contrastive optimization, to learn competitive and multiplayer games. We evaluate our model on a pokemon duel scenario and the four-player competitive Chef's Hat card game. Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times. We also present a discussion on the impact of our model, in particular on how well it deals with on specific strategy learning for each of the two scenarios.
    摘要 在竞争性游戏场景下,一组智能体需要学习决策,以在最大化自身目标的同时最小化对手的目标。除了应对对手动作带来的更强动态性外,它们通常还需要理解如何克服对手的策略。大多数常见解决方案基于持续学习或集中式多智能体经验,难以针对单个对手发展个性化策略。本文提出了一种由三层神经网络组成的新模型,分别学习竞争性游戏的表示、映射特定对手的策略以及如何干扰这些策略。整个模型在线训练,使用基于对比优化的组合损失来学习竞争性多人游戏。我们在宝可梦对战场景和四人竞争性 Chef's Hat 卡牌游戏上进行评估。实验表明,我们的模型在面对离线、在线和针对竞争场景的专用模型时表现更好,特别是与同一对手多次交手时。我们还讨论了模型在这两个场景中对特定策略学习的处理能力。

Multi-Agent Bayesian Optimization with Coupled Black-Box and Affine Constraints

  • paper_url: http://arxiv.org/abs/2310.00962
  • repo_url: None
  • paper_authors: Wenjie Xu, Yuning Jiang, Bratislav Svetozarevic, Colin N. Jones
  • for: 该论文研究了同时带有耦合黑盒约束和已知仿射约束的分布式多智能体贝叶斯优化问题。
  • methods: 提出了一种原始-对偶分布式算法,对黑盒目标函数和约束函数可以取得与单智能体情形类似的 regret/violation 界。此外,该算法保证已知仿射约束的累计违规量为 $\mathcal{O}(N\sqrt{T})$,其中 $N$ 为智能体数量,因此样本平均值满足仿射约束的误差不超过 $\mathcal{O}(N/\sqrt{T})$。
  • results: 将该方法应用于高斯过程采样实例和真实的无线通信最优功率分配问题,结果表明该方法在取得接近最优性能的同时保持较小的平均违规,验证了我们的理论分析。
    Abstract This paper studies the problem of distributed multi-agent Bayesian optimization with both coupled black-box constraints and known affine constraints. A primal-dual distributed algorithm is proposed that achieves similar regret/violation bounds as those in the single-agent case for the black-box objective and constraint functions. Additionally, the algorithm guarantees an $\mathcal{O}(N\sqrt{T})$ bound on the cumulative violation for the known affine constraints, where $N$ is the number of agents. Hence, it is ensured that the average of the samples satisfies the affine constraints up to the error $\mathcal{O}(N/\sqrt{T})$. Furthermore, we characterize certain conditions under which our algorithm can bound a stronger metric of cumulative violation and provide best-iterate convergence without affine constraint. The method is then applied to both sampled instances from Gaussian processes and a real-world optimal power allocation problem for wireless communication; the results show that our method simultaneously provides close-to-optimal performance and maintains minor violations on average, corroborating our theoretical analysis.
    摘要 本文研究了同时带有耦合黑盒约束和已知仿射约束的分布式多智能体贝叶斯优化问题。我们提出了一种原始-对偶分布式算法,对黑盒目标函数和约束函数可取得与单智能体情形类似的 regret/violation 界。此外,该算法保证已知仿射约束的累计违规量为 $\mathcal{O}(N\sqrt{T})$,其中 $N$ 为智能体数量,因此样本平均值满足仿射约束的误差不超过 $\mathcal{O}(N/\sqrt{T})$。我们进一步刻画了算法能够约束更强的累计违规度量、并在无仿射约束时提供最优迭代收敛的条件。随后将该方法应用于高斯过程采样实例和真实的无线通信最优功率分配问题,结果表明该方法在取得接近最优性能的同时平均违规较小,印证了我们的理论分析。
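To illustrate the primal-dual idea of handling a known affine constraint alongside a black-box objective, here is a deliberately simplified, single-agent sketch that uses sample-mean UCB scores over a finite candidate set with dual ascent on the constraint violation. The paper's actual algorithm is GP-based and distributed across agents; the objective, constraint, and step sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
cands = np.linspace(0.0, 1.0, 21)                   # finite candidate set x in [0, 1]
f = lambda x: -(x - 0.7) ** 2                       # unknown black-box objective (queried with noise)
a, b = 1.0, 0.5                                     # known affine constraint: a * x <= b

T, eta = 200, 0.05
lam = 0.0                                           # dual variable for the affine constraint
counts = np.zeros_like(cands)
means = np.zeros_like(cands)

for t in range(1, T + 1):
    bonus = np.sqrt(2 * np.log(t) / np.maximum(counts, 1))   # UCB-style exploration bonus
    score = means + bonus - lam * (a * cands - b)             # Lagrangian-penalized acquisition
    i = int(np.argmax(score))
    y = f(cands[i]) + 0.05 * rng.standard_normal()            # noisy black-box evaluation
    counts[i] += 1
    means[i] += (y - means[i]) / counts[i]
    lam = max(0.0, lam + eta * (a * cands[i] - b))            # dual ascent on the violation

print("estimated best feasible-leaning point:", cands[int(np.argmax(means))])
```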

Deep Learning in Computational Biology: Advancements, Challenges, and Future Outlook

  • paper_url: http://arxiv.org/abs/2310.03086
  • repo_url: None
  • paper_authors: Suresh Kumar, Dhanyashri Guruparan, Pavithren Aaron, Philemon Telajan, Kavinesh Mahadevan, Dinesh Davagandhi, Ong Xin Yue
  • for: 这篇文章主要关注 Computational Biology 中的 Deep Learning 技术,包括 DNA 序列分类和调适、蛋白结构预测等。
  • methods: 文章使用了 Deep Learning 技术,包括 Convolutional Neural Networks (CNNs) 和其他进阶模型,以分析生物资料。
  • results: 文章指出 Deep Learning 技术在 Computational Biology 中已经获得了重要的进步,包括 DNA 序列分类和调适、蛋白结构预测等领域。同时,文章还提出了一些挑战,例如需要大量 Labelled 数据和模型解释等问题。
    Abstract Deep learning has become a powerful tool in computational biology, revolutionising the analysis and interpretation of biological data over time. In our article review, we delve into various aspects of deep learning in computational biology. Specifically, we examine its history, advantages, and challenges. Our focus is on two primary applications: DNA sequence classification and prediction, as well as protein structure prediction from sequence data. Additionally, we provide insights into the outlook for this field. To fully harness the potential of deep learning in computational biology, it is crucial to address the challenges that come with it. These challenges include the requirement for large, labelled datasets and the interpretability of deep learning models. The use of deep learning in the analysis of DNA sequences has brought about a significant transformation in the detection of genomic variants and the analysis of gene expression. This has greatly contributed to the advancement of personalised medicine and drug discovery. Convolutional neural networks (CNNs) have been shown to be highly accurate in predicting genetic variations and gene expression levels. Deep learning techniques are used for analysing epigenetic data, including DNA methylation and histone modifications. This provides valuable insights into metabolic conditions and gene regulation. The field of protein structure prediction has been significantly impacted by deep learning, which has enabled accurate determination of the three-dimensional shape of proteins and prediction of their interactions. The future of deep learning in computational biology looks promising. With the development of advanced deep learning models and interpretation techniques, there is potential to overcome current challenges and further our understanding of biological systems.
    摘要 深度学习已成为计算生物学中的强大工具,逐步革新了生物数据的分析与解释。在这篇综述中,我们深入探讨深度学习在计算生物学中的历史、优势与挑战,重点关注两个主要应用:DNA 序列的分类与预测,以及基于序列数据的蛋白质结构预测,并对该领域的前景进行展望。要充分发挥深度学习在计算生物学中的潜力,必须解决随之而来的挑战,包括对大规模标注数据的需求以及深度学习模型的可解释性。深度学习在 DNA 序列分析中的应用显著变革了基因组变异检测和基因表达分析,极大推动了个性化医疗和药物发现。卷积神经网络(CNN)在预测基因变异和基因表达水平方面表现出很高的准确率。深度学习技术还被用于分析表观遗传数据(包括 DNA 甲基化和组蛋白修饰),为代谢状态和基因调控提供了有价值的洞见。蛋白质结构预测领域也深受深度学习影响,使得蛋白质三维结构的准确测定及其相互作用的预测成为可能。深度学习在计算生物学中的前景十分光明:随着更先进的深度学习模型和解释技术的发展,有望克服当前挑战,进一步加深我们对生物系统的理解。

Distilling Influences to Mitigate Prediction Churn in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2310.00946
  • repo_url: None
  • paper_authors: Andreas Roth, Thomas Liebig
  • for: 本文研究graph neural networks(GNN)中的预测变化现象(prediction churn),通过比较不同模型的初始化和特征使用情况来解释这个现象。
  • methods: 本文提出了一个名为 Influence Difference(ID)的新度量,通过比较节点的影响分布来量化不同模型之间节点所依据理由的差异。此外,文章还比较了预测稳定与不稳定节点之间的差异,并提出了在知识蒸馏中最小化 ID 的方法 DropDistillation(DD),以提高各节点预测的稳定性。
  • results: 在六个节点分类基准数据集上的实验表明,DropDistillation(DD)在预测稳定性和整体性能上均优于此前的知识蒸馏方法。
    Abstract Models with similar performances exhibit significant disagreement in the predictions of individual samples, referred to as prediction churn. Our work explores this phenomenon in graph neural networks by investigating differences between models differing only in their initializations in their utilized features for predictions. We propose a novel metric called Influence Difference (ID) to quantify the variation in reasons used by nodes across models by comparing their influence distribution. Additionally, we consider the differences between nodes with a stable and an unstable prediction, positing that both equally utilize different reasons and thus provide a meaningful gradient signal to closely match two models even when the predictions for nodes are similar. Based on our analysis, we propose to minimize this ID in Knowledge Distillation, a domain where a new model should closely match an established one. As an efficient approximation, we introduce DropDistillation (DD) that matches the output for a graph perturbed by edge deletions. Our empirical evaluation of six benchmark datasets for node classification validates the differences in utilized features. DD outperforms previous methods regarding prediction stability and overall performance in all considered Knowledge Distillation experiments.
    摘要 性能相近的模型在单个样本上的预测却可能存在显著分歧,这一现象称为预测漂移(prediction churn)。我们在图神经网络中研究这一现象,考察仅初始化不同的模型在预测时所依据特征上的差异。我们提出了一个名为 Influence Difference(ID)的新度量,通过比较影响分布来量化各节点在不同模型之间所依据理由的变化。此外,我们还考察了预测稳定与不稳定节点之间的差异,认为二者同样依据了不同的理由,因而即便预测结果相似,也能提供有意义的梯度信号来使两个模型相互匹配。基于上述分析,我们建议在知识蒸馏(即让新模型尽量贴近既有模型的场景)中最小化 ID。作为一种高效近似,我们提出了 DropDistillation(DD):在经过随机删边扰动的图上匹配两个模型的输出。我们在六个节点分类基准数据集上的实验验证了所用特征的差异,并且 DD 在所有知识蒸馏实验中,在预测稳定性和整体性能上都优于此前的方法。
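The following sketch illustrates the DropDistillation idea as summarized above: during knowledge distillation, teacher and student outputs are matched on a graph perturbed by random edge deletions. The tiny two-layer GCN, random graph, and loss weighting are placeholders, not the paper's architecture or settings.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n, d, h, c = 50, 16, 32, 4                          # nodes, features, hidden, classes
X = torch.randn(n, d)
A = (torch.rand(n, n) < 0.1).float()
A = torch.triu(A, 1); A = A + A.t()                 # symmetric adjacency, no self-loops
y = torch.randint(0, c, (n,))

def gcn_norm(adj):
    adj = adj + torch.eye(adj.shape[0])              # add self-loops, then symmetric normalization
    d_inv = adj.sum(1).pow(-0.5)
    return d_inv.unsqueeze(1) * adj * d_inv.unsqueeze(0)

def drop_edges(adj, p=0.2):
    keep = torch.triu((torch.rand_like(adj) > p).float(), 1)
    keep = keep + keep.t()
    return adj * keep                                # randomly delete a fraction p of the edges

class GCN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w1, self.w2 = torch.nn.Linear(d, h), torch.nn.Linear(h, c)
    def forward(self, adj, x):
        a_hat = gcn_norm(adj)
        return self.w2(a_hat @ torch.relu(self.w1(a_hat @ x)))

teacher, student = GCN(), GCN()                     # teacher stands in for the established model
teacher.eval()
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

for _ in range(100):
    A_pert = drop_edges(A)                          # graph perturbed by edge deletions
    with torch.no_grad():
        t_logits = teacher(A_pert, X)
    s_logits = student(A_pert, X)
    kd = F.kl_div(F.log_softmax(s_logits, -1), F.softmax(t_logits, -1), reduction="batchmean")
    ce = F.cross_entropy(student(A, X), y)
    loss = ce + kd                                  # match teacher outputs on the perturbed graph
    opt.zero_grad(); loss.backward(); opt.step()
```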

Fooling the Textual Fooler via Randomizing Latent Representations

  • paper_url: http://arxiv.org/abs/2310.01452
  • repo_url: None
  • paper_authors: Duy C. Hoang, Quang H. Nguyen, Saurav Manchanda, MinLong Peng, Kok-Seng Wong, Khoa D. Doan
  • for: 防御针对 NLP 模型的文本对抗攻击
  • methods: 使用随机 latent space 防范攻击
  • results: 在两个基准数据集上,对代表性的词级对抗攻击取得接近最新水平(state-of-the-art)的鲁棒性
    Abstract Despite outstanding performance in a variety of NLP tasks, recent studies have revealed that NLP models are vulnerable to adversarial attacks that slightly perturb the input to cause the models to misbehave. Among these attacks, adversarial word-level perturbations are well-studied and effective attack strategies. Since these attacks work in black-box settings, they do not require access to the model architecture or model parameters and thus can be detrimental to existing NLP applications. To perform an attack, the adversary queries the victim model many times to determine the most important words in an input text and to replace these words with their corresponding synonyms. In this work, we propose a lightweight and attack-agnostic defense whose main goal is to perplex the process of generating an adversarial example in these query-based black-box attacks; that is to fool the textual fooler. This defense, named AdvFooler, works by randomizing the latent representation of the input at inference time. Different from existing defenses, AdvFooler does not necessitate additional computational overhead during training nor relies on assumptions about the potential adversarial perturbation set while having a negligible impact on the model's accuracy. Our theoretical and empirical analyses highlight the significance of robustness resulting from confusing the adversary via randomizing the latent space, as well as the impact of randomization on clean accuracy. Finally, we empirically demonstrate near state-of-the-art robustness of AdvFooler against representative adversarial word-level attacks on two benchmark datasets.
    摘要 尽管现代自然语言处理(NLP)模型在多种任务上表现出色,但最新研究表明,这些模型容易受到对抗攻击的影响:对输入做轻微扰动即可使模型产生错误行为。其中,词级对抗扰动是被研究得最充分、也最有效的攻击策略之一。这类攻击可在黑盒设置下进行,无需访问模型结构或参数,因而会危及现有的 NLP 应用。攻击者通过多次查询受害模型来确定输入文本中最重要的词,并将其替换为同义词。针对这种情形,我们提出了一种轻量级、与具体攻击无关的防御方法 AdvFooler,其主要目标是扰乱这类基于查询的黑盒攻击生成对抗样本的过程,即"愚弄文本愚弄者"。AdvFooler 在推理时对输入的潜在表示进行随机化。与现有防御不同,AdvFooler 无需在训练时引入额外计算开销,也不依赖对潜在对抗扰动集合的假设,同时对模型准确率的影响可忽略不计。我们的理论与实验分析说明了通过随机化潜在空间来迷惑攻击者所带来的鲁棒性,以及随机化对干净样本准确率的影响。最后,我们在两个基准数据集上实证表明,AdvFooler 对代表性的词级对抗攻击具有接近最新水平的鲁棒性。
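A toy sketch of the inference-time latent randomization idea: adding small random noise to a hidden representation at prediction time, so that repeated attacker queries see slightly different decision boundaries. The classifier below and the noise scale are illustrative stand-ins, not AdvFooler's actual implementation.

```python
import torch

torch.manual_seed(0)

class SmallTextClassifier(torch.nn.Module):
    """Toy classifier: mean-pooled embeddings -> MLP (stand-in for a real NLP model)."""
    def __init__(self, vocab=5000, dim=64, classes=2, noise_scale=0.1):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.fc1 = torch.nn.Linear(dim, dim)
        self.fc2 = torch.nn.Linear(dim, classes)
        self.noise_scale = noise_scale

    def forward(self, token_ids, defend=True):
        hidden = torch.relu(self.fc1(self.emb(token_ids).mean(dim=1)))
        if defend and not self.training:
            # Inference-time randomization of the latent representation: each query sees a
            # slightly different decision surface, which makes query-based word substitution
            # searches unreliable while barely changing clean predictions.
            hidden = hidden + self.noise_scale * torch.randn_like(hidden)
        return self.fc2(hidden)

model = SmallTextClassifier().eval()
tokens = torch.randint(0, 5000, (1, 12))            # a single tokenized input
print(model(tokens, defend=False))                  # deterministic logits
print(model(tokens, defend=True))                   # randomized logits (defense on)
```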

Understanding Transferable Representation Learning and Zero-shot Transfer in CLIP

  • paper_url: http://arxiv.org/abs/2310.00927
  • repo_url: None
  • paper_authors: Zixiang Chen, Yihe Deng, Yuanzhi Li, Quanquan Gu
  • for: 本研究旨在深入理解 CLIP 背后的可迁移表示学习,并分析其在零样本学习和文本引导自然图像生成中表现优异的原因。
  • methods: 本研究分析视觉-语言对比预训练如何学习图像与文本的联合表示,并刻画 CLIP 中不同模态特征的对齐机制。
  • results: 研究分析了 CLIP 在下游任务上的零样本迁移性能,并受此启发提出了一种新的 CLIP 式方法,在基准数据集上优于 CLIP 及其他最新方法。
    Abstract Multi-modal learning has become increasingly popular due to its ability to leverage information from different data sources (e.g., text and images) to improve the model performance. Recently, CLIP has emerged as an effective approach that employs vision-language contrastive pretraining to learn joint image and text representations and exhibits remarkable performance in zero-shot learning and text-guided natural image generation. Despite the huge practical success of CLIP, its theoretical understanding remains elusive. In this paper, we formally study transferrable representation learning underlying CLIP and demonstrate how features from different modalities get aligned. We also analyze its zero-shot transfer performance on the downstream tasks. Inspired by our analysis, we propose a new CLIP-type approach, which achieves better performance than CLIP and other state-of-the-art methods on benchmark datasets.
    摘要 多模态学习近年来越来越流行,因为它能够利用来自不同数据源(例如文本和图像)的信息来提升模型性能。最近,CLIP 作为一种有效方法出现:它采用视觉-语言对比预训练来学习图像与文本的联合表示,并在零样本学习和文本引导的自然图像生成中表现出色。尽管 CLIP 在实践中取得了巨大成功,其理论理解仍然不够清晰。本文正式研究 CLIP 背后的可迁移表示学习,证明了不同模态的特征如何实现对齐,并分析了其在下游任务上的零样本迁移性能。受该分析启发,我们提出了一种新的 CLIP 式方法,在基准数据集上取得了优于 CLIP 及其他最新方法的性能。
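For reference, the symmetric contrastive (InfoNCE-style) objective used in CLIP-style vision-language pretraining can be written compactly as below; the random tensors stand in for the outputs of the image and text encoders.

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img = F.normalize(image_feats, dim=-1)
    txt = F.normalize(text_feats, dim=-1)
    logits = img @ txt.t() / temperature             # pairwise cosine similarities
    targets = torch.arange(img.shape[0])             # the i-th image matches the i-th text
    loss_i = F.cross_entropy(logits, targets)        # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)    # text -> image direction
    return 0.5 * (loss_i + loss_t)

# Toy usage with random "encoder" outputs standing in for vision/text towers:
img_emb = torch.randn(8, 512)
txt_emb = torch.randn(8, 512)
print(clip_contrastive_loss(img_emb, txt_emb))
```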

The Participatory Turn in AI Design: Theoretical Foundations and the Current State of Practice

  • paper_url: http://arxiv.org/abs/2310.00907
  • repo_url: None
  • paper_authors: Fernando Delgado, Stephen Yang, Michael Madaio, Qian Yang
  • for: 本研究旨在为人工智能设计中的"参与式转向"奠定理论基础,并为评估参与式方法提供概念工具。
  • methods: 本研究综合技术设计、政治理论和社会科学的相关文献,提炼出一个可用于评估 AI 设计中参与方式的概念框架;同时通过分析近期发表的研究并对 12 名 AI 研究者和实践者进行半结构化访谈,刻画参与式实践的现状。
  • results: 研究发现现有参与式实践存在各种局限与不一致,并据此给出在考虑实际约束的前提下更好地协调参与目标与方法的指导。
    Abstract Despite the growing consensus that stakeholders affected by AI systems should participate in their design, enormous variation and implicit disagreements exist among current approaches. For researchers and practitioners who are interested in taking a participatory approach to AI design and development, it remains challenging to assess the extent to which any participatory approach grants substantive agency to stakeholders. This article thus aims to ground what we dub the "participatory turn" in AI design by synthesizing existing theoretical literature on participation and through empirical investigation and critique of its current practices. Specifically, we derive a conceptual framework through synthesis of literature across technology design, political theory, and the social sciences that researchers and practitioners can leverage to evaluate approaches to participation in AI design. Additionally, we articulate empirical findings concerning the current state of participatory practice in AI design based on an analysis of recently published research and semi-structured interviews with 12 AI researchers and practitioners. We use these empirical findings to understand the current state of participatory practice and subsequently provide guidance to better align participatory goals and methods in a way that accounts for practical constraints.
    摘要 尽管各界日益形成共识,认为受 AI 系统影响的利益相关者应参与其设计,但现有做法之间仍存在巨大差异和隐含分歧。对于希望以参与式方式开展 AI 设计与开发的研究者和实践者而言,评估某种参与式方法在多大程度上赋予利益相关者实质性能动性仍然很困难。因此,本文通过综合已有的参与理论文献,并对当前实践进行实证考察与批判,为 AI 设计中的"参与式转向"奠定基础。具体而言,我们综合技术设计、政治理论和社会科学的文献,得出一个研究者和实践者可用于评估 AI 设计中参与方式的概念框架。此外,基于对近期发表研究的分析以及对 12 名 AI 研究者和实践者的半结构化访谈,我们给出了参与式实践现状的实证发现,并据此提供指导,使参与目标与方法在考虑实际约束的前提下更好地协调一致。

All Languages Matter: On the Multilingual Safety of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.00905
  • repo_url: https://github.com/jarviswang94/multilingual_safety_benchmark
  • paper_authors: Wenxuan Wang, Zhaopeng Tu, Chang Chen, Youliang Yuan, Jen-tse Huang, Wenxiang Jiao, Michael R. Lyu
  • for: 本研究旨在为大语言模型(LLM)的开发和部署提供安全性测试 benchmark,以适应 LLM 的全球部署。
  • methods: 我们构建了首个面向 LLM 的多语言安全基准 XSafety,覆盖 10 种语言(横跨多个语系)中 14 类常见安全问题。
  • results: 我们使用 XSafety 对 4 个广泛使用的 LLM 进行了实证研究,发现所有 LLM 对非英语查询产生的不安全回复都显著多于英语查询,表明有必要为非英语语言开发安全对齐。我们还提出了几种简单而有效的提示方法来提升 ChatGPT 的多语言安全性,可将非英语查询的不安全回复比例从 19.1% 降至 9.7%。我们将数据发布在 GitHub 上:https://github.com/Jarviswang94/Multilingual_safety_benchmark。
    Abstract Safety lies at the core of developing and deploying large language models (LLMs). However, previous safety benchmarks only concern the safety in one language, e.g. the majority language in the pretraining data such as English. In this work, we build the first multilingual safety benchmark for LLMs, XSafety, in response to the global deployment of LLMs in practice. XSafety covers 14 kinds of commonly used safety issues across 10 languages that span several language families. We utilize XSafety to empirically study the multilingual safety for 4 widely-used LLMs, including both close-API and open-source models. Experimental results show that all LLMs produce significantly more unsafe responses for non-English queries than English ones, indicating the necessity of developing safety alignment for non-English languages. In addition, we propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT by evoking safety knowledge and improving cross-lingual generalization of safety alignment. Our prompting method can significantly reduce the ratio of unsafe responses from 19.1% to 9.7% for non-English queries. We release our data at https://github.com/Jarviswang94/Multilingual_safety_benchmark.
    摘要 安全是大语言模型(LLM)开发与部署的核心。然而,以往的安全基准只关注单一语言(例如预训练数据中的主导语言英语)的安全性。为应对 LLM 在全球范围的实际部署,我们构建了首个面向 LLM 的多语言安全基准 XSafety,覆盖 10 种语言(横跨多个语系)中 14 类常见安全问题。我们使用 XSafety 对 4 个广泛使用的 LLM(包括闭源 API 模型和开源模型)的多语言安全性进行了实证研究。实验结果表明,所有 LLM 对非英语查询产生的不安全回复都显著多于英语查询,说明有必要为非英语语言开发安全对齐。此外,我们提出了几种简单而有效的提示方法,通过唤起安全知识并改进安全对齐的跨语言泛化来提升 ChatGPT 的多语言安全性;该方法可将非英语查询的不安全回复比例从 19.1% 降至 9.7%。我们的数据发布在 https://github.com/Jarviswang94/Multilingual_safety_benchmark。

Expert enhanced dynamic time warping based anomaly detection

  • paper_url: http://arxiv.org/abs/2310.02280
  • repo_url: None
  • paper_authors: Matej Kloska, Gabriela Grmanova, Viera Rozinajova
  • for: 这篇论文是为了提出一种基于动态时间扭曲(DTW)算法的新型异常检测方法,以提高异常检测的效率和准确率。
  • methods: 该方法以 DTW 算法为基础,并结合人在回路(human-in-the-loop,HITL)思想对其进行扩展和增强。
  • results: 该方法能够高效地检测异常,并能基于专家的检测反馈灵活地重新训练,同时保持较低的计算复杂度和空间复杂度。
    Abstract Dynamic time warping (DTW) is a well-known algorithm for time series elastic dissimilarity measure. Its ability to deal with non-linear time distortions makes it helpful in variety of data mining tasks. Such a task is also anomaly detection which attempts to reveal unexpected behaviour without false detection alarms. In this paper, we propose a novel anomaly detection method named Expert enhanced dynamic time warping anomaly detection (E-DTWA). It is based on DTW with additional enhancements involving human-in-the-loop concept. The main benefits of our approach comprise efficient detection, flexible retraining based on strong consideration of the expert's detection feedback while retaining low computational and space complexity.
    摘要 动态时间扭曲(DTW)是著名的时间序列弹性不相似度度量算法。它处理非线性时间形变的能力使其在多种数据挖掘任务中十分有用,其中之一便是异常检测,即在不产生误报的情况下揭示非预期的行为。本文提出了一种新的异常检测方法:专家增强的动态时间扭曲异常检测(E-DTWA)。该方法以 DTW 为基础,并结合人在回路(HITL)思想加以增强。其主要优点包括高效的检测、能充分吸收专家检测反馈进行灵活的重新训练,同时保持较低的计算与空间复杂度。
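A compact sketch of the classic dynamic-programming DTW distance together with a naive nearest-reference anomaly rule. The threshold and reference patterns are hypothetical, and E-DTWA's expert-feedback loop is not reproduced here.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming DTW distance."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Flag a window as anomalous if it is far (under DTW) from every "normal" reference pattern.
references = [np.sin(np.linspace(0, 2 * np.pi, 50)),
              np.cos(np.linspace(0, 2 * np.pi, 50))]
window = np.sin(np.linspace(0, 2 * np.pi, 50)) + np.r_[np.zeros(25), 3 * np.ones(25)]

score = min(dtw_distance(window, ref) for ref in references)
print("anomaly" if score > 10.0 else "normal", f"(score={score:.2f})")
```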

uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.00900
  • repo_url: None
  • paper_authors: Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu
  • for: 提高语音质量和可理解性,并实现语音编辑
  • methods: 使用条件扩散模型在同一生成式框架中处理多种任务,包括语音降噪和去混响
  • results: 在语音降噪和去混响任务上性能优于其他相关的生成式语音增强模型,并且可以根据期望的环境声音文本描述、信噪比(SNR)和房间脉冲响应(RIR)对语音进行编辑
    Abstract Speech enhancement aims to improve the quality of speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing the speech according to specific user needs. In this paper, we propose a Unified Speech Enhancement and Editing (uSee) model with conditional diffusion models to handle various tasks at the same time in a generative manner. Specifically, by providing multiple types of conditions including self-supervised learning embeddings and proper text prompts to the score-based diffusion model, we can enable controllable generation of the unified speech enhancement and editing model to perform corresponding actions on the source speech. Our experiments show that our proposed uSee model can achieve superior performance in both speech denoising and dereverberation compared to other related generative speech enhancement models, and can perform speech editing given desired environmental sound text description, signal-to-noise ratios (SNR), and room impulse responses (RIR). Demos of the generated speech are available at https://muqiaoy.github.io/usee.
    摘要 语音增强旨在提升语音信号的质量和可懂度,而语音编辑则是根据用户的特定需求对语音进行编辑。本文提出了一个基于条件扩散模型的统一语音增强与编辑(uSee)模型,以生成式方式同时处理多种任务。具体而言,通过向基于得分的扩散模型提供多种条件(包括自监督学习嵌入和合适的文本提示),可以实现可控生成,使统一模型对源语音执行相应的操作。实验表明,与其他相关的生成式语音增强模型相比,uSee 在语音降噪和去混响上均取得了更优的性能,并且能够根据期望的环境声音文本描述、信噪比(SNR)和房间脉冲响应(RIR)进行语音编辑。生成语音的示例见 https://muqiaoy.github.io/usee。

No Offense Taken: Eliciting Offensiveness from Language Models

  • paper_url: http://arxiv.org/abs/2310.00892
  • repo_url: https://github.com/anugyas/nluproject
  • paper_authors: Anugya Srivastava, Rahul Ahuja, Rohith Mukku
  • for: 本研究旨在通过对语言模型进行健壮的测试,提高其在实际应用中的安全性与可靠性。
  • methods: 本研究使用自动生成测试用例的方法,利用公开可用的较小语言模型(LM)、不同的目标 LM 和红队分类器进行实验,并生成了一组能诱导语言模型给出攻击性回复的测试用例。
  • results: 研究发现,通过使用自动生成测试用例,可以帮助发现广泛部署的语言模型的失败模式,并且可以通过对这些测试用例进行分析,提高语言模型的安全性和可靠性。
    Abstract This work was completed in May 2022. For safe and reliable deployment of language models in the real world, testing needs to be robust. This robustness can be characterized by the difficulty and diversity of the test cases we evaluate these models on. Limitations in human-in-the-loop test case generation has prompted an advent of automated test case generation approaches. In particular, we focus on Red Teaming Language Models with Language Models by Perez et al.(2022). Our contributions include developing a pipeline for automated test case generation via red teaming that leverages publicly available smaller language models (LMs), experimenting with different target LMs and red classifiers, and generating a corpus of test cases that can help in eliciting offensive responses from widely deployed LMs and identifying their failure modes.
    摘要 本工作完成于 2022 年 5 月。为了在现实世界中安全可靠地部署语言模型,测试必须足够健壮,而这种健壮性体现在评估所用测试用例的难度和多样性上。人工参与生成测试用例的局限性促使了自动化测试用例生成方法的出现。我们特别关注 Perez 等人(2022)提出的"用语言模型红队测试语言模型"。我们的贡献包括:构建一条利用公开可用的较小语言模型、通过红队方式自动生成测试用例的流水线;对不同的目标语言模型和红队分类器进行实验;并生成一组测试用例语料,帮助诱发广泛部署的语言模型的攻击性回复并识别其失效模式。

GRID: A Platform for General Robot Intelligence Development

  • paper_url: http://arxiv.org/abs/2310.00887
  • repo_url: https://github.com/scaledfoundations/grid-playground
  • paper_authors: Sai Vemprala, Shuhang Chen, Abhinav Shukla, Dinesh Narayanan, Ashish Kapoor
  • for: 这个论文是为了提出一个新的机器人智能发展平台(GRID),以解决现有的机器人智能发展时间和成本问题。
  • methods: 这个平台借助理解物理世界的基础模型来解决机器人领域的 AI 问题,使机器人能够根据自身的物理能力、环境约束和目标来学习、组合并调整技能。
  • results: 在多种空中机器人场景中,该平台显著加速了机器智能的开发,并使机器人能够适应不同的环境和任务。
    Abstract Developing machine intelligence abilities in robots and autonomous systems is an expensive and time consuming process. Existing solutions are tailored to specific applications and are harder to generalize. Furthermore, scarcity of training data adds a layer of complexity in deploying deep machine learning models. We present a new platform for General Robot Intelligence Development (GRID) to address both of these issues. The platform enables robots to learn, compose and adapt skills to their physical capabilities, environmental constraints and goals. The platform addresses AI problems in robotics via foundation models that know the physical world. GRID is designed from the ground up to be extensible to accommodate new types of robots, vehicles, hardware platforms and software protocols. In addition, the modular design enables various deep ML components and existing foundation models to be easily usable in a wider variety of robot-centric problems. We demonstrate the platform in various aerial robotics scenarios and demonstrate how the platform dramatically accelerates development of machine intelligent robots.
    摘要 在机器人和自主系统中开发机器智能能力是一个昂贵且耗时的过程。现有解决方案大多针对特定应用定制,难以泛化;训练数据的稀缺又进一步增加了部署深度学习模型的复杂性。我们提出了一个新的通用机器人智能开发平台(GRID)来同时解决这两个问题。该平台使机器人能够根据自身的物理能力、环境约束和目标来学习、组合并调整技能,并通过理解物理世界的基础模型来解决机器人领域的 AI 问题。GRID 从一开始就被设计为可扩展,能够适配新型机器人、车辆、硬件平台和软件协议;其模块化设计也使各种深度学习组件和现有基础模型能够方便地用于更广泛的以机器人为中心的问题。我们在多种空中机器人场景中展示了该平台,并证明它能显著加速机器智能机器人的开发。

(Dynamic) Prompting might be all you need to repair Compressed LLMs

  • paper_url: http://arxiv.org/abs/2310.00867
  • repo_url: None
  • paper_authors: Duc N. M Hoang, Minsik Cho, Thomas Merth, Mohammad Rastegari, Zhangyang Wang
  • for: 本研究旨在压缩大语言模型(LLM),以降低其计算需求。
  • methods: 本研究采用免训练压缩,并探究了恢复性能的轻量级手段,包括提示驱动恢复(prompt-driven recovery)和推理时动态提示(inference-time dynamic prompting,IDP)。
  • results: 研究发现,提示驱动恢复与动态提示可以提升压缩后 LLM 的性能,在覆盖多个知识领域的九个任务上取得 1.24% 的平均(绝对)提升。
    Abstract Large language models (LLMs), while transformative for NLP, come with significant computational demands, underlining the need for efficient, training-free compression. Notably, despite the marked improvement in training-free compression for the largest of LLMs, our tests using LLaMA-7B and OPT-6.7b highlight a significant performance drop in several realistic downstream tasks. Investigation into the trade-off between resource-intensive post-compression re-training highlights the prospect of prompt-driven recovery as a lightweight adaption tool. However, existing studies, confined mainly to perplexity evaluations and simple tasks, fail to offer unequivocal confidence in the scalability and generalizability of prompting. We tackle this uncertainty in two key ways. First, we uncover the vulnerability of naive prompts in LLM compression as an over-reliance on a singular prompt per input. In response, we propose inference-time dynamic prompting (IDP), a mechanism that autonomously chooses from a set of curated prompts based on the context of each individual input. Second, we delve into a scientific understanding of why "prompting might be all you need post-LLM compression." Our findings suggest that compression does not irretrievably erase LLM model knowledge but displace it, necessitating a new inference path. IDP effectively redirects this path, enabling the model to tap into its inherent yet displaced knowledge and thereby recover performance. Empirical tests affirm the value of IDP, demonstrating an average performance improvement of 1.24% across nine varied tasks spanning multiple knowledge domains.
    摘要 大语言模型(LLM)虽然为 NLP 带来了变革,但其巨大的计算需求凸显了对高效、免训练压缩的需要。值得注意的是,尽管针对最大规模 LLM 的免训练压缩已有明显改进,我们在 LLaMA-7B 和 OPT-6.7b 上的测试仍显示多个现实下游任务出现显著性能下降。对资源密集的压缩后再训练这一权衡的考察表明,提示驱动的恢复有望成为一种轻量级的适配工具。然而,现有研究主要局限于困惑度评估和简单任务,无法令人信服地说明提示方法的可扩展性与泛化性。我们从两个方面解决这一不确定性。首先,我们发现朴素提示在 LLM 压缩中的脆弱性源于对每个输入只依赖单一提示;为此,我们提出推理时动态提示(IDP),根据每个输入的上下文自动从一组精选提示中进行选择。其次,我们深入探究了"压缩后只需提示即可恢复"的科学解释:压缩并不会不可逆地抹去 LLM 的知识,而是使其发生位移,从而需要新的推理路径;IDP 能有效地重新引导这一路径,使模型得以调用其固有但被位移的知识,从而恢复性能。实验证实了 IDP 的价值,在覆盖多个知识领域的九个不同任务上取得了 1.24% 的平均(绝对)性能提升。
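A toy sketch of the inference-time dynamic prompting idea: instead of a single fixed prompt, each input selects the closest prompt from a curated pool. The bag-of-words similarity rule, prompt pool, and hint strings below are illustrative placeholders, not the selection mechanism actually used by IDP.

```python
import math
from collections import Counter

CURATED_PROMPTS = [                                  # hypothetical curated prompt pool
    "Answer the question concisely and factually:",
    "Solve the math problem step by step:",
    "Summarize the following passage in one sentence:",
]
PROMPT_HINTS = ["fact question answer", "math solve number", "summarize passage text"]

def bow(text):
    return Counter(text.lower().split())

def cosine(c1, c2):
    num = sum(c1[w] * c2[w] for w in c1)
    den = math.sqrt(sum(v * v for v in c1.values())) * math.sqrt(sum(v * v for v in c2.values()))
    return num / den if den else 0.0

def pick_prompt(query):
    """Choose a prompt per input instead of reusing a single static prompt."""
    q = bow(query)
    scores = [cosine(q, bow(h)) for h in PROMPT_HINTS]
    return CURATED_PROMPTS[max(range(len(scores)), key=scores.__getitem__)]

query = "What number do you get if you solve 12 * 7?"
print(pick_prompt(query) + "\n" + query)             # prompt + query sent to the compressed LLM
```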

Melody-conditioned lyrics generation via fine-tuning language model and its evaluation with ChatGPT

  • paper_url: http://arxiv.org/abs/2310.00863
  • repo_url: None
  • paper_authors: Zhe Zhang, Karol Lasocki, Yi Yu, Atsuhiro Takasu
  • for: 用于从符号旋律生成音节级歌词
  • methods: 微调字符级预训练语言模型,并将其语言知识整合到音节级 Transformer 生成器的束搜索(beam search)中
  • results: 基于 ChatGPT 的评估表明,生成的歌词具有更高的连贯性和正确性
    Abstract We leverage character-level language models for syllable-level lyrics generation from symbolic melody. By fine-tuning a character-level pre-trained model, we integrate language knowledge into the beam search of a syllable-level Transformer generator. Using ChatGPT-based evaluations, we demonstrate enhanced coherence and correctness in the generated lyrics.
    摘要 我们利用字符级语言模型,从符号旋律生成音节级歌词。通过微调一个字符级预训练模型,我们将语言知识整合到音节级 Transformer 生成器的束搜索中。基于 ChatGPT 的评估表明,生成的歌词在连贯性和正确性上均有提升。
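A toy sketch of fusing a character-level language model's score into the beam search of a syllable-level generator: each hypothesis is scored by a weighted sum of generator and LM log-probabilities. Both scoring functions below are stand-ins for the real melody-conditioned Transformer and fine-tuned character-level LM, and the fusion weight is arbitrary.

```python
SYLLABLES = ["la", "lo", "mi", "sun", "night", "love"]

def generator_logp(prefix, syl):
    # Stand-in for the syllable-level Transformer conditioned on the melody.
    return -0.5 * len(syl)

def char_lm_logp(prefix, syl):
    # Stand-in for the fine-tuned character-level LM scoring the newly added characters.
    text = "".join(prefix) + syl
    return -0.1 * sum(1 for a, b in zip(text, text[1:]) if a == b)

def beam_search(steps=4, beam=3, lm_weight=0.5):
    beams = [([], 0.0)]
    for _ in range(steps):
        expanded = []
        for prefix, score in beams:
            for syl in SYLLABLES:
                s = score + generator_logp(prefix, syl) + lm_weight * char_lm_logp(prefix, syl)
                expanded.append((prefix + [syl], s))
        beams = sorted(expanded, key=lambda x: -x[1])[:beam]   # keep the best hypotheses
    return beams[0][0]

print(" ".join(beam_search()))
```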

Use Your INSTINCT: INSTruction optimization usIng Neural bandits Coupled with Transformers

  • paper_url: http://arxiv.org/abs/2310.02905
  • repo_url: https://github.com/xqlin98/INSTINCT
  • paper_authors: Xiaoqiang Lin, Zhaoxuan Wu, Zhongxiang Dai, Wenyang Hu, Yao Shu, See-Kiong Ng, Patrick Jaillet, Bryan Kian Hsiang Low
  • for: 本文旨在利用神经老虎机(neural bandit)算法优化提供给大语言模型(LLM)的指令,以提升其在各类任务中的性能。
  • methods: 本文采用一种神经老虎机算法,用神经网络(NN)代理模型替换传统的高斯过程(GP)模型,并自然地将该 NN 代理与预训练 Transformer(即一个开源 LLM)学到的隐藏表示耦合起来。
  • results: 实验表明,INSTINCT 算法在指令归纳任务和改进零样本思维链指令等多种任务上均一致优于现有方法。
    Abstract Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performances in various applications. However, the performances of LLMs depend heavily on the instructions given to them, which are typically manually tuned with substantial human efforts. Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimize the instructions given to black-box LLMs. However, BO usually falls short when optimizing highly sophisticated (e.g., high-dimensional) objective functions, such as the functions mapping an instruction to the performance of an LLM. This is mainly due to the limited expressive power of the Gaussian process (GP) model which is used by BO as a surrogate to model the objective function. Meanwhile, it has been repeatedly shown that neural networks (NNs), especially pre-trained transformers, possess strong expressive power and can model highly complex functions. So, we adopt a neural bandit algorithm which replaces the GP in BO by an NN surrogate to optimize instructions for black-box LLMs. More importantly, the neural bandit algorithm allows us to naturally couple the NN surrogate with the hidden representation learned by a pre-trained transformer (i.e., an open-source LLM), which significantly boosts its performance. These motivate us to propose our INSTruction optimization usIng Neural bandits Coupled with Transformers} (INSTINCT) algorithm. We perform instruction optimization for ChatGPT and use extensive experiments to show that our INSTINCT consistently outperforms the existing methods in different tasks, such as in various instruction induction tasks and the task of improving the zero-shot chain-of-thought instruction.
    摘要 大语言模型(LLM)展现出了卓越的指令遵循能力,并在各类应用中取得了令人印象深刻的表现。然而,LLM 的表现很大程度上取决于所给的指令,而这些指令通常需要大量人工调校。近期工作使用查询高效的贝叶斯优化(BO)算法来自动优化提供给黑盒 LLM 的指令。但 BO 在优化高度复杂(如高维)的目标函数(例如从指令映射到 LLM 性能的函数)时往往力不从心,主要原因是 BO 用作代理模型的高斯过程(GP)表达能力有限。与此同时,大量研究表明神经网络(NN),尤其是预训练 Transformer,具有强大的表达能力,能够建模高度复杂的函数。因此,我们采用一种神经老虎机算法,用 NN 代理替换 BO 中的 GP 来为黑盒 LLM 优化指令。更重要的是,该神经老虎机算法可以自然地将 NN 代理与预训练 Transformer(即一个开源 LLM)学到的隐藏表示耦合起来,从而显著提升性能。基于这些动机,我们提出了 INSTINCT(INSTruction optimization usIng Neural bandits Coupled with Transformers)算法。我们针对 ChatGPT 进行指令优化,大量实验表明 INSTINCT 在各类指令归纳任务以及改进零样本思维链指令的任务上均一致优于现有方法。
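A much-simplified sketch of the instruction-optimization loop with an NN surrogate in place of a GP: candidate instructions are featurized (here by a seeded random projection standing in for a frozen transformer's hidden representation), a small network is refit on observed scores, and the next instruction is chosen greedily after a few random warm-up queries. This is not the paper's NeuralUCB-style algorithm or its transformer coupling; the candidates and scoring function are hypothetical.

```python
import torch

torch.manual_seed(0)
CANDIDATES = [                                       # hypothetical instruction candidates
    "Think step by step before answering.",
    "Answer briefly.",
    "Explain your reasoning, then give the final answer.",
    "Translate the task into a math problem first.",
]

def featurize(text, dim=32):
    # Placeholder for the hidden representation of a frozen pre-trained transformer.
    g = torch.Generator().manual_seed(abs(hash(text)) % (2 ** 31))
    return torch.randn(dim, generator=g)

class Surrogate(torch.nn.Module):                    # NN surrogate replacing the GP in BO
    def __init__(self, dim=32):
        super().__init__()
        self.net = torch.nn.Sequential(torch.nn.Linear(dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
    def forward(self, x):
        return self.net(x).squeeze(-1)

def true_score(text):                                # stand-in for the black-box LLM evaluation
    return 0.9 if "step" in text.lower() else 0.6

surrogate, history = Surrogate(), []
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-2)
feats = torch.stack([featurize(c) for c in CANDIDATES])

for t in range(30):
    with torch.no_grad():
        pred = surrogate(feats)
    i = torch.randint(len(CANDIDATES), (1,)).item() if t < 5 else int(pred.argmax())
    history.append((feats[i], torch.tensor(true_score(CANDIDATES[i]))))
    xs = torch.stack([h[0] for h in history]); ys = torch.stack([h[1] for h in history])
    for _ in range(20):                              # refit the surrogate on observed scores
        loss = torch.nn.functional.mse_loss(surrogate(xs), ys)
        opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    print("best instruction:", CANDIDATES[int(surrogate(feats).argmax())])
```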

Application of frozen large-scale models to multimodal task-oriented dialogue

  • paper_url: http://arxiv.org/abs/2310.00845
  • repo_url: None
  • paper_authors: Tatsuki Kawamoto, Takuma Suzuki, Ko Miyama, Takumi Meguro, Tomohiro Takagi
  • for: 本研究用于检验多模态任务导向对话的可行性,方法是利用现有的 LENS 框架(Large Language Models ENhanced to See Framework),在不额外训练、参数固定的前提下解决计算机视觉任务。
  • methods: 我们使用时尚领域的多模态任务导向对话基准数据集 Multimodal Dialogs(MMD),并使用基于 ChatGPT 的 G-EVAL 进行评估;由于 G-EVAL 只接受文本模态,我们做了相应安排以处理多模态数据。
  • results: 与此前基于 Transformer 模型的研究相比,我们的方法在流利度、有用性、相关性与连贯性上分别取得了 10.8%、8.8% 和 5.2% 的绝对提升。
    Abstract In this study, we use the existing Large Language Models ENnhanced to See Framework (LENS Framework) to test the feasibility of multimodal task-oriented dialogues. The LENS Framework has been proposed as a method to solve computer vision tasks without additional training and with fixed parameters of pre-trained models. We used the Multimodal Dialogs (MMD) dataset, a multimodal task-oriented dialogue benchmark dataset from the fashion field, and for the evaluation, we used the ChatGPT-based G-EVAL, which only accepts textual modalities, with arrangements to handle multimodal data. Compared to Transformer-based models in previous studies, our method demonstrated an absolute lift of 10.8% in fluency, 8.8% in usefulness, and 5.2% in relevance and coherence. The results show that using large-scale models with fixed parameters rather than using models trained on a dataset from scratch improves performance in multimodal task-oriented dialogues. At the same time, we show that Large Language Models (LLMs) are effective for multimodal task-oriented dialogues. This is expected to lead to efficient applications to existing systems.
    摘要 在本研究中,我们使用现有的 LENS 框架(Large Language Models ENhanced to See Framework)来检验多模态任务导向对话的可行性。LENS 框架被提出用于在不额外训练、且保持预训练模型参数固定的情况下解决计算机视觉任务。我们使用时尚领域的多模态任务导向对话基准数据集 Multimodal Dialogs(MMD),并采用基于 ChatGPT 的 G-EVAL 进行评估;由于该评估模型只接受文本模态,我们做了相应安排以处理多模态数据。与此前研究中基于 Transformer 的模型相比,我们的方法在流利度、有用性、相关性与连贯性上分别取得了 10.8%、8.8% 和 5.2% 的绝对提升。结果表明,使用参数固定的大规模模型而非从头在数据集上训练的模型,能够提升多模态任务导向对话的性能;同时也说明大语言模型(LLM)对多模态任务导向对话是有效的,有望高效地应用到现有系统中。

Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models

  • paper_url: http://arxiv.org/abs/2310.00836
  • repo_url: None
  • paper_authors: Man Luo, Shrinidhi Kumbhar, Ming shen, Mihir Parmar, Neeraj Varshney, Pratyay Banerjee, Somak Aditya, Chitta Baral
  • for: 本研究旨在了解大语言模型(LLM)在逻辑推理中的能力,特别是通过自然语言进行逻辑推理的最新进展。
  • methods: 本研究编制了名为 LogiGLUE 的基准,将 24 个涵盖演绎、溯因和归纳推理的不同数据集统一为 Seq2Seq 任务;在此基础上训练了指令微调的语言模型 LogiT5,并比较了单任务训练、多任务训练以及思维链知识蒸馏微调等方法在不同逻辑推理类别上的表现。
  • results: 研究表明,单任务训练、多任务训练和思维链知识蒸馏微调等方法能够提升 LLM 在不同逻辑推理类别上的性能,并揭示了进一步增强 LLM 逻辑推理能力的可能路径。
    Abstract Logical reasoning is fundamental for humans yet presents a substantial challenge in the domain of Artificial Intelligence. Initially, researchers used Knowledge Representation and Reasoning (KR) systems that did not scale and required non trivial manual effort. Recently, the emergence of large language models (LLMs) has demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems. Consequently, there is a growing interest in using LLMs for logical reasoning via natural language. This work strives to understand the proficiency of LLMs in logical reasoning by offering a brief review of the latest progress in this area; with a focus on the logical reasoning datasets, tasks, and the methods adopted to utilize LLMs for reasoning. To offer a thorough analysis, we have compiled a benchmark titled LogiGLUE. This includes 24 varied datasets encompassing deductive, abductive, and inductive reasoning. We have standardized these datasets into Seq2Seq tasks to facilitate straightforward training and evaluation for future research. Utilizing LogiGLUE as a foundation, we have trained an instruction fine tuned language model, resulting in LogiT5. We study single task training, multi task training, and a chain of thought knowledge distillation fine tuning technique to assess the performance of model across the different logical reasoning categories. By this comprehensive process, we aim to shed light on the capabilities and potential pathways for enhancing logical reasoning proficiency in LLMs, paving the way for more advanced and nuanced developments in this critical field.
    摘要 逻辑推理对人类而言是基础能力,但在人工智能领域却是一个重大挑战。早期研究者使用知识表示与推理(KR)系统,但这些系统难以扩展且需要大量人工投入。近年来,大语言模型(LLM)的出现展示了克服形式化 KR 系统种种局限的能力,因此利用 LLM 通过自然语言进行逻辑推理受到越来越多的关注。本工作旨在理解 LLM 在逻辑推理方面的能力:我们简要回顾了该领域的最新进展,重点关注逻辑推理数据集、任务以及利用 LLM 进行推理的方法。为进行全面分析,我们编制了名为 LogiGLUE 的基准,包含 24 个涵盖演绎、溯因和归纳推理的不同数据集,并将它们统一为 Seq2Seq 任务,以便后续研究直接训练和评估。以 LogiGLUE 为基础,我们训练了一个指令微调的语言模型 LogiT5,并研究了单任务训练、多任务训练以及思维链知识蒸馏微调技术,以评估模型在不同逻辑推理类别上的性能。通过这一完整过程,我们希望揭示 LLM 的逻辑推理能力及其潜在的提升路径,为这一关键领域更高级、更细致的发展奠定基础。
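To illustrate what standardizing heterogeneous reasoning datasets into Seq2Seq form can look like, here is a guessed conversion of one example into an (input_text, target_text) pair; the field names and prompt template are illustrative, not LogiGLUE's actual schema.

```python
def to_seq2seq(example, task_name):
    """Cast a logical-reasoning example into a uniform Seq2Seq (input, target) pair."""
    input_text = (
        f"task: {task_name} "
        f"context: {example['context']} "
        f"question: {example['question']} "
        f"options: {' | '.join(example['options'])}"
    )
    target_text = example["answer"]
    return input_text, target_text

raw = {
    "context": "All birds can fly. Penguins are birds.",
    "question": "Can penguins fly according to the premises?",
    "options": ["yes", "no", "unknown"],
    "answer": "yes",
}
src, tgt = to_seq2seq(raw, "deductive_reasoning")
print(src)
print(tgt)
```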

Natural Language Models for Data Visualization Utilizing nvBench Dataset

  • paper_url: http://arxiv.org/abs/2310.00832
  • repo_url: None
  • paper_authors: Shuo Wang, Carlos Crespo-Quinones
  • for: 这个论文的目的是用自然语言模型来实现数据可视化中的语言翻译。
  • methods: 这篇论文使用基于 Transformer 的序列到序列模型,以大型预训练语言模型 BERT 作为编码器,从自然语言查询预测可视化命令,并与现成的 T5 序列到序列模型进行对比。
  • results: 论文在大量自然语言查询与可视化命令数据上考察了这些模型架构的设计与性能,验证了该方法的有效性。
    Abstract Translation of natural language into syntactically correct commands for data visualization is an important application of natural language models and could be leveraged to many different tasks. A closely related effort is the task of translating natural languages into SQL queries, which in turn could be translated into visualization with additional information from the natural language query supplied\cite{Zhong:2017qr}. Contributing to the progress in this area of research, we built natural language translation models to construct simplified versions of data and visualization queries in a language called Vega Zero. In this paper, we explore the design and performance of these sequence to sequence transformer based machine learning model architectures using large language models such as BERT as encoders to predict visualization commands from natural language queries, as well as apply available T5 sequence to sequence models to the problem for comparison.
    摘要 将自然语言翻译为语法正确的数据可视化命令是自然语言模型的一个重要应用,可以服务于多种任务。与之密切相关的工作是将自然语言翻译为 SQL 查询,再结合自然语言查询中提供的额外信息将其转换为可视化结果。为推进该领域的研究,我们构建了自然语言翻译模型,用一种名为 Vega Zero 的语言来生成简化的数据与可视化查询。本文探讨了这些基于 Transformer 的序列到序列模型架构的设计与性能:一方面使用 BERT 等大型语言模型作为编码器,从自然语言查询预测可视化命令;另一方面将现成的 T5 序列到序列模型应用于同一问题以作比较。
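A rough sketch of fine-tuning an off-the-shelf T5 model to map a natural-language query to a Vega-Zero-style command, assuming the Hugging Face Transformers library is available; the (query, command) pairs below are made up for illustration and are not taken from nvBench.

```python
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

query = "translate to visualization: show the average salary for each department"
target = "mark bar encoding x department y aggregate mean salary"   # hypothetical Vega-Zero-like command

inputs = tokenizer(query, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# One fine-tuning step on a single (query, command) pair:
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = model(input_ids=inputs.input_ids, attention_mask=inputs.attention_mask, labels=labels).loss
loss.backward()
optimizer.step()

# Inference: generate a command for a new query.
new_inputs = tokenizer("translate to visualization: plot count of orders by month", return_tensors="pt")
generated = model.generate(**new_inputs, max_length=48)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```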

Action Recognition Utilizing YGAR Dataset

  • paper_url: http://arxiv.org/abs/2310.00831
  • repo_url: None
  • paper_authors: Shuo Wang, Amiya Ranjan, Lawrence Jiang
  • for: 这篇论文旨在弥补高质量动作视频数据稀缺给动作识别研究与应用带来的缺口。
  • methods: 这篇论文使用了一种新的3D动作数据生成引擎,生成了3组样本数据,以示其当前的功能性。
  • results: 该数据生成过程可应用于图像分类和动作识别等领域,并有潜力演化为支持探索更复杂动作识别任务的系统。
    Abstract The scarcity of high quality actions video data is a bottleneck in the research and application of action recognition. Although significant effort has been made in this area, there still exist gaps in the range of available data types a more flexible and comprehensive data set could help bridge. In this paper, we present a new 3D actions data simulation engine and generate 3 sets of sample data to demonstrate its current functionalities. With the new data generation process, we demonstrate its applications to image classifications, action recognitions and potential to evolve into a system that would allow the exploration of much more complex action recognition tasks. In order to show off these capabilities, we also train and test a list of commonly used models for image recognition to demonstrate the potential applications and capabilities of the data sets and their generation process.
    摘要 高质量动作视频数据的稀缺是动作识别研究与应用的瓶颈。尽管该领域已投入大量努力,可用数据类型仍存在空白,一个更灵活、更全面的数据集有助于弥补这一差距。本文介绍了一个新的 3D 动作数据仿真引擎,并生成了 3 组样本数据以展示其当前功能。借助这一新的数据生成流程,我们展示了其在图像分类、动作识别方面的应用,以及演化为支持探索更复杂动作识别任务的系统的潜力。为展示这些能力,我们还训练并测试了一系列常用的图像识别模型,以说明这些数据集及其生成流程的潜在应用与能力。