cs.AI - 2023-08-16

On the Augmentation of Cognitive Accuracy and Cognitive Precision in Human/Cog Ensembles

  • paper_url: http://arxiv.org/abs/2308.08581
  • repo_url: None
  • paper_authors: Ron Fulbright
  • for: These studies explore how human performance improves when humans use tools.
  • methods: The studies use a new kind of cognitive system, called "cogs", to achieve human cognitive augmentation.
  • results: The studies find that cooperative dialog between a human and a cognitive system increases human cognitive accuracy and cognitive precision, and that different types of information help improve human creativity and problem-solving ability.
    Abstract Whenever humans use tools human performance is enhanced. Cognitive systems are a new kind of tool continually increasing in cognitive capability and are now performing high level cognitive tasks previously thought to be explicitly human. Usage of such tools, known as cogs, are expected to result in ever increasing levels of human cognitive augmentation. In a human cog ensemble, a cooperative, peer to peer, and collaborative dialog between a human and a cognitive system, human cognitive capability is augmented as a result of the interaction. The human cog ensemble is therefore able to achieve more than just the human or the cog working alone. This article presents results from two studies designed to measure the effect information supplied by a cog has on cognitive accuracy, the ability to produce the correct result, and cognitive precision, the propensity to produce only the correct result. Both cognitive accuracy and cognitive precision are shown to be increased by information of different types (policies and rules, examples, and suggestions) and with different kinds of problems (inventive problem solving and puzzles). Similar effects shown in other studies are compared.

Explainable AI for clinical risk prediction: a survey of concepts, methods, and modalities

  • paper_url: http://arxiv.org/abs/2308.08407
  • repo_url: None
  • paper_authors: Munib Mesinovic, Peter Watkinson, Tingting Zhu
  • for: This paper examines optimization and explainability of AI applications in healthcare, and the use of such systems for diagnosis and disease prognosis.
  • methods: The paper surveys a range of explainability approaches, including explainability evaluation for diagnosis and prognosis, experimental design and the use of synthetic datasets, and the handling and analysis of different data modalities.
  • results: The paper summarizes recent research and development on explainability, covering the application and evaluation of various explainability methods and their outcomes in practical settings.
    Abstract Recent advancements in AI applications to healthcare have shown incredible promise in surpassing human performance in diagnosis and disease prognosis. With the increasing complexity of AI models, however, concerns regarding their opacity, potential biases, and the need for interpretability. To ensure trust and reliability in AI systems, especially in clinical risk prediction models, explainability becomes crucial. Explainability is usually referred to as an AI system's ability to provide a robust interpretation of its decision-making logic or the decisions themselves to human stakeholders. In clinical risk prediction, other aspects of explainability like fairness, bias, trust, and transparency also represent important concepts beyond just interpretability. In this review, we address the relationship between these concepts as they are often used together or interchangeably. This review also discusses recent progress in developing explainable models for clinical risk prediction, highlighting the importance of quantitative and clinical evaluation and validation across multiple common modalities in clinical practice. It emphasizes the need for external validation and the combination of diverse interpretability methods to enhance trust and fairness. Adopting rigorous testing, such as using synthetic datasets with known generative factors, can further improve the reliability of explainability methods. Open access and code-sharing resources are essential for transparency and reproducibility, enabling the growth and trustworthiness of explainable research. While challenges exist, an end-to-end approach to explainability in clinical risk prediction, incorporating stakeholders from clinicians to developers, is essential for success.

PDPK: A Framework to Synthesise Process Data and Corresponding Procedural Knowledge for Manufacturing

  • paper_url: http://arxiv.org/abs/2308.08371
  • repo_url: https://github.com/0x14d/embedding-operator-knowledge
  • paper_authors: Richard Nordsieck, André Schweizer, Michael Heider, Jörg Hähner
  • for: The paper provides a framework for generating synthetic datasets that can be adapted to different domains.
  • methods: The framework represents procedural knowledge in Resource Description Framework (RDF)-compliant knowledge graphs and simulates parametrisation processes to generate consistent process data.
  • results: Evaluation on the generated data shows that some established embedding methods have the potential to represent procedural knowledge; the framework and evaluation code are released open source so that future work can be compared more easily.
    Abstract Procedural knowledge describes how to accomplish tasks and mitigate problems. Such knowledge is commonly held by domain experts, e.g. operators in manufacturing who adjust parameters to achieve quality targets. To the best of our knowledge, no real-world datasets containing process data and corresponding procedural knowledge are publicly available, possibly due to corporate apprehensions regarding the loss of knowledge advances. Therefore, we provide a framework to generate synthetic datasets that can be adapted to different domains. The design choices are inspired by two real-world datasets of procedural knowledge we have access to. Apart from containing representations of procedural knowledge in Resource Description Framework (RDF)-compliant knowledge graphs, the framework simulates parametrisation processes and provides consistent process data. We compare established embedding methods on the resulting knowledge graphs, detailing which out-of-the-box methods have the potential to represent procedural knowledge. This provides a baseline which can be used to increase the comparability of future work. Furthermore, we validate the overall characteristics of a synthesised dataset by comparing the results to those achievable on a real-world dataset. The framework and evaluation code, as well as the dataset used in the evaluation, are available open source.
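As a concrete illustration of how a single piece of procedural knowledge ("if the surface is rough, decrease the print speed") could be encoded as an RDF-compliant knowledge graph, here is a small Python sketch using rdflib. The vocabulary (namespace, class, and property names) is invented purely for illustration and is not the PDPK schema; the only dependency assumed is rdflib.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Invented vocabulary purely for illustration; the PDPK framework defines its own schema.
EX = Namespace("http://example.org/pdpk/")

g = Graph()
g.bind("ex", EX)

# One procedural-knowledge entry: a quality problem, the parameter to adjust, and the direction.
rule = EX.Rule_001
g.add((rule, RDF.type, EX.ProceduralRule))
g.add((rule, EX.observedProblem, EX.RoughSurface))
g.add((rule, EX.adjustsParameter, EX.PrintSpeed))
g.add((rule, EX.adjustmentDirection, Literal("decrease")))
g.add((rule, EX.expectedQualityTarget, EX.SmoothSurface))

print(g.serialize(format="turtle"))
```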

Agglomerative Transformer for Human-Object Interaction Detection

  • paper_url: http://arxiv.org/abs/2308.08370
  • repo_url: None
  • paper_authors: Danyang Tu, Wei Sun, Guangtao Zhai, Wei Shen
  • for: Improving the performance of human-object interaction (HOI) detectors
  • methods: Generates instance tokens via dynamic clustering with textual guidance and integrates them with the Transformer encoder for better feature learning and instance-level cue extraction
  • results: Achieves new state-of-the-art performance of 36.75 mAP on HICO-Det with a single-stage, end-to-end structure, while reducing GFLOPs by 8.5% and improving FPS by 36%
    Abstract We propose an agglomerative Transformer (AGER) that enables Transformer-based human-object interaction (HOI) detectors to flexibly exploit extra instance-level cues in a single-stage and end-to-end manner for the first time. AGER acquires instance tokens by dynamically clustering patch tokens and aligning cluster centers to instances with textual guidance, thus enjoying two benefits: 1) Integrality: each instance token is encouraged to contain all discriminative feature regions of an instance, which demonstrates a significant improvement in the extraction of different instance-level cues and subsequently leads to a new state-of-the-art performance of HOI detection with 36.75 mAP on HICO-Det. 2) Efficiency: the dynamical clustering mechanism allows AGER to generate instance tokens jointly with the feature learning of the Transformer encoder, eliminating the need of an additional object detector or instance decoder in prior methods, thus allowing the extraction of desirable extra cues for HOI detection in a single-stage and end-to-end pipeline. Concretely, AGER reduces GFLOPs by 8.5% and improves FPS by 36%, even compared to a vanilla DETR-like pipeline without extra cue extraction.
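The core mechanism the abstract describes, dynamically clustering patch tokens into instance tokens under textual guidance, can be sketched in a few lines of numpy. Everything below (the soft-assignment rule, the 0.5 mixing weight, the shapes) is an illustrative assumption, not AGER's actual architecture.

```python
import numpy as np

def agglomerate_patches(patch_tokens, text_prototypes, n_iter=3, tau=0.1):
    """Softly cluster patch tokens into instance tokens, nudging cluster
    centers toward text prototypes (illustrative only, not AGER's actual rule)."""
    centers = text_prototypes.copy()                      # (K, d): one center per instance label
    for _ in range(n_iter):
        # Soft assignment of every patch to every center via scaled similarity.
        sims = patch_tokens @ centers.T                   # (N, K)
        assign = np.exp(sims / tau)
        assign /= assign.sum(axis=1, keepdims=True)
        # Update centers as the assignment-weighted mean of patch tokens,
        # then mix in the text prototype as a guidance term.
        weighted = assign.T @ patch_tokens                # (K, d)
        weighted /= assign.sum(axis=0)[:, None] + 1e-8
        centers = 0.5 * weighted + 0.5 * text_prototypes
    return centers, assign                                # instance tokens + patch-to-instance map

# Toy usage: 196 patch tokens of dim 64, 3 textual instance prototypes.
rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 64))
prototypes = rng.normal(size=(3, 64))
instance_tokens, assignment = agglomerate_patches(patches, prototypes)
print(instance_tokens.shape, assignment.shape)  # (3, 64) (196, 3)
```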

Is Meta-Learning the Right Approach for the Cold-Start Problem in Recommender Systems?

  • paper_url: http://arxiv.org/abs/2308.08354
  • repo_url: None
  • paper_authors: Davide Buffelli, Ashish Gupta, Agnieszka Strzalka, Vassilis Plachouras
  • for: This paper addresses the cold-start problem in deep learning models for recommender systems, exploring whether standard, widely adopted deep learning models and a simple modular approach can achieve similar or higher performance without meta-learning techniques.
  • methods: The paper compares a variety of standard, widely adopted deep learning models with meta-learning models specifically designed for the cold-start setting, and proposes a simple modular approach based on common representation learning techniques.
  • results: When tuned correctly, standard and widely adopted deep learning models perform just as well as newer meta-learning models on commonly used cold-start benchmarks, and the simple modular approach performs comparably to meta-learning techniques designed for the cold-start setting while being much more easily deployable in real-world applications.
    Abstract Recommender systems have become fundamental building blocks of modern online products and services, and have a substantial impact on user experience. In the past few years, deep learning methods have attracted a lot of research, and are now heavily used in modern real-world recommender systems. Nevertheless, dealing with recommendations in the cold-start setting, e.g., when a user has done limited interactions in the system, is a problem that remains far from solved. Meta-learning techniques, and in particular optimization-based meta-learning, have recently become the most popular approaches in the academic research literature for tackling the cold-start problem in deep learning models for recommender systems. However, current meta-learning approaches are not practical for real-world recommender systems, which have billions of users and items, and strict latency requirements. In this paper we show that it is possible to obtaining similar, or higher, performance on commonly used benchmarks for the cold-start problem without using meta-learning techniques. In more detail, we show that, when tuned correctly, standard and widely adopted deep learning models perform just as well as newer meta-learning models. We further show that an extremely simple modular approach using common representation learning techniques, can perform comparably to meta-learning techniques specifically designed for the cold-start setting while being much more easily deployable in real-world applications.
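One plausible instantiation of the "simple modular approach" the abstract alludes to is to build a cold user's representation by pooling the embeddings of the few items they have interacted with and scoring candidates with a dot product. The sketch below is a hedged illustration with synthetic embeddings, not the paper's exact module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend these item embeddings come from any standard, well-tuned recommender
# (e.g., matrix factorization or a two-tower model); shapes are illustrative.
n_items, dim = 1000, 32
item_emb = rng.normal(size=(n_items, dim))

def cold_user_embedding(interacted_item_ids, item_emb):
    """Represent a cold-start user by pooling the embeddings of the few items
    they have interacted with (one simple modular choice, not the paper's exact module)."""
    if len(interacted_item_ids) == 0:
        return np.zeros(item_emb.shape[1])          # fall back to a neutral vector
    return item_emb[interacted_item_ids].mean(axis=0)

def recommend(user_vec, item_emb, k=5):
    scores = item_emb @ user_vec                    # dot-product scoring
    return np.argsort(-scores)[:k]

# A user with only two observed interactions still gets usable recommendations.
user_vec = cold_user_embedding([17, 42], item_emb)
print(recommend(user_vec, item_emb))
```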

Graph Out-of-Distribution Generalization with Controllable Data Augmentation

  • paper_url: http://arxiv.org/abs/2308.08344
  • repo_url: None
  • paper_authors: Bin Lu, Xiaoying Gan, Ze Zhao, Shiyu Liang, Luoyi Fu, Xinbing Wang, Chenghu Zhou
  • for: This work addresses distribution shift in graph classification, improving the stability and generalization ability of graph classification models.
  • methods: The paper proposes a controllable data augmentation technique that extracts graph rationales, generates perturbed virtual samples, and uses Extreme Value Theory to assess the influence of the virtual samples on the model.
  • results: Extensive studies on several real-world datasets show that the method has a clear advantage over baseline models.
    Abstract Graph Neural Network (GNN) has demonstrated extraordinary performance in classifying graph properties. However, due to the selection bias of training and testing data (e.g., training on small graphs and testing on large graphs, or training on dense graphs and testing on sparse graphs), distribution deviation is widespread. More importantly, we often observe \emph{hybrid structure distribution shift} of both scale and density, despite of one-sided biased data partition. The spurious correlations over hybrid distribution deviation degrade the performance of previous GNN methods and show large instability among different datasets. To alleviate this problem, we propose \texttt{OOD-GMixup} to jointly manipulate the training distribution with \emph{controllable data augmentation} in metric space. Specifically, we first extract the graph rationales to eliminate the spurious correlations due to irrelevant information. Secondly, we generate virtual samples with perturbation on graph rationale representation domain to obtain potential OOD training samples. Finally, we propose OOD calibration to measure the distribution deviation of virtual samples by leveraging Extreme Value Theory, and further actively control the training distribution by emphasizing the impact of virtual OOD samples. Extensive studies on several real-world datasets on graph classification demonstrate the superiority of our proposed method over state-of-the-art baselines.
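A rough numpy/scipy sketch of the three steps the abstract lists: perturb graph-rationale embeddings to create virtual samples, score how far they fall from the training distribution, and calibrate that score with a generalized Pareto (Extreme Value Theory) tail fit. The distance measure, threshold, and weighting rule are all illustrative assumptions rather than OOD-GMixup's actual procedure.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)

# Assume an upstream GNN already produced "rationale" embeddings for the training graphs.
train_emb = rng.normal(size=(500, 16))

# 1) Virtual OOD samples: perturb rationale embeddings (illustrative Gaussian noise).
virtual_emb = train_emb[:100] + rng.normal(scale=0.8, size=(100, 16))

# 2) Deviation score of each virtual sample from the training distribution (distance to mean).
center = train_emb.mean(axis=0)
train_dev = np.linalg.norm(train_emb - center, axis=1)
virt_dev = np.linalg.norm(virtual_emb - center, axis=1)

# 3) EVT-style calibration: fit a generalized Pareto distribution to the tail of the
#    training deviations and use the tail probability to weight virtual samples.
threshold = np.quantile(train_dev, 0.9)
excess = train_dev[train_dev > threshold] - threshold
c, loc, scale = genpareto.fit(excess, floc=0.0)
tail_prob = genpareto.sf(np.maximum(virt_dev - threshold, 0.0), c, loc=loc, scale=scale)

# Emphasize virtual samples that sit further out in the tail (weighting rule is illustrative).
weights = 1.0 - tail_prob
print(weights.min(), weights.max())
```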

Learning Logic Programs by Discovering Higher-Order Abstractions

  • paper_url: http://arxiv.org/abs/2308.08334
  • repo_url: None
  • paper_authors: Céline Hocquette, Sebastijan Dumančić, Andrew Cropper
  • for: This work aims to discover higher-order abstractions, a step toward human-level AI.
  • methods: The approach builds on inductive logic programming, which induces logic programs from examples and background knowledge.
  • results: Experiments across multiple domains show that STEVIE improves predictive accuracy by 27% and reduces learning time by 47%; STEVIE also discovers abstractions that transfer across domains.
    Abstract Discovering novel abstractions is important for human-level AI. We introduce an approach to discover higher-order abstractions, such as map, filter, and fold. We focus on inductive logic programming, which induces logic programs from examples and background knowledge. We introduce the higher-order refactoring problem, where the goal is to compress a logic program by introducing higher-order abstractions. We implement our approach in STEVIE, which formulates the higher-order refactoring problem as a constraint optimisation problem. Our experimental results on multiple domains, including program synthesis and visual reasoning, show that, compared to no refactoring, STEVIE can improve predictive accuracies by 27% and reduce learning times by 47%. We also show that STEVIE can discover abstractions that transfer to different domains
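STEVIE operates on logic programs, but the idea of compressing programs by introducing a higher-order abstraction can be illustrated in Python: two task-specific definitions share a recursion skeleton that a single map-style abstraction captures once. The refactoring problem is then to discover `map_all` automatically and rewrite the original definitions in terms of it.

```python
# Two task-specific "programs" that share the same recursion skeleton ...
def double_all(xs):
    return [] if not xs else [2 * xs[0]] + double_all(xs[1:])

def negate_all(xs):
    return [] if not xs else [-xs[0]] + negate_all(xs[1:])

# ... which a single higher-order abstraction (a 'map'-style predicate in logic
# programming) captures once, so each task reduces to supplying a small function.
def map_all(f, xs):
    return [] if not xs else [f(xs[0])] + map_all(f, xs[1:])

double_all2 = lambda xs: map_all(lambda x: 2 * x, xs)
negate_all2 = lambda xs: map_all(lambda x: -x, xs)

assert double_all([1, 2, 3]) == double_all2([1, 2, 3]) == [2, 4, 6]
assert negate_all([1, 2, 3]) == negate_all2([1, 2, 3]) == [-1, -2, -3]
```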

A Framework for Data-Driven Explainability in Mathematical Optimization

  • paper_url: http://arxiv.org/abs/2308.08309
  • repo_url: None
  • paper_authors: Kevin-Martin Aigner, Marc Goerigk, Michael Hartisch, Frauke Liers, Arthur Miehlich
  • for: Advances in mathematical programming make it possible to efficiently solve many large-scale real-world problems that were previously considered intractable.
  • methods: Introduces explainability as an evaluation criterion, addressing the perception of optimization software as a black box.
  • results: Proposes explainability as a new evaluation criterion, shows that the explainable model is NP-hard already in simple cases, gives an explainable model for the shortest-path problem, and reports numerical experiments showing that the cost of enforcing explainability can be very small.
    Abstract Advancements in mathematical programming have made it possible to efficiently tackle large-scale real-world problems that were deemed intractable just a few decades ago. However, provably optimal solutions may not be accepted due to the perception of optimization software as a black box. Although well understood by scientists, this lacks easy accessibility for practitioners. Hence, we advocate for introducing the explainability of a solution as another evaluation criterion, next to its objective value, which enables us to find trade-off solutions between these two criteria. Explainability is attained by comparing against (not necessarily optimal) solutions that were implemented in similar situations in the past. Thus, solutions are preferred that exhibit similar features. Although we prove that already in simple cases the explainable model is NP-hard, we characterize relevant polynomially solvable cases such as the explainable shortest-path problem. Our numerical experiments on both artificial as well as real-world road networks show the resulting Pareto front. It turns out that the cost of enforcing explainability can be very small.
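A toy sketch of the data-driven explainability idea on a shortest-path instance: the objective blends travel cost with a penalty for deviating from edges used in past (historical) solutions, and sweeping the trade-off parameter traces a small Pareto front between objective value and explainability. The graph, penalty, and blending rule are illustrative assumptions, not the paper's exact formulation.

```python
import heapq

# Toy road network: adjacency list of (neighbor, cost).
graph = {
    "A": [("B", 1.0), ("C", 2.5)],
    "B": [("D", 2.0)],
    "C": [("D", 1.0)],
    "D": [],
}
# Edges used in past (historical) solutions; reusing them makes a route "explainable".
historical_edges = {("A", "C"), ("C", "D")}

def dijkstra(src, dst, lam):
    """Shortest path under a blended objective:
    lam * travel cost + (1 - lam) * penalty for deviating from historical edges."""
    pq = [(0.0, src, [src])]
    seen = set()
    while pq:
        obj, node, path = heapq.heappop(pq)
        if node == dst:
            return obj, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, cost in graph[node]:
            penalty = 0.0 if (node, nxt) in historical_edges else 1.0
            heapq.heappush(pq, (obj + lam * cost + (1 - lam) * penalty, nxt, path + [nxt]))
    return float("inf"), []

# Sweep the trade-off parameter to trace an (objective, explainability) Pareto front.
for lam in (1.0, 0.5, 0.0):
    print(lam, dijkstra("A", "D", lam))
```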

Integrating cognitive map learning and active inference for planning in ambiguous environments

  • paper_url: http://arxiv.org/abs/2308.08307
  • repo_url: None
  • paper_authors: Toon Van de Maele, Bart Dhoedt, Tim Verbelen, Giovanni Pezzulo
  • for: This work investigates how to integrate cognitive maps with planning mechanisms to improve an organism's ability to navigate ambiguous environments.
  • methods: A statistical model of cognitive map formation is integrated into an active inference agent that supports planning under uncertainty.
  • results: The active inference agent plans more effectively in challenging scenarios, especially when sensory observations provide ambiguous information about location.
    Abstract Living organisms need to acquire both cognitive maps for learning the structure of the world and planning mechanisms able to deal with the challenges of navigating ambiguous environments. Although significant progress has been made in each of these areas independently, the best way to integrate them is an open research question. In this paper, we propose the integration of a statistical model of cognitive map formation within an active inference agent that supports planning under uncertainty. Specifically, we examine the clone-structured cognitive graph (CSCG) model of cognitive map formation and compare a naive clone graph agent with an active inference-driven clone graph agent, in three spatial navigation scenarios. Our findings demonstrate that while both agents are effective in simple scenarios, the active inference agent is more effective when planning in challenging scenarios, in which sensory observations provide ambiguous information about location.

Robust Bayesian Satisficing

  • paper_url: http://arxiv.org/abs/2308.08291
  • repo_url: None
  • paper_authors: Artun Saday, Yaşar Cahit Yıldırım, Cem Tekin
  • for: This paper addresses distributional shift in modern machine learning to ensure model robustness and reliability.
  • methods: The paper proposes RoBOS, a robust Bayesian satisficing algorithm for achieving robustness to distribution shift in noisy black-box contextual Bayesian optimization.
  • results: The paper proves that RoBOS guarantees sublinear lenient regret under certain assumptions, and defines a weaker notion of regret, robust satisficing regret, under which a sublinear upper bound holds independent of the amount of distribution shift.
    Abstract Distributional shifts pose a significant challenge to achieving robustness in contemporary machine learning. To overcome this challenge, robust satisficing (RS) seeks a robust solution to an unspecified distributional shift while achieving a utility above a desired threshold. This paper focuses on the problem of RS in contextual Bayesian optimization when there is a discrepancy between the true and reference distributions of the context. We propose a novel robust Bayesian satisficing algorithm called RoBOS for noisy black-box optimization. Our algorithm guarantees sublinear lenient regret under certain assumptions on the amount of distribution shift. In addition, we define a weaker notion of regret called robust satisficing regret, in which our algorithm achieves a sublinear upper bound independent of the amount of distribution shift. To demonstrate the effectiveness of our method, we apply it to various learning problems and compare it to other approaches, such as distributionally robust optimization.

It Ain’t That Bad: Understanding the Mysterious Performance Drop in OOD Generalization for Generative Transformer Models

  • paper_url: http://arxiv.org/abs/2308.08268
  • repo_url: None
  • paper_authors: Xingcheng Xu, Zihao Pan, Haipeng Zhang, Yanqing Yang
  • for: investigate the generalization ability of Generative Transformer-based models
  • methods: using n-digit addition and multiplication tasks to study the models’ generalization behaviors
  • results: the models show successful ID generalization but poor OOD generalization; this behavior is traced to learned algebraic structures that map OOD inputs to outputs with equivalence relations in the ID domain.
    Abstract Generative Transformer-based models have achieved remarkable proficiency on solving diverse problems. However, their generalization ability is not fully understood and not always satisfying. Researchers take basic mathematical tasks like n-digit addition or multiplication as important perspectives for investigating their generalization behaviors. Curiously, it is observed that when training on n-digit operations (e.g., additions) in which both input operands are n-digit in length, models generalize successfully on unseen n-digit inputs (in-distribution (ID) generalization), but fail miserably and mysteriously on longer, unseen cases (out-of-distribution (OOD) generalization). Studies try to bridge this gap with workarounds such as modifying position embedding, fine-tuning, and priming with more extensive or instructive data. However, without addressing the essential mechanism, there is hardly any guarantee regarding the robustness of these solutions. We bring this unexplained performance drop into attention and ask whether it is purely from random errors. Here we turn to the mechanistic line of research which has notable successes in model interpretability. We discover that the strong ID generalization stems from structured representations, while behind the unsatisfying OOD performance, the models still exhibit clear learned algebraic structures. Specifically, these models map unseen OOD inputs to outputs with equivalence relations in the ID domain. These highlight the potential of the models to carry useful information for improved generalization.
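A minimal harness for probing the ID/OOD split the abstract describes: generate additions whose operands have the training length and additions with longer, unseen operands, and compare accuracies. The `toy_model` below is a deliberately flawed placeholder standing in for a trained generative transformer.

```python
import random

def make_addition_examples(n_digits, k, seed=0):
    """Generate k addition problems whose operands both have exactly n_digits digits."""
    rng = random.Random(seed)
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(k)]

def accuracy(model_fn, examples):
    return sum(model_fn(a, b) == a + b for a, b in examples) / len(examples)

# Stand-in for a trained generative model's answer to "a + b = ?"; replace with a real model call.
def toy_model(a, b):
    return a + b if len(str(a)) <= 3 else (a + b) // 10 * 10   # deliberately fails on longer inputs

id_set  = make_addition_examples(3, 200)        # same operand length as "training" (in-distribution)
ood_set = make_addition_examples(5, 200)        # longer, unseen length (out-of-distribution)
print("ID accuracy: ", accuracy(toy_model, id_set))
print("OOD accuracy:", accuracy(toy_model, ood_set))
```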

Description Logics Go Second-Order – Extending EL with Universally Quantified Concepts

  • paper_url: http://arxiv.org/abs/2308.08252
  • repo_url: None
  • paper_authors: Joshua Hirschbrunn, Yevgeny Kazakov
  • for: This paper studies extensions of description logics beyond the usual decidable fragments of first-order logic.
  • methods: The paper introduces universally quantified concepts, i.e. concept variables that can be replaced by arbitrary concepts, with two semantics: a schema semantics, in which concept variables may only be replaced by concepts from a particular language, and a second-order semantics, in which concept variables may be replaced by arbitrary subsets of the domain.
  • results: For a useful fragment of the extension of $\mathcal{EL}$, reasoning can be done with classical $\mathcal{EL}$ algorithms even under the second-order semantics; a slightly smaller but still useful fragment is shown to be polynomially decidable. This fragment can express a generalized form of role chain axioms, positive self restrictions, and some forms of (local) role-value-maps from KL-ONE without any additional constructors.
    Abstract The study of Description Logics have been historically mostly focused on features that can be translated to decidable fragments of first-order logic. In this paper, we leave this restriction behind and look for useful and decidable extensions outside first-order logic. We introduce universally quantified concepts, which take the form of variables that can be replaced with arbitrary concepts, and define two semantics of this extension. A schema semantics allows replacements of concept variables only by concepts from a particular language, giving us axiom schemata similar to modal logics. A second-order semantics allows replacement of concept variables with arbitrary subsets of the domain, which is similar to quantified predicates in second-order logic. To study the proposed semantics, we focus on the extension of the description logic $\mathcal{EL}$. We show that for a useful fragment of the extension, the conclusions entailed by the different semantics coincide, allowing us to use classical $\mathcal{EL}$ reasoning algorithms even for the second-order semantics. For a slightly smaller, but still useful, fragment, we were also able to show polynomial decidability of the extension. This fragment, in particular, can express a generalized form of role chain axioms, positive self restrictions, and some forms of (local) role-value-maps from KL-ONE, without requiring any additional constructors.

TEST: Text Prototype Aligned Embedding to Activate LLM’s Ability for Time Series

  • paper_url: http://arxiv.org/abs/2308.08241
  • repo_url: None
  • paper_authors: Chenxi Sun, Yaliang Li, Hongyan Li, Shenda Hong
  • for: This paper explores how today's large language models (LLMs) can be used for time-series (TS) tasks.
  • methods: Two strategies are outlined: designing and training a fundamental large model for TS data (LLM-for-TS), and enabling a pre-trained LLM to handle TS data (TS-for-LLM). The paper focuses on TS-for-LLM methods, activating the LLM's ability to process TS data by designing a TS embedding method suitable for the LLM.
  • results: Experiments show that the proposed TEST method enables LLMs to process TS data; it does not significantly outperform current models customized for TS tasks, but it endows the LLM with the ability to handle TS data without compromising its language ability.
    Abstract This work summarizes two strategies for completing time-series (TS) tasks using today's language model (LLM): LLM-for-TS, design and train a fundamental large model for TS data; TS-for-LLM, enable the pre-trained LLM to handle TS data. Considering the insufficient data accumulation, limited resources, and semantic context requirements, this work focuses on TS-for-LLM methods, where we aim to activate LLM's ability for TS data by designing a TS embedding method suitable for LLM. The proposed method is named TEST. It first tokenizes TS, builds an encoder to embed them by instance-wise, feature-wise, and text-prototype-aligned contrast, and then creates prompts to make LLM more open to embeddings, and finally implements TS tasks. Experiments are carried out on TS classification and forecasting tasks using 8 LLMs with different structures and sizes. Although its results cannot significantly outperform the current SOTA models customized for TS tasks, by treating LLM as the pattern machine, it can endow LLM's ability to process TS data without compromising the language ability. This paper is intended to serve as a foundational work that will inspire further research.
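A numpy sketch of the shape of the TS-for-LLM pipeline: cut the series into patch tokens, embed them, and align the embeddings against a set of text-prototype vectors so the result can be fed to a frozen LLM as soft prompts. The linear encoder, random prototypes, and softmax alignment are stand-ins; TEST trains a real encoder with instance-wise, feature-wise, and text-prototype-aligned contrastive objectives.

```python
import numpy as np

rng = np.random.default_rng(0)

def tokenize_series(series, patch_len=16):
    """Cut a univariate series into non-overlapping patches ("TS tokens")."""
    n = len(series) // patch_len
    return series[: n * patch_len].reshape(n, patch_len)

def embed(patches, W):
    """Toy linear encoder; TEST trains a real encoder with contrastive objectives."""
    z = patches @ W
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-8)

# Text prototypes: embeddings of a few words taken from the LLM's own embedding table
# (here random stand-ins), used as anchors so TS embeddings live in a space the LLM can read.
d_model = 64
text_prototypes = rng.normal(size=(8, d_model))
text_prototypes /= np.linalg.norm(text_prototypes, axis=1, keepdims=True)

W = rng.normal(size=(16, d_model)) * 0.1
series = np.sin(np.linspace(0, 12, 256)) + 0.1 * rng.normal(size=256)

ts_tokens = embed(tokenize_series(series), W)
# Text-prototype alignment: express each TS token as a soft mixture of prototypes,
# which is what makes the resulting "soft prompts" digestible for a frozen LLM.
alignment = np.exp(ts_tokens @ text_prototypes.T)
alignment /= alignment.sum(axis=1, keepdims=True)
soft_prompt = alignment @ text_prototypes
print(ts_tokens.shape, soft_prompt.shape)   # (16, 64) (16, 64)
```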

Challenges and Opportunities of Using Transformer-Based Multi-Task Learning in NLP Through ML Lifecycle: A Survey

  • paper_url: http://arxiv.org/abs/2308.08234
  • repo_url: None
  • paper_authors: Lovre Torbarina, Tin Ferkovic, Lukasz Roguski, Velimir Mihelcic, Bruno Sarlija, Zeljko Kraljevic
  • for: This survey examines how multi-task learning (MTL) can be used to make training and deploying natural language processing (NLP) models more efficient, addressing practical problems of model training and deployment in real-world applications.
  • methods: The paper reviews transformer-based MTL approaches in NLP and discusses the challenges and opportunities of MTL methods across ML lifecycle phases.
  • results: The survey systematically analyses how MTL approaches fit into the ML lifecycle phases of NLP applications and identifies their challenges and opportunities in each phase; it also motivates research on combining MTL with continual learning (CL) to ease periodic retraining, adaptation to distribution shifts, and the addition of new capabilities.
    Abstract The increasing adoption of natural language processing (NLP) models across industries has led to practitioners' need for machine learning systems to handle these models efficiently, from training to serving them in production. However, training, deploying, and updating multiple models can be complex, costly, and time-consuming, mainly when using transformer-based pre-trained language models. Multi-Task Learning (MTL) has emerged as a promising approach to improve efficiency and performance through joint training, rather than training separate models. Motivated by this, we first provide an overview of transformer-based MTL approaches in NLP. Then, we discuss the challenges and opportunities of using MTL approaches throughout typical ML lifecycle phases, specifically focusing on the challenges related to data engineering, model development, deployment, and monitoring phases. This survey focuses on transformer-based MTL architectures and, to the best of our knowledge, is novel in that it systematically analyses how transformer-based MTL in NLP fits into ML lifecycle phases. Furthermore, we motivate research on the connection between MTL and continual learning (CL), as this area remains unexplored. We believe it would be practical to have a model that can handle both MTL and CL, as this would make it easier to periodically re-train the model, update it due to distribution shifts, and add new capabilities to meet real-world requirements.

Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models

  • paper_url: http://arxiv.org/abs/2308.11521
  • repo_url: None
  • paper_authors: Zhenhua Wang, Wei Xie, Kai Chen, Baosheng Wang, Zhiwen Gui, Enze Wang
  • for: This paper investigates the LLM "jailbreak" problem and proposes an automatic jailbreak method.
  • methods: The paper introduces the concept of a semantic firewall with three technical implementation approaches, and a "self-deception" attack that bypasses the firewall by inducing the LLM to generate prompts that facilitate jailbreak.
  • results: Experiments show that the proposed attack successfully jailbreaks GPT-3.5-Turbo and GPT-4 with success rates of 86.2% and 67%, and failure rates of 4.7% and 2.2%, respectively.
    Abstract Large language models (LLMs), such as ChatGPT, have emerged with astonishing capabilities approaching artificial general intelligence. While providing convenience for various societal needs, LLMs have also lowered the cost of generating harmful content. Consequently, LLM developers have deployed semantic-level defenses to recognize and reject prompts that may lead to inappropriate content. Unfortunately, these defenses are not foolproof, and some attackers have crafted "jailbreak" prompts that temporarily hypnotize the LLM into forgetting content defense rules and answering any improper questions. To date, there is no clear explanation of the principles behind these semantic-level attacks and defenses in both industry and academia. This paper investigates the LLM jailbreak problem and proposes an automatic jailbreak method for the first time. We propose the concept of a semantic firewall and provide three technical implementation approaches. Inspired by the attack that penetrates traditional firewalls through reverse tunnels, we introduce a "self-deception" attack that can bypass the semantic firewall by inducing LLM to generate prompts that facilitate jailbreak. We generated a total of 2,520 attack payloads in six languages (English, Russian, French, Spanish, Chinese, and Arabic) across seven virtual scenarios, targeting the three most common types of violations: violence, hate, and pornography. The experiment was conducted on two models, namely the GPT-3.5-Turbo and GPT-4. The success rates on the two models were 86.2% and 67%, while the failure rates were 4.7% and 2.2%, respectively. This highlighted the effectiveness of the proposed attack method. All experimental code and raw data will be released as open-source to inspire future research. We believe that manipulating AI behavior through carefully crafted prompts will become an important research direction in the future.

In situ Fault Diagnosis of Indium Tin Oxide Electrodes by Processing S-Parameter Patterns

  • paper_url: http://arxiv.org/abs/2308.11639
  • repo_url: None
  • paper_authors: Tae Yeob Kang, Haebom Lee, Sungho Suh
  • for: This work targets fault detection and diagnosis of indium tin oxide (ITO) electrodes in optoelectronic devices, to ensure device performance and reliability.
  • methods: The study proposes an in situ fault diagnosis method based on scattering parameter (S-parameter) signal processing, offering early detection, high diagnostic accuracy, noise robustness, and root cause analysis.
  • results: A comprehensive database of S-parameter patterns is built for different defect states, and deep learning (DL) approaches, including multilayer perceptron (MLP), convolutional neural network (CNN), and transformer models, are used to simultaneously analyse the cause and severity of defects.
    Abstract In the field of optoelectronics, indium tin oxide (ITO) electrodes play a crucial role in various applications, such as displays, sensors, and solar cells. Effective fault detection and diagnosis of the ITO electrodes are essential to ensure the performance and reliability of the devices. However, traditional visual inspection is challenging with transparent ITO electrodes, and existing fault detection methods have limitations in determining the root causes of the defects, often requiring destructive evaluations. In this study, an in situ fault diagnosis method is proposed using scattering parameter (S-parameter) signal processing, offering early detection, high diagnostic accuracy, noise robustness, and root cause analysis. A comprehensive S-parameter pattern database is obtained according to defect states. Deep learning (DL) approaches, including multilayer perceptron (MLP), convolutional neural network (CNN), and transformer, are then used to simultaneously analyze the cause and severity of defects. Notably, it is demonstrated that the diagnostic performance under additive noise levels can be significantly enhanced by combining different channels of the S-parameters as input to the learning algorithms, as confirmed through the t-distributed stochastic neighbor embedding (t-SNE) dimension reduction visualization.
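The abstract's key practical point, that stacking several S-parameter channels into one input vector improves diagnosis, can be sketched with a plain scikit-learn MLP on synthetic data. The data shapes, injected class signature, and network size are illustrative assumptions, not the paper's database or models.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the S-parameter database: for each sample, several channels
# (e.g., magnitude and phase of S11/S21) measured over a frequency sweep.
n_samples, n_channels, n_freqs = 600, 4, 64
X = rng.normal(size=(n_samples, n_channels, n_freqs))
y = rng.integers(0, 3, size=n_samples)            # defect cause / severity class (toy labels)
# Inject a weak class-dependent signature so the toy problem is learnable.
for c in range(3):
    X[y == c, c % n_channels, :] += 0.8

# Key point from the abstract: concatenate the different S-parameter channels into one
# input vector rather than feeding a single channel.
X_flat = X.reshape(n_samples, n_channels * n_freqs)

X_tr, X_te, y_tr, y_te = train_test_split(X_flat, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```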

Explainable Multi-View Deep Networks Methodology for Experimental Physics

  • paper_url: http://arxiv.org/abs/2308.08206
  • repo_url: https://github.com/scientific-computing-lab-nrcn/multi-view-explainability
  • paper_authors: Nadav Schneider, Muriel Tzdaka, Galit Sturm, Guy Lazovski, Galit Bar, Gilad Oren, Raz Gvishi, Gal Oren
  • for: This paper aims to explain the decision-making of multi-view learning models in physical experiments, providing different multi-view architectures and a methodology for explaining these models.
  • methods: The paper uses several deep learning models, including multi-view models, together with explainability methods to interpret their decision processes.
  • results: Experiments show that choosing an appropriate multi-view architecture improves classification accuracy and provides additional explainability; specifically, in High Energy Density Physics experiments, multi-view models extract more information from the multiple imaging representations of foam samples, improving the accuracy of sample quality assessment.
    Abstract Physical experiments often involve multiple imaging representations, such as X-ray scans and microscopic images. Deep learning models have been widely used for supervised analysis in these experiments. Combining different image representations is frequently required to analyze and make a decision properly. Consequently, multi-view data has emerged - datasets where each sample is described by views from different angles, sources, or modalities. These problems are addressed with the concept of multi-view learning. Understanding the decision-making process of deep learning models is essential for reliable and credible analysis. Hence, many explainability methods have been devised recently. Nonetheless, there is a lack of proper explainability in multi-view models, which are challenging to explain due to their architectures. In this paper, we suggest different multi-view architectures for the vision domain, each suited to another problem, and we also present a methodology for explaining these models. To demonstrate the effectiveness of our methodology, we focus on the domain of High Energy Density Physics (HEDP) experiments, where multiple imaging representations are used to assess the quality of foam samples. We apply our methodology to classify the foam samples quality using the suggested multi-view architectures. Through experimental results, we showcase the improvement of accurate architecture choice on both accuracy - 78% to 84% and AUC - 83% to 93% and present a trade-off between performance and explainability. Specifically, we demonstrate that our approach enables the explanation of individual one-view models, providing insights into the decision-making process of each view. This understanding enhances the interpretability of the overall multi-view model. The sources of this work are available at: https://github.com/Scientific-Computing-Lab-NRCN/Multi-View-Explainability.
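One simple multi-view architecture of the kind discussed here is late fusion: each view gets its own backbone and classification head, and the per-view class probabilities are averaged. Keeping the per-view probabilities around is what allows each view's contribution to the decision to be explained. The numpy sketch below uses random linear "backbones" purely for illustration; it is not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for per-view backbones (e.g., one CNN per imaging modality); here each
# "backbone" is just a random linear map from a flattened view to class logits.
n_views, feat_dim, n_classes = 3, 128, 2
view_heads = [rng.normal(scale=0.1, size=(feat_dim, n_classes)) for _ in range(n_views)]

def multi_view_predict(views):
    """Late fusion: average per-view class probabilities; keeping the per-view
    probabilities around is what makes each view's contribution explainable."""
    per_view = np.stack([softmax(v @ W) for v, W in zip(views, view_heads)])
    return per_view, per_view.mean(axis=0)

sample_views = [rng.normal(size=feat_dim) for _ in range(n_views)]
per_view_probs, fused_probs = multi_view_predict(sample_views)
for i, p in enumerate(per_view_probs):
    print(f"view {i} class probabilities: {np.round(p, 3)}")
print("fused decision:", fused_probs.argmax())
```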

Towards Ontology-Mediated Planning with OWL DL Ontologies (Extended Version)

  • paper_url: http://arxiv.org/abs/2308.08200
  • repo_url: None
  • paper_authors: Tobias John, Patrick Koopmann
  • for: This work provides a new approach that lets planning experts write planning specifications in a familiar formalism while existing ontologies can be easily integrated and extended by ontology experts.
  • methods: The planning specification and ontology are kept separate and linked through an interface; a data-dependent rewriting turns the ontology-mediated planning problem into a classical planning problem that can be processed by existing planning tools. The approach is optimized for comparatively small domains and supports the whole OWL DL fragment.
  • results: A first experimental evaluation shows the potential and limitations of the approach in small domains, and demonstrates that planning experts can solve ontology-mediated planning problems with existing planning tools.
    Abstract While classical planning languages make the closed-domain and closed-world assumption, there have been various approaches to extend those with DL reasoning, which is then interpreted under the usual open-world semantics. Current approaches for planning with DL ontologies integrate the DL directly into the planning language, and practical approaches have been developed based on first-order rewritings or rewritings into datalog. We present here a new approach in which the planning specification and ontology are kept separate, and are linked together using an interface. This allows planning experts to work in a familiar formalism, while existing ontologies can be easily integrated and extended by ontology experts. Our approach for planning with those ontology-mediated planning problems is optimized for cases with comparatively small domains, and supports the whole OWL DL fragment. The idea is to rewrite the ontology-mediated planning problem into a classical planning problem to be processed by existing planning tools. Different to other approaches, our rewriting is data-dependent. A first experimental evaluation of our approach shows the potential and limitations of this approach.

Modelling the Spread of COVID-19 in Indoor Spaces using Automated Probabilistic Planning

  • paper_url: http://arxiv.org/abs/2308.08190
  • repo_url: None
  • paper_authors: Mohamed Harmanani
  • for: This paper aims to provide a novel approach for modeling the spread of COVID-19 in indoor spaces using probabilistic planning and dynamic graph analysis, and to evaluate the effectiveness of different mitigation strategies.
  • methods: The authors use a probabilistic planning framework to model the spread of COVID-19 in shared spaces, and endow the planner with means to control the spread of the disease through non-pharmaceutical interventions (NPIs) such as mandating masks and vaccines. They also compare the impact of crowds and capacity limits on the spread of COVID-19 in these settings.
  • results: The authors demonstrate that the use of probabilistic planning is effective in predicting the amount of infections that are likely to occur in shared spaces, and that automated planners have the potential to design competent interventions to limit the spread of the disease.
    Abstract The coronavirus disease 2019 (COVID-19) pandemic has been ongoing for around 3 years, and has infected over 750 million people and caused over 6 million deaths worldwide at the time of writing. Throughout the pandemic, several strategies for controlling the spread of the disease have been debated by healthcare professionals, government authorities, and international bodies. To anticipate the potential impact of the disease, and to simulate the effectiveness of different mitigation strategies, a robust model of disease spread is needed. In this work, we explore a novel approach based on probabilistic planning and dynamic graph analysis to model the spread of COVID-19 in indoor spaces. We endow the planner with means to control the spread of the disease through non-pharmaceutical interventions (NPIs) such as mandating masks and vaccines, and we compare the impact of crowds and capacity limits on the spread of COVID-19 in these settings. We demonstrate that the use of probabilistic planning is effective in predicting the amount of infections that are likely to occur in shared spaces, and that automated planners have the potential to design competent interventions to limit the spread of the disease. Our code is fully open-source and is available at: https://github.com/mharmanani/prob-planning-covid19 .

Endogenous Macrodynamics in Algorithmic Recourse

  • paper_url: http://arxiv.org/abs/2308.08187
  • repo_url: https://github.com/pat-alt/endogenous-macrodynamics-in-algorithmic-recourse
  • paper_authors: Patrick Altmeyer, Giovan Angela, Aleksander Buszydlik, Karol Dobiczek, Arie van Deursen, Cynthia C. S. Liem
  • for: This work addresses a gap in existing Counterfactual Explanations (CE) and Algorithmic Recourse (AR) research, which largely ignores dynamic environments and the interactions between multiple individuals.
  • methods: The authors first show that many existing methodologies can be described by a generalized framework, and then argue that this framework fails to account for a hidden external cost of recourse that only reveals itself when the endogenous dynamics of recourse are studied at the group level.
  • results: Simulation experiments with various state-of-the-art counterfactual generators on several benchmark datasets show that recourse induces substantial domain and model shifts that may impede the applicability of Algorithmic Recourse; the authors also identify strategies to mitigate these concerns. Their simulation framework is fast and open source.
    Abstract Existing work on Counterfactual Explanations (CE) and Algorithmic Recourse (AR) has largely focused on single individuals in a static environment: given some estimated model, the goal is to find valid counterfactuals for an individual instance that fulfill various desiderata. The ability of such counterfactuals to handle dynamics like data and model drift remains a largely unexplored research challenge. There has also been surprisingly little work on the related question of how the actual implementation of recourse by one individual may affect other individuals. Through this work, we aim to close that gap. We first show that many of the existing methodologies can be collectively described by a generalized framework. We then argue that the existing framework does not account for a hidden external cost of recourse, that only reveals itself when studying the endogenous dynamics of recourse at the group level. Through simulation experiments involving various state-of the-art counterfactual generators and several benchmark datasets, we generate large numbers of counterfactuals and study the resulting domain and model shifts. We find that the induced shifts are substantial enough to likely impede the applicability of Algorithmic Recourse in some situations. Fortunately, we find various strategies to mitigate these concerns. Our simulation framework for studying recourse dynamics is fast and opensourced.
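The endogenous dynamic the abstract describes can be reproduced in a small simulation: train a classifier, let a batch of negatively classified individuals implement recourse, retrain on the shifted data, and repeat. The counterfactual generator below (a fixed step along the weight vector) and the update schedule are deliberate simplifications, not the paper's generators or datasets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two Gaussian classes; class 1 is the "favourable" outcome.
X = np.vstack([rng.normal(-1, 1, size=(300, 2)), rng.normal(+1, 1, size=(300, 2))])
y = np.array([0] * 300 + [1] * 300)

def generic_recourse(x, model, step=1.5):
    """Toy counterfactual generator: move along the model's weight vector so the
    point is classified favourably (a stand-in for a real CE method)."""
    w = model.coef_[0] / np.linalg.norm(model.coef_[0])
    return x + step * w

model = LogisticRegression().fit(X, y)
w_initial = model.coef_[0].copy()

# Endogenous loop: each round, a batch of negatively classified individuals implements
# recourse, the data distribution changes, and the model is retrained on the new data.
for _ in range(5):
    negatives = np.where(model.predict(X) == 0)[0]
    movers = rng.choice(negatives, size=min(30, len(negatives)), replace=False)
    X[movers] = np.array([generic_recourse(X[i], model) for i in movers])
    model = LogisticRegression().fit(X, y)

# Model shift induced purely by recourse (no exogenous drift was added).
print("decision-boundary shift:", np.linalg.norm(model.coef_[0] - w_initial))
```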

Enhancing Performance on Seen and Unseen Dialogue Scenarios using Retrieval-Augmented End-to-End Task-Oriented System

  • paper_url: http://arxiv.org/abs/2308.08169
  • repo_url: None
  • paper_authors: Jianguo Zhang, Stephen Roller, Kun Qian, Zhiwei Liu, Rui Meng, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong
  • for: Improving the flexibility and scalability of end-to-end task-oriented dialogue systems
  • methods: Uses a cache mechanism whose entries can be dynamically updated, allowing the system to adapt to different dialogue scenarios
  • results: Improves non-empty joint goal accuracy by 6.7% over strong existing baselines
    Abstract End-to-end task-oriented dialogue (TOD) systems have achieved promising performance by leveraging sophisticated natural language understanding and natural language generation capabilities of pre-trained models. This work enables the TOD systems with more flexibility through a simple cache. The cache provides the flexibility to dynamically update the TOD systems and handle both existing and unseen dialogue scenarios. Towards this end, we first fine-tune a retrieval module to effectively retrieve the most relevant information entries from the cache. We then train end-to-end TOD models that can refer to and ground on both dialogue history and retrieved information during TOD generation. The cache is straightforward to construct, and the backbone models of TOD systems are compatible with existing pre-trained generative models. Extensive experiments demonstrate the superior performance of our framework, with a notable improvement in non-empty joint goal accuracy by 6.7% compared to strong baselines.
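A minimal sketch of the cache mechanism: score cache entries against the dialogue history, retrieve the top entries, and condition the end-to-end TOD model on history plus retrieved information. The lexical-overlap scorer and the example entries below are stand-ins; the paper fine-tunes a dense retrieval module for this step.

```python
import re

# The cache: free-form information entries the system may need to ground on.
cache_entries = [
    "Hotel Alpha: 3 stars, centre, free parking",
    "Restaurant Bravo: Italian, moderate price, north area",
    "Train TR1234 leaves Cambridge at 09:00 on Friday",
]

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(dialogue_history, k=1):
    """Rank cache entries by a toy lexical-overlap score; the paper instead
    fine-tunes a dense retrieval module for this step."""
    q = tokens(dialogue_history)
    scored = sorted(cache_entries, key=lambda e: -len(q & tokens(e)))
    return scored[:k]

history = "User: I need a place to eat Italian food in the north."
retrieved = retrieve(history, k=1)

# The end-to-end TOD model then grounds its response on history + retrieved entries.
model_input = history + "\n[retrieved] " + " | ".join(retrieved)
print(model_input)
```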

PEvoLM: Protein Sequence Evolutionary Information Language Model

  • paper_url: http://arxiv.org/abs/2308.08578
  • repo_url: https://github.com/issararab/pevolm
  • paper_authors: Issar Arab
  • for: This paper aims to improve the efficiency and accuracy of protein sequence alignment and evolutionary information retrieval by leveraging recent advancements in natural language processing (NLP) and machine learning (ML).
  • methods: The proposed method, PEvoLM, is a novel bidirectional language model that combines transfer learning with the idea of position-specific scoring matrices (PSSMs) to learn the evolutionary information of protein sequences. The model uses a single path for both the forward and backward passes, reducing the number of free parameters compared to the original two-path architecture.
  • results: Trained on protein sequences, PEvoLM predicts both the next amino acid in a sequence and the probability distribution of the next AA derived from similar yet different sequences, in a multi-task learning setup that captures evolutionary information. The source code and pre-trained model are available on GitHub under the permissive MIT license.
    Abstract With the exponential increase of the protein sequence databases over time, multiple-sequence alignment (MSA) methods, like PSI-BLAST, perform exhaustive and time-consuming database search to retrieve evolutionary information. The resulting position-specific scoring matrices (PSSMs) of such search engines represent a crucial input to many machine learning (ML) models in the field of bioinformatics and computational biology. A protein sequence is a collection of contiguous tokens or characters called amino acids (AAs). The analogy to natural language allowed us to exploit the recent advancements in the field of Natural Language Processing (NLP) and therefore transfer NLP state-of-the-art algorithms to bioinformatics. This research presents an Embedding Language Model (ELMo), converting a protein sequence to a numerical vector representation. While the original ELMo trained a 2-layer bidirectional Long Short-Term Memory (LSTMs) network following a two-path architecture, one for the forward and the second for the backward pass, by merging the idea of PSSMs with the concept of transfer-learning, this work introduces a novel bidirectional language model (bi-LM) with four times less free parameters and using rather a single path for both passes. The model was trained not only on predicting the next AA but also on the probability distribution of the next AA derived from similar, yet different sequences as summarized in a PSSM, simultaneously for multi-task learning, hence learning evolutionary information of protein sequences as well. The network architecture and the pre-trained model are made available as open source under the permissive MIT license on GitHub at https://github.com/issararab/PEvoLM.
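The multi-task objective sketched in the abstract, predicting the observed next amino acid while also matching the PSSM-derived distribution over the next amino acid, can be written down directly. The loss below (an unweighted sum of two cross-entropies over toy values) is an illustrative assumption; PEvoLM's exact loss and weighting may differ.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"           # 20 standard amino acids
rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pevolm_style_loss(logits, next_aa_idx, pssm_column):
    """Illustrative multi-task objective: cross-entropy against the observed next AA
    plus cross-entropy against the PSSM-derived distribution over the next AA.
    (The real PEvoLM loss and its weighting may differ.)"""
    p = softmax(logits)
    ce_next = -np.log(p[next_aa_idx] + 1e-12)                 # task 1: next amino acid
    ce_pssm = -np.sum(pssm_column * np.log(p + 1e-12))        # task 2: evolutionary profile
    return ce_next + ce_pssm

# Toy values for one sequence position.
logits = rng.normal(size=len(AA))                  # what the bi-LM predicts at this position
next_aa_idx = AA.index("L")                        # the actual next residue
pssm_column = rng.dirichlet(np.ones(len(AA)))      # stand-in for a PSI-BLAST PSSM column
print(pevolm_style_loss(logits, next_aa_idx, pssm_column))
```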

Interpretability Benchmark for Evaluating Spatial Misalignment of Prototypical Parts Explanations

  • paper_url: http://arxiv.org/abs/2308.08162
  • repo_url: None
  • paper_authors: Mikołaj Sacha, Bartosz Jura, Dawid Rymarczyk, Łukasz Struski, Jacek Tabor, Bartosz Zieliński
  • for: Improving the interpretability of prototypical parts-based networks
  • methods: Introducing a dedicated set of metrics to quantify the spatial explanation misalignment, together with a compensation method that corrects it
  • results: Extensive empirical studies demonstrate the expressiveness of the proposed metrics and the effectiveness of the compensation method
    Abstract Prototypical parts-based networks are becoming increasingly popular due to their faithful self-explanations. However, their similarity maps are calculated in the penultimate network layer. Therefore, the receptive field of the prototype activation region often depends on parts of the image outside this region, which can lead to misleading interpretations. We name this undesired behavior a spatial explanation misalignment and introduce an interpretability benchmark with a set of dedicated metrics for quantifying this phenomenon. In addition, we propose a method for misalignment compensation and apply it to existing state-of-the-art models. We show the expressiveness of our benchmark and the effectiveness of the proposed compensation methodology through extensive empirical studies.

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework

  • paper_url: http://arxiv.org/abs/2308.08155
  • repo_url: None
  • paper_authors: Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, Chi Wang
  • for: A framework for developing next-generation LLM applications built on multiple conversing agents
  • methods: Customizable, conversable agents that cooperate in dialogue to solve tasks and can operate in various modes combining LLMs, human inputs, and tools
  • results: The framework gracefully navigates the strong yet imperfect generation and reasoning abilities of LLMs, leverages human understanding and intelligence, and simplifies and unifies complex LLM workflows as automated agent chats
    Abstract This technical report presents AutoGen, a new framework that enables development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools. AutoGen's design offers multiple advantages: a) it gracefully navigates the strong but imperfect generation and reasoning abilities of these LLMs; b) it leverages human understanding and intelligence, while providing valuable automation through conversations between agents; c) it simplifies and unifies the implementation of complex LLM workflows as automated agent chats. We provide many diverse examples of how developers can easily use AutoGen to effectively solve tasks or build applications, ranging from coding, mathematics, operations research, entertainment, online decision-making, question answering, etc.

AI For Fraud Awareness

  • paper_url: http://arxiv.org/abs/2308.11032
  • repo_url: https://github.com/krdinal/CryptoFraud
  • paper_authors: Prabh Simran Singh Baweja, Orathai Sangpetch, Akkarit Sangpetch
  • for: Protecting people from investment scams and traps by raising awareness
  • methods: Machine learning and gamification techniques that provide a personalized learning experience
  • results: A personalized learning experience and an improved ability to recognize investment scams
    Abstract In today's world, with the rise of numerous social platforms, it has become relatively easy for anyone to spread false information and lure people into traps. Fraudulent schemes and traps are growing rapidly in the investment world. Due to this, countries and individuals face huge financial risks. We present an awareness system with the use of machine learning and gamification techniques to educate the people about investment scams and traps. Our system applies machine learning techniques to provide a personalized learning experience to the user. The system chooses distinct game-design elements and scams from the knowledge pool crafted by domain experts for each individual. The objective of the research project is to reduce inequalities in all countries by educating investors via Active Learning. Our goal is to assist the regulators in assuring a conducive environment for a fair, efficient, and inclusive capital market. In the paper, we discuss the impact of the problem, provide implementation details, and showcase the potentiality of the system through preliminary experiments and results.

SYENet: A Simple Yet Effective Network for Multiple Low-Level Vision Tasks with Real-time Performance on Mobile Device

  • paper_url: http://arxiv.org/abs/2308.08137
  • repo_url: None
  • paper_authors: Weiran Gou, Ziyao Yi, Yan Xiang, Shaoqing Li, Zibin Liu, Dehui Kong, Ke Xu
  • for: This paper aims to solve the problems of task-specific algorithms and large parameter counts in deep learning-based low-level vision tasks on mobile devices.
  • methods: The proposed SYENet consists of two asymmetrical branches with simple building blocks and a Quadratic Connection Unit (QCU) to connect their results. The network also uses a new Outlier-Aware Loss to improve performance.
  • results: SYENet achieves superior performance in real-time applications such as Image Signal Processing (ISP), Low-Light Enhancement (LLE), and Super-Resolution (SR) with 2K60FPS throughput on a Qualcomm 8 Gen 1 mobile SoC. In particular, it obtained the highest score in the MAI 2022 Learned Smartphone ISP challenge for the ISP task.
    Abstract With the rapid development of AI hardware accelerators, applying deep learning-based algorithms to solve various low-level vision tasks on mobile devices has gradually become possible. However, two main problems still need to be solved: task-specific algorithms make it difficult to integrate them into a single neural network architecture, and large amounts of parameters make it difficult to achieve real-time inference. To tackle these problems, we propose a novel network, SYENet, with only $~$6K parameters, to handle multiple low-level vision tasks on mobile devices in a real-time manner. The SYENet consists of two asymmetrical branches with simple building blocks. To effectively connect the results by asymmetrical branches, a Quadratic Connection Unit(QCU) is proposed. Furthermore, to improve performance, a new Outlier-Aware Loss is proposed to process the image. The proposed method proves its superior performance with the best PSNR as compared with other networks in real-time applications such as Image Signal Processing(ISP), Low-Light Enhancement(LLE), and Super-Resolution(SR) with 2K60FPS throughput on Qualcomm 8 Gen 1 mobile SoC(System-on-Chip). Particularly, for ISP task, SYENet got the highest score in MAI 2022 Learned Smartphone ISP challenge.
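The abstract names a Quadratic Connection Unit (QCU) joining two asymmetrical branches but does not spell out its form; the sketch below is one plausible reading (a second-order, multiplicative combination of the branch outputs) and should not be taken as the paper's exact formulation:

```python
import torch
import torch.nn as nn

class QuadraticConnectionUnit(nn.Module):
    """Toy reading of a QCU: combine two branch outputs with a
    second-order (multiplicative) term plus a first-order (additive) term."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, a, b):
        return self.fuse(a * b + a + b)   # quadratic interaction of the branches

class TinySYE(nn.Module):
    """Two asymmetrical branches with simple building blocks, joined by the QCU."""
    def __init__(self, channels=8):
        super().__init__()
        self.branch1 = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.branch2 = nn.Sequential(nn.Conv2d(3, channels, 5, padding=2), nn.ReLU(),
                                     nn.Conv2d(channels, channels, 3, padding=1))
        self.qcu = QuadraticConnectionUnit(channels)
        self.head = nn.Conv2d(channels, 3, 1)

    def forward(self, x):
        return self.head(self.qcu(self.branch1(x), self.branch2(x)))

out = TinySYE()(torch.randn(1, 3, 64, 64))
print(out.shape)   # torch.Size([1, 3, 64, 64])
```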

Ranking-aware Uncertainty for Text-guided Image Retrieval

  • paper_url: http://arxiv.org/abs/2308.08131
  • repo_url: None
  • paper_authors: Junyang Chen, Hanjiang Lai
  • for: Text-guided image retrieval, which incorporates conditional text to better capture users' intent.
  • methods: A novel ranking-aware uncertainty approach that uses only the provided triplets to capture more ranking information, comprising three components: in-sample uncertainty, cross-sample uncertainty, and distribution regularization.
  • results: The proposed method achieves significant improvements over existing state-of-the-art methods on two public datasets for composed image retrieval.
    Abstract Text-guided image retrieval is to incorporate conditional text to better capture users' intent. Traditionally, the existing methods focus on minimizing the embedding distances between the source inputs and the targeted image, using the provided triplets $\langle$source image, source text, target image$\rangle$. However, such triplet optimization may limit the learned retrieval model to capture more detailed ranking information, e.g., the triplets are one-to-one correspondences and they fail to account for many-to-many correspondences arising from semantic diversity in feedback languages and images. To capture more ranking information, we propose a novel ranking-aware uncertainty approach to model many-to-many correspondences by only using the provided triplets. We introduce uncertainty learning to learn the stochastic ranking list of features. Specifically, our approach mainly comprises three components: (1) In-sample uncertainty, which aims to capture semantic diversity using a Gaussian distribution derived from both combined and target features; (2) Cross-sample uncertainty, which further mines the ranking information from other samples' distributions; and (3) Distribution regularization, which aligns the distributional representations of source inputs and targeted image. Compared to the existing state-of-the-art methods, our proposed method achieves significant results on two public datasets for composed image retrieval.

How to Mask in Error Correction Code Transformer: Systematic and Double Masking

  • paper_url: http://arxiv.org/abs/2308.08128
  • repo_url: None
  • paper_authors: Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim, Sunghwan Kim, Yongjune Kim, Jong-Seon No
  • for: Improving the performance and reducing the computational complexity of the Error Correction Code Transformer (ECCT)
  • methods: A new masking matrix that exploits the systematic encoding of ECCs, and a double-masked ECCT architecture that applies two different mask matrices in parallel to learn more diverse features
  • results: The modified ECCT achieves state-of-the-art decoding performance, outperforming the conventional ECCT and classical decoding algorithms by significant margins
    Abstract In communication and storage systems, error correction codes (ECCs) are pivotal in ensuring data reliability. As deep learning's applicability has broadened across diverse domains, there is a growing research focus on neural network-based decoders that outperform traditional decoding algorithms. Among these neural decoders, Error Correction Code Transformer (ECCT) has achieved the state-of-the-art performance, outperforming other methods by large margins. To further enhance the performance of ECCT, we propose two novel methods. First, leveraging the systematic encoding technique of ECCs, we introduce a new masking matrix for ECCT, aiming to improve the performance and reduce the computational complexity. Second, we propose a novel transformer architecture of ECCT called a double-masked ECCT. This architecture employs two different mask matrices in a parallel manner to learn more diverse features of the relationship between codeword bits in the masked self-attention blocks. Extensive simulation results show that the proposed double-masked ECCT outperforms the conventional ECCT, achieving the state-of-the-art decoding performance with significant margins.
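A toy PyTorch sketch of the double-masking idea: two different mask matrices are applied in parallel self-attention paths and the results are fused. How the real masks are derived from the code's parity-check structure and systematic encoding is not reproduced here; the random masks are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention where disallowed positions get -inf."""
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

class DoubleMaskedSelfAttention(nn.Module):
    """Two parallel attention paths, each with its own mask, fused by a linear layer."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, x, mask_a, mask_b):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        ya = masked_attention(q, k, v, mask_a)   # e.g. mask built from the parity-check matrix
        yb = masked_attention(q, k, v, mask_b)   # e.g. mask exploiting the systematic encoding
        return self.out(torch.cat([ya, yb], dim=-1))

n, dim = 16, 32                                            # toy codeword length and embedding size
x = torch.randn(1, n, dim)
eye = torch.eye(n, dtype=torch.long)                       # keep self-attention so no row is fully masked
mask_a = torch.randint(0, 2, (n, n)) | eye                 # placeholder masks; real ones come from the code
mask_b = torch.randint(0, 2, (n, n)) | eye
print(DoubleMaskedSelfAttention(dim)(x, mask_a, mask_b).shape)
```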

OmniZoomer: Learning to Move and Zoom in on Sphere at High-Resolution

  • paper_url: http://arxiv.org/abs/2308.08114
  • repo_url: None
  • paper_authors: Zidong Cao, Hao Ai, Yan-Pei Cao, Ying Shan, Xiaohu Qie, Lin Wang
  • for: Providing high-quality movement and zoom on omnidirectional images (ODIs)
  • methods: A deep learning approach that incorporates the Möbius transformation into the network, with high-resolution feature map enhancement and a spherical resampling module to address blur and aliasing
  • results: High-quality, high-resolution moved and zoomed ODIs that let viewers freely navigate and zoom in on objects of interest in immersive environments such as virtual reality
    Abstract Omnidirectional images (ODIs) have become increasingly popular, as their large field-of-view (FoV) can offer viewers the chance to freely choose the view directions in immersive environments such as virtual reality. The M\"obius transformation is typically employed to further provide the opportunity for movement and zoom on ODIs, but applying it to the image level often results in blurry effect and aliasing problem. In this paper, we propose a novel deep learning-based approach, called \textbf{OmniZoomer}, to incorporate the M\"obius transformation into the network for movement and zoom on ODIs. By learning various transformed feature maps under different conditions, the network is enhanced to handle the increasing edge curvatures, which alleviates the blurry effect. Moreover, to address the aliasing problem, we propose two key components. Firstly, to compensate for the lack of pixels for describing curves, we enhance the feature maps in the high-resolution (HR) space and calculate the transformed index map with a spatial index generation module. Secondly, considering that ODIs are inherently represented in the spherical space, we propose a spherical resampling module that combines the index map and HR feature maps to transform the feature maps for better spherical correlation. The transformed feature maps are decoded to output a zoomed ODI. Experiments show that our method can produce HR and high-quality ODIs with the flexibility to move and zoom in to the object of interest. Project page is available at http://vlislab22.github.io/OmniZoomer/.
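The Möbius transformation underlying the movement-and-zoom operation can be written down directly: project sphere directions stereographically onto the complex plane, apply z -> (az + b)/(cz + d), and project back. The numpy sketch below shows a pure zoom (a = k, b = c = 0, d = 1) on an equirectangular coordinate grid; it is a coordinate-level illustration, not the OmniZoomer network:

```python
import numpy as np

def mobius_zoom(lon, lat, k=2.0):
    """Apply the Möbius map z -> k*z (a pure zoom about the pole) to sphere
    directions given as longitude/latitude, via stereographic projection.
    General movement corresponds to z -> (a*z + b) / (c*z + d)."""
    # stereographic projection of the unit sphere onto the complex plane
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    zc = np.sin(lat)
    z = (x + 1j * y) / (1.0 - zc + 1e-9)
    # Möbius transformation (here a=k, b=0, c=0, d=1)
    w = k * z
    # inverse stereographic projection back to the sphere
    denom = 1.0 + np.abs(w) ** 2
    x2, y2, z2 = 2 * w.real / denom, 2 * w.imag / denom, (np.abs(w) ** 2 - 1) / denom
    return np.arctan2(y2, x2), np.arcsin(np.clip(z2, -1, 1))

# warp the coordinate grid of a toy 4x8 equirectangular image
lon, lat = np.meshgrid(np.linspace(-np.pi, np.pi, 8), np.linspace(-np.pi / 2, np.pi / 2, 4))
new_lon, new_lat = mobius_zoom(lon, lat, k=2.0)
print(new_lat.round(2))
```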

ChatLogo: A Large Language Model-Driven Hybrid Natural-Programming Language Interface for Agent-based Modeling and Programming

  • paper_url: http://arxiv.org/abs/2308.08102
  • repo_url: None
  • paper_authors: John Chen, Uri Wilensky
  • for: Supporting open-ended constructionist learning of agent-based modeling and programming (ABM & P)
  • methods: A hybrid natural-programming language interface driven by large language models to support the learning of computational programming
  • results: A more user-friendly interface for novice learners that supports creative expression while keeping the technical system from over-relying on any single LLM
    Abstract Building on Papert (1980)'s idea of children talking to computers, we propose ChatLogo, a hybrid natural-programming language interface for agent-based modeling and programming. We build upon previous efforts to scaffold ABM & P learning and recent development in leveraging large language models (LLMs) to support the learning of computational programming. ChatLogo aims to support conversations with computers in a mix of natural and programming languages, provide a more user-friendly interface for novice learners, and keep the technical system from over-reliance on any single LLM. We introduced the main elements of our design: an intelligent command center, and a conversational interface to support creative expression. We discussed the presentation format and future work. Responding to the challenges of supporting open-ended constructionist learning of ABM & P and leveraging LLMs for educational purposes, we contribute to the field by proposing the first constructionist LLM-driven interface to support computational and complex systems thinking.

S-Mixup: Structural Mixup for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.08097
  • repo_url: https://github.com/sukwonyun/s-mixup
  • paper_authors: Junghurn Kim, Sukwon Yun, Chanyoung Park
  • for: Proposing Structural Mixup (S-Mixup), a mixup-based node classification method that improves the robustness and generalization of graph neural networks (GNNs)
  • methods: A new mixup strategy that uses the prediction confidence of a GNN classifier to compose the mixup pool, together with a gradient-based edge selection strategy that attaches edges to the mixed nodes
  • results: Extensive experiments show that S-Mixup enhances the robustness and generalization performance of GNNs, especially in heterophilous situations, while keeping training time and computation low and maintaining model performance
    Abstract Existing studies for applying the mixup technique on graphs mainly focus on graph classification tasks, while the research in node classification is still under-explored. In this paper, we propose a novel mixup augmentation for node classification called Structural Mixup (S-Mixup). The core idea is to take into account the structural information while mixing nodes. Specifically, S-Mixup obtains pseudo-labels for unlabeled nodes in a graph along with their prediction confidence via a Graph Neural Network (GNN) classifier. These serve as the criteria for the composition of the mixup pool for both inter and intra-class mixups. Furthermore, we utilize the edge gradient obtained from the GNN training and propose a gradient-based edge selection strategy for selecting edges to be attached to the nodes generated by the mixup. Through extensive experiments on real-world benchmark datasets, we demonstrate the effectiveness of S-Mixup evaluated on the node classification task. We observe that S-Mixup enhances the robustness and generalization performance of GNNs, especially in heterophilous situations. The source code of S-Mixup can be found at \url{https://github.com/SukwonYun/S-Mixup}
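A toy sketch of the confidence-driven mixup pool described above, with random numbers standing in for GNN outputs. The gradient-based edge selection step is only indicated by a comment; none of this is the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy graph: node features and a classifier's softmax outputs (stand-ins for a GNN)
feats = rng.normal(size=(6, 4))
probs = rng.dirichlet(np.ones(3), size=6)     # per-node class probabilities
pseudo = probs.argmax(1)                       # pseudo-labels for unlabeled nodes
conf = probs.max(1)                            # prediction confidence

def structural_mixup(i, j, lam=0.5):
    """Mix two nodes selected from the high-confidence pool (inter- or intra-class)."""
    return lam * feats[i] + (1 - lam) * feats[j]

pool = [i for i in range(len(feats)) if conf[i] > 0.5]   # confidence-based mixup pool
intra = [(i, j) for i in pool for j in pool if i < j and pseudo[i] == pseudo[j]]
inter = [(i, j) for i in pool for j in pool if i < j and pseudo[i] != pseudo[j]]
if inter:
    i, j = inter[0]
    print("mixed inter-class node:", structural_mixup(i, j).round(2))
# In the paper, edges for the new node are then chosen using edge gradients
# obtained from GNN training; that step is only indicated by this comment.
```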

Decentralized Graph Neural Network for Privacy-Preserving Recommendation

  • paper_url: http://arxiv.org/abs/2308.08072
  • repo_url: None
  • paper_authors: Xiaolin Zheng, Zhongyu Wang, Chaochao Chen, Jiashu Qian, Yao Yang
  • for: Proposing a privacy-preserving decentralized graph neural network for recommendation, addressing the low communication efficiency and privacy leakage of existing methods
  • methods: A novel decentralized GNN (DGREC) with three stages: graph construction, local gradient calculation, and global gradient passing; users can choose to publicize their interactions
  • results: Extensive experiments on three public datasets validate the consistent superiority of the framework in recommendation effectiveness, communication efficiency, and privacy protection
    Abstract Building a graph neural network (GNN)-based recommender system without violating user privacy proves challenging. Existing methods can be divided into federated GNNs and decentralized GNNs. But both methods have undesirable effects, i.e., low communication efficiency and privacy leakage. This paper proposes DGREC, a novel decentralized GNN for privacy-preserving recommendations, where users can choose to publicize their interactions. It includes three stages, i.e., graph construction, local gradient calculation, and global gradient passing. The first stage builds a local inner-item hypergraph for each user and a global inter-user graph. The second stage models user preference and calculates gradients on each local device. The third stage designs a local differential privacy mechanism named secure gradient-sharing, which proves strong privacy-preserving of users' private data. We conduct extensive experiments on three public datasets to validate the consistent superiority of our framework.

Freshness or Accuracy, Why Not Both? Addressing Delayed Feedback via Dynamic Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.08071
  • repo_url: None
  • paper_authors: Xiaolin Zheng, Zhongyu Wang, Chaochao Chen, Feng Zhu, Jiashu Qian
  • for: This paper aims to address the delayed feedback problem in online commercial systems, where users' conversions are always delayed and can negatively impact the accuracy of training algorithms.
  • methods: The proposed method, Delayed Feedback Modeling by Dynamic Graph Neural Network (DGDFEM), includes three stages: preparing a data pipeline, building a dynamic graph, and training a CVR prediction model. Model training uses a novel graph convolutional method named HLGCN, which leverages both high-pass and low-pass filters to deal with conversion and non-conversion relationships.
  • results: The method achieves both data freshness and label accuracy, as validated by extensive experiments on three industry datasets that show its consistent superiority over existing methods.
    Abstract The delayed feedback problem is one of the most pressing challenges in predicting the conversion rate since users' conversions are always delayed in online commercial systems. Although new data are beneficial for continuous training, without complete feedback information, i.e., conversion labels, training algorithms may suffer from overwhelming fake negatives. Existing methods tend to use multitask learning or design data pipelines to solve the delayed feedback problem. However, these methods have a trade-off between data freshness and label accuracy. In this paper, we propose Delayed Feedback Modeling by Dynamic Graph Neural Network (DGDFEM). It includes three stages, i.e., preparing a data pipeline, building a dynamic graph, and training a CVR prediction model. In the model training, we propose a novel graph convolutional method named HLGCN, which leverages both high-pass and low-pass filters to deal with conversion and non-conversion relationships. The proposed method achieves both data freshness and label accuracy. We conduct extensive experiments on three industry datasets, which validate the consistent superiority of our method.
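The high-pass/low-pass intuition behind HLGCN can be illustrated with standard graph filters: normalized-adjacency propagation smooths features (low-pass), while subtracting that propagation from the original signal sharpens differences (high-pass). The numpy sketch below shows these generic filters on a toy graph and is not the paper's exact operator:

```python
import numpy as np

# toy undirected graph with 4 nodes and 2-dimensional node features
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])

D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt        # symmetric normalized adjacency

low_pass = A_hat @ X                        # smooths features: suited to conversion-like (assortative) relations
high_pass = X - A_hat @ X                   # sharpens differences: suited to non-conversion (disassortative) relations

print("low-pass:\n", low_pass.round(2))
print("high-pass:\n", high_pass.round(2))
# A learnable layer would mix both filtered signals, e.g. sigma(low_pass @ W1 + high_pass @ W2).
```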

Simple online learning with consistency oracle

  • paper_url: http://arxiv.org/abs/2308.08055
  • repo_url: None
  • paper_authors: Alexander Kozachinskiy, Tomasz Steifer
  • for: Studying online learning in a model where the learning algorithm can access the class only via a consistency oracle
  • methods: A new algorithm that, for classes of Littlestone dimension $d$, makes at most $O(256^d)$ mistakes
  • results: The algorithm resolves an open problem by showing that every class of finite Littlestone dimension with a recursively enumerable representation admits a computable online learner
    Abstract We consider online learning in the model where a learning algorithm can access the class only via the consistency oracle -- an oracle, that, at any moment, can give a function from the class that agrees with all examples seen so far. This model was recently considered by Assos et al. (COLT'23). It is motivated by the fact that standard methods of online learning rely on computing the Littlestone dimension of subclasses, a problem that is computationally intractable. Assos et al. gave an online learning algorithm in this model that makes at most $C^d$ mistakes on classes of Littlestone dimension $d$, for some absolute unspecified constant $C > 0$. We give a novel algorithm that makes at most $O(256^d)$ mistakes. Our proof is significantly simpler and uses only very basic properties of the Littlestone dimension. We also observe that there exists no algorithm in this model that makes at most $2^{d+1}-2$ mistakes. We also observe that our algorithm (as well as the algorithm of Assos et al.) solves an open problem by Hasrati and Ben-David (ALT'23). Namely, it demonstrates that every class of finite Littlestone dimension with recursively enumerable representation admits a computable online learner (that may be undefined on unrealizable samples).

Unbiased Decisions Reduce Regret: Adversarial Domain Adaptation for the Bank Loan Problem

  • paper_url: http://arxiv.org/abs/2308.08051
  • repo_url: None
  • paper_authors: Elena Gal, Shaun Singh, Aldo Pacchiano, Ben Walker, Terry Lyons, Jakob Foerster
  • for: Binary classification with limited data and near real-time decisions, such as assessing loan applications, where the true label is only observed for accepted applications
  • methods: Adversarial optimism (AdOpt), which uses adversarial domain adaptation to directly address bias in the continuously updated training set and learn an unbiased yet informative representation of past data
  • results: State-of-the-art performance on a set of challenging benchmark problems, with initial evidence that the approach also improves fairness in this setting
    Abstract In many real world settings binary classification decisions are made based on limited data in near real-time, e.g. when assessing a loan application. We focus on a class of these problems that share a common feature: the true label is only observed when a data point is assigned a positive label by the principal, e.g. we only find out whether an applicant defaults if we accepted their loan application. As a consequence, the false rejections become self-reinforcing and cause the labelled training set, that is being continuously updated by the model decisions, to accumulate bias. Prior work mitigates this effect by injecting optimism into the model, however this comes at the cost of increased false acceptance rate. We introduce adversarial optimism (AdOpt) to directly address bias in the training set using adversarial domain adaptation. The goal of AdOpt is to learn an unbiased but informative representation of past data, by reducing the distributional shift between the set of accepted data points and all data points seen thus far. AdOpt significantly exceeds state-of-the-art performance on a set of challenging benchmark problems. Our experiments also provide initial evidence that the introduction of adversarial domain adaptation improves fairness in this setting.

A Comparative Analysis of the Capabilities of Nature-inspired Feature Selection Algorithms in Predicting Student Performance

  • paper_url: http://arxiv.org/abs/2308.08574
  • repo_url: None
  • paper_authors: Thomas Trask
  • for: Predicting student performance to enable effective pre-failure interventions for at-risk students
  • methods: Twelve nature-inspired algorithms (NIAs) for feature selection, compared across three datasets and combined with traditional machine learning classifiers
  • results: For all datasets, an ensemble approach that uses NIAs for feature selection and traditional ML algorithms for classification increases predictive accuracy while reducing the feature set size by two-thirds
    Abstract Predicting student performance is key in leveraging effective pre-failure interventions for at-risk students. In this paper, I have analyzed the relative performance of a suite of 12 nature-inspired algorithms when used to predict student performance across 3 datasets consisting of instance-based clickstream data, intra-course single-course performance, and performance when taking multiple courses simultaneously. I found that, for all datasets, leveraging an ensemble approach using NIAs for feature selection and traditional ML algorithms for classification increased predictive accuracy while also reducing feature set size by 2/3.
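A small sketch of the wrapper idea behind NIA-based feature selection: a nature-inspired search (here a toy genetic-algorithm-style loop, used only as one example of the family) evolves binary feature masks whose fitness is the cross-validated accuracy of a traditional classifier. The synthetic data and hyperparameters are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=30, n_informative=8, random_state=0)

def fitness(mask):
    """Cross-validated accuracy of a traditional ML classifier on the selected features."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(LogisticRegression(max_iter=500), X[:, mask.astype(bool)], y, cv=3).mean()

# toy genetic-algorithm-style search over binary feature masks
pop = rng.integers(0, 2, size=(12, X.shape[1]))
for _ in range(10):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-6:]]            # keep the best half
    children = parents.copy()
    flip = rng.random(children.shape) < 0.05          # mutation
    children = np.where(flip, 1 - children, children)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print(f"selected {best.sum()} of {X.shape[1]} features, CV accuracy {fitness(best):.3f}")
```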

DiagGPT: An LLM-based Chatbot with Automatic Topic Management for Task-Oriented Dialogue

  • paper_url: http://arxiv.org/abs/2308.08043
  • repo_url: None
  • paper_authors: Lang Cao
  • for: Extending the applicability of LLMs so they can operate in complex diagnostic scenarios such as legal or medical consultations
  • methods: DiagGPT, a novel approach that extends LLMs to task-oriented dialogue by automatically managing dialogue topics, proactively asking questions, and guiding users toward task completion
  • results: Experiments show that DiagGPT performs well in task-oriented diagnostic dialogue with users, demonstrating its potential for practical applications
    Abstract Large Language Models (LLMs), such as ChatGPT, are becoming increasingly sophisticated, demonstrating capabilities that closely resemble those of humans. These AI models are playing an essential role in assisting humans with a wide array of tasks in daily life. A significant application of AI is its use as a chat agent, responding to human inquiries across various domains. Current LLMs have shown proficiency in answering general questions. However, basic question-answering dialogue often falls short in complex diagnostic scenarios, such as legal or medical consultations. These scenarios typically necessitate Task-Oriented Dialogue (TOD), wherein an AI chat agent needs to proactively pose questions and guide users towards specific task completion. Previous fine-tuning models have underperformed in TOD, and current LLMs do not inherently possess this capability. In this paper, we introduce DiagGPT (Dialogue in Diagnosis GPT), an innovative method that extends LLMs to TOD scenarios. Our experiments reveal that DiagGPT exhibits outstanding performance in conducting TOD with users, demonstrating its potential for practical applications.

Automated Test Case Generation Using Code Models and Domain Adaptation

  • paper_url: http://arxiv.org/abs/2308.08033
  • repo_url: None
  • paper_authors: Sepehr Hashtroudi, Jiho Shin, Hadi Hemmati, Song Wang
  • for: Improving automated test generation so that generated tests are human-readable and better at detecting the complex bugs that developer-written tests would catch
  • methods: Fine-tuning a Transformer-based large code model (CodeT5) on the test generation downstream task to generate unit tests, with project-level domain adaptation
  • results: A fully automated testing framework that complements search-based test generators and improves coverage: the generated tests cover lines missed by developer-written tests, and domain adaptation increases line coverage by 49.9% (mean) and 54% (median)
    Abstract State-of-the-art automated test generation techniques, such as search-based testing, are usually ignorant about what a developer would create as a test case. Therefore, they typically create tests that are not human-readable and may not necessarily detect all types of complex bugs developer-written tests would do. In this study, we leverage Transformer-based code models to generate unit tests that can complement search-based test generation. Specifically, we use CodeT5, i.e., a state-of-the-art large code model, and fine-tune it on the test generation downstream task. For our analysis, we use the Methods2test dataset for fine-tuning CodeT5 and Defects4j for project-level domain adaptation and evaluation. The main contribution of this study is proposing a fully automated testing framework that leverages developer-written tests and available code models to generate compilable, human-readable unit tests. Results show that our approach can generate new test cases that cover lines that were not covered by developer-written tests. Using domain adaptation, we can also increase line coverage of the model-generated unit tests by 49.9% and 54% in terms of mean and median (compared to the model without domain adaptation). We can also use our framework as a complementary solution alongside common search-based methods to increase the overall coverage with mean and median of 25.3% and 6.3%. It can also increase the mutation score of search-based methods by killing extra mutants (up to 64 new mutants were killed per project in our experiments).

Planning to Learn: A Novel Algorithm for Active Learning during Model-Based Planning

  • paper_url: http://arxiv.org/abs/2308.08029
  • repo_url: https://github.com/rowanlibr/sophisticated-learning
  • paper_authors: Rowan Hodson, Bruce Bassett, Charel van Hoof, Benjamin Rosman, Mark Solms, Jonathan P. Shock, Ryan Smith
  • for: Evaluating and improving the Active Inference framework for planning under uncertainty
  • methods: Comparing sophisticated inference against Bayesian reinforcement learning and search-based algorithms on multi-step planning problems, and extending it with active learning during planning (sophisticated learning)
  • results: Simulations show that sophisticated learning outperforms all other algorithms in a biologically relevant environment
    Abstract Active Inference is a recent framework for modeling planning under uncertainty. Empirical and theoretical work have now begun to evaluate the strengths and weaknesses of this approach and how it might be improved. A recent extension - the sophisticated inference (SI) algorithm - improves performance on multi-step planning problems through recursive decision tree search. However, little work to date has been done to compare SI to other established planning algorithms. SI was also developed with a focus on inference as opposed to learning. The present paper has two aims. First, we compare performance of SI to Bayesian reinforcement learning (RL) schemes designed to solve similar problems. Second, we present an extension of SI - sophisticated learning (SL) - that more fully incorporates active learning during planning. SL maintains beliefs about how model parameters would change under the future observations expected under each policy. This allows a form of counterfactual retrospective inference in which the agent considers what could be learned from current or past observations given different future observations. To accomplish these aims, we make use of a novel, biologically inspired environment designed to highlight the problem structure for which SL offers a unique solution. Here, an agent must continually search for available (but changing) resources in the presence of competing affordances for information gain. Our simulations show that SL outperforms all other algorithms in this context - most notably, Bayes-adaptive RL and upper confidence bound algorithms, which aim to solve multi-step planning problems using similar principles (i.e., directed exploration and counterfactual reasoning). These results provide added support for the utility of Active Inference in solving this class of biologically-relevant problems and offer added tools for testing hypotheses about human cognition.

Potential Energy Advantage of Quantum Economy

  • paper_url: http://arxiv.org/abs/2308.08025
  • repo_url: None
  • paper_authors: Junyu Liu, Hansheng Jiang, Zuo-Jun Max Shen
  • for: Studying the energy-cost advantage of quantum computing over classical computing
  • methods: A Cournot competition model constrained by energy usage, showing that quantum computing firms can outperform classical counterparts in both profitability and energy efficiency at Nash equilibrium
  • results: The energy benefits of quantum computing are contingent on large-scale computation; using real physical parameters, the paper illustrates the scale of operation needed to realize this energy-efficiency advantage
    Abstract Energy cost is increasingly crucial in the modern computing industry with the wide deployment of large-scale machine learning models and language models. For the firms that provide computing services, low energy consumption is important both from the perspective of their own market growth and the government's regulations. In this paper, we study the energy benefits of quantum computing vis-a-vis classical computing. Deviating from the conventional notion of quantum advantage based solely on computational complexity, we redefine advantage in an energy efficiency context. Through a Cournot competition model constrained by energy usage, we demonstrate quantum computing firms can outperform classical counterparts in both profitability and energy efficiency at Nash equilibrium. Therefore quantum computing may represent a more sustainable pathway for the computing industry. Moreover, we discover that the energy benefits of quantum computing economies are contingent on large-scale computation. Based on real physical parameters, we further illustrate the scale of operation necessary for realizing this energy efficiency advantage.

GRINN: A Physics-Informed Neural Network for solving hydrodynamic systems in the presence of self-gravity

  • paper_url: http://arxiv.org/abs/2308.08010
  • repo_url: None
  • paper_authors: Sayantan Auddy, Ramit Dey, Neal J. Turner, Shantanu Basu
  • for: Modeling three-dimensional self-gravitating gas flows, which underpin astrophysical questions ranging from planet-forming disks and star-forming clouds to galaxy formation and the development of large-scale structure
  • methods: A physics-informed neural network (PINN) that exploits the universal approximation capability of neural networks in a mesh-free framework to solve the time-dependent partial differential equations (PDEs)
  • results: The solution matches a linear analytic solution to within 1% in the linear regime and a conventional grid-code solution to within 5% in the nonlinear regime; GRINN's computation time does not scale with the number of dimensions, being longer than the grid code in one and two dimensions but an order of magnitude shorter in 3D at similar accuracy
    Abstract Modeling self-gravitating gas flows is essential to answering many fundamental questions in astrophysics. This spans many topics including planet-forming disks, star-forming clouds, galaxy formation, and the development of large-scale structures in the Universe. However, the nonlinear interaction between gravity and fluid dynamics offers a formidable challenge to solving the resulting time-dependent partial differential equations (PDEs) in three dimensions (3D). By leveraging the universal approximation capabilities of a neural network within a mesh-free framework, physics informed neural networks (PINNs) offer a new way of addressing this challenge. We introduce the gravity-informed neural network (GRINN), a PINN-based code, to simulate 3D self-gravitating hydrodynamic systems. Here, we specifically study gravitational instability and wave propagation in an isothermal gas. Our results match a linear analytic solution to within 1\% in the linear regime and a conventional grid code solution to within 5\% as the disturbance grows into the nonlinear regime. We find that the computation time of the GRINN does not scale with the number of dimensions. This is in contrast to the scaling of the grid-based code for the hydrodynamic and self-gravity calculations as the number of dimensions is increased. Our results show that the GRINN computation time is longer than the grid code in one- and two- dimensional calculations but is an order of magnitude lesser than the grid code in 3D with similar accuracy. Physics-informed neural networks like GRINN thus show promise for advancing our ability to model 3D astrophysical flows.
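The physics-informed loss can be sketched for a simplified 1D isothermal, self-gravitating flow: a small network maps (x, t) to density, velocity, and gravitational potential, and the loss is the mean squared residual of the continuity, momentum, and Poisson equations at random collocation points. This is a minimal stand-in under those assumptions, not GRINN's implementation or its full 3D equations:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 3))            # outputs: density rho, velocity v, potential phi

def grad(u, x):
    return torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u), create_graph=True)[0]

def pde_residuals(xt, G=1.0, cs=1.0):
    xt = xt.requires_grad_(True)
    rho, v, phi = net(xt).unbind(-1)
    drho, dv, dphi = grad(rho, xt), grad(v, xt), grad(phi, xt)
    drho_x, drho_t = drho[:, 0], drho[:, 1]
    dv_x, dv_t = dv[:, 0], dv[:, 1]
    dphi_x = dphi[:, 0]
    dphi_xx = grad(dphi[:, 0], xt)[:, 0]
    continuity = drho_t + v * drho_x + rho * dv_x                       # d(rho)/dt + d(rho*v)/dx = 0
    momentum = dv_t + v * dv_x + cs**2 * drho_x / (rho.abs() + 1e-6) + dphi_x
    poisson = dphi_xx - 4 * torch.pi * G * rho                          # self-gravity
    return continuity, momentum, poisson

xt = torch.rand(256, 2)                          # collocation points in (x, t)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(3):                            # a few illustrative optimization steps
    opt.zero_grad()
    loss = sum((r ** 2).mean() for r in pde_residuals(xt))
    loss.backward()
    opt.step()
    print(step, float(loss))
```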
    摘要 模拟自引力液体流动是astrophysics中答您许多基本问题的关键。这些问题包括形成 planetary disks、star-forming clouds、galaxy formation和宇宙大规模结构的发展。然而,gravity和 fluid dynamics之间的非线性互动使得解决 resulting time-dependent partial differential equations (PDEs) 在三维空间 (3D) 中提供了一项困难的挑战。通过利用 neural network 的通用适应能力 within a mesh-free framework,physics informed neural networks (PINNs) 提供了一种新的方法来解决这一挑战。我们介绍了 gravity-informed neural network (GRINN),一种基于 PINN 的代码,用于模拟 3D 自引力液体系统。我们在特定情况下研究了 gravitational instability 和波传播在固有温度气体中。我们的结果与一个线性分析解相匹配,在线性 regime 中准确到 1%,与一个 convent ional grid code solution 相匹配,在干扰增长到非线性 regime 时准确到 5%。我们发现 GRINN 的计算时间与维度无关。这与 grid-based code 的 hydrodynamic 和自重计算时间在维度增加时的扩展不同。我们的结果显示 GRINN 的计算时间在一维和二维计算中比 grid code 短,但在 3D 计算中比 grid code 更长。Physics-informed neural networks like GRINN 因此显示出了在 3D astrophysical flows 模拟方面的承诺。

Large Language Models in Introductory Programming Education: ChatGPT’s Performance and Implications for Assessments

  • paper_url: http://arxiv.org/abs/2308.08572
  • repo_url: None
  • paper_authors: Natalie Kiesler, Daniel Schiffner
  • for: Investigating how well the large language models (LLMs) ChatGPT-3.5 and GPT-4 solve introductory programming tasks, and deriving implications for didactic scenarios and assessment formats
  • methods: 72 Python tasks for novice programmers from the free site CodingBat; full task descriptions were given to the LLMs as input, and the generated replies were evaluated with CodingBat's unit tests, alongside the general availability of textual explanations and program code
  • results: The LLMs achieved 94.4% to 95.8% correct responses and reliably provided textual explanations and program code, opening new ways to incorporate LLMs into programming education and assessment
    Abstract This paper investigates the performance of the Large Language Models (LLMs) ChatGPT-3.5 and GPT-4 in solving introductory programming tasks. Based on the performance, implications for didactic scenarios and assessment formats utilizing LLMs are derived. For the analysis, 72 Python tasks for novice programmers were selected from the free site CodingBat. Full task descriptions were used as input to the LLMs, while the generated replies were evaluated using CodingBat's unit tests. In addition, the general availability of textual explanations and program code was analyzed. The results show high scores of 94.4 to 95.8% correct responses and reliable availability of textual explanations and program code, which opens new ways to incorporate LLMs into programming education and assessment.

APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics

  • paper_url: http://arxiv.org/abs/2308.07954
  • repo_url: https://github.com/hyunp2/alphafold
  • paper_authors: Hyun Park, Parth Patel, Roland Haas, E. A. Huerta
  • for: Protein 3D structure prediction from amino acid sequence.
  • methods: AlphaFold2 and advanced computing as a service.
  • results: Up to two orders of magnitude faster than off-the-shelf AlphaFold2 implementations, reducing time-to-solution from weeks to minutes.
    Abstract The prediction of protein 3D structure from amino acid sequence is a computational grand challenge in biophysics, and plays a key role in robust protein structure prediction algorithms, from drug discovery to genome interpretation. The advent of AI models, such as AlphaFold, is revolutionizing applications that depend on robust protein structure prediction algorithms. To maximize the impact, and ease the usability, of these novel AI tools we introduce APACE, AlphaFold2 and advanced computing as a service, a novel computational framework that effectively handles this AI model and its TB-size database to conduct accelerated protein structure prediction analyses in modern supercomputing environments. We deployed APACE in the Delta supercomputer, and quantified its performance for accurate protein structure predictions using four exemplar proteins: 6AWO, 6OAN, 7MEZ, and 6D6U. Using up to 200 ensembles, distributed across 50 nodes in Delta, equivalent to 200 A100 NVIDIA GPUs, we found that APACE is up to two orders of magnitude faster than off-the-shelf AlphaFold2 implementations, reducing time-to-solution from weeks to minutes. This computational approach may be readily linked with robotics laboratories to automate and accelerate scientific discovery.

RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models

  • paper_url: http://arxiv.org/abs/2308.07922
  • repo_url: None
  • paper_authors: Jie Huang, Wei Ping, Peng Xu, Mohammad Shoeybi, Kevin Chen-Chuan Chang, Bryan Catanzaro
  • for: Investigating the in-context learning ability of retrieval-augmented encoder-decoder language models
  • methods: A comprehensive analysis of the state-of-the-art ATLAS model reveals limitations in in-context learning, mainly a mismatch between pretraining and testing and a restricted context length; to address these, the paper proposes RAVEN, which combines retrieval-augmented masked language modeling with prefix language modeling, plus Fusion-in-Context Learning to improve few-shot performance by letting the model exploit more in-context examples without additional training or model modifications
  • results: Extensive experiments show that RAVEN significantly outperforms ATLAS and, despite having substantially fewer parameters, achieves results comparable to the most advanced language models in certain scenarios, encouraging further research in this direction
    Abstract In this paper, we investigate the in-context learning ability of retrieval-augmented encoder-decoder language models. We first conduct a comprehensive analysis of the state-of-the-art ATLAS model and identify its limitations in in-context learning, primarily due to a mismatch between pretraining and testing, as well as a restricted context length. To address these issues, we propose RAVEN, a model that combines retrieval-augmented masked language modeling and prefix language modeling. We further introduce Fusion-in-Context Learning to enhance the few-shot performance by enabling the model to leverage more in-context examples without requiring additional training or model modifications. Through extensive experiments, we demonstrate that RAVEN significantly outperforms ATLAS and achieves results comparable to the most advanced language models in certain scenarios, despite having substantially fewer parameters. Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning and encourages further research in this direction.

Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification

  • paper_url: http://arxiv.org/abs/2308.07921
  • repo_url: None
  • paper_authors: Aojun Zhou, Ke Wang, Zimu Lu, Weikang Shi, Sichun Luo, Zipeng Qin, Shaoqing Lu, Anya Jia, Linqi Song, Mingjie Zhan, Hongsheng Li
  • for: Understanding how large language models (LLMs) such as GPT-4 and PaLM-2 solve math reasoning problems, in particular OpenAI's GPT-4 Code Interpreter, which performs remarkably well on challenging math datasets
  • methods: Analyzing the effect of code on reasoning by imposing different constraints on the Code Usage Frequency of GPT-4 Code Interpreter (code generation and execution, evaluation of execution output, and correction of answers), and proposing explicit code-based self-verification (CSV) prompting
  • results: With GPT-4 Code Interpreter and CSV, zero-shot accuracy on the MATH dataset improves from 53.9% to 84.3%
    Abstract Recent progress in large language models (LLMs) like GPT-4 and PaLM-2 has brought significant advancements in addressing math reasoning problems. In particular, OpenAI's latest version of GPT-4, known as GPT-4 Code Interpreter, shows remarkable performance on challenging math datasets. In this paper, we explore the effect of code on enhancing LLMs' reasoning capability by introducing different constraints on the Code Usage Frequency of GPT-4 Code Interpreter. We found that its success can be largely attributed to its powerful skills in generating and executing code, evaluating the output of code execution, and rectifying its solution when receiving unreasonable outputs. Based on this insight, we propose a novel and effective prompting method, explicit code-based self-verification (CSV), to further boost the mathematical reasoning potential of GPT-4 Code Interpreter. This method employs a zero-shot prompt on GPT-4 Code Interpreter to encourage it to use code to self-verify its answers. In instances where the verification state registers as "False", the model shall automatically amend its solution, analogous to our approach of rectifying errors during a mathematics examination. Furthermore, we recognize that the states of the verification result indicate the confidence of a solution, which can improve the effectiveness of majority voting. With GPT-4 Code Interpreter and CSV, we achieve an impressive zero-shot accuracy on the MATH dataset (53.9% → 84.3%).
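The verify-then-amend control flow of CSV can be sketched with a placeholder `ask_model` function standing in for a call to a code-capable LLM; the canned replies and prompt strings are purely illustrative assumptions, not the paper's prompts or any real API:

```python
# Illustrative control flow for code-based self-verification (CSV): the model is asked
# to solve a problem with code, then to verify its own answer with code; if the
# verification comes back "False", it is asked to amend the solution.

def ask_model(prompt):
    """Stand-in for an LLM call; deterministic fake replies for illustration only."""
    canned = {
        "solve": "Answer: 42",
        "verify": "Verification: True",
        "amend": "Answer: 42 (revised)",
    }
    key = "solve" if "Solve" in prompt else "verify" if "verify" in prompt else "amend"
    return canned[key]

def solve_with_csv(problem, max_rounds=3):
    answer = ask_model(f"Solve with code and state the final answer.\n{problem}")
    for _ in range(max_rounds):
        check = ask_model(f"Write and run code to verify this answer: {answer}\n{problem}")
        if "True" in check:             # verification passed -> return the confident answer
            return answer
        answer = ask_model(f"The check failed ({check}). Amend the solution.\n{problem}")
    return answer                        # verification states could also weight majority voting

print(solve_with_csv("Toy arithmetic word problem goes here."))
```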

Relightable and Animatable Neural Avatar from Sparse-View Video

  • paper_url: http://arxiv.org/abs/2308.07903
  • repo_url: None
  • paper_authors: Zhen Xu, Sida Peng, Chen Geng, Linzhan Mou, Zihan Yan, Jiaming Sun, Hujun Bao, Xiaowei Zhou
  • for: Creating relightable and animatable neural avatars of dynamic humans from sparse-view (or even monocular) videos under unknown illumination
  • methods: A Hierarchical Distance Query (HDQ) algorithm that estimates world-space distances under arbitrary human poses, combined with sphere tracing to efficiently compute surface intersections and light visibility
  • results: High-quality relightable and animatable neural avatars recovered from sparse-view (or monocular) inputs, outperforming state-of-the-art methods
    Abstract This paper tackles the challenge of creating relightable and animatable neural avatars from sparse-view (or even monocular) videos of dynamic humans under unknown illumination. Compared to studio environments, this setting is more practical and accessible but poses an extremely challenging ill-posed problem. Previous neural human reconstruction methods are able to reconstruct animatable avatars from sparse views using deformed Signed Distance Fields (SDF) but cannot recover material parameters for relighting. While differentiable inverse rendering-based methods have succeeded in material recovery of static objects, it is not straightforward to extend them to dynamic humans as it is computationally intensive to compute pixel-surface intersection and light visibility on deformed SDFs for inverse rendering. To solve this challenge, we propose a Hierarchical Distance Query (HDQ) algorithm to approximate the world space distances under arbitrary human poses. Specifically, we estimate coarse distances based on a parametric human model and compute fine distances by exploiting the local deformation invariance of SDF. Based on the HDQ algorithm, we leverage sphere tracing to efficiently estimate the surface intersection and light visibility. This allows us to develop the first system to recover animatable and relightable neural avatars from sparse view (or monocular) inputs. Experiments demonstrate that our approach is able to produce superior results compared to state-of-the-art methods. Our code will be released for reproducibility.

Through the Lens of Core Competency: Survey on Evaluation of Large Language Models

  • paper_url: http://arxiv.org/abs/2308.07902
  • repo_url: None
  • paper_authors: Ziyu Zhuang, Qiguang Chen, Longxuan Ma, Mingda Li, Yi Han, Yushan Qian, Haopeng Bai, Zixian Feng, Weinan Zhang, Ting Liu
  • for: Better evaluating the performance of large language models (LLMs) in order to guide the direction of research in the field
  • methods: Surveying evaluation tasks and metrics organized around four core competencies of LLMs: reasoning, knowledge, reliability, and safety, with a definition, corresponding benchmarks, and metrics for each
  • results: Existing evaluation tasks struggle to keep up with real-world applications; under the proposed competency architecture, similar tasks are combined to reflect the corresponding ability and new tasks can easily be added, and the survey closes with suggestions on the future direction of LLM evaluation
    Abstract From pre-trained language model (PLM) to large language model (LLM), the field of natural language processing (NLP) has witnessed steep performance gains and wide practical uses. The evaluation of a research field guides its direction of improvement. However, LLMs are extremely hard to thoroughly evaluate for two reasons. First of all, traditional NLP tasks become inadequate due to the excellent performance of LLM. Secondly, existing evaluation tasks are difficult to keep up with the wide range of applications in real-world scenarios. To tackle these problems, existing works proposed various benchmarks to better evaluate LLMs. To clarify the numerous evaluation tasks in both academia and industry, we investigate multiple papers concerning LLM evaluations. We summarize 4 core competencies of LLM, including reasoning, knowledge, reliability, and safety. For every competency, we introduce its definition, corresponding benchmarks, and metrics. Under this competency architecture, similar tasks are combined to reflect corresponding ability, while new tasks can also be easily added into the system. Finally, we give our suggestions on the future direction of LLM's evaluation.

Probabilistic Phase Labeling and Lattice Refinement for Autonomous Material Research

  • paper_url: http://arxiv.org/abs/2308.07897
  • repo_url: https://github.com/mingchiangchang/crystaltree.jl
  • paper_authors: Ming-Chiang Chang, Sebastian Ament, Maximilian Amsler, Duncan R. Sutherland, Lan Zhou, John M. Gregoire, Carla P. Gomes, R. Bruce van Dover, Michael O. Thompson
  • for: The goal of this work is to develop an efficient crystal-structure (phase) identification technique that keeps pace with the high-throughput experimentation required by autonomous scientific discovery.
  • methods: The study uses symmetry-constrained pseudo-refinement optimization, best-first tree search, and Bayesian model comparison to estimate probabilities for phase combinations, without requiring phase-space information or training.
  • results: Experiments on synthetic and experimental data show that CrystalShift provides robust phase-probability estimates, outperforms existing methods, and can be readily integrated into high-throughput experimental workflows. In addition, CrystalShift offers quantitative insight into materials' structural parameters, facilitating both expert evaluation and AI-based modeling of the phase space and thereby accelerating materials identification and discovery.
    Abstract X-ray diffraction (XRD) is an essential technique to determine a material's crystal structure in high-throughput experimentation, and has recently been incorporated in artificially intelligent agents in autonomous scientific discovery processes. However, rapid, automated and reliable analysis method of XRD data matching the incoming data rate remains a major challenge. To address these issues, we present CrystalShift, an efficient algorithm for probabilistic XRD phase labeling that employs symmetry-constrained pseudo-refinement optimization, best-first tree search, and Bayesian model comparison to estimate probabilities for phase combinations without requiring phase space information or training. We demonstrate that CrystalShift provides robust probability estimates, outperforming existing methods on synthetic and experimental datasets, and can be readily integrated into high-throughput experimental workflows. In addition to efficient phase-mapping, CrystalShift offers quantitative insights into materials' structural parameters, which facilitate both expert evaluation and AI-based modeling of the phase space, ultimately accelerating materials identification and discovery.
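A minimal sketch of a best-first search over phase combinations scored by an approximate Bayesian model comparison; the `log_evidence` callable stands in for the symmetry-constrained pseudo-refinement and evidence estimation that CrystalShift actually performs, and the beam size and phase limit are assumptions.

```python
import heapq
import itertools

def best_first_phase_search(pattern, candidate_phases, log_evidence,
                            max_phases=3, beam=10):
    """Best-first search over subsets of candidate phases.

    pattern:          the measured XRD pattern.
    candidate_phases: iterable of phase identifiers.
    log_evidence(pattern, phases): placeholder scoring function returning an
        approximate log model evidence for the phase combination `phases`.
    Returns the best-scoring phase combination and its score.
    """
    counter = itertools.count()               # tie-breaker for the heap
    start = frozenset()
    frontier = [(-log_evidence(pattern, start), next(counter), start)]
    best_score, best_phases = float("-inf"), start
    seen = {start}
    while frontier:
        neg_score, _, phases = heapq.heappop(frontier)
        score = -neg_score
        if score > best_score:
            best_score, best_phases = score, phases
        if len(phases) >= max_phases:
            continue
        # Expand by adding one more candidate phase; keep only the `beam` best children.
        children = []
        for p in candidate_phases:
            child = phases | {p}
            if p in phases or child in seen:
                continue
            seen.add(child)
            children.append((-log_evidence(pattern, child), next(counter), child))
        for item in heapq.nsmallest(beam, children):
            heapq.heappush(frontier, item)
    return best_phases, best_score

# Toy usage with a stand-in evidence function that rewards a known ground truth.
truth = {"FeO", "Fe2O3"}
log_ev = lambda pattern, phases: -len(set(phases) ^ truth)   # hypothetical score
print(best_first_phase_search(None, ["FeO", "Fe2O3", "Fe3O4"], log_ev))
```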

EduSAT: A Pedagogical Tool for Theory and Applications of Boolean Satisfiability

  • paper_url: http://arxiv.org/abs/2308.07890
  • repo_url: https://github.com/zhaoy37/sat_solver
  • paper_authors: Yiqi Zhao, Ziyan An, Meiyi Ma, Taylor Johnson
  • for: To support learning and understanding of Boolean satisfiability (SAT) and Satisfiability Modulo Theories (SMT) solving as used in automated verification.
  • methods: Implements key SAT-solving algorithms such as the Davis-Putnam-Logemann-Loveland (DPLL) algorithm and Reduced Order Binary Decision Diagrams (ROBDD), along with solver abstractions for five NP-complete problems beyond SAT and SMT.
  • results: The evaluation of EduSAT shows high accuracy, achieving 100% correctness across all implemented SAT and SMT solvers.
    Abstract Boolean Satisfiability (SAT) and Satisfiability Modulo Theories (SMT) are widely used in automated verification, but there is a lack of interactive tools designed for educational purposes in this field. To address this gap, we present EduSAT, a pedagogical tool specifically developed to support learning and understanding of SAT and SMT solving. EduSAT offers implementations of key algorithms such as the Davis-Putnam-Logemann-Loveland (DPLL) algorithm and the Reduced Order Binary Decision Diagram (ROBDD) for SAT solving. Additionally, EduSAT provides solver abstractions for five NP-complete problems beyond SAT and SMT. Users can benefit from EduSAT by experimenting, analyzing, and validating their understanding of SAT and SMT solving techniques. Our tool is accompanied by comprehensive documentation and tutorials, extensive testing, and practical features such as a natural language interface and SAT and SMT formula generators, which also serve as a valuable opportunity for learners to deepen their understanding. Our evaluation of EduSAT demonstrates its high accuracy, achieving 100% correctness across all the implemented SAT and SMT solvers. We release EduSAT as a python package in .whl file, and the source can be identified at https://github.com/zhaoy37/SAT_Solver.
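Since the abstract highlights the DPLL algorithm, here is a compact, didactic DPLL with unit propagation over CNF clauses encoded as lists of non-zero integer literals; it is a sketch of the textbook algorithm, not EduSAT's implementation.

```python
def unit_propagate(clauses, assignment):
    """Repeatedly assign literals forced by unit clauses; return None on conflict."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(l in assignment for l in clause):
                continue                       # clause already satisfied
            unassigned = [l for l in clause if -l not in assignment and l not in assignment]
            if not unassigned:
                return None                    # conflict: clause falsified
            if len(unassigned) == 1:
                assignment.add(unassigned[0])  # unit clause forces this literal
                changed = True
    return assignment

def dpll(clauses, assignment=frozenset()):
    """Return a satisfying set of literals, or None if the formula is UNSAT."""
    assignment = unit_propagate(clauses, set(assignment))
    if assignment is None:
        return None
    unassigned_vars = {abs(l) for c in clauses for l in c} - {abs(l) for l in assignment}
    if not unassigned_vars:
        return assignment
    var = next(iter(unassigned_vars))          # naive branching heuristic
    for literal in (var, -var):
        result = dpll(clauses, assignment | {literal})
        if result is not None:
            return result
    return None

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
print(dpll([[1, 2], [-1, 3], [-2, -3]]))       # e.g. a model such as {1, 3, -2}
```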

A Comprehensive Study on Knowledge Graph Embedding over Relational Patterns Based on Rule Learning

  • paper_url: http://arxiv.org/abs/2308.07889
  • repo_url: https://github.com/jinlong22/Analysis-relational-patterns-and-SPA
  • paper_authors: Long Jin, Zhen Yao, Mingyang Chen, Huajun Chen, Wen Zhang
  • for: This work investigates why KGE models underperform on the knowledge graph completion task and examines how KGE models behave over different relational patterns.
  • methods: Seven KGE models are evaluated over four common relational patterns on two benchmarks, with analysis from three angles (theory, entity frequency, and part-to-whole), yielding some counterintuitive conclusions.
  • results: The study finds that a KGE model's performance over a specific relational pattern does not necessarily match theoretical expectations, and performance remains poor for some patterns. To address this, the authors propose a training-free method, Score-based Patterns Adaptation (SPA), which enhances KGE models' performance over various relational patterns.
    Abstract Knowledge Graph Embedding (KGE) has proven to be an effective approach to solving the Knowledge Graph Completion (KGC) task. Relational patterns which refer to relations with specific semantics exhibiting graph patterns are an important factor in the performance of KGE models. Though KGE models' capabilities are analyzed over different relational patterns in theory and a rough connection between better relational patterns modeling and better performance of KGC has been built, a comprehensive quantitative analysis on KGE models over relational patterns remains absent so it is uncertain how the theoretical support of KGE to a relational pattern contributes to the performance of triples associated to such a relational pattern. To address this challenge, we evaluate the performance of 7 KGE models over 4 common relational patterns on 2 benchmarks, then conduct an analysis in theory, entity frequency, and part-to-whole three aspects and get some counterintuitive conclusions. Finally, we introduce a training-free method Score-based Patterns Adaptation (SPA) to enhance KGE models' performance over various relational patterns. This approach is simple yet effective and can be applied to KGE models without additional training. Our experimental results demonstrate that our method generally enhances performance over specific relational patterns. Our source code is available from GitHub at https://github.com/zjukg/Comprehensive-Study-over-Relational-Patterns.
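The paper does not spell out the exact form of Score-based Patterns Adaptation here, but a training-free, score-level adjustment for one relational pattern (symmetry) can be sketched as follows; the blending weight and the way a relation is flagged as symmetric are assumptions for illustration.

```python
def spa_adjusted_score(score_fn, head, relation, tail,
                       symmetric_relations=frozenset(), weight=0.5):
    """Training-free, score-level adaptation for symmetric relational patterns.

    score_fn(h, r, t): plausibility score from a pretrained KGE model.
    If `relation` is known (e.g. from rule learning) to be symmetric, blend in
    the score of the reversed triple, which the pattern says should also hold.
    """
    base = score_fn(head, relation, tail)
    if relation in symmetric_relations:
        return (1.0 - weight) * base + weight * score_fn(tail, relation, head)
    return base

# Toy usage with a stand-in scoring function.
toy_scores = {("a", "marriedTo", "b"): 0.2, ("b", "marriedTo", "a"): 0.9}
score_fn = lambda h, r, t: toy_scores.get((h, r, t), 0.0)
print(spa_adjusted_score(score_fn, "a", "marriedTo", "b",
                         symmetric_relations={"marriedTo"}))  # 0.55
```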

Towards Temporal Edge Regression: A Case Study on Agriculture Trade Between Nations

  • paper_url: http://arxiv.org/abs/2308.07883
  • repo_url: https://github.com/scylj1/gnn_edge_regression
  • paper_authors: Lekang Jiang, Caiqi Zhang, Farimah Poursafaei, Shenyang Huang
  • for: To predict edge values in international trade data, i.e., the value of trade between pairs of nations.
  • methods: Graph neural network (GNN) models are evaluated on both static and dynamic graphs, including three simple yet strong baselines, one static GNN, and three dynamic GNN models.
  • results: The baselines perform remarkably well across settings, particularly in the presence of negative edges. Among the dynamic GNN models, TGN performs best, and the proportion of negative edges in the training samples significantly affects test performance.
    Abstract Recently, Graph Neural Networks (GNNs) have shown promising performance in tasks on dynamic graphs such as node classification, link prediction and graph regression. However, few work has studied the temporal edge regression task which has important real-world applications. In this paper, we explore the application of GNNs to edge regression tasks in both static and dynamic settings, focusing on predicting food and agriculture trade values between nations. We introduce three simple yet strong baselines and comprehensively evaluate one static and three dynamic GNN models using the UN Trade dataset. Our experimental results reveal that the baselines exhibit remarkably strong performance across various settings, highlighting the inadequacy of existing GNNs. We also find that TGN outperforms other GNN models, suggesting TGN is a more appropriate choice for edge regression tasks. Moreover, we note that the proportion of negative edges in the training samples significantly affects the test performance. The companion source code can be found at: https://github.com/scylj1/GNN_Edge_Regression.
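The abstract does not name the three baselines, so the sketch below shows one natural candidate for temporal edge regression, a historical-average (persistence-style) predictor; the tuple-based data layout is an assumption.

```python
from collections import defaultdict

def historical_average_baseline(train_edges, test_edges, default=0.0):
    """Predict an edge's future value as the mean of its past observed values.

    train_edges / test_edges: iterables of (source, destination, value) tuples,
    e.g. (exporter, importer, trade value) per year. Returns one prediction per
    test edge; unseen country pairs fall back to `default`.
    """
    history = defaultdict(list)
    for src, dst, value in train_edges:
        history[(src, dst)].append(value)
    preds = []
    for src, dst, _ in test_edges:
        past = history.get((src, dst))
        preds.append(sum(past) / len(past) if past else default)
    return preds

train = [("US", "CN", 10.0), ("US", "CN", 14.0), ("FR", "DE", 3.0)]
test = [("US", "CN", None), ("BR", "JP", None)]
print(historical_average_baseline(train, test))  # [12.0, 0.0]
```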

The $10 Million ANA Avatar XPRIZE Competition Advanced Immersive Telepresence Systems

  • paper_url: http://arxiv.org/abs/2308.07878
  • repo_url: None
  • paper_authors: Sven Behnke, Julie A. Adams, David Locke
  • for: This paper covers the multi-year $10M ANA Avatar XPRIZE competition, in which teams had to develop avatar systems capable of transporting human presence to a remote location in real time.
  • methods: The paper describes the stages and tasks of the competition, along with the evaluation procedures.
  • results: It reports the competition results, presents the winning teams' approaches, and discusses lessons learned.
    Abstract The $10M ANA Avatar XPRIZE aimed to create avatar systems that can transport human presence to remote locations in real time. The participants of this multi-year competition developed robotic systems that allow operators to see, hear, and interact with a remote environment in a way that feels as if they are truly there. On the other hand, people in the remote environment were given the impression that the operator was present inside the avatar robot. At the competition finals, held in November 2022 in Long Beach, CA, USA, the avatar systems were evaluated on their support for remotely interacting with humans, exploring new environments, and employing specialized skills. This article describes the competition stages with tasks and evaluation procedures, reports the results, presents the winning teams' approaches, and discusses lessons learned.

Synthesizing Political Zero-Shot Relation Classification via Codebook Knowledge, NLI, and ChatGPT

  • paper_url: http://arxiv.org/abs/2308.07876
  • repo_url: https://github.com/snowood1/zero-shot-plover
  • paper_authors: Yibo Hu, Erick Skorupa Parolin, Latifur Khan, Patrick T. Brandt, Javier Osorio, Vito J. D’Orazio
  • for: This paper proposes zero-shot methods for political event ontology relation classification that leverage knowledge from established annotation codebooks, improving accuracy and efficiency for fine-grained classification.
  • methods: A natural language inference (NLI)-based approach named ZSP adopts a tree-query framework that decomposes the task into context, modality, and class disambiguation levels.
  • results: In extensive experiments, ZSP achieves a 40% improvement in F1 score for fine-grained Rootcode classification and performs comparably to supervised BERT models, making it a valuable tool for event record validation and ontology development.
    Abstract Recent supervised models for event coding vastly outperform pattern-matching methods. However, their reliance solely on new annotations disregards the vast knowledge within expert databases, hindering their applicability to fine-grained classification. To address these limitations, we explore zero-shot approaches for political event ontology relation classification, by leveraging knowledge from established annotation codebooks. Our study encompasses both ChatGPT and a novel natural language inference (NLI) based approach named ZSP. ZSP adopts a tree-query framework that deconstructs the task into context, modality, and class disambiguation levels. This framework improves interpretability, efficiency, and adaptability to schema changes. By conducting extensive experiments on our newly curated datasets, we pinpoint the instability issues within ChatGPT and highlight the superior performance of ZSP. ZSP achieves an impressive 40% improvement in F1 score for fine-grained Rootcode classification. ZSP demonstrates competitive performance compared to supervised BERT models, positioning it as a valuable tool for event record validation and ontology development. Our work underscores the potential of leveraging transfer learning and existing expertise to enhance the efficiency and scalability of research in the field.
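A rough sketch of the NLI-style zero-shot classification that ZSP builds on, using the Hugging Face zero-shot pipeline with a two-stage query that loosely mirrors the tree-query idea; the label sets, hypothesis templates, and model choice are assumptions rather than the paper's codebook categories.

```python
from transformers import pipeline

# An off-the-shelf NLI model repurposed for zero-shot classification.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def classify_event(sentence):
    """Two-stage zero-shot query: coarse relation class first, then a finer code.

    The label sets below are illustrative stand-ins for codebook categories,
    not the ontology the paper targets.
    """
    coarse = classifier(
        sentence,
        candidate_labels=["cooperation", "conflict"],
        hypothesis_template="This event describes {} between political actors.",
    )
    top = coarse["labels"][0]
    fine_labels = {
        "cooperation": ["providing aid", "signing an agreement"],
        "conflict": ["issuing a threat", "using military force"],
    }[top]
    fine = classifier(sentence, candidate_labels=fine_labels,
                      hypothesis_template="This event involves {}.")
    return top, fine["labels"][0]

print(classify_event("Country A pledged $2 billion in humanitarian aid to Country B."))
```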

Emotion Embeddings — Learning Stable and Homogeneous Abstractions from Heterogeneous Affective Datasets

  • paper_url: http://arxiv.org/abs/2308.07871
  • repo_url: None
  • paper_authors: Sven Buechel, Udo Hahn
  • for: This paper proposes a unified computational model that brings together affective datasets with different expression modalities and label types, enabling reuse, interpretability, and flexible application of the data.
  • methods: A training procedure learns a shared latent representation of emotion (emotion embeddings) that is independent of natural language, communication modality, medium, and label format.
  • results: Experiments show that the approach achieves reliable, reusable prediction quality across a wide range of heterogeneous affective datasets without being tied to a particular language, modality, or label type.
    Abstract Human emotion is expressed in many communication modalities and media formats and so their computational study is equally diversified into natural language processing, audio signal analysis, computer vision, etc. Similarly, the large variety of representation formats used in previous research to describe emotions (polarity scales, basic emotion categories, dimensional approaches, appraisal theory, etc.) have led to an ever proliferating diversity of datasets, predictive models, and software tools for emotion analysis. Because of these two distinct types of heterogeneity, at the expressional and representational level, there is a dire need to unify previous work on increasingly diverging data and label types. This article presents such a unifying computational model. We propose a training procedure that learns a shared latent representation for emotions, so-called emotion embeddings, independent of different natural languages, communication modalities, media or representation label formats, and even disparate model architectures. Experiments on a wide range of heterogeneous affective datasets indicate that this approach yields the desired interoperability for the sake of reusability, interpretability and flexibility, without penalizing prediction quality. Code and data are archived under https://doi.org/10.5281/zenodo.7405327 .
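A minimal sketch of the core idea of routing heterogeneous inputs and label formats through one shared emotion-embedding space, with dataset-specific encoder and decoder heads; the dimensions, head structure, and training details are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SharedEmotionSpace(nn.Module):
    """Dataset-specific heads around one shared emotion-embedding space."""

    def __init__(self, input_dims, label_dims, emb_dim=32):
        super().__init__()
        # One encoder per input representation (e.g. text features, audio features).
        self.encoders = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, emb_dim))
            for name, d in input_dims.items()
        })
        # One decoder per label format (e.g. valence/arousal/dominance, basic categories).
        self.decoders = nn.ModuleDict({
            name: nn.Linear(emb_dim, d) for name, d in label_dims.items()
        })

    def forward(self, features, source, target_format):
        z = self.encoders[source](features)          # shared emotion embedding
        return self.decoders[target_format](z), z

model = SharedEmotionSpace(
    input_dims={"text": 300, "audio": 128},
    label_dims={"vad": 3, "basic6": 6},
)
pred, emb = model(torch.randn(4, 300), source="text", target_format="basic6")
print(pred.shape, emb.shape)  # torch.Size([4, 6]) torch.Size([4, 32])
```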

Brain-Inspired Computational Intelligence via Predictive Coding

  • paper_url: http://arxiv.org/abs/2308.07870
  • repo_url: None
  • paper_authors: Tommaso Salvatori, Ankur Mali, Christopher L. Buckley, Thomas Lukasiewicz, Rajesh P. N. Rao, Karl Friston, Alexander Ororbia
  • for: The paper aims to explore the potential of using predictive coding (PC) as a guiding principle for the development of machine learning algorithms, in order to address some of the limitations of current deep neural network approaches.
  • methods: The paper surveys the literature on PC and its applications in machine intelligence tasks, highlighting its exciting properties and potential benefits for the field of machine learning.
  • results: The paper discusses the potential of PC to model information processing in different brain areas, be used in cognitive control and robotics, and provide a powerful inversion scheme for continuous-state generative models.
    Abstract Artificial intelligence (AI) is rapidly becoming one of the key technologies of this century. The majority of results in AI thus far have been achieved using deep neural networks trained with the error backpropagation learning algorithm. However, the ubiquitous adoption of this approach has highlighted some important limitations such as substantial computational cost, difficulty in quantifying uncertainty, lack of robustness, unreliability, and biological implausibility. It is possible that addressing these limitations may require schemes that are inspired and guided by neuroscience theories. One such theory, called predictive coding (PC), has shown promising performance in machine intelligence tasks, exhibiting exciting properties that make it potentially valuable for the machine learning community: PC can model information processing in different brain areas, can be used in cognitive control and robotics, and has a solid mathematical grounding in variational inference, offering a powerful inversion scheme for a specific class of continuous-state generative models. With the hope of foregrounding research in this direction, we survey the literature that has contributed to this perspective, highlighting the many ways that PC might play a role in the future of machine learning and computational intelligence at large.
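To give a flavor of the predictive-coding scheme the review surveys, the sketch below runs iterative inference in a toy two-layer linear generative model: latent states are updated by gradient descent on the squared prediction errors. The linear form, dimensions, and step size are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer linear generative model: x ≈ W1 @ z1, z1 ≈ W2 @ z2.
d_x, d_1, d_2 = 8, 6, 4
W1 = rng.normal(scale=0.3, size=(d_x, d_1))
W2 = rng.normal(scale=0.3, size=(d_1, d_2))
x = rng.normal(size=d_x)                      # observed input

z1 = np.zeros(d_1)                            # inferred latent states
z2 = np.zeros(d_2)
lr = 0.1
for _ in range(200):
    e0 = x - W1 @ z1                          # prediction error at the input layer
    e1 = z1 - W2 @ z2                         # prediction error at the hidden layer
    # Gradient descent on the energy 0.5*(|e0|^2 + |e1|^2) w.r.t. the latents.
    z1 += lr * (W1.T @ e0 - e1)
    z2 += lr * (W2.T @ e1)

print(round(float(np.sum(e0**2) + np.sum(e1**2)), 4))   # energy after inference
```

In a full predictive-coding learner, the weights W1 and W2 would also be updated from the same local prediction errors once inference has settled.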

Leveraging Symmetries in Pick and Place

  • paper_url: http://arxiv.org/abs/2308.07948
  • repo_url: None
  • paper_authors: Haojie Huang, Dian Wang, Arsh Tangri, Robin Walters, Robert Platt
  • for: This paper studies the symmetries present in planar robotic pick-and-place tasks and incorporates them into the Transporter Net framework so that pick-and-place knowledge generalizes quickly to different pick and place poses.
  • methods: The symmetries of planar pick and place are analyzed mathematically, and equivariant neural models are incorporated into Transporter Net, yielding Equivariant Transporter Net.
  • results: Experiments show that Equivariant Transporter Net is far more sample efficient than the non-equivariant version, imitating demonstrated pick-and-place behavior from very few human demonstrations across a variety of imitation learning tasks.
    Abstract Robotic pick and place tasks are symmetric under translations and rotations of both the object to be picked and the desired place pose. For example, if the pick object is rotated or translated, then the optimal pick action should also rotate or translate. The same is true for the place pose; if the desired place pose changes, then the place action should also transform accordingly. A recently proposed pick and place framework known as Transporter Net captures some of these symmetries, but not all. This paper analytically studies the symmetries present in planar robotic pick and place and proposes a method of incorporating equivariant neural models into Transporter Net in a way that captures all symmetries. The new model, which we call Equivariant Transporter Net, is equivariant to both pick and place symmetries and can immediately generalize pick and place knowledge to different pick and place poses. We evaluate the new model empirically and show that it is much more sample efficient than the non-symmetric version, resulting in a system that can imitate demonstrated pick and place behavior using very few human demonstrations on a variety of imitation learning tasks.
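The sketch below illustrates one generic way to expose rotational structure in a pick policy: evaluate a score map on several rotated copies of the input and take the argmax over position and rotation. This is a discretized-rotation baseline for intuition only, not the group-equivariant network construction used by Equivariant Transporter Net.

```python
import numpy as np
from scipy.ndimage import rotate

def best_pick(image, score_fn, n_rotations=8):
    """Evaluate a pick score map under n discrete rotations of the input.

    score_fn(img) -> HxW score map (a stand-in for a learned pick network).
    Returns (row, col, angle_deg) of the highest-scoring pick across rotations,
    with the location reported in the rotated frame for simplicity.
    """
    best = (-np.inf, None)
    for k in range(n_rotations):
        angle = 360.0 * k / n_rotations
        rotated = rotate(image, angle, reshape=False, order=1)
        scores = score_fn(rotated)
        idx = np.unravel_index(np.argmax(scores), scores.shape)
        if scores[idx] > best[0]:
            best = (scores[idx], (idx[0], idx[1], angle))
    return best[1]

# Toy usage: the "network" simply scores pixel intensity.
img = np.zeros((32, 32)); img[10, 20] = 1.0
print(best_pick(img, score_fn=lambda im: im))  # a bright pixel under some rotation
```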

Impression-Aware Recommender Systems

  • paper_url: http://arxiv.org/abs/2308.07857
  • repo_url: None
  • paper_authors: Fernando B. Pérez Maurera, Maurizio Ferrari Dacrema, Pablo Castells, Paolo Cremonesi
  • for: Impressions are presented as a novel data source for improving the quality of recommender systems.
  • methods: Impressions (items shown in past recommendations) are used together with traditional interactions to refine user preferences.
  • results: A systematic literature review covering the three fundamental angles of research on recommender systems that use impressions: recommenders, datasets, and evaluation methodologies.
    Abstract Novel data sources bring new opportunities to improve the quality of recommender systems. Impressions are a novel data source containing past recommendations (shown items) and traditional interactions. Researchers may use impressions to refine user preferences and overcome the current limitations in recommender systems research. The relevance and interest of impressions have increased over the years; hence, the need for a review of relevant work on this type of recommenders. We present a systematic literature review on recommender systems using impressions, focusing on three fundamental angles in research: recommenders, datasets, and evaluation methodologies. We provide three categorizations of papers describing recommenders using impressions, present each reviewed paper in detail, describe datasets with impressions, and analyze the existing evaluation methodologies. Lastly, we present open questions and future directions of interest, highlighting aspects missing in the literature that can be addressed in future works.
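As a small illustration of how impressions complement interactions, the sketch below labels shown-but-not-clicked items as weak negatives when assembling training examples; the column names and labeling rule are assumptions, since the review covers many different ways impressions are used.

```python
import pandas as pd

# Impression log: items shown to each user; interaction log: items actually clicked.
impressions = pd.DataFrame({"user": [1, 1, 1, 2], "item": ["a", "b", "c", "a"]})
interactions = pd.DataFrame({"user": [1, 2], "item": ["b", "a"]})

def build_training_examples(impressions, interactions):
    """Label shown+clicked pairs as positives (1) and shown-but-not-clicked as negatives (0)."""
    clicked = set(map(tuple, interactions[["user", "item"]].values))
    examples = impressions.copy()
    examples["label"] = [
        int((u, i) in clicked) for u, i in zip(examples["user"], examples["item"])
    ]
    return examples

print(build_training_examples(impressions, interactions))
#    user item  label
# 0     1    a      0
# 1     1    b      1
# 2     1    c      0
# 3     2    a      1
```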