cs.AI - 2023-07-16

A Recursive Bateson-Inspired Model for the Generation of Semantic Formal Concepts from Spatial Sensory Data

  • paper_url: http://arxiv.org/abs/2307.08087
  • repo_url: None
  • paper_authors: Jaime de Miguel-Rodriguez, Fernando Sancho-Caparrini
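  • for: This paper proposes a new symbolic method for generating hierarchical concept structures from complex spatial sensory data.
  • methods: The model applies Bateson's notion of difference to extract atomic features from raw data, then organizes these features into higher-level constructs through a recursive process.
  • results: The method produces readable, composable concept representations without training. The resulting concept structures also support formal logical reasoning, generalization across datasets, and out-of-distribution learning.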
    Abstract Neural-symbolic approaches to machine learning incorporate the advantages from both connectionist and symbolic methods. Typically, these models employ a first module based on a neural architecture to extract features from complex data. Then, these features are processed as symbols by a symbolic engine that provides reasoning, concept structures, composability, better generalization and out-of-distribution learning among other possibilities. However, neural approaches to the grounding of symbols in sensory data, albeit powerful, still require heavy training and tedious labeling for the most part. This paper presents a new symbolic-only method for the generation of hierarchical concept structures from complex spatial sensory data. The approach is based on Bateson's notion of difference as the key to the genesis of an idea or a concept. Following his suggestion, the model extracts atomic features from raw data by computing elemental sequential comparisons in a stream of multivariate numerical values. Higher-level constructs are built from these features by subjecting them to further comparisons in a recursive process. At any stage in the recursion, a concept structure may be obtained from these constructs and features by means of Formal Concept Analysis. Results show that the model is able to produce fairly rich yet human-readable conceptual representations without training. Additionally, the concept structures obtained through the model (i) present high composability, which potentially enables the generation of 'unseen' concepts, (ii) allow formal reasoning, and (iii) have inherent abilities for generalization and out-of-distribution learning. Consequently, this method may offer an interesting angle to current neural-symbolic research. Future work is required to develop a training methodology so that the model can be tested against a larger dataset.
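    Summary Neural-symbolic approaches combine the advantages of neural networks and symbolic methods. Typically, these models use a neural architecture to extract features from complex data, then process those features as symbols in a symbolic engine, which provides reasoning, concept structures, composability, better generalization, and out-of-distribution learning. However, neural approaches to grounding symbols in sensory data still require heavy training and tedious labeling. This paper proposes a new symbolic-only method for generating hierarchical concept structures from complex spatial sensory data. The method builds on Bateson's notion of difference as the origin of an idea or concept. Following this idea, the model extracts atomic features from raw data by computing elemental sequential comparisons over a stream of multivariate numerical values. Higher-level constructs are built by subjecting these features to further comparisons in a recursive process, and at any stage a concept structure can be obtained from the constructs and features through Formal Concept Analysis. Results show that the model produces fairly rich yet human-readable concept representations without training. The resulting concept structures are highly composable, which potentially allows the generation of "unseen" concepts, support formal reasoning, and have inherent abilities for generalization and out-of-distribution learning. The method may therefore offer an interesting angle for current neural-symbolic research. Future work is needed to develop a training methodology so that the model can be tested on larger datasets.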
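    Example (illustrative sketch, not taken from the paper). The recursive comparison idea described above can be pictured on a single numeric channel: compare each value with its successor to get atomic symbols, then compare consecutive symbols to get the next level. The symbol vocabulary ("<", "=", ">", "same", "diff"), the tolerance, and all function names are assumptions made for this sketch; the paper works on multivariate streams and derives concept structures with Formal Concept Analysis, which is not shown here.

```python
def compare(a, b, tol=1e-9):
    """Return a symbol describing the difference between two numbers."""
    if abs(a - b) <= tol:
        return "="
    return "<" if a < b else ">"


def first_order(stream):
    """Atomic features: compare each value with its successor."""
    return [compare(a, b) for a, b in zip(stream, stream[1:])]


def higher_order(symbols):
    """Next level: record whether consecutive symbols are the same or differ."""
    return ["same" if s == t else "diff" for s, t in zip(symbols, symbols[1:])]


def recurse(stream, depth):
    """Build up to `depth` levels of comparisons (hypothetical helper)."""
    levels = [first_order(stream)]
    for _ in range(depth - 1):
        if len(levels[-1]) < 2:
            break  # nothing left to compare
        levels.append(higher_order(levels[-1]))
    return levels


levels = recurse([1.0, 2.0, 2.0, 1.5, 0.5], depth=3)
for depth, symbols in enumerate(levels, start=1):
    print(depth, symbols)
```

    Each level is one element shorter than the one below it, so the recursion stops on its own for a finite stream.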

Dataset Distillation Meets Provable Subset Selection

  • paper_url: http://arxiv.org/abs/2307.08086
  • repo_url: None
  • paper_authors: Murad Tukan, Alaa Maalouf, Margarita Osadchy
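  • for: Improve dataset distillation so that it compresses large datasets more effectively while preserving model performance.
  • methods: We propose two improvements to dataset distillation. First, a sampling-based approach initializes the distilled set. Second, data subset selection is merged with dataset distillation, so that the distilled set is updated on important samples during training.
  • results: The method helps dataset distillation compress data while preserving model performance. Experiments show that it can be combined with existing dataset distillation techniques and improves their performance.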
    Abstract Deep learning has grown tremendously over recent years, yielding state-of-the-art results in various fields. However, training such models requires huge amounts of data, increasing the computational time and cost. To address this, dataset distillation was proposed to compress a large training dataset into a smaller synthetic one that retains its performance -- this is usually done by (1) uniformly initializing a synthetic set and (2) iteratively updating/learning this set according to a predefined loss by uniformly sampling instances from the full data. In this paper, we improve both phases of dataset distillation: (1) we present a provable, sampling-based approach for initializing the distilled set by identifying important and removing redundant points in the data, and (2) we further merge the idea of data subset selection with dataset distillation, by training the distilled set on "important" sampled points during the training procedure instead of randomly sampling the next batch. To do so, we define the notion of importance based on the relative contribution of instances with respect to two different loss functions, i.e., one for the initialization phase (a kernel fitting function for kernel ridge regression and $K$-means based loss function for any other distillation method), and the relative cross-entropy loss (or any other predefined loss) function for the training phase. Finally, we provide experimental results showing how our method can latch on to existing dataset distillation techniques and improve their performance.
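    Summary Deep learning has grown tremendously in recent years, achieving state-of-the-art results in many fields. However, training these models requires huge amounts of data and computational resources. To address this, dataset distillation was proposed to compress a large training dataset into a smaller synthetic one that retains its performance. In this paper, we improve both phases of dataset distillation: (1) we present a provable, sampling-based approach that initializes the distilled set by identifying important points and removing redundant ones, and (2) we merge data subset selection with dataset distillation by training the distilled set on "important" sampled points during training. To do so, we define importance as the relative contribution of instances with respect to two different loss functions: a kernel fitting function and a $K$-means based loss for the initialization phase, and the relative cross-entropy loss for the training phase. Finally, we provide experimental results showing that our method can be combined with existing dataset distillation techniques and improves their performance.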
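    Example (illustrative sketch, not the paper's provable procedure). One generic way to turn "relative contribution to a $K$-means loss" into sampling probabilities is to mix each point's share of the clustering cost with a uniform term, then draw points with those probabilities. The 50/50 mixing rule, the inverse-probability weights, and the function names are assumptions chosen for this sketch.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def importance_scores(points, centers):
    """Mix each point's share of the K-means cost with a uniform term."""
    costs = [min(sq_dist(p, c) for c in centers) for p in points]
    total = sum(costs)
    n = len(points)
    if total == 0:
        return [1.0 / n] * n  # every point sits on a center
    return [0.5 * c / total + 0.5 / n for c in costs]


def sample_important(points, centers, k, seed=0):
    """Draw k points by importance and attach inverse-probability weights."""
    rng = random.Random(seed)
    probs = importance_scores(points, centers)
    idx = rng.choices(range(len(points)), weights=probs, k=k)
    return [(points[i], 1.0 / (k * probs[i])) for i in idx]


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
centers = [(0.0, 0.0)]
print(importance_scores(points, centers))
print(sample_important(points, centers, k=2))
```

    In this toy input the far-away point dominates the clustering cost, so it receives the largest sampling probability, while the uniform term keeps every point reachable.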

POMDP inference and robust solution via deep reinforcement learning: An application to railway optimal maintenance

  • paper_url: http://arxiv.org/abs/2307.08082
  • repo_url: https://github.com/giarcieri/robust-optimal-maintenance-planning-through-reinforcement-learning-and-rllib
  • paper_authors: Giacomo Arcieri, Cyprien Hoelzl, Oliver Schwery, Daniel Straub, Konstantinos G. Papakonstantinou, Eleni Chatzi
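  • for: This paper addresses complex sequential decision-making problems modeled as POMDPs, using deep reinforcement learning to handle model uncertainty.
  • methods: The paper proposes a combined framework for inference and robust solution of POMDPs. Markov Chain Monte Carlo sampling recovers the transition and observation model parameters, and deep reinforcement learning then solves the POMDP with the parameter distributions incorporated into the solution to improve robustness.
  • results: The method is applied to a real-world problem, maintenance planning for railway assets, and the paper compares transformers and long short-term memory networks with a model-based/model-free hybrid approach.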
    Abstract Partially Observable Markov Decision Processes (POMDPs) can model complex sequential decision-making problems under stochastic and uncertain environments. A main reason hindering their broad adoption in real-world applications is the lack of availability of a suitable POMDP model or a simulator thereof. Available solution algorithms, such as Reinforcement Learning (RL), require the knowledge of the transition dynamics and the observation generating process, which are often unknown and non-trivial to infer. In this work, we propose a combined framework for inference and robust solution of POMDPs via deep RL. First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model, which is conditioned on actions, in order to recover full posterior distributions from the available data. The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization, in order to develop solutions that are robust to model uncertainty. As a further contribution, we compare the use of transformers and long short-term memory networks, which constitute model-free RL solutions, with a model-based/model-free hybrid approach. We apply these methods to the real-world problem of optimal maintenance planning for railway assets.
    Summary Partially observable Markov decision processes (POMDPs) can model complex sequential decision-making problems in stochastic and uncertain environments. A main obstacle to their broad adoption in real-world applications is the lack of a suitable POMDP model or a simulator thereof. Available solution algorithms, such as Reinforcement Learning (RL), require knowledge of the transition dynamics and the observation generating process, which are often unknown and non-trivial to infer. In this work, we propose a combined framework for inference and robust solution of POMDPs via deep RL. First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model conditioned on actions, in order to recover full posterior distributions from the available data. Then, the POMDP with uncertain parameters is solved via deep RL techniques, with the parameter distributions incorporated into the solution via domain randomization, in order to develop solutions that are robust to model uncertainty. In addition, we compare transformers and long short-term memory networks (LSTM), which are model-free RL solutions, with a model-based/model-free hybrid approach. We apply these methods to a real-world problem: optimal maintenance planning for railway assets.
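    Example (illustrative sketch, not the authors' implementation). Domain randomization over a parameter posterior can be pictured as drawing a fresh parameter sample for every simulated episode, so that a policy is scored across the whole range of plausible models. The two-state "healthy/degraded" asset, the cost values, the three posterior samples, and the fixed policies below are all hypothetical; the paper trains deep RL agents rather than evaluating hand-written rules.

```python
import random

# Hypothetical posterior samples (p_degrade, p_correct_observation),
# standing in for draws that MCMC inference would produce.
POSTERIOR = [(0.10, 0.90), (0.12, 0.85), (0.08, 0.92)]


def run_episode(policy, params, horizon, rng):
    """Simulate one episode; state 0 is healthy, state 1 is degraded."""
    p_degrade, p_obs = params
    state, obs, total = 0, 0, 0.0
    for _ in range(horizon):
        action = policy(obs)  # 0 = do nothing, 1 = repair
        if action == 1:
            state = 0
            total -= 5.0  # assumed repair cost
        elif state == 0 and rng.random() < p_degrade:
            state = 1
        if state == 1:
            total -= 2.0  # assumed cost of running degraded
        # Noisy observation of the hidden state
        obs = state if rng.random() < p_obs else 1 - state
    return total


def evaluate(policy, episodes=200, horizon=50, seed=0):
    """Average return with a new parameter draw per episode."""
    rng = random.Random(seed)
    returns = []
    for _ in range(episodes):
        params = rng.choice(POSTERIOR)  # domain randomization step
        returns.append(run_episode(policy, params, horizon, rng))
    return sum(returns) / len(returns)


def repair_on_alarm(obs):
    return obs


def never_repair(obs):
    return 0


print(evaluate(repair_on_alarm), evaluate(never_repair))
```

    Replacing the uniform draw from POSTERIOR with draws from real posterior samples is the only change needed to reuse the loop with inferred parameters.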

Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling

  • paper_url: http://arxiv.org/abs/2307.08074
  • repo_url: None
  • paper_authors: Longyue Wang, Zefeng Du, Donghuai Liu, Deng Cai, Dian Yu, Haiyun Jiang, Yan Wang, Leyang Cui, Shuming Shi, Zhaopeng Tu
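  • for: This paper proposes a new evaluation benchmark for assessing how well natural language processing (NLP) models capture discourse phenomena in text.
  • methods: The paper introduces Disco-Bench, a benchmark that evaluates how NLP models model discourse phenomena in text.
  • results: Fine-grained pretraining on literary document-level training data improves NLP models' modeling of discourse phenomena.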
    Abstract Modeling discourse -- the linguistic phenomena that go beyond individual sentences -- is a fundamental yet challenging aspect of natural language processing (NLP). However, existing evaluation benchmarks primarily focus on the evaluation of inter-sentence properties and overlook critical discourse phenomena that cross sentences. To bridge the gap, we propose Disco-Bench, a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks, covering understanding, translation, and generation. Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena (e.g. cohesion and coherence) in Chinese and/or English. For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge. In total, we evaluate 20 general-, in-domain and commercial models based on Transformer, advanced pretraining architectures and large language models (LLMs). Our results show (1) the challenge and necessity of our evaluation benchmark; (2) fine-grained pretraining based on literary document-level training data consistently improves the modeling of discourse information. We will release the datasets, pretrained models, and leaderboard, which we hope can significantly facilitate research in this field: https://github.com/longyuewangdcu/Disco-Bench.
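    Summary Modeling discourse, the linguistic phenomena that go beyond individual sentences, is a fundamental yet challenging aspect of natural language processing (NLP). However, existing evaluation benchmarks mainly assess inter-sentence properties and overlook important discourse phenomena. To bridge this gap, we propose Disco-Bench, a benchmark that evaluates sentence-level discourse properties across multiple NLP tasks, covering understanding, translation, and generation. Disco-Bench consists of 9 document-level test sets containing rich discourse phenomena (e.g. cohesion and coherence) in Chinese and English. For linguistic analysis, we also design a diagnostic test suite that examines whether the target models learn discourse knowledge. In total, we evaluate 20 general, in-domain, and commercial models based on Transformer, advanced pretraining architectures, and large language models (LLMs). Our results show (1) the challenge and necessity of our evaluation benchmark, and (2) that fine-grained pretraining on literary document-level training data consistently improves the modeling of discourse information. We will release the datasets, pretrained models, and leaderboard, which we hope will help researchers in this field: https://github.com/longyuewangdcu/Disco-Bench.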

Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study

  • paper_url: http://arxiv.org/abs/2307.08072
  • repo_url: https://github.com/rucaibox/quantizedempirical
  • paper_authors: Peiyu Liu, Zikang Liu, Ze-Feng Gao, Dawei Gao, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen
  • for: investigate the impact of quantization on emergent abilities of large language models
  • methods: empirical experiments and fine-grained impact analysis
  • results: 4-bit quantization models still maintain emergent abilities, while 2-bit models show severe performance degradation; fine-tuning can improve performance
    Abstract Despite their superior performance, Large Language Models (LLMs) require significant computational resources for deployment and use. To overcome this issue, quantization methods have been widely applied to reduce the memory footprint of LLMs and to increase the inference rate. However, a major challenge is that low-bit quantization methods often lead to performance degradation. It is important to understand how quantization impacts the capacity of LLMs. Unlike previous studies that focused on overall performance, this work aims to investigate the impact of quantization on emergent abilities, which are important characteristics that distinguish LLMs from small language models. Specifically, we examine the abilities of in-context learning, chain-of-thought reasoning, and instruction-following in quantized LLMs. Our empirical experiments show that these emergent abilities still exist in 4-bit quantization models, while 2-bit models encounter severe performance degradation on the test of these abilities. To improve the performance of low-bit models, we conduct two special experiments: (1) fine-grained impact analysis that studies which components (or substructures) are more sensitive to quantization, and (2) performance compensation through model fine-tuning. Our work derives a series of important findings to understand the impact of quantization on emergent abilities, and sheds light on the possibilities of extremely low-bit quantization for LLMs.
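    Summary Although large language models (LLMs) perform very well, they require substantial computational resources for deployment and use. To address this, quantization methods are widely applied to LLMs to reduce their memory footprint and increase inference speed. However, low-bit quantization often leads to performance degradation, so it is important to understand how quantization affects the capacity of LLMs. Unlike previous studies that focused on overall performance, this work investigates the impact of quantization on emergent abilities, which are characteristic of language models, such as in-context learning, chain-of-thought reasoning, and instruction following. Our experiments show that 4-bit quantized models still retain these abilities, while 2-bit models suffer severe performance degradation when tested on them. To improve the performance of low-bit models, we conduct two special experiments: (1) a fine-grained impact analysis that studies how quantization affects different components (or substructures), and (2) performance compensation through model fine-tuning. Our work yields a series of important findings for understanding the impact of quantization on emergent abilities and sheds light on the possibility of extremely low-bit quantization for LLMs.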
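    Example (illustrative sketch, not the quantization method used in the paper). A plain symmetric round-to-nearest quantizer gives a feel for why 2-bit weights lose far more information than 4-bit weights: with 2 bits only the values -scale, 0, and +scale remain. The weight values and function names are made up for this sketch, and the abstract does not say which quantization algorithm the authors apply.

```python
def quantize(weights, bits):
    """Symmetric round-to-nearest quantization; returns dequantized values."""
    levels = 2 ** (bits - 1) - 1  # 7 for 4-bit, 1 for 2-bit
    scale = max(abs(w) for w in weights) / levels
    if scale == 0:
        return list(weights)  # all-zero input needs no quantization
    return [max(-levels, min(levels, round(w / scale))) * scale for w in weights]


def mean_abs_error(weights, bits):
    """Average reconstruction error introduced by quantization."""
    dequantized = quantize(weights, bits)
    return sum(abs(w - d) for w, d in zip(weights, dequantized)) / len(weights)


weights = [-0.82, -0.31, -0.05, 0.02, 0.17, 0.44, 0.90]
for bits in (8, 4, 2):
    print(bits, mean_abs_error(weights, bits))
```

    The reconstruction error grows as the bit width shrinks; how that error translates into lost abilities is the empirical question the paper studies.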

Towards Flexible Time-to-event Modeling: Optimizing Neural Networks via Rank Regression

  • paper_url: http://arxiv.org/abs/2307.08044
  • repo_url: https://github.com/teboozas/dart_ecai23
  • paper_authors: Hyunjun Lee, Junhyun Lee, Taehwa Choi, Jaewoo Kang, Sangbum Choi
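  • for: This study develops a deep-learning-based time-to-event prediction model that improves predictive performance and relaxes assumptions.
  • methods: The model uses an objective function based on Gehan's rank statistic and does not require specifying a baseline event time distribution, so it retains the advantage of directly predicting event time.
  • results: Quantitative analysis on various benchmark datasets shows that the model has strong potential for handling high-throughput censored time-to-event data.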
    Abstract Time-to-event analysis, also known as survival analysis, aims to predict the time of occurrence of an event, given a set of features. One of the major challenges in this area is dealing with censored data, which can make learning algorithms more complex. Traditional methods such as Cox's proportional hazards model and the accelerated failure time (AFT) model have been popular in this field, but they often require assumptions such as proportional hazards and linearity. In particular, the AFT models often require pre-specified parametric distributional assumptions. To improve predictive performance and alleviate strict assumptions, there have been many deep learning approaches for hazard-based models in recent years. However, representation learning for AFT has not been widely explored in the neural network literature, despite its simplicity and interpretability in comparison to hazard-focused methods. In this work, we introduce the Deep AFT Rank-regression model for Time-to-event prediction (DART). This model uses an objective function based on Gehan's rank statistic, which is efficient and reliable for representation learning. On top of eliminating the requirement to establish a baseline event time distribution, DART retains the advantages of directly predicting event time in standard AFT models. The proposed method is a semiparametric approach to AFT modeling that does not impose any distributional assumptions on the survival time distribution. This also eliminates the need for additional hyperparameters or complex model architectures, unlike existing neural network-based AFT models. Through quantitative analysis on various benchmark datasets, we have shown that DART has significant potential for modeling high-throughput censored time-to-event data.
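    Summary Time-to-event analysis, also known as survival analysis, aims to predict the time at which an event occurs, given a set of features. A major challenge in this area is handling censored data, which can make learning algorithms more complex. Traditional methods such as Cox's proportional hazards model and the accelerated failure time (AFT) model are popular in this field, but they often assume proportional hazards and linearity. In particular, AFT models often require pre-specified parametric distributional assumptions. To improve predictive performance and relax strict assumptions, many deep learning approaches for hazard-based models have appeared in recent years. However, representation learning for AFT has not been widely explored in the neural network literature, despite its simplicity and interpretability compared with hazard-focused methods. In this work, we introduce the Deep AFT Rank-regression model for Time-to-event prediction (DART). The model uses an objective function based on Gehan's rank statistic, which is efficient and reliable for representation learning. While eliminating the requirement to establish a baseline event time distribution, DART retains the advantage of standard AFT models of directly predicting event time. The proposed method is a semiparametric approach to AFT modeling that imposes no distributional assumptions on the survival time distribution. It therefore needs no additional hyperparameters or complex model architectures, unlike existing neural network-based AFT models. Through quantitative analysis on various benchmark datasets, we show that DART has significant potential for modeling high-throughput censored time-to-event data.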
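    Example (illustrative sketch, assuming the standard form of Gehan's rank loss). For AFT residuals e_i = log(T_i) - f(x_i), a commonly used Gehan-type objective sums max(0, e_j - e_i) over all pairs whose first element i is an observed (uncensored) event. The 1/n^2 normalization, the function name, and the toy data are assumptions for this sketch; in DART, f would be a neural network rather than a fixed list of predictions.

```python
def gehan_loss(log_times, events, preds):
    """Gehan rank loss on AFT residuals e_i = log(T_i) - f(x_i).

    events[i] is 1 for an observed event and 0 for a censored time.
    Only pairs whose first element is uncensored contribute.
    """
    residuals = [t - p for t, p in zip(log_times, preds)]
    n = len(residuals)
    total = 0.0
    for i in range(n):
        if events[i] == 0:
            continue  # censored subjects never act as the first element
        for j in range(n):
            # Negative part of (e_i - e_j), i.e. max(0, e_j - e_i)
            total += max(0.0, residuals[j] - residuals[i])
    return total / (n * n)


log_times = [1.0, 2.0, 3.0]
events = [1, 0, 1]
print(gehan_loss(log_times, events, preds=[1.0, 2.0, 3.0]))
print(gehan_loss(log_times, events, preds=[0.0, 0.0, 0.0]))
```

    The loss depends only on differences between residuals, so adding a constant to every prediction leaves it unchanged; no baseline distribution enters the objective.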

A Neural-Symbolic Approach Towards Identifying Grammatically Correct Sentences

  • paper_url: http://arxiv.org/abs/2307.08036
  • repo_url: None
  • paper_authors: Nicos Isaak
  • for: validate English sentences
  • methods: neural-symbolic approach, blending of grammatical and syntactical rules with language models
  • results: effective tackling of Corpus of Linguistic Acceptability (COLA) task, improvement of symbolic systems’ accuracy results through blending with non-symbolic systems
    Abstract Textual content around us is growing on a daily basis. Numerous articles are being written as we speak on online newspapers, blogs, or social media. Similarly, recent advances in the AI field, like language models or traditional classic AI approaches, are utilizing all the above to improve their learned representation to tackle NLP challenges with human-like accuracy. It is commonly accepted that it is crucial to have access to well-written text from valid sources to tackle challenges like text summarization, question-answering, machine translation, or even pronoun resolution. For instance, to summarize well, one needs to select the most important sentences in order to concatenate them to form the summary. However, what happens if we do not have access to well-formed English sentences or even non-valid sentences? Despite the importance of having access to well-written sentences, figuring out ways to validate them is still an open area of research. To address this problem, we present a simplified way to validate English sentences through a novel neural-symbolic approach. Lately, neural-symbolic approaches have triggered an increasing interest towards tackling various NLP challenges, as they are demonstrating their effectiveness as a central component in various AI systems. Through combining Classic with Modern AI, which involves the blending of grammatical and syntactical rules with language models, we effectively tackle the Corpus of Linguistic Acceptability (COLA), a task that shows whether or not a sequence of words is an English grammatical sentence. Among others, undertaken experiments effectively show that blending symbolic and non-symbolic systems helps the former provide insights about the latter's accuracy results.
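    Summary The text around us grows every day. Numerous articles are written for online newspapers, blogs, or social media, and recent AI techniques, such as language models or traditional classic AI approaches, use this text to improve their learned representations. It is commonly accepted that tackling text processing challenges requires access to valid text. For instance, to summarize a text, one needs to select the most important sentences and concatenate them to form the summary. But what happens if we do not have access to well-formed English sentences, or only to non-valid ones? Despite the importance of access to valid text, validating sentences remains an open area of research. To address this problem, we present a simplified way to validate English sentences through a novel neural-symbolic approach. In recent years, neural-symbolic approaches have attracted increasing interest for tackling various text processing challenges, as they have proven effective in various AI systems. By combining traditional grammatical and syntactical rules with language models, we effectively tackle the COLA task, which determines whether or not a sequence is a grammatical English sentence. Among other findings, our experiments show that blending symbolic and non-symbolic systems helps the symbolic systems provide insights into the accuracy results of the non-symbolic systems.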

Bayesian inference for data-efficient, explainable, and safe robotic motion planning: A review

  • paper_url: http://arxiv.org/abs/2307.08024
  • repo_url: None
  • paper_authors: Chengmin Zhou, Chao Wang, Haseeb Hassan, Himat Shah, Bingding Huang, Pasi Fränti
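  • for: This paper reviews the application of Bayesian inference in robotic motion planning, covering uncertainty quantification, safety and optimality guarantees, data efficiency, the sim2real gap, and the hybridization of Bayesian inference and learning.
  • methods: The paper covers several Bayesian inference methods, including Bayesian estimation, model-based and model-free Bayesian RL, inverse RL, and the hybridization of Bayesian inference and RL.
  • results: The paper presents the applications and research progress of Bayesian inference in robotic motion planning, with analysis and assessment of Bayesian inference for complex cases, data efficiency, the sim2real gap, and the hybridization of Bayesian inference and learning.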
    Abstract Bayesian inference has many advantages in robotic motion planning over four perspectives: The uncertainty quantification of the policy, safety (risk-aware) and optimum guarantees of robot motions, data-efficiency in training of reinforcement learning, and reducing the sim2real gap when the robot is applied to real-world tasks. However, the application of Bayesian inference in robotic motion planning is lagging behind the comprehensive theory of Bayesian inference. Further, there are no comprehensive reviews to summarize the progress of Bayesian inference to give researchers a systematic understanding in robotic motion planning. This paper first provides the probabilistic theories of Bayesian inference which are the preliminary of Bayesian inference for complex cases. Second, the Bayesian estimation is given to estimate the posterior of policies or unknown functions which are used to compute the policy. Third, the classical model-based Bayesian RL and model-free Bayesian RL algorithms for robotic motion planning are summarized, while these algorithms in complex cases are also analyzed. Fourth, the analysis of Bayesian inference in inverse RL is given to infer the reward functions in a data-efficient manner. Fifth, we systematically present the hybridization of Bayesian inference and RL which is a promising direction to improve the convergence of RL for better motion planning. Sixth, given the Bayesian inference, we present the interpretable and safe robotic motion plannings which are the hot research topic recently. Finally, all algorithms reviewed in this paper are summarized analytically as the knowledge graphs, and the future of Bayesian inference for robotic motion planning is also discussed, to pave the way for data-efficient, explainable, and safe robotic motion planning strategies for practical applications.
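    Summary Bayesian inference has many advantages in robotic motion planning, including uncertainty quantification of the policy, safety (risk-aware) and optimality guarantees for robot motions, data efficiency in reinforcement learning training, and a reduced sim2real gap in real-world tasks. However, the application of Bayesian inference in robotic motion planning lags behind its comprehensive theory. Moreover, no comprehensive review summarizes the progress of Bayesian inference in robotic motion planning to give researchers a systematic understanding. This paper first provides the probabilistic theories of Bayesian inference, which are the preliminaries for complex cases. Second, Bayesian estimation is presented for estimating the posterior of policies or of unknown functions used to compute the policy. Third, the paper summarizes model-based and model-free Bayesian RL algorithms and analyzes these algorithms in complex cases. Fourth, it analyzes Bayesian inference in inverse RL, which infers reward functions in a data-efficient manner. Fifth, it systematically presents the hybridization of Bayesian inference and RL, a promising direction for improving the convergence of RL. Finally, the paper summarizes all the reviewed algorithms and discusses the future of Bayesian inference for robotic motion planning, to pave the way for data-efficient, explainable, and safe robotic motion planning strategies.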

Breaking Down the Task: A Unit-Grained Hybrid Training Framework for Vision and Language Decision Making

  • paper_url: http://arxiv.org/abs/2307.08016
  • repo_url: None
  • paper_authors: Ruipu Luo, Jiwen Zhang, Zhongyu Wei
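  • for: This study addresses the long action sequences in vision-language decision making by proposing a unit-grained training framework that enables active exploration within the environment and reduces exposure bias.
  • methods: The study proposes a Unit-Transformer (UT) with an intrinsic recurrent state that maintains a unit-scale cross-modal memory.
  • results: In extensive experiments on the TEACH benchmark, the proposed method outperforms existing state-of-the-art methods on the evaluation metrics.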
    Abstract Vision language decision making (VLDM) is a challenging multimodal task. The agent has to understand complex human instructions and complete compositional tasks involving environment navigation and object manipulation. However, the long action sequences involved in VLDM make the task difficult to learn. From an environment perspective, we find that task episodes can be divided into fine-grained units, each containing a navigation phase and an interaction phase. Since the environment within a unit stays unchanged, we propose a novel hybrid-training framework that enables active exploration in the environment and reduces the exposure bias. Such a framework leverages the unit-grained configurations and is model-agnostic. Specifically, we design a Unit-Transformer (UT) with an intrinsic recurrent state that maintains a unit-scale cross-modal memory. Through extensive experiments on the TEACH benchmark, we demonstrate that our proposed framework outperforms existing state-of-the-art methods in terms of all evaluation metrics. Overall, our work introduces a novel approach to tackling the VLDM task by breaking it down into smaller, manageable units and utilizing a hybrid-training framework. By doing so, we provide a more flexible and effective solution for multimodal decision making.
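    Summary Vision language decision making (VLDM) is a challenging multimodal task. The agent has to understand complex human instructions and complete compositional tasks involving environment navigation and object manipulation. However, the long action sequences make the task difficult to learn. From an environment perspective, we find that task episodes can be divided into fine-grained units, each containing a navigation phase and an interaction phase. Since the environment within a unit stays unchanged, we propose a novel hybrid-training framework that enables active exploration in the environment and reduces exposure bias. The framework leverages unit-grained configurations and is model-agnostic. We design a Unit-Transformer (UT) with an intrinsic recurrent state that maintains a unit-scale cross-modal memory. Through extensive experiments on the TEACH benchmark, we demonstrate that the proposed framework outperforms existing state-of-the-art methods on all evaluation metrics. Overall, our work introduces a new approach to the VLDM task that breaks it down into smaller, manageable units and uses a hybrid-training framework, providing a more flexible and effective solution for multimodal decision making.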

SHAMSUL: Simultaneous Heatmap-Analysis to investigate Medical Significance Utilizing Local interpretability methods

  • paper_url: http://arxiv.org/abs/2307.08003
  • repo_url: https://github.com/anondo1969/shamsul
  • paper_authors: Mahbub Ul Alam, Jaakko Hollmén, Jón Rúnar Baldvinsson, Rahim Rahmani
  • for: The paper aims to evaluate the performance of four well-established interpretability methods (LIME, SHAP, Grad-CAM, and LRP) in interpreting deep neural network predictions for chest radiography images.
  • methods: The study uses a transfer learning approach with a multi-label-multi-class chest radiography dataset, and evaluates the interpretability methods on both single-label and multi-label predictions. The analysis includes quantitative and qualitative investigations, and compares the results against human expert annotation.
  • results: The study finds that Grad-CAM demonstrates the most favorable performance in quantitative evaluation, while the LIME heatmap segmentation visualization exhibits the highest level of medical significance. The research highlights the strengths and limitations of these interpretability methods and suggests that a multimodal-based approach could offer additional insights for enhancing interpretability in the medical domain.
    Abstract The interpretability of deep neural networks has become a subject of great interest within the medical and healthcare domain. This attention stems from concerns regarding transparency, legal and ethical considerations, and the medical significance of predictions generated by these deep neural networks in clinical decision support systems. To address this matter, our study delves into the application of four well-established interpretability methods: Local Interpretable Model-agnostic Explanations (LIME), Shapley Additive exPlanations (SHAP), Gradient-weighted Class Activation Mapping (Grad-CAM), and Layer-wise Relevance Propagation (LRP). Leveraging the approach of transfer learning with a multi-label-multi-class chest radiography dataset, we aim to interpret predictions pertaining to specific pathology classes. Our analysis encompasses both single-label and multi-label predictions, providing a comprehensive and unbiased assessment through quantitative and qualitative investigations, which are compared against human expert annotation. Notably, Grad-CAM demonstrates the most favorable performance in quantitative evaluation, while the LIME heatmap segmentation visualization exhibits the highest level of medical significance. Our research highlights the strengths and limitations of these interpretability methods and suggests that a multimodal-based approach, incorporating diverse sources of information beyond chest radiography images, could offer additional insights for enhancing interpretability in the medical domain.
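    Summary The interpretability of deep neural networks has attracted wide attention in the medical domain. This attention stems from concerns about transparency, legal and ethical considerations, and the medical significance of deep neural network predictions in clinical decision support systems. To address this, our study examines four well-established interpretability methods: Local Interpretable Model-agnostic Explanations (LIME), Shapley Additive exPlanations (SHAP), Gradient-weighted Class Activation Mapping (Grad-CAM), and Layer-wise Relevance Propagation (LRP). Using transfer learning with a multi-label-multi-class chest radiography dataset, we interpret predictions for specific pathology classes. Our analysis covers both single-label and multi-label predictions, providing a comprehensive and unbiased assessment that is compared against human expert annotation. Grad-CAM performs best in quantitative evaluation, while the LIME heatmap segmentation visualization shows the highest level of medical significance. Our research examines the strengths and limitations of these interpretability methods and suggests that a multimodal-based approach could offer additional information for enhancing interpretability in the medical domain.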

MargCTGAN: A “Marginally” Better CTGAN for the Low Sample Regime

  • paper_url: http://arxiv.org/abs/2307.07997
  • repo_url: https://github.com/tejuafonja/margctgan
  • paper_authors: Tejumade Afonja, Dingfan Chen, Mario Fritz
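  • for: Evaluate state-of-the-art synthetic tabular data generators, especially in the low-sample regime.
  • methods: The CTGAN model is extended with feature matching to improve the statistical properties and downstream task utility of the synthetic data.
  • results: The proposed MargCTGAN model consistently maintains downstream task utility and statistical properties across high to low sample regimes.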
    Abstract The potential of realistic and useful synthetic data is significant. However, current evaluation methods for synthetic tabular data generation predominantly focus on downstream task usefulness, often neglecting the importance of statistical properties. This oversight becomes particularly prominent in low sample scenarios, accompanied by a swift deterioration of these statistical measures. In this paper, we address this issue by conducting an evaluation of three state-of-the-art synthetic tabular data generators based on their marginal distribution, column-pair correlation, joint distribution and downstream task utility performance across high to low sample regimes. The popular CTGAN model shows strong utility, but underperforms in low sample settings in terms of utility. To overcome this limitation, we propose MargCTGAN that adds feature matching of de-correlated marginals, which results in a consistent improvement in downstream utility as well as statistical properties of the synthetic data.
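    Summary Realistic and useful synthetic data has significant potential. However, current evaluation methods tend to focus on downstream task usefulness and often neglect the importance of statistical properties. This problem is particularly prominent in low-sample scenarios, where these statistical measures deteriorate quickly. In this paper, we address the issue by evaluating three state-of-the-art synthetic data generators on their marginal distribution, column-pair correlation, joint distribution, and downstream task utility across high to low sample regimes. The CTGAN model shows strong downstream utility but underperforms in low-sample settings. To overcome this limitation, we propose MargCTGAN, which adds feature matching of de-correlated marginals and yields a consistent improvement in both downstream utility and statistical properties.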

CoNAN: Conditional Neural Aggregation Network For Unconstrained Face Feature Fusion

  • paper_url: http://arxiv.org/abs/2307.10237
  • repo_url: None
  • paper_authors: Bhavin Jawade, Deen Dayal Mohan, Dennis Fedorishin, Srirangaraj Setlur, Venu Govindaraju
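  • for: This paper addresses long-range face recognition under uncontrolled conditions such as distance, resolution, viewpoint, illumination, pose, and atmospheric conditions.
  • methods: The paper proposes CoNAN, a feature aggregation method based on distribution conditioning, to improve long-range face recognition accuracy. The method learns a context vector conditioned on the distribution information of the feature set and weighs the features by their estimated informativeness.
  • results: The method achieves state-of-the-art results on long-range unconstrained face recognition datasets such as BTS and DroneSURF, demonstrating the advantages of this aggregation strategy.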
    Abstract Face recognition from image sets acquired under unregulated and uncontrolled settings, such as at large distances, low resolutions, varying viewpoints, illumination, pose, and atmospheric conditions, is challenging. Face feature aggregation, which involves aggregating a set of N feature representations present in a template into a single global representation, plays a pivotal role in such recognition systems. Existing works in traditional face feature aggregation either utilize metadata or high-dimensional intermediate feature representations to estimate feature quality for aggregation. However, generating high-quality metadata or style information is not feasible for extremely low-resolution faces captured in long-range and high altitude settings. To overcome these limitations, we propose a feature distribution conditioning approach called CoNAN for template aggregation. Specifically, our method aims to learn a context vector conditioned over the distribution information of the incoming feature set, which is utilized to weigh the features based on their estimated informativeness. The proposed method produces state-of-the-art results on long-range unconstrained face recognition datasets such as BTS, and DroneSURF, validating the advantages of such an aggregation strategy.
    Summary Face recognition from image sets acquired under unregulated and uncontrolled settings, such as large distances, low resolutions, varying viewpoints, illumination, pose, and atmospheric conditions, is challenging. Face feature aggregation, which combines the multiple feature representations in a template into a single global representation, is a key step in such recognition systems. Existing face feature aggregation methods usually rely on metadata or high-dimensional intermediate feature representations to estimate feature quality, but generating high-quality metadata or style information is not feasible for extremely low-resolution faces. To overcome these limitations, we propose a feature distribution conditioning approach called CoNAN. The method learns a context vector conditioned on the distribution information of the incoming feature set and uses it to weigh the features by their estimated informativeness. It produces state-of-the-art results on long-range unconstrained face recognition datasets such as BTS and DroneSURF, validating the advantages of this aggregation strategy.
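    Example (illustrative sketch, not the CoNAN architecture). The general shape of distribution-conditioned aggregation can be shown with a hand-built context vector: summarize the feature set (here by its per-dimension mean), score each feature against that context, turn the scores into softmax weights, and take the weighted sum. In the paper the context vector is learned; the mean-based context, the dot-product score, the temperature parameter, and the toy features are assumptions of this sketch.

```python
import math


def aggregate(features, temperature=1.0):
    """Pool a set of feature vectors with weights conditioned on the set."""
    n, d = len(features), len(features[0])
    # Stand-in context: per-dimension mean of the incoming feature set
    context = [sum(f[k] for f in features) / n for k in range(d)]
    # Score each feature by its dot product with the context
    scores = [sum(f[k] * context[k] for k in range(d)) / temperature
              for f in features]
    # Numerically stable softmax over the scores
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    pooled = [sum(w * f[k] for w, f in zip(weights, features))
              for k in range(d)]
    return pooled, weights


features = [[1.0, 0.0], [0.9, 0.1], [-0.2, 0.3]]
pooled, weights = aggregate(features)
print(pooled, weights)
```

    The third feature disagrees with the rest of the set and therefore receives the smallest weight, which is the intended effect of weighing by estimated informativeness.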

Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10236
  • repo_url: None
  • paper_authors: Yuheng Huang, Jiayang Song, Zhijie Wang, Huaming Chen, Lei Ma
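  • for: This paper studies the trustworthiness of large language models (LLMs), particularly in safety-, security-, and reliability-sensitive scenarios.
  • methods: The study evaluates twelve uncertainty estimation methods with four LLMs on four prominent natural language processing (NLP) tasks to characterize the prediction risks of LLMs.
  • results: Uncertainty estimation helps reveal uncertain or non-factual LLM predictions and, in code generation experiments, can potentially uncover buggy programs generated by LLMs.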
    Abstract The recent performance leap of Large Language Models (LLMs) opens up new opportunities across numerous industrial applications and domains. However, erroneous generations, such as false predictions, misinformation, and hallucination made by LLMs, have also raised severe concerns about the trustworthiness of LLMs, especially in safety-, security- and reliability-sensitive scenarios, potentially hindering real-world adoption. While uncertainty estimation has shown its potential for interpreting the prediction risks made by general machine learning (ML) models, little is known about whether and to what extent it can help explore an LLM's capabilities and counteract its undesired behavior. To bridge the gap, in this paper, we initiate an exploratory study on the risk assessment of LLMs from the lens of uncertainty. In particular, we experiment with twelve uncertainty estimation methods and four LLMs on four prominent natural language processing (NLP) tasks to investigate to what extent uncertainty estimation techniques could help characterize the prediction risks of LLMs. Our findings validate the effectiveness of uncertainty estimation for revealing LLMs' uncertain/non-factual predictions. In addition to general NLP tasks, we extensively conduct experiments with four LLMs for code generation on two datasets. We find that uncertainty estimation can potentially uncover buggy programs generated by LLMs. Insights from our study shed light on future design and development for reliable LLMs, facilitating further research toward enhancing the trustworthiness of LLMs.
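One widely used family of uncertainty estimates scores a generation by the entropy of the model's next-token distributions. The sketch below is illustrative only: the function names and the toy probabilities are assumptions, and it is not claimed to be one of the twelve methods evaluated in the paper.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_sequence_entropy(step_probs):
    """Average per-token entropy over the steps of a generated sequence."""
    if not step_probs:
        raise ValueError("need at least one generation step")
    return sum(token_entropy(p) for p in step_probs) / len(step_probs)


# Toy distributions over a 4-token vocabulary (hypothetical values).
confident = [[0.97, 0.01, 0.01, 0.01], [0.90, 0.05, 0.03, 0.02]]
uncertain = [[0.25, 0.25, 0.25, 0.25], [0.40, 0.30, 0.20, 0.10]]

# A flatter distribution yields a higher score, flagging a riskier generation.
print(mean_sequence_entropy(confident) < mean_sequence_entropy(uncertain))
```

A score like this could be thresholded to flag generations for review, although a suitable threshold would depend on the model and task.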

Automated Polynomial Filter Learning for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.07956
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Wendi Yu, Zhichao Hou, Xiaorui Liu
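  • for: This work explores the potential and limitations of polynomial graph filter learning and proposes a general automated framework for learning such filters more effectively.
  • methods: After a preliminary study of adaptive polynomial graph filter learning, which reveals a severe overfitting issue, the authors propose Auto-Polynomial, an automated polynomial graph filter learning framework that adapts to various complex graph signals.
  • results: Experiments and ablation studies show significant and consistent performance improvements on both homophilic and heterophilic graphs across multiple learning settings and labeling ratios.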
    Abstract Polynomial graph filters have been widely used as guiding principles in the design of Graph Neural Networks (GNNs). Recently, the adaptive learning of the polynomial graph filters has demonstrated promising performance for modeling graph signals on both homophilic and heterophilic graphs, owing to their flexibility and expressiveness. In this work, we conduct a novel preliminary study to explore the potential and limitations of polynomial graph filter learning approaches, revealing a severe overfitting issue. To improve the effectiveness of polynomial graph filters, we propose Auto-Polynomial, a novel and general automated polynomial graph filter learning framework that efficiently learns better filters capable of adapting to various complex graph signals. Comprehensive experiments and ablation studies demonstrate significant and consistent performance improvements on both homophilic and heterophilic graphs across multiple learning settings considering various labeling ratios, which unleashes the potential of polynomial filter learning.
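    Summary Polynomial graph filters have been widely used in the design of Graph Neural Networks (GNNs). Recently, adaptively learned polynomial graph filters have performed well at modeling graph signals on both homophilic and heterophilic graphs. In this work, we conduct a preliminary study of the potential and limitations of polynomial graph filter learning approaches and find a severe overfitting issue. To improve the effectiveness of polynomial graph filters, we propose Auto-Polynomial, a new and general automated polynomial graph filter learning framework that efficiently learns better filters capable of adapting to different complex graph signals. Extensive experiments and ablation studies show that Auto-Polynomial delivers significant and consistent performance improvements on both homophilic and heterophilic graphs across different learning settings.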
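For readers unfamiliar with the term, a polynomial graph filter applies a weighted sum of powers of a propagation matrix to a graph signal, y = Σ_k θ_k P^k x. The following is a minimal sketch with fixed coefficients; the names `polynomial_filter`, `prop`, and `coeffs` are assumptions for illustration, and the sketch does not reproduce Auto-Polynomial's procedure for learning the coefficients.

```python
def mat_vec(matrix, vec):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def polynomial_filter(prop, signal, coeffs):
    """Compute sum_k coeffs[k] * prop^k @ signal without forming matrix powers."""
    out = [0.0] * len(signal)
    power = list(signal)  # prop^0 @ signal
    for theta in coeffs:
        out = [o + theta * p for o, p in zip(out, power)]
        power = mat_vec(prop, power)  # advance to the next power of prop
    return out


# Two connected nodes: the normalized adjacency equals the adjacency itself.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
signal = [1.0, 0.0]

# Equal weight on the signal and its one-hop propagation averages the two nodes.
print(polynomial_filter(adjacency, signal, [0.5, 0.5]))
```

In filter-learning approaches the entries of `coeffs` would be trainable parameters rather than constants.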

MinT: Boosting Generalization in Mathematical Reasoning via Multi-View Fine-Tuning

  • paper_url: http://arxiv.org/abs/2307.07951
  • repo_url: None
  • paper_authors: Zhenwen Liang, Dian Yu, Xiaoman Pan, Wenlin Yao, Qingkai Zeng, Xiangliang Zhang, Dong Yu
  • for: The paper aims to improve mathematical reasoning in relatively small language models (LMs) by introducing a multi-view fine-tuning method that leverages diverse annotation styles in existing mathematical problem datasets.
  • methods: The proposed method postpends distinct instructions to input questions and trains the model to generate solutions in diverse formats in a flexible manner, treating the various annotation formats as different “views”.
  • results: The approach outperforms prior knowledge distillation-based methods and carefully established baselines, with the model demonstrating promising generalization ability across various views and datasets, as well as the capability to learn from inaccurate or incomplete noisy data.
    Abstract Reasoning in mathematical domains remains a significant challenge for relatively small language models (LMs). Many current methods focus on specializing LMs in mathematical reasoning and rely heavily on knowledge distillation from powerful but inefficient large LMs (LLMs). In this work, we explore a new direction that avoids over-reliance on LLM teachers, introducing a multi-view fine-tuning method that efficiently exploits existing mathematical problem datasets with diverse annotation styles. Our approach uniquely considers the various annotation formats as different "views" and leverages them in training the model. By postpending distinct instructions to input questions, models can learn to generate solutions in diverse formats in a flexible manner. Experimental results show that our strategy enables a LLaMA-7B model to outperform prior approaches that utilize knowledge distillation, as well as carefully established baselines. Additionally, the proposed method grants the models promising generalization ability across various views and datasets, and the capability to learn from inaccurate or incomplete noisy data. We hope our multi-view training paradigm could inspire future studies in other machine reasoning domains.
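    Summary Reasoning in mathematical domains remains a major challenge for small language models (LMs). Many current methods specialize LMs in mathematical reasoning and rely on knowledge distillation from powerful but inefficient large language models (LLMs). In this work, we explore a new direction that avoids over-reliance on LLM teachers, introducing a multi-view fine-tuning method that efficiently exploits the diverse annotation styles in existing mathematical problem datasets. Our approach treats the different annotation formats as different "views" and leverages them when training the model. By appending specific instructions to the input questions, the model learns to generate solutions in multiple formats in a flexible manner. Experimental results show that our strategy enables a LLaMA-7B model to outperform prior knowledge-distillation-based methods as well as carefully established baselines. The method also gives the model promising generalization ability across views and datasets, and the capability to learn from inaccurate or incomplete noisy data. We hope our multi-view training paradigm can inspire future research in other machine reasoning domains.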
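The idea of postpending a view-specific instruction to each question can be sketched as follows. The view names and instruction wording are hypothetical, since the entry does not give the paper's actual prompts, and the helper names are assumptions.

```python
# Hypothetical view instructions; the paper's exact wording is not given here.
VIEW_INSTRUCTIONS = {
    "chain_of_thought": "Solve the problem step by step.",
    "equation": "Write a single equation that solves the problem.",
}


def postpend_instruction(question, view):
    """Append the instruction for the requested view to the end of the question."""
    return f"{question.strip()} {VIEW_INSTRUCTIONS[view]}"


def build_training_examples(question, solutions_by_view):
    """Create one (prompt, target) pair per available annotation view."""
    return [
        (postpend_instruction(question, view), solution)
        for view, solution in solutions_by_view.items()
    ]


examples = build_training_examples(
    "Tom has 2 apples and buys 3 more. How many apples does he have?",
    {
        "chain_of_thought": "Tom starts with 2 apples and adds 3, so 2 + 3 = 5.",
        "equation": "x = 2 + 3",
    },
)
for prompt, target in examples:
    print(prompt, "->", target)
```

At inference time, the same question can be paired with whichever instruction selects the desired output format.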

SentimentGPT: Exploiting GPT for Advanced Sentiment Analysis and its Departure from Current Machine Learning

  • paper_url: http://arxiv.org/abs/2307.10234
  • repo_url: None
  • paper_authors: Kiana Kheiri, Hamid Karimi
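  • for: This study systematically evaluates several Generative Pretrained Transformer (GPT) approaches to sentiment analysis, specifically on Task 4 of the SemEval 2017 dataset.
  • methods: Three main strategies are used: 1) prompt engineering with GPT-3.5 Turbo, 2) fine-tuning GPT models, and 3) an embedding classification approach.
  • results: The GPT approaches clearly outperform the previous state of the art in predictive performance, by more than 22% in F1-score, and handle challenges such as understanding context and detecting sarcasm more effectively.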
    Abstract This study presents a thorough examination of various Generative Pretrained Transformer (GPT) methodologies in sentiment analysis, specifically in the context of Task 4 on the SemEval 2017 dataset. Three primary strategies are employed: 1) prompt engineering using the advanced GPT-3.5 Turbo, 2) fine-tuning GPT models, and 3) an inventive approach to embedding classification. The research yields detailed comparative insights among these strategies and individual GPT models, revealing their unique strengths and potential limitations. Additionally, the study compares these GPT-based methodologies with other current, high-performing models previously used with the same dataset. The results illustrate the significant superiority of the GPT approaches in predictive performance, with an improvement of more than 22% in F1-score over the state of the art. Further, the paper sheds light on common challenges in sentiment analysis tasks, such as understanding context and detecting sarcasm. It underscores the enhanced capabilities of the GPT models to effectively handle these complexities. Taken together, these findings highlight the promising potential of GPT models in sentiment analysis, setting the stage for future research in this field. The code can be found at https://github.com/DSAatUSU/SentimentGPT

Revisiting Domain-Adaptive 3D Object Detection by Reliable, Diverse and Class-balanced Pseudo-Labeling

  • paper_url: http://arxiv.org/abs/2307.07944
  • repo_url: https://github.com/zhuoxiao-chen/redb-da-3ddet
  • paper_authors: Zhuoxiao Chen, Yadan Luo, Zheng Wang, Mahsa Baktashmotlagh, Zi Huang
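  • for: This paper proposes a pseudo-labeling-based unsupervised domain adaptation (DA) method that addresses the performance drop of existing DA methods in multi-class training settings.
  • methods: The proposed ReDB framework learns to detect all classes at once, using reliable, diverse, and class-balanced pseudo 3D boxes to guide self-training. A cross-domain examination (CDE) assesses the correctness of pseudo labels to counter disruptions caused by environmental discrepancy, and an overlapped boxes counting (OBC) metric reduces computational overhead and mitigates object shift.
  • results: Experiments on three benchmark datasets with a voxel-based detector (SECOND) and a point-based 3D detector (PointRCNN) show that ReDB outperforms existing 3D DA methods, improving mAP by 23.15% on the nuScenes $\rightarrow$ KITTI task.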
    Abstract Unsupervised domain adaptation (DA) with the aid of pseudo labeling techniques has emerged as a crucial approach for domain-adaptive 3D object detection. While effective, existing DA methods suffer from a substantial drop in performance when applied to a multi-class training setting, due to the co-existence of low-quality pseudo labels and class imbalance issues. In this paper, we address this challenge by proposing a novel ReDB framework tailored for learning to detect all classes at once. Our approach produces Reliable, Diverse, and class-Balanced pseudo 3D boxes to iteratively guide the self-training on a distributionally different target domain. To alleviate disruptions caused by the environmental discrepancy (e.g., beam numbers), the proposed cross-domain examination (CDE) assesses the correctness of pseudo labels by copy-pasting target instances into a source environment and measuring the prediction consistency. To reduce computational overhead and mitigate the object shift (e.g., scales and point densities), we design an overlapped boxes counting (OBC) metric that allows to uniformly downsample pseudo-labeled objects across different geometric characteristics. To confront the issue of inter-class imbalance, we progressively augment the target point clouds with a class-balanced set of pseudo-labeled target instances and source objects, which boosts recognition accuracies on both frequently appearing and rare classes. Experimental results on three benchmark datasets using both voxel-based (i.e., SECOND) and point-based 3D detectors (i.e., PointRCNN) demonstrate that our proposed ReDB approach outperforms existing 3D domain adaptation methods by a large margin, improving 23.15% mAP on the nuScenes $\rightarrow$ KITTI task. The code is available at https://github.com/zhuoxiao-chen/ReDB-DA-3Ddet.
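    Summary Unsupervised domain adaptation (DA) with pseudo-labeling techniques has become an important approach for domain-adaptive 3D object detection. Although effective, existing DA methods perform poorly in multi-class training settings because of low-quality pseudo labels and class imbalance. In this paper, we address this challenge with a novel ReDB framework for learning to detect all classes at once. Our method generates reliable, diverse, and class-balanced pseudo 3D boxes to iteratively guide self-training on a distributionally different target domain. To counter disruptions caused by environmental discrepancy (e.g., beam numbers), we propose cross-domain examination (CDE), which copies target instances into a source environment and measures prediction consistency. To reduce computational overhead and mitigate object shift (e.g., scales and point densities), we design an overlapped boxes counting (OBC) metric that allows pseudo-labeled objects to be downsampled uniformly. To address class imbalance, we progressively augment the target point clouds with a class-balanced set of pseudo-labeled target instances and source objects, which improves recognition accuracy on both frequent and rare classes. Experiments on three benchmark datasets with voxel-based and point-based 3D detectors show that ReDB outperforms existing 3D domain adaptation methods, improving mAP by 23.15%. The code is available at the repository listed above.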

KECOR: Kernel Coding Rate Maximization for Active 3D Object Detection

  • paper_url: http://arxiv.org/abs/2307.07942
  • repo_url: https://github.com/Luoyadan/KECOR-active-3Ddet
  • paper_authors: Yadan Luo, Zhuoxiao Chen, Zhen Fang, Zheng Zhang, Zi Huang, Mahsa Baktashmotlagh
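  • for: Improve the reliability of LiDAR-based object detectors for autonomous driving while reducing the amount of annotation required.
  • methods: A novel kernel coding rate maximization (KECOR) strategy uses information theory to select the most informative point clouds for labeling.
  • results: Compared with the state-of-the-art active learning method, box-level annotation costs fall by approximately 44% and computational time by 26%, without compromising detection performance.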
    Abstract Achieving a reliable LiDAR-based object detector in autonomous driving is paramount, but its success hinges on obtaining large amounts of precise 3D annotations. Active learning (AL) seeks to mitigate the annotation burden through algorithms that use fewer labels and can attain performance comparable to fully supervised learning. Although AL has shown promise, current approaches prioritize the selection of unlabeled point clouds with high uncertainty and/or diversity, leading to the selection of more instances for labeling and reduced computational efficiency. In this paper, we resort to a novel kernel coding rate maximization (KECOR) strategy which aims to identify the most informative point clouds to acquire labels through the lens of information theory. Greedy search is applied to seek desired point clouds that can maximize the minimal number of bits required to encode the latent features. To determine the uniqueness and informativeness of the selected samples from the model perspective, we construct a proxy network of the 3D detector head and compute the outer product of Jacobians from all proxy layers to form the empirical neural tangent kernel (NTK) matrix. To accommodate both one-stage (i.e., SECOND) and two-stage detectors (i.e., PVRCNN), we further incorporate classification entropy maximization and balance detection performance against the total number of bounding boxes selected for annotation. Extensive experiments conducted on two 3D benchmarks and a 2D detection dataset evidence the superiority and versatility of the proposed approach. Our results show that approximately 44% of box-level annotation costs and 26% of computational time are saved compared to the state-of-the-art AL method, without compromising detection performance.
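    Summary Achieving a reliable LiDAR-based object detector for autonomous driving is essential, but its success depends on obtaining large amounts of precise 3D annotations. Active learning (AL) can reduce the annotation burden, but current approaches typically select unlabeled point clouds with high uncertainty and/or diversity, which leads to more instances being selected for labeling and lowers computational efficiency. In this paper, we adopt a novel kernel coding rate maximization (KECOR) strategy that identifies the most informative point clouds through the lens of information theory. Greedy search is used to find the point clouds that maximize the minimal number of bits required to encode the latent features. To assess the uniqueness and informativeness of the selected samples, we construct a proxy network of the 3D detector head and compute the outer product of Jacobians from all proxy layers to form the empirical neural tangent kernel (NTK) matrix. To accommodate both one-stage (i.e., SECOND) and two-stage (i.e., PVRCNN) detectors, we further incorporate classification entropy maximization and a trade-off between detection performance and the total number of bounding boxes selected for annotation. Extensive experiments on two 3D benchmarks and a 2D detection dataset demonstrate the superiority and versatility of the approach. Compared with the state-of-the-art AL method, our approach reduces box-level annotation costs by approximately 44% and computational time by 26%, without compromising detection performance.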
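Greedy selection under a coding-rate objective can be sketched in a few lines. This sketch assumes the log-determinant form of the lossy coding rate, 0.5 · log det(I + αK), with a plain linear kernel K and a hypothetical scale `alpha`. The paper instead builds K from an empirical NTK, and its exact normalization is not reproduced here; all function names are illustrative.

```python
import math


def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = math.sqrt(s) if i == j else s / lower[j][j]
    return 2.0 * sum(math.log(lower[i][i]) for i in range(n))


def coding_rate(features, alpha=1.0):
    """0.5 * logdet(I + alpha * K), with K the linear-kernel Gram matrix."""
    n = len(features)
    gram = [[sum(a * b for a, b in zip(fi, fj)) for fj in features] for fi in features]
    reg = [[alpha * gram[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    return 0.5 * log_det_spd(reg)


def greedy_select(pool, budget, alpha=1.0):
    """Repeatedly add the sample that yields the largest coding rate."""
    chosen, remaining = [], list(range(len(pool)))
    for _ in range(min(budget, len(pool))):
        best = max(
            remaining,
            key=lambda i: coding_rate([pool[j] for j in chosen] + [pool[i]], alpha),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Samples 0 and 1 are duplicates; sample 2 points in a new direction.
pool = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(greedy_select(pool, budget=2))
```

On this toy pool the duplicate is skipped in favour of the orthogonal sample, which is the behaviour a coding-rate criterion is meant to encourage.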

GeoGPT: Understanding and Processing Geospatial Tasks through An Autonomous GPT

  • paper_url: http://arxiv.org/abs/2307.07930
  • repo_url: None
  • paper_authors: Yifan Zhang, Cheng Wei, Shangyou Wu, Zhengting He, Wenhao Yu
  • for: The paper explores a new framework called GeoGPT that integrates the semantic understanding ability of large language models (LLMs) with mature tools within the GIS community, aiming to lower the threshold for non-professional users to solve geospatial tasks.
  • methods: The paper utilizes Generative Pre-trained Transformer (e.g., ChatGPT) and AutoGPT to enable the framework to automatically reason and call externally defined tools, and the framework is validated through several cases including geospatial data crawling, spatial query, facility siting, and mapping.
  • results: The results show that GeoGPT can conduct geospatial data collection, processing, and analysis in an autonomous manner with the instruction of only natural language, and the framework is effective in solving various geospatial tasks. The paper also suggests that the “foundational plus professional” paradigm implied in GeoGPT provides an effective way to develop next-generation GIS in this era of large foundation models.
    Abstract Decision-makers in GIS need to combine a series of spatial algorithms and operations to solve geospatial tasks. For example, in the task of facility siting, the Buffer tool is usually first used to locate areas close to or away from some specific entities; then, the Intersect or Erase tool is used to select candidate areas that satisfy multiple requirements. Though professionals can easily understand and solve these geospatial tasks by sequentially utilizing relevant tools, it is difficult for non-professionals to handle these problems. Recently, Generative Pre-trained Transformers (e.g., ChatGPT) have shown strong performance in semantic understanding and reasoning. In particular, AutoGPT can further extend the capabilities of large language models (LLMs) by automatically reasoning and calling externally defined tools. Inspired by these studies, we attempt to lower the threshold for non-professional users to solve geospatial tasks by integrating the semantic understanding ability inherent in LLMs with mature tools within the GIS community. Specifically, we develop a new framework called GeoGPT that can conduct geospatial data collection, processing, and analysis in an autonomous manner with the instruction of only natural language. In other words, GeoGPT is used to understand the demands of non-professional users merely based on input natural language descriptions, and then think, plan, and execute defined GIS tools to output final effective results. Several cases including geospatial data crawling, spatial query, facility siting, and mapping validate the effectiveness of our framework. Though limited cases are presented in this paper, GeoGPT can be further extended to various tasks by equipping it with more GIS tools, and we think the paradigm of "foundational plus professional" implied in GeoGPT provides an effective way to develop next-generation GIS in this era of large foundation models.
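    Summary Decision-makers in geographic information systems (GIS) need to combine a series of spatial algorithms and operations to solve geospatial tasks. For example, in facility siting, the Buffer tool is first used to locate areas close to or away from specific entities, and then the Intersect or Erase tool is used to select candidate areas that satisfy multiple requirements. Professionals can easily understand and solve these tasks, but non-professionals find them difficult. Recently, Generative Pre-trained Transformers (e.g., ChatGPT) have shown strong semantic understanding and reasoning abilities. In particular, AutoGPT further extends the capabilities of large language models (LLMs) by automatically reasoning and calling externally defined tools. Inspired by these studies, we attempt to lower the threshold for non-professional users to solve geospatial tasks by integrating the semantic understanding ability of LLMs with mature tools from the GIS community. Specifically, we develop a new framework called GeoGPT that can conduct geospatial data collection, processing, and analysis autonomously from natural language input alone. Given only a natural language description from a non-professional user, GeoGPT understands the user's demands, then plans and executes defined GIS tools to produce the final results. Several cases, including geospatial data crawling, spatial query, facility siting, and mapping, validate the effectiveness of the framework. Although only limited cases are presented, GeoGPT can be extended to various tasks by equipping it with more GIS tools, and we think the "foundational plus professional" paradigm implied in GeoGPT provides an effective way to develop next-generation GIS.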

Neural Architecture Retrieval

  • paper_url: http://arxiv.org/abs/2307.07919
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Xiaohuan Pei, Yanxi Li, Minjing Dong, Chang Xu
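  • for: This work aims to retrieve existing neural architectures that are similar to a new design, helping researchers situate their contributions relative to existing architectures and establish connections between them.
  • methods: The computational graph is divided into motifs, which are used to rebuild the macro graph, and multi-level contrastive learning is introduced for accurate graph representation learning.
  • results: Extensive evaluations on both human-designed and synthesized neural architectures demonstrate the superiority of the algorithm. A dataset of 12k real-world network architectures and their embeddings is also built for neural architecture retrieval.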
    Abstract With the increasing number of new neural architecture designs and the substantial body of existing neural architectures, it becomes difficult for researchers to situate their contributions relative to existing neural architectures or to establish connections between their designs and other relevant ones. To discover similar neural architectures in an efficient and automatic manner, we define a new problem, Neural Architecture Retrieval, which retrieves a set of existing neural architectures that have similar designs to the query neural architecture. Existing graph pre-training strategies cannot address the computational graph in neural architectures due to the graph size and motifs. To tackle these issues, we propose to divide the graph into motifs, which are used to rebuild the macro graph, and introduce multi-level contrastive learning to achieve accurate graph representation learning. Extensive evaluations on both human-designed and synthesized neural architectures demonstrate the superiority of our algorithm. A dataset containing 12k real-world network architectures, as well as their embeddings, is built for neural architecture retrieval.
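    Summary With the growing number of new neural architecture designs and the large body of existing ones, it has become difficult for researchers to compare their designs with existing neural architectures or to establish connections between their designs and other relevant ones. To discover similar neural architectures efficiently and automatically, we define the Neural Architecture Retrieval problem, which retrieves existing neural architectures whose designs are similar to a query architecture. Existing graph pre-training strategies cannot handle the computational graphs of neural architectures because of the graph size and motifs. To tackle these issues, we propose dividing the graph into motifs, using them to rebuild the macro graph, and introducing multi-level contrastive learning to achieve accurate graph representation learning. Extensive evaluations on both human-designed and synthesized neural architectures demonstrate the superiority of our algorithm. We also build a dataset containing 12,000 real-world network architectures and their embeddings for neural architecture retrieval.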

Recognition of Mental Adjectives in An Efficient and Automatic Style

  • paper_url: http://arxiv.org/abs/2307.11767
  • repo_url: None
  • paper_authors: Fei Yang
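  • for: This study proposes a new lexical inference task, Mental and Physical Classification (MPC), to handle commonsense reasoning in a reasoning graph.
  • methods: A BERT model is fine-tuned for the task, and an active learning algorithm is adopted to reduce the required annotation resources.
  • results: With the ENTROPY strategy, the model achieves satisfactory accuracy with only about 300 labeled words. The results are also compared with SentiWordNet to examine how MPC differs from the subjectivity classification task in sentiment analysis.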
    Abstract In recent years, commonsense reasoning has received more and more attention from the academic community. We propose a new lexical inference task, Mental and Physical Classification (MPC), to handle commonsense reasoning in a reasoning graph. Mental words relate to mental activities, which fall into six categories: Emotion, Need, Perceiving, Reasoning, Planning and Personality. Physical words describe physical attributes of an object, like color, hardness, speed and malleability. A BERT model is fine-tuned for this task, and an active learning algorithm is adopted in the training framework to reduce the required annotation resources. The model using the ENTROPY strategy achieves satisfactory accuracy and requires only about 300 labeled words. We also compare our result with SentiWordNet to check the difference between MPC and the subjectivity classification task in sentiment analysis.
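    Summary In recent years, commonsense reasoning has received increasing attention from the academic community. We propose a new lexical inference task, Mental and Physical Classification (MPC), to handle commonsense reasoning in a reasoning graph. Mental words relate to mental activities, which fall into six categories: Emotion, Need, Perceiving, Reasoning, Planning, and Personality. Physical words describe physical attributes of an object, such as color, hardness, speed, and malleability. We fine-tune a BERT model for this task and adopt an active learning algorithm to reduce the required annotation resources. The model using the ENTROPY strategy achieves satisfactory accuracy with only about 300 labeled words. We also compare our results with SentiWordNet to examine the difference between MPC and the subjectivity classification task in sentiment analysis.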
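An entropy-based query strategy for active learning picks the unlabeled items whose predicted class distribution is closest to uniform. The sketch below assumes a binary mental-versus-physical classifier that outputs a probability per word. The words, probabilities, and function names are hypothetical, and the paper's ENTROPY strategy may differ in its details.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli distribution with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    q = 1.0 - p
    return -(p * math.log(p) + q * math.log(q))


def select_for_annotation(word_probs, batch_size):
    """Return the words whose predictions are most uncertain, highest entropy first."""
    ranked = sorted(word_probs, key=lambda w: binary_entropy(word_probs[w]), reverse=True)
    return ranked[:batch_size]


# Hypothetical P(mental) scores produced by the current classifier.
word_probs = {"angry": 0.95, "heavy": 0.08, "sharp": 0.55, "bright": 0.48}
print(select_for_annotation(word_probs, batch_size=2))
```

The selected words would be sent to an annotator, added to the labeled set, and the classifier retrained before the next round.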

Is Imitation All You Need? Generalized Decision-Making with Dual-Phase Training

  • paper_url: http://arxiv.org/abs/2307.07909
  • repo_url: https://github.com/yunyikristy/dualmind
  • paper_authors: Yao Wei, Yanchao Sun, Ruijie Zheng, Sai Vemprala, Rogerio Bonatti, Shuhang Chen, Ratnesh Madaan, Zhongjie Ba, Ashish Kapoor, Shuang Ma
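  • for: This paper addresses challenges faced by current methods, such as overfitting behaviors and dependence on task-specific fine-tuning.
  • methods: A novel "Dual-phase" training strategy emulates how humans learn to act in the world: the model first learns fundamental common knowledge through a self-supervised objective tailored for control tasks, then learns to make decisions by imitating behaviors conditioned on given prompts.
  • results: In extensive experiments on MetaWorld and Habitat, DualMind outperforms other generalist agents by over 50% on Habitat and over 70% on MetaWorld. On the 45 tasks in MetaWorld, it achieves over 30 tasks at a 90% success rate.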
    Abstract We introduce DualMind, a generalist agent designed to tackle various decision-making tasks that addresses challenges posed by current methods, such as overfitting behaviors and dependence on task-specific fine-tuning. DualMind uses a novel "Dual-phase" training strategy that emulates how humans learn to act in the world. The model first learns fundamental common knowledge through a self-supervised objective tailored for control tasks and then learns how to make decisions based on different contexts through imitating behaviors conditioned on given prompts. DualMind can handle tasks across domains, scenes, and embodiments using just a single set of model weights and can execute zero-shot prompting without requiring task-specific fine-tuning. We evaluate DualMind on MetaWorld and Habitat through extensive experiments and demonstrate its superior generalizability compared to previous techniques, outperforming other generalist agents by over 50% and 70% on Habitat and MetaWorld, respectively. On the 45 tasks in MetaWorld, DualMind achieves over 30 tasks at a 90% success rate.
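    Summary We introduce DualMind, a generalist agent that addresses challenges faced by current methods, such as overfitting behaviors and dependence on task-specific fine-tuning. DualMind uses a novel "Dual-phase" training strategy that emulates how humans learn to act. The model first learns fundamental common knowledge through a self-supervised objective and then learns to make decisions in different contexts by imitating behaviors. DualMind can handle tasks across domains, scenes, and embodiments with a single set of model weights and can execute zero-shot prompting. Extensive experiments on MetaWorld and Habitat show that DualMind generalizes better than previous techniques, outperforming other generalist agents by over 50% on Habitat and over 70% on MetaWorld. On the 45 tasks in MetaWorld, DualMind achieves over 30 tasks at a 90% success rate.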

A Dialogue System for Assessing Activities of Daily Living: Improving Consistency with Grounded Knowledge

  • paper_url: http://arxiv.org/abs/2307.07544
  • repo_url: None
  • paper_authors: Zhecheng Sheng, Raymond Finzel, Michael Lucke, Sheena Dufresne, Maria Gini, Serguei Pakhomov
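  • for: The dialogue system is intended to help assessors, particularly novice assessors, prepare for evaluating participants' functional ability.
  • methods: The system consists of a natural language understanding (NLU) module and a natural language generation (NLG) module and simulates interactions between assessors and individuals of varying functioning.
  • results: Using recently released InstructGPT-like models, the system combines query classification with the simulated individual's biographical details to generate responses consistent with the underlying knowledge base.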
    Abstract In healthcare, the ability to care for oneself is reflected in the "Activities of Daily Living (ADL)," which serve as a measure of functional ability (functioning). A lack of functioning may lead to poor living conditions requiring personal care and assistance. To accurately identify those in need of support, assistance programs continuously evaluate participants' functioning across various domains. However, the assessment process may encounter consistency issues when multiple assessors with varying levels of expertise are involved. Novice assessors, in particular, may lack the necessary preparation for real-world interactions with participants. To address this issue, we developed a dialogue system that simulates interactions between assessors and individuals of varying functioning in a natural and reproducible way. The dialogue system consists of two major modules, one for natural language understanding (NLU) and one for natural language generation (NLG), respectively. In order to generate responses consistent with the underlying knowledge base, the dialogue system requires both an understanding of the user's query and of biographical details of an individual being simulated. To fulfill this requirement, we experimented with query classification and generated responses based on those biographical details using some recently released InstructGPT-like models.

Anomaly Detection in Automated Fibre Placement: Learning with Data Limitations

  • paper_url: http://arxiv.org/abs/2307.07893
  • repo_url: None
  • paper_authors: Assef Ghamisi, Todd Charter, Li Ji, Maxime Rivard, Gil Lund, Homayoun Najjaran
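  • for: Detect and localize defects in Automated Fibre Placement (AFP) when labelled defective samples are scarce.
  • methods: The framework combines unsupervised deep learning with classical computer vision algorithms and needs no labelled data or manufactured defect samples. A sample extraction method that exploits AFP's inherent symmetry extracts local samples from a depth map of the fibre layup surface, and an autoencoder trained on normal samples highlights anomalies through reconstruction errors.
  • results: Even with a limited number of training images, the method achieves satisfactory detection accuracy, accurately locates defects, and can detect all types of anomalies without an extensive labelled dataset of defects.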
    Abstract Conventional defect detection systems in Automated Fibre Placement (AFP) typically rely on end-to-end supervised learning, necessitating a substantial number of labelled defective samples for effective training. However, the scarcity of such labelled data poses a challenge. To overcome this limitation, we present a comprehensive framework for defect detection and localization in Automated Fibre Placement. Our approach combines unsupervised deep learning and classical computer vision algorithms, eliminating the need for labelled data or manufacturing defect samples. It efficiently detects various surface issues while requiring fewer images of composite parts for training. Our framework employs an innovative sample extraction method leveraging AFP's inherent symmetry to expand the dataset. By inputting a depth map of the fibre layup surface, we extract local samples aligned with each composite strip (tow). These samples are processed through an autoencoder, trained on normal samples for precise reconstructions, highlighting anomalies through reconstruction errors. Aggregated values form an anomaly map for insightful visualization. The framework employs blob detection on this map to locate manufacturing defects. The experimental findings reveal that despite training the autoencoder with a limited number of images, our proposed method exhibits satisfactory detection accuracy and accurately identifies defect locations. Our framework demonstrates comparable performance to existing methods, while also offering the advantage of detecting all types of anomalies without relying on an extensive labelled dataset of defects.
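    Summary Conventional defect detection systems in Automated Fibre Placement (AFP) typically rely on end-to-end supervised learning and need a large number of labelled defective samples for effective training. Obtaining such labelled samples is a challenge. To overcome this limitation, we present a comprehensive framework for defect detection and localization in AFP. Our approach combines unsupervised deep learning with classical computer vision algorithms and needs no labelled data or manufactured defect samples. It efficiently detects various surface issues while requiring fewer images of composite parts for training. The framework uses an innovative sample extraction method that exploits AFP's inherent symmetry to expand the dataset. From a depth map of the fibre layup surface, we extract local samples aligned with each composite strip (tow). These samples are processed by an autoencoder trained on normal samples for precise reconstruction, so that anomalies are highlighted by reconstruction errors. The aggregated values form an anomaly map for visualization, and blob detection on this map locates manufacturing defects. Experimental results show that, even when trained on a limited number of images, the proposed method achieves satisfactory detection accuracy and accurately identifies defect locations. Its performance is comparable to existing methods, and it can detect all types of anomalies without an extensive labelled dataset of defects.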
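The last two stages of the pipeline, turning reconstruction errors into an anomaly map and locating defects on it, can be sketched as below. Several simplifications are assumed: the autoencoder is replaced by a given reconstruction, a fixed threshold is used, and 4-connected component labelling stands in for the blob detector. None of these choices is taken from the paper, and all names are illustrative.

```python
from collections import deque


def anomaly_map(depth, reconstruction):
    """Per-pixel absolute reconstruction error."""
    return [[abs(d - r) for d, r in zip(d_row, r_row)]
            for d_row, r_row in zip(depth, reconstruction)]


def find_blobs(error_map, threshold):
    """Group above-threshold pixels into 4-connected blobs of (row, col) pairs."""
    rows, cols = len(error_map), len(error_map[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or error_map[r][c] <= threshold:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and error_map[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(pixels)
    return blobs


# Toy depth map with a raised two-pixel region the "autoencoder" fails to reproduce.
depth = [[1.0, 1.0, 1.0], [1.0, 3.0, 3.0], [1.0, 1.0, 1.0]]
reconstruction = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(find_blobs(anomaly_map(depth, reconstruction), threshold=0.5))
```

Because the autoencoder only ever sees normal samples, any region it reconstructs poorly is treated as a candidate defect, which is why no labelled defects are needed.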

Handwritten and Printed Text Segmentation: A Signature Case Study

  • paper_url: http://arxiv.org/abs/2307.07887
  • repo_url: None
  • paper_authors: Sina Gholamian, Ali Vahdat
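  • for: The paper addresses the overlap of handwritten and printed text in scanned documents, which hampers optical character recognition (OCR) and document digitization.
  • methods: The paper proposes new approaches to handwritten and printed text segmentation, including a new dataset, SignaTR6K, and a new model architecture.
  • results: The best configuration outperforms prior work on two different datasets by 17.9% and 7.3% in Intersection over Union (IoU) scores.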
    Abstract While analyzing scanned documents, handwritten text can overlap with printed text. This overlap causes difficulties during the optical character recognition (OCR) and digitization process of documents, and subsequently, hurts downstream NLP tasks. Prior research either focuses solely on the binary classification of handwritten text or performs a three-class segmentation of the document, i.e., recognition of handwritten, printed, and background pixels. This approach results in the assignment of overlapping handwritten and printed pixels to only one of the classes, and thus, they are not accounted for in the other class. Thus, in this research, we develop novel approaches to address the challenges of handwritten and printed text segmentation. Our objective is to recover text from different classes in their entirety, especially enhancing the segmentation performance on overlapping sections. To support this task, we introduce a new dataset, SignaTR6K, collected from real legal documents, as well as a new model architecture for the handwritten and printed text segmentation task. Our best configuration outperforms prior work on two different datasets by 17.9% and 7.3% on IoU scores. The SignaTR6K dataset is accessible for download via the following link: https://forms.office.com/r/2a5RDg7cAY.
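Intersection over Union (IoU), the metric reported above, can be computed per class on binary masks. Keeping one mask per class lets a pixel count as both handwritten and printed where the two overlap. This is a minimal sketch with made-up masks and illustrative function names, not the paper's evaluation code.

```python
def mask_iou(pred, target):
    """Intersection over Union of two binary masks given as nested 0/1 lists."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0


def per_class_iou(pred_masks, target_masks):
    """IoU for each class, with one independent binary mask per class."""
    return {name: mask_iou(pred_masks[name], target_masks[name]) for name in target_masks}


# The pixel at row 0, column 1 is both handwritten and printed in the ground truth.
target = {"handwritten": [[1, 1, 0], [0, 0, 0]], "printed": [[0, 1, 1], [0, 0, 0]]}
pred = {"handwritten": [[1, 0, 0], [0, 0, 0]], "printed": [[0, 1, 1], [0, 0, 0]]}
print(per_class_iou(pred, target))
```

A single-label segmentation would have to assign the overlapping pixel to one class only, so at least one of the two per-class scores would necessarily suffer.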

Online Goal Recognition in Discrete and Continuous Domains Using a Vectorial Representation

  • paper_url: http://arxiv.org/abs/2307.07876
  • repo_url: None
  • paper_authors: Douglas Tesch, Leonardo Rosa Amado, Felipe Meneguzzi
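  • for: This study proposes an efficient online goal recognition method that works in both discrete and continuous domains.
  • methods: In discrete domains, the method makes a single call to the planner for each possible goal; in continuous domains, it uses a simplified motion model that reduces the computational burden.
  • results: The online component of recognition runs orders of magnitude faster than the current state of the art, which makes this the first online method effectively usable for robotics applications that require sub-second recognition.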
    Abstract While recent work on online goal recognition efficiently infers goals under low observability, comparatively less work focuses on online goal recognition that works in both discrete and continuous domains. Online goal recognition approaches often rely on repeated calls to the planner at each new observation, incurring high computational costs. Recognizing goals online in continuous space quickly and reliably is critical for any trajectory planning problem since the real physical world is fast-moving, e.g. robot applications. We develop an efficient method for goal recognition that relies either on a single call to the planner for each possible goal in discrete domains or a simplified motion model that reduces the computational burden in continuous ones. The resulting approach performs the online component of recognition orders of magnitude faster than the current state of the art, making it the first online method effectively usable for robotics applications that require sub-second recognition.
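    Summary Recent work on online goal recognition infers goals efficiently under low observability, but comparatively little work addresses online goal recognition in both discrete and continuous domains. Online approaches often call the planner repeatedly at each new observation, which incurs high computational costs. In the fast-moving physical world, for example in robot applications, recognizing goals in continuous space quickly and reliably is critical. We develop an efficient goal recognition method that needs only a single planner call per possible goal in discrete domains and uses a simplified motion model in continuous ones. The approach is far faster than the current state of the art and enables sub-second recognition in robotics applications.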

Does Double Descent Occur in Self-Supervised Learning?

  • paper_url: http://arxiv.org/abs/2307.07872
  • repo_url: https://github.com/yonatangideoni/double_descent_tiny_paper
  • paper_authors: Alisia Lupidi, Yonatan Gideoni, Dulhan Jayalath
  • for: Most investigations of double descent focus on supervised models, while the few studies of self-supervised settings find the phenomenon surprisingly absent. This paper tests whether it occurs in self-supervised learning.
  • methods: Experiments use a standard autoencoder and a linear autoencoder, two previously unstudied settings.
  • results: The test loss either exhibits a classical U-shape or decreases monotonically, rather than displaying a double-descent curve.
    Abstract Most investigations into double descent have focused on supervised models while the few works studying self-supervised settings find a surprising lack of the phenomenon. These results imply that double descent may not exist in self-supervised models. We show this empirically using a standard and linear autoencoder, two previously unstudied settings. The test loss is found to have either a classical U-shape or to monotonically decrease instead of exhibiting a double-descent curve. We hope that further work on this will help elucidate the theoretical underpinnings of this phenomenon.
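    Summary Most studies of double descent focus on supervised models, and the few that examine self-supervised settings find little sign of it, which suggests that double descent may not exist in self-supervised models. We show this empirically using a standard autoencoder and a linear autoencoder, two previously unstudied settings. The test loss either exhibits a classical U-shape or decreases monotonically, rather than following a double-descent curve. We hope that further work will help clarify the theoretical underpinnings of this phenomenon.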

The SocialAI School: Insights from Developmental Psychology Towards Artificial Socio-Cultural Agents

  • paper_url: http://arxiv.org/abs/2307.07871
  • repo_url: None
  • paper_authors: Grgur Kovač, Rémy Portelas, Peter Ford Dominey, Pierre-Yves Oudeyer
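  • for: This paper aims to bring knowledge from developmental psychology into AI research in order to build more capable socially interactive agents.
  • methods: It draws on the theories of psychologists Michael Tomasello and Jerome Bruner and proposes a customizable, parameterized suite of procedurally generated environments for studying socio-cognitive abilities.
  • results: It presents the SocialAI school, a tool that helps researchers run experiments on social intelligence, and shows example experiments with RL agents and large language models.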
    Abstract Developmental psychologists have long established the importance of socio-cognitive abilities in human intelligence. These abilities enable us to enter, participate and benefit from human culture. AI research on social interactive agents mostly concerns the emergence of culture in a multi-agent setting (often without a strong grounding in developmental psychology). We argue that AI research should be informed by psychology and study socio-cognitive abilities enabling to enter a culture too. We discuss the theories of Michael Tomasello and Jerome Bruner to introduce some of their concepts to AI and outline key concepts and socio-cognitive abilities. We present The SocialAI school - a tool including a customizable parameterized suite of procedurally generated environments, which simplifies conducting experiments regarding those concepts. We show examples of such experiments with RL agents and Large Language Models. The main motivation of this work is to engage the AI community around the problem of social intelligence informed by developmental psychology, and to provide a tool to simplify first steps in this direction. Refer to the project website for code and additional information: https://sites.google.com/view/socialai-school.
    Summary To address this issue, we propose the SocialAI school, a tool that includes a customizable parameterized suite of procedurally generated environments to simplify experiments related to socio-cognitive abilities. Our goal is to engage the AI community in exploring the problem of social intelligence informed by developmental psychology and provide a tool to facilitate initial explorations in this area. The SocialAI school is based on the theories of Michael Tomasello and Jerome Bruner, who have contributed significantly to the understanding of socio-cognitive abilities. We introduce key concepts and abilities, such as joint attention, intentional understanding, and cultural learning, and demonstrate their application in experiments using reinforcement learning (RL) agents and large language models. Our main motivation is to encourage the AI community to explore the importance of social intelligence informed by developmental psychology. We believe that by integrating insights from psychology and AI, we can create more sophisticated and human-like intelligent systems. For more information and to access the tool, please refer to the project website at https://sites.google.com/view/socialai-school.

Large Language Models as Superpositions of Cultural Perspectives

  • paper_url: http://arxiv.org/abs/2307.07870
  • repo_url: None
  • paper_authors: Grgur Kovač, Masataka Sawayama, Rémy Portelas, Cédric Colas, Peter Ford Dominey, Pierre-Yves Oudeyer
  • for: This paper investigates whether a large language model (LLM) can be seen as a superposition of perspectives with different values and personality traits.
  • methods: The authors use psychology questionnaires (PVQ, VSM, IPIP) to study how the values and personality traits an LLM exhibits change across contexts.
  • results: The values and personality traits that LLMs express are context-dependent and change with the prompt, and they can be steered with different perspective-inducing methods.
    Abstract Large Language Models (LLMs) are often misleadingly recognized as having a personality or a set of values. We argue that an LLM can be seen as a superposition of perspectives with different values and personality traits. LLMs exhibit context-dependent values and personality traits that change based on the induced perspective (as opposed to humans, who tend to have more coherent values and personality traits across contexts). We introduce the concept of perspective controllability, which refers to a model's affordance to adopt various perspectives with differing values and personality traits. In our experiments, we use questionnaires from psychology (PVQ, VSM, IPIP) to study how exhibited values and personality traits change based on different perspectives. Through qualitative experiments, we show that LLMs express different values when those are (implicitly or explicitly) implied in the prompt, and that LLMs express different values even when those are not obviously implied (demonstrating their context-dependent nature). We then conduct quantitative experiments to study the controllability of different models (GPT-4, GPT-3.5, OpenAssistant, StableVicuna, StableLM), the effectiveness of various methods for inducing perspectives, and the smoothness of the models' drivability. We conclude by examining the broader implications of our work and outline a variety of associated scientific questions. The project website is available at https://sites.google.com/view/llm-superpositions .
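    Summary Large language models (LLMs) are often mistakenly regarded as having a personality or a set of values. We argue that an LLM can be seen as a superposition of perspectives, each with different values and personality traits. LLMs exhibit different values and personality traits in different contexts, whereas humans tend to have more coherent values and personality traits across contexts. We introduce the concept of perspective controllability, which refers to a model's ability to adopt different perspectives. Using questionnaires (PVQ, VSM, IPIP), we study how the values and personality traits LLMs exhibit change across contexts. Qualitative experiments show that LLMs express different values in different contexts, even when those values are not obviously implied. We then run quantitative experiments on the controllability of different models (GPT-4, GPT-3.5, OpenAssistant, StableVicuna, StableLM), the effectiveness of methods for inducing perspectives, and the smoothness of the models' drivability. We conclude by discussing the broader implications of the work and a range of associated scientific questions. See the project website for details: https://sites.google.com/view/llm-superpositions.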

Benchmarking the Effectiveness of Classification Algorithms and SVM Kernels for Dry Beans

  • paper_url: http://arxiv.org/abs/2307.07863
  • repo_url: None
  • paper_authors: Anant Mehta, Prajit Sengupta, Divisha Garg, Harpreet Singh, Yosi Shacham Diamand
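  • for: The study aims to help plant breeders and agricultural researchers increase crop productivity by analysing the Dry Bean dataset to identify desirable features, disease resistance, and nutritional content.
  • methods: It compares Support Vector Machine (SVM) classification with linear, polynomial, and radial basis function (RBF) kernels, along with other popular classification algorithms, on the Dry Bean dataset. Principal Component Analysis (PCA) is applied beforehand for dimensionality reduction.
  • results: The RBF SVM kernel performs best, with an accuracy of 93.34%, precision of 92.61%, recall of 92.35%, and F1 score of 91.40%. Through visualization and empirical analysis, the study also offers guidance for classifying complex, non-linearly structured datasets.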
    Abstract Plant breeders and agricultural researchers can increase crop productivity by identifying desirable features, disease resistance, and nutritional content by analysing the Dry Bean dataset. This study analyses and compares different Support Vector Machine (SVM) classification algorithms, namely linear, polynomial, and radial basis function (RBF), along with other popular classification algorithms. The analysis is performed on the Dry Bean Dataset, with PCA (Principal Component Analysis) conducted as a preprocessing step for dimensionality reduction. The primary evaluation metric used is accuracy, and the RBF SVM kernel algorithm achieves the highest Accuracy of 93.34%, Precision of 92.61%, Recall of 92.35% and F1 Score as 91.40%. Along with adept visualization and empirical analysis, this study offers valuable guidance by emphasizing the importance of considering different SVM algorithms for complex and non-linear structured datasets.
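    Summary Plant breeders and agricultural researchers can increase crop productivity by analysing the Dry Bean dataset to identify desirable features, disease resistance, and nutritional content. This study analyses and compares Support Vector Machine (SVM) classification with linear, polynomial, and radial basis function (RBF) kernels, along with other popular classification algorithms. Classification is performed on the Dry Bean dataset after Principal Component Analysis (PCA) as a preprocessing step. The primary evaluation metric is accuracy, and the RBF SVM achieves the highest accuracy of 93.34%, with precision of 92.61%, recall of 92.35%, and an F1 score of 91.40%. Through clear visualization and empirical analysis, the study offers useful guidance and stresses the value of considering different SVM kernels for complex, non-linearly structured datasets.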
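The three SVM kernels compared in this entry differ only in how they score the similarity of two feature vectors. The sketch below writes out the textbook definitions in plain Python; the default values for `degree`, `gamma`, and `coef0` are illustrative assumptions and are not the settings used in the paper.

```python
import math


def linear_kernel(x, y):
    """Plain dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2); equals 1.0 for identical vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


x, y = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, y))
print(polynomial_kernel(x, y, degree=2))
print(rbf_kernel(x, y, gamma=0.5))
```

Because the RBF kernel depends only on the distance between points, it can separate classes that are not linearly separable in the original (or PCA-reduced) feature space, which is consistent with it performing best here.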

Automated Knowledge Modeling for Cancer Clinical Practice Guidelines

  • paper_url: http://arxiv.org/abs/2307.10231
  • repo_url: None
  • paper_authors: Pralaypati Ta, Bhumika Gupta, Arihant Jain, Sneha Sree C, Arunima Sarkar, Keerthi Ram, Mohanasankar Sivaprakasam
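  • for: This work aims to extract the knowledge in Clinical Practice Guidelines (CPGs) and generate a model of it that supports programmatic interaction.
  • methods: An automated method extracts knowledge from National Comprehensive Cancer Network (NCCN) CPGs and generates a structured model. Three enrichment strategies (cancer staging information, Unified Medical Language System (UMLS) Metathesaurus and National Cancer Institute thesaurus (NCIt) concepts, and node classification) are proposed to enhance the model so that cancer care guidelines can be traversed and queried programmatically.
  • results: Node classification with a Support Vector Machine (SVM) model reaches a classification accuracy of 0.81 under 10-fold cross-validation.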
    Abstract Clinical Practice Guidelines (CPGs) for cancer diseases evolve rapidly due to new evidence generated by active research. Currently, CPGs are primarily published in a document format that is ill-suited for managing this developing knowledge. A knowledge model of the guidelines document suitable for programmatic interaction is required. This work proposes an automated method for extraction of knowledge from National Comprehensive Cancer Network (NCCN) CPGs in Oncology and generating a structured model containing the retrieved knowledge. The proposed method was tested using two versions of NCCN Non-Small Cell Lung Cancer (NSCLC) CPG to demonstrate the effectiveness in faithful extraction and modeling of knowledge. Three enrichment strategies using Cancer staging information, Unified Medical Language System (UMLS) Metathesaurus & National Cancer Institute thesaurus (NCIt) concepts, and Node classification are also presented to enhance the model towards enabling programmatic traversal and querying of cancer care guidelines. The Node classification was performed using a Support Vector Machine (SVM) model, achieving a classification accuracy of 0.81 with 10-fold cross-validation.
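    Summary Clinical Practice Guidelines (CPGs) for cancer evolve rapidly as new evidence emerges. CPGs are currently published mainly as documents, a format that is ill-suited to managing this developing knowledge. This work proposes a method that automatically extracts the knowledge in guideline documents and generates a structured model. The method was tested on two versions of the National Comprehensive Cancer Network (NCCN) Non-Small Cell Lung Cancer (NSCLC) guideline to show that it extracts and models knowledge faithfully. The paper also describes three enrichment strategies (cancer staging information, Unified Medical Language System (UMLS) Metathesaurus and National Cancer Institute thesaurus (NCIt) concepts, and node classification) that enhance the model so that cancer care guidelines can be traversed and queried programmatically. Node classification uses a Support Vector Machine (SVM) model and reaches a classification accuracy of 0.81 with 10-fold cross-validation.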
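The 10-fold cross-validation protocol behind the reported accuracy can be sketched as follows. This is a generic illustration, not the authors' pipeline: the majority-class classifier is a stand-in for their SVM, the toy labels are invented, and the folds are contiguous and unshuffled for simplicity.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds (needs n_samples >= k)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_accuracy(labels, k=10):
    """Mean per-fold accuracy of a majority-class baseline (placeholder model)."""
    scores = []
    for train, test in k_fold_indices(len(labels), k):
        # "Training": remember the most frequent label in the training folds.
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        correct = sum(labels[i] == majority for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Hypothetical node labels: every fifth node belongs to class 1.
toy_labels = [1 if i % 5 == 0 else 0 for i in range(25)]
mean_accuracy = cross_val_accuracy(toy_labels, k=10)
print(round(mean_accuracy, 3))
```

Each sample is used for testing exactly once and for training in the other nine folds, so the averaged score uses all of the data without testing on anything the model was fitted to in that fold.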

A Multi-Heuristic Search-based Motion Planning for Automated Parking

  • paper_url: http://arxiv.org/abs/2307.07857
  • repo_url: None
  • paper_authors: Bhargav Adabala, Zlatan Ajanović
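  • for: This paper addresses real-time motion planning in unstructured environments such as parking lots or construction sites, which is challenging because of the large search space and the vehicle's kinodynamic constraints.
  • methods: It adopts a Multi-Heuristic Search approach that uses multiple heuristic functions, each capturing different complexities of the search space, so that the individual advantages of every heuristic can be exploited.
  • results: Benchmarked against Hybrid A*, the Multi-Heuristic A* algorithm performs better in both computational efficiency and motion plan quality.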
    Abstract In unstructured environments like parking lots or construction sites, due to the large search-space and kinodynamic constraints of the vehicle, it is challenging to achieve real-time planning. Several state-of-the-art planners utilize heuristic search-based algorithms. However, they heavily rely on the quality of the single heuristic function, used to guide the search. Therefore, they are not capable to achieve reasonable computational performance, resulting in unnecessary delays in the response of the vehicle. In this work, we are adopting a Multi-Heuristic Search approach, that enables the use of multiple heuristic functions and their individual advantages to capture different complexities of a given search space. Based on our knowledge, this approach was not used previously for this problem. For this purpose, multiple admissible and non-admissible heuristic functions are defined, the original Multi-Heuristic A* Search was extended for bidirectional use and dealing with hybrid continuous-discrete search space, and a mechanism for adapting scale of motion primitives is introduced. To demonstrate the advantage, the Multi-Heuristic A* algorithm is benchmarked against a very popular heuristic search-based algorithm, Hybrid A*. The Multi-Heuristic A* algorithm outperformed baseline in both terms, computation efficiency and motion plan (path) quality.
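    Summary In unstructured environments such as parking lots or construction sites, the large search space and the vehicle's kinodynamic constraints make real-time planning challenging. Many state-of-the-art planners use heuristic search-based algorithms, but they depend heavily on the quality of a single heuristic function, which leads to unnecessary delays in practice. In this work we adopt a Multi-Heuristic Search approach that exploits the advantages of several heuristic functions to capture different complexities of the search space. To our knowledge, this approach has not been used for this problem before. We define multiple admissible and non-admissible heuristic functions, extend the original Multi-Heuristic A* search to bidirectional use and to hybrid continuous-discrete search spaces, and introduce a mechanism for adapting the scale of motion primitives. To demonstrate the advantage, we benchmark the Multi-Heuristic A* algorithm against the very popular heuristic search-based algorithm Hybrid A*. Multi-Heuristic A* outperforms the baseline in both computational efficiency and motion plan (path) quality.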
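The core idea of searching with several heuristics at once can be sketched on a toy grid. The code below is a deliberately simplified round-robin variant with one priority queue per heuristic and shared path costs. It is not the paper's planner and omits the anchor-queue conditions of the original Multi-Heuristic A*, as well as bidirectional search and motion primitives. The grid, walls, `weight`, and both heuristics are hypothetical.

```python
import heapq


def multi_heuristic_search(start, goal, neighbors, heuristics, weight=2.0):
    """Round-robin best-first search with one open list per heuristic."""
    g = {start: 0.0}            # best known cost-so-far, shared by all queues
    parent = {start: None}
    queues = [[(weight * h(start, goal), start)] for h in heuristics]
    closed = set()
    while any(queues):
        for queue in queues:
            # Skip states that another queue has already expanded.
            while queue and queue[0][1] in closed:
                heapq.heappop(queue)
            if not queue:
                continue
            _, state = heapq.heappop(queue)
            if state == goal:
                path = []
                while state is not None:
                    path.append(state)
                    state = parent[state]
                return path[::-1]
            closed.add(state)
            for nxt, cost in neighbors(state):
                new_g = g[state] + cost
                if nxt not in closed and new_g < g.get(nxt, float("inf")):
                    g[nxt] = new_g
                    parent[nxt] = state
                    for q, h in zip(queues, heuristics):
                        heapq.heappush(q, (new_g + weight * h(nxt, goal), nxt))
    return None


SIZE = 5
WALLS = {(2, 0), (2, 1), (2, 2), (2, 3)}  # a wall with a gap at (2, 4)


def grid_neighbors(state):
    """4-connected moves with unit cost, staying on the grid and off walls."""
    x, y = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < SIZE and 0 <= ny < SIZE and (nx, ny) not in WALLS:
            yield (nx, ny), 1.0


def manhattan(a, b):
    """Admissible on this unit-cost grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def squared_euclidean(a, b):
    """Generally not admissible; used here as a second, greedier guide."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


path = multi_heuristic_search((0, 0), (4, 4), grid_neighbors,
                              [manhattan, squared_euclidean])
print(path)
```

Because the queues share `g` and the closed set, progress made under one heuristic is immediately visible to the others; the price of this simplification is that the returned path is valid but not guaranteed to be optimal.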

AspectCSE: Sentence Embeddings for Aspect-based Semantic Textual Similarity using Contrastive Learning and Structured Knowledge

  • paper_url: http://arxiv.org/abs/2307.07851
  • repo_url: None
  • paper_authors: Tim Schopf, Emanuel Gerber, Malte Ostendorff, Florian Matthes
  • for: This paper is written for improving the accuracy of information retrieval tasks by using aspect-based contrastive learning of sentence embeddings.
  • methods: The paper proposes a new approach called AspectCSE, which uses aspect-based contrastive learning to train sentence embeddings that can capture specific aspects of textual similarity.
  • results: The paper reports an average improvement of 3.97% on information retrieval tasks across multiple aspects compared to the previous best results. Additionally, the paper shows that multi-aspect embeddings outperform single-aspect embeddings on aspect-specific information retrieval tasks.
    Abstract Generic sentence embeddings provide a coarse-grained approximation of semantic textual similarity but ignore specific aspects that make texts similar. Conversely, aspect-based sentence embeddings provide similarities between texts based on certain predefined aspects. Thus, similarity predictions of texts are more targeted to specific requirements and more easily explainable. In this paper, we present AspectCSE, an approach for aspect-based contrastive learning of sentence embeddings. Results indicate that AspectCSE achieves an average improvement of 3.97% on information retrieval tasks across multiple aspects compared to the previous best results. We also propose using Wikidata knowledge graph properties to train models of multi-aspect sentence embeddings in which multiple specific aspects are simultaneously considered during similarity predictions. We demonstrate that multi-aspect embeddings outperform single-aspect embeddings on aspect-specific information retrieval tasks. Finally, we examine the aspect-based sentence embedding space and demonstrate that embeddings of semantically similar aspect labels are often close, even without explicit similarity training between different aspect labels.
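    Summary Generic sentence embeddings give a coarse-grained approximation of semantic textual similarity but ignore the specific aspects that make texts similar. Aspect-based sentence embeddings, in contrast, provide similarity predictions based on specific predefined aspects, which makes the predictions more targeted to particular requirements and easier to explain. In this paper we present AspectCSE, an approach for aspect-based contrastive learning of sentence embeddings. AspectCSE improves on the previous best results by 3.97% on average on information retrieval tasks across multiple aspects. We also propose using Wikidata knowledge graph properties to train multi-aspect sentence embedding models, in which several specific aspects are considered simultaneously during similarity prediction. We show that multi-aspect embeddings outperform single-aspect embeddings on aspect-specific information retrieval tasks. Finally, we examine the aspect-based sentence embedding space and show that embeddings of semantically similar aspect labels are often close, even without explicit similarity training between different aspect labels.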
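Contrastive learning of sentence embeddings is commonly trained with an InfoNCE-style loss that pulls an anchor toward a positive example and pushes it away from negatives. The sketch below shows that generic loss on toy 2-D vectors. Whether AspectCSE uses exactly this form is an assumption, and the vectors and `temperature` value are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.05):
    """Negative log-probability of the positive under a softmax over similarities."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[0]


anchor = [1.0, 0.0]
# Positive shares the anchor's aspect; negatives do not.
loss_aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# Same vectors, but the dissimilar one is (wrongly) treated as the positive.
loss_misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(loss_aligned < loss_misaligned)
```

In an aspect-based setting, which pairs count as positives would depend on the chosen aspect, so the same two sentences could be a positive pair under one aspect and a negative pair under another.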

AIOptimizer – A reinforcement learning-based software performance optimisation prototype for cost minimisation

  • paper_url: http://arxiv.org/abs/2307.07846
  • repo_url: None
  • paper_authors: Noopur Zambare
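  • for: This paper introduces a prototype of AIOptimizer, a software performance optimisation tool based on cost reduction. AIOptimizer uses a recommendation system driven by reinforcement learning to improve the efficiency and affordability of software systems.
  • methods: The work relies on a modular design, data gathering techniques, continuous learning, and resilient integration to provide effective, user-centric performance optimisation.
  • results: The paper highlights design factors such as accuracy, adaptability, scalability, and user-friendliness, and describes features such as cost optimisation recommendations, fault identification, efficiency prediction, and cooperation. AIOptimizer remains a theoretical prototype.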
    Abstract This research article introduces AIOptimizer, a prototype for a software performance optimisation tool based on cost reduction. AIOptimizer uses a recommendation system driven by reinforcement learning to improve software system efficiency and affordability. The paper highlights AIOptimizer's design factors, such as accuracy, adaptability, scalability, and user-friendliness. To provide effective and user-centric performance optimisation solutions, it emphasises the use of a modular design, data gathering techniques, continuous learning, and resilient integration. The article also investigates AIOptimizer features such as fault identification, cost optimisation recommendations, efficiency prediction, and cooperation. Furthermore, it explores several software development life cycle models and introduces AIOptimizer uses a reinforcement learning-based recommendation engine for cost optimisation. The purpose of this research study is to highlight AIOptimizer as a prototype that uses advanced optimisation techniques and smart recommendation systems to continually enhance software performance and save expenses. The research focuses on various software development life cycle models, such as the Waterfall model, Iterative model, Spiral model, V-Model, Big Bang model and Agile Model. Each model has advantages and disadvantages, and their usefulness is determined by the project's specifications and characteristics. The AIOptimizer tool is a theoretical prototype for such software performance optimizers.
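    Summary This article introduces AIOptimizer, a prototype of a software performance optimisation tool based on cost reduction. AIOptimizer uses a recommendation system driven by reinforcement learning to improve the efficiency and affordability of software systems. The article highlights AIOptimizer's design factors, such as accuracy, adaptability, scalability, and user-friendliness. To provide effective and user-centric performance optimisation, it emphasises a modular design, data gathering techniques, continuous learning, and resilient integration. It also examines AIOptimizer features such as fault identification, cost optimisation recommendations, efficiency prediction, and cooperation. In addition, it reviews software development life cycle models, including the Waterfall model, Iterative model, Spiral model, V-Model, Big Bang model, and Agile model. Each model has advantages and disadvantages, and its usefulness depends on the project's specifications and characteristics. The AIOptimizer tool is a theoretical prototype of such a software performance optimiser.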

RegExplainer: Generating Explanations for Graph Neural Networks in Regression Task

  • paper_url: http://arxiv.org/abs/2307.07840
  • repo_url: None
  • paper_authors: Jiaxing Zhang, Zhuomin Chen, Hao Mei, Dongsheng Luo, Hua Wei
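  • for: This paper aims to explain graph regression models (XAIG-R), so that the behaviour of GNNs in regression tasks can be better understood.
  • methods: It proposes a novel objective based on information bottleneck theory and a mix-up framework that supports various GNNs in a model-agnostic manner. It also presents a contrastive learning strategy for handling continuously ordered labels.
  • results: Extensive experiments on three benchmark datasets and one real-life dataset show that the proposed method effectively interprets GNN models in regression tasks.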
    Abstract Graph regression is a fundamental task and has received increasing attention in a wide range of graph learning tasks. However, the inference process is often not interpretable. Most existing explanation techniques are limited to understanding GNN behaviors in classification tasks. In this work, we seek an explanation to interpret the graph regression models (XAIG-R). We show that existing methods overlook the distribution shifting and continuously ordered decision boundary, which hinders them away from being applied in the regression tasks. To address these challenges, we propose a novel objective based on the information bottleneck theory and introduce a new mix-up framework, which could support various GNNs in a model-agnostic manner. We further present a contrastive learning strategy to tackle the continuously ordered labels in regression task. To empirically verify the effectiveness of the proposed method, we introduce three benchmark datasets and a real-life dataset for evaluation. Extensive experiments show the effectiveness of the proposed method in interpreting GNN models in regression tasks.
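    Summary Graph regression is a fundamental task that has received increasing attention across graph learning. However, the inference process is often not interpretable, and existing explanation techniques mostly target GNN behaviour in classification tasks. In this work we seek explanations for graph regression models (XAIG-R). We find that existing methods overlook distribution shift and the continuously ordered decision boundary, which prevents them from being applied to regression tasks. To address these challenges, we propose a novel objective based on information bottleneck theory and introduce a new mix-up framework that supports various GNNs in a model-agnostic manner. We also present a contrastive learning strategy for handling continuously ordered labels. To verify the method empirically, we introduce three benchmark datasets and one real-life dataset for evaluation. Extensive experiments show that the proposed method effectively explains GNN models in regression tasks.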