cs.AI - 2023-07-30

DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and Delivery Route Prediction

  • paper_url: http://arxiv.org/abs/2307.16246
  • repo_url: https://github.com/maoxiaowei97/drl4route
  • paper_authors: Xiaowei Mao, Haomin Wen, Hengrui Zhang, Huaiyu Wan, Lixia Wu, Jianbin Zheng, Haoyuan Hu, Youfang Lin
  • for: Pick-up and Delivery Route Prediction (PDRP), which estimates the future service route of a worker given the current task pool, has received rising attention in recent years.
  • methods: A deep reinforcement learning framework that learns workers' behavior patterns from massive historical data and incorporates non-differentiable objective optimization into the training process.
  • results: Extensive offline experiments and an online deployment on real-world datasets show improvements on PDRP in both Location Square Deviation (LSD) and Accuracy@3 (ACC@3).
    Abstract Pick-up and Delivery Route Prediction (PDRP), which aims to estimate the future service route of a worker given his current task pool, has received rising attention in recent years. Deep neural networks based on supervised learning have emerged as the dominant model for the task because of their powerful ability to capture workers' behavior patterns from massive historical data. Though promising, they fail to introduce the non-differentiable test criteria into the training process, leading to a mismatch in training and test criteria, which considerably trims down their performance when applied in practical systems. To tackle the above issue, we present the first attempt to generalize Reinforcement Learning (RL) to the route prediction task, leading to a novel RL-based framework called DRL4Route. It combines the behavior-learning abilities of previous deep learning models with the non-differentiable objective optimization ability of reinforcement learning. DRL4Route can serve as a plug-and-play component to boost the existing deep learning models. Based on the framework, we further implement a model named DRL4Route-GAE for PDRP in logistic service. It follows the actor-critic architecture which is equipped with a Generalized Advantage Estimator that can balance the bias and variance of the policy gradient estimates, thus achieving a more optimal policy. Extensive offline experiments and the online deployment show that DRL4Route-GAE improves Location Square Deviation (LSD) by 0.9%-2.7%, and Accuracy@3 (ACC@3) by 2.4%-3.2% over existing methods on the real-world dataset.
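A minimal sketch of the Generalized Advantage Estimator at the heart of DRL4Route-GAE, assuming toy per-step rewards and value estimates; it illustrates the estimator itself, not the authors' released code.

```python
# Generalized Advantage Estimator (GAE): trades off bias and variance of
# policy-gradient estimates via the lambda-weighted sum of TD residuals.
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Compute A_t = sum_l (gamma*lam)^l * delta_{t+l},
    with TD residual delta_t = r_t + gamma*V(s_{t+1}) - V(s_t)."""
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = values[t + 1] if t + 1 < T else 0.0  # terminal state
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Toy example: per-step rewards standing in for a route-quality signal.
print(gae_advantages(rewards=[0.1, 0.3, 1.0], values=[0.5, 0.6, 0.8]))
```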

Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.16236
  • repo_url: None
  • paper_authors: Gabriele Lagani, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato
  • for: This survey explores emerging applications of Deep Learning (DL) techniques and the challenges that motivate interest in biologically grounded mechanisms.
  • methods: The paper surveys biologically inspired deep learning models, including models of synaptic plasticity, and their connections with plasticity models in Spiking Neural Networks (SNNs).
  • results: Bio-inspired deep learning models perform well across a range of application scenarios and may address challenges faced by DL techniques, such as robustness to adversarial attacks and ecological impact.
    Abstract Recently emerged technologies based on Deep Learning (DL) achieved outstanding results on a variety of tasks in the field of Artificial Intelligence (AI). However, these encounter several challenges related to robustness to adversarial inputs, ecological impact, and the necessity of huge amounts of training data. In response, researchers are focusing more and more interest on biologically grounded mechanisms, which are appealing due to the impressive capabilities exhibited by biological brains. This survey explores a range of these biologically inspired models of synaptic plasticity, their application in DL scenarios, and the connections with models of plasticity in Spiking Neural Networks (SNNs). Overall, Bio-Inspired Deep Learning (BIDL) represents an exciting research direction, aiming at advancing not only our current technologies but also our understanding of intelligence.

Spiking Neural Networks and Bio-Inspired Supervised Deep Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.16235
  • repo_url: None
  • paper_authors: Gabriele Lagani, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato
  • for: This paper provides a comprehensive review of recent biologically inspired approaches underlying Artificial Intelligence technologies.
  • methods: It introduces the main principles of computation and synaptic plasticity in biological neurons, presents Spiking Neural Network (SNN) models in detail, and highlights the main challenges of SNN training.
  • results: It discusses bio-inspired training methods as alternatives to traditional backprop-based optimization, aimed at improving the computational capabilities and biological plausibility of current models.
    Abstract For a long time, biology and neuroscience fields have been a great source of inspiration for computer scientists, towards the development of Artificial Intelligence (AI) technologies. This survey aims at providing a comprehensive review of recent biologically-inspired approaches for AI. After introducing the main principles of computation and synaptic plasticity in biological neurons, we provide a thorough presentation of Spiking Neural Network (SNN) models, and we highlight the main challenges related to SNN training, where traditional backprop-based optimization is not directly applicable. Therefore, we discuss recent bio-inspired training methods, which pose themselves as alternatives to backprop, both for traditional and spiking networks. Bio-Inspired Deep Learning (BIDL) approaches aim at advancing both the computational capabilities and the biological plausibility of current models.

Robust Electric Vehicle Balancing of Autonomous Mobility-On-Demand System: A Multi-Agent Reinforcement Learning Approach

  • paper_url: http://arxiv.org/abs/2307.16228
  • repo_url: None
  • paper_authors: Sihong He, Shuo Han, Fei Miao
  • for: This paper designs a multi-agent reinforcement learning (MARL) framework for electric autonomous vehicle (EAV) balancing in future autonomous mobility-on-demand (AMoD) systems, with adversarial agents modeling both EAV supply and mobility demand uncertainties.
  • methods: The proposed MARL-based framework trains a robust EAV balancing policy that considers both the supply-demand ratio and the charging utilization rate across the whole city.
  • results: Experiments show that the proposed robust method outperforms a non-robust MARL method, improving reward by 19.28%, charging utilization fairness by 28.18%, and supply-demand fairness by 3.97%. Compared with a robust optimization-based method, the proposed MARL algorithm improves reward, charging utilization fairness, and supply-demand fairness by 8.21%, 8.29%, and 9.42%, respectively.
    Abstract Electric autonomous vehicles (EAVs) are getting attention in future autonomous mobility-on-demand (AMoD) systems due to their economic and societal benefits. However, EAVs' unique charging patterns (long charging time, high charging frequency, unpredictable charging behaviors, etc.) make it challenging to accurately predict the EAVs supply in E-AMoD systems. Furthermore, the mobility demand's prediction uncertainty makes it an urgent and challenging task to design an integrated vehicle balancing solution under supply and demand uncertainties. Despite the success of reinforcement learning-based E-AMoD balancing algorithms, state uncertainties under the EV supply or mobility demand remain unexplored. In this work, we design a multi-agent reinforcement learning (MARL)-based framework for EAVs balancing in E-AMoD systems, with adversarial agents to model both the EAVs supply and mobility demand uncertainties that may undermine the vehicle balancing solutions. We then propose a robust E-AMoD Balancing MARL (REBAMA) algorithm to train a robust EAVs balancing policy to balance both the supply-demand ratio and charging utilization rate across the whole city. Experiments show that our proposed robust method performs better compared with a non-robust MARL method that does not consider state uncertainties; it improves the reward, charging utilization fairness, and supply-demand fairness by 19.28%, 28.18%, and 3.97%, respectively. Compared with a robust optimization-based method, the proposed MARL algorithm can improve the reward, charging utilization fairness, and supply-demand fairness by 8.21%, 8.29%, and 9.42%, respectively.

Text Analysis Using Deep Neural Networks in Digital Humanities and Information Science

  • paper_url: http://arxiv.org/abs/2307.16217
  • repo_url: None
  • paper_authors: Omri Suissa, Avshalom Elmalech, Maayan Zhitomirsky-Geffet
  • for: This paper aims to explore the use of deep neural networks (DNNs) in Digital Humanities (DH) research and provide a practical decision model for DH experts to choose the appropriate deep learning approaches for their research.
  • methods: The paper analyzes multiple use-cases of DH studies in recent literature and their possible solutions, and lays out a practical decision model for DH experts to choose the appropriate deep learning approaches for their research.
  • results: The paper aims to raise awareness of the benefits of utilizing deep learning models in the DH community and to provide a practical decision model for DH experts to choose the appropriate deep learning approaches for their research.
    Abstract Combining computational technologies and humanities is an ongoing effort aimed at making resources such as texts, images, audio, video, and other artifacts digitally available, searchable, and analyzable. In recent years, deep neural networks (DNN) dominate the field of automatic text analysis and natural language processing (NLP), in some cases presenting a super-human performance. DNNs are the state-of-the-art machine learning algorithms solving many NLP tasks that are relevant for Digital Humanities (DH) research, such as spell checking, language detection, entity extraction, author detection, question answering, and other tasks. These supervised algorithms learn patterns from a large number of "right" and "wrong" examples and apply them to new examples. However, using DNNs for analyzing the text resources in DH research presents two main challenges: (un)availability of training data and a need for domain adaptation. This paper explores these challenges by analyzing multiple use-cases of DH studies in recent literature and their possible solutions and lays out a practical decision model for DH experts for when and how to choose the appropriate deep learning approaches for their research. Moreover, in this paper, we aim to raise awareness of the benefits of utilizing deep learning models in the DH community.

Question Answering with Deep Neural Networks for Semi-Structured Heterogeneous Genealogical Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2307.16214
  • repo_url: https://github.com/omrivm/uncle-bert
  • paper_authors: Omri Suissa, Maayan Zhitomirsky-Geffet, Avshalom Elmalech
  • for: This study aims to develop a question answering system over genealogical family trees to better support genealogical research.
  • methods: Genealogical data is represented as knowledge graphs, converted to texts, combined with unstructured texts, and used to train a Transformer-based question answering model.
  • results: A dedicated approach reduces the complexity of the question answering model while increasing accuracy, and may have practical implications for genealogical research and real-world projects, making genealogical data more accessible to experts as well as the general public.
    Abstract With the rising popularity of user-generated genealogical family trees, new genealogical information systems have been developed. State-of-the-art natural question answering algorithms use deep neural network (DNN) architecture based on self-attention networks. However, some of these models use sequence-based inputs and are not suitable to work with graph-based structure, while graph-based DNN models rely on high levels of comprehensiveness of knowledge graphs that is nonexistent in the genealogical domain. Moreover, these supervised DNN models require training datasets that are absent in the genealogical domain. This study proposes an end-to-end approach for question answering using genealogical family trees by: 1) representing genealogical data as knowledge graphs, 2) converting them to texts, 3) combining them with unstructured texts, and 4) training a transformer-based question answering model. To evaluate the need for a dedicated approach, a comparison between the fine-tuned model (Uncle-BERT) trained on the auto-generated genealogical dataset and state-of-the-art question-answering models was performed. The findings indicate that there are significant differences between answering genealogical questions and open-domain questions. Moreover, the proposed methodology reduces complexity while increasing accuracy and may have practical implications for genealogical research and real-world projects, making genealogical data accessible to experts as well as the general public.

Robust Multi-Agent Reinforcement Learning with State Uncertainty

  • paper_url: http://arxiv.org/abs/2307.16212
  • repo_url: https://github.com/sihongho/robust_marl_with_state_uncertainty
  • paper_authors: Sihong He, Songyang Han, Sanbao Su, Shuo Han, Shaofeng Zou, Fei Miao
  • for: This work addresses the problem of state uncertainty in multi-agent reinforcement learning (MARL), improving the robustness and reliability of MARL policies.
  • methods: The paper models the problem as a Markov Game with state perturbation adversaries (MG-SPA) and adopts robust equilibrium (RE) as the solution concept. It further proposes a robust multi-agent Q-learning (RMAQ) algorithm with convergence guarantees, and a robust multi-agent actor-critic (RMAAC) algorithm based on an analytical expression of the policy gradient to handle high-dimensional state-action spaces.
  • results: Experiments show that the proposed RMAQ algorithm converges to the optimal value function, and that RMAAC outperforms several MARL and robust MARL methods in multiple multi-agent environments when state uncertainty is present.
    Abstract In real-world multi-agent reinforcement learning (MARL) applications, agents may not have perfect state information (e.g., due to inaccurate measurement or malicious attacks), which challenges the robustness of agents' policies. Though robustness is getting important in MARL deployment, little prior work has studied state uncertainties in MARL, neither in problem formulation nor algorithm design. Motivated by this robustness issue and the lack of corresponding studies, we study the problem of MARL with state uncertainty in this work. We provide the first attempt to the theoretical and empirical analysis of this challenging problem. We first model the problem as a Markov Game with state perturbation adversaries (MG-SPA) by introducing a set of state perturbation adversaries into a Markov Game. We then introduce robust equilibrium (RE) as the solution concept of an MG-SPA. We conduct a fundamental analysis regarding MG-SPA such as giving conditions under which such a robust equilibrium exists. Then we propose a robust multi-agent Q-learning (RMAQ) algorithm to find such an equilibrium, with convergence guarantees. To handle high-dimensional state-action space, we design a robust multi-agent actor-critic (RMAAC) algorithm based on an analytical expression of the policy gradient derived in the paper. Our experiments show that the proposed RMAQ algorithm converges to the optimal value function; our RMAAC algorithm outperforms several MARL and robust MARL methods in multiple multi-agent environments when state uncertainty is present. The source code is public on \url{https://github.com/sihongho/robust_marl_with_state_uncertainty}.
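A heavily simplified, single-agent sketch of the robust Q-update idea behind RMAQ: the agent values the next state under the worst observation a perturbation adversary could report. The toy MDP, the neighboring-state perturbation set, and the single-agent reduction are all assumptions; the paper's MG-SPA formulation is multi-agent.

```python
# Robust tabular Q-learning under state perturbation (toy, single-agent).
import numpy as np

n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.95

def perturb_set(s):
    # Adversary may report a neighboring state index (toy perturbation set).
    return [s, max(s - 1, 0), min(s + 1, n_states - 1)]

def robust_update(s, a, r, s_next):
    # Worst case over what the adversary could make the next observation:
    # the agent's greedy value is evaluated under each perturbed view, and
    # the adversary is assumed to pick the one minimizing that value.
    worst_next_value = min(Q[p].max() for p in perturb_set(s_next))
    Q[s, a] += alpha * (r + gamma * worst_next_value - Q[s, a])

robust_update(s=2, a=1, r=1.0, s_next=3)
print(Q[2])
```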

Rethinking Uncertainly Missing and Ambiguous Visual Modality in Multi-Modal Entity Alignment

  • paper_url: http://arxiv.org/abs/2307.16210
  • repo_url: https://github.com/zjukg/UMAEA
  • paper_authors: Zhuo Chen, Lingbing Guo, Yin Fang, Yichi Zhang, Jiaoyan Chen, Jeff Z. Pan, Yangning Li, Huajun Chen, Wen Zhang
  • for: The challenges of multi-modal entity alignment (MMEA), including modality noise and the pervasive incompleteness and intrinsic ambiguity of visual modalities.
  • methods: The paper proposes UMAEA, a robust multi-modal entity alignment approach designed for uncertainly missing and ambiguous visual modalities, and benchmarks the latest MMEA models on the proposed MMEA-UMVM dataset.
  • results: UMAEA consistently achieves state-of-the-art performance across all 97 benchmark splits, significantly surpassing existing baselines with limited parameters and time consumption, while effectively alleviating the identified limitations of other models in the face of modality incompleteness and ambiguity.
    Abstract As a crucial extension of entity alignment (EA), multi-modal entity alignment (MMEA) aims to identify identical entities across disparate knowledge graphs (KGs) by exploiting associated visual information. However, existing MMEA approaches primarily concentrate on the fusion paradigm of multi-modal entity features, while neglecting the challenges presented by the pervasive phenomenon of missing and intrinsic ambiguity of visual images. In this paper, we present a further analysis of visual modality incompleteness, benchmarking latest MMEA models on our proposed dataset MMEA-UMVM, where the types of alignment KGs covering bilingual and monolingual, with standard (non-iterative) and iterative training paradigms to evaluate the model performance. Our research indicates that, in the face of modality incompleteness, models succumb to overfitting the modality noise, and exhibit performance oscillations or declines at high rates of missing modality. This proves that the inclusion of additional multi-modal data can sometimes adversely affect EA. To address these challenges, we introduce UMAEA , a robust multi-modal entity alignment approach designed to tackle uncertainly missing and ambiguous visual modalities. It consistently achieves SOTA performance across all 97 benchmark splits, significantly surpassing existing baselines with limited parameters and time consumption, while effectively alleviating the identified limitations of other models. Our code and benchmark data are available at https://github.com/zjukg/UMAEA.

Around the GLOBE: Numerical Aggregation Question-Answering on Heterogeneous Genealogical Knowledge Graphs with Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.16208
  • repo_url: None
  • paper_authors: Omri Suissa, Maayan Zhitomirsky-Geffet, Avshalom Elmalech
  • for: Researchers and practitioners in natural language processing and genealogy, as well as members of the general public interested in exploring cultural heritage domains.
  • methods: A new end-to-end methodology for numerical aggregation question-answering (QA) over genealogical trees, including an automatic method for training dataset generation, a transformer-based table selection method, and an optimized transformer-based numerical aggregation QA model.
  • results: The proposed architecture, GLOBE, outperforms state-of-the-art models and pipelines, achieving 87% accuracy on numerical aggregation QA compared to only 21% for current state-of-the-art models.
    Abstract One of the key AI tools for textual corpora exploration is natural language question-answering (QA). Unlike keyword-based search engines, QA algorithms receive and process natural language questions and produce precise answers to these questions, rather than long lists of documents that need to be manually scanned by the users. State-of-the-art QA algorithms based on DNNs were successfully employed in various domains. However, QA in the genealogical domain is still underexplored, while researchers in this field (and other fields in humanities and social sciences) can highly benefit from the ability to ask questions in natural language, receive concrete answers and gain insights hidden within large corpora. While some research has been recently conducted for factual QA in the genealogical domain, to the best of our knowledge, there is no previous research on the more challenging task of numerical aggregation QA (i.e., answering questions combining aggregation functions, e.g., count, average, max). Numerical aggregation QA is critical for distant reading and analysis for researchers (and the general public) interested in investigating cultural heritage domains. Therefore, in this study, we present a new end-to-end methodology for numerical aggregation QA for genealogical trees that includes: 1) an automatic method for training dataset generation; 2) a transformer-based table selection method, and 3) an optimized transformer-based numerical aggregation QA model. The findings indicate that the proposed architecture, GLOBE, outperforms the state-of-the-art models and pipelines by achieving 87% accuracy for this task compared to only 21% by current state-of-the-art models. This study may have practical implications for genealogical information centers and museums, making genealogical data research easy and scalable for experts as well as the general public.
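A hypothetical sketch of the final stage of numerical aggregation QA: once a system like GLOBE has selected the relevant table, the aggregation function named in the question (count, average, max) reduces to an operation over that table. Column names and data are invented for illustration.

```python
# Executing an aggregation function over a selected genealogical table.
import pandas as pd

family = pd.DataFrame({
    "person": ["Ann", "Ben", "Cara", "Dov"],
    "birth_year": [1890, 1912, 1915, 1940],
    "num_children": [3, 2, 5, 1],
})

AGGREGATIONS = {
    "count": lambda col: int(col.count()),
    "average": lambda col: float(col.mean()),
    "max": lambda col: col.max(),
}

def aggregate(table, func_name, column):
    return AGGREGATIONS[func_name](table[column])

print(aggregate(family, "average", "num_children"))  # 2.75
print(aggregate(family, "max", "birth_year"))        # 1940
```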

Synthesizing Event-centric Knowledge Graphs of Daily Activities Using Virtual Space

  • paper_url: http://arxiv.org/abs/2307.16206
  • repo_url: https://github.com/aistairc/virtualhome2kg
  • paper_authors: Shusaku Egami, Takanori Ugai, Mikiko Oono, Koji Kitamura, Ken Fukuda
  • for: This study provides a framework for constructing knowledge graphs (KGs) of daily activities in virtual space, to support human behavior and decision making in various everyday situations.
  • methods: The methods include virtual space simulations, an event-centric schema, and the generation of contextual semantic data corresponding to the simulated video contents.
  • results: Experiments demonstrate the utility and potential of the VirtualHome2KG framework, enabling applications such as daily activity analysis via querying, embedding, and clustering, as well as fall risk detection.
    Abstract Artificial intelligence (AI) is expected to be embodied in software agents, robots, and cyber-physical systems that can understand the various contextual information of daily life in the home environment to support human behavior and decision making in various situations. Scene graph and knowledge graph (KG) construction technologies have attracted much attention for knowledge-based embodied question answering meeting this expectation. However, collecting and managing real data on daily activities under various experimental conditions in a physical space are quite costly, and developing AI that understands the intentions and contexts is difficult. In the future, data from both virtual spaces, where conditions can be easily modified, and physical spaces, where conditions are difficult to change, are expected to be combined to analyze daily living activities. However, studies on the KG construction of daily activities using virtual space and their application have yet to progress. The potential and challenges must still be clarified to facilitate AI development for human daily life. Thus, this study proposes the VirtualHome2KG framework to generate synthetic KGs of daily life activities in virtual space. This framework augments both the synthetic video data of daily activities and the contextual semantic data corresponding to the video contents based on the proposed event-centric schema and virtual space simulation results. Therefore, context-aware data can be analyzed, and various applications that have conventionally been difficult to develop due to the insufficient availability of relevant data and semantic information can be developed. We also demonstrate herein the utility and potential of the proposed VirtualHome2KG framework through several use cases, including the analysis of daily activities by querying, embedding, and clustering, and fall risk detection among ...
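A minimal sketch of event-centric KG construction with rdflib, in the spirit of the framework: each daily-activity event links an agent, an action, an object, a place, and the next event. The namespace and property names are illustrative assumptions, not the actual VirtualHome2KG schema.

```python
# Building a small event-centric graph of a simulated daily activity.
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/vh2kg/")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

event = EX["event_001"]
g.add((event, RDF.type, EX.Event))
g.add((event, EX.agent, EX.person_1))
g.add((event, EX.action, Literal("grab")))
g.add((event, EX.mainObject, EX.cup_1))
g.add((event, EX.place, EX.kitchen))
g.add((event, EX.startTime, Literal("00:00:12")))
g.add((event, EX.nextEvent, EX["event_002"]))  # event sequencing

print(g.serialize(format="turtle"))
```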

Shuffled Differentially Private Federated Learning for Time Series Data Analytics

  • paper_url: http://arxiv.org/abs/2307.16196
  • repo_url: None
  • paper_authors: Chenxi Huang, Chaoyang Jiang, Zhenghua Chen
  • for: Trustworthy federated learning on time series data, achieving optimal performance while ensuring clients' privacy.
  • methods: Local differential privacy is employed to extend the privacy protection trust boundary to the clients, and shuffle techniques are incorporated to achieve privacy amplification, mitigating the accuracy decline caused by adopting local differential privacy.
  • results: Extensive experiments on five time series datasets show minimal accuracy loss compared to non-private federated learning in both small- and large-client scenarios, and improved accuracy over centralized differentially private federated learning at the same level of privacy protection.
    Abstract Trustworthy federated learning aims to achieve optimal performance while ensuring clients' privacy. Existing privacy-preserving federated learning approaches are mostly tailored for image data, lacking applications for time series data, which have many important applications, like machine health monitoring, human activity recognition, etc. Furthermore, protective noising on a time series data analytics model can significantly interfere with temporal-dependent learning, leading to a greater decline in accuracy. To address these issues, we develop a privacy-preserving federated learning algorithm for time series data. Specifically, we employ local differential privacy to extend the privacy protection trust boundary to the clients. We also incorporate shuffle techniques to achieve a privacy amplification, mitigating the accuracy decline caused by leveraging local differential privacy. Extensive experiments were conducted on five time series datasets. The evaluation results reveal that our algorithm experienced minimal accuracy loss compared to non-private federated learning in both small and large client scenarios. Under the same level of privacy protection, our algorithm demonstrated improved accuracy compared to the centralized differentially private federated learning in both scenarios.
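A minimal sketch of the local-DP-plus-shuffle recipe described above: each client clips and noises its update locally, and a shuffler breaks the client-to-update linkage before aggregation, which is what yields privacy amplification. The clip norm and noise scale are illustrative assumptions.

```python
# Local differential privacy with shuffling for federated aggregation.
import numpy as np

rng = np.random.default_rng(0)

def local_dp_update(update, clip_norm=1.0, noise_scale=0.5):
    # Clip to bound sensitivity, then add Laplace noise on the client side.
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    return clipped + rng.laplace(scale=noise_scale, size=update.shape)

client_updates = [rng.normal(size=4) for _ in range(10)]
noised = np.stack([local_dp_update(u) for u in client_updates])

rng.shuffle(noised)  # shuffler: server cannot attribute an update to a client
global_update = noised.mean(axis=0)
print(global_update)
```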

CLGT: A Graph Transformer for Student Performance Prediction in Collaborative Learning

  • paper_url: http://arxiv.org/abs/2308.02038
  • repo_url: https://github.com/tianhao-peng/clgt
  • paper_authors: Tianhao Peng, Yu Liang, Wenjun Wu, Jian Ren, Zhao Pengrui, Yanjun Pu
  • for: This study models and predicts student performance in collaborative learning paradigms. Most of the literature focuses on discussion forums and social learning networks; only a few works investigate how students interact in team projects and how such interactions affect academic performance. To bridge this gap, a software engineering course, in which students team up to complete a software project, is chosen as the study subject.
  • methods: A student interaction graph is constructed from the activities of students grouped in various teams. Based on this graph, an extended graph transformer framework for collaborative learning (CLGT) is proposed for evaluating and predicting student performance, together with an interpretation module that explains the prediction results and visualizes student interaction patterns.
  • results: Experimental results show that the proposed CLGT outperforms the baseline models on real-world datasets. Moreover, CLGT can identify students with poor performance in the collaborative learning paradigm and give teachers early warnings, so that appropriate assistance can be provided.
    Abstract Modeling and predicting the performance of students in collaborative learning paradigms is an important task. Most of the research presented in literature regarding collaborative learning focuses on the discussion forums and social learning networks. There are only a few works that investigate how students interact with each other in team projects and how such interactions affect their academic performance. In order to bridge this gap, we choose a software engineering course as the study subject. The students who participate in a software engineering course are required to team up and complete a software project together. In this work, we construct an interaction graph based on the activities of students grouped in various teams. Based on this student interaction graph, we present an extended graph transformer framework for collaborative learning (CLGT) for evaluating and predicting the performance of students. Moreover, the proposed CLGT contains an interpretation module that explains the prediction results and visualizes the student interaction patterns. The experimental results confirm that the proposed CLGT outperforms the baseline models in terms of performing predictions based on the real-world datasets. Moreover, the proposed CLGT differentiates the students with poor performance in the collaborative learning paradigm and gives teachers early warnings, so that appropriate assistance can be provided.
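A small sketch of the student interaction graph that a framework like CLGT would consume, built with networkx from team-activity logs; the log format, the edge weighting by interaction counts, and the commit-count node feature are illustrative assumptions.

```python
# Constructing a weighted student interaction graph from activity logs.
import networkx as nx

# (student_a, student_b, num_interactions) within a project team
interaction_log = [("s1", "s2", 14), ("s1", "s3", 3), ("s2", "s3", 7)]

G = nx.Graph()
for a, b, count in interaction_log:
    G.add_edge(a, b, weight=count)

# Node features (e.g., commit counts) would feed the graph transformer.
nx.set_node_attributes(G, {"s1": 42, "s2": 17, "s3": 5}, name="commits")
print(G.edges(data=True))
```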

ESP: Exploiting Symmetry Prior for Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.16186
  • repo_url: None
  • paper_authors: Xin Yu, Rongye Shi, Pu Feng, Yongkai Tian, Jie Luo, Wenjun Wu
  • for: Improving the data efficiency and model accuracy of multi-agent reinforcement learning (MARL).
  • methods: Data augmentation and a well-designed consistency loss are integrated into existing MARL methods, exploiting the symmetry prior of multi-agent systems; the framework is model-agnostic and applicable to most current MARL algorithms.
  • results: The framework is effective on multiple challenging tasks and demonstrates its superiority in a physical multi-robot testbed.
    Abstract Multi-agent reinforcement learning (MARL) has achieved promising results in recent years. However, most existing reinforcement learning methods require a large amount of data for model training. In addition, data-efficient reinforcement learning requires the construction of strong inductive biases, which are ignored in the current MARL approaches. Inspired by the symmetry phenomenon in multi-agent systems, this paper proposes a framework for exploiting prior knowledge by integrating data augmentation and a well-designed consistency loss into the existing MARL methods. In addition, the proposed framework is model-agnostic and can be applied to most of the current MARL algorithms. Experimental tests on multiple challenging tasks demonstrate the effectiveness of the proposed framework. Moreover, the proposed framework is applied to a physical multi-robot testbed to show its superiority.
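A minimal sketch of a symmetry prior turned into data augmentation plus a consistency loss, assuming a toy left-right reflection symmetry acting on a 4-dimensional observation and on two action logits; the network and the transform are stand-ins, not the paper's design.

```python
# Symmetry-based augmentation with a consistency loss for a policy network.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))

def reflect_obs(obs):
    # Flip the sign of the lateral position/velocity components (dims 1, 3).
    flipped = obs.clone()
    flipped[:, [1, 3]] *= -1
    return flipped

def reflect_logits(logits):
    # Swap "left"/"right" action logits under the reflection.
    return logits[:, [1, 0]]

obs = torch.randn(8, 4)
logits = policy(obs)
logits_aug = policy(reflect_obs(obs))

# Consistency loss: the policy should commute with the symmetry transform.
consistency_loss = nn.functional.mse_loss(reflect_logits(logits), logits_aug)
total_loss = consistency_loss  # added to the usual MARL objective in practice
total_loss.backward()
```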

Data-Driven Modeling with Experimental Augmentation for the Modulation Strategy of the Dual-Active-Bridge Converter

  • paper_url: http://arxiv.org/abs/2307.16173
  • repo_url: None
  • paper_authors: Xinze Li, Josep Pou, Jiaxin Dong, Fanfan Lin, Changyun Wen, Suvajit Mukherjee, Xin Zhang
  • for: Improving the accuracy and practicality of performance models for power converters.
  • methods: Simulation data and experimental data are combined to establish a highly accurate and practical data-driven model: simulation data establishes the basic functional landscape, while experimental data matches actual performance in the real world.
  • results: The approach achieves 99.92% efficiency modeling accuracy, validated in 2-kW hardware experiments where a peak efficiency of 98.45% is attained.
    Abstract For the performance modeling of power converters, the mainstream approaches are essentially knowledge-based, suffering from heavy manpower burden and low modeling accuracy. Recent emerging data-driven techniques greatly relieve human reliance by automatic modeling from simulation data. However, model discrepancy may occur due to unmodeled parasitics, deficient thermal and magnetic models, unpredictable ambient conditions, etc. These inaccurate data-driven models based on pure simulation cannot represent the practical performance in physical world, hindering their applications in power converter modeling. To alleviate model discrepancy and improve accuracy in practice, this paper proposes a novel data-driven modeling with experimental augmentation (D2EA), leveraging both simulation data and experimental data. In D2EA, simulation data aims to establish basic functional landscape, and experimental data focuses on matching actual performance in real world. The D2EA approach is instantiated for the efficiency optimization of a hybrid modulation for neutral-point-clamped dual-active-bridge (NPC-DAB) converter. The proposed D2EA approach realizes 99.92% efficiency modeling accuracy, and its feasibility is comprehensively validated in 2-kW hardware experiments, where the peak efficiency of 98.45% is attained. Overall, D2EA is data-light and can achieve highly accurate and highly practical data-driven models in one shot, and it is scalable to other applications, effortlessly.
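A sketch of the D2EA idea under toy data: a base model is fit on abundant simulation data to establish the functional landscape, and a small residual model fit on scarce experimental data corrects the sim-to-real discrepancy. The models and the synthetic efficiency function are assumptions.

```python
# Base model on simulation data + residual correction on experimental data.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X_sim = rng.uniform(size=(2000, 3))      # e.g., modulation parameters
y_sim = np.sin(X_sim @ [2.0, 1.0, 0.5])  # simulated efficiency (toy)

base = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
base.fit(X_sim, y_sim)

X_exp = rng.uniform(size=(50, 3))        # scarce hardware measurements
y_exp = np.sin(X_exp @ [2.0, 1.0, 0.5]) + 0.05 * X_exp[:, 0]  # sim-real gap

# Residual model learns only the discrepancy left by the base model.
residual = Ridge(alpha=1.0).fit(X_exp, y_exp - base.predict(X_exp))

def d2ea_predict(X):
    return base.predict(X) + residual.predict(X)

print(d2ea_predict(X_exp[:3]))
```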

HierVST: Hierarchical Adaptive Zero-shot Voice Style Transfer

  • paper_url: http://arxiv.org/abs/2307.16171
  • repo_url: None
  • paper_authors: Sang-Hoon Lee, Ha-Yeong Choi, Hyung-Seok Oh, Seong-Whan Lee
  • for: Addressing the inability of zero-shot voice style transfer (VST) systems to transfer the voice style of a novel speaker.
  • methods: A hierarchical adaptive end-to-end zero-shot VST model that requires no text transcripts, trained only on speech data using hierarchical variational inference and self-supervised representations; a hierarchical adaptive generator produces the pitch representation and waveform audio sequentially.
  • results: Experimental results show that the method outperforms other VST models in zero-shot VST scenarios, adapting to novel voice styles and converting speech progressively. Audio samples are available at \url{https://hiervst.github.io/}.
    Abstract Despite rapid progress in the voice style transfer (VST) field, recent zero-shot VST systems still lack the ability to transfer the voice style of a novel speaker. In this paper, we present HierVST, a hierarchical adaptive end-to-end zero-shot VST model. Without any text transcripts, we only use the speech dataset to train the model by utilizing hierarchical variational inference and self-supervised representation. In addition, we adopt a hierarchical adaptive generator that generates the pitch representation and waveform audio sequentially. Moreover, we utilize unconditional generation to improve the speaker-relative acoustic capacity in the acoustic representation. With a hierarchical adaptive structure, the model can adapt to a novel voice style and convert speech progressively. The experimental results demonstrate that our method outperforms other VST models in zero-shot VST scenarios. Audio samples are available at \url{https://hiervst.github.io/}.

An Effective LSTM-DDPM Scheme for Energy Theft Detection and Forecasting in Smart Grid

  • paper_url: http://arxiv.org/abs/2307.16149
  • repo_url: None
  • paper_authors: Xun Yuan, Yang Yang, Arwa Alromih, Prosanta Gope, Biplab Sikdar
  • for: Addressing the interconnected challenges of energy theft detection (ETD) and energy consumption forecasting (ECF) in smart grid systems to ensure system security.
  • methods: The proposed solution combines long short-term memory (LSTM) networks with a denoising diffusion probabilistic model (DDPM) to generate input reconstructions and forecasts; energy theft is identified from the reconstruction and forecasting errors, with the two error-based criteria complementing each other in detecting different types of attacks.
  • results: Extensive experiments on real-world and synthetic datasets show that the proposed scheme outperforms baseline methods in both ETD and ECF; the ensemble method significantly enhances ETD performance, accurately detecting energy theft attacks that baseline methods fail to detect.
    Abstract Energy theft detection (ETD) and energy consumption forecasting (ECF) are two interconnected challenges in smart grid systems. Addressing these issues collectively is crucial for ensuring system security. This paper addresses the interconnected challenges of ETD and ECF in smart grid systems. The proposed solution combines long short-term memory (LSTM) and a denoising diffusion probabilistic model (DDPM) to generate input reconstruction and forecasting. By leveraging the reconstruction and forecasting errors, the system identifies instances of energy theft, with the methods based on reconstruction error and forecasting error complementing each other in detecting different types of attacks. Through extensive experiments on real-world and synthetic datasets, the proposed scheme outperforms baseline methods in ETD and ECF problems. The ensemble method significantly enhances ETD performance, accurately detecting energy theft attacks that baseline methods fail to detect. The research offers a comprehensive and effective solution for addressing ETD and ECF challenges, demonstrating promising results and improved security in smart grid systems.
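A minimal sketch of the complementary detection rule: flag a consumption window as theft when either the reconstruction error or the forecasting error exceeds its threshold. The arrays below stand in for LSTM-DDPM outputs, and the thresholds are illustrative.

```python
# Ensemble theft detection from reconstruction and forecasting errors.
import numpy as np

def detect_theft(x_true, x_recon, x_forecast, tau_recon=0.25, tau_fore=0.25):
    recon_err = np.mean((x_true - x_recon) ** 2)
    fore_err = np.mean((x_true - x_forecast) ** 2)
    # The two criteria complement each other: reconstruction catches
    # profile-level tampering, forecasting catches temporal anomalies.
    return recon_err > tau_recon or fore_err > tau_fore

x = np.array([1.2, 1.1, 0.2, 0.1])       # observed consumption (tampered tail)
x_hat = np.array([1.2, 1.1, 1.0, 0.9])   # reconstruction of a normal profile
x_pred = np.array([1.2, 1.1, 1.0, 1.0])  # forecast from history
print(detect_theft(x, x_hat, x_pred))    # True
```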

Fully $1\times1$ Convolutional Network for Lightweight Image Super-Resolution

  • paper_url: http://arxiv.org/abs/2307.16140
  • repo_url: https://github.com/aitical/scnet
  • paper_authors: Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu
  • for: Lightweight single image super-resolution (SISR), where deep models with large kernels ($3\times3$ or more) achieve strong performance but carry a heavy computational footprint.
  • methods: A simple yet effective fully $1\times1$ convolutional network, named the Shift-Conv-based Network (SCNet), which adds a parameter-free spatial-shift operation so that a fully $1\times1$ convolutional network gains strong representation capability while retaining impressive computational efficiency.
  • results: Extensive experiments show that SCNets, despite their fully $1\times1$ convolutional structure, consistently match or even surpass the performance of existing lightweight SR models that employ regular convolutions.
    Abstract Deep models have achieved significant process on single image super-resolution (SISR) tasks, in particular large models with large kernel ($3\times3$ or more). However, the heavy computational footprint of such models prevents their deployment in real-time, resource-constrained environments. Conversely, $1\times1$ convolutions bring substantial computational efficiency, but struggle with aggregating local spatial representations, an essential capability to SISR models. In response to this dichotomy, we propose to harmonize the merits of both $3\times3$ and $1\times1$ kernels, and exploit a great potential for lightweight SISR tasks. Specifically, we propose a simple yet effective fully $1\times1$ convolutional network, named Shift-Conv-based Network (SCNet). By incorporating a parameter-free spatial-shift operation, it equips the fully $1\times1$ convolutional network with powerful representation capability while impressive computational efficiency. Extensive experiments demonstrate that SCNets, despite its fully $1\times1$ convolutional structure, consistently matches or even surpasses the performance of existing lightweight SR models that employ regular convolutions.
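A sketch of the core SCNet idea: a parameter-free spatial shift of channel groups followed by a $1\times1$ convolution, which lets a fully $1\times1$ network aggregate local spatial context. The four-group layout and the wrap-around torch.roll shift are assumptions (zero-padded shifts are a common alternative).

```python
# Parameter-free spatial shift paired with a 1x1 convolution.
import torch
import torch.nn as nn

class ShiftConv(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        out = x.clone()
        g = x.shape[1] // 4
        # Shift four channel groups by one pixel in four directions.
        out[:, 0*g:1*g] = torch.roll(x[:, 0*g:1*g], shifts=1, dims=2)   # down
        out[:, 1*g:2*g] = torch.roll(x[:, 1*g:2*g], shifts=-1, dims=2)  # up
        out[:, 2*g:3*g] = torch.roll(x[:, 2*g:3*g], shifts=1, dims=3)   # right
        out[:, 3*g:4*g] = torch.roll(x[:, 3*g:4*g], shifts=-1, dims=3)  # left
        return self.conv(out)  # 1x1 conv mixes the shifted channels

x = torch.randn(1, 16, 8, 8)
print(ShiftConv(16)(x).shape)  # torch.Size([1, 16, 8, 8])
```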

User-Controlled Knowledge Fusion in Large Language Models: Balancing Creativity and Hallucination

  • paper_url: http://arxiv.org/abs/2307.16139
  • repo_url: None
  • paper_authors: Chen Zhang
  • for: This paper proposes a user-controllable mechanism that modulates the balance between a large language model's (LLM) creativity and its faithfulness to external knowledge.
  • methods: During fine-tuning, a numerical tag represents the degree of faithfulness to the reference knowledge in the generated responses; this degree is computed through an automated process combining ROUGE scores, Sentence-BERT embeddings, and an LLM self-evaluation score. At inference time, users manipulate the tag to control the LLM's reliance on external knowledge.
  • results: Extensive experiments across various scenarios demonstrate the adaptability and efficacy of the method, showing that it can enhance the versatility of LLMs while maintaining a balance between creativity and hallucination.
    Abstract In modern dialogue systems, the use of Large Language Models (LLMs) has grown exponentially due to their capacity to generate diverse, relevant, and creative responses. Despite their strengths, striking a balance between the LLMs' creativity and their faithfulness to external knowledge remains a key challenge. This paper presents an innovative user-controllable mechanism that modulates the balance between an LLM's imaginative capabilities and its adherence to factual information. Our approach incorporates a numerical tag during the fine-tuning phase of the LLM's training, representing the degree of faithfulness to the reference knowledge in the generated responses. This degree is computed through an automated process that measures lexical overlap using ROUGE scores, semantic similarity using Sentence-BERT embeddings, and an LLM's self-evaluation score. During model inference, users can manipulate this numerical tag, thus controlling the degree of the LLM's reliance on external knowledge. We conduct extensive experiments across various scenarios, demonstrating the adaptability of our method and its efficacy in ensuring the quality and accuracy of the LLM's responses. The results highlight the potential of our approach to enhance the versatility of LLMs while maintaining a balance between creativity and hallucination.
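A sketch of how the numerical faithfulness tag could be computed from the two automated signals named above, using the rouge_score and sentence_transformers libraries; the equal weighting, the 0-10 bucketing, and the omission of the LLM self-evaluation term are assumptions.

```python
# Faithfulness tag from lexical overlap (ROUGE-L) + semantic similarity.
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

sbert = SentenceTransformer("all-MiniLM-L6-v2")
rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def faithfulness_tag(reference: str, response: str) -> int:
    lexical = rouge.score(reference, response)["rougeL"].fmeasure
    emb = sbert.encode([reference, response], convert_to_tensor=True)
    semantic = util.cos_sim(emb[0], emb[1]).item()
    score = 0.5 * lexical + 0.5 * semantic        # blended, roughly in [0, 1]
    return round(10 * max(0.0, min(1.0, score)))  # numerical tag 0..10

print(faithfulness_tag("The Nile is in Africa.", "The Nile flows in Africa."))
```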

Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving

  • paper_url: http://arxiv.org/abs/2307.16121
  • repo_url: None
  • paper_authors: Yang Lou, Qun Song, Qian Xu, Rui Tan, Jianping Wang
  • for: Improving the accuracy and robustness of object detection for autonomous driving perception.
  • methods: Detection results and single-modal uncertainties from different sensors are fused via a mixture of experts, with a gating network analyzing the experts' outputs to determine the fusion weights.
  • results: Achieves up to 10.67%, 3.17%, and 5.40% performance gains over state-of-the-art proposal-level multi-modal object detectors under extreme weather, adversarial, and blinding attack scenarios, respectively.
    Abstract Multi-modal fusion has shown initial promising results for object detection of autonomous driving perception. However, many existing fusion schemes do not consider the quality of each fusion input and may suffer from adverse conditions on one or more sensors. While predictive uncertainty has been applied to characterize single-modal object detection performance at run time, incorporating uncertainties into the multi-modal fusion still lacks effective solutions due primarily to the uncertainty's cross-modal incomparability and distinct sensitivities to various adverse conditions. To fill this gap, this paper proposes Uncertainty-Encoded Mixture-of-Experts (UMoE) that explicitly incorporates single-modal uncertainties into LiDAR-camera fusion. UMoE uses individual expert network to process each sensor's detection result together with encoded uncertainty. Then, the expert networks' outputs are analyzed by a gating network to determine the fusion weights. The proposed UMoE module can be integrated into any proposal fusion pipeline. Evaluation shows that UMoE achieves a maximum of 10.67%, 3.17%, and 5.40% performance gain compared with the state-of-the-art proposal-level multi-modal object detectors under extreme weather, adversarial, and blinding attack scenarios.
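A minimal sketch of uncertainty-encoded mixture-of-experts fusion for two sensors: each expert consumes one modality's detection features concatenated with its encoded uncertainty, and a gating network turns the experts' outputs into fusion weights. All dimensions are illustrative assumptions.

```python
# Two-expert LiDAR-camera fusion with an uncertainty-aware gating network.
import torch
import torch.nn as nn

class UMoEFusion(nn.Module):
    def __init__(self, feat_dim=16, unc_dim=4, hidden=32):
        super().__init__()
        in_dim = feat_dim + unc_dim
        self.expert_lidar = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.expert_camera = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.gate = nn.Sequential(nn.Linear(2 * hidden, 2), nn.Softmax(dim=-1))

    def forward(self, lidar_feat, lidar_unc, cam_feat, cam_unc):
        e1 = self.expert_lidar(torch.cat([lidar_feat, lidar_unc], dim=-1))
        e2 = self.expert_camera(torch.cat([cam_feat, cam_unc], dim=-1))
        w = self.gate(torch.cat([e1, e2], dim=-1))  # per-sample fusion weights
        return w[..., :1] * e1 + w[..., 1:] * e2

fusion = UMoEFusion()
out = fusion(torch.randn(8, 16), torch.rand(8, 4),
             torch.randn(8, 16), torch.rand(8, 4))
print(out.shape)  # torch.Size([8, 32])
```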

AI Increases Global Access to Reliable Flood Forecasts

  • paper_url: http://arxiv.org/abs/2307.16104
  • repo_url: https://github.com/google-research-datasets/global_streamflow_model_paper
  • paper_authors: Grey Nearing, Deborah Cohen, Vusumuzi Dube, Martin Gauch, Oren Gilon, Shaun Harrigan, Avinatan Hassidim, Frederik Kratzert, Asher Metzger, Sella Nevo, Florian Pappenberger, Christel Prudhomme, Guy Shalev, Shlomo Shenzis, Tadele Tekalign, Dana Weitzner, Yossi Matias
  • for: Developing an artificial intelligence (AI) flood forecasting model that provides more accurate and timely flood warnings.
  • methods: The model uses AI techniques together with globally available satellite data and open data to predict extreme hydrological events at timescales up to 7 days in advance.
  • results: The model outperforms the current state-of-the-art global hydrology model (the Copernicus Emergency Management Service Global Flood Awareness System) across all continents, lead times, and return periods, and is especially effective in ungauged basins.
    Abstract Floods are one of the most common and impactful natural disasters, with a disproportionate impact in developing countries that often lack dense streamflow monitoring networks. Accurate and timely warnings are critical for mitigating flood risks, but accurate hydrological simulation models typically must be calibrated to long data records in each watershed where they are applied. We developed an Artificial Intelligence (AI) model to predict extreme hydrological events at timescales up to 7 days in advance. This model significantly outperforms current state of the art global hydrology models (the Copernicus Emergency Management Service Global Flood Awareness System) across all continents, lead times, and return periods. AI is especially effective at forecasting in ungauged basins, which is important because only a few percent of the world's watersheds have stream gauges, with a disproportionate number of ungauged basins in developing countries that are especially vulnerable to the human impacts of flooding. We produce forecasts of extreme events in South America and Africa that achieve reliability approaching the current state of the art in Europe and North America, and we achieve reliability at between 4 and 6-day lead times that are similar to current state of the art nowcasts (0-day lead time). Additionally, we achieve accuracies over 10-year return period events that are similar to current accuracies over 2-year return period events, meaning that AI can provide warnings earlier and over larger and more impactful events. The model that we develop in this paper has been incorporated into an operational early warning system that produces publicly available (free and open) forecasts in real time in over 80 countries. This work using AI and open data highlights a need for increasing the availability of hydrological data to continue to improve global access to reliable flood warnings.

PD-SEG: Population Disaggregation Using Deep Segmentation Networks For Improved Built Settlement Mask

  • paper_url: http://arxiv.org/abs/2307.16084
  • repo_url: None
  • paper_authors: Muhammad Abdul Rahman, Muhammad Ahmad Waseem, Zubair Khalid, Muhammad Tahir, Momin Uppal
  • for: Providing accurate population density statistics to support policy-level decision making and resource allocation for development and planning initiatives.
  • methods: Deep segmentation networks and satellite imagery are used to obtain an accurate built settlement mask, and Points of Interest (POI) data is used to exclude non-residential areas.
  • results: Population counts and population density can be accurately estimated at a resolution of 30 meters by 30 meters.
    Abstract Any policy-level decision-making procedure and academic research involving the optimum use of resources for development and planning initiatives depends on accurate population density statistics. The current cutting-edge datasets offered by WorldPop and Meta do not succeed in achieving this aim for developing nations like Pakistan; the inputs to their algorithms provide flawed estimates that fail to capture the spatial and land-use dynamics. In order to precisely estimate population counts at a resolution of 30 meters by 30 meters, we use an accurate built settlement mask obtained using deep segmentation networks and satellite imagery. The Points of Interest (POI) data is also used to exclude non-residential areas.
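A toy sketch of the disaggregation step: the census total of an administrative unit is spread over the pixels the segmentation network marked as built settlement, after removing POI-flagged non-residential pixels. The 4x4 grid and counts are invented.

```python
# Dasymetric population disaggregation over a built settlement mask.
import numpy as np

census_total = 1000
built_mask = np.array([[1, 1, 0, 0],
                       [1, 1, 0, 0],
                       [0, 1, 1, 0],
                       [0, 0, 0, 0]], dtype=float)
non_residential = np.zeros_like(built_mask)
non_residential[0, 0] = 1  # e.g., a POI-identified factory pixel

weights = built_mask * (1 - non_residential)
population = census_total * weights / weights.sum()
print(population.round(1))  # people per 30 m x 30 m pixel
```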

EnrichEvent: Enriching Social Data with Contextual Information for Emerging Event Extraction

  • paper_url: http://arxiv.org/abs/2307.16082
  • repo_url: None
  • paper_authors: Mohammadali Sefidi Esfahani, Mohammad Akbari
  • for: This paper proposes an event detection method for streaming social data, enabling better detection and characterization of unspecified social events.
  • methods: Lexical and contextual knowledge are leveraged to detect semantically related tweets on social media, and cluster chains are constructed for each event to show its evolving variation over time.
  • results: Experimental results show that the method effectively detects and distinguishes unspecified social events and accurately captures their evolution.
    Abstract Social platforms have emerged as crucial platforms for disseminating information and discussing real-life social events, which offers an excellent opportunity for researchers to design and implement novel event detection frameworks. However, most existing approaches merely exploit keyword burstiness or network structures to detect unspecified events. Thus, they often fail to identify unspecified events regarding the challenging nature of events and social data. Social data, e.g., tweets, is characterized by misspellings, incompleteness, word sense ambiguity, and irregular language, as well as variation in aspects of opinions. Moreover, extracting discriminative features and patterns for evolving events by exploiting the limited structural knowledge is almost infeasible. To address these challenges, in this thesis, we propose a novel framework, namely EnrichEvent, that leverages the lexical and contextual representations of streaming social data. In particular, we leverage contextual knowledge, as well as lexical knowledge, to detect semantically related tweets and enhance the effectiveness of the event detection approaches. Eventually, our proposed framework produces cluster chains for each event to show the evolving variation of the event through time. We conducted extensive experiments to evaluate our framework, validating its high performance and effectiveness in detecting and distinguishing unspecified social events.
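A sketch of one way the cluster chains could be formed: incremental, threshold-based clustering of tweet embeddings, where each new tweet joins the most similar existing event cluster or starts a new one. The embedding source and the 0.7 threshold are illustrative assumptions.

```python
# Incremental clustering of tweet embeddings into evolving event clusters.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def incremental_cluster(embeddings, threshold=0.7):
    clusters = []  # each cluster: {"centroid": vec, "members": [indices]}
    for i, emb in enumerate(embeddings):
        scores = [cosine(emb, c["centroid"]) for c in clusters]
        if scores and max(scores) >= threshold:
            c = clusters[int(np.argmax(scores))]
            c["members"].append(i)
            # Update the running centroid with the new member.
            n = len(c["members"])
            c["centroid"] = c["centroid"] + (emb - c["centroid"]) / n
        else:
            clusters.append({"centroid": emb.copy(), "members": [i]})
    return clusters

rng = np.random.default_rng(0)
tweets = rng.normal(size=(20, 16))  # stand-ins for lexical+contextual embeddings
print([c["members"] for c in incremental_cluster(tweets)])
```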