cs.LG - 2023-08-11

Towards a Causal Probabilistic Framework for Prediction, Action-Selection & Explanations for Robot Block-Stacking Tasks

  • paper_url: http://arxiv.org/abs/2308.06203
  • repo_url: None
  • paper_authors: Ricardo Cannizzaro, Jonathan Routley, Lars Kunze
  • for: This work aims to provide a causal probabilistic framework for autonomous robot block-stacking tasks, so that a robot can perceive and explain the current state of a stacking task across different scenarios and select the next-best action.
  • methods: The framework combines causal inference with physics simulation, embedding a physics simulation capability into a structural causal model so that the robot can reason about the causal relationships governing the stacking task and choose the next-best action from a set of placement candidates.
  • results: The paper proposes this novel causal probabilistic framework, provides exemplar next-best action selection results, and outlines planned experiments in simulated and real-world robot block-stacking tasks.
    Abstract Uncertainties in the real world mean that it is impossible for system designers to anticipate and explicitly design for all scenarios that a robot might encounter. Thus, robots designed like this are fragile and fail outside of highly-controlled environments. Causal models provide a principled framework to encode formal knowledge of the causal relationships that govern the robot's interaction with its environment, in addition to probabilistic representations of noise and uncertainty typically encountered by real-world robots. Combined with causal inference, these models permit an autonomous agent to understand, reason about, and explain its environment. In this work, we focus on the problem of a robot block-stacking task due to the fundamental perception and manipulation capabilities it demonstrates, required by many applications including warehouse logistics and domestic human support robotics. We propose a novel causal probabilistic framework to embed a physics simulation capability into a structural causal model to permit robots to perceive and assess the current state of a block-stacking task, reason about the next-best action from placement candidates, and generate post-hoc counterfactual explanations. We provide exemplar next-best action selection results and outline planned experimentation in simulated and real-world robot block-stacking tasks.

Exploring Predicate Visual Context in Detecting of Human-Object Interactions

  • paper_url: http://arxiv.org/abs/2308.06202
  • repo_url: https://github.com/fredzzhang/pvic
  • paper_authors: Frederic Z. Zhang, Yuhui Yuan, Dylan Campbell, Zhuoyao Zhong, Stephen Gould
  • for: This work examines how the DETR framework performs for human-object interaction (HOI) detection, in particular two-stage transformer-based HOI detectors.
  • methods: Using visualisations and carefully designed experiments, the authors study how best to re-introduce image features via cross-attention, combining an improved query design, an extensive exploration of keys and values, and box pair positional embeddings as spatial guidance to build the enhanced predicate visual context (PViC) model.
  • results: On the HICO-DET and V-COCO benchmarks, the PViC model outperforms state-of-the-art HOI methods while maintaining low training cost.
    Abstract Recently, the DETR framework has emerged as the dominant approach for human--object interaction (HOI) research. In particular, two-stage transformer-based HOI detectors are amongst the most performant and training-efficient approaches. However, these often condition HOI classification on object features that lack fine-grained contextual information, eschewing pose and orientation information in favour of visual cues about object identity and box extremities. This naturally hinders the recognition of complex or ambiguous interactions. In this work, we study these issues through visualisations and carefully designed experiments. Accordingly, we investigate how best to re-introduce image features via cross-attention. With an improved query design, extensive exploration of keys and values, and box pair positional embeddings as spatial guidance, our model with enhanced predicate visual context (PViC) outperforms state-of-the-art methods on the HICO-DET and V-COCO benchmarks, while maintaining low training cost.

Complex Facial Expression Recognition Using Deep Knowledge Distillation of Basic Features

  • paper_url: http://arxiv.org/abs/2308.06197
  • repo_url: https://github.com/angusmaiden/complex-fer
  • paper_authors: Angus Maiden, Bahareh Nakisa
  • for: This work proposes a new method, inspired by human cognition and learning, to accurately recognise compound (complex) facial expression classes from few training samples.
  • methods: The method combines continual learning, knowledge distillation, and a novel Predictive Sorting Memory Replay so that new expression classes are learned quickly while knowledge of known basic classes is retained; it is also the first application of few-shot learning to complex facial expression recognition.
  • results: The method achieves 74.28% Overall Accuracy on new classes, the state of the art in continual learning for complex facial expression recognition, improving on non-continual learning methods by 13.95%; in the few-shot setting it reaches 100% accuracy using a single training sample per expression class.
    Abstract Complex emotion recognition is a cognitive task that has so far eluded the same excellent performance of other tasks that are at or above the level of human cognition. Emotion recognition through facial expressions is particularly difficult due to the complexity of emotions expressed by the human face. For a machine to approach the same level of performance in this domain as a human, it may need to synthesise knowledge and understand new concepts in real-time as humans do. Humans are able to learn new concepts using only few examples, by distilling the important information from memories and discarding the rest. Similarly, continual learning methods learn new classes whilst retaining the knowledge of known classes, whilst few-shot learning methods are able to learn new classes using very few training examples. We propose a novel continual learning method inspired by human cognition and learning that can accurately recognise new compound expression classes using few training samples, by building on and retaining its knowledge of basic expression classes. Using GradCAM visualisations, we demonstrate the relationship between basic and compound facial expressions, which our method leverages through knowledge distillation and a novel Predictive Sorting Memory Replay. Our method achieves the current state-of-the-art in continual learning for complex facial expression recognition with 74.28% Overall Accuracy on new classes. We also demonstrate that using continual learning for complex facial expression recognition achieves far better performance than non-continual learning methods, improving on state-of-the-art non-continual learning methods by 13.95%. To the best of our knowledge, our work is also the first to apply few-shot learning to complex facial expression recognition, achieving the state-of-the-art with 100% accuracy using a single training sample for each expression class.

Assessing Guest Nationality Composition from Hotel Reviews

  • paper_url: http://arxiv.org/abs/2308.06175
  • repo_url: None
  • paper_authors: Fabian Gröger, Marc Pouly, Flavia Tinner, Leif Brandes
  • for: This work uses machine learning to assess the nationality composition of hotel guests, making it possible to assess and monitor the guest profiles of individual businesses.
  • methods: A model built from pre-trained embeddings and stacked LSTM layers automatically extracts references to guest nationalities from unstructured text reviews.
  • results: The study finds that this relatively simple architecture offers a better performance-runtime tradeoff than more complex state-of-the-art language models.
    Abstract Many hotels target guest acquisition efforts to specific markets in order to best anticipate individual preferences and needs of their guests. Likewise, such strategic positioning is a prerequisite for efficient marketing budget allocation. Official statistics report on the number of visitors from different countries, but no fine-grained information on the guest composition of individual businesses exists. There is, however, growing interest in such data from competitors, suppliers, researchers and the general public. We demonstrate how machine learning can be leveraged to extract references to guest nationalities from unstructured text reviews in order to dynamically assess and monitor the dynamics of guest composition of individual businesses. In particular, we show that a rather simple architecture of pre-trained embeddings and stacked LSTM layers provides a better performance-runtime tradeoff than more complex state-of-the-art language models.
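As a rough illustration of the architecture described above (pre-trained embeddings feeding stacked LSTM layers that tag nationality mentions in review text), the sketch below shows a minimal PyTorch sequence-labelling model; the embedding source, dimensions, and tag set are placeholder assumptions, not the authors' actual configuration.

```python
# Minimal sketch: frozen pre-trained embeddings + stacked BiLSTM layers that
# tag each review token as part of a nationality mention or not.
import torch
import torch.nn as nn

class NationalityTagger(nn.Module):
    def __init__(self, pretrained_vectors, hidden_size=128, num_layers=2, num_tags=2):
        super().__init__()
        # pretrained_vectors: (vocab_size, emb_dim) tensor, e.g. word2vec/GloVe rows
        self.embedding = nn.Embedding.from_pretrained(pretrained_vectors, freeze=True)
        self.lstm = nn.LSTM(
            input_size=pretrained_vectors.size(1),
            hidden_size=hidden_size,
            num_layers=num_layers,          # "stacked" LSTM layers
            batch_first=True,
            bidirectional=True,
        )
        self.classifier = nn.Linear(2 * hidden_size, num_tags)

    def forward(self, token_ids):           # token_ids: (batch, seq_len)
        emb = self.embedding(token_ids)     # (batch, seq_len, emb_dim)
        out, _ = self.lstm(emb)             # (batch, seq_len, 2*hidden)
        return self.classifier(out)         # per-token tag logits

# Usage: logits = NationalityTagger(vectors)(batch_of_token_ids)
```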

  • paper_url: http://arxiv.org/abs/2308.06173
  • repo_url: None
  • paper_authors: Amira Guesmi, Muhammad Abdullah Hanif, Bassem Ouni, Muhammed Shafique
  • for: This paper presents a comprehensive survey of physical adversarial attacks, aiming to provide a thorough understanding of such attacks in the physical world, including their key characteristics and categorization.
  • methods: The survey analyses physical adversarial attack methods across application domains, including classification, detection, face recognition, semantic segmentation, and depth estimation, and assesses them in terms of effectiveness, stealthiness, and robustness.
  • results: The paper discusses the challenges specific to attacks in the physical world, compares the performance of the surveyed attack methods, and outlines future research directions, including enhanced defense mechanisms, novel attack strategies, evaluation across application domains, and standardized benchmarks and evaluation criteria for physical adversarial attacks.
    Abstract In this paper, we present a comprehensive survey of the current trends focusing specifically on physical adversarial attacks. We aim to provide a thorough understanding of the concept of physical adversarial attacks, analyzing their key characteristics and distinguishing features. Furthermore, we explore the specific requirements and challenges associated with executing attacks in the physical world. Our article delves into various physical adversarial attack methods, categorized according to their target tasks in different applications, including classification, detection, face recognition, semantic segmentation and depth estimation. We assess the performance of these attack methods in terms of their effectiveness, stealthiness, and robustness. We examine how each technique strives to ensure the successful manipulation of DNNs while mitigating the risk of detection and withstanding real-world distortions. Lastly, we discuss the current challenges and outline potential future research directions in the field of physical adversarial attacks. We highlight the need for enhanced defense mechanisms, the exploration of novel attack strategies, the evaluation of attacks in different application domains, and the establishment of standardized benchmarks and evaluation criteria for physical adversarial attacks. Through this comprehensive survey, we aim to provide a valuable resource for researchers, practitioners, and policymakers to gain a holistic understanding of physical adversarial attacks in computer vision and facilitate the development of robust and secure DNN-based systems.

Phased Deep Spatio-temporal Learning for Highway Traffic Volume Prediction

  • paper_url: http://arxiv.org/abs/2308.06155
  • repo_url: None
  • paper_authors: Weilong Ding, Tianpu Zhang, Zhe Wang
  • for: This paper aims to predict daily traffic volume on inter-city highways by learning spatio-temporal features in depth, thereby improving predictive accuracy.
  • methods: A phased deep spatio-temporal learning method is used: data is carefully normalized according to its latent long-tail distribution, heterogeneous spatio-temporal features are fused, and a hybrid model combining a fully convolutional network (FCN) and long short-term memory (LSTM) is trained.
  • results: On real-world data, the method improves predictive accuracy over traditional models, reaching 5.269 and 0.997 on the MPAE and R-squared metrics, respectively.
    Abstract Inter-city highway transportation is significant for citizens' modern urban life and generates heterogeneous sensory data with spatio-temporal characteristics. As a routine analysis in the transportation domain, daily traffic volume estimation at highway toll stations faces two challenges: a lack of exploration of correlative spatio-temporal features from a long-term perspective, and a lack of effective means to deal with data imbalance, which always deteriorates predictive performance. In this paper, a deep spatio-temporal learning method is proposed to predict daily traffic volume in three phases. In the feature pre-processing phase, data is carefully normalized according to its latent long-tail distribution. In the spatio-temporal learning phase, a hybrid model combining a fully convolutional network (FCN) and long short-term memory (LSTM) is employed, which considers time, space, meteorology, and calendar information from heterogeneous data. In the decision phase, traffic volumes for a coming day at network-wide toll stations are obtained effectively, with special calibration for the vital few highway stations. Using real-world data from one Chinese provincial highway, extensive experiments show that our method achieves a distinct improvement in predictive accuracy over various traditional models, reaching 5.269 and 0.997 on the MPAE and R-squared metrics, respectively.
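The hybrid FCN/LSTM idea can be sketched roughly as follows; the input layout (a sequence of daily station-level snapshots) and layer sizes are assumptions for illustration only, not the paper's exact design.

```python
# Rough sketch of a hybrid spatio-temporal model: a small fully convolutional
# branch encodes each day's station-level snapshot, an LSTM models the
# day-to-day sequence, and a head predicts next-day volume per toll station.
import torch
import torch.nn as nn

class FCNLSTMVolume(nn.Module):
    def __init__(self, num_stations, hidden=64):
        super().__init__()
        self.fcn = nn.Sequential(            # spatial feature extractor
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_stations)

    def forward(self, x):                    # x: (batch, days, num_stations)
        b, t, s = x.shape
        feats = self.fcn(x.reshape(b * t, 1, s)).squeeze(-1)   # (b*t, 32)
        out, _ = self.lstm(feats.reshape(b, t, 32))
        return self.head(out[:, -1])          # next-day volume per station
```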

Gaussian Process Regression for Maximum Entropy Distribution

  • paper_url: http://arxiv.org/abs/2308.06149
  • repo_url: None
  • paper_authors: Mohsen Sadr, Manuel Torrilhon, M. Hossein Gorji
  • for: Approximating maximum-entropy distributions for moment closure problems.
  • methods: Gaussian process priors are used to approximate the Lagrange multipliers as a map of a given set of moments; several kernel functions are examined, and the hyperparameters are optimized by maximizing the log-likelihood.
  • results: The performance of the resulting data-driven maximum-entropy closure is studied on several test cases, including the relaxation of non-equilibrium distributions governed by the Bhatnagar-Gross-Krook and Boltzmann kinetic equations.
    Abstract Maximum-Entropy Distributions offer an attractive family of probability densities suitable for moment closure problems. Yet finding the Lagrange multipliers which parametrize these distributions, turns out to be a computational bottleneck for practical closure settings. Motivated by recent success of Gaussian processes, we investigate the suitability of Gaussian priors to approximate the Lagrange multipliers as a map of a given set of moments. Examining various kernel functions, the hyperparameters are optimized by maximizing the log-likelihood. The performance of the devised data-driven Maximum-Entropy closure is studied for couple of test cases including relaxation of non-equilibrium distributions governed by Bhatnagar-Gross-Krook and Boltzmann kinetic equations.
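A minimal sketch of the underlying idea, regressing Lagrange multipliers on moment vectors with a Gaussian process whose kernel hyperparameters are set by maximizing the log-likelihood, might look as follows; scikit-learn, the RBF kernel, and the placeholder data are illustrative assumptions.

```python
# Sketch: learn the map from moment vectors to Lagrange multipliers with a GP,
# optimizing kernel hyperparameters by maximizing the (log-)marginal likelihood.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Placeholder training data: rows are moment vectors, targets are the
# corresponding Lagrange multipliers obtained from an offline solver.
moments = np.random.rand(200, 4)          # e.g. the first four moments
multipliers = np.random.rand(200, 4)      # matching Lagrange multipliers

kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(4))
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(moments, multipliers)              # hyperparameters set by max log-likelihood

# Closure step: predict multipliers (with uncertainty) for new moment vectors.
pred_mean, pred_std = gp.predict(np.random.rand(5, 4), return_std=True)
```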

A New Approach to Overcoming Zero Trade in Gravity Models to Avoid Indefinite Values in Linear Logarithmic Equations and Parameter Verification Using Machine Learning

  • paper_url: http://arxiv.org/abs/2308.06303
  • repo_url: None
  • paper_authors: Mikrajuddin Abdullah
  • for: To overcome the indefinite values that zero trade flows produce in the logarithmic linear equation of the gravity model, so that the model can still be used to describe international trade.
  • methods: A two-step technique for determining the gravity parameters is proposed: first, local linear regression establishes a dummy value that substitutes for zero trade flows, and then the gravity parameters are estimated using iterative techniques.
  • results: Machine learning is used to verify the estimated parameters by analysing their position in clusters; the powers of GDP and distance fall in the same cluster, both with values of roughly one. The strategy can be applied to other problems involving log-linear regression.
    Abstract The presence of a high number of zero flow trades continues to provide a challenge in identifying gravity parameters to explain international trade using the gravity model. Linear regression with a logarithmic linear equation encounters an indefinite value on the logarithmic trade. Although several approaches to solving this problem have been proposed, the majority of them are no longer based on linear regression, making the process of finding solutions more complex. In this work, we suggest a two-step technique for determining the gravity parameters: first, perform linear regression locally to establish a dummy value to substitute trade flow zero, and then estimating the gravity parameters. Iterative techniques are used to determine the optimum parameters. Machine learning is used to test the estimated parameters by analyzing their position in the cluster. We calculated international trade figures for 2004, 2009, 2014, and 2019. We just examine the classic gravity equation and discover that the powers of GDP and distance are in the same cluster and are both worth roughly one. The strategy presented here can be used to solve other problems involving log-linear regression.
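A hedged sketch of the two-step idea, imputing a dummy value for zero trade flows via a local linear regression and then fitting the log-linear gravity equation, is given below; the variable names, the nearest-neighbour notion of "local", and the pooled GDP term are illustrative assumptions rather than the paper's exact procedure.

```python
# Sketch of two-step gravity estimation: (1) replace zero trade flows with a
# dummy value predicted by a local linear regression on similar country pairs,
# (2) fit the classic log-linear gravity equation
#     log T = c + a*log(GDP_i * GDP_j) - b*log(D_ij).
import numpy as np
from sklearn.linear_model import LinearRegression

def fill_zero_flows(log_gdp_prod, log_dist, trade, k=20):
    X, y = np.column_stack([log_gdp_prod, log_dist]), trade
    filled = trade.astype(float)
    nonzero = y > 0
    for i in np.where(~nonzero)[0]:
        # local regression on the k nearest non-zero pairs in feature space
        d = np.linalg.norm(X[nonzero] - X[i], axis=1)
        idx = np.argsort(d)[:k]
        lr = LinearRegression().fit(X[nonzero][idx], np.log(y[nonzero][idx]))
        filled[i] = np.exp(lr.predict(X[i:i + 1])[0])   # dummy value for zero flow
    return filled

def fit_gravity(log_gdp_prod, log_dist, trade):
    trade = fill_zero_flows(log_gdp_prod, log_dist, trade)
    X = np.column_stack([log_gdp_prod, log_dist])
    model = LinearRegression().fit(X, np.log(trade))
    return model.intercept_, model.coef_   # constant, (GDP power, distance power)
```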

Identification of the Relevance of Comments in Codes Using Bag of Words and Transformer Based Models

  • paper_url: http://arxiv.org/abs/2308.06144
  • repo_url: https://github.com/sruthisudheer/comment-classification-of-c-code
  • paper_authors: Sruthi S, Tanmay Basu
  • for: The goal of this work is to classify comments on different code segments as relevant or not.
  • methods: Different feature engineering schemes and text classification techniques are explored, including the classical bag-of-words model with several classifiers and fine-tuned transformer-based models.
  • results: The bag-of-words model outperforms the transformer-based models on the training corpus, but the performance of the submitted runs is not reasonably good on either the training or the test corpus.
    Abstract The Forum for Information Retrieval (FIRE) started a shared task this year for classification of comments of different code segments. This is a binary text classification task where the objective is to identify whether comments given for certain code segments are relevant or not. The BioNLP-IISERB group at the Indian Institute of Science Education and Research Bhopal (IISERB) participated in this task and submitted five runs for five different models. The paper presents the overview of the models and other significant findings on the training corpus. The methods involve different feature engineering schemes and text classification techniques. The performance of the classical bag of words model and the transformer-based models was explored to identify significant features from the given training corpus. We explored different classifiers, viz., random forest, support vector machine and logistic regression, using the bag of words model. Furthermore, pre-trained transformer-based models like BERT, RoBERTa and ALBERT were also used by fine-tuning them on the given training corpus. The performance of these models over the training corpus was reported and the best five models were implemented on the given test corpus. The empirical results show that the bag of words model outperforms the transformer-based models; however, the performance of our runs is not reasonably good on either the training or the test corpus. This paper also addresses the limitations of the models and scope for further improvement.
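For the bag-of-words runs, a minimal scikit-learn pipeline of the kind described, count features feeding a classical classifier such as logistic regression, random forest, or an SVM, could look like the sketch below; the toy comments and labels are placeholders.

```python
# Sketch: classical bag-of-words pipeline for classifying code comments as
# relevant / not relevant, comparable to the classical runs described above.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

comments = ["// increment counter", "// TODO remove this", "/* parses the header */"]
labels = [1, 0, 1]   # 1 = relevant to the surrounding code, 0 = not relevant

pipeline = Pipeline([
    ("bow", CountVectorizer(ngram_range=(1, 2), lowercase=True)),
    ("clf", LogisticRegression(max_iter=1000)),
])
# Swap the classifier for RandomForestClassifier() or LinearSVC() to reproduce
# the other classical runs described in the paper.
pipeline.fit(comments, labels)
print(pipeline.predict(["// returns the number of rows"]))
```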

CompTLL-UNet: Compressed Domain Text-Line Localization in Challenging Handwritten Documents using Deep Feature Learning from JPEG Coefficients

  • paper_url: http://arxiv.org/abs/2308.06142
  • repo_url: None
  • paper_authors: Bulla Rajesh, Sk Mahafuz Zaman, Mohammed Javed, P. Nagabhushan
  • for: This work proposes a method for localizing text-lines directly from JPEG compressed coefficients, improving the efficiency and scalability of handwritten document analysis.
  • methods: A modified U-Net architecture, the Compressed Text-Line Localization Network (CompTLL-UNet), learns deep features directly from the JPEG compressed coefficients, without full decompression.
  • results: CompTLL-UNet achieves state-of-the-art performance on JPEG compressed versions of benchmark datasets while reducing storage and computational costs.
    Abstract Automatic localization of text-lines in handwritten documents is still an open and challenging research problem. Various writing issues such as uneven spacing between the lines, oscillating and touching text, and the presence of skew become much more challenging when the case of complex handwritten document images are considered for segmentation directly in their respective compressed representation. This is because, the conventional way of processing compressed documents is through decompression, but here in this paper, we propose an idea that employs deep feature learning directly from the JPEG compressed coefficients without full decompression to accomplish text-line localization in the JPEG compressed domain. A modified U-Net architecture known as Compressed Text-Line Localization Network (CompTLL-UNet) is designed to accomplish it. The model is trained and tested with JPEG compressed version of benchmark datasets including ICDAR2017 (cBAD) and ICDAR2019 (cBAD), reporting the state-of-the-art performance with reduced storage and computational costs in the JPEG compressed domain.

Application of Artificial Neural Networks for Investigation of Pressure Filtration Performance, a Zinc Leaching Filter Cake Moisture Modeling

  • paper_url: http://arxiv.org/abs/2308.06138
  • repo_url: None
  • paper_authors: Masoume Kazemi, Davood Moradkhani, Alireza A. Alipour
  • for: This study develops an artificial neural network (ANN) model to predict the cake moisture of the pressure filtration process in zinc production.
  • methods: The ANN model was trained on 288 tests using two types of filter fabric, polypropylene (S1) and polyester (S2), with seven input parameters: temperature, solid concentration, pH, air-blow time, cake thickness, pressure, and filtration time.
  • results: The model predicted cake moisture with high accuracy, reaching R2 values of 0.88 and 0.83, MSE values of 6.243x10-07 and 1.086x10-06, and MAE values of 0.00056 and 0.00088 for S1 and S2, respectively.
    Abstract Machine Learning (ML) is a powerful tool for material science applications. Artificial Neural Network (ANN) is a machine learning technique that can provide high prediction accuracy. This study aimed to develop an ANN model to predict the cake moisture of the pressure filtration process of zinc production. The cake moisture was influenced by seven parameters: temperature (35 and 65 Celsius), solid concentration (0.2 and 0.38 g/L), pH (2, 3.5, and 5), air-blow time (2, 10, and 15 min), cake thickness (14, 20, 26, and 34 mm), pressure, and filtration time. The study conducted 288 tests using two types of fabrics: polypropylene (S1) and polyester (S2). The ANN model was evaluated by the Coefficient of determination (R2), the Mean Square Error (MSE), and the Mean Absolute Error (MAE) metrics for both datasets. The results showed R2 values of 0.88 and 0.83, MSE values of 6.243x10-07 and 1.086x10-06, and MAE values of 0.00056 and 0.00088 for S1 and S2, respectively. These results indicated that the ANN model could predict the cake moisture of pressure filtration in the zinc leaching process with high accuracy.
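A rough sketch of such an ANN regression, with the seven process parameters as inputs, cake moisture as output, and the same three metrics, could be written as follows; the network size and the random placeholder data are not the study's actual setup.

```python
# Sketch: small feed-forward ANN regressing cake moisture from the seven
# filtration parameters, evaluated with the same metrics as the study.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split

# Placeholder data: columns = temperature, solid concentration, pH, air-blow
# time, cake thickness, pressure, filtration time; target = cake moisture.
X = np.random.rand(288, 7)
y = np.random.rand(288)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=5000))
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(r2_score(y_te, pred), mean_squared_error(y_te, pred), mean_absolute_error(y_te, pred))
```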

PDE Discovery for Soft Sensors Using Coupled Physics-Informed Neural Network with Akaike’s Information Criterion

  • paper_url: http://arxiv.org/abs/2308.06132
  • repo_url: None
  • paper_authors: Aina Wang, Pan Qin, Xi-Ming Sun
  • for: This paper addresses the discovery of PDE structures for soft sensors in industrial processes with spatiotemporal dependence.
  • methods: A coupled physics-informed neural network with Akaike's information criterion (CPINN-AIC) is proposed: CPINN obtains solutions and source terms satisfying candidate PDEs via a data-physics-hybrid loss involving undetermined combinations of differential operators, and AIC is then used to discover the proper combination of operators.
  • results: Experiments on artificial and practical datasets verify the feasibility and effectiveness of CPINN-AIC for soft sensors.
    Abstract Soft sensors have been extensively used to monitor key variables using easy-to-measure variables and mathematical models. Partial differential equations (PDEs) are model candidates for soft sensors in industrial processes with spatiotemporal dependence. However, gaps often exist between idealized PDEs and practical situations. Discovering proper structures of PDEs, including the differential operators and source terms, can remedy the gaps. To this end, a coupled physics-informed neural network with Akaike's criterion information (CPINN-AIC) is proposed for PDE discovery of soft sensors. First, CPINN is adopted for obtaining solutions and source terms satisfying PDEs. Then, we propose a data-physics-hybrid loss function for training CPINN, in which undetermined combinations of differential operators are involved. Consequently, AIC is used to discover the proper combination of differential operators. Finally, the artificial and practical datasets are used to verify the feasibility and effectiveness of CPINN-AIC for soft sensors. The proposed CPINN-AIC is a data-driven method to discover proper PDE structures and neural network-based solutions for soft sensors.
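The physics-informed part of such a model can be sketched as below: a network predicts the field u(x, t), automatic differentiation supplies candidate differential operators, and the training loss mixes data fit with the residual of a candidate PDE; AIC would then compare candidate operator combinations by trading off fit against the number of active terms. The particular operators and network size here are illustrative assumptions, not the paper's CPINN-AIC implementation.

```python
# Sketch (PyTorch): data-physics-hybrid loss for one candidate PDE structure,
# e.g. u_t = c1 * u_xx + c2 * u_x + source. AIC can then score different
# candidate operator subsets: AIC = 2k - 2 * log-likelihood (k = #active terms).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))
c = nn.Parameter(torch.zeros(2))   # undetermined coefficients of candidate operators
# An optimizer would update list(net.parameters()) + [c].

def pde_residual(xt):              # xt columns assumed to be (x, t)
    xt = xt.clone().requires_grad_(True)
    u = net(xt)
    grads = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]
    u_x, u_t = grads[:, 0:1], grads[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, 0:1]
    return u_t - (c[0] * u_xx + c[1] * u_x)   # residual of the candidate PDE

def hybrid_loss(xt_data, u_data, xt_colloc):
    data_term = ((net(xt_data) - u_data) ** 2).mean()       # fit to sensor data
    physics_term = (pde_residual(xt_colloc) ** 2).mean()     # PDE residual at collocation points
    return data_term + physics_term
```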

Uncertainty Quantification for Image-based Traffic Prediction across Cities

  • paper_url: http://arxiv.org/abs/2308.06129
  • repo_url: https://github.com/alextimans/traffic4cast-uncertainty
  • paper_authors: Alexander Timans, Nina Wiedemann, Nishant Kumar, Ye Hong, Martin Raubal
  • for: This paper investigates uncertainty quantification (UQ) methods for traffic prediction, aiming to improve model interpretability and support decision-making.
  • methods: Two epistemic and two aleatoric UQ methods are compared on temporal and spatio-temporal transfer tasks over a large-scale image-based traffic dataset spanning multiple cities and time periods.
  • results: Meaningful uncertainty estimates can be recovered and used for unsupervised outlier detection of changes in city traffic dynamics; a case study for the city of Moscow shows that the approach captures both temporal and spatial effects on traffic behaviour.
    Abstract Despite the strong predictive performance of deep learning models for traffic prediction, their widespread deployment in real-world intelligent transportation systems has been restrained by a lack of interpretability. Uncertainty quantification (UQ) methods provide an approach to induce probabilistic reasoning, improve decision-making and enhance model deployment potential. To gain a comprehensive picture of the usefulness of existing UQ methods for traffic prediction and the relation between obtained uncertainties and city-wide traffic dynamics, we investigate their application to a large-scale image-based traffic dataset spanning multiple cities and time periods. We compare two epistemic and two aleatoric UQ methods on both temporal and spatio-temporal transfer tasks, and find that meaningful uncertainty estimates can be recovered. We further demonstrate how uncertainty estimates can be employed for unsupervised outlier detection on changes in city traffic dynamics. We find that our approach can capture both temporal and spatial effects on traffic behaviour in a representative case study for the city of Moscow. Our work presents a further step towards boosting uncertainty awareness in traffic prediction tasks, and aims to highlight the value contribution of UQ methods to a better understanding of city traffic dynamics.

Learning Control Policies for Variable Objectives from Offline Data

  • paper_url: http://arxiv.org/abs/2308.06127
  • repo_url: None
  • paper_authors: Marc Weber, Phillip Swazinna, Daniel Hein, Steffen Udluft, Volkmar Sterzing
  • for: This work provides a practical way to obtain advanced control strategies for complex dynamical systems, particularly when direct interaction with the environment is not available.
  • methods: A conceptual extension of model-based policy search, the variable objective policy (VOP), trains policies to generalize efficiently over a variety of objectives that parameterize the reward function.
  • results: By altering the objectives passed as input to the policy, users can adjust its behavior or re-balance optimization targets at runtime, without collecting additional observation batches or re-training.
    Abstract Offline reinforcement learning provides a viable approach to obtain advanced control strategies for dynamical systems, in particular when direct interaction with the environment is not available. In this paper, we introduce a conceptual extension for model-based policy search methods, called variable objective policy (VOP). With this approach, policies are trained to generalize efficiently over a variety of objectives, which parameterize the reward function. We demonstrate that by altering the objectives passed as input to the policy, users gain the freedom to adjust its behavior or re-balance optimization targets at runtime, without need for collecting additional observation batches or re-training.
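The core idea, a policy that receives the objective parameters as an extra input so its behaviour can be re-balanced at runtime, can be sketched as follows; the state, objective, and action dimensions are placeholders.

```python
# Sketch: a variable-objective policy takes (state, objective weights) as input,
# so changing the objective vector at inference time changes behaviour without
# retraining. It would be trained offline, e.g. with model-based policy search.
import torch
import torch.nn as nn

class VariableObjectivePolicy(nn.Module):
    def __init__(self, state_dim, objective_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + objective_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state, objective):
        return self.net(torch.cat([state, objective], dim=-1))

policy = VariableObjectivePolicy(state_dim=10, objective_dim=3, action_dim=2)
state = torch.randn(1, 10)
a_perf = policy(state, torch.tensor([[1.0, 0.0, 0.0]]))   # e.g. favour throughput
a_safe = policy(state, torch.tensor([[0.2, 0.8, 0.0]]))   # e.g. favour wear reduction
```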

Learning Deductive Reasoning from Synthetic Corpus based on Formal Logic

  • paper_url: http://arxiv.org/abs/2308.07336
  • repo_url: https://github.com/hitachi-nlp/fld
  • paper_authors: Terufumi Morishita, Gaku Morio, Atsuki Yamaguchi, Yasuhiro Sogawa
  • for: This work aims to help language models (LMs) acquire deductive reasoning ability so that they can better understand and reason over logical relations in text.
  • methods: A well-grounded set of deduction rules based on formal logic theory, which can derive any other deduction rules when combined in a multistep way, is used to generate a synthetic deduction corpus, FLD (Formal Logic Deduction), on which LMs are trained.
  • results: Experiments show that LMs trained on FLD acquire more generalizable deductive reasoning ability; the study also identifies which aspects of deductive reasoning deduction corpora can and cannot enhance, and discusses future directions for each aspect.
    Abstract We study a synthetic corpus-based approach for language models (LMs) to acquire logical deductive reasoning ability. The previous studies generated deduction examples using specific sets of deduction rules. However, these rules were limited or otherwise arbitrary. This can limit the generalizability of acquired deductive reasoning ability. We rethink this and adopt a well-grounded set of deduction rules based on formal logic theory, which can derive any other deduction rules when combined in a multistep way. We empirically verify that LMs trained on the proposed corpora, which we name $\textbf{FLD}$ ($\textbf{F}$ormal $\textbf{L}$ogic $\textbf{D}$eduction), acquire more generalizable deductive reasoning ability. Furthermore, we identify the aspects of deductive reasoning ability on which deduction corpora can enhance LMs and those on which they cannot. Finally, on the basis of these results, we discuss the future directions for applying deduction corpora or other approaches for each aspect. We release the code, data, and models.

Hawkes Processes with Delayed Granger Causality

  • paper_url: http://arxiv.org/abs/2308.06106
  • repo_url: None
  • paper_authors: Chao Yang, Hengyuan Miao, Shuang Li
  • for: This work explicitly models delayed Granger causal effects with multivariate Hawkes processes, motivated by the observation that a causal event usually takes some time to exert its effect; studying this time lag is itself of scientific interest.
  • methods: The time lags are treated as latent variables, and a Variational Auto-Encoder (VAE) algorithm is formulated to approximate their posterior distribution under complex settings; identifiability of the delay parameter is proven under mild conditions.
  • results: Experiments on synthetic and real data show good event prediction and time-lag inference accuracy.
    Abstract We aim to explicitly model the delayed Granger causal effects based on multivariate Hawkes processes. The idea is inspired by the fact that a causal event usually takes some time to exert an effect. Studying this time lag itself is of interest. Given the proposed model, we first prove the identifiability of the delay parameter under mild conditions. We further investigate a model estimation method under a complex setting, where we want to infer the posterior distribution of the time lags and understand how this distribution varies across different scenarios. We treat the time lags as latent variables and formulate a Variational Auto-Encoder (VAE) algorithm to approximate the posterior distribution of the time lags. By explicitly modeling the time lags in Hawkes processes, we add flexibility to the model. The inferred time-lag posterior distributions are of scientific meaning and help trace the original causal time that supports the root cause analysis. We empirically evaluate our model's event prediction and time-lag inference accuracy on synthetic and real data, achieving promising results.
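The modelling idea, an excitation kernel whose influence only begins after a (possibly learned) delay, can be written down as a small sketch of the conditional intensity; the exponential kernel and the parameter values are illustrative assumptions.

```python
# Sketch: conditional intensity of a multivariate Hawkes process in which the
# influence of dimension j on dimension i only begins after a delay d[i][j]:
#   lambda_i(t) = mu[i] + sum_j sum_{t_k in H_j, t_k + d[i][j] < t}
#                 a[i][j] * exp(-b * (t - t_k - d[i][j]))
import numpy as np

def delayed_hawkes_intensity(t, history, mu, a, d, b=1.0):
    """history[j] = array of past event times of dimension j (all < t)."""
    lam = mu.copy()
    for i in range(len(mu)):
        for j, times in enumerate(history):
            elapsed = t - times - d[i][j]      # time since the delayed onset
            active = elapsed > 0               # only events whose effect has begun
            lam[i] += a[i][j] * np.exp(-b * elapsed[active]).sum()
    return lam

# Example: 2-dimensional process; dimension 1 influences dimension 0 after a lag of 2.0.
mu = np.array([0.1, 0.2])
a = np.array([[0.0, 0.5], [0.3, 0.0]])
d = np.array([[0.0, 2.0], [0.5, 0.0]])
print(delayed_hawkes_intensity(5.0, [np.array([1.0, 4.0]), np.array([2.5])], mu, a, d))
```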

Composable Function-preserving Expansions for Transformer Architectures

  • paper_url: http://arxiv.org/abs/2308.06103
  • repo_url: None
  • paper_authors: Andrea Gesmundo, Kaitlin Maile
  • for: This paper proposes a way to incrementally increase the size of transformer-based neural networks while exactly preserving the model's function.
  • methods: Six composable transformations are introduced for growing the network, each with a proof of exact function preservation under minimal initialization constraints.
  • results: The proposed transformations may enable efficient training pipelines for larger and more powerful models by progressively expanding the architecture throughout training.
    Abstract Training state-of-the-art neural networks requires a high cost in terms of compute and time. Model scale is recognized to be a critical factor to achieve and improve the state-of-the-art. Increasing the scale of a neural network normally requires restarting from scratch by randomly initializing all the parameters of the model, as this implies a change of architecture's parameters that does not allow for a straightforward transfer of knowledge from smaller size models. In this work, we propose six composable transformations to incrementally increase the size of transformer-based neural networks while preserving functionality, allowing to expand the capacity of the model as needed. We provide proof of exact function preservation under minimal initialization constraints for each transformation. The proposed methods may enable efficient training pipelines for larger and more powerful models by progressively expanding the architecture throughout training.
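One of the simplest function-preserving expansions, widening an MLP block while zero-initializing the weights that feed the new hidden units into the output so that the expanded network computes exactly the same function, is sketched below; it illustrates the general principle and is not necessarily one of the six transformations proposed in the paper.

```python
# Sketch: widen a transformer FFN's hidden layer from h to h_new while exactly
# preserving its function. New input->hidden rows are initialized freely, but the
# hidden->output columns for the new units are zero, so the output is unchanged.
import torch
import torch.nn as nn

def widen_ffn(ffn_in: nn.Linear, ffn_out: nn.Linear, h_new: int):
    h_old, d_model = ffn_in.out_features, ffn_in.in_features
    new_in = nn.Linear(d_model, h_new)
    new_out = nn.Linear(h_new, ffn_out.out_features)
    with torch.no_grad():
        new_in.weight[:h_old] = ffn_in.weight
        new_in.bias[:h_old] = ffn_in.bias
        new_out.weight.zero_()                 # zero-init: new units contribute nothing
        new_out.weight[:, :h_old] = ffn_out.weight
        new_out.bias.copy_(ffn_out.bias)
    return new_in, new_out

# Check: identical outputs before and after expansion.
fin, fout = nn.Linear(8, 16), nn.Linear(16, 8)
x = torch.randn(4, 8)
y_old = fout(torch.relu(fin(x)))
nin, nout = widen_ffn(fin, fout, h_new=32)
y_new = nout(torch.relu(nin(x)))
print(torch.allclose(y_old, y_new, atol=1e-6))
```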

Diffusion-based Visual Counterfactual Explanations – Towards Systematic Quantitative Evaluation

  • paper_url: http://arxiv.org/abs/2308.06100
  • repo_url: https://github.com/cairo-thws/dbvce_eval
  • paper_authors: Philipp Vaeth, Alexander M. Fruehwald, Benjamin Paassen, Magda Gregorova
  • for: This work aims at systematic, quantitative evaluation of recent visual counterfactual explanation (VCE) methods and proposes a minimal set of metrics for this purpose.
  • methods: Within the proposed evaluation framework, the authors run a battery of ablation-like experiments on the latest diffusion-based generative models for VCEs of natural image classification (ImageNet), generating thousands of VCEs for classifiers of varying complexity, accuracy, and robustness.
  • results: The findings suggest multiple directions for future improvements of VCE methods, and the shared methodology and code base provide guidance for consistent and transparent assessment of counterfactual explanations.
    Abstract Latest methods for visual counterfactual explanations (VCE) harness the power of deep generative models to synthesize new examples of high-dimensional images of impressive quality. However, it is currently difficult to compare the performance of these VCE methods as the evaluation procedures largely vary and often boil down to visual inspection of individual examples and small scale user studies. In this work, we propose a framework for systematic, quantitative evaluation of the VCE methods and a minimal set of metrics to be used. We use this framework to explore the effects of certain crucial design choices in the latest diffusion-based generative models for VCEs of natural image classification (ImageNet). We conduct a battery of ablation-like experiments, generating thousands of VCEs for a suite of classifiers of various complexity, accuracy and robustness. Our findings suggest multiple directions for future advancements and improvements of VCE methods. By sharing our methodology and our approach to tackle the computational challenges of such a study on a limited hardware setup (including the complete code base), we offer a valuable guidance for researchers in the field fostering consistency and transparency in the assessment of counterfactual explanations.

Neural Conversation Models and How to Rein Them in: A Survey of Failures and Fixes

  • paper_url: http://arxiv.org/abs/2308.06095
  • repo_url: None
  • paper_authors: Fabian Galetzka, Anne Beyer, David Schlangen
  • for: This survey examines open-domain conversational systems built on powerful language models and interprets Grice's maxims of cooperative conversation for this research area, asking what makes a contribution appropriate.
  • methods: Using a systematic review of the literature, approaches are organized by the qualities a contribution should have (fluent, informative, consistent, coherent, and following social norms) and by the intervention points at which the underlying language models are tamed (data, training regime, or decoding).
  • results: The survey discusses promising attempts, their strengths and weaknesses, and suggests novel directions for future research on reining in neural conversation models.
    Abstract Recent conditional language models are able to continue any kind of text source in an often seemingly fluent way. This fact encouraged research in the area of open-domain conversational systems that are based on powerful language models and aim to imitate an interlocutor by generating appropriate contributions to a written dialogue. From a linguistic perspective, however, the complexity of contributing to a conversation is high. In this survey, we interpret Grice's maxims of cooperative conversation from the perspective of this specific research area and systematize the literature under the aspect of what makes a contribution appropriate: A neural conversation model has to be fluent, informative, consistent, coherent, and follow social norms. In order to ensure these qualities, recent approaches try to tame the underlying language models at various intervention points, such as data, training regime or decoding. Sorted by these categories and intervention points, we discuss promising attempts and suggest novel ways for future research.

Reinforcement Logic Rule Learning for Temporal Point Processes

  • paper_url: http://arxiv.org/abs/2308.06094
  • repo_url: None
  • paper_authors: Chao Yang, Lu Wang, Kun Gao, Shuang Li
  • for: The purpose of this paper is to propose a framework for incrementally expanding an explanatory temporal logic rule set to explain the occurrence of temporal events.
  • methods: Using the temporal point process modeling and learning framework, the rule set content and weights are gradually optimized until the likelihood of the observational event sequences is maximized; the algorithm alternates between a convex master problem that updates rule weights and a subproblem in which a neural search policy, trained with reinforcement learning, generates new rule content.
  • results: The method obtains promising results on both synthetic and real healthcare datasets.
    Abstract We propose a framework that can incrementally expand the explanatory temporal logic rule set to explain the occurrence of temporal events. Leveraging the temporal point process modeling and learning framework, the rule content and weights will be gradually optimized until the likelihood of the observational event sequences is optimal. The proposed algorithm alternates between a master problem, where the current rule set weights are updated, and a subproblem, where a new rule is searched and included to best increase the likelihood. The formulated master problem is convex and relatively easy to solve using continuous optimization, whereas the subproblem requires searching the huge combinatorial rule predicate and relationship space. To tackle this challenge, we propose a neural search policy to learn to generate the new rule content as a sequence of actions. The policy parameters will be trained end-to-end using the reinforcement learning framework, where the reward signals can be efficiently queried by evaluating the subproblem objective. The trained policy can be used to generate new rules in a controllable way. We evaluate our methods on both synthetic and real healthcare datasets, obtaining promising results.

Experts Weights Averaging: A New General Training Scheme for Vision Transformers

  • paper_url: http://arxiv.org/abs/2308.06093
  • repo_url: None
  • paper_authors: Yongqi Huang, Peng Ye, Xiaoshui Huang, Sheng Li, Tao Chen, Wanli Ouyang
  • for: This paper proposes a new general training scheme for Vision Transformers (ViTs) that improves performance without increasing inference cost.
  • methods: Training and inference are decoupled: during training, some Feed-Forward Networks (FFNs) of the ViT are replaced with efficient Mixture-of-Experts (MoE) modules that assign tokens to experts by random uniform partition, and Experts Weights Averaging (EWA) is performed on these MoEs at the end of each iteration; after training, each MoE is converted back into an FFN by averaging its experts, recovering the original ViT for inference.
  • results: Experiments across various 2D and 3D visual tasks, ViT architectures, and datasets validate the effectiveness and generalizability of the scheme; it also improves fine-tuning of ViTs, and the EWA technique significantly improves naive MoE on small 2D datasets and 3D visual tasks.
    Abstract Structural re-parameterization is a general training scheme for Convolutional Neural Networks (CNNs), which achieves performance improvement without increasing inference cost. As Vision Transformers (ViTs) are gradually surpassing CNNs in various visual tasks, one may question: if a training scheme specifically for ViTs exists that can also achieve performance improvement without increasing inference cost? Recently, Mixture-of-Experts (MoE) has attracted increasing attention, as it can efficiently scale up the capacity of Transformers at a fixed cost through sparsely activated experts. Considering that MoE can also be viewed as a multi-branch structure, can we utilize MoE to implement a ViT training scheme similar to structural re-parameterization? In this paper, we affirmatively answer these questions, with a new general training strategy for ViTs. Specifically, we decouple the training and inference phases of ViTs. During training, we replace some Feed-Forward Networks (FFNs) of the ViT with specially designed, more efficient MoEs that assign tokens to experts by random uniform partition, and perform Experts Weights Averaging (EWA) on these MoEs at the end of each iteration. After training, we convert each MoE into an FFN by averaging the experts, transforming the model back into original ViT for inference. We further provide a theoretical analysis to show why and how it works. Comprehensive experiments across various 2D and 3D visual tasks, ViT architectures, and datasets validate the effectiveness and generalizability of the proposed training scheme. Besides, our training scheme can also be applied to improve performance when fine-tuning ViTs. Lastly, but equally important, the proposed EWA technique can significantly improve the effectiveness of naive MoE in various 2D visual small datasets and 3D visual tasks.
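The two mechanics described, averaging the experts' weights during training and collapsing each MoE back into a single FFN afterwards, can be sketched as follows for a stack of identically-shaped expert MLPs; the averaging coefficient and schedule are illustrative assumptions and may differ from the paper's EWA.

```python
# Sketch: Experts Weights Averaging over E identically-shaped expert FFNs, and
# the post-training conversion of the MoE back into a single FFN.
import copy
import torch
import torch.nn as nn

def make_expert(d_model=64, d_hidden=256):
    return nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                         nn.Linear(d_hidden, d_model))

experts = nn.ModuleList([make_expert() for _ in range(4)])

@torch.no_grad()
def experts_weights_averaging(experts, alpha=0.5):
    # Pull every expert's parameters towards the mean of all experts.
    avg = {k: torch.stack([e.state_dict()[k] for e in experts]).mean(0)
           for k in experts[0].state_dict()}
    for e in experts:
        for k, p in e.state_dict().items():
            p.mul_(1 - alpha).add_(alpha * avg[k])

@torch.no_grad()
def moe_to_ffn(experts):
    # After training: average the experts into one FFN for ViT-style inference.
    avg = {k: torch.stack([e.state_dict()[k] for e in experts]).mean(0)
           for k in experts[0].state_dict()}
    ffn = copy.deepcopy(experts[0])
    ffn.load_state_dict(avg)
    return ffn
```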

Toward a Better Understanding of Loss Functions for Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2308.06091
  • repo_url: https://github.com/psm1206/mawu
  • paper_authors: Seongmin Park, Mincheol Yoon, Jae-woong Lee, Hogun Park, Jongwuk Lee
  • for: This work analyses the loss functions used in collaborative filtering (CF), one of the three main components of modern recommender systems alongside the interaction encoder and negative sampling.
  • methods: A mathematical analysis shows that existing CF loss functions can be interpreted as alignment (matching user and item representations) and uniformity (dispersing user and item distributions) functions; based on this, a new loss, Margin-aware Alignment and Weighted Uniformity (MAWU), is proposed, in which margin-aware alignment mitigates user/item-specific popularity biases and weighted uniformity adjusts the relative importance of user and item uniformity to reflect dataset characteristics.
  • results: Extensive experiments on three public datasets show that MF and LightGCN equipped with MAWU are comparable or superior to state-of-the-art CF models with various loss functions.
    Abstract Collaborative filtering (CF) is a pivotal technique in modern recommender systems. The learning process of CF models typically consists of three components: interaction encoder, loss function, and negative sampling. Although many existing studies have proposed various CF models to design sophisticated interaction encoders, recent work shows that simply reformulating the loss functions can achieve significant performance gains. This paper delves into analyzing the relationship among existing loss functions. Our mathematical analysis reveals that the previous loss functions can be interpreted as alignment and uniformity functions: (i) the alignment matches user and item representations, and (ii) the uniformity disperses user and item distributions. Inspired by this analysis, we propose a novel loss function that improves the design of alignment and uniformity considering the unique patterns of datasets called Margin-aware Alignment and Weighted Uniformity (MAWU). The key novelty of MAWU is two-fold: (i) margin-aware alignment (MA) mitigates user/item-specific popularity biases, and (ii) weighted uniformity (WU) adjusts the significance between user and item uniformities to reflect the inherent characteristics of datasets. Extensive experimental results show that MF and LightGCN equipped with MAWU are comparable or superior to state-of-the-art CF models with various loss functions on three public datasets.
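Since the paper frames CF losses as alignment plus uniformity, a hedged sketch of these two components (in the spirit of standard alignment/uniformity losses), with an illustrative margin and per-side uniformity weights standing in for the "margin-aware" and "weighted" ideas, is given below; the exact form of MAWU is not reproduced here.

```python
# Sketch: alignment pulls matched user/item embeddings together (with a margin
# as a stand-in for "margin-aware"), uniformity spreads each side's embeddings
# on the hypersphere, and gamma_u / gamma_i weight the two uniformity terms.
import torch
import torch.nn.functional as F

def alignment(u, i, margin=0.0):
    u, i = F.normalize(u, dim=-1), F.normalize(i, dim=-1)
    return torch.clamp((u - i).norm(dim=-1).pow(2) - margin, min=0).mean()

def uniformity(x, t=2.0):
    x = F.normalize(x, dim=-1)
    return torch.pdist(x).pow(2).mul(-t).exp().mean().log()

def au_style_loss(user_emb, item_emb, margin=0.1, gamma_u=1.0, gamma_i=0.5):
    return (alignment(user_emb, item_emb, margin)
            + gamma_u * uniformity(user_emb)
            + gamma_i * uniformity(item_emb))

# user_emb, item_emb: (batch, dim) embeddings of interacting user-item pairs.
loss = au_style_loss(torch.randn(32, 64), torch.randn(32, 64))
```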

Safeguarding Learning-based Control for Smart Energy Systems with Sampling Specifications

  • paper_url: http://arxiv.org/abs/2308.06069
  • repo_url: None
  • paper_authors: Chih-Hong Cheng, Venkatesh Prasad Venkataramanan, Pragya Kirti Gupta, Yun-Fei Hsu, Simon Burton
  • for: This work studies the challenges of using reinforcement learning to control energy systems, where, in addition to performance requirements, safety requirements such as avoiding blackouts must be met.
  • methods: The paper details how real-time temporal logic safety requirements can be strengthened via discretization into linear temporal logic (LTL), such that satisfaction of the LTL formulae implies satisfaction of the original safety requirements; this enables advanced engineering methods such as synthesizing shields for safe reinforcement learning and formal verification.
  • results: For statistical model checking, the probabilistic guarantee acquired by LTL model checking forms a lower bound on the satisfaction of the original real-time safety requirements.
    Abstract We study challenges using reinforcement learning in controlling energy systems, where apart from performance requirements, one has additional safety requirements such as avoiding blackouts. We detail how these safety requirements in real-time temporal logic can be strengthened via discretization into linear temporal logic (LTL), such that the satisfaction of the LTL formulae implies the satisfaction of the original safety requirements. The discretization enables advanced engineering methods such as synthesizing shields for safe reinforcement learning as well as formal verification, where for statistical model checking, the probabilistic guarantee acquired by LTL model checking forms a lower bound for the satisfaction of the original real-time safety requirements.

Deep learning-based flow disaggregation for hydropower plant management

  • paper_url: http://arxiv.org/abs/2308.11631
  • repo_url: None
  • paper_authors: Duo Zhang
  • for: This study provides a deep learning-based time series disaggregation model that derives higher temporal resolution flow information from daily flow data for hydropower plant management.
  • methods: The model uses the feature extraction capability of deep learning to disaggregate daily flow data into an hourly flow time series.
  • results: Preliminary results on flow data from a Norwegian flow station show some promising aspects of the proposed model.
    Abstract High temporal resolution data is a vital resource for hydropower plant management. Currently, only daily resolution data are available for most of Norwegian hydropower plant, however, to achieve more accurate management, sub-daily resolution data are often required. To deal with the wide absence of sub-daily data, time series disaggregation is a potential tool. In this study, we proposed a time series disaggregation model based on deep learning, the model is tested using flow data from a Norwegian flow station, to disaggregate the daily flow into hourly flow. Preliminary results show some promising aspects for the proposed model.

Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction

  • paper_url: http://arxiv.org/abs/2308.06058
  • repo_url: None
  • paper_authors: Xiaowen Jiang, Sebastian U. Stich
  • for: This paper addresses the shortcomings of the stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for SGD in non-interpolation settings, where they only guarantee convergence to a neighborhood of a solution.
  • methods: Two new variants, AdaSPS and AdaSLS, are proposed that guarantee convergence in non-interpolation settings while maintaining sub-linear and linear convergence rates for convex and strongly convex functions when training over-parameterized models; both are further equipped with a novel variance reduction technique.
  • results: The variance-reduced algorithms require O(n + 1/ε) gradient evaluations (up to logarithmic factors) to reach an O(ε)-suboptimality for convex functions, improving on the O(1/ε²) rates of AdaSPS and AdaSLS without variance reduction and matching the fast rates of AdaSVRG while removing its inner-outer-loop structure. Numerical experiments on synthetic and real datasets validate the theory and the robustness of the algorithms.
    Abstract The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for SGD have shown remarkable effectiveness when training over-parameterized models. However, in non-interpolation settings, both algorithms only guarantee convergence to a neighborhood of a solution which may result in a worse output than the initial guess. While artificially decreasing the adaptive stepsize has been proposed to address this issue (Orvieto et al. [2022]), this approach results in slower convergence rates for convex and over-parameterized models. In this work, we make two contributions: Firstly, we propose two new variants of SPS and SLS, called AdaSPS and AdaSLS, which guarantee convergence in non-interpolation settings and maintain sub-linear and linear convergence rates for convex and strongly convex functions when training over-parameterized models. AdaSLS requires no knowledge of problem-dependent parameters, and AdaSPS requires only a lower bound of the optimal function value as input. Secondly, we equip AdaSPS and AdaSLS with a novel variance reduction technique and obtain algorithms that require $\widetilde{\mathcal{O}}(n+1/\epsilon)$ gradient evaluations to achieve an $\mathcal{O}(\epsilon)$-suboptimality for convex functions, which improves upon the slower $\mathcal{O}(1/\epsilon^2)$ rates of AdaSPS and AdaSLS without variance reduction in the non-interpolation regimes. Moreover, our result matches the fast rates of AdaSVRG but removes the inner-outer-loop structure, which is easier to implement and analyze. Finally, numerical experiments on synthetic and real datasets validate our theory and demonstrate the effectiveness and robustness of our algorithms.
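For reference, the classic stochastic Polyak stepsize that AdaSPS builds on sets the step from the current mini-batch loss and gradient norm; the sketch below shows plain SGD with a (capped) Polyak stepsize, not the AdaSPS/AdaSLS variants or the variance reduction introduced in the paper.

```python
# Sketch of vanilla SGD with the stochastic Polyak stepsize:
#   gamma_t = (f_i(x_t) - l_i*) / (c * ||grad f_i(x_t)||^2),
# where l_i* is a lower bound on the optimal per-sample loss (often 0),
# capped by gamma_max.
import numpy as np

def sgd_with_sps(x0, sample_loss, sample_grad, n, steps=1000, c=0.5, lb=0.0, gamma_max=1.0):
    x = x0.copy()
    for _ in range(steps):
        i = np.random.randint(n)            # draw one sample/mini-batch index
        g = sample_grad(x, i)
        gnorm2 = np.dot(g, g)
        if gnorm2 == 0.0:
            continue
        gamma = min((sample_loss(x, i) - lb) / (c * gnorm2), gamma_max)
        x -= gamma * g
    return x

# Example: least squares, f_i(x) = 0.5 * (a_i @ x - b_i)^2, lower bound 0.
A, b = np.random.randn(100, 5), np.random.randn(100)
loss = lambda x, i: 0.5 * (A[i] @ x - b[i]) ** 2
grad = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_hat = sgd_with_sps(np.zeros(5), loss, grad, n=100)
```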

Cost-effective On-device Continual Learning over Memory Hierarchy with Miro

  • paper_url: http://arxiv.org/abs/2308.06053
  • repo_url: None
  • paper_authors: Xinyue Ma, Suyeon Jeong, Minjia Zhang, Di Wang, Jonghyun Choi, Myeongjae Jeon
  • for: This work aims to make on-device continual learning (CL) on edge devices cost-effective, i.e., achieving high model accuracy without compromising energy efficiency.
  • methods: The proposed system runtime, Miro, maintains previously learned knowledge through replay over a memory hierarchy and dynamically configures the CL system based on resource states, using online profiling of parameters with clear accuracy-energy trade-offs to adapt to optimal values with low overhead.
  • results: Extensive evaluations show that Miro significantly outperforms the baseline systems built for comparison, consistently achieving higher cost-effectiveness.
    Abstract Continual learning (CL) trains NN models incrementally from a continuous stream of tasks. To remember previously learned knowledge, prior studies store old samples over a memory hierarchy and replay them when new tasks arrive. Edge devices that adopt CL to preserve data privacy are typically energy-sensitive and thus require high model accuracy while not compromising energy efficiency, i.e., cost-effectiveness. Our work is the first to explore the design space of hierarchical memory replay-based CL to gain insights into achieving cost-effectiveness on edge devices. We present Miro, a novel system runtime that carefully integrates our insights into the CL framework by enabling it to dynamically configure the CL system based on resource states for the best cost-effectiveness. To reach this goal, Miro also performs online profiling on parameters with clear accuracy-energy trade-offs and adapts to optimal values with low overhead. Extensive evaluations show that Miro significantly outperforms baseline systems we build for comparison, consistently achieving higher cost-effectiveness.

Towards Instance-adaptive Inference for Federated Learning

  • paper_url: http://arxiv.org/abs/2308.06051
  • repo_url: https://github.com/chunmeifeng/fedins
  • paper_authors: Chun-Mei Feng, Kai Yu, Nian Liu, Xinxing Xu, Salman Khan, Wangmeng Zuo
  • for: Improving federated learning performance, particularly under intra-client data heterogeneity.
  • methods: A parameter-efficient fine-tuning method based on scale and shift deep features (SSF), with instance-adaptive inference performed on the client side.
  • results: Outperforms the top-performing method by 6.64% on Tiny-ImageNet with less than 15% of its communication cost.
    Abstract Federated learning (FL) is a distributed learning paradigm that enables multiple clients to learn a powerful global model by aggregating local training. However, the performance of the global model is often hampered by non-i.i.d. distribution among the clients, requiring extensive efforts to mitigate inter-client data heterogeneity. Going beyond inter-client data heterogeneity, we note that intra-client heterogeneity can also be observed on complex real-world data and seriously deteriorate FL performance. In this paper, we present a novel FL algorithm, i.e., FedIns, to handle intra-client data heterogeneity by enabling instance-adaptive inference in the FL framework. Instead of huge instance-adaptive models, we resort to a parameter-efficient fine-tuning method, i.e., scale and shift deep features (SSF), upon a pre-trained model. Specifically, we first train an SSF pool for each client, and aggregate these SSF pools on the server side, thus still maintaining a low communication cost. To enable instance-adaptive inference, for a given instance, we dynamically find the best-matched SSF subsets from the pool and aggregate them to generate an adaptive SSF specified for the instance, thereby reducing the intra-client as well as the inter-client heterogeneity. Extensive experiments show that our FedIns outperforms state-of-the-art FL algorithms, e.g., a 6.64\% improvement against the top-performing method with less than 15\% communication cost on Tiny-ImageNet. Our code and models will be publicly released.
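The SSF idea referenced above amounts to training only a per-channel scale and shift on top of frozen pre-trained features. The minimal sketch below, including the simple averaging used to merge a subset of client modules, is an assumption-laden illustration rather than the FedIns implementation (the instance-matching rule is omitted).

```python
import numpy as np

class SSF:
    """Scale-and-shift deep features: a parameter-efficient tuning module.

    A frozen pre-trained backbone produces features; only a per-channel scale
    (gamma) and shift (beta) are trained per client. Shapes and initialisation
    here are assumptions for illustration.
    """
    def __init__(self, num_channels):
        self.gamma = np.ones(num_channels)   # identity transform at init
        self.beta = np.zeros(num_channels)

    def __call__(self, features):            # features: (batch, channels)
        return features * self.gamma + self.beta

def aggregate_ssf(ssf_subset):
    """Merge a best-matched subset of client SSF modules by simple averaging."""
    gamma = np.mean([m.gamma for m in ssf_subset], axis=0)
    beta = np.mean([m.beta for m in ssf_subset], axis=0)
    merged = SSF(len(gamma))
    merged.gamma, merged.beta = gamma, beta
    return merged
```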

AI-Assisted Investigation of On-Chain Parameters: Risky Cryptocurrencies and Price Factors

  • paper_url: http://arxiv.org/abs/2308.08554
  • repo_url: None
  • paper_authors: Abdulrezzak Zekiye, Semih Utku, Fadi Amroush, Oznur Ozkasap
  • for: This paper aims to analyze historical data and use artificial intelligence algorithms to identify factors affecting cryptocurrency prices and to classify cryptocurrencies as risky or not.
  • methods: The authors use on-chain parameters and artificial intelligence techniques, including clustering and classification, to analyze the data. They also implement multiple classifiers to predict the risk of a cryptocurrency.
  • results: The analysis reveals that a significant proportion of cryptocurrencies disappear from the market, and that there is a negative correlation between price and maximum and total supply, as well as a weak positive correlation with 24-hour trading volume. The authors also cluster cryptocurrencies into five distinct groups using their on-chain parameters, and achieve an f1-score of 76% using K-Nearest Neighbor to predict the risk of a cryptocurrency.
    Abstract Cryptocurrencies have become a popular and widely researched topic of interest in recent years for investors and scholars. In order to make informed investment decisions, it is essential to comprehend the factors that impact cryptocurrency prices and to identify risky cryptocurrencies. This paper focuses on analyzing historical data and using artificial intelligence algorithms on on-chain parameters to identify the factors affecting a cryptocurrency's price and to find risky cryptocurrencies. We conducted an analysis of historical cryptocurrencies' on-chain data and measured the correlation between the price and other parameters. In addition, we used clustering and classification in order to get a better understanding of a cryptocurrency and classify it as risky or not. The analysis revealed that a significant proportion of cryptocurrencies (39%) disappeared from the market, while only a small fraction (10%) survived for more than 1000 days. Our analysis revealed a significant negative correlation between cryptocurrency price and maximum and total supply, as well as a weak positive correlation between price and 24-hour trading volume. Moreover, we clustered cryptocurrencies into five distinct groups using their on-chain parameters, which provides investors with a more comprehensive understanding of a cryptocurrency when compared to those clustered with it. Finally, by implementing multiple classifiers to predict whether a cryptocurrency is risky or not, we obtained the best f1-score of 76% using K-Nearest Neighbor.
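A minimal sketch of the risk-classification step described above, using a K-Nearest Neighbor classifier on on-chain features. The feature columns, labels, and hyperparameters below are placeholders, not the paper's dataset or settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import f1_score

# Hypothetical on-chain feature matrix: e.g. max supply, total supply,
# 24h trading volume, coin age in days; label 1 marks a "risky" coin.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(500, 4))
y = rng.integers(0, 2, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(X_tr)

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(scaler.transform(X_tr), y_tr)
pred = clf.predict(scaler.transform(X_te))
print("f1:", f1_score(y_te, pred))
```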

Controlling Character Motions without Observable Driving Source

  • paper_url: http://arxiv.org/abs/2308.06025
  • repo_url: None
  • paper_authors: Weiyuan Li, Bin Dai, Ziyi Zhou, Qi Yao, Baoyuan Wang
  • for: Generating diverse, life-like, and unlimited-length head/body motion sequences without any driving source.
  • methods: Combines a VQ-VAE with reinforcement learning and carefully designed reward functions to address the out-of-distribution (OOD) issue, insufficient diversity, and undesired periodic patterns.
  • results: Outperforms other strong baselines by a significant margin in generating diverse and natural motion sequences.
    Abstract How to generate diverse, life-like, and unlimited long head/body sequences without any driving source? We argue that this under-investigated research problem is non-trivial at all, and has unique technical challenges behind it. Without semantic constraints from the driving sources, using the standard autoregressive model to generate infinitely long sequences would easily result in 1) out-of-distribution (OOD) issue due to the accumulated error, 2) insufficient diversity to produce natural and life-like motion sequences and 3) undesired periodic patterns along the time. To tackle the above challenges, we propose a systematic framework that marries the benefits of VQ-VAE and a novel token-level control policy trained with reinforcement learning using carefully designed reward functions. A high-level prior model can be easily injected on top to generate unlimited long and diverse sequences. Although we focus on no driving sources now, our framework can be generalized for controlled synthesis with explicit driving sources. Through comprehensive evaluations, we conclude that our proposed framework can address all the above-mentioned challenges and outperform other strong baselines very significantly.
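The framework builds on a VQ-VAE bottleneck; the sketch below shows only the standard nearest-neighbour tokenisation step, with array shapes chosen for illustration. The token-level RL control policy and reward functions from the paper are not modelled here.

```python
import numpy as np

def vq_quantize(z, codebook):
    """Nearest-neighbour lookup used by a VQ-VAE bottleneck.

    z: (T, D) encoder outputs for T motion frames; codebook: (K, D) learned codes.
    Returns the discrete token indices and the quantized vectors.
    """
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (T, K) squared distances
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

codebook = np.random.randn(128, 16)          # 128 motion tokens of dimension 16 (assumed)
tokens, z_q = vq_quantize(np.random.randn(50, 16), codebook)
```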

Evaluating Picture Description Speech for Dementia Detection using Image-text Alignment

  • paper_url: http://arxiv.org/abs/2308.07933
  • repo_url: None
  • paper_authors: Youxiang Zhu, Nana Lin, Xiaohui Liang, John A. Batsis, Robert M. Roth, Brian MacWhinney
  • for: Improving the accuracy of dementia detection from picture-description speech by using both the picture and the description text.
  • methods: Incorporates knowledge from large pre-trained image-text alignment models and pre-processes samples by scoring each sentence's relevance to the picture and by categorizing sentences according to the focused areas of the picture.
  • results: Using both the picture and the text improves detection, achieving a state-of-the-art accuracy of 83.44%, above the text-only baseline of 79.91%.
    Abstract Using picture description speech for dementia detection has been studied for 30 years. Despite the long history, previous models focus on identifying the differences in speech patterns between healthy subjects and patients with dementia but do not utilize the picture information directly. In this paper, we propose the first dementia detection models that take both the picture and the description texts as inputs and incorporate knowledge from large pre-trained image-text alignment models. We observe the difference between dementia and healthy samples in terms of the text's relevance to the picture and the focused area of the picture. We thus consider such a difference could be used to enhance dementia detection accuracy. Specifically, we use the text's relevance to the picture to rank and filter the sentences of the samples. We also identified focused areas of the picture as topics and categorized the sentences according to the focused areas. We propose three advanced models that pre-processed the samples based on their relevance to the picture, sub-image, and focused areas. The evaluation results show that our advanced models, with knowledge of the picture and large image-text alignment models, achieve state-of-the-art performance with the best detection accuracy at 83.44%, which is higher than the text-only baseline model at 79.91%. Lastly, we visualize the sample and picture results to explain the advantages of our models.
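The relevance-based sentence ranking described above can be sketched as a cosine-similarity ranking between sentence and image embeddings. The embeddings are assumed to come from a pre-trained image-text alignment model (CLIP-style); the specific encoder, any relevance threshold, and the grouping by focused image areas are not modelled here.

```python
import numpy as np

def rank_sentences_by_relevance(sentence_embs, image_emb):
    """Rank description sentences by cosine similarity to the picture embedding.

    sentence_embs: (N, D) embeddings of the N transcript sentences;
    image_emb: (D,) embedding of the stimulus picture.
    """
    s = sentence_embs / np.linalg.norm(sentence_embs, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb)
    scores = s @ v
    order = np.argsort(-scores)               # most picture-relevant sentences first
    return order, scores[order]

# Toy usage with random embeddings of an assumed dimension 512.
order, scores = rank_sentences_by_relevance(np.random.randn(8, 512), np.random.randn(512))
```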

Large Language Models for Telecom: Forthcoming Impact on the Industry

  • paper_url: http://arxiv.org/abs/2308.06013
  • repo_url: None
  • paper_authors: Ali Maatouk, Nicola Piovesan, Fadhel Ayed, Antonio De Domenico, Merouane Debbah
  • for: Examining the impact and prospective applications of large language models (LLMs) in the telecom industry, to make better use of LLM technology, improve operational efficiency, and reduce the demand on engineering expertise.
  • methods: Reviews the inner workings, current capabilities, and limitations of LLMs; surveys readily implementable telecom use cases; and outlines open research directions specific to the domain.
  • results: Identifies use cases that can be quickly deployed in telecom, such as automated customer service, quality monitoring, and troubleshooting, which can improve operational efficiency and reduce manual intervention; it also highlights the distinctive challenges and research directions that must be addressed to fully apply LLMs in the telecom domain.
    Abstract Large Language Models (LLMs) have emerged as a transformative force, revolutionizing numerous fields well beyond the conventional domain of Natural Language Processing (NLP) and garnering unprecedented attention. As LLM technology continues to progress, the telecom industry is facing the prospect of its potential impact on its landscape. To elucidate these implications, we delve into the inner workings of LLMs, providing insights into their current capabilities and limitations. We also examine the use cases that can be readily implemented in the telecom industry, streamlining numerous tasks that currently hinder operational efficiency and demand significant manpower and engineering expertise. Furthermore, we uncover essential research directions that deal with the distinctive challenges of utilizing the LLMs within the telecom domain. Addressing these challenges represents a significant stride towards fully harnessing the potential of LLMs and unlocking their capabilities to the fullest extent within the telecom domain.

Does AI for science need another ImageNet Or totally different benchmarks? A case study of machine learning force fields

  • paper_url: http://arxiv.org/abs/2308.05999
  • repo_url: None
  • paper_authors: Yatao Li, Wanling Gao, Lei Wang, Lixin Sun, Zun Wang, Jianfeng Zhan
  • for: Investigating how to effectively benchmark AI for science (AI4S), where machine learning is used to improve the accuracy and speed of scientific computing tasks.
  • methods: Uses machine learning force fields (MLFF) as a case study, identifies missed opportunities in scientifically meaningful benchmarking, and proposes solutions for evaluating MLFF models.
  • results: Proposes a suite of metrics covering sample efficiency, time-domain sensitivity, and cross-dataset generalization that better assess a model's performance in real-world scientific applications.
    Abstract AI for science (AI4S) is an emerging research field that aims to enhance the accuracy and speed of scientific computing tasks using machine learning methods. Traditional AI benchmarking methods struggle to adapt to the unique challenges posed by AI4S because they assume data in training, testing, and future real-world queries are independent and identically distributed, while AI4S workloads anticipate out-of-distribution problem instances. This paper investigates the need for a novel approach to effectively benchmark AI for science, using the machine learning force field (MLFF) as a case study. MLFF is a method to accelerate molecular dynamics (MD) simulation with low computational cost and high accuracy. We identify various missed opportunities in scientifically meaningful benchmarking and propose solutions to evaluate MLFF models, specifically in the aspects of sample efficiency, time domain sensitivity, and cross-dataset generalization capabilities. By setting up the problem instantiation similar to the actual scientific applications, more meaningful performance metrics from the benchmark can be achieved. This suite of metrics has demonstrated a better ability to assess a model's performance in real-world scientific applications, in contrast to traditional AI benchmarking methodologies. This work is a component of the SAIBench project, an AI4S benchmarking suite. The project homepage is https://www.computercouncil.org/SAIBench.

Automatic Classification of Blood Cell Images Using Convolutional Neural Network

  • paper_url: http://arxiv.org/abs/2308.06300
  • repo_url: None
  • paper_authors: Rabia Asghar, Sanjay Kumar, Paul Hynds, Abeera Mahfooz
  • for: automatic classification of ten types of blood cells
  • methods: utilized transfer learning with pre-trained CNN models (VGG16, VGG19, ResNet-50, ResNet-101, ResNet-152, InceptionV3, MobileNetV2, and DenseNet-20) and proposed a novel CNN-based framework
  • results: achieved an accuracy of 99.91% on the PBC dataset, outperforming earlier results reported in the literature
    Abstract Human blood primarily comprises plasma, red blood cells, white blood cells, and platelets. It plays a vital role in transporting nutrients to different organs, where it stores essential health-related data about the human body. Blood cells are utilized to defend the body against diverse infections, including fungi, viruses, and bacteria. Hence, blood analysis can help physicians assess an individual's physiological condition. Blood cells have been sub-classified into eight groups: Neutrophils, eosinophils, basophils, lymphocytes, monocytes, immature granulocytes (promyelocytes, myelocytes, and metamyelocytes), erythroblasts, and platelets or thrombocytes on the basis of their nucleus, shape, and cytoplasm. Traditionally, pathologists and hematologists in laboratories have examined these blood cells using a microscope before manually classifying them. The manual approach is slower and more prone to human error. Therefore, it is essential to automate this process. In our paper, transfer learning with CNN pre-trained models. VGG16, VGG19, ResNet-50, ResNet-101, ResNet-152, InceptionV3, MobileNetV2, and DenseNet-20 applied to the PBC dataset's normal DIB. The overall accuracy achieved with these models lies between 91.375 and 94.72%. Hence, inspired by these pre-trained architectures, a model has been proposed to automatically classify the ten types of blood cells with increased accuracy. A novel CNN-based framework has been presented to improve accuracy. The proposed CNN model has been tested on the PBC dataset normal DIB. The outcomes of the experiments demonstrate that our CNN-based framework designed for blood cell classification attains an accuracy of 99.91% on the PBC dataset. Our proposed convolutional neural network model performs competitively when compared to earlier results reported in the literature.
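A minimal transfer-learning sketch in Keras of the kind of pre-trained-backbone fine-tuning evaluated above. The input size, head layout, optimizer, and the commented-out datasets are assumptions; the paper's own CNN architecture is not reproduced here.

```python
import tensorflow as tf

# Frozen VGG16 backbone with a small classification head for blood-cell images.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                      # freeze the pre-trained backbone

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),   # ten blood-cell classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # image datasets assumed
```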

Fast and Accurate Transferability Measurement by Evaluating Intra-class Feature Variance

  • paper_url: http://arxiv.org/abs/2308.05986
  • repo_url: https://github.com/snudatalab/TMI
  • paper_authors: Huiwen Xu, U Kang
  • for: This paper aims to quickly and accurately find the most useful pre-trained model for a downstream task by measuring transferability.
  • methods: The proposed method is called TMI (TRANSFERABILITY MEASUREMENT WITH INTRA-CLASS FEATURE VARIANCE), which measures transferability by evaluating intra-class feature variance.
  • results: Extensive experiments on real-world datasets show that TMI outperforms competitors for selecting the top-5 best models, and exhibits consistently better correlation in 13 out of 17 cases.
    Abstract Given a set of pre-trained models, how can we quickly and accurately find the most useful pre-trained model for a downstream task? Transferability measurement is to quantify how transferable is a pre-trained model learned on a source task to a target task. It is used for quickly ranking pre-trained models for a given task and thus becomes a crucial step for transfer learning. Existing methods measure transferability as the discrimination ability of a source model for a target data before transfer learning, which cannot accurately estimate the fine-tuning performance. Some of them restrict the application of transferability measurement in selecting the best supervised pre-trained models that have classifiers. It is important to have a general method for measuring transferability that can be applied in a variety of situations, such as selecting the best self-supervised pre-trained models that do not have classifiers, and selecting the best transferring layer for a target task. In this work, we propose TMI (TRANSFERABILITY MEASUREMENT WITH INTRA-CLASS FEATURE VARIANCE), a fast and accurate algorithm to measure transferability. We view transferability as the generalization of a pre-trained model on a target task by measuring intra-class feature variance. Intra-class variance evaluates the adaptability of the model to a new task, which measures how transferable the model is. Compared to previous studies that estimate how discriminative the models are, intra-class variance is more accurate than those as it does not require an optimal feature extractor and classifier. Extensive experiments on real-world datasets show that TMI outperforms competitors for selecting the top-5 best models, and exhibits consistently better correlation in 13 out of 17 cases.
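The quantity at the heart of the method is the intra-class variance of features a pre-trained model produces on the target data. The sketch below computes that quantity; the exact normalisation and the way TMI turns it into a transferability score may differ, so treat this as an illustration.

```python
import numpy as np

def intra_class_variance(features, labels):
    """Average within-class variance of features extracted by a pre-trained model.

    features: (N, D) activations on the target dataset; labels: (N,) target classes.
    A smaller value indicates that features of each class already cluster tightly.
    """
    total = 0.0
    for c in np.unique(labels):
        fc = features[labels == c]
        total += fc.var(axis=0).sum() * len(fc)   # per-class variance, weighted by class size
    return total / len(features)

# Candidate pre-trained models can then be ranked by this quantity computed on the
# target-data features of each model, with no fine-tuning or classifier required.
```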

Defensive Perception: Estimation and Monitoring of Neural Network Performance under Deployment

  • paper_url: http://arxiv.org/abs/2308.06299
  • repo_url: None
  • paper_authors: Hendrik Vogt, Stefan Buehler, Mark Schutera
  • for: Addressing the issue of unnoticed catastrophic deployment and domain shift in neural networks for semantic segmentation in autonomous driving.
  • methods: Based on the idea that deep learning-based perception for autonomous driving is uncertain and best represented as a probability distribution, the paper proposes to encapsulate the neural network under deployment within an uncertainty estimation envelope using Monte Carlo Dropout.
  • results: The proposed method can estimate a neural network’s performance and monitor and notify of entering domains of reduced neural network performance under deployment, with the potential to improve safety and adaptability of autonomous driving systems.
    Abstract In this paper, we propose a method for addressing the issue of unnoticed catastrophic deployment and domain shift in neural networks for semantic segmentation in autonomous driving. Our approach is based on the idea that deep learning-based perception for autonomous driving is uncertain and best represented as a probability distribution. As autonomous vehicles' safety is paramount, it is crucial for perception systems to recognize when the vehicle is leaving its operational design domain, anticipate hazardous uncertainty, and reduce the performance of the perception system. To address this, we propose to encapsulate the neural network under deployment within an uncertainty estimation envelope that is based on the epistemic uncertainty estimation through the Monte Carlo Dropout approach. This approach does not require modification of the deployed neural network and guarantees expected model performance. Our defensive perception envelope has the capability to estimate a neural network's performance, enabling monitoring and notification of entering domains of reduced neural network performance under deployment. Furthermore, our envelope is extended by novel methods to improve the application in deployment settings, including reducing compute expenses and confining estimation noise. Finally, we demonstrate the applicability of our method for multiple different potential deployment shifts relevant to autonomous driving, such as transitions into the night, rainy, or snowy domain. Overall, our approach shows great potential for application in deployment settings and enables operational design domain recognition via uncertainty, which allows for defensive perception, safe state triggers, warning notifications, and feedback for testing or development and adaptation of the perception stack.
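A minimal sketch of the Monte Carlo Dropout envelope around a deployed segmentation network: dropout layers are kept active at inference and the spread over repeated forward passes serves as the epistemic-uncertainty signal. The model and input are assumed to exist; thresholds for raising a domain-shift warning, and the paper's compute-saving refinements, are not shown.

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model, x, n_samples=20):
    """Mean prediction and per-pixel predictive entropy via MC Dropout."""
    model.eval()
    for m in model.modules():               # re-enable only the dropout layers
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d)):
            m.train()
    probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(n_samples)])
    mean = probs.mean(dim=0)                                    # (B, C, H, W)
    entropy = -(mean * torch.log(mean + 1e-12)).sum(dim=1)      # per-pixel uncertainty map
    return mean, entropy

# High average entropy over a frame can then be monitored as a sign that the
# vehicle is leaving the operational design domain (threshold application-specific).
```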

An Encoder-Decoder Approach for Packing Circles

  • paper_url: http://arxiv.org/abs/2308.07335
  • repo_url: None
  • paper_authors: Akshay Kiran Jose, Gangadhar Karevvanavar, Rajshekhar V Bhat
  • for: packing smaller objects within a larger object, with the requirement that the smaller objects must not overlap and must lie completely inside the larger object.
  • methods: a novel encoder-decoder architecture consisting of an encoder block, a perturbation block, and a decoder block, with the encoder and decoder parameterized by a neural network and optimized to reduce an error between the decoder’s estimated index and the actual index of the circle provided as input to the encoder.
  • results: a sub-optimal solution that can pack smaller objects within a larger object with competitive performance compared to classical methods. The approach can be generalized to pack objects of higher dimensions and different shapes by carefully choosing normalization and perturbation layers.
    Abstract The problem of packing smaller objects within a larger object has been of interest since decades. In these problems, in addition to the requirement that the smaller objects must lie completely inside the larger objects, they are expected to not overlap or have minimum overlap with each other. Due to this, the problem of packing turns out to be a non-convex problem, obtaining whose optimal solution is challenging. As such, several heuristic approaches have been used for obtaining sub-optimal solutions in general, and provably optimal solutions for some special instances. In this paper, we propose a novel encoder-decoder architecture consisting of an encoder block, a perturbation block and a decoder block, for packing identical circles within a larger circle. In our approach, the encoder takes the index of a circle to be packed as an input and outputs its center through a normalization layer, the perturbation layer adds controlled perturbations to the center, ensuring that it does not deviate beyond the radius of the smaller circle to be packed, and the decoder takes the perturbed center as input and estimates the index of the intended circle for packing. We parameterize the encoder and decoder by a neural network and optimize it to reduce an error between the decoder's estimated index and the actual index of the circle provided as input to the encoder. The proposed approach can be generalized to pack objects of higher dimensions and different shapes by carefully choosing normalization and perturbation layers. The approach gives a sub-optimal solution and is able to pack smaller objects within a larger object with competitive performance with respect to classical methods.

Learning nonparametric DAGs with incremental information via high-order HSIC

  • paper_url: http://arxiv.org/abs/2308.05969
  • repo_url: None
  • paper_authors: Yafei Wang, Jianguo Liu
  • for: Improving score-based structure learning of Bayesian networks (BNs), where maximizing a global score function can miss edges between indirectly dependent variables.
  • methods: A two-phase optimal-tuning (OT) algorithm: an optimization problem based on first-order HSIC first yields an estimated skeleton from a determined subset of parents, which is then locally tuned by deletion, addition, and DAG-formalization strategies built on the incremental properties of high-order HSIC.
  • results: Numerical experiments on synthetic and real-world datasets show that OT outperforms existing methods; in the Sigmoid Mix model with d=40, its structure intervention distance (SID) is 329.7 smaller than that of CAM, indicating fewer missed edges.
    Abstract Score-based methods for learning Bayesian networks (BNs) aim to maximize a global score function. However, if local variables have direct and indirect dependence simultaneously, global optimization of the score function misses edges between variables with an indirect dependence relationship, whose scores are smaller than those of directly dependent pairs. In this paper, we present an identifiability condition based on a determined subset of parents to identify the underlying DAG. Based on this identifiability condition, we develop a two-phase algorithm, namely the optimal-tuning (OT) algorithm, to locally amend the global optimization. In the optimal phase, an optimization problem based on the first-order Hilbert-Schmidt independence criterion (HSIC) gives an estimated skeleton as the initial determined parents subset. In the tuning phase, the skeleton is locally tuned by deletion, addition and DAG-formalization strategies using the theoretically proved incremental properties of high-order HSIC. Numerical experiments on synthetic and real-world datasets show that the OT algorithm outperforms existing methods. In particular, in the Sigmoid Mix model with graph size ${\rm\bf d=40}$, the structure intervention distance (SID) of the OT algorithm is 329.7 smaller than that obtained by CAM, which indicates that the graph estimated by the OT algorithm misses fewer edges compared with CAM.
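For reference, the first-order HSIC used in the optimal phase has a standard biased empirical estimator, sketched below with Gaussian kernels. The paper's high-order HSIC and its incremental update are not reproduced; kernel bandwidth is an assumption.

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """Gaussian (RBF) Gram matrix for a sample x of shape (n, d)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC estimator trace(K H L H) / (n - 1)^2."""
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    K, L = rbf_gram(x, sigma), rbf_gram(y, sigma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy check: dependent samples score higher than independent ones.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
print(hsic(x, x ** 2), hsic(x, rng.normal(size=(200, 1))))
```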

Classification of White Blood Cells Using Machine and Deep Learning Models: A Systematic Review

  • paper_url: http://arxiv.org/abs/2308.06296
  • repo_url: None
  • paper_authors: Rabia Asghar, Sanjay Kumar, Paul Hynds, Arslan Shaukat
  • for: A systematic review of modern approaches to white blood cell (WBC) classification in medical image analysis.
  • methods: Surveys methods using blood smear images, magnetic resonance imaging (MRI), X-rays, and related imaging modalities, with a detailed analysis of machine learning (ML) and deep learning (DL) techniques applied to WBC classification.
  • results: Across 136 papers published between 2006 and 2023, the use and performance of ML/DL for WBC classification have steadily increased, but challenges remain, including the availability of appropriate datasets and the medical training of researchers.
    Abstract Machine learning (ML) and deep learning (DL) models have been employed to significantly improve analyses of medical imagery, with these approaches used to enhance the accuracy of prediction and classification. Model predictions and classifications assist diagnoses of various cancers and tumors. This review presents an in-depth analysis of modern techniques applied within the domain of medical image analysis for white blood cell classification. The methodologies that use blood smear images, magnetic resonance imaging (MRI), X-rays, and similar medical imaging domains are identified and discussed, with a detailed analysis of ML/DL techniques applied to the classification of white blood cells (WBCs) representing the primary focus of the review. The data utilized in this research has been extracted from a collection of 136 primary papers that were published between the years 2006 and 2023. The most widely used techniques and best-performing white blood cell classification methods are identified. While the use of ML and DL for white blood cell classification has concurrently increased and improved in recent year, significant challenges remain - 1) Availability of appropriate datasets remain the primary challenge, and may be resolved using data augmentation techniques. 2) Medical training of researchers is recommended to improve current understanding of white blood cell structure and subsequent selection of appropriate classification models. 3) Advanced DL networks including Generative Adversarial Networks, R-CNN, Fast R-CNN, and faster R-CNN will likely be increasingly employed to supplement or replace current techniques.

Learned Point Cloud Compression for Classification

  • paper_url: http://arxiv.org/abs/2308.05959
  • repo_url: https://github.com/multimedialabsfu/learned-point-cloud-compression-for-classification
  • paper_authors: Mateen Ulhaq, Ivan V. Bajić
  • for: Compressing 3D point cloud data for machine-side analysis tasks such as classification, object detection, and segmentation.
  • methods: A PointNet-based codec specialized for classification, achieving high compression at low computational cost, with lightweight encoder configurations that can be adapted to the available hardware resources.
  • results: Achieves a 94% BD-bitrate reduction over non-specialized codecs on ModelNet40; two lightweight encoder configurations achieve 93% and 92% BD-bitrate reductions with only 3% and 5% drops in top-1 accuracy, respectively.
    Abstract Deep learning is increasingly being used to perform machine vision tasks such as classification, object detection, and segmentation on 3D point cloud data. However, deep learning inference is computationally expensive. The limited computational capabilities of end devices thus necessitate a codec for transmitting point cloud data over the network for server-side processing. Such a codec must be lightweight and capable of achieving high compression ratios without sacrificing accuracy. Motivated by this, we present a novel point cloud codec that is highly specialized for the machine task of classification. Our codec, based on PointNet, achieves a significantly better rate-accuracy trade-off in comparison to alternative methods. In particular, it achieves a 94% reduction in BD-bitrate over non-specialized codecs on the ModelNet40 dataset. For low-resource end devices, we also propose two lightweight configurations of our encoder that achieve similar BD-bitrate reductions of 93% and 92% with 3% and 5% drops in top-1 accuracy, while consuming only 0.470 and 0.048 encoder-side kMACs/point, respectively. Our codec demonstrates the potential of specialized codecs for machine analysis of point clouds, and provides a basis for extension to more complex tasks and datasets in the future.

Node Embedding for Homophilous Graphs with ARGEW: Augmentation of Random walks by Graph Edge Weights

  • paper_url: http://arxiv.org/abs/2308.05957
  • repo_url: https://github.com/ncsoft/argew
  • paper_authors: Jun Hee Kim, Jaeman Son, Hyunsoo Kim, Eunjo Lee
  • for: Improving node embedding quality on weighted homophilous graphs by augmenting random walks with graph edge weights.
  • methods: ARGEW, a random-walk augmentation method that expands the walk corpus so that node pairs connected by larger edge weights end up with closer embeddings; it works on top of any random-walk-based embedding method.
  • results: On several real-world networks, ARGEW makes embeddings reflect edge weights much more clearly, and node2vec with ARGEW outperforms pure node2vec on node classification, performing on par with a supervised GCN.
    Abstract Representing nodes in a network as dense vectors node embeddings is important for understanding a given network and solving many downstream tasks. In particular, for weighted homophilous graphs where similar nodes are connected with larger edge weights, we desire node embeddings where node pairs with strong weights have closer embeddings. Although random walk based node embedding methods like node2vec and node2vec+ do work for weighted networks via including edge weights in the walk transition probabilities, our experiments show that the embedding result does not adequately reflect edge weights. In this paper, we propose ARGEW (Augmentation of Random walks by Graph Edge Weights), a novel augmentation method for random walks that expands the corpus in such a way that nodes with larger edge weights end up with closer embeddings. ARGEW can work with any random walk based node embedding method, because it is independent of the random sampling strategy itself and works on top of the already-performed walks. With several real-world networks, we demonstrate that with ARGEW, compared to not using it, the desired pattern that node pairs with larger edge weights have closer embeddings is much clearer. We also examine ARGEW's performance in node classification: node2vec with ARGEW outperforms pure node2vec and is not sensitive to hyperparameters (i.e. consistently good). In fact, it achieves similarly good results as supervised GCN, even without any node feature or label information during training. Finally, we explain why ARGEW works consistently well by exploring the coappearance distributions using a synthetic graph with clear structural roles.
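To make the corpus-expansion idea concrete, the sketch below repeats consecutive node pairs from finished walks in proportion to their edge weight, so heavier edges dominate the skip-gram co-occurrence statistics. The exact replication rule in ARGEW may differ; the `edge_weight` mapping and `max_copies` cap are assumptions for illustration, and the weight dictionary is assumed to contain both orientations of each edge.

```python
def argew_augment(walks, edge_weight, max_copies=5):
    """Weight-aware augmentation of a random-walk corpus (illustrative sketch).

    walks: list of node-id sequences produced by any random-walk sampler;
    edge_weight: dict mapping (u, v) tuples to positive weights.
    Returns extra (u, v) training pairs whose multiplicity grows with edge weight.
    """
    w_max = max(edge_weight.values())
    augmented = []
    for walk in walks:
        for u, v in zip(walk[:-1], walk[1:]):
            copies = 1 + int((max_copies - 1) * edge_weight.get((u, v), 0) / w_max)
            augmented.extend([[u, v]] * copies)
    return augmented

# Toy usage on a 3-node path graph with one heavy edge.
pairs = argew_augment([[0, 1, 2]], {(0, 1): 10.0, (1, 0): 10.0, (1, 2): 1.0, (2, 1): 1.0})
```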

INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing

  • paper_url: http://arxiv.org/abs/2308.05930
  • repo_url: None
  • paper_authors: Stefan Abi-Karam, Rishov Sarkar, Dejia Xu, Zhiwen Fan, Zhangyang Wang, Cong Hao
  • for: Accelerating nth-order gradient computations for implicit neural representation (INR) processing.
  • methods: A dataflow architecture using FIFO streams and an optimized computation kernel library for memory-efficient parallel computation, together with a compiler that configures hardware parameters and emits HLS code for FPGA implementation.
  • results: On INR editing benchmarks, achieves 1.8-4.8x and 1.5-3.6x speedups over CPU and GPU baselines, with 3.1-8.9x and 1.7-4.3x lower memory usage and 1.7-11.3x and 5.5-32.8x lower energy-delay product.
    Abstract An increasing number of researchers are finding use for nth-order gradient computations for a wide variety of applications, including graphics, meta-learning (MAML), scientific computing, and most recently, implicit neural representations (INRs). Recent work shows that the gradient of an INR can be used to edit the data it represents directly without needing to convert it back to a discrete representation. However, given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient due to the higher demand for computing power and higher complexity in data movement. This makes it a promising target for FPGA acceleration. In this work, we introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture. We address this problem in two phases. First, we design a dataflow architecture that uses FIFO streams and an optimized computation kernel library, ensuring high memory efficiency and parallel computation. Second, we propose a compiler that extracts and optimizes computation graphs, automatically configures hardware parameters such as latency and stream depths to optimize throughput, while ensuring deadlock-free operation, and outputs High-Level Synthesis (HLS) code for FPGA implementation. We utilize INR editing as our benchmark, presenting results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively. Furthermore, we obtain 3.1-8.9x and 1.7-4.3x lower memory usage, and 1.7-11.3x and 5.5-32.8x lower energy-delay product. Our framework will be made open-source and available on GitHub.

On the equivalence of Occam algorithms

  • paper_url: http://arxiv.org/abs/2308.05906
  • repo_url: None
  • paper_authors: Zaman Keinath-Esmail
  • for: Providing a posteriori justification for theoretical results and algorithm-design methods that rely on the partial converse between PAC learnability and Occam algorithms.
  • methods: Analysis of Occam algorithms and related algorithms for concept classes that are closed under exception lists.
  • results: Shows that the partial converse of Board and Pitt also applies to Occam algorithms with $\delta$-independent complexities, thereby justifying work that builds on it.
    Abstract Blumer et al. (1987, 1989) showed that any concept class that is learnable by Occam algorithms is PAC learnable. Board and Pitt (1990) showed a partial converse of this theorem: for concept classes that are closed under exception lists, any class that is PAC learnable is learnable by an Occam algorithm. However, their Occam algorithm outputs a hypothesis whose complexity is $\delta$-dependent, which is an important limitation. In this paper, we show that their partial converse applies to Occam algorithms with $\delta$-independent complexities as well. Thus, we provide a posteriori justification of various theoretical results and algorithm design methods which use the partial converse as a basis for their work.

Comparing the quality of neural network uncertainty estimates for classification problems

  • paper_url: http://arxiv.org/abs/2308.05903
  • repo_url: None
  • paper_authors: Daniel Ries, Joshua Michalenko, Tyler Ganter, Rashad Imad-Fayez Baiyasi, Jason Adams
  • for: Evaluating the quality of uncertainty estimates produced by deep learning classifiers.
  • methods: Frequentist interval coverage and interval width to assess credible intervals, and expected calibration error to assess predicted classification confidence, applied to BNNs trained with MCMC and variational inference, bootstrapped NNs, Deep Ensembles, and MC dropout.
  • results: Different models can produce uncertainty estimates of markedly different quality on the same data; MCMC performs best overall, with the bootstrapped NN a close second, while DE and MC dropout fare worse, underscoring the need for principled assessment of UQ quality.
    Abstract Traditional deep learning (DL) models are powerful classifiers, but many approaches do not provide uncertainties for their estimates. Uncertainty quantification (UQ) methods for DL models have received increased attention in the literature due to their usefulness in decision making, particularly for high-consequence decisions. However, there has been little research done on how to evaluate the quality of such methods. We use statistical methods of frequentist interval coverage and interval width to evaluate the quality of credible intervals, and expected calibration error to evaluate classification predicted confidence. These metrics are evaluated on Bayesian neural networks (BNN) fit using Markov Chain Monte Carlo (MCMC) and variational inference (VI), bootstrapped neural networks (NN), Deep Ensembles (DE), and Monte Carlo (MC) dropout. We apply these different UQ for DL methods to a hyperspectral image target detection problem and show the inconsistency of the different methods' results and the necessity of a UQ quality metric. To reconcile these differences and choose a UQ method that appropriately quantifies the uncertainty, we create a simulated data set with fully parameterized probability distribution for a two-class classification problem. The gold standard MCMC performs the best overall, and the bootstrapped NN is a close second, requiring the same computational expense as DE. Through this comparison, we demonstrate that, for a given data set, different models can produce uncertainty estimates of markedly different quality. This in turn points to a great need for principled assessment methods of UQ quality in DL applications.
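The three evaluation metrics named above are straightforward to compute; a minimal sketch follows. Binning and edge-case conventions for ECE vary across papers, so the formulation below is one common choice rather than the paper's exact implementation.

```python
import numpy as np

def interval_metrics(lower, upper, y_true):
    """Frequentist coverage rate and average width of credible intervals."""
    covered = (y_true >= lower) & (y_true <= upper)
    return covered.mean(), (upper - lower).mean()

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned expected calibration error for classification confidences."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            # weight each bin by its share of samples, compare accuracy vs. confidence
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```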

Target Detection on Hyperspectral Images Using MCMC and VI Trained Bayesian Neural Networks

  • paper_url: http://arxiv.org/abs/2308.06293
  • repo_url: None
  • paper_authors: Daniel Ries, Jason Adams, Joshua Zollweg
  • for: Providing uncertainty quantification (UQ) for Bayesian neural network (BNN) predictions and estimates in image classification tasks.
  • methods: Trains BNNs with Markov Chain Monte Carlo (MCMC) and variational inference (VI) and compares the effectiveness of the two approaches.
  • results: On a high-fidelity hyperspectral imagery (HSI) target detection scene, both MCMC- and VI-trained BNNs perform well, while illustrating that different training methods can yield different results for the same model.
    Abstract Neural networks (NN) have become almost ubiquitous with image classification, but in their standard form produce point estimates, with no measure of confidence. Bayesian neural networks (BNN) provide uncertainty quantification (UQ) for NN predictions and estimates through the posterior distribution. As NN are applied in more high-consequence applications, UQ is becoming a requirement. BNN provide a solution to this problem by not only giving accurate predictions and estimates, but also an interval that includes reasonable values within a desired probability. Despite their positive attributes, BNN are notoriously difficult and time consuming to train. Traditional Bayesian methods use Markov Chain Monte Carlo (MCMC), but this is often brushed aside as being too slow. The most common method is variational inference (VI) due to its fast computation, but there are multiple concerns with its efficacy. We apply and compare MCMC- and VI-trained BNN in the context of target detection in hyperspectral imagery (HSI), where materials of interest can be identified by their unique spectral signature. This is a challenging field, due to the numerous permuting effects practical collection of HSI has on measured spectra. Both models are trained using out-of-the-box tools on a high fidelity HSI target detection scene. Both MCMC- and VI-trained BNN perform well overall at target detection on a simulated HSI scene. This paper provides an example of how to utilize the benefits of UQ, but also to increase awareness that different training methods can give different results for the same model. If sufficient computational resources are available, the best approach rather than the fastest or most efficient should be used, especially for high consequence problems.

The divergence time of protein structures modelled by Markov matrices and its relation to the divergence of sequences

  • paper_url: http://arxiv.org/abs/2308.06292
  • repo_url: None
  • paper_authors: Sandun Rajapaksa, Lloyd Allison, Peter J. Stuckey, Maria Garcia de la Banda, Arun S. Konagurthu
  • for: Developing a time-parameterized statistical model to quantify the divergent evolution of protein structures.
  • methods: The authors used a large collection of protein 3D structure alignments and inferred a time-parameterized stochastic matrix and associated Dirichlet models using the Bayesian and information-theoretic framework of Minimum Message Length.
  • results: The model yields a relationship between the Markov divergence time of structures and of sequences, and demonstrates competitive performance in secondary structure prediction against neural network architectures commonly employed for this task.
    Abstract A complete time-parameterized statistical model quantifying the divergent evolution of protein structures in terms of the patterns of conservation of their secondary structures is inferred from a large collection of protein 3D structure alignments. This provides a better alternative to time-parameterized sequence-based models of protein relatedness, that have clear limitations dealing with twilight and midnight zones of sequence relationships. Since protein structures are far more conserved due to the selection pressure directly placed on their function, divergence time estimates can be more accurate when inferred from structures. We use the Bayesian and information-theoretic framework of Minimum Message Length to infer a time-parameterized stochastic matrix (accounting for perturbed structural states of related residues) and associated Dirichlet models (accounting for insertions and deletions during the evolution of protein domains). These are used in concert to estimate the Markov time of divergence of tertiary structures, a task previously only possible using proxies (like RMSD). By analyzing one million pairs of homologous structures, we yield a relationship between the Markov divergence time of structures and of sequences. Using these inferred models and the relationship between the divergence of sequences and structures, we demonstrate a competitive performance in secondary structure prediction against neural network architectures commonly employed for this task. The source code and supplementary information are downloadable from \url{http://lcb.infotech.monash.edu.au/sstsum}.

Learning to Team-Based Navigation: A Review of Deep Reinforcement Learning Techniques for Multi-Agent Pathfinding

  • paper_url: http://arxiv.org/abs/2308.05893
  • repo_url: None
  • paper_authors: Jaehoon Chung, Jamil Fayyad, Younes Al Younes, Homayoun Najjaran
  • for: Reviewing deep reinforcement learning (DRL)-based approaches to multi-agent pathfinding (MAPF).
  • methods: Surveys the integration of DRL into MAPF solutions and provides a unified, clearly explained set of evaluation metrics.
  • results: Highlights the shortcomings of existing solutions, supplies unified metrics for comparing MAPF algorithms, and identifies model-based DRL as a promising future direction.
    Abstract Multi-agent pathfinding (MAPF) is a critical field in many large-scale robotic applications, often being the fundamental step in multi-agent systems. The increasing complexity of MAPF in complex and crowded environments, however, critically diminishes the effectiveness of existing solutions. In contrast to other studies that have either presented a general overview of the recent advancements in MAPF or extensively reviewed Deep Reinforcement Learning (DRL) within multi-agent system settings independently, our work presented in this review paper focuses on highlighting the integration of DRL-based approaches in MAPF. Moreover, we aim to bridge the current gap in evaluating MAPF solutions by addressing the lack of unified evaluation metrics and providing comprehensive clarification on these metrics. Finally, our paper discusses the potential of model-based DRL as a promising future direction and provides its required foundational understanding to address current challenges in MAPF. Our objective is to assist readers in gaining insight into the current research direction, providing unified metrics for comparing different MAPF algorithms and expanding their knowledge of model-based DRL to address the existing challenges in MAPF.

DF2: Distribution-Free Decision-Focused Learning

  • paper_url: http://arxiv.org/abs/2308.05889
  • repo_url: None
  • paper_authors: Lingkai Kong, Wenhao Mu, Jiaming Cui, Yuchen Zhuang, B. Aditya Prakash, Bo Dai, Chao Zhang
  • for: Addressing three bottlenecks of end-to-end decision-focused learning for predict-then-optimize problems: model mismatch error, sample average approximation error, and gradient approximation error.
  • methods: DF2, the first distribution-free decision-focused learning method, which requires no task-specific forecaster and directly learns the expected optimization function during training, using an attention-based model architecture inspired by the distribution-based parameterization of the expected objective.
  • results: Demonstrates effectiveness on a synthetic problem, a wind power bidding problem, and a non-convex vaccine distribution problem.
    Abstract Decision-focused learning (DFL) has recently emerged as a powerful approach for predict-then-optimize problems by customizing a predictive model to a downstream optimization task. However, existing end-to-end DFL methods are hindered by three significant bottlenecks: model mismatch error, sample average approximation error, and gradient approximation error. Model mismatch error stems from the misalignment between the model's parameterized predictive distribution and the true probability distribution. Sample average approximation error arises when using finite samples to approximate the expected optimization objective. Gradient approximation error occurs as DFL relies on the KKT condition for exact gradient computation, while most methods approximate the gradient for backpropagation in non-convex objectives. In this paper, we present DF2 -- the first \textit{distribution-free} decision-focused learning method explicitly designed to address these three bottlenecks. Rather than depending on a task-specific forecaster that requires precise model assumptions, our method directly learns the expected optimization function during training. To efficiently learn the function in a data-driven manner, we devise an attention-based model architecture inspired by the distribution-based parameterization of the expected objective. Our method is, to the best of our knowledge, the first to address all three bottlenecks within a single model. We evaluate DF2 on a synthetic problem, a wind power bidding problem, and a non-convex vaccine distribution problem, demonstrating the effectiveness of DF2.

GPLaSDI: Gaussian Process-based Interpretable Latent Space Dynamics Identification through Deep Autoencoder

  • paper_url: http://arxiv.org/abs/2308.05882
  • repo_url: https://github.com/llnl/gplasdi
  • paper_authors: Christophe Bonneville, Youngsoo Choi, Debojyoti Ghosh, Jonathan L. Belof
  • for: This paper aims to develop a novel reduced-order model (ROM) framework called GPLaSDI, which leverages Gaussian processes (GPs) for latent space ODE interpolations and provides accurate and efficient predictions for partial differential equations (PDEs).
  • methods: The proposed method, GPLaSDI, uses autoencoders to map full-order PDE solutions to a latent space and learns the system of ODEs governing the latent space dynamics. The method then interpolates and solves the ODE system in the reduced latent space, allowing for fast and accurate ROM predictions.
  • results: The proposed method is demonstrated on three problems, including the Burgers equation, Vlasov equation for plasma physics, and a rising thermal bubble problem, and achieves between 200 and 100,000 times speed-up with up to 7% relative error. The method provides accurate and efficient predictions for PDEs without prior knowledge of the underlying equations.
    Abstract Numerically solving partial differential equations (PDEs) can be challenging and computationally expensive. This has led to the development of reduced-order models (ROMs) that are accurate but faster than full order models (FOMs). Recently, machine learning advances have enabled the creation of non-linear projection methods, such as Latent Space Dynamics Identification (LaSDI). LaSDI maps full-order PDE solutions to a latent space using autoencoders and learns the system of ODEs governing the latent space dynamics. By interpolating and solving the ODE system in the reduced latent space, fast and accurate ROM predictions can be made by feeding the predicted latent space dynamics into the decoder. In this paper, we introduce GPLaSDI, a novel LaSDI-based framework that relies on Gaussian process (GP) for latent space ODE interpolations. Using GPs offers two significant advantages. First, it enables the quantification of uncertainty over the ROM predictions. Second, leveraging this prediction uncertainty allows for efficient adaptive training through a greedy selection of additional training data points. This approach does not require prior knowledge of the underlying PDEs. Consequently, GPLaSDI is inherently non-intrusive and can be applied to problems without a known PDE or its residual. We demonstrate the effectiveness of our approach on the Burgers equation, Vlasov equation for plasma physics, and a rising thermal bubble problem. Our proposed method achieves between 200 and 100,000 times speed-up, with up to 7% relative error.
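As a rough illustration of the latent-ODE interpolation at the core of this framework, the sketch below fits one Gaussian process per identified ODE coefficient over the PDE parameter space and uses the predictive uncertainty for greedy selection of the next training simulation. The linear latent dynamics, kernel, and parameter ranges are assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch of GP-based latent-ODE interpolation in the spirit of GPLaSDI
# (not the authors' code). Assumptions: the latent dynamics at PDE parameter p are
# linear, dz/dt = A(p) z, and A has been identified at a few training parameters;
# a GP then interpolates each entry of A over parameter space, and its predictive
# variance drives greedy selection of the next full-order simulation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
latent_dim = 3
train_params = np.linspace(0.5, 1.5, 5).reshape(-1, 1)      # sampled PDE parameters
# Hypothetical identified ODE coefficients A(p) at each training parameter.
train_coeffs = np.stack([rng.normal(scale=p, size=(latent_dim, latent_dim))
                         for p in train_params[:, 0]])

# One GP per ODE coefficient, fitted over parameter space.
gps = []
for i in range(latent_dim):
    for j in range(latent_dim):
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-6)
        gp.fit(train_params, train_coeffs[:, i, j])
        gps.append(gp)

def predict_ode(p):
    """Return the interpolated A(p) and a scalar uncertainty score."""
    p = np.atleast_2d(p)
    means, stds = zip(*(gp.predict(p, return_std=True) for gp in gps))
    A = np.array(means).reshape(latent_dim, latent_dim)
    return A, float(np.mean(stds))

# Greedy adaptive sampling: query the parameter with the largest GP uncertainty.
candidates = np.linspace(0.5, 1.5, 101).reshape(-1, 1)
uncertainty = np.array([predict_ode(p)[1] for p in candidates])
next_param = candidates[np.argmax(uncertainty), 0]
print("next full-order simulation suggested at p =", round(next_param, 3))
```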

Aphid Cluster Recognition and Detection in the Wild Using Deep Learning Models

  • paper_url: http://arxiv.org/abs/2308.05881
  • repo_url: None
  • paper_authors: Tianxiao Zhang, Kaidong Li, Xiangyu Chen, Cuncong Zhong, Bo Luo, Ivan Grijalva, Brian McCornack, Daniel Flippo, Ajay Sharda, Guanghui Wang
  • for: This work uses deep learning models to detect aphid clusters so that pesticides can be applied in a targeted way.
  • methods: A large collection of sorghum field images was captured; 5,447 images containing aphids were manually selected and every aphid cluster in them was annotated. The images were then cropped into patches, yielding 151,380 labeled image patches, and four state-of-the-art object detection models (VFNet, GFLV2, PAA, and ATSS) were implemented and compared on this dataset.
  • results: The experiments show that all four models deliver stable, similar performance in terms of average precision and recall on the aphid dataset. Merging close neighboring clusters and removing tiny clusters introduced by cropping further boosts performance by around 17%. The study demonstrates the feasibility of automatically detecting and managing aphids with machine learning models.
    Abstract Aphid infestation poses a significant threat to crop production, rural communities, and global food security. While chemical pest control is crucial for maximizing yields, applying chemicals across entire fields is both environmentally unsustainable and costly. Hence, precise localization and management of aphids are essential for targeted pesticide application. The paper primarily focuses on using deep learning models for detecting aphid clusters. We propose a novel approach for estimating infection levels by detecting aphid clusters. To facilitate this research, we have captured a large-scale dataset from sorghum fields, manually selected 5,447 images containing aphids, and annotated each individual aphid cluster within these images. To facilitate the use of machine learning models, we further process the images by cropping them into patches, resulting in a labeled dataset comprising 151,380 image patches. Then, we implemented and compared the performance of four state-of-the-art object detection models (VFNet, GFLV2, PAA, and ATSS) on the aphid dataset. Extensive experimental results show that all models yield stable similar performance in terms of average precision and recall. We then propose to merge close neighboring clusters and remove tiny clusters caused by cropping, and the performance is further boosted by around 17%. The study demonstrates the feasibility of automatically detecting and managing insects using machine learning models. The labeled dataset will be made openly available to the research community.
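The merging and filtering step mentioned in the abstract can be pictured with a small sketch like the one below, which unions bounding boxes that overlap or nearly touch and drops very small boxes left over from cropping. The pixel gap and minimum-area thresholds are invented placeholders, not values from the paper.

```python
# A small sketch of the post-processing step described above: merging nearby
# aphid-cluster boxes and dropping tiny clusters created by patch cropping.
# Boxes are (x1, y1, x2, y2); thresholds are illustrative, not the paper's values.
def boxes_close(a, b, gap=10):
    """True if two boxes overlap or lie within `gap` pixels of each other."""
    return not (a[2] + gap < b[0] or b[2] + gap < a[0] or
                a[3] + gap < b[1] or b[3] + gap < a[1])

def merge_boxes(boxes, gap=10, min_area=100):
    merged = True
    boxes = [list(b) for b in boxes]
    while merged:
        merged = False
        out = []
        while boxes:
            cur = boxes.pop()
            for i, other in enumerate(boxes):
                if boxes_close(cur, other, gap):
                    # Replace the pair with their union box and rescan.
                    boxes[i] = [min(cur[0], other[0]), min(cur[1], other[1]),
                                max(cur[2], other[2]), max(cur[3], other[3])]
                    merged = True
                    break
            else:
                out.append(cur)
        boxes = out
    # Remove tiny clusters, which are often fragments produced by cropping.
    return [b for b in boxes if (b[2] - b[0]) * (b[3] - b[1]) >= min_area]

print(merge_boxes([(0, 0, 20, 20), (25, 0, 50, 20), (200, 200, 203, 203)]))
```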

Composable Core-sets for Diversity Approximation on Multi-Dataset Streams

  • paper_url: http://arxiv.org/abs/2308.05878
  • repo_url: None
  • paper_authors: Stephanie Wang, Michael Flynn, Fangyu Luo
  • for: This paper targets real-time training of machine learning models, particularly when the volume of incoming sensor data is large.
  • methods: The method is a core-set construction algorithm that builds composable core-sets to summarize streamed data for use in active learning environments.
  • results: The approach reduces training time and can be used for real-time training. Empirical analysis suggests it provides reliable results on large amounts of sensor data.
    Abstract Core-sets refer to subsets of data that maximize some function that is commonly a diversity or group requirement. These subsets are used in place of the original data to accomplish a given task with comparable or even enhanced performance if biases are removed. Composable core-sets are core-sets with the property that subsets of the core set can be unioned together to obtain an approximation for the original data; lending themselves to be used for streamed or distributed data. Recent work has focused on the use of core-sets for training machine learning models. Preceding solutions such as CRAIG have been proven to approximate gradient descent while providing a reduced training time. In this paper, we introduce a core-set construction algorithm for constructing composable core-sets to summarize streamed data for use in active learning environments. If combined with techniques such as CRAIG and heuristics to enhance construction speed, composable core-sets could be used for real time training of models when the amount of sensor data is large. We provide empirical analysis by considering extrapolated data for the runtime of such a brute force algorithm. This algorithm is then analyzed for efficiency through averaged empirical regression and key results and improvements are suggested for further research on the topic.
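A simple way to picture a composable diversity core-set is the farthest-point (greedy k-center) selection below: each streamed chunk is summarized independently, and the union of chunk summaries can be summarized again. This is an illustrative construction under the paper's general definition of a composable core-set, not the authors' exact algorithm.

```python
# A minimal sketch of a composable diversity core-set: each stream chunk is
# summarized with greedy k-center (farthest-point) selection, and the union of
# chunk core-sets can itself be re-summarized.
import numpy as np

def greedy_k_center(points, k):
    """Pick k points that approximately maximize pairwise diversity."""
    chosen = [0]                                   # start from an arbitrary point
    dists = np.linalg.norm(points - points[0], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(dists))                # farthest point from the chosen set
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]

rng = np.random.default_rng(0)
chunks = [rng.normal(size=(500, 8)) for _ in range(4)]       # streamed data chunks
partial = [greedy_k_center(c, k=20) for c in chunks]          # per-chunk core-sets
# Composability: union the partial core-sets, then summarize the union again.
core_set = greedy_k_center(np.vstack(partial), k=20)
print(core_set.shape)   # (20, 8) summary usable in place of 2000 raw points
```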

Revisiting N-CNN for Clinical Practice

  • paper_url: http://arxiv.org/abs/2308.05877
  • repo_url: None
  • paper_authors: Leonardo Antunes Ferreira, Lucas Pereira Carlini, Gabriel de Almeida Sá Coutrin, Tatiany Marcondes Heideirich, Marina Carvalho de Moraes Barros, Ruth Guinsburg, Carlos Eduardo Thomaz
  • For: This paper aims to improve the performance and reliability of a deep learning model for neonatal pain assessment by optimizing its hyperparameters and evaluating their impact on classification metrics, explainability, and calibration.
  • Methods: The authors used a Neonatal Convolutional Neural Network (N-CNN) and optimized its hyperparameters by evaluating the improvement in F1 Score for each hyperparameter individually. They also applied soft labels derived from the Neonatal Facial Coding System to improve the model's performance.
  • Results: The Tuned N-CNN showed improvements in classification metrics and explainability, but the calibration performance did not improve directly. The authors believe that their insights could contribute to the development of more reliable pain evaluation tools for newborns, aiding healthcare professionals in delivering appropriate interventions and improving patient outcomes.
    Abstract This paper revisits the Neonatal Convolutional Neural Network (N-CNN) by optimizing its hyperparameters and evaluating how they affect its classification metrics, explainability and reliability, discussing their potential impact in clinical practice. We have chosen hyperparameters that do not modify the original N-CNN architecture, but mainly modify its learning rate and training regularization. The optimization was done by evaluating the improvement in F1 Score for each hyperparameter individually, and the best hyperparameters were chosen to create a Tuned N-CNN. We also applied soft labels derived from the Neonatal Facial Coding System, proposing a novel approach for training facial expression classification models for neonatal pain assessment. Interestingly, while the Tuned N-CNN results point towards improvements in classification metrics and explainability, these improvements did not directly translate to calibration performance. We believe that such insights might have the potential to contribute to the development of more reliable pain evaluation tools for newborns, aiding healthcare professionals in delivering appropriate interventions and improving patient outcomes.
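The soft-label idea can be sketched as cross-entropy against probabilistic targets rather than hard 0/1 labels, as below. The mapping from NFCS scores to probabilities shown here is a hypothetical placeholder; the paper's actual derivation of soft labels may differ.

```python
# A short sketch of training with soft labels: instead of hard 0/1 pain labels,
# targets are probabilities (here imagined as derived from NFCS facial-action
# scores). The NFCS-to-probability mapping below is a made-up placeholder.
import torch
import torch.nn.functional as F

def nfcs_to_soft_label(nfcs_score, max_score=8):
    """Hypothetical mapping from an NFCS score to (P(no pain), P(pain))."""
    p = nfcs_score / max_score
    return torch.tensor([1.0 - p, p])

logits = torch.randn(4, 2, requires_grad=True)             # model outputs for a batch
soft_targets = torch.stack([nfcs_to_soft_label(s) for s in (2, 7, 0, 5)])

# Cross-entropy against soft targets: mean over the batch of -sum(q * log p).
loss = -(soft_targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
loss.backward()
print(float(loss))
```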

UFed-GAN: A Secure Federated Learning Framework with Constrained Computation and Unlabeled Data

  • paper_url: http://arxiv.org/abs/2308.05870
  • repo_url: None
  • paper_authors: Achintha Wijesinghe, Songyang Zhang, Siyu Qi, Zhi Ding
  • for: This work addresses the problems of limited computational resources and unlabeled data, making cloud-based multimedia data classification more practical while protecting privacy.
  • methods: The proposed Unsupervised Federated Generative Adversarial Network (UFed-GAN) captures the user-side data distribution in resource-constrained, label-free settings without requiring local classification training.
  • results: Experimental results demonstrate the strong potential of UFed-GAN in handling limited computational resources and unlabeled data while preserving user privacy.
    Abstract To satisfy the broad applications and insatiable hunger for deploying low latency multimedia data classification and data privacy in a cloud-based setting, federated learning (FL) has emerged as an important learning paradigm. For the practical cases involving limited computational power and only unlabeled data in many wireless communications applications, this work investigates FL paradigm in a resource-constrained and label-missing environment. Specifically, we propose a novel framework of UFed-GAN: Unsupervised Federated Generative Adversarial Network, which can capture user-side data distribution without local classification training. We also analyze the convergence and privacy of the proposed UFed-GAN. Our experimental results demonstrate the strong potential of UFed-GAN in addressing limited computational resources and unlabeled data while preserving privacy.

Using Twitter Data to Determine Hurricane Category: An Experiment

  • paper_url: http://arxiv.org/abs/2308.05866
  • repo_url: None
  • paper_authors: Songhui Yue, Jyothsna Kondari, Aibek Musaev, Randy K. Smith, Songqing Yue
  • for: This study aims to mine the relationship between social media data and disaster severity using data mining approaches.
  • methods: The study analyzes Twitter data posted during hurricanes Harvey and Irma, using data mining techniques to find the correlation between the Twitter data of a specific area and the hurricane level in that area.
  • results: The experimental results indicate a positive correlation between Twitter data and hurricane severity. The paper also presents a method for predicting the hurricane category of a specific area from relevant Twitter data.
    Abstract Social media posts contain an abundant amount of information about public opinion on major events, especially natural disasters such as hurricanes. Posts related to an event, are usually published by the users who live near the place of the event at the time of the event. Special correlation between the social media data and the events can be obtained using data mining approaches. This paper presents research work to find the mappings between social media data and the severity level of a disaster. Specifically, we have investigated the Twitter data posted during hurricanes Harvey and Irma, and attempted to find the correlation between the Twitter data of a specific area and the hurricane level in that area. Our experimental results indicate a positive correlation between them. We also present a method to predict the hurricane category for a specific area using relevant Twitter data.
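In the spirit of the analysis described, the toy sketch below correlates the per-area volume of hurricane-related tweets with the recorded hurricane category and uses a nearest-neighbour rule as a minimal category predictor. All numbers are invented; the paper's features and models are richer than this.

```python
# A tiny sketch of the kind of analysis described above: correlating the volume
# of hurricane-related tweets from an area with the hurricane category recorded
# for that area. The numbers below are invented for illustration only.
from scipy.stats import pearsonr

# (area, hurricane category, hurricane-related tweets per 10k residents) -- fake data
records = [("A", 1, 12), ("B", 2, 25), ("C", 3, 41), ("D", 4, 70), ("E", 4, 66)]
categories = [r[1] for r in records]
tweet_rates = [r[2] for r in records]

r, p = pearsonr(categories, tweet_rates)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")

# A simple predictor in the same spirit: assign the category of the training
# area whose tweet rate is closest to the new area's rate (1-nearest neighbour).
def predict_category(rate):
    return min(records, key=lambda rec: abs(rec[2] - rate))[1]

print("predicted category for tweet rate 50:", predict_category(50))
```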

The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions

  • paper_url: http://arxiv.org/abs/2308.05864
  • repo_url: None
  • paper_authors: Jun Ma, Ronald Xie, Shamini Ayyadhury, Cheng Ge, Anubha Gupta, Ritu Gupta, Song Gu, Yao Zhang, Gihun Lee, Joonkee Kim, Wei Lou, Haofeng Li, Eric Upschulte, Timo Dickscheid, José Guilherme de Almeida, Yixin Wang, Lin Han, Xin Yang, Marco Labagnara, Sahand Jamal Rahi, Carly Kempster, Alice Pollitt, Leon Espinosa, Tâm Mignot, Jan Moritz Middeke, Jan-Niklas Eckardt, Wangkai Li, Zhaoyang Li, Xiaochen Cai, Bizhe Bai, Noah F. Greenwald, David Van Valen, Erin Weisbart, Beth A. Cimini, Zhuoshi Li, Chao Zuo, Oscar Brück, Gary D. Bader, Bo Wang
  • for: This paper provides a multi-modality cell segmentation benchmark to make cell analysis more accurate and more broadly applicable.
  • methods: The winning approach is a Transformer-based deep learning algorithm that can be applied across diverse microscopy imaging platforms and tissue types without manual parameter adjustment.
  • results: The new algorithm not only outperforms existing methods but also generalizes across diverse microscopy images without manual parameter tuning. The benchmark and the improved algorithm open promising avenues for cell analysis.
    Abstract Cell segmentation is a critical step for quantitative single-cell analysis in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual interventions to specify hyperparameters in different experimental settings. Here, we present a multi-modality cell segmentation benchmark, comprising over 1500 labeled images derived from more than 50 diverse biological experiments. The top participants developed a Transformer-based deep-learning algorithm that not only exceeds existing methods, but can also be applied to diverse microscopy images across imaging platforms and tissue types without manual parameter adjustments. This benchmark and the improved algorithm offer promising avenues for more accurate and versatile cell analysis in microscopy imaging.

Knowledge Propagation over Conditional Independence Graphs

  • paper_url: http://arxiv.org/abs/2308.05857
  • repo_url: None
  • paper_authors: Urszula Chajewska, Harsh Shrivastava
  • for: This work proposes knowledge propagation algorithms over Conditional Independence (CI) graphs, aiming to extract valuable information from systems in various domains.
  • methods: The study uses a CI-graph-based knowledge propagation algorithm that captures the domain topology of a system through the partial correlations between features.
  • results: Experiments show that the proposed techniques improve upon the state of the art on the publicly available Cora and PubMed datasets.
    Abstract Conditional Independence (CI) graph is a special type of a Probabilistic Graphical Model (PGM) where the feature connections are modeled using an undirected graph and the edge weights show the partial correlation strength between the features. Since the CI graphs capture direct dependence between features, they have been garnering increasing interest within the research community for gaining insights into the systems from various domains, in particular discovering the domain topology. In this work, we propose algorithms for performing knowledge propagation over the CI graphs. Our experiments demonstrate that our techniques improve upon the state-of-the-art on the publicly available Cora and PubMed datasets.
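One concrete way to picture knowledge propagation over a CI graph is standard label propagation in which edge weights are absolute partial correlations, as sketched below. This is a generic illustration of the idea, with made-up weights, and may differ from the propagation rule used in the paper.

```python
# A compact sketch of propagating node labels over a conditional-independence
# graph whose edge weights are absolute partial correlations.
import numpy as np

def propagate(W, labels, known_mask, alpha=0.9, iters=50):
    """Iterative label propagation: F <- alpha * S @ F + (1 - alpha) * Y."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d) + 1e-12)          # symmetric normalization
    Y = labels.copy()
    Y[~known_mask] = 0.0                             # unknown nodes start at zero
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y
    return F

# Toy CI graph on 4 nodes: weights = |partial correlation| between features.
W = np.array([[0.0, 0.8, 0.1, 0.0],
              [0.8, 0.0, 0.7, 0.0],
              [0.1, 0.7, 0.0, 0.6],
              [0.0, 0.0, 0.6, 0.0]])
labels = np.array([[1.0, 0.0],       # node 0: class A (known)
                   [0.0, 0.0],       # node 1: unknown
                   [0.0, 0.0],       # node 2: unknown
                   [0.0, 1.0]])      # node 3: class B (known)
known = np.array([True, False, False, True])
print(propagate(W, labels, known).argmax(axis=1))    # predicted classes per node
```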

CSPM: A Contrastive Spatiotemporal Preference Model for CTR Prediction in On-Demand Food Delivery Services

  • paper_url: http://arxiv.org/abs/2308.08446
  • repo_url: None
  • paper_authors: Guyu Jiang, Xiaoyun Li, Rongrong Jing, Ruoqi Zhao, Xingliang Ni, Guodong Cao, Ning Hu
  • for: Click-through rate (CTR) prediction on an online on-demand food delivery (OFD) platform is an important task for accurately estimating the probability that a user clicks on a food item. Unlike universal e-commerce platforms such as Taobao and Amazon, user behaviors and interests on OFD platforms are constrained by location and time, which leaves existing CTR prediction algorithms less effective in OFD scenarios.
  • methods: The paper proposes the Contrastive Spatiotemporal Preference Model (CSPM), built on a contrastive learning framework, to model users' varied preferences under different search states. CSPM consists of three modules: contrastive spatiotemporal representation learning (CSRL), a spatiotemporal preference extractor (StPE), and a spatiotemporal information filter (StIF). CSRL uses a contrastive learning framework to generate a spatiotemporal activation representation (SAR) of the search action. StPE uses the SAR to activate users' location- and time-dependent preferences from the historical behavior sequence via a multi-head attention mechanism. StIF integrates the SAR into a gating network to automatically capture important features with latent spatiotemporal effects.
  • results: On two large-scale real-world datasets, CSPM achieves state-of-the-art performance. Notably, CSPM has been successfully deployed on Alibaba's online OFD platform Ele.me, yielding a significant 0.88% lift in CTR, which has substantial business implications.
    Abstract Click-through rate (CTR) prediction is a crucial task in the context of an online on-demand food delivery (OFD) platform for precisely estimating the probability of a user clicking on food items. Unlike universal e-commerce platforms such as Taobao and Amazon, user behaviors and interests on the OFD platform are more location and time-sensitive due to limited delivery ranges and regional commodity supplies. However, existing CTR prediction algorithms in OFD scenarios concentrate on capturing interest from historical behavior sequences, which fails to effectively model the complex spatiotemporal information within features, leading to poor performance. To address this challenge, this paper introduces the Contrastive Spatiotemporal Preference Model (CSPM), which captures user preferences under different search states using three modules: contrastive spatiotemporal representation learning (CSRL), spatiotemporal preference extractor (StPE), and spatiotemporal information filter (StIF). CSRL utilizes a contrastive learning framework to generate a spatiotemporal activation representation (SAR) for the search action. StPE employs SAR to activate users' diverse preferences related to location and time from the historical behavior sequence field, using a multi-head attention mechanism. StIF incorporates SAR into a gating network to automatically capture important features with latent spatiotemporal effects. Extensive experiments conducted on two large-scale industrial datasets demonstrate the state-of-the-art performance of CSPM. Notably, CSPM has been successfully deployed in Alibaba's online OFD platform Ele.me, resulting in a significant 0.88% lift in CTR, which has substantial business implications.

GaborPINN: Efficient physics informed neural networks using multiplicative filtered networks

  • paper_url: http://arxiv.org/abs/2308.05843
  • repo_url: None
  • paper_authors: Xinquan Huang, Tariq Alkhalifah
  • for: Efficiently computing seismic wavefields, which is essential for applications such as full waveform inversion.
  • methods: A modified physics-informed neural network (PINN) built on multiplicative filtered networks (MFNs), which uses Gabor basis functions to represent the wavefield (GaborPINN) and incorporates prior information on the wavefield frequency into the design.
  • results: Compared with conventional PINNs, the proposed method converges much faster, with up to a two-order-of-magnitude increase in convergence speed.
    Abstract The computation of the seismic wavefield by solving the Helmholtz equation is crucial to many practical applications, e.g., full waveform inversion. Physics-informed neural networks (PINNs) provide functional wavefield solutions represented by neural networks (NNs), but their convergence is slow. To address this problem, we propose a modified PINN using multiplicative filtered networks, which embeds some of the known characteristics of the wavefield in training, e.g., frequency, to achieve much faster convergence. Specifically, we use the Gabor basis function due to its proven ability to represent wavefields accurately and refer to the implementation as GaborPINN. Meanwhile, we incorporate prior information on the frequency of the wavefield into the design of the method to mitigate the influence of the discontinuity of the represented wavefield by GaborPINN. The proposed method achieves up to a two-magnitude increase in the speed of convergence as compared with conventional PINNs.
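A hedged sketch of the kind of Gabor multiplicative filter network (MFN) that GaborPINN builds on is given below in PyTorch; the Helmholtz residual loss that would turn it into a PINN is omitted. Layer sizes, frequency scales, and initialization are illustrative assumptions rather than the paper's settings.

```python
# A sketch of a Gabor multiplicative filter network (MFN) of the kind GaborPINN
# builds on. Frequencies, scales, and sizes are illustrative; the paper also
# injects prior knowledge of the wavefield frequency into the design.
import torch
import torch.nn as nn

class GaborFilter(nn.Module):
    def __init__(self, in_dim, hidden, freq_scale=30.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, hidden)           # sinusoidal part: sin(Wx + b)
        self.linear.weight.data.uniform_(-freq_scale, freq_scale)
        self.mu = nn.Parameter(torch.rand(hidden, in_dim) * 2 - 1)
        self.gamma = nn.Parameter(torch.ones(hidden))

    def forward(self, x):
        # Gabor basis: Gaussian envelope * sinusoid, evaluated per hidden unit.
        dist2 = ((x[:, None, :] - self.mu[None, :, :]) ** 2).sum(-1)
        return torch.exp(-0.5 * self.gamma * dist2) * torch.sin(self.linear(x))

class GaborMFN(nn.Module):
    def __init__(self, in_dim=2, hidden=128, out_dim=1, layers=3):
        super().__init__()
        self.filters = nn.ModuleList(GaborFilter(in_dim, hidden) for _ in range(layers))
        self.linears = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(layers - 1))
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x):
        z = self.filters[0](x)
        for lin, filt in zip(self.linears, self.filters[1:]):
            z = lin(z) * filt(x)                           # multiplicative update
        return self.out(z)

# Usage: predict a wavefield u(x, z) at collocation points; the Helmholtz residual
# would then be added as the PINN training loss (omitted here for brevity).
model = GaborMFN()
u = model(torch.rand(1024, 2))
print(u.shape)   # torch.Size([1024, 1])
```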

FLShield: A Validation Based Federated Learning Framework to Defend Against Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2308.05832
  • repo_url: None
  • paper_authors: Ehsanul Kabir, Zeyu Song, Md Rafi Ur Rashid, Shagufta Mehnaz
  • for: This work proposes a new federated learning framework to ensure the security and reliability of collaborative learning systems.
  • methods: The framework validates local models against benign data from FL participants before they are taken into account for the global model, guarding against the actions of malicious participants.
  • results: Extensive experimental evaluation shows that the FLShield framework effectively thwarts various poisoning and backdoor attacks while preserving the privacy of local data.
    Abstract Federated learning (FL) is revolutionizing how we learn from data. With its growing popularity, it is now being used in many safety-critical domains such as autonomous vehicles and healthcare. Since thousands of participants can contribute in this collaborative setting, it is, however, challenging to ensure security and reliability of such systems. This highlights the need to design FL systems that are secure and robust against malicious participants' actions while also ensuring high utility, privacy of local data, and efficiency. In this paper, we propose a novel FL framework dubbed as FLShield that utilizes benign data from FL participants to validate the local models before taking them into account for generating the global model. This is in stark contrast with existing defenses relying on server's access to clean datasets -- an assumption often impractical in real-life scenarios and conflicting with the fundamentals of FL. We conduct extensive experiments to evaluate our FLShield framework in different settings and demonstrate its effectiveness in thwarting various types of poisoning and backdoor attacks including a defense-aware one. FLShield also preserves privacy of local data against gradient inversion attacks.
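The validation-based defense can be pictured with the simplified sketch below: each submitted local model is scored on benign validation data and only the better-scoring updates are averaged. The linear toy models, the scoring rule, and the keep ratio are placeholders; the paper's validation assignment and aggregation are more involved.

```python
# A simplified sketch of validation-based defense in the spirit of FLShield:
# submitted local models are scored on benign validation data held by participants,
# and only well-scoring updates are aggregated. The details here are placeholders.
import numpy as np

def validate(model_params, val_x, val_y):
    """Toy 'accuracy' of a linear model on a participant's benign validation data."""
    preds = (val_x @ model_params > 0).astype(int)
    return (preds == val_y).mean()

def flshield_style_aggregate(local_models, val_sets, keep_ratio=0.5):
    scores = [np.mean([validate(m, x, y) for x, y in val_sets]) for m in local_models]
    order = np.argsort(scores)[::-1]
    kept = order[: max(1, int(len(local_models) * keep_ratio))]
    print("kept clients:", kept.tolist(), "scores:", np.round(scores, 2).tolist())
    return np.mean([local_models[i] for i in kept], axis=0)   # FedAvg over survivors

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
val_sets = [(rng.normal(size=(50, 3)), None) for _ in range(3)]
val_sets = [(x, (x @ true_w > 0).astype(int)) for x, _ in val_sets]
benign = [true_w + rng.normal(scale=0.1, size=3) for _ in range(4)]
poisoned = [-true_w]                                           # sign-flipped update
global_w = flshield_style_aggregate(benign + poisoned, val_sets)
print("aggregated model:", np.round(global_w, 2))
```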

Neural Progressive Meshes

  • paper_url: http://arxiv.org/abs/2308.05741
  • repo_url: None
  • paper_authors: Yun-Chun Chen, Vladimir G. Kim, Noam Aigerman, Alec Jacobson
  • for: Improving the efficiency of transmitting 3D content, especially large geometric data such as 3D meshes.
  • methods: Using a learned generative model to decompose and reconstruct 3D models, enabling progressive transmission and high-quality reconstruction.
  • results: Outperforming baseline methods in terms of compression ratio and reconstruction quality.
    Abstract The recent proliferation of 3D content that can be consumed on hand-held devices necessitates efficient tools for transmitting large geometric data, e.g., 3D meshes, over the Internet. Detailed high-resolution assets can pose a challenge to storage as well as transmission bandwidth, and level-of-detail techniques are often used to transmit an asset using an appropriate bandwidth budget. It is especially desirable for these methods to transmit data progressively, improving the quality of the geometry with more data. Our key insight is that the geometric details of 3D meshes often exhibit similar local patterns even across different shapes, and thus can be effectively represented with a shared learned generative space. We learn this space using a subdivision-based encoder-decoder architecture trained in advance on a large collection of surfaces. We further observe that additional residual features can be transmitted progressively between intermediate levels of subdivision that enable the client to control the tradeoff between bandwidth cost and quality of reconstruction, providing a neural progressive mesh representation. We evaluate our method on a diverse set of complex 3D shapes and demonstrate that it outperforms baselines in terms of compression ratio and reconstruction quality.

Zero Grads Ever Given: Learning Local Surrogate Losses for Non-Differentiable Graphics

  • paper_url: http://arxiv.org/abs/2308.05739
  • repo_url: None
  • paper_authors: Michael Fischer, Tobias Ritschel
  • for: ZeroGrads is a framework for optimizing non-convex, non-differentiable black-box problems in graphics, such as visibility in rendering, discrete parameter spaces in procedural modelling, or optimal control in physics-driven animation.
  • methods: ZeroGrads uses a neural approximation of the objective function, called a surrogate, to circumvent the issue of undefined or zero gradients in gradient-based optimization. The surrogate is learned online and self-supervised, without pre-computed data or pre-trained models.
  • results: ZeroGrads demonstrates competitive performance at little overhead, scaling well to higher dimensions with up to 35k interlinked variables. It is able to optimize diverse non-convex, non-differentiable black-box problems in graphics, including visibility in rendering, discrete parameter spaces in procedural modelling, and optimal control in physics-driven animation.
    Abstract Gradient-based optimization is now ubiquitous across graphics, but unfortunately can not be applied to problems with undefined or zero gradients. To circumvent this issue, the loss function can be manually replaced by a "surrogate" that has similar minima but is differentiable. Our proposed framework, ZeroGrads, automates this process by learning a neural approximation of the objective function, the surrogate, which in turn can be used to differentiate through arbitrary black-box graphics pipelines. We train the surrogate on an actively smoothed version of the objective and encourage locality, focusing the surrogate's capacity on what matters at the current training episode. The fitting is performed online, alongside the parameter optimization, and self-supervised, without pre-computed data or pre-trained models. As sampling the objective is expensive (it requires a full rendering or simulator run), we devise an efficient sampling scheme that allows for tractable run-times and competitive performance at little overhead. We demonstrate optimizing diverse non-convex, non-differentiable black-box problems in graphics, such as visibility in rendering, discrete parameter spaces in procedural modelling or optimal control in physics-driven animation. In contrast to more traditional algorithms, our approach scales well to higher dimensions, which we demonstrate on problems with up to 35k interlinked variables.
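The surrogate idea can be illustrated with the short loop below: a small MLP is fitted online to locally sampled values of a non-differentiable black-box objective, and its gradient is used to update the parameters. The toy objective, sampling scale, and network are assumptions for illustration, not the ZeroGrads implementation.

```python
# A minimal sketch of the surrogate idea behind ZeroGrads: a small MLP is fitted
# online to locally sampled values of a non-differentiable black-box loss around
# the current parameters, and its (well-defined) gradient drives the update.
import torch
import torch.nn as nn

def black_box_loss(theta):                 # non-differentiable objective (a step function)
    return (theta.round() - 3.0).abs().sum(dim=-1, keepdim=True)

theta = torch.zeros(1, 2)
surrogate = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt_surr = torch.optim.Adam(surrogate.parameters(), lr=1e-2)

for step in range(300):
    # 1) Fit the local surrogate on samples around the current parameters.
    samples = theta + 0.5 * torch.randn(64, 2)              # local, smoothed sampling
    targets = black_box_loss(samples)
    opt_surr.zero_grad()
    nn.functional.mse_loss(surrogate(samples), targets).backward()
    opt_surr.step()
    # 2) Step the parameters along the surrogate's gradient.
    theta.requires_grad_(True)
    surrogate(theta).sum().backward()
    with torch.no_grad():
        theta = theta - 0.05 * theta.grad

print("optimized parameters:", theta.detach().numpy().round(2))   # should drift toward [3, 3]
```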

Follow Anything: Open-set detection, tracking, and following in real-time

  • paper_url: http://arxiv.org/abs/2308.05737
  • repo_url: https://github.com/alaamaalouf/followanything
  • paper_authors: Alaa Maalouf, Ninad Jadhav, Krishna Murthy Jatavallabhula, Makram Chahine, Daniel M. Vogt, Robert J. Wood, Antonio Torralba, Daniela Rus
  • For: This paper presents a robotic system that can detect, track, and follow any object of interest in real time.
  • Methods: The system, dubbed "follow anything" (FAn), is an open-vocabulary and multimodal model that can be applied to novel classes at inference time using text, image, or click queries. Leveraging rich visual descriptors from large-scale pre-trained (foundation) models, FAn detects and segments the queried objects in an input image sequence and tracks them across frames while accounting for occlusion and object re-emergence.
  • Results: FAn is demonstrated on a real-world robotic system (a micro aerial vehicle) and shown to seamlessly follow objects of interest in a real-time control loop. It runs on a laptop with a lightweight (6-8 GB) graphics card at a throughput of 6-20 frames per second.
    Abstract Tracking and following objects of interest is critical to several robotics use cases, ranging from industrial automation to logistics and warehousing, to healthcare and security. In this paper, we present a robotic system to detect, track, and follow any object in real-time. Our approach, dubbed ``follow anything'' (FAn), is an open-vocabulary and multimodal model -- it is not restricted to concepts seen at training time and can be applied to novel classes at inference time using text, images, or click queries. Leveraging rich visual descriptors from large-scale pre-trained models (foundation models), FAn can detect and segment objects by matching multimodal queries (text, images, clicks) against an input image sequence. These detected and segmented objects are tracked across image frames, all while accounting for occlusion and object re-emergence. We demonstrate FAn on a real-world robotic system (a micro aerial vehicle) and report its ability to seamlessly follow the objects of interest in a real-time control loop. FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second. To enable rapid adoption, deployment, and extensibility, we open-source all our code on our project webpage at https://github.com/alaamaalouf/FollowAnything . We also encourage the reader the watch our 5-minutes explainer video in this https://www.youtube.com/watch?v=6Mgt3EPytrw .
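The open-vocabulary matching step of such a pipeline can be sketched as cosine similarity between a query embedding and per-region visual descriptors, with frame-to-frame re-association, as below. Random vectors stand in for real foundation-model features, so this illustrates only the control flow, not FAn itself.

```python
# A stripped-down sketch of the matching step in a FAn-style pipeline: a query
# embedding (from text, an image crop, or a click) is compared against per-region
# visual descriptors, and the best-matching region is re-associated frame to frame.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

rng = np.random.default_rng(0)
query = rng.normal(size=256)                       # e.g. text embedding of "red backpack"

track_feat = None
for frame in range(5):
    region_feats = rng.normal(size=(8, 256))       # descriptors of segmented regions
    region_feats[3] = query + 0.1 * rng.normal(size=256)   # plant the target region
    # Detection: match the query; tracking: also prefer similarity to the last tracked feature.
    reference = query if track_feat is None else 0.5 * query + 0.5 * track_feat
    scores = [cosine(reference, f) for f in region_feats]
    best = int(np.argmax(scores))
    track_feat = region_feats[best]
    print(f"frame {frame}: following region {best} (score {scores[best]:.2f})")
```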

PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers

  • paper_url: http://arxiv.org/abs/2308.05732
  • repo_url: None
  • paper_authors: Phillip Lippe, Bastiaan S. Veeling, Paris Perdikaris, Richard E. Turner, Johannes Brandstetter
  • for: This paper aims to improve the accuracy and stability of deep neural network-based surrogates for solving partial differential equations (PDEs) by addressing the neglect of non-dominant spatial frequency information.
  • methods: The authors use a large-scale analysis of common temporal rollout strategies and draw inspiration from recent advances in diffusion models to introduce a novel model class called PDE-Refiner, which uses a multistep refinement process to accurately model all frequency components of PDE solutions.
  • results: The authors validate PDE-Refiner on challenging benchmarks of complex fluid dynamics and demonstrate stable and accurate rollouts that consistently outperform state-of-the-art models, including neural, numerical, and hybrid neural-numerical architectures. Additionally, PDE-Refiner is shown to greatly enhance data efficiency by implicitly inducing a novel form of spectral data augmentation.
    Abstract Time-dependent partial differential equations (PDEs) are ubiquitous in science and engineering. Recently, mostly due to the high computational cost of traditional solution techniques, deep neural network based surrogates have gained increased interest. The practical utility of such neural PDE solvers relies on their ability to provide accurate, stable predictions over long time horizons, which is a notoriously hard problem. In this work, we present a large-scale analysis of common temporal rollout strategies, identifying the neglect of non-dominant spatial frequency information, often associated with high frequencies in PDE solutions, as the primary pitfall limiting stable, accurate rollout performance. Based on these insights, we draw inspiration from recent advances in diffusion models to introduce PDE-Refiner; a novel model class that enables more accurate modeling of all frequency components via a multistep refinement process. We validate PDE-Refiner on challenging benchmarks of complex fluid dynamics, demonstrating stable and accurate rollouts that consistently outperform state-of-the-art models, including neural, numerical, and hybrid neural-numerical architectures. We further demonstrate that PDE-Refiner greatly enhances data efficiency, since the denoising objective implicitly induces a novel form of spectral data augmentation. Finally, PDE-Refiner's connection to diffusion models enables an accurate and efficient assessment of the model's predictive uncertainty, allowing us to estimate when the surrogate becomes inaccurate.
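The multistep refinement rollout can be pictured schematically as below: an initial prediction of the next state is repeatedly refined by a network conditioned on the refinement step, with decreasing noise levels so that low-amplitude frequency content is also captured. The tiny MLP, noise schedule, and shapes are placeholders, not the paper's architecture.

```python
# A schematic sketch of the multistep refinement idea in PDE-Refiner: the next
# state is first predicted, then repeatedly refined by a denoising network
# conditioned on the refinement step, with decreasing noise levels.
import torch
import torch.nn as nn

class Refiner(nn.Module):
    def __init__(self, n_points=64, n_steps=4):
        super().__init__()
        self.n_steps = n_steps
        self.net = nn.Sequential(nn.Linear(2 * n_points + 1, 256), nn.GELU(),
                                 nn.Linear(256, n_points))

    def forward(self, u_t, u_next_est, k):
        k_embed = torch.full((u_t.shape[0], 1), float(k) / self.n_steps)
        return self.net(torch.cat([u_t, u_next_est, k_embed], dim=-1))

def rollout(model, u0, horizon, sigmas=(0.0, 0.1, 0.01, 0.001)):
    u_t = u0
    traj = [u0]
    for _ in range(horizon):
        u_next = torch.zeros_like(u_t)                  # step 0: predict from scratch
        for k, sigma in enumerate(sigmas):
            noisy = u_next + sigma * torch.randn_like(u_next)
            u_next = model(u_t, noisy, k)               # denoise / refine the estimate
        traj.append(u_next)
        u_t = u_next
    return torch.stack(traj, dim=1)

model = Refiner()
print(rollout(model, torch.randn(8, 64), horizon=10).shape)   # (8, 11, 64)
```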

Rethinking Integration of Prediction and Planning in Deep Learning-Based Automated Driving Systems: A Review

  • paper_url: http://arxiv.org/abs/2308.05731
  • repo_url: None
  • paper_authors: Steffen Hagedorn, Marcel Hallgarten, Martin Stoll, Alexandru Condurache
  • for: This review examines prediction and planning models in automated driving systems, with the goal of safe, comfortable, and efficient driving.
  • methods: The review focuses on deep learning-based approaches that integrate prediction and planning, systematically surveying and analyzing the different integration principles.
  • results: Integrating prediction and planning can improve the safety, comfort, and efficiency of automated driving systems, but several research challenges and limitations remain.
    Abstract Automated driving has the potential to revolutionize personal, public, and freight mobility. Besides the enormous challenge of perception, i.e. accurately perceiving the environment using available sensor data, automated driving comprises planning a safe, comfortable, and efficient motion trajectory. To promote safety and progress, many works rely on modules that predict the future motion of surrounding traffic. Modular automated driving systems commonly handle prediction and planning as sequential separate tasks. While this accounts for the influence of surrounding traffic on the ego-vehicle, it fails to anticipate the reactions of traffic participants to the ego-vehicle's behavior. Recent works suggest that integrating prediction and planning in an interdependent joint step is necessary to achieve safe, efficient, and comfortable driving. While various models implement such integrated systems, a comprehensive overview and theoretical understanding of different principles are lacking. We systematically review state-of-the-art deep learning-based prediction, planning, and integrated prediction and planning models. Different facets of the integration ranging from model architecture and model design to behavioral aspects are considered and related to each other. Moreover, we discuss the implications, strengths, and limitations of different integration methods. By pointing out research gaps, describing relevant future challenges, and highlighting trends in the research field, we identify promising directions for future research.

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

  • paper_url: http://arxiv.org/abs/2308.05725
  • repo_url: None
  • paper_authors: Tu Anh Nguyen, Wei-Ning Hsu, Antony D’Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarani, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux
  • For: The paper is written for researchers and developers working on speech synthesis, particularly those interested in textless speech synthesis and expressive speech synthesis.
  • Methods: The paper uses low bitrate discrete units that have been learned in a self-supervised fashion to resynthesize high-quality speech. The authors introduce a new dataset called Expresso, which includes both read speech and improvised dialogues rendered in 26 spontaneous expressive styles.
  • Results: The paper presents an expressive resynthesis benchmark that evaluates the quality of resynthesized speech using automatic metrics for different self-supervised discrete encoders. The authors explore tradeoffs between quality, bitrate, and invariance to speaker and style.
    Abstract Recent work has shown that it is possible to resynthesize high-quality speech based, not on text, but on low bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization). The adoption of these methods is still limited by the fact that most speech synthesis datasets are read, severely limiting spontaneity and expressivity. Here, we introduce Expresso, a high-quality expressive speech dataset for textless speech synthesis that includes both read speech and improvised dialogues rendered in 26 spontaneous expressive styles. We illustrate the challenges and potentials of this dataset with an expressive resynthesis benchmark where the task is to encode the input in low-bitrate units and resynthesize it in a target voice while preserving content and style. We evaluate resynthesis quality with automatic metrics for different self-supervised discrete encoders, and explore tradeoffs between quality, bitrate and invariance to speaker and style. All the dataset, evaluation metrics and baseline models are open source

Optimizing Performance of Feedforward and Convolutional Neural Networks through Dynamic Activation Functions

  • paper_url: http://arxiv.org/abs/2308.05724
  • repo_url: None
  • paper_authors: Chinmay Rane, Kanishka Tyagi, Michael Manry
  • for: This paper is motivated by the recent success of deep learning training algorithms across many fields and the trend toward ever deeper convolutional neural networks (CNNs), while noting that shallow CNNs remain an active research area.
  • methods: Piece-wise linear (PWL) activation functions are used in the hidden layers and compared against the commonly used ReLU activation.
  • results: The PWL activations outperform ReLU activations in both convolutional neural networks and multilayer perceptrons; comparisons of shallow and deep CNNs in PyTorch further support this finding.
    Abstract Deep learning training algorithms have been a huge success in recent years in many fields, including speech, text, image, and video. Deeper and deeper networks have been proposed with great success, with ResNet structures reaching around 152 layers. Shallow convolutional neural networks (CNNs) are still an active research area, where some phenomena remain unexplained. The activation functions used in a network are of utmost importance, as they provide its non-linearity. ReLUs are the most commonly used activation function. We propose a complex piece-wise linear (PWL) activation in the hidden layer and show that these PWL activations work much better than ReLU activations in our convolutional neural networks and multilayer perceptrons. Result comparisons in PyTorch for shallow and deep CNNs are given to further strengthen our case.
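A minimal sketch of a learnable piece-wise linear hidden activation, of the general kind the abstract describes, is shown below as a sum of hinges with trainable slopes at fixed breakpoints. The breakpoints, initialization, and layer sizes are illustrative assumptions, not the paper's exact parameterization.

```python
# A minimal sketch of a learnable piece-wise linear (PWL) hidden activation:
# a sum of hinges with trainable slopes at fixed breakpoints, used in place of ReLU.
import torch
import torch.nn as nn

class PiecewiseLinear(nn.Module):
    def __init__(self, num_hinges=5, low=-2.0, high=2.0):
        super().__init__()
        self.register_buffer("breaks", torch.linspace(low, high, num_hinges))
        self.slopes = nn.Parameter(torch.zeros(num_hinges))
        self.slopes.data[num_hinges // 2:] = 1.0 / (num_hinges // 2 + 1)  # ReLU-like init

    def forward(self, x):
        # y(x) = sum_k slope_k * max(0, x - break_k): continuous and piece-wise linear.
        hinges = torch.relu(x.unsqueeze(-1) - self.breaks)
        return (hinges * self.slopes).sum(dim=-1)

mlp = nn.Sequential(nn.Linear(10, 32), PiecewiseLinear(), nn.Linear(32, 2))
print(mlp(torch.randn(4, 10)).shape)   # torch.Size([4, 2])
```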

A Comparison of Classical and Deep Reinforcement Learning Methods for HVAC Control

  • paper_url: http://arxiv.org/abs/2308.05711
  • repo_url: None
  • paper_authors: Marshall Wang, John Willes, Thomas Jiralerspong, Matin Moezzi
  • for: Optimizing the performance and energy efficiency of HVAC control systems.
  • methods: Classical and deep RL methods (Q-Learning and Deep Q-Networks) are benchmarked across multiple HVAC environments, with an exploration of the practical considerations of hyperparameter selection and reward tuning for the RL agents.
  • results: The findings provide guidance for configuring RL agents in HVAC systems, promoting energy-efficient and cost-effective operation.
    Abstract Reinforcement learning (RL) is a promising approach for optimizing HVAC control. RL offers a framework for improving system performance, reducing energy consumption, and enhancing cost efficiency. We benchmark two popular classical and deep RL methods (Q-Learning and Deep-Q-Networks) across multiple HVAC environments and explore the practical consideration of model hyper-parameter selection and reward tuning. The findings provide insight for configuring RL agents in HVAC systems, promoting energy-efficient and cost-effective operation.
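For readers unfamiliar with the classical side of the comparison, the toy tabular Q-learning loop below controls a made-up thermostat environment with a comfort-plus-energy reward. Everything about the environment is invented for illustration and is far simpler than the HVAC environments benchmarked in the paper.

```python
# A toy tabular Q-learning sketch for thermostat-style control, illustrating the
# kind of classical RL agent benchmarked above. The environment is made up.
import numpy as np

rng = np.random.default_rng(0)
temps = np.arange(15, 26)                 # discretized room temperature states (15..25 C)
n_s, n_a = len(temps), 2                  # actions: 0 = heater off, 1 = heater on
Q = np.zeros((n_s, n_a))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(s, a):
    t = temps[s] + (1 if a == 1 else -1) + rng.integers(-1, 2)   # crude dynamics + noise
    t = int(np.clip(t, temps[0], temps[-1]))
    reward = -abs(t - 21) - 0.5 * a       # comfort target 21 C, energy penalty for heating
    return int(np.where(temps == t)[0][0]), reward

s = 0
for _ in range(20000):
    a = rng.integers(n_a) if rng.random() < eps else int(Q[s].argmax())
    s2, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])       # Q-learning update
    s = s2

print("greedy policy (0=off, 1=on) per temperature:", Q.argmax(axis=1))
```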

Shadow Datasets, New challenging datasets for Causal Representation Learning

  • paper_url: http://arxiv.org/abs/2308.05707
  • repo_url: https://github.com/Jiagengzhu/Shadow-dataset-for-crl
  • paper_authors: Jiageng Zhu, Hanchen Xie, Jianhua Wu, Jiazhi Li, Mahyar Khayatkhoei, Mohamed E. Hussein, Wael AbdAlmageed
  • for: This work aims to support the discovery of causal relations among semantic factors, improving causal understanding in representation learning.
  • methods: Weakly supervised causal representation learning (CRL) methods are considered, avoiding the high cost of labeling.
  • results: Two new CRL datasets with a larger number of diverse generative factors and more sophisticated causal graphs are proposed, together with modifications to the existing CelebA(BEARD) and CelebA(SMILE) datasets so that their causal graphs align with the data distributions and CRL performance can be evaluated more thoroughly.
    Abstract Discovering causal relations among semantic factors is an emergent topic in representation learning. Most causal representation learning (CRL) methods are fully supervised, which is impractical due to costly labeling. To resolve this restriction, weakly supervised CRL methods were introduced. To evaluate CRL performance, four existing datasets, Pendulum, Flow, CelebA(BEARD) and CelebA(SMILE), are utilized. However, existing CRL datasets are limited to simple graphs with few generative factors. Thus we propose two new datasets with a larger number of diverse generative factors and more sophisticated causal graphs. In addition, current real datasets, CelebA(BEARD) and CelebA(SMILE), the originally proposed causal graphs are not aligned with the dataset distributions. Thus, we propose modifications to them.

Hard No-Box Adversarial Attack on Skeleton-Based Human Action Recognition with Skeleton-Motion-Informed Gradient

  • paper_url: http://arxiv.org/abs/2308.05681
  • repo_url: https://github.com/luyg45/hardnoboxattack
  • paper_authors: Zhengzhi Lu, He Wang, Ziyi Chang, Guoan Yang, Hubert P. H. Shum
  • for: This work demonstrates the vulnerability of skeleton-based human activity recognition methods to adversarial attacks.
  • methods: A new attack task is studied in which the attacker has no access to the victim model, the training data, or the labels, which the authors call a "hard no-box attack". A motion manifold is first learned, and an adversarial loss is then defined to compute a new attack direction, the skeleton-motion-informed (SMI) gradient. This gradient incorporates motion dynamics, unlike existing gradient-based attack methods.
  • results: The method poses a real threat to existing classifiers, and the SMI gradient improves the transferability and imperceptibility of adversarial samples in both no-box and transfer-based black-box settings.
    Abstract Recently, methods for skeleton-based human activity recognition have been shown to be vulnerable to adversarial attacks. However, these attack methods require either the full knowledge of the victim (i.e. white-box attacks), access to training data (i.e. transfer-based attacks) or frequent model queries (i.e. black-box attacks). All their requirements are highly restrictive, raising the question of how detrimental the vulnerability is. In this paper, we show that the vulnerability indeed exists. To this end, we consider a new attack task: the attacker has no access to the victim model or the training data or labels, where we coin the term hard no-box attack. Specifically, we first learn a motion manifold where we define an adversarial loss to compute a new gradient for the attack, named skeleton-motion-informed (SMI) gradient. Our gradient contains information of the motion dynamics, which is different from existing gradient-based attack methods that compute the loss gradient assuming each dimension in the data is independent. The SMI gradient can augment many gradient-based attack methods, leading to a new family of no-box attack methods. Extensive evaluation and comparison show that our method imposes a real threat to existing classifiers. They also show that the SMI gradient improves the transferability and imperceptibility of adversarial samples in both no-box and transfer-based black-box settings.

Finding Already Debunked Narratives via Multistage Retrieval: Enabling Cross-Lingual, Cross-Dataset and Zero-Shot Learning

  • paper_url: http://arxiv.org/abs/2308.05680
  • repo_url: None
  • paper_authors: Iknoor Singh, Carolina Scarton, Xingyi Song, Kalina Bontcheva
  • for: This work studies cross-lingual retrieval of already debunked narratives, aiming to reduce the manual effort of professional fact-checkers and to slow the spread of misinformation.
  • methods: A novel dataset is created that uses tweets as queries against a database of fact-checking articles, and an extensive experiment benchmarks fine-tuned and off-the-shelf multilingual pre-trained Transformer models on this cross-lingual task.
  • results: Cross-lingual retrieval of already debunked narratives is challenging, and off-the-shelf Transformer models fail to outperform a strong lexical baseline (BM25). However, the proposed multistage retrieval framework mitigates this shortfall, outperforming BM25 in most scenarios and enabling cross-domain and zero-shot learning without significantly harming performance.
    Abstract The task of retrieving already debunked narratives aims to detect stories that have already been fact-checked. The successful detection of claims that have already been debunked not only reduces the manual efforts of professional fact-checkers but can also contribute to slowing the spread of misinformation. Mainly due to the lack of readily available data, this is an understudied problem, particularly when considering the cross-lingual task, i.e. the retrieval of fact-checking articles in a language different from the language of the online post being checked. This paper fills this gap by (i) creating a novel dataset to enable research on cross-lingual retrieval of already debunked narratives, using tweets as queries to a database of fact-checking articles; (ii) presenting an extensive experiment to benchmark fine-tuned and off-the-shelf multilingual pre-trained Transformer models for this task; and (iii) proposing a novel multistage framework that divides this cross-lingual debunk retrieval task into refinement and re-ranking stages. Results show that the task of cross-lingual retrieval of already debunked narratives is challenging and off-the-shelf Transformer models fail to outperform a strong lexical-based baseline (BM25). Nevertheless, our multistage retrieval framework is robust, outperforming BM25 in most scenarios and enabling cross-domain and zero-shot learning, without significantly harming the model's performance.
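The refinement-and-re-ranking pipeline can be sketched as BM25 candidate retrieval followed by a Transformer cross-encoder re-ranker, as below. The libraries and the (English) cross-encoder checkpoint are example choices, not the paper's exact setup; a multilingual model would be needed for the cross-lingual case.

```python
# A schematic sketch of a multistage debunk-retrieval pipeline: stage 1 retrieves
# fact-check candidates with a lexical BM25 ranker, stage 2 re-ranks them with a
# Transformer cross-encoder. Library and model choices are illustrative.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

fact_checks = [
    "No, 5G towers do not spread the coronavirus.",
    "Debunked: drinking bleach does not cure COVID-19.",
    "False claim that the earthquake footage was from 2023.",
]
tweet = "heard that 5g antennas are giving people covid, is that true?"

# Stage 1: cheap lexical retrieval over the whole fact-check database.
bm25 = BM25Okapi([doc.lower().split() for doc in fact_checks])
scores = bm25.get_scores(tweet.lower().split())
candidates = sorted(range(len(fact_checks)), key=lambda i: -scores[i])[:2]

# Stage 2: re-rank the shortlist with a cross-encoder (a multilingual model
# would be used in the cross-lingual setting).
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pair_scores = reranker.predict([(tweet, fact_checks[i]) for i in candidates])
best = candidates[int(pair_scores.argmax())]
print("best matching debunk:", fact_checks[best])
```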