results: Experimental analysis shows that the introduced reservoir model reaches the theoretical maximum short-term memory capacity. At the same time, compared to the standard ESN, ES$^2$N offers a more favorable trade-off between memory and nonlinearity, as well as a significant improvement in autoregressive nonlinear modeling.
Abstract
In this paper, we propose a new Reservoir Computing (RC) architecture, called the Edge of Stability Echo State Network (ES$^2$N). The introduced ES$^2$N model is based on defining the reservoir layer as a convex combination of a nonlinear reservoir (as in the standard ESN) and a linear reservoir that implements an orthogonal transformation. We provide a thorough mathematical analysis of the introduced model, proving that the whole eigenspectrum of the Jacobian of the ES$^2$N map can be contained in an annular neighbourhood of a complex circle of controllable radius, and exploit this property to demonstrate that the ES$^2$N's forward dynamics evolve close to the edge-of-chaos regime by design. Remarkably, our experimental analysis shows that the newly introduced reservoir model is able to reach the theoretical maximum short-term memory capacity. At the same time, in comparison to the standard ESN, ES$^2$N is shown to offer a favorable trade-off between memory and nonlinearity, as well as a significant improvement of performance in autoregressive nonlinear modeling.
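Although the exact update rule is not given in this summary, the description above suggests a state update of the form $x_{t+1} = \beta\,\tanh(Wx_t + w_{in}u_t) + (1-\beta)\,Ox_t$ with $O$ orthogonal. A minimal NumPy sketch under that assumption (the reservoir size and mixing coefficient $\beta$ are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)
N, beta = 50, 0.3  # reservoir size and (assumed) convex mixing coefficient

# Nonlinear reservoir weights, input weights, and an orthogonal matrix via QR
W = rng.normal(0, 1.0 / np.sqrt(N), (N, N))
w_in = rng.normal(0, 1.0, N)
O, _ = np.linalg.qr(rng.normal(size=(N, N)))  # O is orthogonal: O @ O.T = I

def es2n_step(x, u):
    """Convex combination of a tanh reservoir and an orthogonal linear map."""
    return beta * np.tanh(W @ x + w_in * u) + (1.0 - beta) * (O @ x)

x = np.zeros(N)
for u in np.sin(0.1 * np.arange(200)):  # drive with a simple input signal
    x = es2n_step(x, u)
print(np.linalg.norm(x))  # state remains bounded
```

Since the tanh branch is bounded and the linear branch is norm-preserving, the state norm stays finite for any $\beta \in (0, 1]$.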
Textual Data Mining for Financial Fraud Detection: A Deep Learning Approach
results: The results show that this diverse set of neural network models can accurately detect financial fraud. These results carry significant implications for financial fraud detection, providing valuable insights that help industry practitioners, regulators, and researchers develop more robust fraud detection methods.
Abstract
In this report, I present a deep learning approach to conduct a natural language processing (hereafter NLP) binary classification task for analyzing financial-fraud texts. First, I searched for regulatory announcements and enforcement bulletins from HKEX news to define fraudulent companies and to extract their MD&A reports before I organized the sentences from the reports with labels and reporting time. My methodology involved different kinds of neural network models, including Multilayer Perceptrons with Embedding layers, vanilla Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Units (GRU) for the text classification task. By utilizing this diverse set of models, I aim to perform a comprehensive comparison of their accuracy in detecting financial fraud. My results bring significant implications for financial fraud detection as this work contributes to the growing body of research at the intersection of deep learning, NLP, and finance, providing valuable insights for industry practitioners, regulators, and researchers in the pursuit of more robust and effective fraud detection methodologies.
paper_authors: Alireza Rafiei, Ronald Moore, Sina Jahromi, Farshid Hajati, Rishikesan Kamaleswaran
for: This paper surveys the application of meta-learning, which leverages prior knowledge and experience, in the healthcare domain, providing insight into how it can address critical healthcare challenges.
methods: The paper reviews a range of meta-learning approaches used in healthcare, divided into two main categories: multi/single-task learning and many/few-shot learning.
results: The paper summarizes recent meta-learning research in healthcare, covering its applications and challenges, and discusses open problems, potential solutions, and future perspectives.
Abstract
As a subset of machine learning, meta-learning, or learning to learn, aims at improving the model's capabilities by employing prior knowledge and experience. A meta-learning paradigm can appropriately tackle the conventional challenges of traditional learning approaches, such as an insufficient number of samples, domain shifts, and generalization. These unique characteristics position meta-learning as a suitable choice for developing influential solutions in various healthcare contexts, where the available data is often insufficient, and the data collection methodologies are different. This survey discusses meta-learning's broad applications in the healthcare domain to provide insight into how and where it can address critical healthcare challenges. We first describe the theoretical foundations and pivotal methods of meta-learning. We then divide the employed meta-learning approaches in the healthcare domain into two main categories of multi/single-task learning and many/few-shot learning and survey the studies. Finally, we highlight the current challenges in meta-learning research, discuss the potential solutions and provide future perspectives on meta-learning in healthcare.
Semi-supervised Learning for Segmentation of Bleeding Regions in Video Capsule Endoscopy
paper_authors: Hechen Li, Yanan Wu, Long Bai, An Wang, Tong Chen, Hongliang Ren
for: Diagnosing various gastrointestinal (GI) conditions, including obscure bleeding.
methods: A semi-supervised learning approach based on the Mean Teacher method, in which a student U-Net and a teacher model of the same architecture alternately update their parameters during training.
results: Experimental results show that the SSL approach reduces the amount of annotation required for model training without compromising accuracy.
Abstract
In the realm of modern diagnostic technology, video capsule endoscopy (VCE) is a standout for its high efficacy and non-invasive nature in diagnosing various gastrointestinal (GI) conditions, including obscure bleeding. Importantly, for the successful diagnosis and treatment of these conditions, accurate recognition of bleeding regions in VCE images is crucial. While deep learning-based methods have emerged as powerful tools for the automated analysis of VCE images, they often demand large training datasets with comprehensive annotations. Acquiring these labeled datasets tends to be time-consuming, costly, and requires significant domain expertise. To mitigate this issue, we have embraced a semi-supervised learning (SSL) approach for bleeding region segmentation within VCE. By adopting the `Mean Teacher' method, we construct a student U-Net equipped with an scSE attention block, alongside a teacher model of the same architecture. These models' parameters are alternately updated throughout the training process. We use the Kvasir-Capsule dataset for our experiments, which encompasses various GI bleeding conditions. Notably, we develop the segmentation annotations for this dataset ourselves. The findings from our experiments endorse the efficacy of the SSL-based segmentation strategy, demonstrating its capacity to reduce reliance on large volumes of annotations for model training, without compromising on the accuracy of identification.
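The `Mean Teacher' scheme mentioned above keeps a teacher model whose weights track an exponential moving average of the student's. A minimal sketch of that update (toy parameter dicts stand in for the U-Net weights; `alpha` is an assumed smoothing coefficient, not a value from the paper):

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Mean Teacher: teacher weights track an exponential moving average
    of the student weights; alpha is the smoothing coefficient."""
    for k in teacher:
        teacher[k] = alpha * teacher[k] + (1.0 - alpha) * student[k]

# Toy parameters standing in for U-Net weights (illustrative only)
student = {"w": np.ones(4)}
teacher = {"w": np.zeros(4)}
for _ in range(500):           # many training steps with a fixed student
    ema_update(teacher, student)
print(teacher["w"])            # converges toward the student weights
```

In practice the student is updated by gradient descent on labeled plus consistency losses, and the EMA step runs once per optimizer step.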
Replace Scoring with Arrangement: A Contextual Set-to-Arrangement Framework for Learning-to-Rank
results: Experimental results show that STARank outperforms 9 state-of-the-art methods on 2 learning-to-rank benchmark datasets and 3 top-N real-world recommendation datasets. Moreover, STARank delivers better performance when the contextual dependence among candidate items is taken into account, as measured by the proposed simulation-based metrics.
Abstract
Learning-to-rank is a core technique in the top-N recommendation task, where an ideal ranker would be a mapping from an item set to an arrangement (a.k.a. permutation). Most existing solutions fall in the paradigm of probabilistic ranking principle (PRP), i.e., first score each item in the candidate set and then perform a sort operation to generate the top ranking list. However, these approaches neglect the contextual dependence among candidate items during individual scoring, and the sort operation is non-differentiable. To bypass the above issues, we propose Set-To-Arrangement Ranking (STARank), a new framework that directly generates permutations of the candidate items without the need for individual scoring and sort operations, and that is end-to-end differentiable. As a result, STARank can operate when only the ground-truth permutations are accessible without requiring access to the ground-truth relevance scores for items. For this purpose, STARank first reads the candidate items in the context of the user browsing history, whose representations are fed into a Plackett-Luce module to arrange the given items into a list. To effectively utilize the given ground-truth permutations for supervising STARank, we leverage the internal consistency property of Plackett-Luce models to derive a computationally efficient list-wise loss. Experimental comparisons against 9 state-of-the-art methods on 2 learning-to-rank benchmark datasets and 3 top-N real-world recommendation datasets demonstrate the superiority of STARank in terms of conventional ranking metrics. Since these ranking metrics do not consider the effects of the contextual dependence among the items in the list, we design a new family of simulation-based ranking metrics, where existing metrics can be regarded as special cases. STARank can consistently achieve better performance in terms of PBM and UBM simulation-based metrics.
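The Plackett-Luce model underlying STARank assigns a permutation the probability of drawing items one by one, each with softmax probability over the items still remaining. A small sketch of the resulting list-wise negative log-likelihood (a generic PL likelihood, not necessarily the paper's exact loss):

```python
import numpy as np

def plackett_luce_nll(scores, perm):
    """Negative log-likelihood of an observed permutation under the
    Plackett-Luce model: items are drawn sequentially, each with
    probability softmax(scores over the items still remaining)."""
    scores = np.asarray(scores, dtype=float)
    nll, remaining = 0.0, list(range(len(scores)))
    for item in perm:
        s = scores[remaining]
        logz = np.log(np.exp(s - s.max()).sum()) + s.max()  # stable log-sum-exp
        nll += logz - scores[item]
        remaining.remove(item)
    return nll

scores = [3.0, 1.0, 0.0]
# A permutation that agrees with the scores is more likely (lower NLL)
print(plackett_luce_nll(scores, [0, 1, 2]) < plackett_luce_nll(scores, [2, 1, 0]))
```

With uniform scores every permutation of $n$ items has NLL $\log n!$, which is a handy sanity check.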
results: This paper presents feather's motivations and value proposition, along with comprehensive technical and implementation details. For example, the SDK can support multi-step models and can be extended to run automatic evaluation against held-out datasets. Note that feather is presently a dormant project; the code has been open sourced for research purposes.
Abstract
At its core, feather was a tool that allowed model developers to build shareable user interfaces for their models in under 20 lines of code. Using the Python SDK, developers specified visual components that users would interact with. (e.g. a FileUpload component to allow users to upload a file). Our service then provided 1) a URL that allowed others to access and use the model visually via a user interface; 2) an API endpoint to allow programmatic requests to a model. In this paper, we discuss feather's motivations and the value we intended to offer AI researchers and developers. For example, the SDK can support multi-step models and can be extended to run automatic evaluation against held out datasets. We additionally provide comprehensive technical and implementation details. N.B. feather is presently a dormant project. We have open sourced our code for research purposes: https://github.com/feather-ai/
Physics-Based Task Generation through Causal Sequence of Physical Interactions
results: The proposed task generation methodology is demonstrated using the physics-based puzzle game Angry Birds, and the generated tasks are evaluated with a range of metrics, including physical stability, solvability using intended physical interactions, and accidental solvability using unintended solutions.
Abstract
Performing tasks in a physical environment is a crucial yet challenging problem for AI systems operating in the real world. Physics simulation-based tasks are often employed to facilitate research that addresses this challenge. In this paper, first, we present a systematic approach for defining a physical scenario using a causal sequence of physical interactions between objects. Then, we propose a methodology for generating tasks in a physics-simulating environment using these defined scenarios as inputs. Our approach enables a better understanding of the granular mechanics required for solving physics-based tasks, thereby facilitating accurate evaluation of AI systems' physical reasoning capabilities. We demonstrate our proposed task generation methodology using the physics-based puzzle game Angry Birds and evaluate the generated tasks using a range of metrics, including physical stability, solvability using intended physical interactions, and accidental solvability using unintended solutions. We believe that the tasks generated using our proposed methodology can facilitate a nuanced evaluation of physical reasoning agents, thus paving the way for the development of agents for more sophisticated real-world applications.
Multi-Agent Verification and Control with Probabilistic Model Checking
results: This paper summarizes recent advances in probabilistic model checking for multi-agent settings and highlights the applications for which they have already been used. It also outlines the key challenges of reasoning in multi-agent settings.
Abstract
Probabilistic model checking is a technique for formal automated reasoning about software or hardware systems that operate in the context of uncertainty or stochasticity. It builds upon ideas and techniques from a diverse range of fields, from logic, automata and graph theory, to optimisation, numerical methods and control. In recent years, probabilistic model checking has also been extended to integrate ideas from game theory, notably using models such as stochastic games and solution concepts such as equilibria, to formally verify the interaction of multiple rational agents with distinct objectives. This provides a means to reason flexibly about agents acting in either an adversarial or a collaborative fashion, and opens up opportunities to tackle new problems within, for example, artificial intelligence, robotics and autonomous systems. In this paper, we summarise some of the advances in this area, and highlight applications for which they have already been used. We discuss how the strengths of probabilistic model checking apply, or have the potential to apply, to the multi-agent setting and outline some of the key challenges required to make further progress in this field.
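As background for the abstract above, the core numerical step in probabilistic model checking of MDPs is computing extremal reachability probabilities by value iteration. A textbook-style sketch on a toy 3-state MDP (this is standard material, not taken from the paper):

```python
import numpy as np

def max_reach_prob(P, target, iters=200):
    """Value iteration for Pr_max(reach target) in an MDP.
    P[s] is a list of actions; each action is a probability vector over
    successor states. A standard computation in probabilistic model checking."""
    n = len(P)
    v = np.zeros(n)
    v[list(target)] = 1.0
    for _ in range(iters):
        for s in range(n):
            if s in target:
                continue
            v[s] = max(np.dot(a, v) for a in P[s])
    return v

# Toy MDP: from state 0, one action reaches the target w.p. 0.5, the other
# w.p. 0.9; state 1 is a losing sink, state 2 the target.
P = [
    [np.array([0.0, 0.5, 0.5]), np.array([0.0, 0.1, 0.9])],  # state 0
    [np.array([0.0, 1.0, 0.0])],                             # state 1: sink
    [np.array([0.0, 0.0, 1.0])],                             # state 2: target
]
v = max_reach_prob(P, {2})
print(v[0])  # maximal reachability probability from state 0
```

Tools such as PRISM implement this kind of fixed-point computation (plus graph-based precomputation) at scale; the game-theoretic extensions mentioned in the abstract generalize the `max` over actions to equilibria of multiple players.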
A Symbolic Character-Aware Model for Solving Geometry Problems
results: Achieves new state-of-the-art results on the GeoQA and Geometry3K benchmark datasets: on GeoQA, question-solving accuracy is increased from 60.0% to 64.1%, and on Geometry3K the average number of solving steps is reduced from 6.9 to 6.0.
Abstract
AI has made significant progress in solving math problems, but geometry problems remain challenging due to their reliance on both text and diagrams. In the text description, symbolic characters such as "$\triangle$ABC" often serve as a bridge to connect the corresponding diagram. However, by simply tokenizing symbolic characters into individual letters (e.g., 'A', 'B' and 'C'), existing works fail to study them explicitly and thus lose the semantic relationship with the diagram. In this paper, we develop a symbolic character-aware model to fully explore the role of these characters in both text and diagram understanding and optimize the model under a multi-modal reasoning framework. In the text encoder, we propose merging individual symbolic characters to form one semantic unit along with geometric information from the corresponding diagram. For the diagram encoder, we pre-train it under a multi-label classification framework with the symbolic characters as labels. In addition, we enhance the geometry diagram understanding ability via a self-supervised learning method under the masked image modeling auxiliary task. By integrating the proposed model into a general encoder-decoder pipeline for solving geometry problems, we demonstrate its superiority on two benchmark datasets, including GeoQA and Geometry3K, with extensive experiments. Specifically, on GeoQA, the question-solving accuracy is increased from 60.0\% to 64.1\%, achieving a new state-of-the-art accuracy; on Geometry3K, we reduce the question average solving steps from 6.9 down to 6.0 with marginally higher solving accuracy.
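The merging of symbolic characters described above can be illustrated with a hypothetical pre-tokenization rule that keeps a geometry symbol together with its point labels, so "$\triangle$ABC" stays one unit instead of being split into 'A', 'B', 'C'. The symbol list and regex below are assumptions for illustration, not the paper's implementation:

```python
import re

# Hypothetical pre-tokenization: a geometry symbol plus its point labels is
# kept as one token; otherwise fall back to words and single characters.
SYMBOL = re.compile(r"\$\\(?:triangle|angle|odot)\$[A-Z]+|\w+|\S")

def tokenize(text):
    return SYMBOL.findall(text)

toks = tokenize(r"In $\triangle$ABC, AB = 3")
print(toks)  # the symbol and its point labels stay one token
```

The merged token can then be embedded as a single semantic unit and tied to the corresponding diagram elements, as the abstract describes.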
The changing rule of human bone density with aging based on a novel definition and mensuration of bone density with computed tomography
results: The study finds that bone density declines linearly with age, with the rate of decline differing between sexes: between the ages of 39 and 80, women's bone density declines approximately 1.6 times faster than men's. These results demonstrate the linearity of age-related changes in bone density, offering a new perspective for bone health research and clinical diagnosis.
Abstract
Osteoporosis and fragility fractures have emerged as major public health concerns in an aging population. However, measuring age-related changes in bone density using dual-energy X-ray absorptiometry has limited personalized risk assessment due to susceptibility to interference from various factors. In this study, we propose an innovative statistical model of bone pixel distribution in fine-segmented computed tomography (CT) images, along with a novel approach to measuring bone density based on CT values of bone pixels. Our findings indicate that bone density exhibits a linear decline with age during adulthood between the ages of 39 and 80, with the rate of decline being approximately 1.6 times faster in women than in men. This contradicts the widely accepted notion that bone density starts declining in women at menopause and in men at around 50 years of age. The linearity of age-related changes provides further insights into the dynamics of the aging human body. Consequently, our findings suggest that the definition of osteoporosis by the World Health Organization should be revised to the standard deviation of age-based bone density. Furthermore, these results open up new avenues for research in bone health care and clinical investigation of osteoporosis.
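The linear age model above can be illustrated on synthetic data: fit a straight line to density-versus-age samples per sex and compare slopes. The densities and slopes below are invented for illustration and are not the study's measured values:

```python
import numpy as np

rng = np.random.default_rng(1)
ages = np.arange(39, 81)  # the adult age range studied

# Synthetic CT-value bone densities with a linear age decline
# (slopes chosen so the women/men ratio is ~1.6, purely illustrative)
men = 100.0 - 0.5 * (ages - 39) + rng.normal(0, 1.0, ages.size)
women = 100.0 - 0.8 * (ages - 39) + rng.normal(0, 1.0, ages.size)

slope_m = np.polyfit(ages, men, 1)[0]    # least-squares slope per sex
slope_w = np.polyfit(ages, women, 1)[0]
print(slope_w / slope_m)  # ratio of decline rates, ~1.6 by construction
```

The same least-squares fit, applied to real per-pixel CT densities, is what would reveal the linear decline and the sex-specific slopes the abstract reports.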
Artificial Intelligence for Molecular Communication
paper_authors: Max Bartunik, Jens Kirchner, Oliver Keszocze
for: This work investigates molecular communication, in particular its use for data transmission in medical devices.
methods: Artificial neural networks are used to demodulate noisy received signals.
results: The study finds that artificial neural networks can reliably classify noisy received signals.
Abstract
Molecular communication is a novel approach for data transmission between miniaturized devices, especially in contexts where electrical signals are to be avoided. The communication is based on sending molecules (or other particles) at nano scale through a channel instead of sending electrons over a wire. Molecular communication devices have a large potential in medical applications as they offer an alternative to antenna-based transmission systems that may not be applicable due to size, temperature, or radiation constraints. The communication is achieved by transforming a digital signal into concentrations of molecules. These molecules are then detected at the other end of the communication channel and transformed back into a digital signal. Accurately modeling the transmission channel is often not possible, which may be due to a lack of data or time-varying parameters of the channel (e.g., the movements of a person wearing a medical device). This makes demodulation of the signal very difficult. Many approaches for demodulation have been discussed, with one particular approach having tremendous success: artificial neural networks. These networks imitate the decision process in the human brain and are capable of reliably classifying noisy input data. Training such a network relies on a large set of training data. As molecular communication as a technology is still in its early development phase, this data is not always readily available. We discuss neural network-based demodulation approaches relying on synthetic data based on theoretical channel models, as well as works using actual measurements produced by a prototype test bed. In this work, we give a general overview over the field of molecular communication, discuss the challenges in the demodulation process of transmitted signals, and present approaches to these challenges that are based on artificial neural networks.
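The transmit/detect loop described above can be sketched with simple on-off keying: bits become molecule concentrations, the channel perturbs them, and a detector recovers the bits. The paper replaces the threshold detector with a neural network precisely because real molecular channels are diffusive and time-varying; all constants here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def transmit(bits, high=100.0, low=10.0):
    """Map a digital signal to molecule concentrations (on-off keying)."""
    return np.where(np.asarray(bits) == 1, high, low)

def channel(conc, noise_sd=15.0):
    """Very crude channel model: additive noise on the received counts.
    (Real molecular channels are far harder to model, motivating the
    neural-network demodulation the paper discusses.)"""
    return conc + rng.normal(0, noise_sd, conc.shape)

def demodulate(received, threshold=55.0):
    """Simplest baseline detector: threshold the received concentration."""
    return (received > threshold).astype(int)

bits = rng.integers(0, 2, 1000)
recovered = demodulate(channel(transmit(bits)))
print((recovered == bits).mean())  # accuracy well above chance
```

A learned demodulator would replace `demodulate` with a classifier trained on (received, bit) pairs, synthetic or measured.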
A generative model for surrogates of spatial-temporal wildfire nowcasting
for: This paper aims to provide a generative model for real-time wildfire nowcasting, using a three-dimensional Vector-Quantized Variational Autoencoder to generate spatial-temporal sequences of unseen wildfire burned areas in a given ecoregion.
methods: The proposed method uses a generative model based on a three-dimensional Vector-Quantized Variational Autoencoder to generate coherent and structured fire scenarios, taking into account the impact of geophysical variables such as vegetation and slope.
results: The generated data are used to train a surrogate model for predicting wildfire dissemination, which has been tested on both simulation data and the real Chimney fire event, showing promising results.
Abstract
The recent increase in wildfires worldwide has led to the need for real-time fire nowcasting. Physics-driven models, such as cellular automata and computational fluid dynamics can provide high-fidelity fire spread simulations but they are computationally expensive and time-consuming. Much effort has been put into developing machine learning models for fire prediction. However, these models are often region-specific and require a substantial quantity of simulation data for training purposes. This results in a significant amount of computational effort for different ecoregions. In this work, a generative model is proposed using a three-dimensional Vector-Quantized Variational Autoencoder to generate spatial-temporal sequences of unseen wildfire burned areas in a given ecoregion. The model is tested in the ecoregion of a recent massive wildfire event in California, known as the Chimney fire. Numerical results show that the model succeeds in generating coherent and structured fire scenarios, taking into account the impact from geophysical variables, such as vegetation and slope. Generated data are also used to train a surrogate model for predicting wildfire dissemination, which has been tested on both simulation data and the real Chimney fire event.
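At the core of a Vector-Quantized VAE is the quantization step that snaps each encoder latent onto its nearest codebook entry. A minimal sketch of that lookup with a toy codebook and toy latents (dimensions are illustrative):

```python
import numpy as np

def quantize(latents, codebook):
    """VQ-VAE quantization step: replace each latent vector with its
    nearest codebook entry under squared Euclidean distance."""
    # pairwise distances, shape (num_latents, num_codes)
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)          # index of the nearest code per latent
    return codebook[idx], idx

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
latents = np.array([[0.1, -0.1], [3.8, 4.2], [0.9, 1.1]])
q, idx = quantize(latents, codebook)
print(idx)  # → [0 2 1]
```

In the wildfire model the latents are spatial-temporal feature maps, so sampling discrete code indices autoregressively yields coherent burned-area sequences.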
MiAMix: Enhancing Image Classification through a Multi-stage Augmented Mixed Sample Data Augmentation Method
paper_authors: Wen Liang, Youzhi Liang, Jianguo Jia
for: This paper proposes MiAMix, a multi-stage augmented mixup method, to improve the performance and generalization of deep learning models.
methods: MiAMix integrates image augmentation into the mixup framework, uses multiple diversified mixing methods concurrently, and randomly selects mixing mask augmentation methods to improve the mixing process.
results: Evaluations on four image benchmarks show that MiAMix improves performance without heavy computational overhead, outperforming existing mixed sample data augmentation methods.
Abstract
Despite substantial progress in the field of deep learning, overfitting persists as a critical challenge, and data augmentation has emerged as a particularly promising approach due to its capacity to enhance model generalization in various computer vision tasks. While various strategies have been proposed, Mixed Sample Data Augmentation (MSDA) has shown great potential for enhancing model performance and generalization. We introduce a novel mixup method called MiAMix, which stands for Multi-stage Augmented Mixup. MiAMix integrates image augmentation into the mixup framework, utilizes multiple diversified mixing methods concurrently, and improves the mixing method by randomly selecting mixing mask augmentation methods. While recent methods utilize saliency information, MiAMix is also designed for computational efficiency, reducing additional overhead and offering easy integration into existing training pipelines. We comprehensively evaluate MiAMix using four image benchmarks and pitting it against current state-of-the-art mixed sample data augmentation techniques to demonstrate that MiAMix improves performance without heavy computational overhead.
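MiAMix builds on the classic mixup operation, which convexly combines two samples and their labels with a Beta-distributed coefficient. A sketch of that base operation (MiAMix's multi-stage augmentation and randomized mixing masks are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Classic mixup: convex-combine two samples and their one-hot labels
    with lam ~ Beta(alpha, alpha). MiAMix layers image augmentation and
    randomized mixing masks on top of this basic operation."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x1, y1 = np.full((4, 4), 1.0), np.array([1.0, 0.0])
x2, y2 = np.full((4, 4), 0.0), np.array([0.0, 1.0])
xm, ym = mixup(x1, y1, x2, y2)
print(ym.sum())  # mixed label still sums to 1
```

Mask-based variants (e.g. CutMix-style) replace the scalar `lam` with a spatial mask; randomly choosing among such mixing masks is one of the ingredients MiAMix randomizes.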
Crowdsourcing Fraud Detection over Heterogeneous Temporal MMMA Graph
results: Deployed on an industry-size HTG, CMT significantly outperforms other methods. CMT also shows promising results for fraud detection on a large-scale public financial HTG, indicating that it can be applied to other graph anomaly detection tasks.
Abstract
The rise of the click farm business using Multi-purpose Messaging Mobile Apps (MMMAs) tempts cybercriminals to perpetrate crowdsourcing frauds that cause financial losses to click farm workers. In this paper, we propose a novel contrastive multi-view learning method named CMT for crowdsourcing fraud detection over the heterogeneous temporal graph (HTG) of MMMA. CMT captures both heterogeneity and dynamics of HTG and generates high-quality representations for crowdsourcing fraud detection in a self-supervised manner. We deploy CMT to detect crowdsourcing frauds on an industry-size HTG of a representative MMMA WeChat and it significantly outperforms other methods. CMT also shows promising results for fraud detection on a large-scale public financial HTG, indicating that it can be applied in other graph anomaly detection tasks.
Solving Logistic-Oriented Bin Packing Problems Through a Hybrid Quantum-Classical Approach
results: This paper tests several features of real-world instances, including heterogeneous bins, one-, two-, and three-dimensional instances, item-bin association requirements, and delivery priorities, along with Q4RealBPP's ability to solve such real-world oriented instances.
Abstract
The Bin Packing Problem is a classic problem with wide industrial applicability. In fact, the efficient packing of items into bins is one of the toughest challenges in many logistic corporations and is a critical issue for reducing storage costs or improving vehicle space allocation. In this work, we resort to our previously published quantum-classical framework known as Q4RealBPP, and elaborate on the solving of real-world oriented instances of the Bin Packing Problem. With this purpose, this paper gravitates on the following characteristics: i) the existence of heterogeneous bins, ii) the extension of the framework to solve not only three-dimensional, but also one- and two-dimensional instances of the problem, iii) requirements for item-bin associations, and iv) delivery priorities. All these features have been tested in this paper, as well as the ability of Q4RealBPP to solve real-world oriented instances.
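For contrast with the quantum-classical approach above, the standard classical heuristic for one-dimensional bin packing is first-fit decreasing, sketched below (this is textbook material, not part of Q4RealBPP):

```python
def first_fit_decreasing(items, capacity):
    """Classical first-fit-decreasing heuristic for 1D bin packing:
    place each item (largest first) into the first bin it fits in,
    opening a new bin when none fits."""
    bins = []
    for item in sorted(items, reverse=True):
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            bins.append([item])
    return bins

bins = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
print(len(bins))  # → 2
```

Features like heterogeneous bins, item-bin association constraints, and delivery priorities are exactly what such simple heuristics do not capture, motivating richer formulations like the one Q4RealBPP solves.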
Semi-supervised Contrastive Regression for Estimation of Eye Gaze
results: The study finds that the contrastive learning framework can obtain a generalized solution from a small labeled gaze dataset, even for unseen face images, outperforming several state-of-the-art contrastive learning techniques.
Abstract
With the escalating demand for human-machine interfaces in intelligent systems, the development of gaze-controlled systems has become a necessity. Gaze, being a non-intrusive form of human interaction, is one of the best-suited approaches. Appearance-based deep learning models are the most widely used for gaze estimation. But the performance of these models is entirely influenced by the size of the labeled gaze dataset, which in effect affects generalization in performance. This paper aims to develop a semi-supervised contrastive learning framework for estimation of gaze direction. With a small labeled gaze dataset, the framework is able to find a generalized solution even for unseen face images. In this paper, we have proposed a new contrastive loss paradigm that maximizes the similarity agreement between similar images and at the same time reduces the redundancy in embedding representations. Our contrastive regression framework shows good performance in comparison to several state-of-the-art contrastive learning techniques used for gaze estimation.
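The described objective, maximizing similarity agreement between views while reducing redundancy in embedding representations, can be illustrated with a toy loss in the spirit of redundancy-reduction methods such as Barlow Twins. This is an assumed illustration, not the authors' exact loss:

```python
import numpy as np

def redundancy_reduction_loss(z1, z2, lam=0.005):
    """Toy objective in the described spirit: push the cross-correlation
    matrix of two embedding views toward the identity, so matched
    dimensions agree (diagonal -> 1) while different dimensions stay
    decorrelated (off-diagonal -> 0). Not the paper's exact loss."""
    z1 = (z1 - z1.mean(0)) / z1.std(0)       # per-dimension standardization
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = (z1.T @ z2) / len(z1)                # cross-correlation matrix
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
l_same = redundancy_reduction_loss(z, z)                       # near zero
l_diff = redundancy_reduction_loss(z, rng.normal(size=(256, 8)))
print(l_same < l_diff)  # agreeing views yield the smaller loss
```

For regression targets like gaze angles, such an embedding loss would typically be combined with a supervised regression head on the labeled subset.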
Surrogate Empowered Sim2Real Transfer of Deep Reinforcement Learning for ORC Superheat Control
results: Experimental results show that the proposed Sim2Real transfer learning control method greatly improves the training speed of DRL on the ORC control problem and resolves the agent's generalization issues under multiple operating conditions.
Abstract
The Organic Rankine Cycle (ORC) is widely used in industrial waste heat recovery due to its simple structure and easy maintenance. However, in the context of smart manufacturing in the process industry, traditional model-based optimization control methods are unable to adapt to the varying operating conditions of the ORC system or sudden changes in operating modes. Deep reinforcement learning (DRL) has significant advantages in situations with uncertainty as it directly achieves control objectives by interacting with the environment without requiring an explicit model of the controlled plant. Nevertheless, direct application of DRL to physical ORC systems presents unacceptable safety risks, and its generalization performance under model-plant mismatch is insufficient to support ORC control requirements. Therefore, this paper proposes a Sim2Real transfer learning-based DRL control method for ORC superheat control, which aims to provide a new simple, feasible, and user-friendly solution for energy system optimization control. Experimental results show that the proposed method greatly improves the training speed of DRL in ORC control problems and solves the generalization performance issue of the agent under multiple operating conditions through Sim2Real transfer.
for: The paper is written for those interested in 3D representation and view synthesis, particularly in the context of Neural Radiance Fields (NeRFs) and their applications in computer graphics and vision.
methods: The paper uses a review of the NeRF representation and its applications, as well as a historical perspective on the development of 3D representation for view synthesis and related problems.
results: The paper provides insights into the current state of NeRF representations and their applications, as well as new developments and future directions in 3D representation.
Abstract
Neural Radiance Fields or NeRFs have become the representation of choice for problems in view synthesis or image-based rendering, as well as in many other applications across computer graphics and vision, and beyond. At their core, NeRFs describe a new representation of 3D scenes or 3D geometry. Instead of meshes, disparity maps, multiplane images or even voxel grids, they represent the scene as a continuous volume, with volumetric parameters like view-dependent radiance and volume density obtained by querying a neural network. The NeRF representation has now been widely used, with thousands of papers extending or building on it every year, multiple authors and websites providing overviews and surveys, and numerous industrial applications and startup companies. In this article, we briefly review the NeRF representation, and describe the three decades-long quest to find the best 3D representation for view synthesis and related problems, culminating in the NeRF papers. We then describe new developments in terms of NeRF representations and make some observations and insights regarding the future of 3D representations.
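The continuous-volume representation described above is rendered by compositing view-dependent radiance weighted by volume density along each camera ray. A minimal numerical version of this volume-rendering quadrature, with a toy density/radiance function standing in for the neural network NeRF would query, might look like:

```python
import numpy as np

# Minimal volume-rendering quadrature along one ray. `toy_field` stands in
# for the neural network a NeRF would query; everything here is an
# illustrative sketch, not a NeRF implementation.

def render_ray(field_fn, t_near, t_far, n_samples=64):
    t = np.linspace(t_near, t_far, n_samples)
    delta = np.diff(t, append=t[-1] + (t[1] - t[0]))  # segment lengths
    sigma, rgb = field_fn(t)                          # density, radiance
    alpha = 1.0 - np.exp(-sigma * delta)              # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = alpha * trans                           # weights sum to <= 1
    return (weights[:, None] * rgb).sum(0), weights

def toy_field(t):
    sigma = np.where((t > 1.0) & (t < 2.0), 5.0, 0.0)      # a slab of density
    rgb = np.tile(np.array([1.0, 0.0, 0.0]), (t.size, 1))  # red radiance
    return sigma, rgb

color, weights = render_ray(toy_field, 0.0, 3.0)
print(color)  # close to pure red: the slab is nearly opaque
```

The `weights` vector is the same quantity NeRF training backpropagates through; in a real NeRF, `field_fn` would also take the viewing direction to produce view-dependent radiance.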
Nonlinear Controller Design for a Quadrotor with Inverted Pendulum
results: Trajectory tracking and stabilization were achieved for the quadrotor alone, and trajectory tracking was demonstrated for the combined quadrotor-pendulum system.
Abstract
The quadrotor is a $6$ degrees-of-freedom (DoF) underactuated system. Adding a spherical pendulum on top of a quadrotor further complicates the task of achieving any output tracking while stabilizing the rest. In this report, we present different types of controllers for the nonlinear dynamical system of the quadrotor and pendulum combination, utilizing feedback-linearization and control Lyapunov function with quadratic programming (CLF-QP) approaches. We demonstrate trajectory tracking for the quadrotor-only case as well as the quadrotor-pendulum combined case.
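As a generic sketch of the CLF-QP approach mentioned above (the standard min-norm form, not necessarily the report's exact formulation): for control-affine dynamics $\dot{x} = f(x) + g(x)u$ with a control Lyapunov function $V(x)$, the input at each state is obtained by solving a small quadratic program:

```latex
% Generic pointwise min-norm CLF-QP controller with slack (standard form;
% the report's exact constraints and weights may differ).
u^*(x) = \arg\min_{u \in \mathbb{R}^m,\ \delta \ge 0}
  \ \tfrac{1}{2}\|u\|^2 + p\,\delta^2
\quad \text{s.t.} \quad
  L_f V(x) + L_g V(x)\,u + \lambda V(x) \le \delta
```

Here $L_fV$ and $L_gV$ are the Lie derivatives of $V$ along the drift and input directions, $\lambda > 0$ sets the decay rate of $V$, and the slack $\delta$ (penalized by weight $p > 0$) keeps the QP feasible when the constraint cannot be met exactly.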
Assessing the impact of emergency department short stay units using length-of-stay prediction and discrete event simulation
results: The results show that the recommendation performances of the proposed approaches are generally acceptable and do not benefit from feature selection. Hospital length-of-stay can be predicted with reasonable accuracy (an AUC of 0.69 for classifying short- versus long-stay patients) using patient admission demographics, laboratory test results, diagnostic imaging, vital signs, and clinical documentation.
Abstract
Accurately predicting hospital length-of-stay at the time a patient is admitted to hospital may help guide clinical decision making and resource allocation. In this study we aim to build a decision support system that predicts hospital length-of-stay for patients admitted to general internal medicine from the emergency department. We conduct an exploratory data analysis and employ feature selection methods to identify the attributes that result in the best predictive performance. We also develop a discrete-event simulation model to assess the performances of the prediction models in a practical setting. Our results show that the recommendation performances of the proposed approaches are generally acceptable and do not benefit from the feature selection. Further, the results indicate that hospital length-of-stay could be predicted with reasonable accuracy (e.g., AUC value for classifying short and long stay patients is 0.69) using patient admission demographics, laboratory test results, diagnostic imaging, vital signs and clinical documentation.
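The AUC figure quoted above (0.69 for classifying short versus long stays) is computable from predicted scores and binary labels. A dependency-free sketch using the rank-statistic (Mann-Whitney) formulation of AUC, with toy labels and scores, is:

```python
# AUC via the Mann-Whitney U statistic: the probability that a randomly
# chosen positive (long-stay) patient is scored above a randomly chosen
# negative (short-stay) one. Labels and scores below are toy data, not
# the study's predictions.

def auc(labels, scores):
    pairs = sorted(zip(scores, labels))
    # Rank-sum of positives (tie-averaging omitted for brevity).
    rank_sum = sum(rank for rank, (_, y) in enumerate(pairs, start=1) if y == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

y = [0, 0, 1, 0, 1, 1]               # 1 = long stay
s = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]  # model scores
print(auc(y, s))  # 0.666... for this toy example
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.69 in context.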
Towards Improving Harmonic Sensitivity and Prediction Stability for Singing Melody Extraction
results: Several models, including MSNet, FTANet, and the newly introduced PianoNet, were modified accordingly, and experiments demonstrate that the proposed modifications are empirically effective for singing melody extraction.
Abstract
In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two assumptions. First, harmonics in the spectrograms of audio data decay rapidly along the frequency axis. To enhance the model's sensitivity on the trailing harmonics, we modify the Combined Frequency and Periodicity (CFP) representation using discrete z-transform. Second, the vocal and non-vocal segments with extremely short duration are uncommon. To ensure a more stable melody contour, we design a differentiable loss function that prevents the model from predicting such segments. We apply these modifications to several models, including MSNet, FTANet, and a newly introduced model, PianoNet, modified from a piano transcription network. Our experimental results demonstrate that the proposed modifications are empirically effective for singing melody extraction.
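The second modification above (a differentiable loss that discourages extremely short vocal/non-vocal segments) can be sketched with a simple stand-in: penalizing frame-to-frame flips of the voicing probability. This total-variation form is illustrative, not the paper's exact loss function:

```python
import numpy as np

# Hedged sketch of a penalty against extremely short vocal/non-vocal
# segments: the total variation of the voicing probability over time.
# Every extra on/off flip adds roughly 1 to the sum, so jittery contours
# with many one-frame segments are penalized more than smooth ones.
# This is an illustrative stand-in, not the paper's loss.

def short_segment_penalty(voicing_prob):
    return np.abs(np.diff(voicing_prob)).sum()

smooth = np.concatenate([np.zeros(50), np.ones(100), np.zeros(50)])
jitter = smooth.copy()
jitter[60:90:3] = 0.0   # pepper the voiced region with 1-frame dropouts
print(short_segment_penalty(smooth), short_segment_penalty(jitter))
```

Because the penalty is a sum of absolute differences of the (continuous) voicing probabilities, it is differentiable almost everywhere and can be added directly to a training objective.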
Solving Witness-type Triangle Puzzles Faster with an Automatically Learned Human-Explainable Predicate
results: Using the learned predicate accelerates search by an average of six times, and under a fixed search-time budget the accelerated search solves more puzzle instances of larger sizes.
Abstract
Automatically solving puzzle instances in the game The Witness can guide players toward solutions and help puzzle designers generate better puzzles. In the latter case such an Artificial Intelligence puzzle solver can inform a human puzzle designer and procedural puzzle generator to produce better instances. The puzzles, however, are combinatorially difficult and search-based solvers can require large amounts of time and memory. We accelerate such search by automatically learning a human-explainable predicate that predicts whether a partial path to a Witness-type puzzle is not completable to a solution path. We prove a key property of the learned predicate which allows us to use it for pruning successor states in search, thereby accelerating search by an average of six times while maintaining completeness of the underlying search. Conversely, given a fixed search time budget per puzzle, our predicate-accelerated search can solve more puzzle instances of larger sizes than the baseline search.
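The pruning scheme can be sketched generically: a depth-first path enumeration that skips any partial path a predicate flags as non-completable. The grid, budget, and `bound` predicate below are toy stand-ins for the paper's learned predicate; the point is that a sound predicate (one that never flags a completable path) reduces expansions without changing the solution set:

```python
# Depth-first enumeration of simple paths with predicate pruning.
# `bound` plays the role of the learned predicate: it flags partial paths
# that provably cannot reach the goal within the budget, so pruning them
# skips whole subtrees while preserving every in-budget solution.
# Grid, budget, and predicate are toy stand-ins, not the paper's.

def count_paths(start, goal, neighbors, budget, prune):
    stack, found, expanded = [[start]], 0, 0
    while stack:
        path = stack.pop()
        expanded += 1
        if path[-1] == goal:
            found += len(path) - 1 <= budget  # count in-budget solutions
            continue                          # simple paths end at goal
        if prune(path, goal, budget):
            continue                          # skip the whole subtree
        for nxt in neighbors(path[-1]):
            if nxt not in path:               # keep paths simple
                stack.append(path + [nxt])
    return found, expanded

def neighbors(p):
    x, y = p
    return [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 4 and 0 <= y + dy < 4]

def bound(path, goal, budget):
    (x, y), (gx, gy) = path[-1], goal
    # Steps taken plus Manhattan distance left is a lower bound on length.
    return len(path) - 1 + abs(gx - x) + abs(gy - y) > budget

no_prune = lambda path, goal, budget: False
f1, n1 = count_paths((0, 0), (3, 3), neighbors, 6, no_prune)
f2, n2 = count_paths((0, 0), (3, 3), neighbors, 6, bound)
print(f1 == f2, n2 < n1)  # same solution count, far fewer expansions
```

On a 4x4 grid with budget 6, both searches find exactly the 20 monotone shortest paths, but the pruned search expands far fewer states, mirroring the paper's speedup-with-completeness argument.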
Let’s Give a Voice to Conversational Agents in Virtual Reality
results: The authors built two conversational prototypes in Unity, one for non-immersive displays and one for VR headsets, and demonstrated them in the digital health domain.
Abstract
The dialogue experience with conversational agents can be greatly enhanced with multimodal and immersive interactions in virtual reality. In this work, we present an open-source architecture with the goal of simplifying the development of conversational agents operating in virtual environments. The architecture offers the possibility of plugging in conversational agents of different domains and adding custom or cloud-based Speech-To-Text and Text-To-Speech models to make the interaction voice-based. Using this architecture, we present two conversational prototypes operating in the digital health domain developed in Unity for both non-immersive displays and VR headsets.
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
paper_authors: Weihao Yu, Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Zicheng Liu, Xinchao Wang, Lijuan Wang
For: The paper proposes an evaluation benchmark called MM-Vet to evaluate large multimodal models (LMMs) on complicated multimodal tasks.
Methods: The paper uses six core vision-language (VL) capabilities to define the tasks and evaluates the 16 integrations of interest derived from the capability combinations. The paper also proposes an LLM-based evaluator for open-ended outputs to evaluate the models across different question types and answer styles.
Results: The paper evaluates representative LMMs on MM-Vet and provides insights into the capabilities of different LMM system paradigms and models.
Abstract
We propose MM-Vet, an evaluation benchmark that examines large multimodal models (LMMs) on complicated multimodal tasks. Recent LMMs have shown various intriguing abilities, such as solving math problems written on the blackboard, reasoning about events and celebrities in news images, and explaining visual jokes. Rapid model advancements pose challenges to evaluation benchmark development. Problems include: (1) How to systematically structure and evaluate the complicated multimodal tasks; (2) How to design evaluation metrics that work well across question and answer types; and (3) How to give model insights beyond a simple performance ranking. To this end, we present MM-Vet, designed based on the insight that the intriguing ability to solve complicated tasks is often achieved by a generalist model being able to integrate different core vision-language (VL) capabilities. MM-Vet defines 6 core VL capabilities and examines the 16 integrations of interest derived from the capability combination. For evaluation metrics, we propose an LLM-based evaluator for open-ended outputs. The evaluator enables the evaluation across different question types and answer styles, resulting in a unified scoring metric. We evaluate representative LMMs on MM-Vet, providing insights into the capabilities of different LMM system paradigms and models. Code and data are available at https://github.com/yuweihao/MM-Vet.
Generation of Realistic Synthetic Raw Radar Data for Automated Driving Applications using Generative Adversarial Networks
results: The method generates realistic synthetic raw radar data whose chirps, Range-Azimuth maps, and object-detection results closely match real measurements, increasing the potential for developing and augmenting radar data processing algorithms.
Abstract
The main approaches for simulating FMCW radar are based on ray tracing, which is usually computationally intensive and does not account for background noise. This work proposes a faster method for FMCW radar simulation capable of generating synthetic raw radar data using generative adversarial networks (GAN). The code and pre-trained weights are open-source and available on GitHub. This method generates 16 simultaneous chirps, which allows the generated data to be used for the further development of algorithms for processing radar data (filtering and clustering). This can increase the potential for data augmentation, e.g., by generating data in non-existent or safety-critical scenarios that are not reproducible in real life. In this work, the GAN was trained with radar measurements of a motorcycle and used to generate synthetic raw radar data of a motorcycle traveling in a straight line. For generating this data, the distance of the motorcycle and Gaussian noise are used as input to the neural network. The synthetic generated radar chirps were evaluated using the Fréchet Inception Distance (FID). Then, the Range-Azimuth (RA) map is calculated twice: first, based on synthetic data using this GAN and, second, based on real data. Based on these RA maps, an algorithm with adaptive threshold and edge detection is used for object detection. The results have shown that the data is realistic in terms of coherent radar reflections of the motorcycle and background noise, based on the comparison of the chirps, the RA maps, and the object detection results. Thus, the proposed method has been shown to minimize the simulation-to-reality gap for the generation of radar data.
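As background for the Range-Azimuth processing mentioned above: an RA map starts from per-chirp range FFTs, in which a beat-frequency sinusoid in the raw ADC samples maps to a peak at the target's range bin. A minimal range-FFT sketch with illustrative parameters (not the paper's sensor configuration):

```python
import numpy as np

# Minimal range-FFT sketch for FMCW radar: a beat-frequency sinusoid in a
# chirp's raw ADC samples produces a peak at the corresponding range bin.
# Sample count, target bin, and noise level are illustrative only.

n_samples = 256
target_bin = 42                       # hypothetical target range bin
t = np.arange(n_samples)
chirp = np.cos(2 * np.pi * target_bin * t / n_samples)          # beat signal
chirp += 0.1 * np.random.default_rng(0).normal(size=n_samples)  # background noise

range_profile = np.abs(np.fft.rfft(chirp))
print(int(np.argmax(range_profile)))  # peak at bin 42
```

A GAN that generates realistic raw chirps must therefore reproduce both such coherent sinusoidal reflections and the noise floor, which is why the paper compares chirps, RA maps, and downstream detections.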
Nonprehensile Planar Manipulation through Reinforcement Learning with Multimodal Categorical Exploration
results: The multimodal exploration approach enables training planar pushing RL policies for arbitrary starting and target object poses with improved accuracy and smoother trajectories than previous RL literature, is robust to external disturbances and observation noise, and the learned policies were shown to transfer to physical robot hardware.
Abstract
Developing robot controllers capable of achieving dexterous nonprehensile manipulation, such as pushing an object on a table, is challenging. The underactuated and hybrid-dynamics nature of the problem, further complicated by the uncertainty resulting from the frictional interactions, requires sophisticated control behaviors. Reinforcement Learning (RL) is a powerful framework for developing such robot controllers. However, previous RL literature addressing the nonprehensile pushing task achieves low accuracy, non-smooth trajectories, and only simple motions, i.e. without rotation of the manipulated object. We conjecture that previously used unimodal exploration strategies fail to capture the inherent hybrid-dynamics of the task, arising from the different possible contact interaction modes between the robot and the object, such as sticking, sliding, and separation. In this work, we propose a multimodal exploration approach through categorical distributions, which enables us to train planar pushing RL policies for arbitrary starting and target object poses, i.e. positions and orientations, and with improved accuracy. We show that the learned policies are robust to external disturbances and observation noise, and scale to tasks with multiple pushers. Furthermore, we validate the transferability of the learned policies, trained entirely in simulation, to a physical robot hardware using the KUKA iiwa robot arm. See our supplemental video: https://youtu.be/vTdva1mgrk4.
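The motivation for categorical exploration can be illustrated directly: a categorical distribution over discretized actions can place probability mass on two separated contact modes (say, sticking versus sliding pushes), whereas a unimodal Gaussian head would center its mass in the empty region between them. The bins and probabilities below are illustrative, not the paper's policy parameterization:

```python
import numpy as np

# Sketch of why a categorical policy head supports multimodal exploration:
# it can concentrate probability on two separated action modes, something a
# unimodal Gaussian cannot represent. Bin range, logits, and sample count
# are illustrative only.

rng = np.random.default_rng(0)
action_bins = np.linspace(-1.0, 1.0, 21)       # discretized pusher action
logits = np.zeros(21)
logits[2] = logits[18] = 4.0                   # two preferred contact modes
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over bins

samples = action_bins[rng.choice(21, size=10_000, p=probs)]
# The sample mean sits near zero, in the low-probability gap between the
# two modes: a Gaussian fit to this data would explore exactly the actions
# the categorical policy correctly avoids.
print(round(samples.mean(), 3), round(np.abs(samples).mean(), 3))
```

During training, the policy samples a bin index per step (here via `rng.choice` with probabilities `probs`), so exploration naturally covers both modes instead of the region between them.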
A Survey on Temporal Knowledge Graph Completion: Taxonomy, Progress, and Prospects
results: The survey provides a detailed review and taxonomy of TKGC methods and outlines future research directions.
Abstract
Temporal characteristics are prominently evident in a substantial volume of knowledge, which underscores the pivotal role of Temporal Knowledge Graphs (TKGs) in both academia and industry. However, TKGs often suffer from incompleteness for three main reasons: the continuous emergence of new knowledge, the weakness of the algorithm for extracting structured information from unstructured data, and the lack of information in the source dataset. Thus, the task of Temporal Knowledge Graph Completion (TKGC) has attracted increasing attention, aiming to predict missing items based on the available information. In this paper, we provide a comprehensive review of TKGC methods and their details. Specifically, this paper mainly consists of three components, namely, 1)Background, which covers the preliminaries of TKGC methods, loss functions required for training, as well as the dataset and evaluation protocol; 2)Interpolation, that estimates and predicts the missing elements or set of elements through the relevant available information. It further categorizes related TKGC methods based on how to process temporal information; 3)Extrapolation, which typically focuses on continuous TKGs and predicts future events, and then classifies all extrapolation methods based on the algorithms they utilize. We further pinpoint the challenges and discuss future research directions of TKGC.
From Military to Healthcare: Adopting and Expanding Ethical Principles for Generative Artificial Intelligence
paper_authors: David Oniani, Jordan Hilsman, Yifan Peng, COL Ronald K. Poropatich, COL Jeremy C. Pamplin, LTC Gary L. Legault, Yanshan Wang
For: The paper is written to propose ethical principles for the use of generative AI in healthcare, with the goal of addressing the ethical dilemmas and challenges posed by the technology.
Methods: The paper uses a framework called GREAT PLEA to outline the ethical principles for generative AI in healthcare, which includes governance, reliability, equity, accountability, traceability, privacy, lawfulness, empathy, and autonomy.
Results: The paper aims to provide a proactive approach to addressing the ethical challenges of generative AI in healthcare, with the goal of ensuring the technology is used in a responsible and ethical manner.
Abstract
In 2020, the U.S. Department of Defense officially disclosed a set of ethical principles to guide the use of Artificial Intelligence (AI) technologies on future battlefields. Despite stark differences, there are core similarities between the military and medical service. Warriors on battlefields often face life-altering circumstances that require quick decision-making. Medical providers experience similar challenges in a rapidly changing healthcare environment, such as in the emergency department or during surgery treating a life-threatening condition. Generative AI, an emerging technology designed to efficiently generate valuable information, holds great promise. As computing power becomes more accessible and the abundance of health data, such as electronic health records, electrocardiograms, and medical images, increases, it is inevitable that healthcare will be revolutionized by this technology. Recently, generative AI has captivated the research community, leading to debates about its application in healthcare, mainly due to concerns about transparency and related issues. Meanwhile, concerns about the potential exacerbation of health disparities due to modeling biases have raised notable ethical concerns regarding the use of this technology in healthcare. However, the ethical principles for generative AI in healthcare have been understudied, and decision-makers often fail to consider the significance of generative AI. In this paper, we propose GREAT PLEA ethical principles, encompassing governance, reliability, equity, accountability, traceability, privacy, lawfulness, empathy, and autonomy, for generative AI in healthcare. We aim to proactively address the ethical dilemmas and challenges posed by the integration of generative AI in healthcare.