cs.AI - 2023-07-10

Active Learning for Video Classification with Frame Level Queries

  • paper_url: http://arxiv.org/abs/2307.05587
  • repo_url: None
  • paper_authors: Debanjan Goswami, Shayok Chakraborty
  • for: This paper proposes a novel active learning framework that reduces human annotation effort in video classification.
  • methods: The framework uses an uncertainty- and diversity-based criterion to select informative videos together with a set of representative frames from each, so that annotators only label a few frames per video instead of watching it end-to-end.
  • results: The framework substantially reduces human annotation effort for video classification while maintaining high accuracy.
    Abstract Deep learning algorithms have pushed the boundaries of computer vision research and have depicted commendable performance in a variety of applications. However, training a robust deep neural network necessitates a large amount of labeled training data, acquiring which involves significant time and human effort. This problem is even more serious for an application like video classification, where a human annotator has to watch an entire video end-to-end to furnish a label. Active learning algorithms automatically identify the most informative samples from large amounts of unlabeled data; this tremendously reduces the human annotation effort in inducing a machine learning model, as only the few samples that are identified by the algorithm, need to be labeled manually. In this paper, we propose a novel active learning framework for video classification, with the goal of further reducing the labeling onus on the human annotators. Our framework identifies a batch of exemplar videos, together with a set of informative frames for each video; the human annotator needs to merely review the frames and provide a label for each video. This involves much less manual work than watching the complete video to come up with a label. We formulate a criterion based on uncertainty and diversity to identify the informative videos and exploit representative sampling techniques to extract a set of exemplar frames from each video. To the best of our knowledge, this is the first research effort to develop an active learning framework for video classification, where the annotators need to inspect only a few frames to produce a label, rather than watching the end-to-end video.
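    The selection loop described in the abstract (uncertainty plus diversity over candidate videos, then representative frame sampling) can be illustrated in a few lines. The snippet below is a hedged sketch, not the authors' implementation: the entropy-based uncertainty score, the greedy farthest-point diversity step, the mixing weight `alpha`, and the k-means frame sampling are all assumptions about one plausible instantiation.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_videos(probs, feats, batch_size, alpha=0.5):
    """Pick a batch of exemplar videos: predictive entropy scores uncertainty,
    and a greedy farthest-point term rewards diversity among the picks.
    probs: (n, c) softmax outputs; feats: (n, d) video embeddings."""
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    picked = [int(np.argmax(entropy))]
    while len(picked) < batch_size:
        dist = np.linalg.norm(feats[:, None] - feats[picked][None], axis=-1).min(axis=1)
        score = alpha * entropy + (1 - alpha) * dist
        score[picked] = -np.inf
        picked.append(int(np.argmax(score)))
    return picked

def exemplar_frames(frame_feats, k=5):
    """Representative sampling: the frame nearest each k-means centroid."""
    centers = KMeans(n_clusters=k, n_init=10).fit(frame_feats).cluster_centers_
    return [int(np.argmin(np.linalg.norm(frame_feats - c, axis=1))) for c in centers]
```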

Weakly-supervised positional contrastive learning: application to cirrhosis classification

  • paper_url: http://arxiv.org/abs/2307.04617
  • repo_url: https://github.com/guerbet-ai/wsp-contrastive
  • paper_authors: Emma Sarfati, Alexandre Bône, Marc-Michel Rohé, Pietro Gori, Isabelle Bloch
  • for: This study aims to improve the accuracy and efficiency of medical image classification by driving model training with weak, low-confidence labels.
  • methods: A weakly-supervised contrastive learning strategy that integrates both the spatial context of each 2D slice and a weak label into a generic kernel-based loss function.
  • results: The proposed method improves classification AUC by 5% over a baseline model on an internal dataset and by 26% on the public LIHC dataset.
    Abstract Large medical imaging datasets can be cheaply and quickly annotated with low-confidence, weak labels (e.g., radiological scores). Access to high-confidence labels, such as histology-based diagnoses, is rare and costly. Pretraining strategies, like contrastive learning (CL) methods, can leverage unlabeled or weakly-annotated datasets. These methods typically require large batch sizes, which poses a difficulty in the case of large 3D images at full resolution, due to limited GPU memory. Nevertheless, volumetric positional information about the spatial context of each 2D slice can be very important for some medical applications. In this work, we propose an efficient weakly-supervised positional (WSP) contrastive learning strategy where we integrate both the spatial context of each 2D slice and a weak label via a generic kernel-based loss function. We illustrate our method on cirrhosis prediction using a large volume of weakly-labeled images, namely radiological low-confidence annotations, and small strongly-labeled (i.e., high-confidence) datasets. The proposed model improves the classification AUC by 5% with respect to a baseline model on our internal dataset, and by 26% on the public LIHC dataset from the Cancer Genome Atlas. The code is available at: https://github.com/Guerbet-AI/wsp-contrastive.
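    A kernel-based weakly-supervised contrastive loss of this kind might be sketched as follows. This is a schematic reading of the idea (positive pairs weighted by a kernel over slice positions combined with weak-label agreement); the Gaussian position kernel, the exact-match label kernel, and the temperature are assumptions rather than the authors' formulation.

```python
import torch
import torch.nn.functional as F

def wsp_contrastive_loss(z, slice_pos, weak_label, tau=0.1, sigma=0.1):
    """z: (n, d) 2D-slice embeddings; slice_pos: (n,) normalized axial positions;
    weak_label: (n,) discrete weak labels (e.g., radiological scores).
    Positive pairs are weighted by a Gaussian kernel over slice position,
    gated by weak-label agreement."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau
    w = torch.exp(-(slice_pos[:, None] - slice_pos[None, :]) ** 2 / (2 * sigma ** 2))
    w = w * (weak_label[:, None] == weak_label[None, :]).float()
    eye = torch.eye(len(z), dtype=torch.bool)
    w = w.masked_fill(eye, 0.0)                    # exclude self-pairs
    log_p = sim - torch.logsumexp(sim.masked_fill(eye, -1e9), dim=1, keepdim=True)
    return -(w * log_p).sum() / w.sum().clamp(min=1e-12)
```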

Code Generation for Machine Learning using Model-Driven Engineering and SysML

  • paper_url: http://arxiv.org/abs/2307.05584
  • repo_url: https://github.com/sraedler/MDE_for_ML_Generation
  • paper_authors: Simon Raedler, Matthias Rupp, Eugen Rigger, Stefanie Rinderle-Ma
  • for: This paper aims to improve the practice of data-driven engineering for engineering systems.
  • methods: The work extends a SysML-based formalization of machine learning tasks with model transformations that generate executable code, so that data-driven engineering can be implemented in practice more easily.
  • results: The study shows that the approach yields modifiable and maintainable model transformations, reduces implementation effort, and provides a theoretical basis for standardizing data-driven engineering practice.
    Abstract Data-driven engineering refers to systematic data collection and processing using machine learning to improve engineering systems. Currently, the implementation of data-driven engineering relies on fundamental data science and software engineering skills. At the same time, model-based engineering is gaining relevance for the engineering of complex systems. In previous work, a model-based engineering approach integrating the formalization of machine learning tasks using the general-purpose modeling language SysML was presented. However, formalized machine learning tasks still require implementation in specialized programming languages like Python. Therefore, this work aims to facilitate the implementation of data-driven engineering in practice by extending the previous work on formalizing machine learning tasks with model transformation to generate executable code. The method focuses on the modifiability and maintainability of the model transformation so that extensions and changes to the code generation can be integrated without requiring modifications to the code generator. The presented method is evaluated for feasibility in a case study to predict weather forecasts. Based thereon, quality attributes of model transformations are assessed and discussed. Results demonstrate the flexibility and the simplicity of the method, reducing implementation effort. Further, the work builds a theoretical basis for standardizing data-driven engineering implementation in practice.
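    As a toy illustration of the model-to-code idea (a formalized ML task in, an executable script out), the sketch below turns a small dictionary standing in for a SysML task model into runnable Python via a text template. The model schema, template, and weather-forecast example are invented for illustration and are unrelated to the authors' actual generator.

```python
# A dict stands in for the formalized SysML task model (illustrative schema).
TASK_MODEL = {
    "features": ["temperature", "humidity", "pressure"],
    "target": "rain_next_day",
    "estimator": "RandomForestClassifier",
}

# Model-to-text transformation: the template is the "code generator".
TEMPLATE = '''\
from sklearn.ensemble import {estimator}
import pandas as pd

def train(csv_path):
    df = pd.read_csv(csv_path)
    X, y = df[{features!r}], df[{target!r}]
    return {estimator}().fit(X, y)
'''

def generate_code(model: dict) -> str:
    """Transform the task model into executable Python source."""
    return TEMPLATE.format(**model)

print(generate_code(TASK_MODEL))
```

    Extending the generated code then means editing the template or the task model, not `generate_code` itself, which mirrors the paper's emphasis on modifiable, maintainable transformations.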

MiVOLO: Multi-input Transformer for Age and Gender Estimation

  • paper_url: http://arxiv.org/abs/2307.04616
  • repo_url: https://github.com/wildchlamydia/mivolo
  • paper_authors: Maksim Kuprashevich, Irina Tolstykh
  • for: This paper proposes a vision-transformer-based method for age and gender estimation that addresses the challenges of recognition in the wild.
  • methods: The method builds on a recent vision transformer and integrates both tasks into a unified dual input/output model that leverages person image data in addition to facial information.
  • results: Experiments show state-of-the-art performance on four popular benchmarks together with real-time processing. The authors also introduce a new benchmark based on the Open Images Dataset, with highly accurate ground truth obtained by smart aggregation of human annotators' votes, and publicly release their models along with validation and inference code.
    Abstract Age and gender recognition in the wild is a highly challenging task: apart from the variability of conditions, pose complexities, and varying image quality, there are cases where the face is partially or completely occluded. We present MiVOLO (Multi Input VOLO), a straightforward approach for age and gender estimation using the latest vision transformer. Our method integrates both tasks into a unified dual input/output model, leveraging not only facial information but also person image data. This improves the generalization ability of our model and enables it to deliver satisfactory results even when the face is not visible in the image. To evaluate our proposed model, we conduct experiments on four popular benchmarks and achieve state-of-the-art performance, while demonstrating real-time processing capabilities. Additionally, we introduce a novel benchmark based on images from the Open Images Dataset. The ground truth annotations for this benchmark have been meticulously generated by human annotators, resulting in high accuracy answers due to the smart aggregation of votes. Furthermore, we compare our model's age recognition performance with human-level accuracy and demonstrate that it significantly outperforms humans across a majority of age ranges. Finally, we grant public access to our models, along with the code for validation and inference. In addition, we provide extra annotations for used datasets and introduce our new benchmark.
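    The dual input/output design can be pictured as two encoders whose features are fused and fed to separate age and gender heads. The sketch below is a minimal stand-in: the MLP encoders are placeholders for the VOLO-based transformer backbones, and zeroing the face input is one simple way to model an occluded or missing face.

```python
import torch
import torch.nn as nn

class DualInputAgeGender(nn.Module):
    """Toy stand-in for the dual input/output idea: separate face and body
    encoders, feature fusion, and two heads (age regression, gender logits)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.face_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.body_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.age_head = nn.Linear(2 * feat_dim, 1)
        self.gender_head = nn.Linear(2 * feat_dim, 2)

    def forward(self, face, body):
        h = torch.cat([self.face_enc(face), self.body_enc(body)], dim=1)
        return self.age_head(h).squeeze(1), self.gender_head(h)

model = DualInputAgeGender()
face = torch.zeros(4, 3, 64, 64)      # occluded/missing face -> zeroed input
body = torch.randn(4, 3, 128, 64)     # person crop still carries signal
age, gender_logits = model(face, body)
```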

Learning Interpretable Heuristics for WalkSAT

  • paper_url: http://arxiv.org/abs/2307.04608
  • repo_url: None
  • paper_authors: Yannet Interian, Sara Bernardini
  • for: This paper studies local search algorithms for solving large, hard instances of the satisfiability problem (SAT).
  • methods: Reinforcement learning is used to learn effective variable scoring functions and noise parameters, with specialized heuristics learned for each instance distribution.
  • results: Experiments show improvements over both a WalkSAT baseline and another learned local-search heuristic.
    Abstract Local search algorithms are well-known methods for solving large, hard instances of the satisfiability problem (SAT). The performance of these algorithms crucially depends on heuristics for setting noise parameters and scoring variables. The optimal setting for these heuristics varies for different instance distributions. In this paper, we present an approach for learning effective variable scoring functions and noise parameters by using reinforcement learning. We consider satisfiability problems from different instance distributions and learn specialized heuristics for each of them. Our experimental results show improvements with respect to both a WalkSAT baseline and another local search learned heuristic.
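    For context, a WalkSAT-style local search with a pluggable variable-scoring function looks roughly like the following. The noise-step/greedy-step structure is the textbook WalkSAT scheme; the learned heuristic in the paper would replace the hand-written `score` function and the fixed `noise` value shown here.

```python
import random

def walksat(clauses, n_vars, score, noise=0.5, max_flips=100_000):
    """clauses: list of lists of DIMACS-style literals (non-zero ints).
    score(var, assign, clauses) -> higher means better to flip; this is the
    function the paper learns with reinforcement learning."""
    assign = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
    sat = lambda c: any(assign[abs(l)] == (l > 0) for l in c)
    for _ in range(max_flips):
        unsat = [c for c in clauses if not sat(c)]
        if not unsat:
            return assign
        clause = random.choice(unsat)
        if random.random() < noise:        # noise step: random walk
            var = abs(random.choice(clause))
        else:                              # greedy step: best-scored variable
            var = max((abs(l) for l in clause), key=lambda v: score(v, assign, clauses))
        assign[var] = not assign[var]
    return None

def neg_unsat_after_flip(v, assign, clauses):
    """Baseline scoring: negative number of clauses unsatisfied after flipping v."""
    assign[v] = not assign[v]
    n_unsat = sum(not any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)
    assign[v] = not assign[v]
    return -n_unsat
```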

A Memristor-Inspired Computation for Epileptiform Signals in Spheroids

  • paper_url: http://arxiv.org/abs/2307.04607
  • repo_url: None
  • paper_authors: Iván Díez de los Ríos, John Wesley Ephraim, Gemma Palazzolo, Teresa Serrano-Gotarredona, Gabriella Panuccio, Bernabé Linares-Barranco
  • for: This work develops a memristor-inspired computational method for obtaining, on the fly and at low computational cost, a running spectrogram or fingerprint of epileptiform activity.
  • methods: The computation is inspired by memristor behavior and is applied to epileptiform activity recorded from rodent hippocampal spheroids with a microelectrode array system.
  • results: The method can compute, in real time and at low computational cost, an alert-level signal for the onset of epileptiform events.
    Abstract In this paper we present a memristor-inspired computational method for obtaining a type of running spectrogram or fingerprint of epileptiform activity generated by rodent hippocampal spheroids. It can be used to compute on the fly and with low computational cost an alert-level signal for epileptiform events onset. Here, we describe the computational method behind this fingerprint technique and illustrate it using epileptiform events recorded from hippocampal spheroids using a microelectrode array system.

DBFed: Debiasing Federated Learning Framework based on Domain-Independent

  • paper_url: http://arxiv.org/abs/2307.05582
  • repo_url: None
  • paper_authors: Jiale Li, Zhixin Li, Yibo Wang, Yao Li, Lei Wang
  • for: This paper addresses how federated learning can protect data privacy and resolve the data-island problem while mitigating the model bias caused by quality differences in the data held by different participants.
  • methods: The authors propose DBFed, a debiasing federated learning framework based on domain independence, which mitigates model bias by explicitly encoding sensitive attributes during client-side training.
  • results: Experiments on three real-world datasets, using five accuracy and fairness metrics, show that DBFed exceeds three comparison methods on most metrics, demonstrating its debiasing effect.
    Abstract As digital transformation continues, enterprises are generating, managing, and storing vast amounts of data, while artificial intelligence technology is rapidly advancing. However, it brings challenges in information security and data security. Data security refers to the protection of digital information from unauthorized access, damage, theft, etc. throughout its entire life cycle. With the promulgation and implementation of data security laws and the emphasis on data security and data privacy by organizations and users, Privacy-preserving technology represented by federated learning has a wide range of application scenarios. Federated learning is a distributed machine learning computing framework that allows multiple subjects to train joint models without sharing data to protect data privacy and solve the problem of data islands. However, the data among multiple subjects are independent of each other, and the data differences in quality may cause fairness issues in federated learning modeling, such as data bias among multiple subjects, resulting in biased and discriminatory models. Therefore, we propose DBFed, a debiasing federated learning framework based on domain-independent, which mitigates model bias by explicitly encoding sensitive attributes during client-side training. This paper conducts experiments on three real datasets and uses five evaluation metrics of accuracy and fairness to quantify the effect of the model. Most metrics of DBFed exceed those of the other three comparative methods, fully demonstrating the debiasing effect of DBFed.

Model-Driven Engineering for Artificial Intelligence – A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2307.04599
  • repo_url: https://github.com/sraedler/model-driven-engineering4artificial-intelligence
  • paper_authors: Simon Raedler, Luca Berardinelli, Karolin Winter, Abbas Rahimi, Stefanie Rinderle-Ma
  • for: This study investigates the existing body of knowledge on Model-Driven Engineering (MDE) in support of AI, to sharpen future research and define the current state of the art.
  • methods: A Systematic Literature Review (SLR) collected 703 candidate studies from five major databases and retained 15 primary studies. Each primary study is evaluated and discussed with respect to the adoption of MDE principles and practices, and mapped to the phases of AI development support aligned with the CRISP-DM methodology.
  • results: The use of MDE for AI is still in its early stages, with no single widely used tool or method. Existing approaches concentrate on specific development stages, chiefly training and modeling of the AI algorithm, rather than the whole process; the time-consuming preparation of data sets receives little attention, and early project phases such as the CRISP-DM Business Understanding phase are rarely reflected.
    Abstract Objective: This study aims to investigate the existing body of knowledge in the field of Model-Driven Engineering MDE in support of AI (MDE4AI) to sharpen future research further and define the current state of the art. Method: We conducted a Systematic Literature Review (SLR), collecting papers from five major databases resulting in 703 candidate studies, eventually retaining 15 primary studies. Each primary study will be evaluated and discussed with respect to the adoption of (1) MDE principles and practices and (2) the phases of AI development support aligned with the stages of the CRISP-DM methodology. Results: The study's findings show that the pillar concepts of MDE (metamodel, concrete syntax and model transformation) are leveraged to define domain-specific languages (DSL) explicitly addressing AI concerns. Different MDE technologies are used, leveraging different language workbenches. The most prominent AI-related concerns are training and modeling of the AI algorithm, while minor emphasis is given to the time-consuming preparation of the data sets. Early project phases that support interdisciplinary communication of requirements, such as the CRISP-DM Business Understanding phase, are rarely reflected. Conclusion: The study found that the use of MDE for AI is still in its early stages, and there is no single tool or method that is widely used. Additionally, current approaches tend to focus on specific stages of development rather than providing support for the entire development process. As a result, the study suggests several research directions to further improve the use of MDE for AI and to guide future research in this area.

A Semi-Automated Solution Approach Selection Tool for Any Use Case via Scopus and OpenAI: a Case Study for AI/ML in Oncology

  • paper_url: http://arxiv.org/abs/2307.04573
  • repo_url: None
  • paper_authors: Deniz Kenan Kılıç, Alex Elkjær Vasegaard, Aurélien Desoeuvres, Peter Nielsen
  • for: This study presents a semi-automated tool for reviewing and selecting solution approaches for any use case, aimed at researchers, practitioners, and decision-makers, and intended to serve as a benchmark for future work.
  • methods: The tool comprises three modules: (1) paper selection and scoring, which queries the Scopus API with a keyword selection scheme and computes relevancy; (2) extraction of solution methods from papers using the OpenAI API; (3) sensitivity analysis and post-analyses.
  • results: The tool reveals trends, relevant papers, and methods. An AI/ML-in-oncology case study and several other use cases show promising results when compared against a manual ground truth.
    Abstract In today's vast literature landscape, a manual review is very time-consuming. To address this challenge, this paper proposes a semi-automated tool for solution method review and selection. It caters to researchers, practitioners, and decision-makers while serving as a benchmark for future work. The tool comprises three modules: (1) paper selection and scoring, using a keyword selection scheme to query Scopus API and compute relevancy; (2) solution method extraction in papers utilizing OpenAI API; (3) sensitivity analysis and post-analyzes. It reveals trends, relevant papers, and methods. AI in the oncology case study and several use cases are presented with promising results, comparing the tool to manual ground truth.
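    A minimal version of the first module (query Scopus, score relevancy) might look like the sketch below. The endpoint and API-key header follow Elsevier's public Scopus Search API, but the response fields used (`search-results`, `entry`, `dc:title`) and the keyword-overlap relevancy score should be treated as assumptions about one simple instantiation, not the paper's actual scoring scheme.

```python
import requests

SCOPUS_URL = "https://api.elsevier.com/content/search/scopus"

def search_scopus(query: str, api_key: str, count: int = 25) -> list:
    """Module 1, step 1: fetch candidate papers for a keyword query."""
    r = requests.get(SCOPUS_URL, params={"query": query, "count": count},
                     headers={"X-ELS-APIKey": api_key})
    r.raise_for_status()
    return r.json().get("search-results", {}).get("entry", [])

def relevancy(entry: dict, keywords: set) -> float:
    """Module 1, step 2 (toy): fraction of keywords appearing in the title."""
    title = str(entry.get("dc:title", "")).lower()
    return sum(kw.lower() in title for kw in keywords) / max(len(keywords), 1)

# keywords = {"machine", "learning", "oncology"}
# entries = search_scopus("machine AND learning AND oncology", api_key="...")
# ranked = sorted(entries, key=lambda e: relevancy(e, keywords), reverse=True)
```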

Gradient Surgery for One-shot Unlearning on Generative Model

  • paper_url: http://arxiv.org/abs/2307.04550
  • repo_url: None
  • paper_authors: Seohui Bae, Seoyoon Kim, Hyemin Jung, Woohyung Lim
  • for: This work proposes a simple yet effective method for removing the influence of specific samples from a deep generative model.
  • methods: Inspired by work in multi-task learning, the method regularizes the interplay of influence among samples by projecting gradients onto the normal plane of the gradients to be retained.
  • results: The approach is agnostic to the statistics of the removed samples, outperforms existing baselines, and provides the first theoretical analysis of unlearning a generative model.
    Abstract Recent regulation on right-to-be-forgotten emerges tons of interest in unlearning pre-trained machine learning models. While approximating a straightforward yet expensive approach of retrain-from-scratch, recent machine unlearning methods unlearn a sample by updating weights to remove its influence on the weight parameters. In this paper, we introduce a simple yet effective approach to remove a data influence on the deep generative model. Inspired by works in multi-task learning, we propose to manipulate gradients to regularize the interplay of influence among samples by projecting gradients onto the normal plane of the gradients to be retained. Our work is agnostic to statistics of the removal samples, outperforming existing baselines while providing theoretical analysis for the first time in unlearning a generative model.
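    The core "gradient surgery" step, projecting the forgetting gradient onto the normal plane of the gradient to be retained, echoes PCGrad-style projection from multi-task learning. A minimal sketch under those assumptions (flattened 1-D parameter and gradient tensors, gradient ascent on the forget loss):

```python
import torch

def project_to_normal_plane(g_forget, g_retain):
    """Strip from the forgetting gradient its component along the retained
    gradient, so the unlearning update stays on g_retain's normal plane."""
    coef = torch.dot(g_forget, g_retain) / torch.dot(g_retain, g_retain).clamp(min=1e-12)
    return g_forget - coef * g_retain

def unlearning_step(params, g_forget, g_retain, lr=1e-3):
    """One hedged one-shot update: gradient *ascent* on the forget loss
    along the projected direction."""
    g = project_to_normal_plane(g_forget, g_retain)
    with torch.no_grad():
        params.add_(lr * g)
```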

Hate Speech Detection via Dual Contrastive Learning

  • paper_url: http://arxiv.org/abs/2307.05578
  • repo_url: None
  • paper_authors: Junyu Lu, Hongfei Lin, Xiaokun Zhang, Zhaoqing Li, Tongyue Zhang, Linlin Zong, Fenglong Ma, Bo Xu
  • for: This work targets hate speech detection, whose rapid spread on social media harms the Internet environment and society; two challenges are addressed: the complex semantics of hate speech (in particular the interference of insulting words) and the imbalanced distribution of hate and non-hate speech.
  • methods: A dual contrastive learning (DCL) framework jointly optimizes self-supervised and supervised contrastive losses to capture span-level information beyond the token-level emotional semantics used in previous models, and integrates the focal loss to alleviate data imbalance.
  • results: Experiments on two publicly available English datasets show that the proposed model outperforms state-of-the-art models and precisely detects hate speech.
    Abstract The fast spread of hate speech on social media impacts the Internet environment and our society by increasing prejudice and hurting people. Detecting hate speech has aroused broad attention in the field of natural language processing. Although hate speech detection has been addressed in recent work, this task still faces two inherent unsolved challenges. The first challenge lies in the complex semantic information conveyed in hate speech, particularly the interference of insulting words in hate speech detection. The second challenge is the imbalanced distribution of hate speech and non-hate speech, which may significantly deteriorate the performance of models. To tackle these challenges, we propose a novel dual contrastive learning (DCL) framework for hate speech detection. Our framework jointly optimizes the self-supervised and the supervised contrastive learning loss for capturing span-level information beyond the token-level emotional semantics used in existing models, particularly detecting speech containing abusive and insulting words. Moreover, we integrate the focal loss into the dual contrastive learning framework to alleviate the problem of data imbalance. We conduct experiments on two publicly available English datasets, and experimental results show that the proposed model outperforms the state-of-the-art models and precisely detects hate speeches.
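    A compressed sketch of the training objective, a supervised contrastive term plus a focal-loss classification term for the imbalanced data, is given below. The self-supervised branch and span-level features are simplified away, and the temperature, focusing parameter, and combination weight are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def sup_contrastive(z, labels, tau=0.07):
    """Supervised contrastive loss over a batch of span/sentence embeddings."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau
    eye = torch.eye(len(z), dtype=torch.bool)
    pos = (labels[:, None] == labels[None, :]) & ~eye
    log_p = sim - torch.logsumexp(sim.masked_fill(eye, -1e9), dim=1, keepdim=True)
    return -log_p[pos].mean()

def focal_loss(logits, labels, gamma=2.0):
    """Focal loss: down-weights easy examples to counter class imbalance."""
    ce = F.cross_entropy(logits, labels, reduction="none")
    return ((1 - torch.exp(-ce)) ** gamma * ce).mean()

def dcl_objective(z, logits, labels, lam=0.5):
    return focal_loss(logits, labels) + lam * sup_contrastive(z, labels)
```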

Learning Large Margin Sparse Embeddings for Open Set Medical Diagnosis

  • paper_url: http://arxiv.org/abs/2307.04541
  • repo_url: None
  • paper_authors: Mingyuan Liu, Lu Xu, Jicong Zhang
  • for: This paper tackles open set recognition (OSR) in medical diagnosis, where incompletely collected training datasets and constantly emerging new or rare diseases mean that categories unseen during training can appear at test time.
  • methods: Two mechanisms are proposed: Margin Loss with Adaptive Scale (MLAS), which introduces an angular margin to reinforce intra-class compactness and inter-class separability together with an adaptive scaling factor that strengthens generalization; and Open-Space Suppression (OSS), which opens the classifier by recognizing sparse regions of the embedding space as unknowns.
  • results: Compared with state-of-the-art methods, the proposed approach achieves superior performance as measured by ACC, AUROC, and OSCR.
    Abstract Fueled by deep learning, computer-aided diagnosis achieves huge advances. However, out of controlled lab environments, algorithms could face multiple challenges. Open set recognition (OSR), as an important one, states that categories unseen in training could appear in testing. In medical fields, it could derive from incompletely collected training datasets and the constantly emerging new or rare diseases. OSR requires an algorithm to not only correctly classify known classes, but also recognize unknown classes and forward them to experts for further diagnosis. To tackle OSR, we assume that known classes could densely occupy small parts of the embedding space and the remaining sparse regions could be recognized as unknowns. Following it, we propose Open Margin Cosine Loss (OMCL) unifying two mechanisms. The former, called Margin Loss with Adaptive Scale (MLAS), introduces angular margin for reinforcing intra-class compactness and inter-class separability, together with an adaptive scaling factor to strengthen the generalization capacity. The latter, called Open-Space Suppression (OSS), opens the classifier by recognizing sparse embedding space as unknowns using proposed feature space descriptors. Besides, since medical OSR is still a nascent field, two publicly available benchmark datasets are proposed for comparison. Extensive ablation studies and feature visualization demonstrate the effectiveness of each design. Compared with state-of-the-art methods, MLAS achieves superior performances, measured by ACC, AUROC, and OSCR.
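    The MLAS component resembles a large-margin cosine loss with a learnable scale. The sketch below shows that part (margin `m` subtracted from the target-class cosine, adaptive scale `s`) and reduces OSS to a max-cosine threshold test that routes low-confidence embeddings to an "unknown" class; the margin and threshold values are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MarginCosineHead(nn.Module):
    """Cosine classifier with angular margin m and a learnable (adaptive) scale s."""
    def __init__(self, feat_dim, n_known, m=0.35):
        super().__init__()
        self.w = nn.Parameter(torch.randn(n_known, feat_dim))
        self.s = nn.Parameter(torch.tensor(10.0))
        self.m = m

    def forward(self, x, labels=None):
        cos = F.normalize(x, dim=1) @ F.normalize(self.w, dim=1).t()
        if labels is not None:  # training: subtract the margin on the target class
            cos = cos - self.m * F.one_hot(labels, cos.size(1))
        return self.s * cos

def predict_open_set(head, x, threshold=0.3):
    """Open-space test: embeddings far from every known-class prototype
    (max cosine below threshold) are routed to the unknown class (-1)."""
    cos = F.normalize(x, dim=1) @ F.normalize(head.w, dim=1).t()
    conf, cls = cos.max(dim=1)
    return torch.where(conf > threshold, cls, torch.full_like(cls, -1))
```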

Pathway toward prior knowledge-integrated machine learning in engineering

  • paper_url: http://arxiv.org/abs/2307.06950
  • repo_url: None
  • paper_authors: Xia Chen, Philipp Geyer
  • for: This study aims to integrate multidisciplinary domain professions into machine acknowledgeable, data-driven processes.
  • methods: The study examines information uncertainty sources in knowledge representation and explores knowledge decomposition with a three-tier knowledge-integrated machine learning paradigm.
  • results: The approach balances holist and reductionist perspectives in the engineering domain.
    Abstract Despite the digitalization trend and data volume surge, first-principles models (also known as logic-driven, physics-based, rule-based, or knowledge-based models) and data-driven approaches have existed in parallel, mirroring the ongoing AI debate on symbolism versus connectionism. Research for process development to integrate both sides to transfer and utilize domain knowledge in the data-driven process is rare. This study emphasizes efforts and prevailing trends to integrate multidisciplinary domain professions into machine acknowledgeable, data-driven processes in a two-fold organization: examining information uncertainty sources in knowledge representation and exploring knowledge decomposition with a three-tier knowledge-integrated machine learning paradigm. This approach balances holist and reductionist perspectives in the engineering domain.

Q-YOLOP: Quantization-aware You Only Look Once for Panoptic Driving Perception

  • paper_url: http://arxiv.org/abs/2307.04537
  • repo_url: None
  • paper_authors: Chi-Chih Chang, Wei-Cheng Lin, Pei-Shuo Wang, Sheng-Feng Yu, Yu-Chen Lu, Kuan-Cheng Lin, Kai-Chiang Wu
  • for: This paper presents an efficient, quantization-aware panoptic driving perception model (Q-YOLOP) for object detection, drivable area segmentation, and lane line segmentation in autonomous driving.
  • methods: The model uses the Efficient Layer Aggregation Network (ELAN) as its backbone with task-specific heads for each task. Training follows a four-stage process that includes pretraining on the BDD100K dataset, fine-tuning on both the BDD100K and iVS datasets, and quantization-aware training (QAT) on BDD100K, with strong data augmentation (random perspective, mosaic) throughout.
  • results: The model achieves state-of-the-art performance, with an mAP@0.5 of 0.622 for object detection and an mIoU of 0.612 for segmentation, while maintaining low computational and memory requirements.
    Abstract In this work, we present an efficient and quantization-aware panoptic driving perception model (Q- YOLOP) for object detection, drivable area segmentation, and lane line segmentation, in the context of autonomous driving. Our model employs the Efficient Layer Aggregation Network (ELAN) as its backbone and task-specific heads for each task. We employ a four-stage training process that includes pretraining on the BDD100K dataset, finetuning on both the BDD100K and iVS datasets, and quantization-aware training (QAT) on BDD100K. During the training process, we use powerful data augmentation techniques, such as random perspective and mosaic, and train the model on a combination of the BDD100K and iVS datasets. Both strategies enhance the model's generalization capabilities. The proposed model achieves state-of-the-art performance with an mAP@0.5 of 0.622 for object detection and an mIoU of 0.612 for segmentation, while maintaining low computational and memory requirements.

QBitOpt: Fast and Accurate Bitwidth Reallocation during Training

  • paper_url: http://arxiv.org/abs/2307.04535
  • repo_url: None
  • paper_authors: Jorn Peters, Marios Fournarakis, Markus Nagel, Mart van Baalen, Tijmen Blankevoort
  • for: This paper proposes a new algorithm for reallocating the bitwidths of layers in quantized networks during training, improving the efficiency of inference on mobile and embedded devices.
  • methods: QBitOpt formulates bitwidth allocation as a constraint optimization problem and, during quantization-aware training (QAT), combines fast-to-compute sensitivities with efficient solvers to update layer bitwidths.
  • results: Evaluated on ImageNet, QBitOpt produces mixed-precision networks that outperform existing fixed- and mixed-precision methods under the average-bitwidth constraints commonly found in the literature, with task performance guaranteed to satisfy strict resource constraints.
    Abstract Quantizing neural networks is one of the most effective methods for achieving efficient inference on mobile and embedded devices. In particular, mixed precision quantized (MPQ) networks, whose layers can be quantized to different bitwidths, achieve better task performance for the same resource constraint compared to networks with homogeneous bitwidths. However, finding the optimal bitwidth allocation is a challenging problem as the search space grows exponentially with the number of layers in the network. In this paper, we propose QBitOpt, a novel algorithm for updating bitwidths during quantization-aware training (QAT). We formulate the bitwidth allocation problem as a constraint optimization problem. By combining fast-to-compute sensitivities with efficient solvers during QAT, QBitOpt can produce mixed-precision networks with high task performance guaranteed to satisfy strict resource constraints. This contrasts with existing mixed-precision methods that learn bitwidths using gradients and cannot provide such guarantees. We evaluate QBitOpt on ImageNet and confirm that we outperform existing fixed and mixed-precision methods under average bitwidth constraints commonly found in the literature.
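    The allocation step can be posed as: given per-layer sensitivities, choose bitwidths that minimize sensitivity-weighted quantization error subject to an average-bitwidth budget. The greedy toy solver below illustrates that formulation; the error proxy (sensitivity times 2^(-2b)), the candidate bitwidths, and the greedy upgrade rule are assumptions, not QBitOpt's actual solver.

```python
import numpy as np

def allocate_bitwidths(sensitivity, avg_budget=4.0, choices=(2, 4, 8)):
    """Greedy constrained allocation: start every layer at the lowest bitwidth,
    then repeatedly upgrade the layer with the largest error reduction per
    extra bit while the average-bitwidth budget allows it.
    Error proxy for layer i: sensitivity[i] * 2**(-2 * bits[i])."""
    n = len(sensitivity)
    bits = np.full(n, float(choices[0]))
    err = lambda i, b: sensitivity[i] * 2.0 ** (-2 * b)
    while True:
        best, best_gain, best_next = None, 0.0, None
        for i in range(n):
            ups = [c for c in choices if c > bits[i]]
            if not ups or (bits.sum() - bits[i] + min(ups)) / n > avg_budget:
                continue
            nxt = min(ups)
            gain = (err(i, bits[i]) - err(i, nxt)) / (nxt - bits[i])
            if gain > best_gain:
                best, best_gain, best_next = i, gain, nxt
        if best is None:
            return bits.astype(int)
        bits[best] = best_next

print(allocate_bitwidths(np.array([5.0, 1.0, 0.2, 3.0])))  # average bitwidth <= 4
```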

Preventing Errors in Person Detection: A Part-Based Self-Monitoring Framework

  • paper_url: http://arxiv.org/abs/2307.04533
  • repo_url: https://github.com/fraunhoferiks/smf-object-detection
  • paper_authors: Franziska Schwaiger, Andrea Matic, Karsten Roscher, Stephan Günnemann
  • for: Improving the reliability of autonomous systems in real-world applications, in particular preventing errors in the safety-critical task of person detection.
  • methods: A self-monitoring framework that lets the perception system perform plausibility checks at runtime. Adding a component that detects human body parts reduces missed person detections by factors of up to 9 compared with a baseline trained only on holistic person objects, and jointly training on humans and their body parts reduces false positive detections by up to 50% compared with training on humans alone.
  • results: Comprehensive experiments on the publicly available DensePose and Pascal VOC datasets demonstrate the effectiveness of the framework. Code is available at https://github.com/FraunhoferIKS/smf-object-detection.
    Abstract The ability to detect learned objects regardless of their appearance is crucial for autonomous systems in real-world applications. Especially for detecting humans, which is often a fundamental task in safety-critical applications, it is vital to prevent errors. To address this challenge, we propose a self-monitoring framework that allows for the perception system to perform plausibility checks at runtime. We show that by incorporating an additional component for detecting human body parts, we are able to significantly reduce the number of missed human detections by factors of up to 9 when compared to a baseline setup, which was trained only on holistic person objects. Additionally, we found that training a model jointly on humans and their body parts leads to a substantial reduction in false positive detections by up to 50% compared to training on humans alone. We performed comprehensive experiments on the publicly available datasets DensePose and Pascal VOC in order to demonstrate the effectiveness of our framework. Code is available at https://github.com/ FraunhoferIKS/smf-object-detection.
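    The runtime plausibility check itself reduces to simple geometry: a body-part detection that no person box explains indicates a likely missed person. A sketch of that logic follows; the (x1, y1, x2, y2) box format, the center-containment test, and the trigger policy are assumptions about one straightforward realization.

```python
def center_inside(part, person):
    """Is the center of a part box inside a person box? Boxes: (x1, y1, x2, y2)."""
    cx, cy = (part[0] + part[2]) / 2, (part[1] + part[3]) / 2
    return person[0] <= cx <= person[2] and person[1] <= cy <= person[3]

def plausibility_check(person_boxes, part_boxes):
    """Flag body parts unexplained by any person detection -> suspected miss."""
    orphans = [p for p in part_boxes
               if not any(center_inside(p, q) for q in person_boxes)]
    return {"suspected_missed_person": bool(orphans), "orphan_parts": orphans}

# Example: a detected head with no enclosing person box raises the flag.
print(plausibility_check(person_boxes=[(100, 50, 180, 300)],
                         part_boxes=[(120, 60, 150, 90), (400, 80, 430, 110)]))
```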

SAGC-A68: a space access graph dataset for the classification of spaces and space elements in apartment buildings

  • paper_url: http://arxiv.org/abs/2307.04515
  • repo_url: https://github.com/a2amir/sagc-a68
  • paper_authors: Amir Ziaee, Georg Suter
  • for: The paper provides a dataset and method for the automated classification of spaces and space elements in digital 3D models of apartment buildings using Graph Deep Learning (GDL).
  • methods: The authors introduce SAGC-A68, a dataset of access graphs automatically generated from 68 digital 3D models of apartment-building space layouts, and train and evaluate a graph attention network (GAT) on it to predict 22 space-function and 6 space-element classes.
  • results: The experiments demonstrate the dataset's suitability for developing GDL models for space function and space element classification; the dataset and the code used in the experiment are available online.
    Abstract The analysis of building models for usable area, building safety, and energy use requires accurate classification data of spaces and space elements. To reduce input model preparation effort and errors, automated classification of spaces and space elements is desirable. A barrier hindering the utilization of Graph Deep Learning (GDL) methods to space function and space element classification is a lack of suitable datasets. To bridge this gap, we introduce a dataset, SAGC-A68, which comprises access graphs automatically generated from 68 digital 3D models of space layouts of apartment buildings. This graph-based dataset is well-suited for developing GDL models for space function and space element classification. To demonstrate the potential of the dataset, we employ it to train and evaluate a graph attention network (GAT) that predicts 22 space function and 6 space element classes. The dataset and code used in the experiment are available online. https://doi.org/10.5281/zenodo.7805872, https://github.com/A2Amir/SAGC-A68.
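    For orientation, a minimal single-head graph attention layer of the kind underlying a GAT node classifier is sketched below in plain PyTorch on a dense adjacency matrix. The feature and hidden dimensions are placeholders; only the output size (22 space-function classes) is taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseGATLayer(nn.Module):
    """Single-head graph attention on a dense adjacency matrix (GAT-style)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim, bias=False)
        self.att = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, adj):
        h = self.lin(x)                                  # (n, out_dim)
        n = h.size(0)
        pairs = torch.cat([h.repeat_interleave(n, 0), h.repeat(n, 1)], dim=1)
        e = F.leaky_relu(self.att(pairs)).view(n, n)     # attention logits e_ij
        e = e.masked_fill(adj == 0, float("-inf"))       # attend to neighbors only
        return F.elu(torch.softmax(e, dim=1) @ h)

# Tiny usage: 5 spaces (rooms), 8 node features, 22 space-function classes.
x = torch.randn(5, 8)
adj = ((torch.rand(5, 5) > 0.5) | torch.eye(5, dtype=torch.bool)).float()
logits = nn.Linear(16, 22)(DenseGATLayer(8, 16)(x, adj))
```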

Improving Heterogeneous Graph Learning with Weighted Mixed-Curvature Product Manifold

  • paper_url: http://arxiv.org/abs/2307.04514
  • repo_url: https://github.com/sharecodesubmission/weighted_product_manifold
  • paper_authors: Tuc Nguyen-Van, Dung D. Le, The-Anh Ta
  • for: This paper aims to learn embeddings of graphs with heterogeneous structure, so that hidden relations among nodes are captured well enough to serve a variety of downstream tasks.
  • methods: The approach embeds graphs in product manifolds whose component spaces have different geometries (spherical, hyperbolic, or Euclidean). Arguing that each component space contributes differently to expressing structure in the input graph, the authors propose WEIGHTED-PM, a data-driven method that uses the topological information of the input graph to automatically determine the weight of each component.
  • results: Extensive experiments on synthetic and real-world graph datasets show that WEIGHTED-PM learns graph representations with lower geometric distortion and performs better on multiple downstream tasks, such as word similarity learning, top-$k$ recommendation, and knowledge graph embedding.
    Abstract In graph representation learning, it is important that the complex geometric structure of the input graph, e.g. hidden relations among nodes, is well captured in embedding space. However, standard Euclidean embedding spaces have a limited capacity in representing graphs of varying structures. A promising candidate for the faithful embedding of data with varying structure is product manifolds of component spaces of different geometries (spherical, hyperbolic, or euclidean). In this paper, we take a closer look at the structure of product manifold embedding spaces and argue that each component space in a product contributes differently to expressing structures in the input graph, hence should be weighted accordingly. This is different from previous works which consider the roles of different components equally. We then propose WEIGHTED-PM, a data-driven method for learning embedding of heterogeneous graphs in weighted product manifolds. Our method utilizes the topological information of the input graph to automatically determine the weight of each component in product spaces. Extensive experiments on synthetic and real-world graph datasets demonstrate that WEIGHTED-PM is capable of learning better graph representations with lower geometric distortion from input data, and performs better on multiple downstream tasks, such as word similarity learning, top-$k$ recommendation, and knowledge graph embedding.
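    The central quantity, a product-manifold distance in which each component geometry is weighted differently, can be written down directly. In the sketch below the component distances are the standard spherical, Poincare-ball, and Euclidean ones; parameterizing the weights as a softmax over learnable logits is an assumption about one natural instantiation, not necessarily WEIGHTED-PM's.

```python
import torch

def spherical_dist(x, y):
    """Geodesic distance on the unit sphere (inputs assumed unit-norm)."""
    return torch.acos(torch.clamp((x * y).sum(-1), -1 + 1e-7, 1 - 1e-7))

def poincare_dist(x, y):
    """Poincare-ball distance (inputs assumed strictly inside the unit ball)."""
    num = 2 * ((x - y) ** 2).sum(-1)
    den = (1 - (x ** 2).sum(-1)) * (1 - (y ** 2).sum(-1))
    return torch.acosh(1 + num / den.clamp(min=1e-7))

def weighted_product_dist2(parts_x, parts_y, weight_logits):
    """Squared distance on a weighted product manifold:
    d(x, y)^2 = sum_i w_i * d_i(x_i, y_i)^2 with w = softmax(weight_logits)."""
    dists = torch.stack([
        spherical_dist(parts_x[0], parts_y[0]),
        poincare_dist(parts_x[1], parts_y[1]),
        torch.linalg.norm(parts_x[2] - parts_y[2], dim=-1),  # Euclidean component
    ])
    w = torch.softmax(weight_logits, dim=0)
    return (w.unsqueeze(-1) * dists ** 2).sum(0)
```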

Improving Factuality of Abstractive Summarization via Contrastive Reward Learning

  • paper_url: http://arxiv.org/abs/2307.04507
  • repo_url: None
  • paper_authors: I-Chun Chern, Zhiruo Wang, Sanjan Das, Bhavuk Sharma, Pengfei Liu, Graham Neubig
  • for: Improving the factuality of abstractive summarization, whose models often generate hallucinated or contradictory information.
  • methods: A contrastive learning framework that incorporates recent developments in reward learning and factuality metrics, letting summarization models learn from the feedback of factuality metrics via contrastive reward learning.
  • results: Empirical studies show that the framework yields summaries judged more factual by human evaluators, suggesting that further advances in learning and evaluation algorithms can feed directly into more factual summarization.
    Abstract Modern abstractive summarization models often generate summaries that contain hallucinated or contradictory information. In this paper, we propose a simple but effective contrastive learning framework that incorporates recent developments in reward learning and factuality metrics. Empirical studies demonstrate that the proposed framework enables summarization models to learn from feedback of factuality metrics using contrastive reward learning, leading to more factual summaries by human evaluations. This suggests that further advances in learning and evaluation algorithms can feed directly into providing more factual summaries.
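    One way to read "learning from the feedback of factuality metrics using contrastive reward learning" is a pairwise ranking objective: for each pair of candidate summaries, the model's sequence score for the more-factual candidate should exceed that of the less-factual one. The sketch below assumes candidates have already been scored by an external factuality metric and uses a margin ranking loss as a stand-in for the paper's objective.

```python
import torch
import torch.nn.functional as F

def contrastive_reward_loss(seq_logprobs, fact_scores, margin=1.0):
    """seq_logprobs: (k,) model log-probabilities of k candidate summaries;
    fact_scores: (k,) factuality-metric scores for the same candidates.
    For every candidate pair, the more factual one should receive the
    higher model score, enforced with a margin ranking loss."""
    i, j = torch.triu_indices(len(fact_scores), len(fact_scores), offset=1)
    sign = torch.sign(fact_scores[i] - fact_scores[j])   # +1 if i more factual
    keep = sign != 0                                     # skip ties
    return F.margin_ranking_loss(seq_logprobs[i][keep], seq_logprobs[j][keep],
                                 sign[keep], margin=margin)

# Usage: candidates scored by, e.g., a QA- or NLI-based factuality metric.
loss = contrastive_reward_loss(torch.tensor([-4.2, -3.1, -5.0]),
                               torch.tensor([0.9, 0.4, 0.7]))
```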

Deductive Controller Synthesis for Probabilistic Hyperproperties

  • paper_url: http://arxiv.org/abs/2307.04503
  • repo_url: None
  • paper_authors: Roman Andriushchenko, Ezio Bartocci, Milan Ceska, Francesco Pontiggia, Sarah Sallinger
  • for: This paper addresses the controller synthesis problem for Markov decision processes (MDPs) and probabilistic hyperproperties.
  • methods: The specification language builds on the logic HyperPCTL and extends it with structural constraints over the synthesized controllers. Starting from a family of controllers represented symbolically over the same copy of an MDP, an abstraction refinement strategy relates multiple computation trees and prunes the search space deductively.
  • results: The experimental evaluation shows that the approach considerably outperforms HyperProb, a state-of-the-art SMT-based model-checking tool for HyperPCTL, and that it is the first able to effectively combine probabilistic hyperproperties with intra-controller constraints (e.g., partial observability) as well as inter-controller constraints (e.g., agreement on a common action).
    Abstract Probabilistic hyperproperties specify quantitative relations between the probabilities of reaching different target sets of states from different initial sets of states. This class of behavioral properties is suitable for capturing important security, privacy, and system-level requirements. We propose a new approach to solve the controller synthesis problem for Markov decision processes (MDPs) and probabilistic hyperproperties. Our specification language builds on top of the logic HyperPCTL and enhances it with structural constraints over the synthesized controllers. Our approach starts from a family of controllers represented symbolically and defined over the same copy of an MDP. We then introduce an abstraction refinement strategy that can relate multiple computation trees and that we employ to prune the search space deductively. The experimental evaluation demonstrates that the proposed approach considerably outperforms HyperProb, a state-of-the-art SMT-based model checking tool for HyperPCTL. Moreover, our approach is the first one that is able to effectively combine probabilistic hyperproperties with additional intra-controller constraints (e.g. partial observability) as well as inter-controller constraints (e.g. agreements on a common action).
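    For a concrete flavor of such specifications, a schematic probabilistic-noninterference hyperproperty in HyperPCTL-like notation is shown below; the state quantifiers and the atomic propositions `init` and `leak` are illustrative, not taken from the paper.

```latex
% Schematic, HyperPCTL-style: from any two initial states, under the
% synthesized controller, the probability of eventually reaching a
% "leak" state must coincide -- an observer cannot tell them apart.
\forall \hat{s}_1.\, \forall \hat{s}_2.\;
  \bigl( \mathit{init}_{\hat{s}_1} \wedge \mathit{init}_{\hat{s}_2} \bigr)
  \rightarrow
  \mathbb{P}\bigl( \lozenge\, \mathit{leak}_{\hat{s}_1} \bigr)
  = \mathbb{P}\bigl( \lozenge\, \mathit{leak}_{\hat{s}_2} \bigr)
```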

Model-Driven Engineering Method to Support the Formalization of Machine Learning using SysML

  • paper_url: http://arxiv.org/abs/2307.04495
  • repo_url: None
  • paper_authors: Simon Raedler, Juergen Mangler, Stefanie Rinderle-Ma
  • for: This paper aims to support the collaborative definition of machine learning tasks by leveraging model-based engineering in the formalization of the systems modeling language SysML.
  • methods: The introduced method uses SysML to formalize knowledge from various domains, identify and integrate data sources, define semantic connections between data attributes, and define data processing steps within the machine learning support.
  • results: The method is evaluated in two use cases (a smart weather system that predicts forecasts from sensor data, and a waste-prevention case for 3D-printer filament based on image processing) and a user study on perceived workload and usability. The results show that integrating machine-learning-specific properties into systems engineering techniques lets non-data scientists define specific aspects of a machine learning problem and document knowledge on the data, consolidating knowledge from various domains and fostering the integration of machine learning in industry by involving several stakeholders.
    Abstract Methods: This work introduces a method supporting the collaborative definition of machine learning tasks by leveraging model-based engineering in the formalization of the systems modeling language SysML. The method supports the identification and integration of various data sources, the required definition of semantic connections between data attributes, and the definition of data processing steps within the machine learning support. Results: By consolidating the knowledge of domain and machine learning experts, a powerful tool to describe machine learning tasks by formalizing knowledge using the systems modeling language SysML is introduced. The method is evaluated based on two use cases, i.e., a smart weather system that allows to predict weather forecasts based on sensor data, and a waste prevention case for 3D printer filament that cancels the printing if the intended result cannot be achieved (image processing). Further, a user study is conducted to gather insights of potential users regarding perceived workload and usability of the elaborated method. Conclusion: Integrating machine learning-specific properties in systems engineering techniques allows non-data scientists to understand formalized knowledge and define specific aspects of a machine learning problem, document knowledge on the data, and to further support data scientists to use the formalized knowledge as input for an implementation using (semi-) automatic code generation. In this respect, this work contributes by consolidating knowledge from various domains and therefore, fosters the integration of machine learning in industry by involving several stakeholders.

Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations

  • paper_url: http://arxiv.org/abs/2307.05722
  • repo_url: None
  • paper_authors: Likang Wu, Zhaopeng Qiu, Zhi Zheng, Hengshu Zhu, Enhong Chen
  • for: This paper aims to explore the potential of large language models (LLMs) in understanding behavior graphs and enhancing job recommendations in online recruitment, including the promotion of out-of-distribution (OOD) applications.
  • methods: The proposed framework leverages the rich contextual information and semantic representations provided by LLMs to analyze behavior graphs and uncover underlying patterns and relationships. Specifically, it uses a meta-path prompt constructor to understand behavior graphs and a path augmentation module to alleviate prompt bias.
  • results: The approach is evaluated on a comprehensive dataset and demonstrates improved relevance and quality of recommended jobs compared to traditional path-based sequence input methods. The findings contribute to the growing field of natural language processing and offer practical implications for enhancing job search experiences.
    Abstract Large Language Models (LLMs) have revolutionized natural language processing tasks, demonstrating their exceptional capabilities in various domains. However, their potential for behavior graph understanding in job recommendations remains largely unexplored. This paper focuses on unveiling the capability of large language models in understanding behavior graphs and leveraging this understanding to enhance recommendations in online recruitment, including the promotion of out-of-distribution (OOD) application. We present a novel framework that harnesses the rich contextual information and semantic representations provided by large language models to analyze behavior graphs and uncover underlying patterns and relationships. Specifically, we propose a meta-path prompt constructor that leverages LLM recommender to understand behavior graphs for the first time and design a corresponding path augmentation module to alleviate the prompt bias introduced by path-based sequence input. By leveraging this capability, our framework enables personalized and accurate job recommendations for individual users. We evaluate the effectiveness of our approach on a comprehensive dataset and demonstrate its ability to improve the relevance and quality of recommended quality. This research not only sheds light on the untapped potential of large language models but also provides valuable insights for developing advanced recommendation systems in the recruitment market. The findings contribute to the growing field of natural language processing and offer practical implications for enhancing job search experiences.
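    The meta-path prompt constructor can be pictured as serializing a typed path from the behavior graph into natural language for the LLM. The sketch below invents a tiny schema (user -[applied_to]-> job -[posted_by]-> company) and templates purely for illustration; they are not the paper's actual templates, and the path augmentation module is omitted.

```python
# Hypothetical schema: (user) -[applied_to]-> (job) -[posted_by]-> (company)
TEMPLATES = {
    "applied_to": "{src} applied to the job '{dst}'",
    "posted_by": "the job '{src}' was posted by {dst}",
}

def meta_path_prompt(path):
    """Serialize a typed behavior-graph path into a natural-language prompt.
    path: list of (src_node, relation, dst_node) triples."""
    clauses = [TEMPLATES[rel].format(src=s, dst=d) for s, rel, d in path]
    return ("Given the following candidate behavior: " + "; ".join(clauses) +
            ". Which jobs should be recommended next?")

print(meta_path_prompt([("user_42", "applied_to", "ML Engineer"),
                        ("ML Engineer", "posted_by", "Acme Corp")]))
```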

Proceeding of the 1st Workshop on Social Robots Personalisation At the crossroads between engineering and humanities (CONCATENATE)

  • paper_url: http://arxiv.org/abs/2307.12777
  • repo_url: None
  • paper_authors: Imene Tarakli, Georgios Angelopoulos, Mehdi Hellou, Camille Vindolet, Boris Abramovic, Rocco Limongelli, Dimitri Lacroix, Andrea Bertolini, Silvia Rossi, Alessandro Di Nuovo, Angelo Cangelosi, Gordon Cheng
  • for: This paper aims to discuss and propose guidelines for personalization in robotics, addressing questions such as how to define it, how to achieve it, and how it should be guided to fit legal and ethical requirements.
  • methods: The paper uses an interdisciplinary approach, bringing together researchers from various fields to discuss and propose guidelines for personalization in robotics.
  • results: The paper aims to provide a comprehensive understanding of personalization in robotics, including its definition, achievement, and ethical considerations, to ensure the large-scale adoption of social robotics.
    Abstract Nowadays, robots are expected to interact more physically, cognitively, and socially with people. They should adapt to unpredictable contexts alongside individuals with various behaviours. For this reason, personalisation is a valuable attribute for social robots as it allows them to act according to a specific user's needs and preferences and achieve natural and transparent robot behaviours for humans. If correctly implemented, personalisation could also be the key to the large-scale adoption of social robotics. However, achieving personalisation is arduous as it requires us to expand the boundaries of robotics by taking advantage of the expertise of various domains. Indeed, personalised robots need to analyse and model user interactions while considering their involvement in the adaptative process. It also requires us to address ethical and socio-cultural aspects of personalised HRI to achieve inclusive and diverse interaction and avoid deception and misplaced trust when interacting with the users. At the same time, policymakers need to ensure regulations in view of possible short-term and long-term adaptive HRI. This workshop aims to raise an interdisciplinary discussion on personalisation in robotics. It aims at bringing researchers from different fields together to propose guidelines for personalisation while addressing the following questions: how to define it - how to achieve it - and how it should be guided to fit legal and ethical requirements.

PapagAI:Automated Feedback for Reflective Essays

  • paper_url: http://arxiv.org/abs/2307.07523
  • repo_url: None
  • paper_authors: Veronika Solopova, Adrian Gruszczynski, Eiad Rostom, Fritz Cremer, Sascha Witte, Chengming Zhang, Fernando Ramos López, Lea Plößl, Florian Hofmann, Ralf Romeike, Michaela Gläser-Zikuda, Christoph Benzmüller, Tim Landgraf
  • for: To improve students' learning outcomes and complement the teaching activities of lecturers.
  • methods: An automated feedback tool grounded in didactic theory and implemented as a hybrid AI system.
  • results: The paper presents the first open-source automated feedback tool for reflective essays, which complements lecturers' feedback activities and supports better learning outcomes for students.
    Abstract Written reflective practice is a regular exercise pre-service teachers perform during their higher education. Usually, their lecturers are expected to provide individual feedback, which can be a challenging task to perform on a regular basis. In this paper, we present the first open-source automated feedback tool based on didactic theory and implemented as a hybrid AI system. We describe the components and discuss the advantages and disadvantages of our system compared to the state-of-art generative large language models. The main objective of our work is to enable better learning outcomes for students and to complement the teaching activities of lecturers.

Digital Modeling for Everyone: Exploring How Novices Approach Voice-Based 3D Modeling

  • paper_url: http://arxiv.org/abs/2307.04481
  • repo_url: None
  • paper_authors: Giuseppe Desolda, Andrea Esposito, Florian Müller, Sebastian Feger
  • for: This study aims to lower the barrier to designing and customizing personalized 3D models by improving the experience of voice-based 3D modeling for novices.
  • methods: A high-fidelity Wizard of Oz study with 22 participants, followed by a thematic analysis of the collected data, to understand how novices' mental models translate into voice-based 3D modeling.
  • results: Novices often issue vague, incomplete, or wrong voice commands, so voice assistants must handle such commands and offer appropriate help. The study also finds that users need a set of straightforward commands to shape simple and composite objects, and different strategies to select 3D objects.
    Abstract Manufacturing tools like 3D printers have become accessible to the wider society, making the promise of digital fabrication for everyone seemingly reachable. While the actual manufacturing process is largely automated today, users still require knowledge of complex design applications to produce ready-designed objects and adapt them to their needs or design new objects from scratch. To lower the barrier to the design and customization of personalized 3D models, we explored novice mental models in voice-based 3D modeling by conducting a high-fidelity Wizard of Oz study with 22 participants. We performed a thematic analysis of the collected data to understand how the mental model of novices translates into voice-based 3D modeling. We conclude with design implications for voice assistants. For example, they have to: deal with vague, incomplete and wrong commands; provide a set of straightforward commands to shape simple and composite objects; and offer different strategies to select 3D objects.

Some Preliminary Steps Towards Metaverse Logic

  • paper_url: http://arxiv.org/abs/2307.05574
  • repo_url: None
  • paper_authors: Antonio L. Furtado, Marco A. Casanova, Edirlei Soares de Lima
  • for: 这个论文的目的是开发一种能够处理现实世界和虚拟世界应用领域中的不稳定行为的逻辑。
  • methods: 这篇论文采用非常规的逻辑扩展,尝试勾勒出一种最小的复合逻辑策略。
  • results: 论文借助 ChatGPT 这一 AI 代理,并诉诸常识性方法,提出了一种可用于处理不稳定行为的逻辑策略。
    Abstract Assuming that the term 'metaverse' could be understood as a computer-based implementation of multiverse applications, we started to look in the present work for a logic that would be powerful enough to handle the situations arising both in the real and in the fictional underlying application domains. Realizing that first-order logic fails to account for the unstable behavior of even the most simpleminded information system domains, we resorted to non-conventional extensions, in an attempt to sketch a minimal composite logic strategy. The discussion was kept at a rather informal level, always trying to convey the intuition behind the theoretical notions in natural language terms, and appealing to an AI agent, namely ChatGPT, in the hope that algorithmic and common-sense approaches can be usefully combined.
    摘要 假设"metaverse"一词可以理解为多重宇宙应用的计算机实现,我们在本工作中着手寻找一种足够强大的逻辑,以处理现实与虚构这两类底层应用领域中出现的情况。由于一阶逻辑无法刻画即便是最简单的信息系统领域的不稳定行为,我们转而采用非常规的扩展,尝试勾勒出一种最小的复合逻辑策略。讨论保持在较为非正式的层面,始终力图用自然语言传达理论概念背后的直觉,并求助于一个 AI 代理,即 ChatGPT,希望将算法方法与常识方法有效结合。

  • paper_url: http://arxiv.org/abs/2307.04429
  • repo_url: https://github.com/devilyangs/emo-nas-cd
  • paper_authors: Shangshang Yang, Haiping Ma, Cheng Zhen, Ye Tian, Limiao Zhang, Yaochu Jin, Xingyi Zhang
  • for: 本研究旨在通过自动设计新的认知诊断模型,提高现有智能教育平台中对学生知识概念掌握程度诊断的效果。
  • methods: 本研究使用进化多目标神经架构搜索(NAS)自动设计认知诊断模型,并采用多目标遗传编程(MOGP)在兼顾模型性能与可解释性的前提下探索搜索空间。每个候选架构被转化为树结构并以树编码,便于优化;此外还提出了一种初始化策略,通过从现有模型的变体演化一半种群来加速收敛。
  • results: 实验结果显示,所提方法搜索出的诊断模型在两个真实数据集上表现显著优于现有模型,且可解释性与人工设计的模型相当。
    Abstract Cognitive diagnosis plays a vital role in modern intelligent education platforms to reveal students' proficiency in knowledge concepts for subsequent adaptive tasks. However, due to the requirement of high model interpretability, existing manually designed cognitive diagnosis models hold too simple architectures to meet the demand of current intelligent education systems, where the bias of human design also limits the emergence of effective cognitive diagnosis models. In this paper, we propose to automatically design novel cognitive diagnosis models by evolutionary multi-objective neural architecture search (NAS). Specifically, we observe existing models can be represented by a general model handling three given types of inputs and thus first design an expressive search space for the NAS task in cognitive diagnosis. Then, we propose multi-objective genetic programming (MOGP) to explore the NAS task's search space by maximizing model performance and interpretability. In the MOGP design, each architecture is transformed into a tree architecture and encoded by a tree for easy optimization, and a tailored genetic operation based on four sub-genetic operations is devised to generate offspring effectively. Besides, an initialization strategy is also suggested to accelerate the convergence by evolving half of the population from existing models' variants. Experiments on two real-world datasets demonstrate that the cognitive diagnosis models searched by the proposed approach exhibit significantly better performance than existing models and also hold as good interpretability as human-designed models.
    摘要 在现代智能教育平台中,认知诊断对揭示学生对知识概念的掌握程度、支撑后续自适应任务起着至关重要的作用。然而,由于对模型可解释性的高要求,现有人工设计的认知诊断模型结构过于简单,无法满足当前智能教育系统的需求,而人工设计的偏见也限制了有效认知诊断模型的涌现。在这篇论文中,我们提出使用进化多目标神经架构搜索(NAS)来自动设计新的认知诊断模型。具体而言,我们观察到现有模型可以表示为处理三种给定输入类型的通用模型,因此首先为认知诊断中的NAS任务设计了表达力强的搜索空间;随后提出多目标遗传编程(MOGP),以最大化模型性能和可解释性为目标探索该搜索空间。在MOGP设计中,每个候选架构被转换为树结构并以树编码,便于优化,同时设计了由四种子遗传操作构成的定制遗传操作,以有效生成后代。此外,还提出了一种初始化策略,通过从现有模型的变体演化一半种群来加速收敛。在两个真实数据集上的实验表明,所提方法搜索出的认知诊断模型表现显著优于现有模型,且可解释性与人工设计的模型相当。
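
作为补充,下面是上文"架构编码为树、以遗传操作探索"这一思路的极简 Python 示意:候选架构编码为运算树,以两个目标(性能、以树大小近似的可解释性)打分,并用变异操作生成后代。其中的运算集合、随机的性能代理以及用加权标量化代替真正的 Pareto 选择,均为示意性假设,并非论文的实际实现。

```python
import copy
import random
from dataclasses import dataclass, field

OPS = ["add", "mul", "concat", "ffn"]        # hypothetical internal node operations
LEAVES = ["student", "exercise", "concept"]  # the three given input types

@dataclass
class Node:
    op: str
    children: list = field(default_factory=list)
    def size(self) -> int:
        return 1 + sum(c.size() for c in self.children)

def random_tree(depth: int = 2) -> Node:
    if depth == 0:
        return Node(random.choice(LEAVES))
    return Node(random.choice(OPS), [random_tree(depth - 1) for _ in range(2)])

def mutate(tree: Node) -> Node:
    """One sub-genetic operation: replace a random subtree with a fresh one."""
    if not tree.children or random.random() < 0.3:
        return random_tree(depth=1)
    i = random.randrange(len(tree.children))
    tree.children[i] = mutate(tree.children[i])
    return tree

def objectives(tree: Node) -> tuple:
    # Stand-ins: real MOGP trains the decoded model for accuracy; here
    # interpretability is approximated by (negative) tree size.
    performance = random.random()
    interpretability = -tree.size()
    return performance, interpretability

def scalarized(tree: Node) -> float:
    # Crude weighted scalarization instead of Pareto ranking, for brevity.
    perf, interp = objectives(tree)
    return perf + 0.01 * interp

population = [random_tree() for _ in range(8)]
for gen in range(5):
    survivors = sorted(population, key=scalarized, reverse=True)[:4]
    population = survivors + [mutate(copy.deepcopy(random.choice(survivors)))
                              for _ in range(4)]
print("final tree sizes:", [t.size() for t in population])
```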

FedDCT: A Dynamic Cross-Tier Federated Learning Scheme in Wireless Communication Networks

  • paper_url: http://arxiv.org/abs/2307.04420
  • repo_url: None
  • paper_authors: Peng Liu, Youquan Xian, Chuanjian Yao, Xiaoyun Gan, Lianghaojie Zhou, Jianyong Jiang, Dongcheng Li
  • for: 这个研究旨在提高无线通信网络上的联合学习系统表现和准确性。
  • methods: 我们提出了一种新颖的动态跨层联邦学习方案(FedDCT),使用分层算法根据特定指标将客户端动态划分为不同层级,并为每个层级分配特定的超时阈值以减少训练时间。我们还引入了跨层客户端选择算法,能够有效地选择层级和参与者。
  • results: 实验结果显示,该方案可以让模型在无线通信网络中更快收敛并达到更高的准确性。
    Abstract With the rapid proliferation of Internet of Things (IoT) devices and the growing concern for data privacy among the public, Federated Learning (FL) has gained significant attention as a privacy-preserving machine learning paradigm. FL enables the training of a global model among clients without exposing local data. However, when a federated learning system runs on wireless communication networks, limited wireless resources, heterogeneity of clients, and network transmission failures affect its performance and accuracy. In this study, we propose a novel dynamic cross-tier FL scheme, named FedDCT to increase training accuracy and performance in wireless communication networks. We utilize a tiering algorithm that dynamically divides clients into different tiers according to specific indicators and assigns specific timeout thresholds to each tier to reduce the training time required. To improve the accuracy of the model without increasing the training time, we introduce a cross-tier client selection algorithm that can effectively select the tiers and participants. Simulation experiments show that our scheme can make the model converge faster and achieve a higher accuracy in wireless communication networks.
    摘要 随着物联网(IoT)设备的快速普及和公众对数据隐私的日益关注,联邦学习(FL)作为一种保护隐私的机器学习范式受到了广泛关注。FL 允许客户端在不暴露本地数据的情况下共同训练全局模型。然而,当联邦学习系统运行在无线通信网络上时,有限的无线资源、客户端的异构性以及网络传输故障都会影响其性能和准确性。在本研究中,我们提出了一种新颖的动态跨层联邦学习方案 FedDCT,以提升无线通信网络中的训练准确性和性能。我们利用分层算法,根据特定指标动态地将客户端划分为不同层级,并为每一层级分配特定的超时阈值,以减少所需的训练时间。为了在不增加训练时间的前提下提高模型准确性,我们引入了跨层客户端选择算法,能够有效地选择层级和参与者。仿真实验表明,我们的方案可以使模型更快收敛,并在无线通信网络中达到更高的准确性。
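
下面是上述动态分层思路的一个极简 Python 示意:按观测到的时延将客户端分层,为每层设置独立的超时阈值,并跨层选取参与者。其中的分层规则、slack 系数和选取方式均为示意性假设,并非 FedDCT 的原始算法。

```python
import random

def assign_tiers(latencies: dict, n_tiers: int = 3) -> dict:
    """Sort clients by observed latency and split them into equal-sized tiers."""
    ordered = sorted(latencies, key=latencies.get)
    size = max(1, len(ordered) // n_tiers)
    return {c: min(i // size, n_tiers - 1) for i, c in enumerate(ordered)}

def tier_timeouts(latencies: dict, tiers: dict, slack: float = 1.5) -> dict:
    """Timeout per tier = slack * worst observed latency inside that tier."""
    worst = {}
    for c, t in tiers.items():
        worst[t] = max(worst.get(t, 0.0), latencies[c])
    return {t: slack * w for t, w in worst.items()}

def cross_tier_select(tiers: dict, k_per_tier: int = 2) -> list:
    """Pick a few participants from every tier, so slow devices still
    contribute instead of only the fastest clients being sampled."""
    chosen = []
    for t in set(tiers.values()):
        members = [c for c, tt in tiers.items() if tt == t]
        chosen += random.sample(members, min(k_per_tier, len(members)))
    return chosen

latencies = {f"client{i}": random.uniform(1, 20) for i in range(9)}
tiers = assign_tiers(latencies)
print("tiers:", tiers)
print("timeouts:", tier_timeouts(latencies, tiers))
print("round participants:", cross_tier_select(tiers))
```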

Unmasking the giant: A comprehensive evaluation of ChatGPT’s proficiency in coding algorithms and data structures

  • paper_url: http://arxiv.org/abs/2307.05360
  • repo_url: None
  • paper_authors: Sayed Erfan Arefin, Tasnia Ashrafi Heya, Hasan Al-Qudah, Ynes Ineza, Abdul Serwadda
  • For: The paper evaluates the coding capabilities of ChatGPT, a large language model, in the Python programming language, specifically focusing on data structures and algorithms.* Methods: The paper uses a comprehensive evaluation of ChatGPT’s coding capabilities based on a large catalog of coding challenges, and investigates the quality of ChatGPT’s code, the nature of run-time errors, and whether ChatGPT might have directly memorized some of the data used to train it.* Results: The paper investigates the above questions from the context of both the underlying learning models (GPT-3.5 and GPT-4) and on a vast array of sub-topics within the main topics, and compares with human performance whenever feasible.
    Abstract The transformative influence of Large Language Models (LLMs) is profoundly reshaping the Artificial Intelligence (AI) technology domain. Notably, ChatGPT distinguishes itself within these models, demonstrating remarkable performance in multi-turn conversations and exhibiting code proficiency across an array of languages. In this paper, we carry out a comprehensive evaluation of ChatGPT's coding capabilities based on what is to date the largest catalog of coding challenges. Our focus is on the python programming language and problems centered on data structures and algorithms, two topics at the very foundations of Computer Science. We evaluate ChatGPT for its ability to generate correct solutions to the problems fed to it, its code quality, and nature of run-time errors thrown by its code. Where ChatGPT code successfully executes, but fails to solve the problem at hand, we look into patterns in the test cases passed in order to gain some insights into how wrong ChatGPT code is in these kinds of situations. To infer whether ChatGPT might have directly memorized some of the data that was used to train it, we methodically design an experiment to investigate this phenomena. Making comparisons with human performance whenever feasible, we investigate all the above questions from the context of both its underlying learning models (GPT-3.5 and GPT-4), on a vast array sub-topics within the main topics, and on problems having varying degrees of difficulty.
    摘要 大语言模型(LLM)的变革性影响正在深刻地重塑人工智能技术领域。其中,ChatGPT 在多轮对话中表现出色,并在多种语言中展现出编程能力。在这篇论文中,我们基于迄今为止规模最大的编程挑战题库,对 ChatGPT 的编程能力进行了全面评估,聚焦于 Python 语言以及数据结构与算法这两个计算机科学的基础主题。我们评估了 ChatGPT 生成正确解答的能力、代码质量以及其代码抛出的运行时错误的特点。当 ChatGPT 的代码成功执行但未能解决问题时,我们分析其通过的测试用例中的模式,以洞察这类情况下代码错在何处。为了判断 ChatGPT 是否直接记忆了部分训练数据,我们系统地设计了一项实验来考察这一现象。我们在可行的情况下与人类表现进行比较,并从其底层学习模型(GPT-3.5 和 GPT-4)、主要主题下的众多子主题以及不同难度的问题等角度考察上述所有问题。

Ethicist: Targeted Training Data Extraction Through Loss Smoothed Soft Prompting and Calibrated Confidence Estimation

  • paper_url: http://arxiv.org/abs/2307.04401
  • repo_url: https://github.com/thu-coai/targeted-data-extraction
  • paper_authors: Zhexin Zhang, Jiaxin Wen, Minlie Huang
  • for: 本研究提出了一种名为 Ethicist 的定向训练数据抽取方法,用于考察语言模型对训练数据的记忆程度,从而揭示信息泄露带来的隐私风险。
  • methods: 该方法通过损失平滑的软提示和校准的置信度估计来引导抽取:在保持模型固定的情况下调整软提示嵌入,以诱导模型输出其记忆的训练数据;并提出一种平滑损失,使正确的后缀更容易被采样到。
  • results: 实验结果表明,Ethicist 显著提升了定向数据抽取的性能;我们还考察了解码策略、模型规模、前缀长度和后缀长度等因素对抽取性能的影响。代码可以在 GitHub 上找到。
    Abstract Large pre-trained language models achieve impressive results across many tasks. However, recent works point out that pre-trained language models may memorize a considerable fraction of their training data, leading to the privacy risk of information leakage. In this paper, we propose a method named Ethicist for targeted training data extraction through loss smoothed soft prompting and calibrated confidence estimation, investigating how to recover the suffix in the training data when given a prefix. To elicit memorization in the attacked model, we tune soft prompt embeddings while keeping the model fixed. We further propose a smoothing loss that smooths the loss distribution of the suffix tokens to make it easier to sample the correct suffix. In order to select the most probable suffix from a collection of sampled suffixes and estimate the prediction confidence, we propose a calibrated confidence estimation method, which normalizes the confidence of the generated suffixes with a local estimation. We show that Ethicist significantly improves the extraction performance on a recently proposed public benchmark. We also investigate several factors influencing the data extraction performance, including decoding strategy, model scale, prefix length, and suffix length. Our code is available at https://github.com/thu-coai/Targeted-Data-Extraction.
    摘要 大型预训练语言模型在多种任务上表现出色,但最近的研究指出,预训练语言模型可能会记忆相当一部分训练数据,带来信息泄露的隐私风险。在这篇论文中,我们提出一种名为 Ethicist 的方法,通过损失平滑的软提示和校准的置信度估计进行定向训练数据抽取,考察在给定前缀时如何恢复训练数据中的后缀。为诱发被攻击模型的记忆,我们在保持模型固定的情况下调整软提示嵌入。我们进一步提出一种平滑损失,对后缀词元的损失分布进行平滑,以便更容易采样到正确的后缀。为了从一组采样的后缀中选出最可能的后缀并估计预测置信度,我们提出一种校准的置信度估计方法,通过局部估计对生成后缀的置信度进行归一化。我们表明,Ethicist 在最近提出的公开基准上显著提升了抽取性能,并考察了解码策略、模型规模、前缀长度和后缀长度等因素对数据抽取性能的影响。我们的代码可以在 https://github.com/thu-coai/Targeted-Data-Extraction 找到。
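
下面给出"冻结模型、仅调软提示、只在后缀词元上计损失"这一攻击思路的极简 PyTorch 示意。模型选择(gpt2)、示例前后缀、超参数,以及用标签平滑近似论文的损失平滑项,均为示意性假设。

```python
import torch
import torch.nn.functional as F
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
for p in model.parameters():
    p.requires_grad_(False)          # the attacked model stays fixed

n_soft = 5                           # number of trainable soft-prompt vectors
soft_prompt = torch.nn.Parameter(
    torch.randn(1, n_soft, model.config.n_embd) * 0.02)
opt = torch.optim.Adam([soft_prompt], lr=1e-3)

prefix_ids = tok.encode("The patient's name is", return_tensors="pt")
suffix_ids = tok.encode(" John Smith", return_tensors="pt")  # target to elicit

wte = model.transformer.wte          # frozen token-embedding table
for step in range(50):
    embeds = torch.cat([soft_prompt, wte(prefix_ids), wte(suffix_ids)], dim=1)
    logits = model(inputs_embeds=embeds).logits
    # Position i predicts token i+1, so predictions for the suffix tokens
    # come from the positions just before each suffix token.
    start = n_soft + prefix_ids.size(1)
    pred = logits[0, start - 1:start - 1 + suffix_ids.size(1)]
    # label_smoothing is a simple stand-in for the paper's smoothing loss.
    loss = F.cross_entropy(pred, suffix_ids[0], label_smoothing=0.1)
    opt.zero_grad(); loss.backward(); opt.step()
print("final suffix loss:", float(loss))
```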

Recent Advancements in End-to-End Autonomous Driving using Deep Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.04370
  • repo_url: https://github.com/pranav-chib/recent-advancements-in-end-to-end-autonomous-driving-using-deep-learning
  • paper_authors: Pranav Singh Chib, Pravendra Singh
  • for: This paper provides a comprehensive review of the End-to-End autonomous driving stack, including the entire driving process from perception to control.
  • methods: The paper employs neural networks in an End-to-End manner, addressing key challenges encountered in real-world applications.
  • results: The paper discusses recent developments in End-to-End autonomous driving, including sensorial input, main and auxiliary output, learning approaches, and model evaluation techniques.
  • for: 这篇论文提供了综合的End-to-End自动驾驶栈评论,包括整个驾驶过程从感知到控制。
  • methods: 论文使用神经网络在End-to-End方式下进行整合,解决了实际应用中的主要挑战。
  • results: 论文介绍了最新的End-to-End自动驾驶发展,包括感知输入、主要和辅助输出、学习方法从模仿到强化学习、评估技术等。
    Abstract End-to-End driving is a promising paradigm as it circumvents the drawbacks associated with modular systems, such as their overwhelming complexity and propensity for error propagation. Autonomous driving transcends conventional traffic patterns by proactively recognizing critical events in advance, ensuring passengers' safety and providing them with comfortable transportation, particularly in highly stochastic and variable traffic settings. This paper presents a comprehensive review of the End-to-End autonomous driving stack. It provides a taxonomy of automated driving tasks wherein neural networks have been employed in an End-to-End manner, encompassing the entire driving process from perception to control, while addressing key challenges encountered in real-world applications. Recent developments in End-to-End autonomous driving are analyzed, and research is categorized based on underlying principles, methodologies, and core functionality. These categories encompass sensorial input, main and auxiliary output, learning approaches ranging from imitation to reinforcement learning, and model evaluation techniques. The survey incorporates a detailed discussion of the explainability and safety aspects. Furthermore, it assesses the state-of-the-art, identifies challenges, and explores future possibilities. We maintained the latest advancements and their corresponding open-source implementations at https://github.com/Pranav-chib/Recent-Advancements-in-End-to-End-Autonomous-Driving-using-Deep-Learning.
    摘要 端到端驾驶是一种有前景的范式,因为它规避了模块化系统的缺点,例如其过高的复杂性与错误传播的倾向。自动驾驶超越了传统的交通模式,能够提前主动识别关键事件,保障乘客安全,并在高度随机且多变的交通环境中为其提供舒适的出行。本文对端到端自动驾驶栈进行了全面回顾:给出了以端到端方式应用神经网络的自动驾驶任务分类,涵盖从感知到控制的整个驾驶过程,并讨论了实际应用中遇到的关键挑战。本文分析了端到端自动驾驶的最新进展,并按基本原理、方法论和核心功能对研究进行归类,包括感知输入、主要与辅助输出、从模仿学习到强化学习的各类学习方法,以及模型评估技术。文章还详细讨论了可解释性和安全性方面的问题,评估了当前最先进水平,指出了挑战并探讨了未来的可能性。我们在 GitHub 上维护最新进展及其对应的开源实现,请参考 https://github.com/Pranav-chib/Recent-Advancements-in-End-to-End-Autonomous-Driving-using-Deep-Learning。

ECS – an Interactive Tool for Data Quality Assurance

  • paper_url: http://arxiv.org/abs/2307.04368
  • repo_url: None
  • paper_authors: Christian Sieberichs, Simon Geerkens, Alexander Braun, Thomas Waschulzik
  • for: Ensuring high-quality data for use in safety-critical systems
  • methods: Novel approach using mathematical basics and multiple examples to detect potentially harmful data points
  • results: Detection of data points with potentially harmful properties for use in safety-critical systems.
    Abstract With the increasing capabilities of machine learning systems and their potential use in safety-critical systems, ensuring high-quality data is becoming increasingly important. In this paper we present a novel approach for the assurance of data quality. For this purpose, the mathematical basics are first discussed and the approach is presented using multiple examples. This results in the detection of data points with potentially harmful properties for the use in safety-critical systems.
    摘要 随着机器学习系统能力的提高及其在安全关键系统中的潜在应用,保证高质量数据变得日益重要。在这篇论文中,我们提出了一种新的数据质量保障方法:首先讨论其数学基础,然后通过多个示例介绍该方法,从而检测出对安全关键系统而言可能具有有害特性的数据点。

RLTF: Reinforcement Learning from Unit Test Feedback

  • paper_url: http://arxiv.org/abs/2307.04349
  • repo_url: https://github.com/zyq-scut/rltf
  • paper_authors: Jiate Liu, Yiqin Zhu, Kaiwen Xiao, Qiang Fu, Xiao Han, Wei Yang, Deheng Ye
  • for: 提高大型自然语言模型(LLM)的代码生成性能,通过强化学习策略。
  • methods: 使用在线渐进学习策略,并在训练过程中接受多级别单元测试反馈。
  • results: 在 APPS 和 MBPP 基准上实现了最先进的性能。
    Abstract The goal of program synthesis, or code generation, is to generate executable code based on given descriptions. Recently, there has been an increasing number of studies employing reinforcement learning (RL) to improve the performance of large language models (LLMs) for code. However, these RL methods have only used offline frameworks, limiting their exploration of new sample spaces. Additionally, current approaches that utilize unit test signals are rather simple, not accounting for specific error locations within the code. To address these issues, we proposed RLTF, i.e., Reinforcement Learning from Unit Test Feedback, a novel online RL framework with unit test feedback of multi-granularity for refining code LLMs. Our approach generates data in real-time during training and simultaneously utilizes fine-grained feedback signals to guide the model towards producing higher-quality code. Extensive experiments show that RLTF achieves state-of-the-art performance on the APPS and the MBPP benchmarks. Our code can be found at: https://github.com/Zyq-scut/RLTF.
    摘要 程序合成(即代码生成)的目标是根据给定的描述生成可执行的代码。近来,越来越多的研究采用强化学习(RL)来提升大语言模型(LLM)的代码生成性能。然而,这些RL方法只使用了离线框架,限制了对新样本空间的探索;同时,现有利用单元测试信号的方法较为简单,没有考虑代码中具体的错误位置。为解决这些问题,我们提出了 RLTF,即基于单元测试反馈的强化学习——一种新颖的在线RL框架,利用多粒度的单元测试反馈来精炼代码LLM。我们的方法在训练过程中实时生成数据,并同时利用细粒度的反馈信号引导模型生成更高质量的代码。大量实验表明,RLTF 在 APPS 和 MBPP 基准上达到了最先进的性能。我们的代码见:https://github.com/Zyq-scut/RLTF。
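
下面是将单元测试反馈转化为 RL 标量奖励这一思路的极简 Python 示意:奖励由通过率加上与出错位置相关的惩罚构成。这一奖励设计(以及假设生成代码定义名为 solve 的函数)是示意性假设,并非 RLTF 的确切多粒度方案。

```python
import traceback

def run_tests(candidate_src: str, tests: list) -> float:
    """Execute generated code, then score it against (args, expected) pairs."""
    env = {}
    try:
        exec(candidate_src, env)       # compile the model's program
    except Exception:
        return -1.0                    # coarse signal: code does not even run
    solve = env.get("solve")           # hypothetical required entry point
    if solve is None:
        return -1.0
    passed, penalty = 0, 0.0
    for args, expected in tests:
        try:
            if solve(*args) == expected:
                passed += 1
        except Exception as e:
            # Finer-grained signal: inspect the traceback of the failing
            # call and penalize, so errors are not all treated alike.
            tb = traceback.extract_tb(e.__traceback__)
            penalty += 0.1 * len(tb)
    return passed / len(tests) - penalty

good = "def solve(x):\n    return x * 2\n"
bad = "def solve(x):\n    return x / 0\n"
tests = [((1,), 2), ((3,), 6)]
print(run_tests(good, tests))   # 1.0: all unit tests pass
print(run_tests(bad, tests))    # negative: every case raises an exception
```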

Injecting Logical Constraints into Neural Networks via Straight-Through Estimators

  • paper_url: http://arxiv.org/abs/2307.04347
  • repo_url: https://github.com/azreasoners/cl-ste
  • paper_authors: Zhun Yang, Joohyung Lee, Chiyoun Park
  • for: 本文旨在探讨把逻辑约束直接插入神经网络学习中的挑战。
  • methods: 我们提出了一种系统化的方法,将逻辑约束表示为损失函数,并通过直通估计器以梯度下降更新神经网络的权重,使二值化输出满足逻辑约束。
  • results: 我们的方法可以利用GPU和批量训练,比现有的神经符号方法具有更好的可扩展性。此外,该方法还适用于MLP、CNN和GNN等不同类型的神经网络,使它们能够直接从已知约束中学习,从而减少甚至不需要标注数据。
    Abstract Injecting discrete logical constraints into neural network learning is one of the main challenges in neuro-symbolic AI. We find that a straight-through-estimator, a method introduced to train binary neural networks, could effectively be applied to incorporate logical constraints into neural network learning. More specifically, we design a systematic way to represent discrete logical constraints as a loss function; minimizing this loss using gradient descent via a straight-through-estimator updates the neural network's weights in the direction that the binarized outputs satisfy the logical constraints. The experimental results show that by leveraging GPUs and batch training, this method scales significantly better than existing neuro-symbolic methods that require heavy symbolic computation for computing gradients. Also, we demonstrate that our method applies to different types of neural networks, such as MLP, CNN, and GNN, making them learn with no or fewer labeled data by learning directly from known constraints.
    摘要 向神经网络学习中注入离散逻辑约束是神经符号(neuro-symbolic)AI 的主要挑战之一。我们发现,直通估计器(straight-through estimator)这一用于训练二值神经网络的方法,可以有效地将逻辑约束融入神经网络学习。更具体地,我们设计了一种系统化的方法,将离散逻辑约束表示为损失函数;通过直通估计器以梯度下降最小化该损失,从而沿着使二值化输出满足逻辑约束的方向更新神经网络的权重。实验结果表明,借助 GPU 和批量训练,这种方法的可扩展性显著优于现有需要大量符号计算来求梯度的神经符号方法;并且我们的方法适用于不同类型的神经网络,如 MLP、CNN 和 GNN,使它们能够直接从已知约束中学习,从而减少甚至不需要标注数据。
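
下面是上述方法核心机制的极简 PyTorch 示意:前向用硬阈值将输出二值化,反向以直通估计器原样传递梯度,并把一个玩具逻辑约束("恰好一位为真")写成损失。网络结构与该约束均为示意性假设,并非论文的完整损失构造。

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return (x > 0.5).float()   # hard 0/1 outputs in the forward pass
    @staticmethod
    def backward(ctx, g):
        return g                   # straight-through: pass the gradient as-is

net = torch.nn.Sequential(
    torch.nn.Linear(4, 16), torch.nn.ReLU(),
    torch.nn.Linear(16, 3), torch.nn.Sigmoid())
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

x = torch.randn(32, 4)
for step in range(200):
    b = BinarizeSTE.apply(net(x))          # binarized outputs, shape (32, 3)
    # Toy constraint "exactly one bit set" encoded as a squared-error loss
    # on the binarized values; gradients flow back through the STE.
    loss = ((b.sum(dim=1) - 1.0) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

violations = int((BinarizeSTE.apply(net(x)).sum(dim=1) != 1).sum())
print("remaining constraint violations:", violations)
```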

Continual Learning as Computationally Constrained Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04345
  • repo_url: None
  • paper_authors: Saurabh Kumar, Henrik Marklund, Ashish Rao, Yifan Zhu, Hong Jun Jeon, Yueyang Liu, Benjamin Van Roy
  • for: 本文旨在解决人工智能领域中长期积累知识的问题,以提高人工智能能力的前iers。
  • methods: 本文使用了一种概念框架和工具集,以促进对 continual learning 的进一步研究。
  • results: 本文提出了一种概念框架和工具集,可以帮助研究人员更好地理解和解决 continual learning 问题。
    Abstract An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills over a long lifetime could advance the frontier of artificial intelligence capabilities. The design of such agents, which remains a long-standing challenge of artificial intelligence, is addressed by the subject of continual learning. This monograph clarifies and formalizes concepts of continual learning, introducing a framework and set of tools to stimulate further research.
    摘要 一个能够在漫长的生命周期中高效积累知识、发展出日益精深技能的智能体,有望推进人工智能能力的前沿。设计这样的智能体一直是人工智能领域的长期挑战,而持续学习(continual learning)正是研究这一问题的课题。这本专著厘清并形式化了持续学习的相关概念,提出了一个框架和一套工具,以促进进一步的研究。

Stroke Extraction of Chinese Character Based on Deep Structure Deformable Image Registration

  • paper_url: http://arxiv.org/abs/2307.04341
  • repo_url: https://github.com/mengli-l1/strokeextraction
  • paper_authors: Meng Li, Yahan Yu, Yi Yang, Guanghao Ren, Jian Wang
  • for: 提高中文字符识别与生成中的笔画抽取精度
  • methods: 使用深度学习与先验信息,包括图像配准、图像语义分割和高精度单笔画抽取
  • results: 比基线方法更有效,可以准确抽取中文字符的笔画
  • for: The paper aims to improve the accuracy of stroke extraction for Chinese characters, which is an important step in character recognition and generation.
  • methods: The proposed method uses deep learning and prior information to extract strokes. It consists of three parts: image registration-based stroke registration, image semantic segmentation-based stroke segmentation, and high-precision extraction of single strokes. The method uses a structure deformable image registration network to achieve structure-deformable transformation while maintaining the stable morphology of single strokes.
  • results: The experimental results show that the proposed method outperforms the baselines, demonstrating its effectiveness in stroke extraction for Chinese characters.
    Abstract Stroke extraction of Chinese characters plays an important role in the field of character recognition and generation. The most existing character stroke extraction methods focus on image morphological features. These methods usually lead to errors of cross strokes extraction and stroke matching due to rarely using stroke semantics and prior information. In this paper, we propose a deep learning-based character stroke extraction method that takes semantic features and prior information of strokes into consideration. This method consists of three parts: image registration-based stroke registration that establishes the rough registration of the reference strokes and the target as prior information; image semantic segmentation-based stroke segmentation that preliminarily separates target strokes into seven categories; and high-precision extraction of single strokes. In the stroke registration, we propose a structure deformable image registration network to achieve structure-deformable transformation while maintaining the stable morphology of single strokes for character images with complex structures. In order to verify the effectiveness of the method, we construct two datasets respectively for calligraphy characters and regular handwriting characters. The experimental results show that our method strongly outperforms the baselines. Code is available at https://github.com/MengLi-l1/StrokeExtraction.
    摘要 中文字符的笔画抽取在字符识别与生成领域中扮演着重要角色。现有的字符笔画抽取方法大多关注图像形态特征,由于很少利用笔画语义和先验信息,这些方法常常在交叉笔画抽取和笔画匹配上出错。本文提出一种基于深度学习、兼顾笔画语义特征与先验信息的字符笔画抽取方法。该方法由三部分组成:基于图像配准的笔画配准,将参考笔画与目标的粗略配准作为先验信息;基于图像语义分割的笔画分割,初步将目标笔画划分为七类;以及高精度的单笔画抽取。在笔画配准中,我们提出一种结构可变形的图像配准网络,在实现结构可变形变换的同时,保持复杂结构字符图像中单笔画形态的稳定。为验证方法的有效性,我们分别为书法字符和常规手写字符构建了两个数据集。实验结果表明,我们的方法显著优于基线方法。代码见 https://github.com/MengLi-l1/StrokeExtraction。

Miriam: Exploiting Elastic Kernels for Real-time Multi-DNN Inference on Edge GPU

  • paper_url: http://arxiv.org/abs/2307.04339
  • repo_url: None
  • paper_authors: Zhihe Zhao, Neiwen Ling, Nan Guan, Guoliang Xing
  • for: 多个深度学习神经网络(DNN)的同时运行,如自动驾驶和增强现实,需要考虑不同级别的实时性要求。然而,边缘GPU上进行多个DNN任务的协调仍然是一个未经研究的领域。
  • methods: Miriam是一个针对边缘GPU进行多个DNN任务协调的框架,包括两个主要组成部分:灵活kernel生成器和运行时动态kernel协调器。该框架支持混合承载重要任务。
  • results: 在两个边缘GPU平台上,与最先进的基线相比,Miriam 可将系统吞吐量提高92%,而关键任务仅产生不到10%的延迟开销。
    Abstract Many applications such as autonomous driving and augmented reality, require the concurrent running of multiple deep neural networks (DNN) that poses different levels of real-time performance requirements. However, coordinating multiple DNN tasks with varying levels of criticality on edge GPUs remains an area of limited study. Unlike server-level GPUs, edge GPUs are resource-limited and lack hardware-level resource management mechanisms for avoiding resource contention. Therefore, we propose Miriam, a contention-aware task coordination framework for multi-DNN inference on edge GPU. Miriam consolidates two main components, an elastic-kernel generator, and a runtime dynamic kernel coordinator, to support mixed critical DNN inference. To evaluate Miriam, we build a new DNN inference benchmark based on CUDA with diverse representative DNN workloads. Experiments on two edge GPU platforms show that Miriam can increase system throughput by 92% while only incurring less than 10\% latency overhead for critical tasks, compared to state of art baselines.
    摘要 许多应用,如自动驾驶和增强现实,需要同时运行多个具有不同实时性要求的深度神经网络(DNN)。然而,在边缘GPU上协调多个不同关键程度的DNN任务仍是研究有限的领域。与服务器级GPU不同,边缘GPU资源受限,且缺乏避免资源竞争的硬件级资源管理机制。因此,我们提出了 Miriam,一个面向边缘GPU多DNN推理的资源竞争感知任务协调框架。Miriam 整合两个主要组件:弹性内核生成器和运行时动态内核协调器,以支持混合关键程度的DNN推理。为评估 Miriam,我们基于CUDA构建了一个包含多种代表性DNN工作负载的推理基准。在两个边缘GPU平台上的实验表明,与最先进的基线相比,Miriam 可将系统吞吐量提高92%,而关键任务仅产生不到10%的延迟开销。

Source-Aware Embedding Training on Heterogeneous Information Networks

  • paper_url: http://arxiv.org/abs/2307.04336
  • repo_url: None
  • paper_authors: Tsai Hor Chan, Chi Ho Wong, Jiajun Shen, Guosheng Yin
  • for: Scalable Unsupervised Multi-Source Heterogeneous Information Network Embedding (SUMSHINE) is proposed to address the issue of distribution discrepancy among subgraphs in Heterogeneous Information Networks (HINs) from multiple sources.
  • methods: SUMSHINE uses a scalable unsupervised framework to align the embedding distributions among multiple sources of an HIN.
  • results: Experimental results on real-world datasets in a variety of downstream tasks validate the performance of SUMSHINE over the state-of-the-art heterogeneous information network embedding algorithms.
  • for: SUMSHINE 是为了解决多源 heterogeneous information network (HIN) 中各个子图的分布差异而提出的Scalable Unsupervised Multi-Source Heterogeneous Information Network Embedding方法。
  • methods: SUMSHINE 使用了一种可扩展的无监督框架,将多源 HIN 中各个子图的嵌入分布进行对齐。
  • results: 在实际世界数据集上,SUMSHINE 在多种下游任务中表现出色,比之前的多源 HIN 嵌入算法更高效。
    Abstract Heterogeneous information networks (HINs) have been extensively applied to real-world tasks, such as recommendation systems, social networks, and citation networks. While existing HIN representation learning methods can effectively learn the semantic and structural features in the network, little awareness was given to the distribution discrepancy of subgraphs within a single HIN. However, we find that ignoring such distribution discrepancy among subgraphs from multiple sources would hinder the effectiveness of graph embedding learning algorithms. This motivates us to propose SUMSHINE (Scalable Unsupervised Multi-Source Heterogeneous Information Network Embedding) -- a scalable unsupervised framework to align the embedding distributions among multiple sources of an HIN. Experimental results on real-world datasets in a variety of downstream tasks validate the performance of our method over the state-of-the-art heterogeneous information network embedding algorithms.
    摘要 异构信息网络(HINs)已被广泛应用于推荐系统、社交网络和引用网络等现实任务。现有的HIN表示学习方法能够有效学习网络中的语义和结构特征,但很少关注单个HIN内部各子图之间的分布差异。我们发现,忽略来自多个来源的子图之间的分布差异会妨碍图嵌入学习算法的效果。为此,我们提出 SUMSHINE(可扩展的无监督多源异构信息网络嵌入)——一个可扩展的无监督框架,用于对齐HIN中多个来源的嵌入分布。在真实数据集上多种下游任务的实验结果表明,我们的方法优于最先进的异构信息网络嵌入算法。
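
摘要并未给出具体的对齐目标,下面用一个 RBF 核 MMD 惩罚作为示意性替代,演示"对齐多来源嵌入分布"这一思想的极简 PyTorch 写法;共享嵌入表、来源划分和占位的任务损失均为假设。

```python
import torch

def rbf_mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Maximum mean discrepancy between two embedding batches."""
    def k(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

emb = torch.nn.Embedding(1000, 64)        # shared embedding table for all sources
opt = torch.optim.Adam(emb.parameters(), lr=1e-2)
src_a = torch.randint(0, 500, (128,))     # node ids originating from source A
src_b = torch.randint(500, 1000, (128,))  # node ids originating from source B

for step in range(100):
    za, zb = emb(src_a), emb(src_b)
    task_loss = torch.zeros(())           # placeholder for the usual HIN objective
    loss = task_loss + rbf_mmd(za, zb)    # align the two source distributions
    opt.zero_grad(); loss.backward(); opt.step()

print("final MMD:", float(rbf_mmd(emb(src_a), emb(src_b))))
```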

Enhancing Adversarial Robustness via Score-Based Optimization

  • paper_url: http://arxiv.org/abs/2307.04333
  • repo_url: None
  • paper_authors: Boya Zhang, Weijian Luo, Zhihua Zhang
  • for: 防止深度神经网络分类器受到攻击,提高人工智能的安全性。
  • methods: 使用基于分数的扩散模型进行防御:在测试时对对抗样本进行优化,在基于分数的先验引导下使其朝原始干净数据的方向移动。
  • results: 在 CIFAR10、CIFAR100 和 ImageNet 等多个数据集上,我们的方法在鲁棒性和推理速度上均优于现有的对抗防御方法。
    Abstract Adversarial attacks have the potential to mislead deep neural network classifiers by introducing slight perturbations. Developing algorithms that can mitigate the effects of these attacks is crucial for ensuring the safe use of artificial intelligence. Recent studies have suggested that score-based diffusion models are effective in adversarial defenses. However, existing diffusion-based defenses rely on the sequential simulation of the reversed stochastic differential equations of diffusion models, which are computationally inefficient and yield suboptimal results. In this paper, we introduce a novel adversarial defense scheme named ScoreOpt, which optimizes adversarial samples at test-time, towards original clean data in the direction guided by score-based priors. We conduct comprehensive experiments on multiple datasets, including CIFAR10, CIFAR100 and ImageNet. Our experimental results demonstrate that our approach outperforms existing adversarial defenses in terms of both robustness performance and inference speed.
    摘要 对抗攻击能够通过细微扰动误导深度神经网络分类器。为确保人工智能的安全使用,开发能缓解此类攻击影响的算法至关重要。最新研究表明,基于分数的扩散模型在对抗防御中是有效的;然而,现有基于扩散的防御依赖于对扩散模型反向随机微分方程的顺序模拟,计算效率低且结果欠佳。在本文中,我们提出一种名为 ScoreOpt 的新型对抗防御方案:在测试时,沿基于分数的先验所指引的方向,将对抗样本向原始干净数据优化。我们在 CIFAR10、CIFAR100 和 ImageNet 等多个数据集上进行了全面实验,结果表明,我们的方法在鲁棒性性能和推理速度两方面均优于现有的对抗防御方法。
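
下面是该测试时防御思想的极简 PyTorch 示意:反复沿分数模型给出的 s(x) ≈ ∇x log p(x) 方向微调(可能被扰动的)输入,再交给分类器。这里的分数网络是未经训练的占位实现,步长与迭代次数均为假设,并非论文的具体优化目标。

```python
import torch

score_net = torch.nn.Sequential(          # stand-in for a pretrained score model
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 32 * 32, 512), torch.nn.SiLU(),
    torch.nn.Linear(512, 3 * 32 * 32))

def purify(x_adv: torch.Tensor, steps: int = 20, lr: float = 0.05) -> torch.Tensor:
    """Gradient-style updates that move the input toward higher density
    under the score-based prior before classification."""
    x = x_adv.clone()
    for _ in range(steps):
        with torch.no_grad():
            s = score_net(x).view_as(x)   # estimated score, ~ grad log p(x)
            x = (x + lr * s).clamp(0, 1)  # step toward the clean-data manifold
    return x

x_adv = torch.rand(4, 3, 32, 32)          # batch of perturbed CIFAR-like images
x_clean_est = purify(x_adv)
print(x_clean_est.shape)                  # feed this to the frozen classifier
```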

Learning to Generate Equitable Text in Dialogue from Biased Training Data

  • paper_url: http://arxiv.org/abs/2307.04303
  • repo_url: https://github.com/anthonysicilia/equitable-dialogue-acl2023
  • paper_authors: Anthony Sicilia, Malihe Alikhani
  • for: 这个论文主要研究了对话系统决策过程中嵌入的公平原则的影响,以及这些原则如何影响用户参与度、满意度和任务完成度。
  • methods: 该论文使用计算学习理论来研究公平文本生成问题,提供了公平文本生成的正式定义,并证明了学习人类化和学习公平之间的正式联系。
  • results: 该论文通过实证测试验证了其理论,并考察了多种算法在生成公平文本方面的相对性能。
    Abstract The ingrained principles of fairness in a dialogue system's decision-making process and generated responses are crucial for user engagement, satisfaction, and task achievement. Absence of equitable and inclusive principles can hinder the formation of common ground, which in turn negatively impacts the overall performance of the system. For example, misusing pronouns in a user interaction may cause ambiguity about the intended subject. Yet, there is no comprehensive study of equitable text generation in dialogue. Aptly, in this work, we use theories of computational learning to study this problem. We provide formal definitions of equity in text generation, and further, prove formal connections between learning human-likeness and learning equity: algorithms for improving equity ultimately reduce to algorithms for improving human-likeness (on augmented data). With this insight, we also formulate reasonable conditions under which text generation algorithms can learn to generate equitable text without any modifications to the biased training data on which they learn. To exemplify our theory in practice, we look at a group of algorithms for the GuessWhat?! visual dialogue game and, using this example, test our theory empirically. Our theory accurately predicts relative-performance of multiple algorithms in generating equitable text as measured by both human and automated evaluation.
    摘要 对话系统决策过程及其生成回复中内嵌的公平原则,对用户参与度、满意度和任务完成度至关重要。缺乏公平和包容的原则会阻碍共同基础(common ground)的形成,进而损害系统的整体表现;例如,在用户交互中误用代词可能造成所指对象的歧义。然而,目前尚无关于对话中公平文本生成的系统研究。为此,本文运用计算学习理论来研究这一问题:我们给出了文本生成中公平性的形式化定义,并进一步证明了学习人类相似度与学习公平性之间的形式化联系——提升公平性的算法最终可以归结为(在增强数据上)提升人类相似度的算法。基于这一洞见,我们还给出了合理的条件,使文本生成算法无需修改其所学习的有偏训练数据,即可学会生成公平的文本。为了在实践中检验我们的理论,我们选取了 GuessWhat?! 视觉对话游戏中的一组算法进行实证测试。我们的理论准确预测了多种算法在生成公平文本方面的相对性能,这一结论同时得到了人工评估和自动评估的验证。

A Demand-Driven Perspective on Generative Audio AI

  • paper_url: http://arxiv.org/abs/2307.04292
  • repo_url: None
  • paper_authors: Sangshin Oh, Minsung Kang, Hyeongi Moon, Keunwoo Choi, Ben Sangbae Chon
  • for: 本研究的目的是了解音频研究的产业需求,以便成功地应用人工智能技术。
  • methods: 本研究对专业音频工程师开展问卷调查,以确定研究优先级并定义各类研究任务。
  • results: 调查显示当前音频质量和可控性的主要瓶颈是数据集的可用性,而且提供了一些解决这些问题的可能性。
    Abstract To achieve successful deployment of AI research, it is crucial to understand the demands of the industry. In this paper, we present the results of a survey conducted with professional audio engineers, in order to determine research priorities and define various research tasks. We also summarize the current challenges in audio quality and controllability based on the survey. Our analysis emphasizes that the availability of datasets is currently the main bottleneck for achieving high-quality audio generation. Finally, we suggest potential solutions for some revealed issues with empirical evidence.
    摘要 要成功部署人工智能研究,非常重要了解行业的需求。在这篇论文中,我们通过询问专业音频工程师,确定研究优先级和定义各种研究任务。我们还总结了现有数据的可用性是实现高质量音频生成的主要瓶颈。最后,我们提出了一些解决问题的可能性,并提供了实证证明。

Generalizing Graph ODE for Learning Complex System Dynamics across Environments

  • paper_url: http://arxiv.org/abs/2307.04287
  • repo_url: None
  • paper_authors: Zijie Huang, Yizhou Sun, Wei Wang
  • for: 学习多体系统动力学,特别是生物分子动力学等实际应用场景。
  • methods: 我们提出了一种机器学习框架,即广义图常微分方程(GG-ODE),用于跨环境学习连续多体系统动力学。我们使用图神经网络(GNNs)参数化神经常微分方程(ODE),以刻画多体系统中的连续交互。
  • results: 我们的模型可以准确预测系统动力学,特别是在长时程预测中;并且即使只有少量观测数据,也能很好地泛化到新系统。
    Abstract Learning multi-agent system dynamics has been extensively studied for various real-world applications, such as molecular dynamics in biology. Most of the existing models are built to learn single system dynamics from observed historical data and predict the future trajectory. In practice, however, we might observe multiple systems that are generated across different environments, which differ in latent exogenous factors such as temperature and gravity. One simple solution is to learn multiple environment-specific models, but it fails to exploit the potential commonalities among the dynamics across environments and offers poor prediction results where per-environment data is sparse or limited. Here, we present GG-ODE (Generalized Graph Ordinary Differential Equations), a machine learning framework for learning continuous multi-agent system dynamics across environments. Our model learns system dynamics using neural ordinary differential equations (ODE) parameterized by Graph Neural Networks (GNNs) to capture the continuous interaction among agents. We achieve the model generalization by assuming the dynamics across different environments are governed by common physics laws that can be captured via learning a shared ODE function. The distinct latent exogenous factors learned for each environment are incorporated into the ODE function to account for their differences. To improve model performance, we additionally design two regularization losses to (1) enforce the orthogonality between the learned initial states and exogenous factors via mutual information minimization; and (2) reduce the temporal variance of learned exogenous factors within the same system via contrastive learning. Experiments over various physical simulations show that our model can accurately predict system dynamics, especially in the long range, and can generalize well to new systems with few observations.
    摘要 学习多智能体系统动力学已在分子动力学等多种现实应用中得到广泛研究。现有模型大多从观测到的历史数据中学习单一系统的动力学并预测未来轨迹。然而在实践中,我们可能观测到在不同环境下生成的多个系统,它们在温度、重力等潜在外生因素上有所不同。一种简单的做法是为每个环境分别学习专门的模型,但这无法利用各环境动力学之间潜在的共性,并且在单个环境数据稀疏或有限时预测效果不佳。为此,我们提出 GG-ODE(广义图常微分方程),一个用于跨环境学习连续多智能体系统动力学的机器学习框架。我们的模型使用由图神经网络(GNNs)参数化的神经常微分方程(ODE)来学习系统动力学,以刻画智能体之间的连续交互。我们假设不同环境下的动力学受共同的物理规律支配,并通过学习一个共享的ODE函数来捕捉这些规律,从而实现模型的泛化;为每个环境学习的独特潜在外生因素则被纳入ODE函数,以刻画环境间的差异。为提升模型性能,我们还设计了两种正则化损失:(1)通过互信息最小化使学习到的初始状态与外生因素相互正交;(2)通过对比学习降低同一系统内所学外生因素的时间方差。在多种物理仿真上的实验表明,我们的模型能够准确预测系统动力学,尤其是长时程预测,并能在仅有少量观测的新系统上良好泛化。
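
下面是 GG-ODE 核心思想的极简 PyTorch 示意:一个共享的导数函数(小型消息传递网络)驱动所有环境,每个环境学习一个潜在向量并馈入该函数。欧拉积分、全连接交互图及所有维度设置均为示意性假设,并非论文的完整模型。

```python
import torch

N, D, E = 5, 8, 4                    # agents, state dim, env-latent dim
adj = torch.ones(N, N) / N           # toy fully-connected interaction graph

class SharedODEFunc(torch.nn.Module):
    """One derivative function shared by every environment."""
    def __init__(self):
        super().__init__()
        self.msg = torch.nn.Linear(D, D)
        self.upd = torch.nn.Linear(D + D + E, D)
    def forward(self, h, env_z):
        agg = adj @ self.msg(h)                      # aggregate neighbor messages
        z = env_z.expand(N, E)                       # broadcast env latent to agents
        return self.upd(torch.cat([h, agg, z], -1))  # dh/dt

f = SharedODEFunc()
env_latents = torch.nn.ParameterDict({               # learned exogenous factors
    "earth": torch.nn.Parameter(torch.zeros(1, E)),
    "moon": torch.nn.Parameter(torch.zeros(1, E))})

def rollout(h0, env, steps=50, dt=0.02):
    """Euler-integrate the shared ODE under one environment's latent."""
    h, traj = h0, [h0]
    for _ in range(steps):
        h = h + dt * f(h, env_latents[env])
        traj.append(h)
    return torch.stack(traj)

h0 = torch.randn(N, D)
print(rollout(h0, "earth").shape, rollout(h0, "moon").shape)
```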

Cloud Render Farm Services Discovery Using NLP And Ontology Based Knowledge Graph

  • paper_url: http://arxiv.org/abs/2307.13604
  • repo_url: None
  • paper_authors: Ruby Annette, Aisha Banu, Sharon Priya, Subash Chandran
  • for: 该研究旨在提供一个基于 ontology 的云 render farm 服务发现引擎,以便更好地找到符合项目需求的云 render farm 服务。
  • methods: 该研究使用了知识基础 reasoning 算法,包括概念相似度理解、等价理解和数值相似度理解,来确定云 render farm 服务之间的相似性。
  • results: 研究发现,使用 ontology 基于的服务发现引擎可以更好地找到符合项目需求的云 render farm 服务,并且比不使用 ontology 和使用通用搜索引擎更高效。
    Abstract Cloud render farm services are the animation domain specific cloud services Platform-as-a-Service (PaaS) type of cloud services that provides a complete platform to render the animation files. However, identifying the render farm services that is cost effective and also matches the functional requirements that changes for almost every project like the animation software, plug-ins required etc., is a challenge. This research work proposes an ontology-based service discovery engine named RenderSelect for the cloud render farm services. The cloud render farm ontology semantically defines the relationship among the cloud render farm services. The knowledge-based reasoning algorithms namely, the Concept similarity reasoning, Equivalent reasoning and the Numerical similarity reasoning have been applied to determine the similarity among the cloud services. The service discovery engine was evaluated for finding the services under three different scenarios namely a) with help of the ontology, b) without the help of the ontology and c) using a common search engine on the internet. The results show that the proposed service discovery engine which is specifically designed for the cloud render farm services using the ontology performs significantly better than the other two.
    摘要 云渲染农场服务是面向动画领域的平台即服务(PaaS)型云服务,提供渲染动画文件所需的完整平台。然而,要找到既具有成本效益、又能满足几乎每个项目都不尽相同的功能需求(如所用的动画软件、所需插件等)的渲染农场服务,是一项挑战。本研究为云渲染农场服务提出了一个基于本体的服务发现引擎,名为 RenderSelect。云渲染农场本体从语义上定义了云渲染农场服务之间的关系。研究应用了基于知识的推理算法,即概念相似度推理、等价推理和数值相似度推理,来判定云服务之间的相似性。服务发现引擎在三种不同场景下进行了评估:a)借助本体;b)不借助本体;c)使用互联网上的通用搜索引擎。结果表明,专为云渲染农场服务设计、基于本体的服务发现引擎显著优于其他两种方式。

RidgeBase: A Cross-Sensor Multi-Finger Contactless Fingerprint Dataset

  • paper_url: http://arxiv.org/abs/2307.05563
  • repo_url: https://github.com/bhavinjawade/RidgeBase_Fingerprint_Camera_App
  • paper_authors: Bhavin Jawade, Deen Dayal Mohan, Srirangaraj Setlur, Nalini Ratha, Venu Govindaraju
  • for: 本研究旨在提高无接触指纹识别系统的实用性和可靠性,并且提供一个大规模的实际数据集来推动此类研究。
  • methods: 本研究使用了智能手机摄像头捕获的无接触指纹图像,并提出了一种基于多图像匹配的集成匹配协议,以提高无接触指纹识别的精度和可靠性。
  • results: 研究者在 RidgeBase 数据集上报告了基于 COTS 指纹匹配器(Verifinger)和深度 CNN 方法在不同协议下的定性与定量基线结果;所提出的集合式匹配协议有助于应对无接触指纹在对焦、极性和手指角度上的变化。
    Abstract Contactless fingerprint matching using smartphone cameras can alleviate major challenges of traditional fingerprint systems including hygienic acquisition, portability and presentation attacks. However, development of practical and robust contactless fingerprint matching techniques is constrained by the limited availability of large scale real-world datasets. To motivate further advances in contactless fingerprint matching across sensors, we introduce the RidgeBase benchmark dataset. RidgeBase consists of more than 15,000 contactless and contact-based fingerprint image pairs acquired from 88 individuals under different background and lighting conditions using two smartphone cameras and one flatbed contact sensor. Unlike existing datasets, RidgeBase is designed to promote research under different matching scenarios that include Single Finger Matching and Multi-Finger Matching for both contactless- to-contactless (CL2CL) and contact-to-contactless (C2CL) verification and identification. Furthermore, due to the high intra-sample variance in contactless fingerprints belonging to the same finger, we propose a set-based matching protocol inspired by the advances in facial recognition datasets. This protocol is specifically designed for pragmatic contactless fingerprint matching that can account for variances in focus, polarity and finger-angles. We report qualitative and quantitative baseline results for different protocols using a COTS fingerprint matcher (Verifinger) and a Deep CNN based approach on the RidgeBase dataset. The dataset can be downloaded here: https://www.buffalo.edu/cubs/research/datasets/ridgebase-benchmark-dataset.html
    摘要 使用智能手机摄像头进行无接触指纹匹配,可以缓解传统指纹系统的主要挑战,包括采集卫生、便携性和呈现攻击。然而,大规模真实世界数据集的匮乏制约了实用且鲁棒的无接触指纹匹配技术的发展。为了推动跨传感器无接触指纹匹配的进一步发展,我们提出了 RidgeBase 基准数据集。RidgeBase 包含超过15,000对无接触与接触式指纹图像,由两部智能手机摄像头和一台平板式接触传感器,在不同背景与照明条件下从88名受试者处采集。与现有数据集不同,RidgeBase 旨在支持多种匹配场景下的研究,包括无接触对无接触(CL2CL)和接触对无接触(C2CL)的验证与识别中的单指匹配和多指匹配。此外,由于同一手指的无接触指纹图像存在较高的类内差异,我们受人脸识别数据集进展的启发,提出了一种基于集合的匹配协议。该协议专为实用的无接触指纹匹配设计,能够兼顾对焦、极性和手指角度的变化。我们报告了在 RidgeBase 数据集上使用商用(COTS)指纹匹配器(Verifinger)和基于深度CNN的方法在不同协议下的定性与定量基线结果。数据集可在此下载:https://www.buffalo.edu/cubs/research/datasets/ridgebase-benchmark-dataset.html。
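
下面是上文集合式匹配协议动机的极简 NumPy 示意:每根手指由一组采集嵌入表示,匹配分数聚合成对相似度,从而容忍集合内对焦、角度等变化。top-k 均值融合规则及随机嵌入均为示意性假设。

```python
import numpy as np

def set_match_score(probe: np.ndarray, gallery: np.ndarray, k: int = 3) -> float:
    """probe, gallery: (n_captures, dim) L2-normalized embedding sets."""
    sims = probe @ gallery.T           # all pairwise cosine similarities
    topk = np.sort(sims.ravel())[-k:]  # keep only the k best capture pairs
    return float(topk.mean())

rng = np.random.default_rng(0)
def embed_set(n: int = 4, dim: int = 128) -> np.ndarray:
    e = rng.normal(size=(n, dim))
    return e / np.linalg.norm(e, axis=1, keepdims=True)

enrolled = embed_set()                 # gallery captures of one finger
genuine = enrolled + 0.1 * rng.normal(size=enrolled.shape)  # same finger, new session
genuine /= np.linalg.norm(genuine, axis=1, keepdims=True)
impostor = embed_set()                 # a different finger

print("genuine :", set_match_score(genuine, enrolled))   # close to 1
print("impostor:", set_match_score(impostor, enrolled))  # near 0
```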

The Future of Fundamental Science Led by Generative Closed-Loop Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2307.07522
  • repo_url: None
  • paper_authors: Hector Zenil, Jesper Tegnér, Felipe S. Abrahão, Alexander Lavin, Vipin Kumar, Jeremy G. Frey, Adrian Weller, Larisa Soldatova, Alan R. Bundy, Nicholas R. Jennings, Koichi Takahashi, Lawrence Hunter, Saso Dzeroski, Andrew Briggs, Frederick D. Gregory, Carla P. Gomes, Christopher K. I. Williams, Jon Rowe, James Evans, Hiroaki Kitano, Joshua B. Tenenbaum, Ross King
  • for: The paper explores the potential of AI-driven automation in scientific discovery, particularly in fundamental deep science, and aims to mitigate current problems in the scientific process such as replication of findings and systematic production of data.
  • methods: The paper proposes an AI-driven, automated, closed-loop approach to scientific discovery, including self-driven hypothesis generation and open-ended autonomous exploration of the hypothesis space.
  • results: The paper holds the promise to unleash AI’s potential for searching and discovering the fundamental structure of our world beyond what human scientists have been able to achieve, and could open doors for technological innovation to tackle some of the greatest challenges facing humanity today.
    Abstract Recent advances in machine learning and AI, including Generative AI and LLMs, are disrupting technological innovation, product development, and society as a whole. AI's contribution to technology can come from multiple approaches that require access to large training data sets and clear performance evaluation criteria, ranging from pattern recognition and classification to generative models. Yet, AI has contributed less to fundamental science in part because large data sets of high-quality data for scientific practice and model discovery are more difficult to access. Generative AI, in general, and Large Language Models in particular, may represent an opportunity to augment and accelerate the scientific discovery of fundamental deep science with quantitative models. Here we explore and investigate aspects of an AI-driven, automated, closed-loop approach to scientific discovery, including self-driven hypothesis generation and open-ended autonomous exploration of the hypothesis space. Integrating AI-driven automation into the practice of science would mitigate current problems, including the replication of findings, systematic production of data, and ultimately democratisation of the scientific process. Realising these possibilities requires a vision for augmented AI coupled with a diversity of AI approaches able to deal with fundamental aspects of causality analysis and model discovery while enabling unbiased search across the space of putative explanations. These advances hold the promise to unleash AI's potential for searching and discovering the fundamental structure of our world beyond what human scientists have been able to achieve. Such a vision would push the boundaries of new fundamental science rather than automatize current workflows and instead open doors for technological innovation to tackle some of the greatest challenges facing humanity today.
    摘要 机器学习与人工智能的最新进展,包括生成式 AI 和大语言模型(LLM),正在颠覆技术创新、产品开发乃至整个社会。AI 对技术的贡献可以来自多种途径,从模式识别与分类到生成式模型,这些途径都需要大规模训练数据和明确的性能评估标准。然而,AI 对基础科学的贡献相对较少,部分原因在于适用于科学实践和模型发现的大规模高质量数据更难获取。总体而言,生成式 AI,尤其是大语言模型,有望以定量模型来增强并加速基础深层科学的科学发现。本文探讨并考察了一种由 AI 驱动的、自动化的闭环科学发现方法的若干方面,包括自驱动的假设生成以及对假设空间的开放式自主探索。将 AI 驱动的自动化融入科学实践,可以缓解当前的一些问题,包括研究结果的可重复性、数据的系统化生产,并最终实现科学过程的民主化。实现这些可能性需要一种增强型 AI 的愿景,并结合多样的 AI 方法,以处理因果分析和模型发现的基本问题,同时支持在候选解释空间中进行无偏搜索。这些进展有望释放 AI 的潜力,去搜索和发现超越人类科学家既有成就的世界基本结构。这样的愿景将推动新的基础科学突破边界,而不是仅仅自动化现有的工作流程,并为技术创新打开大门,以应对当今人类面临的一些最重大挑战。

ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey

  • paper_url: http://arxiv.org/abs/2307.04251
  • repo_url: https://github.com/iamgmujtaba/scholar_search
  • paper_authors: Salman Mohamadi, Ghulam Mujtaba, Ngan Le, Gianfranco Doretto, Donald A. Adjeroh
  • for: 这篇论文的主要目标是提供一份简洁的综述,概括 ChatGPT 的研究进展和演化。
  • methods: 本论文使用了两种视角来研究 ChatGPT:玻璃盒视角(glass box view)和黑盒视角(black box view)。玻璃盒视角旨在理解技术的内部结构和工作机制,而黑盒视角则将技术视为一个复杂系统,研究其输入、输出和影响。
  • results: 本论文提供了一个全面的概述,涵盖了 ChatGPT 的组件和基础元素,以及其应用、影响和意义。文章还提供了关于 LLM 和 GAI 的基础文献,评估了现有和缺失的研究方向,并探讨了 ChatGPT 在不同领域的广泛应用和重要问题。
    Abstract ChatGPT is a large language model (LLM) created by OpenAI that has been carefully trained on a large amount of data. It has revolutionized the field of natural language processing (NLP) and has pushed the boundaries of LLM capabilities. ChatGPT has played a pivotal role in enabling widespread public interaction with generative artificial intelligence (GAI) on a large scale. It has also sparked research interest in developing similar technologies and investigating their applications and implications. In this paper, our primary goal is to provide a concise survey on the current lines of research on ChatGPT and its evolution. We considered both the glass box and black box views of ChatGPT, encompassing the components and foundational elements of the technology, as well as its applications, impacts, and implications. The glass box approach focuses on understanding the inner workings of the technology, and the black box approach embraces it as a complex system, and thus examines its inputs, outputs, and effects. This paves the way for a comprehensive exploration of the technology and provides a road map for further research and experimentation. We also lay out essential foundational literature on LLMs and GAI in general and their connection with ChatGPT. This overview sheds light on existing and missing research lines in the emerging field of LLMs, benefiting both public users and developers. Furthermore, the paper delves into the broad spectrum of applications and significant concerns in fields such as education, research, healthcare, finance, etc.
    摘要 ChatGPT 是由 OpenAI 创建、在大量数据上精心训练的大语言模型(LLM)。它革新了自然语言处理(NLP)领域,拓展了LLM能力的边界,并在推动公众大规模接触生成式人工智能(GAI)方面发挥了关键作用,同时激发了研究者开发类似技术、考察其应用与影响的兴趣。在这篇论文中,我们的主要目标是对 ChatGPT 的当前研究方向及其演化做一份简洁的综述。我们兼顾玻璃盒与黑盒两种视角,涵盖该技术的组件和基础元素,及其应用、影响和意义:玻璃盒方法关注技术的内部工作机制,黑盒方法则将其视为一个复杂系统,考察其输入、输出和效果。这为全面探讨该技术铺平了道路,并为进一步的研究和实验提供了路线图。我们还梳理了关于LLM和GAI的重要基础文献及其与 ChatGPT 的联系。本综述揭示了LLM这一新兴领域中现有与缺失的研究方向,对公众用户和开发者均有裨益。此外,论文还深入探讨了其在教育、科研、医疗、金融等领域的广泛应用和重要问题。

A Novel Pipeline for Improving Optical Character Recognition through Post-processing Using Natural Language Processing

  • paper_url: http://arxiv.org/abs/2307.04245
  • repo_url: None
  • paper_authors: Aishik Rakshit, Samyak Mehta, Anirban Dasgupta
  • for: 这篇论文是用于提高 Optical Character Recognition (OCR) 技术的精度,特别是对于手写文本和印刷文本的应用。
  • methods: 该论文使用了 OCR 技术,并将其与自然语言处理 (NLP) 技术结合,以提高 OCR 的精度。
  • results: 该论文提出了一个端到端管道,首先使用 OCR 技术将手写或印刷文本转化为电子文本,然后使用 NLP 技术进行后处理,以提高 OCR 的精度。
    Abstract Optical Character Recognition (OCR) technology finds applications in digitizing books and unstructured documents, along with applications in other domains such as mobility statistics, law enforcement, traffic, security systems, etc. The state-of-the-art methods work well with the OCR with printed text on license plates, shop names, etc. However, applications such as printed textbooks and handwritten texts have limited accuracy with existing techniques. The reason may be attributed to similar-looking characters and variations in handwritten characters. Since these issues are challenging to address with OCR technologies exclusively, we propose a post-processing approach using Natural Language Processing (NLP) tools. This work presents an end-to-end pipeline that first performs OCR on the handwritten or printed text and then improves its accuracy using NLP.
    摘要 光学字符识别(OCR)技术应用于书籍和非结构化文档的数字化,同时也应用于交通统计、执法、交通、安全系统等其他领域。现有最先进的方法在车牌、店名等印刷文本的OCR上表现良好;然而,印刷教科书和手写文本等应用在现有技术下的准确率有限,原因可归结为形近字符以及手写字符的变化。由于这些问题难以仅靠OCR技术解决,我们提出了一种基于自然语言处理(NLP)工具的后处理方法。本文介绍了一个端到端管道:首先对手写或印刷文本执行OCR,然后利用NLP提高其准确率。
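
下面是上述两阶段管道的极简 Python 示意:先做 OCR,再用语言层面的步骤纠正疑似误识。这里以 pytesseract 作为示例 OCR 引擎,NLP 阶段用标准库 difflib 的词表查找来示意;词表、阈值均为假设,论文中实际的后处理模型应替换这一步。

```python
from difflib import get_close_matches

import pytesseract
from PIL import Image

VOCAB = {"the", "quick", "brown", "fox", "jumps", "over", "lazy", "dog"}

def ocr(image_path: str) -> str:
    """Stage 1: raw OCR output from the image."""
    return pytesseract.image_to_string(Image.open(image_path))

def correct(text: str) -> str:
    """Stage 2: replace out-of-vocabulary words with their closest match."""
    fixed = []
    for w in text.split():
        core = w.strip(".,;:!?").lower()
        if core and core not in VOCAB:
            cand = get_close_matches(core, VOCAB, n=1, cutoff=0.6)
            if cand:
                w = w.replace(core, cand[0])
        fixed.append(w)
    return " ".join(fixed)

raw = "the qu1ck brown f0x jumps ovev the lazy dog."  # typical OCR noise
print(correct(raw))   # -> "the quick brown fox jumps over the lazy dog."
```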

TransPose: A Transformer-based 6D Object Pose Estimation Network with Depth Refinement

  • paper_url: http://arxiv.org/abs/2307.05561
  • repo_url: None
  • paper_authors: Mahmoud Abdulsalam, Nabil Aouf
  • for: 本研究旨在提高视觉基于6DoF姿态估计的精度,以便在自动化操作中提高机器人抓取应用的精度。
  • methods: 本文提出了一种基于 Transformer 的6D姿态估计方法,包括一个新的深度估计网络和一个带深度精修模块的改进检测网络。
  • results: 对比其他文献方法,本研究的结果表明,该方法在果园抓取应用中的精度明显高于其他方法。
    Abstract As demand for robotics manipulation application increases, accurate vision-based 6D pose estimation becomes essential for autonomous operations. Convolutional Neural Networks (CNNs) based approaches for pose estimation have been previously introduced. However, the quest for better performance still persists especially for accurate robotics manipulation. This quest extends to the Agri-robotics domain. In this paper, we propose TransPose, an improved Transformer-based 6D pose estimation with a depth refinement module. The architecture takes in only an RGB image as input with no additional supplementing modalities such as depth or thermal images. The architecture encompasses an innovative lighter depth estimation network that estimates depth from an RGB image using feature pyramid with an up-sampling method. A transformer-based detection network with additional prediction heads is proposed to directly regress the object's centre and predict the 6D pose of the target. A novel depth refinement module is then used alongside the predicted centers, 6D poses and depth patches to refine the accuracy of the estimated 6D pose. We extensively compared our results with other state-of-the-art methods and analysed our results for fruit-picking applications. The results we achieved show that our proposed technique outperforms the other methods available in the literature.
    摘要 随着机器人操作应用需求的增加,基于视觉的精确6D姿态估计对自主作业变得至关重要。此前已有基于卷积神经网络(CNN)的姿态估计方法被提出,但对更高性能的追求仍在继续,尤其是在精确的机器人操作方面,这一追求也延伸到了农业机器人领域。本文提出 TransPose,一种改进的、带深度精修模块的基于 Transformer 的6D姿态估计方法。该架构仅以 RGB 图像作为输入,不需要深度或热成像等额外模态。架构包含一个创新的轻量级深度估计网络,利用特征金字塔和上采样方法从 RGB 图像估计深度;同时提出一个带有附加预测头的基于 Transformer 的检测网络,直接回归目标对象的中心并预测其6D姿态。随后,一个新颖的深度精修模块结合预测的中心、6D姿态和深度图块,进一步提升所估计6D姿态的精度。我们将结果与其他最先进的方法进行了广泛比较,并针对果实采摘应用进行了分析。结果表明,我们提出的技术优于文献中现有的方法。

Real-time Human Detection in Fire Scenarios using Infrared and Thermal Imaging Fusion

  • paper_url: http://arxiv.org/abs/2307.04223
  • repo_url: None
  • paper_authors: Truong-Dong Do, Nghe-Nhan Truong, My-Ha Le
  • for: 在低能见度场景中使用基于视觉的人体检测系统,以提高人员搜救效率和生存机会。
  • methods: 利用多个摄像头捕获图像,通过热成像与红外成像的融合,提取有用特征进行人体检测。
  • results: 实验结果显示,所提方法能够以合理的速度处理,并达到95%的 mAP@0.5。
    Abstract Fire is considered one of the most serious threats to human lives which results in a high probability of fatalities. Those severe consequences stem from the heavy smoke emitted from a fire that mostly restricts the visibility of escaping victims and rescuing squad. In such hazardous circumstances, the use of a vision-based human detection system is able to improve the ability to save more lives. To this end, a thermal and infrared imaging fusion strategy based on multiple cameras for human detection in low-visibility scenarios caused by smoke is proposed in this paper. By processing with multiple cameras, vital information can be gathered to generate more useful features for human detection. Firstly, the cameras are calibrated using a Light Heating Chessboard. Afterward, the features extracted from the input images are merged prior to being passed through a lightweight deep neural network to perform the human detection task. The experiments conducted on an NVIDIA Jetson Nano computer demonstrated that the proposed method can process with reasonable speed and can achieve favorable performance with a mAP@0.5 of 95%.
    摘要 火灾被认为是对人类生命最严重的威胁之一,往往造成很高的死亡概率。这些严重后果源于火灾排放的浓烟,它极大地阻碍了逃生人员和救援队伍的视线。在这种危险情况下,使用基于视觉的人体检测系统能够提高挽救更多生命的能力。为此,本文提出一种基于多摄像头的热成像与红外成像融合策略,用于烟雾造成的低能见度场景下的人体检测。通过处理多个摄像头的图像,可以收集关键信息,生成更有用的人体检测特征。首先,使用光热棋盘(Light Heating Chessboard)对摄像头进行标定;随后,将从输入图像中提取的特征融合,再送入一个轻量级深度神经网络执行人体检测任务。在 NVIDIA Jetson Nano 计算机上进行的实验表明,所提方法能够以合理的速度处理,并取得了 mAP@0.5 达95%的良好表现。

LakeBench: Benchmarks for Data Discovery over Data Lakes

  • paper_url: http://arxiv.org/abs/2307.04217
  • repo_url: None
  • paper_authors: Kavitha Srinivas, Julian Dolby, Ibrahim Abdelaziz, Oktie Hassanzadeh, Harsha Kokel, Aamod Khatiwada, Tejaswini Pedapati, Subhajit Chaudhury, Horst Samulowitz
  • for: 本研究旨在提供数据湖中数据发现的基准测试集,用于评估不同的表格基础模型在数据发现任务中的表现。
  • methods: 本研究利用来自政府数据平台 CKAN、Socrata 和欧洲中央银行等多种数据源的表格构建了多个基准任务,并对4种公开可用的表格基础模型进行了对比评估。
  • results: 现有的表格基础模型均未针对这些数据发现任务进行过训练,其表现存在明显的改进空间。结果表明,建立此类基准可能有助于社区构建适用于数据湖中数据发现的表格模型。
    Abstract Within enterprises, there is a growing need to intelligently navigate data lakes, specifically focusing on data discovery. Of particular importance to enterprises is the ability to find related tables in data repositories. These tables can be unionable, joinable, or subsets of each other. There is a dearth of benchmarks for these tasks in the public domain, with related work targeting private datasets. In LakeBench, we develop multiple benchmarks for these tasks by using the tables that are drawn from a diverse set of data sources such as government data from CKAN, Socrata, and the European Central Bank. We compare the performance of 4 publicly available tabular foundational models on these tasks. None of the existing models had been trained on the data discovery tasks that we developed for this benchmark; not surprisingly, their performance shows significant room for improvement. The results suggest that the establishment of such benchmarks may be useful to the community to build tabular models usable for data discovery in data lakes.
    摘要 在企业中,智能地导航数据湖、尤其是进行数据发现的需求日益增长。对企业而言,在数据仓储中找到相关表格的能力尤为重要:这些表格之间可能是可合并(unionable)、可连接(joinable)或互为子集的关系。公共领域中缺乏针对这些任务的基准,相关工作多针对私有数据集。在 LakeBench 中,我们利用来自政府数据平台 CKAN、Socrata 和欧洲中央银行等多种数据源的表格,为这些任务构建了多个基准,并比较了4种公开可用的表格基础模型在这些任务上的表现。这些模型都未曾在我们为该基准开发的数据发现任务上训练过,其表现也毫不意外地存在明显的提升空间。结果表明,建立此类基准可能有助于社区构建适用于数据湖中数据发现的表格模型。
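
下面是此类数据发现基准所评估任务的极简 pandas 示意:用列值的 Jaccard 重叠估计两表是否可连接,用列名重叠粗略估计可合并性。打分规则与玩具表格均为示意性假设,并非 LakeBench 的官方评测代码。

```python
import pandas as pd

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def best_join_candidate(t1: pd.DataFrame, t2: pd.DataFrame):
    """Return the column pair with the highest value overlap."""
    best = (None, None, 0.0)
    for c1 in t1.columns:
        for c2 in t2.columns:
            s = jaccard(set(t1[c1].astype(str)), set(t2[c2].astype(str)))
            if s > best[2]:
                best = (c1, c2, s)
    return best

def unionability(t1: pd.DataFrame, t2: pd.DataFrame) -> float:
    """Schema-level overlap of column names as a crude unionability score."""
    return jaccard(set(t1.columns), set(t2.columns))

a = pd.DataFrame({"country": ["US", "FR", "DE"], "gdp": [1, 2, 3]})
b = pd.DataFrame({"nation": ["US", "FR", "JP"], "population": [4, 5, 6]})
print("join:", best_join_candidate(a, b))   # ('country', 'nation', 0.5)
print("union:", unionability(a, b))          # 0.0 — column names don't overlap
```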

Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data

  • paper_url: http://arxiv.org/abs/2307.04216
  • repo_url: https://github.com/hieutrungle/data-slim
  • paper_authors: Hieu Le, Hernan Santos, Jian Tao
  • for: 大规模科学数据压缩
  • methods: 使用自适应神经网络进行压缩
  • results: 实现了高压缩率并保持高重建质量
  • for: The paper is focused on compressing large-scale scientific data, which is a growing challenge in many domains. The authors propose a neural network-based approach to address this issue.
  • methods: The proposed method uses an Autoencoder-based neural network to compress the data. The network is trained to reconstruct the original data from the compressed representation, allowing for efficient compression and reconstruction.
  • results: The authors test their method on several benchmark data sets and achieve a high compression ratio (140) without compromising the reconstruction quality. They also apply the method to a large-scale high-resolution climate modeling data set and achieve a compression ratio of 200 with negligible reconstruction error.
    Abstract Lossy compression has become an important technique to reduce data size in many domains. This type of compression is especially valuable for large-scale scientific data, whose size ranges up to several petabytes. Although Autoencoder-based models have been successfully leveraged to compress images and videos, such neural networks have not widely gained attention in the scientific data domain. Our work presents a neural network that not only significantly compresses large-scale scientific data but also maintains high reconstruction quality. The proposed model is tested with scientific benchmark data available publicly and applied to a large-scale high-resolution climate modeling data set. Our model achieves a compression ratio of 140 on several benchmark data sets without compromising the reconstruction quality. Simulation data from the High-Resolution Community Earth System Model (CESM) Version 1.3 over 500 years are also being compressed with a compression ratio of 200 while the reconstruction error is negligible for scientific analysis.
    摘要 丢弃性压缩已成为许多领域中减少数据大小的重要技术。特别是在大规模科学数据领域,数据的大小可以达到几百亿字节级别。虽然基于自适应神经网络的模型已成功应用于图像和视频压缩,但这些神经网络在科学数据领域并未受到广泛关注。我们的工作推出了一种可以高效压缩大规模科学数据,同时保持高重建质量的神经网络模型。我们的模型在公共可用的科学数据 benchmark 上进行测试,并应用于大规模高分辨率气候模拟数据集。我们的模型在多个 benchmark 数据集上实现了压缩率为140,而且重建质量仍然保持在高水平。同时,我们还对高分辨率地球系统模型(CESM)版本1.3的500年大量数据进行压缩,压缩率达200,而且重建错误几乎可以忽略不计。