cs.AI - 2023-09-12

Quantum Data Center: Perspectives

  • paper_url: http://arxiv.org/abs/2309.06641
  • repo_url: None
  • paper_authors: Junyu Liu, Liang Jiang
  • for: quantum computing, communication, and sensing
  • methods: combining Quantum Random Access Memory (QRAM) and quantum networks
  • results: significant benefits in efficiency, security, and precision for customers, with potential scientific and business opportunities in the machine learning and big data industries.
    Abstract A quantum version of data centers might be significant in the quantum era. In this paper, we introduce Quantum Data Center (QDC), a quantum version of existing classical data centers, with a specific emphasis on combining Quantum Random Access Memory (QRAM) and quantum networks. We argue that QDC will provide significant benefits to customers in terms of efficiency, security, and precision, and will be helpful for quantum computing, communication, and sensing. We investigate potential scientific and business opportunities along this novel research direction through hardware realization and possible specific applications. We show the possible impacts of QDCs in business and science, especially the machine learning and big data industries.

The Relational Bottleneck as an Inductive Bias for Efficient Abstraction

  • paper_url: http://arxiv.org/abs/2309.06629
  • repo_url: None
  • paper_authors: Taylor W. Webb, Steven M. Frankland, Awni Altabaa, Kamesh Krishnamurthy, Declan Campbell, Jacob Russin, Randall O’Reilly, John Lafferty, Jonathan D. Cohen
  • for: This work aims to explain how abstract concepts are acquired from limited experience.
  • methods: It highlights a recently emerging line of work that exploits an inductive bias, termed the relational bottleneck, to induce abstractions from data.
  • results: Models employing this approach induce abstractions in a data-efficient manner and are promising candidate models for how abstract concepts are acquired in the human mind and brain.
    Abstract A central challenge for cognitive science is to explain how abstract concepts are acquired from limited experience. This effort has often been framed in terms of a dichotomy between empiricist and nativist approaches, most recently embodied by debates concerning deep neural networks and symbolic cognitive models. Here, we highlight a recently emerging line of work that suggests a novel reconciliation of these approaches, by exploiting an inductive bias that we term the relational bottleneck. We review a family of models that employ this approach to induce abstractions in a data-efficient manner, emphasizing their potential as candidate models for the acquisition of abstract concepts in the human mind and brain.
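To make the mechanism concrete, here is a minimal sketch of a relational-bottleneck layer under one common formulation: downstream processing sees only the pairwise relations (here, inner products) between object embeddings, never the embeddings themselves. The module names and dimensions are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class RelationalBottleneck(nn.Module):
    def __init__(self, in_dim: int, embed_dim: int):
        super().__init__()
        self.encoder = nn.Linear(in_dim, embed_dim)  # per-object encoder

    def forward(self, objects: torch.Tensor) -> torch.Tensor:
        # objects: (batch, n_objects, in_dim)
        z = self.encoder(objects)                    # (batch, n, d)
        z = torch.nn.functional.normalize(z, dim=-1)
        # Pass only the relation matrix downstream; absolute features are
        # discarded, which is the inductive bias that encourages abstraction.
        relations = torch.bmm(z, z.transpose(1, 2))  # (batch, n, n)
        return relations

# Example: a downstream head that classifies from relations alone.
bottleneck = RelationalBottleneck(in_dim=128, embed_dim=64)
head = nn.Sequential(nn.Flatten(), nn.Linear(5 * 5, 2))
x = torch.randn(8, 5, 128)          # 8 scenes of 5 objects each
logits = head(bottleneck(x))        # (8, 2)
```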

A Reinforcement Learning Approach for Robotic Unloading from Visual Observations

  • paper_url: http://arxiv.org/abs/2309.06621
  • repo_url: https://github.com/vittoriogiammarino/rl-for-unloading-from-pixels
  • paper_authors: Vittorio Giammarino, Alberto Giammarino, Matthew Pearce
  • for: This work addresses robotic unloading from visual observations, where robots must autonomously unload stacks of parcels using RGB-D images, learning without labeled data.
  • methods: A hierarchical controller structure that combines a high-level decision-making module with classical motion control. The high-level module is trained with Deep Reinforcement Learning (DRL), incorporating a safety bias mechanism and a reward function tailored to this task.
  • results: Experiments show both elements play a crucial role in learning performance. Code and simulation are released for reproducibility and as a benchmark for future research.
    Abstract In this work, we focus on a robotic unloading problem from visual observations, where robots are required to autonomously unload stacks of parcels using RGB-D images as their primary input source. While supervised and imitation learning have accomplished good results in these types of tasks, they heavily rely on labeled data, which are challenging to obtain in realistic scenarios. Our study aims to develop a sample efficient controller framework that can learn unloading tasks without the need for labeled data during the learning process. To tackle this challenge, we propose a hierarchical controller structure that combines a high-level decision-making module with classical motion control. The high-level module is trained using Deep Reinforcement Learning (DRL), wherein we incorporate a safety bias mechanism and design a reward function tailored to this task. Our experiments demonstrate that both these elements play a crucial role in achieving improved learning performance. Furthermore, to ensure reproducibility and establish a benchmark for future research, we provide free access to our code and simulation.
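As a concrete illustration of a DRL reward with a safety bias, here is a hypothetical sketch. The paper describes such a mechanism for unloading, but every field, threshold, and coefficient below is an illustrative assumption, not the authors' actual reward.

```python
# Hypothetical reward shaping for a parcel-unloading agent.
def unloading_reward(state, action_outcome):
    reward = 0.0
    if action_outcome["parcel_unloaded"]:
        reward += 1.0                      # task progress
    if action_outcome["stack_collapsed"]:
        reward -= 5.0                      # catastrophic failure
    # Safety bias: penalize actions that bring the end-effector too close
    # to the stack before a grasp is committed.
    margin = state["min_distance_to_stack"]
    if margin < 0.05:                      # metres; hypothetical threshold
        reward -= 0.5 * (0.05 - margin) / 0.05
    reward -= 0.01                         # small per-step cost to encourage speed
    return reward

state = {"min_distance_to_stack": 0.03}
outcome = {"parcel_unloaded": True, "stack_collapsed": False}
print(unloading_reward(state, outcome))    # 1.0 - safety penalty - step cost = 0.79
```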

CloudBrain-NMR: An Intelligent Cloud Computing Platform for NMR Spectroscopy Processing, Reconstruction and Analysis

  • paper_url: http://arxiv.org/abs/2309.07178
  • repo_url: None
  • paper_authors: Di Guo, Sijin Li, Jun Liu, Zhangren Tu, Tianyu Qiu, Jingjing Xu, Liubin Feng, Donghai Lin, Qing Hong, Meijin Lin, Yanqin Lin, Xiaobo Qu
  • for: To provide an intelligent online cloud computing platform for Nuclear Magnetic Resonance (NMR) data processing, reconstruction, and quantitative analysis.
  • methods: The platform distributes computation in parallel across graphics processing units (GPUs) and central processing units (CPUs) to shorten computation time and simplify operation, and integrates state-of-the-art deep learning algorithms so users can complete the entire processing pipeline without installing additional software.
  • results: The platform processes large volumes of NMR data quickly and delivers accurate quantitative analysis; it also exposes open APIs through which users can select different processing workflows and deep learning algorithms.
    Abstract Nuclear Magnetic Resonance (NMR) spectroscopy has served as a powerful analytical tool for studying molecular structure and dynamics in chemistry and biology. However, the processing of raw data acquired from NMR spectrometers and subsequent quantitative analysis involves various specialized tools, which necessitates comprehensive knowledge in programming and NMR. Particularly, the emerging deep learning tools is hard to be widely used in NMR due to the sophisticated setup of computation. Thus, NMR processing is not an easy task for chemist and biologists. In this work, we present CloudBrain-NMR, an intelligent online cloud computing platform designed for NMR data reading, processing, reconstruction, and quantitative analysis. The platform is conveniently accessed through a web browser, eliminating the need for any program installation on the user side. CloudBrain-NMR uses parallel computing with graphics processing units and central processing units, resulting in significantly shortened computation time. Furthermore, it incorporates state-of-the-art deep learning-based algorithms offering comprehensive functionalities that allow users to complete the entire processing procedure without relying on additional software. This platform has empowered NMR applications with advanced artificial intelligence processing. CloudBrain-NMR is openly accessible for free usage at https://csrc.xmu.edu.cn/CloudBrain.html

Hybrid Algorithm Selection and Hyperparameter Tuning on Distributed Machine Learning Resources: A Hierarchical Agent-based Approach

  • paper_url: http://arxiv.org/abs/2309.06604
  • repo_url: None
  • paper_authors: Ahmad Esmaeili, Julia T. Rayz, Eric T. Matson
  • for: This paper proposes a fully automatic and collaborative agent-based mechanism for selecting distributedly organized machine learning algorithms and simultaneously tuning their hyperparameters.
  • methods: The proposed method builds upon an existing agent-based hierarchical machine-learning platform and augments its query structure to support the aforementioned functionalities without being limited to specific learning, selection, and tuning mechanisms.
  • results: According to the results, our solution is totally correct and exhibits linear time and space complexity in relation to the size of available resources. The proposed method is also demonstrated to be effective in adapting and performing across a range of algorithmic options and datasets through experiments using a system comprised of 24 algorithms and 9 datasets.
    Abstract Algorithm selection and hyperparameter tuning are critical steps in both academic and applied machine learning. On the other hand, these steps are becoming ever increasingly delicate due to the extensive rise in the number, diversity, and distributedness of machine learning resources. Multi-agent systems, when applied to the design of machine learning platforms, bring about several distinctive characteristics such as scalability, flexibility, and robustness, just to name a few. This paper proposes a fully automatic and collaborative agent-based mechanism for selecting distributedly organized machine learning algorithms and simultaneously tuning their hyperparameters. Our method builds upon an existing agent-based hierarchical machine-learning platform and augments its query structure to support the aforementioned functionalities without being limited to specific learning, selection, and tuning mechanisms. We have conducted theoretical assessments, formal verification, and analytical study to demonstrate the correctness, resource utilization, and computational efficiency of our technique. According to the results, our solution is totally correct and exhibits linear time and space complexity in relation to the size of available resources. To provide concrete examples of how the proposed methodologies can effectively adapt and perform across a range of algorithmic options and datasets, we have also conducted a series of experiments using a system comprised of 24 algorithms and 9 datasets.
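For context, the task being automated, joint algorithm selection and hyperparameter tuning, can be sketched on a single machine as below. The paper's contribution is distributing this query over a hierarchy of agents with linear time and space complexity, which the flat loop here does not capture; the candidate models and grids are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
candidates = [
    (RandomForestClassifier(), {"n_estimators": [50, 200], "max_depth": [3, None]}),
    (SVC(), {"C": [0.1, 1.0, 10.0], "kernel": ["rbf", "linear"]}),
]

best = None
for estimator, grid in candidates:
    # Tune each algorithm's hyperparameters by cross-validated score.
    search = GridSearchCV(estimator, grid, cv=5).fit(X, y)
    if best is None or search.best_score_ > best[0]:
        best = (search.best_score_, search.best_estimator_)

print(f"selected: {best[1]} (cv accuracy {best[0]:.3f})")
```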

Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning

  • paper_url: http://arxiv.org/abs/2309.06597
  • repo_url: None
  • paper_authors: Enna Sachdeva, Nakul Agarwal, Suhas Chundi, Sean Roelofs, Jiachen Li, Behzad Dariush, Chiho Choi, Mykel Kochenderfer
  • for: To improve the societal acceptance of autonomous vehicles (AVs) and advanced driver assistance systems (ADAS), a prerequisite for their widespread adoption.
  • methods: A novel multi-modal ego-centric dataset, Rank2Tell, for ranking the importance level of objects in complex traffic scenarios and telling the reason for that importance, together with a joint model for simultaneous importance ranking and natural language caption generation.
  • results: Rank2Tell and the joint model provide a valuable resource for visual scene understanding and related fields, achieving strong performance in quantitative evaluations.
    Abstract The widespread adoption of commercial autonomous vehicles (AVs) and advanced driver assistance systems (ADAS) may largely depend on their acceptance by society, for which their perceived trustworthiness and interpretability to riders are crucial. In general, this task is challenging because modern autonomous systems software relies heavily on black-box artificial intelligence models. Towards this goal, this paper introduces a novel dataset, Rank2Tell, a multi-modal ego-centric dataset for Ranking the importance level and Telling the reason for the importance. Using various close and open-ended visual question answering, the dataset provides dense annotations of various semantic, spatial, temporal, and relational attributes of various important objects in complex traffic scenarios. The dense annotations and unique attributes of the dataset make it a valuable resource for researchers working on visual scene understanding and related fields. Further, we introduce a joint model for joint importance level ranking and natural language captions generation to benchmark our dataset and demonstrate performance with quantitative evaluations.

Do Generative Large Language Models need billions of parameters?

  • paper_url: http://arxiv.org/abs/2309.06589
  • repo_url: None
  • paper_authors: Sia Gholami, Marwan Omar
  • for: Developing efficient large language models (LLMs).
  • methods: The paper explores the trade-offs between model size, performance, and computational resources, and proposes methods that let different parts of the model share parameters, reducing the total number of unique parameters while preserving the ability to learn and represent complex language structures.
  • results: The study provides tools and methods for creating more efficient and effective LLMs, contributing to a more sustainable and accessible future for AI language modeling.
    Abstract This paper presents novel systems and methodologies for the development of efficient large language models (LLMs). It explores the trade-offs between model size, performance, and computational resources, with the aim of maximizing the efficiency of these AI systems. The research explores novel methods that allow different parts of the model to share parameters, reducing the total number of unique parameters required. This approach ensures that the model remains compact without sacrificing its ability to learn and represent complex language structures. This study provides valuable insights and tools for creating more efficient and effective LLMs, contributing to a more sustainable and accessible future for AI language modeling.
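One common way to let "different parts of the model share parameters" is cross-layer weight sharing, sketched below. This ALBERT-style scheme is a generic illustration of the idea, not necessarily the paper's exact method; all dimensions are arbitrary.

```python
import torch
import torch.nn as nn

class SharedLayerEncoder(nn.Module):
    def __init__(self, d_model=256, nhead=4, depth=12):
        super().__init__()
        # One set of weights, reused `depth` times: depth no longer
        # multiplies the unique-parameter count.
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.depth = depth

    def forward(self, x):
        for _ in range(self.depth):
            x = self.layer(x)
        return x

model = SharedLayerEncoder()
n_unique = sum(p.numel() for p in model.parameters())
print(f"unique parameters: {n_unique}")  # ~1 layer's worth, despite 12 passes
x = torch.randn(2, 16, 256)
print(model(x).shape)                    # torch.Size([2, 16, 256])
```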

HurriCast: An Automatic Framework Using Machine Learning and Statistical Modeling for Hurricane Forecasting

  • paper_url: http://arxiv.org/abs/2309.07174
  • repo_url: None
  • paper_authors: Shouwei Gao, Meiyan Gao, Yuepeng Li, Wenqian Dong
  • for: To improve hurricane risk assessment models so that hurricane trajectories and intensities can be forecast more accurately.
  • methods: The study combines an ARIMA model with K-MEANS clustering to better capture hurricane trends, plus an autoencoder for enhanced hurricane simulations.
  • results: Experiments show the hybrid approach accurately simulates historical hurricane behavior and provides detailed projections of future trajectories and intensities, offering actionable insights for risk management strategies.
    Abstract Hurricanes present major challenges in the U.S. due to their devastating impacts. Mitigating these risks is important, and the insurance industry is central in this effort, using intricate statistical models for risk assessment. However, these models often neglect key temporal and spatial hurricane patterns and are limited by data scarcity. This study introduces a refined approach combining the ARIMA model and K-MEANS to better capture hurricane trends, and an Autoencoder for enhanced hurricane simulations. Our experiments show that this hybrid methodology effectively simulate historical hurricane behaviors while providing detailed projections of potential future trajectories and intensities. Moreover, by leveraging a comprehensive yet selective dataset, our simulations enrich the current understanding of hurricane patterns and offer actionable insights for risk management strategies.
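A minimal sketch of the hybrid idea on synthetic data: cluster storm tracks with K-means, then fit an ARIMA model to each cluster's intensity series. The ARIMA order, features, and data are illustrative assumptions rather than the paper's configuration, and the autoencoder stage is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# 60 storms, each summarized by 8 track features (e.g., flattened lat/lon).
tracks = rng.normal(size=(60, 8))
intensities = rng.gamma(shape=2.0, scale=20.0, size=(60, 24))  # 24 steps each

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(tracks)

for k in range(3):
    # Average intensity series of the cluster as a simple representative.
    series = intensities[labels == k].mean(axis=0)
    model = ARIMA(series, order=(1, 0, 1)).fit()
    print(f"cluster {k}: next 4 steps -> {model.forecast(4)}")
```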

Hierarchical Multi-Task Learning Framework for Session-based Recommendations

  • paper_url: http://arxiv.org/abs/2309.06533
  • repo_url: None
  • paper_authors: Sejoon Oh, Walid Shalaby, Amir Afsharinejad, Xiquan Cui
  • for: To improve the prediction accuracy and generalizability of session-based recommender systems (SBRSs).
  • methods: A hierarchical multi-task learning (H-MTL) architecture that places prediction tasks in a hierarchy, feeding outputs from auxiliary tasks to the main task; this yields richer input features for the main task and more interpretable predictions.
  • results: HierSRec outperforms existing SBRSs on next-item prediction accuracy on two session-based recommendation datasets, and supports scalable inference via a compact set of candidate items (e.g., 4% of total items) generated per test example.
    Abstract While session-based recommender systems (SBRSs) have shown superior recommendation performance, multi-task learning (MTL) has been adopted by SBRSs to enhance their prediction accuracy and generalizability further. Hierarchical MTL (H-MTL) sets a hierarchical structure between prediction tasks and feeds outputs from auxiliary tasks to main tasks. This hierarchy leads to richer input features for main tasks and higher interpretability of predictions, compared to existing MTL frameworks. However, the H-MTL framework has not been investigated in SBRSs yet. In this paper, we propose HierSRec which incorporates the H-MTL architecture into SBRSs. HierSRec encodes a given session with a metadata-aware Transformer and performs next-category prediction (i.e., auxiliary task) with the session encoding. Next, HierSRec conducts next-item prediction (i.e., main task) with the category prediction result and session encoding. For scalable inference, HierSRec creates a compact set of candidate items (e.g., 4% of total items) per test example using the category prediction. Experiments show that HierSRec outperforms existing SBRSs as per next-item prediction accuracy on two session-based recommendation datasets. The accuracy of HierSRec measured with the carefully-curated candidate items aligns with the accuracy of HierSRec calculated with all items, which validates the usefulness of our candidate generation scheme via H-MTL.
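The H-MTL wiring described in the abstract can be sketched as follows: a session encoding feeds an auxiliary next-category head, and the main next-item head consumes both the encoding and the category prediction. The plain GRU encoder and all dimensions are stand-ins for the paper's metadata-aware Transformer.

```python
import torch
import torch.nn as nn

class HierSessionModel(nn.Module):
    def __init__(self, n_items=1000, n_categories=20, d=64):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.category_head = nn.Linear(d, n_categories)        # auxiliary task
        self.item_head = nn.Linear(d + n_categories, n_items)  # main task

    def forward(self, session):                   # session: (batch, seq_len)
        h, _ = self.encoder(self.item_emb(session))
        enc = h[:, -1]                            # last hidden state
        cat_logits = self.category_head(enc)
        # The main task sees the auxiliary prediction: the H-MTL hierarchy.
        item_logits = self.item_head(torch.cat([enc, cat_logits], dim=-1))
        return cat_logits, item_logits

model = HierSessionModel()
cat_logits, item_logits = model(torch.randint(0, 1000, (8, 12)))
loss = nn.functional.cross_entropy(cat_logits, torch.randint(0, 20, (8,))) \
     + nn.functional.cross_entropy(item_logits, torch.randint(0, 1000, (8,)))
loss.backward()
```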

Minimum Bayes’ Risk Decoding for System Combination of Grammatical Error Correction Systems

  • paper_url: http://arxiv.org/abs/2309.06520
  • repo_url: https://github.com/rainavyas/mbr_gec
  • paper_authors: Vyas Raina, Mark Gales
  • for: To improve the performance of grammatical error correction (GEC) systems by using Minimum Bayes' Risk (MBR) decoding to combine system outputs in a way that better aligns with the final assessment criterion.
  • methods: MBR decoding with a novel loss function directly linked to the edit-based F-score criterion, an expanded candidate set built on the current max-voting combination scheme and individual edit-level selection, and varying reward metrics within the MBR decoding framework.
  • results: Experiments show the proposed MBR decoding approach improves combined GEC system performance, and varying the reward metric provides control over precision, recall, and the F-score.
    Abstract For sequence-to-sequence tasks it is challenging to combine individual system outputs. Further, there is also often a mismatch between the decoding criterion and the one used for assessment. Minimum Bayes' Risk (MBR) decoding can be used to combine system outputs in a manner that encourages better alignment with the final assessment criterion. This paper examines MBR decoding for Grammatical Error Correction (GEC) systems, where performance is usually evaluated in terms of edits and an associated F-score. Hence, we propose a novel MBR loss function directly linked to this form of criterion. Furthermore, an approach to expand the possible set of candidate sentences is described. This builds on a current max-voting combination scheme, as well as individual edit-level selection. Experiments on three popular GEC datasets and with state-of-the-art GEC systems demonstrate the efficacy of the proposed MBR approach. Additionally, the paper highlights how varying reward metrics within the MBR decoding framework can provide control over precision, recall, and the F-score in combined GEC systems.
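A minimal sketch of MBR combination over candidate corrections: select the candidate with the highest expected reward against the other candidates. The toy token-level F1 below stands in for the paper's edit-based reward; as the abstract notes, swapping the reward metric shifts the precision/recall balance of the selected output.

```python
from collections import Counter

def token_f1(hyp, ref):
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)

def mbr_select(candidates, reward=token_f1):
    # Treat the candidate pool as samples from the hypothesis distribution
    # and pick the candidate maximizing expected reward against the rest.
    def expected_reward(c):
        others = [o for o in candidates if o is not c]
        return sum(reward(c, o) for o in others) / max(len(others), 1)
    return max(candidates, key=expected_reward)

# Outputs of three hypothetical GEC systems for the same source sentence:
candidates = [
    "He has gone to school yesterday .",
    "He went to school yesterday .",
    "He went to the school yesterday .",
]
print(mbr_select(candidates))  # picks the most "central" correction
```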

Learning Disentangled Avatars with Hybrid 3D Representations

  • paper_url: http://arxiv.org/abs/2309.06441
  • repo_url: https://github.com/yfeng95/DELTA
  • paper_authors: Yao Feng, Weiyang Liu, Timo Bolkart, Jinlong Yang, Marc Pollefeys, Michael J. Black
  • for: To learn animatable and photorealistic human avatars.
  • methods: DELTA, which models humans with hybrid explicit-implicit 3D representations, decomposing the avatar into a mesh-based parametric model for the body or face and a neural radiance field for clothing or hair, trained end-to-end with a differentiable renderer that integrates meshes into volumetric rendering.
  • results: DELTA disentangles body from clothing and face from hair, performs well on disentangled reconstruction, virtual clothing try-on, and hairstyle transfer, and allows hair and clothing to be transferred to arbitrary body shapes.
    Abstract Tremendous efforts have been made to learn animatable and photorealistic human avatars. Towards this end, both explicit and implicit 3D representations are heavily studied for a holistic modeling and capture of the whole human (e.g., body, clothing, face and hair), but neither representation is an optimal choice in terms of representation efficacy since different parts of the human avatar have different modeling desiderata. For example, meshes are generally not suitable for modeling clothing and hair. Motivated by this, we present Disentangled Avatars~(DELTA), which models humans with hybrid explicit-implicit 3D representations. DELTA takes a monocular RGB video as input, and produces a human avatar with separate body and clothing/hair layers. Specifically, we demonstrate two important applications for DELTA. For the first one, we consider the disentanglement of the human body and clothing and in the second, we disentangle the face and hair. To do so, DELTA represents the body or face with an explicit mesh-based parametric 3D model and the clothing or hair with an implicit neural radiance field. To make this possible, we design an end-to-end differentiable renderer that integrates meshes into volumetric rendering, enabling DELTA to learn directly from monocular videos without any 3D supervision. Finally, we show that how these two applications can be easily combined to model full-body avatars, such that the hair, face, body and clothing can be fully disentangled yet jointly rendered. Such a disentanglement enables hair and clothing transfer to arbitrary body shapes. We empirically validate the effectiveness of DELTA's disentanglement by demonstrating its promising performance on disentangled reconstruction, virtual clothing try-on and hairstyle transfer. To facilitate future research, we also release an open-sourced pipeline for the study of hybrid human avatar modeling.
    摘要 很大的努力已经投入到了人类动画和实际化人物模型的学习中。在这个领域,both explicit和implicit的3D表示都被广泛研究,以实现人类整体模型化和捕捉(例如,身体、衣服和头发),但 neither representation是优选的选择,因为不同的人物部分有不同的模型需求。例如,网格不适用于衣服和头发的模型。这种情况下,我们提出了Disentangled Avatars(DELTA),它使用了混合的explicit-implicit 3D表示来模型人类。DELTA通过一个灰度RGB视频输入,生成了一个人物模型,其中包括身体和衣服/头发层。具体来说,我们展示了两个重要的应用场景。在第一个应用场景中,我们考虑了人体和衣服的分离,在第二个应用场景中,我们分离了面孔和头发。为了实现这些应用场景,DELTA使用了一种由网格和神经辐射场组成的混合表示方法。为了实现这种方法,我们设计了一个端到端可微 differentiable 渲染器,该渲染器将网格 integrate into volumetric rendering,以便DELTA可以直接从灰度视频中学习,不需要任何3D监督。最后,我们表明了如何将这两个应用场景结合起来,以实现全身人物模型的分离和重新渲染。这种分离允许头发和衣服进行到任何身体形状的转移。我们通过实验证明了DELTA的分离性能的表现,包括分离重建、虚拟服装尝试和头发样式传输。为了促进未来的研究,我们还发布了一个开源的人类动画模型研究管道。

LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning

  • paper_url: http://arxiv.org/abs/2309.06440
  • repo_url: None
  • paper_authors: Kenneth Shaw, Ananye Agarwal, Deepak Pathak
  • for: To provide a low-cost, dexterous, and anthropomorphic hand for machine learning research.
  • methods: A novel kinematic structure that allows maximal dexterity regardless of finger pose, combined with machine learning for manipulation tasks.
  • results: LEAP Hand performs a range of real-world manipulation tasks, from visual teleoperation to learning from passive video data and sim2real. It outperforms its closest competitor, the Allegro Hand, in all experiments while costing 1/8 as much.
    Abstract Dexterous manipulation has been a long-standing challenge in robotics. While machine learning techniques have shown some promise, results have largely been currently limited to simulation. This can be mostly attributed to the lack of suitable hardware. In this paper, we present LEAP Hand, a low-cost dexterous and anthropomorphic hand for machine learning research. In contrast to previous hands, LEAP Hand has a novel kinematic structure that allows maximal dexterity regardless of finger pose. LEAP Hand is low-cost and can be assembled in 4 hours at a cost of 2000 USD from readily available parts. It is capable of consistently exerting large torques over long durations of time. We show that LEAP Hand can be used to perform several manipulation tasks in the real world -- from visual teleoperation to learning from passive video data and sim2real. LEAP Hand significantly outperforms its closest competitor Allegro Hand in all our experiments while being 1/8th of the cost. We release detailed assembly instructions, the Sim2Real pipeline and a development platform with useful APIs on our website at https://leap-hand.github.io/
    摘要 dexterous 操作已经是机器人领域的长期挑战。虽然机器学习技术已经显示了一定的承诺,但结果主要受到硬件的限制。在这篇论文中,我们介绍了LEAP手,一个低成本的手臂,用于机器学习研究。与之前的手臂不同,LEAP手具有新的骨骼结构,允许无论手指pose都能够达到最大的dexterity。LEAP手是低成本的,可以在4小时内为2000美元组装,使用可得到的部件。它可以在长时间内一直承受大的扭矩。我们表明LEAP手可以在真实世界中完成多种操作任务,从视觉操作到学习从无动视频数据和sim2real。LEAP手在所有实验中都能够superior于Allegro手,而且只有1/8的成本。我们在网站https://leap-hand.github.io/上发布了详细的组装指南,Sim2Real管道和开发平台,以及有用的API。

Unveiling the potential of large language models in generating semantic and cross-language clones

  • paper_url: http://arxiv.org/abs/2309.06424
  • repo_url: None
  • paper_authors: Palash R. Roy, Ajmain I. Alam, Farouq Al-omari, Banani Roy, Chanchal K. Roy, Kevin A. Schneider
  • for: To study whether OpenAI's GPT models can generate semantic and cross-language code clones, which would aid code reuse, code comprehension, refactoring, and benchmarking.
  • methods: Using SemanticCloneBench as a vehicle, the study evaluates GPT-3's ability to generate clone variants for a diverse set of code fragments, validated by 9 judges over 158 hours.
  • results: GPT-3 performs strongly on both tasks, reaching 62.14% accuracy with a 0.55 BLEU score on semantic clones (via few-shot prompt engineering) and 91.25% accuracy on cross-language clones.
    Abstract Semantic and Cross-language code clone generation may be useful for code reuse, code comprehension, refactoring and benchmarking. OpenAI's GPT model has potential in such clone generation as GPT is used for text generation. When developers copy/paste codes from Stack Overflow (SO) or within a system, there might be inconsistent changes leading to unexpected behaviours. Similarly, if someone possesses a code snippet in a particular programming language but seeks equivalent functionality in a different language, a semantic cross-language code clone generation approach could provide valuable assistance. In this study, using SemanticCloneBench as a vehicle, we evaluated how well the GPT-3 model could help generate semantic and cross-language clone variants for a given fragment.We have comprised a diverse set of code fragments and assessed GPT-3s performance in generating code variants.Through extensive experimentation and analysis, where 9 judges spent 158 hours to validate, we investigate the model's ability to produce accurate and semantically correct variants. Our findings shed light on GPT-3's strengths in code generation, offering insights into the potential applications and challenges of using advanced language models in software development. Our quantitative analysis yields compelling results. In the realm of semantic clones, GPT-3 attains an impressive accuracy of 62.14% and 0.55 BLEU score, achieved through few-shot prompt engineering. Furthermore, the model shines in transcending linguistic confines, boasting an exceptional 91.25% accuracy in generating cross-language clones
    摘要 semantic和跨语言代码倾Copy generation可能有用于代码重用、代码理解、重构和benchmarking。OpenAI的GPT模型有潜力在这种倾Copy generation中,因为GPT是用于文本生成。当开发者从Stack Overflow(SO)或系统中复制代码时,可能会出现不一致的更改,导致意外的行为。 Similarly,如果某人拥有一个代码段在特定编程语言中,但寻找相同的功能在不同语言中,semantic cross-language code clone generation方法可以提供有价值的帮助。在本研究中,使用SemanticCloneBench作为载体,我们评估了GPT-3模型在给定副本中是否可以生成Semantic和跨语言倾Copy变体。我们组织了一个多样化的代码副本,并评估GPT-3模型在生成代码变体方面的能力。经过广泛的实验和分析,9名判icator在158小时内验证了我们的结论,我们调查了模型在代码生成方面的能力。我们的数据分析得出了有力的结果。在semantic倾Copy领域,GPT-3达到了62.14%的精度和0.55 BLEU分数,通过几个极少的提示工程来实现。此外,模型在跨语言倾Copy方面表现出色,达到了91.25%的精度。

Exploring Large Language Models for Ontology Alignment

  • paper_url: http://arxiv.org/abs/2309.07172
  • repo_url: https://github.com/krr-oxford/llmap-prelim
  • paper_authors: Yuan He, Jiaoyan Chen, Hang Dong, Ian Horrocks
  • for: To investigate the applicability of recent generative large language models (LLMs) to ontology alignment, i.e., identifying concept equivalence mappings across ontologies.
  • methods: Generative LLMs, including the GPT series and Flan-T5, are tested for zero-shot performance on challenging subsets of equivalence matching datasets, taking concept labels and structural contexts into account.
  • results: Preliminary findings suggest LLMs can outperform existing ontology alignment systems such as BERTMap, given careful framework and prompt design.
    Abstract This work investigates the applicability of recent generative Large Language Models (LLMs), such as the GPT series and Flan-T5, to ontology alignment for identifying concept equivalence mappings across ontologies. To test the zero-shot performance of Flan-T5-XXL and GPT-3.5-turbo, we leverage challenging subsets from two equivalence matching datasets of the OAEI Bio-ML track, taking into account concept labels and structural contexts. Preliminary findings suggest that LLMs have the potential to outperform existing ontology alignment systems like BERTMap, given careful framework and prompt design.
    摘要 这个研究探讨了最近的生成型大型自然语言模型(LLM),如GPT系列和Flan-T5,在ontology alignment中的可行性,以确定 Ontology 中的概念相似映射。为了测试 Flan-T5-XXL 和 GPT-3.5-turbo 的零学习性能,我们利用了 OAEI Bio-ML 跟踪中的两个等价匹配数据集,考虑概念标签和结构上下文。初步发现,LLM 有可能超越现有的ontology alignment系统BERTMap,提供精心设计的框架和提示。

Ensemble Mask Networks

  • paper_url: http://arxiv.org/abs/2309.06382
  • repo_url: https://github.com/lok-18/GeSeNet
  • paper_authors: Jonny Luntzel
  • for: To ask whether an $\mathbb{R}^n\rightarrow \mathbb{R}^n$ feedforward network can learn matrix-vector multiplication.
  • methods: Two mechanisms are introduced: flexible masking to take matrix inputs, and a unique network pruning that respects the mask's dependency structure.
  • results: With these mechanisms, networks can approximate fixed operations such as matrix-vector multiplication $\phi(A,x) \rightarrow Ax$, with applications toward litmus-testing dependencies or interaction order in graph-based models.
    Abstract Can an $\mathbb{R}^n\rightarrow \mathbb{R}^n$ feedforward network learn matrix-vector multiplication? This study introduces two mechanisms - flexible masking to take matrix inputs, and a unique network pruning to respect the mask's dependency structure. Networks can approximate fixed operations such as matrix-vector multiplication $\phi(A,x) \rightarrow Ax$, motivating the mechanisms introduced with applications towards litmus-testing dependencies or interaction order in graph-based models.
    摘要 可以不是$\mathbb{R}^n\to\mathbb{R}^n$的Feedforward网络学习矩阵-向量乘法吗?这个研究提出了两种机制——灵活的面 masking来处理矩阵输入,以及特殊的网络剔除来尊重面的依赖结构。网络可以近似固定操作,如矩阵-向量乘法$\phi(A,x)\to Ax$,这些机制的引入鼓励了在图模型中进行考验依赖关系或交互顺序。

Style2Fab: Functionality-Aware Segmentation for Fabricating Personalized 3D Models with Generative AI

  • paper_url: http://arxiv.org/abs/2309.06379
  • repo_url: None
  • paper_authors: Faraz Faruqi, Ahmed Katary, Tarik Hasic, Amira Abdel-Rahman, Nayeemur Rahman, Leandra Tejedor, Mackenzie Leake, Megan Hofmann, Stefanie Mueller
  • for: To automatically segment 3D models into functional and aesthetic elements, so users can selectively modify a model's aesthetic segments without compromising its original functionality.
  • methods: A taxonomy of functionality in 3D models is created first, then used in a semi-automatic classification method that decomposes 3D models into functional and aesthetic parts.
  • results: A qualitative analysis of 1000 3D models from Thingiverse yields the functionality taxonomy; the resulting classification method and the Style2Fab system let users selectively stylize 3D models while preserving their functionality.
    Abstract With recent advances in Generative AI, it is becoming easier to automatically manipulate 3D models. However, current methods tend to apply edits to models globally, which risks compromising the intended functionality of the 3D model when fabricated in the physical world. For example, modifying functional segments in 3D models, such as the base of a vase, could break the original functionality of the model, thus causing the vase to fall over. We introduce a method for automatically segmenting 3D models into functional and aesthetic elements. This method allows users to selectively modify aesthetic segments of 3D models, without affecting the functional segments. To develop this method we first create a taxonomy of functionality in 3D models by qualitatively analyzing 1000 models sourced from a popular 3D printing repository, Thingiverse. With this taxonomy, we develop a semi-automatic classification method to decompose 3D models into functional and aesthetic elements. We propose a system called Style2Fab that allows users to selectively stylize 3D models without compromising their functionality. We evaluate the effectiveness of our classification method compared to human-annotated data, and demonstrate the utility of Style2Fab with a user study to show that functionality-aware segmentation helps preserve model functionality.

Grounded Language Acquisition From Object and Action Imagery

  • paper_url: http://arxiv.org/abs/2309.06335
  • repo_url: None
  • paper_authors: James Robert Kubricht, Zhaoyuan Yang, Jianwei Qiu, Peter Henry Tu
  • for: To study how deep learning methods can develop a private language for representing visual data.
  • methods: Emergent language (EL) encoders/decoders are trained in a referential game environment and in a contrastive learning environment with a within-class matching paradigm; neural machine translation and random forest classification then transform symbolic representations into class labels.
  • results: The methods are applied in two experiments, object recognition (Sketchy dataset) and action recognition (MOVI dataset), with Grad-CAM and t-SNE used to interpret the meaning of the generated symbols.
    Abstract Deep learning approaches to natural language processing have made great strides in recent years. While these models produce symbols that convey vast amounts of diverse knowledge, it is unclear how such symbols are grounded in data from the world. In this paper, we explore the development of a private language for visual data representation by training emergent language (EL) encoders/decoders in both i) a traditional referential game environment and ii) a contrastive learning environment utilizing a within-class matching training paradigm. An additional classification layer utilizing neural machine translation and random forest classification was used to transform symbolic representations (sequences of integer symbols) to class labels. These methods were applied in two experiments focusing on object recognition and action recognition. For object recognition, a set of sketches produced by human participants from real imagery was used (Sketchy dataset) and for action recognition, 2D trajectories were generated from 3D motion capture systems (MOVI dataset). In order to interpret the symbols produced for data in each experiment, gradient-weighted class activation mapping (Grad-CAM) methods were used to identify pixel regions indicating semantic features which contribute evidence towards symbols in learned languages. Additionally, a t-distributed stochastic neighbor embedding (t-SNE) method was used to investigate embeddings learned by CNN feature extractors.
    摘要 深度学习方法在自然语言处理方面已经做出了很大的进步。这些模型生成的符号表达了各种多样化的知识,但是不清楚这些符号如何与世界数据相关联。在这篇论文中,我们探索了在私人语言表达中发展的私人语言(EL)编码器/解码器,并在两种不同的环境中训练这些模型:一种传统的引用游戏环境和一种对比学习环境。此外,我们还使用神经机器翻译和随机森林分类来转换符号表达(序列数字符号)为类别标签。这些方法在两个实验中应用,一个是对象识别实验(Sketchy dataset),另一个是动作识别实验(MOVI dataset)。为了解释在每个实验中生成的符号,我们使用梯度权重分布映射(Grad-CAM)方法来确定符号中含有哪些Semantic特征,以及这些特征对象的证据。此外,我们还使用了高度分布随机邻居嵌入(t-SNE)方法来调查由Convolutional Neural Networks(CNN)特征提取器学习的嵌入。

Learning Minimalistic Tsetlin Machine Clauses with Markov Boundary-Guided Pruning

  • paper_url: http://arxiv.org/abs/2309.06315
  • repo_url: https://github.com/cair/tmu
  • paper_authors: Ole-Christoffer Granmo, Per-Arne Andersen, Lei Jiao, Xuan Zhang, Christian Blakely, Tor Tveit
  • for: To propose a new Tsetlin Machine (TM) feedback scheme for finding Markov boundaries.
  • methods: The scheme introduces a novel finite state automaton, the Context-Specific Independence Automaton, which learns which features lie outside the Markov boundary of the target so they can be pruned from the TM during learning.
  • results: Empirical investigation and a theoretical convergence analysis show the scheme can exploit context-specific independence to find Markov boundaries, connecting Bayesian networks with TMs.
    Abstract A set of variables is the Markov blanket of a random variable if it contains all the information needed for predicting the variable. If the blanket cannot be reduced without losing useful information, it is called a Markov boundary. Identifying the Markov boundary of a random variable is advantageous because all variables outside the boundary are superfluous. Hence, the Markov boundary provides an optimal feature set. However, learning the Markov boundary from data is challenging for two reasons. If one or more variables are removed from the Markov boundary, variables outside the boundary may start providing information. Conversely, variables within the boundary may stop providing information. The true role of each candidate variable is only manifesting when the Markov boundary has been identified. In this paper, we propose a new Tsetlin Machine (TM) feedback scheme that supplements Type I and Type II feedback. The scheme introduces a novel Finite State Automaton - a Context-Specific Independence Automaton. The automaton learns which features are outside the Markov boundary of the target, allowing them to be pruned from the TM during learning. We investigate the new scheme empirically, showing how it is capable of exploiting context-specific independence to find Markov boundaries. Further, we provide a theoretical analysis of convergence. Our approach thus connects the field of Bayesian networks (BN) with TMs, potentially opening up for synergies when it comes to inference and learning, including TM-produced Bayesian knowledge bases and TM-based Bayesian inference.
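To see why this is hard, consider a crude classical stand-in: marginal mutual-information screening to flag features for pruning. On an XOR target it fails in an instructive way, since both true parents look marginally independent of the target; this is the kind of gap that conditional, context-specific reasoning must close. The data and test below are illustrative, not from the paper.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 2000
x1 = rng.integers(0, 2, n)
x2 = rng.integers(0, 2, n)
noise = rng.integers(0, 2, n)            # pure distractor
y = x1 ^ x2                              # XOR: depends on x1 AND x2 jointly
X = np.column_stack([x1, x2, noise])

mi = mutual_info_classif(X, y, discrete_features=True, random_state=0)
print(dict(zip(["x1", "x2", "noise"], mi.round(3))))
# Marginal MI is ~0 for all three: XOR hides x1/x2 from marginal tests,
# so a marginal screen cannot separate true Markov-boundary members
# from the distractor.
```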

AI4Food-NutritionFW: A Novel Framework for the Automatic Synthesis and Analysis of Eating Behaviours

  • paper_url: http://arxiv.org/abs/2309.06308
  • repo_url: https://github.com/bidalab/ai4food-nutritionfw
  • paper_authors: Sergio Romero-Tapiador, Ruben Tolosana, Aythami Morales, Isabel Espinosa-Salinas, Gala Freixer, Julian Fierrez, Ruben Vera-Rodriguez, Enrique Carrillo de Santa Pau, Ana Ramírez de Molina, Javier Ortega-Garcia
  • for: To propose an AI-based framework for creating food image datasets that facilitate research on eating-behaviour analysis and personalized nutrition recommendations.
  • methods: Image processing and artificial intelligence techniques, used to build a food image dataset covering 4,800 different weekly eating behaviours from 15 profiles and 1,200 subjects.
  • results: A healthy index of each subject's eating behaviour is evaluated automatically against international dietary guidelines, achieving 99.53% accuracy and 99.60% sensitivity.
    Abstract Nowadays millions of images are shared on social media and web platforms. In particular, many of them are food images taken from a smartphone over time, providing information related to the individual's diet. On the other hand, eating behaviours are directly related to some of the most prevalent diseases in the world. Exploiting recent advances in image processing and Artificial Intelligence (AI), this scenario represents an excellent opportunity to: i) create new methods that analyse the individuals' health from what they eat, and ii) develop personalised recommendations to improve nutrition and diet under specific circumstances (e.g., obesity or COVID). Having tunable tools for creating food image datasets that facilitate research in both lines is very much needed. This paper proposes AI4Food-NutritionFW, a framework for the creation of food image datasets according to configurable eating behaviours. AI4Food-NutritionFW simulates a user-friendly and widespread scenario where images are taken using a smartphone. In addition to the framework, we also provide and describe a unique food image dataset that includes 4,800 different weekly eating behaviours from 15 different profiles and 1,200 subjects. Specifically, we consider profiles that comply with actual lifestyles from healthy eating behaviours (according to established knowledge), variable profiles (e.g., eating out, holidays), to unhealthy ones (e.g., excess of fast food or sweets). Finally, we automatically evaluate a healthy index of the subject's eating behaviours using multidimensional metrics based on guidelines for healthy diets proposed by international organisations, achieving promising results (99.53% and 99.60% accuracy and sensitivity, respectively). We also release to the research community a software implementation of our proposed AI4Food-NutritionFW and the mentioned food image dataset created with it.

Transferability analysis of data-driven additive manufacturing knowledge: a case study between powder bed fusion and directed energy deposition

  • paper_url: http://arxiv.org/abs/2309.06286
  • repo_url: None
  • paper_authors: Mutahar Safdar, Jiarui Xie, Hyunwoong Ko, Yan Lu, Guy Lamouche, Yaoyao Fiona Zhao
  • for: To provide a data-driven knowledge transferability analysis framework that supports transferring additive manufacturing (AM) knowledge between different processes.
  • methods: A three-step knowledge transferability analysis framework consisting of pre-transfer, transfer, and post-transfer steps; as a prerequisite, AM knowledge is featurized into identified knowledge components.
  • results: A case study shows data-driven solutions can be successfully transferred from Laser Powder Bed Fusion (LPBF) to Directed Energy Deposition (DED) at the levels of data representation, model architecture, and model parameters.
    Abstract Data-driven research in Additive Manufacturing (AM) has gained significant success in recent years. This has led to a plethora of scientific literature to emerge. The knowledge in these works consists of AM and Artificial Intelligence (AI) contexts that have not been mined and formalized in an integrated way. Moreover, no tools or guidelines exist to support data-driven knowledge transfer from one context to another. As a result, data-driven solutions using specific AI techniques are being developed and validated only for specific AM process technologies. There is a potential to exploit the inherent similarities across various AM technologies and adapt the existing solutions from one process or problem to another using AI, such as Transfer Learning. We propose a three-step knowledge transferability analysis framework in AM to support data-driven AM knowledge transfer. As a prerequisite to transferability analysis, AM knowledge is featurized into identified knowledge components. The framework consists of pre-transfer, transfer, and post-transfer steps to accomplish knowledge transfer. A case study is conducted between flagship metal AM processes. Laser Powder Bed Fusion (LPBF) is the source of knowledge motivated by its relative matureness in applying AI over Directed Energy Deposition (DED), which drives the need for knowledge transfer as the less explored target process. We show successful transfer at different levels of the data-driven solution, including data representation, model architecture, and model parameters. The pipeline of AM knowledge transfer can be automated in the future to allow efficient cross-context or cross-process knowledge exchange.
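The transfer levels reported in the case study (data representation, model architecture, model parameters) can be illustrated with a generic pretrain-then-fine-tune sketch: copy the source (LPBF) model's parameters, freeze the shared body, and fine-tune the head on scarce target (DED) data. The network and data shapes are illustrative assumptions, not the paper's models.

```python
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(
        nn.Linear(32, 64), nn.ReLU(),   # shared feature extractor ("body")
        nn.Linear(64, 2),               # task head (e.g., defect / no defect)
    )

source_model = make_model()
# ... assume source_model was trained on plentiful LPBF monitoring data ...

target_model = make_model()
target_model.load_state_dict(source_model.state_dict())  # parameter transfer

# Freeze the transferred body; fine-tune only the head on DED data.
for layer in list(target_model.children())[:-1]:
    for p in layer.parameters():
        p.requires_grad = False

opt = torch.optim.Adam(
    [p for p in target_model.parameters() if p.requires_grad], lr=1e-3)
x_ded = torch.randn(16, 32)
y_ded = torch.randint(0, 2, (16,))
loss = nn.functional.cross_entropy(target_model(x_ded), y_ded)
loss.backward(); opt.step()
```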

Jersey Number Recognition using Keyframe Identification from Low-Resolution Broadcast Videos

  • paper_url: http://arxiv.org/abs/2309.06285
  • repo_url: None
  • paper_authors: Bavesh Balaji, Jerrin Bright, Harish Prakash, Yuhao Chen, David A Clausi, John Zelek
  • for: automatic jersey number detection in sports videos
  • methods: spatio-temporal network, multi-task loss function
  • results: significant increase in accuracy (37.81% and 37.70%)
    Abstract Player identification is a crucial component in vision-driven soccer analytics, enabling various downstream tasks such as player assessment, in-game analysis, and broadcast production. However, automatically detecting jersey numbers from player tracklets in videos presents challenges due to motion blur, low resolution, distortions, and occlusions. Existing methods, utilizing Spatial Transformer Networks, CNNs, and Vision Transformers, have shown success in image data but struggle with real-world video data, where jersey numbers are not visible in most of the frames. Hence, identifying frames that contain the jersey number is a key sub-problem to tackle. To address these issues, we propose a robust keyframe identification module that extracts frames containing essential high-level information about the jersey number. A spatio-temporal network is then employed to model spatial and temporal context and predict the probabilities of jersey numbers in the video. Additionally, we adopt a multi-task loss function to predict the probability distribution of each digit separately. Extensive evaluations on the SoccerNet dataset demonstrate that incorporating our proposed keyframe identification module results in a significant 37.81% and 37.70% increase in the accuracies of 2 different test sets with domain gaps. These results highlight the effectiveness and importance of our approach in tackling the challenges of automatic jersey number detection in sports videos.
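The multi-task loss described, predicting the probability distribution of each digit separately, can be sketched as two classification heads whose cross-entropy losses are summed. The backbone features and the extra "absent" class for single-digit numbers are illustrative assumptions; the paper's backbone is a spatio-temporal network over keyframes.

```python
import torch
import torch.nn as nn

class DigitHeads(nn.Module):
    def __init__(self, feat_dim=256):
        super().__init__()
        self.tens = nn.Linear(feat_dim, 11)   # digits 0-9 plus "absent"
        self.units = nn.Linear(feat_dim, 10)

    def forward(self, feats):
        return self.tens(feats), self.units(feats)

heads = DigitHeads()
feats = torch.randn(8, 256)                   # features from the backbone
tens_logits, units_logits = heads(feats)
# e.g., jersey number 23 -> tens label 2, units label 3
tens_y = torch.randint(0, 11, (8,))
units_y = torch.randint(0, 10, (8,))
loss = (nn.functional.cross_entropy(tens_logits, tens_y)
        + nn.functional.cross_entropy(units_logits, units_y))
loss.backward()
```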

Enhancing Multi-modal Cooperation via Fine-grained Modality Valuation

  • paper_url: http://arxiv.org/abs/2309.06255
  • repo_url: None
  • paper_authors: Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu
  • for: To improve multi-modal learning by jointly integrating heterogeneous information from different modalities more effectively.
  • methods: A fine-grained modality valuation metric that evaluates each modality's contribution at the sample level. The valuation reveals that multi-modal models tend to rely on one specific modality, leaving others low-contributing; cooperation is then improved by enhancing the discriminative ability of low-contributing modalities in a targeted manner.
  • results: The method reasonably observes per-sample uni-modal contributions and achieves considerable improvement across different multi-modal models.
    Abstract One primary topic of multi-modal learning is to jointly incorporate heterogeneous information from different modalities. However, most models often suffer from unsatisfactory multi-modal cooperation, which could not jointly utilize all modalities well. Some methods are proposed to identify and enhance the worse learnt modality, but are often hard to provide the fine-grained observation of multi-modal cooperation at sample-level with theoretical support. Hence, it is essential to reasonably observe and improve the fine-grained cooperation between modalities, especially when facing realistic scenarios where the modality discrepancy could vary across different samples. To this end, we introduce a fine-grained modality valuation metric to evaluate the contribution of each modality at sample-level. Via modality valuation, we regretfully observe that the multi-modal model tends to rely on one specific modality, resulting in other modalities being low-contributing. We further analyze this issue and improve cooperation between modalities by enhancing the discriminative ability of low-contributing modalities in a targeted manner. Overall, our methods reasonably observe the fine-grained uni-modal contribution at sample-level and achieve considerable improvement on different multi-modal models.
    摘要 (使用简化字符串)一个主要的多样性学习主题是将不同Modalities中的异质数据集合在一起。然而,大多数模型通常会受到不满意的多样性合作,无法有效地使用所有Modalities。一些方法可以识别和提高不好学习的Modalities,但是往往无法在样本水平提供细化的多样性合作观察。因此,我们需要合理地观察和改进多样性合作,尤其是在面临现实情况下,模态差异可能会随样本不同而变化。为此,我们引入细化的模态价值度量来评估每个模态的样本级贡献。通过模态价值评估,我们 regretfully 发现,多模态模型往往会依赖于一个具体的模态,导致其他模态成为低贡献的。我们进一步分析这一问题,并通过提高低贡献模态的推诊能力来改善多样性合作。总的来说,我们的方法可以合理地观察细化的单模态贡献,并在不同的多模态模型上实现显著改进。

On the Injunction of XAIxArt

  • paper_url: http://arxiv.org/abs/2309.06227
  • repo_url: None
  • paper_authors: Cheshta Arora, Debarun Sarkar
  • for: This position paper examines the range of concerns engulfed in the injunction of explainable artificial intelligence in art (XAIxArt).
  • methods: Through a series of quick sub-questions, it points to the ambiguities concerning 'explanation' and the postpositivist tradition of 'relevant explanation'. Rejecting both, it argues that XAIxArt is a symptom of insecurity of the anthropocentric notion of art and a nostalgic desire to return to outmoded notions of authorship and human agency.
  • results: The paper justifies this stance by distinguishing an ornamentation model of explanation from a model of explanation as sense-making.
    Abstract The position paper highlights the range of concerns that are engulfed in the injunction of explainable artificial intelligence in art (XAIxArt). Through a series of quick sub-questions, it points towards the ambiguities concerning 'explanation' and the postpositivist tradition of 'relevant explanation'. Rejecting both 'explanation' and 'relevant explanation', the paper takes a stance that XAIxArt is a symptom of insecurity of the anthropocentric notion of art and a nostalgic desire to return to outmoded notions of authorship and human agency. To justify this stance, the paper makes a distinction between an ornamentation model of explanation to a model of explanation as sense-making.
    摘要 文章发表于XAIxArt的各种问题,包括'解释'和'有用的解释'的歧义,以及人类中心艺术的不安和宁静愿返回过时的作者和人类活动。文章根据解释模型和意义解释模型的区别,提出了这种姿势。Note: Please note that the translation is in Simplified Chinese, which is used in mainland China and Singapore, while Traditional Chinese is used in Taiwan, Hong Kong, and other parts of the world.

Unveiling Single-Bit-Flip Attacks on DNN Executables

  • paper_url: http://arxiv.org/abs/2309.06223
  • repo_url: None
  • paper_authors: Yanzuo Chen, Zhibo Liu, Yuanyuan Yuan, Sihang Hu, Tianxiang Li, Shuai Wang
  • for: Understanding and defending deep neural network (DNN) executables against bit-flip attacks (BFAs).
  • methods: An automated search tool identifies vulnerable bits in DNN executables, and practical attack vectors are derived that exploit the model structure in DNN executables (rather than relying on the strong assumptions about model weights made in prior work).
  • results: DNN executables compiled by deep learning compilers contain extensive, severe (e.g., single-bit-flip), and transferrable attack surfaces not present in models within high-level DNN frameworks, which attackers can exploit to deplete full model intelligence and control output labels.
    Abstract Recent research has shown that bit-flip attacks (BFAs) can manipulate deep neural networks (DNNs) via DRAM Rowhammer exploitations. Existing attacks are primarily launched over high-level DNN frameworks like PyTorch and flip bits in model weight files. Nevertheless, DNNs are frequently compiled into low-level executables by deep learning (DL) compilers to fully leverage low-level hardware primitives. The compiled code is usually high-speed and manifests dramatically distinct execution paradigms from high-level DNN frameworks. In this paper, we launch the first systematic study on the attack surface of BFA specifically for DNN executables compiled by DL compilers. We design an automated search tool to identify vulnerable bits in DNN executables and identify practical attack vectors that exploit the model structure in DNN executables with BFAs (whereas prior works make likely strong assumptions to attack model weights). DNN executables appear more "opaque" than models in high-level DNN frameworks. Nevertheless, we find that DNN executables contain extensive, severe (e.g., single-bit flip), and transferrable attack surfaces that are not present in high-level DNN models and can be exploited to deplete full model intelligence and control output labels. Our finding calls for incorporating security mechanisms in future DNN compilation toolchains.
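The fault primitive behind these attacks is easy to demonstrate in isolation: flipping a single bit of a float32 weight can change it drastically. The sketch below only simulates the flip in memory; actually inducing it via DRAM Rowhammer against a compiled executable is far more involved.

```python
import numpy as np

def flip_bit(weights: np.ndarray, index: int, bit: int) -> np.ndarray:
    flipped = weights.copy()
    view = flipped.view(np.uint32)      # reinterpret float32 bits
    view[index] ^= np.uint32(1 << bit)  # flip one bit in place
    return flipped

w = np.array([0.05, -0.12, 0.31], dtype=np.float32)
# Flipping a high exponent bit (bit 30) explodes a small weight:
print(w[0], "->", flip_bit(w, index=0, bit=30)[0])  # 0.05 -> ~1.7e+37
```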

SCP: Scene Completion Pre-training for 3D Object Detection

  • paper_url: http://arxiv.org/abs/2309.06199
  • repo_url: None
  • paper_authors: Yiming Shan, Yan Xia, Yuhong Chen, Daniel Cremers
  • for: To improve the performance of 3D object detectors while requiring less labeled data.
  • methods: Scene Completion Pre-training (SCP), which completes point cloud scenes to better capture the spatial and semantic relationships among objects in urban environments, and which serves as an auxiliary network without requiring additional datasets.
  • results: With SCP, existing state-of-the-art 3D detectors achieve comparable performance while relying on only 20% of the labeled data.
    Abstract 3D object detection using LiDAR point clouds is a fundamental task in the fields of computer vision, robotics, and autonomous driving. However, existing 3D detectors heavily rely on annotated datasets, which are both time-consuming and prone to errors during the process of labeling 3D bounding boxes. In this paper, we propose a Scene Completion Pre-training (SCP) method to enhance the performance of 3D object detectors with less labeled data. SCP offers three key advantages: (1) Improved initialization of the point cloud model. By completing the scene point clouds, SCP effectively captures the spatial and semantic relationships among objects within urban environments. (2) Elimination of the need for additional datasets. SCP serves as a valuable auxiliary network that does not impose any additional efforts or data requirements on the 3D detectors. (3) Reduction of the amount of labeled data for detection. With the help of SCP, the existing state-of-the-art 3D detectors can achieve comparable performance while only relying on 20% labeled data.
    摘要 三维对象检测使用激光点云是计算机视觉、 робо控和自动驾驶等领域的基本任务。然而,现有的三维检测器均依赖于标注过的数据集,这些数据集的标注过程昂贵且容易出错。在这篇论文中,我们提出了场景完成预训练(SCP)方法,以提高三维对象检测器的性能,并且需要更少的标注数据。SCP具有以下三个优势:1. 提高点云模型的初始化。通过完善场景点云,SCP可以有效地捕捉城市环境中物体之间的空间和semantic关系。2. 消除需要更多数据集的需求。SCP作为一个有价值的辅助网络,不需要额外的努力或数据要求。3. 降低检测需要的标注数据量。通过SCP的帮助,现有的状态对检测器可以在20%标注数据的情况下实现相同的性能。

360$^\circ$ from a Single Camera: A Few-Shot Approach for LiDAR Segmentation

  • paper_url: http://arxiv.org/abs/2309.06197
  • repo_url: None
  • paper_authors: Laurenz Reichardt, Nikolas Ebert, Oliver Wasenmüller
  • for: To propose an effective and streamlined few-shot approach to label-efficient LiDAR segmentation that narrows the gap to fully supervised methods.
  • methods: An image teacher network generates semantic predictions for LiDAR data within a single camera view; these pretrain a LiDAR segmentation student network, with optional fine-tuning on 360$^\circ$ data. The method is implemented modularly at the point level and generalizes to different architectures.
  • results: The method improves over current state-of-the-art label-efficient results and even surpasses some traditional fully supervised segmentation networks.
    Abstract Deep learning applications on LiDAR data suffer from a strong domain gap when applied to different sensors or tasks. In order for these methods to obtain similar accuracy on different data in comparison to values reported on public benchmarks, a large scale annotated dataset is necessary. However, in practical applications labeled data is costly and time consuming to obtain. Such factors have triggered various research in label-efficient methods, but a large gap remains to their fully-supervised counterparts. Thus, we propose ImageTo360, an effective and streamlined few-shot approach to label-efficient LiDAR segmentation. Our method utilizes an image teacher network to generate semantic predictions for LiDAR data within a single camera view. The teacher is used to pretrain the LiDAR segmentation student network, prior to optional fine-tuning on 360$^\circ$ data. Our method is implemented in a modular manner on the point level and as such is generalizable to different architectures. We improve over the current state-of-the-art results for label-efficient methods and even surpass some traditional fully-supervised segmentation networks.
    摘要 深度学习应用于激光数据受到不同感知器或任务的域隔差很强。为了使这些方法在不同数据上达到类似准确性,需要一个大规模的注意力标注数据集。然而,在实际应用中,标注数据昂贵和时间消耗。这些因素引发了各种研究label-efficient方法,但与完全监督方法之间仍有大的差距。因此,我们提出ImageTo360,一种高效的几极shot方法 для标签efficient LiDAR分割。我们的方法使用图像教师网络生成激光数据中的semantic预测,并在单个相机视图中使用这些预测来预训练LiDAR分割学生网络。我们的方法实现在点级别上,可以与不同的架构进行拓展。我们超越当前状态的域隔差标签方法,甚至超过了一些传统的完全监督分割网络。

A 3M-Hybrid Model for the Restoration of Unique Giant Murals: A Case Study on the Murals of Yongle Palace

  • paper_url: http://arxiv.org/abs/2309.06194
  • repo_url: None
  • paper_authors: Jing Yang, Nur Intan Raihana Ruhaiyem, Chichun Zhou
  • for: restore the Yongle Palace murals, which are valuable cultural heritage but have suffered damage
  • methods: propose a 3M-Hybrid model that leverages a pre-trained Vision Transformer model (VIT) and a multi-scale and multi-perspective strategy to address the challenges of domain bias and large defect restoration
  • results: improve SSIM and PSNR by 14.61% and 4.73%, respectively, compared to the best of four representative CNN models, and achieve favorable results in the final restoration of giant murals.
    Abstract The Yongle Palace murals, as valuable cultural heritage, have suffered varying degrees of damage, making their restoration of significant importance. However, the giant size and unique data of Yongle Palace murals present challenges for existing deep-learning based restoration methods: 1) The distinctive style introduces domain bias in traditional transfer learning-based restoration methods, while the scarcity of mural data further limits the applicability of these methods. 2) Additionally, the giant size of these murals results in a wider range of defect types and sizes, necessitating models with greater adaptability. Consequently, there is a lack of focus on deep learning-based restoration methods for the unique giant murals of Yongle Palace. Here, a 3M-Hybrid model is proposed to address these challenges. Firstly, based on the characteristic that the mural data frequency is prominent in the distribution of low and high frequency features, high and low frequency features are separately abstracted for complementary learning. Furthermore, we integrate a pre-trained Vision Transformer model (VIT) into the CNN module, allowing us to leverage the benefits of a large model while mitigating domain bias. Secondly, we mitigate seam and structural distortion issues resulting from the restoration of large defects by employing a multi-scale and multi-perspective strategy, including data segmentation and fusion. Experimental results demonstrate the efficacy of our proposed model. In regular-sized mural restoration, it improves SSIM and PSNR by 14.61% and 4.73%, respectively, compared to the best model among four representative CNN models. Additionally, it achieves favorable results in the final restoration of giant murals.

Glancing Future for Simultaneous Machine Translation

  • paper_url: http://arxiv.org/abs/2309.06179
  • repo_url: https://github.com/ictnlp/glance-simt
  • paper_authors: Shoutao Guo, Shaolei Zhang, Yang Feng
  • for: improving the translation capability of simultaneous machine translation (SiMT) models
  • methods: a curriculum-learning method that gradually reduces the available source information from the whole sentence down to the latency-matched prefix, bridging the gap between seq2seq and prefix2prefix training
  • results: outperforms strong baselines and is applicable to a wide range of SiMT methods
    Abstract Simultaneous machine translation (SiMT) outputs translation while reading the source sentence. Unlike conventional sequence-to-sequence (seq2seq) training, existing SiMT methods adopt the prefix-to-prefix (prefix2prefix) training, where the model predicts target tokens based on partial source tokens. However, the prefix2prefix training diminishes the ability of the model to capture global information and introduces forced predictions due to the absence of essential source information. Consequently, it is crucial to bridge the gap between the prefix2prefix training and seq2seq training to enhance the translation capability of the SiMT model. In this paper, we propose a novel method that glances future in curriculum learning to achieve the transition from the seq2seq training to prefix2prefix training. Specifically, we gradually reduce the available source information from the whole sentence to the prefix corresponding to that latency. Our method is applicable to a wide range of SiMT methods and experiments demonstrate that our method outperforms strong baselines.
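A minimal sketch of the curriculum schedule described above: the number of source tokens visible to the model shrinks from the full sentence (seq2seq) to the latency-matched prefix (prefix2prefix). The linear schedule is an assumption; the paper may use a different annealing.

```python
# Interpolate from full-sentence visibility down to the wait-k prefix.
def visible_source_length(src_len, prefix_len, step, total_steps):
    progress = min(step / total_steps, 1.0)
    extra = int((src_len - prefix_len) * (1.0 - progress))
    return prefix_len + max(extra, 0)

# e.g. a 20-token sentence, wait-3 prefix: early training glances at all
# 20 tokens, late training sees only the 3-token prefix a SiMT policy has.
for step in (0, 5000, 10000):
    print(step, visible_source_length(20, 3, step, total_steps=10000))
```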

Robust-MBDL: A Robust Multi-branch Deep Learning Based Model for Remaining Useful Life Prediction and Operational Condition Identification of Rotating Machines

  • paper_url: http://arxiv.org/abs/2309.06157
  • repo_url: None
  • paper_authors: Khoa Tran, Hai-Canh Vu, Lam Pham, Nassim Boudaoud
  • for: remaining useful life (RUL) prediction and operational condition (CO) identification for rotating machines
  • methods: the proposed system comprises three main components: (1) an LSTM autoencoder that denoises the vibration data; (2) feature extraction that generates time-domain, frequency-domain, and time-frequency features from the denoised data; (3) a novel and robust multi-branch deep learning network architecture that exploits the multiple features
  • results: evaluated on the XJTU-SY and PRONOSTIA benchmark datasets, the proposed system predicts RUL and CO accurately, outperforms state-of-the-art systems, and shows potential for real-world applications
    Abstract In this paper, a Robust Multi-branch Deep learning-based system for remaining useful life (RUL) prediction and condition operations (CO) identification of rotating machines is proposed. In particular, the proposed system comprises three main components: (1) an LSTM-Autoencoder to denoise the vibration data; (2) a feature extraction to generate time-domain, frequency-domain, and time-frequency based features from the denoised data; (3) a novel and robust multi-branch deep learning network architecture to exploit the multiple features. The performance of our proposed system was evaluated and compared to the state-of-the-art systems on two benchmark datasets of XJTU-SY and PRONOSTIA. The experimental results prove that our proposed system outperforms the state-of-the-art systems and presents potential for real-life applications on bearing machines.
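Component (1) can be sketched as a small LSTM autoencoder that compresses a noisy vibration window into a code and reconstructs a denoised version. Layer sizes, window length, and channel count below are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of the LSTM-autoencoder denoising stage.
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features=2, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                      # x: (B, T, n_features)
        _, (h, _) = self.encoder(x)            # h: (1, B, hidden)
        z = h[-1].unsqueeze(1).repeat(1, x.size(1), 1)  # repeat code along time
        dec, _ = self.decoder(z)
        return self.out(dec)                   # reconstructed (denoised) window

model = LSTMAutoencoder()
noisy = torch.randn(8, 256, 2)                 # stand-in vibration windows
loss = nn.functional.mse_loss(model(noisy), noisy)  # in practice: clean target
loss.backward()
```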

Measuring vagueness and subjectivity in texts: from symbolic to neural VAGO

  • paper_url: http://arxiv.org/abs/2309.06132
  • repo_url: None
  • paper_authors: Benjamin Icard, Vincent Claveau, Ghislain Atemezing, Paul Égré
  • for: developing automatic measures of vagueness and subjectivity in texts
  • methods: the expert system VAGO is introduced and validated on a small benchmark of fact vs. opinion sentences, then tested on the larger French press corpus FreSaDa, confirming that subjective markers are more prevalent in satirical than in regular texts; a neural clone of VAGO, based on a BERT-like architecture, is then trained on the symbolic VAGO scores obtained on FreSaDa
  • results: using explainability tools (LIME), the neural version proves useful for enriching the lexicons of the symbolic version and for producing versions in other languages
    Abstract We present a hybrid approach to the automated measurement of vagueness and subjectivity in texts. We first introduce the expert system VAGO, we illustrate it on a small benchmark of fact vs. opinion sentences, and then test it on the larger French press corpus FreSaDa to confirm the higher prevalence of subjective markers in satirical vs. regular texts. We then build a neural clone of VAGO, based on a BERT-like architecture, trained on the symbolic VAGO scores obtained on FreSaDa. Using explainability tools (LIME), we show the interest of this neural version for the enrichment of the lexicons of the symbolic version, and for the production of versions in other languages.
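To make the symbolic side concrete, here is a toy lexicon-based scorer in the spirit of VAGO: it counts vagueness and subjectivity markers per token. The three-word lexicons are made-up placeholders; VAGO's actual lexicons are far richer and typed.

```python
# Toy sketch of lexicon-based vagueness/subjectivity scoring.
VAGUE_MARKERS = {"some", "several", "roughly"}        # hypothetical entries
SUBJECTIVE_MARKERS = {"beautiful", "sadly", "clearly"}

def vago_scores(text):
    tokens = [t.strip(".,;!?").lower() for t in text.split()]
    n = max(len(tokens), 1)
    return {
        "vagueness": sum(t in VAGUE_MARKERS for t in tokens) / n,
        "subjectivity": sum(t in SUBJECTIVE_MARKERS for t in tokens) / n,
    }

print(vago_scores("Clearly, several beautiful results were roughly correct."))
```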

JOADAA: joint online action detection and action anticipation

  • paper_url: http://arxiv.org/abs/2309.06130
  • repo_url: None
  • paper_authors: Mohammed Guermal, Francois Bremond, Rui Dai, Abid Ali
  • for: both action anticipation and online action detection lack the complete past-present-future knowledge hierarchy, which makes inferring action dependencies difficult and limits performance
  • methods: fuse the two tasks into a single uniform architecture that combines action anticipation and online action detection, covering the missing dependencies on future information
  • results: the proposed model (JOADAA) is validated on three challenging datasets (THUMOS'14, CHARADES, and Multi-THUMOS) and achieves state-of-the-art results on both tasks
    Abstract Action anticipation involves forecasting future actions by connecting past events to future ones. However, this reasoning ignores the real-life hierarchy of events which is considered to be composed of three main parts: past, present, and future. We argue that considering these three main parts and their dependencies could improve performance. On the other hand, online action detection is the task of predicting actions in a streaming manner. In this case, one has access only to the past and present information. Therefore, in online action detection (OAD) the existing approaches miss semantics or future information which limits their performance. To sum up, for both of these tasks, the complete set of knowledge (past-present-future) is missing, which makes it challenging to infer action dependencies, therefore having low performances. To address this limitation, we propose to fuse both tasks into a single uniform architecture. By combining action anticipation and online action detection, our approach can cover the missing dependencies of future information in online action detection. This method referred to as JOADAA, presents a uniform model that jointly performs action anticipation and online action detection. We validate our proposed model on three challenging datasets: THUMOS'14, which is a sparsely annotated dataset with one action per time step, CHARADES, and Multi-THUMOS, two densely annotated datasets with more complex scenarios. JOADAA achieves SOTA results on these benchmarks for both tasks.

LEyes: A Lightweight Framework for Deep Learning-Based Eye Tracking using Synthetic Eye Images

  • paper_url: http://arxiv.org/abs/2309.06129
  • repo_url: https://github.com/dcnieho/byrneetal_leyes
  • paper_authors: Sean Anthony Byrne, Virmarie Maquiling, Marcus Nyström, Enkelejda Kasneci, Diederick C. Niehorster
  • for: This paper aims to address the problem of inadequate training datasets for gaze estimation techniques, which has hindered the deployment of deep learning models in real-world applications.
  • methods: The proposed framework, called Light Eyes (LEyes), uses simple light distributions to model key image features required for video-based eye tracking, facilitating easy configuration for training neural networks across diverse gaze-estimation tasks.
  • results: The authors demonstrate that models trained using LEyes outperform other state-of-the-art algorithms in terms of pupil and CR localization across well-known datasets, and a LEyes trained model outperforms the industry standard eye tracker using significantly more cost-effective hardware.
    Abstract Deep learning has bolstered gaze estimation techniques, but real-world deployment has been impeded by inadequate training datasets. This problem is exacerbated by both hardware-induced variations in eye images and inherent biological differences across the recorded participants, leading to both feature and pixel-level variance that hinders the generalizability of models trained on specific datasets. While synthetic datasets can be a solution, their creation is both time and resource-intensive. To address this problem, we present a framework called Light Eyes or "LEyes" which, unlike conventional photorealistic methods, only models key image features required for video-based eye tracking using simple light distributions. LEyes facilitates easy configuration for training neural networks across diverse gaze-estimation tasks. We demonstrate that models trained using LEyes outperform other state-of-the-art algorithms in terms of pupil and CR localization across well-known datasets. In addition, a LEyes trained model outperforms the industry standard eye tracker using significantly more cost-effective hardware. Going forward, we are confident that LEyes will revolutionize synthetic data generation for gaze estimation models, and lead to significant improvements of the next generation video-based eye trackers.
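The central idea, rendering only the light distributions an eye tracker actually needs, can be illustrated in a few lines: a dark Gaussian blob for the pupil and a small bright one for the corneal reflection (CR) on a noisy background, with the blob centers serving as free ground-truth labels. All sizes and intensities below are arbitrary choices, not LEyes' actual rendering model.

```python
# Toy synthetic eye image built from simple light distributions.
import numpy as np

def gaussian_blob(h, w, cy, cx, sigma):
    yy, xx = np.mgrid[0:h, 0:w]
    return np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))

def synth_eye(h=64, w=64):
    img = 0.5 + 0.05 * np.random.randn(h, w)             # gray, noisy background
    img -= 0.4 * gaussian_blob(h, w, 32, 30, sigma=8)    # dark pupil
    img += 0.5 * gaussian_blob(h, w, 28, 36, sigma=1.5)  # bright CR glint
    return np.clip(img, 0, 1)

image = synth_eye()   # image + known pupil (32,30) and CR (28,36) = training pair
```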

Fidelity-Induced Interpretable Policy Extraction for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.06097
  • repo_url: None
  • paper_authors: Xiao Liu, Wubing Chen, Mao Tan
  • for: improving the interpretability and trustworthiness of deep reinforcement learning (DRL) agents, so that users can understand their decision process and scrutinize their weaknesses
  • methods: after analyzing the optimization mechanism of existing interpretable policy extraction (IPE) methods, FIPE integrates a fidelity measurement into the reinforcement learning feedback, keeping the extracted policy consistent with the agent's behavior
  • results: in the complex control environment of StarCraft II, FIPE outperforms the baselines in both interaction performance and consistency, while remaining easy to understand
    Abstract Deep Reinforcement Learning (DRL) has achieved remarkable success in sequential decision-making problems. However, existing DRL agents make decisions in an opaque fashion, hindering the user from establishing trust and scrutinizing weaknesses of the agents. While recent research has developed Interpretable Policy Extraction (IPE) methods for explaining how an agent takes actions, their explanations are often inconsistent with the agent's behavior and thus, frequently fail to explain. To tackle this issue, we propose a novel method, Fidelity-Induced Policy Extraction (FIPE). Specifically, we start by analyzing the optimization mechanism of existing IPE methods, elaborating on the issue of ignoring consistency while increasing cumulative rewards. We then design a fidelity-induced mechanism by integrate a fidelity measurement into the reinforcement learning feedback. We conduct experiments in the complex control environment of StarCraft II, an arena typically avoided by current IPE methods. The experiment results demonstrate that FIPE outperforms the baselines in terms of interaction performance and consistency, meanwhile easy to understand.
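One simple way to read "integrating a fidelity measurement into the reinforcement learning feedback" is reward shaping: the interpretable surrogate is rewarded both for task return and for agreeing with the black-box agent. The sketch below is that reading under stated assumptions (binary agreement, a fixed mixing weight beta), not the paper's exact formulation.

```python
# Sketch of fidelity-induced reward shaping for an interpretable surrogate.
def fidelity_shaped_reward(env_reward, surrogate_action, agent_action, beta=0.5):
    """Reward the surrogate for task success and for matching the DRL agent."""
    fidelity = 1.0 if surrogate_action == agent_action else 0.0
    return env_reward + beta * fidelity

# e.g. at a state where the DRL agent chose action 2:
print(fidelity_shaped_reward(env_reward=0.1, surrogate_action=2, agent_action=2))
```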

A Machine Learning Framework to Deconstruct the Primary Drivers for Electricity Market Price Events

  • paper_url: http://arxiv.org/abs/2309.06082
  • repo_url: None
  • paper_authors: Milan Jain, Xueqing Sun, Sohom Datta, Abhishek Somani
  • for: analyzing and deconstructing the drivers of price formation in modern electricity markets, particularly price spike events under high renewable energy penetration
  • methods: a machine learning-based analysis framework that deconstructs the primary drivers of price spike events in markets with high variable renewable energy
  • results: the primary drivers of price spike events include renewable generation levels, weather, and market operation factors; the outcomes can serve market design, renewable dispatch and curtailment, operations, and cyber-security applications
    Abstract Power grids are moving towards 100% renewable energy source bulk power grids, and the overall dynamics of power system operations and electricity markets are changing. The electricity markets are not only dispatching resources economically but also taking into account various controllable actions like renewable curtailment, transmission congestion mitigation, and energy storage optimization to ensure grid reliability. As a result, price formations in electricity markets have become quite complex. Traditional root cause analysis and statistical approaches are rendered inapplicable to analyze and infer the main drivers behind price formation in the modern grid and markets with variable renewable energy (VRE). In this paper, we propose a machine learning-based analysis framework to deconstruct the primary drivers for price spike events in modern electricity markets with high renewable energy. The outcomes can be utilized for various critical aspects of market design, renewable dispatch and curtailment, operations, and cyber-security applications. The framework can be applied to any ISO or market data; however, in this paper, it is applied to open-source publicly available datasets from California Independent System Operator (CAISO) and ISO New England (ISO-NE).
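A hedged sketch of the framework's core step, predicting price from market features and then ranking drivers: the snippet below uses a gradient-boosted model and permutation importance on synthetic data. The feature names and the synthetic price equation are assumptions for illustration only.

```python
# Rank drivers of price formation via permutation importance (synthetic data).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
features = ["load", "solar_gen", "wind_gen", "congestion", "outage_mw"]
X = rng.normal(size=(1000, len(features)))
price = 40 + 25 * X[:, 3] + 10 * X[:, 0] + rng.normal(size=1000)  # toy generator

model = GradientBoostingRegressor().fit(X, price)
imp = permutation_importance(model, X, price, n_repeats=10, random_state=0)
for name, score in sorted(zip(features, imp.importances_mean), key=lambda p: -p[1]):
    print(f"{name:12s} {score:.3f}")   # congestion and load dominate, as built in
```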

BatMan-CLR: Making Few-shots Meta-Learners Resilient Against Label Noise

  • paper_url: http://arxiv.org/abs/2309.06046
  • repo_url: None
  • paper_authors: Jeroen M. Galjaard, Robert Birke, Juan Perez, Lydia Y. Chen
  • for: studies the negative impact of label noise in meta-learning and proposes two sampling techniques to make meta-learners resilient against it
  • methods: an extensive analysis of state-of-the-art gradient-based $N$-way $K$-shot learners under varying levels of label noise, plus two sampling techniques, manifold (Man) and batch manifold (BatMan), which turn noisy supervised learners into semi-supervised ones to increase the utility of noisy labels
  • results: the accuracy of gradient-based $N$-way $K$-shot learners drops by up to 42% on the Omniglot and CifarFS datasets when meta-training is affected by label noise; with Man and BatMan, even with 60% wrong labels, the meta-testing accuracy drop is limited to 2.5, 9.4, and 1.1 percentage points across the Omniglot, CifarFS, and MiniImagenet datasets
    Abstract The negative impact of label noise is well studied in classical supervised learning yet remains an open research question in meta-learning. Meta-learners aim to adapt to unseen learning tasks by learning a good initial model in meta-training and consecutively fine-tuning it according to new tasks during meta-testing. In this paper, we present the first extensive analysis of the impact of varying levels of label noise on the performance of state-of-the-art meta-learners, specifically gradient-based $N$-way $K$-shot learners. We show that the accuracy of Reptile, iMAML, and foMAML drops by up to 42% on the Omniglot and CifarFS datasets when meta-training is affected by label noise. To strengthen the resilience against label noise, we propose two sampling techniques, namely manifold (Man) and batch manifold (BatMan), which transform the noisy supervised learners into semi-supervised ones to increase the utility of noisy labels. We first construct manifold samples of $N$-way $2$-contrastive-shot tasks through augmentation, learning the embedding via a contrastive loss in meta-training, and then perform classification through zeroing on the embedding in meta-testing. We show that our approach can effectively mitigate the impact of meta-training label noise. Even with 60% wrong labels, BatMan and Man can limit the meta-testing accuracy drop to 2.5, 9.4, and 1.1 percentage points, respectively, with existing meta-learners across the Omniglot, CifarFS, and MiniImagenet datasets.

Update Monte Carlo tree search (UMCTS) algorithm for heuristic global search of sizing optimization problems for truss structures

  • paper_url: http://arxiv.org/abs/2309.06045
  • repo_url: None
  • paper_authors: Fu-Yao Ko, Katsuyuki Suzuki, Kazuo Yonekura
  • for: optimization of truss structures
  • methods: reinforcement learning (RL) and Monte Carlo tree search (MCTS) with upper confidence bound (UCB)
  • results: an efficient optimization algorithm with computation time at least ten times faster than the branch and bound (BB) method, stably achieving better solutions than other conventional methods.
    Abstract Sizing optimization of truss structures is a complex computational problem, and the reinforcement learning (RL) is suitable for dealing with multimodal problems without gradient computations. In this paper, a new efficient optimization algorithm called update Monte Carlo tree search (UMCTS) is developed to obtain the appropriate design for truss structures. UMCTS is an RL-based method that combines the novel update process and Monte Carlo tree search (MCTS) with the upper confidence bound (UCB). Update process means that in each round, the optimal cross-sectional area of each member is determined by search tree, and its initial state is the final state in the previous round. In the UMCTS algorithm, an accelerator for the number of selections for member area and iteration number is introduced to reduce the computation time. Moreover, for each state, the average reward is replaced by the best reward collected on the simulation process to determine the optimal solution. The proposed optimization method is examined on some benchmark problems of planar and spatial trusses with discrete sizing variables to demonstrate the efficiency and validity. It is shown that the computation time for the proposed approach is at least ten times faster than the branch and bound (BB) method. The numerical results indicate that the proposed method stably achieves better solution than other conventional methods.
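The two ingredients named above, UCB selection and replacing the average reward with the best reward collected in simulation, can be sketched compactly; the candidate cross-sectional areas, toy objective, and exploration constant below are placeholders, not the paper's setup.

```python
# UCB selection over candidate member areas, scored by best-so-far reward.
import math, random

class Node:
    def __init__(self, area):
        self.area, self.visits, self.best_reward = area, 0, float("-inf")

def ucb_select(children, c=1.4):
    total = sum(ch.visits for ch in children) + 1
    def score(ch):
        if ch.visits == 0:
            return float("inf")                 # explore unvisited areas first
        return ch.best_reward + c * math.sqrt(math.log(total) / ch.visits)
    return max(children, key=score)

children = [Node(a) for a in (10.0, 14.2, 19.9)]   # candidate cross-sections
for _ in range(100):
    node = ucb_select(children)
    reward = -abs(node.area - 14.2) + random.gauss(0, 0.1)  # toy objective
    node.visits += 1
    node.best_reward = max(node.best_reward, reward)        # best, not mean
print(max(children, key=lambda n: n.best_reward).area)      # -> 14.2
```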

Learning Score-based Grasping Primitive for Human-assisting Dexterous Grasping

  • paper_url: http://arxiv.org/abs/2309.06038
  • repo_url: None
  • paper_authors: Tianhao Wu, Mingdong Wu, Jiyao Zhang, Yunchong Gan, Hao Dong
  • for: assisting users in grasping objects with a robotic hand in situations where human hands are unavailable or unsuitable
  • methods: proposes the new task of human-assisting dexterous grasping, training a policy that controls the robot hand's fingers via two sub-modules: a hand-object-conditional grasping primitive, Grasping Gradient Field (GraspGF), and a history-conditional residual policy
  • results: experiments demonstrate the superiority of the proposed method over baselines, highlighting its user-awareness and practicality for real-world applications; code and demonstrations are available at https://sites.google.com/view/graspgf
    Abstract The use of anthropomorphic robotic hands for assisting individuals in situations where human hands may be unavailable or unsuitable has gained significant importance. In this paper, we propose a novel task called human-assisting dexterous grasping that aims to train a policy for controlling a robotic hand's fingers to assist users in grasping objects. Unlike conventional dexterous grasping, this task presents a more complex challenge as the policy needs to adapt to diverse user intentions, in addition to the object's geometry. We address this challenge by proposing an approach consisting of two sub-modules: a hand-object-conditional grasping primitive called Grasping Gradient Field~(GraspGF), and a history-conditional residual policy. GraspGF learns `how' to grasp by estimating the gradient from a success grasping example set, while the residual policy determines `when' and at what speed the grasping action should be executed based on the trajectory history. Experimental results demonstrate the superiority of our proposed method compared to baselines, highlighting the user-awareness and practicality in real-world applications. The codes and demonstrations can be viewed at "https://sites.google.com/view/graspgf".

Automatically Estimating the Effort Required to Repay Self-Admitted Technical Debt

  • paper_url: http://arxiv.org/abs/2309.06020
  • repo_url: https://github.com/yikun-li/satd-repayment-effort
  • paper_authors: Yikun Li, Mohamed Soliman, Paris Avgeriou
  • for: improving the prioritization of technical debt repayment and resource-allocation efficiency, with a focus on Self-Admitted Technical Debt (SATD)
  • methods: a large-scale SATD dataset of 341,740 items from 2,568,728 commits across 1,060 Apache repositories, with machine learning methods such as BERT and TextCNN used to automatically estimate SATD repayment effort
  • results: different types of SATD require different levels of repayment effort: code/design, requirement, and test debt demand more effort, while documentation debt demands less; keywords associated with different effort levels during SATD repayment are also summarized
    Abstract Technical debt refers to the consequences of sub-optimal decisions made during software development that prioritize short-term benefits over long-term maintainability. Self-Admitted Technical Debt (SATD) is a specific form of technical debt, explicitly documented by developers within software artifacts such as source code comments and commit messages. As SATD can hinder software development and maintenance, it is crucial to address and prioritize it effectively. However, current methodologies lack the ability to automatically estimate the repayment effort of SATD based on its textual descriptions. To address this limitation, we propose a novel approach for automatically estimating SATD repayment effort, utilizing a comprehensive dataset comprising 341,740 SATD items from 2,568,728 commits across 1,060 Apache repositories. Our findings show that different types of SATD require varying levels of repayment effort, with code/design, requirement, and test debt demanding greater effort compared to non-SATD items, while documentation debt requires less. We introduce and evaluate machine learning methodologies, particularly BERT and TextCNN, which outperforms classic machine learning methods and the naive baseline in estimating repayment effort. Additionally, we summarize keywords associated with varying levels of repayment effort that occur during SATD repayment. Our contributions aim to enhance the prioritization of SATD repayment effort and resource allocation efficiency, ultimately benefiting software development and maintainability.
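A hedged sketch of one of the evaluated model families: a TextCNN that maps a tokenized SATD comment to an effort level. The vocabulary size, kernel widths, and the use of three effort buckets are illustrative assumptions, not the paper's configuration.

```python
# TextCNN sketch for classifying SATD repayment effort from comment text.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab=20000, emb=128, n_classes=3):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb, 100, k) for k in (3, 4, 5)])  # n-gram filters
        self.fc = nn.Linear(300, n_classes)               # low/medium/high effort

    def forward(self, tokens):                 # tokens: (B, T) int ids
        x = self.emb(tokens).transpose(1, 2)   # (B, emb, T) for Conv1d
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

model = TextCNN()
batch = torch.randint(0, 20000, (4, 50))       # stand-in tokenized comments
logits = model(batch)                          # (4, 3) effort-level scores
```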

Molecular Conformation Generation via Shifting Scores

  • paper_url: http://arxiv.org/abs/2309.09985
  • repo_url: None
  • paper_authors: Zihan Zhou, Ruiying Liu, Chaolong Ying, Ruimao Zhang, Tianshu Yu
  • for: molecular conformation generation, a critical task in computational chemistry that produces three-dimensional conformer geometries for a given molecule
  • methods: a novel generation approach driven by the observation that disintegrating a molecule can be viewed as casting increasing force fields on its composing atoms, so the distribution of the change of inter-atomic distance shifts from Gaussian to Maxwell-Boltzmann
  • results: experiments on molecular datasets demonstrate the advantages of the proposed shifting distribution over the state of the art
    Abstract Molecular conformation generation, a critical aspect of computational chemistry, involves producing the three-dimensional conformer geometry for a given molecule. Generating molecular conformation via diffusion requires learning to reverse a noising process. Diffusion on inter-atomic distances instead of conformation preserves SE(3)-equivalence and shows superior performance compared to alternative techniques, whereas related generative modelings are predominantly based upon heuristical assumptions. In response to this, we propose a novel molecular conformation generation approach driven by the observation that the disintegration of a molecule can be viewed as casting increasing force fields to its composing atoms, such that the distribution of the change of inter-atomic distance shifts from Gaussian to Maxwell-Boltzmann distribution. The corresponding generative modeling ensures a feasible inter-atomic distance geometry and exhibits time reversibility. Experimental results on molecular datasets demonstrate the advantages of the proposed shifting distribution compared to the state-of-the-art.
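For concreteness, the two limiting distributions named above can be written in standard textbook form (the paper's exact noise schedule may differ): if each Cartesian coordinate of an atom is perturbed by independent $\mathcal{N}(0,\sigma^2)$ noise, a single coordinate's perturbation is Gaussian,

$$p(x)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right),$$

while the magnitude of the resulting 3D displacement, which governs the change of inter-atomic distance at large noise scales, follows a Maxwell-Boltzmann distribution,

$$p(r)=\sqrt{\frac{2}{\pi}}\,\frac{r^{2}}{\sigma^{3}}\exp\!\left(-\frac{r^{2}}{2\sigma^{2}}\right),\qquad r\ge 0.$$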

DSLOT-NN: Digit-Serial Left-to-Right Neural Network Accelerator

  • paper_url: http://arxiv.org/abs/2309.06019
  • repo_url: None
  • paper_authors: Muhammad Sohail Ibrahim, Muhammad Usman, Malik Zohaib Nisar, Jeong-A, Lee
  • for: accelerating inference of the convolution operation in deep neural networks (DNNs) while saving power and energy
  • methods: a digit-serial left-to-right (DSLOT) arithmetic-based processing technique, DSLOT-NN, built from low-latency most-significant-digit-first (MSDF, also called online) multipliers and adders that can assess and terminate ineffective convolutions
  • results: implemented on a Xilinx Virtex-7 FPGA and compared with the state-of-the-art Stripes, the design shows power savings, a shorter cycle time, and approximately 50% higher OPS per watt
    Abstract We propose a Digit-Serial Left-tO-righT (DSLOT) arithmetic based processing technique called DSLOT-NN with aim to accelerate inference of the convolution operation in the deep neural networks (DNNs). The proposed work has the ability to assess and terminate the ineffective convolutions which results in massive power and energy savings. The processing engine is comprised of low-latency most-significant-digit-first (MSDF) (also called online) multipliers and adders that processes data from left-to-right, allowing the execution of subsequent operations in digit-pipelined manner. Use of online operators eliminates the need for the development of complex mechanism of identifying the negative activation, as the output with highest weight value is generated first, and the sign of the result can be identified as soon as first non-zero digit is generated. The precision of the online operators can be tuned at run-time, making them extremely useful in situations where accuracy can be compromised for power and energy savings. The proposed design has been implemented on Xilinx Virtex-7 FPGA and is compared with state-of-the-art Stripes on various performance metrics. The results show the proposed design presents power savings, has shorter cycle time, and approximately 50% higher OPS per watt.
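The early-termination idea can be illustrated with a toy most-significant-digit-first evaluation: activation bits are consumed from the most significant downward, and once the remaining bits can no longer make the pre-activation positive, the ReLU output is known to be zero and the convolution can stop. This plain-Python model ignores the actual online signed-digit hardware arithmetic; it only sketches the principle.

```python
# Toy MSD-first dot product with early termination for ReLU outputs.
def msdf_relu_dot(weights, activations, n_bits=8):
    """Activations in [0,1); weights may be negative. Returns max(0, w.a)."""
    partial = 0.0
    for k in range(1, n_bits + 1):
        bit_weight = 2.0 ** -k
        for w, a in zip(weights, activations):
            bit = int(a * 2 ** k) & 1          # k-th fractional bit of a
            partial += w * bit * bit_weight
        # loosest possible remaining contribution from bits k+1..n
        slack = sum(abs(w) for w in weights) * (2.0 ** -k)
        if partial + slack < 0:                # sign already decided: negative
            return 0.0                         # terminate the convolution early
    return max(partial, 0.0)

print(msdf_relu_dot([-0.9, 0.1], [0.75, 0.5]))  # -> 0.0, decided after 2 bits
```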

SoccerNet 2023 Challenges Results

  • paper_url: http://arxiv.org/abs/2309.06006
  • repo_url: https://github.com/lRomul/ball-action-spotting
  • paper_authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, Chen Chen, Fabian Deuser, Feng Yan, Fufu Yu, Gal Shitrit, Guanshuo Wang, Gyusik Choi, Hankyul Kim, Hao Guo, Hasby Fahrudin, Hidenari Koguchi, Håkan Ardö, Ibrahim Salah, Ido Yerushalmy, Iftikar Muhammad, Ikuma Uchida, Ishay Be’ery, Jaonary Rabarisoa, Jeongae Lee, Jiajun Fu, Jianqin Yin, Jinghang Xu, Jongho Nang, Julien Denize, Junjie Li, Junpei Zhang, Juntae Kim, Kamil Synowiec, Kenji Kobayashi, Kexin Zhang, Konrad Habel, Kota Nakajima, Licheng Jiao, Lin Ma, Lizhi Wang, Luping Wang, Menglong Li, Mengying Zhou, Mohamed Nasr, Mohamed Abdelwahed, Mykola Liashuha, Nikolay Falaleev, Norbert Oswald, Qiong Jia, Quoc-Cuong Pham, Ran Song, Romain Hérault, Rui Peng, Ruilong Chen, Ruixuan Liu, Ruslan Baikulov, Ryuto Fukushima, Sergio Escalera, Seungcheon Lee, Shimin Chen, Shouhong Ding, Taiga Someya, Thomas B. Moeslund, Tianjiao Li, Wei Shen, Wei Zhang, Wei Li, Wei Dai, Weixin Luo, Wending Zhao, Wenjie Zhang, Xinquan Yang, Yanbiao Ma, Yeeun Joo, Yingsen Zeng, Yiyang Gan, Yongqiang Zhu, Yujie Zhong, Zheng Ruan, Zhiheng Li, Zhijian Huang, Ziyu Meng
  • for: presents the results of the SoccerNet 2023 video understanding challenges (third annual edition), comprising seven vision-based tasks split into three main themes
  • methods: the tasks cover broadcast video understanding (action spotting, ball action spotting, dense video captioning), field understanding (camera calibration), and player understanding (re-identification, multiple object tracking, jersey number recognition)
  • results: compared to previous editions, tasks (2), (3), and (7) are novel with new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches
    Abstract The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to the soccer ball change of state, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task of (4) camera calibration, focusing on retrieving the intrinsic and extrinsic camera parameters from images. The third and last theme, player understanding, is composed of three low-level tasks related to extracting information about the players: (5) re-identification, focusing on retrieving the same players across multiple views, (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams, and (7) jersey number recognition, focusing on recognizing the jersey number of players from tracklets. Compared to the previous editions of the SoccerNet challenges, tasks (2-3-7) are novel, including new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches. More information on the tasks, challenges, and leaderboards are available on https://www.soccer-net.org. Baselines and development kits can be found on https://github.com/SoccerNet.

Life-inspired Interoceptive Artificial Intelligence for Autonomous and Adaptive Agents

  • paper_url: http://arxiv.org/abs/2309.05999
  • repo_url: None
  • paper_authors: Sungwoo Lee, Younghyun Oh, Hyunhoe An, Hyebhin Yoon, Karl J. Friston, Seok Jun Hong, Choong-Wan Woo
  • for: building autonomous and adaptive artificial agents that choose goals based on their needs and survive in ever-changing environments
  • methods: draws on interoception, the process of monitoring one's internal environment to keep it within certain bounds, factorizing internal-environment state variables from external ones and adopting life-inspired mathematical properties of internal environment states
  • results: offers a new perspective on how interoception can help build autonomous and adaptive agents by integrating the legacy of cybernetics with recent advances in theories of life, reinforcement learning, and neuroscience
    Abstract Building autonomous (i.e., choosing goals based on one's needs) and adaptive (i.e., surviving in ever-changing environments) agents has been a holy grail of artificial intelligence (AI). A living organism is a prime example of such an agent, offering important lessons about adaptive autonomy. Here, we focus on interoception, a process of monitoring one's internal environment to keep it within certain bounds, which underwrites the survival of an organism. To develop AI with interoception, we need to factorize the state variables representing internal environments from external environments and adopt life-inspired mathematical properties of internal environment states. This paper offers a new perspective on how interoception can help build autonomous and adaptive agents by integrating the legacy of cybernetics with recent advances in theories of life, reinforcement learning, and neuroscience.

Goal Space Abstraction in Hierarchical Reinforcement Learning via Reachability Analysis

  • paper_url: http://arxiv.org/abs/2309.07168
  • repo_url: None
  • paper_authors: Mehdi Zadem, Sergio Mover, Sao Mai Nguyen
  • for: improving the efficiency and transferability of open-ended learning by representing goals symbolically
  • methods: a developmental mechanism for subgoal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states with similar roles in the task, while preserving information about the environment dynamics
  • results: an HRL algorithm that gradually learns this representation along with the policies; evaluated on navigation tasks, the learned representation is interpretable and yields data efficiency
    Abstract Open-ended learning benefits immensely from the use of symbolic methods for goal representation as they offer ways to structure knowledge for efficient and transferable learning. However, the existing Hierarchical Reinforcement Learning (HRL) approaches relying on symbolic reasoning are often limited as they require a manual goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this work, we propose a developmental mechanism for subgoal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We create a HRL algorithm that gradually learns this representation along with the policies and evaluate it on navigation tasks to show the learned representation is interpretable and results in data efficiency.

Knowledge-Guided Short-Context Action Anticipation in Human-Centric Videos

  • paper_url: http://arxiv.org/abs/2309.05943
  • repo_url: None
  • paper_authors: Sarthak Bhagat, Simon Stepputtis, Joseph Campbell, Katia Sycara
  • for: anticipating long-term human actions from short video segments, which can speed up editing workflows through improved suggestions while fostering creativity by suggesting narratives
  • methods: a transformer network is imbued with a symbolic knowledge graph, boosting certain aspects of its attention mechanism at run-time to anticipate actions in video segments
  • results: on the Breakfast and 50Salads benchmark datasets, the approach outperforms current state-of-the-art methods for long-term action anticipation from short video context by up to 9%
    Abstract This work focuses on anticipating long-term human actions, particularly using short video segments, which can speed up editing workflows through improved suggestions while fostering creativity by suggesting narratives. To this end, we imbue a transformer network with a symbolic knowledge graph for action anticipation in video segments by boosting certain aspects of the transformer's attention mechanism at run-time. Demonstrated on two benchmark datasets, Breakfast and 50Salads, our approach outperforms current state-of-the-art methods for long-term action anticipation using short video context by up to 9%.
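One common way to "boost certain aspects of the attention mechanism at run-time" is to add a knowledge-derived bias to the attention logits before the softmax; the sketch below does exactly that with a made-up relatedness vector. The bias construction and mixing weight alpha are assumptions, not the paper's mechanism.

```python
# Attention with an additive knowledge-graph bias on the logits.
import torch

def knowledge_biased_attention(q, k, v, kg_bias, alpha=1.0):
    """q: (B,1,D), k/v: (B,T,D), kg_bias: (B,1,T) relatedness scores."""
    logits = q @ k.transpose(1, 2) / k.shape[-1] ** 0.5  # (B, 1, T)
    weights = torch.softmax(logits + alpha * kg_bias, dim=-1)
    return weights @ v                                   # (B, 1, D)

q, k, v = torch.randn(2, 1, 32), torch.randn(2, 8, 32), torch.randn(2, 8, 32)
bias = torch.zeros(2, 1, 8); bias[:, :, 3] = 2.0         # KG says: frame 3 matters
out = knowledge_biased_attention(q, k, v, bias)
```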

Answering Subjective Induction Questions on Products by Summarizing Multi-sources Multi-viewpoints Knowledge

  • paper_url: http://arxiv.org/abs/2309.05938
  • repo_url: None
  • paper_authors: Yufeng Zhang, Meng-xiang Wang, Jianxing Yu
  • for: proposes the new task of Answering Subjective Induction Questions on Products (SUBJPQA), where answers are non-unique and can be interpreted from many perspectives
  • methods: a three-step method: retrieve answer-related clues from multiple knowledge sources of facts and opinions (supplemented with implicit commonsense facts), capture their relevance to the question via interactive attention, and aggregate the clues with a reinforcement-based summarizer whose template-controlled decoder outputs a comprehensive, multi-perspective answer
  • results: since no evaluated benchmark exists for this new task, a large-scale dataset named SupQA is constructed, with 48,352 samples across 15 product domains; evaluation results show the effectiveness of the approach
    Abstract This paper proposes a new task in the field of Answering Subjective Induction Question on Products (SUBJPQA). The answer to this kind of question is non-unique, but can be interpreted from many perspectives. For example, the answer to 'whether the phone is heavy' has a variety of different viewpoints. A satisfied answer should be able to summarize these subjective opinions from multiple sources and provide objective knowledge, such as the weight of a phone. That is quite different from the traditional QA task, in which the answer to a factoid question is unique and can be found from a single data source. To address this new task, we propose a three-steps method. We first retrieve all answer-related clues from multiple knowledge sources on facts and opinions. The implicit commonsense facts are also collected to supplement the necessary but missing contexts. We then capture their relevance with the questions by interactive attention. Next, we design a reinforcement-based summarizer to aggregate all these knowledgeable clues. Based on a template-controlled decoder, we can output a comprehensive and multi-perspective answer. Due to the lack of a relevant evaluated benchmark set for the new task, we construct a large-scale dataset, named SupQA, consisting of 48,352 samples across 15 product domains. Evaluation results show the effectiveness of our approach.

MatSciML: A Broad, Multi-Task Benchmark for Solid-State Materials Modeling

  • paper_url: http://arxiv.org/abs/2309.05934
  • repo_url: https://github.com/intellabs/matsciml
  • paper_authors: Kin Long Kelvin Lee, Carmelo Gonzales, Marcel Nassar, Matthew Spellings, Mikhail Galkin, Santiago Miret
  • for: proposes a new benchmark for modeling materials science with machine learning, focused on solid-state materials with periodic crystal structures
  • methods: builds on open-source datasets, including large-scale ones such as OpenCatalyst, OQMD, NOMAD, the Carolina Materials Database, and Materials Project, providing simulated energies, atomic forces, material bandgaps, and space-group classification data for model training and evaluation
  • results: the diversity of properties and datasets enables multi-task and multi-dataset learning for solid-state materials; using MatSci ML, the performance of different graph neural networks and equivariant point cloud networks is evaluated across single-task, multi-task, and multi-data learning scenarios
    Abstract We propose MatSci ML, a novel benchmark for modeling MATerials SCIence using Machine Learning (MatSci ML) methods focused on solid-state materials with periodic crystal structures. Applying machine learning methods to solid-state materials is a nascent field with substantial fragmentation largely driven by the great variety of datasets used to develop machine learning models. This fragmentation makes comparing the performance and generalizability of different methods difficult, thereby hindering overall research progress in the field. Building on top of open-source datasets, including large-scale datasets like the OpenCatalyst, OQMD, NOMAD, the Carolina Materials Database, and Materials Project, the MatSci ML benchmark provides a diverse set of materials systems and properties data for model training and evaluation, including simulated energies, atomic forces, material bandgaps, as well as classification data for crystal symmetries via space groups. The diversity of properties in MatSci ML makes the implementation and evaluation of multi-task learning algorithms for solid-state materials possible, while the diversity of datasets facilitates the development of new, more generalized algorithms and methods across multiple datasets. In the multi-dataset learning setting, MatSci ML enables researchers to combine observations from multiple datasets to perform joint prediction of common properties, such as energy and forces. Using MatSci ML, we evaluate the performance of different graph neural networks and equivariant point cloud networks on several benchmark tasks spanning single task, multitask, and multi-data learning scenarios. Our open-source code is available at https://github.com/IntelLabs/matsciml.
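A schematic of multi-dataset, multi-task training in the spirit described above: a shared encoder with one head per property, cycling round-robin over datasets so that common targets (e.g., energy) are predicted jointly. This is a generic sketch with stand-in data, not the MatSci ML package's actual API.

```python
# Round-robin multi-dataset training with a shared encoder and per-task heads.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU())   # stand-in for a GNN
heads = nn.ModuleDict({"energy": nn.Linear(128, 1),
                       "bandgap": nn.Linear(128, 1)})
opt = torch.optim.Adam(list(encoder.parameters()) + list(heads.parameters()))

def fake_batch(task):          # placeholder for per-dataset dataloaders
    return torch.randn(16, 64), torch.randn(16, 1), task

for step in range(100):
    for task in ("energy", "bandgap"):        # round-robin over datasets
        x, y, _ = fake_batch(task)
        loss = nn.functional.mse_loss(heads[task](encoder(x)), y)
        opt.zero_grad(); loss.backward(); opt.step()
```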

Combining deep learning and street view imagery to map smallholder crop types

  • paper_url: http://arxiv.org/abs/2309.05930
  • repo_url: None
  • paper_authors: Jordi Laguarta, Thomas Friedel, Sherrie Wang
  • for: creating crop type maps at scale to monitor yield progress, project global crop production, and plan effective policies
  • methods: deep learning applied to Google Street View imagery automatically generates crop type ground references: a curated set of street-view images containing crop fields is used to train a model that predicts crop type, and the predicted labels are combined with remote sensing time series to create a wall-to-wall crop type map
  • results: in Thailand, the resulting country-wide map of rice, cassava, maize, and sugarcane achieves 93% accuracy, offering a fast, inexpensive, and accurate way to map crop types at scale, especially in underserved smallholder regions
    Abstract Accurate crop type maps are an essential source of information for monitoring yield progress at scale, projecting global crop production, and planning effective policies. To date, however, crop type maps remain challenging to create in low and middle-income countries due to a lack of ground truth labels for training machine learning models. Field surveys are the gold standard in terms of accuracy but require an often-prohibitively large amount of time, money, and statistical capacity. In recent years, street-level imagery, such as Google Street View, KartaView, and Mapillary, has become available around the world. Such imagery contains rich information about crop types grown at particular locations and times. In this work, we develop an automated system to generate crop type ground references using deep learning and Google Street View imagery. The method efficiently curates a set of street view images containing crop fields, trains a model to predict crop type by utilizing weakly-labelled images from disparate out-of-domain sources, and combines predicted labels with remote sensing time series to create a wall-to-wall crop type map. We show that, in Thailand, the resulting country-wide map of rice, cassava, maize, and sugarcane achieves an accuracy of 93%. As the availability of roadside imagery expands, our pipeline provides a way to map crop types at scale around the globe, especially in underserved smallholder regions.
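The final stage of the pipeline can be sketched as: street-view-derived pseudo-labels supervise a classifier over satellite time series, which is then applied wall-to-wall. The data below are synthetic stand-ins, and the choice of a random forest is an assumption rather than the paper's model.

```python
# Train on pseudo-labeled points, then predict crop type for every pixel.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_points, n_timesteps = 500, 24               # e.g. 24 NDVI observations/year
ts = rng.normal(size=(n_points, n_timesteps)) # remote-sensing time series
pseudo = rng.integers(0, 4, size=n_points)    # street-view labels: rice,
                                              # cassava, maize, sugarcane (0..3)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(ts, pseudo)
full_grid = rng.normal(size=(10_000, n_timesteps))   # every pixel in the map
crop_map = clf.predict(full_grid)             # wall-to-wall crop type map
```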

Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals

  • paper_url: http://arxiv.org/abs/2309.05927
  • repo_url: None
  • paper_authors: Ran Liu, Ellen L. Zippi, Hadi Pouransari, Chris Sandino, Jingping Nie, Hanlin Goh, Erdrin Azemi, Ali Moin
  • for: a pretraining method that handles the distributional shifts multimodal biosignals exhibit between pretraining and inference, improving representation and prediction across diverse tasks and modalities
  • methods: a frequency-aware masked autoencoder ($\texttt{bio}$FAME) that parameterizes biosignal representations in frequency space via a frequency-aware transformer with a fixed-size Fourier-based operator for global token mixing, plus a frequency-maintain pretraining strategy that performs masked autoencoding in the latent space to preserve the frequency components within each input channel
  • results: an average improvement of 5.5% in classification accuracy over the previous state of the art on a diverse set of transfer experiments on unimodal time series, with robustness in modality mismatch scenarios such as unpredicted modality dropout or substitution
    Abstract Leveraging multimodal information from biosignals is vital for building a comprehensive representation of people's physical and mental states. However, multimodal biosignals often exhibit substantial distributional shifts between pretraining and inference datasets, stemming from changes in task specification or variations in modality compositions. To achieve effective pretraining in the presence of potential distributional shifts, we propose a frequency-aware masked autoencoder ($\texttt{bio}$FAME) that learns to parameterize the representation of biosignals in the frequency space. $\texttt{bio}$FAME incorporates a frequency-aware transformer, which leverages a fixed-size Fourier-based operator for global token mixing, independent of the length and sampling rate of inputs. To maintain the frequency components within each input channel, we further employ a frequency-maintain pretraining strategy that performs masked autoencoding in the latent space. The resulting architecture effectively utilizes multimodal information during pretraining, and can be seamlessly adapted to diverse tasks and modalities at test time, regardless of input size and order. We evaluated our approach on a diverse set of transfer experiments on unimodal time series, achieving an average of $\uparrow$5.5% improvement in classification accuracy over the previous state-of-the-art. Furthermore, we demonstrated that our architecture is robust in modality mismatch scenarios, including unpredicted modality dropout or substitution, proving its practical utility in real-world applications. Code will be available soon.
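A minimal sketch of a fixed-size Fourier-based mixing operator: whatever the input length or sampling rate, the signal is mapped to a fixed number of frequency bins, mixed there with a learned layer, and mapped back. The bin count and the real/imaginary weight sharing are assumptions, not bioFAME's exact operator.

```python
# Fixed-size Fourier token mixing, independent of input length.
import torch
import torch.nn as nn

class FourierMixer(nn.Module):
    def __init__(self, n_bins=64):
        super().__init__()
        self.n_bins = n_bins
        self.mix = nn.Linear(n_bins, n_bins)   # real-valued mixing of bins

    def forward(self, x):                      # x: (B, C, T), any length T
        T = x.shape[-1]
        spec = torch.fft.rfft(x, dim=-1)       # (B, C, T//2 + 1), complex
        fixed = torch.zeros(*spec.shape[:-1], self.n_bins,
                            dtype=spec.dtype, device=spec.device)
        k = min(spec.shape[-1], self.n_bins)
        fixed[..., :k] = spec[..., :k]         # truncate or zero-pad to n_bins
        mixed = torch.complex(self.mix(fixed.real), self.mix(fixed.imag))
        out = torch.zeros_like(spec)
        out[..., :k] = mixed[..., :k]          # place mixed bins back
        return torch.fft.irfft(out, n=T, dim=-1)

mixer = FourierMixer()
y = mixer(torch.randn(2, 3, 500))              # works for T=500 ...
z = mixer(torch.randn(2, 3, 128))              # ... and for T=128 alike
```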

On Regularized Sparse Logistic Regression

  • paper_url: http://arxiv.org/abs/2309.05925
  • repo_url: https://github.com/RohithM191/TSNE-on-Amazon-Fine-Food-reviews-Dataset
  • paper_authors: Mengyuan Zhang, Kai Liu
  • for: performing classification and feature selection simultaneously for high-dimensional data, using nonconvex penalties as well as the $\ell_1$ penalty
  • methods: new optimization frameworks for $\ell_1$-regularized and nonconvex-penalty-regularized sparse logistic regression (when the nonconvex penalties satisfy certain prerequisites), using different line-search criteria to guarantee good convergence behavior for different regularization terms
  • results: empirical experiments on binary classification tasks with real-world datasets show the proposed algorithms perform classification and feature selection effectively at a lower computational cost
    Abstract Sparse logistic regression aims to perform classification and feature selection simultaneously for high-dimensional data. Although many studies have been done to solve $\ell_1$-regularized logistic regression, there is no equivalently abundant literature about solving sparse logistic regression associated with nonconvex penalties. In this paper, we propose to solve $\ell_1$-regularized sparse logistic regression and some nonconvex penalties-regularized sparse logistic regression, when the nonconvex penalties satisfy some prerequisites, with similar optimization frameworks. In the proposed optimization frameworks, we utilize different line search criteria to guarantee good convergence performance for different regularization terms. Empirical experiments on binary classification tasks with real-world datasets demonstrate our proposed algorithms are capable of performing classification and feature selection effectively with a lower computational cost.
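As a reference point for the optimization frameworks discussed above, here is a compact proximal-gradient (ISTA) baseline for the $\ell_1$ case; the nonconvex variants would swap the soft-threshold for the penalty's own proximal operator, and the paper's line-search criteria would replace the fixed step size used here.

```python
# ISTA for l1-regularized logistic regression on synthetic sparse data.
import numpy as np

def soft_threshold(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def ista_logreg(X, y, lam=0.1, lr=0.1, iters=500):
    """X: (n,d), y in {0,1}. Minimizes logistic loss + lam * ||w||_1."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted probabilities
        grad = X.T @ (p - y) / len(y)          # gradient of the smooth part
        w = soft_threshold(w - lr * grad, lr * lam)  # proximal step
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_w = np.zeros(20); true_w[:3] = 2.0        # only 3 informative features
y = (rng.random(200) < 1 / (1 + np.exp(-X @ true_w))).astype(float)
print(np.nonzero(ista_logreg(X, y))[0])        # mostly the first 3 indices
```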

A Survey of Hallucination in Large Foundation Models

  • paper_url: http://arxiv.org/abs/2309.05922
  • repo_url: https://github.com/vr25/hallucination-foundation-model-survey
  • paper_authors: Vipula Rawte, Amit Sheth, Amitava Das
  • for: provides an extensive overview and analysis of the hallucination phenomenon in foundation models, with a particular focus on Large Foundation Models (LFMs)
  • methods: classifies the types of hallucination specific to LFMs, establishes evaluation criteria for assessing the extent of hallucination, examines existing mitigation strategies, and discusses directions for future research
  • results: a comprehensive examination of the challenges and solutions related to hallucination in LFMs
    Abstract Hallucination in a foundation model (FM) refers to the generation of content that strays from factual reality or includes fabricated information. This survey paper provides an extensive overview of recent efforts that aim to identify, elucidate, and tackle the problem of hallucination, with a particular focus on ``Large'' Foundation Models (LFMs). The paper classifies various types of hallucination phenomena that are specific to LFMs and establishes evaluation criteria for assessing the extent of hallucination. It also examines existing strategies for mitigating hallucination in LFMs and discusses potential directions for future research in this area. Essentially, the paper offers a comprehensive examination of the challenges and solutions related to hallucination in LFMs.

SAGE: Structured Attribute Value Generation for Billion-Scale Product Catalogs

  • paper_url: http://arxiv.org/abs/2309.05920
  • repo_url: None
  • paper_authors: Athanasios N. Nikolakopoulos, Swati Kaul, Siva Karthik Gade, Bella Dubrov, Umit Batur, Suleiman Ali Khan
  • for: improving attribute-value prediction for products across world-wide e-Commerce catalogs
  • methods: SAGE, a generative LLM that formulates attribute-value prediction as a Seq2Seq summarization task across languages, product types, and target attributes; it can infer attribute values even when they are mentioned only implicitly through periphrastic language, or not at all (common-sense defaults), and can predict whether an attribute is inapplicable or non-obtainable
  • results: comprehensive experiments demonstrate the effectiveness of the approach and its superiority over state-of-the-art alternatives, including the ability to predict attribute values in the zero-shot setting, which significantly reduces the number of labeled examples required for training
    Abstract We introduce SAGE; a Generative LLM for inferring attribute values for products across world-wide e-Commerce catalogs. We introduce a novel formulation of the attribute-value prediction problem as a Seq2Seq summarization task, across languages, product types and target attributes. Our novel modeling approach lifts the restriction of predicting attribute values within a pre-specified set of choices, as well as, the requirement that the sought attribute values need to be explicitly mentioned in the text. SAGE can infer attribute values even when such values are mentioned implicitly using periphrastic language, or not-at-all-as is the case for common-sense defaults. Additionally, SAGE is capable of predicting whether an attribute is inapplicable for the product at hand, or non-obtainable from the available information. SAGE is the first method able to tackle all aspects of the attribute-value-prediction task as they arise in practical settings in e-Commerce catalogs. A comprehensive set of experiments demonstrates the effectiveness of the proposed approach, as well as, its superiority against state-of-the-art competing alternatives. Moreover, our experiments highlight SAGE's ability to tackle the task of predicting attribute values in zero-shot setting; thereby, opening up opportunities for significantly reducing the overall number of labeled examples required for training.
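
SAGE's actual prompt and output format are not disclosed in the abstract, so the helper below is only a hedged sketch of how attribute-value prediction can be cast as a Seq2Seq summarization pair; the field separators and the special `<INAPPLICABLE>` token are invented for illustration.

```python
def build_seq2seq_example(product_text, target_attribute, value=None):
    """Cast attribute-value prediction as a Seq2Seq 'summarization' pair.

    Source: free-form product text plus the attribute being asked for.
    Target: the value, or a special token when the attribute does not apply
    or cannot be obtained. All token and field names here are illustrative.
    """
    source = f"attribute: {target_attribute} | product: {product_text}"
    target = value if value is not None else "<INAPPLICABLE>"
    return source, target

# A value mentioned only periphrastically still maps to an explicit target:
src, tgt = build_seq2seq_example(
    "Steel bottle that keeps drinks icy for a full day", "insulated", "yes"
)
```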

Stochastic LLMs do not Understand Language: Towards Symbolic, Explainable and Ontologically Based LLMs

  • paper_url: http://arxiv.org/abs/2309.05918
  • repo_url: None
  • paper_authors: Walid S. Saba
  • for: This paper examines the limitations of data-driven large language models (LLMs) and argues for symbolic, explainable, and ontologically grounded language models.
  • methods: The paper analyzes the shortcomings of data-driven LLMs and proposes applying the same successful bottom-up reverse-engineering strategy in a symbolic setting.
  • results: The paper identifies several limitations of LLMs: they cannot be relied upon for factual information, since all ingested text (factual or not) is treated as equally valid; whatever knowledge they acquire about language is buried in billions of microfeatures (weights), none of which is meaningful on its own; and they often fail to make correct inferences in certain linguistic contexts (e.g., nominal compounds, copredication, quantifier scope ambiguities, intensional contexts).
    Abstract In our opinion the exuberance surrounding the relative success of data-driven large language models (LLMs) is slightly misguided and for several reasons (i) LLMs cannot be relied upon for factual information since for LLMs all ingested text (factual or non-factual) was created equal; (ii) due to their subsymbolic nature, whatever 'knowledge' these models acquire about language will always be buried in billions of microfeatures (weights), none of which is meaningful on its own; and (iii) LLMs will often fail to make the correct inferences in several linguistic contexts (e.g., nominal compounds, copredication, quantifier scope ambiguities, intensional contexts). Since we believe the relative success of data-driven large language models (LLMs) is not a reflection on the symbolic vs. subsymbolic debate but a reflection on applying the successful strategy of a bottom-up reverse engineering of language at scale, we suggest in this paper applying the effective bottom-up strategy in a symbolic setting resulting in symbolic, explainable, and ontologically grounded language models.

ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning

  • paper_url: http://arxiv.org/abs/2309.05915
  • repo_url: None
  • paper_authors: Chenxiao Gao, Chenyang Wu, Mingjun Cao, Rui Kong, Zongzhang Zhang, Yang Yu
  • for: To improve DT's performance in offline policy optimization, using expressive sequence modeling techniques for action generation.
  • methods: The method empowers DT with dynamic programming in three steps. First, in-sample value iteration is used to obtain approximated value functions, which involves dynamic programming over the MDP structure. Second, action quality is evaluated in context using estimated advantages; two advantage estimators, IAE and GAE, are introduced, each suited to different tasks. Third, an Advantage-Conditioned Transformer (ACT) is trained to generate actions conditioned on the estimated advantages.
  • results: Evaluation results show that, by leveraging the power of dynamic programming, ACT performs effective trajectory stitching and robust action generation despite environmental stochasticity, outperforming baseline methods across various benchmarks. Ablation studies further examine the impact of ACT's design choices.
    Abstract Decision Transformer (DT), which employs expressive sequence modeling techniques to perform action generation, has emerged as a promising approach to offline policy optimization. However, DT generates actions conditioned on a desired future return, which is known to bear some weaknesses such as the susceptibility to environmental stochasticity. To overcome DT's weaknesses, we propose to empower DT with dynamic programming. Our method comprises three steps. First, we employ in-sample value iteration to obtain approximated value functions, which involves dynamic programming over the MDP structure. Second, we evaluate action quality in context with estimated advantages. We introduce two types of advantage estimators, IAE and GAE, which are suitable for different tasks. Third, we train an Advantage-Conditioned Transformer (ACT) to generate actions conditioned on the estimated advantages. Finally, during testing, ACT generates actions conditioned on a desired advantage. Our evaluation results validate that, by leveraging the power of dynamic programming, ACT demonstrates effective trajectory stitching and robust action generation in spite of the environmental stochasticity, outperforming baseline methods across various benchmarks. Additionally, we conduct an in-depth analysis of ACT's various design choices through ablation studies.
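
Of ACT's two advantage estimators, GAE is the standard Generalized Advantage Estimation of Schulman et al.; the sketch below is a plain numpy version for a single trajectory, assuming per-step rewards and value estimates (imagined here to come from the in-sample value iteration step) are already available. IAE is not sketched, hyperparameters are illustrative, and terminal-state masking is omitted.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    rewards: shape (T,)   -- per-step rewards
    values:  shape (T+1,) -- value estimates V(s_0), ..., V(s_T)
    Returns advantages of shape (T,), computed by a backward recursion
    over the discounted sum of TD residuals.
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running                 # discounted sum
        adv[t] = running
    return adv
```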

Quality-Agnostic Deepfake Detection with Intra-model Collaborative Learning

  • paper_url: http://arxiv.org/abs/2309.05911
  • repo_url: https://bitbucket.org/deepfake-project/qad-iccv23
  • paper_authors: Binh M. Le, Simon S. Woo
  • for: This work proposes a universal intra-model collaborative learning framework for the effective and simultaneous detection of deepfakes of different qualities.
  • methods: The quality-agnostic deepfake detection method, dubbed QAD, maximizes the dependency between intermediate representations of images at different quality levels, guided by an upper bound on the general error expectation.
  • results: Extensive experiments on seven popular deepfake datasets demonstrate the superiority of the QAD model over prior SOTA benchmarks.
    Abstract Deepfake has recently raised a plethora of societal concerns over its possible security threats and dissemination of fake information. Much research on deepfake detection has been undertaken. However, detecting low quality as well as simultaneously detecting different qualities of deepfakes still remains a grave challenge. Most SOTA approaches are limited by using a single specific model for detecting certain deepfake video quality type. When constructing multiple models with prior information about video quality, this kind of strategy incurs significant computational cost, as well as model and training data overhead. Further, it cannot be scalable and practical to deploy in real-world settings. In this work, we propose a universal intra-model collaborative learning framework to enable the effective and simultaneous detection of different quality of deepfakes. That is, our approach is the quality-agnostic deepfake detection method, dubbed QAD . In particular, by observing the upper bound of general error expectation, we maximize the dependency between intermediate representations of images from different quality levels via Hilbert-Schmidt Independence Criterion. In addition, an Adversarial Weight Perturbation module is carefully devised to enable the model to be more robust against image corruption while boosting the overall model's performance. Extensive experiments over seven popular deepfake datasets demonstrate the superiority of our QAD model over prior SOTA benchmarks.
    摘要 为了解决这个问题,我们提出了一种通用内部协作学习框架,以实现不同质量深圳投影的同时检测。具体来说,我们通过观察总体错误预期的上限来 maximize 图像不同质量水平之间的依赖关系,使用希尔伯特-施密特独立性 критерион。此外,我们还妥善地设计了一个对抗权重偏移模块,以使模型更加抗 resize 而增强整体模型的表现。我们在七个流行的深圳投影数据集上进行了广泛的实验,并证明了我们的 QAD 模型在先前的 SOTA 标准之上表现出色。

Comparing Llama-2 and GPT-3 LLMs for HPC kernels generation

  • paper_url: http://arxiv.org/abs/2309.07103
  • repo_url: None
  • paper_authors: Pedro Valero-Lara, Alexis Huante, Mustafa Al Lail, William F. Godoy, Keita Teranishi, Prasanna Balaprakash, Jeffrey S. Vetter
  • for: This study evaluates the open-source Llama-2 model for generating well-known high-performance computing kernels (e.g., AXPY, GEMV, GEMM) across different parallel programming models and languages (C++: OpenMP, OpenMP Offload, OpenACC, CUDA, HIP; Fortran: OpenMP, OpenMP Offload, OpenACC; Python: numpy, Numba, pyCUDA, cuPy; Julia: Threads, CUDA.jl, AMDGPU.jl).
  • methods: The study builds on the authors' previous work based on OpenAI Codex, a descendant of GPT-3, which generated similar kernels from simple prompts via GitHub Copilot. The goal is to compare the accuracy of Llama-2 against the original GPT-3 baseline using a similar metric.
  • results: Llama-2 shows competitive or even superior accuracy in kernel generation. Copilot-generated code is more reliable but less optimized, whereas code generated by Llama-2 is less reliable but more optimized when correct.
    Abstract We evaluate the use of the open-source Llama-2 model for generating well-known, high-performance computing kernels (e.g., AXPY, GEMV, GEMM) on different parallel programming models and languages (e.g., C++: OpenMP, OpenMP Offload, OpenACC, CUDA, HIP; Fortran: OpenMP, OpenMP Offload, OpenACC; Python: numpy, Numba, pyCUDA, cuPy; and Julia: Threads, CUDA.jl, AMDGPU.jl). We built upon our previous work that is based on the OpenAI Codex, which is a descendant of GPT-3, to generate similar kernels with simple prompts via GitHub Copilot. Our goal is to compare the accuracy of Llama-2 and our original GPT-3 baseline by using a similar metric. Llama-2 has a simplified model that shows competitive or even superior accuracy. We also report on the differences between these foundational large language models as generative AI continues to redefine human-computer interactions. Overall, Copilot generates codes that are more reliable but less optimized, whereas codes generated by Llama-2 are less reliable but more optimized when correct.
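
For reference, the three kernels the study asks the models to generate are standard BLAS operations, shown below in plain numpy (one of the paper's Python targets). These are textbook definitions, not the model-generated code evaluated in the paper.

```python
import numpy as np

def axpy(alpha, x, y):
    """AXPY (BLAS level 1): y <- alpha * x + y."""
    return alpha * x + y

def gemv(alpha, A, x, beta, y):
    """GEMV (BLAS level 2): y <- alpha * A @ x + beta * y."""
    return alpha * (A @ x) + beta * y

def gemm(alpha, A, B, beta, C):
    """GEMM (BLAS level 3): C <- alpha * A @ B + beta * C."""
    return alpha * (A @ B) + beta * C
```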

Strategic Behavior of Large Language Models: Game Structure vs. Contextual Framing

  • paper_url: http://arxiv.org/abs/2309.05898
  • repo_url: None
  • paper_authors: Nunzio Lorè, Babak Heydari
  • for: This study investigates the strategic decision-making capabilities of three Large Language Models (GPT-3.5, GPT-4, and LLaMa-2) within the framework of game theory.
  • methods: The study uses four canonical two-player games (Prisoner's Dilemma, Stag Hunt, Snowdrift, and Prisoner's Delight) to explore how these models navigate social dilemmas, and examines the role of contextual framing, such as diplomatic relations or casual friendships, in shaping the models' decisions.
  • results: GPT-3.5 is highly sensitive to contextual framing but shows limited ability to engage in abstract strategic reasoning. Both GPT-4 and LLaMa-2 adjust their strategies based on game structure and context, with LLaMa-2 exhibiting a more nuanced understanding of the games' underlying mechanics. These results highlight the current limitations and varied proficiencies of LLMs in strategic decision-making, cautioning against their unqualified use in tasks requiring complex strategic reasoning.
    Abstract This paper investigates the strategic decision-making capabilities of three Large Language Models (LLMs): GPT-3.5, GPT-4, and LLaMa-2, within the framework of game theory. Utilizing four canonical two-player games -- Prisoner's Dilemma, Stag Hunt, Snowdrift, and Prisoner's Delight -- we explore how these models navigate social dilemmas, situations where players can either cooperate for a collective benefit or defect for individual gain. Crucially, we extend our analysis to examine the role of contextual framing, such as diplomatic relations or casual friendships, in shaping the models' decisions. Our findings reveal a complex landscape: while GPT-3.5 is highly sensitive to contextual framing, it shows limited ability to engage in abstract strategic reasoning. Both GPT-4 and LLaMa-2 adjust their strategies based on game structure and context, but LLaMa-2 exhibits a more nuanced understanding of the games' underlying mechanics. These results highlight the current limitations and varied proficiencies of LLMs in strategic decision-making, cautioning against their unqualified use in tasks requiring complex strategic reasoning.
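
To make the dilemma structure concrete, the sketch below encodes a Prisoner's Dilemma payoff table with the canonical ordering T > R > P > S; the numeric values are illustrative, not the paper's. The other three games differ only in this ordering, e.g. Stag Hunt rewards mutual cooperation most (R > T), which is why contextual framing can push a model toward different equilibria.

```python
# Canonical Prisoner's Dilemma payoffs (temptation T=5, reward R=3,
# punishment P=1, sucker S=0); values are illustrative stand-ins.
PAYOFFS = {  # (row action, col action) -> (row payoff, col payoff)
    ("C", "C"): (3, 3),  # mutual cooperation: R each
    ("C", "D"): (0, 5),  # sucker S vs. temptation T
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # mutual defection: P each
}

def play(row_action, col_action):
    return PAYOFFS[(row_action, col_action)]

# Defection strictly dominates cooperation for the row player:
assert play("D", "C")[0] > play("C", "C")[0]  # T > R
assert play("D", "D")[0] > play("C", "D")[0]  # P > S
```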