results: 实验表明,提出的算法可以在多种 benchmark 上超越现有的表现,因此推动可靠的探索学习向更实际应用。Abstract
In real-world reinforcement learning problems, the state information is often only partially observable, which breaks the basic assumption in Markov decision processes, and thus, leads to inferior performances. Partially Observable Markov Decision Processes have been introduced to explicitly take the issue into account for learning, exploration, and planning, but presenting significant computational and statistical challenges. To address these difficulties, we exploit the representation view, which leads to a coherent design framework for a practically tractable reinforcement learning algorithm upon partial observations. We provide a theoretical analysis for justifying the statistical efficiency of the proposed algorithm. We also empirically demonstrate the proposed algorithm can surpass state-of-the-art performance with partial observations across various benchmarks, therefore, pushing reliable reinforcement learning towards more practical applications.
摘要
InteraSSort: Interactive Assortment Planning Using Large Language Models
results: experiments 表明,InteraSSort 可以帮助店长做出更加精准和个性化的决策,并且可以扩展到多种操作管理挑战。Abstract
Assortment planning, integral to multiple commercial offerings, is a key problem studied in e-commerce and retail settings. Numerous variants of the problem along with their integration into business solutions have been thoroughly investigated in the existing literature. However, the nuanced complexities of in-store planning and a lack of optimization proficiency among store planners with strong domain expertise remain largely overlooked. These challenges frequently necessitate collaborative efforts with multiple stakeholders which often lead to prolonged decision-making processes and significant delays. To mitigate these challenges and capitalize on the advancements of Large Language Models (LLMs), we propose an interactive assortment planning framework, InteraSSort that augments LLMs with optimization tools to assist store planners in making decisions through interactive conversations. Specifically, we develop a solution featuring a user-friendly interface that enables users to express their optimization objectives as input text prompts to InteraSSort and receive tailored optimized solutions as output. Our framework extends beyond basic functionality by enabling the inclusion of additional constraints through interactive conversation, facilitating precise and highly customized decision-making. Extensive experiments demonstrate the effectiveness of our framework and potential extensions to a broad range of operations management challenges.
摘要
产品排编规划是电商和零售业中关键的问题,它在多种商业服务中发挥重要作用。现有文献中已经进行了详细的研究和分析。然而,在门店规划中存在许多复杂的特点,以及门店规划人员具有强定领域专业知识的问题,这些问题经常需要多方合作和讨论,导致决策过程延长,延迟。为了解决这些挑战,我们提出一种互动式排编规划框架,即InteraSSort,该框架通过与大语言模型(LLM)的合作,为门店规划人员提供互动式会话的优化解决方案。specifically,我们开发了一个用户友好的界面,允许用户通过输入文本提示来表达优化目标,并从InteraSSort获得适应性的优化解决方案。我们的框架不仅具有基本功能,还允许通过互动会话中的添加约束,实现精准和个性化的决策。我们的实验证明了我们的框架的有效性和扩展性,并可应用于多种运维管理挑战。
Ontological Reasoning over Shy and Warded Datalog$+/-$ for Streaming-based Architectures (technical report)
results: 这篇论文介绍了两种非常有潜力和可追加的语言,namely Shy和Warded Datalog$+/- $,并基于它们的理论基础,提出了新的推理技巧,以便高效地解决实际场景中的 Ontological Reasoning 问题。Abstract
Recent years witnessed a rising interest towards Datalog-based ontological reasoning systems, both in academia and industry. These systems adopt languages, often shared under the collective name of Datalog$+/-$, that extend Datalog with the essential feature of existential quantification, while introducing syntactic limitations to sustain reasoning decidability and achieve a good trade-off between expressive power and computational complexity. From an implementation perspective, modern reasoners borrow the vast experience of the database community in developing streaming-based data processing systems, such as volcano-iterator architectures, that sustain a limited memory footprint and good scalability. In this paper, we focus on two extremely promising, expressive, and tractable languages, namely, Shy and Warded Datalog$+/-$. We leverage their theoretical underpinnings to introduce novel reasoning techniques, technically, "chase variants", that are particularly fit for efficient reasoning in streaming-based architectures. We then implement them in Vadalog, our reference streaming-based engine, to efficiently solve ontological reasoning tasks over real-world settings.
摘要
Here is the text in Simplified Chinese:近年来,有越来越多的关注于基于Datalog的 ontological reasoning系统,在学术和产业中。这些系统使用语言,通常被称为Datalog$+/-$, 它们扩展了Datalog,添加了存在量化,同时保持可 decidability 和计算复杂性的平衡。从实现角度来看,现代推理器借鉴了数据库社区对流处理系统的开发经验,如火山迭代架构,以保持有限内存占用和良好的扩展性。在这篇论文中,我们关注两种极其有前途和表达力强的语言,即Shy和Warded Datalog$+/-$.我们利用它们的理论基础,引入了新的推理技术,即"追踪变种",这些技术特别适合流处理基础结构中高效的推理。然后,我们在Vadalog,我们的参考流处理基础结构中,实现了这些技术,以高效地解决了实际应用中的 ontological reasoning问题。
NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation
results: 通过对大量人工引擎生成的提示进行分析和优化,自动生成高质量的提示,以提高文本到图像生成的质量。Abstract
Despite impressive recent advances in text-to-image diffusion models, obtaining high-quality images often requires prompt engineering by humans who have developed expertise in using them. In this work, we present NeuroPrompts, an adaptive framework that automatically enhances a user's prompt to improve the quality of generations produced by text-to-image models. Our framework utilizes constrained text decoding with a pre-trained language model that has been adapted to generate prompts similar to those produced by human prompt engineers. This approach enables higher-quality text-to-image generations and provides user control over stylistic features via constraint set specification. We demonstrate the utility of our framework by creating an interactive application for prompt enhancement and image generation using Stable Diffusion. Additionally, we conduct experiments utilizing a large dataset of human-engineered prompts for text-to-image generation and show that our approach automatically produces enhanced prompts that result in superior image quality. We make our code, a screencast video demo and a live demo instance of NeuroPrompts publicly available.
摘要
尽管最近的文本到图像扩散模型已经很出色地进步,但是获得高质量图像 Frequently requires human expertise in using these models. 在这项工作中,我们介绍NeuroPrompts,一个可靠的框架,可以自动提高用户提示的质量,以提高文本到图像模型生成的质量。我们的框架使用了约束文本解码,并利用预训练的自然语言模型,生成类似于人类提示工程师生成的提示。这种方法可以提高文本到图像生成的质量,并提供用户控制风格特征的功能,通过约束集定。我们在创建了一个交互式应用程序,用于提示增强和图像生成,并在大量人类工程师生成的提示集合上进行了实验,并证明了我们的方法可以自动生成提高图像质量的提示。我们将我们的代码、屏幕捕捉视频和实时示例 instances of NeuroPrompts 公开 disponibles。
Fast Inner-Product Algorithms and Architectures for Deep Neural Network Accelerators
For: The paper is written for improving the performance of machine learning (ML) models by proposing a new algorithm called Free-pipeline Fast Inner Product (FFIP) and its hardware architecture.* Methods: The paper uses an under-explored fast inner-product algorithm (FIP) proposed by Winograd in 1968, and implements it for the first time in an ML accelerator. The authors also propose a new algorithm called FFIP and a generalized architecture that improves FIP’s clock frequency and throughput for a similar hardware cost.* Results: The paper shows that FFIP can be seamlessly incorporated into traditional fixed-point systolic array ML accelerators to achieve the same throughput with half the number of multiply-accumulate (MAC) units, or it can double the maximum systolic array size that can fit onto devices with a fixed hardware budget. The authors also demonstrate that their FFIP implementation for non-sparse ML models with 8 to 16-bit fixed-point inputs achieves higher throughput and compute efficiency than the best-in-class prior solutions on the same type of compute platform.Here are the three key points in Simplified Chinese:* For: 本文是为了提高机器学习(ML)模型的性能,提出了一种新的算法called Free-pipeline Fast Inner Product(FFIP)和其硬件体系结构。* Methods: 本文使用了一种尚未得到充分研究的快速内积算法(FIP),并在 ML 加速器中实现了它。作者们还提出了一种新的算法called FFIP 和一种通用的体系结构,可以提高 FIP 的时钟频率和吞吐量,同时保持相同的硬件成本。* Results: 本文表明,FFIP 可以轻松地与传统的 fixed-point systolic array ML 加速器集成,以 достиieving the same throughput with half the number of multiply-accumulate(MAC)单元,或者 doubles the maximum systolic array size that can fit onto devices with a fixed hardware budget。作者们还证明了他们的 FFIP 实现对非零杂质 ML 模型的 8 到 16 位 fixed-point 输入实现了更高的throughput和计算效率,比最佳类型的计算平台上的先前解决方案更好。Abstract
We introduce a new algorithm called the Free-pipeline Fast Inner Product (FFIP) and its hardware architecture that improve an under-explored fast inner-product algorithm (FIP) proposed by Winograd in 1968. Unlike the unrelated Winograd minimal filtering algorithms for convolutional layers, FIP is applicable to all machine learning (ML) model layers that can mainly decompose to matrix multiplication, including fully-connected, convolutional, recurrent, and attention/transformer layers. We implement FIP for the first time in an ML accelerator then present our FFIP algorithm and generalized architecture which inherently improve FIP's clock frequency and, as a consequence, throughput for a similar hardware cost. Finally, we contribute ML-specific optimizations for the FIP and FFIP algorithms and architectures. We show that FFIP can be seamlessly incorporated into traditional fixed-point systolic array ML accelerators to achieve the same throughput with half the number of multiply-accumulate (MAC) units, or it can double the maximum systolic array size that can fit onto devices with a fixed hardware budget. Our FFIP implementation for non-sparse ML models with 8 to 16-bit fixed-point inputs achieves higher throughput and compute efficiency than the best-in-class prior solutions on the same type of compute platform.
摘要
我们介绍一种新的算法 called Free-pipeline Fast Inner Product (FFIP) 和其硬件架构,该算法可以提高Winograd在1968年提出的快速内积算法(FIP)的性能。与涉及到卷积层的Winograd最小滤波算法不同,FIP可以应用于所有机器学习(ML)模型层,包括完全连接层、卷积层、回卷层和注意力/转换器层。我们在ML加速器上实现了FIP,然后提出了我们的FFIP算法和通用架构,这两者都可以提高FIP的时钟频率和通过put,同时具有相同的硬件成本。最后,我们对FIP和FFIP算法和架构进行了特定于ML的优化。我们展示了FFIP可以轻松地与传统的 fixes-point systolic array ML加速器结合使用,以实现同样的throughput,但使用的MAC单元数量减半,或者 doublesystolic array的最大大小,可以在设备上匹配的硬件预算内。我们对具有8到16位Fixed-point输入的非稀盐ML模型进行了实现,并取得了与同类 compute平台上的最佳前一solution相同或更高的throughput和计算效率。
Digital Twin-Based User-Centric Edge Continual Learning in Integrated Sensing and Communication
methods: 这篇论文使用了一个轻量级的深度神经网络(DNN)和一个移动边缘计算(MEC)服务器,并将感应数据上传到服务器进行更高精度的处理。为了处理数据漂移,服务器会在必要时更新轻量级DNN, referred to as continual learning。
results: 经过实验显示,提出的DT基于方法可以实现 computation cost minimization,并且在执行DNN基于人姿识别任务时表现出色。Abstract
In this paper, we propose a digital twin (DT)-based user-centric approach for processing sensing data in an integrated sensing and communication (ISAC) system with high accuracy and efficient resource utilization. The considered scenario involves an ISAC device with a lightweight deep neural network (DNN) and a mobile edge computing (MEC) server with a large DNN. After collecting sensing data, the ISAC device either processes the data locally or uploads them to the server for higher-accuracy data processing. To cope with data drifts, the server updates the lightweight DNN when necessary, referred to as continual learning. Our objective is to minimize the long-term average computation cost of the MEC server by optimizing two decisions, i.e., sensing data offloading and sensing data selection for the DNN update. A DT of the ISAC device is constructed to predict the impact of potential decisions on the long-term computation cost of the server, based on which the decisions are made with closed-form formulas. Experiments on executing DNN-based human motion recognition tasks are conducted to demonstrate the outstanding performance of the proposed DT-based approach in computation cost minimization.
摘要
在这篇论文中,我们提出了基于数字双(DT)的用户中心的方法,用于处理感知数据在集成感知通信(ISAC)系统中,并且具有高精度和有效资源利用。我们考虑的场景是一个具有轻量级深度学习网络(DNN)的ISAC设备,以及一个具有大型DNN的移动边缘计算(MEC)服务器。在收集感知数据后,ISAC设备会选择是在本地处理数据,或者上传到服务器进行更高精度的数据处理。为了应对数据漂移,服务器会在必要时更新轻量级DNN,称为 kontinual learning。我们的目标是将MEC服务器的长期平均计算成本降低到最低水平,通过优化两个决策:感知数据上载和感知数据选择,以便更新DNN。一个DT的ISAC设备是建立,以预测可能的决策对MEC服务器的长期计算成本产生的影响,并根据这些预测结果,制定了关闭式公式来做出决策。我们对执行基于DNN的人体运动识别任务进行了实验,以证明我们提出的DT-基于方法在计算成本减少方面表现出色。
results: 本研究发现了一些不可满足的防御,并提出了一种基于防御 semantics 的论证概要化方法。Abstract
In this paper we introduce a novel semantics, called defense semantics, for Dung's abstract argumentation frameworks in terms of a notion of (partial) defence, which is a triple encoding that one argument is (partially) defended by another argument via attacking the attacker of the first argument. In terms of defense semantics, we show that defenses related to self-attacked arguments and arguments in 3-cycles are unsatifiable under any situation and therefore can be removed without affecting the defense semantics of an AF. Then, we introduce a new notion of defense equivalence of AFs, and compare defense equivalence with standard equivalence and strong equivalence, respectively. Finally, by exploiting defense semantics, we define two kinds of reasons for accepting arguments, i.e., direct reasons and root reasons, and a notion of root equivalence of AFs that can be used in argumentation summarization.
摘要
在本文中,我们引入了一种新的 semantics,即防御 semantics,用于杜нг的抽象口头论证框架中的一种幂等,其中一个Argument被另一个Argument部分防御,通过攻击该Argument的攻击者。在防御 semantics 中,我们表明了自我攻击的论证和3-cycle中的论证是不可满足的,因此可以无需考虑这些论证。然后,我们引入了一种新的论证相等性(defense equivalence),并与标准相等性和强相等性进行比较。最后,通过利用防御 semantics,我们定义了两种接受论证的理由:直接理由和根理由,以及一种论证相等性的概念,可以用于口头论证概要化。
methods: 该论文使用了大量的人工生成内容来训练 AI 图像生成器,并在这些模型 retrained 后发现它们会生成高度扭曲的图像。
results: 研究发现, retrained 后的模型会生成高度扭曲的图像,并且这种扭曲不仅限于使用的文本提示,而且会影响整个图像的表现。此外,已经恶化的模型即使在重新训练于真实图像上也难以恢复原状。Abstract
Trained on massive amounts of human-generated content, AI (artificial intelligence) image synthesis is capable of reproducing semantically coherent images that match the visual appearance of its training data. We show that when retrained on even small amounts of their own creation, these generative-AI models produce highly distorted images. We also show that this distortion extends beyond the text prompts used in retraining, and that once poisoned, the models struggle to fully heal even after retraining on only real images.
摘要
<>用大量人工生成的内容训练,AI图像生成器可以生成具有Semantic coherence的图像,与训练数据的视觉外观相匹配。我们显示,当 retrained наeven small amounts of their own creation,这些生成-AI模型会生成高度扭曲的图像。我们还显示,这种扭曲不仅局限于用于 retrained 的文本提示,而且这些模型一旦被毒化,即使在仅使用真实图像进行重训练,也难以完全恢复。Note: "毒化" (poisoned) in the text refers to the situation where the AI model is trained on inappropriate or misleading data, which can cause the model to produce distorted or incorrect output.
PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
for: PhysGaussian is a new method for achieving high-quality novel motion synthesis by integrating physically grounded Newtonian dynamics within 3D Gaussians.
methods: PhysGaussian employs a custom Material Point Method (MPM) to enrich 3D Gaussian kernels with physically meaningful kinematic deformation and mechanical stress attributes, all evolved in line with continuum mechanics principles.
results: The method demonstrates exceptional versatility across a wide variety of materials, showcasing its strong capabilities in creating diverse visual content with novel viewpoints and movements.Here is the same information in Simplified Chinese text:
methods: PhysGaussian 使用自定义的 Material Point Method (MPM),以抽象 3D Gaussian 函数中的物理意义的剪枝减压和机械压缩特性,所有在continuum mechanics 原理下演化。
results: PhysGaussian 方法在各种材料上显示出了强大的多样性,包括弹簧体、金属、非牛顿流体和尘埃材料,并且能够创造出多种视觉内容,包括新的视角和运动。Abstract
We introduce PhysGaussian, a new method that seamlessly integrates physically grounded Newtonian dynamics within 3D Gaussians to achieve high-quality novel motion synthesis. Employing a custom Material Point Method (MPM), our approach enriches 3D Gaussian kernels with physically meaningful kinematic deformation and mechanical stress attributes, all evolved in line with continuum mechanics principles. A defining characteristic of our method is the seamless integration between physical simulation and visual rendering: both components utilize the same 3D Gaussian kernels as their discrete representations. This negates the necessity for triangle/tetrahedron meshing, marching cubes, "cage meshes," or any other geometry embedding, highlighting the principle of "what you see is what you simulate (WS$^2$)." Our method demonstrates exceptional versatility across a wide variety of materials--including elastic entities, metals, non-Newtonian fluids, and granular materials--showcasing its strong capabilities in creating diverse visual content with novel viewpoints and movements. Our project page is at: https://xpandora.github.io/PhysGaussian/
摘要
我们介绍PhysGaussian,一种新的方法,可以内置物理基础的新顿动力学在3D Gaussian中实现高质量的新动作 sintesis。我们的方法使用自定义的物点方法(MPM),将3D Gaussian kernel给予物理意义的几何对象和机械压力属性,并与流体力学原理一致地进行演算。PhysGaussian的一个特点是让visual rendering和物理 simulated互相使用同一个3D Gaussian kernel作为粗糙表示,这标志着"你看到的就是你实际上实现"的原则(WS$^2)。我们的方法在不同材料中展示了出色的多样性,包括弹性体、金属、非新顿流体和尘质物质,并证明了它在创造多种视角和运动的丰富可见性。我们的项目页面位于:https://xpandora.github.io/PhysGaussian/
results: 研究发现,当医疗诊断查询中使用医疗特有的词汇时,ChatGPT的错误率会增加。然而,通过几何工程来设计提示,可以帮助ChatGPT部分避免这些错误。Abstract
Reinforcement learning-based large language models, such as ChatGPT, are believed to have potential to aid human experts in many domains, including healthcare. There is, however, little work on ChatGPT's ability to perform a key task in healthcare: formal, probabilistic medical diagnostic reasoning. This type of reasoning is used, for example, to update a pre-test probability to a post-test probability. In this work, we probe ChatGPT's ability to perform this task. In particular, we ask ChatGPT to give examples of how to use Bayes rule for medical diagnosis. Our prompts range from queries that use terminology from pure probability (e.g., requests for a "posterior probability") to queries that use terminology from the medical diagnosis literature (e.g., requests for a "post-test probability"). We show how the introduction of medical variable names leads to an increase in the number of errors that ChatGPT makes. Given our results, we also show how one can use prompt engineering to facilitate ChatGPT's partial avoidance of these errors. We discuss our results in light of recent commentaries on sensitivity and specificity. We also discuss how our results might inform new research directions for large language models.
摘要
大型自然语言模型,如ChatGPT,believed to have potential to aid human experts in many domains, including healthcare. However, there is little work on ChatGPT's ability to perform a key task in healthcare: 形式抽象医学诊断思维。这种思维是用于更新先前概率到后测概率的。在这项工作中,我们探究ChatGPT的能力来执行这个任务。我们问ChatGPT给出医学诊断中使用杯因之则的示例。我们的提示从普适概率中的词汇(例如,"后测概率")到医学诊断文献中的词汇(例如,"后测概率")。我们发现,在医学变量名导入后,ChatGPT的错误率增加。根据我们的结果,我们还示出了如何使用提示工程来facilitate ChatGPT的部分避免错误。我们讨论我们的结果与最近的敏感性和特点的评论相关。我们还讨论如何将我们的结果引入新的大语言模型研究方向。
results: 研究发现,人们对机器人的信任程度受到许多因素的影响,包括机器人的能力和可靠性、交互情境和任务等。同时,研究还发现现有的信任量测试方法存在一些不足之处,需要进一步的改进和扩展。Abstract
Trust in robots is widely believed to be imperative for the adoption of robots into people's daily lives. It is, therefore, understandable that the literature of the last few decades focuses on measuring how much people trust robots -- and more generally, any agent - to foster such trust in these technologies. Researchers have been exploring how people trust robot in different ways, such as measuring trust on human-robot interactions (HRI) based on textual descriptions or images without any physical contact, during and after interacting with the technology. Nevertheless, trust is a complex behaviour, and it is affected and depends on several factors, including those related to the interacting agents (e.g. humans, robots, pets), itself (e.g. capabilities, reliability), the context (e.g. task), and the environment (e.g. public spaces vs private spaces vs working spaces). In general, most roboticists agree that insufficient levels of trust lead to a risk of disengagement while over-trust in technology can cause over-reliance and inherit dangers, for example, in emergency situations. It is, therefore, very important that the research community has access to reliable methods to measure people's trust in robots and technology. In this position paper, we outline current methods and their strengths, identify (some) weakly covered aspects and discuss the potential for covering a more comprehensive amount of factors influencing trust in HRI.
摘要
信任机器人被广泛认为是人工智能技术的采用的关键因素。因此,文献上最近几十年的研究都集中在测量人们对机器人的信任——更广泛地说,任何代理人——以促进这些技术的采用。研究人员在不同的方式测量人们对机器人的信任,例如基于文本描述或图像无physical contact的人机交互(HRI)中的信任。然而,信任是复杂的行为,它受到多种因素的影响和依赖,包括交互代理人(如人、机器人、宠物)、自身(如能力、可靠性)、 Context(如任务)和环境(如公共空间、私人空间、工作空间)。大多数机器人学家认为,不足的信任会导致技术离别,而过度信任可能会导致过度依赖和固有危险,例如在紧急情况下。因此,研究社区需要可靠的方法来测量人们对机器人的信任。在这篇Position paper中,我们介绍当前的方法和其优点,识别一些不足的方面,并讨论涵盖更广泛的信任因素的潜在可能性。
Conditional Modeling Based Automatic Video Summarization
results: 对常用的视频摘要数据集进行了广泛的实验,并 показа了该方法可以超过现有方法,达到视频摘要的州OF-THE-ART性能。Abstract
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story. Video summarization methods mainly rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video. There are other non-visual factors, such as interestingness, representativeness, and storyline consistency that should also be considered for generating high-quality video summaries. Current methods do not adequately take into account these non-visual factors, resulting in suboptimal performance. In this work, a new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries. The method utilizes a conditional modeling perspective and introduces multiple meaningful random variables and joint distributions to characterize the key components of video summarization. Helper distributions are employed to improve the training of the model. A conditional attention module is designed to mitigate potential performance degradation in the presence of multi-modal input. The proposed video summarization method incorporates the above innovative design choices that aim to narrow the gap between human-generated and machine-generated video summaries. Extensive experiments show that the proposed approach outperforms existing methods and achieves state-of-the-art performance on commonly used video summarization datasets.
摘要
In this work, we propose a new approach to video summarization based on insights from human-created ground truth summaries. Our method uses a conditional modeling perspective and introduces multiple meaningful random variables and joint distributions to characterize the key components of video summarization. We also employ helper distributions to improve the training of the model. A conditional attention module is designed to mitigate potential performance degradation in the presence of multi-modal input.Our proposed video summarization method incorporates several innovative design choices that aim to narrow the gap between human-generated and machine-generated video summaries. Extensive experiments show that our approach outperforms existing methods and achieves state-of-the-art performance on commonly used video summarization datasets.
paper_authors: Habtom Kahsay Gidey, Peter Hillmann, Andreas Karcher, Alois Knoll
for: 本研究旨在探讨软件机器人的高级通用智能工程,以及如何通过认知架构支持这些工程。
methods: 本研究使用了许多不同的认知架构,以探讨它们如何支持软件机器人的智能行为。
results: 研究发现,使用认知架构可以帮助软件机器人更好地理解和利用多个虚拟环境的便利功能,从而实现更高级的自主智能行为。Abstract
Software bots have attracted increasing interest and popularity in both research and society. Their contributions span automation, digital twins, game characters with conscious-like behavior, and social media. However, there is still a lack of intelligent bots that can adapt to web environments' variability and dynamic nature. Unlike human users, they have difficulty understanding and exploiting the affordances across multiple virtual environments. Despite the hype, bots with human user-like cognition do not currently exist. Chatbots, for instance, lack situational awareness on the digital platforms where they operate, preventing them from enacting meaningful and autonomous intelligent behavior similar to human users. In this survey, we aim to explore the role of cognitive architectures in supporting efforts towards engineering software bots with advanced general intelligence. We discuss how cognitive architectures can contribute to creating intelligent software bots. Furthermore, we highlight key architectural recommendations for the future development of autonomous, user-like cognitive bots.
摘要
软件机器人在研究和社会中受到越来越多的关注和欢迎。它们的贡献包括自动化、数字双胞胎、游戏角色具有意识型行为和社交媒体。然而,目前还缺乏智能机器人,能够适应网络环境的变化和动态性。与人类用户不同,机器人困难理解和利用多个虚拟环境中的便利。尽管有很多赞美,但是机器人与人类用户类似的认知仍然不存在。聊天机器人,例如,在数字平台上缺乏情况意识,使其无法展现出类似人类用户的智能行为。在本调查中,我们想要探索软件机器人的高级通用智能的工程准备。我们讨论了如何使用认知架构来支持高级通用智能机器人的开发。此外,我们也强调了未来开发自主、用户类似认知机器人的关键建筑方案。
Teaching Robots to Build Simulations of Themselves
for: robots to plan and estimate the outcomes of prospective actions without physically executing them
methods: self-supervised learning framework using brief raw video data
results: accurate motion planning and detection of abnormalities/recovery from damageAbstract
Simulation enables robots to plan and estimate the outcomes of prospective actions without the need to physically execute them. We introduce a self-supervised learning framework to enable robots model and predict their morphology, kinematics and motor control using only brief raw video data, eliminating the need for extensive real-world data collection and kinematic priors. By observing their own movements, akin to humans watching their reflection in a mirror, robots learn an ability to simulate themselves and predict their spatial motion for various tasks. Our results demonstrate that this self-learned simulation not only enables accurate motion planning but also allows the robot to detect abnormalities and recover from damage.
摘要
使用模拟可以让机器人规划和估算未来行动的结果,不需要物理执行。我们介绍了一种自我超级学习框架,使机器人通过短暴露视频数据自学模型和预测自己的形态、运动和动力控制,从而消除了大量实际世界数据收集和遥感估计。机器人通过观察自己的运动,类似于人类在镜中观察自己,学习了模拟自己的能力,并且可以准确规划运动和检测异常。我们的结果表明,自学 simulate 不仅允许精准的运动规划,还允许机器人恢复自身损害。
results: 根据论文的结果,使用MLP编码本地特征和减噪正则化可以在三维形状重建 task 上达到比使用 CNN 更高的性能,并且使用的模型参数数量只是使用 CNN 的一半。Abstract
While current state-of-the-art generalizable implicit neural shape models rely on the inductive bias of convolutions, it is still not entirely clear how properties emerging from such biases are compatible with the task of 3D reconstruction from point cloud. We explore an alternative approach to generalizability in this context. We relax the intrinsic model bias (i.e. using MLPs to encode local features as opposed to convolutions) and constrain the hypothesis space instead with an auxiliary regularization related to the reconstruction task, i.e. denoising. The resulting model is the first only-MLP locally conditioned implicit shape reconstruction from point cloud network with fast feed forward inference. Point cloud borne features and denoising offsets are predicted from an exclusively MLP-made network in a single forward pass. A decoder predicts occupancy probabilities for queries anywhere in space by pooling nearby features from the point cloud order-invariantly, guided by denoised relative positional encoding. We outperform the state-of-the-art convolutional method while using half the number of model parameters.
摘要
当前最新的通用隐藏型神经形态模型,它们靠 inductive bias 来实现通用性。然而,这些特性如何与点云 reconstruction 任务相容,仍然不是 entirely clear。我们explore 一种 alternativeto 通用性的方法。我们将本地特征编码使用 MLP 而不是 convolutions,并使用 auxiliary regularization 来约束假设空间。这导致了首个只有 MLP 的本地条件隐藏形态重建从点云网络,具有快速前向推理。从点云中生成的特征和denoising offset 都是通过 exclusively MLP-made 网络在单个前进 pass 预测的。一个解码器在空间中任意位置预测 queries 的占据概率,通过 pooling nearby features 从点云中order-invariantly,以denoised relative positional encoding 为导航。我们在使用一半数量的模型参数时超越了状态的 convolutional 方法。
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
results: 对于两个标准数据集(Human3.6M和MPI-INF-3DHP)的实验结果表明,这种方法可以同时实现高效和准确的3D人姿估计,比原始VPT模型更高效,并且可以适应资源受限的设备。例如,在应用于MotionBERT和MixSTE模型上,HoT可以将计算量减少约50% без损失精度,或者减少约40%的计算量,仅减少精度0.2%。Abstract
Transformers have been successfully applied in the field of video-based 3D human pose estimation. However, the high computational costs of these video pose transformers (VPTs) make them impractical on resource-constrained devices. In this paper, we present a plug-and-play pruning-and-recovering framework, called Hourglass Tokenizer (HoT), for efficient transformer-based 3D human pose estimation from videos. Our HoT begins with pruning pose tokens of redundant frames and ends with recovering full-length tokens, resulting in a few pose tokens in the intermediate transformer blocks and thus improving the model efficiency. To effectively achieve this, we propose a token pruning cluster (TPC) that dynamically selects a few representative tokens with high semantic diversity while eliminating the redundancy of video frames. In addition, we develop a token recovering attention (TRA) to restore the detailed spatio-temporal information based on the selected tokens, thereby expanding the network output to the original full-length temporal resolution for fast inference. Extensive experiments on two benchmark datasets (i.e., Human3.6M and MPI-INF-3DHP) demonstrate that our method can achieve both high efficiency and estimation accuracy compared to the original VPT models. For instance, applying to MotionBERT and MixSTE on Human3.6M, our HoT can save nearly 50% FLOPs without sacrificing accuracy and nearly 40% FLOPs with only 0.2% accuracy drop, respectively. Our source code will be open-sourced.
摘要
<>Transformers 已经成功应用于视频基于三维人体姿态估计领域。然而,高计算成本使得这些视频pose transformers (VPTs) 在有限资源设备上无法实现。在这篇论文中,我们提出了一个插件化剪辑恢复框架,called Hourglass Tokenizer (HoT),用于高效的 transformer 基于三维人体姿态估计从视频中。我们的 HoT 从剪辑姿态token 的剪辑框架中开始,并结束于恢复全长姿态token,从而在转换器中减少一些姿态token,提高模型的效率。为了实现这一点,我们提出了一个姿态剪辑集 (TPC),可以动态选择一些具有高Semantic 多样性的姿态token,同时消除视频帧中的重复性。此外,我们开发了一个姿态恢复注意力 (TRA),以便基于选择的姿态token 还原细致的空间时间信息,从而扩展网络输出到原始的全长时间分辨率,以便快速的推理。我们的实验表明,对于 Human3.6M 和 MPI-INF-3DHP 两个标准数据集,我们的方法可以实现高效和估计精度,比如对于 MotionBERT 和 MixSTE 模型,我们的 HoT 可以将 FLOPs 减少约 50% 不 sacrificing 精度,或者将 FLOPs 减少约 40% 只有 0.2% 精度下降。我们的源代码将被开源。<>
results: 论文发现,即使使用最新的GPT-4模型,AI系统的答案准确率只有39%,而专家和非专家评审人员的答案准确率分别为65%和34%。这表明AI系统需要更多的监督和审核,以确保它们可以提供可靠的信息。Abstract
We present GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. We ensure that the questions are high-quality and extremely difficult: experts who have or are pursuing PhDs in the corresponding domains reach 65% accuracy (74% when discounting clear mistakes the experts identified in retrospect), while highly skilled non-expert validators only reach 34% accuracy, despite spending on average over 30 minutes with unrestricted access to the web (i.e., the questions are "Google-proof"). The questions are also difficult for state-of-the-art AI systems, with our strongest GPT-4 based baseline achieving 39% accuracy. If we are to use future AI systems to help us answer very hard questions, for example, when developing new scientific knowledge, we need to develop scalable oversight methods that enable humans to supervise their outputs, which may be difficult even if the supervisors are themselves skilled and knowledgeable. The difficulty of GPQA both for skilled non-experts and frontier AI systems should enable realistic scalable oversight experiments, which we hope can help devise ways for human experts to reliably get truthful information from AI systems that surpass human capabilities.
摘要
我们提供了GPQA数据集,包含448个多选问题,由领域专家写作生物、物理和化学领域。我们确保这些问题的质量非常高,极其困难:有或者在追求PhD的专家只有65%的准确率(74%不计算明显的错误),而高水平的非专家验证人员只有34%的准确率,尽管他们平均花费超过30分钟,并且有无限时间在网上查询(即问题是"Google-proof")。这些问题也对当前AI系统来说很困难,我们最强的GPT-4基础模型只达39%的准确率。如果我们想用未来的AI系统来帮助我们回答非常困难的问题,例如在发展新科学知识时,我们需要开发可扩展的监督方法,使人类可以监督AI系统的输出,这可能是非常困难,即使监督人员本身具备高水平的技能和知识。GPQA的困难程度不仅对非专家和前沿AI系统来说,也可以为我们进行可扩展的监督实验,我们希望通过这些实验找到一种可靠地从AI系统获取真实信息的方法,以便在AI系统超越人类能力时,人类专家可以得到可靠的信息。
Steering Responsible AI: A Case for Algorithmic Pluralism
results: 该论文提出了对algorithmic pluralism的描述,并评估了这种概念的机遇和挑战。Abstract
In this paper, I examine questions surrounding AI neutrality through the prism of existing literature and scholarship about mediation and media pluralism. Such traditions, I argue, provide a valuable theoretical framework for how we should approach the (likely) impending era of AI mediation. In particular, I suggest examining further the notion of algorithmic pluralism. Contrasting this notion to the dominant idea of algorithmic transparency, I seek to describe what algorithmic pluralism may be, and present both its opportunities and challenges. Implemented thoughtfully and responsibly, I argue, Algorithmic or AI pluralism has the potential to sustain the diversity, multiplicity, and inclusiveness that are so vital to democracy.
摘要
在这篇论文中,我通过现有的文献和学术研究关于媒介和多元媒体来探讨人工智能中立性的问题。这些传统,我 Argument that they provide a valuable theoretical framework for how we should approach the (likely) impending era of AI mediation. 特别是,我 suggets examining further the notion of algorithmic pluralism. 对于 dominant idea of algorithmic transparency,我 seek to describe what algorithmic pluralism may be, and present both its opportunities and challenges. 如果实施得当和负责任,我 Argument that algorithmic or AI pluralism has the potential to sustain the diversity, multiplicity, and inclusiveness that are so vital to democracy.Note: Please keep in mind that the translation is done by a machine and may not be perfect. Also, the grammar and sentence structure may be different from the original text.
BrainWash: A Poisoning Attack to Forget in Continual Learning
results: 实验结果表明,BrainWash方法可以成功地让连续学习模型忘记先前学习的任务,并且可以让模型在不同的常规化学习方法下表现出较差的性能。Abstract
Continual learning has gained substantial attention within the deep learning community, offering promising solutions to the challenging problem of sequential learning. Yet, a largely unexplored facet of this paradigm is its susceptibility to adversarial attacks, especially with the aim of inducing forgetting. In this paper, we introduce "BrainWash," a novel data poisoning method tailored to impose forgetting on a continual learner. By adding the BrainWash noise to a variety of baselines, we demonstrate how a trained continual learner can be induced to forget its previously learned tasks catastrophically, even when using these continual learning baselines. An important feature of our approach is that the attacker requires no access to previous tasks' data and is armed merely with the model's current parameters and the data belonging to the most recent task. Our extensive experiments highlight the efficacy of BrainWash, showcasing degradation in performance across various regularization-based continual learning methods.
摘要
<> simultaeneously learning has gained substantial attention within the deep learning community, offering promising solutions to the challenging problem of sequential learning. Yet, a largely unexplored facet of this paradigm is its susceptibility to adversarial attacks, especially with the aim of inducing forgetting. In this paper, we introduce "BrainWash," a novel data poisoning method tailored to impose forgetting on a continual learner. By adding the BrainWash noise to a variety of baselines, we demonstrate how a trained continual learner can be induced to forget its previously learned tasks catastrophically, even when using these continual learning baselines. An important feature of our approach is that the attacker requires no access to previous tasks' data and is armed merely with the model's current parameters and the data belonging to the most recent task. Our extensive experiments highlight the efficacy of BrainWash, showcasing degradation in performance across various regularization-based continual learning methods.Translated by Google Translate.
Exploring Lip Segmentation Techniques in Computer Vision: A Comparative Analysis
paper_authors: Pietro B. S. Masur, Francisco Braulio Oliveira, Lucas Moreira Medino, Emanuel Huber, Milene Haraguchi Padilha, Cassio de Alcantara, Renata Sellaro
results: Mask2Former EHANet 具有最好的性能, BiSeNet V2 表现竞争力强, PIDNet 具有最高的报告率,但精度较低。Abstract
Lip segmentation is crucial in computer vision, especially for lip reading. Despite extensive face segmentation research, lip segmentation has received limited attention. The aim of this study is to compare state-of-the-art lip segmentation models using a standardized setting and a publicly available dataset. Five techniques, namely EHANet, Mask2Former, BiSeNet V2, PIDNet, and STDC1, are qualitatively selected based on their reported performance, inference time, code availability, recency, and popularity. The CelebAMask-HQ dataset, comprising manually annotated face images, is used to fairly assess the lip segmentation performance of the selected models. Inference experiments are conducted on a Raspberry Pi4 to emulate limited computational resources. The results show that Mask2Former and EHANet have the best performances in terms of mIoU score. BiSeNet V2 demonstrate competitive performance, while PIDNet excels in recall but has lower precision. Most models present inference time ranging from 1000 to around 3000 milliseconds on a Raspberry Pi4, with PIDNet having the lowest mean inference time. This study provides a comprehensive evaluation of lip segmentation models, highlighting their performance and inference times. The findings contribute to the development of lightweight techniques and establish benchmarks for future advances in lip segmentation, especially in IoT and edge computing scenarios.
摘要
<>Translate the given text into Simplified Chinese.<> lip 分 segmentation 是计算机视觉中非常重要的一环,特别是 lip 读。despite 广泛的 face 分 segmentation 研究,lip 分 segmentation 却收到了有限的关注。本研究的目标是 Comparing state-of-the-art lip 分 segmentation 模型,使用标准化的设定和公共可用的数据集进行评估。 Five techniques, namely EHANet, Mask2Former, BiSeNet V2, PIDNet, and STDC1, are qualitatively selected based on their reported performance, inference time, code availability, recency, and popularity. The CelebAMask-HQ dataset, comprising manually annotated face images, is used to fairly assess the lip 分 segmentation performance of the selected models. Inference experiments are conducted on a Raspberry Pi4 to emulate limited computational resources. The results show that Mask2Former and EHANet have the best performances in terms of mIoU score. BiSeNet V2 demonstrates competitive performance, while PIDNet excels in recall but has lower precision. Most models present inference time ranging from 1000 to around 3000 milliseconds on a Raspberry Pi4, with PIDNet having the lowest mean inference time. This study provides a comprehensive evaluation of lip 分 segmentation models, highlighting their performance and inference times. The findings contribute to the development of lightweight techniques and establish benchmarks for future advances in lip 分 segmentation, especially in IoT and edge computing scenarios.
Categorizing the Visual Environment and Analyzing the Visual Attention of Dogs
results: 研究发现,狗具有对汽车、植物、路面和建筑设备的更多视觉注意力,而它们之间没有很大差异。这些结果为了理解狗的视觉行为和与physical world的互动提供了重要的信息。Abstract
Dogs have a unique evolutionary relationship with humans and serve many important roles e.g. search and rescue, blind assistance, emotional support. However, few datasets exist to categorize visual features and objects available to dogs, as well as how dogs direct their visual attention within their environment. We collect and study a dataset with over 11,698 gazes to categorize the objects available to be gazed at by 11 dogs in everyday outdoor environments i.e. a walk around a college campus and urban area. We explore the availability of these object categories and the visual attention of dogs over these categories using a head mounted eye tracking apparatus. A small portion (approx. 600 images or < 20% of total dataset) of the collected data is used to fine tune a MaskRCNN for the novel image domain to segment objects present in the scene, enabling further statistical analysis on the visual gaze tendencies of dogs. The MaskRCNN, with eye tracking apparatus, serves as an end to end model for automatically classifying the visual fixations of dogs. The fine tuned MaskRCNN performs far better than chance. There are few individual differences between the 11 dogs and we observe greater visual fixations on buses, plants, pavement, and construction equipment. This work takes a step towards understanding visual behavior of dogs and their interaction with the physical world.
摘要
狗有独特的进化关系与人类,扮演许多重要的角色,如搜索救援、残疾人帮助、情感支持等。然而,有很少的数据集可以分类狗视觉中的视觉特征和对象,以及狗在环境中如何 dirige 其视觉注意力。我们收集和研究了一个数据集,包含11个狗在日常户外环境中 gaze 的11,698个视觉 fixation,包括大学校园和城市区域的行走。我们研究了这些对象类别的可用性,以及狗在这些类别中的视觉注意力使用头戴式眼动追踪设备。一小部分(约600张图像,即 <20% 的总数据集)的数据用于练化一个MaskRCNN模型,以便在新的图像领域中进行自动物体分类,从而进一步统计狗视觉倾向的统计分析。MaskRCNN模型,具有眼动追踪设备,serve为狗视觉自动分类的终端模型。练化后的MaskRCNN表现远胜准Random。我们发现11只狗之间存在很少的个体差异,而狗更多地围注视汽车、植物、路面和建筑设备。这项工作为了理解狗视觉行为和它们与物理世界的互动而做出了一步。
Leveraging Previous Facial Action Units Knowledge for Emotion Recognition on Faces
results: 研究提出了一种基于 Facial Action Coding System (FACS) 的机器学习方法,以提高多个cue情感识别的精度。Abstract
People naturally understand emotions, thus permitting a machine to do the same could open new paths for human-computer interaction. Facial expressions can be very useful for emotion recognition techniques, as these are the biggest transmitters of non-verbal cues capable of being correlated with emotions. Several techniques are based on Convolutional Neural Networks (CNNs) to extract information in a machine learning process. However, simple CNNs are not always sufficient to locate points of interest on the face that can be correlated with emotions. In this work, we intend to expand the capacity of emotion recognition techniques by proposing the usage of Facial Action Units (AUs) recognition techniques to recognize emotions. This recognition will be based on the Facial Action Coding System (FACS) and computed by a machine learning system. In particular, our method expands over EmotiRAM, an approach for multi-cue emotion recognition, in which we improve over their facial encoding module.
摘要
人们自然地理解情感,因此让机器也能够同样理解可以开启新的人机交互方式。人脸表情是情感识别技术中最大的非语言表达,可以与情感相关。一些技术基于卷积神经网络(CNN)来提取信息,但简单的CNN不一定能够找到与情感相关的面部特征。在这种工作中,我们计划通过使用表情动作单元(AU)识别技术来识别情感。这种识别基于人脸动作编码系统(FACS),由机器学习系统进行计算。具体来说,我们的方法在EmotiRAM方法中的面部编码模块上进行了改进。
Evaluating Supervision Levels Trade-Offs for Infrared-Based People Counting
paper_authors: David Latortue, Moetez Kdayem, Fidel A Guerrero Peña, Eric Granger, Marco Pedersoli
for: 人数计算(人 counting)
methods: 使用深度人脸检测器(deep person counting architectures)和Convolutional Neural Networks(CNN)
results: 使用CNN Image-Level模型可以达到与YOLO探测器和点级模型相当的人数计算结果,同时提供更高的帧率和相似的模型参数量。Abstract
Object detection models are commonly used for people counting (and localization) in many applications but require a dataset with costly bounding box annotations for training. Given the importance of privacy in people counting, these models rely more and more on infrared images, making the task even harder. In this paper, we explore how weaker levels of supervision can affect the performance of deep person counting architectures for image classification and point-level localization. Our experiments indicate that counting people using a CNN Image-Level model achieves competitive results with YOLO detectors and point-level models, yet provides a higher frame rate and a similar amount of model parameters.
摘要
人数检测模型广泛用于人数计算和本地化在多种应用中,但是需要贵重的 bounding box 标注数据来训练。由于隐私问题的重要性,这些模型越来越依赖于 Infrared 图像,使任务变得更加困难。在这篇论文中,我们研究如何弱一级超级视觉模型对深度人数计算架构的影响。我们的实验表明,使用 CNN 图像级别模型来计算人数可以达到与 YOLO 检测器和点级模型相同的性能水平,同时提供更高的帧率和相似的模型参数。
NNG-Mix: Improving Semi-supervised Anomaly Detection with Pseudo-anomaly Generation
results: 经过广泛的实验,该扩展方法在57个 benchmark 数据集上表现出色,与其他数据扩展方法相比,它可以提高预测异常数据的性能。Abstract
Anomaly detection (AD) is essential in identifying rare and often critical events in complex systems, finding applications in fields such as network intrusion detection, financial fraud detection, and fault detection in infrastructure and industrial systems. While AD is typically treated as an unsupervised learning task due to the high cost of label annotation, it is more practical to assume access to a small set of labeled anomaly samples from domain experts, as is the case for semi-supervised anomaly detection. Semi-supervised and supervised approaches can leverage such labeled data, resulting in improved performance. In this paper, rather than proposing a new semi-supervised or supervised approach for AD, we introduce a novel algorithm for generating additional pseudo-anomalies on the basis of the limited labeled anomalies and a large volume of unlabeled data. This serves as an augmentation to facilitate the detection of new anomalies. Our proposed algorithm, named Nearest Neighbor Gaussian Mixup (NNG-Mix), efficiently integrates information from both labeled and unlabeled data to generate pseudo-anomalies. We compare the performance of this novel algorithm with commonly applied augmentation techniques, such as Mixup and Cutout. We evaluate NNG-Mix by training various existing semi-supervised and supervised anomaly detection algorithms on the original training data along with the generated pseudo-anomalies. Through extensive experiments on 57 benchmark datasets in ADBench, reflecting different data types, we demonstrate that NNG-Mix outperforms other data augmentation methods. It yields significant performance improvements compared to the baselines trained exclusively on the original training data. Notably, NNG-Mix yields up to 16.4%, 8.8%, and 8.0% improvements on Classical, CV, and NLP datasets in ADBench. Our source code will be available at https://github.com/donghao51/NNG-Mix.
摘要
anomaly detection (AD) 是必备的在复杂系统中发现罕见和重要事件的工具,应用于网络入侵检测、财务诈骗检测和基础设施和工业系统的故障检测等领域。由于 AD 通常被视为无监督学习任务,因此在实际应用中很难获得大量标注数据。但是,在半监督和监督学习方面,可以利用这些标注数据,从而提高性能。在这篇文章中,我们不是提出一种新的半监督或监督学习方法,而是提出一种新的算法,可以生成更多的 Pseudo-anomaly,以便检测新的异常。我们称之为 Nearest Neighbor Gaussian Mixup (NNG-Mix)。NNG-Mix 算法可以有效地利用标注数据和无标注数据之间的信息,生成 Pseudo-anomaly。我们与常见的混合和剪辑技术进行比较,并通过在不同的数据类型上进行广泛的实验,证明 NNG-Mix 算法在 ADBench 上的57 个标准数据集上具有显著的性能优势。相比基eline 在原始训练数据上进行训练时,NNG-Mix 可以提供 Up to 16.4%、8.8% 和 8.0% 的性能提升。它的源代码将在 GitHub 上公开。
Correlated Attention in Transformers for Multivariate Time Series
results: 对于多种任务,如填充、异常检测和分类,与基本Transformer模型相比,相关注意机制可以提供更好的性能,并且在各种实验中 consistently 表现出优于基本Transformer模型。Abstract
Multivariate time series (MTS) analysis prevails in real-world applications such as finance, climate science and healthcare. The various self-attention mechanisms, the backbone of the state-of-the-art Transformer-based models, efficiently discover the temporal dependencies, yet cannot well capture the intricate cross-correlation between different features of MTS data, which inherently stems from complex dynamical systems in practice. To this end, we propose a novel correlated attention mechanism, which not only efficiently captures feature-wise dependencies, but can also be seamlessly integrated within the encoder blocks of existing well-known Transformers to gain efficiency improvement. In particular, correlated attention operates across feature channels to compute cross-covariance matrices between queries and keys with different lag values, and selectively aggregate representations at the sub-series level. This architecture facilitates automated discovery and representation learning of not only instantaneous but also lagged cross-correlations, while inherently capturing time series auto-correlation. When combined with prevalent Transformer baselines, correlated attention mechanism constitutes a better alternative for encoder-only architectures, which are suitable for a wide range of tasks including imputation, anomaly detection and classification. Extensive experiments on the aforementioned tasks consistently underscore the advantages of correlated attention mechanism in enhancing base Transformer models, and demonstrate our state-of-the-art results in imputation, anomaly detection and classification.
摘要
多变量时间序列(MTS)分析在现实应用中广泛存在,如金融、气候科学和医疗等领域。各种自注意机制,是现代Transformer模型的核心,高效地发现时间相关性,但无法好地捕捉MTS数据中不同特征之间的复杂相关性,这些相关性在实践中来自复杂的动态系统。为此,我们提出了一种新的相关注意机制,不仅高效地捕捉特征间的相关性,还可以轻松地与现有的Transformer模型集成,以提高效率。具体来说,相关注意机制在特征通道之间计算特征查询和特征键之间的差值 cov Matrix,并选择ively Representations at the sub-series level。这种架构使得自动发现和表征学习不同特征之间的不仅当前响应,还包括延迟响应,而且自然地捕捉时间序列自相关。当与普遍的Transformer基线模型结合使用时,相关注意机制组成了更好的encoder-only架构,适用于广泛的任务,如填充、异常检测和分类。extensive experiment表明,相关注意机制在改进基Transformer模型的情况下,在填充、异常检测和分类任务中具有明显的优势,并实现了我们的国际前进result。
FinanceBench: A New Benchmark for Financial Question Answering
for: The paper is written to evaluate the performance of large language models (LLMs) on open book financial question answering (QA).
methods: The paper uses a test suite called FinanceBench, which consists of 10,231 questions about publicly traded companies, to evaluate the performance of 16 state-of-the-art LLM configurations. The authors manually reviewed the answers to the questions (n=2,400) and found that existing LLMs have clear limitations for financial QA.
results: The authors found that GPT-4-Turbo, a popular LLM, incorrectly answered or refused to answer 81% of questions when used with a retrieval system. While augmentation techniques such as using longer context windows can improve performance, they are unrealistic for enterprise settings due to increased latency and cannot support larger financial documents. All models examined exhibited weaknesses such as hallucinations, which limit their suitability for use by enterprises.Abstract
FinanceBench is a first-of-its-kind test suite for evaluating the performance of LLMs on open book financial question answering (QA). It comprises 10,231 questions about publicly traded companies, with corresponding answers and evidence strings. The questions in FinanceBench are ecologically valid and cover a diverse set of scenarios. They are intended to be clear-cut and straightforward to answer to serve as a minimum performance standard. We test 16 state of the art model configurations (including GPT-4-Turbo, Llama2 and Claude2, with vector stores and long context prompts) on a sample of 150 cases from FinanceBench, and manually review their answers (n=2,400). The cases are available open-source. We show that existing LLMs have clear limitations for financial QA. Notably, GPT-4-Turbo used with a retrieval system incorrectly answered or refused to answer 81% of questions. While augmentation techniques such as using longer context window to feed in relevant evidence improve performance, they are unrealistic for enterprise settings due to increased latency and cannot support larger financial documents. We find that all models examined exhibit weaknesses, such as hallucinations, that limit their suitability for use by enterprises.
摘要
金融桌面(FinanceBench)是一个首创性的测试集,用于评估大数据语言模型(LLM)在公开的财务问答(QA)领域的表现。其包含10,231个公开交易公司的问题,以及相应的答案和证据串。 FinanceBench 中的问题是生物地 Valid,覆盖了多样化的场景。它们是为了提供最低性能标准,易于答题。我们在 FinanceBench 中采样 150 个案例,测试了 16 个现状之最模型配置(包括 GPT-4-Turbo、Llama2 和 Claude2,以及 vector store 和长 context prompts),并 manually 复查其答案(n = 2,400)。案例可以在开源的形式下获得。我们发现现有的 LLM 在财务 QA 领域存在明显的限制。例如,GPT-4-Turbo 在使用检索系统时 incorrectly 答题或拒绝答题 81% 的问题。而使用 longer context window 来接受相关证据的增强技术可以提高表现,但是这些技术在企业环境中不可能实现因为增加的延迟,无法支持更大的财务文档。我们发现所有模型受测都存在一些弱点,如幻化,这限制了它们在企业环境中的适用性。
Ovarian Cancer Data Analysis using Deep Learning: A Systematic Review from the Perspectives of Key Features of Data Analysis and AI Assurance
paper_authors: Muta Tah Hira, Mohammad A. Razzaque, Mosharraf Sarker for:* 这些研究主要是为了检测和诊断卵巢癌。methods:* 大多数研究使用了基于样本的 Deep Learning 技术,并且只有一些研究使用了混合数据(临床或ómics数据)进行集成分析。results:* 只有一小部分的研究(仅8.3%) validate their models 使用外部和多样化的数据集,表明需要进一步 validate 模型。* 卵巢癌数据分析中的人工智能确保(AIAs)处于非常早期的阶段,只有2.1% 的研究直接考虑 AIA 通过解释性。Abstract
Background and objectives: By extracting this information, Machine or Deep Learning (ML/DL)-based autonomous data analysis tools can assist clinicians and cancer researchers in discovering patterns and relationships from complex data sets. Many DL-based analyses on ovarian cancer (OC) data have recently been published. These analyses are highly diverse in various aspects of cancer (e.g., subdomain(s) and cancer type they address) and data analysis features. However, a comprehensive understanding of these analyses in terms of these features and AI assurance (AIA) is currently lacking. This systematic review aims to fill this gap by examining the existing literature and identifying important aspects of OC data analysis using DL, explicitly focusing on the key features and AI assurance perspectives. Methods: The PRISMA framework was used to conduct comprehensive searches in three journal databases. Only studies published between 2015 and 2023 in peer-reviewed journals were included in the analysis. Results: In the review, a total of 96 DL-driven analyses were examined. The findings reveal several important insights regarding DL-driven ovarian cancer data analysis: - Most studies 71% (68 out of 96) focused on detection and diagnosis, while no study addressed the prediction and prevention of OC. - The analyses were predominantly based on samples from a non-diverse population (75% (72/96 studies)), limited to a geographic location or country. - Only a small proportion of studies (only 33% (32/96)) performed integrated analyses, most of which used homogeneous data (clinical or omics). - Notably, a mere 8.3% (8/96) of the studies validated their models using external and diverse data sets, highlighting the need for enhanced model validation, and - The inclusion of AIA in cancer data analysis is in a very early stage; only 2.1% (2/96) explicitly addressed AIA through explainability.
摘要
背景和目标:通过提取这些信息,机器学习或深度学习(ML/DL)基于自主数据分析工具可以帮助临床医生和癌症研究人员发现复杂数据集中的模式和关系。在最近几年内,有许多基于深度学习的OC数据分析研究被发表。这些研究在各种方面(如亚Domain和癌种)和数据分析特点上很多样。然而,对这些研究的全面理解,特别是在关键特征和人工智能保障(AI保障)方面,目前缺乏了一个全面的认知。本系统性文献综述旨在填补这一空白,通过检查现有文献,了解OC数据分析中使用DL的重要方面和AI保障视角。方法:使用PRISMA框架进行全面搜索,仅包括2015年至2023年在同行评审 журна尔上发表的研究。结果:在这次综述中,总共检查了96项DL驱动的分析。结果显示了一些关于OC数据分析的重要发现: - 大多数研究(71%,68项中)是关于检测和诊断,而没有一项关于预测和预防OC的研究。 - 这些分析主要基于来自非多样化人口(75%,72项中)的样本,限制在某个地理位置或国家。 - 只有一小部分研究(仅33%,32项中)执行了集成分析,大多数使用同类数据(临床或生物 markers)。 - 需要注意的是,只有8.3%(8项中)的研究 Validated其模型使用外部和多样化数据集,表明需要进一步提高模型验证。 - 在癌症数据分析中,AI保障处于非常早期阶段,只有2.1%(2项中)显式地考虑了AI保障,通过解释性来实现。
Generalization of Fitness Exercise Recognition from Doppler Measurements by Domain-adaption and Few-Shot Learning
results: 对比基eline,使用小数据适应技术可以提高全身运动识别精度,并且在不同用户、环境和设备下的 recognize accuracy 提高了2-6倍。Abstract
In previous works, a mobile application was developed using an unmodified commercial off-the-shelf smartphone to recognize whole-body exercises. The working principle was based on the ultrasound Doppler sensing with the device built-in hardware. Applying such a lab-environment trained model on realistic application variations causes a significant drop in performance, and thus decimate its applicability. The reason of the reduced performance can be manifold. It could be induced by the user, environment, and device variations in realistic scenarios. Such scenarios are often more complex and diverse, which can be challenging to anticipate in the initial training data. To study and overcome this issue, this paper presents a database with controlled and uncontrolled subsets of fitness exercises. We propose two concepts to utilize small adaption data to successfully improve model generalization in an uncontrolled environment, increasing the recognition accuracy by two to six folds compared to the baseline for different users.
摘要
在前一些研究中,我们已经开发了一款基于商业化手机的移动应用程序,用于识别全身运动。工作原理基于商业手机内置的射频雷达感测。但在实际应用中,使用未修改的商业手机进行识别,因为环境、用户和设备的变化,会导致性能下降,从而减少其实用性。这种性能下降的原因可能是多方面的,包括用户、环境和设备变化。这些变化在实际场景中经常更加复杂和多样化,可能难以预测在初始训练数据中。为了研究和解决这个问题,本文提出了一个包含控制和无控制subset的健身运动数据库。我们还提出了两种概念,用于利用小量适应数据来提高模型在无控制环境中的泛化性,提高识别精度两到六倍 compared to基eline для不同的用户。
Continual Learning: Applications and the Road Forward
paper_authors: Eli Verwimp, Rahaf Aljundi, Shai Ben-David, Matthias Bethge, Andrea Cossu, Alexander Gepperth, Tyler L. Hayes, Eyke Hüllermeier, Christopher Kanan, Dhireesha Kudithipudi, Christoph H. Lampert, Martin Mundt, Razvan Pascanu, Adrian Popescu, Andreas S. Tolias, Joost van de Weijer, Bing Liu, Vincenzo Lomonaco, Tinne Tuytelaars, Gido M. van de Ven
results: 这篇论文的结论是,连续学习将成为未来机器学习领域中不可或缺的一部分,并且需要进一步的研究以满足未来的需求。Abstract
Continual learning is a sub-field of machine learning, which aims to allow machine learning models to continuously learn on new data, by accumulating knowledge without forgetting what was learned in the past. In this work, we take a step back, and ask: "Why should one care about continual learning in the first place?". We set the stage by surveying recent continual learning papers published at three major machine learning conferences, and show that memory-constrained settings dominate the field. Then, we discuss five open problems in machine learning, and even though they seem unrelated to continual learning at first sight, we show that continual learning will inevitably be part of their solution. These problems are model-editing, personalization, on-device learning, faster (re-)training and reinforcement learning. Finally, by comparing the desiderata from these unsolved problems and the current assumptions in continual learning, we highlight and discuss four future directions for continual learning research. We hope that this work offers an interesting perspective on the future of continual learning, while displaying its potential value and the paths we have to pursue in order to make it successful. This work is the result of the many discussions the authors had at the Dagstuhl seminar on Deep Continual Learning, in March 2023.
摘要
Explaining Deep Learning Models for Age-related Gait Classification based on time series acceleration
paper_authors: Xiaoping Zheng, Bert Otten, Michiel F Reneman, Claudine JC Lamoth for:This study aimed to enhance transparency in deep learning-based gait classification for aged-related gait patterns using Explainable Artificial Intelligence.methods:The study used a dataset of 244 subjects, including 129 adults and 115 older adults (age>65), who performed a 3-minute walking task while wearing accelerometers at the lumbar segment L3. The study used deep learning models, including convolutional neural networks (CNN) and gated recurrent units (GRU), to classify adult and older adult groups. SHAP was employed to explain the models’ predictions.results:The study found that both CNN and GRU assigned higher SHAP values to the data from vertical and walking directions, particularly emphasizing data around heel contact, spanning from the terminal swing to loading response phases. GRU did not treat every stride equally, and CNN accurately distinguished between adults and older adults based on the characteristics of a single stride’s data. The study found that data around heel contact emerged as most critical, suggesting differences in acceleration and deceleration patterns during walking between different age groups.Here is the format you requested:for: 这些paper是为了做什么?methods: 这个paper使用了哪些方法?results: 这个paper得到了什么结果?I hope this helps! Let me know if you have any other questions.Abstract
Gait analysis holds significant importance in monitoring daily health, particularly among older adults. Advancements in sensor technology enable the capture of movement in real-life environments and generate big data. Machine learning, notably deep learning (DL), shows promise to use these big data in gait analysis. However, the inherent black-box nature of these models poses challenges for their clinical application. This study aims to enhance transparency in DL-based gait classification for aged-related gait patterns using Explainable Artificial Intelligence, such as SHAP. A total of 244 subjects, comprising 129 adults and 115 older adults (age>65), were included. They performed a 3-minute walking task while accelerometers were affixed to the lumbar segment L3. DL models, convolutional neural network (CNN) and gated recurrent unit (GRU), were trained using 1-stride and 8-stride accelerations, respectively, to classify adult and older adult groups. SHAP was employed to explain the models' predictions. CNN achieved a satisfactory performance with an accuracy of 81.4% and an AUC of 0.89, and GRU demonstrated promising results with an accuracy of 84.5% and an AUC of 0.94. SHAP analysis revealed that both CNN and GRU assigned higher SHAP values to the data from vertical and walking directions, particularly emphasizing data around heel contact, spanning from the terminal swing to loading response phases. Furthermore, SHAP values indicated that GRU did not treat every stride equally. CNN accurately distinguished between adults and older adults based on the characteristics of a single stride's data. GRU achieved accurate classification by considering the relationships and subtle differences between strides. In both models, data around heel contact emerged as most critical, suggesting differences in acceleration and deceleration patterns during walking between different age groups.
摘要
跑步分析对日常健康监测具有重要意义,特别是对老年人而言。随着感测技术的发展,我们可以在真实环境中记录和生成大量数据。深度学习(DL)显示出在这些大量数据中使用的潜在优势。然而,深度学习模型的黑盒特性带来了临床应用中的挑战。本研究旨在通过可解释人工智能(AI)技术,如SHAP,提高DL基于走姿分类的透明度。本研究共包括244名参与者,其中129名成年人和115名老年人(年龄超过65岁)。他们在L3脊梁上附加加速计,并完成3分钟步行任务。使用1步和8步加速度训练深度神经网络(CNN)和闭合循环单元(GRU)模型,分别进行成年人和老年人组的分类。SHAP技术用于解释模型预测结果。CNN实现了满意的表现,准确率为81.4%,AUC为0.89,GRU则表现出了有前途的结果,准确率为84.5%,AUC为0.94。SHAP分析表明,CNN和GRU都将 vertical和步行方向的数据作为优先级,特别是在脚部接触阶段(从终端摆动阶段到加速响应阶段)。此外,SHAP值表明GRU不会对每步数据进行平等对待。CNN可以通过单步数据的特征来正确地分类成年人和老年人。GRU则通过考虑步行过程中的关系和细微差异来准确地分类。在两个模型中,脚部接触阶段的数据 emerge as most critical,表明在不同年龄组之间走姿的加速和减速差异存在。
Towards Exploratory Reformulation of Constraint Models
results: 研究已经做出了一些进展,包括开发出一种基于refinement的方法,以及实现了一种可以自动生成模型的系统。Abstract
It is well established that formulating an effective constraint model of a problem of interest is crucial to the efficiency with which it can subsequently be solved. Following from the observation that it is difficult, if not impossible, to know a priori which of a set of candidate models will perform best in practice, we envisage a system that explores the space of models through a process of reformulation from an initial model, guided by performance on a set of training instances from the problem class under consideration. We plan to situate this system in a refinement-based approach, where a user writes a constraint specification describing a problem above the level of abstraction at which many modelling decisions are made. In this position paper we set out our plan for an exploratory reformulation system, and discuss progress made so far.
摘要
已经确立了,制定一个有效的约束模型是解决问题的关键,以便更高效地解决问题。从观察到,不可能在先知道哪一个候选模型会在实践中表现最好,我们想要一个系统可以在一个初始模型的基础上进行重新表述,以便根据训练实例集的性能进行指导。我们计划将这个系统放在一种改进型方法中,其中用户可以在许多模型决策之上写一个约束规范,描述问题。在这篇观点文章中,我们阐述了我们的探索重新表述系统计划,以及已经完成的进度。
Analyzing Emissions and Energy Efficiency in Mixed Traffic Control at Unsignalized Intersections
paper_authors: Michael Villarreal, Dawei Wang, Jia Pan, Weizi Li
For: This paper aims to reduce transportation-related emissions, specifically at signalized intersections, by employing mixed traffic control eco-driving strategies using robot vehicles (RVs).* Methods: The paper uses emissions analysis on unsignalized intersections with complex, real-world topologies and traffic demands, where RVs are used to reduce waiting times and congestion.* Results: The paper finds that with at least 10% RV penetration rate, RVs can reduce fuel consumption and NOx emissions by up to 27% and 28%, respectively, and with at least 30% RVs, CO and HC emissions can be reduced by up to 42% and 43%, respectively. Additionally, RVs can reduce emissions across the whole network.Abstract
Greenhouse gas emissions have dramatically risen since the early 1900s with U.S. transportation generating 28% of the U.S' emissions. As such, there is interest in reducing transportation-related emissions. Specifically, sustainability research has sprouted around signalized intersections as intersections allow different streams of traffic to cross and change directions. Recent research has developed mixed traffic control eco-driving strategies at signalized intersections to decrease emissions. However, the inherent structure of a signalized intersection generates increased emissions by creating frequent acceleration/deceleration events, excessive idling from traffic congestion, and stop-and-go waves. Thus, we believe unsignalized intersections hold potential for further sustainability improvements. In this work, we provide an emissions analysis on unsignalized intersections with complex, real-world topologies and traffic demands where mixed traffic control strategies are employed by robot vehicles (RVs) to reduce waiting times and congestion. We find with at least 10% RV penetration rate, RVs generate less fuel consumption and NOx emissions than signalized intersections by up to 27% and 28%, respectively. With at least 30% RVs, CO and HC emissions are reduced by up to 42% and 43%, respectively. Additionally, RVs can reduce emissions across the whole network despite only employing their strategies at the intersections.
摘要
美国交通部门自20世纪初期以来,绿色气体排放已经有了很大增长。为了降低交通相关排放,可持续发展研究在信号灯交叉口上进行了大量的研究。特别是在信号灯交叉口上,杂交控制驾驶技术的研究已经得到了广泛的应用。然而,信号灯交叉口的内置结构会导致更多的加速/减速事件、交通堵塞导致的停靠时间过长、以及往返波。因此,我们认为不信号灯交叉口具有更多的可持续发展可能性。在这个工作中,我们对无信号灯交叉口进行了排放分析,并采用了机器人车(RV)实施杂交控制策略来减少等待时间和堵塞。我们发现,当RV占用率至少为10%时,RV比信号灯交叉口排放更少的柴油消耗和NOx排放,分别下降27%和28%。当RV占用率至少为30%时,CO和HC排放也下降了42%和43%。此外,RV可以在整个网络中减少排放,即使只在交叉口上采用杂交控制策略。
Establishing Central Sensitization Inventory Cut-off Values in patients with Chronic Low Back Pain by Unsupervised Machine Learning
paper_authors: Xiaoping Zheng, Claudine JC Lamoth, Hans Timmerman, Ebert Otten, Michiel F Reneman for: 这个研究旨在确定chronic low back pain (CLBP)的患者群体中,Central Sensitization Inventory (CSI)的优化阈值,考虑到性别因素和疼痛状况的影响。methods: 这个研究使用了四种不supervised clustering方法来确定CSI中的HACS相关模式,并通过内部和外部指标评估 clustering性能。然后,通过 Receiver Operating Characteristic (ROC)分析确定最佳阈值。results: 研究发现, Hierarchical Clustering 方法得到最佳结果,可以将患者分为三个群体:健康组、CLBP with low HACS level 组和 CLBP with high HACS level 组。对于全体群体,优化阈值为 35,对于女性,阈值为 34,对于男性,阈值为 35。这些结果表明,CLBP 患者的优化阈值为 35。Abstract
Human Assumed Central Sensitization is involved in the development and maintenance of chronic low back pain (CLBP). The Central Sensitization Inventory (CSI) was developed to evaluate the presence of HACS, with a cut-off value of 40/100 based on patients with chronic pain. However, various factors including pain conditions (e.g., CLBP), and gender may influence this cut-off value. For chronic pain condition such as CLBP, unsupervised clustering approaches can take these factors into consideration and automatically learn the HACS-related patterns. Therefore, this study aimed to determine the cut-off values for a Dutch-speaking population with CLBP, considering the total group and stratified by gender based on unsupervised machine learning. In this study, questionnaire data covering pain, physical, and psychological aspects were collected from patients with CLBP and aged-matched pain-free adults (referred to as healthy controls, HC). Four clustering approaches were applied to identify HACS-related clusters based on the questionnaire data and gender. The clustering performance was assessed using internal and external indicators. Subsequently, receiver operating characteristic analysis was conducted on the best clustering results to determine the optimal cut-off values. The study included 151 subjects, consisting of 63 HCs and 88 patients with CLBP. Hierarchical clustering yielded the best results, identifying three clusters: healthy group, CLBP with low HACS level, and CLBP with high HACS level groups. Based on the low HACS levels group (including HC and CLBP with low HACS level) and high HACS level group, the cut-off value for the overall groups were 35, 34 for females, and 35 for. The findings suggest that the optimal cut-off values for CLBP is 35. The gender-related cut-off values should be interpreted with caution due to the unbalanced gender distribution in the sample.
摘要
人类假设中央敏感性(HACS)参与了慢性低脊梁疼痛(CLBP)的发展和维持。中央敏感性评估器(CSI)是用来评估HACS存在的存在,其分别为40/100,基于患有慢性疼痛的患者。然而,不同的因素,包括疼痛状况(例如CLBP)和性别可能影响这个分别值。为了慢性疼痛状况如CLBP,无监督聚类方法可以考虑这些因素并自动发现HACS相关的模式。因此,本研究的目的是在荷兰语言社区中确定CLBP患者的分别值,并根据性别进行分类。在本研究中,收集了疼痛、物理和心理方面的问卷数据,从患有CLBP的患者和年龄匹配的疼痛自适应人(HC)中收集数据。四种聚类方法被应用于基于问卷数据和性别 Identify HACS相关的聚类。聚类性被评估使用内部和外部指标。后续,基于最佳聚类结果进行 receiver operating characteristic 分析,以确定优化的分别值。研究包括151名参与者,包括63名HC和88名CLBP患者。层次聚类得到最佳结果,并将患者分为三个群体:健康组、CLBP低HACS水平组和 CLBP高HACS水平组。根据低HACS水平组(包括HC和CLBP低HACS水平)和高HACS水平组,总体分别值为35,女性分别值为34,男性分别值为35。结论表明,CLBP的最佳分别值为35。性别相关的分别值应该进行注意,因为样本中性别分布不均衡。
Generating Valid and Natural Adversarial Examples with Large Language Models
results: 对于Movie Review(MR)、IMDB和Yelp Review Polarity等 dataset,LLM-Attack模型在对比基eline adversarial attack模型时表现出了显著的优势,在人工和GPT-4评估中也表现出了显著的提升。该模型可以生成有效、自然的攻击示例,保持语义意义、 grammaticality和人类隐身性。Abstract
Deep learning-based natural language processing (NLP) models, particularly pre-trained language models (PLMs), have been revealed to be vulnerable to adversarial attacks. However, the adversarial examples generated by many mainstream word-level adversarial attack models are neither valid nor natural, leading to the loss of semantic maintenance, grammaticality, and human imperceptibility. Based on the exceptional capacity of language understanding and generation of large language models (LLMs), we propose LLM-Attack, which aims at generating both valid and natural adversarial examples with LLMs. The method consists of two stages: word importance ranking (which searches for the most vulnerable words) and word synonym replacement (which substitutes them with their synonyms obtained from LLMs). Experimental results on the Movie Review (MR), IMDB, and Yelp Review Polarity datasets against the baseline adversarial attack models illustrate the effectiveness of LLM-Attack, and it outperforms the baselines in human and GPT-4 evaluation by a significant margin. The model can generate adversarial examples that are typically valid and natural, with the preservation of semantic meaning, grammaticality, and human imperceptibility.
摘要
深度学习基于自然语言处理(NLP)模型,特别是预训练语言模型(PLM),已经被揭示为易受到敌意攻击的。然而,由多数主流单词水平攻击模型生成的攻击示例通常不是有效的,导致 semantic maintenance、grammaticality 和人类不可见性的失去。基于大语言模型(LLM)的异常 capacities of language understanding and generation,我们提出了 LLM-Attack,这是一种生成有效和自然的攻击示例的方法。该方法包括两个阶段:单词重要性排名(搜索最易受到攻击的单词)和单词同义补充(使用 LLM 获取的单词同义补充替换它们)。实验结果表明,对 MR、IMDB 和 Yelp Review Polarity 数据集进行比较,LLM-Attack 高效地超越了基eline adversarial attack模型,在人类和 GPT-4 评估中也具有显著的优势。LLM-Attack 可以生成有效、自然的攻击示例,保持 semantics 的意义、 grammaticality 和人类不可见性。
Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms
paper_authors: Joren Brunekreef, Eric Marcus, Ray Sheombarsing, Jan-Jakob Sonke, Jonas Teuwen for: 这个研究是用于提高像素分类器的准确性和可靠性。methods: 这个方法使用了内在结构调整(Inductive Conformal Prediction)来整理像素分类器的预测结果,并且通过调整像素分类器的对应预测结果,以提高预测的准确性和可靠性。results: 这个研究发现,使用内在结构调整法可以将像素分类器的准确性和可靠性提高,并且在具有限制的数据情况下(如医疗领域),可以更好地利用可用数据进行整理。Abstract
Image segmentation algorithms can be understood as a collection of pixel classifiers, for which the outcomes of nearby pixels are correlated. Classifier models can be calibrated using Inductive Conformal Prediction, but this requires holding back a sufficiently large calibration dataset for computing the distribution of non-conformity scores of the model's predictions. If one only requires only marginal calibration on the image level, this calibration set consists of all individual pixels in the images available for calibration. However, if the goal is to attain proper calibration for each individual pixel classifier, the calibration set consists of individual images. In a scenario where data are scarce (such as the medical domain), it may not always be possible to set aside sufficiently many images for this pixel-level calibration. The method we propose, dubbed ``Kandinsky calibration'', makes use of the spatial structure present in the distribution of natural images to simultaneously calibrate the classifiers of ``similar'' pixels. This can be seen as an intermediate approach between marginal (imagewise) and conditional (pixelwise) calibration, where non-conformity scores are aggregated over similar image regions, thereby making more efficient use of the images available for calibration. We run experiments on segmentation algorithms trained and calibrated on subsets of the public MS-COCO and Medical Decathlon datasets, demonstrating that Kandinsky calibration method can significantly improve the coverage. When compared to both pixelwise and imagewise calibration on little data, the Kandinsky method achieves much lower coverage errors, indicating the data efficiency of the Kandinsky calibration.
摘要
Image segmentation算法可以理解为一组像素分类器,其中邻近像素的结果相关。类ifier模型可以通过卷积学习进行准确化,但需要一个够大的准确化数据集来计算模型预测结果的分布。如果只需要图像水平的准确化,则准确化数据集包括所有可用于准确化的像素。但如果目标是为每个个像素分类器进行准确化,则准确化数据集包括个别图像。在数据稀缺的情况下(如医疗领域),可能无法将够多的图像用于这种像素级准确化。我们提议的“卡金斯基准确化”方法(Kandinsky calibration)利用自然图像分布中的空间结构,同时准确化“相似”像素的分类器。这可以看作是 между像素级和图像级准确化之间的中间方法,其中不符合分布的分数聚合在相似图像区域上,从而更有效地利用可用于准确化的图像。我们在使用MS-COCO和医疗十字数据集训练和准确化的segmentation算法上进行了实验,并证明了卡金斯基准确化方法可以显著提高覆盖率。与像素级和图像级准确化相比,卡金斯基准确化方法在有限的数据情况下表现出远远更低的覆盖错误,表明了卡金斯基准确化方法在数据效率方面的优势。
System 2 Attention (is something you might need too)
methods: 我们提出了 System 2 Attention(S2A),它利用 LLM 的自然语言理解能力和指令执行能力,以决定需要注意的内容。 S2A 首先重新生成输入上下文,并将其限制为只包含相关信息,然后对重新生成的上下文进行注意。
results: 在三个任务中(包括问答、数学问题和长篇生成),S2A 比标准注意力基于 LLM 高效,提高了事实性和 объекivity,降低了奉承性。Abstract
Soft attention in Transformer-based Large Language Models (LLMs) is susceptible to incorporating irrelevant information from the context into its latent representations, which adversely affects next token generations. To help rectify these issues, we introduce System 2 Attention (S2A), which leverages the ability of LLMs to reason in natural language and follow instructions in order to decide what to attend to. S2A regenerates the input context to only include the relevant portions, before attending to the regenerated context to elicit the final response. In experiments, S2A outperforms standard attention-based LLMs on three tasks containing opinion or irrelevant information, QA, math word problems and longform generation, where S2A increases factuality and objectivity, and decreases sycophancy.
摘要
transformer-based large language models (LLMs) 的软注意力容易受到上下文中不相关信息的影响,这会对下一个 Token 生成产生负面影响。为了解决这些问题,我们介绍 System 2 Attention (S2A),它利用 LLMs 的自然语言理解能力和遵循 instrucions 来决定需要注意的内容。S2A 将输入上下文重新生成为只包含相关部分,然后对重新生成的上下文进行注意,以获得最终响应。在实验中,S2A 比标准注意力基于 LLMs 在三个任务中表现出色,包括问答、数学问题和长文生成,S2A 可以提高事实性和公正性,而减少卖舌。
results: 对比基于Random embedding的方法,提出了图变量嵌入CF(GVECF)框架,实现了更好的特征传播和层次GCN协同推荐,最终在 recall 和 NDCG 指标上达到了13.78%的提升。Abstract
The customization of recommended content to users holds significant importance in enhancing user experiences across a wide spectrum of applications such as e-commerce, music, and shopping. Graph-based methods have achieved considerable performance by capturing user-item interactions. However, these methods tend to utilize randomly constructed embeddings in the dataset used for training the recommender, which lacks any user preferences. Here, we propose the concept of variational embeddings as a means of pre-training the recommender system to improve the feature propagation through the layers of graph convolutional networks (GCNs). The graph variational embedding collaborative filtering (GVECF) is introduced as a novel framework to incorporate representations learned through a variational graph auto-encoder which are embedded into a GCN-based collaborative filtering. This approach effectively transforms latent high-order user-item interactions into more trainable vectors, ultimately resulting in better performance in terms of recall and normalized discounted cumulative gain(NDCG) metrics. The experiments conducted on benchmark datasets demonstrate that our proposed method achieves up to 13.78% improvement in the recall over the test data.
摘要
Customization of recommended content to users is crucial in enhancing user experiences in various applications, such as e-commerce, music, and shopping. Graph-based methods have shown significant performance by capturing user-item interactions. However, these methods often rely on randomly constructed embeddings in the training dataset, which neglects user preferences. To address this issue, we propose the concept of variational embeddings as a means of pre-training the recommender system to improve feature propagation through graph convolutional networks (GCNs). We introduce the graph variational embedding collaborative filtering (GVECF) as a novel framework that incorporates representations learned through a variational graph auto-encoder into a GCN-based collaborative filtering. This approach effectively transforms latent high-order user-item interactions into more trainable vectors, resulting in better performance in terms of recall and normalized discounted cumulative gain (NDCG) metrics. Experimental results on benchmark datasets demonstrate that our proposed method achieves up to 13.78% improvement in recall over the test data.
Generalized super-resolution 4D Flow MRI – using ensemble learning to extend across the cardiovascular system
paper_authors: Leon Ericsson, Adam Hjalmarsson, Muhammad Usman Akbar, Edward Ferdian, Mia Bonini, Brandon Hardy, Jonas Schollenberger, Maria Aristova, Patrick Winter, Nicholas Burris, Alexander Fyrdahl, Andreas Sigfridsson, Susanne Schnell, C. Alberto Figueroa, David Nordsletten, Alistair A. Young, David Marlevi
methods: 该研究使用了不同的 convolutional base 和ensemble learning来评估SR 4D Flow MRI的一致性,并使用了synthetic数据生成在三个不同的领域(心血管、大动脉和脑血管)进行评估。
results: 研究结果表明, ensemble learning可以在不同的领域中提高SR性能,并可以准确地预测高分辨率的速度从低分辨率的输入数据中。同时,优化的网络也可以从下采样的实际数据中恢复原始分辨率的速度,以及生成严格的SR-图像。Abstract
4D Flow Magnetic Resonance Imaging (4D Flow MRI) is a non-invasive measurement technique capable of quantifying blood flow across the cardiovascular system. While practical use is limited by spatial resolution and image noise, incorporation of trained super-resolution (SR) networks has potential to enhance image quality post-scan. However, these efforts have predominantly been restricted to narrowly defined cardiovascular domains, with limited exploration of how SR performance extends across the cardiovascular system; a task aggravated by contrasting hemodynamic conditions apparent across the cardiovasculature. The aim of our study was to explore the generalizability of SR 4D Flow MRI using a combination of heterogeneous training sets and dedicated ensemble learning. With synthetic training data generated across three disparate domains (cardiac, aortic, cerebrovascular), varying convolutional base and ensemble learners were evaluated as a function of domain and architecture, quantifying performance on both in-silico and acquired in-vivo data from the same three domains. Results show that both bagging and stacking ensembling enhance SR performance across domains, accurately predicting high-resolution velocities from low-resolution input data in-silico. Likewise, optimized networks successfully recover native resolution velocities from downsampled in-vivo data, as well as show qualitative potential in generating denoised SR-images from clinical level input data. In conclusion, our work presents a viable approach for generalized SR 4D Flow MRI, with ensemble learning extending utility across various clinical areas of interest.
摘要
四维流体磁共振成像(4D Flow MRI)是一种非侵入性测量技术,可量化心血管系统中血液流动的量。 although practical use is limited by spatial resolution and image noise, incorporating trained super-resolution (SR) networks has the potential to enhance image quality post-scan. However, these efforts have been mainly focused on narrowly defined cardiovascular domains, with limited exploration of how SR performance extends across the cardiovascular system; a task that is exacerbated by the contrasting hemodynamic conditions present across the cardiovasculature.我们的研究的目标是探索SR 4D Flow MRI的通用性,使用不同域的训练集和专门的集成学习来评估其性能。我们使用了三个不同的域(心脏、大动脉和脑血管)中的 sintetic 训练数据,以及不同的卷积基和集成学习算法,对域和结构进行评估,以确定它们在不同域中的性能。结果表明,bagging和stacking ensemble ensemble 能够在不同域上提高SR性能,从低分辨率输入数据中预测高分辨率速度,并且在实际水平上成功地恢复原始分辨率速度。此外,优化的网络还可以生成净化后的SR图像从临床水平的输入数据中。总之,我们的研究提出了一种可行的通用SR 4D Flow MRI方法,使用集成学习来扩展其应用范围。这种方法可以在不同的临床领域中实现高质量的SR成像,并且可以帮助解决血液流动的诊断和评估问题。
Improving Real Estate Appraisal with POI Integration and Areal Embedding
paper_authors: Sumin Han, Youngjun Park, Sonia Sabir, Jisun An, Dongman Lee
for: 本研究主要针对两个重要挑战,第一是探讨 Points of Interest (POI) 对房屋价值的影响,并提出了一个涵盖性强的数据驱动方法来选择特征。第二是将路网基于的 Areal Embedding 应用于房地产评估,以提高空间理解。
methods: 本研究提出了一个修改后的 POI 特征提取方法,并讨论了每个 POI 对房屋价值评估的影响。然后,我们提出了一个基于掩盖多头注意力的折衣多项式预测模型(AMMASI),它在扩展现有的 ASI 模型的基础上,运用掩盖多头注意力来捕捉地理邻居房屋和相似特征房屋的特征。
results: 我们的模型在现有基eline上出现了明显的提升,并且还提供了未来优化房地产评估方法的可能性。Abstract
Despite advancements in real estate appraisal methods, this study primarily focuses on two pivotal challenges. Firstly, we explore the often-underestimated impact of Points of Interest (POI) on property values, emphasizing the necessity for a comprehensive, data-driven approach to feature selection. Secondly, we integrate road-network-based Areal Embedding to enhance spatial understanding for real estate appraisal. We first propose a revised method for POI feature extraction, and discuss the impact of each POI for house price appraisal. Then we present the Areal embedding-enabled Masked Multihead Attention-based Spatial Interpolation for House Price Prediction (AMMASI) model, an improvement upon the existing ASI model, which leverages masked multi-head attention on geographic neighbor houses and similar-featured houses. Our model outperforms current baselines and also offers promising avenues for future optimization in real estate appraisal methodologies.
摘要
尽管现有的房地产评估方法有所进步,这种研究主要关注两个重要挑战。首先,我们研究点位 интерес(POI)对房产价值的影响,强调需要一种全面、数据驱动的特征选择方法。其次,我们将路网基于的区域嵌入技术与房产评估相结合,以提高地理理解。我们首先提出了一种POI特征提取方法,然后讨论每个POI对房价评估的影响。接着,我们介绍了使用做废字符串嵌入的掩码多头注意力加速器(AMMASI)模型,这是现有ASI模型的改进,可以更好地利用地理相似特征和邻居房屋的特征。我们的模型在现有基准点上表现出色,并且还提供了未来房地产评估方法的优秀可能性。
Large Language Models and Explainable Law: a Hybrid Methodology
results: 研究人员通过这些解释,赋予非专业人员执行复杂的法律任务的能力,通过自动化法律比较,对同一个事实进行多种规则基于的推理。Abstract
The paper advocates for LLMs to enhance the accessibility, usage and explainability of rule-based legal systems, contributing to a democratic and stakeholder-oriented view of legal technology. A methodology is developed to explore the potential use of LLMs for translating the explanations produced by rule-based systems, from high-level programming languages to natural language, allowing all users a fast, clear, and accessible interaction with such technologies. The study continues by building upon these explanations to empower laypeople with the ability to execute complex juridical tasks on their own, using a Chain of Prompts for the autonomous legal comparison of different rule-based inferences, applied to the same factual case.
摘要
文章强调LLM可以提高法律系统的可用性、使用性和解释性,从而推动法律技术具有民主和利益相关的视角。文章提出了一种方法来探讨LLM可以将高级编程语言生成的解释翻译成自然语言,以便所有用户快速、清晰地与这些技术进行交互。研究继续发展了这些解释,以使非专业人士通过自动化的法律比较,执行复杂的法律任务。Here's a word-for-word translation of the text in Traditional Chinese:文章强调LLM可以提高法律系统的可用性、使用性和解释性,从而推动法律技术具有民主和利益相关的视角。文章提出了一种方法来探讨LLM可以将高级编程语言生成的解释翻译成自然语言,以便所有用户快速、清晰地与这些技术进行交互。研究继续发展了这些解释,以使非专业人士通过自动化的法律比较,执行复杂的法律任务。
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
for: DocPedia is a novel large multimodal model (LMM) for versatile OCR-free document understanding.
methods: DocPedia directly processes visual input in the frequency domain rather than the pixel space, using a limited number of visual tokens to capture a greater amount of visual and textual information.
results: Extensive experiments conducted on various benchmarks confirm the effectiveness and superior performance of DocPedia over other methods.Here’s the simplified Chinese text in the format you requested:
for: DocPedia 是一种 novel large multimodal model (LMM) для多样化 OCR-free 文档理解。
results: 广泛的实验表明 DocPedia 的效果和其他方法相比较高。Abstract
This work presents DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2,560$\times$2,560 resolution. Unlike existing work either struggle with high-resolution documents or give up the large language model thus vision or language ability constrained, our DocPedia directly processes visual input in the frequency domain rather than the pixel space. The unique characteristic enables DocPedia to capture a greater amount of visual and textual information using a limited number of visual tokens. To consistently enhance both perception and comprehension abilities of our model, we develop a dual-stage training strategy and enrich instructions/annotations of all training tasks covering multiple document types. Extensive quantitative and qualitative experiments conducted on various publicly available benchmarks confirm the mutual benefits of jointly learning perception and comprehension tasks. The results provide further evidence of the effectiveness and superior performance of our DocPedia over other methods.
摘要
这个工作介绍了 DocPedia,一种新型的大型多模态模型(LMM),能够无需OCR进行多种文档理解,并且可以处理高分辨率图像(最大2560×2560像素)。与现有的工作不同, DocPedia直接在频谱空间处理视觉输入,而不是像素空间,这使得它能够更好地捕捉更多的视觉和文本信息,只需使用有限的视觉符号。为了不断提高模型的感知和理解能力,我们提出了双Stage训练策略,并对所有训练任务进行了丰富的指令/注释增强。经验证明,这种方法可以同时提高模型的感知和理解能力。广泛的量化和质量测试表明,我们的 DocPedia在其他方法的基础上表现更出色。
Age-Friendly Route Planner: Calculating Comfortable Routes for Senior Citizens
results: 本研究示出了适应老年人的路径规划器可以提供个性化的路径,并且可以帮助创建适应老年人的路径。Abstract
The application of routing algorithms to real-world situations is a widely studied research topic. Despite this, routing algorithms and applications are usually developed for a general purpose, meaning that certain groups, such as ageing people, are often marginalized due to the broad approach of the designed algorithms. This situation may pose a problem in cities which are suffering a slow but progressive ageing of their populations. With this motivation in mind, this paper focuses on describing our implemented Age-Friendly Route Planner, whose goal is to improve the experience in the city for senior citizens. In order to measure the age-friendliness of a route, several variables have been deemed, such as the number of amenities along the route, the amount of comfortable elements found, or the avoidance of sloppy sections. In this paper, we describe one of the main features of the Age-Friendly Route Planner: the preference-based routes, and we also demonstrate how it can contribute to the creation of adapted friendly routes.
摘要
Routing算法在实际应用中是广泛研究的研究主题。然而,routing算法和应用通常是为普遍目标设计的,导致certain groups,如年龄增长的人群,因为设计的算法过广而被排除在外。这种情况可能在年龄增长的城市中带来问题。基于这种动机,这篇论文主要关注描述我们实施的年龄友好路径规划器,以提高城市内的老年人体验。为了测量路径年龄友好程度,我们考虑了许多变量,如路径上的设施数量、舒适元素的存在量或恶势力的避免。在这篇论文中,我们介绍了 preference-based路径的一个主要特点,并示例如何该功能可以为创造适应友好路径做出贡献。
Igniting Language Intelligence: The Hitchhiker’s Guide From Chain-of-Thought Reasoning to Language Agents
paper_authors: Zhuosheng Zhang, Yao Yao, Aston Zhang, Xiangru Tang, Xinbei Ma, Zhiwei He, Yiming Wang, Mark Gerstein, Rui Wang, Gongshen Liu, Hai Zhao
For: The paper discusses the chain-of-thought (CoT) reasoning techniques used in large language models (LLMs) and their applications in developing autonomous language agents.* Methods: The paper explores the foundational mechanics of CoT techniques, including their efficacy and the paradigm shift in CoT reasoning.* Results: The paper discusses the merits of CoT reasoning, including its ability to enhance interpretability, controllability, and flexibility, and its potential for generalization, efficiency, customization, scaling, and safety.Here are the three key points in Simplified Chinese text:* For: 这篇论文讨论了大语言模型(LLM)中的链条思维(CoT)技术,以及它们在语言代理人的发展中的应用。* Methods: 论文探讨了CoT技术的基础机制,包括它的有效性和CoT思维的 парадиг shift。* Results: 论文讨论了CoT思维的优点,包括它的解释性、可控性、灵活性和普遍性,以及其在不同领域中的应用前景。Abstract
Large language models (LLMs) have dramatically enhanced the field of language intelligence, as demonstrably evidenced by their formidable empirical performance across a spectrum of complex reasoning tasks. Additionally, theoretical proofs have illuminated their emergent reasoning capabilities, providing a compelling showcase of their advanced cognitive abilities in linguistic contexts. Critical to their remarkable efficacy in handling complex reasoning tasks, LLMs leverage the intriguing chain-of-thought (CoT) reasoning techniques, obliging them to formulate intermediate steps en route to deriving an answer. The CoT reasoning approach has not only exhibited proficiency in amplifying reasoning performance but also in enhancing interpretability, controllability, and flexibility. In light of these merits, recent research endeavors have extended CoT reasoning methodologies to nurture the development of autonomous language agents, which adeptly adhere to language instructions and execute actions within varied environments. This survey paper orchestrates a thorough discourse, penetrating vital research dimensions, encompassing: (i) the foundational mechanics of CoT techniques, with a focus on elucidating the circumstances and justification behind its efficacy; (ii) the paradigm shift in CoT; and (iii) the burgeoning of language agents fortified by CoT approaches. Prospective research avenues envelop explorations into generalization, efficiency, customization, scaling, and safety. This paper caters to a wide audience, including beginners seeking comprehensive knowledge of CoT reasoning and language agents, as well as experienced researchers interested in foundational mechanics and engaging in cutting-edge discussions on these topics. A repository for the related papers is available at https://github.com/Zoeyyao27/CoT-Igniting-Agent.
摘要
大型语言模型(LLM)已经对语言智能领域产生了巨大的影响,这可以通过它们在复杂的推理任务上表现出色来证明。此外,理论证明也揭示了它们在语言上的高级认知能力。LLMs 利用了Chain of Thought(CoT)推理技术,这使得它们在得出答案之前需要形ulate 中间步骤。CoT 推理方法不仅提高了推理性能,还提高了可读性、可控性和灵活性。由于这些优点, latest research efforts 将 CoT 推理方法应用于语言代理人的开发,以便这些代理人能够遵循语言指令并在不同环境中执行操作。本文协调了一个全面的讨论,涵盖:(i)CoT 技术的基础机理,强调解释 CoT 的效果的原因和情况;(ii)CoT 推理方法的变革;以及(iii)通过 CoT 方法强化语言代理人的发展。未来研究的可能性包括探索 generalization、效率、自定义、扩展和安全。这篇文章适合各种读者,包括想要了解 CoT 推理和语言代理人的新手和经验研究者。相关论文库可以在 https://github.com/Zoeyyao27/CoT-Igniting-Agent 找到。
Beyond Boundaries: A Comprehensive Survey of Transferable Attacks on AI Systems
results: 本文发现了许多传染攻击可以在不同的领域中进行应用,并且这些攻击可以对智能系统的安全性产生很大的影响。此外,本文还提出了一些可能的研究方向,以便更好地探讨传染攻击的领域。Abstract
Artificial Intelligence (AI) systems such as autonomous vehicles, facial recognition, and speech recognition systems are increasingly integrated into our daily lives. However, despite their utility, these AI systems are vulnerable to a wide range of attacks such as adversarial, backdoor, data poisoning, membership inference, model inversion, and model stealing attacks. In particular, numerous attacks are designed to target a particular model or system, yet their effects can spread to additional targets, referred to as transferable attacks. Although considerable efforts have been directed toward developing transferable attacks, a holistic understanding of the advancements in transferable attacks remains elusive. In this paper, we comprehensively explore learning-based attacks from the perspective of transferability, particularly within the context of cyber-physical security. We delve into different domains -- the image, text, graph, audio, and video domains -- to highlight the ubiquitous and pervasive nature of transferable attacks. This paper categorizes and reviews the architecture of existing attacks from various viewpoints: data, process, model, and system. We further examine the implications of transferable attacks in practical scenarios such as autonomous driving, speech recognition, and large language models (LLMs). Additionally, we outline the potential research directions to encourage efforts in exploring the landscape of transferable attacks. This survey offers a holistic understanding of the prevailing transferable attacks and their impacts across different domains.
摘要
人工智能(AI)系统如自动驾驶、人脸识别和语音识别系统在我们日常生活中越来越普遍。然而,尽管它们的实用性,这些AI系统却易受到各种攻击,如对抗攻击、后门攻击、数据毒品攻击、成员推理攻击、模型反向攻击和模型窃取攻击。特别是,许多攻击是专门设计为targeting a particular model or system,但它们的效果可以扩散到其他目标,称为可传递性攻击。虽然有大量的努力在开发可传递性攻击方面,但总的来说,对这些攻击的全面理解仍然不够。在这篇论文中,我们全面探讨基于学习的攻击,特别是在Cyber-Physical Security(Cyber-Physical Security)领域中的可传递性攻击。我们在不同的领域(图像、文本、图ogram、音频、视频)中探讨了可传递性攻击的普遍和普遍性。这篇论文将攻击的架构从不同的视角进行分类和评论,包括数据、过程、模型和系统的视角。我们还考虑了可传递性攻击在实际场景中的影响,如自动驾驶、语音识别和大型自然语言模型(LLMs)。此外,我们还提出了可能的研究方向,以便在探索可传递性攻击的领域进行更多的努力。这篇论文提供了可传递性攻击的全面理解,并对不同领域中的可传递性攻击产生了影响。
PhytNet – Tailored Convolutional Neural Networks for Custom Botanical Data
paper_authors: Jamie R. Sykes, Katherine Denby, Daniel W. Franks
for: This paper is written for the purpose of developing a new deep learning model for automated disease, weed, and crop classification in agriculture, specifically using computer vision techniques.
methods: The paper uses a novel dataset of infrared cocoa tree images and develops a new convolutional neural network (CNN) architecture called PhytNet. The authors also compare the performance of PhytNet with existing CNN architectures like ResNet and EfficientNet.
results: The paper demonstrates the development and performance of PhytNet on a specific dataset of cocoa tree images. The results show that PhytNet displays excellent attention to relevant features, no overfitting, and an exceptionally low computation cost, making it a promising candidate for rapid disease or plant classification, or precise localisation of disease symptoms for autonomous systems.Abstract
Automated disease, weed and crop classification with computer vision will be invaluable in the future of agriculture. However, existing model architectures like ResNet, EfficientNet and ConvNeXt often underperform on smaller, specialised datasets typical of such projects. We address this gap with informed data collection and the development of a new CNN architecture, PhytNet. Utilising a novel dataset of infrared cocoa tree images, we demonstrate PhytNet's development and compare its performance with existing architectures. Data collection was informed by analysis of spectroscopy data, which provided useful insights into the spectral characteristics of cocoa trees. Such information could inform future data collection and model development. Cocoa was chosen as a focal species due to the diverse pathology of its diseases, which pose significant challenges for detection. ResNet18 showed some signs of overfitting, while EfficientNet variants showed distinct signs of overfitting. By contrast, PhytNet displayed excellent attention to relevant features, no overfitting, and an exceptionally low computation cost (1.19 GFLOPS). As such PhytNet is a promising candidate for rapid disease or plant classification, or precise localisation of disease symptoms for autonomous systems.
摘要
自动化疾病、植物和作物分类使用计算机视觉将在农业未来非常重要。然而,现有的模型架构如ResNet、EfficientNet和ConvNeXt经常在小型特殊数据集上表现不佳。我们通过了 informed data collection和开发新的CNN架构,即PhytNet,解决这个问题。我们使用了一个新的红外巧克力树图像集来证明PhytNet的发展和与现有架构进行比较。数据收集是根据谱spectroscopy数据进行分析,提供了有用的信息,例如巧克力树的spectral特征。这种信息可能会在未来的数据收集和模型开发中提供帮助。巧克力是我们选择的关键种类,因为它的疾病多样化和诊断具有挑战性。ResNet18显示了一定程度的过拟合,而EfficientNet变种显示了明显的过拟合。相比之下,PhytNet具有优秀的注意力特征,没有过拟合,计算成本非常低(1.19 GFLOPS)。因此,PhytNet是一个有前途的 кандидат,用于快速的疾病或植物分类,或精确的疾病 симптом的自动化识别。
Responsible AI Research Needs Impact Statements Too
paper_authors: Alexandra Olteanu, Michael Ekstrand, Carlos Castillo, Jina Suh
for: The paper is written to explore the potential unintended and adverse consequences of responsible artificial intelligence (RAI), ethical AI, and ethics in AI.
methods: The paper uses a qualitative research approach, including a literature review and expert interviews, to identify potential risks and challenges associated with RAI, ethical AI, and ethics in AI.
results: The paper highlights several potential unintended and adverse consequences of RAI, ethical AI, and ethics in AI, including the risk of reinforcing existing biases and power imbalances, the potential for unintended consequences of AI systems, and the need for careful consideration of ethical issues in AI development and deployment.Abstract
All types of research, development, and policy work can have unintended, adverse consequences - work in responsible artificial intelligence (RAI), ethical AI, or ethics in AI is no exception.
摘要
所有类型的研究、开发和政策工作都可能有意外、不良影响 - 负责任人工智能(RAI)、伦理AI或AI伦理都不例外。Note that "负责任人工智能" (RAI) is a term used in China to refer to "ethical AI" or "responsible AI".
Intelligent methods for business rule processing: State-of-the-art
paper_authors: Cristiano André da Costa, Uélison Jean Lopes dos Santos, Eduardo Souza dos Reis, Rodolfo Stoffel Antunes, Henrique Chaves Pacheco, Thaynã da Silva França, Rodrigo da Rosa Righi, Jorge Luis Victória Barbosa, Franklin Jebadoss, Jorge Montalvao, Rogerio Kunkel
for: 这篇论文主要用于介绍最新的智能技术在业务规则处理方面的应用。
methods: 论文采用了涵义检索和机器学习等智能方法进行研究。
results: 论文对市场前十家供应商和其主要解决方案进行了审查和分析。Abstract
In this article, we provide an overview of the latest intelligent techniques used for processing business rules. We have conducted a comprehensive survey of the relevant literature on robot process automation, with a specific focus on machine learning and other intelligent approaches. Additionally, we have examined the top vendors in the market and their leading solutions to tackle this issue.
摘要
在本文中,我们提供了对最新的智能技术处理商业规则的概述。我们进行了对相关文献的全面调查,具体强调机器学习和其他智能方法。此外,我们还评估了市场上领先的供应商和他们的主要解决方案。
Unveiling the Unseen Potential of Graph Learning through MLPs: Effective Graph Learners Using Propagation-Embracing MLPs
results: 通过对实际世界 benchmark 数据集进行了广泛的评估,我们证明了 P&D 的效果,并表明了学生 MLP 的性能得到了进一步提高。Abstract
Recent studies attempted to utilize multilayer perceptrons (MLPs) to solve semi-supervised node classification on graphs, by training a student MLP by knowledge distillation (KD) from a teacher graph neural network (GNN). While previous studies have focused mostly on training the student MLP by matching the output probability distributions between the teacher and student models during KD, it has not been systematically studied how to inject the structural information in an explicit and interpretable manner. Inspired by GNNs that separate feature transformation $T$ and propagation $\Pi$, we re-frame the KD process as enabling the student MLP to explicitly learn both $T$ and $\Pi$. Although this can be achieved by applying the inverse propagation $\Pi^{-1}$ before distillation from the teacher GNN, it still comes with a high computational cost from large matrix multiplications during training. To solve this problem, we propose Propagate & Distill (P&D), which propagates the output of the teacher GNN before KD and can be interpreted as an approximate process of the inverse propagation $\Pi^{-1}$. Through comprehensive evaluations using real-world benchmark datasets, we demonstrate the effectiveness of P&D by showing further performance boost of the student MLP.
摘要
Inspired by GNNs that separate feature transformation $T$ and propagation $\Pi$, we re-frame the KD process as enabling the student MLP to explicitly learn both $T$ and $\Pi$. This can be achieved by applying the inverse propagation $\Pi^{-1}$ before distillation from the teacher GNN, but this comes with a high computational cost from large matrix multiplications during training.To solve this problem, we propose Propagate & Distill (P&D), which propagates the output of the teacher GNN before KD and can be interpreted as an approximate process of the inverse propagation $\Pi^{-1}$. Through comprehensive evaluations using real-world benchmark datasets, we demonstrate the effectiveness of P&D by showing a further performance boost of the student MLP.
LSTM-CNN: An efficient diagnostic network for Parkinson’s disease utilizing dynamic handwriting analysis
paper_authors: Xuechao Wang, Junqing Huang, Sven Nomm, Marianna Chatzakou, Kadri Medijainen, Aaro Toomela, Michael Ruzhansky for: 这个研究的目的是提出一种基于深度学习的动手写分析方法,以提供早期诊断 Parkinson 病的 объектив标准。methods: 该方法采用了一种混合深度学习approach,结合了LSTM和CNN两种不同的神经网络模型,以提高分类精度和计算效率。results: 实验结果表明,提出的方法在新的 DraWritePD 数据集上达到了96.2%的高精度分类率,并在 PaHaW 数据集上达到了90.7%的高精度分类率。此外,该方法还具有轻量级的参数和计算量,可以在几十万个样本上进行实时推理。Abstract
Background and objectives: Dynamic handwriting analysis, due to its non-invasive and readily accessible nature, has recently emerged as a vital adjunctive method for the early diagnosis of Parkinson's disease. In this study, we design a compact and efficient network architecture to analyse the distinctive handwriting patterns of patients' dynamic handwriting signals, thereby providing an objective identification for the Parkinson's disease diagnosis. Methods: The proposed network is based on a hybrid deep learning approach that fully leverages the advantages of both long short-term memory (LSTM) and convolutional neural networks (CNNs). Specifically, the LSTM block is adopted to extract the time-varying features, while the CNN-based block is implemented using one-dimensional convolution for low computational cost. Moreover, the hybrid model architecture is continuously refined under ablation studies for superior performance. Finally, we evaluate the proposed method with its generalization under a five-fold cross-validation, which validates its efficiency and robustness. Results: The proposed network demonstrates its versatility by achieving impressive classification accuracies on both our new DraWritePD dataset ($96.2\%$) and the well-established PaHaW dataset ($90.7\%$). Moreover, the network architecture also stands out for its excellent lightweight design, occupying a mere $0.084$M of parameters, with a total of only $0.59$M floating-point operations. It also exhibits near real-time CPU inference performance, with inference times ranging from $0.106$ to $0.220$s. Conclusions: We present a series of experiments with extensive analysis, which systematically demonstrate the effectiveness and efficiency of the proposed hybrid neural network in extracting distinctive handwriting patterns for precise diagnosis of Parkinson's disease.
摘要
背景和目标:动态手写分析因为其不侵入性和Ready accessible的特点,最近在诊断parkinson病的早期诊断中得到了广泛应用。在本研究中,我们设计了一个紧凑型和高效的网络架构,以分析患者的动态手写信号特征,从而提供一种对parkinson病诊断的 объектив标准。方法:提议的网络采用了一种混合深度学习方法,旨在挖掘患者的动态手写特征。具体来说,LSTM块被采用来提取时变特征,而CNN基于的块则是通过一维 convolution来实现低计算成本。此外,我们还进行了一系列的ablation study,以进一步提高表现。最后,我们使用五fold Cross-Validation进行评估,以验证提议的方法的可行性和稳定性。结果:提议的网络在我们的新的DraWritePD数据集上达到了96.2%的分类精度,同时在已知的PaHaW数据集上也达到了90.7%的分类精度。此外,该网络架构还具有优秀的轻量级设计,占用约0.084个参数,总计约0.59亿浮点运算。它还表现出了几乎实时的CPU执行时间,执行时间在0.106-0.220秒之间。结论:我们通过了一系列的实验和分析,系统地证明了提议的混合神经网络在诊断parkinson病的精度和效率。
A Large-Scale Car Parts (LSCP) Dataset for Lightweight Fine-Grained Detection
results: 研究人员通过使用许多轻量级YOLO系列检测器进行精细的汽车部件检测,并证明了数据集的有效性。Abstract
Automotive related datasets have previously been used for training autonomous driving systems or vehicle classification tasks. However, there is a lack of datasets in the field of automotive AI for car parts detection, and most available datasets are limited in size and scope, struggling to cover diverse scenarios. To address this gap, this paper presents a large-scale and fine-grained automotive dataset consisting of 84,162 images for detecting 12 different types of car parts. This dataset was collected from natural cameras and online websites which covers various car brands, scenarios, and shooting angles. To alleviate the burden of manual annotation, we propose a novel semi-supervised auto-labeling method that leverages state-of-the-art pre-trained detectors. Moreover, we study the limitations of the Grounding DINO approach for zero-shot labeling. Finally, we evaluate the effectiveness of our proposed dataset through fine-grained car parts detection by training several lightweight YOLO-series detectors.
摘要
自动驾驶系统或车型分类任务上使用了汽车相关数据集。然而,车部AI领域内有数据集缺失,现有数据集大多受限,缺乏多样化场景的覆盖。为了填补这一空白,本文提出了一个大规模、细致的汽车数据集,包含12种不同的车部类型的84,162张图像。这个数据集来自于自然摄像头和在线网站,覆盖了多种车型、场景和拍摄角度。为了避免手动标注的劳动ious burden,我们提出了一种新的半supervised自动标注方法,利用了当前领先的预训练检测器。此外,我们研究了零shot标签Grounding DINO的局限性。最后,我们通过使用多种轻量级YOLO系列检测器进行细致的车部检测来评估我们提出的数据集的有效性。
Sparse4D v3: Advancing End-to-End 3D Detection and Tracking
results: 经验表明,在 nuScenes 测试集上,我们的提出改进可以获得显著提高,包括 mAP 、NDS 和 AMOTA 的提高率。最佳模型在 nuScenes 测试集上达到了 71.9% NDS 和 67.7% AMOTA。Abstract
In autonomous driving perception systems, 3D detection and tracking are the two fundamental tasks. This paper delves deeper into this field, building upon the Sparse4D framework. We introduce two auxiliary training tasks (Temporal Instance Denoising and Quality Estimation) and propose decoupled attention to make structural improvements, leading to significant enhancements in detection performance. Additionally, we extend the detector into a tracker using a straightforward approach that assigns instance ID during inference, further highlighting the advantages of query-based algorithms. Extensive experiments conducted on the nuScenes benchmark validate the effectiveness of the proposed improvements. With ResNet50 as the backbone, we witnessed enhancements of 3.0\%, 2.2\%, and 7.6\% in mAP, NDS, and AMOTA, achieving 46.9\%, 56.1\%, and 49.0\%, respectively. Our best model achieved 71.9\% NDS and 67.7\% AMOTA on the nuScenes test set. Code will be released at \url{https://github.com/linxuewu/Sparse4D}.
摘要
自动驾驶视觉系统中,3D探测和跟踪是两个基本任务。这篇论文将深入探讨这一领域,基于Sparse4D框架。我们引入了两个辅助训练任务(时间实例干净和质量估计),并提出了分离注意力的方法,导致探测性能得到了显著提高。此外,我们将探测器转换成跟踪器,使用简单的方法,在推理时分配实例ID,进一步发挥了查询基于算法的优势。我们在nuScenes benchmark上进行了广泛的实验, validate了我们提出的改进方法的效果。使用ResNet50作为背景网络,我们在mAP、NDS和AMOTA中提高了3.0\%、2.2\%和7.6\%,分别达到了46.9\%、56.1\%和49.0\%。我们的最佳模型在nuScenes测试集上达到了71.9\%的NDS和67.7\%的AMOTA。代码将在 GitHub上发布,详细信息请参考 \url{https://github.com/linxuewu/Sparse4D}.
Can we infer the presence of Differential Privacy in Deep Learning models’ weights? Towards more secure Deep Learning
results: 通过分析模型参数,可以判断模型是否在训练过程中使用了Diff Privacy,不需要信任模型提供者。Abstract
Differential Privacy (DP) is a key property to protect data and models from integrity attacks. In the Deep Learning (DL) field, it is commonly implemented through the Differentially Private Stochastic Gradient Descent (DP-SGD). However, when a model is shared or released, there is no way to check whether it is differentially private, that is, it required to trust the model provider. This situation poses a problem when data privacy is mandatory, specially with current data regulations, as the presence of DP can not be certificated consistently by any third party. Thus, we face the challenge of determining whether a DL model has been trained with DP, according to the title question: Can we infer the presence of Differential Privacy in Deep Learning models' weights? Since the DP-SGD significantly changes the training process of a DL model, we hypothesize that DP leaves an imprint in the weights of a DL model, which can be used to predict whether a model has been trained with DP regardless of its architecture and the training dataset. In this paper, we propose to employ the imprint in model weights of using DP to infer the presence of DP training in a DL model. To substantiate our hypothesis, we developed an experimental methodology based on two datasets of weights of DL models, each with models with and without DP training and a meta-classifier to infer whether DP was used in the training process of a DL model, by accessing its weights. We accomplish both, the removal of the requirement of a trusted model provider and a strong foundation for this interesting line of research. Thus, our contribution is an additional layer of security on top of the strict private requirements of DP training in DL models, towards to DL models.
摘要
diffeential privacy (DP) 是一种保护数据和模型的重要性能。在深度学习(DL)领域,通常通过差分 private stochastic gradient descent (DP-SGD) 实现。但当模型被分享或发布时,没有方式来检查该模型是否具有差分隐私,即需要信任模型提供者。这种情况对于数据隐私是必要的,特别是现有的数据法规,因为差分隐私的存在无法被第三方证明。因此,我们面临着判断一个深度学习模型是否在训练过程中使用差分隐私的挑战。根据标题提问,我们问:可以通过深度学习模型的参数来推断它是否在差分隐私训练中?因为差分-SGD 对深度学习模型的训练过程进行了重要变化,我们假设差分隐私会留下在深度学习模型的参数中的印记,可以用来预测该模型是否在差分隐私训练中,无论其架构和训练数据。在这篇论文中,我们提议使用差分隐私训练中模型参数中的印记来判断深度学习模型是否在差分隐私训练中。为了证实我们的假设,我们采用了一种实验方法,该方法基于两个深度学习模型参数的数据集,每个数据集包含具有和不具有差分隐私训练的模型,以及一个元类фика器来预测深度学习模型是否在差分隐私训练中。通过这种方法,我们成功地 removeds了需要信任模型提供者的要求,并提供了一种强大的基础 для这一有趣的研究领域。因此,我们的贡献是在差分隐私训练中添加了一层安全性,以加强深度学习模型的隐私保护。
results: 研究人员通过实践和分析,发现这种集成方式可以提供更高的控制级别和更好的性能,同时避免模型“幻觉”现象的出现Abstract
Customer data typically is held in database systems, which can be seen as rule-based knowledge base, whereas businesses increasingly want to benefit from the capabilities of large, pre-trained language models. In this technical report, we describe a case study of how a commercial rule engine and an integrated neural chatbot may be integrated, and what level of control that particular integration mode leads to. We also discuss alternative ways (including past ways realized in other systems) how researchers strive to maintain control and avoid what has recently been called model "hallucination".
摘要
客户数据通常被存储在数据库系统中,可以看作为规则基本知识库。而企业正在寻求利用大量预训练语言模型的能力。在这份技术报告中,我们描述了一个商业规则引擎和一个集成的神经网络聊天机器人的集成方式,该集成方式导致了什么样的控制水平。我们还讨论了其他方法(包括过去在其他系统中实现的方法)如何维护控制并避免最近被称为“模型幻觉”的问题。
Sparse Low-rank Adaptation of Pre-trained Language Models
paper_authors: Ning Ding, Xingtai Lv, Qiaosen Wang, Yulin Chen, Bowen Zhou, Zhiyuan Liu, Maosong Sun
for: 提高 fine-tuning 大型自然语言模型的效率和效iveness
methods: 增加 LoRA 方法的灵活性,通过动态调整归一化级别来控制约束维度
results: SoRA 可以与其他基eline相比,即使剩下 70% 参数和 70% 训练时间,也可以获得更好的表现Abstract
Fine-tuning pre-trained large language models in a parameter-efficient manner is widely studied for its effectiveness and efficiency. The popular method of low-rank adaptation (LoRA) offers a notable approach, hypothesizing that the adaptation process is intrinsically low-dimensional. Although LoRA has demonstrated commendable performance, it is implemented with a fixed and unalterable intrinsic rank that might not always be the ideal choice. Recognizing the need for more flexible adaptation, we extend the methodology of LoRA to an innovative approach we call sparse low-rank adaptation (SoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process. We achieve this through the incorporation of a gate unit optimized with proximal gradient method in the training stage, controlling the cardinality of rank under the sparsity of the gate. In the subsequent inference stage, we eliminate the parameter blocks corresponding to the zeroed-out ranks, to reduce each SoRA module back to a concise yet rank-optimal LoRA. Our approach strengthens the representation power of LoRA by initializing it with a higher rank, while efficiently taming a temporarily increased number of parameters via updating in a sparse way. We further introduce a sparsifying scheduler for SoRA, aiming to examine the impact of the number of non-zero parameters on the model's memorization and generalization. Our experimental results demonstrate that SoRA can outperform other baselines even with 70% retained parameters and 70% training time.
摘要
大量语言模型的精细调整已被广泛研究,以提高效率和表现。LoRA方法在这些研究中具有重要地位,假设适应过程是低维度的。虽然LoRA已经表现出色,但它使用固定和不可变的内在维度,可能不是理想的选择。我们认为需要更 flexible的适应方式,于是我们扩展了LoRA方法,并将其称为SoRA。在训练阶段,我们通过在训练过程中使用适应器来控制维度的卡达利度,使得SoRA模块在执行过程中可以动态地调整其内在维度。在推理阶段,我们可以根据适应器的输出来决定是否保留每个SoRA模块中的参数块。我们的方法可以强化LoRA的表现力,同时efficiently 控制参数的数量。我们还引入了一个缩短调度器来考虑SoRA模块中参数的数量对模型的记忆和泛化造成的影响。我们的实验结果表明,SoRA可以在70% retained parameters和70% 训练时间下超越其他基elines。
Towards Robust Text Retrieval with Progressive Learning
paper_authors: Tong Wu, Yulei Qin, Enwei Zhang, Zihan Xu, Yuting Gao, Ke Li, Xing Sun
for: This paper aims to improve the performance of large language models (LLMs) in retrieving up-to-date and domain-specific information by proposing a progressively learned embeddings (PEG) model.
methods: The PEG model uses a progressive learning mechanism that dynamically modulates its attention to samples throughout the entire training process, and it is trained on more than 100 million data covering various tasks and domains.
results: The PEG model outperforms state-of-the-art embeddings in retrieving true positives, demonstrating its significant potential for applications in LLMs.Abstract
Retrieval augmentation has become an effective solution to empower large language models (LLMs) with external and verified knowledge sources from the database, which overcomes the limitations and hallucinations of LLMs in handling up-to-date and domain-specific information. However, existing embedding models for text retrieval usually have three non-negligible limitations. First, the number and diversity of samples in a batch are too restricted to supervise the modeling of textual nuances at scale. Second, the high proportional noise are detrimental to the semantic correctness and consistency of embeddings. Third, the equal treatment to easy and difficult samples would cause sub-optimum convergence of embeddings with poorer generalization. In this paper, we propose the PEG, a progressively learned embeddings for robust text retrieval. Specifically, we increase the training in-batch negative samples to 80,000, and for each query, we extracted five hard negatives. Concurrently, we incorporated a progressive learning mechanism, enabling the model to dynamically modulate its attention to the samples throughout the entire training process. Additionally, PEG is trained on more than 100 million data, encompassing a wide range of domains (e.g., finance, medicine, and tourism) and covering various tasks (e.g., question-answering, machine reading comprehension, and similarity matching). Extensive experiments conducted on C-MTEB and DuReader demonstrate that PEG surpasses state-of-the-art embeddings in retrieving true positives, highlighting its significant potential for applications in LLMs. Our model is publicly available at https://huggingface.co/TownsWu/PEG.
摘要
大量语言模型(LLM)的问题解决方法之一是使用数据库中的可靠和证明的知识来强化LLM,这种方法可以超越LLM在处理最新和域pecific信息方面的局限性和偏见。然而,现有的文本检索嵌入模型通常具有三种不可忽略的限制。首先,批处理中的样本数量和多样性太少,无法全面supervise模型处理文本细节的各种变化。其次,高卷积噪音会导致嵌入的semantic正确性和一致性受到损害。第三,对于易于处理和困难处理的样本进行相同的处理会导致嵌入的优化不佳,从而影响其总体性能。在这篇论文中,我们提出了Progressive Embeddings for Robust Text Retrieval(PEG)模型,用于解决这些限制。具体来说,我们增加了批处理中的负样本数量至80,000,并对每个查询选择五个困难的负样本。同时,我们还实现了一种进程学习机制,使得模型可以在整个训练过程中动态调整对样本的注意力。此外,PEG模型在1000万多个数据上进行训练,覆盖了多个领域(如金融、医学和旅游等)和多种任务(如问答、机器阅读理解和相似性匹配等)。我们在C-MTEB和DuReader上进行了广泛的实验,显示PEG模型在检索真正正确的样本方面表现出色, highlighting its significant potential for LLM applications。我们的模型可以在https://huggingface.co/TownsWu/PEG中找到。
Refactoring Programs Using Large Language Models with Few-Shot Examples
results: 95.68%的程序可以通过生成10个候选程序来简化,减少平均 cyclomatic complexity 17.35%,减少平均行数25.84%,且有出色的代码格式化能力,但也有一些不必要的行为,如删除或翻译注释。Abstract
A less complex and more straightforward program is a crucial factor that enhances its maintainability and makes writing secure and bug-free programs easier. However, due to its heavy workload and the risks of breaking the working programs, programmers are reluctant to do code refactoring, and thus, it also causes the loss of potential learning experiences. To mitigate this, we demonstrate the application of using a large language model (LLM), GPT-3.5, to suggest less complex versions of the user-written Python program, aiming to encourage users to learn how to write better programs. We propose a method to leverage the prompting with few-shot examples of the LLM by selecting the best-suited code refactoring examples for each target programming problem based on the prior evaluation of prompting with the one-shot example. The quantitative evaluation shows that 95.68% of programs can be refactored by generating 10 candidates each, resulting in a 17.35% reduction in the average cyclomatic complexity and a 25.84% decrease in the average number of lines after filtering only generated programs that are semantically correct. Furthermore, the qualitative evaluation shows outstanding capability in code formatting, while unnecessary behaviors such as deleting or translating comments are also observed.
摘要
一个较简单且直观的程式是一个重要的因素,可以提高程式的维护和写作安全、无错程式的能力。然而,由于工作负担重大和可能会破坏正常运行的程式,因此开发者对程式 refactoring 的态度不够积极,从而导致学习机会的损失。为了解决这个问题,我们示范了使用大型自然语言模型(LLM)GPT-3.5,可以建议使用者写的 Python 程式中更加简单的版本,以便帮助使用者学习写更好的程式。我们提出了一种方法,利用LLM的提示,选择每个目标程式问题最适合的代码 refactoring 示例,根据先前评估的提示一个例子。根据量化评估,95.68%的程式可以通过生成10个候选者,实现了17.35%的减少平均顶点复杂度和25.84%的减少平均行数。此外,量化评估还表明了代码格式化的出色能力,而不必要的行为,如删除或翻译注解,也被观察到。
Causal Structure Learning Supervised by Large Language Model
results: 在 eight 个实际数据集上进行了 comprehensive 评估,显示 ILS-CSL 的性能更高,创造了新的标准 для CSL 效率,并展示了其在 causal discovery 领域的潜在进步Abstract
Causal discovery from observational data is pivotal for deciphering complex relationships. Causal Structure Learning (CSL), which focuses on deriving causal Directed Acyclic Graphs (DAGs) from data, faces challenges due to vast DAG spaces and data sparsity. The integration of Large Language Models (LLMs), recognized for their causal reasoning capabilities, offers a promising direction to enhance CSL by infusing it with knowledge-based causal inferences. However, existing approaches utilizing LLMs for CSL have encountered issues, including unreliable constraints from imperfect LLM inferences and the computational intensity of full pairwise variable analyses. In response, we introduce the Iterative LLM Supervised CSL (ILS-CSL) framework. ILS-CSL innovatively integrates LLM-based causal inference with CSL in an iterative process, refining the causal DAG using feedback from LLMs. This method not only utilizes LLM resources more efficiently but also generates more robust and high-quality structural constraints compared to previous methodologies. Our comprehensive evaluation across eight real-world datasets demonstrates ILS-CSL's superior performance, setting a new standard in CSL efficacy and showcasing its potential to significantly advance the field of causal discovery. The codes are available at \url{https://github.com/tyMadara/ILS-CSL}.
摘要
causal discovery from observational data 是解释复杂关系的关键。 causal Structure Learning (CSL) 是 deriv ing causal Directed Acyclic Graphs (DAGs) from data 的方法,面临的挑战包括庞大 DAG 空间和数据稀缺。 将 Large Language Models (LLMs) integrate into CSL 提供了一个有前途的方向, LLMS 被认可为具有 causal 推理能力。 however, existing approaches using LLMs for CSL have encountered issues, including unreliable constraints from imperfect LLM inferences and the computational intensity of full pairwise variable analyses。 In response, we introduce the Iterative LLM Supervised CSL (ILS-CSL) framework。 ILS-CSL 创新地将 LLM-based causal inference 与 CSL 集成在迭代过程中,通过 LLM 反馈来约束 causal DAG。 这种方法不仅更有效使用 LLM 资源,还生成了更加可靠和高质量的结构约束,与之前的方法相比。 our comprehensive evaluation across eight real-world datasets demonstrates ILS-CSL's superior performance, setting a new standard in CSL efficacy and showcasing its potential to significantly advance the field of causal discovery。 codes are available at \url{https://github.com/tyMadara/ILS-CSL}.
ViP-Mixer: A Convolutional Mixer for Video Prediction
results: 在三个标准视频数据集上实现新的最佳预测性能。Abstract
Video prediction aims to predict future frames from a video's previous content. Existing methods mainly process video data where the time dimension mingles with the space and channel dimensions from three distinct angles: as a sequence of individual frames, as a 3D volume in spatiotemporal coordinates, or as a stacked image where frames are treated as separate channels. Most of them generally focus on one of these perspectives and may fail to fully exploit the relationships across different dimensions. To address this issue, this paper introduces a convolutional mixer for video prediction, termed ViP-Mixer, to model the spatiotemporal evolution in the latent space of an autoencoder. The ViP-Mixers are stacked sequentially and interleave feature mixing at three levels: frames, channels, and locations. Extensive experiments demonstrate that our proposed method achieves new state-of-the-art prediction performance on three benchmark video datasets covering both synthetic and real-world scenarios.
摘要
视频预测目标是预测视频的未来帧。现有方法主要处理视频数据,其时间维度杂mix With space和channel维度,从三个不同的角度来看:为个帧序列,为三维空间时间坐标,或者为堆叠的图像,帧被视为不同的通道。大多数它们通常只关注一个这些视角,可能会不全面利用不同维度之间的关系。为解决这个问题,本文提出了一种基于卷积混合的视频预测方法,称为ViP-Mixer,用于模型视频的空间时间演化在自适应Encoder中的幂空间。ViP-Mixer堆叠在一起,并在三级Feature混合:帧、通道和位置。广泛的实验证明,我们提出的方法在三个标准视频数据集上实现了新的最佳预测性能,覆盖了synthetic和实际场景。
MGCT: Mutual-Guided Cross-Modality Transformer for Survival Outcome Prediction using Integrative Histopathology-Genomic Features
paper_authors: Mingxin Liu, Yunzan Liu, Hui Cui, Chunquan Li, Jiquan Ma for:* The paper is written to propose a new deep learning-based computational pathology method for prognosticating cancer patients using whole slide images (WSIs) and genomic features.methods:* The proposed method, called Mutual-Guided Cross-Modality Transformer (MGCT), uses a weakly-supervised, attention-based multimodal learning framework to combine histology features and genomic features to model the genotype-phenotype interactions within the tumor microenvironment.results:* The experiments conducted using nearly 3,600 gigapixel WSIs across five different cancer types sourced from The Cancer Genome Atlas (TCGA) consistently show that MGCT outperforms the state-of-the-art (SOTA) methods.Abstract
The rapidly emerging field of deep learning-based computational pathology has shown promising results in utilizing whole slide images (WSIs) to objectively prognosticate cancer patients. However, most prognostic methods are currently limited to either histopathology or genomics alone, which inevitably reduces their potential to accurately predict patient prognosis. Whereas integrating WSIs and genomic features presents three main challenges: (1) the enormous heterogeneity of gigapixel WSIs which can reach sizes as large as 150,000x150,000 pixels; (2) the absence of a spatially corresponding relationship between histopathology images and genomic molecular data; and (3) the existing early, late, and intermediate multimodal feature fusion strategies struggle to capture the explicit interactions between WSIs and genomics. To ameliorate these issues, we propose the Mutual-Guided Cross-Modality Transformer (MGCT), a weakly-supervised, attention-based multimodal learning framework that can combine histology features and genomic features to model the genotype-phenotype interactions within the tumor microenvironment. To validate the effectiveness of MGCT, we conduct experiments using nearly 3,600 gigapixel WSIs across five different cancer types sourced from The Cancer Genome Atlas (TCGA). Extensive experimental results consistently emphasize that MGCT outperforms the state-of-the-art (SOTA) methods.
摘要
深度学习计算生物学领域在使用整个扫描图像(WSIs)对肿瘤病人进行预测中已经显示了扎实的成果。然而,大多数预测方法都受限于历史学或基因学单独使用,这会导致预测病人预测的精度受到限制。而将WSIs和基因特征结合起来则存在三个主要挑战:(1)WSIs的巨大多样性,可以达到150,000x150,000像素的大小;(2)历史学图像和基因分子数据之间没有空间相对关系;(3)现有的早期、晚期和中期多Modal特征融合策略难以捕捉肿瘤微环境中的基因-生理相互作用。为了解决这些问题,我们提出了弱型监督的注意力基本多Modal学习框架——综合型响应器(MGCT),可以结合历史学特征和基因特征来模型肿瘤微环境中的基因-生理相互作用。为验证MGCT的效果,我们在TCGA数据集上进行了大量实验,结果表明MGCT在肿瘤预测方面具有明显的优势。
Peeking Inside the Schufa Blackbox: Explaining the German Housing Scoring System
results: 初步发现结果表明,尽管有一些通用的需求,但也有因角色和实际情况而冲突的需求。这些发现提供了未来人类中心的XAI研究的可能性。Abstract
Explainable Artificial Intelligence is a concept aimed at making complex algorithms transparent to users through a uniform solution. Researchers have highlighted the importance of integrating domain specific contexts to develop explanations tailored to end users. In this study, we focus on the Schufa housing scoring system in Germany and investigate how users information needs and expectations for explanations vary based on their roles. Using the speculative design approach, we asked business information students to imagine user interfaces that provide housing credit score explanations from the perspectives of both tenants and landlords. Our preliminary findings suggest that although there are general needs that apply to all users, there are also conflicting needs that depend on the practical realities of their roles and how credit scores affect them. We contribute to Human centered XAI research by proposing future research directions that examine users explanatory needs considering their roles and agencies.
摘要
人工智能可解释(Explainable Artificial Intelligence)是一种概念,旨在为用户提供复杂算法的通用解释。研究人员认为,在发展解释时应该考虑域专业上下文。在这项研究中,我们关注德国的信用评分系统(Schufa),并调查了用户信息需求和对解释的期望是如何因role而异。我们使用推想设计方法,请商业信息学生想象提供住房信用评分解释的用户界面,从租户和地产业者两个角度出发。我们的初步发现结果表明,虽然有一些共同的需求,但也有因role而异的需求,这些需求取决于各自的角色和信用评分对其造成的实际影响。我们的研究贡献了人类中心的XAI研究,并提出了未来研究的方向,旨在考虑用户的角色和权力,以更好地满足用户的解释需求。
Web News Timeline Generation with Extended Task Prompting
results: 研究表明,通过添加更多的任务提示,可以提高NLP技术对不同新闻数据集的效果,使新闻时间线生成成为职业使用的现实。Abstract
The creation of news timeline is essential for a comprehensive and contextual understanding of events as they unfold over time. This approach aids in discerning patterns and trends that might be obscured when news is viewed in isolation. By organizing news in a chronological sequence, it becomes easier to track the development of stories, understand the interrelation of events, and grasp the broader implications of news items. This is particularly helpful in sectors like finance and insurance, where timely understanding of the event development-ranging from extreme weather to political upheavals and health crises-is indispensable for effective risk management. While traditional natural language processing (NLP) techniques have had some success, they often fail to capture the news with nuanced relevance that are readily apparent to domain experts, hindering broader industry integration. The advance of Large Language Models (LLMs) offers a renewed opportunity to tackle this challenge. However, direct prompting LLMs for this task is often ineffective. Our study investigates the application of an extended task prompting technique to assess past news relevance. We demonstrate that enhancing conventional prompts with additional tasks boosts their effectiveness on various news dataset, rendering news timeline generation practical for professional use. This work has been deployed as a publicly accessible browser extension which is adopted within our network.
摘要
创建新闻时间轴是对事件的全面和Contextual理解的关键,帮助揭示事件发展的趋势和patterns。通过将新闻按时间顺序排序,可以轻松跟踪事件的发展,理解事件之间的相互关系,并捕捉新闻项目的更大意义。特别是在金融和保险领域,时间线新闻生成对于有效的风险管理至关重要,因为它可以帮助早发现和理解extreme weather、政治动荡和健康危机等事件的发展。传统的自然语言处理(NLP)技术有一定的成功,但它们经常无法捕捉域专家所看到的细微相关性,从而阻碍了更广泛的行业 интеграción。大语言模型(LLMs)的进步提供了一个新的机会,以解决这个挑战。然而,直接推 prompt LLMs 这种任务通常是不 efective。我们的研究探讨了在扩展任务推动技术下对过去新闻 relevance 的评估。我们示出,通过提高传统推送的效果,可以在不同的新闻数据集上提高新闻时间轴生成的实用性,使其成为职业使用的实际应用。这项工作已经被部署为公共可访问的浏览器扩展,并被我们的网络所采纳。
Leveraging healthy population variability in deep learning unsupervised anomaly detection in brain FDG PET
results: 这篇论文的实验结果显示,这种方法可以精准地检测阿兹海默症相关的异常。Abstract
Unsupervised anomaly detection is a popular approach for the analysis of neuroimaging data as it allows to identify a wide variety of anomalies from unlabelled data. It relies on building a subject-specific model of healthy appearance to which a subject's image can be compared to detect anomalies. In the literature, it is common for anomaly detection to rely on analysing the residual image between the subject's image and its pseudo-healthy reconstruction. This approach however has limitations partly due to the pseudo-healthy reconstructions being imperfect and to the lack of natural thresholding mechanism. Our proposed method, inspired by Z-scores, leverages the healthy population variability to overcome these limitations. Our experiments conducted on FDG PET scans from the ADNI database demonstrate the effectiveness of our approach in accurately identifying Alzheimer's disease related anomalies.
摘要
不监督异常检测是脑成像数据分析中广泛使用的方法,因为它可以从无标注数据中识别各种异常。它基于建立个体特定的健康模型,并将个体图像与这个模型进行比较,以检测异常。在文献中,异常检测通常基于分析个体图像与其 Pseudo-健康重建图像之间的差异。然而,这种方法有一些限制,其中之一是 Pseudo-健康重建图像不准确,另一个是缺乏自然的阈值分割机制。我们提出的方法, Drawing inspiration from Z-scores,利用健康人群的变化来超越这些限制。我们在 ADNI 数据库中的FDG PET扫描实验结果表明,我们的方法可以准确地识别阿尔茨heimer病相关的异常。
A novel transformer-based approach for soil temperature prediction
results: 实验结果表明,使用变换器模型可以为预测土壤温度做出重要贡献,并确定新的州对比。相比深度学习方法和文献研究,本研究的效果更好。Abstract
Soil temperature is one of the most significant parameters that plays a crucial role in glacier energy, dynamics of mass balance, processes of surface hydrological, coaction of glacier-atmosphere, nutrient cycling, ecological stability, the management of soil, water, and field crop. In this work, we introduce a novel approach using transformer models for the purpose of forecasting soil temperature prediction. To the best of our knowledge, the usage of transformer models in this work is the very first attempt to predict soil temperature. Experiments are carried out using six different FLUXNET stations by modeling them with five different transformer models, namely, Vanilla Transformer, Informer, Autoformer, Reformer, and ETSformer. To demonstrate the effectiveness of the proposed model, experiment results are compared with both deep learning approaches and literature studies. Experiment results show that the utilization of transformer models ensures a significant contribution to the literature, thence determining the new state-of-the-art.
摘要
土壤温度是冰川能源中最重要的参数之一,它对冰川动力、质量平衡、表面水文过程、冰川-大气相互作用、营养征回循环、生态稳定性等具有关键作用。在这个工作中,我们提出了一种新的方法,利用转换器模型来预测土壤温度。根据我们所知,这是第一次使用转换器模型预测土壤温度的尝试。我们在六个FLUXNET站上进行了六个不同的转换器模型,分别是粉丝转换器、信息转换器、自动转换器、改进转换器和ETS转换器。为证明我们提出的模型的效果,我们与深度学习方法和文献研究进行了比较。实验结果表明,通过使用转换器模型可以做出显著贡献,因此确定了新的状态体。
Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks
for: This paper is focused on developing a bilingual language model (Taiyi) for diverse biomedical natural language processing tasks in both English and Chinese.
methods: The authors use a two-stage fine-tuning strategy to optimize the model’s performance across various tasks, and they use a comprehensive collection of biomedical text mining datasets to evaluate the model’s performance.
results: The authors show that Taiyi achieves superior performance compared to general language models on 13 test sets covering named entity recognition, relation extraction, text classification, and question answering tasks. Additionally, the authors demonstrate the model’s potential for bilingual biomedical multi-tasking through a case study involving additional biomedical NLP tasks.Here are the three key points in Simplified Chinese text:
for: 这篇论文是为了开发一个双语语言模型(Taiyi),用于多种生物医学自然语言处理任务。
methods: 作者使用了两个阶段的超参数调整策略,以优化模型的性能 across 多种任务。
results: 作者表明,Taiyi 比普通语言模型在 13 个测试集上表现出色,包括命名实体识别、关系抽取、文本分类和问答任务。此外,作者通过一个案例研究,展示了 Taiyi 在多语言生物医学多任务中的可能性。Abstract
Recent advancements in large language models (LLMs) have shown promising results across a variety of natural language processing (NLP) tasks. The application of LLMs to specific domains, such as biomedicine, has achieved increased attention. However, most biomedical LLMs focus on enhancing performance in monolingual biomedical question answering and conversation tasks. To further investigate the effectiveness of the LLMs on diverse biomedical NLP tasks in different languages, we present Taiyi, a bilingual (English and Chinese) fine-tuned LLM for diverse biomedical tasks. In this work, we first curated a comprehensive collection of 140 existing biomedical text mining datasets across over 10 task types. Subsequently, a two-stage strategy is proposed for supervised fine-tuning to optimize the model performance across varied tasks. Experimental results on 13 test sets covering named entity recognition, relation extraction, text classification, question answering tasks demonstrate Taiyi achieves superior performance compared to general LLMs. The case study involving additional biomedical NLP tasks further shows Taiyi's considerable potential for bilingual biomedical multi-tasking. The source code, datasets, and model for Taiyi are freely available at https://github.com/DUTIR-BioNLP/Taiyi-LLM.
摘要
现代大语言模型(LLM)的进步已经在各种自然语言处理(NLP)任务中显示出色的结果。将LLM应用到特定领域,如生物医学,已经吸引了更多的关注。然而,大多数生物医学LLM都是专门针对单语言生物医学问答和对话任务进行提升性能。为了进一步调查LLM在不同语言的生物医学NLP任务中的效果,我们提出了 Taiyi,一个英文和中文双语精度调整的大语言模型。在这种工作中,我们首先绘制了140个现有的生物医学文本挖掘数据集,涵盖了10多种任务类型。然后,我们提议了一种两阶段的监督微调策略,以便在不同任务中优化模型的性能。实验结果表明,Taiyi在13个测试集上(包括命名实体识别、关系抽取、文本分类、问答任务)表现出色,比普通的LLM更高。此外,在进一步的生物医学NLP任务中,Taiyi还表现出了很好的多任务优势。Taiyi的源代码、数据集和模型可以免费下载于https://github.com/DUTIR-BioNLP/Taiyi-LLM。
Machine learning-based malware detection for IoT devices using control-flow data
paper_authors: Gergely Hevesi for: This thesis project aims to provide better security for IoT devices using machine learning algorithms and reverse engineering tools.methods: The proposed method consists of two phases: (1) extracting control-flow related data using static binary analysis, and (2) classifying binary executables as malicious or benign using a neural network model.results: The method is trained using a dataset of malicious and benign ARM applications, and is able to detect malware with high accuracy.Abstract
Embedded devices are specialised devices designed for one or only a few purposes. They are often part of a larger system, through wired or wireless connection. Those embedded devices that are connected to other computers or embedded systems through the Internet are called Internet of Things (IoT for short) devices. With their widespread usage and their insufficient protection, these devices are increasingly becoming the target of malware attacks. Companies often cut corners to save manufacturing costs or misconfigure when producing these devices. This can be lack of software updates, ports left open or security defects by design. Although these devices may not be as powerful as a regular computer, their large number makes them suitable candidates for botnets. Other types of IoT devices can even cause health problems since there are even pacemakers connected to the Internet. This means, that without sufficient defence, even directed assaults are possible against people. The goal of this thesis project is to provide better security for these devices with the help of machine learning algorithms and reverse engineering tools. Specifically, I study the applicability of control-flow related data of executables for malware detection. I present a malware detection method with two phases. The first phase extracts control-flow related data using static binary analysis. The second phase classifies binary executables as either malicious or benign using a neural network model. I train the model using a dataset of malicious and benign ARM applications.
摘要
特殊设备(embedded devices)是专门设计用于一些或只有几个用途的设备。它们通常是大型系统的一部分,通过硬件或无线连接。这些与其他计算机或嵌入式系统通过互联网连接的特殊设备被称为互联网物品(IoT for short)设备。由于它们的广泛使用和不充分保护,这些设备在不断增长的Malware攻击目标。公司经常为了降低生产成本或配置不当,导致这些设备缺乏软件更新、开放的端口或设计上的安全漏洞。尽管这些设备可能不如常见的计算机强大,但由于它们的大量使用,它们成为了黑客的目标。此外,一些IoT设备甚至可能对人们健康造成威胁,因为有些 pacemakers 已经连接到互联网。这意味着,如果不充分防御,甚至可能发生对人的targeted攻击。本论文的目标是通过机器学习算法和反向工程工具来提供更好的安全保护 для这些设备。 Specifically,我研究控制流相关数据的应用可能性,以提出一种两相阶段的恶意软件检测方法。第一相阶段使用静态二进制分析提取控制流相关数据。第二相阶段使用神经网络模型将二进制执行文件分类为恶意或正常。我使用一个基于恶意和正常 ARM 应用的数据集进行训练。
A Multi-In-Single-Out Network for Video Frame Interpolation without Optical Flow
results: 我们的方法在 Vimeo90K、Middlebury 和 UCF101 等视频帧 interpolate benchmark 上达到了状态机器的 результаты,与现有方法相比,具有显著的性能差距。Abstract
In general, deep learning-based video frame interpolation (VFI) methods have predominantly focused on estimating motion vectors between two input frames and warping them to the target time. While this approach has shown impressive performance for linear motion between two input frames, it exhibits limitations when dealing with occlusions and nonlinear movements. Recently, generative models have been applied to VFI to address these issues. However, as VFI is not a task focused on generating plausible images, but rather on predicting accurate intermediate frames between two given frames, performance limitations still persist. In this paper, we propose a multi-in-single-out (MISO) based VFI method that does not rely on motion vector estimation, allowing it to effectively model occlusions and nonlinear motion. Additionally, we introduce a novel motion perceptual loss that enables MISO-VFI to better capture the spatio-temporal correlations within the video frames. Our MISO-VFI method achieves state-of-the-art results on VFI benchmarks Vimeo90K, Middlebury, and UCF101, with a significant performance gap compared to existing approaches.
摘要
通常,深度学习基于视频帧 interpolate(VFI)方法都是估算两个输入帧之间的运动 вектор,并将其折叠到目标时间。这种方法在线性运动时 exhibits 出色的表现,但是在遮挡和非线性运动时存在局限性。在最近,生成模型被应用于 VFI 以解决这些问题。然而,由于 VFI 不是一个关注生成可信worth images 的任务,而是预测两个输入帧之间的准确中间帧,因此性能还是有限。在这篇论文中,我们提出了一种多入一出(MISO)基于 VFI 方法,不需要运动 вектор估算,因此可以有效地处理遮挡和非线性运动。此外,我们还引入了一种新的运动感知损失,使得 MISO-VFI 更好地捕捉视频帧中的空间-时间相关性。我们的 MISO-VFI 方法在 Vimeo90K、Middlebury 和 UCF101 视频帧 interpolate 测试benchmark上达到了当前最佳的结果,与现有方法相比,具有显著的性能差距。
results: 实验结果显示,相比单独使用的AI工具,DesignGPT可以提高设计师的表现, highlighting the potential of 将多代理人系统集成到产品方案设计中。Abstract
Generative AI faces many challenges when entering the product design workflow, such as interface usability and interaction patterns. Therefore, based on design thinking and design process, we developed the DesignGPT multi-agent collaboration framework, which uses artificial intelligence agents to simulate the roles of different positions in the design company and allows human designers to collaborate with them in natural language. Experimental results show that compared with separate AI tools, DesignGPT improves the performance of designers, highlighting the potential of applying multi-agent systems that integrate design domain knowledge to product scheme design.
摘要
<>translate english text to simplified chinese产生式AI在产品设计 workflow 中遇到许多挑战,如界面可用性和互动模式。因此,基于设计思维和设计过程,我们开发了 DesignGPT 多代理协作框架,使用人工智能代理模拟不同的设计公司位置,让人类设计师和其在自然语言中合作。实验结果显示,相比单独的 AI 工具,DesignGPT 提高了设计师的性能, highlighting 将多代理系统应用于产品方案设计中的潜在价值。Note: "产生式AI" is a term used in China to refer to generative AI, and "设计公司" means "design company" in Chinese.
Advancing Urban Renewal: An Automated Approach to Generating Historical Arcade Facades with Stable Diffusion Models
results: 我们的方法得到了高级别的精度、AUTHENTICITY和多样性,显示了在实际城市更新项目中的潜在潜力。这种新的方法可以更有效率地替代传统的城市更新设计过程,解决不authentic的图像细节、缺乏精度和限制的样式多样性问题。未来的研究可以考虑将这种二维图像生成技术与三维模型技术集成,为历史街区的建筑改造提供更全面的解决方案。Abstract
Urban renewal and transformation processes necessitate the preservation of the historical urban fabric, particularly in districts known for their architectural and historical significance. These regions, with their diverse architectural styles, have traditionally required extensive preliminary research, often leading to subjective results. However, the advent of machine learning models has opened up new avenues for generating building facade images. Despite this, creating high-quality images for historical district renovations remains challenging, due to the complexity and diversity inherent in such districts. In response to these challenges, our study introduces a new methodology for automatically generating images of historical arcade facades, utilizing Stable Diffusion models conditioned on textual descriptions. By classifying and tagging a variety of arcade styles, we have constructed several realistic arcade facade image datasets. We trained multiple low-rank adaptation (LoRA) models to control the stylistic aspects of the generated images, supplemented by ControlNet models for improved precision and authenticity. Our approach has demonstrated high levels of precision, authenticity, and diversity in the generated images, showing promising potential for real-world urban renewal projects. This new methodology offers a more efficient and accurate alternative to conventional design processes in urban renewal, bypassing issues of unconvincing image details, lack of precision, and limited stylistic variety. Future research could focus on integrating this two-dimensional image generation with three-dimensional modeling techniques, providing a more comprehensive solution for renovating architectural facades in historical districts.
摘要
都市更新和转化过程中需要保留历史城市胶囊,特别是那些建筑和历史意义极高的区域。这些区域具有多样化的建筑风格,传统上需要广泛的初步研究,经常导致主观的结果。然而,机器学习模型的出现开创了新的可能性,可以生成建筑立面图像。然而,为历史区域重新建设而生成高质量图像仍然是挑战,因为这些区域的复杂性和多样性。为解决这些挑战,我们的研究提出了一种新的方法,利用稳定扩散模型 Conditioned on 文本描述来自动生成历史街区立面图像。我们根据不同的街区风格进行分类和标注,并构建了许多真实的街区立面图像数据集。我们使用多个低级 adaptive (LoRA)模型控制生成图像的艺术性方面,并补充了 ControlNet 模型以提高精度和authenticity。我们的方法已经显示了高度的精度、authenticity 和多样性,表现出了潜在的应用潜力。这种新的方法可以为都市更新项目提供更高效和准确的替代方案,通过 circumventing 问题,例如不寓实的图像细节、缺乏精度和限制的艺术风格多样性。未来的研究可能会集成这种二维图像生成技术和三维模型技术,为历史区域的建筑立面重新建设提供更全面的解决方案。
results: 在PASCAL VOC和MSCOCO等常用 dataset 上测试,提出的模型能够稳定地提高 Fine-tuning 和 meta-learning 下的性能,与最新的工作相比达到最高得分。Abstract
Few-shot object detection (FSOD), an efficient method for addressing the severe data-hungry problem, has been extensively discussed. Current works have significantly advanced the problem in terms of model and data. However, the overall performance of most FSOD methods still does not fulfill the desired accuracy. In this paper we improve the FSOD model to address the severe issue of sample imbalance and weak feature propagation. To alleviate modeling bias from data-sufficient base classes, we examine the effect of decoupling the parameters for classes with sufficient data and classes with few samples in various ways. We design a base-novel categories decoupled DETR (DeDETR) for FSOD. We also explore various types of skip connection between the encoder and decoder for DETR. Besides, we notice that the best outputs could come from the intermediate layer of the decoder instead of the last layer; therefore, we build a unified decoder module that could dynamically fuse the decoder layers as the output feature. We evaluate our model on commonly used datasets such as PASCAL VOC and MSCOCO. Our results indicate that our proposed module could achieve stable improvements of 5% to 10% in both fine-tuning and meta-learning paradigms and has outperformed the highest score in recent works.
摘要
“几枚shot物类探测(FSOD),一种高效的方法来解决严重的数据饵问题,已经被广泛讨论。现有的工作对这个问题进行了重要的进步,但大多数FSOD方法的总性表现仍未能满足预期的精度。在这篇论文中,我们改进FSOD模型,以解决严重的样本不均衡和弱feature传播问题。为了从数据充足的基础类别中避免模型偏见,我们考虑了各种方法来隔离具有充足数据的类别和具有少数数据的类别的参数。我们设计了基于类别分类的DETR(DeDETR)模型来进行FSOD。此外,我们发现最佳的输出可能来自decoder层的中途层而不是最后层,因此我们建立了一个统一的decoder模组,可以动态地融合decoder层的输出特征。我们将我们的模型评估在常用的PASCAL VOC和MSCOCO datasets上,结果显示我们的提案模组可以在精度调整和元学习模式下稳定地提高5%至10%的表现,并且超过了最近的最高得分。”
results: 在 Continual World benchmark 上表现出色,比 purely perfect memory replay 好,并且与 state-of-the-art continual learning methods 相当或更好Abstract
Replaying past experiences has proven to be a highly effective approach for averting catastrophic forgetting in supervised continual learning. However, some crucial factors are still largely ignored, making it vulnerable to serious failure, when used as a solution to forgetting in continual reinforcement learning, even in the context of perfect memory where all data of previous tasks are accessible in the current task. On the one hand, since most reinforcement learning algorithms are not invariant to the reward scale, the previously well-learned tasks (with high rewards) may appear to be more salient to the current learning process than the current task (with small initial rewards). This causes the agent to concentrate on those salient tasks at the expense of generality on the current task. On the other hand, offline learning on replayed tasks while learning a new task may induce a distributional shift between the dataset and the learned policy on old tasks, resulting in forgetting. In this paper, we introduce RECALL, a replay-enhanced method that greatly improves the plasticity of existing replay-based methods on new tasks while effectively avoiding the recurrence of catastrophic forgetting in continual reinforcement learning. RECALL leverages adaptive normalization on approximate targets and policy distillation on old tasks to enhance generality and stability, respectively. Extensive experiments on the Continual World benchmark show that RECALL performs significantly better than purely perfect memory replay, and achieves comparable or better overall performance against state-of-the-art continual learning methods.
摘要
重新播放过去的经验被证明是预学批处理中维持快速学习的高效方法。然而,一些重要的因素仍然被忽视,使其在重复学习中容易出现严重的失败。在现在的任务中,由于大多数学习算法不具有奖励缩放的不变性,以前的任务(高奖)可能会在当前任务(初始奖)中显得更加吸引人,导致代理人偏向这些鲜明的任务,而忽略当前任务。另一方面,在学习新任务时, reuse 批处理过去的任务可能会导致批处理集和学习策略之间的分布差异,从而导致忘记。在本文中,我们介绍了 RECALL,一种增强现有批处理方法的方法,以提高新任务的抽象和稳定性。RECALL 利用了adaptive normalization的 approximate targets和policy distillation on old tasks来提高多任务学习的灵活性和稳定性。我们在 Continual World benchmark 上进行了广泛的实验,发现 RECALL 比纯粹的完美记忆播放更好,并与现状顶尖的 continual learning 方法相比,达到了相似或更好的总性表现。
Exploring Prompting Large Language Models as Explainable Metrics
results: 实验结果表明,LLMs 在 NLP 领域,特别是摘要任务中,具有扎实的可解释评估能力,并且可以在少量和零量的情况下达到可观的性能。Prompt-based strategy的最佳提示表现达到了人工评估的克ен德拉度相对值0.477。代码和结果在 GitHub 上公开发布。Abstract
This paper describes the IUST NLP Lab submission to the Prompting Large Language Models as Explainable Metrics Shared Task at the Eval4NLP 2023 Workshop on Evaluation & Comparison of NLP Systems. We have proposed a zero-shot prompt-based strategy for explainable evaluation of the summarization task using Large Language Models (LLMs). The conducted experiments demonstrate the promising potential of LLMs as evaluation metrics in Natural Language Processing (NLP), particularly in the field of summarization. Both few-shot and zero-shot approaches are employed in these experiments. The performance of our best provided prompts achieved a Kendall correlation of 0.477 with human evaluations in the text summarization task on the test data. Code and results are publicly available on GitHub.
摘要
这篇论文描述了我们在Eval4NLP 2023工作坊上提交的Prompting Large Language Models as Explainable Metrics Shared Task的实验报告。我们提议了一种零批量基于提示的方法来评估摘要任务中的大自然语言模型(LLMs)。我们进行了实验,并证明了LLMs在自然语言处理(NLP)领域,特别是摘要任务中的表现有扎实的潜力。我们在几个shot和零shot的情况下都使用了这些提示。实验结果显示,我们提供的最佳提示的性能达到了人工评估数据上的Kendall相关性0.477。代码和结果在GitHub上公开可用。
Which AI Technique Is Better to Classify Requirements? An Experiment with SVM, LSTM, and ChatGPT
results: 研究发现,ChatGPT在需求分类方面的表现比LSTM更好,但在分类非功能需求(NFR)方面,SVM的表现更好。此外,研究发现在大多数情况下,几shot Setting不一定能够提高表现,有时甚至会下降。I hope that helps! Let me know if you have any other questions.Abstract
Context and motivation: Recently, Large Language Models (LLMs) like ChatGPT have demonstrated remarkable proficiency in various Natural Language Processing (NLP) tasks. Their application in Requirements Engineering (RE), especially in requirements classification, has gained increasing interest. Question/problem: In our research, we conducted an extensive empirical evaluation of ChatGPT models including text-davinci-003, gpt-3.5-turbo, and gpt-4 in both zero-shot and few-shot settings for requirements classification. The question arises as to how these models compare to traditional classification methods, specifically Support Vector Machine (SVM) and Long Short-Term Memory (LSTM). Principal ideas/results: Based on five diverse datasets, our results show that ChatGPT consistently outperforms LSTM, and while ChatGPT is more effective than SVM in classifying functional requirements (FR), SVM is better in classifying non-functional requirements (NFR). Our results also show that contrary to our expectations, the few-shot setting does not always lead to enhanced performance; in most instances, it was found to be suboptimal. Contribution: Our findings underscore the potential of LLMs in the RE domain, suggesting that they could play a pivotal role in future software engineering processes, particularly as tools to enhance requirements classification.
摘要
Context and motivation: 近些年来,大型自然语言模型(LLM)如ChatGPT在自然语言处理(NLP)方面的表现很出色,其应用于需求工程(RE)领域,特别是需求分类,引起了越来越多的关注。问题/问题:在我们的研究中,我们进行了大量的实验性评估,包括文本达尔文-003、gpt-3.5-turbo和gpt-4等ChatGPT模型,在零shot和几shot设置下进行了需求分类。问题是如何与传统的分类方法相比,特别是支持向量机(SVM)和长期记忆(LSTM)?主要想法/结果:根据五个多样化的数据集,我们的结果显示,ChatGPT在FR和NFR之间的分类表现相对较好,而SVM在NFR中的表现更好。此外,我们发现,在大多数情况下,几shot设置不一定是最佳的,反而有时会下降性能。贡献:我们的发现表明了LLM在RE领域的潜力, suggesting that they could play a crucial role in future software engineering processes, particularly as tools to enhance requirements classification.
Data-driven project planning: An integrated network learning and constraint relaxation approach in favor of scheduling
results: 该研究使用两个实际项目数据集,显示了该方法可以为项目规划人员提供显著的灵活性(最多减少项目 kritical path 26%),以便调整项目计划和时间表。Abstract
Our focus is on projects, i.e., business processes, which are emerging as the economic drivers of our times. Differently from day-to-day operational processes that do not require detailed planning, a project requires planning and resource-constrained scheduling for coordinating resources across sub- or related projects and organizations. A planner in charge of project planning has to select a set of activities to perform, determine their precedence constraints, and schedule them according to temporal project constraints. We suggest a data-driven project planning approach for classes of projects such as infrastructure building and information systems development projects. A project network is first learned from historical records. The discovered network relaxes temporal constraints embedded in individual projects, thus uncovering where planning and scheduling flexibility can be exploited for greater benefit. Then, the network, which contains multiple project plan variations, from which one has to be selected, is enriched by identifying decision rules and frequent paths. The planner can rely on the project network for: 1) decoding a project variation such that it forms a new project plan, and 2) applying resource-constrained project scheduling procedures to determine the project's schedule and resource allocation. Using two real-world project datasets, we show that the suggested approach may provide the planner with significant flexibility (up to a 26% reduction of the critical path of a real project) to adjust the project plan and schedule. We believe that the proposed approach can play an important part in supporting decision making towards automated data-driven project planning.
摘要
First, a project network is learned from historical records. The discovered network relaxes temporal constraints embedded in individual projects, thus uncovering where planning and scheduling flexibility can be exploited for greater benefit. Then, the network, which contains multiple project plan variations, from which one has to be selected, is enriched by identifying decision rules and frequent paths. The planner can rely on the project network for:1. Decoding a project variation so that it forms a new project plan, and2. Applying resource-constrained project scheduling procedures to determine the project's schedule and resource allocation.Using two real-world project datasets, we show that the suggested approach may provide the planner with significant flexibility (up to a 26% reduction of the critical path of a real project) to adjust the project plan and schedule. We believe that the proposed approach can play an important part in supporting decision making towards automated data-driven project planning.
A New Approach to Intuitionistic Fuzzy Decision Making Based on Projection Technology and Cosine Similarity Measure
results: compare with existed methods, the proposed method can identify the optimal scheme more accurately. In medical diagnosis area, it can quickly diagnose disease. The proposed method enriches the exist-ing similarity measure methods and it can be applied to not only IFSs, but also other interval-valued intuitionistic fuzzy sets(IVIFSs) as well.Abstract
For a multi-attribute decision making (MADM) problem, the information of alternatives under different attributes is given in the form of intuitionistic fuzzy number(IFN). Intuitionistic fuzzy set (IFS) plays an important role in dealing with un-certain and incomplete information. The similarity measure of intuitionistic fuzzy sets (IFSs) has always been a research hotspot. A new similarity measure of IFSs based on the projection technology and cosine similarity measure, which con-siders the direction and length of IFSs at the same time, is first proposed in this paper. The objective of the presented pa-per is to develop a MADM method and medical diagnosis method under IFS using the projection technology and cosine similarity measure. Some examples are used to illustrate the comparison results of the proposed algorithm and some exist-ing methods. The comparison result shows that the proposed algorithm is effective and can identify the optimal scheme accurately. In medical diagnosis area, it can be used to quickly diagnose disease. The proposed method enriches the exist-ing similarity measure methods and it can be applied to not only IFSs, but also other interval-valued intuitionistic fuzzy sets(IVIFSs) as well.
摘要
For a multi-attribute decision making (MADM) problem, the information of alternatives under different attributes is given in the form of intuitionistic fuzzy numbers (IFN). Intuitionistic fuzzy sets (IFS) play an important role in dealing with uncertain and incomplete information. The similarity measure of intuitionistic fuzzy sets (IFSs) has always been a research hotspot. A new similarity measure of IFSs based on the projection technology and cosine similarity measure, which considers the direction and length of IFSs at the same time, is proposed in this paper for the first time. The objective of the presented paper is to develop a MADM method and medical diagnosis method under IFS using the projection technology and cosine similarity measure. Some examples are used to illustrate the comparison results of the proposed algorithm and some existing methods. The comparison result shows that the proposed algorithm is effective and can identify the optimal scheme accurately. In the medical diagnosis area, it can be used to quickly diagnose diseases. The proposed method enriches the existing similarity measure methods and it can be applied to not only IFSs, but also other interval-valued intuitionistic fuzzy sets (IVIFSs) as well.Here's the word-for-word translation of the text into Simplified Chinese:为多Attribute决策问题,alternatives的不同属性信息给出为Intuitionistic Fuzzy Number(IFN)的形式。Intuitionistic Fuzzy Set(IFS)在处理不确定和不完整信息方面发挥重要作用。IFS的相似度度量在IFS中 Always是研究热点。本文提出了一种基于投影技术和cosine相似度度量的IFS相似度度量,这种度量考虑了IFS的方向和长度同时。本文的目标是使用投影技术和cosine相似度度量来解决IFS下的多Attribute决策问题和医学诊断问题。一些例子用于比较提出的算法和现有方法的比较结果,结果显示,提出的算法是有效的,可以准确地确定优化方案。在医学诊断领域,它可以快速诊断疾病。提出的方法riches存在相似度度量方法,并可以应用于不仅IFS,还有其他间隔值Intuitionistic Fuzzy Set(IVIFS)。
Assessing Prompt Injection Risks in 200+ Custom GPTs
results: 该研究发现,通过提示攻击,敏感Prompt可以被提取,并且可以访问上传的文件。这些发现告诉我们,在设计和部署自定义GPT模型时,需要建立Robust的安全框架,以避免安全和隐私问题。Abstract
In the rapidly evolving landscape of artificial intelligence, ChatGPT has been widely used in various applications. The new feature: customization of ChatGPT models by users to cater to specific needs has opened new frontiers in AI utility. However, this study reveals a significant security vulnerability inherent in these user-customized GPTs: prompt injection attacks. Through comprehensive testing of over 200 user-designed GPT models via adversarial prompts, we demonstrate that these systems are susceptible to prompt injections. Through prompt injection, an adversary can not only extract the customized system prompts but also access the uploaded files. This paper provides a first-hand analysis of the prompt injection, alongside the evaluation of the possible mitigation of such attacks. Our findings underscore the urgent need for robust security frameworks in the design and deployment of customizable GPT models. The intent of this paper is to raise awareness and prompt action in the AI community, ensuring that the benefits of GPT customization do not come at the cost of compromised security and privacy.
摘要
在人工智能领域的快速发展中,ChatGPT已广泛应用于多种应用程序。新的特性:用户自定义ChatGPT模型以满足特定需求,打开了新的人工智能实用性前iers。然而,这项研究发现了自定义GPT模型中的一定安全漏洞:提示插入攻击。通过对超过200个用户定制GPT模型进行广泛测试,我们证实了这些系统受到提示插入攻击的漏洞。通过提示插入,敌对者不仅可以提取自定义系统提示,还可以访问上传文件。本文提供了首次的提示插入分析,以及可能的 Mitigation 策略的评估。我们的发现强调了在设计和部署自定义GPT模型时需要建立Robust 安全框架,以确保人工智能技术的发展不受安全和隐私问题的限制。本文的目的是逐itung 人工智能社区,提醒他们不要因为自定义GPT模型的好处而忽略安全和隐私问题的重要性。
ADAPTER-RL: Adaptation of Any Agent using Reinforcement Learning
results: 该论文的实验结果表明,使用 adapter 可以提高基础 Agent 的性能,并且可以与先前训练过的神经网络和规则基础 Agent 集成,以便捕捉人类专家的知识。Abstract
Deep Reinforcement Learning (DRL) agents frequently face challenges in adapting to tasks outside their training distribution, including issues with over-fitting, catastrophic forgetting and sample inefficiency. Although the application of adapters has proven effective in supervised learning contexts such as natural language processing and computer vision, their potential within the DRL domain remains largely unexplored. This paper delves into the integration of adapters in reinforcement learning, presenting an innovative adaptation strategy that demonstrates enhanced training efficiency and improvement of the base-agent, experimentally in the nanoRTS environment, a real-time strategy (RTS) game simulation. Our proposed universal approach is not only compatible with pre-trained neural networks but also with rule-based agents, offering a means to integrate human expertise.
摘要
深度强化学习(DRL)代理频繁面临外部任务适应问题,包括过拟合、致命忘记和样本不足。 although adapters have proven effective in supervised learning contexts such as natural language processing and computer vision, their potential within the DRL domain remains largely unexplored. This paper explores the integration of adapters in reinforcement learning, presenting an innovative adaptation strategy that demonstrates enhanced training efficiency and improvement of the base-agent, experimentally in the nanoRTS environment, a real-time strategy (RTS) game simulation. Our proposed universal approach is not only compatible with pre-trained neural networks but also with rule-based agents, offering a means to integrate human expertise.Here's the translation in Traditional Chinese:深度强化学习(DRL)代理频繁面临外部任务适应问题,包括过拟合、致命忘记和样本不足。 although adapters have proven effective in supervised learning contexts such as natural language processing and computer vision, their potential within the DRL domain remains largely unexplored. This paper explores the integration of adapters in reinforcement learning, presenting an innovative adaptation strategy that demonstrates enhanced training efficiency and improvement of the base-agent, experimentally in the nanoRTS environment, a real-time strategy (RTS) game simulation. Our proposed universal approach is not only compatible with pre-trained neural networks but also with rule-based agents, offering a means to integrate human expertise.
Optimal Hyperparameter $ε$ for Adaptive Stochastic Optimizers through Gradient Histograms
results: 该论文提出了一种基于 gradient histogram 的新算法,可以自动优化 safeguard 参数 $\epsilon$ 的搜索空间,以便更好地找到最佳值。Abstract
Optimizers are essential components for successfully training deep neural network models. In order to achieve the best performance from such models, designers need to carefully choose the optimizer hyperparameters. However, this can be a computationally expensive and time-consuming process. Although it is known that all optimizer hyperparameters must be tuned for maximum performance, there is still a lack of clarity regarding the individual influence of minor priority hyperparameters, including the safeguard factor $\epsilon$ and momentum factor $\beta$, in leading adaptive optimizers (specifically, those based on the Adam optimizers). In this manuscript, we introduce a new framework based on gradient histograms to analyze and justify important attributes of adaptive optimizers, such as their optimal performance and the relationships and dependencies among hyperparameters. Furthermore, we propose a novel gradient histogram-based algorithm that automatically estimates a reduced and accurate search space for the safeguard hyperparameter $\epsilon$, where the optimal value can be easily found.
摘要
In this paper, we propose a new framework based on gradient histograms to analyze and justify important attributes of adaptive optimizers, such as their optimal performance and the relationships and dependencies among hyperparameters. Furthermore, we introduce a novel gradient histogram-based algorithm that automatically estimates a reduced and accurate search space for the safeguard hyperparameter $\epsilon$, making it easy to find the optimal value.
GPT in Data Science: A Practical Exploration of Model Selection
results: 研究发现,GPT-4的模型选择建议受到了多种因素的指导,包括数据的性质、问题的类型、性能指标、计算资源、可解释性 vs 准确性、数据假设和伦理考虑。这些因素的权重不同,决定了模型选择的结果。Abstract
There is an increasing interest in leveraging Large Language Models (LLMs) for managing structured data and enhancing data science processes. Despite the potential benefits, this integration poses significant questions regarding their reliability and decision-making methodologies. It highlights the importance of various factors in the model selection process, including the nature of the data, problem type, performance metrics, computational resources, interpretability vs accuracy, assumptions about data, and ethical considerations. Our objective is to elucidate and express the factors and assumptions guiding GPT-4's model selection recommendations. We employ a variability model to depict these factors and use toy datasets to evaluate both the model and the implementation of the identified heuristics. By contrasting these outcomes with heuristics from other platforms, our aim is to determine the effectiveness and distinctiveness of GPT-4's methodology. This research is committed to advancing our comprehension of AI decision-making processes, especially in the realm of model selection within data science. Our efforts are directed towards creating AI systems that are more transparent and comprehensible, contributing to a more responsible and efficient practice in data science.
摘要
随着大型语言模型(LLMs)在数据管理和数据科学过程中的潜在应用,人们对其可靠性和决策方法的问题日益增加。这些问题提出了许多因素在选择模型过程中的重要性,包括数据的性质、问题的类型、性能指标、计算资源、解释性vs准确率、数据假设以及伦理考虑。我们的目标是解释和表达引导GPT-4选择模型的因素和假设。我们使用变量模型来描述这些因素,并使用小型数据集来评估模型和标记出的规则。通过与其他平台的假设进行比较,我们的目标是确定GPT-4的方法的效果和特点。这项研究旨在提高我们对人工智能决策过程的理解,特别是数据科学中模型选择的领域。我们的努力是创造更加透明和可读的AI系统,以便更负责任和高效地实践数据科学。
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
results: 与单个LoRAcounterpart和精度调整相比,MultiLoRA模型在多个benchmark和模型缩放上表现出色,只需要2.5%的额外参数。进一步的分析发现MultiLoRA模型的weight更新矩阵中有更多的低级特征值贡献。Abstract
LoRA achieves remarkable resource efficiency and comparable performance when adapting LLMs for specific tasks. Since ChatGPT demonstrated superior performance on various tasks, there has been a growing desire to adapt one model for all tasks. However, the explicit low-rank of LoRA limits the adaptation performance in complex multi-task scenarios. LoRA is dominated by a small number of top singular vectors while fine-tuning decomposes into a set of less important unitary transforms. In this paper, we propose MultiLoRA for better multi-task adaptation by reducing the dominance of top singular vectors observed in LoRA. MultiLoRA scales LoRA modules horizontally and change parameter initialization of adaptation matrices to reduce parameter dependency, thus yields more balanced unitary subspaces. We unprecedentedly construct specialized training data by mixing datasets of instruction follow, natural language understanding, world knowledge, to cover semantically and syntactically different samples. With only 2.5% of additional parameters, MultiLoRA outperforms single LoRA counterparts and fine-tuning on multiple benchmarks and model scales. Further investigation into weight update matrices of MultiLoRA exhibits reduced dependency on top singular vectors and more democratic unitary transform contributions.
摘要
LoRA 实现了非常出色的资源效率和相对性,当适应特定任务时。由于 ChatGPT 在多种任务上显示出了优秀表现,因此有越来越多的人想要适应一个模型 для所有任务。然而,LoRA 的明确低纬度限制了复杂多任务场景中的适应性能。LoRA 由一小数量的Top singular vectors 控制,而 fine-tuning decomposition 为一组 less important 的单位变换。在这篇论文中,我们提议 MultiLoRA 来改进多任务适应性能。MultiLoRA 将 LoRA 模块拓宽到 Horizontal 方向,并改变适应矩阵的参数初始化,以减少参数依赖性,因此实现更加平衡的单位空间。我们首次混合了 instrucion follow、自然语言理解、世界知识等数据集,以覆盖semantically和syntactically不同的样本。尽管只增加了2.5%的参数,MultiLoRA 仍然超过了单个 LoRA 对手和 fine-tuning 多个benchmark和模型Scale。进一步的调查表明,MultiLoRA 的weight update 矩阵中具有减少 top singular vectors 的依赖性和更多的democratic 单位变换贡献。
Interpretability in Machine Learning: on the Interplay with Explainability, Predictive Performances and Models
results: 挑战了一些关于机器学习模型的理解和应用的假设和误解,并提出了一些新的思路和方法来提高机器学习模型的理解和应用Abstract
Interpretability has recently gained attention in the field of machine learning, for it is crucial when it comes to high-stakes decisions or troubleshooting. This abstract concept is hard to grasp and has been associated, over time, with many labels and preconceived ideas. In this position paper, in order to clarify some misunderstandings regarding interpretability, we discuss its relationship with significant concepts in machine learning: explainability, predictive performances, and machine learning models. For instance, we challenge the idea that interpretability and explainability are substitutes to one another, or that a fixed degree of interpretability can be associated with a given machine learning model.
摘要
优先级化的Machine Learning领域中,可解性(Interpretability)在最近几年来受到了关注,因为它在高风险决策或疑难解答中具有重要性。这个抽象的概念很难理解,而且在时间的推移中,与多种标签和先入为主的想法相关联。在这篇position paper中,我们希望通过讨论可解性与机器学习模型之间的关系,以及与预测性能和其他核心概念的关系,来清楚一些有关可解性的误解。例如,我们挑战了认为可解性和解释性是纯粹的替代品,或者一个固定的可解性水平可以与某种机器学习模型相关联。
A Multi-Center Study on the Adaptability of a Shared Foundation Model for Electronic Health Records
paper_authors: Lin Lawrence Guo, Jason Fries, Ethan Steinberg, Scott Lanyon Fleming, Keith Morse, Catherine Aftandilian, Jose Posada, Nigam Shah, Lillian Sung for: 这种研究旨在检验基础模型在不同医院之间是否可以进行共享和适应,以提高AI在医疗领域的可扩展性和成本效益。methods: 这个研究使用了一个已经发布的结构化医疗记录基础模型($FM_{SM}$),通过在长期医疗记录数据上进行训练,来评估基础模型在不同医院的适应性。研究还使用了两个不同的医院的EHR数据集,包括Stanford医学院的2.57万名患者的 longitudinal医疗记录数据和MIMIC-IV数据集。results: 研究发现,通过继续在本地数据上进行预训练,可以大幅提高基础模型的适应性和任务适应性,而无需大量的本地训练数据。此外,基础模型在8个临床预测任务上的表现也证明了其在不同医院之间的适应性。Abstract
Foundation models hold promise for transforming AI in healthcare by providing modular components that are easily adaptable to downstream healthcare tasks, making AI development more scalable and cost-effective. Structured EHR foundation models, trained on coded medical records from millions of patients, demonstrated benefits including increased performance with fewer training labels, and improved robustness to distribution shifts. However, questions remain on the feasibility of sharing these models across different hospitals and their performance for local task adaptation. This multi-center study examined the adaptability of a recently released structured EHR foundation model ($FM_{SM}$), trained on longitudinal medical record data from 2.57M Stanford Medicine patients. Experiments were conducted using EHR data at The Hospital for Sick Children and MIMIC-IV. We assessed both adaptability via continued pretraining on local data, and task adaptability compared to baselines of training models from scratch at each site, including a local foundation model. We evaluated the performance of these models on 8 clinical prediction tasks. In both datasets, adapting the off-the-shelf $FM_{SM}$ matched the performance of GBM models locally trained on all data while providing a 13% improvement in settings with few task-specific training labels. With continued pretraining on local data, label efficiency substantially improved, such that $FM_{SM}$ required fewer than 1% of training examples to match the fully trained GBM's performance. Continued pretraining was also 60 to 90% more sample-efficient than training local foundation models from scratch. Our findings show that adapting shared EHR foundation models across hospitals provides improved prediction performance at less cost, underscoring the utility of base foundation models as modular components to streamline the development of healthcare AI.
摘要
基础模型在医疗领域的应用拥有潜在的优势,它们可以提供可重用的组件,使AI开发变得更加扩展和成本效果。使用结构化医疗记录数据进行训练的基础模型($FM_{SM}$),已经在医疗任务上显示出了多种优点,包括使用 fewer 的训练标签和更好的分布shift的Robustness。然而,共享这些模型在不同医院中的可行性以及其对本地任务适应性的性能仍然存在问题。本多中心研究检查了这种基础模型的适应性,通过在医疗儿童医院和MIMIC-IV数据集上进行实验。我们通过继续预训这些模型地方数据来评估其适应性和任务适应性,并与基于 scratch 训练的本地模型进行比较。我们对8种临床预测任务进行了评估。在两个数据集中,适应 $FM_{SM} $ 与基于GBM模型的本地训练相比,提供了13%的提升,而且在具有少量任务特定训练标签的情况下,适应 $FM_{SM} $ 只需要 fewer than 1%的训练示例来匹配完全训练的GBM模型的性能。继续预训使得样本效率明显提高,例如,$FM_{SM}$ 只需要 fewer than 1%的训练示例来匹配完全训练的GBM模型的性能。此外,继续预训还比基于 scratch 训练的本地模型采样更加高效,60%-90%。我们的发现表明,在医疗机构之间共享基础模型可以提供更好的预测性能,同时降低开发成本,从而赞成基础模型作为医疗AI开发的模块化组件。
methods: Meta Prompting 基于类型理论和类型理论,强调信息结构和 sintaxis,提供了一种独特的框架,超越传统内容强调的方法。
results: 本研究表明,Meta Prompting 在多种 AI 应用中具有优势,特别是在复杂的问题解决方面。它能够将复杂的问题破解成可管理的子问题,实现了Token效率和 fair comparison在问题解决方面的优势。此外,本研究还推广了 Meta Prompting 到多 modal基础模型设置,实现了不同数据类型的集成,如图像、音频和视频,并提出了在这些数据类型之间的挑战和潜在的应用前景。Abstract
This paper presents an in-depth exploration of Meta Prompting, a novel technique that revolutionizes the way large language models (LLMs), multi-modal foundation models, and AI systems approach problem-solving and data interpretation. Meta Prompting, rooted in type theory and category theory, prioritizes the structure and syntax of information, providing a unique framework that transcends traditional content-focused methods. We delve into the formal definitions of Meta Prompting, contrasting it with Few-Shot Prompting, and highlight its applicability and superiority in various AI applications. Key to this exploration is the expansion of Meta Prompting into the realm of complex reasoning. Here, we demonstrate how this technique adeptly breaks down intricate problems into manageable sub-problems, facilitating a step-by-step, detailed approach to problem-solving. This method proves especially advantageous in terms of token efficiency and offering a fair comparison in problem-solving scenarios, standing out against few-shot example approaches. Furthermore, the paper breaks new ground by extending Meta Prompting into multi-modal foundation model settings. This extension addresses the integration of diverse data types, such as images, audio, and video, within the structured framework of Meta Prompting, highlighting both the challenges and the vast potential of this approach in handling complex, multi-faceted data (The code is available at https://github.com/meta-prompting/meta-prompting).
摘要
Empowering remittance management in the digitised landscape: A real-time Data-Driven Decision Support with predictive abilities for financial transactions
results: 该研究不仅增强了财付转公司的安全性,还为未来的预测决策支持解决方案提供了基础,扩展了 predictive analytics 的潜在应用领域。此外,通过实施 artifact 的 theory,DSR 方法得到了进一步的发展和深入的 theory 发展在信息系统领域。Abstract
The advent of Blockchain technology (BT) revolutionised the way remittance transactions are recorded. Banks and remittance organisations have shown a growing interest in exploring blockchain's potential advantages over traditional practices. This paper presents a data-driven predictive decision support approach as an innovative artefact designed for the blockchain-oriented remittance industry. Employing a theory-generating Design Science Research (DSR) approach, we have uncovered the emergence of predictive capabilities driven by transactional big data. The artefact integrates predictive analytics and Machine Learning (ML) to enable real-time remittance monitoring, empowering management decision-makers to address challenges in the uncertain digitised landscape of blockchain-oriented remittance companies. Bridging the gap between theory and practice, this research not only enhances the security of the remittance ecosystem but also lays the foundation for future predictive decision support solutions, extending the potential of predictive analytics to other domains. Additionally, the generated theory from the artifact's implementation enriches the DSR approach and fosters grounded and stakeholder theory development in the information systems domain.
摘要
随着区块链技术(BT)的出现,财务转移交易的记录方式发生了革命。银行和财务转移机构对区块链的潜在优势表示了增加的兴趣。这篇论文描述了一种数据驱动的预测决策支持方法,作为区块链 Orientated 财务转移industry 的创新艺ifact。通过使用理论生成的设计科学研究(DSR)方法,我们揭示了交易大数据驱动的预测能力的出现。该艺ifact结合预测分析和机器学习(ML),以实时监控财务转移,让管理决策者能够在不确定的区块链 Orientated 财务转移公司 digitized 景观中 Address 挑战。将理论与实践相连,这项研究不仅提高了财务转移生态系统的安全性,还为未来的预测决策支持解决方案 lay 基础,扩展预测分析的潜在应用领域。此外,艺ifact的实施生成了理论,扩充了DSR方法,并为信息系统领域的grounded 和参与者理论发展提供了基础。
CSGNN: Conquering Noisy Node labels via Dynamic Class-wise Selection
paper_authors: Yifan Li, Zhen Tan, Kai Shu, Zongsheng Cao, Yu Kong, Huan Liu for: 这篇论文的目的是提出一种新的graph neural network(GNN)选择方法,以提高GNN在graph上的表现学习。methods: 本篇论文使用了一种新的邻居总和 latent space,以适应不同类别的范例选择。具体来说,这篇论文使用了一种动态类别选择机制,通过类别概率的调整,对不同类别的范例进行适应性选择。此外,本篇论文还使用了一种内部阶层学习方法,以避免过滤错误的范例。results: 根据实验结果,CSGNN比顶尖方法更高效和更稳定,具体来说,CSGNN在零散测验中的表现比顶尖方法高出30%以上。此外,CSGNN还能够在不同类别的范例上提高表现,并且能够避免过滤错误的范例。Abstract
Graph Neural Networks (GNNs) have emerged as a powerful tool for representation learning on graphs, but they often suffer from overfitting and label noise issues, especially when the data is scarce or imbalanced. Different from the paradigm of previous methods that rely on single-node confidence, in this paper, we introduce a novel Class-wise Selection for Graph Neural Networks, dubbed CSGNN, which employs a neighbor-aggregated latent space to adaptively select reliable nodes across different classes. Specifically, 1) to tackle the class imbalance issue, we introduce a dynamic class-wise selection mechanism, leveraging the clustering technique to identify clean nodes based on the neighbor-aggregated confidences. In this way, our approach can avoid the pitfalls of biased sampling which is common with global threshold techniques. 2) To alleviate the problem of noisy labels, built on the concept of the memorization effect, CSGNN prioritizes learning from clean nodes before noisy ones, thereby iteratively enhancing model performance while mitigating label noise. Through extensive experiments, we demonstrate that CSGNN outperforms state-of-the-art methods in terms of both effectiveness and robustness.
摘要
格点神经网络(GNNs)已经成为图像 Representation Learning 的强大工具,但它们经常受到过拟合和标签噪声问题的影响,特别是数据稀缺或不均衡时。与先前的方法不同,我们在本文中引入了一种新的类别选择方法,称为 CSGNN,它使用邻居积分的隐藏空间来适应不同类型的可靠节点选择。具体来说,我们解决了类别不均衡问题的方法是,通过 clustering 技术来识别干净的节点,基于邻居积分的信任度。这种方法可以避免全球阈值技术中的偏袋 sampling 问题。此外,我们使用了 memorization effect 的概念,在不同类型的节点上偏好学习干净的节点,从而逐步提高模型性能,同时缓解标签噪声。我们在广泛的实验中证明了 CSGNN 可以与当前状态的方法相比,在效果和稳定性两个方面具有优势。