methods: This paper uses deep unfolding of the Alternating Direction Method of Multipliers (ADMM) to address B-line detection in lung ultrasound.
results: Compared with the traditional model-based method, the proposed approach performs B-line detection more than 90 times faster and achieves an F1 score that is 10.6% higher.
Abstract
In the context of lung ultrasound, the detection of B-lines, which are indicative of interstitial lung disease and pulmonary edema, plays a pivotal role in clinical diagnosis. Current methods still rely on visual inspection by experts. Vision-based automatic B-line detection methods have been developed, but their performance still needs improvement in terms of both accuracy and computational speed. This paper presents a novel approach that poses B-line detection as an inverse problem via deep unfolding of the Alternating Direction Method of Multipliers (ADMM). It tackles the challenges of data labelling and model training in lung ultrasound image analysis by harnessing the capabilities of deep neural networks and model-based methods. Our objective is to substantially enhance diagnostic accuracy while ensuring efficient real-time capabilities. The results show that the proposed method runs more than 90 times faster than the traditional model-based method and achieves an F1 score that is 10.6% higher.
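To make the deep-unfolding idea concrete, the sketch below unrolls a fixed number of ADMM iterations for a generic sparse inverse problem (minimizing ||Ax - y||^2 + lambda*||x||_1) into a network whose per-iteration penalty and threshold are learnable. It is a minimal illustration of the technique under stated assumptions, not the paper's architecture; the operator `A`, the layer count, and the parameterization are placeholders.

```python
import torch
import torch.nn as nn


class UnfoldedADMM(nn.Module):
    """Unrolls K ADMM iterations for min_x ||Ax - y||^2 + lambda*||x||_1,
    with the penalty rho and threshold lambda learned per iteration."""

    def __init__(self, A: torch.Tensor, num_iters: int = 10):
        super().__init__()
        self.register_buffer("A", A)                          # (m, n) measurement operator
        self.num_iters = num_iters
        self.rho = nn.Parameter(torch.ones(num_iters))        # learnable penalty per layer
        self.lam = nn.Parameter(0.1 * torch.ones(num_iters))  # learnable threshold per layer

    @staticmethod
    def soft_threshold(v, tau):
        return torch.sign(v) * torch.clamp(v.abs() - tau, min=0.0)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        m, n = self.A.shape
        x = torch.zeros(y.shape[0], n, device=y.device)
        z = torch.zeros_like(x)
        u = torch.zeros_like(x)
        AtA = self.A.T @ self.A
        Aty = y @ self.A                                      # (batch, n) = (A^T y)^T per sample
        for k in range(self.num_iters):
            rho = nn.functional.softplus(self.rho[k])         # keep penalty positive
            # x-update: solve (A^T A + rho I) x = A^T y + rho (z - u)
            lhs = AtA + rho * torch.eye(n, device=y.device)
            rhs = Aty + rho * (z - u)
            x = torch.linalg.solve(lhs, rhs.T).T
            # z-update: proximal step (soft-thresholding)
            z = self.soft_threshold(x + u, self.lam[k] / rho)
            # dual update
            u = u + x - z
        return z
```

In deep unfolding, layers like these are trained end-to-end on labelled data, so the unrolled solver inherits the speed of a fixed-depth network while keeping the structure of the model-based iteration.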
paper_authors: Andrea Abrardo, Alberto Toccafondi, Marco Di Renzo
for: This paper studies the use of multiport network theory to analyze and optimize reconfigurable intelligent surfaces (RISs), particularly when the radiating elements are spaced less than half a wavelength apart.
methods: The reradiation properties of the RIS are represented with $Z$-parameter (impedance) and $S$-parameter (scattering) matrices, and an iterative algorithm based on the $S$-parameter representation is proposed to optimize the tunable loads of the RIS in the presence of electromagnetic mutual coupling.
results: Accounting for the structural scattering of the RIS allows its reradiation properties to be optimized more effectively, yielding higher received power in the directions of interest; compared with the $Z$-parameter representation, the $S$-parameter representation describes the reradiation of the RIS more accurately and converges faster to better optimized solutions.
Abstract
Multiport network theory has been proved to be a suitable abstraction model for analyzing and optimizing reconfigurable intelligent surfaces (RISs), especially for studying the impact of the electromagnetic mutual coupling among radiating elements that are spaced less than half of the wavelength. Both representations in terms of $Z$-parameter (impedance) and $S$-parameter (scattering) matrices are widely utilized. In this paper, we embrace multiport network theory for analyzing and optimizing the reradiation properties of RIS-aided channels, and provide four new contributions. (i) First, we offer a thorough comparison between the $Z$-parameter and $S$-parameter representations. This comparison allows us to unveil that the typical scattering models utilized for RIS-aided channels ignore the structural scattering from the RIS, which results in an unwanted specular reflection. (ii) Then, we develop an iterative algorithm for optimizing, in the presence of electromagnetic mutual coupling, the tunable loads of the RIS based on the $S$-parameters representation. We prove that small perturbations of the step size of the algorithm result in larger variations of the $S$-parameter matrix compared with the $Z$-parameter matrix, resulting in a faster convergence rate. (iii) Subsequently, we generalize the proposed algorithm to suppress the specular reflection due to the structural scattering, while maximizing the received power towards the direction of interest, and analyze the effectiveness and tradeoffs of the proposed approach. (iv) Finally, we validate the theoretical findings and algorithms with numerical simulations and a commercial full-wave electromagnetic simulator based on the method of moments.
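As a concrete reference for the S-parameter viewpoint, the sketch below evaluates a commonly used scattering-matrix cascade for the end-to-end channel of an RIS-aided SISO link, with the tunable loads entering through a diagonal matrix of reflection coefficients. The port ordering, the specific cascade formula, and the toy scattering matrix are assumptions for illustration; the exact multiport model and optimization algorithm used in the paper may differ.

```python
import numpy as np


def ris_channel_s_params(S, gamma):
    """End-to-end channel of a SISO link assisted by an N-element RIS, using a
    commonly used S-parameter cascade of the form
        h = S_rt + S_ri @ Theta @ inv(I - S_ii @ Theta) @ S_it,
    where Theta = diag(gamma) collects the reflection coefficients of the tunable
    loads.  S is the (N+2)x(N+2) scattering matrix of the whole network with
    port 0 = transmitter, port 1 = receiver, ports 2..N+1 = RIS elements.
    (Illustrative only; the paper's multiport model may differ.)"""
    N = len(gamma)
    Theta = np.diag(gamma)
    S_rt = S[1, 0]        # direct Tx -> Rx term (includes structural scattering)
    S_it = S[2:, 0]       # Tx -> RIS ports
    S_ri = S[1, 2:]       # RIS ports -> Rx
    S_ii = S[2:, 2:]      # mutual coupling among RIS ports
    core = np.linalg.solve(np.eye(N) - S_ii @ Theta, S_it)
    return S_rt + S_ri @ Theta @ core


# toy usage: lossless loads with unit-modulus reflection coefficients e^{j*phi}
rng = np.random.default_rng(0)
N = 8
S = 0.1 * (rng.standard_normal((N + 2, N + 2)) + 1j * rng.standard_normal((N + 2, N + 2)))
phases = rng.uniform(0, 2 * np.pi, N)
h = ris_channel_s_params(S, np.exp(1j * phases))
print("received power |h|^2 =", abs(h) ** 2)
```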
Generative AI for Space-Air-Ground Integrated Networks (SAGIN)
results: Simulation results demonstrate that the proposed framework improves the quality of service of SAGIN. The article also discusses potential research directions for generative AI-enabled SAGIN.
Abstract
Recently, generative AI technologies have emerged as a significant advancement in the artificial intelligence field, renowned for their language and image generation capabilities. Meanwhile, the space-air-ground integrated network (SAGIN) is an integral part of future B5G/6G for achieving ubiquitous connectivity. Inspired by this, this article explores an integration of generative AI in SAGIN, focusing on potential applications and a case study. We first provide a comprehensive review of SAGIN and generative AI models, highlighting their capabilities and the opportunities of their integration. Benefiting from generative AI's ability to generate useful data and facilitate advanced decision-making processes, it can be applied to various scenarios of SAGIN. Accordingly, we present a concise survey on their integration, including channel modeling and channel state information (CSI) estimation, joint air-space-ground resource allocation, intelligent network deployment, semantic communications, image extraction and processing, and security and privacy enhancement. Next, we propose a framework that utilizes a Generative Diffusion Model (GDM) to construct a channel information map to enhance the quality of service for SAGIN. Simulation results demonstrate the effectiveness of the proposed framework. Finally, we discuss potential research directions for generative AI-enabled SAGIN.
Semantic-aware Sampling and Transmission in Energy Harvesting Systems: A POMDP Approach
results: By formulating and solving a stochastic control problem, the authors jointly optimize sampling and transmission policies for three semantic-aware metrics: i) the age of information (AoI), ii) general distortion, and iii) the age of incorrect information (AoII). Simulations show significant performance improvements of the derived policies and reveal various switching-type structures of the optimal policies.
Abstract
We study real-time tracking problem in an energy harvesting system with a Markov source under an imperfect channel. We consider both sampling and transmission costs and different from most prior studies that assume the source is fully observable, the sampling cost renders the source unobservable. The goal is to jointly optimize sampling and transmission policies for three semantic-aware metrics: i) the age of information (AoI), ii) general distortion, and iii) the age of incorrect information (AoII). To this end, we formulate and solve a stochastic control problem. Specifically, for the AoI metric, we cast a Markov decision process (MDP) problem and solve it using relative value iteration (RVI). For the distortion and AoII metrics, we utilize the partially observable MDP (POMDP) modeling and leverage the notion of belief MDP formulation of POMDP to find optimal policies. For the distortion metric and the AoII metric under the perfect channel setup, we effectively truncate the corresponding belief space and solve an MDP problem using RVI. For the general setup, a deep reinforcement learning policy is proposed. Through simulations, we demonstrate significant performance improvements achieved by the derived policies. The results reveal various switching-type structures of optimal policies and show that a distortion-optimal policy is also AoII optimal.
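For the AoI metric, the abstract casts the problem as an MDP solved with relative value iteration (RVI). The sketch below is a generic RVI solver for an average-cost MDP given in tabular form; the state/action spaces, costs, and transition model of the paper's energy-harvesting setup are not reproduced here, so treat the arrays as placeholders.

```python
import numpy as np


def relative_value_iteration(P, c, ref_state=0, tol=1e-8, max_iter=10_000):
    """Relative value iteration (RVI) for an average-cost MDP.
    P: array of shape (A, S, S) with P[a, s, s'] = transition probability.
    c: array of shape (S, A) with per-stage costs (e.g., AoI plus sampling and
       transmission costs).  Returns the optimal average cost (gain), the
       relative value function h, and a deterministic policy."""
    A, S, _ = P.shape
    h = np.zeros(S)
    gain, Q = 0.0, np.zeros((S, A))
    for _ in range(max_iter):
        # Q[s, a] = c(s, a) + sum_s' P(s'|s, a) * h(s')
        Q = c + np.stack([P[a] @ h for a in range(A)], axis=1)
        h_new = Q.min(axis=1)
        gain = h_new[ref_state]          # value at the reference state approximates the gain
        h_new = h_new - gain             # keep the relative values bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = Q.argmin(axis=1)
    return gain, h, policy
```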
Sum-Rate Optimization for RIS-Aided Multiuser Communications with Movable Antenna
paper_authors: Yunan Sun, Hao Xu, Chongjun Ouyang, Hongwen Yang
for: This work aims to improve the performance of wireless communication networks by exploring the application of reconfigurable intelligent surface (RIS) technology.
methods: The paper proposes an RIS-aided multiuser communication system that exploits movable antenna (MA) technology to enhance the channel capacity.
results: The proposed iterative algorithm jointly optimizes the beamforming, the reflection coefficient (RC) values of the RIS, and the positions of the MAs to maximize the sum-rate. Numerical results demonstrate the effectiveness of the proposed algorithm and the superiority of the MA-based system in terms of sum-rate.
Abstract
Reconfigurable intelligent surface (RIS) is known as a promising technology to improve the performance of wireless communication networks, which has been extensively studied. Movable antenna (MA) is a novel technology that fully exploits the antenna position for enhancing the channel capacity. In this paper, we propose a new RIS-aided multiuser communication system with MAs. The sum-rate is maximized by jointly optimizing the beamforming, the reflection coefficient (RC) values of RIS and the positions of MAs. A fractional programming-based iterative algorithm is proposed to solve the formulated non-convex problem, considering three assumptions for the RIS. Numerical results are presented to verify the effectiveness of the proposed algorithm and the superiority of the proposed MA-based system in terms of sum-rate.
Semantic Communication for Cooperative Perception based on Importance Map
methods: The paper uses an importance map to extract significant semantic information and proposes a novel cooperative perception semantic communication scheme with intermediate fusion.
results: Simulations show that the proposed model outperforms traditional separate source-channel coding over various channel models and remains robust under time-varying multipath fading channels.
Abstract
Cooperative perception, which has a broader perception field than single-vehicle perception, has played an increasingly important role in autonomous driving to conduct 3D object detection. Through vehicle-to-vehicle (V2V) communication technology, various connected automated vehicles (CAVs) can share their sensory information (LiDAR point clouds) for cooperative perception. We employ an importance map to extract significant semantic information and propose a novel cooperative perception semantic communication scheme with intermediate fusion. Meanwhile, our proposed architecture can be extended to the challenging time-varying multipath fading channel. To alleviate the distortion caused by the time-varying multipath fading, we adopt explicit orthogonal frequency-division multiplexing (OFDM) blocks combined with channel estimation and channel equalization. Simulation results demonstrate that our proposed model outperforms the traditional separate source-channel coding over various channel models. Moreover, a robustness study indicates that only part of semantic information is key to cooperative perception. Although our proposed model has only been trained over one specific channel, it has the ability to learn robust coded representations of semantic information that remain resilient to various channel models, demonstrating its generality and robustness.
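The abstract mentions explicit OFDM blocks with channel estimation and equalization to combat time-varying multipath fading. The snippet below is a minimal, generic example of that processing chain (least-squares pilot-based estimation followed by zero-forcing equalization); the pilot design, subcarrier count, and channel are toy assumptions, not the paper's configuration.

```python
import numpy as np


def ofdm_equalize(rx_pilot, tx_pilot, rx_data):
    """Per-subcarrier least-squares channel estimation from a pilot OFDM symbol,
    followed by zero-forcing equalization of the data symbols.
    rx_pilot, tx_pilot: (n_subcarriers,) frequency-domain pilot symbols.
    rx_data: (n_symbols, n_subcarriers) received frequency-domain data."""
    h_est = rx_pilot / tx_pilot      # LS estimate of the channel on each subcarrier
    return rx_data / h_est           # zero-forcing equalization


# toy example: QPSK data sent through a random 3-tap channel (noise-free)
rng = np.random.default_rng(1)
n_sc = 64
taps = (rng.standard_normal(3) + 1j * rng.standard_normal(3)) / np.sqrt(6)
H = np.fft.fft(taps, n_sc)                                    # channel frequency response
tx_pilot = np.ones(n_sc, dtype=complex)
data = (rng.choice([-1, 1], (10, n_sc)) + 1j * rng.choice([-1, 1], (10, n_sc))) / np.sqrt(2)
rx_pilot, rx_data = H * tx_pilot, data * H
print(np.allclose(ofdm_equalize(rx_pilot, tx_pilot, rx_data), data))  # True
```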
paper_authors: Mingyuan Fan, Xiaodan Li, Cen Chen, Yinggui Wang
for: The goal of this paper is to exploit the transferability of adversarial examples to launch black-box attacks.
methods: The paper proposes a flatness-aware adversarial attack (FAA) that adds a flatness-aware regularization term to the optimization target, pushing the crafted adversarial examples towards flat extreme regions, and derives an approximate solution that avoids constructing the Hessian matrix.
results: Compared with state-of-the-art baselines, the method considerably boosts the transferability of the crafted adversarial examples.
Abstract
The transferability of adversarial examples can be exploited to launch black-box attacks. However, adversarial examples often present poor transferability. To alleviate this issue, by observing that the diversity of inputs can boost transferability, input regularization based methods are proposed, which craft adversarial examples by combining several transformed inputs. We reveal that input regularization based methods make resultant adversarial examples biased towards flat extreme regions. Inspired by this, we propose an attack called flatness-aware adversarial attack (FAA) which explicitly adds a flatness-aware regularization term in the optimization target to promote the resultant adversarial examples towards flat extreme regions. The flatness-aware regularization term involves gradients of samples around the resultant adversarial examples but optimizing gradients requires the evaluation of Hessian matrix in high-dimension spaces which generally is intractable. To address the problem, we derive an approximate solution to circumvent the construction of Hessian matrix, thereby making FAA practical and cheap. Extensive experiments show the transferability of adversarial examples crafted by FAA can be considerably boosted compared with state-of-the-art baselines.
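To illustrate how a flatness term can be optimized without forming a Hessian, the sketch below augments a PGD-style attack with a finite-difference proxy for the gradient of the local gradient-norm penalty. This is a hedged re-implementation of the general idea in the abstract, not the authors' exact derivation; the weight `beta` and probe radius `r` are illustrative knobs.

```python
import torch
import torch.nn.functional as F


def flatness_aware_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10,
                          beta=0.5, r=0.01):
    """PGD-style attack with an added flatness term.  The gradient of the
    flatness penalty (the gradient norm around the adversarial example) is
    approximated with a finite-difference Hessian-vector product, so no
    explicit Hessian is ever formed."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        g = torch.autograd.grad(loss, x_adv)[0]
        # unit ascent direction, used to probe the local neighbourhood
        g_unit = g / (g.flatten(1).norm(dim=1).view(-1, 1, 1, 1) + 1e-12)
        x_nb = (x_adv + r * g_unit).detach().requires_grad_(True)
        g_nb = torch.autograd.grad(F.cross_entropy(model(x_nb), y), x_nb)[0]
        # finite-difference proxy: moving against (g_nb - g)/r shrinks the local gradient norm
        flat_dir = (g_nb - g) / r
        step_dir = (g - beta * flat_dir).sign()
        x_adv = x_adv.detach() + alpha * step_dir
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv.detach()
```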
EviPrompt: A Training-Free Evidential Prompt Generation Method for Segment Anything Model in Medical Images
results: The method automatically generates suitable evidential prompts, improving the applicability and usefulness of SAM in medical image segmentation; evaluations across a broad range of tasks and modalities confirm its efficacy.
Abstract
Medical image segmentation has immense clinical applicability but remains a challenge despite advancements in deep learning. The Segment Anything Model (SAM) exhibits potential in this field, yet the requirement for expertise intervention and the domain gap between natural and medical images poses significant obstacles. This paper introduces a novel training-free evidential prompt generation method named EviPrompt to overcome these issues. The proposed method, built on the inherent similarities within medical images, requires only a single reference image-annotation pair, making it a training-free solution that significantly reduces the need for extensive labeling and computational resources. First, to automatically generate prompts for SAM in medical images, we introduce an evidential method based on uncertainty estimation without the interaction of clinical experts. Then, we incorporate the human prior into the prompts, which is vital for alleviating the domain gap between natural and medical images and enhancing the applicability and usefulness of SAM in medical scenarios. EviPrompt represents an efficient and robust approach to medical image segmentation, with evaluations across a broad range of tasks and modalities confirming its efficacy.
A design of Convolutional Neural Network model for the Diagnosis of the COVID-19
paper_authors: Xinyuan Song
for: The goal of this study is to provide an accurate chest X-ray image classification method for recognizing COVID-19, to support diagnosis in clinical centers and hospitals.
methods: The method is based on a 19-layer convolutional neural network (CNN) that performs both three-class (viral pneumonia, normal, COVID) and four-class (lung opacity, normal, COVID-19, pneumonia) classification. It is compared with several popular pretrained networks, including Inception, AlexNet, ResNet50, SqueezeNet, and VGG19.
results: Experimental results show that the proposed CNN outperforms existing published procedures in terms of specificity, accuracy, precision, sensitivity, confusion matrix, and F1-score, making it a useful tool to help clinicians decide properly about COVID-19.
Abstract
With the spread of COVID-19 around the globe over the past year, the usage of artificial intelligence (AI) algorithms and image processing methods to analyze the chest X-ray images of patients with COVID-19 has become essential. Recognizing the COVID-19 virus in the lung area of a patient is one of the basic and essential needs of clinical centers and hospitals. Most research in this field has been devoted to papers based on deep learning methods utilizing CNNs (Convolutional Neural Networks), which mainly deal with the screening of sick and healthy people. In this study, a new structure of a 19-layer CNN is recommended for accurate recognition of COVID-19 from chest X-ray images. The offered CNN is developed to serve as a precise diagnosis system for a three-class (viral pneumonia, normal, COVID) and a four-class classification (lung opacity, normal, COVID-19, and pneumonia). A comparison is conducted between the outcomes of the offered procedure and some popular pretrained networks, including Inception, AlexNet, ResNet50, SqueezeNet, and VGG19, based on specificity, accuracy, precision, sensitivity, confusion matrix, and F1-score. The experimental results of the offered CNN method show its dominance over the existing published procedures. This method can be a useful tool for clinicians in deciding properly about COVID-19.
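For readers who want a starting point, the snippet below defines a small convolutional classifier of the general kind discussed here. The layer counts, channel widths, and input assumptions are illustrative; they do not reproduce the 19-layer architecture proposed in the paper.

```python
import torch.nn as nn


def conv_block(c_in, c_out):
    """Conv -> BatchNorm -> ReLU -> 2x2 max-pool."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class ChestXrayCNN(nn.Module):
    """Small CNN classifier for chest X-rays (grayscale input by default);
    set num_classes=3 or 4 for the two classification settings."""

    def __init__(self, num_classes=4, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(in_channels, 32),
            conv_block(32, 64),
            conv_block(64, 128),
            conv_block(128, 256),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```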
Towards A Unified Neural Architecture for Visual Recognition and Reasoning
paper_authors: Calvin Luo, Boqing Gong, Ting Chen, Chen Sun
for: This paper addresses the two pillars of visual understanding: recognition and reasoning.
methods: The paper proposes a unified multi-task transformer architecture with a generic token interface that can handle both visual recognition and visual reasoning.
results: The study finds that object detection is the recognition task most beneficial for visual reasoning, that implicit object-centric representations emerge automatically inside the framework, and that certain architectural choices (such as the backbone of the visual encoder) strongly affect visual reasoning while having little impact on object detection.
Abstract
Recognition and reasoning are two pillars of visual understanding. However, these tasks have an imbalance in focus; whereas recent advances in neural networks have shown strong empirical performance in visual recognition, there has been comparably much less success in solving visual reasoning. Intuitively, unifying these two tasks under a singular framework is desirable, as they are mutually dependent and beneficial. Motivated by the recent success of multi-task transformers for visual recognition and language understanding, we propose a unified neural architecture for visual recognition and reasoning with a generic interface (e.g., tokens) for both. Our framework enables the principled investigation of how different visual recognition tasks, datasets, and inductive biases can help enable spatiotemporal reasoning capabilities. Noticeably, we find that object detection, which requires spatial localization of individual objects, is the most beneficial recognition task for reasoning. We further demonstrate via probing that implicit object-centric representations emerge automatically inside our framework. Intriguingly, we discover that certain architectural choices such as the backbone model of the visual encoder have a significant impact on visual reasoning, but little on object detection. Given the results of our experiments, we believe that visual reasoning should be considered as a first-class citizen alongside visual recognition, as they are strongly correlated but benefit from potentially different design choices.
Image Classification using Combination of Topological Features and Neural Networks
methods: The study first constructs a filtration from a complex, then computes persistent homology classes and visualizes their evolution along the filtration through the persistence diagram. Vectorization techniques are then applied to the persistence diagram to make this topological information compatible with machine learning algorithms.
results: The approach classifies images from multiple classes in the MNIST dataset and improves on the baseline results. The analysis also shows that topological information can increase neural network accuracy in multi-class classification tasks, at the price of the computational complexity of the persistent homology calculation. To the best of the authors' knowledge, this is the first work that combines deep learning features with topological features for multi-class classification tasks.
Abstract
In this work we use the persistent homology method, a technique in topological data analysis (TDA), to extract essential topological features from the data space and combine them with deep learning features for classification tasks. In TDA, the concepts of complexes and filtration are building blocks. Firstly, a filtration is constructed from some complex. Then, persistent homology classes are computed, and their evolution along the filtration is visualized through the persistence diagram. Additionally, we applied vectorization techniques to the persistence diagram to make this topological information compatible with machine learning algorithms. This was carried out with the aim of classifying images from multiple classes in the MNIST dataset. Our approach inserts topological features into deep learning approaches composed of single- and two-stream neural network architectures based on a multi-layer perceptron (MLP) and a convolutional neural network (CNN) tailored for multi-class classification in the MNIST dataset. In our analysis, we evaluated the obtained results and compared them with the outcomes achieved through the baselines that are available in the TensorFlow library. The main conclusion is that topological information may increase neural network accuracy in multi-class classification tasks at the price of the computational complexity of the persistent homology calculation. Up to the best of our knowledge, it is the first work that combines deep learning features and topological features for multi-class classification tasks.
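The sketch below shows one simple way to turn persistence diagrams into fixed-length vectors that can be concatenated with neural-network features, using the `ripser` package purely as an example TDA backend on a point cloud. The filtration (the paper works on images, typically via cubical complexes), the histogram-of-lifetimes vectorization, and the bin settings are assumptions for illustration.

```python
import numpy as np
from ripser import ripser  # example TDA backend; the paper's tooling may differ


def persistence_features(points, max_dim=1, n_bins=16, max_persistence=2.0):
    """Compute persistence diagrams of a point cloud and vectorize them as
    per-dimension histograms of lifetimes (death - birth), a simple
    vectorization that can be concatenated with deep-learning features."""
    dgms = ripser(points, maxdim=max_dim)["dgms"]
    feats = []
    bins = np.linspace(0.0, max_persistence, n_bins + 1)
    for dgm in dgms:
        lifetimes = dgm[:, 1] - dgm[:, 0]
        lifetimes = lifetimes[np.isfinite(lifetimes)]   # drop the infinite H0 bar
        hist, _ = np.histogram(lifetimes, bins=bins)
        feats.append(hist.astype(np.float32))
    return np.concatenate(feats)                        # length n_bins * (max_dim + 1)


# toy usage: a noisy circle has one long-lived H1 feature
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
circle = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((200, 2))
print(persistence_features(circle).shape)               # (32,) for max_dim=1, n_bins=16
```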
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
paper_authors: Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan
for: Florence-2 is a novel vision foundation model that can perform a variety of computer vision and vision-language tasks with simple text-based instructions.
methods: Florence-2 uses a sequence-to-sequence structure and large-scale, high-quality annotated data (FLD-5B) to train the model for versatile and comprehensive vision tasks.
results: Florence-2 demonstrates strong zero-shot and fine-tuning capabilities, making it a competitive vision foundation model for a variety of tasks.
Abstract
We introduce Florence-2, a novel vision foundation model with a unified, prompt-based representation for a variety of computer vision and vision-language tasks. While existing large vision models excel in transfer learning, they struggle to perform a diversity of tasks with simple instructions, a capability that implies handling the complexity of various spatial hierarchy and semantic granularity. Florence-2 was designed to take text-prompt as task instructions and generate desirable results in text forms, whether it be captioning, object detection, grounding or segmentation. This multi-task learning setup demands large-scale, high-quality annotated data. To this end, we co-developed FLD-5B that consists of 5.4 billion comprehensive visual annotations on 126 million images, using an iterative strategy of automated image annotation and model refinement. We adopted a sequence-to-sequence structure to train Florence-2 to perform versatile and comprehensive vision tasks. Extensive evaluations on numerous tasks demonstrated Florence-2 to be a strong vision foundation model contender with unprecedented zero-shot and fine-tuning capabilities.
Learning Human Action Recognition Representations Without Real Humans
paper_authors: Howard Zhong, Samarth Mishra, Donghyun Kim, SouYoung Jin, Rameswar Panda, Hilde Kuehne, Leonid Karlinsky, Venkatesh Saligrama, Aude Oliva, Rogerio Feris
for: The goal of this paper is to study whether human action recognition models can be pre-trained with data that does not include images of real humans.
methods: The paper introduces a new pre-training strategy, Privacy-Preserving MAE-Align, which combines real-world videos with humans removed and synthetic data containing virtual humans to improve the pre-trained representations.
results: Privacy-Preserving MAE-Align outperforms previous baselines and closes the performance gap between human and no-human action recognition representations on downstream tasks; the paper also releases an open-source benchmark for reproducible research.
Abstract
Pre-training on massive video datasets has become essential to achieve high action recognition performance on smaller downstream datasets. However, most large-scale video datasets contain images of people and hence are accompanied with issues related to privacy, ethics, and data protection, often preventing them from being publicly shared for reproducible research. Existing work has attempted to alleviate these problems by blurring faces, downsampling videos, or training on synthetic data. On the other hand, analysis on the transferability of privacy-preserving pre-trained models to downstream tasks has been limited. In this work, we study this problem by first asking the question: can we pre-train models for human action recognition with data that does not include real humans? To this end, we present, for the first time, a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model. We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks. Furthermore, we propose a novel pre-training strategy, called Privacy-Preserving MAE-Align, to effectively combine synthetic data and human-removed real data. Our approach outperforms previous baselines by up to 5% and closes the performance gap between human and no-human action recognition representations on downstream tasks, for both linear probing and fine-tuning. Our benchmark, code, and models are available at https://github.com/howardzh01/PPMA .
Diffusion Models for Earth Observation Use-cases: from cloud removal to urban change detection
results: The paper demonstrates promising results in three practical use cases: cloud removal and inpainting, dataset generation for change-detection tasks, and urban replanning.
Abstract
The advancements in the state of the art of generative Artificial Intelligence (AI) brought by diffusion models can be highly beneficial in novel contexts involving Earth observation data. After introducing this new family of generative models, this work proposes and analyses three use cases which demonstrate the potential of diffusion-based approaches for satellite image data. Namely, we tackle cloud removal and inpainting, dataset generation for change-detection tasks, and urban replanning.
Semantic-aware Video Representation for Few-shot Action Recognition
results: Experiments on five challenging few-shot action recognition benchmarks under various settings show that the proposed SAFSAR model significantly improves the state-of-the-art performance.
Abstract
Recent work on action recognition leverages 3D features and textual information to achieve state-of-the-art performance. However, most of the current few-shot action recognition methods still rely on 2D frame-level representations, often require additional components to model temporal relations, and employ complex distance functions to achieve accurate alignment of these representations. In addition, existing methods struggle to effectively integrate textual semantics, some resorting to concatenation or addition of textual and visual features, and some using text merely as an additional supervision without truly achieving feature fusion and information transfer from different modalities. In this work, we propose a simple yet effective Semantic-Aware Few-Shot Action Recognition (SAFSAR) model to address these issues. We show that directly leveraging a 3D feature extractor combined with an effective feature-fusion scheme, and a simple cosine similarity for classification can yield better performance without the need of extra components for temporal modeling or complex distance functions. We introduce an innovative scheme to encode the textual semantics into the video representation which adaptively fuses features from text and video, and encourages the visual encoder to extract more semantically consistent features. In this scheme, SAFSAR achieves alignment and fusion in a compact way. Experiments on five challenging few-shot action recognition benchmarks under various settings demonstrate that the proposed SAFSAR model significantly improves the state-of-the-art performance.
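Since the abstract stresses that a 3D feature extractor plus a plain cosine similarity suffices for classification, the sketch below shows a generic prototype-matching step of that kind. It omits the paper's text-video fusion and training details; feature shapes and the prototype-averaging choice are assumptions.

```python
import torch
import torch.nn.functional as F


def few_shot_classify(support_feats, support_labels, query_feats, n_way):
    """Prototype-style few-shot classification with plain cosine similarity.
    support_feats: (n_support, d) fused features of the support videos.
    support_labels: (n_support,) integer class labels in [0, n_way).
    query_feats: (n_query, d).  Returns predicted classes and similarity logits."""
    # build one prototype per class by averaging its support features
    protos = torch.stack([support_feats[support_labels == c].mean(dim=0)
                          for c in range(n_way)])
    # cosine similarity between every query and every class prototype
    sims = F.normalize(query_feats, dim=-1) @ F.normalize(protos, dim=-1).T
    return sims.argmax(dim=-1), sims
```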
Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model
paper_authors: Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi
for: This paper aims to generate high-quality and diverse 3D assets from text prompts in a feed-forward manner.
methods: The proposed method Instant3D uses a two-stage paradigm, which first generates a sparse set of four structured and consistent views from text in one shot with a fine-tuned 2D text-to-image diffusion model, and then directly regresses the NeRF from the generated images with a novel transformer-based sparse-view reconstructor.
results: The method can generate high-quality, diverse and Janus-free 3D assets within 20 seconds, which is two orders of magnitude faster than previous optimization-based methods that can take 1 to 10 hours.Abstract
Text-to-3D with diffusion models have achieved remarkable progress in recent years. However, existing methods either rely on score distillation-based optimization which suffer from slow inference, low diversity and Janus problems, or are feed-forward methods that generate low quality results due to the scarcity of 3D training data. In this paper, we propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. We adopt a two-stage paradigm, which first generates a sparse set of four structured and consistent views from text in one shot with a fine-tuned 2D text-to-image diffusion model, and then directly regresses the NeRF from the generated images with a novel transformer-based sparse-view reconstructor. Through extensive experiments, we demonstrate that our method can generate high-quality, diverse and Janus-free 3D assets within 20 seconds, which is two order of magnitude faster than previous optimization-based methods that can take 1 to 10 hours. Our project webpage: https://jiahao.ai/instant3d/.
ASSIST: Interactive Scene Nodes for Scalable and Realistic Indoor Simulation
results: Experiments show that the method achieves scalable realistic simulation through interactive editing and compositional rendering, generating color images, depth images, and accurate panoptic segmentation masks in a 3D-consistent manner.
Abstract
We present ASSIST, an object-wise neural radiance field as a panoptic representation for compositional and realistic simulation. Central to our approach is a novel scene node data structure that stores the information of each object in a unified fashion, allowing online interaction in both intra- and cross-scene settings. By incorporating a differentiable neural network along with the associated bounding box and semantic features, the proposed structure guarantees user-friendly interaction on independent objects to scale up novel view simulation. Objects in the scene can be queried, added, duplicated, deleted, transformed, or swapped simply through mouse/keyboard controls or language instructions. Experiments demonstrate the efficacy of the proposed method, where scaled realistic simulation can be achieved through interactive editing and compositional rendering, with color images, depth images, and panoptic segmentation masks generated in a 3D consistent manner.
An Automated Pipeline for Tumour-Infiltrating Lymphocyte Scoring in Breast Cancer
paper_authors: Adam J Shephard, Mostafa Jahanifar, Ruoyu Wang, Muhammad Dawood, Simon Graham, Kastytis Sidlauskas, Syed Ali Khurram, Nasir M Rajpoot, Shan E Ahmed Raza
for: This study uses a deep learning pipeline to compute a TILs score for breast cancer whole slide images, to improve diagnosis and prognosis.
methods: The method first segments tumour and stroma regions, then detects TILs within the tumour-associated stroma and generates a TILs score, closely mirroring the pathologist's workflow. It is based on the Efficient-UNet architecture and exhibits state-of-the-art performance in tumour/stroma segmentation and TILs detection.
results: The study shows that the automated TILs scoring system is competitive in predicting survival outcomes of breast cancer patients, consistent with the pathologist's assessment.
Abstract
Tumour-infiltrating lymphocytes (TILs) are considered as a valuable prognostic markers in both triple-negative and human epidermal growth factor receptor 2 (HER2) breast cancer. In this study, we introduce an innovative deep learning pipeline based on the Efficient-UNet architecture to compute a TILs score for breast cancer whole slide images. Our pipeline first segments tumour-stroma regions and generates a tumour bulk mask. Subsequently, it detects TILs within the tumour-associated stroma, generating a TILs score by closely mirroring the pathologist's workflow. Our method exhibits state-of-the-art performance in segmenting tumour/stroma areas and TILs detection, as demonstrated by internal cross-validation on the TiGER Challenge training dataset and evaluation on the final leaderboards. Additionally, our TILs score proves competitive in predicting survival outcomes within the same challenge, underscoring the clinical relevance and potential of our automated TILs scoring system as a breast cancer prognostic tool.
Automatic Report Generation for Histopathology images using pre-trained Vision Transformers
results: The study obtains a fairly performant and portable report generation mechanism that takes into account the whole high-resolution image rather than just individual patches. It also uses representations from an existing powerful pre-trained hierarchical vision transformer and shows their usefulness not only for zero-shot classification but also for report generation.
Abstract
Deep learning for histopathology has been successfully used for disease classification, image segmentation and more. However, combining image and text modalities using current state-of-the-art methods has been a challenge due to the high resolution of histopathology images. Automatic report generation for histopathology images is one such challenge. In this work, we show that using an existing pre-trained Vision Transformer in a two-step process of first using it to encode 4096x4096 sized patches of the Whole Slide Image (WSI) and then using it as the encoder and an LSTM decoder for report generation, we can build a fairly performant and portable report generation mechanism that takes into account the whole of the high resolution image, instead of just the patches. We are also able to use representations from an existing powerful pre-trained hierarchical vision transformer and show its usefulness in not just zero shot classification but also for report generation.
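A minimal sketch of the two-step pipeline described in the abstract is given below: a pre-trained vision transformer encodes each large WSI patch into an embedding, and an LSTM decoder generates report tokens conditioned on the pooled patch embeddings. The pooling strategy, dimensions, and vocabulary size are assumptions; the paper's exact encoder, hierarchy, and decoding setup may differ.

```python
import torch
import torch.nn as nn


class PatchReportGenerator(nn.Module):
    """Two-step report generation: a frozen, pre-trained vision transformer encodes
    WSI patches into embeddings, and an LSTM decodes report tokens from them."""

    def __init__(self, patch_encoder: nn.Module, enc_dim=768,
                 vocab_size=10_000, emb_dim=256, hid_dim=512):
        super().__init__()
        self.patch_encoder = patch_encoder            # e.g. a pre-trained ViT (frozen)
        self.proj = nn.Linear(enc_dim, hid_dim)
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, patches, report_tokens):
        # patches: (batch, n_patches, C, H, W); report_tokens: (batch, seq_len)
        b, n = patches.shape[:2]
        with torch.no_grad():                         # step 1: encode each large patch
            enc = self.patch_encoder(patches.flatten(0, 1))   # (b*n, enc_dim)
        enc = enc.view(b, n, -1).mean(dim=1)          # pool patch embeddings per slide
        h0 = torch.tanh(self.proj(enc)).unsqueeze(0)  # init decoder state from the image
        c0 = torch.zeros_like(h0)
        dec, _ = self.lstm(self.embed(report_tokens), (h0, c0))
        return self.out(dec)                          # (batch, seq_len, vocab_size)
```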
Deep Fast Vision: A Python Library for Accelerated Deep Transfer Learning Vision Prototyping
results: The library provides a simple, scalable deep learning tool that bridges complex deep learning frameworks and the needs of a diverse user base, helping to democratize deep learning for non-experts.
Abstract
Deep learning-based vision is characterized by intricate frameworks that often necessitate a profound understanding, presenting a barrier to newcomers and limiting broad adoption. With many researchers grappling with the constraints of smaller datasets, there's a pronounced reliance on pre-trained neural networks, especially for tasks such as image classification. This reliance is further intensified in niche imaging areas where obtaining vast datasets is challenging. Despite the widespread use of transfer learning as a remedy to the small dataset dilemma, a conspicuous absence of tailored auto-ML solutions persists. Addressing these challenges is "Deep Fast Vision", a python library that streamlines the deep learning process. This tool offers a user-friendly experience, enabling results through a simple nested dictionary definition, helping to democratize deep learning for non-experts. Designed for simplicity and scalability, Deep Fast Vision appears as a bridge, connecting the complexities of existing deep learning frameworks with the needs of a diverse user base.
results: The study finds that previously reported face recognition accuracies of more than 95% drop to as low as 65%, indicating that face recognition systems perform poorly in this more challenging forensic scenario.
Abstract
Recent advances in machine learning and computer vision have led to reported facial recognition accuracies surpassing human performance. We question if these systems will translate to real-world forensic scenarios in which a potentially low-resolution, low-quality, partially-occluded image is compared against a standard facial database. We describe the construction of a large-scale synthetic facial dataset along with a controlled facial forensic lineup, the combination of which allows for a controlled evaluation of facial recognition under a range of real-world conditions. Using this synthetic dataset, and a popular dataset of real faces, we evaluate the accuracy of two popular neural-based recognition systems. We find that previously reported face recognition accuracies of more than 95% drop to as low as 65% in this more challenging forensic scenario.
Federated Learning Across Decentralized and Unshared Archives for Remote Sensing Image Classification
paper_authors: Barış Büyüktaş, Gencer Sumbul, Begüm Demir
for: This paper aims to explore the potential of federated learning (FL) in remote sensing (RS) and compare state-of-the-art FL algorithms for image classification tasks.
methods: The paper considers several state-of-the-art FL algorithms, including federated averaging (FedAvg), federated transfer learning (FedTL), and federated meta-learning (FedMeta). The authors also conduct a theoretical comparison of the algorithms based on their local training complexity, aggregation complexity, learning efficiency, communication cost, and scalability.
results: Through experimental studies, the authors found that FedAvg and FedTL outperformed the other algorithms under different decentralization scenarios. They also derived a guideline for selecting suitable FL algorithms in RS based on the characteristics of the decentralized data.
Abstract
Federated learning (FL) enables the collaboration of multiple deep learning models to learn from decentralized data archives (i.e., clients) without accessing data on clients. Although FL offers ample opportunities in knowledge discovery from distributed image archives, it is seldom considered in remote sensing (RS). In this paper, for the first time in RS, we present a comparative study of state-of-the-art FL algorithms. To this end, we initially provide a systematic review of the FL algorithms presented in the computer vision community for image classification problems, and select several state-of-the-art FL algorithms based on their effectiveness with respect to training data heterogeneity across clients (known as non-IID data). After presenting an extensive overview of the selected algorithms, a theoretical comparison of the algorithms is conducted based on their: 1) local training complexity; 2) aggregation complexity; 3) learning efficiency; 4) communication cost; and 5) scalability in terms of number of clients. As the classification task, we consider the multi-label classification (MLC) problem since RS images typically consist of multiple classes, and thus can simultaneously be associated with multi-labels. After the theoretical comparison, experimental analyses are presented to compare them under different decentralization scenarios in terms of MLC performance. Based on our comprehensive analyses, we finally derive a guideline for selecting suitable FL algorithms in RS. The code of this work will be publicly available at https://git.tu-berlin.de/rsim/FL-RS.
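As a baseline reference for the family of algorithms being compared, the sketch below implements one communication round of standard FedAvg with a multi-label loss, since RS scenes are typically multi-labeled. It is a generic illustration, not the paper's benchmark code; the model, data loaders, and hyperparameters are placeholders.

```python
import copy
import torch


def federated_averaging(global_model, client_loaders, local_epochs=1, lr=0.01, device="cpu"):
    """One communication round of FedAvg: each client trains a copy of the global
    model on its own (unshared) image archive, and the server averages the
    resulting weights, weighted by local dataset size."""
    client_states, client_sizes = [], []
    for loader in client_loaders:
        local = copy.deepcopy(global_model).to(device)
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        loss_fn = torch.nn.BCEWithLogitsLoss()        # multi-label classification loss
        local.train()
        for _ in range(local_epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(local(x.to(device)), y.to(device).float()).backward()
                opt.step()
        client_states.append(local.state_dict())
        client_sizes.append(len(loader.dataset))
    # server-side aggregation: size-weighted average of the client weights
    total = sum(client_sizes)
    avg_state = {k: sum(s[k].float() * (n / total)
                        for s, n in zip(client_states, client_sizes))
                 for k in client_states[0]}
    global_model.load_state_dict(avg_state)
    return global_model
```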
MonoProb: Self-Supervised Monocular Depth Estimation with Interpretable Uncertainty
methods: The paper proposes MonoProb, a new unsupervised monocular depth estimation method that, within a single forward pass, returns an interpretable uncertainty representing the expected error of the network in its depth predictions.
results: Experiments show that MonoProb improves performance on standard depth and uncertainty metrics, while providing depth and uncertainty measures without increasing the inference time.
Abstract
Self-supervised monocular depth estimation methods aim to be used in critical applications such as autonomous vehicles for environment analysis. To circumvent the potential imperfections of these approaches, a quantification of the prediction confidence is crucial to guide decision-making systems that rely on depth estimation. In this paper, we propose MonoProb, a new unsupervised monocular depth estimation method that returns an interpretable uncertainty, which means that the uncertainty reflects the expected error of the network in its depth predictions. We rethink the stereo or the structure-from-motion paradigms used to train unsupervised monocular depth models as a probabilistic problem. Within a single forward pass inference, this model provides a depth prediction and a measure of its confidence, without increasing the inference time. We then improve the performance on depth and uncertainty with a novel self-distillation loss for which a student is supervised by a pseudo ground truth that is a probability distribution on depth output by a teacher. To quantify the performance of our models we design new metrics that, unlike traditional ones, measure the absolute performance of uncertainty predictions. Our experiments highlight enhancements achieved by our method on standard depth and uncertainty metrics as well as on our tailored metrics. https://github.com/CEA-LIST/MonoProb
Fight Fire with Fire: Combating Adversarial Patch Attacks using Pattern-randomized Defensive Patches
results: Canary and woodpecker achieve high performance, even when confronted with unknown attack methods, while incurring limited time overhead; they also exhibit sufficient robustness against defense-aware attacks.
Abstract
Object detection has found extensive applications in various tasks, but it is also susceptible to adversarial patch attacks. Existing defense methods often necessitate modifications to the target model or result in unacceptable time overhead. In this paper, we adopt a counterattack approach, following the principle of "fight fire with fire," and propose a novel and general methodology for defending adversarial attacks. We utilize an active defense strategy by injecting two types of defensive patches, canary and woodpecker, into the input to proactively probe or weaken potential adversarial patches without altering the target model. Moreover, inspired by randomization techniques employed in software security, we employ randomized canary and woodpecker injection patterns to defend against defense-aware attacks. The effectiveness and practicality of the proposed method are demonstrated through comprehensive experiments. The results illustrate that canary and woodpecker achieve high performance, even when confronted with unknown attack methods, while incurring limited time overhead. Furthermore, our method also exhibits sufficient robustness against defense-aware attacks, as evidenced by adaptive attack experiments.
Exploring the Efficacy of Base Data Augmentation Methods in Deep Learning-Based Radiograph Classification of Knee Joint Osteoarthritis
paper_authors: Fabi Prezja, Leevi Annala, Sampsa Kiiskinen, Timo Ojala
for: This study addresses the diagnosis of knee joint osteoarthritis (KOA), a major cause of disability worldwide.
methods: The study uses deep learning for KOA diagnosis and explores data augmentation techniques, including adversarial augmentations, to increase data variability.
results: The study finds that adversarial augmentations can improve the performance of KOA classification models, whereas other commonly used augmentations often degrade it. It also identifies potential confounding regions in the images: the models could classify KL0 and KL4 grades accurately even with the knee joint omitted, suggesting that the models may leverage unrelated features for classification.
Abstract
Diagnosing knee joint osteoarthritis (KOA), a major cause of disability worldwide, is challenging due to subtle radiographic indicators and the varied progression of the disease. Using deep learning for KOA diagnosis requires broad, comprehensive datasets. However, obtaining these datasets poses significant challenges due to patient privacy concerns and data collection restrictions. Additive data augmentation, which enhances data variability, emerges as a promising solution. Yet, it's unclear which augmentation techniques are most effective for KOA. This study explored various data augmentation methods, including adversarial augmentations, and their impact on KOA classification model performance. While some techniques improved performance, others commonly used underperformed. We identified potential confounding regions within the images using adversarial augmentation. This was evidenced by our models' ability to classify KL0 and KL4 grades accurately, with the knee joint omitted. This observation suggested a model bias, which might leverage unrelated features for classification currently present in radiographs. Interestingly, removing the knee joint also led to an unexpected improvement in KL1 classification accuracy. To better visualize these paradoxical effects, we employed Grad-CAM, highlighting the associated regions. Our study underscores the need for careful technique selection for improved model performance and identifying and managing potential confounding regions in radiographic KOA deep learning.
Dual input stream transformer for eye-tracking line assignment
results: Compared against nine classical approaches on nine diverse datasets, an ensemble of DIST models achieves an average accuracy of 98.5%, demonstrating the superiority of the DIST model.
Abstract
We introduce a novel Dual Input Stream Transformer (DIST) for the challenging problem of assigning fixation points from eye-tracking data collected during passage reading to the line of text that the reader was actually focused on. This post-processing step is crucial for analysis of the reading data due to the presence of noise in the form of vertical drift. We evaluate DIST against nine classical approaches on a comprehensive suite of nine diverse datasets, and demonstrate DIST's superiority. By combining multiple instances of the DIST model in an ensemble we achieve an average accuracy of 98.5\% across all datasets. Our approach presents a significant step towards addressing the bottleneck of manual line assignment in reading research. Through extensive model analysis and ablation studies, we identify key factors that contribute to DIST's success, including the incorporation of line overlap features and the use of a second input stream. Through evaluation on a set of diverse datasets we demonstrate that DIST is robust to various experimental setups, making it a safe first choice for practitioners in the field.
Enhancing Rock Image Segmentation in Digital Rock Physics: A Fusion of Generative AI and State-of-the-Art Neural Networks
results: The study shows that combining a diffusion-based generative AI model with advanced neural networks improves segmentation accuracy and consistency while reducing the need for expert-annotated data. TransUNet performs best, achieving the highest accuracy and IoU metrics for rock microstructure segmentation. Abstract
In digital rock physics, analysing microstructures from CT and SEM scans is crucial for estimating properties like porosity and pore connectivity. Traditional segmentation methods like thresholding and CNNs often fall short in accurately detailing rock microstructures and are prone to noise. U-Net improved segmentation accuracy but required many expert-annotated samples, a laborious and error-prone process due to complex pore shapes. Our study employed an advanced generative AI model, the diffusion model, to overcome these limitations. This model generated a vast dataset of CT/SEM and binary segmentation pairs from a small initial dataset. We assessed the efficacy of three neural networks: U-Net, Attention-U-net, and TransUNet, for segmenting these enhanced images. The diffusion model proved to be an effective data augmentation technique, improving the generalization and robustness of deep learning models. TransU-Net, incorporating Transformer structures, demonstrated superior segmentation accuracy and IoU metrics, outperforming both U-Net and Attention-U-net. Our research advances rock image segmentation by combining the diffusion model with cutting-edge neural networks, reducing dependency on extensive expert data and boosting segmentation accuracy and robustness. TransU-Net sets a new standard in digital rock physics, paving the way for future geoscience and engineering breakthroughs.
Learning-Based Biharmonic Augmentation for Point Cloud Classification
methods: We propose a new data augmentation technique called Biharmonic Augmentation (BA), which diversifies the dataset by applying smooth non-rigid deformations to existing 3D structures. A CoefNet predicts coefficients that blend the deformations of multiple learned prototypes.
results: Our experiments show that Biharmonic Augmentation significantly improves performance on point cloud datasets and delivers strong results across different network designs. Abstract
Point cloud datasets often suffer from inadequate sample sizes in comparison to image datasets, making data augmentation challenging. While traditional methods, like rigid transformations and scaling, have limited potential in increasing dataset diversity due to their constraints on altering individual sample shapes, we introduce the Biharmonic Augmentation (BA) method. BA is a novel and efficient data augmentation technique that diversifies point cloud data by imposing smooth non-rigid deformations on existing 3D structures. This approach calculates biharmonic coordinates for the deformation function and learns diverse deformation prototypes. Utilizing a CoefNet, our method predicts coefficients to amalgamate these prototypes, ensuring comprehensive deformation. Moreover, we present AdvTune, an advanced online augmentation system that integrates adversarial training. This system synergistically refines the CoefNet and the classification network, facilitating the automated creation of adaptive shape deformations contingent on the learner status. Comprehensive experimental analysis validates the superiority of Biharmonic Augmentation, showcasing notable performance improvements over prevailing point cloud augmentation techniques across varied network designs.
Attributes Grouping and Mining Hashing for Fine-Grained Image Retrieval
results: Experimental results show that AGMH achieves the best performance on fine-grained image retrieval tasks, surpassing existing state-of-the-art methods. Abstract
In recent years, hashing methods have been popular in the large-scale media search for low storage and strong representation capabilities. To describe objects with similar overall appearance but subtle differences, more and more studies focus on hashing-based fine-grained image retrieval. Existing hashing networks usually generate both local and global features through attention guidance on the same deep activation tensor, which limits the diversity of feature representations. To handle this limitation, we substitute convolutional descriptors for attention-guided features and propose an Attributes Grouping and Mining Hashing (AGMH), which groups and embeds the category-specific visual attributes in multiple descriptors to generate a comprehensive feature representation for efficient fine-grained image retrieval. Specifically, an Attention Dispersion Loss (ADL) is designed to force the descriptors to attend to various local regions and capture diverse subtle details. Moreover, we propose a Stepwise Interactive External Attention (SIEA) to mine critical attributes in each descriptor and construct correlations between fine-grained attributes and objects. The attention mechanism is dedicated to learning discrete attributes, which will not cost additional computations in hash codes generation. Finally, the compact binary codes are learned by preserving pairwise similarities. Experimental results demonstrate that AGMH consistently yields the best performance against state-of-the-art methods on fine-grained benchmark datasets.
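The Attention Dispersion Loss is described only at a high level above, so the following is a hedged sketch of one plausible form of such a term: it penalizes pairwise similarity between the descriptors' attention maps so they attend to different local regions. The function name, shapes, and the cosine-similarity formulation are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def attention_dispersion_loss(att_maps: torch.Tensor) -> torch.Tensor:
    """One plausible dispersion penalty (illustrative, not the paper's exact ADL).

    att_maps: (B, M, H, W) attention maps for M descriptors of each image.
    Returns the mean pairwise cosine similarity between the M maps, which the
    training loss would try to minimize so that descriptors cover distinct regions.
    """
    B, M, H, W = att_maps.shape
    flat = F.normalize(att_maps.reshape(B, M, H * W), dim=-1)   # unit-norm maps
    sim = torch.bmm(flat, flat.transpose(1, 2))                 # (B, M, M) cosine sims
    off_diag = sim - torch.eye(M, device=sim.device)            # drop self-similarity
    return off_diag.clamp(min=0).sum() / (B * M * (M - 1))

# toy usage
maps = torch.rand(4, 3, 14, 14)
print(attention_dispersion_loss(maps).item())
```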
Lidar-based Norwegian tree species detection using deep learning
methods: The study uses a deep learning tree species classification model that operates on lidar images and is trained with a focal loss.
results: In an independent validation, the model achieves a macro-averaged F1 score of 0.70, close to, though below, the performance of models using aerial imagery alone or aerial and lidar data combined. Abstract
Background: The mapping of tree species within Norwegian forests is a time-consuming process, involving forest associations relying on manual labeling by experts. The process can involve both aerial imagery, personal familiarity, or on-scene references, and remote sensing data. The state-of-the-art methods usually use high resolution aerial imagery with semantic segmentation methods. Methods: We present a deep learning based tree species classification model utilizing only lidar (Light Detection And Ranging) data. The lidar images are segmented into four classes (Norway Spruce, Scots Pine, Birch, background) with a U-Net based network. The model is trained with focal loss over partial weak labels. A major benefit of the approach is that both the lidar imagery and the base map for the labels have free and open access. Results: Our tree species classification model achieves a macro-averaged F1 score of 0.70 on an independent validation with National Forest Inventory (NFI) in-situ sample plots. That is close to, but below the performance of aerial, or aerial and lidar combined models.
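The model above is trained with a focal loss over partial weak labels. A minimal multi-class focal loss of the kind commonly used for segmentation is sketched below, assuming unlabeled pixels are marked with an ignore index; the gamma value and ignore value are assumptions rather than settings reported in the paper.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0, ignore_index=255):
    """Multi-class focal loss (Lin et al., 2017) for segmentation.

    logits: (B, C, H, W) raw class scores; target: (B, H, W) integer labels,
    with `ignore_index` marking pixels that carry no (weak) label.
    """
    ce = F.cross_entropy(logits, target, reduction="none", ignore_index=ignore_index)
    pt = torch.exp(-ce)                       # probability of the true class
    loss = ((1.0 - pt) ** gamma) * ce         # down-weight easy, well-classified pixels
    valid = (target != ignore_index).float()
    return (loss * valid).sum() / valid.sum().clamp(min=1.0)

# toy usage: 4 classes (spruce, pine, birch, background)
logits = torch.randn(2, 4, 64, 64)
labels = torch.randint(0, 4, (2, 64, 64))
labels[:, :10] = 255                          # pretend these pixels are unlabeled
print(focal_loss(logits, labels).item())
```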
Improved Positional Encoding for Implicit Neural Representation based Compact Data Representation
results: Experiments show that the proposed method yields significant gains in rate-distortion performance for compression tasks and higher reconstruction quality in novel view synthesis, without adding any complexity. Abstract
Positional encodings are employed to capture the high frequency information of the encoded signals in implicit neural representation (INR). In this paper, we propose a novel positional encoding method which improves the reconstruction quality of the INR. The proposed embedding method is more advantageous for compact data representation because it has a greater number of frequency bases than existing methods. Our experiments show that the proposed method achieves significant gains in rate-distortion performance without introducing any additional complexity in the compression task, and higher reconstruction quality in novel view synthesis.
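For context, the baseline Fourier-feature positional encoding used in INRs maps each coordinate through a bank of sinusoids; the paper's contribution is a richer frequency basis, which is not reproduced here. The standard encoding looks roughly like the sketch below (the number of frequencies is an arbitrary choice):

```python
import numpy as np

def fourier_positional_encoding(coords: np.ndarray, num_freqs: int = 10) -> np.ndarray:
    """Baseline INR positional encoding: [sin(2^k * pi * x), cos(2^k * pi * x)].

    coords: (N, D) coordinates, typically normalized to [-1, 1].
    Returns (N, D * 2 * num_freqs) high-frequency features fed to the MLP.
    """
    freqs = 2.0 ** np.arange(num_freqs) * np.pi          # geometric frequency bank
    angles = coords[..., None] * freqs                    # (N, D, num_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(coords.shape[0], -1)

xy = np.random.uniform(-1, 1, size=(4, 2))               # toy 2D pixel coordinates
print(fourier_positional_encoding(xy).shape)              # (4, 40)
```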
Ulcerative Colitis Mayo Endoscopic Scoring Classification with Active Learning and Generative Data Augmentation
paper_authors: Ümit Mert Çağlar, Alperen İnci, Oğuz Hanoğlu, Görkem Polat, Alptekin Temizel
for: This paper aims to improve the accuracy of endoscopic image analysis for Ulcerative Colitis (UC) diagnosis and severity classification by using active learning and generative augmentation methods.
methods: The proposed method involves generating a large number of synthetic samples using a small dataset of real endoscopic images, and then using active learning to select the most informative samples for training a classifier.
results: The method achieved improved classification performance compared to using only the original labeled examples, with a QWK score increase from 68.1% to 74.5%. Additionally, the method required three times fewer real images to achieve equivalent performance. Abstract
Endoscopic imaging is commonly used to diagnose Ulcerative Colitis (UC) and classify its severity. It has been shown that deep learning based methods are effective in automated analysis of these images and can potentially be used to aid medical doctors. Unleashing the full potential of these methods depends on the availability of a large amount of labeled images; however, obtaining and labeling these images is quite challenging. In this paper, we propose an active learning based generative augmentation method. The method involves generating a large number of synthetic samples by training a generative model on a small dataset of real endoscopic images. The resulting data pool is narrowed down by using active learning methods to select the most informative samples, which are then used to train a classifier. We demonstrate the effectiveness of our method through experiments on a publicly available endoscopic image dataset. The results show that using synthesized samples in conjunction with active learning leads to improved classification performance compared to using only the original labeled examples: the baseline classification performance of 68.1% increases to 74.5% in terms of the Quadratic Weighted Kappa (QWK) score. Another observation is that attaining equivalent performance using only real data required three times as many images.
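The reported metric is the Quadratic Weighted Kappa (QWK), which scores ordinal agreement between predicted and true Mayo grades. One common way to compute it (a general illustration, not the paper's evaluation code) is scikit-learn's Cohen's kappa with quadratic weights:

```python
from sklearn.metrics import cohen_kappa_score

# Toy Mayo Endoscopic Scores (0-3): ground truth vs. classifier predictions.
y_true = [0, 1, 2, 3, 2, 1, 0, 3, 2, 1]
y_pred = [0, 1, 2, 2, 2, 0, 0, 3, 3, 1]

# Quadratic weighting penalizes predictions far from the true grade more
# heavily than off-by-one mistakes, which suits ordinal severity scales.
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"QWK = {qwk:.3f}")
```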
Learning Contrastive Self-Distillation for Ultra-Fine-Grained Visual Categorization Targeting Limited Samples
results: Experimental results show that CSDNet achieves stronger performance and adaptability on Ultra-FGVC tasks than existing methods, indicating that it handles the fine-grained category divisions and limited data of Ultra-FGVC more effectively. Abstract
In the field of intelligent multimedia analysis, ultra-fine-grained visual categorization (Ultra-FGVC) plays a vital role in distinguishing intricate subcategories within broader categories. However, this task is inherently challenging due to the complex granularity of category subdivisions and the limited availability of data for each category. To address these challenges, this work proposes CSDNet, a pioneering framework that effectively explores contrastive learning and self-distillation to learn discriminative representations specifically designed for Ultra-FGVC tasks. CSDNet comprises three main modules: Subcategory-Specific Discrepancy Parsing (SSDP), Dynamic Discrepancy Learning (DDL), and Subcategory-Specific Discrepancy Transfer (SSDT), which collectively enhance the generalization of deep models across instance, feature, and logit prediction levels. To increase the diversity of training samples, the SSDP module introduces augmented samples from different viewpoints to spotlight subcategory-specific discrepancies. Simultaneously, the proposed DDL module stores historical intermediate features by a dynamic memory queue, which optimizes the feature learning space through iterative contrastive learning. Furthermore, the SSDT module is developed by a novel self-distillation paradigm at the logit prediction level of raw and augmented samples, which effectively distills more subcategory-specific discrepancies knowledge from the inherent structure of limited training data without requiring additional annotations. Experimental results demonstrate that CSDNet outperforms current state-of-the-art Ultra-FGVC methods, emphasizing its powerful efficacy and adaptability in addressing Ultra-FGVC tasks.
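SSDT distills knowledge at the logit level between raw and augmented views of the same sample without extra annotations. The snippet below sketches a generic logit-level self-distillation term of that kind, using a temperature-scaled KL divergence with the raw view as a detached teacher; the temperature and teacher/student roles are assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(raw_logits, aug_logits, temperature: float = 4.0):
    """Generic logit-level self-distillation between two views of one sample.

    The raw view acts as the 'teacher' (detached), the augmented view as the
    'student'; KL divergence of the softened distributions transfers
    sample-specific structure without extra labels.
    """
    teacher = F.softmax(raw_logits.detach() / temperature, dim=-1)
    student = F.log_softmax(aug_logits / temperature, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2

raw = torch.randn(8, 200)      # logits for 8 samples over 200 subcategories
aug = raw + 0.1 * torch.randn_like(raw)
print(self_distillation_loss(raw, aug).item())
```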
Refining the ONCE Benchmark with Hyperparameter Tuning
results: The study finds that exploring different hyperparameter search spaces and models yields higher performance, while the contribution of unlabelled data is comparatively small. Abstract
In response to the growing demand for 3D object detection in applications such as autonomous driving, robotics, and augmented reality, this work focuses on the evaluation of semi-supervised learning approaches for point cloud data. The point cloud representation provides reliable and consistent observations regardless of lighting conditions, thanks to advances in LiDAR sensors. Data annotation is of paramount importance in the context of LiDAR applications, and automating 3D data annotation with semi-supervised methods is a pivotal challenge that promises to reduce the associated workload and facilitate the emergence of cost-effective LiDAR solutions. Nevertheless, the task of semi-supervised learning in the context of unordered point cloud data remains formidable due to the inherent sparsity and incomplete shapes that hinder the generation of accurate pseudo-labels. In this study, we consider these challenges by posing the question: "To what extent does unlabelled data contribute to the enhancement of model performance?" We show that improvements from previous semi-supervised methods may not be as profound as previously thought. Our results suggest that simple grid search hyperparameter tuning applied to a supervised model can lead to state-of-the-art performance on the ONCE dataset, while the contribution of unlabelled data appears to be comparatively less exceptional.
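Since the reported gains come largely from plain grid search over the supervised detector's hyperparameters, a loop of that flavour is worth showing. The sketch below is generic: `train_and_evaluate` and the listed parameters are placeholders, not the actual knobs of the ONCE baseline.

```python
from itertools import product

def train_and_evaluate(lr: float, weight_decay: float, score_thresh: float) -> float:
    """Placeholder: train the supervised detector with these settings and
    return its validation mAP. In practice this would launch a full run."""
    return -((lr - 3e-3) ** 2) - ((weight_decay - 1e-2) ** 2) - ((score_thresh - 0.3) ** 2)

grid = {
    "lr": [1e-3, 3e-3, 1e-2],
    "weight_decay": [1e-2, 1e-4],
    "score_thresh": [0.1, 0.3, 0.5],
}

best_score, best_cfg = float("-inf"), None
for values in product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    score = train_and_evaluate(**cfg)
    if score > best_score:
        best_score, best_cfg = score, cfg

print("best configuration:", best_cfg)
```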
2D Image head pose estimation via latent space regression under occlusion settings
methods: The study proposes a deep learning method based on latent space regression, which better structures the problem for occluded scenarios and improves head pose estimation accuracy.
results: On datasets with both occluded and non-occluded scenarios, the method achieves higher accuracy than several state-of-the-art approaches and also performs well in a real-world application (a human-robot interaction scenario). Abstract
Head orientation is a challenging Computer Vision problem that has been extensively researched having a wide variety of applications. However, current state-of-the-art systems still underperform in the presence of occlusions and are unreliable for many task applications in such scenarios. This work proposes a novel deep learning approach for the problem of head pose estimation under occlusions. The strategy is based on latent space regression as a fundamental key to better structure the problem for occluded scenarios. Our model surpasses several state-of-the-art methodologies for occluded HPE, and achieves similar accuracy for non-occluded scenarios. We demonstrate the usefulness of the proposed approach with: (i) two synthetically occluded versions of the BIWI and AFLW2000 datasets, (ii) real-life occlusions of the Pandora dataset, and (iii) a real-life application to human-robot interaction scenarios where face occlusions often occur. Specifically, the autonomous feeding from a robotic arm.
Diagonal Hierarchical Consistency Learning for Semi-supervised Medical Image Segmentation
results: Experimental results show that DiHC-Net outperforms all previous methods on the public Left Atrium (LA) dataset, with higher stability and accuracy. Abstract
Medical image segmentation, which is essential for many clinical applications, has achieved almost human-level performance via data-driven deep learning techniques. Nevertheless, its performance is predicated on the costly process of manually annotating a large amount of medical images. To this end, we propose a novel framework for robust semi-supervised medical image segmentation using diagonal hierarchical consistency (DiHC-Net). First, it is composed of multiple sub-models with identical multi-scale architecture but with distinct sub-layers, such as up-sampling and normalisation layers. Second, a novel diagonal hierarchical consistency is enforced between one model's intermediate and final prediction and other models' soft pseudo labels in a diagonal hierarchical fashion. Experimental results verify the efficacy of our simple framework, outperforming all previous approaches on public Left Atrium (LA) dataset.
U3DS$^3$: Unsupervised 3D Semantic Scene Segmentation
results: The method achieves state-of-the-art performance on the ScanNet and SemanticKITTI datasets and competitive results on the S3DIS dataset. Abstract
Contemporary point cloud segmentation approaches largely rely on richly annotated 3D training data. However, it is both time-consuming and challenging to obtain consistently accurate annotations for such 3D scene data. Moreover, there is still a lack of investigation into fully unsupervised scene segmentation for point clouds, especially for holistic 3D scenes. This paper presents U3DS$^3$, as a step towards completely unsupervised point cloud segmentation for any holistic 3D scenes. To achieve this, U3DS$^3$ leverages a generalized unsupervised segmentation method for both object and background across both indoor and outdoor static 3D point clouds with no requirement for model pre-training, by leveraging only the inherent information of the point cloud to achieve full 3D scene segmentation. The initial step of our proposed approach involves generating superpoints based on the geometric characteristics of each scene. Subsequently, it undergoes a learning process through a spatial clustering-based methodology, followed by iterative training using pseudo-labels generated in accordance with the cluster centroids. Moreover, by leveraging the invariance and equivariance of the volumetric representations, we apply the geometric transformation on voxelized features to provide two sets of descriptors for robust representation learning. Finally, our evaluation provides state-of-the-art results on the ScanNet and SemanticKITTI, and competitive results on the S3DIS, benchmark datasets.
Polar-Net: A Clinical-Friendly Model for Alzheimer’s Disease Detection in OCTA Images
results: Compared with existing methods, Polar-Net performs better on both private and public datasets and provides more valuable pathological evidence for the association between retinal microvascular changes and AD. Abstract
Optical Coherence Tomography Angiography (OCTA) is a promising tool for detecting Alzheimer's disease (AD) by imaging the retinal microvasculature. Ophthalmologists commonly use region-based analysis, such as the ETDRS grid, to study OCTA image biomarkers and understand the correlation with AD. However, existing studies have used general deep computer vision methods, which present challenges in providing interpretable results and leveraging clinical prior knowledge. To address these challenges, we propose a novel deep-learning framework called Polar-Net. Our approach involves mapping OCTA images from Cartesian coordinates to polar coordinates, which allows for the use of approximate sector convolution and enables the implementation of the ETDRS grid-based regional analysis method commonly used in clinical practice. Furthermore, Polar-Net incorporates clinical prior information of each sector region into the training process, which further enhances its performance. Additionally, our framework adapts to acquire the importance of the corresponding retinal region, which helps researchers and clinicians understand the model's decision-making process in detecting AD and assess its conformity to clinical observations. Through evaluations on private and public datasets, we have demonstrated that Polar-Net outperforms existing state-of-the-art methods and provides more valuable pathological evidence for the association between retinal vascular changes and AD. In addition, we also show that the two innovative modules introduced in our framework have a significant impact on improving overall performance.
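Polar-Net's central preprocessing step maps the OCTA image from Cartesian to polar coordinates so that ETDRS rings and sectors become axis-aligned bands. A bare-bones version of such a remapping is sketched below using nearest-neighbor sampling; the image size and sampling resolution are arbitrary, and the paper's approximate sector convolution is not reproduced.

```python
import numpy as np

def to_polar(image: np.ndarray, n_radius: int = 128, n_angle: int = 360) -> np.ndarray:
    """Resample a square image around its center into (radius, angle) coordinates.

    After this mapping, ETDRS-style rings and sectors become axis-aligned
    rectangles, so ordinary convolutions sweep over clinically meaningful
    regions. Nearest-neighbor sampling keeps the sketch short.
    """
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    max_r = min(cy, cx)
    radii = np.linspace(0, max_r, n_radius)
    angles = np.linspace(0, 2 * np.pi, n_angle, endpoint=False)
    rr, aa = np.meshgrid(radii, angles, indexing="ij")        # (n_radius, n_angle)
    ys = np.clip(np.round(cy + rr * np.sin(aa)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + rr * np.cos(aa)).astype(int), 0, w - 1)
    return image[ys, xs]

octa = np.random.rand(304, 304)          # toy OCTA en-face image
print(to_polar(octa).shape)               # (128, 360)
```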
Keystroke Verification Challenge (KVC): Biometric and Fairness Benchmark Evaluation
paper_authors: Giuseppe Stragapede, Ruben Vera-Rodriguez, Ruben Tolosana, Aythami Morales, Naser Damer, Julian Fierrez, Javier Ortega-Garcia
for: This study aims to improve the performance and fairness of keystroke dynamics biometric verification.
methods: The study introduces a new experimental framework and fairness metrics to evaluate the performance and fairness of keystroke dynamics verification systems.
results: The study finds that discarding the analysis of text content in keystroke dynamics systems maintains satisfactory performance while reducing privacy risks. Abstract
Analyzing keystroke dynamics (KD) for biometric verification has several advantages: it is among the most discriminative behavioral traits; keyboards are among the most common human-computer interfaces, being the primary means for users to enter textual data; its acquisition does not require additional hardware, and its processing is relatively lightweight; and it allows for transparently recognizing subjects. However, the heterogeneity of experimental protocols and metrics, and the limited size of the databases adopted in the literature impede direct comparisons between different systems, thus representing an obstacle in the advancement of keystroke biometrics. To alleviate this aspect, we present a new experimental framework to benchmark KD-based biometric verification performance and fairness based on tweet-long sequences of variable transcript text from over 185,000 subjects, acquired through desktop and mobile keyboards, extracted from the Aalto Keystroke Databases. The framework runs on CodaLab in the form of the Keystroke Verification Challenge (KVC). Moreover, we also introduce a novel fairness metric, the Skewed Impostor Ratio (SIR), to capture inter- and intra-demographic group bias patterns in the verification scores. We demonstrate the usefulness of the proposed framework by employing two state-of-the-art keystroke verification systems, TypeNet and TypeFormer, to compare different sets of input features, achieving a less privacy-invasive system, by discarding the analysis of text content (ASCII codes of the keys pressed) in favor of extended features in the time domain. Our experiments show that this approach allows to maintain satisfactory performance.
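Dropping the ASCII codes and keeping only time-domain features means the verifier sees timing patterns such as hold and flight times rather than what was typed. The toy extraction below illustrates that idea; the field layout and units are assumptions and do not reflect the KVC data format.

```python
import numpy as np

def timing_features(events):
    """Compute privacy-friendlier time-domain keystroke features.

    events: list of (press_time, release_time) tuples in milliseconds, one per
    keystroke, in typing order. Key identities (ASCII codes) are deliberately
    not used, only the timing pattern.
    """
    press = np.array([p for p, _ in events], dtype=float)
    release = np.array([r for _, r in events], dtype=float)
    hold = release - press                       # how long each key is held
    flight = press[1:] - release[:-1]            # gap between one release and the next press
    return {
        "hold_mean": hold.mean(), "hold_std": hold.std(),
        "flight_mean": flight.mean(), "flight_std": flight.std(),
    }

sample = [(0, 95), (130, 210), (260, 340), (420, 500)]   # toy tweet-long fragment
print(timing_features(sample))
```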
Vision Big Bird: Random Sparsification for Full Attention
results: Experiments show that Vision Big Bird maintains the sparsity of self-attention, that positional encoding can be safely removed, and that the model achieves competitive performance on common vision tasks. Abstract
Recently, Transformers have shown promising performance in various vision tasks. However, the high costs of global self-attention remain challenging for Transformers, especially for high-resolution vision tasks. Inspired by one of the most successful transformers-based models for NLP: Big Bird, we propose a novel sparse attention mechanism for Vision Transformers (ViT). Specifically, we separate the heads into three groups, the first group used convolutional neural network (CNN) to extract local features and provide positional information for the model, the second group used Random Sampling Windows (RS-Win) for sparse self-attention calculation, and the third group reduces the resolution of the keys and values by average pooling for global attention. Based on these components, ViT maintains the sparsity of self-attention while maintaining the merits of Big Bird (i.e., the model is a universal approximator of sequence functions and is Turing complete). Moreover, our results show that the positional encoding, a crucial component in ViTs, can be safely removed in our model. Experiments show that Vision Big Bird demonstrates competitive performance on common vision tasks.
Comparing Male Nyala and Male Kudu Classification using Transfer Learning with ResNet-50 and VGG-16
results: On a set of 550 images, the VGG-16 and ResNet-50 models reached 93.2% and 97.7% accuracy before fine-tuning, respectively, and both reached 97.7% after fine-tuning. However, these results rest on a small sample size, so they may not be reliable or general enough for a full conclusion. Abstract
Reliable and efficient monitoring of wild animals is crucial to inform management and conservation decisions. The process of manually identifying species of animals is time-consuming, monotonous, and expensive. Leveraging on advances in deep learning and computer vision, we investigate in this paper the efficiency of pre-trained models, specifically the VGG-16 and ResNet-50 model, in identifying a male Kudu and a male Nyala in their natural habitats. These pre-trained models have proven to be efficient in animal identification in general. Still, there is little research on animals like the Kudu and Nyala, who are usually well camouflaged and have similar features. The method of transfer learning used in this paper is the fine-tuning method. The models are evaluated before and after fine-tuning. The experimental results achieved an accuracy of 93.2\% and 97.7\% for the VGG-16 and ResNet-50 models, respectively, before fine-tuning and 97.7\% for both models after fine-tuning. Although these results are impressive, it should be noted that they were taken over a small sample size of 550 images split in half between the two classes; therefore, this might not cater to enough scenarios to get a full conclusion of the efficiency of the models. Therefore, there is room for more work in getting a more extensive dataset and testing and extending to the female counterparts of these species and the whole antelope species.
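The transfer-learning recipe above replaces the classifier head of an ImageNet-pretrained backbone and fine-tunes it on the two antelope classes. A minimal torchvision version of that setup is sketched below; the learning rate, freezing strategy, and toy batch are placeholders rather than the study's settings.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from ImageNet weights and swap the head for a 2-way classifier
# (male kudu vs. male nyala).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 2)

# Freeze the backbone first; fine-tuning would later unfreeze some layers.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")

optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One toy training step with random tensors standing in for the 550-image dataset.
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(f"toy step loss: {loss.item():.3f}")
```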
Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments
results: Experiments show that, compared with an already optimized architecture, the new framework reduces model size threefold and improves inference speed by a factor of 1.4. Abstract
Deep learning-based models are at the forefront of most driver observation benchmarks due to their remarkable accuracies but are also associated with high computational costs. This is challenging, as resources are often limited in real-world driving scenarios. This paper introduces a lightweight framework for resource-efficient driver activity recognition. The framework enhances 3D MobileNet, a neural architecture optimized for speed in video classification, by incorporating knowledge distillation and model quantization to balance model accuracy and computational efficiency. Knowledge distillation helps maintain accuracy while reducing the model size by leveraging soft labels from a larger teacher model (I3D), instead of relying solely on original ground truth data. Model quantization significantly lowers memory and computation demands by using lower precision integers for model weights and activations. Extensive testing on a public dataset for in-vehicle monitoring during autonomous driving demonstrates that this new framework achieves a threefold reduction in model size and a 1.4-fold improvement in inference time, compared to an already optimized architecture. The code for this study is available at https://github.com/calvintanama/qd-driver-activity-reco.
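Two building blocks of the framework, soft-label distillation from a larger teacher and low-bit quantization, can be illustrated independently of the 3D MobileNet architecture. The sketch below combines a generic distillation loss with PyTorch's post-training dynamic quantization on a toy model; the temperature, loss weights, and toy layers are assumptions, and the paper's quantization scheme may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend soft-label KD (Hinton et al.) with the ordinary cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy student and (frozen) teacher producing logits over 10 driver activities.
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
teacher_logits = torch.randn(16, 10)
features, labels = torch.randn(16, 128), torch.randint(0, 10, (16,))
loss = distillation_loss(student(features), teacher_logits, labels)
print(f"KD loss: {loss.item():.3f}")

# Post-training dynamic quantization: int8 weights for the Linear layers.
quantized = torch.ao.quantization.quantize_dynamic(student, {nn.Linear}, dtype=torch.qint8)
print(quantized)
```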
A Neural Height-Map Approach for the Binocular Photometric Stereo Problem
results: The method achieves state-of-the-art performance on the DiLiGenT-MV dataset adapted to a binocular stereo setup and on the LUCES-ST dataset, while keeping the acquisition speed of single-view photometric stereo. Abstract
In this work we propose a novel, highly practical, binocular photometric stereo (PS) framework, which has same acquisition speed as single view PS, however significantly improves the quality of the estimated geometry. As in recent neural multi-view shape estimation frameworks such as NeRF, SIREN and inverse graphics approaches to multi-view photometric stereo (e.g. PS-NeRF) we formulate shape estimation task as learning of a differentiable surface and texture representation by minimising surface normal discrepancy for normals estimated from multiple varying light images for two views as well as discrepancy between rendered surface intensity and observed images. Our method differs from typical multi-view shape estimation approaches in two key ways. First, our surface is represented not as a volume but as a neural heightmap where heights of points on a surface are computed by a deep neural network. Second, instead of predicting an average intensity as PS-NeRF or introducing lambertian material assumptions as Guo et al., we use a learnt BRDF and perform near-field per point intensity rendering. Our method achieves the state-of-the-art performance on the DiLiGenT-MV dataset adapted to binocular stereo setup as well as a new binocular photometric stereo dataset - LUCES-ST.
Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models
paper_authors: Siao Tang, Xin Wang, Hong Chen, Chaoyu Guan, Zewen Wu, Yansong Tang, Wenwu Zhu
for: This paper focuses on developing a novel post-training quantization method for text-to-image diffusion models, specifically targeting widely used large pretrained models like Stable Diffusion and Stable Diffusion XL.
methods: The proposed method, called PCR (Progressive Calibration and Relaxing), consists of two key strategies: progressive calibration and activation relaxing. The former considers the accumulated quantization error across timesteps, while the latter improves performance with negligible cost.
results: The proposed method and a new benchmark (QDiffBench) are extensively evaluated on Stable Diffusion and Stable Diffusion XL. The results show that the proposed method achieves superior performance and is the first to achieve quantization for Stable Diffusion XL while maintaining performance. Additionally, QDiffBench provides a more accurate evaluation of text-to-image diffusion model quantization by considering the distribution gap and generalization performance outside the calibration dataset. Abstract
Diffusion models have achieved great success due to their remarkable generation ability. However, their high computational overhead is still a troublesome problem. Recent studies have leveraged post-training quantization (PTQ) to compress diffusion models. However, most of them only focus on unconditional models, leaving the quantization of widely used large pretrained text-to-image models, e.g., Stable Diffusion, largely unexplored. In this paper, we propose a novel post-training quantization method PCR (Progressive Calibration and Relaxing) for text-to-image diffusion models, which consists of a progressive calibration strategy that considers the accumulated quantization error across timesteps, and an activation relaxing strategy that improves the performance with negligible cost. Additionally, we demonstrate the previous metrics for text-to-image diffusion model quantization are not accurate due to the distribution gap. To tackle the problem, we propose a novel QDiffBench benchmark, which utilizes data in the same domain for more accurate evaluation. Besides, QDiffBench also considers the generalization performance of the quantized model outside the calibration dataset. Extensive experiments on Stable Diffusion and Stable Diffusion XL demonstrate the superiority of our method and benchmark. Moreover, we are the first to achieve quantization for Stable Diffusion XL while maintaining the performance.
Efficient Segmentation with Texture in Ore Images Based on Box-supervised Approach
paper_authors: Guodong Sun, Delong Huang, Yuting Peng, Le Cheng, Bo Wu, Yang Zhang
for: The paper addresses image segmentation of crushed ores in a complex working environment, where high-powered computing equipment is difficult to deploy and the stacked ore distribution makes complete features hard to identify.
methods: The proposed method uses a ghost feature pyramid network (Ghost-FPN) to process features obtained from the backbone, an optimized detection head to obtain accurate features, and a fusion feature similarity-based loss function that combines Lab color space (Lab) and local binary patterns (LBP) texture features to improve accuracy at no extra cost.
results: The proposed method achieves over 50 frames per second with a small model size of 21.6 MB, and maintains a high level of accuracy compared with state-of-the-art approaches on ore image datasets. The source code is available at \url{https://github.com/MVME-HBUT/OREINST}. Abstract
Image segmentation methods have been utilized to determine the particle size distribution of crushed ores. Due to the complex working environment, high-powered computing equipment is difficult to deploy. At the same time, the ore distribution is stacked, and it is difficult to identify the complete features. To address this issue, an effective box-supervised technique with texture features is provided for ore image segmentation that can identify complete and independent ores. Firstly, a ghost feature pyramid network (Ghost-FPN) is proposed to process the features obtained from the backbone to reduce redundant semantic information and computation generated by complex networks. Then, an optimized detection head is proposed to obtain the feature to maintain accuracy. Finally, Lab color space (Lab) and local binary patterns (LBP) texture features are combined to form a fusion feature similarity-based loss function to improve accuracy while incurring no loss. Experiments on MS COCO have shown that the proposed fusion features are also worth studying on other types of datasets. Extensive experimental results demonstrate the effectiveness of the proposed method, which achieves over 50 frames per second with a small model size of 21.6 MB. Meanwhile, the method maintains a high level of accuracy compared with the state-of-the-art approaches on ore image dataset. The source code is available at \url{https://github.com/MVME-HBUT/OREINST}.
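The fusion loss compares samples through Lab color and LBP texture descriptors. Extracting those two cues is straightforward with scikit-image, as in the sketch below, which shows the feature side only; the LBP parameters and histogram size are arbitrary choices, and the paper's similarity-based loss itself is not reproduced.

```python
import numpy as np
from skimage import color, feature

def lab_lbp_features(rgb_image: np.ndarray, lbp_points: int = 8, lbp_radius: float = 1.0):
    """Return a Lab color image and an LBP texture histogram for one RGB image."""
    lab = color.rgb2lab(rgb_image)                              # (H, W, 3) Lab channels
    gray = color.rgb2gray(rgb_image)
    lbp = feature.local_binary_pattern(gray, P=lbp_points, R=lbp_radius, method="uniform")
    hist, _ = np.histogram(lbp, bins=lbp_points + 2, range=(0, lbp_points + 2), density=True)
    return lab, hist

ore = np.random.rand(128, 128, 3)        # toy ore image in [0, 1]
lab, texture_hist = lab_lbp_features(ore)
print(lab.shape, texture_hist.shape)      # (128, 128, 3) (10,)
```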
Automated Sperm Assessment Framework and Neural Network Specialized for Sperm Video Recognition
results: Experimental results show that RoSTFine improves sperm assessment performance and focuses strongly on important sperm parts (i.e., the head and neck). Abstract
Infertility is a global health problem, and an increasing number of couples are seeking medical assistance to achieve reproduction, at least half of which are caused by men. The success rate of assisted reproductive technologies depends on sperm assessment, in which experts determine whether sperm can be used for reproduction based on morphology and motility of sperm. Previous sperm assessment studies with deep learning have used datasets comprising images that include only sperm heads, which cannot consider motility and other morphologies of sperm. Furthermore, the labels of the dataset are one-hot, which provides insufficient support for experts, because assessment results are inconsistent between experts, and they have no absolute answer. Therefore, we constructed the video dataset for sperm assessment whose videos include sperm head as well as neck and tail, and its labels were annotated with soft-label. Furthermore, we proposed the sperm assessment framework and the neural network, RoSTFine, for sperm video recognition. Experimental results showed that RoSTFine could improve the sperm assessment performances compared to existing video recognition models and focus strongly on important sperm parts (i.e., head and neck).
Inter-object Discriminative Graph Modeling for Indoor Scene Recognition
paper_authors: Chuanxin Song, Hanbo Wu, Xin Ma
for: This paper focuses on improving indoor scene recognition by leveraging object information within scenes to enhance feature representations.
methods: The proposed approach uses a probabilistic perspective to capture object-scene discriminative relationships, which are then transformed into an Inter-Object Discriminative Prototype (IODP). A Discriminative Graph Network (DGN) is constructed to incorporate inter-object discriminative knowledge into the image representation through graph convolution.
results: The proposed approach achieves state-of-the-art results on several widely used scene datasets, demonstrating the effectiveness of the proposed approach. Abstract
Variable scene layouts and coexisting objects across scenes make indoor scene recognition still a challenging task. Leveraging object information within scenes to enhance the distinguishability of feature representations has emerged as a key approach in this domain. Currently, most object-assisted methods use a separate branch to process object information, combining object and scene features heuristically. However, few of them pay attention to interpretably handle the hidden discriminative knowledge within object information. In this paper, we propose to leverage discriminative object knowledge to enhance scene feature representations. Initially, we capture the object-scene discriminative relationships from a probabilistic perspective, which are transformed into an Inter-Object Discriminative Prototype (IODP). Given the abundant prior knowledge from IODP, we subsequently construct a Discriminative Graph Network (DGN), in which pixel-level scene features are defined as nodes and the discriminative relationships between node features are encoded as edges. DGN aims to incorporate inter-object discriminative knowledge into the image representation through graph convolution. With the proposed IODP and DGN, we obtain state-of-the-art results on several widely used scene datasets, demonstrating the effectiveness of the proposed approach.
Semantic Map Guided Synthesis of Wireless Capsule Endoscopy Images using Diffusion Models
results: Through visual inspection and visual Turing tests, the study demonstrates that the method generates realistic and diverse WCE images. Abstract
Wireless capsule endoscopy (WCE) is a non-invasive method for visualizing the gastrointestinal (GI) tract, crucial for diagnosing GI tract diseases. However, interpreting WCE results can be time-consuming and tiring. Existing studies have employed deep neural networks (DNNs) for automatic GI tract lesion detection, but acquiring sufficient training examples, particularly due to privacy concerns, remains a challenge. Public WCE databases lack diversity and quantity. To address this, we propose a novel approach leveraging generative models, specifically the diffusion model (DM), for generating diverse WCE images. Our model incorporates semantic map resulted from visualization scale (VS) engine, enhancing the controllability and diversity of generated images. We evaluate our approach using visual inspection and visual Turing tests, demonstrating its effectiveness in generating realistic and diverse WCE images.
Central Angle Optimization for 360-degree Holographic 3D Content
results: Experiments show that selecting the optimal central angle improves the quality of the holographic content. Abstract
In this study, we propose a method to find an optimal central angle in deep learning-based depth map estimation used to produce realistic holographic content. The acquisition of RGB-depth map images as detailed as possible must be performed to generate holograms of high quality, despite the high computational cost. Therefore, we introduce a novel pipeline designed to analyze various values of central angles between adjacent camera viewpoints equidistant from the origin of an object-centered environment. Then we propose the optimal central angle to generate high-quality holographic content. The proposed pipeline comprises key steps such as comparing estimated depth maps and comparing reconstructed CGHs (Computer-Generated Holograms) from RGB images and estimated depth maps. We experimentally demonstrate and discuss the relationship between the central angle and the quality of digital holographic content.
Automated Heterogeneous Low-Bit Quantization of Multi-Model Deep Learning Inference Pipeline
results: Through the automated quantization approach, the paper achieves an accuracy-latency balance across multiple DNNs and improves model performance for edge deployment. Abstract
Multiple Deep Neural Networks (DNNs) integrated into single Deep Learning (DL) inference pipelines e.g. Multi-Task Learning (MTL) or Ensemble Learning (EL), etc., albeit very accurate, pose challenges for edge deployment. In these systems, models vary in their quantization tolerance and resource demands, requiring meticulous tuning for accuracy-latency balance. This paper introduces an automated heterogeneous quantization approach for DL inference pipelines with multiple DNNs.
Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service
paper_authors: Yuanmin Tang, Jing Yu, Keke Gai, Xiangyan Qu, Yue Hu, Gang Xiong, Qi Wu
for: This study aims to provide a safe and robust watermarking method that defends against model extraction attacks, protecting the intellectual property and commercial ownership of vision-language pre-trained models (VLPs) deployed for multi-modal Embedding as a Service (EaaS).
methods: The study injects verifiable triggers into VLPs via an embedding orthogonal transformation, achieving high-quality copyright verification with minimal impact on model performance. A collaborative copyright verification strategy that combines backdoor triggers with the embedding distribution is further proposed to strengthen robustness against various attacks.
results: Experiments show that the proposed watermarking method is effective and safe for verifying the copyright of VLPs across datasets and is robust against model extraction attacks. An out-of-distribution trigger selection approach further makes the watermark practical for real-world scenarios. Abstract
Recent advances in vision-language pre-trained models (VLPs) have significantly increased visual understanding and cross-modal analysis capabilities. Companies have emerged to provide multi-modal Embedding as a Service (EaaS) based on VLPs (e.g., CLIP-based VLPs), which cost a large amount of training data and resources for high-performance service. However, existing studies indicate that EaaS is vulnerable to model extraction attacks that induce great loss for the owners of VLPs. Protecting the intellectual property and commercial ownership of VLPs is increasingly crucial yet challenging. A major solution of watermarking model for EaaS implants a backdoor in the model by inserting verifiable trigger embeddings into texts, but it is only applicable for large language models and is unrealistic due to data and model privacy. In this paper, we propose a safe and robust backdoor-based embedding watermarking method for VLPs called VLPMarker. VLPMarker utilizes embedding orthogonal transformation to effectively inject triggers into the VLPs without interfering with the model parameters, which achieves high-quality copyright verification and minimal impact on model performance. To enhance the watermark robustness, we further propose a collaborative copyright verification strategy based on both backdoor trigger and embedding distribution, enhancing resilience against various attacks. We increase the watermark practicality via an out-of-distribution trigger selection approach, removing access to the model training data and thus making it possible for many real-world scenarios. Our extensive experiments on various datasets indicate that the proposed watermarking approach is effective and safe for verifying the copyright of VLPs for multi-modal EaaS and robust against model extraction attacks. Our code is available at https://github.com/Pter61/vlpmarker.
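The core watermarking idea is to pass served embeddings through an orthogonal transformation, which preserves inner products (and hence downstream similarity search) while giving the owner a secret key for verification. The toy sketch below illustrates just that geometric property with a random orthogonal matrix; it is not the VLPMarker implementation, and the verification step is simplified.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 512                                          # embedding dimension (e.g., CLIP-like)

# Random orthogonal matrix via QR decomposition: Q @ Q.T == I.
Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

embeddings = rng.standard_normal((5, dim))         # toy EaaS outputs
marked = embeddings @ Q.T                           # watermark transform

# Orthogonality preserves pairwise inner products, so downstream similarity
# search behaves the same on marked embeddings.
print(np.allclose(embeddings @ embeddings.T, marked @ marked.T, atol=1e-8))

# Verification idea: the owner knows Q, so undoing it recovers the originals,
# which a stolen/extracted model without Q cannot reproduce for trigger inputs.
recovered = marked @ Q
print(np.allclose(recovered, embeddings, atol=1e-8))
```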
Domain Generalization by Learning from Privileged Medical Imaging Information
results: The study shows that using privileged information raises the classification accuracy of a medical imaging model on out-of-distribution data from 0.911 to 0.934. Abstract
Learning the ability to generalize knowledge between similar contexts is particularly important in medical imaging as data distributions can shift substantially from one hospital to another, or even from one machine to another. To strengthen generalization, most state-of-the-art techniques inject knowledge of the data distribution shifts by enforcing constraints on learned features or regularizing parameters. We offer an alternative approach: Learning from Privileged Medical Imaging Information (LPMII). We show that using some privileged information such as tumor shape or location leads to stronger domain generalization ability than current state-of-the-art techniques. This paper demonstrates that by using privileged information to predict the severity of intra-layer retinal fluid in optical coherence tomography scans, the classification accuracy of a deep learning model operating on out-of-distribution data improves from $0.911$ to $0.934$. This paper provides a strong starting point for using privileged information in other medical problems requiring generalization.
Layer-wise Auto-Weighting for Non-Stationary Test-Time Adaptation
results: Experiments show that the method outperforms conventional continual and gradual test-time adaptation approaches while greatly reducing computational load, underscoring the importance of FIM-based learning weights for adapting to continuously shifting target distributions. Abstract
Given the inevitability of domain shifts during inference in real-world applications, test-time adaptation (TTA) is essential for model adaptation after deployment. However, the real-world scenario of continuously changing target distributions presents challenges including catastrophic forgetting and error accumulation. Existing TTA methods for non-stationary domain shifts, while effective, incur excessive computational load, making them impractical for on-device settings. In this paper, we introduce a layer-wise auto-weighting algorithm for continual and gradual TTA that autonomously identifies layers for preservation or concentrated adaptation. By leveraging the Fisher Information Matrix (FIM), we first design the learning weight to selectively focus on layers associated with log-likelihood changes while preserving unrelated ones. Then, we further propose an exponential min-max scaler to make certain layers nearly frozen while mitigating outliers. This minimizes forgetting and error accumulation, leading to efficient adaptation to non-stationary target distribution. Experiments on CIFAR-10C, CIFAR-100C, and ImageNet-C show our method outperforms conventional continual and gradual TTA approaches while significantly reducing computational load, highlighting the importance of FIM-based learning weight in adapting to continuously or gradually shifting target domains.
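The layer weights are derived from the Fisher Information Matrix, whose diagonal can be approximated from squared gradients of the log-likelihood. The sketch below computes such a per-layer estimate on a toy classifier; it is a generic approximation and omits the paper's exponential min-max scaler and the actual weighting scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def layerwise_fisher(model: nn.Module, inputs: torch.Tensor, labels: torch.Tensor):
    """Approximate the diagonal FIM per layer as the mean squared gradient of
    the log-likelihood; larger values flag layers tied to log-likelihood changes."""
    model.zero_grad()
    log_probs = F.log_softmax(model(inputs), dim=-1)
    nll = F.nll_loss(log_probs, labels)
    nll.backward()
    return {
        name: (param.grad ** 2).mean().item()
        for name, param in model.named_parameters()
        if param.grad is not None
    }

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
for layer, score in layerwise_fisher(model, x, y).items():
    print(f"{layer}: {score:.2e}")
```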
Uncertainty-aware Single View Volumetric Rendering for Medical Neural Radiance Fields
results: We train our model on publicly available knee and chest datasets, render CT projections from a single X-ray image, and compare our method with other approaches based on generated radiation fields. Abstract
In the field of clinical medicine, computed tomography (CT) is an effective medical imaging modality for the diagnosis of various pathologies. Compared with X-ray images, CT images can provide more information, including multi-planar slices and three-dimensional structures for clinical diagnosis. However, CT imaging requires patients to be exposed to large doses of ionizing radiation for a long time, which may cause irreversible physical harm. In this paper, we propose an Uncertainty-aware MedNeRF (UMedNeRF) network based on generated radiation fields. The network can learn a continuous representation of CT projections from 2D X-ray images by obtaining the internal structure and depth information and using adaptive loss weights to ensure the quality of the generated images. Our model is trained on publicly available knee and chest datasets, and we show the results of CT projection rendering with a single X-ray and compare our method with other methods based on generated radiation fields.
摘要
在临床医学领域,计算机断层成像(CT)是一种有效的医疗影像Modalities,用于诊断多种疾病。相比X射线图像,CT图像可以提供更多的信息,包括多平面切片和三维结构,为临床诊断提供更多的参考。然而,CT成像需要患者长时间暴露于大剂量辐射,可能会导致不可逆的物理损害。在本文中,我们提出了基于生成辐射场的不确定性意识MedNeRF(UMedNeRF)网络。该网络可以通过获取内部结构和深度信息,从2D X射线图像中生成CT投影图像,并使用适应损失质量来保证生成图像质量。我们的模型在公共可用的膝盖和胸部数据集上进行训练,并对CT投影图像的生成进行了比较。
Diffusion Shape Prior for Wrinkle-Accurate Cloth Registration
results: 在高精度捕捉到的实际衣服上,提出的方法比VAE或PCA基于的surface registration更好地泛化,并在扩展和减少扩展测试中都能够超越优化基于和学习基于的非rigid registration方法。Abstract
Registering clothes from 4D scans with vertex-accurate correspondence is challenging, yet important for dynamic appearance modeling and physics parameter estimation from real-world data. However, previous methods either rely on texture information, which is not always reliable, or achieve only coarse-level alignment. In this work, we present a novel approach to enabling accurate surface registration of texture-less clothes with large deformation. Our key idea is to effectively leverage a shape prior learned from pre-captured clothing using diffusion models. We also propose a multi-stage guidance scheme based on learned functional maps, which stabilizes registration for large-scale deformation even when they vary significantly from training data. Using high-fidelity real captured clothes, our experiments show that the proposed approach based on diffusion models generalizes better than surface registration with VAE or PCA-based priors, outperforming both optimization-based and learning-based non-rigid registration methods for both interpolation and extrapolation tests.
Adaptive Variance Thresholding: A Novel Approach to Improve Existing Deep Transfer Vision Models and Advance Automatic Knee-Joint Osteoarthritis Classification
results: 本研究的结果表明,通过应用我们的方法,可以提高预训练KOA模型的初始准确率,并将NAS输入向量空间减少60倍,从而提高推理速度和优化超参数搜索。此外,我们还应用了这种方法于一个外部已经训练的KOA分类模型,并得到了较好的效果,使其成为骨关节风溃病分类模型之一。Abstract
Knee-Joint Osteoarthritis (KOA) is a prevalent cause of global disability and is inherently complex to diagnose due to its subtle radiographic markers and individualized progression. One promising classification avenue involves applying deep learning methods; however, these techniques demand extensive, diversified datasets, which pose substantial challenges due to medical data collection restrictions. Existing practices typically resort to smaller datasets and transfer learning. However, this approach often inherits unnecessary pre-learned features that can clutter the classifier's vector space, potentially hampering performance. This study proposes a novel paradigm for improving post-training specialized classifiers by introducing adaptive variance thresholding (AVT) followed by Neural Architecture Search (NAS). This approach led to two key outcomes: an increase in the initial accuracy of the pre-trained KOA models and a 60-fold reduction in the NAS input vector space, thus facilitating faster inference speed and a more efficient hyperparameter search. We also applied this approach to an external model trained for KOA classification. Despite its initial performance, the application of our methodology improved its average accuracy, making it one of the top three KOA classification models.
摘要
膝关节骨关节炎 (KOA) 是全球最常见的残疾原因之一,而其诊断却因为它的微不足和个人化进程而被认为是复杂的。深度学习技术可能会有所助益,但这些技术需要大量多样化的数据集,医疗数据收集限制成为了主要挑战。现有的做法通常是使用更小的数据集和转移学习。然而,这种方法可能会固化预先学习的特征,从而降低表现。本研究提出了一种改进后期特殊化分类器的新方法,通过适应差异阈值调整 (AVT) 和神经网络搜索 (NAS)。这种方法导致了两个关键的结果:首先,提高了预训练 KOA 模型的初始精度;其次,将 NAS 输入向量空间减少到 60 倍,从而提高了推理速度和搜索效率。我们还应用了这种方法于一个外部用于 KOA 分类的模型。尽管它的初始表现不佳,但通过我们的方法改进,其平均精度得到了提高,成为了 KOA 分类模型之一。
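The core of adaptive variance thresholding can be illustrated with a few lines on top of frozen-backbone embeddings; the quantile-based choice of the threshold used here is an assumed stand-in for the paper's adaptive rule, and the data are synthetic placeholders.

```python
# Hedged sketch: drop low-variance dimensions of pretrained embeddings before
# downstream classification / architecture search. The quantile threshold is
# an illustrative stand-in for the paper's adaptive variance thresholding.
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 2048))      # stand-in for pretrained embeddings
features[:, 1000:] *= 0.01                   # many near-constant dimensions
labels = rng.integers(0, 5, size=500)        # stand-in for 5 KOA grades

threshold = np.quantile(features.var(axis=0), 0.75)   # keep the top 25% by variance
reduced = VarianceThreshold(threshold=threshold).fit_transform(features)

print(f"kept {reduced.shape[1]} of {features.shape[1]} dimensions")
clf = LogisticRegression(max_iter=1000).fit(reduced, labels)
```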
Synthesizing Bidirectional Temporal States of Knee Osteoarthritis Radiographs with Cycle-Consistent Generative Adversarial Neural Networks
results: 模型能够有效地将病例阶段转换为不同的阶段,特别是将晚期病例阶段转换为早期阶段,并且能够抑制骨质增生和扩大膝关节空间,这些特征都是早期KOA的典型表现。Abstract
Knee Osteoarthritis (KOA), a leading cause of disability worldwide, is challenging to detect early due to subtle radiographic indicators. Diverse, extensive datasets are needed but are challenging to compile because of privacy, data collection limitations, and the progressive nature of KOA. However, a model capable of projecting genuine radiographs into different OA stages could augment data pools, enhance algorithm training, and offer pre-emptive prognostic insights. In this study, we trained a CycleGAN model to synthesize past and future stages of KOA on any genuine radiograph. The model was validated using a Convolutional Neural Network that was deceived into misclassifying disease stages in transformed images, demonstrating the CycleGAN's ability to effectively transform disease characteristics forward or backward in time. The model was particularly effective in synthesizing future disease states and showed an exceptional ability to retroactively transition late-stage radiographs to earlier stages by eliminating osteophytes and expanding knee joint space, signature characteristics of None or Doubtful KOA. The model's results signify a promising potential for enhancing diagnostic models, data augmentation, and educational and prognostic usage in healthcare. Nevertheless, further refinement, validation, and a broader evaluation process encompassing both CNN-based assessments and expert medical feedback are emphasized for future research and development.
摘要
膝骨关节炎(KOA)是全球主要的致残原因之一,由于其影像学特征细微,早期检测十分困难。受隐私、数据采集限制以及疾病进行性的影响,收集多样且大规模的数据集并不容易。然而,一个能够将真实X光片投射到不同骨关节炎阶段的模型,可以扩充数据集、改进算法训练,并提供前瞻性的预后信息。本研究训练了一个 CycleGAN 模型,在任意真实X光片上合成 KOA 的既往与未来阶段。我们用一个卷积神经网络(CNN)对模型进行验证:该网络在转换后的图像上被"欺骗"而误判疾病阶段,说明 CycleGAN 能够有效地将疾病特征在时间上向前或向后转换。该模型在合成未来疾病状态方面尤为有效,并展现出将晚期X光片回溯到早期阶段的出色能力:通过消除骨赘并扩大膝关节间隙,而这些正是"无"或"可疑"KOA 的标志性特征。这些结果表明该模型在增强诊断模型、数据扩充以及医疗教学与预后应用方面具有可观的潜力。不过,未来的研究与开发仍需进一步的改进、验证,以及同时涵盖基于 CNN 的评估与专业医学反馈的更广泛评价流程。
results: 研究结果表明不同的 LLM 和提问类型之间存在显著的性能差异,而且提问精度对代码生成的准确率和时间效率有重要的影响。本研究的重要贡献在于找到了最佳提问策略,以便在自动代码生成任务中创造准确的 Python 函数。Abstract
Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing. Among the myriad of applications that benefit from LLMs, automated code generation is increasingly promising. The potential to transform natural language prompts into executable code promises a major shift in software development practices and paves the way for significant reductions in manual coding efforts and the likelihood of human-induced errors. This paper reports the results of a study that evaluates the performance of various LLMs, such as Bard, ChatGPT-3.5, ChatGPT-4, and Claude-2, in generating Python for coding problems. We focus on how levels of prompt specificity impact the accuracy, time efficiency, and space efficiency of the generated code. A benchmark of 104 coding problems, each with four types of prompts with varying degrees of tests and specificity, was employed to examine these aspects comprehensively. Our results indicate significant variations in performance across different LLMs and prompt types, and its key contribution is to reveal the ideal prompting strategy for creating accurate Python functions. This study lays the groundwork for further research in LLM capabilities and suggests practical implications for utilizing LLMs in automated code generation tasks and test-driven development.
Resolving uncertainty on the fly: Modeling adaptive driving behavior as active inference
results: 研究发现,通过应用这种模型,可以解释人类在不同的驾驶情况下如何 adaptively 驾驶,例如穿过障碍物和同时进行眼动时间分享。这些结果表明了这种模型的一致性和可解释性。Abstract
Understanding adaptive human driving behavior, in particular how drivers manage uncertainty, is of key importance for developing simulated human driver models that can be used in the evaluation and development of autonomous vehicles. However, existing traffic psychology models of adaptive driving behavior either lack computational rigor or only address specific scenarios and/or behavioral phenomena. While models developed in the fields of machine learning and robotics can effectively learn adaptive driving behavior from data, due to their black box nature, they offer little or no explanation of the mechanisms underlying the adaptive behavior. Thus, a generalizable, interpretable, computational model of adaptive human driving behavior is still lacking. This paper proposes such a model based on active inference, a behavioral modeling framework originating in computational neuroscience. The model offers a principled solution to how humans trade progress against caution through policy selection based on the single mandate to minimize expected free energy. This casts goal-seeking and information-seeking (uncertainty-resolving) behavior under a single objective function, allowing the model to seamlessly resolve uncertainty as a means to obtain its goals. We apply the model in two apparently disparate driving scenarios that require managing uncertainty, (1) driving past an occluding object and (2) visual time sharing between driving and a secondary task, and show how human-like adaptive driving behavior emerges from the single principle of expected free energy minimization.
Forte: An Interactive Visual Analytic Tool for Trust-Augmented Net Load Forecasting
For: This paper aims to provide a visual analytics-based application (Forte) to explore deep probabilistic net load forecasting models across various input variables and understand the error rates for different scenarios.* Methods: The paper uses a web-based interface with carefully designed visual interventions to empower scientists to derive insights about model performance by simulating diverse scenarios, facilitating an informed decision-making process.* Results: The paper demonstrates the effectiveness of visualization techniques to provide valuable insights into the correlation between weather inputs and net load forecasts, ultimately advancing grid capabilities by improving trust in forecasting models.Abstract
Accurate net load forecasting is vital for energy planning, aiding decisions on trade and load distribution. However, assessing the performance of forecasting models across diverse input variables, like temperature and humidity, remains challenging, particularly for eliciting a high degree of trust in the model outcomes. In this context, there is a growing need for data-driven technological interventions to aid scientists in comprehending how models react to both noisy and clean input variables, thus shedding light on complex behaviors and fostering confidence in the outcomes. In this paper, we present Forte, a visual analytics-based application to explore deep probabilistic net load forecasting models across various input variables and understand the error rates for different scenarios. With carefully designed visual interventions, this web-based interface empowers scientists to derive insights about model performance by simulating diverse scenarios, facilitating an informed decision-making process. We discuss observations made using Forte and demonstrate the effectiveness of visualization techniques to provide valuable insights into the correlation between weather inputs and net load forecasts, ultimately advancing grid capabilities by improving trust in forecasting models.
摘要
正确的电网负载预测是重要的能源观察,帮助决策贸易和负载分配。然而,评估预测模型对不同的输入变数,如温度和湿度,的性能仍然是一个挑战,尤其是为了获得高度的信任度。在这个上下文中,有一个增长的需求是使用数据驱动的技术来帮助科学家理解预测模型对不同的输入变数具有多少影响,以及这些变数对预测模型的影响。在这篇论文中,我们提出了Forte,一个基于可观察分析的应用程序,用于探索深度概率电网负载预测模型的不同输入变数下的性能。这个网页式界面通过精心设计的可观察干预,帮助科学家从不同的enario中获得预测模型的性能,并帮助他们做出了 Informed 的决策。我们详细说明了使用Forte所作出的观察,并证明了可观察技术的效用,以提高电网预测模型的信任度,最终提高电网的能力。
ChatGPT in the context of precision agriculture data analytics
for: 这个研究 argue that 将 ChatGPT интеGRATED into the data processing pipeline of automated sensors in precision agriculture 可以带来多个Benefits和改进现代农业实践中的多个方面。
results: 这个研究表明,通过 ChatGPT 的语言模型可以将 Speech 输入映射到文本,并且可以通过 Python 代码和 Pandas 与整个数据库进行交互,可以实时提供农业数据分析的结果和建议,并且可以通过语音合成器与用户进行Iterative 和改进的交互。Abstract
In this study we argue that integrating ChatGPT into the data processing pipeline of automated sensors in precision agriculture has the potential to bring several benefits and enhance various aspects of modern farming practices. Policy makers often face a barrier when they need to get informed about the situation in vast agricultural fields to reach to decisions. They depend on the close collaboration between agricultural experts in the field, data analysts, and technology providers to create interdisciplinary teams that cannot always be secured on demand or establish effective communication across these diverse domains to respond in real-time. In this work we argue that the speech recognition input modality of ChatGPT provides a more intuitive and natural way for policy makers to interact with the database of the server of an agricultural data processing system to which a large, dispersed network of automated insect traps and sensors probes reports. The large language models map the speech input to text, allowing the user to form its own version of unconstrained verbal query, raising the barrier of having to learn and adapt oneself to a specific data analytics software. The output of the language model can interact through Python code and Pandas with the entire database, visualize the results and use speech synthesis to engage the user in an iterative and refining discussion related to the data. We show three ways of how ChatGPT can interact with the database of the remote server to which a dispersed network of different modalities (optical counters, vibration recordings, pictures, and video), report. We examine the potential and the validity of the response of ChatGPT in analyzing, and interpreting agricultural data, providing real time insights and recommendations to stakeholders
摘要
在这项研究中,我们 argue that将 ChatGPT integrate into 自动感知系统的数据处理管道可以带来多种优点,提高现代农业实践中的各个方面。政策制定者经常遇到困难,当他们需要获取庞大农业场景中的信息,以便做出决策。他们需要和农业专家、数据分析师和技术提供商合作,创建协同团队,但这些团队不一定可以在需要时协作,建立有效的交流也是一个挑战。在这项工作中,我们 argue that ChatGPT 的语音识别输入模式提供了一种更直观和自然的方式,让政策制定者与农业数据处理系统的服务器上的数据库进行交互。大语言模型将语音输入转换为文本,让用户可以自定义的提问,不需要适应特定的数据分析软件。输出的语言模型可以通过 Python 代码和 Pandas 与整个数据库进行交互,可视化结果,并使用语音合成器与用户进行可迭代的讨论,与数据相关。我们介绍了三种 ChatGPT 与远程服务器上的数据库交互的方法。我们研究了 ChatGPT 对农业数据的分析和解释的可能性和有效性,以及在实时提供农业决策者的信息和建议。
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
results: 这篇论文通过实践研究,发现BOFT可以对大型视觉对应、大型语言模型和文本对应图像散乱模型进行优化,并且比OFT更有优化效果。Abstract
Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
摘要
大型基金模型在现场变得普遍,但从头来训练它们是不可持续的。因此,有效地适应这些强大模型到下游任务变得越来越重要。在这篇论文中,我们研究了一种原则正式的 Parameter-efficient finetuning 方法——Orthogonal Finetuning (OFT)。尽管它们展现了良好的泛化能力,但 OFT 仍然需要一些可训练的参数,这是因为正交矩阵的维度较高。为了解决这个问题,我们从信息传输的角度来考虑 OFT,然后确定了一些关键的需求,可以提高参数效率。受到 Cooley-Tukey 快速傅立叶变换算法的启发,我们提议一种高效的正交参数化方法,使用蝴蝶结构。我们将这种参数化方法应用于 OFT,创造了一种新的参数效率高的 finetuning 方法,称为 Orthogonal Butterfly (BOFT)。 BOFT 将 OFT 作为特例,提出一种总体的正交 finetuning 框架。最后,我们进行了广泛的实验研究,适应大型视觉转换器、大型语言模型和文本到图像扩散模型到视觉和语言领域中的各种下游任务。
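The sketch below shows the plain orthogonal-finetuning building block that BOFT generalizes: a frozen linear layer whose weights are rotated by an orthogonal matrix obtained from a trainable skew-symmetric generator via the Cayley transform. BOFT's actual contribution, factorizing this rotation into a product of sparse butterfly matrices to cut the number of trainable parameters, is only described here, not implemented.

```python
# Hedged sketch: OFT-style orthogonal finetuning of a frozen linear layer.
# R = (I - S)^{-1}(I + S) with S skew-symmetric is always orthogonal (Cayley
# transform); BOFT would replace this dense R by sparse butterfly factors.
import torch
import torch.nn as nn

class OrthogonalFinetunedLinear(nn.Module):
    def __init__(self, frozen_linear: nn.Linear):
        super().__init__()
        self.frozen = frozen_linear
        for p in self.frozen.parameters():
            p.requires_grad_(False)
        d = frozen_linear.out_features
        self.skew_raw = nn.Parameter(torch.zeros(d, d))   # only trainable tensor

    def forward(self, x):
        S = self.skew_raw - self.skew_raw.T               # skew-symmetric generator
        I = torch.eye(S.shape[0], device=S.device)
        R = torch.linalg.solve(I - S, I + S)              # orthogonal rotation
        w = R @ self.frozen.weight                        # rotate frozen weights
        b = self.frozen.bias
        return nn.functional.linear(x, w, R @ b if b is not None else None)

layer = OrthogonalFinetunedLinear(nn.Linear(64, 64))
out = layer(torch.randn(8, 64))
```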
Smart Agent-Based Modeling: On the Use of Large Language Models in Computer Simulations
results: 论文通过三个实验(代码可以在https://github.com/Roihn/SABM),证明了SABM的有效性和可行性。这些实验表明,SABM可以模拟复杂系统的行为,并且可以增加模型的灵活性和现实感。Abstract
Computer simulations offer a robust toolset for exploring complex systems across various disciplines. A particularly impactful approach within this realm is Agent-Based Modeling (ABM), which harnesses the interactions of individual agents to emulate intricate system dynamics. ABM's strength lies in its bottom-up methodology, illuminating emergent phenomena by modeling the behaviors of individual components of a system. Yet, ABM has its own set of challenges, notably its struggle with modeling natural language instructions and common sense in mathematical equations or rules. This paper seeks to transcend these boundaries by integrating Large Language Models (LLMs) like GPT into ABM. This amalgamation gives birth to a novel framework, Smart Agent-Based Modeling (SABM). Building upon the concept of smart agents -- entities characterized by their intelligence, adaptability, and computation ability -- we explore in the direction of utilizing LLM-powered agents to simulate real-world scenarios with increased nuance and realism. In this comprehensive exploration, we elucidate the state of the art of ABM, introduce SABM's potential and methodology, and present three case studies (source codes available at https://github.com/Roihn/SABM), demonstrating the SABM methodology and validating its effectiveness in modeling real-world systems. Furthermore, we cast a vision towards several aspects of the future of SABM, anticipating a broader horizon for its applications. Through this endeavor, we aspire to redefine the boundaries of computer simulations, enabling a more profound understanding of complex systems.
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models
results: 在七个数据集和其分割(训练和测试/验证)上,使用GPT-4和GPT-3.5两种state-of-the-art LLM,发现方法可以增强数据污染检测和准确地估计污染程度,即使污染信号弱。Abstract
We propose the Data Contamination Quiz, a simple and effective approach to detect data contamination in large language models (LLMs) and estimate the amount of it. Specifically, we frame data contamination detection as a series of multiple-choice questions. We devise a quiz format wherein three perturbed versions of each dataset instance are created. These changes only include word-level perturbations, replacing words with their contextual synonyms, ensuring both the semantic and sentence structure remain exactly the same as the original instance. Together with the original instance, these perturbed versions constitute the choices in the quiz. Given that the only distinguishing signal among these choices is the exact wording, an LLM, when tasked with identifying the original instance from the choices, opts for the original if it has memorized it in its pre-training phase--a trait intrinsic to LLMs. A dataset partition is then marked as contaminated if the LLM's performance on the quiz surpasses what random chance suggests. Our evaluation spans seven datasets and their respective splits (train and test/validation) on two state-of-the-art LLMs: GPT-4 and GPT-3.5. While lacking access to the pre-training data, our results suggest that our approach not only enhances the detection of data contamination but also provides an accurate estimation of its extent, even when the contamination signal is weak.
摘要
我们提出了数据污染测验(Data Contamination Quiz),一种简单有效的方法用于检测大型自然语言模型(LLM)中的数据污染和量化其扩散。具体来说,我们将数据污染检测转化为一系列多选题目。我们设计了一种测验形式,其中每个数据集实例上分别创建了三个杂化版本。这些杂化版本仅包括单词水平的修改,将单词换成相关的同义词,以保持原始实例的语义和句子结构完全相同。与原始实例一起,这些杂化版本组成测验的选择。由于这些选择之间只有单词的不同,因此当一个LLM在面临这些选择时,如果它在预训练阶段已经记忆了原始实例,那么它会选择原始实例。我们对七个dataset和它们的分割(训练和测试/验证)进行了评估,使用两个现代LLM:GPT-4和GPT-3.5。尽管我们没有直接访问预训练数据,但我们的方法不仅可以增强数据污染检测,还可以准确地估计污染的程度,即使污染信号弱。
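A hedged sketch of how one quiz item could be assembled: a few words are swapped for synonyms (WordNet here stands in for the paper's contextual synonyms), the original and three perturbed variants are shuffled, and a model would be asked to pick the original; accuracy above chance over many items signals contamination. The LLM call itself is left out.

```python
# Hedged sketch: build one multiple-choice quiz item by word-level synonym
# substitution. WordNet is an illustrative synonym source; the paper uses
# contextual synonyms. Requires: nltk.download("wordnet")
import random
from nltk.corpus import wordnet

def perturb(sentence: str, n_swaps: int = 2, seed: int = 0) -> str:
    rng = random.Random(seed)
    words = sentence.split()
    idxs = list(range(len(words)))
    rng.shuffle(idxs)
    swapped = 0
    for i in idxs:
        lemmas = {l.name().replace("_", " ")
                  for s in wordnet.synsets(words[i]) for l in s.lemmas()}
        lemmas.discard(words[i])
        if lemmas:
            words[i] = sorted(lemmas)[0]      # deterministic synonym pick
            swapped += 1
        if swapped >= n_swaps:
            break
    return " ".join(words)

def build_quiz_item(original: str):
    choices = [original] + [perturb(original, seed=s) for s in range(3)]
    random.shuffle(choices)
    return choices, choices.index(original)

choices, answer_idx = build_quiz_item("The quick brown fox jumps over the lazy dog")
# an LLM picking the original far above 1/len(choices) suggests memorization
```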
Harnessing Synthetic Datasets: The Role of Shape Bias in Deep Neural Network Generalization
results: 我们发现Shape bias的表现受到网络架构和监督方法的影响,并且与多样性和自然性相互纠缠。我们提出了一种新的解释Shape bias的方法,即用于估计样本集中样本的多样性。Abstract
Recent advancements in deep learning have been primarily driven by the use of large models trained on increasingly vast datasets. While neural scaling laws have emerged to predict network performance given a specific level of computational resources, the growing demand for expansive datasets raises concerns. To address this, a new research direction has emerged, focusing on the creation of synthetic data as a substitute. In this study, we investigate how neural networks exhibit shape bias during training on synthetic datasets, serving as an indicator of the synthetic data quality. Specifically, our findings indicate three key points: (1) Shape bias varies across network architectures and types of supervision, casting doubt on its reliability as a predictor for generalization and its ability to explain differences in model recognition compared to human capabilities. (2) Relying solely on shape bias to estimate generalization is unreliable, as it is entangled with diversity and naturalism. (3) We propose a novel interpretation of shape bias as a tool for estimating the diversity of samples within a dataset. Our research aims to clarify the implications of using synthetic data and its associated shape bias in deep learning, addressing concerns regarding generalization and dataset quality.
MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things
results: 这篇论文提出了多种模型基线,包括单感知模式和多感知模式,以及多任务和多感知模型,以便未来的研究人员可以更好地进行多感知表示学习。Abstract
The Internet of Things (IoT), the network integrating billions of smart physical devices embedded with sensors, software, and communication technologies for the purpose of connecting and exchanging data with other devices and systems, is a critical and rapidly expanding component of our modern world. The IoT ecosystem provides a rich source of real-world modalities such as motion, thermal, geolocation, imaging, depth, sensors, video, and audio for prediction tasks involving the pose, gaze, activities, and gestures of humans as well as the touch, contact, pose, 3D of physical objects. Machine learning presents a rich opportunity to automatically process IoT data at scale, enabling efficient inference for impact in understanding human wellbeing, controlling physical devices, and interconnecting smart cities. To develop machine learning technologies for IoT, this paper proposes MultiIoT, the most expansive IoT benchmark to date, encompassing over 1.15 million samples from 12 modalities and 8 tasks. MultiIoT introduces unique challenges involving (1) learning from many sensory modalities, (2) fine-grained interactions across long temporal ranges, and (3) extreme heterogeneity due to unique structure and noise topologies in real-world sensors. We also release a set of strong modeling baselines, spanning modality and task-specific methods to multisensory and multitask models to encourage future research in multisensory representation learning for IoT.
摘要
互联网物品(IoT),整合了数百万个智能物理设备,嵌入了感知器、软件和通信技术,用于连接和交换数据,是当代世界中一个关键和迅速发展的组成部分。IoT生态系统提供了丰富的现实世界模式,如运动、热度、地理位置、成像、深度、音频和视频等,用于预测人类的姿势、视线、活动和手势。机器学习对IoT数据进行自动处理,可以实现高效的推理,以便更好地理解人类的健康状况、控制物理设备和连接智能城市。为了开发IoT中机器学习技术,本文提出了MultiIoT,迄今为止最大的IoTbenchmark,包括12种感知模式和8个任务,共计1.15万个样本。MultiIoT带来了来自多种感知模式的学习挑战,以及长时间范围内的细化交互和实际世界感知器的特殊结构和噪声概率图。我们还发布了一组强大的模型基线,覆盖模式和任务特定的方法、多感知模型和多任务模型,以促进未来对多感知表示学习的研究。
BanglaBait: Semi-Supervised Adversarial Approach for Clickbait Detection on Bangla Clickbait Dataset
results: 提出的模型在这个数据集上表现出色,超越了传统神经网络模型(LSTM、GRU、CNN)和语言特征基于的模型。这个数据集和详细的分析和比较可以提供未来关于孟加拉语文章标题检测的基础研究。研究人员已经发布相关代码和数据集。Abstract
Intentionally luring readers to click on a particular content by exploiting their curiosity defines a title as clickbait. Although several studies focused on detecting clickbait titles in English articles, low resource language like Bangla has not been given adequate attention. To tackle clickbait titles in Bangla, we have constructed the first Bangla clickbait detection dataset containing 15,056 labeled news articles and 65,406 unlabelled news articles extracted from clickbait dense news sites. Each article has been labeled by three expert linguists and includes an article's title, body, and other metadata. By incorporating labeled and unlabelled data, we finetune a pretrained Bangla transformer model in an adversarial fashion using Semi Supervised Generative Adversarial Networks (SS GANs). The proposed model acts as a good baseline for this dataset, outperforming traditional neural network models (LSTM, GRU, CNN) and linguistic feature based models. We expect that this dataset and the detailed analysis and comparison of these clickbait detection models will provide a fundamental basis for future research into detecting clickbait titles in Bengali articles. We have released the corresponding code and dataset.
A Survey of AI Text-to-Image and AI Text-to-Video Generators
for: investigate cutting-edge approaches in Text-to-Image and Text-to-Video AI generations
methods: cover data preprocessing techniques, neural network types, and evaluation metrics used in the field
results: discuss challenges and limitations of Text-to-Image and Text-to-Video AI generations, as well as future research directions. Abstract
Text-to-Image and Text-to-Video AI generation models are revolutionary technologies that use deep learning and natural language processing (NLP) techniques to create images and videos from textual descriptions. This paper investigates cutting-edge approaches in the discipline of Text-to-Image and Text-to-Video AI generations. The survey provides an overview of the existing literature as well as an analysis of the approaches used in various studies. It covers data preprocessing techniques, neural network types, and evaluation metrics used in the field. In addition, the paper discusses the challenges and limitations of Text-to-Image and Text-to-Video AI generations, as well as future research directions. Overall, these models have promising potential for a wide range of applications such as video production, content creation, and digital marketing.
摘要
文本到图像和文本到视频人工智能生成模型是革新技术,使用深度学习和自然语言处理(NLP)技术来生成图像和视频从文本描述。本文对 Text-to-Image 和 Text-to-Video AI 生成领域进行了详细的探讨和分析,包括现有文献的概述以及不同研究中使用的方法。它还讨论了该领域的挑战和限制,以及未来的研究方向。总之,这些模型在视频生产、内容创作和数字市场营销等领域具有广阔的应用前景。
results: 试验结果表明,引入自适应性可以使归因方法更加强大和多功能。Abstract
Deep learning has become the standard approach for most machine learning tasks. While its impact is undeniable, interpreting the predictions of deep learning models from a human perspective remains a challenge. In contrast to model training, model interpretability is harder to quantify and pose as an explicit optimization problem. Inspired by the AUC softmax information curve (AUC SIC) metric for evaluating feature attribution methods, we propose a unified discrete optimization framework for feature attribution and feature selection based on subset selection. This leads to a natural adaptive generalization of the path integrated gradients (PIG) method for feature attribution, which we call Greedy PIG. We demonstrate the success of Greedy PIG on a wide variety of tasks, including image feature attribution, graph compression/explanation, and post-hoc feature selection on tabular data. Our results show that introducing adaptivity is a powerful and versatile method for making attribution methods more powerful.
摘要
深度学习已成为大多数机器学习任务的标准方法。虽然其影响无疑,但从人类视角来解释深度学习模型的预测结果仍然是一个挑战。与模型训练相比,模型解释更难以量化并表述为显式优化问题。受评估特征归因方法的 AUC softmax 信息曲线(AUC SIC)指标的启发,我们提出了一个基于子集选择的统一离散优化框架,用于特征归因与特征选择。由此得到路径积分梯度(PIG)方法的一种自然的自适应推广,我们称之为 Greedy PIG。我们在图像特征归因、图压缩/解释以及表格数据的事后特征选择等多种任务上验证了 Greedy PIG 的有效性。结果表明,引入自适应性是使归因方法更加强大且通用的有效途径。
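Greedy PIG builds on the standard path-integrated-gradients primitive; the sketch below implements only that primitive (a Riemann approximation of the path integral), with the greedy, adaptive subset-selection loop that gives the method its name omitted. The toy model and step count are illustrative.

```python
# Hedged sketch: plain path integrated gradients for a differentiable model.
import torch

def integrated_gradients(model, x, baseline=None, target=0, steps=64):
    if baseline is None:
        baseline = torch.zeros_like(x)
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, *([1] * x.dim()))
    path = baseline + alphas * (x - baseline)        # points along the straight path
    path.requires_grad_(True)
    out = model(path)[:, target].sum()
    grads = torch.autograd.grad(out, path)[0]
    return (x - baseline) * grads.mean(dim=0)        # Riemann approximation

model = torch.nn.Sequential(torch.nn.Linear(10, 16), torch.nn.ReLU(),
                            torch.nn.Linear(16, 3))
attributions = integrated_gradients(model, torch.randn(10), target=1)
```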
FourierGNN: Rethinking Multivariate Time Series Forecasting from a Pure Graph Perspective
methods: 这篇论文使用了一种新的数据结构 called hypervariate graph,它将每个时间序列的值看作一个图节点,并将每个滑动窗口转换为一个完全连接的空间时间图。然后,它提出了一种新的架构 called Fourier Graph Neural Network (FourierGNN),它可以在快 Fourier 空间中进行矩阵乘法,并且可以有效地预测未来时间序列的值。
results: 在七个 dataset 上进行了广泛的实验,结果显示,FourierGNN 可以在预测时间序列值方面具有更高的效果,同时具有更低的复杂性和更少的参数。Abstract
Multivariate time series (MTS) forecasting has shown great importance in numerous industries. Current state-of-the-art graph neural network (GNN)-based forecasting methods usually require both graph networks (e.g., GCN) and temporal networks (e.g., LSTM) to capture inter-series (spatial) dynamics and intra-series (temporal) dependencies, respectively. However, the uncertain compatibility of the two networks puts an extra burden on handcrafted model designs. Moreover, the separate spatial and temporal modeling naturally violates the unified spatiotemporal inter-dependencies in real world, which largely hinders the forecasting performance. To overcome these problems, we explore an interesting direction of directly applying graph networks and rethink MTS forecasting from a pure graph perspective. We first define a novel data structure, hypervariate graph, which regards each series value (regardless of variates or timestamps) as a graph node, and represents sliding windows as space-time fully-connected graphs. This perspective considers spatiotemporal dynamics unitedly and reformulates classic MTS forecasting into the predictions on hypervariate graphs. Then, we propose a novel architecture Fourier Graph Neural Network (FourierGNN) by stacking our proposed Fourier Graph Operator (FGO) to perform matrix multiplications in Fourier space. FourierGNN accommodates adequate expressiveness and achieves much lower complexity, which can effectively and efficiently accomplish the forecasting. Besides, our theoretical analysis reveals FGO's equivalence to graph convolutions in the time domain, which further verifies the validity of FourierGNN. Extensive experiments on seven datasets have demonstrated our superior performance with higher efficiency and fewer parameters compared with state-of-the-art methods.
摘要
多变量时间序列(MTS)预测已经在多个行业得到了重要的应用。当前的状态艺术Graph Neural Network(GNN)基本预测方法通常需要图网络(例如GCN)和时间网络(例如LSTM)来捕捉 между序列(空间)动力和内部序列(时间)依赖项,分别。然而,这两种网络的不确定兼容性会增加手动设计模型的困难度。另外,分离的空间和时间模型自然地违反了实际世界中的一体化空时间依赖关系,这大大降低了预测性能。为了解决这些问题,我们开explored an interesting direction of directly applying graph networks and rethinking MTS forecasting from a pure graph perspective.我们首先定义了一种新的数据结构,卷积graph,其中每个时间序列值(无论是变量或时间戳)都被视为图节点,并将滑动窗口转化为空间时间完全连接图。这种视角同时考虑了空间时间动力的统一,并将经典MTS预测转化为对卷积图的预测。然后,我们提出了一种新的架构Fourier Graph Neural Network(FourierGNN),其基于我们提出的快捷Graph Operator(FGO)来执行矩阵乘法操作。FourierGNN具有充分的表达能力,同时可以有效地和高效地完成预测。此外,我们的理论分析表明FGO的等价性于图 convolutions在时间频谱中,这进一步证明了FourierGNN的有效性。我们在七个数据集上进行了广泛的实验,结果显示我们的性能高于当前状态艺术方法,同时具有更低的复杂性和更少的参数。
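A simplified sketch of the hypervariate construction and one Fourier Graph Operator layer: every (variate, timestamp) value in a sliding window becomes a node, the node vector is moved to the Fourier domain with an FFT, multiplied by learnable complex weights, and transformed back. Using a single complex weight per frequency bin is an assumed simplification of the paper's operator.

```python
# Hedged sketch: one FGO layer on a "hypervariate" graph whose nodes are all
# (variate, timestamp) values of a window. The per-bin complex weighting is a
# simplification of the paper's stacked Fourier Graph Operator.
import torch
import torch.nn as nn

class FourierGraphOperator(nn.Module):
    def __init__(self, num_nodes: int):
        super().__init__()
        bins = num_nodes // 2 + 1
        self.w_real = nn.Parameter(torch.randn(bins) * 0.02)
        self.w_imag = nn.Parameter(torch.randn(bins) * 0.02)

    def forward(self, nodes):                        # nodes: (batch, num_nodes)
        spec = torch.fft.rfft(nodes, dim=-1)         # to Fourier space
        spec = spec * torch.complex(self.w_real, self.w_imag)
        return torch.fft.irfft(spec, n=nodes.shape[-1], dim=-1)

batch, n_variates, window = 4, 7, 12
x = torch.randn(batch, n_variates, window)
nodes = x.reshape(batch, n_variates * window)        # hypervariate graph: N*T nodes
out = FourierGraphOperator(n_variates * window)(nodes)
```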
Frequency-domain MLPs are More Effective Learners in Time Series Forecasting
results: 在13个实验室中,与状态艺术方法进行比较,FreTS方法具有更高的预测精度和稳定性。Abstract
Time series forecasting has played the key role in different industrial, including finance, traffic, energy, and healthcare domains. While existing literatures have designed many sophisticated architectures based on RNNs, GNNs, or Transformers, another kind of approaches based on multi-layer perceptrons (MLPs) are proposed with simple structure, low complexity, and {superior performance}. However, most MLP-based forecasting methods suffer from the point-wise mappings and information bottleneck, which largely hinders the forecasting performance. To overcome this problem, we explore a novel direction of applying MLPs in the frequency domain for time series forecasting. We investigate the learned patterns of frequency-domain MLPs and discover their two inherent characteristic benefiting forecasting, (i) global view: frequency spectrum makes MLPs own a complete view for signals and learn global dependencies more easily, and (ii) energy compaction: frequency-domain MLPs concentrate on smaller key part of frequency components with compact signal energy. Then, we propose FreTS, a simple yet effective architecture built upon Frequency-domain MLPs for Time Series forecasting. FreTS mainly involves two stages, (i) Domain Conversion, that transforms time-domain signals into complex numbers of frequency domain; (ii) Frequency Learning, that performs our redesigned MLPs for the learning of real and imaginary part of frequency components. The above stages operated on both inter-series and intra-series scales further contribute to channel-wise and time-wise dependency learning. Extensive experiments on 13 real-world benchmarks (including 7 benchmarks for short-term forecasting and 6 benchmarks for long-term forecasting) demonstrate our consistent superiority over state-of-the-art methods.
摘要
时间序列预测在不同的行业中扮演着关键角色,包括金融、交通、能源和医疗领域。而现有的文献中设计了许多复杂的架构,如RNNs、GNNs或Transformers,另一种基于多层感知器(MLPs)的方法具有简单的结构、低复杂度和超越性。然而,大多数MLP基于预测方法受到点约映射和信息瓶颈的限制,这大大降低预测性能。为了解决这个问题,我们开探了在频率域应用MLP的新方向,并 investigate了频率域MLP学习的特征。我们发现频率域MLP拥有两种内在特征,即全球视图和能量压缩,这两种特征使得频率域MLP在预测时Series中表现出优异。然后,我们提出了FreTS,一种简单 yet有效的架构,基于频率域MLP进行时Series预测。FreTS主要包括两个阶段,即频率域转换和频率学习。频率域转换将时间域信号转换为复数频率域,而频率学习则使用我们重新设计的MLP进行频率组成部分的学习。这两个阶段在时间和通道级别进行了规模进行了时间和通道级别的依赖学习。我们对13个实际benchmark进行了广泛的实验,结果表明我们在state-of-the-art方法之上保持了稳定的优势。
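A minimal FreTS-style module for a single series dimension: the lookback window is converted to the frequency domain, a small MLP transforms the real and imaginary parts, and the result is mapped back to the time domain before a forecasting head. The paper's separate inter-series and intra-series stages and its exact architecture are not reproduced.

```python
# Hedged sketch: frequency-domain MLP for forecasting one series.
import torch
import torch.nn as nn

class FrequencyMLP(nn.Module):
    def __init__(self, lookback: int, horizon: int, hidden: int = 64):
        super().__init__()
        bins = lookback // 2 + 1
        self.mlp = nn.Sequential(nn.Linear(2 * bins, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * bins))
        self.head = nn.Linear(lookback, horizon)      # back in the time domain

    def forward(self, x):                             # x: (batch, lookback)
        spec = torch.fft.rfft(x, dim=-1)              # domain conversion
        feats = self.mlp(torch.cat([spec.real, spec.imag], dim=-1))
        real, imag = feats.chunk(2, dim=-1)
        rec = torch.fft.irfft(torch.complex(real, imag), n=x.shape[-1], dim=-1)
        return self.head(rec)

model = FrequencyMLP(lookback=96, horizon=24)
forecast = model(torch.randn(32, 96))                 # -> (32, 24)
```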
results: 本文的分析发现,目前关于公平测试的研究做到了一定的进步,但还有一些需要改进的方面。未来的研究应该更加强调利用现有的搜索测试方法来进行公平测试,以确保人工智能系统中的偏见问题得到解决。Abstract
Artificial Intelligence (AI) has demonstrated remarkable capabilities in domains such as recruitment, finance, healthcare, and the judiciary. However, biases in AI systems raise ethical and societal concerns, emphasizing the need for effective fairness testing methods. This paper reviews current research on fairness testing, particularly its application through search-based testing. Our analysis highlights progress and identifies areas of improvement in addressing AI systems biases. Future research should focus on leveraging established search-based testing methodologies for fairness testing.
摘要
人工智能(AI)在招聘、金融、医疗和司法等领域表现出了惊人的能力,但AI系统中的偏见引起了道德和社会问题的关注,高调出了有效的公平测试方法的需求。本文综述当前关于公平测试的研究,特别是通过搜索基于测试方法的应用。我们的分析显示了进步和改进的方向,未来的研究应该集中于利用已有的搜索基于测试方法来进行公平测试。
results: 实验结果表明,LoGiPT在两个公共的逻辑推理数据集上表现出色,超越了现有的解题器辅助语言模型和少量提示方法,并且在竞争的LLM如ChatGPT或GPT-4上表现了竞争力。Abstract
Logical reasoning is a fundamental aspect of human intelligence and a key component of tasks like problem-solving and decision-making. Recent advancements have enabled Large Language Models (LLMs) to potentially exhibit reasoning capabilities, but complex logical reasoning remains a challenge. The state-of-the-art, solver-augmented language models, use LLMs to parse natural language logical questions into symbolic representations first and then adopt external logical solvers to take in the symbolic representations and output the answers. Despite their impressive performance, any parsing errors will inevitably result in the failure of the execution of the external logical solver and no answer to the logical questions. In this paper, we introduce LoGiPT, a novel language model that directly emulates the reasoning processes of logical solvers and bypasses the parsing errors by learning to strict adherence to solver syntax and grammar. LoGiPT is fine-tuned on a newly constructed instruction-tuning dataset derived from revealing and refining the invisible reasoning process of deductive solvers. Experimental results on two public deductive reasoning datasets demonstrate that LoGiPT outperforms state-of-the-art solver-augmented LMs and few-shot prompting methods on competitive LLMs like ChatGPT or GPT-4.
摘要
理智推理是人类智能的基本方面,对于问题解决和决策都是重要组成部分。 current advancements have enabled Large Language Models (LLMs) to potentially exhibit reasoning capabilities, but complex logical reasoning remains a challenge. State-of-the-art solver-augmented language models use LLMs to parse natural language logical questions into symbolic representations first and then adopt external logical solvers to take in the symbolic representations and output the answers. Despite their impressive performance, any parsing errors will inevitably result in the failure of the execution of the external logical solver and no answer to the logical questions.在这篇论文中,我们介绍了LoGiPT,一种新的语言模型,它直接模拟逻辑解决器的思维过程,并通过学习逻辑解决器的语法和语言规则,减少或消除解析错误。 LoGiPT 在两个公共的逻辑推理数据集上进行了实验,并证明了它在与state-of-the-art solver-augmented LMs和 few-shot prompting methods进行比较中表现出色。
Going beyond persistent homology using persistent homology
results: 论文提出了一种新的颜色分离集来解决图像模型中的表达限制问题,并实现了一种基于颜色级别的PH的学习方法,从而提高了图像模型的表达能力。Abstract
Representational limits of message-passing graph neural networks (MP-GNNs), e.g., in terms of the Weisfeiler-Leman (WL) test for isomorphism, are well understood. Augmenting these graph models with topological features via persistent homology (PH) has gained prominence, but identifying the class of attributed graphs that PH can recognize remains open. We introduce a novel concept of color-separating sets to provide a complete resolution to this important problem. Specifically, we establish the necessary and sufficient conditions for distinguishing graphs based on the persistence of their connected components, obtained from filter functions on vertex and edge colors. Our constructions expose the limits of vertex- and edge-level PH, proving that neither category subsumes the other. Leveraging these theoretical insights, we propose RePHINE for learning topological features on graphs. RePHINE efficiently combines vertex- and edge-level PH, achieving a scheme that is provably more powerful than both. Integrating RePHINE into MP-GNNs boosts their expressive power, resulting in gains over standard PH on several benchmarks for graph classification.
摘要
Message-passing graph neural networks (MP-GNNs) 的表示限制已经很好地了解,例如通过weisfeiler-leman (WL) 测试来判断图是否同构。通过添加图的拓扑特征via persistent homology (PH) 得到了广泛应用,但是确定 attributed graphs 中 PH 能认可的类型仍然是一个重要的开放问题。我们提出了一种新的色分集来解决这个重要问题。我们证明了基于图连接组件的persistence得到了必要和 suficient 条件,并且证明 neither vertex-level PH nor edge-level PH 可以包含另一个类型。我们建议RePHINE,一种可以有效地结合 vertex-level PH 和 edge-level PH 的学习方法。RePHINE 可以提高 MP-GNNs 的表达能力,在多个图分类 benchmark 上实现了比标准 PH 更高的性能。
results: 基于基因算法开发了一种新的视觉速度计算方法,并通过比较与基能量方法和另一种metaheuristic方法进行比较,证明了我们的创新算法的效率。Abstract
Our work aims to estimate the camera motion mounted on the head of a mobile robot or a moving object from RGB-D images in a static scene. The problem of motion estimation is transformed into a nonlinear least squares function. Methods for solving such problems are iterative. Various classic methods gave an iterative solution by linearizing this function. We can also use the metaheuristic optimization method to solve this problem and improve results. In this paper, a new algorithm is developed for visual odometry using a sequence of RGB-D images. This algorithm is based on a genetic algorithm. The proposed iterative genetic algorithm searches using particles to estimate the optimal motion and then compares it to the traditional methods. To evaluate our method, we use the root mean square error to compare it with the based energy method and another metaheuristic method. We prove the efficiency of our innovative algorithm on a large set of images.
摘要
我们的工作旨在从静态场景的 RGB-D 图像中估计安装在移动机器人或运动物体上的相机运动。该运动估计问题被转化为一个非线性最小二乘函数,求解此类问题的方法通常是迭代式的:经典方法通过对该函数线性化得到迭代解,也可以借助元启发式优化方法求解并改进结果。本文基于遗传算法提出了一种利用 RGB-D 图像序列的视觉里程计新算法,所提的迭代遗传算法利用粒子进行搜索以估计最优运动,并与传统方法进行比较。我们以均方根误差为指标,将其与基于能量的方法及另一种元启发式方法进行对比,并在大规模图像集上验证了该创新算法的效率。
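A toy version of the genetic search described above, assuming known 3-D point correspondences rather than raw RGB-D frames: candidate 6-DoF poses (Euler angles plus translation) evolve by selection and Gaussian mutation, with alignment RMSE as the fitness. Population size, mutation scale, and the synthetic data are illustrative.

```python
# Hedged sketch: genetic search over a 6-DoF motion minimizing alignment RMSE.
import numpy as np

def euler_to_rot(a, b, c):
    Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    Rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def fitness(pose, src, dst):                          # alignment RMSE of a candidate
    R, t = euler_to_rot(*pose[:3]), pose[3:]
    return np.sqrt(np.mean(np.sum((src @ R.T + t - dst) ** 2, axis=1)))

def ga_motion(src, dst, pop=80, gens=150, sigma=0.05, seed=0):
    rng = np.random.default_rng(seed)
    population = rng.normal(scale=0.3, size=(pop, 6))          # candidate 6-DoF poses
    for _ in range(gens):
        scores = np.array([fitness(p, src, dst) for p in population])
        parents = population[np.argsort(scores)[: pop // 2]]   # selection
        children = parents[rng.integers(0, len(parents), pop - len(parents))]
        population = np.vstack([parents,
                                children + rng.normal(scale=sigma, size=children.shape)])
    return min(population, key=lambda p: fitness(p, src, dst))

rng = np.random.default_rng(1)
src = rng.normal(size=(200, 3))
true_pose = np.array([0.1, -0.05, 0.2, 0.3, -0.1, 0.05])
dst = src @ euler_to_rot(*true_pose[:3]).T + true_pose[3:]
print(ga_motion(src, dst))        # best pose found; compare against true_pose
```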
Incorporating sufficient physical information into artificial neural networks: a guaranteed improvement via physics-based Rao-Blackwellization
results: 应用于材料模型化、塑性钢 simulate、质量违 brittle 损伤和塑性实验,可以提高预测的精度,减少噪音、过拟合和数据要求。Abstract
The concept of Rao-Blackwellization is employed to improve predictions of artificial neural networks by physical information. The error norm and the proof of improvement are transferred from the original statistical concept to a deterministic one, using sufficient information on physics-based conditions. The proposed strategy is applied to material modeling and illustrated by examples of the identification of a yield function, elasto-plastic steel simulations, the identification of driving forces for quasi-brittle damage and rubber experiments. Sufficient physical information is employed, e.g., in the form of invariants, parameters of a minimization problem, dimensional analysis, isotropy and differentiability. It is proven how intuitive accretion of information can yield improvement if it is physically sufficient, but also how insufficient or superfluous information can cause impairment. Opportunities for the improvement of artificial neural networks are explored in terms of the training data set, the networks' structure and output filters. Even crude initial predictions are remarkably improved by reducing noise, overfitting and data requirements.
摘要
“Rao-Blackwellization”技术可以通过充分利用物理信息来提高人工神经网络预测的准确性。误差范数与改进性的证明由原始的统计概念转移到确定性概念上,其依据是基于物理条件的充分信息。所提策略被应用于材料建模,并通过屈服函数辨识、弹塑性钢材模拟、准脆性损伤驱动力辨识和橡胶实验等例子加以说明。所用的充分物理信息包括不变量、最小化问题的参数、量纲分析、各向同性和可微性等。文中证明:若加入的信息在物理上是充分的,直观地增添信息即可带来改进;而不充分或多余的信息则会造成性能下降。文中还从训练数据集、网络结构和输出滤波器等方面探讨了改进人工神经网络的机会。即使是粗糙的初始预测,也能通过降低噪声、过拟合和数据需求而得到显著改进。
High-dimensional mixed-categorical Gaussian processes with application to multidisciplinary design optimization for a green aircraft
paper_authors: Paul Saves, Youssef Diouane, Nathalie Bartoli, Thierry Lefebvre, Joseph Morlier
for: 这 paper 的目的是提出一种基于 Gaussian Process(GP)的混合 categorical 优化方法,以解决多学科设计优化中混合 categorical 变量的问题。
methods: 这 paper 使用 Partial Least Squares(PLS)回归来构建混合 categorical GP,并通过 Kriging with PLS 来扩展 GP 的应用范围。
results: 该方法在实际应用中得到了成功,包括对一架悬臂 beam 的结构行为的研究以及一架绿色飞机的多学科设计优化。 results 表明,该方法可以减少飞机在一次任务中消耗的燃料量为 439 公斤。Abstract
Multidisciplinary design optimization (MDO) methods aim at adapting numerical optimization techniques to the design of engineering systems involving multiple disciplines. In this context, a large number of mixed continuous, integer, and categorical variables might arise during the optimization process, and practical applications involve a significant number of design variables. Recently, there has been a growing interest in mixed-categorical metamodels based on Gaussian Process (GP) for Bayesian optimization. In particular, to handle mixed-categorical variables, several existing approaches employ different strategies to build the GP. These strategies either use continuous kernels, such as the continuous relaxation or the Gower distance-based kernels, or direct estimation of the correlation matrix, such as the exponential homoscedastic hypersphere (EHH) or the Homoscedastic Hypersphere (HH) kernel. Although the EHH and HH kernels are shown to be very efficient and lead to accurate GPs, they are based on a large number of hyperparameters. In this paper, we address this issue by constructing mixed-categorical GPs with fewer hyperparameters using Partial Least Squares (PLS) regression. Our goal is to generalize Kriging with PLS, commonly used for continuous inputs, to handle mixed-categorical inputs. The proposed method is implemented in the open-source software SMT and has been efficiently applied to structural and multidisciplinary applications. Our method is used to effectively demonstrate the structural behavior of a cantilever beam and facilitates MDO of a green aircraft, resulting in a 439-kilogram reduction in the amount of fuel consumed during a single aircraft mission.
摘要
多学科设计优化(MDO)方法旨在将数值优化技术应用于涉及多个学科的工程系统设计。在这一背景下,优化过程中可能出现大量混合的连续、整数和分类变量,而实际应用中的设计变量数量往往很大。近年来,基于高斯过程(GP)的混合分类代理模型在贝叶斯优化中受到越来越多的关注。针对混合分类变量,现有方法采用不同的策略来构建高斯过程:或使用连续核(如连续松弛或基于 Gower 距离的核),或直接估计相关矩阵(如指数同方差超球面(EHH)核和同方差超球面(HH)核)。尽管 EHH 和 HH 核被证明非常高效并能得到精确的高斯过程,但它们依赖大量超参数。本文通过偏最小二乘(PLS)回归构建超参数更少的混合分类高斯过程来解决这一问题,目标是将常用于连续输入的 Kriging with PLS 推广到混合分类输入。所提方法已在开源软件 SMT 中实现,并被高效地应用于结构与多学科问题:我们用它刻画了悬臂梁的结构行为,并支撑了一架绿色飞机的多学科设计优化,使单次飞行任务的燃油消耗减少了 439 公斤。
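A generic stand-in for a mixed continuous/categorical surrogate, assuming scikit-learn: the categorical variable is one-hot encoded and a standard Gaussian process with a Matérn kernel is fitted. This does not reproduce the paper's PLS-reduced mixed-categorical kernels or its SMT implementation; it only illustrates the modeling setup.

```python
# Hedged sketch: mixed continuous/categorical surrogate via one-hot encoding
# plus a standard GP, as a generic stand-in for the paper's kernels.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
x_cont = rng.uniform(-1, 1, size=(80, 2))             # continuous design variables
x_cat = rng.integers(0, 3, size=80)                   # a 3-level categorical variable
y = np.sin(3 * x_cont[:, 0]) + 0.5 * x_cat + 0.05 * rng.normal(size=80)

X = np.hstack([x_cont, np.eye(3)[x_cat]])             # one-hot encode the categorical
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)

x_new = np.hstack([[0.2, -0.4], np.eye(3)[1]])        # query: continuous part + level 1
mean, std = gp.predict(x_new.reshape(1, -1), return_std=True)
```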
Making LLMs Worth Every Penny: Resource-Limited Text Classification in Banking
results: 研究发现,使用RAG可以大幅降低操作成本,并且GPT-4的数据增强技术可以提高表现在限制的数据情况下。Abstract
Standard Full-Data classifiers in NLP demand thousands of labeled examples, which is impractical in data-limited domains. Few-shot methods offer an alternative, utilizing contrastive learning techniques that can be effective with as little as 20 examples per class. Similarly, Large Language Models (LLMs) like GPT-4 can perform effectively with just 1-5 examples per class. However, the performance-cost trade-offs of these methods remain underexplored, a critical concern for budget-limited organizations. Our work addresses this gap by studying the aforementioned approaches over the Banking77 financial intent detection dataset, including the evaluation of cutting-edge LLMs by OpenAI, Cohere, and Anthropic in a comprehensive set of few-shot scenarios. We complete the picture with two additional methods: first, a cost-effective querying method for LLMs based on retrieval-augmented generation (RAG), able to reduce operational costs multiple times compared to classic few-shot approaches, and second, a data augmentation method using GPT-4, able to improve performance in data-limited scenarios. Finally, to inspire future research, we provide a human expert's curated subset of Banking77, along with extensive error analysis.
摘要
自然语言处理中的标准全量数据分类器需要数千个标注示例,这在数据有限的领域中并不现实。小样本方法提供了一种替代方案:借助对比学习技术,每个类别只需约 20 个示例即可取得良好效果;而 GPT-4 等大语言模型(LLM)甚至只需每类 1-5 个示例。然而,这些方法在性能与成本之间的权衡仍缺乏研究,这对预算有限的机构至关重要。我们的工作在 Banking77 金融意图检测数据集上研究了上述方法,并在一系列小样本场景中评估了 OpenAI、Cohere 和 Anthropic 的前沿 LLM。我们还补充了两种方法:其一,基于检索增强生成(RAG)的低成本 LLM 查询方法,相比经典小样本方法可将运营成本降低数倍;其二,基于 GPT-4 的数据增强方法,可在数据有限场景中提升性能。最后,我们提供了由人类专家整理的 Banking77 子集以及详尽的错误分析,以启发后续研究。
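A small sketch of the retrieval-augmented querying idea: only the k labeled examples most similar to the incoming query are packed into the few-shot prompt, keeping per-request token cost low. TF-IDF retrieval, the example intents, and the `call_llm` placeholder are assumptions standing in for a dense retriever and a real LLM API.

```python
# Hedged sketch: retrieval-augmented few-shot intent classification prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

labeled = [
    ("I lost my card, please block it", "lost_or_stolen_card"),
    ("What is the exchange rate applied to transfers?", "exchange_rate"),
    ("My top-up has not arrived yet", "top_up_failed"),
    ("How do I activate my new card?", "activate_my_card"),
]

def build_prompt(query: str, k: int = 2) -> str:
    texts = [t for t, _ in labeled]
    vec = TfidfVectorizer().fit(texts + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(texts))[0]
    top = sims.argsort()[::-1][:k]                 # retrieve the k closest examples
    demos = "\n".join(f"Query: {labeled[i][0]}\nIntent: {labeled[i][1]}" for i in top)
    return f"{demos}\nQuery: {query}\nIntent:"

prompt = build_prompt("my card was stolen yesterday")
# answer = call_llm(prompt)   # placeholder for an actual LLM completion call
print(prompt)
```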
In-Context Learning for MIMO Equalization Using Transformer-Based Sequence Models
results: 研究表明,通过预训练,可以使transformer-based ICL在MIMO均衡问题中达到阈值行为,即,随着预训练任务数量的增加,性能从预先确定的 minimum mean squared error(MMSE)均衡器转变为真实数据生成的prior。Abstract
Large pre-trained sequence models, such as transformer-based architectures, have been recently shown to have the capacity to carry out in-context learning (ICL). In ICL, a decision on a new input is made via a direct mapping of the input and of a few examples from the given task, serving as the task's context, to the output variable. No explicit updates of model parameters are needed to tailor the decision to a new task. Pre-training, which amounts to a form of meta-learning, is based on the observation of examples from several related tasks. Prior work has shown ICL capabilities for linear regression. In this study, we leverage ICL to address the inverse problem of multiple-input and multiple-output (MIMO) equalization based on a context given by pilot symbols. A task is defined by the unknown fading channel and by the signal-to-noise ratio (SNR) level, which may be known. To highlight the practical potential of the approach, we allow for the presence of quantization of the received signals. We demonstrate via numerical results that transformer-based ICL has a threshold behavior, whereby, as the number of pre-training tasks grows, the performance switches from that of a minimum mean squared error (MMSE) equalizer with a prior determined by the pre-trained tasks to that of an MMSE equalizer with the true data-generating prior.
摘要
大型预训练序列模型(如基于 Transformer 的架构)近来被证明具有上下文学习(ICL)的能力。在 ICL 中,模型将新输入与作为任务上下文的少量示例直接映射到输出变量,无需显式更新模型参数即可适应新任务。预训练相当于一种元学习,其基础是对多个相关任务示例的观察。已有工作证明了 ICL 在线性回归上的能力。本研究利用 ICL,基于由导频符号给出的上下文,求解多输入多输出(MIMO)均衡这一逆问题。一个任务由未知的衰落信道和(可能已知的)信噪比(SNR)水平定义。为突出方法的实用潜力,我们允许接收信号存在量化。数值结果表明,基于 Transformer 的 ICL 存在阈值行为:随着预训练任务数量的增加,其性能从先验由预训练任务决定的最小均方误差(MMSE)均衡器,转变为采用真实数据生成先验的 MMSE 均衡器。
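For reference, the classical pilot-based baseline the in-context learner is compared against can be written in a few lines: the channel is estimated from pilot symbols by least squares and data symbols are equalized with the MMSE filter. Dimensions, SNR, and the QPSK constellation are illustrative; the transformer/ICL part itself is not shown.

```python
# Hedged sketch: pilot-based least-squares channel estimate + MMSE equalizer.
import numpy as np

rng = np.random.default_rng(0)
n_tx, n_rx, n_pilots, snr_db = 2, 4, 16, 10
noise_var = 10 ** (-snr_db / 10)
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

H = (rng.normal(size=(n_rx, n_tx)) + 1j * rng.normal(size=(n_rx, n_tx))) / np.sqrt(2)
pilots = rng.choice(qpsk, size=(n_tx, n_pilots))
noise = np.sqrt(noise_var / 2) * (rng.normal(size=(n_rx, n_pilots))
                                  + 1j * rng.normal(size=(n_rx, n_pilots)))
Y_pilot = H @ pilots + noise

# least-squares channel estimate from the pilot "context"
H_hat = Y_pilot @ pilots.conj().T @ np.linalg.inv(pilots @ pilots.conj().T)

# MMSE equalizer built from the estimate
W = np.linalg.inv(H_hat.conj().T @ H_hat + noise_var * np.eye(n_tx)) @ H_hat.conj().T

x = rng.choice(qpsk, size=(n_tx, 1))
y = H @ x + np.sqrt(noise_var / 2) * (rng.normal(size=(n_rx, 1))
                                      + 1j * rng.normal(size=(n_rx, 1)))
x_hat = W @ y            # soft estimate of the transmitted symbols
```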
RIGA: A Regret-Based Interactive Genetic Algorithm
For: 解决多目标 combinatorial 优化问题中的偏好不确定性问题( preference imprecision problem)。* Methods: 使用互动遗传算法(Interactive Genetic Algorithm,IGA),其包括: + 使用 regret-based elicitation 技术缩小参数空间。 + 在参数实例上应用 génétiques 运算(genetic operators)来更好地探索参数空间。 + 使用现有的解决方案(solving methods)来生成有前景的解(promising solutions)。* Results: 对多目标包袋和旅行团队问题进行了测试,并证明了 RIGA 可以在有界时间内运行,并且不超过一定的数量的查询。同时,对多个表现指标(computation times, gap to optimality, number of queries),RIGAs 的表现比现有的算法更好。Abstract
In this paper, we propose an interactive genetic algorithm for solving multi-objective combinatorial optimization problems under preference imprecision. More precisely, we consider problems where the decision maker's preferences over solutions can be represented by a parameterized aggregation function (e.g., a weighted sum, an OWA operator, a Choquet integral), and we assume that the parameters are initially not known by the recommendation system. In order to quickly make a good recommendation, we combine elicitation and search in the following way: 1) we use regret-based elicitation techniques to reduce the parameter space in a efficient way, 2) genetic operators are applied on parameter instances (instead of solutions) to better explore the parameter space, and 3) we generate promising solutions (population) using existing solving methods designed for the problem with known preferences. Our algorithm, called RIGA, can be applied to any multi-objective combinatorial optimization problem provided that the aggregation function is linear in its parameters and that a (near-)optimal solution can be efficiently determined for the problem with known preferences. We also study its theoretical performances: RIGA can be implemented in such way that it runs in polynomial time while asking no more than a polynomial number of queries. The method is tested on the multi-objective knapsack and traveling salesman problems. For several performance indicators (computation times, gap to optimality and number of queries), RIGA obtains better results than state-of-the-art algorithms.
摘要
在这篇论文中,我们提出了一种交互式遗传算法,用于在偏好不精确的情况下求解多目标组合优化问题。具体而言,我们考虑这样一类问题:决策者对解的偏好可以用一个参数化的聚合函数(例如加权和、OWA 算子、Choquet 积分)表示,而推荐系统在初始时并不知道这些参数。为了尽快给出高质量的推荐,我们将偏好获取与搜索结合起来:1)使用基于后悔值(regret)的偏好获取技术高效地缩小参数空间;2)在参数实例(而非解)上应用遗传算子,以更好地探索参数空间;3)利用针对已知偏好问题的现有求解方法生成有前景的解(种群)。该算法称为 RIGA,只要聚合函数对其参数是线性的,且在偏好已知的情况下能够高效求得(近)最优解,它就适用于任何多目标组合优化问题。我们还研究了其理论性能:RIGA 可以实现为多项式时间运行,且询问次数不超过多项式规模。该方法在多目标背包问题和旅行商问题上进行了测试,在多项性能指标(计算时间、与最优解的差距、询问次数)上优于最先进的算法。
Deep learning for 3D Object Detection and Tracking in Autonomous Driving: A Brief Survey
results: 本文综合比较了不同方法的实验结果,并提出了未来研究的方向,以帮助读者更好地了解3D点云数据的对象检测和跟踪任务。Abstract
Object detection and tracking are vital and fundamental tasks for autonomous driving, aiming at identifying and locating objects from those predefined categories in a scene. 3D point cloud learning has been attracting more and more attention among all other forms of self-driving data. Currently, there are many deep learning methods for 3D object detection. However, the tasks of object detection and tracking for point clouds still need intensive study due to the unique characteristics of point cloud data. To help get a good grasp of the present situation of this research, this paper shows recent advances in deep learning methods for 3D object detection and tracking.
摘要
对象探测和跟踪是自动驾驶中非常重要和基本的任务,目的是在场景中从预定义的类别中标识和定位对象。三维点云学习在所有自动驾驶数据形式中受到越来越多的关注。目前有许多用于 3D 对象探测的深度学习方法。但是由于点云数据的特殊特点,对象探测和跟踪任务仍然需要进一步的研究。为了帮助更好地了解这个研究的现状,本文介绍了用于 3D 对象探测和跟踪的最新深度学习方法。
Reviewing Developments of Graph Convolutional Network Techniques for Recommendation Systems
results: 论文分析了图神经网络在推荐系统中的挑战和开放问题,包括图构建、嵌入传播和聚合以及计算效率等。这些分析帮助我们更好地探索未来的发展方向。Abstract
The Recommender system is a vital information service on today's Internet. Recently, graph neural networks have emerged as the leading approach for recommender systems. We try to review recent literature on graph neural network-based recommender systems, covering the background and development of both recommender systems and graph neural networks. Then categorizing recommender systems by their settings and graph neural networks by spectral and spatial models, we explore the motivation behind incorporating graph neural networks into recommender systems. We also analyze challenges and open problems in graph construction, embedding propagation and aggregation, and computation efficiency. This guides us to better explore the future directions and developments in this domain.
摘要
“推荐系统是今天互联网上重要的资讯服务。最近,图 neural network 已经成为推荐系统的主要方法。我们尝试综述最近的文献,探讨推荐系统和图 neural network 的背景和发展,并将推荐系统按其设定分类、将图 neural network 分为谱模型和空间模型。我们也分析了将图 neural network 应用到推荐系统的动机,以及构建图、传播嵌入和聚合、计算效率等方面的挑战和开放问题。这引导我们更好地探索未来的发展方向。”
Enhancing Actuarial Non-Life Pricing Models via Transformers
results: 比较了多种 referential 模型,包括 generalized linear models、feed-forward neural networks、combined actuarial neural networks、LocalGLMnet 和 pure feature tokenizer transformer,并证明新方法可以在 real-world claim frequency 数据上达到更好的结果,同时保持一定的 generalized linear model 优点Abstract
Currently, there is a lot of research in the field of neural networks for non-life insurance pricing. The usual goal is to improve the predictive power via neural networks while building upon the generalized linear model, which is the current industry standard. Our paper contributes to this current journey via novel methods to enhance actuarial non-life models with transformer models for tabular data. We build here upon the foundation laid out by the combined actuarial neural network as well as the localGLMnet and enhance those models via the feature tokenizer transformer. The manuscript demonstrates the performance of the proposed methods on a real-world claim frequency dataset and compares them with several benchmark models such as generalized linear models, feed-forward neural networks, combined actuarial neural networks, LocalGLMnet, and pure feature tokenizer transformer. The paper shows that the new methods can achieve better results than the benchmark models while preserving certain generalized linear model advantages. The paper also discusses the practical implications and challenges of applying transformer models in actuarial settings.
摘要
当前, neuronal networks 在非生命保险价值评估领域中有很多研究。目标通常是通过 neuronal networks 提高预测力,而基于现有的泛化线性模型(Generalized Linear Model, GLM)。我们的论文在这个领域中做出了贡献,通过 novel methods 增强 actuarial non-life 模型。我们在 combined actuarial neural network 和 localGLMnet 基础上建立了新的模型,并使用 feature tokenizer transformer 进行增强。 manuscript 中对实际的审核频率数据集进行了表现测试,并与多个 Referential models,如 generalized linear models、feed-forward neural networks、combined actuarial neural networks、LocalGLMnet 和 pure feature tokenizer transformer 进行比较。结果显示,新方法可以在 benchmark models 之上 achieve better results,同时保持一定的 Generalized Linear Model 优点。论文还讨论了应用 transformer models 在 actuarial 设置中的实际意义和挑战。
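To make the feature-tokenizer idea concrete, here is a minimal PyTorch sketch on synthetic claim-frequency data: each numeric feature becomes a token via a per-feature affine embedding, a CLS token is prepended, a transformer encoder mixes the tokens, and the CLS output feeds a Poisson rate head with a log-exposure offset. The architecture sizes and data generator are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class FeatureTokenizer(nn.Module):
    """Turns each numeric feature x_j into a d-dimensional token x_j * w_j + b_j."""
    def __init__(self, n_features, d_model):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_features, d_model) * 0.02)
        self.bias = nn.Parameter(torch.zeros(n_features, d_model))
        self.cls = nn.Parameter(torch.randn(1, 1, d_model) * 0.02)

    def forward(self, x):                                   # x: (batch, n_features)
        tokens = x.unsqueeze(-1) * self.weight + self.bias  # (batch, n_features, d_model)
        cls = self.cls.expand(x.size(0), -1, -1)
        return torch.cat([cls, tokens], dim=1)

class FrequencyTransformer(nn.Module):
    def __init__(self, n_features, d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        self.tokenizer = FeatureTokenizer(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=64,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x, exposure):
        h = self.encoder(self.tokenizer(x))[:, 0]           # CLS token output
        return self.head(h).squeeze(-1) + torch.log(exposure)   # log expected claim count

# Synthetic claim-frequency data (illustrative only).
torch.manual_seed(0)
x = torch.randn(512, 6)
exposure = torch.rand(512) + 0.5
y = torch.poisson(torch.exp(0.3 * x[:, 0] - 0.2 * x[:, 1]) * exposure)

model = FrequencyTransformer(n_features=6)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.PoissonNLLLoss(log_input=True)                 # Poisson deviance-style loss
for step in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x, exposure), y)
    loss.backward()
    opt.step()
print("final Poisson NLL:", loss.item())
```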
RSG: Fast Learning Adaptive Skills for Quadruped Robots by Skill Graph
results: 实验结果表明,RSG可以为机器人提供合理的技能推理,并使四肢机器人快速适应新的情况和学习新的技能。Abstract
Developing robotic intelligent systems that can adapt quickly to unseen wild situations is one of the critical challenges in pursuing autonomous robotics. Although some impressive progress has been made in walking stability and skill learning in the field of legged robots, their ability to fast adaptation is still inferior to that of animals in nature. Animals are born with massive skills needed to survive, and can quickly acquire new ones, by composing fundamental skills with limited experience. Inspired by this, we propose a novel framework, named Robot Skill Graph (RSG) for organizing massive fundamental skills of robots and dexterously reusing them for fast adaptation. Bearing a structure similar to the Knowledge Graph (KG), RSG is composed of massive dynamic behavioral skills instead of static knowledge in KG and enables discovering implicit relations that exist between the learning context and acquired skills of robots, serving as a starting point for understanding subtle patterns existing in robots' skill learning. Extensive experimental results demonstrate that RSG can provide rational skill inference upon new tasks and environments and enable quadruped robots to adapt to new scenarios and learn new skills rapidly.
摘要
开发能够快速适应未见过的野外情况的机器人智能系统,是实现自主机器人的核心挑战之一。虽然四足机器人在步态稳定和技能学习方面已取得了一些令人印象深刻的进展,但它们的快速适应能力仍然不如自然界中的动物。动物出生时便拥有大量生存所需的技能,并能通过有限的经验组合基本技能来快速获得新技能。受此启发,我们提出了一个新的框架,即机器人技能图(RSG),用于组织机器人的大量基础技能并灵活地重用它们以实现快速适应。RSG的结构类似知识图(KG),但它由大量动态行为技能而非静态知识组成,能够发现机器人学习上下文与已习得技能之间存在的隐含关系,并作为理解机器人技能学习中细微模式的起点。大量实验结果表明,RSG可以为新任务和新环境提供合理的技能推理,并使四足机器人快速适应新场景、快速学习新技能。
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
results: 在 Minecraft 宇宙测试 benchmark 中,JARVIS-1 展现出了 nearly perfect 的表现,完成了200多个任务,其中包括从入门到中等水平的任务。JARVIS-1 在长期任务中取得了12.5%的完成率,这与之前的记录比起来是5倍的提高。此外,JARVIS-1 还能够自我提升,这是因为它使用了多模态记忆,这种自我提升可以持续进行,从而实现更好的智能和自主性。Abstract
Achieving human-like planning and control with multimodal observations in an open world is a key milestone for more functional generalist agents. Existing approaches can handle certain long-horizon tasks in an open world. However, they still struggle when the number of open-world tasks could potentially be infinite and lack the capability to progressively enhance task completion as game time progresses. We introduce JARVIS-1, an open-world agent that can perceive multimodal input (visual observations and human instructions), generate sophisticated plans, and perform embodied control, all within the popular yet challenging open-world Minecraft universe. Specifically, we develop JARVIS-1 on top of pre-trained multimodal language models, which map visual observations and textual instructions to plans. The plans will be ultimately dispatched to the goal-conditioned controllers. We outfit JARVIS-1 with a multimodal memory, which facilitates planning using both pre-trained knowledge and its actual game survival experiences. In our experiments, JARVIS-1 exhibits nearly perfect performances across over 200 varying tasks from the Minecraft Universe Benchmark, ranging from entry to intermediate levels. JARVIS-1 has achieved a completion rate of 12.5% in the long-horizon diamond pickaxe task. This represents a significant increase up to 5 times compared to previous records. Furthermore, we show that JARVIS-1 is able to $\textit{self-improve}$ following a life-long learning paradigm thanks to multimodal memory, sparking a more general intelligence and improved autonomy. The project page is available at https://craftjarvis-jarvis1.github.io.
摘要
实现类人的规划和控制,并在开放世界中使用多模态观察,是迈向更通用智能代理的关键里程碑。现有方法可以处理开放世界中的某些长期任务,但当任务数量可能无限时它们仍然受到挑战,而且缺乏随游戏时间推进逐渐提高任务完成度的能力。我们介绍JARVIS-1,一个在 Minecraft 宇宙中运行的开放世界代理,可以感知多模态输入(视觉观察和人类指令),生成复杂的计划,并执行具体的控制,全部运行在 Minecraft 游戏中。具体来说,我们基于预训练的多模态语言模型,将视觉观察和文本指令映射到计划。计划最终被转交给目标条件控制器。我们为 JARVIS-1 增加了多模态记忆,以便通过预训练知识和实际游戏生存经验来辅助规划。在我们的实验中,JARVIS-1 在 Minecraft Universe Benchmark 上表现出近乎完美的性能,涵盖多达 200 个从入门到中级水平的任务。JARVIS-1 在长期钻石镐任务中达到了12.5%的完成率,这比之前的记录提高了5倍。此外,我们表明 JARVIS-1 能够 $\textit{自我改进}$ ,采用终身学习模式,增强智能和自主性。项目页面可以在 https://craftjarvis-jarvis1.github.io 找到。
Robust Adversarial Attacks Detection for Deep Learning based Relative Pose Estimation for Space Rendezvous
results: 实验结果显示,提出的异常检测方法可以准确地检测异常攻击,其检测精度为99.21%。此外,在实验室设置中使用了真实数据进行测试,实验结果表明,提出的异常检测方法在实际应用中可以达到96.29%的检测精度。Abstract
Research on developing deep learning techniques for autonomous spacecraft relative navigation challenges is continuously growing in recent years. Adopting those techniques offers enhanced performance. However, such approaches also introduce heightened apprehensions regarding the trustability and security of such deep learning methods through their susceptibility to adversarial attacks. In this work, we propose a novel approach for adversarial attack detection for deep neural network-based relative pose estimation schemes based on the explainability concept. We develop for an orbital rendezvous scenario an innovative relative pose estimation technique adopting our proposed Convolutional Neural Network (CNN), which takes an image from the chaser's onboard camera and outputs accurately the target's relative position and rotation. We perturb seamlessly the input images using adversarial attacks that are generated by the Fast Gradient Sign Method (FGSM). The adversarial attack detector is then built based on a Long Short Term Memory (LSTM) network which takes the explainability measure namely SHapley Value from the CNN-based pose estimator and flags the detection of adversarial attacks when acting. Simulation results show that the proposed adversarial attack detector achieves a detection accuracy of 99.21%. Both the deep relative pose estimator and adversarial attack detector are then tested on real data captured from our laboratory-designed setup. The experimental results from our laboratory-designed setup demonstrate that the proposed adversarial attack detector achieves an average detection accuracy of 96.29%.
摘要
研究在开发深度学习技术以提高自主空间飞行器相对导航的挑战在最近几年内不断增长。采用这些技术可以提高性能,但这些方法也增加了对深度学习方法的信任和安全性的担忧,尤其是它们对抗性攻击的敏感性。在这项工作中,我们提出了一种基于 explainability 概念的对深度神经网络 pose 估计方法的 adversarial 攻击检测方法。我们在推送器上的摄像头拍摄的图像上采用我们提出的卷积神经网络(CNN),输出target的相对位置和旋转精度。我们使用 Fast Gradient Sign Method(FGSM)生成的抗击性攻击来略微地扰乱输入图像。然后,我们根据 Long Short Term Memory(LSTM)网络来建立一个基于 explainability 度的 adversarial 攻击检测器,这里的 explainability 度是 CNN 基于 pose 估计器输出的 SHapley Value。实验结果显示,我们的 adversarial 攻击检测器在 simulated 数据上达到了 99.21% 的检测精度。在实验室设置中测试的实际数据上,我们的 adversarial 攻击检测器的平均检测精度为 96.29%。
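The FGSM perturbation step named in the abstract can be sketched in a few lines of PyTorch; the tiny CNN below is only a placeholder for the pose-estimation network (outputting 3 position values plus a 4-element quaternion), and the SHAP-plus-LSTM detector stage is not reproduced here.

```python
import torch
import torch.nn as nn

def fgsm_attack(model, image, target, epsilon=0.01, loss_fn=nn.MSELoss()):
    """Fast Gradient Sign Method: perturb the input in the direction that
    increases the regression loss of the pose estimator."""
    image = image.clone().detach().requires_grad_(True)
    loss = loss_fn(model(image), target)
    loss.backward()
    adv = image + epsilon * image.grad.sign()
    return adv.clamp(0.0, 1.0).detach()

# Placeholder stand-in for the chaser-camera pose-estimation CNN
# (7 outputs: 3 for relative position, 4 for a rotation quaternion).
model = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2), nn.ReLU(), nn.Flatten(),
                      nn.LazyLinear(7))
image = torch.rand(1, 1, 64, 64)
target = torch.randn(1, 7)
model(image)                       # initialize the lazy layer before the attack
adv_image = fgsm_attack(model, image, target)
print("max perturbation:", (adv_image - image).abs().max().item())
```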
A Decision Support System for Liver Diseases Prediction: Integrating Batch Processing, Rule-Based Event Detection and SPARQL Query
results: 这个研究使用SWRL规则将DT规则转换为ontology中的Semantic Web Rule Language (SWRL),并使用Pellet和Drool推理引擎在Protege工具中进行推理,最终可以为病人根据DT规则生成结果,并获得与病人相关的其他细节和不同预防建议。Abstract
Liver diseases pose a significant global health burden, impacting a substantial number of individuals and exerting substantial economic and social consequences. Rising liver problems are considered a fatal disease in many countries, such as Egypt, Molda, etc. The objective of this study is to construct a predictive model for liver illness using Basic Formal Ontology (BFO) and detection rules derived from a decision tree algorithm. Based on these rules, events are detected through batch processing using the Apache Jena framework. Based on the event detected, queries can be directly processed using SPARQL. To make the ontology operational, these Decision Tree (DT) rules are converted into Semantic Web Rule Language (SWRL). Using this SWRL in the ontology for predicting different types of liver disease with the help of the Pellet and Drool inference engines in Protege Tools, a total of 615 records are taken from different liver diseases. After inferring the rules, the result can be generated for the patient according to the DT rules, and other patient-related details along with different precautionary suggestions can be obtained based on these results. Combining query results of batch processing and ontology-generated results can give more accurate suggestions for disease prevention and detection. This work aims to provide a comprehensive approach that is applicable for liver disease prediction, rich knowledge graph representation, and smart querying capabilities. The results show that combining RDF data, SWRL rules, and SPARQL queries for analysing and predicting liver disease can help medical professionals to learn more about liver diseases and make a Decision Support System (DSS) for health care.
摘要
肝病对全球健康带来重大的影响,影响了大量人口并且对经济和社会造成了巨大的负担。肝病在许多国家(如 Egypt、Molda 等)被视为致命疾病。本研究的目标是使用基本形式本体(Basic Formal Ontology, BFO)和从决策树算法推导出的检测规则来构建肝病预测模型。基于这些规则,通过 Apache Jena 框架以批处理方式检测事件,并基于检测到的事件直接使用 SPARQL 处理查询。为了使本体可操作,这些决策树(DT)规则被转换为 Semantic Web Rule Language(SWRL)。在本体中使用这些 SWRL,并借助 Protege 工具中的 Pellet 和 Drool 推理引擎来预测不同类型的肝病,共使用了来自不同肝病的615条记录。在规则推理之后,可以根据 DT 规则为病人生成结果,并基于这些结果获得与病人相关的其他细节以及不同的预防建议。通过将批处理的查询结果和本体生成的结果相结合,可以给出更准确的疾病预测和预防建议。本工作的目标是提供一种适用于肝病预测、丰富知识图表示和智能查询能力的综合方法。结果表明,将 RDF 数据、SWRL 规则和 SPARQL 查询结合用于分析和预测肝病,可以帮助医疗专业人员更好地了解肝病,并构建一个用于医疗保健的决策支持系统(DSS)。
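The following rdflib sketch conveys the flavor of the pipeline: patient facts are asserted into an RDF graph, one decision-tree-style rule (which the paper expresses in SWRL and fires with a reasoner) is emulated in plain Python, and a SPARQL query retrieves the flagged patients. The namespace, property names, thresholds, and disease class are invented for illustration.

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef
from rdflib.namespace import XSD

EX = Namespace("http://example.org/liver#")     # hypothetical ontology namespace
g = Graph()
g.bind("ex", EX)

# Facts that batch processing would normally assert from incoming records.
patient = URIRef(EX["patient42"])
g.add((patient, RDF.type, EX.Patient))
g.add((patient, EX.totalBilirubin, Literal(2.4, datatype=XSD.decimal)))
g.add((patient, EX.albumin, Literal(2.8, datatype=XSD.decimal)))

# Stand-in for one decision-tree rule (normally expressed in SWRL and fired by
# a reasoner): high bilirubin + low albumin -> suspected cirrhosis.
for p in g.subjects(RDF.type, EX.Patient):
    bil = float(g.value(p, EX.totalBilirubin))
    alb = float(g.value(p, EX.albumin))
    if bil > 1.2 and alb < 3.5:
        g.add((p, RDF.type, EX.SuspectedCirrhosis))

# SPARQL query over the enriched graph, as the DSS front end would issue it.
results = g.query("""
    PREFIX ex: <http://example.org/liver#>
    SELECT ?p ?bil WHERE {
        ?p a ex:SuspectedCirrhosis ;
           ex:totalBilirubin ?bil .
    }""")
for row in results:
    print(f"flagged: {row.p} (bilirubin={row.bil})")
```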
How to Bridge the Gap between Modalities: A Comprehensive Survey on Multimodal Large Language Model
paper_authors: Shezheng Song, Xiaopeng Li, Shasha Li
for: This paper explores the use of Multimodal Large Language Models (MLLMs) to handle multimodal data and their potential applications in real-world human-computer interactions and artificial general intelligence.
methods: The paper surveys existing modality alignment methods for MLLMs, including Multimodal Converters, Multimodal Perceivers, Tools Assistance, and Data-Driven methods.
results: The paper discusses the challenges of processing the semantic gap in multimodality and the potential risks of erroneous generation, and highlights the importance of choosing appropriate modality alignment methods for LLMs to address environmental issues and enhance accessibility.
results: 论文讨论了多Modal数据的含义差距处理的挑战和可能的错误生成风险,并强调了选择合适的多Modal信息对齐方法,以解决环境问题和提高可用性。Abstract
This review paper explores Multimodal Large Language Models (MLLMs), which integrate Large Language Models (LLMs) like GPT-4 to handle multimodal data such as text and vision. MLLMs demonstrate capabilities like generating image narratives and answering image-based questions, bridging the gap towards real-world human-computer interactions and hinting at a potential pathway to artificial general intelligence. However, MLLMs still face challenges in processing the semantic gap in multimodality, which may lead to erroneous generation, posing potential risks to society. Choosing the appropriate modality alignment method is crucial, as improper methods might require more parameters with limited performance improvement. This paper aims to explore modality alignment methods for LLMs and their existing capabilities. Implementing modality alignment allows LLMs to address environmental issues and enhance accessibility. The study surveys existing modal alignment methods in MLLMs into four groups: (1) Multimodal Converters that change data into something LLMs can understand; (2) Multimodal Perceivers to improve how LLMs perceive different types of data; (3) Tools Assistance for changing data into one common format, usually text; and (4) Data-Driven methods that teach LLMs to understand specific types of data in a dataset. This field is still in a phase of exploration and experimentation, and we will organize and update various existing research methods for multimodal information alignment.
摘要
这篇评论文章探讨了多模态大语言模型(MLLM),它们将大语言模型(LLM)如GPT-4 integrated into多模态数据处理,如文本和视觉。 MLLMs 示出了生成图像故事和回答图像问题的能力, bridge the gap towards real-world human-computer interactions and hint at a potential pathway to artificial general intelligence。然而, MLLMs 在多模态 semantic gap处理方面仍面临挑战,可能导致错误生成, posing potential risks to society。选择合适的模态对齐方法是关键,因为不当的方法可能需要更多的参数,但具有有限的性能提升。这篇文章探讨了LLMs 的现有能力和现有的模态对齐方法,以解决环境问题并提高可访问性。对于现有的模态对齐方法,我们将它们分为四个组:(1)多模态转换器,将数据转换成LLMs可以理解的形式;(2)多模态感知器,提高LLMs 对不同类型数据的感知能力;(3)工具助手,将数据转换成一种常见的文本格式;(4)数据驱动方法,教导LLMs 理解特定的数据集中的特定类型数据。这个领域仍处于探索和实验阶段,我们将组织和更新现有的研究方法,以便在多模态信息对齐方面进行进一步的发展。
TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree transformation
methods: 该框架使用TransformerEncoder作为模型的重要组成部分,并引入了一种新的数据采样技术 called abstract syntax tree transformation。
results: 我们的方法可以快速和效率地学习代码嵌入,并且可以适应不同的编程语言和任务。我们通过对不同的软件工程任务和多个数据集进行广泛的实验来证明方法的效果。Abstract
Large-scale language models have made great progress in the field of software engineering in recent years. They can be used for many code-related tasks such as code clone detection, code-to-code search, and method name prediction. However, these large-scale language models based on each code token have several drawbacks: They are usually large in scale, heavily dependent on labels, and require a lot of computing power and time to fine-tune new datasets.Furthermore, code embedding should be performed on the entire code snippet rather than encoding each code token. The main reason for this is that encoding each code token would cause model parameter inflation, resulting in a lot of parameters storing information that we are not very concerned about. In this paper, we propose a novel framework, called TransformCode, that learns about code embeddings in a contrastive learning manner. The framework uses the Transformer encoder as an integral part of the model. We also introduce a novel data augmentation technique called abstract syntax tree transformation: This technique applies syntactic and semantic transformations to the original code snippets to generate more diverse and robust anchor samples. Our proposed framework is both flexible and adaptable: It can be easily extended to other downstream tasks that require code representation such as code clone detection and classification. The framework is also very efficient and scalable: It does not require a large model or a large amount of training data, and can support any programming language.Finally, our framework is not limited to unsupervised learning, but can also be applied to some supervised learning tasks by incorporating task-specific labels or objectives. To explore the effectiveness of our framework, we conducted extensive experiments on different software engineering tasks using different programming languages and multiple datasets.
摘要
大规模语言模型在软件工程领域最近几年来所做出的进步非常大。它们可以用于许多代码相关任务,如代码副本检测、代码到代码搜索和方法名预测。然而,这些基于每个代码字符的大规模语言模型有几个缺点:它们通常很大,依赖于标签很强,需要许多计算机力和时间来调整新的数据集。此外,代码嵌入应该基于整个代码片段而不是每个代码字符编码。主要原因是,对每个代码字符进行编码会导致模型参数膨胀,导致很多参数存储不重要的信息。在这篇论文中,我们提出了一个新的框架,叫做TransformCode,它通过对代码嵌入进行对照学习来学习代码嵌入。框架使用Transformer编码器作为模型的一部分。我们还介绍了一种新的数据采样技术 called abstract syntax tree transformation,该技术对原始代码片段应用 sintactic和semantic 变换来生成更多元和更加稳定的锚样本。我们提出的框架具有灵活性和适应性:它可以轻松扩展到其他下游任务需要代码表示,例如代码副本检测和分类。此外,框架也非常高效和扩展:它不需要大型模型或大量训练数据,并且可以支持任何编程语言。最后,我们的框架不仅限于无监督学习,还可以应用到一些监督学习任务,只需要添加任务特定的标签或目标。为了评估我们的框架的效果,我们对不同的软件工程任务和不同编程语言的多个数据集进行了广泛的实验。
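A minimal sketch of the contrastive objective is shown below: two augmented views of each code snippet are encoded and pulled together with an NT-Xent/InfoNCE loss. The identifier-renaming augmentation and bag-of-tokens encoder are toy stand-ins for the paper's AST transformations and Transformer encoder.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

def augment(tokens):
    """Toy stand-in for AST-based transformations: randomly rename identifiers."""
    mapping, out = {}, []
    for t in tokens:
        if t.isidentifier() and t not in {"def", "return", "if", "for"}:
            mapping.setdefault(t, f"v{random.randint(0, 999)}")
            out.append(mapping[t])
        else:
            out.append(t)
    return out

class CodeEncoder(nn.Module):
    """Tiny bag-of-hashed-tokens encoder standing in for the Transformer encoder."""
    def __init__(self, vocab=4096, dim=64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, dim)
        self.vocab = vocab

    def forward(self, batch_tokens):
        ids = [torch.tensor([hash(t) % self.vocab for t in toks]) for toks in batch_tokens]
        offsets = torch.tensor([0] + [len(x) for x in ids[:-1]]).cumsum(0)
        return self.emb(torch.cat(ids), offsets)

def nt_xent(z1, z2, temperature=0.1):
    """InfoNCE: each snippet's two views are positives, everything else negatives."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / temperature
    sim.fill_diagonal_(float("-inf"))
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

snippets = [["def", "add", "a", "b", "return", "a", "b"],
            ["def", "mul", "x", "y", "return", "x", "y"]]
enc = CodeEncoder()
z1 = enc([augment(s) for s in snippets])
z2 = enc([augment(s) for s in snippets])
print("contrastive loss:", nt_xent(z1, z2).item())
```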
Genetic Algorithm enhanced by Deep Reinforcement Learning in parent selection mechanism and mutation : Minimizing makespan in permutation flow shop scheduling problems
methods: 提议的 RL+GA 方法整合了神经网络(NN),并使用 Q-learning 或 Sarsa(0) 方法来控制 GA 算法中的两个关键运算:父选择机制和变异。在每一代,RL 代理的动作是确定选择方法、父选择概率和子代变异概率。这使 RL 代理能够根据其学习到的策略动态调整选择和变异。
results: 研究结果表明 RL+GA 方法能够改进原始 GA 的性能,并且能够学习和适应人口多样性和解决方案改进随时间的演化过程。这种适应性导致在静态参数配置下获得的调度解决方案的改进。Abstract
This paper introduces a reinforcement learning (RL) approach to address the challenges associated with configuring and optimizing genetic algorithms (GAs) for solving difficult combinatorial or non-linear problems. The proposed RL+GA method was specifically tested on the flow shop scheduling problem (FSP). The hybrid algorithm incorporates neural networks (NN) and uses the off-policy method Q-learning or the on-policy method Sarsa(0) to control two key genetic algorithm (GA) operators: parent selection mechanism and mutation. At each generation, the RL agent's action is determining the selection method, the probability of the parent selection and the probability of the offspring mutation. This allows the RL agent to dynamically adjust the selection and mutation based on its learned policy. The results of the study highlight the effectiveness of the RL+GA approach in improving the performance of the primitive GA. They also demonstrate its ability to learn and adapt from population diversity and solution improvements over time. This adaptability leads to improved scheduling solutions compared to static parameter configurations while maintaining population diversity throughout the evolutionary process.
摘要
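The two mechanical ingredients of the abstract, the permutation flow shop makespan and an RL agent that chooses the GA's mutation probability each generation, can be sketched as follows; the state discretization, reward, and the GA itself are simplified assumptions rather than the paper's exact design.

```python
import random
import numpy as np

rng = np.random.default_rng(0)
proc = rng.integers(1, 20, size=(10, 4))        # processing times: 10 jobs x 4 machines

def makespan(perm):
    """Completion time of the last job on the last machine (permutation flow shop)."""
    c = np.zeros(proc.shape[1])
    for j in perm:
        c[0] += proc[j, 0]
        for m in range(1, proc.shape[1]):
            c[m] = max(c[m], c[m - 1]) + proc[j, m]
    return c[-1]

def mutate(perm, p):
    perm = perm[:]
    if random.random() < p:
        i, j = random.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm

# Tabular Q-learning: states are crude diversity buckets, actions are mutation rates.
actions = [0.05, 0.2, 0.5]
Q = np.zeros((3, len(actions)))
alpha, gamma, eps = 0.3, 0.9, 0.2

pop = [random.sample(range(10), 10) for _ in range(20)]
best_prev = min(makespan(p) for p in pop)
state = 1
for gen in range(50):
    a = random.randrange(len(actions)) if random.random() < eps else int(Q[state].argmax())
    pop.sort(key=makespan)
    children = [mutate(random.choice(pop[:10]), actions[a]) for _ in range(20)]
    pop = sorted(pop + children, key=makespan)[:20]
    best = makespan(pop[0])
    reward = best_prev - best                    # improvement in makespan
    spread = makespan(pop[-1]) - best
    next_state = int(min(2, spread // 10))       # crude diversity bucket
    Q[state, a] += alpha * (reward + gamma * Q[next_state].max() - Q[state, a])
    state, best_prev = next_state, best
print("best makespan:", best_prev)
```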
Anytime-Valid Confidence Sequences for Consistent Uncertainty Estimation in Early-Exit Neural Networks
results: 论文使用 anytime-valid confidence sequences (AVCSs) 解决这个问题,并在 regression 和 classification 任务上进行了实验验证。Abstract
Early-exit neural networks (EENNs) facilitate adaptive inference by producing predictions at multiple stages of the forward pass. In safety-critical applications, these predictions are only meaningful when complemented with reliable uncertainty estimates. Yet, due to their sequential structure, an EENN's uncertainty estimates should also be consistent: labels that are deemed improbable at one exit should not reappear within the confidence interval / set of later exits. We show that standard uncertainty quantification techniques, like Bayesian methods or conformal prediction, can lead to inconsistency across exits. We address this problem by applying anytime-valid confidence sequences (AVCSs) to the exits of EENNs. By design, AVCSs maintain consistency across exits. We examine the theoretical and practical challenges of applying AVCSs to EENNs and empirically validate our approach on both regression and classification tasks.
摘要
Early-exit neural networks (EENNs) 可以实现适应性的推理,通过多个前进通道生成预测结果。在安全关键应用中,这些预测结果的准确性只有在 accompaniment with reliable uncertainty estimates 时才有意义。然而,由于 EENN 的序列结构,它们的uncertainty estimates 应该具有一定的一致性:在某个 exit 被评估为不可能时,后续 exit 的信息不应该重新出现在信任范围内。我们表明,标准的uncertainty量化技术,如 Bayesian 方法或充分预测,可能会导致 exit 之间的不一致。我们解决这个问题,通过应用 anytime-valid confidence sequences (AVCSs) 到 EENN 的 exit 处理。由于 AVCSs 的设计,它们可以保证 exit 之间的一致性。我们检查了应用 AVCSs 到 EENN 的理论和实践挑战,并对 regression 和 classification 任务进行了实验验证。
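A toy illustration of the consistency requirement: each exit produces an interval for a regression target, and taking the running intersection across exits guarantees that values ruled out early never reappear later. The interval construction below uses fixed placeholder half-widths, not an actual anytime-valid confidence sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = 3.0

# Per-exit predictions and (placeholder) half-widths: later exits are more accurate.
exit_preds = [y_true + rng.normal(0, s) for s in (1.0, 0.6, 0.3)]
half_widths = [2.5, 1.8, 1.2]

intervals, running = [], (-np.inf, np.inf)
for mu, hw in zip(exit_preds, half_widths):
    raw = (mu - hw, mu + hw)                      # this exit's own interval
    # Running intersection keeps the estimates consistent across exits:
    running = (max(running[0], raw[0]), min(running[1], raw[1]))
    intervals.append(running)

for k, (lo, hi) in enumerate(intervals, 1):
    print(f"exit {k}: [{lo:.2f}, {hi:.2f}]  contains truth: {lo <= y_true <= hi}")
```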
The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models
results: 研究发现,在解码器中的嵌入异构性表现出一个明确的bell型曲线,中间层的异构性最高,而编码器中的嵌入异构性则更加uniform。此外,研究还发现,在训练的初期阶段,嵌入的维度会增加,然后逐渐减少,表明在训练过程中,模型在嵌入空间中进行了扩展和细化。Abstract
In this study, we present an investigation into the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile in transformer decoders exhibits a distinct bell-shaped curve, with the highest anisotropy concentrations in the middle layers. This pattern diverges from the more uniformly distributed anisotropy observed in encoders. In addition, we found that the intrinsic dimension of embeddings increases in the initial phases of training, indicating an expansion into higher-dimensional space. This is then followed by a compression phase towards the end of training, with a decrease in dimensionality, suggesting a refinement into more compact representations. Our results provide fresh insights into the embedding properties of encoders and decoders.
摘要
在这项研究中,我们展示了对转换器架构中嵌入的动态异构和内在维度的调查,特别是转换器Encoder和Decoder之间的对比。我们的发现显示,转换器Decoder中的异构性profile采取了一个明确的钟形曲线,中间层的异构性最高。这种模式与Encoder中的异构性更加 uniformly distributed 不同。此外,我们发现在训练的初期阶段,嵌入的内在维度会增加,表示在更高维度的空间中扩展。然后在训练的末期阶段,嵌入的维度会减少,表示向更加紧凑的表示进行了修finement。我们的结果为encoder和decoder嵌入性能的理解提供了新的视角。
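The anisotropy measure discussed above is commonly computed as the average cosine similarity between random pairs of token embeddings within a layer; the sketch below applies it to synthetic per-layer hidden states whose common drift peaks in the middle layers, mimicking the bell-shaped decoder profile reported here.

```python
import numpy as np

rng = np.random.default_rng(0)

def anisotropy(hidden, n_pairs=2000):
    """Mean cosine similarity between randomly drawn pairs of token embeddings."""
    i = rng.integers(0, len(hidden), n_pairs)
    j = rng.integers(0, len(hidden), n_pairs)
    a, b = hidden[i], hidden[j]
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return cos.mean()

# Synthetic per-layer hidden states: middle layers get a stronger common drift.
n_tokens, dim, n_layers = 500, 64, 6
for layer in range(n_layers):
    drift = 3.0 * np.exp(-((layer - n_layers / 2) ** 2) / 2.0)
    hidden = rng.standard_normal((n_tokens, dim)) + drift * np.ones(dim)
    print(f"layer {layer}: anisotropy = {anisotropy(hidden):.3f}")
```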
results: 该研究发现了许多广泛使用的 LLM 存在假Alignment现象,导致previous evaluation protocols 不可靠。在提出 fake alignment 和两个新的评价指标(Consistency Score 和 Consistent Safety Score)后,该研究引入了 Fake alIgNment Evaluation 框架,以评估 LLM 的安全性。Abstract
The growing awareness of safety concerns in large language models (LLMs) has sparked considerable interest in the evaluation of safety within current research endeavors. This study investigates an interesting issue pertaining to the evaluation of LLMs, namely the substantial discrepancy in performance between multiple-choice questions and open-ended questions. Inspired by research on jailbreak attack patterns, we argue this is caused by mismatched generalization. That is, the LLM does not have a comprehensive understanding of the complex concept of safety. Instead, it only remembers what to answer for open-ended safety questions, which makes it unable to solve other forms of safety tests. We refer to this phenomenon as fake alignment and construct a comparative benchmark to empirically verify its existence in LLMs. Such fake alignment renders previous evaluation protocols unreliable. To address this, we introduce the Fake alIgNment Evaluation (FINE) framework and two novel metrics--Consistency Score (CS) and Consistent Safety Score (CSS), which jointly assess two complementary forms of evaluation to quantify fake alignment and obtain corrected performance estimates. Applying FINE to 14 widely-used LLMs reveals several models with purported safety are poorly aligned in practice. Our work highlights potential limitations in prevailing alignment methodologies.
摘要
LLMs 的安全问题正在引起越来越多的关注,这个研究探讨了 LLMS 的安全评估方法中的一个有趣问题,即多选题和开放式题的性能差异。我们根据监狱攻击模式的研究,提出了匹配混合泛化的问题,即 LLMS 对安全概念的理解不够全面,只记忆了开放式安全题的答案,无法解决其他安全测试形式。我们称这种现象为“假对齐”,并构建了比较指标来实验性证明其存在。这种假对齐使得以前的评估协议不可靠。为了解决这个问题,我们介绍了 Fake alIgNment Evaluation(FINE)框架和两个新的度量——一致度分数(CS)和安全一致分数(CSS),它们共同评估了两种不同的评估方法,以量化假对齐并获得修正后的性能估计。通过应用 FINE 到 14 种广泛使用的 LLMS 中,发现一些被认为具有安全的模型在实践中有假对齐问题。我们的工作高光了现有的对齐方法的局限性。
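A toy sketch of a consistency-style computation: each safety question is judged once under a multiple-choice probe and once open-ended, and scores are derived from how often the two judgments agree. The scoring rules are illustrative assumptions, not the paper's exact CS/CSS definitions.

```python
# Paired judgments per question: was the model's behavior judged safe under the
# multiple-choice probe and under the open-ended probe? (In practice these would
# come from an evaluator model or human annotation.)
paired = [
    {"mc_safe": True,  "open_safe": True},
    {"mc_safe": False, "open_safe": True},    # looks aligned open-ended, fails the MC probe
    {"mc_safe": False, "open_safe": True},
    {"mc_safe": True,  "open_safe": True},
    {"mc_safe": False, "open_safe": False},
]

consistency = sum(r["mc_safe"] == r["open_safe"] for r in paired) / len(paired)
# A "consistent safety" style score only credits questions that are safe under BOTH probes.
consistent_safety = sum(r["mc_safe"] and r["open_safe"] for r in paired) / len(paired)

print(f"Consistency-style score: {consistency:.2f}")
print(f"Consistent-safety-style score: {consistent_safety:.2f}")
```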
Establishing Performance Baselines in Fine-Tuning, Retrieval-Augmented Generation and Soft-Prompting for Non-Specialist LLM Users
results: 研究发现,在使用商业平台和默认设置、不经迭代调整以建立基线输出的情况下,微调模型的表现优于 GPT 3.5 Turbo,而 RAG 方法则超越了两者。软提示(soft prompt)的应用显著提升了每种方法的性能。Abstract
Research into methods for improving the performance of large language models (LLMs) through fine-tuning, retrieval-augmented generation (RAG) and soft-prompting has tended to focus on the use of highly technical or high-cost techniques, making many of the newly discovered approaches comparatively inaccessible to non-technical users. In this paper we tested an unmodified version of GPT 3.5, a fine-tuned version, and the same unmodified model when given access to a vectorised RAG database, both in isolation and in combination with a basic, non-algorithmic soft prompt. In each case we tested the model's ability to answer a set of 100 questions relating primarily to events that occurred after September 2021 (the point at which GPT 3.5's training data set ends). We found that if commercial platforms are used and default settings are applied with no iteration in order to establish a baseline set of outputs, a fine-tuned model outperforms GPT 3.5 Turbo, while the RAG approach out-performed both. The application of a soft prompt significantly improved the performance of each approach.
摘要
A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning
paper_authors: Valeriia Cherepanova, Roman Levin, Gowthami Somepalli, Jonas Geiping, C. Bayan Bruss, Andrew Gordon Wilson, Tom Goldstein, Micah Goldblum
for: 本研究旨在提供一个有效的特征选择精选方法,用于适应 tabular deep learning 中的特征选择问题。
results: 本研究通过对实际数据集进行测试,发现input-gradient-based Lasso 方法在适应 corrupted 或 second-order 特征选择问题时表现出色,并且比经典的特征选择方法更高效。Abstract
Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent overfitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. Motivated by the increasing popularity of tabular deep learning, we construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features. We also propose an input-gradient-based analogue of Lasso for neural networks that outperforms classical feature selection methods on challenging problems such as selecting from corrupted or second-order features.
摘要
学术表格 benchmark 常常只包含少量精心挑选的特征。相比之下,数据科学家通常尽可能多地收集特征到他们的数据集中,甚至从现有特征中构造新的特征。为防止在后续下游建模中过拟合,实践者通常使用自动化特征选择方法,以确定一个信息量较高的特征子集。现有的表格特征选择 benchmark 只考虑经典的下游模型或玩具式合成数据集,或者不以下游性能来评估特征选择器。鉴于表格深度学习日益流行,我们构建了一个以下游神经网络(包括 transformer)性能为评估标准的特征选择 benchmark,使用真实数据和多种生成无关特征的方法。我们还提出一种基于输入梯度的 Lasso 类方法,用于神经网络上的特征选择,其在从受损特征或二阶特征中进行选择等困难问题上优于经典特征选择方法。
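The input-gradient analogue of Lasso can be illustrated as follows: an MLP is trained on synthetic data containing extraneous features, an L1 penalty is placed on the gradient of the prediction with respect to each input feature, and features are then ranked by their average absolute input gradient. The penalty weight, architecture, and data are assumptions, not the benchmark's protocol.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d_informative, d_noise = 2000, 4, 16
X = torch.randn(n, d_informative + d_noise)
y = (X[:, :d_informative] @ torch.tensor([2.0, -1.5, 1.0, 0.5])).unsqueeze(1) \
    + 0.1 * torch.randn(n, 1)

model = nn.Sequential(nn.Linear(d_informative + d_noise, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 1e-2                                   # strength of the input-gradient L1 penalty

for step in range(500):
    opt.zero_grad()
    X.requires_grad_(True)
    pred = model(X)
    mse = nn.functional.mse_loss(pred, y)
    # Gradient of the prediction w.r.t. the inputs; its L1 norm plays the role of Lasso.
    grads = torch.autograd.grad(pred.sum(), X, create_graph=True)[0]
    loss = mse + lam * grads.abs().mean(dim=0).sum()
    loss.backward()
    X.requires_grad_(False)
    opt.step()

importance = grads.abs().mean(dim=0).detach()
ranked = torch.argsort(importance, descending=True)
print("top-ranked features:", ranked[:d_informative].tolist())
```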
DPR: An Algorithm Mitigate Bias Accumulation in Recommendation feedback loops
For: The paper aims to address the bias issues in recommendation models caused by user feedback, specifically the exposure mechanism and feedback loops.* Methods: The paper uses the Missing Not At Random (MNAR) assumption to analyze the data exposure mechanism and feedback loops, and proposes a dynamic re-weighting algorithm called Dynamic Personalized Ranking (DPR) to mitigate the cross-effects of exposure mechanisms and feedback loops.* Results: The paper theoretically demonstrates the effectiveness of the proposed approach in mitigating the negative effects of feedback loops and unknown exposure mechanisms. Experimental results on real-world datasets show that models using DPR can better handle bias accumulation, and the Universal Anti-False Negative (UFN) plugin can mitigate the negative impact of false negative samples.Abstract
Recommendation models trained on the user feedback collected from deployed recommendation systems are commonly biased. User feedback is considerably affected by the exposure mechanism, as users only provide feedback on the items exposed to them and passively ignore the unexposed items, thus producing numerous false negative samples. Inevitably, biases caused by such user feedback are inherited by new models and amplified via feedback loops. Moreover, the presence of false negative samples makes negative sampling difficult and introduces spurious information in the user preference modeling process of the model. Recent work has investigated the negative impact of feedback loops and unknown exposure mechanisms on recommendation quality and user experience, essentially treating them as independent factors and ignoring their cross-effects. To address these issues, we deeply analyze the data exposure mechanism from the perspective of data iteration and feedback loops with the Missing Not At Random (\textbf{MNAR}) assumption, theoretically demonstrating the existence of an available stabilization factor in the transformation of the exposure mechanism under the feedback loops. We further propose Dynamic Personalized Ranking (\textbf{DPR}), an unbiased algorithm that uses dynamic re-weighting to mitigate the cross-effects of exposure mechanisms and feedback loops without additional information. Furthermore, we design a plugin named Universal Anti-False Negative (\textbf{UFN}) to mitigate the negative impact of the false negative problem. We demonstrate theoretically that our approach mitigates the negative effects of feedback loops and unknown exposure mechanisms. Experimental results on real-world datasets demonstrate that models using DPR can better handle bias accumulation and the universality of UFN in mainstream loss methods.
摘要
推荐模型通常受到已部署的推荐系统中收集的用户反馈的偏见。用户反馈受到曝光机制的影响很大,用户只是提供曝光给他们的项目,并且忽略其他项目,因此生成了大量的假正样本。这些偏见会在新的模型中继承下来,并通过反馈循环被强制。此外,假正样本的存在使得负样本难以处理,并将偏见引入推荐过程中。 latest work has investigated the negative impact of feedback loops and unknown exposure mechanisms on recommendation quality and user experience, treating them as independent factors and ignoring their cross-effects. To address these issues, we deeply analyze the data exposure mechanism from the perspective of data iteration and feedback loops with the Missing Not At Random (\textbf{MNAR}) assumption, theoretically demonstrating the existence of an available stabilization factor in the transformation of the exposure mechanism under the feedback loops. We further propose Dynamic Personalized Ranking (\textbf{DPR}), an unbiased algorithm that uses dynamic re-weighting to mitigate the cross-effects of exposure mechanisms and feedback loops without additional information. Furthermore, we design a plugin named Universal Anti-False Negative (\textbf{UFN}) to mitigate the negative impact of the false negative problem. We demonstrate theoretically that our approach mitigates the negative effects of feedback loops and unknown exposure mechanisms. Experimental results on real-world datasets demonstrate that models using DPR can better handle bias accumulation and the universality of UFN in mainstream loss methods.
Reframing Audience Expansion through the Lens of Probability Density Estimation
results: simulations 表明,该方法可以准确地Identify the most relevant users for an expanded audience, with high precision and recall values.Abstract
Audience expansion has become an important element of prospective marketing, helping marketers create target audiences based on a mere representative sample of their current customer base. Within the realm of machine learning, a favored algorithm for scaling this sample into a broader audience hinges on a binary classification task, with class probability estimates playing a crucial role. In this paper, we review this technique and introduce a key change in how we choose training examples to ensure the quality of the generated audience. We present a simulation study based on the widely used MNIST dataset, where consistent high precision and recall values demonstrate our approach's ability to identify the most relevant users for an expanded audience. Our results are easily reproducible and a Python implementation is openly available on GitHub: \url{https://github.com/carvalhaes-ai/audience-expansion}
摘要
受众扩展(audience expansion)已成为前瞻性营销的重要元素,帮助营销人员基于现有客户群的一个代表性样本来创建目标受众。在机器学习领域,一种常用的将该样本扩展为更广泛受众的算法基于二分类任务,其中类别概率估计起着关键作用。在这篇论文中,我们回顾这种技术,并在训练样本的选择方式上引入一个关键改动,以确保所生成受众的质量。我们基于广泛使用的 MNIST 数据集进行了模拟研究,持续较高的精确率和召回率表明我们的方法能够识别与扩展受众最相关的用户。我们的结果易于复现,Python 实现已在 GitHub 上公开:\url{https://github.com/carvalhaes-ai/audience-expansion}
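A small sketch of the classifier-based expansion step: a logistic regression model separates the seed audience from a sample of the general population, and its class-probability estimates rank candidate users for the expanded audience. The data and the simple negative-sampling choice are illustrative; the paper's contribution concerns precisely how such training examples are chosen.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_features = 8

# Seed audience (positives) vs. a sample from the broader population (used as negatives).
seed = rng.normal(loc=1.0, size=(500, n_features))
population = rng.normal(loc=0.0, size=(5000, n_features))

X = np.vstack([seed, population])
y = np.concatenate([np.ones(len(seed)), np.zeros(len(population))])

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Class-probability estimates rank candidate users for the expanded audience.
candidates = rng.normal(loc=0.3, size=(20000, n_features))
scores = clf.predict_proba(candidates)[:, 1]
top_k = np.argsort(-scores)[:1000]
print("expansion threshold score:", scores[top_k[-1]].round(3))
```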
Cognitive Architecture Toward Common Ground Sharing Among Humans and Generative AIs: Trial on Model-Model Interactions in Tangram Naming Task
results: 研究发现,透过实现共同基础,模型之间的通信效果超过了偶数几率水平,并且观察到了对模型中的一个部分进行逐步反向传播可以实现性能的 statistically significant 提升。Abstract
For generative AIs to be trustworthy, establishing transparent common grounding with humans is essential. As a preparation toward human-model common grounding, this study examines the process of model-model common grounding. In this context, common ground is defined as a cognitive framework shared among agents in communication, enabling the connection of symbols exchanged between agents to the meanings inherent in each agent. This connection is facilitated by a shared cognitive framework among the agents involved. In this research, we focus on the tangram naming task (TNT) as a testbed to examine the common-ground-building process. Unlike previous models designed for this task, our approach employs generative AIs to visualize the internal processes of the model. In this task, the sender constructs a metaphorical image of an abstract figure within the model and generates a detailed description based on this image. The receiver interprets the generated description from the partner by constructing another image and reconstructing the original abstract figure. Preliminary results from the study show an improvement in task performance beyond the chance level, indicating the effect of the common cognitive framework implemented in the models. Additionally, we observed that incremental backpropagations leveraging successful communication cases for a component of the model led to a statistically significant increase in performance. These results provide valuable insights into the mechanisms of common grounding made by generative AIs, improving human communication with the evolving intelligent machines in our future society.
摘要
为了让生成型AI变得可靠,建立与人类共同基础是必要的。为了实现人机共同基础,本研究研究了模型之间的共同基础建设。在这个上下文中,共同基础被定义为在交流中的智能框架,它使得交换 между代理人之间的符号与每个代理人内部的含义相连接。这种连接是通过共同智能框架的共享而实现。在本研究中,我们使用生成型AI来视觉化模型内部的过程。在这个任务中,发送方构建一个抽象图形内部的模型,并生成基于这个图形的详细描述。接收方根据伙伴的生成描述重新构建原始抽象图形。初步的研究结果显示,通过实施共同基础,任务性能超过了偶极值水平,这表明了共同智能框架在模型中的作用。此外,我们还发现,通过基于成功交流 caso的增量反向卷积,对一部分模型的性能进行了 statistically significant 的提高。这些结果为我们在将来社会中与智能机器进行交流的机制提供了有价值的发现。
Tamil-Llama: A New Tamil Language Model Based on Llama 2
results: 在坦米文本生成和理解方面获得了显著的性能提升,具有潜在的应用在印度语言模型中。Abstract
Language modeling has witnessed remarkable advancements in recent years, with Large Language Models (LLMs) like ChatGPT setting unparalleled benchmarks in human-like text generation. However, a prevailing limitation is the underrepresentation of languages like Tamil in these cutting-edge models, leading to suboptimal performance in diverse linguistic contexts. This paper addresses this lacuna, enhancing the open-source LLaMA model with an addition of 16,000 Tamil tokens, aiming to achieve superior text generation and comprehension in the Tamil language. We strategically employ the LoRA methodology for efficient model training on a comprehensive Tamil corpus, ensuring computational feasibility and model robustness. Moreover, we introduce a Tamil-translated version of the Alpaca dataset and a subset of the OpenOrca dataset tailored for instruction fine-tuning. Our results showcase significant performance improvements in Tamil text generation, with potential implications for the broader landscape of LLMs in Indian languages. We further underscore our commitment to open research by making our models, datasets, and code publicly accessible, fostering further innovations in language modeling.
摘要
Large Language Models (LLMs) 如 ChatGPT 在最近几年内取得了无 precedent 的进步,但是有一点问题是一些语言,如 tamile 的语言,在这些先进模型中受到了不足的表现,这导致在多样化语言上的表现不佳。这篇文章解决了这个问题,通过将16,000个 tamile 单词添加到了开源的 LLaMA 模型中,以达到在 tamile 语言中的superior 文本生成和理解。我们使用 LoRA 方法学习在 comprehensive tamile 词汇库上,以确保计算可行性和模型稳定性。此外,我们还引入了 tamile 翻译的 Alpaca 数据集和 OpenOrca 数据集的一个子集,用于 fine-tuning instruction。我们的结果表明,在 tamile 文本生成方面有了显著的性能提高,这可能对整个 LLMS 的发展产生了影响。此外,我们还强调我们的研究是开放的,我们将我们的模型、数据集和代码公开 accessible,以促进进一步的语言模型化领域的创新。
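Mechanically, the two steps described above, extending a LLaMA tokenizer with Tamil tokens and attaching LoRA adapters for parameter-efficient training, look roughly like the following Hugging Face transformers/peft sketch. The checkpoint name, the two sample tokens, and the LoRA hyperparameters are placeholders, and this is not the authors' training script.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"            # placeholder base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Step 1: extend the vocabulary with new Tamil tokens (the paper adds ~16k; the
# two shown here are placeholders) and grow the embedding matrix to match.
new_tamil_tokens = ["தமிழ்", "மொழி"]
num_added = tokenizer.add_tokens(new_tamil_tokens)
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; new vocab size = {len(tokenizer)}")

# Step 2: attach LoRA adapters so only a small set of parameters is trained.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# ...then fine-tune on the Tamil corpus / translated instruction data with a
# standard causal-LM training loop (omitted here).
```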
AI-native Interconnect Framework for Integration of Large Language Model Technologies in 6G Systems
results: 该论文预测,通过将AI作为下一代通信体系的核心,将能够提高通信网络的功能和交互方式,并且将有新的实际应用出现。Abstract
The evolution towards 6G architecture promises a transformative shift in communication networks, with artificial intelligence (AI) playing a pivotal role. This paper delves deep into the seamless integration of Large Language Models (LLMs) and Generalized Pretrained Transformers (GPT) within 6G systems. Their ability to grasp intent, strategize, and execute intricate commands will be pivotal in redefining network functionalities and interactions. Central to this is the AI Interconnect framework, intricately woven to facilitate AI-centric operations within the network. Building on the continuously evolving current state-of-the-art, we present a new architectural perspective for the upcoming generation of mobile networks. Here, LLMs and GPTs will collaboratively take center stage alongside traditional pre-generative AI and machine learning (ML) algorithms. This union promises a novel confluence of the old and new, melding tried-and-tested methods with transformative AI technologies. Along with providing a conceptual overview of this evolution, we delve into the nuances of practical applications arising from such an integration. Through this paper, we envisage a symbiotic integration where AI becomes the cornerstone of the next-generation communication paradigm, offering insights into the structural and functional facets of an AI-native 6G network.
摘要
向六代(6G)架构的演进预示着通信网络的一次变革性转变,其中人工智能(AI)将扮演关键角色。这篇论文深入探讨了大语言模型(LLM)和通用预训练变换器(GPT)在六代系统中的无缝集成。它们理解意图、制定策略和执行复杂命令的能力,将在重新定义网络功能和交互方面起到关键作用。其核心是 AI 互联框架,用以支撑网络中以 AI 为中心的操作。基于不断演进的当前最新技术,我们提出了下一代移动网络的新架构视角。在这一视角中,LLM 和 GPT 将与传统的预生成式 AI 和机器学习(ML)算法共同协作,将经过验证的方法与变革性 AI 技术融合,形成新旧技术的交汇。本论文不仅提供了这一演进的概念性综述,还探讨了这种融合所带来的实际应用。我们预期,AI 将成为下一代通信范式的基石,并为 AI 原生的六代网络在结构和功能层面提供新的洞见。
Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion
results: 研究表明,该方法可以提供更加个性化和有用的查询建议,比如其他LLM-基于的基elines。通过人工评估,该方法在Contextual Query Suggestion任务中表现出色,生成的查询建议更加相关、个性化和有用。Abstract
Large Language Models (LLMs) excel at tackling various natural language tasks. However, due to the significant costs involved in re-training or fine-tuning them, they remain largely static and difficult to personalize. Nevertheless, a variety of applications could benefit from generations that are tailored to users' preferences, goals, and knowledge. Among them is web search, where knowing what a user is trying to accomplish, what they care about, and what they know can lead to improved search experiences. In this work, we propose a novel and general approach that augments an LLM with relevant context from users' interaction histories with a search engine in order to personalize its outputs. Specifically, we construct an entity-centric knowledge store for each user based on their search and browsing activities on the web, which is then leveraged to provide contextually relevant LLM prompt augmentations. This knowledge store is light-weight, since it only produces user-specific aggregate projections of interests and knowledge onto public knowledge graphs, and leverages existing search log infrastructure, thereby mitigating the privacy, compliance, and scalability concerns associated with building deep user profiles for personalization. We then validate our approach on the task of contextual query suggestion, which requires understanding not only the user's current search context but also what they historically know and care about. Through a number of experiments based on human evaluation, we show that our approach is significantly better than several other LLM-powered baselines, generating query suggestions that are contextually more relevant, personalized, and useful.
results: 研究表明,MaaS将使GenAI模型的开发变得更加民主化,并且可以为不同领域的应用提供可观之服务。它还可以解决许多当前AI技术的挑战,如模型训练和部署等。Abstract
Due to the increased number of parameters and data in the pre-trained model exceeding a certain level, a foundation model (e.g., a large language model) can significantly improve downstream task performance and emerge with some novel special abilities (e.g., deep learning, complex reasoning, and human alignment) that were not present before. Foundation models are a form of generative artificial intelligence (GenAI), and Model-as-a-Service (MaaS) has emerged as a groundbreaking paradigm that revolutionizes the deployment and utilization of GenAI models. MaaS represents a paradigm shift in how we use AI technologies and provides a scalable and accessible solution for developers and users to leverage pre-trained AI models without the need for extensive infrastructure or expertise in model training. In this paper, the introduction aims to provide a comprehensive overview of MaaS, its significance, and its implications for various industries. We provide a brief review of the development history of "X-as-a-Service" based on cloud computing and present the key technologies involved in MaaS. The development of GenAI models will become more democratized and flourish. We also review recent application studies of MaaS. Finally, we highlight several challenges and future issues in this promising area. MaaS is a new deployment and service paradigm for different AI-based models. We hope this review will inspire future research in the field of MaaS.
摘要
由于预训过程中参数和数据的增加超过了一定水平,基础模型(例如大语言模型)可以显著提高下游任务性能,并且具有一些新的特殊能力(例如深度学习、复杂逻辑和人类匹配),这些能力在之前没有出现过。基础模型是生成人工智能(GenAI)的一种形式,而Model-as-a-Service(MaaS)是一种革命性的部署和使用GenAI模型的新 paradigma。MaaS将如何使用AI技术发生了一种巨大的变革,并提供了可扩展的和访问ible的解决方案,让开发者和用户可以无需具备大量的基础设施或模型训练专业知识来使用预训AI模型。在这篇论文中,我们 aim to provide a comprehensive overview of MaaS, its significance, and its implications for various industries. We will review the development history of "X-as-a-Service" based on cloud computing and present the key technologies involved in MaaS. With the development of GenAI models becoming more democratized, MaaS will flourish. We will also review recent application studies of MaaS. Finally, we will highlight several challenges and future issues in this promising area. MaaS是一种新的部署和服务 paradigma,我们希望这篇文章能启发未来的研究在这个领域。
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
results: 根据这个论文的结果,使用SWIM-IR数据集进行synthetic fine-tuning的多语言检索模型可以达到与人工supervised模型相当的性能,而且可以在三个检索测试benchmark上进行可靠的评估。Abstract
Dense retrieval models have predominantly been studied for English, where models have shown great success, due to the availability of human-labeled training pairs. However, there has been limited success for multilingual retrieval so far, as training data is uneven or scarcely available across multiple languages. Synthetic training data generation is promising (e.g., InPars or Promptagator), but has been investigated only for English. Therefore, to study model capabilities across both cross-lingual and monolingual retrieval tasks, we develop SWIM-IR, a synthetic retrieval training dataset containing 33 (high to very-low resource) languages for training multilingual dense retrieval models without requiring any human supervision. To construct SWIM-IR, we propose SAP (summarize-then-ask prompting), where the large language model (LLM) generates a textual summary prior to the query generation step. SAP assists the LLM in generating informative queries in the target language. Using SWIM-IR, we explore synthetic fine-tuning of multilingual dense retrieval models and evaluate them robustly on three retrieval benchmarks: XOR-Retrieve (cross-lingual), XTREME-UP (cross-lingual) and MIRACL (monolingual). Our models, called SWIM-X, are competitive with human-supervised dense retrieval models, e.g., mContriever, finding that SWIM-IR can cheaply substitute for expensive human-labeled retrieval training data.
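The summarize-then-ask (SAP) prompting pattern can be sketched as two chained LLM calls; the prompt wording and the generate() stub below are assumptions rather than the paper's exact templates.

```python
def generate(prompt: str) -> str:
    """Stub for a large language model call (e.g., an API or a local model)."""
    raise NotImplementedError

def summarize_then_ask(passage: str, target_language: str) -> dict:
    # Step 1: ask the LLM for a concise summary of the passage.
    summary_prompt = (
        "Summarize the following passage in 2-3 sentences.\n\n"
        f"Passage: {passage}\n\nSummary:"
    )
    summary = generate(summary_prompt)

    # Step 2: condition query generation on both the passage and the summary,
    # asking for a question in the target language that the passage answers.
    query_prompt = (
        f"Passage: {passage}\n"
        f"Summary: {summary}\n\n"
        f"Write one natural search query in {target_language} that this passage answers.\n"
        "Query:"
    )
    query = generate(query_prompt)
    return {"query": query, "positive_passage": passage}   # one synthetic training pair
```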
methods: 这个论文使用了 five 种高资源语言和两个 NLP 任务来研究 chatGPT 的表现和自信度准确性。
results: 结果表明所选高资源语言都表现相似,chatGPT 的自信度准确性不良, часто过于自信而从未给出低自信值。Abstract
ChatGPT took the world by storm for its impressive abilities. Due to its release without documentation, scientists immediately attempted to identify its limits, mainly through its performance in natural language processing (NLP) tasks. This paper aims to join the growing literature regarding ChatGPT's abilities by focusing on its performance in high-resource languages and on its capacity to predict its answers' accuracy by giving a confidence level. The analysis of high-resource languages is of interest as studies have shown that low-resource languages perform worse than English in NLP tasks, but no study so far has analysed whether high-resource languages perform as well as English. The analysis of ChatGPT's confidence calibration has not been carried out before either and is critical to learn about ChatGPT's trustworthiness. In order to study these two aspects, five high-resource languages and two NLP tasks were chosen. ChatGPT was asked to perform both tasks in the five languages and to give a numerical confidence value for each answer. The results show that all the selected high-resource languages perform similarly and that ChatGPT does not have a good confidence calibration, often being overconfident and never giving low confidence values.
摘要
chatGPT在全球引起了一阵风波,主要是因为它的各种能力。由于没有相关文档,科学家们很快就开始了对 chatGPT 的研究,主要通过自然语言处理任务来测试它的能力。这篇论文想要加入关于 chatGPT 能力的不断增长的文献,主要是通过分析它在高资源语言上的表现以及它为答案给出置信度的能力。研究高资源语言之所以有意义,是因为已有研究表明低资源语言在 NLP 任务上的表现比英语差,但至今没有研究分析高资源语言是否能与英语表现相当。此外,还没有任何研究分析过 chatGPT 的置信度校准,而这对于了解 chatGPT 的可信度至关重要。为了研究这两个方面,我们选择了五种高资源语言和两个 NLP 任务,并让 chatGPT 在这些语言中完成这两个任务,并给出每个答案的数值置信值。结果显示,所选高资源语言都表现相似,而 chatGPT 的置信度校准不佳,经常过于自信,且从不给出低置信值。
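The calibration check described above reduces to comparing self-reported confidence with empirical accuracy inside confidence bins, a reliability-diagram / expected-calibration-error style computation; the sketch below uses invented data.

```python
import numpy as np

# Self-reported confidences (0-100) and whether each answer was actually correct.
confidence = np.array([95, 90, 99, 85, 97, 92, 88, 96, 93, 91], dtype=float) / 100
correct = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0], dtype=float)

bins = np.linspace(0.0, 1.0, 11)
ece = 0.0
for lo, hi in zip(bins[:-1], bins[1:]):
    mask = (confidence > lo) & (confidence <= hi)
    if mask.any():
        gap = abs(correct[mask].mean() - confidence[mask].mean())
        ece += mask.mean() * gap                 # weight by fraction of samples in the bin
        print(f"bin ({lo:.1f}, {hi:.1f}]: conf={confidence[mask].mean():.2f} "
              f"acc={correct[mask].mean():.2f}")
print(f"expected calibration error: {ece:.3f}")
# Overconfidence shows up as confidence sitting above accuracy in every occupied bin.
```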
Knowledge Graphs are not Created Equal: Exploring the Properties and Structure of Real KGs
results: 研究发现了许多KG的结构和属性特征,并提出了在KG基于模型开发和评估方面的一些建议。Abstract
Despite the recent popularity of knowledge graph (KG) related tasks and benchmarks such as KG embeddings, link prediction, entity alignment and evaluation of the reasoning abilities of pretrained language models as KGs, the structure and properties of real KGs are not well studied. In this paper, we perform a large scale comparative study of 29 real KG datasets from diverse domains such as the natural sciences, medicine, and NLP to analyze their properties and structural patterns. Based on our findings, we make several recommendations regarding KG-based model development and evaluation. We believe that the rich structural information contained in KGs can benefit the development of better KG models across fields and we hope this study will contribute to breaking the existing data silos between different areas of research (e.g., ML, NLP, AI for sciences).
摘要
尽管知识图(KG)相关任务和benchmark在最近几年得到了广泛关注,如KG嵌入、链接预测、实体对Alignment和语言模型的逻辑能力评估等,然而实际的知识图结构和特性尚未得到充分研究。在这篇论文中,我们对29个不同领域的真实知识图进行了大规模比较研究,以分析它们的性质和结构性特征。根据我们的发现,我们提出了一些关于基于知识图的模型开发和评估的建议。我们认为知识图中的丰富结构信息可以帮助开发更好的知识图模型,并且希望这篇研究能够突破现有的数据困境(如机器学习、自然语言处理、人工智能等领域之间的数据困境)。
Analyzing Modular Approaches for Visual Question Decomposition
results: 研究发现,ViperGPT的加成表现主要来自于选择任务特定模块,而不是BLIP-2模型。此外,ViperGPT可以保持大部分表现,只有 modifying 模块选择策略。此外,模块化方法在一些benchmark上比提问方法表现更好,因为它可以使用自然语言来表示子任务。Abstract
Modular neural networks without additional training have recently been shown to surpass end-to-end neural networks on challenging vision-language tasks. The latest such methods simultaneously introduce LLM-based code generation to build programs and a number of skill-specific, task-oriented modules to execute them. In this paper, we focus on ViperGPT and ask where its additional performance comes from and how much is due to the (state-of-art, end-to-end) BLIP-2 model it subsumes vs. additional symbolic components. To do so, we conduct a controlled study (comparing end-to-end, modular, and prompting-based methods across several VQA benchmarks). We find that ViperGPT's reported gains over BLIP-2 can be attributed to its selection of task-specific modules, and when we run ViperGPT using a more task-agnostic selection of modules, these gains go away. Additionally, ViperGPT retains much of its performance if we make prominent alterations to its selection of modules: e.g. removing or retaining only BLIP-2. Finally, we compare ViperGPT against a prompting-based decomposition strategy and find that, on some benchmarks, modular approaches significantly benefit by representing subtasks with natural language, instead of code.
摘要
模块化神经网络无需额外训练,最近已被证明能够在复杂的视觉语言任务上超越端到端神经网络。最新的这类方法同时引入了基于LLM的代码生成以构建程序,以及一些任务特定、面向任务的模块来执行这些程序。在这篇论文中,我们关注ViperGPT,并追问它的额外性能来自哪里,其中有多少归功于它所包含的(最先进的端到端)BLIP-2模型,又有多少来自额外的符号组件。为了回答这个问题,我们进行了一项控制性研究,在多个VQA benchmark上比较端到端、模块化和基于提示的方法。我们发现,ViperGPT相对于BLIP-2的性能增益可归因于其对任务特定模块的选择,而当我们使用一种更任务无关的模块选择策略运行ViperGPT时,这些增益便消失了。此外,即使对其模块选择做出显著改动(例如删除或仅保留BLIP-2),ViperGPT仍保持大部分性能。最后,我们将ViperGPT与一种基于提示的分解策略进行比较,发现在某些benchmark上,模块化方法因使用自然语言(而非代码)表示子任务而显著受益。
Autoregressive Language Models For Estimating the Entropy of Epic EHR Audit Logs
paper_authors: Benjamin C. Warner, Thomas Kannampallil, Seunghwan Kim
for: 这项研究旨在Characterizing clinician workflow on the electronic health record (EHR) through EHR audit logs.
methods: 该研究使用 transformer-based tabular language model (tabular LM) 来度量工作流程中动作序列的 entropy 或混乱程度.
results: 研究发现 tabular LM 可以准确度量工作流程中动作序列的复杂性,并且可以公开发布评估模型 дляFuture research.Abstract
EHR audit logs are a highly granular stream of events that capture clinician activities, and is a significant area of interest for research in characterizing clinician workflow on the electronic health record (EHR). Existing techniques to measure the complexity of workflow through EHR audit logs (audit logs) involve time- or frequency-based cross-sectional aggregations that are unable to capture the full complexity of a EHR session. We briefly evaluate the usage of transformer-based tabular language model (tabular LM) in measuring the entropy or disorderedness of action sequences within workflow and release the evaluated models publicly.
摘要
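A minimal sketch of the entropy measurement: a small autoregressive model is fit over audit-log action codes, and the average cross-entropy of a session under that model serves as its disorderedness score. The GRU model and synthetic sessions are stand-ins for the paper's tabular language model and real EHR audit logs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab = 20                                   # distinct audit-log action codes
seq_len, n_sessions = 32, 256

# Synthetic sessions: a repetitive workflow pattern plus occasional random actions.
pattern = torch.arange(seq_len) % 5
sessions = pattern.repeat(n_sessions, 1)
noise_mask = torch.rand(n_sessions, seq_len) < 0.1
sessions[noise_mask] = torch.randint(0, vocab, (int(noise_mask.sum()),))

class ActionLM(nn.Module):
    """Tiny autoregressive model over action codes (a GRU instead of a transformer)."""
    def __init__(self, vocab, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

model = ActionLM(vocab)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    logits = model(sessions[:, :-1])
    loss = nn.functional.cross_entropy(logits.reshape(-1, vocab),
                                       sessions[:, 1:].reshape(-1))
    loss.backward()
    opt.step()

def session_entropy(session):
    """Average cross-entropy (nats) of a session's actions under the fitted model."""
    with torch.no_grad():
        logits = model(session[None, :-1])
        nll = nn.functional.cross_entropy(logits.reshape(-1, vocab), session[1:])
    return nll.item()

print("entropy of a routine session:", round(session_entropy(sessions[0]), 3))
print("entropy of a shuffled session:",
      round(session_entropy(sessions[0][torch.randperm(seq_len)]), 3))
```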
Distilling Large Language Models using Skill-Occupation Graph Context for HR-Related Tasks
for: This paper aims to bridge the gap in HR applications by introducing a benchmark for various HR tasks, including matching and explaining resumes to job descriptions, extracting skills and experiences from resumes, and editing resumes.
methods: The benchmark is created by distilling domain-specific knowledge from a large language model (LLM) and relying on a curated skill-occupation graph to ensure diversity and provide context for LLMs generation.
results: The student models achieve near/better performance than the teacher model (GPT-4) in various HR tasks, and the benchmark is effective in out-of-distribution data for skill extraction and resume-job description matching in zero-shot and weak supervision manner.Here’s the simplified Chinese text:
results: 学生模型在不同的人力任务中具有near/更好的性能,而且benchmark在对数据集进行零shot和弱监督下的应用中也表现出了效果。Abstract
Numerous HR applications are centered around resumes and job descriptions. While they can benefit from advancements in NLP, particularly large language models, their real-world adoption faces challenges due to absence of comprehensive benchmarks for various HR tasks, and lack of smaller models with competitive capabilities. In this paper, we aim to bridge this gap by introducing the Resume-Job Description Benchmark (RJDB). We meticulously craft this benchmark to cater to a wide array of HR tasks, including matching and explaining resumes to job descriptions, extracting skills and experiences from resumes, and editing resumes. To create this benchmark, we propose to distill domain-specific knowledge from a large language model (LLM). We rely on a curated skill-occupation graph to ensure diversity and provide context for LLMs generation. Our benchmark includes over 50 thousand triples of job descriptions, matched resumes and unmatched resumes. Using RJDB, we train multiple smaller student models. Our experiments reveal that the student models achieve near/better performance than the teacher model (GPT-4), affirming the effectiveness of the benchmark. Additionally, we explore the utility of RJDB on out-of-distribution data for skill extraction and resume-job description matching, in zero-shot and weak supervision manner. We release our datasets and code to foster further research and industry applications.
Transfer Learning for Structured Pruning under Limited Task Data
results: The paper's experiments show that pruned models produced with this framework generalize better than strong baselines.
Abstract
Large, pre-trained models are problematic to use in resource constrained applications. Fortunately, task-aware structured pruning methods offer a solution. These approaches reduce model size by dropping structural units like layers and attention heads in a manner that takes into account the end-task. However, these pruning algorithms require more task-specific data than is typically available. We propose a framework which combines structured pruning with transfer learning to reduce the need for task-specific data. Our empirical results answer questions such as: How should the two tasks be coupled? What parameters should be transferred? And, when during training should transfer learning be introduced? Leveraging these insights, we demonstrate that our framework results in pruned models with improved generalization over strong baselines.
results: DEMUX outperforms strong baselines in 84% of the test cases in the zero-shot setting of disjoint source and target language sets (including multilingual target pools), across three models and four tasks. In low-budget settings (5-100 examples), we observe gains of up to 8-11 F1 points for token-level tasks and 2-5 F1 points for complex tasks. Our code is available at https://github.com/simran-khanuja/demux.
Abstract
We consider the task of optimally fine-tuning pre-trained multilingual models, given small amounts of unlabelled target data and an annotation budget. In this paper, we introduce DEMUX, a framework that prescribes the exact data-points to label from vast amounts of unlabelled multilingual data, having unknown degrees of overlap with the target set. Unlike most prior works, our end-to-end framework is language-agnostic, accounts for model representations, and supports multilingual target configurations. Our active learning strategies rely upon distance and uncertainty measures to select task-specific neighbors that are most informative to label, given a model. DeMuX outperforms strong baselines in 84% of the test cases, in the zero-shot setting of disjoint source and target language sets (including multilingual target pools), across three models and four tasks. Notably, in low-budget settings (5-100 examples), we observe gains of up to 8-11 F1 points for token-level tasks, and 2-5 F1 for complex tasks. Our code is released here: https://github.com/simran-khanuja/demux.
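The abstract above describes active-learning strategies that combine distance and uncertainty signals to decide which unlabelled multilingual points to annotate. The sketch below illustrates that general idea with hypothetical embeddings and predictive probabilities; the scoring function, its weighting, and all array names are assumptions for illustration and do not reproduce DEMUX's actual strategies.

```python
import numpy as np

def select_to_label(unlab_emb, target_emb, pred_probs, budget, alpha=0.5):
    """Score unlabelled points by (i) model uncertainty and (ii) closeness to the
    unlabelled target set in embedding space, then pick the top `budget` points."""
    uncertainty = 1.0 - pred_probs.max(axis=1)                  # low max prob -> uncertain
    dists = np.linalg.norm(unlab_emb[:, None, :] - target_emb[None, :, :], axis=-1)
    closeness = -dists.min(axis=1)                              # nearer to target is better
    closeness = (closeness - closeness.mean()) / (closeness.std() + 1e-9)
    score = alpha * uncertainty + (1.0 - alpha) * closeness
    return np.argsort(-score)[:budget]

rng = np.random.default_rng(0)
unlab_emb = rng.normal(size=(1000, 32))     # hypothetical multilingual pool embeddings
target_emb = rng.normal(size=(50, 32))      # hypothetical unlabelled target-set embeddings
pred_probs = rng.dirichlet(np.ones(5), size=1000)
chosen = select_to_label(unlab_emb, target_emb, pred_probs, budget=100)
print(chosen[:10])
```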
Heaps’ Law in GPT-Neo Large Language Model Emulated Corpora
results: The study finds that the generated corpora of abstracts follow Heaps' law, and that as the GPT-Neo model's parameter count grows, the generated vocabulary adheres more closely to Heaps' law, similar to human-authored text.
Abstract
Heaps' law is an empirical relation in text analysis that predicts vocabulary growth as a function of corpus size. While this law has been validated in diverse human-authored text corpora, its applicability to large language model generated text remains unexplored. This study addresses this gap, focusing on the emulation of corpora using the suite of GPT-Neo large language models. To conduct our investigation, we emulated corpora of PubMed abstracts using three different parameter sizes of the GPT-Neo model. Our emulation strategy involved using the initial five words of each PubMed abstract as a prompt and instructing the model to expand the content up to the original abstract's length. Our findings indicate that the generated corpora adhere to Heaps' law. Interestingly, as the GPT-Neo model size grows, its generated vocabulary increasingly adheres to Heaps' law as observed in human-authored text. To further improve the richness and authenticity of GPT-Neo outputs, future iterations could emphasize enhancing model size or refining the model architecture to curtail vocabulary repetition.
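Heaps' law states that vocabulary size grows with corpus size roughly as V(N) ≈ K * N^beta with 0 < beta < 1. Checking whether an emulated corpus follows it amounts to fitting a line in log-log space, as in this small sketch (the toy token list is illustrative only):

```python
import numpy as np

def heaps_fit(tokens):
    """Fit Heaps'-law parameters K and beta in V(N) ~= K * N**beta,
    where V(N) is the number of distinct tokens among the first N tokens."""
    seen, vocab_sizes = set(), []
    for tok in tokens:
        seen.add(tok)
        vocab_sizes.append(len(seen))
    n = np.arange(1, len(tokens) + 1)
    beta, log_k = np.polyfit(np.log(n), np.log(vocab_sizes), 1)   # slope, intercept
    return np.exp(log_k), beta

# Toy example; in practice `tokens` would be a whole emulated PubMed corpus.
tokens = ("the model generated abstracts were compared against human written "
          "abstracts to check whether vocabulary growth follows heaps law").split()
K, beta = heaps_fit(tokens)
print(f"K = {K:.2f}, beta = {beta:.2f}")
```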
Relation Extraction in underexplored biomedical domains: A diversity-optimised sampling and synthetic data generation approach
paper_authors: Maxime Delmas, Magdalena Wysocka, André Freitas
for: This paper aims to address the issue of limited labeled data in relation extraction tasks, specifically in the context of the natural-products literature.
methods: The authors developed a new sampler inspired by diversity metrics in ecology, called the Greedy Maximum Entropy sampler (GME-sampler), to curate an evaluation dataset for relation extraction. They also explored few-shot learning with open large language models (LLaMA 7B-65B) and synthetic data generation using Vicuna-13B.
results: The authors achieved substantial improvements in relation extraction performance when fine-tuning models on synthetic abstracts rather than the noisy original data. Their best-performing model, BioGPT-Large, achieved an f1-score of 59.0. They also provide the generated synthetic data and the evaluation dataset for future use.
Abstract
The sparsity of labelled data is an obstacle to the development of Relation Extraction models and the completion of databases in various biomedical areas. While being of high interest in drug-discovery, the natural-products literature, reporting the identification of potential bioactive compounds from organisms, is a concrete example of such an overlooked topic. To mark the start of this new task, we created the first curated evaluation dataset and extracted literature items from the LOTUS database to build training sets. To this end, we developed a new sampler inspired by diversity metrics in ecology, named Greedy Maximum Entropy sampler, or GME-sampler (https://github.com/idiap/gme-sampler). The strategic optimization of both balance and diversity of the selected items in the evaluation set is important given the resource-intensive nature of manual curation. After quantifying the noise in the training set, in the form of discrepancies between the input abstracts text and the expected output labels, we explored different strategies accordingly. Framing the task as an end-to-end Relation Extraction, we evaluated the performance of standard fine-tuning as a generative task and few-shot learning with open Large Language Models (LLaMA 7B-65B). In addition to their evaluation in few-shot settings, we explore the potential of open Large Language Models (Vicuna-13B) as synthetic data generator and propose a new workflow for this purpose. All evaluated models exhibited substantial improvements when fine-tuned on synthetic abstracts rather than the original noisy data. We provide our best performing (f1-score=59.0) BioGPT-Large model for end-to-end RE of natural-products relationships along with all the generated synthetic data and the evaluation dataset. See more details at https://github.com/idiap/abroad-re.
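The GME-sampler greedily selects items so that the curated evaluation set stays balanced and diverse, in the spirit of maximum-entropy diversity metrics from ecology. The sketch below shows only a bare greedy entropy-maximization step over class labels; the real sampler (https://github.com/idiap/gme-sampler) operates on richer strata of the LOTUS data, so treat the label-based scoring and the example labels here as illustrative assumptions.

```python
import numpy as np

def entropy(counts):
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def greedy_max_entropy_sample(labels, budget):
    """Greedily add items whose class keeps the Shannon entropy of the selected
    set's label distribution as high as possible (balance and diversity)."""
    labels = np.asarray(labels)
    classes = list(np.unique(labels))
    pools = {c: list(np.where(labels == c)[0]) for c in classes}
    counts = np.zeros(len(classes))
    selected = []
    for _ in range(budget):
        gains = []
        for i, c in enumerate(classes):
            if not pools[c]:
                gains.append(-np.inf)
                continue
            trial = counts.copy()
            trial[i] += 1
            gains.append(entropy(trial))
        best = int(np.argmax(gains))
        selected.append(pools[classes[best]].pop())
        counts[best] += 1
    return selected

labels = ["plant"] * 60 + ["fungus"] * 25 + ["bacterium"] * 15   # hypothetical strata
subset = greedy_max_entropy_sample(labels, budget=30)
print(len(subset))
```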
methods: The study compares definitions from three published dictionaries with definitions generated by variants of ChatGPT.
results: The study finds that (i) definitions from different traditional dictionaries exhibit more surface-form similarity than model-generated definitions, and (ii) ChatGPT definitions are highly accurate, comparable to traditional dictionaries, and retain their accuracy even on low-frequency words, where GloVe and FastText word embeddings do not.
Abstract
Dictionary definitions are historically the arbitrator of what words mean, but this primacy has come under threat by recent progress in NLP, including word embeddings and generative models like ChatGPT. We present an exploratory study of the degree of alignment between word definitions from classical dictionaries and these newer computational artifacts. Specifically, we compare definitions from three published dictionaries to those generated from variants of ChatGPT. We show that (i) definitions from different traditional dictionaries exhibit more surface form similarity than do model-generated definitions, (ii) that the ChatGPT definitions are highly accurate, comparable to traditional dictionaries, and (iii) ChatGPT-based embedding definitions retain their accuracy even on low frequency words, much better than GloVE and FastText word embeddings.
Schema Graph-Guided Prompt for Multi-Domain Dialogue State Tracking
results: Our experiments show that the proposed graph-based method outperforms other multi-domain DST approaches while using the same number of or fewer trainable parameters. We also conduct an extensive study of schema graph architectures, parameter usage, and module ablations to demonstrate the effectiveness of our model on multi-domain dialogue state tracking.
Abstract
Tracking dialogue states is an essential topic in task-oriented dialogue systems, which involve filling in the necessary information in pre-defined slots corresponding to a schema. While general pre-trained language models have been shown effective in slot-filling, their performance is limited when applied to specific domains. We propose a graph-based framework that learns domain-specific prompts by incorporating the dialogue schema. Specifically, we embed domain-specific schema encoded by a graph neural network into the pre-trained language model, which allows for relations in the schema to guide the model for better adaptation to the specific domain. Our experiments demonstrate that the proposed graph-based method outperforms other multi-domain DST approaches while using similar or fewer trainable parameters. We also conduct a comprehensive study of schema graph architectures, parameter usage, and module ablation that demonstrate the effectiveness of our model on multi-domain dialogue state tracking.
Argumentation Element Annotation Modeling using XLNet
paper_authors: Christopher Ormerod, Amy Burkhardt, Mackenzie Young, Sue Lottridge
for: This paper demonstrates the effectiveness of XLNet for annotating argumentative elements in persuasive essays, providing automated feedback on essay organization.
methods: The paper uses XLNet, a transformer-based language model, with a recurrent mechanism to model long-term dependencies in lengthy texts. The model is fine-tuned on three datasets annotated with different schemes.
results: The XLNet models achieved strong performance across all datasets, even surpassing human agreement levels in some cases. The paper highlights the suitability of XLNet for providing automated feedback on essay organization, and provides insights into the relationships between the annotation tags.
Abstract
This study demonstrates the effectiveness of XLNet, a transformer-based language model, for annotating argumentative elements in persuasive essays. XLNet's architecture incorporates a recurrent mechanism that allows it to model long-term dependencies in lengthy texts. Fine-tuned XLNet models were applied to three datasets annotated with different schemes - a proprietary dataset using the Annotations for Revisions and Reflections on Writing (ARROW) scheme, the PERSUADE corpus, and the Argument Annotated Essays (AAE) dataset. The XLNet models achieved strong performance across all datasets, even surpassing human agreement levels in some cases. This shows XLNet capably handles diverse annotation schemes and lengthy essays. Comparisons between the model outputs on different datasets also revealed insights into the relationships between the annotation tags. Overall, XLNet's strong performance on modeling argumentative structures across diverse datasets highlights its suitability for providing automated feedback on essay organization.
Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild
results: The paper develops a grounded theory of LLM red teaming as a community-embedded activity, connecting practitioners' motivations and goals, the strategies and techniques they deploy, and the crucial role the community plays.
Abstract
Engaging in the deliberate generation of abnormal outputs from large language models (LLMs) by attacking them is a novel human activity. This paper presents a thorough exposition of how and why people perform such attacks. Using a formal qualitative methodology, we interviewed dozens of practitioners from a broad range of backgrounds, all contributors to this novel work of attempting to cause LLMs to fail. We relate and connect this activity between its practitioners' motivations and goals; the strategies and techniques they deploy; and the crucial role the community plays. As a result, this paper presents a grounded theory of how and why people attack large language models: LLM red teaming in the wild.
A Comparison of Lexicon-Based and ML-Based Sentiment Analysis: Are There Outlier Words?
results: The study finds that sentiment scores from the two approaches differ across domains, and that no individual lexicon entries systematically cause the differences.
Abstract
Lexicon-based approaches to sentiment analysis of text are based on each word or lexical entry having a pre-defined weight indicating its sentiment polarity. These weights are usually assigned manually, but how accurate they are compared with machine-learning-based approaches to computing sentiment is not known. It may be that there are lexical entries whose sentiment values cause a lexicon-based approach to give results which are very different to a machine learning approach. In this paper we compute sentiment for more than 150,000 English language texts drawn from 4 domains using the Hedonometer, a lexicon-based technique, and Azure, a contemporary machine-learning based approach which is part of the Azure Cognitive Services family of APIs and is easy to use. We model differences in sentiment scores between approaches for documents in each domain using a regression and analyse the independent variables (Hedonometer lexical entries) as indicators of each word's importance and contribution to the score differences. Our findings are that the importance of a word depends on the domain and there are no standout lexical entries which systematically cause differences in sentiment scores.
results: The paper shows that this Hopf-algebra-based model helps address some recent controversies about the implications of current large language models for generative linguistics, and provides a new way of describing how meaning is extracted from syntactic expressions.
Abstract
We extend our formulation of Merge and Minimalism in terms of Hopf algebras to an algebraic model of a syntactic-semantic interface. We show that methods adopted in the formulation of renormalization (extraction of meaningful physical values) in theoretical physics are relevant to describe the extraction of meaning from syntactic expressions. We show how this formulation relates to computational models of semantics and we answer some recent controversies about implications for generative linguistics of the current functioning of large language models.
Is it indeed bigger better? The comprehensive study of claim detection LMs applied for disinformation tackling
paper_authors: Martin Hyben, Sebastian Kula, Ivan Srba, Robert Moro, Jakub Simko
for: Compares the performance of fine-tuned models and extremely large language models on the task of check-worthy claim detection.
methods: Uses a multilingual and multi-topical dataset and a benchmark analysis to determine the most general multilingual and multi-topical claim detector.
results: Despite technological progress in natural language processing, fine-tuned models outperform zero-shot approaches in cross-domain settings.
Abstract
This study compares the performance of (1) fine-tuned models and (2) extremely large language models on the task of check-worthy claim detection. For the purpose of the comparison we composed a multilingual and multi-topical dataset comprising texts of various sources and styles. Building on this, we performed a benchmark analysis to determine the most general multilingual and multi-topical claim detector. We chose three state-of-the-art models in the check-worthy claim detection task and fine-tuned them. Furthermore, we selected three state-of-the-art extremely large language models without any fine-tuning. We made modifications to the models to adapt them for multilingual settings and through extensive experimentation and evaluation. We assessed the performance of all the models in terms of accuracy, recall, and F1-score in in-domain and cross-domain scenarios. Our results demonstrate that despite the technological progress in the area of natural language processing, the models fine-tuned for the task of check-worthy claim detection still outperform the zero-shot approaches in a cross-domain settings.
Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration
results: The study finds that existing membership inference techniques cannot effectively reveal privacy leakage from practically fine-tuned LLMs, because they assume that training records consistently have a higher probability of being sampled, an assumption that is weakened by the regularization applied during training and by the generalization ability of LLMs.
Abstract
Membership Inference Attacks (MIA) aim to infer whether a target data record has been utilized for model training or not. Prior attempts have quantified the privacy risks of language models (LMs) via MIAs, but there is still no consensus on whether existing MIA algorithms can cause remarkable privacy leakage on practical Large Language Models (LLMs). Existing MIAs designed for LMs can be classified into two categories: reference-free and reference-based attacks. They are both based on the hypothesis that training records consistently strike a higher probability of being sampled. Nevertheless, this hypothesis heavily relies on the overfitting of target models, which will be mitigated by multiple regularization methods and the generalization of LLMs. The reference-based attack seems to achieve promising effectiveness in LLMs, which measures a more reliable membership signal by comparing the probability discrepancy between the target model and the reference model. However, the performance of reference-based attack is highly dependent on a reference dataset that closely resembles the training dataset, which is usually inaccessible in the practical scenario. Overall, existing MIAs are unable to effectively unveil privacy leakage over practical fine-tuned LLMs that are overfitting-free and private. We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA). Specifically, since memorization in LLMs is inevitable during the training process and occurs before overfitting, we introduce a more reliable membership signal, probabilistic variation, which is based on memorization rather than overfitting. Furthermore, we introduce a self-prompt approach, which constructs the dataset to fine-tune the reference model by prompting the target LLM itself. In this manner, the adversary can collect a dataset with a similar distribution from public APIs.
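Reference-based attacks of the kind discussed above compare how the target model and a reference model score the same record; a record that the fine-tuned target model likes much more than the reference is a more plausible training member. The sketch below shows only that calibrated-score idea with hypothetical per-record losses; SPV-MIA's probabilistic-variation signal and self-prompted reference construction are more involved and are not reproduced here.

```python
import numpy as np

def calibrated_membership_scores(target_nll, reference_nll):
    """Reference-calibrated membership signal: a larger score means the target
    model assigns the record a much lower loss than the reference model does,
    which makes the record a more plausible training member."""
    return np.asarray(reference_nll) - np.asarray(target_nll)

# Hypothetical per-record negative log-likelihoods (illustrative values only).
target_nll = np.array([2.1, 3.4, 1.8, 3.9])      # from the fine-tuned target LLM
reference_nll = np.array([3.0, 3.5, 3.1, 3.8])   # from a self-prompted reference model
scores = calibrated_membership_scores(target_nll, reference_nll)
predicted_member = scores > 0.5                   # toy decision threshold
print(scores, predicted_member)
```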
Multi-Label Topic Model for Financial Textual Data
results: The author finds that stock market reactions differ across topics and depend on which topics co-occur. For example, announcements of new Large Scale Projects or Bankruptcy Filings trigger strong positive or negative market reactions, while some other topics show no significant price effects. In addition, in contrast to previous studies, the multi-label structure of the model makes it possible to analyze how co-occurring topics interact.
Abstract
This paper presents a multi-label topic model for financial texts like ad-hoc announcements, 8-K filings, finance related news or annual reports. I train the model on a new financial multi-label database consisting of 3,044 German ad-hoc announcements that are labeled manually using 20 predefined, economically motivated topics. The best model achieves a macro F1 score of more than 85%. Translating the data results in an English version of the model with similar performance. As application of the model, I investigate differences in stock market reactions across topics. I find evidence for strong positive or negative market reactions for some topics, like announcements of new Large Scale Projects or Bankruptcy Filings, while I do not observe significant price effects for some other topics. Furthermore, in contrast to previous studies, the multi-label structure of the model allows to analyze the effects of co-occurring topics on stock market reactions. For many cases, the reaction to a specific topic depends heavily on the co-occurrence with other topics. For example, if allocated capital from a Seasoned Equity Offering (SEO) is used for restructuring a company in the course of a Bankruptcy Proceeding, the market reacts positively on average. However, if that capital is used for covering unexpected, additional costs from the development of new drugs, the SEO implies negative reactions on average.
ChiMed-GPT: A Chinese Medical Large Language Model with Full Training Regime and Better Alignment to Human Preferences
results: On real-world tasks such as information extraction, question answering, and dialogue generation, ChiMed-GPT performs significantly better than general-domain language models. In addition, the paper probes possible biases by prompting ChiMed-GPT with attitude scales regarding discrimination against patients, contributing to the further responsible development of LLMs in the medical domain.
Abstract
Recently, the increasing demand for superior medical services has highlighted the discrepancies in the medical infrastructure. With big data, especially texts, forming the foundation of medical services, there is an exigent need for effective natural language processing (NLP) solutions tailored to the healthcare domain. Conventional approaches leveraging pre-trained models present promising results in this domain and current large language models (LLMs) offer advanced foundation for medical text processing. However, most medical LLMs are trained only with supervised fine-tuning (SFT), even though it efficiently empowers LLMs to understand and respond to medical instructions but is ineffective in learning domain knowledge and aligning with human preference. Another engineering barrier that prevents current medical LLM from better text processing ability is their restricted context length (e.g., 2,048 tokens), making it hard for the LLMs to process long context, which is frequently required in the medical domain. In this work, we propose ChiMed-GPT, a new benchmark LLM designed explicitly for Chinese medical domain, with enlarged context length to 4,096 tokens and undergoes a comprehensive training regime with pre-training, SFT, and RLHF. Evaluations on real-world tasks including information extraction, question answering, and dialogue generation demonstrate ChiMed-GPT's superior performance over general domain LLMs. Furthermore, we analyze possible biases through prompting ChiMed-GPT to perform attitude scales regarding discrimination of patients, so as to contribute to further responsible development of LLMs in the medical domain. The code and model are released at https://github.com/synlp/ChiMed-GPT.
Large Language Models are Zero Shot Hypothesis Proposers
for: investigate whether LLMs can propose scientific hypotheses
methods: construct a dataset of background knowledge and hypothesis pairs from biomedical literature, evaluate the hypothesis generation capabilities of various top-tier instructed models in zero-shot, few-shot, and fine-tuning settings
results: LLMs surprisingly generate untrained yet validated hypotheses from the test literature, and increasing uncertainty facilitates candidate generation, potentially enhancing zero-shot hypothesis generation capabilities.
Abstract
Significant scientific discoveries have driven the progress of human civilisation. The explosion of scientific literature and data has created information barriers across disciplines that have slowed the pace of scientific discovery. Large Language Models (LLMs) hold a wealth of global and interdisciplinary knowledge that promises to break down these information barriers and foster a new wave of scientific discovery. However, the potential of LLMs for scientific discovery has not been formally explored. In this paper, we start from investigating whether LLMs can propose scientific hypotheses. To this end, we construct a dataset consist of background knowledge and hypothesis pairs from biomedical literature. The dataset is divided into training, seen, and unseen test sets based on the publication date to control visibility. We subsequently evaluate the hypothesis generation capabilities of various top-tier instructed models in zero-shot, few-shot, and fine-tuning settings, including both closed and open-source LLMs. Additionally, we introduce an LLM-based multi-agent cooperative framework with different role designs and external tools to enhance the capabilities related to generating hypotheses. We also design four metrics through a comprehensive review to evaluate the generated hypotheses for both ChatGPT-based and human evaluations. Through experiments and analyses, we arrive at the following findings: 1) LLMs surprisingly generate untrained yet validated hypotheses from testing literature. 2) Increasing uncertainty facilitates candidate generation, potentially enhancing zero-shot hypothesis generation capabilities. These findings strongly support the potential of LLMs as catalysts for new scientific discoveries and guide further exploration.
Chain of Thought with Explicit Evidence Reasoning for Few-shot Relation Extraction
results: The paper proposes CoT-ER, a method that uses large language models to generate evidence and then explicitly incorporates that evidence into chain-of-thought prompting for relation extraction. Experiments show that CoT-ER (with 0% training data) achieves performance competitive with the fully supervised (100% training data) state of the art on the FewRel1.0 and FewRel2.0 datasets.
Abstract
Few-shot relation extraction involves identifying the type of relationship between two specific entities within a text, using a limited number of annotated samples. A variety of solutions to this problem have emerged by applying meta-learning and neural graph techniques which typically necessitate a training process for adaptation. Recently, the strategy of in-context learning has been demonstrating notable results without the need of training. Few studies have already utilized in-context learning for zero-shot information extraction. Unfortunately, the evidence for inference is either not considered or implicitly modeled during the construction of chain-of-thought prompts. In this paper, we propose a novel approach for few-shot relation extraction using large language models, named CoT-ER, chain-of-thought with explicit evidence reasoning. In particular, CoT-ER first induces large language models to generate evidences using task-specific and concept-level knowledge. Then these evidences are explicitly incorporated into chain-of-thought prompting for relation extraction. Experimental results demonstrate that our CoT-ER approach (with 0% training data) achieves competitive performance compared to the fully-supervised (with 100% training data) state-of-the-art approach on the FewRel1.0 and FewRel2.0 datasets.
Citation Recommendation on Scholarly Legal Articles
paper_authors: Doğukan Arslan, Saadet Sena Erdoğan, Gülşen Eryiğit
for: The paper introduces the first scholarly legal dataset for the task of citation recommendation.
methods: The paper runs experiments with state-of-the-art models on this dataset and compares their performance in the legal domain.
results: The results show that, while BM25 is a strong baseline, pre-fetching with BM25+ followed by re-ranking with SciNCL raises the baseline from 0.26 to 0.30 MAP@10, and that fine-tuning further improves the pre-trained models.
Abstract
Citation recommendation is the task of finding appropriate citations based on a given piece of text. The proposed datasets for this task consist mainly of several scientific fields, lacking some core ones, such as law. Furthermore, citation recommendation is used within the legal domain to identify supporting arguments, utilizing non-scholarly legal articles. In order to alleviate the limitations of existing studies, we gather the first scholarly legal dataset for the task of citation recommendation. Also, we conduct experiments with state-of-the-art models and compare their performance on this dataset. The study suggests that, while BM25 is a strong benchmark for the legal citation recommendation task, the most effective method involves implementing a two-step process that entails pre-fetching with BM25+, followed by re-ranking with SciNCL, which enhances the performance of the baseline from 0.26 to 0.30 MAP@10. Moreover, fine-tuning leads to considerable performance increases in pre-trained models, which shows the importance of including legal articles in the training data of these models.
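The winning recipe in the abstract above is a two-step retrieve-then-rerank pipeline: a cheap lexical pre-fetch with BM25+ followed by re-ranking the shortlist with a dense scientific-document encoder such as SciNCL. A minimal sketch of that pipeline is below; the `rank_bm25` and `sentence-transformers` libraries and the `malteos/scincl` model id are assumptions chosen for illustration, and any comparable encoder could be swapped in.

```python
from rank_bm25 import BM25Plus
from sentence_transformers import SentenceTransformer, util

corpus = ["...legal article 1 title and abstract...",
          "...legal article 2 title and abstract..."]          # candidate citations
query = "...citing context from the scholarly legal article..."

# Step 1: lexical pre-fetch with BM25+.
bm25 = BM25Plus([doc.lower().split() for doc in corpus])
lexical = bm25.get_scores(query.lower().split())
shortlist = sorted(range(len(corpus)), key=lambda i: -lexical[i])[:100]

# Step 2: re-rank the shortlist with a dense encoder (model id assumed).
encoder = SentenceTransformer("malteos/scincl")
q_emb = encoder.encode(query, convert_to_tensor=True)
d_emb = encoder.encode([corpus[i] for i in shortlist], convert_to_tensor=True)
sims = util.cos_sim(q_emb, d_emb)[0]
ranked = [shortlist[i] for i in sims.argsort(descending=True).tolist()]
print(ranked[:10])   # top-10 recommended citations
```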
Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification
paper_authors: Reza Esfandiarpoor, Stephen H. Bach
for: This paper aims to improve the performance of vision-language models like CLIP for image classification by extending class descriptions with related attributes.
methods: The proposed method, Follow-up Differential Descriptions (FuDD), uses a Large Language Model (LLM) to generate new class descriptions that differentiate between ambiguous classes.
results: FuDD consistently outperforms generic description ensembles and naive LLM-generated descriptions on 12 datasets, and the high-quality natural language class descriptions produced by FuDD result in performance comparable to few-shot adaptation methods.
Abstract
A promising approach for improving the performance of vision-language models like CLIP for image classification is to extend the class descriptions (i.e., prompts) with related attributes, e.g., using brown sparrow instead of sparrow. However, current zero-shot methods select a subset of attributes regardless of commonalities between the target classes, potentially providing no useful information that would have helped to distinguish between them. For instance, they may use color instead of bill shape to distinguish between sparrows and wrens, which are both brown. We propose Follow-up Differential Descriptions (FuDD), a zero-shot approach that tailors the class descriptions to each dataset and leads to additional attributes that better differentiate the target classes. FuDD first identifies the ambiguous classes for each image, and then uses a Large Language Model (LLM) to generate new class descriptions that differentiate between them. The new class descriptions resolve the initial ambiguity and help predict the correct label. In our experiments, FuDD consistently outperforms generic description ensembles and naive LLM-generated descriptions on 12 datasets. We show that differential descriptions are an effective tool to resolve class ambiguities, which otherwise significantly degrade the performance. We also show that high quality natural language class descriptions produced by FuDD result in comparable performance to few-shot adaptation methods.
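FuDD's final prediction step scores an image against several natural-language descriptions per class and averages them, with the differential descriptions supplied by an LLM for the ambiguous classes. The sketch below shows only that scoring step with CLIP from the `transformers` library; the bird descriptions, image path, and single-round setup are illustrative assumptions, and the LLM-driven generation of follow-up descriptions is not shown.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical differential descriptions for two ambiguous classes.
descriptions = {
    "sparrow": ["a photo of a sparrow, a small brown bird with a short conical bill",
                "a photo of a sparrow with streaked plumage and a stout beak"],
    "wren":    ["a photo of a wren, a small brown bird with a thin, slightly curved bill",
                "a photo of a wren with a barred tail often held upright"],
}
image = Image.open("bird.jpg")          # illustrative image path

flat = [d for ds in descriptions.values() for d in ds]
inputs = processor(text=flat, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image[0]      # one score per description

# Average the description scores within each class and predict the best class.
scores, start = {}, 0
for cls, ds in descriptions.items():
    scores[cls] = logits[start:start + len(ds)].mean().item()
    start += len(ds)
print(max(scores, key=scores.get))
```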
Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications
results: The paper outlines future research directions, including data augmentation, knowledge editing, and model enhancement.
Abstract
Large language models (LLMs) exhibit superior performance on various natural language tasks, but they are susceptible to issues stemming from outdated data and domain-specific limitations. In order to address these challenges, researchers have pursued two primary strategies, knowledge editing and retrieval augmentation, to enhance LLMs by incorporating external information from different aspects. Nevertheless, there is still a notable absence of a comprehensive survey. In this paper, we propose a review to discuss the trends in integration of knowledge and large language models, including taxonomy of methods, benchmarks, and applications. In addition, we conduct an in-depth analysis of different methods and point out potential research directions in the future. We hope this survey offers the community quick access and a comprehensive overview of this research area, with the intention of inspiring future research endeavors.
results: The results show that PRM-based methods improve accuracy on simple mathematical reasoning (GSM8K) but, unexpectedly, reduce performance on complex tasks (MATH), and that the reward aggregation function plays a critical role in model performance.
Abstract
While recent advances have boosted LM proficiency in linguistic benchmarks, LMs consistently struggle to reason correctly on complex tasks like mathematics. We turn to Reinforcement Learning from Human Feedback (RLHF) as a method with which to shape model reasoning processes. In particular, we explore two reward schemes, outcome-supervised reward models (ORMs) and process-supervised reward models (PRMs), to optimize for logical reasoning. Our results show that the fine-grained reward provided by PRM-based methods enhances accuracy on simple mathematical reasoning (GSM8K) while, unexpectedly, reducing performance in complex tasks (MATH). Furthermore, we show the critical role reward aggregation functions play in model performance. Providing promising avenues for future research, our study underscores the need for further exploration into fine-grained reward modeling for more reliable language models.
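A process-supervised reward model scores every reasoning step, so some function must collapse those per-step scores into the single reward used for training or reranking, and the abstract reports that this choice matters a great deal. The snippet below lists a few common aggregation choices; the specific scores and the set of functions are illustrative, not the paper's exact design.

```python
import math

def aggregate_step_rewards(step_rewards, how="min"):
    """Collapse per-step PRM scores into one sequence-level reward."""
    if how == "min":        # the chain is only as good as its weakest step
        return min(step_rewards)
    if how == "prod":       # product of per-step correctness probabilities
        return math.prod(step_rewards)
    if how == "mean":       # average step quality
        return sum(step_rewards) / len(step_rewards)
    if how == "last":       # ORM-like: only the final step is scored
        return step_rewards[-1]
    raise ValueError(f"unknown aggregation: {how}")

steps = [0.95, 0.90, 0.40, 0.99]   # hypothetical PRM scores for a 4-step solution
print({h: round(aggregate_step_rewards(steps, h), 3)
       for h in ("min", "prod", "mean", "last")})
```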
CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model
results: Experiments on several existing LLMs show that, while some models excel at specific tasks, overall there is still substantial room for improvement on basic financial text-processing tasks.
Abstract
Large language models (LLMs) have demonstrated great potential in the financial domain. Thus, it becomes important to assess the performance of LLMs in the financial tasks. In this work, we introduce CFBenchmark, to evaluate the performance of LLMs for Chinese financial assistant. The basic version of CFBenchmark is designed to evaluate the basic ability in Chinese financial text processing from three aspects (i.e., recognition, classification, and generation) including eight tasks, and includes financial texts ranging in length from 50 to over 1,800 characters. We conduct experiments on several LLMs available in the literature with CFBenchmark-Basic, and the experimental results indicate that while some LLMs show outstanding performance in specific tasks, overall, there is still significant room for improvement in basic tasks of financial text processing with existing models. In the future, we plan to explore the advanced version of CFBenchmark, aiming to further explore the extensive capabilities of language models in more profound dimensions as a financial assistant in Chinese. Our codes are released at https://github.com/TongjiFinLab/CFBenchmark.
results: The approach yields favorable results.
Abstract
Detecting anomalies in a daily time series with a weekly pattern is a common task with a wide range of applications. A typical way of performing the task is by using a decomposition method. However, the method often generates false positive results where a data point falls within its weekly range but is just off from its weekday position. We refer to this type of anomaly as an "in-season anomaly", and propose a k-parameter approach to address the issue. The approach provides configurable extra tolerance for in-season anomalies to suppress misleading alerts while preserving real positives. It yields favorable results.
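The abstract above does not spell out the k-parameter mechanics, so the sketch below is only one plausible reading of the idea: when judging a point, build its normal range not just from its own weekday but from weekdays within k positions of it, which tolerates values that are merely shifted off their weekday slot. All function names, thresholds, and the synthetic series are assumptions.

```python
import numpy as np

def weekly_anomalies(values, k=1, z=3.0):
    """Flag anomalies in a daily series with a weekly pattern, allowing extra
    tolerance for 'in-season' points that merely drift off their weekday slot."""
    values = np.asarray(values, dtype=float)
    weekdays = np.arange(len(values)) % 7
    flags = np.zeros(len(values), dtype=bool)
    for d in range(7):
        nearby = [(d + off) % 7 for off in range(-k, k + 1)]   # weekday d and its k neighbours
        ref = values[np.isin(weekdays, nearby)]
        mu, sd = ref.mean(), ref.std() + 1e-9
        idx = weekdays == d
        flags[idx] = np.abs(values[idx] - mu) > z * sd
    return flags

rng = np.random.default_rng(0)
series = np.tile([5, 5, 5, 5, 5, 20, 22], 8).astype(float) + rng.normal(0, 0.5, 56)
series[40] = 60.0                                   # a genuine anomaly
print(np.where(weekly_anomalies(series, k=1))[0])
```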
results: The paper establishes a trichotomy for the minimal number of learner mistakes, determined by a combination of the VC dimension and the Littlestone dimension, and provides a variety of bounds, including a new lower bound that improves on the previously known one. The results are also extended to multiclass classification and the agnostic setting.
Abstract
We present new upper and lower bounds on the number of learner mistakes in the `transductive' online learning setting of Ben-David, Kushilevitz and Mansour (1997). This setting is similar to standard online learning, except that the adversary fixes a sequence of instances $x_1,\dots,x_n$ to be labeled at the start of the game, and this sequence is known to the learner. Qualitatively, we prove a trichotomy, stating that the minimal number of mistakes made by the learner as $n$ grows can take only one of precisely three possible values: $n$, $\Theta\left(\log (n)\right)$, or $\Theta(1)$. Furthermore, this behavior is determined by a combination of the VC dimension and the Littlestone dimension. Quantitatively, we show a variety of bounds relating the number of mistakes to well-known combinatorial dimensions. In particular, we improve the known lower bound on the constant in the $\Theta(1)$ case from $\Omega\left(\sqrt{\log(d)}\right)$ to $\Omega(\log(d))$ where $d$ is the Littlestone dimension. Finally, we extend our results to cover multiclass classification and the agnostic setting.
A comprehensive analysis of concept drift locality in data streams
for: The study investigates concept drift detection, which is needed for effective model adaptation in online learning.
methods: The study compares nine state-of-the-art concept drift detectors across benchmark problems of varying difficulty, derived from a new categorization of drift by locality and scale.
results: The study finds that drift locality and scale strongly influence classifier performance, and it proposes adaptation strategies for the different drift categories to minimize recovery time.
Abstract
Adapting to drifting data streams is a significant challenge in online learning. Concept drift must be detected for effective model adaptation to evolving data properties. Concept drift can impact the data distribution entirely or partially, which makes it difficult for drift detectors to accurately identify the concept drift. Despite the numerous concept drift detectors in the literature, standardized procedures and benchmarks for comprehensive evaluation considering the locality of the drift are lacking. We present a novel categorization of concept drift based on its locality and scale. A systematic approach leads to a set of 2,760 benchmark problems, reflecting various difficulty levels following our proposed categorization. We conduct a comparative assessment of 9 state-of-the-art drift detectors across diverse difficulties, highlighting their strengths and weaknesses for future research. We examine how drift locality influences the classifier performance and propose strategies for different drift categories to minimize the recovery time. Lastly, we provide lessons learned and recommendations for future concept drift research. Our benchmark data streams and experiments are publicly available at https://github.com/gabrieljaguiar/locality-concept-drift.
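Drift detectors of the kind benchmarked above typically watch a stream of the classifier's errors and raise an alarm when the stream's statistics shift. As a concrete, self-contained example, here is a minimal Page-Hinkley test, one classic detector; it is not one of the nine detectors evaluated in the paper, and the parameter values are illustrative.

```python
class PageHinkley:
    """Minimal Page-Hinkley test for detecting an upward shift in a stream's mean
    (e.g., a rise in a classifier's online error rate)."""
    def __init__(self, delta=0.005, threshold=20.0):
        self.delta, self.threshold = delta, threshold
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0
        self.cum_min = 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.threshold   # True -> drift alarm

detector = PageHinkley()
error_stream = [0.0] * 500 + [1.0] * 100     # abrupt drift: errors jump after t = 500
for t, err in enumerate(error_stream):
    if detector.update(err):
        print("drift detected at step", t)
        break
```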
A statistical perspective on algorithm unrolling models for inverse problems
results: The paper shows that the unrolling depth needed for the optimal statistical performance of the Gradient Descent Network (GDN) is of order $\mathcal{O}(\log(n)/\log(\varrho_n^{-1}))$, where $n$ is the sample size and $\varrho_n$ is the convergence rate of the corresponding gradient descent algorithm. Moreover, when the negative log-density of the latent variable ${\bf x}$ has a simple proximal operator, a GDN unrolled at depth $D'$ can solve the inverse problem at the parametric rate $O(D'/\sqrt{n})$.
Abstract
We consider inverse problems where the conditional distribution of the observation ${\bf y}$ given the latent variable of interest ${\bf x}$ (also known as the forward model) is known, and we have access to a data set in which multiple instances of ${\bf x}$ and ${\bf y}$ are both observed. In this context, algorithm unrolling has become a very popular approach for designing state-of-the-art deep neural network architectures that effectively exploit the forward model. We analyze the statistical complexity of the gradient descent network (GDN), an algorithm unrolling architecture driven by proximal gradient descent. We show that the unrolling depth needed for the optimal statistical performance of GDNs is of order $\log(n)/\log(\varrho_n^{-1})$, where $n$ is the sample size, and $\varrho_n$ is the convergence rate of the corresponding gradient descent algorithm. We also show that when the negative log-density of the latent variable ${\bf x}$ has a simple proximal operator, then a GDN unrolled at depth $D'$ can solve the inverse problem at the parametric rate $O(D'/\sqrt{n})$. Our results thus also suggest that algorithm unrolling models are prone to overfitting as the unrolling depth $D'$ increases. We provide several examples to illustrate these results.
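A gradient descent network of the kind analyzed above is proximal gradient descent truncated at a fixed depth, with each iteration viewed as one network layer (and, in the learned version, with trainable parameters). The sketch below unrolls plain ISTA for a sparse linear inverse problem to make the depth parameter concrete; the fixed step size and soft-thresholding prox are standard choices, not the paper's learned variant.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def unrolled_ista(A, y, depth, lam):
    """Unrolled proximal gradient descent (ISTA) for min_x 0.5*||Ax - y||^2 + lam*||x||_1.
    Each of the `depth` iterations plays the role of one network layer."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(depth):
        x = soft_threshold(x - (A.T @ (A @ x - y)) / L, lam / L)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 100))
x_true = np.zeros(100)
x_true[[3, 40, 77]] = [1.5, -2.0, 0.8]
y = A @ x_true + 0.01 * rng.normal(size=30)
x_hat = unrolled_ista(A, y, depth=50, lam=0.1)
print(np.flatnonzero(np.abs(x_hat) > 0.1))    # recovered support
```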
Theory and implementation of inelastic Constitutive Artificial Neural Networks
paper_authors: Hagen Holthusen, Lukas Lamm, Tim Brepols, Stefanie Reese, Ellen Kuhl
for: The paper aims to develop a new method, inelastic Constitutive Artificial Neural Networks (iCANN), to model the inelastic behavior of materials.
methods: The paper combines feed-forward networks for the free energy and pseudo potential with a recurrent neural network approach to take time dependencies into account.
results: The paper demonstrates that the iCANN is capable of autonomously discovering models for artificially generated data, the response of polymers under cyclic loading, and the relaxation behavior of muscle data.
Abstract
Nature has always been our inspiration in the research, design and development of materials and has driven us to gain a deep understanding of the mechanisms that characterize anisotropy and inelastic behavior. All this knowledge has been accumulated in the principles of thermodynamics. Deduced from these principles, the multiplicative decomposition combined with pseudo potentials are powerful and universal concepts. Simultaneously, the tremendous increase in computational performance enabled us to investigate and rethink our history-dependent material models to make the most of our predictions. Today, we have reached a point where materials and their models are becoming increasingly sophisticated. This raises the question: How do we find the best model that includes all inelastic effects to explain our complex data? Constitutive Artificial Neural Networks (CANN) may answer this question. Here, we extend the CANNs to inelastic materials (iCANN). Rigorous considerations of objectivity, rigid motion of the reference configuration, multiplicative decomposition and its inherent non-uniqueness, restrictions of energy and pseudo potential, and consistent evolution guide us towards the architecture of the iCANN satisfying thermodynamics per design. We combine feed-forward networks of the free energy and pseudo potential with a recurrent neural network approach to take time dependencies into account. We demonstrate that the iCANN is capable of autonomously discovering models for artificially generated data, the response of polymers for cyclic loading and the relaxation behavior of muscle data. As the design of the network is not limited to visco-elasticity, our vision is that the iCANN will reveal to us new ways to find the various inelastic phenomena hidden in the data and to understand their interaction. Our source code, data, and examples are available at doi.org/10.5281/zenodo.10066805
Higher-Order Newton Methods with Polynomial Work per Iteration
results: The method has local convergence of order $d$ and lower oracle complexity than the classical Newton method. Numerical examples show that the basins of attraction around local minima can grow as $d$ increases. Under additional assumptions, the paper also presents a modified algorithm, still with polynomial cost per iteration, that is globally convergent and has local convergence of order $d$.
Abstract
We present generalizations of Newton's method that incorporate derivatives of an arbitrary order $d$ but maintain a polynomial dependence on dimension in their cost per iteration. At each step, our $d^{\text{th}}$-order method uses semidefinite programming to construct and minimize a sum of squares-convex approximation to the $d^{\text{th}}$-order Taylor expansion of the function we wish to minimize. We prove that our $d^{\text{th}}$-order method has local convergence of order $d$. This results in lower oracle complexity compared to the classical Newton method. We show on numerical examples that basins of attraction around local minima can get larger as $d$ increases. Under additional assumptions, we present a modified algorithm, again with polynomial cost per iteration, which is globally convergent and has local convergence of order $d$.
Blockchain-Enabled Federated Learning Approach for Vehicular Networks
results: Even in a malicious (attacked) vehicle setting, the approach maintains high accuracy (91.92%), making it more secure and reliable than other decentralized federated learning techniques.
Abstract
Data from interconnected vehicles may contain sensitive information such as location, driving behavior, personal identifiers, etc. Without adequate safeguards, sharing this data jeopardizes data privacy and system security. The current centralized data-sharing paradigm in these systems raises particular concerns about data privacy. Recognizing these challenges, the shift towards decentralized interactions in technology, as echoed by the principles of Industry 5.0, becomes paramount. This work is closely aligned with these principles, emphasizing decentralized, human-centric, and secure technological interactions in an interconnected vehicular ecosystem. To embody this, we propose a practical approach that merges two emerging technologies: Federated Learning (FL) and Blockchain. The integration of these technologies enables the creation of a decentralized vehicular network. In this setting, vehicles can learn from each other without compromising privacy while also ensuring data integrity and accountability. Initial experiments show that compared to conventional decentralized federated learning techniques, our proposed approach significantly enhances the performance and security of vehicular networks. The system's accuracy stands at 91.92\%. While this may appear to be low in comparison to state-of-the-art federated learning models, our work is noteworthy because, unlike others, it was achieved in a malicious vehicle setting. Despite the challenging environment, our method maintains high accuracy, making it a competent solution for preserving data privacy in vehicular networks.
The AeroSonicDB (YPAD-0523) Dataset for Acoustic Detection and Classification of Aircraft
results: Baseline results from three binary classification models are presented, and the limitations of the current dataset and its future potential are discussed.
Abstract
The time and expense required to collect and label audio data have been a prohibitive factor in the availability of domain specific audio datasets. As the predictive specificity of a classifier depends on the specificity of the labels it is trained on, it follows that finely-labelled datasets are crucial for advances in machine learning. Aiming to stimulate progress in the field of machine listening, this paper introduces AeroSonicDB (YPAD-0523), a dataset of low-flying aircraft sounds for training acoustic detection and classification systems. This paper describes the method of exploiting ADS-B radio transmissions to passively collect and label audio samples, provides a summary of the collated dataset, presents baseline results from three binary classification models, and then discusses the limitations of the current dataset and its future potential. The dataset contains 625 aircraft recordings ranging in event duration from 18 to 60 seconds, for a total of 8.87 hours of aircraft audio. These 625 samples feature 301 unique aircraft, each of which is supplied with 14 supplementary (non-acoustic) labels to describe the aircraft. The dataset also contains 3.52 hours of ambient background audio ("silence"), as a means to distinguish aircraft noise from other local environmental noises. Additionally, 6 hours of urban soundscape recordings (with aircraft annotations) are included as an ancillary method for evaluating model performance, and to provide a testing ground for real-time applications.
CALLOC: Curriculum Adversarial Learning for Secure and Robust Indoor Localization
results: Experiments show that CALLOC improves accuracy across diverse indoor scenarios, mobile devices, and adversarial attack scenarios, with up to 6.03x lower mean error and 4.6x lower worst-case error than state-of-the-art indoor localization frameworks.
Abstract
Indoor localization has become increasingly vital for many applications from tracking assets to delivering personalized services. Yet, achieving pinpoint accuracy remains a challenge due to variations across indoor environments and devices used to assist with localization. Another emerging challenge is adversarial attacks on indoor localization systems that not only threaten service integrity but also reduce localization accuracy. To combat these challenges, we introduce CALLOC, a novel framework designed to resist adversarial attacks and variations across indoor environments and devices that reduce system accuracy and reliability. CALLOC employs a novel adaptive curriculum learning approach with a domain specific lightweight scaled-dot product attention neural network, tailored for adversarial and variation resilience in practical use cases with resource constrained mobile devices. Experimental evaluations demonstrate that CALLOC can achieve improvements of up to 6.03x in mean error and 4.6x in worst-case error against state-of-the-art indoor localization frameworks, across diverse building floorplans, mobile devices, and adversarial attacks scenarios.
摘要
indoor定位已成为许多应用程序中越来越重要的一部分,从跟踪资产到提供个性化服务。然而,实现精确定位仍然是一大挑战,因为室内环境中的变化和用于帮助定位的设备之间存在差异。此外,indoor定位系统也面临着抗 adversarial 攻击的挑战,这些攻击不仅会威胁服务的一致性,而且还会减少定位精度。为解决这些挑战,我们介绍了 CALLOC,一个新的框架,旨在抵抗抗 adversarial 攻击和室内环境中的变化。CALLOC 使用了一种新的适应学习approach,其中包括一个适应性较强的域特定缩小乘数产品注意力神经网络,特制 для抗 adversarial 和变化的鲁棒性。在实际使用情况下,CALLOC 可以在不同的建筑层面、移动设备和抗 adversarial 攻击方面实现改进。我们的实验评估表明,CALLOC 可以与现有的indoor定位框架相比,在多种不同的室内环境、移动设备和抗 adversarial 攻击场景中实现改进。改进的均方误差和最均方误差为6.03倍和4.6倍。
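Since the abstract attributes CALLOC's resilience to a lightweight scaled dot-product attention network, a minimal NumPy sketch of that attention primitive is shown below for orientation. Treating each received-signal reading as a token is an illustrative assumption; the paper's actual architecture, curriculum schedule, and feature pipeline are not reproduced here.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays; returns the attention-weighted values."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Illustrative only: treat each access-point reading as one token embedding.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))                 # 8 "APs", 16-dim features
print(scaled_dot_product_attention(tokens, tokens, tokens).shape)  # (8, 16)
```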
Compact Matrix Quantum Group Equivariant Neural Networks
for: 本文旨在研究能够从具有内在量子对称性的数据中学习的神经网络。
methods: 这 paper 使用 Woronowicz 的 Tannaka-Krein duality 来描述 compact matrix quantum group 对应的 weight matrices。
results: 这 paper 提出了一种新的类型的 neural network, called compact matrix quantum group equivariant neural network,可以从数据中学习量子同质性。此外,paper 还证明了这种 neural network 包含了所有 compact matrix group equivariant neural network 为子集。同时,paper 也获得了许多 compact matrix group equivariant neural network 的 weight matrices 的Characterization,这些 weight matrices 之前没有出现在机器学习文献中。Abstract
We derive the existence of a new type of neural network, called a compact matrix quantum group equivariant neural network, that learns from data that has an underlying quantum symmetry. We apply the Woronowicz formulation of Tannaka-Krein duality to characterise the weight matrices that appear in these neural networks for any easy compact matrix quantum group. We show that compact matrix quantum group equivariant neural networks contain, as a subclass, all compact matrix group equivariant neural networks. Moreover, we obtain characterisations of the weight matrices for many compact matrix group equivariant neural networks that have not previously appeared in the machine learning literature.
摘要
我们从数据中吸取了一种新的神经网络,即含有量子同质性的矩阵量子群响应神经网络。我们使用沃罗诺维茨形式的塔那卡-克雷因对吸引神经网络的Weight矩阵进行了定义。我们证明了矩阵量子群响应神经网络包含所有矩阵群响应神经网络的子类。此外,我们获得了许多矩阵群响应神经网络在机器学习文献中未出现过的Weight矩阵的特征。
EVORA: Deep Evidential Traversability Learning for Risk-Aware Off-Road Autonomy
paper_authors: Xiaoyi Cai, Siddharth Ancha, Lakshay Sharma, Philip R. Osteen, Bernadette Bucher, Stephen Phillips, Jiuguang Wang, Michael Everett, Nicholas Roy, Jonathan P. How
for: 本研究旨在提高快速机器人跟踪减少摩擦的能力,尤其是在不可预知的地形下。
methods: 本研究使用自我监督学习方法,直接从数据中学习地形特征,而不是手动设置成本。
results: 研究提出了一种能够有效地量化和mitigate Risks的方法,包括学习批处理分布和概率密度,以及一种新的不确定性感知loss函数。这些方法有助于提高机器人的导航性能。Abstract
Traversing terrain with good traction is crucial for achieving fast off-road navigation. Instead of manually designing costs based on terrain features, existing methods learn terrain properties directly from data via self-supervision, but challenges remain to properly quantify and mitigate risks due to uncertainties in learned models. This work efficiently quantifies both aleatoric and epistemic uncertainties by learning discrete traction distributions and probability densities of the traction predictor's latent features. Leveraging evidential deep learning, we parameterize Dirichlet distributions with the network outputs and propose a novel uncertainty-aware squared Earth Mover's distance loss with a closed-form expression that improves learning accuracy and navigation performance. The proposed risk-aware planner simulates state trajectories with the worst-case expected traction to handle aleatoric uncertainty, and penalizes trajectories moving through terrain with high epistemic uncertainty. Our approach is extensively validated in simulation and on wheeled and quadruped robots, showing improved navigation performance compared to methods that assume no slip, assume the expected traction, or optimize for the worst-case expected cost.
摘要
通过适量地形的探索是快速Off-road导航的关键。现有方法通过自我超视来学习地形特性,但是存在风险量化和mitigate风险的挑战。本工作效率地量化了 aleatoric 和 epistemic 不确定性,通过学习离散的扩展特征分布和概率密度来。基于征识深度学习,我们使用网络输出来参数化地 Dirichlet 分布,并提出了一种新的不确定性意识深度Move的距离损失函数,这个函数具有闭合式表达,可以提高学习精度和导航性能。我们的风险意识规划器通过 simulate 状态轨迹的最差预期扩展特征来处理 aleatoric 不确定性,并对高 epistemic 不确定性的轨迹进行惩罚。我们的方法在 simulate 和有脚和四脚机器人上进行了广泛验证,与不考虑滑动、预期的扩展特征或优化最差预期成本的方法进行比较,显示了改进的导航性能。
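The abstract's uncertainty-aware squared Earth Mover's distance loss builds on the standard closed form of the squared-EMD loss for ordered 1-D bins, namely the squared L2 distance between cumulative distributions. The sketch below shows only that deterministic core under the assumption of unit bin spacing; the paper's version additionally takes expectations under the Dirichlet distributions parameterized by the network, which is omitted here.

```python
import numpy as np

def squared_emd_1d(p, q):
    """Squared-EMD loss between two pmfs on the same ordered 1-D bins
    (unit bin spacing assumed): squared L2 distance between their CDFs."""
    return float(np.sum((np.cumsum(p) - np.cumsum(q)) ** 2))

# Predicted traction distribution vs. a one-hot empirical target.
pred   = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
target = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
print(squared_emd_1d(pred, target))  # 0.1^2 + 0.3^2 + 0.3^2 + 0.1^2 = 0.2
```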
Learning material synthesis-structure-property relationship by data fusion: Bayesian Co-regionalization N-Dimensional Piecewise Function Learning
paper_authors: A. Gilad Kusne, Austin McDannald, Brian DeCost
For: 本研究旨在推动下一代技术的发展,如量子计算、碳捕集和低成本医疗影像等。
Methods: 研究人员使用了知识管理和数据融合技术,将不同仪器和实验室的数据集成在一起,以学习材料制备-结构-性质关系。
Results: 研究人员提出了一种名为Synthesis-structure-property relAtionship coreGionalized lEarner(SAGE)算法,可以在多种数据源之间进行数据融合,以学习材料制备-结构-性质关系。Abstract
Advanced materials are needed to further next-generation technologies such as quantum computing, carbon capture, and low-cost medical imaging. However, advanced materials discovery is confounded by two fundamental challenges: the challenge of a high-dimensional, complex materials search space and the challenge of combining knowledge, i.e., data fusion across instruments and labs. To overcome the first challenge, researchers employ knowledge of the underlying material synthesis-structure-property relationship, as a material's structure is often predictive of its functional property and vice versa. For example, optimal materials often occur along composition-phase boundaries or within specific phase regions. Additionally, knowledge of the synthesis-structure-property relationship is fundamental to understanding underlying physical mechanisms. However, quantifying the synthesis-structure-property relationship requires overcoming the second challenge. Researchers must merge knowledge gathered across instruments, measurement modalities, and even laboratories. We present the Synthesis-structure-property relAtionship coreGionalized lEarner (SAGE) algorithm. A fully Bayesian algorithm that uses multimodal coregionalization to merge knowledge across data sources to learn synthesis-structure-property relationships.
摘要
高级材料需要进一步推动下一代技术,如量子计算、碳捕集和低成本医疗成像。然而,高级材料发现面临两个基本挑战:一是高维度、复杂的材料搜索空间挑战,二是组合知识挑战,即将数据源的知识融合到一起。为了解决第一个挑战,研究人员利用材料合成-结构-性能关系的知识,因为材料结构 oft predicts its functional property and vice versa。例如,理想的材料常occurs along composition-phase boundaries或在specific phase regions。此外,理解材料合成-结构-性能关系的基础知识是理解下面物理机制的基础。然而,量化材料合成-结构-性能关系需要解决第二个挑战。研究人员必须将数据源的知识融合到一起。我们介绍了 Synthesis-structure-property relAtionship coreGionalized lEarner(SAGE)算法。这是一种完全 Bayesian 算法,使用多modal coregionalization来融合数据源的知识,以学习材料合成-结构-性能关系。
Does Differential Privacy Prevent Backdoor Attacks in Practice?
paper_authors: Fereshteh Razmi, Jian Lou, Li Xiong
for: This paper aims to investigate the effectiveness of different differential privacy (DP) techniques in preventing backdoor attacks in machine learning (ML) models, specifically examining PATE and Label-DP.
methods: The paper employs DP-SGD and PATE to defend against backdoor attacks, and explores the role of different components of DP algorithms in defending against these attacks. The authors also propose Label-DP as a faster and more accurate alternative to DP-SGD and PATE.
results: The experiments reveal that hyperparameters and the number of backdoors in the training dataset impact the success of DP algorithms, and that Label-DP algorithms can be more effective than DP methods in defending against backdoor attacks while maintaining model accuracy.Abstract
Differential Privacy (DP) was originally developed to protect privacy. However, it has recently been utilized to secure machine learning (ML) models from poisoning attacks, with DP-SGD receiving substantial attention. Nevertheless, a thorough investigation is required to assess the effectiveness of different DP techniques in preventing backdoor attacks in practice. In this paper, we investigate the effectiveness of DP-SGD and, for the first time in literature, examine PATE in the context of backdoor attacks. We also explore the role of different components of DP algorithms in defending against backdoor attacks and will show that PATE is effective against these attacks due to the bagging structure of the teacher models it employs. Our experiments reveal that hyperparameters and the number of backdoors in the training dataset impact the success of DP algorithms. Additionally, we propose Label-DP as a faster and more accurate alternative to DP-SGD and PATE. We conclude that while Label-DP algorithms generally offer weaker privacy protection, accurate hyper-parameter tuning can make them more effective than DP methods in defending against backdoor attacks while maintaining model accuracy.
摘要
差分隐私(DP)最初是为保护隐私而提出的,但近年来也被用于保护机器学习(ML)模型免受投毒攻击,其中DP-SGD受到了广泛关注。然而,不同的差分隐私技术在实际中防御后门攻击的效果还需要深入的研究。本文考察了DP-SGD的有效性,并首次在后门攻击的背景下研究了PATE。我们还探讨了差分隐私算法中不同组成部分在防御后门攻击中的作用,并证明PATE由于其教师模型采用的bagging结构而对这类攻击有效。实验表明,超参数以及训练集中后门样本的数量会影响差分隐私算法的防御效果。此外,我们提出Label-DP作为比DP-SGD和PATE更快、更准确的替代方案。我们的结论是:尽管Label-DP算法通常提供的隐私保护较弱,但通过精细的超参数调优,它们在防御后门攻击的同时保持模型准确率方面可以比差分隐私方法更有效。
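For readers unfamiliar with the mechanism being evaluated, the sketch below illustrates a single DP-SGD update (per-example gradient clipping followed by calibrated Gaussian noise). It is a generic NumPy illustration, not the paper's experimental code; the clipping norm, noise multiplier, and learning rate are arbitrary assumptions, and PATE and Label-DP are not shown.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0,
                noise_multiplier=1.1, lr=0.1, rng=None):
    """One DP-SGD update: clip every per-example gradient to clip_norm,
    sum, add Gaussian noise scaled to the clip norm, average, then step."""
    rng = rng or np.random.default_rng(0)
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / len(clipped)
    return params - lr * noisy_mean

params = np.zeros(3)
grads = [np.array([3.0, 0.0, 0.0]), np.array([0.0, 0.5, 0.5])]
print(dp_sgd_step(params, grads))
```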
Differentiable VQ-VAE’s for Robust White Matter Streamline Encodings
results: 对比了多种现有的 autoencoder 方法,DVQ-VAE 显示出了更高的编码和重建性能。Abstract
Given the complex geometry of white matter streamlines, Autoencoders have been proposed as a dimension-reduction tool to simplify the analysis streamlines in a low-dimensional latent spaces. However, despite these recent successes, the majority of encoder architectures only perform dimension reduction on single streamlines as opposed to a full bundle of streamlines. This is a severe limitation of the encoder architecture that completely disregards the global geometric structure of streamlines at the expense of individual fibers. Moreover, the latent space may not be well structured which leads to doubt into their interpretability. In this paper we propose a novel Differentiable Vector Quantized Variational Autoencoder, which are engineered to ingest entire bundles of streamlines as single data-point and provides reliable trustworthy encodings that can then be later used to analyze streamlines in the latent space. Comparisons with several state of the art Autoencoders demonstrate superior performance in both encoding and synthesis.
摘要
由于白质纤维流线的几何结构十分复杂,自编码器(Autoencoder)已被提议作为一种降维工具,用以在低维潜空间中简化流线分析。然而,尽管近期取得了一些成功,大多数编码器架构仅对单条流线进行降维,而非对整束流线进行处理;这一局限使其完全忽略了流线的全局几何结构。此外,潜空间可能缺乏良好的结构,使其可解释性受到质疑。在这篇文章中,我们提出了一种新的可微分向量量化变分自编码器(DVQ-VAE),它将整束流线作为单个数据点输入,并提供可靠可信的编码,供后续在潜空间中分析流线。与多个现有的最先进自编码器相比,我们的方法在编码与生成两方面均表现更优。
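As a point of reference for the proposed model, the sketch below shows the standard VQ-VAE quantization step (nearest-codebook lookup with a straight-through gradient and a commitment term). The proposed differentiable variant and the bundle-level encoder are not reproduced; the shapes and codebook size are illustrative assumptions.

```python
import torch

def vector_quantize(z, codebook):
    """Standard VQ-VAE quantization: nearest codebook entry per latent,
    straight-through gradient, plus a commitment term.
    z: (batch, d) latents;  codebook: (K, d) code vectors."""
    dists = torch.cdist(z, codebook)              # (batch, K)
    idx = dists.argmin(dim=1)
    z_q = codebook[idx]
    z_st = z + (z_q - z).detach()                 # straight-through estimator
    commitment = ((z - z_q.detach()) ** 2).mean()
    return z_st, idx, commitment

codebook = torch.nn.Parameter(torch.randn(64, 8))
z = torch.randn(5, 8, requires_grad=True)
z_q, idx, commit = vector_quantize(z, codebook)
print(z_q.shape, idx.shape, float(commit))
```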
Optimal Cooperative Multiplayer Learning Bandits with Noisy Rewards and No Communication
results: 该论文证明,即使各玩家获得的奖励信息不对称,该算法仍可实现 $O(\frac{\log T}{\Delta_{\bm{a}}})$ 的(依赖间隔的)regret 界,以及 $O(\sqrt{T\log T})$ 的(与间隔无关的)regret 界,二者在 $T$ 上均为渐近最优。此外,该算法在实验中也比现有算法表现更好。Abstract
We consider a cooperative multiplayer bandit learning problem where the players are only allowed to agree on a strategy beforehand, but cannot communicate during the learning process. In this problem, each player simultaneously selects an action. Based on the actions selected by all players, the team of players receives a reward. The actions of all the players are commonly observed. However, each player receives a noisy version of the reward which cannot be shared with other players. Since players receive potentially different rewards, there is an asymmetry in the information used to select their actions. In this paper, we provide an algorithm based on upper and lower confidence bounds that the players can use to select their optimal actions despite the asymmetry in the reward information. We show that this algorithm can achieve logarithmic $O(\frac{\log T}{\Delta_{\bm{a}}})$ (gap-dependent) regret as well as $O(\sqrt{T\log T})$ (gap-independent) regret. This is asymptotically optimal in $T$. We also show that it performs empirically better than the current state of the art algorithm for this environment.
摘要
我们考虑了合作多player带狗学习问题,其中玩家只能在进程前合作确定策略,但在学习过程中不能交流。在这个问题中,每个玩家同时选择动作,基于所有玩家选择的动作,团队的玩家收到奖励。但是,每个玩家只能看到自己的奖励,其他玩家的奖励是干扰的。由于玩家收到的奖励可能不同,因此存在 asymmetry 在奖励信息中。在这篇论文中,我们提供了基于上下界的 confidence bounds 算法,allowing players to select their optimal actions despite the asymmetry in the reward information. We show that this algorithm can achieve logarithmic $O(\frac{\log T}{\Delta_{\bm{a}})$ (gap-dependent) regret as well as $O(\sqrt{T\log T})$ (gap-independent) regret. This is asymptotically optimal in $T$. We also show that it performs empirically better than the current state of the art algorithm for this environment.
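The algorithm in the paper is built on upper-confidence-bound indices computed from each player's privately observed noisy rewards. The sketch below shows only the single-player UCB1 building block on a toy problem; the multiplayer coordination over joint actions, the lower confidence bounds, and the regret analysis are not reproduced, and all constants are illustrative assumptions.

```python
import numpy as np

def ucb1(noisy_reward, n_arms=5, horizon=2000, rng=None):
    """UCB1 on privately observed noisy rewards. In the cooperative setting,
    each player would run this independently under a pre-agreed rule,
    with no communication during play."""
    rng = rng or np.random.default_rng(0)
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                            # initialise: play each arm once
        else:
            arm = int(np.argmax(means + np.sqrt(2 * np.log(t) / counts)))
        r = noisy_reward(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
    return means, counts

true_means = np.array([0.2, 0.5, 0.4, 0.8, 0.6])
reward = lambda a, rng: true_means[a] + rng.normal(scale=0.3)
_, counts = ucb1(reward)
print(counts)   # pulls should concentrate on arm 3 (the best arm)
```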
Time Scale Network: A Shallow Neural Network For Time Series Data
results: 结果显示,该时间尺度网络在基于心电图(ECG)的心房功能异常检测中表现出色,包括更高的单位参数和单位运算准确率、快速的训练与推理速度,以及可视化并解释所学特征模式的能力;此外,该网络在基于脑电图(EEG)的癫痫发作预测中也取得了优异表现。Abstract
Time series data is often composed of information at multiple time scales, particularly in biomedical data. While numerous deep learning strategies exist to capture this information, many make networks larger, require more data, are more demanding to compute, and are difficult to interpret. This limits their usefulness in real-world applications facing even modest computational or data constraints and can further complicate their translation into practice. We present a minimal, computationally efficient Time Scale Network combining the translation and dilation sequence used in discrete wavelet transforms with traditional convolutional neural networks and back-propagation. The network simultaneously learns features at many time scales for sequence classification with significantly reduced parameters and operations. We demonstrate advantages in Atrial Dysfunction detection including: superior accuracy-per-parameter and accuracy-per-operation, fast training and inference speeds, and visualization and interpretation of learned patterns in atrial dysfunction detection on ECG signals. We also demonstrate impressive performance in seizure prediction using EEG signals. Our network isolated a few time scales that could be strategically selected to achieve 90.9% accuracy using only 1,133 active parameters and consistently converged on pulsatile waveform shapes. This method does not rest on any constraints or assumptions regarding signal content and could be leveraged in any area of time series analysis dealing with signals containing features at many time scales.
摘要
时序数据经常具有多个时间尺度信息,特别是在生物医学数据中。虽然有许多深度学习策略可以捕捉这些信息,但是 многие网络变得更大、需要更多的数据、更复杂的计算和更难于解释。这限制了它们在实际应用中的使用,特别是面临有限的计算和数据约束。我们提出了一种简单、计算效率高的时间尺度网络,将翻译和扩展序列使用在离散干扰变换中的Sequence Network与传统的卷积神经网络和反射传播结合。该网络同时学习多个时间尺度的特征,用于序列分类,而无需增加过多的参数和运算。我们在心脏病变诊断中demonstrated出了superior的准确率-参数和运算量,快速的训练和推理速度,以及序列分类结果的可视化和解释。此外,我们还在EEG信号上进行了抑制预测,并达到了90.9%的准确率,只使用1,133个活动参数。这种方法不受任何信号内容的限制,可以在任何时序分析领域中应用,特别是面临着包含多个时间尺度的信号。
Surrogate Neural Networks to Estimate Parametric Sensitivity of Ocean Models
paper_authors: Yixuan Sun, Elizabeth Cucuzzella, Steven Brus, Sri Hari Krishna Narayanan, Balu Nadiga, Luke Van Roekel, Jan Hückelheim, Sandeep Madireddy
for: 研究气候变化和海洋相互作用的影响
methods: 使用神经网络模型和参数推定法
results: 模型输出的参数敏感性分析
for: The paper is written to study the impact of greenhouse gases, warming, and ice sheet melting on the ocean, as well as the effects of ocean processes on phenomena such as hurricanes and droughts.
methods: The authors use a combination of idealized ocean models, perturbed parameter ensemble data, and surrogate neural network models to analyze the sensitivity of the model output to unmeasurable parameters.
results: The authors compute the parametric sensitivity of the one-step forward dynamics of the model, providing insights into the impact of unmeasurable parameters on the model output.Abstract
Modeling is crucial to understanding the effect of greenhouse gases, warming, and ice sheet melting on the ocean. At the same time, ocean processes affect phenomena such as hurricanes and droughts. Parameters in the models that cannot be physically measured have a significant effect on the model output. For an idealized ocean model, we generated perturbed parameter ensemble data and trained surrogate neural network models. The neural surrogates accurately predicted the one-step forward dynamics, of which we then computed the parametric sensitivity.
摘要
模拟是理解绿色气体、暖化和冰川融化对海洋的效应的关键。同时,海洋过程对风暴和干旱等现象产生了影响。模型中无法测量的参数会对模型输出产生重要影响。为一个理想化的海洋模型,我们生成了受扰参数数据集和训练了神经网络模型。神经网络模型准确预测了下一步动力学行为,我们 THEN 计算了参数敏感度。
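One way to read the workflow above: once a neural surrogate of the one-step forward dynamics is trained, parametric sensitivity can be obtained by differentiating the surrogate with respect to the unmeasurable parameters. The PyTorch sketch below illustrates that idea with an untrained toy network and made-up dimensions; it is not the authors' surrogate or their sensitivity pipeline.

```python
import torch

# Toy stand-in for a trained surrogate: maps (state, parameters) -> next state.
surrogate = torch.nn.Sequential(
    torch.nn.Linear(6, 32), torch.nn.Tanh(), torch.nn.Linear(32, 4))

state = torch.randn(4)                          # current ocean state (4 dims, made up)
params = torch.randn(2)                         # two "unmeasurable" parameters

def one_step(p):
    return surrogate(torch.cat([state, p]))

# Parametric sensitivity = Jacobian of the one-step dynamics w.r.t. the parameters.
sensitivity = torch.autograd.functional.jacobian(one_step, params)
print(sensitivity.shape)                        # torch.Size([4, 2])
```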
Interpretable Graph Anomaly Detection using Gradient Attention Maps
paper_authors: Yifei Yang, Peng Wang, Xiaofan He, Dongmian Zou
for: 本文旨在提出一种基于可解释性的图像异常检测方法,以提高异常检测性能。
methods: 本方法使用图神经网络的梯度来生成注意力地图,并使用这个地图来评分异常。
results: 对比基eline方法,本方法在多个synthetic数据集上表现出色,并且可以帮助我们更好地理解异常检测决策的过程。Abstract
Detecting unusual patterns in graph data is a crucial task in data mining. However, existing methods often face challenges in consistently achieving satisfactory performance and lack interpretability, which hinders our understanding of anomaly detection decisions. In this paper, we propose a novel approach to graph anomaly detection that leverages the power of interpretability to enhance performance. Specifically, our method extracts an attention map derived from gradients of graph neural networks, which serves as a basis for scoring anomalies. In addition, we conduct theoretical analysis using synthetic data to validate our method and gain insights into its decision-making process. To demonstrate the effectiveness of our method, we extensively evaluate our approach against state-of-the-art graph anomaly detection techniques. The results consistently demonstrate the superior performance of our method compared to the baselines.
摘要
检测图形数据中异常 Pattern 是数据挖掘中的一项关键任务。然而,现有的方法经常遇到一些挑战,包括困难保证满意的性能和缺乏可解性,这些缺陷限制了我们对异常检测决策的理解。在这篇论文中,我们提出了一种新的图形异常检测方法,该方法利用可解性来提高性能。具体来说,我们的方法利用图形神经网络的梯度导数来生成一个注意力地图,该地图作为异常分数的基础。此外,我们使用 sintetic data 进行理论分析,以获得我们的方法做出异常检测决策的理解。为了证明我们的方法的有效性,我们对比了我们的方法与当前最佳的图形异常检测技术。结果一致地表明了我们的方法与基eline相比具有更高的性能。
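To make the core idea concrete, the toy sketch below derives a node-level saliency map from the gradient of a graph model's output with respect to its node features, and uses the gradient magnitude as an anomaly proxy. The hand-rolled one-layer graph convolution, the random weights, and the sum-based scoring are illustrative assumptions; the paper's trained GNN and exact scoring rule are not reproduced.

```python
import torch

# Tiny hand-rolled one-layer graph convolution so the sketch needs no GNN library.
A = torch.tensor([[1., 1., 0., 0.],
                  [1., 1., 1., 0.],
                  [0., 1., 1., 1.],
                  [0., 0., 1., 1.]])            # adjacency with self-loops
A_hat = A / A.sum(dim=1, keepdim=True)          # simple row normalization
X = torch.randn(4, 3, requires_grad=True)       # node features
W = torch.randn(3, 2)                           # (untrained) layer weights

graph_score = torch.relu(A_hat @ X @ W).sum()   # scalar graph-level output
graph_score.backward()

# Gradient "attention map": per-node saliency = magnitude of the input gradient.
node_scores = X.grad.abs().sum(dim=1)
print(node_scores)   # larger values -> node influences the output more (anomaly proxy)
```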
Minimum norm interpolation by perceptra: Explicit regularization and implicit bias
paper_authors: Jiyoung Park, Ian Pelakh, Stephan Wojtowytsch
for: 研究如何 shallow ReLU 网络在知道区域内 interpolate.
methods: 我们的分析表明,当数据点和参数的数量增加,并且权重 decay 正则化的系数逐渐减少时,Empirical risk minimizers 会 converge to a minimum norm interpolant.
results: 我们的numerical研究表明,通用优化算法对known minimum norm interpolants具有隐式偏好,无论有没有显式正则化。Abstract
We investigate how shallow ReLU networks interpolate between known regions. Our analysis shows that empirical risk minimizers converge to a minimum norm interpolant as the number of data points and parameters tends to infinity when a weight decay regularizer is penalized with a coefficient which vanishes at a precise rate as the network width and the number of data points grow. With and without explicit regularization, we numerically study the implicit bias of common optimization algorithms towards known minimum norm interpolants.
摘要
我们调查如何使浅层ReLU网络在已知区域中进行 interpolating。我们的分析表明,在数据点和参数数量增加时,empirical risk minimizers会趋向 minimum norm interpolant 的最小norm的架构,并且随着网络宽度和数据点数量增加,该 coefficient 会逐渐消失。在有Explicit regularization和无Explicit regularization的情况下,我们 numerically 研究了通用优化算法对于已知 minimum norm interpolants 的隐藏偏见。
Distributionally Robust Skeleton Learning of Discrete Bayesian Networks
methods: 利用分布性robust优化和回归方法,最大化最差风险(worst-case risk)在 Family of Distributions 内的 bounded Wasserstein distance 或 KL divergence 到 empirical distribution。
results: 提出了一种可以应用于普遍 categorical random variables 的方法,不需要 faithfulness、ordinal relationship 或 specific conditional distribution 假设。 提供了高效的算法,并在轻度假设下提供了非 asymptotic 保证。 数值研究表明方法的有效性。 Code 可以在 https://github.com/DanielLeee/drslbn 找到。Abstract
We consider the problem of learning the exact skeleton of general discrete Bayesian networks from potentially corrupted data. Building on distributionally robust optimization and a regression approach, we propose to optimize the most adverse risk over a family of distributions within bounded Wasserstein distance or KL divergence to the empirical distribution. The worst-case risk accounts for the effect of outliers. The proposed approach applies for general categorical random variables without assuming faithfulness, an ordinal relationship or a specific form of conditional distribution. We present efficient algorithms and show the proposed methods are closely related to the standard regularized regression approach. Under mild assumptions, we derive non-asymptotic guarantees for successful structure learning with logarithmic sample complexities for bounded-degree graphs. Numerical study on synthetic and real datasets validates the effectiveness of our method. Code is available at https://github.com/DanielLeee/drslbn.
摘要
我们考虑从可能被污染的数据中学习一般离散贝叶斯网络精确骨架的问题。基于分布鲁棒优化与回归方法,我们提出在与经验分布的Wasserstein距离或KL散度受限的分布族内优化最坏情况风险;该最坏情况风险考虑了离群点的影响。我们的方法适用于一般的类别型随机变量,无需假设忠实性、有序关系或特定形式的条件分布。我们给出了高效的算法,并表明所提方法与标准的正则化回归方法密切相关。在温和的假设下,对于度数有界的图,我们得到了对数级样本复杂度的非渐近结构学习保证。在合成与真实数据集上的数值研究验证了方法的有效性。代码见 https://github.com/DanielLeee/drslbn。
Turbulence Scaling from Deep Learning Diffusion Generative Models
results: 研究发现,新生成的液体动力学解具有与预期的科尔мого罗夫 scaling 相同的统计尺度 Properties,并且比训练数据集的统计尺度更加精度。这种与实际液体动力学特征相符的表现,提供了模型能够捕捉实际液体动力学特征的强有力证据。Abstract
Complex spatial and temporal structures are inherent characteristics of turbulent fluid flows and comprehending them poses a major challenge. This comprehesion necessitates an understanding of the space of turbulent fluid flow configurations. We employ a diffusion-based generative model to learn the distribution of turbulent vorticity profiles and generate snapshots of turbulent solutions to the incompressible Navier-Stokes equations. We consider the inverse cascade in two spatial dimensions and generate diverse turbulent solutions that differ from those in the training dataset. We analyze the statistical scaling properties of the new turbulent profiles, calculate their structure functions, energy power spectrum, velocity probability distribution function and moments of local energy dissipation. All the learnt scaling exponents are consistent with the expected Kolmogorov scaling and have lower errors than the training ones. This agreement with established turbulence characteristics provides strong evidence of the model's capability to capture essential features of real-world turbulence.
摘要
困难的空间和时间结构是液体动力学中抽象流动的内在特征,理解这些特征是很重要的。我们使用一种扩散基于的生成模型来学习液体动力学中抽象扩散的分布,并生成了不同于训练数据集的液体动力学解。我们在两维空间中考虑逆升阶段,并生成了多种不同的液体动力学解,与训练数据集的解不同。我们分析了新的液体动力学Profile的统计尺度性质,计算了其结构函数、能量频谱、速度分布函数和本地能量投入的积分。所学到的扩散 exponent都与预期的科尔莫戈罗夫 scaling 相符,并且与训练数据集中的 exponent 有更低的错误。这种一致性提供了强有力的证据,证明了模型能够捕捉真实世界中的液体动力学特征。
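A common way to check the statistical scaling mentioned above is to compute a shell-averaged energy spectrum of a generated field and fit its log-log slope over an assumed inertial range. The NumPy sketch below does this on a white-noise placeholder field; the field, the chosen wavenumber range, and the normalization are illustrative assumptions, not the paper's diagnostics.

```python
import numpy as np

def isotropic_energy_spectrum(u):
    """Shell-averaged energy spectrum E(k) of a 2-D periodic field u."""
    n = u.shape[0]
    uk = np.fft.fft2(u) / n**2
    energy2d = 0.5 * np.abs(uk) ** 2
    k1d = np.fft.fftfreq(n, d=1.0 / n)                     # integer wavenumbers
    kmag = np.sqrt(k1d[:, None] ** 2 + k1d[None, :] ** 2)
    kbins = np.arange(0.5, n // 2)
    spectrum = np.array([energy2d[(kmag >= k - 0.5) & (kmag < k + 0.5)].sum()
                         for k in kbins])
    return kbins, spectrum

rng = np.random.default_rng(0)
field = rng.normal(size=(128, 128))       # placeholder for a generated vorticity sample
k, E = isotropic_energy_spectrum(field)
inertial = (k >= 4) & (k < 32)            # assumed inertial range
slope = np.polyfit(np.log(k[inertial]), np.log(E[inertial] + 1e-30), 1)[0]
print(f"fitted spectral slope: {slope:.2f}")   # compare against the expected scaling
```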
An Interpretable Machine Learning Framework to Understand Bikeshare Demand before and during the COVID-19 Pandemic in New York City
results: 根据这个研究中考虑的说明变数的相对重要性,女性用户占有和小时是这两个模型中最重要的变数。然而,月份变数在大流行模型中比在以前模型中更重要。Abstract
In recent years, bikesharing systems have become increasingly popular as affordable and sustainable micromobility solutions. Advanced mathematical models such as machine learning are required to generate good forecasts for bikeshare demand. To this end, this study proposes a machine learning modeling framework to estimate hourly demand in a large-scale bikesharing system. Two Extreme Gradient Boosting models were developed: one using data from before the COVID-19 pandemic (March 2019 to February 2020) and the other using data from during the pandemic (March 2020 to February 2021). Furthermore, a model interpretation framework based on SHapley Additive exPlanations was implemented. Based on the relative importance of the explanatory variables considered in this study, share of female users and hour of day were the two most important explanatory variables in both models. However, the month variable had higher importance in the pandemic model than in the pre-pandemic model.
摘要
Recently, 自行车共享系统已经成为非常受欢迎的可靠和可持续的微型交通解决方案。为了生成好的预测模型,这些研究需要进行高级的数据分析和机器学习模型。为此,本研究提出了一个机器学习模型框架,用于估计大规模自行车共享系统的每小时需求。这些研究发展了两个极大Gradient Boosting模型:一个使用2019年3月至2020年2月的数据(前疫情时期),另一个使用2020年3月至2021年2月的数据(疫情时期)。此外,基于SHapley Additive exPlanations的模型解释框架也被实现。根据这些研究中考虑的说明变量的相对重要性,女性用户的份额和时间段是这两个模型中最重要的说明变量。但是,月份变量在疫情模型中比前疫情模型更重要。
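As an illustration of the modelling-plus-interpretation workflow described above, the sketch below fits a gradient-boosted regressor on a synthetic stand-in for the hourly demand table and computes SHAP values. It assumes the xgboost and shap packages are installed; the synthetic features, target, and hyperparameters are invented for illustration and are not the paper's data or tuned models.

```python
import numpy as np
import xgboost
import shap

# Synthetic stand-in for the hourly table: hour of day, share of female users, month.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([rng.integers(0, 24, n),
                     rng.uniform(0.2, 0.6, n),
                     rng.integers(1, 13, n)]).astype(float)
y = 50 + 10 * np.sin(X[:, 0] / 24 * 2 * np.pi) + 80 * X[:, 1] + rng.normal(0, 5, n)

model = xgboost.XGBRegressor(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, y)

# SHAP values: per-prediction contribution of each explanatory variable.
shap_values = shap.TreeExplainer(model).shap_values(X)
print(np.abs(shap_values).mean(axis=0))   # mean |SHAP| ~ relative variable importance
```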
1-Lipschitz Neural Networks are more expressive with N-Activations
results: 论文表明,常用的activation function,如MaxMin,以及所有的二段折线activation function都过于限制了函数的表达能力,即使在 simplest一dimensional setting中。它们还引入了一种新的N-activation function,可以更好地表达函数。Abstract
A crucial property for achieving secure, trustworthy and interpretable deep learning systems is their robustness: small changes to a system's inputs should not result in large changes to its outputs. Mathematically, this means one strives for networks with a small Lipschitz constant. Several recent works have focused on how to construct such Lipschitz networks, typically by imposing constraints on the weight matrices. In this work, we study an orthogonal aspect, namely the role of the activation function. We show that commonly used activation functions, such as MaxMin, as well as all piece-wise linear ones with two segments unnecessarily restrict the class of representable functions, even in the simplest one-dimensional setting. We furthermore introduce the new N-activation function that is provably more expressive than currently popular activation functions. We provide code at https://github.com/berndprach/NActivation.
摘要
一个深度学习系统的关键性能特性是其Robustness:小改变输入 shouldn't result in large changes to its outputs. 数学上,这意味着一个网络的 lipschitz常数应该小。 一些最近的工作已经关注如何构建这样的 lipschitz 网络,通常是通过加载矩阵的约束。在这个工作中,我们研究了另一个正交方面,即激活函数的角色。我们显示,通用的激活函数,如 MaxMin,以及所有分割线性的两段激活函数都过于限制了可表示的函数的类型,即使在最简单的一维设定中。我们还引入了新的 N-激活函数,可以证明比现有的激活函数更加表达力强。我们提供了相关代码在 GitHub 上:https://github.com/berndprach/NActivation。
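For reference, the MaxMin activation discussed in the abstract is easy to state: split the channels into pairs and output the max and min of each pair. A minimal PyTorch sketch is below (the pairing convention is an assumption); the paper's more expressive N-activation is not reproduced here.

```python
import torch

def maxmin(x):
    """MaxMin activation: pair consecutive channels and emit (max, min) per pair.
    It is 1-Lipschitz and gradient-norm preserving, hence its popularity in
    Lipschitz-constrained networks."""
    a, b = x[..., 0::2], x[..., 1::2]
    return torch.cat([torch.maximum(a, b), torch.minimum(a, b)], dim=-1)

x = torch.tensor([[1.0, -2.0, 0.5, 3.0]])
print(maxmin(x))   # tensor([[ 1.0000,  3.0000, -2.0000,  0.5000]])
```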
Symbolic Regression as Feature Engineering Method for Machine and Deep Learning Regression Tasks
results: SR-derived features可以帮助提高机器学习和深度学习回归模型的预测精度,实验结果显示SR可以提高模型的root mean square error(RMSE)值34-86%,并在实际应用中提高预测超导温度的准确率。Abstract
In the realm of machine and deep learning regression tasks, the role of effective feature engineering (FE) is pivotal in enhancing model performance. Traditional approaches of FE often rely on domain expertise to manually design features for machine learning models. In the context of deep learning models, the FE is embedded in the neural network's architecture, making it hard for interpretation. In this study, we propose to integrate symbolic regression (SR) as an FE process before a machine learning model to improve its performance. We show, through extensive experimentation on synthetic and real-world physics-related datasets, that the incorporation of SR-derived features significantly enhances the predictive capabilities of both machine and deep learning regression models with 34-86% root mean square error (RMSE) improvement in synthetic datasets and 4-11.5% improvement in real-world datasets. In addition, as a realistic use-case, we show the proposed method improves the machine learning performance in predicting superconducting critical temperatures based on Eliashberg theory by more than 20% in terms of RMSE. These results outline the potential of SR as an FE component in data-driven models.
摘要
在机器学习和深度学习回归任务中,有效的特征工程(FE)角色是关键的提高模型性能。传统的FE方法通常依赖于领域专家手动设计机器学习模型的特征。在深度学习模型中,FE是内置在神经网络结构中,使其解释性困难。在这项研究中,我们提议将符号回归(SR)作为FE过程来改进机器学习模型的性能。我们通过对 sintetic和实际物理相关数据集进行广泛的实验,发现SR derivated特征的 integrate 可以显著提高机器学习和深度学习回归模型的预测能力,具体来说,在 sintetic 数据集中,RMSE 下降了34-86%,而在实际数据集中,RMSE 下降了4-11.5%。此外,我们还展示了该方法可以在预测超导极限温度基于Eliashberg理论中提高机器学习性能,具体来说,RMSE 下降了 más de 20%。这些结果表明SR 可以作为数据驱动模型中的FE组件。
Doubly Robust Structure Identification from Temporal Data
results: 我们的实验结果表明,我们的方法在噪声和循环性数据情况下具有明显的优势,并且可以很好地鉴别出真实的原因结构。Abstract
Learning the causes of time-series data is a fundamental task in many applications, spanning from finance to earth sciences or bio-medical applications. Common approaches for this task are based on vector auto-regression, and they do not take into account unknown confounding between potential causes. However, in settings with many potential causes and noisy data, these approaches may be substantially biased. Furthermore, potential causes may be correlated in practical applications. Moreover, existing algorithms often do not work with cyclic data. To address these challenges, we propose a new doubly robust method for Structure Identification from Temporal Data ( SITD ). We provide theoretical guarantees, showing that our method asymptotically recovers the true underlying causal structure. Our analysis extends to cases where the potential causes have cycles and they may be confounded. We further perform extensive experiments to showcase the superior performance of our method.
摘要
学习时序数据的原因是许多应用程序的基本任务,从金融到地球科学或生物医学应用程序。常见的方法基于向量自动回归,但这些方法不考虑可能存在的隐藏干扰因素。在具有多个可能的原因和噪声数据的情况下,这些方法可能受到重大偏误。此外,实际应用中的原因可能相互 correlated。此外,现有的算法通常不能处理循环数据。为解决这些挑战,我们提出了一种新的双重可靠方法 для时间数据结构鉴别(SITD)。我们提供了理论保证,表明我们的方法在极限情况下可以准确回归真实的下面结构。我们的分析涵盖了可能存在循环的原因,以及它们可能受到干扰的情况。我们进一步进行了广泛的实验,以示出我们的方法的超过其他方法的性能。
Graph GOSPA metric: a metric to measure the discrepancy between graphs of different sizes
paper_authors: Jinhao Gu, Ángel F. García-Fernández, Robert E. Firth, Lennart Svensson
for: This paper proposes a metric to measure the dissimilarity between graphs with different numbers of nodes.
methods: The proposed metric extends the generalised optimal subpattern assignment (GOSPA) metric for sets to graphs, and includes costs associated with node attribute errors, missed and false nodes, and edge mismatches between graphs.
results: The metric is computable in polynomial time using linear programming, and its properties are demonstrated via simulated and empirical datasets.
results: 该metric可以通过线性规划在多项式时间内计算,并通过模拟和实验数据展示了其性质。Abstract
This paper proposes a metric to measure the dissimilarity between graphs that may have a different number of nodes. The proposed metric extends the generalised optimal subpattern assignment (GOSPA) metric, which is a metric for sets, to graphs. The proposed graph GOSPA metric includes costs associated with node attribute errors for properly assigned nodes, missed and false nodes and edge mismatches between graphs. The computation of this metric is based on finding the optimal assignments between nodes in the two graphs, with the possibility of leaving some of the nodes unassigned. We also propose a lower bound for the metric, which is also a metric for graphs and is computable in polynomial time using linear programming. The metric is first derived for undirected unweighted graphs and it is then extended to directed and weighted graphs. The properties of the metric are demonstrated via simulated and empirical datasets.
摘要
results: 论文证明了这些函数在非随机 Setting 中的带有反馈的情况下,可以达到$(1 - \frac{1}{e})$ regret bound,其bound 为 $\sqrt{MKT}$(忽略对数因子),其中 $T$ 是时间戳和 $M$ 是 Cardinality 约束。这个 bound 胜过了在线半模式函数最大化的 $\widetilde{O}(T^{2/3})$ regret bound。Abstract
Many online decision-making problems correspond to maximizing a sequence of submodular functions. In this work, we introduce sum-max functions, a subclass of monotone submodular functions capturing several interesting problems, including best-of-$K$-bandits, combinatorial bandits, and the bandit versions on facility location, $M$-medians, and hitting sets. We show that all functions in this class satisfy a key property that we call pseudo-concavity. This allows us to prove $\big(1 - \frac{1}{e}\big)$-regret bounds for bandit feedback in the nonstochastic setting of the order of $\sqrt{MKT}$ (ignoring log factors), where $T$ is the time horizon and $M$ is a cardinality constraint. This bound, attained by a simple and efficient algorithm, significantly improves on the $\widetilde{O}\big(T^{2/3}\big)$ regret bound for online monotone submodular maximization with bandit feedback.
摘要
许多在线决策问题都对应于最大化一列次模函数。在这项工作中,我们引入了 sum-max 函数,它是单调次模函数的一个子类,涵盖了许多有趣的问题,包括最优 $K$ 臂赌博机、组合赌博机,以及设施选址、$M$-中位数和击中集问题的赌博机版本。我们证明了这类函数都满足一个被我们称为伪凹性的关键性质,这使我们能够在非随机的bandit反馈设定下证明 $\big(1 - \frac{1}{e}\big)$-regret 界,其阶为 $\sqrt{MKT}$(忽略对数因子),其中 $T$ 为时间范围,$M$ 为基数约束。该界由一种简单而高效的算法达到,显著改进了在线单调次模函数最大化在bandit反馈下的 $\widetilde{O}\big(T^{2/3}\big)$ regret 界。
Plasma Surrogate Modelling using Fourier Neural Operators
paper_authors: Vignesh Gopakumar, Stanislas Pamela, Lorenzo Zanisi, Zongyi Li, Ander Gray, Daniel Brennand, Nitesh Bhatia, Gregory Stathopoulos, Matt Kusner, Marc Peter Deisenroth, Anima Anandkumar, JOREK Team, MAST Team
results: FNO可以准确预测束激发的发展,并在实验域中预测实际观测数据。我们在MAST托卡马克实验室中使用摄像头记录束激发的发展,并发现FNO可以准确预测束激发的发展和形状,以及束激发与中央气流和束激发器的互动的位置。FNO具有快速训练和推理,需要 fewer data points,可以完成零射播超解析,并且能够获得高精度解决方案。Abstract
Predicting plasma evolution within a Tokamak reactor is crucial to realizing the goal of sustainable fusion. Capabilities in forecasting the spatio-temporal evolution of plasma rapidly and accurately allow us to quickly iterate over design and control strategies on current Tokamak devices and future reactors. Modelling plasma evolution using numerical solvers is often expensive, consuming many hours on supercomputers, and hence, we need alternative inexpensive surrogate models. We demonstrate accurate predictions of plasma evolution both in simulation and experimental domains using deep learning-based surrogate modelling tools, viz., Fourier Neural Operators (FNO). We show that FNO has a speedup of six orders of magnitude over traditional solvers in predicting the plasma dynamics simulated from magnetohydrodynamic models, while maintaining a high accuracy (MSE $\approx$ $10^{-5}$). Our modified version of the FNO is capable of solving multi-variable Partial Differential Equations (PDE), and can capture the dependence among the different variables in a single model. FNOs can also predict plasma evolution on real-world experimental data observed by the cameras positioned within the MAST Tokamak, i.e., cameras looking across the central solenoid and the divertor in the Tokamak. We show that FNOs are able to accurately forecast the evolution of plasma and have the potential to be deployed for real-time monitoring. We also illustrate their capability in forecasting the plasma shape, the locations of interactions of the plasma with the central solenoid and the divertor for the full duration of the plasma shot within MAST. The FNO offers a viable alternative for surrogate modelling as it is quick to train and infer, and requires fewer data points, while being able to do zero-shot super-resolution and getting high-fidelity solutions.
摘要
预测tokamak激光器中激液的发展是实现可持续核聚合的关键。我们需要快速和准确地预测激液的空间时间发展,以便快速迭代设计和控制策略。 numerically solving plasma evolution models is often expensive and time-consuming, so we need inexpensive surrogate models. We demonstrate accurate predictions of plasma evolution using deep learning-based surrogate modeling tools, specifically Fourier Neural Operators (FNO). FNO has a speedup of six orders of magnitude over traditional solvers, while maintaining a high accuracy (MSE $\approx$ $10^{-5}$). Our modified version of FNO can solve multi-variable partial differential equations (PDEs) and capture the dependence among variables in a single model. FNOs can also predict plasma evolution on real-world experimental data from cameras positioned within the MAST Tokamak, such as cameras looking across the central solenoid and the divertor. We show that FNOs can accurately forecast plasma evolution and have the potential to be deployed for real-time monitoring. Additionally, we demonstrate their capability in forecasting the plasma shape, the locations of interactions of the plasma with the central solenoid and the divertor for the full duration of the plasma shot within MAST. FNO offers a viable alternative for surrogate modeling as it is quick to train and infer, requires fewer data points, and can perform zero-shot super-resolution with high-fidelity solutions.
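The backbone of an FNO is a spectral convolution layer: transform to Fourier space, apply a learned complex weight to a truncated set of low-frequency modes, and transform back. The 1-D PyTorch sketch below shows that layer in isolation; the channel counts, mode truncation, initialization, and the full multi-layer operator used for the MHD and camera data are illustrative assumptions rather than the authors' implementation.

```python
import torch

class SpectralConv1d(torch.nn.Module):
    """Single 1-D Fourier layer: FFT -> learned mixing of the lowest
    `modes` frequencies -> inverse FFT."""
    def __init__(self, in_ch, out_ch, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (in_ch * out_ch)
        self.weight = torch.nn.Parameter(
            scale * torch.randn(in_ch, out_ch, modes, dtype=torch.cfloat))

    def forward(self, x):                      # x: (batch, in_ch, n)
        x_ft = torch.fft.rfft(x)               # (batch, in_ch, n//2 + 1)
        out_ft = torch.zeros(x.shape[0], self.weight.shape[1], x_ft.shape[-1],
                             dtype=torch.cfloat, device=x.device)
        out_ft[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", x_ft[:, :, :self.modes], self.weight)
        return torch.fft.irfft(out_ft, n=x.shape[-1])

layer = SpectralConv1d(in_ch=3, out_ch=3, modes=16)
field = torch.randn(4, 3, 128)                 # batch of toy 1-D "plasma profiles"
print(layer(field).shape)                      # torch.Size([4, 3, 128])
```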
Multiscale Neural Operators for Solving Time-Independent PDEs
paper_authors: Winfried Ripken, Lisa Coiffard, Felix Pieper, Sebastian Dziadzio
for: 解决大型精度离散方程在数据驱动神经网络中的挑战。
methods: 提出了一种图rewiring技术,以增强神经网络的全球交互能力。
results: 实验结果显示,我们的GNN方法在不规则网格上实现了时间独立精度离散方程的新高度表现标准,而我们的图rewiring策略也提高了基线方法的表现,实现了一个任务中的状态之最。Abstract
Time-independent Partial Differential Equations (PDEs) on large meshes pose significant challenges for data-driven neural PDE solvers. We introduce a novel graph rewiring technique to tackle some of these challenges, such as aggregating information across scales and on irregular meshes. Our proposed approach bridges distant nodes, enhancing the global interaction capabilities of GNNs. Our experiments on three datasets reveal that GNN-based methods set new performance standards for time-independent PDEs on irregular meshes. Finally, we show that our graph rewiring strategy boosts the performance of baseline methods, achieving state-of-the-art results in one of the tasks.
摘要
时间独立的偏微分方程(PDEs)在大型网格上给数据驱动的神经PDE求解器带来了严峻挑战。我们提出了一种新的图重连(graph rewiring)技术来应对其中的一些挑战,例如跨尺度和在不规则网格上聚合信息。该方法连接相距较远的节点,从而增强图神经网络(GNNs)的全局交互能力。我们在三个数据集上的实验表明,基于GNN的方法为不规则网格上的时间独立PDE设定了新的性能标准;此外,我们的图重连策略提升了基线方法的性能,并在其中一个任务中达到了最佳水平。
Hierarchical deep learning-based adaptive time-stepping scheme for multiscale simulations
paper_authors: Asif Hamid, Danish Rafiq, Shahkar Ahmad Nahvi, Mohammad Abid Bazaz
for: 这篇研究是为了解决复杂非线性系统中的多尺度问题。
methods: 这篇研究提出了一种使用深度神经网络来解决多尺度问题的新方法。
results: 这篇研究获得了比固定步骤神经网络解析器更好的性能,并且在计算时间上降低了比例。Abstract
Multiscale is a hallmark feature of complex nonlinear systems. While the simulation using the classical numerical methods is restricted by the local \textit{Taylor} series constraints, the multiscale techniques are often limited by finding heuristic closures. This study proposes a new method for simulating multiscale problems using deep neural networks. By leveraging the hierarchical learning of neural network time steppers, the method adapts time steps to approximate dynamical system flow maps across timescales. This approach achieves state-of-the-art performance in less computational time compared to fixed-step neural network solvers. The proposed method is demonstrated on several nonlinear dynamical systems, and source codes are provided for implementation. This method has the potential to benefit multiscale analysis of complex systems and encourage further investigation in this area.
摘要
多尺度特征是复杂非线性系统的标志性特征。而使用传统的数值方法进行模拟时,会受到本地Taylor系列约束,而多尺度技术则经常受到寻找封闭的限制。本研究提出了使用深度神经网络来模拟多尺度问题的新方法。通过神经网络时间步骤的层次学习,该方法可以对不同时间尺度的动力系统流图进行approximation。这种方法可以在计算时间上比固定步骤神经网络解决方案更快,并达到当前最佳性能。该方法在多个非线性动力系统中进行了示例,并提供了实现代码。这种方法具有推动多尺度分析复杂系统的潜力,并鼓励这一领域进一步的研究。
ID Embedding as Subtle Features of Content and Structure for Multimodal Recommendation
results: 实验结果表明,该方法在三个真实数据集(Baby、Sports、Clothing)上的表现优于最新的多模态推荐方法,并验证了细粒度ID嵌入对增强内容与结构语义特征的有效性。Abstract
Multimodal recommendation aims to model user and item representations comprehensively with the involvement of multimedia content for effective recommendations. Existing research has shown that it is beneficial for recommendation performance to combine (user- and item-) ID embeddings with multimodal salient features, indicating the value of IDs. However, there is a lack of a thorough analysis of the ID embeddings in terms of feature semantics in the literature. In this paper, we revisit the value of ID embeddings for multimodal recommendation and conduct a thorough study regarding its semantics, which we recognize as subtle features of content and structures. Then, we propose a novel recommendation model by incorporating ID embeddings to enhance the semantic features of both content and structures. Specifically, we put forward a hierarchical attention mechanism to incorporate ID embeddings in modality fusing, coupled with contrastive learning, to enhance content representations. Meanwhile, we propose a lightweight graph convolutional network for each modality to amalgamate neighborhood and ID embeddings for improving structural representations. Finally, the content and structure representations are combined to form the ultimate item embedding for recommendation. Extensive experiments on three real-world datasets (Baby, Sports, and Clothing) demonstrate the superiority of our method over state-of-the-art multimodal recommendation methods and the effectiveness of fine-grained ID embeddings.
摘要
多模态推荐的目标是全面地表示用户和项目表示,并利用多种多媒体内容来提供有效的推荐。现有研究表明,将用户和项目ID编码与多模态突出特征结合起来可以提高推荐性能。然而,学术文献中对ID编码的semantics还没有进行了全面的分析。本文重新评估了多模态推荐中ID编码的值,并进行了semantics的全面分析。然后,我们提出了一种新的推荐模型,该模型通过结合ID编码来增强内容和结构的semantics。具体来说,我们提出了一种层次注意机制,将ID编码与多模态融合进行了强调,并与对比学习结合使用,以提高内容表示。同时,我们提出了一种轻量级的图 convolutional network,用于每种模态的卷积整合,以提高结构表示。最后,内容和结构表示被组合,形成了最终的项目嵌入,用于推荐。我们对三个实际 datasets(婴儿、运动和时尚)进行了广泛的实验,并证明了我们的方法在多模态推荐方法中的优越性和ID编码的细腻性。
Learning-Augmented Scheduling for Solar-Powered Electric Vehicle Charging
for: scheduling the charging of electric vehicles equipped with solar panels and batteries, particularly under out-of-distribution (OOD) conditions.
methods: leverages a novel learning-augmented policy that employs a dynamic robustness budget, which is adapted in real-time based on the reinforcement learning policy’s performance, using the temporal difference (TD) error to assess the trustworthiness of the machine-learned policy.
results: markedly improves scheduling effectiveness and reliability, particularly in OOD contexts, paving the way for more resilient and adaptive EV charging systems.Abstract
We tackle the complex challenge of scheduling the charging of electric vehicles (EVs) equipped with solar panels and batteries, particularly under out-of-distribution (OOD) conditions. Traditional scheduling approaches, such as reinforcement learning (RL) and model predictive control (MPC), often fail to provide satisfactory results when faced with OOD data, struggling to balance robustness (worst-case performance) and consistency (near-optimal average performance). To address this gap, we introduce a novel learning-augmented policy. This policy employs a dynamic robustness budget, which is adapted in real-time based on the reinforcement learning policy's performance. Specifically, it leverages the temporal difference (TD) error, a measure of the learning policy's prediction accuracy, to assess the trustworthiness of the machine-learned policy. This method allows for a more effective balance between consistency and robustness in EV charging schedules, significantly enhancing adaptability and efficiency in real-world, unpredictable environments. Our results demonstrate that this approach markedly improves scheduling effectiveness and reliability, particularly in OOD contexts, paving the way for more resilient and adaptive EV charging systems.
摘要
我们面临电动汽车(EV)装有太阳能板和电池的充电时间安排的复杂挑战,尤其在异常输入(OOD)条件下。传统的安排方法,如强化学习(RL)和预测模型控制(MPC),在面临OOD数据时经常无法提供满意的结果,坚持着平衡稳定性(最差性能)和一致性(近似优性)。为解决这个差距,我们介绍了一种新的学习增强策略。这种策略使用动态 robustness预算,实时根据学习策略的性能而改变。具体来说,它利用时间差(TD)错误,用于评估机器学习策略的预测准确性。这种方法允许更好地平衡稳定性和一致性在EV充电时间安排中,大大提高了适应性和效率,特别在实际不可预测的环境中。我们的结果表明,这种方法在OOD上进行了明显改进,大大提高了安排效果和可靠性,开拓了更加可靠和适应的EV充电系统。
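To make the mechanism concrete, the sketch below maps a recent temporal-difference error to a robustness budget and blends the RL action with a robust baseline action accordingly. The exponential trust mapping, the convex combination, and every constant are illustrative assumptions; the paper's actual policy-combination rule and its analysis are not reproduced.

```python
import numpy as np

def robustness_budget(td_errors, lam_min=0.1, lam_max=1.0, scale=1.0):
    """Map recent TD errors of the learned policy to a robustness budget:
    large TD error -> low trust in the RL policy -> lean on the robust baseline."""
    trust = np.exp(-scale * np.mean(np.abs(td_errors)))   # in (0, 1]
    return lam_min + (lam_max - lam_min) * (1.0 - trust)

def combined_action(a_rl, a_robust, lam):
    """Blend the RL charging action with a robust baseline action."""
    return (1.0 - lam) * a_rl + lam * a_robust

lam = robustness_budget(td_errors=[0.05, 0.4, 0.2])
print(lam, combined_action(a_rl=3.2, a_robust=1.0, lam=lam))
```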
Aggregation Weighting of Federated Learning via Generalization Bound Estimation
results: 通过实验,提出的聚合策略可以显著提高多种代表性FL算法在标准数据集上的性能。Abstract
Federated Learning (FL) typically aggregates client model parameters using a weighting approach determined by sample proportions. However, this naive weighting method may lead to unfairness and degradation in model performance due to statistical heterogeneity and the inclusion of noisy data among clients. Theoretically, distributional robustness analysis has shown that the generalization performance of a learning model with respect to any shifted distribution is bounded. This motivates us to reconsider the weighting approach in federated learning. In this paper, we replace the aforementioned weighting method with a new strategy that considers the generalization bounds of each local model. Specifically, we estimate the upper and lower bounds of the second-order origin moment of the shifted distribution for the current local model, and then use these bounds disagreements as the aggregation proportions for weightings in each communication round. Experiments demonstrate that the proposed weighting strategy significantly improves the performance of several representative FL algorithms on benchmark datasets.
摘要
通常, Federated Learning (FL) 使用 Client 模型参数的权重方法进行聚合。但这种简单的权重方法可能会导致不公平和模型性能下降,因为客户端数据具有统计不同性和噪声。理论上,分布robustness分析表明,学习模型对于任何偏移分布的总体性能具有上限。这些上限提供了一个重新考虑权重策略的动机。在本文中,我们将替换原来的权重策略,使用每个本地模型的泛化约束来确定聚合比例。具体来说,我们将估计当前本地模型的第二个源 moments的上下限,并使用这些上下限的差异作为每个通信轮的聚合比例。实验表明,我们的权重策略可以在多个代表性 FL 算法上显著改进模型性能。
Federated Learning with Manifold Regularization and Normalized Update Reaggregation
methods: 本文在联邦学习框架中采用流形模型融合方案,并提出一个以聚合客户端更新范数为全局更新范数的新全局优化器,以缓解模型不一致问题。具体来说,本文利用双曲图流形正则化,使本地模型与全局模型的表示在低维子空间中彼此接近,从而更好地刻画模型不一致性。
results: 实验表明,FedMRUR能够以更少的通信量达到新的最优(SOTA)精度。此外,本文还证明了该算法在部分客户端参与的非凸设定下具有线性加速性质。Abstract
Federated Learning (FL) is an emerging collaborative machine learning framework where multiple clients train the global model without sharing their own datasets. In FL, the model inconsistency caused by the local data heterogeneity across clients results in the near-orthogonality of client updates, which leads to the global update norm reduction and slows down the convergence. Most previous works focus on eliminating the difference of parameters (or gradients) between the local and global models, which may fail to reflect the model inconsistency due to the complex structure of the machine learning model and the Euclidean space's limitation in meaningful geometric representations. In this paper, we propose FedMRUR by adopting the manifold model fusion scheme and a new global optimizer to alleviate the negative impacts. Concretely, FedMRUR adopts a hyperbolic graph manifold regularizer enforcing the representations of the data in the local and global models are close to each other in a low-dimensional subspace. Because the machine learning model has the graph structure, the distance in hyperbolic space can reflect the model bias better than the Euclidean distance. In this way, FedMRUR exploits the manifold structures of the representations to significantly reduce the model inconsistency. FedMRUR also aggregates the client updates norms as the global update norm, which can appropriately enlarge each client's contribution to the global update, thereby mitigating the norm reduction introduced by the near-orthogonality of client updates. Furthermore, we theoretically prove that our algorithm can achieve a linear speedup property for non-convex setting under partial client participation.Experiments demonstrate that FedMRUR can achieve a new state-of-the-art (SOTA) accuracy with less communication.
摘要
联邦学习(FL)是一种在多个客户端上训练全域模型的新兴协力机器学习框架,而不需要客户端分享自己的数据。在FL中,因为客户端的地方数据不同而导致的模型不一致性,导致客户端更新的方向接近垂直方向,这会导致全域更新的规模增加和步骤变慢。大多数先前的工作强调在删除本地和全域模型之间的差异,但这可能无法反映模型不一致性,因为机器学习模型的复杂结构和欧几何空间的限制。在这篇文章中,我们提出了FedMRUR,通过采用数据构造模型融合方案和一个新的全域优化器,以解决这些负面影响。具体来说,FedMRUR采用一个拓扑图 manifold regularizer,使得本地和全域模型的表现在低维度子空间中相似。因为机器学习模型具有图结构,在拓扑图上的距离可以更好地反映模型偏见。这样,FedMRUR可以将数据表现的拓扑图结构纳入到模型训练中,以减少模型不一致性。FedMRUR还将客户端更新的规模总和为全域更新的规模,这可以适当地增加每个客户端的贡献,从而减少由近似垂直方向的客户端更新所导致的规模增加。此外,我们也 theoretically 证明了我们的算法可以在非凸设定下 achievelinear speedup 性。实验结果显示,FedMRUR可以 achieve 新的最佳性(SOTA)的准确性,并且需要更少的通信。
An alternative for one-hot encoding in neural network models
results: 本文的实验结果表明,使用二进制编码和修改前向和反向传播过程可以实现神经网络学习过程中 certain 特征类别数据实例的模型权重变化只影响该特征类别数据实例的计算,从而提高模型的性能。Abstract
This paper proposes an algorithm that implements binary encoding of the categorical features of neural network model input data, while also implementing changes in the forward and backpropagation procedures in order to achieve the property of having model weight changes, that result from the neural network learning process for certain data instances of some feature category, only affect the forward pass calculations for input data instances of that same feature category, as it is in the case of utilising one-hot encoding for categorical features.
摘要
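A minimal sketch of the binary encoding itself is shown below: each category index is written in ceil(log2(K)) bits instead of K one-hot columns. The helper name and the toy categories are illustrative assumptions; the paper's accompanying modifications to the forward and backward passes (so that updates for one category do not affect instances of other categories) are not shown.

```python
import numpy as np

def binary_encode(values, categories):
    """Encode each categorical value as the bits of its category index:
    ceil(log2(K)) columns instead of K one-hot columns."""
    n_bits = max(1, int(np.ceil(np.log2(len(categories)))))
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(values), n_bits), dtype=np.float32)
    for row, v in enumerate(values):
        for bit in range(n_bits):
            out[row, bit] = (index[v] >> bit) & 1
    return out

cats = ["red", "green", "blue", "yellow", "purple"]
print(binary_encode(["blue", "purple", "red"], cats))
# 5 categories -> 3 columns rather than 5 one-hot columns
```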
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
results: FlashFFTConv 在 exact FFT convolutions 中提高了速度,比 PyTorch 快速了 7.93 倍,并在 end-to-end 速度上达到了 4.4 倍速化。此外, FlashFFTConv 在 Hyena-GPT-s 和 M2-BERT-base 中实现了更好的模型质量,与同样计算预算下的模型具有相同或更好的性能。Abstract
Convolution models with long filters have demonstrated state-of-the-art reasoning abilities in many long-sequence tasks but lag behind the most optimized Transformers in wall-clock time. A major bottleneck is the Fast Fourier Transform (FFT)--which allows long convolutions to run in $O(N logN)$ time in sequence length $N$ but has poor hardware utilization. In this paper, we study how to optimize the FFT convolution. We find two key bottlenecks: the FFT does not effectively use specialized matrix multiply units, and it incurs expensive I/O between layers of the memory hierarchy. In response, we propose FlashFFTConv. FlashFFTConv uses a matrix decomposition that computes the FFT using matrix multiply units and enables kernel fusion for long sequences, reducing I/O. We also present two sparse convolution algorithms--1) partial convolutions and 2) frequency-sparse convolutions--which can be implemented simply by skipping blocks in the matrix decomposition, enabling further opportunities for memory and compute savings. FlashFFTConv speeds up exact FFT convolutions by up to 7.93$\times$ over PyTorch and achieves up to 4.4$\times$ speedup end-to-end. Given the same compute budget, FlashFFTConv allows Hyena-GPT-s to achieve 2.3 points better perplexity on the PILE and M2-BERT-base to achieve 3.3 points higher GLUE score--matching models with twice the parameter count. FlashFFTConv also achieves 96.1% accuracy on Path-512, a high-resolution vision task where no model had previously achieved better than 50%. Furthermore, partial convolutions enable longer-sequence models--yielding the first DNA model that can process the longest human genes (2.3M base pairs)--and frequency-sparse convolutions speed up pretrained models while maintaining or improving model quality.
摘要
卷积模型 WITH long filters 已经在许多长序任务中显示出了state-of-the-art的理解能力,但它们在wall-clock时间方面落后于最优化的 Transformer。一个主要瓶颈是 Fast Fourier Transform (FFT),它可以在序列长度 N 的情况下使卷积运算时间为 $O(N \log N)$,但硬件利用率不高。在这篇论文中,我们研究如何优化 FFT 卷积。我们发现了两个关键瓶颈:FFT 不好地使用特殊化矩阵乘法单元,并且在层次结构中进行 I/O 操作会产生昂贵的成本。为了解决这些问题,我们提出了 FlashFFTConv。FlashFFTConv 使用矩阵分解来计算 FFT,并使用矩阵乘法单元进行计算,从而提高硬件利用率。此外,我们还提出了两种稀疏卷积算法:1)部分卷积和2)频率稀疏卷积。这些算法可以通过跳过块来实现,从而实现更多的内存和计算减少。FlashFFTConv 可以在精确 FFT 卷积中提高速度,达到 Up to 7.93 倍 PyTorch 的速度,并在综合评估中达到 Up to 4.4 倍的速度。给定同样的计算预算,FlashFFTConv 允许 Hyena-GPT-s 在 PILE 上达到 2.3 个点更高的折衔率,并使 M2-BERT-base 在 GLUE 上达到 3.3 个点更高的分数。FlashFFTConv 还可以在 Path-512 高分辨率视觉任务中达到 96.1% 的准确率,并且部分卷积可以实现更长的序列模型,例如可以处理人类基因最长的 2.3M 个基因对。此外,频率稀疏卷积可以加速预训练模型,保持或提高模型质量。
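For context, the baseline being accelerated is the textbook FFT convolution: a length-N sequence can be convolved with a long filter in O(N log N) by pointwise multiplication in the frequency domain. The NumPy sketch below verifies this equivalence on random data; FlashFFTConv's actual contributions (the matrix decomposition onto matrix-multiply units, kernel fusion, and the sparse variants) are hardware-level and not reproduced here.

```python
import numpy as np

def fft_conv(u, k):
    """Linear convolution of a sequence with a (long) filter via the FFT:
    O(N log N) instead of O(N^2) for the direct method."""
    n = len(u) + len(k) - 1
    return np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(k, n), n)

rng = np.random.default_rng(0)
u, k = rng.normal(size=1024), rng.normal(size=1024)
assert np.allclose(fft_conv(u, k), np.convolve(u, k))
print("FFT convolution matches direct convolution")
```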
Can Machine Learning Uncover Insights into Vehicle Travel Demand from Our Built Environment?
results: 研究结果表明,使用计算模型可以帮助设计师快速获得交通需求的反馈,包括交通总量和时间分布。此外,计算模型还可以帮助评估和优化城市用地规划,从车辆交通的角度来看。Abstract
In this paper, we propose a machine learning-based approach to address the lack of ability for designers to optimize urban land use planning from the perspective of vehicle travel demand. Research shows that our computational model can help designers quickly obtain feedback on the vehicle travel demand, which includes its total amount and temporal distribution based on the urban function distribution designed by the designers. It also assists in design optimization and evaluation of the urban function distribution from the perspective of vehicle travel. We obtain the city function distribution information and vehicle hours traveled (VHT) information by collecting the city point-of-interest (POI) data and online vehicle data. The artificial neural networks (ANNs) with the best performance in prediction are selected. By using data sets collected in different regions for mutual prediction and remapping the predictions onto a map for visualization, we evaluate the extent to which the computational model sees use across regions in an attempt to reduce the workload of future urban researchers. Finally, we demonstrate the application of the computational model to help designers obtain feedback on vehicle travel demand in the built environment and combine it with genetic algorithms to optimize the current state of the urban environment to provide recommendations to designers.
摘要
在这篇论文中,我们提出了一种基于机器学习的方法,以解决城市规划师无法根据交通工具需求优化城市土地使用的问题。研究表明,我们的计算模型可以帮助城市规划师快速获得交通工具需求的总量和时间分布,包括基于城市功能分布的交通工具需求。此外,它还可以帮助评估和优化城市功能分布的交通工具需求。我们通过收集城市点对点数据和在线交通数据获得城市功能分布信息和交通时间(VHT)信息。我们选择了最佳表现的人工神经网络(ANNs)进行预测。通过在不同地区进行互Predict和重新映射预测结果onto a map for visualization,我们评估了计算模型在不同地区的使用程度,以降低未来城市研究者的工作负担。最后,我们示出了计算模型如何帮助城市规划师获得交通工具需求反馈,并与遗传算法结合优化当前城市环境,以提供建议给城市规划师。
results: 提出了一种新的高阶TRPCA方法LMH-BRTF,通过建立一个基于order-$d$ t-SVD的低级模型和适当的先验来自动确定tensor的多rank结构,并且能够更好地利用噪音信息,从而提高TRPCA的性能。Abstract
The recently proposed tensor robust principal component analysis (TRPCA) methods based on tensor singular value decomposition (t-SVD) have achieved numerous successes in many fields. However, most of these methods are only applicable to third-order tensors, whereas the data obtained in practice are often of higher order, such as fourth-order color videos, fourth-order hyperspectral videos, and fifth-order light-field images. Additionally, in the t-SVD framework, the multi-rank of a tensor can describe more fine-grained low-rank structure in the tensor compared with the tubal rank. However, determining the multi-rank of a tensor is a much more difficult problem than determining the tubal rank. Moreover, most of the existing TRPCA methods do not explicitly model the noises except the sparse noise, which may compromise the accuracy of estimating the low-rank tensor. In this work, we propose a novel high-order TRPCA method, named as Low-Multi-rank High-order Bayesian Robust Tensor Factorization (LMH-BRTF), within the Bayesian framework. Specifically, we decompose the observed corrupted tensor into three parts, i.e., the low-rank component, the sparse component, and the noise component. By constructing a low-rank model for the low-rank component based on the order-$d$ t-SVD and introducing a proper prior for the model, LMH-BRTF can automatically determine the tensor multi-rank. Meanwhile, benefiting from the explicit modeling of both the sparse and noise components, the proposed method can leverage information from the noises more effectivly, leading to an improved performance of TRPCA. Then, an efficient variational inference algorithm is established for parameters estimation. Empirical studies on synthetic and real-world datasets demonstrate the effectiveness of the proposed method in terms of both qualitative and quantitative results.
摘要
最近提出的高阶矩阵坚定原理Component Analysis(TRPCA)方法,基于高阶矩阵均值分解(t-SVD),在多个领域取得了成功。然而,大多数这些方法只适用于第三阶矩阵,而实际数据通常是更高阶的,例如第四阶色视频、第四阶射频视频和第五阶光场图像。此外,在t-SVD框架中,矩阵多rank可以描述矩阵中更细化的低级结构,相比于管道rank。然而,确定矩阵多rank是一个更加困难的问题,而且大多数现有的TRPCA方法并不明确地模型噪音。在这种情况下,我们提出了一种新的高阶TRPCA方法,即含有抽象的高阶矩阵均值分解(LMH-BRTF)。具体来说,我们将观察到的受损矩阵分解成三部分:低级组成部分、稀疏组成部分和噪音组成部分。通过基于第d级t-SVD的低级模型和适当的先验来建立低级模型,LMH-BRTF可以自动确定矩阵多rank。此外,因为明确地模型噪音和稀疏组成,提案的方法可以更好地利用噪音信息,从而提高TRPCA的性能。然后,我们建立了一种高效的变分推理算法来估计参数。empirical studies on synthetic and real-world datasets demonstrate the effectiveness of the proposed method in terms of both qualitative and quantitative results.
Hiformer: Heterogeneous Feature Interactions Learning with Transformers for Recommender Systems
results: 实验结果表明,\textsc{Hiformer}模型可以在大规模推荐系统中提供显著改进(最高提升+2.66%),并且在线部署中具有快速执行速度。Abstract
Learning feature interaction is the critical backbone to building recommender systems. In web-scale applications, learning feature interaction is extremely challenging due to the sparse and large input feature space; meanwhile, manually crafting effective feature interactions is infeasible because of the exponential solution space. We propose to leverage a Transformer-based architecture with attention layers to automatically capture feature interactions. Transformer architectures have witnessed great success in many domains, such as natural language processing and computer vision. However, there has not been much adoption of Transformer architecture for feature interaction modeling in industry. We aim at closing the gap. We identify two key challenges for applying the vanilla Transformer architecture to web-scale recommender systems: (1) Transformer architecture fails to capture the heterogeneous feature interactions in the self-attention layer; (2) The serving latency of Transformer architecture might be too high to be deployed in web-scale recommender systems. We first propose a heterogeneous self-attention layer, which is a simple yet effective modification to the self-attention layer in Transformer, to take into account the heterogeneity of feature interactions. We then introduce \textsc{Hiformer} (\textbf{H}eterogeneous \textbf{I}nteraction Trans\textbf{former}) to further improve the model expressiveness. With low-rank approximation and model pruning, \hiformer enjoys fast inference for online deployment. Extensive offline experiment results corroborates the effectiveness and efficiency of the \textsc{Hiformer} model. We have successfully deployed the \textsc{Hiformer} model to a real world large scale App ranking model at Google Play, with significant improvement in key engagement metrics (up to +2.66\%).
摘要
学习特征交互是构建推荐系统的关键基础。在网络规模的应用中,由于输入特征空间庞大且稀疏,学习特征交互极具挑战;同时,由于解空间呈指数增长,手工设计有效的特征交互并不可行。我们提出利用基于Transformer的架构,通过注意力层自动捕捉特征交互。Transformer架构在自然语言处理和计算机视觉等诸多领域取得了巨大成功,但在工业界尚未被广泛用于特征交互建模,我们的目标是填补这一差距。我们发现将原始(vanilla)Transformer架构应用于网络规模推荐系统存在两个主要挑战:(1)其自注意力层无法捕捉异构的特征交互;(2)其服务延迟可能过高,难以部署于网络规模的推荐系统。我们首先提出一种异构自注意力层,这是对Transformer自注意力层的一种简单而有效的修改,用以考虑特征交互的异构性。随后,我们引入\textsc{Hiformer}(异构交互Transformer)以进一步提升模型表达能力。借助低秩近似和模型剪枝,\textsc{Hiformer}在线上部署时具有快速的推理速度。大量离线实验结果验证了\textsc{Hiformer}模型的有效性和高效性。我们已成功将\textsc{Hiformer}模型部署到Google Play的一个真实大规模应用排序模型中,关键参与度指标获得显著提升(最高+2.66%)。
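The abstract does not spell out how the heterogeneous self-attention layer is parameterized, so the sketch below shows one plausible reading: each feature field receives its own query/key/value projections instead of the shared projections of vanilla self-attention, so that different field pairs interact through different learned transforms. Treat the class name and all shapes as illustrative assumptions rather than the \textsc{Hiformer} implementation.

import torch
import torch.nn as nn

class HeterogeneousSelfAttention(nn.Module):
    """Illustrative 'heterogeneous' self-attention for feature-interaction
    learning: per-field projection matrices rather than shared ones."""
    def __init__(self, num_fields: int, dim: int):
        super().__init__()
        self.wq = nn.Parameter(torch.randn(num_fields, dim, dim) * dim ** -0.5)
        self.wk = nn.Parameter(torch.randn(num_fields, dim, dim) * dim ** -0.5)
        self.wv = nn.Parameter(torch.randn(num_fields, dim, dim) * dim ** -0.5)
        self.scale = dim ** -0.5

    def forward(self, x):                                   # x: (batch, num_fields, dim)
        q = torch.einsum('bfd,fde->bfe', x, self.wq)        # field-specific projections
        k = torch.einsum('bfd,fde->bfe', x, self.wk)
        v = torch.einsum('bfd,fde->bfe', x, self.wv)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v                                     # (batch, num_fields, dim)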
for: This paper investigates the problem of determining whether two random databases are statistically dependent or not.
methods: The paper formulates this problem as a hypothesis testing problem, and uses techniques from information theory and matrix analysis to derive thresholds for optimal testing.
results: The paper shows that the thresholds for optimal testing depend on the number of dimensions $n$ and the spectral properties of the generative distributions of the datasets, and proves that weak detection is statistically impossible when a certain function of the eigenvalues of the likelihood function and $d$ is below a certain threshold, as $d\to\infty$. The paper also derives strong and weak detection lower and upper bounds for the case where $d$ is fixed.Abstract
In this paper, we investigate the problem of deciding whether two random databases $\mathsf{X}\in\mathcal{X}^{n\times d}$ and $\mathsf{Y}\in\mathcal{Y}^{n\times d}$ are statistically dependent or not. This is formulated as a hypothesis testing problem, where under the null hypothesis, these two databases are statistically independent, while under the alternative, there exists an unknown row permutation $\sigma$, such that $\mathsf{X}$ and $\mathsf{Y}^\sigma$, a permuted version of $\mathsf{Y}$, are statistically dependent with some known joint distribution, but have the same marginal distributions as the null. We characterize the thresholds at which optimal testing is information-theoretically impossible and possible, as a function of $n$, $d$, and some spectral properties of the generative distributions of the datasets. For example, we prove that if a certain function of the eigenvalues of the likelihood function and $d$, is below a certain threshold, as $d\to\infty$, then weak detection (performing slightly better than random guessing) is statistically impossible, no matter what the value of $n$ is. This mimics the performance of an efficient test that thresholds a centered version of the log-likelihood function of the observed matrices. We also analyze the case where $d$ is fixed, for which we derive strong (vanishing error) and weak detection lower and upper bounds.
摘要
在这篇论文中,我们研究了判断两个随机数据库 $\mathsf{X}\in\mathcal{X}^{n\times d}$ 和 $\mathsf{Y}\in\mathcal{Y}^{n\times d}$ 是否统计相关的问题。我们将其表述为一个假设检验问题:在零假设下,这两个数据库统计独立;在备择假设下,存在一个未知的行置换 $\sigma$,使得 $\mathsf{X}$ 与 $\mathsf{Y}$ 的置换版本 $\mathsf{Y}^\sigma$ 在某个已知联合分布下统计相关,但其边缘分布与零假设相同。我们刻画了最优检验在信息论意义上不可能与可能的阈值,它们是 $n$、$d$ 以及数据集生成分布谱特性的函数。例如,我们证明:若似然函数特征值与 $d$ 的某个函数低于一定阈值,则当 $d\to\infty$ 时,无论 $n$ 取何值,弱检测(略优于随机猜测)在统计上都是不可能的;这与对观测矩阵的中心化对数似然函数进行阈值判决的高效检验的性能一致。我们还分析了 $d$ 固定的情形,给出了强检测(误差趋于零)与弱检测的下界和上界。
Fair Supervised Learning with A Simple Random Sampler of Sensitive Attributes
results: 实验表明,该方法在常用的基准数据集上取得了比竞争方法更好的效用与公平性指标。此外,本文还从理论上刻画了所提神经惩罚风险最小化问题的估计误差与效用损失。Abstract
As the data-driven decision process becomes dominating for industrial applications, fairness-aware machine learning arouses great attention in various areas. This work proposes fairness penalties learned by neural networks with a simple random sampler of sensitive attributes for non-discriminatory supervised learning. In contrast to many existing works that critically rely on the discreteness of sensitive attributes and response variables, the proposed penalty is able to handle versatile formats of the sensitive attributes, so it is more extensively applicable in practice than many existing algorithms. This penalty enables us to build a computationally efficient group-level in-processing fairness-aware training framework. Empirical evidence shows that our framework enjoys better utility and fairness measures on popular benchmark data sets than competing methods. We also theoretically characterize estimation errors and loss of utility of the proposed neural-penalized risk minimization problem.
摘要
“随着数据驱动的决策过程在工业应用中日益占据主导地位,公平感知机器学习在各个领域受到广泛关注。本工作提出了一种由神经网络学习、并借助敏感属性简单随机采样器实现的公平性惩罚项,用于无歧视的监督学习。与许多严重依赖敏感属性和响应变量离散性的现有工作不同,所提惩罚项能够处理多种格式的敏感属性,因此在实践中比许多现有算法适用范围更广。该惩罚项使我们能够构建计算高效的群体级处理中(in-processing)公平感知训练框架。实验结果表明,我们的框架在常用基准数据集上取得了比竞争方法更好的效用与公平性指标。我们还从理论上刻画了所提神经惩罚风险最小化问题的估计误差与效用损失。”
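The exact form of the neural fairness penalty is not given in the abstract; the sketch below shows one plausible instantiation consistent with the description: a small network scores (prediction, sensitive-attribute) pairs, and contrasting jointly observed pairs against pairs whose sensitive attribute has been independently resampled (the "simple random sampler") yields a differentiable dependence measure that works for discrete or continuous attributes. All names and architecture sizes are hypothetical.

import torch
import torch.nn as nn

class DependencePenalty(nn.Module):
    """Illustrative learned penalty: a critic T scores pairs; the gap between
    its mean score on jointly-drawn (y_hat, s) pairs and on pairs with
    independently resampled s is near zero when predictions and sensitive
    attributes are independent."""
    def __init__(self, pred_dim, attr_dim, hidden=32):
        super().__init__()
        self.T = nn.Sequential(nn.Linear(pred_dim + attr_dim, hidden),
                               nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, y_hat, s):                      # y_hat: (B, pred_dim), s: (B, attr_dim)
        s_perm = s[torch.randperm(s.shape[0])]        # simple random sampler: break the pairing
        joint = self.T(torch.cat([y_hat, s], dim=1)).mean()
        indep = self.T(torch.cat([y_hat, s_perm], dim=1)).mean()
        return joint - indep

# training sketch: total loss = task_loss + lam * penalty, with the critic
# trained to enlarge the gap and the predictor trained to shrink it.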
Clipped-Objective Policy Gradients for Pessimistic Policy Optimization
results: 在连续动作空间中,使用裁剪目标策略梯度(COPG)目标可以在不增加计算成本或复杂度的情况下提升PPO的性能;与TRPO相比,COPG可提供相当或更优的性能。Abstract
To facilitate efficient learning, policy gradient approaches to deep reinforcement learning (RL) are typically paired with variance reduction measures and strategies for making large but safe policy changes based on a batch of experiences. Natural policy gradient methods, including Trust Region Policy Optimization (TRPO), seek to produce monotonic improvement through bounded changes in policy outputs. Proximal Policy Optimization (PPO) is a commonly used, first-order algorithm that instead uses loss clipping to take multiple safe optimization steps per batch of data, replacing the bound on the single step of TRPO with regularization on multiple steps. In this work, we find that the performance of PPO, when applied to continuous action spaces, may be consistently improved through a simple change in objective. Instead of the importance sampling objective of PPO, we instead recommend a basic policy gradient, clipped in an equivalent fashion. While both objectives produce biased gradient estimates with respect to the RL objective, they also both display significantly reduced variance compared to the unbiased off-policy policy gradient. Additionally, we show that (1) the clipped-objective policy gradient (COPG) objective is on average "pessimistic" compared to both the PPO objective and (2) this pessimism promotes enhanced exploration. As a result, we empirically observe that COPG produces improved learning compared to PPO in single-task, constrained, and multi-task learning, without adding significant computational cost or complexity. Compared to TRPO, the COPG approach is seen to offer comparable or superior performance, while retaining the simplicity of a first-order method.
摘要
为了提升学习效率,深度强化学习(RL)中的策略梯度方法通常会配合方差缩减措施,以及基于一批经验做出大幅但安全的策略更新的策略。自然策略梯度方法(包括信赖域策略优化,TRPO)旨在通过对策略输出的有界变化实现单调改进。近端策略优化(PPO)则是一种常用的一阶算法,它利用损失裁剪在每批数据上执行多个安全的优化步骤,用对多步更新的正则化取代了TRPO对单步更新的约束。在这项工作中,我们发现将PPO应用于连续动作空间时,只需对目标函数做一个简单的更改即可持续提升性能:不使用PPO的重要性采样目标,而是采用以等效方式裁剪的基本策略梯度目标。虽然这两种目标相对于RL目标都会产生有偏的梯度估计,但与无偏的离策略策略梯度相比,它们的方差都显著更低。此外,我们证明:(1)裁剪目标策略梯度(COPG)目标平均而言比PPO目标更加"悲观";(2)这种悲观性促进了更充分的探索。因此,我们在单任务、受约束和多任务学习中均观察到,COPG在不增加显著计算成本或复杂度的情况下,学习效果优于PPO。与TRPO相比,COPG可提供相当或更优的性能,同时保留了一阶方法的简洁性。
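To make the objective change concrete, the sketch below contrasts the standard PPO clipped surrogate with one plausible reading of the clipped-objective policy gradient: the plain log-probability surrogate, gated by the same ratio-clipping region as PPO. The PPO part is the well-known objective; the COPG form shown is an assumption drawn from the abstract's wording ("clipped in an equivalent fashion"), not necessarily the paper's exact definition.

import torch

def ppo_loss(logp, logp_old, adv, eps=0.2):
    """Standard PPO clipped surrogate (negated so it can be minimized)."""
    ratio = torch.exp(logp - logp_old)
    surr = torch.min(ratio * adv, torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * adv)
    return -surr.mean()

def clipped_pg_loss(logp, logp_old, adv, eps=0.2):
    """One plausible clipped-objective policy gradient: the plain logp * adv
    surrogate, with updates stopped once the probability ratio leaves
    [1-eps, 1+eps] in the direction favored by the advantage."""
    ratio = torch.exp(logp - logp_old).detach()
    in_region = torch.where(adv >= 0, ratio <= 1.0 + eps, ratio >= 1.0 - eps)
    surr = torch.where(in_region, logp * adv, torch.zeros_like(logp))
    return -surr.mean()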
AccEPT: An Acceleration Scheme for Speeding Up Edge Pipeline-parallel Training
results: 结果显示,在所考虑的实验设置下,该方案能够将边缘流水线并行训练加速最多3倍。Abstract
It is usually infeasible to fit and train an entire large deep neural network (DNN) model using a single edge device due to the limited resources. To facilitate intelligent applications across edge devices, researchers have proposed partitioning a large model into several sub-models, and deploying each of them to a different edge device to collaboratively train a DNN model. However, the communication overhead caused by the large amount of data transmitted from one device to another during training, as well as the sub-optimal partition point due to the inaccurate latency prediction of computation at each edge device can significantly slow down training. In this paper, we propose AccEPT, an acceleration scheme for accelerating the edge collaborative pipeline-parallel training. In particular, we propose a light-weight adaptive latency predictor to accurately estimate the computation latency of each layer at different devices, which also adapts to unseen devices through continuous learning. Therefore, the proposed latency predictor leads to better model partitioning which balances the computation loads across participating devices. Moreover, we propose a bit-level computation-efficient data compression scheme to compress the data to be transmitted between devices during training. Our numerical results demonstrate that our proposed acceleration approach is able to significantly speed up edge pipeline parallel training up to 3 times faster in the considered experimental settings.
摘要
由于资源有限,通常无法用单个边缘设备来容纳并训练整个大型深度神经网络(DNN)模型。为了在边缘设备上支撑智能应用,研究人员提出将大模型划分为多个子模型,并把每个子模型部署到不同的边缘设备上,协同训练DNN模型。然而,训练过程中设备之间传输大量数据造成的通信开销,以及由于各边缘设备计算延迟预测不准确而导致的次优划分点,都会显著拖慢训练。在这篇论文中,我们提出了AccEPT,一种用于加速边缘协同流水线并行训练的加速方案。具体而言,我们提出了一种轻量级的自适应延迟预测器,能够准确估计每一层在不同设备上的计算延迟,并通过持续学习适应未见过的设备,从而得到在参与设备之间计算负载更均衡的模型划分。此外,我们还提出了一种比特级、计算高效的数据压缩方案,用于压缩训练过程中设备间传输的数据。数值结果表明,在所考虑的实验设置下,我们的加速方法可将边缘流水线并行训练加速最多3倍。
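A minimal picture of the "light-weight adaptive latency predictor" might look like the sketch below: a small regressor over simple per-layer descriptors, refined by an online gradient step whenever a true latency measurement is available, so it can adapt to unseen devices. The feature set, model class, and update rule are illustrative assumptions, not the AccEPT design.

import numpy as np

class OnlineLatencyPredictor:
    """Sketch of an adaptive per-layer latency predictor: a linear model over
    simple layer descriptors (e.g., FLOPs, parameter count, activation size),
    updated online from measured latencies on the target device."""
    def __init__(self, n_features, lr=1e-3):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def predict(self, feats):                        # feats: (n_features,)
        return float(self.w @ feats + self.b)

    def update(self, feats, measured_latency):
        err = self.predict(feats) - measured_latency
        self.w -= self.lr * err * feats              # one SGD step on squared error
        self.b -= self.lr * err

# usage: predictions drive the pipeline partition point; each real measurement
# on a (possibly unseen) device feeds back through update().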
Machine Learning-powered Compact Modeling of Stochastic Electronic Devices using Mixture Density Networks
paper_authors: Jack Hutchins, Shamiul Alam, Dana S. Rampini, Bakhrom G. Oripov, Adam N. McCaughan, Ahmedullah Aziz
for: This paper aims to address the challenge of accurately modeling the stochastic behavior of electronic devices in circuit design and simulation.
methods: The authors use Mixture Density Networks (MDNs), a machine learning approach, to model the stochastic behavior of electronic devices and demonstrate their method on heater cryotrons.
results: The authors achieve a mean absolute error of 0.82% in capturing the stochastic switching dynamics of heater cryotrons, showcasing the effectiveness of their approach in accurately simulating the behavior of electronic devices.Abstract
The relentless pursuit of miniaturization and performance enhancement in electronic devices has led to a fundamental challenge in the field of circuit design and simulation: how to accurately account for the inherent stochastic nature of certain devices. While conventional deterministic models have served as indispensable tools for circuit designers, they fall short when it comes to capture the subtle yet critical variability exhibited by many electronic components. In this paper, we present an innovative approach that transcends the limitations of traditional modeling techniques by harnessing the power of machine learning, specifically Mixture Density Networks (MDNs), to faithfully represent and simulate the stochastic behavior of electronic devices. We demonstrate our approach to model heater cryotrons, where the model is able to capture the stochastic switching dynamics observed in the experiment. Our model shows 0.82% mean absolute error for switching probability. This paper marks a significant step forward in the quest for accurate and versatile compact models, poised to drive innovation in the realm of electronic circuits.
摘要
“电子设备对微型化和性能提升的不断追求,给电路设计与仿真领域带来了一个根本性挑战:如何准确刻画某些器件固有的随机特性。传统的确定性模型一直是电路设计者不可或缺的工具,但它们难以捕捉许多电子元件所表现出的细微却关键的变异性。在本文中,我们提出了一种超越传统建模技术局限的方法,利用机器学习(具体为混合密度网络,MDN)来忠实地表示并仿真电子器件的随机行为。我们以加热型冷冻管(heater cryotron)为例验证了该方法,模型能够捕捉实验中观察到的随机开关动态,开关概率的平均绝对误差为0.82%。这标志着在追求精确且通用的紧凑模型的道路上迈出了重要一步,有望推动电子电路领域的创新。”
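For readers unfamiliar with Mixture Density Networks, a minimal PyTorch sketch of the core idea is given below: the network maps a conditioning input to the parameters of a Gaussian mixture, and training minimizes the mixture negative log-likelihood of the observed stochastic responses. Layer sizes, the number of components, and the scalar-output assumption are illustrative, not the paper's architecture.

import torch
import torch.nn as nn

class MDN(nn.Module):
    """Minimal Mixture Density Network: conditioning input -> parameters of a
    K-component Gaussian mixture over the stochastic device response."""
    def __init__(self, in_dim=1, hidden=64, k=5):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh())
        self.pi = nn.Linear(hidden, k)           # mixture logits
        self.mu = nn.Linear(hidden, k)           # component means
        self.log_sigma = nn.Linear(hidden, k)    # component log-std-devs

    def forward(self, x):
        h = self.body(x)
        return self.pi(h), self.mu(h), self.log_sigma(h)

def mdn_nll(pi_logits, mu, log_sigma, y):
    """Negative log-likelihood of scalar targets y under the predicted mixture."""
    log_pi = torch.log_softmax(pi_logits, dim=-1)
    dist = torch.distributions.Normal(mu, log_sigma.exp())
    log_prob = dist.log_prob(y.unsqueeze(-1))    # (batch, K)
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()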
Scale-MIA: A Scalable Model Inversion Attack against Secure Federated Learning via Latent Space Reconstruction
paper_authors: Shanghao Shi, Ning Wang, Yang Xiao, Chaoyu Zhang, Yi Shi, Y. Thomas Hou, Wenjing Lou
for: This paper aims to address the issue of model inversion attacks (MIAs) in federated learning, which can compromise the data privacy of individual users.
methods: The proposed method, called Scale-MIA, uses a two-step process to efficiently and accurately recover training samples from the aggregated model updates. The first step involves reconstructing the latent space representations (LSRs) from the updates using a closed-form inversion mechanism, and the second step involves recovering the whole input batches from the LSRs using a fine-tuned generative decoder.
results: The proposed Scale-MIA method achieves excellent recovery performance on different datasets, with high reconstruction rates, accuracy, and attack efficiency compared to state-of-the-art MIAs. The method is able to efficiently recover the training samples even when the system is under the protection of a robust secure aggregation protocol.Abstract
Federated learning is known for its capability to safeguard participants' data privacy. However, recently emerged model inversion attacks (MIAs) have shown that a malicious parameter server can reconstruct individual users' local data samples through model updates. The state-of-the-art attacks either rely on computation-intensive search-based optimization processes to recover each input batch, making scaling difficult, or they involve the malicious parameter server adding extra modules before the global model architecture, rendering the attacks too conspicuous and easily detectable. To overcome these limitations, we propose Scale-MIA, a novel MIA capable of efficiently and accurately recovering training samples of clients from the aggregated updates, even when the system is under the protection of a robust secure aggregation protocol. Unlike existing approaches treating models as black boxes, Scale-MIA recognizes the importance of the intricate architecture and inner workings of machine learning models. It identifies the latent space as the critical layer for breaching privacy and decomposes the complex recovery task into an innovative two-step process to reduce computation complexity. The first step involves reconstructing the latent space representations (LSRs) from the aggregated model updates using a closed-form inversion mechanism, leveraging specially crafted adversarial linear layers. In the second step, the whole input batches are recovered from the LSRs by feeding them into a fine-tuned generative decoder. We implemented Scale-MIA on multiple commonly used machine learning models and conducted comprehensive experiments across various settings. The results demonstrate that Scale-MIA achieves excellent recovery performance on different datasets, exhibiting high reconstruction rates, accuracy, and attack efficiency on a larger scale compared to state-of-the-art MIAs.
摘要
联邦学习以保护参与者的数据隐私而著称。然而,最近出现的模型反演攻击(MIA)表明,恶意的参数服务器可以通过模型更新重建单个用户的本地数据样本。现有最先进的攻击要么依赖计算密集的基于搜索的优化过程来恢复每个输入批次,难以扩展;要么需要恶意参数服务器在全局模型结构之前添加额外模块,使攻击过于明显、容易被检测。为克服这些限制,我们提出了Scale-MIA,一种新型MIA,即使系统受到健壮的安全聚合协议保护,也能够高效且准确地从聚合更新中恢复客户端的训练样本。与将模型视为黑盒的现有方法不同,Scale-MIA重视机器学习模型复杂的结构与内部机理。它将潜在空间(latent space)识别为突破隐私的关键层,并将复杂的恢复任务分解为两步,以降低计算复杂度。第一步利用特制的对抗线性层,通过闭式反演机制从聚合模型更新中重建潜在空间表示(LSR);第二步将LSR输入经过微调的生成解码器,恢复整个输入批次。我们在多个常用的机器学习模型上实现了Scale-MIA,并在多种设置下进行了全面实验。结果表明,Scale-MIA在不同数据集上均取得了出色的恢复性能,在更大规模下表现出比最先进MIA更高的重建率、准确率和攻击效率。
Improvements on Uncertainty Quantification for Node Classification via Distance-Based Regularization
paper_authors: Russell Alan Hart, Linlin Yu, Yifei Lou, Feng Chen
for: The paper focuses on uncertainty quantification for interdependent node-level classification, specifically addressing the limitations of the widely-used uncertainty cross-entropy (UCE) loss function and proposing a distance-based regularization to improve the performance of graph posterior networks (GPNs) in detecting out-of-distribution (OOD) nodes.
methods: The paper uses graph posterior networks (GPNs) that optimize the uncertainty cross-entropy (UCE)-based loss function, and proposes a distance-based regularization to encourage clustered OOD nodes to remain clustered in the latent space.
results: The proposed regularization outperforms the state-of-the-art in both OOD detection and misclassification detection, as demonstrated through extensive comparison experiments on eight standard datasets.
methods: 这篇论文使用优化基于不确定性交叉熵(UCE)损失函数的图后验网络(GPN),并提出一种基于距离的正则化,鼓励成簇的OOD节点在潜在空间中保持成簇。
results: 所提正则化在OOD检测和误分类检测上均优于现有最佳方法,这通过在八个标准数据集上的大量对比实验得到了验证。Abstract
Deep neural networks have achieved significant success in the last decades, but they are not well-calibrated and often produce unreliable predictions. A large number of literature relies on uncertainty quantification to evaluate the reliability of a learning model, which is particularly important for applications of out-of-distribution (OOD) detection and misclassification detection. We are interested in uncertainty quantification for interdependent node-level classification. We start our analysis based on graph posterior networks (GPNs) that optimize the uncertainty cross-entropy (UCE)-based loss function. We describe the theoretical limitations of the widely-used UCE loss. To alleviate the identified drawbacks, we propose a distance-based regularization that encourages clustered OOD nodes to remain clustered in the latent space. We conduct extensive comparison experiments on eight standard datasets and demonstrate that the proposed regularization outperforms the state-of-the-art in both OOD detection and misclassification detection.
摘要
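The uncertainty cross-entropy has a simple closed form for Dirichlet predictions, shown below together with a schematic distance-based regularizer; the regularizer here (pulling suspected-OOD latent vectors toward their own mean so they remain clustered) is only one plausible reading, since the abstract does not give the exact expression. Variable names and the OOD mask are assumptions.

import torch

def uce_loss(alpha, y):
    """Uncertainty cross-entropy for Dirichlet parameters alpha (n_nodes, C):
    E_{p ~ Dir(alpha)}[-log p_y] = digamma(alpha_0) - digamma(alpha_y)."""
    alpha0 = alpha.sum(dim=-1)
    alpha_y = alpha.gather(1, y.unsqueeze(1)).squeeze(1)
    return (torch.digamma(alpha0) - torch.digamma(alpha_y)).mean()

def distance_regularizer(z, ood_mask):
    """Schematic distance-based term: keep latent representations z of
    suspected-OOD nodes close to their own mean, i.e., clustered."""
    z_ood = z[ood_mask]
    if z_ood.shape[0] < 2:
        return z.new_zeros(())
    center = z_ood.mean(dim=0, keepdim=True)
    return ((z_ood - center) ** 2).sum(dim=-1).mean()

# total loss sketch: uce_loss(alpha, y_train) + lam * distance_regularizer(z, ood_mask)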
paper_authors: Juhwan Lee, Justin N. Kim, Luis A. P. Dallan, Vladislav N. Zimin, Ammar Hoori, Neda S. Hassani, Mohamed H. E. Makhlouf, Giulio Guagliumi, Hiram G. Bezerra, David L. Wilson
results: 研究发现,该方法的FC分割结果优于其他深度学习方法(Dice系数为0.837+/-0.012),并且在五折交叉验证(敏感度85.0+/-0.3%,Dice系数0.846+/-0.011)和保留测试集(敏感度84.9%,Dice系数0.816)上表现良好。此外,FC厚度与真实值高度一致(偏差为2.95+/-20.73um),且支架植入前后的回拉数据之间重复性极高(平均FC角度为200.9+/-128.0度/202.0+/-121.1度)。Abstract
Thin-cap fibroatheroma (TCFA) is a prominent risk factor for plaque rupture. Intravascular optical coherence tomography (IVOCT) enables identification of fibrous cap (FC), measurement of FC thicknesses, and assessment of plaque vulnerability. We developed a fully-automated deep learning method for FC segmentation. This study included 32,531 images across 227 pullbacks from two registries. Images were semi-automatically labeled using our OCTOPUS with expert editing using established guidelines. We employed preprocessing including guidewire shadow detection, lumen segmentation, pixel-shifting, and Gaussian filtering on raw IVOCT (r,theta) images. Data were augmented in a natural way by changing theta in spiral acquisitions and by changing intensity and noise values. We used a modified SegResNet and comparison networks to segment FCs. We employed transfer learning from our existing much larger, fully-labeled calcification IVOCT dataset to reduce deep-learning training. Overall, our method consistently delivered better FC segmentation results (Dice: 0.837+/-0.012) than other deep-learning methods. Transfer learning reduced training time by 84% and reduced the need for more training samples. Our method showed a high level of generalizability, evidenced by highly-consistent segmentations across five-fold cross-validation (sensitivity: 85.0+/-0.3%, Dice: 0.846+/-0.011) and the held-out test (sensitivity: 84.9%, Dice: 0.816) sets. In addition, we found excellent agreement of FC thickness with ground truth (2.95+/-20.73 um), giving clinically insignificant bias. There was excellent reproducibility in pre- and post-stenting pullbacks (average FC angle: 200.9+/-128.0 deg / 202.0+/-121.1 deg). Our method will be useful for multiple research purposes and potentially for planning stent deployments that avoid placing a stent edge over an FC.
摘要
薄帽纤维粥样斑块(TCFA)是斑块破裂的重要危险因素。血管内光学相干断层成像(IVOCT)可以识别纤维帽(FC)、测量FC厚度并评估斑块易损性。我们开发了一种全自动的深度学习FC分割方法。本研究包含来自两个注册研究、227次回拉的32,531幅图像。图像利用我们的OCTOPUS进行半自动标注,并由专家依据既定指南进行编辑。我们对原始IVOCT(r,theta)图像进行了预处理,包括导丝阴影检测、管腔分割、像素平移和高斯滤波。数据以自然的方式进行增广,包括改变螺旋采集中的theta以及改变强度和噪声值。我们使用改进的SegResNet及多个对比网络来分割FC,并利用我们已有的、规模更大且完全标注的钙化IVOCT数据集进行迁移学习,以减少深度学习训练。总体而言,我们的方法的FC分割结果(Dice:0.837+/-0.012)始终优于其他深度学习方法。迁移学习将训练时间缩短了84%,并降低了对更多训练样本的需求。我们的方法具有很高的泛化能力,在五折交叉验证(敏感度:85.0+/-0.3%,Dice:0.846+/-0.011)和保留测试集(敏感度:84.9%,Dice:0.816)上均得到高度一致的分割结果。此外,FC厚度与真实值高度一致(2.95+/-20.73 um),偏差在临床上可以忽略。支架植入前后的回拉数据之间也具有极好的重复性(平均FC角度:200.9+/-128.0度/202.0+/-121.1度)。我们的方法将可用于多种研究目的,并有望用于规划支架植入,以避免将支架边缘置于FC之上。
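The "natural" (r, theta) augmentations and Gaussian smoothing described above translate into a few lines of array operations; the sketch below rolls the angular axis (equivalent to changing the acquisition start angle), perturbs intensity and noise, and applies a Gaussian filter. Parameter ranges are illustrative assumptions, not the paper's settings.

import numpy as np
from scipy.ndimage import gaussian_filter

def augment_rtheta(img, rng, max_shift=None, gain_range=(0.9, 1.1), noise_std=0.01):
    """Sketch of (r, theta) IVOCT augmentation: circular shift along the angular
    axis, global intensity scaling, additive noise, then Gaussian smoothing as
    in the described preprocessing. img is assumed to be an (r, theta) array."""
    n_theta = img.shape[1]
    shift = rng.integers(0, n_theta if max_shift is None else max_shift)
    out = np.roll(img, shift, axis=1)             # rotation in polar coordinates
    out = out * rng.uniform(*gain_range)          # intensity change
    out = out + rng.normal(0.0, noise_std, out.shape)
    return gaussian_filter(out, sigma=1.0)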
Perceptual impact of the loss function on deep-learning image coding performance
results: 该论文通过一项众包主观图像质量评估研究,考察不同图像质量指标对深度学习图像编解码器感知性能的影响。结果表明,质量指标的选择对深度学习编解码器的感知性能至关重要,且最佳选择可能随图像内容而变化。Abstract
Nowadays, deep-learning image coding solutions have shown similar or better compression efficiency than conventional solutions based on hand-crafted transforms and spatial prediction techniques. These deep-learning codecs require a large training set of images and a training methodology to obtain a suitable model (set of parameters) for efficient compression. The training is performed with an optimization algorithm which provides a way to minimize the loss function. Therefore, the loss function plays a key role in the overall performance and includes a differentiable quality metric that attempts to mimic human perception. The main objective of this paper is to study the perceptual impact of several image quality metrics that can be used in the loss function of the training process, through a crowdsourcing subjective image quality assessment study. From this study, it is possible to conclude that the choice of the quality metric is critical for the perceptual performance of the deep-learning codec and that can vary depending on the image content.
摘要
如今,深度学习图像编码方案已经展现出与基于手工设计变换和空间预测技术的传统方案相当或更优的压缩效率。这类深度学习编解码器需要大规模的图像训练集和相应的训练方法,以获得适合高效压缩的模型(参数集)。训练通过优化算法进行,该算法提供了最小化损失函数的途径。因此,损失函数对整体性能起着关键作用,其中包含一个试图模拟人类视觉感知的可微质量指标。本文的主要目标是通过一项众包主观图像质量评估研究,考察可用于训练损失函数的多种图像质量指标对感知质量的影响。研究结果表明,质量指标的选择对深度学习编解码器的感知性能至关重要,并且最佳选择可能随图像内容而变化。
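The point about the loss function can be summarized by the usual rate-distortion training objective with a pluggable quality metric; swapping the metric (e.g., MSE versus a perceptual proxy such as 1 - MS-SSIM) is exactly the design choice the study evaluates. The sketch assumes any perceptual metric is supplied as an external differentiable function; the trade-off weight lam is illustrative.

import torch

def rd_loss(x, x_hat, rate_bits, quality_metric, lam=0.01):
    """Rate-distortion objective for a learned codec with a pluggable,
    differentiable distortion term: R + lam * D."""
    distortion = quality_metric(x, x_hat)
    return rate_bits.mean() + lam * distortion

# example pluggable metrics
mse = lambda x, y: torch.mean((x - y) ** 2)
# a perceptual distortion would be swapped in here, e.g. lambda x, y: 1 - ms_ssim(x, y),
# assuming a differentiable MS-SSIM implementation is available to the training code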
YOLOv5s-BC: An improved YOLOv5s-based method for real-time apple detection
paper_authors: Jingfan Liu, Zhaobing Liu
for: 这项研究旨在解决现有苹果检测算法存在的问题,提出一种基于YOLOv5s的改进方法,以实现实时苹果检测。
methods: 该方法在骨干(backbone)模块中加入坐标注意力(CA)块,并在颈部(neck)模块中用双向特征金字塔网络(BiFPN)替换原有的拼接操作;此外,还在头部(head)模块中新增一个检测头,以便检测视野中更小、更远的目标。
results: 与YOLOv5s、YOLOv4、YOLOv3、SSD、Faster R-CNN(ResNet50)和Faster R-CNN(VGG)等多种目标检测算法相比,所提方法的mAP分别提升4.6%、3.6%、20.48%、23.22%、15.27%和15.59%,检测精度较原始YOLOv5s模型也显著提升。模型的平均检测速度为0.018秒/图像,模型大小仅16.7 Mb(比YOLOv8s小4.7 Mb),满足采摘机器人的实时要求。根据热力图,所提模型能更好地关注并学习目标苹果的高层特征,对小目标苹果的识别优于原始YOLOv5s模型。在其他苹果园的测试中,该模型能够实时、正确地检测可采摘的苹果。Abstract
To address the issues associated with the existing algorithms for the current apple detection, this study proposes an improved YOLOv5s-based method, named YOLOv5s-BC, for real-time apple detection, in which a series of modifications have been introduced. Firstly, a coordinate attention (CA) block has been incorporated into the backbone module to construct a new backbone network. Secondly, the original concatenation operation has been replaced with a bidirectional feature pyramid network (BiFPN) in the neck module. Lastly, a new detection head has been added to the head module, enabling the detection of smaller and more distant targets within the field of view of the robot. The proposed YOLOv5s-BC model was compared to several target detection algorithms, including YOLOv5s, YOLOv4, YOLOv3, SSD, Faster R-CNN (ResNet50), and Faster R-CNN (VGG), with significant improvements of 4.6%, 3.6%, 20.48%, 23.22%, 15.27%, and 15.59% in mAP, respectively. The detection accuracy of the proposed model is also greatly enhanced over the original YOLOv5s model. The model boasts an average detection speed of 0.018 seconds per image, and the weight size is only 16.7 Mb with 4.7 Mb smaller than that of YOLOv8s, meeting the real-time requirements for the picking robot. Furthermore, according to the heat map, our proposed model can focus more on and learn the high-level features of the target apples, and recognize the smaller target apples better than the original YOLOv5s model. Then, in other apple orchard tests, the model can detect the pickable apples in real time and correctly, illustrating a decent generalization ability.
摘要
为解决现有算法在苹果检测中存在的问题,本研究提出了改进的YOLOv5s方法,称为YOLOv5s-BC,用于实时苹果检测。具体而言,我们在骨干模块中加入了坐标注意力(CA)块,在颈部模块中用双向特征金字塔网络(BiFPN)替换了原有的拼接操作,并在头部模块中新增了一个检测头,以便检测视野中更小、更远的目标。与其他目标检测算法相比,我们的YOLOv5s-BC模型在mAP上取得显著提升,分别为4.6%、3.6%、20.48%、23.22%、15.27%和15.59%。此外,该模型满足实时要求,每张图像的平均检测时间为0.018秒,模型权重大小仅16.7 Mb,比YOLOv8s小4.7 Mb。根据热力图,所提模型能更好地关注并学习目标苹果的高层特征,对较小目标苹果的识别优于原始YOLOv5s模型。在其他苹果园的测试中,该模型能够实时、正确地检测可采摘的苹果。
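For reference, a coordinate attention block of the kind added to the YOLOv5s backbone (following Hou et al., CVPR 2021) can be written as below; the reduction ratio and activation are common defaults and may differ from the paper's settings.

import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate Attention: directional average pooling along H and W, a shared
    reduction conv, then per-direction attention maps that reweight the input."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                       # (b, c, h, 1): pool over W
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (b, c, w, 1): pool over H
        y = self.act(self.bn1(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                   # (b, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (b, c, 1, w)
        return x * a_h * a_w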
results: 我们提出了适用于无蜂窝无线网络的随机接入协议,在利用随机大气吸收、衰落非线性、硬件损伤和天线未对准误差的联合效应的同时,确保多个用户以有限的时延和能量损失成功传输。我们考虑了两种方案:固定传输概率(FTP)方案,其中每个用户的传输概率(TP)在数据传输开始时更新;以及自适应传输概率(ATP)方案,其中TP在每次成功接收数据后更新。我们以时延、能耗和中断概率分析了这两种协议的性能,并给出了在预定服务质量(QoS)下用户传输单包数据帧的扩展律。Abstract
The current body of research on terahertz (THz) wireless communications predominantly focuses on its application for single-user backhaul/fronthaul connectivity at sub-THz frequencies. First, we develop a generalized statistical model for signal propagation at THz frequencies encompassing physical layer impairments, including random path-loss with Gamma distribution for the molecular absorption coefficient, short-term fading characterized by the $\alpha$-$\eta$-$\kappa$-$\mu$ distribution, antenna misalignment errors, and transceiver hardware impairments. Next, we propose random access protocols for a cell-free wireless network, ensuring successful transmission for multiple users with limited delay and energy loss, exploiting the combined effect of random atmospheric absorption, non-linearity of fading, hardware impairments, and antenna misalignment errors. We consider two schemes: a fixed transmission probability (FTP) scheme where the transmission probability (TP) of each user is updated at the beginning of the data transmission and an adaptive transmission probability (ATP) scheme where the TP is updated with each successful reception of the data. We analyze the performance of both protocols using delay, energy consumption, and outage probability with scaling laws for the transmission of a data frame consisting of a single packet from users at a predefined quality of service (QoS).
摘要
Passive Integrated Sensing and Communication Scheme based on RF Fingerprint Information Extraction for Cell-Free RAN
results: 仿真结果表明,所提出的被动ISAC方案能够有效探测环境中反射体的位置信息,且不会降低通信性能。Abstract
This paper investigates how to achieve integrated sensing and communication (ISAC) based on a cell-free radio access network (CF-RAN) architecture with a minimum footprint of communication resources. We propose a new passive sensing scheme. The scheme is based on the radio frequency (RF) fingerprint learning of the RF radio unit (RRU) to build an RF fingerprint library of RRUs. The source RRU is identified by comparing the RF fingerprints carried by the signal at the receiver side. The receiver extracts the channel parameters from the signal and estimates the channel environment, thus locating the reflectors in the environment. The proposed scheme can effectively solve the problem of interference between signals in the same time-frequency domain but in different spatial domains when multiple RRUs jointly serve users in CF-RAN architecture. Simulation results show that the proposed passive ISAC scheme can effectively detect reflector location information in the environment without degrading the communication performance.
摘要
Fully-Passive versus Semi-Passive IRS-Enabled Sensing: SNR and CRB Comparison
paper_authors: Xianxin Song, Xinmin Li, Xiaoqi Qin, Jie Xu, Tony Xiao Han, Derrick Wing Kwan Ng
for: 本研究分别针对采用全被动(fully-passive)和半被动(semi-passive)智能反射面(IRS)的两种非视距(NLoS)感知系统展开研究。
methods: 研究考虑一个基本设置:一个基站(BS)、一个均匀线阵(ULA)IRS以及一个位于BS非视距区域的点目标。具体而言,在BS发射波束成形与IRS反射波束成形联合优化的情况下,分析目标检测场景下的感知信噪比(SNR)性能,以及目标到达角(DoA)估计场景下的克拉美-罗界(CRB)性能。
results: 结果表明,当IRS配备的反射单元数$N$足够大时,半被动IRS感知系统的最大感知SNR随$N^2$成比例增长,而全被动IRS系统则随$N^4$成比例增长。此外,对于半被动和全被动IRS感知系统,最小CRB分别随$N^4$和$N^6$成反比例下降。Abstract
This paper investigates the sensing performance of two intelligent reflecting surface (IRS)-enabled non-line-of-sight (NLoS) sensing systems with fully-passive and semi-passive IRSs, respectively. In particular, we consider a fundamental setup with one base station (BS), one uniform linear array (ULA) IRS, and one point target in the NLoS region of the BS. Accordingly, we analyze the sensing signal-to-noise ratio (SNR) performance for a target detection scenario and the estimation Cram\'er-Rao bound (CRB) performance for a target's direction-of-arrival (DoA) estimation scenario, in cases where the transmit beamforming at the BS and the reflective beamforming at the IRS are jointly optimized. First, for the target detection scenario, we characterize the maximum sensing SNR when the BS-IRS channels are line-of-sight (LoS) and Rayleigh fading, respectively. It is revealed that when the number of reflecting elements $N$ equipped at the IRS becomes sufficiently large, the maximum sensing SNR increases proportionally to $N^2$ for the semi-passive-IRS sensing system, but proportionally to $N^4$ for the fully-passive-IRS counterpart. Then, for the target's DoA estimation scenario, we analyze the minimum CRB performance when the BS-IRS channel follows Rayleigh fading. Specifically, when $N$ grows, the minimum CRB decreases inversely proportionally to $N^4$ and $N^6$ for the semi-passive and fully-passive-IRS sensing systems, respectively. Finally, numerical results are presented to corroborate our analysis across various transmit and reflective beamforming design schemes under general channel setups. It is shown that the fully-passive-IRS sensing system outperforms the semi-passive counterpart when $N$ exceeds a certain threshold. This advantage is attributed to the additional reflective beamforming gain in the IRS-BS path, which efficiently compensates for the path loss for a large $N$.
摘要
For the target detection scenario, we characterize the maximum sensing SNR when the BS-IRS channels are line-of-sight (LoS) and Rayleigh fading, respectively. Our results show that when the number of reflecting elements $N$ equipped at the IRS becomes sufficiently large, the maximum sensing SNR increases proportionally to $N^2$ for the semi-passive-IRS sensing system, but proportionally to $N^4$ for the fully-passive-IRS counterpart.For the target's DoA estimation scenario, we analyze the minimum CRB performance when the BS-IRS channel follows Rayleigh fading. Our results show that when $N$ grows, the minimum CRB decreases inversely proportionally to $N^4$ and $N^6$ for the semi-passive and fully-passive-IRS sensing systems, respectively.Numerical results are presented to corroborate our analysis across various transmit and reflective beamforming design schemes under general channel setups. Our results show that the fully-passive-IRS sensing system outperforms the semi-passive counterpart when $N$ exceeds a certain threshold. This advantage is attributed to the additional reflective beamforming gain in the IRS-BS path, which efficiently compensates for the path loss for a large $N$.
Sensing-Assisted Sparse Channel Recovery for Massive Antenna Systems
results: 数值结果表明,所提出的感知辅助方法相比传统的基于DFT稀疏基、不借助感知的设计,显著提升了总可达速率;这得益于训练开销的降低以及在有限反馈下更高的重建精度。Abstract
This correspondence presents a novel sensing-assisted sparse channel recovery approach for massive antenna wireless communication systems. We focus on a fundamental configuration with one massive-antenna base station (BS) and one single-antenna communication user (CU). The wireless channel exhibits sparsity and consists of multiple paths associated with scatterers detectable via radar sensing. Under this setup, the BS first sends downlink pilots to the CU and concurrently receives the echo pilot signals for sensing the surrounding scatterers. Subsequently, the CU sends feedback information on its received pilot signal to the BS. Accordingly, the BS determines the sparse basis based on the sensed scatterers and proceeds to recover the wireless channel, exploiting the feedback information based on advanced compressive sensing (CS) algorithms. Numerical results show that the proposed sensing-assisted approach significantly increases the overall achievable rate than the conventional design relying on a discrete Fourier transform (DFT)-based sparse basis without sensing, thanks to the reduced training overhead and enhanced recovery accuracy with limited feedback.
摘要
First, the BS sends downlink pilots to the CU and receives echo pilot signals for sensing the surrounding scatterers. Then, the CU sends feedback information on its received pilot signal to the BS. Based on the sensed scatterers, the BS determines the sparse basis and uses advanced compressive sensing (CS) algorithms to recover the wireless channel. Numerical results show that the proposed sensing-assisted approach significantly increases the overall achievable rate compared to the conventional design that relies on a discrete Fourier transform (DFT)-based sparse basis without sensing. This is because the sensing-assisted approach reduces the training overhead and enhances the recovery accuracy under limited feedback, leading to better performance.
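A minimal sketch of sensing-assisted sparse recovery consistent with the description above: the dictionary is built only from steering vectors at the sensed scatterer angles, and a standard compressive-sensing solver (orthogonal matching pursuit here) recovers the path gains from a few pilot observations. The pilot matrix, angles, and dimensions are hypothetical; the paper's exact feedback scheme and CS algorithm may differ.

import numpy as np

def steering_vector(n_ant, angle_rad, spacing=0.5):
    k = np.arange(n_ant)
    return np.exp(1j * 2 * np.pi * spacing * k * np.sin(angle_rad)) / np.sqrt(n_ant)

def omp(y, A, sparsity):
    """Orthogonal matching pursuit: recover a sparse x from y = A x + noise."""
    residual, support = y.copy(), []
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(sparsity):
        idx = int(np.argmax(np.abs(A.conj().T @ residual)))
        if idx not in support:
            support.append(idx)
        x_s, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ x_s
    x[support] = x_s
    return x

# hypothetical example: dictionary from sensed scatterer angles, few pilot observations
rng = np.random.default_rng(1)
n_ant, sensed_angles = 64, np.deg2rad([10.0, -25.0, 40.0])
A = np.stack([steering_vector(n_ant, a) for a in sensed_angles], axis=1)   # sensed sparse basis
g_true = rng.standard_normal(3) + 1j * rng.standard_normal(3)              # path gains
Phi = (rng.standard_normal((16, n_ant)) + 1j * rng.standard_normal((16, n_ant))) / np.sqrt(2 * n_ant)
y = Phi @ (A @ g_true)                                                      # noiseless pilot observations
g_hat = omp(y, Phi @ A, sparsity=3)
h_hat = A @ g_hat                                                           # reconstructed channel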
results: 在wTIMIT数据库的各个说话人组中,美式英语取得最佳结果;与基线相比,耳语语音的词错误率相对降低18.2%。进一步分析发现,耳语语音中缺失的声门信息对耳语语音识别性能的影响最大。Abstract
Whispering is a distinct form of speech known for its soft, breathy, and hushed characteristics, often used for private communication. The acoustic characteristics of whispered speech differ substantially from normally phonated speech and the scarcity of adequate training data leads to low automatic speech recognition (ASR) performance. To address the data scarcity issue, we use a signal processing-based technique that transforms the spectral characteristics of normal speech to those of pseudo-whispered speech. We augment an End-to-End ASR with pseudo-whispered speech and achieve an 18.2% relative reduction in word error rate for whispered speech compared to the baseline. Results for the individual speaker groups in the wTIMIT database show the best results for US English. Further investigation showed that the lack of glottal information in whispered speech has the largest impact on whispered speech ASR performance.
摘要
耳语是一种独特的言语形式,以其轻柔、带气息和低声的特点而著称,常用于私密交流。耳语语音的声学特性与正常发声的语音差异很大,而充足训练数据的缺乏导致其自动语音识别(ASR)性能较低。为解决数据稀缺问题,我们采用一种基于信号处理的技术,将正常语音的频谱特性转换为伪耳语语音的频谱特性。我们用伪耳语语音对端到端ASR进行数据增广,使耳语语音的词错误率相比基线相对降低18.2%。wTIMIT数据库中各说话人组的结果显示美式英语效果最佳。进一步分析表明,耳语语音中声门信息的缺失对耳语语音ASR性能的影响最大。
results: 与现有方法相比,我们的方法能够以更高的精度重建音场,并能灵活地适应不同音场的声学特性。Abstract
Accurately representing the sound field with the high spatial resolution is critical for immersive and interactive sound field reproduction technology. To minimize experimental effort, data-driven methods have been proposed to estimate sound fields from a small number of discrete observations. In particular, kernel-based methods using Gaussian Processes (GPs) with a covariance function to model spatial correlations have been used for sound field reconstruction. However, these methods have limitations due to the fixed kernels having limited expressiveness, requiring manual identification of optimal kernels for different sound fields. In this work, we propose a new approach that parameterizes GPs using a deep neural network based on Neural Processes (NPs) to reconstruct the magnitude of the sound field. This method has the advantage of dynamically learning kernels from simulated data using an attention mechanism, allowing for greater flexibility and adaptability to the acoustic properties of the sound field. Numerical experiments demonstrate that our proposed approach outperforms current methods in reconstructing accuracy, providing a promising alternative for sound field reconstruction.
摘要
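As background for the kernel-learning idea above, the classical fixed-kernel baseline it improves upon is plain GP regression over microphone positions; a minimal sketch is given below with a hand-picked RBF kernel. In the paper this kernel is instead parameterized by a neural process trained on simulated data; the lengthscale, noise level, and input dimensionality here are illustrative assumptions.

import numpy as np

def rbf_kernel(X1, X2, lengthscale=0.3, variance=1.0):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_reconstruct(X_obs, y_obs, X_query, noise=1e-3, **kern_kwargs):
    """Fixed-kernel GP regression for sound-field magnitude reconstruction from
    sparse microphone positions X_obs (m, 3) and observed magnitudes y_obs;
    returns the posterior mean and variance at X_query."""
    K = rbf_kernel(X_obs, X_obs, **kern_kwargs) + noise * np.eye(len(X_obs))
    K_s = rbf_kernel(X_query, X_obs, **kern_kwargs)
    alpha = np.linalg.solve(K, y_obs)
    mean = K_s @ alpha
    v = np.linalg.solve(K, K_s.T)
    var = rbf_kernel(X_query, X_query, **kern_kwargs).diagonal() - np.sum(K_s * v.T, axis=1)
    return mean, var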