cs.AI - 2023-11-04

UniTSFace: Unified Threshold Integrated Sample-to-Sample Loss for Face Recognition

  • paper_url: http://arxiv.org/abs/2311.02523
  • repo_url: None
  • paper_authors: Qiufu Li, Xi Jia, Jiancan Zhou, Linlin Shen, Jinming Duan
  • for: This paper proposes an efficient face recognition method that satisfies the requirements of real-world face verification applications.
  • methods: The method uses a novel unified threshold integrated sample-to-sample loss (USS loss), which features an explicit unified threshold for distinguishing positive from negative facial pairs.
  • results: Experiments show that the proposed USS loss is a highly efficient sample-to-sample loss that can be combined with sample-to-class losses. The resulting model performs strongly on multiple benchmark datasets, including MFR, IJB-C, LFW, CFP-FP, AgeDB, and MegaFace, outperforming existing methods such as CosFace, ArcFace, VPL, AnchorFace, and UNPG.
    Abstract Sample-to-class-based face recognition models cannot fully explore the cross-sample relationship among large amounts of facial images, while sample-to-sample-based models require sophisticated pairing processes for training. Furthermore, neither method satisfies the requirements of real-world face verification applications, which expect a unified threshold separating positive from negative facial pairs. In this paper, we propose a unified threshold integrated sample-to-sample based loss (USS loss), which features an explicit unified threshold for distinguishing positive from negative pairs. Inspired by our USS loss, we also derive the sample-to-sample based softmax and BCE losses, and discuss their relationship. Extensive evaluation on multiple benchmark datasets, including MFR, IJB-C, LFW, CFP-FP, AgeDB, and MegaFace, demonstrates that the proposed USS loss is highly efficient and can work seamlessly with sample-to-class-based losses. The embedded loss (USS and sample-to-class Softmax loss) overcomes the pitfalls of previous approaches and the trained facial model UniTSFace exhibits exceptional performance, outperforming state-of-the-art methods, such as CosFace, ArcFace, VPL, AnchorFace, and UNPG. Our code is available.
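The paper defines the exact form of the USS loss; as a rough, hedged illustration of the core idea, a single shared threshold that positive-pair similarities are pushed above and negative-pair similarities below, a sketch might look like the following (the softplus form, the scale s, and the pairing logic are illustrative assumptions, not the paper's definition):

```python
import torch
import torch.nn.functional as F

def unified_threshold_pair_loss(emb, labels, b, s=32.0):
    """Toy sample-to-sample loss with one shared threshold b: positive
    pairs are pushed above b in cosine similarity, negatives below it,
    so a single cutoff separates them at verification time."""
    emb = F.normalize(emb, dim=1)
    sim = emb @ emb.t()                               # pairwise cosine similarity
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool)
    pos = sim[same & ~eye]                            # positive pairs, no self-pairs
    neg = sim[~same]                                  # negative pairs
    return F.softplus(s * (b - pos)).mean() + F.softplus(s * (neg - b)).mean()

emb = torch.randn(8, 128, requires_grad=True)         # face embeddings
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])       # identities
b = torch.tensor(0.3, requires_grad=True)             # learnable unified threshold
print(unified_threshold_pair_loss(emb, labels, b))
```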

MAAIP: Multi-Agent Adversarial Interaction Priors for imitation from fighting demonstrations for physics-based characters

  • paper_url: http://arxiv.org/abs/2311.02502
  • repo_url: None
  • paper_authors: Mohamed Younes, Ewa Kijak, Richard Kulpa, Simon Malinowski, Franck Multon
  • for: This paper proposes a method for simulating the interactions and motions of multiple physics-based characters, targeting interactive applications and automatic secondary-character animation in the movie and video game industries.
  • methods: The approach builds on Multi-Agent Generative Adversarial Imitation Learning, using imitation learning techniques to model both the interactions and the motions of multiple physics-based characters.
  • results: The method was tested on two different fighting styles, boxing and full-body martial arts, and successfully imitated the interactions and motions of each style.
    Abstract Simulating realistic interaction and motions for physics-based characters is of great interest for interactive applications, and automatic secondary character animation in the movie and video game industries. Recent works in reinforcement learning have proposed impressive results for single character simulation, especially the ones that use imitation learning based techniques. However, imitating the interactions and motions of multiple characters requires also modeling their interactions. In this paper, we propose a novel Multi-Agent Generative Adversarial Imitation Learning based approach that generalizes the idea of motion imitation for one character to deal with both the interaction and the motions of the multiple physics-based characters. Two unstructured datasets are given as inputs: 1) a single-actor dataset containing motions of a single actor performing a set of motions linked to a specific application, and 2) an interaction dataset containing a few examples of interactions between multiple actors. Based on these datasets, our system trains control policies allowing each character to imitate the interactive skills associated with each actor, while preserving the intrinsic style. This approach has been tested on two different fighting styles, boxing and full-body martial art, to demonstrate the ability of the method to imitate different styles.

Forecasting Post-Wildfire Vegetation Recovery in California using a Convolutional Long Short-Term Memory Tensor Regression Network

  • paper_url: http://arxiv.org/abs/2311.02492
  • repo_url: None
  • paper_authors: Jiahe Liu, Xiaodi Wang
  • for: This study aims to support the development of successful ecosystem recovery strategies by improving our understanding of post-wildfire vegetation recovery.
  • methods: It introduces a novel approach, a Convolutional Long Short-Term Memory Tensor Regression (ConvLSTMTR) network, for forecasting post-fire vegetation recovery.
  • results: Results show that the ConvLSTMTR network accurately predicts the rate of post-fire vegetation recovery and can cluster fires into distinct recovery trends.
    Abstract The study of post-wildfire plant regrowth is essential for developing successful ecosystem recovery strategies. Prior research mainly examines key ecological and biogeographical factors influencing post-fire succession. This research proposes a novel approach for predicting and analyzing post-fire plant recovery. We develop a Convolutional Long Short-Term Memory Tensor Regression (ConvLSTMTR) network that predicts future Normalized Difference Vegetation Index (NDVI) based on short-term plant growth data after fire containment. The model is trained and tested on 104 major California wildfires occurring between 2013 and 2020, each with burn areas exceeding 3000 acres. The integration of ConvLSTM with tensor regression enables the calculation of an overall logistic growth rate k using predicted NDVI. Overall, our k-value predictions demonstrate impressive performance, with 50% of predictions exhibiting an absolute error of 0.12 or less, and 75% having an error of 0.24 or less. Finally, we employ Uniform Manifold Approximation and Projection (UMAP) and KNN clustering to identify recovery trends, offering insights into regions with varying rates of recovery. This study pioneers the combined use of tensor regression and ConvLSTM, and introduces the application of UMAP for clustering similar wildfires. This advances predictive ecological modeling and could inform future post-fire vegetation management strategies.
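The abstract describes computing an overall logistic growth rate k from the predicted NDVI series. As a hedged sketch of that final step, one can fit a logistic curve to an NDVI trajectory with SciPy (the parameterization and initial guesses below are assumptions, not the paper's procedure):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, L, k, t0):
    """Logistic growth: L is the asymptote, k the growth rate, t0 the midpoint."""
    return L / (1.0 + np.exp(-k * (t - t0)))

# toy NDVI recovery series, e.g. monthly composites after fire containment
t = np.arange(24, dtype=float)
ndvi = logistic(t, L=0.6, k=0.4, t0=8.0) + 0.02 * np.random.default_rng(0).normal(size=24)

(L, k, t0), _ = curve_fit(logistic, t, ndvi, p0=[0.5, 0.3, 10.0])
print(f"estimated logistic growth rate k = {k:.3f}")
```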

Uncertainty Quantification of Deep Learning for Spatiotemporal Data: Challenges and Opportunities

  • paper_url: http://arxiv.org/abs/2311.02485
  • repo_url: None
  • paper_authors: Wenchong He, Zhe Jiang
  • for: With the advancement of GPS, remote sensing, and computational simulations, large amounts of geospatial and spatiotemporal data are being collected at an increasing speed, providing unique opportunities to transform society. However, deep learning models sometimes make unexpected and incorrect predictions with unwarranted confidence, causing severe consequences in high-stake decision-making applications. Uncertainty quantification (UQ) aims to estimate a deep learning model's confidence.
  • methods: This paper provides a brief overview of UQ of deep learning for spatiotemporal data, including its unique challenges and existing methods, with a particular focus on the importance of uncertainty sources.
  • results: The paper identifies several future research directions for UQ on spatiotemporal data.
    Abstract With the advancement of GPS, remote sensing, and computational simulations, large amounts of geospatial and spatiotemporal data are being collected at an increasing speed. Such emerging spatiotemporal big data assets, together with the recent progress of deep learning technologies, provide unique opportunities to transform society. However, it is widely recognized that deep learning sometimes makes unexpected and incorrect predictions with unwarranted confidence, causing severe consequences in high-stake decision-making applications (e.g., disaster management, medical diagnosis, autonomous driving). Uncertainty quantification (UQ) aims to estimate a deep learning model's confidence. This paper provides a brief overview of UQ of deep learning for spatiotemporal data, including its unique challenges and existing methods. We particularly focus on the importance of uncertainty sources. We identify several future research directions for spatiotemporal data.
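As a concrete example of what UQ for a deep model can look like in practice, Monte Carlo dropout, a widely used technique in this space (not one attributed to this overview specifically), keeps dropout active at inference and reads uncertainty off the spread of repeated stochastic forward passes:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(0.2), nn.Linear(64, 1))

def mc_dropout_predict(model, x, n_samples=50):
    """Keep dropout on at inference; the mean of the stochastic passes is
    the prediction, their standard deviation a rough uncertainty estimate."""
    model.train()  # activates dropout layers
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(0), preds.std(0)

x = torch.randn(4, 16)
mean, std = mc_dropout_predict(model, x)
print(mean.squeeze(), std.squeeze())
```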

Generalized zero-shot audio-to-intent classification

  • paper_url: http://arxiv.org/abs/2311.02482
  • repo_url: None
  • paper_authors: Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki
  • for: This study aims to improve the ability of audio-based spoken language understanding systems to handle unseen intents.
  • methods: It proposes a generalized zero-shot audio-to-intent classification framework that needs only a few sample text sentences per intent. A supervised audio-to-intent classifier is first trained on top of a self-supervised pre-trained model; a neural audio synthesizer then creates audio embeddings for the sample text utterances, and unseen intents are classified via cosine similarity. A multimodal training strategy that injects lexical information into the audio representation is also proposed to improve zero-shot performance.
  • results: The multimodal training strategy improves the accuracy of zero-shot classification on unseen intents by 2.75% on SLURP and by 18.2% on an internal goal-oriented dialog dataset, compared to audio-only training.
    Abstract Spoken language understanding systems using audio-only data are gaining popularity, yet their ability to handle unseen intents remains limited. In this study, we propose a generalized zero-shot audio-to-intent classification framework with only a few sample text sentences per intent. To achieve this, we first train a supervised audio-to-intent classifier by making use of a self-supervised pre-trained model. We then leverage a neural audio synthesizer to create audio embeddings for sample text utterances and perform generalized zero-shot classification on unseen intents using cosine similarity. We also propose a multimodal training strategy that incorporates lexical information into the audio representation to improve zero-shot performance. Our multimodal training approach improves the accuracy of zero-shot intent classification on unseen intents of SLURP by 2.75% and 18.2% for the SLURP and internal goal-oriented dialog datasets, respectively, compared to audio-only training.
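The generalized zero-shot step described above reduces to a nearest-prototype search in embedding space. A minimal sketch, with the audio encoder outputs and the synthesizer-derived intent prototypes stubbed out as random tensors:

```python
import torch
import torch.nn.functional as F

# stand-ins: one embedding per test utterance, and one prototype per intent
# built from synthesized audio of that intent's sample text sentences
audio_emb = torch.randn(5, 256)        # 5 test utterances
intent_protos = torch.randn(10, 256)   # 10 intents (seen + unseen)

def zero_shot_classify(audio_emb, intent_protos):
    """Assign each utterance to the intent with highest cosine similarity."""
    a = F.normalize(audio_emb, dim=1)
    p = F.normalize(intent_protos, dim=1)
    return (a @ p.t()).argmax(dim=1)

print(zero_shot_classify(audio_emb, intent_protos))
```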

Constrained Equation Learner Networks for Precision-Preserving Extrapolation of Robotic Skills

  • paper_url: http://arxiv.org/abs/2311.02475
  • repo_url: None
  • paper_authors: Hector Perez-Villeda, Justus Piater, Matteo Saveriano
  • for: This paper addresses how skills learned from human demonstrations can be adapted to environments and conditions that differ from the training data, without collecting new demonstrations, while preserving the precision of the desired adaptations.
  • methods: It proposes a new supervised learning framework, Constrained Equation Learner Networks (CEN), which treats trajectory adaptation in Programming by Demonstration as a constrained regression problem. Instead of a single fixed family of basis functions, CEN uses Equation Learner Networks to learn a set of analytical expressions that serve as basis functions.
  • results: Experiments in simulation and on real robotic tasks show that CEN adapts robotic skills better than existing approaches, achieving higher generalization and adaptability while preserving the precision of the adaptations.
    Abstract In Programming by Demonstration, the robot learns novel skills from human demonstrations. After learning, the robot should be able not only to reproduce the skill, but also to generalize it to shifted domains without collecting new training data. Adaptation to similar domains has been investigated in the literature; however, an open problem is how to adapt learned skills to different conditions that are outside of the data distribution, and, more important, how to preserve the precision of the desired adaptations. This paper presents a novel supervised learning framework called Constrained Equation Learner Networks that addresses the trajectory adaptation problem in Programming by Demonstrations from a constrained regression perspective. While conventional approaches for constrained regression use one kind of basis function, e.g., Gaussian, we exploit Equation Learner Networks to learn a set of analytical expressions and use them as basis functions. These basis functions are learned from demonstration with the objective to minimize deviations from the training data while imposing constraints that represent the desired adaptations, like new initial or final points or maintaining the trajectory within given bounds. Our approach addresses three main difficulties in adapting robotic trajectories: 1) minimizing the distortion of the trajectory for new adaptations; 2) preserving the precision of the adaptations; and 3) dealing with the lack of intuition about the structure of basis functions. We validate our approach both in simulation and in real experiments in a set of robotic tasks that require adaptation due to changes in the environment, and we compare obtained results with two existing approaches. Performed experiments show that Constrained Equation Learner Networks outperform state of the art approaches by increasing generalization and adaptability of robotic skills.

Multi-State Brain Network Discovery

  • paper_url: http://arxiv.org/abs/2311.02466
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Hang Yin, Yao Su, Xinyue Liu, Thomas Hartvigsen, Yanhua Li, Xiangnan Kong
  • for: This paper aims to discover brain networks from spatio-temporal signals obtained by neuroimaging data, such as fMRI scans of human brains, and to model multi-state brain networks that capture the intricate patterns of brain activities.
  • methods: The proposed method, called MNGL (Multi-state Network Graphical Lasso), combines CGL (coherent graphical lasso) with GMM (Gaussian Mixture Model) to successfully model multi-state brain networks.
  • results: Compared to recent state-of-the-art alternatives, MNGL discovers more explanatory and realistic results on both synthetic and real-world ADHD 200 fMRI datasets.
    Abstract Brain network discovery aims to find nodes and edges from the spatio-temporal signals obtained by neuroimaging data, such as fMRI scans of human brains. Existing methods tend to derive representative or average brain networks, assuming observed signals are generated by only a single brain activity state. However, the human brain usually involves multiple activity states, which jointly determine the brain activities. The brain regions and their connectivity usually exhibit intricate patterns that are difficult to capture with only a single-state network. Recent studies find that brain parcellation and connectivity change according to the brain activity state. We refer to such brain networks as multi-state, and this mixture can help us understand human behavior. Thus, compared to a single-state network, a multi-state network can prevent us from losing crucial information of cognitive brain network. To achieve this, we propose a new model called MNGL (Multi-state Network Graphical Lasso), which successfully models multi-state brain networks by combining CGL (coherent graphical lasso) with GMM (Gaussian Mixture Model). Using both synthetic and real world ADHD 200 fMRI datasets, we demonstrate that MNGL outperforms recent state-of-the-art alternatives by discovering more explanatory and realistic results.

Levels of AGI: Operationalizing Progress on the Path to AGI

  • paper_url: http://arxiv.org/abs/2311.02462
  • repo_url: None
  • paper_authors: Meredith Ringel Morris, Jascha Sohl-dickstein, Noah Fiedel, Tris Warkentin, Allan Dafoe, Aleksandra Faust, Clement Farabet, Shane Legg
  • for: This paper proposes a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors.
  • methods: The paper analyzes existing definitions of AGI and distills six principles that a useful ontology for AGI should satisfy.
  • results: The paper proposes “Levels of AGI” based on depth (performance) and breadth (generality) of capabilities, and discusses the challenges of quantifying the behavior and capabilities of AGI models against these levels.
    Abstract We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy. It is our hope that this framework will be useful in an analogous way to the levels of autonomous driving, by providing a common language to compare models, assess risks, and measure progress along the path to AGI. To develop our framework, we analyze existing definitions of AGI, and distill six principles that a useful ontology for AGI should satisfy. These principles include focusing on capabilities rather than mechanisms; separately evaluating generality and performance; and defining stages along the path toward AGI, rather than focusing on the endpoint. With these principles in mind, we propose 'Levels of AGI' based on depth (performance) and breadth (generality) of capabilities, and reflect on how current systems fit into this ontology. We discuss the challenging requirements for future benchmarks that quantify the behavior and capabilities of AGI models against these levels. Finally, we discuss how these levels of AGI interact with deployment considerations such as autonomy and risk, and emphasize the importance of carefully selecting Human-AI Interaction paradigms for responsible and safe deployment of highly capable AI systems.

Can ChatGPT support software verification?

  • paper_url: http://arxiv.org/abs/2311.02433
  • repo_url: None
  • paper_authors: Christian Janßen, Cedric Richter, Heike Wehrheim
  • for: This paper investigates whether ChatGPT can support formal software verification.
  • methods: ChatGPT is asked to generate loop invariants for 106 C programs, and the validity and usefulness of the generated invariants are checked with the verifiers Frama-C and CPAchecker.
  • results: The results show that ChatGPT can produce valid and useful loop invariants, enabling Frama-C to verify tasks it could not solve before.
    Abstract Large language models have become increasingly effective in software engineering tasks such as code generation, debugging and repair. Language models like ChatGPT can not only generate code, but also explain its inner workings and in particular its correctness. This raises the question whether we can utilize ChatGPT to support formal software verification. In this paper, we take some first steps towards answering this question. More specifically, we investigate whether ChatGPT can generate loop invariants. Loop invariant generation is a core task in software verification, and the generation of valid and useful invariants would likely help formal verifiers. To provide some first evidence on this hypothesis, we ask ChatGPT to annotate 106 C programs with loop invariants. We check validity and usefulness of the generated invariants by passing them to two verifiers, Frama-C and CPAchecker. Our evaluation shows that ChatGPT is able to produce valid and useful invariants allowing Frama-C to verify tasks that it could not solve before. Based on our initial insights, we propose ways of combining ChatGPT (or large language models in general) and software verifiers, and discuss current limitations and open issues.
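To make the task concrete, the artifact ChatGPT produces is an ACSL-style loop invariant that a deductive verifier can try to discharge. The sketch below is a hedged illustration of that pipeline: the annotated program is a hand-written example of the kind of output involved, and the `frama-c -wp` invocation is an assumption about the standard WP plugin entry point rather than the paper's exact setup:

```python
import subprocess
import tempfile

# A C function annotated with the kind of ACSL loop invariant the paper
# asks ChatGPT to generate (here hand-written for illustration).
annotated_c = r"""
/*@ requires n >= 0;
    ensures \result == n * (n - 1) / 2; */
int sum_below(int n) {
  int s = 0;
  /*@ loop invariant 0 <= i <= n;
      loop invariant s == i * (i - 1) / 2;
      loop assigns i, s; */
  for (int i = 0; i < n; i++) s += i;
  return s;
}
"""

def check_with_frama_c(source: str) -> int:
    """Write the annotated program to disk and ask Frama-C's WP plugin to
    verify it. Assumes a local Frama-C install; exact flags may vary."""
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write(source)
        path = f.name
    return subprocess.run(["frama-c", "-wp", path]).returncode

if __name__ == "__main__":
    print("verifier exit code:", check_with_frama_c(annotated_c))
```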

CDR-Adapter: Learning Adapters to Dig Out More Transferring Ability for Cross-Domain Recommendation Models

  • paper_url: http://arxiv.org/abs/2311.02398
  • repo_url: None
  • paper_authors: Yanyu Chen, Yao Yao, Wai Kin Victor Chan, Li Xiao, Kai Zhang, Liang Zhang, Yun Ye
  • for: Addressing the data sparsity and cold-start problems in recommendation systems to improve recommendation performance.
  • methods: It proposes a scalable and efficient solution, CDR-Adapter, which decouples the original recommendation model from the mapping function. Knowledge is transferred through plug-and-play adapter modules that align feature representations, without re-engineering the network structure, thereby avoiding high computational costs and catastrophic forgetting of the original knowledge.
  • results: Extensive experiments on a benchmark dataset demonstrate the effectiveness of the approach over several state-of-the-art CDR methods.
    Abstract Data sparsity and cold-start problems are persistent challenges in recommendation systems. Cross-domain recommendation (CDR) is a promising solution that utilizes knowledge from the source domain to improve the recommendation performance in the target domain. Previous CDR approaches have mainly followed the Embedding and Mapping (EMCDR) framework, which involves learning a mapping function to facilitate knowledge transfer. However, these approaches necessitate re-engineering and re-training the network structure to incorporate transferrable knowledge, which can be computationally expensive and may result in catastrophic forgetting of the original knowledge. In this paper, we present a scalable and efficient paradigm to address data sparsity and cold-start issues in CDR, named CDR-Adapter, by decoupling the original recommendation model from the mapping function, without requiring re-engineering the network structure. Specifically, CDR-Adapter is a novel plug-and-play module that employs adapter modules to align feature representations, allowing for flexible knowledge transfer across different domains and efficient fine-tuning with minimal training costs. We conducted extensive experiments on the benchmark dataset, which demonstrated the effectiveness of our approach over several state-of-the-art CDR approaches.
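The adapter pattern referenced here is a small bottleneck network trained on top of a frozen backbone. A generic sketch (the dimensions, placement, and residual form are illustrative assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter with a residual connection; only this module is
    trained, while the backbone recommendation model stays frozen."""
    def __init__(self, dim, bottleneck=32):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

backbone = nn.Linear(64, 64)                 # stand-in for a frozen CDR model
for p in backbone.parameters():
    p.requires_grad = False

adapter = Adapter(64)
user_emb = torch.randn(8, 64)                # source-domain user embeddings
target_emb = adapter(backbone(user_emb))     # aligned target-domain features
print(target_emb.shape)
```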

Continual Learning of Unsupervised Monocular Depth from Videos

  • paper_url: http://arxiv.org/abs/2311.02393
  • repo_url: https://github.com/NeurAI-Lab/CUDE-MonoDepthCL
  • paper_authors: Hemang Chawla, Arnav Varma, Elahe Arani, Bahram Zonooz
  • for: Improving unsupervised monocular depth estimation under continual learning, with applications in robotics and autonomous driving.
  • methods: It introduces a framework for continual unsupervised depth estimation (CUDE) together with the metrics needed to evaluate it, and proposes a rehearsal-based dual-memory method, MonoDepthCL, which exploits spatiotemporal consistency for continual learning even when camera intrinsics are unknown.
  • results: Trained and tested on sequentially collected data from different domains and scales, the model shows stable and improving performance under continual learning.
    Abstract Spatial scene understanding, including monocular depth estimation, is an important problem in various applications, such as robotics and autonomous driving. While improvements in unsupervised monocular depth estimation have potentially allowed models to be trained on diverse crowdsourced videos, this remains underexplored as most methods utilize the standard training protocol, wherein the models are trained from scratch on all data after new data is collected. Instead, continual training of models on sequentially collected data would significantly reduce computational and memory costs. Nevertheless, naive continual training leads to catastrophic forgetting, where the model performance deteriorates on older domains as it learns on newer domains, highlighting the trade-off between model stability and plasticity. While several techniques have been proposed to address this issue in image classification, the high-dimensional and spatiotemporally correlated outputs of depth estimation make it a distinct challenge. To the best of our knowledge, no framework or method currently exists focusing on the problem of continual learning in depth estimation. Thus, we introduce a framework that captures the challenges of continual unsupervised depth estimation (CUDE), and define the necessary metrics to evaluate model performance. We propose a rehearsal-based dual-memory method, MonoDepthCL, which utilizes spatiotemporal consistency for continual learning in depth estimation, even when the camera intrinsics are unknown.
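Rehearsal-based continual learning keeps a small buffer of past samples and mixes them into each new batch to counter forgetting. A minimal reservoir-sampling buffer, independent of the paper's depth-estimation specifics:

```python
import random

class RehearsalBuffer:
    """Fixed-size reservoir of past training samples for replay."""
    def __init__(self, capacity=256):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, sample):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(sample)
        else:
            # reservoir sampling keeps each past sample with equal probability
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = sample

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

buf = RehearsalBuffer(capacity=4)
for frame_id in range(20):
    buf.add(frame_id)          # in practice: video frames or image pairs
replay = buf.sample(2)         # mixed into the current-domain training batch
print(buf.data, replay)
```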

Cross-Level Distillation and Feature Denoising for Cross-Domain Few-Shot Classification

  • paper_url: http://arxiv.org/abs/2311.02392
  • repo_url: https://github.com/jarucezh/cldfd
  • paper_authors: Hao Zheng, Runqi Wang, Jianzhuang Liu, Asako Kanezaki
  • for: This work targets cross-domain few-shot classification, where the base and target datasets used to train the model come from different domains.
  • methods: It adopts cross-level knowledge distillation, guiding the network's shallow layers to learn higher-level information so that the model extracts more discriminative features in the target domain, and proposes a feature denoising operation to reduce feature redundancy and mitigate overfitting.
  • results: The method surpasses the previous state-of-the-art Dynamic-Distillation on the BSCD-FSL benchmark by an average of 5.44% on 1-shot and 1.37% on 5-shot classification tasks. Code will be available at https://github.com/jarucezh/cldfd.
    Abstract The conventional few-shot classification aims at learning a model on a large labeled base dataset and rapidly adapting to a target dataset that is from the same distribution as the base dataset. However, in practice, the base and the target datasets of few-shot classification are usually from different domains, which is the problem of cross-domain few-shot classification. We tackle this problem by making a small proportion of unlabeled images in the target domain accessible in the training stage. In this setup, even though the base data are sufficient and labeled, the large domain shift still makes transferring the knowledge from the base dataset difficult. We meticulously design a cross-level knowledge distillation method, which can strengthen the ability of the model to extract more discriminative features in the target dataset by guiding the network's shallow layers to learn higher-level information. Furthermore, in order to alleviate the overfitting in the evaluation stage, we propose a feature denoising operation which can reduce the feature redundancy and mitigate overfitting. Our approach can surpass the previous state-of-the-art method, Dynamic-Distillation, by 5.44% on 1-shot and 1.37% on 5-shot classification tasks on average in the BSCD-FSL benchmark. The implementation code will be available at https://github.com/jarucezh/cldfd.
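Cross-level distillation guides a network's shallow layers with higher-level features. A generic sketch of such a loss, with a 1x1 projection to match channel widths (the layer pairing, pooling, and loss form are assumptions, not the paper's exact design):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cross_level_kd_loss(student_shallow, teacher_deep, proj):
    """Match a student's shallow feature map to a (detached) deeper teacher
    feature map after projecting channels to the same width."""
    s = proj(student_shallow)
    t = teacher_deep.detach()
    # pool away spatial-size differences before comparing
    s = F.adaptive_avg_pool2d(s, 1).flatten(1)
    t = F.adaptive_avg_pool2d(t, 1).flatten(1)
    return F.mse_loss(F.normalize(s, dim=1), F.normalize(t, dim=1))

proj = nn.Conv2d(64, 256, kernel_size=1)
student_feat = torch.randn(4, 64, 28, 28)    # shallow student features
teacher_feat = torch.randn(4, 256, 7, 7)     # deeper teacher features
print(cross_level_kd_loss(student_feat, teacher_feat, proj))
```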

AI-based Self-healing Solutions Applied to Cellular Networks: An Overview

  • paper_url: http://arxiv.org/abs/2311.02390
  • repo_url: None
  • paper_authors: Jaleh Farmani, Amirreza Khalil Zadeh
  • for: The paper is written for researchers and practitioners in the field of cellular networks, specifically those interested in self-healing and machine learning techniques for network management.
  • methods: The paper provides an overview of machine learning methods, including classical and deep learning variants, that are used to implement self-healing for cell outages in cellular networks.
  • results: The paper reviews the state of the art in the literature on cell outages, with a particular emphasis on machine-learning-based approaches.
    Abstract In this article, we provide an overview of machine learning (ML) methods, both classical and deep variants, that are used to implement self-healing for cell outages in cellular networks. Self-healing is a promising approach to network management, which aims to detect and compensate for cell outages in an autonomous way. This technology aims to decrease the expenses associated with the installation and maintenance of existing 4G and 5G, i.e. emerging 6G networks by simplifying operational tasks through its ability to heal itself. We provide an overview of the basic concepts and taxonomy for SON, self-healing, and ML techniques, in network management. Moreover, we review the state-of-the-art in literature for cell outages, with a particular emphasis on ML-based approaches.

Ultra-Long Sequence Distributed Transformer

  • paper_url: http://arxiv.org/abs/2311.02382
  • repo_url: None
  • paper_authors: Xiao Wang, Isaac Lyngaas, Aristeidis Tsaris, Peng Chen, Sajal Dash, Mayanka Chandra Shekar, Tao Luo, Hong-Jun Yoon, Mohamed Wahib, John Gouley
  • for: This paper proposes an efficient distributed training method for training transformer models on long sequences.
  • methods: The method splits a long sequence into segments distributed among GPUs, with each GPU computing a partial self-attention for its segment. A fused communication scheme and a double gradient averaging technique avoid aggregating the partial self-attention results and minimize communication overhead.
  • results: Compared with state-of-the-art Nvidia sequence parallelism, the method achieves a 5.6x speedup and 10.2x better memory efficiency on 144 Nvidia V100 GPUs. The algorithm further scales to an extreme sequence length of 50,112 on 3,456 GPUs, reaching 161% super-linear parallel efficiency and a throughput of 32 petaflops.
    Abstract Transformer models trained on long sequences often achieve higher accuracy than short sequences. Unfortunately, conventional transformers struggle with long sequence training due to the overwhelming computation and memory requirements. Existing methods for long sequence training offer limited speedup and memory reduction, and may compromise accuracy. This paper presents a novel and efficient distributed training method, the Long Short-Sequence Transformer (LSS Transformer), for training transformer with long sequences. It distributes a long sequence into segments among GPUs, with each GPU computing a partial self-attention for its segment. Then, it uses a fused communication and a novel double gradient averaging technique to avoid the need to aggregate partial self-attention and minimize communication overhead. We evaluated the performance between LSS Transformer and the state-of-the-art Nvidia sequence parallelism on a Wikipedia enwik8 dataset. Results show that our proposed method lead to 5.6x faster and 10.2x more memory-efficient implementation compared to state-of-the-art sequence parallelism on 144 Nvidia V100 GPUs. Moreover, our algorithm scales to an extreme sequence length of 50,112 at 3,456 GPUs, achieving 161% super-linear parallel efficiency and a throughput of 32 petaflops.
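The core partitioning idea, each device computing attention only for its own query segment, can be illustrated in a single process; the result is identical to full attention, just computed segment by segment (the paper's fused communication and double gradient averaging are omitted here):

```python
import torch
import torch.nn.functional as F

def segment_parallel_attention(q, k, v, n_devices=4):
    """Each simulated device computes attention for its own query segment
    against the full keys/values (which real systems obtain via communication)."""
    outputs = []
    for q_seg in q.chunk(n_devices, dim=0):   # one segment per "device"
        scores = q_seg @ k.t() / k.shape[1] ** 0.5
        outputs.append(F.softmax(scores, dim=-1) @ v)
    return torch.cat(outputs, dim=0)

seq_len, dim = 1024, 64
q, k, v = (torch.randn(seq_len, dim) for _ in range(3))
out = segment_parallel_attention(q, k, v)
full = F.softmax(q @ k.t() / dim ** 0.5, dim=-1) @ v
print(torch.allclose(out, full, atol=1e-5))   # True: same result, less peak memory
```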

Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models

  • paper_url: http://arxiv.org/abs/2311.02379
  • repo_url: None
  • paper_authors: Kun Chu, Xufeng Zhao, Cornelius Weber, Mengdi Li, Stefan Wermter
  • for: Improving the learning efficiency and success rate of RL agents by providing useful feedback from large language models.
  • methods: It leverages large language models (LLMs), pre-trained on vast language data, to give RL agents timely, human-like feedback that helps them learn robotic control tasks faster and more successfully.
  • results: Experiments on RLBench tasks show that, with simple natural-language prompt design, the Lafite-RL agent learns faster and outperforms the baseline in both learning efficiency and success rate.
    Abstract Reinforcement Learning (RL) plays an important role in the robotic manipulation domain since it allows self-learning from trial-and-error interactions with the environment. Still, sample efficiency and reward specification seriously limit its potential. One possible solution involves learning from expert guidance. However, obtaining a human expert is impractical due to the high cost of supervising an RL agent, and developing an automatic supervisor is a challenging endeavor. Large Language Models (LLMs) demonstrate remarkable abilities to provide human-like feedback on user inputs in natural language. Nevertheless, they are not designed to directly control low-level robotic motions, as their pretraining is based on vast internet data rather than specific robotics data. In this paper, we introduce the Lafite-RL (Language agent feedback interactive Reinforcement Learning) framework, which enables RL agents to learn robotic tasks efficiently by taking advantage of LLMs' timely feedback. Our experiments conducted on RLBench tasks illustrate that, with simple prompt design in natural language, the Lafite-RL agent exhibits improved learning capabilities when guided by an LLM. It outperforms the baseline in terms of both learning efficiency and success rate, underscoring the efficacy of the rewards provided by an LLM.

MTS-DVGAN: Anomaly Detection in Cyber-Physical Systems using a Dual Variational Generative Adversarial Network

  • paper_url: http://arxiv.org/abs/2311.02378
  • repo_url: None
  • paper_authors: Haili Sun, Yan Huang, Lansheng Han, Cai Fu, Hongle Liu, Xiang Long
  • for: This paper applies deep generative models to detect novel attacks on cyber-physical systems (CPSs) without relying on labeled information.
  • methods: It proposes a novel unsupervised dual variational generative adversarial model, MTS-DVGAN, for anomaly detection in multivariate time series data.
  • results: Compared with state-of-the-art methods, the proposed MTS-DVGAN is more stable and achieves consistent performance improvements in detecting anomalies in CPSs.
    Abstract Deep generative models are promising in detecting novel cyber-physical attacks, mitigating the vulnerability of Cyber-physical systems (CPSs) without relying on labeled information. Nonetheless, these generative models face challenges in identifying attack behaviors that closely resemble normal data, or deviate from the normal data distribution but are in close proximity to the manifold of the normal cluster in latent space. To tackle this problem, this article proposes a novel unsupervised dual variational generative adversarial model named MST-DVGAN, to perform anomaly detection in multivariate time series data for CPS security. The central concept is to enhance the model's discriminative capability by widening the distinction between reconstructed abnormal samples and their normal counterparts. Specifically, we propose an augmented module by imposing contrastive constraints on the reconstruction process to obtain a more compact embedding. Then, by exploiting the distribution property and modeling the normal patterns of multivariate time series, a variational autoencoder is introduced to force the generative adversarial network (GAN) to generate diverse samples. Furthermore, two augmented loss functions are designed to extract essential characteristics in a self-supervised manner through mutual guidance between the augmented samples and original samples. Finally, a specific feature center loss is introduced for the generator network to enhance its stability. Empirical experiments are conducted on three public datasets, namely SWAT, WADI and NSL_KDD. Comparing with the state-of-the-art methods, the evaluation results show that the proposed MTS-DVGAN is more stable and can achieve consistent performance improvement.

Contrastive Deep Nonnegative Matrix Factorization for Community Detection

  • paper_url: http://arxiv.org/abs/2311.02357
  • repo_url: None
  • paper_authors: Yuecheng Li, Jialong Chen, Chuan Chen, Lei Yang, Zibin Zheng
  • for: This work proposes a new community detection algorithm that addresses three problems of existing nonnegative matrix factorization (NMF)-based methods: 1) they map the original network directly into the community membership space, making it hard to capture hierarchical information; 2) they usually attend only to the network topology and ignore node attributes; and 3) they struggle to learn the global structure information needed for community detection.
  • methods: The proposed algorithm, Contrastive Deep Nonnegative Matrix Factorization (CDNMF), first deepens NMF to strengthen its capacity for information extraction. Inspired by contrastive learning, it then constructs the network topology and the node attributes as two contrasting views, and uses a debiased negative sampling layer to learn node similarity at the community level.
  • results: Experiments on three public real graph datasets show that CDNMF outperforms state-of-the-art methods, benefiting from its ability to learn node similarity within communities and to capture global structure information. Code is available at https://github.com/6lyc/CDNMF.git.
    Abstract Recently, nonnegative matrix factorization (NMF) has been widely adopted for community detection, because of its better interpretability. However, the existing NMF-based methods have the following three problems: 1) they directly transform the original network into community membership space, so it is difficult for them to capture the hierarchical information; 2) they often only pay attention to the topology of the network and ignore its node attributes; 3) it is hard for them to learn the global structure information necessary for community detection. Therefore, we propose a new community detection algorithm, named Contrastive Deep Nonnegative Matrix Factorization (CDNMF). Firstly, we deepen NMF to strengthen its capacity for information extraction. Subsequently, inspired by contrastive learning, our algorithm creatively constructs network topology and node attributes as two contrasting views. Furthermore, we utilize a debiased negative sampling layer and learn node similarity at the community level, thereby enhancing the suitability of our model for community detection. We conduct experiments on three public real graph datasets and the proposed model has achieved better results than state-of-the-art methods. Code available at https://github.com/6lyc/CDNMF.git.
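The NMF building block that CDNMF deepens factorizes a nonnegative matrix X ≈ WH, with the factors interpretable as community memberships. For reference, the classical Lee-Seung multiplicative updates for plain single-layer NMF (not the paper's deep or contrastive variant) are:

```python
import numpy as np

def nmf(X, rank, n_iter=200, eps=1e-9):
    """Lee-Seung multiplicative updates minimizing ||X - WH||_F^2.
    For community detection, columns of W (or rows of H) can be read
    as soft community memberships."""
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], rank))
    H = rng.random((rank, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

X = np.random.default_rng(1).random((30, 30))   # toy nonnegative adjacency
W, H = nmf(X, rank=3)
print("reconstruction error:", np.linalg.norm(X - W @ H))
```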

Perturbation-based Active Learning for Question Answering

  • paper_url: http://arxiv.org/abs/2311.02345
  • repo_url: None
  • paper_authors: Fan Luo, Mihai Surdeanu
  • for: Building a question answering (QA) model at lower annotation cost by using an active learning (AL) training strategy.
  • methods: AL acquisition functions, such as uncertainty- or diversity-based sampling, select the most informative unlabeled training data to update the model effectively.
  • results: A perturbation-based acquisition strategy is proposed and shown to be more effective than existing, commonly used strategies.
    Abstract Building a question answering (QA) model with less annotation costs can be achieved by utilizing active learning (AL) training strategy. It selects the most informative unlabeled training data to update the model effectively. Acquisition functions for AL are used to determine how informative each training example is, such as uncertainty or diversity based sampling. In this work, we propose a perturbation-based active learning acquisition strategy and demonstrate it is more effective than existing commonly used strategies.
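A standard uncertainty-based acquisition function, of the kind the proposed perturbation strategy is compared against, is predictive entropy. A minimal sketch (the 5-way class distribution is a stand-in for real QA model outputs):

```python
import torch

def entropy_acquisition(probs, k):
    """Select the k unlabeled examples whose predictive distribution
    has the highest entropy, i.e. where the model is most uncertain."""
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return entropy.topk(k).indices

# stand-in: model output distributions over 5 classes for 100 unlabeled examples
probs = torch.softmax(torch.randn(100, 5), dim=1)
to_label = entropy_acquisition(probs, k=10)   # indices to send for annotation
print(to_label)
```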

You Only Forward Once: Prediction and Rationalization in A Single Forward Pass

  • paper_url: http://arxiv.org/abs/2311.02344
  • repo_url: None
  • paper_authors: Han Jiang, Junwen Duan, Zhe Qu, Jianxin Wang
  • for: This work aims to improve the accuracy and efficiency of unsupervised rationale extraction, so that a model can extract useful supporting evidence while making its prediction.
  • methods: It proposes a novel single-phase framework, You Only Forward Once (YOFO), in which a pre-trained language model such as BERT performs prediction and rationalization simultaneously, gradually removing unimportant tokens during forward propagation instead of selecting important tokens directly.
  • results: Experiments show that YOFO extracts rationales and makes predictions more accurately than previous RNP-based models, improving token-level F1 by up to 18.4% over prior state-of-the-art methods. Analyses further show that YOFO extracts precise and important rationales while removing unimportant tokens in the middle part of the model.
    Abstract Unsupervised rationale extraction aims to extract concise and contiguous text snippets to support model predictions without any annotated rationale. Previous studies have used a two-phase framework known as the Rationalizing Neural Prediction (RNP) framework, which follows a generate-then-predict paradigm. They assumed that the extracted explanation, called rationale, should be sufficient to predict the golden label. However, the assumption above deviates from the original definition and is too strict to perform well. Furthermore, these two-phase models suffer from the interlocking problem and spurious correlations. To solve the above problems, we propose a novel single-phase framework called You Only Forward Once (YOFO), derived from a relaxed version of rationale where rationales aim to support model predictions rather than make predictions. In our framework, A pre-trained language model like BERT is deployed to simultaneously perform prediction and rationalization with less impact from interlocking or spurious correlations. Directly choosing the important tokens in an unsupervised manner is intractable. Instead of directly choosing the important tokens, YOFO gradually removes unimportant tokens during forward propagation. Through experiments on the BeerAdvocate and Hotel Review datasets, we demonstrate that our model is able to extract rationales and make predictions more accurately compared to RNP-based models. We observe an improvement of up to 18.4\% in token-level F1 compared to previous state-of-the-art methods. We also conducted analyses and experiments to explore the extracted rationales and token decay strategies. The results show that YOFO can extract precise and important rationales while removing unimportant tokens in the middle part of the model.
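The token-decay idea, progressively discarding unimportant tokens as the forward pass proceeds, can be sketched independently of YOFO's architecture. Here hidden-state L2 norm is an illustrative stand-in for the model's learned importance criterion:

```python
import torch

def decay_tokens(hidden, keep_ratio=0.75):
    """Drop the lowest-importance tokens after a layer; importance is
    approximated by hidden-state L2 norm (an illustrative choice)."""
    scores = hidden.norm(dim=-1)                             # (batch, seq)
    k = max(1, int(hidden.shape[1] * keep_ratio))
    idx = scores.topk(k, dim=1).indices.sort(dim=1).values   # keep token order
    return torch.gather(hidden, 1, idx.unsqueeze(-1).expand(-1, -1, hidden.shape[-1]))

hidden = torch.randn(2, 16, 32)      # (batch, tokens, dim) after some layer
for layer in range(3):               # the token set shrinks with depth
    hidden = decay_tokens(hidden)
    print(layer, hidden.shape)
```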

Stable Diffusion Reference Only: Image Prompt and Blueprint Jointly Guided Multi-Condition Diffusion Model for Secondary Painting

  • paper_url: http://arxiv.org/abs/2311.02343
  • repo_url: None
  • paper_authors: Hao Ai, Lu Sheng
  • for: This paper aims to improve the efficiency of secondary painting in artistic production such as comics and animation.
  • methods: It proposes a new images-to-image self-supervised generation method, Stable Diffusion Reference Only, which uses only two types of conditional images, an image prompt and a blueprint image, for precise control of generation, eliminating the need for ControlNet.
  • results: Experiments show that the method achieves efficient secondary painting with high-quality image generation; a controllable character line-art coloring model trained with it (https://github.com/aihao2000/stable-diffusion-reference-only) achieves state-of-the-art results in this field.
    Abstract Stable Diffusion and ControlNet have achieved excellent results in the field of image generation and synthesis. However, due to the granularity and method of its control, the efficiency improvement is limited for professional artistic creations such as comics and animation production whose main work is secondary painting. In the current workflow, fixing characters and image styles often need lengthy text prompts, and even requires further training through TextualInversion, DreamBooth or other methods, which is very complicated and expensive for painters. Therefore, we present a new method in this paper, Stable Diffusion Reference Only, a images-to-image self-supervised model that uses only two types of conditional images for precise control generation to accelerate secondary painting. The first type of conditional image serves as an image prompt, supplying the necessary conceptual and color information for generation. The second type is blueprint image, which controls the visual structure of the generated image. It is natively embedded into the original UNet, eliminating the need for ControlNet. We released all the code for the module and pipeline, and trained a controllable character line art coloring model at https://github.com/aihao2000/stable-diffusion-reference-only, that achieved state-of-the-art results in this field. This verifies the effectiveness of the structure and greatly improves the production efficiency of animations, comics, and fanworks.

Potato Leaf Disease Classification using Deep Learning: A Convolutional Neural Network Approach

  • paper_url: http://arxiv.org/abs/2311.02338
  • repo_url: None
  • paper_authors: Utkarsh Yashwant Tambe, A. Shobanadevi, A. Shanthini, Hsiu-Chun Hsu
  • for: This study uses deep learning, specifically a Convolutional Neural Network (CNN), to classify potato leaf diseases.
  • methods: The proposed approach preprocesses the leaf image data, trains a CNN model on it, and evaluates the model on a test set.
  • results: The CNN model achieves an overall accuracy of 99.1%, accurately distinguishing two potato leaf diseases, early blight and late blight, as well as healthy leaves, even in the presence of severe infections. The approach may offer a reliable and effective solution for potato disease identification, helping to maintain food security and reduce financial losses in agriculture.
    Abstract In this study, a Convolutional Neural Network (CNN) is used to classify potato leaf illnesses using Deep Learning. The suggested approach entails preprocessing the leaf image data, training a CNN model on that data, and assessing the model's success on a test set. The experimental findings show that the CNN model, with an overall accuracy of 99.1%, is highly accurate in identifying two kinds of potato leaf diseases, including Early Blight, Late Blight, and Healthy. The suggested method may offer a trustworthy and effective remedy for identifying potato diseases, which is essential for maintaining food security and minimizing financial losses in agriculture. The model can accurately recognize the various disease types even when there are severe infections present. This work highlights the potential of deep learning methods for categorizing potato diseases, which can help with effective and automated disease management in potato farming.
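A compact CNN of the kind used for this three-class problem (early blight, late blight, healthy) might look as follows; the layer sizes are illustrative assumptions, since the paper's exact architecture is not given here:

```python
import torch
import torch.nn as nn

class LeafCNN(nn.Module):
    """Compact convolutional classifier for three potato-leaf classes."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = LeafCNN()
logits = model(torch.randn(4, 3, 224, 224))   # batch of RGB leaf images
print(logits.shape)                           # -> torch.Size([4, 3])
```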

STOW: Discrete-Frame Segmentation and Tracking of Unseen Objects for Warehouse Picking Robots

  • paper_url: http://arxiv.org/abs/2311.02337
  • repo_url: None
  • paper_authors: Yi Li, Muru Zhang, Markus Grotz, Kaichun Mo, Dieter Fox
  • for: This paper targets segmentation and tracking of unseen objects in dynamic industrial settings such as warehouse picking and in domestic robotic applications, where objects are rearranged, removed, or partially occluded and must be tracked across long temporal gaps between discrete frames.
  • methods: It introduces new synthetic and real-world datasets replicating these scenarios, together with a joint segmentation and tracking method built around a transformer module for efficient inter-frame communication.
  • results: Experiments show that the approach significantly outperforms recent methods; additional results and videos are available on the official website (https://sites.google.com/view/stow-corl23).
    Abstract Segmentation and tracking of unseen object instances in discrete frames pose a significant challenge in dynamic industrial robotic contexts, such as distribution warehouses. Here, robots must handle object rearrangement, including shifting, removal, and partial occlusion by new items, and track these items after substantial temporal gaps. The task is further complicated when robots encounter objects not learned in their training sets, which requires the ability to segment and track previously unseen items. Considering that continuous observation is often inaccessible in such settings, our task involves working with a discrete set of frames separated by indefinite periods during which substantial changes to the scene may occur. This task also translates to domestic robotic applications, such as rearrangement of objects on a table. To address these demanding challenges, we introduce new synthetic and real-world datasets that replicate these industrial and household scenarios. We also propose a novel paradigm for joint segmentation and tracking in discrete frames along with a transformer module that facilitates efficient inter-frame communication. The experiments we conduct show that our approach significantly outperforms recent methods. For additional results and videos, please visit the project website: https://sites.google.com/view/stow-corl23. Code and dataset will be released.

Complex Organ Mask Guided Radiology Report Generation

  • paper_url: http://arxiv.org/abs/2311.02329
  • repo_url: https://github.com/GaryGuTC/COMG_model
  • paper_authors: Gu Tiancheng, Liu Dongnan, Li Zhiyuan, Cai Weidong
  • for: Improving the accuracy and level of detail of radiology reports while alleviating the traditional radiology reporting workload.
  • methods: A Complex Organ Mask Guided (COMG) report generation model that incorporates masks of multiple organs (e.g., bones, lungs, heart, and mediastinum) to supply more detailed medical information and guide the model's attention to these regions.
  • results: On the public IU-Xray and MIMIC datasets, COMG improves BLEU@4 scores over the SOTA model KiUT by 11.4% and 9.7%, respectively.
    Abstract The goal of automatic report generation is to generate a clinically accurate and coherent phrase from a single given X-ray image, which could alleviate the workload of traditional radiology reporting. However, in a real-world scenario, radiologists frequently face the challenge of producing extensive reports derived from numerous medical images; medical report generation from a multi-image perspective is therefore needed. In this paper, we propose the Complex Organ Mask Guided (termed COMG) report generation model, which incorporates masks from multiple organs (e.g., bones, lungs, heart, and mediastinum) to provide more detailed information and guide the model's attention to these crucial body regions. Specifically, we leverage prior knowledge of the disease corresponding to each organ in the fusion process to enhance the disease identification phase during report generation. Additionally, a cosine similarity loss is introduced as the target function to ensure cross-modal consistency and facilitate model optimization. Experimental results on two public datasets show that COMG achieves an 11.4% and 9.7% improvement in BLEU@4 scores over the SOTA model KiUT on IU-Xray and MIMIC, respectively.
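The cosine similarity loss named in the abstract is a standard construction; a minimal sketch is given below, assuming paired visual and report embeddings of equal dimension (the helper name and shapes are illustrative, not taken from the released code).

```python
import torch
import torch.nn.functional as F

def cosine_consistency_loss(img_emb, txt_emb):
    """Cross-modal consistency term: 1 - cos(u, v) is minimized when the
    organ-mask-guided visual features agree with the report features.
    Function name and shapes are illustrative assumptions."""
    return (1.0 - F.cosine_similarity(img_emb, txt_emb, dim=-1)).mean()

img = torch.randn(4, 512)   # e.g. fused visual embeddings for a batch
txt = torch.randn(4, 512)   # e.g. report decoder embeddings
loss = cosine_consistency_loss(img, txt)
```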

FragXsiteDTI: Revealing Responsible Segments in Drug-Target Interaction with Transformer-Driven Interpretation

  • paper_url: http://arxiv.org/abs/2311.02326
  • repo_url: None
  • paper_authors: Ali Khodabandeh Yalabadi, Mehdi Yazdani-Jahromi, Niloofar Yousefi, Aida Tayebi, Sina Abdidizaji, Ozlem Ozmen Garibay
  • for: Drug-Target Interaction (DTI) prediction is central to drug discovery, but existing models struggle with interpretability and performance; this paper proposes a transformer-based model, FragXsiteDTI, to address both.
  • methods: FragXsiteDTI simultaneously leverages drug-molecule fragments and protein pockets in a transformer architecture with cross- and self-attention; a learnable latent array is refined through cross-attention to the protein binding site and self-attention, then queries the drug fragments.
  • results: On three benchmark datasets, FragXsiteDTI outperforms state-of-the-art models in predictive power while providing interpretable attributions of the critical components in both the drug and the target protein.
    Abstract Drug-Target Interaction (DTI) prediction is vital for drug discovery, yet challenges persist in achieving model interpretability and optimizing performance. We propose a novel transformer-based model, FragXsiteDTI, that aims to address these challenges in DTI prediction. Notably, FragXsiteDTI is the first DTI model to simultaneously leverage drug molecule fragments and protein pockets. Our information-rich representations for both proteins and drugs offer a detailed perspective on their interaction. Inspired by the Perceiver IO framework, our model features a learnable latent array, initially interacting with protein binding site embeddings using cross-attention and later refined through self-attention and used as a query to the drug fragments in the drug's cross-attention transformer block. This learnable query array serves as a mediator and enables seamless information translation, preserving critical nuances in drug-protein interactions. Our computational results on three benchmarking datasets demonstrate the superior predictive power of our model over several state-of-the-art models. We also show the interpretability of our model in terms of the critical components of both target proteins and drug molecules within drug-target pairs.
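The Perceiver-IO-inspired pipeline described above (learnable latents, cross-attention to protein pockets, self-attention refinement, then cross-attention over drug fragments) can be sketched in a few lines of PyTorch. Sizes, layer counts, and the pooled binding-score head below are assumptions for illustration only, not the authors' configuration.

```python
import torch
import torch.nn as nn

class LatentMediator(nn.Module):
    """Rough Perceiver-IO-style sketch of the described pipeline: a learnable
    latent array first reads protein-pocket embeddings via cross-attention,
    is refined by self-attention, and finally queries drug-fragment
    embeddings. All hyperparameters are illustrative."""
    def __init__(self, n_latents=32, dim=128, heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, dim))
        self.read_pocket = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.refine = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.read_frags = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 1)  # hypothetical binding-score head

    def forward(self, pocket, frags):
        # pocket: (B, P, dim) binding-site embeddings; frags: (B, F, dim)
        z = self.latents.unsqueeze(0).expand(pocket.size(0), -1, -1)
        z, _ = self.read_pocket(z, pocket, pocket)   # cross-attend to pockets
        z, _ = self.refine(z, z, z)                  # self-attention refinement
        z, w = self.read_frags(z, frags, frags)      # query drug fragments
        return self.head(z.mean(dim=1)), w           # w: fragment attention, interpretable
```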

Thermal Face Image Classification using Deep Learning Techniques

  • paper_url: http://arxiv.org/abs/2311.02314
  • repo_url: None
  • paper_authors: Prosenjit Chatterjee, ANK Zaman
  • for: Thermal image classification for security, medical, and industrial applications.
  • methods: Deep learning with ResNet-50 and VGGNet-19 for feature extraction from thermal images, plus a Kalman filter applied to the input images for denoising.
  • results: Experiments show that the proposed approach is both accurate and efficient.
    Abstract Thermal images have various applications in security, medical, and industrial domains. This paper proposes a practical deep-learning approach for thermal image classification. Accurate and efficient classification of thermal images poses a significant challenge across various fields due to the complex image content and the scarcity of annotated datasets. This work uses convolutional neural network (CNN) architectures, specifically ResNet-50 and VGGNet-19, to extract features from thermal images, and applies a Kalman filter to the thermal input images for denoising. The experimental results demonstrate the effectiveness of the proposed approach in terms of accuracy and efficiency.
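The exact form of the paper's Kalman denoising step is not public; the sketch below shows one common variant, a per-pixel scalar Kalman filter run over a stack of frames of the same scene, as a stand-in. The availability of a frame burst and the noise parameters q and r are assumptions.

```python
import numpy as np

def kalman_denoise(frames, q=1e-4, r=1e-2):
    """Minimal per-pixel scalar Kalman filter over a stack of thermal frames
    (an illustrative stand-in; the paper's exact formulation may differ).
    q: process noise variance, r: measurement noise variance."""
    x = frames[0].astype(np.float64)        # state estimate (one per pixel)
    p = np.ones_like(x)                     # estimate covariance
    for z in frames[1:]:
        p = p + q                           # predict step
        k = p / (p + r)                     # Kalman gain
        x = x + k * (z - x)                 # update with the new frame
        p = (1.0 - k) * p
    return x

stack = np.random.rand(8, 64, 64)           # 8 noisy thermal frames
clean = kalman_denoise(stack)               # fed to ResNet-50 / VGGNet-19 afterwards
```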

OSM vs HD Maps: Map Representations for Trajectory Prediction

  • paper_url: http://arxiv.org/abs/2311.02305
  • repo_url: None
  • paper_authors: Jing-Yan Liao, Parth Doshi, Zihan Zhang, David Paz, Henrik Christensen
  • for: Proposes OpenStreetMap (OSM) as an alternative to High Definition (HD) Maps for long-term motion forecasting in autonomous driving.
  • methods: Extends OSM to long-horizon forecasting, doubling the horizon of previous studies, and narrows the gap with HD Map-based models through an expanded receptive field and the integration of intersection priors.
  • results: A context-aware analysis across diverse scenarios shows competitive forecasting performance, suggesting a scalable map representation for autonomous driving.
    Abstract While High Definition (HD) Maps have long been favored for their precise depictions of static road elements, their accessibility constraints and susceptibility to rapid environmental changes impede the widespread deployment of autonomous driving, especially in the motion forecasting task. In this context, we propose to leverage OpenStreetMap (OSM) as a promising alternative to HD Maps for long-term motion forecasting. The contributions of this work are threefold: firstly, we extend the application of OSM to long-horizon forecasting, doubling the forecasting horizon compared to previous studies. Secondly, through an expanded receptive field and the integration of intersection priors, our OSM-based approach exhibits competitive performance, narrowing the gap with HD Map-based models. Lastly, we conduct an exhaustive context-aware analysis, providing deeper insights in motion forecasting across diverse scenarios as well as conducting class-aware comparisons. This research not only advances long-term motion forecasting with coarse map representations but additionally offers a potential scalable solution within the domain of autonomous driving.
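The authors do not release code; as a hedged sketch of what coarse-map preprocessing could look like, the snippet below pulls drivable ways around an ego position with the osmnx package and flattens them into polylines for a vectorized map encoder. The center point, radius, and downstream encoding are assumptions, not the paper's pipeline.

```python
import osmnx as ox
import numpy as np

# Hypothetical preprocessing: fetch drivable roads within 300 m of the
# ego position and flatten each way into a polyline of (lon, lat) vertices.
center = (32.7157, -117.1611)                       # (lat, lon), illustrative
G = ox.graph_from_point(center, dist=300, network_type="drive")
_, edges = ox.graph_to_gdfs(G)

polylines = [np.asarray(geom.coords) for geom in edges.geometry]
# each polyline: (num_points, 2) array, ready to be normalized into the
# agent frame and consumed by a vectorized map encoder
```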

MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning

  • paper_url: http://arxiv.org/abs/2311.02303
  • repo_url: None
  • paper_authors: Bingchang Liu, Chaoyu Chen, Cong Liao, Zi Gong, Huan Wang, Zhichao Lei, Ming Liang, Dajun Chen, Min Shen, Hailian Zhou, Hang Yu, Jianguo Li
  • for: Boosting the coding capabilities of Code LLMs while fine-tuning on multiple tasks simultaneously.
  • methods: A multitask fine-tuning framework (MFTCoder) that combines multiple loss functions to tackle common multi-task learning challenges such as data imbalance, varying difficulty, and inconsistent convergence speeds.
  • results: MFTCoder outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks, while training faster and being easier to deploy.
    Abstract Code LLMs have emerged as a specialized research field, with remarkable studies dedicated to enhancing models' coding capabilities through fine-tuning on pre-trained models. Previous fine-tuning approaches were typically tailored to specific downstream tasks or scenarios, which meant separate fine-tuning for each task, requiring extensive training resources and posing challenges in terms of deployment and maintenance. Furthermore, these approaches failed to leverage the inherent interconnectedness among different code-related tasks. To overcome these limitations, we present a multi-task fine-tuning framework, MFTCoder, that enables simultaneous and parallel fine-tuning on multiple tasks. By incorporating various loss functions, we effectively address common challenges in multi-task learning, such as data imbalance, varying difficulty levels, and inconsistent convergence speeds. Extensive experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks. Moreover, MFTCoder offers efficient training capabilities, including efficient data tokenization modes and PEFT fine-tuning, resulting in significantly improved speed compared to traditional fine-tuning methods. MFTCoder seamlessly integrates with several mainstream open-source LLMs, such as CodeLLama and Qwen. Leveraging the CodeLLama foundation, our MFTCoder fine-tuned model, \textsc{CodeFuse-CodeLLama-34B}, achieves an impressive pass@1 score of 74.4\% on the HumanEval benchmark, surpassing GPT-4 performance (67\%, zero-shot). MFTCoder is open-sourced at \url{https://github.com/codefuse-ai/MFTCOder}
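The abstract credits "various loss functions" with handling data imbalance and uneven convergence across tasks. One generic scheme in that spirit (not necessarily the authors' exact formulation) is to normalize each task's summed loss by its own valid-token count before averaging, so token-rich tasks cannot dominate the gradient:

```python
import torch

def balanced_multitask_loss(task_losses, task_token_counts):
    """Generic sketch of per-task loss balancing (an assumption about the
    kind of scheme the abstract alludes to, not the paper's formulation).
    Each task's summed loss is normalized by its own valid-token count."""
    per_task = [l / max(n, 1) for l, n in zip(task_losses, task_token_counts)]
    return torch.stack(per_task).mean()

# e.g. summed cross-entropy per task, and valid-token counts from the batch
losses = [torch.tensor(420.0), torch.tensor(3.5)]
tokens = [2048, 16]
loss = balanced_multitask_loss(losses, tokens)
```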

Successive Model-Agnostic Meta-Learning for Few-Shot Fault Time Series Prognosis

  • paper_url: http://arxiv.org/abs/2311.02300
  • repo_url: None
  • paper_authors: Hai Su, Jiajun Hu, Songsen Yu
  • for: Few-shot fault time-series prognosis with improved prediction accuracy and generalization.
  • methods: A new "pseudo meta-task" partitioning scheme that treats a continuous time period of a series as one meta-task composed of multiple successive short periods, extracting more comprehensive features and relationships; a differential algorithm further improves robustness across datasets.
  • results: Extensive experiments on several fault and time-series prediction datasets show better performance and generalization under both few-shot and general conditions.
    Abstract Meta learning is a promising technique for solving few-shot fault prediction problems, which have attracted the attention of many researchers in recent years. Existing meta-learning methods for time series prediction, which predominantly rely on random and similarity matching-based task partitioning, face three major limitations: (1) feature exploitation inefficiency; (2) suboptimal task data allocation; and (3) limited robustness with small samples. To overcome these limitations, we introduce a novel 'pseudo meta-task' partitioning scheme that treats a continuous time period of a time series as a meta-task, composed of multiple successive short time periods. Employing continuous time series as pseudo meta-tasks allows our method to extract more comprehensive features and relationships from the data, resulting in more accurate predictions. Moreover, we introduce a differential algorithm to enhance the robustness of our method across different datasets. Through extensive experiments on several fault and time series prediction datasets, we demonstrate that our approach substantially enhances prediction performance and generalization capability under both few-shot and general conditions.
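The "pseudo meta-task" construction is concrete enough to sketch: a continuous span of the series becomes one meta-task whose support and query sets are successive short windows. The window length, stride, and support/query split below are illustrative assumptions.

```python
import numpy as np

def pseudo_meta_task(series, window=64, stride=32, support_frac=0.5):
    """Sketch of the pseudo meta-task idea: one continuous span of a time
    series is treated as a meta-task whose support/query sets are successive
    short windows (all parameter values are illustrative)."""
    windows = [series[i:i + window]
               for i in range(0, len(series) - window + 1, stride)]
    k = int(len(windows) * support_frac)
    return windows[:k], windows[k:]      # support set, query set

signal = np.sin(np.linspace(0, 20, 1000)) + 0.1 * np.random.randn(1000)
support, query = pseudo_meta_task(signal)
```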

A Survey of the Various Methodologies Towards making Artificial Intelligence More Explainable

  • paper_url: http://arxiv.org/abs/2311.02291
  • repo_url: None
  • paper_authors: Sopam Dasgupta
  • for: Surveying methodologies that make machine decisions explainable and interpretable, so the reasoning behind a decision can be understood.
  • methods: Reviews explainability/interpretability approaches for black-box models and how they extend to counterfactual thinking.
  • results: Argues that explanations alone are insufficient: they tell us what achieves an outcome, not how to achieve a different one, which motivates counterfactual reasoning as a complement.
    Abstract Machines are being increasingly used in decision-making processes, resulting in the realization that decisions need explanations. Unfortunately, an increasing number of these deployed models are of a 'black-box' nature where the reasoning behind the decisions is unknown. Hence, there is a need for clarity behind the reasoning of these decisions. As humans, we would want these decisions to be presented to us in an explainable manner. However, explanations alone are insufficient. They do not necessarily tell us how to achieve an outcome but merely tell us what achieves the given outcome. For this reason, my research focuses on explainability/interpretability and how it extends to counterfactual thinking.
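To make the abstract's distinction concrete (explanations say what produced an outcome; counterfactuals say how to change it), here is a toy greedy counterfactual search. It illustrates the general idea only and is not a method from the survey; the loan-scoring model at the bottom is invented.

```python
import numpy as np

def nearest_counterfactual(predict_proba, x, step=0.05, max_iter=100):
    """Toy counterfactual search (illustrative, not from the survey):
    greedily nudge one feature at a time to raise the probability of the
    desired class until the decision flips, answering 'how do I achieve
    the outcome?' rather than 'what explains the current one?'."""
    x_cf = x.astype(float).copy()
    for _ in range(max_iter):
        if predict_proba(x_cf) >= 0.5:
            return x_cf                          # decision flipped
        trials = [x_cf + d * np.eye(len(x_cf))[i]
                  for i in range(len(x_cf)) for d in (-step, step)]
        x_cf = max(trials, key=predict_proba)    # best single-feature nudge
    return None

# invented example: feature 0 is income, feature 1 is debt
proba = lambda v: 1.0 / (1.0 + np.exp(-(2.0 * v[0] - 1.5 * v[1])))
cf = nearest_counterfactual(proba, np.array([0.1, 0.9]))
```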

Predicting Ground Reaction Force from Inertial Sensors

  • paper_url: http://arxiv.org/abs/2311.02287
  • repo_url: None
  • paper_authors: Bowen Song, Marco Paolieri, Harper E. Stewart, Leana Golubchik, Jill L. McNitt-Gray, Vishal Misra, Devavrat Shah
  • for: Predicting ground reaction force (GRF) from wearable IMU data so that derived biomechanical variables (e.g., contact time and loading rate) can be analyzed for athletes.
  • methods: Three prediction approaches are compared: k-Nearest Neighbors (KNN) regression, a novel SVD Embedding Regression (SER), and LSTM neural networks.
  • results: SER and KNN match or exceed the accuracy of LSTM networks with much faster training and simpler hyperparameter optimization; both are especially accurate when personal training data are available, and using personal data reduces prediction errors of all methods for most biomechanical variables.
    Abstract The study of ground reaction forces (GRF) is used to characterize the mechanical loading experienced by individuals in movements such as running, which is clinically applicable to identify athletes at risk for stress-related injuries. Our aim in this paper is to determine if data collected with inertial measurement units (IMUs), which can be worn by athletes during outdoor runs, can be used to predict GRF with sufficient accuracy to allow the analysis of its derived biomechanical variables (e.g., contact time and loading rate). We consider lightweight approaches in contrast to state-of-the-art prediction using LSTM neural networks. Specifically, we compare the use of LSTMs to k-Nearest Neighbors (KNN) regression, and propose a novel solution, SVD Embedding Regression (SER), which uses linear regression between singular value decomposition embeddings of IMU data (input) and GRF data (output). We evaluate the accuracy of these techniques when using training data collected from different athletes, from the same athlete, or both, and we explore the use of acceleration and angular velocity data from sensors at different locations (sacrum and shanks). Our results illustrate that simple machine learning methods such as SER and KNN can be similarly accurate or more accurate than LSTM neural networks, with much faster training times and simpler hyperparameter optimization; in particular, SER and KNN are more accurate when personal training data are available, and KNN comes with the benefit of providing provenance for its predictions. Notably, the use of personal data reduces prediction errors of all methods for most biomechanical variables.
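SER as described, linear regression between SVD embeddings of IMU data and GRF targets, admits a compact NumPy sketch. The rank, window shapes, and target dimension below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def fit_ser(X_imu, Y_grf, rank=10):
    """Sketch of SVD Embedding Regression as described in the abstract:
    embed IMU windows via a truncated SVD, then fit a linear map from the
    embedding to GRF. Rank and shapes are illustrative assumptions."""
    # X_imu: (n_samples, imu_features), Y_grf: (n_samples, grf_dims)
    U, S, Vt = np.linalg.svd(X_imu, full_matrices=False)
    V = Vt[:rank].T                         # (imu_features, rank) basis
    Z = X_imu @ V                           # SVD embeddings of the inputs
    W, *_ = np.linalg.lstsq(Z, Y_grf, rcond=None)
    return V, W

def predict_ser(X_imu, V, W):
    return (X_imu @ V) @ W

X = np.random.randn(200, 60)                # e.g. flattened IMU windows
Y = np.random.randn(200, 1)                 # e.g. peak vertical GRF per step
V, W = fit_ser(X, Y)
Y_hat = predict_ser(X, V, W)
```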
    摘要 研究地面反应力(GRF)是为了描述运动员在运动中所经历的机械负荷的重要工具。我们的目标是确定 whether 使用抗应力计(IMUs)收集的数据可以准确预测 GRF,以便分析其 derived 生物力学变量(例如,接触时间和加载率)。 在这篇论文中,我们考虑使用轻量级方法,而不是现有的预测方法使用 LSTM 神经网络。我们比较使用 LSTM 和 k-最近邻域(KNN)回归,以及提出了一个新的解决方案,即 Singular Value Decomposition 嵌入回归(SER),使用 IMUs 数据(输入)和 GRF 数据(输出)之间的线性回归。我们评估了这些技术的准确性,使用不同的教学数据集,包括不同运动员、同一个运动员和两者。我们发现,简单的机器学习方法如 SER 和 KNN 可以与 LSTM 神经网络相比较准确,具有更快的训练时间和权重优化。特别是,使用个人教学数据可以减少预测错误,SER 和 KNN 在这种情况下更加准确。另外,使用个人数据还可以提供预测的来源。