paper_authors: Zahra Kolagar, Anna Katharina Leschanowsky, Birgit Popp
for: Safeguarding user privacy, as legal jurisdictions worldwide emphasize transparency in data processing.
methods: Proposes a dynamic workflow that transforms privacy policies into privacy question-and-answer (Q&A) pairs, making privacy policies accessible through conversational AI.
results: Facilitates interdisciplinary collaboration between legal experts and conversation designers, while considering the generative capabilities of large language models and the associated challenges.
Abstract
Privacy policies play a vital role in safeguarding user privacy as legal jurisdictions worldwide emphasize the need for transparent data processing. While the suitability of privacy policies to enhance transparency has been critically discussed, employing conversational AI systems presents unique challenges in informing users effectively. In this position paper, we propose a dynamic workflow for transforming privacy policies into privacy question-and-answer (Q&A) pairs to make privacy policies easily accessible through conversational AI. Thereby, we facilitate interdisciplinary collaboration among legal experts and conversation designers, while also considering the utilization of large language models' generative capabilities and addressing associated challenges. Our proposed workflow underscores continuous improvement and monitoring throughout the construction of privacy Q&As, advocating for comprehensive review and refinement through an experts-in-the-loop approach.
Evaluating the Inclusiveness of Artificial Intelligence Software in Enhancing Project Management Efficiency – A Review
results: The review finds that, with technological support, inclusiveness in project management can be significantly improved, but the feasibility of the technology and ethical considerations must be kept in mind.
Abstract
The rise of advanced technology in project management (PM) highlights a crucial need for inclusiveness. This work examines the enhancement of both inclusivity and efficiency in PM through technological integration, focusing on defining and measuring inclusiveness. This approach illuminates how inclusivity-centered technology can significantly elevate project outcomes. The research navigates the challenges of achieving inclusivity, mainly biases in learning databases and in the design process of these technologies, and assesses the transformative potential of these technologies, particularly in automating tasks like data collection and analysis, thus enabling managers to prioritize human-centric aspects of projects. However, the integration of such technology transcends efficiency, indicating a paradigm shift in understanding their societal roles. This shift necessitates a new approach in the development of these systems to prevent perpetuating social inequalities. We propose a methodology involving criteria development for evaluating the inclusiveness and effectiveness of these technologies. This methodical approach is vital to comprehensively address the challenges and limitations inherent in these systems. Emphasizing the importance of inclusivity, the study advocates for a balance between technological advancement and ethical considerations, calling for a holistic understanding and regulation. In conclusion, the paper underscores that while these technologies can significantly improve outcomes, their mindful integration, ensuring inclusivity, is paramount. This exploration into the ethical and practical aspects of technology in PM contributes to a more informed and balanced approach within the field.
Vashantor: A Large-scale Multilingual Benchmark Dataset for Automated Translation of Bangla Regional Dialects to Bangla Language
results: Experimental results show a BLEU score of 69.06 for the Mymensingh regional dialect and 36.75 for the Chittagong regional dialect. The average word error rate is 0.1548 for Mymensingh and 0.3385 for Chittagong. Region detection achieves accuracies of 85.86% and 84.36%. This is the first large-scale investigation of Bangla regional dialects to Bangla machine translation.
Abstract
The Bangla linguistic variety is a fascinating mix of regional dialects that adds to the cultural diversity of the Bangla-speaking community. Despite extensive study into translating Bangla to English, English to Bangla, and Banglish to Bangla in the past, there has been a noticeable gap in translating Bangla regional dialects into standard Bangla. In this study, we set out to fill this gap by creating a collection of 32,500 sentences, encompassing Bangla, Banglish, and English, representing five regional Bangla dialects. Our aim is to translate these regional dialects into standard Bangla and detect regions accurately. To achieve this, we proposed models known as mT5 and BanglaT5 for translating regional dialects into standard Bangla. Additionally, we employed mBERT and Bangla-bert-base to determine the specific regions from where these dialects originated. Our experimental results showed the highest BLEU score of 69.06 for Mymensingh regional dialects and the lowest BLEU score of 36.75 for Chittagong regional dialects. We also observed the lowest average word error rate of 0.1548 for Mymensingh regional dialects and the highest of 0.3385 for Chittagong regional dialects. For region detection, we achieved an accuracy of 85.86% for Bangla-bert-base and 84.36% for mBERT. This is the first large-scale investigation of Bangla regional dialects to Bangla machine translation. We believe our findings will not only pave the way for future work on Bangla regional dialects to Bangla machine translation, but will also be useful in solving similar language-related challenges in low-resource language conditions.
(Why) Is My Prompt Getting Worse? Rethinking Regression Testing for Evolving LLM APIs
results: The study finds that traditional testing approaches cannot meet the needs of regression testing LLMs, due to the different correctness notions, prompt brittleness, and non-determinism of LLM APIs.
Abstract
Large Language Models (LLMs) are increasingly integrated into software applications. Downstream application developers often access LLMs through APIs provided as a service. However, LLM APIs are often updated silently and scheduled to be deprecated, forcing users to continuously adapt to evolving models. This can cause performance regression and affect prompt design choices, as evidenced by our case study on toxicity detection. Based on our case study, we emphasize the need for and re-examine the concept of regression testing for evolving LLM APIs. We argue that regression testing LLMs requires fundamental changes to traditional testing approaches, due to different correctness notions, prompting brittleness, and non-determinism in LLM APIs.
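To make the argument concrete, the sketch below shows what such a regression test can look like: a pinned prompt suite is run against two model versions under a task-level correctness notion, with repeated sampling to account for non-determinism. The `call_llm` wrapper and the toxicity labeling rule are illustrative assumptions, not the paper's harness.

```python
# A hedged sketch of regression testing across evolving LLM API versions. It reflects
# the abstract's three concerns: task-level correctness (labels, not exact strings),
# prompt brittleness (a fixed prompt suite), and non-determinism (repeated sampling).
# `call_llm(model, prompt) -> str` is a hypothetical wrapper around a provider API.

def toxicity_label(output: str) -> str:
    # Task-level correctness notion: compare predicted labels, not raw completions.
    return "toxic" if "toxic" in output.lower() else "non-toxic"

def regression_check(call_llm, old_model, new_model, prompts, gold_labels,
                     n_samples=5, tol=0.1):
    """Flag prompts whose per-prompt accuracy drops by more than `tol` after an update."""
    regressions = []
    for prompt, gold in zip(prompts, gold_labels):
        old_acc = sum(toxicity_label(call_llm(old_model, prompt)) == gold
                      for _ in range(n_samples)) / n_samples
        new_acc = sum(toxicity_label(call_llm(new_model, prompt)) == gold
                      for _ in range(n_samples)) / n_samples
        if new_acc < old_acc - tol:
            regressions.append((prompt, old_acc, new_acc))
    return regressions
```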
Responsible AI Considerations in Text Summarization Research: A Review of Current Practices
results: The study finds that most papers do not engage with possible stakeholders or downstream impacts, which limits their consideration of potential adverse effects. Based on these findings, the authors make recommendations on concrete practices and research directions to advance responsible AI in text summarization.
Abstract
AI and NLP publication venues have increasingly encouraged researchers to reflect on possible ethical considerations, adverse impacts, and other responsible AI issues their work might engender. However, for specific NLP tasks our understanding of how prevalent such issues are, or when and why these issues are likely to arise, remains limited. Focusing on text summarization -- a common NLP task largely overlooked by the responsible AI community -- we examine research and reporting practices in the current literature. We conduct a multi-round qualitative analysis of 333 summarization papers from the ACL Anthology published between 2020-2022. We focus on how, which, and when responsible AI issues are covered, which relevant stakeholders are considered, and mismatches between stated and realized research goals. We also discuss current evaluation practices and consider how authors discuss the limitations of both prior work and their own work. Overall, we find that relatively few papers engage with possible stakeholders or contexts of use, which limits their consideration of potential downstream adverse impacts or other responsible AI issues. Based on our findings, we make recommendations on concrete practices and research directions.
Bit Cipher – A Simple yet Powerful Word Representation System that Integrates Efficiently with Language Models
results: The study shows that the Bit-cipher algorithm can quickly train efficient word vectors that achieve competitive performance across text tasks. Moreover, replacing embedding layers with cipher embeddings in LM training and fine-tuning accelerates the training process and attains better optima.
Abstract
While Large Language Models (LLMs) become ever more dominant, classic pre-trained word embeddings sustain their relevance through computational efficiency and nuanced linguistic interpretation. Drawing from recent studies demonstrating that the convergence of GloVe and word2vec optimizations all tend towards log-co-occurrence matrix variants, we construct a novel word representation system called Bit-cipher that eliminates the need of backpropagation while leveraging contextual information and hyper-efficient dimensionality reduction techniques based on unigram frequency, providing strong interpretability, alongside efficiency. We use the bit-cipher algorithm to train word vectors via a two-step process that critically relies on a hyperparameter -- bits -- that controls the vector dimension. While the first step trains the bit-cipher, the second utilizes it under two different aggregation modes -- summation or concatenation -- to produce contextually rich representations from word co-occurrences. We extend our investigation into bit-cipher's efficacy, performing probing experiments on part-of-speech (POS) tagging and named entity recognition (NER) to assess its competitiveness with classic embeddings like word2vec and GloVe. Additionally, we explore its applicability in LM training and fine-tuning. By replacing embedding layers with cipher embeddings, our experiments illustrate the notable efficiency of cipher in accelerating the training process and attaining better optima compared to conventional training paradigms. Experiments on the integration of bit-cipher embedding layers with Roberta, T5, and OPT, prior to or as a substitute for fine-tuning, showcase a promising enhancement to transfer learning, allowing rapid model convergence while preserving competitive performance.
Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition
for: This paper targets multimodal emotion recognition, aiming to recognize emotions for each utterance across multiple modalities, with important applications in human-machine interaction.
methods: The paper proposes Joyful, a joint modality fusion and graph contrastive learning method in which multimodal fusion, contrastive learning, and emotion recognition are jointly optimized. Specifically, a new multimodal fusion mechanism provides deep interaction and fusion between global contextual and uni-modal specific features, and a graph contrastive learning framework with inter-view and intra-view contrastive losses learns more distinguishable representations for samples with different sentiments.
results: On three benchmark datasets, Joyful achieved state-of-the-art (SOTA) performance compared to all baselines.
Abstract
Multimodal emotion recognition aims to recognize emotions for each utterance of multiple modalities, which has received increasing attention for its application in human-machine interaction. Current graph-based methods fail to simultaneously depict global contextual features and local diverse uni-modal features in a dialogue. Furthermore, with the number of graph layers increasing, they easily fall into over-smoothing. In this paper, we propose a method for joint modality fusion and graph contrastive learning for multimodal emotion recognition (Joyful), where multimodality fusion, contrastive learning, and emotion recognition are jointly optimized. Specifically, we first design a new multimodal fusion mechanism that can provide deep interaction and fusion between the global contextual and uni-modal specific features. Then, we introduce a graph contrastive learning framework with inter-view and intra-view contrastive losses to learn more distinguishable representations for samples with different sentiments. Extensive experiments on three benchmark datasets indicate that Joyful achieved state-of-the-art (SOTA) performance compared to all baselines.
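To make the contrastive component concrete, here is a generic InfoNCE-style inter-view contrastive loss of the kind the abstract describes: two views of the same utterance are positives and all other utterances in the batch are negatives. This is a sketch of the standard formulation; Joyful's exact inter-view and intra-view losses may differ.

```python
# A generic InfoNCE-style inter-view contrastive loss: two views of each sample are
# positives, everything else in the batch is a negative. A standard formulation,
# not necessarily Joyful's exact losses.
import torch
import torch.nn.functional as F

def inter_view_contrastive_loss(view1: torch.Tensor, view2: torch.Tensor,
                                tau: float = 0.1) -> torch.Tensor:
    """view1, view2: (batch, dim) embeddings of the same utterances under two views."""
    z1 = F.normalize(view1, dim=1)
    z2 = F.normalize(view2, dim=1)
    logits = z1 @ z2.T / tau                                  # pairwise similarities
    targets = torch.arange(z1.size(0), device=z1.device)      # positives on the diagonal
    return F.cross_entropy(logits, targets)

loss = inter_view_contrastive_loss(torch.randn(16, 64), torch.randn(16, 64))
print(loss.item())
```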
Gendec: A Machine Learning-based Framework for Gender Detection from Japanese Names
results: The proposed framework is expected to predict the gender associated with Japanese names accurately and to serve applications in various domains.
Abstract
Every human has their own name, a fundamental aspect of their identity and cultural heritage. The name often conveys a wealth of information, including details about an individual's background, ethnicity, and, especially, their gender. By detecting gender through the analysis of names, researchers can unlock valuable insights into linguistic patterns and cultural norms, which can be applied to practical applications. Hence, this work presents a novel dataset for Japanese name gender detection comprising 64,139 full names in romaji, hiragana, and kanji forms, along with their biological genders. Moreover, we propose Gendec, a framework for gender detection from Japanese names that leverages diverse approaches, including traditional machine learning techniques or cutting-edge transfer learning models, to predict the gender associated with Japanese names accurately. Through a thorough investigation, the proposed framework is expected to be effective and serve potential applications in various domains.
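As an illustration of the traditional machine learning track such a framework can include, here is a toy character n-gram baseline for romaji names; the names, labels, and feature choices are illustrative and not taken from the paper's dataset or models.

```python
# An illustrative baseline in the spirit of Gendec's traditional-ML track: character
# n-gram features over romaji names with logistic regression. Toy data only; the
# paper's actual dataset and models are not reproduced here.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

names = ["haruto", "yui", "sota", "sakura", "ren", "hina"]
labels = ["m", "f", "m", "f", "m", "f"]

clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 3)),  # character bigrams/trigrams
    LogisticRegression(max_iter=1000),
).fit(names, labels)

print(clf.predict(["hinata"]))
```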
results: The authors show that BoigLLM outperforms 13x larger models such as GPT-3.5 and GPT-4 on this task, indicating that while these state-of-the-art models can understand images, they lack information on how images perform in the real world. The authors also release the BoigBench dataset to enable further research on image generation and understanding.
Abstract
The last few years have witnessed great success on image generation, which has crossed the acceptance thresholds of aesthetics, making it directly applicable to personal and commercial applications. However, images, especially in marketing and advertising applications, are often created as a means to an end as opposed to just aesthetic concerns. The goal can be increasing sales, getting more clicks, likes, or image sales (in the case of stock businesses). Therefore, the generated images need to perform well on these key performance indicators (KPIs), in addition to being aesthetically good. In this paper, we make the first endeavor to answer the question of "How can one infuse the knowledge of the end-goal within the image generation process itself to create not just better-looking images but also "better-performing'' images?''. We propose BoigLLM, an LLM that understands both image content and user behavior. BoigLLM knows how an image should look to get a certain required KPI. We show that BoigLLM outperforms 13x larger models such as GPT-3.5 and GPT-4 in this task, demonstrating that while these state-of-the-art models can understand images, they lack information on how these images perform in the real world. To generate actual pixels of behavior-conditioned images, we train a diffusion-based model (BoigSD) to align with a proposed BoigLLM-defined reward. We show the performance of the overall pipeline on two datasets covering two different behaviors: a stock dataset with the number of forward actions as the KPI and a dataset containing tweets with the total likes as the KPI, denoted as BoigBench. To advance research in the direction of utility-driven image generation and understanding, we release BoigBench, a benchmark dataset containing 168 million enterprise tweets with their media, brand account names, time of post, and total likes.
Journey of Hallucination-minimized Generative AI Solutions for Financial Decision Makers
results: The study shows that using human feedback to control hallucinations improves chatbot reliability and quality, and that the proposed data-to-answer generation modules can produce high-quality automated reports and alerts.
Abstract
Generative AI has significantly reduced the entry barrier to the domain of AI owing to the ease of use and core capabilities of automation, translation, and intelligent actions in our day to day lives. Currently, Large language models (LLMs) that power such chatbots are being utilized primarily for their automation capabilities for software monitoring, report generation etc. and for specific personalized question answering capabilities, on a limited scope and scale. One major limitation of the currently evolving family of LLMs is 'hallucinations', wherein inaccurate responses are reported as factual. Hallucinations are primarily caused by biased training data, ambiguous prompts and inaccurate LLM parameters, and they majorly occur while combining mathematical facts with language-based context. Thus, monitoring and controlling for hallucinations becomes necessary when designing solutions that are meant for decision makers. In this work we present the three major stages in the journey of designing hallucination-minimized LLM-based solutions that are specialized for the decision makers of the financial domain, namely: prototyping, scaling and LLM evolution using human feedback. These three stages and the novel data to answer generation modules presented in this work are necessary to ensure that the Generative AI chatbots, autonomous reports and alerts are reliable and high-quality to aid key decision-making processes.
Deception Detection from Linguistic and Physiological Data Streams Using Bimodal Convolutional Neural Networks
results: The authors propose a fused convolutional neural network model that combines both modalities and demonstrate its advantage for multimodal deception detection. Compared with earlier multimodal deception detection methods, the approach achieves higher detection accuracy even with limited data.
Abstract
Deception detection is gaining increasing interest due to ethical and security concerns. This paper explores the application of convolutional neural networks for the purpose of multimodal deception detection. We use a dataset built by interviewing 104 subjects about two topics, with one truthful and one falsified response from each subject about each topic. In particular, we make three main contributions. First, we extract linguistic and physiological features from this data to train and construct the neural network models. Second, we propose a fused convolutional neural network model using both modalities in order to achieve an improved overall performance. Third, we compare our new approach with earlier methods designed for multimodal deception detection. We find that our system outperforms regular classification methods; our results indicate the feasibility of using neural networks for deception detection even in the presence of limited amounts of data.
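To illustrate what a bimodal fusion network can look like, here is a generic late-fusion sketch with separate 1-D convolutional branches for the linguistic and physiological streams; the layer sizes and fusion point are illustrative assumptions, not the paper's exact architecture.

```python
# A generic late-fusion sketch of a bimodal CNN: one 1-D convolutional branch per
# modality, features concatenated before a binary truthful/deceptive classifier.
# All dimensions are illustrative.
import torch
import torch.nn as nn

class BimodalCNN(nn.Module):
    def __init__(self, ling_channels=1, phys_channels=1, hidden=32):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv1d(in_ch, hidden, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            )
        self.ling = branch(ling_channels)
        self.phys = branch(phys_channels)
        self.head = nn.Linear(2 * hidden, 2)   # truthful vs. deceptive

    def forward(self, x_ling, x_phys):
        fused = torch.cat([self.ling(x_ling), self.phys(x_phys)], dim=1)
        return self.head(fused)

logits = BimodalCNN()(torch.randn(4, 1, 100), torch.randn(4, 1, 250))
print(logits.shape)  # torch.Size([4, 2])
```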
Partially Randomizing Transformer Weights for Dialogue Response Diversity
paper_authors: Jing Yang Lee, Kong Aik Lee, Woon-Seng Gan
for: Improving response diversity in open-domain dialogue.
methods: Randomly initializing the transformer and freezing the weights of selected layers.
results: PaRaFormer performs comparably to prior approaches without adding training or inference difficulty or increasing model complexity.
Abstract
Despite recent progress in generative open-domain dialogue, the issue of low response diversity persists. Prior works have addressed this issue via either novel objective functions, alternative learning approaches such as variational frameworks, or architectural extensions such as the Randomized Link (RL) Transformer. However, these approaches typically entail either additional difficulties during training/inference, or a significant increase in model size and complexity. Hence, we propose the Partially Randomized transFormer (PaRaFormer), a simple extension of the transformer which involves freezing the weights of selected layers after random initialization. Experimental results reveal that the performance of the PaRaformer is comparable to that of the aforementioned approaches, despite not entailing any additional training difficulty or increase in model complexity.
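A minimal PyTorch sketch of the core idea, freezing the weights of selected randomly initialized transformer layers, is given below; which layers to freeze is a hyperparameter, and the choice here is illustrative rather than the paper's.

```python
# A minimal sketch of partial weight randomization: build a standard transformer
# encoder (randomly initialized as usual), then freeze selected layers so their
# random weights are never updated. The frozen layer indices are illustrative.
import torch.nn as nn

def build_partially_randomized_encoder(num_layers=6, d_model=512, nhead=8,
                                       frozen_layers=(1, 3)):
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
    for i, block in enumerate(encoder.layers):
        if i in frozen_layers:              # keep this block's random initialization fixed
            for p in block.parameters():
                p.requires_grad = False
    return encoder

encoder = build_partially_randomized_encoder()
trainable = sum(p.numel() for p in encoder.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```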
results: The main result is an algorithm that finds an $\varepsilon$-optimal point with cost $O(d)$ and iteration complexity $O(d\log(1/\varepsilon)^2)$. Moreover, the dependence on $d$ is provably optimal: any randomized algorithm for this problem must incur $\Omega(d)$ cost and iteration complexity.
Abstract
We introduce and study the problem of dueling optimization with a monotone adversary, which is a generalization of (noiseless) dueling convex optimization. The goal is to design an online algorithm to find a minimizer $\mathbf{x}^{*}$ for a function $f\colon X \to \mathbb{R}$, where $X \subseteq \mathbb{R}^d$. In each round, the algorithm submits a pair of guesses, i.e., $\mathbf{x}^{(1)}$ and $\mathbf{x}^{(2)}$, and the adversary responds with any point in the space that is at least as good as both guesses. The cost of each query is the suboptimality of the worse of the two guesses; i.e., ${\max} \left( f(\mathbf{x}^{(1)}), f(\mathbf{x}^{(2)}) \right) - f(\mathbf{x}^{*})$. The goal is to minimize the number of iterations required to find an $\varepsilon$-optimal point and to minimize the total cost (regret) of the guesses over many rounds. Our main result is an efficient randomized algorithm for several natural choices of the function $f$ and set $X$ that incurs cost $O(d)$ and iteration complexity $O(d\log(1/\varepsilon)^2)$. Moreover, our dependence on $d$ is asymptotically optimal, as we show examples in which any randomized algorithm for this problem must incur $\Omega(d)$ cost and iteration complexity.
Exponentially Convergent Algorithms for Supervised Matrix Factorization
results: Applied to a range of SMF-type problems, the algorithm successfully identified well-known cancer-associated gene groups for various cancers.
Abstract
Supervised matrix factorization (SMF) is a classical machine learning method that simultaneously seeks feature extraction and classification tasks, which are not necessarily a priori aligned objectives. Our goal is to use SMF to learn low-rank latent factors that offer interpretable, data-reconstructive, and class-discriminative features, addressing challenges posed by high-dimensional data. Training SMF model involves solving a nonconvex and possibly constrained optimization with at least three blocks of parameters. Known algorithms are either heuristic or provide weak convergence guarantees for special cases. In this paper, we provide a novel framework that 'lifts' SMF as a low-rank matrix estimation problem in a combined factor space and propose an efficient algorithm that provably converges exponentially fast to a global minimizer of the objective with arbitrary initialization under mild assumptions. Our framework applies to a wide range of SMF-type problems for multi-class classification with auxiliary features. To showcase an application, we demonstrate that our algorithm successfully identified well-known cancer-associated gene groups for various cancers.
Nonsmooth Projection-Free Optimization with Functional Constraints
results: The algorithm achieves an $\epsilon$-suboptimal solution in $\mathcal{O}(\epsilon^{-2})$ iterations, with each iteration requiring only a single (potentially inexact) Linear Minimization Oracle (LMO) call and a (possibly inexact) subgradient computation. This performance is consistent with existing lower bounds.
Abstract
This paper presents a subgradient-based algorithm for constrained nonsmooth convex optimization that does not require projections onto the feasible set. While the well-established Frank-Wolfe algorithm and its variants already avoid projections, they are primarily designed for smooth objective functions. In contrast, our proposed algorithm can handle nonsmooth problems with general convex functional inequality constraints. It achieves an $\epsilon$-suboptimal solution in $\mathcal{O}(\epsilon^{-2})$ iterations, with each iteration requiring only a single (potentially inexact) Linear Minimization Oracle (LMO) call and a (possibly inexact) subgradient computation. This performance is consistent with existing lower bounds. Similar performance is observed when deterministic subgradients are replaced with stochastic subgradients. In the special case where there are no functional inequality constraints, our algorithm competes favorably with a recent nonsmooth projection-free method designed for constraint-free problems. Our approach utilizes a simple separation scheme in conjunction with a new Lagrange multiplier update rule.
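To make the projection-free primitive concrete, here is a minimal sketch of a Linear Minimization Oracle over the $\ell_1$ ball, where it has a closed form; the paper's full algorithm, including its separation scheme and Lagrange multiplier update rule, is not reproduced here.

```python
# A minimal sketch of an LMO over the l1 ball: argmin over {||s||_1 <= r} of <g, s>
# is a signed, scaled standard basis vector. Projection-free methods call this
# oracle instead of projecting onto the feasible set.
import numpy as np

def lmo_l1_ball(g: np.ndarray, radius: float = 1.0) -> np.ndarray:
    i = int(np.argmax(np.abs(g)))      # coordinate with the largest |component|
    s = np.zeros_like(g)
    s[i] = -radius * np.sign(g[i])     # move against that coordinate's sign
    return s

g = np.array([0.3, -2.0, 0.7])
print(lmo_l1_ball(g))                  # [ 0.  1.  0.]
```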
Low-Precision Floating-Point for Efficient On-Board Deep Neural Network Processing
results: 6-bit floating-point quantization of both weights and activations can compete with single precision without significant accuracy degradation.
Abstract
One of the major bottlenecks in high-resolution Earth Observation (EO) space systems is the downlink between the satellite and the ground. Due to hardware limitations, on-board power limitations or ground-station operation costs, there is a strong need to reduce the amount of data transmitted. Various processing methods can be used to compress the data. One of them is the use of on-board deep learning to extract relevant information in the data. However, most ground-based deep neural network parameters and computations are performed using single-precision floating-point arithmetic, which is not adapted to the context of on-board processing. We propose to rely on quantized neural networks and study how to combine low precision (mini) floating-point arithmetic with a Quantization-Aware Training methodology. We evaluate our approach with a semantic segmentation task for ship detection using satellite images from the Airbus Ship dataset. Our results show that 6-bit floating-point quantization for both weights and activations can compete with single-precision without significant accuracy degradation. Using a Thin U-Net 32 model, only a 0.3% accuracy degradation is observed with 6-bit minifloat quantization (a 6-bit equivalent integer-based approach leads to a 0.5% degradation). An initial hardware study also confirms the potential impact of such low-precision floating-point designs, but further investigation at the scale of a full inference accelerator is needed before concluding whether they are relevant in a practical on-board scenario.
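To illustrate simulated low-precision quantization of the kind used in quantization-aware training, here is a rough sketch that rounds values to the nearest representable minifloat; the 1-sign/3-exponent/2-mantissa split is an illustrative assumption rather than the paper's exact 6-bit format, and subnormal and overflow handling are simplified.

```python
# A rough sketch of simulated minifloat quantization: snap each value to the nearest
# point on a low-precision floating-point grid. The 1/3/2 bit split is an assumption,
# and subnormal/overflow handling is simplified.
import numpy as np

def quantize_minifloat(x, exp_bits=3, man_bits=2):
    """Round x to the nearest value representable with the given exponent/mantissa bits."""
    x = np.asarray(x, dtype=np.float32)
    bias = 2 ** (exp_bits - 1) - 1
    e = np.floor(np.log2(np.abs(x) + 1e-45))   # binade of each value
    e = np.clip(e, -bias + 1, bias)            # clamp to the representable exponent range
    scale = 2.0 ** (e - man_bits)              # spacing of the grid within each binade
    return np.where(x == 0, 0.0, np.round(x / scale) * scale)

w = np.array([0.337, -1.42, 0.051, 3.9], dtype=np.float32)
print(quantize_minifloat(w))  # weights snapped to the low-precision grid
```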
Benchmarking Machine Learning Models for Quantum Error Correction
results: The study finds that enlarging the receptive field to exploit information from distant ancilla qubits significantly improves QEC accuracy; for example, U-Net improves over a CNN by a margin of about 50%. The paper also provides a comprehensive analysis to guide future research.
Abstract
Quantum Error Correction (QEC) is one of the fundamental problems in quantum computer systems, which aims to detect and correct errors in the data qubits within quantum computers. Due to the presence of unreliable data qubits in existing quantum computers, implementing quantum error correction is a critical step when establishing a stable quantum computer system. Recently, machine learning (ML)-based approaches have been proposed to address this challenge. However, they lack a thorough understanding of quantum error correction. To bridge this research gap, we provide a new perspective to understand machine learning-based QEC in this paper. We find that syndromes in the ancilla qubits result from errors on connected data qubits, and distant ancilla qubits can provide auxiliary information to rule out some incorrect predictions for the data qubits. Therefore, to detect errors in data qubits, we must consider the information present in the long-range ancilla qubits. To the best of our knowledge, machine learning is less explored in the dependency relationship of QEC. To fill the blank, we curate a machine learning benchmark to assess the capacity to capture long-range dependencies for quantum error correction. To provide a comprehensive evaluation, we evaluate seven state-of-the-art deep learning algorithms spanning diverse neural network architectures, such as convolutional neural networks, graph neural networks, and graph transformers. Our exhaustive experiments reveal an enlightening trend: By enlarging the receptive field to exploit information from distant ancilla qubits, the accuracy of QEC significantly improves. For instance, U-Net can improve CNN by a margin of about 50%. Finally, we provide a comprehensive analysis that could inspire future research in this field. We will release the code when the paper is published.
On the Hardness of Learning to Stabilize Linear Systems
paper_authors: Xiong Zeng, Zexiang Liu, Zhe Du, Necmiye Ozay, Mario Sznaier
for: This paper studies the stabilization of linear time-invariant systems, specifically the statistical hardness of learning to stabilize such systems.
methods: The paper combines ideas from learning theory and robust control to study the difficulty of learning to stabilize a class of systems as the dimension grows.
results: The study finds that the sample complexity of stabilization increases exponentially with the system dimension, even though these systems can be easy to identify.
Abstract
Inspired by the work of Tsiamis et al. \cite{tsiamis2022learning}, in this paper we study the statistical hardness of learning to stabilize linear time-invariant systems. Hardness is measured by the number of samples required to achieve a learning task with a given probability. The work in \cite{tsiamis2022learning} shows that there exist system classes that are hard to learn to stabilize with the core reason being the hardness of identification. Here we present a class of systems that can be easy to identify, thanks to a non-degenerate noise process that excites all modes, but the sample complexity of stabilization still increases exponentially with the system dimension. We tie this result to the hardness of co-stabilizability for this class of systems using ideas from robust control.
Auxiliary Losses for Learning Generalizable Concept-based Models
results: Extensive experiments on real-world image classification datasets examine model performance under various distribution-shift settings. The proposed method achieves the highest accuracy in all distribution-shift settings, even compared to black-box models with the highest concept accuracy.
Abstract
The increasing use of neural networks in various applications has led to increasing apprehensions, underscoring the necessity to understand their operations beyond mere final predictions. As a solution to enhance model transparency, Concept Bottleneck Models (CBMs) have gained popularity since their introduction. CBMs essentially limit the latent space of a model to human-understandable high-level concepts. While beneficial, CBMs have been reported to often learn irrelevant concept representations that consecutively damage model performance. To overcome the performance trade-off, we propose cooperative-Concept Bottleneck Model (coop-CBM). The concept representation of our model is particularly meaningful when fine-grained concept labels are absent. Furthermore, we introduce the concept orthogonal loss (COL) to encourage the separation between the concept representations and to reduce the intra-concept distance. This paper presents extensive experiments on real-world datasets for image classification tasks, namely CUB, AwA2, CelebA and TIL. We also study the performance of coop-CBM models under various distributional shift settings. We show that our proposed method achieves higher accuracy in all distributional shift settings even compared to the black-box models with the highest concept accuracy.
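To show what a concept orthogonality regularizer can look like, here is a hedged sketch in the spirit of the COL described above: it penalizes overlap between concept representations by pushing their cosine Gram matrix toward the identity. The paper's exact formulation may differ.

```python
# A hedged sketch of a concept-orthogonality regularizer: drive the cosine Gram
# matrix of concept representations toward the identity, so off-diagonal overlap
# between distinct concepts is penalized. Not necessarily the paper's exact COL.
import torch

def concept_orthogonal_loss(concept_reprs: torch.Tensor) -> torch.Tensor:
    """concept_reprs: (num_concepts, dim) matrix of concept embeddings."""
    normed = torch.nn.functional.normalize(concept_reprs, dim=1)
    gram = normed @ normed.T                          # cosine similarities between concepts
    eye = torch.eye(gram.size(0), device=gram.device)
    return ((gram - eye) ** 2).mean()                 # push off-diagonal overlap to zero

c = torch.randn(10, 64, requires_grad=True)
loss = concept_orthogonal_loss(c)
loss.backward()
```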
Flat Minima in Linear Estimation and an Extended Gauss Markov Theorem
results: Through analysis of several random matrix ensembles and simulation studies, the paper shows that cross-validated Nuclear and Spectral regressors can outperform Ridge in several circumstances.
Abstract
We consider the problem of linear estimation, and establish an extension of the Gauss-Markov theorem, in which the bias operator is allowed to be non-zero but bounded with respect to a matrix norm of Schatten type. We derive simple and explicit formulas for the optimal estimator in the cases of Nuclear and Spectral norms (with the Frobenius case recovering ridge regression). Additionally, we analytically derive the generalization error in multiple random matrix ensembles, and compare with Ridge regression. Finally, we conduct an extensive simulation study, in which we show that the cross-validated Nuclear and Spectral regressors can outperform Ridge in several circumstances.
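As the abstract notes, the Frobenius case recovers ridge regression; for reference, the standard ridge estimator (a known identity, not a result of this paper) is:

```latex
% Ridge regression: the penalized least-squares estimator recovered in the
% Frobenius case mentioned above.
\[
  \hat{\beta}_{\mathrm{ridge}}
  = \arg\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2
  = \bigl(X^{\top}X + \lambda I\bigr)^{-1} X^{\top} y .
\]
```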
paper_authors: Zhijin Guo, Zhaozhen Xu, Martha Lewis, Nello Cristianini
for: This study investigates embeddings in AI, which convert symbolic structures into fixed-dimensional vectors and effectively fuse multiple signals, asking what this fusion looks like in real-world data.
methods: Two methods are introduced: (1) Correlation-based Fusion Detection, which measures the correlation between known attributes and embeddings, and (2) Additive Fusion Detection, which views embeddings as sums of individual vectors representing attributes.
results: Applying these methods, word embeddings were found to combine semantic and morphological signals; BERT sentence embeddings could be decomposed into individual word vectors for subject, verb, and object; and in a knowledge-graph-based recommender system, user embeddings exhibited signals of demographics such as age and gender even without training on demographic data. The study highlights that embeddings are fusions of multiple signals, from Word2Vec components to demographic hints in graph embeddings.
Abstract
Embeddings in AI convert symbolic structures into fixed-dimensional vectors, effectively fusing multiple signals. However, the nature of this fusion in real-world data is often unclear. To address this, we introduce two methods: (1) Correlation-based Fusion Detection, measuring correlation between known attributes and embeddings, and (2) Additive Fusion Detection, viewing embeddings as sums of individual vectors representing attributes. Applying these methods, word embeddings were found to combine semantic and morphological signals. BERT sentence embeddings were decomposed into individual word vectors of subject, verb and object. In the knowledge graph-based recommender system, user embeddings, even without training on demographic data, exhibited signals of demographics like age and gender. This study highlights that embeddings are fusions of multiple signals, from Word2Vec components to demographic hints in graph embeddings.
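As a minimal sketch of Correlation-based Fusion Detection, the snippet below fits a cross-validated linear probe from embeddings to a known attribute and reports the correlation between predicted and true attribute values; a high correlation suggests the attribute's signal is fused into the embedding. The data here are synthetic, and details beyond this reading of the method are assumptions.

```python
# A minimal sketch of correlation-based fusion detection: probe embeddings for a
# known attribute with a cross-validated linear model and report the correlation
# between predictions and ground truth. Synthetic data; illustrative only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

def fusion_correlation(embeddings: np.ndarray, attribute: np.ndarray) -> float:
    pred = cross_val_predict(Ridge(alpha=1.0), embeddings, attribute, cv=5)
    return float(np.corrcoef(pred, attribute)[0, 1])

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 32))
age = emb[:, 0] * 3.0 + rng.normal(scale=0.5, size=500)   # attribute leaked into dim 0
print(f"correlation with age: {fusion_correlation(emb, age):.2f}")  # close to 1
```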
The Persian Piano Corpus: A Collection Of Instrument-Based Feature Extracted Data Considering Dastgah
results: The paper provides features extracted from 2022 Persian piano pieces, with Dastgah labels and comprehensive metadata, to support a deeper understanding of Persian music and the role of the piano in it in subsequent research.
Abstract
The research in the field of music is rapidly growing, and this trend emphasizes the need for comprehensive data. Though researchers have made an effort to contribute their own datasets, many data collections lack the requisite inclusivity for comprehensive study because they are frequently focused on particular components of music or other specific topics. We have endeavored to address data scarcity by employing an instrument-based approach to provide a complete corpus related to the Persian piano. Our piano corpus includes relevant labels for Persian music mode (Dastgah) and comprehensive metadata, allowing for utilization in various popular research areas. The features extracted from 2022 Persian piano pieces in The Persian Piano Corpus (PPC) have been collected and made available to researchers, aiming for a more thorough understanding of Persian music and the role of the piano in it in subsequent steps.
Tactics2D: A Multi-agent Reinforcement Learning Environment for Driving Decision-making
results: The library helps researchers develop and test decision-making algorithms quickly, supporting research on autonomous driving.
Abstract
Tactics2D is an open-source multi-agent reinforcement learning library with a Python backend. Its goal is to provide a convenient toolset for researchers to develop decision-making algorithms for autonomous driving. The library includes diverse traffic scenarios implemented as gym-based environments equipped with multi-sensory capabilities and violation detection for traffic rules. Additionally, it features a reinforcement learning baseline tested with reasonable evaluation metrics. Tactics2D is highly modular and customizable. The source code of Tactics2D is available at https://github.com/WoodOxen/Tactics2D.
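Below is a hedged usage sketch of the gym-style interaction loop that Tactics2D's environments follow; the environment id is a placeholder, not a name confirmed by the repository, and the actual registration and observation/action spaces should be taken from the source code.

```python
# A hedged sketch of the standard gym/gymnasium interaction loop for a gym-based
# driving environment. The environment id below is a placeholder assumption, not a
# confirmed Tactics2D name; see the repository for the registered environments.
import gymnasium as gym

env = gym.make("HypotheticalParking-v0")      # placeholder id
obs, info = env.reset(seed=42)
done = False
while not done:
    action = env.action_space.sample()        # replace with a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()
```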
Challenges in data-based geospatial modeling for environmental research and practice
results: The paper surveys techniques and popular programming tools for overcoming these challenges, as well as prospects for geospatial artificial intelligence in environmental applications.
Abstract
With the rise of electronic data, particularly Earth observation data, data-based geospatial modelling using machine learning (ML) has gained popularity in environmental research. Accurate geospatial predictions are vital for domain research based on ecosystem monitoring and quality assessment and for policy-making and action planning, considering effective management of natural resources. The accuracy and computation speed of ML has generally proved efficient. However, many questions have yet to be addressed to obtain precise and reproducible results suitable for further use in both research and practice. A better understanding of the ML concepts applicable to geospatial problems enhances the development of data science tools providing transparent information crucial for making decisions on global challenges such as biosphere degradation and climate change. This survey reviews common nuances in geospatial modelling, such as imbalanced data, spatial autocorrelation, prediction errors, model generalisation, domain specificity, and uncertainty estimation. We provide an overview of techniques and popular programming tools to overcome or account for the challenges. We also discuss prospects for geospatial Artificial Intelligence in environmental applications.
A Survey of Simulators for Autonomous Driving: Taxonomy, Challenges, and Evaluation Metrics
paper_authors: Yueyuan Li, Wei Yuan, Weihao Yan, Qiyuan Shen, Chunxiang Wang, Ming Yang
for: This paper provides an in-depth review of simulators for autonomous driving, with a focus on their evolution, functionalities, and limitations.
methods: The paper classifies simulators by function into five categories: traffic flow simulators, vehicle dynamics simulators, scenario editors, sensory data generators, and driving strategy validators. It also surveys commercial and open-source simulators and evaluates their performance using qualitative and quantitative metrics.
results: The paper identifies fidelity and efficiency concerns as the primary limitations of simulators and proposes solutions such as enhanced adverse-weather simulation, automated map reconstruction, and interactive traffic participants. It also explores headless simulation and multiple-speed simulation techniques to improve the realism and efficiency of simulators.
Abstract
Simulators have irreplaceable importance for the research and development of autonomous driving. Besides saving resources, labor, and time, simulation is the only feasible way to reproduce many severe accident scenarios. Despite their widespread adoption across academia and industry, there is an absence in the evolutionary trajectory of simulators and critical discourse on their limitations. To bridge the gap in research, this paper conducts an in-depth review of simulators for autonomous driving. It delineates the three-decade development into three stages: specialized development period, gap period, and comprehensive development, from which it detects a trend of implementing comprehensive functionalities and open-source accessibility. Then it classifies the simulators by functions, identifying five categories: traffic flow simulator, vehicle dynamics simulator, scenario editor, sensory data generator, and driving strategy validator. Simulators that amalgamate diverse features are defined as comprehensive simulators. By investigating commercial and open-source simulators, this paper reveals that the critical issues faced by simulators primarily revolve around fidelity and efficiency concerns. This paper justifies that enhancing the realism of adverse weather simulation, automated map reconstruction, and interactive traffic participants will bolster credibility. Concurrently, headless simulation and multiple-speed simulation techniques will exploit the theoretic advantages. Moreover, this paper delves into potential solutions for the identified issues. It explores qualitative and quantitative evaluation metrics to assess the simulator's performance. This paper guides users to find suitable simulators efficiently and provides instructive suggestions for developers to improve simulator efficacy purposefully.
DenseNet and Support Vector Machine classifications of major depressive disorder using vertex-wise cortical features
paper_authors: Vladimir Belov, Tracy Erwin-Grabner, Ling-Li Zeng, Christopher R. K. Ching, Andre Aleman, Alyssa R. Amod, Zeynep Basgoze, Francesco Benedetti, Bianca Besteher, Katharina Brosch, Robin Bülow, Romain Colle, Colm G. Connolly, Emmanuelle Corruble, Baptiste Couvy-Duchesne, Kathryn Cullen, Udo Dannlowski, Christopher G. Davey, Annemiek Dols, Jan Ernsting, Jennifer W. Evans, Lukas Fisch, Paola Fuentes-Claramonte, Ali Saffet Gonul, Ian H. Gotlib, Hans J. Grabe, Nynke A. Groenewold, Dominik Grotegerd, Tim Hahn, J. Paul Hamilton, Laura K. M. Han, Ben J Harrison, Tiffany C. Ho, Neda Jahanshad, Alec J. Jamieson, Andriana Karuk, Tilo Kircher, Bonnie Klimes-Dougan, Sheri-Michelle Koopowitz, Thomas Lancaster, Ramona Leenings, Meng Li, David E. J. Linden, Frank P. MacMaster, David M. A. Mehler, Susanne Meinert, Elisa Melloni, Bryon A. Mueller, Benson Mwangi, Igor Nenadić, Amar Ojha, Yasumasa Okamoto, Mardien L. Oudega, Brenda W. J. H. Penninx, Sara Poletti, Edith Pomarol-Clotet, Maria J. Portella, Elena Pozzi, Joaquim Radua, Elena Rodríguez-Cano, Matthew D. Sacchet, Raymond Salvador, Anouk Schrantee, Kang Sim, Jair C. Soares, Aleix Solanes, Dan J. Stein, Frederike Stein, Aleks Stolicyn, Sophia I. Thomopoulos, Yara J. Toenders, Aslihan Uyar-Demir, Eduard Vieta, Yolanda Vives-Gilabert, Henry Völzke, Martin Walter, Heather C. Whalley, Sarah Whittle, Nils Winter, Katharina Wittfeld, Margaret J. Wright, Mon-Ju Wu, Tony T. Yang, Carlos Zarate, Dick J. Veltman, Lianne Schmaal, Paul M. Thompson, Roberto Goya-Maldonado
for: This study examines whether major depressive disorder (MDD) is linked to morphological alterations in the brain.
methods: The study applies deep learning tools to neuroimaging data, using two classifiers: a DenseNet and a Support Vector Machine (SVM).
results: The study finds that, regardless of the classifier used, integrating vertex-wise morphometric features did not differentiate MDD from healthy controls (HC), and a site effect was also present. The study therefore suggests that MDD classification with this combination of features and classifiers is unfeasible.
Abstract
Major depressive disorder (MDD) is a complex psychiatric disorder that affects the lives of hundreds of millions of individuals around the globe. Even today, researchers debate if morphological alterations in the brain are linked to MDD, likely due to the heterogeneity of this disorder. The application of deep learning tools to neuroimaging data, capable of capturing complex non-linear patterns, has the potential to provide diagnostic and predictive biomarkers for MDD. However, previous attempts to demarcate MDD patients and healthy controls (HC) based on segmented cortical features via linear machine learning approaches have reported low accuracies. In this study, we used globally representative data from the ENIGMA-MDD working group containing an extensive sample of people with MDD (N=2,772) and HC (N=4,240), which allows a comprehensive analysis with generalizable results. Based on the hypothesis that integration of vertex-wise cortical features can improve classification performance, we evaluated the classification of a DenseNet and a Support Vector Machine (SVM), with the expectation that the former would outperform the latter. As we analyzed a multi-site sample, we additionally applied the ComBat harmonization tool to remove potential nuisance effects of site. We found that both classifiers exhibited close to chance performance (balanced accuracy DenseNet: 51%; SVM: 53%), when estimated on unseen sites. Slightly higher classification performance (balanced accuracy DenseNet: 58%; SVM: 55%) was found when the cross-validation folds contained subjects from all sites, indicating site effect. In conclusion, the integration of vertex-wise morphometric features and the use of the non-linear classifier did not lead to the differentiability between MDD and HC. Our results support the notion that MDD classification on this combination of features and classifiers is unfeasible.
SORTAD: Self-Supervised Optimized Random Transformations for Anomaly Detection in Tabular Data
results: The method achieved state-of-the-art results on multiple commonly used anomaly detection datasets, as well as strong overall results across all datasets tested.
Abstract
We consider a self-supervised approach to anomaly detection in tabular data. Random transformations are applied to the data, and then each transformation is identified based on its output. These predicted transformations are used to identify anomalies. In tabular data this approach faces many challenges that are related to the uncorrelated nature of the data. These challenges affect the transformations that should be used, as well as the use of their predictions. To this end, we propose SORTAD, a novel algorithm that is tailor-made to solve these challenges. SORTAD optimally chooses random transformations that help the classification process, and have a scoring function that is more sensitive to the changes in the transformations classification prediction encountered in tabular data. SORTAD achieved state-of-the-art results on multiple commonly used anomaly detection data sets, as well as in the overall results across all data sets tested.
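A simplified sketch of the transformation-prediction idea follows: random affine transformations are applied to tabular rows, a classifier learns to identify which transformation produced each output, and anomalies are scored by how poorly their transformations are recognized. SORTAD's optimized transformation choice and its tabular-sensitive scoring function go beyond this sketch.

```python
# A simplified sketch of transformation-prediction anomaly detection for tabular
# data: train a classifier to identify which random affine transformation was
# applied, then score points by the negative log-probability of the correct labels.
# SORTAD's transformation selection and scoring are more refined than this.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                     # normal training data
K = 4
transforms = [(rng.normal(size=(8, 8)), rng.normal(size=8)) for _ in range(K)]

def apply_all(X):
    Xt = np.vstack([X @ A + b for A, b in transforms])
    y = np.repeat(np.arange(K), len(X))            # transformation labels
    return Xt, y

clf = LogisticRegression(max_iter=1000).fit(*apply_all(X))

def anomaly_score(x):
    """Mean negative log-probability of the correct transformation label."""
    xt = np.vstack([x @ A + b for A, b in transforms])
    probs = clf.predict_proba(xt)[np.arange(K), np.arange(K)]
    return float(-np.log(probs + 1e-12).mean())

print(anomaly_score(rng.normal(size=8)))           # in-distribution: typically lower
print(anomaly_score(rng.normal(size=8) + 8.0))     # shifted point: typically higher
```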
Wasserstein Convergence Guarantees for a General Class of Score-Based Generative Models
paper_authors: Xuefeng Gao, Hoang M. Nguyen, Lingjiong Zhu
for: This paper provides convergence guarantees for score-based generative models (SGMs), which achieve state-of-the-art performance in many applications.
methods: The paper analyzes a general class of SGMs, assuming accurate score estimates and a smooth log-concave data distribution, and specializes the result to several concrete forward processes modelled by stochastic differential equations, including some newly proposed models.
results: The paper provides an upper bound on the iteration complexity for each model, as well as a lower bound when the data distribution is Gaussian. Numerically, experiments on unconditional image generation on CIFAR-10 agree with the theoretical predictions on iteration complexity, and models using the newly proposed forward processes can outperform existing models.
Abstract
Score-based generative models (SGMs) is a recent class of deep generative models with state-of-the-art performance in many applications. In this paper, we establish convergence guarantees for a general class of SGMs in 2-Wasserstein distance, assuming accurate score estimates and smooth log-concave data distribution. We specialize our result to several concrete SGMs with specific choices of forward processes modelled by stochastic differential equations, and obtain an upper bound on the iteration complexity for each model, which demonstrates the impacts of different choices of the forward processes. We also provide a lower bound when the data distribution is Gaussian. Numerically, we experiment SGMs with different forward processes, some of which are newly proposed in this paper, for unconditional image generation on CIFAR-10. We find that the experimental results are in good agreement with our theoretical predictions on the iteration complexity, and the models with our newly proposed forward processes can outperform existing models.
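For orientation, one standard forward process modelled by a stochastic differential equation in SGMs is the variance-preserving (Ornstein-Uhlenbeck type) SDE below, shown with its closed-form conditional law; it is given as a familiar example and is not necessarily one of the paper's newly proposed processes.

```latex
% The variance-preserving forward SDE commonly used in SGMs, with its closed-form
% conditional law; a standard example, not necessarily the paper's choice.
\[
  \mathrm{d}X_t = -\tfrac{1}{2}\beta(t)\,X_t\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}W_t,
  \qquad
  X_t \mid X_0 \sim \mathcal{N}\!\left(e^{-\frac{1}{2}B(t)}X_0,\;\bigl(1-e^{-B(t)}\bigr)I\right),
  \quad B(t)=\int_0^t \beta(s)\,\mathrm{d}s .
\]
```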
BrainZ-BP: A Non-invasive Cuff-less Blood Pressure Estimation Approach Leveraging Brain Bio-impedance and Electrocardiogram
methods: The study places two electrodes on the forehead and the occipital bone of the head in the anterior-posterior direction to measure brain BIOZ, extracts features including pulse transit time and morphological features of the brain BIOZ, and feeds them into four regression models for BP estimation.
results: The mean absolute error, root mean square error, and correlation coefficient of the random forest regression model are 2.17 mmHg, 3.91 mmHg, and 0.90 for systolic pressure estimation, and 1.71 mmHg, 3.02 mmHg, and 0.89 for diastolic pressure estimation, indicating that BrainZ-BP can estimate blood pressure accurately.
Abstract
Accurate and continuous blood pressure (BP) monitoring is essential to the early prevention of cardiovascular diseases. Non-invasive and cuff-less BP estimation algorithm has gained much attention in recent years. Previous studies have demonstrated that brain bio-impedance (BIOZ) is a promising technique for non-invasive intracranial pressure (ICP) monitoring. Clinically, treatment for patients with traumatic brain injuries (TBI) requires monitoring the ICP and BP of patients simultaneously. Estimating BP by brain BIOZ directly can reduce the number of sensors attached to the patients, thus improving their comfort. To address the issues, in this study, we explore the feasibility of leveraging brain BIOZ for BP estimation and propose a novel cuff-less BP estimation approach called BrainZ-BP. Two electrodes are placed on the forehead and occipital bone of the head in the anterior-posterior direction for brain BIOZ measurement. Various features including pulse transit time and morphological features of brain BIOZ are extracted and fed into four regression models for BP estimation. Results show that the mean absolute error, root mean square error, and correlation coefficient of random forest regression model are 2.17 mmHg, 3.91 mmHg, and 0.90 for systolic pressure estimation, and are 1.71 mmHg, 3.02 mmHg, and 0.89 for diastolic pressure estimation. The presented BrainZ-BP can be applied in the brain BIOZ-based ICP monitoring scenario to monitor BP simultaneously.
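A minimal sketch of the regression stage described above, assuming synthetic stand-in features; the real pulse-transit-time and morphological features come from measured brain BIOZ.

```python
# Sketch: PTT + morphological descriptors -> random forest -> systolic BP.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
ptt = rng.uniform(0.15, 0.35, n)                   # pulse transit time (s)
morph = rng.normal(size=(n, 3))                    # morphological descriptors
X = np.column_stack([ptt, morph])
sbp = 160 - 150 * ptt + morph[:, 0] + rng.normal(scale=2.0, size=n)  # toy target

X_tr, X_te, y_tr, y_te = train_test_split(X, sbp, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = model.predict(X_te)

mae = mean_absolute_error(y_te, pred)
rmse = mean_squared_error(y_te, pred) ** 0.5
r = np.corrcoef(y_te, pred)[0, 1]
print(f"MAE={mae:.2f} mmHg, RMSE={rmse:.2f} mmHg, r={r:.2f}")
```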
EdgeFM: Leveraging Foundation Model for Open-set Learning on the Edge
results: Evaluated on three public datasets and two self-collected datasets, EdgeFM reduces end-to-end latency by up to 3.2x and achieves a 34.3% accuracy increase compared with the baseline.
Abstract
Deep Learning (DL) models have been widely deployed on IoT devices with the help of advancements in DL algorithms and chips. However, the limited resources of edge devices make these on-device DL models hard to be generalizable to diverse environments and tasks. Although the recently emerged foundation models (FMs) show impressive generalization power, how to effectively leverage the rich knowledge of FMs on resource-limited edge devices is still not explored. In this paper, we propose EdgeFM, a novel edge-cloud cooperative system with open-set recognition capability. EdgeFM selectively uploads unlabeled data to query the FM on the cloud and customizes the specific knowledge and architectures for edge models. Meanwhile, EdgeFM conducts dynamic model switching at run-time taking into account both data uncertainty and dynamic network variations, which ensures the accuracy always close to the original FM. We implement EdgeFM using two FMs on two edge platforms. We evaluate EdgeFM on three public datasets and two self-collected datasets. Results show that EdgeFM can reduce the end-to-end latency up to 3.2x and achieve 34.3% accuracy increase compared with the baseline.
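A minimal sketch of the run-time switching idea, assuming an entropy-based uncertainty measure and a fixed latency budget; EdgeFM's actual policy also adapts models and architectures, so treat this as an illustration only.

```python
# Sketch: route a sample to the cloud foundation model only when the edge
# model is uncertain and the network round-trip time is acceptable.
import numpy as np

def entropy(probs: np.ndarray) -> float:
    p = np.clip(probs, 1e-9, 1.0)
    return float(-(p * np.log(p)).sum())

def classify(sample, edge_model, cloud_model, network_rtt_s: float,
             uncertainty_thresh: float = 1.0, rtt_budget_s: float = 0.2):
    probs = edge_model(sample)            # cheap on-device prediction
    if entropy(probs) > uncertainty_thresh and network_rtt_s < rtt_budget_s:
        return cloud_model(sample)        # query the foundation model
    return probs

# Toy usage with stand-in models.
edge = lambda x: np.array([0.4, 0.35, 0.25])   # uncertain edge output
cloud = lambda x: np.array([0.9, 0.05, 0.05])  # confident FM output
print(classify(None, edge, cloud, network_rtt_s=0.05))
```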
Polynomial-Time Solutions for ReLU Network Training: A Complexity Classification via Max-Cut and Zonotopes
for: investigate the complexity of training a two-layer ReLU neural network with weight decay regularization
methods: using a standard cone-constrained convex program and developing a randomized algorithm
results: prove that the hardness of approximation of ReLU networks mirrors the complexity of the Max-Cut problem, and develop polynomial-time approximation guarantees for certain categories of datasets.
Abstract
We investigate the complexity of training a two-layer ReLU neural network with weight decay regularization. Previous research has shown that the optimal solution of this problem can be found by solving a standard cone-constrained convex program. Using this convex formulation, we prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it. In particular, when $\epsilon\leq\sqrt{84/83}-1\approx 0.006$, we show that it is NP-hard to find an approximate global optimizer of the ReLU network objective with relative error $\epsilon$ with respect to the objective value. Moreover, we develop a randomized algorithm which mirrors the Goemans-Williamson rounding of semidefinite Max-Cut relaxations. To provide polynomial-time approximations, we classify training datasets into three categories: (i) For orthogonal separable datasets, a precise solution can be obtained in polynomial-time. (ii) When there is a negative correlation between samples of different classes, we give a polynomial-time approximation with relative error $\sqrt{\pi/2}-1\approx 0.253$. (iii) For general datasets, the degree to which the problem can be approximated in polynomial-time is governed by a geometric factor that controls the diameter of two zonotopes intrinsic to the dataset. To our knowledge, these results present the first polynomial-time approximation guarantees along with first hardness of approximation results for regularized ReLU networks.
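As orientation, one common formulation of the regularized two-layer ReLU training problem studied in this line of work is sketched below; the squared loss and exact normalization are assumptions here, and the paper works with an equivalent cone-constrained convex program rather than this non-convex form.

```latex
\[
\min_{\{u_j, \alpha_j\}_{j=1}^{m}} \;
\Big\| \sum_{j=1}^{m} (X u_j)_+ \, \alpha_j - y \Big\|_2^2
\; + \; \beta \sum_{j=1}^{m} \left( \|u_j\|_2^2 + \alpha_j^2 \right)
\]
% (Xu)_+ = max(Xu, 0) elementwise; beta > 0 is the weight-decay parameter.
```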
Learning Deterministic Finite Automata from Confidence Oracles
results: The goal is to learn a DFA representation that preserves the information the confidence oracle $Q$ is confident in, matching the oracle closely wherever it is highly confident.
Abstract
We discuss the problem of learning a deterministic finite automaton (DFA) from a confidence oracle. That is, we are given access to an oracle $Q$ with incomplete knowledge of some target language $L$ over an alphabet $\Sigma$; the oracle maps a string $x\in\Sigma^*$ to a score in the interval $[-1,1]$ indicating its confidence that the string is in the language. The interpretation is that the sign of the score signifies whether $x\in L$, while the magnitude $|Q(x)|$ represents the oracle's confidence. Our goal is to learn a DFA representation of the oracle that preserves the information that it is confident in. The learned DFA should closely match the oracle wherever it is highly confident, but it need not do this when the oracle is less sure of itself.
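A minimal sketch of the confidence-oracle interface from this setup; the thresholding step that harvests high-confidence strings is an illustrative assumption, not the paper's learning algorithm.

```python
# Sketch: an oracle maps strings over Sigma to scores in [-1, 1]; the sign
# suggests membership in L and the magnitude is the oracle's confidence.
from typing import Callable, Iterable

ConfidenceOracle = Callable[[str], float]  # maps x in Sigma* to [-1, 1]

def high_confidence_sample(oracle: ConfidenceOracle,
                           strings: Iterable[str],
                           tau: float = 0.8):
    """Keep strings the oracle is confident about; the sign gives the label."""
    pos, neg = [], []
    for x in strings:
        q = oracle(x)
        if abs(q) >= tau:               # |Q(x)| = oracle's confidence
            (pos if q > 0 else neg).append(x)
    return pos, neg                      # candidate members / non-members of L

# Toy oracle: confident that even-length strings are in the language.
toy_oracle = lambda x: 0.9 if len(x) % 2 == 0 else -0.9
print(high_confidence_sample(toy_oracle, ["", "a", "ab", "abc"]))
```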
Classification Methods Based on Machine Learning for the Analysis of Fetal Health Data
for: This paper aims to assess the classification performance of various machine learning models for fetal health analysis.
methods: The authors use machine learning models such as SVM, RF, and TabNet, as well as dimensionality reduction techniques like PCA and LDA.
results: The TabNet model achieves a classification accuracy of 94.36% on a fetal health dataset, demonstrating the effectiveness of machine learning-based techniques for fetal health analysis.
Abstract
The persistent battle to decrease childhood mortality serves as a commonly employed benchmark for gauging advancements in the field of medicine. Globally, the under-5 mortality rate stands at approximately 5 million, with a significant portion of these deaths being avoidable. Given the significance of this problem, Machine learning-based techniques have emerged as a prominent tool for assessing fetal health. In this work, we have analyzed the classification performance of various machine learning models for fetal health analysis. Classification performance of various machine learning models, such as support vector machine (SVM), random forest(RF), and attentive interpretable tabular learning (TabNet) have been assessed on fetal health. Moreover, dimensionality reduction techniques, such as Principal component analysis (PCA) and Linear discriminant analysis (LDA) have been implemented to obtain better classification performance with less number of features. A TabNet model on a fetal health dataset provides a classification accuracy of 94.36%. In general, this technology empowers doctors and healthcare experts to achieve precise fetal health classification and identify the most influential features in the process.
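A minimal sketch of the dimensionality-reduction plus classification pipeline described, with PCA feeding an SVM on synthetic CTG-style data; the SVM stands in for TabNet here to keep the example dependency-free, and the feature counts are assumptions.

```python
# Sketch: scale -> PCA -> SVM, evaluated by cross-validation.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for a 21-feature, 3-class fetal health (CTG-style) dataset.
X, y = make_classification(n_samples=600, n_features=21, n_informative=10,
                           n_classes=3, random_state=0)

pipe = make_pipeline(StandardScaler(), PCA(n_components=10), SVC(kernel="rbf"))
scores = cross_val_score(pipe, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```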
Taxonomic analysis of asteroids with artificial neural networks
paper_authors: Nanping Luo, Xiaobin Wang, Shenghong Gu, Antti Penttilä, Karri Muinonen, Yisi Liu
for: asteroid taxonomy and composition analysis
methods: artificial neural networks (ANNs) and spectral data from the Chinese Space Survey telescope (CSST)
results: higher than 92% accuracy in asteroid classification using the ANN tool, reasonable predictions for known taxonomic labels, and potential application for analyzing CSST asteroid spectra in the future.
Abstract
We study the surface composition of asteroids with visible and/or infrared spectroscopy. For example, asteroid taxonomy is based on the spectral features or multiple color indices in visible and near-infrared wavelengths. The composition of asteroids gives key information to understand their origin and evolution. However, we lack compositional information for faint asteroids due to limits of ground-based observational instruments. In the near future, the Chinese Space Survey telescope (CSST) will provide multiple colors and spectroscopic data for asteroids of apparent magnitude brighter than 25 mag and 23 mag, respectively. For the aim of analysis of the CSST spectroscopic data, we applied an algorithm using artificial neural networks (ANNs) to establish a preliminary classification model for asteroid taxonomy according to the design of the survey module of CSST. Using the SMASS II spectra and the Bus-Binzel taxonomy system, our ANN classification tool composed of 5 individual ANNs is constructed, and the accuracy of this classification system is higher than 92 %. As the first application of our ANN tool, 64 spectra of 42 asteroids obtained in 2006 and 2007 by us with the 2.16-m telescope in the Xinglong station (Observatory Code 327) of National Astronomical Observatory of China are analyzed. The predicted labels of these spectra using our ANN tool are found to be reasonable when compared to their known taxonomic labels. Considering the accuracy and stability, our ANN tool can be applied to analyse the CSST asteroid spectra in the future.
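A minimal sketch echoing the five-ANN classification tool, assuming toy spectra and a simple majority vote; the real tool is trained on SMASS II spectra with Bus-Binzel labels, so the architecture and combination rule here are illustrative only.

```python
# Sketch: an ensemble of five small neural networks over spectral bins,
# combined by majority vote.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n, n_bins, n_classes = 300, 40, 4          # 40 spectral bins, 4 toy classes
X = rng.normal(size=(n, n_bins))
y = rng.integers(0, n_classes, size=n)

anns = [MLPClassifier(hidden_layer_sizes=(32,), max_iter=800,
                      random_state=seed).fit(X, y) for seed in range(5)]

def predict(spectrum):
    votes = [ann.predict(spectrum.reshape(1, -1))[0] for ann in anns]
    return np.bincount(votes).argmax()

print("predicted class:", predict(X[0]))
```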
Bridging Data-Driven and Knowledge-Driven Approaches for Safety-Critical Scenario Generation in Automated Vehicle Validation
paper_authors: Kunkun Hao, Lu Liu, Wen Cui, Jianxing Zhang, Songyang Yan, Yuxi Pan, Zijiang Yang
for: The paper is written to address the challenges of validating automated driving vehicles (ADV) in safety-critical scenarios, and to propose a scenario generation framework called BridgeGen that can effectively generate diverse safety-critical scenarios for ADV development and performance evaluations.
methods: The paper uses both data-driven and knowledge-driven scenario generation methods, and introduces an ontology-based approach to model the five scenario layers in the operational design domain (ODD). The paper also develops an optimized scenario generation toolkit that combines traditional optimization and reinforcement learning schemes.
results: The paper conducts extensive experiments using the Carla simulator and demonstrates the effectiveness of BridgeGen in generating diverse safety-critical scenarios for ADV. The results show that BridgeGen can efficiently generate safety-critical scenarios that are not easily achievable by existing methods.
Abstract
Automated driving vehicles~(ADV) promise to enhance driving efficiency and safety, yet they face intricate challenges in safety-critical scenarios. As a result, validating ADV within generated safety-critical scenarios is essential for both development and performance evaluations. This paper investigates the complexities of employing two major scenario-generation solutions: data-driven and knowledge-driven methods. Data-driven methods derive scenarios from recorded datasets, efficiently generating scenarios by altering the existing behavior or trajectories of traffic participants but often falling short in considering ADV perception; knowledge-driven methods provide effective coverage through expert-designed rules, but they may lead to inefficiency in generating safety-critical scenarios within that coverage. To overcome these challenges, we introduce BridgeGen, a safety-critical scenario generation framework, designed to bridge the benefits of both methodologies. Specifically, by utilizing ontology-based techniques, BridgeGen models the five scenario layers in the operational design domain (ODD) from knowledge-driven methods, ensuring broad coverage, and incorporating data-driven strategies to efficiently generate safety-critical scenarios. An optimized scenario generation toolkit is developed within BridgeGen. This expedites the crafting of safety-critical scenarios through a combination of traditional optimization and reinforcement learning schemes. Extensive experiments conducted using Carla simulator demonstrate the effectiveness of BridgeGen in generating diverse safety-critical scenarios.
Short-term Volatility Estimation for High Frequency Trades using Gaussian processes (GPs)
paper_authors: Leonard Mushunje, Maxwell Mashasha, Edina Chandiwana
for: The paper aims to improve short-term volatility and return forecasting for high-frequency trades by combining numeric and probabilistic models.
methods: The paper uses a combination of Gaussian Processes (GPs) and a Numerical market prediction (NMP) model to make one-day-ahead volatility forecasts. The NMP model is used to correct the stock price data, and a Censored GP is used to model the relationship between the corrected stock prices and returns.
results: The paper evaluates the forecasting errors using implied and estimated data.
Abstract
The fundamental theorem behind financial markets is that stock prices are intrinsically complex and stochastic. One of the complexities is the volatility associated with stock prices. Volatility is a tendency for prices to change unexpectedly [1]. Price volatility is often detrimental to the return economics, and thus, investors should factor it in whenever making investment decisions, choices, and temporal or permanent moves. It is, therefore, crucial to make necessary and regular short and long-term stock price volatility forecasts for the safety and economics of investors returns. These forecasts should be accurate and not misleading. Different models and methods, such as ARCH GARCH models, have been intuitively implemented to make such forecasts. However, such traditional means fail to capture the short-term volatility forecasts effectively. This paper, therefore, investigates and implements a combination of numeric and probabilistic models for short-term volatility and return forecasting for high-frequency trades. The essence is that one-day-ahead volatility forecasts were made with Gaussian Processes (GPs) applied to the outputs of a Numerical market prediction (NMP) model. Firstly, the stock price data from NMP was corrected by a GP. Since it is not easy to set price limits in a market due to its free nature and randomness, a Censored GP was used to model the relationship between the corrected stock prices and returns. Forecasting errors were evaluated using the implied and estimated data.
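A minimal sketch of one-day-ahead volatility forecasting with a Gaussian process, assuming a toy price path and rolling realized volatility as the target; the paper additionally corrects NMP outputs with a GP and uses a Censored GP for the price-return relationship.

```python
# Sketch: GP regression mapping today's realized volatility to tomorrow's.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500)))  # toy price path
returns = np.diff(np.log(prices))

window = 20  # rolling window for realized volatility
rv = np.array([returns[i - window:i].std() for i in range(window, len(returns))])
X, y = rv[:-1].reshape(-1, 1), rv[1:]

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                              normalize_y=True).fit(X[:-50], y[:-50])
mean, std = gp.predict(X[-50:], return_std=True)
print(f"1-day-ahead vol forecast: {mean[-1]:.4f} +/- {std[-1]:.4f}")
```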
Near-Optimal Fair Resource Allocation for Strategic Agents without Money: A Data-Driven Approach
methods: using sophisticated techniques inspired by the differentiable convex programming literature to compute the exploitability of the PF mechanism.
results: The paper proposes a numerically efficient method for computing the exploitability of the PF mechanism and, by controlling the fairness-exploitability trade-off from data, achieves highly fair allocations with low exploitability.
Abstract
We study learning-based design of fair allocation mechanisms for divisible resources, using proportional fairness (PF) as a benchmark. The learning setting is a significant departure from the classic mechanism design literature, in that, we need to learn fair mechanisms solely from data. In particular, we consider the challenging problem of learning one-shot allocation mechanisms -- without the use of money -- that incentivize strategic agents to be truthful when reporting their valuations. It is well-known that the mechanism that directly seeks to optimize PF is not incentive compatible, meaning that the agents can potentially misreport their preferences to gain increased allocations. We introduce the notion of "exploitability" of a mechanism to measure the relative gain in utility from misreport, and make the following important contributions in the paper: (i) Using sophisticated techniques inspired by differentiable convex programming literature, we design a numerically efficient approach for computing the exploitability of the PF mechanism. This novel contribution enables us to quantify the gap that needs to be bridged to approximate PF via incentive compatible mechanisms. (ii) Next, we modify the PF mechanism to introduce a trade-off between fairness and exploitability. By properly controlling this trade-off using data, we show that our proposed mechanism, ExPF-Net, provides a strong approximation to the PF mechanism while maintaining low exploitability. This mechanism, however, comes with a high computational cost. (iii) To address the computational challenges, we propose another mechanism ExS-Net, which is end-to-end parameterized by a neural network. ExS-Net enjoys similar (slightly inferior) performance and significantly accelerated training and inference time performance. (iv) Extensive numerical simulations demonstrate the robustness and efficacy of the proposed mechanisms.
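For orientation, a sketch of the two central quantities, with notation assumed for illustration; the paper's formal definitions may differ in detail.

```latex
% Proportional fairness over allocations x of divisible resources:
\[
x^{\mathrm{PF}}(v) \in \arg\max_{x \in \mathcal{X}} \sum_{i=1}^{n} \log u_i(x_i; v_i)
\]
% Exploitability of mechanism f for agent i with true valuation v_i:
% the best gain in utility achievable by misreporting.
\[
\mathrm{expl}_i(f) \;=\; \max_{\hat{v}_i} \; u_i\big(f(\hat{v}_i, v_{-i}); v_i\big) \;-\; u_i\big(f(v_i, v_{-i}); v_i\big)
\]
```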
PACOL: Poisoning Attacks Against Continual Learners
results: The study finds that commonly used generative-replay and regularization-based continual learning methods are vulnerable to attack; in particular, label-flipping and the proposed PACOL attack can force a continual learning system to forget previously learned tasks.
Abstract
Continual learning algorithms are typically exposed to untrusted sources that contain training data inserted by adversaries and bad actors. An adversary can insert a small number of poisoned samples, such as mislabeled samples from previously learned tasks, or intentional adversarial perturbed samples, into the training datasets, which can drastically reduce the model's performance. In this work, we demonstrate that continual learning systems can be manipulated by malicious misinformation and present a new category of data poisoning attacks specific for continual learners, which we refer to as {\em Poisoning Attacks Against Continual Learners} (PACOL). The effectiveness of labeling flipping attacks inspires PACOL; however, PACOL produces attack samples that do not change the sample's label and produce an attack that causes catastrophic forgetting. A comprehensive set of experiments shows the vulnerability of commonly used generative replay and regularization-based continual learning approaches against attack methods. We evaluate the ability of label-flipping and a new adversarial poison attack, namely PACOL proposed in this work, to force the continual learning system to forget the knowledge of a learned task(s). More specifically, we compared the performance degradation of continual learning systems trained on benchmark data streams with and without poisoning attacks. Moreover, we discuss the stealthiness of the attacks in which we test the success rate of data sanitization defense and other outlier detection-based defenses for filtering out adversarial samples.
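A minimal sketch of the simplest attack the paper compares against, label flipping on a small fraction of a task's training set; PACOL itself perturbs inputs without changing labels, so this is the baseline, not PACOL.

```python
# Sketch: inject label-flipped poison into a task's training data.
import numpy as np

def poison_label_flip(X, y, n_classes, rate=0.02, rng=None):
    """Flip the labels of a small random fraction of samples."""
    rng = rng or np.random.default_rng(0)
    idx = rng.choice(len(y), size=max(1, int(rate * len(y))), replace=False)
    y = y.copy()
    # Shift each chosen label by a nonzero offset so it always changes class.
    y[idx] = (y[idx] + rng.integers(1, n_classes, size=len(idx))) % n_classes
    return X, y, idx

X = np.zeros((100, 4)); y = np.repeat(np.arange(5), 20)
_, y_poisoned, flipped = poison_label_flip(X, y, n_classes=5)
print("poisoned indices:", flipped)
```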
results: Simulation results show that the proposed BeamSync method enhances performance by 3 dB when the number of antennas at the APs is doubled, and that it also compares favorably with traditional beamforming techniques.
Abstract
In distributed massive multiple-input multiple-output (MIMO) systems, multiple geographically separated access points (APs) communicate simultaneously with a user, leveraging the benefits of multi-antenna coherent MIMO processing and macro-diversity gains from the distributed setups. However, time and frequency synchronization of the multiple APs is crucial to achieve good performance and enable joint precoding. In this paper, we analyze the synchronization requirement among multiple APs from a reciprocity perspective, taking into account the multiplicative impairments caused by mismatches in radio frequency (RF) hardware. We demonstrate that a phase calibration of reciprocity-calibrated APs is sufficient for the joint coherent transmission of data to the user. To achieve synchronization, we propose a novel over-the-air synchronization protocol, named BeamSync, to calibrate the geographically separated APs without sending any measurements to the central processing unit (CPU) through fronthaul. We show that sending the synchronization signal in the dominant direction of the channel between APs is optimal. Additionally, we derive the optimal phase and frequency offset estimators. Simulation results indicate that the proposed BeamSync method enhances performance by 3 dB when the number of antennas at the APs is doubled. Moreover, the method performs well compared to traditional beamforming techniques.
Channel Estimation for FAS-assisted Multiuser mmWave Systems
results: Simulation results show that the proposed method can obtain precise CSI with minimal hardware switching and pilot overhead, leading to a system sum-rate that approaches the upper bound achievable with perfect CSI.
Abstract
This letter investigates the challenge of channel estimation in a multiuser millimeter-wave (mmWave) time-division duplexing (TDD) system. In this system, the base station (BS) employs a multi-antenna uniform linear array (ULA), while each mobile user is equipped with a fluid antenna system (FAS). Accurate channel state information (CSI) plays a crucial role in the precise placement of antennas in FAS. Traditional channel estimation methods designed for fixed-antenna systems are inadequate due to the high dimensionality of FAS. To address this issue, we propose a low-sample-size sparse channel reconstruction (L3SCR) method, capitalizing on the sparse propagation paths characteristic of mmWave channels. In this approach, each fluid antenna only needs to switch and measure the channel at a few specific locations. By observing this reduced-dimensional data, we can effectively extract angular and gain information related to the sparse channel, enabling us to reconstruct the full CSI. Simulation results demonstrate that our proposed method allows us to obtain precise CSI with minimal hardware switching and pilot overhead. As a result, the system sum-rate approaches the upper bound achievable with perfect CSI.
paper_authors: Yi Zhu, Mahsa Abdollahi, Ségolène Maucourt, Nico Coallier, Heitor R. Guimarães, Pierre Giovenazzo, Tiago H. Falk
for: This paper provides a honey bee hive dataset with a wide variety of phenotypic trait measurements annotated by apicultural science experts, facilitating a broader scope of analysis.
methods: The paper collects data with multiple sensors, including audio and temperature sensors, and pre-processes and analyzes the data.
results: The paper provides an overview of the phenotypic data distribution and visualizations of the sensor data, and showcases hive monitoring applications based on sensor data analysis and machine learning, such as winter mortality prediction, hive population estimation, and detecting the presence of an active and laying queen.
Abstract
We present a longitudinal multi-sensor dataset collected from honey bee colonies (Apis mellifera) with rich phenotypic measurements. Data were continuously collected between May-2020 and April-2021 from 53 hives located at two apiaries in Qu\'ebec, Canada. The sensor data included audio features, temperature, and relative humidity. The phenotypic measurements contained beehive population, number of brood cells (eggs, larva and pupa), Varroa destructor infestation levels, defensive and hygienic behaviors, honey yield, and winter mortality. Our study is amongst the first to provide a wide variety of phenotypic trait measurements annotated by apicultural science experts, which facilitate a broader scope of analysis. We first summarize the data collection procedure, sensor data pre-processing steps, and data composition. We then provide an overview of the phenotypic data distribution as well as a visualization of the sensor data patterns. Lastly, we showcase several hive monitoring applications based on sensor data analysis and machine learning, such as winter mortality prediction, hive population estimation, and the presence of an active and laying queen.
Retrieval Augmented Generation of Symbolic Music with LLMs
paper_authors: Nicolas Jonason, Luca Casini, Carl Thomé, Bob L. T. Sturm
for: music generation
methods: using a retrieval system to select relevant examples
results: Initial results for music generation in a dialogue with the user are promising, especially considering the ease with which such a system can be implemented.
Abstract
We explore the use of large language models (LLMs) for music generation using a retrieval system to select relevant examples. We find promising initial results for music generation in a dialogue with the user, especially considering the ease with which such a system can be implemented. The code is available online.
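A minimal sketch of the retrieval-augmented prompting loop, assuming a toy ABC-notation corpus, a placeholder character-histogram embedding, and a hypothetical llm_generate call; none of these are the paper's actual components.

```python
# Sketch: embed a symbolic-music corpus, retrieve the nearest examples for
# the user's request, and prepend them to the LLM prompt.
import numpy as np

corpus = ["X:1\nK:C\nCDEF GABc|", "X:2\nK:G\nGABc d2e2|"]  # tiny ABC corpus

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: normalized character histogram. A real system
    # would use a learned text or music embedding model.
    v = np.zeros(128)
    for ch in text:
        v[ord(ch) % 128] += 1
    return v / (np.linalg.norm(v) + 1e-9)

def retrieve(query: str, k: int = 1):
    sims = [float(embed(query) @ embed(doc)) for doc in corpus]
    return [corpus[i] for i in np.argsort(sims)[::-1][:k]]

def build_prompt(user_request: str) -> str:
    examples = "\n\n".join(retrieve(user_request))
    return (f"Here are relevant examples:\n{examples}\n\n"
            f"Continue the dialogue and write music for: {user_request}")

print(build_prompt("a bright melody in C major"))
# response = llm_generate(build_prompt(...))   # hypothetical LLM call
```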
results: Experimental results show that GhostVec can extract speaker information from a transformer-based speech recognition system, with the synthesized audio reaching 10.83% EER and 0.47 minDCF against target speakers, demonstrating the effectiveness of the proposed method.
Abstract
Speaker adaptation systems face privacy concerns, for such systems are trained on private datasets and often overfitting. This paper demonstrates that an attacker can extract speaker information by querying speaker-adapted speech recognition (ASR) systems. We focus on the speaker information of a transformer-based ASR and propose GhostVec, a simple and efficient attack method to extract the speaker information from an encoder-decoder-based ASR system without any external speaker verification system or natural human voice as a reference. To make our results quantitative, we pre-process GhostVec using singular value decomposition (SVD) and synthesize it into waveform. Experiment results show that the synthesized audio of GhostVec reaches 10.83\% EER and 0.47 minDCF with target speakers, which suggests the effectiveness of the proposed method. We hope the preliminary discovery in this study to catalyze future speech recognition research on privacy-preserving topics.
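One plausible reading of the SVD pre-processing step, sketched on a synthetic matrix of extracted vectors; the shapes and retained rank are assumptions, not the paper's configuration.

```python
# Sketch: keep the top singular components of a matrix of extracted
# embedding vectors to obtain a cleaner low-rank speaker representation.
import numpy as np

rng = np.random.default_rng(0)
E = rng.normal(size=(20, 256))        # 20 extracted vectors, 256-dim each

U, S, Vt = np.linalg.svd(E, full_matrices=False)
k = 3                                  # retain the dominant subspace
E_clean = (U[:, :k] * S[:k]) @ Vt[:k]  # rank-k reconstruction

print("rank-k approximation error:", np.linalg.norm(E - E_clean).round(2))
```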
Reprogramming Self-supervised Learning-based Speech Representations for Speaker Anonymization
results: Extensive experiments on the VoicePrivacy Challenge (VPC) 2022 datasets demonstrate the effectiveness of the proposed parameter-efficient anonymization method, which also consumes fewer computational resources during anonymization.
Abstract
Current speaker anonymization methods, especially with self-supervised learning (SSL) models, require massive computational resources when hiding speaker identity. This paper proposes an effective and parameter-efficient speaker anonymization method based on recent End-to-End model reprogramming technology. To improve the anonymization performance, we first extract speaker representation from large SSL models as the speaker identifies. To hide the speaker's identity, we reprogram the speaker representation by adapting the speaker to a pseudo domain. Extensive experiments are carried out on the VoicePrivacy Challenge (VPC) 2022 datasets to demonstrate the effectiveness of our proposed parameter-efficient learning anonymization methods. Additionally, while achieving comparable performance with the VPC 2022 strong baseline 1.b, our approach consumes less computational resources during anonymization.
LE-SSL-MOS: Self-Supervised Learning MOS Prediction with Listener Enhancement
paper_authors: Zili Qi, Xinhui Hu, Wangjin Zhou, Sheng Li, Hao Wu, Jian Lu, Xinkang Xu
for: This paper proposes a novel fusion model for MOS (Mean Opinion Score) prediction that combines supervised and unsupervised approaches to improve the accuracy of predicting subjective evaluations for speech synthesis systems, especially on out-of-domain test sets.
methods: The proposed fusion model uses a combination of supervised and unsupervised techniques, including pre-trained self-supervised learning models, fine-tuning of unit language models, and ensemble learning with ASR confidence.
results: The experimental results on the VoiceMOS Challenge 2023 show that the proposed LE-SSL-MOS system achieves better performance than the baseline, with an absolute improvement of 13% on the noisy and enhanced speech track. The system ranked 1st and 2nd, respectively, in the French speech synthesis track and the challenge's noisy and enhanced speech track.
Abstract
Recently, researchers have shown an increasing interest in automatically predicting the subjective evaluation for speech synthesis systems. This prediction is a challenging task, especially on the out-of-domain test set. In this paper, we proposed a novel fusion model for MOS prediction that combines supervised and unsupervised approaches. In the supervised aspect, we developed an SSL-based predictor called LE-SSL-MOS. The LE-SSL-MOS utilizes pre-trained self-supervised learning models and further improves prediction accuracy by utilizing the opinion scores of each utterance in the listener enhancement branch. In the unsupervised aspect, two steps are contained: we fine-tuned the unit language model (ULM) using highly intelligible domain data to improve the correlation of an unsupervised metric - SpeechLMScore. Another is that we utilized ASR confidence as a new metric with the help of ensemble learning. To our knowledge, this is the first architecture that fuses supervised and unsupervised methods for MOS prediction. With these approaches, our experimental results on the VoiceMOS Challenge 2023 show that LE-SSL-MOS performs better than the baseline. Our fusion system achieved an absolute improvement of 13% over LE-SSL-MOS on the noisy and enhanced speech track. Our system ranked 1st and 2nd, respectively, in the French speech synthesis track and the challenge's noisy and enhanced speech track.
results: The paper finds that probabilistic classification methods improve classification accuracy under more challenging conditions and outperform traditional machine learning approaches.
Abstract
Accurately detecting rendezvous and proximity operations (RPO) is crucial for understanding how objects are behaving in the space domain. However, detecting closely-spaced objects (CSO) is challenging for ground-based optical space domain awareness (SDA) algorithms as two objects close together along the line-of-sight can appear blended as a single object within the point-spread function (PSF) of the optical system. Traditional machine learning methods can be useful for differentiating between singular objects and closely-spaced objects, but many methods require large training sample sizes or high signal-to-noise conditions. The quality and quantity of realistic data make probabilistic classification methods a superior approach, as they are better suited to handle these data inadequacies. We present CSO classification results using the Gaussian process python package, MuyGPyS, and examine classification accuracy as a function of angular separation and magnitude difference between the simulated satellites. This orbit-independent analysis is done on highly accurate simulated SDA images that emulate realistic ground-based commercial-of-the-shelf (COTS) optical sensor observations of CSOs. We find that MuyGPyS outperforms traditional machine learning methods, especially under more challenging circumstances.
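A minimal sketch of probabilistic CSO classification over angular separation and magnitude difference; sklearn's GaussianProcessClassifier stands in for MuyGPyS here, and the features, scales, and labels are synthetic assumptions.

```python
# Sketch: Gaussian-process classification of single object vs. resolved
# closely-spaced pair, as a function of separation and magnitude difference.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
n = 400
sep = rng.uniform(0.0, 3.0, n)        # angular separation (toy scale)
dmag = rng.uniform(0.0, 4.0, n)       # magnitude difference
# Toy ground truth: wide, comparable-brightness pairs are resolvable.
y = ((sep > 1.0) & (dmag < 2.5)).astype(int)   # 1 = resolved CSO pair

X = np.column_stack([sep, dmag])
gpc = GaussianProcessClassifier(kernel=RBF()).fit(X[:300], y[:300])
proba = gpc.predict_proba(X[300:])[:, 1]
print("mean P(CSO) on held-out set:", proba.mean().round(3))
```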
OCT2Confocal: 3D CycleGAN based Translation of Retinal OCT Images to Confocal Microscopy
for: bridging the gap between in vivo OCT and ex vivo confocal microscopy imaging
methods: developed a 3D CycleGAN framework for unsupervised translation of in vivo OCT to ex vivo confocal microscopy images
results: effectively translates between 3D medical data domains, capturing vascular, textural, and cellular details with precision, outperforming existing methods despite limited data.
Abstract
Optical coherence tomography (OCT) and confocal microscopy are pivotal in retinal imaging, each presenting unique benefits and limitations. In vivo OCT offers rapid, non-invasive imaging but can be hampered by clarity issues and motion artifacts. Ex vivo confocal microscopy provides high-resolution, cellular detailed color images but is invasive and poses ethical concerns and potential tissue damage. To bridge these modalities, we developed a 3D CycleGAN framework for unsupervised translation of in vivo OCT to ex vivo confocal microscopy images. Applied to our OCT2Confocal dataset, this framework effectively translates between 3D medical data domains, capturing vascular, textural, and cellular details with precision. This marks the first attempt to exploit the inherent 3D information of OCT and translate it into the rich, detailed color domain of confocal microscopy. Assessed through quantitative and qualitative metrics, the 3D CycleGAN framework demonstrates commendable image fidelity and quality, outperforming existing methods despite the constraints of limited data. This non-invasive generation of retinal confocal images has the potential to further enhance diagnostic and monitoring capabilities in ophthalmology.
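For reference, a sketch of the standard CycleGAN objective that the 3D framework builds on, with the usual cycle-consistency weighting $\lambda$ assumed.

```latex
\[
\mathcal{L}(G, F, D_X, D_Y) =
\mathcal{L}_{\mathrm{GAN}}(G, D_Y) + \mathcal{L}_{\mathrm{GAN}}(F, D_X)
+ \lambda \, \mathbb{E}_{x}\big[ \| F(G(x)) - x \|_1 \big]
+ \lambda \, \mathbb{E}_{y}\big[ \| G(F(y)) - y \|_1 \big]
\]
% G: OCT -> confocal, F: confocal -> OCT; the cycle terms enforce consistency
% when a volume is translated to the other domain and back.
```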
Point Cloud Self-supervised Learning via 3D to Multi-view Masked Autoencoder
results: The method performs strongly across tasks and settings, achieving superior performance on a variety of downstream tasks, including 3D object classification, few-shot learning, part segmentation, and 3D object detection.
Abstract
In recent years, the field of 3D self-supervised learning has witnessed significant progress, resulting in the emergence of Multi-Modality Masked AutoEncoders (MAE) methods that leverage both 2D images and 3D point clouds for pre-training. However, a notable limitation of these approaches is that they do not fully utilize the multi-view attributes inherent in 3D point clouds, which is crucial for a deeper understanding of 3D structures. Building upon this insight, we introduce a novel approach employing a 3D to multi-view masked autoencoder to fully harness the multi-modal attributes of 3D point clouds. To be specific, our method uses the encoded tokens from 3D masked point clouds to generate original point clouds and multi-view depth images across various poses. This approach not only enriches the model's comprehension of geometric structures but also leverages the inherent multi-modal properties of point clouds. Our experiments illustrate the effectiveness of the proposed method for different tasks and under different settings. Remarkably, our method outperforms state-of-the-art counterparts by a large margin in a variety of downstream tasks, including 3D object classification, few-shot learning, part segmentation, and 3D object detection. Code will be available at: https://github.com/Zhimin-C/Multiview-MAE
A Video-Based Activity Classification of Human Pickers in Agriculture
paper_authors: Abhishesh Pal, Antonio C. Leite, Jon G. O. Gjevestad, Pål J. From
for: This paper aims to improve the efficiency and productivity of harvesting operations in farming systems by developing an intelligent robotic system that can monitor human behavior, identify ongoing activities, and anticipate the worker’s needs.
methods: The proposed solution uses a combination of Mask Region-based Convolutional Neural Network (Mask R-CNN) for object detection, optical flow for motion estimation, and newly added statistical attributes of flow motion descriptors (Correlation Sensitivity, CS) to classify human activities in different agricultural scenarios.
results: The proposed framework is tested on in-house collected datasets from various crop fields, including strawberry polytunnels and apple tree orchards, and shows satisfactory results amidst challenges such as lighting variation, blur, and occlusions. The framework is evaluated using sensitivity, specificity, and accuracy measures, and the results demonstrate the effectiveness of the proposed approach.
Abstract
In farming systems, harvesting operations are tedious, time- and resource-consuming tasks. Based on this, deploying a fleet of autonomous robots to work alongside farmworkers may provide vast productivity and logistics benefits. Then, an intelligent robotic system should monitor human behavior, identify the ongoing activities and anticipate the worker's needs. In this work, the main contribution consists of creating a benchmark model for video-based human pickers detection, classifying their activities to serve in harvesting operations for different agricultural scenarios. Our solution uses the combination of a Mask Region-based Convolutional Neural Network (Mask R-CNN) for object detection and optical flow for motion estimation with newly added statistical attributes of flow motion descriptors, named as Correlation Sensitivity (CS). A classification criterion is defined based on the Kernel Density Estimation (KDE) analysis and K-means clustering algorithm, which are implemented upon in-house collected dataset from different crop fields like strawberry polytunnels and apple tree orchards. The proposed framework is quantitatively analyzed using sensitivity, specificity, and accuracy measures and shows satisfactory results amidst various dataset challenges such as lighting variation, blur, and occlusions.
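A minimal sketch of the classification criterion described, clustering per-frame motion descriptors with K-means and inspecting their density with a KDE; the descriptor values below are synthetic stand-ins for the paper's optical-flow statistics.

```python
# Sketch: KDE + K-means over a 1-D motion-energy descriptor per frame.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy descriptor values for three activity regimes.
desc = np.concatenate([rng.normal(0.2, 0.05, 200),   # idle
                       rng.normal(1.0, 0.10, 200),   # picking
                       rng.normal(2.0, 0.15, 200)])  # walking

kde = gaussian_kde(desc)                  # density of descriptor values
labels = KMeans(n_clusters=3, n_init=10,
                random_state=0).fit_predict(desc.reshape(-1, 1))

grid = np.linspace(desc.min(), desc.max(), 5)
print("KDE on grid:", kde(grid).round(3))
print("cluster sizes:", np.bincount(labels))
```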
Pre- to Post-Contrast Breast MRI Synthesis for Enhanced Tumour Segmentation
paper_authors: Richard Osuala, Smriti Joshi, Apostolia Tsirikoglou, Lidia Garrucho, Walter H. L. Pinaya, Oliver Diaz, Karim Lekadir
for: This paper aims to explore the feasibility of producing synthetic contrast enhancements for dynamic contrast-enhanced MRI (DCE-MRI) using a generative adversarial network (GAN).
methods: The authors use a GAN to translate pre-contrast T1-weighted fat-saturated breast MRI to their corresponding first DCE-MRI sequence. They also introduce a Scaled Aggregate Measure (SAMe) to evaluate the quality of synthetic data.
results: The generated DCE-MRI data are assessed using quantitative image quality metrics and applied to the downstream task of 3D breast tumour segmentation. The results show that the synthetic data can enhance the robustness of breast tumour segmentation models via data augmentation.
Abstract
Despite its benefits for tumour detection and treatment, the administration of contrast agents in dynamic contrast-enhanced MRI (DCE-MRI) is associated with a range of issues, including their invasiveness, bioaccumulation, and a risk of nephrogenic systemic fibrosis. This study explores the feasibility of producing synthetic contrast enhancements by translating pre-contrast T1-weighted fat-saturated breast MRI to their corresponding first DCE-MRI sequence leveraging the capabilities of a generative adversarial network (GAN). Additionally, we introduce a Scaled Aggregate Measure (SAMe) designed for quantitatively evaluating the quality of synthetic data in a principled manner and serving as a basis for selecting the optimal generative model. We assess the generated DCE-MRI data using quantitative image quality metrics and apply them to the downstream task of 3D breast tumour segmentation. Our results highlight the potential of post-contrast DCE-MRI synthesis in enhancing the robustness of breast tumour segmentation models via data augmentation. Our code is available at https://github.com/RichardObi/pre_post_synthesis.
Multi-entity Video Transformers for Fine-Grained Video Representation Learning
methods: We propose a self-supervised method that improves the transformer design for video representation learning by better integrating spatial information into the temporal pipeline, representing multiple entities per frame. Our Multi-entity Video Transformer (MV-Former) architecture uses self-supervised ViT features and employs several strategies to maximize the utility of the extracted features without fine-tuning the ViT backbone, including a learnable spatial token pooling strategy that extracts features from multiple salient regions per frame.
results: Our experiments show that MV-Former not only surpasses previous self-supervised methods but also some prior works that use additional supervision or training data. When combined with additional pre-training data from Kinetics-400, MV-Former gains a further performance boost. The code for MV-Former is available on GitHub.
Abstract
The area of temporally fine-grained video representation learning aims to generate frame-by-frame representations for temporally dense tasks. In this work, we advance the state-of-the-art for this area by re-examining the design of transformer architectures for video representation learning. A salient aspect of our self-supervised method is the improved integration of spatial information in the temporal pipeline by representing multiple entities per frame. Prior works use late fusion architectures that reduce frames to a single dimensional vector before any cross-frame information is shared, while our method represents each frame as a group of entities or tokens. Our Multi-entity Video Transformer (MV-Former) architecture achieves state-of-the-art results on multiple fine-grained video benchmarks. MV-Former leverages image features from self-supervised ViTs, and employs several strategies to maximize the utility of the extracted features while also avoiding the need to fine-tune the complex ViT backbone. This includes a Learnable Spatial Token Pooling strategy, which is used to identify and extract features for multiple salient regions per frame. Our experiments show that MV-Former not only outperforms previous self-supervised methods, but also surpasses some prior works that use additional supervision or training data. When combined with additional pre-training data from Kinetics-400, MV-Former achieves a further performance boost. The code for MV-Former is available at https://github.com/facebookresearch/video_rep_learning.
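A minimal sketch of what a learnable spatial token pooling layer could look like, with a set of learned queries attending over frozen ViT patch tokens; the dimensions and single-head attention are illustrative assumptions, not MV-Former's exact module.

```python
# Sketch: K learned queries attend over per-frame patch tokens to extract
# K entity features per frame.
import torch
import torch.nn as nn

class SpatialTokenPooling(nn.Module):
    def __init__(self, dim: int = 384, num_entities: int = 4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_entities, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (batch, num_patches, dim) from a frozen ViT.
        q = self.queries.unsqueeze(0).expand(patch_tokens.size(0), -1, -1)
        entities, _ = self.attn(q, patch_tokens, patch_tokens)
        return entities  # (batch, num_entities, dim), one token per region

tokens = torch.randn(2, 196, 384)           # 14x14 ViT patch grid
print(SpatialTokenPooling()(tokens).shape)  # torch.Size([2, 4, 384])
```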
Zero-Shot Digital Rock Image Segmentation with a Fine-Tuned Segment Anything Model
results: Experimental results show that the fine-tuned SAM model (RockSAM) performs well on rock CT/SEM image segmentation, generating high-quality masks and improving the efficiency and accuracy of digital rock image analysis.
Abstract
Accurate image segmentation is crucial in reservoir modelling and material characterization, enhancing oil and gas extraction efficiency through detailed reservoir models. This precision offers insights into rock properties, advancing digital rock physics understanding. However, creating pixel-level annotations for complex CT and SEM rock images is challenging due to their size and low contrast, lengthening analysis time. This has spurred interest in advanced semi-supervised and unsupervised segmentation techniques in digital rock image analysis, promising more efficient, accurate, and less labour-intensive methods. Meta AI's Segment Anything Model (SAM) revolutionized image segmentation in 2023, offering interactive and automated segmentation with zero-shot capabilities, essential for digital rock physics with limited training data and complex image features. Despite its advanced features, SAM struggles with rock CT/SEM images due to their absence in its training set and the low-contrast nature of grayscale images. Our research fine-tunes SAM for rock CT/SEM image segmentation, optimizing parameters and handling large-scale images to improve accuracy. Experiments on rock CT and SEM images show that fine-tuning significantly enhances SAM's performance, enabling high-quality mask generation in digital rock image analysis. Our results demonstrate the feasibility and effectiveness of the fine-tuned SAM model (RockSAM) for rock images, offering segmentation without extensive training or complex labelling.
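A minimal sketch of parameter-efficient fine-tuning in the spirit described, freezing the image encoder and training the mask decoder with a Dice loss; the TinySAM stand-in and the training recipe are assumptions, and the real model comes from the segment_anything package with its own forward interface.

```python
# Sketch: freeze the heavy encoder, train only the decoder on CT/SEM masks.
import torch
import torch.nn as nn

def dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    pred = torch.sigmoid(pred)
    inter = (pred * target).sum(dim=(-2, -1))
    denom = pred.sum(dim=(-2, -1)) + target.sum(dim=(-2, -1))
    return 1.0 - ((2 * inter + eps) / (denom + eps)).mean()

def finetune(sam, loader, epochs: int = 5, lr: float = 1e-4):
    for p in sam.image_encoder.parameters():   # keep the encoder frozen
        p.requires_grad = False
    opt = torch.optim.AdamW(sam.mask_decoder.parameters(), lr=lr)
    for _ in range(epochs):
        for images, gt_masks in loader:        # slices + binary masks
            loss = dice_loss(sam(images), gt_masks)
            opt.zero_grad(); loss.backward(); opt.step()
    return sam

class TinySAM(nn.Module):
    """Stand-in exposing the two submodules the loop assumes SAM has."""
    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Conv2d(1, 8, 3, padding=1)
        self.mask_decoder = nn.Conv2d(8, 1, 3, padding=1)
    def forward(self, x):
        return self.mask_decoder(self.image_encoder(x))

loader = [(torch.randn(2, 1, 64, 64),
           torch.randint(0, 2, (2, 1, 64, 64)).float())]
finetune(TinySAM(), loader, epochs=1)
```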
WATUNet: A Deep Neural Network for Segmentation of Volumetric Sweep Imaging Ultrasound
results: The study introduces a novel segmentation model, the Wavelet Attention UNet (WATUNet), to improve segmentation performance. The results show that the model significantly outperforms other deep networks on both datasets: on the VSI dataset it achieves a Dice coefficient of 0.94 and an F1 score of 0.94, and on the public dataset 0.93 and 0.94, respectively.
Abstract
Objective. Limited access to breast cancer diagnosis globally leads to delayed treatment. Ultrasound, an effective yet underutilized method, requires specialized training for sonographers, which hinders its widespread use. Approach. Volume sweep imaging (VSI) is an innovative approach that enables untrained operators to capture high-quality ultrasound images. Combined with deep learning, like convolutional neural networks (CNNs), it can potentially transform breast cancer diagnosis, enhancing accuracy, saving time and costs, and improving patient outcomes. The widely used UNet architecture, known for medical image segmentation, has limitations, such as vanishing gradients and a lack of multi-scale feature extraction and selective region attention. In this study, we present a novel segmentation model known as Wavelet_Attention_UNet (WATUNet). In this model, we incorporate wavelet gates (WGs) and attention gates (AGs) between the encoder and decoder instead of a simple connection to overcome the limitations mentioned, thereby improving model performance. Main results. Two datasets are utilized for the analysis. The public "Breast Ultrasound Images" (BUSI) dataset of 780 images and a VSI dataset of 3818 images. Both datasets contained segmented lesions categorized into three types: no mass, benign mass, and malignant mass. Our segmentation results show superior performance compared to other deep networks. The proposed algorithm attained a Dice coefficient of 0.94 and an F1 score of 0.94 on the VSI dataset and scored 0.93 and 0.94 on the public dataset, respectively.
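For context, the sketch below shows a generic additive attention gate of the kind placed on UNet skip connections (following Attention U-Net); WATUNet pairs such gates with wavelet gates, whose exact design is in the paper and is not reproduced here.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Generic additive attention gate for a UNet skip connection: the decoder
    signal g re-weights the encoder features x, suppressing irrelevant regions.
    Assumes x and g have been brought to the same spatial size."""
    def __init__(self, in_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.w_x = nn.Conv2d(in_ch, inter_ch, kernel_size=1)
        self.w_g = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.psi(torch.relu(self.w_x(x) + self.w_g(g))))
        return x * alpha  # gated skip features
```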
Domain Generalization of 3D Object Detection by Density-Resampling
results: Our method outperforms state-of-the-art SDG methods, and in some cases even unsupervised domain adaptation methods, across multiple detection tasks ("Car", "Pedestrian", and "Cyclist"). The code will be made publicly available.
Abstract
Point-cloud-based 3D object detection suffers from performance degradation when encountering data with novel domain gaps. To tackle this, single-domain generalization (SDG) aims to generalize a detection model trained on a limited single source domain so that it performs robustly on unexplored domains. In this paper, we propose an SDG method to improve the generalizability of 3D object detection to unseen target domains. Unlike prior SDG works for 3D object detection that focus solely on data augmentation, our work introduces a novel data augmentation method and contributes a new multi-task learning strategy. Specifically, from the perspective of data augmentation, we design a universal physical-aware density-based data augmentation (PDDA) method to mitigate the performance loss stemming from diverse point densities. From the learning methodology viewpoint, we develop multi-task learning for 3D object detection: during source training, besides the main standard detection task, we leverage an auxiliary self-supervised 3D scene restoration task to enhance the encoder's comprehension of background and foreground details for better recognition and detection of objects. Furthermore, based on the auxiliary self-supervised task, we propose the first test-time adaptation method for domain generalization of 3D object detection, which efficiently adjusts the encoder's parameters to adapt to unseen target domains at test time, further bridging domain gaps. Extensive cross-dataset experiments covering "Car", "Pedestrian", and "Cyclist" detections demonstrate that our method outperforms state-of-the-art SDG methods and even surpasses unsupervised domain adaptation methods under some circumstances. The code will be made publicly available.
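To illustrate what a density-based augmentation can look like, here is a simple beam-dropping sketch that emulates a lower-resolution LiDAR by binning points into vertical beams by elevation angle; PDDA's actual physical model is more elaborate than this.

```python
import numpy as np

def drop_beams(points: np.ndarray, num_beams: int = 64, keep_every: int = 2) -> np.ndarray:
    """Keep every k-th vertical beam of a LiDAR sweep to lower point density.
    points: [N, 4] array of (x, y, z, intensity)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    elev = np.arctan2(z, np.sqrt(x ** 2 + y ** 2))       # elevation angle
    bins = np.digitize(elev, np.linspace(elev.min(), elev.max(), num_beams))
    return points[bins % keep_every == 0]                # sparser point cloud
```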
SelfEval: Leveraging the discriminative nature of generative models for evaluation
results: The method automatically evaluates text-to-image generative models, assessing their performance on attribute binding, color recognition, counting, shape recognition, and spatial understanding. It also evaluates generative models on the Winoground image-score task, where they are competitive with discriminative models, and its scores agree well with gold-standard human evaluations.
Abstract
In this work, we show that text-to-image generative models can be 'inverted' to assess their own text-image understanding capabilities in a completely automated manner. Our method, called SelfEval, uses the generative model to compute the likelihood of real images given text prompts, making the generative model directly applicable to discriminative tasks. Using SelfEval, we repurpose standard datasets created for evaluating multimodal text-image discriminative models to evaluate generative models in a fine-grained manner: assessing their performance on attribute binding, color recognition, counting, shape recognition, spatial understanding. To the best of our knowledge SelfEval is the first automated metric to show a high degree of agreement for measuring text-faithfulness with the gold-standard human evaluations across multiple models and benchmarks. Moreover, SelfEval enables us to evaluate generative models on challenging tasks such as Winoground image-score where they demonstrate competitive performance to discriminative models. We also show severe drawbacks of standard automated metrics such as CLIP-score to measure text faithfulness on benchmarks such as DrawBench, and how SelfEval sidesteps these issues. We hope SelfEval enables easy and reliable automated evaluation for diffusion models.
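The core move of using a generative model discriminatively can be sketched as scoring an image-text pair by the denoising error of a latent-diffusion UNet. The snippet below assumes diffusers-style `unet` and `scheduler` objects and precomputed image latents and text embeddings; it is a simplified stand-in for SelfEval's likelihood estimator, not its exact procedure.

```python
import torch

@torch.no_grad()
def selfeval_score(unet, scheduler, latents, text_embs, n_samples=32):
    """Higher score = the image is more likely under this text conditioning,
    approximated by the negative denoising loss over random timesteps."""
    score = 0.0
    for _ in range(n_samples):
        t = torch.randint(0, scheduler.config.num_train_timesteps,
                          (latents.shape[0],), device=latents.device)
        noise = torch.randn_like(latents)
        noisy = scheduler.add_noise(latents, noise, t)
        pred = unet(noisy, t, encoder_hidden_states=text_embs).sample
        score -= torch.mean((pred - noise) ** 2).item()
    return score / n_samples
```

To classify, one would compute this score for each candidate caption and pick the argmax.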
Multimodal Representation Learning by Alternating Unimodal Adaptation
paper_authors: Xiaohui Zhang, Jaehong Yoon, Mohit Bansal, Huaxiu Yao
for: addressing the challenge of dominant modalities in multimodal learning, improving performance in scenarios with complete and missing modalities
methods: alternating unimodal learning, shared head with continuous optimization, gradient modification mechanism for preventing information loss, test-time uncertainty-based model fusion
results: superior performance compared to prior approaches in extensive experiments on five diverse datasets
Abstract
Multimodal learning, which integrates data from diverse sensory modes, plays a pivotal role in artificial intelligence. However, existing multimodal learning methods often struggle with challenges where some modalities appear more dominant than others during multimodal learning, resulting in suboptimal performance. To address this challenge, we propose MLA (Multimodal Learning with Alternating Unimodal Adaptation). MLA reframes the conventional joint multimodal learning process by transforming it into an alternating unimodal learning process, thereby minimizing interference between modalities. Simultaneously, it captures cross-modal interactions through a shared head, which undergoes continuous optimization across different modalities. This optimization process is controlled by a gradient modification mechanism to prevent the shared head from losing previously acquired information. During the inference phase, MLA utilizes a test-time uncertainty-based model fusion mechanism to integrate multimodal information. Extensive experiments are conducted on five diverse datasets, encompassing scenarios with complete modalities and scenarios with missing modalities. These experiments demonstrate the superiority of MLA over competing prior approaches.
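The alternating phase can be sketched as a plain training loop that visits one modality at a time through its own encoder and a single shared head, so modalities never compete within a gradient step. MLA's gradient-modification mechanism and test-time uncertainty-based fusion are omitted from this sketch.

```python
import torch

def train_epoch_alternating(encoders, shared_head, loaders, optimizer, loss_fn):
    """One epoch of alternating unimodal learning over a dict of per-modality
    encoders and dataloaders, with one shared classification head."""
    for modality, loader in loaders.items():
        encoder = encoders[modality]
        for x, y in loader:
            logits = shared_head(encoder(x))
            loss = loss_fn(logits, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```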
SplatArmor: Articulated Gaussian splatting for animatable humans from monocular RGB videos
results: The method produces high-quality human models and achieves compelling results on the ZJU MoCap and People Snapshot datasets. These results suggest Gaussian splatting is an interesting alternative for human synthesis that uses a rasterization primitive and avoids the non-differentiability and optimization issues faced by neural rendering approaches.
Abstract
We propose SplatArmor, a novel approach for recovering detailed and animatable human models by `armoring' a parameterized body model with 3D Gaussians. Our approach represents the human as a set of 3D Gaussians within a canonical space, whose articulation is defined by extending the skinning of the underlying SMPL geometry to arbitrary locations in the canonical space. To account for pose-dependent effects, we introduce a SE(3) field, which allows us to capture both the location and anisotropy of the Gaussians. Furthermore, we propose the use of a neural color field to provide color regularization and 3D supervision for the precise positioning of these Gaussians. We show that Gaussian splatting provides an interesting alternative to neural rendering based methods by leveraging a rasterization primitive without facing any of the non-differentiability and optimization challenges typically faced in such approaches. The rasterization paradigm allows us to leverage forward skinning, and does not suffer from the ambiguities associated with inverse skinning and warping. We show compelling results on the ZJU MoCap and People Snapshot datasets, which underscore the effectiveness of our method for controllable human synthesis.
paper_authors: Soham Chitnis, Kiran Mantripragada, Faisal Z. Qureshi
for: The paper proposes a method for hyperspectral unmixing, which aims to separate the pure spectral signals of the underlying materials (endmembers) and their proportions (abundances) in a hyperspectral image (HSI).
methods: The proposed method builds upon the Latent Dirichlet Variational Autoencoder (LDVAE) and incorporates an isotropic convolutional neural network (CNN) encoder with spatial attention to leverage the spatial information present in the HSI.
results: The proposed method was evaluated on four datasets (Samson, Hydice Urban, Cuprite, and OnTech-HSI-Syn-21) and showed improvement in endmember extraction and abundance estimation by incorporating spatial information. The model was also trained on synthetic data and evaluated on real-world data for the Cuprite dataset, demonstrating the transfer learning paradigm.
Abstract
The hyperspectral unmixing problem is to find the pure spectral signals of the underlying materials (endmembers) and their proportions (abundances). The proposed method builds upon the recently proposed Latent Dirichlet Variational Autoencoder (LDVAE). It assumes that abundances can be encoded as Dirichlet distributions, while mixed pixels and endmembers are represented by multivariate normal distributions. However, LDVAE does not leverage the spatial information present in an HSI; we propose an isotropic CNN encoder with spatial attention to solve the hyperspectral unmixing problem. We evaluated our model on the Samson, Hydice Urban, Cuprite, and OnTech-HSI-Syn-21 datasets. Our model also leverages the transfer learning paradigm for the Cuprite dataset, where we train the model on synthetic data and evaluate it on real-world data. We observe improved results for endmember extraction and abundance estimation when incorporating the spatial information. Code can be found at https://github.com/faisalqureshi/cnn-ldvae
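The Dirichlet encoding can be sketched as follows: a small CNN maps an HSI patch to Dirichlet concentration parameters, so sampled abundances are non-negative and sum to one by construction. The layer sizes here are illustrative, not those of the paper's isotropic encoder.

```python
import torch
import torch.nn as nn

class DirichletAbundanceEncoder(nn.Module):
    """Map an HSI patch to Dirichlet concentrations over endmember abundances."""
    def __init__(self, bands: int, endmembers: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(bands, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, endmembers), nn.Softplus(),
        )

    def forward(self, patch: torch.Tensor):
        alpha = self.net(patch) + 1e-3               # concentrations > 0
        dist = torch.distributions.Dirichlet(alpha)
        return dist.rsample(), dist                  # abundances on the simplex
```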
Versatile Medical Image Segmentation Learned from Multi-Source Datasets via Model Self-Disambiguation
results: Experiments on a multi-modal dataset compiled from eight different sources demonstrate the effectiveness and superior performance of the method. These results indicate it can make better use of existing annotated data and reduce the annotation effort for new data, further enhancing model capability.
Abstract
A versatile medical image segmentation model applicable to imaging data collected with diverse equipment and protocols can facilitate model deployment and maintenance. However, building such a model typically requires a large, diverse, and fully annotated dataset, which is rarely available due to the labor-intensive and costly data curation. In this study, we develop a cost-efficient method by harnessing readily available data with partially or even sparsely annotated segmentation labels. We devise strategies for model self-disambiguation, prior knowledge incorporation, and imbalance mitigation to address challenges associated with inconsistently labeled data from various sources, including label ambiguity and imbalances across modalities, datasets, and segmentation labels. Experimental results on a multi-modal dataset compiled from eight different sources for abdominal organ segmentation have demonstrated our method's effectiveness and superior performance over alternative state-of-the-art methods, highlighting its potential for optimizing the use of existing annotated data and reducing the annotation efforts for new data to further enhance model capability.
3D-TexSeg: Unsupervised Segmentation of 3D Texture using Mutual Transformer Learning
results: Experimental results on three publicly available datasets show that the proposed method outperforms standard and state-of-the-art unsupervised methods and competes reasonably with supervised methods.
Abstract
Analysis of the 3D Texture is indispensable for various tasks, such as retrieval, segmentation, classification, and inspection of sculptures, knitted fabrics, and biological tissues. A 3D texture is a locally repeated surface variation independent of the surface's overall shape and can be determined using the local neighborhood and its characteristics. Existing techniques typically employ computer vision techniques that analyze a 3D mesh globally, derive features, and then utilize the obtained features for retrieval or classification. Several traditional and learning-based methods exist in the literature, however, only a few are on 3D texture, and nothing yet, to the best of our knowledge, on the unsupervised schemes. This paper presents an original framework for the unsupervised segmentation of the 3D texture on the mesh manifold. We approach this problem as binary surface segmentation, partitioning the mesh surface into textured and non-textured regions without prior annotation. We devise a mutual transformer-based system comprising a label generator and a cleaner. The two models take geometric image representations of the surface mesh facets and label them as texture or non-texture across an iterative mutual learning scheme. Extensive experiments on three publicly available datasets with diverse texture patterns demonstrate that the proposed framework outperforms standard and SOTA unsupervised techniques and competes reasonably with supervised methods.
results: The study shows that the method improves panoptic segmentation performance across synthetic and real domains and adapts accurately between them.
Abstract
Panoptic segmentation is an important computer vision task which combines semantic and instance segmentation. It plays a crucial role in domains of medical image analysis, self-driving vehicles, and robotics by providing a comprehensive understanding of visual environments. Traditionally, deep learning panoptic segmentation models have relied on dense and accurately annotated training data, which is expensive and time consuming to obtain. Recent advancements in self-supervised learning approaches have shown great potential in leveraging synthetic and unlabelled data to generate pseudo-labels using self-training to improve the performance of instance and semantic segmentation models. The three available methods for self-supervised panoptic segmentation use proposal-based transformer architectures which are computationally expensive, complicated and engineered for specific tasks. The aim of this work is to develop a framework to perform embedding-based self-supervised panoptic segmentation using self-training in a synthetic-to-real domain adaptation problem setting.
Astronomical Images Quality Assessment with Automated Machine Learning
results: Using an automated machine learning model, the researchers successfully automated the quality assessment of astronomical images.
Abstract
Electronically Assisted Astronomy consists in capturing deep sky images with a digital camera coupled to a telescope to display views of celestial objects that would have been invisible through direct observation. This practice generates a large quantity of data, which may then be enhanced with dedicated image editing software after observation sessions. In this study, we show how Image Quality Assessment can be useful for automatically rating astronomical images, and we also develop a dedicated model by using Automated Machine Learning.
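A minimal sketch of such an AutoML setup, assuming the AutoKeras library and hypothetical arrays `x_train` (images) and `y_train` (human quality ratings); the paper does not specify this exact toolkit or configuration.

```python
import autokeras as ak  # one possible AutoML toolkit; others would work too

reg = ak.ImageRegressor(max_trials=10, overwrite=True)  # search 10 architectures
reg.fit(x_train, y_train, epochs=20)                    # x_train: images, y_train: ratings
quality_scores = reg.predict(x_new)                     # rate unseen images
```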
CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification
results: The CA-Jaccard distance is a simple yet effective distance metric that improves the reliability of person re-identification methods at low computational cost; experiments demonstrate its effectiveness.
Abstract
Person re-identification (re-ID) is a challenging task that aims to learn discriminative features for person retrieval. In person re-ID, Jaccard distance is a widely used distance metric, especially in re-ranking and clustering scenarios. However, we discover that camera variation has a significant negative impact on the reliability of Jaccard distance. In particular, Jaccard distance calculates the distance based on the overlap of relevant neighbors. Due to camera variation, intra-camera samples dominate the relevant neighbors, which reduces the reliability of the neighbors by introducing intra-camera negative samples and excluding inter-camera positive samples. To overcome this problem, we propose a novel camera-aware Jaccard (CA-Jaccard) distance that leverages camera information to enhance the reliability of Jaccard distance. Specifically, we introduce camera-aware k-reciprocal nearest neighbors (CKRNNs) to find k-reciprocal nearest neighbors on the intra-camera and inter-camera ranking lists, which improves the reliability of relevant neighbors and guarantees the contribution of inter-camera samples in the overlap. Moreover, we propose a camera-aware local query expansion (CLQE) to exploit camera variation as a strong constraint to mine reliable samples in relevant neighbors and assign these samples higher weights in overlap to further improve the reliability. Our CA-Jaccard distance is simple yet effective and can serve as a general distance metric for person re-ID methods with high reliability and low computational cost. Extensive experiments demonstrate the effectiveness of our method.
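For reference, the base metric works as follows: each sample is represented by its set of (k-reciprocal) nearest neighbors, and the distance between two samples is one minus the overlap of those sets. CA-Jaccard changes how the sets are built (per-camera CKRNNs and CLQE re-weighting), which this plain sketch does not include.

```python
def jaccard_distance(neighbors_a: set, neighbors_b: set) -> float:
    """Jaccard distance between two samples given their neighbor sets:
    1 - |A intersect B| / |A union B|."""
    union = len(neighbors_a | neighbors_b)
    return 1.0 - len(neighbors_a & neighbors_b) / union if union else 1.0
```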
Multimodal Indoor Localization Using Crowdsourced Radio Maps
results: Extensive evaluations across multiple real-world sites show a significant performance enhancement of roughly 25% over the best baseline.
Abstract
Indoor Positioning Systems (IPS) traditionally rely on odometry and building infrastructure like WiFi, often supplemented by building floor plans for increased accuracy. However, the limited availability and timeliness of floor plan updates challenge their wide applicability. In contrast, the proliferation of smartphones and WiFi-enabled robots has made crowdsourced radio maps - databases pairing locations with their corresponding Received Signal Strengths (RSS) - increasingly accessible. These radio maps not only provide WiFi fingerprint-location pairs but also encode movement regularities akin to the constraints imposed by floor plans. This work investigates the possibility of leveraging these radio maps as a substitute for floor plans in multimodal IPS. We introduce a new framework to address the challenges of radio map inaccuracies and sparse coverage. Our proposed system integrates an uncertainty-aware neural network model for WiFi localization and a bespoke Bayesian fusion technique for optimal fusion. Extensive evaluations on multiple real-world sites indicate a significant performance enhancement, with results showing ~25% improvement over the best baseline.
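The fusion step can be illustrated with the textbook precision-weighted combination of two independent Gaussian position estimates (e.g. a WiFi fix with predictive uncertainty and an odometry prediction); the paper's bespoke Bayesian filter is more involved than this.

```python
import numpy as np

def fuse_gaussian(mu_wifi, var_wifi, mu_odom, var_odom):
    """Precision-weighted fusion of two independent Gaussian estimates:
    the more certain source receives the larger weight."""
    w = var_odom / (var_wifi + var_odom)
    mu = w * np.asarray(mu_wifi) + (1 - w) * np.asarray(mu_odom)
    var = var_wifi * var_odom / (var_wifi + var_odom)
    return mu, var
```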
Détection d'objets célestes dans des images astronomiques par IA explicable (Detection of Celestial Objects in Astronomical Images by Explainable AI)
results: The study automatically detects the presence and position of captured celestial objects and provides explainable results.
Abstract
Amateur and professional astronomers can easily capture a large number of deep sky images with recent smart telescopes. However, verification is still required afterwards to check whether the celestial objects targeted are actually visible in the images produced. Depending on the magnitude of the targets, the observation conditions and the time during which the data is captured, it is possible that only stars are present in the images. In this study, we propose an approach based on explainable Artificial Intelligence to automatically detect the presence and position of captured objects.
Human motion trajectory prediction using the Social Force Model for real-time and low computational cost applications
for: This paper proposes a new human motion trajectory prediction model, the Social Force Generative Adversarial Network (SoFGAN), for predicting human motion in human-robot collaboration tasks such as accompanying, guiding, or approaching.
methods: The paper combines a Generative Adversarial Network (GAN) with the Social Force Model (SFM) to generate diverse plausible human trajectories while reducing collisions in a scene, and adds a Conditional Variational Autoencoder (CVAE) module to emphasize destination learning.
results: Experiments on the UCY and BIWI datasets show the method makes more accurate predictions than most state-of-the-art models and produces fewer collisions than other approaches. Real-life experiments further show the model runs in real time without GPUs, producing good-quality predictions at low computational cost.
Abstract
Human motion trajectory prediction is a very important functionality for human-robot collaboration, specifically in accompanying, guiding, or approaching tasks, but also in social robotics, self-driving vehicles, or security systems. In this paper, a novel trajectory prediction model, Social Force Generative Adversarial Network (SoFGAN), is proposed. SoFGAN uses a Generative Adversarial Network (GAN) and Social Force Model (SFM) to generate different plausible people trajectories reducing collisions in a scene. Furthermore, a Conditional Variational Autoencoder (CVAE) module is added to emphasize the destination learning. We show that our method is more accurate in making predictions in UCY or BIWI datasets than most of the current state-of-the-art models and also reduces collisions in comparison to other approaches. Through real-life experiments, we demonstrate that the model can be used in real-time without GPU's to perform good quality predictions with a low computational cost.
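For background, the Social Force Model that SoFGAN builds on treats each pedestrian as driven by a goal-attraction term plus pairwise repulsion from others. Below is a classic Helbing-Molnar-style sketch with illustrative (not learned) parameters.

```python
import numpy as np

def social_force(pos, vel, goal, others, v0=1.3, tau=0.5, A=2.0, B=0.3, r=0.4):
    """Acceleration on one pedestrian: relaxation toward the preferred speed v0
    in the goal direction, plus exponential repulsion from other pedestrians."""
    e = (goal - pos) / (np.linalg.norm(goal - pos) + 1e-9)  # unit goal direction
    force = (v0 * e - vel) / tau                            # goal-driven term
    for p in others:
        d = pos - p
        dist = np.linalg.norm(d) + 1e-9
        force += A * np.exp((2 * r - dist) / B) * d / dist  # repulsion term
    return force
```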
SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning
results: Experiments show that the SSB method greatly improves both inlier classification and outlier detection, reaching inlier classification accuracy as high as 97.5% in the open-set setting and outperforming existing methods by a large margin.
Abstract
Semi-supervised learning (SSL) methods effectively leverage unlabeled data to improve model generalization. However, SSL models often underperform in open-set scenarios, where unlabeled data contain outliers from novel categories that do not appear in the labeled set. In this paper, we study the challenging and realistic open-set SSL setting, where the goal is to both correctly classify inliers and to detect outliers. Intuitively, the inlier classifier should be trained on inlier data only. However, we find that inlier classification performance can be largely improved by incorporating high-confidence pseudo-labeled data, regardless of whether they are inliers or outliers. Also, we propose to utilize non-linear transformations to separate the features used for inlier classification and outlier detection in the multi-task learning framework, preventing adverse effects between them. Additionally, we introduce pseudo-negative mining, which further boosts outlier detection performance. The three ingredients lead to what we call Simple but Strong Baseline (SSB) for open-set SSL. In experiments, SSB greatly improves both inlier classification and outlier detection performance, outperforming existing methods by a large margin. Our code will be released at https://github.com/YUE-FAN/SSB.
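The pseudo-labeling ingredient can be sketched as FixMatch-style confidence thresholding; per the abstract, SSB keeps high-confidence pseudo-labels for training the inlier classifier regardless of whether they come from inliers or outliers. A generic sketch:

```python
import torch

def select_pseudo_labels(logits: torch.Tensor, threshold: float = 0.95):
    """Return hard pseudo-labels and a mask of unlabeled samples whose maximum
    softmax probability exceeds the confidence threshold."""
    probs = torch.softmax(logits, dim=-1)
    conf, labels = probs.max(dim=-1)
    mask = conf >= threshold
    return labels[mask], mask
```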
Phase Guided Light Field for Spatial-Depth High Resolution 3D Imaging
results: Experimental results show that, compared with state-of-the-art active light field methods, the proposed approach reconstructs 3D point clouds from a single-shot light field camera with a 10x increase in spatial resolution while maintaining the same high depth resolution, requiring only a single group of high-frequency phase-shifted patterns.
Abstract
In 3D imaging, light field cameras typically operate in a single shot; however, they suffer heavily from low spatial resolution and depth accuracy. In this paper, by employing an optical projector to project a group of single high-frequency phase-shifted sinusoid patterns, we propose a phase guided light field algorithm to significantly improve both the spatial and depth resolutions for off-the-shelf light field cameras. First, to correct the axial aberrations caused by the main lens of our light field camera, we propose a deformed cone model to calibrate our structured light field system. Second, over the wrapped phases computed from patterned images, we propose a stereo matching algorithm, i.e. phase guided sum of absolute difference, to robustly obtain the correspondence for each pair of neighboring lenslets. Finally, by introducing a virtual camera according to the basic geometrical optics of light field imaging, we propose a reorganization strategy to reconstruct 3D point clouds with high spatial-depth resolution. Experimental results show that, compared with state-of-the-art active light field methods, the proposed method reconstructs 3D point clouds with a spatial resolution of 1280$\times$720, a factor of 10$\times$ increase, while maintaining the same high depth resolution and needing merely a single group of high-frequency patterns.
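For reference, the wrapped phase that drives the matching comes from standard N-step phase shifting; the sketch below is the textbook formula (the sign convention depends on how the patterns are defined), not the paper's full pipeline.

```python
import numpy as np

def wrapped_phase(images: np.ndarray) -> np.ndarray:
    """Recover the wrapped phase from N images of sinusoid patterns shifted
    by 2*pi/N each. images: [N, H, W] captured intensities."""
    n = images.shape[0]
    delta = 2 * np.pi * np.arange(n) / n
    num = np.tensordot(np.sin(delta), images, axes=1)
    den = np.tensordot(np.cos(delta), images, axes=1)
    return np.arctan2(-num, den)  # wrapped phase in (-pi, pi]
```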
Archtree: on-the-fly tree-structured exploration for latency-aware pruning of deep neural networks
results: Experimental results show that Archtree better preserves the original model's accuracy while fitting the latency budget more closely, outperforming existing state-of-the-art methods.
Abstract
Deep neural networks (DNNs) have become ubiquitous in addressing a number of problems, particularly in computer vision. However, DNN inference is computationally intensive, which can be prohibitive e.g. when considering edge devices. To solve this problem, a popular solution is DNN pruning, and more so structured pruning, where coherent computational blocks (e.g. channels for convolutional networks) are removed: as an exhaustive search of the space of pruned sub-models is intractable in practice, channels are typically removed iteratively based on an importance estimation heuristic. Recently, promising latency-aware pruning methods were proposed, where channels are removed until the network reaches a target budget of wall-clock latency pre-emptively estimated on specific hardware. In this paper, we present Archtree, a novel method for latency-driven structured pruning of DNNs. Archtree explores multiple candidate pruned sub-models in parallel in a tree-like fashion, allowing for a better exploration of the search space. Furthermore, it involves on-the-fly latency estimation on the target hardware, accounting for closer latencies as compared to the specified budget. Empirical results on several DNN architectures and target hardware show that Archtree better preserves the original model accuracy while better fitting the latency budget as compared to existing state-of-the-art methods.
Joint covariance property under geometric image transformations for spatio-temporal receptive fields according to the generalized Gaussian derivative model for visual receptive fields
results: The paper establishes a joint covariance property that characterizes how different types of image transformations interact, and derives how the receptive field parameters must be transformed so that the outputs of spatio-temporal receptive fields match the underlying spatio-temporal image transformations.
Abstract
The influence of natural image transformations on receptive field responses is crucial for modelling visual operations in computer vision and biological vision. In this regard, covariance properties with respect to geometric image transformations in the earliest layers of the visual hierarchy are essential for expressing robust image operations and for formulating invariant visual operations at higher levels. This paper defines and proves a joint covariance property under compositions of spatial scaling transformations, spatial affine transformations, Galilean transformations and temporal scaling transformations, which makes it possible to characterize how different types of image transformations interact with each other. Specifically, the derived relations show how the receptive field parameters need to be transformed, in order to match the output from spatio-temporal receptive fields with the underlying spatio-temporal image transformations.
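As a worked special case of the kind of relation involved, the following derivation (standard scale-space material, not the paper's full joint property) shows spatial scale covariance for a purely spatial 2-D Gaussian scale-space $L(x;s) = (g(\cdot;s) * f)(x)$ under the rescaling $x' = S x$, $f'(x') = f(x)$:

```latex
\begin{align}
  L'(x'; s')
    &= \int_{\mathbb{R}^2} g(x' - \xi';\, s')\, f'(\xi')\, d\xi'
       && \text{with } s' = S^2 s,\ \xi' = S\xi \\
    &= \int_{\mathbb{R}^2} g(S(x - \xi);\, S^2 s)\, f(\xi)\, S^2\, d\xi
     = \int_{\mathbb{R}^2} g(x - \xi;\, s)\, f(\xi)\, d\xi
     = L(x; s),
\end{align}
% using the 2-D Gaussian identity g(Su; S^2 s) = S^{-2} g(u; s).
```

In words: rescaling the image by $S$ is compensated exactly by rescaling the scale parameter by $S^2$, which is the pattern the paper extends jointly to affine, Galilean, and temporal scaling transformations.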
Segment Anything Model with Uncertainty Rectification for Auto-Prompting Medical Image Segmentation
results: On two public 3D medical image datasets, without additional training or fine-tuning, the method further improves segmentation performance by up to 10.7% and 13.8% in Dice similarity coefficient, demonstrating its effectiveness and broad applicability.
Abstract
The introduction of the Segment Anything Model (SAM) has marked a significant advancement in prompt-driven image segmentation. However, SAM's application to medical image segmentation requires manual prompting of target structures to obtain acceptable performance, which is still labor-intensive. Despite attempts of auto-prompting to turn SAM into a fully automatic manner, it still exhibits subpar performance and lacks of reliability in the field of medical imaging. In this paper, we propose UR-SAM, an uncertainty rectified SAM framework to enhance the robustness and reliability for auto-prompting medical image segmentation. Our method incorporates a prompt augmentation module to estimate the distribution of predictions and generate uncertainty maps, and an uncertainty-based rectification module to further enhance the performance of SAM. Extensive experiments on two public 3D medical datasets covering the segmentation of 35 organs demonstrate that without supplementary training or fine-tuning, our method further improves the segmentation performance with up to 10.7 % and 13.8 % in dice similarity coefficient, demonstrating efficiency and broad capabilities for medical image segmentation without manual prompting.
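The uncertainty-estimation ingredient can be sketched as running a SAM-style predictor under several perturbed prompts and measuring where the soft masks disagree. Here `predict_mask(image, prompt) -> [H, W]` probabilities is an assumed callable; UR-SAM's rectification step is not shown.

```python
import torch

def uncertainty_from_prompts(predict_mask, image, prompts):
    """Per-pixel uncertainty from prompt augmentation: stack soft masks from
    several jittered prompts, then take their mean and variance."""
    masks = torch.stack([predict_mask(image, p) for p in prompts])  # [P, H, W]
    return masks.mean(dim=0), masks.var(dim=0)  # mean mask, uncertainty map
```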
Removing Adverse Volumetric Effects From Trained Neural Radiance Fields
results: Video results demonstrate the method rendering clear views of objects located in fog-filled environments with NeRF.
Abstract
While the use of neural radiance fields (NeRFs) in different challenging settings has been explored, only very recently have there been any contributions that focus on the use of NeRF in foggy environments. We argue that the traditional NeRF models are able to replicate scenes filled with fog and propose a method to remove the fog when synthesizing novel views. By calculating the global contrast of a scene, we can estimate a density threshold that, when applied, removes all visible fog. This makes it possible to use NeRF as a way of rendering clear views of objects of interest located in fog-filled environments. Additionally, to benchmark performance on such scenes, we introduce a new dataset that expands some of the original synthetic NeRF scenes through the addition of fog and natural environments. The code, dataset, and video results can be found on our project page: https://vegardskui.com/fognerf/
Mind the map! Accounting for existing map information when estimating online HDMaps from sensor data
results: On the nuScenes dataset the method achieves significant improvements; for example, given noisy maps, MapEX improves by 38% over the MapTRv2 detector it is based on and by 16% over the current SOTA.
Abstract
Online High Definition Map (HDMap) estimation from sensors offers a low-cost alternative to manually acquired HDMaps. As such, it promises to lighten costs for already HDMap-reliant Autonomous Driving systems, and potentially even spread their use to new systems. In this paper, we propose to improve online HDMap estimation by accounting for already existing maps. We identify 3 reasonable types of useful existing maps (minimalist, noisy, and outdated). We also introduce MapEX, a novel online HDMap estimation framework that accounts for existing maps. MapEX achieves this by encoding map elements into query tokens and by refining the matching algorithm used to train classic query based map estimation models. We demonstrate that MapEX brings significant improvements on the nuScenes dataset. For instance, MapEX - given noisy maps - improves by 38% over the MapTRv2 detector it is based on and by 16% over the current SOTA.
A Framework of Landsat-8 Band Selection based on UMDA for Deforestation Detection
results: Experiments show that the best band composition (651) reaches balanced accuracy above 90% and surpasses all other compositions compared in both efficiency and effectiveness.
Abstract
The conservation of tropical forests is a subject of current social and ecological relevance due to their crucial role in the global ecosystem. Unfortunately, millions of hectares are deforested and degraded each year, so government or private initiatives are needed to monitor tropical forests. In this sense, this work proposes a novel framework that uses an estimation of distribution algorithm (UMDA) to select spectral bands from Landsat-8 that yield a better representation of deforestation areas, in order to guide a semantic segmentation architecture called DeepLabv3+. In the experiments performed, it was possible to find several compositions that reach balanced accuracy superior to 90% in segment classification tasks. Furthermore, the best composition (651) found by the UMDA algorithm fed the DeepLabv3+ architecture and surpassed all compared compositions in efficiency and effectiveness.
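UMDA here is the Univariate Marginal Distribution Algorithm, a simple estimation-of-distribution search. The sketch below shows how it can drive band selection, assuming a hypothetical `fitness(mask)` that returns, for example, the balanced accuracy of DeepLabv3+ on the band composition encoded by `mask`.

```python
import numpy as np

def umda_select(fitness, n_bands=11, pop=50, elite=25, iters=30, seed=0):
    """Evolve per-band inclusion probabilities: sample candidate band masks,
    keep the elite by fitness, and re-estimate the univariate marginals."""
    rng = np.random.default_rng(seed)
    p = np.full(n_bands, 0.5)
    for _ in range(iters):
        population = rng.random((pop, n_bands)) < p
        scores = np.array([fitness(mask) for mask in population])
        best = population[np.argsort(scores)[-elite:]]
        p = best.mean(axis=0).clip(0.05, 0.95)  # keep some exploration
    return p  # high values mark bands worth keeping
```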
A Relay System for Semantic Image Transmission based on Shared Feature Extraction and Hyperprior Entropy Compression
paper_authors: Wannian An, Zhicheng Bao, Haotai Liang, Chen Dong, Xiaodong
for: To improve the quality of image reconstruction and restoration.
methods: Uses a shared feature extraction technique and hyperprior entropy compression (HEC).
results: Compared with other recent research methods, the proposed system has lower transmission overhead and higher semantic image transmission performance; in particular, under the same conditions, its multi-scale structural similarity (MS-SSIM) exceeds the comparison method by approximately 0.2.
Abstract
Nowadays, the need for high-quality image reconstruction and restoration is increasingly urgent. However, most image transmission systems may suffer from image quality degradation or transmission interruption in the face of interference such as channel noise and link fading. To solve this problem, a relay communication network for semantic image transmission based on shared feature extraction and hyperprior entropy compression (HEC) is proposed, in which a shared feature extraction technique based on Pearson correlation is used to eliminate partially shared features of the extracted semantic latent features. In addition, HEC is used to resist the effects of channel noise and link fading, and is carried out at the source node and the relay node, respectively. Experimental results demonstrate that, compared with other recent research methods, the proposed system has lower transmission overhead and higher semantic image transmission performance. In particular, under the same conditions, the multi-scale structural similarity (MS-SSIM) of this system is superior to the comparison method by approximately 0.2.
FRCSyn Challenge at WACV 2024: Face Recognition Challenge in the Era of Synthetic Data
methods: The paper presents the international Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), which explores the application of synthetic data in face recognition technology.
results: According to the paper, synthetic data can effectively address challenges in face recognition such as data privacy issues, demographic biases, generalization to unseen scenarios, and performance in challenging scenarios including pose variations and occlusions.
Abstract
Despite the widespread adoption of face recognition technology around the world, and its remarkable performance on current benchmarks, there are still several challenges that must be covered in more detail. This paper offers an overview of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at WACV 2024. This is the first international challenge aiming to explore the use of synthetic data in face recognition to address existing limitations in the technology. Specifically, the FRCSyn Challenge targets concerns related to data privacy issues, demographic biases, generalization to unseen scenarios, and performance limitations in challenging scenarios, including significant age disparities between enrollment and testing, pose variations, and occlusions. The results achieved in the FRCSyn Challenge, together with the proposed benchmark, contribute significantly to the application of synthetic data to improve face recognition technology.
End-to-end autoencoding architecture for the simultaneous generation of medical images and corresponding segmentation masks
methods: The paper uses an end-to-end architecture based on the Hamiltonian Variational Autoencoder (HVAE), which achieves a better posterior distribution approximation and higher image generation quality than traditional Variational Autoencoders (VAEs).
results: Under data-scarce conditions the method outperforms adversarial models, achieving better image quality and precise tumor mask synthesis; experiments demonstrate its effectiveness across different medical imaging modalities.
Abstract
Despite the increasing use of deep learning in medical image segmentation, acquiring sufficient training data remains a challenge in the medical field. In response, data augmentation techniques have been proposed; however, the generation of diverse and realistic medical images and their corresponding masks remains a difficult task, especially when working with insufficient training sets. To address these limitations, we present an end-to-end architecture based on the Hamiltonian Variational Autoencoder (HVAE). This approach yields an improved posterior distribution approximation compared to traditional Variational Autoencoders (VAE), resulting in higher image generation quality. Our method outperforms generative adversarial architectures under data-scarce conditions, showcasing enhancements in image quality and precise tumor mask synthesis. We conduct experiments on two publicly available datasets, MICCAI's Brain Tumor Segmentation Challenge (BRATS), and Head and Neck Tumor Segmentation Challenge (HECKTOR), demonstrating the effectiveness of our method on different medical imaging modalities.
Correlation-Distance Graph Learning for Treatment Response Prediction from rs-fMRI
results: Experimental results show that the method performs strongly on both the chronic pain and depersonalization disorder datasets, outperforming current methods.
Abstract
Resting-state fMRI (rs-fMRI) functional connectivity (FC) analysis provides valuable insights into the relationships between different brain regions and their potential implications for neurological or psychiatric disorders. However, specific design efforts to predict treatment response from rs-fMRI remain limited due to difficulties in understanding the current brain state and the underlying mechanisms driving the observed patterns, which has limited the clinical application of rs-fMRI. To overcome this, we propose a graph learning framework that captures comprehensive features by integrating both correlation and distance-based similarity measures under a contrastive loss. This approach results in a more expressive framework that captures brain dynamic features at different scales and enables more accurate prediction of treatment response. Our experiments on the chronic pain and depersonalization disorder datasets demonstrate that our proposed method outperforms current methods in different scenarios. To the best of our knowledge, we are the first to explore the integration of distance-based and correlation-based neural similarity into graph learning for treatment response prediction.
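The core graph construction can be sketched as blending a correlation-based and a distance-based similarity between regional rs-fMRI time series; note the paper learns the combination under a contrastive loss rather than fixing a blend weight as below.

```python
import numpy as np

def build_fc_graph(ts: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Fuse correlation and distance similarity into one adjacency matrix.
    ts: [n_regions, n_timepoints] regional time series; alpha is an
    illustrative blend weight."""
    corr = np.corrcoef(ts)                                  # correlation similarity
    d = np.linalg.norm(ts[:, None, :] - ts[None, :, :], axis=-1)
    dist_sim = np.exp(-d ** 2 / (2 * np.median(d) ** 2))    # Gaussian kernel
    return alpha * corr + (1 - alpha) * dist_sim
```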
DeepClean: Machine Unlearning on the Cheap by Resetting Privacy Sensitive Weights using the Fisher Diagonal
paper_authors: Jiaeli Shi, Najah Ghalyan, Kostis Gourgoulias, John Buford, Sean Moran
for: To protect private information by preventing machine learning models from inadvertently memorizing and leaking sensitive data.
methods: Uses the Fisher Information Matrix (FIM) for selective forgetting, without requiring full retraining or large matrix inversions.
results: Experiments show that the algorithm can successfully forget arbitrarily selected subsets of training data across different neural network architectures.
Abstract
Machine learning models trained on sensitive or private data can inadvertently memorize and leak that information. Machine unlearning seeks to retroactively remove such details from model weights to protect privacy. We contribute a lightweight unlearning algorithm that leverages the Fisher Information Matrix (FIM) for selective forgetting. Prior work in this area requires full retraining or large matrix inversions, which are computationally expensive. Our key insight is that the diagonal elements of the FIM, which measure the sensitivity of log-likelihood to changes in weights, contain sufficient information for effective forgetting. Specifically, we compute the FIM diagonal over two subsets -- the data to retain and forget -- for all trainable weights. This diagonal representation approximates the complete FIM while dramatically reducing computation. We then use it to selectively update weights to maximize forgetting of the sensitive subset while minimizing impact on the retained subset. Experiments show that our algorithm can successfully forget any randomly selected subsets of training data across neural network architectures. By leveraging the FIM diagonal, our approach provides an interpretable, lightweight, and efficient solution for machine unlearning with practical privacy benefits.
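The key quantity can be sketched directly: the FIM diagonal is approximated by averaging squared gradients of the loss, one scalar per trainable weight. This is a per-batch empirical-Fisher sketch; DeepClean computes it over the retain and forget subsets separately, and the weight-resetting step is not shown.

```python
import torch

def fim_diagonal(model, loader, loss_fn):
    """Approximate the diagonal of the Fisher Information Matrix as the
    average of squared log-likelihood gradients over a data subset."""
    fim = {n: torch.zeros_like(p) for n, p in model.named_parameters()
           if p.requires_grad}
    batches = 0
    for x, y in loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fim[n] += p.grad.detach() ** 2
        batches += 1
    return {n: v / max(batches, 1) for n, v in fim.items()}
```

Comparing the diagonals computed on the retain and forget subsets identifies which weights are privacy-sensitive and should be reset.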
DUA-DA: Distillation-based Unbiased Alignment for Domain Adaptive Object Detection
results: In cross-domain scenarios the method improves the accuracy and consistency of domain-adaptive object detection and substantially outperforms existing alignment-based methods.
Abstract
Though feature-alignment based Domain Adaptive Object Detection (DAOD) have achieved remarkable progress, they ignore the source bias issue, i.e. the aligned features are more favorable towards the source domain, leading to a sub-optimal adaptation. Furthermore, the presence of domain shift between the source and target domains exacerbates the problem of inconsistent classification and localization in general detection pipelines. To overcome these challenges, we propose a novel Distillation-based Unbiased Alignment (DUA) framework for DAOD, which can distill the source features towards a more balanced position via a pre-trained teacher model during the training process, alleviating the problem of source bias effectively. In addition, we design a Target-Relevant Object Localization Network (TROLN), which can mine target-related knowledge to produce two classification-free metrics (IoU and centerness). Accordingly, we implement a Domain-aware Consistency Enhancing (DCE) strategy that utilizes these two metrics to further refine classification confidences, achieving a harmonization between classification and localization in cross-domain scenarios. Extensive experiments have been conducted to manifest the effectiveness of this method, which consistently improves the strong baseline by large margins, outperforming existing alignment-based works.
Deep Residual CNN for Multi-Class Chest Infection Diagnosis
results: The study finds nuanced performance disparities across classes, particularly Fibrosis, reflecting the complexity and challenges of automated medical image diagnosis. These findings can guide future research on recognizing more subtle and nuanced image features and on optimizing and refining the model architecture and training process.
Abstract
The advent of deep learning has significantly propelled the capabilities of automated medical image diagnosis, providing valuable tools and resources in the realm of healthcare and medical diagnostics. This research delves into the development and evaluation of a Deep Residual Convolutional Neural Network (CNN) for the multi-class diagnosis of chest infections, utilizing chest X-ray images. The implemented model, trained and validated on a dataset amalgamated from diverse sources, demonstrated a robust overall accuracy of 93%. However, nuanced disparities in performance across different classes, particularly Fibrosis, underscored the complexity and challenges inherent in automated medical image diagnosis. The insights derived pave the way for future research, focusing on enhancing the model's proficiency in classifying conditions that present more subtle and nuanced visual features in the images, as well as optimizing and refining the model architecture and training process. This paper provides a comprehensive exploration into the development, implementation, and evaluation of the model, offering insights and directions for future research and development in the field.
Deep Learning based CNN Model for Classification and Detection of Individuals Wearing Face Mask
paper_authors: R. Chinnaiyan, Iyyappan M, Al Raiyan Shariff A, Kondaveeti Sai, Mallikarjunaiah B M, P Bharath
for: To curb the spread of the COVID-19 pandemic and improve security, especially in sensitive areas.
methods: Uses deep learning to build a model that detects face masks in real-time streaming video and images, combining face detection and object detection.
results: Experimental results show the model achieves excellent accuracy on test data.
Abstract
In response to the global COVID-19 pandemic, there has been a critical demand for protective measures, with face masks emerging as a primary safeguard. The approach involves a two-fold strategy: first, recognizing the presence of a face by detecting faces, and second, identifying masks on those faces. This project utilizes deep learning to create a model that can detect face masks in real-time streaming video as well as images. Face detection, a facet of object detection, finds applications in diverse fields such as security, biometrics, and law enforcement. Various detector systems worldwide have been developed and implemented, with convolutional neural networks chosen for their superior performance accuracy and speed in object detection. Experimental results attest to the model's excellent accuracy on test data. The primary focus of this research is to enhance security, particularly in sensitive areas. The research paper proposes a rapid image pre-processing method with masks centred on faces. Employing feature extraction and Convolutional Neural Network, the system classifies and detects individuals wearing masks. The research unfolds in three stages: image pre-processing, image cropping, and image classification, collectively contributing to the identification of masked faces. Continuous surveillance through webcams or CCTV cameras ensures constant monitoring, triggering a security alert if a person is detected without a mask.
Optimized Deep Learning Models for AUV Seabed Image Analysis
for: This study aims to provide the most up-to-date AUV image processing techniques and tools to better understand the characteristics and structure of the seafloor.
methods: The study uses the latest computer and algorithmic techniques, including image processing and analysis methods, to improve the quality and accuracy of AUV images.
results: The research found that using new AUV image processing techniques and tools can improve the quality and accuracy of seafloor images, and provide a better understanding of the seafloor's characteristics and structure.Abstract
Using autonomous underwater vehicles, or AUVs, has completely changed how we gather data from the ocean floor. AUV innovation has advanced significantly, especially in the analysis of images, due to the increasing need for accurate and efficient seafloor mapping. This blog post provides a detailed summary and comparison of the most current advancements in AUV seafloor image processing. We will go into the realm of undersea technology, covering everything from computer and algorithmic advancements to advances in sensors and cameras. After reading this post through to the end, you will have a solid understanding of the most up-to-date techniques and tools for using AUVs to process seabed photos and how they could further our comprehension of the ocean floor.
Two-Factor Authentication Approach Based on Behavior Patterns for Defeating Puppet Attacks
results: 该方法可以实现97.87%的准确率和1.89%的假阳性率(FPR)。此外,对比实验验证了在PUPGUARD中结合图像特征和时间特征对增强抵御傀儡攻击能力的突出优势。Abstract
Fingerprint traits are widely recognized for their unique qualities and security benefits. Despite their extensive use, fingerprint features can be vulnerable to puppet attacks, where attackers manipulate a reluctant but genuine user into completing the authentication process. Defending against such attacks is challenging due to the coexistence of a legitimate identity and an illegitimate intent. In this paper, we propose PUPGUARD, a solution designed to guard against puppet attacks. This method is based on user behavioral patterns, specifically, the user needs to press the capture device twice successively with different fingers during the authentication process. PUPGUARD leverages both the image features of fingerprints and the timing characteristics of the pressing intervals to establish two-factor authentication. More specifically, after extracting image features and timing characteristics, and performing feature selection on the image features, PUPGUARD fuses these two features into a one-dimensional feature vector, and feeds it into a one-class classifier to obtain the classification result. This two-factor authentication method emphasizes dynamic behavioral patterns during the authentication process, thereby enhancing security against puppet attacks. To assess PUPGUARD's effectiveness, we conducted experiments on datasets collected from 31 subjects, including image features and timing characteristics. Our experimental results demonstrate that PUPGUARD achieves an impressive accuracy rate of 97.87% and a remarkably low false positive rate (FPR) of 1.89%. Furthermore, we conducted comparative experiments to validate the superiority of combining image features and timing characteristics within PUPGUARD for enhancing resistance against puppet attacks.
摘要
指纹特征因其独特性和安全优势而得到广泛应用。然而,指纹特征可能受到傀儡攻击:攻击者操纵一个不情愿但真实的用户完成身份验证过程。由于合法身份与非法意图并存,防御此类攻击十分困难。在本文中,我们提出了PUPGUARD解决方案,用于防御傀儡攻击。该方法基于用户行为模式,具体来说,用户需要在身份验证过程中用不同的手指连续两次按压采集设备。PUPGUARD同时利用指纹的图像特征和按压间隔的时间特征来建立双因素认证。更具体地,在提取图像特征和时间特征并对图像特征进行特征选择后,PUPGUARD将这两种特征融合为一维特征向量,并输入单类分类器以获得分类结果。这种双因素认证方法强调身份验证过程中的动态行为模式,从而增强了对傀儡攻击的安全防护。为评估PUPGUARD的效果,我们在从31名受试者采集的包含图像特征和时间特征的数据集上进行了实验。实验结果表明,PUPGUARD取得了97.87%的高准确率和仅1.89%的低假阳性率(FPR)。此外,我们还进行了对比实验,验证了在PUPGUARD中结合图像特征与时间特征对增强抵御傀儡攻击能力的优越性。
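The two-factor fusion can be illustrated with a short sketch: image features and the pressing-interval timing are concatenated into one vector and scored by a one-class classifier trained only on genuine attempts. The feature dimensions, timing statistics, and the OneClassSVM choice are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Assumed shapes: 64-d selected fingerprint-image features plus a 1-d
# pressing-interval timing feature per authentication attempt.
genuine_img = rng.normal(0.0, 1.0, (200, 64))
genuine_dt = rng.normal(0.8, 0.1, (200, 1))      # seconds between the two presses
X_train = np.hstack([genuine_img, genuine_dt])   # fuse into one feature vector

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

# A puppet attack plausibly shows atypical timing even with genuine fingerprints.
attack = np.hstack([rng.normal(0.0, 1.0, (1, 64)), np.array([[2.5]])])
print("accept" if clf.predict(attack)[0] == 1 else "reject")
```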
Single-Shot and Multi-Shot Feature Learning for Multi-Object Tracking
paper_authors: Yizhe Li, Sanping Zhou, Zheng Qin, Le Wang, Jinjun Wang, Nanning Zheng
for: 本研究旨在提高多目标跟踪(MOT)的精度和可靠性,使其在视频序列中更好地跟踪目标。
methods: 本研究提出了一种简单而有效的两阶段特征学习范式,用于联合学习单帧(single-shot)和多帧(multi-shot)特征,以实现鲁棒的数据关联。对于未被关联的检测,我们设计了一种单帧特征学习模块,提取每个检测的判别特征,以便在相邻帧之间高效地关联目标。对于丢失多帧的轨迹,我们设计了一种多帧特征学习模块,提取每条轨迹的判别特征,以便在较长时间后准确地重新找回这些丢失的目标。
results: 我们的方法在MOT17和MOT20数据集上实现了显著的改进,并在DanceTrack数据集上达到了当前最佳性能。
Multi-Object Tracking (MOT) remains a vital component of intelligent video analysis, which aims to locate targets and maintain a consistent identity for each target throughout a video sequence. Existing works usually learn a discriminative feature representation, such as motion and appearance, to associate the detections across frames, which are easily affected by mutual occlusion and background clutter in practice. In this paper, we propose a simple yet effective two-stage feature learning paradigm to jointly learn single-shot and multi-shot features for different targets, so as to achieve robust data association in the tracking process. For the detections without being associated, we design a novel single-shot feature learning module to extract discriminative features of each detection, which can efficiently associate targets between adjacent frames. For the tracklets being lost several frames, we design a novel multi-shot feature learning module to extract discriminative features of each tracklet, which can accurately refind these lost targets after a long period. Once equipped with a simple data association logic, the resulting VisualTracker can perform robust MOT based on the single-shot and multi-shot feature representations. Extensive experimental results demonstrate that our method has achieved significant improvements on MOT17 and MOT20 datasets while reaching state-of-the-art performance on DanceTrack dataset.
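As a concrete illustration of how single-shot features can drive frame-to-frame association, the following sketch matches detections to tracks by cosine similarity with Hungarian assignment. This is a generic data-association recipe under assumed 128-d embeddings, not the paper's exact logic.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_feats: np.ndarray, det_feats: np.ndarray, thresh: float = 0.6):
    """Match detections to existing tracks by cosine similarity of features."""
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    sim = t @ d.T                             # (num_tracks, num_dets) similarities
    rows, cols = linear_sum_assignment(-sim)  # maximize total similarity
    matches = [(r, c) for r, c in zip(rows, cols) if sim[r, c] >= thresh]
    return matches  # unmatched tracks/detections are handled by the tracker logic

tracks = np.random.rand(3, 128)  # assumed 128-d single-shot embeddings
dets = np.random.rand(4, 128)
print(associate(tracks, dets))
```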
MSE-Nets: Multi-annotated Semi-supervised Ensemble Networks for Improving Segmentation of Medical Image with Ambiguous Boundaries
results: 我们的方法可以将对多标注数据的需求减少97.75%,并将与最佳全监督基线的差距缩小至仅4%的Jaccard指数。此外,在ISIC和RIGA数据集上的综合实验结果表明,与仅依赖单一标注或简单融合的其他半监督方法相比,我们的方法在边界模糊的医学图像分割任务上表现更优。Abstract
Medical image segmentation annotations exhibit variations among experts due to the ambiguous boundaries of segmented objects and backgrounds in medical images. Although using multiple annotations for each image in the fully-supervised setting has been extensively studied for training deep models, obtaining a large amount of multi-annotated data is challenging due to the substantial time and manpower costs required for segmentation annotations, resulting in most images lacking any annotations. To address this, we propose Multi-annotated Semi-supervised Ensemble Networks (MSE-Nets) for learning segmentation from limited multi-annotated and abundant unannotated data. Specifically, we introduce the Network Pairwise Consistency Enhancement (NPCE) module and Multi-Network Pseudo Supervised (MNPS) module to enhance MSE-Nets for the segmentation task by considering two major factors: (1) to optimize the utilization of all accessible multi-annotated data, the NPCE separates (dis)agreement annotations of multi-annotated data at the pixel level and handles agreement and disagreement annotations in different ways, (2) to mitigate the introduction of imprecise pseudo-labels, the MNPS extends the training data by leveraging consistent pseudo-labels from unannotated data. Finally, we improve confidence calibration by averaging the predictions of base networks. Experiments on the ISIC dataset show that we reduced the demand for multi-annotated data by 97.75\% and narrowed the gap with the best fully-supervised baseline to just 4\% in the Jaccard index. Furthermore, compared to other semi-supervised methods that rely only on a single annotation or a combined fusion approach, the comprehensive experimental results on ISIC and RIGA datasets demonstrate the superior performance of our proposed method in medical image segmentation with ambiguous boundaries.
摘要
医学图像分割标注在专家之间存在差异,这是因为医学图像中分割对象与背景的边界模糊。虽然在全监督设定下使用每幅图像的多个标注来训练深度模型已得到广泛研究,但由于分割标注需要大量的时间和人力成本,获得大量多标注数据十分困难,导致大多数图像缺乏任何标注。为解决这个问题,我们提出了多标注半监督集成网络(MSE-Nets),用于从有限的多标注数据和丰富的无标注数据中学习分割。具体来说,我们引入了网络成对一致性增强(NPCE)模块和多网络伪监督(MNPS)模块,从两个主要方面增强MSE-Nets的分割能力:(1)为充分利用所有可用的多标注数据,NPCE在像素级别区分多标注数据中的一致与不一致标注,并以不同方式分别处理;(2)为减少不精确伪标签的引入,MNPS利用来自无标注数据的一致伪标签扩充训练数据。最后,我们通过对基础网络的预测取平均来改善置信度校准。在ISIC数据集上的实验表明,我们将对多标注数据的需求减少了97.75%,并将与最佳全监督基线的差距缩小至仅4%的Jaccard指数。此外,与仅依赖单一标注或简单融合的其他半监督方法相比,在ISIC和RIGA数据集上的综合实验结果证明了我们的方法在边界模糊的医学图像分割任务上的优越性能。
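The pixel-level separation that NPCE performs on multi-annotated data can be sketched as follows; the two-annotator, binary-mask setting is a simplification for illustration.

```python
import numpy as np

def split_agreement(masks: np.ndarray):
    """masks: (num_annotators, H, W) binary annotations of the same image.
    Returns a pixel-wise agreement map and a disagreement map."""
    agree_fg = np.all(masks == 1, axis=0)  # all annotators say foreground
    agree_bg = np.all(masks == 0, axis=0)  # all annotators say background
    agreement = agree_fg | agree_bg        # confident pixels (usable as hard labels)
    disagreement = ~agreement              # ambiguous pixels (handled differently)
    return agreement, disagreement

masks = np.random.randint(0, 2, (2, 4, 4))  # two annotators, toy 4x4 masks
agree, disagree = split_agreement(masks)
print(f"{disagree.sum()} ambiguous pixels out of {disagree.size}")
```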
Breaking Temporal Consistency: Generating Video Universal Adversarial Perturbations Using Image Models
paper_authors: Hee-Seon Kim, Minji Son, Minbeom Kim, Myung-Joon Kwon, Changick Kim
for: 随着深度学习模型在视频分析中的应用日益广泛,防御此类模型遭受攻击的安全性问题变得愈发紧迫。特别是通用对抗扰动(UAP)对深度学习模型构成重大威胁:单个扰动即可在整个数据集上误导深度学习模型。
methods: 我们提出了一种使用图像数据和图像模型的新型视频UAP。这使得我们可以利用图像数据和基于图像模型的研究来进行视频应用。然而,图像模型对视频中时间维度的分析能力有限,而这正是成功进行视频攻击的关键。为解决这一挑战,我们引入了打破时间一致性(Breaking Temporal Consistency,BTC)方法,这是首次尝试在使用图像模型的视频攻击中引入时间信息。我们的目标是生成与原始视频具有相反模式的对抗视频。具体来说,BTC-UAP最小化视频中相邻帧之间的特征相似度。
results: 我们的方法简单而有效,在包括ImageNet、UCF-101和Kinetics-400在内的多个数据集上均优于现有方法。此外,该方法适用于不同长度的视频,并对时间偏移保持不变性。Abstract
As video analysis using deep learning models becomes more widespread, the vulnerability of such models to adversarial attacks is becoming a pressing concern. In particular, Universal Adversarial Perturbation (UAP) poses a significant threat, as a single perturbation can mislead deep learning models on entire datasets. We propose a novel video UAP using image data and image model. This enables us to take advantage of the rich image data and image model-based studies available for video applications. However, there is a challenge that image models are limited in their ability to analyze the temporal aspects of videos, which is crucial for a successful video attack. To address this challenge, we introduce the Breaking Temporal Consistency (BTC) method, which is the first attempt to incorporate temporal information into video attacks using image models. We aim to generate adversarial videos that have opposite patterns to the original. Specifically, BTC-UAP minimizes the feature similarity between neighboring frames in videos. Our approach is simple but effective at attacking unseen video models. Additionally, it is applicable to videos of varying lengths and invariant to temporal shifts. Our approach surpasses existing methods in terms of effectiveness on various datasets, including ImageNet, UCF-101, and Kinetics-400.
摘要
随着基于深度学习模型的视频分析日益普及,此类模型面对对抗攻击的脆弱性正成为一个紧迫的问题。特别是对于通用对抗扰动(UAP)而言,单个扰动即可在整个数据集上误导深度学习模型。我们提出了一种使用图像数据和图像模型的新型视频UAP,这使得我们可以利用丰富的图像数据和基于图像模型的研究成果来进行视频应用。然而,图像模型对视频中时间维度的分析能力有限,而这正是成功进行视频攻击的关键。为解决这一挑战,我们引入了打破时间一致性(BTC)方法,这是首次尝试在使用图像模型的视频攻击中引入时间信息。我们的目标是生成与原始视频具有相反模式的对抗视频。具体来说,BTC-UAP最小化视频中相邻帧特征之间的相似度。我们的方法简单而有效,能够攻击未见过的视频模型。此外,它适用于不同长度的视频,并对时间偏移保持不变性。我们的方法在包括ImageNet、UCF-101和Kinetics-400在内的多个数据集上表现出色。
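The core BTC objective, minimizing feature similarity between neighboring perturbed frames, can be sketched as below. The feature extractor, clip size, and perturbation budget are toy assumptions.

```python
import torch
import torch.nn.functional as F

def btc_loss(frame_feats: torch.Tensor) -> torch.Tensor:
    """frame_feats: (T, D) features of consecutive perturbed frames from an
    image model. Minimizing this drives neighboring-frame similarity down."""
    a, b = frame_feats[:-1], frame_feats[1:]        # neighboring frame pairs
    return F.cosine_similarity(a, b, dim=1).mean()  # lower = less similar

# Toy optimization of a universal perturbation shared across all frames.
frames = torch.rand(8, 3, 32, 32)                   # assumed video clip
uap = torch.zeros(1, 3, 32, 32, requires_grad=True)
extractor = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 64))
opt = torch.optim.Adam([uap], lr=0.01)
for _ in range(10):
    feats = extractor((frames + uap).clamp(0, 1))   # perturb, then featurize
    loss = btc_loss(feats)
    opt.zero_grad(); loss.backward(); opt.step()
    uap.data.clamp_(-8 / 255, 8 / 255)              # assumed epsilon-ball budget
print(float(loss))
```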
Video-based Sequential Bayesian Homography Estimation for Soccer Field Registration
results: 与现有方法相比,在大多数单应性(homography)评估指标上均有提升,且现有的关键点检测方法可以轻松地与该方法结合以获得增强。Abstract
A novel Bayesian framework is proposed, which explicitly relates the homography of one video frame to the next through an affine transformation while explicitly modelling keypoint uncertainty. The literature has previously used differential homography between subsequent frames, but not in a Bayesian setting. In cases where Bayesian methods have been applied, camera motion is not adequately modelled, and keypoints are treated as deterministic. The proposed method, Bayesian Homography Inference from Tracked Keypoints (BHITK), employs a two-stage Kalman filter and significantly improves existing methods. Existing keypoint detection methods may be easily augmented with BHITK. It enables less sophisticated and less computationally expensive methods to outperform the state-of-the-art approaches in most homography evaluation metrics. Furthermore, the homography annotations of the WorldCup and TS-WorldCup datasets have been refined using a custom homography annotation tool released for public use. The refined datasets are consolidated and released as the consolidated and refined WorldCup (CARWC) dataset.
摘要
本文提出了一种新的贝叶斯框架,通过仿射变换将一帧视频的单应性与下一帧显式关联,并显式建模关键点的不确定性。以往文献曾使用相邻帧之间的差分单应性,但并非在贝叶斯框架下。而在已应用贝叶斯方法的工作中,摄像机运动未被充分建模,且关键点被视为确定性的。所提出的方法,即基于跟踪关键点的贝叶斯单应性推断(BHITK),采用两阶段卡尔曼滤波器,显著改进了现有方法。现有的关键点检测方法可以轻松地与BHITK结合增强,使较简单、计算成本较低的方法在大多数单应性评估指标上超越最先进方法。此外,WorldCup和TS-WorldCup数据集的单应性标注已通过一个公开发布的自定义单应性标注工具进行了精细化;精细化后的数据集被整合并作为整合精细化WorldCup(CARWC)数据集发布。
Garment Recovery with Shape and Deformation Priors
results: 我们的方法可以准确地恢复服装的几何结构,同时生成可直接用于动画和模拟的服装模型。Abstract
While modeling people wearing tight-fitting clothing has made great strides in recent years, loose-fitting clothing remains a challenge. We propose a method that delivers realistic garment models from real-world images, regardless of garment shape or deformation. To this end, we introduce a fitting approach that utilizes shape and deformation priors learned from synthetic data to accurately capture garment shapes and deformations, including large ones. Not only does our approach recover the garment geometry accurately, it also yields models that can be directly used by downstream applications such as animation and simulation.
摘要
“近年来,对穿着紧身服装的人体进行建模已取得长足进步,但宽松服装的建模仍是一个挑战。我们提出了一种方法,可以从真实世界图像中恢复逼真的服装模型,无论服装的形状或形变如何。为此,我们引入了一种拟合策略,利用从合成数据中学习到的形状与形变先验,精确捕捉服装的形状和形变,包括大幅度的形变。我们的方法不仅能准确恢复服装几何结构,还能生成可直接用于动画和模拟等下游应用的服装模型。”
Pseudo Label-Guided Data Fusion and Output Consistency for Semi-Supervised Medical Image Segmentation
results: 实验结果显示,PLGDF框架能够利用无标注数据,以更少的标注数据完成医学图像分割任务;与六种最先进的半监督学习方法相比,PLGDF表现更优。本研究的代码可在 https://github.com/ortonwang/PLGDF 获取。Abstract
Supervised learning algorithms based on Convolutional Neural Networks have become the benchmark for medical image segmentation tasks, but their effectiveness heavily relies on a large amount of labeled data. However, annotating medical image datasets is a laborious and time-consuming process. Inspired by semi-supervised algorithms that use both labeled and unlabeled data for training, we propose the PLGDF framework, which builds upon the mean teacher network for segmenting medical images with less annotation. We propose a novel pseudo-label utilization scheme, which combines labeled and unlabeled data to augment the dataset effectively. Additionally, we enforce the consistency between different scales in the decoder module of the segmentation network and propose a loss function suitable for evaluating the consistency. Moreover, we incorporate a sharpening operation on the predicted results, further enhancing the accuracy of the segmentation. Extensive experiments on three publicly available datasets demonstrate that the PLGDF framework can largely improve performance by incorporating the unlabeled data. Meanwhile, our framework yields superior performance compared to six state-of-the-art semi-supervised learning methods. The codes of this study are available at https://github.com/ortonwang/PLGDF.
摘要
基于卷积神经网络的监督学习算法已成为医学图像分割任务的基准,但其效果严重依赖于大量的标注数据。然而,标注医学图像数据集是一个费时费力的过程。受同时利用标注与无标注数据进行训练的半监督算法启发,我们提出了PLGDF框架,该框架基于Mean Teacher网络,以更少的标注进行医学图像分割。我们提出了一种新的伪标签利用方案,将标注数据与无标注数据相结合,有效扩充数据集。此外,我们在分割网络的解码模块中强制不同尺度之间的一致性,并提出了适合评估该一致性的损失函数。我们还对预测结果引入锐化操作,进一步提高分割的准确性。在三个公开数据集上的大量实验表明,PLGDF框架能够通过引入无标注数据大幅提升性能;同时,与六种最先进的半监督学习方法相比,我们的框架表现更优。本研究的代码可在 https://github.com/ortonwang/PLGDF 获取。
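A minimal sketch of the mean-teacher mechanism that PLGDF builds on: the teacher is an exponential moving average (EMA) of the student, and a consistency loss aligns their predictions on unlabeled data. The tiny network and hyperparameters are placeholders; the full PLGDF pipeline (pseudo-label fusion, multi-scale consistency, sharpening) is not reproduced here.

```python
import copy
import torch
import torch.nn.functional as F

student = torch.nn.Conv2d(1, 2, 3, padding=1)  # stand-in segmentation net
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

def ema_update(teacher, student, alpha=0.99):
    """Teacher weights track an exponential moving average of the student's."""
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.data.mul_(alpha).add_(sp.data, alpha=1 - alpha)

opt = torch.optim.SGD(student.parameters(), lr=0.1)
unlabeled = torch.rand(4, 1, 16, 16)
noisy = unlabeled + 0.1 * torch.randn_like(unlabeled)  # perturbed student input
for step in range(5):
    s_logits = student(noisy)
    with torch.no_grad():
        t_logits = teacher(unlabeled)  # consistency target from the teacher
    loss = F.mse_loss(torch.softmax(s_logits, 1), torch.softmax(t_logits, 1))
    opt.zero_grad(); loss.backward(); opt.step()
    ema_update(teacher, student)
```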
Enhancing Student Engagement in Online Learning through Facial Expression Analysis and Complex Emotion Recognition using Deep Learning
results: 实验结果显示,该方法可以准确地分类学生的基本情感状态,并且达到了95%的准确率。Abstract
In response to the COVID-19 pandemic, traditional physical classrooms have transitioned to online environments, necessitating effective strategies to ensure sustained student engagement. A significant challenge in online teaching is the absence of real-time feedback from teachers on students' learning progress. This paper introduces a novel approach employing deep learning techniques based on facial expressions to assess students' engagement levels during online learning sessions. Human emotions cannot be adequately conveyed by a student using only the basic emotions, including anger, disgust, fear, joy, sadness, surprise, and neutrality. To address this challenge, we propose generating four complex emotions, namely confusion, satisfaction, disappointment, and frustration, by combining the basic emotions. These complex emotions are often experienced simultaneously by students during the learning session. To depict these emotions dynamically, we utilized a continuous stream of image frames instead of discrete images. The proposed work utilized a Convolutional Neural Network (CNN) model to categorize the fundamental emotional states of learners accurately. The proposed CNN model demonstrates strong performance, achieving 95% accuracy in the precise categorization of learner emotions.
摘要
因应全球COVID-19大流行,传统的实体课堂已转变为在线环境,需要有效的策略来确保学生的持续参与。在线教学中的一个重要挑战是教师缺乏对学生学习进度的实时反馈。本文提出了一种基于面部表情的深度学习新方法,用于评估在线学习会话中学生的参与程度。仅使用愤怒、厌恶、恐惧、喜悦、悲伤、惊讶和中性等基本情绪,无法完整表达学生的情感。为了解决这一挑战,本文通过组合基本情绪生成四种复杂情绪:困惑、满足、失望和沮丧。这些复杂情绪常在学习会话中同时出现。为了动态刻画这些情绪,所提方法使用连续的图像帧流而非离散图像。该方法利用卷积神经网络(CNN)模型准确分类学生的基本情绪状态,并表现出强劲的性能,在学生情绪的精确分类上达到了95%的准确率。
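One plausible way to derive the four complex emotions from per-frame basic-emotion probabilities is to average assumed constituent pairs over a window of frames, as sketched below; the specific pairings are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

BASIC = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]
# Assumed constituent pairs for each complex emotion (illustrative only).
COMPLEX = {
    "confusion":      ("surprise", "fear"),
    "satisfaction":   ("joy", "neutral"),
    "disappointment": ("sadness", "neutral"),
    "frustration":    ("anger", "sadness"),
}

def complex_scores(basic_probs: np.ndarray) -> dict:
    """basic_probs: softmax over the 7 basic emotions."""
    idx = {name: i for i, name in enumerate(BASIC)}
    return {name: float(basic_probs[[idx[a], idx[b]]].mean())
            for name, (a, b) in COMPLEX.items()}

# Average CNN outputs over a window of frames for a dynamic (not per-image) read.
frame_probs = np.random.dirichlet(np.ones(7), size=30)  # 30 frames, toy values
print(complex_scores(frame_probs.mean(axis=0)))
```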
methods: 本文提出了一种新的方法,即 Attend to eXpert Prompts(A2XP),以解决 DNNs 在不同领域数据中的领域泛化问题。A2XP 包括两个阶段:专家适应和领域泛化。在第一阶段,针对每个源领域优化一个提示,以引导模型向最优方向发展。在第二阶段,训练两个嵌入网络,以有效地混合这些专家提示,从而获得最优输出。
results: 我们的大量实验表明,A2XP 相比现有的非隐私保护领域泛化方法达到了最先进的结果。实验结果表明,所提出的方法不仅解决了 DNNs 中的领域泛化问题,还为计算机视觉这一更广泛的领域提供了一种隐私保护且高效的解决方案。Abstract
Deep Neural Networks (DNNs) have become pivotal in various fields, especially in computer vision, outperforming previous methodologies. A critical challenge in their deployment is the bias inherent in data across different domains, such as image style, and environmental conditions, leading to domain gaps. This necessitates techniques for learning general representations from biased training data, known as domain generalization. This paper presents Attend to eXpert Prompts (A2XP), a novel approach for domain generalization that preserves the privacy and integrity of the network architecture. A2XP consists of two phases: Expert Adaptation and Domain Generalization. In the first phase, prompts for each source domain are optimized to guide the model towards the optimal direction. In the second phase, two embedder networks are trained to effectively amalgamate these expert prompts, aiming for an optimal output. Our extensive experiments demonstrate that A2XP achieves state-of-the-art results over existing non-private domain generalization methods. The experimental results validate that the proposed approach not only tackles the domain generalization challenge in DNNs but also offers a privacy-preserving, efficient solution to the broader field of computer vision.
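The second phase, two embedder networks amalgamating frozen expert prompts, could look like the following attention-style sketch; the embedding dimensions and the softmax weighting are assumptions about how the amalgamation might be implemented.

```python
import torch
import torch.nn as nn

class PromptMixer(nn.Module):
    """Mix N frozen expert prompts into one, weighted per input image."""
    def __init__(self, img_dim=512, prompt_dim=768):
        super().__init__()
        self.embed_img = nn.Linear(img_dim, 128)       # embedder network 1
        self.embed_prompt = nn.Linear(prompt_dim, 128)  # embedder network 2

    def forward(self, img_feat, expert_prompts):
        # img_feat: (B, img_dim); expert_prompts: (N, prompt_dim), kept frozen
        q = self.embed_img(img_feat)            # (B, 128)
        k = self.embed_prompt(expert_prompts)   # (N, 128)
        w = torch.softmax(q @ k.T, dim=-1)      # (B, N) per-image mixing weights
        return w @ expert_prompts               # (B, prompt_dim) mixed prompt

mixer = PromptMixer()
mixed = mixer(torch.randn(4, 512), torch.randn(3, 768))
print(mixed.shape)  # torch.Size([4, 768]); fed to the frozen network as a prompt
```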
Cooperative Perception with Learning-Based V2V communications
results: 数值结果表明,当SNR大于0dB时,中间融合在信道损伤下比早期融合和晚期融合更加鲁棒。此外,所提出的融合方案优于使用检测输出的传统晚期融合,而自编码器在检测准确率和带宽使用之间提供了良好的折中。Abstract
Cooperative perception has been widely used in autonomous driving to alleviate the inherent limitation of single automated vehicle perception. To enable cooperation, vehicle-to-vehicle (V2V) communication plays an indispensable role. This work analyzes the performance of cooperative perception accounting for communications channel impairments. Different fusion methods and channel impairments are evaluated. A new late fusion scheme is proposed to leverage the robustness of intermediate features. In order to compress the data size incurred by cooperation, a convolution neural network-based autoencoder is adopted. Numerical results demonstrate that intermediate fusion is more robust to channel impairments than early fusion and late fusion, when the SNR is greater than 0 dB. Also, the proposed fusion scheme outperforms the conventional late fusion using detection outputs, and autoencoder provides a good compromise between detection accuracy and bandwidth usage.
摘要
协同感知在自动驾驶中被广泛应用,以缓解单一自动车辆感知的固有局限。为实现协同,车对车(V2V)通信不可或缺。这项工作在考虑通信信道损伤的条件下分析了协同感知的性能,评估了不同的融合方法和信道损伤。我们提出了一种新的晚期融合方案,以利用中间特征的鲁棒性。为压缩协同带来的数据量,我们采用了一种基于卷积神经网络的自编码器。数值结果表明,当SNR大于0dB时,中间融合比早期融合和晚期融合对信道损伤更加鲁棒。此外,所提出的融合方案优于使用检测输出的传统晚期融合,而自编码器在检测精度和带宽使用之间提供了良好的折中。
Leveraging Multimodal Fusion for Enhanced Diagnosis of Multiple Retinal Diseases in Ultra-wide OCTA
for: 该论文围绕超广角光学相干断层扫描血管成像(UW-OCTA)技术展开,该技术的扫描范围可达 24 x 20 $mm^{2}$,覆盖视网膜的前部和后部区域。
methods: 该论文构建了首个多模态、多疾病、最宽视场的UW-OCTA数据集M3OCTA,并提出了一种跨模态融合框架,利用多模态信息诊断多种疾病。
results: 通过在M3OCTA数据集上进行的大量实验,证明了该方法在固定和变化模态设置下的有效性和优越性。Abstract
Ultra-wide optical coherence tomography angiography (UW-OCTA) is an emerging imaging technique that offers significant advantages over traditional OCTA by providing an exceptionally wide scanning range of up to 24 x 20 $mm^{2}$, covering both the anterior and posterior regions of the retina. However, the currently accessible UW-OCTA datasets suffer from limited comprehensive hierarchical information and corresponding disease annotations. To address this limitation, we have curated the pioneering M3OCTA dataset, which is the first multimodal (i.e., multilayer), multi-disease, and widest field-of-view UW-OCTA dataset. Furthermore, the effective utilization of multi-layer ultra-wide ocular vasculature information from UW-OCTA remains underdeveloped. To tackle this challenge, we propose the first cross-modal fusion framework that leverages multi-modal information for diagnosing multiple diseases. Through extensive experiments conducted on our openly available M3OCTA dataset, we demonstrate the effectiveness and superior performance of our method, both in fixed and varying modalities settings. The construction of the M3OCTA dataset, the first multimodal OCTA dataset encompassing multiple diseases, aims to advance research in the ophthalmic image analysis community.
摘要
“超广角光学相干断层扫描血管成像(UW-OCTA)是一种新兴成像技术,相比传统OCTA具有显著优势:其扫描范围可达 24 x 20 $mm^{2}$,覆盖视网膜的前部和后部区域。然而,目前可用的UW-OCTA数据集缺乏全面的层级信息和相应的疾病标注。为解决这一限制,我们构建了开创性的M3OCTA数据集,这是首个多模态(即多层)、多疾病、最宽视场的UW-OCTA数据集。此外,对UW-OCTA中多层超广角眼部血管信息的有效利用仍不充分。为应对这一挑战,我们提出了首个跨模态融合框架,利用多模态信息诊断多种疾病。通过在公开的M3OCTA数据集上进行的大量实验,我们证明了该方法在固定和变化模态设置下的有效性和优越性能。M3OCTA数据集的构建旨在推动眼科图像分析领域的研究。”
TransONet: Automatic Segmentation of Vasculature in Computed Tomographic Angiograms Using Deep Learning
paper_authors: Alireza Bagheri Rajeoni, Breanna Pederson, Ali Firooz, Hamed Abdollahi, Andrew K. Smith, Daniel G. Clair, Susan M. Lessner, Homayoun Valafar
results: 研究结果表明,使用深度学习技术可以准确地分割CTA图像中的血管系统,两项任务的Dice准确率分别达到93.5%和80.64%。这些结果表明深度学习技术在分析血管系统健康状况方面具有高度的潜在价值和优势。Abstract
Pathological alterations in the human vascular system underlie many chronic diseases, such as atherosclerosis and aneurysms. However, manually analyzing diagnostic images of the vascular system, such as computed tomographic angiograms (CTAs) is a time-consuming and tedious process. To address this issue, we propose a deep learning model to segment the vascular system in CTA images of patients undergoing surgery for peripheral arterial disease (PAD). Our study focused on accurately segmenting the vascular system (1) from the descending thoracic aorta to the iliac bifurcation and (2) from the descending thoracic aorta to the knees in CTA images using deep learning techniques. Our approach achieved average Dice accuracies of 93.5% and 80.64% in test dataset for (1) and (2), respectively, highlighting its high accuracy and potential clinical utility. These findings demonstrate the use of deep learning techniques as a valuable tool for medical professionals to analyze the health of the vascular system efficiently and accurately. Please visit the GitHub page for this paper at https://github.com/pip-alireza/TransOnet.
摘要
人体血管系统的病理改变是多种慢性疾病(如动脉粥样硬化和动脉瘤)的基础。然而,人工分析血管系统的诊断图像(如计算机断层扫描血管造影,CTA)是一个耗时且繁琐的过程。为解决这个问题,我们提出了一个深度学习模型,用于在接受外周动脉疾病(PAD)手术患者的CTA图像中分割血管系统。我们的研究着重于利用深度学习技术在CTA图像中准确分割以下两段血管系统:1. 从降主动脉胸段到髂动脉分叉,在测试集上取得了93.5%的平均Dice准确率;2. 从降主动脉胸段到膝部,在测试集上取得了80.64%的平均Dice准确率。这些结果突显了该方法的高准确性及潜在的临床应用价值,表明深度学习技术可以作为医疗专业人员高效、准确地分析血管系统健康状况的宝贵工具。更多信息请参考本文的 GitHub 页面:https://github.com/pip-alireza/TransOnet。
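For reference, the Dice score used to evaluate these segmentations is the standard overlap measure: twice the intersection divided by the sum of the mask sizes. A minimal implementation:

```python
import numpy as np

def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient for binary masks: 2 * |intersection| / (|pred| + |target|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))

pred = np.zeros((64, 64), dtype=np.uint8); pred[10:40, 10:40] = 1
gt = np.zeros((64, 64), dtype=np.uint8);   gt[15:45, 15:45] = 1
print(f"Dice: {dice(pred, gt):.3f}")  # overlap of two offset 30x30 squares
```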
Learning transformer-based heterogeneously salient graph representation for multimodal fusion classification of hyperspectral image and LiDAR data
results: 通过在三个基准数据集上的实验和分析,证明所提方法能够在不同模态数据上提升分类性能,并与其他SOTA方法相比具有竞争力。Abstract
Data collected by different modalities can provide a wealth of complementary information, such as hyperspectral image (HSI) to offer rich spectral-spatial properties, synthetic aperture radar (SAR) to provide structural information about the Earth's surface, and light detection and ranging (LiDAR) to cover altitude information about ground elevation. Therefore, a natural idea is to combine multimodal images for refined and accurate land-cover interpretation. Although many efforts have been attempted to achieve multi-source remote sensing image classification, there are still three issues as follows: 1) indiscriminate feature representation without sufficiently considering modal heterogeneity, 2) abundant features and complex computations associated with modeling long-range dependencies, and 3) overfitting phenomenon caused by sparsely labeled samples. To overcome the above barriers, a transformer-based heterogeneously salient graph representation (THSGR) approach is proposed in this paper. First, a multimodal heterogeneous graph encoder is presented to encode distinctively non-Euclidean structural features from heterogeneous data. Then, a self-attention-free multi-convolutional modulator is designed for effective and efficient long-term dependency modeling. Finally, a mean forward is put forward in order to avoid overfitting. Based on the above structures, the proposed model is able to break through modal gaps to obtain differentiated graph representation with competitive time cost, even for a small fraction of training samples. Experiments and analyses on three benchmark datasets with various state-of-the-art (SOTA) methods show the performance of the proposed approach.
摘要
不同模态收集的数据可以提供丰富的互补信息:高光谱图像(HSI)提供丰富的光谱-空间特性,合成孔径雷达(SAR)提供地球表面的结构信息,激光雷达(LiDAR)提供地面高程信息。因此,一个自然的想法是融合多模态图像,以实现精细而准确的地物解译。虽然多源遥感图像分类已有许多研究,但仍存在以下三个问题:1)特征表示不加区分,未充分考虑模态异质性;2)建模长程依赖带来大量特征和复杂计算;3)稀疏标注样本导致过拟合现象。为了突破上述障碍,本文提出了一种基于Transformer的异质显著图表示(THSGR)方法。首先,提出了一种多模态异质图编码器,用于编码异质数据中独特的非欧结构特征。其次,设计了一种无自注意力的多卷积调制器,以高效地建模长程依赖。最后,提出了一种均值前馈(mean forward)以避免过拟合。基于上述结构,所提模型能够突破模态差异,以有竞争力的时间成本获得差异化的图表示,即使仅使用一小部分训练样本。在三个基准数据集上与多种最先进方法的实验和分析表明了该方法的性能。
results: 我们证明了MINT相比重复进行的单学习者教学可以显著提升教学速度,特别是在多个学习者之间可以相互交流的情况下。此外,我们进行了广泛的实验来验证MINT的实用性和效率。Abstract
We study the problem of teaching multiple learners simultaneously in the nonparametric iterative teaching setting, where the teacher iteratively provides examples to the learner for accelerating the acquisition of a target concept. This problem is motivated by the gap between current single-learner teaching setting and the real-world scenario of human instruction where a teacher typically imparts knowledge to multiple students. Under the new problem formulation, we introduce a novel framework -- Multi-learner Nonparametric Teaching (MINT). In MINT, the teacher aims to instruct multiple learners, with each learner focusing on learning a scalar-valued target model. To achieve this, we frame the problem as teaching a vector-valued target model and extend the target model space from a scalar-valued reproducing kernel Hilbert space used in single-learner scenarios to a vector-valued space. Furthermore, we demonstrate that MINT offers significant teaching speed-up over repeated single-learner teaching, particularly when the multiple learners can communicate with each other. Lastly, we conduct extensive experiments to validate the practicality and efficiency of MINT.
摘要
我们研究在非参数迭代教学设定下同时教授多个学习者的问题:教师迭代地向学习者提供示例,以加速目标概念的习得。这一问题源于当前单学习者教学设定与现实中人类教学场景之间的差距,在现实场景中,一位教师通常向多名学生传授知识。在新的问题设定下,我们提出了一个新的框架:多学习者非参数教学(MINT)。在MINT中,教师旨在指导多个学习者,每个学习者专注于学习一个标量值目标模型。为此,我们将该问题表述为教授一个向量值目标模型,并将目标模型空间从单学习者场景中使用的标量值再生核希尔伯特空间扩展到向量值空间。此外,我们证明了MINT相比重复进行的单学习者教学可以显著提升教学速度,特别是在多个学习者之间可以相互交流的情况下。最后,我们进行了广泛的实验来验证MINT的实用性和效率。
MPSeg : Multi-Phase strategy for coronary artery Segmentation
methods: 这篇论文使用了一种多阶段方法:首先根据血管独特的形态特征将其分为左冠状动脉(LCA)和右冠状动脉(RCA)两类,然后针对每一类部署专门的集成模型来执行具有挑战性的分割任务。由于LCA比RCA更复杂,还使用了一个修正模型来审查并更正LCA分割区域的初始类别预测。
results: 这篇论文在Automatic Region-based Coronary Artery Disease diagnostics using x-ray angiography imagEs (ARCADE) Segmentation Detection Algorithm challenge at MICCAI 2023 中表现了非常出色的效果。Abstract
Accurate segmentation of coronary arteries is a pivotal process in assessing cardiovascular diseases. However, the intricate structure of the cardiovascular system presents significant challenges for automatic segmentation, especially when utilizing methodologies like the SYNTAX Score, which relies extensively on detailed structural information for precise risk stratification. To address these difficulties and cater to this need, we present MPSeg, an innovative multi-phase strategy designed for coronary artery segmentation. Our approach specifically accommodates these structural complexities and adheres to the principles of the SYNTAX Score. Initially, our method segregates vessels into two categories based on their unique morphological characteristics: Left Coronary Artery (LCA) and Right Coronary Artery (RCA). Specialized ensemble models are then deployed for each category to execute the challenging segmentation task. Due to LCA's higher complexity over RCA, a refinement model is utilized to scrutinize and correct initial class predictions on segmented areas. Notably, our approach demonstrated exceptional effectiveness when evaluated in the Automatic Region-based Coronary Artery Disease diagnostics using x-ray angiography imagEs (ARCADE) Segmentation Detection Algorithm challenge at MICCAI 2023.
摘要
冠状动脉的精准分割是评估心血管疾病的关键环节。然而,心血管系统的复杂结构给自动分割带来了很大挑战,特别是在使用高度依赖详细结构信息进行精确风险分层的SYNTAX Score方法时。为了解决这些困难并满足这一需求,我们提出了MPSeg,一种为冠状动脉分割设计的创新多阶段策略。我们的方法专门针对这些结构复杂性,并遵循SYNTAX Score的原则。该方法首先根据血管独特的形态特征将其分为两类:左冠状动脉(LCA)和右冠状动脉(RCA)。然后,针对每一类部署专门的集成模型来执行具有挑战性的分割任务。由于LCA的复杂性高于RCA,我们使用一个修正模型来审查并更正初始的类别预测结果。值得注意的是,我们的方法在MICCAI 2023的ARCADE分割检测算法挑战赛的评估中表现出色。
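The multi-phase control flow can be sketched as a simple dispatcher: classify the vessel branch, route to the branch-specific ensemble, and refine LCA predictions. All model components below are runnable stubs, not the trained models from the paper.

```python
import numpy as np

def classify_vessel(image: np.ndarray) -> str:
    """Stage 1 stub: decide whether the angiogram shows the LCA or RCA."""
    return "LCA"  # placeholder; a trained classifier would decide this

def ensemble_segment(image, models):
    """Average the probability maps of an ensemble, then threshold."""
    prob = np.mean([m(image) for m in models], axis=0)
    return (prob > 0.5).astype(np.uint8)

def mpseg(image, lca_models, rca_models, refiner):
    branch = classify_vessel(image)
    if branch == "LCA":
        mask = ensemble_segment(image, lca_models)
        mask = refiner(image, mask)  # extra refinement: LCA is more complex
    else:
        mask = ensemble_segment(image, rca_models)
    return mask

# Toy stand-ins so the sketch runs end to end.
toy_model = lambda img: np.random.rand(*img.shape)
identity_refiner = lambda img, m: m
out = mpseg(np.zeros((256, 256)), [toy_model] * 3, [toy_model] * 3, identity_refiner)
print(out.shape)
```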
Semi-supervised ViT knowledge distillation network with style transfer normalization for colorectal liver metastases survival prediction
paper_authors: Mohamed El Amine Elforaici, Emmanuel Montagnon, Francisco Perdigon Romero, William Trung Le, Feryel Azzi, Dominique Trudel, Bich Nguyen, Simon Turcotte, An Tang, Samuel Kadoury
results: 我们的方法在一个临床数据集上进行评估,表现优于相关方法,OS和TTR的c-index分别为0.804(0.014)和0.733(0.014);同时,在TRG二分类任务中准确率达到86.9%至90.3%,在TRG三分类任务中准确率达到78.5%至82.1%。Abstract
Colorectal liver metastases (CLM) significantly impact colon cancer patients, influencing survival based on systemic chemotherapy response. Traditional methods like tumor grading scores (e.g., tumor regression grade - TRG) for prognosis suffer from subjectivity, time constraints, and expertise demands. Current machine learning approaches often focus on radiological data, yet the relevance of histological images for survival predictions, capturing intricate tumor microenvironment characteristics, is gaining recognition. To address these limitations, we propose an end-to-end approach for automated prognosis prediction using histology slides stained with H&E and HPS. We first employ a Generative Adversarial Network (GAN) for slide normalization to reduce staining variations and improve the overall quality of the images that are used as input to our prediction pipeline. We propose a semi-supervised model to perform tissue classification from sparse annotations, producing feature maps. We use an attention-based approach that weighs the importance of different slide regions in producing the final classification results. We exploit the extracted features for the metastatic nodules and surrounding tissue to train a prognosis model. In parallel, we train a vision Transformer (ViT) in a knowledge distillation framework to replicate and enhance the performance of the prognosis prediction. In our evaluation on a clinical dataset of 258 patients, our approach demonstrates superior performance with c-indexes of 0.804 (0.014) for OS and 0.733 (0.014) for TTR. Achieving 86.9% to 90.3% accuracy in predicting TRG dichotomization and 78.5% to 82.1% accuracy for the 3-class TRG classification task, our approach outperforms comparative methods. Our proposed pipeline can provide automated prognosis for pathologists and oncologists, and can greatly promote precision medicine progress in managing CLM patients.
摘要
结直肠癌肝转移(CLM)对结肠癌患者影响重大,其生存情况取决于对全身化疗的响应。传统的预后方法,如肿瘤分级评分(例如肿瘤退缩分级,TRG),存在主观性强、耗时以及依赖专业知识等问题。目前的机器学习方法通常专注于放射学数据,但组织学图像能够捕捉肿瘤微环境的细微特征,其对生存预测的价值正日益得到认可。为了解决这些限制,我们提出了一种端到端方法,利用H&E和HPS染色的组织学切片自动进行预后预测。我们首先使用生成对抗网络(GAN)对切片进行标准化,以减少染色差异并提高输入图像的整体质量。我们提出了一种半监督模型,从稀疏标注中进行组织分类并生成特征图;并采用基于注意力的机制,为不同切片区域在最终分类结果中赋予不同的权重。我们利用从转移结节及周围组织提取的特征来训练预后模型。同时,我们在知识蒸馏框架中训练视觉Transformer(ViT),以复现并增强预后预测的性能。在258名患者的临床数据集上的评估表明,我们的方法表现更优,OS和TTR的c-index分别为0.804(0.014)和0.733(0.014);TRG二分类准确率达86.9%至90.3%,三分类准确率达78.5%至82.1%,优于对比方法。我们提出的流程可为病理科医生和肿瘤科医生提供自动化预后预测,并有力推动CLM患者管理中的精准医疗进展。
BiHRNet: A Binary high-resolution network for Human Pose Estimation
results: 实验结果表明,BiHRNet在MPII数据集上取得了87.9的PCKh,超过了所有二值姿态估计网络;在COCO数据集上取得了70.8 mAP,优于大多数测试的轻量级全精度网络。Abstract
Human Pose Estimation (HPE) plays a crucial role in computer vision applications. However, it is difficult to deploy state-of-the-art models on resource-limited devices due to the high computational costs of the networks. In this work, a binary human pose estimator named BiHRNet (Binary HRNet) is proposed, whose weights and activations are expressed as $\pm$1. BiHRNet retains the keypoint extraction ability of HRNet, while using fewer computing resources by adapting binary neural network (BNN). In order to reduce the accuracy drop caused by network binarization, two categories of techniques are proposed in this work. For optimizing the training process for binary pose estimator, we propose a new loss function combining KL divergence loss with AWing loss, which makes the binary network obtain more comprehensive output distribution from its real-valued counterpart to reduce information loss caused by binarization. For designing more binarization-friendly structures, we propose a new information reconstruction bottleneck called IR Bottleneck to retain more information in the initial stage of the network. In addition, we also propose a multi-scale basic block called MS-Block for information retention. Our work has less computation cost with little precision drop. Experimental results demonstrate that BiHRNet achieves a PCKh of 87.9 on the MPII dataset, which outperforms all binary pose estimation networks. On the challenging COCO dataset, the proposed method enables the binary neural network to achieve 70.8 mAP, which is better than most tested lightweight full-precision networks.
摘要
人体姿态估计(HPE)在计算机视觉应用中扮演着关键角色。然而,由于最先进网络的计算成本很高,难以将其部署在资源受限的设备上。在本工作中,我们提出了一种名为BiHRNet的二值人体姿态估计器,其权重和激活均以±1表示。BiHRNet在保留HRNet关键点提取能力的同时,通过采用二值神经网络(BNN)使用更少的计算资源。为了减少网络二值化造成的精度下降,本工作提出了两类技术。为了优化二值姿态估计器的训练过程,我们提出了一种将KL散度损失与AWing损失相结合的新损失函数,使二值网络从其实值对应网络获得更全面的输出分布,以减少二值化带来的信息损失。为了设计对二值化更友好的结构,我们提出了一种称为IR瓶颈的新信息重建瓶颈,在网络的初始阶段保留更多信息。此外,我们还提出了一种称为MS-Block的多尺度基础块,用于信息保留。我们的工作以较小的精度损失换取更低的计算成本。实验结果表明,BiHRNet在MPII数据集上取得了87.9的PCKh,超过了所有二值姿态估计网络;在COCO数据集上,所提方法使二值神经网络达到70.8 mAP,优于大多数测试的轻量级全精度网络。
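The basic mechanism behind such binary networks is sign quantization to ±1 with a straight-through estimator (STE) so that gradients can still flow during training; the sketch below shows the generic operation, not BiHRNet's specific architecture.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Forward: sign() to +/-1. Backward: straight-through estimator, which
    passes gradients where the input lies in [-1, 1] and clips them elsewhere."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()

x = torch.randn(4, requires_grad=True)
y = BinarizeSTE.apply(x)
y.sum().backward()
print(y, x.grad)  # y is in {-1, +1}; gradient flows only where |x| <= 1
```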
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
results: 实验结果表明,与对基础模型Emu进行提示工程来生成贴纸相比,Style Tailoring方法将视觉质量提高了14%、提示对齐度提高了16.2%、场景多样性提高了15.3%。Abstract
We introduce Style Tailoring, a recipe to finetune Latent Diffusion Models (LDMs) in a distinct domain with high visual quality, prompt alignment and scene diversity. We choose sticker image generation as the target domain, as the images significantly differ from photorealistic samples typically generated by large-scale LDMs. We start with a competent text-to-image model, like Emu, and show that relying on prompt engineering with a photorealistic model to generate stickers leads to poor prompt alignment and scene diversity. To overcome these drawbacks, we first finetune Emu on millions of sticker-like images collected using weak supervision to elicit diversity. Next, we curate human-in-the-loop (HITL) Alignment and Style datasets from model generations, and finetune to improve prompt alignment and style alignment respectively. Sequential finetuning on these datasets poses a tradeoff between better style alignment and prompt alignment gains. To address this tradeoff, we propose a novel fine-tuning method called Style Tailoring, which jointly fits the content and style distribution and achieves best tradeoff. Evaluation results show our method improves visual quality by 14%, prompt alignment by 16.2% and scene diversity by 15.3%, compared to prompt engineering the base Emu model for stickers generation.
摘要
我们介绍Style Tailoring,一种在特定领域微调潜在扩散模型(LDM)的方法,旨在兼顾高视觉质量、提示对齐和场景多样性。我们选择贴纸图像生成作为目标领域,因为这些图像与大规模LDM通常生成的照片级真实感样本差别很大。我们从一个高水平的文本到图像模型(如Emu)出发,并表明仅依靠对照片级真实感模型进行提示工程来生成贴纸,会导致提示对齐和场景多样性较差。为了克服这些缺陷,我们首先在通过弱监督收集的数百万张贴纸风格图像上微调Emu,以激发多样性。接着,我们从模型生成结果中经人工参与(HITL)筛选出对齐数据集和风格数据集,并分别进行微调,以提升提示对齐和风格对齐。在这些数据集上顺序微调会在风格对齐与提示对齐的提升之间产生权衡。为解决这一权衡,我们提出了一种新的微调方法Style Tailoring,联合拟合内容分布与风格分布,取得最佳平衡。评估结果表明,与对基础Emu模型进行提示工程的贴纸生成相比,我们的方法将视觉质量提高了14%、提示对齐度提高了16.2%、场景多样性提高了15.3%。
Hierarchical Pruning of Deep Ensembles with Focal Diversity
paper_authors: Yanzhao Wu, Ka-Ho Chow, Wenqi Wei, Ling Liu
for: 这篇论文旨在提出一种新的深度神经网络集成剪枝方法,以提高集成的泛化能力和可靠性,并有效降低集成执行的时间和空间成本。
methods: 该方法基于三种新的集成剪枝技术:1)基于焦点多样性指标的集成剪枝方法,可以准确地捕捉集成中各成员网络的互补能力;2)基于焦点多样性的层次剪枝方法,可以迭代地找到低成本、高准确率的深度集成;3)焦点多样性共识方法,可以融合多个焦点多样性指标,以细化集成剪枝结果。
results: 在常用的基准数据集上,我们证明了所提出的层次集成剪枝方法能够有效识别泛化能力更强的高质量深度集成,同时在集成决策中具有更高的时间和空间效率。
Deep neural network ensembles combine the wisdom of multiple deep neural networks to improve the generalizability and robustness over individual networks. It has gained increasing popularity to study deep ensemble techniques in the deep learning community. Some mission-critical applications utilize a large number of deep neural networks to form deep ensembles to achieve desired accuracy and resilience, which introduces high time and space costs for ensemble execution. However, it still remains a critical challenge whether a small subset of the entire deep ensemble can achieve the same or better generalizability and how to effectively identify these small deep ensembles for improving the space and time efficiency of ensemble execution. This paper presents a novel deep ensemble pruning approach, which can efficiently identify smaller deep ensembles and provide higher ensemble accuracy than the entire deep ensemble of a large number of member networks. Our hierarchical ensemble pruning approach (HQ) leverages three novel ensemble pruning techniques. First, we show that the focal diversity metrics can accurately capture the complementary capacity of the member networks of an ensemble, which can guide ensemble pruning. Second, we design a focal diversity based hierarchical pruning approach, which will iteratively find high quality deep ensembles with low cost and high accuracy. Third, we develop a focal diversity consensus method to integrate multiple focal diversity metrics to refine ensemble pruning results, where smaller deep ensembles can be effectively identified to offer high accuracy, high robustness and high efficiency. Evaluated using popular benchmark datasets, we demonstrate that the proposed hierarchical ensemble pruning approach can effectively identify high quality deep ensembles with better generalizability while being more time and space efficient in ensemble decision making.
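To give a flavor of diversity-guided pruning, the sketch below scores candidate sub-ensembles with a simple mean pairwise-disagreement proxy and keeps the best small subset; this proxy merely stands in for the focal diversity metrics used in the paper.

```python
import itertools
import numpy as np

def disagreement_diversity(preds: np.ndarray) -> float:
    """preds: (n_models, n_samples) predicted labels. Mean pairwise
    disagreement rate, a simple stand-in for focal diversity."""
    pairs = itertools.combinations(range(preds.shape[0]), 2)
    rates = [np.mean(preds[i] != preds[j]) for i, j in pairs]
    return float(np.mean(rates))

def prune(preds: np.ndarray, k: int = 3):
    """Return the k-member sub-ensemble with the highest diversity score."""
    best = max(itertools.combinations(range(preds.shape[0]), k),
               key=lambda s: disagreement_diversity(preds[list(s)]))
    return best

all_preds = np.random.randint(0, 10, (6, 500))  # 6 member networks, toy labels
print("selected members:", prune(all_preds))
```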
SSASS: Semi-Supervised Approach for Stenosis Segmentation
results: 我们的方法在ARCADE(Automatic Region-based Coronary Artery Disease diagnostics using x-ray angiography imagEs)狭窄检测算法挑战赛中表现出色,且仅使用单个模型,而无需集成多个模型。这一成功表明我们的方法能够自动、高效地从医学影像数据中准确评估狭窄的严重程度,帮助医疗专业人员更准确地评估患者的情况。Abstract
Coronary artery stenosis is a critical health risk, and its precise identification in Coronary Angiography (CAG) can significantly aid medical practitioners in accurately evaluating the severity of a patient's condition. The complexity of coronary artery structures combined with the inherent noise in X-ray images poses a considerable challenge to this task. To tackle these obstacles, we introduce a semi-supervised approach for cardiovascular stenosis segmentation. Our strategy begins with data augmentation, specifically tailored to replicate the structural characteristics of coronary arteries. We then apply a pseudo-label-based semi-supervised learning technique that leverages the data generated through our augmentation process. Impressively, our approach demonstrated an exceptional performance in the Automatic Region-based Coronary Artery Disease diagnostics using x-ray angiography imagEs (ARCADE) Stenosis Detection Algorithm challenge by utilizing a single model instead of relying on an ensemble of multiple models. This success emphasizes our method's capability and efficiency in providing an automated solution for accurately assessing stenosis severity from medical imaging data.
摘要
冠状动脉狭窄是一项严重的健康风险,在冠状动脉造影(CAG)中对其进行精准识别,能显著帮助医生准确评估患者病情的严重程度。然而,冠状动脉结构的复杂性以及X射线图像固有的噪声给这一任务带来了巨大挑战。为克服这些难点,我们提出了一种用于心血管狭窄分割的半监督方法。我们的方法首先进行数据增强,专门针对冠状动脉的结构特征进行设计;随后应用基于伪标签的半监督学习技术,利用增强过程生成的数据。值得注意的是,我们的方法仅使用单个模型而非多模型集成,便在ARCADE狭窄检测算法挑战赛中取得了优异表现。这一成功突显了我们的方法在从医学影像数据中自动、准确评估狭窄严重程度方面的能力和效率。
Vision meets mmWave Radar: 3D Object Perception Benchmark for Autonomous Driving
results: 该论文通过对摄像头和雷达数据进行融合,实现了更高的感知精度和可靠性。这种融合方法可以在不同的光照和天气条件下实现更好的性能,并且可以提供更丰富的语义信息,以支持更高级别的自动驾驶功能。Abstract
Sensor fusion is crucial for an accurate and robust perception system on autonomous vehicles. Most existing datasets and perception solutions focus on fusing cameras and LiDAR. However, the collaboration between camera and radar is significantly under-exploited. The incorporation of rich semantic information from the camera, and reliable 3D information from the radar can potentially achieve an efficient, cheap, and portable solution for 3D object perception tasks. It can also be robust to different lighting or all-weather driving scenarios due to the capability of mmWave radars. In this paper, we introduce the CRUW3D dataset, including 66K synchronized and well-calibrated camera, radar, and LiDAR frames in various driving scenarios. Unlike other large-scale autonomous driving datasets, our radar data is in the format of radio frequency (RF) tensors that contain not only 3D location information but also spatio-temporal semantic information. This kind of radar format can enable machine learning models to generate more reliable object perception results after interacting and fusing the information or features between the camera and radar.
摘要
传感器融合对自动驾驶车辆实现准确而鲁棒的感知系统至关重要。现有的大多数数据集和感知方案侧重于融合相机和LiDAR,而相机与雷达之间的协同则远未被充分开发。融合来自相机的丰富语义信息与来自雷达的可靠3D信息,有望为3D物体感知任务提供一种高效、廉价、便携的解决方案。由于毫米波雷达的特性,该方案还能在不同光照或全天候驾驶场景下保持鲁棒性。在本文中,我们介绍了CRUW3D数据集,包含6.6万帧同步且标定良好的相机、雷达和LiDAR数据,覆盖多种驾驶场景。与其他大规模自动驾驶数据集不同,我们的雷达数据以射频(RF)张量的形式提供,不仅包含3D位置信息,还包含时空语义信息。这种雷达格式使机器学习模型在相机与雷达之间进行信息或特征的交互与融合后,能够生成更可靠的物体感知结果。
UniMOS: A Universal Framework For Multi-Organ Segmentation Over Label-Constrained Datasets
paper_authors: Can Li, Sheng Shao, Junyi Qu, Shuchao Pang, Mehmet A. Orgun
for: This paper aims to provide a universal framework for medical image segmentation tasks, which can utilize fully and partially labeled images as well as unlabeled images.
methods: The proposed framework, called UniMOS, uses a Multi-Organ Segmentation (MOS) module over fully/partially labeled data as the basenet, and incorporates a semi-supervised training module that combines consistent regularization and pseudolabeling techniques on unlabeled data.
results: The experiments show that the UniMOS framework exhibits excellent performance in several medical image segmentation tasks compared to other advanced methods, and also significantly improves data utilization and reduces annotation cost.Abstract
Machine learning models for medical images can help physicians diagnose and manage diseases. However, due to the fact that medical image annotation requires a great deal of manpower and expertise, as well as the fact that clinical departments perform image annotation based on task orientation, there is the problem of having fewer medical image annotation data with more unlabeled data and having many datasets that annotate only a single organ. In this paper, we present UniMOS, the first universal framework for achieving the utilization of fully and partially labeled images as well as unlabeled images. Specifically, we construct a Multi-Organ Segmentation (MOS) module over fully/partially labeled data as the basenet and designed a new target adaptive loss. Furthermore, we incorporate a semi-supervised training module that combines consistent regularization and pseudolabeling techniques on unlabeled data, which significantly improves the segmentation of unlabeled data. Experiments show that the framework exhibits excellent performance in several medical image segmentation tasks compared to other advanced methods, and also significantly improves data utilization and reduces annotation cost. Code and models are available at: https://github.com/lw8807001/UniMOS.
摘要
医学图像机器学习模型可以帮助医生诊断和管理疾病。然而,由于医学图像标注需要大量人力和专业知识,且临床科室按任务导向进行图像标注,导致医学图像标注数据较少而无标注数据较多,并且许多数据集仅标注单一器官。在本文中,我们提出了UniMOS,首个能够同时利用完全标注、部分标注以及无标注图像的通用框架。具体来说,我们在完全/部分标注数据上构建多器官分割(MOS)模块作为基础网络,并设计了一种新的目标自适应损失。此外,我们在框架中引入了半监督训练模块,在无标注数据上结合一致性正则化与伪标签技术,显著提高了无标注数据的分割效果。实验表明,该框架在多个医学图像分割任务中表现出色,优于其他先进方法,同时显著提高了数据利用率并降低了标注成本。代码和模型可在 https://github.com/lw8807001/UniMOS 获取。
paper_authors: Bozhen Hu, Bin Gao, Cheng Tan, Tongle Wu, Stan Z. Li
for: 这项研究旨在提高红外无损检测系统中的缺陷检测精度,利用其非接触、安全、高效的检测能力。
methods: 本研究提出了一种新方法DefectSAM,基于Segment Anything(SAM)模型,并利用通过大量实验室实验精心构建的数据集以及经验丰富专家提供的提示,以超越现有的最先进方法。
results: 实验结果显示,DefectSAM能够提升缺陷检测率,尤其是检测较弱和较小缺陷的能力,并获得更精确的缺陷尺寸估计。此外,在不同材料上进行的实验验证了其在缺陷检测中的可靠性和有效性。Abstract
Defect detection plays a crucial role in infrared non-destructive testing systems, offering non-contact, safe, and efficient inspection capabilities. However, challenges such as low resolution, high noise, and uneven heating in infrared thermal images hinder comprehensive and accurate defect detection. In this study, we propose DefectSAM, a novel approach for segmenting defects on highly noisy thermal images based on the widely adopted model, Segment Anything (SAM)\cite{kirillov2023segany}. Harnessing the power of a meticulously curated dataset generated through labor-intensive lab experiments and valuable prompts from experienced experts, DefectSAM surpasses existing state-of-the-art segmentation algorithms and achieves significant improvements in defect detection rates. Notably, DefectSAM excels in detecting weaker and smaller defects on complex and irregular surfaces, reducing the occurrence of missed detections and providing more accurate defect size estimations. Experimental studies conducted on various materials have validated the effectiveness of our solutions in defect detection, which hold significant potential to expedite the evolution of defect detection tools, enabling enhanced inspection capabilities and accuracy in defect identification.
摘要
缺陷检测在红外无损检测系统中扮演着关键角色,提供无接触、安全、高效的检测能力。然而,红外热图像中的低分辨率、高噪声和加热不均等问题,阻碍了全面而准确的缺陷检测。本研究基于广泛采用的Segment Anything(SAM)模型,提出了一种在高噪声热图像上分割缺陷的新方法DefectSAM。借助通过大量实验室实验精心构建的数据集以及经验丰富专家提供的宝贵提示,DefectSAM超越了现有最先进的分割算法,显著提高了缺陷检测率。尤其是,DefectSAM擅长在复杂且不规则的表面上检测较弱、较小的缺陷,减少漏检的发生,并提供更准确的缺陷尺寸估计。在多种材料上进行的实验研究验证了我们的方案在缺陷检测中的有效性,这有望加速缺陷检测工具的发展,提升检测能力与缺陷识别的准确性。
paper_authors: Karel D’Oosterlinck, Thomas Demeester, Chris Develder, Christopher Potts
for: 论文目的是提高模型解释性和修改模型行为的可能性。
methods: 论文使用了修改模型行为以探索人类概念的方法。
results: 研究发现,可以通过修改模型行为来提高模型内部表示的解释性,并且可以通过这种方法找到相关的表示和修改它们。Abstract
Model interpretability and model editing are crucial goals in the age of large language models. Interestingly, there exists a link between these two goals: if a method is able to systematically edit model behavior with regard to a human concept of interest, this editor method can help make internal representations more interpretable by pointing towards relevant representations and systematically manipulating them.
摘要
模型可解释性和模型编辑是大语言模型时代的两个关键目标。有趣的是,这两个目标之间存在联系:如果一种方法能够针对人类感兴趣的概念系统地编辑模型行为,那么这种编辑方法就能通过指出相关表示并对其进行系统操作,帮助提高模型内部表示的可解释性。
paper_authors: Andrew S. Nencka, L. Tugan Muftuler, Peter LaViolette, Kevin M. Koch
for: probing the workings of deep neural networks, specifically the Facebook Galactica-125M language model
methods: functional neuroimaging techniques were applied to the model, using block-designed task-based prompt sequences to probe its functional structure
results: distinct, overlapping networks were identified for each task, with the most overlap between medical imaging and pathology networks; the identified functional networks were repeatable across repeated performances of related tasks and accurately identified the presented task. Abstract
Background: Deep neural networks have proven to be powerful computational tools for modeling, prediction, and generation. However, the workings of these models have generally been opaque. Recent work has shown that the performance of some models is modulated by overlapping functional networks of connections within the models. Here, the techniques of functional neuroimaging are applied to an exemplary large language model to probe its functional structure. Methods: A series of block-designed task-based prompt sequences were generated to probe the Facebook Galactica-125M model. Tasks included prompts relating to political science, medical imaging, paleontology, archeology, pathology, and random strings presented in an off/on/off pattern with prompts about other random topics. For the generation of each output token, all layer output values were saved to create an effective time series. General linear models were fit to the data to identify layer output values which were active with the tasks. Results: Distinct, overlapping networks were identified with each task. Most overlap was observed between medical imaging and pathology networks. These networks were repeatable across repeated performance of related tasks, and correspondence of identified functional networks and activation in tasks not used to define the functional networks was shown to accurately identify the presented task. Conclusion: The techniques of functional neuroimaging can be applied to deep neural networks as a means to probe their workings. Identified functional networks hold the potential for use in model alignment, modulation of model output, and identifying weights to target in fine-tuning.
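A minimal numpy sketch of the paper's analysis idea follows: treat each unit's output across generated tokens as a time series and fit a general linear model against the block on/off task regressor. Shapes and the unit-selection cutoff are illustrative assumptions:

```python
import numpy as np

# activations: (n_tokens, n_units) -- layer outputs saved per generated token.
# task_on:     (n_tokens,) binary off/on/off block design for one task prompt.
rng = np.random.default_rng(0)
activations = rng.normal(size=(300, 768))
task_on = np.r_[np.zeros(100), np.ones(100), np.zeros(100)]

# Design matrix: task regressor plus an intercept column.
X = np.column_stack([task_on, np.ones_like(task_on)])

# Fit the GLM to every unit at once: beta = argmin ||X beta - Y||^2.
beta, *_ = np.linalg.lstsq(X, activations, rcond=None)
task_beta = beta[0]  # (n_units,) task effect per unit

# Units whose outputs track the task blocks form the "functional network".
network = np.argsort(np.abs(task_beta))[-50:]
```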
The Hidden Linear Structure in Score-Based Models and its Application
methods: The researchers use a normative analysis of the score function to identify this structure, supported by empirical validation of pre-trained image diffusion models and theoretical analysis.
results: They find that for well-trained diffusion models, the learned score at high noise levels is well approximated by the linear score of a Gaussian. This finding makes it possible to predict the initial diffusion trajectory analytically and to accelerate image sampling by skipping the initial phase without sacrificing image quality. Abstract
Score-based models have achieved remarkable results in the generative modeling of many domains. By learning the gradient of the smoothed data distribution, they can iteratively generate samples from complex distributions, e.g., natural images. However, is there any universal structure in the gradient field that will eventually be learned by any neural network? Here, we aim to find such structures through a normative analysis of the score function. First, we derive the closed-form solution to the score-based model with a Gaussian score. We claim that for well-trained diffusion models, the learned score at a high noise scale is well approximated by the linear score of a Gaussian. We demonstrate this through empirical validation of pre-trained image diffusion models and theoretical analysis of the score function. This finding enables us to precisely predict the initial diffusion trajectory using the analytical solution and to accelerate image sampling by 15-30% by skipping the initial phase without sacrificing image quality. Our finding of the linear structure in the score-based model has implications for better model design and data pre-processing.
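For reference, the "linear score of Gaussian" invoked above can be written out. This is standard, assuming the noise-smoothed data distribution at scale $\sigma_t$ is approximated by a Gaussian with the data mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$:

```latex
% The score of a Gaussian density is linear in x:
\nabla_{\mathbf{x}} \log \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}, \mathbf{C})
  = -\mathbf{C}^{-1}(\mathbf{x} - \boldsymbol{\mu}).
% For data x_0 ~ N(mu, Sigma) convolved with Gaussian noise of scale sigma_t,
% C = Sigma + sigma_t^2 I, so at high noise the learned score is approximately
s_\theta(\mathbf{x}, t)
  \approx -\left(\boldsymbol{\Sigma} + \sigma_t^2 \mathbf{I}\right)^{-1}
          (\mathbf{x} - \boldsymbol{\mu}).
```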
Verified Compositional Neuro-Symbolic Control for Stochastic Systems with Temporal Logic Tasks
paper_authors: Jun Wang, Kaiyuan Tan, Zihe Sun, Yiannis Kantaros
for: The paper aims to learn neural network (NN) controllers for autonomous agents with unknown and stochastic dynamics, tasked with complex missions captured by Linear Temporal Logic (LTL).
methods: The proposed approach integrates automata theory and data-driven reachability analysis tools for NN-controlled stochastic systems; the resulting neuro-symbolic controller allows the agent to generate safe behaviors for unseen complex temporal logic tasks in a zero-shot fashion by leveraging its base skills.
results: The paper shows correctness of the proposed method, provides conditions under which it is complete, and demonstrates the method through extensive numerical simulations and hardware experiments on robot navigation tasks. Abstract
Several methods have been proposed recently to learn neural network (NN) controllers for autonomous agents, with unknown and stochastic dynamics, tasked with complex missions captured by Linear Temporal Logic (LTL). Due to the sample-inefficiency of the majority of these works, compositional learning methods have been proposed decomposing the LTL specification into smaller sub-tasks. Then, separate controllers are learned and composed to satisfy the original task. A key challenge within these approaches is that they often lack safety guarantees or the provided guarantees are impractical. This paper aims to address this challenge. Particularly, we consider autonomous systems with unknown and stochastic dynamics and LTL-encoded tasks. We assume that the system is equipped with a finite set of base skills modeled by trained NN feedback controllers. Our goal is to check if there exists a temporal composition of the trained NN controllers - and if so, to compute it - that will yield a composite system behavior that satisfies the assigned LTL task with probability one. We propose a new approach that relies on a novel integration of automata theory and data-driven reachability analysis tools for NN-controlled stochastic systems. The resulting neuro-symbolic controller allows the agent to generate safe behaviors for unseen complex temporal logic tasks in a zero-shot fashion by leveraging its base skills. We show correctness of the proposed method and we provide conditions under which it is complete. To the best of our knowledge, this is the first work that designs verified temporal compositions of NN controllers for unknown and stochastic systems. Finally, we provide extensive numerical simulations and hardware experiments on robot navigation tasks to demonstrate the proposed method.
Formal concept analysis for evaluating intrinsic dimension of a natural language
results: The study finds that the intrinsic dimensions of these languages are far lower than the dimensions used in popular neural network models for natural language processing. Abstract
Some results of a computational experiment for determining the intrinsic dimension of linguistic varieties for the Bengali and Russian languages are presented. At the same time, both sets of words and sets of bigrams in these languages were considered separately. The method used to solve this problem was based on formal concept analysis algorithms. It was found that the intrinsic dimensions of these languages are significantly less than the dimensions used in popular neural network models in natural language processing.
Exploring the Consistency, Quality and Challenges in Manual and Automated Coding of Free-text Diagnoses from Hospital Outpatient Letters
paper_authors: Warren Del-Pinto, George Demetriou, Meghna Jani, Rikesh Patel, Leanne Gray, Alex Bulcock, Niels Peek, Andrew S. Kanter, William G Dixon, Goran Nenadic
for: This paper aims to evaluate the quality and consistency of manual and automated clinical coding of diagnoses from hospital outpatient letters.
methods: The authors used 100 randomly selected letters; two human clinicians coded the diagnosis lists to SNOMED CT, automated coding was performed using IMO's Concept Tagger, and a panel of clinicians constructed a gold standard from a subset of the annotated diagnoses.
results: Humans slightly out-performed automated coding, both performed notably better when the free-text description contained only a single diagnosis, and the panel of clinicians considered automated coding acceptable in approximately 90% of cases. Abstract
Coding of unstructured clinical free-text to produce interoperable structured data is essential to improve direct care, support clinical communication and to enable clinical research. However, manual clinical coding is difficult and time-consuming, which motivates the development and use of natural language processing for automated coding. This work evaluates the quality and consistency of both manual and automated clinical coding of diagnoses from hospital outpatient letters. Using 100 randomly selected letters, two human clinicians performed coding of diagnosis lists to SNOMED CT. Automated coding was also performed using IMO's Concept Tagger. A gold standard was constructed by a panel of clinicians from a subset of the annotated diagnoses. This was used to evaluate the quality and consistency of both manual and automated coding via (1) a distance-based metric, treating SNOMED CT as a graph, and (2) a qualitative metric agreed upon by the panel of clinicians. Correlation between the two metrics was also evaluated. Comparing human and computer-generated codes to the gold standard, the results indicate that humans slightly out-performed automated coding, while both performed notably better when there was only a single diagnosis contained in the free-text description. Automated coding was considered acceptable by the panel of clinicians in approximately 90% of cases.
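A small sketch of the distance-based metric idea: treat (a fragment of) the SNOMED CT is-a hierarchy as a graph and score a predicted code by its shortest-path distance to the gold code. The miniature graph and concept names below are fabricated for illustration, not real SNOMED CT content:

```python
import networkx as nx

# A toy fragment of an is-a hierarchy (nodes and edges are illustrative).
g = nx.Graph()
g.add_edges_from([
    ("disorder", "inflammatory_disorder"),
    ("inflammatory_disorder", "arthritis"),
    ("arthritis", "rheumatoid_arthritis"),
    ("arthritis", "osteoarthritis"),
])

def code_distance(predicted: str, gold: str) -> int:
    """Graph distance between predicted and gold concepts; 0 = exact match."""
    return nx.shortest_path_length(g, predicted, gold)

print(code_distance("rheumatoid_arthritis", "rheumatoid_arthritis"))  # 0
print(code_distance("osteoarthritis", "rheumatoid_arthritis"))        # 2
```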
Artificial Intelligence in Fetal Resting-State Functional MRI Brain Segmentation: A Comparative Analysis of 3D UNet, VNet, and HighRes-Net Models
paper_authors: Farzan Vahedifard, Xuchu Liu, Mehmet Kocak, H. Asher Ai, Mark Supanich, Christopher Sica, Kranthi K Marathu, Seth Adler, Maysam Orouskhani, Sharon Byrd
methods: The study uses artificial intelligence (AI) to automate brain segmentation, training AI models on an open-source fetal fMRI dataset.
results: The VNet model shows promise for improving segmentation accuracy, but further tuning and investigation are needed to fully explore the potential and limitations of each model. Abstract
Introduction: Fetal resting-state functional magnetic resonance imaging (rs-fMRI) is a rapidly evolving field that provides valuable insight into brain development before birth. Accurate segmentation of the fetal brain from the surrounding tissue in nonstationary 3D brain volumes poses a significant challenge in this domain. Currently available tools reach an accuracy of 0.15. Aim: This study introduces a novel application of artificial intelligence (AI) for automated brain segmentation in fetal rs-fMRI. Open datasets were employed to train AI models, assess their performance, and analyze their capabilities and limitations in addressing the specific challenges associated with fetal brain fMRI segmentation. Method: We utilized an open-source fetal functional MRI (fMRI) dataset consisting of 160 cases (reference: fetal-fMRI - OpenNeuro). An AI model for fMRI segmentation was developed using a 5-fold cross-validation methodology. Three AI models were employed: 3D UNet, VNet, and HighRes-Net. Optuna, an automated hyperparameter-tuning tool, was used to optimize these models. Results and Discussion: The Dice scores of the three AI models (VNet, UNet, and HighRes-Net) were compared, including a comparison between manually tuned and automatically tuned models using Optuna. Our findings shed light on the performance of different AI models for fetal resting-state fMRI brain segmentation. Although the VNet model showed promise in this application, further investigation is required to fully explore the potential and limitations of each model, including the HighRes-Net model. This study serves as a foundation for further extensive research into the applications of AI in fetal brain fMRI segmentation.
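A minimal sketch of the Optuna tuning loop described above; train_and_validate is a placeholder standing in for one cross-validation fold that returns a Dice score, and the searched hyperparameters are assumptions:

```python
import optuna

def train_and_validate(lr: float, dropout: float) -> float:
    """Placeholder: train a segmentation model for one fold, return mean Dice."""
    return 1.0 - abs(lr - 1e-3) - dropout * 0.1  # stand-in objective

def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    return train_and_validate(lr, dropout)

study = optuna.create_study(direction="maximize")  # maximize Dice
study.optimize(objective, n_trials=50)
print(study.best_params)
```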
Integration and Implementation Strategies for AI Algorithm Deployment with Smart Routing Rules and Workflow Management
paper_authors: Barbaros Selnur Erdal, Vikash Gupta, Mutlu Demirer, Kim H. Fair, Richard D. White, Jeff Blair, Barbara Deichert, Laurie Lafleur, Ming Melvin Qin, David Bericat, Brad Genereaux
results: The paper argues that addressing these challenges through interoperability and enterprise-grade scalability can increase the adoption of healthcare AI. Standards such as DICOM, HL7, and IHE underpin common imaging workflows in the health domain, with Laurel Bridge leading transformational efforts in this area. Project MONAI, established in 2019, is highlighted as a key initiative for redefining the development of medical AI applications, and its MONAI Deploy App SDK simplifies the packaging and deployment of AI applications, enabling scalable, standardized deployment patterns. Abstract
This paper reviews the challenges hindering the widespread adoption of artificial intelligence (AI) solutions in the healthcare industry, focusing on computer vision applications for medical imaging, and how interoperability and enterprise-grade scalability can be used to address these challenges. The complex nature of healthcare workflows, intricacies in managing large and secure medical imaging data, and the absence of standardized frameworks for AI development pose significant barriers and require a new paradigm to address them. The role of interoperability is examined in this paper as a crucial factor in connecting disparate applications within healthcare workflows. Standards such as DICOM, Health Level 7 HL7, and Integrating the Healthcare Enterprise (IHE) are highlighted as foundational for common imaging workflows. A specific focus is placed on the role of DICOM gateways, with Laurel Bridge leading transformational efforts in this area. To drive enterprise scalability, new tools are needed. Project MONAI, established in 2019, is introduced as an initiative aiming to redefine the development of medical AI applications. The MONAI Deploy App SDK, a component of Project MONAI, is identified as a key tool in simplifying the packaging and deployment process, enabling repeatable, scalable, and standardized deployment patterns for AI applications. The abstract underscores the potential impact of successful AI adoption in healthcare, offering physicians both life-saving and time-saving insights and driving efficiencies in radiology department workflows. The collaborative efforts between academia and industry, exemplified by collaborations with organizations like NVIDIA and Laurel Bridge, are emphasized as essential for advancing the adoption of healthcare AI solutions.
Exploring Machine Learning Models for Federated Learning: A Review of Approaches, Performance, and Limitations
results: The paper reviews the components of federated learning and its applications across different scenarios, and also discusses open problems and future research directions. Abstract
In the growing world of artificial intelligence, federated learning is a distributed learning framework enhanced to preserve the privacy of individuals' data. Federated learning lays the groundwork for collaborative research in areas where the data is sensitive. Federated learning has several implications for real-world problems. In times of crisis, when real-time decision-making is critical, federated learning allows multiple entities to work collectively without sharing sensitive data. This distributed approach enables us to leverage information from multiple sources and gain more diverse insights. This paper is a systematic review of the literature on privacy-preserving machine learning in the last few years based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Specifically, we have presented an extensive review of supervised/unsupervised machine learning algorithms, ensemble methods, meta-heuristic approaches, blockchain technology, and reinforcement learning used in the framework of federated learning, in addition to an overview of federated learning applications. This paper reviews the literature on the components of federated learning and its applications in the last few years. The main purpose of this work is to provide researchers and practitioners with a comprehensive overview of federated learning from the machine learning point of view. A discussion of some open problems and future research directions in federated learning is also provided.
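As a concrete anchor for the framework the review surveys, here is a minimal numpy sketch of the FedAvg aggregation step: clients train locally and the server averages their parameters weighted by local dataset size, so raw data never leaves the device:

```python
import numpy as np

def fedavg(client_weights: list[np.ndarray], client_sizes: list[int]) -> np.ndarray:
    """Server-side FedAvg: dataset-size-weighted average of client parameters."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three clients, each holding a locally trained parameter vector.
clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
sizes = [100, 50, 50]
global_weights = fedavg(clients, sizes)  # -> array([2.5, 3.5])
```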
paper_authors: Jiageng Mao, Junjie Ye, Yuxi Qian, Marco Pavone, Yue Wang
for: This paper aims to integrate human-like intelligence into autonomous driving systems by leveraging Large Language Models (LLMs) as a cognitive agent.
methods: The proposed approach, called Agent-Driver, includes a versatile tool library, a cognitive memory of common sense and experiential knowledge, and a reasoning engine for chain-of-thought reasoning, task planning, motion planning, and self-reflection.
results: The approach significantly outperforms state-of-the-art driving methods on the large-scale nuScenes benchmark, with superior interpretability and few-shot learning ability. Abstract
Human-level driving is an ultimate goal of autonomous driving. Conventional approaches formulate autonomous driving as a perception-prediction-planning framework, yet their systems do not capitalize on the inherent reasoning ability and experiential knowledge of humans. In this paper, we propose a fundamental paradigm shift from current pipelines, exploiting Large Language Models (LLMs) as a cognitive agent to integrate human-like intelligence into autonomous driving systems. Our approach, termed Agent-Driver, transforms the traditional autonomous driving pipeline by introducing a versatile tool library accessible via function calls, a cognitive memory of common sense and experiential knowledge for decision-making, and a reasoning engine capable of chain-of-thought reasoning, task planning, motion planning, and self-reflection. Powered by LLMs, our Agent-Driver is endowed with intuitive common sense and robust reasoning capabilities, thus enabling a more nuanced, human-like approach to autonomous driving. We evaluate our approach on the large-scale nuScenes benchmark, and extensive experiments substantiate that our Agent-Driver significantly outperforms the state-of-the-art driving methods by a large margin. Our approach also demonstrates superior interpretability and few-shot learning ability to these methods. Project page: https://github.com/USC-GVL/Agent-Driver/blob/main/index.html
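A hedged sketch of the tool-library pattern described above: the LLM emits a function call by name and a dispatcher executes the matching tool, feeding the result back into the model's context. The tool names and call format below are invented for illustration and are not Agent-Driver's actual interface:

```python
# Hypothetical tool library keyed by function name.
def get_detections():
    return ["car_ahead_20m"]

def get_ego_state():
    return {"speed_mps": 12.0}

TOOLS = {"get_detections": get_detections, "get_ego_state": get_ego_state}

def run_agent_step(llm_output: str):
    """Dispatch a function call chosen by the LLM, e.g. 'CALL get_ego_state'."""
    if llm_output.startswith("CALL "):
        name = llm_output.removeprefix("CALL ").strip()
        if name in TOOLS:
            return TOOLS[name]()  # result is fed back into the LLM context
    return None

print(run_agent_step("CALL get_detections"))  # ['car_ahead_20m']
```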
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
results: In human evaluations, the generated videos are strongly preferred in quality over all prior work (81% vs. Google's Imagen Video, 90% vs. Nvidia's PYOCO, 96% vs. Meta's Make-A-Video) and outperform commercial solutions such as RunwayML's Gen2 and Pika Labs. The factorized approach also naturally lends itself to animating images from a user's text prompt, with generations preferred 96% over prior work. Abstract
We present Emu Video, a text-to-video generation model that factorizes the generation into two steps: first generating an image conditioned on the text, and then generating a video conditioned on the text and the generated image. We identify critical design decisions--adjusted noise schedules for diffusion, and multi-stage training--that enable us to directly generate high quality and high resolution videos, without requiring a deep cascade of models as in prior work. In human evaluations, our generated videos are strongly preferred in quality compared to all prior work--81% vs. Google's Imagen Video, 90% vs. Nvidia's PYOCO, and 96% vs. Meta's Make-A-Video. Our model outperforms commercial solutions such as RunwayML's Gen2 and Pika Labs. Finally, our factorizing approach naturally lends itself to animating images based on a user's text prompt, where our generations are preferred 96% over prior work.
Using linear initialisation to improve speed of convergence and fully-trained error in Autoencoders
results: The authors compared the Straddled Matrix Initialiser against seven state-of-the-art initialisation methods by training autoencoders on three datasets; the Straddled Matrix Initialiser clearly outperformed all other methods in every experiment. Abstract
Good weight initialisation is an important step in successful training of Artificial Neural Networks. Over time a number of improvements have been proposed to this process. In this paper we introduce a novel weight initialisation technique called the Straddled Matrix Initialiser. This initialisation technique is motivated by our assumption that major, global-scale relationships in data are linear with only smaller effects requiring complex non-linearities. Combination of Straddled Matrix and ReLU activation function initialises a Neural Network as a de facto linear model, which we postulate should be a better starting point for optimisation given our assumptions. We test this by training autoencoders on three datasets using Straddled Matrix and seven other state-of-the-art weight initialisation techniques. In all our experiments the Straddled Matrix Initialiser clearly outperforms all other methods.
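The exact form of the Straddled Matrix is not reproduced in this abstract, but the stated goal (a ReLU network that starts as a de facto linear model) can be illustrated with a paired positive/negative construction, since relu(x) - relu(-x) = x. The sketch below is one way to realize that goal, not the paper's initializer:

```python
import torch
import torch.nn as nn

def linear_start_autoencoder(dim: int) -> nn.Sequential:
    """ReLU autoencoder initialized to compute the identity exactly.

    The encoder carries +x and -x on separate units, so the decoder can
    recover x as relu(x) - relu(-x); the network starts as a linear map.
    """
    enc = nn.Linear(dim, 2 * dim, bias=False)
    dec = nn.Linear(2 * dim, dim, bias=False)
    eye = torch.eye(dim)
    with torch.no_grad():
        enc.weight.copy_(torch.cat([eye, -eye], dim=0))  # rows: [+I; -I]
        dec.weight.copy_(torch.cat([eye, -eye], dim=1))  # cols: [+I, -I]
    return nn.Sequential(enc, nn.ReLU(), dec)

net = linear_start_autoencoder(4)
x = torch.randn(2, 4)
assert torch.allclose(net(x), x, atol=1e-6)  # identity at initialization
```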
A novel post-hoc explanation comparison metric and applications
methods: The paper introduces a new metric, the Shreyan Distance, and uses it to compare the consistency of two explainable AI systems (SHAP and LIME) on regression and classification tasks.
results: The average Shreyan Distance varies significantly between the two learning tasks, indicating that consistency between explainers depends not only on inherent properties of the explainers themselves but also on the type of learning task; the paper also contributes the XAISuite library, which integrates the Shreyan Distance algorithm into machine learning pipelines. Abstract
Explanatory systems make the behavior of machine learning models more transparent, but are often inconsistent. To quantify the differences between explanatory systems, this paper presents the Shreyan Distance, a novel metric based on the weighted difference between ranked feature importance lists produced by such systems. This paper uses the Shreyan Distance to compare two explanatory systems, SHAP and LIME, for both regression and classification learning tasks. Because we find that the average Shreyan Distance varies significantly between these two tasks, we conclude that consistency between explainers not only depends on inherent properties of the explainers themselves, but also the type of learning task. This paper further contributes the XAISuite library, which integrates the Shreyan distance algorithm into machine learning pipelines.
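The abstract defines the Shreyan Distance only as a weighted difference between ranked feature-importance lists, so the sketch below uses an assumed inverse-rank weighting; treat it as a stand-in, not the published formula:

```python
def shreyan_distance_sketch(ranking_a: list[str], ranking_b: list[str]) -> float:
    """Weighted disagreement between two ranked feature lists (illustrative).

    Features ranked higher by explainer A contribute more when explainer B
    ranks them differently; the 1/(rank+1) weighting is an assumption.
    """
    pos_b = {feat: i for i, feat in enumerate(ranking_b)}
    score = 0.0
    for i, feat in enumerate(ranking_a):
        j = pos_b.get(feat, len(ranking_b))  # missing feature -> worst rank
        score += abs(i - j) / (i + 1)
    return score

shap_rank = ["age", "income", "zip", "height"]
lime_rank = ["income", "age", "height", "zip"]
print(shreyan_distance_sketch(shap_rank, lime_rank))  # 0 means identical rankings
```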
PEFT-MedAware: Large Language Model for Medical Awareness
results: The study finds that PEFT improves the accuracy of the Falcon-1b model on medical question-answering tasks while updating only 0.44% of its trainable parameters, achieving efficiency under limited computational resources. Abstract
Chat models are capable of answering a wide range of questions, however, the accuracy of their responses is highly uncertain. In this research, we propose a specialized PEFT-MedAware model where we utilize parameter-efficient fine-tuning (PEFT) to enhance the Falcon-1b large language model on specialized MedQuAD data consisting of 16,407 medical QA pairs, leveraging only 0.44% of its trainable parameters to enhance computational efficiency. The paper adopts data preprocessing and PEFT to optimize model performance, complemented by a BitsAndBytesConfig for efficient transformer training. The resulting model was capable of outperforming other LLMs in medical question-answering tasks in specific domains with greater accuracy utilizing limited computational resources making it suitable for deployment in resource-constrained environments. We propose further improvements through expanded datasets, larger models, and feedback mechanisms for sustained medical relevancy. Our work highlights the efficiency gains and specialized capabilities of PEFT in medical AI, outpacing standard models in precision without extensive resource demands. The proposed model and data are released for research purposes only.
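A minimal sketch of parameter-efficient fine-tuning in the spirit of the abstract, using Hugging Face peft with LoRA on a Falcon-family checkpoint. The model id, rank, and target modules are assumptions; the paper's exact PEFT configuration is not given here:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

# Model id is an assumption standing in for "Falcon-1b".
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-rw-1b")

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                # low-rank adapter dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"], # Falcon's fused attention projection
)
model = get_peft_model(model, config)
model.print_trainable_parameters()      # only a fraction of a percent trainable
```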
Use GPT-J Prompt Generation with RoBERTa for NER Models on Diagnosis Extraction of Periodontal Diagnosis from Electronic Dental Records
results: Prompts with a lower ratio of negative examples and a higher number of examples achieved the best direct-test results, with an F1 score of 0.72. After training the RoBERTa model, performance was consistent across all settings (F1 0.92-0.97), indicating that seed quality matters more than quantity. This prompt-generation approach offers a fast and efficient way to mine clinical notes for periodontal diagnoses. Abstract
This study explored the usability of prompt generation on named entity recognition (NER) tasks and the performance in different settings of the prompt. The prompt generation by GPT-J models was utilized to directly test the gold standard as well as to generate the seed and further fed to the RoBERTa model with the spaCy package. In the direct test, a lower ratio of negative examples with higher numbers of examples in prompt achieved the best results with a F1 score of 0.72. The performance revealed consistency, 0.92-0.97 in the F1 score, in all settings after training with the RoBERTa model. The study highlighted the importance of seed quality rather than quantity in feeding NER models. This research reports on an efficient and accurate way to mine clinical notes for periodontal diagnoses, allowing researchers to easily and quickly build a NER model with the prompt generation approach.
Extracting periodontitis diagnosis in clinical notes with RoBERTa and regular expression
results: As the complexity of the regular expression (RE) algorithms increased, the F1 score rose from 0.3-0.4 to around 0.9. The NER models showed excellent predictions: the simple RE method scored 0.84-0.92 on the evaluation metrics, while the advanced and combined RE methods scored 0.95-0.99. The study illustrates how combining NER methods and NLP models can turn target information in free text into structured data and fill in missing diagnoses. Abstract
This study aimed to utilize text processing and natural language processing (NLP) models to mine clinical notes for the diagnosis of periodontitis and to evaluate the performance of a named entity recognition (NER) model on different regular expression (RE) methods. Two complexity levels of RE methods were used to extract and generate the training data. The SpaCy package and RoBERTa transformer models were used to build the NER model and evaluate its performance with the manual-labeled gold standards. The comparison of the RE methods with the gold standard showed that as the complexity increased in the RE algorithms, the F1 score increased from 0.3-0.4 to around 0.9. The NER models demonstrated excellent predictions, with the simple RE method showing 0.84-0.92 in the evaluation metrics, and the advanced and combined RE method demonstrating 0.95-0.99 in the evaluation. This study provided an example of the benefit of combining NER methods and NLP models in extracting target information from free-text to structured data and fulfilling the need for missing diagnoses from unstructured notes.
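To make the "two complexity levels of RE methods" concrete, the sketch below contrasts a simple keyword pattern with a more advanced pattern that also captures extent, severity, and stage qualifiers. The patterns and example note are illustrative, not the study's actual expressions:

```python
import re

note = "Dx: generalized severe chronic periodontitis, stage III."

# Simple RE: does the note mention periodontitis at all?
simple = re.compile(r"periodontitis", re.IGNORECASE)

# Advanced RE: also capture optional extent/severity qualifiers and stage.
advanced = re.compile(
    r"(?P<extent>generalized|localized)?\s*"
    r"(?P<severity>mild|moderate|severe)?\s*"
    r"(?:chronic|aggressive)?\s*periodontitis"
    r"(?:,?\s*stage\s*(?P<stage>[IVX]+))?",
    re.IGNORECASE,
)

m = advanced.search(note)
if m:
    print(m.group("extent"), m.group("severity"), m.group("stage"))
    # -> generalized severe III
```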
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections
results: Compared with techniques that directly generate robot code via LLMs, DROC needs only half as many corrections in the first round and requires little to no correction after two rounds. DROC also performs well on new task or object instances; videos, prompts, and code are provided alongside detailed results. Abstract
Today's robot policies exhibit subpar performance when faced with the challenge of generalizing to novel environments. Human corrective feedback is a crucial form of guidance to enable such generalization. However, adapting to and learning from online human corrections is a non-trivial endeavor: not only do robots need to remember human feedback over time to retrieve the right information in new settings and reduce the intervention rate, but also they would need to be able to respond to feedback that can be arbitrary corrections about high-level human preferences to low-level adjustments to skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), a large language model (LLM)-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity for improving performance in novel settings. DROC is able to respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from the sequence of online corrections in a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms other techniques that directly generate robot code via LLMs by using only half of the total number of corrections needed in the first round and requires little to no corrections after two iterations. We show further results, videos, prompts and code on https://sites.google.com/stanford.edu/droc .
Fuse It or Lose It: Deep Fusion for Multimodal Simulation-Based Inference
results: The paper shows that MultiNPE not only outperforms naive baselines on a benchmark model but also achieves superior inference on representative scientific models from neuroscience and cardiology. The authors also systematically study the impact of partially missing data on the different fusion strategies, finding that late and hybrid fusion are the methods of choice for practical applications. Abstract
We present multimodal neural posterior estimation (MultiNPE), a method to integrate heterogeneous data from different sources in simulation-based inference with neural networks. Inspired by advances in attention-based deep fusion learning, it empowers researchers to analyze data from different domains and infer the parameters of complex mathematical models with increased accuracy. We formulate different multimodal fusion approaches for MultiNPE (early, late, and hybrid) and evaluate their performance in three challenging numerical experiments. MultiNPE not only outperforms naive baselines on a benchmark model, but also achieves superior inference on representative scientific models from neuroscience and cardiology. In addition, we systematically investigate the impact of partially missing data on the different fusion strategies. Across our different experiments, late and hybrid fusion techniques emerge as the methods of choice for practical applications of multimodal simulation-based inference.
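A minimal PyTorch contrast of the early and late fusion variants named above, for two data sources feeding one head; all dimensions are arbitrary, and the hybrid variant would mix both routes:

```python
import torch
import torch.nn as nn

x_a = torch.randn(32, 16)  # modality A (e.g., a time-series summary)
x_b = torch.randn(32, 8)   # modality B (e.g., an image embedding)

# Early fusion: concatenate raw inputs, then encode jointly.
early = nn.Sequential(nn.Linear(16 + 8, 64), nn.ReLU(), nn.Linear(64, 5))
out_early = early(torch.cat([x_a, x_b], dim=-1))

# Late fusion: encode each source separately, then combine embeddings.
enc_a = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
enc_b = nn.Sequential(nn.Linear(8, 32), nn.ReLU())
head = nn.Linear(32 + 32, 5)
out_late = head(torch.cat([enc_a(x_a), enc_b(x_b)], dim=-1))
```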
Multi-delay arterial spin-labeled perfusion estimation with biophysics simulation and deep learning
results: QTMnet accurately reconstructs perfusion Q from concentration data. The relative error on the synthetic brain ASL image was 7.04%, lower than the errors of the single-delay ASL model (25.15%) and the multi-delay ASL model (12.62%). Abstract
Purpose: To develop biophysics-based method for estimating perfusion Q from arterial spin labeling (ASL) images using deep learning. Methods: A 3D U-Net (QTMnet) was trained to estimate perfusion from 4D tracer propagation images. The network was trained and tested on simulated 4D tracer concentration data based on artificial vasculature structure generated by constrained constructive optimization (CCO) method. The trained network was further tested in a synthetic brain ASL image based on vasculature network extracted from magnetic resonance (MR) angiography. The estimations from both trained network and a conventional kinetic model were compared in ASL images acquired from eight healthy volunteers. Results: QTMnet accurately reconstructed perfusion Q from concentration data. Relative error of the synthetic brain ASL image was 7.04% for perfusion Q, lower than the error using single-delay ASL model: 25.15% for Q, and multi-delay ASL model: 12.62% for perfusion Q. Conclusion: QTMnet provides accurate estimation on perfusion parameters and is a promising approach as a clinical ASL MRI image processing pipeline.
Concept-free Causal Disentanglement with Variational Graph Auto-Encoder
results: Experiments show that the proposed CCVGAE and CC-Meta-Graph models achieve strong performance across datasets, with absolute improvements over baselines of up to 29% and 11% in AUC, respectively. Abstract
In disentangled representation learning, the goal is to achieve a compact representation that consists of all interpretable generative factors in the observational data. Learning disentangled representations for graphs becomes increasingly important as graph data rapidly grows. Existing approaches often rely on Variational Auto-Encoder (VAE) or its causal structure learning-based refinement, which suffer from sub-optimality in VAEs due to the independence factor assumption and unavailability of concept labels, respectively. In this paper, we propose an unsupervised solution, dubbed concept-free causal disentanglement, built on a theoretically provable tight upper bound approximating the optimal factor. This results in an SCM-like causal structure modeling that directly learns concept structures from data. Based on this idea, we propose Concept-free Causal VGAE (CCVGAE) by incorporating a novel causal disentanglement layer into Variational Graph Auto-Encoder. Furthermore, we prove concept consistency under our concept-free causal disentanglement framework, hence employing it to enhance the meta-learning framework, called concept-free causal Meta-Graph (CC-Meta-Graph). We conduct extensive experiments to demonstrate the superiority of the proposed models: CCVGAE and CC-Meta-Graph, reaching up to 29% and 11% absolute improvements over baselines in terms of AUC, respectively.
A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest
results: The model's performance in the target domain improves significantly, surpassing models fine-tuned directly on the domain corpus, while requiring only 600 seed instances for automated training. Abstract
Large Language Models (LLMs), despite their great power in language generation, often encounter challenges when dealing with intricate and knowledge-demanding queries in specific domains. This paper introduces a novel approach to enhance LLMs by effectively extracting the relevant knowledge from domain-specific textual sources, and the adaptive training of a chatbot with domain-specific inquiries. Our two-step approach starts from training a knowledge miner, namely LLMiner, which autonomously extracts Question-Answer pairs from relevant documents through a chain-of-thought reasoning process. Subsequently, we blend the mined QA pairs with a conversational dataset to fine-tune the LLM as a chatbot, thereby enriching its domain-specific expertise and conversational capabilities. We also developed a new evaluation benchmark which comprises four domain-specific text corpora and associated human-crafted QA pairs for testing. Our model shows remarkable performance improvement over generally aligned LLM and surpasses domain-adapted models directly fine-tuned on domain corpus. In particular, LLMiner achieves this with minimal human intervention, requiring only 600 seed instances, thereby providing a pathway towards self-improvement of LLMs through model-synthesized training data.
results: The results show that combining Active Inference (ACI) with ML enables optimization problems in distributed systems to be solved quickly and traceably while fulfilling QoS requirements. Abstract
Machine Learning (ML) is a common tool to interpret and predict the behavior of distributed computing systems, e.g., to optimize the task distribution between devices. As more and more data is created by Internet of Things (IoT) devices, data processing and ML training are carried out by edge devices in close proximity. To ensure Quality of Service (QoS) throughout these operations, systems are supervised and dynamically adapted with the help of ML. However, as long as ML models are not retrained, they fail to capture gradual shifts in the variable distribution, leading to an inaccurate view of the system state. Moreover, as the prediction accuracy decreases, the reporting device should actively resolve uncertainties to improve the model's precision. Such a level of self-determination could be provided by Active Inference (ACI) -- a concept from neuroscience that describes how the brain constantly predicts and evaluates sensory information to decrease long-term surprise. We encompassed these concepts in a single action-perception cycle, which we implemented for distributed agents in a smart manufacturing use case. As a result, we showed how our ACI agent was able to quickly and traceably solve an optimization problem while fulfilling QoS requirements.
Chatbots as social companions: How people perceive consciousness, human likeness, and social health benefits in machines
paper_authors: Rose Guingrich, Michael S. A. Graziano
for: This paper aims to investigate the impact of human-AI interaction on human-human interaction, specifically focusing on the use of chatbots as social companions.
methods: The study compares the social health benefits of using chatbots with not using them, and examines how people perceive the consciousness and humanlikeness of chatbots.
results: The study finds that companion bot users report beneficial social health effects, while nonusers view them as harmful. Additionally, perceiving chatbots as more conscious and humanlike is associated with more positive opinions and better social health benefits. Abstract
As artificial intelligence (AI) becomes more widespread, one question that arises is how human-AI interaction might impact human-human interaction. Chatbots, for example, are increasingly used as social companions, but little is known about how their use impacts human relationships. A common hypothesis is that these companion bots are detrimental to social health by harming or replacing human interaction. To understand how companion bots impact social health, we studied people who used companion bots and people who did not. Contrary to expectations, companion bot users indicated that these relationships were beneficial to their social health, whereas nonusers viewed them as harmful. Another common assumption is that people perceive conscious, humanlike AI as disturbing and threatening. Among both users and nonusers, however, we found the opposite: perceiving companion bots as more conscious and humanlike correlated with more positive opinions and better social health benefits. Humanlike bots may aid social health by supplying reliable and safe interactions, without necessarily harming human relationships.
Designing Reconfigurable Intelligent Systems with Markov Blankets
results: The results show that a causality filter based on Markov blankets (MB) reduces the number of variables each device must track, and that evaluating SLOs on a per-device basis enables decentralized intelligence. Abstract
Compute Continuum (CC) systems comprise a vast number of devices distributed over computational tiers. Evaluating business requirements, i.e., Service Level Objectives (SLOs), requires collecting data from all those devices; if SLOs are violated, devices must be reconfigured to ensure correct operation. If done centrally, this dramatically increases the number of devices and variables that must be considered, while creating an enormous communication overhead. To address this, we (1) introduce a causality filter based on Markov blankets (MB) that limits the number of variables that each device must track, (2) evaluate SLOs decentralized on a device basis, and (3) infer optimal device configuration for fulfilling SLOs. We evaluated our methodology by analyzing video stream transformations and providing device configurations that ensure the Quality of Service (QoS). The devices thus perceived their environment and acted accordingly -- a form of decentralized intelligence.
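A small sketch of the Markov blanket computation underpinning the causality filter: in a directed graphical model, the blanket of a node is its parents, children, and the children's other parents, which bounds the variables a device must track. The toy graph is illustrative:

```python
import networkx as nx

def markov_blanket(g: nx.DiGraph, node) -> set:
    """Parents, children, and co-parents of `node` in a DAG."""
    parents = set(g.predecessors(node))
    children = set(g.successors(node))
    co_parents = {p for c in children for p in g.predecessors(c)} - {node}
    return parents | children | co_parents

g = nx.DiGraph([("network_load", "latency"), ("cpu_load", "latency"),
                ("latency", "slo_violation"), ("battery", "cpu_load")])
print(markov_blanket(g, "cpu_load"))
# -> {'battery', 'latency', 'network_load'}
```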
Hashing it Out: Predicting Unhealthy Conversations on Twitter
results: The model shows a clear performance advantage over the baseline LSTM and predicts well after fine-tuning on a relatively small, novel dataset. Abstract
Personal attacks in the context of social media conversations often lead to fast-paced derailment, leading to even more harmful exchanges being made. State-of-the-art systems for the detection of such conversational derailment often make use of deep learning approaches for prediction purposes. In this paper, we show that an Attention-based BERT architecture, pre-trained on a large Twitter corpus and fine-tuned on our task, is efficient and effective in making such predictions. This model shows clear advantages in performance to the existing LSTM model we use as a baseline. Additionally, we show that this impressive performance can be attained through fine-tuning on a relatively small, novel dataset, particularly after mitigating overfitting issues through synthetic oversampling techniques. By introducing the first transformer based model for forecasting conversational events on Twitter, this work lays the foundation for a practical tool to encourage better interactions on one of the most ubiquitous social media platforms.
FOCAL: A Cost-Aware Video Dataset for Active Learning
results: The study finds that the proposed conformal methods achieve a better balance between performance and annotation cost, with the best conformal method 113 hours cheaper than the best traditional active learning method. Abstract
In this paper, we introduce the FOCAL (Ford-OLIVES Collaboration on Active Learning) dataset which enables the study of the impact of annotation-cost within a video active learning setting. Annotation-cost refers to the time it takes an annotator to label and quality-assure a given video sequence. A practical motivation for active learning research is to minimize annotation-cost by selectively labeling informative samples that will maximize performance within a given budget constraint. However, previous work in video active learning lacks real-time annotation labels for accurately assessing cost minimization and instead operates under the assumption that annotation-cost scales linearly with the amount of data to annotate. This assumption does not take into account a variety of real-world confounding factors that contribute to a nonlinear cost such as the effect of an assistive labeling tool and the variety of interactions within a scene such as occluded objects, weather, and motion of objects. FOCAL addresses this discrepancy by providing real annotation-cost labels for 126 video sequences across 69 unique city scenes with a variety of weather, lighting, and seasonal conditions. We also introduce a set of conformal active learning algorithms that take advantage of the sequential structure of video data in order to achieve a better trade-off between annotation-cost and performance while also reducing floating point operations (FLOPS) overhead by at least 77.67%. We show how these approaches better reflect how annotations on videos are done in practice through a sequence selection framework. We further demonstrate the advantage of these approaches by introducing two performance-cost metrics and show that the best conformal active learning method is cheaper than the best traditional active learning method by 113 hours.
EduGym: An Environment Suite for Reinforcement Learning Education
paper_authors: Thomas M. Moerland, Matthias Müller-Brockhausen, Zhao Yang, Andrius Bernatavicius, Koen Ponse, Tom Kouwenhoven, Andreas Sauter, Michiel van der Meer, Bram Renting, Aske Plaat
results: In an evaluation, 86% of RL students and researchers considered EduGym a useful tool for reinforcement learning education. All notebooks are available at https://sites.google.com/view/edu-gym/home, and the full software package can be installed from https://github.com/RLG-Leiden/edugym. Abstract
Due to the empirical success of reinforcement learning, an increasing number of students study the subject. However, from our practical teaching experience, we see students entering the field (bachelor, master and early PhD) often struggle. On the one hand, textbooks and (online) lectures provide the fundamentals, but students find it hard to translate between equations and code. On the other hand, public codebases do provide practical examples, but the implemented algorithms tend to be complex, and the underlying test environments contain multiple reinforcement learning challenges at once. Although this is realistic from a research perspective, it often hinders educational conceptual understanding. To solve this issue we introduce EduGym, a set of educational reinforcement learning environments and associated interactive notebooks tailored for education. Each EduGym environment is specifically designed to illustrate a certain aspect/challenge of reinforcement learning (e.g., exploration, partial observability, stochasticity, etc.), while the associated interactive notebook explains the challenge and its possible solution approaches, connecting equations and code in a single document. An evaluation among RL students and researchers shows 86% of them think EduGym is a useful tool for reinforcement learning education. All notebooks are available from https://sites.google.com/view/edu-gym/home, while the full software package can be installed from https://github.com/RLG-Leiden/edugym.
SENetV2: Aggregated dense layer for channelwise and global representations
results: Experimental results show that the proposed model achieves a remarkable improvement in classification accuracy on the evaluation datasets compared with existing architectures. Abstract
Convolutional Neural Networks (CNNs) have revolutionized image classification by extracting spatial features and enabling state-of-the-art accuracy in vision-based tasks. The squeeze-and-excitation module gathers channel-wise representations of the input, while multilayer perceptrons (MLPs) learn global representations and, in most image classification models, are used to learn the extracted features of the image. In this paper, we introduce a novel aggregated multilayer perceptron, a multi-branch dense layer, within the squeeze-excitation residual module, designed to surpass the performance of existing architectures. Our approach leverages a combination of the squeeze-excitation network module with dense layers. This fusion enhances the network's ability to capture channel-wise patterns and global knowledge, leading to a better feature representation. The proposed model has a negligible increase in parameters when compared to SENet. We conduct extensive experiments on benchmark datasets to validate the model and compare it with established architectures. Experimental results demonstrate a remarkable increase in the classification accuracy of the proposed model.
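For reference, a standard squeeze-and-excitation block in PyTorch, i.e., the channel-attention baseline that the proposed aggregated multi-branch dense layer extends; this is the vanilla SE module, not the SENetV2 variant:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Vanilla squeeze-and-excitation: pool -> reduce -> expand -> rescale."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # squeeze: global context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # excitation bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                # per-channel gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                     # channel-wise rescaling

out = SEBlock(64)(torch.randn(2, 64, 32, 32))            # shape preserved
```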
results: Using an adversarially simulated agent, the monitor is shown to identify and stop unsafe situations; applying it to real-world tests reveals several remaining limitations and challenges. Abstract
A prerequisite for safe autonomy-in-the-wild is safe testing-in-the-wild. Yet real-world autonomous tests face several unique safety challenges, both due to the possibility of causing harm during a test, as well as the risk of encountering new unsafe agent behavior through interactions with real-world and potentially malicious actors. We propose a framework for conducting safe autonomous agent tests on the open internet: agent actions are audited by a context-sensitive monitor that enforces a stringent safety boundary to stop an unsafe test, with suspect behavior ranked and logged to be examined by humans. We design a basic safety monitor that is flexible enough to monitor existing LLM agents, and, using an adversarial simulated agent, we measure its ability to identify and stop unsafe situations. Then we apply the safety monitor on a battery of real-world tests of AutoGPT, and we identify several limitations and challenges that will face the creation of safe in-the-wild tests as autonomous agents grow more capable.
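The sketch below illustrates the monitor pattern the abstract describes: each proposed agent action is scored in context, logged, and blocked when it crosses a safety boundary. The keyword-based scoring heuristic is a placeholder assumption; a real monitor would use a far richer, context-sensitive classifier.

```python
from dataclasses import dataclass, field

@dataclass
class SafetyMonitor:
    """Context-sensitive monitor sketch: score each proposed action, stop
    the test when the score crosses a safety boundary, and log suspect
    behavior for human review. The scoring rule is a toy placeholder."""
    threshold: float = 0.8
    log: list = field(default_factory=list)

    def score(self, context: str, action: str) -> float:
        # Placeholder heuristic: a real monitor might query an LLM classifier.
        risky_markers = ("rm -rf", "credit card", "password", "sudo")
        return 1.0 if any(m in action.lower() for m in risky_markers) else 0.1

    def audit(self, context: str, action: str) -> bool:
        s = self.score(context, action)
        self.log.append({"context": context, "action": action, "score": s})
        return s < self.threshold  # False => stop the unsafe test

monitor = SafetyMonitor()
assert monitor.audit("shell task", "ls -la")            # allowed
assert not monitor.audit("shell task", "sudo rm -rf /") # blocked and logged
```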
results: Extensive experiments on public MTS datasets demonstrate that SEA and SEA++ achieve state-of-the-art performance on the MTS-UDA problem.
Abstract
Unsupervised Domain Adaptation (UDA) methods have been successful in reducing label dependency by minimizing the domain discrepancy between a labeled source domain and an unlabeled target domain. However, these methods face challenges when dealing with Multivariate Time-Series (MTS) data. MTS data typically consist of multiple sensors, each with its own unique distribution. This characteristic makes it hard to adapt existing UDA methods, which mainly focus on aligning global features while overlooking the distribution discrepancies at the sensor level, to reduce domain discrepancies for MTS data. To address this issue, a practical domain adaptation scenario is formulated as Multivariate Time-Series Unsupervised Domain Adaptation (MTS-UDA). In this paper, we propose SEnsor Alignment (SEA) for MTS-UDA, aiming to reduce domain discrepancy at both the local and global sensor levels. At the local sensor level, we design endo-feature alignment, which aligns sensor features and their correlations across domains. To reduce domain discrepancy at the global sensor level, we design exo-feature alignment that enforces restrictions on global sensor features. We further extend SEA to SEA++ by enhancing the endo-feature alignment. Particularly, we incorporate multi-graph-based high-order alignment for both sensor features and their correlations. Extensive empirical results have demonstrated the state-of-the-art performance of our SEA and SEA++ on public MTS datasets for MTS-UDA.
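The following sketch conveys the flavor of sensor-level alignment: an RBF-kernel MMD is computed per sensor between source and target features and averaged. This is a simplified stand-in for SEA's endo-feature alignment, which additionally aligns the correlations between sensors (omitted here).

```python
import torch

def rbf_mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared MMD between two feature batches under an RBF kernel."""
    def k(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def sensor_alignment_loss(src_feats: torch.Tensor, tgt_feats: torch.Tensor):
    """Align each sensor's feature distribution across domains.

    src_feats, tgt_feats: tensors of shape (batch, n_sensors, feat_dim).
    A simplified stand-in for SEA's local, sensor-level alignment.
    """
    n_sensors = src_feats.shape[1]
    losses = [rbf_mmd(src_feats[:, i], tgt_feats[:, i]) for i in range(n_sensors)]
    return torch.stack(losses).mean()

src = torch.randn(32, 9, 64)   # e.g., 9 sensors with 64-dim features
tgt = torch.randn(32, 9, 64)
print(sensor_alignment_loss(src, tgt).item())
```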
Towards a Standardized Reinforcement Learning Framework for AAM Contingency Management
methods: The paper formalizes the contingency management problem as a Markov Decision Process (MDP) and integrates the contingency management MDP into the AAM-Gym simulation environment, enabling rapid testing and evaluation of machine learning algorithms.
results: The paper provides baseline statistical information and example performance metrics to serve as a community benchmark for future algorithm development.
Abstract
Advanced Air Mobility (AAM) is the next generation of air transportation that includes new entrants such as electric vertical takeoff and landing (eVTOL) aircraft, increasingly autonomous flight operations, and small UAS package delivery. With these new vehicles and operational concepts comes a desire to increase densities far beyond what occurs today in and around urban areas, to utilize new battery technology, and to move toward more autonomously-piloted aircraft. To achieve these goals, it becomes essential to introduce new safety management system capabilities that can rapidly assess risk as it evolves across a span of complex hazards and, if necessary, mitigate risk by executing appropriate contingencies via supervised or automated decision-making during flights. Recently, reinforcement learning has shown promise for real-time decision making across a wide variety of applications including contingency management. In this work, we formulate the contingency management problem as a Markov Decision Process (MDP) and integrate the contingency management MDP into the AAM-Gym simulation framework. This enables rapid prototyping of reinforcement learning algorithms and evaluation of existing systems, thus providing a community benchmark for future algorithm development. We report baseline statistical information for the environment and provide example performance metrics.
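A toy Gym-style skeleton of a contingency-management MDP is sketched below. The state, action, and reward definitions are invented placeholders meant to show the formulation pattern, not the AAM-Gym environment itself.

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class ContingencyEnv(gym.Env):
    """Toy contingency-management MDP: the state tracks battery level and
    distance to a landing site; actions continue the mission or divert.
    All dynamics are invented placeholders, not AAM-Gym."""
    def __init__(self):
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(2,))
        self.action_space = spaces.Discrete(2)  # 0: continue, 1: divert

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = np.array([1.0, 1.0], dtype=np.float32)  # battery, distance
        return self.state, {}

    def step(self, action):
        battery, dist = self.state
        battery -= 0.05 if action == 0 else 0.02   # diverting drains less
        dist -= 0.1 if action == 1 else 0.0        # diverting closes distance
        self.state = np.clip(np.array([battery, dist], dtype=np.float32), 0, 1)
        landed = dist <= 0.0
        crashed = battery <= 0.0 and not landed
        reward = 1.0 if landed else (-1.0 if crashed else -0.01)
        return self.state, reward, landed or crashed, False, {}
```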
Enhancing Object Coherence in Layout-to-Image Synthesis
results: Compared with previous methods, the model offers better control over object coherence, covering both semantic coherence and physical coherence. Experimental results show that it generates images of higher quality and with greater controllability.
Abstract
Layout-to-image synthesis is an emerging technique in conditional image generation. It aims to generate complex scenes where users require fine control over the layout of the objects in a scene. However, it remains challenging to control object coherence, including semantic coherence (e.g., whether the cat looks at the flowers or not) and physical coherence (e.g., the hand and the racket should not be misaligned). In this paper, we propose a novel diffusion model with effective global semantic fusion (GSF) and self-similarity feature enhancement modules to guide object coherence for this task. For semantic coherence, we argue that the image caption contains rich information for defining the semantic relationships among the objects in the image. Instead of simply employing cross-attention between captions and generated images, which addresses the layout restriction and the semantic coherence requirement separately and thus leads to the unsatisfying results shown in our experiments, we develop GSF to fuse the supervision from the layout restriction and the semantic coherence requirement and exploit it to guide the image synthesis process. Moreover, to improve physical coherence, we develop a Self-similarity Coherence Attention (SCA) module to explicitly integrate local contextual physical coherence into each pixel's generation process. Specifically, we adopt a self-similarity map to encode the coherence restrictions and employ it to extract coherent features from the text embedding. Through visualization of our self-similarity map, we explore the essence of SCA, revealing that its effectiveness lies not only in capturing reliable physical coherence patterns but also in enhancing complex texture generation. Extensive experiments demonstrate the superiority of our proposed method in both image generation quality and controllability.
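The sketch below computes a basic per-pixel self-similarity map from a feature map, the kind of structure SCA uses to encode physical coherence restrictions; how SCA then extracts coherent features from the text embedding is not reproduced here.

```python
import torch
import torch.nn.functional as F

def self_similarity_map(feats: torch.Tensor) -> torch.Tensor:
    """Per-pixel self-similarity of a feature map.

    feats: (B, C, H, W). Returns (B, H*W, H*W), where entry (i, j) is the
    cosine similarity between pixel i's and pixel j's feature vectors —
    a simplified version of the map SCA uses for coherence restrictions.
    """
    b, c, h, w = feats.shape
    x = feats.flatten(2).transpose(1, 2)   # (B, H*W, C)
    x = F.normalize(x, dim=-1)
    return x @ x.transpose(1, 2)           # pairwise cosine similarities

sim = self_similarity_map(torch.randn(1, 128, 16, 16))
print(sim.shape)  # torch.Size([1, 256, 256])
```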
CNL2ASP: converting controlled natural language sentences into ASP
paper_authors: Simone Caruso, Carmine Dodaro, Marco Maratea, Marco Mochi, Francesco Riccio
for: Translating English natural-language sentences into Answer Set Programming (ASP) programs.
methods: A novel tool called CNL2ASP that translates controlled natural language (CNL) sentences into ASP rules.
results: On real-world applications, CNL2ASP obtains satisfactory performance compared with encodings written manually by ASP experts.
Abstract
Answer Set Programming (ASP) is a popular declarative programming language for solving hard combinatorial problems. Although ASP has gained widespread acceptance in academic and industrial contexts, there are certain user groups who may find it more advantageous to employ a higher-level language that closely resembles natural language when specifying ASP programs. In this paper, we propose a novel tool, called CNL2ASP, for translating English sentences expressed in a controlled natural language (CNL) form into ASP. In particular, we first provide a definition of the type of sentences allowed by our CNL and their translation as ASP rules, and then exemplify the usage of the CNL for the specification of both synthetic and real-world combinatorial problems. Finally, we report the results of an experimental analysis conducted on the real-world problems to compare the performance of automatically generated encodings with the ones written by ASP practitioners, showing that our tool can obtain satisfactory performance on these benchmarks. Under consideration in Theory and Practice of Logic Programming (TPLP).
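The snippet below illustrates the kind of mapping CNL2ASP performs, pairing a controlled-English sentence with a plausible clingo-style ASP counterpart. The CNL grammar and the tool's exact output are not reproduced; both strings are illustrative.

```python
# Illustrative only: neither the CNL grammar nor the exact ASP emitted by
# CNL2ASP is reproduced here; this shows the *kind* of mapping performed.
cnl_sentence = "Every node is assigned to exactly 1 color."

# A plausible ASP (clingo-style) counterpart of the sentence above:
asp_rule = "1 { assigned_to(N,C) : color(C) } 1 :- node(N)."

instance = """
node(1..3).
color(red). color(green). color(blue).
"""
# Concatenating `instance + asp_rule` yields a program that a solver such
# as clingo could ground and solve.
print(instance + asp_rule)
```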
A Study on Altering the Latent Space of Pretrained Text to Speech Models for Improved Expressiveness
methods: The paper evaluates different image-to-image methods for altering latent speech features.
results: Our results offer valuable insights for future research and open new avenues in this direction.
Abstract
This report explores the challenge of enhancing expressiveness control in Text-to-Speech (TTS) models by augmenting a frozen pretrained model with a Diffusion Model that is conditioned on joint semantic audio/text embeddings. The paper identifies the challenges encountered when working with a VAE-based TTS model and evaluates different image-to-image methods for altering latent speech features. Our results offer valuable insights into the complexities of adding expressiveness control to TTS systems and open avenues for future research in this direction.
From Principle to Practice: Vertical Data Minimization for Machine Learning
results: The paper proposes a range of baseline vDM algorithms as well as the Privacy-aware Tree (PAT) algorithm, which outperforms all baselines across several settings. The code is planned for release as a publicly available library to promote the adoption of the DM principle in real-world applications.
Abstract
Aiming to train and deploy predictive models, organizations collect large amounts of detailed client data, risking the exposure of private information in the event of a breach. To mitigate this, policymakers increasingly demand compliance with the data minimization (DM) principle, restricting data collection to only that data which is relevant and necessary for the task. Despite regulatory pressure, the problem of deploying machine learning models that obey DM has so far received little attention. In this work, we address this challenge in a comprehensive manner. We propose a novel vertical DM (vDM) workflow based on data generalization, which by design ensures that no full-resolution client data is collected during training and deployment of models, benefiting client privacy by reducing the attack surface in case of a breach. We formalize and study the corresponding problem of finding generalizations that both maximize data utility and minimize empirical privacy risk, which we quantify by introducing a diverse set of policy-aligned adversarial scenarios. Finally, we propose a range of baseline vDM algorithms, as well as Privacy-aware Tree (PAT), an especially effective vDM algorithm that outperforms all baselines across several settings. We plan to release our code as a publicly available library, helping advance the standardization of DM for machine learning. Overall, we believe our work can help lay the foundation for further exploration and adoption of DM principles in real-world applications.
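A minimal sketch of vertical data minimization by column generalization is shown below: each attribute is coarsened before any model training, so full-resolution client data is never retained. Bin edges and prefix lengths are illustrative choices, not the paper's learned generalizations.

```python
import pandas as pd

def generalize(df: pd.DataFrame) -> pd.DataFrame:
    """Coarsen columns so no full-resolution client data is retained.

    Bin edges and prefix lengths are illustrative; in the paper's setting
    they would be selected to trade off utility against empirical privacy
    risk.
    """
    out = pd.DataFrame()
    out["age"] = pd.cut(df["age"], bins=[0, 25, 40, 60, 120],
                        labels=["<=25", "26-40", "41-60", "60+"])
    out["zip"] = df["zip"].astype(str).str[:3] + "**"    # keep only a prefix
    out["income"] = (df["income"] // 10_000) * 10_000     # round to 10k bands
    return out

raw = pd.DataFrame({"age": [23, 47], "zip": ["90210", "10001"],
                    "income": [58_300, 112_450]})
print(generalize(raw))
```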
Regions are Who Walk Them: a Large Pre-trained Spatiotemporal Model Based on Human Mobility for Ubiquitous Urban Sensing
results: Experimental results show that the model can accurately profile users and regions, and it exhibits promising predictive capabilities on trajectory generation tasks.
Abstract
User profiling and region analysis are two tasks of significant commercial value. However, in practical applications, modeling different features typically involves four main steps: data preparation, data processing, model establishment, and evaluation and optimization. This process is time-consuming and labor-intensive. Repeating this workflow for each feature inflates development time and reduces the overall volume of tasks that can be developed. Indeed, human mobility data contains a wealth of information. Several successful cases suggest that in-depth analysis of population movement data could yield meaningful profiles of users and areas. Nonetheless, most related works have not thoroughly utilized the semantic information within human mobility data and train on a fixed number of regions. To tap into the rich information within population movement, based on the perspective that Regions Are Who walk them, we propose a large spatiotemporal model based on trajectories (RAW). It possesses the following characteristics: 1) Tailored for trajectory data, introducing a GPT-like structure with a parameter count of up to 1B; 2) Introducing a spatiotemporal fine-tuning module that interprets trajectories as collections of users to derive arbitrary region embeddings. This framework allows rapid task development based on the large spatiotemporal model. We conducted extensive experiments to validate the effectiveness of our proposed large spatiotemporal model. It is evident that our proposed method, relying solely on human mobility data without additional features, exhibits a certain level of relevance in user profiling and region analysis. Moreover, our model showcases promising predictive capabilities in trajectory generation tasks based on the current state, offering the potential for further innovative work utilizing this large spatiotemporal model.
Using Cooperative Game Theory to Prune Neural Networks
results: Compared with existing methods, Game Theory Assisted Pruning (GTAP) excels at achieving a balance between the number of parameters and model accuracy.
Abstract
We show how solution concepts from cooperative game theory can be used to tackle the problem of pruning neural networks. The ever-growing size of deep neural networks (DNNs) increases their performance, but also their computational requirements. We introduce a method called Game Theory Assisted Pruning (GTAP), which reduces the neural network's size while preserving its predictive accuracy. GTAP is based on eliminating neurons in the network based on an estimation of their joint impact on the prediction quality through game theoretic solutions. Specifically, we use a power index akin to the Shapley value or Banzhaf index, tailored using a procedure similar to Dropout (commonly used to tackle overfitting problems in machine learning). Empirical evaluation of both feedforward networks and convolutional neural networks shows that this method outperforms existing approaches in the achieved tradeoff between the number of parameters and model accuracy.
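The sketch below estimates a Banzhaf-style power index per neuron by sampling random masks, echoing the Dropout-like sampling the abstract mentions; the evaluation function is a toy stand-in for measuring network quality under a given mask.

```python
import numpy as np

def banzhaf_indices(n_neurons: int, evaluate, n_samples: int = 200,
                    rng=np.random.default_rng(0)) -> np.ndarray:
    """Monte Carlo Banzhaf-style power index for each neuron.

    `evaluate(mask)` must return the model's quality (e.g., validation
    accuracy) when only neurons with mask==1 are active — a stand-in for
    the network evaluation GTAP would perform. Masks are drawn uniformly,
    in the spirit of the Dropout-like sampling described in the abstract.
    """
    scores = np.zeros(n_neurons)
    for _ in range(n_samples):
        mask = rng.integers(0, 2, size=n_neurons)
        for i in range(n_neurons):
            with_i, without_i = mask.copy(), mask.copy()
            with_i[i], without_i[i] = 1, 0
            scores[i] += evaluate(with_i) - evaluate(without_i)
    return scores / n_samples

# Toy evaluation: quality is the weighted coverage of "useful" neurons.
weights = np.array([0.5, 0.1, 0.3, 0.05, 0.05])
est = banzhaf_indices(5, lambda m: float(m @ weights))
print(est.round(2))  # high-weight neurons get a high index; prune the rest
```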
Accurate and Fast Fischer-Tropsch Reaction Microkinetics using PINNs
results: The paper proposes a computationally efficient and accurate method for solving the FTS microkinetics model under realistic process conditions. The model accurately computes the fraction of vacant catalytic sites and, running on GPUs, is many times faster than conventional approaches.
Abstract
Microkinetics allows detailed modelling of chemical transformations occurring in many industrially relevant reactions. The traditional way of solving the microkinetics model for Fischer-Tropsch synthesis (FTS) becomes inefficient when it comes to more advanced real-time applications. In this work, we address these challenges by using physics-informed neural networks (PINNs) for modelling FTS microkinetics. We propose a computationally efficient and accurate method, enabling the ultra-fast solution of the existing microkinetics models in realistic process conditions. The proposed PINN model computes the fraction of vacant catalytic sites, a key quantity in FTS microkinetics, with a median relative error (MRE) of 0.03%, and the FTS product formation rates with an MRE of 0.1%. Compared to conventional equation solvers, the model achieves up to 1E+06 times speed-up when running on GPUs, thus being fast enough for multi-scale and multi-physics reactor modelling and enabling its applications in real-time process control and optimization.
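A generic PINN training step is sketched below, combining a data loss with an ODE residual loss obtained through automatic differentiation. The toy first-order kinetics here stands in for the far more involved FTS microkinetics model.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Toy stand-in for the microkinetics: dy/dt = -k * y, with y(0) = 1.
k = 2.0
t_data, y_data = torch.tensor([[0.0]]), torch.tensor([[1.0]])   # initial condition
t_col = torch.linspace(0, 1, 50).reshape(-1, 1).requires_grad_(True)  # collocation points

for step in range(2000):
    opt.zero_grad()
    loss_data = ((net(t_data) - y_data) ** 2).mean()
    y = net(t_col)
    dy_dt = torch.autograd.grad(y, t_col, torch.ones_like(y), create_graph=True)[0]
    loss_phys = ((dy_dt + k * y) ** 2).mean()   # physics residual of the ODE
    (loss_data + loss_phys).backward()
    opt.step()
```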
Reinforcement Learning with Maskable Stock Representation for Portfolio Management in Customizable Stock Pools
results: Extensive experiments on 8 subset stock pools of the US stock market validate that EarnMore outperforms 14 baselines by over 40%.
Abstract
Portfolio management (PM) is a fundamental financial trading task, which explores the optimal periodical reallocation of capital into different stocks to pursue long-term profits. Reinforcement learning (RL) has recently shown its potential to train profitable agents for PM through interacting with financial markets. However, existing work mostly focuses on fixed stock pools, which is inconsistent with investors' practical demand. Specifically, the target stock pools of different investors vary dramatically due to their discrepancy in market states, and individual investors may temporally adjust the stocks they desire to trade (e.g., adding one popular stock), which leads to customizable stock pools (CSPs). Existing RL methods require retraining RL agents even with a tiny change of the stock pool, which leads to high computational cost and unstable performance. To tackle this challenge, we propose EarnMore, a rEinforcement leARNing framework with Maskable stOck REpresentation to handle PM with CSPs through one-shot training in a global stock pool (GSP). Specifically, we first introduce a mechanism to mask out the representations of the stocks outside the target pool. Second, we learn meaningful stock representations through a self-supervised masking and reconstruction process. Third, a re-weighting mechanism is designed to make the portfolio concentrate on favorable stocks and neglect the stocks outside the target pool. Through extensive experiments on 8 subset stock pools of the US stock market, we demonstrate that EarnMore significantly outperforms 14 state-of-the-art baselines in terms of 6 popular financial metrics with over 40% improvement in profit.
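The masking idea can be sketched as follows: stocks outside the customized pool are replaced by a learnable mask token before encoding, so a single model trained on the global pool can serve arbitrary sub-pools without retraining. Dimensions and the encoder are illustrative.

```python
import torch
import torch.nn as nn

class MaskableStockEncoder(nn.Module):
    """Sketch of maskable stock representation: stocks outside a customized
    pool are replaced by a learnable mask token before encoding. Sizes and
    the encoder are illustrative choices, not EarnMore's architecture."""
    def __init__(self, n_stocks: int = 100, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(n_stocks, dim)
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True),
            num_layers=2)

    def forward(self, stock_ids: torch.Tensor, in_pool: torch.Tensor):
        x = self.embed(stock_ids)                              # (B, N, dim)
        x = torch.where(in_pool.unsqueeze(-1), x, self.mask_token)
        return self.encoder(x)

ids = torch.arange(100).unsqueeze(0)
pool = torch.zeros(1, 100, dtype=torch.bool)
pool[0, :10] = True                                  # a 10-stock customized pool
print(MaskableStockEncoder()(ids, pool).shape)       # torch.Size([1, 100, 32])
```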
A Bridge between Dynamical Systems and Machine Learning: Engineered Ordinary Differential Equations as Classification Algorithm (EODECA)
methods: The paper proposes Engineered Ordinary Differential Equations as Classification Algorithms (EODECAs), neural-network models grounded in continuous ordinary differential equations that offer high interpretability alongside strong classification performance.
results: EODECAs deliver high classification performance and intrinsic interpretability; compared with traditional deep learning models, they are more transparent and explainable.
Abstract
In a world increasingly reliant on machine learning, the interpretability of these models remains a substantial challenge, with many equating their functionality to an enigmatic black box. This study seeks to bridge machine learning and dynamical systems. Recognizing the deep parallels between dense neural networks and dynamical systems, particularly in the light of non-linearities and successive transformations, this manuscript introduces the Engineered Ordinary Differential Equations as Classification Algorithms (EODECAs). Uniquely designed as neural networks underpinned by continuous ordinary differential equations, EODECAs aim to capitalize on the well-established toolkit of dynamical systems. Unlike traditional deep learning models, which often suffer from opacity, EODECAs promise both high classification performance and intrinsic interpretability. They are naturally invertible, granting them an edge in understanding and transparency over their counterparts. By bridging these domains, we hope to usher in a new era of machine learning models where genuine comprehension of data processes complements predictive prowess.
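A minimal continuous-ODE classifier in the spirit of EODECA is sketched below using the third-party torchdiffeq package; the paper's engineered, stability-guaranteeing construction is not reproduced.

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # third-party: pip install torchdiffeq

class ODEFunc(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))
    def forward(self, t, y):                 # defines dy/dt = f(y)
        return self.net(y)

class ODEClassifier(nn.Module):
    """Classify by integrating an ODE over the input and reading out the
    final state — a minimal stand-in for EODECA's engineered, invertible
    construction (the stability engineering itself is omitted)."""
    def __init__(self, dim: int, n_classes: int):
        super().__init__()
        self.func = ODEFunc(dim)
        self.head = nn.Linear(dim, n_classes)
    def forward(self, x):
        t = torch.tensor([0.0, 1.0])
        y_end = odeint(self.func, x, t)[-1]  # state of the dynamics at t = 1
        return self.head(y_end)

logits = ODEClassifier(dim=784, n_classes=10)(torch.randn(8, 784))
print(logits.shape)  # torch.Size([8, 10])
```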
Quantum Data Encoding: A Comparative Analysis of Classical-to-Quantum Mapping Techniques and Their Impact on Machine Learning Accuracy
methods: We explore several classical-to-quantum data encoding methods, including basis encoding, angle encoding, and amplitude encoding, and conduct extensive experiments with classical ML algorithms, including Logistic Regression, K-Nearest Neighbors, Support Vector Machines, and ensemble methods like Random Forest, LightGBM, AdaBoost, and CatBoost.
results: We find that quantum data embedding can improve classification accuracy and F1 scores, particularly in models that benefit from enhanced feature representation. Regarding running time, low-complexity models show moderate increases while more computationally intensive models exhibit discernible changes. Notably, ensemble methods demonstrate a favorable balance between performance gains and computational overhead. The study confirms the potential advantages of quantum data embedding for classical ML models and emphasizes weighing performance improvements against computational costs in practical applications. Future research may involve optimizing the quantum encoding process for computational efficiency and exploring the scalability of quantum embedding techniques in real-world applications.
Abstract
This research explores the integration of quantum data embedding techniques into classical machine learning (ML) algorithms, aiming to assess the performance enhancements and computational implications across a spectrum of models. We explore various classical-to-quantum mapping methods, ranging from basis encoding, angle encoding to amplitude encoding for encoding classical data, we conducted an extensive empirical study encompassing popular ML algorithms, including Logistic Regression, K-Nearest Neighbors, Support Vector Machines and ensemble methods like Random Forest, LightGBM, AdaBoost, and CatBoost. Our findings reveal that quantum data embedding contributes to improved classification accuracy and F1 scores, particularly notable in models that inherently benefit from enhanced feature representation. We observed nuanced effects on running time, with low-complexity models exhibiting moderate increases and more computationally intensive models experiencing discernible changes. Notably, ensemble methods demonstrated a favorable balance between performance gains and computational overhead. This study underscores the potential of quantum data embedding in enhancing classical ML models and emphasizes the importance of weighing performance improvements against computational costs. Future research directions may involve refining quantum encoding processes to optimize computational efficiency and exploring scalability for real-world applications. Our work contributes to the growing body of knowledge at the intersection of quantum computing and classical machine learning, offering insights for researchers and practitioners seeking to harness the advantages of quantum-inspired techniques in practical scenarios.
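As a concrete example of one of the encodings studied, the sketch below implements angle encoding with plain NumPy: each feature becomes an RY rotation on one qubit, and the n-qubit statevector is the tensor product of the single-qubit states. Feature scaling into a suitable angle range is assumed to happen beforehand.

```python
import numpy as np

def angle_encode(features: np.ndarray) -> np.ndarray:
    """Angle encoding: feature x becomes RY(x) applied to |0>, i.e. the
    single-qubit state [cos(x/2), sin(x/2)]; the n-qubit state is the
    tensor (Kronecker) product of these states."""
    state = np.array([1.0])
    for x in features:
        qubit = np.array([np.cos(x / 2), np.sin(x / 2)])
        state = np.kron(state, qubit)
    return state  # statevector of length 2**n

psi = angle_encode(np.array([0.3, 1.2, 2.0]))
print(psi.shape, np.isclose(np.linalg.norm(psi), 1.0))  # (8,) True
```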
Dates Fruit Disease Recognition using Machine Learning
results: The study finds that combining L*a*b, statistical, and DWT features improves detection accuracy and overall performance, achieving a 95.2% average accuracy on 871 images.
Abstract
Many countries, such as Saudi Arabia, Morocco, and Tunisia, are among the top exporters and consumers of palm date fruits. Date fruit production plays a major role in the economies of the date fruit exporting countries. Date fruits are susceptible to disease just like any fruit, and early detection and intervention can end up saving the produce. However, with the vast farming lands, it is nearly impossible for farmers to observe date trees on a frequent basis for early disease detection. In addition, even with human observation the process is prone to human error and increases the date fruit cost. With the recent advances in computer vision, machine learning, drone technology, and other technologies, an integrated solution can be proposed for the automatic detection of date fruit disease. In this paper, a hybrid features based method with standard classifiers is proposed based on the extraction of L*a*b color features, statistical features, and Discrete Wavelet Transform (DWT) texture features for the early detection and classification of date fruit disease. A dataset was developed for this work consisting of 871 images divided into the following classes: healthy date, initial stage of disease, malnourished date, and parasite infected. The extracted features were input to common classifiers such as Random Forest (RF), Multilayer Perceptron (MLP), Naïve Bayes (NB), and Fuzzy Decision Trees (FDT). The highest average accuracy was achieved when combining the L*a*b, statistical, and DWT features.
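A sketch of the hybrid feature extraction is given below: per-channel L*a*b statistics plus DWT sub-band energies, computed with scikit-image and PyWavelets. The specific statistics are representative choices and may differ from the paper's exact feature set.

```python
import numpy as np
import pywt                    # pip install PyWavelets
from skimage import color      # pip install scikit-image

def extract_features(rgb: np.ndarray) -> np.ndarray:
    """Hybrid feature vector for a fruit image: per-channel L*a*b
    statistics plus DWT detail sub-band energies. The chosen statistics
    are illustrative, not necessarily the paper's."""
    lab = color.rgb2lab(rgb)
    lab_stats = [f(lab[..., c]) for c in range(3)
                 for f in (np.mean, np.std, np.min, np.max)]
    gray = color.rgb2gray(rgb)
    _, (cH, cV, cD) = pywt.dwt2(gray, "haar")          # detail sub-bands
    dwt_energy = [np.mean(np.abs(b)) for b in (cH, cV, cD)]
    return np.array(lab_stats + dwt_energy)

img = np.random.rand(128, 128, 3)   # stand-in for a date-fruit photo
print(extract_features(img).shape)  # (15,)
```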
Quantum-Assisted Simulation: A Framework for Designing Machine Learning Models in the Quantum Computing Domain
results: Comparing machine learning and quantum machine learning approaches on a dataset, the study finds that the quantum machine learning approach can improve data-processing efficiency and accuracy.
Abstract
Machine learning (ML) models are trained using historical data to classify new, unseen data. However, traditional computing resources often struggle to handle the immense amount of data, commonly known as Big Data, within a reasonable timeframe. Quantum computing (QC) provides a novel approach to information processing. Quantum algorithms have the potential to process classical data exponentially faster than classical computing. By mapping quantum machine learning (QML) algorithms into the quantum mechanical domain, we can potentially achieve exponential improvements in data processing speed, reduced resource requirements, and enhanced accuracy and efficiency. In this article, we delve into both the QC and ML fields, exploring the interplay of ideas between them, as well as the current capabilities and limitations of hardware. We investigate the history of quantum computing, examine existing QML algorithms, and aim to present a simplified procedure for setting up simulations of QML algorithms, making it accessible and understandable for readers. Furthermore, we conducted simulations on a dataset using both machine learning and quantum machine learning approaches. We then proceeded to compare their respective performances by utilizing a quantum simulator.
INSPECT: A Multimodal Dataset for Pulmonary Embolism Diagnosis and Prognosis
paper_authors: Shih-Cheng Huang, Zepeng Huo, Ethan Steinberg, Chia-Chun Chiang, Matthew P. Lungren, Curtis P. Langlotz, Serena Yeung, Nigam H. Shah, Jason A. Fries
results: The paper presents a benchmark dataset for evaluating the performance of multimodal medical models. The dataset comprises data from 19,402 patients, including CT images, radiology reports, and structured electronic health record data.
Abstract
Synthesizing information from multiple data sources plays a crucial role in the practice of modern medicine. Current applications of artificial intelligence in medicine often focus on single-modality data due to a lack of publicly available, multimodal medical datasets. To address this limitation, we introduce INSPECT, which contains de-identified longitudinal records from a large cohort of patients at risk for pulmonary embolism (PE), along with ground truth labels for multiple outcomes. INSPECT contains data from 19,402 patients, including CT images, radiology report impression sections, and structured electronic health record (EHR) data (i.e. demographics, diagnoses, procedures, vitals, and medications). Using INSPECT, we develop and release a benchmark for evaluating several baseline modeling approaches on a variety of important PE related tasks. We evaluate image-only, EHR-only, and multimodal fusion models. Trained models and the de-identified dataset are made available for non-commercial use under a data use agreement. To the best of our knowledge, INSPECT is the largest multimodal dataset integrating 3D medical imaging and EHR for reproducible methods evaluation and research.
TaCo: Enhancing Cross-Lingual Transfer for Low-Resource Languages in LLMs through Translation-Assisted Chain-of-Thought Processes
results: Further instruction tuning was performed on three low-resource languages and one high-resource language; comparative results show that the TaCo method improves performance, especially for low-resource languages, reaching a GPT-4 evaluation score of 82% on the Vicuna Benchmark.
Abstract
LLMs such as ChatGPT and PaLM can be utilized to train on a new language and revitalize low-resource languages. However, it is evidently very costly to pretrain or fine-tune LLMs to adopt new languages. Another challenge is the limitation of benchmark datasets and the metrics used to measure the performance of models in multilingual settings. This paper proposes cost-effective solutions to both of the aforementioned challenges. We introduce the Multilingual Instruction-Tuning Dataset (MITS), which is comprised of translations of Alpaca-52K, Dolly-15K, and the Vicuna Benchmark in 132 languages. We also propose a new method called TaCo: Translation-Assisted Cross-Linguality, which makes use of translation in a chain-of-thought process to instruction-tune LLMs on new languages through a curriculum learning process. As a proof of concept, we experimented with the instruction-tuned Guanaco-33B model and performed further instruction tuning using the TaCo method in three low-resource languages and one high-resource language. Our results show that the TaCo method achieves a GPT-4 evaluation score of 82% for a low-resource language on the Vicuna Benchmark dataset, doubling the performance of instruction tuning alone. Our results show that TaCo is a promising method for creating multilingual LLMs, even for low-resource languages. We have released our datasets and the model adapters, and encourage the research community to make use of these resources towards advancing work on multilingual LLMs.
Federated Knowledge Graph Completion via Latent Embedding Sharing and Tensor Factorization
results: Empirical results show that the FLEST method achieves high efficiency and privacy protection while maintaining completion-task performance.
Abstract
Knowledge graphs (KGs), which consist of triples, are inherently incomplete and always require a completion procedure to predict missing triples. In real-world scenarios, KGs are distributed across clients, complicating completion tasks due to privacy restrictions. Many frameworks have been proposed to address the issue of federated knowledge graph completion. However, the existing frameworks, including FedE, FedR, and FKGE, have certain limitations: FedE poses a risk of information leakage, FedR's optimization efficacy diminishes when there is minimal overlap among relations, and FKGE suffers from computational costs and mode collapse issues. To address these issues, we propose a novel method, Federated Latent Embedding Sharing Tensor factorization (FLEST), a novel approach using federated tensor factorization for KG completion. FLEST decomposes the embedding matrix and enables sharing of latent dictionary embeddings to lower privacy risks. Empirical results demonstrate FLEST's effectiveness and efficiency, offering a balanced solution between performance and privacy. FLEST expands the application of federated tensor factorization in KG completion tasks.
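The decomposition at the heart of FLEST can be sketched as follows: each client factorizes its embedding matrix into private coefficients and a shared latent dictionary, and only the dictionary is exchanged. Sizes and the aggregation rule (plain averaging) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ClientEmbedding(nn.Module):
    """Sketch of the FLEST idea: entity embeddings E = C @ D, where D is a
    shared latent dictionary and C holds client-private coefficients. Only
    D would be communicated, so full-resolution entity embeddings never
    leave the client. All sizes are illustrative."""
    def __init__(self, n_entities: int, n_latent: int, dim: int):
        super().__init__()
        self.coeff = nn.Parameter(torch.randn(n_entities, n_latent) * 0.1)
        self.dictionary = nn.Parameter(torch.randn(n_latent, dim) * 0.1)
    def forward(self, entity_ids):
        return self.coeff[entity_ids] @ self.dictionary

def aggregate_dictionaries(clients):
    """Server step (an assumed plain average of the shared dictionaries)."""
    with torch.no_grad():
        mean_d = torch.stack([c.dictionary for c in clients]).mean(0)
        for c in clients:
            c.dictionary.copy_(mean_d)

clients = [ClientEmbedding(1000, 16, 64) for _ in range(3)]
aggregate_dictionaries(clients)
print(clients[0](torch.tensor([0, 5])).shape)  # torch.Size([2, 64])
```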
Emotion-Aware Music Recommendation System: Enhancing User Experience Through Real-Time Emotional Context
results: The model enhances the user's music experience by providing recommendations that match their current emotional state, creating a more meaningful and immersive listening experience.
Abstract
This study addresses the deficiency in conventional music recommendation systems by focusing on the vital role of emotions in shaping users' music choices. These systems often disregard the emotional context, relying predominantly on past listening behavior and failing to consider the dynamic and evolving nature of users' emotional preferences. This gap leads to several limitations. Users may receive recommendations that do not match their current mood, which diminishes the quality of their music experience. Furthermore, without accounting for emotions, the systems might overlook undiscovered or lesser-known songs that have a profound emotional impact on users. To combat these limitations, this research introduces an AI model that incorporates emotional context into the song recommendation process. By accurately detecting users' real-time emotions, the model can generate personalized song recommendations that align with the users' emotional state. This approach aims to enhance the user experience by offering music that resonates with their current mood, elicits the desired emotions, and creates a more immersive and meaningful listening experience. By considering emotional context in the song recommendation process, the proposed model offers an opportunity for a more personalized and emotionally resonant musical journey.
paper_authors: Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin
for: Improving the quality of person-centric image generation, specifically addressing the challenges of training imbalance and quality compromise in current subject-driven image generation methods.
methods: The paper proposes a collaborative generation pipeline called Face-diffuser, which consists of two specialized pre-trained diffusion models (TDM and SDM) and a novel mechanism called Saliency-adaptive Noise Fusion (SNF) to eliminate training imbalance and quality compromise.
results: The paper achieves impressive effectiveness and robustness in person-centric image generation, with extensive experiments confirming the improved performance of Face-diffuser over existing methods.
Abstract
Current subject-driven image generation methods encounter significant challenges in person-centric image generation. The reason is that they learn the semantic scene and person generation by fine-tuning a common pre-trained diffusion model, which involves an irreconcilable training imbalance. Precisely, to generate realistic persons, they need to sufficiently tune the pre-trained model, which inevitably causes the model to forget the rich semantic scene prior and makes scene generation over-fit to the training data. Moreover, even with sufficient fine-tuning, these methods still cannot generate high-fidelity persons, since joint learning of scene and person generation also leads to quality compromise. In this paper, we propose Face-diffuser, an effective collaborative generation pipeline that eliminates the above training imbalance and quality compromise. Specifically, we first develop two specialized pre-trained diffusion models, i.e., a Text-driven Diffusion Model (TDM) and a Subject-augmented Diffusion Model (SDM), for scene and person generation, respectively. The sampling process is divided into three sequential stages: semantic scene construction, subject-scene fusion, and subject enhancement. The first and last stages are performed by TDM and SDM, respectively. The subject-scene fusion stage is a collaboration achieved through a novel and highly effective mechanism, Saliency-adaptive Noise Fusion (SNF). Specifically, it is based on our key observation that there exists a robust link between classifier-free guidance responses and the saliency of generated images. In each time step, SNF leverages the unique strengths of each model and allows for the spatial blending of predicted noises from both models automatically in a saliency-aware manner. Extensive experiments confirm the impressive effectiveness and robustness of Face-diffuser.
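The fusion step can be sketched from the abstract's key observation: the magnitude of each model's classifier-free-guidance response serves as a per-pixel saliency proxy, and the two predicted noises are blended with a saliency-aware mask. The exact fusion rule in Face-diffuser may differ from this simplification.

```python
import torch

def snf_blend(eps_tdm, eps_tdm_uncond, eps_sdm, eps_sdm_uncond):
    """Saliency-adaptive noise fusion, simplified.

    Uses the magnitude of each model's classifier-free-guidance response
    (conditional minus unconditional noise prediction) as a per-pixel
    saliency proxy and blends the two predicted noises with a soft,
    saliency-aware mask.
    """
    sal_t = (eps_tdm - eps_tdm_uncond).abs().mean(dim=1, keepdim=True)
    sal_s = (eps_sdm - eps_sdm_uncond).abs().mean(dim=1, keepdim=True)
    w = torch.softmax(torch.cat([sal_t, sal_s], dim=1), dim=1)
    return w[:, :1] * eps_tdm + w[:, 1:] * eps_sdm

shape = (2, 4, 64, 64)  # (batch, latent channels, H, W)
eps = [torch.randn(shape) for _ in range(4)]
print(snf_blend(*eps).shape)  # torch.Size([2, 4, 64, 64])
```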
Clustering Techniques for Stable Linear Dynamical Systems with applications to Hard Disk Drives
results: The paper presents a k-medoids algorithm for hard clustering of stable LTI systems and a Gaussian Mixture Model clustering method for a special class of LTI systems, enabling the design of controllers that are optimal within each cluster.
Abstract
In Robust Control and Data Driven Robust Control design methodologies, multiple plant transfer functions or a family of transfer functions are considered and a common controller is designed such that all the plants that fall into this family are stabilized. Though the plants are stabilized, the controller might be sub-optimal for each of the plants when the variations in the plants are large. This paper presents a way of clustering stable linear dynamical systems for the design of robust controllers within each of the clusters such that the controllers are optimal for each of the clusters. First a k-medoids algorithm for hard clustering will be presented for stable Linear Time Invariant (LTI) systems and then a Gaussian Mixture Models (GMM) clustering for a special class of LTI systems, common for Hard Disk Drive plants, will be presented.
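A sketch of the hard-clustering step is given below: systems are compared by the peak magnitude of their frequency-response difference (an H-infinity-style metric) and grouped with a plain k-medoids loop; a robust controller would then be designed per cluster. The distance metric and toy plants are illustrative choices.

```python
import numpy as np
from scipy import signal

def system_distance(sys1, sys2, w=np.logspace(-2, 2, 200)):
    """H-infinity-style distance: peak magnitude of the frequency-response
    difference between two stable SISO LTI systems."""
    _, h1 = signal.freqresp(sys1, w)
    _, h2 = signal.freqresp(sys2, w)
    return np.max(np.abs(h1 - h2))

def k_medoids(systems, k, n_iter=20, rng=np.random.default_rng(0)):
    """Plain k-medoids over a precomputed system-distance matrix."""
    n = len(systems)
    d = np.array([[system_distance(a, b) for b in systems] for a in systems])
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(d[:, medoids], axis=1)
        medoids = np.array([
            np.where(labels == j)[0][
                np.argmin(d[np.ix_(labels == j, labels == j)].sum(0))]
            for j in range(k)])
    return labels, medoids

# Two "slow" and two "fast" first-order plants 1/(s+a), all stable:
plants = [signal.lti([1.0], [1.0, a]) for a in (1.0, 1.1, 5.0, 5.2)]
labels, _ = k_medoids(plants, k=2)
print(labels)  # the slow and fast plants should land in separate clusters
```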
Shifting to Machine Supervision: Annotation-Efficient Semi and Self-Supervised Learning for Automatic Medical Image Segmentation and Classification
results: The proposed S4MI (Self-Supervision and Semi-Supervision for Medical Imaging) pipeline is benchmarked on three medical imaging datasets for classification and segmentation. On most datasets, self-supervision with 10% of the annotations performed better than 100% annotation for classification, and the semi-supervised approach achieved favorable segmentation results, outperforming the fully supervised method on all three datasets while using 50% fewer labels.
Abstract
Advancements in clinical treatment and research are limited by supervised learning techniques that rely on large amounts of annotated data, an expensive task requiring many hours of clinical specialists' time. In this paper, we propose using self-supervised and semi-supervised learning. These techniques perform an auxiliary task that is label-free, scaling up machine-supervision is easier compared with fully-supervised techniques. This paper proposes S4MI (Self-Supervision and Semi-Supervision for Medical Imaging), our pipeline to leverage advances in self and semi-supervision learning. We benchmark them on three medical imaging datasets to analyze their efficacy for classification and segmentation. This advancement in self-supervised learning with 10% annotation performed better than 100% annotation for the classification of most datasets. The semi-supervised approach yielded favorable outcomes for segmentation, outperforming the fully-supervised approach by using 50% fewer labels in all three datasets.
paper_authors: Karl J. Friston, Lancelot Da Costa, Alexander Tschantz, Alex Kiefer, Tommaso Salvatori, Victorita Neacsu, Magnus Koudahl, Conor Heins, Noor Sajid, Dimitrije Markovic, Thomas Parr, Tim Verbelen, Christopher L Buckley
results: In an image classification example on the MNIST dataset, the approach can quickly learn a generative model. On the more challenging problems of a simple sprite-based visual disentanglement paradigm and the Tower of Hanoi problem, the generative models autodidactically recover (i.e., disentangle) the factorial structure of latent states and their characteristic paths or dynamics.
Abstract
This paper concerns structure learning or discovery of discrete generative models. It focuses on Bayesian model selection and the assimilation of training data or content, with a special emphasis on the order in which data are ingested. A key move - in the ensuing schemes - is to place priors on the selection of models, based upon expected free energy. In this setting, expected free energy reduces to a constrained mutual information, where the constraints inherit from priors over outcomes (i.e., preferred outcomes). The resulting scheme is first used to perform image classification on the MNIST dataset to illustrate the basic idea, and then tested on a more challenging problem of discovering models with dynamics, using a simple sprite-based visual disentanglement paradigm and the Tower of Hanoi (cf., blocks world) problem. In these examples, generative models are constructed autodidactically to recover (i.e., disentangle) the factorial structure of latent states - and their characteristic paths or dynamics.
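For readers who want the key quantity spelled out, one standard active-inference decomposition of expected free energy, consistent with the abstract's "constrained mutual information" reading, is the following (in generic notation, not necessarily the paper's):

```latex
% Under the usual active-inference convention of a generative density biased
% by prior preferences p(o | C), minimising the expected free energy G
% maximises the mutual information between hidden states s and outcomes o,
% subject to the outcome constraints (preferred outcomes).
\begin{align*}
G &= \mathbb{E}_{q(o,s)}\big[\ln q(s) - \ln q(s \mid o) - \ln p(o \mid C)\big] \\
  &= -\underbrace{I_q(o;\, s)}_{\text{mutual information}}
     \; - \; \underbrace{\mathbb{E}_{q(o)}\big[\ln p(o \mid C)\big]}_{\text{expected log preference (constraint)}}
\end{align*}
```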
paper_authors: Chuang Yang, Kai Zhuang, Mulin Chen, Haozhao Ma, Xu Han, Tao Han, Changxing Guo, Han Han, Bingxuan Zhao, Qi Wang
for: Providing accurate support for autonomous or assistant driving by addressing existing problems in traffic sign detection and recognition.
methods: The paper proposes the traffic sign interpretation (TSI) task, which interprets the global semantic logic of traffic signs and converts it into natural language to provide accurate driving guidance, and designs a multi-task learning architecture for TSI that covers traffic sign detection and recognition as well as natural-language interpretation of the signs.
results: Experiments on the TSI-CN dataset demonstrate that the TSI task is achievable and that the TSI architecture can successfully interpret traffic signs from scenes, even in the presence of complex semantic logic among signs.
Abstract
Most existing traffic sign-related works are dedicated to detecting and recognizing part of traffic signs individually, which fails to analyze the global semantic logic among signs and may convey inaccurate traffic instruction. In light of the above issues, we propose a traffic sign interpretation (TSI) task, which aims to interpret globally semantically interrelated traffic signs (e.g., driving instruction-related texts, symbols, and guide panels) into a natural language for providing accurate instruction support to autonomous or assistant driving. Meanwhile, we design a multi-task learning architecture for TSI, which is responsible for detecting and recognizing various traffic signs and interpreting them into a natural language like a human. Furthermore, the absence of a publicly available TSI dataset prompts us to build a traffic sign interpretation dataset, namely TSI-CN. The dataset consists of real road scene images captured from highways and urban roads in China from a driver's perspective. It contains rich location labels of texts, symbols, and guide panels, and the corresponding natural language description labels. Experiments on TSI-CN demonstrate that the TSI task is achievable and the TSI architecture can interpret traffic signs from scenes successfully even if there is a complex semantic logic among signs. The TSI-CN dataset and the source code of the TSI architecture will be publicly available after the revision process.
Attention Mechanism for Lithium-Ion Battery Lifespan Prediction: Temporal and Cyclic Attention
paper_authors: Jaewook Lee, Seongmin Heo, Jay H. Lee
for: Predicting the lifespan of lithium-ion batteries (LIBs) to optimize usage and prevent accidents.
methods: Employs attention mechanisms (AM) to construct data-driven models for predicting LIB lifespan from easily measurable inputs such as voltage, current, temperature, and capacity data.
results: 1) Temporal attention (TA) and cyclic attention (CA) improve prediction accuracy and elucidate key features in the input data. 2) The computed TA scores highlight the rest phase as a key characteristic distinguishing LIB data among different batches. 3) CA scores reveal variations in the importance of cycles across batches and the potential to reduce the number of cycles in the input data.
Abstract
Accurately predicting the lifespan of lithium-ion batteries (LIBs) is pivotal for optimizing usage and preventing accidents. Previous studies constructing prediction models often relied on inputs that are challenging to measure in real-time operations and failed to comprehensively capture intra-cycle and inter-cycle data patterns, which are essential features for accurate predictions. In this study, we employ attention mechanisms (AM) to develop data-driven models for predicting LIB lifespan using easily measurable inputs such as voltage, current, temperature, and capacity data. The developed model integrates recurrent neural network (RNN) and convolutional neural network (CNN) components, featuring two types of attention mechanisms: temporal attention (TA) and cyclic attention (CA). The inclusion of TA aims to identify important time steps within each cycle by scoring the hidden states of the RNN, whereas CA strives to capture key features of inter-cycle correlations through self-attention (SA). This enhances model accuracy and elucidates critical features in the input data. To validate our method, we apply it to publicly available cycling data consisting of three batches of cycling modes. The calculated TA scores highlight the rest phase as a key characteristic distinguishing LIB data among different batches. Additionally, CA scores reveal variations in the importance of cycles across batches. By leveraging CA scores, we explore the potential to reduce the number of cycles in the input data. The single-head and multi-head attentions enable us to decrease the input dimension from 100 to 50 and 30 cycles, respectively.
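A minimal version of the temporal-attention idea is sketched below: a small network scores each RNN hidden state within a cycle, and the states are pooled by their softmax weights, which double as interpretable importance scores. All sizes are illustrative.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Score RNN hidden states within a cycle and pool them by attention
    weight — a minimal version of the TA idea; the scoring network and
    sizes are illustrative."""
    def __init__(self, hidden: int):
        super().__init__()
        self.score = nn.Linear(hidden, 1)
    def forward(self, h):                        # h: (batch, time, hidden)
        a = torch.softmax(self.score(h), dim=1)  # attention over time steps
        return (a * h).sum(dim=1), a.squeeze(-1)

rnn = nn.GRU(input_size=4, hidden_size=32, batch_first=True)
x = torch.randn(8, 100, 4)   # e.g., voltage/current/temperature/capacity
h, _ = rnn(x)
context, weights = TemporalAttention(32)(h)
print(context.shape, weights.shape)  # torch.Size([8, 32]) torch.Size([8, 100])
```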
Physics-Enhanced Multi-fidelity Learning for Optical Surface Imprint
methods: The study uses a multi-fidelity neural network (MFNN) model, first actively trained on pure simulation data and then bridged to reality via transfer learning to close the sim-to-real gap. A distinctive feature is that the neural network extracts the unknown physics while the known physics is injected into the transfer-learning framework, greatly improving model stability and reducing the data requirement.
results: The results show that the MFNN model can solve the material-characterization inverse problem efficiently and accurately while reducing the data requirement and improving model stability, making the approach applicable to experimental research constrained by limited data and real-world variance.
Abstract
Human fingerprints serve as a unique and powerful characteristic of each person, from which police can recognize one's identity. Similar to humans, many natural bodies and intrinsic mechanical qualities can also be uniquely identified from surface characteristics. To measure the elasto-plastic properties of a material, a sharp indenter is pushed into the measured body under constant force and then retracted, leaving a unique residual imprint of minute size, from several micrometers down to nanometers. However, one great challenge is how to map the optical image of this residual imprint onto the desired mechanical properties, i.e., the tensile force curve. In this paper, we propose a novel method using multi-fidelity neural networks (MFNN) to solve this inverse problem. We first actively train the NN model via pure simulation data, and then bridge the sim-to-real gap via transfer learning. The most innovative part is that we use the NN to dig out the unknown physics while also implanting the known physics into the transfer-learning framework, thus greatly improving model stability and decreasing the data requirement. This work serves as a compelling example of applying machine learning to real experimental research, especially under the constraints of data limitation and fidelity variance.
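The sim-to-real bridging can be sketched as generic transfer learning: a trunk trained on abundant simulation data is frozen, and a small head is fitted on the scarce experimental data. The paper's physics-informed terms are not reproduced here.

```python
import torch
import torch.nn as nn

# Stage 1: train on abundant simulation data (training loop omitted).
trunk = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                      nn.Linear(128, 64), nn.ReLU())
sim_head = nn.Linear(64, 1)
# ... train (trunk + sim_head) on simulated imprint-image features ...

# Stage 2: bridge the sim-to-real gap — freeze the trunk and fit a small
# head on the scarce experimental data. A generic transfer-learning sketch.
for p in trunk.parameters():
    p.requires_grad = False
real_head = nn.Linear(64, 1)
opt = torch.optim.Adam(real_head.parameters(), lr=1e-3)

x_real, y_real = torch.randn(20, 256), torch.randn(20, 1)  # tiny real dataset
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(real_head(trunk(x_real)), y_real)
    loss.backward()
    opt.step()
```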
Interpretable pap smear cell representation for cervical cancer screening
results: The study shows that a variational autoencoder can compute a cell-abnormality score and can be trained without abnormal samples. The best model discriminating squamous cell carcinoma (SCC) from normal cells (NOR) achieves an AUC of 0.908 +- 0.003, and the one discriminating high-grade epithelial lesions (HSIL) from normal cells achieves an AUC of 0.920 +- 0.002. Compared with other clustering methods, the approach better isolates different abnormality regions, aiding interpretation of the results.
Abstract
Screening is critical for the prevention and early detection of cervical cancer, but it is time-consuming and laborious. Supervised deep convolutional neural networks have been developed to automate pap smear screening, and the results are promising. However, interest in using only normal samples to train deep neural networks has increased owing to the class imbalance problems and high labeling costs that are both prevalent in healthcare. In this study, we introduce a method to learn explainable deep cervical cell representations for pap smear cytology images based on one-class classification using variational autoencoders. Findings demonstrate that a score for cell abnormality can be calculated without training models on abnormal samples, and that abnormality can be localized to interpret our results, using a novel metric based on the absolute difference in cross entropy in agglomerative clustering. The best model that discriminates squamous cell carcinoma (SCC) from normals gives an area under the operating characteristic curve (AUC) of 0.908 +- 0.003, and the one that discriminates high-grade epithelial lesions (HSIL) gives an AUC of 0.920 +- 0.002. Compared to other clustering methods, our method enhances the V-measure and yields higher homogeneity scores, which more effectively isolate different abnormality regions, aiding in the interpretation of our results. Evaluation using an in-house and an additional open dataset shows that our model can discriminate abnormality without additional training of deep models.
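The one-class scoring idea can be sketched as follows: a small VAE is trained on normal cells only, and abnormality is scored by the negative ELBO (reconstruction error plus KL term), so no abnormal samples are needed during training. The architecture and sizes are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    """Tiny VAE to be trained on normal cells only; illustrative sizes."""
    def __init__(self, d_in: int = 1024, d_z: int = 32):
        super().__init__()
        self.enc = nn.Linear(d_in, 256)
        self.mu, self.logvar = nn.Linear(256, d_z), nn.Linear(256, d_z)
        self.dec = nn.Sequential(nn.Linear(d_z, 256), nn.ReLU(),
                                 nn.Linear(256, d_in))
    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.dec(z), mu, logvar

def abnormality_score(model, x):
    """Negative ELBO per sample: higher = more surprising = more likely
    abnormal, without ever training on abnormal samples."""
    recon, mu, logvar = model(x)
    rec = F.mse_loss(recon, x, reduction="none").sum(dim=1)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)
    return rec + kl

score = abnormality_score(VAE(), torch.randn(4, 1024))
print(score.shape)  # torch.Size([4])
```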
FedTruth: Byzantine-Robust and Backdoor-Resilient Federated Learning Framework
results: Experiments show that FedTruth effectively mitigates attacks from malicious clients, protecting the FL model from the impacts of both model-poisoning and backdoor attacks.
Abstract
Federated Learning (FL) enables collaborative machine learning model training across multiple parties without sharing raw data. However, FL's distributed nature allows malicious clients to impact model training through Byzantine or backdoor attacks, using erroneous model updates. Existing defenses measure the deviation of each update from a 'ground-truth model update.' They often rely on a benign root dataset on the server or use trimmed mean or median for clipping, both methods having limitations. We introduce FedTruth, a robust defense against model poisoning in FL. FedTruth doesn't assume specific data distributions nor requires a benign root dataset. It estimates a global model update with dynamic aggregation weights, considering contributions from all benign clients. Empirical studies demonstrate FedTruth's efficacy in mitigating the impacts of poisoned updates from both Byzantine and backdoor attacks.
Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning
results: The study shows that the algorithm performs at par with or above the state of the art on classification, regression, and anomaly detection tasks using a single model, while providing novel concepts for characterizing data and predictions that enhance the model's interpretability.
Abstract
Nonparametric learning is a fundamental concept in machine learning that aims to capture complex patterns and relationships in data without making strong assumptions about the underlying data distribution. Owing to simplicity and familiarity, one of the most well-known algorithms under this paradigm is the $k$-nearest neighbors ($k$-NN) algorithm. Driven by the usage of machine learning in safety-critical applications, in this work, we shed new light on the traditional nearest neighbors algorithm from the perspective of information theory and propose a robust and interpretable framework for tasks such as classification, regression, and anomaly detection using a single model. Instead of using a traditional distance measure which needs to be scaled and contextualized, we use a novel formulation of \textit{surprisal} (amount of information required to explain the difference between the observed and expected result). Finally, we demonstrate this architecture's capability to perform at-par or above the state-of-the-art on classification, regression, and anomaly detection tasks using a single model with enhanced interpretability by providing novel concepts for characterizing data and predictions.
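The following is an illustrative sketch of a surprisal-based neighbor score in this spirit: the raw distance is replaced with the information (in nats) needed to explain the gap between a query and a neighbor under a per-feature Gaussian model fit on the training data. The Gaussian choice and the voting rule are our assumptions, not the paper's exact formulation.

```python
# Hedged sketch: k-NN where "distance" is surprisal under an assumed model.
import numpy as np

def surprisal(query, neighbor, sigma):
    # -log p of the observed feature-wise difference under N(0, 2*sigma^2)
    diff = query - neighbor
    var = 2.0 * sigma ** 2
    return float(np.sum(0.5 * np.log(2 * np.pi * var) + diff ** 2 / (2 * var)))

def knn_predict(X_train, y_train, query, k=5):
    sigma = X_train.std(axis=0) + 1e-8          # per-feature scale
    scores = np.array([surprisal(query, x, sigma) for x in X_train])
    nearest = np.argsort(scores)[:k]
    votes = np.bincount(y_train[nearest])
    anomaly_score = scores[nearest].mean()      # doubles as an anomaly measure
    return votes.argmax(), anomaly_score

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
label, score = knn_predict(X, y, rng.normal(3, 1, 4))
```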
Efficient Temporally-Aware DeepFake Detection using H.264 Motion Vectors
results: Experiments show that this approach enables effective DeepFake detection with minimal computational cost compared with traditional per-frame RGB-only methods.
Abstract
Video DeepFakes are fake media created with Deep Learning (DL) that manipulate a person's expression or identity. Most current DeepFake detection methods analyze each frame independently, ignoring inconsistencies and unnatural movements between frames. Some newer methods employ optical flow models to capture this temporal aspect, but they are computationally expensive. In contrast, we propose using the related but often ignored Motion Vectors (MVs) and Information Masks (IMs) from the H.264 video codec, to detect temporal inconsistencies in DeepFakes. Our experiments show that this approach is effective and has minimal computational costs, compared with per-frame RGB-only methods. This could lead to new, real-time temporally-aware DeepFake detection methods for video calls and streaming.
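Below is a sketch of temporal-consistency features over codec motion vectors. We assume the MV field for each frame has already been exported (e.g., with FFmpeg's `-flags2 +export_mvs` debug option) into arrays of shape (H_blocks, W_blocks, 2); the feature set and threshold rule here are illustrative stand-ins for the learned detector in the paper.

```python
# Hedged sketch: temporal features from precomputed motion-vector fields.
import numpy as np

def temporal_mv_features(mv_fields):
    """mv_fields: list of (H, W, 2) arrays, one per inter-coded frame."""
    mags = [np.linalg.norm(f, axis=-1) for f in mv_fields]
    diffs = [np.abs(a - b).mean() for a, b in zip(mags[:-1], mags[1:])]
    return np.array([
        np.mean(diffs),                   # average frame-to-frame MV change
        np.std(diffs),                    # jitter: unnatural motion varies a lot
        np.mean([m.mean() for m in mags]),
    ])

def looks_fake(mv_fields, jitter_threshold=1.5):
    feats = temporal_mv_features(mv_fields)
    return feats[1] > jitter_threshold    # placeholder for a trained classifier

rng = np.random.default_rng(2)
fields = [rng.normal(0, 1, (30, 40, 2)) for _ in range(16)]
print(temporal_mv_features(fields), looks_fake(fields))
```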
Advancements in Generative AI: A Comprehensive Review of GANs, GPT, Autoencoders, Diffusion Model, and Transformers
paper_authors: Staphord Bengesi, Hoda El-Sayed, Md Kamruzzaman Sarker, Yao Houkpati, John Irungu, Timothy Oladunni
for: This paper explores the recent advancements in Generative Artificial Intelligence, particularly the development and applications of cutting-edge tools like Bard, Stable Diffusion, DALL-E, Make-A-Video, Runway ML, and Jukebox.
methods: The paper discusses various state-of-the-art models used in these tools, including Stable Diffusion, transformer models like GPT-3 (recent GPT-4), variational autoencoders, and generative adversarial networks.
results: The paper highlights the remarkable capabilities of these tools in accomplishing tasks such as text generation, music composition, image creation, video production, code generation, and scientific work, and also discusses the challenges posed by these advancements.
Abstract
The launch of ChatGPT has garnered global attention, marking a significant milestone in the field of Generative Artificial Intelligence. While Generative AI has been in effect for the past decade, the introduction of ChatGPT has ignited a new wave of research and innovation in the AI domain. This surge in interest has led to the development and release of numerous cutting-edge tools, such as Bard, Stable Diffusion, DALL-E, Make-A-Video, Runway ML, and Jukebox, among others. These tools exhibit remarkable capabilities, encompassing tasks ranging from text generation and music composition, image creation, video production, code generation, and even scientific work. They are built upon various state-of-the-art models, including Stable Diffusion, transformer models like GPT-3 (recent GPT-4), variational autoencoders, and generative adversarial networks. This advancement in Generative AI presents a wealth of exciting opportunities and, simultaneously, unprecedented challenges. Throughout this paper, we have explored these state-of-the-art models, the diverse array of tasks they can accomplish, the challenges they pose, and the promising future of Generative Artificial Intelligence.
results: The pipeline's effectiveness is evaluated using standard metrics.
Abstract
With the increase in video-sharing platforms across the internet, it is difficult for humans to moderate the data for explicit content. Hence, an automated pipeline to scan through video data for explicit content has become the need of the hour. We propose a novel pipeline that uses multi-modal deep learning to first extract the explicit segments of input videos and then summarize their content using text to determine its age appropriateness and age rating. We also evaluate our pipeline's effectiveness in the end using standard metrics.
Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models
results: We demonstrate the effectiveness of the proposed method on the Active Vision dataset and the ADE20K dataset. We compare our labeling procedure with human annotations and show improved performance on downstream tasks such as object goal navigation and part discovery.
Abstract
The image annotation stage is a critical and often the most time-consuming part required for training and evaluating object detection and semantic segmentation models. Deployment of the existing models in novel environments often requires detecting novel semantic classes not present in the training data. Furthermore, indoor scenes contain significant viewpoint variations, which need to be handled properly by trained perception models. We propose to leverage the recent advancements in state-of-the-art models for bottom-up segmentation (SAM), object detection (Detic), and semantic segmentation (MaskFormer), all trained on large-scale datasets. We aim to develop a cost-effective labeling approach to obtain pseudo-labels for semantic segmentation and object instance detection in indoor environments, with the ultimate goal of facilitating the training of lightweight models for various downstream tasks. We also propose a multi-view labeling fusion stage, which considers the setting where multiple views of the scenes are available and can be used to identify and rectify single-view inconsistencies. We demonstrate the effectiveness of the proposed approach on the Active Vision dataset and the ADE20K dataset. We evaluate the quality of our labeling process by comparing it with human annotations. Also, we demonstrate the effectiveness of the obtained labels in downstream tasks such as object goal navigation and part discovery. In the context of object goal navigation, we depict enhanced performance using this fusion approach compared to a zero-shot baseline that utilizes large monolithic vision-language pre-trained models.
Token-level Adaptation of LoRA Adapters for Downstream Task Generalization
results: The results show that token-level adaptation of LoRA adapters outperforms the base Llama-2-7b model across mathematical (GSM8K), scientific (ARC-Challenge), reading comprehension (SQuAD), and coding (CodeAlpaca-20k) tasks.
Abstract
This paper introduces a method for adapting LoRA adapters in smaller-sized language models to arbitrary downstream tasks. Unlike standard mixture-of-expert architectures, our method employs a gradient-free routing function to choose a weighted combination of experts without increasing the compute requirements for training or inference. The results show that token-level adaptation of LoRA adapters outperforms the base Llama-2-7b model across mathematical (GSM8K), scientific (ARC-Challenge), reading comprehension (SQuAD), and coding (CodeAlpaca-20k) tasks. Further evaluations also show that the average performance of token-level adaptation outperforms individual models fine-tuned for each of the tasks with the best performance observed in adaptation of every-other token during inference. The code for this study is made available through a public repository.
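Below is a hedged sketch of gradient-free, token-level routing over a set of task-specific LoRA adapters: each adapter's low-rank delta is mixed with weights derived from the similarity between the token's hidden state and a fixed per-expert centroid. The centroid construction and softmax routing are our assumptions about what a "gradient-free routing function" could look like, not the paper's exact mechanism.

```python
# Hedged sketch: per-token mixture of LoRA adapter deltas (assumed routing).
import torch
import torch.nn.functional as F

class TokenRoutedLoRA(torch.nn.Module):
    def __init__(self, base_linear, loras, expert_centroids):
        super().__init__()
        self.base = base_linear              # frozen W x
        self.loras = loras                   # list of (A, B) low-rank pairs
        self.centroids = expert_centroids    # (n_experts, d), precomputed

    def forward(self, x):                    # x: (batch, seq, d)
        sim = F.softmax(x @ self.centroids.T, dim=-1)    # (b, s, n_experts)
        delta = torch.stack([(x @ A) @ B for A, B in self.loras], dim=-2)
        mixed = (sim.unsqueeze(-1) * delta).sum(dim=-2)  # per-token mixture
        return self.base(x) + mixed

d, r, n_exp = 16, 4, 3
base = torch.nn.Linear(d, d, bias=False)
loras = [(torch.randn(d, r), torch.randn(r, d)) for _ in range(n_exp)]
router = TokenRoutedLoRA(base, loras, torch.randn(n_exp, d))
out = router(torch.randn(2, 5, d))           # (2, 5, 16)
```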
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
paper_authors: Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi
for: The paper aims to improve the understanding and best practices of adapting pretrained language models to downstream tasks and user preferences.
methods: The authors use a number of advances in open resources for instruction tuning, including better base models and new finetuning techniques, to improve the T"ULU models. They release a suite of improved models, including T"ULU-V2-mix, T"ULU 2, T"ULU 2+DPO, and CODE T"ULU 2.
results: The authors evaluate the T"ULU 2 suite on multiple benchmarks and show that it achieves state-of-the-art performance among open models and matches or exceeds the performance of GPT-3.5-turbo-0301 on several benchmarks.
Abstract
Since the release of T\"ULU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques. We test and incorporate a number of these advances into T\"ULU, resulting in T\"ULU 2, a suite of improved T\"ULU models for advancing the understanding and best practices of adapting pretrained language models to downstream tasks and user preferences. Concretely, we release: (1) T\"ULU-V2-mix, an improved collection of high-quality instruction datasets; (2) T\"ULU 2, LLAMA-2 models finetuned on the V2 mixture; (3) T\"ULU 2+DPO, T\"ULU 2 models trained with direct preference optimization (DPO), including the largest DPO-trained model to date (T\"ULU 2+DPO 70B); (4) CODE T\"ULU 2, CODE LLAMA models finetuned on our V2 mix that outperform CODE LLAMA and its instruction-tuned variant, CODE LLAMA-Instruct. Our evaluation from multiple perspectives shows that the T\"ULU 2 suite achieves state-of-the-art performance among open models and matches or exceeds the performance of GPT-3.5-turbo-0301 on several benchmarks. We release all the checkpoints, data, training and evaluation code to facilitate future open efforts on adapting large language models.
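For reference, here is a minimal sketch of the direct preference optimization (DPO) loss used to train the T\"ULU 2+DPO models. `logp_*` are summed token log-probabilities of the chosen/rejected responses under the policy and a frozen reference model, and `beta` is the usual DPO temperature; this is the standard published DPO objective, not T\"ULU-specific code.

```python
# The standard DPO objective (not the authors' training code).
import torch
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Margin by which the policy prefers the chosen response, relative to ref.
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
```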
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
results: Our experimental results show that these "attentionless Transformers" can rival the performance of the original architecture. We demonstrate the viability of our approach through rigorous ablation studies and by experimenting with replacement networks of various types and sizes.
Abstract
This work presents an analysis of the effectiveness of using standard shallow feed-forward networks to mimic the behavior of the attention mechanism in the original Transformer model, a state-of-the-art architecture for sequence-to-sequence tasks. We substitute key elements of the attention mechanism in the Transformer with simple feed-forward networks, trained using the original components via knowledge distillation. Our experiments, conducted on the IWSLT2017 dataset, reveal the capacity of these "attentionless Transformers" to rival the performance of the original architecture. Through rigorous ablation studies, and experimenting with various replacement network types and sizes, we offer insights that support the viability of our approach. This not only sheds light on the adaptability of shallow feed-forward networks in emulating attention mechanisms but also underscores their potential to streamline complex architectures for sequence-to-sequence tasks.
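The sketch below illustrates the substitution described above: a shallow feed-forward block is trained to mimic a teacher attention layer's outputs via simple distillation (MSE on matching inputs). A fixed sequence length is assumed so the FFN can see the whole sequence at once; the sizes and training data are illustrative, not the paper's setup.

```python
# Hedged sketch: distilling an attention layer into a shallow FFN.
import torch
import torch.nn as nn

seq_len, d_model = 16, 32
teacher = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
student = nn.Sequential(                       # "attentionless" replacement
    nn.Flatten(1),                             # (b, seq*d)
    nn.Linear(seq_len * d_model, 512), nn.ReLU(),
    nn.Linear(512, seq_len * d_model),
    nn.Unflatten(1, (seq_len, d_model)),
)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
for step in range(100):                        # distillation loop (toy data)
    x = torch.randn(8, seq_len, d_model)
    with torch.no_grad():
        target, _ = teacher(x, x, x)           # teacher attention output
    loss = nn.functional.mse_loss(student(x), target)
    opt.zero_grad(); loss.backward(); opt.step()
```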
Countering Misinformation via Emotional Response Generation
results: The paper presents the first large-scale dataset of claim-response pairs (roughly 12 thousand, linked to debunking articles) that accounts for both basic emotions and SMP-style communication, two factors with a significant role in misinformation credibility and spreading. Extensive experiments show that models trained on the dataset achieve significant improvements in output quality and generalization capability.
Abstract
The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and ultimately democracy. Previous research has shown how social correction can be an effective way to curb misinformation, by engaging directly in a constructive dialogue with users who spread -- often in good faith -- misleading messages. Although professional fact-checkers are crucial to debunking viral claims, they usually do not engage in conversations on social media. Thereby, significant effort has been made to automate the use of fact-checker material in social correction; however, no previous work has tried to integrate it with the style and pragmatics that are commonly employed in social media communication. To fill this gap, we present VerMouth, the first large-scale dataset comprising roughly 12 thousand claim-response pairs (linked to debunking articles), accounting for both SMP-style and basic emotions, two factors which have a significant role in misinformation credibility and spreading. To collect this dataset we used a technique based on an author-reviewer pipeline, which efficiently combines LLMs and human annotators to obtain high-quality data. We also provide comprehensive experiments showing how models trained on our proposed dataset have significant improvements in terms of output quality and generalization capabilities.
Detection of Offensive and Threatening Online Content in a Low Resource Language
paper_authors: Fatima Muhammad Adam, Abubakar Yakubu Zandam, Isa Inuwa-Dutse
for: This study aimed to address the lack of detection systems for offensive and threatening language in Hausa, a low-resource language spoken by over 100 million people in Africa.
methods: The study consisted of two user studies (n=308) to investigate cyberbullying-related issues, collecting and annotating the first set of offensive and threatening datasets in Hausa, and developing a detection system to flag offensive and threatening content.
results: The detection system was able to detect more than 70% of offensive and threatening content, but many of these were mistranslated by Google's translation engine. The study highlights the need for a more effective detection system, which can be achieved by involving diverse stakeholders in understanding local conventions and demographics.
Abstract
Hausa is a major Chadic language, spoken by over 100 million people in Africa. However, from a computational linguistic perspective, it is considered a low-resource language, with limited resources to support Natural Language Processing (NLP) tasks. Online platforms often facilitate social interactions that can lead to the use of offensive and threatening language, which can go undetected due to the lack of detection systems designed for Hausa. This study aimed to address this issue by (1) conducting two user studies (n=308) to investigate cyberbullying-related issues, (2) collecting and annotating the first set of offensive and threatening datasets to support relevant downstream tasks in Hausa, (3) developing a detection system to flag offensive and threatening content, and (4) evaluating the detection system and the efficacy of the Google-based translation engine in detecting offensive and threatening terms in Hausa. We found that offensive and threatening content is quite common, particularly when discussing religion and politics. Our detection system was able to detect more than 70% of offensive and threatening content, although many of these were mistranslated by Google's translation engine. We attribute this to the subtle relationship between offensive and threatening content and idiomatic expressions in the Hausa language. We recommend that diverse stakeholders participate in understanding local conventions and demographics in order to develop a more effective detection system. These insights are essential for implementing targeted moderation strategies to create a safe and inclusive online environment.
When a Language Question Is at Stake. A Revisited Approach to Label Sensitive Content
results: Experiments show that the pseudo-labeling approach can produce a high-quality dataset; the paper also provides a fundamental statistical analysis and an evaluation of the models used.
Abstract
Many under-resourced languages require high-quality datasets for specific tasks such as offensive language detection, disinformation, or misinformation identification. However, the intricacies of the content may have a detrimental effect on the annotators. The article aims to revisit an approach of pseudo-labeling sensitive data on the example of Ukrainian tweets covering the Russian-Ukrainian war. Nowadays, this acute topic is in the spotlight of various language manipulations that cause numerous disinformation and profanity on social media platforms. The conducted experiment highlights three main stages of data annotation and underlines the main obstacles during machine annotation. Ultimately, we provide a fundamental statistical analysis of the obtained data, evaluation of models used for pseudo-labelling, and set further guidelines on how the scientists can leverage the corpus to execute more advanced research and extend the existing data samples without annotators' engagement.
Sinhala-English Word Embedding Alignment: Introducing Datasets and Benchmark for a Low Resource Language
paper_authors: Kasun Wickramasinghe, Nisansa de Silva
for: This paper aims to align Sinhala and English word embedding spaces, addressing the lack of attention on low-resource languages in previous research.
methods: The authors use available alignment techniques and introduce a benchmark for Sinhala language embedding alignment, as well as an intermediate task of creating Sinhala-English alignment datasets.
results: While the results are not comparable to those of high-resource languages, the paper lays the groundwork for more specialized alignment between English and Sinhala embeddings.
Abstract
Since their inception, embeddings have become a primary ingredient in many flavours of Natural Language Processing (NLP) tasks supplanting earlier types of representation. Even though multilingual embeddings have been used for the increasing number of multilingual tasks, due to the scarcity of parallel training data, low-resource languages such as Sinhala, tend to focus more on monolingual embeddings. Then when it comes to the aforementioned multi-lingual tasks, it is challenging to utilize these monolingual embeddings given that even if the embedding spaces have a similar geometric arrangement due to an identical training process, the embeddings of the languages considered are not aligned. This is solved by the embedding alignment task. Even in this, high-resource language pairs are in the limelight while low-resource languages such as Sinhala which is in dire need of help seem to have fallen by the wayside. In this paper, we try to align Sinhala and English word embedding spaces based on available alignment techniques and introduce a benchmark for Sinhala language embedding alignment. In addition to that, to facilitate the supervised alignment, as an intermediate task, we also introduce Sinhala-English alignment datasets. These datasets serve as our anchor datasets for supervised word embedding alignment. Even though we do not obtain results comparable to the high-resource languages such as French, German, or Chinese, we believe our work lays the groundwork for more specialized alignment between English and Sinhala embeddings.
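As one representative of the "available alignment techniques" such anchor datasets enable, the sketch below learns an orthogonal map from one embedding space to the other by solving the Procrustes problem over anchor translation pairs. This is the classic SVD solution; the synthetic data stands in for real Sinhala-English embedding pairs.

```python
# Classic supervised Procrustes alignment of two embedding spaces (sketch).
import numpy as np

def procrustes_align(X_src, Y_tgt):
    """X_src, Y_tgt: (n_pairs, d) embeddings of anchor translation pairs."""
    U, _, Vt = np.linalg.svd(Y_tgt.T @ X_src)
    return U @ Vt                      # orthogonal W minimizing ||X W^T - Y||

rng = np.random.default_rng(3)
true_W = np.linalg.qr(rng.normal(size=(50, 50)))[0]   # hidden rotation
X = rng.normal(size=(200, 50))                        # "Sinhala" side (toy)
Y = X @ true_W.T + 0.01 * rng.normal(size=(200, 50))  # noisy "English" side
W = procrustes_align(X, Y)
print(np.linalg.norm(X @ W.T - Y) / np.linalg.norm(Y))  # small residual
```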
Causal Graph in Language Model Rediscovers Cortical Hierarchy in Human Narrative Processing
results: The study finds that the brain-prediction-accuracy maps of these two feature classes follow a hierarchical pattern consistent with the cortical map of activity time constants, suggesting a parallel between how language models and the human brain process linguistic information.
Abstract
Understanding how humans process natural language has long been a vital research direction. The field of natural language processing (NLP) has recently experienced a surge in the development of powerful language models. These models have proven to be invaluable tools for studying another complex system known to process human language: the brain. Previous studies have demonstrated that the features of language models can be mapped to fMRI brain activity. This raises the question: is there a commonality between information processing in language models and the human brain? To estimate information flow patterns in a language model, we examined the causal relationships between different layers. Drawing inspiration from the workspace framework for consciousness, we hypothesized that features integrating more information would more accurately predict higher hierarchical brain activity. To validate this hypothesis, we classified language model features into two categories based on causal network measures: 'low in-degree' and 'high in-degree'. We subsequently compared the brain prediction accuracy maps for these two groups. Our results reveal that the difference in prediction accuracy follows a hierarchical pattern, consistent with the cortical hierarchy map revealed by activity time constants. This finding suggests a parallel between how language models and the human brain process linguistic information.
Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads
results: The study finds that gender- and racial-bias attention heads exist in both types of Transformer-based PLMs for English, and that these biased heads behave differently across models. The results shed light on understanding bias behavior in pretrained language models.
Abstract
Transformer-based pretrained large language models (PLM) such as BERT and GPT have achieved remarkable success in NLP tasks. However, PLMs are prone to encoding stereotypical biases. Although a burgeoning literature has emerged on stereotypical bias mitigation in PLMs, such as work on debiasing gender and racial stereotyping, how such biases manifest and behave internally within PLMs remains largely unknown. Understanding the internal stereotyping mechanisms may allow better assessment of model fairness and guide the development of effective mitigation strategies. In this work, we focus on attention heads, a major component of the Transformer architecture, and propose a bias analysis framework to explore and identify a small set of biased heads that are found to contribute to a PLM's stereotypical bias. We conduct extensive experiments to validate the existence of these biased heads and to better understand how they behave. We investigate gender and racial bias in the English language in two types of Transformer-based PLMs: the encoder-based BERT model and the decoder-based autoregressive GPT model. Overall, the results shed light on understanding the bias behavior in pretrained language models.
FOAL: Fine-grained Contrastive Learning for Cross-domain Aspect Sentiment Triplet Extraction
methods: The study proposes Fine-grained cOntrAstive Learning (FOAL) to reduce the domain discrepancy while preserving the discriminability of each category, thereby improving ASTE performance.
results: Experiments on six transfer pairs show that FOAL achieves 6% performance gains on ASTE while significantly reducing the domain discrepancy.
Abstract
Aspect Sentiment Triplet Extraction (ASTE) has achieved promising results while relying on sufficient annotation data in a specific domain. However, it is infeasible to annotate data for each individual domain. We propose to explore ASTE in the cross-domain setting, which transfers knowledge from a resource-rich source domain to a resource-poor target domain, thereby alleviating the reliance on labeled data in the target domain. To effectively transfer the knowledge across domains and extract the sentiment triplets accurately, we propose a method named Fine-grained cOntrAstive Learning (FOAL) to reduce the domain discrepancy and preserve the discriminability of each category. Experiments on six transfer pairs show that FOAL achieves 6% performance gains and reduces the domain discrepancy significantly compared with strong baselines. Our code will be publicly available once accepted.
Exploring the Relationship between In-Context Learning and Instruction Tuning
results: The study finds that both ICL and IT change the LLM's hidden states, and that ICL is an implicit form of IT. Moreover, the convergence between ICL and IT is largely contingent on several factors related to the provided demonstrations. This work offers a new perspective on understanding LLM behavior.
Abstract
In-Context Learning (ICL) and Instruction Tuning (IT) are two primary paradigms of adopting Large Language Models (LLMs) to downstream applications. However, they are significantly different. In ICL, a set of demonstrations are provided at inference time but the LLM's parameters are not updated. In IT, a set of demonstrations are used to tune LLM's parameters in training time but no demonstrations are used at inference time. Although a growing body of literature has explored ICL and IT, studies on these topics have largely been conducted in isolation, leading to a disconnect between these two paradigms. In this work, we explore the relationship between ICL and IT by examining how the hidden states of LLMs change in these two paradigms. Through carefully designed experiments conducted with LLaMA-2 (7B and 13B), we find that ICL is implicit IT. In other words, ICL changes an LLM's hidden states as if the demonstrations were used to instructionally tune the model. Furthermore, the convergence between ICL and IT is largely contingent upon several factors related to the provided demonstrations. Overall, this work offers a unique perspective to explore the connection between ICL and IT and sheds light on understanding the behaviors of LLM.
Complementary Advantages of ChatGPTs and Human Readers in Reasoning: Evidence from English Text Reading Comprehension
paper_authors: Tongquan Zhou, Yao Zhang, Siyi Cao, Yulu Li, Tao Wang
for: investigate how ChatGPTs and Chinese senior school students exhibited their reasoning ability from English narrative texts.
methods: used three reasoning tests: Test 1 for commonsense inference, Test 2 for emotional inference, and Test 3 for causal inference.
results: ChatGPTs outperformed the students in daily-life inferences and positive emotions, but the students showed superiority in negative emotions and logical analysis. ChatGPT Plus excelled in the updated-command condition.
Abstract
ChatGPT has shown its great power in text processing, including its reasoning ability from text reading. However, there has not been any direct comparison between human readers and ChatGPT in reasoning ability related to text reading. This study was undertaken to investigate how ChatGPTs (i.e., ChatGPT and ChatGPT Plus) and Chinese senior school students as ESL learners exhibited their reasoning ability from English narrative texts. Additionally, we compared the two ChatGPTs in the reasoning performances when commands were updated elaborately. The whole study was composed of three reasoning tests: Test 1 for commonsense inference, Test 2 for emotional inference, and Test 3 for causal inference. The results showed that in Test 1, the students outdid the two ChatGPT versions in local-culture-related inferences but performed worse than the chatbots in daily-life inferences. In Test 2, ChatGPT Plus excelled whereas ChatGPT lagged behind in accuracy. In association with both accuracy and frequency of correct responses, the students were inferior to the two chatbots. Compared with ChatGPTs' better performance in positive emotions, the students showed their superiority in inferring negative emotions. In Test 3, the students demonstrated better logical analysis, outdoing both chatbots. In updating command condition, ChatGPT Plus displayed good causal reasoning ability while ChatGPT kept unchanged. Our study reveals that human readers and ChatGPTs have their respective advantages and disadvantages in drawing inferences from text reading comprehension, unlocking a complementary relationship in text-based reasoning.
Prompt Pool based Class-Incremental Continual Learning for Dialog State Tracking
paper_authors: Hong Liu, Yucheng Cai, Yuan Zhou, Zhijian Ou, Yi Huang, Junlan Feng
for: This study addresses continual learning of dialog state tracking (DST) in the class-incremental scenario, where task identities are unknown during testing.
methods: We propose a prompt pool method that maintains a pool of key-value paired prompts and selects prompts according to the distance between the dialog history and the prompt keys. The method can automatically identify tasks and select appropriate prompts during testing.
results: Experiments on the Schema-Guided Dialog dataset (SGD) and a dataset collected from a real-world dialog application show that the prompt pool method achieves much higher joint goal accuracy than the baseline, and that combining it with a rehearsal buffer further improves model performance.
Abstract
Continual learning is crucial for dialog state tracking (DST) in dialog systems, since requirements from users for new functionalities are often encountered. However, most of existing continual learning methods for DST require task identities during testing, which is a severe limit in real-world applications. In this paper, we aim to address continual learning of DST in the class-incremental scenario (namely the task identity is unknown in testing). Inspired by the recently emerging prompt tuning method that performs well on dialog systems, we propose to use the prompt pool method, where we maintain a pool of key-value paired prompts and select prompts from the pool according to the distance between the dialog history and the prompt keys. The proposed method can automatically identify tasks and select appropriate prompts during testing. We conduct experiments on Schema-Guided Dialog dataset (SGD) and another dataset collected from a real-world dialog application. Experiment results show that the prompt pool method achieves much higher joint goal accuracy than the baseline. After combining with a rehearsal buffer, the model performance can be further improved.
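Below is a hedged sketch of the prompt pool: a set of (key, prompt) pairs whose keys live in the same space as an encoded dialog history, with the top-k closest prompts prepended to the input. The pool size, top-k, and cosine distance are illustrative choices consistent with the description above, not the paper's exact hyperparameters.

```python
# Hedged sketch: key-value prompt pool with distance-based selection.
import torch
import torch.nn.functional as F

class PromptPool(torch.nn.Module):
    def __init__(self, pool_size=20, prompt_len=5, d_model=64, top_k=3):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(pool_size, d_model))
        self.prompts = torch.nn.Parameter(
            torch.randn(pool_size, prompt_len, d_model))
        self.top_k = top_k

    def forward(self, history_embedding, input_embeddings):
        # history_embedding: (b, d); input_embeddings: (b, seq, d)
        dist = 1 - F.cosine_similarity(
            history_embedding.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)
        idx = dist.topk(self.top_k, largest=False).indices        # (b, k)
        chosen = self.prompts[idx].flatten(1, 2)                  # (b, k*len, d)
        return torch.cat([chosen, input_embeddings], dim=1)

pool = PromptPool()
out = pool(torch.randn(2, 64), torch.randn(2, 10, 64))  # (2, 3*5+10, 64)
```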
Energy and Carbon Considerations of Fine-Tuning BERT
paper_authors: Xiaorong Wang, Clara Na, Emma Strubell, Sorelle Friedler, Sasha Luccioni
for: This paper aims to provide a comprehensive understanding of the energy and carbon footprint of fine-tuning in NLP, in order to better characterize the role of fine-tuning in the landscape of energy and carbon emissions.
methods: The paper uses a careful empirical study of the computational costs of fine-tuning across tasks, datasets, hardware infrastructure, and measurement modalities to place fine-tuning energy and carbon costs into perspective with respect to pre-training and inference.
results: The paper outlines recommendations to NLP researchers and practitioners who wish to improve their fine-tuning energy efficiency.
Abstract
Despite the popularity of the `pre-train then fine-tune' paradigm in the NLP community, existing work quantifying energy costs and associated carbon emissions has largely focused on language model pre-training. Although a single pre-training run draws substantially more energy than fine-tuning, fine-tuning is performed more frequently by many more individual actors, and thus must be accounted for when considering the energy and carbon footprint of NLP. In order to better characterize the role of fine-tuning in the landscape of energy and carbon emissions in NLP, we perform a careful empirical study of the computational costs of fine-tuning across tasks, datasets, hardware infrastructure and measurement modalities. Our experimental results allow us to place fine-tuning energy and carbon costs into perspective with respect to pre-training and inference, and outline recommendations to NLP researchers and practitioners who wish to improve their fine-tuning energy efficiency.
Diagnosing and Debiasing Corpus-Based Political Bias and Insults in GPT2
for: This study aims to investigate the effectiveness of a decoding algorithm in mitigating insults and political bias in generated text, with the goal of contributing to the ongoing effort of examining the ethical and social implications of human-AI interaction.
methods: The study uses generative pretrained transformer (GPT) language models that have the ability to recognize and detect toxicity in generated content, and a decoding algorithm that allows the models to self-debias and reduce the likelihood of generating harmful text.
results: The study aims to evaluate the efficacy of the diagnosing-debiasing approach in mitigating insults and political bias in generated text, and contribute to the ongoing effort of understanding the ethical and social implications of human-AI interaction.
Abstract
The training of large language models (LLMs) on extensive, unfiltered corpora sourced from the internet is a common and advantageous practice. Consequently, LLMs have learned and inadvertently reproduced various types of biases, including violent, offensive, and toxic language. However, recent research shows that generative pretrained transformer (GPT) language models can recognize their own biases and detect toxicity in generated content, a process referred to as self-diagnosis. In response, researchers have developed a decoding algorithm that allows LLMs to self-debias, or reduce their likelihood of generating harmful text. This study investigates the efficacy of the diagnosing-debiasing approach in mitigating two additional types of biases: insults and political bias. These biases are often used interchangeably in discourse, despite exhibiting potentially dissimilar semantic and syntactic properties. We aim to contribute to the ongoing effort of investigating the ethical and social implications of human-AI interaction.
for: Learning mappings between continuous functions in 3D Euclidean space.
methods: The model combines a coefficient-learning scheme with a residual operator layer and is guaranteed SE(3)-equivariance by design. From the graph-spectrum view, the method can be interpreted as convolution on graphons (dense graphs with infinitely many nodes), which the authors term InfGCN.
results: Extensive experiments on large-scale electron density datasets show that the model significantly outperforms current state-of-the-art architectures; multiple ablation studies demonstrate the effectiveness of the proposed architecture.
Abstract
We propose a general architecture that combines the coefficient learning scheme with a residual operator layer for learning mappings between continuous functions in the 3D Euclidean space. Our proposed model is guaranteed to achieve SE(3)-equivariance by design. From the graph spectrum view, our method can be interpreted as convolution on graphons (dense graphs with infinitely many nodes), which we term InfGCN. By leveraging both the continuous graphon structure and the discrete graph structure of the input data, our model can effectively capture the geometric information while preserving equivariance. Through extensive experiments on large-scale electron density datasets, we observed that our model significantly outperformed the current state-of-the-art architectures. Multiple ablation studies were also carried out to demonstrate the effectiveness of the proposed architecture.
A Whole New Ball Game: A Primal Accelerated Method for Matrix Games and Minimizing the Maximum of Smooth Functions
results: For large $n$, the algorithm computes an $\epsilon$-approximate solution using $\widetilde{O}(n \epsilon^{-1/3} + \epsilon^{-2})$ gradient and function evaluations and $\widetilde{O}(n \epsilon^{-4/3})$ additional runtime. In the special case where each $f_i$ is linear, the algorithm finds an $\epsilon$-approximate solution in runtime $\widetilde{O}(n (d/\epsilon)^{2/3} + nd + d\epsilon^{-2})$, which improves over all known first-order methods when $n>d$ and $\epsilon=1/\sqrt{n}$.
Abstract
We design algorithms for minimizing $\max_{i\in[n]} f_i(x)$ over a $d$-dimensional Euclidean or simplex domain. When each $f_i$ is $1$-Lipschitz and $1$-smooth, our method computes an $\epsilon$-approximate solution using $\widetilde{O}(n \epsilon^{-1/3} + \epsilon^{-2})$ gradient and function evaluations, and $\widetilde{O}(n \epsilon^{-4/3})$ additional runtime. For large $n$, our evaluation complexity is optimal up to polylogarithmic factors. In the special case where each $f_i$ is linear -- which corresponds to finding a near-optimal primal strategy in a matrix game -- our method finds an $\epsilon$-approximate solution in runtime $\widetilde{O}(n (d/\epsilon)^{2/3} + nd + d\epsilon^{-2})$. For $n>d$ and $\epsilon=1/\sqrt{n}$ this improves over all existing first-order methods. When additionally $d = \omega(n^{8/11})$ our runtime also improves over all known interior point methods. Our algorithm combines three novel primitives: (1) A dynamic data structure which enables efficient stochastic gradient estimation in small $\ell_2$ or $\ell_1$ balls. (2) A mirror descent algorithm tailored to our data structure implementing an oracle which minimizes the objective over these balls. (3) A simple ball oracle acceleration framework suitable for non-Euclidean geometry.
A Quadratic Speedup in Finding Nash Equilibria of Quantum Zero-Sum Games
paper_authors: Francisca Vasconcelos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Panayotis Mertikopoulos, Georgios Piliouras, Michael I. Jordan
for: quantum zero-sum games
methods: hierarchy of quantum optimization algorithms, including Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm
results: quadratic speed-up relative to the previous algorithm, with an average-iterate convergence complexity of $\mathcal{O}(d/\epsilon)$ iterations to $\epsilon$-Nash equilibria.
Abstract
Recent developments in domains such as non-local games, quantum interactive proofs, and quantum generative adversarial networks have renewed interest in quantum game theory and, specifically, quantum zero-sum games. Central to classical game theory is the efficient algorithmic computation of Nash equilibria, which represent optimal strategies for both players. In 2008, Jain and Watrous proposed the first classical algorithm for computing equilibria in quantum zero-sum games using the Matrix Multiplicative Weight Updates (MMWU) method to achieve a convergence rate of $\mathcal{O}(d/\epsilon^2)$ iterations to $\epsilon$-Nash equilibria in the $4^d$-dimensional spectraplex. In this work, we propose a hierarchy of quantum optimization algorithms that generalize MMWU via an extra-gradient mechanism. Notably, within this proposed hierarchy, we introduce the Optimistic Matrix Multiplicative Weights Update (OMMWU) algorithm and establish its average-iterate convergence complexity as $\mathcal{O}(d/\epsilon)$ iterations to $\epsilon$-Nash equilibria. This quadratic speed-up relative to Jain and Watrous' original algorithm sets a new benchmark for computing $\epsilon$-Nash equilibria in quantum zero-sum games.
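For intuition, here is a sketch of optimistic matrix multiplicative weights on a toy (and, for readability, classical) bilinear game: each player keeps a density matrix, updates with the matrix exponential of accumulated payoff gradients, and the "optimistic" step re-uses the latest gradient as a prediction of the next one. The problem size and payoff operator are toy assumptions, not the paper's quantum setting.

```python
# Hedged sketch of optimistic matrix multiplicative weights (toy bilinear game).
import numpy as np
from scipy.linalg import expm

def project_density(H):
    X = expm(H)
    return X / np.trace(X)      # trace-one positive matrix (a density matrix)

rng = np.random.default_rng(4)
d = 4
A = rng.normal(size=(d * d, d * d))          # bilinear payoff on vec(X), vec(Y)
X = np.eye(d) / d                            # maximizer's density matrix
Y = np.eye(d) / d                            # minimizer's density matrix
Gx = np.zeros((d, d)); Gy = np.zeros((d, d)) # accumulated gradients
eta = 0.1
for t in range(200):
    gx = (A @ Y.flatten()).reshape(d, d)     # gradient for X (ascent)
    gy = (A.T @ X.flatten()).reshape(d, d)   # gradient for Y (descent)
    gx = 0.5 * (gx + gx.T); gy = 0.5 * (gy + gy.T)   # keep symmetric
    Gx += gx; Gy += gy
    # Optimistic step: act on history plus one extra copy of the latest gradient.
    X = project_density(eta * (Gx + gx))
    Y = project_density(-eta * (Gy + gy))
```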
Accelerating L-shaped Two-stage Stochastic SCUC with Learning Integrated Benders Decomposition
results: The computational cost and memory usage of Benders decomposition are reduced by creating tighter cuts and shrinking the master problem. Three approaches are proposed: regression Benders, classification Benders, and regression-classification Benders. A regressor reads load-profile scenarios and predicts subproblem objective-function proxy variables to form tighter cuts. A criterion is defined to measure how useful a cut is with respect to its contribution to lower-bound improvement, and useful cuts are identified both with and without a classification learner. Useful cuts are iteratively added to the master problem while non-useful cuts are discarded, reducing the computational burden of each Benders iteration. Simulation studies on multiple test systems show that the proposed learning-aided Benders decomposition solves two-stage SCUC more efficiently than conventional multi-cut Benders decomposition.
Abstract
Benders decomposition is widely used to solve large mixed-integer problems. This paper takes advantage of machine learning and proposes enhanced variants of Benders decomposition for solving two-stage stochastic security-constrained unit commitment (SCUC). The problem is decomposed into a master problem and subproblems corresponding to a load scenario. The goal is to reduce the computational costs and memory usage of Benders decomposition by creating tighter cuts and reducing the size of the master problem. Three approaches are proposed, namely regression Benders, classification Benders, and regression-classification Benders. A regressor reads load profile scenarios and predicts subproblem objective function proxy variables to form tighter cuts for the master problem. A criterion is defined to measure the level of usefulness of cuts with respect to their contribution to lower bound improvement. Useful cuts that contain the necessary information to form the feasible region are identified with and without a classification learner. Useful cuts are iteratively added to the master problem, and non-useful cuts are discarded to reduce the computational burden of each Benders iteration. Simulation studies on multiple test systems show the effectiveness of the proposed learning-aided Benders decomposition for solving two-stage SCUC as compared to conventional multi-cut Benders decomposition.
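The sketch below illustrates only the cut-filtering idea: a regressor predicts each scenario subproblem's objective from the load profile, the prediction forms a proxy cut, and a cut is kept only if it tightens the current lower bound by more than a threshold. The linear cut form, the usefulness criterion, and the toy data are illustrative assumptions; the paper's SCUC formulation and solver loop are omitted.

```python
# Hedged sketch: regression-predicted proxy cuts with a usefulness filter.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(5)
scenarios = rng.uniform(0.5, 1.5, size=(200, 24))          # hourly load profiles
true_obj = scenarios.sum(axis=1) * 10 + rng.normal(0, 1, 200)
regressor = Ridge().fit(scenarios[:150], true_obj[:150])   # offline training

def proxy_cut(load_profile, duals):
    """Cut: theta >= predicted_obj + duals . (x - x_hat), linearized at x_hat."""
    return regressor.predict(load_profile[None])[0], duals

def is_useful(cut_value, current_lower_bound, tol=1e-3):
    return cut_value > current_lower_bound + tol            # improves the bound

lb, kept = 0.0, []
for s in scenarios[150:]:
    value, _ = proxy_cut(s, duals=np.zeros(24))             # duals: placeholder
    if is_useful(value, lb):
        kept.append(value); lb = max(lb, value)             # master update (toy)
print(len(kept), "useful cuts out of", 50)
```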
Machine learning phase transitions: Connections to the Fisher information
results: The study proves that several machine-learning indicators of phase transitions approximate the square root of the system's Fisher information from below, and numerically demonstrates the quality of these bounds for phase transitions in classical and quantum systems.
Abstract
Despite the widespread use and success of machine-learning techniques for detecting phase transitions from data, their working principle and fundamental limits remain elusive. Here, we explain the inner workings and identify potential failure modes of these techniques by rooting popular machine-learning indicators of phase transitions in information-theoretic concepts. Using tools from information geometry, we prove that several machine-learning indicators of phase transitions approximate the square root of the system's (quantum) Fisher information from below -- a quantity that is known to indicate phase transitions but is often difficult to compute from data. We numerically demonstrate the quality of these bounds for phase transitions in classical and quantum systems.
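A toy numerical check in this spirit: for a Bernoulli family $p_\theta$, the classical Fisher information is $1/(\theta(1-\theta))$, and an indicator built from the total variation between neighboring parameters estimates the mean absolute score, which by Cauchy-Schwarz lower-bounds $\sqrt{I(\theta)}$ (with equality at $\theta=0.5$). The TV-based indicator below is only a simple stand-in for the machine-learning indicators analyzed in the paper.

```python
# Toy check: a distinguishability indicator lower-bounds sqrt(Fisher info).
import numpy as np

rng = np.random.default_rng(6)

def tv_indicator(theta, delta, n=200_000):
    a = rng.binomial(1, theta - delta, n)
    b = rng.binomial(1, theta + delta, n)
    # Total variation between the two empirical Bernoulli distributions.
    return abs(a.mean() - b.mean())

for theta in (0.2, 0.5, 0.8):
    delta = 0.005
    est = tv_indicator(theta, delta) / delta     # approx. E|score| <= sqrt(I)
    print(theta, round(est, 3), round((theta * (1 - theta)) ** -0.5, 3))
```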
Optimal Embedding Dimension for Sparse Subspace Embeddings
paper_authors: Shabarish Chenakkod, Michał Dereziński, Xiaoyu Dong, Mark Rudelson
for: The paper is written to address the main open question posed by Nelson and Nguyen (FOCS 2013) on the embedding dimension of oblivious subspace embeddings (OSEs) and to improve on the previous results by Cohen (SODA 2016).
methods: The paper uses a random matrix with randomly sparsified $\pm1/\sqrt s$ entries and having $s= O(\log^4(d))$ non-zeros per column to construct an OSE with $\epsilon = O_{\theta}(1)$.
results: The paper shows that the proposed OSE has an embedding dimension of $m=O(d)$ and achieves a distortion of $\epsilon = O_{\theta}(1)$, which improves on the previous results of $m=O(d\log(d))$ and $\epsilon = O(1)$ respectively. Additionally, the paper presents an optimal single-pass algorithm for least squares regression using the proposed OSE.
Abstract
A random $m\times n$ matrix $S$ is an oblivious subspace embedding (OSE) with parameters $\epsilon>0$, $\delta\in(0,1/3)$ and $d\leq m\leq n$, if for any $d$-dimensional subspace $W\subseteq R^n$, $P\big(\,\forall_{x\in W}\ (1+\epsilon)^{-1}\|x\|\leq\|Sx\|\leq (1+\epsilon)\|x\|\,\big)\geq 1-\delta.$ It is known that the embedding dimension of an OSE must satisfy $m\geq d$, and for any $\theta > 0$, a Gaussian embedding matrix with $m\geq (1+\theta) d$ is an OSE with $\epsilon = O_\theta(1)$. However, such optimal embedding dimension is not known for other embeddings. Of particular interest are sparse OSEs, having $s\ll m$ non-zeros per column, with applications to problems such as least squares regression and low-rank approximation. We show that, given any $\theta > 0$, an $m\times n$ random matrix $S$ with $m\geq (1+\theta)d$ consisting of randomly sparsified $\pm1/\sqrt s$ entries and having $s= O(\log^4(d))$ non-zeros per column, is an oblivious subspace embedding with $\epsilon = O_{\theta}(1)$. Our result addresses the main open question posed by Nelson and Nguyen (FOCS 2013), who conjectured that sparse OSEs can achieve $m=O(d)$ embedding dimension, and it improves on $m=O(d\log(d))$ shown by Cohen (SODA 2016). We use this to construct the first oblivious subspace embedding with $O(d)$ embedding dimension that can be applied faster than current matrix multiplication time, and to obtain an optimal single-pass algorithm for least squares regression. We further extend our results to construct even sparser non-oblivious embeddings, leading to the first subspace embedding with low distortion $\epsilon=o(1)$ and optimal embedding dimension $m=O(d/\epsilon^2)$ that can be applied in current matrix multiplication time.
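An empirical sketch of the construction above: a sparse sketching matrix with $s$ random $\pm1/\sqrt{s}$ entries per column, tested on a random $d$-dimensional subspace. The sizes are small toy values and the check only measures the worst observed distortion over sampled unit vectors, not a formal verification of the OSE property.

```python
# Empirical sketch: sparse subspace embedding with s nonzeros per column.
import numpy as np

rng = np.random.default_rng(7)

def sparse_ose(m, n, s):
    S = np.zeros((m, n))
    for col in range(n):
        rows = rng.choice(m, size=s, replace=False)
        S[rows, col] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return S

n, d = 2000, 20
m, s = 4 * d, 8                     # m = O(d); s plays the polylog(d) role
S = sparse_ose(m, n, s)
W = np.linalg.qr(rng.normal(size=(n, d)))[0]   # orthonormal basis of subspace
distortions = []
for _ in range(500):
    x = W @ rng.normal(size=d)
    distortions.append(np.linalg.norm(S @ x) / np.linalg.norm(x))
print(min(distortions), max(distortions))      # should stay near 1
```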
Multiparameter Persistent Homology for Molecular Property Prediction
for: This paper presents a novel method for generating molecular fingerprints based on multiparameter persistent homology, which reveals the latent structures and relationships within molecular geometry and detects topological features that exhibit persistence across multiple scales.
methods: The proposed fingerprinting method uses multiparameter persistent homology, which is a more comprehensive and interpretable approach than traditional graph neural networks. The method incorporates multiple parameters such as atomic mass, partial charge, and bond type, and can be further enhanced by incorporating additional parameters.
results: The proposed method has been demonstrated to be effective in predicting molecular properties through extensive experiments on the Lipophilicity, FreeSolv, and ESOL datasets. The method provides fresh perspectives on molecular structure that are not easily discernible from single-parameter or single-scale analysis.
Abstract
In this study, we present a novel molecular fingerprint generation method based on multiparameter persistent homology. This approach reveals the latent structures and relationships within molecular geometry, and detects topological features that exhibit persistence across multiple scales along multiple parameters, such as atomic mass, partial charge, and bond type, and can be further enhanced by incorporating additional parameters like ionization energy, electron affinity, chirality and orbital hybridization. The proposed fingerprinting method provides fresh perspectives on molecular structure that are not easily discernible from single-parameter or single-scale analysis. Moreover, in comparison with traditional graph neural networks, multiparameter persistent homology has the advantage of providing a more comprehensive and interpretable characterization of the topology of the molecular data. We have established theoretical stability guarantees for multiparameter persistent homology, and have conducted extensive experiments on the Lipophilicity, FreeSolv, and ESOL datasets to demonstrate its effectiveness in predicting molecular properties.
Online Calibration of Deep Learning Sub-Models for Hybrid Numerical Modeling Systems
paper_authors: Said Ouala, Bertrand Chapron, Fabrice Collard, Lucile Gaultier, Ronan Fablet
for: This paper focuses on how artificial intelligence and deep learning can improve numerical simulation frameworks, and on how neural networks can be used within such frameworks to model physical systems.
methods: The paper uses an online learning method called EGA (Euler Gradient Approximation), which assumes an additive neural correction to the physical model and uses an explicit Euler approximation to compute gradients.
results: Experiments show that EGA yields significant improvements across different case studies, such as ocean-atmosphere dynamics. The results also show that online learning provides better forecasting performance for hybrid models than traditional offline learning.
Abstract
Artificial intelligence and deep learning are currently reshaping numerical simulation frameworks by introducing new modeling capabilities. These frameworks are extensively investigated in the context of model correction and parameterization where they demonstrate great potential and often outperform traditional physical models. Most of these efforts in defining hybrid dynamical systems follow offline learning strategies in which the neural parameterization (called here sub-model) is trained to output an ideal correction. Yet, these hybrid models can face hard limitations when defining what should be a relevant sub-model response that would translate into a good forecasting performance. End-to-end learning schemes, also referred to as online learning, could address such a shortcoming by allowing the deep learning sub-models to train on historical data. However, defining end-to-end training schemes for the calibration of neural sub-models in hybrid systems requires working with an optimization problem that involves the solver of the physical equations. Online learning methodologies thus require the numerical model to be differentiable, which is not the case for most modeling systems. To overcome this difficulty and bypass the differentiability challenge of physical models, we present an efficient and practical online learning approach for hybrid systems. The method, called EGA for Euler Gradient Approximation, assumes an additive neural correction to the physical model, and an explicit Euler approximation of the gradients. We demonstrate that the EGA converges to the exact gradients in the limit of infinitely small time steps. Numerical experiments are performed on various case studies, including prototypical ocean-atmosphere dynamics. Results show significant improvements over offline learning, highlighting the potential of end-to-end online learning for hybrid modeling.
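A minimal PyTorch sketch of the EGA idea follows, under strong simplifying assumptions: toy linear dynamics stand in for the physical solver, the neural correction is additive, and the black-box physical term is treated as locally constant in the backward pass so gradients are carried by an explicit Euler surrogate. This is one plausible reading of the scheme, not the authors' implementation.

```python
import math
import torch

class CorrectionNet(torch.nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 32), torch.nn.Tanh(), torch.nn.Linear(32, dim))

    def forward(self, x):
        return self.net(x)

def phys_rhs(x):
    # Stand-in for the physical model's tendency, evaluated outside autograd
    # as if it came from an external, non-differentiable solver.
    with torch.no_grad():
        return -0.5 * x

def ega_rollout(x0, model, dt, steps):
    # Explicit Euler surrogate of the hybrid dynamics; gradients flow through
    # the additive neural correction only.
    x = x0
    for _ in range(steps):
        x = x + dt * (phys_rhs(x) + model(x))
    return x

dim, dt, steps = 3, 0.05, 20
model, x0 = CorrectionNet(dim), torch.randn(64, dim)
target = x0 * math.exp(-0.8 * dt * steps)   # synthetic "observed" end state
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for epoch in range(300):
    opt.zero_grad()
    loss = torch.mean((ega_rollout(x0, model, dt, steps) - target) ** 2)
    loss.backward()
    opt.step()
print("final loss:", float(loss))
```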
Learning Realistic Joint Space Boundaries for Range of Motion Analysis of Healthy and Impaired Human Arms
paper_authors: Shafagh Keyvanian, Michelle J. Johnson, Nadia Figueroa
for: This paper aims to build a realistic human kinematic model so that human motion can be simulated more accurately in human-robot interaction, biomechanics, and robot-assisted rehabilitation.
methods: A data-driven approach fits a one-class support vector machine to joint-space exploration motion data, with an efficient hyperparameter tuning scheme.
results: The method effectively learns realistic, anatomically constrained range-of-motion boundaries and provides a quantitative impairment index (II) for assessing capability differences between healthy and impaired arms.
Abstract
A realistic human kinematic model that satisfies anatomical constraints is essential for human-robot interaction, biomechanics and robot-assisted rehabilitation. Modeling realistic joint constraints, however, is challenging as human arm motion is constrained by joint limits, inter- and intra-joint dependencies, self-collisions, individual capabilities and muscular or neurological constraints which are difficult to represent. Hence, physicians and researchers have relied on simple box-constraints, ignoring important anatomical factors. In this paper, we propose a data-driven method to learn realistic anatomically constrained upper-limb range of motion (RoM) boundaries from motion capture data. This is achieved by fitting a one-class support vector machine to a dataset of upper-limb joint space exploration motions with an efficient hyper-parameter tuning scheme. Our approach outperforms similar works focused on valid RoM learning. Further, we propose an impairment index (II) metric that offers a quantitative assessment of capability/impairment when comparing healthy and impaired arms. We validate the metric on healthy subjects physically constrained to emulate hemiplegia and different disability levels as in stroke patients.
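A minimal scikit-learn sketch of the core fitting step follows; the synthetic joint-space data, the toy inter-joint dependency, and the small hyperparameter sweep are placeholders for the paper's motion-capture data and efficient tuning scheme.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Placeholder for motion-capture joint angles (n_samples x n_joints),
# with a toy inter-joint dependency standing in for anatomical coupling.
q = rng.uniform(-1.0, 1.0, size=(5000, 4))
q = q[np.abs(q[:, 0] + 0.5 * q[:, 1]) < 0.8]

# Tiny sweep: keep the tightest boundary (largest gamma) that still encloses
# at least ~90% of the observed motions.
best_gamma, best_model = None, None
for gamma in [0.1, 0.5, 1.0, 2.0]:
    ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma=gamma).fit(q)
    inside = float((ocsvm.predict(q) == 1).mean())
    if inside >= 0.9 and (best_gamma is None or gamma > best_gamma):
        best_gamma, best_model = gamma, ocsvm

print("chosen gamma:", best_gamma)
# Query new postures against the learned RoM boundary (+1 inside, -1 outside).
print(best_model.predict(np.array([[0.0, 0.0, 0.0, 0.0],
                                   [3.0, 3.0, 3.0, 3.0]])))
```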
Predicting the Probability of Collision of a Satellite with Space Debris: A Bayesian Machine Learning Approach
results: Experimental results show that the HMM can outperform the naive forecasting baseline on some metrics, further supporting the idea that conjunction warnings may have a Markov property.
Abstract
Space is becoming more crowded in Low Earth Orbit due to increased space activity. Such a dense space environment increases the risk of collisions between space objects endangering the whole space population. Therefore, the need to consider collision avoidance as part of routine operations is evident to satellite operators. Current procedures rely on the analysis of multiple collision warnings by human analysts. However, with the continuous growth of the space population, this manual approach may become unfeasible, highlighting the importance of automation in risk assessment. In 2019, ESA launched a competition to study the feasibility of applying machine learning in collision risk estimation and released a dataset that contained sequences of Conjunction Data Messages (CDMs) in support of real close encounters. The competition results showed that the naive forecast and its variants are strong predictors for this problem, which suggests that the CDMs may follow the Markov property. The proposed work investigates this theory by benchmarking Hidden Markov Models (HMM) in predicting the risk of collision between two resident space objects by using one feature of the entire dataset: the sequence of the probability in the CDMs. In addition, Bayesian statistics are used to infer a joint distribution for the parameters of the models, which allows the development of robust and reliable probabilistic predictive models that can incorporate physical or prior knowledge about the problem within a rigorous theoretical framework and provides prediction uncertainties that nicely reflect the accuracy of the predicted risk. This work shows that the implemented HMM outperforms the naive solution in some metrics, which further adds to the idea that the collision warnings may be Markovian and suggests that this is a powerful method to be further explored.
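A minimal sketch with hmmlearn follows, fitting a Gaussian HMM to sequences of (synthetic) CDM risk values and forecasting the next value from the posterior state distribution; the three-state choice and the data generator are illustrative, and the paper's Bayesian treatment of the HMM parameters is omitted here.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(1)
# Placeholder CDM sequences: log10 collision probabilities over an encounter.
seqs = [np.cumsum(rng.normal(-0.1, 0.3, size=rng.integers(5, 15)))[:, None] - 6
        for _ in range(200)]
lengths = [len(s) for s in seqs]

hmm = GaussianHMM(n_components=3, covariance_type="diag", random_state=0)
hmm.fit(np.concatenate(seqs), lengths)

def forecast_next(seq):
    # Posterior over hidden states after the observed prefix, propagated one
    # step through the transition matrix; the forecast is the mixture mean.
    _, posteriors = hmm.score_samples(seq)
    next_state = posteriors[-1] @ hmm.transmat_
    return float(next_state @ hmm.means_[:, 0])

prefix = seqs[0][:-1]
print("forecast:", forecast_next(prefix), "actual:", float(seqs[0][-1, 0]))
```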
A Poincaré Inequality and Consistency Results for Signal Sampling on Large Graphs
for: The paper addresses the challenge of large-scale graph machine learning, where the complexity of learning models scales with the graph size.
methods: The authors propose a signal sampling theory for a type of graph limit called the graphon, and prove that certain sampling sets are unique and consistent for graphon signals.
results: The authors propose a related graphon signal sampling algorithm and demonstrate its good empirical performance on graph machine learning tasks.
Abstract
Large-scale graph machine learning is challenging as the complexity of learning models scales with the graph size. Subsampling the graph is a viable alternative, but sampling on graphs is nontrivial as graphs are non-Euclidean. Existing graph sampling techniques require not only computing the spectra of large matrices but also repeating these computations when the graph changes, e.g., grows. In this paper, we introduce a signal sampling theory for a type of graph limit -- the graphon. We prove a Poincaré inequality for graphon signals and show that complements of node subsets satisfying this inequality are unique sampling sets for Paley-Wiener spaces of graphon signals. Exploiting connections with spectral clustering and Gaussian elimination, we prove that such sampling sets are consistent in the sense that unique sampling sets on a convergent graph sequence converge to unique sampling sets on the graphon. We then propose a related graphon signal sampling algorithm for large graphs, and demonstrate its good empirical performance on graph machine learning tasks.
Scaling TabPFN: Sketching and Feature Selection for Tabular Prior-Data Fitted Networks
paper_authors: Benjamin Feuer, Chinmay Hegde, Niv Cohen
for: This study investigates the best way to summarize labelled training samples before feeding them to a pre-trained Prior-Data Fitted Network (PFN) for tabular data.
methods: The study applies sketching and feature-selection methods to summarize the labelled training samples and compares the results with conventionally fitted tabular models.
results: The study finds that sketching and feature-selection methods can effectively shrink the labelled training samples, and notes certain key differences between PFNs and conventionally fitted tabular models.
Abstract
Tabular classification has traditionally relied on supervised algorithms, which estimate the parameters of a prediction model using its training data. Recently, Prior-Data Fitted Networks (PFNs) such as TabPFN have successfully learned to classify tabular data in-context: the model parameters are designed to classify new samples based on labelled training samples given after the model training. While such models show great promise, their applicability to real-world data remains limited due to the computational scale needed. Here we study the following question: given a pre-trained PFN for tabular data, what is the best way to summarize the labelled training samples before feeding them to the model? We conduct an initial investigation of sketching and feature-selection methods for TabPFN, and note certain key differences between it and conventionally fitted tabular models.
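A minimal sketch of the summarization step follows, assuming the tabpfn package's TabPFNClassifier with its scikit-learn-style interface; random subsampling stands in for sketching, and the univariate selector and context size are illustrative choices rather than the study's exact methods.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier  # assumed scikit-learn-style interface

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Feature selection: keep the k most informative columns.
selector = SelectKBest(mutual_info_classif, k=10).fit(X_tr, y_tr)
X_tr_k, X_te_k = selector.transform(X_tr), selector.transform(X_te)

# Sketching stand-in: cap the in-context training set at n_ctx random samples.
n_ctx = 256
idx = np.random.default_rng(0).choice(len(X_tr_k),
                                      size=min(n_ctx, len(X_tr_k)),
                                      replace=False)

clf = TabPFNClassifier()
clf.fit(X_tr_k[idx], y_tr[idx])   # "fitting" = storing the context samples
print("test accuracy:", clf.score(X_te_k, y_te))
```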
Implicit Maximum a Posteriori Filtering via Adaptive Optimization
methods: The paper frames Bayesian filtering as optimization over a time-varying objective, avoiding the need to maintain large matrices or rely on Monte Carlo estimation.
results: Experiments show that the method yields effective, robust, and scalable filters in high-dimensional state spaces, and that fine-tuning an optimizer is easier than specifying the correct filtering equations required by standard Bayesian filtering methods.
Abstract
Bayesian filtering approximates the true underlying behavior of a time-varying system by inverting an explicit generative model to convert noisy measurements into state estimates. This process typically requires either storage, inversion, and multiplication of large matrices or Monte Carlo estimation, neither of which are practical in high-dimensional state spaces such as the weight spaces of artificial neural networks. Here, we frame the standard Bayesian filtering problem as optimization over a time-varying objective. Instead of maintaining matrices for the filtering equations or simulating particles, we specify an optimizer that defines the Bayesian filter implicitly. In the linear-Gaussian setting, we show that every Kalman filter has an equivalent formulation using K steps of gradient descent. In the nonlinear setting, our experiments demonstrate that our framework results in filters that are effective, robust, and scalable to high-dimensional systems, comparing well against the standard toolbox of Bayesian filtering solutions. We suggest that it is easier to fine-tune an optimizer than it is to specify the correct filtering equations, making our framework an attractive option for high-dimensional filtering problems.
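A minimal NumPy sketch of the linear-Gaussian special case follows: each Kalman update is replaced by K gradient steps on the quadratic MAP objective. The toy dynamics and the fixed curvature-based step size are illustrative; the paper advocates adaptive optimizers, and the covariance is still propagated in closed form here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])      # constant-velocity dynamics
H = np.array([[1.0, 0.0]])                  # position-only measurements
Q, R = 0.01 * np.eye(2), np.array([[0.25]])

def map_grad(x, x_pred, P_pred, y):
    # Gradient of 0.5*||y - Hx||^2_{R^{-1}} + 0.5*||x - x_pred||^2_{P_pred^{-1}}.
    return (-H.T @ np.linalg.solve(R, y - H @ x)
            + np.linalg.solve(P_pred, x - x_pred))

x_true, x_est, P = np.array([0.0, 1.0]), np.zeros(2), np.eye(2)
for t in range(50):
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    y = H @ x_true + rng.normal(0.0, np.sqrt(R[0, 0]), size=1)
    x_pred, P_pred = A @ x_est, A @ P @ A.T + Q
    # Update step as K iterations of gradient descent on the MAP objective;
    # the step size is set from the quadratic's curvature so the iterates
    # converge toward the Kalman update.
    hess = H.T @ np.linalg.inv(R) @ H + np.linalg.inv(P_pred)
    step, x_est = 1.0 / np.linalg.norm(hess, 2), x_pred.copy()
    for _ in range(50):                      # K gradient steps
        x_est = x_est - step * map_grad(x_est, x_pred, P_pred, y)
    # Covariance propagated in closed form, for simplicity.
    gain = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    P = (np.eye(2) - gain @ H) @ P_pred
print("estimate:", x_est, " truth:", x_true)
```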
Graph Neural Networks for Pressure Estimation in Water Distribution Systems
results: The method is evaluated on a large-scale water distribution network in The Netherlands, achieving high accuracy and robustness; by comparison, methods from previous studies performed worse on the same network.
Abstract
Pressure and flow estimation in Water Distribution Networks (WDN) allows water management companies to optimize their control operations. For many years, mathematical simulation tools have been the most common approach to reconstructing an estimate of the WDN hydraulics. However, pure physics-based simulations involve several challenges, e.g. partially observable data, high uncertainty, and extensive manual configuration. Thus, data-driven approaches have gained traction to overcome such limitations. In this work, we combine physics-based modeling and Graph Neural Networks (GNN), a data-driven approach, to address the pressure estimation problem. First, we propose a new data generation method using a mathematical simulation but not considering temporal patterns and including some control parameters that remain untouched in previous works; this contributes to a more diverse training data. Second, our training strategy relies on random sensor placement making our GNN-based estimation model robust to unexpected sensor location changes. Third, a realistic evaluation protocol considers real temporal patterns and additionally injects the uncertainties intrinsic to real-world scenarios. Finally, a multi-graph pre-training strategy allows the model to be reused for pressure estimation in unseen target WDNs. Our GNN-based model estimates the pressure of a large-scale WDN in The Netherlands with a MAE of 1.94mH$_2$O and a MAPE of 7%, surpassing the performance of previous studies. Likewise, it outperformed previous approaches on other WDN benchmarks, showing a reduction of absolute error up to approximately 52% in the best cases.
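A minimal PyTorch Geometric sketch of the basic setup follows: junctions as nodes, pipes as edges, and a regression loss masked to randomly placed sensor nodes. The toy graph, features, and two-layer GCN are placeholders for the paper's model and data.

```python
import torch
from torch_geometric.nn import GCNConv

class PressureGNN(torch.nn.Module):
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, 1)

    def forward(self, x, edge_index):
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        return self.head(h).squeeze(-1)

n = 100
edge_index = torch.randint(0, n, (2, 300))    # placeholder pipe connectivity
x = torch.randn(n, 3)                          # e.g. demand, elevation, ...
true_p = torch.randn(n)                        # stand-in nodal pressures
sensor_mask = torch.zeros(n, dtype=torch.bool)
sensor_mask[torch.randperm(n)[:10]] = True     # random sensor placement

model = PressureGNN(in_dim=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(200):
    opt.zero_grad()
    pred = model(x, edge_index)
    loss = torch.mean((pred[sensor_mask] - true_p[sensor_mask]) ** 2)
    loss.backward()
    opt.step()
print("train MSE at sensors:", float(loss))
```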
paper_authors: Adam D. Cobb, Brian Matejek, Daniel Elenius, Anirban Roy, Susmit Jha
for: This paper proposes a new amortized estimator for likelihood-free simulation-based inference (SBI).
methods: The paper uses a direct neural ratio estimator (DNRE), which estimates the likelihood ratio between two competing parameter sets in a single forward pass, unlike previous approaches that compare two neural network output values.
results: Alongside the DNRE, the authors derive a corresponding Monte Carlo estimate of the posterior and benchmark the new ratio estimator, showing that it often outperforms previous approaches. They also introduce a new derivative estimator for likelihood ratio estimators, enabling a comparison of likelihood-free Hamiltonian Monte Carlo (HMC) with random-walk Metropolis-Hastings (MH); the results show that HMC is equally competitive, which had not been shown before. Finally, as a real-world application of SBI, the neural ratio estimator is used to design a quadcopter. Code is available at https://github.com/SRI-CSL/dnre.
Abstract
We introduce a new amortized likelihood ratio estimator for likelihood-free simulation-based inference (SBI). Our estimator is simple to train and estimates the likelihood ratio using a single forward pass of the neural estimator. Our approach directly computes the likelihood ratio between two competing parameter sets which is different from the previous approach of comparing two neural network output values. We refer to our model as the direct neural ratio estimator (DNRE). As part of introducing the DNRE, we derive a corresponding Monte Carlo estimate of the posterior. We benchmark our new ratio estimator and compare to previous ratio estimators in the literature. We show that our new ratio estimator often outperforms these previous approaches. As a further contribution, we introduce a new derivative estimator for likelihood ratio estimators that enables us to compare likelihood-free Hamiltonian Monte Carlo (HMC) with random-walk Metropolis-Hastings (MH). We show that HMC is equally competitive, which has not been previously shown. Finally, we include a novel real-world application of SBI by using our neural ratio estimator to design a quadcopter. Code is available at https://github.com/SRI-CSL/dnre.
RONAALP: Reduced-Order Nonlinear Approximation with Active Learning Procedure
for: This paper is written for engineers and researchers who need to evaluate expensive, non-linear high-dimensional functions in their applications.
methods: The paper proposes the RONAALP algorithm, which is a reduced-order nonlinear approximation with active learning procedure to incrementally learn a fast and accurate reduced-order surrogate model of a target function on-the-fly. The algorithm combines nonlinear auto-encoders, community clustering, and radial basis function networks to learn an efficient and compact surrogate model with limited training data.
results: The paper demonstrates the effectiveness of the RONAALP algorithm on three direct numerical simulations of hypersonic flows in chemical nonequilibrium. The results show that the algorithm can reduce the cost of the simulation by up to 75% while maintaining an error of less than 10% on relevant quantities of interest.
Abstract
Many engineering applications rely on the evaluation of expensive, non-linear high-dimensional functions. In this paper, we propose the RONAALP algorithm (Reduced Order Nonlinear Approximation with Active Learning Procedure) to incrementally learn a fast and accurate reduced-order surrogate model of a target function on-the-fly as the application progresses. First, the combination of nonlinear auto-encoder, community clustering and radial basis function networks makes it possible to learn an efficient and compact surrogate model with limited training data. Secondly, the active learning procedure overcomes extrapolation issues when evaluating the surrogate model outside of its initial training range during the online stage. This results in generalizable, fast and accurate reduced-order models of high-dimensional functions. The method is demonstrated on three direct numerical simulations of hypersonic flows in chemical nonequilibrium. Accurate simulations of these flows rely on detailed thermochemical gas models that dramatically increase the cost of such calculations. Using RONAALP to learn a reduced-order thermodynamic model surrogate on-the-fly, the cost of such simulation was reduced by up to 75% while maintaining an error of less than 10% on relevant quantities of interest.
Utilizing VQ-VAE for End-to-End Health Indicator Generation in Predicting Rolling Bearing RUL
results: On the PMH2012 dataset, methods that use VQ-VAE for label construction achieve lower MAD and MV values, and the ASTCN prediction model trained with VQ-VAE labels attains the lowest MAD and MV values.
Abstract
The prediction of the remaining useful life (RUL) of rolling bearings is a pivotal issue in industrial production. A crucial approach to tackling this issue involves transforming vibration signals into health indicators (HI) to aid model training. This paper presents an end-to-end HI construction method, vector quantised variational autoencoder (VQ-VAE), which addresses the need for dimensionality reduction of latent variables in traditional unsupervised learning methods such as autoencoder. Moreover, concerning the inadequacy of traditional statistical metrics in reflecting curve fluctuations accurately, two novel statistical metrics, mean absolute distance (MAD) and mean variance (MV), are introduced. These metrics accurately depict the fluctuation patterns in the curves, thereby indicating the model's accuracy in discerning similar features. On the PMH2012 dataset, methods employing VQ-VAE for label construction achieved lower values for MAD and MV. Furthermore, the ASTCN prediction model trained with VQ-VAE labels demonstrated commendable performance, attaining the lowest values for MAD and MV.
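The two metrics can be sketched directly. The implementations below are plausible stand-ins, reading MAD as the mean absolute distance of the HI curve from its smoothed trend and MV as the mean local variance along the curve; the paper's exact definitions may differ.

```python
import numpy as np

def mad(hi, window=11):
    # Moving-average trend, then mean absolute distance of the curve to it.
    kernel = np.ones(window) / window
    trend = np.convolve(hi, kernel, mode="same")
    return float(np.mean(np.abs(hi - trend)))

def mv(hi, window=11):
    # Mean of local variances computed in sliding windows.
    views = np.lib.stride_tricks.sliding_window_view(hi, window)
    return float(np.mean(np.var(views, axis=1)))

# Synthetic health indicator: a degradation trend plus measurement noise.
t = np.linspace(0, 1, 500)
hi = t ** 2 + 0.05 * np.random.default_rng(0).standard_normal(500)
print(f"MAD={mad(hi):.4f}  MV={mv(hi):.4f}")
```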
Causal Fairness-Guided Dataset Reweighting using Neural Networks
paper_authors: Xuan Zhao, Klaus Broelemann, Salvatore Ruggieri, Gjergji Kasneci
for: This paper aims to address the issue of fairness in machine learning models from a causal perspective, and proposes a reweighting scheme of datasets to mitigate bias and achieve causal fairness.
methods: The proposed method uses two neural networks to approximate the causal model of the data and the causal model of interventions, and applies reweighting guided by a discriminator to achieve various fairness notions.
results: The experiments on real-world datasets show that the proposed method can achieve causal fairness on the data while remaining close to the original data for downstream tasks.
Abstract
The importance of achieving fairness in machine learning models cannot be overstated. Recent research has pointed out that fairness should be examined from a causal perspective, and several fairness notions based on Pearl's causal framework have been proposed. In this paper, we construct a reweighting scheme of datasets to address causal fairness. Our approach aims at mitigating bias by considering the causal relationships among variables and incorporating them into the reweighting process. The proposed method adopts two neural networks, whose structures are intentionally used to reflect the structures of a causal graph and of an interventional graph. The two neural networks can approximate the causal model of the data, and the causal model of interventions. Furthermore, reweighting guided by a discriminator is applied to achieve various fairness notions. Experiments on real-world datasets show that our method can achieve causal fairness on the data while remaining close to the original data for downstream tasks.
Handling Overlapping Asymmetric Datasets – A Twice Penalized P-Spline Approach
results: Through data simulations, parameter tuning, and model adaptations covering continuous and binary responses, the twice penalized approach offers an enhanced fit over linear B-spline and once penalized P-spline approximations. Applied to real data on a person's risk of developing non-alcoholic steatohepatitis, it improves model fit performance by over 65%.
Abstract
Overlapping asymmetric datasets are common in data science and pose the question of how they can be incorporated together into a predictive analysis. In healthcare datasets there is often a small amount of information that is available for a larger number of patients, such as an electronic health record; however, a small number of patients may have had extensive further testing. Common solutions such as missing data imputation can often be unwise if the smaller cohort is significantly different in scale from the larger sample; therefore, the aim of this research is to develop a new method which can model the smaller cohort against a particular response, whilst considering the larger cohort also. Motivated by non-parametric models, and specifically flexible smoothing techniques via generalized additive models, we develop a twice penalized P-Spline approximation method, firstly to prevent over/under-fitting of the smaller cohort and secondly to consider the larger cohort. This second penalty is created through discrepancies in the marginal value of covariates that exist in both the smaller and larger cohorts. Through data simulations, parameter tuning and model adaptations considering both continuous and binary responses, we find our twice penalized approach offers an enhanced fit over a linear B-Spline and once penalized P-Spline approximation. Applying the method to a real-life dataset relating to a person's risk of developing Non-Alcoholic Steatohepatitis, we see an improved model fit performance of over 65%. Areas for future work within this space include adapting our method to not require dimensionality reduction and also considering parametric modelling methods. However, to our knowledge this is the first work to propose additional marginal penalties in a flexible regression of which we can report a vastly improved model fit that is able to consider asymmetric datasets, without the need for missing data imputation.
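A minimal NumPy/SciPy sketch of a twice penalized P-spline fit follows: the usual second-difference roughness penalty plus a second penalty built from the mismatch between the two cohorts' marginal covariate densities. The histogram-based form of that second penalty is a simplified stand-in for the paper's construction, and all data here are synthetic.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
x_small = np.sort(np.clip(rng.normal(0.3, 0.15, 200), 1e-6, 1 - 1e-6))
x_large = rng.uniform(0, 1, 2000)            # larger, differently scaled cohort
y_small = np.sin(2 * np.pi * x_small) + 0.2 * rng.standard_normal(200)

k, n_knots = 3, 20
t = np.r_[[0.0] * k, np.linspace(0, 1, n_knots), [1.0] * k]   # clamped knots
B = BSpline.design_matrix(x_small, t, k).toarray()
p = B.shape[1]

D = np.diff(np.eye(p), n=2, axis=0)          # second-difference penalty
# Second penalty: upweight basis regions where the two cohorts' marginal
# covariate densities disagree (simple histogram estimate).
h_s, _ = np.histogram(x_small, bins=p, range=(0, 1), density=True)
h_l, _ = np.histogram(x_large, bins=p, range=(0, 1), density=True)
W = np.diag(np.abs(h_s - h_l))

lam1, lam2 = 1.0, 0.5
beta = np.linalg.solve(B.T @ B + lam1 * D.T @ D + lam2 * W, B.T @ y_small)
fit = BSpline(t, beta, k)
print("fitted value at x = 0.5:", float(fit(0.5)))
```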
Robustness Enhancement in Neural Networks with Alpha-Stable Training Noise
paper_authors: Xueqiong Yuan, Jipeng Li, Ercan Engin Kuruoğlu
for: The paper aims to improve the robustness of deep learning systems by exploring the use of alpha-stable noise instead of Gaussian noise for data augmentation.
methods: The paper compares the testing accuracy of models trained with Gaussian noise and with alpha-stable noise on data corrupted by different types of noise, and finds that training with alpha-stable noise is more effective, especially against impulsive noise.
results: The paper shows that training with alpha-stable noise improves the robustness of deep learning models on various datasets, including image and time series datasets, and other benchmark corrupted datasets.
Abstract
With the increasing use of deep learning on data collected by non-perfect sensors and in non-perfect environments, the robustness of deep learning systems has become an important issue. A common approach for obtaining robustness to noise has been to train deep learning systems with data augmented with Gaussian noise. In this work, we challenge the common choice of Gaussian noise and explore the possibility of stronger robustness for non-Gaussian impulsive noise, specifically alpha-stable noise. Justified by the Generalized Central Limit Theorem and evidenced by observations in various application areas, alpha-stable noise is widely present in nature. By comparing the testing accuracy of models trained with Gaussian noise and alpha-stable noise on data corrupted by different noise, we find that training with alpha-stable noise is more effective than Gaussian noise, especially when the dataset is corrupted by impulsive noise, thus improving the robustness of the model. The generality of this conclusion is validated through experiments conducted on various deep learning models with image and time series datasets, and other benchmark corrupted datasets. Consequently, we propose a novel data augmentation method that replaces Gaussian noise, which is typically added to the training data, with alpha-stable noise.
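The proposed augmentation is straightforward to sketch with SciPy's levy_stable distribution, as below; the stability parameters and clipping range are illustrative choices (alpha = 2 would recover the Gaussian case).

```python
import numpy as np
from scipy.stats import levy_stable

def augment_alpha_stable(x, alpha=1.5, beta=0.0, scale=0.05, rng=None):
    # Additive heavy-tailed noise; alpha < 2 gives impulsive tails.
    noise = levy_stable.rvs(alpha, beta, loc=0.0, scale=scale,
                            size=x.shape, random_state=rng)
    # Heavy-tailed samples can be extreme; clipping keeps inputs in range.
    return np.clip(x + noise, 0.0, 1.0)

rng = np.random.default_rng(0)
images = rng.uniform(0, 1, size=(8, 28, 28))      # placeholder image batch
noisy = augment_alpha_stable(images, alpha=1.5, rng=rng)
print("max |perturbation|:", float(np.abs(noisy - images).max()))
```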
Maintenance Techniques for Anomaly Detection AIOps Solutions
paper_authors: Lorena Poenaru-Olaru, Natalia Karpova, Luis Cruz, Jan Rellermeyer, Arie van Deursen
for: This study examines how anomaly detection techniques can automate the monitoring of IT systems and operations, and how model performance can be preserved as data changes over time.
methods: The study analyzes two different model maintenance techniques, namely blind model retraining and informed model retraining, under various update frequencies.
results: The study finds that updating the model on all available data (the full-history approach) preserves higher detection accuracy, while updating on only the newest data (the sliding window approach) adapts to changes over time. Moreover, a data change monitoring tool can determine when the anomaly detection model needs to be updated.
Abstract
Anomaly detection techniques are essential in automating the monitoring of IT systems and operations. These techniques imply that machine learning algorithms are trained on operational data corresponding to a specific period of time and that they are continuously evaluated on newly emerging data. Operational data is constantly changing over time, which affects the performance of deployed anomaly detection models. Therefore, continuous model maintenance is required to preserve the performance of anomaly detectors over time. In this work, we analyze two different anomaly detection model maintenance techniques in terms of the model update frequency, namely blind model retraining and informed model retraining. We further investigate the effects of updating the model by retraining it on all the available data (full-history approach) and on only the newest data (sliding window approach). Moreover, we investigate whether a data change monitoring tool is capable of determining when the anomaly detection model needs to be updated through retraining.
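The two retraining strategies are easy to contrast in code. The sketch below uses an IsolationForest on a drifting synthetic stream; the detector, batch sizes, and window length are placeholders for the paper's setup.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Monthly batches of operational data with gradual drift.
batches = [rng.normal(loc=0.05 * t, size=(500, 4)) for t in range(12)]

def retrain(upto, mode, window=3):
    history = batches[: upto + 1]
    data = np.vstack(history if mode == "full" else history[-window:])
    return IsolationForest(random_state=0).fit(data)

for t in (3, 7, 11):
    for mode in ("full", "sliding"):
        model = retrain(t, mode)
        score = model.score_samples(batches[t]).mean()
        print(f"t={t:2d} {mode:7s} mean score on current batch: {score:.3f}")
```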
DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
for: This paper proposes a method to improve the efficiency of multi-task model training, which is often hindered by the variation in input sequence length.
methods: The proposed method, called DynaPipe, uses dynamic micro-batching to tackle sequence length variation and enable efficient training of large language models.
results: The authors evaluate DynaPipe on the FLANv2 dataset and show that it achieves up to 4.39x higher training throughput when training T5, and 3.25x when training GPT, compared with packing-based baselines.
Abstract
Multi-task model training has been adopted to enable a single deep neural network model (often a large language model) to handle multiple tasks (e.g., question answering and text summarization). Multi-task training commonly receives input sequences of highly different lengths due to the diverse contexts of different tasks. Padding (to the same sequence length) or packing (short examples into long sequences of the same length) is usually adopted to prepare input samples for model training, which is nonetheless not space or computation efficient. This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training. We advocate pipeline-parallel training of the large model with variable-length micro-batches, each of which potentially comprises a different number of samples. We optimize micro-batch construction using a dynamic programming-based approach, and handle micro-batch execution time variation through dynamic pipeline and communication scheduling, enabling highly efficient pipeline training. Extensive evaluation on the FLANv2 dataset demonstrates up to 4.39x higher training throughput when training T5, and 3.25x when training GPT, as compared with packing-based baselines. DynaPipe's source code is publicly available at https://github.com/awslabs/optimizing-multitask-training-through-dynamic-pipelines.
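The micro-batch construction step can be sketched as a small dynamic program, as below; the cost model here only counts padded tokens under a token budget, whereas DynaPipe's actual objective also accounts for pipeline execution time.

```python
def build_micro_batches(lengths, max_tokens):
    # Sort by length so each contiguous slice pads only up to its last element.
    lengths = sorted(lengths)
    n, INF = len(lengths), float("inf")
    cost = [INF] * (n + 1)        # cost[i]: min padding over the first i samples
    split = [0] * (n + 1)
    cost[0] = 0.0
    for i in range(1, n + 1):
        for j in range(i):        # candidate micro-batch = samples j..i-1
            batch_max = lengths[i - 1]
            if (i - j) * batch_max > max_tokens:
                continue          # exceeds the per-micro-batch token budget
            padding = (i - j) * batch_max - sum(lengths[j:i])
            if cost[j] + padding < cost[i]:
                cost[i], split[i] = cost[j] + padding, j
    # Recover the partition from the split points.
    batches, i = [], n
    while i > 0:
        batches.append(lengths[split[i]:i])
        i = split[i]
    return batches[::-1], cost[n]

batches, pad = build_micro_batches([3, 7, 8, 15, 16, 31, 33, 60], max_tokens=96)
print(batches, "total padding:", pad)
```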
Decentralized Energy Marketplace via NFTs and AI-based Agents
results: Extensive evaluations demonstrate the system's scalability and the effectiveness of the FDRL method in optimizing energy distribution. This research contributes significantly to developing sophisticated decentralized smart grid infrastructures and broadens the prospects for blockchain and AI applications in sustainable energy systems.
Abstract
The paper introduces an advanced Decentralized Energy Marketplace (DEM) integrating blockchain technology and artificial intelligence to manage energy exchanges among smart homes with energy storage systems. The proposed framework uses Non-Fungible Tokens (NFTs) to represent unique energy profiles in a transparent and secure trading environment. Leveraging Federated Deep Reinforcement Learning (FDRL), the system promotes collaborative and adaptive energy management strategies, maintaining user privacy. A notable innovation is the use of smart contracts, ensuring high efficiency and integrity in energy transactions. Extensive evaluations demonstrate the system's scalability and the effectiveness of the FDRL method in optimizing energy distribution. This research significantly contributes to developing sophisticated decentralized smart grid infrastructures. Our approach broadens potential blockchain and AI applications in sustainable energy systems and addresses incentive alignment and transparency challenges in traditional energy trading mechanisms. The implementation of this paper is publicly accessible at \url{https://github.com/RasoulNik/DEM}.
results: The paper evaluates the approach on several research and production use cases, covering both training and inference, across multiple optimization problems, multiple compilers and their versions, and gym infrastructures.
Abstract
There is a growing interest in enhancing compiler optimizations with ML models, yet interactions between compilers and ML frameworks remain challenging. Some optimizations require tightly coupled models and compiler internals, raising issues with modularity, performance and framework independence. Practical deployment and transparency for the end-user are also important concerns. We propose ML-Compiler-Bridge to enable ML model development within a traditional Python framework while making end-to-end integration with an optimizing compiler possible and efficient. We evaluate it on both research and production use cases, for training and inference, over several optimization problems, multiple compilers and its versions, and gym infrastructures.
Delete My Account: Impact of Data Deletion on Machine Learning Classifiers
results: We find that the amount of data deleted, the characteristics of the dataset, and the deletion bias all strongly affect machine learning model performance.
Abstract
Users are more aware than ever of the importance of their own data, thanks to reports about security breaches and leaks of private, often sensitive data in recent years. Additionally, the GDPR has been in effect in the European Union for over three years and many people have encountered its effects in one way or another. Consequently, more and more users are actively protecting their personal data. One way to do this is to make use of the right to erasure guaranteed in the GDPR, which has potential implications for a number of different fields, such as big data and machine learning. Our paper presents an in-depth analysis of the impact of the use of the right to erasure on the performance of machine learning models on classification tasks. We conduct various experiments utilising different datasets as well as different machine learning algorithms to analyse a variety of deletion behaviour scenarios. Due to the lack of credible data on actual user behaviour, we make reasonable assumptions for various deletion modes and biases and provide insight into the effects of different plausible scenarios for right to erasure usage on the data quality of machine learning. Our results show that the impact depends strongly on the amount of data deleted, the particular characteristics of the dataset and the bias chosen for deletion and assumptions on user behaviour.
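A minimal scikit-learn sketch of the experimental idea follows: simulate erasure requests by deleting a varying fraction of training rows under an assumed deletion bias, retrain, and measure the accuracy change. The class-dependent bias used here is just one of the plausible deletion modes the paper considers.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

for frac in (0.0, 0.2, 0.5):
    # Biased deletion: class-1 users are twice as likely to erase their data,
    # while the overall expected deleted fraction stays at `frac`.
    w = np.where(y_tr == 1, 2.0, 1.0)
    p_delete = frac * len(y_tr) * w / w.sum()
    keep = rng.random(len(y_tr)) >= p_delete
    clf = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
    print(f"deleted ~{frac:.0%}: test accuracy {clf.score(X_te, y_te):.3f}")
```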
Adaptive Modelling Approach for Row-Type Dependent Predictive Analysis (RTDPA): A Framework for Designing Machine Learning Models for Credit Risk Analysis in Banking Sector
for: This paper proposes a row-type dependent predictive analysis (RTDPA) framework that adapts modeling to the distinct characteristics of different row types within a single dataset.
methods: The framework applies tailored data pre-processing and feature engineering to each row type and selects among traditional machine learning predictive models and advanced ensemble techniques.
results: All predictive approaches achieve a precision of at least 90%, and applying a separate model to each row type allows RTDPA to capture type-specific patterns, yielding more accurate and tailored classifications for the banking sector.
Abstract
In many real-world datasets, rows may have distinct characteristics and require different modeling approaches for accurate predictions. In this paper, we propose an adaptive modeling approach for row-type dependent predictive analysis (RTDPA). Our framework enables the development of models that can effectively handle diverse row types within a single dataset. Our dataset from XXX bank contains two different risk categories, personal loan and agriculture loan. Each of them is categorised into four classes: standard, sub-standard, doubtful and loss. We performed tailored data pre-processing and feature engineering for the different row types. We selected traditional machine learning predictive models and advanced ensemble techniques. Our findings indicate that all predictive approaches consistently achieve a precision rate of no less than 90%. For RTDPA, the algorithms are applied separately for each row type, allowing the models to capture the specific patterns and characteristics of each row type. This approach enables targeted predictions based on the row type, providing a more accurate and tailored classification for the given dataset. Additionally, the suggested model consistently offers decision makers valuable and enduring insights that are strategic in nature in the banking sector.
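A minimal scikit-learn sketch of the row-type dependent idea follows: rows are partitioned by type and a separate classifier is fitted per type. The synthetic data, the row-type column, and the choice of model are placeholders for the paper's loan dataset and tailored pipelines.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=4000, n_features=12, n_informative=8,
                           n_classes=4, random_state=0)
row_type = np.random.default_rng(0).choice(["personal", "agriculture"], size=4000)

# One tailored model per row type, each trained only on its own rows.
models = {}
for rt in np.unique(row_type):
    mask = row_type == rt
    models[rt] = GradientBoostingClassifier(random_state=0).fit(X[mask], y[mask])

def predict(x_row, rt):
    # Route the prediction to the model matching the row's type.
    return models[rt].predict(x_row.reshape(1, -1))[0]

print("predicted class:", predict(X[0], row_type[0]))
```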
Few-shot Message-Enhanced Contrastive Learning for Graph Anomaly Detection
paper_authors: Fan Xu, Nan Wang, Xuezhi Wen, Meiqi Gao, Chaoqun Guo, Xibin Zhao
for: The paper is focused on developing a novel few-shot graph anomaly detection model called FMGAD, which can effectively identify anomalies in graph data with limited labeled information.
methods: The proposed FMGAD model uses a self-supervised contrastive learning strategy within and across views to capture intrinsic and transferable structural representations. Additionally, the model employs a Deep-GNN message-enhanced reconstruction module to extensively exploit few-shot label information and disseminate supervision signals to deeper unlabeled nodes.
results: The paper demonstrates that FMGAD achieves better performance than other state-of-the-art methods on six real-world datasets, whether the anomalies are artificially injected or domain-organic.
Abstract
Graph anomaly detection plays a crucial role in identifying exceptional instances in graph data that deviate significantly from the majority. It has gained substantial attention in various domains of information security, including network intrusion, financial fraud, and malicious comments, among others. Existing methods are primarily developed in an unsupervised manner due to the challenge in obtaining labeled data. Lacking guidance from prior knowledge in the unsupervised setting, the identified anomalies may prove to be data noise or individual data instances. In real-world scenarios, a limited batch of labeled anomalies can be captured, making it crucial to investigate the few-shot problem in graph anomaly detection. Taking advantage of this potential, we propose a novel few-shot Graph Anomaly Detection model called FMGAD (Few-shot Message-Enhanced Contrastive-based Graph Anomaly Detector). FMGAD leverages a self-supervised contrastive learning strategy within and across views to capture intrinsic and transferable structural representations. Furthermore, we propose the Deep-GNN message-enhanced reconstruction module, which extensively exploits the few-shot label information and enables long-range propagation to disseminate supervision signals to deeper unlabeled nodes. This module in turn assists in the training of self-supervised contrastive learning. Comprehensive experimental results on six real-world datasets demonstrate that FMGAD can achieve better performance than other state-of-the-art methods, regardless of artificially injected anomalies or domain-organic anomalies.
FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel Identification
results: Across a set of ML models, the FIKIT-based inference system accelerates high-priority tasks by 1.33x to 14.87x relative to the JCT in GPU sharing mode, with more than half of the cases accelerated by over 3.5x. Under preemptive sharing, low-priority tasks retain a JCT comparable to the default GPU sharing mode, at a ratio of 0.84 to 1. The overhead of kernel measurement and fine-grained kernel scheduling is kept below 10%.
Abstract
Highly parallelized workloads like machine learning training, inference, and general HPC tasks are greatly accelerated using GPU devices. In a cloud computing cluster, serving a GPU's computation power through multi-task sharing is in high demand since there are always more task requests than available GPUs. Existing GPU sharing solutions focus on reducing task-level waiting time or task-level switching costs when multiple jobs compete for a single GPU. Continuous streams of computation requests come with different priorities, which have an asymmetric impact on the QoS of sharing a GPU device. Existing work has missed the kernel-level optimization opportunity brought by this setting. To address this problem, we present a novel kernel-level scheduling strategy called FIKIT: Filling Inter-kernel Idle Time. FIKIT incorporates task-level priority information, fine-grained kernel identification, and kernel measurement, allowing low-priority tasks to execute during high-priority tasks' inter-kernel idle time, thereby filling the GPU's device runtime fully and reducing the overall impact of GPU sharing on cloud services. Across a set of ML models, the FIKIT-based inference system accelerated high-priority tasks by 1.33 to 14.87 times compared to the JCT in GPU sharing mode, and more than half of the cases are accelerated by more than 3.5 times. Alternatively, under preemptive sharing, the low-priority tasks have a JCT comparable to the default GPU sharing mode, with a 0.84 to 1 times ratio. We further limit the kernel measurement and runtime fine-grained kernel scheduling overhead to less than 10%.
How False Data Affects Machine Learning Models in Electrochemistry?
for: This study aims to evaluate the performance of machine learning models in noisy electrochemical data and to determine whether stacking models can provide robustness to weak-to-noise models.
methods: The study uses 12 standalone models and a stacking model to test the performance of different machine learning models on electrochemical data. The models include XGB, LGBM, RF, GB, ADA, NN, ELAS, LASS, RIDGE, SVM, KNN, DT, and the stacking model.
results: The study finds that linear models handle noise well but suffer from low prediction accuracy, while tree-based models have poor noise handling but high prediction accuracy. The stacking model exhibits both high accuracy and good noise handling, making it a viable choice for beginner and experienced machine learning researchers in electrochemistry. Additionally, the study shows that neural networks are not suitable for electrochemical data and can be susceptible to noise.Abstract
Machine learning models are often selected based only on the data distribution, without considering the noise in the data. This study aims to distinguish which models perform well under noisy data, and to establish whether stacking machine learning models actually provides robustness to otherwise weak-to-noise models. The electrochemical data were tested with 12 standalone models and a stacking model: XGB, LGBM, RF, GB, ADA, NN, ELAS, LASS, RIDGE, SVM, KNN, DT, and the stacking model. It is found that linear models handle noise well, with an average error slope of 1.75 F g-1 per 100% noise added, but they suffer in prediction accuracy, with an average error of 60.19 F g-1 even at 0% noise added. Tree-based models fail in terms of noise handling (average slope of 55.24 F g-1 per 100% noise added), but they can provide higher prediction accuracy (lowest error of 23.9 F g-1) than linear models. To address this trade-off between prediction accuracy and noise handling, the stacking model was constructed, which not only shows high accuracy (error intercept of 25.03 F g-1) but also exhibits good noise handling (slope of 43.58 F g-1), making stacking models a relatively low-risk and viable choice for beginner and experienced machine learning researchers in electrochemistry. Even though neural networks (NN) are gaining popularity in the electrochemistry field, this study shows that NN is not suitable for electrochemical data, with improper tuning resulting in a model that is susceptible to noise. Thus, stacking models should provide better benefits in that, even with untuned base models, they can achieve an accurate and noise-tolerant model. Overall, this work provides insight into machine learning model selection for electrochemical data, which should aid the understanding of data science in a chemistry context.
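A minimal scikit-learn sketch of the stacking setup follows, combining tree-based and linear base learners under a Ridge meta-learner and probing robustness by adding noise at test time; the synthetic regression data stands in for the electrochemical measurements.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(random_state=0)),
                ("ridge", Ridge())],
    final_estimator=Ridge())
stack.fit(X_tr, y_tr)

# Probe robustness: add input noise scaled to the signal's spread.
for noise_pct in (0.0, 0.5, 1.0):
    X_noisy = X_te + noise_pct * X_te.std() * \
        np.random.default_rng(0).standard_normal(X_te.shape)
    print(f"noise {noise_pct:.0%}: R^2 = {stack.score(X_noisy, y_te):.3f}")
```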
Towards Machine Learning-based Quantitative Hyperspectral Image Guidance for Brain Tumor Resection
paper_authors: David Black, Declan Byrne, Anna Walke, Sidong Liu, Antonio Di leva, Sadahiro Kaneko, Walter Stummer, Septimiu Salcudean, Eric Suero Molina
for: This paper aims to develop a hyperspectral-imaging-based tumor classification system to help neurosurgeons distinguish different tumor and tissue types intraoperatively.
methods: The study analyzes the fluorescence emission spectra of five fluorophores using hyperspectral imaging and trains machine learning models to classify tumor and tissue types from the fluorophore abundances.
results: The five fluorophores' spectral features can accurately classify different tumor and tissue types, and the fluorophore abundances differ significantly across tumor and tissue classes.
Abstract
Complete resection of malignant gliomas is hampered by the difficulty in distinguishing tumor cells at the infiltration zone. Fluorescence guidance with 5-ALA assists in reaching this goal. Using hyperspectral imaging, previous work characterized five fluorophores' emission spectra in most human brain tumors. In this paper, the effectiveness of these five spectra was explored for different tumor and tissue classification tasks in 184 patients (891 hyperspectral measurements) harboring low- (n=30) and high-grade gliomas (n=115), non-glial primary brain tumors (n=19), radiation necrosis (n=2), miscellaneous (n=10) and metastases (n=8). Four machine learning models were trained to classify tumor type, grade, glioma margins and IDH mutation. Using random forests and multi-layer perceptrons, the classifiers achieved average test accuracies of 74-82%, 79%, 81%, and 93% respectively. All five fluorophore abundances varied between tumor margin types and tumor grades (p < 0.01). For tissue type, at least four of the five fluorophore abundances were found to be significantly different (p < 0.01) between all classes. These results demonstrate the fluorophores' differing abundances in different tissue classes, as well as the value of the five fluorophores as potential optical biomarkers, opening new opportunities for intraoperative classification systems in fluorescence-guided neurosurgery.
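As an illustration of the classification stage, the sketch below trains a random forest on five fluorophore-abundance features; the spectral unmixing that produces the abundances is out of scope, so synthetic abundances and labels stand in for the study's data.

```python
# Minimal sketch: a random forest predicting a tissue class from the five
# fluorophore abundances extracted per hyperspectral measurement.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 891                                   # number of hyperspectral measurements in the study
abundances = rng.random((n, 5))           # five fluorophore abundances per measurement (synthetic)
labels = rng.integers(0, 4, size=n)       # stand-in classes, e.g. tumor type or grade

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(clf, abundances, labels, cv=5).mean())
```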
Graph Sparsifications using Neural Network Assisted Monte Carlo Tree Search
paper_authors: Alvin Chiu, Mithun Ghosh, Reyan Ahmed, Kwang-Sung Jun, Stephen Kobourov, Michael T. Goodrich
for: This paper addresses the problem of computing graph sparsifiers.
methods: The paper combines a graph neural network with Monte Carlo Tree Search: a GNN is first trained to take a partial solution and propose a new node to add, and this model is then used within a Monte Carlo search to compute a sparsifier.
results: The method consistently outperforms several standard approximation algorithms on different types of graphs and often finds the optimal solution.
Abstract
Graph neural networks have been successful for machine learning, as well as for combinatorial and graph problems such as the Subgraph Isomorphism Problem and the Traveling Salesman Problem. We describe an approach for computing graph sparsifiers by combining a graph neural network and Monte Carlo Tree Search. We first train a graph neural network that takes as input a partial solution and proposes a new node to be added as output. This neural network is then used in a Monte Carlo search to compute a sparsifier. The proposed method consistently outperforms several standard approximation algorithms on different types of graphs and often finds the optimal solution.
Interpretable Modeling of Single-cell perturbation Responses to Novel Drugs Using Cycle Consistence Learning
methods: The framework is based on an encoder-decoder architecture that maps initial cell states into a latent space, in which the effect of a drug perturbation on the cell state is assumed to follow linear additivity. A cycle consistency constraint is further introduced to ensure that removing the drug perturbation from the perturbed cell state restores the initial cell state.
results: The model is validated on three types of datasets, including bulk transcriptional responses, bulk proteomic responses, and single-cell transcriptional responses to drug perturbations, and it outperforms state-of-the-art methods.
Abstract
Phenotype-based screening has attracted much attention for identifying cell-active compounds. Transcriptional and proteomic profiles of cell populations or single cells are informative phenotypic measures of cellular responses to perturbations. In this paper, we propose a deep learning framework based on an encoder-decoder architecture that maps the initial cellular states to a latent space, in which we assume the effects of drug perturbation on cellular states follow linear additivity. Next, we introduce cycle consistency constraints to enforce that initial cellular states subjected to drug perturbations produce the perturbed cellular responses and, conversely, that removal of the drug perturbation from the perturbed cellular states restores the initial cellular states. The cycle consistency constraints and linear modeling in latent space enable learning of interpretable and transferable drug perturbation representations, so that our model can predict cellular responses to unseen drugs. We validated our model on three different types of datasets, including bulk transcriptional responses, bulk proteomic responses, and single-cell transcriptional responses to drug perturbations. The experimental results show that our model achieves better performance than existing state-of-the-art methods.
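A minimal PyTorch sketch of the latent-additivity and cycle-consistency idea follows; the network sizes, the drug-embedding parameterization, and the loss weighting are assumptions, not the paper's architecture.

```python
# Sketch: encoder-decoder with per-drug latent shift (linear additivity) and a
# cycle-consistency loss that undoes the shift to recover the initial state.
import torch
import torch.nn as nn

class PerturbationAutoencoder(nn.Module):
    def __init__(self, n_genes=2000, n_drugs=50, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, n_genes))
        # one latent shift vector per drug: additivity in latent space (assumed form)
        self.drug_effect = nn.Embedding(n_drugs, latent_dim)

    def forward(self, x0, drug_idx):
        z0 = self.encoder(x0)
        z_pert = z0 + self.drug_effect(drug_idx)   # add the drug's latent shift
        return self.decoder(z_pert)

model = PerturbationAutoencoder()
x0 = torch.randn(8, 2000)            # initial cell states (e.g. expression profiles)
x1 = torch.randn(8, 2000)            # observed perturbed states
drug = torch.randint(0, 50, (8,))

recon_loss = nn.functional.mse_loss(model(x0, drug), x1)

# cycle consistency: subtracting the drug shift from the perturbed state's
# latent code should decode back to the initial state
z_back = model.encoder(x1) - model.drug_effect(drug)
cycle_loss = nn.functional.mse_loss(model.decoder(z_back), x0)

loss = recon_loss + 1.0 * cycle_loss   # weighting is an assumption
loss.backward()
```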
Imagination-augmented Hierarchical Reinforcement Learning for Safe and Interactive Autonomous Driving in Urban Environments
results: On five complex urban driving tasks, the hierarchical agent performs safety-aware and interactive behaviors, achieving higher success rates and lower average episode steps than the baselines.
Abstract
Hierarchical reinforcement learning (HRL) has led to remarkable achievements in diverse fields. However, existing HRL algorithms still cannot be applied to real-world navigation tasks. These tasks require an agent to perform safety-aware behaviors and interact with surrounding objects in dynamic environments. In addition, an agent in these tasks should perform consistent and structured exploration as they are long-horizon and have complex structures with diverse objects and task-specific rules. Designing HRL agents that can handle these challenges in real-world navigation tasks is an open problem. In this paper, we propose imagination-augmented HRL (IAHRL), a new and general navigation algorithm that allows an agent to learn safe and interactive behaviors in real-world navigation tasks. Our key idea is to train a hierarchical agent in which a high-level policy infers interactions by interpreting behaviors imagined with low-level policies. Specifically, the high-level policy is designed with a permutation-invariant attention mechanism to determine which low-level policy generates the most interactive behavior, and the low-level policies are implemented with an optimization-based behavior planner to generate safe and structured behaviors following task-specific rules. To evaluate our algorithm, we introduce five complex urban driving tasks, which are among the most challenging real-world navigation tasks. The experimental results indicate that our hierarchical agent performs safety-aware behaviors and properly interacts with surrounding vehicles, achieving higher success rates and lower average episode steps than baselines in urban driving tasks.
Leveraging Function Space Aggregation for Federated Learning at Scale
results: On realistic large-scale cross-device benchmarks, the algorithm is more robust as client models drift apart and shows significant gains as the number of local training epochs increases. It also enables more effective local personalization; for example, after few-shot personalization on Stack Overflow, FedFish improves next-token prediction accuracy by 7% over FedAvg.
Abstract
The federated learning paradigm has motivated the development of methods for aggregating multiple client updates into a global server model, without sharing client data. Many federated learning algorithms, including the canonical Federated Averaging (FedAvg), take a direct (possibly weighted) average of the client parameter updates, motivated by results in distributed optimization. In this work, we adopt a function space perspective and propose a new algorithm, FedFish, that aggregates local approximations to the functions learned by clients, using an estimate based on their Fisher information. We evaluate FedFish on realistic, large-scale cross-device benchmarks. While the performance of FedAvg can suffer as client models drift further apart, we demonstrate that FedFish is more robust to longer local training. Our evaluation across several settings in image and language benchmarks shows that FedFish outperforms FedAvg as local training epochs increase. Further, FedFish results in global networks that are more amenable to efficient personalization via local fine-tuning on the same or shifted data distributions. For instance, federated pretraining on the C4 dataset, followed by few-shot personalization on Stack Overflow, results in a 7% improvement in next-token prediction by FedFish over FedAvg.
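To contrast the two aggregation rules, here is a minimal numpy sketch of Fisher-weighted parameter aggregation versus FedAvg's plain average; the diagonal empirical Fisher and the elementwise weighting are assumptions about FedFish's details, used only to illustrate weighting parameters by each client's function sensitivity.

```python
# Sketch: FedAvg averages parameters directly; a Fisher-weighted rule lets each
# client's estimate count more where its function is most sensitive.
import numpy as np

def fedavg(client_params):
    """Plain parameter averaging."""
    return np.mean(client_params, axis=0)

def fisher_weighted(client_params, client_fishers, eps=1e-8):
    """Elementwise Fisher-weighted average of client parameters."""
    weights = np.stack(client_fishers) + eps
    return (weights * np.stack(client_params)).sum(axis=0) / weights.sum(axis=0)

# toy example: two clients, one parameter vector each
params = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
fishers = [np.array([10.0, 0.1]), np.array([0.1, 10.0])]   # per-parameter sensitivity
print(fedavg(params))                    # [0.5, 0.5]
print(fisher_weighted(params, fishers))  # ~[0.99, 0.99]: each client keeps what it is sure of
```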
Sobol Sequence Optimization for Hardware-Efficient Vector Symbolic Architectures
results: For two applications, language and headline classification, the experiments show that generating hypervectors from Sobol sequences improves accuracy by up to 10.79% over traditional approaches based on linear-feedback shift registers and MATLAB's random function, while the encoding hardware exhibits lower energy consumption and a superior area-delay product.
Abstract
Hyperdimensional computing (HDC) is an emerging computing paradigm with significant promise for efficient and robust learning. In HDC, objects are encoded with high-dimensional vector symbolic sequences called hypervectors. The quality of hypervectors, defined by their distribution and independence, directly impacts the performance of HDC systems. Despite a large body of work on the processing parts of HDC systems, little to no attention has been paid to data encoding and the quality of hypervectors. Most prior studies have generated hypervectors using inherent random functions, such as MATLAB's or Python's random function. This work introduces an optimization technique for generating hypervectors by employing quasi-random sequences. These sequences have recently demonstrated their effectiveness in achieving accurate and low-discrepancy data encoding in stochastic computing systems. The study outlines the optimization steps for utilizing Sobol sequences to produce high-quality hypervectors in HDC systems. An optimization algorithm is proposed to select the most suitable Sobol sequences for generating minimally correlated hypervectors, particularly in applications related to symbol-oriented architectures. The performance of the proposed technique is evaluated in comparison to two traditional approaches of generating hypervectors based on linear-feedback shift registers and MATLAB random function. The evaluation is conducted for two applications: (i) language and (ii) headline classification. Our experimental results demonstrate accuracy improvements of up to 10.79%, depending on the vector size. Additionally, the proposed encoding hardware exhibits reduced energy consumption and a superior area-delay product.
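A minimal sketch of the encoding idea follows, using SciPy's quasi-Monte Carlo module to draw Sobol points and threshold them into bipolar hypervectors; the thresholding scheme and dimensions are illustrative assumptions, and the paper's optimization step for selecting minimally correlated Sobol sequences is not reproduced.

```python
# Sketch: Sobol-derived versus pseudo-random bipolar hypervectors, compared by
# their mean pairwise correlation (lower means closer to orthogonal).
import numpy as np
from scipy.stats import qmc

dim, n_symbols = 1024, 32   # hypervector length and alphabet size (assumed)

sampler = qmc.Sobol(d=dim, scramble=False)
points = sampler.random(n_symbols)            # low-discrepancy points in [0, 1)^dim
sobol_hvs = np.where(points > 0.5, 1, -1)     # threshold to bipolar hypervectors

rand_hvs = np.where(np.random.rand(n_symbols, dim) > 0.5, 1, -1)

def mean_abs_corr(hvs):
    # pairwise normalized dot products, off-diagonal entries only
    c = hvs @ hvs.T / hvs.shape[1]
    return np.abs(c[~np.eye(len(hvs), dtype=bool)]).mean()

print("Sobol  mean |corr|:", mean_abs_corr(sobol_hvs))
print("random mean |corr|:", mean_abs_corr(rand_hvs))
```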
Multiscale Hodge Scattering Networks for Data Analysis
results: The method extracts invariant and robust features and can be used for signal classification, domain (graph/simplex) classification, and molecular dynamics prediction.
Abstract
We propose new scattering networks for signals measured on simplicial complexes, which we call \emph{Multiscale Hodge Scattering Networks} (MHSNs). Our construction is based on multiscale basis dictionaries on simplicial complexes, i.e., the $\kappa$-GHWT and $\kappa$-HGLET, which we recently developed for simplices of dimension $\kappa \in \mathbb{N}$ in a given simplicial complex by generalizing the node-based Generalized Haar-Walsh Transform (GHWT) and Hierarchical Graph Laplacian Eigen Transform (HGLET). The $\kappa$-GHWT and the $\kappa$-HGLET both form redundant sets (i.e., dictionaries) of multiscale basis vectors and the corresponding expansion coefficients of a given signal. Our MHSNs use a layered structure analogous to a convolutional neural network (CNN) to cascade the moments of the modulus of the dictionary coefficients. The resulting features are invariant to reordering of the simplices (i.e., node permutation of the underlying graphs). Importantly, the use of multiscale basis dictionaries in our MHSNs admits a natural pooling operation that is akin to local pooling in CNNs, and which may be performed either locally or per-scale. These pooling operations are harder to define in both traditional scattering networks based on Morlet wavelets, and geometric scattering networks based on Diffusion Wavelets. As a result, we are able to extract a rich set of descriptive yet robust features that can be used along with very simple machine learning methods (i.e., logistic regression or support vector machines) to achieve high-accuracy classification systems with far fewer parameters to train than most modern graph neural networks. Finally, we demonstrate the usefulness of our MHSNs in three distinct types of problems: signal classification, domain (i.e., graph/simplex) classification, and molecular dynamics prediction.
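The sketch below illustrates the first layer of this construction: permutation-invariant features formed as moments of the modulus of dictionary coefficients, pooled per scale. The $\kappa$-GHWT/$\kappa$-HGLET dictionaries themselves are not built here, so random coefficients stand in for them.

```python
# Sketch: q-th moments of |coefficients| per scale, invariant to simplex order.
import numpy as np

def scattering_moments(coeffs_per_scale, qs=(1, 2, 4)):
    """coeffs_per_scale: list of 1-D arrays, one array of dictionary
    coefficients per scale. Returns a flat feature vector of E[|c|^q]
    per scale, invariant to reordering of the simplices."""
    feats = []
    for c in coeffs_per_scale:
        for q in qs:
            feats.append(np.mean(np.abs(c) ** q))
    return np.array(feats)

rng = np.random.default_rng(0)
coeffs = [rng.normal(size=64 // 2**j) for j in range(4)]  # stand-in multiscale coefficients
print(scattering_moments(coeffs))
```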
results: The method improves the stability and performance of existing DCD methods and can scale to settings with thousands of variables.
Abstract
Inferring causal relationships as directed acyclic graphs (DAGs) is an important but challenging problem. Differentiable Causal Discovery (DCD) is a promising approach to this problem, framing the search as a continuous optimization. But existing DCD methods are numerically unstable, with poor performance beyond tens of variables. In this paper, we propose Stable Differentiable Causal Discovery (SDCD), a new method that improves previous DCD methods in two ways: (1) It employs an alternative constraint for acyclicity; this constraint is more stable, both theoretically and empirically, and fast to compute. (2) It uses a training procedure tailored for sparse causal graphs, which are common in real-world scenarios. We first derive SDCD and prove its stability and correctness. We then evaluate it with both observational and interventional data and on both small-scale and large-scale settings. We find that SDCD outperforms existing methods in both convergence speed and accuracy and can scale to thousands of variables.
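For context, the sketch below shows the canonical NOTEARS-style differentiable acyclicity penalty that most DCD methods build around; SDCD's contribution is to replace this with a more stable, cheaper-to-compute alternative, whose exact form is not reproduced here.

```python
# Sketch: the trace-of-matrix-exponential acyclicity penalty h(W) = tr(exp(W*W)) - d,
# which is zero exactly when the weighted adjacency matrix W encodes a DAG.
import numpy as np
from scipy.linalg import expm

def notears_acyclicity(W):
    d = W.shape[0]
    return np.trace(expm(W * W)) - d   # elementwise square keeps entries nonnegative

dag = np.array([[0.0, 1.0], [0.0, 0.0]])   # 1 -> 2, acyclic
cyc = np.array([[0.0, 1.0], [1.0, 0.0]])   # 1 <-> 2, cyclic
print(notears_acyclicity(dag))  # ~0.0
print(notears_acyclicity(cyc))  # > 0
```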
FREE: The Foundational Semantic Recognition for Modeling Environmental Ecosystems
for: This paper aims to develop a new framework called FREE for modeling environmental ecosystems, which can capture the complex relationships between various environmental data over space and time.
methods: The FREE framework uses Large Language Models (LLMs) to map available environmental data into a text space and convert the traditional predictive modeling task into a semantic recognition problem. This allows for the incorporation of natural language descriptions and the capture of data semantics.
results: The proposed FREE framework is evaluated in two real-world applications, predicting stream water temperature in the Delaware River Basin and predicting annual corn yield in Illinois and Iowa. The results show that FREE outperforms multiple baseline methods and is more data- and computation-efficient, as it can be pre-trained on simulated data generated by physics-based models.
Abstract
Modeling environmental ecosystems is critical for the sustainability of our planet, but is extremely challenging due to the complex underlying processes driven by interactions amongst a large number of physical variables. As many variables are difficult to measure at large scales, existing works often utilize a combination of observable features and locally available measurements or modeled values as input to build models for a specific study region and time period. This raises a fundamental question in advancing the modeling of environmental ecosystems: how to build a general framework for modeling the complex relationships amongst various environmental data over space and time? In this paper, we introduce a new framework, FREE, which maps available environmental data into a text space and then converts the traditional predictive modeling task in environmental science to the semantic recognition problem. The proposed FREE framework leverages recent advances in Large Language Models (LLMs) to supplement the original input features with natural language descriptions. This facilitates capturing the data semantics and also allows harnessing the irregularities of input features. When used for long-term prediction, FREE has the flexibility to incorporate newly collected observations to enhance future prediction. The efficacy of FREE is evaluated in the context of two societally important real-world applications, predicting stream water temperature in the Delaware River Basin and predicting annual corn yield in Illinois and Iowa. Beyond the superior predictive performance over multiple baseline methods, FREE is shown to be more data- and computation-efficient as it can be pre-trained on simulated data generated by physics-based models.
Degeneration of kernel regression with Matern kernels into low-order polynomial regression in high dimension
results: In high-dimensional feature spaces, these kernel methods effectively degenerate into low-order polynomial regression and lose their advantages over such regression. The results shed additional light on the success of PIP-type polynomial models for medium-size molecules and on the importance of using physically-motivated (reproducing) kernels.
Abstract
Kernel methods such as kernel ridge regression and Gaussian process regressions with Matern type kernels have been increasingly used, in particular, to fit potential energy surfaces (PES) and density functionals, and for materials informatics. When the dimensionality of the feature space is high, these methods are used with necessarily sparse data. In this regime, the optimal length parameter of a Matern-type kernel tends to become so large that the method effectively degenerates into a low-order polynomial regression and therefore loses any advantage over such regression. This is demonstrated theoretically as well as numerically on the examples of six- and fifteen-dimensional molecular PES using squared exponential and simple exponential kernels. The results shed additional light on the success of polynomial approximations such as PIP for medium size molecules and on the importance of orders-of-coupling based models for preserving the advantages of kernel methods with Matern type kernels or on the use of physically-motivated (reproducing) kernels.
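The degeneration is easy to probe numerically; the sketch below fits Gaussian process regression with a Matern kernel at increasing, fixed length scales on sparse high-dimensional data. The data are an illustrative stand-in for a molecular PES, and the length-scale grid is an assumption.

```python
# Sketch: GP regression with a Matern kernel at fixed, growing length scales on
# sparse 15-dimensional data, illustrating the large-length-scale regime.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
dim, n = 15, 200                        # high-dimensional, necessarily sparse data
X = rng.uniform(size=(n, dim))
y = np.sin(X.sum(axis=1))               # stand-in for a potential energy surface

X_test = rng.uniform(size=(500, dim))
for length_scale in (1.0, 10.0, 100.0):
    gpr = GaussianProcessRegressor(kernel=Matern(length_scale=length_scale, nu=1.5),
                                   alpha=1e-6, optimizer=None)   # keep the length scale fixed
    gpr.fit(X, y)
    err = np.abs(gpr.predict(X_test) - np.sin(X_test.sum(axis=1))).mean()
    print(f"length_scale={length_scale:7.1f}  mean |error|={err:.4f}")
```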
for: The paper is written to resolve the problem of heterogeneity in data collected at different times or locations.
methods: The paper proposes a modified NMF objective called Stratified-NMF, which learns strata-dependent statistics and a shared topics matrix.
results: The paper presents experimental results on synthetic data and real-world datasets to demonstrate the efficiency and accuracy of the method.
Abstract
Non-negative matrix factorization (NMF) is an important technique for obtaining low dimensional representations of datasets. However, classical NMF does not take into account data that is collected at different times or in different locations, which may exhibit heterogeneity. We resolve this problem by solving a modified NMF objective, Stratified-NMF, that simultaneously learns strata-dependent statistics and a shared topics matrix. We develop multiplicative update rules for this novel objective and prove convergence of the objective. Then, we experiment on synthetic data to demonstrate the efficiency and accuracy of the method. Lastly, we apply our method to three real world datasets and empirically investigate their learned features.
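A heavily hedged sketch of the idea follows: standard NMF multiplicative updates extended with a per-stratum statistic. The exact Stratified-NMF objective, its update rules, and the form of the strata-dependent statistics are assumptions here; only the shared-topics / per-stratum structure is illustrated.

```python
# Sketch: per-stratum factors W_i and shift v_i with a shared topics matrix H,
# alternating multiplicative-style updates (assumed form, not the paper's).
import numpy as np

rng = np.random.default_rng(0)
strata = [np.abs(rng.normal(size=(100, 40))) + s for s in (0.0, 2.0)]  # heterogeneous data
k, eps = 5, 1e-9

H = np.abs(rng.normal(size=(k, 40)))                    # shared topics matrix
Ws = [np.abs(rng.normal(size=(X.shape[0], k))) for X in strata]
vs = [np.zeros(40) for _ in strata]                     # strata-dependent shifts (assumed)

for _ in range(200):
    for i, X in enumerate(strata):
        Xc = np.maximum(X - vs[i], eps)                 # remove the stratum's shift
        W, WH = Ws[i], Ws[i] @ H
        Ws[i] = W * (Xc @ H.T) / (WH @ H.T + eps)       # standard multiplicative update
        vs[i] = np.maximum(X - Ws[i] @ H, 0).mean(axis=0)   # refit the stratum statistic
    num = sum(Ws[i].T @ np.maximum(X - vs[i], eps) for i, X in enumerate(strata))
    den = sum(Ws[i].T @ (Ws[i] @ H) for i in range(len(strata))) + eps
    H *= num / den                                      # shared topics see all strata

print("reconstruction error:",
      sum(np.linalg.norm(X - (Ws[i] @ H + vs[i])) for i, X in enumerate(strata)))
```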
methods: The paper provides a Python implementation for generating virtual trajectories from large raw datasets, which are otherwise difficult to analyze due to their size.
results: Using the virtual trajectory dataset, the authors assess speed variability and travel times across different lanes of the I-24 MOTION INCEPTION v1.0.0 dataset; the dataset also opens future research on traffic waves and their impact on energy.
Abstract
This article introduces a new virtual trajectory dataset derived from the I-24 MOTION INCEPTION v1.0.0 dataset to address challenges in analyzing large but noisy trajectory datasets. Building on the concept of virtual trajectories, we provide a Python implementation to generate virtual trajectories from large raw datasets that are typically challenging to process due to their size. We demonstrate the practical utility of these trajectories in assessing speed variability and travel times across different lanes within the INCEPTION dataset. The virtual trajectory dataset opens future research on traffic waves and their impact on energy.
results: Experiments show that the algorithm performs well across different noise levels and is more robust and effective than comparative models.
Abstract
Coherent imaging systems, such as medical ultrasound and synthetic aperture radar (SAR), are subject to corruption from speckle due to sub-resolution scatterers. Since speckle is multiplicative in nature, the constituent image regions become corrupted to different extents. The task of denoising such images requires algorithms specifically designed for removing signal-dependent noise. This paper proposes a novel image denoising algorithm for removing signal-dependent multiplicative noise with diffusion models, called Speckle Denoising Diffusion Probabilistic Models (SDDPM). We derive the mathematical formulations for the forward process, the reverse process, and the training objective. In the forward process, we apply multiplicative noise to a given image and prove that the forward process is Gaussian. We show that the reverse process is also Gaussian and the final training objective can be expressed as the Kullback Leibler (KL) divergence between the forward and reverse processes. As derived in the paper, the final denoising task is a single step process, thereby reducing the denoising time significantly. We have trained our model with natural land-use images and ultrasound images for different noise levels. Extensive experiments centered around two different applications show that SDDPM is robust and performs significantly better than the comparative models even when the images are severely corrupted.
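To illustrate the corruption model that motivates the method, the sketch below applies the common gamma-distributed (fully developed) speckle model, which is multiplicative and therefore signal-dependent; the paper's Gaussian forward process, reverse process, and single-step denoiser are not reproduced.

```python
# Sketch: multiplicative, signal-dependent speckle via unit-mean gamma noise.
import numpy as np

def add_speckle(image, looks=4, rng=None):
    """Multiply each pixel by gamma-distributed speckle with unit mean.
    Smaller `looks` means stronger speckle; the noise scales with the signal."""
    rng = rng or np.random.default_rng()
    speckle = rng.gamma(shape=looks, scale=1.0 / looks, size=image.shape)
    return image * speckle

img = np.clip(np.random.default_rng(0).normal(0.5, 0.2, (64, 64)), 0, 1)
noisy = add_speckle(img, looks=4)
print("mean/std before:", img.mean(), img.std())
print("mean/std after: ", noisy.mean(), noisy.std())
```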
Image-Domain Material Decomposition for Dual-energy CT using Unsupervised Learning with Data-fidelity Loss
results: The study obtains a reliable material decomposition method that is robust to noise amplification and evaluates it experimentally in the image domain.
Abstract
Background: Dual-energy CT (DECT) and material decomposition play vital roles in quantitative medical imaging. However, the decomposition process may suffer from significant noise amplification, leading to severely degraded image signal-to-noise ratios (SNRs). While existing iterative algorithms perform noise suppression using different image priors, these heuristic image priors cannot accurately represent the features of the target image manifold. Although deep learning-based decomposition methods have been reported, these methods are in the supervised-learning framework requiring paired data for training, which is not readily available in clinical settings. Purpose: This work aims to develop an unsupervised-learning framework with data-measurement consistency for image-domain material decomposition in DECT.
MIFA: Metadata, Incentives, Formats, and Accessibility guidelines to improve the reuse of AI datasets for bioimage analysis
results: The authors expect that the MIFA (Metadata, Incentives, Formats, and Accessibility) guidelines will accelerate the development of AI tools for bioimage analysis by improving access to and reuse of high-quality training data.
Abstract
Artificial Intelligence methods are powerful tools for biological image analysis and processing. High-quality annotated images are key to training and developing new methods, but access to such data is often hindered by the lack of standards for sharing datasets. We brought together community experts in a workshop to develop guidelines to improve the reuse of bioimages and annotations for AI applications. These include standards on data formats, metadata, data presentation and sharing, and incentives to generate new datasets. We are positive that the MIFA (Metadata, Incentives, Formats, and Accessibility) recommendations will accelerate the development of AI tools for bioimage analysis by facilitating access to high quality training data.