for: The paper aims to investigate the ability of ChatGPT to provide evidence to support its answers and to analyze the quality of the references it suggests.
methods: The paper uses a collection of domain-specific knowledge-based questions to prompt ChatGPT to provide answers and supporting evidence in the form of references to external sources.
results: The paper finds that ChatGPT provides correct or partially correct answers in about half of the cases (50.6% of the time), but its suggested references exist only 14% of the time. The generated references reveal common traits and often do not support the claims ChatGPT attributes to them.
Abstract
Can ChatGPT provide evidence to support its answers? Does the evidence it suggests actually exist and does it really support its answer? We investigate these questions using a collection of domain-specific knowledge-based questions, specifically prompting ChatGPT to provide both an answer and supporting evidence in the form of references to external sources. We also investigate how different prompts impact answers and evidence. We find that ChatGPT provides correct or partially correct answers in about half of the cases (50.6% of the time), but its suggested references only exist 14% of the time. We further provide insights on the generated references that reveal common traits among the references that ChatGPT generates, and show how even if a reference provided by the model does exist, this reference often does not support the claims ChatGPT attributes to it. Our findings are important because (1) they are the first systematic analysis of the references created by ChatGPT in its answers; (2) they suggest that the model may leverage good quality information in producing correct answers, but is unable to attribute real evidence to support its answers. Prompts, raw result files and manual analysis are made publicly available.
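A check of the kind described here, whether a generated reference corresponds to a real publication, can be approximated programmatically. The sketch below queries the public Crossref API for each citation string and compares the best match against it; the `refs` example, the crude similarity measure, and the threshold are illustrative assumptions, not the paper's verification procedure (which relied on manual analysis).

```python
import difflib
import requests

def reference_exists(citation: str, threshold: float = 0.5) -> bool:
    """Heuristic check: does a generated citation match any real Crossref record?"""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": 1},
        timeout=10,
    )
    items = resp.json().get("message", {}).get("items", [])
    if not items:
        return False
    found_title = " ".join(items[0].get("title", [""])).lower()
    # Crude fuzzy comparison between the generated citation and the best real title.
    return difflib.SequenceMatcher(None, citation.lower(), found_title).ratio() >= threshold

# Hypothetical model-generated reference to verify.
refs = ["Smith, J. et al. (2020). Question Answering with Language Models. JMLR."]
for r in refs:
    print(r, "->", "plausible match found" if reference_exists(r) else "no match found")
```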
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
results: The paper finds that large language models trained on this carefully cleaned data perform well across multiple languages and can be used for a wide range of applications.
Abstract
The driving factors behind the development of large language models (LLMs) with impressive learning capabilities are their colossal model sizes and extensive training datasets. Along with the progress in natural language processing, LLMs have been frequently made accessible to the public to foster deeper investigation and applications. However, when it comes to training datasets for these LLMs, especially the recent state-of-the-art models, they are often not fully disclosed. Creating training data for high-performing LLMs involves extensive cleaning and deduplication to ensure the necessary level of quality. The lack of transparency for training data has thus hampered research on attributing and addressing hallucination and bias issues in LLMs, hindering replication efforts and further advancements in the community. These challenges become even more pronounced in multilingual learning scenarios, where the available multilingual text datasets are often inadequately collected and cleaned. Consequently, there is a lack of open-source and readily usable datasets to effectively train LLMs in multiple languages. To overcome this issue, we present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages, tailored for LLM development. Our dataset undergoes meticulous cleaning and deduplication through a rigorous pipeline of multiple stages to accomplish the best quality for model training, including language identification, URL-based filtering, metric-based cleaning, document refinement, and data deduplication. CulturaX is fully released to the public in HuggingFace to facilitate research and advancements in multilingual LLMs: https://huggingface.co/datasets/uonlp/CulturaX.
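The multi-stage pipeline the abstract names (language identification, URL-based filtering, metric-based cleaning, deduplication) can be illustrated with a stripped-down sketch. The blocklist, quality thresholds, and exact-hash deduplication below are illustrative stand-ins, not CulturaX's actual rules; the real pipeline also performs language identification and near-deduplication.

```python
import hashlib
import re

BLOCKED_URL_PATTERNS = [re.compile(p) for p in (r"casino", r"\.onion\b")]  # illustrative blocklist

def passes_url_filter(url: str) -> bool:
    return not any(p.search(url) for p in BLOCKED_URL_PATTERNS)

def passes_metric_cleaning(text: str) -> bool:
    # Illustrative document-level quality metrics: length and alphabetic ratio.
    if len(text) < 200:
        return False
    alpha_ratio = sum(c.isalpha() for c in text) / max(len(text), 1)
    return alpha_ratio > 0.6

def deduplicate(docs):
    """Exact deduplication by content hash; real pipelines add near-dedup (e.g. MinHash)."""
    seen, unique = set(), []
    for doc in docs:
        h = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if h not in seen:
            seen.add(h)
            unique.append(doc)
    return unique

corpus = [{"url": "https://example.org/a", "text": "some document " * 50}]
cleaned = deduplicate(
    d for d in corpus if passes_url_filter(d["url"]) and passes_metric_cleaning(d["text"])
)
print(len(cleaned), "documents kept")
```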
Do Large GPT Models Discover Moral Dimensions in Language Representations? A Topological Study Of Sentence Embeddings
results: The study finds that sentence embeddings based on GPT-3.5 can be decomposed into two submanifolds representing fair and unfair moral judgments, indicating that GPT-based language models develop a moral dimension within their representation spaces and learn a concept of fairness during training.
Abstract
As Large Language Models are deployed within Artificial Intelligence systems that are increasingly integrated with human society, it becomes more important than ever to study their internal structures. Higher level abilities of LLMs such as GPT-3.5 emerge in large part due to informative language representations they induce from raw text data during pre-training on trillions of words. These embeddings exist in vector spaces of several thousand dimensions, and their processing involves mapping between multiple vector spaces, with a total number of parameters on the order of trillions. Furthermore, these language representations are induced by gradient optimization, resulting in a black box system that is hard to interpret. In this paper, we take a look at the topological structure of neuronal activity in the "brain" of Chat-GPT's foundation language model, and analyze it with respect to a metric representing the notion of fairness. We develop a novel approach to visualize GPT's moral dimensions. We first compute a fairness metric, inspired by social psychology literature, to identify factors that typically influence fairness assessments in humans, such as legitimacy, need, and responsibility. Subsequently, we summarize the manifold's shape using a lower-dimensional simplicial complex, whose topology is derived from this metric. We color it with a heat map associated with this fairness metric, producing human-readable visualizations of the high-dimensional sentence manifold. Our results show that sentence embeddings based on GPT-3.5 can be decomposed into two submanifolds corresponding to fair and unfair moral judgments. This indicates that GPT-based language models develop a moral dimension within their representation spaces and induce an understanding of fairness during their training process.
Talk2Care: Facilitating Asynchronous Patient-Provider Communication with Large-Language-Model
results: Two user studies with older adults and providers showed that Talk2Care can facilitate the patient-provider communication process, enrich the health information collected from older adults, and considerably save providers' effort and time. We view this work as an initial exploration of LLMs at the intersection of healthcare and interpersonal communication.
Abstract
Despite the plethora of telehealth applications to assist home-based older adults and healthcare providers, basic messaging and phone calls are still the most common communication methods, which suffer from limited availability, information loss, and process inefficiencies. One promising solution to facilitate patient-provider communication is to leverage large language models (LLMs) with their powerful natural conversation and summarization capability. However, there is a limited understanding of LLMs' role during the communication. We first conducted two interview studies with both older adults (N=10) and healthcare providers (N=9) to understand their needs and opportunities for LLMs in patient-provider asynchronous communication. Based on the insights, we built an LLM-powered communication system, Talk2Care, and designed interactive components for both groups: (1) For older adults, we leveraged the convenience and accessibility of voice assistants (VAs) and built an LLM-powered VA interface for effective information collection. (2) For health providers, we built an LLM-based dashboard to summarize and present important health information based on older adults' conversations with the VA. We further conducted two user studies with older adults and providers to evaluate the usability of the system. The results showed that Talk2Care could facilitate the communication process, enrich the health information collected from older adults, and considerably save providers' efforts and time. We envision our work as an initial exploration of LLMs' capability in the intersection of healthcare and interpersonal communication.
Speech-Gesture GAN: Gesture Generation for Robots and Embodied Agents
paper_authors: Carson Yu Liu, Gelareh Mohammadi, Yang Song, Wafa Johal
for: This paper aims to help robots and embodied agents better convey their attitudes, feelings, and intentions in interactions with humans.
methods: The paper uses a neural network model based on a conditional Generative Adversarial Network (GAN) that learns the relationship between co-speech gestures and both semantic and acoustic features of the speech input.
results: Evaluation results show that the gesture-generation framework helps robots and embodied agents interact with humans more effectively and express their attitudes and feelings during conversation.
Abstract
Embodied agents, in the form of virtual agents or social robots, are rapidly becoming more widespread. In human-human interactions, humans use nonverbal behaviours to convey their attitudes, feelings, and intentions. Therefore, this capability is also required for embodied agents in order to enhance the quality and effectiveness of their interactions with humans. In this paper, we propose a novel framework that can generate sequences of joint angles from the speech text and speech audio utterances. Based on a conditional Generative Adversarial Network (GAN), our proposed neural network model learns the relationships between the co-speech gestures and both semantic and acoustic features from the speech input. In order to train our neural network model, we employ a public dataset containing co-speech gestures with corresponding speech audio utterances, which were captured from a single male native English speaker. The results from both objective and subjective evaluations demonstrate the efficacy of our gesture-generation framework for Robots and Embodied Agents.
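A minimal conditional-GAN skeleton for this kind of speech-to-gesture mapping might look like the PyTorch sketch below. The feature dimensions, recurrent architecture, and random stand-in data are placeholder assumptions; the paper's model conditions on both semantic and acoustic speech features and outputs sequences of joint angles.

```python
import torch
import torch.nn as nn

SPEECH_DIM, NOISE_DIM, JOINT_DIM, SEQ_LEN = 128, 32, 24, 64  # assumed sizes

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(SPEECH_DIM + NOISE_DIM, 256, batch_first=True)
        self.out = nn.Linear(256, JOINT_DIM)  # joint angles per frame

    def forward(self, speech_feats, z):
        # Condition every frame on the speech features and a shared noise vector.
        z = z.unsqueeze(1).expand(-1, speech_feats.size(1), -1)
        h, _ = self.rnn(torch.cat([speech_feats, z], dim=-1))
        return self.out(h)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(SPEECH_DIM + JOINT_DIM, 256, batch_first=True)
        self.out = nn.Linear(256, 1)

    def forward(self, speech_feats, gestures):
        h, _ = self.rnn(torch.cat([speech_feats, gestures], dim=-1))
        return self.out(h[:, -1])  # real/fake score for the whole sequence

G, D = Generator(), Discriminator()
speech = torch.randn(8, SEQ_LEN, SPEECH_DIM)   # stand-in speech features
real = torch.randn(8, SEQ_LEN, JOINT_DIM)      # stand-in motion-capture gestures
fake = G(speech, torch.randn(8, NOISE_DIM))
bce = nn.BCEWithLogitsLoss()
# One adversarial step (optimizer updates omitted for brevity).
d_loss = bce(D(speech, real), torch.ones(8, 1)) + bce(D(speech, fake.detach()), torch.zeros(8, 1))
g_loss = bce(D(speech, fake), torch.ones(8, 1))
```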
Unleashing the Power of Dynamic Mode Decomposition and Deep Learning for Rainfall Prediction in North-East India
results: The LSTM approach predicts rainfall more accurately than DMD, indicating that LSTM better captures complex nonlinear relationships in the data. These findings can help improve rainfall forecasting accuracy in North-East India and mitigate the impact of climate change.
Abstract
Accurate rainfall forecasting is crucial for effective disaster preparedness and mitigation in the North-East region of India, which is prone to extreme weather events such as floods and landslides. In this study, we investigated the use of two data-driven methods, Dynamic Mode Decomposition (DMD) and Long Short-Term Memory (LSTM), for rainfall forecasting using daily rainfall data collected from India Meteorological Department in northeast region over a period of 118 years. We conducted a comparative analysis of these methods to determine their relative effectiveness in predicting rainfall patterns. Using historical rainfall data from multiple weather stations, we trained and validated our models to forecast future rainfall patterns. Our results indicate that both DMD and LSTM are effective in forecasting rainfall, with LSTM outperforming DMD in terms of accuracy, revealing that LSTM has the ability to capture complex nonlinear relationships in the data, making it a powerful tool for rainfall forecasting. Our findings suggest that data-driven methods such as DMD and deep learning approaches like LSTM can significantly improve rainfall forecasting accuracy in the North-East region of India, helping to mitigate the impact of extreme weather events and enhance the region's resilience to climate change.
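Of the two methods compared, DMD is compact enough to sketch directly in numpy. This is the standard exact-DMD recipe applied to a delay-embedded toy series; the window length and the synthetic seasonal signal are illustrative, not the study's 118-year rainfall record.

```python
import numpy as np

def dmd_forecast(series: np.ndarray, delay: int = 12, steps: int = 6) -> np.ndarray:
    """Exact DMD on a delay-embedded scalar series; returns `steps` forecasts."""
    # Build the delay-embedded snapshot matrix (each column is one time window).
    X = np.column_stack([series[i:i + delay] for i in range(len(series) - delay)])
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    A_tilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / s)  # reduced linear operator
    preds, x = [], X[:, -1]
    for _ in range(steps):
        x = U @ (A_tilde @ (U.conj().T @ x))  # advance one step in reduced coordinates
        preds.append(x[-1].real)
    return np.array(preds)

t = np.arange(240)  # 20 years of toy monthly data with an annual cycle
rainfall = 100 + 50 * np.sin(2 * np.pi * t / 12) + np.random.default_rng(0).normal(0, 5, t.size)
print(dmd_forecast(rainfall))
```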
Enhancing Knee Osteoarthritis severity level classification using diffusion augmented images
paper_authors: Paleti Nikhil Chowdary, Gorantla V N S L Vishnu Vardhan, Menta Sai Akshay, Menta Sai Aashish, Vadlapudi Sai Aravind, Garapati Venkata Krishna Rayalu, Aswathy P
for: This research paper explores the use of advanced computer vision models and augmentation techniques to classify the severity of knee osteoarthritis (OA).
results: The results show that data preprocessing and augmentation substantially improve model accuracy, with the EfficientNetB3 model achieving the highest accuracy of 84% on the augmented dataset. In addition, attention visualization techniques such as Grad-CAM provide detailed attention maps that improve the interpretability and trustworthiness of the models. These findings indicate that combining advanced models with augmented data and attention visualization enables accurate classification of knee OA severity.
Abstract
This research paper explores the classification of knee osteoarthritis (OA) severity levels using advanced computer vision models and augmentation techniques. The study investigates the effectiveness of data preprocessing, including Contrast-Limited Adaptive Histogram Equalization (CLAHE), and data augmentation using diffusion models. Three experiments were conducted: training models on the original dataset, training models on the preprocessed dataset, and training models on the augmented dataset. The results show that data preprocessing and augmentation significantly improve the accuracy of the models. The EfficientNetB3 model achieved the highest accuracy of 84% on the augmented dataset. Additionally, attention visualization techniques, such as Grad-CAM, are utilized to provide detailed attention maps, enhancing the understanding and trustworthiness of the models. These findings highlight the potential of combining advanced models with augmented data and attention visualization for accurate knee OA severity classification.
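The CLAHE preprocessing step is available directly in OpenCV. A minimal sketch, assuming grayscale knee radiographs on disk; the file path and the clip/tile settings are illustrative, not necessarily the paper's.

```python
import cv2

# Contrast-Limited Adaptive Histogram Equalization on a grayscale radiograph.
# clipLimit and tileGridSize are illustrative defaults, not the paper's settings.
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))

img = cv2.imread("knee_xray.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input file
enhanced = clahe.apply(img)
resized = cv2.resize(enhanced, (300, 300))  # EfficientNetB3's nominal input resolution
cv2.imwrite("knee_xray_clahe.png", resized)
```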
for: This paper aims to address the issue of counterfactual degeneration in causal inference, specifically in the context of Layer 3 valuations and individual-level semantics.
methods: The paper proposes a novel framework called DiscoSCM, which combines the strengths of both the Potential Outcome (PO) and Structural Causal Model (SCM) frameworks and could be seen as an extension of them. DiscoSCM leverages the philosophy of individual causality to tackle the counterfactual degeneration problem.
results: The paper demonstrates the superior performance of the DiscoSCM framework in answering counterfactual questions through several key results on unit selection problems, showing that it can effectively address counterfactual degeneration and provide more accurate estimates of counterfactual parameters.
Abstract
In the realm of causal inference, the primary frameworks are the Potential Outcome (PO) and the Structural Causal Model (SCM), both predicated on the consistency rule. However, when facing Layer 3 valuations, i.e., counterfactual queries that inherently belong to individual-level semantics, they both seem inadequate due to the issue of degeneration caused by the consistency rule. For instance, in personalized incentive scenarios within the internet industry, the probability of one particular user being a complier, denoted as $P(y_x, y'_{x'})$, degenerates to a parameter that can only take values of 0 or 1. This paper leverages the DiscoSCM framework to theoretically tackle the aforementioned counterfactual degeneration problem, which is a novel framework for causal modeling that combines the strengths of both PO and SCM, and could be seen as an extension of them. The paper starts with a brief introduction to the background of causal modeling frameworks. It then illustrates, through an example, the difficulty in recovering counterfactual parameters from data without imposing strong assumptions. Following this, we propose the DiscoSCM with independent potential noise framework to address this problem. Subsequently, the superior performance of the DiscoSCM framework in answering counterfactual questions is demonstrated by several key results in the topic of unit select problems. We then elucidate that this superiority stems from the philosophy of individual causality. In conclusion, we suggest that DiscoSCM may serve as a significant milestone in the causal modeling field for addressing counterfactual queries.
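One informal way to see the degeneration described here: under individual-level semantics the potential outcomes of a fixed individual are deterministic, and the consistency rule pins down the observed arm, so the complier "probability" for that individual collapses to an indicator. A sketch of the argument, not the paper's formal treatment:

```latex
% Consistency rule: X = x \implies Y_x = Y.
% For a fixed individual with deterministic potential outcomes,
P(y_x, y'_{x'}) \;=\; \mathbf{1}\left[Y_x = y\right] \cdot \mathbf{1}\left[Y_{x'} = y'\right] \;\in\; \{0, 1\},
% i.e., the quantity is no longer a non-degenerate probability at Layer 3.
```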
Active Learning for Semantic Segmentation with Multi-class Label Query
results: On Cityscapes and PASCAL VOC 2012, the method reduces annotation cost compared with previous approaches while achieving higher segmentation performance.
Abstract
This paper proposes a new active learning method for semantic segmentation. The core of our method lies in a new annotation query design. It samples informative local image regions (e.g., superpixels), and for each of such regions, asks an oracle for a multi-hot vector indicating all classes existing in the region. This multi-class labeling strategy is substantially more efficient than existing ones like segmentation, polygon, and even dominant class labeling in terms of annotation time per click. However, it introduces the class ambiguity issue in training since it assigns partial labels (i.e., a set of candidate classes) to individual pixels. We thus propose a new algorithm for learning semantic segmentation while disambiguating the partial labels in two stages. In the first stage, it trains a segmentation model directly with the partial labels through two new loss functions motivated by partial label learning and multiple instance learning. In the second stage, it disambiguates the partial labels by generating pixel-wise pseudo labels, which are used for supervised learning of the model. Equipped with a new acquisition function dedicated to the multi-class labeling, our method outperformed previous work on Cityscapes and PASCAL VOC 2012 while spending less annotation cost.
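The first-stage training with partial (candidate-set) labels can be illustrated with the classic partial-label objective: maximize the total probability mass a pixel's prediction assigns to its candidate classes. The PyTorch sketch below shows that single generic loss; shapes and the toy candidate masks are illustrative, and the paper's two specialized losses are not reproduced here.

```python
import torch
import torch.nn.functional as F

def partial_label_loss(logits: torch.Tensor, candidate_mask: torch.Tensor) -> torch.Tensor:
    """logits: (N, C) pixel logits; candidate_mask: (N, C) with 1 for candidate classes.

    Minimizes -log sum_{c in candidates} p(c | pixel), a standard partial-label objective.
    """
    probs = F.softmax(logits, dim=-1)
    candidate_mass = (probs * candidate_mask).sum(dim=-1).clamp_min(1e-8)
    return -candidate_mass.log().mean()

logits = torch.randn(1024, 21, requires_grad=True)             # e.g. PASCAL VOC's 21 classes
mask = torch.zeros(1024, 21)
mask[torch.arange(1024), torch.randint(0, 21, (1024,))] = 1.0  # toy candidate sets
mask[:, 0] = 1.0                                               # every pixel may be background
loss = partial_label_loss(logits, mask)
loss.backward()
```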
Deep Neighbor Layer Aggregation for Lightweight Self-Supervised Monocular Depth Estimation
results: Experiments on the KITTI dataset show that, compared with many large models such as Monodepth2, the method achieves higher accuracy while using only 30% of the parameters.
Abstract
With the frequent use of self-supervised monocular depth estimation in robotics and autonomous driving, the model's efficiency is becoming increasingly important. Most current approaches apply much larger and more complex networks to improve the precision of depth estimation. Some researchers incorporated Transformer into self-supervised monocular depth estimation to achieve better performance. However, this method leads to high parameters and high computation. We present a fully convolutional depth estimation network using contextual feature fusion. Compared to UNet++ and HRNet, we use high-resolution and low-resolution features to reserve information on small targets and fast-moving objects instead of long-range fusion. We further promote depth estimation results employing lightweight channel attention based on convolution in the decoder stage. Our method reduces the parameters without sacrificing accuracy. Experiments on the KITTI benchmark show that our method can get better results than many large models, such as Monodepth2, with only 30% of the parameters. The source code is available at https://github.com/boyagesmile/DNA-Depth.
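The "lightweight channel attention based on convolution" in the decoder can be sketched as a squeeze-and-excitation-style module. The sketch below is a generic version of that idea, not the paper's exact block; see the released code for the real design.

```python
import torch
import torch.nn as nn

class ConvChannelAttention(nn.Module):
    """Squeeze global context, then reweight channels with a small conv bottleneck."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.fc(self.pool(x))  # per-channel gates in [0, 1]

feat = torch.randn(2, 64, 48, 160)  # a decoder feature map (batch, C, H, W)
print(ConvChannelAttention(64)(feat).shape)  # torch.Size([2, 64, 48, 160])
```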
Continuous Modeling of the Denoising Process for Speech Enhancement Based on Deep Learning
results: Experimental results show that preserving a small amount of noise in the clean target benefits speech enhancement, as evidenced by improvements in both objective speech measures and automatic speech recognition performance.
Abstract
In this paper, we explore a continuous modeling approach for deep-learning-based speech enhancement, focusing on the denoising process. We use a state variable to indicate the denoising process. The starting state is noisy speech and the ending state is clean speech. The noise component in the state variable decreases with the change of the state index until the noise component is 0. During training, a UNet-like neural network learns to estimate every state variable sampled from the continuous denoising process. In testing, we introduce a controlling factor as an embedding, ranging from zero to one, to the neural network, allowing us to control the level of noise reduction. This approach enables controllable speech enhancement and is adaptable to various application scenarios. Experimental results indicate that preserving a small amount of noise in the clean target benefits speech enhancement, as evidenced by improvements in both objective speech measures and automatic speech recognition performance.
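With additive noise, the continuous state variable has a concrete reading: the state at index t retains a (1 - t) fraction of the noise, so t = 0 is the noisy input and t = 1 is clean speech. The sketch below builds such training targets and shows the test-time control factor; the linear schedule and toy signal are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # toy 1-second "speech"
noise = 0.3 * rng.standard_normal(16000)
noisy = clean + noise

def state(t: float) -> np.ndarray:
    """Denoising state at index t in [0, 1]: the noise component shrinks to 0 as t -> 1."""
    return clean + (1.0 - t) * noise

# Training: the network learns to estimate states sampled from the continuous process.
targets = [state(t) for t in rng.uniform(0.0, 1.0, size=4)]

# Testing: a control factor chooses how much noise reduction the model applies;
# e.g. 0.9 asks the network for a state that still retains 10% of the noise.
control_factor = 0.9
desired_output = state(control_factor)
```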
Sim-to-Real Deep Reinforcement Learning with Manipulators for Pick-and-place
paper_authors: Wenxing Liu, Hanlin Niu, Robert Skilton, Joaquin Carrasco
for: This paper proposes a self-supervised vision-based deep reinforcement learning (DRL) method to improve the performance of transferring a DRL model from simulation to the real world.
methods: The proposed method uses a height-sensitive action policy to deal with crowded and stacked objects in challenging environments. The training model is applied directly to a real suction task without any fine-tuning in the real world.
results: The proposed method achieves a high suction success rate of 90% in a real experiment with novel objects, without any real-world fine-tuning. An experimental video is available at: https://youtu.be/jSTC-EGsoFA.
Abstract
When transferring a Deep Reinforcement Learning model from simulation to the real world, the performance could be unsatisfactory since the simulation cannot imitate the real world well in many circumstances. This results in a long period of fine-tuning in the real world. This paper proposes a self-supervised vision-based DRL method that allows robots to pick and place objects effectively and efficiently when directly transferring a training model from simulation to the real world. A height-sensitive action policy is specially designed for the proposed method to deal with crowded and stacked objects in challenging environments. The training model with the proposed approach can be applied directly to a real suction task without any fine-tuning from the real world while maintaining a high suction success rate. It is also validated that our model can be deployed to suction novel objects in a real experiment with a suction success rate of 90% without any real-world fine-tuning. The experimental video is available at: https://youtu.be/jSTC-EGsoFA.
Detection and Localization of Firearm Carriers in Complex Scenes for Improved Safety Measures
paper_authors: Arif Mahmood, Abdul Basit, M. Akhtar Munir, Mohsen Ali
for: This study aims to improve the detection and precise localization of individuals carrying firearms, enhancing effectiveness in the security and surveillance domains.
methods: The study proposes a novel approach that leverages human-firearm interaction information to improve the localization of firearm carriers. The method includes an attention mechanism that accurately separates humans and firearms from the background, and a saliency-driven locality-preserving constraint for learning essential features.
results: Compared with the baseline method, the proposed approach achieves significantly higher accuracy (AP = 77.8%) on a newly constructed dataset, showing that the attention mechanism and the saliency-driven locality-preserving constraint improve the accuracy of detecting firearm carriers.
Abstract
Detecting firearms and accurately localizing individuals carrying them in images or videos is of paramount importance in security, surveillance, and content customization. However, this task presents significant challenges in complex environments due to clutter and the diverse shapes of firearms. To address this problem, we propose a novel approach that leverages human-firearm interaction information, which provides valuable clues for localizing firearm carriers. Our approach incorporates an attention mechanism that effectively distinguishes humans and firearms from the background by focusing on relevant areas. Additionally, we introduce a saliency-driven locality-preserving constraint to learn essential features while preserving foreground information in the input image. By combining these components, our approach achieves exceptional results on a newly proposed dataset. To handle inputs of varying sizes, we pass paired human-firearm instances with attention masks as channels through a deep network for feature computation, utilizing an adaptive average pooling layer. We extensively evaluate our approach against existing methods in human-object interaction detection and achieve significant results (AP=77.8%) compared to the baseline approach (AP=63.1%). This demonstrates the effectiveness of leveraging attention mechanisms and saliency-driven locality preservation for accurate human-firearm interaction detection. Our findings contribute to advancing the fields of security and surveillance, enabling more efficient firearm localization and identification in diverse scenarios.
Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables
results: By combining these two approaches, the Pearson product-moment correlation (PPMC) score, which evaluates the accuracy of the SI system's tract-variable estimation, improved from 0.7452 to 0.8141, a 6.9% increase.
Abstract
The performance of deep learning models depends significantly on their capacity to encode input features efficiently and decode them into meaningful outputs. Better input and output representation has the potential to boost models' performance and generalization. In the context of acoustic-to-articulatory speech inversion (SI) systems, we study the impact of utilizing speech representations acquired via self-supervised learning (SSL) models, such as HuBERT compared to conventional acoustic features. Additionally, we investigate the incorporation of novel tract variables (TVs) through an improved geometric transformation model. By combining these two approaches, we improve the Pearson product-moment correlation (PPMC) scores which evaluate the accuracy of TV estimation of the SI system from 0.7452 to 0.8141, a 6.9% increase. Our findings underscore the profound influence of rich feature representations from SSL models and improved geometric transformations with target TVs on the enhanced functionality of SI systems.
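The PPMC score reported above is the ordinary Pearson correlation between estimated and ground-truth tract variables, averaged here across TVs (the per-TV averaging is an assumption about the reporting convention). A minimal numpy sketch with synthetic trajectories:

```python
import numpy as np

def ppmc(estimated: np.ndarray, ground_truth: np.ndarray) -> float:
    """Mean Pearson correlation across tract variables; arrays are (time, n_tvs)."""
    scores = [
        np.corrcoef(estimated[:, i], ground_truth[:, i])[0, 1]
        for i in range(ground_truth.shape[1])
    ]
    return float(np.mean(scores))

rng = np.random.default_rng(0)
truth = rng.standard_normal((500, 6))            # six TVs over 500 frames (toy data)
estimate = truth + 0.5 * rng.standard_normal(truth.shape)
print(f"PPMC: {ppmc(estimate, truth):.4f}")
```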
SplitEE: Early Exit in Deep Neural Networks with Split Computing
results: The approach achieves a significant cost reduction (>50%) with only a slight drop in accuracy (<2%).
Abstract
Deep Neural Networks (DNNs) have drawn attention because of their outstanding performance on various tasks. However, deploying full-fledged DNNs in resource-constrained devices (edge, mobile, IoT) is difficult due to their large size. To overcome the issue, various approaches are considered, like offloading part of the computation to the cloud for final inference (split computing) or performing the inference at an intermediary layer without passing through all layers (early exits). In this work, we propose combining both approaches by using early exits in split computing. In our approach, we decide up to what depth of DNNs computation to perform on the device (splitting layer) and whether a sample can exit from this layer or needs to be offloaded. The decisions are based on a weighted combination of accuracy, computational, and communication costs. We develop an algorithm named SplitEE to learn an optimal policy. Since pre-trained DNNs are often deployed in new domains where the ground truths may be unavailable and samples arrive in a streaming fashion, SplitEE works in an online and unsupervised setup. We extensively perform experiments on five different datasets. SplitEE achieves a significant cost reduction (>50%) with a slight drop in accuracy (<2%) as compared to the case when all samples are inferred at the final layer. The anonymized source code is available at https://anonymous.4open.science/r/SplitEE_M-B989/README.md.
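The exit-or-offload decision can be sketched as a threshold policy over the weighted combination of accuracy, computation, and communication costs that the abstract describes. The weights and cost numbers below are illustrative assumptions; the actual SplitEE policy is learned online rather than fixed.

```python
def should_exit_early(exit_confidence: float,
                      device_cost: float = 1.0,
                      offload_cost: float = 3.0,
                      mu: float = 5.0) -> bool:
    """Exit at the split layer if the accuracy penalty beats the extra offload cost.

    mu weighs the accuracy term against compute/communication, mirroring the
    weighted combination in the abstract; all numbers here are illustrative.
    """
    accuracy_penalty = mu * (1.0 - exit_confidence)  # expected loss from exiting early
    extra_offload_cost = offload_cost - device_cost  # communication + cloud compute
    return accuracy_penalty <= extra_offload_cost

# Confidence could be the early-exit head's max softmax probability.
for conf in (0.55, 0.95):
    print(conf, "->", "exit on device" if should_exit_early(conf) else "offload to cloud")
```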
From Cooking Recipes to Robot Task Trees – Improving Planning Correctness and Task Efficiency by Leveraging LLMs with a Knowledge Network
for: This paper addresses the task of robotic cooking, specifically generating a sequence of actions for a robot to prepare a meal successfully.
methods: The paper introduces a novel task tree generation pipeline that uses a large language model (LLM) to retrieve recipe instructions and then utilizes a fine-tuned GPT-3 to convert them into a task tree, capturing sequential and parallel dependencies among subtasks. The pipeline also mitigates the uncertainty and unreliability of LLM outputs using task tree retrieval.
results: The paper shows superior performance compared to previous works in both task planning accuracy and execution efficiency.
Abstract
Task planning for robotic cooking involves generating a sequence of actions for a robot to prepare a meal successfully. This paper introduces a novel task tree generation pipeline producing correct planning and efficient execution for cooking tasks. Our method first uses a large language model (LLM) to retrieve recipe instructions and then utilizes a fine-tuned GPT-3 to convert them into a task tree, capturing sequential and parallel dependencies among subtasks. The pipeline then mitigates the uncertainty and unreliable features of LLM outputs using task tree retrieval. We combine multiple LLM task tree outputs into a graph and perform a task tree retrieval to avoid questionable nodes and high-cost nodes to improve planning correctness and improve execution efficiency. Our evaluation results show its superior performance compared to previous works in task planning accuracy and efficiency.
Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture
results: On the CHiME-7 EVAL set, NSD-MS2S achieves a macro diarization error rate (DER) of 15.9%, a relative improvement of 49% over the official baseline system, and was the key technique behind the best performance on the main track of the CHiME-7 DASR Challenge. In addition, a deep interactive module (DIM) is introduced to retrieve cleaner and more discriminative multi-speaker embeddings, enabling the current model to outperform the system used in the challenge.
Abstract
We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance. Next, we further decrease the memory occupation of decoding by incorporating input features fusion and then employ a multi-head attention mechanism to capture features at different levels. NSD-MS2S achieved a macro diarization error rate (DER) of 15.9% on the CHiME-7 EVAL set, which signifies a relative improvement of 49% over the official baseline system, and is the key technique for us to achieve the best performance for the main track of CHiME-7 DASR Challenge. Additionally, we introduce a deep interactive module (DIM) in MA-MSE module to better retrieve a cleaner and more discriminative multi-speaker embedding, enabling the current model to outperform the system we used in the CHiME-7 DASR Challenge. Our code will be available at https://github.com/liyunlongaaa/NSD-MS2S.
Syntax Tree Constrained Graph Network for Visual Question Answering
methods: The study proposes a novel Syntax Tree Constrained Graph Network (STCGN) model based on entity message passing and syntax trees. The model extracts syntax trees from questions to obtain more precise syntactic information and captures more precise entity features through a message-passing mechanism.
results: Extensive experiments on the VQA2.0 dataset demonstrate the superiority of the proposed model.
Abstract
Visual Question Answering (VQA) aims to automatically answer natural language questions related to given image content. Existing VQA methods integrate vision modeling and language understanding to explore the deep semantics of the question. However, these methods ignore the significant syntax information of the question, which plays a vital role in understanding the essential semantics of the question and guiding the visual feature refinement. To fill the gap, we suggested a novel Syntax Tree Constrained Graph Network (STCGN) for VQA based on entity message passing and syntax tree. This model is able to extract a syntax tree from questions and obtain more precise syntax information. Specifically, we parse questions and obtain the question syntax tree using the Stanford syntax parsing tool. From the word level and phrase level, syntactic phrase features and question features are extracted using a hierarchical tree convolutional network. We then design a message-passing mechanism for phrase-aware visual entities and capture entity features according to a given visual context. Extensive experiments on VQA2.0 datasets demonstrate the superiority of our proposed model.
Imbalanced Data Stream Classification using Dynamic Ensemble Selection
results: Experimental results show that combining data pre-processing with dynamic ensemble selection improves classification accuracy on imbalanced data streams.
Abstract
Modern streaming data categorization faces significant challenges from concept drift and class imbalanced data. This negatively impacts the output of the classifier, leading to improper classification. Furthermore, other factors such as the overlapping of multiple classes limit the extent of the correctness of the output. This work proposes a novel framework for integrating data pre-processing and dynamic ensemble selection, by formulating the classification framework for the nonstationary drifting imbalanced data stream, which employs the data pre-processing and dynamic ensemble selection techniques. The proposed framework was evaluated using six artificially generated data streams with differing imbalance ratios in combination with two different types of concept drifts. Each stream is composed of 200 chunks of 500 objects described by eight features and contains five concept drifts. Seven pre-processing techniques and two dynamic ensemble selection methods were considered. According to experimental results, data pre-processing combined with Dynamic Ensemble Selection techniques significantly delivers more accuracy when dealing with imbalanced data streams.
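A chunk-by-chunk sketch of such a framework, pairing an oversampling pre-processor with dynamic ensemble selection, is shown below. It assumes the imbalanced-learn and DESlib libraries; SMOTE and KNORA-E are concrete examples of the two technique families, and whether they match the paper's exact seven pre-processors and two DES methods is an assumption.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from deslib.des.knora_e import KNORAE

rng = np.random.default_rng(0)

def make_chunk(n=500, imbalance=0.1):
    """Toy imbalanced chunk: 8 features, minority class with probability `imbalance`."""
    X = rng.standard_normal((n, 8))
    y = (rng.random(n) < imbalance).astype(int)
    X[y == 1] += 1.5  # shift the minority class so it is learnable
    return X, y

X_train, y_train = make_chunk()
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)  # pre-processing

pool = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0)
pool.fit(X_res, y_res)

des = KNORAE(pool)       # dynamic ensemble selection over the trained pool
des.fit(X_res, y_res)    # DSEL set; reused here for brevity
X_test, y_test = make_chunk()
print("chunk accuracy:", des.score(X_test, y_test))
```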
Can Large Language Models Understand Real-World Complex Instructions?
results: Extensive experiments show that representative Chinese-oriented and English-oriented models perform poorly at following complex instructions; typical failures include ignoring semantic constraints, generating incorrect formats, violating length or sample-count constraints, and being unfaithful to the input text.
Abstract
Large language models (LLMs) can understand human instructions, showing their potential for pragmatic applications beyond traditional NLP tasks. However, they still struggle with complex instructions, which can be either complex task descriptions that require multiple tasks and constraints, or complex input that contains long context, noise, heterogeneous information and multi-turn format. Due to these features, LLMs often ignore semantic constraints from task descriptions, generate incorrect formats, violate length or sample count constraints, and be unfaithful to the input text. Existing benchmarks are insufficient to assess LLMs' ability to understand complex instructions, as they are close-ended and simple. To bridge this gap, we propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically. We design eight features for complex instructions and construct a comprehensive evaluation dataset from real-world scenarios. We also establish four criteria and develop corresponding metrics, as current ones are inadequate, biased or too strict and coarse-grained. We compare the performance of representative Chinese-oriented and English-oriented models in following complex instructions through extensive experiments. Resources of CELLO are publicly available at https://github.com/Abbey4799/CELLO.
Enhancing Quantised End-to-End ASR Models via Personalisation
results: On the LibriSpeech and TED-LIUM 3 corpora, PQM achieves 15.1% and 23.3% relative word error rate (WER) reductions on the quantised models compared with the original full-precision models. Moreover, with only 1% additional speaker-specific parameters, PQM attains a 7x reduction in model size.
Abstract
Recent end-to-end automatic speech recognition (ASR) models have become increasingly larger, making them particularly challenging to be deployed on resource-constrained devices. Model quantisation is an effective solution that sometimes causes the word error rate (WER) to increase. In this paper, a novel strategy of personalisation for a quantised model (PQM) is proposed, which combines speaker adaptive training (SAT) with model quantisation to improve the performance of heavily compressed models. Specifically, PQM uses a 4-bit NormalFloat Quantisation (NF4) approach for model quantisation and low-rank adaptation (LoRA) for SAT. Experiments have been performed on the LibriSpeech and the TED-LIUM 3 corpora. Remarkably, with a 7x reduction in model size and 1% additional speaker-specific parameters, 15.1% and 23.3% relative WER reductions were achieved on quantised Whisper and Conformer-based attention-based encoder-decoder ASR models respectively, comparing to the original full precision models.
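The two PQM ingredients map naturally onto widely used open-source APIs: 4-bit NF4 loading via bitsandbytes through Hugging Face transformers, and low-rank adapters via peft. The sketch below illustrates that combination on a Whisper checkpoint; the model name, target modules, and LoRA hyperparameters are placeholders, and this is not the paper's implementation.

```python
import torch
from transformers import BitsAndBytesConfig, WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

# 4-bit NormalFloat (NF4) quantisation of the base ASR model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-small",  # placeholder checkpoint
    quantization_config=bnb_config,
)

# Low-rank adaptation (LoRA) holds the small speaker-specific parameter set
# trained during speaker adaptive training (real setups typically also call
# peft's prepare_model_for_kbit_training before fine-tuning).
lora_config = LoraConfig(
    r=8, lora_alpha=16,                   # illustrative rank/scaling
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights train
```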
ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing
paper_authors: Ian Arawjo, Chelse Swoopes, Priyan Vaithilingam, Martin Wattenberg, Elena Glassman
for: This paper presents a toolkit for evaluating the outputs of large language models (LLMs).
methods: The paper introduces an open-source visual toolkit for comparing responses across models and prompt variations.
results: Studies found that a range of people could use the toolkit to investigate hypotheses that matter to them, including in real-world settings.
Abstract
Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains, or are closed-source. We present ChainForge, an open-source visual toolkit for prompt engineering and on-demand hypothesis testing of text generation LLMs. ChainForge provides a graphical interface for comparison of responses across models and prompt variations. Our system was designed to support three tasks: model selection, prompt template design, and hypothesis testing (e.g., auditing). We released ChainForge early in its development and iterated on its design with academics and online users. Through in-lab and interview studies, we find that a range of people could use ChainForge to investigate hypotheses that matter to them, including in real-world settings. We identify three modes of prompt engineering and LLM hypothesis testing: opportunistic exploration, limited evaluation, and iterative refinement.
How much can ChatGPT really help Computational Biologists in Programming?
results: The study finds that ChatGPT can influence computational biology in multiple ways, including code generation, data analysis, and building machine learning models, while also posing potential downsides such as code quality and data privacy concerns.
Abstract
ChatGPT, a recently developed product by OpenAI, is successfully leaving its mark as a multi-purpose natural language based chatbot. In this paper, we are more interested in analyzing its potential in the field of computational biology. A major share of work done by computational biologists these days involves coding up Bioinformatics algorithms, analyzing data, creating pipelining scripts and even machine learning modeling & feature extraction. This paper focuses on the potential influence (both positive and negative) of ChatGPT in the mentioned aspects with illustrative examples from different perspectives. Compared to other fields of Computer Science, Computational Biology has: (1) less coding resources, (2) more sensitivity and bias issues (deals with medical data) and (3) more necessity of coding assistance (people from diverse background come to this field). Keeping such issues in mind, we cover use cases such as code writing, reviewing, debugging, converting, refactoring and pipelining using ChatGPT from the perspective of computational biologists in this paper.
Using Reinforcement Learning to Simplify Mealtime Insulin Dosing for People with Type 1 Diabetes: In-Silico Experiments
results: The results show that the proposed RL approach outperforms a baseline run-to-run approach in 26-week scenarios and can replace the standard carbohydrate counting (CC) approach, simplifying treatment and improving quality of life and glycemic outcomes. Specifically, after 26 weeks, time-in-range (70-180 mg/dL) and time-in-hypoglycemia (<70 mg/dL) were 73.1±11.6% and 2.0±1.8% using the RL-optimized QM strategy, compared to 70.6±14.8% and 1.5±1.5% using CC.
Abstract
People with type 1 diabetes (T1D) struggle to calculate the optimal insulin dose at mealtime, especially when under multiple daily injections (MDI) therapy. Effectively, they will not always perform rigorous and precise calculations, but occasionally, they might rely on intuition and previous experience. Reinforcement learning (RL) has shown outstanding results in outperforming humans on tasks requiring intuition and learning from experience. In this work, we propose an RL agent that recommends the optimal meal-accompanying insulin dose corresponding to a qualitative meal (QM) strategy that does not require precise carbohydrate counting (CC) (e.g., a usual meal at noon). The agent is trained using the soft actor-critic approach and comprises long short-term memory (LSTM) neurons. For training, eighty virtual subjects (VS) of the FDA-accepted UVA/Padova T1D adult population were simulated using MDI therapy and QM strategy. For validation, the remaining twenty VS were examined in 26-week scenarios, including intra- and inter-day variabilities in glucose. In-silico results showed that the proposed RL approach outperforms a baseline run-to-run approach and can replace the standard CC approach. Specifically, after 26 weeks, the time-in-range (70-180 mg/dL) and time-in-hypoglycemia (<70 mg/dL) were 73.1±11.6% and 2.0±1.8% using the RL-optimized QM strategy compared to 70.6±14.8% and 1.5±1.5% using CC. Such an approach can simplify diabetes treatment, resulting in improved quality of life and glycemic outcomes.
Conditional Mutual Information Constrained Deep Learning for Classification
results: A novel alternating learning algorithm is proposed to solve the NCMI-constrained optimization problem of CMIC-DL. Experimental results show that DNNs trained within CMIC-DL outperform models trained within the standard DL framework and with other loss functions in the literature in terms of both accuracy and robustness against adversarial attacks. In addition, visualizing the evolution of the learning process through the lens of CMI and NCMI offers further insight into DNN training.
Abstract
The concepts of conditional mutual information (CMI) and normalized conditional mutual information (NCMI) are introduced to measure the concentration and separation performance of a classification deep neural network (DNN) in the output probability distribution space of the DNN, where CMI and the ratio between CMI and NCMI represent the intra-class concentration and inter-class separation of the DNN, respectively. By using NCMI to evaluate popular DNNs pretrained over ImageNet in the literature, it is shown that their validation accuracies over ImageNet validation data set are more or less inversely proportional to their NCMI values. Based on this observation, the standard deep learning (DL) framework is further modified to minimize the standard cross entropy function subject to an NCMI constraint, yielding CMI constrained deep learning (CMIC-DL). A novel alternating learning algorithm is proposed to solve such a constrained optimization problem. Extensive experiment results show that DNNs trained within CMIC-DL outperform the state-of-the-art models trained within the standard DL and other loss functions in the literature in terms of both accuracy and robustness against adversarial attacks. In addition, visualizing the evolution of learning process through the lens of CMI and NCMI is also advocated.
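A toy reading of the concentration/separation idea: measure intra-class concentration as the average divergence of each sample's output distribution from its class-mean output, and inter-class separation as the average divergence between class-mean outputs, then take their ratio. The numpy sketch below implements exactly this toy statistic; it does not reproduce the paper's precise CMI/NCMI definitions.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    p, q = np.clip(p, eps, 1.0), np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def concentration_separation_ratio(probs, labels):
    """probs: (N, C) softmax outputs; labels: (N,) class ids.

    Intra-class concentration: mean KL from each sample's output distribution
    to its class-mean output. Inter-class separation: mean pairwise KL between
    class-mean outputs. A smaller ratio means tighter, better-separated clusters.
    """
    classes = np.unique(labels)
    centroids = np.stack([probs[labels == c].mean(axis=0) for c in classes])
    concentration = np.mean([kl(probs[labels == c], centroids[i]).mean()
                             for i, c in enumerate(classes)])
    separation = np.mean([kl(centroids[i], centroids[j])
                          for i in range(len(classes))
                          for j in range(len(classes)) if i != j])
    return concentration / separation

rng = np.random.default_rng(0)
y = rng.integers(0, 10, 1000)
logits = rng.standard_normal((1000, 10)) + 3.0 * np.eye(10)[y]  # class-aligned bumps
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(f"ratio: {concentration_separation_ratio(probs, y):.3f}")
```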
results: Experiments on two publicly available datasets (ImageNet-100 and CUB-200) show that the proposed method outperforms baseline methods.
Abstract
This work addresses the task of class-incremental weakly supervised object localization (CI-WSOL). The goal is to incrementally learn object localization for novel classes using only image-level annotations while retaining the ability to localize previously learned classes. This task is important because annotating bounding boxes for every new incoming data is expensive, although object localization is crucial in various applications. To the best of our knowledge, we are the first to address this task. Thus, we first present a strong baseline method for CI-WSOL by adapting the strategies of class-incremental classifiers to mitigate catastrophic forgetting. These strategies include applying knowledge distillation, maintaining a small data set from previous tasks, and using cosine normalization. We then propose the feature drift compensation network to compensate for the effects of feature drifts on class scores and localization maps. Since updating network parameters to learn new tasks causes feature drifts, compensating for the final outputs is necessary. Finally, we evaluate our proposed method by conducting experiments on two publicly available datasets (ImageNet-100 and CUB-200). The experimental results demonstrate that the proposed method outperforms other baseline methods.
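Two of the baseline's anti-forgetting ingredients, cosine normalization of the classifier and knowledge distillation from the previous-task model, can be sketched generically in PyTorch. The shapes, temperature, and scale factor below are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Logits as scaled cosine similarity between features and class weights."""

    def __init__(self, feat_dim: int, num_classes: int, scale: float = 10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.scale = scale

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.scale * F.linear(F.normalize(feats, dim=1),
                                     F.normalize(self.weight, dim=1))

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """KL between softened teacher and student predictions on the old classes."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T

feats = torch.randn(16, 512)  # backbone features for one batch
old_head, new_head = CosineClassifier(512, 100), CosineClassifier(512, 110)
# Distill the frozen old head's outputs into the first 100 logits of the new head.
loss = distillation_loss(new_head(feats)[:, :100], old_head(feats).detach())
```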
Public Perceptions of Gender Bias in Large Language Models: Cases of ChatGPT and Ernie
paper_authors: Kyrie Zhixuan Zhou, Madelyn Rose Sanfilippo
for: This paper aims to investigate and analyze public perceptions of gender bias in large language models (LLMs) trained in different cultural contexts.
methods: The authors conducted a content analysis of social media discussions to gather data on people’s observations of gender bias in their personal use of LLMs and scientific findings about gender bias in LLMs.
results: The study found that ChatGPT, a US-based LLM, exhibited more implicit gender bias, while Ernie, a China-based LLM, showed more explicit gender bias. The findings suggest that culture plays a significant role in shaping gender bias in LLMs, and the paper proposes governance recommendations to regulate gender bias in these models.
Abstract
Large language models are quickly gaining momentum, yet are found to demonstrate gender bias in their responses. In this paper, we conducted a content analysis of social media discussions to gauge public perceptions of gender bias in LLMs which are trained in different cultural contexts, i.e., ChatGPT, a US-based LLM, or Ernie, a China-based LLM. People shared both observations of gender bias in their personal use and scientific findings about gender bias in LLMs. A difference between the two LLMs was seen -- ChatGPT was more often found to carry implicit gender bias, e.g., associating men and women with different profession titles, while explicit gender bias was found in Ernie's responses, e.g., overly promoting women's pursuit of marriage over career. Based on the findings, we reflect on the impact of culture on gender bias and propose governance recommendations to regulate gender bias in LLMs.
Uncertainty-aware 3D Object-Level Mapping with Deep Shape Priors
paper_authors: Ziwei Liao, Jun Yang, Jingxing Qian, Angela P. Schoellig, Steven L. Waslander
for: This paper aims to propose a high-quality object-level mapping method that accurately reconstructs unknown objects encountered during exploration.
methods: The method takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses (including 3 scale parameters) for detected objects. It leverages a learnt generative model over shape categories as a prior and formulates a probabilistic, uncertainty-aware optimization framework for 3D reconstruction.
results: Experiments show substantial improvements over existing methods, and the resulting shape and pose uncertainties are shown to be useful for downstream robotics tasks such as active vision.
Abstract
3D object-level mapping is a fundamental problem in robotics, which is especially challenging when object CAD models are unavailable during inference. In this work, we propose a framework that can reconstruct high-quality object-level maps for unknown objects. Our approach takes multiple RGB-D images as input and outputs dense 3D shapes and 9-DoF poses (including 3 scale parameters) for detected objects. The core idea of our approach is to leverage a learnt generative model for shape categories as a prior and to formulate a probabilistic, uncertainty-aware optimization framework for 3D reconstruction. We derive a probabilistic formulation that propagates shape and pose uncertainty through two novel loss functions. Unlike current state-of-the-art approaches, we explicitly model the uncertainty of the object shapes and poses during our optimization, resulting in a high-quality object-level mapping system. Moreover, the resulting shape and pose uncertainties, which we demonstrate can accurately reflect the true errors of our object maps, can also be useful for downstream robotics tasks such as active vision. We perform extensive evaluations on indoor and outdoor real-world datasets, achieving substantial improvements over state-of-the-art methods. Our code will be available at https://github.com/TRAILab/UncertainShapePose.
Contrastive Decoding Improves Reasoning in Large Language Models
results: The study finds that Contrastive Decoding substantially outperforms greedy decoding on a variety of reasoning tasks, with strong results on the HellaSwag commonsense reasoning benchmark and the GSM8K math word reasoning benchmark.
Abstract
We demonstrate that Contrastive Decoding -- a simple, computationally light, and training-free text generation method proposed by Li et al. (2022) -- achieves large out-of-the-box improvements over greedy decoding on a variety of reasoning tasks. Originally shown to improve the perceived quality of long-form text generation, Contrastive Decoding searches for strings that maximize a weighted difference in likelihood between strong and weak models. We show that Contrastive Decoding leads LLaMA-65B to outperform LLaMA 2, GPT-3.5 and PaLM 2-L on the HellaSwag commonsense reasoning benchmark, and to outperform LLaMA 2, GPT-3.5 and PaLM-540B on the GSM8K math word reasoning benchmark, in addition to improvements on a collection of other tasks. Analysis suggests that Contrastive Decoding improves over existing methods by preventing some abstract reasoning errors, as well as by avoiding simpler modes such as copying sections of the input during chain-of-thought. Overall, Contrastive Decoding outperforms nucleus sampling for long-form generation and greedy decoding for reasoning tasks, making it a powerful general purpose method for generating text from language models.
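The core scoring rule is simple enough to sketch at a single decoding step: restrict to tokens the strong ("expert") model finds plausible, then rank them by the expert-minus-amateur log-probability difference, following Li et al.'s formulation. The toy vocabulary and the alpha value below are illustrative.

```python
import numpy as np

def contrastive_step(expert_logprobs: np.ndarray,
                     amateur_logprobs: np.ndarray,
                     alpha: float = 0.1) -> int:
    """Pick the next token by expert-minus-amateur log-probability.

    Only tokens with expert probability >= alpha * max expert probability are
    eligible (the plausibility constraint), which blocks low-quality tokens
    that the raw difference alone might promote.
    """
    expert_probs = np.exp(expert_logprobs)
    plausible = expert_probs >= alpha * expert_probs.max()
    scores = np.where(plausible, expert_logprobs - amateur_logprobs, -np.inf)
    return int(np.argmax(scores))

vocab = ["Paris", "France", "the", "banana"]
expert = np.log(np.array([0.55, 0.30, 0.10, 0.05]))
amateur = np.log(np.array([0.30, 0.10, 0.40, 0.20]))
print(vocab[contrastive_step(expert, amateur)])  # "France": largest expert/amateur gap
```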