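paper_authors: Vullnet Useini, Stephanie Tanadini-Lang, Quentin Lohmeyer, Mirko Meboldt, Nicolaus Andratschke, Ralph P. Braun, Javier Barranco García
for: This study aims to improve the accuracy and efficiency of skin cancer screening and to help dermatologists identify suspicious lesions.
methods: The study uses an artificial intelligence (AI) decision support tool. The tool applies a state-of-the-art object detection algorithm to identify and extract all skin lesions from patient images, then uses a self-supervised AI algorithm to rank the lesions by suspiciousness.
results: The top-10 AI-identified ugly ducklings (UDs) reached an average sensitivity of 93% on lesions selected by the majority of experts. With AI assistance, dermatologists' confidence increased and the average majority agreement with the top-10 AI-identified UDs improved to 100%.
Abstract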
The incidence rates of melanoma, the deadliest form of skin cancer, have been increasing steadily worldwide, presenting a significant challenge to dermatologists. Early detection of melanoma is crucial for improving patient survival rates. However, identifying suspicious lesions through ugly duckling (UD) screening, the current method used for skin cancer screening, can be challenging and often requires expertise in pigmented lesions. To address these challenges and improve patient outcomes, an artificial intelligence (AI) decision support tool was developed to assist dermatologists in identifying UDs from wide-field patient images. The tool uses a state-of-the-art object detection algorithm to identify and extract all skin lesions from patient images, which are then sorted by suspiciousness using a self-supervised AI algorithm. A clinical validation study was conducted to evaluate the tool's performance. It demonstrated an average sensitivity of 93% for the top-10 AI-identified UDs on skin lesions selected by the majority of experts in pigmented skin lesions. The study also found that dermatologists' confidence increased, and that the average majority agreement with the top-10 AI-identified UDs improved to 100% when assisted by AI. The development of this AI decision support tool aims to address the shortage of specialists, enable at-risk patients to receive faster consultations, and help understand the impact of AI-assisted screening. The tool's automation can assist dermatologists in identifying suspicious lesions and provide a more objective assessment, reducing subjectivity in the screening process. Future steps for this project include expanding the dataset to include histologically confirmed melanoma cases and increasing the number of participants in the clinical validation, to strengthen the tool's reliability and adapt it for real-world consultations.
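The following minimal Python sketch makes the evaluation metric concrete. It ranks lesions by an ugly-duckling score and computes top-k sensitivity.

It assumes each lesion has already been detected and embedded as a feature vector. A lesion's score is its distance from the patient's mean embedding. This distance score is a hypothetical stand-in for the paper's self-supervised ranking, and `top_k_sensitivity` is one plausible reading of the reported metric.

```python
import math


def ugly_duckling_scores(embeddings):
    """Score each lesion by its Euclidean distance from the patient's mean lesion embedding.

    Hypothetical stand-in for the paper's self-supervised ranking: a lesion that
    looks unlike the patient's other lesions (an "ugly duckling") scores higher.
    """
    n, dim = len(embeddings), len(embeddings[0])
    mean = [sum(e[d] for e in embeddings) / n for d in range(dim)]
    return [math.dist(e, mean) for e in embeddings]


def rank_by_suspiciousness(scores):
    """Lesion indices sorted from most to least suspicious."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def top_k_sensitivity(ranked, expert_selected, k=10):
    """Fraction of expert-selected lesions that appear among the top-k AI-ranked lesions."""
    hits = len(set(ranked[:k]) & set(expert_selected))
    return hits / len(expert_selected)


# Toy 2-D embeddings for four lesions; lesion 2 is the outlier.
lesions = [[0.10, 0.20], [0.10, 0.25], [0.90, 0.80], [0.12, 0.20]]
ranking = rank_by_suspiciousness(ugly_duckling_scores(lesions))
print(ranking[0])  # 2
print(top_k_sensitivity(ranking, expert_selected=[2, 3], k=2))  # 0.5
```

In the toy example, the experts flagged lesions 2 and 3, but only lesion 2 is in the AI's top 2, so the top-2 sensitivity is 0.5.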
Dream to Adapt: Meta Reinforcement Learning by Latent Context Imagination and MDP Imagination
results: Experiments on various benchmarks show that MetaDreamer outperforms existing approaches in data efficiency and interpolated generalization.
Abstract
Meta reinforcement learning (Meta RL) has been widely explored as a way to quickly learn an unseen task by transferring knowledge learned on similar tasks. However, most state-of-the-art algorithms require the meta-training tasks to densely cover the task distribution, and they need a large amount of data for each task. In this paper, we propose MetaDreamer, a context-based Meta RL algorithm that requires fewer real training tasks and less data by performing meta-imagination and MDP-imagination. We perform meta-imagination by interpolating in the learned latent context space, which has disentangled properties. We perform MDP-imagination through a generative world model in which physical knowledge is added to plain VAE networks. Our experiments on various benchmarks show that MetaDreamer outperforms existing approaches in data efficiency and interpolated generalization.
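The meta-imagination step can be pictured with a small Python sketch in which new task contexts are imagined by interpolating between two latent contexts learned from real tasks.

The function name, the per-dimension uniform weights, and the two-context setup are all assumptions for illustration. MetaDreamer's actual interpolation scheme and its MDP-imagination world model are not reproduced here.

```python
import random


def imagine_contexts(context_a, context_b, num, seed=0):
    """Imagine task contexts by interpolating between two learned latent contexts.

    Assumes the latent context is disentangled, so each dimension (for example, one
    physical property of the task) is interpolated independently with its own weight.
    """
    rng = random.Random(seed)
    imagined = []
    for _ in range(num):
        # a + t * (b - a) with t drawn uniformly from [0, 1) for each dimension.
        imagined.append([a + rng.random() * (b - a) for a, b in zip(context_a, context_b)])
    return imagined


# Two hypothetical learned contexts, e.g. (mass, friction) encodings of two real tasks.
for ctx in imagine_contexts([0.0, 1.0], [1.0, 3.0], num=3):
    print(ctx)
```

Every imagined context lies inside the box spanned by the two real contexts. This is why such imagination targets interpolated, rather than extrapolated, generalization.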
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
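results: ICV achieves better performance than standard in-context learning and fine-tuning on diverse tasks, including safety, style transfer, role-playing and formatting. ICV also makes the LLM's behavior easy to control and is more computationally efficient than fine-tuning.
Abstract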
Large language models (LLMs) demonstrate emergent in-context learning capabilities, adapting to new tasks based on example demonstrations. However, in-context learning has shown limited effectiveness in many settings, is difficult to control quantitatively, and takes up context window space. To overcome these limitations, we propose an alternative approach that recasts in-context learning as in-context vectors (ICV). Using ICV involves two steps. First, we run a forward pass on demonstration examples to create the in-context vector from the latent embedding of the LLM; this vector captures essential information about the intended task. Then, on a new query, instead of adding demonstrations to the prompt, we shift the latent states of the LLM using the ICV. The ICV approach has several benefits: 1) it enables the LLM to follow the demonstration examples more effectively; 2) it is easy to control by adjusting the magnitude of the ICV; 3) it shortens the prompt by removing the in-context demonstrations; 4) it is computationally much more efficient than fine-tuning. We demonstrate that ICV achieves better performance than standard in-context learning and fine-tuning on diverse tasks, including safety, style transfer, role-playing and formatting. Moreover, we show that we can flexibly teach the LLM to follow different types of instructions simultaneously through simple vector arithmetic on the corresponding ICVs.
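The following minimal Python sketch shows the two ICV steps, with plain lists standing in for LLM hidden states.

It assumes the ICV is the mean difference between the latent states of the demonstration outputs and inputs. It also assumes steering simply adds `alpha` times the ICV to a query's latent state. The paper's exact construction may differ, for example in which layers are shifted and how the vector is extracted.

```python
def in_context_vector(demo_pairs):
    """Average (output state - input state) over demonstration pairs.

    Each pair holds two latent vectors: hypothetical stand-ins for the LLM's hidden
    states on a demonstration input and on its desired output.
    """
    dim = len(demo_pairs[0][0])
    diffs = [[out[d] - inp[d] for d in range(dim)] for inp, out in demo_pairs]
    return [sum(diff[d] for diff in diffs) / len(diffs) for d in range(dim)]


def steer(hidden, icv, alpha=1.0):
    """Shift a query's latent state along the ICV; alpha sets the steering strength."""
    return [h + alpha * v for h, v in zip(hidden, icv)]


def combine(*icvs):
    """Vector arithmetic on ICVs: add them to follow several instructions at once."""
    return [sum(parts) for parts in zip(*icvs)]


icv = in_context_vector([([0.0, 0.0], [1.0, 2.0]), ([1.0, 1.0], [2.0, 5.0])])
print(icv)                               # [1.0, 3.0]
print(steer([0.0, 0.0], icv, alpha=0.5))  # [0.5, 1.5]
```

Setting `alpha` to 0 recovers the unsteered state. Raising it strengthens the effect, which is the controllability the abstract describes.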
The Pros and Cons of Using Machine Learning and Interpretable Machine Learning Methods in psychiatry detection applications, specifically depression disorder: A Brief Review
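results: The reviewed studies provide useful findings that help psychologists and researchers better understand the advantages and disadvantages of machine learning in diagnosing mental disorders.
Abstract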
The COVID-19 pandemic forced many people to limit their social activities, which has led to a rise in mental illnesses, particularly depression. Machine learning has become increasingly important for diagnosing these illnesses accurately and quickly and for preventing severe outcomes such as suicide. To provide precise and understandable diagnoses that support better treatment, AI scientists and researchers must also develop interpretable AI-based solutions. This article reviews relevant work on machine learning and interpretable AI to clarify the advantages and disadvantages of using AI in psychiatric disorder detection applications.
VT-Former: A Transformer-based Vehicle Trajectory Prediction Approach For Intelligent Highway Transportation Systems
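results: Experiments on three benchmark datasets covering three different viewpoints demonstrate VT-Former's state-of-the-art (SoTA) performance in vehicle trajectory prediction, as well as its generalizability and robustness. The paper also evaluates VT-Former's efficiency on embedded boards and uses vehicle anomaly detection as a sample application to show its broad applicability.
Abstract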
Enhancing roadway safety and traffic management has become an essential focus area for a broad range of modern cyber-physical systems and intelligent transportation systems. Vehicle trajectory prediction is a pivotal element of many highway and road safety applications. Their use cases range from traffic management and accident prevention to enhancing work-zone safety and optimizing energy conservation. Intelligent management in this context has been greatly advanced by developments in artificial intelligence (AI) and by the increasing deployment of surveillance cameras across road networks. In this paper, we introduce VT-Former, a novel transformer-based approach to vehicle trajectory prediction for highway safety and surveillance. In addition to using transformers to capture long-range temporal patterns, we propose a new Graph Attentive Tokenization (GAT) module to capture intricate social interactions among vehicles. Combining these two core components yields a precise approach to vehicle trajectory prediction. Our study on three benchmark datasets with three different viewpoints demonstrates the state-of-the-art (SoTA) performance of VT-Former in vehicle trajectory prediction, as well as its generalizability and robustness. We also evaluate VT-Former's efficiency on embedded boards and explore its potential for vehicle anomaly detection as a sample application, showcasing its broad applicability.
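The idea of turning neighboring vehicles into an attention-weighted token can be sketched as neighbor-masked, scaled dot-product attention over a vehicle graph.

This is a generic graph-attention illustration under assumed inputs: per-vehicle feature vectors, plus an adjacency list that includes each vehicle itself. It uses identity projections and no learned weights, and it is not VT-Former's actual GAT module.

```python
import math


def graph_attention_tokens(features, neighbors):
    """Produce one token per vehicle as an attention-weighted mix of its graph neighbors.

    features: per-vehicle feature vectors (e.g., position and velocity at one time step).
    neighbors: neighbors[i] lists the vehicles adjacent to vehicle i, including i itself.
    """
    dim = len(features[0])
    tokens = []
    for i, nbrs in enumerate(neighbors):
        # Scaled dot-product scores, computed only over the neighbor set (the graph mask).
        scores = [sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(dim)
                  for j in nbrs]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        tokens.append([sum(w * features[j][d] for w, j in zip(weights, nbrs)) / total
                       for d in range(dim)])
    return tokens


# Vehicle 0 sees only itself; vehicle 1 attends to both vehicles.
print(graph_attention_tokens([[1.0, 0.0], [0.0, 1.0]], [[0], [0, 1]]))
```

Restricting the softmax to each vehicle's neighbors is what makes this graph attention rather than full self-attention. Vehicles outside the neighbor list have no influence on the token.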
TrainerAgent: Customizable and Efficient Model Training through LLM-Powered Multi-Agent System
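results: Experiments show that the system reliably produces models that meet the specified requirements. It can also identify and reject unattainable tasks, such as fantastical scenarios or unethical requests, ensuring robustness and safety.
Abstract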
Training AI models has always been challenging, especially when custom models are needed to provide personalized services. Algorithm engineers often face a lengthy process of iteratively developing models tailored to specific business requirements, which is even more difficult for non-experts. The quest for high-quality and efficient model development, along with the emergence of Large Language Model (LLM) agents, has become a key focus in the industry. Leveraging the powerful analytical, planning, and decision-making capabilities of LLMs, we propose TrainerAgent, a multi-agent system comprising Task, Data, Model and Server agents. These agents analyze user-defined tasks, input data, and requirements (e.g., accuracy, speed). They optimize these comprehensively from both the data and model perspectives to obtain satisfactory models, and finally deploy the models as an online service. Experimental evaluations on classical discriminative and generative tasks in the computer vision and natural language processing domains demonstrate that our system consistently produces models that meet the desired criteria. Furthermore, the system can critically identify and reject unattainable tasks, such as fantastical scenarios or unethical requests, ensuring robustness and safety. Compared with traditional model development, this research is a significant advance in obtaining the desired models with greater efficiency and quality. This is enabled by LLM-powered analysis, decision-making, and execution, together with collaboration among the four agents. We anticipate that our work will contribute to research on TrainerAgent in both the academic and industrial communities, potentially establishing it as a new paradigm for model development in AI.
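The division of labor among the four agents can be sketched as a sequential Python pipeline.

Every class here is a hypothetical stub. In TrainerAgent each agent is driven by an LLM, whereas this sketch only records each stage. It uses a simple rule, an accuracy target outside [0, 1], as a stand-in for rejecting unattainable requests.

```python
from dataclasses import dataclass, field


@dataclass
class TaskSpec:
    description: str
    target_accuracy: float  # requested accuracy in [0, 1]
    notes: list = field(default_factory=list)  # trace of the stages the task passed


class TaskAgent:
    """Vets the request; rejects unattainable requirements (hypothetical rule)."""

    def analyze(self, spec):
        if not 0.0 <= spec.target_accuracy <= 1.0:
            raise ValueError("unattainable requirement: accuracy must lie in [0, 1]")
        spec.notes.append("task analyzed")
        return spec


class DataAgent:
    def prepare(self, spec):
        spec.notes.append("data prepared")
        return spec


class ModelAgent:
    def train(self, spec):
        spec.notes.append("model trained")
        return spec


class ServerAgent:
    def deploy(self, spec):
        spec.notes.append("model deployed")
        return spec


def run_pipeline(spec):
    """Hand a task through the Task -> Data -> Model -> Server agents in order."""
    spec = TaskAgent().analyze(spec)
    spec = DataAgent().prepare(spec)
    spec = ModelAgent().train(spec)
    return ServerAgent().deploy(spec)


print(run_pipeline(TaskSpec("image classification", target_accuracy=0.9)).notes)
```

Placing the feasibility check in the first agent means an impossible request is rejected before any data or compute is spent on it.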
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
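results: Extensive testing on more than 16 distinct datasets shows that Monkey achieves consistently competitive performance on fundamental tasks such as image captioning, general visual question answering (VQA) and document-oriented VQA.
Abstract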
Large Multimodal Models (LMMs) have demonstrated impressive capabilities on general vision-language tasks. However, they are limited by the input resolution they support (e.g., 448 x 448) and by insufficiently detailed descriptions in their training image-text pairs. As a result, these models often struggle with intricate scene understanding and narratives. We address this problem by proposing Monkey. Our contributions are two-fold: 1) without pretraining from scratch, our method can be built on an existing vision encoder (e.g., vit-BigHuge) to effectively raise the input resolution capacity to 896 x 1344 pixels; 2) we propose a multi-level description generation method that automatically provides rich information to guide the model in learning the contextual association between scenes and objects. Our extensive testing across more than 16 distinct datasets reveals that Monkey achieves consistently competitive performance compared with existing LMMs on fundamental tasks, such as Image Captioning, General Visual Question Answering (VQA), and Document-oriented VQA. Models, an interactive demo, and the source code are available at https://github.com/Yuliang-Liu/Monkey.
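One common way to feed an 896 x 1344 image to an encoder that accepts 448 x 448 inputs is to cut it into fixed-size windows.

The Python sketch below assumes non-overlapping 448 x 448 tiles and an image 1344 pixels wide and 896 tall. This gives a 3 x 2 grid of six tiles. Monkey's actual patching scheme, and any additional global view of the image, are not shown.

```python
def tile_grid(width, height, tile=448):
    """Split an image into non-overlapping tile x tile windows for a fixed-resolution encoder.

    Returns (left, top, right, bottom) pixel boxes in row-major order.
    Assumes width and height are exact multiples of the tile size.
    """
    if width % tile or height % tile:
        raise ValueError("width and height must be multiples of the tile size")
    return [(x, y, x + tile, y + tile)
            for y in range(0, height, tile)
            for x in range(0, width, tile)]


boxes = tile_grid(1344, 896)
print(len(boxes))  # 6
print(boxes[-1])   # (896, 448, 1344, 896)
```

Each tile then fits the encoder's native input size. This is how a fixed-resolution encoder can cover a larger image without being retrained from scratch.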
Understanding Grokking Through A Robustness Viewpoint
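results: The $l_2$ weight norm is shown to be a sufficient condition for grokking, but it does not correlate with grokking on the test data in a timely way. New metrics based on robustness and information theory correlate well with the grokking phenomenon. The proposed methods also speed up generalization, and learning the commutative law partially explains the speedup.
Abstract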
Recently, an unusual phenomenon called grokking has gained much attention: a neural network sometimes generalizes long after it has perfectly fit the training data. We try to understand this seemingly strange phenomenon through the robustness of the neural network. From this robustness viewpoint, we show that the popular $l_2$ weight norm (metric) of the neural network is actually a sufficient condition for grokking. We also find empirically that the $l_2$ norm does not correlate with grokking on the test data in a timely way. We therefore propose new metrics based on robustness and information theory, and find that they correlate well with the grokking phenomenon. Based on these observations, we propose methods to speed up the generalization process. In addition, we examine the standard training process on the modular addition dataset. We find that it hardly learns other basic group operations, including the commutative law, before grokking. Interestingly, the speedup in generalization obtained with our proposed method can be partially explained by learning the commutative law, a necessary condition for the model to grok on the test dataset.
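The modular addition task and two of the quantities discussed above can be made concrete in a few lines of Python.

The helper names are hypothetical:

- `commutativity_rate` measures how often a predictor gives the same answer for (a, b) and (b, a).
- `l2_weight_norm` flattens all parameters into one vector and takes its norm.

Neither is the paper's robustness or information-theoretic metric.

```python
import math


def modular_addition_dataset(p):
    """All (a, b, (a + b) mod p) triples: the standard grokking toy task."""
    return [(a, b, (a + b) % p) for a in range(p) for b in range(p)]


def commutativity_rate(predict, p):
    """Fraction of pairs a < b on which predict(a, b) == predict(b, a)."""
    pairs = [(a, b) for a in range(p) for b in range(a + 1, p)]
    return sum(predict(a, b) == predict(b, a) for a, b in pairs) / len(pairs)


def l2_weight_norm(weights):
    """Euclidean norm of all parameters, given as a list of flat per-layer lists."""
    return math.sqrt(sum(w * w for layer in weights for w in layer))


p = 7
print(len(modular_addition_dataset(p)))                 # 49
print(commutativity_rate(lambda a, b: (a + b) % p, p))  # 1.0 for the true rule
print(commutativity_rate(lambda a, b: a, p))            # 0.0 for a rule that ignores b
print(l2_weight_norm([[3.0], [4.0]]))                   # 5.0
```

Tracking a quantity like `commutativity_rate` during training is one way to check whether a model has picked up the commutative law before or after it groks.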
An Intelligent Social Learning-based Optimization Strategy for Black-box Robotic Control with Reinforcement Learning
paper_authors: Xubo Yang, Jian Gao, Ting Wang, Yaozhen He
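for: This paper proposes a social-learning-based intelligent control algorithm for controlling robots that are black-box systems.
methods: The paper uses an algorithm called Intelligent Social Learning (ISL), which comprises three styles: learning, imitation, and self-study.
results: Experiments show that ISL outperforms four state-of-the-art methods on six continuous control benchmark cases, with faster computation, fewer hyperparameters and higher stability. ISL also yields satisfactory solutions in simulated and experimental grasping tasks.
Abstract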
Implementing intelligent control of robots is difficult, especially for complex black-box systems, because of the lack of visibility into how these robots work internally. This paper proposes an Intelligent Social Learning (ISL) algorithm to enable intelligent control of black-box robotic systems. Inspired by mutual learning among individuals in human social groups, ISL includes learning, imitation, and self-study styles. Individuals in the learning style use the Lévy flight search strategy to learn from the best performer and form the closest relationships. In the imitation style, individuals mimic the best performer with a second-level rapport by employing a random perturbation strategy. In the self-study style, individuals learn independently using a normal distribution sampling method while maintaining a distant relationship with the best performer. In each style, individuals in the population are regarded as autonomous intelligent agents. Neural networks perform strategic actions in the three styles to interact with the environment and the robot, iteratively optimizing the network policy. Overall, ISL builds on the principles of intelligent optimization and incorporates ideas from reinforcement learning. It offers strong search capabilities, fast computation, few hyperparameters, and insensitivity to sparse rewards. The proposed ISL algorithm is compared with four state-of-the-art methods on six continuous control benchmark cases in MuJoCo to verify its effectiveness and advantages. Furthermore, ISL is applied to simulated and experimental grasping tasks with the UR3 robot for validation, and it yields satisfactory solutions.
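The three ISL styles can be illustrated on a toy minimization problem in Python. The sketch makes these assumptions, none of which are the paper's settings:

- A real-valued search vector replaces the neural-network policy.
- The population is split evenly across the three styles.
- Lévy steps are drawn with Mantegna's algorithm.
- A candidate is kept only if it improves on the individual's current position.
- The step sizes are arbitrary.

```python
import math
import random


def levy_step(beta, rng):
    """One Lévy-distributed step length, drawn with Mantegna's algorithm."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.gauss(0.0, sigma)
    v = max(abs(rng.gauss(0.0, 1.0)), 1e-12)  # guard against division by zero
    return u / v ** (1 / beta)


def isl_minimize(f, dim, pop=12, iters=100, seed=0):
    """Minimize f with three social-learning styles; returns (best_x, best_value_history)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop)]
    fs = [f(x) for x in xs]
    history = [min(fs)]
    for _ in range(iters):
        best = xs[min(range(pop), key=fs.__getitem__)]
        for i in range(pop):
            style = i % 3  # split the population evenly across the three styles
            if style == 0:  # learning: Lévy flight toward the best performer
                cand = [x + 0.5 * levy_step(1.5, rng) * (b - x) for x, b in zip(xs[i], best)]
            elif style == 1:  # imitation: copy the best performer plus a small random perturbation
                cand = [b + rng.gauss(0.0, 0.1) for b in best]
            else:  # self-study: sample independently around one's own position
                cand = [rng.gauss(x, 1.0) for x in xs[i]]
            fc = f(cand)
            if fc < fs[i]:  # greedy replacement keeps each individual's best so far
                xs[i], fs[i] = cand, fc
        history.append(min(fs))
    return xs[min(range(pop), key=fs.__getitem__)], history


def sphere(x):
    return sum(v * v for v in x)


best_x, history = isl_minimize(sphere, dim=3, iters=60)
print(history[0], history[-1])
```

Because each individual only accepts improvements, the best objective value in `history` never increases.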
SCADI: Self-supervised Causal Disentanglement in Latent Variable Models
results: SCADI can discover semantic factors and learn their causal relationships automatically, without any supervision or labeled data, and it can produce readable causal graphs.
Abstract
Causal disentanglement has great potential for capturing complex situations, but practical and efficient approaches are lacking. It is known that most unsupervised disentangling methods cannot produce identifiable results without additional information, and they often yield randomly disentangled output. Consequently, most existing disentangling models are weakly supervised: they require information about intrinsic factors, which incurs excessive costs. We therefore propose a novel model, SCADI (SElf-supervised CAusal DIsentanglement), that discovers semantic factors and learns their causal relationships without any supervision. SCADI combines a masked structural causal model (SCM) with a pseudo-label generator for causal disentanglement, aiming to open a new direction for self-supervised causal disentanglement models.
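A structural causal model over latent factors can be sketched as a linear SCM, z = A^T z + eps, evaluated in topological order.

Here the mask simply zeroes chosen edges of the adjacency matrix. This linear form and this interpretation of the mask are assumptions for illustration. SCADI's pseudo-label generator is not shown.

```python
def linear_scm(eps, adjacency, mask=None):
    """Compute latent factors z = A^T z + eps for a DAG over topologically ordered factors.

    adjacency[i][j] is the weight of the causal edge i -> j (edges only go from lower to
    higher index). mask[i][j] = 0 removes that edge; mask=None keeps every edge.
    """
    n = len(eps)
    z = [0.0] * n
    for j in range(n):
        total = eps[j]  # exogenous noise for factor j
        for i in range(j):  # only earlier factors can be parents of factor j
            weight = adjacency[i][j] * (mask[i][j] if mask else 1)
            total += weight * z[i]
        z[j] = total
    return z


A = [[0, 2, 0],
     [0, 0, 3],
     [0, 0, 0]]  # chain: z0 -> z1 (weight 2) -> z2 (weight 3)
print(linear_scm([1.0, 0.0, 0.0], A))  # [1.0, 2.0, 6.0]
mask = [[1, 1, 1], [1, 1, 0], [1, 1, 1]]  # cut the z1 -> z2 edge
print(linear_scm([1.0, 0.0, 0.0], A, mask))  # [1.0, 2.0, 0.0]
```

Masking an edge stops an upstream change from propagating to its descendants, which is what makes the learned factors causally, not merely statistically, related.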
Heuristics-Driven Link-of-Analogy Prompting: Enhancing Large Language Models for Document-Level Event Argument Extraction
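results: Experiments show that HD-LoA prompting outperforms existing prompting methods and few-shot supervised learning methods, improving F1 by 4.53% and 9.38% on the document-level EAE dataset. It also improves accuracy by 2.87% and 2.63% on two further tasks, sentiment analysis and natural language inference.
Abstract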
In this study, we investigate in-context learning (ICL) for document-level event argument extraction (EAE). We identify key challenges in this problem:

- example selection,
- context length limitations,
- the abundance of event types,
- the limitations of Chain-of-Thought (CoT) prompting in non-reasoning tasks.

To address these challenges, we introduce the Heuristic-Driven Link-of-Analogy (HD-LoA) prompting method. Specifically, we hypothesize and validate that LLMs learn task-specific heuristics from demonstrations via ICL. Building on this hypothesis, we introduce an explicit heuristic-driven demonstration construction approach. It turns the haphazard example selection process into a methodical one that emphasizes task heuristics. Additionally, inspired by human analogical reasoning, we propose link-of-analogy prompting. It enables LLMs to handle new situations by drawing analogies to known situations, enhancing their adaptability. Extensive experiments show that our method outperforms existing prompting methods and few-shot supervised learning methods, with F1 score improvements of 4.53% and 9.38% on the document-level EAE dataset. Furthermore, when applied to sentiment analysis and natural language inference tasks, HD-LoA prompting achieves accuracy gains of 2.87% and 2.63%, indicating its effectiveness across different tasks.
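A hypothetical Python helper shows how a heuristic-driven, link-of-analogy prompt might be assembled. The instruction wording, the function name, and the prompt layout are all assumptions; the paper's actual templates are not reproduced here.

```python
def build_hd_loa_prompt(heuristics, demonstration, query):
    """Assemble an event-argument-extraction prompt from explicit heuristics and an analogy cue."""
    lines = ["Task: extract the arguments of the event described in the document.",
             "Heuristics for identifying arguments:"]
    lines += [f"{n}. {rule}" for n, rule in enumerate(heuristics, start=1)]  # numbered rules
    lines += ["Worked example:", demonstration,
              "For the new document, first identify the known situation it is most analogous to,",
              "then apply the corresponding heuristic to extract the arguments.",
              "Document:", query, "Answer:"]
    return "\n".join(lines)


print(build_hd_loa_prompt(
    ["The attacker is usually the grammatical subject of an attack verb."],
    "Document: Troops shelled the town. Event: attack. Attacker: troops.",
    "The rebels attacked the convoy at dawn.",
))
```

Stating the heuristics explicitly, rather than hoping the model infers them from randomly chosen examples, is the core idea of the heuristic-driven demonstration construction described above.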
Is Machine Learning Unsafe and Irresponsible in Social Sciences? Paradoxes and Reconsidering from Recidivism Prediction Tasks
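results: The results indicate that the new paradigm better captures the complexity and uncertainty of social systems, improving the accuracy and reliability of predictions.
Abstract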
The paper addresses some fundamental and hotly debated issues in high-stakes event prediction that underpin the computational approach to the social sciences. We question several prevalent views against machine learning. We then outline a new paradigm that highlights its promise and promotes the integration of computational methods with conventional social science approaches.
MuST: Multimodal Spatiotemporal Graph-Transformer for Hospital Readmission Prediction
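results: Experiments on the MIMIC-IV dataset show that including multimodal features improves MuST's performance over unimodal methods. The proposed pipeline also outperforms current leading methods for hospital readmission prediction.
Abstract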
Hospital readmission prediction is considered an essential approach to decreasing readmission rates, which are a key factor in assessing the quality and efficacy of a healthcare system. Previous studies have extensively used three primary modalities to predict hospital readmissions: electronic health records (EHR), medical images, and clinical notes. However, most of these studies did not integrate information from all three modalities or exploit the spatiotemporal relationships present in the dataset. This study introduces a novel model called the Multimodal Spatiotemporal Graph-Transformer (MuST) for predicting hospital readmissions. By employing Graph Convolution Networks and temporal transformers, we can effectively capture spatial and temporal dependencies in EHR and chest radiographs. We then propose a fusion transformer that combines the spatiotemporal features from these two modalities with clinical-note features extracted by a pre-trained, domain-specific transformer. We assess the effectiveness of our methods on the latest publicly available dataset, MIMIC-IV. The experimental results indicate that including multimodal features in MuST improves its performance compared with unimodal methods. Furthermore, our proposed pipeline outperforms the current leading methods in predicting hospital readmissions.
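The fusion step can be sketched as single-head self-attention over one embedding per modality, followed by mean pooling and a logistic output.

The sketch assumes the EHR, chest-radiograph, and clinical-note encoders have already produced vectors of equal length. It uses identity projections and hypothetical classifier weights. MuST's actual fusion transformer is larger and learned.

```python
import math


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity query/key/value projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        top = max(scores)
        w = [math.exp(s - top) for s in scores]  # numerically stable softmax
        z = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, tokens)) / z for j in range(d)])
    return out


def fuse_and_predict(ehr, image, notes, weights, bias=0.0):
    """Fuse three modality embeddings and return a readmission probability (hypothetical head)."""
    fused = self_attention([ehr, image, notes])
    pooled = [sum(t[j] for t in fused) / len(fused) for j in range(len(ehr))]
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1 / (1 + math.exp(-logit))


print(fuse_and_predict([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], weights=[1.0, -1.0], bias=0.2))
```

Because every modality token attends to the others before pooling, each modality's contribution to the prediction depends on the other modalities, unlike simple concatenation.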
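results: The paper establishes a realistic-scale benchmark on real data and runs a large-scale comparison of existing choice models. It finds that the proposed choice model is dominant over both short-term and long-term data periods.
Abstract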
Models of choice are a fundamental input to many now-canonical optimization problems in the field of Operations Management, including assortment, inventory, and price optimization. Naturally, accurate estimation of these models from data is a critical step in applying these optimization problems in practice. It is therefore perhaps surprising that, both in theory and in practice, such choice estimation has to date been accomplished almost exclusively (a) without the use of deep learning in any meaningful way, and (b) via evaluation on limited data with constantly changing metrics. This is in stark contrast to the vast majority of similar learning applications, for which the practice of machine learning suggests that (a) neural network-based models are typically state-of-the-art, and (b) strict standardization of evaluation procedures (datasets, metrics, etc.) is crucial. Thus motivated, we first propose a choice model that is the first to successfully (both theoretically and practically) leverage a modern neural network architectural concept: self-attention. Theoretically, we show that our attention-based choice model is a low-rank generalization of the Halo Multinomial Logit (Halo-MNL) model. Halo-MNL is a recent model that parsimoniously captures irrational choice effects and has seen empirical success. Let $m$ be the number of products. We prove that whereas the Halo-MNL requires $\Omega(m^2)$ data samples to estimate, our model supports a natural nonconvex estimator that admits a near-optimal stationary point with $O(m)$ samples. This estimator is the one a standard neural network implementation would apply. We then establish the first realistic-scale benchmark for choice estimation on real data and use it to run the largest evaluation of existing choice models to date. We find that our proposed model is dominant over both short-term and long-term data periods.
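To illustrate why a low-rank structure helps, the Python sketch below computes Halo-MNL-style choice probabilities in which the pairwise halo matrix is factorized as H = U V^T.

Item j's utility in an offered assortment S is its base utility plus the halo effects of the other offered items. With rank r, the halo has 2mr parameters instead of m^2. This factorization is an assumed illustration of the low-rank idea, not the paper's attention-based model.

```python
import math


def low_rank_halo(U, V):
    """Halo matrix H = U V^T; H[k][j] is the effect of item k's presence on item j's utility."""
    return [[sum(a * b for a, b in zip(u, v)) for v in V] for u in U]


def choice_probabilities(base_utility, halo, assortment):
    """P(j | S) under a Halo-MNL-style model, restricted to the offered assortment S."""
    utils = {j: base_utility[j] + sum(halo[k][j] for k in assortment if k != j)
             for j in assortment}
    top = max(utils.values())
    expu = {j: math.exp(u - top) for j, u in utils.items()}  # stable softmax over S
    total = sum(expu.values())
    return {j: e / total for j, e in expu.items()}


# Rank-1 halo over m = 3 items: offering item 0 boosts item 1's utility by 1.
H = low_rank_halo([[1.0], [0.0], [0.0]], [[0.0], [1.0], [0.0]])
print(choice_probabilities([0.0, 0.0, 0.0], H, [0, 1]))  # item 1 favored when item 0 is shown
print(choice_probabilities([0.0, 0.0, 0.0], H, [1, 2]))  # {1: 0.5, 2: 0.5}
```

With an all-zero halo matrix the model reduces to the plain multinomial logit. The halo terms let the presence of one product shift the appeal of another, which plain MNL cannot express.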
How ChatGPT is Solving Vulnerability Management Problem
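paper_authors: Peiyu Liu, Junming Liu, Lirong Fu, Kangjie Lu, Yifan Xia, Xuhong Zhang, Wenzhi Chen, Haiqin Weng, Shouling Ji, Wenhai Wang
for: This paper investigates whether ChatGPT performs well on real-world vulnerability management tasks, such as predicting security relevance and patch correctness.
methods: The paper uses ChatGPT to complete 6 tasks covering the vulnerability management process. It compares ChatGPT against state-of-the-art approaches, investigates the impact of different prompts, and explores the difficulties.
results: ChatGPT performs well on some tasks, such as generating titles for software bug reports. It also encounters difficulties: for example, directly providing random demonstration examples in the prompt does not guarantee good performance. The study indicates that leveraging ChatGPT in a self-heuristic way and effectively guiding it to focus on helpful information are promising research directions.
Abstract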
Recently, ChatGPT has attracted great attention in the code analysis domain. Prior work shows that ChatGPT can handle foundational code analysis tasks, such as abstract syntax tree generation. This indicates the potential of using ChatGPT to comprehend code syntax and static behaviors. However, it is unclear whether ChatGPT can complete more complicated real-world vulnerability management tasks, such as predicting security relevance and patch correctness. These tasks require an all-encompassing understanding of various aspects, including code syntax, program semantics, and related manual comments. In this paper, we explore ChatGPT's capabilities on 6 tasks covering the complete vulnerability management process, using a large-scale dataset containing 78,445 samples. For each task, we compare ChatGPT against SOTA approaches, investigate the impact of different prompts, and explore the difficulties. The results suggest promising potential in leveraging ChatGPT to assist vulnerability management. One notable example is ChatGPT's proficiency in tasks such as generating titles for software bug reports. Furthermore, our findings reveal the difficulties ChatGPT encounters and shed light on promising future directions. For instance, directly providing random demonstration examples in the prompt does not consistently guarantee good performance in vulnerability management. By contrast, leveraging ChatGPT in a self-heuristic way is a promising research direction: ChatGPT extracts expertise from the demonstration examples themselves, and that expertise is integrated into the prompt. In addition, ChatGPT may misunderstand and misuse the information in the prompt. Consequently, effectively guiding ChatGPT to focus on helpful information rather than irrelevant content remains an open problem.
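The self-heuristic strategy described above can be sketched as two prompt builders. The first asks the model to distill expertise from labeled examples. The second places that distilled expertise, rather than the raw examples, in the task prompt. The function names and wording are hypothetical, and the call to ChatGPT itself is omitted.

```python
def extraction_prompt(demonstrations):
    """Stage 1: ask the model to distill reusable expertise from labeled examples."""
    body = "\n\n".join(f"Input: {x}\nLabel: {y}" for x, y in demonstrations)
    return ("Study these labeled examples and summarize, as short rules, the expertise "
            "needed to assign the correct label.\n\n" + body)


def task_prompt(expertise, query):
    """Stage 2: put the distilled expertise, not the raw examples, into the task prompt."""
    return f"Expertise:\n{expertise}\n\nInput: {query}\nLabel:"


demos = [("Add bounds check before memcpy", "security-relevant"),
         ("Fix typo in README", "not security-relevant")]
print(extraction_prompt(demos))
# The model's reply to stage 1 would be passed as `expertise` in stage 2.
print(task_prompt("Rule: patches that add input validation are usually security-relevant.",
                  "Validate length field in packet parser"))
```

Separating the two stages keeps the final prompt short and focused on distilled rules. This also speaks to the paper's observation that ChatGPT can be distracted by irrelevant prompt content.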
Conceptual Model Interpreter for Large Language Models
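methods: The paper takes an exploratory research approach. It uses existing LLMs such as Llama 2 and ChatGPT 4 to generate and interpret conceptual models, and integrates the interpreter with the LLMs through APIs or local interaction.
results: Experiments show that models generated with ChatGPT 4 and Llama 2 can be refined iteratively through conversational interaction. The architecture supports many different commercial and open-source LLMs and interpreters.
Abstract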
Large Language Models (LLMs) have recently demonstrated the ability to generate source code in common programming languages. Additionally, commercial products such as ChatGPT 4 have started to provide code interpreters. These enable automatic execution of generated code fragments, instant feedback, and development and refinement in a conversational fashion. Using an exploratory research approach, this paper applies code generation and interpretation to conceptual models. It explores the concept and a prototype of a conceptual model interpreter capable of rendering visual models that state-of-the-art LLMs such as Llama 2 and ChatGPT 4 generate in textual syntax. In particular, these LLMs can generate textual syntax for the PlantUML and Graphviz modeling software, which is automatically rendered within a conversational user interface. The first result is an architecture describing the components needed to interact with interpreters and LLMs, either through APIs or locally. This architecture supports many commercial and open-source LLMs and interpreters. Second, experimental results for models generated with ChatGPT 4 and Llama 2 are discussed in two cases: UML and, at the instance level, graphs created from custom data. The results indicate that iterative modeling in a conversational fashion is possible.
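Before rendering, an interpreter has to pull the diagram source out of the LLM's conversational reply. A minimal Python sketch for PlantUML follows.

It assumes the reply wraps each diagram between `@startuml` and `@enduml` markers. Passing the extracted text to PlantUML or Graphviz for rendering is not shown.

```python
import re

# Non-greedy match so that several diagrams in one reply are returned separately.
PLANTUML_BLOCK = re.compile(r"@startuml.*?@enduml", re.DOTALL)


def extract_plantuml(reply):
    """Return every @startuml ... @enduml diagram found in an LLM reply, in order."""
    return PLANTUML_BLOCK.findall(reply)


reply = ("Here is the diagram:\n@startuml\nAlice -> Bob: hello\n@enduml\n"
         "Let me know if you want changes.")
print(extract_plantuml(reply))  # ['@startuml\nAlice -> Bob: hello\n@enduml']
```

Keeping extraction separate from rendering lets the same conversational loop work with different LLMs. An interpreter backend can then be swapped without touching the chat side.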
Summary
Recently, large language models (LLMs) have shown the ability to generate source code in common programming languages. In addition, commercial products such as ChatGPT 4 now provide code interpreters that automatically execute generated code fragments, give instant feedback, and allow code to be developed and refined conversationally. Using an exploratory research approach, this paper applies code generation and interpretation to conceptual models. It explores the concept and prototype of a conceptual model interpreter that renders visual models from the textual syntax generated by state-of-the-art LLMs such as Llama 2 and ChatGPT 4. Specifically, these LLMs can generate textual syntax for the PlantUML and Graphviz modeling software, which is automatically rendered as visual models. The paper's first result is an architecture describing the components needed to interact with interpreters and LLMs, supporting many commercial and open-source LLMs and interpreters. Second, the paper discusses experiments with models generated by ChatGPT 4 and Llama 2 in two cases: UML and, at the instance level, graphs. The results show that iterative modeling in a conversational fashion is possible.
paper_authors: Jianbin Qin, Sifan Huang, Yaoshu Wang, Jing Zhu, Yifan Zhang, Yukai Miao, Rui Mao, Makoto Onizuka, Chuan Xiao
for: BClean is proposed to solve the problem of data cleaning, a crucial step in data preprocessing and machine learning.
methods: BClean uses Bayesian inference and automatic Bayesian network construction, which can fully exploit the relationships between attributes in the observed dataset and any prior information provided by users. The system also includes an effective scoring model and several approximation strategies to enhance the efficiency of data cleaning.
results: BClean achieves an F-measure of up to 0.9 in data cleaning, outperforming existing Bayesian methods by 2% and other data cleaning methods by 15%.
Abstract
There is a considerable body of work on data cleaning that employs various principles to rectify erroneous data and transform a dirty dataset into a cleaner one. One prevalent approach is probabilistic methods, including Bayesian methods. However, existing probabilistic methods often assume a simplistic distribution (e.g., a Gaussian distribution), which frequently underfits in practice, or they require experts to provide a complex prior distribution (e.g., via a programming language). This requirement is both labor-intensive and costly, rendering these methods less suitable for real-world applications. In this paper, we propose BClean, a Bayesian Cleaning system that features automatic Bayesian network construction and user interaction. We recast the data cleaning problem as Bayesian inference that fully exploits the relationships between attributes in the observed dataset and any prior information provided by users. To this end, we present an automatic Bayesian network construction method that extends a structure learning-based functional dependency discovery method with similarity functions to capture the relationships between attributes. Furthermore, our system allows users to modify the generated Bayesian network in order to specify prior information or correct inaccuracies identified by the automatic generation process. We also design an effective scoring model (called the compensative scoring model) necessary for the Bayesian inference. To enhance the efficiency of data cleaning, we propose several approximation strategies for the Bayesian inference, including graph partitioning, domain pruning, and pre-detection. By evaluating on both real-world and synthetic datasets, we demonstrate that BClean is capable of achieving an F-measure of up to 0.9 in data cleaning, outperforming existing Bayesian methods by 2% and other data cleaning methods by 15%.
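The core idea of Bayesian cleaning, choosing the most probable value for a cell given related attributes, can be sketched with a single dependency edge (zip code to city) and a naive-Bayes score P(city) * P(zip | city) with Laplace smoothing. This is a deliberately simplified illustration, not BClean's compensative scoring model or its automatically constructed network; the function names, attributes, and data are hypothetical, and candidate values are restricted to those already present in the column.

```python
from collections import Counter


def most_probable_value(rows, evidence_attr, target_attr, evidence_value, alpha=1.0):
    """Pick the target value maximizing P(value) * P(evidence | value)."""
    prior = Counter(r[target_attr] for r in rows)
    joint = Counter((r[target_attr], r[evidence_attr]) for r in rows)
    n_targets = len(prior)
    n_evidence = len({r[evidence_attr] for r in rows})
    total = len(rows)

    def posterior_score(candidate):
        # Unnormalized posterior with add-alpha smoothing on both factors
        p_candidate = (prior[candidate] + alpha) / (total + alpha * n_targets)
        p_evidence = (joint[(candidate, evidence_value)] + alpha) / (
            prior[candidate] + alpha * n_evidence
        )
        return p_candidate * p_evidence

    return max(prior, key=posterior_score)


def clean(rows, evidence_attr, target_attr):
    """Return new rows whose target attribute is replaced by its most probable value."""
    return [
        {**r, target_attr: most_probable_value(rows, evidence_attr, target_attr, r[evidence_attr])}
        for r in rows
    ]


rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "94105", "city": "San Francisco"},
    {"zip": "10001", "city": "New Yrok"},  # dirty cell
]
cleaned = clean(rows, "zip", "city")
```

Here the typo "New Yrok" is repaired because "New York" co-occurs with zip 10001 far more often, while the single clean San Francisco row is kept.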
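Summary
There is a large body of work on data cleaning that uses various principles to correct erroneous data and transform a dirty dataset into a cleaner one. One common approach is probabilistic methods, including Bayesian methods. However, existing probabilistic methods often assume a simple distribution (e.g., a Gaussian distribution), which frequently underfits in practice, or they require experts to provide a complex prior distribution (e.g., via a programming language). This requirement is both labor-intensive and costly, making these methods less suitable for real-world applications. In this paper, we propose BClean, a Bayesian data cleaning system. We recast data cleaning as Bayesian inference that fully exploits the relationships in the observed data and any prior information provided by users. To this end, we propose a method for automatically constructing the Bayesian network, which extends a structure-learning-based functional dependency discovery method with similarity functions to capture relationships between attributes. In addition, our system allows users to modify the generated Bayesian network to specify prior information or correct errors made by the automatic generation process. We also design an effective scoring model (the compensative scoring model) for the Bayesian inference. To improve the efficiency of data cleaning, we propose several approximation strategies, including graph partitioning, domain pruning, and pre-detection. Evaluating on real-world and synthetic datasets, we show that BClean achieves an F-measure of up to 0.9 in data cleaning, outperforming existing Bayesian methods by 2% and other data cleaning methods by 15%.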
Step by Step to Fairness: Attributing Societal Bias in Task-oriented Dialogue Systems
paper_authors: Hsuan Su, Rebecca Qian, Chinnadhurai Sankar, Shahin Shayandeh, Shang-Tse Chen, Hung-yi Lee, Daniel M. Bikel
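for: This paper describes a method for diagnosing bias in task-oriented dialogue systems, helping researchers understand the sources of bias more deeply.
methods: Building on TOD systems that use pretrained large language models (LLMs), the paper diagnoses bias by analyzing the biased behavior of each system component.
results: Experiments show that the bias of a TOD system usually comes from the response generation model rather than from other system components.
Abstract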
Recent works have shown considerable improvements in task-oriented dialogue (TOD) systems by utilizing pretrained large language models (LLMs) in an end-to-end manner. However, the biased behavior of each component in a TOD system and the error propagation issue in the end-to-end framework can lead to seriously biased TOD responses. Existing work on fairness focuses only on the total bias of a system. In this paper, we propose a diagnosis method to attribute bias to each component of a TOD system. With the proposed attribution method, we can gain a deeper understanding of the sources of bias. Additionally, researchers can mitigate biased model behavior at a more granular level. We conduct experiments to attribute the TOD system's bias along three demographic axes: gender, age, and race. Experimental results show that the bias of a TOD system usually comes from the response generation model.
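One generic way to attribute a system-level bias to individual components is counterfactual ablation: measure the bias gap of the full pipeline, then swap each component for a neutral counterpart and record how much of the gap disappears. The sketch below uses toy numeric stages with hard-coded offsets; the stage names, the score-gap metric, and the ablation scheme are assumptions for illustration, not necessarily the paper's attribution method.

```python
from statistics import mean


def run_pipeline(stages, score, group):
    """Pass a numeric response score through each stage in order."""
    for stage in stages.values():
        score = stage(score, group)
    return score


def bias_gap(stages, inputs):
    """Mean output score for group A minus mean output score for group B."""
    a = mean(run_pipeline(stages, s, g) for s, g in inputs if g == "A")
    b = mean(run_pipeline(stages, s, g) for s, g in inputs if g == "B")
    return a - b


def attribute_bias(stages, neutral, inputs):
    """Bias removed when each stage alone is swapped for its neutral counterpart."""
    total = bias_gap(stages, inputs)
    return {
        name: total - bias_gap({**stages, name: neutral[name]}, inputs)
        for name in stages
    }


# Hypothetical toy components: two of them penalize group "B"
stages = {
    "state_tracker": lambda x, g: x,
    "policy": lambda x, g: x - (0.1 if g == "B" else 0.0),
    "response_generator": lambda x, g: x - (0.3 if g == "B" else 0.0),
}
neutral = {name: (lambda x, g: x) for name in stages}
inputs = [(1.0, "A"), (1.0, "B"), (0.8, "A"), (0.8, "B")]
attribution = attribute_bias(stages, neutral, inputs)
```

In this toy setup most of the gap is attributed to `response_generator`, mirroring the kind of per-component finding the abstract reports.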
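Summary
Recent studies have shown that task-oriented dialogue (TOD) systems can be substantially improved by using pretrained large language models (LLMs) in an end-to-end manner. However, the biased behavior of each component of a TOD system, together with error propagation in the end-to-end framework, can lead to seriously biased TOD responses. Existing fairness research focuses only on the overall bias of a system. In this paper, we propose a diagnosis method that attributes bias to each component of a TOD system. With this attribution method, we can gain a deeper understanding of the sources of bias, and researchers can mitigate biased model behavior at a finer granularity. We attribute the TOD system's bias along three demographic axes: gender, age, and race. Experimental results show that the bias of a TOD system usually comes from the response generation model.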
Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering
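methods: This paper proposes a new pipeline, Knowledgeable Preference AlignmenT (KnowPAT), which builds two preference sets, a style preference set and a knowledge preference set, and designs a new alignment objective to align the LLM's preferences with human preferences.
results: Compared against 15 baseline methods, KnowPAT performs strongly on real-scenario domain-specific question answering, outperforming all 15 baselines. The code is available at https://github.com/zjukg/KnowPAT.
Abstract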
Recently, the development of large language models (LLMs) has attracted wide attention in academia and industry. Deploying LLMs in real scenarios is one of the key directions in the current Internet industry. In this paper, we present a novel pipeline that applies LLMs to domain-specific question answering (QA) and incorporates domain knowledge graphs (KGs), addressing an important direction of LLM application. In a real-world application, the content generated by LLMs should be user-friendly in order to serve customers. Additionally, the model needs to use domain knowledge properly to generate reliable answers. These two issues are the major difficulties in LLM application, as vanilla fine-tuning cannot adequately address them. We argue that both requirements can be unified as a model preference problem that requires alignment with humans to achieve practical application. Thus, we introduce Knowledgeable Preference AlignmenT (KnowPAT), which constructs two kinds of preference sets, a style preference set and a knowledge preference set, to tackle the two issues respectively. Besides, we design a new alignment objective to align the LLM's preference with human preference, aiming to train a better LLM for real-scenario domain-specific QA that generates reliable and user-friendly answers. Extensive experiments and comprehensive comparisons with 15 baseline methods demonstrate that KnowPAT outperforms them on real-scenario domain-specific QA with LLMs. Our code is open-source at https://github.com/zjukg/KnowPAT.
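Preference sets like the style and knowledge sets above are typically turned into a training signal with a pairwise ranking loss: for every (better, worse) pair of answers, penalize the model unless it scores the better answer higher. The sketch below is a generic Bradley-Terry-style loss, -log sigmoid(s_better - s_worse), summed over all pairs of a ranked list; it is an assumption for illustration, not KnowPAT's actual alignment objective, and the scores stand in for something like a model's length-normalized log-likelihood of each answer.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def pairwise_preference_loss(score_preferred, score_rejected):
    """-log sigmoid(s_pref - s_rej), which equals softplus(s_rej - s_pref)."""
    return softplus(score_rejected - score_preferred)


def ranking_loss(scores_best_first):
    """Sum the pairwise loss over every (better, worse) pair in a ranked list."""
    n = len(scores_best_first)
    return sum(
        pairwise_preference_loss(scores_best_first[i], scores_best_first[j])
        for i in range(n)
        for j in range(i + 1, n)
    )


# Hypothetical scores for three answers, listed from most to least preferred
well_aligned = ranking_loss([2.0, 1.0, 0.0])
misaligned = ranking_loss([0.0, 1.0, 2.0])
```

The loss is small when the model's scores follow the human ranking and grows when they invert it, which is what gradient descent then corrects.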
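Summary
Recently, the development of large language models (LLMs) has attracted wide attention in academia and industry, and deploying LLMs in real scenarios is an important direction for today's Internet industry. In this paper, we propose a new pipeline that applies LLMs to domain-specific question answering (QA) and incorporates domain knowledge graphs (KGs), addressing an important direction of LLM application. In a real-world application, the generated content should be user-friendly in order to serve customers. In addition, the model needs to use domain knowledge correctly to generate reliable answers. These two issues are the main difficulties in LLM applications, and vanilla fine-tuning cannot adequately address them. We argue that both can be unified as a model preference problem that requires alignment with humans for practical application. We therefore propose Knowledgeable Preference AlignmenT (KnowPAT), which constructs two preference sets, a style preference set and a knowledge preference set, to address the two issues respectively. We also design a new alignment objective that aligns the LLM's preferences with human preferences, in order to train a better LLM that generates reliable and user-friendly answers. Our experiments and detailed comparisons with 15 baseline methods show that KnowPAT outperforms all of them on real-scenario domain-specific QA with LLMs. Our code is available at https://github.com/zjukg/KnowPAT.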
DRUformer: Enhancing the driving scene Important object detection with driving relationship self-understanding
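paper_authors: Yingjie Niu, Ming Ding, Keisuke Fujii, Kento Ohtani, Alexander Carballo, Kazuya Takeda
for: This paper aims to help vehicles anticipate important objects in order to improve driving safety.
methods: The paper proposes DRUformer, a transformer-based multi-modal model that considers the relationships among all participants in the driving scene and embeds driving intention into the model.
results: In comparative experiments on the DRAMA dataset against other state-of-the-art (SOTA) models, the model achieves a 16.2% improvement in mIoU and a 12.3% improvement in ACC. A qualitative analysis further shows that it detects important objects effectively across different road scenarios and object classes.
Abstract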
Traffic accidents frequently lead to fatal injuries, contributing to over 50 million deaths until 2023. To mitigate driving hazards and ensure personal safety, it is crucial to assist vehicles in anticipating important objects during travel. Previous research on important object detection primarily assessed the importance of individual participants, treating them as independent entities and frequently overlooking the connections between these participants. Unfortunately, this approach has proven less effective in detecting important objects in complex scenarios. In response, we introduce Driving scene Relationship self-Understanding transformer (DRUformer), designed to enhance the important object detection task. The DRUformer is a transformer-based multi-modal important object detection model that takes into account the relationships between all the participants in the driving scenario. Recognizing that driving intention also significantly affects the detection of important objects during driving, we have incorporated a module for embedding driving intention. To assess the performance of our approach, we conducted a comparative experiment on the DRAMA dataset, pitting our model against other state-of-the-art (SOTA) models. The results demonstrated a noteworthy 16.2\% improvement in mIoU and a substantial 12.3\% boost in ACC compared to SOTA methods. Furthermore, we conducted a qualitative analysis of our model's ability to detect important objects across different road scenarios and classes, highlighting its effectiveness in diverse contexts. Finally, we conducted various ablation studies to assess the efficiency of the proposed modules in our DRUformer model.
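The results above are reported in mIoU. As a hedged sketch, assuming one predicted and one ground-truth important-object box per frame, each given as (x1, y1, x2, y2), box IoU and its mean over frames can be computed as follows; the official DRAMA evaluation protocol may differ in its details, and the function names are illustrative.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(predictions, ground_truths):
    """Average IoU over paired predicted / ground-truth important-object boxes."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground truths must be paired")
    ious = [box_iou(p, g) for p, g in zip(predictions, ground_truths)]
    return sum(ious) / len(ious)
```

For example, boxes (0, 0, 2, 2) and (1, 1, 3, 3) overlap in a 1 by 1 square out of a union of 7, giving an IoU of 1/7.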
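Summary
Traffic accidents frequently cause fatal injuries and had contributed to over 50 million deaths by 2023. To prevent driving hazards and ensure safe driving, it is crucial to help vehicles anticipate important objects. Previous research focused mainly on the importance of individual participants, treating them as independent entities and often overlooking the relationships between them; this approach has proven less effective in complex scenarios. We therefore propose the Driving scene Relationship self-Understanding transformer (DRUformer) to improve important object detection. DRUformer is a transformer-based multi-modal important object detection model that takes into account the relationships among all participants in the driving scene. Recognizing that driving intention also significantly affects the detection of important objects, we incorporate a module that embeds driving intention into the model. To evaluate our approach, we conducted comparative experiments on the DRAMA dataset against other state-of-the-art (SOTA) methods. The results show that DRUformer improves mIoU by 16.2% and ACC by 12.3% over SOTA methods. We also conducted a qualitative analysis of DRUformer's ability to detect important objects across different road scenarios and classes, showing that it is effective in diverse contexts. Finally, we conducted various ablation studies to assess the efficiency of each module in DRUformer.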
Finetuning Text-to-Image Diffusion Models for Fairness
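results: Experiments show that the method substantially reduces gender, racial, and intersectional biases for occupational prompts. Gender bias is noticeably reduced even when finetuning only five soft tokens. Importantly, the method supports diverse notions of fairness, for example controlling the age distribution to 75% young and 25% old while simultaneously debiasing gender and race. Finally, the method can debias multiple concepts at once simply by including the corresponding prompts in the finetuning data.
Abstract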
The rapid adoption of text-to-image diffusion models in society underscores an urgent need to address their biases. Without interventions, these biases could propagate a distorted worldview and limit opportunities for minority groups. In this work, we frame fairness as a distributional alignment problem. Our solution consists of two main technical contributions: (1) a distributional alignment loss that steers specific characteristics of the generated images towards a user-defined target distribution, and (2) biased direct finetuning of diffusion model's sampling process, which leverages a biased gradient to more effectively optimize losses defined on the generated images. Empirically, our method markedly reduces gender, racial, and their intersectional biases for occupational prompts. Gender bias is significantly reduced even when finetuning just five soft tokens. Crucially, our method supports diverse perspectives of fairness beyond absolute equality, which is demonstrated by controlling age to a $75\%$ young and $25\%$ old distribution while simultaneously debiasing gender and race. Finally, our method is scalable: it can debias multiple concepts at once by simply including these prompts in the finetuning data. We hope our work facilitates the social alignment of T2I generative AI. We will share code and various debiased diffusion model adaptors.
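To make "fairness as distributional alignment" concrete, a simplified surrogate loss compares the batch-level distribution of a predicted attribute (for example, from an age classifier applied to generated images) with a user-defined target such as 75% young and 25% old, using KL(target || batch). This is an assumption for illustration, not the paper's exact distributional alignment loss; in practice the per-image probabilities would come from a classifier and the loss would be backpropagated through the diffusion sampler, neither of which is shown.

```python
import math


def batch_distribution(probs):
    """Average per-image class probabilities into one batch-level distribution."""
    n_classes = len(probs[0])
    return [sum(p[k] for p in probs) / len(probs) for k in range(n_classes)]


def distribution_alignment_loss(probs, target, eps=1e-12):
    """KL(target || batch): zero when the batch matches the target distribution."""
    batch = batch_distribution(probs)
    return sum(
        t * math.log((t + eps) / (q + eps))
        for t, q in zip(target, batch)
        if t > 0
    )


target = [0.75, 0.25]  # hypothetical target: 75% young, 25% old
on_target = [[1.0, 0.0]] * 3 + [[0.0, 1.0]]
all_young = [[1.0, 0.0]] * 4
```

A batch with three young and one old face already matches the target, so its loss is zero; an all-young batch incurs a large penalty driven by the missing "old" mass.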
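Summary
The rapid adoption of text-to-image diffusion models in society underscores the urgent need to address their biases. Without intervention, these biases could propagate a distorted worldview and limit opportunities for minority groups. In this work, we frame fairness as a distributional alignment problem. Our solution has two main technical contributions: (1) a distributional alignment loss that steers specific characteristics of the generated images toward a user-defined target distribution; and (2) biased direct finetuning of the diffusion model's sampling process, which uses a biased gradient to optimize losses defined on the generated images more effectively. Experiments show that our method markedly reduces gender, racial, and intersectional biases for occupational prompts, with gender bias significantly reduced even when finetuning just five soft tokens. Our method also supports notions of fairness beyond absolute equality, such as controlling the age distribution to 75% young and 25% old while simultaneously debiasing gender and race. Finally, our method can debias multiple concepts at once simply by including the corresponding prompts in the finetuning data. We hope our work facilitates the social alignment of text-to-image generative AI. We will share code and various debiased diffusion model adapters.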
Adaptive Language-based Mental Health Assessment with Item-Response Theory
paper_authors: Vasudha Varadarajan, Sverker Sikström, Oscar N. E. Kjell, H. Andrew Schwartz
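for: This study develops adaptive language-based assessment, which estimates an individual's psychological score from a limited number of language responses.
methods: The study explores two statistical learning approaches to measurement and scoring: classical test theory (CTT) and item response theory (IRT).
results: Adaptive testing substantially reduces the number of questions needed, from 11 to 3 for depression and 5 for anxiety, and the ALIRT model achieves the highest accuracy with fewer questions (e.g., Pearson r ≈ 0.93 after only 3 questions).
Abstract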
Mental health issues vary widely across individuals: the manifestations of signs and symptoms can be fairly heterogeneous. Recently, language-based depression and anxiety assessments have shown promise for capturing this heterogeneity by evaluating a patient's own language, but such approaches require a large sample of words per person to be accurate. In this work, we introduce adaptive language-based assessment: the task of iteratively estimating an individual's psychological score based on limited language responses to questions that the model also decides to ask. To this end, we explore two statistical learning-based approaches for measurement and scoring: classical test theory (CTT) and item response theory (IRT). We find that adaptive testing in general can significantly reduce the number of questions required to achieve high validity (r ~ 0.7) with standardized tests, from 11 questions down to 3 for depression and 5 for anxiety. Given the combinatorial nature of the problem, we empirically evaluate multiple strategies for both the ordering and scoring objectives, introducing two new methods: a semi-supervised item response theory based method (ALIRT) and a supervised actor-critic based model. While both models achieve significant improvements over random and fixed orderings, we find ALIRT to be a scalable model that achieves the highest accuracy with fewer questions (e.g., Pearson r ~ 0.93 after only 3 questions versus asking all 11 questions). Overall, ALIRT allows prompting a reduced number of questions without compromising accuracy or adding computational overhead.
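The adaptive loop behind IRT-based testing can be sketched with the textbook two-parameter logistic (2PL) model: pick the unasked item with the highest Fisher information at the current ability estimate, record the response, and update the estimate (here by expected a posteriori estimation on a grid with a standard-normal prior). This uses dichotomous responses and made-up item parameters as a simplification; it is not ALIRT, which scores open-ended language responses.

```python
import math


def p_positive(theta, a, b):
    """2PL item response function: probability of a positive response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_positive(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, items, asked):
    """Pick the unasked item that is most informative at the current estimate."""
    candidates = [i for i in range(len(items)) if i not in asked]
    return max(candidates, key=lambda i: item_information(theta, *items[i]))


def eap_estimate(responses, items, grid_size=121):
    """Expected a posteriori theta under a N(0, 1) prior, on a grid over [-4, 4]."""
    grid = [-4 + 8 * k / (grid_size - 1) for k in range(grid_size)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized standard-normal prior
        for i, positive in responses.items():
            p = p_positive(theta, *items[i])
            w *= p if positive else 1.0 - p
        weights.append(w)
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


# Hypothetical item bank: (discrimination a, difficulty b)
items = [(1.0, -1.0), (1.5, 0.0), (1.0, 1.5)]
first = next_item(0.0, items, set())
theta_after_yes = eap_estimate({first: True}, items)
```

Asking the most informative item first is what lets an adaptive test reach a stable estimate with far fewer questions than a fixed questionnaire.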
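Summary
Mental health issues vary widely across individuals; the manifestations of signs and symptoms can be quite heterogeneous. Recently, language-based assessments that evaluate a patient's own language have shown promise for capturing this heterogeneity. However, these approaches require a large sample of words per person to be accurate. In this work, we introduce adaptive language-based assessment: iteratively estimating an individual's psychological score from limited language responses to questions that the model itself chooses to ask. To this end, we explore two statistical learning approaches: classical test theory (CTT) and item response theory (IRT). We find that adaptive testing can significantly reduce the number of questions required to achieve high validity (r ≈ 0.7) with standardized tests, from 11 questions down to 3 for depression and 5 for anxiety. Given the combinatorial nature of the problem, we empirically evaluate multiple strategies, including two new methods: a semi-supervised item response theory based method (ALIRT) and a supervised actor-critic based model. While both models improve significantly over random and fixed orderings, we find that ALIRT is a scalable model that maintains high accuracy with fewer questions (e.g., Pearson r ≈ 0.93 after only 3 questions). Overall, ALIRT allows assessment with fewer questions without compromising accuracy or adding computational cost.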
Electronic Communication Data Link Encryption Simulation Based on Wireless Communication
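results: Experiments show that simulating system data-link encryption in wireless network communication with the improved elliptic curve cryptographic algorithm takes only 2.31 milliseconds, less than other algorithms. Conclusion: the study shows that research based on wireless communication technology can effectively improve the simulation of electronic communication data-link encryption.
Abstract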
In order to improve the simulation of electronic communication data-link encryption, the author proposes a solution based on wireless communication. Building on research into wireless communication, the technique improves the elliptic curve cryptographic algorithm to build a system encryption model, obtains legal and valid node private keys, evaluates and analyzes the relevant security attributes of the system, verifies the security of the keys, and realizes encryption optimization for wireless network communication. Experimental results show that using the improved elliptic curve to simulate system data-link encryption under a certificateless public key cryptosystem in network communication takes only 2.31 milliseconds, which is lower than other algorithms. Conclusion: the research shows that technology based on wireless communication can effectively improve the simulation of electronic communication data-link encryption.
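For readers unfamiliar with elliptic curve cryptography, the basic operations it builds on (point addition, double-and-add scalar multiplication, and a Diffie-Hellman style key agreement) can be sketched on the small textbook curve y^2 = x^3 + 2x + 2 over GF(17) with generator (5, 1), whose group has order 19. This is plain textbook ECC with toy, insecure parameters and hypothetical private keys; it is not the paper's improved algorithm or its certificateless scheme.

```python
# Toy curve y^2 = x^3 + 2x + 2 over GF(17); far too small to be secure.
P_MOD, A_COEF = 17, 2
G = (5, 1)  # generator of the whole group (order 19)


def point_add(p, q):
    """Add two curve points; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) = point at infinity
    if p == q:
        # Tangent slope for point doubling (requires Python 3.8+ for pow(..., -1, m))
        lam = (3 * x1 * x1 + A_COEF) * pow(2 * y1, -1, P_MOD)
    else:
        # Chord slope through two distinct points
        lam = (y2 - y1) * pow(x2 - x1, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return x3, y3


def scalar_mult(k, point):
    """Compute k * point with the double-and-add method."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


# Key agreement with hypothetical private keys
alice_private, bob_private = 3, 7
alice_public = scalar_mult(alice_private, G)
bob_public = scalar_mult(bob_private, G)
shared_alice = scalar_mult(alice_private, bob_public)
shared_bob = scalar_mult(bob_private, alice_public)
```

Both parties arrive at the same point because a(bG) = b(aG); security in real deployments rests on using curves with roughly 256-bit group orders, where recovering a private key from a public key is infeasible.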
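Summary
To improve the simulation of electronic communication data-link encryption, the author proposes a solution based on wireless communication. The technique builds on wireless communication research: it improves the elliptic curve cryptographic algorithm, builds a system encryption model, obtains legal and valid node private keys, evaluates and analyzes the system's security properties, verifies key security, and optimizes encryption for wireless network communication. Experimental results show that using the improved elliptic curve to simulate system data-link encryption under a certificateless public key cryptosystem in network communication takes only 2.31 milliseconds, less than other algorithms. Conclusion: the study shows that wireless communication technology can effectively improve the simulation of electronic communication data-link encryption.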
Online Advertisements with LLMs: Opportunities and Challenges
results: The paper presents the design considerations for each module and the technical challenges of implementing them.
Abstract
This paper explores the potential for leveraging Large Language Models (LLMs) in online advertising systems. We examine the essential requirements such a system must fulfill, including privacy, latency, reliability, and the satisfaction of users and advertisers. We further introduce a general framework for LLM advertising, consisting of modification, bidding, prediction, and auction modules. Different design considerations for each module are presented, with an in-depth examination of their practicality and the technical challenges inherent in their implementation.
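The abstract does not specify how the auction module works. As an assumption for illustration, a conventional choice in online advertising is a single-slot, quality-weighted second-price auction: rank advertisers by bid times predicted click rate (the kind of quantity a prediction module could supply) and charge the winner the lowest bid that would still keep it in first place. The function name, the inputs, and the zero reserve price are hypothetical.

```python
def run_auction(bids, quality):
    """Single-slot, quality-weighted second-price auction.

    bids: advertiser -> bid per click
    quality: advertiser -> predicted click rate (assumed to come from a prediction module)
    Returns (winner, price per click); no reserve price is applied.
    """
    ranked = sorted(bids, key=lambda ad: bids[ad] * quality[ad], reverse=True)
    winner = ranked[0]
    if len(ranked) == 1:
        return winner, 0.0
    runner_up = ranked[1]
    # Lowest bid at which the winner's weighted score still ties the runner-up
    price = bids[runner_up] * quality[runner_up] / quality[winner]
    return winner, price


winner, price = run_auction({"A": 2.0, "B": 3.0}, {"A": 0.10, "B": 0.05})
```

Advertiser A wins despite the lower bid because its predicted click rate is twice as high, and it pays 1.5 per click rather than its full bid of 2.0.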
Summary
paper_authors: Jiankai Sun, Jianing Qiu, Chuanyang Zheng, John Tucker, Javier Yu, Mac Schwager
for: This paper aims to accelerate research in developing rich, multimodal scene models trained from egocentric data, with applications in VR/AR and intelligent agents.
methods: The paper uses differentiable volumetric ray-tracing inspired by Neural Radiance Fields (NeRFs) to construct a NeRF-like model from an egocentric image sequence.
results: The paper presents a comprehensive multimodal egocentric video dataset, featuring diverse data modalities and real-world context, as a foundation for furthering our understanding of human behavior and enabling more immersive and intelligent experiences in VR, AR, and robotics.
Abstract
We seek to accelerate research in developing rich, multimodal scene models trained from egocentric data, based on differentiable volumetric ray-tracing inspired by Neural Radiance Fields (NeRFs). The construction of a NeRF-like model from an egocentric image sequence plays a pivotal role in understanding human behavior and holds diverse applications within the realms of VR/AR. Such egocentric NeRF-like models may be used as realistic simulations, contributing significantly to the advancement of intelligent agents capable of executing tasks in the real-world. The future of egocentric view synthesis may lead to novel environment representations going beyond today's NeRFs by augmenting visual data with multimodal sensors such as IMU for egomotion tracking, audio sensors to capture surface texture and human language context, and eye-gaze trackers to infer human attention patterns in the scene. To support and facilitate the development and evaluation of egocentric multimodal scene modeling, we present a comprehensive multimodal egocentric video dataset. This dataset offers a comprehensive collection of sensory data, featuring RGB images, eye-tracking camera footage, audio recordings from a microphone, atmospheric pressure readings from a barometer, positional coordinates from GPS, connectivity details from Wi-Fi and Bluetooth, and information from dual-frequency IMU datasets (1kHz and 800Hz) paired with a magnetometer. The dataset was collected with the Meta Aria Glasses wearable device platform. The diverse data modalities and the real-world context captured within this dataset serve as a robust foundation for furthering our understanding of human behavior and enabling more immersive and intelligent experiences in the realms of VR, AR, and robotics.
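The "differentiable volumetric ray-tracing" behind NeRF-like models composites color samples along each camera ray with the standard NeRF rendering equation, C = sum_i T_i (1 - exp(-sigma_i delta_i)) c_i with transmittance T_i = exp(-sum_{j<i} sigma_j delta_j). The sketch below evaluates that sum for a single ray in plain Python; real implementations vectorize it over many rays and differentiate it with an autodiff framework, which is not shown.

```python
import math


def composite_ray(densities, colors, deltas):
    """Discrete NeRF volume rendering along one ray.

    densities: sigma_i per sample; colors: (r, g, b) per sample;
    deltas: distance between consecutive samples.
    """
    rgb = [0.0, 0.0, 0.0]
    optical_depth = 0.0  # running sum of sigma_j * delta_j for j < i
    for sigma, color, delta in zip(densities, colors, deltas):
        transmittance = math.exp(-optical_depth)
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            rgb[k] += weight * color[k]
        optical_depth += sigma * delta
    return rgb


# A dense red sample in front of a dense green one hides the green completely
red_then_green = composite_ray([1e3, 1e3], [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], [1.0, 1.0])
empty_space = composite_ray([0.0, 0.0], [(1.0, 1.0, 1.0)] * 2, [0.5, 0.5])
```

Because every term is a smooth function of the densities and colors, gradients from a photometric loss can flow back into the network that predicts them.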
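Summary
We seek to accelerate research on rich multimodal scene models trained from egocentric data, based on differentiable volumetric ray tracing inspired by Neural Radiance Fields (NeRFs). Building a NeRF-like model from an egocentric image sequence plays a pivotal role in understanding human behavior and has diverse applications in VR/AR. Such egocentric NeRF-like models can serve as realistic simulations, which matters for intelligent agents that perform tasks in the real world. Future egocentric view synthesis may go beyond today's NeRFs by augmenting visual data with multimodal sensors (such as IMUs, audio, and eye tracking) and by drawing on human language context. To support the development and evaluation of egocentric multimodal scene modeling, we present a comprehensive multimodal egocentric video dataset. It includes RGB images, eye-tracking camera footage, microphone audio recordings, barometric pressure readings, GPS coordinates, Wi-Fi and Bluetooth connectivity information, and dual-frequency IMU data (1kHz and 800Hz) paired with a magnetometer. The dataset was collected with the Meta Aria Glasses wearable device platform. Its diverse data modalities and real-world context provide a solid foundation for better understanding human behavior and enabling more immersive and intelligent experiences in VR, AR, and robotics.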
THOS: A Benchmark Dataset for Targeted Hate and Offensive Speech
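methods: The paper uses manually labeled tweets and classifies them with Large Language Models.
results: Using the THOS dataset, the researchers show that classifiers based on Large Language Models can be trained to classify tweets at this fine level of granularity.
Abstract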
Detecting harmful content on social media, such as Twitter, is made difficult by the fact that the seemingly simple yes/no classification conceals a significant amount of complexity. Unfortunately, while several datasets have been collected for training classifiers in hate and offensive speech, there is a scarcity of datasets labeled with a finer granularity of target classes and specific targets. In this paper, we introduce THOS, a dataset of 8.3k tweets manually labeled with fine-grained annotations about the target of the message. We demonstrate that this dataset makes it feasible to train classifiers, based on Large Language Models, to perform classification at this level of granularity.
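Summary
Detecting harmful content on social media such as Twitter is difficult because a seemingly simple yes/no classification conceals a significant amount of complexity. Although several datasets have been collected for training hate and offensive speech classifiers, few are labeled with finer-grained target classes and specific targets. In this paper, we introduce THOS, a dataset of 8.3k tweets manually labeled with fine-grained annotations of the message's target. We demonstrate that this dataset makes it feasible to train classifiers based on large language models to classify at this level of granularity.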
Controllability-Constrained Deep Network Models for Enhanced Control of Dynamical Systems
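results: The method increases the controllability of the estimated models and yields more efficient controllers, better interpretability, and lower long-term prediction error. These results show that the controllability of data-driven models can be improved through control-theoretical methods.
Abstract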
Control of a dynamical system without knowledge of its dynamics is an important and challenging task. Modern machine learning approaches, such as deep neural networks (DNNs), allow for the estimation of a dynamics model from control inputs and corresponding state observation outputs. Such data-driven models are often used to derive model-based controllers. However, in general, there is no guarantee that a model represented by DNNs will be controllable in the formal control-theoretical sense, which is crucial for the design of effective controllers. This often precludes the use of DNN-estimated models in applications where formal controllability guarantees are required. In this proof-of-concept work, we propose a control-theoretical method that explicitly enhances models estimated from data with controllability. This is achieved by augmenting the model estimation objective with a controllability constraint, which penalizes models with a low degree of controllability. As a result, the models estimated with the proposed controllability constraint allow for the derivation of more efficient controllers, are interpretable through control-theoretical quantities, and have a lower long-term prediction error. The proposed method provides new insights into the connection between the DNN-based estimation of unknown dynamics and the control-theoretical guarantees of the solution properties. We demonstrate the superiority of the proposed method on two standard classical control systems with state observations given by low-resolution, high-dimensional images.
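The formal notion of controllability referred to above is usually checked with the Kalman rank test: a linear model x_{t+1} = A x_t + B u_t with n states is controllable if and only if the matrix [B, AB, ..., A^(n-1) B] has rank n. The sketch below implements that test exactly over the rationals, assuming the learned dynamics have been linearized into (A, B). The rank test is binary; a differentiable penalty on the degree of controllability, as the paper describes, would instead need a graded measure (for example, the smallest singular value of this matrix or of the controllability Gramian), which is not shown here.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def controllability_matrix(A, B):
    """[B, AB, A^2 B, ..., A^(n-1) B] for an n-state linear system."""
    n = len(A)
    blocks, block = [], B
    for _ in range(n):
        blocks.append(block)
        block = mat_mul(A, block)
    # Concatenate row i of every block to form row i of the result
    return [sum((blk[i] for blk in blocks), []) for i in range(n)]


def rank(M):
    """Matrix rank by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def is_controllable(A, B):
    """Kalman rank test: controllable iff the controllability matrix has rank n."""
    return rank(controllability_matrix(A, B)) == len(A)
```

For instance, a double integrator (A = [[0, 1], [0, 0]], B = [[0], [1]]) passes the test, while two decoupled states driven through only the first one (A = I, B = [[1], [0]]) fail it.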
Summary