cs.AI - 2023-11-21

From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models

  • paper_url: http://arxiv.org/abs/2311.13063
  • repo_url: None
  • paper_authors: Zachary Englhardt, Chengqian Ma, Margaret E. Morris, Xuhai “Orson” Xu, Chun-Cheng Chang, Lianhui Qin, Daniel McDuff, Xin Liu, Shwetak Patel, Vikram Iyer
  • for: Providing mental health professionals with insights from patients' daily lives.
  • methods: Using large language models (LLMs) to synthesize multi-sensor data into clinically useful information.
  • results: Achieved binary depression classification with 61.1% accuracy, and identified a new human-AI collaboration approach in which clinical experts interactively query AI tools during clinical decision-making to improve the accuracy and usefulness of diagnoses.
    Abstract Passively collected behavioral health data from ubiquitous sensors holds significant promise to provide mental health professionals insights from patients' daily lives; however, developing analysis tools to use this data in clinical practice requires addressing challenges of generalization across devices and weak or ambiguous correlations between the measured signals and an individual's mental health. To address these challenges, we take a novel approach that leverages large language models (LLMs) to synthesize clinically useful insights from multi-sensor data. We develop chain of thought prompting methods that use LLMs to generate reasoning about how trends in data such as step count and sleep relate to conditions like depression and anxiety. We first demonstrate binary depression classification with LLMs achieving accuracies of 61.1% which exceed the state of the art. While it is not robust for clinical use, this leads us to our key finding: even more impactful and valued than classification is a new human-AI collaboration approach in which clinician experts interactively query these tools and combine their domain expertise and context about the patient with AI generated reasoning to support clinical decision-making. We find models like GPT-4 correctly reference numerical data 75% of the time, and clinician participants express strong interest in using this approach to interpret self-tracking data.
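As a concrete illustration of the chain-of-thought prompting described above, the sketch below assembles a prompt that asks an LLM to reason step by step about step-count and sleep trends. The function name, wording, and data format are illustrative assumptions, not the paper's exact prompt.

```python
def build_cot_prompt(steps, sleep_hours):
    """Assemble a chain-of-thought style prompt asking an LLM to reason
    about how wearable trends relate to depression and anxiety."""
    data = "\n".join(
        f"day {i + 1}: {s} steps, {h:.1f} h sleep"
        for i, (s, h) in enumerate(zip(steps, sleep_hours))
    )
    return (
        "You are assisting a clinician reviewing self-tracking data.\n"
        f"Patient sensor data:\n{data}\n"
        "Let's think step by step: describe the trends in step count and "
        "sleep, explain how they may relate to symptoms of depression or "
        "anxiety, and note what a clinician should verify with the patient."
    )

print(build_cot_prompt([4200, 3100, 900], [7.2, 6.1, 9.8]))
```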

Latent Lab: Large Language Models for Knowledge Exploration

  • paper_url: http://arxiv.org/abs/2311.13051
  • repo_url: None
  • paper_authors: Kevin Dunnell, Trudy Painter, Andrew Stoddard, Andy Lippman
  • for: This study investigates the potential of AI models, particularly large language models (LLMs), to support knowledge exploration and augment human creativity during ideation.
  • methods: The study presents "Latent Lab," an interactive tool for discovering connections among MIT Media Lab research projects, emphasizing "exploration" over search.
  • results: In a user study, Latent Lab successfully introduced users to an unfamiliar knowledge base, laying the groundwork for the ongoing advancement of human-AI knowledge exploration systems.
    Abstract This paper investigates the potential of AI models, particularly large language models (LLMs), to support knowledge exploration and augment human creativity during ideation. We present "Latent Lab" an interactive tool for discovering connections among MIT Media Lab research projects, emphasizing "exploration" over search. The work offers insights into collaborative AI systems by addressing the challenges of organizing, searching, and synthesizing content. In a user study, the tool's success was evaluated based on its ability to introduce users to an unfamiliar knowledge base, ultimately setting the groundwork for the ongoing advancement of human-AI knowledge exploration systems.
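The abstract does not specify Latent Lab's retrieval mechanics, so the following is a hypothetical sketch of the "exploration over search" idea: rank projects by proximity in an embedding space rather than by keyword match. All names and shapes here are assumptions.

```python
import numpy as np

def explore(query_vec, project_vecs, titles, k=5):
    """Rank projects by cosine similarity to the user's current position in
    embedding space, so browsing hops between neighboring projects instead
    of relying on keyword search."""
    q = query_vec / np.linalg.norm(query_vec)
    P = project_vecs / np.linalg.norm(project_vecs, axis=1, keepdims=True)
    sims = P @ q
    top = np.argsort(-sims)[:k]
    return [(titles[i], float(sims[i])) for i in top]
```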

Do we listen to what we are told? An empirical study on human behaviour during the COVID-19 pandemic: neural networks vs. regression analysis

  • paper_url: http://arxiv.org/abs/2311.13046
  • repo_url: None
  • paper_authors: Yuxi Heluo, Kexin Wang, Charles W. Robson
  • for: This study investigates how compliant the general population was with mask-wearing-related public-health policy during the 2020 COVID-19 pandemic.
  • methods: The study combines object-detection-based convolutional neural networks, regression analysis, and multilayer perceptrons to analyze visual data of the Viennese public during 2020.
  • results: Mask-wearing-related government regulations and public-transport announcements encouraged correct mask-wearing behaviour during the pandemic, and changes in announcement and regulation contents had heterogeneous effects on behaviour. Neural networks predicted population reactions more accurately than regression analysis, while regression modelling allowed the authors to explore possible causal pathways underlying societal behaviour.
    Abstract In this work, we contribute the first visual open-source empirical study on human behaviour during the COVID-19 pandemic, in order to investigate how compliant a general population is to mask-wearing-related public-health policy. Object-detection-based convolutional neural networks, regression analysis and multilayer perceptrons are combined to analyse visual data of the Viennese public during 2020. We find that mask-wearing-related government regulations and public-transport announcements encouraged correct mask-wearing-behaviours during the COVID-19 pandemic. Importantly, changes in announcement and regulation contents led to heterogeneous effects on people's behaviour. Comparing the predictive power of regression analysis and neural networks, we demonstrate that the latter produces more accurate predictions of population reactions during the COVID-19 pandemic. Our use of regression modelling also allows us to unearth possible causal pathways underlying societal behaviour. Since our findings highlight the importance of appropriate communication contents, our results will facilitate more effective non-pharmaceutical interventions to be developed in future. Adding to the literature, we demonstrate that regression modelling and neural networks are not mutually exclusive but instead complement each other.
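To make the regression-versus-neural-network comparison concrete, here is a minimal sketch on toy data, assuming scikit-learn; the features, coefficients, and data are illustrative, not the study's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# toy daily features: [regulation strictness (0-2), announcement active (0/1)]
rng = np.random.default_rng(0)
X = np.column_stack([rng.integers(0, 3, 200), rng.integers(0, 2, 200)])
y = 0.3 + 0.15 * X[:, 0] + 0.10 * X[:, 1] + rng.normal(0, 0.05, 200)

reg = LinearRegression().fit(X, y)   # interpretable coefficients -> pathways
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=3000,
                   random_state=0).fit(X, y)   # usually the better predictor
print(reg.coef_, reg.score(X, y), mlp.score(X, y))
```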

Synaptic Sampling of Neural Networks

  • paper_url: http://arxiv.org/abs/2311.13038
  • repo_url: https://github.com/Fidan-Sadig/neural-network
  • paper_authors: James B. Aimone, William Severa, J. Darby Smith
  • for: This paper describes a method for sampling neural networks directly via Bernoulli coin flips, enabling better uncertainty quantification on probabilistic computing hardware.
  • methods: The paper introduces scANN — \textit{sampling (by coinflips) artificial neural networks} — which treats network weights as Bernoulli coin flips so that uncertainty can be quantified through sampling.
  • results: Experiments show that scANN accurately describes the uncertainty of neural network outputs while nearly matching fully deterministic performance.
    Abstract Probabilistic artificial neural networks offer intriguing prospects for enabling the uncertainty of artificial intelligence methods to be described explicitly in their function; however, the development of techniques that quantify uncertainty by well-understood methods such as Monte Carlo sampling has been limited by the high costs of stochastic sampling on deterministic computing hardware. Emerging computing systems that are amenable to hardware-level probabilistic computing, such as those that leverage stochastic devices, may make probabilistic neural networks more feasible in the not-too-distant future. This paper describes the scANN technique -- \textit{sampling (by coinflips) artificial neural networks} -- which enables neural networks to be sampled directly by treating the weights as Bernoulli coin flips. This method is natively well suited for probabilistic computing techniques that focus on tunable stochastic devices, nearly matches fully deterministic performance while also describing the uncertainty of correct and incorrect neural network outputs.
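A minimal sketch of the coin-flip sampling idea: treat each weight as a Bernoulli variable in {-1, +1}, run repeated stochastic forward passes, and read uncertainty from the spread of outputs. The two-layer architecture and the ±1 encoding are assumptions for illustration; scANN's exact parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_forward(x, weight_probs, n_samples=64):
    """Monte Carlo forward passes with Bernoulli-sampled binary weights.

    weight_probs: list of arrays in [0, 1]; each entry is the probability
    that the corresponding weight flips to +1 (otherwise -1).
    """
    outputs = []
    for _ in range(n_samples):
        h = x
        for i, p in enumerate(weight_probs):
            w = np.where(rng.random(p.shape) < p, 1.0, -1.0)  # coin flips
            h = h @ w
            if i < len(weight_probs) - 1:
                h = np.maximum(h, 0.0)  # ReLU on hidden layers
        outputs.append(h)
    outputs = np.stack(outputs)
    return outputs.mean(axis=0), outputs.std(axis=0)  # prediction, uncertainty

# toy usage: 2-layer network on a random input
probs = [rng.random((4, 8)), rng.random((8, 3))]
mean, std = sample_forward(rng.normal(size=(1, 4)), probs)
```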

DMLR: Data-centric Machine Learning Research – Past, Present and Future

  • paper_url: http://arxiv.org/abs/2311.13028
  • repo_url: None
  • paper_authors: Luis Oala, Manil Maskey, Lilith Bat-Leah, Alicia Parrish, Nezihe Merve Gürel, Tzu-Sheng Kuo, Yang Liu, Rotem Dror, Danilo Brajovic, Xiaozhe Yao, Max Bartolo, William A Gaviria Rojas, Ryan Hileman, Rainier Aliment, Michael W. Mahoney, Meg Risdal, Matthew Lease, Wojciech Samek, Debojyoti Dutta, Curtis G Northcutt, Cody Coleman, Braden Hancock, Bernard Koch, Girmaw Abebe Tadesse, Bojan Karlaš, Ahmed Alaa, Adji Bousso Dieng, Natasha Noy, Vijay Janapa Reddi, James Zou, Praveen Paritosh, Mihaela van der Schaar, Kurt Bollacker, Lora Aroyo, Ce Zhang, Joaquin Vanschoren, Isabelle Guyon, Peter Mattson
  • for: The report outlines the need for next-generation public datasets to advance machine learning science.
  • methods: The report draws on community engagement and infrastructure development to create and maintain these datasets and methods.
  • results: The report charts a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods towards positive scientific, societal and business impact.
    Abstract Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods towards positive scientific, societal and business impact.

Attention: Large Multimodal Model is Watching your Geo-privacy

  • paper_url: http://arxiv.org/abs/2311.13018
  • repo_url: None
  • paper_authors: Yifan Yang, Yixian Zhang, Daoyang Li, Shuju Sun, Junhong Duan, Junzhou He, Qingyang Wu, Hao Liu
  • for: This study examines how personal geographic privacy can be compromised in the era of social media and big data, where the risk of privacy leakage has increased.
  • methods: The study uses a GPT-4 based model, named "Dr. Watson," to analyze and extract geographic information from social media and publicly available data sources.
  • results: Experiments show that "Dr. Watson" can successfully extract specific geographic information, exposing vulnerabilities in current geo-privacy measures and underscoring how easily personal location information can be unintentionally disclosed.
    Abstract Geographic privacy, a crucial aspect of personal security, often goes unnoticed in daily activities. This paper addresses the underestimation of this privacy in the context of increasing online data sharing and the advancements in information gathering technologies. With the surge in the use of Large Multimodal Models, such as GPT-4, for Open Source Intelligence (OSINT), the potential risks associated with geographic privacy breaches have intensified. This study highlights the criticality of these developments, focusing on their implications for individual privacy. The primary objective is to demonstrate the capabilities of advanced AI tools, specifically a GPT-4 based model named "Dr. Watson," in identifying and potentially compromising geographic privacy through online shared content. We developed "Dr. Watson" to analyze and extract geographic information from publicly available data sources. The study involved five experimental cases, each offering different perspectives on the tool's application in extracting precise location data from partial images and social media content. The experiments revealed that "Dr. Watson" could successfully identify specific geographic details, thereby exposing the vulnerabilities in current geo-privacy measures. These findings underscore the ease with which geographic information can be unintentionally disclosed. The paper concludes with a discussion on the broader implications of these findings for individuals and the community at large. It emphasizes the urgency for enhanced awareness and protective measures against geo-privacy leakage in the era of advanced AI and widespread social media usage.

CovarNav: Machine Unlearning via Model Inversion and Covariance Navigation

  • paper_url: http://arxiv.org/abs/2311.12999
  • repo_url: None
  • paper_authors: Ali Abbasi, Chayne Thrash, Elaheh Akbari, Daniel Zhang, Soheil Kolouri
  • for: This work addresses machine unlearning — selectively removing the influence of specific training data points from trained models.
  • methods: A model inversion attack first obtains a proxy for the model's training data; the forget set is then mislabeled with the most probable class that deviates from the ground truth; finally, a gradient projection method minimizes the cross-entropy loss on the mislabeled forget set while preserving the inverted samples.
  • results: Rigorous evaluation on the CIFAR-10 and Vggface2 datasets against recent benchmarks demonstrates the effectiveness of the proposed approach.
    Abstract The rapid progress of AI, combined with its unprecedented public adoption and the propensity of large neural networks to memorize training data, has given rise to significant data privacy concerns. To address these concerns, machine unlearning has emerged as an essential technique to selectively remove the influence of specific training data points on trained models. In this paper, we approach the machine unlearning problem through the lens of continual learning. Given a trained model and a subset of training data designated to be forgotten (i.e., the "forget set"), we introduce a three-step process, named CovarNav, to facilitate this forgetting. Firstly, we derive a proxy for the model's training data using a model inversion attack. Secondly, we mislabel the forget set by selecting the most probable class that deviates from the actual ground truth. Lastly, we deploy a gradient projection method to minimize the cross-entropy loss on the modified forget set (i.e., learn incorrect labels for this set) while preventing forgetting of the inverted samples. We rigorously evaluate CovarNav on the CIFAR-10 and Vggface2 datasets, comparing our results with recent benchmarks in the field and demonstrating the efficacy of our proposed approach.
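A minimal PyTorch sketch of the third step, assuming a GEM-style projection: take a gradient step that learns the wrong labels on the forget set while removing any component that would increase the loss on the inverted (proxy) samples. CovarNav's exact projection may differ; all names here are illustrative.

```python
import torch
import torch.nn.functional as F

def unlearning_step(model, forget_x, wrong_y, retain_x, retain_y, lr=1e-3):
    """One CovarNav-style update: cross-entropy on mislabeled forget data,
    with the gradient projected so it does not conflict with the gradient
    on the model-inverted samples we must not forget."""
    params = [p for p in model.parameters() if p.requires_grad]

    loss_f = F.cross_entropy(model(forget_x), wrong_y)
    g_f = torch.cat([g.flatten() for g in torch.autograd.grad(loss_f, params)])

    loss_r = F.cross_entropy(model(retain_x), retain_y)
    g_r = torch.cat([g.flatten() for g in torch.autograd.grad(loss_r, params)])

    dot = torch.dot(g_f, g_r)
    if dot < 0:  # conflicting directions: drop the component along g_r
        g_f = g_f - dot / g_r.pow(2).sum() * g_r

    offset = 0
    with torch.no_grad():
        for p in params:
            n = p.numel()
            p -= lr * g_f[offset:offset + n].view_as(p)
            offset += n
```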

RLIF: Interactive Imitation Learning as Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.12996
  • repo_url: None
  • paper_authors: Jianlan Luo, Perry Dong, Yuexiang Zhai, Yi Ma, Sergey Levine
  • for: This work proposes a reinforcement-learning-based method for automatic skill acquisition that addresses the distributional shift problem in practical learning-based control, such as robotics.
  • methods: Off-policy reinforcement learning with corrective data collected through online interventions by human experts, addressing the distributional shift challenges that afflict naive behavioral cloning.
  • results: The proposed method strongly outperforms DAgger-like approaches across different tasks, especially when the intervening experts are suboptimal. Code and videos can be found on the project website: rlif-page.github.io.
    Abstract Although reinforcement learning methods offer a powerful framework for automatic skill acquisition, for practical learning-based control problems in domains such as robotics, imitation learning often provides a more convenient and accessible alternative. In particular, an interactive imitation learning method such as DAgger, which queries a near-optimal expert to intervene online to collect correction data for addressing the distributional shift challenges that afflict na\"ive behavioral cloning, can enjoy good performance both in theory and practice without requiring manually specified reward functions and other components of full reinforcement learning methods. In this paper, we explore how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning. Our proposed method uses reinforcement learning with user intervention signals themselves as rewards. This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert. We also provide a unified framework to analyze our RL method and DAgger; for which we present the asymptotic analysis of the suboptimal gap for both methods as well as the non-asymptotic sample complexity bound of our method. We then evaluate our method on challenging high-dimensional continuous control simulation benchmarks as well as real-world robotic vision-based manipulation tasks. The results show that it strongly outperforms DAgger-like approaches across the different tasks, especially when the intervening experts are suboptimal. Code and videos can be found on the project website: rlif-page.github.io
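A minimal sketch of the reward relabeling at the heart of this approach, under the assumption that a step earns reward -1 whenever the human chose to intervene and 0 otherwise; any off-policy RL algorithm can then be trained on the relabeled tuples.

```python
def label_rewards(transitions):
    """Relabel a trajectory with intervention-based rewards: a step receives
    reward -1 if the human intervened at it, 0 otherwise.

    transitions: list of dicts with keys 'obs', 'act', 'next_obs', 'intervened'.
    Returns (obs, act, reward, next_obs) tuples for an off-policy replay buffer.
    """
    labeled = []
    for t in transitions:
        reward = -1.0 if t["intervened"] else 0.0
        labeled.append((t["obs"], t["act"], reward, t["next_obs"]))
    return labeled
```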

NERIF: GPT-4V for Automatic Scoring of Drawn Models

  • paper_url: http://arxiv.org/abs/2311.12990
  • repo_url: None
  • paper_authors: Gyeong-Geon Lee, Xiaoming Zhai
  • for: The paper aims to advance scientific modeling practices by leveraging the powerful image processing capability of GPT-4V to automatically score student-drawn models for science phenomena.
  • methods: The paper developed NERIF (Notation-Enhanced Rubric Instruction for Few-shot Learning), which employs instructional notes and rubrics to prompt GPT-4V to score student-drawn models.
  • results: GPT-4V's average scoring accuracy was .51, with higher accuracy for the 'Beginning' and 'Developing' classes and lower accuracy for the 'Proficient' class. The study also revealed how GPT-4V retrieves information from image input, narrates student-drawn models in natural language, and assigns scores according to the given scoring rubric and instructional notes.
    Abstract Scoring student-drawn models is time-consuming. Recently released GPT-4V provides a unique opportunity to advance scientific modeling practices by leveraging the powerful image processing capability. To test this ability specifically for automatic scoring, we developed a method NERIF (Notation-Enhanced Rubric Instruction for Few-shot Learning) employing instructional note and rubrics to prompt GPT-4V to score students' drawn models for science phenomena. We randomly selected a set of balanced data (N = 900) that includes student-drawn models for six modeling assessment tasks. Each model received a score from GPT-4V ranging at three levels: 'Beginning,' 'Developing,' or 'Proficient' according to scoring rubrics. GPT-4V scores were compared with human experts' scores to calculate scoring accuracy. Results show that GPT-4V's average scoring accuracy was mean =.51, SD = .037. Specifically, average scoring accuracy was .64 for the 'Beginning' class, .62 for the 'Developing' class, and .26 for the 'Proficient' class, indicating that more proficient models are more challenging to score. Further qualitative study reveals how GPT-4V retrieves information from image input, including problem context, example evaluations provided by human coders, and students' drawing models. We also uncovered how GPT-4V catches the characteristics of student-drawn models and narrates them in natural language. At last, we demonstrated how GPT-4V assigns scores to student-drawn models according to the given scoring rubric and instructional notes. Our findings suggest that the NERIF is an effective approach for employing GPT-4V to score drawn models. Even though there is space for GPT-4V to improve scoring accuracy, some mis-assigned scores seemed interpretable to experts. The results of this study show that utilizing GPT-4V for automatic scoring of student-drawn models is promising.
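A small helper, hypothetical but consistent with the evaluation described above, for computing overall and per-rubric-level agreement between GPT-4V scores and human expert scores:

```python
from collections import defaultdict

def per_class_accuracy(human_scores, model_scores):
    """Scoring accuracy overall and per rubric level ('Beginning',
    'Developing', 'Proficient'), comparing model scores to human experts."""
    totals, hits = defaultdict(int), defaultdict(int)
    for h, m in zip(human_scores, model_scores):
        totals[h] += 1
        hits[h] += int(h == m)
    per_class = {c: hits[c] / totals[c] for c in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_class

human = ["Beginning", "Developing", "Proficient", "Developing"]
model = ["Beginning", "Developing", "Beginning", "Developing"]
print(per_class_accuracy(human, model))  # (0.75, per-level accuracies)
```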

Unsupervised Graph Attention Autoencoder for Attributed Networks using K-means Loss

  • paper_url: http://arxiv.org/abs/2311.12986
  • repo_url: None
  • paper_authors: Abdelfateh Bekkaira, Slimane Bellaouar, Slimane Oulad-Naoui
  • for: This work proposes a simple, efficient, clustering-oriented model based on an unsupervised graph attention autoencoder for community detection in attributed networks.
  • methods: The model learns representations from both the network's topology and attribute information, jointly addressing reconstruction and community discovery with an emphasis on compact communities. It uses k-means as an objective function and a multi-head graph attention autoencoder for decoding.
  • results: Experiments on three attributed-network datasets show that the method surpasses state-of-the-art algorithms in terms of NMI and ARI and scales effectively with network size. These results have implications for uncovering fundamental community structure in domains such as biological and social networks.
    Abstract Several natural phenomena and complex systems are often represented as networks. Discovering their community structure is a fundamental task for understanding these networks. Many algorithms have been proposed, but recently, Graph Neural Networks (GNN) have emerged as a compelling approach for enhancing this task.In this paper, we introduce a simple, efficient, and clustering-oriented model based on unsupervised \textbf{G}raph Attention \textbf{A}uto\textbf{E}ncoder for community detection in attributed networks (GAECO). The proposed model adeptly learns representations from both the network's topology and attribute information, simultaneously addressing dual objectives: reconstruction and community discovery. It places a particular emphasis on discovering compact communities by robustly minimizing clustering errors. The model employs k-means as an objective function and utilizes a multi-head Graph Attention Auto-Encoder for decoding the representations. Experiments conducted on three datasets of attributed networks show that our method surpasses state-of-the-art algorithms in terms of NMI and ARI. Additionally, our approach scales effectively with the size of the network, making it suitable for large-scale applications. The implications of our findings extend beyond biological network interpretation and social network analysis, where knowledge of the fundamental community structure is essential.
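A minimal PyTorch sketch of the joint objective, assuming the encoder yields node embeddings z and the decoder reconstructs the adjacency matrix; the weighting alpha and the centroid update schedule are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kmeans_loss(z, centroids):
    """k-means objective on latent codes: squared distance from each node
    embedding to its nearest centroid, encouraging compact communities."""
    d = torch.cdist(z, centroids)            # (n_nodes, k)
    return d.min(dim=1).values.pow(2).mean()

def total_loss(adj, adj_hat, z, centroids, alpha=1.0):
    """Joint objective: adjacency reconstruction + clustering penalty."""
    recon = F.binary_cross_entropy(adj_hat, adj)
    return recon + alpha * kmeans_loss(z, centroids)
```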

GAIA: a benchmark for General AI Assistants

  • paper_url: http://arxiv.org/abs/2311.12983
  • repo_url: None
  • paper_authors: Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom
  • for: The paper proposes GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research.
  • methods: GAIA uses real-world questions that test fundamental abilities such as reasoning, multi-modality handling, web browsing, and general tool-use proficiency.
  • results: Human respondents answer GAIA questions correctly 92% of the time, versus 15% for GPT-4 equipped with plugins. This disparity contrasts with the recent trend of large language models (LLMs) outperforming humans on tasks requiring professional skills, e.g. in law or chemistry.
    Abstract We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92\% vs. 15\% for GPT-4 equipped with plugins. This notable performance disparity contrasts with the recent trend of LLMs outperforming humans on tasks requiring professional skills in e.g. law or chemistry. GAIA's philosophy departs from the current trend in AI benchmarks suggesting to target tasks that are ever more difficult for humans. We posit that the advent of Artificial General Intelligence (AGI) hinges on a system's capability to exhibit similar robustness as the average human does on such questions. Using GAIA's methodology, we devise 466 questions and their answer. We release our questions while retaining answers to 300 of them to power a leader-board available at https://huggingface.co/gaia-benchmark.

Neural Approximate Dynamic Programming for the Ultra-fast Order Dispatching Problem

  • paper_url: http://arxiv.org/abs/2311.12975
  • repo_url: None
  • paper_authors: Arash Dehghan, Mucahit Cevik, Merve Bodur
  • for: The paper aims to enhance the operational efficiency of same-day delivery (SDD) services by solving the ultra-fast order dispatching problem (ODP) within a centralized warehouse setting.
  • methods: The paper introduces extensions to the ultra-fast ODP, such as order batching and explicit courier assignments, and uses NeurADP (a combination of Approximate Dynamic Programming and Deep Reinforcement Learning) as the solution method.
  • results: NeurADP significantly outperforms myopic and DRL baselines, and the inclusion of order batching and courier queues enhances the efficiency of delivery operations. Detailed sensitivity analysis confirms the robustness of NeurADP under different scenarios.
    Abstract Same-Day Delivery (SDD) services aim to maximize the fulfillment of online orders while minimizing delivery delays but are beset by operational uncertainties such as those in order volumes and courier planning. Our work aims to enhance the operational efficiency of SDD by focusing on the ultra-fast Order Dispatching Problem (ODP), which involves matching and dispatching orders to couriers within a centralized warehouse setting, and completing the delivery within a strict timeline (e.g., within minutes). We introduce important extensions to ultra-fast ODP such as order batching and explicit courier assignments to provide a more realistic representation of dispatching operations and improve delivery efficiency. As a solution method, we primarily focus on NeurADP, a methodology that combines Approximate Dynamic Programming (ADP) and Deep Reinforcement Learning (DRL), and our work constitutes the first application of NeurADP outside of the ride-pool matching problem. NeurADP is particularly suitable for ultra-fast ODP as it addresses complex one-to-many matching and routing intricacies through a neural network-based VFA that captures high-dimensional problem dynamics without requiring manual feature engineering as in generic ADP methods. We test our proposed approach using four distinct realistic datasets tailored for ODP and compare the performance of NeurADP against myopic and DRL baselines by also making use of non-trivial bounds to assess the quality of the policies. Our numerical results indicate that the inclusion of order batching and courier queues enhances the efficiency of delivery operations and that NeurADP significantly outperforms other methods. Detailed sensitivity analysis with important parameters confirms the robustness of NeurADP under different scenarios, including variations in courier numbers, spatial setup, vehicle capacity, and permitted delay time.
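A minimal sketch of the core ADP dispatching decision, assuming SciPy: score each order-courier pair by immediate reward plus a neural value-function estimate of the downstream state, then pick the best matching. The real NeurADP additionally handles batching and one-to-many assignments.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dispatch(order_courier_value):
    """One dispatching decision: given a matrix of estimated values
    (immediate reward + value-function estimate of the resulting state)
    for assigning each order to each courier, pick the best one-to-one
    matching."""
    rows, cols = linear_sum_assignment(order_courier_value, maximize=True)
    return list(zip(rows, cols))

# toy usage: 3 orders, 4 couriers, random value estimates
values = np.random.default_rng(1).normal(size=(3, 4))
print(dispatch(values))
```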

Clustered Policy Decision Ranking

  • paper_url: http://arxiv.org/abs/2311.12970
  • repo_url: None
  • paper_authors: Mark Levin, Hana Chockler
  • for: This paper aims to rank the importance of decisions made by policies trained via reinforcement learning (RL).
  • methods: The paper proposes a black-box method based on statistical covariance estimation that clusters the states of the environment and ranks each cluster according to the importance of the decisions made in its states.
  • results: The method accurately infers the importance of different decisions made by a trained policy and is more accurate than a previous statistical-fault-localization-based ranking procedure.
    Abstract Policies trained via reinforcement learning (RL) are often very complex even for simple tasks. In an episode with n time steps, a policy will make n decisions on actions to take, many of which may appear non-intuitive to the observer. Moreover, it is not clear which of these decisions directly contribute towards achieving the reward and how significant their contribution is. Given a trained policy, we propose a black-box method based on statistical covariance estimation that clusters the states of the environment and ranks each cluster according to the importance of decisions made in its states. We compare our measure against a previous statistical fault localization based ranking procedure.
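A hypothetical sketch of the ranking idea, assuming scikit-learn: cluster visited states, then score each cluster by the covariance between "the policy's action was followed here" and episode return. The paper's exact estimator may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def rank_clusters(states, followed_policy, returns, k=5):
    """Cluster visited states and rank clusters by how strongly following
    the policy's decisions in that cluster co-varies with episode return.

    states:          (n, d) array of visited states
    followed_policy: (n,) 0/1 flags (1 = the policy's action was taken)
    returns:         (n,) return of the episode each visit belongs to
    """
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(states)
    scores = {}
    for c in range(k):
        m = labels == c
        scores[c] = float(np.cov(followed_policy[m], returns[m])[0, 1])
    return sorted(scores.items(), key=lambda kv: -kv[1])  # most important first
```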

Robustifying Generalizable Implicit Shape Networks with a Tunable Non-Parametric Model

  • paper_url: http://arxiv.org/abs/2311.12967
  • repo_url: https://github.com/Ouasfi/Feat-NKRR-adaptation
  • paper_authors: Amine Ouasfi, Adnane Boukhayma
  • for: Implicit shape reconstruction from unoriented point clouds.
  • methods: Feedforward generalizable models whose inter-shape data prior is combined, at test time, with an intra-shape regularization prior from a Nyström Kernel Ridge Regression whose hyperparameters are fitted to the current shape.
  • results: Improved performance and efficiency with an adaptive expressiveness-robustness trade-off, demonstrated on synthetic and real data.
    Abstract Feedforward generalizable models for implicit shape reconstruction from unoriented point cloud present multiple advantages, including high performance and inference speed. However, they still suffer from generalization issues, ranging from underfitting the input point cloud, to misrepresenting samples outside of the training data distribution, or with toplogies unseen at training. We propose here an efficient mechanism to remedy some of these limitations at test time. We combine the inter-shape data prior of the network with an intra-shape regularization prior of a Nystr\"om Kernel Ridge Regression, that we further adapt by fitting its hyperprameters to the current shape. The resulting shape function defined in a shape specific Reproducing Kernel Hilbert Space benefits from desirable stability and efficiency properties and grants a shape adaptive expressiveness-robustness trade-off. We demonstrate the improvement obtained through our method with respect to baselines and the state-of-the-art using synthetic and real data.
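A minimal NumPy sketch of the intra-shape prior: a Nyström kernel ridge regression fitted to signed-distance samples of the current shape. The RBF kernel, landmark choice, and hyperparameters are illustrative assumptions.

```python
import numpy as np

def nystrom_krr_fit(X, y, landmarks, gamma=10.0, lam=1e-3):
    """Nyström kernel ridge regression with an RBF kernel: approximate the
    full kernel with m landmark points, then solve the ridge system in the
    m-dimensional subspace. X: (n, d) points, y: (n,) SDF samples."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    Knm = rbf(X, landmarks)                 # (n, m)
    Kmm = rbf(landmarks, landmarks)         # (m, m)
    alpha = np.linalg.solve(Knm.T @ Knm + lam * Kmm, Knm.T @ y)
    return lambda Q: rbf(Q, landmarks) @ alpha  # SDF predictor

# toy usage: regress a sphere SDF from samples
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
y = np.linalg.norm(X, axis=1) - 0.5
f = nystrom_krr_fit(X, y, landmarks=X[:50])
```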

Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection

  • paper_url: http://arxiv.org/abs/2311.12956
  • repo_url: https://github.com/sashamatsun/lskdiffdet
  • paper_authors: Ahmed Sharshar, Aleksandr Matsun
  • for: This paper focuses on improving the accuracy and efficiency of object detection in aerial images, with a specific emphasis on small objects and diverse orientations.
  • methods: The proposed approach uses the Large Selective Kernel Network (LSKNet) as the backbone combined with the DiffusionDet head, along with several novel methodologies and ablation studies to refine the model's performance.
  • results: The proposed model achieves a mean average precision (mAP) of approximately 45.7%, outperforming the RCNN model by 4.7% on the same dataset, a significant improvement in object detection accuracy and efficiency.
    Abstract In the realm of aerial image analysis, object detection plays a pivotal role, with significant implications for areas such as remote sensing, urban planning, and disaster management. This study addresses the inherent challenges in this domain, notably the detection of small objects, managing densely packed elements, and accounting for diverse orientations. We present an in-depth evaluation of an object detection model that integrates the Large Selective Kernel Network (LSKNet)as its backbone with the DiffusionDet head, utilizing the iSAID dataset for empirical analysis. Our approach encompasses the introduction of novel methodologies and extensive ablation studies. These studies critically assess various aspects such as loss functions, box regression techniques, and classification strategies to refine the model's precision in object detection. The paper details the experimental application of the LSKNet backbone in synergy with the DiffusionDet heads, a combination tailored to meet the specific challenges in aerial image object detection. The findings of this research indicate a substantial enhancement in the model's performance, especially in the accuracy-time tradeoff. The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement, outperforming the RCNN model by 4.7% on the same dataset. This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis, paving the way for more accurate and efficient object detection methodologies. The code is publicly available at https://github.com/SashaMatsun/LSKDiffDet

PINNs-Based Uncertainty Quantification for Transient Stability Analysis

  • paper_url: http://arxiv.org/abs/2311.12947
  • repo_url: None
  • paper_authors: Ren Wang, Ming Zhong, Kaidi Xu, Lola Giráldez Sánchez-Cortés, Ignacio de Cominges Guerra
  • for: This paper addresses transient stability challenges in power systems with missing parameters and uncertainty propagation in swing equations.
  • methods: The authors introduce an Ensemble of Physics-Informed Neural Networks (E-PINNs) to estimate critical parameters such as the rotor angle and inertia coefficient with enhanced accuracy and reduced computational load; E-PINNs exploit the underlying physics of the swing equations to provide a robust solution.
  • results: The approach not only enables efficient parameter estimation but also quantifies uncertainties, delivering probabilistic insights into system behavior. Analysis of $1$-bus and $2$-bus systems demonstrates the model's robustness to parameter variability and data scarcity, paving the way for reliable and computationally efficient transient stability analysis.
    Abstract This paper addresses the challenge of transient stability in power systems with missing parameters and uncertainty propagation in swing equations. We introduce a novel application of Physics-Informed Neural Networks (PINNs), specifically an Ensemble of PINNs (E-PINNs), to estimate critical parameters like rotor angle and inertia coefficient with enhanced accuracy and reduced computational load. E-PINNs capitalize on the underlying physical principles of swing equations to provide a robust solution. Our approach not only facilitates efficient parameter estimation but also quantifies uncertainties, delivering probabilistic insights into the system behavior. The efficacy of E-PINNs is demonstrated through the analysis of $1$-bus and $2$-bus systems, highlighting the model's ability to handle parameter variability and data scarcity. The study advances the application of machine learning in power system stability, paving the way for reliable and computationally efficient transient stability analysis.
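A minimal PyTorch sketch of the physics residual a PINN would minimize for the single-machine swing equation $M\ddot{\delta} + D\dot{\delta} = P_m - P_{max}\sin\delta$; an E-PINN trains an ensemble of such networks and reads uncertainty from their spread. Parameter names and the single-machine form are illustrative.

```python
import torch

def swing_residual(net, t, M, D, Pm, Pmax):
    """Physics residual of the swing equation
        M * d2(delta)/dt2 + D * d(delta)/dt - (Pm - Pmax * sin(delta)) = 0,
    evaluated at collocation times t. net maps t -> rotor angle delta."""
    t = t.requires_grad_(True)
    delta = net(t)
    d1 = torch.autograd.grad(delta.sum(), t, create_graph=True)[0]
    d2 = torch.autograd.grad(d1.sum(), t, create_graph=True)[0]
    return M * d2 + D * d1 - (Pm - Pmax * torch.sin(delta))

# PINN loss = data mismatch + mean squared residual; an E-PINN repeats this
# training across an ensemble and reports the spread as uncertainty.
```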

DroneOptiNet: A Framework for Optimal Drone-based Load Redistribution Mechanism for 5G and Beyond Solar Small Cell Networks

  • paper_url: http://arxiv.org/abs/2311.12944
  • repo_url: None
  • paper_authors: Daksh Dave, Vinay Chamola, Sandeep Joshi, Sherali Zeadally
  • for: Improving the reliability and resilience of wireless communication systems.
  • methods: Drone-mounted aerial base stations (BS) redistribute power across the micro-grid network of green small-cell BSs, with an evolutionary neural network combined with long short-term memory managing energy and load redistribution.
  • results: The approach reduces power outages at base stations and maintains consistent throughput stability, improving the reliability and robustness of wireless communication systems.
    Abstract The power requirements posed by the fifth-generation and beyond cellular networks are an important constraint in network deployment and require energy-efficient solutions. In this work, we propose a novel user load transfer approach using airborne base stations (BS), mounted on drones, for reliable and secure power redistribution across the micro-grid network comprising green small cell BSs. Depending on the user density and the availability of an aerial BS, the energy requirement of a cell with an energy deficit is accommodated by migrating the aerial BS from a high-energy to a low-energy cell. The proposed hybrid drone-based framework integrates long short-term memory with unique cost functions using an evolutionary neural network for drones and BSs, and efficiently manages energy and load redistribution. The proposed algorithm reduces power outages at BSs and maintains consistent throughput stability, thereby demonstrating its capability to boost the reliability and robustness of wireless communication systems.

InteRACT: Transformer Models for Human Intent Prediction Conditioned on Robot Actions

  • paper_url: http://arxiv.org/abs/2311.12943
  • repo_url: None
  • paper_authors: Kushal Kedia, Atiksh Bhardwaj, Prithwish Dan, Sanjiban Choudhury
  • for: This work addresses the problem of predicting human intent in collaborative human-robot manipulation so that tasks can be executed smoothly.
  • methods: The paper proposes a new architecture, InteRACT, that exploits a correspondence between human and robot actions to transfer-learn a conditional intent prediction model from large human-human interaction data, fine-tuned on a small human-robot dataset.
  • results: On a set of real-world collaborative human-robot manipulation tasks, the conditional model outperforms various marginal prediction baselines. The authors also introduce new techniques to tele-operate a 7-DoF robot arm and collect a diverse human-robot collaborative manipulation dataset, which they open-source.
    Abstract In collaborative human-robot manipulation, a robot must predict human intents and adapt its actions accordingly to smoothly execute tasks. However, the human's intent in turn depends on actions the robot takes, creating a chicken-or-egg problem. Prior methods ignore such inter-dependency and instead train marginal intent prediction models independent of robot actions. This is because training conditional models is hard given a lack of paired human-robot interaction datasets. Can we instead leverage large-scale human-human interaction data that is more easily accessible? Our key insight is to exploit a correspondence between human and robot actions that enables transfer learning from human-human to human-robot data. We propose a novel architecture, InteRACT, that pre-trains a conditional intent prediction model on large human-human datasets and fine-tunes on a small human-robot dataset. We evaluate on a set of real-world collaborative human-robot manipulation tasks and show that our conditional model improves over various marginal baselines. We also introduce new techniques to tele-operate a 7-DoF robot arm and collect a diverse range of human-robot collaborative manipulation data, which we open-source.

Intrinsic Image Decomposition via Ordinal Shading

  • paper_url: http://arxiv.org/abs/2311.12792
  • repo_url: https://github.com/compphoto/Intrinsic
  • paper_authors: Chris Careaga, Yağız Aksoy
  • for: High-resolution intrinsic decomposition for inverse rendering and computational photography pipelines.
  • methods: The problem is split in two: a dense ordinal shading representation is first estimated using a shift- and scale-invariant loss, and a second network then combines low- and high-resolution ordinal estimates into a shading estimate with both global coherency and local detail.
  • results: By computing losses on the estimated shading and on the albedo implied by the intrinsic model, the model learns highly accurate intrinsic decompositions, enabling otherwise difficult editing tasks such as recoloring and relighting in the wild.
    Abstract Intrinsic decomposition is a fundamental mid-level vision problem that plays a crucial role in various inverse rendering and computational photography pipelines. Generating highly accurate intrinsic decompositions is an inherently under-constrained task that requires precisely estimating continuous-valued shading and albedo. In this work, we achieve high-resolution intrinsic decomposition by breaking the problem into two parts. First, we present a dense ordinal shading formulation using a shift- and scale-invariant loss in order to estimate ordinal shading cues without restricting the predictions to obey the intrinsic model. We then combine low- and high-resolution ordinal estimations using a second network to generate a shading estimate with both global coherency and local details. We encourage the model to learn an accurate decomposition by computing losses on the estimated shading as well as the albedo implied by the intrinsic model. We develop a straightforward method for generating dense pseudo ground truth using our model's predictions and multi-illumination data, enabling generalization to in-the-wild imagery. We present an exhaustive qualitative and quantitative analysis of our predicted intrinsic components against state-of-the-art methods. Finally, we demonstrate the real-world applicability of our estimations by performing otherwise difficult editing tasks such as recoloring and relighting.
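A minimal PyTorch sketch of a shift- and scale-invariant loss of the kind used for ordinal shading: align the prediction to the target with the least-squares scale and shift, then penalize the residual. The paper's exact loss may differ.

```python
import torch

def scale_shift_invariant_loss(pred, target, eps=1e-6):
    """MSE after least-squares alignment of pred to target with one global
    scale s and shift b; invariant to the unknown scale/offset of an
    ordinal shading prediction."""
    p, t = pred.flatten(), target.flatten()
    pc = p - p.mean()
    s = (pc * (t - t.mean())).sum() / (pc.pow(2).sum() + eps)  # scale
    b = t.mean() - s * p.mean()                                # shift
    return ((s * p + b - t).pow(2)).mean()
```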

Quantifying Impairment and Disease Severity Using AI Models Trained on Healthy Subjects

  • paper_url: http://arxiv.org/abs/2311.12781
  • repo_url: https://github.com/fishneck/cobra
  • paper_authors: Boyang Yu, Aakash Kaku, Kangning Liu, Avinash Parnandi, Emily Fokas, Anita Venkatesan, Natasha Pandit, Rajesh Ranganath, Heidi Schambra, Carlos Fernandez-Granda
  • for: This work proposes a new way to quantify impairment and disease severity using AI models trained exclusively on healthy individuals, enabling better assessment of patients.
  • methods: The COnfidence-Based chaRacterization of Anomalies (COBRA) score exploits the decrease in confidence of models trained on healthy subjects when presented with impaired or diseased patients, quantifying their deviation from the healthy population.
  • results: The COBRA score is strongly correlated with the gold-standard Fugl-Meyer Assessment (FMA) on an independent test cohort for two data modalities, wearable sensors ($\rho = 0.845$, 95% CI [0.743, 0.908]) and video ($\rho = 0.746$, 95% CI [0.594, 0.847]). Applied to quantifying the severity of knee osteoarthritis from magnetic-resonance imaging scans, it again correlates significantly with an independent clinical assessment ($\rho = 0.644$, 95% CI [0.585, 0.696]).
    Abstract Automatic assessment of impairment and disease severity is a key challenge in data-driven medicine. We propose a novel framework to address this challenge, which leverages AI models trained exclusively on healthy individuals. The COnfidence-Based chaRacterization of Anomalies (COBRA) score exploits the decrease in confidence of these models when presented with impaired or diseased patients to quantify their deviation from the healthy population. We applied the COBRA score to address a key limitation of current clinical evaluation of upper-body impairment in stroke patients. The gold-standard Fugl-Meyer Assessment (FMA) requires in-person administration by a trained assessor for 30-45 minutes, which restricts monitoring frequency and precludes physicians from adapting rehabilitation protocols to the progress of each patient. The COBRA score, computed automatically in under one minute, is shown to be strongly correlated with the FMA on an independent test cohort for two different data modalities: wearable sensors ($\rho = 0.845$, 95% CI [0.743,0.908]) and video ($\rho = 0.746$, 95% C.I [0.594, 0.847]). To demonstrate the generalizability of the approach to other conditions, the COBRA score was also applied to quantify severity of knee osteoarthritis from magnetic-resonance imaging scans, again achieving significant correlation with an independent clinical assessment ($\rho = 0.644$, 95% C.I [0.585,0.696]).
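A minimal sketch of the COBRA idea, under the assumption that per-sample confidence is the healthy-trained model's max softmax probability: the score is one minus the mean confidence over a patient's samples.

```python
import numpy as np

def cobra_score(confidences):
    """COBRA-style anomaly score: one minus the mean confidence that a
    model trained only on healthy subjects assigns to its predictions for
    a given patient; larger values indicate larger deviation from the
    healthy population.

    confidences: per-sample confidences (e.g., max softmax probabilities).
    """
    return 1.0 - float(np.mean(confidences))

# toy usage: aggregate a patient's per-frame confidences
patient_conf = np.array([0.93, 0.41, 0.55, 0.62])
print(cobra_score(patient_conf))  # 0.3725
```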

SPOT! Revisiting Video-Language Models for Event Understanding

  • paper_url: http://arxiv.org/abs/2311.12919
  • repo_url: None
  • paper_authors: Gengyuan Zhang, Jinhe Bi, Jindong Gu, Volker Tresp
  • for: Studying video understanding in multimodal learning.
  • methods: Video-language models are pre-trained on large-scale web-crawled video-text pairs as weak supervision, which has shown remarkable potential on video understanding tasks.
  • results: The authors find that existing video-language models fail to distinguish fine-grained event differences, because video-text pairs usually contain only broad-level descriptions. To address this, they propose SPOT Prober to benchmark the event understanding of existing video-language models, and show that plugging the manipulated event captions in as hard negatives improves models' event understanding.
    Abstract Understanding videos is an important research topic for multimodal learning. Leveraging large-scale datasets of web-crawled video-text pairs as weak supervision has become a pre-training paradigm for learning joint representations and showcased remarkable potential in video understanding tasks. However, videos can be multi-event and multi-grained, while these video-text pairs usually contain only broad-level video captions. This raises a question: with such weak supervision, can video representation in video-language models gain the ability to distinguish even factual discrepancies in textual description and understand fine-grained events? To address this, we introduce SPOT Prober, to benchmark existing video-language models's capacities of distinguishing event-level discrepancies as an indicator of models' event understanding ability. Our approach involves extracting events as tuples () from videos and generating false event tuples by manipulating tuple components systematically. We reevaluate the existing video-language models with these positive and negative captions and find they fail to distinguish most of the manipulated events. Based on our findings, we propose to plug in these manipulated event captions as hard negative samples and find them effective in enhancing models for event understanding.
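A minimal sketch of how false event tuples can be generated by systematically swapping one tuple component, as the probing strategy above describes; the slot names and vocabulary are illustrative.

```python
import random

def make_false_tuples(event, vocab, n=3, seed=0):
    """Generate hard-negative event tuples by swapping one component of a
    (subject, predicate, object, ...) tuple for a different item from the
    vocabulary of that slot."""
    rng = random.Random(seed)
    negatives = []
    slots = list(event.keys())
    for _ in range(n):
        slot = rng.choice(slots)
        candidates = [v for v in vocab[slot] if v != event[slot]]
        fake = dict(event)
        fake[slot] = rng.choice(candidates)
        negatives.append(fake)
    return negatives

event = {"subject": "dog", "predicate": "chases", "object": "ball"}
vocab = {"subject": ["dog", "cat"], "predicate": ["chases", "drops"],
         "object": ["ball", "stick"]}
print(make_false_tuples(event, vocab))
```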

Orchard: building large cancer phylogenies using stochastic combinatorial search

  • paper_url: http://arxiv.org/abs/2311.12917
  • repo_url: None
  • paper_authors: E. Kulman, R. Kuang, Q. Morris
  • for: 这个论文旨在描述用于描述癌症发展的演化历史,并且提供有用的信息以 inform cancer treatment。
  • methods: 这篇论文使用了点变化检测结果来构建癌症演化树。
  • results: 这篇论文的结果表明,Orchard算法可以准确地重建癌症演化树,并且在90个 simulations和14个B-ALL例子中表现更加稳定和准确。
    Abstract Phylogenies depicting the evolutionary history of genetically heterogeneous subpopulations of cells from the same cancer i.e., cancer phylogenies, provide useful insights about cancer development and inform treatment. Cancer phylogenies can be reconstructed using data obtained from bulk DNA sequencing of multiple tissue samples from the same cancer. We introduce Orchard, a fast algorithm that reconstructs cancer phylogenies using point mutations detected in bulk DNA sequencing data. Orchard constructs cancer phylogenies progressively, one point mutation at a time, ultimately sampling complete phylogenies from a posterior distribution implied by the bulk DNA data. Orchard reconstructs more plausible phylogenies than state-of-the-art cancer phylogeny reconstruction methods on 90 simulated cancers and 14 B-progenitor acute lymphoblastic leukemias (B-ALLs). These results demonstrate that Orchard accurately reconstructs cancer phylogenies with up to 300 mutations. We then introduce a simple graph based clustering algorithm that uses a reconstructed phylogeny to infer unique groups of mutations i.e., mutation clusters, that characterize the genetic differences between cancer cell populations, and show that this approach is competitive with state-of-the-art mutation clustering methods.

Digital Twin Framework for Optimal and Autonomous Decision-Making in Cyber-Physical Systems: Enhancing Reliability and Adaptability in the Oil and Gas Industry

  • paper_url: http://arxiv.org/abs/2311.12755
  • repo_url: None
  • paper_authors: Carine Menezes Rebello, Johannes Jäschkea, Idelfonso B. R. Nogueira
  • for: This work proposes a digital twin framework for optimal and autonomous decision-making, applied to a gas-lift process in the oil and gas industry, focusing on enhancing the robustness and adaptability of the digital twin.
  • methods: The framework combines Bayesian inference, Monte Carlo simulations, transfer learning, online learning, and novel strategies to confer cognition to the digital twin, including model hyperdimensional reduction and cognitive tack.
  • results: The result is an efficient, reliable, and trustworthy framework for digital twin identification that adapts to changing environments and incorporates prediction uncertainty, enhancing the overall decision-making process.
    Abstract The concept of creating a virtual copy of a complete Cyber-Physical System opens up numerous possibilities, including real-time assessments of the physical environment and continuous learning from the system to provide reliable and precise information. This process, known as the twinning process or the development of a digital twin (DT), has been widely adopted across various industries. However, challenges arise when considering the computational demands of implementing AI models, such as those employed in digital twins, in real-time information exchange scenarios. This work proposes a digital twin framework for optimal and autonomous decision-making applied to a gas-lift process in the oil and gas industry, focusing on enhancing the robustness and adaptability of the DT. The framework combines Bayesian inference, Monte Carlo simulations, transfer learning, online learning, and novel strategies to confer cognition to the DT, including model hyperdimensional reduction and cognitive tack. Consequently, creating a framework for efficient, reliable, and trustworthy DT identification was possible. The proposed approach addresses the current gap in the literature regarding integrating various learning techniques and uncertainty management in digital twin strategies. This digital twin framework aims to provide a reliable and efficient system capable of adapting to changing environments and incorporating prediction uncertainty, thus enhancing the overall decision-making process in complex, real-world scenarios. Additionally, this work lays the foundation for further developments in digital twins for process systems engineering, potentially fostering new advancements and applications across various industrial sectors.
    摘要 创建完整信息物理系统(Cyber-Physical System)虚拟副本的概念带来了诸多可能性,包括对物理环境的实时评估,以及从系统中持续学习以提供可靠、精确的信息。这一过程被称为孪生过程,即数字孪生(DT)的开发,已在各行业得到广泛应用。然而,在实时信息交换场景中运行人工智能模型(例如数字孪生中采用的模型)的计算需求带来了挑战。本工作提出了一个面向油气行业气举过程的最优自主决策数字孪生框架,着重提升数字孪生的鲁棒性与适应性。该框架结合贝叶斯推断、蒙特卡洛仿真、迁移学习、在线学习以及赋予数字孪生认知能力的新策略(包括模型超维度约简与认知调整),从而构建出高效、可靠、可信的数字孪生辨识框架。所提方法填补了现有文献中关于在数字孪生策略中整合多种学习技术与不确定性管理的空白。该数字孪生框架旨在提供一个可靠、高效、能适应变化环境并纳入预测不确定性的系统,从而提升复杂真实场景中的整体决策过程。此外,本工作为过程系统工程中数字孪生的进一步发展奠定了基础,有望推动各工业领域的新进展与应用。
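A minimal sketch of the Bayesian-plus-Monte-Carlo pattern the framework builds on: infer a posterior over a simulator parameter from observations, then propagate samples through the simulator to obtain a prediction interval. The one-parameter "gas-lift surrogate" and all numbers are toy assumptions.

```python
# Grid-based Bayesian inference followed by Monte Carlo uncertainty propagation.
import numpy as np

rng = np.random.default_rng(0)
g = lambda theta, u: theta * u          # toy surrogate: production rate vs. input u
true_theta, noise = 2.0, 0.1
u_obs = np.linspace(0.5, 1.5, 20)
y_obs = g(true_theta, u_obs) + rng.normal(0, noise, u_obs.size)

# Posterior over theta on a grid (flat prior, Gaussian likelihood).
thetas = np.linspace(1.0, 3.0, 501)
loglik = np.array([np.sum(-0.5 * ((y_obs - g(t, u_obs)) / noise) ** 2)
                   for t in thetas])
post = np.exp(loglik - loglik.max())
post /= post.sum()

# Monte Carlo propagation: prediction interval at a new operating point.
samples = rng.choice(thetas, size=5000, p=post)
pred = g(samples, 1.2) + rng.normal(0, noise, samples.size)
print(np.percentile(pred, [2.5, 50, 97.5]))
```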

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

  • paper_url: http://arxiv.org/abs/2311.12754
  • repo_url: https://github.com/huang-yh/selfocc
  • paper_authors: Yuanhui Huang, Wenzhao Zheng, Borui Zhang, Jie Zhou, Jiwen Lu
  • for: 提升以视觉为中心的自动驾驶系统的鲁棒性,预测周围3D空间中每个点是否被占用。
  • methods: 提出SelfOcc,一种仅利用视频序列学习3D占用的自监督方法。
  • results: 在SemanticKITTI上以单帧输入相比此前最佳方法SceneRF提升58.7%,并在nuScenes上为环视相机生成合理的3D占用,同时产生高质量的深度估计。
    Abstract 3D occupancy prediction is an important task for the robustness of vision-centric autonomous driving, which aims to predict whether each point is occupied in the surrounding 3D space. Existing methods usually require 3D occupancy labels to produce meaningful results. However, it is very laborious to annotate the occupancy status of each voxel. In this paper, we propose SelfOcc to explore a self-supervised way to learn 3D occupancy using only video sequences. We first transform the images into the 3D space (e.g., bird's eye view) to obtain 3D representation of the scene. We directly impose constraints on the 3D representations by treating them as signed distance fields. We can then render 2D images of previous and future frames as self-supervision signals to learn the 3D representations. We propose an MVS-embedded strategy to directly optimize the SDF-induced weights with multiple depth proposals. Our SelfOcc outperforms the previous best method SceneRF by 58.7% using a single frame as input on SemanticKITTI and is the first self-supervised work that produces reasonable 3D occupancy for surround cameras on nuScenes. SelfOcc produces high-quality depth and achieves state-of-the-art results on novel depth synthesis, monocular depth estimation, and surround-view depth estimation on the SemanticKITTI, KITTI-2015, and nuScenes, respectively. Code: https://github.com/huang-yh/SelfOcc.
    摘要 3D占用预测是以视觉为中心的自动驾驶鲁棒性的重要任务,其目标是预测周围3D空间中每个点是否被占用。现有方法通常需要3D占用标签才能产生有意义的结果,然而为每个体素标注占用状态十分费力。本文提出SelfOcc,探索一种仅利用视频序列学习3D占用的自监督方式。我们首先将图像变换到3D空间(如鸟瞰视图)以获得场景的3D表示,并将这些3D表示视为符号距离场(SDF)直接施加约束。随后,我们将前后帧渲染为2D图像作为自监督信号来学习3D表示,并提出一种内嵌MVS的策略,利用多个深度候选直接优化SDF导出的权重。SelfOcc在SemanticKITTI上以单帧输入相比此前最佳方法SceneRF提升58.7%,并且是首个能为nuScenes环视相机生成合理3D占用的自监督工作。SelfOcc产生高质量深度,并分别在SemanticKITTI、KITTI-2015和nuScenes上的新视角深度合成、单目深度估计与环视深度估计任务中取得最先进结果。代码见 https://github.com/huang-yh/SelfOcc 。
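A NeuS-style sketch of how SDF samples along a camera ray can be converted to compositing weights for self-supervised rendering. This illustrates the SDF-to-rendering idea in general; SelfOcc's exact formulation may differ, and the scale parameter `s` and ray setup are assumptions.

```python
# Convert signed distances along one ray into volume-rendering weights.
import numpy as np

def sdf_to_weights(sdf, s=10.0):
    """sdf: signed distances at consecutive samples along one ray."""
    phi = 1.0 / (1.0 + np.exp(-s * sdf))            # sigmoid of scaled SDF
    alpha = np.clip((phi[:-1] - phi[1:]) / (phi[:-1] + 1e-8), 0.0, 1.0)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha]))[:-1]
    return trans * alpha                             # per-sample weights

sdf = np.linspace(1.0, -1.0, 64)                     # ray crossing a surface
w = sdf_to_weights(sdf)
depths = np.linspace(0.0, 4.0, 64)
print("rendered depth ~", (w * depths[:-1]).sum() / w.sum())
```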

Image Transformation for IoT Time-Series Data: A Review

  • paper_url: http://arxiv.org/abs/2311.12742
  • repo_url: None
  • paper_authors: Duygu Altunkaya, Feyza Yildirim Okay, Suat Ozdemir
  • for: 本研究聚焦于物联网(IoT)领域中高维、高频时序数据的分类与回归问题。
  • methods: 本研究综述了利用图像变换/编码技术处理IoT时序数据的相关工作,并按编码技术、数据类型和应用领域进行梳理。
  • results: 综述表明,图像变换/编码技术可提升深度学习模型在IoT时序数据分类与回归任务中的性能,但其面临的挑战与未来方向仍有待进一步探索。
    Abstract In the era of the Internet of Things (IoT), where smartphones, built-in systems, wireless sensors, and nearly every smart device connect through local networks or the internet, billions of smart things communicate with each other and generate vast amounts of time-series data. As IoT time-series data is high-dimensional and high-frequency, time-series classification or regression has been a challenging issue in IoT. Recently, deep learning algorithms have demonstrated superior performance results in time-series data classification in many smart and intelligent IoT applications. However, it is hard to explore the hidden dynamic patterns and trends in time-series. Recent studies show that transforming IoT data into images improves the performance of the learning model. In this paper, we present a review of these studies which use image transformation/encoding techniques in IoT domain. We examine the studies according to their encoding techniques, data types, and application areas. Lastly, we emphasize the challenges and future dimensions of image transformation.
    摘要 在物联网(IoT)时代,智能手机、嵌入式系统、无线传感器以及几乎所有智能设备都通过本地网络或互联网相连,数十亿智能设备彼此通信,产生海量时序数据。由于IoT时序数据具有高维、高频的特点,时序分类与回归一直是IoT中的难题。近年来,深度学习算法在众多智能IoT应用的时序数据分类中展现出优越性能;然而,时序数据中隐藏的动态模式与趋势仍难以挖掘。最新研究表明,将IoT数据变换为图像可提升学习模型的性能。本文对IoT领域中使用图像变换/编码技术的研究进行了综述,按编码技术、数据类型和应用领域对相关工作进行了梳理,并在最后强调了图像变换面临的挑战与未来方向。
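A minimal example of one such image-encoding technique, the Gramian Angular Summation Field (GASF), which is among the transforms this kind of survey covers. The sine series stands in for a real sensor reading.

```python
# Encode a 1-D time series as a Gramian Angular Summation Field image.
import numpy as np

def gasf(x):
    # Rescale the series to [-1, 1], map to polar angles, build the Gram image.
    x = np.asarray(x, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1, 1))
    return np.cos(phi[:, None] + phi[None, :])   # (n, n) image

series = np.sin(np.linspace(0, 4 * np.pi, 64))   # stand-in sensor reading
image = gasf(series)                             # feed to a CNN classifier
print(image.shape)                               # (64, 64)
```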

Content Augmented Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.12741
  • repo_url: https://github.com/fatemehgholamzadeh/augss-gnn
  • paper_authors: Fatemeh Gholamzadeh Nasrabadi, AmirHossein Kashani, Pegah Zahedi, Mostafa Haghir Chehreghani
  • for: 本论文旨在通过将内容信息融入节点嵌入来提升图神经网络(GNN)的性能。
  • methods: 所提方法将GNN生成的结构嵌入与由自编码器或内容图生成的内容嵌入相结合,通过组合层形成最终的节点嵌入。
  • results: 所提方法在多个真实数据集上取得了高准确率和良好性能。
    Abstract In recent years, graph neural networks (GNNs) have become a popular tool for solving various problems over graphs. In these models, the link structure of the graph is typically exploited and nodes' embeddings are iteratively updated based on adjacent nodes. Nodes' contents are used solely in the form of feature vectors, served as nodes' first-layer embeddings. However, the filters or convolutions, applied during iterations/layers to these initial embeddings lead to their impact diminish and contribute insignificantly to the final embeddings. In order to address this issue, in this paper we propose augmenting nodes' embeddings by embeddings generating from their content, at higher GNN layers. More precisely, we propose models wherein a structural embedding using a GNN and a content embedding are computed for each node. These two are combined using a combination layer to form the embedding of a node at a given layer. We suggest methods such as using an auto-encoder or building a content graph, to generate content embeddings. In the end, by conducting experiments over several real-world datasets, we demonstrate the high accuracy and performance of our models.
    摘要 近年来,图神经网络(GNN)已成为解决各类图上问题的流行工具。在这些模型中,通常利用图的链接结构,并基于相邻节点迭代更新节点嵌入;节点内容仅以特征向量的形式作为节点的第一层嵌入使用。然而,在迭代/层间对这些初始嵌入施加的滤波或卷积使其影响逐渐减弱,对最终嵌入的贡献微乎其微。为了解决这一问题,本文提出在较高的GNN层中,利用由节点内容生成的嵌入来增强节点嵌入。更确切地说,我们提出的模型为每个节点计算基于GNN的结构嵌入和内容嵌入,并通过组合层将二者结合,形成节点在给定层的嵌入。我们建议使用自编码器或构建内容图等方法来生成内容嵌入。最后,通过在多个真实数据集上进行实验,我们证明了所提模型的高准确率与性能。
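A minimal PyTorch sketch of the combination-layer idea described above: a structural embedding (from a GNN layer) and a content embedding (e.g., from an auto-encoder) are concatenated and mixed by a learned layer. Layer sizes and the ReLU mixing are illustrative assumptions.

```python
# Combine structural and content embeddings through a learned layer.
import torch
import torch.nn as nn

class CombinationLayer(nn.Module):
    def __init__(self, struct_dim, content_dim, out_dim):
        super().__init__()
        self.mix = nn.Linear(struct_dim + content_dim, out_dim)

    def forward(self, h_struct, h_content):
        return torch.relu(self.mix(torch.cat([h_struct, h_content], dim=-1)))

n_nodes = 5
h_struct = torch.randn(n_nodes, 16)    # per-node GNN output
h_content = torch.randn(n_nodes, 32)   # per-node content encoding
layer = CombinationLayer(16, 32, 24)
print(layer(h_struct, h_content).shape)  # torch.Size([5, 24])
```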

  • paper_url: http://arxiv.org/abs/2311.12719
  • repo_url: None
  • paper_authors: Pranav Nataraj Devaraj, Rakesh Teja P V, Aaryav Gangrade, Manoj Kumar R
  • For: The paper is written for those interested in creating a Legal Documentation AI Chatbot with relevant features.
  • Methods: The paper uses a combination of AI technologies, including chatbots, to streamline the handling of legal documents. The authors describe the development of each component of the chatbot in detail, including the Android app and the Langchain query-processing code.
  • Results: The paper presents the authors' integration of the chatbot components through a Flask backend and REST API methods, and discusses the functionality of each component.
    Abstract With the exponential growth of digital data and the increasing complexity of legal documentation, there is a pressing need for efficient and intelligent tools to streamline the handling of legal documents.With the recent developments in the AI field, especially in chatbots, it cannot be ignored as a very compelling solution to this problem.An insight into the process of creating a Legal Documentation AI Chatbot with as many relevant features as possible within the given time frame is presented.The development of each component of the chatbot is presented in detail.Each component's workings and functionality has been discussed.Starting from the build of the Android app and the Langchain query processing code till the integration of both through a Flask backend and REST API methods.
    摘要 随着数字数据的急速增长和法律文档的日益复杂,迫切需要高效、智能的工具来简化法律文档的处理。鉴于人工智能领域的最新进展,尤其是聊天机器人方面,它作为解决这一问题的极具吸引力的方案不容忽视。本文深入介绍了在给定时间内构建一个具备尽可能多相关功能的法律文档AI聊天机器人的过程,详细介绍了各组件的开发并讨论了其工作原理与功能:从构建Android应用和Langchain查询处理代码,到通过Flask后端和REST API方法将二者集成。
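A minimal Flask sketch of the integration pattern described above: the Android app POSTs a question to a REST endpoint, which forwards it to a query-processing function. The endpoint name and the stub standing in for the Langchain pipeline are assumptions, not the paper's code.

```python
# Minimal REST backend: receive a question, return an answer as JSON.
from flask import Flask, request, jsonify

app = Flask(__name__)

def answer_legal_query(question: str) -> str:
    # Stub: a real implementation would invoke the Langchain pipeline here.
    return f"Received question about: {question[:60]}"

@app.route("/query", methods=["POST"])
def query():
    payload = request.get_json(force=True)
    return jsonify({"answer": answer_legal_query(payload.get("question", ""))})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```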

minimax: Efficient Baselines for Autocurricula in JAX

  • paper_url: http://arxiv.org/abs/2311.12716
  • repo_url: https://github.com/facebookresearch/minimax
  • paper_authors: Minqi Jiang, Michael Dennis, Edward Grefenstette, Tim Rocktäschel
  • for: 这篇论文旨在提出一套用于无监督环境设计(UED)的快速训练基线,以训练能零样本迁移到未见环境的鲁棒决策智能体。
  • methods: 该论文使用JAX实现全张量化环境与自动课程算法,使整个训练循环都能编译到硬件加速器上。
  • results: 该论文提出了名为minimax的库,可在加速硬件上快速进行UED训练,在相同批量下相比此前实现获得超过120倍的墙钟时间加速。
    Abstract Unsupervised environment design (UED) is a form of automatic curriculum learning for training robust decision-making agents to zero-shot transfer into unseen environments. Such autocurricula have received much interest from the RL community. However, UED experiments, based on CPU rollouts and GPU model updates, have often required several weeks of training. This compute requirement is a major obstacle to rapid innovation for the field. This work introduces the minimax library for UED training on accelerated hardware. Using JAX to implement fully-tensorized environments and autocurriculum algorithms, minimax allows the entire training loop to be compiled for hardware acceleration. To provide a petri dish for rapid experimentation, minimax includes a tensorized grid-world based on MiniGrid, in addition to reusable abstractions for conducting autocurricula in procedurally-generated environments. With these components, minimax provides strong UED baselines, including new parallelized variants, which achieve over 120$\times$ speedups in wall time compared to previous implementations when training with equal batch sizes. The minimax library is available under the Apache 2.0 license at https://github.com/facebookresearch/minimax.
    摘要 无监督环境设计(UED)是一种自动课程学习形式,用于训练能零样本迁移到未见环境的鲁棒决策智能体。此类自动课程受到了强化学习社区的广泛关注。然而,基于CPU环境采样与GPU模型更新的UED实验通常需要数周的训练时间,这一计算需求是该领域快速创新的主要障碍。本工作介绍了用于在加速硬件上进行UED训练的minimax库。minimax使用JAX实现全张量化环境与自动课程算法,使整个训练循环都能编译以获得硬件加速。为了提供快速实验的试验田,minimax包含基于MiniGrid的张量化网格世界,以及用于在程序化生成环境中开展自动课程的可复用抽象。借助这些组件,minimax提供了强大的UED基线,包括新的并行化变体,在相同批量下相比此前实现获得超过120倍的墙钟时间加速。minimax库以Apache 2.0许可证发布于 https://github.com/facebookresearch/minimax 。
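A generic JAX sketch (not the minimax API) of why tensorized environments help: with the environment state held in arrays, `jit` plus `vmap` step thousands of environments per call on an accelerator. The toy 1-D grid environment is an assumption for illustration.

```python
# Batched, compiled environment stepping with jit + vmap.
import jax
import jax.numpy as jnp

def env_step(state, action):
    # Toy 1-D grid: position moves by action (-1/0/+1), reward at position 5.
    pos = jnp.clip(state + action, 0, 10)
    reward = (pos == 5).astype(jnp.float32)
    return pos, reward

batched_step = jax.jit(jax.vmap(env_step))

n_envs = 4096
states = jnp.zeros(n_envs, dtype=jnp.int32)
key = jax.random.PRNGKey(0)
for _ in range(8):
    key, sub = jax.random.split(key)
    actions = jax.random.randint(sub, (n_envs,), -1, 2)
    states, rewards = batched_step(states, actions)
print(float(rewards.sum()))
```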

Alpha Zero for Physics: Application of Symbolic Regression with Alpha Zero to find the analytical methods in physics

  • paper_url: http://arxiv.org/abs/2311.12713
  • repo_url: None
  • paper_authors: Yoshihiro Michishita
  • for: 这篇论文探讨如何将基于神经网络的机器学习应用于物理领域,尤其是用于寻找解析方法,而非数值计算或辅助实验检测。
  • methods: 该论文提出了基于符号回归与Alpha Zero算法的框架(即面向物理的Alpha Zero,AZfP),用于发展物理学中的解析方法。作为示例,论文展示了AZfP能够推导Floquet系统中的高频展开。
  • results: 该论文表明,AZfP有望为物理学发展新的理论框架。
    Abstract Machine learning with neural networks is now becoming a more and more powerful tool for various tasks, such as natural language processing, image recognition, winning the game, and even for the issues of physics. Although there are many studies on the application of machine learning to numerical calculation and the assistance of experimental detection, the methods of applying machine learning to find the analytical method are poorly studied. In this paper, we propose the frameworks of developing analytical methods in physics by using the symbolic regression with the Alpha Zero algorithm, that is Alpha Zero for physics (AZfP). As a demonstration, we show that AZfP can derive the high-frequency expansion in the Floquet systems. AZfP may have the possibility of developing a new theoretical framework in physics.
    摘要 基于神经网络的机器学习正成为处理各类任务的日益强大的工具,例如自然语言处理、图像识别、博弈制胜,乃至物理学问题。尽管已有许多研究将机器学习应用于数值计算和辅助实验检测,但将机器学习用于寻找解析方法的研究仍然匮乏。本文提出了利用符号回归与Alpha Zero算法(即面向物理的Alpha Zero,AZfP)来发展物理学解析方法的框架。作为演示,我们展示了AZfP能够推导Floquet系统中的高频展开。AZfP有望为物理学发展新的理论框架。
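A deliberately simple random-search stand-in for the symbolic-regression component described above; the paper couples the search with an Alpha Zero-style policy and value network rather than the uniform sampling used here. The operator set and target function are toy assumptions.

```python
# Random search over small expression trees fitted to data.
import random
import numpy as np

UNARY = {"sin": np.sin, "cos": np.cos}
BINARY = {"+": np.add, "*": np.multiply}

def random_expr(depth=3):
    if depth == 0 or random.random() < 0.3:
        return "x"
    if random.random() < 0.5:
        op = random.choice(list(UNARY))
        return (op, random_expr(depth - 1))
    op = random.choice(list(BINARY))
    return (op, random_expr(depth - 1), random_expr(depth - 1))

def evaluate(expr, x):
    if expr == "x":
        return x
    if expr[0] in UNARY:
        return UNARY[expr[0]](evaluate(expr[1], x))
    return BINARY[expr[0]](evaluate(expr[1], x), evaluate(expr[2], x))

x = np.linspace(-2, 2, 100)
target = np.sin(x) + x            # "unknown" analytical law
best = min((random_expr() for _ in range(5000)),
           key=lambda e: float(np.mean((evaluate(e, x) - target) ** 2)))
print(best)
```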

Keeping Users Engaged During Repeated Administration of the Same Questionnaire: Using Large Language Models to Reliably Diversify Questions

  • paper_url: http://arxiv.org/abs/2311.12707
  • repo_url: None
  • paper_authors: Hye Sun Yun, Mehdi Arjmand, Phillip Raymond Sherlock, Michael Paasche-Orlow, James W. Griffith, Timothy Bickmore
  • for: 这篇论文旨在利用大语言模型(LLM)生成多样化的问卷版本,在保持良好心理测量特性的同时,缓解重复作答带来的疲劳与反应偏差。
  • methods: 在一项纵向研究中,参与者与智能体系统交互,连续两周每天作答标准化抑郁问卷或两种LLM生成的问卷变体之一,并同时完成一份经过验证的抑郁问卷。
  • results: 心理测量检验显示,三种条件下所测量表与外部效标之间的协变一致,表明LLM生成的问卷变体具有信度和效度。参与者认为重复作答标准化问卷明显比作答LLM生成的变体更为单调。这些发现表明,LLM可以在不损害效度的前提下为问卷注入活力,提升参与度与兴趣。
    Abstract Standardized, validated questionnaires are vital tools in HCI research and healthcare, offering dependable self-report data. However, their repeated use in longitudinal or pre-post studies can induce respondent fatigue, impacting data quality via response biases and decreased response rates. We propose utilizing large language models (LLMs) to generate diverse questionnaire versions while retaining good psychometric properties. In a longitudinal study, participants engaged with our agent system and responded daily for two weeks to either a standardized depression questionnaire or one of two LLM-generated questionnaire variants, alongside a validated depression questionnaire. Psychometric testing revealed consistent covariation between the external criterion and the focal measure administered across the three conditions, demonstrating the reliability and validity of the LLM-generated variants. Participants found the repeated administration of the standardized questionnaire significantly more repetitive compared to the variants. Our findings highlight the potential of LLM-generated variants to invigorate questionnaires, fostering engagement and interest without compromising validity.
    摘要 标准化、经过验证的问卷是人机交互研究和医疗领域的重要工具,可提供可靠的自报数据。然而,在纵向或前后测研究中重复使用问卷可能引起受访者疲劳,通过反应偏差和应答率下降影响数据质量。我们提出利用大语言模型(LLM)生成多样化的问卷版本,同时保持良好的心理测量特性。在一项纵向研究中,参与者与我们的智能体系统交互,连续两周每天作答标准化抑郁问卷或两种LLM生成的问卷变体之一,并同时完成一份经过验证的抑郁问卷。心理测量检验显示,三种条件下所施测量表与外部效标之间的协变一致,证明了LLM生成变体的信度与效度。参与者认为重复作答标准化问卷明显比作答变体更为单调。我们的发现凸显了LLM生成变体为问卷注入活力、在不损害效度的前提下提升参与度与兴趣的潜力。
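A hedged illustration of the prompting idea above: ask an LLM to rephrase a standardized item while pinning down the construct and the response scale. The template wording is our own, not the paper's; send the prompt with whatever LLM client you use.

```python
# Build a variant-generation prompt that fixes construct and response scale.
ITEM = "Little interest or pleasure in doing things"
SCALE = "0=Not at all, 1=Several days, 2=More than half the days, 3=Nearly every day"

prompt = (
    "Rewrite the following depression-questionnaire item so it feels fresh to "
    "someone answering it daily, while measuring exactly the same construct "
    "and remaining answerable on this scale.\n"
    f"Scale: {SCALE}\n"
    f"Item: {ITEM}\n"
    "Rewritten item:"
)
# response = your_llm_client.complete(prompt)  # hypothetical client call
print(prompt)
```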

Can Large Language Models Understand Content and Propagation for Misinformation Detection: An Empirical Study

  • paper_url: http://arxiv.org/abs/2311.12699
  • repo_url: None
  • paper_authors: Mengyang Chen, Lingwei Wei, Han Cao, Wei Zhou, Songlin Hu
  • for: 本研究旨在探讨大语言模型(LLMs)在误信息探测任务中的表现。
  • methods: 本研究采用多种提示来评估多种LLMs的理解能力,并设计了四种指令调整策略以提高LLMs的误信息探测性能。
  • results: 实验结果表明,所提指令微调策略可以提升LLM的虚假信息检测性能,使其更好地理解内容与传播结构。
    Abstract Large Language Models (LLMs) have garnered significant attention for their powerful ability in natural language understanding and reasoning. In this paper, we present a comprehensive empirical study to explore the performance of LLMs on misinformation detection tasks. This study stands as the pioneering investigation into the understanding capabilities of multiple LLMs regarding both content and propagation across social media platforms. Our empirical studies on five misinformation detection datasets show that LLMs with diverse prompts achieve comparable performance in text-based misinformation detection but exhibit notably constrained capabilities in comprehending propagation structure compared to existing models in propagation-based misinformation detection. Besides, we further design four instruction-tuned strategies to enhance LLMs for both content and propagation-based misinformation detection. These strategies boost LLMs to actively learn effective features from multiple instances or hard instances, and eliminate irrelevant propagation structures, thereby achieving better detection performance. Extensive experiments further demonstrate LLMs would play a better capacity in content and propagation structure under these proposed strategies and achieve promising detection performance. These findings highlight the potential ability of LLMs to detect misinformation.
    摘要 大语言模型(LLM)凭借其在自然语言理解与推理方面的强大能力受到了广泛关注。本文开展了一项全面的实证研究,探索LLM在虚假信息检测任务上的表现,这是首个针对多个LLM在社交媒体平台上对内容与传播两方面理解能力的研究。我们在五个虚假信息检测数据集上的实证研究表明,采用多样化提示的LLM在基于文本的虚假信息检测中表现相当,但与现有的基于传播的检测模型相比,其对传播结构的理解能力明显受限。此外,我们进一步设计了四种指令微调策略,以增强LLM在基于内容与基于传播的虚假信息检测上的能力。这些策略促使LLM从多个实例或困难实例中主动学习有效特征,并消除无关的传播结构,从而取得更好的检测性能。大量实验进一步表明,在所提策略下,LLM能更好地把握内容与传播结构,并取得可观的检测性能。这些发现凸显了LLM检测虚假信息的潜力。

Diffusion Model Alignment Using Direct Preference Optimization

  • paper_url: http://arxiv.org/abs/2311.12908
  • repo_url: None
  • paper_authors: Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik
  • for: 本研究旨在利用人类比较数据使扩散模型与人类偏好对齐,类似于以人类反馈强化学习(RLHF)微调大语言模型的做法。
  • methods: 本研究提出了Diffusion-DPO,一种直接在人类比较数据上优化扩散模型以符合人类偏好的方法。该方法改编自直接偏好优化(DPO),后者是RLHF的一种更简单的替代方案,可在分类目标下直接优化最符合人类偏好的策略。
  • results: 利用Pick-a-Pic数据集中85.1万条众包成对偏好数据,我们对最先进的SDXL-1.0模型的基础模型进行微调,显著提升了视觉吸引力与提示对齐度。此外,我们还开发了使用AI反馈的变体,其性能与基于人类偏好的训练相当,为扩散模型对齐方法的规模化打开了大门。
    Abstract Large language models (LLMs) are fine-tuned using human comparison data with Reinforcement Learning from Human Feedback (RLHF) methods to make them better aligned with users' preferences. In contrast to LLMs, human preference learning has not been widely explored in text-to-image diffusion models; the best existing approach is to fine-tune a pretrained model using carefully curated high quality images and captions to improve visual appeal and text alignment. We propose Diffusion-DPO, a method to align diffusion models to human preferences by directly optimizing on human comparison data. Diffusion-DPO is adapted from the recently developed Direct Preference Optimization (DPO), a simpler alternative to RLHF which directly optimizes a policy that best satisfies human preferences under a classification objective. We re-formulate DPO to account for a diffusion model notion of likelihood, utilizing the evidence lower bound to derive a differentiable objective. Using the Pick-a-Pic dataset of 851K crowdsourced pairwise preferences, we fine-tune the base model of the state-of-the-art Stable Diffusion XL (SDXL)-1.0 model with Diffusion-DPO. Our fine-tuned base model significantly outperforms both base SDXL-1.0 and the larger SDXL-1.0 model consisting of an additional refinement model in human evaluation, improving visual appeal and prompt alignment. We also develop a variant that uses AI feedback and has comparable performance to training on human preferences, opening the door for scaling of diffusion model alignment methods.
    摘要 大型语言模型(LLM)通过人工比较数据和人类反馈学习(RLHF)方法进行微调,以使其更好地适应用户的偏好。而文本到图像扩散模型中人员偏好学习尚未广泛研究,最佳现有方法是通过精心约选高质量的图像和标签来进行微调,以提高图像的可见性和文本的匹配度。我们提出了扩散-DPO方法,用于将扩散模型与人类偏好进行对应。扩散-DPO方法基于直接满足人类偏好的策略,并利用证据下界来 derivate一个可微分的目标函数。使用851000个人工投票的 Pick-a-Pic 数据集,我们微调了基于state-of-the-art Stable Diffusion XL(SDXL)-1.0模型的基本模型,并使用扩散-DPO方法。我们的微调基本模型在人工评价中显著超过了基本 SDXL-1.0 模型和增加了一个细化模型的 SDXL-1.0 模型,提高了图像的可见性和文本的匹配度。我们还开发了一种使用 AI 反馈的变体,其性能与人工偏好训练相当,开启了扩散模型对齐方法的扩展。

From Concept to Manufacturing: Evaluating Vision-Language Models for Engineering Design

  • paper_url: http://arxiv.org/abs/2311.12668
  • repo_url: None
  • paper_authors: Cyril Picard, Kristen M. Edwards, Anna C. Doris, Brandon Man, Giorgio Giannone, Md Ferdous Alam, Faez Ahmed
  • for: 这篇论文旨在评估视觉语言模型GPT-4V在各类工程设计任务中的能力。
  • methods: 本文在工程设计任务的四个主要领域对GPT-4V进行了全面评估,包括概念设计、系统级与详细设计、制造与检测,以及工程教育任务。
  • results: 研究评估了GPT-4V在草图相似性分析、使用Pugh图进行概念选择、材料选择、工程图纸分析、CAD生成、拓扑优化、增材与减材制造设计、空间推理挑战以及教科书习题等设计任务中的能力,指出了GPT-4V在复杂工程设计应用中的局限性,为未来评估视觉语言模型奠定了基础,并贡献了包含1000余条查询的基准测试数据集,以支持该领域的持续进展与应用。
    Abstract Engineering Design is undergoing a transformative shift with the advent of AI, marking a new era in how we approach product, system, and service planning. Large language models have demonstrated impressive capabilities in enabling this shift. Yet, with text as their only input modality, they cannot leverage the large body of visual artifacts that engineers have used for centuries and are accustomed to. This gap is addressed with the release of multimodal vision language models, such as GPT-4V, enabling AI to impact many more types of tasks. In light of these advancements, this paper presents a comprehensive evaluation of GPT-4V, a vision language model, across a wide spectrum of engineering design tasks, categorized into four main areas: Conceptual Design, System-Level and Detailed Design, Manufacturing and Inspection, and Engineering Education Tasks. Our study assesses GPT-4V's capabilities in design tasks such as sketch similarity analysis, concept selection using Pugh Charts, material selection, engineering drawing analysis, CAD generation, topology optimization, design for additive and subtractive manufacturing, spatial reasoning challenges, and textbook problems. Through this structured evaluation, we not only explore GPT-4V's proficiency in handling complex design and manufacturing challenges but also identify its limitations in complex engineering design applications. Our research establishes a foundation for future assessments of vision language models, emphasizing their immense potential for innovating and enhancing the engineering design and manufacturing landscape. It also contributes a set of benchmark testing datasets, with more than 1000 queries, for ongoing advancements and applications in this field.
    摘要 工程设计正在通过人工智能的出现,进入一个新的时代,如此以前所未有的产品、系统和服务规划方式。大型自然语言模型在这个过程中表现出了惊人的能力。然而,由于文本为它们的唯一输入模式,它们无法利用工程师 centuries 使用的大量视觉文物,这是一个问题。这个问题得到解决,通过发布多modal视觉语言模型,如 GPT-4V,使得 AI 能够影响更多类型的任务。在这些进步的背景下,这篇论文对 GPT-4V 进行了全面的评估,在工程设计任务中分为四个主要领域:概念设计、系统级和细节设计、制造和检查、和工程教育任务。我们的研究评估 GPT-4V 在设计任务中的能力,如:图像相似性分析、选择使用 Pugh 图、材料选择、工程图像分析、 CAD 生成、材料优化、设计添加和减少制造、空间理解挑战和文book问题。通过这种结构化的评估,我们不仅探索 GPT-4V 在复杂的设计和制造挑战中的能力,还确定了它在复杂工程设计应用中的局限性。我们的研究建立了未来评估视 language 模型的基础,强调它们在工程设计和制造领域的潜在创新和提高的潜力。同时,我们也提供了更多于 1000 个查询的测试数据集,为这一领域的进一步进步和应用做出了贡献。

The DURel Annotation Tool: Human and Computational Measurement of Semantic Proximity, Sense Clusters and Semantic Change

  • paper_url: http://arxiv.org/abs/2311.12664
  • repo_url: None
  • paper_authors: Dominik Schlechtweg, Shafqat Mumtaz Virk, Pauline Sander, Emma Sköldberg, Lukas Theuer Linke, Tuo Zhang, Nina Tahmasebi, Jonas Kuhn, Sabine Schulte im Walde
  • for: 这篇论文介绍了DURel工具,该工具在一个在线开源界面中实现了对词语用例之间语义邻近度的标注。
  • methods: 该工具支持标准化的人工标注,也支持基于最新Word-in-Context模型的计算标注。标注者的判断通过自动图聚类技术进行聚合,并可视化以供分析。
  • results: 借助DURel工具,只需在用例对之间做出简单直观的微任务判断即可测量词义,几乎无需准备工作。该工具还提供比较标注者间一致性的功能,以保证所得判断的主体间性,并能计算汇总统计量,揭示词义频率分布、语义变异及词义随时间的变化。
    Abstract We present the DURel tool that implements the annotation of semantic proximity between uses of words into an online, open source interface. The tool supports standardized human annotation as well as computational annotation, building on recent advances with Word-in-Context models. Annotator judgments are clustered with automatic graph clustering techniques and visualized for analysis. This allows to measure word senses with simple and intuitive micro-task judgments between use pairs, requiring minimal preparation efforts. The tool offers additional functionalities to compare the agreement between annotators to guarantee the inter-subjectivity of the obtained judgments and to calculate summary statistics giving insights into sense frequency distributions, semantic variation or changes of senses over time.
    摘要 我们介绍了DURel工具,它在一个在线开源界面中实现了对词语用例之间语义邻近度的标注。该工具支持标准化的人工标注,也支持建立在Word-in-Context模型最新进展之上的计算标注。标注者的判断通过自动图聚类技术进行聚合,并可视化以供分析。这使得只需在用例对之间做出简单直观的微任务判断即可测量词义,所需准备工作极少。该工具还提供额外功能,用于比较标注者间的一致性以保证所得判断的主体间性,并计算汇总统计量,揭示词义频率分布、语义变异及词义随时间的变化。
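A small sketch of the clustering step described above: word uses are nodes, aggregated annotator judgments weight the edges, and components above a threshold act as sense clusters. The 1-4 relatedness scale, the threshold, and the use of connected components are simplifying assumptions; real setups use proper graph-clustering algorithms.

```python
# Cluster word uses from pairwise proximity judgments.
import networkx as nx
from statistics import median

judgments = {  # (use_i, use_j) -> list of annotator ratings (1=unrelated .. 4=identical)
    ("u1", "u2"): [4, 4, 3],
    ("u2", "u3"): [4, 3, 4],
    ("u3", "u4"): [1, 2, 1],
    ("u4", "u5"): [4, 4, 4],
}

G = nx.Graph()
G.add_nodes_from({u for pair in judgments for u in pair})
for (a, b), ratings in judgments.items():
    if median(ratings) >= 3:          # "same sense" threshold
        G.add_edge(a, b)

print(list(nx.connected_components(G)))  # e.g., [{'u1','u2','u3'}, {'u4','u5'}]
```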

PARK: Parkinson’s Analysis with Remote Kinetic-tasks

  • paper_url: http://arxiv.org/abs/2311.12654
  • repo_url: None
  • paper_authors: Md Saiful Islam, Sangwu Lee, Abdelrahman Abdelkader, Sooyong Park, Ehsan Hoque
  • for: 提出一个基于网页的框架,让用户在家中完成神经学测试,以筛查帕金森病(PD)。
  • methods: 引导用户完成语音、面部表情和手指运动三项任务,并分析任务视频以判断用户是否表现出PD的迹象。
  • results: 以易于理解的方式呈现结果,并提供个性化资源以便进一步获得治疗与护理。
    Abstract We present a web-based framework to screen for Parkinson's disease (PD) by allowing users to perform neurological tests in their homes. Our web framework guides the users to complete three tasks involving speech, facial expression, and finger movements. The task videos are analyzed to classify whether the users show signs of PD. We present the results in an easy-to-understand manner, along with personalized resources to further access to treatment and care. Our framework is accessible by any major web browser, improving global access to neurological care.
    摘要 我们提出了一个基于网页的框架,允许用户在家中进行神经学测试以筛查帕金森病(PD)。该网页框架引导用户完成涉及语音、面部表情和手指运动的三项任务,并对任务视频进行分析,以判断用户是否表现出PD的迹象。我们以易于理解的方式呈现结果,并提供个性化资源,以便进一步获得治疗与护理。该框架可通过任何主流浏览器访问,从而提升全球神经疾病护理的可及性。

Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots

  • paper_url: http://arxiv.org/abs/2311.12651
  • repo_url: https://github.com/whu-usi3dv/mobile-seed
  • paper_authors: Youqi Liao, Shuhao Kang, Jianping Li, Yang Liu, Yun Liu, Zhen Dong, Bisheng Yang, Xieyuanli Chen
  • for: 这篇论文旨在提出一种轻量级框架,用于同时进行语义分割和边界检测。
  • methods: 该框架基于两大核心设计:双流编码器与动态调整的融合方法。双流编码器使模型能够同时学习类别感知的语义信息和来自多尺度特征的边界信息;主动融合解码器(AFD)则动态调整语义与边界信息的融合,实现对不同输入的精确融合。
  • results: 实验结果表明,Mobile-Seed在Cityscapes数据集上相比SOTA基线提升2.2个百分点(pp)的mIoU和4.2个百分点(pp)的mF-score,同时在1024x2048分辨率输入下保持23.9帧/秒(FPS)的在线推理速度。在CamVid和PASCAL Context数据集上的进一步实验证明了该方法的泛化能力。
    Abstract Precise and rapid delineation of sharp boundaries and robust semantics is essential for numerous downstream robotic tasks, such as robot grasping and manipulation, real-time semantic mapping, and online sensor calibration performed on edge computing units. Although boundary detection and semantic segmentation are complementary tasks, most studies focus on lightweight models for semantic segmentation but overlook the critical role of boundary detection. In this work, we introduce Mobile-Seed, a lightweight, dual-task framework tailored for simultaneous semantic segmentation and boundary detection. Our framework features a two-stream encoder, an active fusion decoder (AFD) and a dual-task regularization approach. The encoder is divided into two pathways: one captures category-aware semantic information, while the other discerns boundaries from multi-scale features. The AFD module dynamically adapts the fusion of semantic and boundary information by learning channel-wise relationships, allowing for precise weight assignment of each channel. Furthermore, we introduce a regularization loss to mitigate the conflicts in dual-task learning and deep diversity supervision. Compared to existing methods, the proposed Mobile-Seed offers a lightweight framework to simultaneously improve semantic segmentation performance and accurately locate object boundaries. Experiments on the Cityscapes dataset have shown that Mobile-Seed achieves notable improvement over the state-of-the-art (SOTA) baseline by 2.2 percentage points (pp) in mIoU and 4.2 pp in mF-score, while maintaining an online inference speed of 23.9 frames-per-second (FPS) with 1024x2048 resolution input on an RTX 2080 Ti GPU. Additional experiments on CamVid and PASCAL Context datasets confirm our method's generalizability. Code and additional results are publicly available at https://whu-usi3dv.github.io/Mobile-Seed/.
    摘要 对尖锐边界的精确快速刻画和鲁棒的语义信息对诸多下游机器人任务至关重要,例如机器人抓取与操作、实时语义建图,以及在边缘计算单元上进行的在线传感器标定。尽管边界检测与语义分割是互补的任务,大多数研究只关注轻量级的语义分割模型,而忽视了边界检测的关键作用。本文提出Mobile-Seed,一种面向同时语义分割与边界检测的轻量级双任务框架。该框架包含双流编码器、主动融合解码器(AFD)和双任务正则化方法。编码器分为两条通路:一条捕捉类别感知的语义信息,另一条从多尺度特征中辨别边界。AFD模块通过学习通道间关系动态调整语义与边界信息的融合,实现对每个通道的精确权重分配。此外,我们引入正则化损失以缓解双任务学习与深度多样性监督中的冲突。与现有方法相比,Mobile-Seed提供了一个轻量级框架,可同时提升语义分割性能并精确定位物体边界。在Cityscapes数据集上的实验表明,Mobile-Seed相比最先进(SOTA)基线提升2.2个百分点(pp)的mIoU和4.2个百分点(pp)的mF-score,同时在RTX 2080 Ti GPU上以1024x2048分辨率输入保持23.9帧/秒(FPS)的在线推理速度。在CamVid和PASCAL Context数据集上的进一步实验证明了该方法的泛化能力。代码与更多结果公开于 https://whu-usi3dv.github.io/Mobile-Seed/ 。
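A hedged PyTorch sketch of channel-wise fusion in the spirit of the AFD module: learn per-channel weights from the concatenated semantic and boundary features, then blend them. The gating design here is a common pattern, not the paper's exact architecture.

```python
# Channel-wise gated fusion of semantic and boundary feature maps.
import torch
import torch.nn as nn

class ChannelwiseFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, semantic, boundary):
        w = self.gate(torch.cat([semantic, boundary], dim=1))  # (B, C, 1, 1)
        return w * semantic + (1 - w) * boundary

sem = torch.randn(2, 64, 128, 256)
bnd = torch.randn(2, 64, 128, 256)
print(ChannelwiseFusion(64)(sem, bnd).shape)  # torch.Size([2, 64, 128, 256])
```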

KNVQA: A Benchmark for evaluation knowledge-based VQA

  • paper_url: http://arxiv.org/abs/2311.12639
  • repo_url: None
  • paper_authors: Sirui Cheng, Siyu Zhang, Jiayi Wu, Muchen Lan
  • for: 本研究旨在提供一种可靠的评估方法,以衡量大型视觉语言模型(LVLM)在多模态场景中的实用性。
  • methods: 本研究提出了一种新的评估方法KNVQA-Eval,通过融入人工判断与感知构建新的KNVQA数据集,评估知识型VQA任务中标准答案相对于AI生成答案的准确性。
  • results: 研究指出,现有LVLM在知识型VQA任务中仍受对象幻觉与事实准确性两大问题困扰,而以往的评估方法更关注语言内容的理解与推理,缺乏对多模态交互的全面评估。
    Abstract Within the multimodal field, large vision-language models (LVLMs) have made significant progress due to their strong perception and reasoning capabilities in the visual and language systems. However, LVLMs are still plagued by the two critical issues of object hallucination and factual accuracy, which limit the practicality of LVLMs in different scenarios. Furthermore, previous evaluation methods focus more on the comprehension and reasoning of language content but lack a comprehensive evaluation of multimodal interactions, thereby resulting in potential limitations. To this end, we propose a novel KNVQA-Eval, which is devoted to knowledge-based VQA task evaluation to reflect the factuality of multimodal LVLMs. To ensure the robustness and scalability of the evaluation, we develop a new KNVQA dataset by incorporating human judgment and perception, aiming to evaluate the accuracy of standard answers relative to AI-generated answers in knowledge-based VQA. This work not only comprehensively evaluates the contextual information of LVLMs using reliable human annotations, but also further analyzes the fine-grained capabilities of current methods to reveal potential avenues for subsequent optimization of LVLMs-based estimators. Our proposed VQA-Eval and corresponding dataset KNVQA will facilitate the development of automatic evaluation tools with the advantages of low cost, privacy protection, and reproducibility. Our code will be released upon publication.
    摘要 在多模态领域,大型视觉语言模型(LVLM)凭借其在视觉与语言系统中强大的感知和推理能力取得了显著进展。然而,LVLM仍受对象幻觉与事实准确性两大关键问题困扰,这限制了其在不同场景中的实用性。此外,以往的评估方法更关注语言内容的理解与推理,缺乏对多模态交互的全面评估,因而可能存在局限。为此,我们提出了一种新的KNVQA-Eval,致力于知识型VQA任务评估,以反映多模态LVLM的事实性。为保证评估的稳健性与可扩展性,我们融入人工判断与感知构建了新的KNVQA数据集,旨在评估知识型VQA中标准答案相对于AI生成答案的准确性。这项工作不仅利用可靠的人工标注全面评估了LVLM的上下文信息,还进一步分析了现有方法的细粒度能力,以揭示后续优化基于LVLM的评估器的潜在方向。我们提出的VQA-Eval及相应的KNVQA数据集将以低成本、隐私保护和可复现等优势促进自动评估工具的发展。代码将在论文发表后公开。

ChessVision – A Dataset for Logically Coherent Multi-label Classification

  • paper_url: http://arxiv.org/abs/2311.12610
  • repo_url: https://github.com/espressovi/chessvisionchallenge
  • paper_authors: Soumadeep Saha, Utpal Garain
  • for: 这篇论文的目的是探讨深度学习技术在棋盘检测任务中的应用,以及这些技术如何处理棋盘检测任务中的语义上下文和逻辑约束。
  • methods: 这篇论文使用了深度学习技术,并提供了一个大量的规则集来检测棋盘检测任务中的语义上下文和逻辑约束。
  • results: 研究发现,使用深度学习技术进行棋盘检测任务可以取得高度的性能,但是这些模型往往产生了大量的无关的结果, indicating that this dataset presents a significant challenge for future works.
    Abstract Starting with early successes in computer vision tasks, deep learning based techniques have since overtaken state of the art approaches in a multitude of domains. However, it has been demonstrated time and again that these techniques fail to capture semantic context and logical constraints, instead often relying on spurious correlations to arrive at the answer. Since application of deep learning techniques to critical scenarios are dependent on adherence to domain specific constraints, several attempts have been made to address this issue. One limitation holding back a thorough exploration of this area, is a lack of suitable datasets which feature a rich set of rules. In order to address this, we present the ChessVision Dataset, consisting of 200,000+ images of annotated chess games in progress, requiring recreation of the game state from its corresponding image. This is accompanied by a curated set of rules which constrains the set of predictions to "reasonable" game states, and are designed to probe key semantic abilities like localization and enumeration. Alongside standard metrics, additional metrics to measure performance with regards to logical consistency is presented. We analyze several popular and state of the art vision models on this task, and show that, although their performance on standard metrics are laudable, they produce a plethora of incoherent results, indicating that this dataset presents a significant challenge for future works.
    摘要 自计算机视觉任务中的早期成功以来,基于深度学习的技术已在众多领域超越了最先进的方法。然而,人们一再证明这些技术无法捕捉语义上下文和逻辑约束,而往往依靠虚假相关来得出答案。由于将深度学习技术应用于关键场景取决于其对领域特定约束的遵循,已有多项工作试图解决这一问题。制约该领域深入探索的一个因素是缺乏具备丰富规则集的合适数据集。为此,我们提出了ChessVision数据集,包含20万余张进行中国际象棋对局的标注图像,要求从对应图像中重建对局状态。数据集还配有一套精心设计的规则集,将预测约束在"合理"的对局状态内,旨在考察定位与计数等关键语义能力。除标准指标外,我们还提出了衡量逻辑一致性的额外指标。我们在该任务上分析了多个流行及最先进的视觉模型,结果表明,尽管它们在标准指标上表现可观,却产生了大量不连贯的结果,说明该数据集对未来工作构成了重大挑战。
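A minimal illustration of rule-based plausibility checks like those the curated rule set encodes; the real rules are far richer. The board representation (a dict mapping squares to piece codes) is an assumption for the example.

```python
# Check whether a predicted board state is a "reasonable" game state.
def plausible(board: dict) -> bool:
    pieces = list(board.values())
    if pieces.count("K") != 1 or pieces.count("k") != 1:
        return False                       # exactly one king per side
    if pieces.count("P") > 8 or pieces.count("p") > 8:
        return False                       # at most eight pawns per side
    for square, piece in board.items():
        if piece in ("P", "p") and square[1] in ("1", "8"):
            return False                   # pawns never on back ranks
    return True

print(plausible({"e1": "K", "e8": "k", "a2": "P"}))   # True
print(plausible({"e1": "K", "e8": "k", "a1": "p"}))   # False (pawn on rank 1)
```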

Trustworthy AI: Deciding What to Decide

  • paper_url: http://arxiv.org/abs/2311.12604
  • repo_url: None
  • paper_authors: Caesar Wu, Yuan-Fang Li, Jian Li, Jingjing Xu, Bouvry Pascal
  • For: The paper aims to address the challenge of determining which information can be trusted when using Artificial Intelligence (AI) systems for decision-making, known as Trustworthy AI (TAI).
  • Methods: The paper proposes a new framework for TAI that includes three crucial components of AI: representation space, loss function, and optimizer. Each component is loosely coupled with four TAI properties, resulting in a total of twelve TAI properties. The authors plan to use this framework to conduct experiments using quantitative and qualitative research methods to evaluate the effectiveness of the TAI properties in the decision-making context.
  • Results: The paper presents an optimal prediction model trained on a given dataset for applying strategic investment decisions in the technology sector using the proposed TAI framework. The authors also provide their future direction for TAI research.
    Abstract When engaging in strategic decision-making, we are frequently confronted with overwhelming information and data. The situation can be further complicated when certain pieces of evidence contradict each other or become paradoxical. The primary challenge is how to determine which information can be trusted when we adopt Artificial Intelligence (AI) systems for decision-making. This issue is known as deciding what to decide or Trustworthy AI. However, the AI system itself is often considered an opaque black box. We propose a new approach to address this issue by introducing a novel framework of Trustworthy AI (TAI) encompassing three crucial components of AI: representation space, loss function, and optimizer. Each component is loosely coupled with four TAI properties. Altogether, the framework consists of twelve TAI properties. We aim to use this framework to conduct the TAI experiments by quantitive and qualitative research methods to satisfy TAI properties for the decision-making context. The framework allows us to formulate an optimal prediction model trained by the given dataset for applying the strategic investment decision of credit default swaps (CDS) in the technology sector. Finally, we provide our view of the future direction of TAI research
    摘要 在进行战略决策时,我们经常面对海量的信息与数据,而当某些证据相互矛盾甚至自相矛盾时,情况会更加复杂。主要挑战在于,当我们采用人工智能(AI)系统进行决策时,如何确定哪些信息可以信任。这一问题被称为"决定决策什么",即可信AI(TAI)。然而,AI系统本身常被视为不透明的黑箱。我们提出一种新方法来应对这一问题,引入一个新的可信AI(TAI)框架,涵盖AI的三个关键组成部分:表示空间、损失函数和优化器。每个组成部分与四个TAI属性松散耦合,框架总计包含十二个TAI属性。我们计划使用该框架,通过定量与定性研究方法开展TAI实验,以在决策情境中满足TAI属性。该框架使我们能够基于给定数据集训练出最优预测模型,应用于科技行业信用违约互换(CDS)的战略投资决策。最后,我们给出了对TAI研究未来方向的看法。

Visual tracking brain computer interface

  • paper_url: http://arxiv.org/abs/2311.12592
  • repo_url: None
  • paper_authors: Changxing Huang, Nanlin Shi, Yining Miao, Xiaogang Chen, Yijun Wang, Xiaorong Gao
  • for: 这项研究旨在超越传统的离散命令,实现基于神经活动的自然连续控制。
  • methods: 研究人员采用了一种新颖的空间编码刺激范式,并设计了相应的投影方法,以实现对解码速度的连续调制。
  • results: 在17名参与者的实验中,固定跟踪任务的Fitt's ITR达到0.55 bps,随机跟踪任务达到0.37 bps。研究人员还将这一具有高信息传输率的BCI集成到绘画和游戏两个应用中。
    Abstract Brain-computer interfaces (BCIs) offer a way to interact with computers without relying on physical movements. Non-invasive electroencephalography (EEG)-based visual BCIs, known for efficient speed and calibration ease, face limitations in continuous tasks due to discrete stimulus design and decoding methods. To achieve continuous control, we implemented a novel spatial encoding stimulus paradigm and devised a corresponding projection method to enable continuous modulation of decoded velocity. Subsequently, we conducted experiments involving 17 participants and achieved Fitt's ITR of 0.55 bps for the fixed tracking task and 0.37 bps for the random tracking task. The proposed BCI with a high Fitt's ITR was then integrated into two applications, including painting and gaming. In conclusion, this study proposed a visual BCI-based control method to go beyond discrete commands, allowing natural continuous control based on neural activity.
    摘要 脑机接口(BCI)提供了一种无需依赖身体运动即可与计算机交互的方式。基于非侵入式脑电图(EEG)的视觉BCI以高效的速度和易于校准著称,但由于离散的刺激设计与解码方法,其在连续任务中存在局限。为实现连续控制,我们实现了一种新颖的空间编码刺激范式,并设计了相应的投影方法,以实现对解码速度的连续调制。随后,我们对17名参与者开展了实验,在固定跟踪任务中取得0.55 bps的Fitt's ITR,在随机跟踪任务中取得0.37 bps。随后,这一具有高Fitt's ITR的BCI被集成到绘画和游戏两个应用中。总之,本研究提出了一种基于视觉BCI的控制方法,超越离散命令,实现了基于神经活动的自然连续控制。
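A sketch of the Fitts's-law throughput ("Fitts ITR") computation reported above, using the Shannon formulation ID = log2(D/W + 1); the target distances, widths, and movement times are made-up example values.

```python
# Compute Fitts's-law throughput in bits per second.
import numpy as np

def fitts_itr(distances, widths, movement_times):
    ids = np.log2(np.asarray(distances) / np.asarray(widths) + 1)  # bits
    return float(np.mean(ids / np.asarray(movement_times)))        # bits/s

D = [200, 350, 500]     # cursor-to-target distances (px)
W = [40, 40, 60]        # target widths (px)
T = [3.8, 5.1, 5.6]     # movement times (s)
print(round(fitts_itr(D, W, T), 2))
```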

Improving Source-Free Target Adaptation with Vision Transformers Leveraging Domain Representation Images

  • paper_url: http://arxiv.org/abs/2311.12589
  • repo_url: None
  • paper_authors: Gauransh Sawhney, Daksh Dave, Adeel Ahmed, Jiechao Gao, Khalid Saleem
  • for: 这篇论文旨在提升视觉Transformer(ViT)在无源目标域适应中的性能,首先评估了key、query、value元素对ViT结果的影响。
  • methods: 该论文利用ViT与域表示图像(DRI)实现无源目标域适应。DRI作为域特定的标记,其嵌入通过key元素输入,可轻松融入训练流程。
  • results: 实验表明,不使用DRI相比SHOT-B*仅带来有限增益,而在key部分引入DRI可提升平均精度,带来更优的域泛化能力。
    Abstract Unsupervised Domain Adaptation (UDA) methods facilitate knowledge transfer from a labeled source domain to an unlabeled target domain, navigating the obstacle of domain shift. While Convolutional Neural Networks (CNNs) are a staple in UDA, the rise of Vision Transformers (ViTs) provides new avenues for domain generalization. This paper presents an innovative method to bolster ViT performance in source-free target adaptation, beginning with an evaluation of how key, query, and value elements affect ViT outcomes. Experiments indicate that altering the key component has negligible effects on Transformer performance. Leveraging this discovery, we introduce Domain Representation Images (DRIs), feeding embeddings through the key element. DRIs act as domain-specific markers, effortlessly merging with the training regimen. To assess our method, we perform target adaptation tests on the Cross Instance DRI source-only (SO) control. We measure the efficacy of target adaptation with and without DRIs, against existing benchmarks like SHOT-B* and adaptations via CDTrans. Findings demonstrate that excluding DRIs offers limited gains over SHOT-B*, while their inclusion in the key segment boosts average precision promoting superior domain generalization. This research underscores the vital role of DRIs in enhancing ViT efficiency in UDA scenarios, setting a precedent for further domain adaptation explorations.
    摘要 无监督域适应(UDA)方法可帮助将知识从有标注的源域迁移到无标注的目标域,以应对域偏移问题。卷积神经网络(CNN)是UDA中的主力,而视觉Transformer(ViT)的兴起为域泛化提供了新途径。本文提出一种创新方法,以增强ViT在无源目标域适应中的表现,首先评估了key、query、value元素对ViT结果的影响。实验表明,修改key组件对Transformer性能的影响微乎其微。利用这一发现,我们引入了域表示图像(DRI),将其嵌入通过key元素输入。DRI充当域特定的标记,可轻松融入训练流程。为评估该方法,我们在Cross Instance DRI source-only(SO)对照上进行目标域适应测试,对比现有基准(如SHOT-B*及基于CDTrans的适应方法),衡量使用与不使用DRI时目标域适应的效果。结果表明,不使用DRI相比SHOT-B*仅带来有限增益,而在key部分引入DRI可提升平均精度,促进更优的域泛化。本研究强调了DRI在UDA场景中提升ViT效率的关键作用,为后续域适应探索树立了先例。

Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer

  • paper_url: http://arxiv.org/abs/2311.12905
  • repo_url: None
  • paper_authors: Wenqiao Zhang, Zheqi Lv, Hao Zhou, Jia-Wei Liu, Juncheng Li, Mengze Li, Siliang Tang, Yueting Zhuang
  • for: 在新目标域中最大化提升模型的适应能力
  • methods: 多源主动域适应(MADA)与动态集成不确定性评估框架(Detective)
  • results: 在三个域适应基准上以可观幅度超越现有方法
    Abstract Active Domain Adaptation (ADA) aims to maximally boost model adaptation in a new target domain by actively selecting a limited number of target data to annotate.This setting neglects the more practical scenario where training data are collected from multiple sources. This motivates us to target a new and challenging setting of knowledge transfer that extends ADA from a single source domain to multiple source domains, termed Multi-source Active Domain Adaptation (MADA). Not surprisingly, we find that most traditional ADA methods cannot work directly in such a setting, mainly due to the excessive domain gap introduced by all the source domains and thus their uncertainty-aware sample selection can easily become miscalibrated under the multi-domain shifts. Considering this, we propose a Dynamic integrated uncertainty valuation framework(Detective) that comprehensively consider the domain shift between multi-source domains and target domain to detect the informative target samples. Specifically, the leverages a dynamic Domain Adaptation(DA) model that learns how to adapt the model's parameters to fit the union of multi-source domains. This enables an approximate single-source domain modeling by the dynamic model. We then comprehensively measure both domain uncertainty and predictive uncertainty in the target domain to detect informative target samples using evidential deep learning, thereby mitigating uncertainty miscalibration. Furthermore, we introduce a contextual diversity-aware calculator to enhance the diversity of the selected samples. Experiments demonstrate that our solution outperforms existing methods by a considerable margin on three domain adaptation benchmarks.
    摘要 主动域适应(ADA)旨在通过主动选择有限数量的目标数据进行标注,在新目标域中最大化提升模型的适应能力。这一设定忽略了训练数据往往来自多个源的更现实场景,这促使我们研究一种将ADA从单一源域扩展到多个源域的全新且具有挑战性的知识迁移设定,称为多源主动域适应(MADA)。不出所料,我们发现大多数传统ADA方法无法直接在此设定下工作,主要原因是所有源域共同引入了过大的域差距,使其基于不确定性的样本选择在多域偏移下极易失准。为了解决这个问题,我们提出了一个动态集成不确定性评估框架(Detective),该框架全面考虑多源域与目标域之间的域偏移,以探测信息量大的目标样本。具体来说,我们利用动态域适应(DA)模型学习如何调整模型参数以拟合多源域的并集,从而通过动态模型实现近似的单源域建模。随后,我们利用证据深度学习全面度量目标域中的域不确定性与预测不确定性,以探测信息量大的目标样本,从而缓解不确定性失准。此外,我们引入上下文多样性感知计算器以增强所选样本的多样性。实验表明,我们的方案在三个域适应基准上以可观幅度超越现有方法。
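A small sketch of the evidential-deep-learning quantities the framework relies on: logits become Dirichlet evidence, and uncertainty falls as total evidence grows (u = K / S for K classes). This shows the standard EDL recipe, not the paper's full Detective pipeline.

```python
# Evidential uncertainty from classifier logits via a Dirichlet parameterization.
import torch
import torch.nn.functional as F

def evidential_uncertainty(logits):
    evidence = F.softplus(logits)          # non-negative evidence per class
    alpha = evidence + 1.0                 # Dirichlet parameters
    S = alpha.sum(dim=-1, keepdim=True)    # Dirichlet strength
    prob = alpha / S                       # expected class probabilities
    u = logits.shape[-1] / S               # predictive uncertainty in (0, 1]
    return prob, u.squeeze(-1)

logits = torch.tensor([[5.0, 0.1, 0.1],    # confident sample
                       [0.2, 0.1, 0.2]])   # uncertain sample
prob, u = evidential_uncertainty(logits)
print(u)   # higher for the second row
```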

Echocardiogram Foundation Model – Application 1: Estimating Ejection Fraction

  • paper_url: http://arxiv.org/abs/2311.12582
  • repo_url: None
  • paper_authors: Adil Dahlan, Cyril Zakka, Abhinav Kumar, Laura Tang, Rohan Shad, Robyn Fong, William Hiesinger
  • for: 评估心脏功能,缓解人工量化费力、耗时且操作者间差异大的问题
  • methods: 使用自监督学习(SSL)方法,在150万份超声心动图上训练EchoAI基础模型
  • results: 微调后估计射血分数的平均绝对百分比误差为9.40%,与专业超声医师的水平相当
    Abstract Cardiovascular diseases stand as the primary global cause of mortality. Among the various imaging techniques available for visualising the heart and evaluating its function, echocardiograms emerge as the preferred choice due to their safety and low cost. Quantifying cardiac function based on echocardiograms is very laborious, time-consuming and subject to high interoperator variability. In this work, we introduce EchoAI, an echocardiogram foundation model, that is trained using self-supervised learning (SSL) on 1.5 million echocardiograms. We evaluate our approach by fine-tuning EchoAI to estimate the ejection fraction achieving a mean absolute percentage error of 9.40%. This level of accuracy aligns with the performance of expert sonographers.
    摘要 心血管疾病是全球首要的死亡原因。在可用于观察心脏并评估其功能的各种影像技术中,超声心动图因其安全性和低成本成为首选。然而,基于超声心动图量化心脏功能非常费力、耗时,且操作者间差异很大。在这项工作中,我们提出了EchoAI,一个在150万份超声心动图上通过自监督学习(SSL)训练的超声心动图基础模型。我们通过微调EchoAI来估计射血分数以评估该方法,取得了9.40%的平均绝对百分比误差。这一准确度与专业超声医师的水平相当。
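For reference, the reported metric computed on illustrative numbers: mean absolute percentage error between predicted and ground-truth ejection fractions.

```python
# Mean absolute percentage error (MAPE) for ejection-fraction estimates.
import numpy as np

def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

ef_true = [55, 62, 40, 35]   # illustrative ground-truth EF values (%)
ef_pred = [50, 65, 44, 33]   # illustrative model predictions (%)
print(f"MAPE = {mape(ef_true, ef_pred):.2f}%")
```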

IMGTB: A Framework for Machine-Generated Text Detection Benchmarking

  • paper_url: http://arxiv.org/abs/2311.12574
  • repo_url: None
  • paper_authors: Michal Spiegel, Dominik Macko
  • for: 本研究旨在提供一个易用的基准测试框架,以便评测和比较新的机器生成文本检测方法。
  • methods: 本研究提出了IMGTB框架,该框架可以轻松集成自定义(新的)检测方法和评估数据集。
  • results: IMGTB框架的可配置性与灵活性便于机器生成文本检测方法的研究、开发及其与现有最先进检测器的比较;其默认提供的分析、指标和可视化遵循了最新文献中已确立的基准测试实践。
    Abstract In the era of large language models generating high quality texts, it is a necessity to develop methods for detection of machine-generated text to avoid harmful use or simply due to annotation purposes. It is, however, also important to properly evaluate and compare such developed methods. Recently, a few benchmarks have been proposed for this purpose; however, integration of newest detection methods is rather challenging, since new methods appear each month and provide slightly different evaluation pipelines. In this paper, we present the IMGTB framework, which simplifies the benchmarking of machine-generated text detection methods by easy integration of custom (new) methods and evaluation datasets. Its configurability and flexibility makes research and development of new detection methods easier, especially their comparison to the existing state-of-the-art detectors. The default set of analyses, metrics and visualizations offered by the tool follows the established practices of machine-generated text detection benchmarking found in state-of-the-art literature.

Moderating Model Marketplaces: Platform Governance Puzzles for AI Intermediaries

  • paper_url: http://arxiv.org/abs/2311.12573
  • repo_url: None
  • paper_authors: Robert Gorwa, Michael Veale
  • For: The paper discusses the challenges of governing AI model marketplaces, such as Hugging Face, GitHub, and Civitai, where users can upload and share their own models and training data.
  • Methods: The paper examines several case studies of incidents on these platforms to explore how they moderate models and provides an analysis of the practices that industry has been developing to respond to moderation demands, including licensing, access and use restrictions, automated content moderation, and open policy development.
  • Results: The paper concludes that the policy challenge of governing AI model marketplaces is considerable, but suggests some ideas for how platforms could better mobilize resources to act as a careful, fair, and proportionate regulatory access point.
    Abstract The AI development community is increasingly making use of hosting intermediaries such as Hugging Face, which provide easy access to user-uploaded models and training data. These model marketplaces lower technical deployment barriers for hundreds of thousands of users, yet can be used in numerous potentially harmful and illegal ways. In this article, we explain ways in which AI systems, which can both 'contain' content and be open-ended tools, present one of the trickiest platform governance challenges seen to date. We provide case studies of several incidents across three illustrative platforms -- Hugging Face, GitHub and Civitai -- to examine how model marketplaces moderate models. Building on this analysis, we outline important (and yet nevertheless limited) practices that industry has been developing to respond to moderation demands: licensing, access and use restrictions, automated content moderation, and open policy development. While the policy challenge at hand is a considerable one, we conclude with some ideas as to how platforms could better mobilize resources to act as a careful, fair, and proportionate regulatory access point.

Scheduling Distributed Flexible Assembly Lines using Safe Reinforcement Learning with Soft Shielding

  • paper_url: http://arxiv.org/abs/2311.12572
  • repo_url: None
  • paper_authors: Lele Li, Liyong Lin
  • for: Improving the efficiency and reliability of assembly lines with real-time job scheduling
  • methods: An advantage actor-critic reinforcement learning method, a more condensed environment representation, and a Monte-Carlo tree search based soft shielding component
  • results: Performance evaluation shows that the proposed algorithm and its soft shielding component effectively improve the efficiency and reliability of job scheduling
    Abstract Highly automated assembly lines enable significant productivity gains in the manufacturing industry, particularly under mass-production conditions. Nonetheless, challenges persist in job scheduling for make-to-job and mass customization, necessitating further investigation to improve efficiency, reduce tardiness, and promote safety and reliability. In this contribution, an advantage actor-critic based reinforcement learning method is proposed to address scheduling problems of distributed flexible assembly lines in a real-time manner. To enhance performance, a more condensed environment representation approach is proposed, which is designed to work with the masks made by priority dispatching rules to generate a fixed and advantageous action space. Moreover, a Monte-Carlo tree search based soft shielding component is developed to help address long-sequence dependent unsafe behaviors and to monitor the risk of overdue scheduling. Finally, the proposed algorithm and its soft shielding component are validated in a performance evaluation.
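One concrete piece of the approach, combining a policy head with masks derived from priority dispatching rules, can be sketched as masked action sampling in an actor-critic setup. The mask layout and tensor sizes below are invented for illustration; the paper's actual network and rule set are not specified here.

```python
import torch
import torch.nn.functional as F

def masked_policy_sample(logits: torch.Tensor, mask: torch.Tensor):
    """Sample an action from a policy head while forbidding masked-out actions.
    `mask` is 1 for actions admitted by the dispatching rules, 0 otherwise."""
    masked_logits = logits.masked_fill(mask == 0, float("-inf"))
    probs = F.softmax(masked_logits, dim=-1)   # zero probability on masked actions
    dist = torch.distributions.Categorical(probs)
    action = dist.sample()
    return action, dist.log_prob(action)       # log-prob feeds the A2C policy loss

logits = torch.randn(1, 6)                     # scores for 6 candidate jobs
mask = torch.tensor([[1, 0, 1, 1, 0, 1]])      # the rules admit only 4 of them
action, logp = masked_policy_sample(logits, mask)
print(action.item(), logp.item())
```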

Multi-Session Budget Optimization for Forward Auction-based Federated Learning

  • paper_url: http://arxiv.org/abs/2311.12548
  • repo_url: None
  • paper_authors: Xiaoli Tang, Han Yu
  • for: In multi-session federated learning (FL), model users (MUs) need effective strategies for allocating budgets across sessions to maximize total utility
  • methods: The Multi-session Budget Optimization Strategy for forward Auction-based Federated Learning (MultiBOS-AFL), based on hierarchical reinforcement learning, jointly optimizes inter-session budget pacing and intra-session bidding to maximize total utility
  • results: Extensive experiments on six benchmark datasets against seven existing methods show that MultiBOS-AFL achieves 12.28% higher utility, 14.52% more data acquired through auctions for a given budget, and 1.23% higher test accuracy
    Abstract Auction-based Federated Learning (AFL) has emerged as an important research field in recent years. The prevailing strategies for FL model users (MUs) assume that the entire team of the required data owners (DOs) for an FL task must be assembled before training can commence. In practice, an MU can trigger the FL training process multiple times. DOs can thus be gradually recruited over multiple FL model training sessions. Existing bidding strategies for AFL MUs are not designed to handle such scenarios. Therefore, the problem of multi-session AFL remains open. To address this problem, we propose the Multi-session Budget Optimization Strategy for forward Auction-based Federated Learning (MultiBOS-AFL). Based on hierarchical reinforcement learning, MultiBOS-AFL jointly optimizes inter-session budget pacing and intra-session bidding for AFL MUs, with the objective of maximizing the total utility. Extensive experiments on six benchmark datasets show that it significantly outperforms seven state-of-the-art approaches. On average, MultiBOS-AFL achieves 12.28% higher utility, 14.52% more data acquired through auctions for a given budget, and 1.23% higher test accuracy achieved by the resulting FL model compared to the best baseline. To the best of our knowledge, it is the first budget optimization decision support method with budget pacing capability designed for MUs in multi-session forward auction-based federated learning.

In-Context Learning Functions with Varying Number of Minima

  • paper_url: http://arxiv.org/abs/2311.12538
  • repo_url: https://github.com/pittnail/icl-minima
  • paper_authors: David Oniani, Yanshan Wang
  • for: Investigates how large language models (LLMs) perform at In-Context Learning (ICL), and how ICL interacts with specific properties of the functions being approximated
  • methods: A formal framework for studying ICL, a new task of approximating functions with a varying number of minima, and a method for producing functions with given inputs as minima
  • results: Increasing the number of minima degrades ICL performance; nevertheless, ICL outperforms a 2-layer neural network (2NN) and learns faster in all settings, as validated by few-shot experiments across various hyperparameter configurations
    Abstract Large Language Models (LLMs) have proven effective at In-Context Learning (ICL), an ability that allows them to create predictors from labeled examples. Few studies have explored the interplay between ICL and specific properties of functions it attempts to approximate. In our study, we use a formal framework to explore ICL and propose a new task of approximating functions with varying number of minima. We implement a method that allows for producing functions with given inputs as minima. We find that increasing the number of minima degrades ICL performance. At the same time, our evaluation shows that ICL outperforms 2-layer Neural Network (2NN) model. Furthermore, ICL learns faster than 2NN in all settings. We validate the findings through a set of few-shot experiments across various hyperparameter configurations.
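One simple way to realize "functions with given inputs as minima", in the spirit of the task (the paper's exact construction is not reproduced here), is a product of squared terms: f(x) = prod_i (x - m_i)^2, which attains its global minimum value of 0 at every requested point m_i.

```python
import numpy as np

def function_with_minima(minima):
    """Return f(x) = prod_i (x - m_i)^2, whose global minimum value of 0
    is attained exactly at each requested point m_i."""
    minima = np.asarray(minima, float)
    def f(x):
        x = np.asarray(x, float)
        return np.prod((x[..., None] - minima) ** 2, axis=-1)
    return f

f = function_with_minima([-2.0, 0.5, 3.0])
xs = np.linspace(-3, 4, 7001)
ys = f(xs)
print(xs[np.argsort(ys)[:3]])   # the three smallest values sit at the minima
```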

Oasis: Data Curation and Assessment System for Pretraining of Large Language Models

  • paper_url: http://arxiv.org/abs/2311.12537
  • repo_url: https://github.com/tongzhou21/oasis
  • paper_authors: Tong Zhou, Yubo Chen, Pengfei Cao, Kang Liu, Jun Zhao, Shengping Liu
  • for: This paper presents a pretraining corpus curation and assessment platform called Oasis, which aims to improve the quality of large language models by customizing a corpus curation pipeline and leveraging comprehensive corpus assessment for iterative optimization.
  • methods: The Oasis platform includes a customized data curation module with an interactive modular rule filter, a debiased neural filter, and an adaptive document deduplication module. It also features a holistic data assessment module with human, GPT-4, and heuristic metrics.
  • results: The authors exhibit a complete process for using Oasis to curate and assess pretraining data, and they publicly release an 800GB bilingual corpus curated by Oasis. The results show that Oasis can significantly improve the quality of pretraining data and reduce the bias in large language models.
    Abstract Data is one of the most critical elements in building a large language model. However, existing systems either fail to customize a corpus curation pipeline or neglect to leverage comprehensive corpus assessment for iterative optimization of the curation. To this end, we present a pretraining corpus curation and assessment platform called Oasis -- a one-stop system for data quality improvement and quantification with user-friendly interactive interfaces. Specifically, the interactive modular rule filter module can devise customized rules according to explicit feedback. The debiased neural filter module builds the quality classification dataset in a negative-centric manner to remove the undesired bias. The adaptive document deduplication module could execute large-scale deduplication with limited memory resources. These three parts constitute the customized data curation module. And in the holistic data assessment module, a corpus can be assessed in local and global views, with three evaluation means including human, GPT-4, and heuristic metrics. We exhibit a complete process to use Oasis for the curation and assessment of pretraining data. In addition, an 800GB bilingual corpus curated by Oasis is publicly released.
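The abstract does not spell out how the adaptive deduplication module stays within limited memory; a standard technique for that setting is MinHash, which replaces each document with a small fixed-size signature whose agreement rate estimates Jaccard similarity. The sketch below is therefore an illustrative baseline, not Oasis's algorithm.

```python
import hashlib

def shingles(text, k=5):
    """Overlapping k-token shingles of a document."""
    tokens = text.split()
    return {" ".join(tokens[i:i + k]) for i in range(max(1, len(tokens) - k + 1))}

def minhash_signature(text, num_hashes=64):
    """Fixed-size signature, far smaller than the document, so near-duplicate
    candidates can be found without holding the full texts in memory."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(),
                                           digest_size=8).digest(), "big")
            for s in shingles(text)))
    return tuple(sig)

def jaccard_estimate(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = "the quick brown fox jumps over the lazy dog near the river bank today"
b = "the quick brown fox jumps over the lazy dog near the river bank now"
print(jaccard_estimate(minhash_signature(a), minhash_signature(b)))
```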

Neural Network Pruning by Gradient Descent

  • paper_url: http://arxiv.org/abs/2311.12526
  • repo_url: https://github.com/3riccc/neural_pruning
  • paper_authors: Zhang Zhang, Ruyi Tao, Jiang Zhang
  • for: A new neural network pruning framework that jointly optimizes a network's parameters and structure, improving computational efficiency and model interpretability
  • methods: Uses the Gumbel-Softmax technique to optimize weights and topology simultaneously during gradient descent
  • results: Maintains high accuracy on MNIST with only 0.15% of the original network parameters, and improves interpretability: feature importance can be read directly from the pruned network, and feature symmetry and the information pathways from features to outcomes can be visualized
    Abstract The rapid increase in the parameters of deep learning models has led to significant costs, challenging computational efficiency and model interpretability. In this paper, we introduce a novel and straightforward neural network pruning framework that incorporates the Gumbel-Softmax technique. This framework enables the simultaneous optimization of a network's weights and topology in an end-to-end process using stochastic gradient descent. Empirical results demonstrate its exceptional compression capability, maintaining high accuracy on the MNIST dataset with only 0.15% of the original network parameters. Moreover, our framework enhances neural network interpretability, not only by allowing easy extraction of feature importance directly from the pruned network but also by enabling visualization of feature symmetry and the pathways of information propagation from features to outcomes. Although the pruning strategy is learned through deep learning, it is surprisingly intuitive and understandable, focusing on selecting key representative features and exploiting data patterns to achieve extreme sparse pruning. We believe our method opens a promising new avenue for deep learning pruning and the creation of interpretable machine learning systems.
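The core mechanism, learning a binary keep/drop mask by gradient descent via Gumbel-Softmax, can be sketched in a few lines of PyTorch. The layer sizes, the sparsity penalty, and its weight are illustrative assumptions; the paper's exact parameterization may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrunableLinear(nn.Module):
    """Linear layer whose per-weight keep/drop gates are relaxed with
    Gumbel-Softmax, so the mask and the weights are trained jointly by SGD
    (an illustrative sketch, not the paper's exact parameterization)."""
    def __init__(self, in_features, out_features, tau=1.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        # Two logits per weight: index 0 = drop, index 1 = keep.
        self.gate_logits = nn.Parameter(torch.zeros(out_features, in_features, 2))
        self.tau = tau

    def forward(self, x):
        gates = F.gumbel_softmax(self.gate_logits, tau=self.tau, hard=True)
        mask = gates[..., 1]                    # straight-through 0/1 keep mask
        return x @ (self.weight * mask).t()

layer = PrunableLinear(16, 4)
x = torch.randn(2, 16)
keep_prob = F.softmax(layer.gate_logits, dim=-1)[..., 1]
loss = layer(x).pow(2).mean() + 1e-3 * keep_prob.mean()  # toy loss + sparsity term
loss.backward()                                 # gradients reach weights and gates
```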

ALPHA: AnomaLous Physiological Health Assessment Using Large Language Models

  • paper_url: http://arxiv.org/abs/2311.12524
  • repo_url: https://github.com/mcjacktang/llm-healthassistant
  • paper_authors: Jiankai Tang, Kegang Wang, Hongming Hu, Xiyuxing Zhang, Peiyu Wang, Xin Liu, Yuntao Wang
  • for: Evaluates the efficacy of large language models (LLMs) in healthcare, especially for personal anomalous health monitoring, focusing on their ability to interpret and analyze physiological data obtained from FDA-approved devices
  • methods: Extensive analysis of anomalous physiological data collected in a simulated low-air-pressure plateau environment, assessing the precision and reliability of LLMs in identifying and evaluating users' health status
  • results: LLMs perform well on heart rate and oxygen saturation (SpO2), with MAE below 1 beat per minute, MAPE below 1%, and overall accuracy above 85%; on image analysis tasks, a specially adapted GPT model interprets photoplethysmography (PPG) data with less than 1 bpm cycle-count error and 7.28 MAE for heart rate estimation, supporting a dual role for LLMs as health data analysis tools and as core elements of future AI health assistants offering personalized insights and recommendations
    Abstract This study concentrates on evaluating the efficacy of Large Language Models (LLMs) in healthcare, with a specific focus on their application in personal anomalous health monitoring. Our research primarily investigates the capabilities of LLMs in interpreting and analyzing physiological data obtained from FDA-approved devices. We conducted an extensive analysis using anomalous physiological data gathered in a simulated low-air-pressure plateau environment. This allowed us to assess the precision and reliability of LLMs in understanding and evaluating users' health status with notable specificity. Our findings reveal that LLMs exhibit exceptional performance in determining medical indicators, including a Mean Absolute Error (MAE) of less than 1 beat per minute for heart rate and less than 1% for oxygen saturation (SpO2). Furthermore, the Mean Absolute Percentage Error (MAPE) for these evaluations remained below 1%, with the overall accuracy of health assessments surpassing 85%. In image analysis tasks, such as interpreting photoplethysmography (PPG) data, our specially adapted GPT models demonstrated remarkable proficiency, achieving less than 1 bpm error in cycle count and 7.28 MAE for heart rate estimation. This study highlights LLMs' dual role as health data analysis tools and pivotal elements in advanced AI health assistants, offering personalized health insights and recommendations within the future health assistant framework.

Classification of Tabular Data by Text Processing

  • paper_url: http://arxiv.org/abs/2311.12521
  • repo_url: None
  • paper_authors: Keshav Ramani, Daniel Borrajo
  • for: Proposes Text Based Classification (TBC), a method that applies text processing techniques to classification tasks on tabular data
  • methods: Uses state-of-the-art text processing, including text feature extraction, text classification, and model training
  • results: On multiple datasets, TBC matches state-of-the-art models in accuracy, precision, and recall of predicted classes
    Abstract Natural Language Processing technology has advanced vastly in the past decade. Text processing has been successfully applied to a wide variety of domains. In this paper, we propose a novel framework, Text Based Classification (TBC), that uses state-of-the-art text processing techniques to solve classification tasks on tabular data. We provide a set of controlled experiments where we present the benefits of using this approach against other classification methods. Experimental results on several data sets also show that this framework achieves comparable performance to that of several state-of-the-art models in accuracy, precision and recall of predicted classes.
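The key preprocessing step implied by TBC, turning a tabular record into text that a language model can classify, might look like the following; the serialization template is an assumption for illustration, as the paper does not fix one here.

```python
def row_to_text(row: dict) -> str:
    """Serialize one tabular record into a sentence a text classifier can read."""
    return ". ".join(f"{col} is {val}" for col, val in row.items()) + "."

row = {"age": 42, "occupation": "engineer", "balance": 1250, "has_default": "no"}
print(row_to_text(row))
# -> "age is 42. occupation is engineer. balance is 1250. has_default is no."
```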

Fin-QD: A Computational Design Framework for Soft Grippers: Integrating MAP-Elites and High-fidelity FEM

  • paper_url: http://arxiv.org/abs/2311.12477
  • repo_url: None
  • paper_authors: Yue Xie, Xing Wang, Fumiya Iida, David Howard
  • for: Computational design to unlock the potential of soft robotic systems, addressing their strong nonlinearities
  • methods: An automated computational design optimization framework that uses a quality-diversity approach to generate diverse gripper designs able to grasp geometrically distinct object types under different physical properties
  • results: Automatically generates diverse gripper designs whose grasping performance and features are evaluated with high-fidelity Finite Element Modelling (FEM); designs account for gripper volume and workspace and work with a simple control scheme, bridging the computational design space of soft grippers and the object-grasping problem
    Abstract Computational design can unlock the full potential of soft robotics, which is otherwise hampered by strong nonlinearities arising from material, structure, and contact. To date, enthusiastic research interest has been demonstrated for individual soft fingers, but the frame design space (how each soft finger is assembled) remains largely unexplored. Computational design of finger-based soft grippers that successfully grip multiple geometrically distinct object types remains challenging: including the design space of the gripper frame brings huge difficulties for conventional optimisation algorithms and fitness calculation methods due to the exponential growth of the high-dimensional design space. This work proposes an automated computational design optimisation framework that generates gripper diversity to individually grasp geometrically distinct object types based on a quality-diversity approach. This work first discusses a significantly large design space (28 design parameters) for a finger-based soft gripper, including the rarely explored design space of finger arrangement, which is converted into various configurations for arranging individual soft fingers. Then, a contact-based Finite Element Modelling (FEM) approach is proposed in SOFA to output high-fidelity grasping data for fitness evaluation and feature measurements. Finally, diverse gripper designs are obtained from the framework while considering features such as the volume and workspace of the grippers. This work bridges the gap of computationally exploring the vast design space of finger-based soft grippers while grasping a large set of geometrically distinct object types with a simple control scheme.
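The quality-diversity approach named in the title, MAP-Elites, has a compact core loop: keep the best design per cell of a discretized feature space, then mutate elites to fill further cells. The toy two-parameter "gripper", its fitness, and its feature binning below are placeholders for the paper's 28-parameter design space and FEM-based evaluation.

```python
import random

def map_elites(evaluate, random_genome, mutate, n_iters=1000, n_init=50):
    """Minimal MAP-Elites: the archive keeps the best-performing design in each
    cell of the feature (behavior) space, so the result set stays diverse."""
    archive = {}  # feature-cell -> (fitness, genome)

    def try_insert(genome):
        fitness, cell = evaluate(genome)  # e.g. grasp quality, (volume, workspace) bin
        if cell not in archive or fitness > archive[cell][0]:
            archive[cell] = (fitness, genome)

    for _ in range(n_init):
        try_insert(random_genome())
    for _ in range(n_iters):
        _, parent = random.choice(list(archive.values()))
        try_insert(mutate(parent))
    return archive

# Toy problem: 2-parameter "gripper", feature = coarse bin of each parameter.
evaluate = lambda g: (-(g[0] - 1) ** 2 - (g[1] + 2) ** 2, (round(g[0]), round(g[1])))
random_genome = lambda: [random.uniform(-3, 3), random.uniform(-3, 3)]
mutate = lambda g: [x + random.gauss(0, 0.3) for x in g]
print(len(map_elites(evaluate, random_genome, mutate)), "cells filled")
```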

PhayaThaiBERT: Enhancing a Pretrained Thai Language Model with Unassimilated Loanwords

  • paper_url: http://arxiv.org/abs/2311.12475
  • repo_url: https://github.com/clicknext-ai/phayathaibert
  • paper_authors: Panyut Sriwirote, Jalinee Thapiang, Vasan Timtong, Attapol T. Rutherford
  • for: Addresses WangchanBERTa's shortcomings in understanding foreign words, especially English loanwords borrowed into Thai without orthographic assimilation
  • methods: Vocabulary transfer: foreign vocabulary from XLM-R's pretrained tokenizer is added to WangchanBERTa's tokenizer, and pretraining continues from WangchanBERTa's checkpoint on a larger dataset
  • results: The new pretrained model, PhayaThaiBERT, outperforms WangchanBERTa on many downstream tasks and datasets
    Abstract While WangchanBERTa has become the de facto standard in transformer-based Thai language modeling, it still has shortcomings in regard to the understanding of foreign words, most notably English words, which are often borrowed without orthographic assimilation into Thai in many contexts. We identify the lack of foreign vocabulary in WangchanBERTa's tokenizer as the main source of these shortcomings. We then expand WangchanBERTa's vocabulary via vocabulary transfer from XLM-R's pretrained tokenizer and pretrain a new model using the expanded tokenizer, starting from WangchanBERTa's checkpoint, on a new dataset that is larger than the one used to train WangchanBERTa. Our results show that our new pretrained model, PhayaThaiBERT, outperforms WangchanBERTa in many downstream tasks and datasets.
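Using the Hugging Face API, the vocabulary-transfer step could be sketched as below: copy over the tokens XLM-R knows but WangchanBERTa does not, then grow the embedding matrix. The checkpoint names are the public ones; how the paper initializes the new embedding rows before continued pretraining is not detailed here, so the library's default initialization stands in for it.

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Target model/tokenizer (WangchanBERTa) and donor tokenizer (XLM-R).
tok = AutoTokenizer.from_pretrained("airesearch/wangchanberta-base-att-spm-uncased")
donor = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForMaskedLM.from_pretrained(
    "airesearch/wangchanberta-base-att-spm-uncased")

# Tokens the donor vocabulary contains but the target vocabulary lacks.
missing = [t for t in donor.get_vocab() if t not in tok.get_vocab()]
tok.add_tokens(missing)

# New embedding rows get the library's default initialization here; the paper
# then continues pretraining from the old checkpoint on a larger dataset.
model.resize_token_embeddings(len(tok))
print(f"added {len(missing)} tokens; new vocab size = {len(tok)}")
```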

Self-Supervised Deconfounding Against Spatio-Temporal Shifts: Theory and Modeling

  • paper_url: http://arxiv.org/abs/2311.12472
  • repo_url: https://github.com/shotdowndiane/steve
  • paper_authors: Jiahao Ji, Wentao Zhang, Jingyuan Wang, Yue He, Chao Huang
  • for: Improving urban travel efficiency and promoting sustainable development by mitigating the effect of external factors on spatio-temporal (ST) traffic forecasting
  • methods: Constructs a causal graph of past traffic data, future traffic data, and external ST contexts, showing that prior art fails on out-of-distribution (OOD) traffic data because the ST contexts act as a confounder; proposes Disentangled Contextual Adjustment (DCA), a theoretical solution that separates invariant causal correlations from variant spurious ones and deconfounds the effect of ST contexts
  • results: The STEVE framework is evaluated extensively on four large-scale benchmark datasets, showing robust and reliable performance across various OOD traffic forecasting scenarios
    Abstract As an important application of spatio-temporal (ST) data, ST traffic forecasting plays a crucial role in improving urban travel efficiency and promoting sustainable development. In practice, the dynamics of traffic data frequently undergo distributional shifts attributed to external factors such as time evolution and spatial differences. This entails forecasting models to handle the out-of-distribution (OOD) issue where test data is distributed differently from training data. In this work, we first formalize the problem by constructing a causal graph of past traffic data, future traffic data, and external ST contexts. We reveal that the failure of prior arts in OOD traffic data is due to ST contexts acting as a confounder, i.e., the common cause for past data and future ones. Then, we propose a theoretical solution named Disentangled Contextual Adjustment (DCA) from a causal lens. It differentiates invariant causal correlations against variant spurious ones and deconfounds the effect of ST contexts. On top of that, we devise a Spatio-Temporal sElf-superVised dEconfounding (STEVE) framework. It first encodes traffic data into two disentangled representations for associating invariant and variant ST contexts. Then, we use representative ST contexts from three conceptually different perspectives (i.e., temporal, spatial, and semantic) as self-supervised signals to inject context information into both representations. In this way, we improve the generalization ability of the learned context-oriented representations to OOD ST traffic forecasting. Comprehensive experiments on four large-scale benchmark datasets demonstrate that our STEVE consistently outperforms the state-of-the-art baselines across various ST OOD scenarios.

Towards a Gateway for Knowledge Graph Schemas Collection, Analysis, and Embedding

  • paper_url: http://arxiv.org/abs/2311.12465
  • repo_url: None
  • paper_authors: Mattia Fumagalli, Marco Boffo, Daqian Shi, Mayukh Bagchi, Fausto Giunchiglia
  • for: Using existing catalogs of knowledge graphs to train statistical models
  • methods: A gateway that aggregates data from existing catalogs of relational data, enabling querying, analysis, and visualization
  • results: A first version of the LiveSchema initiative that integrates existing knowledge graph catalogs into one gateway and offers query and analysis services over the collected data
    Abstract One of the significant barriers to training statistical models on knowledge graphs is the difficulty that scientists have in finding the best input data to address their prediction goal. In addition, a key challenge is to determine how to manipulate these relational data, which often come in the form of triples (i.e., subject, predicate, object), to enable the learning process. Currently, many high-quality catalogs of knowledge graphs are available. However, their primary goal is the re-usability of these resources, and their interconnection, in the context of the Semantic Web. This paper describes the LiveSchema initiative, namely a first version of a gateway whose main scope is leveraging the gold mine of data collected by many existing catalogs of relational data like ontologies and knowledge graphs. At the current state, LiveSchema contains ~1000 datasets from 4 main sources and offers some key facilities, which allow to: i) evolve LiveSchema, by aggregating other source catalogs and repositories as input sources; ii) query all the collected resources; iii) transform each given dataset into formal concept analysis matrices that enable analysis and visualization services; iv) generate models and tensors from each given dataset.
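Facility (iii), turning a dataset into a formal concept analysis matrix, can be illustrated with a minimal mapping from triples to a binary objects-by-attributes incidence matrix; the choice of subjects as objects and (predicate, object) pairs as attributes is one plausible encoding, not necessarily LiveSchema's.

```python
import numpy as np

def fca_matrix(triples):
    """Build a formal-concept-analysis incidence matrix from (subject,
    predicate, object) triples: rows = subjects, columns = (predicate, object)
    attribute pairs, entries = 1 where the triple exists."""
    subjects = sorted({s for s, _, _ in triples})
    attributes = sorted({(p, o) for _, p, o in triples})
    M = np.zeros((len(subjects), len(attributes)), dtype=int)
    for s, p, o in triples:
        M[subjects.index(s), attributes.index((p, o))] = 1
    return subjects, attributes, M

triples = [("Dog", "subClassOf", "Animal"), ("Cat", "subClassOf", "Animal"),
           ("Dog", "hasProperty", "domesticated")]
subs, attrs, M = fca_matrix(triples)
print(subs, attrs)
print(M)
```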

HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis

  • paper_url: http://arxiv.org/abs/2311.12454
  • repo_url: https://github.com/sh-lee-prml/hierspeechpp
  • paper_authors: Sang-Hoon Lee, Ha-Yeong Choi, Seung-Bin Kim, Seong-Whan Lee
  • for: This paper proposes a fast and strong zero-shot speech synthesizer for text-to-speech (TTS) and voice conversion (VC) tasks.
  • methods: The proposed method, called HierSpeech++, uses a hierarchical speech synthesis framework that significantly improves the robustness and expressiveness of the synthetic speech. It also adopts the text-to-vec framework to generate a self-supervised speech representation and an F0 representation based on text representations and prosody prompts.
  • results: The experimental results demonstrated that HierSpeech++ outperforms LLM-based and diffusion-based models in zero-shot speech synthesis tasks, and achieves human-level quality zero-shot speech synthesis.
    Abstract Large language models (LLM)-based speech synthesis has been widely adopted in zero-shot speech synthesis. However, they require a large-scale data and possess the same limitations as previous autoregressive speech models, including slow inference speed and lack of robustness. This paper proposes HierSpeech++, a fast and strong zero-shot speech synthesizer for text-to-speech (TTS) and voice conversion (VC). We verified that hierarchical speech synthesis frameworks could significantly improve the robustness and expressiveness of the synthetic speech. Furthermore, we significantly improve the naturalness and speaker similarity of synthetic speech even in zero-shot speech synthesis scenarios. For text-to-speech, we adopt the text-to-vec framework, which generates a self-supervised speech representation and an F0 representation based on text representations and prosody prompts. Then, HierSpeech++ generates speech from the generated vector, F0, and voice prompt. We further introduce a high-efficient speech super-resolution framework from 16 kHz to 48 kHz. The experimental results demonstrated that the hierarchical variational autoencoder could be a strong zero-shot speech synthesizer given that it outperforms LLM-based and diffusion-based models. Moreover, we achieved the first human-level quality zero-shot speech synthesis. Audio samples and source code are available at https://github.com/sh-lee-prml/HierSpeechpp.

Extracting Definienda in Mathematical Scholarly Articles with Transformers

  • paper_url: http://arxiv.org/abs/2311.12448
  • repo_url: https://github.com/sufianj/def_extraction
  • paper_authors: Shufan Jiang, Pierre Senellart
  • for: Automatically identifying the defined term within a mathematical definition in scholarly articles
  • methods: Fine-tuned pre-trained transformers for token-level classification, and a generalist large language model (GPT) used in a question-answering setup
  • results: Experiments show that high precision and recall can be reached either with recent (and expensive) GPT-4 or with simpler pre-trained models fine-tuned on the task
    Abstract We consider automatically identifying the defined term within a mathematical definition from the text of an academic article. Inspired by the development of transformer-based natural language processing applications, we pose the problem as (a) a token-level classification task using fine-tuned pre-trained transformers; and (b) a question-answering task using a generalist large language model (GPT). We also propose a rule-based approach to build a labeled dataset from the LaTeX source of papers. Experimental results show that it is possible to reach high levels of precision and recall using either the recent (and expensive) GPT-4 or simpler pre-trained models fine-tuned on our task.
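After the token-level classifier has labeled each token, recovering the definiendum reduces to reading off contiguous tagged spans. The BIO label scheme below is a conventional choice assumed for illustration; the paper's label set may differ.

```python
def extract_definienda(tokens, tags):
    """Pull out spans tagged as the defined term from token-level BIO labels
    (B-TERM opens a span, I-TERM continues it, O is outside)."""
    spans, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B-TERM":
            if current:
                spans.append(" ".join(current))
            current = [tok]
        elif tag == "I-TERM" and current:
            current.append(tok)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = "A topological space X is called compact if every open cover ...".split()
tags = ["O", "O", "O", "O", "O", "O", "B-TERM", "O", "O", "O", "O", "O"]
print(extract_definienda(tokens, tags))   # ['compact']
```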

Designing Long-term Group Fair Policies in Dynamical Systems

  • paper_url: http://arxiv.org/abs/2311.12447
  • repo_url: None
  • paper_authors: Miriam Rateike, Isabel Valera, Patrick Forré
  • for: Proposes a new framework for achieving long-term group fairness in dynamical systems
  • methods: Models the system dynamics as a time-homogeneous Markov chain and leverages the Markov chain convergence theorem to ensure unique convergence
  • results: Identifies a time-independent policy that converges to the targeted fair stationary state of the system in the long term, independently of the initial data distribution; the framework also supports evaluating different long-term targets by analyzing their impact on the group-conditional population distribution over time
    Abstract Neglecting the effect that decisions have on individuals (and thus, on the underlying data distribution) when designing algorithmic decision-making policies may increase inequalities and unfairness in the long term - even if fairness considerations were taken in the policy design process. In this paper, we propose a novel framework for achieving long-term group fairness in dynamical systems, in which current decisions may affect an individual's features in the next step, and thus, future decisions. Specifically, our framework allows us to identify a time-independent policy that converges, if deployed, to the targeted fair stationary state of the system in the long term, independently of the initial data distribution. We model the system dynamics with a time-homogeneous Markov chain and optimize the policy leveraging the Markov chain convergence theorem to ensure unique convergence. We provide examples of different targeted fair states of the system, encompassing a range of long-term goals for society and policymakers. Furthermore, we show how our approach facilitates the evaluation of different long-term targets by examining their impact on the group-conditional population distribution in the long term and how it evolves until convergence.
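The mathematical object the policy is steered toward, the stationary state of a time-homogeneous Markov chain, is easy to compute once the transition matrix is known. The 3-state dynamics below are a toy stand-in for the population dynamics induced by a decision policy.

```python
import numpy as np

def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Left eigenvector of a row-stochastic transition matrix P for eigenvalue 1;
    by the convergence theorem, an ergodic chain reaches it from any start."""
    eigvals, eigvecs = np.linalg.eig(P.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return pi / pi.sum()

# Toy 3-state dynamics induced by a fixed decision policy.
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.7, 0.2],
              [0.0, 0.3, 0.7]])
pi = stationary_distribution(P)
print(pi, "~", np.linalg.matrix_power(P, 200)[0])  # same limit from state 0
```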

Knowledge Base Enabled Semantic Communication: A Generative Perspective

  • paper_url: http://arxiv.org/abs/2311.12443
  • repo_url: None
  • paper_authors: Jinke Ren, Zezhong Zhang, Jie Xu, Guanying Chen, Yaping Sun, Ping Zhang, Shuguang Cui
  • for: This paper aims to explore the use of semantic knowledge base (KB) to improve the efficiency of semantic communication in 6G wireless networks.
  • methods: The paper proposes a generative semantic communication architecture that utilizes three sub-KBs: source, task, and channel KBs. The construction approaches for each sub-KB are also presented, along with their use in semantic coding and transmission.
  • results: The paper demonstrates the superiority of generative semantic communication over conventional syntactic communication and classical semantic communication through a case study. The results show that generative semantic communication can significantly enhance the communication efficiency while maintaining the desired meaning of the source messages.
    Abstract Semantic communication is widely touted as a key technology for propelling the sixth-generation (6G) wireless networks. However, providing effective semantic representation is quite challenging in practice. To address this issue, this article takes a crack at exploiting semantic knowledge base (KB) to usher in a new era of generative semantic communication. Via semantic KB, source messages can be characterized in low-dimensional subspaces without compromising their desired meaning, thus significantly enhancing the communication efficiency. The fundamental principle of semantic KB is first introduced, and a generative semantic communication architecture is developed by presenting three sub-KBs, namely source, task, and channel KBs. Then, the detailed construction approaches for each sub-KB are described, followed by their utilization in terms of semantic coding and transmission. A case study is also provided to showcase the superiority of generative semantic communication over conventional syntactic communication and classical semantic communication. In a nutshell, this article establishes a scientific foundation for the exciting uncharted frontier of generative semantic communication.

Fair Enough? A map of the current limitations of the requirements to have "fair" algorithms

  • paper_url: http://arxiv.org/abs/2311.12435
  • repo_url: None
  • paper_authors: Alessandro Castelnovo, Nicole Inverardi, Gabriele Nanino, Ilaria Giuseppina Penco, Daniele Regoli
  • for: The paper focuses on the issue of bias and unfairness in automated decision-making systems, and the need for a more nuanced understanding of what "fairness" means in real-world scenarios.
  • methods: The paper highlights the existing research on assessing and mitigating bias in AI systems, but argues that these efforts are insufficient without a broader societal understanding of what fairness means in practice.
  • results: The paper identifies a list of fundamental ambiguities and attention points that must be addressed in order to give concrete meaning to the demand for fairness in AI systems.
    Abstract In the recent years, the rise in the usage and efficiency of Artificial Intelligence and, more in general, of Automated Decision-Making systems has brought with it an increasing and welcome awareness of the risks associated with such systems. One of such risks is that of perpetuating or even amplifying bias and unjust disparities present in the data from which many of these systems learn to adjust and optimise their decisions. This awareness has on one side encouraged several scientific communities to come up with more and more appropriate ways and methods to assess, quantify, and possibly mitigate such biases and disparities. On the other hand, it has prompted more and more layers of society, including policy makers, to call for "fair" algorithms. We believe that while a lot of excellent and multidisciplinary research is currently being conducted, what is still fundamentally missing is the awareness that having "fair" algorithms is per se a nearly meaningless requirement, which needs to be complemented with a lot of additional societal choices to become actionable. Namely, there is a hiatus between what society is demanding from Automated Decision-Making systems and what this demand actually means in real-world scenarios. In this work, we outline the key features of such a hiatus and pinpoint a list of fundamental ambiguities and attention points that we as a society must address in order to give a concrete meaning to the increasing demand of fairness in Automated Decision-Making systems.

A recurrent connectionist model of melody perception : An exploration using TRACX2

  • paper_url: http://arxiv.org/abs/2311.12431
  • repo_url: None
  • paper_authors: Daniel Defays, Robert French, Barbara Tillmann
  • for: Investigating whether the same or similar mechanisms underlie speech segmentation, serial image processing, and music processing
  • methods: TRACX2, a recognition-based recursive connectionist autoencoder model of chunking and sequence segmentation that has successfully simulated speech and serial-image processing
  • results: TRACX2 successfully processes the tone intervals of melodically simple French children's songs, and its internal representations cluster into human-recognizable melodic categories
    Abstract Are similar, or even identical, mechanisms used in the computational modeling of speech segmentation, serial image processing and music processing? We address this question by exploring how TRACX2 (French et al., 2011; French & Cottrell, 2014; Mareschal & French, 2017), a recognition-based, recursive connectionist autoencoder model of chunking and sequence segmentation, which has successfully simulated speech and serial-image processing, might be applied to elementary melody perception. The model, a three-layer autoencoder that recognizes "chunks" of short sequences of intervals that have been frequently encountered on input, is trained on the tone intervals of melodically simple French children's songs. It dynamically incorporates the internal representations of these chunks into new input. Its internal representations cluster in a manner that is consistent with "human-recognizable" melodic categories. TRACX2 is sensitive to both contour and proximity information in the musical chunks that it encounters in its input. It shows the "end-of-word" superiority effect demonstrated by Saffran et al. (1999) for short musical phrases. The overall findings suggest that the recursive autoassociative chunking mechanism, as implemented in TRACX2, may be a general segmentation and chunking mechanism, underlying not only word- and image-chunking, but also elementary melody processing.

How Far Have We Gone in Vulnerability Detection Using Large Language Models

  • paper_url: http://arxiv.org/abs/2311.12420
  • repo_url: None
  • paper_authors: Zeyu Gao, Hao Wang, Yuchen Zhou, Wenyu Zhu, Chao Zhang
  • for: Explores the potential of large language models (LLMs) for vulnerability detection
  • methods: Experiments with 16 LLMs and 6 state-of-the-art deep learning models and static analyzers to evaluate LLM performance on vulnerability detection
  • results: Several LLMs outperform traditional deep learning approaches in vulnerability detection, revealing untapped potential of LLMs for software security
    Abstract As software becomes increasingly complex and prone to vulnerabilities, automated vulnerability detection is critically important, yet challenging. Given the significant successes of Large Language Models (LLMs) in various tasks, there is growing anticipation of their efficacy in vulnerability detection. However, a quantitative understanding of their potential in vulnerability detection is still missing. To bridge this gap, we introduce a comprehensive vulnerability benchmark VulBench. This benchmark aggregates high-quality data from a wide range of CTF (Capture-the-Flag) challenges and real-world applications, with annotations for each vulnerable function detailing the vulnerability type and its root cause. Through our experiments encompassing 16 LLMs and 6 state-of-the-art (SOTA) deep learning-based models and static analyzers, we find that several LLMs outperform traditional deep learning approaches in vulnerability detection, revealing an untapped potential in LLMs. This work contributes to the understanding and utilization of LLMs for enhanced software security.

A Safer Vision-based Autonomous Planning System for Quadrotor UAVs with Dynamic Obstacle Trajectory Prediction and Its Application with LLMs

  • paper_url: http://arxiv.org/abs/2311.12893
  • repo_url: None
  • paper_authors: Jiageng Zhong, Ming Li, Yinliang Chen, Zihang Wei, Fan Yang, Haoran Shen
  • for: Proposes a vision-based autonomous planning system for reliable and predictable autonomous flight of intelligent quadrotor UAVs
  • methods: A lightweight object detection algorithm identifies dynamic obstacles, and Kalman filtering tracks them and estimates their motion states; planning considers not only static obstacles but also the potential movements of dynamic ones, and trajectories are generated with a B-spline-based search algorithm further optimized under multiple constraints for safety and consistency with the UAV's flight characteristics
  • results: Tests in real-world and simulated environments show that the method detects and avoids obstacles in dynamic environments in real time, with greater reliability than existing methods; the study also explores integrating the autonomous planning system with large language models (LLMs) for more natural human-machine interaction
    Abstract For intelligent quadcopter UAVs, a robust and reliable autonomous planning system is crucial. Most current trajectory planning methods for UAVs are suitable for static environments but struggle to handle dynamic obstacles, which can pose challenges and even dangers to flight. To address this issue, this paper proposes a vision-based planning system that combines tracking and trajectory prediction of dynamic obstacles to achieve efficient and reliable autonomous flight. We use a lightweight object detection algorithm to identify dynamic obstacles and then use Kalman Filtering to track and estimate their motion states. During the planning phase, we not only consider static obstacles but also account for the potential movements of dynamic obstacles. For trajectory generation, we use a B-spline-based trajectory search algorithm, which is further optimized with various constraints to enhance safety and alignment with the UAV's motion characteristics. We conduct experiments in both simulation and real-world environments, and the results indicate that our approach can successfully detect and avoid obstacles in dynamic environments in real-time, offering greater reliability compared to existing approaches. Furthermore, with the advancements in Natural Language Processing (NLP) technology demonstrating exceptional zero-shot generalization capabilities, more user-friendly human-machine interactions have become feasible, and this study also explores the integration of autonomous planning systems with Large Language Models (LLMs).
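The tracking stage can be illustrated with a textbook constant-velocity Kalman filter over 2-D positions; the state layout, noise levels, and time step are generic assumptions, since the paper's filter configuration is not given here.

```python
import numpy as np

class ConstantVelocityKF:
    """2-D constant-velocity Kalman filter for tracking a detected obstacle.
    State x = [px, py, vx, vy]; only positions are measured."""
    def __init__(self, dt=0.1, q=0.5, r=0.2):
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.zeros((2, 4)); self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = q * np.eye(4); self.R = r * np.eye(2)
        self.x = np.zeros(4); self.P = np.eye(4)

    def step(self, z):
        # Predict under the constant-velocity motion model.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update with the position measurement z.
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x  # position and velocity estimate for trajectory prediction

kf = ConstantVelocityKF()
for t in range(20):                      # obstacle moving at ~(1.0, 0.5) m/s
    z = np.array([0.1 * t, 0.05 * t]) + np.random.randn(2) * 0.05
    est = kf.step(z)
print("estimated velocity:", est[2:])
```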

nach0: Multimodal Natural and Chemical Languages Foundation Model

  • paper_url: http://arxiv.org/abs/2311.12410
  • repo_url: None
  • paper_authors: Micha Livne, Zulfat Miftahutdinov, Elena Tutubalina, Maksim Kuznetsov, Daniil Polykovskiy, Annika Brundyn, Aastha Jhunjhunwala, Anthony Costa, Alex Aliper, Alex Zhavoronkov
  • For: The paper introduces a new foundation model called nach0, which can solve various chemical and biological tasks such as biomedical question answering, named entity recognition, molecular generation, and others.
  • Methods: The model is a multi-domain and multi-task encoder-decoder LLM pre-trained on unlabeled text from scientific literature, patents, and molecule strings, with fine-tuning using specific task-related instructions.
  • Results: The model outperforms state-of-the-art baselines on single-domain and cross-domain tasks, and can generate high-quality outputs in molecular and textual formats, demonstrating its effectiveness in multi-domain setups.
    Abstract Large Language Models (LLMs) have substantially driven scientific progress in various domains, and many papers have demonstrated their ability to tackle complex problems with creative solutions. Our paper introduces a new foundation model, nach0, capable of solving various chemical and biological tasks: biomedical question answering, named entity recognition, molecular generation, molecular synthesis, attributes prediction, and others. nach0 is a multi-domain and multi-task encoder-decoder LLM pre-trained on unlabeled text from scientific literature, patents, and molecule strings to incorporate a range of chemical and linguistic knowledge. We employed instruction tuning, where specific task-related instructions are utilized to fine-tune nach0 for the final set of tasks. To train nach0 effectively, we leverage the NeMo framework, enabling efficient parallel optimization of both base and large model versions. Extensive experiments demonstrate that our model outperforms state-of-the-art baselines on single-domain and cross-domain tasks. Furthermore, it can generate high-quality outputs in molecular and textual formats, showcasing its effectiveness in multi-domain setups.

Infinite forecast combinations based on Dirichlet process

  • paper_url: http://arxiv.org/abs/2311.12379
  • repo_url: None
  • paper_authors: Yinuo Ren, Feng Li, Yanfei Kang, Jue Wang
  • for: This paper proposes a deep learning ensemble forecasting model based on the Dirichlet process to integrate information from multiple sources and improve prediction accuracy.
  • methods: The method uses a deep learning sub-model pool, weight adjustment, and diversity strategies during the combination process. It also utilizes a decaying strategy to tackle the challenge of stochastic gradient descent in determining the optimal learning rate.
  • results: The proposed ensemble model demonstrates substantial improvements in prediction accuracy and stability compared to a single benchmark model, as demonstrated through an empirical analysis using the weekly dataset from the M4 competition.
    Abstract Forecast combination integrates information from various sources by consolidating multiple forecast results from the target time series. Instead of the need to select a single optimal forecasting model, this paper introduces a deep learning ensemble forecasting model based on the Dirichlet process. Initially, the learning rate is sampled with three basis distributions as hyperparameters to convert the infinite mixture into a finite one. All checkpoints are collected to establish a deep learning sub-model pool, and weight adjustment and diversity strategies are developed during the combination process. The main advantage of this method is its ability to generate the required base learners through a single training process, utilizing the decaying strategy to tackle the challenge posed by the stochastic nature of gradient descent in determining the optimal learning rate. To ensure the method's generalizability and competitiveness, this paper conducts an empirical analysis using the weekly dataset from the M4 competition and explores sensitivity to the number of models to be combined. The results demonstrate that the ensemble model proposed offers substantial improvements in prediction accuracy and stability compared to a single benchmark model.
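The Dirichlet process enters ensemble forecasting through its weights; a truncated stick-breaking construction makes this tangible. The concentration parameter, truncation level, and toy sub-model forecasts below are illustrative, not the paper's settings.

```python
import numpy as np

def stick_breaking_weights(alpha: float, n: int, rng=None) -> np.ndarray:
    """Draw n truncated Dirichlet-process mixture weights by stick breaking:
    w_k = v_k * prod_{j<k} (1 - v_j), with v_k ~ Beta(1, alpha)."""
    rng = rng or np.random.default_rng(0)
    v = rng.beta(1.0, alpha, size=n)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    w = v * remaining
    return w / w.sum()   # renormalize away the truncation remainder

weights = stick_breaking_weights(alpha=2.0, n=8)
# 8 hypothetical sub-model forecasts over a 12-week horizon, combined linearly.
forecasts = np.random.default_rng(1).normal(100, 5, size=(8, 12))
combined = weights @ forecasts
print(weights.round(3))
print(combined.round(1))
```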

Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge

  • paper_url: http://arxiv.org/abs/2311.12889
  • repo_url: None
  • paper_authors: Bowen Jiang, Zhijun Zhuang, Camillo Jose Taylor
  • for: Proposes a scene graph generation approach that incorporates a relationship hierarchy and commonsense knowledge to improve performance
  • methods: A Bayesian classification head exploits the hierarchical structure to jointly predict the super-category (type) of the relationship between two objects and the detailed relationship under each super-category
  • results: On the Visual Genome and OpenImage V6 datasets, exploiting hierarchical relationships and commonsense knowledge substantially improves performance; the method can also be incorporated as a portable module into existing scene graph generation algorithms to improve their results
    Abstract This work presents an enhanced approach to generating scene graphs by incorporating a relationship hierarchy and commonsense knowledge. Specifically, we propose a Bayesian classification head that exploits an informative hierarchical structure. It jointly predicts the super-category or type of relationship between the two objects, along with the detailed relationship under each super-category. We design a commonsense validation pipeline that uses a large language model to critique the results from the scene graph prediction system and then use that feedback to enhance the model performance. The system requires no external large language model assistance at test time, making it more convenient for practical applications. Experiments on the Visual Genome and the OpenImage V6 datasets demonstrate that harnessing hierarchical relationships enhances the model performance by a large margin. The proposed Bayesian head can also be incorporated as a portable module in existing scene graph generation algorithms to improve their results. In addition, the commonsense validation enables the model to generate an extensive set of reasonable predictions beyond dataset annotations.
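The Bayesian classification head factorizes the relation distribution as p(rel) = p(super) * p(rel | super), which a few lines of PyTorch make explicit. The three super-categories and their relation counts below are placeholders; the split into geometric, possessive, and semantic groups is one natural reading of the hierarchy, not a claim about the paper's exact label sets.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalRelationHead(nn.Module):
    """Joint prediction of a relationship super-category and the detailed
    relation within it: log p(super, rel) = log p(super) + log p(rel | super)."""
    def __init__(self, feat_dim, rels_per_super):
        super().__init__()
        self.super_head = nn.Linear(feat_dim, len(rels_per_super))
        self.sub_heads = nn.ModuleList(nn.Linear(feat_dim, n) for n in rels_per_super)

    def forward(self, feats):
        log_p_super = F.log_softmax(self.super_head(feats), dim=-1)
        joint = [log_p_super[:, i:i + 1] + F.log_softmax(h(feats), dim=-1)
                 for i, h in enumerate(self.sub_heads)]
        return torch.cat(joint, dim=-1)   # log joint over all detailed relations

# E.g. 3 super-categories (geometric / possessive / semantic) with 4/3/5 relations.
head = HierarchicalRelationHead(feat_dim=256, rels_per_super=[4, 3, 5])
logp = head(torch.randn(2, 256))
print(logp.exp().sum(dim=-1))             # sums to 1: a valid joint distribution
```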

Post-Training Quantization with Low-precision Minifloats and Integers on FPGAs

  • paper_url: http://arxiv.org/abs/2311.12359
  • repo_url: None
  • paper_authors: Shivam Aggarwal, Alessandro Pappalardo, Hans Jakob Damsgaard, Giuseppe Franco, Thomas B. Preußer, Michaela Blott, Tulika Mitra
  • for: This work proposes a new quantization strategy that reduces neural network precision without additional training overhead.
  • methods: The study covers minifloat and integer quantization schemes, together with PTQ techniques such as weight equalization, bias correction, SmoothQuant, gradient-based learned rounding, and GPTQ.
  • results: Experiments show that low-precision minifloats compare favorably with integer quantization schemes across the accuracy-precision trade-off. An evaluation against an FPGA hardware cost model, however, shows that integer quantization is often the optimal choice due to its smaller hardware resource footprint.
    Abstract Post-Training Quantization (PTQ) is a powerful technique for model compression, reducing the precision of neural networks without additional training overhead. Recent works have investigated adopting 8-bit floating-point quantization (FP8) in the context of PTQ for model inference. However, the exploration of floating-point formats smaller than 8 bits and their comparison with integer quantization remains relatively limited. In this work, we present minifloats, which are reduced-precision floating-point formats capable of further reducing the memory footprint, latency, and energy cost of a model while approaching full-precision model accuracy. Our work presents a novel PTQ design-space exploration, comparing minifloat and integer quantization schemes across a range of 3 to 8 bits for both weights and activations. We examine the applicability of various PTQ techniques to minifloats, including weight equalization, bias correction, SmoothQuant, gradient-based learned rounding, and the GPTQ method. Our experiments validate the effectiveness of low-precision minifloats when compared to their integer counterparts across a spectrum of accuracy-precision trade-offs on a set of reference deep learning vision workloads. Finally, we evaluate our results against an FPGA-based hardware cost model, showing that integer quantization often remains the Pareto-optimal option, given its relatively smaller hardware resource footprint.
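The following sketch illustrates what a minifloat quantizer can look like: round a tensor to a format with `e_bits` exponent and `m_bits` mantissa bits. The bias convention, the absence of inf/NaN encodings, and round-to-nearest are simplifying assumptions rather than the paper's exact formats.

```python
import numpy as np

def quantize_minifloat(x, e_bits=4, m_bits=3):
    """Round values to a toy minifloat grid; tiny values flush toward subnormals."""
    x = np.asarray(x, dtype=np.float64)
    bias = 2 ** (e_bits - 1) - 1
    max_exp = 2 ** e_bits - 1 - bias                  # no codes reserved for inf/NaN
    max_val = (2 - 2 ** -m_bits) * 2.0 ** max_exp
    sign = np.sign(x)
    mag = np.clip(np.abs(x), 0, max_val)              # saturate at the largest value
    exp = np.floor(np.log2(np.where(mag > 0, mag, 1.0)))
    exp = np.clip(exp, -bias + 1, max_exp)            # clamp into the normal exponent range
    scale = 2.0 ** (exp - m_bits)                     # spacing of the mantissa grid
    return sign * np.round(mag / scale) * scale

w = np.array([0.1234, -1.7, 3.9, 250.0, 1e-8])
print(quantize_minifloat(w, e_bits=4, m_bits=3))
```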

Stable Diffusion For Aerial Object Detection

  • paper_url: http://arxiv.org/abs/2311.12345
  • repo_url: None
  • paper_authors: Yanan Jian, Fuxun Yu, Simranjit Singh, Dimitrios Stamoulis
  • for: To improve the accuracy and efficiency of large-scale aerial object detection.
  • methods: Uses stable diffusion (SD) to generate semantically rich augmentation data, with sparse-to-dense region-of-interest extraction, low-rank adaptation (LORA) fine-tuning, and a Copy-Paste method that blends synthesized objects into backgrounds.
  • results: The approach improves the accuracy and efficiency of aerial object detection and adapts to the challenges of large-scale data collection and the sparse nature of aerial objects.
    Abstract Aerial object detection is a challenging task, in which one major obstacle lies in the limitations of large-scale data collection and the long-tail distribution of certain classes. Synthetic data offers a promising solution, especially with recent advances in diffusion-based methods like stable diffusion (SD). However, the direct application of diffusion methods to aerial domains poses unique challenges: stable diffusion's optimization for rich ground-level semantics doesn't align with the sparse nature of aerial objects, and the extraction of post-synthesis object coordinates remains problematic. To address these challenges, we introduce a synthetic data augmentation framework tailored for aerial images. It encompasses sparse-to-dense region of interest (ROI) extraction to bridge the semantic gap, fine-tuning the diffusion model with low-rank adaptation (LORA) to circumvent exhaustive retraining, and finally, a Copy-Paste method to compose synthesized objects with backgrounds, providing a nuanced approach to aerial object detection through synthetic data.
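A toy version of the Copy-Paste step is sketched below: a synthesized object crop is blended into an aerial background at a random location using its mask, yielding an augmented image and a bounding-box label. The shapes and the alpha-blending rule are illustrative assumptions.

```python
import numpy as np

def copy_paste(background, obj_rgb, obj_mask, rng=None):
    """background: (H, W, 3); obj_rgb: (h, w, 3); obj_mask: (h, w) in [0, 1]."""
    rng = rng or np.random.default_rng()
    H, W, _ = background.shape
    h, w, _ = obj_rgb.shape
    y = rng.integers(0, H - h + 1)                    # random paste location
    x = rng.integers(0, W - w + 1)
    out = background.copy()
    region = out[y:y + h, x:x + w].astype(float)
    m = obj_mask[..., None]
    out[y:y + h, x:x + w] = (m * obj_rgb + (1 - m) * region).astype(background.dtype)
    bbox = (x, y, x + w, y + h)                       # detection label for the pasted object
    return out, bbox

bg = np.zeros((256, 256, 3), dtype=np.uint8)
obj = np.full((32, 32, 3), 255, dtype=np.uint8)
mask = np.ones((32, 32))
aug, box = copy_paste(bg, obj, mask)
print(box, aug.sum() > 0)
```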

A Survey on Large Language Models for Personalized and Explainable Recommendations

  • paper_url: http://arxiv.org/abs/2311.12338
  • repo_url: None
  • paper_authors: Junyi Chen
  • for: This survey aims to analyze how Recommender Systems (RS) can benefit from Large Language Models (LLM) based methodologies.
  • methods: The survey describes how LLMs can be used to enhance personalized and explainable recommendations by processing vast amounts of textual data.
  • results: The survey highlights major challenges in Personalized Explanation Generating (PEG) tasks, including cold-start problems, unfairness, and bias in RS.
    Abstract In recent years, Recommender Systems(RS) have witnessed a transformative shift with the advent of Large Language Models(LLMs) in the field of Natural Language Processing(NLP). These models such as OpenAI's GPT-3.5/4, Llama from Meta, have demonstrated unprecedented capabilities in understanding and generating human-like text. This has led to a paradigm shift in the realm of personalized and explainable recommendations, as LLMs offer a versatile toolset for processing vast amounts of textual data to enhance user experiences. To provide a comprehensive understanding of the existing LLM-based recommendation systems, this survey aims to analyze how RS can benefit from LLM-based methodologies. Furthermore, we describe major challenges in Personalized Explanation Generating(PEG) tasks, which are cold-start problems, unfairness and bias problems in RS.

Do Smaller Language Models Answer Contextualised Questions Through Memorisation Or Generalisation?

  • paper_url: http://arxiv.org/abs/2311.12337
  • repo_url: None
  • paper_authors: Tim Hartill, Joshua Bensemann, Michael Witbrock, Patricia J. Riddle
  • for: This paper aims to determine whether language models answer questions through memorisation or through generalisation.
  • methods: The paper uses semantic similarity of input and label tokens between training and evaluation samples to assess whether answers could have been memorised.
  • results: The method surfaces evaluation-train pairs with semantic overlap, and the unmemorisable subsets it identifies reveal genuine performance gains: on two evaluation datasets (DROP and ROPES), additional reasoning-focused training improves performance by 9.0% and 25.7% respectively.
    Abstract A distinction is often drawn between a model's ability to predict a label for an evaluation sample that is directly memorised from highly similar training samples versus an ability to predict the label via some method of generalisation. In the context of using Language Models for question-answering, discussion continues to occur as to the extent to which questions are answered through memorisation. We consider this issue for questions that would ideally be answered through reasoning over an associated context. We propose a method of identifying evaluation samples for which it is very unlikely our model would have memorised the answers. Our method is based on semantic similarity of input tokens and label tokens between training and evaluation samples. We show that our method offers advantages upon some prior approaches in that it is able to surface evaluation-train pairs that have overlap in either contiguous or discontiguous sequences of tokens. We use this method to identify unmemorisable subsets of our evaluation datasets. We train two Language Models in a multitask fashion whereby the second model differs from the first only in that it has two additional datasets added to the training regime that are designed to impart simple numerical reasoning strategies of a sort known to improve performance on some of our evaluation datasets but not on others. We then show that there is performance improvement between the two models on the unmemorisable subsets of the evaluation datasets that were expected to benefit from the additional training datasets. Specifically, performance on unmemorisable subsets of two of our evaluation datasets, DROP and ROPES significantly improves by 9.0%, and 25.7% respectively while other evaluation datasets have no significant change in performance.
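The following sketch approximates the filtering idea: score each evaluation sample by its similarity to the closest training sample over input-plus-label text, and keep only low-similarity samples as the unmemorisable subset. TF-IDF cosine similarity and the 0.5 cutoff are illustrative stand-ins for the paper's token-based scoring.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def unmemorisable_subset(train_texts, eval_texts, threshold=0.5):
    """Keep eval samples whose closest training sample is below the threshold."""
    vec = TfidfVectorizer().fit(train_texts + eval_texts)
    sims = cosine_similarity(vec.transform(eval_texts), vec.transform(train_texts))
    max_sim = sims.max(axis=1)            # similarity to the nearest training sample
    return [t for t, s in zip(eval_texts, max_sim) if s < threshold]

train = ["the river rose three feet answer: flood",
         "steps walked per day answer: 9000"]
evals = ["the river rose three feet answer: flood",   # likely memorisable
         "how many apples remain answer: 4"]          # unlikely memorised
print(unmemorisable_subset(train, evals))
```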

Classification of Instagram fake users using supervised machine learning algorithms

  • paper_url: http://arxiv.org/abs/2311.12336
  • repo_url: None
  • paper_authors: Vertika Singh, Naman Tolasaria, Patel Meet Alpeshkumar, Shreyash Bartwal
  • for: This paper presents an application for detecting and neutralizing fake accounts and online impersonation on social media, protecting companies from fraud risk.
  • methods: The application follows a user-centric design that integrates easily with existing investigative procedures and offers an accessible interface for criminal-branch investigative agencies.
  • results: The application helps detect and neutralize fake accounts and online impersonation on social media, reducing companies' exposure to fraud.
    Abstract In the contemporary era, online social networks have become integral to social life, revolutionizing the way individuals manage their social connections. While enhancing accessibility and immediacy, these networks have concurrently given rise to challenges, notably the proliferation of fraudulent profiles and online impersonation. This paper proposes an application designed to detect and neutralize such dishonest entities, with a focus on safeguarding companies from potential fraud. The user-centric design of the application ensures accessibility for investigative agencies, particularly the criminal branch, facilitating navigation of complex social media landscapes and integration with existing investigative procedures.

Quantum-Enhanced Support Vector Machine for Large-Scale Stellar Classification with GPU Acceleration

  • paper_url: http://arxiv.org/abs/2311.12328
  • repo_url: None
  • paper_authors: Kuan-Cheng Chen, Xiaotian Xu, Henry Makhanov, Hui-Hsuan Chung, Chen-Yu Liu
  • for: This study develops a quantum-enhanced support vector machine (QSVM) approach for stellar classification based on quantum computing and GPU acceleration.
  • methods: The method applies quantum computing principles with GPU acceleration to complex binary and multi-class stellar classification systems, significantly outperforming traditional methods such as K-Nearest Neighbors and Logistic Regression.
  • results: The quantum-enhanced SVM markedly improves classification accuracy, particularly on complex stellar classification problems, while the combination of quantum computing and GPU acceleration improves computational efficiency and scales to large datasets, pointing to promising applications of quantum machine learning in astrophysical research.
    Abstract In this study, we introduce an innovative Quantum-enhanced Support Vector Machine (QSVM) approach for stellar classification, leveraging the power of quantum computing and GPU acceleration. Our QSVM algorithm significantly surpasses traditional methods such as K-Nearest Neighbors (KNN) and Logistic Regression (LR), particularly in handling complex binary and multi-class scenarios within the Harvard stellar classification system. The integration of quantum principles notably enhances classification accuracy, while GPU acceleration using the cuQuantum SDK ensures computational efficiency and scalability for large datasets in quantum simulators. This synergy not only accelerates the processing process but also improves the accuracy of classifying diverse stellar types, setting a new benchmark in astronomical data analysis. Our findings underscore the transformative potential of quantum machine learning in astronomical research, marking a significant leap forward in both precision and processing speed for stellar classification. This advancement has broader implications for astrophysical and related scientific fields
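To make the quantum-kernel idea concrete, here is a classically simulated toy: features are angle-encoded into a product state and the kernel is the state overlap |<phi(x)|phi(y)>|^2, fed to a standard SVM as a precomputed kernel. The encoding circuit and synthetic labels are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.svm import SVC

def angle_encode(x):
    """Product state of single-qubit rotations: one qubit per feature."""
    state = np.array([1.0 + 0j])
    for theta in x:
        qubit = np.array([np.cos(theta / 2), 1j * np.sin(theta / 2)])
        state = np.kron(state, qubit)
    return state

def quantum_kernel(A, B):
    """Fidelity kernel K(a, b) = |<phi(a)|phi(b)>|^2, simulated classically."""
    SA = [angle_encode(a) for a in A]
    SB = [angle_encode(b) for b in B]
    return np.abs(np.array([[np.vdot(sa, sb) for sb in SB] for sa in SA])) ** 2

rng = np.random.default_rng(0)
X = rng.uniform(0, np.pi, size=(40, 3))          # e.g. 3 stellar features
y = (X.sum(axis=1) > 1.5 * np.pi).astype(int)    # synthetic labels for the toy
clf = SVC(kernel="precomputed").fit(quantum_kernel(X, X), y)
print(clf.score(quantum_kernel(X, X), y))
```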

A Survey on Multimodal Large Language Models for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2311.12320
  • repo_url: https://github.com/irohxu/awesome-multimodal-llm-autonomous-driving
  • paper_authors: Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, Kaizhao Liang, Jintai Chen, Juanwu Lu, Zichong Yang, Kuei-Da Liao, Tianren Gao, Erlong Li, Kun Tang, Zhipeng Cao, Tong Zhou, Ao Liu, Xinrui Yan, Shuqi Mei, Jianguo Cao, Ziran Wang, Chao Zheng
  • for: This paper aims to provide a comprehensive understanding of the challenges, opportunities, and future endeavors in applying Large Language Models (LLMs) to autonomous driving systems.
  • methods: The paper reviews existing Multimodal Large Language Model (MLLM) tools for driving, transportation, and map systems, as well as existing datasets and benchmarks. It also discusses several important problems that academia and industry need to solve to further advance the field.
  • results: The paper presents a systematic investigation of the application of LLMs in autonomous driving systems, including a review of existing MLLM tools, datasets, and benchmarks, along with a discussion of the challenges and opportunities in this field.
    Abstract With the emergence of Large Language Models (LLMs) and Vision Foundation Models (VFMs), multimodal AI systems benefiting from large models have the potential to equally perceive the real world, make decisions, and control tools as humans. In recent months, LLMs have shown widespread attention in autonomous driving and map systems. Despite its immense potential, there is still a lack of a comprehensive understanding of key challenges, opportunities, and future endeavors to apply in LLM driving systems. In this paper, we present a systematic investigation in this field. We first introduce the background of Multimodal Large Language Models (MLLMs), the multimodal models development using LLMs, and the history of autonomous driving. Then, we overview existing MLLM tools for driving, transportation, and map systems together with existing datasets and benchmarks. Moreover, we summarized the works in The 1st WACV Workshop on Large Language and Vision Models for Autonomous Driving (LLVM-AD), which is the first workshop of its kind regarding LLMs in autonomous driving. To further promote the development of this field, we also discuss several important problems regarding using MLLMs in autonomous driving systems that need to be solved by both academia and industry.

Overcoming Pathology Image Data Deficiency: Generating Images from Pathological Transformation Process

  • paper_url: http://arxiv.org/abs/2311.12316
  • repo_url: https://github.com/rowerliu/adbd
  • paper_authors: Zeyu Liu, Yufang He, Yu Zhao, Yunlu Feng, Guanglei Zhang
  • for: Histopathology, the gold standard of medical diagnosis, faces application limitations due to the shortage of medical resources. Deep-learning-based computer-aided diagnosis can alleviate the pathologist shortage and provide timely clinical analysis, but reliable models typically require substantial training data, which is challenging to obtain in the pathological field.
  • methods: We propose an adaptive depth-controlled bidirectional diffusion (ADBD) network for image data generation. The domain-migration approach works with small training sets and overcomes diffusion overfitting through source-information guidance. We also develop a hybrid attention strategy that blends global and local attention priorities to guide the bidirectional diffusion and ensure successful migration.
  • results: ADBD effectively addresses the shortage of pathological image data and can support further pathology-related research. Experiments show that ADBD generates unlimited cross-domain intermediate images with corresponding soft labels consistent with clinical diagnosis.
    Abstract Histopathology serves as the gold standard for medical diagnosis but faces application limitations due to the shortage of medical resources. Leveraging deep learning, computer-aided diagnosis has the potential to alleviate the pathologist scarcity and provide timely clinical analysis. However, developing a reliable model generally necessitates substantial data for training, which is challenging in pathological field. In response, we propose an adaptive depth-controlled bidirectional diffusion (ADBD) network for image data generation. The domain migration approach can work with small trainset and overcome the diffusion overfitting by source information guidance. Specifically, we developed a hybrid attention strategy to blend global and local attention priorities, which guides the bidirectional diffusion and ensures the migration success. In addition, we developed the adaptive depth-controlled strategy to simulate physiological transformations, capable of yielding unlimited cross-domain intermediate images with corresponding soft labels. ADBD is effective for overcoming pathological image data deficiency and supportable for further pathology-related research.

IEKM: A Model Incorporating External Keyword Matrices

  • paper_url: http://arxiv.org/abs/2311.12310
  • repo_url: None
  • paper_authors: Cheng Luo, Qin Li, Zhao Yan, Mengliang Rao, Yunbo Cao
  • for: To address two pressing challenges in the core semantic textual similarity (STS) task of customer-service platform systems: adapting to customers from different domains (DDA), and distinguishing sentence pairs that are literally close but semantically different, i.e., hard negative samples.
  • methods: We propose an incorporating external keyword matrices (IEKM) model built on the Transformer structure, fusing external matrices into the self-attention layers through gating units to enable flexible corrections to the model results.
  • results: Evaluated on multiple datasets, the method improves performance on all of them. A flexible correction experiment confirms that the method effectively addresses all the above challenges, raising the F1 value from 56.61 to 73.53.
    Abstract A customer service platform system with a core text semantic similarity (STS) task faces two urgent challenges: Firstly, one platform system needs to adapt to different domains of customers, i.e., different domains adaptation (DDA). Secondly, it is difficult for the model of the platform system to distinguish sentence pairs that are literally close but semantically different, i.e., hard negative samples. In this paper, we propose an incorporation external keywords matrices model (IEKM) to address these challenges. The model uses external tools or dictionaries to construct external matrices and fuses them to the self-attention layers of the Transformer structure through gating units, thus enabling flexible corrections to the model results. We evaluate the method on multiple datasets and the results show that our method has improved performance on all datasets. To demonstrate that our method can effectively solve all the above challenges, we conduct a flexible correction experiment, which results in an increase in the F1 value from 56.61 to 73.53. Our code will be publicly available.
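A minimal sketch of the fusion idea follows: a learned gate mixes ordinary scaled dot-product attention scores with an externally constructed token-pair matrix (e.g. keyword-match scores) before the softmax. The single scalar gate and fusion rule are assumptions about how such a gating unit could look, not IEKM's exact design.

```python
import torch
import torch.nn as nn

class GatedExternalAttention(nn.Module):
    """Self-attention whose scores are softly corrected by an external matrix."""
    def __init__(self, d_model):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.gate = nn.Parameter(torch.zeros(1))   # sigmoid(0) = 0.5 at init
        self.scale = d_model ** -0.5

    def forward(self, x, ext_matrix):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) * self.scale      # (B, T, T)
        g = torch.sigmoid(self.gate)
        fused = (1 - g) * scores + g * ext_matrix          # flexible, soft correction
        return torch.softmax(fused, dim=-1) @ v

layer = GatedExternalAttention(d_model=64)
x = torch.randn(2, 10, 64)
M = torch.randn(2, 10, 10)          # e.g. keyword co-occurrence scores per token pair
print(layer(x, M).shape)            # torch.Size([2, 10, 64])
```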

Causality is all you need

  • paper_url: http://arxiv.org/abs/2311.12307
  • repo_url: None
  • paper_authors: Ning Xu, Yifei Gao, Hongshuo Tian, Yongdong Zhang, An-An Liu
  • for: This paper aims to build an integrated causal framework for revealing cause-effect forces hidden in data, which can improve the machine’s comprehension of causal relationships within a broader semantic space.
  • methods: The proposed Causal Graph Routing (CGR) framework relies entirely on intervention mechanisms and includes a stack of causal layers, each with a set of parallel deconfounding blocks from different causal graphs. The sufficient cause concept is used to dynamically select the suitable deconfounding methods in each layer.
  • results: The CGR framework was evaluated on two classical tasks of CV and NLP and surpassed the current state-of-the-art methods on both Visual Question Answer and Long Document Classification tasks. It has great potential in building the “causal” pre-training large-scale model that effectively generalizes to diverse tasks.
    Abstract In the fundamental statistics course, students are taught to remember the well-known saying: "Correlation is not Causation". Till now, statistics (i.e., correlation) have developed various successful frameworks, such as Transformer and Pre-training large-scale models, which have stacked multiple parallel self-attention blocks to imitate a wide range of tasks. However, in the causation community, how to build an integrated causal framework still remains an untouched domain despite its excellent intervention capabilities. In this paper, we propose the Causal Graph Routing (CGR) framework, an integrated causal scheme relying entirely on the intervention mechanisms to reveal the cause-effect forces hidden in data. Specifically, CGR is composed of a stack of causal layers. Each layer includes a set of parallel deconfounding blocks from different causal graphs. We combine these blocks via the concept of the proposed sufficient cause, which allows the model to dynamically select the suitable deconfounding methods in each layer. CGR is implemented as the stacked networks, integrating no confounder, back-door adjustment, front-door adjustment, and probability of sufficient cause. We evaluate this framework on two classical tasks of CV and NLP. Experiments show CGR can surpass the current state-of-the-art methods on both Visual Question Answer and Long Document Classification tasks. In particular, CGR has great potential in building the "causal" pre-training large-scale model that effectively generalizes to diverse tasks. It will improve the machines' comprehension of causal relationships within a broader semantic space.
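For reference, the deconfounding blocks named above build on the standard adjustment formulas from causal inference; a compact statement (with z a back-door confounder set and m a front-door mediator set) is:

```latex
% Standard intervention formulas the deconfounding blocks build on.
\begin{align}
  \text{back-door:}  \quad P(y \mid do(x)) &= \sum_{z} P(y \mid x, z)\, P(z) \\
  \text{front-door:} \quad P(y \mid do(x)) &= \sum_{m} P(m \mid x) \sum_{x'} P(y \mid m, x')\, P(x')
\end{align}
```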

Discovering Effective Policies for Land-Use Planning

  • paper_url: http://arxiv.org/abs/2311.12304
  • repo_url: None
  • paper_authors: Risto Miikkulainen, Olivier Francon, Daniel Young, Elliot Meyerson, Babak Hodjat
  • for: The goal of this research is to provide an effective land-use planning tool with which decision-makers can quickly and efficiently evaluate how different land-use options affect climate change.
  • methods: The study learns a surrogate model from available historical data on land-use changes and a simulation of carbon emissions/absorption, then uses an evolutionary search process to discover effective land-use policies for specific locations.
  • results: Built on the Project Resilience platform and evaluated with the Land-Use Harmonization dataset and the BLUE simulator, the system generates Pareto fronts that trade off carbon impact against the amount of change for different locations, providing a potentially useful tool for land-use planning.
    Abstract How areas of land are allocated for different uses, such as forests, urban, and agriculture, has a large effect on carbon balance, and therefore climate change. Based on available historical data on changes in land use and a simulation of carbon emissions/absorption, a surrogate model can be learned that makes it possible to evaluate the different options available to decision-makers efficiently. An evolutionary search process can then be used to discover effective land-use policies for specific locations. Such a system was built on the Project Resilience platform and evaluated with the Land-Use Harmonization dataset and the BLUE simulator. It generates Pareto fronts that trade off carbon impact and amount of change customized to different locations, thus providing a potentially useful tool for land-use planning.
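A toy sketch of the search loop follows: candidate policies are mutated and scored on two objectives, surrogate-predicted carbon impact and amount of change, and the nondominated set forms the Pareto front. The surrogate function and the two-component land-use encoding are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def surrogate_carbon(policy):          # stand-in for the learned surrogate model
    return -policy[0] + 0.5 * policy[1]

def objectives(policy, current):       # (carbon impact, amount of change), both minimised
    return (float(surrogate_carbon(policy)), float(np.abs(policy - current).sum()))

def dominates(b, a):                   # b is at least as good everywhere and better somewhere
    return all(bi <= ai for bi, ai in zip(b, a)) and b != a

current = np.array([0.3, 0.7])         # e.g. fractions of forest vs. agriculture
pop = [rng.dirichlet([1.0, 1.0]) for _ in range(20)]
for _ in range(200):                   # mutate a random parent, keep everything
    parent = pop[rng.integers(len(pop))]
    pop.append(np.clip(parent + rng.normal(0.0, 0.1, size=2), 0.0, 1.0))

scored = [(objectives(p, current), p) for p in pop]
front = [p for s, p in scored if not any(dominates(t, s) for t, _ in scored)]
print(len(front), "policies on the Pareto front")
```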

Detecting subtle macroscopic changes in a finite temperature classical scalar field with machine learning

  • paper_url: http://arxiv.org/abs/2311.12303
  • repo_url: None
  • paper_authors: Jiming Yang, Yutong Zheng, Jiahong Zhou, Huiyu Li, Jun Yin
  • for: This study explores how detecting subtle macroscopic changes can probe the behavior of experimental many-body systems from the classical to the quantum realm.
  • methods: The study compares different methods for differentiating scalar field samples at varying temperatures, including a physics method, a statistics method, and an AI method.
  • results: The AI method outperforms both the physics and statistics methods in sensitivity, providing a proof-of-concept that AI can potentially detect macroscopic changes in many-body systems that elude physical measures.
    Abstract The ability to detect macroscopic changes is important for probing the behaviors of experimental many-body systems from the classical to the quantum realm. Although abrupt changes near phase boundaries can easily be detected, subtle macroscopic changes are much more difficult to detect as the changes can be obscured by noise. In this study, as a toy model for detecting subtle macroscopic changes in many-body systems, we try to differentiate scalar field samples at varying temperatures. We compare different methods for making such differentiations, from physics method, statistics method, to AI method. Our finding suggests that the AI method outperforms both the statistical method and the physics method in its sensitivity. Our result provides a proof-of-concept that AI can potentially detect macroscopic changes in many-body systems that elude physical measures.
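As a toy stand-in for the task, the sketch below simulates "thermal" field samples as Gaussian noise whose fluctuation amplitude scales with temperature, and trains a simple classifier to tell two nearby temperatures apart. The field model is a deliberate oversimplification of a finite-temperature scalar field.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sample_field(temperature, n=64, rng=None):
    """Flattened n x n 'field' whose fluctuation variance scales with temperature."""
    rng = rng or np.random.default_rng()
    return rng.normal(0.0, np.sqrt(temperature), size=n * n)

rng = np.random.default_rng(0)
X = np.array([sample_field(t, rng=rng) for t in [1.00] * 200 + [1.05] * 200])
y = np.array([0] * 200 + [1] * 200)
clf = LogisticRegression(max_iter=2000).fit(X[::2], y[::2])   # interleaved split
print("held-out accuracy:", clf.score(X[1::2], y[1::2]))
```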

Noise in Relation Classification Dataset TACRED: Characterization and Reduction

  • paper_url: http://arxiv.org/abs/2311.12298
  • repo_url: None
  • paper_authors: Akshay Parekh, Ashish Anand, Amit Awekar
  • for: The objective of this paper is two-fold: first, to explore model-based approaches for characterizing the primary cause of noise in the relation-classification dataset TACRED; second, to identify potentially noisy instances.
  • methods: We analyze the predictions and performance of state-of-the-art models to identify the root cause of noise in TACRED, finding that most of it originates from instances labeled no-relation. We then propose two nearest-neighbor-based strategies for automatically identifying potentially noisy instances for elimination and reannotation: the Intrinsic Strategy (IS), which assumes positive examples are clean and uses false-negative predictions to flag noisy negatives, and the Extrinsic Strategy (ES), which uses a clean subset of the dataset to identify potentially noisy negatives.
  • results: Retraining two SOTA models on the dataset cleaned with IS yields an average 4% F1-score improvement, while reannotation (TACRED-R) does not improve the original results. Following ES, the models gain an average of 3.8% and 4.4% F1 on the eliminated (TACRED-EN) and reannotated (TACRED-RN) datasets respectively. Extending ES to clean positive examples as well yields average improvements of 5.8% and 5.6% on the eliminated (TACRED-ENP) and reannotated (TACRED-RNP) datasets.
    Abstract The overarching objective of this paper is two-fold. First, to explore model-based approaches to characterize the primary cause of the noise in the RE dataset TACRED. Second, to identify the potentially noisy instances. Towards the first objective, we analyze predictions and performance of state-of-the-art (SOTA) models to identify the root cause of noise in the dataset. Our analysis of TACRED shows that the majority of the noise in the dataset originates from the instances labeled as no-relation which are negative examples. For the second objective, we explore two nearest-neighbor-based strategies to automatically identify potentially noisy examples for elimination and reannotation. Our first strategy, referred to as Intrinsic Strategy (IS), is based on the assumption that positive examples are clean. Thus, we have used false-negative predictions to identify noisy negative examples. Whereas, our second approach, referred to as Extrinsic Strategy, is based on using a clean subset of the dataset to identify potentially noisy negative examples. Finally, we retrained the SOTA models on the eliminated and reannotated dataset. Our empirical results based on two SOTA models trained on TACRED-E following the IS show an average 4% F1-score improvement, whereas reannotation (TACRED-R) does not improve the original results. However, following ES, SOTA models show the average F1-score improvement of 3.8% and 4.4% when trained on respective eliminated (TACRED-EN) and reannotated (TACRED-RN) datasets respectively. We further extended the ES for cleaning positive examples as well, which resulted in an average performance improvement of 5.8% and 5.6% for the eliminated (TACRED-ENP) and reannotated (TACRED-RNP) datasets respectively.
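A small sketch of the Intrinsic Strategy's core check: assuming positive labels are clean, a no-relation example that a trained model confidently assigns to a relation is flagged as potentially noisy. The model interface and the 0.9 confidence threshold are illustrative assumptions.

```python
def flag_noisy_negatives(examples, predict_proba, threshold=0.9):
    """examples: list of (text, label); predict_proba(text) -> dict label -> prob."""
    flagged = []
    for text, label in examples:
        if label != "no_relation":
            continue                          # only negative examples are suspected
        probs = predict_proba(text)
        top = max(probs, key=probs.get)
        if top != "no_relation" and probs[top] >= threshold:
            flagged.append((text, top))       # candidate for elimination/reannotation
    return flagged

# Toy usage with a stubbed model.
stub = lambda t: {"per:employee_of": 0.95, "no_relation": 0.05}
data = [("Alice works at Acme.", "no_relation")]
print(flag_noisy_negatives(data, stub))
```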

ATLANTIC: Structure-Aware Retrieval-Augmented Language Model for Interdisciplinary Science

  • paper_url: http://arxiv.org/abs/2311.12289
  • repo_url: None
  • paper_authors: Sai Munikoti, Anurag Acharya, Sridevi Wagle, Sameera Horawalavithana
  • for: To improve the performance of language models on scientific tasks; retrieval augmentation in particular can complement a language model's limited knowledge capacity.
  • methods: Proposes a novel structure-aware retrieval-augmented language model that accounts for the structural relationships between documents during retrieval, improving the accuracy and usefulness of retrieved passages.
  • results: Experiments show that structure-aware retrieval extracts more coherent, faithful, and contextually relevant passages while maintaining comparable overall accuracy.
    Abstract Large language models record impressive performance on many natural language processing tasks. However, their knowledge capacity is limited to the pretraining corpus. Retrieval augmentation offers an effective solution by retrieving context from external knowledge sources to complement the language model. However, existing retrieval augmentation techniques ignore the structural relationships between these documents. Furthermore, retrieval models are not explored much in scientific tasks, especially in regard to the faithfulness of retrieved documents. In this paper, we propose a novel structure-aware retrieval augmented language model that accommodates document structure during retrieval augmentation. We create a heterogeneous document graph capturing multiple types of relationships (e.g., citation, co-authorship, etc.) that connect documents from more than 15 scientific disciplines (e.g., Physics, Medicine, Chemistry, etc.). We train a graph neural network on the curated document graph to act as a structural encoder for the corresponding passages retrieved during the model pretraining. Particularly, along with text embeddings of the retrieved passages, we obtain structural embeddings of the documents (passages) and fuse them together before feeding them to the language model. We evaluate our model extensively on various scientific benchmarks that include science question-answering and scientific document classification tasks. Experimental results demonstrate that structure-aware retrieval improves retrieving more coherent, faithful and contextually relevant passages, while showing a comparable performance in the overall accuracy.
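The fusion step can be sketched in a few lines: each retrieved passage's text embedding is concatenated with the structural embedding of its document node (e.g. produced by a GNN over the citation/co-authorship graph) and projected back to the model dimension. The dimensions and the linear projection are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StructureAwareFusion(nn.Module):
    """Fuse a passage's text embedding with its document's graph embedding."""
    def __init__(self, text_dim=768, graph_dim=128):
        super().__init__()
        self.proj = nn.Linear(text_dim + graph_dim, text_dim)

    def forward(self, text_emb, graph_emb):
        return self.proj(torch.cat([text_emb, graph_emb], dim=-1))

fuse = StructureAwareFusion()
passages = torch.randn(5, 768)       # text embeddings of retrieved passages
nodes = torch.randn(5, 128)          # GNN embeddings of their document nodes
print(fuse(passages, nodes).shape)   # torch.Size([5, 768]), ready for the LM
```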

Adapting LLMs for Efficient, Personalized Information Retrieval: Methods and Implications

  • paper_url: http://arxiv.org/abs/2311.12287
  • repo_url: None
  • paper_authors: Samira Ghodratnama, Mehrdad Zakershahrak
  • for: This paper examines the application of large language models (LLMs) to information retrieval, including how to optimize the retrieval process, select optimal models, and efficiently scale and orchestrate LLMs.
  • methods: The paper presents methods for optimizing the retrieval process, covering model, data, and system considerations, as well as how to select the best-suited LLM.
  • results: Through analysis of experiments and case studies, the paper demonstrates the effectiveness of LLMs in information retrieval and addresses issues such as model hallucination.
    Abstract The advent of Large Language Models (LLMs) heralds a pivotal shift in online user interactions with information. Traditional Information Retrieval (IR) systems primarily relied on query-document matching, whereas LLMs excel in comprehending and generating human-like text, thereby enriching the IR experience significantly. While LLMs are often associated with chatbot functionalities, this paper extends the discussion to their explicit application in information retrieval. We explore methodologies to optimize the retrieval process, select optimal models, and effectively scale and orchestrate LLMs, aiming for cost-efficiency and enhanced result accuracy. A notable challenge, model hallucination-where the model yields inaccurate or misinterpreted data-is addressed alongside other model-specific hurdles. Our discourse extends to crucial considerations including user privacy, data optimization, and the necessity for system clarity and interpretability. Through a comprehensive examination, we unveil not only innovative strategies for integrating Language Models (LLMs) with Information Retrieval (IR) systems, but also the consequential considerations that underline the need for a balanced approach aligned with user-centric principles.

Probabilistic Forecast Reconciliation with Kullback-Leibler Divergence Regularization

  • paper_url: http://arxiv.org/abs/2311.12279
  • repo_url: https://github.com/guanyu0316/Probabilistic-Forecast-Reconciliation-with-DL
  • paper_authors: Guanyu Zhang, Feng Li, Yanfei Kang
  • for: This study proposes a new probabilistic forecast reconciliation method to resolve the trade-off between accuracy and coherency in existing reconciliation approaches.
  • methods: The study implements probabilistic forecast reconciliation within a deep learning framework, introducing a Kullback-Leibler divergence regularization term that makes the reconciliation step more flexible and soft.
  • results: Evaluated on three hierarchical time series datasets, the proposed approach shows several advantages over other probabilistic forecast reconciliation methods.
    Abstract As the popularity of hierarchical point forecast reconciliation methods increases, there is a growing interest in probabilistic forecast reconciliation. Many studies have utilized machine learning or deep learning techniques to implement probabilistic forecasting reconciliation and have made notable progress. However, these methods treat the reconciliation step as a fixed and hard post-processing step, leading to a trade-off between accuracy and coherency. In this paper, we propose a new approach for probabilistic forecast reconciliation. Unlike existing approaches, our proposed approach fuses the prediction step and reconciliation step into a deep learning framework, making the reconciliation step more flexible and soft by introducing the Kullback-Leibler divergence regularization term into the loss function. The approach is evaluated using three hierarchical time series datasets, which shows the advantages of our approach over other probabilistic forecast reconciliation methods.
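A toy version of the soft reconciliation objective for a two-level hierarchy (total = A + B) is sketched below: a Gaussian forecast loss plus a KL term that pulls the predicted total toward the aggregate of its children instead of enforcing hard coherency. The Gaussian parameterization and lambda value are illustrative assumptions, not the paper's exact loss.

```python
import torch

def soft_reconciliation_loss(mu, sigma, target, lam=0.1):
    """mu, sigma, target: (3,) tensors ordered [total, A, B]."""
    nll = 0.5 * (((target - mu) / sigma) ** 2 + 2 * torch.log(sigma)).sum()
    # KL( N(mu_total, sigma_total^2) || N(mu_A + mu_B, sigma_A^2 + sigma_B^2) )
    agg_mu = mu[1] + mu[2]
    agg_var = sigma[1] ** 2 + sigma[2] ** 2
    kl = (torch.log(agg_var.sqrt() / sigma[0])
          + (sigma[0] ** 2 + (mu[0] - agg_mu) ** 2) / (2 * agg_var) - 0.5)
    return nll + lam * kl                  # soft coherency instead of a hard constraint

mu = torch.tensor([10.0, 4.0, 5.5], requires_grad=True)
sigma = torch.tensor([1.0, 0.5, 0.5])
loss = soft_reconciliation_loss(mu, sigma, torch.tensor([9.8, 4.1, 5.6]))
loss.backward()
print(loss.item(), mu.grad)
```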

Learning Causal Representations from General Environments: Identifiability and Intrinsic Ambiguity

  • paper_url: http://arxiv.org/abs/2311.12267
  • repo_url: None
  • paper_authors: Jikai Jin, Vasilis Syrgkanis
  • for: This paper studies causal representation learning, i.e., recovering high-level latent variables and their causal relationships from low-level data, assuming access to observational data from multiple environments.
  • methods: The paper covers linear causal models and general non-parametric causal models, and provides an algorithm named LiNGCReL that is guaranteed to recover the ground-truth model.
  • results: The paper shows that, without assuming hard interventions, the model can be recovered only up to an effect-domination ambiguity (EDA), and numerical experiments demonstrate the effectiveness of the approach.
    Abstract This paper studies causal representation learning, the task of recovering high-level latent variables and their causal relationships from low-level data that we observe, assuming access to observations generated from multiple environments. While existing works are able to prove full identifiability of the underlying data generating process, they typically assume access to single-node, hard interventions which is rather unrealistic in practice. The main contribution of this paper is characterize a notion of identifiability which is provably the best one can achieve when hard interventions are not available. First, for linear causal models, we provide identifiability guarantee for data observed from general environments without assuming any similarities between them. While the causal graph is shown to be fully recovered, the latent variables are only identified up to an effect-domination ambiguity (EDA). We then propose an algorithm, LiNGCReL which is guaranteed to recover the ground-truth model up to EDA, and we demonstrate its effectiveness via numerical experiments. Moving on to general non-parametric causal models, we prove the same idenfifiability guarantee assuming access to groups of soft interventions. Finally, we provide counterparts of our identifiability results, indicating that EDA is basically inevitable in our setting.

Resilient Control of Networked Microgrids using Vertical Federated Reinforcement Learning: Designs and Real-Time Test-Bed Validations

  • paper_url: http://arxiv.org/abs/2311.12264
  • repo_url: None
  • paper_authors: Sayak Mukherjee, Ramij R. Hossain, Sheik M. Mohiuddin, Yuan Liu, Wei Du, Veronica Adetola, Rohit A. Jinsiwale, Qiuhua Huang, Tianzhixi Yin, Ankit Singhal
  • for: To improve the system-level resiliency of networked microgrids, especially with the growing population of inverter-based resources (IBRs).
  • methods: Proposes a federated reinforcement learning (Fed-RL) approach to tackle model complexity and the uncertain dynamical behavior of IBR devices, as well as privacy concerns around data sharing.
  • results: Policies learned in simulation were transferred to a real-time hardware-in-the-loop test bed, bridging the gap between simulation and the real world; experiments show the simulator-trained RL controllers deliver convincing results on real-time hardware.
    Abstract Improving system-level resiliency of networked microgrids is an important aspect with increased population of inverter-based resources (IBRs). This paper (1) presents resilient control design in presence of adversarial cyber-events, and proposes a novel federated reinforcement learning (Fed-RL) approach to tackle (a) model complexities, unknown dynamical behaviors of IBR devices, (b) privacy issues regarding data sharing in multi-party-owned networked grids, and (2) transfers learned controls from simulation to hardware-in-the-loop test-bed, thereby bridging the gap between simulation and real world. With these multi-prong objectives, first, we formulate a reinforcement learning (RL) training setup generating episodic trajectories with adversaries (attack signal) injected at the primary controllers of the grid forming (GFM) inverters where RL agents (or controllers) are being trained to mitigate the injected attacks. For networked microgrids, the horizontal Fed-RL method involving distinct independent environments is not appropriate, leading us to develop vertical variant Federated Soft Actor-Critic (FedSAC) algorithm to grasp the interconnected dynamics of networked microgrid. Next, utilizing OpenAI Gym interface, we built a custom simulation set-up in GridLAB-D/HELICS co-simulation platform, named Resilient RL Co-simulation (ResRLCoSIM), to train the RL agents with IEEE 123-bus benchmark test systems comprising 3 interconnected microgrids. Finally, the learned policies in simulation world are transferred to the real-time hardware-in-the-loop test-bed set-up developed using high-fidelity Hypersim platform. Experiments show that the simulator-trained RL controllers produce convincing results with the real-time test-bed set-up, validating the minimization of sim-to-real gap.
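A heavily simplified sketch of the federated ingredient: each microgrid trains a local actor-critic, and a server periodically averages critic parameters so agents share learned attack-mitigation knowledge without exchanging raw grid data. This is generic federated averaging under stand-in critic networks, not the paper's exact vertical FedSAC update.

```python
import copy
import torch

def federated_average(critics):
    """critics: list of nn.Module instances with identical architectures."""
    global_state = copy.deepcopy(critics[0].state_dict())
    for key in global_state:
        global_state[key] = torch.stack(
            [c.state_dict()[key].float() for c in critics]).mean(dim=0)
    for c in critics:                      # broadcast the averaged critic back
        c.load_state_dict(global_state)

critics = [torch.nn.Linear(8, 1) for _ in range(3)]   # stand-in critic networks
federated_average(critics)
print(torch.allclose(critics[0].weight, critics[1].weight))  # True
```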