cs.AI - 2023-10-11

AutoRepo: A general framework for multi-modal LLM-based automated construction reporting

  • paper_url: http://arxiv.org/abs/2310.07944
  • repo_url: None
  • paper_authors: Hongxu Pu, Xincong Yang, Jing Li, Runhao Guo, Heng Li
  • for: Improve the safety, quality, and timely completion of construction projects with AutoRepo, a new framework for automatically generating construction inspection reports.
  • methods: Unmanned vehicles carry out construction inspections and collect scene information, and multimodal large language models (LLMs) generate the inspection reports from that information.
  • results: Applied and tested on a real-world construction site, AutoRepo reduces the time and resources the inspection process requires and produces high-quality inspection reports that comply with regulatory standards.
    Abstract Ensuring the safety, quality, and timely completion of construction projects is paramount, with construction inspections serving as a vital instrument towards these goals. Nevertheless, the predominantly manual approach of present-day inspections frequently results in inefficiencies and inadequate information management. Such methods often fall short of providing holistic, exhaustive assessments, consequently engendering regulatory oversights and potential safety hazards. To address this issue, this paper presents a novel framework named AutoRepo for automated generation of construction inspection reports. The unmanned vehicles efficiently perform construction inspections and collect scene information, while the multimodal large language models (LLMs) are leveraged to automatically generate the inspection reports. The framework was applied and tested on a real-world construction site, demonstrating its potential to expedite the inspection process, significantly reduce resource allocation, and produce high-quality, regulatory standard-compliant inspection reports. This research thus underscores the immense potential of multimodal large language models in revolutionizing construction inspection practices, signaling a significant leap forward towards a more efficient and safer construction management paradigm.

Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation using Large Language Models

  • paper_url: http://arxiv.org/abs/2310.07937
  • repo_url: None
  • paper_authors: Bangguo Yu, Hamidreza Kasaei, Ming Cao
  • for: Solve the visual target navigation problem for cooperating multi-robot teams, improving both efficiency and robustness.
  • methods: Proposes Co-NavGPT, a framework that uses a large language model (LLM) as the global planner for multi-robot cooperation, encoding the explored environment data into prompts to strengthen the LLM's scene comprehension (a prompt-construction sketch follows the abstract below).
  • results: Experiments on HM3D show that Co-NavGPT surpasses existing models in success rate and efficiency without any learning process, indicating the large potential of LLMs for multi-robot collaboration.
    Abstract In advanced human-robot interaction tasks, visual target navigation is crucial for autonomous robots navigating unknown environments. While numerous approaches have been developed in the past, most are designed for single-robot operations, which often suffer from reduced efficiency and robustness due to environmental complexities. Furthermore, learning policies for multi-robot collaboration are resource-intensive. To address these challenges, we propose Co-NavGPT, an innovative framework that integrates Large Language Models (LLMs) as a global planner for multi-robot cooperative visual target navigation. Co-NavGPT encodes the explored environment data into prompts, enhancing LLMs' scene comprehension. It then assigns exploration frontiers to each robot for efficient target search. Experimental results on Habitat-Matterport 3D (HM3D) demonstrate that Co-NavGPT surpasses existing models in success rates and efficiency without any learning process, demonstrating the vast potential of LLMs in multi-robot collaboration domains. The supplementary video, prompts, and code can be accessed via the following link: \href{https://sites.google.com/view/co-navgpt}{https://sites.google.com/view/co-navgpt}.
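
To make the "environment data into prompts" step concrete, here is a minimal sketch of how explored frontiers and robot states might be serialized into a prompt and the LLM's frontier assignment parsed back. The prompt wording, the robot/frontier descriptions, and the reply format are illustrative assumptions, not the authors' exact design.

```python
from typing import Dict

def build_assignment_prompt(goal: str,
                            robot_states: Dict[str, str],
                            frontiers: Dict[int, str]) -> str:
    """Serialize the explored map into text an LLM can reason over."""
    lines = [f"Target object: {goal}", "Robots:"]
    lines += [f"  - {name}: {state}" for name, state in robot_states.items()]
    lines.append("Unexplored frontiers:")
    lines += [f"  - frontier {fid}: {desc}" for fid, desc in frontiers.items()]
    lines.append("Assign exactly one frontier id to each robot so the team finds "
                 "the target quickly. Answer as 'robot: frontier_id', one per line.")
    return "\n".join(lines)

def parse_assignment(reply: str) -> Dict[str, int]:
    """Parse 'robot_0: 2' style lines from the LLM reply."""
    assignment = {}
    for line in reply.splitlines():
        robot, _, fid = line.partition(":")
        if fid.strip().isdigit():
            assignment[robot.strip()] = int(fid.strip())
    return assignment

prompt = build_assignment_prompt(
    "chair",
    {"robot_0": "in the living room", "robot_1": "near the hallway"},
    {0: "opening toward a kitchen", 1: "dark corridor heading east"},
)
assignment = parse_assignment("robot_0: 1\nrobot_1: 0")
```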

What Matters to You? Towards Visual Representation Alignment for Robot Learning

  • paper_url: http://arxiv.org/abs/2310.07932
  • repo_url: None
  • paper_authors: Ran Tian, Chenfeng Xu, Masayoshi Tomizuka, Jitendra Malik, Andrea Bajcsy
  • for: Help robots optimize rewards aligned with end-user preferences, so that robot behavior reflects what people actually care about.
  • methods: Uses Representation-Aligned Preference-based Learning (RAPL), which leverages human feedback to align the robot's visual representation with the end user and disentangle what matters for the task (a generic preference-learning sketch follows the abstract below).
  • results: Experiments show that RAPL's reward consistently produces human-preferred robot behaviors with high sample efficiency and strong zero-shot generalization.
    Abstract When operating in service of people, robots need to optimize rewards aligned with end-user preferences. Since robots will rely on raw perceptual inputs like RGB images, their rewards will inevitably use visual representations. Recently there has been excitement in using representations from pre-trained visual models, but key to making these work in robotics is fine-tuning, which is typically done via proxy tasks like dynamics prediction or enforcing temporal cycle-consistency. However, all these proxy tasks bypass the human's input on what matters to them, exacerbating spurious correlations and ultimately leading to robot behaviors that are misaligned with user preferences. In this work, we propose that robots should leverage human feedback to align their visual representations with the end-user and disentangle what matters for the task. We propose Representation-Aligned Preference-based Learning (RAPL), a method for solving the visual representation alignment problem and visual reward learning problem through the lens of preference-based learning and optimal transport. Across experiments in X-MAGICAL and in robotic manipulation, we find that RAPL's reward consistently generates preferred robot behaviors with high sample efficiency, and shows strong zero-shot generalization when the visual representation is learned from a different embodiment than the robot's.
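
As background for the preference-based learning component, the sketch below fits a reward head to pairwise trajectory preferences with the standard Bradley-Terry objective. RAPL's actual method additionally aligns the visual representation via optimal transport, which is omitted here; the network sizes and feature dimensions are arbitrary.

```python
import torch
import torch.nn as nn

class RewardHead(nn.Module):
    def __init__(self, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def trajectory_return(self, traj_feats: torch.Tensor) -> torch.Tensor:
        # traj_feats: (T, feat_dim) -> summed predicted reward over the trajectory
        return self.net(traj_feats).sum()

def preference_loss(model: RewardHead, preferred: torch.Tensor, rejected: torch.Tensor):
    """Bradley-Terry: P(preferred beats rejected) = sigmoid(R_p - R_r)."""
    return -torch.nn.functional.logsigmoid(
        model.trajectory_return(preferred) - model.trajectory_return(rejected))

model = RewardHead(feat_dim=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = preference_loss(model, torch.randn(50, 32), torch.randn(50, 32))
loss.backward(); opt.step()
```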

D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning

  • paper_url: http://arxiv.org/abs/2310.07931
  • repo_url: https://github.com/adymaharana/d2pruning
  • paper_authors: Adyasha Maharana, Prateek Yadav, Mohit Bansal
  • for: Higher-quality training data lowers test error under a fixed data budget, and pruning redundancies lowers compute cost; the goal is coreset selection that jointly accounts for data diversity and sample difficulty.
  • methods: Proposes D2 Pruning, a graph-based data selection algorithm that uses forward and reverse message passing over a dataset graph to update each example's difficulty score, then applies graph-based sampling to pick the coreset (a rough sketch follows the abstract below).
  • results: On vision and language datasets, D2 Pruning improves coreset selection over prior state-of-the-art methods at pruning rates of up to 70%, and filtering large multimodal datasets with D2 Pruning increases dataset diversity and improves the generalization of pretrained models.
    Abstract Analytical theories suggest that higher-quality data can lead to lower test errors in models trained on a fixed data budget. Moreover, a model can be trained on a lower compute budget without compromising performance if a dataset can be stripped of its redundancies. Coreset selection (or data pruning) seeks to select a subset of the training data so as to maximize the performance of models trained on this subset, also referred to as coreset. There are two dominant approaches: (1) geometry-based data selection for maximizing data diversity in the coreset, and (2) functions that assign difficulty scores to samples based on training dynamics. Optimizing for data diversity leads to a coreset that is biased towards easier samples, whereas, selection by difficulty ranking omits easy samples that are necessary for the training of deep learning models. This demonstrates that data diversity and importance scores are two complementary factors that need to be jointly considered during coreset selection. We represent a dataset as an undirected graph and propose a novel pruning algorithm, D2 Pruning, that uses forward and reverse message passing over this dataset graph for coreset selection. D2 Pruning updates the difficulty scores of each example by incorporating the difficulty of its neighboring examples in the dataset graph. Then, these updated difficulty scores direct a graph-based sampling method to select a coreset that encapsulates both diverse and difficult regions of the dataset space. We evaluate supervised and self-supervised versions of our method on various vision and language datasets. Results show that D2 Pruning improves coreset selection over previous state-of-the-art methods for up to 70% pruning rates. Additionally, we find that using D2 Pruning for filtering large multimodal datasets leads to increased diversity in the dataset and improved generalization of pretrained models.
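
The following toy sketch conveys the message-passing idea: difficulty scores are propagated over a kNN graph (a stand-in for the paper's dataset graph), and a greedy sampler picks hard points while down-weighting their neighbors to keep the coreset diverse. The single propagation pass, the edge weighting, and the sampler are simplifications, not the paper's exact algorithm.

```python
import numpy as np

def knn_graph(emb: np.ndarray, k: int = 5) -> np.ndarray:
    d = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]            # (n, k) neighbor indices

def propagate_difficulty(scores: np.ndarray, nbrs: np.ndarray, gamma: float = 0.5):
    # forward pass: fold neighbor difficulty into each node's own score
    return scores + gamma * scores[nbrs].mean(axis=1)

def select_coreset(scores: np.ndarray, nbrs: np.ndarray, budget: int):
    # greedy diversity-aware sampling: pick the hardest remaining point, then
    # down-weight its neighbors (reverse message) so nearby points are less
    # likely to be picked again.
    scores = scores.copy()
    chosen = []
    for _ in range(budget):
        i = int(np.argmax(scores))
        chosen.append(i)
        scores[i] = -np.inf
        scores[nbrs[i]] *= 0.5
    return chosen

emb = np.random.randn(100, 16)
difficulty = np.random.rand(100)
nbrs = knn_graph(emb)
coreset = select_coreset(propagate_difficulty(difficulty, nbrs), nbrs, budget=20)
```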

Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning

  • paper_url: http://arxiv.org/abs/2310.07918
  • repo_url: None
  • paper_authors: Jannik Deuschel, Caleb N. Ellington, Benjamin J. Lengerich, Yingtao Luo, Pascal Friederich, Eric P. Xing
  • for: Proposes Contextualized Policy Recovery (CPR) to resolve the tradeoff between accuracy and interpretability that limits existing policy-learning models.
  • methods: CPR frames the problem as multi-task learning, decomposing a complex decision process into context-specific policies, each a linear observation-to-action mapping generated on demand as the context is updated with new observations; it works in fully offline, partially observable settings and can incorporate any recurrent black-box model or interpretable decision model (a minimal sketch follows the abstract below).
  • results: On simulated and real data, CPR achieves state-of-the-art performance at predicting antibiotic prescription in intensive care units (+22% AUROC vs. previous SOTA) and MRI prescription for Alzheimer's patients (+7.7% AUROC vs. previous SOTA), closing the accuracy gap between interpretable and black-box policy learning and enabling high-resolution analysis of context-specific decision models.
    Abstract Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models fall short by forcing a tradeoff between accuracy and interpretability. This tradeoff limits data-driven interpretations of human decision-making process. e.g. to audit medical decisions for biases and suboptimal practices, we require models of decision processes which provide concise descriptions of complex behaviors. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically with contextual information. Thus, we propose Contextualized Policy Recovery (CPR), which re-frames the problem of modeling complex decision processes as a multi-task learning problem in which complex decision policies are comprised of context-specific policies. CPR models each context-specific policy as a linear observation-to-action mapping, and generates new decision models $\textit{on-demand}$ as contexts are updated with new observations. CPR is compatible with fully offline and partially observable decision environments, and can be tailored to incorporate any recurrent black-box model or interpretable decision model. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on the canonical tasks of predicting antibiotic prescription in intensive care units ($+22\%$ AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients ($+7.7\%$ AUROC vs. previous SOTA). With this improvement in predictive performance, CPR closes the accuracy gap between interpretable and black-box methods for policy learning, allowing high-resolution exploration and analysis of context-specific decision models.
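
A minimal sketch of the idea, assuming a GRU context encoder and a binary action: the encoder maps the observation history to the coefficients of a per-timestep linear observation-to-action map, so each decision is explained by an inspectable weight vector. The dimensions and the logistic action model are illustrative choices, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ContextualizedLinearPolicy(nn.Module):
    def __init__(self, obs_dim: int, ctx_dim: int = 32):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, ctx_dim, batch_first=True)
        # context -> (weights, bias) of a linear map from observation to action logit
        self.hyper = nn.Linear(ctx_dim, obs_dim + 1)

    def forward(self, obs_seq: torch.Tensor):
        # obs_seq: (batch, T, obs_dim)
        ctx, _ = self.encoder(obs_seq)                 # (batch, T, ctx_dim)
        params = self.hyper(ctx)                       # (batch, T, obs_dim + 1)
        w, b = params[..., :-1], params[..., -1]
        logits = (w * obs_seq).sum(-1) + b             # per-step linear policy
        return torch.sigmoid(logits), w                # action prob + interpretable weights

policy = ContextualizedLinearPolicy(obs_dim=10)
probs, weights = policy(torch.randn(4, 20, 10))        # inspect `weights` per timestep
```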

  • paper_url: http://arxiv.org/abs/2310.07917
  • repo_url: None
  • paper_authors: Elaheh Jafarigol, Theodore Trafalis
  • for: Provide an overview and synthesis of machine learning approaches for large-scale imbalanced data and their applications across domains.
  • methods: Reviews data-processing techniques and machine learning algorithms used to address the problem of imbalanced data in various domains.
  • results: A structured review of 258 peer-reviewed papers that surveys the available methods and their applications, intended as a general guideline for researchers in academia and industry working with large-scale imbalanced data.
    Abstract For over two decades, detecting rare events has been a challenging task among researchers in the data mining and machine learning domain. Real-life problems inspire researchers to navigate and further improve data processing and algorithmic approaches to achieve effective and computationally efficient methods for imbalanced learning. In this paper, we have collected and reviewed 258 peer-reviewed papers from archival journals and conference papers in an attempt to provide an in-depth review of various approaches in imbalanced learning from technical and application perspectives. This work aims to provide a structured review of methods used to address the problem of imbalanced data in various domains and create a general guideline for researchers in academia or industry who want to dive into the broad field of machine learning using large-scale imbalanced data.

Recurrent networks recognize patterns with low-dimensional oscillations

  • paper_url: http://arxiv.org/abs/2310.07908
  • repo_url: https://github.com/ktmurray1999/neural-rules
  • paper_authors: Keith T. Murray
  • for: Investigates a novel dynamical mechanism for pattern recognition, discovered by interpreting a recurrent neural network (RNN) trained on a simple task inspired by the SET card game.
  • methods: Interprets the trained RNN as recognizing patterns via phase shifts in a low-dimensional limit cycle, and validates this interpretation by handcrafting a simple oscillatory model that reproduces the RNN's dynamics.
  • results: The RNN recognizes patterns through phase shifts analogous to transitions in a finite state automaton (FSA), suggesting both a dynamical mechanism capable of pattern recognition and a potential neural implementation of an FSA, and contributing to the discourse on deep learning interpretability.
    Abstract This study proposes a novel dynamical mechanism for pattern recognition discovered by interpreting a recurrent neural network (RNN) trained on a simple task inspired by the SET card game. We interpreted the trained RNN as recognizing patterns via phase shifts in a low-dimensional limit cycle in a manner analogous to transitions in a finite state automaton (FSA). We further validated this interpretation by handcrafting a simple oscillatory model that reproduces the dynamics of the trained RNN. Our findings not only suggest of a potential dynamical mechanism capable of pattern recognition, but also suggest of a potential neural implementation of FSA. Above all, this work contributes to the growing discourse on deep learning model interpretability.

RoboCLIP: One Demonstration is Enough to Learn Robot Policies

  • paper_url: http://arxiv.org/abs/2310.07899
  • repo_url: None
  • paper_authors: Sumedh A Sontakke, Jesse Zhang, Sébastien M. R. Arnold, Karl Pertsch, Erdem Bıyık, Dorsa Sadigh, Chelsea Finn, Laurent Itti
  • for: RoboCLIP is designed to address the difficulty of reward specification in reinforcement learning, particularly the need for extensive expert supervision to design robust reward functions.
  • methods: RoboCLIP uses a single video demonstration or textual description of the task to generate rewards without manual reward function design, leveraging pretrained Video-and-Language Models (VLMs) without any finetuning (a reward sketch follows the abstract below).
  • results: Reinforcement learning agents trained with RoboCLIP rewards demonstrate 2-3 times higher zero-shot performance than competing imitation learning methods on downstream robot manipulation tasks, using only one video/text demonstration.
    Abstract Reward specification is a notoriously difficult problem in reinforcement learning, requiring extensive expert supervision to design robust reward functions. Imitation learning (IL) methods attempt to circumvent these problems by utilizing expert demonstrations but typically require a large number of in-domain expert demonstrations. Inspired by advances in the field of Video-and-Language Models (VLMs), we present RoboCLIP, an online imitation learning method that uses a single demonstration (overcoming the large data requirement) in the form of a video demonstration or a textual description of the task to generate rewards without manual reward function design. Additionally, RoboCLIP can also utilize out-of-domain demonstrations, like videos of humans solving the task for reward generation, circumventing the need to have the same demonstration and deployment domains. RoboCLIP utilizes pretrained VLMs without any finetuning for reward generation. Reinforcement learning agents trained with RoboCLIP rewards demonstrate 2-3 times higher zero-shot performance than competing imitation learning methods on downstream robot manipulation tasks, doing so using only one video/text demonstration.
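
A sketch of how such a reward could be computed: embed the agent's rollout video and the single demonstration with a frozen video-and-language encoder and use their cosine similarity as a sparse end-of-episode reward. The `DummyEncoder` below is only a stand-in for the pretrained VLM encoder (the paper uses a pretrained model with no finetuning), and the frame shapes are arbitrary.

```python
import numpy as np

class DummyEncoder:
    """Stand-in for a frozen video/text encoder: mean-pool frames, then project."""
    def __init__(self, in_dim: int, out_dim: int = 64, seed: int = 0):
        self.proj = np.random.default_rng(seed).normal(size=(in_dim, out_dim))

    def __call__(self, frames: np.ndarray) -> np.ndarray:
        return frames.reshape(len(frames), -1).mean(axis=0) @ self.proj

def roboclip_reward(rollout_frames: np.ndarray, demo_embedding: np.ndarray,
                    encoder: DummyEncoder) -> float:
    # similarity between the rollout and the demonstration, given once per episode
    z = encoder(rollout_frames)
    return float(z @ demo_embedding /
                 (np.linalg.norm(z) * np.linalg.norm(demo_embedding) + 1e-8))

encoder = DummyEncoder(in_dim=8 * 8 * 3)
demo = encoder(np.random.rand(16, 8, 8, 3))          # single demo video
reward = roboclip_reward(np.random.rand(16, 8, 8, 3), demo, encoder)
```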

Efficient Integrators for Diffusion Generative Models

  • paper_url: http://arxiv.org/abs/2310.07894
  • repo_url: https://github.com/mandt-lab/PSLD
  • paper_authors: Kushagra Pandey, Maja Rudolph, Stephan Mandt
  • for: Speed up sampling from diffusion models, which is slow at inference time.
  • methods: Proposes two complementary families of samplers: conjugate integrators, which map the reverse diffusion dynamics to a space that is more amenable to sampling, and splitting integrators, which reduce numerical simulation error by alternating between updates of the data and auxiliary variables (a toy splitting step follows the abstract below).
  • results: After extensive empirical and theoretical study, a hybrid method achieves the best reported performance for diffusion models in augmented spaces; applied to Phase Space Langevin Diffusion [Pandey & Mandt, 2023] on CIFAR-10, the deterministic and stochastic samplers reach FID scores of 2.11 and 2.36 in only 100 network function evaluations (NFE), versus 2.57 and 2.63 for the best-performing baselines.
    Abstract Diffusion models suffer from slow sample generation at inference time. Therefore, developing a principled framework for fast deterministic/stochastic sampling for a broader class of diffusion models is a promising direction. We propose two complementary frameworks for accelerating sample generation in pre-trained models: Conjugate Integrators and Splitting Integrators. Conjugate integrators generalize DDIM, mapping the reverse diffusion dynamics to a more amenable space for sampling. In contrast, splitting-based integrators, commonly used in molecular dynamics, reduce the numerical simulation error by cleverly alternating between numerical updates involving the data and auxiliary variables. After extensively studying these methods empirically and theoretically, we present a hybrid method that leads to the best-reported performance for diffusion models in augmented spaces. Applied to Phase Space Langevin Diffusion [Pandey & Mandt, 2023] on CIFAR-10, our deterministic and stochastic samplers achieve FID scores of 2.11 and 2.36 in only 100 network function evaluations (NFE) as compared to 2.57 and 2.63 for the best-performing baselines, respectively. Our code and model checkpoints will be made publicly available at \url{https://github.com/mandt-lab/PSLD}.
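
To illustrate what "alternating between updates of the data and auxiliary variables" means, here is a toy BAOAB-style splitting step for an underdamped (phase-space) Langevin process with a generic score function, borrowed from molecular dynamics. It conveys the splitting idea only; it is not the conjugate or splitting samplers derived in the paper.

```python
import numpy as np

def split_step(x, v, score_fn, dt, gamma=1.0, rng=np.random.default_rng(0)):
    # B: half-step velocity update from the (learned) score / force
    v = v + 0.5 * dt * score_fn(x)
    # A: half-step position update from velocity
    x = x + 0.5 * dt * v
    # O: exact Ornstein-Uhlenbeck step on velocity (friction + noise)
    c = np.exp(-gamma * dt)
    v = c * v + np.sqrt(1.0 - c**2) * rng.standard_normal(v.shape)
    # A, B: mirror the half-steps
    x = x + 0.5 * dt * v
    v = v + 0.5 * dt * score_fn(x)
    return x, v

# usage: sample a standard Gaussian (score = -x) with 100 split steps
x, v = np.random.randn(1000, 2), np.zeros((1000, 2))
for _ in range(100):
    x, v = split_step(x, v, score_fn=lambda x: -x, dt=0.05)
```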

LangNav: Language as a Perceptual Representation for Navigation

  • paper_url: http://arxiv.org/abs/2310.07889
  • repo_url: None
  • paper_authors: Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim
  • for: Explores using language as the perceptual representation for vision-and-language navigation.
  • methods: Off-the-shelf vision systems (image captioning and object detection) convert the agent's egocentric panoramic view at each time step into natural language descriptions; a pretrained language model is then finetuned to select, given the current view and the trajectory history, the action that best fulfills the navigation instructions. Unlike the standard setup, which adapts a pretrained language model to work directly with continuous visual features, this approach uses (discrete) language as the perceptual representation (a pipeline sketch follows the abstract below).
  • results: On the R2R vision-and-language navigation benchmark, the approach improves on strong baselines that rely on visual features when only a few gold trajectories (10-100) are available, demonstrating the potential of language as a perceptual representation for navigation tasks.
    Abstract We explore the use of language as a perceptual representation for vision-and-language navigation. Our approach uses off-the-shelf vision systems (for image captioning and object detection) to convert an agent's egocentric panoramic view at each time step into natural language descriptions. We then finetune a pretrained language model to select an action, based on the current view and the trajectory history, that would best fulfill the navigation instructions. In contrast to the standard setup which adapts a pretrained language model to work directly with continuous visual features from pretrained vision models, our approach instead uses (discrete) language as the perceptual representation. We explore two use cases of our language-based navigation (LangNav) approach on the R2R vision-and-language navigation benchmark: generating synthetic trajectories from a prompted large language model (GPT-4) with which to finetune a smaller language model; and sim-to-real transfer where we transfer a policy learned on a simulated environment (ALFRED) to a real-world environment (R2R). Our approach is found to improve upon strong baselines that rely on visual features in settings where only a few gold trajectories (10-100) are available, demonstrating the potential of using language as a perceptual representation for navigation tasks.
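
A minimal sketch of the caption-to-action pipeline, assuming captions have already been produced by an off-the-shelf vision system: the captions and trajectory history are formatted into a prompt, and the language model's reply is mapped back to a view direction. The prompt template and the reply parsing are illustrative assumptions, not the paper's finetuning setup.

```python
from typing import Dict, List

def build_prompt(instruction: str, history: List[str], view_captions: Dict[str, str]) -> str:
    lines = [f"Instruction: {instruction}", "Trajectory so far:"]
    lines += [f"  step {i}: {h}" for i, h in enumerate(history)]
    lines.append("Current panoramic views:")
    lines += [f"  [{d}] {c}" for d, c in view_captions.items()]
    lines.append("Which direction best follows the instruction? Answer with one tag.")
    return "\n".join(lines)

def choose_action(lm_reply: str, view_captions: Dict[str, str]) -> str:
    # pick the first direction tag mentioned in the language model's reply
    for direction in view_captions:
        if direction in lm_reply:
            return direction
    return next(iter(view_captions))        # fall back to an arbitrary view

prompt = build_prompt(
    "Walk past the sofa and stop at the kitchen door.",
    ["left the bedroom", "entered the hallway"],
    {"front": "a hallway leading to a kitchen", "left": "a sofa by a window"},
)
action = choose_action("front", {"front": "...", "left": "..."})
```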

Leader-Follower Neural Networks with Local Error Signals Inspired by Complex Collectives

  • paper_url: http://arxiv.org/abs/2310.07885
  • repo_url: None
  • paper_authors: Chenzhong Yin, Mingxi Cheng, Xiongye Xiao, Xinghe Chen, Shahin Nazarian, Andrei Irimia, Paul Bogdan
  • for: Proposes a neural network architecture inspired by the rules observed in nature's collective ensembles, and investigates the behavior of the workers in such a network.
  • methods: Uses a leader-follower neural network (LFNN) structure, trained with local error signals and optionally incorporating backpropagation (BP) and a global loss (a local-signal training sketch follows the abstract below).
  • results: Achieves significantly lower error rates than previous BP-free algorithms on MNIST and CIFAR-10, and outperforms previous BP-free algorithms by a significant margin on ImageNet.
    Abstract The collective behavior of a network with heterogeneous, resource-limited information processing units (e.g., group of fish, flock of birds, or network of neurons) demonstrates high self-organization and complexity. These emergent properties arise from simple interaction rules where certain individuals can exhibit leadership-like behavior and influence the collective activity of the group. Motivated by the intricacy of these collectives, we propose a neural network (NN) architecture inspired by the rules observed in nature's collective ensembles. This NN structure contains workers that encompass one or more information processing units (e.g., neurons, filters, layers, or blocks of layers). Workers are either leaders or followers, and we train a leader-follower neural network (LFNN) by leveraging local error signals and optionally incorporating backpropagation (BP) and global loss. We investigate worker behavior and evaluate LFNNs through extensive experimentation. Our LFNNs trained with local error signals achieve significantly lower error rates than previous BP-free algorithms on MNIST and CIFAR-10 and even surpass BP-enabled baselines. In the case of ImageNet, our LFNN-l demonstrates superior scalability and outperforms previous BP-free algorithms by a significant margin.
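
The sketch below shows what training with local error signals can look like: each worker block has its own auxiliary classifier, is updated only from its local loss, and receives a detached input so no global backpropagation path exists. The leader/follower assignment and the optional global loss from the paper are omitted; this illustrates only the BP-free, local-signal ingredient.

```python
import torch
import torch.nn as nn

blocks = nn.ModuleList([nn.Sequential(nn.Linear(784, 256), nn.ReLU()),
                        nn.Sequential(nn.Linear(256, 128), nn.ReLU())])
heads = nn.ModuleList([nn.Linear(256, 10), nn.Linear(128, 10)])   # local classifiers
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=0.1)
        for b, h in zip(blocks, heads)]

def local_update(x: torch.Tensor, y: torch.Tensor) -> float:
    h = x
    for block, head, opt in zip(blocks, heads, opts):
        h = block(h.detach())                 # cut the global gradient path
        loss = nn.functional.cross_entropy(head(h), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

local_update(torch.randn(32, 784), torch.randint(0, 10, (32,)))
```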

The Thousand Faces of Explainable AI Along the Machine Learning Life Cycle: Industrial Reality and Current State of Research

  • paper_url: http://arxiv.org/abs/2310.07882
  • repo_url: None
  • paper_authors: Thomas Decker, Ralf Gross, Alexander Koebler, Michael Lebacher, Ronald Schnitzer, Stefan H. Weber
  • for: Investigates the practical relevance of explainable AI (XAI), with a special focus on the producing industries, and relates it to the current state of academic XAI research.
  • methods: An extensive series of interviews with a wide variety of roles and key stakeholders from different industry sectors, complemented by a concise review of the relevant literature covering both the interviewees' views and the state of academic research.
  • results: Although a multitude of XAI approaches exists, most center on the model evaluation phase and on data scientists, and their capabilities for other stages are either insufficiently explored or not popular among practitioners; existing methods and frameworks also remain insufficient for enabling non-expert users to interpret and understand opaque AI models.
    Abstract In this paper, we investigate the practical relevance of explainable artificial intelligence (XAI) with a special focus on the producing industries and relate them to the current state of academic XAI research. Our findings are based on an extensive series of interviews regarding the role and applicability of XAI along the Machine Learning (ML) lifecycle in current industrial practice and its expected relevance in the future. The interviews were conducted among a great variety of roles and key stakeholders from different industry sectors. On top of that, we outline the state of XAI research by providing a concise review of the relevant literature. This enables us to provide an encompassing overview covering the opinions of the surveyed persons as well as the current state of academic research. By comparing our interview results with the current research approaches we reveal several discrepancies. While a multitude of different XAI approaches exists, most of them are centered around the model evaluation phase and data scientists. Their versatile capabilities for other stages are currently either not sufficiently explored or not popular among practitioners. In line with existing work, our findings also confirm that more efforts are needed to enable also non-expert users' interpretation and understanding of opaque AI models with existing methods and frameworks.

DeePref: Deep Reinforcement Learning For Video Prefetching In Content Delivery Networks

  • paper_url: http://arxiv.org/abs/2310.07881
  • repo_url: None
  • paper_authors: Nawras Alkassab, Chin-Tser Huang, Tania Lorido Botran
  • for: Improve caching and prefetching of video content in Content Delivery Networks (CDNs), and thereby the Quality of Experience on the user side.
  • methods: Proposes DeePref, a deep-reinforcement-learning prefetcher deployed on CDN edge networks that is agnostic to hardware design, operating systems, and applications, and automatically adapts to changes in user access patterns.
  • results: On a real-world dataset, DeePref DRQN improves prefetching accuracy by 17% and prefetching coverage by 28% on average over baselines that make prefetching decisions statically or dynamically from video content popularity; transferring the learned model to another edge network with unseen user requests yields gains of 30% in accuracy and 10% in coverage.
    Abstract Content Delivery Networks carry the majority of Internet traffic, and the increasing demand for video content as a major IP traffic across the Internet highlights the importance of caching and prefetching optimization algorithms. Prefetching aims to make data available in the cache before the requester places its request to reduce access time and improve the Quality of Experience on the user side. Prefetching is well investigated in operating systems, compiler instructions, in-memory cache, local storage systems, high-speed networks, and cloud systems. Traditional prefetching techniques are well adapted to a particular access pattern, but fail to adapt to sudden variations or randomization in workloads. This paper explores the use of reinforcement learning to tackle the changes in user access patterns and automatically adapt over time. To this end, we propose, DeePref, a Deep Reinforcement Learning agent for online video content prefetching in Content Delivery Networks. DeePref is a prefetcher implemented on edge networks and is agnostic to hardware design, operating systems, and applications. Our results show that DeePref DRQN, using a real-world dataset, achieves a 17% increase in prefetching accuracy and a 28% increase in prefetching coverage on average compared to baseline approaches that use video content popularity as a building block to statically or dynamically make prefetching decisions. We also study the possibility of transfer learning of statistical models from one edge network into another, where unseen user requests from unknown distribution are observed. In terms of transfer learning, the increase in prefetching accuracy and prefetching coverage are [$30%$, $10%$], respectively. Our source code will be available on Github.

TabLib: A Dataset of 627M Tables with Context

  • paper_url: http://arxiv.org/abs/2310.07875
  • repo_url: None
  • paper_authors: Gus Eggert, Kevin Huo, Mike Biven, Justin Waugh
  • for: Provide a large, diverse dataset of tables to support research on modern AI systems, for which no tabular counterpart to the foundational text and image datasets existed.
  • methods: Extracts tables from numerous file formats, including CSV, HTML, SQLite, PDF, and Excel, sourced from GitHub and Common Crawl.
  • results: The resulting dataset, TabLib, contains 627 million tables totaling 69 TiB along with 867B tokens of context; its size and diversity offer considerable promise for the table modality, reminiscent of foundational datasets for text and images such as The Pile and LAION.
    Abstract It is well-established that large, diverse datasets play a pivotal role in the performance of modern AI systems for text and image modalities. However, there are no datasets for tabular data of comparable size and diversity to those available for text and images. Thus we present "TabLib'', a compilation of 627 million tables totaling 69 TiB, along with 867B tokens of context. TabLib was extracted from numerous file formats, including CSV, HTML, SQLite, PDF, Excel, and others, sourced from GitHub and Common Crawl. The size and diversity of TabLib offer considerable promise in the table modality, reminiscent of the original promise of foundational datasets for text and images, such as The Pile and LAION.

Hierarchical Pretraining on Multimodal Electronic Health Records

  • paper_url: http://arxiv.org/abs/2310.07871
  • repo_url: https://github.com/xiaochenwang-psu/medhmp
  • paper_authors: Xiaochen Wang, Junyu Luo, Jiaqi Wang, Ziyi Yin, Suhan Cui, Yuan Zhong, Yaqing Wang, Fenglong Ma
  • for: Address the hierarchical nature of electronic health record (EHR) data, which existing pretrained models fail to capture, so that a single pretrained model generalizes across diverse downstream medical tasks.
  • methods: Proposes MEDHMP, a novel, general, and unified pretraining framework designed for hierarchically multimodal EHR data.
  • results: Experiments on eight downstream tasks spanning three levels, with comparisons against eighteen baselines, demonstrate the effectiveness of MEDHMP.
    Abstract Pretraining has proven to be a powerful technique in natural language processing (NLP), exhibiting remarkable success in various NLP downstream tasks. However, in the medical domain, existing pretrained models on electronic health records (EHR) fail to capture the hierarchical nature of EHR data, limiting their generalization capability across diverse downstream tasks using a single pretrained model. To tackle this challenge, this paper introduces a novel, general, and unified pretraining framework called MEDHMP, specifically designed for hierarchically multimodal EHR data. The effectiveness of the proposed MEDHMP is demonstrated through experimental results on eight downstream tasks spanning three levels. Comparisons against eighteen baselines further highlight the efficacy of our approach.

Cheap Talking Algorithms

  • paper_url: http://arxiv.org/abs/2310.07867
  • repo_url: None
  • paper_authors: Daniele Condorelli, Massimiliano Furlan
  • for: Study the behavior of independent reinforcement learning algorithms playing the Crawford and Sobel (1982) game of strategic information transmission.
  • methods: A sender and a receiver are trained together and converge to strategies close to the ex-ante optimal equilibrium of the game (a toy tabular version is sketched after the abstract below).
  • results: Communication takes place to the largest extent predicted by Nash equilibrium given the degree of conflict of interest between the agents, and the conclusion is robust to alternative hyperparameters and game specifications.
    Abstract We simulate behaviour of independent reinforcement learning algorithms playing the Crawford and Sobel (1982) game of strategic information transmission. We show that a sender and a receiver training together converge to strategies close to the exante optimal equilibrium of the game. Hence, communication takes place to the largest extent predicted by Nash equilibrium given the degree of conflict of interest between agents. The conclusion is shown to be robust to alternative specifications of the hyperparameters and of the game. We discuss implications for theories of equilibrium selection in information transmission games, for work on emerging communication among algorithms in computer science and for the economics of collusions in markets populated by artificially intelligent agents.
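
For intuition, here is a toy version of the setup: independent tabular Q-learners playing a discretized cheap-talk game with quadratic payoffs and sender bias b. The grid sizes, learning rates, and epsilon-greedy exploration are illustrative choices, not the paper's algorithms or parameters.

```python
import numpy as np

K, M, A, b = 10, 10, 10, 0.1             # states, messages, actions, sender bias
rng = np.random.default_rng(0)
Q_s = np.zeros((K, M))                    # sender: Q[state, message]
Q_r = np.zeros((M, A))                    # receiver: Q[message, action]
alpha, eps = 0.05, 0.1

for t in range(200_000):
    theta = rng.integers(K)
    state_val = theta / (K - 1)           # state in [0, 1]
    m = rng.integers(M) if rng.random() < eps else int(Q_s[theta].argmax())
    a = rng.integers(A) if rng.random() < eps else int(Q_r[m].argmax())
    action_val = a / (A - 1)
    u_sender = -(action_val - state_val - b) ** 2
    u_receiver = -(action_val - state_val) ** 2
    Q_s[theta, m] += alpha * (u_sender - Q_s[theta, m])    # one-shot game:
    Q_r[m, a] += alpha * (u_receiver - Q_r[m, a])          # no bootstrapping

# the induced sender "language" after training is Q_s.argmax(axis=1)
```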

Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations

  • paper_url: http://arxiv.org/abs/2310.07849
  • repo_url: None
  • paper_authors: Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, Ming Yin
  • for: investigate the effectiveness of using large language models (LLMs) to generate synthetic datasets for text classification
  • methods: use LLMs to generate synthetic data, and evaluate the performance of models trained on these synthetic data
  • results: find that subjectivity, at both the task level and instance level, is negatively associated with the performance of the model trained on synthetic data
    Abstract The collection and curation of high-quality training data is crucial for developing text classification models with superior performance, but it is often associated with significant costs and time investment. Researchers have recently explored using large language models (LLMs) to generate synthetic datasets as an alternative approach. However, the effectiveness of the LLM-generated synthetic data in supporting model training is inconsistent across different classification tasks. To better understand factors that moderate the effectiveness of the LLM-generated synthetic data, in this study, we look into how the performance of models trained on these synthetic data may vary with the subjectivity of classification. Our results indicate that subjectivity, at both the task level and instance level, is negatively associated with the performance of the model trained on synthetic data. We conclude by discussing the implications of our work on the potential and limitations of leveraging LLM for synthetic data generation.

Towards the Fundamental Limits of Knowledge Transfer over Finite Domains

  • paper_url: http://arxiv.org/abs/2310.07838
  • repo_url: https://github.com/sw-packages/ae0783895ca52d793929d6e5e57c365320dc5864c41ab9a7d5f64b2310c2fd59
  • paper_authors: Qingyue Zhao, Banghua Zhu
  • for: Characterize the statistical efficiency of knowledge transfer: a probabilistic student classifier over input space $\mathcal{S}$ and label set $\mathcal{A}$ learns from $n$ samples provided by a teacher.
  • methods: Studies three progressively richer levels of privileged information: hard labels only (first level); hard labels plus the teacher's probabilities of the sampled labels (second level); and complete soft labels, i.e. the full logits over $\mathcal{A}$ for every sampled input (third level).
  • results: With hard labels only, the maximum likelihood estimator attains the minimax rate $\sqrt{|\mathcal{S}||\mathcal{A}|}/n$; adding the teacher probabilities of sampled labels improves the rate to ${|\mathcal{S}||\mathcal{A}|}/{n}$, achieved with a novel empirical variant of the squared error logit loss (a naive adaptation of cross-entropy yields an asymptotically biased student); full soft labels enable a rate of ${|\mathcal{S}|}/{n}$, free of $|\mathcal{A}|$, with any Kullback-Leibler divergence minimizer being optimal. Numerical simulations corroborate these conclusions.
    Abstract We characterize the statistical efficiency of knowledge transfer through $n$ samples from a teacher to a probabilistic student classifier with input space $\mathcal S$ over labels $\mathcal A$. We show that privileged information at three progressive levels accelerates the transfer. At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the minimax rate $\sqrt{|{\mathcal S}||{\mathcal A}|}/{n}$. The second level has the teacher probabilities of sampled labels available in addition, which turns out to boost the convergence rate lower bound to ${|{\mathcal S}||{\mathcal A}|}/{n}$. However, under this second data acquisition protocol, minimizing a naive adaptation of the cross-entropy loss results in an asymptotically biased student. We overcome this limitation and achieve the fundamental limit by using a novel empirical variant of the squared error logit loss. The third level further equips the student with the soft labels (complete logits) on ${\mathcal A}$ given every sampled input, thereby provably enables the student to enjoy a rate ${|{\mathcal S}|}/{n}$ free of $|{\mathcal A}|$. We find any Kullback-Leibler divergence minimizer to be optimal in the last case. Numerical simulations distinguish the four learners and corroborate our theory.

When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement

  • paper_url: http://arxiv.org/abs/2310.07831
  • repo_url: https://github.com/facebookresearch/adaptive_scheduling
  • paper_authors: Aaron Defazio, Ashok Cutkosky, Harsh Mehta, Konstantin Mishchenko
  • for: Close the gap between the learning rate schedules recommended by theory and those used in practice, and derive new problem-adaptive schedules for a class of optimization algorithms including SGD.
  • methods: A refined analysis of learning rate schedules for the last iterate (what practitioners actually use) rather than the average iterate, plus a refinement procedure that uses the observed gradient norms to derive schedules tailored to a particular task (the baseline linear decay schedule is sketched after the abstract below).
  • results: Across 10 diverse deep learning problems, a series of LLMs, and a suite of logistic regression problems, the linear decay schedule matches or outperforms all commonly used defaults, including cosine annealing; the refinement method gives further improvements and automatically yields both learning rate warm-up and rapid annealing near the end of training.
    Abstract Learning rate schedules used in practice bear little resemblance to those recommended by theory. We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules. Our key technical contribution is a refined analysis of learning rate schedules for a wide class of optimization algorithms (including SGD). In contrast to most prior works that study the convergence of the average iterate, we study the last iterate, which is what most people use in practice. When considering only worst-case analysis, our theory predicts that the best choice is the linear decay schedule: a popular choice in practice that sets the stepsize proportionally to $1 - t/T$, where $t$ is the current iteration and $T$ is the total number of steps. To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task. These refined schedules exhibit learning rate warm-up and rapid learning rate annealing near the end of training. Ours is the first systematic approach to automatically yield both of these properties. We perform the most comprehensive evaluation of learning rate schedules to date, evaluating across 10 diverse deep learning problems, a series of LLMs, and a suite of logistic regression problems. We validate that overall, the linear-decay schedule matches or outperforms all commonly used default schedules including cosine annealing, and that our schedule refinement method gives further improvements.
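
The linear decay schedule favored by the worst-case analysis sets the stepsize proportionally to 1 - t/T. The small sketch below also adds a linear warmup purely to illustrate the warmup-then-anneal shape that the refined schedules exhibit; the warmup length is an arbitrary choice, not the paper's recipe.

```python
def linear_decay_lr(t: int, T: int, base_lr: float, warmup: int = 0) -> float:
    if warmup and t < warmup:
        return base_lr * (t + 1) / warmup          # linear warmup
    return base_lr * (1.0 - t / T)                 # linear decay to zero at step T

schedule = [linear_decay_lr(t, T=1000, base_lr=0.1, warmup=50) for t in range(1000)]
```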

Does Synthetic Data Make Large Language Models More Efficient?

  • paper_url: http://arxiv.org/abs/2310.07830
  • repo_url: None
  • paper_authors: Sia Gholami, Marwan Omar
  • for: Examine synthetic data generation for NLP, with a focus on template-based question generation.
  • methods: Assesses template-based question generation as a way to augment data and introduce structured variety, and evaluates its impact on the performance of modern transformer models (a slot-filling sketch follows the abstract below).
  • results: Template-based synthetic data can improve transformer model performance, but carries the risk of overfitting and the constraints imposed by pre-defined templates; a careful balance between synthetic and real-world data is required.
    Abstract Natural Language Processing (NLP) has undergone transformative changes with the advent of deep learning methodologies. One challenge persistently confronting researchers is the scarcity of high-quality, annotated datasets that drive these models. This paper explores the nuances of synthetic data generation in NLP, with a focal point on template-based question generation. By assessing its advantages, including data augmentation potential and the introduction of structured variety, we juxtapose these benefits against inherent limitations, such as the risk of overfitting and the constraints posed by pre-defined templates. Drawing from empirical evaluations, we demonstrate the impact of template-based synthetic data on the performance of modern transformer models. We conclude by emphasizing the delicate balance required between synthetic and real-world data, and the future trajectories of integrating synthetic data in model training pipelines. The findings aim to guide NLP practitioners in harnessing synthetic data's potential, ensuring optimal model performance in diverse applications.
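
As a minimal illustration of template-based question generation, the sketch below slot-fills a couple of question templates from structured records. The templates and record fields are made up for illustration; the paper's points about structured variety and the risk of overfitting apply to generation of exactly this kind.

```python
import itertools

templates = [
    "What is the {attribute} of {entity}?",
    "Which {category} has a {attribute} of {value}?",
]
records = [
    {"entity": "Mount Everest", "category": "mountain", "attribute": "height", "value": "8,849 m"},
    {"entity": "the Nile", "category": "river", "attribute": "length", "value": "6,650 km"},
]

def generate_questions(templates, records):
    for tmpl, rec in itertools.product(templates, records):
        # the answer is the value when asking about an attribute, and the
        # entity when the value already appears in the question
        answer = rec["value"] if "{value}" not in tmpl else rec["entity"]
        yield tmpl.format(**rec), answer

synthetic_qa = list(generate_questions(templates, records))
```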

Exploring the Relationship between Analogy Identification and Sentence Structure Encoding in Large Language Models

  • paper_url: http://arxiv.org/abs/2310.07818
  • repo_url: None
  • paper_authors: Thilini Wijesiriwardene, Ruwan Wickramarachchi, Aishwarya Naresh Reganti, Vinija Jain, Aman Chadha, Amit Sheth, Amitava Das
  • for: Examine how well current NLP models identify sentence-level analogies, and how this ability relates to large language models' (LLMs') capacity to encode sentence structure.
  • methods: Evaluates multiple LLMs on identifying sentence analogies and analyzes how well each model encodes the syntactic and semantic structure of sentences.
  • results: The analogy identification ability of LLMs is positively correlated with their ability to encode the syntactic and semantic structure of sentences; in particular, LLMs that capture syntactic structure better also identify sentence analogies better.
    Abstract Identifying analogies plays a pivotal role in human cognition and language proficiency. In the last decade, there has been extensive research on word analogies in the form of ``A is to B as C is to D.'' However, there is a growing interest in analogies that involve longer text, such as sentences and collections of sentences, which convey analogous meanings. While the current NLP research community evaluates the ability of Large Language Models (LLMs) to identify such analogies, the underlying reasons behind these abilities warrant deeper investigation. Furthermore, the capability of LLMs to encode both syntactic and semantic structures of language within their embeddings has garnered significant attention with the surge in their utilization. In this work, we examine the relationship between the abilities of multiple LLMs to identify sentence analogies, and their capacity to encode syntactic and semantic structures. Through our analysis, we find that analogy identification ability of LLMs is positively correlated with their ability to encode syntactic and semantic structures of sentences. Specifically, we find that the LLMs which capture syntactic structures better, also have higher abilities in identifying sentence analogies.

Generative Modeling with Phase Stochastic Bridges

  • paper_url: http://arxiv.org/abs/2310.07805
  • repo_url: None
  • paper_authors: Tianrong Chen, Jiatao Gu, Laurent Dinh, Evangelos A. Theodorou, Josh Susskind, Shuangfei Zhai
  • for: Propose a generative modeling framework for continuous inputs grounded in phase space dynamics.
  • methods: Defines a phase space that augments position with velocity and, leveraging insights from Stochastic Optimal Control, constructs a path measure in this phase space that enables efficient sampling; as in diffusion models, a neural network reverses the stochastic dynamics.
  • results: On standard image generation benchmarks the model performs favorably at small numbers of function evaluations (NFEs) and rivals diffusion models equipped with efficient sampling techniques, indicating its potential as a new tool for generative modeling.
    Abstract Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs. DMs work by constructing a Stochastic Differential Equation (SDE) in the input space (ie, position space), and using a neural network to reverse it. In this work, we introduce a novel generative modeling framework grounded in \textbf{phase space dynamics}, where a phase space is defined as {an augmented space encompassing both position and velocity.} Leveraging insights from Stochastic Optimal Control, we construct a path measure in the phase space that enables efficient sampling. {In contrast to DMs, our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation.} This early prediction sets the stage for efficient data generation by leveraging additional velocity information along the trajectory. On standard image generation benchmarks, our model yields favorable performance over baselines in the regime of small Number of Function Evaluations (NFEs). Furthermore, our approach rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its potential as a new tool generative modeling.

A general mechanism of humor: reformulating the semantic overlap

  • paper_url: http://arxiv.org/abs/2310.07803
  • repo_url: None
  • paper_authors: Javier Martínez
  • for: Propose a general cognitive mechanism of humour that is not restricted to verbal communication.
  • methods: Builds on Raskin's concept of script overlap within the incongruity-resolution framework, recasting it in terms of constraints (abstract correspondences between sets of data) and introducing the notion of the overlooked argument to characterize the two overlapping constraints, overt and covert.
  • results: The resulting mechanism applies to non-verbal communication such as slapstick and cartoons, and is proposed as describing the necessary and sufficient conditions for a communicative act in any modality to carry humour.
    Abstract This article proposes a cognitive mechanism of humour of general applicability, not restricted to verbal communication. It is indebted to Raskin's concept of script overlap, and conforms to the incongruity-resolution theoretical framework, but it is built on the notion of constraint, an abstract correspondence between sets of data. Under this view, script overlap is an outcome of a more abstractly described phenomenon, constraint overlap. The important concept of the overlooked argument is introduced to characterise the two overlapping constraints -- overt and covert. Their inputs and outputs are not directly encoded in utterances, but implicated by them, and their overlap results in another overlap at the level of the communicated utterances, that the incongruity reveals. Our hypothesis assumes as a given that the evocation of such constraints is a cognitive effect of the inferential process by which a hearer interprets utterances. We base this assumption on Hofstadter's theory of analogy-making as the essence of human thought. By substituting "stimuli" of any kind for "utterances" in this model, we obtain a mechanism as easily applicable to non-verbal communication -- slapstick, cartoons -- and we propose it describes the necessary and sufficient conditions for a communicative act in any modality to carry humour.

An Information Bottleneck Characterization of the Understanding-Workload Tradeoff

  • paper_url: http://arxiv.org/abs/2310.07802
  • repo_url: https://github.com/mycal-tucker/ib-explanations
  • paper_authors: Lindsay Sanneman, Mycal Tucker, Julie Shah
  • for: Support human understanding of AI systems through explainable AI (XAI), accounting for human factors such as mental workload that determine explanation efficacy.
  • methods: Uses the Information Bottleneck method, an information-theoretic approach, to automatically generate abstractions (groupings of related problem features, previously hand-crafted) that maximize informativeness and minimize complexity (the standard objective is recalled after the abstract below).
  • results: Human-subject experiments establish empirical connections between workload and complexity and between understanding and informativeness, giving a mathematical characterization of the workload-understanding tradeoff that enables user-tailored XAI design.
    Abstract Recent advances in artificial intelligence (AI) have underscored the need for explainable AI (XAI) to support human understanding of AI systems. Consideration of human factors that impact explanation efficacy, such as mental workload and human understanding, is central to effective XAI design. Existing work in XAI has demonstrated a tradeoff between understanding and workload induced by different types of explanations. Explaining complex concepts through abstractions (hand-crafted groupings of related problem features) has been shown to effectively address and balance this workload-understanding tradeoff. In this work, we characterize the workload-understanding balance via the Information Bottleneck method: an information-theoretic approach which automatically generates abstractions that maximize informativeness and minimize complexity. In particular, we establish empirical connections between workload and complexity and between understanding and informativeness through human-subject experiments. This empirical link between human factors and information-theoretic concepts provides an important mathematical characterization of the workload-understanding tradeoff which enables user-tailored XAI design.
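
For reference, the generic Information Bottleneck objective that this line of work builds on seeks a compressed representation $T$ of the input features $X$ that remains predictive of the task-relevant variable $Y$: $\min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y)$, where $I(\cdot;\cdot)$ denotes mutual information and $\beta$ trades off complexity (linked empirically here to workload) against informativeness (linked to understanding). This is the standard IB Lagrangian, not necessarily the paper's exact notation.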

Explainable Attention for Few-shot Learning and Beyond

  • paper_url: http://arxiv.org/abs/2310.07800
  • repo_url: None
  • paper_authors: Bahareh Nikpour, Narges Armanfard
  • for: Improve the accuracy and reliability of few-shot learning models, particularly when challenges in data collection and labeling limit the available training samples.
  • methods: Uses deep reinforcement learning to perform hard attention finding, acting directly on the raw input data so that the selection of informative segments is interpretable to humans.
  • results: Extensive experiments across several benchmark datasets demonstrate the effectiveness of the proposed method.
    Abstract Attention mechanisms have exhibited promising potential in enhancing learning models by identifying salient portions of input data. This is particularly valuable in scenarios where limited training samples are accessible due to challenges in data collection and labeling. Drawing inspiration from human recognition processes, we posit that an AI baseline's performance could be more accurate and dependable if it is exposed to essential segments of raw data rather than the entire input dataset, akin to human perception. However, the task of selecting these informative data segments, referred to as hard attention finding, presents a formidable challenge. In situations with few training samples, existing studies struggle to locate such informative regions due to the large number of training parameters that cannot be effectively learned from the available limited samples. In this study, we introduce a novel and practical framework for achieving explainable hard attention finding, specifically tailored for few-shot learning scenarios, called FewXAT. Our approach employs deep reinforcement learning to implement the concept of hard attention, directly impacting raw input data and thus rendering the process interpretable for human understanding. Through extensive experimentation across various benchmark datasets, we demonstrate the efficacy of our proposed method.

A Transfer-Learning-Based Prognosis Prediction Paradigm that Bridges Data Distribution Shift across EMR Datasets

  • paper_url: http://arxiv.org/abs/2310.07799
  • repo_url: None
  • paper_authors: Zhongji Zhang, Yuhang Wang, Yinghao Zhu, Xinyu Ma, Tianlong Wang, Chaohe Zhang, Yasha Wang, Liantao Ma
  • for: 提高对新发流行病及其他疾病预后预测的准确性
  • methods: 使用迁移学习方法构建从源数据集到目标数据集的过渡模型，以应对不同任务域之间的特征分布偏移问题
  • results: 优于基线方法且训练收敛更快，尤其在数据量有限的情况下，能够对新发流行病和其他疾病给出更准确的预测
    Abstract Due to the limited information about emerging diseases, symptoms are hard to be noticed and recognized, so that the window for clinical intervention could be ignored. An effective prognostic model is expected to assist doctors in making right diagnosis and designing personalized treatment plan, so to promptly prevent unfavorable outcomes. However, in the early stage of a disease, limited data collection and clinical experiences, plus the concern out of privacy and ethics, may result in restricted data availability for reference, to the extent that even data labels are difficult to mark correctly. In addition, Electronic Medical Record (EMR) data of different diseases or of different sources of the same disease can prove to be having serious cross-dataset feature misalignment problems, greatly mutilating the efficiency of deep learning models. This article introduces a transfer learning method to build a transition model from source dataset to target dataset. By way of constraining the distribution shift of features generated in disparate domains, domain-invariant features that are exclusively relative to downstream tasks are captured, so to cultivate a unified domain-invariant encoder across various task domains to achieve better feature representation. Experimental results of several target tasks demonstrate that our proposed model outperforms competing baseline methods and has higher rate of training convergence, especially in dealing with limited data amount. A multitude of experiences have proven the efficacy of our method to provide more accurate predictions concerning newly emergent pandemics and other diseases.
    摘要 由于新发疾病的信息有限，其症状难以被注意和识别，临床干预的窗口期可能因此被错过。一个有效的预后模型有望帮助医生做出正确诊断并制定个性化治疗方案，从而及时防止不良结局。然而，在疾病早期，数据收集和临床经验都很有限，加上隐私与伦理方面的顾虑，可供参考的数据十分受限，甚至连数据标签都难以正确标注。此外，不同疾病、或同一疾病不同来源的电子病历（EMR）数据往往存在严重的跨数据集特征错位问题，大大削弱了深度学习模型的效率。本文提出一种迁移学习方法，用于构建从源数据集到目标数据集的过渡模型：通过约束不同领域所生成特征的分布偏移，捕捉仅与下游任务相关的领域不变特征，从而在多个任务域上训练出统一的领域不变编码器，获得更好的特征表示。多个目标任务上的实验结果表明，所提模型优于对比基线方法，训练收敛速度更快，尤其在数据量有限时表现突出。大量实验也证明了该方法在新发流行病及其他疾病预测上的有效性。
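The abstract above describes constraining the distribution shift between features from different EMR domains so a shared encoder learns domain-invariant representations. The following sketch uses a maximum mean discrepancy (MMD) penalty as one possible way to implement such a constraint; the MMD choice, the network sizes, and the loss weight are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

def rbf_mmd(x, y, sigma=1.0):
    """Squared MMD between two feature batches with an RBF kernel."""
    def k(a, b):
        d = torch.cdist(a, b) ** 2
        return torch.exp(-d / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
head = nn.Linear(16, 2)                      # downstream prognosis classifier
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()), lr=1e-3)

for step in range(100):
    xs, ys = torch.randn(64, 32), torch.randint(0, 2, (64,))   # labeled source batch
    xt = torch.randn(64, 32)                                    # unlabeled target batch
    zs, zt = encoder(xs), encoder(xt)
    task_loss = nn.functional.cross_entropy(head(zs), ys)
    align_loss = rbf_mmd(zs, zt)             # constrain cross-domain feature shift
    loss = task_loss + 0.5 * align_loss
    opt.zero_grad(); loss.backward(); opt.step()
```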

GenTKG: Generative Forecasting on Temporal Knowledge Graph

  • paper_url: http://arxiv.org/abs/2310.07793
  • repo_url: None
  • paper_authors: Ruotong Liao, Xu Jia, Yunpu Ma, Volker Tresp
  • for: 用于替代传统的 embedding-based 和 rule-based 模型,并在 temporal knowledge graph 领域实现生成式预测。
  • methods: 提出了一种结合时序逻辑规则检索策略与轻量级参数高效指令微调（instruction tuning）的生成式预测方法，命名为 GenTKG。
  • results: 在低计算资源下,GenTKG 比传统方法有更高的预测性能,并且在未经重新训练的情况下,在未看到的数据集上也表现出了很好的转移性。
    Abstract The rapid advancements in large language models (LLMs) have ignited interest in the temporal knowledge graph (tKG) domain, where conventional carefully designed embedding-based and rule-based models dominate. The question remains open of whether pre-trained LLMs can understand structured temporal relational data and replace them as the foundation model for temporal relational forecasting. Therefore, we bring temporal knowledge forecasting into the generative setting. However, challenges occur in the huge chasms between complex temporal graph data structure and sequential natural expressions LLMs can handle, and between the enormous data sizes of tKGs and heavy computation costs of finetuning LLMs. To address these challenges, we propose a novel retrieval augmented generation framework that performs generative forecasting on tKGs named GenTKG, which combines a temporal logical rule-based retrieval strategy and lightweight parameter-efficient instruction tuning. Extensive experiments have shown that GenTKG outperforms conventional methods of temporal relational forecasting under low computation resources. GenTKG also highlights remarkable transferability with exceeding performance on unseen datasets without re-training. Our work reveals the huge potential of LLMs in the tKG domain and opens a new frontier for generative forecasting on tKGs.
    摘要 大语言模型（LLM）的快速进展引发了时序知识图（tKG）领域的关注，而该领域目前仍由精心设计的嵌入式模型和规则式模型主导。一个尚未解决的问题是：预训练的 LLM 能否理解结构化的时序关系数据，并取代它们成为时序关系预测的基础模型？为此，我们将时序知识预测引入生成式设定。然而，这面临两方面的挑战：复杂的时序图数据结构与 LLM 所能处理的顺序自然语言表达之间存在巨大鸿沟；tKG 的庞大数据规模与微调 LLM 的高计算成本之间也存在矛盾。为了解决这些挑战，我们提出了一种新的检索增强生成框架 GenTKG，它结合了基于时序逻辑规则的检索策略和轻量级参数高效的指令微调。大量实验表明，在低计算资源条件下，GenTKG 优于传统的时序关系预测方法，并在未见过的数据集上无需重新训练即表现出显著的迁移能力。我们的工作揭示了 LLM 在 tKG 领域的巨大潜力，为 tKG 上的生成式预测开辟了新的方向。
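GenTKG pairs a temporal-logic-style retrieval step with instruction-tuned generation. The sketch below shows only the retrieval-and-prompting half: recent facts about the query entity are retrieved by recency and formatted into a prompt for a language model. The fact format, the scoring rule, and `call_llm` are hypothetical placeholders, not the released implementation.

```python
from typing import List, Tuple

Fact = Tuple[int, str, str, str]   # (timestamp, subject, relation, object)

def retrieve(history: List[Fact], subject: str, t_query: int, k: int = 5) -> List[Fact]:
    """Rule-of-thumb retrieval: most recent facts about the subject before t_query."""
    candidates = [f for f in history if f[1] == subject and f[0] < t_query]
    return sorted(candidates, key=lambda f: f[0], reverse=True)[:k]

def build_prompt(facts: List[Fact], subject: str, relation: str, t_query: int) -> str:
    lines = [f"{t}: {s} {r} {o}" for t, s, r, o in sorted(facts)]
    return ("Given the temporal facts:\n" + "\n".join(lines) +
            f"\n{t_query}: {subject} {relation} ?\nAnswer with the missing object only.")

history = [
    (1, "Germany", "host_visit", "France"),
    (2, "Germany", "sign_agreement", "Italy"),
    (4, "Germany", "host_visit", "Italy"),
]
facts = retrieve(history, "Germany", t_query=5)
prompt = build_prompt(facts, "Germany", "host_visit", t_query=5)
print(prompt)
# answer = call_llm(prompt)   # hypothetical call to an instruction-tuned LLM
```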

DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model

  • paper_url: http://arxiv.org/abs/2310.07771
  • repo_url: None
  • paper_authors: Xiaofan Li, Yifu Zhang, Xiaoqing Ye
  • for: 提供高质量、大规模多视图视频数据,用于自动驾驶研究。
  • methods: 提出了一种时空一致的扩散框架 DrivingDiffusion，在 3D 布局（基于 Bird's-Eye-View，BEV 表示）的控制下生成逼真的多视图驾驶视频。
  • results: 无需额外成本即可生成大规模、高质量的多视图驾驶视频，为下游驾驶任务提供数据支持。
    Abstract With the increasing popularity of autonomous driving based on the powerful and unified bird's-eye-view (BEV) representation, a demand for high-quality and large-scale multi-view video data with accurate annotation is urgently required. However, such large-scale multi-view data is hard to obtain due to expensive collection and annotation costs. To alleviate the problem, we propose a spatial-temporal consistent diffusion framework DrivingDiffusion, to generate realistic multi-view videos controlled by 3D layout. There are three challenges when synthesizing multi-view videos given a 3D layout: How to keep 1) cross-view consistency and 2) cross-frame consistency? 3) How to guarantee the quality of the generated instances? Our DrivingDiffusion solves the problem by cascading the multi-view single-frame image generation step, the single-view video generation step shared by multiple cameras, and post-processing that can handle long video generation. In the multi-view model, the consistency of multi-view images is ensured by information exchange between adjacent cameras. In the temporal model, we mainly query the information that needs attention in subsequent frame generation from the multi-view images of the first frame. We also introduce the local prompt to effectively improve the quality of generated instances. In post-processing, we further enhance the cross-view consistency of subsequent frames and extend the video length by employing temporal sliding window algorithm. Without any extra cost, our model can generate large-scale realistic multi-camera driving videos in complex urban scenes, fueling the downstream driving tasks. The code will be made publicly available.
    摘要 随着基于强大而统一的鸟瞰图（BEV）表示的自动驾驶日益普及，业界迫切需要大规模、高质量且标注准确的多视图视频数据，但由于采集和标注成本高昂，这类数据很难获得。为解决这一问题，我们提出了时空一致的扩散框架 DrivingDiffusion，在 3D 布局控制下生成逼真的多视图视频。给定 3D 布局合成多视图视频面临三个挑战：1）如何保持跨视图一致性；2）如何保持跨帧一致性；3）如何保证生成实例的质量。DrivingDiffusion 通过级联多视图单帧图像生成步骤、由多相机共享的单视图视频生成步骤，以及可处理长视频生成的后处理步骤来解决上述问题。在多视图模型中，通过相邻相机之间的信息交换保证多视图图像的一致性；在时序模型中，主要从首帧的多视图图像中查询后续帧生成所需关注的信息，并引入局部提示（local prompt）以有效提高生成实例的质量；在后处理阶段，进一步增强后续帧的跨视图一致性，并利用时序滑动窗口算法延长视频长度。在不引入额外成本的情况下，我们的模型能够在复杂城市场景中生成大规模、逼真的多相机驾驶视频，为下游驾驶任务提供支持。代码将公开。

PAD: A Dataset and Benchmark for Pose-agnostic Anomaly Detection

  • paper_url: http://arxiv.org/abs/2310.07716
  • repo_url: https://github.com/ericlee0224/pad
  • paper_authors: Qiang Zhou, Weize Li, Lihan Jiang, Guoliang Wang, Guyue Zhou, Shanghang Zhang, Hao Zhao
  • for: 这个论文的目的是解决对象异常检测中的两个主要挑战:第一个是现有数据集缺乏完整的视觉信息,其中数据集通常假设训练和测试样本具有相同的pose angle,但在实际应用中,异常可能存在任何对象区域,需要研究无关于pose的异常检测。第二个挑战是对于无关于pose的异常检测的实验协议的缺乏一致性,这使得不同方法之间的比较不公平,阻碍了无关于pose的异常检测的研究。
  • methods: 作者们构建了 Multi-pose Anomaly Detection（MAD）数据集和 Pose-agnostic Anomaly Detection（PAD）基准，以应对上述两个挑战。具体而言，他们使用 20 种形状复杂的 LEGO 玩具，涵盖多种姿态的 4K 视图，并在仿真与真实环境中构造了高质量、多样化的 3D 异常。此外，作者们还提出了基于 MAD 训练、专为姿态无关异常检测设计的新方法 OmniposeAD。
  • results: 作者们通过全面的评估验证了该数据集和方法的有效性。此外，他们还提供了一个开源的 benchmark 库，包括数据集和基线方法，以便未来的研究和应用。代码、数据和模型均公开于 https://github.com/EricLee0224/PAD。
    Abstract Object anomaly detection is an important problem in the field of machine vision and has seen remarkable progress recently. However, two significant challenges hinder its research and application. First, existing datasets lack comprehensive visual information from various pose angles. They usually have an unrealistic assumption that the anomaly-free training dataset is pose-aligned, and the testing samples have the same pose as the training data. However, in practice, anomaly may exist in any regions on a object, the training and query samples may have different poses, calling for the study on pose-agnostic anomaly detection. Second, the absence of a consensus on experimental protocols for pose-agnostic anomaly detection leads to unfair comparisons of different methods, hindering the research on pose-agnostic anomaly detection. To address these issues, we develop Multi-pose Anomaly Detection (MAD) dataset and Pose-agnostic Anomaly Detection (PAD) benchmark, which takes the first step to address the pose-agnostic anomaly detection problem. Specifically, we build MAD using 20 complex-shaped LEGO toys including 4K views with various poses, and high-quality and diverse 3D anomalies in both simulated and real environments. Additionally, we propose a novel method OmniposeAD, trained using MAD, specifically designed for pose-agnostic anomaly detection. Through comprehensive evaluations, we demonstrate the relevance of our dataset and method. Furthermore, we provide an open-source benchmark library, including dataset and baseline methods that cover 8 anomaly detection paradigms, to facilitate future research and application in this domain. Code, data, and models are publicly available at https://github.com/EricLee0224/PAD.
    摘要 “物体异常检测是机器视觉领域的重要问题,最近有很大的进步。然而,两个主要挑战是阻碍其研究和应用。第一个是现有数据集缺乏全面的视觉信息,通常假设 anomaly-free 训练数据集是同一个 pose 的,测试样本也是同一个 pose。然而,在实际情况下,异常可能存在于对象任意区域,训练和查询样本可能有不同的 pose,需要研究无 pose 的异常检测。第二个是对pose-agnostic异常检测的实验室协议缺乏共识,导致不公正的比较,阻碍研究pose-agnostic异常检测。为解决这些问题,我们开发了Multi-pose Anomaly Detection(MAD)数据集和Pose-agnostic Anomaly Detection(PAD) benchmar,这是解决pose-agnostic异常检测问题的第一步。 Specifically,我们使用了20种复杂形状的 LEGO 玩具,包括4K 视图和各种 pose,以及高质量和多样化的3D 异常在 both simulated 和实际环境中。此外,我们提出了一种新的 OmniposeAD 方法,通过 MAD 训练而得,专门针对无 pose 异常检测。通过全面的评估,我们证明了我们的数据集和方法的相关性。此外,我们还提供了一个开源的benchmark库,包括dataset和基准方法,覆盖8种异常检测思想,以便未来的研究和应用。代码、数据和模型都公开可用于https://github.com/EricLee0224/PAD。”

InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

  • paper_url: http://arxiv.org/abs/2310.07713
  • repo_url: None
  • paper_authors: Boxin Wang, Wei Ping, Lawrence McAfee, Peng Xu, Bo Li, Mohammad Shoeybi, Bryan Catanzaro
  • for: 这个论文是为了研究预训练自然语言模型(LLM)的可靠性和精度,以及如何通过外部数据库来提高这些模型的性能。
  • methods: 这个论文使用了Retrieval方法来预训练LLM,并在这个基础模型上进行了更多的预训练和调教。
  • results: 论文的实验结果表明,使用Retrieval方法预训练LLM后,可以大幅提高模型的精度和可靠性,并且可以在零基础情况下进行成功的问答 tasks。
    Abstract Pretraining auto-regressive large language models (LLMs) with retrieval demonstrates better perplexity and factual accuracy by leveraging external databases. However, the size of existing pretrained retrieval-augmented LLM is still limited (e.g., Retro has 7.5B parameters), which limits the effectiveness of instruction tuning and zero-shot generalization. In this work, we introduce Retro 48B, the largest LLM pretrained with retrieval before instruction tuning. Specifically, we continue to pretrain the 43B GPT model on additional 100 billion tokens using the Retro augmentation method by retrieving from 1.2 trillion tokens. The obtained foundation model, Retro 48B, largely outperforms the original 43B GPT in terms of perplexity. After instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction tuned GPT on zero-shot question answering (QA) tasks. Specifically, the average improvement of InstructRetro is 7% over its GPT counterpart across 8 short-form QA tasks, and 10% over GPT across 4 challenging long-form QA tasks. Surprisingly, we find that one can ablate the encoder from InstructRetro architecture and directly use its decoder backbone, while achieving comparable results. We hypothesize that pretraining with retrieval makes its decoder good at incorporating context for QA. Our results highlights the promising direction to obtain a better GPT decoder for QA through continued pretraining with retrieval before instruction tuning.
    摘要 利用外部数据库进行检索增强的自回归大语言模型（LLM）预训练，能够取得更好的困惑度和事实准确性。然而，现有经检索增强预训练的 LLM 规模仍然有限（例如 Retro 只有 75 亿参数），这限制了指令微调和零样本泛化的效果。在这项工作中，我们提出了 Retro 48B，这是迄今在指令微调之前经检索预训练的最大 LLM。具体来说，我们使用 Retro 增强方法，在额外的 1000 亿个 token 上继续预训练 43B 的 GPT 模型，检索库规模达 1.2 万亿个 token。得到的基础模型 Retro 48B 在困惑度上大幅优于原始的 43B GPT。在 Retro 上进行指令微调后，InstructRetro 在零样本问答（QA）任务上显著优于经指令微调的 GPT：在 8 个短答案 QA 任务上平均提升 7%，在 4 个具有挑战性的长答案 QA 任务上提升 10%。令人惊讶的是，我们发现可以去掉 InstructRetro 架构中的编码器，直接使用其解码器主干，仍能取得相当的结果。我们推测，检索式预训练使解码器更善于利用上下文进行问答。我们的结果表明，在指令微调之前进行检索式持续预训练，是获得面向 QA 的更优 GPT 解码器的一个有前景的方向。
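Retro-style retrieval-augmented pretraining conditions each training chunk on nearest-neighbour chunks retrieved from a large corpus. The sketch below shows the neighbour-lookup step with a brute-force cosine search over toy embeddings; real systems use trillions of tokens and approximate nearest-neighbour indexes, and the embedding function here is only a placeholder.

```python
import numpy as np

def embed(chunks):
    """Placeholder embedding; a real system would use a frozen encoder (e.g. BERT)."""
    out = []
    for c in chunks:
        r = np.random.default_rng(abs(hash(c)) % (2 ** 32))
        out.append(r.normal(size=64))
    return np.stack(out)

corpus_chunks = [f"retrieval chunk {i}" for i in range(10_000)]
corpus_emb = embed(corpus_chunks)
corpus_emb /= np.linalg.norm(corpus_emb, axis=1, keepdims=True)

def retrieve_neighbours(query_chunk, k=2):
    q = embed([query_chunk])[0]
    q /= np.linalg.norm(q)
    scores = corpus_emb @ q                     # cosine similarity
    top = np.argsort(-scores)[:k]
    return [corpus_chunks[i] for i in top]

# Each training chunk is augmented with its retrieved neighbours, which the
# decoder then attends to (chunked cross-attention in Retro-style models).
print(retrieve_neighbours("training chunk about question answering"))
```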

Growing Brains: Co-emergence of Anatomical and Functional Modularity in Recurrent Neural Networks

  • paper_url: http://arxiv.org/abs/2310.07711
  • repo_url: None
  • paper_authors: Ziming Liu, Mikail Khona, Ila R. Fiete, Max Tegmark
  • for: 这个论文的目的是研究如何使用机器学习方法来实现脑模式下的神经网络结构。
  • methods: 这个论文使用的方法是一种名为“脑灵感模块化训练”(BIMT),它可以让神经网络中的神经元组织成功参与到同一些计算任务中,同时也可以使神经网络的性能更高。
  • results: 研究发现,通过使用BIMT训练神经网络,可以同时实现功能和结构的模块化,并且这些模块化的神经元也可以在不同的计算任务中保持一定的稳定性。此外,相比标准的$L_1$或无regularization设置,BIMT可以使神经网络的性能更高。
    Abstract Recurrent neural networks (RNNs) trained on compositional tasks can exhibit functional modularity, in which neurons can be clustered by activity similarity and participation in shared computational subtasks. Unlike brains, these RNNs do not exhibit anatomical modularity, in which functional clustering is correlated with strong recurrent coupling and spatial localization of functional clusters. Contrasting with functional modularity, which can be ephemerally dependent on the input, anatomically modular networks form a robust substrate for solving the same subtasks in the future. To examine whether it is possible to grow brain-like anatomical modularity, we apply a recent machine learning method, brain-inspired modular training (BIMT), to a network being trained to solve a set of compositional cognitive tasks. We find that functional and anatomical clustering emerge together, such that functionally similar neurons also become spatially localized and interconnected. Moreover, compared to standard $L_1$ or no regularization settings, the model exhibits superior performance by optimally balancing task performance and network sparsity. In addition to achieving brain-like organization in RNNs, our findings also suggest that BIMT holds promise for applications in neuromorphic computing and enhancing the interpretability of neural network architectures.
    摘要 在组合式任务上训练的循环神经网络（RNN）可以表现出功能模块化：神经元可以按活动相似性及其在共享计算子任务中的参与情况进行聚类。与大脑不同，这些 RNN 并不表现出解剖学（结构）模块化，即功能聚类与强循环耦合以及功能簇的空间局域化相关。功能模块化可能随输入而短暂变化，而解剖学模块化的网络则构成了一个稳健的基底，可在未来继续解决相同的子任务。为了检验能否培育出类脑的解剖学模块化，我们将最近提出的机器学习方法——脑启发模块化训练（BIMT）——应用于一个正在学习一组组合式认知任务的网络。我们发现功能聚类与解剖聚类会同时涌现：功能相似的神经元也会在空间上局域化并彼此互联。此外，与标准的 $L_1$ 正则化或无正则化设置相比，该模型通过在任务性能与网络稀疏性之间取得最优平衡而表现更佳。除了在 RNN 中实现类脑组织结构之外，我们的发现还表明 BIMT 在神经形态计算以及提升神经网络架构可解释性方面具有应用前景。
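Brain-inspired modular training (BIMT) encourages anatomical modularity by assigning neurons spatial coordinates and penalising long connections. The sketch below implements one commonly described ingredient, an L1 penalty on weights scaled by the distance between the units they connect; the 1-D layout, the distance scaling, and the omission of BIMT's neuron-swapping step are simplifications for illustration.

```python
import torch
import torch.nn as nn

class SpatialLinear(nn.Linear):
    """Linear layer whose input/output units have fixed 1-D positions."""
    def __init__(self, n_in, n_out):
        super().__init__(n_in, n_out)
        self.register_buffer("pos_in", torch.linspace(0, 1, n_in))
        self.register_buffer("pos_out", torch.linspace(0, 1, n_out))

    def wiring_cost(self):
        # |w_ij| * distance between output unit i and input unit j
        dist = (self.pos_out[:, None] - self.pos_in[None, :]).abs()
        return (self.weight.abs() * dist).sum()

layers = nn.ModuleList([SpatialLinear(16, 32), SpatialLinear(32, 4)])
opt = torch.optim.Adam(layers.parameters(), lr=1e-3)

x, y = torch.randn(128, 16), torch.randint(0, 4, (128,))
for step in range(200):
    h = torch.relu(layers[0](x))
    logits = layers[1](h)
    task = nn.functional.cross_entropy(logits, y)
    wiring = sum(l.wiring_cost() for l in layers)
    loss = task + 1e-3 * wiring        # locality pressure encourages modular wiring
    opt.zero_grad(); loss.backward(); opt.step()
```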

Pixel State Value Network for Combined Prediction and Planning in Interactive Environments

  • paper_url: http://arxiv.org/abs/2310.07706
  • repo_url: None
  • paper_authors: Sascha Rosbach, Stefan M. Leupold, Simon Großjohann, Stefan Roth
  • for: 本研究旨在提高自动驾驶车辆在城市环境中的交通互动能力。
  • methods: 该研究提出了一种基于深度学习的方法,将预测和规划分别作为两个独立模块。 conditional GAN with U-Net architecture 是用于预测高分辨率图像序列的。
  • results: 研究结果表明，该方法可以在复杂情况下（例如存在目标冲突的变道场景）展现出直观的行为。
    Abstract Automated vehicles operating in urban environments have to reliably interact with other traffic participants. Planning algorithms often utilize separate prediction modules forecasting probabilistic, multi-modal, and interactive behaviors of objects. Designing prediction and planning as two separate modules introduces significant challenges, particularly due to the interdependence of these modules. This work proposes a deep learning methodology to combine prediction and planning. A conditional GAN with the U-Net architecture is trained to predict two high-resolution image sequences. The sequences represent explicit motion predictions, mainly used to train context understanding, and pixel state values suitable for planning encoding kinematic reachability, object dynamics, safety, and driving comfort. The model can be trained offline on target images rendered by a sampling-based model-predictive planner, leveraging real-world driving data. Our results demonstrate intuitive behavior in complex situations, such as lane changes amidst conflicting objectives.
    摘要 自动驾驶车辆在城市环境中必须可靠地与其他交通参与者交互。规划算法经常利用分离的预测模块预测 probabilistic、多模式和互动行为。将预测和规划分为两个模块会导致很多挑战,尤其是由于这两个模块之间的互相关系。这项工作提出了基于深度学习的方法,将预测和规划合并起来。使用 conditional GAN WITH U-Net 架构,训练预测两个高分辨率图像序列。这两个序列表示明确的运动预测,主要用于训练上下文理解,以及适用于规划编码减速可能性、物体动力学、安全和驾驶舒适。模型可以在 target 图像上进行训练,使用采样基本的模拟预测规划器生成的图像,利用实际驾驶数据。我们的结果表明在复杂的情况下,如lane change amidst conflicting objectives, exhibit intuitive behavior。

From Scarcity to Efficiency: Improving CLIP Training via Visual-enriched Captions

  • paper_url: http://arxiv.org/abs/2310.07699
  • repo_url: None
  • paper_authors: Zhengfeng Lai, Haotian Zhang, Wentao Wu, Haoping Bai, Aleksei Timofeev, Xianzhi Du, Zhe Gan, Jiulong Shan, Chen-Nee Chuah, Yinfei Yang, Meng Cao
  • for: 提高CLIP模型的训练效果和数据效率
  • methods: 利用视觉概念和新生成的视觉增强caption(VeC)进行拓展和改进 caption,并提出了一种混合训练方案
  • results: 对不同规模的原始数据进行了全面评估，结果显示 VeCLIP 在图像-文本对齐和整体模型性能上具有显著优势，例如在 12M 设定下，COCO 和 Flickr30k 检索任务提升超过 20%；在数据效率方面也取得了超过 3% 的提升。
    Abstract Web-crawled datasets are pivotal to the success of pre-training vision-language models, exemplified by CLIP. However, web-crawled AltTexts can be noisy and potentially irrelevant to images, thereby undermining the crucial image-text alignment. Existing methods for rewriting captions using large language models (LLMs) have shown promise on small, curated datasets like CC3M and CC12M. Nevertheless, their efficacy on massive web-captured captions is constrained by the inherent noise and randomness in such data. In this study, we address this limitation by focusing on two key aspects: data quality and data variety. Unlike recent LLM rewriting techniques, we emphasize exploiting visual concepts and their integration into the captions to improve data quality. For data variety, we propose a novel mixed training scheme that optimally leverages AltTexts alongside newly generated Visual-enriched Captions (VeC). We use CLIP as one example and adapt the method for CLIP training on large-scale web-crawled datasets, named VeCLIP. We conduct a comprehensive evaluation of VeCLIP across small, medium, and large scales of raw data. Our results show significant advantages in image-text alignment and overall model performance, underscoring the effectiveness of VeCLIP in improving CLIP training. For example, VeCLIP achieves a remarkable over 20% improvement in COCO and Flickr30k retrieval tasks under the 12M setting. For data efficiency, we also achieve a notable over 3% improvement while using only 14% of the data employed in the vanilla CLIP and 11% in ALIGN.
    摘要 网络爬取的数据集对 CLIP 等视觉-语言模型的预训练至关重要。然而，网络爬取得到的 AltText 往往含有噪声，并且可能与图像不相关，从而削弱关键的图像-文本对齐。现有利用大语言模型（LLM）重写描述文本的方法，在 CC3M、CC12M 等小规模精选数据集上已显示出潜力，但在大规模网络抓取的描述文本上，其效果受制于此类数据固有的噪声和随机性。在本研究中，我们从数据质量与数据多样性两个关键方面解决这一局限。与近期的 LLM 重写技术不同，我们强调利用视觉概念并将其融入描述文本以提升数据质量；在数据多样性方面，我们提出了一种新的混合训练方案，以最优方式同时利用 AltText 与新生成的视觉增强描述（VeC）。我们以 CLIP 为例，将该方法应用于大规模网络爬取数据上的 CLIP 训练，称为 VeCLIP。我们在小、中、大三个规模的原始数据上对 VeCLIP 进行了全面评估，结果显示其在图像-文本对齐和整体模型性能上具有显著优势：例如在 12M 设定下，VeCLIP 在 COCO 与 Flickr30k 检索任务上取得了超过 20% 的提升；在数据效率方面，仅使用原始 CLIP 所用数据的 14%（以及 ALIGN 所用数据的 11%），仍取得了超过 3% 的提升。
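VeCLIP's mixed training scheme samples, for each image, either the original AltText or a visual-enriched caption before computing the usual CLIP-style contrastive loss. The sketch below shows that sampling plus a standard symmetric InfoNCE loss over toy embeddings; the encoders and the 50/50 mixing ratio are placeholders, not the paper's exact recipe.

```python
import random
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(len(img_emb))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

def pick_caption(alt_text, vec_caption, p_vec=0.5):
    """Mixed training: use the visual-enriched caption with probability p_vec."""
    return vec_caption if random.random() < p_vec else alt_text

batch = [("img_0", "a photo", "a golden retriever playing on a sandy beach"),
         ("img_1", "IMG_2031.jpg", "a red vintage car parked near a brick wall")]
captions = [pick_caption(alt, vec) for _, alt, vec in batch]

# Placeholder embeddings standing in for the image and text encoders.
img_emb = torch.randn(len(batch), 512, requires_grad=True)
txt_emb = torch.randn(len(batch), 512, requires_grad=True)
loss = clip_loss(img_emb, txt_emb)
loss.backward()
print(float(loss))
```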

SurroCBM: Concept Bottleneck Surrogate Models for Generative Post-hoc Explanation

  • paper_url: http://arxiv.org/abs/2310.07698
  • repo_url: None
  • paper_authors: Bo Pan, Zhenke Liu, Yifei Zhang, Liang Zhao
  • For: 这 paper 的目的是解释黑盒模型的决策过程,以提高模型的可解释性。* Methods: 这 paper 使用了 Concept Activation Vectors (CAVs) 和 Concept Bottleneck Models (CBMs) 等新的技术,以提供基于概念的解释。但是,这些技术需要人工定义的概念,可能是成本高的。因此,这 paper 提出了一种新的框架,即 Concept Bottleneck Surrogate Models (SurroCBM),可以自动发现黑盒模型中的概念,并提供可解释的模型。* Results: 经过广泛的实验,这 paper 证明了 SurroCBM 的可行性和有效性,并且可以不断提高解释质量。这表明 SurroCBM 有可能成为黑盒模型可解释性的新途径。
    Abstract Explainable AI seeks to bring light to the decision-making processes of black-box models. Traditional saliency-based methods, while highlighting influential data segments, often lack semantic understanding. Recent advancements, such as Concept Activation Vectors (CAVs) and Concept Bottleneck Models (CBMs), offer concept-based explanations but necessitate human-defined concepts. However, human-annotated concepts are expensive to attain. This paper introduces the Concept Bottleneck Surrogate Models (SurroCBM), a novel framework that aims to explain the black-box models with automatically discovered concepts. SurroCBM identifies shared and unique concepts across various black-box models and employs an explainable surrogate model for post-hoc explanations. An effective training strategy using self-generated data is proposed to enhance explanation quality continuously. Through extensive experiments, we demonstrate the efficacy of SurroCBM in concept discovery and explanation, underscoring its potential in advancing the field of explainable AI.
    摘要 可解释人工智能旨在揭示黑盒模型的决策过程。传统的基于显著性的方法虽然能够突出有影响的数据片段，却往往缺乏语义层面的理解。近期的进展，如概念激活向量（CAV）和概念瓶颈模型（CBM），可以提供基于概念的解释，但需要人工定义的概念，而人工标注概念的成本很高。本文提出概念瓶颈代理模型（SurroCBM），一个旨在利用自动发现的概念来解释黑盒模型的新框架。SurroCBM 能够识别多个黑盒模型之间共享的及各自特有的概念，并采用可解释的代理模型进行事后（post-hoc）解释。我们还提出了一种利用自生成数据的有效训练策略，以持续提升解释质量。通过大量实验，我们证明了 SurroCBM 在概念发现与解释方面的有效性，凸显了其在推动可解释 AI 领域发展上的潜力。

Hypergraph Neural Networks through the Lens of Message Passing: A Common Perspective to Homophily and Architecture Design

  • paper_url: http://arxiv.org/abs/2310.07684
  • repo_url: None
  • paper_authors: Lev Telyatnikov, Maria Sofia Bucarelli, Guillermo Bernardez, Olga Zaghen, Simone Scardapane, Pietro Lio
  • for: 本文探讨了hypergraph学习领域中存在的一些问题,包括homophily在高阶网络中的作用、现有的hypergraph架构和方法的可能性,以及现有的数据集是否能够为高阶网络学习提供有意义的比较标准。
  • methods: 本文提出了一种基于消息传递方式的高阶网络内部homophily概念,并提出了一种新的消息传递框架MultiSet,以及一种基于新的超链抽样策略的新架构MultiSetMixer。
  • results: 经过广泛的实验,本文得出了许多有价值的发现,包括homophily在高阶网络中的作用、现有的hypergraph架构和方法的局限性,以及一些改进的方法和架构的可能性。
    Abstract Most of the current hypergraph learning methodologies and benchmarking datasets in the hypergraph realm are obtained by lifting procedures from their graph analogs, simultaneously leading to overshadowing hypergraph network foundations. This paper attempts to confront some pending questions in that regard: Can the concept of homophily play a crucial role in Hypergraph Neural Networks (HGNNs), similar to its significance in graph-based research? Is there room for improving current hypergraph architectures and methodologies? (e.g. by carefully addressing the specific characteristics of higher-order networks) Do existing datasets provide a meaningful benchmark for HGNNs? Diving into the details, this paper proposes a novel conceptualization of homophily in higher-order networks based on a message passing scheme; this approach harmonizes the analytical frameworks of datasets and architectures, offering a unified perspective for exploring and interpreting complex, higher-order network structures and dynamics. Further, we propose MultiSet, a novel message passing framework that redefines HGNNs by allowing hyperedge-dependent node representations, as well as introduce a novel architecture MultiSetMixer that leverages a new hyperedge sampling strategy. Finally, we provide an extensive set of experiments that contextualize our proposals and lead to valuable insights in hypergraph representation learning.
    摘要 目前，超图（hypergraph）学习领域的大多数方法和基准数据集都是通过从对应的图（graph）模型“提升”而来，这在一定程度上掩盖了超图网络自身的基础。本文试图回应该领域一些悬而未决的问题：同配性（homophily）这一概念能否像在图研究中那样，在超图神经网络（HGNN）中发挥关键作用？现有的超图架构和方法是否还有改进空间（例如，针对高阶网络的特殊性质进行细致设计）？现有数据集能否为 HGNN 提供有意义的基准？具体而言，本文基于消息传递机制提出了一种关于高阶网络中同配性的新概念化方式，统一了数据集与架构的分析框架，为探索和解读复杂高阶网络的结构与动态提供了统一视角。此外，我们提出了 MultiSet——一种允许节点表示依赖于超边的全新消息传递框架，并引入了采用新型超边采样策略的新架构 MultiSetMixer。最后，我们通过大量实验对上述提议进行了验证，得到了关于超图表示学习的诸多有价值的发现。

Controllable Data Generation Via Iterative Data-Property Mutual Mappings

  • paper_url: http://arxiv.org/abs/2310.07683
  • repo_url: None
  • paper_authors: Bo Pan, Muran Qin, Shiyu Wang, Yifei Zhang, Liang Zhao
  • for: 这个论文的目的是提高基于VAE的数据生成器的控制性和分离性。
  • methods: 这个论文提出了一种普适的框架,通过在数据和属性之间进行互相映射,来增强VAE基于的数据生成器的控制性和分离性。
  • results: 实验结果表明,该框架可以在短时间内准确地控制生成样本的属性,同时保持生成样本的有效性和分离性。
    Abstract Deep generative models have been widely used for their ability to generate realistic data samples in various areas, such as images, molecules, text, and speech. One major goal of data generation is controllability, namely to generate new data with desired properties. Despite growing interest in the area of controllable generation, significant challenges still remain, including 1) disentangling desired properties with unrelated latent variables, 2) out-of-distribution property control, and 3) objective optimization for out-of-distribution property control. To address these challenges, in this paper, we propose a general framework to enhance VAE-based data generators with property controllability and ensure disentanglement. Our proposed objective can be optimized on both data seen and unseen in the training set. We propose a training procedure to train the objective in a semi-supervised manner by iteratively conducting mutual mappings between the data and properties. The proposed framework is implemented on four VAE-based controllable generators to evaluate its performance on property error, disentanglement, generation quality, and training time. The results indicate that our proposed framework enables more precise control over the properties of generated samples in a short training time, ensuring the disentanglement and keeping the validity of the generated samples.

Explainable Image Similarity: Integrating Siamese Networks and Grad-CAM

  • paper_url: http://arxiv.org/abs/2310.07678
  • repo_url: None
  • paper_authors: Ioannis E. Livieris, Emmanuel Pintelas, Niki Kiriakidou, Panagiotis Pintelas
  • for: 提高图像相似性评估的可解释性,以便更好地理解图像之间的相似性原因。
  • methods: 提出了一种基于Siamese网络和Grad-CAM的图像相似性评估方法,并提供了可视化的实际和假设性解释。
  • results: 提出了一种新的图像相似性评估框架,可以提供可解释的图像相似性分数以及实际和假设性解释,并且有可能提高图像基于系统的解释性、可靠性和用户接受度。
    Abstract With the proliferation of image-based applications in various domains, the need for accurate and interpretable image similarity measures has become increasingly critical. Existing image similarity models often lack transparency, making it challenging to understand the reasons why two images are considered similar. In this paper, we propose the concept of explainable image similarity, where the goal is the development of an approach, which is capable of providing similarity scores along with visual factual and counterfactual explanations. Along this line, we present a new framework, which integrates Siamese Networks and Grad-CAM for providing explainable image similarity and discuss the potential benefits and challenges of adopting this approach. In addition, we provide a comprehensive discussion about factual and counterfactual explanations provided by the proposed framework for assisting decision making. The proposed approach has the potential to enhance the interpretability, trustworthiness and user acceptance of image-based systems in real-world image similarity applications. The implementation code can be found in https://github.com/ioannislivieris/Grad_CAM_Siamese.git.
    摘要 随着图像类应用的普及，对准确且可解释的图像相似性度量的需求日益增长。现有的图像相似性模型往往缺乏透明度，难以理解两张图像为何被判定为相似。为此，我们提出了一个将 Siamese 网络与 Grad-CAM 相结合、用于可解释图像相似性的新框架，并对该框架给出的事实（factual）与反事实（counterfactual）解释进行了全面讨论，以辅助决策。该方法有望提升基于图像的系统在真实图像相似性应用中的可解释性、可信度和用户接受度，实现代码见 https://github.com/ioannislivieris/Grad_CAM_Siamese.git。
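The framework above combines a Siamese embedding network with Grad-CAM to show which regions drive a similarity score. Below is a minimal sketch of that combination on a toy convolutional backbone: the similarity is back-propagated to the last feature map and a Grad-CAM heatmap is formed from the gradients. The tiny backbone and cosine similarity are stand-ins, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(                      # tiny stand-in for e.g. a ResNet trunk
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
)

def embed(feature_map):
    return F.normalize(feature_map.mean(dim=(2, 3)), dim=-1)   # GAP + L2 norm

img_a = torch.randn(1, 3, 64, 64)
img_b = torch.randn(1, 3, 64, 64)

fmap_a = backbone(img_a)
fmap_a.retain_grad()                           # keep gradients on the conv features
emb_a, emb_b = embed(fmap_a), embed(backbone(img_b))
similarity = (emb_a * emb_b).sum()             # cosine similarity of the pair
similarity.backward()

# Grad-CAM: channel weights are the spatial mean of the gradients; the map is a
# ReLU-ed weighted sum of the feature channels, explaining the similarity score.
weights = fmap_a.grad.mean(dim=(2, 3), keepdim=True)
cam = F.relu((weights * fmap_a).sum(dim=1)).detach()
cam = cam / (cam.max() + 1e-8)                 # normalised heatmap for image A
print(cam.shape)                               # torch.Size([1, 64, 64])
```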

Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples

  • paper_url: http://arxiv.org/abs/2310.07747
  • repo_url: None
  • paper_authors: Hao Sun, Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar
  • for: 本研究旨在提出一种负责任控制器,以便在决策系统中减少实际应用中的风险。
  • methods: 本研究将离线数据集用作决策语料库（Decision Corpus），并基于从中挑选的示例子集（称为 Corpus Subset）进行可问责的控制
  • results: 研究表明，AOC 可以在低数据量情况下有效运行，并可扩展到严格的离线模仿（offline imitation）设定；AOC 在模拟和真实医疗场景中均表现出高水平的性能，同时保持可问责性。
    Abstract Learning controllers with offline data in decision-making systems is an essential area of research due to its potential to reduce the risk of applications in real-world systems. However, in responsibility-sensitive settings such as healthcare, decision accountability is of paramount importance, yet has not been adequately addressed by the literature. This paper introduces the Accountable Offline Controller (AOC) that employs the offline dataset as the Decision Corpus and performs accountable control based on a tailored selection of examples, referred to as the Corpus Subset. AOC operates effectively in low-data scenarios, can be extended to the strictly offline imitation setting, and displays qualities of both conservation and adaptability. We assess AOC's performance in both simulated and real-world healthcare scenarios, emphasizing its capability to manage offline control tasks with high levels of performance while maintaining accountability.
    摘要 在决策系统中利用离线数据学习控制器是一个重要的研究方向，因为它有望降低在真实世界系统中应用的风险。然而，在医疗等责任敏感的场景中，决策的可问责性至关重要，却尚未在文献中得到充分讨论。本文提出可问责离线控制器（Accountable Offline Controller，AOC），它将离线数据集作为决策语料库（Decision Corpus），并基于经过挑选的示例子集（称为 Corpus Subset）进行可问责的控制。AOC 在低数据量场景下依然有效，可扩展到严格的离线模仿设定，并兼具保守性与适应性。我们在模拟和真实医疗场景中评估了 AOC 的性能，结果表明它能够在保持可问责性的同时，以较高水平完成离线控制任务。
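AOC grounds each decision in a tailored subset of the offline dataset. A minimal way to realise that idea is to retrieve the most similar offline states and let their recorded actions vote; the distance metric, the subset size, and the plain majority vote below are illustrative assumptions, not the paper's exact selection rule.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Offline decision corpus: observed states and the actions taken in them.
corpus_states = rng.normal(size=(500, 8))
corpus_actions = rng.integers(0, 3, size=500)

def act(query_state, k=15):
    """Accountable control: return an action plus the examples that justify it."""
    dists = np.linalg.norm(corpus_states - query_state, axis=1)
    subset = np.argsort(dists)[:k]                     # the Corpus Subset for this query
    vote = Counter(corpus_actions[subset]).most_common(1)[0][0]
    return vote, subset                                # the subset acts as the explanation

action, evidence_idx = act(rng.normal(size=8))
print("chosen action:", action)
print("supported by corpus examples:", evidence_idx[:5], "...")
```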

HaarNet: Large-scale Linear-Morphological Hybrid Network for RGB-D Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2310.07669
  • repo_url: None
  • paper_authors: Rick Groenendijk, Leo Dorst, Theo Gevers
  • for: 本研究旨在开探使用多样性模式的约束来提高RGB-D数据的处理和分析效能。
  • methods: 本文提出了一种混合线性- morphological 网络,称为 HaarNet,使用了 morphological 元素和常见的线性模块。
  • results: 实验表明, HaarNet 与当前最佳 CNN 相当竞争,表明 morphological 网络是 geometry-based 学习任务的可能的研究方向。
    Abstract Signals from different modalities each have their own combination algebra which affects their sampling processing. RGB is mostly linear; depth is a geometric signal following the operations of mathematical morphology. If a network obtaining RGB-D input has both kinds of operators available in its layers, it should be able to give effective output with fewer parameters. In this paper, morphological elements in conjunction with more familiar linear modules are used to construct a mixed linear-morphological network called HaarNet. This is the first large-scale linear-morphological hybrid, evaluated on a set of sizeable real-world datasets. In the network, morphological Haar sampling is applied to both feature channels in several layers, which splits extreme values and high-frequency information such that both can be processed to improve both modalities. Moreover, morphologically parameterised ReLU is used, and morphologically-sound up-sampling is applied to obtain a full-resolution output. Experiments show that HaarNet is competitive with a state-of-the-art CNN, implying that morphological networks are a promising research direction for geometry-based learning tasks.
    摘要 来自不同模态的信号各自遵循不同的组合代数，这会影响其采样与处理方式：RGB 基本上是线性的，而深度则是一种几何信号，遵循数学形态学（mathematical morphology）的运算。如果一个接收 RGB-D 输入的网络在其层中同时具备这两类算子，它就应当能够以更少的参数给出有效的输出。本文将形态学模块与更常见的线性模块相结合，构建了一个线性-形态学混合网络，称为 HaarNet。这是第一个大规模的线性-形态学混合网络，并在多个规模可观的真实数据集上进行了评估。网络在若干层中对特征通道施加形态学 Haar 采样，将极值与高频信息分离开来，使二者都能被进一步处理，从而同时改进两种模态。此外，网络使用了形态学参数化的 ReLU，并采用符合形态学原理的上采样以获得全分辨率输出。实验表明，HaarNet 与最先进的 CNN 相比具有竞争力，这说明形态学网络是面向几何学习任务的一个有前景的研究方向。
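The abstract says morphological Haar sampling splits extreme values and high-frequency information into separately processable channels. The sketch below is one plausible, heavily simplified reading of that operation: a max-pool branch (dilation-like), a min-pool branch (erosion-like), and their difference as a per-window morphological gradient. It is an interpretation for illustration, not HaarNet's actual layer.

```python
import torch
import torch.nn.functional as F

def morphological_haar_sample(x):
    """Downsample by 2, routing each window's extremes into separate channels.

    Max behaves like a dilation (bright extremes), min like an erosion (dark
    extremes), and their difference carries the window's high-frequency content.
    """
    maxima = F.max_pool2d(x, kernel_size=2)
    minima = -F.max_pool2d(-x, kernel_size=2)            # min-pooling via negation
    detail = maxima - minima                              # morphological gradient per window
    return torch.cat([maxima, minima, detail], dim=1)     # channels triple, resolution halves

x = torch.randn(1, 4, 32, 32)     # e.g. RGB-D features: 4 channels
y = morphological_haar_sample(x)
print(y.shape)                    # torch.Size([1, 12, 16, 16])
```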

GRaMuFeN: Graph-based Multi-modal Fake News Detection in Social Media

  • paper_url: http://arxiv.org/abs/2310.07668
  • repo_url: None
  • paper_authors: Makan Kananian, Fatima Badiei, S. AmirAli Gh. Ghahramani
  • for: 检测假信息在社交媒体平台上的扩散,提高公众意见形成的真实性。
  • methods: 提出将文本编码器与图像编码器相结合：文本编码器将每段文本视为一个图，并使用图卷积网络（GCN）进行编码；图像编码器采用预训练的 ResNet-152 卷积神经网络（CNN）；同时引入对比相似度损失函数，以提高虚假信息检测的精度。
  • results: 对两个公共可用的社交媒体新闻数据集进行了广泛的评估,相比现有的状态之artifact,提高了10%的微 F1-Score,表明GCN和CNN模型的组合可以有效地检测社交媒体上的假信息。
    Abstract The proliferation of social media platforms such as Twitter, Instagram, and Weibo has significantly enhanced the dissemination of false information. This phenomenon grants both individuals and governmental entities the ability to shape public opinions, highlighting the need for deploying effective detection methods. In this paper, we propose GraMuFeN, a model designed to detect fake content by analyzing both the textual and image content of news. GraMuFeN comprises two primary components: a text encoder and an image encoder. For textual analysis, GraMuFeN treats each text as a graph and employs a Graph Convolutional Neural Network (GCN) as the text encoder. Additionally, the pre-trained ResNet-152, as a Convolutional Neural Network (CNN), has been utilized as the image encoder. By integrating the outputs from these two encoders and implementing a contrastive similarity loss function, GraMuFeN achieves remarkable results. Extensive evaluations conducted on two publicly available benchmark datasets for social media news indicate a 10 % increase in micro F1-Score, signifying improvement over existing state-of-the-art models. These findings underscore the effectiveness of combining GCN and CNN models for detecting fake news in multi-modal data, all while minimizing the additional computational burden imposed by model parameters.
    摘要 “社交媒体平台的普及,如Twitter、Instagram和微博,已经提高了假信息的传播。这种现象让个人和政府机构都可以影响公众意见,高亮了需要部署有效的检测方法。本文提出了GraMuFeN模型,用于检测假新闻。GraMuFeN包括两个主要组成部分:文本编码器和图像编码器。对文本分析,GraMuFeN将每个文本视为一个图,并使用图 convolutional neural network (GCN) 作为文本编码器。此外,预训练的 ResNet-152 也被用作图像编码器。通过将这两个编码器的输出集成并实现对比相似性损失函数,GraMuFeN实现了显著的效果。对社交媒体新闻两个公共可用的 benchmark 数据集进行了广泛的评估,GraMuFeN 在 micro F1-Score 方面提高了10%,表明与现有状态码模型相比有显著的提高。这些发现表明了将 GCN 和 CNN 模型结合使用可以在多模式数据中检测假新闻,同时减少模型参数所增加的计算负担。”

Global Minima, Recoverability Thresholds, and Higher-Order Structure in GNNS

  • paper_url: http://arxiv.org/abs/2310.07667
  • repo_url: None
  • paper_authors: Drake Brown, Trevor Garrity, Kaden Parker, Jason Oliphant, Stone Carson, Cole Hanson, Zachary Boyd
  • for: 这个论文探讨了图 neural network(GNN)架构的性能从Random Graph Theory的角度。
  • methods: 作者使用了理论和数值方法来分析GNN的性能,包括对一层和二层GCNs的nodewise准确率的理论分析,以及对四种不同的GNN架构(GCN、GAT、SAGE和Graph Transformer)在不同假设下的数值分析。
  • results: 作者发现了一些关键的结论,包括:重 tailed degree distribution可以提高GNN性能,GNN可以在强烈不同结构上工作,SAGE和Graph Transformer可以在无isy edge数据上工作,但是没有架构能够处理足够噪音特征数据。此外,作者发现了一些特定的高阶结构在 sintetic data中和实际数据中的杂合效果通常是负面的。
    Abstract We analyze the performance of graph neural network (GNN) architectures from the perspective of random graph theory. Our approach promises to complement existing lenses on GNN analysis, such as combinatorial expressive power and worst-case adversarial analysis, by connecting the performance of GNNs to typical-case properties of the training data. First, we theoretically characterize the nodewise accuracy of one- and two-layer GCNs relative to the contextual stochastic block model (cSBM) and related models. We additionally prove that GCNs cannot beat linear models under certain circumstances. Second, we numerically map the recoverability thresholds, in terms of accuracy, of four diverse GNN architectures (GCN, GAT, SAGE, and Graph Transformer) under a variety of assumptions about the data. Sample results of this second analysis include: heavy-tailed degree distributions enhance GNN performance, GNNs can work well on strongly heterophilous graphs, and SAGE and Graph Transformer can perform well on arbitrarily noisy edge data, but no architecture handled sufficiently noisy feature data well. Finally, we show how both specific higher-order structures in synthetic data and the mix of empirical structures in real data have dramatic effects (usually negative) on GNN performance.
    摘要 我们从随机图理论的视角分析图神经网络（GNN）架构的性能。该方法把 GNN 的性能与训练数据的典型性质联系起来，有望补充现有的 GNN 分析视角，例如组合表达能力和最坏情况对抗分析。首先，我们在上下文随机块模型（cSBM）及相关模型下，从理论上刻画了一层与两层 GCN 的节点级准确率，并证明在某些条件下 GCN 无法超越线性模型。其次，我们在多种数据假设下，用数值方法绘制了四种不同 GNN 架构（GCN、GAT、SAGE 和 Graph Transformer）在准确率意义下的可恢复阈值。该部分分析的部分结论包括：重尾度分布会提升 GNN 性能；GNN 在强异配图上也能表现良好；SAGE 和 Graph Transformer 能够处理任意噪声的边数据，但没有架构能够很好地处理噪声足够大的特征数据。最后，我们展示了合成数据中特定的高阶结构以及真实数据中经验结构的混合，都会对 GNN 性能产生显著（通常为负面）的影响。

Deep Backtracking Counterfactuals for Causally Compliant Explanations

  • paper_url: http://arxiv.org/abs/2310.07665
  • repo_url: None
  • paper_authors: Klaus-Rudolf Kladny, Julius von Kügelgen, Bernhard Schölkopf, Michael Muehlebach
  • for: 本文研究了Counterfactuals的一种新方法,即backtracking方法,可以在结构 causal models 中计算出 conditional counterfactuals。
  • methods: 本文提出了一种实用的方法,通过在结构 latent space 中做 tractable constrained optimization 问题,来生成 backtracking counterfactuals。
  • results: 实验表明, compared to existing methods of counterfactual explanations, 本文的方法更加 versatile, modular, and causally compliant。
    Abstract Counterfactuals can offer valuable insights by answering what would have been observed under altered circumstances, conditional on a factual observation. Whereas the classical interventional interpretation of counterfactuals has been studied extensively, backtracking constitutes a less studied alternative the backtracking principle has emerged as an alternative philosophy where all causal laws are kept intact. In the present work, we introduce a practical method for computing backtracking counterfactuals in structural causal models that consist of deep generative components. To this end, we impose conditions on the structural assignments that enable the generation of counterfactuals by solving a tractable constrained optimization problem in the structured latent space of a causal model. Our formulation also facilitates a comparison with methods in the field of counterfactual explanations. Compared to these, our method represents a versatile, modular and causally compliant alternative. We demonstrate these properties experimentally on a modified version of MNIST and CelebA.
    摘要 反事实（counterfactual）可以在给定事实观测的条件下，回答在被改变的情形中会观察到什么，从而提供有价值的洞见。经典的干预式反事实解释已被广泛研究，而回溯式（backtracking）反事实则是一种研究较少的替代方案：回溯原则作为一种替代哲学出现，其特点是保持所有因果规律不变。在本工作中，我们提出了一种实用方法，用于在包含深度生成组件的结构因果模型中计算回溯式反事实。为此，我们对结构赋值施加了若干条件，使得反事实可以通过在因果模型的结构化隐空间中求解一个可解的约束优化问题来生成。我们的形式化也便于与反事实解释领域的方法进行比较：相比之下，我们的方法是一种通用、模块化且符合因果要求的替代方案。我们在修改版的 MNIST 和 CelebA 上通过实验验证了上述性质。
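The abstract describes generating backtracking counterfactuals by solving a constrained optimisation problem in the structured latent space of a model with deep generative components. The sketch below shows the generic pattern with a penalty formulation: search for a new latent code that stays close to the factual one while the decoder's output satisfies a target condition. The toy decoder, the quadratic penalty, and the target are illustrative assumptions, not the paper's exact objective.

```python
import torch

torch.manual_seed(0)

# Toy "deep generative component": a fixed decoder from latent u to observable x.
decoder = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 2))
for p in decoder.parameters():
    p.requires_grad_(False)

u_factual = torch.randn(4)                      # latent code behind the factual observation
target = torch.tensor([1.0, -1.0])              # desired counterfactual outcome

u = u_factual.clone().requires_grad_(True)
opt = torch.optim.Adam([u], lr=0.05)
for step in range(300):
    x = decoder(u)
    constraint = ((x - target) ** 2).sum()      # the outcome we condition on
    closeness = ((u - u_factual) ** 2).sum()    # backtracking: stay near the factual latent
    loss = closeness + 10.0 * constraint        # penalty form of the constrained problem
    opt.zero_grad(); loss.backward(); opt.step()

print("counterfactual latent:", u.detach())
print("decoded outcome:", decoder(u).detach())
```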

Mini-DALLE3: Interactive Text to Image by Prompting Large Language Models

  • paper_url: http://arxiv.org/abs/2310.07653
  • repo_url: https://github.com/Zeqiang-Lai/MiniDALLE-3
  • paper_authors: Zeqiang Lai, Xizhou Zhu, Jifeng Dai, Yu Qiao, Wenhai Wang
  • for: 这个研究旨在探讨如何使用自然语言描述来与高品质的文字至图模型(T2I)进行有效的沟通,以及如何将这种技术应用于实际的人机交互中。
  • methods: 本研究使用了调整提示的技术和现有的T2I模型来解决问题,并评估了这种方法在不同的语言模型(LLM)和T2I模型下的效果。
  • results: 研究发现,这种方法可以让LLMs拥有更好的图像质量和更强的文字与图像相互对应,并且可以让任何现有的LLMs和T2I模型都具备这种能力,而且不需要进行任何训练。
    Abstract The revolution of artificial intelligence content generation has been rapidly accelerated with the booming text-to-image (T2I) diffusion models. Within just two years of development, it was unprecedentedly of high-quality, diversity, and creativity that the state-of-the-art models could generate. However, a prevalent limitation persists in the effective communication with these popular T2I models, such as Stable Diffusion, using natural language descriptions. This typically makes an engaging image hard to obtain without expertise in prompt engineering with complex word compositions, magic tags, and annotations. Inspired by the recently released DALLE3 - a T2I model directly built-in ChatGPT that talks human language, we revisit the existing T2I systems endeavoring to align human intent and introduce a new task - interactive text to image (iT2I), where people can interact with LLM for interleaved high-quality image generation/edit/refinement and question answering with stronger images and text correspondences using natural language. In addressing the iT2I problem, we present a simple approach that augments LLMs for iT2I with prompting techniques and off-the-shelf T2I models. We evaluate our approach for iT2I in a variety of common-used scenarios under different LLMs, e.g., ChatGPT, LLAMA, Baichuan, and InternLM. We demonstrate that our approach could be a convenient and low-cost way to introduce the iT2I ability for any existing LLMs and any text-to-image models without any training while bringing little degradation on LLMs' inherent capabilities in, e.g., question answering and code generation. We hope this work could draw broader attention and provide inspiration for boosting user experience in human-machine interactions alongside the image quality of the next-generation T2I systems.
    摘要 随着文本到图像（T2I）扩散模型的兴起，人工智能内容生成的变革迅速加速。仅仅两年的发展，最先进的模型在生成质量、多样性和创造力方面就达到了前所未有的高度。然而，使用自然语言描述与 Stable Diffusion 等流行 T2I 模型进行有效沟通仍然存在普遍的局限：如果不具备提示工程方面的专业知识（复杂的词语组合、魔法标签和注释），往往很难获得令人满意的图像。受最近发布的 DALLE3（一个直接内置于 ChatGPT、能以人类语言交流的 T2I 模型）的启发，我们重新审视现有的 T2I 系统如何与人类意图对齐，并提出一个新任务——交互式文本到图像（iT2I）：人们可以使用自然语言与 LLM 交互，交替完成高质量图像的生成、编辑与细化以及问答，并获得更强的图文对应关系。针对 iT2I 问题，我们提出了一种简单的方法，通过提示技术和现成的 T2I 模型来增强 LLM 的 iT2I 能力。我们在多种常见场景下、针对不同的 LLM（如 ChatGPT、LLAMA、Baichuan 和 InternLM）对该方法进行了评估。结果表明，该方法无需任何训练即可以便捷、低成本的方式为任意现有 LLM 与任意文本到图像模型引入 iT2I 能力，同时对 LLM 原有的问答、代码生成等能力几乎没有影响。我们希望这项工作能够引起更广泛的关注，并为提升下一代 T2I 系统的图像质量与人机交互体验提供启发。

Rethinking the BERT-like Pretraining for DNA Sequences

  • paper_url: http://arxiv.org/abs/2310.07644
  • repo_url: None
  • paper_authors: Chaoqi Liang, Weiqiang Bai, Lifeng Qiao, Yuchen Ren, Jianle Sun, Peng Ye, Hongliang Yan, Xinzhu Ma, Wangmeng Zuo, Wanli Ouyang
  • for: 这份研究旨在探讨如何将大规模预训应用到生物科学领域,特别是基于DNA序列的预训方法。
  • methods: 研究人员首先执行了一系列的探索性实验,获得了许多有益的观察,包括:在下游任务 fine-tuning 阶段,使用 K-mer 重叠tokenization 而不是 K-mer 非重叠tokenization,两者都可以在下游任务中获得显著的性能改进。
  • results: 研究人员发现,使用 K-mer 重叠tokenization 在预训过程中可以迅速生成明确的 K-mer 嵌入,并降低损失到非常低水平,但使用 K-mer 非重叠tokenization 则会导致嵌入变得更模糊,并持续降低损失。此外,使用重叠tokenization 会导致预训模型的自我注意力在中间层中过度集中在某些字串上,显示这些层未能得到适当的优化。总之,重叠tokenization 可以帮助下游任务的 fine-tuning,但它会导致预训过程中的快速收敛。为了解开预训的潜力,研究人员提出了一种新的方法called RandomMask,它通过不断扩展隐藏界限来增加BERT-like预训的任务难度,并成功取得了26个数据集中的28个数据集上的7个下游任务的Top-tier表现。
    Abstract With the success of large-scale pretraining in NLP, there is an increasing trend of applying it to the domain of life sciences. In particular, pretraining methods based on DNA sequences have garnered growing attention due to their potential to capture generic information about genes. However, existing pretraining methods for DNA sequences largely rely on direct adoptions of BERT pretraining from NLP, lacking a comprehensive understanding and a specifically tailored approach. To address this research gap, we first conducted a series of exploratory experiments and gained several insightful observations: 1) In the fine-tuning phase of downstream tasks, when using K-mer overlapping tokenization instead of K-mer non-overlapping tokenization, both overlapping and non-overlapping pretraining weights show consistent performance improvement.2) During the pre-training process, using K-mer overlapping tokenization quickly produces clear K-mer embeddings and reduces the loss to a very low level, while using K-mer non-overlapping tokenization results in less distinct embeddings and continuously decreases the loss. 3) Using overlapping tokenization causes the self-attention in the intermediate layers of pre-trained models to tend to overly focus on certain tokens, reflecting that these layers are not adequately optimized. In summary, overlapping tokenization can benefit the fine-tuning of downstream tasks but leads to inadequate pretraining with fast convergence. To unleash the pretraining potential, we introduce a novel approach called RandomMask, which gradually increases the task difficulty of BERT-like pretraining by continuously expanding its mask boundary, forcing the model to learn more knowledge. RandomMask is simple but effective, achieving top-tier performance across 26 datasets of 28 datasets spanning 7 downstream tasks.
    摘要 随着人工智能的应用在生命科学领域的扩大,对于基因序列的预训练方法也在吸引越来越多的关注。特别是基因序列预训练方法,因为它们可能会捕捉到生物体中 generic 信息。然而,现有的基因序列预训练方法大多基于 NLP 中的 BERT 预训练方法,lacking a comprehensive understanding and a specifically tailored approach。为了填补这个研究漏洞,我们首先进行了一系列的探索性实验,获得了一些有价值的观察:1)在下游任务 fine-tuning 阶段,使用 K-mer overlap 的tokenization而不是 K-mer non-overlapping 的tokenization, both overlapping 和 non-overlapping 预训练权重都显示了一致的性能提升。2)在预训练过程中,使用 K-mer overlap 的tokenization快速生成了明确的 K-mer 嵌入,并将损失降到了非常低的水平,而使用 K-mer non-overlapping 的tokenization则导致了较为模糊的嵌入,并持续降低损失。3)使用 overlap 的tokenization会让预训练模型中的自注意力倾向于过度关注某些符号,表明这些层并未充分优化。总之,overlapping 的tokenization可以优化下游任务的 fine-tuning,但是会导致预训练快速收敛。为了解锁预训练的潜力,我们提出了一种新的方法RandomMask,它通过不断扩展BERT-like 预训练模型的mask边界来增加任务难度,让模型学习更多的知识。RandomMask 简单 yet effective,在 28 个数据集上的 26 个任务上实现了顶尖表现。
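RandomMask is described as gradually increasing the difficulty of BERT-like pretraining by continuously expanding the mask boundary. The sketch below illustrates one plausible reading: the maximum length of each masked span grows with the training step, so later steps hide longer stretches of the DNA sequence. The schedule and span-sampling details are assumptions, not the paper's exact algorithm.

```python
import random

random.seed(0)

def random_mask(tokens, step, total_steps, mask_ratio=0.15, max_final_span=8):
    """Mask ~mask_ratio of tokens using spans whose maximum length grows with `step`."""
    max_span = 1 + int((max_final_span - 1) * step / total_steps)   # expanding boundary
    n_to_mask = max(1, int(len(tokens) * mask_ratio))
    masked = list(tokens)
    covered = 0
    while covered < n_to_mask:
        span = random.randint(1, max_span)
        start = random.randrange(0, len(tokens) - span + 1)
        for i in range(start, start + span):
            if masked[i] != "[MASK]":
                masked[i] = "[MASK]"
                covered += 1
    return masked

dna = list("ACGTACGTTAGCCGATACGTTAGC")          # toy nucleotide-level tokens
print("early:", "".join("_" if t == "[MASK]" else t for t in random_mask(dna, 0, 100)))
print("late: ", "".join("_" if t == "[MASK]" else t for t in random_mask(dna, 100, 100)))
```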

OpsEval: A Comprehensive Task-Oriented AIOps Benchmark for Large Language Models

  • paper_url: http://arxiv.org/abs/2310.07637
  • repo_url: None
  • paper_authors: Yuhe Liu, Changhua Pei, Longlong Xu, Bohan Chen, Mingze Sun, Zhirui Zhang, Yongqian Sun, Shenglin Zhang, Kun Wang, Haiming Zhang, Jianhui Li, Gaogang Xie, Xidao Wen, Xiaohui Nie, Dan Pei
  • for: To evaluate the performance of large language models (LLMs) on Artificial Intelligence for IT Operations (AIOps) tasks.
  • methods: The paper presents OpsEval, a comprehensive task-oriented AIOps benchmark comprising 7,200 questions in multiple-choice and question-answer formats, assessing LLMs' proficiency in three crucial scenarios (Wired Network Operation, 5G Communication Operation, and Database Operation) at various ability levels.
  • results: GPT4-score is more consistent with experts than the widely used BLEU and ROUGE metrics, and LLM techniques such as zero-shot, chain-of-thought, and few-shot in-context learning affect AIOps performance; quantitative and qualitative results demonstrate the effectiveness of OpsEval for evaluating LLMs on AIOps tasks.
    Abstract Large language models (LLMs) have exhibited remarkable capabilities in NLP-related tasks such as translation, summarizing, and generation. The application of LLMs in specific areas, notably AIOps (Artificial Intelligence for IT Operations), holds great potential due to their advanced abilities in information summarizing, report analyzing, and ability of API calling. Nevertheless, the performance of current LLMs in AIOps tasks is yet to be determined. Furthermore, a comprehensive benchmark is required to steer the optimization of LLMs tailored for AIOps. Compared with existing benchmarks that focus on evaluating specific fields like network configuration, in this paper, we present \textbf{OpsEval}, a comprehensive task-oriented AIOps benchmark designed for LLMs. For the first time, OpsEval assesses LLMs' proficiency in three crucial scenarios (Wired Network Operation, 5G Communication Operation, and Database Operation) at various ability levels (knowledge recall, analytical thinking, and practical application). The benchmark includes 7,200 questions in both multiple-choice and question-answer (QA) formats, available in English and Chinese. With quantitative and qualitative results, we show how various LLM tricks can affect the performance of AIOps, including zero-shot, chain-of-thought, and few-shot in-context learning. We find that GPT4-score is more consistent with experts than widely used Bleu and Rouge, which can be used to replace automatic metrics for large-scale qualitative evaluations.
    摘要 大型语言模型(LLM)在自然语言处理(NLP)相关任务中表现出了非常出色的能力,包括翻译、摘要和生成等。在特定领域中应用LLM的潜在性非常大,尤其是在人工智能操作(AIOps)中,因为它们在资讯摘要、报告分析和API调用等方面有出色的能力。然而,目前LLM在AIOps任务中的表现仍未被评估。此外,为了适当地优化LLM,需要一个全面的标准参考。相比于现有的标准,这篇文章提出了一个名为“OpsEval”的全面的AIOps标准参考,用于评估LLM的能力。OpsEval包括三个重要的操作场景(有线网络操作、5G通信操作和数据库操作),并且在不同的能力水平(知识回传、分析思维和实践应用)进行评估。标准包括7,200个问题,分为多选和问题回答(QA)格式,英文和中文两种语言。我们通过量化和质量的结果显示出不同的LLM技巧可以如何影响AIOps的表现,包括零式、串行和少数内容学习。我们发现GPT4-score与专家的表现更一致,而Bleu和Rouge的自动评分可以用来取代大规模的质量评分。

Dual Quaternion Rotational and Translational Equivariance in 3D Rigid Motion Modelling

  • paper_url: http://arxiv.org/abs/2310.07623
  • repo_url: None
  • paper_authors: Guilherme Vieira, Eleonora Grassucci, Marcos Eduardo Valle, Danilo Comminiello
  • for: 本 paper 的目的是提出一种基于 dual quaternion 表示的3D空间中对象的刚性运动模型,以便更好地处理3D学习任务。
  • methods: 本 paper 使用 dual quaternion 表示法来模型3D空间中对象的刚性运动,并且通过对每个点进行同时旋转和平移的表示,保留了点集的相关性。
  • results: 实验证明,使用本 paper 提出的 dual quaternion 表示法可以在人姿预测任务中超越前一些方法,表明该方法在3D学习任务中的效果。
    Abstract Objects' rigid motions in 3D space are described by rotations and translations of a highly-correlated set of points, each with associated $x,y,z$ coordinates that real-valued networks consider as separate entities, losing information. Previous works exploit quaternion algebra and their ability to model rotations in 3D space. However, these algebras do not properly encode translations, leading to sub-optimal performance in 3D learning tasks. To overcome these limitations, we employ a dual quaternion representation of rigid motions in the 3D space that jointly describes rotations and translations of point sets, processing each of the points as a single entity. Our approach is translation and rotation equivariant, so it does not suffer from shifts in the data and better learns object trajectories, as we validate in the experimental evaluations. Models endowed with this formulation outperform previous approaches in a human pose forecasting application, attesting to the effectiveness of the proposed dual quaternion formulation for rigid motions in 3D space.
    摘要 物体在三维空间中的刚体运动由高度相关的点集的旋转和平移来描述，每个点带有相应的 $x,y,z$ 坐标，而实值网络把这些坐标当作彼此独立的量来处理，从而丢失信息。已有工作利用四元数代数及其对三维旋转的建模能力，但这类代数不能恰当地编码平移，导致三维学习任务的性能欠佳。为克服这些局限，我们采用对偶四元数（dual quaternion）来表示三维空间中的刚体运动：它能够联合描述点集的旋转和平移，并把每个点作为单一实体来处理。我们的方法对平移和旋转具有等变性，因此不会受数据平移的影响，并能更好地学习物体的运动轨迹，这一点在实验评估中得到了验证。采用这种表述的模型在人体姿态预测应用中优于以往方法，证明了所提出的对偶四元数表述在三维刚体运动建模中的有效性。
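The dual quaternion formulation above jointly encodes a rotation and a translation and applies them to points as single entities. The sketch below is a small numerical illustration of that representation (construction, composition, and application to a 3D point); it shows the standard dual quaternion algebra rather than the network layers proposed in the paper.

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = a; w2, x2, y2, z2 = b
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])

def qconj(q):
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def dq_from_rt(q_r, t):
    """Dual quaternion (real, dual) encoding rotation q_r followed by translation t."""
    q_d = 0.5 * qmul(np.concatenate(([0.0], t)), q_r)
    return q_r, q_d

def dq_mul(a, b):
    """Compose rigid motions: (a_r + eps*a_d)(b_r + eps*b_d)."""
    return qmul(a[0], b[0]), qmul(a[0], b[1]) + qmul(a[1], b[0])

def dq_apply(dq, p):
    q_r, q_d = dq
    t = 2.0 * qmul(q_d, qconj(q_r))[1:]                 # recover the translation
    rotated = qmul(qmul(q_r, np.concatenate(([0.0], p))), qconj(q_r))[1:]
    return rotated + t

# 90-degree rotation about z, then a unit translation along x.
angle = np.pi / 2
q_r = np.array([np.cos(angle / 2), 0.0, 0.0, np.sin(angle / 2)])
motion = dq_from_rt(q_r, np.array([1.0, 0.0, 0.0]))
print(dq_apply(motion, np.array([1.0, 0.0, 0.0])))      # ~[1, 1, 0]
```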

Reinforcement Learning-based Knowledge Graph Reasoning for Explainable Fact-checking

  • paper_url: http://arxiv.org/abs/2310.07613
  • repo_url: None
  • paper_authors: Gustav Nikopensius, Mohit Mayank, Orchid Chetia Phukan, Rajesh Sharma
  • for: To improve the trustworthiness of automated fact-checking systems by incorporating reinforcement learning (RL) and knowledge graph (KG) reasoning for explainable fact-checking.
  • methods: RL is used to train an agent that computes paths through the KG which prove or disprove a factual claim, and a voting mechanism over the paths produced by the agent reaches the final verdict; the KG serves as the knowledge representation underlying the explanations.
  • results: Extensive experiments on two datasets (FB15K-277 and NELL-995) show that the approach effectively produces human-readable explanations, in the form of paths and classifications for fact claims, and can increase trustworthiness by supporting a human-in-the-loop workflow.
    Abstract Fact-checking is a crucial task as it ensures the prevention of misinformation. However, manual fact-checking cannot keep up with the rate at which false information is generated and disseminated online. Automated fact-checking by machines is significantly quicker than by humans. But for better trust and transparency of these automated systems, explainability in the fact-checking process is necessary. Fact-checking often entails contrasting a factual assertion with a body of knowledge for such explanations. An effective way of representing knowledge is the Knowledge Graph (KG). There have been sufficient works proposed related to fact-checking with the usage of KG but not much focus is given to the application of reinforcement learning (RL) in such cases. To mitigate this gap, we propose an RL-based KG reasoning approach for explainable fact-checking. Extensive experiments on FB15K-277 and NELL-995 datasets reveal that reasoning over a KG is an effective way of producing human-readable explanations in the form of paths and classifications for fact claims. The RL reasoning agent computes a path that either proves or disproves a factual claim, but does not provide a verdict itself. A verdict is reached by a voting mechanism that utilizes paths produced by the agent. These paths can be presented to human readers so that they themselves can decide whether or not the provided evidence is convincing or not. This work will encourage works in this direction for incorporating RL for explainable fact-checking as it increases trustworthiness by providing a human-in-the-loop approach.
    摘要 事实核查是一项非常重要的任务,因为它能够防止虚假信息的扩散。然而,人工核查的速度无法跟上虚假信息在网上生成和传播的速度,而机器自动核查要比人工快得多。但为了提高自动化系统的信任度和透明度,核查过程需要具备可解释性。事实核查通常需要将一条事实声明与一个知识体系进行对比,以提供相应的解释;知识图(KG)是一种有效的知识表示方式。已有不少基于 KG 的事实核查工作,但将强化学习(RL)应用于此类任务的研究仍然较少。为填补这一空白,我们提出了一种基于 RL 的 KG 推理方法,用于可解释的事实核查。在 FB15K-277 和 NELL-995 两个数据集上的大量实验表明,基于 KG 的推理能够为事实声明生成人类可读的解释,包括推理路径和分类结果。RL 推理代理计算一条证明或驳斥事实声明的路径,但不直接给出判定;最终判定由一个利用代理生成路径的投票机制得出。这些路径可以呈现给人类读者,由他们自行判断所提供的证据是否令人信服。这项工作通过引入人在回路的方式提升了可信度,将鼓励更多在可解释事实核查中应用 RL 的研究。

PHYDI: Initializing Parameterized Hypercomplex Neural Networks as Identity Functions

  • paper_url: http://arxiv.org/abs/2310.07612
  • repo_url: https://github.com/ispamm/phydi
  • paper_authors: Matteo Mancanelli, Eleonora Grassucci, Aurelio Uncini, Danilo Comminiello
  • for: 这篇论文主要用于研究 parameterized hypercomplex neural networks(PHNNs)的收敛性和提高其性能。
  • methods: 本文提出了 parameterized hypercomplex identity initialization(PHYDI)方法,用于控制PHNNs的收敛性,并在不同的缩放量下实现更好的性能。
  • results: 研究发现,PHYDI方法可以在不同的benchmark中提高PHNNs的性能,并且可以在减少迭代次数的情况下达到相同的性能水平。
    Abstract Neural models based on hypercomplex algebra systems are growing and proliferating for a plethora of applications, ranging from computer vision to natural language processing. Hand in hand with their adoption, parameterized hypercomplex neural networks (PHNNs) are growing in size and no techniques have been adopted so far to control their convergence at a large scale. In this paper, we study PHNNs convergence and propose parameterized hypercomplex identity initialization (PHYDI), a method to improve their convergence at different scales, leading to more robust performance when the number of layers scales up, while also reaching the same performance with fewer iterations. We show the effectiveness of this approach in different benchmarks and with common PHNNs with ResNets- and Transformer-based architecture. The code is available at https://github.com/ispamm/PHYDI.
    摘要 基于超复数代数系统的神经模型正在不断增长,并广泛应用于从计算机视觉到自然语言处理的各类任务。随着其应用的推广,参数化超复数神经网络(PHNNs)的规模不断增大,但目前尚无针对其大规模收敛性的控制技术。本文研究 PHNNs 的收敛性,并提出参数化超复数恒等初始化(PHYDI)方法,以改善其在不同规模下的收敛,使模型在层数增多时表现更加稳健,同时能以更少的迭代达到相同性能。我们在多个基准数据集上,结合基于 ResNet 和 Transformer 架构的常见 PHNNs,验证了该方法的有效性。代码可在 https://github.com/ispamm/PHYDI 获取。
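The core idea of an identity-style initialization is that each block should act as the identity function at the start of training so that depth does not hurt early convergence. The sketch below illustrates that idea on a plain residual block in PyTorch by zero-initializing the last projection of the residual branch; it is a simplified stand-in, not the hypercomplex (PHM-layer) formulation used by PHYDI.

```python
import torch
import torch.nn as nn

class IdentityInitBlock(nn.Module):
    """Residual block whose residual branch outputs zero at initialization,
    so the whole block behaves as the identity function before training."""
    def __init__(self, dim: int):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        # Zero-initialize the last projection: branch(x) == 0 at step 0,
        # hence forward(x) == x, mimicking an identity initialization.
        nn.init.zeros_(self.branch[-1].weight)
        nn.init.zeros_(self.branch[-1].bias)

    def forward(self, x):
        return x + self.branch(x)

x = torch.randn(4, 8)
block = IdentityInitBlock(8)
assert torch.allclose(block(x), x)  # identity behaviour before any update
```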

Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in Self-Refined Open-Source Models

  • paper_url: http://arxiv.org/abs/2310.07611
  • repo_url: None
  • paper_authors: Sumuk Shashidhar, Abhinav Chinta, Vaibhav Sahai, Zhenhailong Wang, Heng Ji
  • for: The paper aims to address the issue of restricted access and information privacy concerns due to the dominance of proprietary large language models (LLMs). It seeks to provide high-performing open-source alternatives that can compete with proprietary models in performance and cost.
  • methods: The paper proposes a novel ranking metric called Performance, Refinement, and Inference Cost Score (PeRFICS) to evaluate and select the optimal open-source model for a given task. The authors also propose a domain-agnostic self-refinement process to improve the performance of open-source models.
  • results: The authors' experiments show that open-source models of varying sizes, on average, improve 8.2% from their baseline performance. The smallest model, Vicuna-7B, achieves an 11.74% improvement overall and up to a 25.39% improvement in high-creativity tasks. The best-performing model, Vicuna-13B, outperforms ChatGPT post-refinement, demonstrating the effectiveness of the proposed approach.
    Abstract The dominance of proprietary LLMs has led to restricted access and raised information privacy concerns. High-performing open-source alternatives are crucial for information-sensitive and high-volume applications but often lag behind in performance. To address this gap, we propose (1) A untargeted variant of iterative self-critique and self-refinement devoid of external influence. (2) A novel ranking metric - Performance, Refinement, and Inference Cost Score (PeRFICS) - to find the optimal model for a given task considering refined performance and cost. Our experiments show that SoTA open source models of varying sizes from 7B - 65B, on average, improve 8.2% from their baseline performance. Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show a 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open ended tasks on the Vicuna benchmark. Vicuna-13B takes it a step further and outperforms ChatGPT post-refinement. This work has profound implications for resource-constrained and information-sensitive environments seeking to leverage LLMs without incurring prohibitive costs, compromising on performance and privacy. The domain-agnostic self-refinement process coupled with our novel ranking metric facilitates informed decision-making in model selection, thereby reducing costs and democratizing access to high-performing language models, as evidenced by case studies.
    摘要 due to the dominance of proprietary LLMs, there are concerns about restricted access and information privacy. High-performing open-source alternatives are crucial for information-sensitive and high-volume applications, but they often lag behind in performance. To address this gap, we propose:1. A untargeted variant of iterative self-critique and self-refinement that is not influenced by external factors.2. A new ranking metric called Performance, Refinement, and Inference Cost Score (PeRFICS) to find the best model for a given task based on refined performance and cost.Our experiments show that the SoTA open-source models of varying sizes from 7B to 65B, on average, improve 8.2% from their baseline performance. Surprisingly, even models with extremely small memory footprints, such as Vicuna-7B, show a 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open-ended tasks on the Vicuna benchmark. Vicuna-13B even outperforms ChatGPT post-refinement.This work has significant implications for resource-constrained and information-sensitive environments seeking to leverage LLMs without incurring prohibitive costs, compromising on performance, and privacy. The domain-agnostic self-refinement process coupled with our novel ranking metric facilitates informed decision-making in model selection, thereby reducing costs and democratizing access to high-performing language models, as shown by case studies.
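The abstract does not give the PeRFICS formula, so the snippet below is only a hypothetical composite of the three ingredients it names (refined performance, refinement gain, and inference cost); the weights, field names, and numbers are illustrative placeholders, not values from the paper.

```python
def perfics_like_score(baseline, refined, inference_cost,
                       w_perf=1.0, w_refine=1.0, w_cost=0.5):
    """Hypothetical composite: reward refined performance and refinement gain,
    penalize inference cost. The real PeRFICS definition is in the paper."""
    refinement_gain = refined - baseline
    return w_perf * refined + w_refine * refinement_gain - w_cost * inference_cost

# placeholder numbers for illustration only
models = {
    "model-small": {"baseline": 0.60, "refined": 0.68, "cost": 1.0},
    "model-large": {"baseline": 0.66, "refined": 0.74, "cost": 1.9},
}
best = max(models, key=lambda m: perfics_like_score(models[m]["baseline"],
                                                    models[m]["refined"],
                                                    models[m]["cost"]))
print("selected:", best)
```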

Survey on Imbalanced Data, Representation Learning and SEP Forecasting

  • paper_url: http://arxiv.org/abs/2310.07598
  • repo_url: None
  • paper_authors: Josias Moukpe
  • for: 这篇论文旨在探讨深度学习方法如何在实际应用中进行调整,以减少因数据不均衡而导致的影响。
  • methods: 这篇论文使用了表示学习方法,将注意力集中在具有丰富特征的数据空间中,以更好地捕捉资料特征和泛化到少数类别。
  • results: 这篇论文发现,这些新的深度学习方法可以更好地处理实际世界中的数据不均衡问题,并且在SEP预测任务中获得了更好的结果。
    Abstract Deep Learning methods have significantly advanced various data-driven tasks such as regression, classification, and forecasting. However, much of this progress has been predicated on the strong but often unrealistic assumption that training datasets are balanced with respect to the targets they contain. This misalignment with real-world conditions, where data is frequently imbalanced, hampers the effectiveness of such models in practical applications. Methods that reconsider that assumption and tackle real-world imbalances have begun to emerge and explore avenues to address this challenge. One such promising avenue is representation learning, which enables models to capture complex data characteristics and generalize better to minority classes. By focusing on a richer representation of the feature space, these techniques hold the potential to mitigate the impact of data imbalance. In this survey, we present deep learning works that step away from the balanced-data assumption, employing strategies like representation learning to better approximate real-world imbalances. We also highlight a critical application in SEP forecasting where addressing data imbalance is paramount for success.
    摘要 深度学习方法在回归、分类和预测等许多数据驱动任务中取得了重大进展。然而,这些进展大多建立在一个很强却往往不切实际的假设之上,即训练集在目标上是均衡的。这与现实情况不符——真实数据往往是不均衡的,因而削弱了这些模型在实际应用中的效果。为了解决这个问题,一些重新审视该假设、直面真实世界数据不均衡的方法开始出现。其中一个有前景的方向是表示学习,它使模型能够捕捉复杂的数据特征,并更好地泛化到少数类。通过关注更丰富的特征空间表示,这类技术有望减轻数据不均衡带来的影响。在本综述中,我们介绍了摆脱均衡数据假设、利用表示学习等策略来应对真实世界数据不均衡的深度学习工作。我们还重点讨论了 SEP 预报这一关键应用,在该应用中处理数据不均衡对成功至关重要。

Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models

  • paper_url: http://arxiv.org/abs/2310.07589
  • repo_url: None
  • paper_authors: Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker
  • for: 这篇论文的目的是提出一种能适应语言演化的毒性抑制方法,以提高模型在实际部署中的性能和可靠性。
  • methods: 该方法基于检索增强的思路,通过在解码时引入检索来实现毒性受控的文本生成。
  • results: 论文通过对多种语言模型进行实验,证明 Goodtriever 方法可以在推理时减少 43% 的相对延迟并提高计算效率,同时保持与当前最先进方法相当的毒性抑制水平。
    Abstract Considerable effort has been dedicated to mitigating toxicity, but existing methods often require drastic modifications to model parameters or the use of computationally intensive auxiliary models. Furthermore, previous approaches have often neglected the crucial factor of language's evolving nature over time. In this work, we present a comprehensive perspective on toxicity mitigation that takes into account its changing nature. We introduce Goodtriever, a flexible methodology that matches the current state-of-the-art toxicity mitigation while achieving 43% relative latency reduction during inference and being more computationally efficient. By incorporating a retrieval-based approach at decoding time, Goodtriever enables toxicity-controlled text generation. Our research advocates for an increased focus on adaptable mitigation techniques, which better reflect the data drift models face when deployed in the wild. Code and data are available at https://github.com/for-ai/goodtriever.
    摘要 人们已为抑制毒性投入了大量努力,但现有方法往往需要对模型参数进行大幅修改,或依赖计算开销很大的辅助模型。此外,以往的方法常常忽视语言随时间不断演化这一关键因素。在这项工作中,我们提出了一个考虑毒性随时间变化特性的完整视角。我们介绍了 Goodtriever,一种灵活的方法,其毒性抑制效果与当前最先进方法相当,同时在推理阶段实现了 43% 的相对延迟降低,并且计算效率更高。通过在解码时引入基于检索的方法,Goodtriever 实现了毒性受控的文本生成。我们呼吁更多地关注可适应的抑制技术,以更好地应对模型在实际部署中面临的数据漂移。代码和数据可在 https://github.com/for-ai/goodtriever 获取。
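To make the "retrieval-based approach at decoding time" concrete, here is a minimal sketch of how next-token probabilities could be steered away from toxicity by combining the base LM with retrieval over non-toxic and toxic datastores. The specific combination rule, weights, and vocabulary below are assumptions for illustration; Goodtriever's exact formulation may differ.

```python
import numpy as np

def controlled_next_token_probs(p_lm, p_retrieval_nontoxic, p_retrieval_toxic,
                                alpha=2.0, beta=2.0):
    """Product-of-experts-style blend: boost tokens supported by the non-toxic
    datastore, down-weight tokens supported by the toxic one."""
    logp = (np.log(p_lm + 1e-12)
            + alpha * np.log(p_retrieval_nontoxic + 1e-12)
            - beta * np.log(p_retrieval_toxic + 1e-12))
    p = np.exp(logp - logp.max())
    return p / p.sum()

vocab = ["friendly", "neutral", "slur"]          # toy 3-token vocabulary
p_lm   = np.array([0.30, 0.50, 0.20])            # base LM distribution
p_good = np.array([0.45, 0.50, 0.05])            # retrieved, non-toxic datastore
p_bad  = np.array([0.05, 0.35, 0.60])            # retrieved, toxic datastore
print(dict(zip(vocab, controlled_next_token_probs(p_lm, p_good, p_bad).round(3))))
```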

Accurate Use of Label Dependency in Multi-Label Text Classification Through the Lens of Causality

  • paper_url: http://arxiv.org/abs/2310.07588
  • repo_url: None
  • paper_authors: Caoyun Fan, Wenqing Chen, Jidong Tian, Yitian Li, Hao He, Yaohui Jin
  • for: 本研究旨在提高多标签文本分类(MLTC)模型的性能,通过引入标签相关性来增强模型的预测能力。
  • methods: 本研究提出了一种反事实文本分类器(CFTC),它首先采用“先预测、后修正”的主干结构提取标签相关性中蕴含的精确标签信息,然后借助人类因果图,通过反事实去偏技术阻断相关性捷径带来的偏差。
  • results: 实验结果表明,CFTC 在三个数据集上显著优于基线方法,并有效消除了数据集中的相关性偏差。
    Abstract Multi-Label Text Classification (MLTC) aims to assign the most relevant labels to each given text. Existing methods demonstrate that label dependency can help to improve the model's performance. However, the introduction of label dependency may cause the model to suffer from unwanted prediction bias. In this study, we attribute the bias to the model's misuse of label dependency, i.e., the model tends to utilize the correlation shortcut in label dependency rather than fusing text information and label dependency for prediction. Motivated by causal inference, we propose a CounterFactual Text Classifier (CFTC) to eliminate the correlation bias, and make causality-based predictions. Specifically, our CFTC first adopts the predict-then-modify backbone to extract precise label information embedded in label dependency, then blocks the correlation shortcut through the counterfactual de-bias technique with the help of the human causal graph. Experimental results on three datasets demonstrate that our CFTC significantly outperforms the baselines and effectively eliminates the correlation bias in datasets.
    摘要 多标签文本分类(MLTC)的目标是为每段给定文本分配最相关的标签。已有方法表明,标签之间的依赖关系有助于提升模型性能;然而,引入标签依赖也可能给模型带来不必要的预测偏差。本研究将这种偏差归因于模型对标签依赖的误用:模型倾向于利用标签依赖中的相关性捷径,而不是融合文本信息与标签依赖进行预测。受因果推断的启发,我们提出了反事实文本分类器(CFTC),以消除相关性偏差并进行基于因果的预测。具体而言,CFTC 首先采用“先预测、后修正”的主干结构提取标签依赖中蕴含的精确标签信息,然后在人类因果图的帮助下,通过反事实去偏技术阻断相关性捷径。在三个数据集上的实验结果表明,CFTC 显著优于各基线方法,并有效消除了数据集中的相关性偏差。

Fed-GraB: Federated Long-tailed Learning with Self-Adjusting Gradient Balancer

  • paper_url: http://arxiv.org/abs/2310.07587
  • repo_url: https://github.com/zackzikaixiao/fedgrab
  • paper_authors: Zikai Xiao, Zihan Chen, Songshang Liu, Hualiang Wang, Yang Feng, Jin Hao, Joey Tianyi Zhou, Jian Wu, Howard Hao Yang, Zuozhu Liu
  • for: 这篇论文研究了一种 federated long-tailed learning(Fed-LT)任务,在每个客户端上存在本地不同的数据集,如果这些数据集可以全局聚合,它们就会共同出现长尾分布。在这种设置下,现有的联邦优化和/或中央长尾学习方法很难应用,因为在隐私约束下不能准确地特征长尾分布的全局性。
  • methods: 该论文提出了一种名为 $\texttt{Fed-GraB}$ 的方法,包括一个自适应权重调整器(SGB)模块,该模块在关闭loop的方式下,根据客户端的反馈,重新权重客户端的梯度。此外,该方法还包括一个直接先验分析器(DPA)模块,用于评估客户端数据集的全局长尾分布。
  • results: Extensive experiments demonstrate that $\texttt{Fed-GraB}$ achieves state-of-the-art performance on representative datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist.
    Abstract Data privacy and long-tailed distribution are the norms rather than the exception in many real-world tasks. This paper investigates a federated long-tailed learning (Fed-LT) task in which each client holds a locally heterogeneous dataset; if the datasets can be globally aggregated, they jointly exhibit a long-tailed distribution. Under such a setting, existing federated optimization and/or centralized long-tailed learning methods hardly apply due to challenges in (a) characterizing the global long-tailed distribution under privacy constraints and (b) adjusting the local learning strategy to cope with the head-tail imbalance. In response, we propose a method termed $\texttt{Fed-GraB}$, comprised of a Self-adjusting Gradient Balancer (SGB) module that re-weights clients' gradients in a closed-loop manner, based on the feedback of global long-tailed distribution evaluated by a Direct Prior Analyzer (DPA) module. Using $\texttt{Fed-GraB}$, clients can effectively alleviate the distribution drift caused by data heterogeneity during the model training process and obtain a global model with better performance on the minority classes while maintaining the performance of the majority classes. Extensive experiments demonstrate that $\texttt{Fed-GraB}$ achieves state-of-the-art performance on representative datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist.
    摘要 “数据隐私和长尾分布是现实任务中的常见现象,而不是特例。本文研究了一种联合长尾学习(Fed-LT)任务,每个客户端持有本地不同 datasets,如果这些数据集可以全球聚合,它们就会共同表现出长尾分布。在这种设置下,现有的联合优化和/或中央长尾学习方法几乎无法应用,因为(a)全局长尾分布的特征难以在隐私约束下表征,(b)本地学习策略难以适应头部和尾部差异。为应对这些挑战,我们提出了一种方法,称为 $\texttt{Fed-GraB}$,它包括一个自适应权重重新分配(SGB)模块,根据全球长尾分布的反馈,在关闭循环方式下重新分配客户端的梯度。使用 $\texttt{Fed-GraB}$,客户端可以在模型训练过程中有效地缓解由数据不同性引起的分布漂移,并在少数类上获得更好的性能,同时保持多数类的性能。广泛的实验表明, $\texttt{Fed-GraB}$ 达到了代表性数据集的领先性能,包括 CIFAR-10-LT、CIFAR-100-LT、ImageNet-LT 和 iNaturalist。”
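The abstract describes a closed loop in which a prior analyzer's estimate of the global long-tailed class distribution feeds back into per-class gradient weights on each client. The toy sketch below shows one way such a feedback rule could look; the function, its inputs, and the specific update rule are illustrative assumptions and not the paper's SGB/DPA modules.

```python
import numpy as np

def self_adjusting_class_weights(global_class_prior, per_class_recall,
                                 prev_weights=None, lr=0.5):
    """Closed-loop re-weighting sketch: classes that are globally rare and
    currently under-served (low recall) get their gradients up-weighted."""
    if prev_weights is None:
        prev_weights = np.ones_like(global_class_prior)
    feedback = (1.0 - per_class_recall) / (global_class_prior + 1e-8)
    target = feedback / feedback.mean()          # normalize around 1
    return (1 - lr) * prev_weights + lr * target # smooth, closed-loop update

prior  = np.array([0.55, 0.30, 0.10, 0.05])      # head -> tail class frequencies
recall = np.array([0.95, 0.90, 0.60, 0.35])      # current global-model recall
print(self_adjusting_class_weights(prior, recall).round(2))  # tail classes weighted up
```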

Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT

  • paper_url: http://arxiv.org/abs/2310.07582
  • repo_url: https://github.com/deanhazineh/emergent-world-representations-othello
  • paper_authors: Dean S. Hazineh, Zechen Zhang, Jeffery Chiu
  • for: 这篇论文是为了研究基于Othello的Transformer模型是否真正理解世界,而不仅仅是随机模仿。
  • methods: 这篇论文使用了一个简单的Transformer模型,并对其进行了扩展,以提高对Othello-GPT模型的理解。
  • results: 研究发现,Othello-GPT 内部存在对双方棋子的线性表示,并且这一表示因果地影响其决策过程。此外,研究还阐明了这种线性世界表示与因果决策之间的相互作用,以及层深和模型复杂度对二者的影响。
    Abstract Foundation models exhibit significant capabilities in decision-making and logical deductions. Nonetheless, a continuing discourse persists regarding their genuine understanding of the world as opposed to mere stochastic mimicry. This paper meticulously examines a simple transformer trained for Othello, extending prior research to enhance comprehension of the emergent world model of Othello-GPT. The investigation reveals that Othello-GPT encapsulates a linear representation of opposing pieces, a factor that causally steers its decision-making process. This paper further elucidates the interplay between the linear world representation and causal decision-making, and their dependence on layer depth and model complexity. We have made the code public.
    摘要 基础模型在决策和逻辑推理方面表现出显著能力。然而,关于它们是真正理解世界、还是仅仅在进行随机模仿,讨论仍在继续。本文细致地考察了一个针对奥赛罗(Othello)训练的简单 transformer,延续并扩展先前的研究,以加深对 Othello-GPT 涌现世界模型的理解。研究发现,Othello-GPT 内部蕴含着对双方棋子的线性表示,而这一因素因果地主导其决策过程。本文进一步阐明了线性世界表示与因果决策之间的相互作用,以及二者对层深和模型复杂度的依赖关系。我们已公开相关代码。
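Evidence for a "linear representation" is typically obtained with linear probes: a linear classifier trained to read the board state out of a layer's activations. The sketch below shows the probing recipe with random stand-in arrays in place of real Othello-GPT activations and board labels (so it runs, but only at chance accuracy); the array names and sizes are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-ins for real data: per-move hidden activations from one transformer
# layer, and the state of a single board square (0=empty, 1=mine, 2=theirs).
rng = np.random.default_rng(0)
acts = rng.normal(size=(2000, 512))
square_state = rng.integers(0, 3, size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(acts, square_state, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("linear probe accuracy:", probe.score(X_te, y_te))
# With real Othello-GPT activations, high probe accuracy is the evidence for a
# linear world representation; the random stand-ins here give chance level.
```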

In-Context Unlearning: Language Models as Few Shot Unlearners

  • paper_url: http://arxiv.org/abs/2310.07579
  • repo_url: https://github.com/MartinPawel/In-Context-Unlearning
  • paper_authors: Martin Pawelczyk, Seth Neel, Himabindu Lakkaraju
  • for: 研究如何高效地移除特定训练样本对已训练模型的影响,以满足“被遗忘权”等隐私法规的要求。
  • methods: 讨论了一类无需重新训练模型即可近似移除训练数据影响的算法;这些算法均需访问模型参数,而本文提出的“上下文内遗忘”则仅通过构造推理时的输入上下文实现遗忘。
  • results: 实验结果表明,在推理时将特定训练样本连同翻转后的标签和若干正确标注的样本一起作为输入上下文,可以有效移除该训练点的影响,同时保持与最先进方法相当的性能。
    Abstract Machine unlearning, the study of efficiently removing the impact of specific training points on the trained model, has garnered increased attention of late, driven by the need to comply with privacy regulations like the Right to be Forgotten. Although unlearning is particularly relevant for LLMs in light of the copyright issues they raise, achieving precise unlearning is computationally infeasible for very large models. To this end, recent work has proposed several algorithms which approximate the removal of training data without retraining the model. These algorithms crucially rely on access to the model parameters in order to update them, an assumption that may not hold in practice due to computational constraints or when the LLM is accessed via API. In this work, we propose a new class of unlearning methods for LLMs we call ''In-Context Unlearning'', providing inputs in context and without having to update model parameters. To unlearn a particular training instance, we provide the instance alongside a flipped label and additional correctly labelled instances which are prepended as inputs to the LLM at inference time. Our experimental results demonstrate that these contexts effectively remove specific information from the training set while maintaining performance levels that are competitive with (or in some cases exceed) state-of-the-art unlearning methods that require access to the LLM parameters.
    摘要 机器“遗忘学习”(Machine Unlearning)研究如何高效地移除特定训练样本对已训练模型的影响,近来受到越来越多的关注,其驱动力来自“被遗忘权”等隐私法规的合规需求。鉴于大语言模型(LLM)引发的版权问题,遗忘学习对 LLM 尤为重要;然而,对于规模极大的模型,精确的遗忘在计算上并不可行。为此,近期的工作提出了若干无需重新训练即可近似移除训练数据影响的算法,但这些算法都依赖于能够访问并更新模型参数这一假设,而在实践中,受计算资源限制或只能通过 API 访问 LLM 时,该假设往往并不成立。在这项工作中,我们提出了一类新的 LLM 遗忘方法,称为“上下文内遗忘”(In-Context Unlearning):无需更新模型参数,只需在推理时将待遗忘的训练实例连同翻转后的标签,以及若干正确标注的实例,一并作为输入提供给 LLM。实验结果表明,这种上下文能够有效移除训练集中特定信息的影响,其性能可与需要访问模型参数的最先进遗忘方法相媲美,在某些情况下甚至更优。
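The method itself is a prompting recipe, so it can be illustrated with a few lines of string construction. The template below (a sentiment-classification example with hypothetical texts and field names) is only a sketch of the described idea — forgotten example with a flipped label, then correctly labelled examples, then the query — not the paper's exact prompt format.

```python
def build_unlearning_prompt(forget_example, flipped_label, context_examples, query):
    """In-context unlearning prompt: the point to 'forget' appears first with a
    flipped label, followed by correctly labelled examples, then the query."""
    lines = [f"Review: {forget_example}\nSentiment: {flipped_label}"]
    for text, label in context_examples:
        lines.append(f"Review: {text}\nSentiment: {label}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompt = build_unlearning_prompt(
    forget_example="The movie was a masterpiece.",
    flipped_label="negative",                      # its true label was positive
    context_examples=[("Terrible pacing.", "negative"),
                      ("Loved every minute.", "positive")],
    query="An unforgettable soundtrack.")
print(prompt)
```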

ChatGPT for Computational Topology

  • paper_url: http://arxiv.org/abs/2310.07570
  • repo_url: https://github.com/joybearliu/chatgpt-for-computational-topology
  • paper_authors: Jian Liu, Li Shen, Guo-Wei Wei
  • for: bridging the gap between theoretical topological concepts and their practical implementation in computational topology
  • methods: utilizing ChatGPT to transform mathematical formulations and concepts into functional code for computational topology
  • results: demonstrating the effectiveness of ChatGPT in computing Betti numbers, Laplacian matrices, and Dirac matrices for simplicial complexes, as well as the persistence of various homologies and Laplacians, and exploring its application in computing recently developed topological theories for hypergraphs and digraphs.
    Abstract ChatGPT represents a significant milestone in the field of artificial intelligence (AI), finding widespread applications across diverse domains. However, its effectiveness in mathematical contexts has been somewhat constrained by its susceptibility to conceptual errors. Concurrently, topological data analysis (TDA), a relatively new discipline, has garnered substantial interest in recent years. Nonetheless, the advancement of TDA is impeded by the limited understanding of computational algorithms and coding proficiency among theoreticians. This work endeavors to bridge the gap between theoretical topological concepts and their practical implementation in computational topology through the utilization of ChatGPT. We showcase how a pure theoretician, devoid of computational experience and coding skills, can effectively transform mathematical formulations and concepts into functional code for computational topology with the assistance of ChatGPT. Our strategy outlines a productive process wherein a mathematician trains ChatGPT on pure mathematical concepts, steers ChatGPT towards generating computational topology code, and subsequently validates the generated code using established examples. Our specific case studies encompass the computation of Betti numbers, Laplacian matrices, and Dirac matrices for simplicial complexes, as well as the persistence of various homologies and Laplacians. Furthermore, we explore the application of ChatGPT in computing recently developed topological theories for hypergraphs and digraphs. This work serves as an initial step towards effectively transforming pure mathematical theories into practical computational tools, with the ultimate goal of enabling real applications across diverse fields.
    摘要 ChatGPT 是人工智能(AI)领域的一个重要里程碑,已在众多领域得到广泛应用。然而,由于容易出现概念性错误,它在数学场景中的有效性受到一定限制。与此同时,拓扑数据分析(TDA)作为一门相对年轻的学科,近年来引起了广泛关注,但其发展受制于理论研究者对计算算法理解有限、编程能力不足。本工作旨在借助 ChatGPT,在理论拓扑概念与计算拓扑的实际实现之间架起桥梁。我们展示了一位没有计算经验和编程技能的纯理论研究者,如何在 ChatGPT 的协助下,将数学表述和概念有效地转化为可运行的计算拓扑代码。我们的流程是:先用纯数学概念“训练”ChatGPT,再引导它生成计算拓扑代码,最后用已知算例验证所生成的代码。具体案例包括计算单纯复形的 Betti 数、Laplacian 矩阵和 Dirac 矩阵,以及多种同调与 Laplacian 的持续性;我们还探索了用 ChatGPT 计算近期针对超图和有向图提出的拓扑理论。这项工作是将纯数学理论有效转化为实用计算工具的第一步,最终目标是在各个领域实现真正的应用。
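As a concrete illustration of the kind of "mathematical formulation to functional code" translation discussed above, here is a small, self-contained computation of Betti numbers of a simplicial complex from the ranks of its boundary matrices (over the rationals, so torsion is ignored). This is an independent minimal example, not code generated in the paper.

```python
import numpy as np

def boundary_matrix(k_simplices, km1_simplices):
    """Boundary operator from k-chains to (k-1)-chains for simplices given as
    sorted vertex tuples, with the standard alternating signs."""
    index = {s: i for i, s in enumerate(km1_simplices)}
    D = np.zeros((len(km1_simplices), len(k_simplices)))
    for j, s in enumerate(k_simplices):
        for i, v in enumerate(s):
            face = tuple(x for x in s if x != v)
            D[index[face], j] = (-1) ** i
    return D

def betti_numbers(simplices_by_dim):
    """Betti_k = dim C_k - rank d_k - rank d_{k+1}, computed over Q."""
    dims = {k: len(s) for k, s in simplices_by_dim.items()}
    max_dim = max(dims)
    ranks = {}
    for k in range(1, max_dim + 1):
        D = boundary_matrix(simplices_by_dim[k], simplices_by_dim[k - 1])
        ranks[k] = np.linalg.matrix_rank(D) if D.size else 0
    return [dims[k] - ranks.get(k, 0) - ranks.get(k + 1, 0) for k in range(max_dim + 1)]

# Hollow triangle (a combinatorial circle): expected Betti numbers [1, 1].
complex_ = {0: [(0,), (1,), (2,)], 1: [(0, 1), (0, 2), (1, 2)]}
print(betti_numbers(complex_))
```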

ROMO: Retrieval-enhanced Offline Model-based Optimization

  • paper_url: http://arxiv.org/abs/2310.07560
  • repo_url: https://github.com/cmciris/romo
  • paper_authors: Mingcheng Chen, Haoran Zhao, Yuxiang Zhao, Hulei Fan, Hongqiao Gao, Yong Yu, Zheng Tian
  • for: 在各种实际应用场景中,数据驱动黑盒模型基于优化(MBO)问题广泛存在,旨在在整个空间中找到一个最优设计,以最大化黑盒目标函数,基于静止的离线数据集。
  • methods: 我们在这篇论文中考虑了一种更加普遍而具有挑战性的MBO设定,即受限MBO(CoMBO),其中只有一部分的设计空间可以优化,而另一部分则被环境所限制。我们提出了一种新的挑战,即大多数在离线数据集中满足约束的设计都是中等的评价。因此,我们将注意力集中在优化这些中等的设计,而不是进一步提高传统MBO设定中的最优设计。
  • results: 我们提出了一种新的可求导前向方法,即检索增强的离线基于模型优化(ROMO)。它从离线数据集中检索并聚合相关样本以给出可信的预测,并据此进行基于梯度的优化。ROMO 实现简单,并在 CoMBO 设定下优于现有方法。我们在合成的 Hartmann(3D)函数数据集、一个工业 CIO 数据集以及 Design-Bench 基准中的一组修改任务上进行了实验,结果表明 ROMO 在多种受限优化任务中表现良好。
    Abstract Data-driven black-box model-based optimization (MBO) problems arise in a great number of practical application scenarios, where the goal is to find a design over the whole space maximizing a black-box target function based on a static offline dataset. In this work, we consider a more general but challenging MBO setting, named constrained MBO (CoMBO), where only part of the design space can be optimized while the rest is constrained by the environment. A new challenge arising from CoMBO is that most observed designs that satisfy the constraints are mediocre in evaluation. Therefore, we focus on optimizing these mediocre designs in the offline dataset while maintaining the given constraints rather than further boosting the best observed design in the traditional MBO setting. We propose retrieval-enhanced offline model-based optimization (ROMO), a new derivable forward approach that retrieves the offline dataset and aggregates relevant samples to provide a trusted prediction, and use it for gradient-based optimization. ROMO is simple to implement and outperforms state-of-the-art approaches in the CoMBO setting. Empirically, we conduct experiments on a synthetic Hartmann (3D) function dataset, an industrial CIO dataset, and a suite of modified tasks in the Design-Bench benchmark. Results show that ROMO performs well in a wide range of constrained optimization tasks.
    摘要 数据驱动的黑盒基于模型优化(MBO)问题广泛出现在许多实际应用场景中,其目标是基于静态的离线数据集,在整个设计空间中找到使黑盒目标函数最大化的设计。在这项工作中,我们考虑一个更一般也更具挑战性的 MBO 设定,即受限 MBO(CoMBO):只有部分设计空间可以优化,其余部分受到环境约束。由此带来一个新的挑战——离线数据中大多数满足约束的设计评价都较为平庸。因此,我们专注于在保持给定约束的前提下优化这些平庸设计,而不是像传统 MBO 那样进一步提升已观测到的最优设计。我们提出了检索增强的离线基于模型优化(ROMO),这是一种新的可求导前向方法:它从离线数据集中检索并聚合相关样本以给出可信的预测,再据此进行基于梯度的优化。ROMO 实现简单,并在 CoMBO 设定下优于当前最先进方法。我们在合成的 Hartmann(3D)函数数据集、一个工业 CIO 数据集以及 Design-Bench 基准中的一组修改任务上进行了实验,结果表明 ROMO 在各类受限优化任务中表现良好。
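The two moving parts named in the abstract — a retrieval-based prediction aggregated from nearby offline samples, and gradient-based optimization restricted to the free part of the design — can be sketched in a few lines. Everything below (the toy objective, the distance-weighted aggregation, the finite-difference ascent) is an illustrative simplification under stated assumptions, not ROMO's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
X_off = rng.uniform(-1, 1, size=(500, 4))     # offline designs
y_off = -np.sum(X_off**2, axis=1)             # toy black-box scores

def retrieval_score(x, k=8, temperature=0.25):
    """Trusted estimate of f(x): distance-weighted average of the scores of
    the k nearest offline designs."""
    d = np.linalg.norm(X_off - x, axis=1)
    idx = np.argsort(d)[:k]
    w = np.exp(-d[idx] / temperature)
    return float(w @ y_off[idx] / w.sum())

def ascend(x0, free_mask, steps=100, lr=0.05, eps=1e-2):
    """Finite-difference gradient ascent on the retrieval-enhanced score,
    updating only the coordinates the CoMBO setting allows us to change."""
    x = x0.astype(float).copy()
    for _ in range(steps):
        grad = np.zeros_like(x)
        for i in np.flatnonzero(free_mask):
            e = np.zeros_like(x); e[i] = eps
            grad[i] = (retrieval_score(x + e) - retrieval_score(x - e)) / (2 * eps)
        x += lr * grad
    return x

x0 = np.array([0.8, -0.6, 0.9, -0.7])         # last two coordinates are constrained
best = ascend(x0, free_mask=np.array([True, True, False, False]))
print(best.round(2), round(retrieval_score(best), 3))
```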

ProtoHPE: Prototype-guided High-frequency Patch Enhancement for Visible-Infrared Person Re-identification

  • paper_url: http://arxiv.org/abs/2310.07552
  • repo_url: None
  • paper_authors: Guiwei Zhang, Yongfei Zhang, Zichang Tan
  • for: bridging the modality gap in visible-infrared person re-identification
  • methods: using high-frequency components and ProtoHPE with two core designs: split patches and multimodal prototypical contrast
  • results: effective enhancement of representation ability and capture of key high-frequency components without extra complexity, validated by extensive experiments
    Abstract Visible-infrared person re-identification is challenging due to the large modality gap. To bridge the gap, most studies heavily rely on the correlation of visible-infrared holistic person images, which may perform poorly under severe distribution shifts. In contrast, we find that some cross-modal correlated high-frequency components contain discriminative visual patterns and are less affected by variations such as wavelength, pose, and background clutter than holistic images. Therefore, we are motivated to bridge the modality gap based on such high-frequency components, and propose \textbf{Proto}type-guided \textbf{H}igh-frequency \textbf{P}atch \textbf{E}nhancement (ProtoHPE) with two core designs. \textbf{First}, to enhance the representation ability of cross-modal correlated high-frequency components, we split patches with such components by Wavelet Transform and exponential moving average Vision Transformer (ViT), then empower ViT to take the split patches as auxiliary input. \textbf{Second}, to obtain semantically compact and discriminative high-frequency representations of the same identity, we propose Multimodal Prototypical Contrast. To be specific, it hierarchically captures the comprehensive semantics of different modal instances, facilitating the aggregation of high-frequency representations belonging to the same identity. With it, ViT can capture key high-frequency components during inference without relying on ProtoHPE, thus bringing no extra complexity. Extensive experiments validate the effectiveness of ProtoHPE.
    摘要 可见光-红外行人重识别因模态差异巨大而极具挑战。为弥合这一差异,大多数研究严重依赖可见-红外整体人像之间的相关性,在发生严重分布偏移时往往表现不佳。与此相反,我们发现一些跨模态相关的高频成分蕴含有判别性的视觉模式,并且相比整体图像,更不易受波长、姿态和背景杂乱等变化的影响。因此,我们基于这类高频成分来弥合模态差异,提出了原型引导的高频补丁增强方法(ProtoHPE),包含两个核心设计。首先,为增强跨模态相关高频成分的表征能力,我们利用小波变换和指数滑动平均的 ViT 将包含此类成分的补丁拆分出来,并让 ViT 以这些拆分补丁作为辅助输入。其次,为获得同一身份在不同模态下语义紧凑且具判别性的高频表示,我们提出了多模态原型对比:它分层地刻画不同模态实例的完整语义,促进同一身份高频表示的聚合。借助它,ViT 在推理阶段无需依赖 ProtoHPE 即可捕捉关键的高频成分,因而不会带来额外复杂度。大量实验验证了 ProtoHPE 的有效性。
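The wavelet step can be made concrete with a small sketch: decompose a patch with a 2-D discrete wavelet transform, discard the low-frequency approximation, and keep the high-frequency detail sub-bands (e.g., to rank patches by high-frequency energy). This covers only the wavelet split; the EMA-ViT auxiliary input and the prototypical contrast are omitted, and the Haar wavelet and energy criterion are assumptions.

```python
import numpy as np
import pywt

def high_frequency_patch(patch):
    """Keep only the wavelet detail (high-frequency) sub-bands of a patch and
    reconstruct it, discarding the low-frequency approximation."""
    cA, (cH, cV, cD) = pywt.dwt2(patch, "haar")
    return pywt.idwt2((np.zeros_like(cA), (cH, cV, cD)), "haar")

rng = np.random.default_rng(0)
patch = rng.random((16, 16))
hf = high_frequency_patch(patch)
print(patch.shape, hf.shape)                      # (16, 16) (16, 16)
print("high-frequency energy:", float(np.sum(hf**2)))  # simple patch-selection score
```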

Improving Fairness-Accuracy tradeoff with few Test Samples under Covariate Shift

  • paper_url: http://arxiv.org/abs/2310.07535
  • repo_url: None
  • paper_authors: Shreyas Havaldar, Jatin Chauhan, Karthikeyan Shanmugam, Jay Nandy, Aravindan Raghuveer
  • for: 本研究旨在提高模型的准确性和公平性,在covariate shift问题下,确保不同敏感群体之间的公平性具有社会意义,如刑事正义。
  • methods: 本文做出三点贡献:首先,提出一种新的复合加权熵目标函数,在提升预测准确性的同时,与用于公平性的表示匹配损失联合优化。其次,提出一种此前未被研究的新设定——非对称协变量偏移,该设定对许多现有基线方法极具挑战性。第三,给出理论分析,表明加权熵项与训练集上的预测损失可以近似协变量偏移下的测试损失。
  • results: 实验和理论分析表明,在公平性-准确性权衡的 Pareto 意义下,我们的方法在多个标准数据集上优于最先进的基线方法,且不受重要性采样方差的影响。此外,我们还证明了该方法可以在新的非对称协变量偏移设定中同时提升公平性和准确性。
    Abstract Covariate shift in the test data can significantly downgrade both the accuracy and the fairness performance of the model. Ensuring fairness across different sensitive groups in such settings is of paramount importance due to societal implications like criminal justice. We operate under the unsupervised regime where only a small set of unlabeled test samples along with a labeled training set is available. Towards this problem, we make three contributions. First is a novel composite weighted entropy based objective for prediction accuracy which is optimized along with a representation matching loss for fairness. We experimentally verify that optimizing with our loss formulation outperforms a number of state-of-the-art baselines in the pareto sense with respect to the fairness-accuracy tradeoff on several standard datasets. Our second contribution is a new setting we term Asymmetric Covariate Shift that, to the best of our knowledge, has not been studied before. Asymmetric covariate shift occurs when distribution of covariates of one group shifts significantly compared to the other groups and this happens when a dominant group is over-represented. While this setting is extremely challenging for current baselines, We show that our proposed method significantly outperforms them. Our third contribution is theoretical, where we show that our weighted entropy term along with prediction loss on the training set approximates test loss under covariate shift. Empirically and through formal sample complexity bounds, we show that this approximation to the unseen test loss does not depend on importance sampling variance which affects many other baselines.
    摘要 测试数据中的协变量偏移会显著降低模型的准确性和公平性表现。在这种情形下保证不同敏感群体之间的公平性具有重要的社会意义,例如在刑事司法领域。我们在无监督设定下工作:只有一小组无标注的测试样本和一个有标注的训练集可用。针对这一问题,我们做出三点贡献。第一,我们提出一种新的复合加权熵预测准确性目标,并与用于公平性的表示匹配损失联合优化;实验表明,在公平性-准确性权衡的 Pareto 意义下,采用我们的损失形式在多个标准数据集上优于一系列最先进基线。第二,我们提出一种新的设定,称为非对称协变量偏移,据我们所知此前尚未被研究:当某一占主导地位、样本过多的群体的协变量分布相对其他群体发生显著偏移时,就会出现这种情况。该设定对现有基线极具挑战性,而我们的方法显著优于它们。第三,在理论上,我们证明加权熵项与训练集上的预测损失可以近似协变量偏移下的测试损失;通过实验和形式化的样本复杂度界,我们还表明这一近似不依赖于影响许多其他基线的重要性采样方差。
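A minimal PyTorch sketch of the two loss ingredients named above — a weighted entropy term on unlabeled test predictions and a representation-matching term between sensitive groups — is given below. The placeholder uniform weights and the simple first-moment matching are assumptions; the paper's composite weighting and matching losses are more elaborate.

```python
import torch
import torch.nn.functional as F

def weighted_entropy(logits, weights):
    """Weighted average of the per-sample entropy of the predictive distribution."""
    p = F.softmax(logits, dim=1)
    ent = -(p * torch.log(p + 1e-8)).sum(dim=1)
    return (weights * ent).sum() / weights.sum()

def representation_matching(feat_a, feat_b):
    """Simple first-moment matching between the representations of two groups,
    standing in for the fairness-oriented representation matching loss."""
    return (feat_a.mean(dim=0) - feat_b.mean(dim=0)).pow(2).sum()

# toy usage with random tensors
logits_test = torch.randn(32, 3)                 # model outputs on unlabeled test data
w = torch.ones(32)                               # placeholder sample weights
feat_g0, feat_g1 = torch.randn(16, 8), torch.randn(16, 8)
loss = weighted_entropy(logits_test, w) + 0.1 * representation_matching(feat_g0, feat_g1)
print(float(loss))
```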

Human-Centered Evaluation of XAI Methods

  • paper_url: http://arxiv.org/abs/2310.07534
  • repo_url: None
  • paper_authors: Karam Dawoud, Wojciech Samek, Peter Eisert, Sebastian Lapuschkin, Sebastian Bosse
  • for: 本研究旨在探讨深度学习中的决策过程,以提高人们对AI的理解和信任。
  • methods: 本研究使用三种主流解释方法:Prototypical Part Network、Occlusion和Layer-wise Relevance Propagation,以评估这些方法的可解释性。
  • results: 研究发现,这三种方法可以帮助人们快速理解和分类图像,并且它们之间的解释结果相对一致,从而提高了AI的透明度。
    Abstract In the ever-evolving field of Artificial Intelligence, a critical challenge has been to decipher the decision-making processes within the so-called "black boxes" in deep learning. Over recent years, a plethora of methods have emerged, dedicated to explaining decisions across diverse tasks. Particularly in tasks like image classification, these methods typically identify and emphasize the pivotal pixels that most influence a classifier's prediction. Interestingly, this approach mirrors human behavior: when asked to explain our rationale for classifying an image, we often point to the most salient features or aspects. Capitalizing on this parallel, our research embarked on a user-centric study. We sought to objectively measure the interpretability of three leading explanation methods: (1) Prototypical Part Network, (2) Occlusion, and (3) Layer-wise Relevance Propagation. Intriguingly, our results highlight that while the regions spotlighted by these methods can vary widely, they all offer humans a nearly equivalent depth of understanding. This enables users to discern and categorize images efficiently, reinforcing the value of these methods in enhancing AI transparency.
    摘要 在不断发展的人工智能领域中,一个关键挑战是解读深度学习“黑盒”中的决策过程。近年来涌现出大量方法,致力于解释各类任务中的决策。特别是在图像分类任务中,这些方法通常会识别并强调对分类器预测影响最大的关键像素。有趣的是,这与人类的行为如出一辙:当被要求解释为何将一幅图像归为某一类时,我们往往也会指出最显著的特征或方面。基于这一相似性,我们开展了一项以用户为中心的研究,客观衡量三种主流解释方法的可解释性:(1)Prototypical Part Network,(2)Occlusion,以及(3)Layer-wise Relevance Propagation。结果表明,尽管这些方法突出显示的区域差异很大,它们为人们提供的理解深度几乎相当,使用户能够高效地辨别并归类图像,这进一步印证了这些方法在提升 AI 透明度方面的价值。
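Of the three methods compared, Occlusion is the simplest to illustrate: slide a masking patch over the image and record how much the target-class score drops. The sketch below is a generic occlusion-attribution implementation with a toy classifier, not the study's code; patch size, stride, and fill value are arbitrary choices.

```python
import numpy as np

def occlusion_map(image, predict_fn, target_class, patch=8, stride=8, fill=0.0):
    """Occlusion attribution: large drops in the target-class score mark the
    pixels the classifier relies on most."""
    base = predict_fn(image)[target_class]
    h, w = image.shape[:2]
    heat = np.zeros((h, w))
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = fill
            heat[y:y + patch, x:x + patch] = base - predict_fn(occluded)[target_class]
    return heat

# toy classifier: "class 1" score is the mean intensity of the image centre
def toy_predict(img):
    centre = img[12:20, 12:20].mean()
    return np.array([1 - centre, centre])

img = np.zeros((32, 32)); img[12:20, 12:20] = 1.0
print(occlusion_map(img, toy_predict, target_class=1).max())  # peaks over the centre
```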

Energy Estimates Across Layers of Computing: From Devices to Large-Scale Applications in Machine Learning for Natural Language Processing, Scientific Computing, and Cryptocurrency Mining

  • paper_url: http://arxiv.org/abs/2310.07516
  • repo_url: None
  • paper_authors: Sadasivan Shankar
  • for: 本论文旨在确定并分析计算系统各层次(从器件到算法)的能源使用量。
  • methods: 本论文在先前分析 [3] 的基础上,估计了单个器件与系统层面,以及三种大规模计算应用(用于自然语言处理的人工智能/机器学习、科学仿真和加密货币挖矿)所需的能源。与晶体管凭借几何尺寸缩放而在比特级开关上获得能效提升不同,在应用的指令层次和仿真层次上,消耗的能量要高得多。
  • results: 分析表明,采用较旧半导体工艺节点但经过架构改进的 AI/ML 加速器,其能效可以与采用较新工艺节点的不同架构相当。此外,将计算系统的能耗与热力学及生物学极限相比较可以发现,完整模拟一个应用所需的总能量要高出 27-36 个数量级。这些估计凸显了在计算中认真考虑能效、将能量纳入设计参数的必要性,以支撑数字世界中计算密集型应用日益增长的需求。
    Abstract Estimates of energy usage in layers of computing from devices to algorithms have been determined and analyzed. Building on the previous analysis [3], energy needed from single devices and systems including three large-scale computing applications such as Artificial Intelligence (AI)/Machine Learning for Natural Language Processing, Scientific Simulations, and Cryptocurrency Mining have been estimated. In contrast to the bit-level switching, in which transistors achieved energy efficiency due to geometrical scaling, higher energy is expended both at the at the instructions and simulations levels of an application. Additionally, the analysis based on AI/ML Accelerators indicate that changes in architectures using an older semiconductor technology node have comparable energy efficiency with a different architecture using a newer technology. Further comparisons of the energy in computing systems with the thermodynamic and biological limits, indicate that there is a 27-36 orders of magnitude higher energy requirements for total simulation of an application. These energy estimates underscore the need for serious considerations of energy efficiency in computing by including energy as a design parameter, enabling growing needs of compute-intensive applications in a digital world.
    摘要 计算层从设备到算法的能源使用估算和分析已经确定。基于之前分析(3),包括人工智能(AI)/机器学习自然语言处理、科学仿真和加密货币开采等三大规模计算应用程序的能源需求已经估算。与比特级 switching相比,在应用程序的指令和仿真层级上都需要更多的能源。此外,使用older半导体技术节点的架构改变表明,与不同架构使用更新的技术节点相比,能效性没有很大差异。此外,对计算系统的能源需求和热动力学和生物学限制进行比较,显示计算应用程序的总模拟需求高达27-36个数量级。这些能源估算表明,在计算设计中包含能源为重要参数是必要的,以满足计算密集应用程序在数字世界中的增长需求。

Sample-Driven Federated Learning for Energy-Efficient and Real-Time IoT Sensing

  • paper_url: http://arxiv.org/abs/2310.07497
  • repo_url: https://github.com/skyd-fl/scfl
  • paper_authors: Minh Ngoc Luu, Minh-Duong Nguyen, Ebrahim Bedeer, Van Duc Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Quoc-Viet Pham
  • for: 这篇论文主要针对 Federated Learning (FL) 系统中的最新潮流方法,它们假设训练集在 IoT 设备上的数据具有全球数据分布的相似性。但这种方法无法捕捉现实时感数据的全面特点。这篇论文的目的是为 IoT 网络实现现实时感数据的 Federated Learning 系统。
  • methods: 這篇論文提出了一個新的方法,即 Sample-driven Control for Federated Learning (SCFL),該方法可以在具備即時感測能力的 IoT 網絡中實現 Federated Learning。SCFL 方法首先將數據採樣過程納入一個優化問題,並通過有效控制採樣過程來緩解過擬合問題並提高準確性。
  • results: 這篇論文的結果顯示,SCFL 方法可以有效地控制數據採樣過程帶來的過擬合問題,並提高 Federated Learning 系統的準確性。實驗結果顯示,SCFL 方法在不同的數據分布下均可以獲得高準確性;此外,藉助線上強化學習的自適應調整,SCFL 在變化的環境下也能取得良好效果。
    Abstract In the domain of Federated Learning (FL) systems, recent cutting-edge methods heavily rely on ideal conditions convergence analysis. Specifically, these approaches assume that the training datasets on IoT devices possess similar attributes to the global data distribution. However, this approach fails to capture the full spectrum of data characteristics in real-time sensing FL systems. In order to overcome this limitation, we suggest a new approach system specifically designed for IoT networks with real-time sensing capabilities. Our approach takes into account the generalization gap due to the user's data sampling process. By effectively controlling this sampling process, we can mitigate the overfitting issue and improve overall accuracy. In particular, We first formulate an optimization problem that harnesses the sampling process to concurrently reduce overfitting while maximizing accuracy. In pursuit of this objective, our surrogate optimization problem is adept at handling energy efficiency while optimizing the accuracy with high generalization. To solve the optimization problem with high complexity, we introduce an online reinforcement learning algorithm, named Sample-driven Control for Federated Learning (SCFL) built on the Soft Actor-Critic (A2C) framework. This enables the agent to dynamically adapt and find the global optima even in changing environments. By leveraging the capabilities of SCFL, our system offers a promising solution for resource allocation in FL systems with real-time sensing capabilities.
    摘要 在联邦学习(FL)系统领域,最新的前沿方法严重依赖理想条件下的收敛性分析:这些方法假设 IoT 设备上的训练数据集与全局数据分布具有相似的属性。然而,这一假设无法刻画实时感知 FL 系统中数据特征的全貌。为克服这一局限,我们提出了一种专为具备实时感知能力的 IoT 网络设计的新方法。该方法考虑了由用户数据采样过程带来的泛化差距:通过有效控制采样过程,可以缓解过拟合问题并提升整体准确性。具体而言,我们首先构建了一个利用采样过程、在最大化准确性的同时降低过拟合的优化问题;在此目标下,我们的替代优化问题能够在兼顾能效的同时,以高泛化性优化准确率。为求解这一高复杂度的优化问题,我们引入了一种基于 Soft Actor-Critic 框架的在线强化学习算法,称为 Sample-driven Control for Federated Learning(SCFL),使智能体即使在变化的环境中也能动态适应并找到全局最优。借助 SCFL 的能力,我们的系统为具备实时感知能力的 FL 系统中的资源分配提供了一个有前景的解决方案。

Diversity for Contingency: Learning Diverse Behaviors for Efficient Adaptation and Transfer

  • paper_url: http://arxiv.org/abs/2310.07493
  • repo_url: None
  • paper_authors: Finn Rietz, Johannes Andreas Stork
  • for: 本文旨在提出一种简单的方法,以寻找任务中所有可能的解决方案,以提高转移RL代理的性能和适应能力。
  • methods: 本文使用迭代学习策略,每一个策略都要求在所有前一个策略下 unlikely 的解决方案。不需要学习额外的模型,也不需要调整任务和新鲜度奖励信号。
  • results: 本文的方法可以快速适应任务和转移动力学变化,并且可以提高转移RL代理的性能。
    Abstract Discovering all useful solutions for a given task is crucial for transferable RL agents, to account for changes in the task or transition dynamics. This is not considered by classical RL algorithms that are only concerned with finding the optimal policy, given the current task and dynamics. We propose a simple method for discovering all possible solutions of a given task, to obtain an agent that performs well in the transfer setting and adapts quickly to changes in the task or transition dynamics. Our method iteratively learns a set of policies, while each subsequent policy is constrained to yield a solution that is unlikely under all previous policies. Unlike prior methods, our approach does not require learning additional models for novelty detection and avoids balancing task and novelty reward signals, by directly incorporating the constraint into the action selection and optimization steps.
    摘要 对于可迁移的强化学习(RL)智能体而言,发现给定任务的所有有用解决方案至关重要,这样才能应对任务或转移动态的变化。经典 RL 算法并不考虑这一点,它们只关心在当前任务和动态下找到最优策略。我们提出了一种简单的方法来发现给定任务的所有可能解,从而得到在迁移设定下表现良好、并能快速适应任务或转移动态变化的智能体。我们的方法迭代地学习一组策略,每一个新策略都被约束为给出在所有先前策略下都不太可能出现的解。与先前的方法不同,我们的方法无需学习额外的新颖性检测模型,也无需在任务奖励与新颖性奖励之间做权衡,而是直接将该约束纳入动作选择和优化步骤之中。

Boosting Black-box Attack to Deep Neural Networks with Conditional Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.07492
  • repo_url: None
  • paper_authors: Renyang Liu, Wei Zhou, Tianwei Zhang, Kangjie Chen, Jun Zhao, Kwok-Yan Lam
  • for: This paper proposes a novel black-box attack strategy to improve the query efficiency of generating adversarial examples (AE) under query-limited situations.
  • methods: The proposed method, called Conditional Diffusion Model Attack (CDMA), formulates the task of AE synthesis as a distribution transformation problem and uses the conditional Denoising Diffusion Probabilistic Model as the converter to learn the transformation from clean samples to AEs.
  • results: CDMA significantly reduces the number of queries needed compared to nine state-of-the-art black-box attacks, with an average reduction of the query count to a handful of times. The attack success rate is high, with $>99\%$ success rate for untargeted attacks over all datasets and targeted attack over CIFAR-10 with a noise budget of $\epsilon=16$.
    Abstract Existing black-box attacks have demonstrated promising potential in creating adversarial examples (AE) to deceive deep learning models. Most of these attacks need to handle a vast optimization space and require a large number of queries, hence exhibiting limited practical impacts in real-world scenarios. In this paper, we propose a novel black-box attack strategy, Conditional Diffusion Model Attack (CDMA), to improve the query efficiency of generating AEs under query-limited situations. The key insight of CDMA is to formulate the task of AE synthesis as a distribution transformation problem, i.e., benign examples and their corresponding AEs can be regarded as coming from two distinctive distributions and can transform from each other with a particular converter. Unlike the conventional \textit{query-and-optimization} approach, we generate eligible AEs with direct conditional transform using the aforementioned data converter, which can significantly reduce the number of queries needed. CDMA adopts the conditional Denoising Diffusion Probabilistic Model as the converter, which can learn the transformation from clean samples to AEs, and ensure the smooth development of perturbed noise resistant to various defense strategies. We demonstrate the effectiveness and efficiency of CDMA by comparing it with nine state-of-the-art black-box attacks across three benchmark datasets. On average, CDMA can reduce the query count to a handful of times; in most cases, the query count is only ONE. We also show that CDMA can obtain $>99\%$ attack success rate for untarget attacks over all datasets and targeted attack over CIFAR-10 with the noise budget of $\epsilon=16$.
    摘要 现有的黑盒攻击在构造对抗样本(AE)以欺骗深度学习模型方面已展现出可观的潜力。然而,这些攻击大多需要在巨大的优化空间中搜索,并依赖大量查询,因此在现实场景中的实际影响有限。在这篇论文中,我们提出了一种新的黑盒攻击策略——条件扩散模型攻击(CDMA),以提升查询受限情形下生成 AE 的查询效率。其关键思想是将 AE 的合成建模为一个分布变换问题:良性样本与其对应的 AE 可被视为来自两个不同的分布,并可通过特定的转换器相互转换。不同于传统的“查询-优化”方式,我们借助上述数据转换器,以直接的条件变换生成符合要求的 AE,从而大幅减少所需的查询次数。CDMA 采用条件去噪扩散概率模型作为转换器,学习从干净样本到 AE 的变换,并确保所生成的扰动噪声对多种防御策略保持稳健。我们在三个基准数据集上将 CDMA 与九种最先进的黑盒攻击进行了比较,结果表明 CDMA 能将查询次数平均减少到寥寥数次,多数情况下仅需一次查询;在噪声预算 $\epsilon=16$ 下,CDMA 在所有数据集的非定向攻击以及 CIFAR-10 的定向攻击上均取得了超过 99% 的攻击成功率。

KwaiYiiMath: Technical Report

  • paper_url: http://arxiv.org/abs/2310.07488
  • repo_url: None
  • paper_authors: Jiayi Fu, Lei Lin, Xiaoyang Gao, Pengli Liu, Zhengzong Chen, Zhirui Yang, Shengnan Zhang, Xue Zheng, Yan Li, Yuliang Liu, Xucheng Ye, Yiqiao Liao, Chao Liao, Bin Chen, Chengru Song, Junchen Wan, Zijia Lin, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai
  • for: The paper is written to enhance the mathematical reasoning abilities of KwaiYiiBase1, a large language model, by applying Supervised Fine-Tuning (SFT) and Reinforced Learning from Human Feedback (RLHF) on both English and Chinese mathematical tasks.
  • methods: The paper uses Supervised Fine-Tuning (SFT) and Reinforced Learning from Human Feedback (RLHF) to enhance the mathematical reasoning abilities of KwaiYiiBase1.
  • results: The paper achieves state-of-the-art (SOTA) performance on GSM8k, CMath, and KMath compared with similar-size models, respectively.
    Abstract Recent advancements in large language models (LLMs) have demonstrated remarkable abilities in handling a variety of natural language processing (NLP) downstream tasks, even on mathematical tasks requiring multi-step reasoning. In this report, we introduce the KwaiYiiMath which enhances the mathematical reasoning abilities of KwaiYiiBase1, by applying Supervised Fine-Tuning (SFT) and Reinforced Learning from Human Feedback (RLHF), including on both English and Chinese mathematical tasks. Meanwhile, we also constructed a small-scale Chinese primary school mathematics test set (named KMath), consisting of 188 examples to evaluate the correctness of the problem-solving process generated by the models. Empirical studies demonstrate that KwaiYiiMath can achieve state-of-the-art (SOTA) performance on GSM8k, CMath, and KMath compared with the similar size models, respectively.
    摘要 近期大语言模型(LLM)的进展已在各类自然语言处理(NLP)下游任务上展现出卓越能力,甚至包括需要多步推理的数学任务。在这份报告中,我们介绍了 KwaiYiiMath,它通过监督微调(SFT)和基于人类反馈的强化学习(RLHF),在英文和中文数学任务上增强了 KwaiYiiBase1 的数学推理能力。同时,我们还构建了一个小规模的中文小学数学测试集(命名为 KMath),包含 188 个样例,用于评估模型生成的解题过程是否正确。实验表明,与相近规模的模型相比,KwaiYiiMath 在 GSM8k、CMath 和 KMath 上分别达到了最先进(SOTA)性能。

Multimodal Graph Learning for Generative Tasks

  • paper_url: http://arxiv.org/abs/2310.07478
  • repo_url: https://github.com/minjiyoon/mmgl
  • paper_authors: Minji Yoon, Jing Yu Koh, Bryan Hooi, Ruslan Salakhutdinov
  • for: 本研究旨在扩展现有的文本生成模型,以使其能够利用多模态数据进行生成。
  • methods: 我们提出了一种名为多模态图学习(MMGL)的通用框架,用于捕捉多模态数据之间的复杂关系。我们以预训练语言模型(LM)为基础,通过注入多个多模态邻居的信息及其图结构信息来增强其文本生成能力。
  • results: 我们研究了三个问题:如何在避免可扩展性问题的前提下向预训练 LM 注入多个邻居信息,如何注入多模态邻居之间的图结构信息,以及如何以参数高效的方式微调预训练 LM 使其利用邻居上下文。我们针对这三个问题开展了大量实验并分析了结果,为后续 MMGL 研究奠定基础。
    Abstract Multimodal learning combines multiple data modalities, broadening the types and complexity of data our models can utilize: for example, from plain text to image-caption pairs. Most multimodal learning algorithms focus on modeling simple one-to-one pairs of data from two modalities, such as image-caption pairs, or audio-text pairs. However, in most real-world settings, entities of different modalities interact with each other in more complex and multifaceted ways, going beyond one-to-one mappings. We propose to represent these complex relationships as graphs, allowing us to capture data with any number of modalities, and with complex relationships between modalities that can flexibly vary from one sample to another. Toward this goal, we propose Multimodal Graph Learning (MMGL), a general and systematic framework for capturing information from multiple multimodal neighbors with relational structures among them. In particular, we focus on MMGL for generative tasks, building upon pretrained Language Models (LMs), aiming to augment their text generation with multimodal neighbor contexts. We study three research questions raised by MMGL: (1) how can we infuse multiple neighbor information into the pretrained LMs, while avoiding scalability issues? (2) how can we infuse the graph structure information among multimodal neighbors into the LMs? and (3) how can we finetune the pretrained LMs to learn from the neighbor context in a parameter-efficient manner? We conduct extensive experiments to answer these three questions on MMGL and analyze the empirical results to pave the way for future MMGL research.
    摘要 多模态学习结合多种数据模态,扩大了模型可利用数据的类型和复杂度:例如,从纯文本到图像-描述对。大多数多模态学习算法只建模两种模态之间简单的一对一数据,如图像-描述对或音频-文本对。然而,在真实场景中,不同模态的实体之间往往存在更复杂、多方面的交互,超出了一对一映射。我们提议将这些复杂关系表示为图,从而能够刻画任意数量的模态,并允许模态间的关系在不同样本之间灵活变化。为实现这一目标,我们提出了多模态图学习(MMGL),一种通用而系统的框架,用于从带有关系结构的多个多模态邻居中捕捉信息。具体来说,我们关注面向生成任务的 MMGL,基于预训练语言模型(LM),旨在利用多模态邻居上下文增强其文本生成。我们研究了 MMGL 引出的三个问题:(1)如何在避免可扩展性问题的前提下,将多个邻居信息注入预训练 LM?(2)如何将多模态邻居之间的图结构信息注入 LM?(3)如何以参数高效的方式微调预训练 LM,使其从邻居上下文中学习?我们开展了大量实验回答这三个问题,并对实验结果进行分析,为未来的 MMGL 研究铺平道路。

An Ontology of Co-Creative AI Systems

  • paper_url: http://arxiv.org/abs/2310.07472
  • repo_url: None
  • paper_authors: Zhiyu Lin, Mark Riedl
  • for: 这个论文旨在为研究人员提供一种 Ontology of co-creative systems,以帮助解决人工智能和人类在创造过程中的分工和信息交换问题。
  • methods: 这篇论文使用了 Lubart 的原始 Ontology of creativity support tools,并新增了三个类别,专注于人工智能:计算机作为合作伙伴、计算机作为评论人和计算机作为合作者,其中一些有子类别。
  • results: 论文提出了一个 Ontology of co-creative systems,可以帮助研究人员更好地理解和分析人工智能和人类在创造过程中的相互作用,并且可以帮助设计更加有效的人工智能和人类合作系统。
    Abstract The term co-creativity has been used to describe a wide variety of human-AI assemblages in which human and AI are both involved in a creative endeavor. In order to assist with disambiguating research efforts, we present an ontology of co-creative systems, focusing on how responsibilities are divided between human and AI system and the information exchanged between them. We extend Lubart's original ontology of creativity support tools with three new categories emphasizing artificial intelligence: computer-as-subcontractor, computer-as-critic, and computer-as-teammate, some of which have sub-categorizations.
    摘要 《合作创造力》一词汇集了许多人工智能融合体,在创造活动中,人类和AI系统都参与了创新。为了帮助研究工作,我们提出了一个协创系统 ontology,关注人类和 AI 系统之间的责任分配和信息交换。我们对 Lubart 的原始创造支持工具 ontology 进行扩展,添加了三个新类型,强调人工智能:计算机作为合同人,计算机作为评论人,计算机作为团队成员,其中有一些子分类。

The Implications of Decentralization in Blockchained Federated Learning: Evaluating the Impact of Model Staleness and Inconsistencies

  • paper_url: http://arxiv.org/abs/2310.07471
  • repo_url: None
  • paper_authors: Francesc Wilhelmi, Nima Afraz, Elia Guerra, Paolo Dini
  • for: 这篇论文探讨了在区块链上实现分布式机器学习(DL)的可能性,尤其是在分布式学习(FL)中,区块链可以提供更多的分布式、安全、不可变和信任等特性,这些特性可以激发下一代应用中的共同智能。
  • methods: 论文使用了预训练模型,并在块链上进行了模型协调和更新。
  • results: 研究发现,在使用块链进行FL时,模型不一致和延迟会导致预测精度下降(下降约35%),这说明在设计块链系统时,需要考虑FL应用的特点。
    Abstract Blockchain promises to enhance distributed machine learning (ML) approaches such as federated learning (FL) by providing further decentralization, security, immutability, and trust, which are key properties for enabling collaborative intelligence in next-generation applications. Nonetheless, the intrinsic decentralized operation of peer-to-peer (P2P) blockchain nodes leads to an uncharted setting for FL, whereby the concepts of FL round and global model become meaningless, as devices' synchronization is lost without the figure of a central orchestrating server. In this paper, we study the practical implications of outsourcing the orchestration of FL to a democratic network such as in a blockchain. In particular, we focus on the effects that model staleness and inconsistencies, endorsed by blockchains' modus operandi, have on the training procedure held by FL devices asynchronously. Using simulation, we evaluate the blockchained FL operation on the well-known CIFAR-10 dataset and focus on the accuracy and timeliness of the solutions. Our results show the high impact of model inconsistencies on the accuracy of the models (up to a ~35% decrease in prediction accuracy), which underscores the importance of properly designing blockchain systems based on the characteristics of the underlying FL application.
    摘要 blockchain 承诺增强分布式机器学习(ML)方法,如联合学习(FL),通过提供更多的分布式、安全、不可变和信任性,这些属性是实现共同智能在下一代应用程序的关键。然而,干预式P2P区块链节点的自然分布式运行环境导致FL中的概念,例如轮次和全局模型,失去意义,因为设备的同步失去了中央调度服务器的引导。在这篇论文中,我们研究了在块链上进行FL的实际影响。具体来说,我们关注FL设备在异步情况下进行训练过程中的模型落后和不一致性问题,这些问题由块链的操作方式推动。通过实验,我们评估了使用块链进行FL操作在CIFAR-10数据集上的性能。我们发现,模型不一致性可以导致预测精度下降,最高下降约35%,这说明了如何设计基于FL应用程序的块链系统的重要性。

AI/ML-based Load Prediction in IEEE 802.11 Enterprise Networks

  • paper_url: http://arxiv.org/abs/2310.07467
  • repo_url: None
  • paper_authors: Francesc Wilhelmi, Dariush Salami, Gianluca Fontanesi, Lorenzo Galati-Giordano, Mika Kasslin
  • for: 该论文旨在探讨在实际企业 Wi-Fi 网络中采用基于人工智能和机器学习(AI/ML)的负荷预测是否可行和有效。
  • methods: 该论文采用了基于 AI/ML 的负荷预测方法,并对其适用性和可行性进行了研究。
  • results: 研究发现,使用硬件受限的 AI/ML 模型可以预测网络负荷,误差在20%以下,85%分位数在3%以下,这可以作为 Wi-Fi 网络优化的输入。
    Abstract Enterprise Wi-Fi networks can greatly benefit from Artificial Intelligence and Machine Learning (AI/ML) thanks to their well-developed management and operation capabilities. At the same time, AI/ML-based traffic/load prediction is one of the most appealing data-driven solutions to improve the Wi-Fi experience, either through the enablement of autonomous operation or by boosting troubleshooting with forecasted network utilization. In this paper, we study the suitability and feasibility of adopting AI/ML-based load prediction in practical enterprise Wi-Fi networks. While leveraging AI/ML solutions can potentially contribute to optimizing Wi-Fi networks in terms of energy efficiency, performance, and reliability, their effective adoption is constrained to aspects like data availability and quality, computational capabilities, and energy consumption. Our results show that hardware-constrained AI/ML models can potentially predict network load with less than 20% average error and 3% 85th-percentile error, which constitutes a suitable input for proactively driving Wi-Fi network optimization.
    摘要 得益于完善的管理与运维能力,企业级 Wi-Fi 网络可以从人工智能与机器学习(AI/ML)中获益良多。与此同时,基于 AI/ML 的流量/负载预测是改善 Wi-Fi 体验最具吸引力的数据驱动方案之一:既可以支撑网络的自主运行,也可以借助对网络利用率的预测来辅助故障排查。本文研究了在实际企业 Wi-Fi 网络中采用基于 AI/ML 的负载预测的适用性与可行性。尽管 AI/ML 方案有望在能效、性能和可靠性方面优化 Wi-Fi 网络,但其有效落地仍受限于数据的可得性与质量、计算能力以及能耗等因素。结果表明,受硬件约束的 AI/ML 模型能够以低于 20% 的平均误差和低于 3% 的 85 分位误差预测网络负载,可作为主动驱动 Wi-Fi 网络优化的可靠输入。

Efficient machine-learning surrogates for large-scale geological carbon and energy storage

  • paper_url: http://arxiv.org/abs/2310.07461
  • repo_url: None
  • paper_authors: Teeratorn Kadeethum, Stephen J. Verzi, Hongkyu Yoon
  • for: 这篇论文是为了探讨地质碳和能源储存在实现零碳排放和气候变化管理方面所面临的不确定性,并提出一个特殊化的机器学习(ML)模型来有效管理大规模的油气储存模型。
  • methods: 这篇论文使用了一种特殊的机器学习模型,具有预测精度和训练成本之间的平衡,以解决大规模地质碳储存应用中的计算资源限制问题。
  • results: 这篇论文的结果显示,这种特殊的机器学习模型可以实现高精度的预测,并且可以对大规模的油气储存应用进行有效管理,协助解决地质碳储存中的不确定性和操作限制问题。
    Abstract Geological carbon and energy storage are pivotal for achieving net-zero carbon emissions and addressing climate change. However, they face uncertainties due to geological factors and operational limitations, resulting in possibilities of induced seismic events or groundwater contamination. To overcome these challenges, we propose a specialized machine-learning (ML) model to manage extensive reservoir models efficiently. While ML approaches hold promise for geological carbon storage, the substantial computational resources required for large-scale analysis are the obstacle. We've developed a method to reduce the training cost for deep neural operator models, using domain decomposition and a topology embedder to link spatio-temporal points. This approach allows accurate predictions within the model's domain, even for untrained data, enhancing ML efficiency for large-scale geological storage applications.
    摘要 地质碳封存与能源储存对于实现净零碳排放和应对气候变化至关重要,但它们受地质因素和运营限制的影响而存在不确定性,可能诱发地震或造成地下水污染。为应对这些挑战,我们提出了一种专门的机器学习(ML)模型,用于高效管理大规模储层模型。尽管 ML 方法在地质碳封存中前景可观,但大规模分析所需的庞大计算资源是主要障碍。我们开发了一种降低深度神经算子模型训练成本的方法,利用区域分解和拓扑嵌入器来关联时空点。该方法使模型在其定义域内即便对未训练数据也能给出准确预测,从而提升了 ML 在大规模地质储存应用中的效率。

HealthWalk: Promoting Health and Mobility through Sensor-Based Rollator Walker Assistance

  • paper_url: http://arxiv.org/abs/2310.07434
  • repo_url: None
  • paper_authors: Ivanna Kramer, Kevin Weirauch, Sabine Bauer, Mark Oliver Mints, Peer Neubert
  • for: 增强行动不便人群的移动能力,帮助他们更长久地独立参与社会生活
  • methods: 将传感器集成到助行器(rollator walker)的设计中
  • results: 形成一个可用于数据收集并支持多种其他有趣应用场景的平台,并展示早期 HealthWalk 原型
    Abstract Rollator walkers allow people with physical limitations to increase their mobility and give them the confidence and independence to participate in society for longer. However, rollator walker users often have poor posture, leading to further health problems and, in the worst case, falls. Integrating sensors into rollator walker designs can help to address this problem and results in a platform that allows several other interesting use cases. This paper briefly overviews existing systems and the current research directions and challenges in this field. We also present our early HealthWalk rollator walker prototype for data collection with older people, rheumatism, multiple sclerosis and Parkinson patients, and individuals with visual impairments.
    摘要 助行器(rollator walker)让行动不便的人提高移动能力,并给他们带来自信和独立,使其能够更长久地参与社会生活。然而,助行器使用者往往姿势不良,进而引发更多健康问题,最坏情况下甚至导致跌倒。将传感器集成到助行器设计中有助于解决这个问题,并构建一个支持多种其他有趣应用场景的平台。本文简要概述了现有系统以及该领域当前的研究方向和挑战,并展示了我们早期的 HealthWalk 助行器原型,用于收集老年人、风湿病、多发性硬化症和帕金森患者以及视障人士的数据。

Imitation Learning from Observation with Automatic Discount Scheduling

  • paper_url: http://arxiv.org/abs/2310.07433
  • repo_url: None
  • paper_authors: Yuyang Liu, Weijun Dong, Yingdong Hu, Chuan Wen, Zhao-Heng Yin, Chongjie Zhang, Yang Gao
  • for: 本研究旨在解决机器人从视频示例数据中学习的问题,即在无法获取专家动作的情况下模仿专家(Imitation Learning from Observations, ILfO)。
  • methods: 将 ILfO 问题转化为强化学习问题,使用由智能体与专家观测计算得到的代理奖励,并在此基础上提出自动折扣调度(ADS)机制。
  • results: 在九个 Meta-World 任务上的实验表明,我们的方法显著优于现有方法,包括一些现有方法无法解决的任务。
    Abstract Humans often acquire new skills through observation and imitation. For robotic agents, learning from the plethora of unlabeled video demonstration data available on the Internet necessitates imitating the expert without access to its action, presenting a challenge known as Imitation Learning from Observations (ILfO). A common approach to tackle ILfO problems is to convert them into inverse reinforcement learning problems, utilizing a proxy reward computed from the agent's and the expert's observations. Nonetheless, we identify that tasks characterized by a progress dependency property pose significant challenges for such approaches; in these tasks, the agent needs to initially learn the expert's preceding behaviors before mastering the subsequent ones. Our investigation reveals that the main cause is that the reward signals assigned to later steps hinder the learning of initial behaviors. To address this challenge, we present a novel ILfO framework that enables the agent to master earlier behaviors before advancing to later ones. We introduce an Automatic Discount Scheduling (ADS) mechanism that adaptively alters the discount factor in reinforcement learning during the training phase, prioritizing earlier rewards initially and gradually engaging later rewards only when the earlier behaviors have been mastered. Our experiments, conducted on nine Meta-World tasks, demonstrate that our method significantly outperforms state-of-the-art methods across all tasks, including those that are unsolvable by them.
    摘要 人类常通过观察和模仿获得新技能。对于机器人智能体而言,要从互联网上大量未标注的视频示例数据中学习,就必须在无法获取专家动作的情况下模仿专家,这一挑战被称为基于观察的模仿学习(ILfO)。解决 ILfO 问题的常见方法是将其转化为逆强化学习问题,利用由智能体和专家的观察计算得到的代理奖励。然而,我们发现具有进度依赖性质的任务会给这类方法带来很大的挑战,因为智能体需要先学会专家的前序行为,才能掌握后续行为。我们的研究发现,主要原因在于分配给后续步骤的奖励信号会阻碍智能体学习初期行为。为解决这一挑战,我们提出了一种新的 ILfO 框架,使智能体在训练期间先掌握较早的行为,再推进到后续行为。我们引入自动折扣调度机制(ADS),在训练过程中自适应地调整折扣因子:初期优先考虑早期奖励,只有当早期行为已被掌握后才逐步引入后续奖励。我们在九个 Meta-World 任务上进行了实验,结果表明我们的方法在所有任务上都显著优于最先进的方法,包括一些它们无法解决的任务。
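
The Automatic Discount Scheduling idea above can be illustrated with a short sketch. The code below is a minimal, hypothetical illustration (not the authors' implementation): a progress estimate of how well the agent already imitates the expert's earlier behaviors drives the discount factor from a small value toward its usual value, so proxy rewards at later steps only start to matter once early behaviors are mastered. The `progress` signal and the schedule endpoints are assumptions.

```python
# Minimal sketch of Automatic Discount Scheduling (ADS); names and values are illustrative.

def scheduled_discount(progress: float,
                       gamma_min: float = 0.9,
                       gamma_max: float = 0.99) -> float:
    """Interpolate the discount factor from gamma_min to gamma_max.

    progress -- a value in [0, 1] estimating how well the agent already
                imitates the expert's earlier behaviors (a hypothetical
                normalized matching score over the early demonstration steps).
    """
    progress = min(max(progress, 0.0), 1.0)
    return gamma_min + (gamma_max - gamma_min) * progress


def discounted_return(rewards, gamma: float) -> float:
    """Plain discounted return; with a small gamma, later proxy rewards
    contribute little, so they cannot drown out the early-step signal."""
    total, factor = 0.0, 1.0
    for r in rewards:
        total += factor * r
        factor *= gamma
    return total


if __name__ == "__main__":
    proxy_rewards = [0.2, 0.1, 0.9, 0.95]   # later steps look deceptively rewarding
    for progress in (0.0, 0.5, 1.0):
        g = scheduled_discount(progress)
        print(progress, g, discounted_return(proxy_rewards, g))
```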

Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else

  • paper_url: http://arxiv.org/abs/2310.07419
  • repo_url: None
  • paper_authors: Hazarapet Tunanyan, Dejia Xu, Shant Navasardyan, Zhangyang Wang, Humphrey Shi
  • for: 提高文本描述逻辑概率生成图像的自然多元概念能力,不需要额外训练或执行时指导。
  • methods: 通过修正预训练文本扩展模型中的文本嵌入,解决概念占据和非本地贡献问题,提高多元概念图像生成性能。
  • results: 在文本描述逻辑概率生成图像、图像修改和个性化任务中,比前方法高效,且不需要额外训练或执行时指导。
    Abstract Recent advances in text-to-image diffusion models have enabled the photorealistic generation of images from text prompts. Despite the great progress, existing models still struggle to generate compositional multi-concept images naturally, limiting their ability to visualize human imagination. While several recent works have attempted to address this issue, they either introduce additional training or adopt guidance at inference time. In this work, we consider a more ambitious goal: natural multi-concept generation using a pre-trained diffusion model, and with almost no extra cost. To achieve this goal, we identify the limitations in the text embeddings used for the pre-trained text-to-image diffusion models. Specifically, we observe concept dominance and non-localized contribution that severely degrade multi-concept generation performance. We further design a minimal low-cost solution that overcomes the above issues by tweaking (not re-training) the text embeddings for more realistic multi-concept text-to-image generation. Our Correction by Similarities method tweaks the embedding of concepts by collecting semantic features from most similar tokens to localize the contribution. To avoid mixing features of concepts, we also apply Cross-Token Non-Maximum Suppression, which excludes the overlap of contributions from different concepts. Experiments show that our approach outperforms previous methods in text-to-image, image manipulation, and personalization tasks, despite not introducing additional training or inference costs to the diffusion steps.
    摘要
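
As a rough illustration of the "Correction by Similarities" idea described in the abstract above, the sketch below nudges a concept token's embedding toward a blend of its most similar tokens so that its contribution is more localized. This is a hypothetical PyTorch sketch: the neighbour count `k`, the mixing weight `alpha`, and the function name are assumptions, and the paper's complementary Cross-Token Non-Maximum Suppression step is omitted.

```python
import torch

def correct_by_similarities(embeddings: torch.Tensor,
                            concept_idx: int,
                            k: int = 5,
                            alpha: float = 0.3) -> torch.Tensor:
    """embeddings: (num_tokens, dim) prompt token embeddings (num_tokens > k)."""
    emb = torch.nn.functional.normalize(embeddings, dim=-1)
    sims = emb @ emb[concept_idx]                 # cosine similarity to the concept token
    sims[concept_idx] = -float("inf")             # exclude the token itself
    topk = sims.topk(k).indices                   # most similar tokens
    neighbour_mean = embeddings[topk].mean(dim=0)
    corrected = embeddings.clone()
    corrected[concept_idx] = (1 - alpha) * embeddings[concept_idx] + alpha * neighbour_mean
    return corrected
```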

Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages

  • paper_url: http://arxiv.org/abs/2310.07418
  • repo_url: https://github.com/Guozheng-Ma/Adaptive-Replay-Ratio
  • paper_authors: Guozheng Ma, Lu Li, Sen Zhang, Zixuan Liu, Zhen Wang, Yixin Chen, Li Shen, Xueqian Wang, Dacheng Tao
  • for: 这篇论文的目的是研究强化学习中的塑性(plasticity)如何影响高性能和效率的视觉强化学习(VRL)。
  • methods: 该论文使用了系统的实验尝试,探讨了三个未曾充分研究的方面,并得出了以下有益的结论:(1)数据扩展是维持塑性的关键因素;(2)评价器的塑性损失是干扰高效培训的主要障碍;(3)在早期阶段不及时 intervene 可能会导致评价器的塑性损失变成致命的问题。
  • results: 该论文的研究结果表明,适应式 RR 可以避免早期阶段的评价器塑性损失,并在后期阶段更频 reuse,从而提高样本效率。
    Abstract Plasticity, the ability of a neural network to evolve with new data, is crucial for high-performance and sample-efficient visual reinforcement learning (VRL). Although methods like resetting and regularization can potentially mitigate plasticity loss, the influences of various components within the VRL framework on the agent's plasticity are still poorly understood. In this work, we conduct a systematic empirical exploration focusing on three primary underexplored facets and derive the following insightful conclusions: (1) data augmentation is essential in maintaining plasticity; (2) the critic's plasticity loss serves as the principal bottleneck impeding efficient training; and (3) without timely intervention to recover critic's plasticity in the early stages, its loss becomes catastrophic. These insights suggest a novel strategy to address the high replay ratio (RR) dilemma, where exacerbated plasticity loss hinders the potential improvements of sample efficiency brought by increased reuse frequency. Rather than setting a static RR for the entire training process, we propose Adaptive RR, which dynamically adjusts the RR based on the critic's plasticity level. Extensive evaluations indicate that Adaptive RR not only avoids catastrophic plasticity loss in the early stages but also benefits from more frequent reuse in later phases, resulting in superior sample efficiency.
    摘要 可塑性(plasticity),即神经网络随新数据不断演化的能力,对高性能、高样本效率的视觉强化学习(VRL)至关重要。尽管重置和正则化等方法可能缓解可塑性损失,但 VRL 框架中各组件对智能体可塑性的影响仍然知之甚少。在这项工作中,我们进行了系统性的实验研究,关注三个主要但尚未充分探索的方面,得到以下有价值的结论:(1)数据增强是维持可塑性的关键;(2)评价器(critic)的可塑性损失是阻碍高效训练的主要瓶颈;(3)若不在早期阶段及时干预以恢复评价器的可塑性,其损失将变得灾难性。这些结论启发了一种应对高重放比率(replay ratio, RR)困境的新策略:加剧的可塑性损失会抵消更高重用频率本可带来的样本效率提升。与其在整个训练过程中设置固定的 RR,我们提出自适应 RR(Adaptive RR),根据评价器的可塑性水平动态调整 RR。大量评估表明,Adaptive RR 不仅能避免早期阶段灾难性的可塑性损失,还能在后期阶段受益于更频繁的重用,从而获得更优的样本效率。
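
A minimal sketch of the Adaptive RR idea described above: the replay ratio (gradient updates per environment step) stays low while a plasticity indicator for the critic is poor and is raised once the critic has recovered. The plasticity metric, thresholds, and function names below are placeholders, not the released implementation (see the linked repository for that).

```python
# Sketch of Adaptive Replay Ratio; the plasticity measure is a stand-in,
# e.g. the fraction of active (non-dormant) units in the critic network.

def adaptive_replay_ratio(plasticity: float,
                          rr_low: int = 1,
                          rr_high: int = 4,
                          threshold: float = 0.7) -> int:
    """Return the replay ratio to use for the next interaction step."""
    return rr_high if plasticity >= threshold else rr_low


def training_step(env_step, update_critic, measure_plasticity):
    """One interaction step followed by an adaptive number of gradient updates."""
    batch = env_step()                            # collect one transition
    rr = adaptive_replay_ratio(measure_plasticity())
    for _ in range(rr):                           # reuse the buffer rr times
        update_critic(batch)
```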

What can knowledge graph alignment gain with Neuro-Symbolic learning approaches?

  • paper_url: http://arxiv.org/abs/2310.07417
  • repo_url: None
  • paper_authors: Pedro Giesteira Cotovio, Ernesto Jimenez-Ruiz, Catia Pesquita
  • for: 本研究旨在探讨现有知识图谱(KG)对齐算法的局限性,以及如何通过结合符号学习与神经学习的混合(neuro-symbolic)学习模型来改进 KGA 的性能和可解释性。
  • methods: 本研究对现有 KGA 算法、深度学习模型以及相关的神经学习与符号学习方法进行比较和分析。
  • results: 研究发现,结合符号学习与神经学习的混合学习模型有望提高 KGA 的性能和可解释性,并支持以人为中心的验证方法。
    Abstract Knowledge Graphs (KG) are the backbone of many data-intensive applications since they can represent data coupled with its meaning and context. Aligning KGs across different domains and providers is necessary to afford a fuller and integrated representation. A severe limitation of current KG alignment (KGA) algorithms is that they fail to articulate logical thinking and reasoning with lexical, structural, and semantic data learning. Deep learning models are increasingly popular for KGA inspired by their good performance in other tasks, but they suffer from limitations in explainability, reasoning, and data efficiency. Hybrid neurosymbolic learning models hold the promise of integrating logical and data perspectives to produce high-quality alignments that are explainable and support validation through human-centric approaches. This paper examines the current state of the art in KGA and explores the potential for neurosymbolic integration, highlighting promising research directions for combining these fields.
    摘要 知识 graphs (KG) 是许多数据敏感应用的重要组成部分,因为它们可以表示数据以及其意义和上下文。对不同领域和提供商的 KG 进行对接是必要的,以便获得更加全面和集成的表示。当前的 KG 对应 (KGA) 算法有一定的限制,它们无法体现逻辑思维和语言、结构和 semantics 数据学习的相互作用。深入学习模型在 KGA 方面具有良好的表现,但它们受到解释性、逻辑和数据效率的限制。混合 neuralsymbolic 学习模型可以结合逻辑和数据视角,生成高质量的对接,同时可以提供可解释的结果和人类中心的验证方法。本文将对当前 KGA 领域的状况进行检查,并探讨将这两个领域结合在一起的潜在研究方向。

DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation

  • paper_url: http://arxiv.org/abs/2310.07403
  • repo_url: https://github.com/ictnlp/daspeech
  • paper_authors: Qingkai Fang, Yan Zhou, Yang Feng
  • for: DASpeech is a non-autoregressive direct speech-to-speech translation (S2ST) model that aims to achieve both high-quality translations and fast decoding speeds.
  • methods: DASpeech uses a two-pass architecture that decomposes generation into two steps: a linguistic decoder (the DA-Transformer decoder) first generates the target text, and an acoustic decoder (FastSpeech 2) then generates the target speech from the linguistic decoder's hidden states. DA-Transformer models translations with a directed acyclic graph (DAG), and dynamic programming is used to compute the expected hidden states for each target token during training.
  • results: DASpeech achieves comparable or better performance than the state-of-the-art S2ST model Translatotron 2 while preserving up to an 18.53x speedup over the autoregressive baseline, improves on previous non-autoregressive S2ST models in both translation quality and decoding speed without knowledge distillation or iterative decoding, and preserves the speaker's voice of the source speech.
    Abstract Direct speech-to-speech translation (S2ST) translates speech from one language into another using a single model. However, due to the presence of linguistic and acoustic diversity, the target speech follows a complex multimodal distribution, posing challenges to achieving both high-quality translations and fast decoding speeds for S2ST models. In this paper, we propose DASpeech, a non-autoregressive direct S2ST model which realizes both fast and high-quality S2ST. To better capture the complex distribution of the target speech, DASpeech adopts the two-pass architecture to decompose the generation process into two steps, where a linguistic decoder first generates the target text, and an acoustic decoder then generates the target speech based on the hidden states of the linguistic decoder. Specifically, we use the decoder of DA-Transformer as the linguistic decoder, and use FastSpeech 2 as the acoustic decoder. DA-Transformer models translations with a directed acyclic graph (DAG). To consider all potential paths in the DAG during training, we calculate the expected hidden states for each target token via dynamic programming, and feed them into the acoustic decoder to predict the target mel-spectrogram. During inference, we select the most probable path and take hidden states on that path as input to the acoustic decoder. Experiments on the CVSS Fr-En benchmark demonstrate that DASpeech can achieve comparable or even better performance than the state-of-the-art S2ST model Translatotron 2, while preserving up to 18.53x speedup compared to the autoregressive baseline. Compared with the previous non-autoregressive S2ST model, DASpeech does not rely on knowledge distillation and iterative decoding, achieving significant improvements in both translation quality and decoding speed. Furthermore, DASpeech shows the ability to preserve the speaker's voice of the source speech during translation.
    摘要 直接Speech-to-Speech翻译(S2ST)模型可以将一种语言的语音翻译成另一种语言。然而,由于语言和听音多样性,目标语音表现出复杂的多Modal分布,这会对S2ST模型的高质量翻译和快速解码速度带来挑战。在这篇论文中,我们提出了DASpeech模型,这是一种非autoregressive的直接S2ST模型,可以同时实现高质量翻译和快速解码速度。为了更好地捕捉目标语音的复杂分布,DASpeech采用了两个过程来分解生成过程,其中一个是语言解码器,它首先生成目标文本;另一个是听音解码器,它根据语言解码器生成的隐藏状态来生成目标听音spectrogram。我们使用DA-Transformer模型的解码器作为语言解码器,并使用FastSpeech 2模型作为听音解码器。在训练时,我们使用DAG模型来表示翻译,并通过动态计算隐藏状态来考虑所有可能的路径。在推理时,我们选择最有可能性的路径,并将隐藏状态feed into听音解码器来预测目标听音spectrogram。实验结果表明,DASpeech可以与状态艺术S2ST模型Translatotron 2进行比较,同时保持18.53倍的速度提升。与之前的非autoregressive S2ST模型相比,DASpeech不需要知识储存和迭代解码,它可以实现显著的改进 both translation quality和解码速度。此外,DASpeech还可以保持源语音的 speaker voice。

NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time Series Pretraining

  • paper_url: http://arxiv.org/abs/2310.07402
  • repo_url: None
  • paper_authors: Chenguo Lin, Xumeng Wen, Wei Cao, Congrui Huang, Jiang Bian, Stephen Lin, Zhirong Wu
  • for: 学习时序数据的 semantic 表示。
  • methods: 采用 Transformer 架构,首先将输入分割成不重叠的窗口,然后对每个窗口进行 numerically multi-scaled embedding。
  • results: 在多个单变量和多变量分类基准上取得显著提升,并确立了新的最先进水平,即便与特定领域的非学习方法相比也是如此。
    Abstract Recent research on time-series self-supervised models shows great promise in learning semantic representations. However, it has been limited to small-scale datasets, e.g., thousands of temporal sequences. In this work, we make key technical contributions that are tailored to the numerical properties of time-series data and allow the model to scale to large datasets, e.g., millions of temporal sequences. We adopt the Transformer architecture by first partitioning the input into non-overlapping windows. Each window is then characterized by its normalized shape and two scalar values denoting the mean and standard deviation within each window. To embed scalar values that may possess arbitrary numerical scales to high-dimensional vectors, we propose a numerically multi-scaled embedding module enumerating all possible scales for the scalar values. The model undergoes pretraining using the proposed numerically multi-scaled embedding with a simple contrastive objective on a large-scale dataset containing over a million sequences. We study its transfer performance on a number of univariate and multivariate classification benchmarks. Our method exhibits remarkable improvement against previous representation learning approaches and establishes the new state of the art, even compared with domain-specific non-learning-based methods.
    摘要 We adopt the Transformer architecture by first partitioning the input into non-overlapping windows. Each window is then characterized by its normalized shape and two scalar values denoting the mean and standard deviation within each window. To embed scalar values that may possess arbitrary numerical scales into high-dimensional vectors, we propose a numerically multi-scaled embedding module that enumerates all possible scales for the scalar values.The model undergoes pretraining using the proposed numerically multi-scaled embedding with a simple contrastive objective on a large-scale dataset containing over a million sequences. We study its transfer performance on a number of univariate and multivariate classification benchmarks. Our method exhibits remarkable improvement compared to previous representation learning approaches and establishes a new state of the art, even compared with domain-specific non-learning-based methods.
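
To make the window-level input described above concrete, here is a small NumPy sketch under assumed shapes: the series is cut into non-overlapping windows, each reduced to its normalized shape plus the window mean and standard deviation, and the two scalars are replicated across an enumerated set of numerical scales to hint at the multi-scaled embedding (the real module learns per-scale projections); all names and scale values are illustrative.

```python
import numpy as np

def windowize(series: np.ndarray, window: int) -> np.ndarray:
    """Split a 1-D series into non-overlapping windows of length `window`."""
    n = (len(series) // window) * window
    return series[:n].reshape(-1, window)

def window_features(series: np.ndarray, window: int,
                    scales=(1e-2, 1e-1, 1.0, 1e1, 1e2)):
    wins = windowize(series, window)
    mean = wins.mean(axis=1, keepdims=True)
    std = wins.std(axis=1, keepdims=True) + 1e-8
    shape = (wins - mean) / std                         # normalized window shape
    # enumerate several numerical scales for the two scalars (sketch only)
    scaled = np.stack([np.concatenate([mean, std], axis=1) / s for s in scales], axis=1)
    return shape, scaled

if __name__ == "__main__":
    x = np.random.randn(1000) * 50 + 300
    shape, scaled = window_features(x, window=50)
    print(shape.shape, scaled.shape)   # (20, 50) (20, 5, 2)
```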

Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation

  • paper_url: http://arxiv.org/abs/2310.07397
  • repo_url: https://github.com/iwangjian/topdial
  • paper_authors: Jian Wang, Yi Cheng, Dongding Lin, Chak Tou Leong, Wenjie Li
  • for: 研究目标导向对话系统,使对话系统能够主动将对话引导至预定目标或实现系统侧目标。
  • methods: 将 <dialogue act, topic> 对作为对话目标,在目标达成过程中考虑个性化;并提出一个基于角色扮演的自动数据集构建框架,用于生成大规模的个性化目标导向对话数据集。
  • results: 实验表明,构建出的个性化目标导向对话数据集 TopDial(约 18K 段多轮对话)质量较高,可用于探索个性化目标导向对话。
    Abstract Target-oriented dialogue systems, designed to proactively steer conversations toward predefined targets or accomplish specific system-side goals, are an exciting area in conversational AI. In this work, by formulating a <dialogue act, topic> pair as the conversation target, we explore a novel problem of personalized target-oriented dialogue by considering personalization during the target accomplishment process. However, there remains an emergent need for high-quality datasets, and building one from scratch requires tremendous human effort. To address this, we propose an automatic dataset curation framework using a role-playing approach. Based on this framework, we construct a large-scale personalized target-oriented dialogue dataset, TopDial, which comprises about 18K multi-turn dialogues. The experimental results show that this dataset is of high quality and could contribute to exploring personalized target-oriented dialogue.
    摘要 目标导向对话系统旨在主动将对话引导至预定目标或完成系统侧目标,是对话式人工智能中一个令人兴奋的领域。在这项工作中,我们将〈对话行为,话题〉对形式化为对话目标,并在目标达成过程中考虑个性化,从而探索个性化目标导向对话这一新问题。然而,目前仍亟需高质量的数据集,而从零开始构建数据集需要大量人力。为此,我们提出了一个基于角色扮演方法的自动数据集构建框架。基于该框架,我们构建了一个大规模的个性化目标导向对话数据集 TopDial,包含约 18K 段多轮对话。实验结果表明,该数据集质量较高,有助于探索个性化目标导向对话。

Learning a Reward Function for User-Preferred Appliance Scheduling

  • paper_url: http://arxiv.org/abs/2310.07389
  • repo_url: https://github.com/nikskiks/learning-reward-function-demand-response
  • paper_authors: Nikolina Čović, Jochen Cremer, Hrvoje Pandžić
  • for: 要降低电力部门的碳排放,需要加快住宅部门提供需求响应服务的进程。
  • methods: 使用基于逆强化学习的模型,利用住宅用户的历史用电数据生成其每日家用电器运行时间表。
  • results: 无需住宅用户明确表达其需求和偏好,即可让他们隐式地参与服务设计与决策过程,并借助经济或环保动机鼓励其持续参与需求响应服务。
    Abstract Accelerated development of demand response service provision by the residential sector is crucial for reducing carbon-emissions in the power sector. Along with the infrastructure advancement, encouraging the end users to participate is crucial. End users highly value their privacy and control, and want to be included in the service design and decision-making process when creating the daily appliance operation schedules. Furthermore, unless they are financially or environmentally motivated, they are generally not prepared to sacrifice their comfort to help balance the power system. In this paper, we present an inverse-reinforcement-learning-based model that helps create the end users' daily appliance schedules without asking them to explicitly state their needs and wishes. By using their past consumption data, the end consumers will implicitly participate in the creation of those decisions and will thus be motivated to continue participating in the provision of demand response services.
    摘要 加速了住宅部分的需求应答服务提供的发展是减少能源部门碳排放的关键。同时,激励终端用户参与是关键。终端用户强烈关注隐私和控制,希望在日常家用电器运行时间的设计和决策过程中被包括。此外,如果他们没有经济或环境上的驱动力,他们通常不愿意为了帮助平衡能源系统而做出牺牲。本文提出了一种基于逆激励学习的模型,可以无需直接询问终端用户需求和愿望,通过使用其过去的消耗数据,使终端用户在创造这些决策过程中implicitly参与,从而被激励继续参与提供需求应答服务。

Histopathological Image Classification and Vulnerability Analysis using Federated Learning

  • paper_url: http://arxiv.org/abs/2310.07380
  • repo_url: None
  • paper_authors: Sankalp Vyas, Amar Nath Patra, Raj Mani Shukla
  • for: 革新健康预测技术,保护用户隐私
  • methods: 联邦学习(Federated Learning)技术,在皮肤癌数据集上进行预测
  • results: 发现联邦学习容易受到数据毒素攻击,影响模型的精度。透过测试实验,发现当数据毒素比例增加时,模型的精度会显著下降。
    Abstract Healthcare is one of the foremost applications of machine learning (ML). Traditionally, ML models are trained by central servers, which aggregate data from various distributed devices to forecast the results for newly generated data. This is a major concern as models can access sensitive user information, which raises privacy concerns. A federated learning (FL) approach can help address this issue: A global model sends its copy to all clients who train these copies, and the clients send the updates (weights) back to it. Over time, the global model improves and becomes more accurate. Data privacy is protected during training, as it is conducted locally on the clients' devices. However, the global model is susceptible to data poisoning. We develop a privacy-preserving FL technique for a skin cancer dataset and show that the model is prone to data poisoning attacks. Ten clients train the model, but one of them intentionally introduces flipped labels as an attack. This reduces the accuracy of the global model. As the percentage of label flipping increases, there is a noticeable decrease in accuracy. We use a stochastic gradient descent optimization algorithm to find the most optimal accuracy for the model. Although FL can protect user privacy for healthcare diagnostics, it is also vulnerable to data poisoning, which must be addressed.
    摘要 医疗是机器学习(ML)的重要应用领域之一。传统上,ML 模型由中央服务器训练,该服务器汇聚来自各个分布式设备的数据,以对新产生的数据进行预测。这会引发隐私问题,因为模型可以访问敏感的用户信息。联邦学习(FL)方法可以帮助解决这个问题:全局模型将其副本发送给所有客户端,客户端训练这些副本,并将更新(权重)回传给全局模型。随着时间推移,全局模型不断改进,精度逐渐提高。由于训练在客户端设备上本地进行,训练过程中的数据隐私得到保护。然而,全局模型容易受到数据投毒攻击。我们为一个皮肤癌数据集开发了一种保护隐私的 FL 技术,并展示了该模型易受数据投毒攻击:十个客户端共同训练模型,其中一个客户端故意引入翻转标签作为攻击,这降低了全局模型的准确率;随着标签翻转比例的增加,准确率显著下降。我们使用随机梯度下降优化算法为模型寻找最优的准确率。尽管 FL 能在医疗诊断中保护用户隐私,但它同样易受数据投毒攻击,这一问题必须得到解决。
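
A minimal sketch (not the paper's code) of the experimental setup described above: each client trains locally, one malicious client flips a fraction of its labels before training, and the server averages the returned weights FedAvg-style. The `local_train` callable, the class count, and the averaging scheme are assumptions for illustration.

```python
import numpy as np

def flip_labels(y: np.ndarray, fraction: float, num_classes: int, rng) -> np.ndarray:
    """Flip `fraction` of the labels to a different (random) class."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y[idx] = (y[idx] + rng.integers(1, num_classes, size=len(idx))) % num_classes
    return y

def federated_round(global_weights, clients, local_train,
                    malicious_id=0, flip_fraction=0.3):
    """clients: list of (X, y); local_train returns a list of weight arrays."""
    rng = np.random.default_rng(0)
    updates = []
    for cid, (X, y) in enumerate(clients):
        if cid == malicious_id:                      # the data-poisoning client
            y = flip_labels(y, flip_fraction, num_classes=2, rng=rng)
        updates.append(local_train(global_weights, X, y))
    # plain FedAvg: average each layer's weights across clients
    return [np.mean(layer, axis=0) for layer in zip(*updates)]
```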

Causal Unsupervised Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2310.07379
  • repo_url: https://github.com/byungkwanlee/causal-unsupervised-segmentation
  • paper_authors: Junho Kim, Byung-Kwan Lee, Yong Man Ro
  • for: 这个论文的目的是提出一种新的无监督 semantic segmentation 方法,以实现高质量的semantic grouping无需人工标注。
  • methods: 该方法利用自动预训练的特征进行train prediction heads,并采用 causal inference 的思想来定义适当的 clustering 级别。
  • results: 通过大量实验和分析,该方法在不同的数据集上达到了无监督 semantic segmentation 的州OF-THE-ART表现。
    Abstract Unsupervised semantic segmentation aims to achieve high-quality semantic grouping without human-labeled annotations. With the advent of self-supervised pre-training, various frameworks utilize the pre-trained features to train prediction heads for unsupervised dense prediction. However, a significant challenge in this unsupervised setup is determining the appropriate level of clustering required for segmenting concepts. To address it, we propose a novel framework, CAusal Unsupervised Semantic sEgmentation (CAUSE), which leverages insights from causal inference. Specifically, we bridge intervention-oriented approach (i.e., frontdoor adjustment) to define suitable two-step tasks for unsupervised prediction. The first step involves constructing a concept clusterbook as a mediator, which represents possible concept prototypes at different levels of granularity in a discretized form. Then, the mediator establishes an explicit link to the subsequent concept-wise self-supervised learning for pixel-level grouping. Through extensive experiments and analyses on various datasets, we corroborate the effectiveness of CAUSE and achieve state-of-the-art performance in unsupervised semantic segmentation.
    摘要 无监督语义分割旨在不依赖人工标注实现高质量的语义分组。随着自监督预训练的出现,各种框架利用预训练特征来训练预测头,以实现无监督的稠密预测。然而,在这种无监督设置中,确定分割概念所需的合适聚类粒度是一个重要挑战。为解决这个问题,我们提出了一种新的框架 CAusal Unsupervised Semantic sEgmentation(CAUSE),它借鉴了因果推断的思想。具体来说,我们引入面向干预的方法(即前门调整),为无监督预测定义合适的两步任务。第一步是构建概念聚类簿(concept clusterbook)作为中介变量,它以离散形式表示不同粒度下可能的概念原型。随后,该中介变量与后续按概念进行的自监督学习建立显式联系,以实现像素级分组。通过在多个数据集上的大量实验和分析,我们验证了 CAUSE 的有效性,并在无监督语义分割中取得了最先进的性能。

Point Cloud Denoising and Outlier Detection with Local Geometric Structure by Dynamic Graph CNN

  • paper_url: http://arxiv.org/abs/2310.07376
  • repo_url: None
  • paper_authors: Kosuke Nakayama, Hiroto Fukuta, Hiroshi Watanabe
  • for: 点云数据清洗和异常检测
  • methods: 应用两种基于动态图 convolutional layer的方法
  • results: 提出的方法在AUPR和Chamfer Distance上表现出色,比传统方法更高的异常检测精度和清洗精度
    Abstract The digitalization of society is rapidly developing toward the realization of the digital twin and metaverse. In particular, point clouds are attracting attention as a media format for 3D space. Point cloud data is contaminated with noise and outliers due to measurement errors. Therefore, denoising and outlier detection are necessary for point cloud processing. Among them, PointCleanNet is an effective method for point cloud denoising and outlier detection. However, it does not consider the local geometric structure of the patch. We solve this problem by applying two types of graph convolutional layer designed based on the Dynamic Graph CNN. Experimental results show that the proposed methods outperform the conventional method in AUPR, which indicates outlier detection accuracy, and Chamfer Distance, which indicates denoising accuracy.
    摘要 社会的数字化正朝着数字孪生和元宇宙的实现快速发展,其中点云作为三维空间的媒体格式备受关注。由于测量误差,点云数据会受到噪声和离群点的污染,因此点云处理需要去噪和离群点检测。其中,PointCleanNet 是一种有效的点云去噪与离群点检测方法,但它没有考虑局部几何结构。我们通过应用两种基于 Dynamic Graph CNN 设计的图卷积层来解决这个问题。实验结果表明,所提出的方法在表示离群点检测精度的 AUPR 和表示去噪精度的 Chamfer Distance 上均优于传统方法。
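
The dynamic-graph idea behind DGCNN-style layers can be sketched briefly: for every point, find its k nearest neighbours and form edge features [x_i, x_j - x_i] that a shared MLP would then process, rebuilding the graph at each layer. The PyTorch snippet below is an illustrative sketch with assumed shapes, not the authors' architecture.

```python
import torch

def knn_edge_features(points: torch.Tensor, k: int = 8) -> torch.Tensor:
    """points: (N, 3) -> edge features of shape (N, k, 6)."""
    dists = torch.cdist(points, points)                      # (N, N) pairwise distances
    idx = dists.topk(k + 1, largest=False).indices[:, 1:]    # k neighbours, skip self
    neighbours = points[idx]                                 # (N, k, 3)
    centers = points.unsqueeze(1).expand(-1, k, -1)          # (N, k, 3)
    return torch.cat([centers, neighbours - centers], dim=-1)

if __name__ == "__main__":
    cloud = torch.randn(128, 3)
    feats = knn_edge_features(cloud)
    print(feats.shape)   # torch.Size([128, 8, 6])
```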

Give and Take: Federated Transfer Learning for Industrial IoT Network Intrusion Detection

  • paper_url: http://arxiv.org/abs/2310.07354
  • repo_url: None
  • paper_authors: Lochana Telugu Rajesh, Tapadhir Das, Raj Mani Shukla, Shamik Sengupta
  • for: 这篇论文提出一种联邦迁移学习(Federated Transfer Learning,FTL)方法,用于工业物联网(IIoT)中的网络入侵检测。
  • methods: 论文提出一种组合式神经网络(Combinational Neural Network)作为 FTL 的核心;IIoT 数据被划分到客户端和服务器两端以生成相应的模型,然后利用客户端模型的权重来更新服务器模型。
  • results: 实验结果显示,FTL 设置在 IIoT 客户端与服务器之间的迭代过程中表现出高性能,并且在网络入侵检测方面优于现有的机器学习算法。
    Abstract The rapid growth in Internet of Things (IoT) technology has become an integral part of today's industries forming the Industrial IoT (IIoT) initiative, where industries are leveraging IoT to improve communication and connectivity via emerging solutions like data analytics and cloud computing. Unfortunately, the rapid use of IoT has made it an attractive target for cybercriminals. Therefore, protecting these systems is of utmost importance. In this paper, we propose a federated transfer learning (FTL) approach to perform IIoT network intrusion detection. As part of the research, we also propose a combinational neural network as the centerpiece for performing FTL. The proposed technique splits IoT data between the client and server devices to generate corresponding models, and the weights of the client models are combined to update the server model. Results showcase high performance for the FTL setup between iterations on both the IIoT clients and the server. Additionally, the proposed FTL setup achieves better overall performance than contemporary machine learning algorithms at performing network intrusion detection.
    摘要 物联网(IoT)技术的快速发展已成为当今各行业不可或缺的组成部分,形成了工业物联网(IIoT)计划:各行业借助 IoT,通过数据分析和云计算等新兴方案改进通信与连接。然而,IoT 的快速普及也使其成为网络犯罪分子的攻击目标,因此保护这些系统至关重要。在本文中,我们提出了一种联邦迁移学习(FTL)方法来进行 IIoT 网络入侵检测。作为研究的一部分,我们还提出了一种组合式神经网络作为执行 FTL 的核心。所提出的技术将 IoT 数据划分到客户端和服务器设备上以生成相应的模型,并将客户端模型的权重组合起来以更新服务器模型。结果表明,FTL 设置在 IIoT 客户端和服务器的迭代过程中均表现出很高的性能;此外,所提出的 FTL 设置在网络入侵检测方面的整体性能优于当前的机器学习算法。

Semantic Association Rule Learning from Time Series Data and Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2310.07348
  • repo_url: None
  • paper_authors: Erkan Karabulut, Victoria Degeler, Paul Groth
  • for: 这篇论文的目的是为了提出一个基于知识 graphs 和时间序列数据的 semantic association rule 学习管道,以及一个新的 semantic association rule 标准。
  • methods: 这篇论文使用了知识 graphs 和时间序列数据来学习 semantic association rules,并提出了一种新的 semantic association rule 标准。
  • results: 实验结果表明,提出的方法可以学习出具有 semantic information 的大量 association rules,这些规则更加普适。
    Abstract Digital Twins (DT) are a promising concept in cyber-physical systems research due to their advanced features including monitoring and automated reasoning. Semantic technologies such as Knowledge Graphs (KG) are recently being utilized in DTs especially for information modelling. Building on this move, this paper proposes a pipeline for semantic association rule learning in DTs using KGs and time series data. In addition to this initial pipeline, we also propose new semantic association rule criterion. The approach is evaluated on an industrial water network scenario. Initial evaluation shows that the proposed approach is able to learn a high number of association rules with semantic information which are more generalizable. The paper aims to set a foundation for further work on using semantic association rule learning especially in the context of industrial applications.
    摘要 数字孪生(DT)凭借监测和自动推理等先进功能,是信息物理系统研究中一个很有前景的概念。知识图谱(KG)等语义技术最近开始被用于 DT,尤其是在信息建模方面。基于这一趋势,本文提出了一条利用知识图谱和时间序列数据在 DT 中进行语义关联规则学习的流水线。除了这条初始流水线之外,我们还提出了新的语义关联规则评判标准。我们在一个工业供水管网场景上对该方法进行了评估。初步评估表明,所提出的方法能够学习到大量带有语义信息、且更具泛化性的关联规则。本文旨在为进一步研究语义关联规则学习(特别是在工业应用背景下)奠定基础。
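
As a rough, hypothetical illustration of mining association rules from discretized sensor readings enriched with knowledge-graph information (not the authors' pipeline or their new rule criterion), the self-contained sketch below counts itemset supports over a few invented transactions and emits simple rules above a confidence threshold; the item names and thresholds are made up for the example.

```python
from itertools import combinations

# each transaction: discretized time-series readings plus a KG-derived type item
transactions = [
    {"Pump:flow_high", "Junction:pressure_low", "type:Pump"},
    {"Pump:flow_high", "Junction:pressure_low", "type:Pump"},
    {"Pump:flow_low", "Junction:pressure_high", "type:Pump"},
]

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# enumerate simple one-antecedent -> one-consequent rules
items = sorted(set().union(*transactions))
for a, b in combinations(items, 2):
    for ante, cons in ((a, b), (b, a)):
        supp = support({ante, cons})
        if supp >= 0.5 and support({ante}) > 0:
            conf = supp / support({ante})
            if conf >= 0.8:
                print(f"{ante} => {cons}  support={supp:.2f} confidence={conf:.2f}")
```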

Fast-ELECTRA for Efficient Pre-training

  • paper_url: http://arxiv.org/abs/2310.07347
  • repo_url: None
  • paper_authors: Chengyu Dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu
  • for: 本文针对ELECTRA预训方法进行改进,以提高效率和稳定性。
  • methods: 使用现有的语言模型作为辅助模型,并按照递减调度对其输出分布进行温度缩放,从而为主模型构建学习课程。
  • results: 研究显示,利用现有语言模型作为辅助模型可以帮助提升 ELECTRA 的效率和稳定性,并且与现有最先进的 ELECTRA 式预训练方法性能相当。
    Abstract ELECTRA pre-trains language models by detecting tokens in a sequence that have been replaced by an auxiliary model. Although ELECTRA offers a significant boost in efficiency, its potential is constrained by the training cost brought by the auxiliary model. Notably, this model, which is jointly trained with the main model, only serves to assist the training of the main model and is discarded post-training. This results in a substantial amount of training cost being expended in vain. To mitigate this issue, we propose Fast-ELECTRA, which leverages an existing language model as the auxiliary model. To construct a learning curriculum for the main model, we smooth its output distribution via temperature scaling following a descending schedule. Our approach rivals the performance of state-of-the-art ELECTRA-style pre-training methods, while significantly eliminating the computation and memory cost brought by the joint training of the auxiliary model. Our method also reduces the sensitivity to hyper-parameters and enhances the pre-training stability.
    摘要 ELECTRA 通过检测序列中被辅助模型替换过的词元来预训练语言模型。尽管 ELECTRA 显著提升了效率,其潜力仍受限于辅助模型带来的训练成本。值得注意的是,这个与主模型联合训练的辅助模型只用于协助主模型的训练,训练结束后即被丢弃,导致大量训练成本被白白浪费。为缓解这一问题,我们提出 Fast-ELECTRA,利用一个现有的语言模型作为辅助模型。为了给主模型构建学习课程,我们按照递减调度对辅助模型的输出分布进行温度缩放以使其平滑。我们的方法可以与最先进的 ELECTRA 式预训练方法性能相当,同时显著消除了联合训练辅助模型所带来的计算和内存开销,并降低了对超参数的敏感度、增强了预训练的稳定性。
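
A minimal sketch of the temperature-scaled curriculum described above: the frozen auxiliary model's output distribution is smoothed with a temperature that descends over training, so the replaced-token detection task starts easier and gradually hardens. The schedule endpoints and function names are assumptions, not the paper's exact settings.

```python
import torch

def descending_temperature(step: int, total_steps: int,
                           t_start: float = 2.0, t_end: float = 1.0) -> float:
    """Linearly anneal the sampling temperature from t_start down to t_end."""
    frac = min(step / max(total_steps, 1), 1.0)
    return t_start + (t_end - t_start) * frac

def sample_replacements(aux_logits: torch.Tensor, step: int, total_steps: int) -> torch.Tensor:
    """aux_logits: (batch, seq, vocab) from the frozen auxiliary model.
    Returns sampled replacement tokens of shape (batch, seq)."""
    temp = descending_temperature(step, total_steps)
    probs = torch.softmax(aux_logits / temp, dim=-1)   # smoother early in training
    return torch.distributions.Categorical(probs=probs).sample()
```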

Exploring Social Motion Latent Space and Human Awareness for Effective Robot Navigation in Crowded Environments

  • paper_url: http://arxiv.org/abs/2310.07335
  • repo_url: None
  • paper_authors: Junaid Ahmed Ansari, Satyajit Tourani, Gourav Kumar, Brojeshwar Bhowmick
  • for: 本研究提出了一种基于社交动作潜在空间学习的社交机器人导航方法,以提高社交导航指标(如成功率、导航时间、轨迹长度),并生成更平滑、更预测性的轨迹。
  • methods: 该方法利用社交动作潜在空间来生成机器人控制,并通过对比基线模型而证明其超越性。
  • results: 研究表明,包含人类意识的社交机器人导航框架可以生成更短、更平滑的轨迹,是因为人类可以正面与机器人互动。
    Abstract This work proposes a novel approach to social robot navigation by learning to generate robot controls from a social motion latent space. By leveraging this social motion latent space, the proposed method achieves significant improvements in social navigation metrics such as success rate, navigation time, and trajectory length while producing smoother (less jerk and angular deviations) and more anticipatory trajectories. The superiority of the proposed method is demonstrated through comparison with baseline models in various scenarios. Additionally, the concept of humans' awareness towards the robot is introduced into the social robot navigation framework, showing that incorporating human awareness leads to shorter and smoother trajectories owing to humans' ability to positively interact with the robot.
    摘要 这项工作提出了一种新的社交机器人导航方法,通过学习从社交运动潜在空间生成机器人控制。借助这一社交运动潜在空间,所提出的方法在成功率、导航时间和轨迹长度等社交导航指标上取得显著提升,同时生成更平滑(抖动和角度偏移更少)、更具前瞻性的轨迹。在多种场景下与基线模型的对比证明了所提方法的优越性。此外,我们将人类对机器人的感知这一概念引入社交机器人导航框架,结果表明,由于人类能够积极地与机器人互动,引入人类感知后轨迹更短、更平滑。

An Empirical Study of Instruction-tuning Large Language Models in Chinese

  • paper_url: http://arxiv.org/abs/2310.07328
  • repo_url: https://github.com/phoebussi/alpaca-cot
  • paper_authors: Qingyi Si, Tong Wang, Zheng Lin, Xu Zhang, Yanan Cao, Weiping Wang
  • for: 这 paper 旨在对中文大语言模型(LLMs)进行深入的实验研究,以便更好地适应中文指令。
  • methods: 本 paper 使用了三个关键元素进行实验研究:LLM bases、参数效率方法和指令数据类型。此外,它还研究了其他因素,如 chain-of-thought 数据和人类价值对Alignment。
  • results: 本 paper 的实验结果表明,通过对 LLM bases、参数效率方法和指令数据类型进行调整,可以实现更好地适应中文指令的中文 LLMS。 Code 和数据可以在 https://github.com/PhoebusSi/Alpaca-CoT 上获取。
    Abstract The success of ChatGPT validates the potential of large language models (LLMs) in artificial general intelligence (AGI). Subsequently, the release of LLMs has sparked the open-source community's interest in instruction-tuning, which is deemed to accelerate ChatGPT's replication process. However, research on instruction-tuning LLMs in Chinese, the world's most spoken language, is still in its early stages. Therefore, this paper makes an in-depth empirical study of instruction-tuning LLMs in Chinese, which can serve as a cookbook that provides valuable findings for effectively customizing LLMs that can better respond to Chinese instructions. Specifically, we systematically explore the impact of LLM bases, parameter-efficient methods, instruction data types, which are the three most important elements for instruction-tuning. Besides, we also conduct experiment to study the impact of other factors, e.g., chain-of-thought data and human-value alignment. We hope that this empirical study can make a modest contribution to the open Chinese version of ChatGPT. This paper will release a powerful Chinese LLMs that is comparable to ChatGLM. The code and data are available at https://github.com/PhoebusSi/Alpaca-CoT.
    摘要 ChatGPT 的成功验证了大型语言模型(LLM)在通用人工智能(AGI)方面的潜力。随后,各类 LLM 的发布激发了开源社区对指令微调的兴趣,这被认为有助于加速复现 ChatGPT 的进程。然而,针对中文(全球使用人数最多的语言)的 LLM 指令微调研究仍处于早期阶段。因此,本文对中文 LLM 指令微调进行了深入的实证研究,可作为一本"食谱",为有效定制能够更好响应中文指令的 LLM 提供有价值的发现。具体而言,我们系统地探究了 LLM 基座、参数高效方法和指令数据类型这三个对指令微调最重要的因素的影响。此外,我们还通过实验研究了其他因素的影响,例如思维链数据和人类价值对齐。我们希望这项实证研究能为开放的中文版 ChatGPT 做出微薄贡献。本文将发布一个可与 ChatGLM 相媲美的强大中文 LLM。代码和数据可在 https://github.com/PhoebusSi/Alpaca-CoT 获取。

An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l

  • paper_url: http://arxiv.org/abs/2310.07325
  • repo_url: None
  • paper_authors: James Dao, Yeu-Tong Lau, Can Rager, Jett Janiak
  • for: 该研究探讨了一种4层转换器的内存管理问题,并提供了具体的证据。
  • methods: 研究使用了直观逻辑权重分析技术来分析模型的输出。
  • results: 研究发现,直观逻辑权重分析技术可能提供不准确的结果,因为它不考虑模型中的干净行为。
    Abstract We provide concrete evidence for memory management in a 4-layer transformer. Specifically, we identify clean-up behavior, in which model components consistently remove the output of preceeding components during a forward pass. Our findings suggest that the interpretability technique Direct Logit Attribution provides misleading results. We show explicit examples where this technique is inaccurate, as it does not account for clean-up behavior.
    摘要 我们为一个 4 层 Transformer 中的内存管理提供了具体证据。具体来说,我们识别出"清理"行为,即在一次前向传播中,模型组件会一致地移除前序组件的输出。我们的发现表明,可解释性技术"直接 logit 归因"(Direct Logit Attribution)可能给出误导性的结果。我们给出了明确的例子,说明这种技术因未考虑清理行为而不准确。
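
For readers unfamiliar with Direct Logit Attribution, a bare-bones sketch follows: each component's write to the residual stream at the final position is projected through the unembedding to estimate how much it pushes a chosen logit. Shapes are hypothetical and LayerNorm folding is omitted; as the abstract argues, this attribution ignores later components that "clean up" earlier outputs, which is exactly what can make it misleading.

```python
import torch

def direct_logit_attribution(component_outputs: torch.Tensor,
                             W_U: torch.Tensor,
                             token_id: int) -> torch.Tensor:
    """component_outputs: (num_components, d_model) residual-stream writes at the
    final position (e.g. embeddings, attention heads, MLPs); W_U: (d_model, vocab).
    Returns per-component contributions to the logit of `token_id`."""
    return component_outputs @ W_U[:, token_id]

if __name__ == "__main__":
    comps = torch.randn(6, 512)        # hypothetical component outputs
    W_U = torch.randn(512, 50257)      # hypothetical unembedding matrix
    print(direct_logit_attribution(comps, W_U, token_id=123))
```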

On the Impact of Cross-Domain Data on German Language Models

  • paper_url: http://arxiv.org/abs/2310.07321
  • repo_url: None
  • paper_authors: Amin Dada, Aokun Chen, Cheng Peng, Kaleb E Smith, Ahmad Idrissi-Yaghir, Constantin Marc Seibold, Jianning Li, Lars Heiliger, Xi Yang, Christoph M. Friedrich, Daniel Truhn, Jan Egger, Jiang Bian, Jens Kleesiek, Yonghui Wu
  • for: 本研究目的是探讨数据多样性对大语言模型的影响,以及高质量数据是否能够超越多样性的效果。
  • methods: 研究者使用了五个不同领域的文本数据集,并对这些数据集进行了归一化和分词处理。然后,他们在这些数据集上训练了一系列大语言模型,并对这些模型进行了多个下游任务的benchmark测试。
  • results: 研究者发现,训练在多样性数据集上的模型可以与高质量数据集上的模型相比,在多个下游任务上表现出较好的性能,并且可以提高过去最佳性能的4.45%。
    Abstract Traditionally, large language models have been either trained on general web crawls or domain-specific data. However, recent successes of generative large language models, have shed light on the benefits of cross-domain datasets. To examine the significance of prioritizing data diversity over quality, we present a German dataset comprising texts from five domains, along with another dataset aimed at containing high-quality data. Through training a series of models ranging between 122M and 750M parameters on both datasets, we conduct a comprehensive benchmark on multiple downstream tasks. Our findings demonstrate that the models trained on the cross-domain dataset outperform those trained on quality data alone, leading to improvements up to $4.45\%$ over the previous state-of-the-art. The models are available at https://huggingface.co/ikim-uk-essen
    摘要 传统上,大型语言模型要么在通用网页爬取数据上训练,要么在特定领域数据上训练。然而,生成式大型语言模型近期的成功揭示了跨领域数据集的优势。为了检验数据多样性相对于数据质量的重要性,我们提供了一个包含五个领域文本的德语数据集,以及另一个旨在收录高质量数据的数据集。我们在两个数据集上分别训练了参数量从 122M 到 750M 的一系列模型,并在多个下游任务上进行了全面的基准测试。结果表明,在跨领域数据集上训练的模型优于仅在高质量数据上训练的模型,相比此前的最佳水平提升最高可达 $4.45\%$。模型可在 https://huggingface.co/ikim-uk-essen 获取。

WiGenAI: The Symphony of Wireless and Generative AI via Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.07312
  • repo_url: None
  • paper_authors: Mehdi Letafati, Samad Ali, Matti Latva-aho
  • for: 这 paper 旨在探讨生成AI在无线通信系统中的应用,以铺垫未来的研究。
  • methods: 这 paper 使用了Diffusion-based生成模型,这种新的状态态模型在生成模型中具有最新的状态。
  • results: 这 paper 通过两个案例研究,提出了一种使用Diffusion模型提高无线通信系统的Bit Error Rate的方法,并且在不理想的接收器情况下实现了30%的提高。
    Abstract Innovative foundation models, such as GPT-3 and stable diffusion models, have made a paradigm shift in the realm of artificial intelligence (AI) towards generative AI-based systems. In unison, from data communication and networking perspective, AI and machine learning (AI/ML) algorithms are envisioned to be pervasively incorporated into the future generations of wireless communications systems, highlighting the need for novel AI-native solutions for the emergent communication scenarios. In this article, we outline the applications of generative AI in wireless communication systems to lay the foundations for research in this field. Diffusion-based generative models, as the new state-of-the-art paradigm of generative models, are introduced, and their applications in wireless communication systems are discussed. Two case studies are also presented to showcase how diffusion models can be exploited for the development of resilient AI-native communication systems. Specifically, we propose denoising diffusion probabilistic models (DDPM) for a wireless communication scheme with non-ideal transceivers, where 30% improvement is achieved in terms of bit error rate. As the second application, DDPMs are employed at the transmitter to shape the constellation symbols, highlighting a robust out-of-distribution performance. Finally, future directions and open issues for the development of generative AI-based wireless systems are discussed to promote future research endeavors towards wireless generative AI (WiGenAI).
    摘要 GPT-3 和稳定扩散模型等创新基础模型,使人工智能(AI)领域向基于生成式 AI 的系统发生了范式转变。与此同时,从数据通信和网络的角度看,AI 与机器学习(AI/ML)算法预计将被广泛融入未来几代无线通信系统,这凸显了为新兴通信场景提供 AI 原生解决方案的必要性。本文概述了生成式 AI 在无线通信系统中的应用,为该领域的研究奠定基础。我们介绍了作为生成模型最新范式的基于扩散的生成模型,并讨论了它们在无线通信系统中的应用。文中还给出两个案例研究,展示如何利用扩散模型开发具有韧性的 AI 原生通信系统。具体而言,我们针对收发机非理想的无线通信方案提出了去噪扩散概率模型(DDPM),在误码率方面取得了 30% 的改进。作为第二个应用,DDPM 被用于发射端以塑造星座符号,展示了稳健的分布外性能。最后,我们讨论了发展基于生成式 AI 的无线系统的未来方向和开放问题,以推动面向无线生成式 AI(WiGenAI)的后续研究。
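
To make the DDPM ingredient concrete, here is a minimal NumPy sketch (not the paper's system) of the standard forward noising process x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε applied to QPSK constellation points viewed as 2-D vectors; a denoising network would be trained to invert this step. The beta schedule and symbol set are illustrative choices.

```python
import numpy as np

def ddpm_forward(x0: np.ndarray, t: int, betas: np.ndarray, rng) -> np.ndarray:
    """Sample x_t from q(x_t | x_0) for the given diffusion step t."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    qpsk = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]]) / np.sqrt(2)  # QPSK symbols
    betas = np.linspace(1e-4, 0.02, 1000)                               # linear schedule
    print(ddpm_forward(qpsk, t=200, betas=betas, rng=rng))
```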

RobustGEC: Robust Grammatical Error Correction Against Subtle Context Perturbation

  • paper_url: http://arxiv.org/abs/2310.07299
  • repo_url: https://github.com/hillzhang1999/robustgec
  • paper_authors: Yue Zhang, Leyang Cui, Enbo Zhao, Wei Bi, Shuming Shi
  • for: 评估语法纠错(GEC)系统在上下文扰动下的稳健性。
  • methods: 构建了包含 5,000 个语法纠错实例的基准 RobustGEC,每个实例包括一对原始的错误-纠正句子以及由人工标注者精心设计的五个变体。
  • results: 研究发现当前最先进的 GEC 系统在面对上下文扰动时仍缺乏足够的稳健性;同时提出了一种简单而有效的方法来缓解这一问题。
    Abstract Grammatical Error Correction (GEC) systems play a vital role in assisting people with their daily writing tasks. However, users may sometimes come across a GEC system that initially performs well but fails to correct errors when the inputs are slightly modified. To ensure an ideal user experience, a reliable GEC system should have the ability to provide consistent and accurate suggestions when encountering irrelevant context perturbations, which we refer to as context robustness. In this paper, we introduce RobustGEC, a benchmark designed to evaluate the context robustness of GEC systems. RobustGEC comprises 5,000 GEC cases, each with one original error-correct sentence pair and five variants carefully devised by human annotators. Utilizing RobustGEC, we reveal that state-of-the-art GEC systems still lack sufficient robustness against context perturbations. In addition, we propose a simple yet effective method for remitting this issue.
    摘要 grammatical error correction (GEC) 系统在日常写作任务中扮演着重要的角色。然而,用户可能会在使用 GEC 系统时发现,当输入有所修改时,GEC 系统可能会在初始化时表现良好,但在修改后仍然无法正确地更正错误。为确保理想的用户体验,一个可靠的 GEC 系统应该有能力在不相关的上下文干扰下提供一致和准确的建议。在这篇论文中,我们介绍了 RobustGEC,一个用于评估 GEC 系统的上下文稳定性的库。RobustGEC 包含 5,000 个 GEC 案例,每个案例包含一对原始错误 corrected 句子 pair 和五个由人类标注员所设计的修改案例。利用 RobustGEC,我们发现现有的 GEC 系统仍然缺乏对上下文干扰的抗衡能力。此外,我们也提出了一个简单 yet 有效的方法来解决这个问题。

Beyond Memorization: Violating Privacy Via Inference with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.07298
  • repo_url: None
  • paper_authors: Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev
  • for: 这个研究的目的是研究现有大语言模型(LLM)是否可以通过文本内容来推断个人特征。
  • methods: 研究使用了现有的LLM,并构建了一个基于真实 Reddit 个人资料的数据集,以测试LLM的推断能力。
  • results: 研究发现,现有的LLM可以准确地推断个人特征,例如地点、收入和性别,并且可以在人工智能技术的一个分之一的成本和时间下达到人类水平。此外,研究还探讨了隐私泄露的风险,并发现现有的防御措施无法保护用户隐私。
    Abstract Current privacy research on large language models (LLMs) primarily focuses on the issue of extracting memorized training data. At the same time, models' inference capabilities have increased drastically. This raises the key question of whether current LLMs could violate individuals' privacy by inferring personal attributes from text given at inference time. In this work, we present the first comprehensive study on the capabilities of pretrained LLMs to infer personal attributes from text. We construct a dataset consisting of real Reddit profiles, and show that current LLMs can infer a wide range of personal attributes (e.g., location, income, sex), achieving up to $85\%$ top-1 and $95.8\%$ top-3 accuracy at a fraction of the cost ($100\times$) and time ($240\times$) required by humans. As people increasingly interact with LLM-powered chatbots across all aspects of life, we also explore the emerging threat of privacy-invasive chatbots trying to extract personal information through seemingly benign questions. Finally, we show that common mitigations, i.e., text anonymization and model alignment, are currently ineffective at protecting user privacy against LLM inference. Our findings highlight that current LLMs can infer personal data at a previously unattainable scale. In the absence of working defenses, we advocate for a broader discussion around LLM privacy implications beyond memorization, striving for a wider privacy protection.
    摘要 目前针对大型语言模型(LLM)的隐私研究主要关注其记忆训练数据的问题。与此同时,模型的推理能力大幅提升。这引出一个关键问题:现有的 LLM 是否可能通过在推理时给定的文本推断个人属性,从而侵犯个人隐私?在这项工作中,我们首次对预训练 LLM 从文本推断个人属性的能力进行了全面研究。我们构建了一个由真实 Reddit 个人资料组成的数据集,并表明当前的 LLM 能推断出多种个人属性(如地点、收入、性别),top-1 准确率最高可达 $85\%$,top-3 准确率最高可达 $95.8\%$,而所需成本仅为人工的 $1/100$、时间仅为人工的 $1/240$。随着人们在生活的方方面面越来越多地与 LLM 驱动的聊天机器人互动,我们还探讨了一个新出现的威胁:侵犯隐私的聊天机器人试图通过看似无害的问题套取个人信息。最后,我们表明常见的缓解措施(即文本匿名化和模型对齐)目前无法有效保护用户隐私免受 LLM 推断的侵害。我们的发现突显出,当前的 LLM 能以前所未有的规模推断个人数据。在缺乏有效防御手段的情况下,我们呼吁就 LLM 隐私影响展开超越记忆问题的更广泛讨论,以争取更全面的隐私保护。

Automated Verification of Equivalence Properties in Advanced Logic Programs – Bachelor Thesis

  • paper_url: http://arxiv.org/abs/2310.19806
  • repo_url: None
  • paper_authors: Jan Heuer
  • for: 这篇论文的目的是提供一种自动化形式验证工具,用于验证优化后的子程序能否替换原始子程序。
  • methods: 论文使用翻译工具 anthem,并结合一个面向经典逻辑的自动定理证明器,来验证两个程序的强等价性。
  • results: 论文扩展了 anthem,使其能够验证包含池(pool)、否定和简单选择规则的逻辑程序的强等价性;新版本的 anthem 能将这类程序翻译为经典逻辑。
    Abstract With the increase in industrial applications using Answer Set Programming, the need for formal verification tools, particularly for critical applications, has also increased. During the program optimisation process, it would be desirable to have a tool which can automatically verify whether an optimised subprogram can replace the original subprogram. Formally this corresponds to the problem of verifying the strong equivalence of two programs. In order to do so, the translation tool anthem was developed. It can be used in conjunction with an automated theorem prover for classical logic to verify that two programs are strongly equivalent. With the current version of anthem, only the strong equivalence of positive programs with a restricted input language can be verified. This is a result of the translation $\tau^*$ implemented in anthem that produces formulas in the logic of here-and-there, which coincides with classical logic only for positive programs. This thesis extends anthem in order to overcome these limitations. First, the transformation $\sigma^*$ is presented, which transforms formulas from the logic of here-and-there to classical logic. A theorem formalises how $\sigma^*$ can be used to express equivalence in the logic of here-and-there in classical logic. Second, the translation $\tau^*$ is extended to programs containing pools. Another theorem shows how $\sigma^*$ can be combined with $\tau^*$ to express the strong equivalence of two programs in classical logic. With $\sigma^*$ and the extended $\tau^*$, it is possible to express the strong equivalence of logic programs containing negation, simple choices, and pools. Both the extended $\tau^*$ and $\sigma^*$ are implemented in a new version of anthem. Several examples of logic programs containing pools, negation, and simple choice rules, which the new version of anthem can translate to classical logic, are presented. Some a...
    摘要 随着使用回答集编程(Answer Set Programming)的工业应用不断增多,对形式验证工具的需求也随之增长,尤其是在关键应用中。在程序优化过程中,人们希望有一个工具能够自动验证优化后的子程序是否可以替换原始子程序;形式上,这对应于验证两个程序的强等价性问题。为此,人们开发了翻译工具 anthem,它可以与经典逻辑的自动定理证明器结合使用,以验证两个程序的强等价性。anthem 的当前版本只能验证输入语言受限的正程序(positive programs)的强等价性,这是因为 anthem 中实现的翻译 $\tau^*$ 产生的是 here-and-there 逻辑中的公式,而该逻辑仅对正程序与经典逻辑一致。本论文扩展了 anthem 以克服这些限制。首先,我们提出变换 $\sigma^*$,它将公式从 here-and-there 逻辑转换到经典逻辑;一个定理形式化说明了如何利用 $\sigma^*$ 在经典逻辑中表达 here-and-there 逻辑下的等价性。其次,我们将翻译 $\tau^*$ 扩展到包含池(pool)的程序;另一个定理说明了如何将 $\sigma^*$ 与 $\tau^*$ 结合,以在经典逻辑中表达两个程序的强等价性。借助 $\sigma^*$ 和扩展后的 $\tau^*$,我们可以表达包含否定、简单选择和池的逻辑程序的强等价性。扩展后的 $\tau^*$ 和 $\sigma^*$ 均已在新版 anthem 中实现。我们还给出了若干包含池、否定和简单选择规则的逻辑程序示例,新版 anthem 能将它们翻译为经典逻辑。

An Analysis on Large Language Models in Healthcare: A Case Study of BioBERT

  • paper_url: http://arxiv.org/abs/2310.07282
  • repo_url: None
  • paper_authors: Shyni Sharaf, V. S. Anoop
  • for: This paper explores the application of large language models, specifically BioBERT, in healthcare and its potential benefits for clinical decision support and information retrieval.
  • methods: The paper proposes a systematic methodology for fine-tuning BioBERT to meet the unique needs of the healthcare domain, including data gathering, annotation, and specialized preprocessing techniques.
  • results: The paper evaluates the performance of BioBERT in various healthcare-related tasks, such as medical entity recognition and question-answering, and explores techniques to improve the model’s interpretability. It also acknowledges the ethical considerations and challenges of integrating BioBERT into healthcare contexts.
    Abstract This paper conducts a comprehensive investigation into applying large language models, particularly on BioBERT, in healthcare. It begins with thoroughly examining previous natural language processing (NLP) approaches in healthcare, shedding light on the limitations and challenges these methods face. Following that, this research explores the path that led to the incorporation of BioBERT into healthcare applications, highlighting its suitability for addressing the specific requirements of tasks related to biomedical text mining. The analysis outlines a systematic methodology for fine-tuning BioBERT to meet the unique needs of the healthcare domain. This approach includes various components, including the gathering of data from a wide range of healthcare sources, data annotation for tasks like identifying medical entities and categorizing them, and the application of specialized preprocessing techniques tailored to handle the complexities found in biomedical texts. Additionally, the paper covers aspects related to model evaluation, with a focus on healthcare benchmarks and functions like processing of natural language in biomedical, question-answering, clinical document classification, and medical entity recognition. It explores techniques to improve the model's interpretability and validates its performance compared to existing healthcare-focused language models. The paper thoroughly examines ethical considerations, particularly patient privacy and data security. It highlights the benefits of incorporating BioBERT into healthcare contexts, including enhanced clinical decision support and more efficient information retrieval. Nevertheless, it acknowledges the impediments and complexities of this integration, encompassing concerns regarding data privacy, transparency, resource-intensive requirements, and the necessity for model customization to align with diverse healthcare domains.
    摘要

BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations

  • paper_url: http://arxiv.org/abs/2310.07276
  • repo_url: https://github.com/QizhiPei/BioT5
  • paper_authors: Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, Rui Yan
  • for: The paper aims to enhance drug discovery by integrating molecules, proteins, and natural language.
  • methods: The proposed method, BioT5, uses SELFIES to generate robust molecular representations and extracts knowledge from the surrounding context of bio-entities in unstructured biological literature. It also distinguishes between structured and unstructured knowledge.
  • results: BioT5 shows superior performance across a wide range of tasks, demonstrating its strong capability of capturing underlying relations and properties of bio-entities.
    Abstract Recent advancements in biological research leverage the integration of molecules, proteins, and natural language to enhance drug discovery. However, current models exhibit several limitations, such as the generation of invalid molecular SMILES, underutilization of contextual information, and equal treatment of structured and unstructured knowledge. To address these issues, we propose $\mathbf{BioT5}$, a comprehensive pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations. $\mathbf{BioT5}$ utilizes SELFIES for $100%$ robust molecular representations and extracts knowledge from the surrounding context of bio-entities in unstructured biological literature. Furthermore, $\mathbf{BioT5}$ distinguishes between structured and unstructured knowledge, leading to more effective utilization of information. After fine-tuning, BioT5 shows superior performance across a wide range of tasks, demonstrating its strong capability of capturing underlying relations and properties of bio-entities. Our code is available at $\href{https://github.com/QizhiPei/BioT5}{Github}$.
    摘要 最近的生物学研究进展通过整合分子、蛋白质与自然语言来促进药物发现。然而,当前的模型存在诸多局限,例如会生成无效的分子 SMILES、对上下文信息利用不足,以及对结构化与非结构化知识不加区分地同等对待。为解决这些问题,我们提出了 $\mathbf{BioT5}$,一个全面的预训练框架,利用化学知识和自然语言关联来丰富生物学中的跨模态整合。$\mathbf{BioT5}$ 使用 SELFIES 获得 $100%$ 稳健的分子表示,并从非结构化生物文献中生物实体的上下文中提取知识。此外,$\mathbf{BioT5}$ 区分结构化与非结构化知识,从而更有效地利用信息。经过微调后,BioT5 在广泛的任务上表现出色,展示了其捕捉生物实体内在关系与性质的强大能力。我们的代码可在 $\href{https://github.com/QizhiPei/BioT5}{Github}$ 上获取。
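
A small illustration of the kind of robust molecular representation referred to above, using the open-source `selfies` package: every SELFIES string decodes to a syntactically valid molecule, which is what motivates using it in place of raw SMILES. The aspirin example is an arbitrary illustrative choice, not taken from the paper.

```python
import selfies as sf

smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"   # aspirin, as SMILES
encoded = sf.encoder(smiles)          # SMILES -> SELFIES
decoded = sf.decoder(encoded)         # SELFIES -> SMILES (always a valid molecule)
print(encoded)
print(decoded)
```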

CoPAL: Corrective Planning of Robot Actions with Large Language Models

  • paper_url: http://arxiv.org/abs/2310.07263
  • repo_url: None
  • paper_authors: Frank Joublin, Antonello Ceravola, Pavel Smirnov, Felix Ocker, Joerg Deigmoeller, Anna Belardinelli, Chao Wang, Stephan Hasler, Daniel Tanneberg, Michael Gienger
  • for: 这种研究旨在提高机器人完全自主系统的可行性,以替代人类执行任务。
  • methods: 这项研究将大语言模型应用于机器人任务与运动规划,提出了一种新的重新规划策略来处理生成计划中的物理、逻辑和语义错误。
  • results: 经验证明,提出的反馈体系可以提高执行可能性、正确性和时间复杂度。
    Abstract In the pursuit of fully autonomous robotic systems capable of taking over tasks traditionally performed by humans, the complexity of open-world environments poses a considerable challenge. Addressing this imperative, this study contributes to the field of Large Language Models (LLMs) applied to task and motion planning for robots. We propose a system architecture that orchestrates a seamless interplay between multiple cognitive levels, encompassing reasoning, planning, and motion generation. At its core lies a novel replanning strategy that handles physically grounded, logical, and semantic errors in the generated plans. We demonstrate the efficacy of the proposed feedback architecture, particularly its impact on executability, correctness, and time complexity via empirical evaluation in the context of a simulation and two intricate real-world scenarios: blocks world, barman and pizza preparation.
    摘要 “在实现能够接替人类执行任务的完全自主机器人系统的过程中,开放世界环境的复杂性带来了相当大的挑战。这项研究对大型语言模型(LLM)在机器人任务与动作规划中的应用做出了贡献。我们提出了一个系统架构,协调多个认知层级之间的配合,包括推理、规划和动作生成。该架构的核心是一种新的重新规划策略,可以处理生成计划中的物理、逻辑和语义错误。我们通过在模拟环境以及两个复杂的真实场景(积木世界、调酒和披萨制作)中的实证评估,证明了所提出的反馈架构的有效性,特别是其对可执行性、正确性和时间复杂度的影响。”
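The replanning idea described above can be sketched as a simple feedback loop. The helper names below (`llm_propose_plan`, `check_plan`) are hypothetical placeholders rather than CoPAL's actual interfaces; the sketch only illustrates how physical, logical, and semantic error feedback could be folded back into the next planning round.

```python
# Hedged sketch of a corrective replanning loop in the spirit of CoPAL.
# All helper names are hypothetical placeholders, not the paper's API.
from dataclasses import dataclass

@dataclass
class Feedback:
    ok: bool
    level: str      # "physical" | "logical" | "semantic"
    message: str

def llm_propose_plan(task: str, feedback_history: list) -> list:
    """Placeholder: query an LLM for an action sequence, conditioned on past errors."""
    raise NotImplementedError

def check_plan(plan: list) -> Feedback:
    """Placeholder: run simulation / logic / semantic validators over the plan."""
    raise NotImplementedError

def corrective_planning(task: str, max_rounds: int = 5):
    history = []
    for _ in range(max_rounds):
        plan = llm_propose_plan(task, history)
        feedback = check_plan(plan)
        if feedback.ok:
            return plan                      # executable, correct plan found
        history.append(feedback)             # feed the error back into the next prompt
    return None                              # give up after max_rounds
```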

Uncovering Hidden Connections: Iterative Tracking and Reasoning for Video-grounded Dialog

  • paper_url: http://arxiv.org/abs/2310.07259
  • repo_url: https://github.com/hyu-zhang/itr
  • paper_authors: Haoyu Zhang, Meng Liu, Yaowei Wang, Da Cao, Weili Guan, Liqiang Nie
  • for: 这篇论文的目的是提出一种新的视频对话方法,可以快速和准确地回答视频内容相关的问题。
  • methods: 这篇论文使用了一种迭代跟踪和理解策略,将文本编码器、视觉编码器和生成器相结合。文本编码器使用了一种路径跟踪和汇总机制,能够从对话历史中提取关键信息,解决问题。视觉编码器使用了一种迭代理解网络,精心挑选和强调视频中重要的视觉特征,提高视觉理解的深度。
  • results: 作者通过在两个知名的数据集上进行实验,证明了他们提出的方法的可靠性和适应性。
    Abstract In contrast to conventional visual question answering, video-grounded dialog necessitates a profound understanding of both dialog history and video content for accurate response generation. Despite commendable strides made by existing methodologies, they often grapple with the challenges of incrementally understanding intricate dialog histories and assimilating video information. In response to this gap, we present an iterative tracking and reasoning strategy that amalgamates a textual encoder, a visual encoder, and a generator. At its core, our textual encoder is fortified with a path tracking and aggregation mechanism, adept at gleaning nuances from dialog history that are pivotal to deciphering the posed questions. Concurrently, our visual encoder harnesses an iterative reasoning network, meticulously crafted to distill and emphasize critical visual markers from videos, enhancing the depth of visual comprehension. Culminating this enriched information, we employ the pre-trained GPT-2 model as our response generator, stitching together coherent and contextually apt answers. Our empirical assessments, conducted on two renowned datasets, testify to the prowess and adaptability of our proposed design.
    摘要 In contrast to traditional visual question answering, video-grounded dialog requires a profound understanding of both dialog history and video content for accurate response generation. Despite notable advances made by existing methodologies, they often struggle with incrementally understanding complex dialog histories and integrating video information. In response to this gap, we propose an iterative tracking and reasoning strategy that combines a textual encoder, a visual encoder, and a generator. At its core, our textual encoder is reinforced with a path tracking and aggregation mechanism, skilled at extracting subtleties from dialog history that are crucial to deciphering the posed questions. Meanwhile, our visual encoder utilizes an iterative reasoning network, carefully crafted to distill and emphasize essential visual cues from videos, enhancing the depth of visual understanding. Combining this enriched information, we employ the pre-trained GPT-2 model as our response generator, seamlessly stitching together coherent and contextually appropriate answers. Our empirical evaluations, conducted on two well-known datasets, demonstrate the effectiveness and adaptability of our proposed design.

ADMEOOD: Out-of-Distribution Benchmark for Drug Property Prediction

  • paper_url: http://arxiv.org/abs/2310.07253
  • repo_url: https://github.com/qweasdzxc-wsy/ADMEOOD
  • paper_authors: Shuoying Wei, Xinlong Wen, Lida Zhu, Songquan Li, Rongbo Zhu
  • for: This paper aims to address the out-of-distribution (OOD) problem in drug property prediction by proposing a novel benchmark dataset and evaluating the performance of different domain generalization models.
  • methods: The proposed benchmark, called ADMEOOD, includes a systematic OOD dataset curator and benchmark specifically designed for drug property prediction. It includes two types of OOD data shifts: Noise Shift and Concept Conflict Drift (CCD).
  • results: The experimental results demonstrate the effectiveness of the proposed partition method in ADMEOOD, showing a significant difference in performance between in-distribution and out-of-distribution data. Additionally, the paper shows that Empirical Risk Minimization (ERM) and other models exhibit distinct trends in performance across different domains and measurement types.
    Abstract Obtaining accurate and valid information for drug molecules is a crucial and challenging task. However, chemical knowledge and information have been accumulated over the past 100 years from various regions, laboratories, and experimental purposes. Little has been explored in terms of the out-of-distribution (OOD) problem with noise and inconsistency, which may lead to weak robustness and unsatisfied performance. This study proposes a novel benchmark ADMEOOD, a systematic OOD dataset curator and benchmark specifically designed for drug property prediction. ADMEOOD obtained 27 ADME (Absorption, Distribution, Metabolism, Excretion) drug properties from Chembl and relevant literature. Additionally, it includes two kinds of OOD data shifts: Noise Shift and Concept Conflict Drift (CCD). Noise Shift responds to the noise level by categorizing the environment into different confidence levels. On the other hand, CCD describes the data which has inconsistent label among the original data. Finally, it tested on a variety of domain generalization models, and the experimental results demonstrate the effectiveness of the proposed partition method in ADMEOOD: ADMEOOD demonstrates a significant difference performance between in-distribution and out-of-distribution data. Moreover, ERM (Empirical Risk Minimization) and other models exhibit distinct trends in performance across different domains and measurement types.
    摘要 获得准确和有效的药物分子信息是一项关键且具有挑战性的任务。然而,化学知识和信息是在过去100年间由不同地区、实验室和实验目的积累而来的。对于其中带有噪声和不一致的分布外(OOD)问题,目前的探索仍然很少,这可能导致模型鲁棒性较弱、性能不尽如人意。本研究提出了一个新的基准 ADMEOOD,一个专为药物性质预测设计的系统化OOD数据集构建与基准,从 ChEMBL 和相关文献中获取了27个ADME(吸收、分布、代谢、排泄)药物性质。此外,它还包括两种类型的OOD数据偏移:噪声偏移(Noise Shift)和概念冲突偏移(CCD)。噪声偏移根据噪声水平将环境划分为不同的置信等级,而CCD则描述了原始数据中标签不一致的数据。最后,它在多种领域泛化模型上进行了测试,实验结果表明了所提出的划分方法在ADMEOOD中的有效性:ADMEOOD在分布内与分布外数据之间显示出明显的性能差异。此外,ERM(Empirical Risk Minimization)和其他模型在不同的领域和测量类型上表现出不同的趋势。

Ethical Reasoning over Moral Alignment: A Case and Framework for In-Context Ethical Policies in LLMs

  • paper_url: http://arxiv.org/abs/2310.07251
  • repo_url: None
  • paper_authors: Abhinav Rao, Aditi Khandelwal, Kumar Tanmay, Utkarsh Agarwal, Monojit Choudhury
  • for: argue that LLMs should be infused with generic ethical reasoning capabilities to handle value pluralism at a global scale, rather than aligning them to specific ethical principles.
  • methods: develop a framework that integrates moral dilemmas with moral principles from different formalisms of normative ethics and at different levels of abstractions.
  • results: initial experiments with GPT-x models show that while GPT-4 is a nearly perfect ethical reasoner, the models still have bias towards the moral values of Western and English speaking societies.
    Abstract In this position paper, we argue that instead of morally aligning LLMs to a specific set of ethical principles, we should infuse generic ethical reasoning capabilities into them so that they can handle value pluralism at a global scale. When provided with an ethical policy, an LLM should be capable of making decisions that are ethically consistent with the policy. We develop a framework that integrates moral dilemmas with moral principles pertaining to different formalisms of normative ethics, and at different levels of abstraction. Initial experiments with GPT-x models show that while GPT-4 is a nearly perfect ethical reasoner, the models still have bias towards the moral values of Western and English-speaking societies.
    摘要 在这份Position paper中,我们认为,而不是将人工智能语言模型(LLMs) morally align到特定的道德原则上,我们应该通过嵌入基于道德理解的能力来让它们能够处理全球范围内的价值多元性。当提供了一个道德政策时,一个LLM应该能够作出道德一致的决策。我们开发了一个整合道德困境和不同形式的normative ethics的道德原则的框架。初步实验表明,使用GPT-x模型时,GPT-4是一个几乎完美的道德思考者,但模型仍然偏向西方和英语社会的道德价值观。

Surrogate modeling for stochastic crack growth processes in structural health monitoring applications

  • paper_url: http://arxiv.org/abs/2310.07241
  • repo_url: None
  • paper_authors: Nicholas E. Silionis, Konstantinos N. Anyfantis
  • for: 这个论文的目的是用structural health monitoring(SHM)技术预测金属结构中裂缝增长的未来趋势,以便实现预测维护。
  • methods: 该论文基于物理的裂纹扩展模型,对材料与载荷相关的不确定性进行表征。具体来说,论文使用高斯过程(GP)回归模型构建概率替代模型,用于为不同的贝叶斯SHM任务生成先验分布。
  • results: 该论文通过在数值上实现了这种方法,并对两个基本的裂缝监测问题进行了评估,即裂缝长度监测(损害评估)和裂缝增长监测(损害预测)。
    Abstract Fatigue crack growth is one of the most common types of deterioration in metal structures with significant implications on their reliability. Recent advances in Structural Health Monitoring (SHM) have motivated the use of structural response data to predict future crack growth under uncertainty, in order to enable a transition towards predictive maintenance. Accurately representing different sources of uncertainty in stochastic crack growth (SCG) processes is a non-trivial task. The present work builds on previous research on physics-based SCG modeling under both material and load-related uncertainty. The aim here is to construct computationally efficient, probabilistic surrogate models for SCG processes that successfully encode these different sources of uncertainty. An approach inspired by latent variable modeling is employed that utilizes Gaussian Process (GP) regression models to enable the surrogates to be used to generate prior distributions for different Bayesian SHM tasks as the application of interest. Implementation is carried out in a numerical setting and model performance is assessed for two fundamental crack SHM problems; namely crack length monitoring (damage quantification) and crack growth monitoring (damage prognosis).
    摘要 疲劳裂纹扩展是金属结构劣化中最常见的类型之一,对结构可靠性有着重要影响。结构健康监测(SHM)技术的最新进展,使得可以利用结构响应数据在不确定性下预测未来的裂纹扩展,以便向预测性维护过渡。准确表达随机裂纹扩展(SCG)过程中不同来源的不确定性是一项非常困难的任务。本工作基于此前关于材料与载荷相关不确定性下基于物理的SCG建模研究,旨在构建计算高效的概率替代模型,以有效编码这些不同来源的不确定性。我们采用受隐变量建模启发的方法,使用高斯过程(GP)回归模型,使替代模型可用于为不同的贝叶斯SHM任务生成先验分布。实现是在数值环境中进行的,并对两个基本的裂纹SHM问题进行了评估,即裂纹长度监测(损伤量化)和裂纹扩展监测(损伤预测)。
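As a rough illustration of the surrogate idea, the sketch below fits a Gaussian Process regressor to a synthetic crack-length-versus-cycles curve with scikit-learn and returns a predictive mean and standard deviation, which is the kind of probabilistic output a Bayesian SHM task could use as a prior. The kernel choice and the synthetic data are assumptions; the paper's latent-variable formulation is considerably richer.

```python
# Minimal GP-regression surrogate sketch for a crack-growth curve.
# Uses scikit-learn as a stand-in; the paper's latent-variable GP model is richer.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
cycles = np.linspace(0, 2e5, 40)[:, None]             # load cycles (inputs)
# Synthetic Paris-law-like growth plus measurement noise (illustration only).
crack_len = 1.0 + 5e-11 * cycles.ravel() ** 2 + rng.normal(0, 0.2, cycles.shape[0])

kernel = 1.0 * RBF(length_scale=5e4) + WhiteKernel(noise_level=0.05)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(cycles, crack_len)

query = np.linspace(0, 2.5e5, 200)[:, None]
mean, std = gp.predict(query, return_std=True)         # predictive prior for SHM tasks
print(mean[-1], std[-1])                                # extrapolated crack length ± uncertainty
```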

Using Learnable Physics for Real-Time Exercise Form Recommendations

  • paper_url: http://arxiv.org/abs/2310.07221
  • repo_url: None
  • paper_authors: Abhishek Jaiswal, Gautam Chauhan, Nisheeth Srivastava
  • for: 这篇论文旨在提供一个用于健身训练和康复的推荐系统,能够实时评估并给出修正建议,以提高安全性和训练效果。
  • methods: 该推荐系统使用 MediaPipe 进行姿势识别,利用峰值显著性检测来计数动作次数,并使用可学习的物理模拟器跟踪每个运动动作的演化。
  • results: 这个系统在六种全身和上半身运动动作中进行了实时评估和修正建议,以提高自修练的可能性和降低运动伤害的风险。
    Abstract Good posture and form are essential for safe and productive exercising. Even in gym settings, trainers may not be readily available for feedback. Rehabilitation therapies and fitness workouts can thus benefit from recommender systems that provide real-time evaluation. In this paper, we present an algorithmic pipeline that can diagnose problems in exercise techniques and offer corrective recommendations, with high sensitivity and specificity in real-time. We use MediaPipe for pose recognition, count repetitions using peak-prominence detection, and use a learnable physics simulator to track motion evolution for each exercise. A test video is diagnosed based on deviations from the prototypical learned motion using statistical learning. The system is evaluated on six full and upper body exercises. These real-time recommendations, counseled via low-cost equipment like smartphones, will allow exercisers to rectify potential mistakes making self-practice feasible while reducing the risk of workout injuries.
    摘要 良好的姿势和动作形态是安全、高效健身的关键。即使在健身房环境中,教练也未必能随时提供反馈。因此,康复治疗和健身训练可以受益于能够提供实时评估的推荐系统。在这篇论文中,我们提出了一个算法管道,可以实时诊断运动技巧中的问题并提供修正建议,具有较高的敏感性和特异性。我们使用MediaPipe进行姿势识别,使用峰值显著性检测来计数动作次数,并使用可学习的物理模拟器跟踪每个运动的动作演化。测试视频根据其与所学原型动作的偏差,通过统计学习进行诊断。该系统在六种全身和上半身运动中进行了评估。这些实时建议通过智能手机等低成本设备提供,使运动者能够纠正潜在错误,让自主练习成为可能,同时降低运动受伤的风险。
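The repetition-counting step (peak-prominence detection on a joint-angle signal derived from MediaPipe keypoints) can be illustrated with SciPy; the signal below is synthetic and the prominence/distance thresholds are illustrative assumptions.

```python
# Sketch of peak-prominence repetition counting on a joint-angle time series.
# The angle signal would come from MediaPipe pose keypoints; here it is synthetic.
# The prominence/distance thresholds are illustrative assumptions.
import numpy as np
from scipy.signal import find_peaks

t = np.linspace(0, 30, 900)                         # 30 s at 30 fps
elbow_angle = 90 + 40 * np.sin(2 * np.pi * t / 3)   # ~10 slow "curls"
elbow_angle += np.random.default_rng(1).normal(0, 2, t.size)  # sensor jitter

peaks, props = find_peaks(elbow_angle, prominence=20, distance=45)
print(f"repetitions counted: {len(peaks)}")

# Prominence filters out jitter-level wiggles; `distance` (in frames) enforces a
# minimum time between reps so partial movements are not double counted.
```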

Improved Membership Inference Attacks Against Language Classification Models

  • paper_url: http://arxiv.org/abs/2310.07219
  • repo_url: None
  • paper_authors: Shlomit Shachor, Natalia Razinkov, Abigail Goldsteen
  • for: 这篇论文旨在评估机器学习模型中的隐私风险,以帮助决策使用、部署或共享模型。
  • methods: 该论文提出了一种新的整合方法,通过生成多个专门的攻击模型来对分类模型进行会员推理攻击。
  • results: 该研究表明,使用该整合方法可以实现更高的攻击精度,比单个攻击模型或每个分类标签的攻击模型都高。
    Abstract Artificial intelligence systems are prevalent in everyday life, with use cases in retail, manufacturing, health, and many other fields. With the rise in AI adoption, associated risks have been identified, including privacy risks to the people whose data was used to train models. Assessing the privacy risks of machine learning models is crucial to enabling knowledgeable decisions on whether to use, deploy, or share a model. A common approach to privacy risk assessment is to run one or more known attacks against the model and measure their success rate. We present a novel framework for running membership inference attacks against classification models. Our framework takes advantage of the ensemble method, generating many specialized attack models for different subsets of the data. We show that this approach achieves higher accuracy than either a single attack model or an attack model per class label, both on classical and language classification tasks.
    摘要 人工智能系统在日常生活中广泛应用,涵盖零售、制造、医疗等领域。随着人工智能的普及,相关的风险也被识别出来,其中包括其训练数据所涉人群的隐私风险。评估机器学习模型的隐私风险十分必要,以便就是否使用、部署或共享模型做出知情决策。我们提出了一种针对分类模型的新型成员推理攻击框架,该框架利用集成(ensemble)方法,为不同的数据子集生成多个专门化的攻击模型。我们展示了这种方法在经典分类和语言分类任务上都能达到比单个攻击模型或按类别标签划分的攻击模型更高的准确率。
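A hedged sketch of the ensemble idea: train several specialized attack models on different subsets of data and average their membership scores. The feature construction (true-class confidence and prediction entropy) and the random subset scheme are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch: ensemble of specialized membership-inference attack models.
# Feature construction and subset splits are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def attack_features(confidences: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-example features: confidence in the true class and prediction entropy."""
    true_conf = confidences[np.arange(len(labels)), labels]
    entropy = -np.sum(confidences * np.log(confidences + 1e-12), axis=1)
    return np.column_stack([true_conf, entropy])

def train_attack_ensemble(conf, labels, is_member, n_subsets=5, seed=0):
    """Train one attack model per random subset; membership score = mean vote."""
    rng = np.random.default_rng(seed)
    X, models = attack_features(conf, labels), []
    for _ in range(n_subsets):
        idx = rng.choice(len(X), size=len(X) // 2, replace=False)
        models.append(LogisticRegression(max_iter=1000).fit(X[idx], is_member[idx]))
    return models

def membership_score(models, conf, labels):
    X = attack_features(conf, labels)
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```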

Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization

  • paper_url: http://arxiv.org/abs/2310.07218
  • repo_url: None
  • paper_authors: Yuxin Chen, Chen Tang, Ran Tian, Chenran Li, Jinning Li, Masayoshi Tomizuka, Wei Zhan
  • for: 这篇论文研究多智能体强化学习(MARL)中的泛化问题。
  • methods: 论文提出了 Level of Influence(LoI)指标,量化给定场景和环境中多智能体之间的交互强度,并评估 LoI 对泛化性能的影响。
  • results: 研究发现,在许多场景中,与更多样化的合作者共同训练可以提高 ego agent(自身智能体)的泛化性能,但这种提升的程度因场景和环境而异,而 LoI 能够预测这种差异。此外,基于 LoI 的资源分配策略可以在受限的计算预算下提高泛化性能。
    Abstract Generalization poses a significant challenge in Multi-agent Reinforcement Learning (MARL). The extent to which an agent is influenced by unseen co-players depends on the agent's policy and the specific scenario. A quantitative examination of this relationship sheds light on effectively training agents for diverse scenarios. In this study, we present the Level of Influence (LoI), a metric quantifying the interaction intensity among agents within a given scenario and environment. We observe that, generally, a more diverse set of co-play agents during training enhances the generalization performance of the ego agent; however, this improvement varies across distinct scenarios and environments. LoI proves effective in predicting these improvement disparities within specific scenarios. Furthermore, we introduce a LoI-guided resource allocation method tailored to train a set of policies for diverse scenarios under a constrained budget. Our results demonstrate that strategic resource allocation based on LoI can achieve higher performance than uniform allocation under the same computation budget.
    摘要 在多智能体强化学习(MARL)中,泛化是一项重要挑战。智能体受未见过的合作者影响的程度,取决于该智能体的策略和具体场景。对这一关系进行量化分析,有助于针对多样化场景有效地训练智能体。在本研究中,我们提出了影响度指标(LoI),用于衡量给定场景和环境中智能体之间的交互强度。我们发现,通常情况下,在训练过程中采用更多样化的合作者集合,可以提高智能体的泛化性能;然而,这种改善的程度随场景和环境而异,而LoI能够有效预测这些差异。此外,我们还提出了基于LoI的资源分配方法,用于在有限计算预算下为多样化场景训练一组策略。结果表明,基于LoI的有针对性资源分配,可以在相同计算预算下取得比均匀分配更高的性能。

Multi-Task Learning-Enabled Automatic Vessel Draft Reading for Intelligent Maritime Surveillance

  • paper_url: http://arxiv.org/abs/2310.07212
  • repo_url: None
  • paper_authors: Jingxiang Qu, Ryan Wen Liu, Chenjie Zhao, Yu Guo, Sendren Sheng-Dong Xu, Fenghua Zhu, Yisheng Lv
  • for: This paper proposes a multi-task learning-enabled computational method (MTL-VDR) for generating highly reliable vessel draft depth readings.
  • methods: The MTL-VDR method consists of four components: draft mark detection, draft scale recognition, vessel/water segmentation, and final draft depth estimation. The method uses a powerful and efficient convolutional neural network for draft mark detection and employs a multi-task learning method for simultaneous draft scale recognition and vessel/water segmentation.
  • results: The method demonstrated superior performance in terms of accuracy, robustness, and efficiency, with an adaptive computational method used to yield an accurate and robust draft depth. The computational speed exceeds 40 FPS, satisfying the requirements of real-time maritime surveillance to guarantee vessel traffic safety.
    Abstract The accurate and efficient vessel draft reading (VDR) is an important component of intelligent maritime surveillance, which could be exploited to assist in judging whether the vessel is normally loaded or overloaded. The computer vision technique with an excellent price-to-performance ratio has become a popular medium to estimate vessel draft depth. However, the traditional estimation methods easily suffer from several limitations, such as sensitivity to low-quality images, high computational cost, etc. In this work, we propose a multi-task learning-enabled computational method (termed MTL-VDR) for generating highly reliable VDR. In particular, our MTL-VDR mainly consists of four components, i.e., draft mark detection, draft scale recognition, vessel/water segmentation, and final draft depth estimation. We first construct a benchmark dataset related to draft mark detection and employ a powerful and efficient convolutional neural network to accurately perform the detection task. The multi-task learning method is then proposed for simultaneous draft scale recognition and vessel/water segmentation. To obtain more robust VDR under complex conditions (e.g., damaged and stained scales, etc.), the accurate draft scales are generated by an automatic correction method, which is presented based on the spatial distribution rules of draft scales. Finally, an adaptive computational method is exploited to yield an accurate and robust draft depth. Extensive experiments have been implemented on the realistic dataset to compare our MTL-VDR with state-of-the-art methods. The results have demonstrated its superior performance in terms of accuracy, robustness, and efficiency. The computational speed exceeds 40 FPS, which satisfies the requirements of real-time maritime surveillance to guarantee vessel traffic safety.
    摘要 “精准和高效的船舶吃水深度读取(VDR)是智能海上监测中重要的一部分,可以帮助判断船舶是否超载。计算机视觉技术具有出色的价格-性能比,成为船舶吃水深度估算的受欢迎媒体。然而,传统估算方法容易受到低质量图像、高计算成本等限制。在这种情况下,我们提出了一种基于多任务学习的计算方法(简称MTL-VDR),用于生成高可靠性的VDR。具体来说,我们的MTL-VDR包括四个组件:船舶吃水深度检测、船舶/水域分割、船舶吃水深度估算和自适应计算方法。我们首先构建了相关的船舶吃水深度检测数据集,并使用高效和强大的卷积神经网络进行检测任务的准确实施。然后,我们提出了多任务学习方法,用于同时进行船舶吃水深度估算和船舶/水域分割。为了在复杂情况下(如损坏和污染等)获得更加稳定的VDR,我们提出了一种自动更正方法,基于船舶吃水深度的精度规则。最后,我们运用了一种适应计算方法,以确保高准确性和稳定性。我们对实际数据进行了广泛的实验,与现有方法进行比较。结果显示,我们的MTL-VDR在精度、稳定性和效率方面具有显著的优势。计算速度超过40帧每秒,满足了实时海上监测的需求,以保障船舶交通安全。”

State of the Art on Diffusion Models for Visual Computing

  • paper_url: http://arxiv.org/abs/2310.07204
  • repo_url: None
  • paper_authors: Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T. Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski, Angjoo Kanazawa, C. Karen Liu, Lingjie Liu, Ben Mildenhall, Matthias Nießner, Björn Ommer, Christian Theobalt, Peter Wonka, Gordon Wetzstein
  • for: 提供一个入门性的状态报告,帮助研究者、艺术家和实践者了解扩散模型在视觉计算领域的基本数学概念、实现细节和设计选择,以及扩散基于生成AI工具的各种应用和扩展。
  • methods: 涵盖了扩散模型的基本数学概念、Stable Diffusion模型的实现细节和设计选择,以及扩散基于生成AI工具的各种应用和扩展。
  • results: 提供了一个全面的Literature综述,概述了扩散基于生成AI工具的各种应用和扩展,包括2D图像、视频、3D物体、动作和4D场景等。同时也讨论了可用的数据集、评价指标、开放的挑战和社会影响等。
    Abstract The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applications has seen exponential growth and relevant papers are published across the computer graphics, computer vision, and AI communities with new works appearing daily on arXiv. This rapid growth of the field makes it difficult to keep up with all recent developments. The goal of this state-of-the-art report (STAR) is to introduce the basic mathematical concepts of diffusion models, implementation details and design choices of the popular Stable Diffusion model, as well as overview important aspects of these generative AI tools, including personalization, conditioning, inversion, among others. Moreover, we give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing, categorized by the type of generated medium, including 2D images, videos, 3D objects, locomotion, and 4D scenes. Finally, we discuss available datasets, metrics, open challenges, and social implications. This STAR provides an intuitive starting point to explore this exciting topic for researchers, artists, and practitioners alike.
    摘要 领域的视觉计算在迅速发展,启动了基于生成人工智能(AI)的扩展,这些技术在图像、视频和3D场景的生成、编辑和重建方面提供了无前例的能力。在这些领域中,扩散模型是生成AI架构的首选。过去一年内,有关扩散工具和应用的学术论文数量在计算机图形、计算机视觉和人工智能领域呈指数增长,每天在arXiv上出现新的论文。这种快速发展的领域使得保持最新的发展变得困难。本state-of-the-art报告(STAR)的目的是介绍扩散模型的基本数学概念,扩散模型的实现细节和设计选择,以及生成AI工具的重要方面,包括个性化、条件、反向等。此外,我们还给出了生成和编辑扩散模型的快速增长的评价,分类为生成媒介的类型,包括2D图像、视频、3D物体、移动和4D场景。最后,我们讨论了可用的数据集、评价指标、开放的挑战和社会影响。这个STAR为研究者、艺术家和实践者提供了直观的入门点,以便更好地探索这个激动人心的主题。

MatChat: A Large Language Model and Application Service Platform for Materials Science

  • paper_url: http://arxiv.org/abs/2310.07197
  • repo_url: None
  • paper_authors: Ziyi Chen, Fankai Xie, Meng Wan, Yang Yuan, Miao Liu, Zongguo Wang, Sheng Meng, Yangang Wang
  • for: 预测化学合成路径,以满足材料科学研究中的需求。
  • methods: 利用自动生成文本和问答系统,以及精细调整技术,实现大规模AI模型在特定领域中的部署。
  • results: 研究人员通过使用LLaMA2-7B模型,并将其特化为材料科学领域,创造出了名为MatChat的特殊AI模型,可以预测无机材料合成路径。MatChat表现出了丰富的知识掌握和逻辑能力,但还需要进一步改进,以满足不同材料设计需求。
    Abstract The prediction of chemical synthesis pathways plays a pivotal role in materials science research. Challenges, such as the complexity of synthesis pathways and the lack of comprehensive datasets, currently hinder our ability to predict these chemical processes accurately. However, recent advancements in generative artificial intelligence (GAI), including automated text generation and question-answering systems, coupled with fine-tuning techniques, have facilitated the deployment of large-scale AI models tailored to specific domains. In this study, we harness the power of the LLaMA2-7B model and enhance it through a learning process that incorporates 13,878 pieces of structured material knowledge data. This specialized AI model, named MatChat, focuses on predicting inorganic material synthesis pathways. MatChat exhibits remarkable proficiency in generating and reasoning with knowledge in materials science. Although MatChat requires further refinement to meet the diverse material design needs, this research undeniably highlights its impressive reasoning capabilities and innovative potential in the field of materials science. MatChat is now accessible online and open for use, with both the model and its application framework available as open source. This study establishes a robust foundation for collaborative innovation in the integration of generative AI in materials science.
    摘要 化学合成路径的预测在材料科学研究中扮演着关键角色。然而,合成路径的复杂性和缺乏全面数据集等问题,目前仍阻碍着我们准确预测这些化学过程。最近的生成式人工智能(GAI)技术,包括自动文本生成和问答系统,结合微调技术,使得针对特定领域部署大规模AI模型成为可能。在本研究中,我们利用LLaMA2-7B模型,并通过包含13,878条结构化材料知识数据的学习过程,开发了一个专门用于预测无机材料合成路径的AI模型,称为MatChat。MatChat在材料科学知识的生成与推理方面表现出色。虽然MatChat仍需进一步完善以满足多样化的材料设计需求,但这项研究无疑展示了其在材料科学领域令人印象深刻的推理能力和创新潜力。MatChat现已上线并开放使用,模型及其应用框架均已开源。本研究为生成式AI与材料科学的协同创新奠定了坚实基础。
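Domain fine-tuning of a LLaMA-2-7B checkpoint on structured materials-knowledge records is commonly set up along the following lines with Hugging Face Transformers and PEFT/LoRA. The dataset file, prompt format, and hyperparameters below are assumptions for illustration; the MatChat authors' actual training recipe may differ.

```python
# Hedged sketch of LoRA fine-tuning a LLaMA-2-7B model on synthesis-path Q&A pairs.
# Dataset path, prompt format, and hyperparameters are assumptions for illustration;
# the MatChat authors' actual recipe may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

base = "meta-llama/Llama-2-7b-hf"
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Low-rank adapters keep the domain-specialization step affordable.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"]))

def tokenize(ex):
    text = f"Question: {ex['question']}\nSynthesis pathway: {ex['answer']}"
    out = tok(text, truncation=True, max_length=512, padding="max_length")
    out["labels"] = out["input_ids"].copy()
    return out

data = load_dataset("json", data_files="materials_knowledge.jsonl")["train"].map(tokenize)

Trainer(model=model,
        args=TrainingArguments("matchat-lora", per_device_train_batch_size=4,
                               num_train_epochs=3, learning_rate=2e-4),
        train_dataset=data).train()
```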

Adaptive Gating in Mixture-of-Experts based Language Models

  • paper_url: http://arxiv.org/abs/2310.07188
  • repo_url: None
  • paper_authors: Jiamin Li, Qiang Su, Yitao Yang, Yimin Jiang, Cong Wang, Hong Xu
  • for: 这篇论文主要研究如何在混合专家(MoE)语言模型中使用自适应门控来提高训练效率并保持性能。
  • methods: 论文提出了自适应门控策略,使每个 token 可以根据专家概率分布由数量可变的专家进行计算,以适应不同的语言复杂度。此外,论文还使用课程学习进一步缩短训练时间。
  • results: 实验结果显示,自适应门控最多可减少22.5%的训练时间,同时保持推理质量。论文还对路由决策进行了分析,并给出了相关见解。
    Abstract Large language models, such as OpenAI's ChatGPT, have demonstrated exceptional language understanding capabilities in various NLP tasks. Sparsely activated mixture-of-experts (MoE) has emerged as a promising solution for scaling models while maintaining a constant number of computational operations. Existing MoE model adopts a fixed gating network where each token is computed by the same number of experts. However, this approach contradicts our intuition that the tokens in each sequence vary in terms of their linguistic complexity and, consequently, require different computational costs. Little is discussed in prior research on the trade-off between computation per token and model performance. This paper introduces adaptive gating in MoE, a flexible training strategy that allows tokens to be processed by a variable number of experts based on expert probability distribution. The proposed framework preserves sparsity while improving training efficiency. Additionally, curriculum learning is leveraged to further reduce training time. Extensive experiments on diverse NLP tasks show that adaptive gating reduces at most 22.5% training time while maintaining inference quality. Moreover, we conduct a comprehensive analysis of the routing decisions and present our insights when adaptive gating is used.
    摘要 大型语言模型(如OpenAI的ChatGPT)在各类自然语言处理任务中表现出了卓越的语言理解能力。稀疏激活的混合专家(MoE)已成为在保持计算操作数量不变的前提下扩展模型规模的一种有前景的方案。现有的MoE模型采用固定的门控网络,每个 token 都由相同数量的专家计算。然而,这与我们的直觉相悖:序列中各个 token 的语言复杂度不同,因此所需的计算成本也不同。此前的研究很少讨论每个 token 的计算量与模型性能之间的权衡。本文提出了自适应门控(Adaptive Gating),一种灵活的训练策略,使 token 能够根据专家概率分布由数量可变的专家处理。该方案在保持稀疏性的同时提高了训练效率。此外,我们还利用课程学习进一步缩短训练时间。大量实验表明,自适应门控最多可减少22.5%的训练时间,同时保持推理质量。我们还对路由决策进行了全面分析,并给出了使用自适应门控时的相关见解。
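A minimal sketch of what variable-expert routing can look like in code: each token keeps just enough experts (by gate probability mass) instead of a fixed top-k. The cumulative-probability threshold used here is an illustrative stand-in for the paper's adaptive gating rule.

```python
# Hedged sketch of adaptive (variable-k) expert routing for one MoE layer.
# The routing criterion (cumulative-probability threshold) is an illustrative
# stand-in for the paper's exact rule.
import torch
import torch.nn.functional as F

def adaptive_route(x, gate, experts, p_threshold=0.8, max_k=4):
    """x: [tokens, dim]; gate: Linear(dim, n_experts); experts: list of modules."""
    probs = F.softmax(gate(x), dim=-1)                        # [tokens, n_experts]
    sorted_p, sorted_idx = probs.sort(dim=-1, descending=True)
    # Keep just enough experts to cover p_threshold of the gate mass per token.
    keep = (sorted_p.cumsum(-1) < p_threshold)
    keep[:, 0] = True                                         # always use at least one
    keep[:, max_k:] = False                                   # cap compute per token
    out = torch.zeros_like(x)
    for t in range(x.size(0)):                                # per-token dispatch (clarity over speed)
        idx = sorted_idx[t][keep[t]]
        w = probs[t, idx] / probs[t, idx].sum()
        out[t] = sum(wi * experts[int(ei)](x[t]) for wi, ei in zip(w, idx))
    return out
```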

Multiview Transformer: Rethinking Spatial Information in Hyperspectral Image Classification

  • paper_url: http://arxiv.org/abs/2310.07186
  • repo_url: None
  • paper_authors: Jie Zhang, Yongshan Zhang, Yicong Zhou
  • for: 本文旨在提高高光谱图像(HSI)的地物分类准确性。
  • methods: 本文使用多视图变换器(由 MPCA、SED、SPTT 组成)来提取 HSI 的空间-光谱特征表示。MPCA 通过构建光谱多视图观测数据,并对每个视图数据应用 PCA 来提取低维视图表示;SED 使用光谱维度上呈 U 形的全卷积编码-解码器来提取多视图特征图;SPTT 使用空间池化标记化策略将多视图特征转换为 token,学习稳健且具判别力的空间-光谱特征。
  • results: 实验结果表明,提出的多视图变换器在三个HSI数据集上表现出色,超过了现有方法的性能。
    Abstract Identifying the land cover category for each pixel in a hyperspectral image (HSI) relies on spectral and spatial information. An HSI cuboid with a specific patch size is utilized to extract spatial-spectral feature representation for the central pixel. In this article, we investigate that scene-specific but not essential correlations may be recorded in an HSI cuboid. This additional information improves the model performance on existing HSI datasets and makes it hard to properly evaluate the ability of a model. We refer to this problem as the spatial overfitting issue and utilize strict experimental settings to avoid it. We further propose a multiview transformer for HSI classification, which consists of multiview principal component analysis (MPCA), spectral encoder-decoder (SED), and spatial-pooling tokenization transformer (SPTT). MPCA performs dimension reduction on an HSI via constructing spectral multiview observations and applying PCA on each view data to extract low-dimensional view representation. The combination of view representations, named multiview representation, is the dimension reduction output of the MPCA. To aggregate the multiview information, a fully-convolutional SED with a U-shape in spectral dimension is introduced to extract a multiview feature map. SPTT transforms the multiview features into tokens using the spatial-pooling tokenization strategy and learns robust and discriminative spatial-spectral features for land cover identification. Classification is conducted with a linear classifier. Experiments on three HSI datasets with rigid settings demonstrate the superiority of the proposed multiview transformer over the state-of-the-art methods.
    摘要 Identifying the land cover category for each pixel in a hyperspectral image (HSI) relies on both spectral and spatial information. We use an HSI cuboid with a specific patch size to extract spatial-spectral feature representations for the central pixel. However, we find that scene-specific but not essential correlations may be recorded in the HSI cuboid, which can improve model performance on existing HSI datasets but also make it difficult to evaluate the model's ability. We refer to this problem as the spatial overfitting issue and use strict experimental settings to avoid it.To address this issue, we propose a multiview transformer for HSI classification, which consists of multiview principal component analysis (MPCA), spectral encoder-decoder (SED), and spatial-pooling tokenization transformer (SPTT). MPCA performs dimension reduction on the HSI by constructing spectral multiview observations and applying PCA on each view data to extract low-dimensional view representations. The combination of view representations, named multiview representation, is the dimension reduction output of the MPCA. To aggregate the multiview information, a fully-convolutional SED with a U-shape in spectral dimension is introduced to extract a multiview feature map. SPTT transforms the multiview features into tokens using the spatial-pooling tokenization strategy and learns robust and discriminative spatial-spectral features for land cover identification. Classification is conducted with a linear classifier.Experiments on three HSI datasets with rigid settings demonstrate the superiority of the proposed multiview transformer over the state-of-the-art methods.
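The MPCA step can be sketched with NumPy and scikit-learn: split the spectral bands into views, run PCA per view, and concatenate the per-view representations. Forming views as contiguous band groups is an assumption made for illustration.

```python
# Minimal sketch of multiview PCA (MPCA) dimension reduction on an HSI cuboid.
# Forming views as contiguous spectral band groups is an illustrative assumption.
import numpy as np
from sklearn.decomposition import PCA

def mpca(cuboid: np.ndarray, n_views: int = 4, dims_per_view: int = 8) -> np.ndarray:
    """cuboid: [H, W, B] hyperspectral patch -> [H, W, n_views * dims_per_view]."""
    H, W, B = cuboid.shape
    pixels = cuboid.reshape(-1, B)                        # treat each pixel as a sample
    band_groups = np.array_split(np.arange(B), n_views)   # spectral "views"
    view_reps = []
    for bands in band_groups:
        pca = PCA(n_components=dims_per_view)
        view_reps.append(pca.fit_transform(pixels[:, bands]))
    multiview = np.concatenate(view_reps, axis=1)          # multiview representation
    return multiview.reshape(H, W, -1)

patch = np.random.rand(9, 9, 200).astype(np.float32)       # e.g. a 9x9 patch, 200 bands
print(mpca(patch).shape)                                    # (9, 9, 32)
```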

rpcPRF: Generalizable MPI Neural Radiance Field for Satellite Camera

  • paper_url: http://arxiv.org/abs/2310.07179
  • repo_url: None
  • paper_authors: Tongtong Zhang, Yuanxiang Li
  • for: 这个论文targets the task of novel view synthesis of satellite images, with a focus on Rational Polynomial Camera (RPC) models.
  • methods: The proposed method, called rpcPRF, uses a Multiplane Images (MPI) based Planar neural Radiance Field (PRF) to synthesize novel views of satellite images. The model leverages reprojection supervision to generalize to unseen scenes and removes the need for dense depth supervision.
  • results: The paper reports that rpcPRF outperforms state-of-the-art NERF-based methods in terms of image fidelity, reconstruction accuracy, and efficiency on two datasets (TLC and SatMVS3D) with urban scenes from WV-3 and ZY-3 satellites.
    Abstract Novel view synthesis of satellite images holds a wide range of practical applications. While recent advances in the Neural Radiance Field have predominantly targeted pin-hole cameras, and models for satellite cameras often demand sufficient input views. This paper presents rpcPRF, a Multiplane Images (MPI) based Planar neural Radiance Field for Rational Polynomial Camera (RPC). Unlike coordinate-based neural radiance fields in need of sufficient views of one scene, our model is applicable to single or few inputs and performs well on images from unseen scenes. To enable generalization across scenes, we propose to use reprojection supervision to induce the predicted MPI to learn the correct geometry between the 3D coordinates and the images. Moreover, we remove the stringent requirement of dense depth supervision from deep multiview-stereo-based methods by introducing rendering techniques of radiance fields. rpcPRF combines the superiority of implicit representations and the advantages of the RPC model, to capture the continuous altitude space while learning the 3D structure. Given an RGB image and its corresponding RPC, the end-to-end model learns to synthesize the novel view with a new RPC and reconstruct the altitude of the scene. When multiple views are provided as inputs, rpcPRF exerts extra supervision provided by the extra views. On the TLC dataset from ZY-3, and the SatMVS3D dataset with urban scenes from WV-3, rpcPRF outperforms state-of-the-art nerf-based methods by a significant margin in terms of image fidelity, reconstruction accuracy, and efficiency, for both single-view and multiview task.
    摘要 卫星图像的新视图合成具有广泛的实际应用。然而,神经辐射场的最新进展主要针对针孔相机,而面向卫星相机的模型通常需要足够多的输入视图。本文提出了rpcPRF,一种面向有理多项式相机(RPC)、基于多平面图像(MPI)的平面神经辐射场。与需要同一场景足够多视图的基于坐标的神经辐射场不同,我们的模型适用于单个或少量输入,并且在未见场景的图像上表现良好。为实现跨场景泛化,我们提出使用重投影监督,使预测的MPI学习3D坐标与图像之间正确的几何关系。此外,我们通过引入辐射场的渲染技术,免除了基于深度多视图立体方法所需的密集深度监督。rpcPRF结合了隐式表示的优势和RPC模型的优点,在学习3D结构的同时捕捉连续的高度空间。给定一张RGB图像及其对应的RPC,该端到端模型可以学习合成具有新RPC的新视图并重建场景高度。当提供多个视图作为输入时,rpcPRF还能利用额外视图提供的监督。在来自ZY-3的TLC数据集和来自WV-3的城市场景SatMVS3D数据集上,无论是单视图还是多视图任务,rpcPRF在图像保真度、重建精度和效率方面都以显著优势超越了最先进的基于NeRF的方法。

Online Speculative Decoding

  • paper_url: http://arxiv.org/abs/2310.07177
  • repo_url: None
  • paper_authors: Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang
  • for: 加速大语言模型(LLM)的推理过程,使用较小的稿本模型预测目标模型的输出。
  • methods: 在线推测解码(OSD)技术,通过在观察到的用户查询数据上不断更新(多个)稿本模型,利用LLM服务集群的剩余计算能力对稿本模型进行在线重新训练,以提高预测精度。
  • results: 实验结果表明,在线推测解码可以提高稿本模型的预测准确率,从而提升LLM的推理效率,并将延迟降低1.22倍至3.06倍。
    Abstract Speculative decoding is a pivotal technique to accelerate the inference of large language models (LLMs) by employing a smaller draft model to predict the target model's outputs. However, its efficacy can be limited due to the low predictive accuracy of the draft model, particularly when faced with diverse text inputs and a significant capability gap between the draft and target models. We introduce online speculative decoding (OSD) to address this challenge. The main idea is to continually update (multiple) draft model(s) on observed user query data using the abundant excess computational power in an LLM serving cluster. Given that LLM inference is memory-bounded, the surplus computational power in a typical LLM serving cluster can be repurposed for online retraining of draft models, thereby making the training cost-neutral. Since the query distribution of an LLM service is relatively simple, retraining on query distribution enables the draft model to more accurately predict the target model's outputs, particularly on data originating from query distributions. As the draft model evolves online, it aligns with the query distribution in real time, mitigating distribution shifts. We develop a prototype of online speculative decoding based on online knowledge distillation and evaluate it using both synthetic and real query data on several popular LLMs. The results show a substantial increase in the token acceptance rate by 0.1 to 0.65, which translates into 1.22x to 3.06x latency reduction.
    摘要 推测解码是一种重要的技术,它通过使用较小的稿本模型预测目标模型的输出来加速大型语言模型(LLM)的推理过程。然而,当面对多样化的文本输入且稿本模型与目标模型能力差距较大时,稿本模型的预测精度较低,其效果会受到限制。我们提出在线推测解码(OSD)来应对这一挑战。其核心思想是利用LLM服务集群中充裕的剩余计算能力,基于观察到的用户查询数据不断更新(多个)稿本模型。由于LLM推理受内存带宽限制,典型LLM服务集群中的剩余算力可以被用于稿本模型的在线重训练,从而使训练成本几乎为零。由于LLM服务的查询分布相对简单,在查询分布上重训练能使稿本模型更准确地预测目标模型的输出,尤其是针对源自该查询分布的数据。随着稿本模型在线演化,它能实时对齐查询分布,缓解分布偏移。我们基于在线知识蒸馏开发了在线推测解码的原型,并在多个常用LLM上使用合成与真实查询数据进行了评估。结果表明,token接受率可提高0.1至0.65,相当于1.22倍至3.06倍的延迟降低。
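For readers unfamiliar with the base technique, the sketch below shows one speculative-decoding step with a greedy verification rule: the draft model proposes k tokens, the target model verifies them, and the first disagreement is replaced by the target's token. The model interfaces are placeholders, and the full algorithm uses a probabilistic acceptance rule rather than exact-match verification.

```python
# Hedged sketch of one speculative-decoding step with greedy verification.
# `draft_model` / `target_model` are placeholder callables returning next-token
# logits for a prefix; the full algorithm uses a probabilistic acceptance rule.
import torch

def speculative_step(prefix, draft_model, target_model, k=4):
    # 1) Draft k tokens autoregressively with the cheap model.
    drafted, ctx = [], list(prefix)
    for _ in range(k):
        tok = int(torch.argmax(draft_model(ctx)))
        drafted.append(tok)
        ctx.append(tok)

    # 2) Verify: the target model scores every drafted position (one batched pass
    #    in a real system; written as a loop here for clarity).
    accepted = []
    for tok in drafted:
        target_tok = int(torch.argmax(target_model(list(prefix) + accepted)))
        if target_tok == tok:
            accepted.append(tok)          # draft agrees with target: keep it
        else:
            accepted.append(target_tok)   # first disagreement: take target's token, stop
            break
    return list(prefix) + accepted        # acceptance rate drives the speedup
```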

Solving Travelling Thief Problems using Coordination Based Methods

  • paper_url: http://arxiv.org/abs/2310.07156
  • repo_url: None
  • paper_authors: Majid Namazi, M. A. Hakim Newton, Conrad Sanderson, Abdul Sattar
  • for: solves the Travelling Thief Problem (TTP) by proposing a coordination-based approach that integrates human-designed and machine learning-based heuristics to improve solution quality.
  • methods: uses a combination of local search, human-designed coordination heuristics, and machine learning to explore cyclic tours and make item selections during collection plan exploration.
  • results: significantly outperforms existing state-of-the-art TTP solvers on a set of benchmark problems, demonstrating the effectiveness of the proposed coordination-based approach.
    Abstract A travelling thief problem (TTP) is a proxy to real-life problems such as postal collection. TTP comprises an entanglement of a travelling salesman problem (TSP) and a knapsack problem (KP) since items of KP are scattered over cities of TSP, and a thief has to visit cities to collect items. In TTP, city selection and item selection decisions need close coordination since the thief's travelling speed depends on the knapsack's weight and the order of visiting cities affects the order of item collection. Existing TTP solvers deal with city selection and item selection separately, keeping decisions for one type unchanged while dealing with the other type. This separation essentially means very poor coordination between two types of decision. In this paper, we first show that a simple local search based coordination approach does not work in TTP. Then, to address the aforementioned problems, we propose a human designed coordination heuristic that makes changes to collection plans during exploration of cyclic tours. We further propose another human designed coordination heuristic that explicitly exploits the cyclic tours in item selections during collection plan exploration. Lastly, we propose a machine learning based coordination heuristic that captures characteristics of the two human designed coordination heuristics. Our proposed coordination based approaches help our TTP solver significantly outperform existing state-of-the-art TTP solvers on a set of benchmark problems. Our solver is named Cooperation Coordination (CoCo) and its source code is available from https://github.com/majid75/CoCo
    摘要 “旅行小偷问题”(TTP)是邮件收集等现实问题的一个抽象。TTP是旅行商问题(TSP)与背包问题(KP)的交织:KP中的物品分散在TSP的各个城市中,小偷必须走访城市来收集物品。在TTP中,城市选择与物品选择的决策需要紧密协调,因为小偷的行进速度取决于背包的重量,而访问城市的顺序又影响物品收集的顺序。现有的TTP求解器将城市选择与物品选择分开处理,在处理一类决策时保持另一类决策不变,这种分离实质上意味着两类决策之间协调很差。在本文中,我们首先说明基于简单局部搜索的协调方法在TTP中并不奏效。随后,为了解决上述问题,我们提出了一种人工设计的协调启发式,在探索循环路径的过程中对收集计划进行调整;我们还提出了另一种人工设计的协调启发式,在收集计划探索过程中显式利用循环路径进行物品选择。最后,我们提出了一种基于机器学习的协调启发式,用以捕捉上述两种人工设计启发式的特征。我们提出的基于协调的方法使我们的TTP求解器在一组基准问题上显著优于现有最先进的求解器。我们的求解器名为 Cooperation Coordination(CoCo),其源代码可从 https://github.com/majid75/CoCo 获取。
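Any coordination heuristic for the TTP needs a fast evaluation of a candidate (tour, collection plan) pair. The sketch below computes the standard TTP objective, profit minus renting-rate-weighted travel time with a knapsack-dependent speed; the linear speed model follows the common benchmark formulation and is an assumption here.

```python
# Sketch of the standard TTP objective: profit of collected items minus
# renting-rate-weighted travel time, where speed drops as the knapsack fills.
# The linear speed model follows the common benchmark formulation (assumed here).
import math

def ttp_objective(tour, plan, coords, items, W, v_max, v_min, R):
    """tour: city order (first city = start); plan: set of picked item ids;
    items: id -> (city, weight, profit); W: capacity; R: renting rate."""
    weight, profit, time = 0.0, 0.0, 0.0
    for idx, city in enumerate(tour):
        for iid, (icity, w, p) in items.items():
            if icity == city and iid in plan:
                weight, profit = weight + w, profit + p
        assert weight <= W, "collection plan exceeds knapsack capacity"
        nxt = tour[(idx + 1) % len(tour)]
        dist = math.dist(coords[city], coords[nxt])
        speed = v_max - (weight / W) * (v_max - v_min)   # heavier bag -> slower thief
        time += dist / speed
    return profit - R * time
```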

No Privacy Left Outside: On the (In-)Security of TEE-Shielded DNN Partition for On-Device ML

  • paper_url: http://arxiv.org/abs/2310.07152
  • repo_url: https://github.com/ziqi-zhang/teeslice-artifact
  • paper_authors: Ziqi Zhang, Chen Gong, Yifeng Cai, Yuanyuan Yuan, Bingyan Liu, Ding Li, Yao Guo, Xiangqun Chen
  • for: The paper is focused on addressing the security challenges of on-device machine learning (ML) models, specifically the threats of model stealing (MS) and membership inference attack (MIA).
  • methods: The paper proposes a novel technique called TEESlice, which partitions the DNN model into two parts before training to defend against MS and MIA during inference. TEESlice uses a partition-before-training strategy to accurately separate privacy-related weights from public weights.
  • results: The paper presents experimental results that show TEESlice delivers the same security protection as shielding the entire DNN model inside a Trusted Execution Environment (TEE), but with over 10X less overhead and no accuracy loss compared to prior TSDP solutions. The paper also highlights the inherent difficulty in deciding optimal DNN partition configurations for present TSDP solutions and the variability of such configurations across datasets and models.
    Abstract On-device ML introduces new security challenges: DNN models become white-box accessible to device users. Based on white-box information, adversaries can conduct effective model stealing (MS) and membership inference attack (MIA). Using Trusted Execution Environments (TEEs) to shield on-device DNN models aims to downgrade (easy) white-box attacks to (harder) black-box attacks. However, one major shortcoming is the sharply increased latency (up to 50X). To accelerate TEE-shield DNN computation with GPUs, researchers proposed several model partition techniques. These solutions, referred to as TEE-Shielded DNN Partition (TSDP), partition a DNN model into two parts, offloading the privacy-insensitive part to the GPU while shielding the privacy-sensitive part within the TEE. This paper benchmarks existing TSDP solutions using both MS and MIA across a variety of DNN models, datasets, and metrics. We show important findings that existing TSDP solutions are vulnerable to privacy-stealing attacks and are not as safe as commonly believed. We also unveil the inherent difficulty in deciding optimal DNN partition configurations (i.e., the highest security with minimal utility cost) for present TSDP solutions. The experiments show that such ``sweet spot'' configurations vary across datasets and models. Based on lessons harvested from the experiments, we present TEESlice, a novel TSDP method that defends against MS and MIA during DNN inference. TEESlice follows a partition-before-training strategy, which allows for accurate separation between privacy-related weights from public weights. TEESlice delivers the same security protection as shielding the entire DNN model inside TEE (the ``upper-bound'' security guarantees) with over 10X less overhead (in both experimental and real-world environments) than prior TSDP solutions and no accuracy loss.
    摘要 ondevice ML引入新的安全挑战:DNN模型变成了设备用户可见的白盒模型。基于白盒信息,攻击者可以进行有效的模型窃取(MS)和会员推理攻击(MIA)。使用Trusted Execution Environments(TEEs)保护在设备上的DNN模型,以降低(容易)白盒攻击到(更加困难)黑盒攻击。然而,一个主要缺点是增加了响应时间(最多50倍)。为了加速TEE保护的DNN计算,研究人员提出了多种模型分割技术。这些解决方案被称为TEE-Shielded DNN Partition(TSDP),它将DNN模型分成两部分,将隐私敏感部分卷入TEE中,而隐私不敏感部分将被卷入GPU上。这篇论文对现有TSDP解决方案进行了MS和MIA的分别测试,并对多个DNN模型、数据集和指标进行了测试。我们发现现有TSDP解决方案容易受到隐私窃取攻击,并不如常被认为的安全。我们还发现决定最佳DNN分割配置(即最高安全性和最小实用成本)对现有TSDP解决方案是困难的。实验表明,这些“甜点”配置在不同的数据集和模型上具有差异。基于实验所获的经验,我们提出了TEESlice,一种新的TSDP方法。TEESlice采用分配before training的策略,允许准确地分化隐私相关的权重与公共权重。TEESlice提供了完全保护MS和MIA During DNN推理的安全保障,并且在实验和实际环境中具有10倍以上的性能优化,无损失 accuracy。

Determining Winners in Elections with Absent Votes

  • paper_url: http://arxiv.org/abs/2310.07150
  • repo_url: None
  • paper_authors: Qishen Han, Amélie Marian, Lirong Xia
  • for: 这个论文研究了选举中缺失选票的情况下,决定赢家的问题。
  • methods: 这篇论文使用了NP-完全理论和特殊的位置得分规则,以计算缺失选票情况下的赢家问题。
  • results: 论文表明,在投票为截断排序(top-truncated)的情况下,该赢家判定问题对于若干投票规则是NP-完全的,而在特定的位置得分规则下,问题可以在多项式时间内求解。
    Abstract An important question in elections is the determine whether a candidate can be a winner when some votes are absent. We study this determining winner with the absent votes (WAV) problem when the votes are top-truncated. We show that the WAV problem is NP-complete for the single transferable vote, Maximin, and Copeland, and propose a special case of positional scoring rule such that the problem can be computed in polynomial time. Our results in top-truncated rankings differ from the results in full rankings as their hardness results still hold when the number of candidates or the number of missing votes are bounded, while we show that the problem can be solved in polynomial time in either case.
    摘要 选举中的一个重要问题是,在部分选票缺失的情况下判断某位候选人是否仍可能获胜。我们研究当选票为截断排序(top-truncated)时的“缺失选票下的赢家判定”(WAV)问题。我们证明,对于单一可转移投票(STV)、Maximin 和 Copeland 规则,WAV 问题是NP-完全的,并提出一类特殊的位置得分规则,使该问题可以在多项式时间内求解。我们在截断排序下的结果与完整排序下的结果不同:在完整排序下,即使候选人数量或缺失选票数量有界,困难性结果仍然成立;而我们证明,在截断排序下,上述任一情况中该问题都可以在多项式时间内求解。
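To make the tractable special case concrete, the sketch below checks winner-with-absent-votes under plurality, where the greedy assignment (every absent voter ranks the candidate of interest first) is optimal and the whole test runs in polynomial time. It deliberately does not cover STV, Maximin, or Copeland, for which the paper proves NP-completeness.

```python
# Polynomial-time WAV check for the plurality special case: candidate c can still
# win iff giving every absent vote to c makes c's score at least every rival's.
# This is only the easy illustrative case; STV/Maximin/Copeland are NP-complete.
from collections import Counter

def plurality_possible_winner(cast_top_choices, n_absent, candidates, c):
    scores = Counter({cand: 0 for cand in candidates})
    scores.update(cast_top_choices)          # top choice of each vote already cast
    scores[c] += n_absent                    # best case: all absent voters rank c first
    return all(scores[c] >= scores[d] for d in candidates if d != c)

votes = ["a", "a", "b", "b", "b", "c"]       # top choices of the votes already in
print(plurality_possible_winner(votes, n_absent=2, candidates={"a", "b", "c"}, c="a"))
# True: with both absent votes, a reaches 4 >= b's 3.
```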

Denoising Task Routing for Diffusion Models

  • paper_url: http://arxiv.org/abs/2310.07138
  • repo_url: None
  • paper_authors: Byeongjun Park, Sangmin Woo, Hyojun Go, Jin-Young Kim, Changick Kim
  • for: 这篇论文旨在提出一种简单的附加策略,以提高 diffusion 模型中多任务学习(MTL)的性能。
  • methods: 该策略通过有选择地激活 diffusion 模型中不同的通道子集,为单一架构内的各个去噪任务建立相互独立的信息通路。
  • results: 实验表明,该策略可以在不引入额外参数的情况下提升 diffusion 模型的性能,并加速训练过程的收敛。
    Abstract Diffusion models generate highly realistic images through learning a multi-step denoising process, naturally embodying the principles of multi-task learning (MTL). Despite the inherent connection between diffusion models and MTL, there remains an unexplored area in designing neural architectures that explicitly incorporate MTL into the framework of diffusion models. In this paper, we present Denoising Task Routing (DTR), a simple add-on strategy for existing diffusion model architectures to establish distinct information pathways for individual tasks within a single architecture by selectively activating subsets of channels in the model. What makes DTR particularly compelling is its seamless integration of prior knowledge of denoising tasks into the framework: (1) Task Affinity: DTR activates similar channels for tasks at adjacent timesteps and shifts activated channels as sliding windows through timesteps, capitalizing on the inherent strong affinity between tasks at adjacent timesteps. (2) Task Weights: During the early stages (higher timesteps) of the denoising process, DTR assigns a greater number of task-specific channels, leveraging the insight that diffusion models prioritize reconstructing global structure and perceptually rich contents in earlier stages, and focus on simple noise removal in later stages. Our experiments demonstrate that DTR consistently enhances the performance of diffusion models across various evaluation protocols, all without introducing additional parameters. Furthermore, DTR contributes to accelerating convergence during training. Finally, we show the complementarity between our architectural approach and existing MTL optimization techniques, providing a more complete view of MTL within the context of diffusion training.
    摘要 Diffusion models可以生成非常真实的图像,通过学习多步降噪过程,自然地启用多任务学习(MTL)的原理。尽管涉及到的连接在 diffusion models 和 MTL 之间,仍然有一个未探索的领域,那就是设计神经网络架构,以显式地包含 MTL 在 diffusion models 中。在这篇论文中,我们提出了一种简单的扩展策略,即 Denoising Task Routing(DTR),可以让现有的 diffusion model 架构中设置独特的信息通路,以便每个任务在单一架构中有自己的信息通路。DTR 的实现方式很有趣,它通过在模型中选择性地启用多个通道来实现这一点。具体来说,DTR 在不同的时间步骤中启用不同的通道,使得模型可以在不同的时间步骤中完成不同的任务。此外,DTR 还可以根据任务之间的相互关系来启用相应的通道,从而使得模型可以更好地利用多任务的相互关系。我们的实验结果表明,DTR 可以一直提高 diffusion models 的性能,无需添加额外参数。此外,DTR 还可以加速训练过程的收敛。最后,我们还证明了 DTR 和现有的 MTL 优化技术之间的相互关系,从而提供了更全面的 MTL 视角,以便更好地理解 diffusion 训练中的多任务学习。
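A hedged sketch of the routing idea: build a timestep-dependent channel mask that slides across the channel dimension (so adjacent timesteps share most channels) and widens at higher timesteps. The window placement and the share schedule are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch of a denoising-task-routing channel mask.
# Window placement and the task-specific share schedule are illustrative
# assumptions; the paper's exact routing may differ.
import torch

def dtr_channel_mask(t: int, T: int, n_channels: int, min_share=0.3, max_share=0.7):
    """Return a {0,1} mask over channels for timestep t of a T-step diffusion process."""
    # Higher timesteps (early denoising, global structure) get more task-specific channels.
    share = min_share + (max_share - min_share) * (t / (T - 1))
    window = int(share * n_channels)
    # Sliding window: adjacent timesteps activate largely overlapping channel subsets.
    start = int((t / (T - 1)) * (n_channels - window))
    mask = torch.zeros(n_channels)
    mask[start:start + window] = 1.0
    return mask

def routed_forward(h, t, T):
    """h: [batch, channels, H, W] feature map; zero out channels not routed to task t."""
    mask = dtr_channel_mask(t, T, h.size(1)).view(1, -1, 1, 1)
    return h * mask
```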

Off-Policy Evaluation for Human Feedback

  • paper_url: http://arxiv.org/abs/2310.07123
  • repo_url: None
  • paper_authors: Qitong Gao, Ge Gao, Juncheng Dong, Vahid Tarokh, Min Chi, Miroslav Pajic
  • for: 用于评估人工奖励信号(HF)的精度,提高RL在医疗等领域的安全性和效率。
  • methods: 基于即时人类奖励(IHR)重构方法,并利用蒸馏了环境知识的潜在空间进行正则化,对HF信号进行准确评估。
  • results: 在实验中,比直接使用现有OPE方法而言,我们的方法显著提高了HF信号的精度评估表现。
    Abstract Off-policy evaluation (OPE) is important for closing the gap between offline training and evaluation of reinforcement learning (RL), by estimating performance and/or rank of target (evaluation) policies using offline trajectories only. It can improve the safety and efficiency of data collection and policy testing procedures in situations where online deployments are expensive, such as healthcare. However, existing OPE methods fall short in estimating human feedback (HF) signals, as HF may be conditioned over multiple underlying factors and is only sparsely available; as opposed to the agent-defined environmental rewards (used in policy optimization), which are usually determined over parametric functions or distributions. Consequently, the nature of HF signals makes extrapolating accurate OPE estimations to be challenging. To resolve this, we introduce an OPE for HF (OPEHF) framework that revives existing OPE methods in order to accurately evaluate the HF signals. Specifically, we develop an immediate human reward (IHR) reconstruction approach, regularized by environmental knowledge distilled in a latent space that captures the underlying dynamics of state transitions as well as issuing HF signals. Our approach has been tested over two real-world experiments, adaptive in-vivo neurostimulation and intelligent tutoring, as well as in a simulation environment (visual Q&A). Results show that our approach significantly improves the performance toward estimating HF signals accurately, compared to directly applying (variants of) existing OPE methods.
    摘要 离线策略评估(OPE)对于弥合强化学习(RL)离线训练与评估之间的差距非常重要,它仅利用离线轨迹来估计目标(评估)策略的性能和/或排名。在在线部署代价高昂的场景(如医疗)中,它可以提高数据采集和策略测试流程的安全性与效率。然而,现有的OPE方法难以准确估计人类反馈(HF)信号,因为HF可能取决于多个潜在因素,而且只能稀疏地获得;相比之下,用于策略优化的由智能体定义的环境奖励通常由参数化函数或分布确定。因此,HF信号的这些特性使得外推得到准确的OPE估计具有挑战性。为解决这一问题,我们提出了面向HF的OPE框架(OPEHF),使现有的OPE方法得以准确评估HF信号。具体而言,我们开发了一种即时人类奖励(IHR)重构方法,并利用蒸馏到潜在空间中的环境知识进行正则化,该潜在空间既捕捉状态转移的潜在动态,也刻画HF信号的产生。我们的方法在两个真实实验(自适应体内神经刺激与智能辅导)以及一个模拟环境(视觉问答)中进行了测试。结果表明,与直接应用(各种变体的)现有OPE方法相比,我们的方法在准确估计HF信号方面有显著提升。

The Temporal Structure of Language Processing in the Human Brain Corresponds to The Layered Hierarchy of Deep Language Models

  • paper_url: http://arxiv.org/abs/2310.07106
  • repo_url: None
  • paper_authors: Ariel Goldstein, Eric Ham, Mariano Schain, Samuel Nastase, Zaid Zada, Avigail Dabush, Bobbi Aubrey, Harshvardhan Gazula, Amir Feder, Werner K Doyle, Sasha Devore, Patricia Dugan, Daniel Friedman, Roi Reichart, Michael Brenner, Avinatan Hassidim, Orrin Devinsky, Adeen Flinker, Omer Levy, Uri Hasson
  • for: 该论文旨在利用深度语言模型(DLM)来理解人类大脑中自然语言处理的机制。
  • methods: 该论文使用分层的连续数值向量序列来表示单词和上下文,这种表示方式支撑了如类人文本生成等一系列新兴应用。
  • results: 这paper表明了DLM层次结构可以模型人类大脑中语言理解的时间动力学,并通过对ECoG数据的使用,实现了更高的时间分辨率。结果表明DLM和人类大脑的语言处理有关系,DLM层次结构的层次积累信息与人类大脑高级语言区域的神经活动相吻合。
    Abstract Deep Language Models (DLMs) provide a novel computational paradigm for understanding the mechanisms of natural language processing in the human brain. Unlike traditional psycholinguistic models, DLMs use layered sequences of continuous numerical vectors to represent words and context, allowing a plethora of emerging applications such as human-like text generation. In this paper we show evidence that the layered hierarchy of DLMs may be used to model the temporal dynamics of language comprehension in the brain by demonstrating a strong correlation between DLM layer depth and the time at which layers are most predictive of the human brain. Our ability to temporally resolve individual layers benefits from our use of electrocorticography (ECoG) data, which has a much higher temporal resolution than noninvasive methods like fMRI. Using ECoG, we record neural activity from participants listening to a 30-minute narrative while also feeding the same narrative to a high-performing DLM (GPT2-XL). We then extract contextual embeddings from the different layers of the DLM and use linear encoding models to predict neural activity. We first focus on the Inferior Frontal Gyrus (IFG, or Broca's area) and then extend our model to track the increasing temporal receptive window along the linguistic processing hierarchy from auditory to syntactic and semantic areas. Our results reveal a connection between human language processing and DLMs, with the DLM's layer-by-layer accumulation of contextual information mirroring the timing of neural activity in high-order language areas.
    摘要
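The layer-by-layer analysis can be approximated in a few lines: extract hidden states from every GPT-2 layer and fit a ridge encoding model per layer against a neural response vector. The sketch below uses a small GPT-2 checkpoint and random data as stand-ins; word-to-ECoG alignment, lags, and the regression setup are simplified assumptions.

```python
# Hedged sketch: per-layer GPT-2 embeddings -> linear encoding of neural activity.
# Word-to-ECoG alignment and regression details are simplified assumptions.
import numpy as np
import torch
from transformers import GPT2Tokenizer, GPT2Model
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

tok = GPT2Tokenizer.from_pretrained("gpt2")           # stand-in for GPT2-XL
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True).eval()

text = "the narrative the participants listened to unfolds slowly over thirty minutes"
inputs = tok(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).hidden_states             # tuple: (n_layers+1) x [1, T, d]

# Fake neural response per token for illustration; real data would be ECoG
# high-gamma power aligned to word onsets at each lag.
n_tokens = inputs["input_ids"].shape[1]
neural = np.random.randn(n_tokens)

for layer, h in enumerate(hidden):
    X = h[0].numpy()                                    # [tokens, hidden_dim]
    r2 = cross_val_score(Ridge(alpha=10.0), X, neural, cv=3).mean()
    print(f"layer {layer:2d}: encoding R^2 = {r2:.3f}")
```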

ClausewitzGPT Framework: A New Frontier in Theoretical Large Language Model Enhanced Information Operations

  • paper_url: http://arxiv.org/abs/2310.07099
  • repo_url: None
  • paper_authors: Benjamin Kereopa-Yorke
  • for: This paper aims to provide a framework for navigating the risks and challenges of Large Language Models (LLMs) and autonomous AI agents in the context of information operations.
  • methods: The paper uses a novel formulation called the “ClausewitzGPT” equation to quantify the risks of LLM-augmented operations and emphasizes the importance of ethical considerations and autonomous AI agents in ensuring a moral compass and societal imperatives.
  • results: The paper highlights the staggering year-on-year growth of AI information campaigns and emphasizes the urgency of addressing the challenges and risks of LLMs and autonomous AI agents in the context of information operations.
    Abstract In a digital epoch where cyberspace is the emerging nexus of geopolitical contention, the melding of information operations and Large Language Models (LLMs) heralds a paradigm shift, replete with immense opportunities and intricate challenges. As tools like the Mistral 7B LLM (Mistral, 2023) democratise access to LLM capabilities (Jin et al., 2023), a vast spectrum of actors, from sovereign nations to rogue entities (Howard et al., 2023), find themselves equipped with potent narrative-shaping instruments (Goldstein et al., 2023). This paper puts forth a framework for navigating this brave new world in the "ClausewitzGPT" equation. This novel formulation not only seeks to quantify the risks inherent in machine-speed LLM-augmented operations but also underscores the vital role of autonomous AI agents (Wang, Xie, et al., 2023). These agents, embodying ethical considerations (Hendrycks et al., 2021), emerge as indispensable components (Wang, Ma, et al., 2023), ensuring that as we race forward, we do not lose sight of moral compasses and societal imperatives. Mathematically underpinned and inspired by the timeless tenets of Clausewitz's military strategy (Clausewitz, 1832), this thesis delves into the intricate dynamics of AI-augmented information operations. With references to recent findings and research (Department of State, 2023), it highlights the staggering year-on-year growth of AI information campaigns (Evgeny Pashentsev, 2023), stressing the urgency of our current juncture. The synthesis of Enlightenment thinking, and Clausewitz's principles provides a foundational lens, emphasising the imperative of clear strategic vision, ethical considerations, and holistic understanding in the face of rapid technological advancement.

Sparse Universal Transformer

  • paper_url: http://arxiv.org/abs/2310.07096
  • repo_url: https://github.com/shawntan/SUT
  • paper_authors: Shawn Tan, Yikang Shen, Zhenfang Chen, Aaron Courville, Chuang Gan
  • for: This work proposes the Sparse Universal Transformer (SUT) to reduce the computational and parameter cost of the Universal Transformer (UT) while retaining its parameter efficiency and generalization ability.
  • methods: The work combines a Sparse Mixture of Experts (SMoE) with a new stick-breaking-based dynamic halting mechanism to lower UT's computational complexity and control the amount of computation performed.
  • results: Experiments show that SUT matches strong baseline models on WMT'14 while using only half the computation and parameters, and generalizes strongly on formal language tasks (logical inference and CFQ). The new halting mechanism additionally cuts inference computation by roughly 50% with very little loss in performance.
    Abstract The Universal Transformer (UT) is a variant of the Transformer that shares parameters across its layers. Empirical evidence shows that UTs have better compositional generalization than Vanilla Transformers (VTs) in formal language tasks. The parameter-sharing also affords it better parameter efficiency than VTs. Despite its many advantages, scaling UT parameters is much more compute and memory intensive than scaling up a VT. This paper proposes the Sparse Universal Transformer (SUT), which leverages Sparse Mixture of Experts (SMoE) and a new stick-breaking-based dynamic halting mechanism to reduce UT's computation complexity while retaining its parameter efficiency and generalization ability. Experiments show that SUT achieves the same performance as strong baseline models while only using half computation and parameters on WMT'14 and strong generalization results on formal language tasks (Logical inference and CFQ). The new halting mechanism also enables around 50\% reduction in computation during inference with very little performance decrease on formal language tasks.
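The stick-breaking halting idea can be illustrated with a short sketch: one shared layer is applied repeatedly, a halting probability is predicted at each step, and each step's output is weighted by the portion of the "stick" broken off at that step, with a sparsely routed expert feed-forward block standing in for the SMoE. Module sizes, the top-1 routing, and the residual structure are assumptions for exposition, not the released SUT implementation.

```python
# Illustrative sketch of stick-breaking dynamic halting over a shared (universal)
# layer with a simplified sparse mixture-of-experts feed-forward block.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseFFN(nn.Module):
    """Top-1 routed mixture of expert FFNs (simplified SMoE)."""
    def __init__(self, d_model: int, n_experts: int = 4, d_ff: int = 512):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (batch, seq, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        top = gates.argmax(dim=-1)                 # top-1 expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = (top == i).unsqueeze(-1)
            out = out + mask * expert(x)
        return out

class StickBreakingUT(nn.Module):
    """One shared layer applied repeatedly; halting mass is broken off each step."""
    def __init__(self, d_model: int = 128, max_steps: int = 6):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.ffn = SparseFFN(d_model)
        self.halt = nn.Linear(d_model, 1)
        self.max_steps = max_steps

    def forward(self, x):
        remaining = torch.ones(x.shape[:2], device=x.device)  # unbroken stick per token
        output = torch.zeros_like(x)
        for _ in range(self.max_steps):
            h, _ = self.attn(x, x, x)
            x = x + h
            x = x + self.ffn(x)
            p = torch.sigmoid(self.halt(x)).squeeze(-1)        # halting prob this step
            weight = remaining * p                             # stick-breaking weight
            output = output + weight.unsqueeze(-1) * x
            remaining = remaining * (1.0 - p)
        return output + remaining.unsqueeze(-1) * x            # assign leftover mass
```

At inference time, tokens whose remaining stick mass falls below a threshold can simply stop being updated, which is how a halting mechanism of this kind saves computation.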

Jaeger: A Concatenation-Based Multi-Transformer VQA Model

  • paper_url: http://arxiv.org/abs/2310.07091
  • repo_url: None
  • paper_authors: Jieting Long, Zewei Shi, Penghao Jiang, Yidong Gan
  • for: Improve document-based visual question answering, addressing linguistic sense disambiguation and fine-grained multimodal retrieval.
  • methods: Uses large language and open-world prior models, RoBERTa-large and GPT2-xl, as question feature extractors and concatenates their outputs so that information from diverse sources is considered jointly, followed by dimensionality reduction to limit computation and inference time.
  • results: Achieves competitive performance on Task C of the PDF-VQA dataset.
    Abstract Document-based Visual Question Answering poses a challenging task between linguistic sense disambiguation and fine-grained multimodal retrieval. Although there has been encouraging progress in document-based question answering due to the utilization of large language and open-world prior models\cite{1}, several challenges persist, including prolonged response times, extended inference durations, and imprecision in matching. In order to overcome these challenges, we propose Jaegar, a concatenation-based multi-transformer VQA model. To derive question features, we leverage the exceptional capabilities of RoBERTa large\cite{2} and GPT2-xl\cite{3} as feature extractors. Subsequently, we subject the outputs from both models to a concatenation process. This operation allows the model to consider information from diverse sources concurrently, strengthening its representational capability. By leveraging pre-trained models for feature extraction, our approach has the potential to amplify the performance of these models through concatenation. After concatenation, we apply dimensionality reduction to the output features, reducing the model's computational effectiveness and inference time. Empirical results demonstrate that our proposed model achieves competitive performance on Task C of the PDF-VQA Dataset.
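A rough sketch of the concatenation step: question features from RoBERTa-large and GPT2-xl are concatenated and linearly projected down before the answer head. The pooling strategy, output dimension, and class names are assumptions, and the paper's handling of document images is omitted; this is not the authors' released model.

```python
# Sketch of concatenation-based question encoding with two pre-trained extractors.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class ConcatQuestionEncoder(nn.Module):
    def __init__(self, d_out: int = 768):
        super().__init__()
        self.tok_a = AutoTokenizer.from_pretrained("roberta-large")
        self.enc_a = AutoModel.from_pretrained("roberta-large")
        self.tok_b = AutoTokenizer.from_pretrained("gpt2-xl")
        self.tok_b.pad_token = self.tok_b.eos_token   # GPT-2 has no pad token by default
        self.enc_b = AutoModel.from_pretrained("gpt2-xl")
        d_concat = self.enc_a.config.hidden_size + self.enc_b.config.hidden_size
        self.reduce = nn.Linear(d_concat, d_out)      # dimensionality reduction

    @torch.no_grad()
    def encode(self, questions):
        a = self.tok_a(questions, return_tensors="pt", padding=True, truncation=True)
        b = self.tok_b(questions, return_tensors="pt", padding=True, truncation=True)
        # Mean-pool token states from each extractor (the pooling choice is assumed).
        feat_a = self.enc_a(**a).last_hidden_state.mean(dim=1)
        feat_b = self.enc_b(**b).last_hidden_state.mean(dim=1)
        return self.reduce(torch.cat([feat_a, feat_b], dim=-1))

# encoder = ConcatQuestionEncoder()
# q_feat = encoder.encode(["What is the total shown on page 3?"])  # shape (1, 768)
```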

Diversity of Thought Improves Reasoning Abilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2310.07088
  • repo_url: None
  • paper_authors: Ranjita Naik, Varun Chandrasekaran, Mert Yuksekgonul, Hamid Palangi, Besmira Nushi
  • for: Improve LLM performance in settings that require complex reasoning.
  • methods: Automatically creates diverse variants of the input prompt by soliciting approach ideas from the LLM, then ensembles the resulting reasoning paths either across multiple inference calls (DIV-SE) or within a single call (IDIV-SE).
  • results: Under a fixed generation budget, DIV-SE and IDIV-SE outperform prior baselines on several reasoning benchmarks and advance the state of the art on recent planning benchmarks, exceeding the best previously reported accuracy by at least 29.6 percentage points on the most challenging 4/5 Blocksworld task.
    Abstract Large language models (LLMs) are documented to struggle in settings that require complex reasoning. Nevertheless, instructing the model to break down the problem into smaller reasoning steps (Wei et al., 2022), or ensembling various generations through modifying decoding steps (Wang et al., 2023) boosts performance. Current methods assume that the input prompt is fixed and expect the decoding strategies to introduce the diversity needed for ensembling. In this work, we relax this assumption and discuss how one can create and leverage variations of the input prompt as a means to diversity of thought to improve model performance. We propose a method that automatically improves prompt diversity by soliciting feedback from the LLM to ideate approaches that fit for the problem. We then ensemble the diverse prompts in our method DIV-SE (DIVerse reasoning path Self-Ensemble) across multiple inference calls. We also propose a cost-effective alternative where diverse prompts are used within a single inference call; we call this IDIV-SE (In-call DIVerse reasoning path Self-Ensemble). Under a fixed generation budget, DIV-SE and IDIV-SE outperform the previously discussed baselines using both GPT-3.5 and GPT-4 on several reasoning benchmarks, without modifying the decoding process. Additionally, DIV-SE advances state-of-the-art performance on recent planning benchmarks (Valmeekam et al., 2023), exceeding the highest previously reported accuracy by at least 29.6 percentage points on the most challenging 4/5 Blocksworld task. Our results shed light on how to enforce prompt diversity toward LLM reasoning and thereby improve the pareto frontier of the accuracy-cost trade-off.
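The prompting loop can be sketched as follows: the model is first asked to ideate several approaches that fit the problem, each approach seeds its own prompt, and the final answers are aggregated by majority vote. The `llm` callable, the prompt wording, and the voting rule are placeholders for exposition rather than the paper's exact procedure.

```python
# Minimal sketch of prompt-diversity self-ensembling in the spirit of DIV-SE.
from collections import Counter
from typing import Callable, List

def div_se(llm: Callable[[str], str], question: str, n_approaches: int = 3) -> str:
    # Step 1: solicit feedback from the model about approaches that fit the problem.
    ideation = llm(
        f"List {n_approaches} distinct ways to solve the problem below, one per line.\n"
        f"Problem: {question}"
    )
    approaches = [line.strip("- ").strip() for line in ideation.splitlines() if line.strip()]

    # Step 2: one inference call per diverse prompt (IDIV-SE would instead pack all
    # approaches into a single call to save generation budget).
    answers: List[str] = []
    for approach in approaches[:n_approaches]:
        reply = llm(
            f"Solve the problem using this approach: {approach}\n"
            f"Problem: {question}\n"
            "End with a line 'Answer: <final answer>'."
        )
        answers.append(reply.rsplit("Answer:", 1)[-1].strip())

    # Step 3: self-ensemble by majority vote over the extracted answers.
    return Counter(answers).most_common(1)[0][0] if answers else ""
```

Because the diversity comes from the prompts rather than the decoding strategy, this kind of ensembling works with greedy decoding and fixed sampling settings.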

Leveraging Twitter Data for Sentiment Analysis of Transit User Feedback: An NLP Framework

  • paper_url: http://arxiv.org/abs/2310.07086
  • repo_url: None
  • paper_authors: Adway Das, Abhishek Kumar Prajapati, Pengxiang Zhang, Mukund Srinath, Andisheh Ranjbari
  • for: This study proposes a new NLP-based framework that uses inexpensive social media data to collect transit user feedback.
  • methods: The framework uses few-shot learning to classify tweets into predefined categories and applies a lexicon-based sentiment analysis model to assess the intensity and polarity of tweet sentiment.
  • results: Applied to the New York City subway system as a case study, the framework accurately classified tweets into safety, reliability, and maintenance categories and measured sentiment intensity and polarity within each, showing that inexpensive social media data can effectively capture user feedback.
    Abstract Traditional methods of collecting user feedback through transit surveys are often time-consuming, resource intensive, and costly. In this paper, we propose a novel NLP-based framework that harnesses the vast, abundant, and inexpensive data available on social media platforms like Twitter to understand users' perceptions of various service issues. Twitter, being a microblogging platform, hosts a wealth of real-time user-generated content that often includes valuable feedback and opinions on various products, services, and experiences. The proposed framework streamlines the process of gathering and analyzing user feedback without the need for costly and time-consuming user feedback surveys using two techniques. First, it utilizes few-shot learning for tweet classification within predefined categories, allowing effective identification of the issues described in tweets. It then employs a lexicon-based sentiment analysis model to assess the intensity and polarity of the tweet sentiments, distinguishing between positive, negative, and neutral tweets. The effectiveness of the framework was validated on a subset of manually labeled Twitter data and was applied to the NYC subway system as a case study. The framework accurately classifies tweets into predefined categories related to safety, reliability, and maintenance of the subway system and effectively measured sentiment intensities within each category. The general findings were corroborated through a comparison with an agency-run customer survey conducted in the same year. The findings highlight the effectiveness of the proposed framework in gauging user feedback through inexpensive social media data to understand the pain points of the transit system and plan for targeted improvements.
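A compact sketch of the two-stage pipeline follows: tweets are assigned to predefined service categories from a handful of labeled examples (nearest-centroid matching over sentence embeddings as a stand-in for the paper's few-shot classifier), and sentiment is then scored with a lexicon-based model (VADER as a stand-in; the paper does not name these exact components). The example tweets and category names are illustrative.

```python
# Sketch: few-shot tweet categorization plus lexicon-based sentiment scoring.
import numpy as np
from sentence_transformers import SentenceTransformer
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # needs nltk's vader_lexicon

CATEGORIES = {
    "safety": ["Someone was harassing riders on the platform", "Felt unsafe on the A train"],
    "reliability": ["Train delayed 40 minutes again", "Third cancelled service this week"],
    "maintenance": ["Escalator broken for a month", "Station is filthy and leaking"],
}

encoder = SentenceTransformer("all-MiniLM-L6-v2")
centroids = {
    label: encoder.encode(examples).mean(axis=0) for label, examples in CATEGORIES.items()
}
sia = SentimentIntensityAnalyzer()

def classify_and_score(tweet: str):
    """Assign the nearest category centroid (few-shot) and a lexicon sentiment score."""
    vec = encoder.encode([tweet])[0]
    label = max(
        centroids,
        key=lambda c: float(np.dot(vec, centroids[c]))
        / (np.linalg.norm(vec) * np.linalg.norm(centroids[c])),
    )
    polarity = sia.polarity_scores(tweet)["compound"]  # -1 (negative) to +1 (positive)
    return label, polarity

# classify_and_score("The 6 train has been stuck for 30 minutes, no announcements")
# -> ("reliability", strongly negative compound score)
```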