cs.AI - 2023-11-26

Variational Exploration Module VEM: A Cloud-Native Optimization and Validation Tool for Geospatial Modeling and AI Workflows

  • paper_url: http://arxiv.org/abs/2311.16196
  • repo_url: None
  • paper_authors: Julian Kuehnert, Hiwot Tadesse, Chris Dearden, Rosie Lickorish, Paolo Fraccaro, Anne Jones, Blair Edwards, Sekou L. Remy, Peter Melling, Tim Culmer
  • for: This work aims to improve our understanding of the physical systems of our environment and to support the design of best practices that reduce societal harm.
  • methods: The study combines geospatial observations with computational models and uses cloud-based deployment to scale up modeling and AI workflows.
  • results: The authors developed the Variational Exploration Module, which orchestrates modeling workflow executions in the cloud and applies Bayesian and machine learning-based methods to analyze model behavior; user configurations allow diverse sampling strategies to be combined.
    Abstract Geospatial observations combined with computational models have become key to understanding the physical systems of our environment and enable the design of best practices to reduce societal harm. Cloud-based deployments help to scale up these modeling and AI workflows. Yet, for practitioners to make robust conclusions, model tuning and testing is crucial, a resource intensive process which involves the variation of model input variables. We have developed the Variational Exploration Module which facilitates the optimization and validation of modeling workflows deployed in the cloud by orchestrating workflow executions and using Bayesian and machine learning-based methods to analyze model behavior. User configurations allow the combination of diverse sampling strategies in multi-agent environments. The flexibility and robustness of the model-agnostic module is demonstrated using real-world applications.

GGNNs: Generalizing GNNs using Residual Connections and Weighted Message Passing

  • paper_url: http://arxiv.org/abs/2311.15448
  • repo_url: None
  • paper_authors: Abhinav Raghuvanshi, Kushal Sokke Malleshappa
  • for: This work aims to improve the learning effectiveness and accuracy of graph neural networks (GNNs) by modifying the message-passing mechanism to strengthen generalization.
  • methods: The GNNs are built from multi-layer perceptrons (MLPs), and the message-passing mechanism is augmented with message weighting and residual connections to improve learning and speed up convergence.
  • results: Experiments show that the modified message-passing mechanism significantly improves learning effectiveness and accuracy, and also improves the network's generalization ability.
    Abstract Many real-world phenomena can be modeled as a graph, making them extremely valuable due to their ubiquitous presence. GNNs excel at capturing those relationships and patterns within these graphs, enabling effective learning and prediction tasks. GNNs are constructed using Multi-Layer Perceptrons (MLPs) and incorporate additional layers for message passing to facilitate the flow of features among nodes. It is commonly believed that the generalizing power of GNNs is attributed to the message-passing mechanism between layers, where nodes exchange information with their neighbors, enabling them to effectively capture and propagate information across the nodes of a graph. Our technique builds on these results, modifying the message-passing mechanism further: one by weighing the messages before accumulating at each node and another by adding residual connections. These two mechanisms show significant improvements in learning and faster convergence.
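
To make the two modifications concrete, here is a minimal PyTorch sketch (our illustration with assumed names, not the authors' code) of one message-passing layer that weighs each message with a learned gate before accumulation and adds a residual connection around the node update:

```python
import torch
import torch.nn as nn

class WeightedResidualMPLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.edge_gate = nn.Linear(2 * dim, 1)    # learned per-message weight
        self.update_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, h, edge_index):
        src, dst = edge_index                     # (2, E): edges src -> dst
        pair = torch.cat([h[src], h[dst]], dim=-1)
        msgs = self.msg_mlp(pair)                 # message along each edge
        w = torch.sigmoid(self.edge_gate(pair))   # weigh before accumulating
        agg = torch.zeros_like(h).index_add_(0, dst, w * msgs)
        return h + self.update_mlp(torch.cat([h, agg], dim=-1))  # residual

# e.g.: layer = WeightedResidualMPLayer(16)
# h = torch.randn(4, 16); edges = torch.tensor([[0, 1, 2], [1, 2, 3]])
# out = layer(h, edges)   # same shape as h
```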

ProtoArgNet: Interpretable Image Classification with Super-Prototypes and Argumentation [Technical Report]

  • paper_url: http://arxiv.org/abs/2311.15438
  • repo_url: None
  • paper_authors: Hamed Ayoobi, Nico Potyka, Francesca Toni
  • for: This paper proposes a novel interpretable deep neural network model for image classification.
  • methods: ProtoArgNet builds on prototypical-part learning. It uses super-prototypes that combine multiple prototypical parts into a single prototypical class representation, employs multi-layer perceptrons (MLPs) to improve accuracy, and can be customized to user cognitive requirements through sparsification.
  • results: In experiments, ProtoArgNet achieves higher accuracy than ProtoPNet and can recognize spatial relations between prototypical parts drawn from different regions of an image.
    Abstract We propose ProtoArgNet, a novel interpretable deep neural architecture for image classification in the spirit of prototypical-part-learning as found, e.g. in ProtoPNet. While earlier approaches associate every class with multiple prototypical-parts, ProtoArgNet uses super-prototypes that combine prototypical-parts into single prototypical class representations. Furthermore, while earlier approaches use interpretable classification layers, e.g. logistic regression in ProtoPNet, ProtoArgNet improves accuracy with multi-layer perceptrons while relying upon an interpretable reading thereof based on a form of argumentation. ProtoArgNet is customisable to user cognitive requirements by a process of sparsification of the multi-layer perceptron/argumentation component. Also, as opposed to other prototypical-part-learning approaches, ProtoArgNet can recognise spatial relations between different prototypical-parts that are from different regions in images, similar to how CNNs capture relations between patterns recognized in earlier layers.

Wired Perspectives: Multi-View Wire Art Embraces Generative AI

  • paper_url: http://arxiv.org/abs/2311.15421
  • repo_url: https://github.com/WinKawaks/DreamWire
  • paper_authors: Zhiyu Qu, Lan Yang, Honggang Zhang, Tao Xiang, Kaiyue Pang, Yi-Zhe Song
  • for: This paper presents an AI system that enables everyone to easily create multi-view wire art (MVWA).
  • methods: The system combines 3D Bézier curves, Prim's algorithm, and knowledge distillation from diffusion models or their variants (e.g., ControlNet). This blend lets it represent 3D wire art, ensure spatial continuity, and overcome data scarcity.
  • results: Through extensive evaluation and analysis, the authors shed light on the system's inner workings, including the trade-off between connectivity and visual aesthetics.
    Abstract Creating multi-view wire art (MVWA), a static 3D sculpture with diverse interpretations from different viewpoints, is a complex task even for skilled artists. In response, we present DreamWire, an AI system enabling everyone to craft MVWA easily. Users express their vision through text prompts or scribbles, freeing them from intricate 3D wire organisation. Our approach synergises 3D Bézier curves, Prim's algorithm, and knowledge distillation from diffusion models or their variants (e.g., ControlNet). This blend enables the system to represent 3D wire art, ensuring spatial continuity and overcoming data scarcity. Extensive evaluation and analysis are conducted to shed insight on the inner workings of the proposed system, including the trade-off between connectivity and visual aesthetics.
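
For readers unfamiliar with the two classical ingredients, the sketch below shows them in isolation; the function names and the use of a minimum spanning tree to join curve endpoints are our assumptions about a DreamWire-style pipeline, not the authors' implementation:

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=64):
    """Sample a cubic 3D Bezier curve (control points are (3,) arrays)."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def prim_mst(points):
    """Prim's algorithm over pairwise Euclidean distances; returns MST edges.
    Joining curve endpoints along such a tree keeps the wire one connected piece."""
    n = len(points)
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=-1)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        i, j = min(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: dist[e])
        in_tree.add(j)
        edges.append((i, j))
    return edges
```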

A Framework for Realistic Simulation of Daily Human Activity

  • paper_url: http://arxiv.org/abs/2311.15400
  • repo_url: None
  • paper_authors: Ifrah Idrees, Siddharth Singh, Kerui Xu, Dylan F. Glas
  • for: This paper provides a method for simulating human activity in home environments at scale, to support feature development and testing for social robots like Astro.
  • methods: The paper proposes a framework for generating daily activity patterns that supports manual configuration of different personas or activity patterns and variation of activity timings. It also presents a bidirectional constraint propagation algorithm for generating schedules from templates.
  • results: The paper validates the expressive power of its framework through a use-case scenario analysis and demonstrates that its method can generate data closely resembling human behavior from three public datasets and a self-collected dataset.
    Abstract For social robots like Astro which interact with and adapt to the daily movements of users within the home, realistic simulation of human activity is needed for feature development and testing. This paper presents a framework for simulating daily human activity patterns in home environments at scale, supporting manual configurability of different personas or activity patterns, variation of activity timings, and testing on multiple home layouts. We introduce a method for specifying day-to-day variation in schedules and present a bidirectional constraint propagation algorithm for generating schedules from templates. We validate the expressive power of our framework through a use case scenario analysis and demonstrate that our method can be used to generate data closely resembling human behavior from three public datasets and a self-collected dataset. Our contribution supports systematic testing of social robot behaviors at scale, enables procedural generation of synthetic datasets of human movement in different households, and can help minimize bias in training data, leading to more robust and effective robots for home environments.
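
As a minimal sketch of what bidirectional constraint propagation over a schedule template can look like (assuming a sequential template of activities with durations and start-time windows; this is our illustration, not the authors' algorithm):

```python
def propagate_schedule(activities):
    """Tighten start-time windows of a sequential activity template in place.
    Each activity: {"dur": minutes, "win": [earliest_start, latest_start]}.
    Returns False if any window empties, i.e. the template is infeasible."""
    for prev, cur in zip(activities, activities[1:]):              # forward pass
        cur["win"][0] = max(cur["win"][0], prev["win"][0] + prev["dur"])
    for cur, nxt in zip(reversed(activities[:-1]), reversed(activities[1:])):  # backward pass
        cur["win"][1] = min(cur["win"][1], nxt["win"][1] - cur["dur"])
    return all(a["win"][0] <= a["win"][1] for a in activities)

# e.g. minutes since midnight: wake-up then breakfast
# day = [{"dur": 30, "win": [420, 480]}, {"dur": 45, "win": [400, 540]}]
# propagate_schedule(day)  # -> True; breakfast's earliest start becomes 450
```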

Optimally Teaching a Linear Behavior Cloning Agent

  • paper_url: http://arxiv.org/abs/2311.15399
  • repo_url: None
  • paper_authors: Shubham Kumar Bharti, Stephen Wright, Adish Singla, Xiaojin Zhu
  • for: This paper studies optimal teaching of a Linear Behavior Cloning (LBC) learner.
  • methods: The teacher selects which states to demonstrate to the learner; the learner maintains a version space of infinite linear hypotheses consistent with the demonstrations, and the goal is to teach a realizable target policy with the minimum number of state demonstrations.
  • results: The authors propose a teaching algorithm, "Teach using Iterative Elimination (TIE)", that achieves instance-optimal teaching dimension. They also show that finding the optimal teaching set is NP-hard and provide an approximation algorithm with a guaranteed approximation ratio on the teaching dimension, with experiments confirming the algorithm's efficiency and effectiveness.
    Abstract We study optimal teaching of Linear Behavior Cloning (LBC) learners. In this setup, the teacher can select which states to demonstrate to an LBC learner. The learner maintains a version space of infinite linear hypotheses consistent with the demonstration. The goal of the teacher is to teach a realizable target policy to the learner using the minimum number of state demonstrations. This number is known as the Teaching Dimension (TD). We present a teaching algorithm called "Teach using Iterative Elimination (TIE)" that achieves instance-optimal TD. However, we also show that finding the optimal teaching set is computationally NP-hard. We further provide an approximation algorithm that guarantees an approximation ratio of $\log(|A|-1)$ on the teaching dimension. Finally, we provide experimental results to validate the efficiency and effectiveness of our algorithm.

Confidence Is All You Need for MI Attacks

  • paper_url: http://arxiv.org/abs/2311.15373
  • repo_url: None
  • paper_authors: Abhishek Sinha, Himanshi Tibrewal, Mansi Gupta, Nikhar Waghela, Shivank Garg
  • for: This work proposes a new membership inference attack method, which threatens the confidentiality of data used to train machine learning models.
  • methods: The method gauges a data point's membership using the model's confidence values. Instead of correlating loss with membership, it exploits the fact that models are fit to their training data and therefore assign higher confidence to training examples, whose specific patterns and noise the model has absorbed.
  • results: The paper presents a confidence-based membership inference attack, including a variant that needs no knowledge of a data point's true class, giving it an edge over existing label-dependent attack methods.
    Abstract In this evolving era of machine learning security, membership inference attacks have emerged as a potent threat to the confidentiality of sensitive data. In this attack, adversaries aim to determine whether a particular point was used during the training of a target model. This paper proposes a new method to gauge a data point's membership in a model's training set. Instead of correlating loss with membership, as is traditionally done, we have leveraged the fact that training examples generally exhibit higher confidence values when classified into their actual class. During training, the model is essentially being 'fit' to the training data and might face particular difficulties in generalization to unseen data. This asymmetry leads to the model achieving higher confidence on the training data as it exploits the specific patterns and noise present in the training data. Our proposed approach leverages the confidence values generated by the machine learning model. These confidence values provide a probabilistic measure of the model's certainty in its predictions and can further be used to infer the membership of a given data point. Additionally, we also introduce another variant of our method that allows us to carry out this attack without knowing the ground truth(true class) of a given data point, thus offering an edge over existing label-dependent attack methods.
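
A minimal sketch of the confidence-thresholding idea (the shadow-data calibration setup is an illustrative assumption, not necessarily the paper's exact procedure):

```python
import numpy as np

def confidence_mi_attack(probs, threshold):
    """Label-free variant: flag 'member' when the top softmax confidence
    exceeds a threshold, exploiting the train/test confidence gap."""
    return probs.max(axis=1) > threshold

def calibrate_threshold(member_probs, nonmember_probs):
    """Pick the cutoff that best separates confidences on shadow data where
    membership is known (a common, assumed calibration setup)."""
    scores = np.concatenate([member_probs.max(1), nonmember_probs.max(1)])
    labels = np.concatenate([np.ones(len(member_probs)),
                             np.zeros(len(nonmember_probs))])
    candidates = np.unique(scores)
    accs = [((scores > t) == labels).mean() for t in candidates]
    return candidates[int(np.argmax(accs))]
```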

TD-Net: A Tri-domain network for sparse-view CT reconstruction

  • paper_url: http://arxiv.org/abs/2311.15369
  • repo_url: None
  • paper_authors: Xinyuan Wang, Changqing Su, Bo Xiong
  • for: Reducing X-ray radiation risks while improving the quality of sparse-view CT images.
  • methods: A tri-domain network (TD-Net) that unifies sinogram, image, and frequency domain optimizations, incorporating a Frequency Supervision Module (FSM).
  • results: High-quality CT image reconstruction from sparse views that effectively balances radiation safety and image fidelity.
    Abstract Sparse-view CT reconstruction, aimed at reducing X-ray radiation risks, frequently suffers from image quality degradation, manifested as noise and artifacts. Existing post-processing and dual-domain techniques, although effective in radiation reduction, often lead to over-smoothed results, compromising diagnostic clarity. Addressing this, we introduce TD-Net, a pioneering tri-domain approach that unifies sinogram, image, and frequency domain optimizations. By incorporating Frequency Supervision Module(FSM), TD-Net adeptly preserves intricate details, overcoming the prevalent over-smoothing issue. Extensive evaluations demonstrate TD-Net's superior performance in reconstructing high-quality CT images from sparse views, efficiently balancing radiation safety and image fidelity. The enhanced capabilities of TD-Net in varied noise scenarios highlight its potential as a breakthrough in medical imaging.
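
A minimal sketch of a frequency-domain supervision term in the spirit of the FSM; the paper's exact formulation is not reproduced in this digest, so treat the loss below and its weighting as assumptions:

```python
import torch
import torch.nn.functional as F

def frequency_supervision_loss(pred, target):
    """Compare 2D Fourier magnitudes of the reconstruction and the ground
    truth so that high-frequency detail is penalized explicitly rather than
    averaged away (over-smoothing)."""
    return (torch.fft.fft2(pred).abs() - torch.fft.fft2(target).abs()).abs().mean()

def reconstruction_loss(pred, target, lam=0.1):
    """Pixel-domain term plus a weighted frequency-domain term."""
    return F.l1_loss(pred, target) + lam * frequency_supervision_loss(pred, target)
```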

Having Second Thoughts? Let’s hear it

  • paper_url: http://arxiv.org/abs/2311.15356
  • repo_url: https://github.com/rprokap/pset-9
  • paper_authors: Jung H. Lee, Sujith Vijayan
  • for: This study aims to make deep learning models more reliable and safe.
  • methods: It proposes a new certification process that mimics selective attention (top-down processing) to improve the accuracy and robustness of deep learning models.
  • results: Empirical evaluations show that the proposed certification process can improve model accuracy and help build safety measures that alleviate vulnerabilities to both artificial and natural adversarial examples.
    Abstract Deep learning models loosely mimic bottom-up signal pathways from low-order sensory areas to high-order cognitive areas. After training, DL models can outperform humans on some domain-specific tasks, but their decision-making process has been known to be easily disrupted. Since the human brain consists of multiple functional areas highly connected to one another and relies on intricate interplays between bottom-up and top-down (from high-order to low-order areas) processing, we hypothesize that incorporating top-down signal processing may make DL models more robust. To address this hypothesis, we propose a certification process mimicking selective attention and test if it could make DL models more robust. Our empirical evaluations suggest that this newly proposed certification can improve DL models' accuracy and help us build safety measures to alleviate their vulnerabilities with both artificial and natural adversarial examples.

Token Recycling for Efficient Sequential Inference with Vision Transformers

  • paper_url: http://arxiv.org/abs/2311.15335
  • repo_url: None
  • paper_authors: Jan Olszewski, Dawid Rymarczyk, Piotr Wójcik, Mateusz Pach, Bartosz Zieliński
  • for: This paper addresses the processing of incomplete inputs, for which ViTs are well suited because they do not require imputing missing values.
  • methods: It introduces the TOken REcycling (TORE) modification, usable with any architecture, to make sequential ViT inference more efficient. TORE splits the ViT into two parts: an iterator and an aggregator. The iterator processes each piece of sequential information separately into midway tokens, which are cached; the aggregator processes the midway tokens jointly to obtain the prediction.
  • results: TORE substantially improves the efficiency of sequential ViT inference and can be combined with any architecture. The paper also proposes a complementary training policy that significantly reduces the computational burden of sequential decision-making while maintaining state-of-the-art accuracy.
    Abstract Vision Transformers (ViTs) overpass Convolutional Neural Networks in processing incomplete inputs because they do not require the imputation of missing values. Therefore, ViTs are well suited for sequential decision-making, e.g. in the Active Visual Exploration problem. However, they are computationally inefficient because they perform a full forward pass each time a piece of new sequential information arrives. To reduce this computational inefficiency, we introduce the TOken REcycling (TORE) modification for the ViT inference, which can be used with any architecture. TORE divides ViT into two parts, iterator and aggregator. An iterator processes sequential information separately into midway tokens, which are cached. The aggregator processes midway tokens jointly to obtain the prediction. This way, we can reuse the results of computations made by iterator. Except for efficient sequential inference, we propose a complementary training policy, which significantly reduces the computational burden associated with sequential decision-making while achieving state-of-the-art accuracy.
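
A minimal sketch of the iterator/aggregator split (the wrapper interface is our assumption, not the authors' API):

```python
import torch
import torch.nn as nn

class TORESequentialInference(nn.Module):
    """Iterator/aggregator split: each new glimpse is encoded into midway
    tokens exactly once and cached; predictions re-read only the cache, so
    past glimpses are never re-encoded."""
    def __init__(self, iterator: nn.Module, aggregator: nn.Module):
        super().__init__()
        self.iterator, self.aggregator = iterator, aggregator
        self.cache = []                        # midway tokens of past glimpses

    @torch.no_grad()
    def observe(self, glimpse):
        self.cache.append(self.iterator(glimpse))   # (B, T_i, D) midway tokens

    @torch.no_grad()
    def predict(self):
        tokens = torch.cat(self.cache, dim=1)       # (B, sum T_i, D)
        return self.aggregator(tokens)
```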

ASI: Accuracy-Stability Index for Evaluating Deep Learning Models

  • paper_url: http://arxiv.org/abs/2311.15332
  • repo_url: None
  • paper_authors: Wei Dai, Daniel Berleant
  • for: This work provides a new quantitative metric for evaluating deep learning models that accounts for both accuracy and stability.
  • methods: It proposes the Accuracy-Stability Index (ASI), a quantitative measure that combines accuracy and stability for assessing deep learning models.
  • results: Experimental results demonstrate the application of ASI, and a 3D surface model is presented for visualizing ASI, mean accuracy, and coefficient of variation.
    Abstract In the context of deep learning research, where model introductions continually occur, the need for effective and efficient evaluation remains paramount. Existing methods often emphasize accuracy metrics, overlooking stability. To address this, the paper introduces the Accuracy-Stability Index (ASI), a quantitative measure incorporating both accuracy and stability for assessing deep learning models. Experimental results demonstrate the application of ASI, and a 3D surface model is presented for visualizing ASI, mean accuracy, and coefficient of variation. This paper addresses the important issue of quantitative benchmarking metrics for deep learning models, providing a new approach for accurately evaluating accuracy and stability of deep learning models. The paper concludes with discussions on potential weaknesses and outlines future research directions.
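
The digest does not reproduce the paper's formula, so the following is a purely hypothetical formulation that combines the two stated ingredients, mean accuracy and coefficient of variation, for illustration:

```python
import numpy as np

def accuracy_stability_index(run_accuracies, alpha=1.0):
    """Hypothetical ASI for illustration only (the paper defines its own):
    discount mean accuracy by the coefficient of variation across repeated
    runs, so an accurate-but-erratic model scores below a stable one."""
    acc = np.asarray(run_accuracies, dtype=float)
    cv = acc.std() / acc.mean()            # coefficient of variation
    return acc.mean() * (1.0 - alpha * cv)

print(accuracy_stability_index([0.90, 0.91, 0.89]))  # stable  -> ~0.892
print(accuracy_stability_index([0.99, 0.81, 0.90]))  # erratic -> ~0.827
```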

Lightweight Face Recognition: An Improved MobileFaceNet Model

  • paper_url: http://arxiv.org/abs/2311.15326
  • repo_url: None
  • paper_authors: Ahmad Hassanpour, Yasamin Kowsari
  • for: This paper explores lightweight face recognition (FR) models, specifically MobileFaceNet and its modified variant MMobileFaceNet. Devices with limited computational resources call for FR models with reduced memory footprints and computational demands that do not sacrifice accuracy.
  • methods: The authors study how dataset selection, model architecture, and optimization algorithms affect FR performance. They participated in the EFaR-2023 competition, training on a subset of the Webface42M dataset with sharpness-aware minimization (SAM) optimization, which yielded accuracy improvements across various benchmarks.
  • results: The approach produces models that are not only computationally efficient but also maintain high accuracy in diverse conditions, performing strongly across benchmarks, particularly in categories restricted by parameter count.
    Abstract This paper presents an extensive exploration and comparative analysis of lightweight face recognition (FR) models, specifically focusing on MobileFaceNet and its modified variant, MMobileFaceNet. The need for efficient FR models on devices with limited computational resources has led to the development of models with reduced memory footprints and computational demands without sacrificing accuracy. Our research delves into the impact of dataset selection, model architecture, and optimization algorithms on the performance of FR models. We highlight our participation in the EFaR-2023 competition, where our models showcased exceptional performance, particularly in categories restricted by the number of parameters. By employing a subset of the Webface42M dataset and integrating sharpness-aware minimization (SAM) optimization, we achieved significant improvements in accuracy across various benchmarks, including those that test for cross-pose, cross-age, and cross-ethnicity performance. The results underscore the efficacy of our approach in crafting models that are not only computationally efficient but also maintain high accuracy in diverse conditions.
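
A minimal sketch of one sharpness-aware minimization (SAM) step of the kind used in such pipelines (rho and the setup are illustrative; it assumes every parameter receives a gradient and is not the authors' training code):

```python
import torch

def sam_step(model, loss_fn, batch, base_optimizer, rho=0.05):
    """One SAM step: climb to the locally worst-case weights w + eps, then
    descend using the gradient measured there, which favors flat minima."""
    x, y = batch
    params = [p for p in model.parameters() if p.requires_grad]
    loss_fn(model(x), y).backward()                         # gradient at w
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    eps = [rho * p.grad / (grad_norm + 1e-12) for p in params]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)                                       # move to w + eps
    model.zero_grad()
    loss_fn(model(x), y).backward()                         # gradient at w + eps
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)                                       # restore w
    base_optimizer.step()                                   # SAM update
    model.zero_grad()
```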

A Foundational Framework and Methodology for Personalized Early and Timely Diagnosis

  • paper_url: http://arxiv.org/abs/2311.16195
  • repo_url: None
  • paper_authors: Tim Schubert, Richard W Peck, Alexander Gimson, Camelia Davtyan, Mihaela van der Schaar
  • for: Improving patients' quality of life and healthcare cost-effectiveness: early diagnosis can lead to better treatment options, longer survival and higher quality of life, and reduced overall cost.
  • methods: The paper uses decision-theoretic approaches to outline the diagnosis process and integrates machine learning and statistical methodology for estimating the optimal personalized diagnostic path.
  • results: It proposes the first foundational framework for systematically identifying and estimating the time-dependent value of diagnostic tests for an individual patient. The framework supports the development of personalized decision-support tools and provides a mechanism for assessing the value of technologies in terms of their impact on personalized early diagnosis.
    Abstract Early diagnosis of diseases holds the potential for deep transformation in healthcare by enabling better treatment options, improving long-term survival and quality of life, and reducing overall cost. With the advent of medical big data, advances in diagnostic tests as well as in machine learning and statistics, early or timely diagnosis seems within reach. Early diagnosis research often neglects the potential for optimizing individual diagnostic paths. To enable personalized early diagnosis, a foundational framework is needed that delineates the diagnosis process and systematically identifies the time-dependent value of various diagnostic tests for an individual patient given their unique characteristics. Here, we propose the first foundational framework for early and timely diagnosis. It builds on decision-theoretic approaches to outline the diagnosis process and integrates machine learning and statistical methodology for estimating the optimal personalized diagnostic path. To describe the proposed framework as well as possibly other frameworks, we provide essential definitions. The development of a foundational framework is necessary for several reasons: 1) formalism provides clarity for the development of decision support tools; 2) observed information can be complemented with estimates of the future patient trajectory; 3) the net benefit of counterfactual diagnostic paths and associated uncertainties can be modeled for individuals 4) 'early' and 'timely' diagnosis can be clearly defined; 5) a mechanism emerges for assessing the value of technologies in terms of their impact on personalized early diagnosis, resulting health outcomes and incurred costs. Finally, we hope that this foundational framework will unlock the long-awaited potential of timely diagnosis and intervention, leading to improved outcomes for patients and higher cost-effectiveness for healthcare systems.

Perspective in Opinion Dynamics on Complex Convex Domains of Time Networks for Addiction, Forgetting

  • paper_url: http://arxiv.org/abs/2311.15318
  • repo_url: None
  • paper_authors: Yasuko Kawahata
  • for: This paper revises previous work and introduces changes in spatio-temporal scales. It presents a model with layers A and B that have varying degrees of forgetting and dependence over time, and also models changes in forgetting and dependence in layers A, A', B, and B' under certain conditions.
  • methods: New clusters C and D are introduced that recommend, obstruct, block, or incite forgetting and dependence over time, describing how these behaviors develop in time and space. The paper also employs network analysis with dimer tiling to gain deeper insight into network structure and dynamics.
  • results: The results show how forgetting and dependence behave differently across spatio-temporal scales. The paper also identifies challenges for consensus building, such as the expansion of opinions, media influence, and issues of trust and distrust, which must be considered in the consensus-building model.
    Abstract This paper revises previous work and introduces changes in spatio-temporal scales. The paper presents a model that includes layers A and B with varying degrees of forgetting and dependence over time. We also model changes in dependence and forgetting in layers A, A', B, and B' under certain conditions. In addition, to discuss the formation of opinion clusters that have reinforcing or obstructive behaviors of forgetting and dependence and are conservative or brainwashing or detoxifying and less prone to filter bubbling, new clusters C and D that recommend, obstruct, block, or incite forgetting and dependence over time are Introduction. This introduction allows us to test hypotheses regarding the expansion of opinions in two dimensions over time and space, the state of development of opinion space, and the expansion of public opinion. Challenges in consensus building will be highlighted, emphasizing the dynamic nature of opinions and the need to consider factors such as dissent, distrust, and media influence. The paper proposes an extended framework that incorporates trust, distrust, and media influence into the consensus building model. We introduce network analysis using dimerizing as a method to gain deeper insights. In this context, we discuss network clustering, media influence, and consensus building. The location and distribution of dimers will be analyzed to gain insight into the structure and dynamics of the network. Dimertiling has been applied in various fields other than network analysis, such as physics and sociology. The paper concludes by emphasizing the importance of diverse perspectives, network analysis, and influential entities in consensus building. It also introduces torus-based visualizations that aid in understanding complex network structures.

Students’ interest in knowledge acquisition in Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2311.16193
  • repo_url: None
  • paper_authors: Manuela-Andreea Petrescu, Emilia-Loredana Pop, Tudor-Dan Mihoc
  • for: This study explores the expectations and views of computer science students regarding an Artificial Intelligence course.
  • methods: Answers were collected anonymously from 58 of 200 enrolled undergraduate students and analyzed and interpreted using thematic analysis.
  • results: Students are interested in AI because of its trendiness, applicability, their passion and interest in the subject, its potential for future growth, and high salaries. However, expectations were mainly about achieving medium-level knowledge, with men more interested than women in acquiring high-level skills. The least enjoyed aspect was the mathematics used in AI, and a small group of students was aware of AI's potential to be used unethically for negative purposes. Compared with a Databases course, students were more interested in the AI course, while their interest in Databases was limited to usage and basic information.
    Abstract Some students' expectations and points of view related to the Artificial Intelligence course are explored and analyzed in this study. We anonymously collected answers from 58 undergraduate students out of 200 enrolled in the Computer Science specialization. The answers were analysed and interpreted using thematic analysis to find out their interests and attractive and unattractive aspects related to the Artificial Intelligence study topic. We concluded that students are interested in Artificial Intelligence due to its trendiness, applicability, their passion and interest in the subject, the potential for future growth, and high salaries. However, the students' expectations were mainly related to achieving medium knowledge in the Artificial Intelligence field, and men seem to be more interested in acquiring high-level skills than women. The most common part that wasn't enjoyed by the students was the mathematical aspect used in Artificial Intelligence. Some of them (a small group) were also aware of the Artificial Intelligence potential which could be used in an unethical manner for negative purposes. Our study also provides a short comparison to the Databases course, in which students were not that passionate or interested in achieving medium knowledge, their interest was related to DB usage and basic information.

Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement

  • paper_url: http://arxiv.org/abs/2311.15303
  • repo_url: https://github.com/avani17101/CD
  • paper_authors: Avani Gupta, Saurabh Saini, P J Narayanan
  • for: 这paper focuses on improving the interpretability of neural networks by using human-centered concept explanations to reduce model bias and improve understanding.
  • methods: The paper introduces Concept Activation Vectors (CAVs) for ante-hoc training and Concept Distillation to create richer concepts using a pre-trained knowledgeable model as the teacher. The method can sensitize or desensitize a model towards concepts.
  • results: The paper shows applications of concept-sensitive training to debias several classification problems and to induce prior knowledge into IID, a reconstruction problem. Concept-sensitive training can improve model interpretability, reduce biases, and induce prior knowledge.
    Abstract Humans use abstract concepts for understanding instead of hard features. Recent interpretability research has focused on human-centered concept explanations of neural networks. Concept Activation Vectors (CAVs) estimate a model's sensitivity and possible biases to a given concept. In this paper, we extend CAVs from post-hoc analysis to ante-hoc training in order to reduce model bias through fine-tuning using an additional Concept Loss. Concepts were defined on the final layer of the network in the past. We generalize it to intermediate layers using class prototypes. This facilitates class learning in the last convolution layer, which is known to be most informative. We also introduce Concept Distillation to create richer concepts using a pre-trained knowledgeable model as the teacher. Our method can sensitize or desensitize a model towards concepts. We show applications of concept-sensitive training to debias several classification problems. We also use concepts to induce prior knowledge into IID, a reconstruction problem. Concept-sensitive training can improve model interpretability, reduce biases, and induce prior knowledge. Please visit https://avani17101.github.io/Concept-Distilllation/ for code and more details.
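
For context, the standard CAV recipe this work builds on can be sketched as follows (assumed inputs; this is not the paper's Concept Distillation code): fit a linear probe separating layer activations of concept examples from random examples, take the unit normal of its boundary as the CAV, and measure sensitivity as a directional derivative along it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def compute_cav(concept_acts, random_acts):
    """Fit a linear probe separating concept-example activations from random
    ones at a chosen layer; the unit normal of its boundary is the CAV."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    cav = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]
    return cav / np.linalg.norm(cav)

def concept_sensitivity(logit_grads_at_layer, cav):
    """Directional derivative of the class logit along the CAV; positive
    values mean the concept pushes predictions toward that class."""
    return logit_grads_at_layer @ cav
```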

A Data-driven and multi-agent decision support system for time slot management at container terminals: A case study for the Port of Rotterdam

  • paper_url: http://arxiv.org/abs/2311.15298
  • repo_url: None
  • paper_authors: Ali Nadi, Maaike Snelder, J. W. C. van Lint, Lóránt Tavasszy
  • for: This paper presents an intelligent decision-support system for controlling and managing truck arrival times at the terminal gates of a container hub.
  • methods: The model is context-aware: it uses big historical data to predict system states and applies control policies on truck inflow and outflow accordingly. The control policies are designed to satisfy multiple stakeholders, including trucking companies, terminal operators, and road traffic agencies.
  • results: Simulations supported by real-world data show that significant gains can be obtained in the system.
    Abstract Controlling the departure time of the trucks from a container hub is important to both the traffic and the logistics systems. This, however, requires an intelligent decision support system that can control and manage truck arrival times at terminal gates. This paper introduces an integrated model that can be used to understand, predict, and control logistics and traffic interactions in the port-hinterland ecosystem. This approach is context-aware and makes use of big historical data to predict system states and apply control policies accordingly, on truck inflow and outflow. The control policies ensure multiple stakeholders satisfaction including those of trucking companies, terminal operators, and road traffic agencies. The proposed method consists of five integrated modules orchestrated to systematically steer truckers toward choosing those time slots that are expected to result in lower gate waiting times and more cost-effective schedules. The simulation is supported by real-world data and shows that significant gains can be obtained in the system.

Spatial and Temporal Characteristics of Freight Tours: A Data-Driven Exploratory Analysis

  • paper_url: http://arxiv.org/abs/2311.15287
  • repo_url: None
  • paper_authors: Ali Nadi, Lóránt Tavasszy, J. W. C. van Lint, Maaike Snelder
  • for: This study models digital freight transport activity data from different freight markets to infer scheduling and routing patterns.
  • methods: It uses a new discrete-continuous decision tree approach to extract rules from the freight transport data.
  • results: Spatial and temporal characteristics prove important for capturing tour types and time-of-day patterns of freight activity. Carriers in most transport markets are sensitive to congestion: many adjust the type of tour, the departure time, and the number of stops per tour when facing a congested zone.
    Abstract This paper presents a modeling approach to infer scheduling and routing patterns from digital freight transport activity data for different freight markets. We provide a complete modeling framework including a new discrete-continuous decision tree approach for extracting rules from the freight transport data. We apply these models to collected tour data for the Netherlands to understand departure time patterns and tour strategies, also allowing us to evaluate the effectiveness of the proposed algorithm. We find that spatial and temporal characteristics are important to capture the types of tours and time-of-day patterns of freight activities. Also, the empirical evidence indicates that carriers in most of the transport markets are sensitive to the level of congestion. Many of them adjust the type of tour, departure time, and the number of stops per tour when facing a congested zone. The results can be used by practitioners to get more grip on transport markets and develop freight and traffic management measures.

Bias-Variance Trade-off in Physics-Informed Neural Networks with Randomized Smoothing for High-Dimensional PDEs

  • paper_url: http://arxiv.org/abs/2311.15283
  • repo_url: None
  • paper_authors: Zheyuan Hu, Zhouhao Yang, Yezhen Wang, George Em Karniadakis, Kenji Kawaguchi
  • for: This work aims to improve the computational efficiency of physics-informed neural networks (PINNs) in high dimensions, where computing high-order, high-dimensional derivatives is costly.
  • methods: Randomized Smoothing PINN (RS-PINN) adds Gaussian noise for stochastic smoothing of the original network, so that Monte Carlo methods can approximate derivatives and costly auto-differentiation is eliminated.
  • results: The study shows that RS-PINN introduces biases in high dimensions, stemming from the nonlinearity of the PDE and of the MSE loss, and proposes tailored bias-correction techniques based on the order of PDE nonlinearity. These corrections, together with a hybrid of the biased and unbiased variants, allow RS-PINN to better capture solutions of high-dimensional PDEs.
    Abstract While physics-informed neural networks (PINNs) have been proven effective for low-dimensional partial differential equations (PDEs), the computational cost remains a hurdle in high-dimensional scenarios. This is particularly pronounced when computing high-order and high-dimensional derivatives in the physics-informed loss. Randomized Smoothing PINN (RS-PINN) introduces Gaussian noise for stochastic smoothing of the original neural net model, enabling Monte Carlo methods for derivative approximation, eliminating the need for costly auto-differentiation. Despite its computational efficiency in high dimensions, RS-PINN introduces biases in both loss and gradients, negatively impacting convergence, especially when coupled with stochastic gradient descent (SGD). We present a comprehensive analysis of biases in RS-PINN, attributing them to the nonlinearity of the Mean Squared Error (MSE) loss and the PDE nonlinearity. We propose tailored bias correction techniques based on the order of PDE nonlinearity. The unbiased RS-PINN allows for a detailed examination of its pros and cons compared to the biased version. Specifically, the biased version has a lower variance and runs faster than the unbiased version, but it is less accurate due to the bias. To optimize the bias-variance trade-off, we combine the two approaches in a hybrid method that balances the rapid convergence of the biased version with the high accuracy of the unbiased version. In addition, we present an enhanced implementation of RS-PINN. Extensive experiments on diverse high-dimensional PDEs, including Fokker-Planck, HJB, viscous Burgers', Allen-Cahn, and Sine-Gordon equations, illustrate the bias-variance trade-off and highlight the effectiveness of the hybrid RS-PINN. Empirical guidelines are provided for selecting biased, unbiased, or hybrid versions, depending on the dimensionality and nonlinearity of the specific PDE problem.
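
A minimal sketch of the randomized-smoothing derivative estimate underlying RS-PINN (constants are illustrative): for $f_\sigma(x) = \mathbb{E}[f(x + \sigma \epsilon)]$ with $\epsilon \sim \mathcal{N}(0, I)$, the Gaussian identity $\nabla f_\sigma(x) = \mathbb{E}[f(x + \sigma \epsilon)\,\epsilon] / \sigma$ yields a Monte Carlo gradient with no auto-differentiation:

```python
import numpy as np

def smoothed_value_and_grad(f, x, sigma=0.1, n_samples=4096, seed=0):
    """Monte Carlo estimate of f_sigma(x) = E[f(x + sigma*eps)] and, via the
    Gaussian identity grad f_sigma(x) = E[f(x + sigma*eps) * eps] / sigma,
    of its gradient, with no auto-differentiation."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, x.size))
    fx = np.apply_along_axis(f, 1, x[None, :] + sigma * eps)  # (n_samples,)
    return fx.mean(), (fx[:, None] * eps).mean(axis=0) / sigma

# e.g. f(x) = sum(x^2): the gradient estimate approaches the true 2x as
# n_samples grows; the residual MC noise is part of what the paper analyzes.
value, grad = smoothed_value_and_grad(lambda z: np.sum(z ** 2), np.array([1.0, -2.0]))
```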

Tessel: Boosting Distributed Execution of Large DNN Models via Flexible Schedule Search

  • paper_url: http://arxiv.org/abs/2311.15269
  • repo_url: None
  • paper_authors: Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang
  • for: Improving the performance of deep neural network (DNN) training and inference tasks whose execution is distributed across multiple devices, which requires carefully planned execution schedules.
  • methods: Tessel, an automated system that searches for efficient execution schedules under diverse operator placement strategies.
  • results: Tessel improves performance for both training and inference: experiments with representative DNN models show up to 5.5x training speedup and up to 38% inference latency reduction.
    Abstract Increasingly complex and diverse deep neural network (DNN) models necessitate distributing the execution across multiple devices for training and inference tasks, and also require carefully planned schedules for performance. However, existing practices often rely on predefined schedules that may not fully exploit the benefits of emerging diverse model-aware operator placement strategies. Handcrafting high-efficiency schedules can be challenging due to the large and varying schedule space. This paper presents Tessel, an automated system that searches for efficient schedules for distributed DNN training and inference for diverse operator placement strategies. To reduce search costs, Tessel leverages the insight that the most efficient schedules often exhibit repetitive pattern (repetend) across different data inputs. This leads to a two-phase approach: repetend construction and schedule completion. By exploring schedules for various operator placement strategies, Tessel significantly improves both training and inference performance. Experiments with representative DNN models demonstrate that Tessel achieves up to 5.5x training performance speedup and up to 38% inference latency reduction.

Unlearning via Sparse Representations

  • paper_url: http://arxiv.org/abs/2311.15268
  • repo_url: None
  • paper_authors: Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal
  • for: Erasing knowledge about a forget set from a trained model (class unlearning).
  • methods: A nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck.
  • results: On three datasets, CIFAR-10, CIFAR-100, and LACUNA-100, the technique performs as well as or better than the state-of-the-art SCRUB method while incurring almost no computational cost.
    Abstract Machine \emph{unlearning}, which involves erasing knowledge about a \emph{forget set} from a trained model, can prove to be costly and infeasible by existing techniques. We propose a nearly compute-free zero-shot unlearning technique based on a discrete representational bottleneck. We show that the proposed technique efficiently unlearns the forget set and incurs negligible damage to the model's performance on the rest of the data set. We evaluate the proposed technique on the problem of \textit{class unlearning} using three datasets: CIFAR-10, CIFAR-100, and LACUNA-100. We compare the proposed technique to SCRUB, a state-of-the-art approach which uses knowledge distillation for unlearning. Across all three datasets, the proposed technique performs as well as, if not better than SCRUB while incurring almost no computational cost.

Utilizing Multiple Inputs Autoregressive Models for Bearing Remaining Useful Life Prediction

  • paper_url: http://arxiv.org/abs/2311.16192
  • repo_url: None
  • paper_authors: Junliang Wang, Qinghua Zhang, Guanhua Zhu, Guoxi Sun
  • for: The paper aims to improve the accuracy of remaining useful life (RUL) prediction for rolling bearings in industrial production.
  • methods: The proposed method uses a multi-input autoregressive model that integrates vibration signals with previously predicted health indicator (HI) values, utilizing feature fusion and autoregressive iterations to improve generalization capabilities.
  • results: The proposed method achieves significantly lower root mean square error (RMSE) and score compared to other backbone networks using similar autoregressive approaches, and outperforms traditional autoregressive models and non-autoregressive networks in terms of generalization ability.
    Abstract Accurate prediction of the Remaining Useful Life (RUL) of rolling bearings is crucial in industrial production, yet existing models often struggle with limited generalization capabilities due to their inability to fully process all vibration signal patterns. We introduce a novel multi-input autoregressive model to address this challenge in RUL prediction for bearings. Our approach uniquely integrates vibration signals with previously predicted Health Indicator (HI) values, employing feature fusion to output current window HI values. Through autoregressive iterations, the model attains a global receptive field, effectively overcoming the limitations in generalization. Furthermore, we innovatively incorporate a segmentation method and multiple training iterations to mitigate error accumulation in autoregressive models. Empirical evaluation on the PMH2012 dataset demonstrates that our model, compared to other backbone networks using similar autoregressive approaches, achieves significantly lower Root Mean Square Error (RMSE) and Score. Notably, it outperforms traditional autoregressive models that use label values as inputs and non-autoregressive networks, showing superior generalization abilities with a marked lead in RMSE and Score metrics.
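
A minimal sketch of the autoregressive feedback loop described above (the model interface is an assumption for illustration):

```python
import torch

def rollout_health_indicator(model, windows, hi_init=1.0):
    """Feed each vibration window to the model together with the previously
    *predicted* HI value; the prediction is fed back each step, giving the
    model a global receptive field over the run. Error accumulation in this
    loop is what the paper's segmentation and repeated training iterations
    are designed to keep in check."""
    hi = torch.tensor([hi_init])
    trajectory = []
    for w in windows:                 # w: one vibration window, e.g. (1, L)
        hi = model(w, hi)             # fuses signal features with the past HI
        trajectory.append(float(hi))
    return trajectory                 # HI curve from which RUL is estimated
```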

Algorithm Evolution Using Large Language Model

  • paper_url: http://arxiv.org/abs/2311.15249
  • repo_url: https://github.com/Beta-Blaze/astral-LLM
  • paper_authors: Fei Liu, Xialiang Tong, Mingxuan Yuan, Qingfu Zhang
  • for: This work proposes a new approach for automatically generating optimization algorithms, reducing the effort and domain knowledge required from human experts.
  • methods: The approach uses a large language model (LLM) to automatically generate optimization algorithms within an evolutionary framework, without any model training.
  • results: On the traveling salesman problem, the constructive algorithms obtained by AEL outperform simple hand-crafted heuristics and heuristics generated directly by the LLM. AEL also exhibits better scalability across problem sizes than domain-specific deep learning model-based algorithms.
    Abstract Optimization can be found in many real-life applications. Designing an effective algorithm for a specific optimization problem typically requires a tedious amount of effort from human experts with domain knowledge and algorithm design skills. In this paper, we propose a novel approach called Algorithm Evolution using Large Language Model (AEL). It utilizes a large language model (LLM) to automatically generate optimization algorithms via an evolutionary framework. AEL does algorithm-level evolution without model training. Human effort and requirements for domain knowledge can be significantly reduced. We take constructive methods for the salesman traveling problem as a test example, we show that the constructive algorithm obtained by AEL outperforms simple hand-crafted and LLM-generated heuristics. Compared with other domain deep learning model-based algorithms, these methods exhibit excellent scalability across different problem sizes. AEL is also very different from previous attempts that utilize LLMs as search operators in algorithms.
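
A skeletal sketch of the AEL loop; the prompts, the selection/crossover scheme, and the llm_generate placeholder are our illustrative assumptions, since AEL treats the LLM as a black-box generator and needs no model training:

```python
import random

def llm_generate(prompt: str) -> str:
    """Placeholder for a call to any LLM backend; AEL uses the model purely
    as a black-box code generator, with no training or fine-tuning."""
    raise NotImplementedError

def ael(evaluate, pop_size=8, generations=20):
    """Evolve a population of candidate algorithms (as code strings). The
    fitness function `evaluate` scores a candidate on the task, e.g. the
    average tour length its heuristic produces on TSP instances."""
    population = [llm_generate("Write a constructive TSP heuristic in Python.")
                  for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=evaluate)[: pop_size // 2]  # lower = better
        children = [llm_generate(
            "Combine the ideas of these two heuristics into a better one:\n"
            + random.choice(parents) + "\n---\n" + random.choice(parents))
            for _ in range(pop_size - len(parents))]
        population = parents + children
    return min(population, key=evaluate)
```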

ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection

  • paper_url: http://arxiv.org/abs/2311.15243
  • repo_url: None
  • paper_authors: Yichen Bai, Zongbo Han, Changqing Zhang, Bing Cao, Xiaoheng Jiang, Qinghua Hu
  • for: Improving out-of-distribution (OOD) detection, especially for the most challenging outliers that closely resemble in-distribution (ID) data (ID-like samples).
  • methods: CLIP is used to discover ID-like outliers in the vicinity space of the ID samples; a prompt learning framework then exploits these outliers to further leverage CLIP's capabilities for OOD detection.
  • results: The method achieves superior few-shot performance on various real-world image datasets; for example, in 4-shot OOD detection on ImageNet-1k it reduces the average FPR95 by 12.16% and improves the average AUROC by 2.76% compared with state-of-the-art methods.
    Abstract Out-of-distribution (OOD) detection methods often exploit auxiliary outliers to train model identifying OOD samples, especially discovering challenging outliers from auxiliary outliers dataset to improve OOD detection. However, they may still face limitations in effectively distinguishing between the most challenging OOD samples that are much like in-distribution (ID) data, i.e., ID-like samples. To this end, we propose a novel OOD detection framework that discovers ID-like outliers using CLIP from the vicinity space of the ID samples, thus helping to identify these most challenging OOD samples. Then a prompt learning framework is proposed that utilizes the identified ID-like outliers to further leverage the capabilities of CLIP for OOD detection. Benefiting from the powerful CLIP, we only need a small number of ID samples to learn the prompts of the model without exposing other auxiliary outlier datasets. By focusing on the most challenging ID-like OOD samples and elegantly exploiting the capabilities of CLIP, our method achieves superior few-shot learning performance on various real-world image datasets (e.g., in 4-shot OOD detection on the ImageNet-1k dataset, our method reduces the average FPR95 by 12.16% and improves the average AUROC by 2.76%, compared to state-of-the-art methods).

See and Think: Embodied Agent in Virtual Environment

  • paper_url: http://arxiv.org/abs/2311.15209
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Zhonghan Zhao, Wenhao Chai, Xuan Wang, Li Boyi, Shengyu Hao, Shidong Cao, Tian Ye, Jenq-Neng Hwang, Gaoang Wang
  • for: This paper aims to propose a comprehensive and visionary embodied agent, STEVE, in the Minecraft virtual environment.
  • methods: STEVE consists of three key components: vision perception, language instruction, and code action. Vision perception involves interpreting visual information in the environment, while language instruction is responsible for iterative reasoning and decomposing complex tasks into manageable guidelines. Code action generates executable skill actions based on retrieval in a skill database.
  • results: The paper achieves at most 1.5 times faster unlocking key tech trees and 2.5 times quicker in block search tasks compared to previous state-of-the-art methods. Additionally, the paper collects a new dataset called STEVE-21K, which includes 600+ vision-environment pairs, 20K knowledge question-answering pairs, and 200+ skill-code pairs.
    Abstract Large language models (LLMs) have achieved impressive progress on several open-world tasks. Recently, using LLMs to build embodied agents has been a hotspot. In this paper, we propose STEVE, a comprehensive and visionary embodied agent in the Minecraft virtual environment. STEVE consists of three key components: vision perception, language instruction, and code action. Vision perception involves the interpretation of visual information in the environment, which is then integrated into the LLMs component with agent state and task instruction. Language instruction is responsible for iterative reasoning and decomposing complex tasks into manageable guidelines. Code action generates executable skill actions based on retrieval in skill database, enabling the agent to interact effectively within the Minecraft environment. We also collect STEVE-21K dataset, which includes 600$+$ vision-environment pairs, 20K knowledge question-answering pairs, and 200$+$ skill-code pairs. We conduct continuous block search, knowledge question and answering, and tech tree mastery to evaluate the performance. Extensive experiments show that STEVE achieves at most $1.5 \times$ faster unlocking key tech trees and $2.5 \times$ quicker in block search tasks compared to previous state-of-the-art methods.

LongStory: Coherent, Complete and Length Controlled Long story Generation

  • paper_url: http://arxiv.org/abs/2311.15208
  • repo_url: None
  • paper_authors: Kyeongman Park, Nakyeong Yang, Kyomin Jung
  • for: This paper presents a model for coherent, complete, and length-controlled long story generation.
  • methods: It introduces two novel methodologies: (1) a long- and short-term contexts weight calibrator (CWC), which adjusts the weights of long-term context memory and short-term context in acknowledgment of their distinct roles; and (2) long story structural positions (LSP), which use discourse tokens to convey the structural position within a long story.
  • results: Trained on three datasets with varied average story lengths, LongStory outperforms other baselines, including the strong story generator Plotmachine, in coherence, completeness, relevance, and repetitiveness. Zero-shot tests on each dataset further assess the model's ability to predict outcomes beyond its training data, and comparisons with model variants validate the methodology.
    Abstract A human author can write any length of story without losing coherence. Also, they always bring the story to a proper ending, an ability that current language models lack. In this work, we present the LongStory for coherent, complete, and length-controlled long story generation. LongStory introduces two novel methodologies: (1) the long and short-term contexts weight calibrator (CWC) and (2) long story structural positions (LSP). The CWC adjusts weights for long-term context Memory and short-term context Cheating, acknowledging their distinct roles. The LSP employs discourse tokens to convey the structural positions of a long story. Trained on three datasets with varied average story lengths, LongStory outperforms other baselines, including the strong story generator Plotmachine, in coherence, completeness, relevance, and repetitiveness. We also perform zero-shot tests on each dataset to assess the model's ability to predict outcomes beyond its training data and validate our methodology by comparing its performance with variants of our model.

ChatGPT and Beyond: The Generative AI Revolution in Education

  • paper_url: http://arxiv.org/abs/2311.15198
  • repo_url: None
  • paper_authors: Mohammad AL-Smadi
  • for: This paper aims to explore the potential applications and implications of generative AI models, particularly ChatGPT, in the educational landscape.
  • methods: The paper conducts a comprehensive and rigorous evaluation of recent academic literature on generative AI models in education, specifically targeting high-impact research from Scopus-indexed Q1 and Q2 journals published between November 2022 and July 2023.
  • results: The survey finds the potential benefits, challenges, and emerging trends in the integration of AI technologies into learning environments, and seeks to contribute to the understanding of the nexus between artificial intelligence and education, empowering educators, researchers, and policymakers to make informed decisions about the integration of AI technologies into learning environments.
    Abstract The wide adoption and usage of generative artificial intelligence (AI) models, particularly ChatGPT, has sparked a surge in research exploring their potential applications in the educational landscape. This survey examines academic literature published between November, 2022, and July, 2023, specifically targeting high-impact research from Scopus-indexed Q1 and Q2 journals. This survey delves into the practical applications and implications of generative AI models across a diverse range of educational contexts. Through a comprehensive and rigorous evaluation of recent academic literature, this survey seeks to illuminate the evolving role of generative AI models, particularly ChatGPT, in education. By shedding light on the potential benefits, challenges, and emerging trends in this dynamic field, the survey endeavors to contribute to the understanding of the nexus between artificial intelligence and education. The findings of this review will empower educators, researchers, and policymakers to make informed decisions about the integration of AI technologies into learning environments.

Neural Network Models of Becoming a Cardinal Principle Knower

  • paper_url: http://arxiv.org/abs/2311.15194
  • repo_url: None
  • paper_authors: Vima Gupta, Sashank Varma
  • for: This paper investigates how children, as they enter elementary school, transition from a memorized count list to knowing the successor function and understanding the countably infinite.
  • methods: Two neural network models learn the successor function on the pairs (N, N+1): one uses a one-hot encoding of the input and output values, the other a place-value encoding (a toy reproduction is sketched after the abstract below).
  • results: The place-value model shows the predicted drop in representational similarity across tens boundaries. Counting across a tens boundary can be understood as a vector operation in 2D space, where numbers sharing a tens digit are organized in a linearly separable manner and numbers sharing a ones digit are grouped together. A curriculum-learning simulation shows that, in the expanding numerical environment of the developing child, representations of smaller numbers continue to be sharpened even as larger numbers begin to be learned.
    Abstract As children enter elementary school, their understanding of the ordinal structure of numbers transitions from a memorized count list of the first 50-100 numbers to knowing the successor function and understanding the countably infinite. We investigate this developmental change in two neural network models that learn the successor function on the pairs (N, N+1) for N in (0, 98). The first uses a one-hot encoding of the input and output values and corresponds to children memorizing a count list, while the second model uses a place-value encoding and corresponds to children learning the language rules for naming numbers. The place-value model showed a predicted drop in representational similarity across tens boundaries. Counting across a tens boundary can be understood as a vector operation in 2D space, where the numbers with the same tens place are organized in a linearly separable manner, whereas those with the same ones place are grouped together. A curriculum learning simulation shows that, in the expanding numerical environment of the developing child, representations of smaller numbers continue to be sharpened even as larger numbers begin to be learned. These models set the stage for future work using recurrent architectures to move beyond learning the successor function to simulating the counting process more generally, and point towards a deeper understanding of what it means to understand the countably infinite.
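The two encodings the paper contrasts are easy to reproduce in miniature. The sketch below is our own toy setup, not the authors' code (the layer sizes, optimizer, and training loop are assumptions): it trains a small MLP on the (N, N+1) pairs under the place-value encoding, and swapping in `one_hot` reproduces the memorized-count-list condition.

```python
import numpy as np
import torch
import torch.nn as nn

def one_hot(n: int, size: int = 100) -> np.ndarray:
    # Count-list condition: each number is an unanalyzed, distinct symbol.
    v = np.zeros(size, dtype=np.float32)
    v[n] = 1.0
    return v

def place_value(n: int) -> np.ndarray:
    # Two concatenated one-hot codes: tens digit (10 units) + ones digit (10 units).
    v = np.zeros(20, dtype=np.float32)
    v[n // 10] = 1.0
    v[10 + n % 10] = 1.0
    return v

# Successor-function pairs (N, N+1) for N in 0..98.
X = torch.tensor(np.stack([place_value(n) for n in range(99)]))
Y = torch.tensor(np.stack([place_value(n + 1) for n in range(99)]))

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 20))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(model(X), Y)
    loss.backward()
    opt.step()
```

Comparing hidden-layer activations of such a model for neighbors that cross a tens boundary (e.g., 29 and 30) is where the representational-similarity analysis would apply.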

IA-LSTM: Interaction-Aware LSTM for Pedestrian Trajectory Prediction

  • paper_url: http://arxiv.org/abs/2311.15193
  • repo_url: None
  • paper_authors: Yuehai Chen
  • for: Predicting pedestrian trajectories in crowd scenarios is indispensable for self-driving vehicles and autonomous mobile robots, since estimating the future locations of surrounding pedestrians informs policy decisions that avoid collisions.
  • methods: The paper proposes a novel correntropy-based mechanism that measures the relative importance of human-human interactions and builds a personal space for each pedestrian (a toy sketch follows the abstract below). An Interaction Module incorporating this data-driven mechanism extracts feature representations of dynamic human-human interactions in the scene and computes the corresponding importance weights.
  • results: Experiments on two public datasets show that the model outperforms several recent methods on pedestrian trajectory prediction.
    Abstract Predicting the trajectory of pedestrians in crowd scenarios is indispensable in the self-driving and autonomous mobile robot fields, because estimating the future locations of surrounding pedestrians is beneficial for policy decisions that avoid collisions. It is a challenging issue because humans have different walking motions, and the interactions between humans and objects in the current environment, especially between humans themselves, are complex. Previous research has focused on how to model human-human interactions while neglecting their relative importance. To address this issue, we introduce a novel mechanism based on correntropy, which can not only measure the relative importance of human-human interactions but also build a personal space for each pedestrian. We further propose an Interaction Module including this data-driven mechanism that can effectively extract feature representations of dynamic human-human interactions in the scene and calculate corresponding weights to represent the importance of different interactions. To share such social messages among pedestrians, we design an interaction-aware architecture based on the Long Short-Term Memory (LSTM) network for trajectory prediction. We demonstrate the performance of our model on two public datasets, and the experimental results show that it achieves better performance than several recent strong methods.
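As a rough illustration of the correntropy idea, the snippet below weights each neighbor of a pedestrian with a Gaussian kernel on pairwise distance and reserves the largest weights for neighbors intruding on an assumed personal-space radius. The kernel bandwidth, radius, and normalization are our own assumptions; the paper's mechanism is learned and data-driven rather than this fixed form.

```python
import numpy as np

def correntropy_weights(p_i, neighbors, sigma=1.0, r_personal=0.4):
    """Toy correntropy-style importance weights for neighbors of pedestrian i.
    p_i: (2,) position; neighbors: (k, 2) positions of other pedestrians."""
    d = np.linalg.norm(neighbors - p_i, axis=-1)   # pairwise distances
    k = np.exp(-d ** 2 / (2 * sigma ** 2))         # Gaussian (correntropy) kernel
    k = np.where(d < r_personal, 1.0, k)           # saturate inside personal space
    return k / (k.sum() + 1e-8)                    # normalize to importance weights
```

In the paper's architecture, weights of this kind would scale each neighbor's hidden state before the LSTM aggregates the social context.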

MACE: A Multi-pattern Accommodated and Efficient Anomaly Detection Method in the Frequency Domain

  • paper_url: http://arxiv.org/abs/2311.16191
  • repo_url: None
  • paper_authors: Feiyi Chen, Yingying zhang, Zhen Qin, Lunting Fan, Renhe Jiang, Yuxuan Liang, Qingsong Wen, Shuiguang Deng
  • for: This paper proposes an efficient, multi-pattern anomaly detection method for cloud systems.
  • methods: The method works in the frequency domain and combines a novel pattern extraction mechanism, which handles diverse common normal patterns, with a dualistic convolution mechanism that sharpens sensitivity to short-term anomalies; a strategically selected subset of Fourier bases keeps computation efficient (a toy sketch follows the abstract below).
  • results: Experiments show the method handles diverse normal patterns with a unified model and achieves state-of-the-art performance at low computational cost.
    Abstract Anomaly detection significantly enhances the robustness of cloud systems. While neural network-based methods have recently demonstrated strong advantages, they encounter practical challenges in cloud environments: the contradiction between the impracticality of maintaining a unique model for each service and the limited ability of a unified model to deal with diverse normal patterns, as well as issues with handling heavy traffic in real time and sensitivity to short-term anomalies. Thus, we propose MACE, a Multi-pattern Accommodated and efficient anomaly detection method in the frequency domain for time series anomaly detection. There are three novel characteristics of it: (i) a pattern extraction mechanism excelling at handling diverse normal patterns, which enables the model to identify anomalies by examining the correlation between the data sample and its service normal pattern, instead of solely focusing on the data sample itself; (ii) a dualistic convolution mechanism that amplifies short-term anomalies in the time domain and hinders the reconstruction of anomalies in the frequency domain, which enlarges the reconstruction error disparity between anomaly and normality and facilitates anomaly detection; (iii) leveraging the sparsity and parallelism of the frequency domain to enhance model efficiency. We theoretically and experimentally prove that using a strategically selected subset of Fourier bases not only reduces computational overhead but also helps distinguish anomalies, compared to using the complete spectrum. Moreover, extensive experiments demonstrate MACE's effectiveness in handling diverse normal patterns with a unified model, and it achieves state-of-the-art performance with high efficiency.
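A stripped-down version of the frequency-domain intuition, not MACE itself: reconstruct each window from a small subset of Fourier bases and score it by reconstruction error. The top-k basis selection and MSE score below are our own simplifications of the "strategically selected subset" the abstract describes.

```python
import numpy as np

def spectral_recon_error(x: np.ndarray, k: int = 16) -> float:
    """Reconstruct a 1-D window from its k largest-magnitude Fourier
    bases; a large error suggests the window deviates from its normal
    spectral pattern."""
    F = np.fft.rfft(x)
    keep = np.argsort(np.abs(F))[-k:]   # sparse basis subset
    F_sub = np.zeros_like(F)
    F_sub[keep] = F[keep]
    x_hat = np.fft.irfft(F_sub, n=len(x))
    return float(np.mean((x - x_hat) ** 2))
```

Because only k of the bases are kept, the reconstruction is cheap and parallelizes across windows, which is the efficiency argument the abstract makes for working in the frequency domain.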

Domain Knowledge Injection in Bayesian Search for New Materials

  • paper_url: http://arxiv.org/abs/2311.15162
  • repo_url: None
  • paper_authors: Zikai Xie, Xenophon Evangelopoulos, Joseph Thacker, Andrew Cooper
  • for: This study designs DKIBO, a Bayesian optimization algorithm that incorporates domain knowledge to tune exploration of the search space.
  • methods: The study builds on Bayesian optimization (BO) and injects domain knowledge into the acquisition function through an additional deterministic surrogate model, improving search efficiency (a toy sketch follows the abstract below).
  • results: Experiments demonstrate DKIBO's practical value by successfully injecting domain knowledge into a materials design task, with validation across different experimental settings and ablation analyses.
    Abstract In this paper we propose DKIBO, a Bayesian optimization (BO) algorithm that accommodates domain knowledge to tune exploration in the search space. Bayesian optimization has recently emerged as a sample-efficient optimizer for many intractable scientific problems. While various existing BO frameworks allow the input of prior beliefs to accelerate the search by narrowing down the space, incorporating such knowledge is not always straightforward and can often introduce bias and lead to poor performance. Here we propose a simple approach to incorporate structural knowledge in the acquisition function by utilizing an additional deterministic surrogate model to enrich the approximation power of the Gaussian process. This is suitably chosen according to structural information of the problem at hand and acts as a corrective term toward better-informed sampling. We empirically demonstrate the practical utility of the proposed method by successfully injecting domain knowledge in a materials design task. We further validate our method's performance on different experimental settings and ablation analyses.
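The corrective-term idea can be written down in a few lines. The sketch below is our reading, with an assumed UCB base acquisition and mixing weight `lam` (the paper does not prescribe these exact choices): a deterministic, domain-informed surrogate is added to the GP-based acquisition.

```python
import numpy as np

def dk_acquisition(x, gp_mean, gp_std, det_surrogate, beta=2.0, lam=0.5):
    """Toy domain-knowledge-injected acquisition: standard UCB from the
    GP posterior plus a deterministic surrogate as a corrective term."""
    ucb = gp_mean(x) + beta * gp_std(x)    # exploration/exploitation trade-off
    return ucb + lam * det_surrogate(x)    # structural-knowledge correction

# Example with stand-in surrogates (purely illustrative):
score = dk_acquisition(np.array([0.3]),
                       gp_mean=lambda x: -np.sum((x - 0.5) ** 2),
                       gp_std=lambda x: 0.1 * np.ones(()),
                       det_surrogate=lambda x: np.sin(3 * x).sum())
```

The candidate maximizing this score is evaluated next, so a well-chosen deterministic surrogate steers sampling toward structurally promising regions without replacing the GP.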

Hessian Aware Low-Rank Weight Perturbation for Continual Learning

  • paper_url: http://arxiv.org/abs/2311.15161
  • repo_url: https://github.com/lijiaqi/halrp
  • paper_authors: Jiaqi Li, Rui Wang, Yuanhao Lai, Changjian Shui, Sabyasachi Sahoo, Charles X. Ling, Shichun Yang, Boyu Wang, Christian Gagné, Fan Zhou
  • for: This paper proposes a continual learning algorithm based on low-rank weight perturbation, so that a model can learn a sequence of tasks without forgetting knowledge acquired from previous ones.
  • methods: The method models parameter transitions across sequential tasks as weight-matrix transformations and applies a low-rank approximation to the task-adaptive parameters in each layer of the network (a toy sketch follows the abstract below). A quantitative relationship between the Hessian and the low-rank approximation determines the approximation ranks, and model growth is controlled by pruning less important parameters.
  • results: Extensive experiments on multiple benchmarks, including a dataset with large-scale tasks, show that the method outperforms several recent state-of-the-art approaches, particularly in task-order robustness and in handling forgetting.
    Abstract Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from the previous ones. In this work, we propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning. By modeling the parameter transitions along the sequential tasks with the weight matrix transformation, we propose to apply the low-rank approximation on the task-adaptive parameters in each layer of the neural networks. Specifically, we theoretically demonstrate the quantitative relationship between the Hessian and the proposed low-rank approximation. The approximation ranks are then globally determined according to the marginal increment of the empirical loss estimated by the layer-specific gradient and low-rank approximation error. Furthermore, we control the model capacity by pruning less important parameters to diminish the parameter growth. We conduct extensive experiments on various benchmarks, including a dataset with large-scale tasks, and compare our method against some recent state-of-the-art methods to demonstrate the effectiveness and scalability of our proposed method. Empirical results show that our method performs better on different benchmarks, especially in achieving task order robustness and handling the forgetting issue. Demo code can be found at https://github.com/lijiaqi/HALRP.
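One minimal way to realize per-task low-rank weight perturbation is a shared base weight plus a rank-r update per task, W_t = W + U_t V_t. The sketch below is our own simplification: the rank is fixed by hand here, whereas the paper selects it with the Hessian-informed criterion, and pruning is omitted.

```python
import torch
import torch.nn as nn

class LowRankTaskLinear(nn.Module):
    """Linear layer with a shared base weight and a rank-r task-adaptive
    perturbation per task (toy version of the low-rank idea)."""
    def __init__(self, d_in: int, d_out: int, n_tasks: int, rank: int = 4):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)
        self.U = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(d_out, rank)) for _ in range(n_tasks)])
        self.V = nn.ParameterList(
            [nn.Parameter(0.01 * torch.randn(rank, d_in)) for _ in range(n_tasks)])

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        delta = self.U[task_id] @ self.V[task_id]   # rank-r weight perturbation
        return nn.functional.linear(x, self.base.weight + delta, self.base.bias)
```

Freezing the base weights and earlier tasks' factors while training only (U_t, V_t) for the current task is the standard way such a layer limits forgetting while keeping parameter growth small.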

xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data

  • paper_url: http://arxiv.org/abs/2311.15156
  • repo_url: None
  • paper_authors: Jing Gong, Minsheng Hao, Xingyi Cheng, Xin Zeng, Chiming Liu, Jianzhu Ma, Xuegong Zhang, Taifeng Wang, Le Song
  • for: Advances in high-throughput sequencing have pushed publicly available single-cell gene-expression data past 50 million records, but classical transformer models hit computation and memory bottlenecks when processing data at this scale; this paper addresses that challenge.
  • methods: The authors propose a novel asymmetric encoder-decoder transformer, xTrimoGene$^\alpha$ (xTrimoGene for short), which exploits the sparsity of scRNA-seq data to scale up pre-training (a toy sketch of the sparsity idea follows the abstract below).
  • results: Experiments show that xTrimoGene's performance improves as model size scales, reducing FLOPs by one to two orders of magnitude relative to classical transformers while maintaining high accuracy, and that it achieves state-of-the-art results on downstream tasks including cell type annotation, perturb-seq effect prediction, and drug combination prediction. The model is available as a service at https://api.biomap.com/xTrimoGene/apply.
    Abstract Advances in high-throughput sequencing technology have led to significant progress in measuring gene expressions at the single-cell level. The amount of publicly available single-cell RNA-seq (scRNA-seq) data is already surpassing 50M records for humans with each record measuring 20,000 genes. This highlights the need for unsupervised representation learning to fully ingest these data, yet classical transformer architectures are prohibitive to train on such data in terms of both computation and memory. To address this challenge, we propose a novel asymmetric encoder-decoder transformer for scRNA-seq data, called xTrimoGene$^\alpha$ (or xTrimoGene for short), which leverages the sparse characteristic of the data to scale up the pre-training. This scalable design of xTrimoGene reduces FLOPs by one to two orders of magnitude compared to classical transformers while maintaining high accuracy, enabling us to train the largest transformer models over the largest scRNA-seq dataset today. Our experiments also show that the performance of xTrimoGene improves as we scale up the model sizes, and it also leads to SOTA performance over various downstream tasks, such as cell type annotation, perturb-seq effect prediction, and drug combination prediction. xTrimoGene model is now available for use as a service via the following link: https://api.biomap.com/xTrimoGene/apply.
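The sparsity that makes scaling possible is simple to state: most of a cell's ~20,000 genes are unexpressed, so the encoder need only see the non-zero entries. The helper below is our own guess at such a preprocessing step, not the released pipeline; the token budget and (gene_id, value) representation are assumptions.

```python
import torch

def nonzero_gene_tokens(expr_row: torch.Tensor, max_tokens: int = 2048):
    """Keep only expressed genes of one cell as (gene_id, value) tokens,
    so encoder cost scales with expressed genes rather than all 20,000."""
    idx = torch.nonzero(expr_row, as_tuple=False).squeeze(-1)[:max_tokens]
    return idx, expr_row[idx]   # gene-id tokens and their expression values
```

In an asymmetric encoder-decoder, a heavy encoder would process only these tokens while a light decoder reconstructs the full, mostly zero expression vector, which is where the claimed FLOPs savings come from.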