cs.LG - 2023-08-03

The Capability of Large Language Models to Measure Psychiatric Functioning

  • paper_url: http://arxiv.org/abs/2308.01834
  • repo_url: None
  • paper_authors: Isaac R. Galatzer-Levy, Daniel McDuff, Vivek Natarajan, Alan Karthikesalingam, Matteo Malgaroli
  • for: This study uses large language models (LLMs) to predict psychiatric functioning without task-specific training.
  • methods: Med-PaLM 2 is applied to patient interviews and clinical descriptions, with prompts used to extract estimated clinical scores and diagnoses.
  • results: Med-PaLM 2 can assess functioning across a range of psychiatric conditions, with the strongest performance on depression scores from standardized assessments (accuracy range 0.80-0.84), statistically indistinguishable from human clinical raters (t(1,144) = 1.20; p = 0.23); general clinical language models can thus flexibly predict psychiatric risk from free-text descriptions of functioning.
    Abstract The current work investigates the capability of Large language models (LLMs) that are explicitly trained on large corpuses of medical knowledge (Med-PaLM 2) to predict psychiatric functioning from patient interviews and clinical descriptions without being trained to do so. To assess this, n = 145 depression and n =115 PTSD assessments and n = 46 clinical case studies across high prevalence/high comorbidity disorders (Depressive, Anxiety, Psychotic, trauma and stress, Addictive disorders) were analyzed using prompts to extract estimated clinical scores and diagnoses. Results demonstrate that Med-PaLM 2 is capable of assessing psychiatric functioning across a range of psychiatric conditions with the strongest performance being the prediction of depression scores based on standardized assessments (Accuracy range= 0.80 - 0.84) which were statistically indistinguishable from human clinical raters t(1,144) = 1.20; p = 0.23. Results show the potential for general clinical language models to flexibly predict psychiatric risk based on free descriptions of functioning from both patients and clinicians.

Distribution-Free Inference for the Regression Function of Binary Classification

  • paper_url: http://arxiv.org/abs/2308.01835
  • repo_url: None
  • paper_authors: Ambrus Tamás, Balázs Csanád Csáji
  • for: The paper targets the regression function of binary classification, i.e., the conditional expectation of the class labels given the inputs, which defines a Bayes optimal classifier and encodes the misclassification probabilities.
  • methods: A resampling framework for constructing exact, distribution-free, non-asymptotically guaranteed confidence regions for the true regression function at any user-chosen confidence level, together with specific algorithms that instantiate the framework.
  • results: The constructed confidence regions are proved to be strongly consistent, i.e., any false model is excluded in the long run with probability one, with the exclusion quantified by probably approximately correct (PAC) type bounds; the methods are also compared to approximate asymptotic confidence ellipsoids.
    Abstract One of the key objects of binary classification is the regression function, i.e., the conditional expectation of the class labels given the inputs. With the regression function not only a Bayes optimal classifier can be defined, but it also encodes the corresponding misclassification probabilities. The paper presents a resampling framework to construct exact, distribution-free and non-asymptotically guaranteed confidence regions for the true regression function for any user-chosen confidence level. Then, specific algorithms are suggested to demonstrate the framework. It is proved that the constructed confidence regions are strongly consistent, that is, any false model is excluded in the long run with probability one. The exclusion is quantified with probably approximately correct type bounds, as well. Finally, the algorithms are validated via numerical experiments, and the methods are compared to approximate asymptotic confidence ellipsoids.

Hard Adversarial Example Mining for Improving Robust Fairness

  • paper_url: http://arxiv.org/abs/2308.01823
  • repo_url: None
  • paper_authors: Chenhao Lin, Xiang Ji, Yulong Yang, Qian Li, Chao Shen, Run Wang, Liming Fang
  • for: The paper aims to improve the robustness of deep neural networks (DNNs) against adversarial examples (AEs) while addressing the unfairness problems that limit adversarially trained models.
  • methods: HAM, a simple yet effective framework for adaptive Hard Adversarial example Mining, which concentrates adversarial training on hard adversarial examples while adaptively discarding easy ones (a toy sketch of the idea follows the abstract).
  • results: Experiments on CIFAR-10, SVHN, and Imagenette show significant improvements in robust fairness together with reduced computational cost compared with several state-of-the-art adversarial training methods.
    Abstract Adversarial training (AT) is widely considered the state-of-the-art technique for improving the robustness of deep neural networks (DNNs) against adversarial examples (AE). Nevertheless, recent studies have revealed that adversarially trained models are prone to unfairness problems, restricting their applicability. In this paper, we empirically observe that this limitation may be attributed to serious adversarial confidence overfitting, i.e., certain adversarial examples with overconfidence. To alleviate this problem, we propose HAM, a straightforward yet effective framework via adaptive Hard Adversarial example Mining.HAM concentrates on mining hard adversarial examples while discarding the easy ones in an adaptive fashion. Specifically, HAM identifies hard AEs in terms of their step sizes needed to cross the decision boundary when calculating loss value. Besides, an early-dropping mechanism is incorporated to discard the easy examples at the initial stages of AE generation, resulting in efficient AT. Extensive experimental results on CIFAR-10, SVHN, and Imagenette demonstrate that HAM achieves significant improvement in robust fairness while reducing computational cost compared to several state-of-the-art adversarial training methods. The code will be made publicly available.
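Below is a minimal PyTorch sketch of the hard-example-mining idea, assuming a classifier `model` and a labeled batch: the number of PGD steps needed to cross the decision boundary is used as the hardness score, and a fixed quantile threshold stands in for HAM's adaptive criterion and early-dropping mechanism. Hyperparameters and the thresholding rule are illustrative, not the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def pgd_with_step_count(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Run PGD and record, per example, the first step at which the adversarial
    point crosses the decision boundary (-1 if it never does)."""
    x_adv = x.clone().detach()
    cross_step = torch.full((x.size(0),), -1, dtype=torch.long, device=x.device)
    for t in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
        with torch.no_grad():
            crossed = (model(x_adv).argmax(1) != y) & (cross_step < 0)
            cross_step[crossed] = t
    return x_adv, cross_step

def hard_mining_loss(model, x, y, easy_quantile=0.25, steps=10):
    """Train on the harder adversarial examples: those that need many PGD steps
    (or never manage) to cross the boundary; drop the easiest quantile."""
    x_adv, cross_step = pgd_with_step_count(model, x, y, steps=steps)
    hardness = torch.where(cross_step < 0, torch.full_like(cross_step, steps), cross_step)
    keep = hardness.float() >= torch.quantile(hardness.float(), easy_quantile)
    if not keep.any():
        keep = torch.ones_like(keep)   # degenerate batch: fall back to all examples
    return F.cross_entropy(model(x_adv[keep]), y[keep])
```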

Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit

  • paper_url: http://arxiv.org/abs/2308.01814
  • repo_url: None
  • paper_authors: Greg Yang, Etai Littwin
  • for: The paper investigates what new phenomena emerge in wide neural networks trained by adaptive optimizers such as Adam, beyond stochastic gradient descent (SGD).
  • methods: A new Tensor Program language, NEXORT, that expresses how adaptive optimizers process gradients into updates, together with bra-ket notation that drastically simplifies expressions and calculations in Tensor Programs.
  • results: The same dichotomy between feature learning and kernel behaviors observed under SGD also holds for general optimizers, including Adam, albeit with a nonlinear notion of "kernel"; the corresponding "neural tangent" and "maximal update" limits are derived for any architecture.
    Abstract Going beyond stochastic gradient descent (SGD), what new phenomena emerge in wide neural networks trained by adaptive optimizers like Adam? Here we show: The same dichotomy between feature learning and kernel behaviors (as in SGD) holds for general optimizers as well, including Adam -- albeit with a nonlinear notion of "kernel." We derive the corresponding "neural tangent" and "maximal update" limits for any architecture. Two foundational advances underlie the above results: 1) A new Tensor Program language, NEXORT, that can express how adaptive optimizers process gradients into updates. 2) The introduction of bra-ket notation to drastically simplify expressions and calculations in Tensor Programs. This work summarizes and generalizes all previous results in the Tensor Programs series of papers.

Job Shop Scheduling via Deep Reinforcement Learning: a Sequence to Sequence approach

  • paper_url: http://arxiv.org/abs/2308.01797
  • repo_url: https://github.com/dawoz/JSP-DeepRL-Seq2Seq
  • paper_authors: Giovanni Bonetta, Davide Zago, Rossella Cancelliere, Andrea Grosso
  • for: The paper applies Deep Reinforcement Learning to the Job Shop Problem, a well-known Combinatorial Optimization problem, to obtain optimized production schedules.
  • methods: An end-to-end Deep Reinforcement Learning approach, inspired by natural-language encoder-decoder models, that automatically learns dispatching rules; to the authors' knowledge this architecture has not previously been used for scheduling, and it is general enough to tackle other optimal job scheduling tasks with minimal intervention.
  • results: Experiments on benchmark Job Shop Problem instances show that the method outperforms many classical priority dispatching rules and is competitive with state-of-the-art Deep Reinforcement Learning approaches.
    Abstract Job scheduling is a well-known Combinatorial Optimization problem with endless applications. Well planned schedules bring many benefits in the context of automated systems: among others, they limit production costs and waste. Nevertheless, the NP-hardness of this problem makes it essential to use heuristics whose design is difficult, requires specialized knowledge and often produces methods tailored to the specific task. This paper presents an original end-to-end Deep Reinforcement Learning approach to scheduling that automatically learns dispatching rules. Our technique is inspired by natural language encoder-decoder models for sequence processing and has never been used, to the best of our knowledge, for scheduling purposes. We applied and tested our method in particular to some benchmark instances of Job Shop Problem, but this technique is general enough to be potentially used to tackle other different optimal job scheduling tasks with minimal intervention. Results demonstrate that we outperform many classical approaches exploiting priority dispatching rules and show competitive results on state-of-the-art Deep Reinforcement Learning ones.

Benchmarking Adaptative Variational Quantum Algorithms on QUBO Instances

  • paper_url: http://arxiv.org/abs/2308.01789
  • repo_url: None
  • paper_authors: Gloria Turati, Maurizio Ferrari Dacrema, Paolo Cremonesi
  • for: The paper provides a systematic comparison of Adaptative Variational Quantum Algorithms (VQAs), filling a gap in the existing literature.
  • methods: Three Adaptative VQAs are compared: EVQE and VAns from the literature, plus RA-VQE, a random baseline introduced here; the Quantum Approximate Optimization Algorithm (QAOA) is included as a traditional reference. All algorithms are applied to QUBO problems (the objective is sketched after the abstract) and evaluated by solution quality and computational time.
  • results: The analysis shows that hyperparameter choices can substantially affect performance, underlining the importance of an appropriate tuning methodology; the results set benchmarks for Adaptative VQAs on near-term quantum devices and provide guidance for future research.
    Abstract In recent years, Variational Quantum Algorithms (VQAs) have emerged as a promising approach for solving optimization problems on quantum computers in the NISQ era. However, one limitation of VQAs is their reliance on fixed-structure circuits, which may not be taylored for specific problems or hardware configurations. A leading strategy to address this issue are Adaptative VQAs, which dynamically modify the circuit structure by adding and removing gates, and optimize their parameters during the training. Several Adaptative VQAs, based on heuristics such as circuit shallowness, entanglement capability and hardware compatibility, have already been proposed in the literature, but there is still lack of a systematic comparison between the different methods. In this paper, we aim to fill this gap by analyzing three Adaptative VQAs: Evolutionary Variational Quantum Eigensolver (EVQE), Variable Ansatz (VAns), already proposed in the literature, and Random Adapt-VQE (RA-VQE), a random approach we introduce as a baseline. In order to compare these algorithms to traditional VQAs, we also include the Quantum Approximate Optimization Algorithm (QAOA) in our analysis. We apply these algorithms to QUBO problems and study their performance by examining the quality of the solutions found and the computational times required. Additionally, we investigate how the choice of the hyperparameters can impact the overall performance of the algorithms, highlighting the importance of selecting an appropriate methodology for hyperparameter tuning. Our analysis sets benchmarks for Adaptative VQAs designed for near-term quantum devices and provides valuable insights to guide future research in this area.
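For context, the sketch below shows the QUBO objective that all of the benchmarked algorithms minimize, together with a classical random-sampling baseline; the 6-variable instance and sample budget are arbitrary illustrations, not the instances used in the paper.

```python
import numpy as np

def qubo_cost(Q, x):
    """QUBO objective: minimize x^T Q x over binary vectors x in {0, 1}^n."""
    return x @ Q @ x

def random_search_baseline(Q, n_samples=10_000, seed=0):
    """Classical random-sampling baseline for comparison with VQA/QAOA results."""
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    best_x, best_cost = None, np.inf
    for _ in range(n_samples):
        x = rng.integers(0, 2, size=n)
        c = qubo_cost(Q, x)
        if c < best_cost:
            best_x, best_cost = x, c
    return best_x, best_cost

# Illustrative 6-variable instance with random symmetric couplings.
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 6))
Q = (A + A.T) / 2
x_best, c_best = random_search_baseline(Q)
print(x_best, c_best)
```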

Deep Learning-based Prediction of Stress and Strain Maps in Arterial Walls for Improved Cardiovascular Risk Assessment

  • paper_url: http://arxiv.org/abs/2308.01771
  • repo_url: None
  • paper_authors: Yasin Shokrollahi, Pengfei Dong, Xianqi Li, Linxia Gu
  • for: The study aims to replace the Finite Element Method (FEM) with end-to-end deep learning tools for predicting stress and strain fields within 2D cross sections of the arterial wall.
  • methods: A U-Net based fully convolutional neural network (CNN) and a conditional generative adversarial network (cGAN), together with their ensembles, predict von Mises stress and strain distributions from the spatial arrangement of calcification within arterial wall cross-sections (a miniature U-Net sketch follows the abstract).
  • results: The models accurately predict stress and strain fields across varying calcification quantities and spatial configurations, providing an efficient surrogate for finite element analysis.
    Abstract This study investigated the potential of end-to-end deep learning tools as a more effective substitute for FEM in predicting stress-strain fields within 2D cross sections of arterial wall. We first proposed a U-Net based fully convolutional neural network (CNN) to predict the von Mises stress and strain distribution based on the spatial arrangement of calcification within arterial wall cross-sections. Further, we developed a conditional generative adversarial network (cGAN) to enhance, particularly from the perceptual perspective, the prediction accuracy of stress and strain field maps for arterial walls with various calcification quantities and spatial configurations. On top of U-Net and cGAN, we also proposed their ensemble approaches, respectively, to further improve the prediction accuracy of field maps. Our dataset, consisting of input and output images, was generated by implementing boundary conditions and extracting stress-strain field maps. The trained U-Net models can accurately predict von Mises stress and strain fields, with structural similarity index scores (SSIM) of 0.854 and 0.830 and mean squared errors of 0.017 and 0.018 for stress and strain, respectively, on a reserved test set. Meanwhile, the cGAN models in a combination of ensemble and transfer learning techniques demonstrate high accuracy in predicting von Mises stress and strain fields, as evidenced by SSIM scores of 0.890 for stress and 0.803 for strain. Additionally, mean squared errors of 0.008 for stress and 0.017 for strain further support the model's performance on a designated test set. Overall, this study developed a surrogate model for finite element analysis, which can accurately and efficiently predict stress-strain fields of arterial walls regardless of complex geometries and boundary conditions.
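A miniature U-Net-style encoder-decoder in PyTorch, mapping a calcification map to a predicted field map, gives a feel for the surrogate described above; the single level, channel widths, and dummy data are placeholders and far smaller than the networks evaluated in the paper.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """One-level U-Net: input is a 1-channel calcification map,
    output is a 1-channel field map (e.g., von Mises stress)."""
    def __init__(self):
        super().__init__()
        self.enc = conv_block(1, 16)
        self.down = nn.MaxPool2d(2)
        self.bottleneck = conv_block(16, 32)
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = conv_block(32, 16)       # 16 (skip) + 16 (upsampled) input channels
        self.head = nn.Conv2d(16, 1, 1)

    def forward(self, x):
        e = self.enc(x)
        b = self.bottleneck(self.down(e))
        u = self.up(b)
        return self.head(self.dec(torch.cat([e, u], dim=1)))

model = TinyUNet()
calcification = torch.rand(2, 1, 64, 64)       # dummy cross-section maps
stress_pred = model(calcification)             # (2, 1, 64, 64)
loss = nn.functional.mse_loss(stress_pred, torch.rand_like(stress_pred))
```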

Bag of Policies for Distributional Deep Exploration

  • paper_url: http://arxiv.org/abs/2308.01759
  • repo_url: None
  • paper_authors: Asen Nachkov, Luchen Li, Giulia Luise, Filippo Valdettaro, Aldo Faisal
  • for: The work aims to improve exploration efficiency for reinforcement learning (RL) in complex environments, focusing on deep exploration in distributional RL.
  • methods: Bag of Policies (BoP), a general-purpose approach that can be built on top of any return distribution estimator by maintaining a population of independently updated heads; each episode is controlled by one head, and all heads are updated off-policy from the collected data (a simplified tabular sketch of this mechanism follows the abstract). BoP is implemented with distributional actor-critics using Bayesian Distributional Policy Gradients (BDPG).
  • results: Experiments on ALE Atari games show that BoP yields greater robustness and speed during learning.
    Abstract Efficient exploration in complex environments remains a major challenge for reinforcement learning (RL). Compared to previous Thompson sampling-inspired mechanisms that enable temporally extended exploration, i.e., deep exploration, we focus on deep exploration in distributional RL. We develop here a general purpose approach, Bag of Policies (BoP), that can be built on top of any return distribution estimator by maintaining a population of its copies. BoP consists of an ensemble of multiple heads that are updated independently. During training, each episode is controlled by only one of the heads and the collected state-action pairs are used to update all heads off-policy, leading to distinct learning signals for each head which diversify learning and behaviour. To test whether optimistic ensemble method can improve on distributional RL as did on scalar RL, by e.g. Bootstrapped DQN, we implement the BoP approach with a population of distributional actor-critics using Bayesian Distributional Policy Gradients (BDPG). The population thus approximates a posterior distribution of return distributions along with a posterior distribution of policies. Another benefit of building upon BDPG is that it allows to analyze global posterior uncertainty along with local curiosity bonus simultaneously for exploration. As BDPG is already an optimistic method, this pairing helps to investigate if optimism is accumulatable in distributional RL. Overall BoP results in greater robustness and speed during learning as demonstrated by our experimental results on ALE Atari games.
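The deep-exploration mechanism is easiest to see in a toy setting. The sketch below uses tabular Q-learning heads on a small chain environment: one randomly chosen head controls each episode, and every head is updated off-policy from the collected transitions, with random initialization and bootstrap masks providing diversity. This is a heavily simplified stand-in for the distributional actor-critic (BDPG) ensemble actually used in BoP.

```python
import numpy as np

N_STATES, N_ACTIONS, N_HEADS = 6, 2, 5
rng = np.random.default_rng(0)
# Independently initialized Q-table "heads"; bootstrap masks below add further diversity.
heads = [rng.normal(scale=0.01, size=(N_STATES, N_ACTIONS)) for _ in range(N_HEADS)]

def step(s, a):
    """Chain environment: action 1 moves right, action 0 moves left; reward only at the right end."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == N_STATES - 1), s_next == N_STATES - 1

for episode in range(500):
    k = rng.integers(N_HEADS)                      # one head controls this episode
    s, transitions = 0, []
    for _ in range(40):
        a = int(np.argmax(heads[k][s])) if rng.random() > 0.3 else int(rng.integers(N_ACTIONS))
        s_next, r, done = step(s, a)
        transitions.append((s, a, r, s_next))
        s = s_next
        if done:
            break
    # All heads are updated off-policy from the same episode, but each sees a random
    # bootstrap subsample of the transitions, keeping their value estimates distinct.
    for Q in heads:
        for (s0, a0, r, s1) in transitions:
            if rng.random() < 0.5:                 # bootstrap mask
                Q[s0, a0] += 0.1 * (r + 0.95 * Q[s1].max() - Q[s0, a0])

print([int(np.argmax(Q[0])) for Q in heads])       # per-head greedy action at the start state
```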

Guided Distillation for Semi-Supervised Instance Segmentation

  • paper_url: http://arxiv.org/abs/2308.02668
  • repo_url: None
  • paper_authors: Tariq Berrada, Camille Couprie, Karteek Alahari, Jakob Verbeek
  • for: To improve instance segmentation while reducing reliance on fully annotated training images.
  • methods: A semi-supervised teacher-student distillation approach that uses unlabeled data as an additional training signal to limit overfitting to labeled samples, with a novel "guided burn-in" stage in which the teacher already guides the student on unlabeled data, plus an evaluation of different instance segmentation architectures, backbone networks, and pre-training strategies (a condensed sketch of the burn-in step follows the abstract).
  • results: These design choices yield substantial improvements over previous state-of-the-art results, e.g., raising mask-AP from 23.7 to 33.9 on Cityscapes with labels for 10% of the images, and from 18.3 to 34.1 on COCO with labels for only 1% of the training data.
    Abstract Although instance segmentation methods have improved considerably, the dominant paradigm is to rely on fully-annotated training images, which are tedious to obtain. To alleviate this reliance, and boost results, semi-supervised approaches leverage unlabeled data as an additional training signal that limits overfitting to the labeled samples. In this context, we present novel design choices to significantly improve teacher-student distillation models. In particular, we (i) improve the distillation approach by introducing a novel "guided burn-in" stage, and (ii) evaluate different instance segmentation architectures, as well as backbone networks and pre-training strategies. Contrary to previous work which uses only supervised data for the burn-in period of the student model, we also use guidance of the teacher model to exploit unlabeled data in the burn-in period. Our improved distillation approach leads to substantial improvements over previous state-of-the-art results. For example, on the Cityscapes dataset we improve mask-AP from 23.7 to 33.9 when using labels for 10\% of images, and on the COCO dataset we improve mask-AP from 18.3 to 34.1 when using labels for only 1\% of the training data.
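A condensed PyTorch sketch of a guided burn-in step: the student gets a supervised loss on labeled data plus a teacher-guided pseudo-label loss on unlabeled data, and the teacher tracks the student by exponential moving average. It is written for plain classification rather than instance segmentation, and the confidence threshold, EMA momentum, and loss weighting are placeholder choices rather than the paper's settings.

```python
import copy
import torch
import torch.nn.functional as F

def ema_update(teacher, student, momentum=0.999):
    """Exponential-moving-average update of the teacher from the student."""
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(momentum).add_(s_p, alpha=1 - momentum)

def guided_burn_in_step(student, teacher, opt, x_lab, y_lab, x_unlab,
                        conf_thresh=0.9, unsup_weight=1.0):
    """One burn-in step: supervised loss on labeled data plus a teacher-guided
    pseudo-label loss on the unlabeled batch (the 'guidance' part)."""
    with torch.no_grad():
        conf, pseudo = F.softmax(teacher(x_unlab), dim=1).max(dim=1)
        mask = conf > conf_thresh                       # keep confident pseudo-labels only
    loss = F.cross_entropy(student(x_lab), y_lab)
    if mask.any():
        loss = loss + unsup_weight * F.cross_entropy(student(x_unlab[mask]), pseudo[mask])
    opt.zero_grad()
    loss.backward()
    opt.step()
    ema_update(teacher, student)                        # teacher follows the student
    return loss.item()

# Usage sketch (models and loaders are placeholders):
#   student = build_model(); teacher = copy.deepcopy(student)
#   opt = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)
#   guided_burn_in_step(student, teacher, opt, x_lab, y_lab, x_unlab)
```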

Neural Collapse Terminus: A Unified Solution for Class Incremental Learning and Its Variants

  • paper_url: http://arxiv.org/abs/2308.01746
  • repo_url: https://github.com/neuralcollapseapplications/unicil
  • paper_authors: Yibo Yang, Haobo Yuan, Xiangtai Li, Jianlong Wu, Lefei Zhang, Zhouchen Lin, Philip Torr, Dacheng Tao, Bernard Ghanem
  • for: The paper targets class incremental learning (CIL), long-tail class incremental learning (LTCIL), and few-shot class incremental learning (FSCIL), addressing the difficulties that data imbalance and data scarcity add to incremental classification.
  • methods: A unified solution built on a neural collapse terminus: a fixed structure with maximal equiangular inter-class separation over the whole label space that serves as a consistent target throughout incremental training, avoiding dividing the feature space incrementally (a sketch of such a structure follows the abstract). For CIL and LTCIL, a prototype evolving scheme drives the backbone features smoothly toward this terminus; the method also works for FSCIL with only minor adaptations.
  • results: Extensive experiments on multiple datasets demonstrate the effectiveness of the unified solution on all three tasks and on a generalized case where the total number of classes and whether each session's data distribution is normal, long-tail, or few-shot are unknown; the method retains classification capability under data imbalance and scarcity.
    Abstract How to enable learnability for new classes while keeping the capability well on old classes has been a crucial challenge for class incremental learning. Beyond the normal case, long-tail class incremental learning and few-shot class incremental learning are also proposed to consider the data imbalance and data scarcity, respectively, which are common in real-world implementations and further exacerbate the well-known problem of catastrophic forgetting. Existing methods are specifically proposed for one of the three tasks. In this paper, we offer a unified solution to the misalignment dilemma in the three tasks. Concretely, we propose neural collapse terminus that is a fixed structure with the maximal equiangular inter-class separation for the whole label space. It serves as a consistent target throughout the incremental training to avoid dividing the feature space incrementally. For CIL and LTCIL, we further propose a prototype evolving scheme to drive the backbone features into our neural collapse terminus smoothly. Our method also works for FSCIL with only minor adaptations. Theoretical analysis indicates that our method holds the neural collapse optimality in an incremental fashion regardless of data imbalance or data scarcity. We also design a generalized case where we do not know the total number of classes and whether the data distribution is normal, long-tail, or few-shot for each coming session, to test the generalizability of our method. Extensive experiments with multiple datasets are conducted to demonstrate the effectiveness of our unified solution to all the three tasks and the generalized case.
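The fixed target structure described above, maximal equiangular inter-class separation over the whole label space, can be realized as a simplex equiangular tight frame (ETF). The NumPy sketch below builds such a frame; the class count and feature dimension are arbitrary, and the paper's exact construction of the neural collapse terminus may differ in details.

```python
import numpy as np

def simplex_etf(num_classes, feat_dim, seed=0):
    """K fixed, unit-norm class prototypes with pairwise cosine similarity -1/(K-1),
    i.e. the maximal equiangular separation achievable for K classes."""
    assert feat_dim >= num_classes
    rng = np.random.default_rng(seed)
    # Orthonormal columns U (feat_dim x K) from the QR decomposition of a random matrix.
    U, _ = np.linalg.qr(rng.normal(size=(feat_dim, num_classes)))
    K = num_classes
    return np.sqrt(K / (K - 1)) * U @ (np.eye(K) - np.ones((K, K)) / K)

M = simplex_etf(num_classes=10, feat_dim=64)       # column k = target direction for class k
gram = M.T @ M
off_diag = gram[~np.eye(10, dtype=bool)]
print(np.allclose(np.diag(gram), 1.0), np.allclose(off_diag, -1.0 / 9.0))   # True True
```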

Multitask Learning with No Regret: from Improved Confidence Bounds to Active Learning

  • paper_url: http://arxiv.org/abs/2308.01744
  • repo_url: None
  • paper_authors: Pier Giuseppe Sessa, Pierre Laforgue, Nicolò Cesa-Bianchi, Andreas Krause
  • for: The paper addresses learning multiple related tasks simultaneously while quantifying uncertainty in the estimated tasks, especially in the challenging agnostic setting where neither the similarity between tasks nor the tasks' features are available to the learner.
  • methods: The paper uses novel multitask confidence intervals that do not require i.i.d. data and can be applied to bound the regret in online learning. The paper also proposes a novel online learning algorithm that achieves improved regret without knowing the task similarity parameter in advance.
  • results: The paper obtains new regret guarantees that can significantly improve over treating tasks independently, and introduces a novel multitask active learning setup where several tasks must be simultaneously optimized but only one can be queried for feedback. The paper also empirically validates the bounds and algorithms on synthetic and real-world data.
    Abstract Multitask learning is a powerful framework that enables one to simultaneously learn multiple related tasks by sharing information between them. Quantifying uncertainty in the estimated tasks is of pivotal importance for many downstream applications, such as online or active learning. In this work, we provide novel multitask confidence intervals in the challenging agnostic setting, i.e., when neither the similarity between tasks nor the tasks' features are available to the learner. The obtained intervals do not require i.i.d. data and can be directly applied to bound the regret in online learning. Through a refined analysis of the multitask information gain, we obtain new regret guarantees that, depending on a task similarity parameter, can significantly improve over treating tasks independently. We further propose a novel online learning algorithm that achieves such improved regret without knowing this parameter in advance, i.e., automatically adapting to task similarity. As a second key application of our results, we introduce a novel multitask active learning setup where several tasks must be simultaneously optimized, but only one of them can be queried for feedback by the learner at each round. For this problem, we design a no-regret algorithm that uses our confidence intervals to decide which task should be queried. Finally, we empirically validate our bounds and algorithms on synthetic and real-world (drug discovery) data.

Finding the Optimum Design of Large Gas Engines Prechambers Using CFD and Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2308.01743
  • repo_url: None
  • paper_authors: Stefan Posch, Clemens Gößnitzer, Franz Rohrhofer, Bernhard C. Geiger, Andreas Wimmer
  • for: To optimize the prechamber design of large gas engines so as to achieve stable combustion at lean conditions, high efficiency, and low emissions.
  • methods: Computational fluid dynamics (CFD) simulations (Reynolds-averaged Navier-Stokes) are combined with Bayesian optimization over the prechamber design parameters (a generic BO loop is sketched after the abstract).
  • results: The results indicate that the chosen strategy effectively finds a prechamber design that achieves the desired target values.
    Abstract The turbulent jet ignition concept using prechambers is a promising solution to achieve stable combustion at lean conditions in large gas engines, leading to high efficiency at low emission levels. Due to the wide range of design and operating parameters for large gas engine prechambers, the preferred method for evaluating different designs is computational fluid dynamics (CFD), as testing in test bed measurement campaigns is time-consuming and expensive. However, the significant computational time required for detailed CFD simulations due to the complexity of solving the underlying physics also limits its applicability. In optimization settings similar to the present case, i.e., where the evaluation of the objective function(s) is computationally costly, Bayesian optimization has largely replaced classical design-of-experiment. Thus, the present study deals with the computationally efficient Bayesian optimization of large gas engine prechambers design using CFD simulation. Reynolds-averaged-Navier-Stokes simulations are used to determine the target values as a function of the selected prechamber design parameters. The results indicate that the chosen strategy is effective to find a prechamber design that achieves the desired target values.
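A minimal sketch of the simulation-in-the-loop Bayesian optimization described above, using a Gaussian-process surrogate and expected improvement from scikit-learn/SciPy; `run_cfd_simulation` is a cheap stand-in for the RANS evaluation of a candidate prechamber design, and the bounds, dimensionality, and budgets are invented for illustration.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def run_cfd_simulation(x):
    """Placeholder for the expensive RANS evaluation of a prechamber design
    (e.g., x = normalized orifice diameter, volume, nozzle angle); returns the
    objective value to minimize."""
    return float(np.sum((x - 0.3) ** 2) + 0.05 * np.sin(10 * x[0]))

def expected_improvement(X_cand, gp, y_best):
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
dim, bounds = 3, (0.0, 1.0)
X = rng.uniform(*bounds, size=(5, dim))               # initial design of experiments
y = np.array([run_cfd_simulation(x) for x in X])

for _ in range(20):                                    # BO iterations
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    cand = rng.uniform(*bounds, size=(2000, dim))      # cheap candidate pool
    x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))]
    X = np.vstack([X, x_next])
    y = np.append(y, run_cfd_simulation(x_next))

print("best design:", X[np.argmin(y)], "objective:", y.min())
```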

Exploiting Multi-Label Correlation in Label Distribution Learning

  • paper_url: http://arxiv.org/abs/2308.01742
  • repo_url: None
  • paper_authors: Zhiqiang Kou, Jing Wang, Yuheng Jia, Xin Geng
  • for: This paper focuses on addressing the challenges of Label Distribution Learning (LDL) methods that exploit low-rank label correlation, which is not always present in real-world datasets.
  • methods: The proposed method introduces an auxiliary Multi-Label Learning (MLL) process to capture low-rank label correlation, which is then used to improve the performance of LDL.
  • results: The proposed method is shown to be superior to existing LDL methods through comprehensive experiments, and ablation studies demonstrate the advantages of exploiting low-rank label correlation in the auxiliary MLL.
    Abstract Label Distribution Learning (LDL) is a novel machine learning paradigm that assigns label distribution to each instance. Many LDL methods proposed to leverage label correlation in the learning process to solve the exponential-sized output space; among these, many exploited the low-rank structure of label distribution to capture label correlation. However, recent studies disclosed that label distribution matrices are typically full-rank, posing challenges to those works exploiting low-rank label correlation. Note that multi-label is generally low-rank; low-rank label correlation is widely adopted in multi-label learning (MLL) literature. Inspired by that, we introduce an auxiliary MLL process in LDL and capture low-rank label correlation on that MLL rather than LDL. In such a way, low-rank label correlation is appropriately exploited in our LDL methods. We conduct comprehensive experiments and demonstrate that our methods are superior to existing LDL methods. Besides, the ablation studies justify the advantages of exploiting low-rank label correlation in the auxiliary MLL.

Bringing Chemistry to Scale: Loss Weight Adjustment for Multivariate Regression in Deep Learning of Thermochemical Processes

  • paper_url: http://arxiv.org/abs/2308.01954
  • repo_url: None
  • paper_authors: Franz M. Rohrhofer, Stefan Posch, Clemens Gößnitzer, José M. García-Oliver, Bernhard C. Geiger
  • for: To improve the accuracy of artificial neural networks (ANNs) learning multiple species mass fractions, as multivariate regression, for a hydrogen (H2) combustion lookup table.
  • methods: A simple yet effective loss weight adjustment that goes beyond standard mean-squared-error optimization, enabling accurate learning of all species mass fractions, including minor species for which the standard optimization fails completely (a toy weighted-loss sketch follows the abstract).
  • results: The loss weight adjustment leads to more balanced gradients during network training, which explains its effectiveness.
    Abstract Flamelet models are widely used in computational fluid dynamics to simulate thermochemical processes in turbulent combustion. These models typically employ memory-expensive lookup tables that are predetermined and represent the combustion process to be simulated. Artificial neural networks (ANNs) offer a deep learning approach that can store this tabular data using a small number of network weights, potentially reducing the memory demands of complex simulations by orders of magnitude. However, ANNs with standard training losses often struggle with underrepresented targets in multivariate regression tasks, e.g., when learning minor species mass fractions as part of lookup tables. This paper seeks to improve the accuracy of an ANN when learning multiple species mass fractions of a hydrogen (\ce{H2}) combustion lookup table. We assess a simple, yet effective loss weight adjustment that outperforms the standard mean-squared error optimization and enables accurate learning of all species mass fractions, even of minor species where the standard optimization completely fails. Furthermore, we find that the loss weight adjustment leads to more balanced gradients in the network training, which explains its effectiveness.
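A toy PyTorch illustration of a per-output loss weight adjustment for multivariate regression: each species mass fraction gets its own weight so that minor species are not drowned out by major ones. Inverse-variance weights are one plausible choice for the sketch; the paper's actual adjustment rule may differ.

```python
import torch

def weighted_mse(pred, target, weights):
    """Mean-squared error with a fixed weight per output dimension (species)."""
    return (weights * (pred - target) ** 2).mean()

torch.manual_seed(0)
# Illustrative targets: species mass fractions spanning many orders of magnitude.
y = torch.rand(1024, 4) * torch.tensor([1e-1, 1e-3, 1e-6, 1e-9])

# Per-species weights from the inverse variance of each target column, so that
# minor species contribute comparably to the total loss.
weights = 1.0 / y.var(dim=0).clamp_min(1e-30)
weights = weights / weights.sum()

pred = y * (1 + 0.01 * torch.randn_like(y))       # stand-in for network output (1% relative error)
print(weighted_mse(pred, y, weights).item(),
      torch.nn.functional.mse_loss(pred, y).item())
```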

MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction

  • paper_url: http://arxiv.org/abs/2308.01737
  • repo_url: https://github.com/chiangel/map-code
  • paper_authors: Jianghao Lin, Yanru Qu, Wei Guo, Xinyi Dai, Ruiming Tang, Yong Yu, Weinan Zhang
  • for: To improve click-through rate (CTR) prediction for personalized online services.
  • methods: A self-supervised, model-agnostic pretrain-finetune paradigm with two practical algorithms, masked feature prediction (MFP) and replaced feature detection (RFD), to better exploit the large volume of user click logs (a compact MFP sketch follows the abstract).
  • results: Experiments on two real-world large-scale datasets (Avazu, Criteo) with several strong backbones (e.g., DCNv2, DeepFM) show that MFP and RFD achieve new state-of-the-art performance in both effectiveness and efficiency for CTR prediction.
    Abstract With the widespread application of personalized online services, click-through rate (CTR) prediction has received more and more attention and research. The most prominent features of CTR prediction are its multi-field categorical data format, and vast and daily-growing data volume. The large capacity of neural models helps digest such massive amounts of data under the supervised learning paradigm, yet they fail to utilize the substantial data to its full potential, since the 1-bit click signal is not sufficient to guide the model to learn capable representations of features and instances. The self-supervised learning paradigm provides a more promising pretrain-finetune solution to better exploit the large amount of user click logs, and learn more generalized and effective representations. However, self-supervised learning for CTR prediction is still an open question, since current works on this line are only preliminary and rudimentary. To this end, we propose a Model-agnostic pretraining (MAP) framework that applies feature corruption and recovery on multi-field categorical data, and more specifically, we derive two practical algorithms: masked feature prediction (MFP) and replaced feature detection (RFD). MFP digs into feature interactions within each instance through masking and predicting a small portion of input features, and introduces noise contrastive estimation (NCE) to handle large feature spaces. RFD further turns MFP into a binary classification mode through replacing and detecting changes in input features, making it even simpler and more effective for CTR pretraining. Our extensive experiments on two real-world large-scale datasets (i.e., Avazu, Criteo) demonstrate the advantages of these two methods on several strong backbones (e.g., DCNv2, DeepFM), and achieve new state-of-the-art performance in terms of both effectiveness and efficiency for CTR prediction.
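A compact PyTorch sketch of the masked feature prediction (MFP) idea on multi-field categorical data: a few fields per sample are replaced by a [MASK] id, and the network is pretrained to recover the original feature ids at the masked positions. The field count, vocabulary size, shared-vocabulary simplification, and the plain softmax head (instead of the paper's noise contrastive estimation) are all illustrative assumptions.

```python
import torch
import torch.nn as nn

NUM_FIELDS, VOCAB, EMB = 8, 1000, 16   # toy sizes; real CTR data is far larger

class MFPEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB + 1, EMB)        # last index serves as [MASK]
        self.mask_id = VOCAB
        self.backbone = nn.Sequential(
            nn.Linear(NUM_FIELDS * EMB, 128), nn.ReLU(),
            nn.Linear(128, NUM_FIELDS * EMB),
        )
        self.head = nn.Linear(EMB, VOCAB)              # predict the original feature id

    def forward(self, fields, mask_ratio=0.25):
        mask = torch.rand_like(fields, dtype=torch.float) < mask_ratio
        corrupted = fields.masked_fill(mask, self.mask_id)
        h = self.backbone(self.emb(corrupted).flatten(1)).view(-1, NUM_FIELDS, EMB)
        logits = self.head(h[mask])                     # only the masked positions
        return nn.functional.cross_entropy(logits, fields[mask])

model = MFPEncoder()
batch = torch.randint(0, VOCAB, (32, NUM_FIELDS))       # categorical field ids
loss = model(batch)                                      # self-supervised pretraining loss
loss.backward()
```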

Quantification of Predictive Uncertainty via Inference-Time Sampling

  • paper_url: http://arxiv.org/abs/2308.01731
  • repo_url: None
  • paper_authors: Katarína Tóthová, Ľubor Ladický, Daniel Thul, Marc Pollefeys, Ender Konukoglu
  • for: To quantify predictive uncertainty arising from data ambiguity.
  • methods: A post-hoc, inference-time sampling strategy that requires no dedicated model or training mechanism, assumes no parametric form for the predictive distribution, and can be applied to any feed-forward deterministic network without changes to the architecture or training procedure.
  • results: The method generates diverse and multi-modal predictive distributions on imaging and non-imaging regression tasks, and the estimated uncertainty correlates with the prediction error.
    Abstract Predictive variability due to data ambiguities has typically been addressed via construction of dedicated models with built-in probabilistic capabilities that are trained to predict uncertainty estimates as variables of interest. These approaches require distinct architectural components and training mechanisms, may include restrictive assumptions and exhibit overconfidence, i.e., high confidence in imprecise predictions. In this work, we propose a post-hoc sampling strategy for estimating predictive uncertainty accounting for data ambiguity. The method can generate different plausible outputs for a given input and does not assume parametric forms of predictive distributions. It is architecture agnostic and can be applied to any feed-forward deterministic network without changes to the architecture or training procedure. Experiments on regression tasks on imaging and non-imaging input data show the method's ability to generate diverse and multi-modal predictive distributions, and a desirable correlation of the estimated uncertainty with the prediction error.

Telematics Combined Actuarial Neural Networks for Cross-Sectional and Longitudinal Claim Count Data

  • paper_url: http://arxiv.org/abs/2308.01729
  • repo_url: None
  • paper_authors: Francis Duval, Jean-Philippe Boucher, Mathieu Pigeon
  • for: To propose novel cross-sectional and longitudinal claim count models for vehicle insurance based on the Combined Actuarial Neural Network (CANN) framework, improving accuracy over classical approaches.
  • methods: The CANN model combines a classical log-linear claim count regression model with a multilayer perceptron (MLP) that processes telematics driving data, retaining the solid foundation and interpretability of the classical part while adding the flexibility of the neural network (a minimal sketch follows the abstract); besides Poisson and negative binomial specifications for cross-sectional data, a multivariate negative binomial (MVNB) specification yields a longitudinal model that accounts for dependence between contracts from the same insured.
  • results: The CANN models outperform log-linear models that rely on manually engineered telematics features.
    Abstract We present novel cross-sectional and longitudinal claim count models for vehicle insurance built upon the Combined Actuarial Neural Network (CANN) framework proposed by Mario W\"uthrich and Michael Merz. The CANN approach combines a classical actuarial model, such as a generalized linear model, with a neural network. This blending of models results in a two-component model comprising a classical regression model and a neural network part. The CANN model leverages the strengths of both components, providing a solid foundation and interpretability from the classical model while harnessing the flexibility and capacity to capture intricate relationships and interactions offered by the neural network. In our proposed models, we use well-known log-linear claim count regression models for the classical regression part and a multilayer perceptron (MLP) for the neural network part. The MLP part is used to process telematics car driving data given as a vector characterizing the driving behavior of each insured driver. In addition to the Poisson and negative binomial distributions for cross-sectional data, we propose a procedure for training our CANN model with a multivariate negative binomial (MVNB) specification. By doing so, we introduce a longitudinal model that accounts for the dependence between contracts from the same insured. Our results reveal that the CANN models exhibit superior performance compared to log-linear models that rely on manually engineered telematics features.
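A minimal PyTorch sketch of the CANN structure for claim counts: the log-linear (Poisson GLM) part enters through a skip connection and an MLP on the telematics vector learns an additive correction on the log scale, trained with the Poisson negative log-likelihood. Dimensions and data are dummies, and in practice the GLM part is typically initialized from a separately fitted GLM; the paper additionally uses negative binomial and multivariate negative binomial (MVNB) specifications.

```python
import torch
import torch.nn as nn

class CANNPoisson(nn.Module):
    """Combined Actuarial Neural Network for claim counts:
    log E[N] = log(exposure) + <beta, x_glm> + MLP(x_telematics)."""
    def __init__(self, n_glm_features, n_telematics_features):
        super().__init__()
        self.glm = nn.Linear(n_glm_features, 1)             # classical log-linear part
        self.mlp = nn.Sequential(                            # neural network part
            nn.Linear(n_telematics_features, 32), nn.Tanh(),
            nn.Linear(32, 1),
        )

    def forward(self, x_glm, x_tel, exposure):
        log_rate = torch.log(exposure).unsqueeze(-1) + self.glm(x_glm) + self.mlp(x_tel)
        return log_rate.squeeze(-1)

def poisson_nll(log_rate, counts):
    # Negative Poisson log-likelihood up to an additive constant.
    return (torch.exp(log_rate) - counts * log_rate).mean()

model = CANNPoisson(n_glm_features=5, n_telematics_features=20)
x_glm = torch.randn(256, 5)                  # classical tariff features
x_tel = torch.randn(256, 20)                 # telematics driving-behavior vector
exposure = torch.rand(256) + 0.1             # policy exposure in years
counts = torch.poisson(torch.full((256,), 0.2))
loss = poisson_nll(model(x_glm, x_tel, exposure), counts)
loss.backward()
```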

ADRNet: A Generalized Collaborative Filtering Framework Combining Clinical and Non-Clinical Data for Adverse Drug Reaction Prediction

  • paper_url: http://arxiv.org/abs/2308.02571
  • repo_url: https://github.com/haoxuanli-pku/adrnet
  • paper_authors: Haoxuan Li, Taojun Hu, Zetong Xiong, Chunyuan Zheng, Fuli Feng, Xiangnan He, Xiao-Hua Zhou
  • for: To predict adverse drug reactions (ADRs) accurately and efficiently, reducing patient mortality and enhancing drug safety.
  • methods: ADRNet, a generalized collaborative filtering framework that combines clinical and non-clinical data; a shallow collaborative filtering module and a deep drug representation module exploit high-dimensional drug descriptors to guide the learning of low-dimensional ADR latent embeddings, combining the benefits of collaborative filtering and representation learning.
  • results: Extensive experiments on two large publicly available clinical drug-ADR datasets and two non-clinical datasets demonstrate the accuracy and efficiency of the proposed ADRNet framework.
    Abstract Adverse drug reaction (ADR) prediction plays a crucial role in both health care and drug discovery for reducing patient mortality and enhancing drug safety. Recently, many studies have been devoted to effectively predict the drug-ADRs incidence rates. However, these methods either did not effectively utilize non-clinical data, i.e., physical, chemical, and biological information about the drug, or did little to establish a link between content-based and pure collaborative filtering during the training phase. In this paper, we first formulate the prediction of multi-label ADRs as a drug-ADR collaborative filtering problem, and to the best of our knowledge, this is the first work to provide extensive benchmark results of previous collaborative filtering methods on two large publicly available clinical datasets. Then, by exploiting the easy accessible drug characteristics from non-clinical data, we propose ADRNet, a generalized collaborative filtering framework combining clinical and non-clinical data for drug-ADR prediction. Specifically, ADRNet has a shallow collaborative filtering module and a deep drug representation module, which can exploit the high-dimensional drug descriptors to further guide the learning of low-dimensional ADR latent embeddings, which incorporates both the benefits of collaborative filtering and representation learning. Extensive experiments are conducted on two publicly available real-world drug-ADR clinical datasets and two non-clinical datasets to demonstrate the accuracy and efficiency of the proposed ADRNet. The code is available at https://github.com/haoxuanli-pku/ADRnet.

Evaluating Link Prediction Explanations for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.01682
  • repo_url: https://github.com/cborile/eval_lp_xai
  • paper_authors: Claudio Borile, Alan Perotti, André Panisson
  • for: To assess human-understandable explanations for link prediction models, a prerequisite for fostering the adoption of graph machine learning (GML) models.
  • methods: Quantitative metrics for the quality of link prediction explanations, with or without ground truth, are proposed and used to evaluate state-of-the-art explainability methods for graph neural networks (GNNs); a fidelity-style metric is sketched after the abstract.
  • results: Underlying assumptions and technical details specific to the link prediction task, such as the choice of distance between node embeddings, influence the quality of the explanations, and the evaluation exposes limitations of current state-of-the-art GNN explainability methods on this task.
    Abstract Graph Machine Learning (GML) has numerous applications, such as node/graph classification and link prediction, in real-world domains. Providing human-understandable explanations for GML models is a challenging yet fundamental task to foster their adoption, but validating explanations for link prediction models has received little attention. In this paper, we provide quantitative metrics to assess the quality of link prediction explanations, with or without ground-truth. State-of-the-art explainability methods for Graph Neural Networks are evaluated using these metrics. We discuss how underlying assumptions and technical details specific to the link prediction task, such as the choice of distance between node embeddings, can influence the quality of the explanations.
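One way to score a link-prediction explanation without ground truth, in the spirit of the metrics discussed above, is a fidelity-style measure: remove the edges the explainer marks as important and see how much the predicted link score drops. The sketch below is a generic stand-in, not the paper's exact metric: `score_link` uses common neighbours where a real evaluation would query the trained GNN, and the explanation edges are made up.

```python
import networkx as nx

def score_link(G, u, v):
    """Stand-in link scorer: normalized common-neighbour count
    (a real evaluation would call the trained GNN instead)."""
    cn = len(list(nx.common_neighbors(G, u, v)))
    return cn / (1 + cn)

def fidelity(G, u, v, explanation_edges):
    """Fidelity-style score: drop the edges the explainer deems important and
    measure how much the predicted link score changes (higher = better explanation)."""
    before = score_link(G, u, v)
    G_reduced = G.copy()
    G_reduced.remove_edges_from(explanation_edges)
    after = score_link(G_reduced, u, v)
    return before - after

G = nx.karate_club_graph()
expl = [(0, 2), (0, 3)]               # edges an explainer might return for link (2, 3)
print(fidelity(G, 2, 3, expl))
```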

Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER

  • paper_url: http://arxiv.org/abs/2308.02570
  • repo_url: None
  • paper_authors: Feng Chen, Jiajia Liu, Kaixiang Ji, Wang Ren, Jian Wang, Jingdong Wang
  • for: To address the two main challenges of multimodal named entity recognition (MNER): bridging the semantic gap between text and image, and matching each entity with its associated object in the image.
  • methods: BGA-MNER, a bidirectional generative alignment method consisting of image2text and text2image generation over entity-salient content in the two modalities; the bidirectional reconstruction objectives are jointly optimized to align the implicit entity-object relations, and a stage-refined context sampler extracts matched cross-modal content for generation.
  • results: Extensive experiments on two benchmarks show state-of-the-art performance without requiring image input during inference.
    Abstract The challenge posed by multimodal named entity recognition (MNER) is mainly two-fold: (1) bridging the semantic gap between text and image and (2) matching the entity with its associated object in image. Existing methods fail to capture the implicit entity-object relations, due to the lack of corresponding annotation. In this paper, we propose a bidirectional generative alignment method named BGA-MNER to tackle these issues. Our BGA-MNER consists of \texttt{image2text} and \texttt{text2image} generation with respect to entity-salient content in two modalities. It jointly optimizes the bidirectional reconstruction objectives, leading to aligning the implicit entity-object relations under such direct and powerful constraints. Furthermore, image-text pairs usually contain unmatched components which are noisy for generation. A stage-refined context sampler is proposed to extract the matched cross-modal content for generation. Extensive experiments on two benchmarks demonstrate that our method achieves state-of-the-art performance without image input during inference.

Efficiency of First-Order Methods for Low-Rank Tensor Recovery with the Tensor Nuclear Norm Under Strict Complementarity

  • paper_url: http://arxiv.org/abs/2308.01677
  • repo_url: None
  • paper_authors: Dan Garber, Atara Kaplan
  • for: To study the recovery of low-rank tensors via convex relaxations, namely constrained minimization over the ball induced by the tensor nuclear norm.
  • methods: Standard first-order methods (projected gradient and extragradient) over the tensor nuclear norm ball, analyzed under a suitable strict complementarity (SC) condition developed for this ball (the tensor nuclear norm and tubal rank are sketched after the abstract).
  • results: Under SC: 1. for objectives of the form f(X) = g(AX) + <C, X>, with g strongly convex and A a linear map (e.g., least squares), a quadratic growth bound implies linear convergence of standard projected gradient methods even though f need not be strongly convex; 2. for smooth objectives initialized sufficiently close to an optimal solution satisfying SC, projected gradient methods only require SVD computations of rank matching the tubal rank of that solution, which for constant tubal rank yields nearly linear (in the tensor size) per-iteration runtime; 3. analogous results hold for the extragradient method applied to nonsmooth objectives admitting a smooth saddle-point formulation. The paper also rigorously extends basic results on tensors of arbitrary order that were previously available only for third-order tensors.
    Abstract We consider convex relaxations for recovering low-rank tensors based on constrained minimization over a ball induced by the tensor nuclear norm, recently introduced in \cite{tensor_tSVD}. We build on a recent line of results that considered convex relaxations for the recovery of low-rank matrices and established that under a strict complementarity condition (SC), both the convergence rate and per-iteration runtime of standard gradient methods may improve dramatically. We develop the appropriate strict complementarity condition for the tensor nuclear norm ball and obtain the following main results under this condition: 1. When the objective to minimize is of the form $f(\mX)=g(\mA\mX)+\langle{\mC,\mX}\rangle$ , where $g$ is strongly convex and $\mA$ is a linear map (e.g., least squares), a quadratic growth bound holds, which implies linear convergence rates for standard projected gradient methods, despite the fact that $f$ need not be strongly convex. 2. For a smooth objective function, when initialized in certain proximity of an optimal solution which satisfies SC, standard projected gradient methods only require SVD computations (for projecting onto the tensor nuclear norm ball) of rank that matches the tubal rank of the optimal solution. In particular, when the tubal rank is constant, this implies nearly linear (in the size of the tensor) runtime per iteration, as opposed to super linear without further assumptions. 3. For a nonsmooth objective function which admits a popular smooth saddle-point formulation, we derive similar results to the latter for the well known extragradient method. An additional contribution which may be of independent interest, is the rigorous extension of many basic results regarding tensors of arbitrary order, which were previously obtained only for third-order tensors.
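For readers unfamiliar with the objects involved, the tensor nuclear norm and tubal rank referenced above come from the t-SVD: take an FFT along the third (tubal) mode and work with the frontal slices in the Fourier domain. A NumPy sketch follows; the 1/n3 normalization follows a common t-SVD convention and may differ by a constant from the definition used in the paper.

```python
import numpy as np

def tensor_nuclear_norm(T):
    """Tensor nuclear norm of a 3-way array T (n1 x n2 x n3) via the t-SVD:
    FFT along the tubal (third) mode, then average the matrix nuclear norms
    of the frontal slices in the Fourier domain."""
    n3 = T.shape[2]
    T_hat = np.fft.fft(T, axis=2)
    return sum(np.linalg.svd(T_hat[:, :, k], compute_uv=False).sum() for k in range(n3)) / n3

def tubal_rank(T, tol=1e-8):
    """Tubal rank: maximum matrix rank over the Fourier-domain frontal slices."""
    T_hat = np.fft.fft(T, axis=2)
    return max(np.linalg.matrix_rank(T_hat[:, :, k], tol=tol) for k in range(T.shape[2]))

rng = np.random.default_rng(0)
# Low-tubal-rank tensor built as the t-product of two random factors.
A, B = rng.normal(size=(20, 3, 8)), rng.normal(size=(3, 15, 8))
A_hat, B_hat = np.fft.fft(A, axis=2), np.fft.fft(B, axis=2)
T = np.real(np.fft.ifft(np.einsum('ijk,jlk->ilk', A_hat, B_hat), axis=2))
print(tensor_nuclear_norm(T), tubal_rank(T))   # tubal rank should be 3
```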

End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear MPC

  • paper_url: http://arxiv.org/abs/2308.01674
  • repo_url: None
  • paper_authors: Daniel Mayfrank, Alexander Mitsos, Manuel Dahmen
  • for: The paper aims to develop a method for end-to-end reinforcement learning of dynamic surrogate models for optimal performance in (economic) nonlinear model predictive control ((e)NMPC) applications.
  • methods: Dynamic surrogate models are trained end-to-end by reinforcement learning, reducing the computational burden of (e)NMPC while improving its performance.
  • results: The resulting predictive controllers strike a favorable balance between control performance and computational demand, outperform models derived from system identification, and can react to changes in the control setting without retraining.
    Abstract (Economic) nonlinear model predictive control ((e)NMPC) requires dynamic system models that are sufficiently accurate in all relevant state-space regions. These models must also be computationally cheap enough to ensure real-time tractability. Data-driven surrogate models for mechanistic models can be used to reduce the computational burden of (e)NMPC; however, such models are typically trained by system identification for maximum average prediction accuracy on simulation samples and perform suboptimally as part of actual (e)NMPC. We present a method for end-to-end reinforcement learning of dynamic surrogate models for optimal performance in (e)NMPC applications, resulting in predictive controllers that strike a favorable balance between control performance and computational demand. We validate our method on two applications derived from an established nonlinear continuous stirred-tank reactor model. We compare the controller performance to that of MPCs utilizing models trained by the prevailing maximum prediction accuracy paradigm, and model-free neural network controllers trained using reinforcement learning. We show that our method matches the performance of the model-free neural network controllers while consistently outperforming models derived from system identification. Additionally, we show that the MPC policies can react to changes in the control setting without retraining.

UniG-Encoder: A Universal Feature Encoder for Graph and Hypergraph Node Classification

  • paper_url: http://arxiv.org/abs/2308.01650
  • repo_url: https://github.com/minhzou/unig-encoder
  • paper_authors: Minhao Zou, Zhongxue Gan, Yutong Wang, Junheng Zhang, Dongyan Sui, Chun Guan, Siyang Leng
  • for: To propose a universal feature encoder for both graph and hypergraph representation learning (node classification).
  • methods: A forward transformation turns the topological relationships of connected nodes into edge or hyperedge features via a normalized projection matrix; these features, together with the original node features, are fed into a neural network, and the encoded node embeddings are derived from the reversed transformation, given by the transpose of the projection matrix, of the network's output (a toy version of this forward/reverse projection is sketched after the abstract).
  • results: On twelve representative hypergraph datasets and six real-world graph datasets, the method outperforms state-of-the-art approaches.
    Abstract Graph and hypergraph representation learning has attracted increasing attention from various research fields. Despite the decent performance and fruitful applications of Graph Neural Networks (GNNs), Hypergraph Neural Networks (HGNNs), and their well-designed variants, on some commonly used benchmark graphs and hypergraphs, they are outperformed by even a simple Multi-Layer Perceptron. This observation motivates a reexamination of the design paradigm of the current GNNs and HGNNs and poses challenges of extracting graph features effectively. In this work, a universal feature encoder for both graph and hypergraph representation learning is designed, called UniG-Encoder. The architecture starts with a forward transformation of the topological relationships of connected nodes into edge or hyperedge features via a normalized projection matrix. The resulting edge/hyperedge features, together with the original node features, are fed into a neural network. The encoded node embeddings are then derived from the reversed transformation, described by the transpose of the projection matrix, of the network's output, which can be further used for tasks such as node classification. The proposed architecture, in contrast to the traditional spectral-based and/or message passing approaches, simultaneously and comprehensively exploits the node features and graph/hypergraph topologies in an efficient and unified manner, covering both heterophilic and homophilic graphs. The designed projection matrix, encoding the graph features, is intuitive and interpretable. Extensive experiments are conducted and demonstrate the superior performance of the proposed framework on twelve representative hypergraph datasets and six real-world graph datasets, compared to the state-of-the-art methods. Our implementation is available online at https://github.com/MinhZou/UniG-Encoder.
    摘要 图与超图表示学习在各研究领域中受到越来越多的关注。尽管图神经网络(GNN)和超图神经网络(HGNN)及其精心设计的变体性能不俗、应用广泛,但在一些常用的基准图和超图上,它们甚至不及简单的多层感知器(MLP)。这一观察促使人们重新审视现有 GNN 和 HGNN 的设计范式,并提出了如何有效提取图特征的挑战。为此,我们提出了一种面向图与超图表示学习的通用特征编码器,称为 UniG-Encoder。该架构首先通过一个归一化投影矩阵,将相连节点之间的拓扑关系前向变换为边或超边特征;随后,得到的边/超边特征与原始节点特征一同输入神经网络;编码后的节点嵌入再通过由该投影矩阵转置描述的反向变换从网络输出中得到,可进一步用于节点分类等任务。与传统的谱方法和/或消息传递方法不同,该架构以高效统一的方式同时、全面地利用节点特征与图/超图拓扑,并同时覆盖同配图与异配图。所设计的投影矩阵对图特征的编码直观且可解释。我们在十二个代表性超图数据集和六个真实图数据集上进行了广泛实验,结果表明所提框架优于现有最先进方法。我们的实现可以在 https://github.com/MinhZou/UniG-Encoder 找到。
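The projection idea in the abstract — a normalized incidence-style forward transform, an MLP on the stacked node and (hyper)edge features, and the matrix transpose as the reverse map — can be illustrated in a few lines. The sketch below is a hedged reading of that pipeline: the way the projection matrix is built, the single-hidden-layer MLP, and all sizes are my own assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def unig_style_encode(X, hyperedges, hidden=64, seed=0):
    """Sketch of a UniG-Encoder-style pass: project node features onto
    (hyper)edges with a normalized projection matrix P, run an MLP on the
    stacked node+edge features, and map back with P^T. Sizes and the
    single-hidden-layer MLP are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    e = len(hyperedges)

    # Normalized projection: row i of P averages the features of the nodes
    # belonging to hyperedge i.
    P = np.zeros((e, n))
    for i, nodes in enumerate(hyperedges):
        P[i, list(nodes)] = 1.0 / len(nodes)

    # Augmented projection: identity rows keep the original node features.
    P_aug = np.vstack([np.eye(n), P])          # shape (n + e, n)
    feats = P_aug @ X                          # node rows + edge rows

    # Tiny MLP (random weights stand in for trained parameters).
    W1 = rng.normal(scale=0.1, size=(d, hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, hidden))
    H = np.maximum(feats @ W1, 0.0) @ W2       # shape (n + e, hidden)

    # Reverse transformation: the transpose of the projection matrix pools
    # the encoded node+edge rows back into per-node embeddings.
    return P_aug.T @ H                         # shape (n, hidden)

if __name__ == "__main__":
    X = np.random.randn(6, 8)                  # 6 nodes, 8-dim features
    hyperedges = [{0, 1, 2}, {2, 3}, {3, 4, 5}]
    Z = unig_style_encode(X, hyperedges)
    print(Z.shape)                             # (6, 64) embeddings for downstream classification
```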

MARLIM: Multi-Agent Reinforcement Learning for Inventory Management

  • paper_url: http://arxiv.org/abs/2308.01649
  • repo_url: None
  • paper_authors: Rémi Leluc, Elie Kadoche, Antoine Bertoncello, Sébastien Gourvénec
  • for: solve the inventory management problem for a single-echelon multi-products supply chain with stochastic demands and lead-times.
  • methods: reinforcement learning framework called MARLIM, with controllers developed through single or multiple agents in a cooperative setting.
  • results: numerical experiments on real data demonstrate the benefits of reinforcement learning methods over traditional baselines.
    Abstract Maintaining a balance between the supply and demand of products by optimizing replenishment decisions is one of the most important challenges in the supply chain industry. This paper presents a novel reinforcement learning framework called MARLIM, to address the inventory management problem for a single-echelon multi-products supply chain with stochastic demands and lead-times. Within this context, controllers are developed through single or multiple agents in a cooperative setting. Numerical experiments on real data demonstrate the benefits of reinforcement learning methods over traditional baselines.
    摘要 通过优化补货决策来维持产品供需平衡,是供应链行业最重要的挑战之一。本文提出了一个名为 MARLIM 的新型强化学习框架,用于解决具有随机需求与交货期的单级多产品供应链的库存管理问题。在此设定下,控制器由单个或多个智能体在合作环境中训练得到。基于真实数据的数值实验表明,强化学习方法优于传统基线。
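To make the control problem concrete, here is a toy single-echelon, single-product inventory environment with stochastic demand and a fixed lead time. The dynamics, cost coefficients, and the naive base-stock policy in the usage example are illustrative assumptions, not the environment or controllers used by MARLIM.

```python
import random

class InventoryEnv:
    """Toy single-echelon, single-product environment with stochastic demand
    and lead time; all parameters are illustrative assumptions."""
    def __init__(self, capacity=100, lead_time=2, holding_cost=0.1,
                 stockout_cost=2.0, mean_demand=8, seed=0):
        self.capacity, self.lead_time = capacity, lead_time
        self.holding_cost, self.stockout_cost = holding_cost, stockout_cost
        self.mean_demand = mean_demand
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.stock = self.capacity // 2
        self.pipeline = [0] * self.lead_time    # orders in transit
        return (self.stock, tuple(self.pipeline))

    def step(self, order_qty):
        # Orders arrive after the lead time elapses.
        arrived = self.pipeline.pop(0)
        self.pipeline.append(order_qty)
        self.stock = min(self.capacity, self.stock + arrived)

        demand = self.rng.randint(0, 2 * self.mean_demand)  # stochastic demand
        sold = min(self.stock, demand)
        lost = demand - sold
        self.stock -= sold

        reward = -(self.holding_cost * self.stock + self.stockout_cost * lost)
        return (self.stock, tuple(self.pipeline)), reward

if __name__ == "__main__":
    env = InventoryEnv()
    state = env.reset()
    total = 0.0
    for _ in range(50):                         # naive base-stock policy
        order = max(0, 30 - state[0] - sum(state[1]))
        state, r = env.step(order)
        total += r
    print(f"episode reward: {total:.1f}")
```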

Interleaving GANs with knowledge graphs to support design creativity for book covers

  • paper_url: http://arxiv.org/abs/2308.01626
  • repo_url: https://github.com/alexmotogna/generatorapi
  • paper_authors: Alexandru Motogna, Adrian Groza
  • for: 这个论文是为了提高生成图书封面的质量,使用不同的训练方法来获得更好的生成图像。
  • methods: 该论文使用生成对抗网络(GANs)在图书封面领域中进行应用,并通过与知识图结合来修改输入标题,以获得更多的可能性。
  • results: 该方法在生成图书封面方面比前一些尝试更好,而知识图也为图书作者或编辑提供了更多的选择。
    Abstract An attractive book cover is important for the success of a book. In this paper, we apply Generative Adversarial Networks (GANs) to the book covers domain, using different methods for training in order to obtain better generated images. We interleave GANs with knowledge graphs to alter the input title to obtain multiple possible options for any given title, which are then used as an augmented input to the generator. Finally, we use the discriminator obtained during the training phase to select the best images generated with new titles. Our method performed better at generating book covers than previous attempts, and the knowledge graph gives better options to the book author or editor compared to using GANs alone.
    摘要 一个吸引人的书封面对于书的成功非常重要。在这篇论文中,我们将生成对抗网络(GAN)应用于书封面领域,使用不同的训练方法来获得更好的生成图像。我们将 GAN 与知识图结合,对输入标题加以变化,为任意给定标题生成多个可能的选项,然后将这些选项作为增强的输入传递给生成器。最后,我们使用训练阶段获得的判别器来选择由新标题生成的最佳图像。我们的方法在生成书封面方面优于以往尝试,并且与单独使用 GAN 相比,知识图可以为书作者或编辑提供更好的选择。

Weighted Multi-Level Feature Factorization for App ads CTR and installation prediction

  • paper_url: http://arxiv.org/abs/2308.02568
  • repo_url: https://github.com/knife982000/recsys2023challenge
  • paper_authors: Juan Manuel Rodriguez, Antonela Tommasel
  • for: 这项研究是为了提出一种用于ACM RecSys Challenge 2023的方法,以提高深层漏斗优化和特别强调用户隐私。
  • methods: 该方法基于归一化多级特征分解,将 clicking 和 installing 视为两个不同, yet related 任务,因此模型设计了特定任务的特定特征和共享特征。
  • results: 该方法在 academia-track 最终结果中获得第 11 名、总分 55 分的成绩。
    Abstract This paper provides an overview of the approach we used as team ISISTANITOS for the ACM RecSys Challenge 2023. The competition was organized by ShareChat, and involved predicting the probability of a user clicking an app ad and/or installing an app, to improve deep funnel optimization and a special focus on user privacy. Our proposed method inferring the probabilities of clicking and installing as two different, but related tasks. Hence, the model engineers a specific set of features for each task and a set of shared features. Our model is called Weighted Multi-Level Feature Factorization because it considers the interaction of different order features, where the order is associated to the depth in a neural network. The prediction for a given task is generated by combining the task specific and shared features on the different levels. Our submission achieved the 11 rank and overall score of 55 in the competition academia-track final results. We release our source code at: https://github.com/knife982000/RecSys2023Challenge

Multimodal Indoor Localisation in Parkinson’s Disease for Detecting Medication Use: Observational Pilot Study in a Free-Living Setting

  • paper_url: http://arxiv.org/abs/2308.02419
  • repo_url: https://github.com/ferdianjovan/Multihead-Dual-Convolutional-Self-Attention
  • paper_authors: Ferdian Jovan, Catherine Morgan, Ryan McConville, Emma L. Tonkin, Ian Craddock, Alan Whone
  • for: 本研究旨在提高现有indoor localization方法的效果,透过使用 transformer 模型和双模态数据( Received Signal Strength Indicator 和加速器数据)来提高患者的跟踪精度。
  • methods: 本研究使用 transformer 模型和 dual modalities 来提高 indoor localization 的精度。
  • results: 研究结果表明,所提网络在真实数据上表现出色,并且可以准确预测患者是否正在服用左旋多巴(levodopa)药物。
    Abstract Parkinson's disease (PD) is a slowly progressive, debilitating neurodegenerative disease which causes motor symptoms including gait dysfunction. Motor fluctuations are alterations between periods with a positive response to levodopa therapy ("on") and periods marked by re-emergency of PD symptoms ("off") as the response to medication wears off. These fluctuations often affect gait speed and they increase in their disabling impact as PD progresses. To improve the effectiveness of current indoor localisation methods, a transformer-based approach utilising dual modalities which provide complementary views of movement, Received Signal Strength Indicator (RSSI) and accelerometer data from wearable devices, is proposed. A sub-objective aims to evaluate whether indoor localisation, including its in-home gait speed features (i.e. the time taken to walk between rooms), could be used to evaluate motor fluctuations by detecting whether the person with PD is taking levodopa medications or withholding them. To properly evaluate our proposed method, we use a free-living dataset where the movements and mobility are greatly varied and unstructured as expected in real-world conditions. 24 participants lived in pairs (consisting of one person with PD, one control) for five days in a smart home with various sensors. Our evaluation on the resulting dataset demonstrates that our proposed network outperforms other methods for indoor localisation. The sub-objective evaluation shows that precise room-level localisation predictions, transformed into in-home gait speed features, produce accurate predictions on whether the PD participant is taking or withholding their medications.
    摘要 帕金森病(PD)是一种缓慢进展、严重影响功能的神经退行性疾病,会导致包括步态障碍在内的运动症状。在左旋多巴(levodopa)治疗下,运动症状会在药效良好的“开期”与药效消退、症状重新出现的“关期”之间波动。这些波动常常影响步行速度,并随着疾病进展而愈发致残。为改进现有室内定位方法,本文提出了一种基于 transformer 的方法,利用来自可穿戴设备的两种互补的运动感知模态:接收信号强度指示(RSSI)与加速度计数据。一个子目标是评估室内定位(包括居家步速特征,即在房间之间行走所需的时间)能否通过检测帕金森病患者是正在服用还是停用左旋多巴药物来评估运动波动。为了恰当评估所提方法,我们使用了一个自由生活数据集,其中的运动和移动情况高度多样且无结构,符合真实环境的预期。24 名参与者两两一组(每组包含一名帕金森病患者和一名对照者)在配备多种传感器的智能家居中生活了五天。在该数据集上的评估表明,所提网络在室内定位方面优于其他方法。子目标评估表明,将精确的房间级定位预测转换为居家步速特征后,可以准确预测帕金森病患者是正在服用还是停用药物。

A Novel Convolutional Neural Network Architecture with a Continuous Symmetry

  • paper_url: http://arxiv.org/abs/2308.01621
  • repo_url: https://github.com/liuyao12/ConvNets-PDE-perspective
  • paper_authors: Yao Liu, Hang Shao, Bing Bai
  • for: 这个论文旨在提出一种新的卷积神经网络(ConvNet)架构,以解决图像分类任务。
  • methods: 该架构受一类偏微分方程(PDE)——拟线性双曲系统——的启发,并允许通过一个连续对称群修改权重。
  • results: 该架构可以与传统模型相比,在图像分类任务上达到相似的性能,并且可以推广到更多的应用场景。
    Abstract This paper introduces a new Convolutional Neural Network (ConvNet) architecture inspired by a class of partial differential equations (PDEs) called quasi-linear hyperbolic systems. With comparable performance on the image classification task, it allows for the modification of the weights via a continuous group of symmetry. This is a significant shift from traditional models where the architecture and weights are essentially fixed. We wish to promote the (internal) symmetry as a new desirable property for a neural network, and to draw attention to the PDE perspective in analyzing and interpreting ConvNets in the broader Deep Learning community.
    摘要 本文介绍了一种新的卷积神经网络(ConvNet)架构,其灵感来自一类称为拟线性双曲系统的偏微分方程(PDE)。在图像分类任务上性能相当的同时,该架构允许通过一个连续对称群修改权重,这与架构和权重基本固定的传统模型有本质区别。我们希望将(内部)对称性推广为神经网络的一种新的理想属性,并促使更广泛的深度学习社区关注从 PDE 视角分析和解释卷积网络。
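One way to read the PDE analogy is as a residual block that performs an explicit time step of a quasi-linear first-order system, u ← u + dt · (A(u) ∂u/∂x + B(u) ∂u/∂y), with the coefficient fields produced by 1x1 convolutions. The block below is a hedged sketch of that reading; the finite-difference stencils, tanh-bounded coefficients, and dt are my assumptions, not the architecture proposed in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuasiLinearBlock(nn.Module):
    """Residual block as one explicit step of a quasi-linear system:
    u <- u + dt * (A(u) * du/dx + B(u) * du/dy). Coefficients come from
    1x1 convs; the finite-difference stencils and dt are illustrative."""
    def __init__(self, channels, dt=0.1):
        super().__init__()
        self.dt = dt
        self.channels = channels
        self.coef_a = nn.Conv2d(channels, channels, kernel_size=1)
        self.coef_b = nn.Conv2d(channels, channels, kernel_size=1)
        # Fixed central-difference stencils, one per channel (depthwise).
        kx = torch.tensor([[0., 0., 0.], [-0.5, 0., 0.5], [0., 0., 0.]])
        ky = kx.t()
        self.register_buffer("kx", kx.repeat(channels, 1, 1, 1))
        self.register_buffer("ky", ky.repeat(channels, 1, 1, 1))

    def forward(self, u):
        du_dx = F.conv2d(u, self.kx, padding=1, groups=self.channels)
        du_dy = F.conv2d(u, self.ky, padding=1, groups=self.channels)
        a, b = torch.tanh(self.coef_a(u)), torch.tanh(self.coef_b(u))
        return u + self.dt * (a * du_dx + b * du_dy)

if __name__ == "__main__":
    block = QuasiLinearBlock(channels=16)
    x = torch.randn(2, 16, 32, 32)
    print(block(x).shape)   # torch.Size([2, 16, 32, 32])
```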

Assessing Systematic Weaknesses of DNNs using Counterfactuals

  • paper_url: http://arxiv.org/abs/2308.01614
  • repo_url: None
  • paper_authors: Sujan Sai Gannamaneni, Michael Mock, Maram Akila
  • for: This paper aims to identify systematic weaknesses in deep neural networks (DNNs) for safety-critical applications, and to develop an effective and computationally efficient algorithm to validate the semantic attribution of existing subsets.
  • methods: The proposed method is inspired by counterfactual explanations and uses a combination of feature importance and partial dependence plots to identify the causal relationship between specific attributes and the model’s performance.
  • results: The authors demonstrate the effectiveness of their approach on an example from the autonomous driving domain, showing that the proposed method can identify performance differences among different pedestrian assets, and that the asset type is not always the reason for reduced performance.
    Abstract With the advancement of DNNs into safety-critical applications, testing approaches for such models have gained more attention. A current direction is the search for and identification of systematic weaknesses that put safety assumptions based on average performance values at risk. Such weaknesses can take on the form of (semantically coherent) subsets or areas in the input space where a DNN performs systematically worse than its expected average. However, it is non-trivial to attribute the reason for such observed low performances to the specific semantic features that describe the subset. For instance, inhomogeneities within the data w.r.t. other (non-considered) attributes might distort results. However, taking into account all (available) attributes and their interaction is often computationally highly expensive. Inspired by counterfactual explanations, we propose an effective and computationally cheap algorithm to validate the semantic attribution of existing subsets, i.e., to check whether the identified attribute is likely to have caused the degraded performance. We demonstrate this approach on an example from the autonomous driving domain using highly annotated simulated data, where we show for a semantic segmentation model that (i) performance differences among the different pedestrian assets exist, but (ii) only in some cases is the asset type itself the reason for this reduction in the performance.

Feature Noise Boosts DNN Generalization under Label Noise

  • paper_url: http://arxiv.org/abs/2308.01609
  • repo_url: https://github.com/zlzenglu/fn
  • paper_authors: Lu Zeng, Xuan Chen, Xiaoshuang Shi, Heng Tao Shen
  • for: 增强深度神经网络(DNNs)的泛化性,尤其是在标签噪声(label noise)的情况下。
  • methods: 直接将噪声添加到训练数据的特征上,使用了一种简单的特征噪声方法。
  • results: 理论分析和实验验证表明,这种特征噪声方法可以有效增强 DNN 的泛化能力,并且可以针对不同的标签噪声水平和类型选择合适的噪声类型与强度,以获得理想的标签噪声泛化效果。
    Abstract The presence of label noise in the training data has a profound impact on the generalization of deep neural networks (DNNs). In this study, we introduce and theoretically demonstrate a simple feature noise method, which directly adds noise to the features of training data, can enhance the generalization of DNNs under label noise. Specifically, we conduct theoretical analyses to reveal that label noise leads to weakened DNN generalization by loosening the PAC-Bayes generalization bound, and feature noise results in better DNN generalization by imposing an upper bound on the mutual information between the model weights and the features, which constrains the PAC-Bayes generalization bound. Furthermore, to ensure effective generalization of DNNs in the presence of label noise, we conduct application analyses to identify the optimal types and levels of feature noise to add for obtaining desirable label noise generalization. Finally, extensive experimental results on several popular datasets demonstrate the feature noise method can significantly enhance the label noise generalization of the state-of-the-art label noise method.
    摘要 Deep neural networks (DNNs) 的泛化能力受到标签噪声的影响。在这项研究中,我们介绍了一种简单的特征噪声方法,可以直接将噪声添加到训练数据的特征中,以提高 DNN 的泛化能力。我们进行了理论分析,发现标签噪声会导致 DNN 的泛化能力弱化,并且特征噪声会对 DNN 的泛化能力产生正面的影响,并且可以限制 PAC-Bayes 泛化约束。此外,为确保 DNN 在标签噪声下的有效泛化,我们进行了应用分析,以确定合适的特征噪声类型和水平。最后,我们对多个流行的数据集进行了广泛的实验,发现特征噪声方法可以显著提高标签噪声泛化方法的泛化能力。
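The core mechanism — perturbing the training features rather than the labels — fits in a single training step. The sketch below assumes simple zero-mean Gaussian noise with a tunable sigma; the paper's analysis of which noise types and levels to choose is not reproduced here, and the tiny model and data are placeholders.

```python
import torch
import torch.nn as nn

def train_step_with_feature_noise(model, optimizer, x, noisy_labels, sigma=0.1):
    """One step of standard training, except Gaussian noise is added to the
    input features; sigma is an assumed hyperparameter to tune per noise level."""
    model.train()
    x_noisy = x + sigma * torch.randn_like(x)      # feature noise
    loss = nn.functional.cross_entropy(model(x_noisy), noisy_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 5))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(32, 20), torch.randint(0, 5, (32,))   # y stands in for (possibly noisy) labels
    print(train_step_with_feature_noise(model, opt, x, y))
```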

Discriminative Graph-level Anomaly Detection via Dual-students-teacher Model

  • paper_url: http://arxiv.org/abs/2308.01947
  • repo_url: https://github.com/whb605/gladst
  • paper_authors: Fu Lin, Xuexiong Luo, Jia Wu, Jian Yang, Shan Xue, Zitong Wang, Haonan Gong
  • for: 本文目的是提出一种基于图集中异常图的检测方法,以找到图集中异常的图。由于图集中异常检测的研究不够,现有的方法主要是通过捕捉异常图信息来学习更好的图表示。然而,它们忽略了异常图分类的有效性评价函数。本文因此首先定义图集中异常信息,包括节点和图像异常,并采用节点级和图级信息差异来识别它们。
  • methods: 本文提出了一种 dual-students-teacher 模型,其中教师模型使用了规则损函数来训练图表示更加分散。两个竞争学生模型,一个是 normal 图模型,另一个是异常图模型,分别适应教师模型的节点级和图级表示视角。最后,我们将两个学生模型的表示错误相加,以分类异常图。
  • results: 实验分析表明,我们的方法在真实世界中的图据集上得到了良好的效果,能够有效地检测图集中的异常图。
    Abstract Different from the current node-level anomaly detection task, the goal of graph-level anomaly detection is to find abnormal graphs that significantly differ from others in a graph set. Due to the scarcity of research on the work of graph-level anomaly detection, the detailed description of graph-level anomaly is insufficient. Furthermore, existing works focus on capturing anomalous graph information to learn better graph representations, but they ignore the importance of an effective anomaly score function for evaluating abnormal graphs. Thus, in this work, we first define anomalous graph information including node and graph property anomalies in a graph set and adopt node-level and graph-level information differences to identify them, respectively. Then, we introduce a discriminative graph-level anomaly detection framework with dual-students-teacher model, where the teacher model with a heuristic loss are trained to make graph representations more divergent. Then, two competing student models trained by normal and abnormal graphs respectively fit graph representations of the teacher model in terms of node-level and graph-level representation perspectives. Finally, we combine representation errors between two student models to discriminatively distinguish anomalous graphs. Extensive experiment analysis demonstrates that our method is effective for the graph-level anomaly detection task on graph datasets in the real world.
    摘要 不同于当前的节点级异常检测任务,我们的目标是找到图集中异常的图,并且这些图significantly differ from others。由于关于图级异常检测的研究 scarcity,我们所用的异常描述不够详细。此外,现有的works主要是捕捉异常图信息,以学习更好的图表示。然而,它们忽略了有效的异常分数函数的重要性,用于评估异常图。因此,在这个工作中,我们首先定义了图集中的异常信息,包括节点和图性异常。然后,我们提出了一种推理图级异常检测框架,使用两个学生模型和一个教师模型。教师模型通过规则损失进行训练,以使图表示更加分化。两个学生模型,分别使用正常和异常图进行训练,然后将教师模型的图表示分别拟合到节点级和图级 representation 两个视角中。最后,我们将两个学生模型的表示错误相加,以分类地分别检测异常图。我们的方法在实际世界上的图据上进行了广泛的实验分析,并证明其效果。

Unsupervised Multiplex Graph Learning with Complementary and Consistent Information

  • paper_url: http://arxiv.org/abs/2308.01606
  • repo_url: https://github.com/larryuestc/cocomg
  • paper_authors: Liang Peng, Xin Wang, Xiaofeng Zhu
  • for: 本文提出了一种方法,用以解决无监督多重图学习(UMGL)中的两个实际问题:样本外(out-of-sample)问题和噪声问题。
  • methods: 本文使用多个 MLP 编码器进行表示学习,并施加两个约束:保持节点之间的局部图结构以处理样本外问题,以及最大化多个节点表示之间的相关性以处理噪声问题。
  • results: 与其他方法相比,所提方法在不同的下游任务中表现出显著的有效性和高效率,能够有效解决样本外问题和噪声问题。
    Abstract Unsupervised multiplex graph learning (UMGL) has been shown to achieve significant effectiveness for different downstream tasks by exploring both complementary information and consistent information among multiple graphs. However, previous methods usually overlook the issues in practical applications, i.e., the out-of-sample issue and the noise issue. To address the above issues, in this paper, we propose an effective and efficient UMGL method to explore both complementary and consistent information. To do this, our method employs multiple MLP encoders rather than graph convolutional network (GCN) to conduct representation learning with two constraints, i.e., preserving the local graph structure among nodes to handle the out-of-sample issue, and maximizing the correlation of multiple node representations to handle the noise issue. Comprehensive experiments demonstrate that our proposed method achieves superior effectiveness and efficiency over the comparison methods and effectively tackles those two issues. Code is available at https://github.com/LarryUESTC/CoCoMG.
    摘要 无监督多重图学习(UMGL)通过探索多个图之间的互补信息与一致信息,已在不同的下游任务中显示出显著效果。然而,先前的方法通常忽视了实际应用中的问题,即样本外问题和噪声问题。为了解决这些问题,本文提出了一种有效且高效的 UMGL 方法,采用多个 MLP 编码器(而非图卷积网络)进行表示学习,并施加两个约束:一是保持节点之间的局部图结构,以处理样本外问题;二是最大化多个节点表示之间的相关性,以处理噪声问题。广泛的实验表明,所提方法在有效性和效率上均优于对比方法,并有效解决了这两个问题。代码可在 https://github.com/LarryUESTC/CoCoMG 找到。

Deep Learning-based surrogate models for parametrized PDEs: handling geometric variability through graph neural networks

  • paper_url: http://arxiv.org/abs/2308.01602
  • repo_url: None
  • paper_authors: Nicola Rares Franco, Stefania Fresca, Filippo Tombari, Andrea Manzoni
  • for: 本研究旨在提出一种基于图 neural network (GNN) 的时间依赖partial differential equation (PDE) 模拟方法,以提高模拟效率和泛化能力。
  • methods: 本研究使用了一种数据驱动的时间步骤方法,其中GNN建立了一个高效的系统来演化系统。
  • results: 实验结果表明,GNN 可以提供一种有效的替代方案,以提高模拟效率和泛化能力。 GNN 也能够在不同的几何和分辨率下进行泛化。
    Abstract Mesh-based simulations play a key role when modeling complex physical systems that, in many disciplines across science and engineering, require the solution of parametrized time-dependent nonlinear partial differential equations (PDEs). In this context, full order models (FOMs), such as those relying on the finite element method, can reach high levels of accuracy, however often yielding intensive simulations to run. For this reason, surrogate models are developed to replace computationally expensive solvers with more efficient ones, which can strike favorable trade-offs between accuracy and efficiency. This work explores the potential usage of graph neural networks (GNNs) for the simulation of time-dependent PDEs in the presence of geometrical variability. In particular, we propose a systematic strategy to build surrogate models based on a data-driven time-stepping scheme where a GNN architecture is used to efficiently evolve the system. With respect to the majority of surrogate models, the proposed approach stands out for its ability of tackling problems with parameter dependent spatial domains, while simultaneously generalizing to different geometries and mesh resolutions. We assess the effectiveness of the proposed approach through a series of numerical experiments, involving both two- and three-dimensional problems, showing that GNNs can provide a valid alternative to traditional surrogate models in terms of computational efficiency and generalization to new scenarios. We also assess, from a numerical standpoint, the importance of using GNNs, rather than classical dense deep neural networks, for the proposed framework.
    摘要 mesh-based 模拟在科学和工程多种领域中发挥关键作用,特别是在解决参数化时间依赖非线性偏微分方程(PDEs)中。在这种情况下,全序模型(FOMs),如基于finite element方法的模型,可以达到高级别的准确性,但通常需要昂贵的计算。为了解决这个问题,人们开发了代理模型,以换取更高效的计算方法,这可以实现可接受的妥协。本工作探讨在时间依赖PDEs中使用图 neural network(GNNs)进行模拟,特别是在具有参数化空间域的情况下。我们提出了一种系统atic的方法,使用数据驱动时间步骤来构建代理模型,其中GNN架构用于有效地演化系统。与大多数代理模型不同的是,我们的方法可以处理具有参数依赖的空间域的问题,同时能够泛化到不同的几何和分辨率。我们通过一系列数字实验,包括二维和三维问题,证明GNNs可以提供一个有效的代替方案,与传统的代理模型相比。此外,我们还从数值角度评估了使用GNNs而不使用传统的 dense deep neural networks 的重要性。
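The data-driven time-stepping scheme can be sketched as a learned one-step map that predicts the state increment from each node's state and an aggregation over its mesh neighbours, rolled out autoregressively. The message-passing form, dense adjacency, and feature choices below are assumptions for illustration, not the authors' architecture or training pipeline.

```python
import torch
import torch.nn as nn

class OneStepGNN(nn.Module):
    """Sketch of a surrogate time-stepper: u_{t+1} = u_t + MLP([u_t, mean of
    neighbour states]). Adjacency is a dense (n, n) matrix here for brevity."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim))

    def forward(self, u, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neigh = adj @ u / deg                     # mean over mesh neighbours
        return u + self.mlp(torch.cat([u, neigh], dim=-1))

def rollout(model, u0, adj, steps):
    """Autoregressive rollout of the learned one-step map."""
    traj, u = [u0], u0
    for _ in range(steps):
        u = model(u, adj)
        traj.append(u)
    return torch.stack(traj)

if __name__ == "__main__":
    n, d = 50, 3
    adj = (torch.rand(n, n) < 0.1).float()
    adj = ((adj + adj.t()) > 0).float()           # symmetric mesh connectivity
    model = OneStepGNN(state_dim=d)
    print(rollout(model, torch.randn(n, d), adj, steps=5).shape)  # (6, 50, 3)
```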

Experimental Results regarding multiple Machine Learning via Quaternions

  • paper_url: http://arxiv.org/abs/2308.01946
  • repo_url: None
  • paper_authors: Tianlei Zhu, Renzhe Zhu
  • for: 这个研究探讨了使用四元数在机器学习算法中的应用。
  • methods: 这个研究使用了四元数来表示和分类三维空间中的旋转数据,包括随机生成的四元数数据和相应的标签,将四元数转换为旋转矩阵,并使其为输入特征。
  • results: 根据四元数和多种机器学习算法,这个研究显示了更高的准确率和显著提高的预测性能。总之,这个研究提供了使用四元数进行机器学习任务的实质基础。
    Abstract This paper presents an experimental study on the application of quaternions in several machine learning algorithms. Quaternion is a mathematical representation of rotation in three-dimensional space, which can be used to represent complex data transformations. In this study, we explore the use of quaternions to represent and classify rotation data, using randomly generated quaternion data and corresponding labels, converting quaternions to rotation matrices, and using them as input features. Based on quaternions and multiple machine learning algorithms, it has shown higher accuracy and significantly improved performance in prediction tasks. Overall, this study provides an empirical basis for exploiting quaternions for machine learning tasks.
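The feature pipeline described above — converting quaternions to rotation matrices and feeding the flattened matrices to standard classifiers — is straightforward to reproduce. In the sketch below only the quaternion-to-rotation-matrix conversion is standard; the random quaternions, the synthetic labels, and the random-forest classifier are placeholders, not the paper's data or models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def quaternion_to_rotation_matrix(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix after normalization."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    quats = rng.normal(size=(500, 4))                 # randomly generated quaternions
    # Synthetic labels (placeholder): sign of the product of two components.
    labels = (quats[:, 0] * quats[:, 3] > 0).astype(int)
    # Flattened rotation matrices as the input features.
    feats = np.stack([quaternion_to_rotation_matrix(q).ravel() for q in quats])
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(feats[:400], labels[:400])
    print("held-out accuracy:", clf.score(feats[400:], labels[400:]))
```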

SoK: Assessing the State of Applied Federated Machine Learning

  • paper_url: http://arxiv.org/abs/2308.02454
  • repo_url: None
  • paper_authors: Tobias Müller, Maximilian Stäbler, Hugo Gascón, Frank Köster, Florian Matthes
  • for: 本研究旨在探讨 Federated Machine Learning (FedML) 在实际应用中的现状和挑战。
  • methods: 本研究采用系统性的文献回顾方法,对 74 篇相关文章进行分析,描述 FedML 实现的特点和趋势,以及其应用领域和推动因素。
  • results: 本研究发现,FedML 在privacy-critical 环境中具有潜在的应用前景,但是在实际应用中还存在一些挑战,如数据隐私保护和communication overhead。
    Abstract Machine Learning (ML) has shown significant potential in various applications; however, its adoption in privacy-critical domains has been limited due to concerns about data privacy. A promising solution to this issue is Federated Machine Learning (FedML), a model-to-data approach that prioritizes data privacy. By enabling ML algorithms to be applied directly to distributed data sources without sharing raw data, FedML offers enhanced privacy protections, making it suitable for privacy-critical environments. Despite its theoretical benefits, FedML has not seen widespread practical implementation. This study aims to explore the current state of applied FedML and identify the challenges hindering its practical adoption. Through a comprehensive systematic literature review, we assess 74 relevant papers to analyze the real-world applicability of FedML. Our analysis focuses on the characteristics and emerging trends of FedML implementations, as well as the motivational drivers and application domains. We also discuss the encountered challenges in integrating FedML into real-life settings. By shedding light on the existing landscape and potential obstacles, this research contributes to the further development and implementation of FedML in privacy-critical scenarios.

Unsupervised Representation Learning for Time Series: A Review

  • paper_url: http://arxiv.org/abs/2308.01578
  • repo_url: https://github.com/mqwfrog/ults
  • paper_authors: Qianwen Meng, Hangwei Qian, Yong Liu, Yonghui Xu, Zhiqi Shen, Lizhen Cui
  • for: 本研究旨在系统性地研究无监督表示学习方法的应用于时间序列数据,以掌握无需标注的特征表示。
  • methods: 本研究使用了现有的各种无监督表示学习方法,包括自适应表示学习、异常点检测、矩阵因子分解等。
  • results: 研究发现,使用contrastive learning方法可以在9个真实世界数据集上实现最高的表示学习效果,并且提出了一些实践考虑因素和未来研究挑战。
    Abstract Unsupervised representation learning approaches aim to learn discriminative feature representations from unlabeled data, without the requirement of annotating every sample. Enabling unsupervised representation learning is extremely crucial for time series data, due to its unique annotation bottleneck caused by its complex characteristics and lack of visual cues compared with other data modalities. In recent years, unsupervised representation learning techniques have advanced rapidly in various domains. However, there is a lack of systematic analysis of unsupervised representation learning approaches for time series. To fill the gap, we conduct a comprehensive literature review of existing rapidly evolving unsupervised representation learning approaches for time series. Moreover, we also develop a unified and standardized library, named ULTS (i.e., Unsupervised Learning for Time Series), to facilitate fast implementations and unified evaluations on various models. With ULTS, we empirically evaluate state-of-the-art approaches, especially the rapidly evolving contrastive learning methods, on 9 diverse real-world datasets. We further discuss practical considerations as well as open research challenges on unsupervised representation learning for time series to facilitate future research in this field.
    摘要 无监督表示学习方法旨在从无标注数据中学习有判别力的特征表示,而无需为每个样本进行标注。时序数据由于其复杂特性且相比其他数据模态缺乏视觉线索,标注瓶颈尤为突出,因此无监督表示学习对时序数据至关重要。近年来,无监督表示学习技术在各个领域快速发展,然而针对时序数据的无监督表示学习方法仍缺乏系统性分析。为填补这一空白,我们对现有快速发展的时序无监督表示学习方法进行了全面的文献综述。此外,我们还开发了一个统一、标准化的库 ULTS(Unsupervised Learning for Time Series),以便对各种模型进行快速实现和统一评估。借助 ULTS,我们在 9 个多样的真实世界数据集上实证评估了最新方法,特别是快速发展的对比学习方法。我们还讨论了实践考虑因素以及时序无监督表示学习的开放研究挑战,以促进该领域的未来研究。

Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS

  • paper_url: http://arxiv.org/abs/2308.01573
  • repo_url: https://github.com/komyeongjin/specdiff-gan
  • paper_authors: Myeongjin Ko, Yong-Hoon Choi
  • for: 提高DiffGAN-TTS模型的speech质量和生成速度
  • methods: 采用两个识别器:扩散识别器和spectrogram识别器,以学习扩散过程的分布和生成数据的分布
  • results: 比较latest state-of-the-art模型FastSpeech2和DiffGAN-TTS,在不同的评价指标上显示更高的性能,包括SSIM、MCD、F0 RMSE、STOI、PESQ以及主观评价MOS。
    Abstract The diffusion model is capable of generating high-quality data through a probabilistic approach. However, it suffers from the drawback of slow generation speed due to the requirement of a large number of time steps. To address this limitation, recent models such as denoising diffusion implicit models (DDIM) focus on generating samples without directly modeling the probability distribution, while models like denoising diffusion generative adversarial networks (GAN) combine diffusion processes with GANs. In the field of speech synthesis, a recent diffusion speech synthesis model called DiffGAN-TTS, utilizing the structure of GANs, has been introduced and demonstrates superior performance in both speech quality and generation speed. In this paper, to further enhance the performance of DiffGAN-TTS, we propose a speech synthesis model with two discriminators: a diffusion discriminator for learning the distribution of the reverse process and a spectrogram discriminator for learning the distribution of the generated data. Objective metrics such as structural similarity index measure (SSIM), mel-cepstral distortion (MCD), F0 root mean squared error (F0 RMSE), short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ), as well as subjective metrics like mean opinion score (MOS), are used to evaluate the performance of the proposed model. The evaluation results show that the proposed model outperforms recent state-of-the-art models such as FastSpeech2 and DiffGAN-TTS in various metrics. Our implementation and audio samples are located on GitHub.
    摘要 Diffusion模型可以生成高质量数据通过概率方法。然而,它受限于需要大量时间步骤,导致生成速度慢。为解决这个限制,最近的模型如降噪扩散权重模型(DDIM)和降噪扩散生成对抗网络(GAN)等都在生成样本不直接模型概率分布。在语音生成领域,一种最近的扩散语音生成模型叫做DiffGAN-TTS,利用了GAN的结构,已经被引入并显示出优于现有模型的语音质量和生成速度。在这篇论文中,我们提议一种语音生成模型,其中包括两个识别器:一个扩散识别器用于学习逆过程的分布,一个spectrogram识别器用于学习生成数据的分布。对象度量如构造相似度指数(SSIM)、mel-cepstral损害(MCD)、F0根圆满平均误差(F0 RMSE)、短时对象智能度(STOI)、语音质量评价(PESQ)以及主观度量如主观评分(MOS)等都用于评估提议模型的性能。评估结果表明,提议模型超过了最近的状态艺模型如FastSpeech2和DiffGAN-TTS在各个度量上。我们的实现和音频样本位于GitHub。

Fast Slate Policy Optimization: Going Beyond Plackett-Luce

  • paper_url: http://arxiv.org/abs/2308.01566
  • repo_url: None
  • paper_authors: Otmane Sakhi, David Rohde, Nicolas Chopin
  • for: 大规模机器学习系统中的返回牌块技术在搜索、信息检索和推荐系统中发挥着越来越重要的作用。
  • methods: 该文章提出了一种基于策略优化框架的方法来优化大规模决策系统,使其能够快速完成在线查询。该方法基于一种新的决策函数relaxation,从而实现了一种简单 yet efficient的学习算法,可涵盖巨大的动作空间。
  • results: 文章对比了使用常见的Plackett-Luce策略类与该方法,并在动作空间规模达万万的问题上进行了比较,证明了该方法的有效性。
    Abstract An increasingly important building block of large scale machine learning systems is based on returning slates; an ordered lists of items given a query. Applications of this technology include: search, information retrieval and recommender systems. When the action space is large, decision systems are restricted to a particular structure to complete online queries quickly. This paper addresses the optimization of these large scale decision systems given an arbitrary reward function. We cast this learning problem in a policy optimization framework and propose a new class of policies, born from a novel relaxation of decision functions. This results in a simple, yet efficient learning algorithm that scales to massive action spaces. We compare our method to the commonly adopted Plackett-Luce policy class and demonstrate the effectiveness of our approach on problems with action space sizes in the order of millions.
    摘要 越来越重要的大规模机器学习系统建构件是基于返回板,即查询时返回的有序列表。这些应用包括搜索、信息检索和推荐系统。当动作空间较大时,决策系统会受到特定结构的限制,以快速完成在线查询。这篇论文通过对大规模决策系统优化问题进行政策优化框架,提出了一种新的政策类型。这种新政策类型由一种新的决策函数 relaxation 得到,从而得到了一个简单 yet efficient 的学习算法,可扩展到巨大的动作空间。我们与普遍采用的沃尔特-劳伯策略类比较,并在动作空间规模为百万的问题上表现了我们的方法的有效性。

Hierarchical Federated Learning in Wireless Networks: Pruning Tackles Bandwidth Scarcity and System Heterogeneity

  • paper_url: http://arxiv.org/abs/2308.01562
  • repo_url: None
  • paper_authors: Md Ferdous Pervej, Richeng Jin, Huaiyu Dai
  • for: 本研究旨在提出一种基于层次分解的贝叶斯学习(PHFL)算法,以满足实际无线网络中端到端通信受限制和设备限制。
  • methods: 本研究使用模型剪除技术,并且jointly优化客户端的模型剪除比例、中央处理器频率和传输功率,以最小化控制项的抽象 bound 下的抽象 bound。
  • results: 通过广泛的 simulations,本研究证明了提出的 PHFL 算法在测试准确率、墙 clock 时间、能量消耗和带宽要求上的有效性。
    Abstract While a practical wireless network has many tiers where end users do not directly communicate with the central server, the users' devices have limited computation and battery powers, and the serving base station (BS) has a fixed bandwidth. Owing to these practical constraints and system models, this paper leverages model pruning and proposes a pruning-enabled hierarchical federated learning (PHFL) in heterogeneous networks (HetNets). We first derive an upper bound of the convergence rate that clearly demonstrates the impact of the model pruning and wireless communications between the clients and the associated BS. Then we jointly optimize the model pruning ratio, central processing unit (CPU) frequency and transmission power of the clients in order to minimize the controllable terms of the convergence bound under strict delay and energy constraints. However, since the original problem is not convex, we perform successive convex approximation (SCA) and jointly optimize the parameters for the relaxed convex problem. Through extensive simulation, we validate the effectiveness of our proposed PHFL algorithm in terms of test accuracy, wall clock time, energy consumption and bandwidth requirement.
    摘要 实际的无线网络往往具有多层结构,终端用户并不直接与中央服务器通信;用户设备的计算能力和电池电量有限,而服务基站(BS)的带宽固定。基于这些实际约束和系统模型,本文利用模型剪枝,提出了一种适用于异构网络(HetNets)的支持剪枝的分层联邦学习(PHFL)方法。我们首先推导出收敛速率的上界,清晰地刻画了模型剪枝以及客户端与所属基站之间无线通信的影响。随后,在严格的时延和能耗约束下,我们联合优化客户端的剪枝比例、CPU 频率和发射功率,以最小化收敛上界中可控的项。由于原问题非凸,我们采用逐次凸逼近(SCA)对松弛后的凸问题联合优化参数。大量仿真验证了所提 PHFL 算法在测试准确率、墙钟时间、能耗和带宽需求方面的有效性。
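The client-side compression step can be illustrated with plain global magnitude pruning at a given ratio. The sketch below only shows the pruning operation on a toy network; the joint optimization of the pruning ratio with CPU frequency and transmit power, which is the paper's contribution, is not reproduced here, and the magnitude criterion is an assumption about the general technique.

```python
import torch
import torch.nn as nn

def magnitude_prune(model, ratio):
    """Zero out the smallest-magnitude weights globally so that roughly `ratio`
    of all weight-matrix entries are pruned; a simple stand-in for the pruning step."""
    weights = torch.cat([p.detach().abs().flatten()
                         for p in model.parameters() if p.dim() > 1])
    k = int(ratio * weights.numel())
    if k == 0:
        return model
    threshold = torch.kthvalue(weights, k).values
    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                p.mul_((p.abs() > threshold).float())
    return model

if __name__ == "__main__":
    net = nn.Sequential(nn.Linear(100, 50), nn.ReLU(), nn.Linear(50, 10))
    magnitude_prune(net, ratio=0.7)
    nz = sum((p != 0).sum().item() for p in net.parameters() if p.dim() > 1)
    total = sum(p.numel() for p in net.parameters() if p.dim() > 1)
    print(f"remaining weights: {nz}/{total}")
```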

Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.01557
  • repo_url: None
  • paper_authors: Joao Carvalho, An T. Le, Mark Baierl, Dorothea Koert, Jan Peters
  • for: The paper is written for researchers and practitioners in the field of robot motion planning and optimization, particularly those interested in using learning priors to accelerate the planning process.
  • methods: The paper proposes using diffusion models as priors for motion planning, which allows for sampling directly from the posterior trajectory distribution conditioned on task goals. The authors also leverage the inverse denoising process of diffusion models to effectively encode data multimodality in high-dimensional settings.
  • results: The paper demonstrates the efficacy of the proposed method, Motion Planning Diffusion, through experiments in simulated planar robot and 7-dof robot arm manipulator environments. The results show that diffusion models are strong priors for encoding high-dimensional trajectory distributions of robot motions, and the method generalizes well to previously unseen obstacles.
    Abstract Learning priors on trajectory distributions can help accelerate robot motion planning optimization. Given previously successful plans, learning trajectory generative models as priors for a new planning problem is highly desirable. Prior works propose several ways on utilizing this prior to bootstrapping the motion planning problem. Either sampling the prior for initializations or using the prior distribution in a maximum-a-posterior formulation for trajectory optimization. In this work, we propose learning diffusion models as priors. We then can sample directly from the posterior trajectory distribution conditioned on task goals, by leveraging the inverse denoising process of diffusion models. Furthermore, diffusion has been recently shown to effectively encode data multimodality in high-dimensional settings, which is particularly well-suited for large trajectory dataset. To demonstrate our method efficacy, we compare our proposed method - Motion Planning Diffusion - against several baselines in simulated planar robot and 7-dof robot arm manipulator environments. To assess the generalization capabilities of our method, we test it in environments with previously unseen obstacles. Our experiments show that diffusion models are strong priors to encode high-dimensional trajectory distributions of robot motions.
    摘要 学习轨迹分布可以帮助加速机器人运动规划优化。给定先前成功的计划,学习轨迹生成模型作为优化问题的先前知识是非常有优势。先前的工作提出了使用这种先前知识来启动运动规划问题的多种方法。可以从先前知识中采样,或者使用先前知识的分布来形式化最大 posterior 方法来优化轨迹。在这个工作中,我们提议使用扩散模型来学习先前知识。我们可以通过扩散模型的逆减针对任务目标来直接从 posterior 轨迹分布中采样,并且利用扩散模型对高维数据的编码能力来更好地处理大规模轨迹数据。为了证明我们的方法效果,我们将比较我们的方法(运动规划扩散)与多个基eline在 simulate 平面机器人和7度OF机械臂环境中。为了评估我们的方法泛化能力,我们在未看过障碍物的环境中进行测试。我们的实验表明,扩散模型是高维轨迹分布的机器人运动优化中强大的先前知识。

InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent

  • paper_url: http://arxiv.org/abs/2308.01552
  • repo_url: None
  • paper_authors: Po-Lin Chen, Cheng-Shang Chang
  • for: 本研究 paper 探讨 OpenAI 的 ChatGPT 在具有体系的代理系统中的整合,评估其对交互决策标准的影响。
  • methods: 我们采用了多种提示,将 ChatGPT 分配到不同的角色中,如检查员和排序员,然后与原始语言模型结合。
  • results: 我们的研究显示,在 AlfWorld 中,包含 6 个不同任务的模拟家庭环境中,ChatGPT 的成功率达到 98%,强调了提示工程的重要性,从而为任务规划预测开辟了新的可能性。
    Abstract This research paper delves into the integration of OpenAI's ChatGPT into embodied agent systems, evaluating its influence on interactive decision-making benchmark. Drawing a parallel to the concept of people assuming roles according to their unique strengths, we introduce InterAct. In this approach, we feed ChatGPT with varied prompts, assigning it a numerous roles like a checker and a sorter, then integrating them with the original language model. Our research shows a remarkable success rate of 98% in AlfWorld, which consists of 6 different tasks in a simulated household environment, emphasizing the significance of proficient prompt engineering. The results highlight ChatGPT's competence in comprehending and performing intricate tasks effectively in real-world settings, thus paving the way for further advancements in task planning.

MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

  • paper_url: http://arxiv.org/abs/2308.01546
  • repo_url: None
  • paper_authors: Ke Chen, Yusong Wu, Haohe Liu, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov
  • for: 这个论文的目的是提出一种基于稳定扩散模型的文本到音乐生成模型(MusicLDM),以便在音乐领域进行跨模态生成任务。
  • methods: 该模型使用了稳定扩散和音乐LDM架构,并通过重新训练CLAP预训练模型和Hifi-GAN vocoder来适应音乐领域。此外,该模型还使用了 beat tracking 模型和两种不同的mixup策略来进行数据扩展和避免抄袭。
  • results: 根据新定义的CLAP分数评价指标,该模型和mixup策略可以提高生成的音乐质量和新颖性,同时仍保持与输入文本的对应性。
    Abstract Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. However, generating music, as a special type of audio, presents unique challenges due to limited availability of music data and sensitive issues related to copyright and plagiarism. In this paper, to tackle these challenges, we first construct a state-of-the-art text-to-music model, MusicLDM, that adapts Stable Diffusion and AudioLDM architectures to the music domain. We achieve this by retraining the contrastive language-audio pretraining model (CLAP) and the Hifi-GAN vocoder, as components of MusicLDM, on a collection of music data samples. Then, to address the limitations of training data and to avoid plagiarism, we leverage a beat tracking model and propose two different mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup, which recombine training audio directly or via a latent embeddings space, respectively. Such mixup strategies encourage the model to interpolate between musical training samples and generate new music within the convex hull of the training data, making the generated music more diverse while still staying faithful to the corresponding style. In addition to popular evaluation metrics, we design several new evaluation metrics based on CLAP score to demonstrate that our proposed MusicLDM and beat-synchronous mixup strategies improve both the quality and novelty of generated music, as well as the correspondence between input text and generated music.
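Beat-synchronous audio mixup can be sketched as aligning two training clips at their beat positions before linear mixing. In the sketch below the beat times are given as inputs (in practice a beat-tracking model supplies them), and aligning by shifting to the first beat is a simplifying assumption about the strategy, not the paper's exact implementation; the sine waves stand in for real music clips.

```python
import numpy as np

def beat_synchronous_mixup(a, b, beats_a, beats_b, lam=0.5, sr=22050):
    """Shift clip b so that its first beat lines up with clip a's first beat,
    then mix the overlapping region: lam * a + (1 - lam) * b.
    Beat times (in seconds) would come from a beat-tracking model."""
    offset = int(round((beats_a[0] - beats_b[0]) * sr))
    b_shifted = np.roll(b, offset)
    n = min(len(a), len(b_shifted))
    return lam * a[:n] + (1.0 - lam) * b_shifted[:n]

if __name__ == "__main__":
    sr = 22050
    t = np.arange(0, 4.0, 1.0 / sr)
    clip_a = 0.5 * np.sin(2 * np.pi * 220 * t)        # stand-ins for music clips
    clip_b = 0.5 * np.sin(2 * np.pi * 330 * t)
    mixed = beat_synchronous_mixup(clip_a, clip_b,
                                   beats_a=[0.10, 0.60], beats_b=[0.25, 0.75])
    print(mixed.shape, float(np.abs(mixed).max()))
```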

Lode Enhancer: Level Co-creation Through Scaling

  • paper_url: http://arxiv.org/abs/2308.01543
  • repo_url: None
  • paper_authors: Debosmita Bhaumik, Julian Togelius, Georgios N. Yannakakis, Ahmed Khalifa
  • for: 用AI进行2D游戏等级设计协助工具
  • methods: 使用深度神经网络进行等级升采样、编辑和扩展
  • results: 通过对游戏等级进行升采样和编辑,设计者可以在不同的分辨率下创建和编辑等级,并且神经网络可以学习增加缺失的特征。
    Abstract We explore AI-powered upscaling as a design assistance tool in the context of creating 2D game levels. Deep neural networks are used to upscale artificially downscaled patches of levels from the puzzle platformer game Lode Runner. The trained networks are incorporated into a web-based editor, where the user can create and edit levels at three different levels of resolution: 4x4, 8x8, and 16x16. An edit at any resolution instantly transfers to the other resolutions. As upscaling requires inventing features that might not be present at lower resolutions, we train neural networks to reproduce these features. We introduce a neural network architecture that is capable of not only learning upscaling but also giving higher priority to less frequent tiles. To investigate the potential of this tool and guide further development, we conduct a qualitative study with 3 designers to understand how they use it. Designers enjoyed co-designing with the tool, liked its underlying concept, and provided feedback for further improvement.
    摘要 我们探索使用人工智能进行缩放作为游戏等级设计工具,在创建2D游戏等级时。我们使用深度神经网络来缩放人工减小的等级补丁,来自游戏排版平台游戏“卢德运动”。我们在网页编辑器中 integrate 了训练的神经网络,用户可以在不同的分辨率下创建和编辑等级:4x4、8x8和16x16。任何修改都会同步到其他分辨率上。由于缩放需要创造不存在低分辨率下的特征,我们训练神经网络来复制这些特征。我们介绍了一种神经网络架构,可以不仅学习缩放,还可以增加较少的瓷砖优先级。为了了解这工具的潜在力量和进一步发展,我们进行了3名设计师的质量调查,了解他们如何使用这工具,他们对这工具的概念和反馈。

MFIM: Megapixel Facial Identity Manipulation

  • paper_url: http://arxiv.org/abs/2308.01536
  • repo_url: None
  • paper_authors: Sanghyeon Na
  • for: 本研究的目的是提出一种新的面部换 Identity 框架,以实现高质量的面部换 Image 生成和有效地改变面部特征。
  • methods: 我们的模型基于 StyleGAN 的 GAN-inversion 技术,并使用 3DMM 来semantically 捕捉多种面部特征。我们对模型进行了严格的设计和训练,以确保模型可以生成高质量的面部换 Image。
  • results: 我们通过了广泛的实验,证明了我们的模型可以达到领先的性能水平,并且可以根据用户的需求进行自定义。我们还提出了一种新的操作called ID mixing,可以创造出新的 Identity by semantically mixing 多个人的 Identities。
    Abstract Face swapping is a task that changes a facial identity of a given image to that of another person. In this work, we propose a novel face-swapping framework called Megapixel Facial Identity Manipulation (MFIM). The face-swapping model should achieve two goals. First, it should be able to generate a high-quality image. We argue that a model which is proficient in generating a megapixel image can achieve this goal. However, generating a megapixel image is generally difficult without careful model design. Therefore, our model exploits pretrained StyleGAN in the manner of GAN-inversion to effectively generate a megapixel image. Second, it should be able to effectively transform the identity of a given image. Specifically, it should be able to actively transform ID attributes (e.g., face shape and eyes) of a given image into those of another person, while preserving ID-irrelevant attributes (e.g., pose and expression). To achieve this goal, we exploit 3DMM that can capture various facial attributes. Specifically, we explicitly supervise our model to generate a face-swapped image with the desirable attributes using 3DMM. We show that our model achieves state-of-the-art performance through extensive experiments. Furthermore, we propose a new operation called ID mixing, which creates a new identity by semantically mixing the identities of several people. It allows the user to customize the new identity.
    摘要 人脸替换是将给定图像中的人脸身份替换为另一个人的任务。在这项工作中,我们提出了一种新的人脸替换框架,称为兆像素人脸身份操控(MFIM)。人脸替换模型应达成两个目标:首先,生成高质量图像,我们认为擅长生成百万像素级图像的模型能够达成这一目标,为此我们的模型以 GAN-inversion 的方式利用预训练的 StyleGAN 来有效生成百万像素级图像;其次,有效地转换给定图像的身份,即将身份相关属性(如脸形和眼睛)主动转换为另一个人的属性,同时保留与身份无关的属性(如姿势和表情)。为达到这一目标,我们利用能够捕捉多种面部属性的 3DMM,明确地监督模型生成具有期望属性的人脸替换图像。大量实验表明我们的模型达到了最先进的性能。此外,我们还提出了一种称为 ID 混合的新操作,通过在语义上混合多个人的身份来创造新的身份,允许用户自定义新身份。

Food Classification using Joint Representation of Visual and Textual Data

  • paper_url: http://arxiv.org/abs/2308.02562
  • repo_url: None
  • paper_authors: Prateek Mittal, Puneet Goyal, Joohi Chauhan
  • for: 这个研究旨在提出一个多 modal 分类框架,用于健康领域的食物分类。
  • methods: 提案的网络使用 Modified EfficientNet 和 Mish 活化函数进行图像分类,并使用传统的 BERT 变数架构进行文本分类。
  • results: 实验结果显示,提案的网络在大量的 open-source 数据集 UPMC Food-101 上表现出色,与其他方法比较,获得了11.57% 和 6.34% 的准确率优势,图像和文本分类方面。
    Abstract Food classification is an important task in health care. In this work, we propose a multimodal classification framework that uses the modified version of EfficientNet with the Mish activation function for image classification, and the traditional BERT transformer-based network is used for text classification. The proposed network and the other state-of-the-art methods are evaluated on a large open-source dataset, UPMC Food-101. The experimental results show that the proposed network outperforms the other methods, a significant difference of 11.57% and 6.34% in accuracy is observed for image and text classification, respectively, when compared with the second-best performing method. We also compared the performance in terms of accuracy, precision, and recall for text classification using both machine learning and deep learning-based models. The comparative analysis from the prediction results of both images and text demonstrated the efficiency and robustness of the proposed approach.
    摘要 食品分类是医疗保健中的重要任务。在这项工作中,我们提出了一种多模态分类框架,使用修改后的EfficientNet和Mish激活函数进行图像分类,并使用传统的BERT变换网络进行文本分类。我们的提议网络和其他状态arct对比的方法在大型开源数据集UPMC Food-101上进行了评估。实验结果表明,我们的提议网络在图像和文本分类方面的准确率比其他方法高出11.57%和6.34%,与第二高分类方法相比。我们还对文本分类方面的准确率、精度和准确率进行了比较分析,并通过图像和文本预测结果的比较分析,证明了我们的方法的高效和可靠性。

Circumventing Concept Erasure Methods For Text-to-Image Generative Models

  • paper_url: http://arxiv.org/abs/2308.01508
  • repo_url: https://github.com/nyu-dice-lab/circumventing-concept-erasure
  • paper_authors: Minh Pham, Kelly O. Marshall, Chinmay Hegde
  • for: 本研究考察了五种最新的概念擦除方法,并证明目标概念并未被任何一种方法完全移除。
  • methods: 研究人员利用特殊学习得到的词嵌入,在不修改模型权重的情况下,从经过“净化”的模型中检索出被“擦除”的概念。
  • results: 结果表明这些事后(post hoc)概念擦除方法并不稳健,其在 AI 安全算法工具箱中的适用性值得质疑。
    Abstract Text-to-image generative models can produce photo-realistic images for an extremely broad range of concepts, and their usage has proliferated widely among the general public. On the flip side, these models have numerous drawbacks, including their potential to generate images featuring sexually explicit content, mirror artistic styles without permission, or even hallucinate (or deepfake) the likenesses of celebrities. Consequently, various methods have been proposed in order to "erase" sensitive concepts from text-to-image models. In this work, we examine five recently proposed concept erasure methods, and show that targeted concepts are not fully excised from any of these methods. Specifically, we leverage the existence of special learned word embeddings that can retrieve "erased" concepts from the sanitized models with no alterations to their weights. Our results highlight the brittleness of post hoc concept erasure methods, and call into question their use in the algorithmic toolkit for AI safety.

Local-Global Temporal Fusion Network with an Attention Mechanism for Multiple and Multiclass Arrhythmia Classification

  • paper_url: http://arxiv.org/abs/2308.02416
  • repo_url: None
  • paper_authors: Yun Kwan Kim, Minji Lee, Kunwook Jo, Hee Seok Song, Seong-Whan Lee
  • for: The paper aims to develop a framework for arrhythmia detection and classification with a constrained input length, which can accurately recognize arrhythmias and calculate their occurrence times.
  • methods: The proposed method consists of local temporal information extraction, global pattern extraction, and local-global information fusion with attention. The method utilizes a combination of local and global features to capture the dynamics of arrhythmias and achieve accurate detection and classification.
  • results: The proposed method was evaluated on the MIT-BIH arrhythmia database and MIT-BIH atrial fibrillation database, and the results showed superior performance compared to state-of-the-art models. The method was also tested on a different database and achieved superior performance, demonstrating its generalization ability.
    Abstract Clinical decision support systems (CDSSs) have been widely utilized to support the decisions made by cardiologists when detecting and classifying arrhythmia from electrocardiograms (ECGs). However, forming a CDSS for the arrhythmia classification task is challenging due to the varying lengths of arrhythmias. Although the onset time of arrhythmia varies, previously developed methods have not considered such conditions. Thus, we propose a framework that consists of (i) local temporal information extraction, (ii) global pattern extraction, and (iii) local-global information fusion with attention to perform arrhythmia detection and classification with a constrained input length. The 10-class and 4-class performances of our approach were assessed by detecting the onset and offset of arrhythmia as an episode and the duration of arrhythmia based on the MIT-BIH arrhythmia database (MITDB) and MIT-BIH atrial fibrillation database (AFDB), respectively. The results were statistically superior to those achieved by the comparison models. To check the generalization ability of the proposed method, an AFDB-trained model was tested on the MITDB, and superior performance was attained compared with that of a state-of-the-art model. The proposed method can capture local-global information and dynamics without incurring information losses. Therefore, arrhythmias can be recognized more accurately, and their occurrence times can be calculated; thus, the clinical field can create more accurate treatment plans by using the proposed method.
    摘要 临床决策支持系统(CDSS)已广泛应用于卡地理学家在电cardiogram(ECG)中检测和分类 irregular heartbeat。然而,构建一个CDSS用于 irregular heartbeat 分类任务是困难的,因为irregular heartbeat的持续时间各不相同。 although the onset time of irregular heartbeat varies, previously developed methods have not considered such conditions. Therefore, we propose a framework that consists of (i) local temporal information extraction, (ii) global pattern extraction, and (iii) local-global information fusion with attention to perform irregular heartbeat detection and classification with a constrained input length.我们使用 MIT-BIH arrhythmia 数据库(MITDB)和 MIT-BIH atrial fibrillation 数据库(AFDB)来评估我们的方法。结果表明,我们的方法在10类和4类任务中表现出了 statistically superior 的结果,比较于比较模型。为了检验我们的方法的通用能力,我们使用 AFDB 训练的模型在 MITDB 上进行测试,并取得了比较于一个现有模型的superior 性能。我们的方法可以捕捉local-global信息和动态,而不会导致信息损失。因此,irregular heartbeat可以更准确地被识别,并且其出现时间可以被计算出来。因此,临床领域可以通过使用我们的方法创建更加准确的治疗计划。

Online Multi-Task Learning with Recursive Least Squares and Recursive Kernel Methods

  • paper_url: http://arxiv.org/abs/2308.01938
  • repo_url: None
  • paper_authors: Gabriel R. Lencione, Fernando J. Von Zuben
  • for: 这个论文提出了两种新的在线多任务学习(MTL)回归问题的方法。
  • methods: 我们使用高性能的图形基于MTL形式,并开发了其归纳版本,基于Weighted Recursive Least Squares(WRLS)和在线稀疏最小二乘支持向量回归(OSLSSVR)。
  • results: 我们使用任务堆叠变换,展示了一个包含多个任务关系的矩阵,并将其 Structural information embodied于MT-WRLS方法的初始化过程中,或MT-OSLSSVR中的多任务核函数中。相比现有文献,我们实现了精确和近似的循环回归,具有quadratic per-instance cost的纬度。在一个实际的风速预测案例中,我们比较了我们的在线MTL方法与其他竞争者,并证明了我们的两种提案方法具有显著的性能提升。
    Abstract This paper introduces two novel approaches for Online Multi-Task Learning (MTL) Regression Problems. We employ a high performance graph-based MTL formulation and develop its recursive versions based on the Weighted Recursive Least Squares (WRLS) and the Online Sparse Least Squares Support Vector Regression (OSLSSVR). Adopting task-stacking transformations, we demonstrate the existence of a single matrix incorporating the relationship of multiple tasks and providing structural information to be embodied by the MT-WRLS method in its initialization procedure and by the MT-OSLSSVR in its multi-task kernel function. Contrasting the existing literature, which is mostly based on Online Gradient Descent (OGD) or cubic inexact approaches, we achieve exact and approximate recursions with quadratic per-instance cost on the dimension of the input space (MT-WRLS) or on the size of the dictionary of instances (MT-OSLSSVR). We compare our online MTL methods to other contenders in a real-world wind speed forecasting case study, evidencing the significant gain in performance of both proposed approaches.
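The recursive building block behind MT-WRLS is the classic weighted recursive least-squares update, whose per-instance cost is quadratic in the input dimension. The single-task sketch below shows that recursion on a toy regression problem; the task-stacking transformation and the structure-informed initialization described in the abstract are the paper's contribution and are not reproduced here.

```python
import numpy as np

class WRLS:
    """Weighted recursive least squares with forgetting factor `lam`;
    each update costs O(d^2) in the input dimension d."""
    def __init__(self, dim, lam=0.99, delta=1e3):
        self.w = np.zeros(dim)
        self.P = delta * np.eye(dim)     # inverse correlation matrix estimate
        self.lam = lam

    def update(self, x, y):
        Px = self.P @ x
        k = Px / (self.lam + x @ Px)     # gain vector
        err = y - self.w @ x
        self.w += k * err
        self.P = (self.P - np.outer(k, Px)) / self.lam
        return err

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([1.0, -2.0, 0.5])
    model = WRLS(dim=3)
    for _ in range(500):
        x = rng.normal(size=3)
        y = true_w @ x + 0.01 * rng.normal()
        model.update(x, y)
    print(np.round(model.w, 3))          # close to [1.0, -2.0, 0.5]
```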

Minimax Optimal $Q$ Learning with Nearest Neighbors

  • paper_url: http://arxiv.org/abs/2308.01490
  • repo_url: None
  • paper_authors: Puning Zhao, Lifeng Lai
  • for: This paper proposes two new $Q$ learning methods to improve the convergence rate of estimated $Q$ functions in continuous state spaces.
  • methods: The proposed methods use a direct nearest neighbor approach instead of the kernel nearest neighbor approach used in (Shah and Xie, 2018), which significantly improves the convergence rate and time complexity in high-dimensional state spaces.
  • results: Both offline and online methods are minimax rate optimal, meaning that they achieve the optimal convergence rate of $\tilde{O}(T^{-1/(d+2)})$ in high-dimensional state spaces.
    Abstract $Q$ learning is a popular model free reinforcement learning method. Most of existing works focus on analyzing $Q$ learning for finite state and action spaces. If the state space is continuous, then the original $Q$ learning method can not be directly used. A modification of the original $Q$ learning method was proposed in (Shah and Xie, 2018), which estimates $Q$ values with nearest neighbors. Such modification makes $Q$ learning suitable for continuous state space. (Shah and Xie, 2018) shows that the convergence rate of estimated $Q$ function is $\tilde{O}(T^{-1/(d+3)})$, which is slower than the minimax lower bound $\tilde{\Omega}(T^{-1/(d+2)})$, indicating that this method is not efficient. This paper proposes two new $Q$ learning methods to bridge the gap of convergence rates in (Shah and Xie, 2018), with one of them being offline, while the other is online. Despite that we still use nearest neighbor approach to estimate $Q$ function, the algorithms are crucially different from (Shah and Xie, 2018). In particular, we replace the kernel nearest neighbor in discretized region with a direct nearest neighbor approach. Consequently, our approach significantly improves the convergence rate. Moreover, the time complexity is also significantly improved in high dimensional state spaces. Our analysis shows that both offline and online methods are minimax rate optimal.
    摘要
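
A toy version of estimating $Q$ values in a continuous state space with a direct nearest-neighbor average (rather than a kernel estimate over a discretized region) is sketched below. The replay-buffer setting, the value of k, and the synthetic targets are illustrative assumptions, not the paper's offline/online algorithms.

```python
# Sketch: direct k-nearest-neighbor estimation of Q(s, a) from stored transitions.
import numpy as np

class NearestNeighborQ:
    def __init__(self, n_actions, k=5):
        self.k = k
        self.data = [([], []) for _ in range(n_actions)]  # per-action (states, targets)

    def add(self, state, action, target):
        self.data[action][0].append(np.asarray(state, dtype=float))
        self.data[action][1].append(float(target))

    def q_value(self, state, action):
        states, targets = self.data[action]
        if not states:
            return 0.0
        dists = np.linalg.norm(np.asarray(states) - np.asarray(state), axis=1)
        nearest = np.argsort(dists)[: self.k]          # direct nearest neighbors
        return float(np.mean(np.asarray(targets)[nearest]))

rng = np.random.default_rng(1)
q = NearestNeighborQ(n_actions=2, k=5)
for _ in range(200):                                   # fake one-step targets
    s = rng.uniform(size=2)
    q.add(s, int(rng.integers(2)), target=s.sum())
print(round(q.q_value([0.5, 0.5], action=0), 2))       # roughly 1.0
```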

Efficient neural supersampling on a novel gaming dataset

  • paper_url: http://arxiv.org/abs/2308.01483
  • repo_url: None
  • paper_authors: Antoine Mercier, Ruan Erasmus, Yashesh Savani, Manik Dhingra, Fatih Porikli, Guillaume Berger
  • for: 提高游戏视频的实时渲染环境,以提高分辨率、帧率和光学真实性。
  • methods: 使用神经网络算法进行超采样,以提高 supersampling 的效率,并引入了一个新的数据集,其中包含运动向量和深度等辅助模态,可用于衡量该领域的进展并推动 supersampling 技术的前沿。
  • results: 与传统方法相比,该算法可以提高supersampling的效率4倍,保持同等准确性。新的数据集填补了现有数据集的缺失,可以作为衡量领域进步的 valuable resource。
    Abstract Real-time rendering for video games has become increasingly challenging due to the need for higher resolutions, framerates and photorealism. Supersampling has emerged as an effective solution to address this challenge. Our work introduces a novel neural algorithm for supersampling rendered content that is 4 times more efficient than existing methods while maintaining the same level of accuracy. Additionally, we introduce a new dataset which provides auxiliary modalities such as motion vectors and depth generated using graphics rendering features like viewport jittering and mipmap biasing at different resolutions. We believe that this dataset fills a gap in the current dataset landscape and can serve as a valuable resource to help measure progress in the field and advance the state-of-the-art in super-resolution techniques for gaming content.
    摘要 现代游戏实时渲染技术面临着高分辨率、高帧率和真实感等需求的挑战。聚合抽象技术已成为解决这些挑战的有效方法。我们的工作提出了一种新的神经网络算法,可以实现对游戏内容的聚合抽象,比现有方法高效4倍,同时保持同等准确性。此外,我们还提供了一个新的数据集,包括视频游戏中的运动向量和深度信息,这些信息是通过视窗摇摆和miplevel偏移等图形渲染特性生成的。我们认为这个数据集填补了当前数据领域的空白,可以作为评估进步和推动领域的state-of-the-art技术的 valuable resource。

Online covariance estimation for stochastic gradient descent under Markovian sampling

  • paper_url: http://arxiv.org/abs/2308.01481
  • repo_url: None
  • paper_authors: Abhishek Roy, Krishnakumar Balasubramanian
  • for: The paper studies the convergence rate of the online overlapping batch-means covariance estimator for Stochastic Gradient Descent (SGD) under Markovian sampling.
  • methods: The paper analyzes the overlapping batch-means covariance estimator under both state-dependent and state-independent Markovian sampling to establish its convergence rate.
  • results: The convergence rates of the covariance estimator are $O\big(\sqrt{d}\,n^{-1/8}(\log n)^{1/4}\big)$ and $O\big(\sqrt{d}\,n^{-1/8}\big)$ under state-dependent and state-independent Markovian sampling, respectively, with $d$ the dimensionality and $n$ the number of observations or SGD iterations. These rates match, up to logarithmic factors, the best-known convergence rate previously established for the independent and identically distributed (i.i.d.) case. The paper also establishes the convergence rate of the first four moments of the $\ell_2$ norm of the SGD error under state-dependent Markovian data.
    Abstract We study the online overlapping batch-means covariance estimator for Stochastic Gradient Descent (SGD) under Markovian sampling. We show that the convergence rates of the covariance estimator are $O\big(\sqrt{d}\,n^{-1/8}(\log n)^{1/4}\big)$ and $O\big(\sqrt{d}\,n^{-1/8}\big)$ under state-dependent and state-independent Markovian sampling, respectively, with $d$ representing dimensionality and $n$ denoting the number of observations or SGD iterations. Remarkably, these rates match the best-known convergence rate previously established for the independent and identically distributed ($\iid$) case by \cite{zhu2021online}, up to logarithmic factors. Our analysis overcomes significant challenges that arise due to Markovian sampling, leading to the introduction of additional error terms and complex dependencies between the blocks of the batch-means covariance estimator. Moreover, we establish the convergence rate for the first four moments of the $\ell_2$ norm of the error of SGD dynamics under state-dependent Markovian data, which holds potential interest as an independent result. To validate our theoretical findings, we provide numerical illustrations to derive confidence intervals for SGD when training linear and logistic regression models under Markovian sampling. Additionally, we apply our approach to tackle the intriguing problem of strategic classification with logistic regression, where adversaries can adaptively modify features during the training process to increase their chances of being classified in a specific target class.
    摘要 我们研究 Markovian 采样下随机梯度下降(SGD)的在线重叠批均值(overlapping batch-means)协方差估计器。我们证明,在状态相关与状态无关的 Markovian 采样下,该协方差估计器的收敛速率分别为 $O\big(\sqrt{d}\,n^{-1/8}(\log n)^{1/4}\big)$ 和 $O\big(\sqrt{d}\,n^{-1/8}\big)$,其中 $d$ 表示维度,$n$ 表示观测数或 SGD 迭代次数。值得注意的是,这些速率在对数因子范围内与此前针对独立同分布(i.i.d.)情形建立的最优已知速率相一致。我们的分析克服了 Markovian 采样带来的重大挑战,这些挑战导致批均值协方差估计器的各块之间出现额外误差项和复杂的依赖关系。此外,我们还建立了状态相关 Markovian 数据下 SGD 误差的 $\ell_2$ 范数前四阶矩的收敛速率,该结果本身也可能具有独立的价值。为验证理论结果,我们给出了在 Markovian 采样下训练线性回归和逻辑回归模型时为 SGD 构造置信区间的数值示例,并将该方法应用于带有逻辑回归的策略性分类问题,其中对手可以在训练过程中自适应地修改特征,以提高其被归入特定目标类别的机会。
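
The overlapping batch-means construction for estimating the limiting covariance of SGD iterates can be sketched as below; the batch length, the centering by the overall mean, and the scaling follow the usual batch-means recipe and are illustrative assumptions rather than the paper's exact estimator.

```python
# Sketch of an overlapping batch-means covariance estimate from SGD iterates.
import numpy as np

def overlapping_batch_means_cov(iterates, batch_len):
    """iterates: (n, d) SGD iterates; returns a (d, d) covariance estimate."""
    n, d = iterates.shape
    overall_mean = iterates.mean(axis=0)
    m = n - batch_len + 1                      # number of overlapping batches
    cov = np.zeros((d, d))
    for start in range(m):
        bmean = iterates[start:start + batch_len].mean(axis=0)
        diff = bmean - overall_mean
        cov += np.outer(diff, diff)
    return batch_len * cov / m

# Toy SGD run on a least-squares problem to produce iterates
rng = np.random.default_rng(0)
d, theta, trace = 3, np.zeros(3), []
for t in range(1, 5001):
    x = rng.normal(size=d)
    y = x @ np.ones(d) + rng.normal()
    theta -= 0.1 * t ** -0.51 * (theta @ x - y) * x   # Robbins-Monro step size
    trace.append(theta.copy())
Sigma_hat = overlapping_batch_means_cov(np.array(trace), batch_len=200)
print(Sigma_hat.shape)  # (3, 3) plug-in covariance for confidence intervals
```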

Interpretable Machine Learning for Discovery: Statistical Challenges & Opportunities

  • paper_url: http://arxiv.org/abs/2308.01475
  • repo_url: None
  • paper_authors: Genevera I. Allen, Luqin Gan, Lili Zheng
  • for: 本文主要探讨了可解释机器学习(Interpretable Machine Learning)的应用,即它们如何为我们提供人类可理解的结论和发现。
  • methods: 本文主要介绍了使用可解释机器学习技术进行大数据处理、可视化和预测,以及如何通过这些技术来获得新的科学发现。
  • results: 本文详细介绍了使用可解释机器学习技术在监督与无监督学习中做出的发现,以及如何验证这些发现的有效性和可靠性。
    Abstract New technologies have led to vast troves of large and complex datasets across many scientific domains and industries. People routinely use machine learning techniques to not only process, visualize, and make predictions from this big data, but also to make data-driven discoveries. These discoveries are often made using Interpretable Machine Learning, or machine learning models and techniques that yield human understandable insights. In this paper, we discuss and review the field of interpretable machine learning, focusing especially on the techniques as they are often employed to generate new knowledge or make discoveries from large data sets. We outline the types of discoveries that can be made using Interpretable Machine Learning in both supervised and unsupervised settings. Additionally, we focus on the grand challenge of how to validate these discoveries in a data-driven manner, which promotes trust in machine learning systems and reproducibility in science. We discuss validation from both a practical perspective, reviewing approaches based on data-splitting and stability, as well as from a theoretical perspective, reviewing statistical results on model selection consistency and uncertainty quantification via statistical inference. Finally, we conclude by highlighting open challenges in using interpretable machine learning techniques to make discoveries, including gaps between theory and practice for validating data-driven-discoveries.
    摘要 新技术导致科学领域和产业中大量复杂数据的出现,人们常用机器学习技术不仅处理、可见化和预测这些大数据,还用以获得数据驱动发现。这些发现通常使用可解释机器学习,即机器学习模型和技术,获得人类可理解的发现。在这篇论文中,我们讨论了可解释机器学习的场景,特别是在大数据集上使用这些技术进行新的发现。我们列举了在supervised和unsupervised Setting下使用可解释机器学习实现的发现类型。此外,我们关注了如何在数据驱动下验证这些发现,以便提高机器学习系统的信任和科学研究的重复性。我们从实践和理论两个角度来验证这些发现,包括数据分割和稳定性的方法,以及统计学结果的模型选择一致性和不确定性评估。最后,我们结束时强调了在使用可解释机器学习技术进行发现时存在的开放挑战,包括数据驱动发现的验证和理论与实践之间的差距。
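
One concrete instance of the validation-by-stability idea discussed above is to keep only the features that are repeatedly selected across random data splits. The lasso selector, the number of resamples, and the 0.7 selection-frequency threshold below are illustrative assumptions.

```python
# Sketch: validating feature "discoveries" by their stability across data splits.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)    # only features 0 and 1 matter

n_resamples, counts = 50, np.zeros(d)
for _ in range(n_resamples):
    idx = rng.choice(n, size=n // 2, replace=False)   # random half of the data
    coef = Lasso(alpha=0.1).fit(X[idx], y[idx]).coef_
    counts += (np.abs(coef) > 1e-8)

selection_freq = counts / n_resamples
stable = np.where(selection_freq >= 0.7)[0]           # treated as trustworthy discoveries
print("selection frequencies:", np.round(selection_freq, 2))
print("stable features:", stable)                     # typically [0, 1]
```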

Reverse Stable Diffusion: What prompt was used to generate this image?

  • paper_url: http://arxiv.org/abs/2308.01472
  • repo_url: None
  • paper_authors: Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah
  • for: 这篇论文的目的是提出一个新的文本描述预测任务,并使用不同的白盒和黑盒模型来解决这个任务。
  • methods: 本文使用联合的提示回归与多标签词汇分类目标函数来生成改进的文本提示;此外,还采用了课程学习流程以及一种无监督领域自适应核学习方法来进一步改进该方法。
  • results: 本文的实验结果显示,使用 proposed 的学习框架可以在 Stable Diffusion 生成的图像上预测文本描述,并且在 white-box 模型上获得最高的改进。此外,文中还发现了一个有趣的发现:将 diffusion 模型直接用于文本与图像生成任务可以让模型生成更加适合的图像,对应于输入文本的描述。
    Abstract Text-to-image diffusion models such as Stable Diffusion have recently attracted the interest of many researchers, and inverting the diffusion process can play an important role in better understanding the generative process and how to engineer prompts in order to obtain the desired images. To this end, we introduce the new task of predicting the text prompt given an image generated by a generative diffusion model. We combine a series of white-box and black-box models (with and without access to the weights of the diffusion network) to deal with the proposed task. We propose a novel learning framework comprising of a joint prompt regression and multi-label vocabulary classification objective that generates improved prompts. To further improve our method, we employ a curriculum learning procedure that promotes the learning of image-prompt pairs with lower labeling noise (i.e. that are better aligned), and an unsupervised domain-adaptive kernel learning method that uses the similarities between samples in the source and target domains as extra features. We conduct experiments on the DiffusionDB data set, predicting text prompts from images generated by Stable Diffusion. Our novel learning framework produces excellent results on the aforementioned task, yielding the highest gains when applied on the white-box model. In addition, we make an interesting discovery: training a diffusion model on the prompt generation task can make the model generate images that are much better aligned with the input prompts, when the model is directly reused for text-to-image generation.
    摘要 文本到图像扩散模型,如稳定扩散,在最近吸引了许多研究者的关注,而逆扩散过程可以更好地理解生成过程并如何引入提示以获得所需的图像。为此,我们介绍了预测由生成扩散模型生成的图像中的文本提示的新任务。我们结合了白盒和黑盒模型(具有或无Diffusion网络的权重)来处理该任务。我们提出了一种新的学习框架,包括文本提示 regression和多标签词汇分类目标,可以生成改进的提示。为了进一步改进我们的方法,我们使用了课程学习程序,该程序将优先采用低噪音(即更好地对齐)的图像-提示对来学习。此外,我们还使用了无监督领域适应器,使用源和目标领域样本之间的相似性为额外特征。我们在DiffusionDB数据集上进行实验,预测由稳定扩散生成的图像中的文本提示。我们的新学习框架在该任务上获得了出色的结果,特别是在白盒模型上实现了最高的提升。此外,我们还发现了一个有趣的发现:在直接将扩散模型用于文本到图像生成任务的训练过程中,模型可以生成与输入提示更好地对齐的图像。
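
The joint objective described in the methods bullet, a prompt-embedding regression term plus a multi-label vocabulary classification term, can be sketched as a combined loss on top of an image feature extractor. The feature-extractor stand-in, embedding size, vocabulary size, and equal weighting are illustrative assumptions.

```python
# Sketch of a joint prompt-regression + multi-label vocabulary classification loss.
import torch
import torch.nn as nn

class PromptPredictor(nn.Module):
    def __init__(self, feat_dim=512, prompt_dim=768, vocab_size=5000):
        super().__init__()
        self.backbone = nn.Linear(feat_dim, 256)       # stand-in for an image encoder
        self.regress = nn.Linear(256, prompt_dim)      # predict the prompt text embedding
        self.classify = nn.Linear(256, vocab_size)     # predict which vocabulary words occur

    def forward(self, img_feats):
        h = torch.relu(self.backbone(img_feats))
        return self.regress(h), self.classify(h)

model = PromptPredictor()
img_feats = torch.randn(8, 512)
target_emb = torch.randn(8, 768)                       # embedding of the true prompt
target_words = torch.randint(0, 2, (8, 5000)).float()  # multi-hot vocabulary labels

pred_emb, word_logits = model(img_feats)
loss = (1 - nn.functional.cosine_similarity(pred_emb, target_emb).mean()) \
       + nn.functional.binary_cross_entropy_with_logits(word_logits, target_words)
loss.backward()
print(float(loss))
```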

Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving

  • paper_url: http://arxiv.org/abs/2308.01471
  • repo_url: None
  • paper_authors: Ben Agro, Quinlan Sykora, Sergio Casas, Raquel Urtasun
  • for: 本研究旨在开发一种能够同时捕捉围绕自动驾驶车辆(SDV)运行的环境和未来行为预测的方法。
  • methods: 我们的方法是一种混合了物体检测和未来行为预测的方法,通过一个单一神经网络来减少计算量,同时提高预测的准确性。
  • results: 实验结果表明,我们的方法在城市和高速公路环境中均优于当前最先进方法,并且避免了对可能不会被运动规划器查询的区域进行不必要的预测,从而节省计算资源。更多信息请访问项目网站:https://waabi.ai/research/implicito。
    Abstract A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants. Existing works either perform object detection followed by trajectory forecasting of the detected objects, or predict dense occupancy and flow grids for the whole scene. The former poses a safety concern as the number of detections needs to be kept low for efficiency reasons, sacrificing object recall. The latter is computationally expensive due to the high-dimensionality of the output grid, and suffers from the limited receptive field inherent to fully convolutional networks. Furthermore, both approaches employ many computational resources predicting areas or objects that might never be queried by the motion planner. This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network. Our method avoids unnecessary computation, as it can be directly queried by the motion planner at continuous spatio-temporal locations. Moreover, we design an architecture that overcomes the limited receptive field of previous explicit occupancy prediction methods by adding an efficient yet effective global attention mechanism. Through extensive experiments in both urban and highway settings, we demonstrate that our implicit model outperforms the current state-of-the-art. For more information, visit the project website: https://waabi.ai/research/implicito.
    摘要 一种自驾车(SDV)必须能够感知它所处的环境和预测其他交通参与者的未来行为。现有的方法分别执行物体探测然后预测检测到的物体的轨迹,或预测整个场景的稠密占用和流动Grid。前者存在安全隐患,因为需要保持检测数量低,以保证效率,同时牺牲物体回归率。后者因高维度输出网络而 computationally expensive,并且受到限制的受感网络的有限观察范围的影响。此外,两种方法都需要大量计算资源预测可能不会被动控制器询问的区域或物体。这种驱动我们的协调感知和未来预测方法,该方法可以直接被动控制器询问,并且避免了不必要的计算。此外,我们还设计了一种高效但有效的全局注意力机制,以超越过去的显式占用预测方法的有限观察范围。通过在城市和高速公路上进行了广泛的实验,我们证明了我们的隐式模型可以比现状之最。更多信息,请访问我们的项目网站:https://waabi.ai/research/implicito。
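
The central idea, querying occupancy and flow implicitly at continuous spatio-temporal locations, can be sketched as a small decoder that maps a scene feature plus an (x, y, t) query to an occupancy probability and a 2-D flow vector. The feature sizes and the simple concatenation below are illustrative assumptions (the paper additionally uses an efficient global attention mechanism).

```python
# Sketch of an implicit occupancy/flow decoder queried at continuous (x, y, t).
import torch
import torch.nn as nn

class ImplicitOccFlowDecoder(nn.Module):
    def __init__(self, scene_dim=128, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(scene_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),               # [occupancy logit, flow_x, flow_y]
        )

    def forward(self, scene_feat, queries):
        # scene_feat: (B, scene_dim); queries: (B, Q, 3) continuous (x, y, t) points
        feat = scene_feat.unsqueeze(1).expand(-1, queries.shape[1], -1)
        out = self.mlp(torch.cat([feat, queries], dim=-1))
        occupancy = torch.sigmoid(out[..., 0])  # probability the point is occupied at time t
        flow = out[..., 1:]                     # predicted 2-D motion at that point
        return occupancy, flow

decoder = ImplicitOccFlowDecoder()
occ, flow = decoder(torch.randn(2, 128), torch.rand(2, 100, 3))
print(occ.shape, flow.shape)  # torch.Size([2, 100]) torch.Size([2, 100, 2])
```

Because such a decoder is only evaluated at the locations the motion planner actually asks about, no dense grid has to be produced.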

Training Data Protection with Compositional Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.01937
  • repo_url: None
  • paper_authors: Aditya Golatkar, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto
  • for: 本研究旨在提出一种名为 compartmentalized diffusion models(CDM),可以在推理时将不同的扩散模型(或提示)组合在一起,以实现在所有数据 simultanously 训练的性能。
  • methods: 本方法可以在各个模型之间进行分离训练,即每个模型只需要学习自己所接触到的数据,而不需要掌握所有数据。此外,CDM 还允许在不同的分布和领域上进行不同的训练,并在推理时根据用户的访问权限服务自定义模型。
  • results: CDM 可以实现大规模扩散模型中的选择性忘记和持续学习,以及根据用户的访问权限服务自定义模型。此外,CDM 还可以确定特定样本的数据 subsets 的重要性。
    Abstract We introduce Compartmentalized Diffusion Models (CDM), a method to train different diffusion models (or prompts) on distinct data sources and arbitrarily compose them at inference time. The individual models can be trained in isolation, at different times, and on different distributions and domains and can be later composed to achieve performance comparable to a paragon model trained on all data simultaneously. Furthermore, each model only contains information about the subset of the data it was exposed to during training, enabling several forms of training data protection. In particular, CDMs are the first method to enable both selective forgetting and continual learning for large-scale diffusion models, as well as allowing serving customized models based on the user's access rights. CDMs also allow determining the importance of a subset of the data in generating particular samples.
    摘要 我们介绍了封 compartmentalized Diffusion Models(CDM),这是一种方法来在不同的数据来源上训练不同的扩散模型(或提示),并在推论时进行组合。个别模型可以在专门的时间和环境下进行训练,并且可以在不同的分布和领域上进行训练。在推论时,这些模型可以组合以 дости得相当于一个传统模型,训练在所有数据上。此外,每个模型只包含它在训练时所接触到的子集数据的信息,这使得可以实现多种训练数据保护。特别是,CDMs 是首个允许大规模扩散模型中的选择性遗忘和持续学习,以及根据用户的存取权来提供自定义的模型。CDMs 还允许决定特定数据子集在生成特定样本时的重要性。
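
A simple way to compose separately trained diffusion models at inference time is to mix their noise predictions at every denoising step, which is the flavor of composition the abstract describes at a high level. The uniform weights, the tiny denoisers, and the DDPM-style update below are illustrative assumptions, not the paper's exact compositional rule.

```python
# Sketch: composing per-shard diffusion models at sampling time by mixing
# their noise predictions (weights and update rule are assumptions).
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))
    def forward(self, x, t):
        return self.net(torch.cat([x, t.expand(x.shape[0], 1)], dim=-1))

def composed_sample(models, weights, steps=50, dim=16):
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas_bar = torch.cumprod(1 - betas, dim=0)
    x = torch.randn(4, dim)
    for i in reversed(range(steps)):
        t = torch.tensor([[i / steps]])
        eps = sum(w * m(x, t) for w, m in zip(weights, models))  # mix compartment predictions
        x = (x - betas[i] / torch.sqrt(1 - alphas_bar[i]) * eps) / torch.sqrt(1 - betas[i])
        if i > 0:
            x = x + torch.sqrt(betas[i]) * torch.randn_like(x)
    return x

models = [TinyDenoiser(), TinyDenoiser()]      # e.g. trained on disjoint data shards
samples = composed_sample(models, weights=[0.5, 0.5])
print(samples.shape)  # torch.Size([4, 16])
```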

Dual Governance: The intersection of centralized regulation and crowdsourced safety mechanisms for Generative AI

  • paper_url: http://arxiv.org/abs/2308.04448
  • repo_url: None
  • paper_authors: Avijit Ghosh, Dhanya Lakshmi
  • for: 这篇论文的目的是提出一个名为“双重管理”的框架,以促进创新和创意的发展,同时确保generative AI的安全和道德性。
  • methods: 这篇论文使用了中央政府法规和社区发展的安全机制,以实现双重管理框架的目的。
  • results: 根据论文的描述,这个双重管理框架可以实现创新和创意的发展,同时确保generative AI的安全和道德性。
    Abstract Generative Artificial Intelligence (AI) has seen mainstream adoption lately, especially in the form of consumer-facing, open-ended, text and image generating models. However, the use of such systems raises significant ethical and safety concerns, including privacy violations, misinformation and intellectual property theft. The potential for generative AI to displace human creativity and livelihoods has also been under intense scrutiny. To mitigate these risks, there is an urgent need of policies and regulations responsible and ethical development in the field of generative AI. Existing and proposed centralized regulations by governments to rein in AI face criticisms such as not having sufficient clarity or uniformity, lack of interoperability across lines of jurisdictions, restricting innovation, and hindering free market competition. Decentralized protections via crowdsourced safety tools and mechanisms are a potential alternative. However, they have clear deficiencies in terms of lack of adequacy of oversight and difficulty of enforcement of ethical and safety standards, and are thus not enough by themselves as a regulation mechanism. We propose a marriage of these two strategies via a framework we call Dual Governance. This framework proposes a cooperative synergy between centralized government regulations in a U.S. specific context and safety mechanisms developed by the community to protect stakeholders from the harms of generative AI. By implementing the Dual Governance framework, we posit that innovation and creativity can be promoted while ensuring safe and ethical deployment of generative AI.
    摘要 生成人工智能(AI)在最近几年内得到了广泛的推广,尤其是在形式为consumer-facing、开放结束的文本和图像生成模型。然而,使用这些系统的使用带来了重要的伦理和安全问题,包括隐私侵犯、谣言和知识产权侵犯。生成AI的潜在性取代人类创造力和生活方式也在严格审查。为了缓解这些风险,有一个紧迫需要的政策和法规,负责able和伦理的开发在生成AI领域。现有和提议的中央政府法规,如政府的执法,受到批评,包括缺乏清晰性和一致性、跨行政区域不兼容性、限制创新和妨碍自由市场竞争。 Decentralized保护via Crowdsourced safety工具和机制是一个可能的代替方案。然而,它们缺乏伦理和安全标准的可靠监管和执法能力,因此不够作为唯一的规章机制。我们提出了一种名为“双重管理”的框架,该框架提议在美国特定的上下文中,中央政府法规和社区开发的安全机制之间建立合作协同关系,以促进创新和伦理的投入,同时确保生成AI的安全和伦理部署。

  • paper_url: http://arxiv.org/abs/2308.01469
  • repo_url: None
  • paper_authors: Ruyi Ding, Shijin Duan, Xiaolin Xu, Yunsi Fei
  • for: 保护图structured数据中的链接信息,防止敏感信息泄露。
  • methods: 提出了一种新的图投毒(graph poisoning)攻击方法 VertexSerum,通过放大链接连通性泄露来提高链接窃取的效果;并提出了一种可嵌入链接检测网络中的注意力机制,以更准确地推断节点邻接关系。
  • results: 在四个真实世界数据集和三种不同的GNN结构中,VertexSerum显著超过了现有的链接推断攻击方法,平均提高了AUC分数9.8%。此外,我们的实验还表明VertexSerum在黑盒和在线学习设置中都有出色的应用可能性。
    Abstract Graph neural networks (GNNs) have brought superb performance to various applications utilizing graph structural data, such as social analysis and fraud detection. The graph links, e.g., social relationships and transaction history, are sensitive and valuable information, which raises privacy concerns when using GNNs. To exploit these vulnerabilities, we propose VertexSerum, a novel graph poisoning attack that increases the effectiveness of graph link stealing by amplifying the link connectivity leakage. To infer node adjacency more accurately, we propose an attention mechanism that can be embedded into the link detection network. Our experiments demonstrate that VertexSerum significantly outperforms the SOTA link inference attack, improving the AUC scores by an average of $9.8\%$ across four real-world datasets and three different GNN structures. Furthermore, our experiments reveal the effectiveness of VertexSerum in both black-box and online learning settings, further validating its applicability in real-world scenarios.
    摘要

Machine Learning Small Molecule Properties in Drug Discovery

  • paper_url: http://arxiv.org/abs/2308.12354
  • repo_url: None
  • paper_authors: Nikolai Schapin, Maciej Majewski, Alejandro Varela, Carlos Arroniz, Gianni De Fabritiis
  • for: This paper provides a comprehensive overview of various machine learning (ML) methods for predicting small molecule properties in drug discovery, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity).
  • methods: The paper reviews a wide range of ML methods, including neural networks, chemical fingerprints, and graph-based neural networks, and discusses existing popular datasets and molecular descriptors.
  • results: The paper highlights the challenges of predicting and optimizing multiple properties during the hit-to-lead and lead optimization stages of drug discovery, briefly explores possible multi-objective optimization techniques to balance diverse properties while optimizing lead candidates, and assesses techniques for explaining model predictions, especially for critical decision-making in drug discovery.
    Abstract Machine learning (ML) is a promising approach for predicting small molecule properties in drug discovery. Here, we provide a comprehensive overview of various ML methods introduced for this purpose in recent years. We review a wide range of properties, including binding affinities, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity). We discuss existing popular datasets and molecular descriptors and embeddings, such as chemical fingerprints and graph-based neural networks. We highlight also challenges of predicting and optimizing multiple properties during hit-to-lead and lead optimization stages of drug discovery and explore briefly possible multi-objective optimization techniques that can be used to balance diverse properties while optimizing lead candidates. Finally, techniques to provide an understanding of model predictions, especially for critical decision-making in drug discovery are assessed. Overall, this review provides insights into the landscape of ML models for small molecule property predictions in drug discovery. So far, there are multiple diverse approaches, but their performances are often comparable. Neural networks, while more flexible, do not always outperform simpler models. This shows that the availability of high-quality training data remains crucial for training accurate models and there is a need for standardized benchmarks, additional performance metrics, and best practices to enable richer comparisons between the different techniques and models that can shed a better light on the differences between the many techniques.
    摘要 机器学习(ML)是药物发现中预测小分子性质的有前途的方法。本文提供了最近几年内对这种目标的各种机器学习方法的全面概述。我们评论了各种性质,包括结合稳定性、溶解度和ADMET(吸收、分布、代谢、排泄和毒性)。我们讨论了现有的受欢迎数据集和分子特征,如化学指纹和图像基于神经网络。我们 также提到了选择和优化多个属性的挑战,以及可能使用的多目标优化技术来平衡多个属性。最后,我们评估了模型预测结果的方法,特别是在药物发现的关键决策过程中。总的来说,本文提供了药物小分子性质预测机器学习模型的景观,目前有多种不同的方法,但它们的性能经常相当。神经网络,虽然更灵活,并不总是击败简单的模型。这表明数据训练的质量是关键,还需要标准化的 bencmarks、额外的性能指标和最佳实践,以便更好地比较不同的方法和模型,从而更好地了解它们之间的差异。

From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion

  • paper_url: http://arxiv.org/abs/2308.02560
  • repo_url: None
  • paper_authors: Robin San Roman, Yossi Adi, Antoine Deleforge, Romain Serizel, Gabriel Synnaeve, Alexandre Défossez
  • for: 这篇论文旨在提出一种高保真的多频带扩散(diffusion)框架,用于从低比特率的离散表示生成任何类型的音频(如语音、音乐、环境声)。
  • methods: 该方法基于扩散模型,并在多个频带上分别建模。
  • results: 在相同的比特率下,该方法的感知质量优于现有的生成技术。
    Abstract Deep generative models can generate high-fidelity audio conditioned on various types of representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)). Recently, such models have been used to synthesize audio waveforms conditioned on highly compressed representations. Although such methods produce impressive results, they are prone to generate audible artifacts when the conditioning is flawed or imperfect. An alternative modeling approach is to use diffusion models. However, these have mainly been used as speech vocoders (i.e., conditioned on mel-spectrograms) or generating relatively low sampling rate signals. In this work, we propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality (e.g., speech, music, environmental sounds) from low-bitrate discrete representations. At equal bit rate, the proposed approach outperforms state-of-the-art generative techniques in terms of perceptual quality. Training and, evaluation code, along with audio samples, are available on the facebookresearch/audiocraft Github page.
    摘要 深度生成模型可以生成高质量音频,受到不同类型的表示(例如:mel-spectrograms、Mel-frequency Cepstral Coefficients (MFCC))的控制。近期,这些模型被用来synthesize音波形态,受到高度压缩的表示的控制。虽然这些方法可以生成印象深刻的结果,但它们容易产生杂音artefacts,当控制是不完整或有误的时。一种alternative的模型方法是使用扩散模型。然而,这些模型主要用于speech vocoder(受到mel-spectrograms的控制)或生成低频率的信号。在这个工作中,我们提议一种高质量多频段扩散基础框架,可以从low-bitrate discrete表示生成任何类型的音频模式(例如:speech、音乐、环境声)。在相同的比特率下,我们的提议方法在perceptual质量上超过了现状的生成技术。训练和评估代码,以及音频样本,可以在facebookresearch/audiocraft GitHub页面上获取。
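
The "multi-band" part of the framework, generating the signal separately in several frequency bands and summing the results, can be illustrated with basic band splitting. The band edges and Butterworth filters are assumptions, and the identity per-band "processing" stands in for the per-band diffusion decoder used in the paper.

```python
# Sketch: split audio into complementary frequency bands, process each band, resum.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def split_into_bands(x, fs, edges=(1500.0, 4000.0)):
    sos_low = butter(6, edges[0], btype="lowpass", fs=fs, output="sos")
    sos_mid = butter(6, list(edges), btype="bandpass", fs=fs, output="sos")
    sos_high = butter(6, edges[1], btype="highpass", fs=fs, output="sos")
    return [sosfiltfilt(s, x) for s in (sos_low, sos_mid, sos_high)]

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

bands = split_into_bands(x, fs)
processed = bands                           # placeholder for a per-band generative model
reconstruction = np.sum(processed, axis=0)  # bands are recombined by summation
print(len(bands), reconstruction.shape)     # 3 bands, same length as the input
```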

A digital twin framework for civil engineering structures

  • paper_url: http://arxiv.org/abs/2308.01445
  • repo_url: None
  • paper_authors: Matteo Torzoni, Marco Tezzele, Stefano Mariani, Andrea Manzoni, Karen E. Willcox
  • for: 这项研究旨在提出一种预测性的数字双方法,用于监测、维护和管理 civil engineering 系统的健康状况。
  • methods: 该方法使用 probabilistic graphical model 编码 asset-twin 相关系统,并使用动态 Bayesian network 模型来处理时间重复的观察数据。深度学习模型用于提供实时结构健康诊断。
  • results: 研究人员通过在 synthetic 案例中使用reduced-order numerical model 计算健康依赖控制策略,并通过 dynamically updating digital twin 状态来实现智能决策。两个synthetic案例(一个悬臂 beam 和一个铁路桥)证明了该方法的动态决策能力。
    Abstract The digital twin concept represents an appealing opportunity to advance condition-based and predictive maintenance paradigms for civil engineering systems, thus allowing reduced lifecycle costs, increased system safety, and increased system availability. This work proposes a predictive digital twin approach to the health monitoring, maintenance, and management planning of civil engineering structures. The asset-twin coupled dynamical system is encoded employing a probabilistic graphical model, which allows all relevant sources of uncertainty to be taken into account. In particular, the time-repeating observations-to-decisions flow is modeled using a dynamic Bayesian network. Real-time structural health diagnostics are provided by assimilating sensed data with deep learning models. The digital twin state is continually updated in a sequential Bayesian inference fashion. This is then exploited to inform the optimal planning of maintenance and management actions within a dynamic decision-making framework. A preliminary offline phase involves the population of training datasets through a reduced-order numerical model and the computation of a health-dependent control policy. The strategy is assessed on two synthetic case studies, involving a cantilever beam and a railway bridge, demonstrating the dynamic decision-making capabilities of health-aware digital twins.
    摘要 “数字双胞体概念可能为公共工程系统维护和预测维护方面提供一个吸引人的机遇,以降低系统成本、提高系统安全性和提高系统可用性。这项工作提议一种预测性数字双胞体方法,用于监测、维护和管理计划 civil engineering 结构。Asset-twin 相关的动态系统通过 probabilistic graphical model 编码,其中包括所有相关的不确定因素。具体来说,时间重复的观察数据流使用动态 Bayesian network 进行模型化。通过嵌入感知数据的深度学习模型,实时执行结构健康诊断。数字双胞体状态通过顺序 Bayesian 推理方式不断更新。这些信息最后用于在动态决策框架中决策维护和管理活动的优化。在一个先进的离线阶段,通过减少的数值模型和计算健康控制策略,人工数据被填充到训练集中。这种策略在两个 sintetic 案例中,包括一个悬臂 beam 和一个铁路桥,展示了健康意识数字双胞体的动态决策能力。”
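
The assimilation loop described above, sequentially updating a digital-twin health state from noisy diagnostics with a dynamic Bayesian network, can be illustrated with a tiny discrete-state filter. The two health states, the transition matrix, and the observation likelihoods are illustrative assumptions.

```python
# Sketch: sequential Bayesian update of a discrete structural-health state.
import numpy as np

states = ["healthy", "damaged"]
transition = np.array([[0.97, 0.03],   # P(next state | current state)
                       [0.00, 1.00]])  # damage assumed irreversible here
p_anomaly = np.array([0.10, 0.80])     # P(sensor flags an anomaly | state)

def step(belief, observed_anomaly):
    predicted = transition.T @ belief                       # prediction (time update)
    likelihood = p_anomaly if observed_anomaly else 1 - p_anomaly
    posterior = likelihood * predicted                      # measurement update
    return posterior / posterior.sum()

belief = np.array([0.99, 0.01])
for obs in [False, False, True, True, True]:                # stream of diagnostics
    belief = step(belief, obs)
    print({s: round(float(p), 3) for s, p in zip(states, belief)})
# Belief mass shifts toward "damaged" as anomalies accumulate; in the framework above
# this evolving twin state is what feeds the maintenance-planning policy.
```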

DLSIA: Deep Learning for Scientific Image Analysis

  • paper_url: http://arxiv.org/abs/2308.02559
  • repo_url: None
  • paper_authors: Eric J Roberts, Tanny Chavez, Alexander Hexemer, Petrus H. Zwart
  • for: 这篇论文是为了推广Python基于深度学习库DLSIA,帮助科学家和研究人员在多种科学领域使用自定义卷积神经网络架构进行图像分析任务,以便在下游数据处理或实验循环计算 scenarios 中使用。
  • methods: 该论文使用了易于使用的架构,如自动Encoder、可调U-Net和精简的混合缩放神经网络(MSDNet),以及Random graphs和稀疏连接生成的稀疏混合缩放神经网络(SMSNet)。
  • results: 随着实验数据的规模和复杂度不断增长,DLSIA 提供了易于构建、可定制的 CNN 架构,并对 CNN 的复杂性进行了抽象,帮助科学家按需调整机器学习方法、加速发现、促进跨学科合作,并推进科学图像分析研究。
    Abstract We introduce DLSIA (Deep Learning for Scientific Image Analysis), a Python-based machine learning library that empowers scientists and researchers across diverse scientific domains with a range of customizable convolutional neural network (CNN) architectures for a wide variety of tasks in image analysis to be used in downstream data processing, or for experiment-in-the-loop computing scenarios. DLSIA features easy-to-use architectures such as autoencoders, tunable U-Nets, and parameter-lean mixed-scale dense networks (MSDNets). Additionally, we introduce sparse mixed-scale networks (SMSNets), generated using random graphs and sparse connections. As experimental data continues to grow in scale and complexity, DLSIA provides accessible CNN construction and abstracts CNN complexities, allowing scientists to tailor their machine learning approaches, accelerate discoveries, foster interdisciplinary collaboration, and advance research in scientific image analysis.
    摘要 我们介绍DLSIA(深度学习 для科学影像分析),一个基于Python的机器学习库,它为科学家和研究人员提供了许多可自定义的卷积神经网络架构,用于广泛的影像分析任务,包括下游处理和实验运行 Computing enario。DLSIA 提供了易于使用的架构,例如自动编码器、可调 U-Net 和对�如� mixed-scale dense network (MSNet)。此外,我们还引入了随机 graphs 和罕见 Connection 的 sparse mixed-scale network (SMSNet)。随着实验数据的数量和复杂度不断增加,DLSIA 提供了可访问的 CNN 建立和抽象 CNN 复杂度,让科学家可以根据自己的机器学习方法,加速发现,促进多学科合作,并进展科学影像分析研究。

Novel Physics-Based Machine-Learning Models for Indoor Air Quality Approximations

  • paper_url: http://arxiv.org/abs/2308.01438
  • repo_url: None
  • paper_authors: Ahmad Mohammadshirazi, Aida Nadafian, Amin Karimi Monsefi, Mohammad H. Rafiei, Rajiv Ramnath
  • for: 这项研究旨在提供准确的室内空气质量估计方法,从而营造健康的室内环境、优化相关能源消耗并提升人体舒适度。
  • methods: 该研究提出了六种新的基于物理的机器学习模型,结合了状态空间概念、门控循环单元(GRU)和分解技术。
  • results: 结果表明,所提模型比类似的最先进 Transformer 模型更简单、计算效率更高,并且能够更好地捕捉室内空气质量数据中高度非线性的模式。
    Abstract Cost-effective sensors are capable of real-time capturing a variety of air quality-related modalities from different pollutant concentrations to indoor/outdoor humidity and temperature. Machine learning (ML) models are capable of performing air-quality "ahead-of-time" approximations. Undoubtedly, accurate indoor air quality approximation significantly helps provide a healthy indoor environment, optimize associated energy consumption, and offer human comfort. However, it is crucial to design an ML architecture to capture the domain knowledge, so-called problem physics. In this study, we propose six novel physics-based ML models for accurate indoor pollutant concentration approximations. The proposed models include an adroit combination of state-space concepts in physics, Gated Recurrent Units, and Decomposition techniques. The proposed models were illustrated using data collected from five offices in a commercial building in California. The proposed models are shown to be less complex, computationally more efficient, and more accurate than similar state-of-the-art transformer-based models. The superiority of the proposed models is due to their relatively light architecture (computational efficiency) and, more importantly, their ability to capture the underlying highly nonlinear patterns embedded in the often contaminated sensor-collected indoor air quality temporal data.
    摘要
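
A stripped-down version of the state-space/GRU/decomposition combination mentioned in the entry above, splitting a pollutant series into a slow trend plus a residual and letting a GRU forecast the residual, is sketched below. The moving-average decomposition, network sizes, and one-step-ahead setup are illustrative assumptions rather than any of the paper's six models.

```python
# Sketch: decompose an indoor-pollutant series into trend + residual and
# forecast the residual with a GRU.
import torch
import torch.nn as nn

def moving_average(x, window=12):
    kernel = torch.ones(1, 1, window) / window
    smoothed = nn.functional.conv1d(x.view(1, 1, -1), kernel, padding=window // 2)
    return smoothed.view(-1)[: x.numel()]

class ResidualGRU(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)
    def forward(self, residual_seq):                  # (batch, time, 1)
        h, _ = self.gru(residual_seq)
        return self.out(h[:, -1])                     # one-step-ahead residual forecast

series = torch.sin(torch.linspace(0, 20, 500)) + 0.1 * torch.randn(500)  # fake CO2-like data
trend = moving_average(series)
residual = series - trend

model = ResidualGRU()
pred_residual = model(residual[:-1].view(1, -1, 1))
forecast = trend[-1] + pred_residual                  # recombine trend and predicted residual
print(forecast.shape)  # torch.Size([1, 1])
```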

Price-Aware Deep Learning for Electricity Markets

  • paper_url: http://arxiv.org/abs/2308.01436
  • repo_url: None
  • paper_authors: Vladimir Dvorkin, Ferdinando Fioretto
  • for: 本文旨在探讨深度学习在运维规划中的应用,以及深度学习中的预测错误如何影响电力价格。
  • methods: 本文使用了深度学习模型来预测电力供应和需求,并通过嵌入电力市场清算优化层来提高公平性。
  • results: 研究发现,通过嵌入电力市场清算优化层可以减少预测错误对电力价格的影响,同时可以控制系统内价格错误的空间分布。
    Abstract While deep learning gradually penetrates operational planning, its inherent prediction errors may significantly affect electricity prices. This letter examines how prediction errors propagate into electricity prices, revealing notable pricing errors and their spatial disparity in congested power systems. To improve fairness, we propose to embed electricity market-clearing optimization as a deep learning layer. Differentiating through this layer allows for balancing between prediction and pricing errors, as oppose to minimizing prediction errors alone. This layer implicitly optimizes fairness and controls the spatial distribution of price errors across the system. We showcase the price-aware deep learning in the nexus of wind power forecasting and short-term electricity market clearing.
    摘要 深度学习正逐渐渗透到运行规划中,但其固有的预测误差可能显著影响电价。这封短文考察了预测误差如何传导至电价,揭示了拥堵电力系统中显著的定价误差及其空间差异。为提高公平性,我们提议将电力市场出清优化作为一个深度学习层嵌入模型。通过对该层求导,可以在预测误差与定价误差之间取得平衡,而不仅仅是最小化预测误差。该层还隐式地优化公平性,并控制定价误差在系统中的空间分布。我们在风电预测与短期电力市场出清相结合的场景中展示了这种价格感知的深度学习。
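
One way to embed a market-clearing (economic dispatch) problem as a differentiable layer, so a forecaster can be trained against its downstream dispatch impact rather than plain forecast error, is a differentiable convex-optimization layer. The toy generator data, the use of cvxpylayers, and the dispatch-based loss below are assumptions; the prices themselves are the duals of the balance constraint and are not returned by this sketch.

```python
# Sketch: a differentiable economic-dispatch layer downstream of a demand forecast.
import cvxpy as cp
import torch
from cvxpylayers.torch import CvxpyLayer

n_gen = 3
g_max = [50.0, 30.0, 20.0]                           # generator capacities (MW)
cost = cp.Parameter(n_gen, nonneg=True)              # marginal costs ($/MWh)
demand = cp.Parameter(1, nonneg=True)                # system demand (MW)
g = cp.Variable(n_gen)

problem = cp.Problem(
    cp.Minimize(cost @ g),
    [g >= 0, g <= g_max, cp.sum(g) == demand[0]],    # balance constraint (its dual is the price)
)
dispatch_layer = CvxpyLayer(problem, parameters=[cost, demand], variables=[g])

costs = torch.tensor([10.0, 20.0, 40.0])
true_demand = torch.tensor([60.0])
forecast = torch.tensor([55.0], requires_grad=True)  # e.g. output of a forecasting network

(g_forecast,) = dispatch_layer(costs, forecast)
(g_true,) = dispatch_layer(costs, true_demand)
loss = torch.sum((g_forecast - g_true) ** 2)         # decision-aware loss, not forecast MSE
loss.backward()                                      # gradients flow through the dispatch problem
print(g_forecast.detach(), forecast.grad)
```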

COVID-VR: A Deep Learning COVID-19 Classification Model Using Volume-Rendered Computer Tomography

  • paper_url: http://arxiv.org/abs/2308.01433
  • repo_url: None
  • paper_authors: Noemi Maritza L. Romero, Ricco Vasconcellos, Mariana R. Mendoza, João L. D. Comba
  • for: 该论文主要目的是提出一种基于Volume Rendering技术的肺疾病分类方法,以提高肺疾病诊断的准确性和效率。
  • methods: 该方法使用了深度学习模型,利用多个视角捕捉的Volume Rendering图像来分类肺疾病。
  • results: 对比于传统的slice-based方法,该方法能够更好地识别肺疾病,并且在比较中表现竞争力强。
    Abstract The COVID-19 pandemic presented numerous challenges to healthcare systems worldwide. Given that lung infections are prevalent among COVID-19 patients, chest Computer Tomography (CT) scans have frequently been utilized as an alternative method for identifying COVID-19 conditions and various other types of pulmonary diseases. Deep learning architectures have emerged to automate the identification of pulmonary disease types by leveraging CT scan slices as inputs for classification models. This paper introduces COVID-VR, a novel approach for classifying pulmonary diseases based on volume rendering images of the lungs captured from multiple angles, thereby providing a comprehensive view of the entire lung in each image. To assess the effectiveness of our proposal, we compared it against competing strategies utilizing both private data obtained from partner hospitals and a publicly available dataset. The results demonstrate that our approach effectively identifies pulmonary lesions and performs competitively when compared to slice-based methods.
    摘要 COVID-19 大流行给全球医疗系统带来了诸多挑战。由于肺部感染在 COVID-19 患者中十分常见,胸部计算机断层扫描(CT)常被用作识别 COVID-19 及其他多种肺部疾病的替代手段。深度学习架构已被用于以 CT 切片作为分类模型的输入,自动识别肺部疾病类型。本文提出 COVID-VR,一种基于从多个角度捕获的肺部体绘制(volume rendering)图像进行肺部疾病分类的新方法,使每张图像都能提供整个肺部的完整视图。为评估该方法的有效性,我们使用合作医院提供的私有数据和一个公开数据集,将其与其他竞争策略进行了比较。结果表明,该方法能够有效识别肺部病变,且与基于切片的方法相比具有竞争力。

Unlocking the Potential of Similarity Matching: Scalability, Supervision and Pre-training

  • paper_url: http://arxiv.org/abs/2308.02427
  • repo_url: None
  • paper_authors: Yanis Bahroun, Shagesh Sridharan, Atithi Acharya, Dmitri B. Chklovskii, Anirvan M. Sengupta
  • for: 这篇论文旨在提出基于局部学习规则、具有生物学合理性的学习框架,以替代存在局限性的反向传播(BP)算法,并兼顾计算效率。
  • methods: 该研究采用以无监督为主的相似性匹配(similarity matching, SM)框架,该框架与生物系统中观察到的机制相符,并提供在线、局部且具有生物学合理性的算法。
  • results: 研究人员给出了卷积非负 SM 的 PyTorch 实现以将 SM 扩展到大规模数据集;引入了一种类似典型相关分析的局部有监督 SM 目标,使 SM 层可以堆叠;并利用该实现对 LeNet 等架构进行预训练,将所得特征与 BP 训练的模型进行了比较。这种将生物学合理性与计算效率相结合的方法为后续探索开辟了多条途径。
    Abstract While effective, the backpropagation (BP) algorithm exhibits limitations in terms of biological plausibility, computational cost, and suitability for online learning. As a result, there has been a growing interest in developing alternative biologically plausible learning approaches that rely on local learning rules. This study focuses on the primarily unsupervised similarity matching (SM) framework, which aligns with observed mechanisms in biological systems and offers online, localized, and biologically plausible algorithms. i) To scale SM to large datasets, we propose an implementation of Convolutional Nonnegative SM using PyTorch. ii) We introduce a localized supervised SM objective reminiscent of canonical correlation analysis, facilitating stacking SM layers. iii) We leverage the PyTorch implementation for pre-training architectures such as LeNet and compare the evaluation of features against BP-trained models. This work combines biologically plausible algorithms with computational efficiency opening multiple avenues for further explorations.
    摘要 反向传播(BP)算法虽然有效,但在生物学合理性、计算成本以及对在线学习的适用性方面存在局限。因此,人们越来越关注依赖局部学习规则、具有生物学合理性的替代学习方法。本研究聚焦于以无监督为主的相似性匹配(SM)框架,该框架与生物系统中观察到的机制相符,并提供在线、局部且具有生物学合理性的算法。i) 为将 SM 扩展到大规模数据集,我们提出了基于 PyTorch 的卷积非负 SM 实现;ii) 我们引入了一种类似典型相关分析的局部有监督 SM 目标,便于堆叠 SM 层;iii) 我们利用该 PyTorch 实现对 LeNet 等架构进行预训练,并将所得特征与 BP 训练的模型进行比较。这项工作将生物学合理性算法与计算效率相结合,为进一步探索开辟了多条途径。
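
The (offline) similarity-matching objective itself, minimizing the mismatch between input and output similarity matrices, can be written down and minimized directly; the plain gradient-descent loop below illustrates the objective, not the online, local algorithms the paper builds on.

```python
# Sketch: the similarity-matching objective  min_Y || X Xᵀ - Y Yᵀ ||_F².
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out = 100, 10, 3
X = rng.normal(size=(n, d_in))
Y = 0.01 * rng.normal(size=(n, d_out))

Gx = X @ X.T                                    # input similarity matrix
for _ in range(500):
    Gy = Y @ Y.T
    grad = 4 * (Gy - Gx) @ Y                    # gradient of ||Gx - Y Yᵀ||_F² w.r.t. Y
    Y -= 1e-4 * grad
print(round(np.linalg.norm(Gx - Y @ Y.T) / np.linalg.norm(Gx), 3))  # mismatch decreases
```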

Bio+Clinical BERT, BERT Base, and CNN Performance Comparison for Predicting Drug-Review Satisfaction

  • paper_url: http://arxiv.org/abs/2308.03782
  • repo_url: None
  • paper_authors: Yue Ling
  • for: 这项研究的目的是开发一种可以分析病人药物评价文本,并准确地分类为正面、中性或负面三类的自然语言处理(NLP)模型。
  • methods: 研究人员实现和评估了多种分类模型,包括BERT基础模型、医疗+临床BERT和简单的CNN。
  • results: 结果表明,医疗领域专用的 Bio+Clinical BERT 模型在整体表现上显著优于通用的 BERT 基础模型,macro F1 和召回率提升了 11%(见表 2)。未来的研究可以探讨如何发挥每个模型的特长:Bio+Clinical BERT 在处理医学术语方面表现出色,而较简单的 CNN 则能够识别关键词,并在情感相互冲突的文本中准确地分类情感。
    Abstract The objective of this study is to develop natural language processing (NLP) models that can analyze patients' drug reviews and accurately classify their satisfaction levels as positive, neutral, or negative. Such models would reduce the workload of healthcare professionals and provide greater insight into patients' quality of life, which is a critical indicator of treatment effectiveness. To achieve this, we implemented and evaluated several classification models, including a BERT base model, Bio+Clinical BERT, and a simpler CNN. Results indicate that the medical domain-specific Bio+Clinical BERT model significantly outperformed the general domain base BERT model, achieving macro f1 and recall score improvement of 11%, as shown in Table 2. Future research could explore how to capitalize on the specific strengths of each model. Bio+Clinical BERT excels in overall performance, particularly with medical jargon, while the simpler CNN demonstrates the ability to identify crucial words and accurately classify sentiment in texts with conflicting sentiments.
    摘要 本研究的目的是开发自然语言处理(NLP)模型,可以分析病人的药品评价并准确地分类为正面、中性或负面的满意度。这些模型会减轻医疗专业人员的工作负担,并提供更多有关病人生活质量的指标,这是治疗效果的关键指标。为 достичь这一目标,我们实施和评估了多种分类模型,包括BERT基础模型、医疗+临床BERT和简单的CNN。结果表明,医疗领域特定的Bio+Clinical BERT模型在表格2中显著超越了通用领域基础BERT模型,实现了macro f1和回归分数的提高率为11%。未来的研究可以探讨如何利用每个模型的特点。Bio+Clinical BERT在总性性能方面表现优异,特别是对医疗术语的处理;而简单的CNN则能够准确地标识关键词并在文本中 conflicting 的情感下准确地分类 sentiment。
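
A minimal fine-tuning setup for the three-way (positive/neutral/negative) review classifier with a clinical-domain BERT can be sketched with the Hugging Face transformers API. The checkpoint name below is the commonly released Bio+Clinical BERT and is an assumption about the exact weights used in the paper; the labels and single training step are illustrative.

```python
# Sketch: 3-class drug-review sentiment classifier on top of a clinical BERT.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "emilyalsentzer/Bio_ClinicalBERT"   # publicly released Bio+Clinical BERT weights
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

reviews = ["This medication relieved my migraines with no side effects.",
           "No change in symptoms after two weeks."]
labels = torch.tensor([2, 1])                    # 0 = negative, 1 = neutral, 2 = positive

batch = tokenizer(reviews, padding=True, truncation=True, max_length=128,
                  return_tensors="pt")
outputs = model(**batch, labels=labels)          # cross-entropy loss computed internally
outputs.loss.backward()                          # one illustrative training step
print(outputs.logits.argmax(dim=-1))
```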

Sea level Projections with Machine Learning using Altimetry and Climate Model ensembles

  • paper_url: http://arxiv.org/abs/2308.02460
  • repo_url: None
  • paper_authors: Saumya Sinha, John Fasullo, R. Steven Nerem, Claire Monteleoni
  • for: 研究全球海平面的上升趋势和anthropogenic climate-change signals的贡献
  • methods: 使用机器学习(ML)方法,combine satellite observations和气候模型 simulations,预测未来30年海平面变化
  • results: 通过非线性拟合气候模型预测和ML模型,预测未来30年海平面变化,并通过分割数据集来提高预测的准确性
    Abstract Satellite altimeter observations retrieved since 1993 show that the global mean sea level is rising at an unprecedented rate (3.4mm/year). With almost three decades of observations, we can now investigate the contributions of anthropogenic climate-change signals such as greenhouse gases, aerosols, and biomass burning in this rising sea level. We use machine learning (ML) to investigate future patterns of sea level change. To understand the extent of contributions from the climate-change signals, and to help in forecasting sea level change in the future, we turn to climate model simulations. This work presents a machine learning framework that exploits both satellite observations and climate model simulations to generate sea level rise projections at a 2-degree resolution spatial grid, 30 years into the future. We train fully connected neural networks (FCNNs) to predict altimeter values through a non-linear fusion of the climate model hindcasts (for 1993-2019). The learned FCNNs are then applied to future climate model projections to predict future sea level patterns. We propose segmenting our spatial dataset into meaningful clusters and show that clustering helps to improve predictions of our ML model.
    摘要 自 1993 年以来的卫星测高观测显示,全球平均海平面正以前所未有的速度(3.4 毫米/年)上升。凭借近三十年的观测,我们现在可以研究温室气体、气溶胶和生物质燃烧等人为气候变化信号对海平面上升的贡献。我们使用机器学习(ML)研究海平面变化的未来格局。为了解这些气候变化信号的贡献程度并帮助预测未来的海平面变化,我们借助气候模式模拟。本工作提出了一个同时利用卫星观测和气候模式模拟的机器学习框架,在 2 度分辨率的空间网格上生成未来 30 年的海平面上升预估。我们训练全连接神经网络(FCNN),通过对气候模式后报(1993-2019 年)进行非线性融合来预测测高值;随后将训练好的 FCNN 应用于未来的气候模式预估,以预测未来的海平面格局。我们还提出将空间数据集划分为有意义的聚类,并表明聚类有助于提高 ML 模型的预测效果。

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

  • paper_url: http://arxiv.org/abs/2308.01390
  • repo_url: https://github.com/mlfoundations/open_flamingo
  • paper_authors: Anas Awadalla, Irena Gao, Josh Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Gadre, Shiori Sagawa, Jenia Jitsev, Simon Kornblith, Pang Wei Koh, Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt
  • for: 这个论文是为了制定一个开源的FLAMINGO模型复制,用于视觉语言处理任务。
  • methods: 这个论文使用了多种权重学习算法和精度优化技术来训练FLAMINGO模型,并在七个视觉语言数据集上进行了评估。
  • results: 根据论文的报告,OpenFlamingo模型在七个视觉语言数据集上的平均性能为80-89%,与相应的FLAMINGO模型的性能相当。
    Abstract We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters. OpenFlamingo is an ongoing effort to produce an open-source replication of DeepMind's Flamingo models. On seven vision-language datasets, OpenFlamingo models average between 80 - 89% of corresponding Flamingo performance. This technical report describes our models, training data, hyperparameters, and evaluation suite. We share our models and code at https://github.com/mlfoundations/open_flamingo.
    摘要 我们介绍 OpenFlamingo,一系列参数规模从 3B 到 9B 的自回归视觉-语言模型。OpenFlamingo 是对 DeepMind 的 Flamingo 模型进行开源复现的一项持续工作。在七个视觉-语言数据集上,OpenFlamingo 模型的平均表现达到相应 Flamingo 模型性能的 80-89%。这份技术报告描述了我们的模型、训练数据、超参数和评估套件。我们在 https://github.com/mlfoundations/open_flamingo 上分享了模型和代码。

Follow the Soldiers with Optimized Single-Shot Multibox Detection and Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.01389
  • repo_url: None
  • paper_authors: Jumman Hossain, Maliha Momtaz
  • for: 本研究的主要目标是建立一个自动驾驶系统,使其能够跟踪一名士兵(在我们的项目中)在任何方向移动。
  • methods: 我们使用优化的单次多框检测(SSD)目标检测模型和强化学习(RL)模型来实现该目标。
  • results: 我们使用SSD Lite而不是SSD,并对其进行了比较。实验结果显示,SSD Lite在这三种技术中表现最佳,并在执行速度方面具有显著的提升(约2-3倍)而无需牺牲准确性。
    Abstract Nowadays, autonomous cars are gaining traction due to their numerous potential applications on battlefields and in resolving a variety of other real-world challenges. The main goal of our project is to build an autonomous system using DeepRacer which will follow a specific person (for our project, a soldier) when they will be moving in any direction. Two main components to accomplish this project is an optimized Single-Shot Multibox Detection (SSD) object detection model and a Reinforcement Learning (RL) model. We accomplished the task using SSD Lite instead of SSD and at the end, compared the results among SSD, SSD with Neural Computing Stick (NCS), and SSD Lite. Experimental results show that SSD Lite gives better performance among these three techniques and exhibits a considerable boost in inference speed (~2-3 times) without compromising accuracy.
    摘要 现在,自适应车辆正在受到广泛关注,因为它们在战场和解决各种实际问题中具有丰富的潜力。我们项目的主要目标是使用DeepRacer建立一个自适应系统,该系统可以跟踪一个特定人(在我们项目中是一名士兵)在任何方向移动时。我们使用优化的单射多框检测(SSD)对象检测模型和学习奖励(RL)模型来实现该目标。我们使用SSD Lite而不是SSD,并在结束时比较了这三种技术的结果。实验结果显示SSD Lite在这三种技术中表现最佳,并且在执行速度方面表现了明显的提升(约2-3倍),而无需牺牲准确性。

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

  • paper_url: http://arxiv.org/abs/2308.01320
  • repo_url: https://github.com/microsoft/DeepSpeed
  • paper_authors: Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He
  • for: 这篇论文目的是为了提供一个可访问、高效、cost-effective的RLHF(人类反馈学习)训练管道,以便更多的AI研究人员可以使用ChatGPT-like模型进行训练。
  • methods: 这篇论文使用了DeepSpeed-Chat系统,这是一个新的RLHF训练系统,它提供了三个关键能力:对ChatGPT-like模型的易于使用训练和推理经验、InstructGPT的DeepSpeed-RLHF管道的复制、以及一个可靠的DeepSpeed-RLHF系统,该系统结合了多种优化来实现训练和推理的高效率和可扩展性。
  • results: 这篇论文的结果表明,使用DeepSpeed-Chat系统可以在短时间内训练ChatGPT-like模型,并且可以在相对较少的成本下进行大规模训练。这种系统的开发将会推动AI领域的进步和发展,并且将使更多的数据科学家有access to advanced RLHF训练技术。
    Abstract ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes RLHF training, making it accessible to the AI community. DeepSpeed-Chat offers three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way. The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. With this development, DeepSpeed-Chat paves the way for broader access to advanced RLHF training, even for data scientists with limited resources, thereby fostering innovation and further development in the field of AI.
    摘要 类似 ChatGPT 的模型革新了人工智能中的各种应用,从摘要、编程到翻译,表现与人类相当甚至超越人类。然而,目前仍缺乏可获取、高效且低成本的端到端 RLHF(基于人类反馈的强化学习)训练管道,尤其是在数千亿参数规模的训练中。本文介绍 DeepSpeed-Chat,一种使 RLHF 训练大众化、向 AI 社区开放的新系统。DeepSpeed-Chat 提供三项关键能力:面向类 ChatGPT 模型的易用训练与推理体验;复现 InstructGPT 训练流程的 DeepSpeed-RLHF 管道;以及将多种训练与推理优化统一结合的健壮 DeepSpeed-RLHF 系统。该系统具有前所未有的效率和可扩展性,能够以创纪录的时间和极低的成本训练数千亿参数的模型。借助这一进展,DeepSpeed-Chat 让资源有限的数据科学家也能使用先进的 RLHF 训练,从而推动 AI 领域的创新与进一步发展。

Computational Long Exposure Mobile Photography

  • paper_url: http://arxiv.org/abs/2308.01379
  • repo_url: None
  • paper_authors: Eric Tabellion, Nikhil Karnad, Noa Glaser, Ben Weiss, David E. Jacobs, Yael Pritch
  • for: 这篇论文是关于计算机图像处理技术的研究,旨在提供一种可以在手持式智能手机摄像头APP中实现长时间曝光摄影的系统。
  • methods: 该系统首先检测并分割显著主体,然后跨多帧跟踪场景运动并对齐图像,以保留所需的清晰度并生成美观的运动拖影;系统捕获一组欠曝光的连拍,预测帧间运动并合成运动模糊以填补输入帧之间的时间间隙,最后将模糊图像与清晰的常规曝光图像合成为一张高分辨率、高动态范围(HDR)照片。
  • results: 该系统可以帮助摄影师在手持式智能手机摄像头中实现长时间曝光摄影,并且可以自动检测和分割主题,以及生成美观的运动梦幕。这种技术可以让摄影师更容易地拍摄长时间曝光照片,并且可以帮助更多的摄影爱好者掌握这种技术。
    Abstract Long exposure photography produces stunning imagery, representing moving elements in a scene with motion-blur. It is generally employed in two modalities, producing either a foreground or a background blur effect. Foreground blur images are traditionally captured on a tripod-mounted camera and portray blurred moving foreground elements, such as silky water or light trails, over a perfectly sharp background landscape. Background blur images, also called panning photography, are captured while the camera is tracking a moving subject, to produce an image of a sharp subject over a background blurred by relative motion. Both techniques are notoriously challenging and require additional equipment and advanced skills. In this paper, we describe a computational burst photography system that operates in a hand-held smartphone camera app, and achieves these effects fully automatically, at the tap of the shutter button. Our approach first detects and segments the salient subject. We track the scene motion over multiple frames and align the images in order to preserve desired sharpness and to produce aesthetically pleasing motion streaks. We capture an under-exposed burst and select the subset of input frames that will produce blur trails of controlled length, regardless of scene or camera motion velocity. We predict inter-frame motion and synthesize motion-blur to fill the temporal gaps between the input frames. Finally, we composite the blurred image with the sharp regular exposure to protect the sharpness of faces or areas of the scene that are barely moving, and produce a final high resolution and high dynamic range (HDR) photograph. Our system democratizes a capability previously reserved to professionals, and makes this creative style accessible to most casual photographers. More information and supplementary material can be found on our project webpage: https://motion-mode.github.io/
    摘要 长时间拍摄可以生成吸引人的图像,表现在Scene中的运动元素的摩擦模式。通常在两种模式下使用,生成 either 前景或背景模糊效果。前景模糊图像通常在静止摄像机上拍摄,捕捉摩擦的前景元素,如流动的水或灯光轨迹,与静止的背景景象一样清晰。背景模糊图像,也称为滑动摄影,通过在摄像机跟踪移动目标来生成一个锐定的主题,与相对运动的背景模糊。这两种技术都具有挑战性,需要额外设备和高级技能。在这篇论文中,我们描述了一种基于智能手机摄像机应用的计算机 burst摄影系统,可以在单击闭合按钮后自动完成这些效果。我们的方法首先检测和分割主题。我们跟踪场景运动,并对多帧图像进行对齐,以保持所需的锐度和生成美观的运动螺旋。我们捕捉具有不充足光量的快速拍摄,并从输入帧中选择能够生成控制长度的摩擦轨迹。我们预测间帧运动,并使用模拟摩擦来填充时间间隔。最后,我们将模糊图像 composite 到锐定的正常曝光图像中,以保护人脸或场景中的 hardly moving 部分,并生成一个高分辨率和高 dynamically range (HDR) 图像。我们的系统将这种创造力减少到专业人员之外,使这种创造性风格开放给大多数众所可达。更多信息和补充材料可以在我们项目网站中找到:https://motion-mode.github.io/

AI-Enhanced Data Processing and Discovery Crowd Sourcing for Meteor Shower Mapping

  • paper_url: http://arxiv.org/abs/2308.02664
  • repo_url: None
  • paper_authors: Siddha Ganju, Amartya Hatua, Peter Jenniskens, Sahyadri Krishna, Chicheng Ren, Surya Ambardar
  • for: 这项研究的目标是为了映射我们的陨星雨,通过三角测量陨星轨迹检测在低光照视频摄像机上的多个位置上,从16个国家的北和南半球进行覆盖。
  • methods: 这项研究使用了一个云基的AI导向的自动化数据处理管道,以提高数据处理的速度和精度,并使用可解释的活动学习和AI管道来自动化数据处理。
  • results: 到目前为止,CAMS已经发现了200多个新的陨星雨,并验证了数十个之前已经报告的雨。
    Abstract The Cameras for Allsky Meteor Surveillance (CAMS) project, funded by NASA starting in 2010, aims to map our meteor showers by triangulating meteor trajectories detected in low-light video cameras from multiple locations across 16 countries in both the northern and southern hemispheres. Its mission is to validate, discover, and predict the upcoming returns of meteor showers. Our research aimed to streamline the data processing by implementing an automated cloud-based AI-enabled pipeline and improve the data visualization to improve the rate of discoveries by involving the public in monitoring the meteor detections. This article describes the process of automating the data ingestion, processing, and insight generation using an interpretable Active Learning and AI pipeline. This work also describes the development of an interactive web portal (the NASA Meteor Shower portal) to facilitate the visualization of meteor radiant maps. To date, CAMS has discovered over 200 new meteor showers and has validated dozens of previously reported showers.
    摘要 美国国家航空航天局(NASA)自2010年起投入了“全天空闪电观测计划”(CAMS),旨在通过多个国家和多个地点的低光照视频摄像机械triangulationeteor轨迹,以确定和预测下一次闪电流星雨的返回。该项目的任务是验证、发现和预测下一次闪电流星雨的返回。我们的研究旨在通过实施云端AI智能pipeline自动化数据处理和改进数据视图来提高发现率,并通过与公众合作监测闪电探测来提高发现率。这篇文章描述了使用可解释性活动学习和AIipeline自动化数据进入、处理和情况描述的过程。此外,这篇文章还描述了开发了NASA流星雨门户,以便促进流星 radiant map的可视化。至今,CAMS已经发现了200多个新的闪电流星雨,并验证了数十个之前报道的闪电流星雨。

Explainable Deep Learning for Tumor Dynamic Modeling and Overall Survival Prediction using Neural-ODE

  • paper_url: http://arxiv.org/abs/2308.01362
  • repo_url: None
  • paper_authors: Mark Laurie, James Lu
  • for: 支持肿瘤疾病药物开发,提高预测性,实现个性化治疗和决策。
  • methods: 使用Tumor Dynamic Neural-ODE(TDNODE)作为融入药理学知识的神经网络,从纵向肿瘤大小数据中发现模型。
  • results: TDNODE 克服了现有模型的一个关键局限,能够基于截断数据做出无偏预测。由其生成的指标可以高精度地预测患者的总生存期(OS)。
    Abstract While tumor dynamic modeling has been widely applied to support the development of oncology drugs, there remains a need to increase predictivity, enable personalized therapy, and improve decision-making. We propose the use of Tumor Dynamic Neural-ODE (TDNODE) as a pharmacology-informed neural network to enable model discovery from longitudinal tumor size data. We show that TDNODE overcomes a key limitation of existing models in its ability to make unbiased predictions from truncated data. The encoder-decoder architecture is designed to express an underlying dynamical law which possesses the fundamental property of generalized homogeneity with respect to time. Thus, the modeling formalism enables the encoder output to be interpreted as kinetic rate metrics, with inverse time as the physical unit. We show that the generated metrics can be used to predict patients' overall survival (OS) with high accuracy. The proposed modeling formalism provides a principled way to integrate multimodal dynamical datasets in oncology disease modeling.
    摘要 虽然肿瘤动力学建模已被广泛用于支持肿瘤药物的研发,但在提高预测能力、实现个体化治疗和改进决策方面仍有需求。我们提出将Tumor Dynamic Neural-ODE(TDNODE)作为一种融入药理学知识的神经网络,用于从纵向肿瘤大小数据中发现模型。我们证明TDNODE克服了现有模型的一个关键局限,能够基于截断数据做出无偏预测。其编码器-解码器架构被设计用于表达一个关于时间具有广义齐次性这一基本性质的潜在动力学规律,因此该建模形式使编码器输出可以被解释为以时间倒数为物理单位的动力学速率指标。我们证明所生成的指标可以高精度地预测患者的总生存期(OS)。所提出的建模形式为在肿瘤疾病建模中整合多模态动力学数据集提供了一种有原则的方法。
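
As a rough illustration of fitting a neural ODE to longitudinal tumor-size measurements, here is a minimal PyTorch sketch using explicit Euler integration. It does not reproduce TDNODE's encoder-decoder architecture or its generalized-homogeneity constraint; the network, step size, and synthetic trajectory are assumptions for demonstration only.

```python
# Minimal neural-ODE sketch for longitudinal tumor-size data (illustration only).
import torch
import torch.nn as nn

class ODEFunc(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, y):
        return self.net(y)          # dy/dt as a function of current tumor size

def integrate(func, y0, t_grid):
    """Euler-integrate dy/dt = func(y) over a 1-D increasing time grid."""
    ys = [y0]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        ys.append(ys[-1] + (t1 - t0) * func(ys[-1]))
    return torch.stack(ys)

# Toy fit to one synthetic shrinking-tumor trajectory.
t = torch.linspace(0.0, 1.0, 20)
y_obs = (0.8 * torch.exp(-2.0 * t) + 0.2).unsqueeze(-1)
func = ODEFunc()
opt = torch.optim.Adam(func.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    y_pred = integrate(func, y_obs[0], t)
    loss = ((y_pred - y_obs) ** 2).mean()
    loss.backward()
    opt.step()
print(float(loss))
```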

Compressed and distributed least-squares regression: convergence rates with applications to Federated Learning

  • paper_url: http://arxiv.org/abs/2308.01358
  • repo_url: None
  • paper_authors: Constantin Philippenko, Aymeric Dieuleveut
  • for: investigate the impact of compression on stochastic gradient algorithms for machine learning, specifically in distributed and federated learning
  • methods: analyze the convergence rates of several unbiased compression operators, and extend the results to the case of federated learning
  • results: demonstrate that the limit variance term scales with $\mathrm{Tr}(\mathfrak{C}_{\mathrm{ania}} H^{-1})/K$, and analyze the dependency of $\mathfrak{C}_{\mathrm{ania}}$ on the compression strategy and its impact on convergence in centralized and heterogeneous FL frameworks.
    Abstract In this paper, we investigate the impact of compression on stochastic gradient algorithms for machine learning, a technique widely used in distributed and federated learning. We underline differences in terms of convergence rates between several unbiased compression operators, that all satisfy the same condition on their variance, thus going beyond the classical worst-case analysis. To do so, we focus on the case of least-squares regression (LSR) and analyze a general stochastic approximation algorithm for minimizing quadratic functions relying on a random field. We consider weak assumptions on the random field, tailored to the analysis (specifically, expected H\"older regularity), and on the noise covariance, enabling the analysis of various randomizing mechanisms, including compression. We then extend our results to the case of federated learning. More formally, we highlight the impact on the convergence of the covariance $\mathfrak{C}_{\mathrm{ania}}$ of the additive noise induced by the algorithm. We demonstrate despite the non-regularity of the stochastic field, that the limit variance term scales with $\mathrm{Tr}(\mathfrak{C}_{\mathrm{ania}} H^{-1})/K$ (where $H$ is the Hessian of the optimization problem and $K$ the number of iterations) generalizing the rate for the vanilla LSR case where it is $\sigma^2 \mathrm{Tr}(H H^{-1}) / K = \sigma^2 d / K$ (Bach and Moulines, 2013). Then, we analyze the dependency of $\mathfrak{C}_{\mathrm{ania}}$ on the compression strategy and ultimately its impact on convergence, first in the centralized case, then in two heterogeneous FL frameworks.
    摘要 在本文中,我们研究压缩对机器学习中随机梯度算法的影响,这一技术在分布式学习与联邦学习中被广泛使用。我们强调了若干无偏压缩算子在收敛速率上的差异,这些算子的方差都满足同一条件,从而超越了经典的最坏情况分析。为此,我们聚焦于最小二乘回归(LSR)的情形,分析了一种依赖随机场、用于最小化二次函数的一般随机逼近算法。我们对随机场采用为该分析量身定制的较弱假设(具体而言是期望Hölder正则性),并对噪声协方差作出假设,从而能够分析包括压缩在内的多种随机化机制。随后,我们将结果推广到联邦学习的情形。更正式地说,我们强调了算法引入的加性噪声的协方差 $\mathfrak{C}_{\mathrm{ania}}$ 对收敛的影响。我们证明,即使随机场不具备正则性,极限方差项仍按 $\mathrm{Tr}(\mathfrak{C}_{\mathrm{ania}} H^{-1})/K$ 的规模变化(其中 $H$ 为优化问题的Hessian,$K$ 为迭代次数),这推广了原始LSR情形下的速率 $\sigma^2 \mathrm{Tr}(H H^{-1}) / K = \sigma^2 d / K$(Bach and Moulines, 2013)。随后,我们分析了 $\mathfrak{C}_{\mathrm{ania}}$ 对压缩策略的依赖关系及其最终对收敛的影响,首先是在中心化情形下,然后是在两种异构联邦学习框架中。
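
A small NumPy sketch of the setting analyzed above: stochastic gradient descent for least-squares regression where each gradient is passed through an unbiased random-sparsification compressor (one of many operators covered by the paper's variance condition). The step size, sparsification probability, and synthetic data are illustrative.

```python
# Compressed SGD for least-squares regression with an unbiased sparsifier.
import numpy as np

rng = np.random.default_rng(0)

def sparsify(v, p=0.25):
    """Unbiased compression: keep each coordinate with prob. p, rescale by 1/p,
    so E[sparsify(v)] = v."""
    mask = rng.random(v.shape) < p
    return np.where(mask, v / p, 0.0)

d, n = 20, 5000
x_star = rng.normal(size=d)
A = rng.normal(size=(n, d))
b = A @ x_star + 0.1 * rng.normal(size=n)

x = np.zeros(d)
lr = 0.01
for k in range(n):                       # one pass, one sample per step
    a_k, b_k = A[k], b[k]
    grad = (a_k @ x - b_k) * a_k         # stochastic gradient of 0.5*(a^T x - b)^2
    x -= lr * sparsify(grad)             # apply only the compressed gradient
print("squared error:", float(np.sum((x - x_star) ** 2)))
```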

More Context, Less Distraction: Visual Classification by Inferring and Conditioning on Contextual Attributes

  • paper_url: http://arxiv.org/abs/2308.01313
  • repo_url: https://github.com/umd-huang-lab/perceptionclip
  • paper_authors: Bang An, Sicheng Zhu, Michael-Andrei Panaitescu-Liess, Chaithanya Kumar Mummadi, Furong Huang
  • for: This paper aims to improve zero-shot image classification using CLIP, by leveraging the model’s ability to understand visual concepts and natural language descriptions.
  • methods: The proposed method, called PerceptionCLIP, first infers contextual attributes (e.g., background) from an image, and then performs object classification conditioning on these attributes.
  • results: The proposed method achieves better generalization, group robustness, and interpretability compared to traditional zero-shot classification methods. For example, PerceptionCLIP with ViT-L/14 improves the worst group accuracy by 16.5% on the Waterbirds dataset and by 3.5% on CelebA.
  • for: 这篇论文的目的是利用CLIP对视觉概念和自然语言描述的理解能力,来改进零样本图像分类。
  • methods: 所提方法名为PerceptionCLIP,它首先从图像中推断上下文属性(例如背景),然后在这些属性的条件下进行物体分类。
  • results: 与传统零样本分类方法相比,所提方法具有更好的泛化性、群体鲁棒性和可解释性。例如,PerceptionCLIP结合ViT-L/14,在Waterbirds数据集上将最差群组准确率提高了16.5%,在CelebA数据集上提高了3.5%。
    Abstract CLIP, as a foundational vision language model, is widely used in zero-shot image classification due to its ability to understand various visual concepts and natural language descriptions. However, how to fully leverage CLIP's unprecedented human-like understanding capabilities to achieve better zero-shot classification is still an open question. This paper draws inspiration from the human visual perception process: a modern neuroscience view suggests that in classifying an object, humans first infer its class-independent attributes (e.g., background and orientation) which help separate the foreground object from the background, and then make decisions based on this information. Inspired by this, we observe that providing CLIP with contextual attributes improves zero-shot classification and mitigates reliance on spurious features. We also observe that CLIP itself can reasonably infer the attributes from an image. With these observations, we propose a training-free, two-step zero-shot classification method named PerceptionCLIP. Given an image, it first infers contextual attributes (e.g., background) and then performs object classification conditioning on them. Our experiments show that PerceptionCLIP achieves better generalization, group robustness, and better interpretability. For example, PerceptionCLIP with ViT-L/14 improves the worst group accuracy by 16.5% on the Waterbirds dataset and by 3.5% on CelebA.
    摘要 CLIP作为一个基础视觉语言模型,凭借其理解多种视觉概念和自然语言描述的能力,被广泛用于零样本图像分类。然而,如何充分利用CLIP前所未有的类人理解能力来实现更好的零样本分类,仍然是一个开放的问题。本文从人类视觉感知过程中获得启发:现代神经科学的观点认为,人在对物体分类时,会先推断与类别无关的属性(例如背景和朝向),借助这些属性将前景物体从背景中分离出来,然后再基于这些信息做出决策。受此启发,我们观察到,向CLIP提供上下文属性能够改进零样本分类,并减轻其对伪特征的依赖;我们还观察到,CLIP本身就能够较为合理地从图像中推断出这些属性。基于这些观察,我们提出了一种无需训练的两步零样本分类方法,称为PerceptionCLIP:给定一张图像,它先推断上下文属性(例如背景),再在这些属性的条件下进行物体分类。实验表明,PerceptionCLIP具有更好的泛化性、群体鲁棒性和可解释性。例如,配合ViT-L/14的PerceptionCLIP在Waterbirds数据集上将最差群组准确率提高了16.5%,在CelebA上提高了3.5%。
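
The following sketch illustrates the two-step inference idea on precomputed, hypothetical CLIP embeddings: first infer a distribution over contextual attributes, then classify by combining attribute-conditioned prompts weighted by that distribution. The prompt wording, attribute set, temperature, and exact conditioning rule are assumptions, not the authors' implementation.

```python
# Conceptual two-step zero-shot classification on precomputed embeddings.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def two_step_classify(img_emb, attr_prompt_embs, cond_prompt_embs, temp=0.01):
    """img_emb:          (d,) unit-norm image embedding.
    attr_prompt_embs: (n_attr, d) embeddings of prompts like "a photo on water".
    cond_prompt_embs: (n_class, n_attr, d) embeddings of prompts like
                      "a photo of a <class>, on water"."""
    p_attr = softmax(attr_prompt_embs @ img_emb / temp)        # step 1: infer context
    class_scores = np.einsum('cad,d->ca', cond_prompt_embs, img_emb)
    cond = (class_scores * p_attr).sum(axis=1)                 # step 2: condition on it
    return softmax(cond / temp)

# Toy run with random unit vectors standing in for real CLIP embeddings.
rng = np.random.default_rng(1)
d, n_attr, n_class = 8, 3, 2
unit = lambda v: v / np.linalg.norm(v, axis=-1, keepdims=True)
print(two_step_classify(unit(rng.normal(size=d)),
                        unit(rng.normal(size=(n_attr, d))),
                        unit(rng.normal(size=(n_class, n_attr, d)))))
```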

Lode Encoder: AI-constrained co-creativity

  • paper_url: http://arxiv.org/abs/2308.01312
  • repo_url: None
  • paper_authors: Debosmita Bhaumik, Ahmed Khalifa, Julian Togelius
  • for: 这篇论文设计了一个面向经典平台解谜游戏《Lode Runner》的游戏化混合主导关卡创建系统。
  • methods: 该系统基于多个在《Lode Runner》关卡集合上训练的自编码器;当用户输入设计时,每个自编码器都会生成一个在风格上更接近其训练关卡的版本。用户可以在系统给出的建议之上,通过"涂画"的方式创建和编辑关卡。
  • results: 文章报告了该系统的设计与训练过程、系统自身的演进以及用户测试结果。
    Abstract We present Lode Encoder, a gamified mixed-initiative level creation system for the classic platform-puzzle game Lode Runner. The system is built around several autoencoders which are trained on sets of Lode Runner levels. When fed with the user's design, each autoencoder produces a version of that design which is closer in style to the levels that it was trained on. The Lode Encoder interface allows the user to build and edit levels through 'painting' from the suggestions provided by the autoencoders. Crucially, in order to encourage designers to explore new possibilities, the system does not include more traditional editing tools. We report on the system design and training procedure, as well as on the evolution of the system itself and user tests.
    摘要 我们介绍Lode Encoder,一个面向经典平台解谜游戏《Lode Runner》的游戏化混合主导关卡创建系统。该系统围绕多个在《Lode Runner》关卡集合上训练的自编码器构建:当输入用户的设计时,每个自编码器都会生成一个在风格上更接近其训练关卡的版本。Lode Encoder的界面允许用户基于自编码器给出的建议,通过"涂画"的方式创建和编辑关卡。至关重要的是,为了鼓励设计者探索新的可能性,系统没有提供更传统的编辑工具。我们报告了系统的设计与训练过程、系统自身的演进以及用户测试。
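
A minimal sketch of the suggestion mechanism described above: an autoencoder (here a toy fully-connected one, untrained) maps a user's one-hot tile grid to a reconstruction that, once trained on real Lode Runner levels, would drift toward their style. Grid size, tile vocabulary, and architecture are placeholders, not the paper's.

```python
# Toy autoencoder-based level suggestion (illustrative placeholder).
import torch
import torch.nn as nn

H, W, TILES = 16, 16, 8          # assumed grid and tile vocabulary

class LevelAutoencoder(nn.Module):
    def __init__(self, latent=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(H * W * TILES, latent), nn.ReLU())
        self.dec = nn.Linear(latent, H * W * TILES)

    def forward(self, x):                       # x: (B, H, W, TILES) one-hot
        return self.dec(self.enc(x)).view(-1, H, W, TILES)

def suggest(model, user_level):
    """Return the autoencoder's 'more Lode-Runner-like' version of a design."""
    with torch.no_grad():
        logits = model(user_level.unsqueeze(0))
    return logits.argmax(dim=-1).squeeze(0)     # (H, W) tile ids

model = LevelAutoencoder()                      # would be trained on real levels
user_level = torch.zeros(H, W, TILES)
user_level[..., 0] = 1.0                        # an "empty" design
print(suggest(model, user_level).shape)
```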

Masked and Swapped Sequence Modeling for Next Novel Basket Recommendation in Grocery Shopping

  • paper_url: http://arxiv.org/abs/2308.01308
  • repo_url: https://github.com/liming-7/mask-swap-nnbr
  • paper_authors: Ming Li, Mozhdeh Ariannezhad, Andrew Yates, Maarten de Rijke
  • for: 本研究提出了下一新颖购物篮推荐(NNBR)任务,即推荐一个只包含用户未购买过的新颖商品的购物篮。
  • methods: 我们提出了一种简单的双向Transformer购物篮推荐模型(BTBR),该模型直接对购物篮内部及购物篮之间的商品相关性进行建模。
  • results: 我们在三个公开数据集上进行了大量实验,结果表明BTBR以及我们提出的掩码策略和交换策略能够显著提升NNBR任务的性能。
    Abstract Next basket recommendation (NBR) is the task of predicting the next set of items based on a sequence of already purchased baskets. It is a recommendation task that has been widely studied, especially in the context of grocery shopping. In next basket recommendation (NBR), it is useful to distinguish between repeat items, i.e., items that a user has consumed before, and explore items, i.e., items that a user has not consumed before. Most NBR work either ignores this distinction or focuses on repeat items. We formulate the next novel basket recommendation (NNBR) task, i.e., the task of recommending a basket that only consists of novel items, which is valuable for both real-world application and NBR evaluation. We evaluate how existing NBR methods perform on the NNBR task and find that, so far, limited progress has been made w.r.t. the NNBR task. To address the NNBR task, we propose a simple bi-directional transformer basket recommendation model (BTBR), which is focused on directly modeling item-to-item correlations within and across baskets instead of learning complex basket representations. To properly train BTBR, we propose and investigate several masking strategies and training objectives: (i) item-level random masking, (ii) item-level select masking, (iii) basket-level all masking, (iv) basket-level explore masking, and (v) joint masking. In addition, an item-basket swapping strategy is proposed to enrich the item interactions within the same baskets. We conduct extensive experiments on three open datasets with various characteristics. The results demonstrate the effectiveness of BTBR and our masking and swapping strategies for the NNBR task. BTBR with a properly selected masking and swapping strategy can substantially improve NNBR performance.
    摘要 下一购物篮推荐(NBR)任务是基于用户已购买的购物篮序列来预测下一组商品,是一个被广泛研究的推荐任务,在日用百货购物场景中尤为常见。在NBR中,区分重复商品(用户曾经购买过的商品)和探索商品(用户从未购买过的商品)是有意义的,但大多数NBR工作要么忽略这一区分,要么只关注重复商品。我们提出了下一新颖购物篮推荐(NNBR)任务,即推荐一个只包含新颖商品的购物篮,这对实际应用和NBR评估都具有价值。我们评估了现有NBR方法在NNBR任务上的表现,发现到目前为止,NNBR任务上的进展还很有限。为解决NNBR任务,我们提出了一种简单的双向Transformer购物篮推荐模型(BTBR),它直接对购物篮内部及购物篮之间的商品相关性进行建模,而不是学习复杂的购物篮表示。为了恰当地训练BTBR,我们提出并研究了多种掩码策略和训练目标:(i)商品级随机掩码,(ii)商品级选择掩码,(iii)购物篮级全部掩码,(iv)购物篮级探索掩码,以及(v)联合掩码。此外,我们还提出了一种商品-购物篮交换策略,以丰富同一购物篮内的商品交互。我们在三个特性各异的公开数据集上进行了大量实验,结果证明了BTBR以及我们的掩码和交换策略在NNBR任务上的有效性;配合恰当选择的掩码和交换策略,BTBR能够显著提升NNBR性能。
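
To illustrate two of the masking strategies named in the abstract, here is a small Python sketch applying item-level random masking and one plausible reading of basket-level explore masking (masking items appearing for the first time in the history) to a toy basket sequence; the token conventions and the exact definition of explore masking are assumptions.

```python
# Toy masking strategies for a basket sequence fed to a bidirectional model.
import random

MASK = "[MASK]"

def item_level_random_mask(baskets, p=0.3, seed=0):
    """Mask each item independently with probability p."""
    random.seed(seed)
    return [[MASK if random.random() < p else item for item in basket]
            for basket in baskets]

def basket_level_explore_mask(baskets):
    """Mask every item appearing for the first time in the sequence, so the
    model is trained to recover novel ('explore') items from context."""
    seen, out = set(), []
    for basket in baskets:
        out.append([MASK if item not in seen else item for item in basket])
        seen.update(basket)
    return out

history = [["milk", "bread"], ["milk", "apples"], ["bread", "tofu", "apples"]]
print(item_level_random_mask(history))
print(basket_level_explore_mask(history))
```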

Excitatory/Inhibitory Balance Emerges as a Key Factor for RBN Performance, Overriding Attractor Dynamics

  • paper_url: http://arxiv.org/abs/2308.10831
  • repo_url: None
  • paper_authors: Emmanuel Calvet, Jean Rouat, Bertrand Reulet
  • for: 这个论文旨在研究Random Boolean Networks(RBNs)在某些特定的分布参数下的不同动态行为,以及这些动态行为如何影响计算性能。
  • methods: 作者使用Random Boolean Networks(RBNs)模型,并研究了不同的分布参数对计算性能的影响。
  • results: 研究发现,在某些特定的分布参数下,Random Boolean Networks(RBNs)可以具有多种不同的动态行为,其中一些动态行为可以提高计算性能。此外,研究还发现,不同的动态行为对计算性能的影响几乎没有关系。
    Abstract Reservoir computing provides a time and cost-efficient alternative to traditional learning methods. Critical regimes, known as the "edge of chaos," have been found to optimize computational performance in binary neural networks. However, little attention has been devoted to studying reservoir-to-reservoir variability when investigating the link between connectivity, dynamics, and performance. As physical reservoir computers become more prevalent, developing a systematic approach to network design is crucial. In this article, we examine Random Boolean Networks (RBNs) and demonstrate that specific distribution parameters can lead to diverse dynamics near critical points. We identify distinct dynamical attractors and quantify their statistics, revealing that most reservoirs possess a dominant attractor. We then evaluate performance in two challenging tasks, memorization and prediction, and find that a positive excitatory balance produces a critical point with higher memory performance. In comparison, a negative inhibitory balance delivers another critical point with better prediction performance. Interestingly, we show that the intrinsic attractor dynamics have little influence on performance in either case.
    摘要 储备池计算(reservoir computing)为传统学习方法提供了一种在时间和成本上都更高效的替代方案。被称为"混沌边缘"的临界状态已被发现能够优化二值神经网络的计算性能。然而,在研究连接性、动力学与性能之间的联系时,储备池之间的差异性却很少受到关注。随着物理储备池计算机日益普及,建立一套系统化的网络设计方法至关重要。在本文中,我们研究随机布尔网络(RBN),并证明特定的分布参数能够在临界点附近产生多样的动力学行为。我们识别出不同的动力学吸引子并量化其统计特性,发现大多数储备池都存在一个占主导地位的吸引子。随后,我们在记忆和预测这两个具有挑战性的任务上评估性能,发现正的兴奋性平衡会产生一个记忆性能更高的临界点;相比之下,负的抑制性平衡则带来另一个预测性能更好的临界点。有趣的是,我们证明在这两种情形下,内在的吸引子动力学对性能几乎没有影响。
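
A toy NumPy simulation of a random Boolean network, with a bias parameter on the random truth tables serving as a crude stand-in for an excitatory/inhibitory knob. This only illustrates the object of study; the paper's connectivity distributions, readout, and balance definition are not reproduced.

```python
# Toy random Boolean network: each node updates from K random inputs through a
# random Boolean function whose bias toward outputting 1 is a crude knob.
import numpy as np

rng = np.random.default_rng(0)

def make_rbn(n=100, k=2, p_one=0.5):
    inputs = rng.integers(0, n, size=(n, k))          # K inputs per node
    tables = rng.random(size=(n, 2 ** k)) < p_one     # random truth tables
    return inputs, tables

def step(state, inputs, tables):
    # Index each node's truth table by the integer formed from its inputs' bits.
    idx = np.zeros(len(state), dtype=int)
    for j in range(inputs.shape[1]):
        idx = (idx << 1) | state[inputs[:, j]]
    return tables[np.arange(len(state)), idx].astype(int)

inputs, tables = make_rbn(p_one=0.6)                  # mildly "excitatory" bias
state = rng.integers(0, 2, size=100)
for _ in range(50):
    state = step(state, inputs, tables)
print("fraction of active nodes after 50 steps:", state.mean())
```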

EmbeddingTree: Hierarchical Exploration of Entity Features in Embedding

  • paper_url: http://arxiv.org/abs/2308.01329
  • repo_url: None
  • paper_authors: Yan Zheng, Junpeng Wang, Chin-Chia Michael Yeh, Yujie Fan, Huiyuan Chen, Liang Wang, Wei Zhang
  • for: 这个论文主要是为了探讨嵌入学习算法中的特征编码方法,以及如何从 embedding 空间中提取 semantics 信息。
  • methods: 这个论文提出了一种嵌入探索算法 named EmbeddingTree,它可以将嵌入 vector 与实体特征之间的semantics关系进行结构化解释。同时, authors 也开发了一种基于 EmbeddingTree 的互动视觉工具,可以帮助用户探索高维 embedding 空间中的特征。
  • results: 作者们通过使用 EmbeddingTree 和互动视觉工具,可以从 embedding 空间中提取出更多的 semantics 信息,并且可以对 embedding 训练中的特征进行denoising/注入。他们还通过对实际业务数据和公共30Music listening/playlists数据集进行实验,证明了 EmbeddingTree 的效果和互动视觉工具的价值。
    Abstract Embedding learning transforms discrete data entities into continuous numerical representations, encoding features/properties of the entities. Despite the outstanding performance reported from different embedding learning algorithms, few efforts were devoted to structurally interpreting how features are encoded in the learned embedding space. This work proposes EmbeddingTree, a hierarchical embedding exploration algorithm that relates the semantics of entity features with the less-interpretable embedding vectors. An interactive visualization tool is also developed based on EmbeddingTree to explore high-dimensional embeddings. The tool helps users discover nuance features of data entities, perform feature denoising/injecting in embedding training, and generate embeddings for unseen entities. We demonstrate the efficacy of EmbeddingTree and our visualization tool through embeddings generated for industry-scale merchant data and the public 30Music listening/playlists dataset.
    摘要 嵌入学习将离散的数据实体转化为连续的数值表示,从而编码实体的特征/属性。尽管各种嵌入学习算法都报告了出色的性能,但很少有工作从结构上解释特征是如何被编码进所学习的嵌入空间的。本工作提出了EmbeddingTree,一种分层的嵌入探索算法,将实体特征的语义与难以解释的嵌入向量关联起来。我们还基于EmbeddingTree开发了一个交互式可视化工具,用于探索高维嵌入。该工具帮助用户发现数据实体的细微特征,在嵌入训练中进行特征去噪/注入,并为未见过的实体生成嵌入。我们通过在产业级商户数据和公开的30Music收听/播放列表数据集上生成的嵌入,验证了EmbeddingTree及可视化工具的有效性。

Investigation on Machine Learning Based Approaches for Estimating the Critical Temperature of Superconductors

  • paper_url: http://arxiv.org/abs/2308.01932
  • repo_url: None
  • paper_authors: Fatin Abrar Shams, Rashed Hasan Ratul, Ahnaf Islam Naf, Syed Shaek Hossain Samir, Mirza Muntasir Nishat, Fahim Faisal, Md. Ashraful Hoque
  • for: 这篇论文旨在提出一种基于机器学习的方法,以准确预测超导材料的临界温度。
  • methods: 该论文使用一种堆叠(stacking)机器学习方法,在超导材料的复杂特征上进行训练,以便更好地预测临界温度。
  • results: 与此前其他已有研究相比,该模型表现出良好的前景,其RMSE为9.68,R2值为0.922。
    Abstract Superconductors have been among the most fascinating substances, as the fundamental concept of superconductivity as well as the correlation of critical temperature and superconductive materials have been the focus of extensive investigation since their discovery. However, superconductors at normal temperatures have yet to be identified. Additionally, there are still many unknown factors and gaps of understanding regarding this unique phenomenon, particularly the connection between superconductivity and the fundamental criteria to estimate the critical temperature. To bridge the gap, numerous machine learning techniques have been established to estimate critical temperatures as it is extremely challenging to determine. Furthermore, the need for a sophisticated and feasible method for determining the temperature range that goes beyond the scope of the standard empirical formula appears to be strongly emphasized by various machine-learning approaches. This paper uses a stacking machine learning approach to train itself on the complex characteristics of superconductive materials in order to accurately predict critical temperatures. In comparison to other previous accessible research investigations, this model demonstrated a promising performance with an RMSE of 9.68 and an R2 score of 0.922. The findings presented here could be a viable technique to shed new insight on the efficient implementation of the stacking ensemble method with hyperparameter optimization (HPO).
    摘要 超导体一直是最引人入胜的物质之一:自其被发现以来,超导性的基本概念以及临界温度与超导材料之间的关联一直是广泛研究的焦点。然而,常温超导体至今尚未被发现。此外,关于这一独特现象仍存在许多未知因素和理解上的空白,尤其是超导性与估计临界温度的基本判据之间的联系。为弥合这一差距,由于临界温度极难确定,人们已经建立了许多机器学习技术来对其进行估计;各种机器学习方法也都强调,需要一种超出标准经验公式适用范围、既精巧又可行的方法来确定温度区间。本文使用一种堆叠机器学习方法,在超导材料的复杂特征上进行训练,以准确预测临界温度。与此前其他已有研究相比,该模型表现出良好的前景,RMSE为9.68,R2分数为0.922。这些结果为高效实现带超参数优化(HPO)的堆叠集成方法提供了新的见解。
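
A generic scikit-learn sketch of a stacking ensemble for critical-temperature regression, shown on synthetic data. The base learners, meta-learner, and absence of hyperparameter optimization are assumptions for illustration; they are not the paper's configuration, and the printed RMSE/R2 refer only to the toy data.

```python
# Stacking ensemble for regression, illustrated on synthetic data.
from sklearn.datasets import make_regression
from sklearn.ensemble import StackingRegressor, RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

X, y = make_regression(n_samples=2000, n_features=80, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
        ("gb", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=Ridge(),            # meta-learner combining base predictions
    cv=5,
)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("RMSE:", mean_squared_error(y_te, pred) ** 0.5, "R2:", r2_score(y_te, pred))
```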

BRNES: Enabling Security and Privacy-aware Experience Sharing in Multiagent Robotic and Autonomous Systems

  • paper_url: http://arxiv.org/abs/2308.01274
  • repo_url: https://github.com/aralab-unr/brnes
  • paper_authors: Md Tamjid Hossain, Hung Manh La, Shahriar Badsha, Anton Netchaev
  • for: 这篇论文是用于解决多智能体问题,尤其是在对抗攻击和推理攻击的情况下。
  • methods: 本论文使用了类似于专家给学习者的导师-学习者架构,并且提出了一个称为BRNES的新的多智能体学习框架,以确保学习者在对抗攻击和推理攻击的情况下能够获得更好的学习效果。
  • results: 实验结果显示,与最新的框架相比,BRNES能够更快地到达目标,并且在对抗攻击与推断攻击存在的情况下仍能取得更好的学习效果。具体来说,在对抗环境下,BRNES比非隐私保护框架快8.32倍,比隐私保护框架快1.41倍。
    Abstract Although experience sharing (ES) accelerates multiagent reinforcement learning (MARL) in an advisor-advisee framework, attempts to apply ES to decentralized multiagent systems have so far relied on trusted environments and overlooked the possibility of adversarial manipulation and inference. Nevertheless, in a real-world setting, some Byzantine attackers, disguised as advisors, may provide false advice to the advisee and catastrophically degrade the overall learning performance. Also, an inference attacker, disguised as an advisee, may conduct several queries to infer the advisors' private information and make the entire ES process questionable in terms of privacy leakage. To address and tackle these issues, we propose a novel MARL framework (BRNES) that heuristically selects a dynamic neighbor zone for each advisee at each learning step and adopts a weighted experience aggregation technique to reduce Byzantine attack impact. Furthermore, to keep the agent's private information safe from adversarial inference attacks, we leverage the local differential privacy (LDP)-induced noise during the ES process. Our experiments show that our framework outperforms the state-of-the-art in terms of the steps to goal, obtained reward, and time to goal metrics. Particularly, our evaluation shows that the proposed framework is 8.32x faster than the current non-private frameworks and 1.41x faster than the private frameworks in an adversarial setting.
    摘要 尽管经验共享(ES)能够在顾问-被顾问(advisor-advisee)框架下加速多智能体强化学习(MARL),但迄今为止,将ES应用于去中心化多智能体系统的尝试都依赖于可信环境,忽视了对抗性操纵与推断的可能性。然而,在真实场景中,某些伪装成顾问的拜占庭攻击者可能向被顾问提供虚假建议,从而灾难性地降低整体学习性能;同时,伪装成被顾问的推断攻击者可能通过多次查询来推断顾问的私有信息,使整个ES过程在隐私泄露方面受到质疑。为了应对这些问题,我们提出了一种新的MARL框架(BRNES),它在每个学习步骤为每个被顾问启发式地选择一个动态邻居区域,并采用加权经验聚合技术来降低拜占庭攻击的影响。此外,为了保护智能体的私有信息免受对抗性推断攻击,我们在ES过程中利用了本地差分隐私(LDP)引入的噪声。实验表明,我们的框架在到达目标所需步数、获得的奖励以及到达目标所需时间等指标上均优于现有最佳方法。特别地,评估显示,在对抗环境下,所提框架比当前非隐私保护框架快8.32倍,比隐私保护框架快1.41倍。
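
Outside of any reinforcement-learning loop, the two ingredients highlighted above can be sketched in a few lines: advisors perturb their Q-value advice with Laplace noise for local differential privacy, and the advisee combines the advice with a weighted aggregation that can down-weight suspected Byzantine advisors. The weighting scheme, sensitivity, and epsilon are illustrative assumptions.

```python
# LDP-perturbed advice plus weighted experience aggregation (toy values).
import numpy as np

rng = np.random.default_rng(0)

def ldp_perturb(q_values, epsilon=1.0, sensitivity=1.0):
    """Each advisor adds Laplace noise locally before sharing its Q-value advice."""
    return q_values + rng.laplace(scale=sensitivity / epsilon, size=q_values.shape)

def aggregate_advice(advisor_qs, weights):
    """Weighted experience aggregation; down-weighting outliers is one way to
    blunt Byzantine (false) advice."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    return np.tensordot(w, advisor_qs, axes=1)

# Three advisors give advice over 4 actions; the third advisor is Byzantine.
advice = np.array([[0.1, 0.8, 0.2, 0.0],
                   [0.2, 0.7, 0.1, 0.1],
                   [0.9, 0.0, 0.9, 0.9]])
noisy = ldp_perturb(advice, epsilon=2.0)
print(aggregate_advice(noisy, weights=[0.45, 0.45, 0.10]))
```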

A Probabilistic Approach to Self-Supervised Learning using Cyclical Stochastic Gradient MCMC

  • paper_url: http://arxiv.org/abs/2308.01271
  • repo_url: None
  • paper_authors: Masoumeh Javanbakhat, Christoph Lippert
  • for: 本文提出了一种实用的贝叶斯自监督学习方法,使用循环随机梯度哈密顿蒙特卡洛(cSGHMC)来近似高维、多峰的后验分布。
  • methods: 本文在自监督学习模型的参数上设置先验,并使用cSGHMC近似嵌入上的后验分布,通过探索富有表达力的后验获得可解释且多样化的表示。
  • results: 实验结果表明,对这些表示进行边际化后,贝叶斯自监督学习方法在多种下游分类任务中的性能、校准度和分布外检测能力均有显著提升;此外,在SVHN和CIFAR-10数据集上,所提方法也能有效地检测分布外样本。
    Abstract In this paper we present a practical Bayesian self-supervised learning method with Cyclical Stochastic Gradient Hamiltonian Monte Carlo (cSGHMC). Within this framework, we place a prior over the parameters of a self-supervised learning model and use cSGHMC to approximate the high dimensional and multimodal posterior distribution over the embeddings. By exploring an expressive posterior over the embeddings, Bayesian self-supervised learning produces interpretable and diverse representations. Marginalizing over these representations yields a significant gain in performance, calibration and out-of-distribution detection on a variety of downstream classification tasks. We provide experimental results on multiple classification tasks on four challenging datasets. Moreover, we demonstrate the effectiveness of the proposed method in out-of-distribution detection using the SVHN and CIFAR-10 datasets.
    摘要 在本文中,我们提出了一种实用的贝叶斯自监督学习方法,采用循环随机梯度哈密顿蒙特卡洛(cSGHMC)。在该框架中,我们对自监督学习模型的参数设置先验,并使用cSGHMC来近似嵌入上的高维、多峰后验分布。通过探索嵌入上富有表达力的后验,贝叶斯自监督学习能够产生可解释且多样化的表示。对这些表示进行边际化,可以在多种下游分类任务上显著提升性能、校准度以及分布外检测能力。我们在四个具有挑战性的数据集上的多个分类任务中给出了实验结果,并利用SVHN和CIFAR-10数据集验证了所提方法在分布外检测方面的有效性。
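
A minimal sketch of cyclical SGHMC on a toy Gaussian target: the step size follows a cosine schedule that restarts every cycle, so the sampler alternates large exploratory steps with small exploitative ones. The discretization and constants follow the common cSG-MCMC recipe in simplified form and are not the paper's exact sampler.

```python
# Cyclical SGHMC on a toy 2-D standard Gaussian.
import numpy as np

rng = np.random.default_rng(0)

def grad_log_prob(theta):
    return -theta                           # grad log p for a standard Gaussian

def cyclical_stepsize(t, total_steps, n_cycles, eps0=0.05):
    steps_per_cycle = total_steps // n_cycles
    r = (t % steps_per_cycle) / steps_per_cycle
    return eps0 / 2.0 * (np.cos(np.pi * r) + 1.0)   # restart at each cycle

def csghmc(total_steps=2000, n_cycles=4, friction=0.1):
    theta, momentum, samples = rng.normal(size=2), np.zeros(2), []
    steps_per_cycle = total_steps // n_cycles
    for t in range(total_steps):
        eps = cyclical_stepsize(t, total_steps, n_cycles)
        noise = np.sqrt(2.0 * friction * eps) * rng.normal(size=2)
        momentum = (1.0 - friction) * momentum + eps * grad_log_prob(theta) + noise
        theta = theta + momentum
        if t % steps_per_cycle > steps_per_cycle // 2:
            samples.append(theta.copy())    # collect in the small-step phase
    return np.array(samples)

print(csghmc().mean(axis=0))                # should land near the origin
```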

Tirtha – An Automated Platform to Crowdsource Images and Create 3D Models of Heritage Sites

  • paper_url: http://arxiv.org/abs/2308.01246
  • repo_url: https://github.com/smlab-niser/tirtha-public
  • paper_authors: Jyotirmaya Shivottam, Subhankar Mishra
  • for: 文化遗产(CH)场所的数字化保存至关重要,可以保护它们免受自然灾害或人类活动的破坏。
  • methods: 得益于计算机视觉和摄影测量技术的进步,为CH场所创建3D模型已成为数字化保存的流行方法。但这一过程耗时、昂贵,通常还需要专业的设备和技能,在资源有限的发展中国家尤其困难。
  • results: 我们提出了Tirtha,一个通过众包图片来生成CH场所3D模型的网络平台。Tirtha使用了当前最先进的运动恢复结构(SfM)和多视图立体(MVS)技术,具有模块化、可扩展且低成本的特点,能够随着摄影测量技术的发展而纳入新的方法。Tirtha可通过网络界面访问,并可部署在本地或云端环境中。我们的案例研究表明,Tirtha能够利用众包图片成功创建印度奥迪沙邦寺庙的3D模型;这些模型可在Tirtha网站上查看、交互和下载。我们的工作希望为计算机视觉、遗产保护及相关领域的研究提供众包图片和3D重建数据集。总体而言,Tirtha是朝着数字化保存大众化迈出的一步,尤其是在资源有限的发展中国家。
    Abstract Digital preservation of Cultural Heritage (CH) sites is crucial to protect them against damage from natural disasters or human activities. Creating 3D models of CH sites has become a popular method of digital preservation thanks to advancements in computer vision and photogrammetry. However, the process is time-consuming, expensive, and typically requires specialized equipment and expertise, posing challenges in resource-limited developing countries. Additionally, the lack of an open repository for 3D models hinders research and public engagement with their heritage. To address these issues, we propose Tirtha, a web platform for crowdsourcing images of CH sites and creating their 3D models. Tirtha utilizes state-of-the-art Structure from Motion (SfM) and Multi-View Stereo (MVS) techniques. It is modular, extensible and cost-effective, allowing for the incorporation of new techniques as photogrammetry advances. Tirtha is accessible through a web interface at https://tirtha.niser.ac.in and can be deployed on-premise or in a cloud environment. In our case studies, we demonstrate the pipeline's effectiveness by creating 3D models of temples in Odisha, India, using crowdsourced images. These models are available for viewing, interaction, and download on the Tirtha website. Our work aims to provide a dataset of crowdsourced images and 3D reconstructions for research in computer vision, heritage conservation, and related domains. Overall, Tirtha is a step towards democratizing digital preservation, primarily in resource-limited developing countries.
    摘要 文化遗产(CH)场所的数字化保存至关重要,可以保护它们免受自然灾害或人类活动的破坏。得益于计算机视觉和摄影测量技术的进步,为CH场所创建3D模型已成为数字化保存的流行方法。然而,这一过程耗时、昂贵,通常需要专业的设备和技能,这在资源有限的发展中国家构成了挑战。此外,缺乏开放的3D模型存储库也阻碍了相关研究以及公众与自身遗产的互动。为了解决这些问题,我们提出了Tirtha,一个用于众包CH场所图片并创建其3D模型的网络平台。Tirtha采用了当前最先进的运动恢复结构(SfM)和多视图立体(MVS)技术,具有模块化、可扩展且低成本的特点,可以随着摄影测量技术的进步纳入新的方法。Tirtha可通过 https://tirtha.niser.ac.in 的网络界面访问,并可部署在本地或云端环境中。在案例研究中,我们利用众包图片为印度奥迪沙邦的寺庙创建了3D模型,展示了该流程的有效性;这些模型可以在Tirtha网站上查看、交互和下载。我们的工作旨在为计算机视觉、遗产保护及相关领域的研究提供一个众包图片与3D重建的数据集。总体而言,Tirtha是朝着数字化保存大众化迈出的一步,尤其是面向资源有限的发展中国家。
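
For readers who want to reproduce a basic sparse reconstruction from crowdsourced photos, the sketch below drives the COLMAP command-line tools (feature extraction, exhaustive matching, incremental mapping) from Python. COLMAP is used here only as a representative SfM backend; whether Tirtha itself invokes COLMAP, and all paths and parameters below, are assumptions.

```python
# Minimal incremental SfM pass over a folder of images via the COLMAP CLI.
# Requires COLMAP to be installed and on PATH; paths below are placeholders.
import subprocess
from pathlib import Path

def run_sparse_reconstruction(image_dir: str, work_dir: str) -> None:
    work = Path(work_dir)
    (work / "sparse").mkdir(parents=True, exist_ok=True)
    db = work / "database.db"
    subprocess.run(["colmap", "feature_extractor",
                    "--database_path", str(db),
                    "--image_path", image_dir], check=True)
    subprocess.run(["colmap", "exhaustive_matcher",
                    "--database_path", str(db)], check=True)
    subprocess.run(["colmap", "mapper",
                    "--database_path", str(db),
                    "--image_path", image_dir,
                    "--output_path", str(work / "sparse")], check=True)

if __name__ == "__main__":
    run_sparse_reconstruction("crowdsourced_images/", "reconstruction/")
```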