cs.LG - 2023-08-16

Accurate synthesis of Dysarthric Speech for ASR data augmentation

  • paper_url: http://arxiv.org/abs/2308.08438
  • repo_url: None
  • paper_authors: Mohammad Soleymanpour, Michael T. Johnson, Rahim Soleymanpour, Jeffrey Berry
  • for: Develop a new dysarthric speech synthesis method for use in Automatic Speech Recognition (ASR) training data augmentation.
  • methods: A modified neural multi-talker Text-to-Speech (TTS) system, which adds a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech at varying severity levels; a DNN-HMM model is used for dysarthria-specific speech recognition.
  • results: Adding synthetic dysarthric speech to the ASR training data improves dysarthric speech recognition, yielding a 12.2% WER improvement over the baseline, and the severity level and pause insertion controls decrease WER by a further 6.5%. A subjective evaluation shows the synthesized speech is perceived as similar to true dysarthric speech, especially for higher levels of dysarthria.
    Abstract Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic Speech Recognition (ASR) systems can help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available for dysarthric talkers. This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation. Differences in prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels are important components for dysarthric speech modeling, synthesis, and augmentation. For dysarthric speech synthesis, a modified neural multi-talker TTS is implemented by adding a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech for varying severity levels. To evaluate the effectiveness of the synthesized training data for ASR, dysarthria-specific speech recognition was used. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves a WER improvement of 12.2% compared to the baseline, and that the addition of the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of adding these parameters. Overall results on the TORGO database demonstrate that using dysarthric synthetic speech to increase the amount of dysarthric-patterned speech for training has a significant impact on dysarthric ASR systems. In addition, we have conducted a subjective evaluation to evaluate the dysarthric-ness and similarity of synthesized speech. Our subjective evaluation shows that the perceived dysarthric-ness of synthesized speech is similar to that of true dysarthric speech, especially for higher levels of dysarthria.

Eliciting Risk Aversion with Inverse Reinforcement Learning via Interactive Questioning

  • paper_url: http://arxiv.org/abs/2308.08427
  • repo_url: None
  • paper_authors: Ziteng Cheng, Anthony Coache, Sebastian Jaimungal
  • for: This study aims to identify an agent's risk aversion through interactive questioning.
  • methods: The problem is studied in two settings, a one-period case and an infinite horizon case. In the one-period case, the agent's risk aversion is characterized by a cost function of the state and a distortion risk measure; in the infinite horizon case, an additional discount factor is modeled. Assuming access to a finite set of candidates containing the agent's true risk aversion, the agent is asked to demonstrate her optimal policies in various environments in order to identify her risk aversion.
  • results: Asking the agent to demonstrate her optimal policies in different environments is shown to be an effective means of identifying her risk aversion; specifically, the risk aversion can be identified as the number of randomly designed questions tends to infinity, and an algorithm is developed for designing optimal questions. In simulations, the method learns the agent's risk aversion significantly faster than randomly designed questions. The framework has important applications in robo-advising and provides a new approach for identifying risk preferences.
    Abstract This paper proposes a novel framework for identifying an agent's risk aversion using interactive questioning. Our study is conducted in two scenarios: a one-period case and an infinite horizon case. In the one-period case, we assume that the agent's risk aversion is characterized by a cost function of the state and a distortion risk measure. In the infinite horizon case, we model risk aversion with an additional component, a discount factor. Assuming access to a finite set of candidates containing the agent's true risk aversion, we show that asking the agent to demonstrate her optimal policies in various environments, which may depend on her previous answers, is an effective means of identifying the agent's risk aversion. Specifically, we prove that the agent's risk aversion can be identified as the number of questions tends to infinity, provided the questions are randomly designed. We also develop an algorithm for designing optimal questions and provide empirical evidence that our method learns risk aversion significantly faster than randomly designed questions in simulations. Our framework has important applications in robo-advising and provides a new approach for identifying an agent's risk preferences.

Digital twinning of cardiac electrophysiology models from the surface ECG: a geodesic backpropagation approach

  • paper_url: http://arxiv.org/abs/2308.08410
  • repo_url: None
  • paper_authors: Thomas Grandits, Jan Verhülsdonk, Gundolf Haase, Alexander Effland, Simone Pezzuto
  • for: Building patient-specific cardiac electrophysiology models non-invasively, so that personalized cardiac models can be delivered within clinical time constraints.
  • methods: Geodesic-BP, a method well-suited to GPU-accelerated machine learning frameworks, optimizes the parameters of the eikonal equation so that the simulated cardiac activation reproduces a given ECG.
  • results: In a synthetic test case, Geodesic-BP reconstructs a simulated cardiac activation with high accuracy, even in the presence of modeling inaccuracies. The algorithm is also applied to a publicly available rabbit-model dataset with very positive results.
    Abstract The eikonal equation has become an indispensable tool for modeling cardiac electrical activation accurately and efficiently. In principle, by matching clinically recorded and eikonal-based electrocardiograms (ECGs), it is possible to build patient-specific models of cardiac electrophysiology in a purely non-invasive manner. Nonetheless, the fitting procedure remains a challenging task. The present study introduces a novel method, Geodesic-BP, to solve the inverse eikonal problem. Geodesic-BP is well-suited for GPU-accelerated machine learning frameworks, allowing us to optimize the parameters of the eikonal equation to reproduce a given ECG. We show that Geodesic-BP can reconstruct a simulated cardiac activation with high accuracy in a synthetic test case, even in the presence of modeling inaccuracies. Furthermore, we apply our algorithm to a publicly available dataset of a rabbit model, with very positive results. Given the future shift towards personalized medicine, Geodesic-BP has the potential to help in future functionalizations of cardiac models meeting clinical time constraints while maintaining the physiological accuracy of state-of-the-art cardiac models.
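For orientation, a commonly used anisotropic eikonal model of cardiac activation is sketched below in LaTeX; the exact formulation, parameterisation, and boundary conditions used by Geodesic-BP are those given in the paper, not this sketch.

```latex
% A common anisotropic eikonal model of cardiac activation (generic sketch).
% \tau(x) is the activation time, D(x) a conduction-velocity tensor, and
% \Gamma_0 the set of earliest activation sites with onset times \tau_0.
\begin{equation}
  \sqrt{\nabla \tau(x)^{\top} D(x)\, \nabla \tau(x)} = 1
  \quad \text{in } \Omega,
  \qquad
  \tau(x) = \tau_0(x) \quad \text{on } \Gamma_0 \subset \Omega .
\end{equation}
% The inverse problem tunes quantities such as D(x), \Gamma_0 and \tau_0 so
% that the ECG computed from \tau matches the recorded ECG.
```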

Explainable AI for clinical risk prediction: a survey of concepts, methods, and modalities

  • paper_url: http://arxiv.org/abs/2308.08407
  • repo_url: None
  • paper_authors: Munib Mesinovic, Peter Watkinson, Tingting Zhu
  • for: To explain the key concepts behind explainable AI for clinical risk prediction, including interpretability, fairness, bias, trust, and transparency, and how they relate to one another.
  • methods: A survey of recent progress in developing explainable models for clinical risk prediction, covering the combination of diverse interpretability methods, rigorous testing such as synthetic datasets with known generative factors, and quantitative and clinical evaluation across multiple common modalities.
  • results: The review finds that external validation, combined interpretability methods, rigorous testing, and open access and code-sharing resources are needed to enhance trust, fairness, transparency, and reproducibility, and argues for an end-to-end approach to explainability involving stakeholders from clinicians to developers.
    Abstract Recent advancements in AI applications to healthcare have shown incredible promise in surpassing human performance in diagnosis and disease prognosis. With the increasing complexity of AI models, however, concerns have grown regarding their opacity, potential biases, and the need for interpretability. To ensure trust and reliability in AI systems, especially in clinical risk prediction models, explainability becomes crucial. Explainability usually refers to an AI system's ability to provide a robust interpretation of its decision-making logic or the decisions themselves to human stakeholders. In clinical risk prediction, other aspects of explainability like fairness, bias, trust, and transparency also represent important concepts beyond just interpretability. In this review, we address the relationship between these concepts as they are often used together or interchangeably. This review also discusses recent progress in developing explainable models for clinical risk prediction, highlighting the importance of quantitative and clinical evaluation and validation across multiple common modalities in clinical practice. It emphasizes the need for external validation and the combination of diverse interpretability methods to enhance trust and fairness. Adopting rigorous testing, such as using synthetic datasets with known generative factors, can further improve the reliability of explainability methods. Open access and code-sharing resources are essential for transparency and reproducibility, enabling the growth and trustworthiness of explainable research. While challenges exist, an end-to-end approach to explainability in clinical risk prediction, incorporating stakeholders from clinicians to developers, is essential for success.

Content-based Recommendation Engine for Video Streaming Platform

  • paper_url: http://arxiv.org/abs/2308.08406
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Puskal Khadka, Prabhav Lamichhane
  • for: Providing video recommendations to users based on their previous interests and choices.
  • methods: A content-based engine that uses the TF-IDF text vectorization method to determine the relevance of words in a document, then computes the cosine similarity between each pair of contents to determine how similar the videos are.
  • results: The proposed content-based recommendation engine suggests videos suited to the user's interests and choices; its performance is measured by computing the precision, recall, and F1 score of the system.
    Abstract A recommendation engine suggests content, products, or services to the user by using machine learning algorithms. This paper proposes a content-based recommendation engine for providing video suggestions to the user based on their previous interests and choices. We use the TF-IDF text vectorization method to determine the relevance of words in a document, then find the similarity between each pair of contents by calculating the cosine similarity between them. Finally, the engine recommends videos to the users based on the obtained similarity scores. In addition, we measure the engine's performance by computing the precision, recall, and F1 score of the proposed system.
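A minimal sketch of the TF-IDF and cosine-similarity pipeline described in the abstract, using scikit-learn; the toy catalogue and item identifiers are illustrative assumptions, not the paper's dataset or code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalogue of video descriptions (illustrative only).
videos = {
    "v1": "space documentary about planets and stars",
    "v2": "cooking show with italian pasta recipes",
    "v3": "documentary on black holes and the universe",
}
ids = list(videos)

# TF-IDF vectorization of the textual metadata.
tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform([videos[i] for i in ids])

# Pairwise cosine similarity between every pair of items.
sim = cosine_similarity(matrix)

def recommend(video_id, top_k=2):
    """Return the top_k items most similar to the one the user engaged with."""
    idx = ids.index(video_id)
    ranked = sorted(enumerate(sim[idx]), key=lambda p: p[1], reverse=True)
    return [ids[j] for j, _ in ranked if j != idx][:top_k]

print(recommend("v1"))  # e.g. ['v3', 'v2']
```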

Fast Uncertainty Quantification of Spent Nuclear Fuel with Neural Networks

  • paper_url: http://arxiv.org/abs/2308.08391
  • repo_url: None
  • paper_authors: Arnau Albà, Andreas Adelmann, Lucas Münster, Dimitri Rochman, Romana Boiger
  • for: Fast evaluation and uncertainty quantification of the characteristics of spent nuclear fuel (SNF).
  • methods: Neural networks (NNs) are used as surrogate models for SNF characteristics, trained on data generated from CASMO5 lattice calculations, reducing computational cost compared to physics-based models.
  • results: The NN accurately predicts decay heat and nuclide concentrations as a function of key input parameters; the model is validated against physics-based simulations and measurements, and reduces computational costs by a factor of 10 or more.
    Abstract The accurate calculation and uncertainty quantification of the characteristics of spent nuclear fuel (SNF) play a crucial role in ensuring the safety, efficiency, and sustainability of nuclear energy production, waste management, and nuclear safeguards. State of the art physics-based models, while reliable, are computationally intensive and time-consuming. This paper presents a surrogate modeling approach using neural networks (NN) to predict a number of SNF characteristics with reduced computational costs compared to physics-based models. An NN is trained using data generated from CASMO5 lattice calculations. The trained NN accurately predicts decay heat and nuclide concentrations of SNF, as a function of key input parameters, such as enrichment, burnup, cooling time between cycles, mean boron concentration and fuel temperature. The model is validated against physics-based decay heat simulations and measurements of different uranium oxide fuel assemblies from two different pressurized water reactors. In addition, the NN is used to perform sensitivity analysis and uncertainty quantification. The results are in very good alignment to CASMO5, while the computational costs (taking into account the costs of generating training samples) are reduced by a factor of 10 or more. Our findings demonstrate the feasibility of using NNs as surrogate models for fast characterization of SNF, providing a promising avenue for improving computational efficiency in assessing nuclear fuel behavior and associated risks.
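To illustrate the surrogate idea only, the sketch below fits a small neural network mapping the five input parameters named in the abstract to a decay-heat value; the synthetic target function is a made-up placeholder standing in for CASMO5 outputs, and the architecture is not the authors'.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Inputs: enrichment, burnup, cooling time, mean boron concentration,
# fuel temperature (ranges are illustrative assumptions).
X = rng.uniform([2.0, 10.0, 1.0, 300.0, 600.0],
                [5.0, 60.0, 50.0, 1200.0, 1200.0], size=(2000, 5))

# Placeholder "physics": a made-up smooth function standing in for the
# CASMO5-generated decay-heat targets used to train the real surrogate.
y = 0.05 * X[:, 1] * np.exp(-0.03 * X[:, 2]) + 0.01 * X[:, 0]

surrogate = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
)
surrogate.fit(X, y)

# Fast prediction for a new fuel assembly configuration.
print(surrogate.predict([[4.2, 45.0, 10.0, 800.0, 900.0]]))
```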

Continuous Sweep: an improved, binary quantifier

  • paper_url: http://arxiv.org/abs/2308.08387
  • repo_url: None
  • paper_authors: Kevin Kloos, Julian D. Karch, Quinten A. Meertens, Mark de Rooij
  • for: Estimating the class prevalence of a dataset (quantification learning).
  • methods: Continuous Sweep, a parametric binary quantifier that uses parametric class distributions instead of empirical ones, optimizes the decision boundaries instead of applying discrete decision rules, and computes the mean instead of the median.
  • results: Continuous Sweep outperforms Median Sweep in a wide range of situations, and the derived analytic expressions for its bias and variance make it possible to find the optimal decision boundaries.
    Abstract Quantification is a supervised machine learning task, focused on estimating the class prevalence of a dataset rather than labeling its individual observations. We introduce Continuous Sweep, a new parametric binary quantifier inspired by the well-performing Median Sweep. Median Sweep is currently one of the best binary quantifiers, but we have changed this quantifier on three points, namely 1) using parametric class distributions instead of empirical distributions, 2) optimizing decision boundaries instead of applying discrete decision rules, and 3) calculating the mean instead of the median. We derive analytic expressions for the bias and variance of Continuous Sweep under general model assumptions. This is one of the first theoretical contributions in the field of quantification learning. Moreover, these derivations enable us to find the optimal decision boundaries. Finally, our simulation study shows that Continuous Sweep outperforms Median Sweep in a wide range of situations.
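For context, the sweep idea that Median Sweep builds on is sketched below with the standard adjusted-count correction; Continuous Sweep replaces the empirical TPR/FPR with parametric class distributions, optimised decision boundaries, and a mean instead of a median. This is a generic illustration, not the authors' implementation.

```python
import numpy as np

def adjusted_count(scores, tpr, fpr, threshold):
    """Adjusted classify-and-count prevalence estimate at one threshold."""
    raw = np.mean(scores >= threshold)      # observed positive rate
    denom = tpr - fpr
    return np.clip((raw - fpr) / denom, 0.0, 1.0) if denom > 0 else raw

def median_sweep(scores, tprs, fprs, thresholds):
    """Median of adjusted counts over a sweep of decision thresholds
    (the baseline quantifier that Continuous Sweep modifies)."""
    estimates = [adjusted_count(scores, t, f, thr)
                 for t, f, thr in zip(tprs, fprs, thresholds)]
    return float(np.median(estimates))
```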

Precision and Recall Reject Curves for Classification

  • paper_url: http://arxiv.org/abs/2308.08381
  • repo_url: None
  • paper_authors: Lydia Fischer, Patricia Wollstadt
  • for: Evaluating classifier performance when only high-certainty classifications are accepted.
  • methods: Reject curves based on precision and recall (the precision-reject and recall-reject curves), validated with prototype-based classifiers from learning vector quantization.
  • results: On imbalanced benchmarks and real-world medical data, the proposed precision- and recall-reject curves yield more accurate insights into classifier performance than accuracy-reject curves.
    Abstract For some classification scenarios, it is desirable to use only those classification instances that a trained model associates with a high certainty. To obtain such high-certainty instances, previous work has proposed accuracy-reject curves. Reject curves allow to evaluate and compare the performance of different certainty measures over a range of thresholds for accepting or rejecting classifications. However, the accuracy may not be the most suited evaluation metric for all applications, and instead precision or recall may be preferable. This is the case, for example, for data with imbalanced class distributions. We therefore propose reject curves that evaluate precision and recall, the recall-reject curve and the precision-reject curve. Using prototype-based classifiers from learning vector quantization, we first validate the proposed curves on artificial benchmark data against the accuracy reject curve as a baseline. We then show on imbalanced benchmarks and medical, real-world data that for these scenarios, the proposed precision- and recall-curves yield more accurate insights into classifier performance than accuracy reject curves.
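A minimal sketch of the proposed curves: sweep a certainty threshold, keep only the accepted classifications, and record precision or recall against the fraction of accepted samples. Function and variable names are illustrative; the paper pairs these curves with certainty measures from prototype-based LVQ classifiers.

```python
import numpy as np

def precision_recall_reject_curves(y_true, y_pred, certainty):
    """Precision- and recall-reject curves over certainty thresholds.
    y_true, y_pred: binary labels (1 = positive); certainty: model confidence."""
    thresholds = np.unique(certainty)
    accepted_frac, precisions, recalls = [], [], []
    for thr in thresholds:
        keep = certainty >= thr                 # accepted classifications
        if not keep.any():
            continue
        tp = np.sum((y_pred == 1) & (y_true == 1) & keep)
        fp = np.sum((y_pred == 1) & (y_true == 0) & keep)
        fn = np.sum((y_pred == 0) & (y_true == 1) & keep)
        accepted_frac.append(keep.mean())
        precisions.append(tp / (tp + fp) if tp + fp else 1.0)
        recalls.append(tp / (tp + fn) if tp + fn else 1.0)
    return np.array(accepted_frac), np.array(precisions), np.array(recalls)
```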

A distributed neural network architecture for dynamic sensor selection with application to bandwidth-constrained body-sensor networks

  • paper_url: http://arxiv.org/abs/2308.08379
  • repo_url: None
  • paper_authors: Thomas Strypsteen, Alexander Bertrand
  • for: Proposing a dynamic sensor selection approach for deep neural networks (DNNs) that derives an optimal sensor subset for each input sample and learns this selection jointly with the task model end-to-end.
  • methods: The Gumbel-Softmax trick allows the discrete selection decisions to be learned through standard backpropagation; a dynamic spatial filter makes the task-DNN more robust to the multitude of possible node subsets it must handle, and transmission constraints are imposed to extend the lifetime of a bandwidth-constrained network.
  • results: The selection of the optimal channels can be distributed across the different nodes of a wireless sensor network (WSN); the method is validated on an emulated EEG sensor network built from real electroencephalography (EEG) data, analysing the trade-off between transmission load and task accuracy.
    Abstract We propose a dynamic sensor selection approach for deep neural networks (DNNs), which is able to derive an optimal sensor subset selection for each specific input sample instead of a fixed selection for the entire dataset. This dynamic selection is jointly learned with the task model in an end-to-end way, using the Gumbel-Softmax trick to allow the discrete decisions to be learned through standard backpropagation. We then show how we can use this dynamic selection to increase the lifetime of a wireless sensor network (WSN) by imposing constraints on how often each node is allowed to transmit. We further improve performance by including a dynamic spatial filter that makes the task-DNN more robust against the fact that it now needs to be able to handle a multitude of possible node subsets. Finally, we explain how the selection of the optimal channels can be distributed across the different nodes in a WSN. We validate this method on a use case in the context of body-sensor networks, where we use real electroencephalography (EEG) sensor data to emulate an EEG sensor network. We analyze the resulting trade-offs between transmission load and task accuracy.
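The core relaxation is the standard Gumbel-Softmax trick: discrete sensor choices are replaced by differentiable (near-)one-hot samples so the selection logits receive gradients. The PyTorch sketch below is only a schematic of that idea; the paper additionally imposes per-node transmission constraints and uses a dynamic spatial filter, neither of which is shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicSensorSelector(nn.Module):
    """Select (up to) k of n_channels sensors per input sample, end-to-end."""
    def __init__(self, n_channels: int, k: int, feat_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim, n_channels)  # sample-dependent logits
        self.k = k

    def forward(self, x, summary, tau=1.0, hard=True):
        # x: (batch, n_channels, time); summary: (batch, feat_dim)
        logits = self.score(summary)                        # (batch, n_channels)
        picks = [F.gumbel_softmax(logits, tau=tau, hard=hard)
                 for _ in range(self.k)]                    # k (near-)one-hot picks
        mask = torch.clamp(torch.stack(picks, dim=0).sum(0), max=1.0)
        return x * mask.unsqueeze(-1)                       # zero out unselected channels

# Usage sketch: mask EEG channels before feeding the task-DNN.
sel = DynamicSensorSelector(n_channels=16, k=4, feat_dim=32)
masked = sel(torch.randn(8, 16, 250), torch.randn(8, 32))
```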

PDPK: A Framework to Synthesise Process Data and Corresponding Procedural Knowledge for Manufacturing

  • paper_url: http://arxiv.org/abs/2308.08371
  • repo_url: https://github.com/0x14d/embedding-operator-knowledge
  • paper_authors: Richard Nordsieck, André Schweizer, Michael Heider, Jörg Hähner
  • for: Providing a framework to generate synthetic datasets, adaptable to different domains, that pair process data with the corresponding procedural knowledge found in real-world settings.
  • methods: (1) Representations of procedural knowledge in Resource Description Framework (RDF)-compliant knowledge graphs, (2) simulation of parametrisation processes that provides consistent process data, and (3) comparison of established embedding methods on the resulting knowledge graphs.
  • results: (1) A synthetic dataset that can be adapted to different domains, (2) RDF-compliant knowledge-graph representations of procedural knowledge, and (3) a baseline comparison showing which out-of-the-box embedding methods have the potential to represent procedural knowledge.
    Abstract Procedural knowledge describes how to accomplish tasks and mitigate problems. Such knowledge is commonly held by domain experts, e.g. operators in manufacturing who adjust parameters to achieve quality targets. To the best of our knowledge, no real-world datasets containing process data and corresponding procedural knowledge are publicly available, possibly due to corporate apprehensions regarding the loss of knowledge advances. Therefore, we provide a framework to generate synthetic datasets that can be adapted to different domains. The design choices are inspired by two real-world datasets of procedural knowledge we have access to. Apart from containing representations of procedural knowledge in Resource Description Framework (RDF)-compliant knowledge graphs, the framework simulates parametrisation processes and provides consistent process data. We compare established embedding methods on the resulting knowledge graphs, detailing which out-of-the-box methods have the potential to represent procedural knowledge. This provides a baseline which can be used to increase the comparability of future work. Furthermore, we validate the overall characteristics of a synthesised dataset by comparing the results to those achievable on a real-world dataset. The framework and evaluation code, as well as the dataset used in the evaluation, are available open source.
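To give a flavour of RDF-encoded procedural knowledge, here is a small sketch using the rdflib library; the namespace, predicates, and the quality/parameter vocabulary are invented for illustration and are not the PDPK schema.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

# Invented namespace and vocabulary -- not the PDPK schema.
EX = Namespace("http://example.org/pdpk/")

g = Graph()
g.bind("ex", EX)

# "If warping is observed, lower the bed temperature by 5 degrees."
rule = EX["rule_warping_bed_temp"]
g.add((rule, RDF.type, EX.ProceduralKnowledge))
g.add((rule, EX.observesQualityIssue, EX.Warping))
g.add((rule, EX.adjustsParameter, EX.BedTemperature))
g.add((rule, EX.adjustmentValue, Literal(-5.0, datatype=XSD.double)))

print(g.serialize(format="turtle"))
```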

Dual-Branch Temperature Scaling Calibration for Long-Tailed Recognition

  • paper_url: http://arxiv.org/abs/2308.08366
  • repo_url: None
  • paper_authors: Jialin Guo, Zhenyu Wu, Zhiqiang Zhan, Yang Ji
  • for: Addressing the calibration of deep neural networks, in particular the more severe overconfidence that arises under long-tailed data distributions.
  • methods: Building on temperature scaling (TS), a dual-branch temperature scaling model (Dual-TS) is designed that simultaneously considers the diversity of temperature parameters across categories and the non-generalizability of temperature parameters for rare samples in minority classes.
  • results: Experiments show that the model achieves state-of-the-art performance under both the traditional ECE metric and the proposed Esbin-ECE metric.
    Abstract The calibration of deep neural networks is currently receiving widespread attention and research. Miscalibration usually leads to overconfidence of the model. Under long-tailed data distributions, the problem of miscalibration is more prominent due to the different confidence levels of samples in minority and majority categories, resulting in more serious overconfidence. To address this problem, some current research has designed diverse temperature coefficients for different categories based on the temperature scaling (TS) method. However, in the case of rare samples in minority classes, the temperature coefficient does not generalize, and there is a large difference between the temperature coefficients of the training set and the validation set. To solve this challenge, this paper proposes a dual-branch temperature scaling calibration model (Dual-TS), which simultaneously considers the diversity in temperature parameters of different categories and the non-generalizability of temperature parameters for rare samples in minority classes. Moreover, we noticed that the traditional calibration evaluation metric, Expected Calibration Error (ECE), gives a higher weight to low-confidence samples in the minority classes, which leads to inaccurate evaluation of model calibration. Therefore, we also propose Equal Sample Bin Expected Calibration Error (Esbin-ECE) as a new calibration evaluation metric. Through experiments, we demonstrate that our model yields state-of-the-art results in both the traditional ECE and Esbin-ECE metrics.
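For reference, vanilla temperature scaling and the standard Expected Calibration Error are sketched below; Dual-TS extends this by learning temperatures in two branches with class-wise considerations, and Esbin-ECE changes the binning so that each bin holds an equal number of samples. The sketch is generic, not the authors' code.

```python
import numpy as np
from scipy.special import softmax
from scipy.optimize import minimize_scalar

def nll(T, logits, labels):
    """Negative log-likelihood of temperature-scaled softmax probabilities."""
    probs = softmax(logits / T, axis=1)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels):
    """Single global temperature fitted on a validation set (vanilla TS)."""
    res = minimize_scalar(nll, bounds=(0.05, 10.0), args=(logits, labels),
                          method="bounded")
    return res.x

def expected_calibration_error(probs, labels, n_bins=15):
    """Standard ECE with equal-width confidence bins."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            acc = np.mean(pred[in_bin] == labels[in_bin])
            ece += in_bin.mean() * abs(acc - conf[in_bin].mean())
    return ece
```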

KernelWarehouse: Towards Parameter-Efficient Dynamic Convolution

  • paper_url: http://arxiv.org/abs/2308.08361
  • repo_url: https://github.com/osvai/kernelwarehouse
  • paper_authors: Chao Li, Anbang Yao
  • for: Improving the performance of dynamic convolution and proposing a more general form, $KernelWarehouse$.
  • methods: The basic concepts of "kernels" and "assembling kernels" are redefined from the perspective of reducing kernel dimension and significantly increasing kernel number; strategic kernel partition and warehouse sharing enhance convolutional parameter dependencies within the same layer and across successive layers, yielding a high degree of freedom to fit a desired parameter budget.
  • results: With different ConvNet architectures on ImageNet and MS-COCO, KernelWarehouse attains state-of-the-art results; for instance, ResNet18|ResNet50|MobileNetV2|ConvNeXt-Tiny models trained with KernelWarehouse on ImageNet reach 76.05%|81.05%|75.52%|82.51% top-1 accuracy. Thanks to the flexible design, KernelWarehouse can even reduce a ConvNet's model size while improving accuracy, e.g., a ResNet18 with 36.45%|65.10% parameter reduction relative to the baseline shows a 2.89%|2.29% absolute improvement in top-1 accuracy.
    Abstract Dynamic convolution learns a linear mixture of $n$ static kernels weighted with their sample-dependent attentions, demonstrating superior performance compared to normal convolution. However, existing designs are parameter-inefficient: they increase the number of convolutional parameters by $n$ times. This, together with the optimization difficulty, has prevented research progress in dynamic convolution that would allow us to use a significantly larger value of $n$ (e.g., $n>100$ instead of the typical setting $n<10$) to push forward the performance boundary. In this paper, we propose $KernelWarehouse$, a more general form of dynamic convolution, which can strike a favorable trade-off between parameter efficiency and representation power. Its key idea is to redefine the basic concepts of "$kernels$" and "$assembling$ $kernels$" in dynamic convolution from the perspective of reducing kernel dimension and increasing kernel number significantly. In principle, KernelWarehouse enhances convolutional parameter dependencies within the same layer and across successive layers via tactful kernel partition and warehouse sharing, yielding a high degree of freedom to fit a desired parameter budget. We validate our method on the ImageNet and MS-COCO datasets with different ConvNet architectures, and show that it attains state-of-the-art results. For instance, the ResNet18|ResNet50|MobileNetV2|ConvNeXt-Tiny models trained with KernelWarehouse on ImageNet reach 76.05%|81.05%|75.52%|82.51% top-1 accuracy. Thanks to its flexible design, KernelWarehouse can even reduce the model size of a ConvNet while improving the accuracy, e.g., our ResNet18 model with 36.45%|65.10% parameter reduction relative to the baseline shows a 2.89%|2.29% absolute improvement in top-1 accuracy.
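For readers new to dynamic convolution, the baseline that KernelWarehouse generalises is sketched below: $n$ static kernels are mixed with sample-dependent attention weights before a single convolution is applied. This follows the standard formulation referenced in the abstract, not KernelWarehouse's kernel-partition and warehouse-sharing scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Linear mixture of n static kernels weighted by per-sample attention."""
    def __init__(self, in_ch, out_ch, k=3, n_kernels=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_kernels, out_ch, in_ch, k, k) * 0.02)
        self.attn = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(in_ch, n_kernels))
        self.k = k

    def forward(self, x):
        b, c, h, w = x.shape
        alpha = torch.softmax(self.attn(x), dim=1)            # (b, n)
        # Per-sample kernel: sum_i alpha_i * W_i, applied via a grouped conv.
        w_mix = torch.einsum("bn,noikl->boikl", alpha, self.weight)
        w_mix = w_mix.reshape(-1, c, self.k, self.k)           # (b*out, in, k, k)
        out = F.conv2d(x.reshape(1, b * c, h, w), w_mix,
                       padding=self.k // 2, groups=b)
        return out.reshape(b, -1, h, w)

y = DynamicConv2d(8, 16)(torch.randn(2, 8, 32, 32))            # (2, 16, 32, 32)
```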

Independent Distribution Regularization for Private Graph Embedding

  • paper_url: http://arxiv.org/abs/2308.08360
  • repo_url: https://github.com/hkust-knowcomp/privategraphencoder
  • paper_authors: Qi Hu, Yangqiu Song
  • for: Proposing Private Variational Graph AutoEncoders (PVGAE), a method for learning graph embeddings while protecting private node attributes.
  • methods: PVGAE splits the variational graph autoencoder (VGAE) into two sets of encoders that learn sensitive and non-sensitive latent representations, and introduces an independent distribution penalty as a regularization term during training to enforce the independence of the encoders.
  • results: Experiments on three real-world datasets show that PVGAE outperforms baselines in both utility performance and privacy protection.
    Abstract Learning graph embeddings is a crucial task in graph mining tasks. An effective graph embedding model can learn low-dimensional representations from graph-structured data for data publishing benefiting various downstream applications such as node classification, link prediction, etc. However, recent studies have revealed that graph embeddings are susceptible to attribute inference attacks, which allow attackers to infer private node attributes from the learned graph embeddings. To address these concerns, privacy-preserving graph embedding methods have emerged, aiming to simultaneously consider primary learning and privacy protection through adversarial learning. However, most existing methods assume that representation models have access to all sensitive attributes in advance during the training stage, which is not always the case due to diverse privacy preferences. Furthermore, the commonly used adversarial learning technique in privacy-preserving representation learning suffers from unstable training issues. In this paper, we propose a novel approach called Private Variational Graph AutoEncoders (PVGAE) with the aid of independent distribution penalty as a regularization term. Specifically, we split the original variational graph autoencoder (VGAE) to learn sensitive and non-sensitive latent representations using two sets of encoders. Additionally, we introduce a novel regularization to enforce the independence of the encoders. We prove the theoretical effectiveness of regularization from the perspective of mutual information. Experimental results on three real-world datasets demonstrate that PVGAE outperforms other baselines in private embedding learning regarding utility performance and privacy protection.

Convergence of Two-Layer Regression with Nonlinear Units

  • paper_url: http://arxiv.org/abs/2308.08358
  • repo_url: None
  • paper_authors: Yichuan Deng, Zhao Song, Shenghao Xie
  • for: Solving a regression problem involving softmax and ReLU units, motivated by attention computation in large language models.
  • methods: A greedy algorithm based on an approximate Newton method is proposed to solve the regression problem.
  • results: A closed-form representation of the Hessian of the loss function is derived, and under certain assumptions the Hessian is shown to be Lipschitz continuous and positive semidefinite; the algorithm converges in the sense of distance to the optimal solution, and, after relaxing the Lipschitz condition, in the sense of loss value.
    Abstract Large language models (LLMs), such as ChatGPT and GPT4, have shown outstanding performance in many human life tasks. Attention computation plays an important role in training LLMs. The softmax unit and the ReLU unit are key structures in attention computation. Inspired by them, we put forward a softmax ReLU regression problem. Generally speaking, our goal is to find an optimal solution to the regression problem involving the ReLU unit. In this work, we calculate a closed-form representation for the Hessian of the loss function. Under certain assumptions, we prove the Lipschitz continuity and positive semidefiniteness of the Hessian. Then, we introduce a greedy algorithm based on an approximate Newton method, which converges in the sense of the distance to the optimal solution. Last, we relax the Lipschitz condition and prove convergence in the sense of loss value.

Is Meta-Learning the Right Approach for the Cold-Start Problem in Recommender Systems?

  • paper_url: http://arxiv.org/abs/2308.08354
  • repo_url: None
  • paper_authors: Davide Buffelli, Ashish Gupta, Agnieszka Strzalka, Vassilis Plachouras
  • for: Addressing the cold-start problem in the recommender systems that underpin modern online products and services.
  • methods: Standard, widely adopted deep learning models are tuned carefully and compared against optimization-based meta-learning models in the cold-start setting.
  • results: When tuned correctly, standard deep learning models perform as well as or better than newer meta-learning models on commonly used cold-start benchmarks; in addition, an extremely simple modular approach using common representation learning techniques performs comparably to meta-learning techniques designed specifically for the cold-start setting while being much easier to deploy in real-world applications.
    Abstract Recommender systems have become fundamental building blocks of modern online products and services, and have a substantial impact on user experience. In the past few years, deep learning methods have attracted a lot of research, and are now heavily used in modern real-world recommender systems. Nevertheless, dealing with recommendations in the cold-start setting, e.g., when a user has done limited interactions in the system, is a problem that remains far from solved. Meta-learning techniques, and in particular optimization-based meta-learning, have recently become the most popular approaches in the academic research literature for tackling the cold-start problem in deep learning models for recommender systems. However, current meta-learning approaches are not practical for real-world recommender systems, which have billions of users and items, and strict latency requirements. In this paper we show that it is possible to obtaining similar, or higher, performance on commonly used benchmarks for the cold-start problem without using meta-learning techniques. In more detail, we show that, when tuned correctly, standard and widely adopted deep learning models perform just as well as newer meta-learning models. We further show that an extremely simple modular approach using common representation learning techniques, can perform comparably to meta-learning techniques specifically designed for the cold-start setting while being much more easily deployable in real-world applications.

Graph Out-of-Distribution Generalization with Controllable Data Augmentation

  • paper_url: http://arxiv.org/abs/2308.08344
  • repo_url: None
  • paper_authors: Bin Lu, Xiaoying Gan, Ze Zhao, Shiyu Liang, Luoyi Fu, Xinbing Wang, Chenghu Zhou
  • for: This paper aims to address the issue of distribution deviation in graph neural network (GNN) training, specifically the problem of hybrid structure distribution shift, which can lead to spurious correlations and degrade the performance of GNN methods.
  • methods: The proposed method, called \texttt{OOD-GMixup}, uses controllable data augmentation in the metric space to manipulate the training distribution and alleviate the distribution deviation problem. Specifically, the method involves extracting graph rationales to eliminate spurious correlations, generating virtual samples with perturbation on the graph rationale representation domain, and using OOD calibration to measure the distribution deviation of virtual samples.
  • results: The proposed method outperforms state-of-the-art baselines on several real-world datasets for graph classification tasks, demonstrating its effectiveness in addressing the distribution deviation problem in GNN training.
    Abstract Graph Neural Network (GNN) has demonstrated extraordinary performance in classifying graph properties. However, due to the selection bias of training and testing data (e.g., training on small graphs and testing on large graphs, or training on dense graphs and testing on sparse graphs), distribution deviation is widespread. More importantly, we often observe \emph{hybrid structure distribution shift} of both scale and density, despite one-sided biased data partition. The spurious correlations over hybrid distribution deviation degrade the performance of previous GNN methods and show large instability among different datasets. To alleviate this problem, we propose \texttt{OOD-GMixup} to jointly manipulate the training distribution with \emph{controllable data augmentation} in metric space. Specifically, we first extract the graph rationales to eliminate the spurious correlations due to irrelevant information. Secondly, we generate virtual samples with perturbation on graph rationale representation domain to obtain potential OOD training samples. Finally, we propose OOD calibration to measure the distribution deviation of virtual samples by leveraging Extreme Value Theory, and further actively control the training distribution by emphasizing the impact of virtual OOD samples. Extensive studies on several real-world datasets on graph classification demonstrate the superiority of our proposed method over state-of-the-art baselines.
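The augmentation principle is easiest to see in the standard mixup form sketched below, applied in a representation space: two samples' embeddings and labels are interpolated. OOD-GMixup goes further by extracting graph rationales, perturbing them to create virtual OOD samples, and calibrating their influence with Extreme Value Theory, none of which is shown here.

```python
import numpy as np

def mixup_representations(z, y, alpha=0.4, seed=0):
    """Vanilla mixup in a representation space.
    z: (n, d) embeddings, y: (n, c) one-hot labels."""
    rng = np.random.default_rng(seed)
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(z))
    z_mix = lam * z + (1.0 - lam) * z[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return z_mix, y_mix
```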

Learning Logic Programs by Discovering Higher-Order Abstractions

  • paper_url: http://arxiv.org/abs/2308.08334
  • repo_url: None
  • paper_authors: Céline Hocquette, Sebastijan Dumančić, Andrew Cropper
  • for: Discovering novel abstractions, which is important for human-level AI and for improving predictive accuracy and learning efficiency.
  • methods: Inductive logic programming, which induces logic programs from examples and background knowledge; the paper introduces the higher-order refactoring problem, whose goal is to compress a logic program by introducing higher-order abstractions such as map, filter, and fold, implemented in STEVIE as a constraint optimisation problem.
  • results: On multiple domains, including program synthesis and visual reasoning, STEVIE improves predictive accuracies by 27% and reduces learning times by 47% compared to no refactoring; it can also discover abstractions that transfer to different domains.
    Abstract Discovering novel abstractions is important for human-level AI. We introduce an approach to discover higher-order abstractions, such as map, filter, and fold. We focus on inductive logic programming, which induces logic programs from examples and background knowledge. We introduce the higher-order refactoring problem, where the goal is to compress a logic program by introducing higher-order abstractions. We implement our approach in STEVIE, which formulates the higher-order refactoring problem as a constraint optimisation problem. Our experimental results on multiple domains, including program synthesis and visual reasoning, show that, compared to no refactoring, STEVIE can improve predictive accuracies by 27% and reduce learning times by 47%. We also show that STEVIE can discover abstractions that transfer to different domains.

Warped geometric information on the optimisation of Euclidean functions

  • paper_url: http://arxiv.org/abs/2308.08305
  • repo_url: None
  • paper_authors: Marcelo Hartmann, Bernardo Williams, Hanlin Yu, Mark Girolami, Alessandro Barp, Arto Klami
  • for: Optimizing a real-valued function defined on a potentially high-dimensional Euclidean space, such as the loss function in many machine-learning tasks or the logarithm of a probability distribution in statistical inference.
  • methods: Warped Riemannian geometry is used to recast the optimization of a function on Euclidean space as optimization on a Riemannian manifold with a warped metric, and the function's optimum is then sought along this manifold.
  • results: Third-order Taylor approximations of the geodesic curves, pulled back onto the manifold with suitable retraction maps, make optimization along approximate geodesics efficient; the proposed algorithm outperforms standard Euclidean gradient-based counterparts in the number of iterations until convergence, as well as an alternative Hessian-based optimisation routine.
    Abstract We consider the fundamental task of optimizing a real-valued function defined in a potentially high-dimensional Euclidean space, such as the loss function in many machine-learning tasks or the logarithm of the probability distribution in statistical inference. We use the warped Riemannian geometry notions to redefine the optimisation problem of a function on Euclidean space to a Riemannian manifold with a warped metric, and then find the function's optimum along this manifold. The warped metric chosen for the search domain induces a computational friendly metric-tensor for which optimal search directions associate with geodesic curves on the manifold becomes easier to compute. Performing optimization along geodesics is known to be generally infeasible, yet we show that in this specific manifold we can analytically derive Taylor approximations up to third-order. In general these approximations to the geodesic curve will not lie on the manifold, however we construct suitable retraction maps to pull them back onto the manifold. Therefore, we can efficiently optimize along the approximate geodesic curves. We cover the related theory, describe a practical optimization algorithm and empirically evaluate it on a collection of challenging optimisation benchmarks. Our proposed algorithm, using third-order approximation of geodesics, outperforms standard Euclidean gradient-based counterparts in term of number of iterations until convergence and an alternative method for Hessian-based optimisation routines.
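The key approximation can be written compactly: a geodesic $\gamma(t)$ emanating from the current iterate is expanded to third order, and the resulting off-manifold point is pulled back with a retraction. The display below records only the generic form; the actual coefficients, which involve the Christoffel symbols of the warped metric, are derived in the paper.

```latex
% Third-order Taylor expansion of a geodesic around the current iterate
% x_k = \gamma(0), followed by a retraction back onto the manifold (generic
% sketch; step-size choice and coefficient details follow the paper).
\begin{align}
  \gamma(t) &\approx \gamma(0) + t\,\dot\gamma(0)
              + \tfrac{t^{2}}{2}\,\ddot\gamma(0)
              + \tfrac{t^{3}}{6}\,\dddot\gamma(0),
  \qquad
  \ddot\gamma^{c} = -\Gamma^{c}_{ab}\,\dot\gamma^{a}\dot\gamma^{b}, \\
  x_{k+1} &= \operatorname{retract}\!\bigl(\gamma(t_k)\bigr),
\end{align}
% where \Gamma^{c}_{ab} are the Christoffel symbols of the warped metric,
% \dot\gamma(0) is set from the Riemannian gradient direction, and t_k is a
% step size (an assumption of this sketch).
```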

Robust Bayesian Satisficing

  • paper_url: http://arxiv.org/abs/2308.08291
  • repo_url: None
  • paper_authors: Artun Saday, Yaşar Cahit Yıldırım, Cem Tekin
  • for: Addressing distributional shifts in contemporary machine learning through robust satisficing (RS), which seeks a robust solution to an unspecified distributional shift while achieving a utility above a desired threshold.
  • methods: A robust Bayesian satisficing algorithm, RoBOS, for noisy black-box optimization in the contextual Bayesian optimization setting, where the true and reference distributions of the context may differ.
  • results: RoBOS achieves sublinear lenient regret under certain assumptions on the amount of distribution shift, and a sublinear upper bound on a weaker notion, robust satisficing regret, that is independent of the amount of shift; the method is compared against approaches such as distributionally robust optimization on various learning problems.
    Abstract Distributional shifts pose a significant challenge to achieving robustness in contemporary machine learning. To overcome this challenge, robust satisficing (RS) seeks a robust solution to an unspecified distributional shift while achieving a utility above a desired threshold. This paper focuses on the problem of RS in contextual Bayesian optimization when there is a discrepancy between the true and reference distributions of the context. We propose a novel robust Bayesian satisficing algorithm called RoBOS for noisy black-box optimization. Our algorithm guarantees sublinear lenient regret under certain assumptions on the amount of distribution shift. In addition, we define a weaker notion of regret called robust satisficing regret, in which our algorithm achieves a sublinear upper bound independent of the amount of distribution shift. To demonstrate the effectiveness of our method, we apply it to various learning problems and compare it to other approaches, such as distributionally robust optimization.

DFedADMM: Dual Constraints Controlled Model Inconsistency for Decentralized Federated Learning

  • paper_url: http://arxiv.org/abs/2308.08290
  • repo_url: None
  • paper_authors: Qinglun Li, Li Shen, Guanghao Li, Quanjun Yin, Dacheng Tao
  • for: Reducing the communication burden of federated learning (FL) and improving the performance of decentralized federated learning (DFL).
  • methods: Two new DFL algorithms, DFedADMM and its enhanced version DFedADMM-SAM, which use primal-dual optimization (ADMM) and a Sharpness-Aware Minimization (SAM) optimizer to address local inconsistency and local heterogeneous overfitting.
  • results: In the non-convex setting, DFedADMM and DFedADMM-SAM achieve convergence rates of $\mathcal{O}\Big(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-\psi)^2}\Big)$ and $\mathcal{O}\Big(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-\psi)^2}+ \frac{1}{T^{3/2}K^{1/2}}\Big)$, respectively; experiments on the MNIST, CIFAR10, and CIFAR100 datasets show that the algorithms outperform existing SOTA optimizers in DFL in both generalization and convergence speed.
    Abstract To address the communication burden issues associated with federated learning (FL), decentralized federated learning (DFL) discards the central server and establishes a decentralized communication network, where each client communicates only with neighboring clients. However, existing DFL methods still suffer from two major challenges: local inconsistency and local heterogeneous overfitting, which have not been fundamentally addressed by existing DFL methods. To tackle these issues, we propose novel DFL algorithms, DFedADMM and its enhanced version DFedADMM-SAM, to enhance the performance of DFL. The DFedADMM algorithm employs primal-dual optimization (ADMM) by utilizing dual variables to control the model inconsistency raised from the decentralized heterogeneous data distributions. The DFedADMM-SAM algorithm further improves on DFedADMM by employing a Sharpness-Aware Minimization (SAM) optimizer, which uses gradient perturbations to generate locally flat models and searches for models with uniformly low loss values to mitigate local heterogeneous overfitting. Theoretically, we derive convergence rates of $\small \mathcal{O}\Big(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-\psi)^2}\Big)$ and $\small \mathcal{O}\Big(\frac{1}{\sqrt{KT}}+\frac{1}{KT(1-\psi)^2}+ \frac{1}{T^{3/2}K^{1/2}}\Big)$ in the non-convex setting for DFedADMM and DFedADMM-SAM, respectively, where $1 - \psi$ represents the spectral gap of the gossip matrix. Empirically, extensive experiments on the MNIST, CIFAR10 and CIFAR100 datasets demonstrate that our algorithms exhibit superior performance in terms of both generalization and convergence speed compared to existing state-of-the-art (SOTA) optimizers in DFL.
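For orientation, the classical global-consensus ADMM updates with dual variables $\lambda_i$ and penalty $\rho$ are shown below; DFedADMM adapts this primal-dual structure to the decentralized setting, replacing the global average with neighbor-wise communication through the gossip matrix, so these are not the paper's exact update rules.

```latex
% Classical consensus ADMM over N clients with local losses f_i (generic
% form only; DFedADMM's decentralized updates differ).
\begin{align}
  x_i^{k+1} &= \arg\min_{x_i}\; f_i(x_i)
               + \langle \lambda_i^{k},\, x_i - z^{k}\rangle
               + \tfrac{\rho}{2}\,\lVert x_i - z^{k}\rVert^{2}, \\
  z^{k+1}   &= \frac{1}{N}\sum_{i=1}^{N}\Bigl(x_i^{k+1} + \tfrac{1}{\rho}\,\lambda_i^{k}\Bigr), \\
  \lambda_i^{k+1} &= \lambda_i^{k} + \rho\,\bigl(x_i^{k+1} - z^{k+1}\bigr).
\end{align}
```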

CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation

  • paper_url: http://arxiv.org/abs/2308.08283
  • repo_url: None
  • paper_authors: Hantao Zhang, Weidong Guo, Chenyang Qiu, Shouhong Wan, Bingbing Zou, Wanqin Wang, Peiquan Jin
  • for: Rectal cancer segmentation of CT images for timely clinical diagnosis, radiotherapy treatment, and follow-up.
  • methods: A novel large-scale rectal cancer CT image dataset (CARE) with pixel-level annotations, and a novel medical cancer lesion segmentation benchmark model (U-SAM) that incorporates prompt information to tackle the challenges of intricate anatomical structures.
  • results: The proposed U-SAM outperforms state-of-the-art methods on the CARE dataset and the WORD dataset, demonstrating its effectiveness in rectal cancer segmentation.
    Abstract Rectal cancer segmentation of CT image plays a crucial role in timely clinical diagnosis, radiotherapy treatment, and follow-up. Although current segmentation methods have shown promise in delineating cancerous tissues, they still encounter challenges in achieving high segmentation precision. These obstacles arise from the intricate anatomical structures of the rectum and the difficulties in performing differential diagnosis of rectal cancer. Additionally, a major obstacle is the lack of a large-scale, finely annotated CT image dataset for rectal cancer segmentation. To address these issues, this work introduces a novel large scale rectal cancer CT image dataset CARE with pixel-level annotations for both normal and cancerous rectum, which serves as a valuable resource for algorithm research and clinical application development. Moreover, we propose a novel medical cancer lesion segmentation benchmark model named U-SAM. The model is specifically designed to tackle the challenges posed by the intricate anatomical structures of abdominal organs by incorporating prompt information. U-SAM contains three key components: promptable information (e.g., points) to aid in target area localization, a convolution module for capturing low-level lesion details, and skip-connections to preserve and recover spatial information during the encoding-decoding process. To evaluate the effectiveness of U-SAM, we systematically compare its performance with several popular segmentation methods on the CARE dataset. The generalization of the model is further verified on the WORD dataset. Extensive experiments demonstrate that the proposed U-SAM outperforms state-of-the-art methods on these two datasets. These experiments can serve as the baseline for future research and clinical application development.

It Ain’t That Bad: Understanding the Mysterious Performance Drop in OOD Generalization for Generative Transformer Models

  • paper_url: http://arxiv.org/abs/2308.08268
  • repo_url: None
  • paper_authors: Xingcheng Xu, Zihao Pan, Haipeng Zhang, Yanqing Yang
  • for: Generative Transformer-based models solve diverse problems remarkably well, but their generalization ability is not yet well understood; this work investigates that gap.
  • methods: The authors probe generalization behavior with basic mathematical tasks such as n-digit addition and multiplication.
  • results: Models trained on n-digit operations (e.g., addition) generalize to unseen n-digit inputs (in-distribution) but perform poorly on longer, unseen inputs (out-of-distribution). Workarounds such as modified position embeddings, fine-tuning, and priming narrow the gap, but because they do not address the underlying mechanism, their robustness is not guaranteed.
    Abstract Generative Transformer-based models have achieved remarkable proficiency on solving diverse problems. However, their generalization ability is not fully understood and not always satisfying. Researchers take basic mathematical tasks like n-digit addition or multiplication as important perspectives for investigating their generalization behaviors. Curiously, it is observed that when training on n-digit operations (e.g., additions) in which both input operands are n-digit in length, models generalize successfully on unseen n-digit inputs (in-distribution (ID) generalization), but fail miserably and mysteriously on longer, unseen cases (out-of-distribution (OOD) generalization). Studies try to bridge this gap with workarounds such as modifying position embedding, fine-tuning, and priming with more extensive or instructive data. However, without addressing the essential mechanism, there is hardly any guarantee regarding the robustness of these solutions. We bring this unexplained performance drop into attention and ask whether it is purely from random errors. Here we turn to the mechanistic line of research which has notable successes in model interpretability. We discover that the strong ID generalization stems from structured representations, while behind the unsatisfying OOD performance, the models still exhibit clear learned algebraic structures. Specifically, these models map unseen OOD inputs to outputs with equivalence relations in the ID domain. These highlight the potential of the models to carry useful information for improved generalization.
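As a concrete illustration of the length-generalization setting described in the abstract above, here is a minimal sketch (not from the paper) of how ID and OOD evaluation splits for n-digit addition might be generated; `make_addition_examples` and the digit lengths are illustrative assumptions, and the model and tokenizer are omitted.

```python
import random

def make_addition_examples(num_digits: int, count: int, seed: int = 0):
    """Generate `count` addition problems whose operands both have exactly `num_digits` digits."""
    rng = random.Random(seed)
    lo, hi = 10 ** (num_digits - 1), 10 ** num_digits - 1
    examples = []
    for _ in range(count):
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        examples.append((f"{a}+{b}=", str(a + b)))
    return examples

# In-distribution evaluation: unseen operands of the training length (e.g., 3 digits).
id_eval = make_addition_examples(num_digits=3, count=1000, seed=1)
# Out-of-distribution evaluation: operands longer than anything seen in training.
ood_eval = make_addition_examples(num_digits=5, count=1000, seed=2)

# A trained generative model would be scored with exact-match accuracy on both splits;
# the paper's observation is that ID accuracy is high while OOD accuracy collapses.
```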

Graph Relation Aware Continual Learning

  • paper_url: http://arxiv.org/abs/2308.08259
  • repo_url: None
  • paper_authors: Qinghua Shen, Weijieying Ren, Wei Qin
  • for: This paper studies continual graph learning: learning from an infinite stream of graph data, consolidating historical knowledge, and generalizing it to future tasks when only the current graph data are available.
  • methods: It focuses on the latent relations behind graph edges and designs an adaptive model, RAM-CG, consisting of a relation-discovery module that explores these latent relations and a task-awareness masking classifier that accounts for distribution shift over time.
  • results: RAM-CG improves accuracy over the state of the art by 2.2%, 6.9%, and 6.6% on the CitationNet, OGBN-arxiv, and TWITCH datasets, respectively.
    Abstract Continual graph learning (CGL) studies the problem of learning from an infinite stream of graph data, consolidating historical knowledge, and generalizing it to the future task. At once, only current graph data are available. Although some recent attempts have been made to handle this task, we still face two potential challenges: 1) most of existing works only manipulate on the intermediate graph embedding and ignore intrinsic properties of graphs. It is non-trivial to differentiate the transferred information across graphs. 2) recent attempts take a parameter-sharing policy to transfer knowledge across time steps or progressively expand new architecture given shifted graph distribution. Learning a single model could loss discriminative information for each graph task while the model expansion scheme suffers from high model complexity. In this paper, we point out that latent relations behind graph edges can be attributed as an invariant factor for the evolving graphs and the statistical information of latent relations evolves. Motivated by this, we design a relation-aware adaptive model, dubbed as RAM-CG, that consists of a relation-discovery modular to explore latent relations behind edges and a task-awareness masking classifier to accounts for the shifted. Extensive experiments show that RAM-CG provides significant 2.2%, 6.9% and 6.6% accuracy improvements over the state-of-the-art results on CitationNet, OGBN-arxiv and TWITCH dataset, respective.
    摘要 在这篇论文中,我们指出了图像边缘下的隐藏关系可以作为不变因素,并且这些隐藏关系的统计信息在演化过程中发展。基于这一点,我们设计了一种相关性感知的自适应模型,即RAM-CG,它包括一个用于探索隐藏关系的关系探索模块和一个用于考虑时间步骤或graph distribution的变化的任务意识Masking类ifier。经过广泛的实验,我们发现RAM-CG可以提供2.2%、6.9%和6.6%的精度提高 compared to state-of-the-art results on CitationNet、OGBN-arxiv和TWITCH dataset,分别。

Two Phases of Scaling Laws for Nearest Neighbor Classifiers

  • paper_url: http://arxiv.org/abs/2308.08247
  • repo_url: None
  • paper_authors: Pengkun Yang, Jingzhao Zhang
  • for: This work studies the scaling laws of nearest neighbor classifiers, i.e., how test performance improves as the amount of training data grows.
  • methods: It analyzes nearest neighbor classifiers theoretically, characterizing their generalization error as a function of data size, dimension, and distribution.
  • results: A scaling law can exhibit two phases: in one regime the generalization error depends polynomially on the data dimension and decreases quickly, while in the other it depends exponentially on the dimension and decreases slowly. When the data distribution is benign, nearest neighbor classifiers can achieve a generalization error that depends only polynomially on the dimension.
    Abstract A scaling law refers to the observation that the test performance of a model improves as the number of training data increases. A fast scaling law implies that one can solve machine learning problems by simply boosting the data and the model sizes. Yet, in many cases, the benefit of adding more data can be negligible. In this work, we study the rate of scaling laws of nearest neighbor classifiers. We show that a scaling law can have two phases: in the first phase, the generalization error depends polynomially on the data dimension and decreases fast; whereas in the second phase, the error depends exponentially on the data dimension and decreases slowly. Our analysis highlights the complexity of the data distribution in determining the generalization error. When the data distributes benignly, our result suggests that nearest neighbor classifier can achieve a generalization error that depends polynomially, instead of exponentially, on the data dimension.
    摘要 (Simplified Chinese)一个尺度法则指的是模型在训练数据量增加后测试性能的改进。一个快速尺度法则意味着可以通过简单地增加数据和模型大小解决机器学习问题。然而,在许多情况下,增加更多数据的好处很小。在这个工作中,我们研究 nearest neighbor 分类器的尺度法则。我们发现一个尺度法则可以有两个阶段:在第一阶段,泛化错误与数据维度之间存在 polynomial 关系,随着数据维度增加而快速下降;而在第二阶段,错误与数据维度之间存在 exponential 关系,随着数据维度增加而慢慢下降。我们的分析表明数据分布的复杂性对泛化错误的影响。当数据分布良好时,我们的结果表明 nearest neighbor 分类器可以在数据维度上取得 polynomial 类型的泛化错误,而不是 exponential 类型。
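A small empirical sketch, assuming a simple two-Gaussian synthetic distribution, of how one might observe the error of a 1-nearest-neighbor classifier as the training set grows; it is not the paper's analysis, and the data-generating choices are arbitrary.

```python
import numpy as np

def one_nn_error(n_train: int, dim: int, n_test: int = 2000, seed: int = 0) -> float:
    """Empirical test error of a 1-nearest-neighbor classifier on two Gaussian classes."""
    rng = np.random.default_rng(seed)

    def sample(n):
        y = rng.integers(0, 2, size=n)
        x = rng.normal(size=(n, dim))
        x[:, 0] += 2.0 * y          # class means differ in the first coordinate only
        return x, y

    x_tr, y_tr = sample(n_train)
    x_te, y_te = sample(n_test)
    # Squared Euclidean distances via the dot-product identity to keep memory modest.
    d2 = (x_te ** 2).sum(1)[:, None] + (x_tr ** 2).sum(1)[None, :] - 2.0 * x_te @ x_tr.T
    pred = y_tr[d2.argmin(axis=1)]
    return float((pred != y_te).mean())

for n in (100, 1000, 10000):
    print(n, one_nn_error(n_train=n, dim=10))
```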

The Expressive Power of Graph Neural Networks: A Survey

  • paper_url: http://arxiv.org/abs/2308.08235
  • repo_url: None
  • paper_authors: Bingxu Zhang, Changjun Fan, Shixuan Liu, Kuihua Huang, Xiang Zhao, Jincai Huang, Zhong Liu
  • for: This survey examines the limits of the expressive power of graph neural networks (GNNs), covering problems such as graph isomorphism recognition and subgraph counting.
  • methods: It reviews the approaches used to enhance GNN expressivity, organized into three categories: graph feature enhancement, graph topology enhancement, and GNN architecture enhancement.
  • results: Although GNN expressivity is limited in several ways, the surveyed feature, topology, and architecture enhancements can strengthen it; the survey is the first to comprehensively summarize models in this direction.
    Abstract Graph neural networks (GNNs) are effective machine learning models for many graph-related applications. Despite their empirical success, many research efforts focus on the theoretical limitations of GNNs, i.e., the GNNs expressive power. Early works in this domain mainly focus on studying the graph isomorphism recognition ability of GNNs, and recent works try to leverage the properties such as subgraph counting and connectivity learning to characterize the expressive power of GNNs, which are more practical and closer to real-world. However, no survey papers and open-source repositories comprehensively summarize and discuss models in this important direction. To fill the gap, we conduct a first survey for models for enhancing expressive power under different forms of definition. Concretely, the models are reviewed based on three categories, i.e., Graph feature enhancement, Graph topology enhancement, and GNNs architecture enhancement.
    摘要 格点网络(GNNs)是许多图形应用中的有效机器学习模型。尽管其实验成功,但许多研究努力集中在GNNs的理论局限性上,即GNNs表达能力。早期工作主要关注图同构识别能力,而最近的工作尝试利用子图计数和连接学习来描述GNNs的表达能力,这些方法更加实际和更加接近实际应用。然而,没有任何报告和开源库全面总结和讨论这个重要方向。为了填补这个空白,我们进行了首次的survey,检讨不同定义下GNNs表达能力的模型。具体来说,模型被评估基于三类划分,即图特征增强、图结构增强和GNNs架构增强。

Challenges and Opportunities of Using Transformer-Based Multi-Task Learning in NLP Through ML Lifecycle: A Survey

  • paper_url: http://arxiv.org/abs/2308.08234
  • repo_url: None
  • paper_authors: Lovre Torbarina, Tin Ferkovic, Lukasz Roguski, Velimir Mihelcic, Bruno Sarlija, Zeljko Kraljevic
  • for: This survey explores how multi-task learning (MTL) can improve the efficiency and performance of natural language processing (NLP) models.
  • methods: It first gives an overview of transformer-based MTL approaches, then discusses the challenges and opportunities of applying MTL across the ML lifecycle: data engineering, model development, deployment, and monitoring.
  • results: It systematically analyzes how transformer-based MTL in NLP fits into the ML lifecycle and motivates research on connecting MTL with continual learning (CL), which would make it easier to periodically retrain models, update them under distribution shift, and add new capabilities for real-world requirements.
    Abstract The increasing adoption of natural language processing (NLP) models across industries has led to practitioners' need for machine learning systems to handle these models efficiently, from training to serving them in production. However, training, deploying, and updating multiple models can be complex, costly, and time-consuming, mainly when using transformer-based pre-trained language models. Multi-Task Learning (MTL) has emerged as a promising approach to improve efficiency and performance through joint training, rather than training separate models. Motivated by this, we first provide an overview of transformer-based MTL approaches in NLP. Then, we discuss the challenges and opportunities of using MTL approaches throughout typical ML lifecycle phases, specifically focusing on the challenges related to data engineering, model development, deployment, and monitoring phases. This survey focuses on transformer-based MTL architectures and, to the best of our knowledge, is novel in that it systematically analyses how transformer-based MTL in NLP fits into ML lifecycle phases. Furthermore, we motivate research on the connection between MTL and continual learning (CL), as this area remains unexplored. We believe it would be practical to have a model that can handle both MTL and CL, as this would make it easier to periodically re-train the model, update it due to distribution shifts, and add new capabilities to meet real-world requirements.
    摘要 随着自然语言处理(NLP)模型在不同领域的推广,机器学习(ML)实践者需要高效地处理这些模型,从训练到生产环境中部署。然而,训练、部署和更新多个模型可能会复杂、成本高和时间费时,尤其是使用基于转换器的预训练语言模型。多任务学习(MTL)已经作为一种有 Promise的方法,以提高效率和性能,通过共同训练而不是独立训练多个模型。我们首先提供了基于转换器的 MTL 方法在 NLP 领域的概述。然后,我们讨论了在 ML 生命周期阶段中使用 MTL 方法的挑战和机遇,尤其是在数据工程、模型开发、部署和监控阶段。本文主要关注基于转换器的 MTL 架构,并且,到我们所知,是在 ML 生命周期阶段中系统地分析了基于转换器的 MTL 在 NLP 领域的应用。此外,我们还鼓励了研究基于 MTL 和连续学习(CL)的连续关系,因为这个领域还没有得到充分的探索。我们认为,一个能够同时处理 MTL 和 CL 的模型会更加实用,因为这样可以更方便地在 periodic retrained 模型,因应分布Shift 更新模型,并添加新的功能来满足实际需求。

SCQPTH: an efficient differentiable splitting method for convex quadratic programming

  • paper_url: http://arxiv.org/abs/2308.08232
  • repo_url: None
  • paper_authors: Andrew Butler
  • for: This paper addresses solving convex quadratic programs (QPs) in a differentiable way.
  • methods: It uses the alternating direction method of multipliers (ADMM) and operator splitting, with a software implementation motivated by the OSQP solver.
  • results: Experiments show that SCQPTH provides a 1x-10x improvement in computational efficiency over existing differentiable QP solvers on large-scale problems.
    Abstract We present SCQPTH: a differentiable first-order splitting method for convex quadratic programs. The SCQPTH framework is based on the alternating direction method of multipliers (ADMM) and the software implementation is motivated by the state-of-the art solver OSQP: an operating splitting solver for convex quadratic programs (QPs). The SCQPTH software is made available as an open-source python package and contains many similar features including efficient reuse of matrix factorizations, infeasibility detection, automatic scaling and parameter selection. The forward pass algorithm performs operator splitting in the dimension of the original problem space and is therefore suitable for large scale QPs with $100-1000$ decision variables and thousands of constraints. Backpropagation is performed by implicit differentiation of the ADMM fixed-point mapping. Experiments demonstrate that for large scale QPs, SCQPTH can provide a $1\times - 10\times$ improvement in computational efficiency in comparison to existing differentiable QP solvers.
    摘要 我们介绍SCQPTH:一种可微的首项拆分法 для凸quadratic programs。SCQPTH框架基于多方向方法(ADMM),并受到现代QP解析器OSQP的驱动。SCQPTH软件作为一个开源python套件,具有许多相似特性,包括有效的续用矩阵因数、无法构成检测、自动尺度调整和参数选择。forward pass算法在原始问题空间的维度进行操作拆分,适用于大规模QP的解释,具有100-1000个决策变数和千个约束。回传算法使用凸拓扑ADMM定点映射的隐含对偶积分。实验表明,对于大规模QP,SCQPTH可以提供1倍-10倍的计算效率提升,与现有的可微QP解析器相比。
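For readers unfamiliar with the operator-splitting approach mentioned above, here is a rough numpy-only sketch of a basic ADMM iteration for a box-constrained QP, min ½xᵀPx + qᵀx subject to l ≤ x ≤ u. It is not the SCQPTH implementation: the differentiable backward pass (implicit differentiation of the ADMM fixed point), automatic scaling, and infeasibility detection are all omitted.

```python
import numpy as np

def admm_box_qp(P, q, lo, hi, rho=1.0, iters=500):
    """Solve min 0.5 x'Px + q'x subject to lo <= x <= hi via a basic ADMM splitting."""
    n = q.shape[0]
    x = np.zeros(n)
    z = np.zeros(n)      # copy of x constrained to the box
    w = np.zeros(n)      # scaled dual variable
    # Factor (P + rho*I) once and reuse the factorization every iteration.
    L = np.linalg.cholesky(P + rho * np.eye(n))
    for _ in range(iters):
        rhs = -q + rho * (z - w)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))   # x-update: linear solve
        z = np.clip(x + w, lo, hi)                          # z-update: projection onto the box
        w = w + x - z                                       # dual update
    return z

# Example: a tiny strictly convex QP.
P = np.array([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
print(admm_box_qp(P, q, lo=np.zeros(2), hi=np.ones(2)))
```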

Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models

  • paper_url: http://arxiv.org/abs/2308.11521
  • repo_url: None
  • paper_authors: Zhenhua Wang, Wei Xie, Kai Chen, Baosheng Wang, Zhiwen Gui, Enze Wang
  • for: This paper investigates the LLM jailbreak problem and proposes an automatic jailbreak method for the first time.
  • methods: It introduces the concept of a semantic firewall with three technical implementation approaches, and a "self-deception" attack that bypasses the firewall by inducing the LLM to generate prompts that facilitate jailbreak.
  • results: On two models, GPT-3.5-Turbo and GPT-4, the attack achieves success rates of 86.2% and 67%, with failure rates of 4.7% and 2.2%, respectively.
    Abstract Large language models (LLMs), such as ChatGPT, have emerged with astonishing capabilities approaching artificial general intelligence. While providing convenience for various societal needs, LLMs have also lowered the cost of generating harmful content. Consequently, LLM developers have deployed semantic-level defenses to recognize and reject prompts that may lead to inappropriate content. Unfortunately, these defenses are not foolproof, and some attackers have crafted "jailbreak" prompts that temporarily hypnotize the LLM into forgetting content defense rules and answering any improper questions. To date, there is no clear explanation of the principles behind these semantic-level attacks and defenses in both industry and academia. This paper investigates the LLM jailbreak problem and proposes an automatic jailbreak method for the first time. We propose the concept of a semantic firewall and provide three technical implementation approaches. Inspired by the attack that penetrates traditional firewalls through reverse tunnels, we introduce a "self-deception" attack that can bypass the semantic firewall by inducing LLM to generate prompts that facilitate jailbreak. We generated a total of 2,520 attack payloads in six languages (English, Russian, French, Spanish, Chinese, and Arabic) across seven virtual scenarios, targeting the three most common types of violations: violence, hate, and pornography. The experiment was conducted on two models, namely the GPT-3.5-Turbo and GPT-4. The success rates on the two models were 86.2% and 67%, while the failure rates were 4.7% and 2.2%, respectively. This highlighted the effectiveness of the proposed attack method. All experimental code and raw data will be released as open-source to inspire future research. We believe that manipulating AI behavior through carefully crafted prompts will become an important research direction in the future.
    摘要 大型语言模型(LLM),如ChatGPT,已经出现了惊人的能力,仅次于人工智能。这些模型可以为社会的不同需求提供便利,但也lowered the cost of generating harmful content。因此,LLM开发者们已经部署了semantic-level防御来识别和拒绝可能导致不当内容的提示。不过,这些防御并不是不可攻击的,有些攻击者已经crafted "jailbreak"提示,使LLM暂时忘记内容防御规则,回答任何不当问题。到目前为止,在业界和学术界都没有清楚的解释这些semantic-level攻击和防御的原则。本文investigates the LLM jailbreak problem,并提出了一个自动化jailbreak方法。我们提出了semantic firewall的概念,并提供了三种技术实现方法。受到传统防火墙被攻击的启发,我们引入了一种"自我欺骗"攻击,可以绕过semantic firewall,使LLM生成提示,促使jailbreak。我们在六种语言(英文、俄语、法语、西班牙语、中文和阿拉伯语)的七个虚拟enario中,总共生成了2,520个攻击payload。实验使用了GPT-3.5-Turbo和GPT-4两个模型,成功率为86.2%和67%,失败率为4.7%和2.2%。这诉求了我们的提案攻击方法的效果。我们将所有实验代码和原始数据发布为开源,以启发未来的研究。我们相信,通过精心设计的提示,可以操纵AI的行为,将成为未来的重要研究方向。

Exploring Winograd Convolution for Cost-effective Neural Network Fault Tolerance

  • paper_url: http://arxiv.org/abs/2308.08230
  • repo_url: None
  • paper_authors: Xinghua Xue, Cheng Liu, Bo Liu, Haitong Huang, Ying Wang, Tao Luo, Lei Zhang, Huawei Li, Xiaowei Li
  • for: This paper studies the fault-tolerance properties of Winograd convolution in neural networks and how they interact with classical fault-tolerant design approaches.
  • methods: It presents the first comprehensive evaluation of Winograd convolution fault tolerance at different granularities (model, layer, and operation type) and explores exploiting its inherent fault tolerance for cost-effective protection against soft errors.
  • results: Experiments show Winograd convolution reduces fault-tolerant design overhead by 55.77% on average without accuracy loss, and cuts computing overhead by a further 17.24% when its inherent fault tolerance is considered; combined with fault-aware retraining and constrained activation functions, model accuracy improves significantly in the presence of various faults.
    Abstract Winograd is generally utilized to optimize convolution performance and computational efficiency because of the reduced multiplication operations, but the reliability issues brought by winograd are usually overlooked. In this work, we observe the great potential of winograd convolution in improving neural network (NN) fault tolerance. Based on the observation, we evaluate winograd convolution fault tolerance comprehensively from different granularities ranging from models, layers, and operation types for the first time. Then, we explore the use of inherent fault tolerance of winograd convolution for cost-effective NN protection against soft errors. Specifically, we mainly investigate how winograd convolution can be effectively incorporated with classical fault-tolerant design approaches including triple modular redundancy (TMR), fault-aware retraining, and constrained activation functions. According to our experiments, winograd convolution can reduce the fault-tolerant design overhead by 55.77\% on average without any accuracy loss compared to standard convolution, and further reduce the computing overhead by 17.24\% when the inherent fault tolerance of winograd convolution is considered. When it is applied on fault-tolerant neural networks enhanced with fault-aware retraining and constrained activation functions, the resulting model accuracy generally shows significant improvement in presence of various faults.
    摘要 winograd通常用于优化卷积性能和计算效率,因为它减少了多个乘法操作,但winograd引入的可靠性问题通常被忽略。在这项工作中,我们发现winograd卷积可以大幅提高神经网络(NN)fault tolerance。基于这一观察,我们对winograd卷积的FAULT TOLERANCE进行了全面的评估,从不同的粒度(模型、层、操作类型)进行了首次评估。然后,我们探索了使用winograd卷积的内在fault tolerance来实现cost-effectiveNN保护 against soft errors。具体来说,我们主要研究了如何effectively incorporate winograd卷积与传统的 fault-tolerant设计方法,包括TMR、 fault-aware retraining和constrained activation functions。根据我们的实验,winograd卷积可以将FAULT-TOLERANT DESIGN overhead减少55.77%,而无需损失精度,并且在考虑winograd卷积的内在 fault tolerance时,可以减少计算 overhead的17.24%。当应用于强化了FAULT-TOLERANT NEURAL NETWORKS的模型中,模型的准确率在不同的FAULT Condition下都会表现出明显的改善。
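As background for why Winograd convolution is attractive in the first place, here is the standard F(2,3) minimal-filtering identity (not the paper's fault-tolerance machinery): it computes two outputs of a 3-tap 1D convolution with 4 multiplications instead of 6.

```python
import numpy as np

def winograd_f23(d, g):
    """F(2,3) Winograd: 2 outputs of a 3-tap correlation using 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

def direct(d, g):
    """Direct computation for comparison: 6 multiplications."""
    return np.array([d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
                     d[1] * g[0] + d[2] * g[1] + d[3] * g[2]])

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, -1.0, 2.0])
assert np.allclose(winograd_f23(d, g), direct(d, g))
print(winograd_f23(d, g))
```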

Inherent Redundancy in Spiking Neural Networks

  • paper_url: http://arxiv.org/abs/2308.08227
  • repo_url: https://github.com/biclab/asa-snn
  • paper_authors: Man Yao, Jiakui Hu, Guangshe Zhao, Yaoyuan Wang, Ziyang Zhang, Bo Xu, Guoqi Li
  • for: This work investigates the inherent redundancy of spiking neural networks (SNNs) in order to better exploit their advantages in accuracy and energy efficiency.
  • methods: It proposes an Advance Spatial Attention (ASA) module that adaptively optimizes the membrane potential distribution of SNNs with a pair of spatial attention sub-modules, precisely regulating noisy spike features.
  • results: Experiments show the proposed method significantly reduces spike firing while outperforming state-of-the-art SNN baselines.
    Abstract Spiking Neural Networks (SNNs) are well known as a promising energy-efficient alternative to conventional artificial neural networks. Subject to the preconceived impression that SNNs are sparse firing, the analysis and optimization of inherent redundancy in SNNs have been largely overlooked, thus the potential advantages of spike-based neuromorphic computing in accuracy and energy efficiency are interfered. In this work, we pose and focus on three key questions regarding the inherent redundancy in SNNs. We argue that the redundancy is induced by the spatio-temporal invariance of SNNs, which enhances the efficiency of parameter utilization but also invites lots of noise spikes. Further, we analyze the effect of spatio-temporal invariance on the spatio-temporal dynamics and spike firing of SNNs. Then, motivated by these analyses, we propose an Advance Spatial Attention (ASA) module to harness SNNs' redundancy, which can adaptively optimize their membrane potential distribution by a pair of individual spatial attention sub-modules. In this way, noise spike features are accurately regulated. Experimental results demonstrate that the proposed method can significantly drop the spike firing with better performance than state-of-the-art SNN baselines. Our code is available in \url{https://github.com/BICLab/ASA-SNN}.
    摘要 神经网络(SNN)因其能够减少能耗而被广泛认为是软件人工神经网络的有望代替。然而,由于人们对SNN的假设偏见,即SNN的快速发射是稀疏的,因此对SNN中的内在冗余进行分析和优化得到了相对少的关注。在这项工作中,我们提出了三个关键问题,即SNN中冗余的来源、冗余如何影响SNN的空间时间动力学和发射,以及如何利用SNN的冗余来提高准确率和能效性。我们提出了一种提高空间注意力(ASA)模块,该模块可以自适应优化SNN的膜电压分布,并减少噪声发射特征。实验结果表明,我们的方法可以至关重要地降低SNN的噪声发射,并在比较现有SNN基eline的情况下提高性能。我们的代码可以在中找到。

How To Overcome Confirmation Bias in Semi-Supervised Image Classification By Active Learning

  • paper_url: http://arxiv.org/abs/2308.08224
  • repo_url: None
  • paper_authors: Sandra Gilhuber, Rasmus Hvingelby, Mang Ling Ada Fok, Thomas Seidl
  • for: This paper asks whether active learning is still needed now that strong deep semi-supervised methods exist for limited-label settings.
  • methods: It compares semi-supervised learning (SSL) and active learning (AL) under realistic data scenarios to determine whether combining them yields better performance.
  • results: Prior findings favoring SSL with random labeling come from well-established benchmarks that may overestimate external validity. The paper identifies three common real-world data challenges (between-class imbalance, within-class imbalance, and between-class similarity) that hurt SSL through confirmation bias; random sampling does not mitigate this bias and can perform worse than supervised learning, whereas AL can overcome confirmation bias in SSL in these realistic settings.
    Abstract Do we need active learning? The rise of strong deep semi-supervised methods raises doubt about the usability of active learning in limited labeled data settings. This is caused by results showing that combining semi-supervised learning (SSL) methods with a random selection for labeling can outperform existing active learning (AL) techniques. However, these results are obtained from experiments on well-established benchmark datasets that can overestimate the external validity. However, the literature lacks sufficient research on the performance of active semi-supervised learning methods in realistic data scenarios, leaving a notable gap in our understanding. Therefore we present three data challenges common in real-world applications: between-class imbalance, within-class imbalance, and between-class similarity. These challenges can hurt SSL performance due to confirmation bias. We conduct experiments with SSL and AL on simulated data challenges and find that random sampling does not mitigate confirmation bias and, in some cases, leads to worse performance than supervised learning. In contrast, we demonstrate that AL can overcome confirmation bias in SSL in these realistic settings. Our results provide insights into the potential of combining active and semi-supervised learning in the presence of common real-world challenges, which is a promising direction for robust methods when learning with limited labeled data in real-world applications.
    摘要 是否需要活动学习?强大深度半supervised方法的出现使得有限量标注数据的使用可能性受到了质问。这是因为结果表明将半supervised学习(SSL)方法与随机选择标注结合可以超越现有的活动学习(AL)技术。然而,这些结果是基于可靠的 benchmark 数据集进行实验所获得的,这可能会过度估计外部有效性。然而, литераature 缺乏对活动半supervised学习方法在实际数据场景中的性能的研究,这种空白在我们的理解中留下了一个大的坑。因此,我们介绍了三种常见的实际数据挑战: между类差异、 Within 类差异和 между类相似性。这些挑战可能会对 SSL 性能产生负面影响,因为确认偏见。我们在 simulated 数据挑战上进行了 SSL 和 AL 实验,发现随机抽样不能 Mitigate 确认偏见,而在一些情况下,随机抽样甚至比supervised learning(SL)更差。然而,我们示出了 AL 可以在这些实际设置中超越确认偏见。我们的结果提供了对将活动和半supervised学习结合使用在实际数据场景中的可能性的深入理解,这是一个可靠的方法在有限量标注数据的实际应用中学习。

HyperSNN: A new efficient and robust deep learning model for resource constrained control applications

  • paper_url: http://arxiv.org/abs/2308.08222
  • repo_url: None
  • paper_authors: Zhanglu Yan, Shida Wang, Kaiwen Tang, Weng-Fai Wong
  • for: This paper targets edge-computing control applications such as intelligent furniture, robotics, and smart homes, where energy efficiency and robustness are critical.
  • methods: It proposes HyperSNN, a control approach that combines spiking neural networks with hyperdimensional computing and replaces expensive 32-bit floating-point multiplications with 8-bit integer additions, reducing energy consumption while improving robustness and potentially accuracy.
  • results: On AI Gym benchmarks, HyperSNN matches the control accuracy of conventional machine learning methods with only 1.36% to 9.96% of the energy expenditure and shows increased robustness, making it well suited to interactive, mobile, and wearable devices and paving the way for practical model predictive control (MPC) in real-world industrial scenarios.
    Abstract In light of the increasing adoption of edge computing in areas such as intelligent furniture, robotics, and smart homes, this paper introduces HyperSNN, an innovative method for control tasks that uses spiking neural networks (SNNs) in combination with hyperdimensional computing. HyperSNN substitutes expensive 32-bit floating point multiplications with 8-bit integer additions, resulting in reduced energy consumption while enhancing robustness and potentially improving accuracy. Our model was tested on AI Gym benchmarks, including Cartpole, Acrobot, MountainCar, and Lunar Lander. HyperSNN achieves control accuracies that are on par with conventional machine learning methods but with only 1.36% to 9.96% of the energy expenditure. Furthermore, our experiments showed increased robustness when using HyperSNN. We believe that HyperSNN is especially suitable for interactive, mobile, and wearable devices, promoting energy-efficient and robust system design. Furthermore, it paves the way for the practical implementation of complex algorithms like model predictive control (MPC) in real-world industrial scenarios.
    摘要 在智能家居、机器人和智能家庭等领域的边缘计算技术普及化的背景下,这篇论文引入了HyperSNN,一种使用脉冲神经网络(SNN)和高维ensional计算的创新方法,用于控制任务。HyperSNN将昂贵的32位浮点 multiply替换为8位整数加法,从而降低能耗而不失精度,可能提高准确率。我们的模型在 AI Gym 标准彩色环境中进行了测试,包括 Cartpole、Acrobot、MountainCar 和 Lunar Lander 等benchmark。HyperSNN在控制精度方面与传统机器学习方法相当,但只需1.36%到9.96%的能耗。此外,我们的实验表明,使用HyperSNN可以提高系统的可靠性。我们认为HyperSNN特别适合交互式、移动和穿戴式设备,推动能效的系统设计。此外,它开创了实际工业应用场景中实施复杂算法如预测控制(MPC)的实际应用之路。
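A generic hyperdimensional-computing sketch, not the HyperSNN architecture itself: it shows how bipolar hypervectors stored as 8-bit integers can be bundled and compared using only integer additions and dot products, which is the kind of arithmetic the abstract says replaces 32-bit floating-point multiplication. The dimensionality and noise level are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality

def random_hv():
    """Random bipolar hypervector stored as int8."""
    return rng.choice(np.array([-1, 1], dtype=np.int8), size=D)

def bundle(hvs):
    """Bundle (superpose) hypervectors by integer addition, then binarize with sign."""
    s = np.sum(np.asarray(hvs, dtype=np.int32), axis=0)   # accumulate in int32 to avoid overflow
    return np.where(s >= 0, 1, -1).astype(np.int8)

def similarity(a, b):
    """Integer dot product; higher means more similar."""
    return int(np.dot(a.astype(np.int32), b.astype(np.int32)))

# Toy "class prototype" built by bundling noisy copies of a base pattern.
base = random_hv()
noisy = [np.where(rng.random(D) < 0.1, -base, base).astype(np.int8) for _ in range(5)]
prototype = bundle(noisy)
print(similarity(prototype, base), similarity(prototype, random_hv()))
```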

In situ Fault Diagnosis of Indium Tin Oxide Electrodes by Processing S-Parameter Patterns

  • paper_url: http://arxiv.org/abs/2308.11639
  • repo_url: None
  • paper_authors: Tae Yeob Kang, Haebom Lee, Sungho Suh
  • for: This study aims to detect and diagnose faults in indium tin oxide (ITO) electrodes early, to ensure device performance and reliability.
  • methods: It uses scattering-parameter (S-parameter) signal processing, which offers early detection, high diagnostic accuracy, noise robustness, and root-cause analysis.
  • results: Feeding different S-parameter channels jointly into deep learning models (MLP, CNN, and transformer) allows the cause and severity of defects to be analyzed simultaneously and significantly improves diagnostic performance under additive noise, as confirmed by t-SNE visualization.
    Abstract In the field of optoelectronics, indium tin oxide (ITO) electrodes play a crucial role in various applications, such as displays, sensors, and solar cells. Effective fault detection and diagnosis of the ITO electrodes are essential to ensure the performance and reliability of the devices. However, traditional visual inspection is challenging with transparent ITO electrodes, and existing fault detection methods have limitations in determining the root causes of the defects, often requiring destructive evaluations. In this study, an in situ fault diagnosis method is proposed using scattering parameter (S-parameter) signal processing, offering early detection, high diagnostic accuracy, noise robustness, and root cause analysis. A comprehensive S-parameter pattern database is obtained according to defect states. Deep learning (DL) approaches, including multilayer perceptron (MLP), convolutional neural network (CNN), and transformer, are then used to simultaneously analyze the cause and severity of defects. Notably, it is demonstrated that the diagnostic performance under additive noise levels can be significantly enhanced by combining different channels of the S-parameters as input to the learning algorithms, as confirmed through the t-distributed stochastic neighbor embedding (t-SNE) dimension reduction visualization.
    摘要 在光电子学领域,锌镉矿(ITO)电极在不同应用中发挥关键作用,如显示器、感测器和太阳能电池。确切检测和诊断ITO电极的缺陷非常重要,以确保设备的性能和可靠性。然而,传统的视觉检查困难于透明的ITO电极,现有的缺陷检测方法有限制,常需要破坏评估。本研究提出了即场缺陷诊断方法,使用散射参数(S-parameter)信号处理,可以早期检测、高度准确地诊断缺陷,并具有噪声抗性和根本原因分析能力。通过对缺陷状态的S-parameter模式数据库的建立,使用深度学习(DL)方法,包括多层感知网络(MLP)、卷积神经网络(CNN)和变换器,同时分析缺陷的原因和严重程度。尤其是,研究表明,将不同通道的S-parameters作为输入,可以增强对噪声水平的诊断性能,这得到了通过t-分布随机邻居embedding(t-SNE)维度减少视觉化的证明。

Epicure: Distilling Sequence Model Predictions into Patterns

  • paper_url: http://arxiv.org/abs/2308.08203
  • repo_url: None
  • paper_authors: Miltiadis Allamanis, Earl T. Barr
  • for: This paper aims to make machine-learning predictions over high-entropy sequence distributions, such as names, more useful.
  • methods: It presents Epicure, a method that distils the predictions of a sequence model (e.g., beam-search output) into simple patterns by mapping them into a lattice of increasingly general patterns that subsume the concrete predictions.
  • results: On predicting a descriptive function name from its source code body and detecting anomalous names, Epicure's patterns match the ground truth more often than the highest-probability model prediction; at a false alarm rate of 10%, it matches 61% more ground-truth names, making it well suited to high-precision scenarios.
    Abstract Most machine learning models predict a probability distribution over concrete outputs and struggle to accurately predict names over high entropy sequence distributions. Here, we explore finding abstract, high-precision patterns intrinsic to these predictions in order to make abstract predictions that usefully capture rare sequences. In this short paper, we present Epicure, a method that distils the predictions of a sequence model, such as the output of beam search, into simple patterns. Epicure maps a model's predictions into a lattice that represents increasingly more general patterns that subsume the concrete model predictions. On the tasks of predicting a descriptive name of a function given the source code of its body and detecting anomalous names given a function, we show that Epicure yields accurate naming patterns that match the ground truth more often compared to just the highest probability model prediction. For a false alarm rate of 10%, Epicure predicts patterns that match 61% more ground-truth names compared to the best model prediction, making Epicure well-suited for scenarios that require high precision.
    摘要 大多数机器学习模型预测结果为概率分布,困难准确预测高 entropy 序列分布中的名称。我们在这里探索找到高精度抽象模式,以便在罕见序列中准确预测名称。在这篇短文中,我们介绍 Epicure,一种方法,可以将序列模型的预测映射到一个表示增加更一般模式的格子中。在函数体代码中预测函数名和检测异常名称任务上,我们显示 Epicure 可以准确地预测名称,与真实值更常地匹配。对于 false alarm rate 为 10%,Epicure 可以预测匹配真实值的名称61%多于最佳模型预测,使其适用于需要高精度的场景。

DeSCo: Towards Generalizable and Scalable Deep Subgraph Counting

  • paper_url: http://arxiv.org/abs/2308.08198
  • repo_url: None
  • paper_authors: Tianyu Fu, Chiyue Wei, Yu Wang, Rex Ying
  • for: This paper targets large-scale subgraph counting, which is useful in domains such as motif counting for social network analysis and loop counting for money-laundering detection on transaction networks.
  • methods: It proposes DeSCo, a scalable neural deep subgraph counting pipeline that uses a novel canonical partition to divide the large target graph into small neighborhood graphs without missing or double-counting, performs neighborhood counting with an expressive subgraph-based heterogeneous graph neural network, and propagates neighborhood counts via gossip propagation with learnable gates.
  • results: On eight real-world datasets, DeSCo improves the mean squared error of count prediction by 137x over state-of-the-art neural methods while maintaining polynomial runtime.
    Abstract Subgraph counting is the problem of counting the occurrences of a given query graph in a large target graph. Large-scale subgraph counting is useful in various domains, such as motif counting for social network analysis and loop counting for money laundering detection on transaction networks. Recently, to address the exponential runtime complexity of scalable subgraph counting, neural methods are proposed. However, existing neural counting approaches fall short in three aspects. Firstly, the counts of the same query can vary from zero to millions on different target graphs, posing a much larger challenge than most graph regression tasks. Secondly, current scalable graph neural networks have limited expressive power and fail to efficiently distinguish graphs in count prediction. Furthermore, existing neural approaches cannot predict the occurrence position of queries in the target graph. Here we design DeSCo, a scalable neural deep subgraph counting pipeline, which aims to accurately predict the query count and occurrence position on any target graph after one-time training. Firstly, DeSCo uses a novel canonical partition and divides the large target graph into small neighborhood graphs. The technique greatly reduces the count variation while guaranteeing no missing or double-counting. Secondly, neighborhood counting uses an expressive subgraph-based heterogeneous graph neural network to accurately perform counting in each neighborhood. Finally, gossip propagation propagates neighborhood counts with learnable gates to harness the inductive biases of motif counts. DeSCo is evaluated on eight real-world datasets from various domains. It outperforms state-of-the-art neural methods with 137x improvement in the mean squared error of count prediction, while maintaining the polynomial runtime complexity.
    摘要 大规模的子图计数问题是计算给定查询图在大型目标图中出现的次数。这种问题在社会网络分析中计算模式的搜索和金融网络中检测购买交易的循环计数等领域都是有用的。然而,目前的神经方法并不能够准确地解决这个问题。主要的问题包括:首先,查询的计数可以在不同的目标图上从零到百万之间变化,这比大多数图回归任务更加具有挑战性。其次,当前可扩展的图神经网络具有有限的表达能力,无法高效地 distinguishing 图。最后,现有的神经方法无法预测查询在目标图中的出现位置。为了解决这些问题,我们设计了DeSCo,一种可扩展的神经深度子图计数管道。DeSCo通过一种新的准确的分区方法将大型目标图分解成小 neighbohood 图。这种技术可以很好地减少计数的变化,同时保证不会产生错过或重复计数。其次,neighborhood 计数使用表达能力强的不同类型图神经网络来准确地进行计数。最后,gossip 传播使用学习门户来传递 neighborhood 计数,以便利用模式计数的启发性。DeSCo在八个实际数据集上进行了评估,与当前的神经方法相比,具有137倍的平均平方误差提升,同时保持了对数时间复杂度。
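A rough illustration of the neighborhood-decomposition idea described in the abstract, using exact d-hop ego graphs from networkx; DeSCo's actual canonical partition, which guarantees no missed or double-counted occurrences, is more involved, and `radius` here is an assumed hyperparameter.

```python
import networkx as nx

def neighborhood_graphs(G: nx.Graph, radius: int = 2):
    """Yield the induced d-hop neighborhood subgraph around every node."""
    for v in G.nodes:
        yield v, nx.ego_graph(G, v, radius=radius)

G = nx.karate_club_graph()
sizes = {v: H.number_of_nodes() for v, H in neighborhood_graphs(G, radius=1)}
# Each small neighborhood graph can be fed to a neural counter; DeSCo additionally
# uses a canonical partition so that pattern occurrences are neither missed nor double-counted.
print(sorted(sizes.items())[:5])
```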

Leveraging Explainable AI to Analyze Researchers’ Aspect-Based Sentiment about ChatGPT

  • paper_url: http://arxiv.org/abs/2308.11001
  • repo_url: None
  • paper_authors: Shilpa Lakhanpal, Ajay Gupta, Rajeev Agrawal
  • for: This work analyzes researchers' aspect-based sentiment about ChatGPT: which aspects of its use they discuss and what they conclude.
  • methods: It uses explainable AI to facilitate aspect-based sentiment analysis on research data.
  • results: The technique yields valuable insights for extending state-of-the-art aspect-based sentiment analysis to newer datasets, without being hampered by the length of the text data.
    Abstract The groundbreaking invention of ChatGPT has triggered enormous discussion among users across all fields and domains. Among celebration around its various advantages, questions have been raised with regards to its correctness and ethics of its use. Efforts are already underway towards capturing user sentiments around it. But it begs the question as to how the research community is analyzing ChatGPT with regards to various aspects of its usage. It is this sentiment of the researchers that we analyze in our work. Since Aspect-Based Sentiment Analysis has usually only been applied on a few datasets, it gives limited success and that too only on short text data. We propose a methodology that uses Explainable AI to facilitate such analysis on research data. Our technique presents valuable insights into extending the state of the art of Aspect-Based Sentiment Analysis on newer datasets, where such analysis is not hampered by the length of the text data.

Endogenous Macrodynamics in Algorithmic Recourse

  • paper_url: http://arxiv.org/abs/2308.08187
  • repo_url: https://github.com/pat-alt/endogenous-macrodynamics-in-algorithmic-recourse
  • paper_authors: Patrick Altmeyer, Giovan Angela, Aleksander Buszydlik, Karol Dobiczek, Arie van Deursen, Cynthia C. S. Liem
  • for: This paper studies counterfactual explanations (CE) and algorithmic recourse (AR) in dynamic environments, including how to handle data and model drift.
  • methods: It describes existing methodologies within a generalized framework and identifies a hidden external cost of recourse that only reveals itself when the endogenous dynamics of recourse are studied at the group level.
  • results: Simulation experiments with several state-of-the-art counterfactual generators and benchmark datasets show that recourse induces substantial domain and model shifts that may impede its applicability, although several mitigation strategies are identified; the simulation framework is fast and open-sourced.
    Abstract Existing work on Counterfactual Explanations (CE) and Algorithmic Recourse (AR) has largely focused on single individuals in a static environment: given some estimated model, the goal is to find valid counterfactuals for an individual instance that fulfill various desiderata. The ability of such counterfactuals to handle dynamics like data and model drift remains a largely unexplored research challenge. There has also been surprisingly little work on the related question of how the actual implementation of recourse by one individual may affect other individuals. Through this work, we aim to close that gap. We first show that many of the existing methodologies can be collectively described by a generalized framework. We then argue that the existing framework does not account for a hidden external cost of recourse, that only reveals itself when studying the endogenous dynamics of recourse at the group level. Through simulation experiments involving various state-of the-art counterfactual generators and several benchmark datasets, we generate large numbers of counterfactuals and study the resulting domain and model shifts. We find that the induced shifts are substantial enough to likely impede the applicability of Algorithmic Recourse in some situations. Fortunately, we find various strategies to mitigate these concerns. Our simulation framework for studying recourse dynamics is fast and opensourced.
    摘要 现有的工作主要关注单个个体在静止环境下的Counterfactual Explanations(CE)和Algorithmic Recourse(AR),即给一个估计模型下找到满足多种要求的有效counterfactuals。然而,这些counterfactuals在数据和模型变化时的处理能力仍然是一个未经探索的研究挑战。此外,很少有研究关于个体实施救济后对其他个体的影响。我们通过这项工作来填补这一差距。我们首先表明了许多现有方法可以总结为一个通用框架。然后,我们 argue that现有的框架不会考虑一种隐藏的外部成本,只有在研究群体级别的救济动力学时才能发现。通过使用多种state-of-the-art counterfactual生成器和多个标准数据集,我们生成了大量的counterfactuals,并研究其导致的领域和模型变化。我们发现这些变化足够大,可能会阻碍救济的应用。幸运地,我们发现了多种缓解这些问题的策略。我们的救济动力学 simulate框架快速且开源。

Accelerating Generic Graph Neural Networks via Architecture, Compiler, Partition Method Co-Design

  • paper_url: http://arxiv.org/abs/2308.08174
  • repo_url: None
  • paper_authors: Shuwen Lu, Zhihui Zhang, Cong Guo, Jingwen Leng, Yangjie Zhou, Minyi Guo
  • for: This paper aims to develop a high-performance and efficient hardware acceleration framework for graph neural network (GNN) models, addressing the challenges of high bandwidth requirements and model diversity.
  • methods: The proposed framework, called SwitchBlade, utilizes a new type of partition-level operator fusion, partition-level multi-threading, and fine-grained graph partitioning to reduce the bandwidth requirement and improve hardware utilization.
  • results: The proposed framework achieves an average speedup of 1.85 times and energy savings of 19.03 times compared to the NVIDIA V100 GPU, and delivers performance comparable to state-of-the-art specialized accelerators.
    Abstract Graph neural networks (GNNs) have shown significant accuracy improvements in a variety of graph learning domains, sparking considerable research interest. To translate these accuracy improvements into practical applications, it is essential to develop high-performance and efficient hardware acceleration for GNN models. However, designing GNN accelerators faces two fundamental challenges: the high bandwidth requirement of GNN models and the diversity of GNN models. Previous works have addressed the first challenge by using more expensive memory interfaces to achieve higher bandwidth. For the second challenge, existing works either support specific GNN models or have generic designs with poor hardware utilization. In this work, we tackle both challenges simultaneously. First, we identify a new type of partition-level operator fusion, which we utilize to internally reduce the high bandwidth requirement of GNNs. Next, we introduce partition-level multi-threading to schedule the concurrent processing of graph partitions, utilizing different hardware resources. To further reduce the extra on-chip memory required by multi-threading, we propose fine-grained graph partitioning to generate denser graph partitions. Importantly, these three methods make no assumptions about the targeted GNN models, addressing the challenge of model variety. We implement these methods in a framework called SwitchBlade, consisting of a compiler, a graph partitioner, and a hardware accelerator. Our evaluation demonstrates that SwitchBlade achieves an average speedup of $1.85\times$ and energy savings of $19.03\times$ compared to the NVIDIA V100 GPU. Additionally, SwitchBlade delivers performance comparable to state-of-the-art specialized accelerators.
    摘要 格raph神经网络(GNN)在各种图学任务中表现出了显著的准确性改进,引发了广泛的研究兴趣。为将这些准确性改进应用于实际场景,必须开发高性能和高效的硬件加速器 для GNN 模型。然而,设计 GNN 加速器面临两个基本挑战:GNN 模型的带宽要求高,以及 GNN 模型的多样性。先前的工作通过使用更昂贵的内存接口来实现更高的带宽来解决第一个挑战。对于第二个挑战,现有的工作ether支持特定的 GNN 模型或者有通用的设计,导致硬件利用率低下。在这个工作中,我们同时解决了这两个挑战。首先,我们发现了一种新的合并类型的分区级操作,我们利用这种合并来减少 GNN 模型的带宽要求。然后,我们引入分区级多线程来调度图分 partitions 的同时处理,使用不同的硬件资源。为了避免多线程增加的额外内存开销,我们提议细化的图分解。这三种方法不仅不假设目标 GNN 模型,而且还可以减少硬件内存占用。我们将这些方法集成到一个名为 SwitchBlade 的框架中,包括编译器、图分解器和硬件加速器。我们的评估表明,SwitchBlade 可以在 NVIDIA V100 GPU 上实现平均的速度提升 $1.85\times$ 和能源减少 $19.03\times$。此外,SwitchBlade 可以与当前的特化加速器相比。

Expressivity of Graph Neural Networks Through the Lens of Adversarial Robustness

  • paper_url: http://arxiv.org/abs/2308.08173
  • repo_url: https://github.com/francesco-campi/rob-subgraphs
  • paper_authors: Francesco Campi, Lukas Gosch, Tom Wollschläger, Yan Scholten, Stephan Günnemann
  • for: This paper studies the adversarial robustness of Graph Neural Networks (GNNs) and compares their expressive power to traditional Message Passing Neural Networks (MPNNs).
  • methods: The paper uses adversarial attacks to test the ability of GNNs to count specific subgraph patterns, and extends the concept of adversarial robustness to this task.
  • results: The paper shows that more powerful GNNs fail to generalize to small perturbations to the graph’s structure and fail to count substructures on out-of-distribution graphs.
    Abstract We perform the first adversarial robustness study into Graph Neural Networks (GNNs) that are provably more powerful than traditional Message Passing Neural Networks (MPNNs). In particular, we use adversarial robustness as a tool to uncover a significant gap between their theoretically possible and empirically achieved expressive power. To do so, we focus on the ability of GNNs to count specific subgraph patterns, which is an established measure of expressivity, and extend the concept of adversarial robustness to this task. Based on this, we develop efficient adversarial attacks for subgraph counting and show that more powerful GNNs fail to generalize even to small perturbations to the graph's structure. Expanding on this, we show that such architectures also fail to count substructures on out-of-distribution graphs.
    摘要 我们进行了第一个对图神经网络(GNNs)的逆攻击 robustness 研究,发现它们在许多情况下比传统的消息传递神经网络(MPNNs)更具有潜在的表达能力。特别是,我们使用逆攻击 robustness 作为一种工具,揭示了 GNNs 的表达能力与理论可能的表达能力之间存在很大的差距。为此,我们专注于 GNNs 的子图计数能力,这是一个已知的表达能力指标,并将逆攻击 robustness 扩展到这个任务。 Based on this, we develop efficient adversarial attacks for subgraph counting and show that more powerful GNNs fail to generalize even to small perturbations to the graph's structure. In addition, we show that such architectures also fail to count substructures on out-of-distribution graphs.
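A tiny illustration of the sensitivity the paper probes: a single-edge perturbation can change a substructure count (here, triangles, computed exactly with networkx rather than by a GNN), which is exactly what an adversarially robust counting model would need to track.

```python
import networkx as nx

def triangle_count(G: nx.Graph) -> int:
    """Total number of triangles in G (each triangle counted once)."""
    return sum(nx.triangles(G).values()) // 3

G = nx.karate_club_graph()
before = triangle_count(G)

H = G.copy()
u, v = next(iter(G.edges))   # remove an arbitrary existing edge: a single-edge perturbation
H.remove_edge(u, v)
after = triangle_count(H)

print(before, after)
```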

AATCT-IDS: A Benchmark Abdominal Adipose Tissue CT Image Dataset for Image Denoising, Semantic Segmentation, and Radiomics Evaluation

  • paper_url: http://arxiv.org/abs/2308.08172
  • repo_url: None
  • paper_authors: Zhiyu Ma, Chen Li, Tianming Du, Le Zhang, Dechao Tang, Deguo Ma, Shanchuan Huang, Yan Liu, Yihao Sun, Zhihao Chen, Jin Yuan, Qianqing Nie, Marcin Grzegorzek, Hongzan Sun
  • for: This study explores the research potential of the Abdominal Adipose Tissue CT Image Dataset (AATTCT-IDS).
  • methods: Using AATTCT-IDS, it compares and analyzes the performance of various methods on three tasks: image denoising, semantic segmentation, and radiomics analysis.
  • results: In denoising, smoothing-based algorithms suppress mixed noise at the cost of image detail, while methods such as BM3D better preserve image structure; in semantic segmentation, BiSeNet obtains results only slightly inferior to U-Net with the shortest training time and effectively separates small, isolated adipose tissue; the radiomics study reveals three adipose distributions in the subject population.
    Abstract Methods: In this study, a benchmark \emph{Abdominal Adipose Tissue CT Image Dataset} (AATTCT-IDS) containing 300 subjects is prepared and published. AATTCT-IDS publics 13,732 raw CT slices, and the researchers individually annotate the subcutaneous and visceral adipose tissue regions of 3,213 of those slices that have the same slice distance to validate denoising methods, train semantic segmentation models, and study radiomics. For different tasks, this paper compares and analyzes the performance of various methods on AATTCT-IDS by combining the visualization results and evaluation data. Thus, verify the research potential of this data set in the above three types of tasks. Results: In the comparative study of image denoising, algorithms using a smoothing strategy suppress mixed noise at the expense of image details and obtain better evaluation data. Methods such as BM3D preserve the original image structure better, although the evaluation data are slightly lower. The results show significant differences among them. In the comparative study of semantic segmentation of abdominal adipose tissue, the segmentation results of adipose tissue by each model show different structural characteristics. Among them, BiSeNet obtains segmentation results only slightly inferior to U-Net with the shortest training time and effectively separates small and isolated adipose tissue. In addition, the radiomics study based on AATTCT-IDS reveals three adipose distributions in the subject population. Conclusion: AATTCT-IDS contains the ground truth of adipose tissue regions in abdominal CT slices. This open-source dataset can attract researchers to explore the multi-dimensional characteristics of abdominal adipose tissue and thus help physicians and patients in clinical practice. AATCT-IDS is freely published for non-commercial purpose at: \url{https://figshare.com/articles/dataset/AATTCT-IDS/23807256}.
    摘要 方法:本研究使用的Benchmark数据集是《 Abdomen Adipose Tissue CT Image Dataset》(AATTCT-IDS),包含300个研究对象,并公布了13,732个Raw CT slice。研究人员对3,213个slice进行了手动标注,以验证减噪方法、训练semantic segmentation模型和研究 ради米克特性。通过组合视觉化结果和评估数据来对不同任务进行比较分析,以验证数据集的研究潜力。结果:在图像减噪比较研究中,使用缓和策略的算法可以更好地抑制杂噪,但是会增加图像细节的损失。BM3D算法可以更好地保持原始图像结构,但评估数据略为下降。结果显示了不同算法之间存在很大的差异。在 Abdomen Adipose Tissue Semantic Segmentation 比较研究中,每种模型的 segmentation 结果具有不同的结构特征。BISeNet模型可以在短时间内 obtener segmentation 结果,与 U-Net 相当,并且能够有效地分离小 isolated adipose tissue。此外,基于 AATTCT-IDS 的 ради米克特研究发现了脂肪分布的三种类型。结论:AATTCT-IDS 包含了 Abdomen CT 图像中脂肪组织区域的真实标准。这个开源数据集可以吸引研究人员通过多维度特征的探索,帮助临床医生和病人。AATTCT-IDS 采用非商业用途发布,可以免费下载,请参考:https://figshare.com/articles/dataset/AATTCT-IDS/23807256。

A Quantum Approximation Scheme for k-Means

  • paper_url: http://arxiv.org/abs/2308.08167
  • repo_url: None
  • paper_authors: Ragesh Jaiswal
  • for: This paper targets the classical k-means clustering problem and proposes a quantum approximation scheme whose running time has only polylogarithmic dependence on the number of data points.
  • methods: The algorithm assumes the dataset is stored in a QRAM data structure and provides a $(1 + \varepsilon)$-approximation for every $\varepsilon > 0$.
  • results: It runs in time $\tilde{O} \left( 2^{\tilde{O}(\frac{k}{\varepsilon})} \eta^2 d\right)$ and, with high probability, outputs $k$ centers $C$ such that $cost(V, C) \leq (1+\varepsilon) \cdot cost(V, C_{OPT})$, where $cost(.)$ is the standard $k$-means cost (the sum of squared distances of points to their closest center) and $\eta$ is the aspect ratio (the ratio of the maximum to the minimum pairwise distance). It is the first quantum algorithm with polylogarithmic running time and a provable $(1+\varepsilon)$ guarantee for $k$-means, and it requires no quantum linear algebra subroutines, so its running time is independent of parameters such as the condition number.
    Abstract We give a quantum approximation scheme (i.e., $(1 + \varepsilon)$-approximation for every $\varepsilon > 0$) for the classical $k$-means clustering problem in the QRAM model with a running time that has only polylogarithmic dependence on the number of data points. More specifically, given a dataset $V$ with $N$ points in $\mathbb{R}^d$ stored in QRAM data structure, our quantum algorithm runs in time $\tilde{O} \left( 2^{\tilde{O}(\frac{k}{\varepsilon})} \eta^2 d\right)$ and with high probability outputs a set $C$ of $k$ centers such that $cost(V, C) \leq (1+\varepsilon) \cdot cost(V, C_{OPT})$. Here $C_{OPT}$ denotes the optimal $k$-centers, $cost(.)$ denotes the standard $k$-means cost function (i.e., the sum of the squared distance of points to the closest center), and $\eta$ is the aspect ratio (i.e., the ratio of maximum distance to minimum distance). This is the first quantum algorithm with a polylogarithmic running time that gives a provable approximation guarantee of $(1+\varepsilon)$ for the $k$-means problem. Also, unlike previous works on unsupervised learning, our quantum algorithm does not require quantum linear algebra subroutines and has a running time independent of parameters (e.g., condition number) that appear in such procedures.
    摘要 我们提供一种量子近似方案(即$(1+\varepsilon)$-近似方案,其中$\varepsilon > 0$) для классической$k$-Means分布问题在QRAM模型中,并且running时间只具有多项式幂ilogarithmic(polylogarithmic)依赖于数据点的数量。更具体地,给定一个数据集$V$包含$N$个点的归一化在QRAM数据结构中,我们的量子算法在时间 $\tilde{O} \left( 2^{\tilde{O}(\frac{k}{\varepsilon})} \eta^2 d\right)$ 内运行,并且高probability输出一组$C$的$k$个中心,使得 $cost(V, C) \leq (1+\varepsilon) \cdot cost(V, C_{OPT})$,其中$C_{OPT}$表示最优的$k$-中心,$cost(.)$表示标准的$k$-Means成本函数(即点到最近中心的平方距离之和),并且$\eta$是最大距离与最小距离的比率。这是首个具有多项式幂ilogarithmic running time的量子算法,并且不需要量子线性代数子算法,运行时间不依赖于参数(例如condition number)的量子学习算法。
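For reference, the objective and the guarantee stated in the abstract can be written compactly as follows.

```latex
\[
\operatorname{cost}(V, C) = \sum_{v \in V} \min_{c \in C} \lVert v - c \rVert^2,
\qquad
\operatorname{cost}(V, C) \le (1+\varepsilon)\,\operatorname{cost}(V, C_{OPT}),
\]
\[
\text{running time } \tilde{O}\!\left(2^{\tilde{O}(k/\varepsilon)}\, \eta^{2} d \right),
\qquad
\eta = \frac{\max_{u,v \in V} \lVert u - v \rVert}{\min_{u \ne v \in V} \lVert u - v \rVert}.
\]
```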

PEvoLM: Protein Sequence Evolutionary Information Language Model

  • paper_url: http://arxiv.org/abs/2308.08578
  • repo_url: https://github.com/issararab/pevolm
  • paper_authors: Issar Arab
  • for: This paper proposes a language-model-based approach to capturing the evolutionary information produced by multiple-sequence alignment, to improve protein sequence analysis in bioinformatics.
  • methods: It introduces a bidirectional LSTM language model (bi-LM) that merges the idea of PSSMs with transfer learning, using a single path for both the forward and backward passes and four times fewer free parameters than the original ELMo.
  • results: Trained with multi-task learning, the model predicts both the next amino acid and the PSSM-derived probability distribution of the next amino acid, thereby also learning the evolutionary information of protein sequences.
    Abstract With the exponential increase of the protein sequence databases over time, multiple-sequence alignment (MSA) methods, like PSI-BLAST, perform exhaustive and time-consuming database search to retrieve evolutionary information. The resulting position-specific scoring matrices (PSSMs) of such search engines represent a crucial input to many machine learning (ML) models in the field of bioinformatics and computational biology. A protein sequence is a collection of contiguous tokens or characters called amino acids (AAs). The analogy to natural language allowed us to exploit the recent advancements in the field of Natural Language Processing (NLP) and therefore transfer NLP state-of-the-art algorithms to bioinformatics. This research presents an Embedding Language Model (ELMo), converting a protein sequence to a numerical vector representation. While the original ELMo trained a 2-layer bidirectional Long Short-Term Memory (LSTMs) network following a two-path architecture, one for the forward and the second for the backward pass, by merging the idea of PSSMs with the concept of transfer-learning, this work introduces a novel bidirectional language model (bi-LM) with four times less free parameters and using rather a single path for both passes. The model was trained not only on predicting the next AA but also on the probability distribution of the next AA derived from similar, yet different sequences as summarized in a PSSM, simultaneously for multi-task learning, hence learning evolutionary information of protein sequences as well. The network architecture and the pre-trained model are made available as open source under the permissive MIT license on GitHub at https://github.com/issararab/PEvoLM.
    摘要 随着蛋白序列数据库的呈指数增长,多重序列对齐(MSA)方法如PSI-BLAST在时间上进行耗时和耗力的数据库搜索,以获取进化信息。得到的位置特异分数据(PSSM)被多种机器学习(ML)模型在生物信息学和计算生物学中作为重要输入。蛋白序列是一系列连续的字符或氨基酸(AA)的集合。通过将蛋白序列与自然语言的相似性进行比较,我们可以利用自然语言处理(NLP)领域的最新进展,并将其应用到生物信息学中。本研究投入了一个Embedding Language Model(ELMo),将蛋白序列转换为数值vector表示。在原ELMo模型中,一个2层扩展LSTM网络按照两路架构,一路为前向传输,另一路为后向传输。在将PSSM的概念与传输学习混合到一起的基础上,本工作提出了一种新的双向语言模型(bi-LM),具有四倍少的自由参数,并使用单路架构进行两个方向的传输。该模型在预测下一个AA以外,同时也预测来自相似 yet different 序列的AA的概率分布,即PSSM,并在多任务学习中同时学习蛋白序列的进化信息。网络架构和预训练模型在MIT免费许可下在GitHub上提供,可以在https://github.com/issararab/PEvoLM中下载。

Stochastic Controlled Averaging for Federated Learning with Communication Compression

  • paper_url: http://arxiv.org/abs/2308.08165
  • repo_url: None
  • paper_authors: Xinmeng Huang, Ping Li, Xiaoyun Li
  • for: This work aims to reduce the communication overhead of federated learning (FL) and improve its efficiency and scalability.
  • methods: It revisits stochastic controlled averaging with an equivalent but more efficient/simplified formulation that halves uplink communication costs, and builds two compressed FL algorithms, SCALLION and SCAFCOM, supporting unbiased and biased compression, respectively.
  • results: Experiments show that SCALLION and SCAFCOM match the performance of corresponding full-precision FL methods with substantially reduced uplink communication, outperform recent compressed FL methods under the same communication budget, and accommodate arbitrary data heterogeneity.
    Abstract Communication compression, a technique aiming to reduce the information volume to be transmitted over the air, has gained great interests in Federated Learning (FL) for the potential of alleviating its communication overhead. However, communication compression brings forth new challenges in FL due to the interplay of compression-incurred information distortion and inherent characteristics of FL such as partial participation and data heterogeneity. Despite the recent development, the performance of compressed FL approaches has not been fully exploited. The existing approaches either cannot accommodate arbitrary data heterogeneity or partial participation, or require stringent conditions on compression. In this paper, we revisit the seminal stochastic controlled averaging method by proposing an equivalent but more efficient/simplified formulation with halved uplink communication costs. Building upon this implementation, we propose two compressed FL algorithms, SCALLION and SCAFCOM, to support unbiased and biased compression, respectively. Both the proposed methods outperform the existing compressed FL methods in terms of communication and computation complexities. Moreover, SCALLION and SCAFCOM accommodates arbitrary data heterogeneity and do not make any additional assumptions on compression errors. Experiments show that SCALLION and SCAFCOM can match the performance of corresponding full-precision FL approaches with substantially reduced uplink communication, and outperform recent compressed FL methods under the same communication budget.
    摘要 通信压缩,一种目的是减少在空中传输的信息量,在联合学习(FL)中获得了广泛的关注,因为它可能减轻联合学习的通信负担。然而,通信压缩在FL中带来了新的挑战,这是因为压缩引入的信息扭曲和联合学习的自然特点,如数据不同性和部分参与。尽管有最近的发展,已有的压缩FL方法的性能尚未被完全利用。现有的方法 Either cannot accommodate arbitrary data heterogeneity or partial participation, or require stringent conditions on compression.在这篇论文中,我们重新考虑了seminal stochastic controlled averaging方法,并提出了一种更加有效/简单的表述,减少了上行通信成本的一半。基于这个实现,我们提出了两种压缩FL算法,即SCALLION和SCAFCOM,以支持不偏和偏 compression。两种方法在communication和计算复杂度方面都有更好的性能,并且可以满足任意数据不同性和不添加任何压缩错误的假设。实验表明,SCALLION和SCAFCOM可以与相应的全精度FL方法匹配性能,并且在相同的通信预算下出perform recent compressed FL methods。
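A minimal sketch of two standard compression operators of the kind compressed-FL methods assume: unbiased random-k sparsification and biased top-k sparsification. These are generic examples, not the specific compressors used by SCALLION or SCAFCOM.

```python
import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Unbiased random-k sparsification: keep k coordinates, rescale by d/k so E[C(x)] = x."""
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = x[idx] * (d / k)
    return out

def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Biased top-k sparsification: keep the k largest-magnitude coordinates."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

rng = np.random.default_rng(0)
g = rng.normal(size=1000)   # e.g., a client's local update before uplink transmission
# Averaging many rand_k compressions recovers g (unbiasedness), unlike top_k.
print(np.mean([rand_k(g, 100, rng) for _ in range(2000)], axis=0)[:3], g[:3])
```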

Characteristics of networks generated by kernel growing neural gas

  • paper_url: http://arxiv.org/abs/2308.08163
  • repo_url: https://github.com/kazuhisafujita/kernelgng
  • paper_authors: Kazuhisa Fujita
  • for: This research develops kernel GNG, a kernelized version of the growing neural gas (GNG) algorithm, and investigates the characteristics of the networks it generates.
  • methods: Five kernels are used to map the dataset into feature space: Gaussian, Laplacian, Cauchy, inverse multiquadric, and log kernels.
  • results: Kernel GNG generates networks whose characteristics differ depending on the kernel used.
    Abstract This research aims to develop kernel GNG, a kernelized version of the growing neural gas (GNG) algorithm, and to investigate the features of the networks generated by the kernel GNG. The GNG is an unsupervised artificial neural network that can transform a dataset into an undirected graph, thereby extracting the features of the dataset as a graph. The GNG is widely used in vector quantization, clustering, and 3D graphics. Kernel methods are often used to map a dataset to feature space, with support vector machines being the most prominent application. This paper introduces the kernel GNG approach and explores the characteristics of the networks generated by kernel GNG. Five kernels, including Gaussian, Laplacian, Cauchy, inverse multiquadric, and log kernels, are used in this study.
    摘要 这项研究的目标是开发kernel GNG,即基于基域神经网络(GNG)算法的基域化版本,并研究由kernel GNG生成的网络特性。GNG是一种无监督的人工神经网络,可以将数据集转换成无向图,从而提取数据集中的特征。GNG广泛应用于 вектор量化、归一化和3D图形。基域方法通常用于将数据集映射到特征空间,Support Vector Machines(SVM)是最广泛应用的应用。本文介绍了基域GNG方法,并探讨由基域GNG生成的网络特性。本研究使用的五种基域包括 Gaussian、Laplacian、Cauchy、 inverse multiquadric 和 log 基域。
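A small sketch of the five kernels named above, written as functions of the Euclidean distance between two points; the exact parameterizations (bandwidths, offsets, the log-kernel exponent) used in the paper's kernel GNG are assumptions here.

```python
import numpy as np

def sq_dist(x, y):
    return float(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

def gaussian(x, y, sigma=1.0):
    return np.exp(-sq_dist(x, y) / (2.0 * sigma ** 2))

def laplacian(x, y, sigma=1.0):
    return np.exp(-np.sqrt(sq_dist(x, y)) / sigma)

def cauchy(x, y, sigma=1.0):
    return 1.0 / (1.0 + sq_dist(x, y) / sigma ** 2)

def inverse_multiquadric(x, y, c=1.0):
    return 1.0 / np.sqrt(sq_dist(x, y) + c ** 2)

def log_kernel(x, y, d=2.0):
    # Conditionally positive definite "log" kernel; sign/offset conventions vary in the literature.
    return -np.log(np.sqrt(sq_dist(x, y)) ** d + 1.0)

x, y = [0.0, 0.0], [1.0, 1.0]
print(gaussian(x, y), laplacian(x, y), cauchy(x, y), inverse_multiquadric(x, y), log_kernel(x, y))
```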

Interpretability Benchmark for Evaluating Spatial Misalignment of Prototypical Parts Explanations

  • paper_url: http://arxiv.org/abs/2308.08162
  • repo_url: None
  • paper_authors: Mikołaj Sacha, Bartosz Jura, Dawid Rymarczyk, Łukasz Struski, Jacek Tabor, Bartosz Zieliński
  • for: 该研究旨在检验和解释Prototypical Parts Networks(PPN)的准确性和可解释性问题。
  • methods: 研究人员提出了一种新的可解释性指标集,用于评估PPN模型中prototype activation region的准确性和可解释性问题。此外,他们还提出了一种misalignment compensation方法,用于解决这种问题。
  • results: 研究人员通过广泛的实验研究,证明了他们的指标集和补偿方法的有效性。他们发现,使用该方法后,原型部件网络能够更准确地定位并解释图像中的相应部件。
    Abstract Prototypical parts-based networks are becoming increasingly popular due to their faithful self-explanations. However, their similarity maps are calculated in the penultimate network layer. Therefore, the receptive field of the prototype activation region often depends on parts of the image outside this region, which can lead to misleading interpretations. We name this undesired behavior a spatial explanation misalignment and introduce an interpretability benchmark with a set of dedicated metrics for quantifying this phenomenon. In addition, we propose a method for misalignment compensation and apply it to existing state-of-the-art models. We show the expressiveness of our benchmark and the effectiveness of the proposed compensation methodology through extensive empirical studies.
    摘要 基于原型部件的网络因其忠实的自我解释能力而日益受到欢迎。然而,它们的相似度图是在网络的倒数第二层计算的,因此原型激活区域的感受野往往依赖于该区域之外的图像部分,这可能导致误导性的解释。我们将这种不期望的行为称为空间解释错位,并引入一个带有专门度量指标的可解释性基准来量化这一现象。此外,我们提出了一种错位补偿方法,并将其应用于现有的最先进模型。我们通过广泛的实验研究证明了该基准的表达能力以及所提补偿方法的有效性。

Benchmarking Adversarial Robustness of Compressed Deep Learning Models

  • paper_url: http://arxiv.org/abs/2308.08160
  • repo_url: None
  • paper_authors: Brijesh Vora, Kartik Patwari, Syed Mahbub Hafiz, Zubair Shafiq, Chen-Nee Chuah
  • for: 本研究旨在探讨针对基本模型的攻击输入对压缩后的模型的影响。
  • methods: 我们开发了一个多样化的攻击测试环境,并对多种常用的深度神经网络模型进行了探索。我们采用了优化的压缩策略,以保持准确性和性能。
  • results: 我们发现,即使压缩后的模型具有更好的总体性能和执行速度,它们对攻击输入的抵抗性仍然保持相对不变。这表明,模型压缩不会对针对攻击的鲁棒性产生负面影响。
    Abstract The increasing size of Deep Neural Networks (DNNs) poses a pressing need for model compression, particularly when employed on resource constrained devices. Concurrently, the susceptibility of DNNs to adversarial attacks presents another significant hurdle. Despite substantial research on both model compression and adversarial robustness, their joint examination remains underexplored. Our study bridges this gap, seeking to understand the effect of adversarial inputs crafted for base models on their pruned versions. To examine this relationship, we have developed a comprehensive benchmark across diverse adversarial attacks and popular DNN models. We uniquely focus on models not previously exposed to adversarial training and apply pruning schemes optimized for accuracy and performance. Our findings reveal that while the benefits of pruning enhanced generalizability, compression, and faster inference times are preserved, adversarial robustness remains comparable to the base model. This suggests that model compression while offering its unique advantages, does not undermine adversarial robustness.
    摘要 随着深度神经网络(DNN)规模不断增大,模型压缩成为一项迫切需求,尤其是在资源受限的设备上。与此同时,DNN对对抗样本的脆弱性也是一大难题。尽管在模型压缩和对抗鲁棒性方面已有大量研究,但二者的联合研究仍然不足。我们的研究尝试填补这一空白,探讨针对基础模型构造的对抗输入对其剪枝版本的影响。为此,我们在多种对抗攻击和流行的DNN模型之间建立了一个全面的基准。我们特别关注未经过对抗训练的模型,并采用以准确率和性能为目标优化的剪枝方案。我们的发现表明,剪枝带来的泛化性提升、压缩和更快的推理速度得以保留,而对抗鲁棒性与基础模型相当。这表明模型压缩在提供其独特优势的同时,并不会削弱对抗鲁棒性。
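
For a toy-scale version of the spirit of this benchmark, the sketch below trains a small classifier, applies global magnitude pruning, and compares FGSM accuracy before and after pruning. The architecture, synthetic data, attack budget, and pruning ratio are illustrative assumptions, far simpler than the model zoo and attack suite evaluated in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)
# Toy two-class data: the label is a linear function of the first two coordinates.
X = torch.randn(2000, 20)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).long()

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):                       # standard (non-adversarial) training
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

def fgsm_accuracy(model, X, y, eps=0.1):
    """Accuracy under a single-step FGSM perturbation of size eps."""
    X_adv = X.clone().requires_grad_(True)
    loss = loss_fn(model(X_adv), y)
    loss.backward()
    X_pert = X + eps * X_adv.grad.sign()
    with torch.no_grad():
        return (model(X_pert).argmax(1) == y).float().mean().item()

print("clean acc:", (model(X).argmax(1) == y).float().mean().item())
print("FGSM acc before pruning:", fgsm_accuracy(model, X, y))

# Global L1 magnitude pruning of 70% of the weights (no adversarial training).
params = [(m, "weight") for m in model if isinstance(m, nn.Linear)]
prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=0.7)

print("FGSM acc after pruning:", fgsm_accuracy(model, X, y))
```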

Deep Generative Imputation Model for Missing Not At Random Data

  • paper_url: http://arxiv.org/abs/2308.08158
  • repo_url: None
  • paper_authors: Jialei Chen, Yuanbo Xu, Pengyang Wang, Yongjian Yang
  • for: 解决Missing Not At Random (MNAR)问题,即数据损失的原因不完全 observable。
  • methods: 提出了一种基于 JOINT 分布的模型,并使用深度生成模型来处理实际世界中的损失机制,以并行地恢复不完整数据和重建损失的面具。
  • results: 与最先进的基线相比,提出的GNR模型在RMSE上平均提升9.9%至18.8%,并且在缺失掩码的重建精度上始终更优,使插补结果更有原则性。
    Abstract Data analysis usually suffers from the Missing Not At Random (MNAR) problem, where the cause of the value missing is not fully observed. Compared to the naive Missing Completely At Random (MCAR) problem, it is more in line with the realistic scenario whereas more complex and challenging. Existing statistical methods model the MNAR mechanism by different decomposition of the joint distribution of the complete data and the missing mask. But we empirically find that directly incorporating these statistical methods into deep generative models is sub-optimal. Specifically, it would neglect the confidence of the reconstructed mask during the MNAR imputation process, which leads to insufficient information extraction and less-guaranteed imputation quality. In this paper, we revisit the MNAR problem from a novel perspective that the complete data and missing mask are two modalities of incomplete data on an equal footing. Along with this line, we put forward a generative-model-specific joint probability decomposition method, conjunction model, to represent the distributions of two modalities in parallel and extract sufficient information from both complete data and missing mask. Taking a step further, we exploit a deep generative imputation model, namely GNR, to process the real-world missing mechanism in the latent space and concurrently impute the incomplete data and reconstruct the missing mask. The experimental results show that our GNR surpasses state-of-the-art MNAR baselines with significant margins (averagely improved from 9.9% to 18.8% in RMSE) and always gives a better mask reconstruction accuracy which makes the imputation more principle.
    摘要 数据分析通常会遇到非随机缺失(Missing Not At Random,MNAR)问题,即数据缺失的原因并未被完全观测到。与简单的完全随机缺失(MCAR)相比,MNAR更符合真实场景,但也更复杂、更具挑战性。现有统计方法通过对完整数据与缺失掩码的联合分布进行不同的分解来建模MNAR机制。但我们在实验中发现,直接将这些统计方法引入深度生成模型效果欠佳:它们在MNAR插补过程中忽略了重建掩码的置信度,导致信息提取不足、插补质量难以保证。在本文中,我们从一个新的视角重新审视MNAR问题,即把完整数据和缺失掩码视为不完整数据的两种地位对等的模态。沿着这一思路,我们提出了一种面向生成模型的联合概率分解方法,即conjunction模型,用于并行地表示两种模态的分布,并从完整数据和缺失掩码中提取充分的信息。进一步地,我们利用深度生成插补模型GNR,在隐空间中处理真实世界的缺失机制,同时对不完整数据进行插补并重建缺失掩码。实验结果表明,GNR以显著优势超越最先进的MNAR基线(RMSE平均提升9.9%至18.8%),并且总能获得更好的掩码重建精度,使插补更有原则性。

Sarcasm Detection in a Disaster Context

  • paper_url: http://arxiv.org/abs/2308.08156
  • repo_url: None
  • paper_authors: Tiberiu Sosea, Junyi Jessy Li, Cornelia Caragea
  • for: 这篇论文主要是为了研究在自然灾害时人们使用社交媒体平台发送的 sarcastic 语言,以提高自然语言理解。
  • methods: 该论文使用了预训练语言模型进行嘲讽检测,并提供了一个包含15,000个推文的 HurricaneSARC 数据集。
  • results: 研究结果显示,使用中间任务转移学习可以提高 HurricaneSARC 上的性能,最佳模型可以达到0.70的 F1 分数。
    Abstract During natural disasters, people often use social media platforms such as Twitter to ask for help, to provide information about the disaster situation, or to express contempt about the unfolding event or public policies and guidelines. This contempt is in some cases expressed as sarcasm or irony. Understanding this form of speech in a disaster-centric context is essential to improving natural language understanding of disaster-related tweets. In this paper, we introduce HurricaneSARC, a dataset of 15,000 tweets annotated for intended sarcasm, and provide a comprehensive investigation of sarcasm detection using pre-trained language models. Our best model is able to obtain as much as 0.70 F1 on our dataset. We also demonstrate that the performance on HurricaneSARC can be improved by leveraging intermediate task transfer learning. We release our data and code at https://github.com/tsosea2/HurricaneSarc.
    摘要 在自然灾害事件中,人们经常使用社交媒体平台如推特请求帮助、提供灾害情况信息或表达对事件或公共政策的负面情感。在这种情况下,人们可能会通过讽刺或反讽的方式表达自己的不满。在这种情况下,理解这种语言表达方式是提高自然语言理解灾害相关推特的关键。在这篇论文中,我们介绍了风暴沙射(HurricaneSARC)数据集,该数据集包含15,000个推特,每个推特都被标注为带有讽刺意图。我们提供了一项全面的讽刺检测研究,使用预训练语言模型。我们的最佳模型在我们的数据集上可以获得0.70的F1分。我们还证明了在风暴沙射数据集上的性能可以通过中间任务转移学习提高。我们将数据和代码发布在GitHub上,请参考https://github.com/tsosea2/HurricaneSarc。

Hierarchical Topological Ordering with Conditional Independence Test for Limited Time Series

  • paper_url: http://arxiv.org/abs/2308.08148
  • repo_url: None
  • paper_authors: Anpeng Wu, Haoxuan Li, Kun Kuang, Keli Zhang, Fei Wu
  • For: 本研究旨在利用有向无环图(DAG)揭示观测数据背后的因果关系。
  • Methods: 本研究采用基于拓扑排序的方法:首先学习变量的拓扑顺序,再剔除冗余的边,同时保证图保持无环。
  • Results: 研究人员通过引入条件工具变量作为外生干预,改进了基于拓扑排序的方法,从而能够识别每个变量的后代节点。所提出的HT-CIT算法可以大幅减少需要剪除的边的数量,并在合成数据和真实数据上取得了更好的性能。
    Abstract Learning directed acyclic graphs (DAGs) to identify causal relations underlying observational data is crucial but also poses significant challenges. Recently, topology-based methods have emerged as a two-step approach to discovering DAGs by first learning the topological ordering of variables and then eliminating redundant edges, while ensuring that the graph remains acyclic. However, one limitation is that these methods would generate numerous spurious edges that require subsequent pruning. To overcome this limitation, in this paper, we propose an improvement to topology-based methods by introducing limited time series data, consisting of only two cross-sectional records that need not be adjacent in time and are subject to flexible timing. By incorporating conditional instrumental variables as exogenous interventions, we aim to identify descendant nodes for each variable. Following this line, we propose a hierarchical topological ordering algorithm with conditional independence test (HT-CIT), which enables the efficient learning of sparse DAGs with a smaller search space compared to other popular approaches. The HT-CIT algorithm greatly reduces the number of edges that need to be pruned. Empirical results from synthetic and real-world datasets demonstrate the superiority of the proposed HT-CIT algorithm.
    摘要 To overcome this limitation, we propose an improvement to topology-based methods by incorporating limited time series data, consisting of only two cross-sectional records that do not need to be adjacent in time and are subject to flexible timing. By incorporating conditional instrumental variables as exogenous interventions, we aim to identify descendant nodes for each variable.We propose a hierarchical topological ordering algorithm with conditional independence test (HT-CIT), which enables the efficient learning of sparse DAGs with a smaller search space compared to other popular approaches. The HT-CIT algorithm greatly reduces the number of edges that need to be pruned.Empirical results from synthetic and real-world datasets demonstrate the superiority of the proposed HT-CIT algorithm.
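
The hierarchical ordering procedure itself is not spelled out in the abstract, but its pruning ingredient, a conditional independence test, can be illustrated with a standard Fisher-z partial-correlation test. The sketch below runs that generic test on synthetic Gaussian chain data; it is not the HT-CIT algorithm.

```python
import numpy as np
from scipy import stats

def fisher_z_ci_test(data, i, j, cond, alpha=0.05):
    """Test X_i independent of X_j given X_cond via partial correlation + Fisher z."""
    idx = [i, j] + list(cond)
    sub = data[:, idx]
    prec = np.linalg.pinv(np.cov(sub, rowvar=False))          # precision matrix
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])         # partial correlation
    n, k = data.shape[0], len(cond)
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - k - 3)   # Fisher z statistic
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    return p > alpha                                            # True -> independent

# Synthetic chain X0 -> X1 -> X2: X0 and X2 are dependent, but independent given X1.
rng = np.random.default_rng(1)
x0 = rng.normal(size=5000)
x1 = 0.8 * x0 + rng.normal(size=5000)
x2 = 0.8 * x1 + rng.normal(size=5000)
data = np.column_stack([x0, x1, x2])

print("X0 indep X2 given nothing:", fisher_z_ci_test(data, 0, 2, []))    # expect False
print("X0 indep X2 given X1    :", fisher_z_ci_test(data, 0, 2, [1]))    # expect True
```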

Online Control for Linear Dynamics: A Data-Driven Approach

  • paper_url: http://arxiv.org/abs/2308.08138
  • repo_url: None
  • paper_authors: Zishun Liu, Yongxin Chen
  • for: 这个论文关注在线控制问题,特别是Linear Time-Invariant系统(LTI)中 unknown dynamics、bounded disturbance 和 adversarial cost。
  • methods: 我们提出了一种数据驱动的策略来降低控制器的regret。不同于基于模型的方法,我们的算法无需辨识系统模型,而是利用一条无噪声轨迹来计算扰动的累积,并使用我们设计的扰动累积动作控制器进行决策,其参数通过在线梯度下降进行更新。
  • results: 我们证明了我们的算法的 regret 是 $\mathcal{O}(\sqrt{T})$,这意味着它的性能与模型基于方法相当。
    Abstract This paper considers an online control problem over a linear time-invariant system with unknown dynamics, bounded disturbance, and adversarial cost. We propose a data-driven strategy to reduce the regret of the controller. Unlike model-based methods, our algorithm does not identify the system model, instead, it leverages a single noise-free trajectory to calculate the accumulation of disturbance and makes decisions using the accumulated disturbance action controller we design, whose parameters are updated by online gradient descent. We prove that the regret of our algorithm is $\mathcal{O}(\sqrt{T})$ under mild assumptions, suggesting that its performance is on par with model-based methods.
    摘要 这篇论文考虑了一个在线控制问题,其中系统为线性时不变的,受到干扰和恶意成本的影响。我们提出了一种数据驱动的策略,以减少控制器的 regret。不同于模型基于方法,我们的算法不需要确定系统模型,而是利用一个干净的轨迹来计算干扰的积累,并使用我们设计的积累干扰控制器,其参数通过在线梯度下降进行更新。我们证明了我们的算法的 regret是 $\mathcal{O}(\sqrt{T})$,这意味着它的性能与模型基于方法相当。
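
To make the idea of acting on accumulated disturbances concrete, here is a minimal numpy sketch of a disturbance-action controller u_t = -K x_t + sum_i M_i w_{t-i} whose parameters M are updated by online gradient descent. The double-integrator system, the fixed stabilizing gain K, the one-step surrogate loss, and the assumption that (A, B) are known for reconstructing disturbances are all simplifications; the paper's method instead leverages a single noise-free trajectory and avoids identifying the model.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 1.0], [0.0, 1.0]])         # double-integrator dynamics
B = np.array([[0.0], [1.0]])
K = np.array([[0.30, 1.00]])                    # a fixed stabilizing feedback gain
H, T, lr = 5, 500, 5e-4                         # disturbance memory, horizon, step size

M = np.zeros((H, 1, 2))                         # disturbance-action parameters M_1..M_H
x = np.zeros((2, 1))
w_hist = [np.zeros((2, 1)) for _ in range(H)]   # recent disturbances, newest first
total_cost = 0.0

for t in range(T):
    # Disturbance-action control law: u_t = -K x_t + sum_i M_i w_{t-i}
    u = -K @ x + sum(M[i] @ w_hist[i] for i in range(H))
    w = 0.1 * np.sin(0.05 * t) * np.ones((2, 1)) + 0.01 * rng.normal(size=(2, 1))
    x_next = A @ x + B @ u + w                  # bounded, partly adversarial disturbance
    total_cost += float(x.T @ x + 0.1 * u.T @ u)

    # Online gradient step on a one-step surrogate cost ||x_{t+1}||^2 + 0.1 ||u_t||^2,
    # a simplification of the memory-based loss used in the online-control literature.
    grad_u = 2 * B.T @ x_next + 0.2 * u         # d surrogate / d u_t
    for i in range(H):
        M[i] -= lr * grad_u @ w_hist[i].T
    w_hist = [w] + w_hist[:-1]
    x = x_next

print("average per-step cost:", total_cost / T)
```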

Microstructure-Empowered Stock Factor Extraction and Utilization

  • paper_url: http://arxiv.org/abs/2308.08135
  • repo_url: None
  • paper_authors: Xianfeng Jiao, Zizhong Li, Chang Xu, Yang Liu, Weiqing Liu, Jiang Bian
  • for: This paper aims to effectively extract essential factors from order flow data for diverse downstream tasks across different granularities and scenarios.
  • methods: The proposed framework consists of a Context Encoder and a Factor Extractor, using unsupervised learning methods to select important signals from the given context.
  • results: The extracted factors are utilized for downstream tasks, demonstrating significant improvement for stock trend prediction and order execution tasks at the second and minute level, compared to existing tick-level approaches.
    Here's the simplified Chinese text:
  • for: 本文旨在从订单流数据中有效提取关键因子,以服务于不同粒度和场景下的多种下游任务。
  • methods: 所提框架由上下文编码器(Context Encoder)和因子提取器(Factor Extractor)组成,利用无监督学习方法从给定上下文中挑选重要信号。
  • results: 提取得到的因子被用于下游任务,在秒级和分钟级的股票趋势预测与订单执行任务上,相比现有的tick级方法取得了显著提升。
    Abstract High-frequency quantitative investment is a crucial aspect of stock investment. Notably, order flow data plays a critical role as it provides the most detailed level of information among high-frequency trading data, including comprehensive data from the order book and transaction records at the tick level. The order flow data is extremely valuable for market analysis as it equips traders with essential insights for making informed decisions. However, extracting and effectively utilizing order flow data present challenges due to the large volume of data involved and the limitations of traditional factor mining techniques, which are primarily designed for coarser-level stock data. To address these challenges, we propose a novel framework that aims to effectively extract essential factors from order flow data for diverse downstream tasks across different granularities and scenarios. Our method consists of a Context Encoder and an Factor Extractor. The Context Encoder learns an embedding for the current order flow data segment's context by considering both the expected and actual market state. In addition, the Factor Extractor uses unsupervised learning methods to select such important signals that are most distinct from the majority within the given context. The extracted factors are then utilized for downstream tasks. In empirical studies, our proposed framework efficiently handles an entire year of stock order flow data across diverse scenarios, offering a broader range of applications compared to existing tick-level approaches that are limited to only a few days of stock data. We demonstrate that our method extracts superior factors from order flow data, enabling significant improvement for stock trend prediction and order execution tasks at the second and minute level.
    摘要 高频量化投资是股票投资的重要方面。其中,订单流数据尤为关键,因为它在高频交易数据中提供了最细粒度的信息,包括订单簿和tick级别的成交记录。订单流数据对市场分析极具价值,能够为交易者提供做出明智决策所需的关键洞察。然而,由于数据量庞大,且传统的因子挖掘技术主要面向更粗粒度的股票数据,提取并有效利用订单流数据面临挑战。为了解决这些问题,我们提出了一个新颖的框架,旨在从订单流数据中提取关键因子,以服务于不同粒度和场景下的多种下游任务。该方法由上下文编码器和因子提取器组成:上下文编码器综合考虑预期与实际的市场状态,为当前订单流数据片段的上下文学习一个嵌入表示;因子提取器则采用无监督学习方法,在给定上下文中挑选与大多数信号差异最大的重要信号。提取得到的因子随后被用于下游任务。实证研究表明,我们的框架能够高效处理整整一年、覆盖多种场景的股票订单流数据,相比仅能处理几天数据的现有tick级方法具有更广泛的应用范围。我们证明了该方法能够从订单流数据中提取更优的因子,使秒级和分钟级的股票趋势预测与订单执行任务获得显著提升。

Is Self-Supervised Pretraining Good for Extrapolation in Molecular Property Prediction?

  • paper_url: http://arxiv.org/abs/2308.08129
  • repo_url: None
  • paper_authors: Shun Takashige, Masatoshi Hanai, Toyotaro Suzumura, Limin Wang, Kenjiro Taura
  • for: 这个论文主要是为了研究如何使用自适应预训练技术提高材料属性预测的准确性。
  • methods: 这个论文使用了自适应预训练技术,首先在无标签数据上训练模型,然后在标签数据上进行目标任务训练。
  • results: 研究发现,自适应预训练可以提高模型对未观察属性值的预测,但不能准确地预测绝对属性值。
    Abstract The prediction of material properties plays a crucial role in the development and discovery of materials in diverse applications, such as batteries, semiconductors, catalysts, and pharmaceuticals. Recently, there has been a growing interest in employing data-driven approaches by using machine learning technologies, in combination with conventional theoretical calculations. In material science, the prediction of unobserved values, commonly referred to as extrapolation, is particularly critical for property prediction as it enables researchers to gain insight into materials beyond the limits of available data. However, even with the recent advancements in powerful machine learning models, accurate extrapolation is still widely recognized as a significantly challenging problem. On the other hand, self-supervised pretraining is a machine learning technique where a model is first trained on unlabeled data using relatively simple pretext tasks before being trained on labeled data for target tasks. As self-supervised pretraining can effectively utilize material data without observed property values, it has the potential to improve the model's extrapolation ability. In this paper, we clarify how such self-supervised pretraining can enhance extrapolation performance.We propose an experimental framework for the demonstration and empirically reveal that while models were unable to accurately extrapolate absolute property values, self-supervised pretraining enables them to learn relative tendencies of unobserved property values and improve extrapolation performance.
    摘要 材料性质预测在电池、半导体、催化剂和药物等多种应用的材料开发与发现中起着关键作用。近年来,将机器学习技术与传统理论计算相结合的数据驱动方法受到越来越多的关注。在材料科学中,对未观测值的预测(通常称为外推)对性质预测尤为重要,因为它使研究者能够洞察现有数据范围之外的材料。然而,即便有了强大的机器学习模型,准确的外推仍被公认为一个极具挑战性的问题。另一方面,自监督预训练是一种先在无标签数据上通过较简单的代理任务训练模型,再在有标签数据上针对目标任务进行训练的机器学习技术。由于自监督预训练能够有效利用没有观测性质值的材料数据,它有潜力提升模型的外推能力。在本文中,我们阐明了这种自监督预训练如何增强外推性能。我们提出了一个用于验证的实验框架,并通过实验揭示:虽然模型无法准确外推绝对的性质数值,但自监督预训练使其能够学习未观测性质值的相对变化趋势,从而改善外推性能。

How to Mask in Error Correction Code Transformer: Systematic and Double Masking

  • paper_url: http://arxiv.org/abs/2308.08128
  • repo_url: None
  • paper_authors: Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim, Sunghwan Kim, Yongjune Kim, Jong-Seon No
  • for: 提高 Error Correction Code Transformer (ECCT) 的性能和计算复杂度。
  • methods: 引入新的掩码矩阵和修改 ECCT 的 transformer 架构,以提高 decoding 性能。
  • results: 对 ECCT 进行改进,实现了 state-of-the-art 的 decoding 性能,与传统的 decoding 算法相比,带来了显著的性能提升。
    Abstract In communication and storage systems, error correction codes (ECCs) are pivotal in ensuring data reliability. As deep learning's applicability has broadened across diverse domains, there is a growing research focus on neural network-based decoders that outperform traditional decoding algorithms. Among these neural decoders, Error Correction Code Transformer (ECCT) has achieved the state-of-the-art performance, outperforming other methods by large margins. To further enhance the performance of ECCT, we propose two novel methods. First, leveraging the systematic encoding technique of ECCs, we introduce a new masking matrix for ECCT, aiming to improve the performance and reduce the computational complexity. Second, we propose a novel transformer architecture of ECCT called a double-masked ECCT. This architecture employs two different mask matrices in a parallel manner to learn more diverse features of the relationship between codeword bits in the masked self-attention blocks. Extensive simulation results show that the proposed double-masked ECCT outperforms the conventional ECCT, achieving the state-of-the-art decoding performance with significant margins.
    摘要 在通信和存储系统中,错误修正码(ECC)是确保数据可靠性的关键。随着深度学习在多个领域的应用积极扩大,有一个增长的研究重点是基于神经网络的解码器,以超越传统的解码算法。其中,Error Correction Code Transformer(ECCT)已经实现了状态收敛性能,超过其他方法的大幅提高。为了进一步提高ECCT的性能,我们提出了两种新的方法。首先,利用ECC的系统编码技术,我们引入了一个新的面积矩阵,以提高性能并降低计算复杂性。其次,我们提出了一种新的ECCT架构,即双面Masked ECCT。这种架构在并行方式中使用了两个不同的面积矩阵,以学习codeword位数据之间的更多多样性的关系。经过广泛的 simulate结果表明,我们的双面Masked ECCT可以超越传统的ECCT,实现最佳解码性能,并且具有显著的提高。
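
Under one reading of the abstract, the code-aware masking can be pictured as allowing attention only between bit positions that appear together in a parity check of H, with a second, differently masked head run in parallel. The sketch below builds such a mask for a Hamming(7,4) parity-check matrix and applies it to random attention scores; the matrix and the unmasked second head are illustrative choices, not the paper's exact construction.

```python
import numpy as np

# Parity-check matrix of the Hamming(7,4) code (illustrative choice).
H = np.array([
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
])

def code_aware_mask(H):
    """mask[i, j] = True if bits i and j appear together in at least one check."""
    n = H.shape[1]
    mask = np.zeros((n, n), dtype=bool)
    for row in H:
        idx = np.flatnonzero(row)
        mask[np.ix_(idx, idx)] = True
    np.fill_diagonal(mask, True)
    return mask

def masked_attention_scores(q, k, mask):
    """Scaled dot-product scores with -inf outside the mask, then row-wise softmax."""
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = np.where(mask, scores, -np.inf)
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d = H.shape[1], 8
q, k = rng.normal(size=(n, d)), rng.normal(size=(n, d))

mask_a = code_aware_mask(H)                  # mask derived from the parity checks
mask_b = np.ones((n, n), dtype=bool)         # a second, unmasked "head" for contrast
attn_a = masked_attention_scores(q, k, mask_a)
attn_b = masked_attention_scores(q, k, mask_b)
print("nonzero attention entries per row (masked head):", (attn_a > 0).sum(axis=1))
```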

S-Mixup: Structural Mixup for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.08097
  • repo_url: https://github.com/sukwonyun/s-mixup
  • paper_authors: Junghurn Kim, Sukwon Yun, Chanyoung Park
  • for: 本研究主要针对图像分类任务进行应用混合技术,而研究在节点分类 tasks 仍然尚未得到充分的研究。
  • methods: 本文提出了一种新的结构混合方法(S-Mixup),利用图гра树神经网络(GNN)分类器来获得 pseudo-标签并且通过 edge gradient 来选择edge。
  • results: 通过对真实世界 benchmark 数据集进行了广泛的实验,我们证明了 S-Mixup 在节点分类任务中的效果,尤其是在不同类型的图像中。
    Abstract Existing studies for applying the mixup technique on graphs mainly focus on graph classification tasks, while the research in node classification is still under-explored. In this paper, we propose a novel mixup augmentation for node classification called Structural Mixup (S-Mixup). The core idea is to take into account the structural information while mixing nodes. Specifically, S-Mixup obtains pseudo-labels for unlabeled nodes in a graph along with their prediction confidence via a Graph Neural Network (GNN) classifier. These serve as the criteria for the composition of the mixup pool for both inter and intra-class mixups. Furthermore, we utilize the edge gradient obtained from the GNN training and propose a gradient-based edge selection strategy for selecting edges to be attached to the nodes generated by the mixup. Through extensive experiments on real-world benchmark datasets, we demonstrate the effectiveness of S-Mixup evaluated on the node classification task. We observe that S-Mixup enhances the robustness and generalization performance of GNNs, especially in heterophilous situations. The source code of S-Mixup can be found at \url{https://github.com/SukwonYun/S-Mixup}
    摘要 现有在图上应用mixup技术的研究主要集中在图分类任务上,而节点分类任务上的研究仍然不足。在本文中,我们提出了一种面向节点分类的新型mixup增强方法,称为结构混合(S-Mixup)。其核心思想是在混合节点时考虑结构信息。具体而言,S-Mixup通过图神经网络(GNN)分类器为图中未标注的节点获得伪标签及其预测置信度,并以此作为构建类间与类内混合池的依据。此外,我们利用GNN训练中得到的边梯度,提出了一种基于梯度的选边策略,用于为混合生成的节点选择要连接的边。通过在真实基准数据集上的大量实验,我们验证了S-Mixup在节点分类任务上的有效性。我们观察到,S-Mixup提升了GNN的鲁棒性和泛化性能,尤其是在异配(heterophilous)场景下。S-Mixup的源代码可以在 \url{https://github.com/SukwonYun/S-Mixup} 找到。

Safety Filter Design for Neural Network Systems via Convex Optimization

  • paper_url: http://arxiv.org/abs/2308.08086
  • repo_url: https://github.com/shaoruchen/nn-system-psf
  • paper_authors: Shaoru Chen, Kong Yao Chee, Nikolai Matni, M. Ani Hsieh, George J. Pappas
  • for: 这个论文目的是提出一种基于 convex 优化的安全筛选方法,以保证基于神经网络(NN)系统的控制器是安全的,面对添加型干扰。
  • methods: 该方法利用 NN 验证工具来上界化 NN 动力学,然后通过Robust linear MPC 搜索一个能 garantuee 约束满足的控制器。
  • results: 数学示例表明,该方法可以有效地保证 NN 系统的安全性,并且可以适应不同的模型误差。
    Abstract With the increase in data availability, it has been widely demonstrated that neural networks (NN) can capture complex system dynamics precisely in a data-driven manner. However, the architectural complexity and nonlinearity of the NNs make it challenging to synthesize a provably safe controller. In this work, we propose a novel safety filter that relies on convex optimization to ensure safety for a NN system, subject to additive disturbances that are capable of capturing modeling errors. Our approach leverages tools from NN verification to over-approximate NN dynamics with a set of linear bounds, followed by an application of robust linear MPC to search for controllers that can guarantee robust constraint satisfaction. We demonstrate the efficacy of the proposed framework numerically on a nonlinear pendulum system.
    摘要 随着数据可得性的提高,神经网络(NN)已被广泛证明能够以数据驱动的方式精确刻画复杂的系统动力学。然而,神经网络的结构复杂性和非线性使得综合一个可证明安全的控制器颇具挑战。在本工作中,我们提出了一种基于凸优化的新型安全滤波器,用于在存在可刻画建模误差的加性扰动时保证NN系统的安全性。我们的方法利用NN验证工具,用一组线性界对NN动力学进行过近似,再应用鲁棒线性MPC搜索能够保证鲁棒约束满足的控制器。我们在一个非线性单摆系统上通过数值实验验证了所提框架的有效性。
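
To illustrate the final step of such a pipeline, the sketch below solves a small constraint-tightened linear MPC problem with cvxpy, assuming the NN dynamics have already been over-approximated by x+ = A x + B u + w with an element-wise bound on w (a crude box tightening stands in for the full robust MPC machinery). The matrices, bounds, and horizon are made-up values; obtaining the linear bounds from an NN verifier is the part this toy omits.

```python
import numpy as np
import cvxpy as cp

# Assumed linear over-approximation of the learned dynamics: x+ = A x + B u + w.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
w_max = 0.02                    # element-wise bound on the approximation error
N = 15                          # planning horizon
x0 = np.array([1.5, 0.0])
x_lim, u_lim = 2.0, 1.0

# Crude tightening: shrink the state box by the worst-case accumulated disturbance.
tighten = np.array([w_max * (k + 1) for k in range(N + 1)])

x = cp.Variable((2, N + 1))
u = cp.Variable((1, N))
cost = 0
constraints = [x[:, 0] == x0]
for k in range(N):
    cost += cp.sum_squares(x[:, k]) + 0.1 * cp.sum_squares(u[:, k])
    constraints += [
        x[:, k + 1] == A @ x[:, k] + B @ u[:, k],           # nominal prediction
        cp.norm(x[:, k + 1], "inf") <= x_lim - tighten[k + 1],
        cp.abs(u[:, k]) <= u_lim,
    ]

prob = cp.Problem(cp.Minimize(cost), constraints)
prob.solve()
print("status:", prob.status)
print("first safe input of the filter:", u.value[:, 0])
```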

Rigid Transformations for Stabilized Lower Dimensional Space to Support Subsurface Uncertainty Quantification and Interpretation

  • paper_url: http://arxiv.org/abs/2308.08079
  • repo_url: None
  • paper_authors: Ademide O. Mabadeje, Michael J. Pyrcz
  • for: This paper aims to improve the accuracy and repeatability of nonlinear dimensionality reduction (NDR) methods for subsurface datasets, which are characterized by big data challenges such as high dimensionality and complex relationships.
  • methods: The proposed method employs rigid transformations to stabilize the Euclidean invariant representation of the data, integrates out-of-sample points (OOSP), and quantifies uncertainty using a stress ratio (SR) metric.
  • results: The proposed method is validated using synthetic data, distance metrics, and real-world wells from the Duvernay Formation, and shows improved accuracy and repeatability compared to existing methods. The SR metric provides valuable insights into uncertainty, enabling better model adjustments and inferential analysis.
    Abstract Subsurface datasets inherently possess big data characteristics such as vast volume, diverse features, and high sampling speeds, further compounded by the curse of dimensionality from various physical, engineering, and geological inputs. Among the existing dimensionality reduction (DR) methods, nonlinear dimensionality reduction (NDR) methods, especially Metric-multidimensional scaling (MDS), are preferred for subsurface datasets due to their inherent complexity. While MDS retains intrinsic data structure and quantifies uncertainty, its limitations include unstabilized unique solutions invariant to Euclidean transformations and an absence of out-of-sample points (OOSP) extension. To enhance subsurface inferential and machine learning workflows, datasets must be transformed into stable, reduced-dimension representations that accommodate OOSP. Our solution employs rigid transformations for a stabilized Euclidean invariant representation for LDS. By computing an MDS input dissimilarity matrix, and applying rigid transformations on multiple realizations, we ensure transformation invariance and integrate OOSP. This process leverages a convex hull algorithm and incorporates loss function and normalized stress for distortion quantification. We validate our approach with synthetic data, varying distance metrics, and real-world wells from the Duvernay Formation. Results confirm our method's efficacy in achieving consistent LDS representations. Furthermore, our proposed "stress ratio" (SR) metric provides insight into uncertainty, beneficial for model adjustments and inferential analysis. Consequently, our workflow promises enhanced repeatability and comparability in NDR for subsurface energy resource engineering and associated big data workflows.
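
The rigid-transformation step can be pictured as aligning two MDS realizations of the same data with an orthogonal Procrustes rotation so that the low-dimensional coordinates become comparable across runs. The sketch below does this for two randomly rotated copies of one embedding and reports a Kruskal-style normalized stress; it is a generic Procrustes alignment on invented data, without the convex-hull step or the out-of-sample extension described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def procrustes_align(Y_ref, Y):
    """Rigidly align Y to Y_ref (centering + optimal rotation via SVD)."""
    Y_ref_c = Y_ref - Y_ref.mean(axis=0)
    Y_c = Y - Y.mean(axis=0)
    U, _, Vt = np.linalg.svd(Y_c.T @ Y_ref_c)
    R = U @ Vt                                   # orthogonal Procrustes solution
    return Y_c @ R + Y_ref.mean(axis=0)

def normalized_stress(D_high, Y):
    """Kruskal-style stress between input dissimilarities and embedded distances."""
    D_low = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
    return np.sqrt(np.sum((D_high - D_low) ** 2) / np.sum(D_high ** 2))

# One 2-D embedding and a second realization that differs by a rigid motion + noise,
# mimicking how repeated MDS runs return equivalent but differently oriented solutions.
Y1 = rng.normal(size=(50, 2))
theta = 0.8
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Y2 = Y1 @ rot + np.array([3.0, -1.0]) + 0.01 * rng.normal(size=Y1.shape)

D = np.linalg.norm(Y1[:, None] - Y1[None, :], axis=-1)   # stand-in for the MDS input
Y2_aligned = procrustes_align(Y1, Y2)
print("mean coordinate discrepancy before alignment:", np.abs(Y1 - Y2).mean())
print("mean coordinate discrepancy after  alignment:", np.abs(Y1 - Y2_aligned).mean())
print("normalized stress of the aligned realization:", normalized_stress(D, Y2_aligned))
```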

Decentralized Graph Neural Network for Privacy-Preserving Recommendation

  • paper_url: http://arxiv.org/abs/2308.08072
  • repo_url: None
  • paper_authors: Xiaolin Zheng, Zhongyu Wang, Chaochao Chen, Jiashu Qian, Yao Yang
  • for: 提出了一种隐私保护的图学链接分析系统,解决了现有方法的通信效率低下和隐私泄露问题。
  • methods: 该方法包括三个阶段:图构建、本地梯度计算和全局梯度传递。在第一阶段,为每名用户构建了本地内项质量图和全局用户图。在第二阶段,用户偏好模型化并计算每个本地设备上的梯度。在第三阶段,实现了一种安全梯度分享机制,以保障用户私人数据的隐私。
  • results: 通过对三个公共数据集进行广泛的实验 validate了我们的框架在不同的情况下的一致性优于现有方法。
    Abstract Building a graph neural network (GNN)-based recommender system without violating user privacy proves challenging. Existing methods can be divided into federated GNNs and decentralized GNNs. But both methods have undesirable effects, i.e., low communication efficiency and privacy leakage. This paper proposes DGREC, a novel decentralized GNN for privacy-preserving recommendations, where users can choose to publicize their interactions. It includes three stages, i.e., graph construction, local gradient calculation, and global gradient passing. The first stage builds a local inner-item hypergraph for each user and a global inter-user graph. The second stage models user preference and calculates gradients on each local device. The third stage designs a local differential privacy mechanism named secure gradient-sharing, which proves strong privacy-preserving of users' private data. We conduct extensive experiments on three public datasets to validate the consistent superiority of our framework.
    摘要 建立一个基于图 neural network(GNN)的推荐系统,不让用户隐私泄露是一个挑战。现有的方法可以分为联邦GNN和分散GNN两种。但是这两种方法都有不良影响,即低通信效率和隐私泄露。本文提出了DGREC,一个新的分散GNN推荐系统,其中用户可以选择公开其互动。这个系统包括三个阶段:图建构、本地梯度计算和全球梯度传递。第一阶段在每个用户的本地内部项目图中建立了一个内部图,并在所有用户之间建立了一个全球跨用户图。第二阶段模型用户的喜好,并在每个本地设备上计算梯度。第三阶段设计了一个本地隐私保证机制,名为安全梯度分享,证明了用户隐私的严格保证。我们在三个公共数据集上进行了广泛的实验,以验证我们的框架的一致性和优化。

Freshness or Accuracy, Why Not Both? Addressing Delayed Feedback via Dynamic Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.08071
  • repo_url: None
  • paper_authors: Xiaolin Zheng, Zhongyu Wang, Chaochao Chen, Feng Zhu, Jiashu Qian
  • for: 这篇论文旨在解决在线商业系统中的延迟反馈问题,因为用户反馈的延迟通常会影响模型训练。
  • methods: 本文提出了一种基于动态图神经网络的延迟反馈建模方法(DGDFEM),它包括三个阶段:准备数据管线、构建动态图以及训练CVR预测模型。在模型训练中,我们提出了一种名为HLGCN的新型图卷积方法,用于处理转化与未转化关系。
  • results: 我们在三个工业数据集上进行了广泛的实验,验证了该方法的一致优越性。
    Abstract The delayed feedback problem is one of the most pressing challenges in predicting the conversion rate since users' conversions are always delayed in online commercial systems. Although new data are beneficial for continuous training, without complete feedback information, i.e., conversion labels, training algorithms may suffer from overwhelming fake negatives. Existing methods tend to use multitask learning or design data pipelines to solve the delayed feedback problem. However, these methods have a trade-off between data freshness and label accuracy. In this paper, we propose Delayed Feedback Modeling by Dynamic Graph Neural Network (DGDFEM). It includes three stages, i.e., preparing a data pipeline, building a dynamic graph, and training a CVR prediction model. In the model training, we propose a novel graph convolutional method named HLGCN, which leverages both high-pass and low-pass filters to deal with conversion and non-conversion relationships. The proposed method achieves both data freshness and label accuracy. We conduct extensive experiments on three industry datasets, which validate the consistent superiority of our method.
    摘要 延迟反馈问题是在预测转化率时最为紧迫的挑战,因为在线商业系统中用户的转化都会延迟。新的数据对于连续训练有益,但是无法获得完整的反馈信息,即转化标签,训练算法可能会受到干扰性的假负样本的影响。现有方法通常使用多任务学习或设计数据管道来解决延迟反馈问题,但这些方法存在数据新鲜度和标签准确性之间的负担。在本文中,我们提出延迟反馈模型化方法(DGDFEM),它包括三个阶段:准备数据管道、建立动态图和训练转化率预测模型。在模型训练中,我们提出了一种新的图 convolution方法 named HLGCN,它利用了高通和低通滤波器来处理转化和非转化关系。我们的方法可以同时保证数据新鲜度和标签准确性。我们对三个行业数据集进行了广泛的实验,并证明了我们的方法的一致性优势。

Max-affine regression via first-order methods

  • paper_url: http://arxiv.org/abs/2308.08070
  • repo_url: None
  • paper_authors: Seonho Kim, Kiryung Lee
  • for: 这个论文是研究max-affine模型的回归问题,该模型可以通过综合多个affine模型使用最大函数来生成一个piecewise线性模型。
  • methods: 论文使用了梯度下降(GD)和批处理梯度下降(SGD)来解决max-affine模型的回归问题,并进行了非假设性分析。
  • results: 研究发现,在随机观察到模型的情况下,GD和SGD在sub-Gaussian性和反射性下 converge linearly to a neighborhood of the ground truth,并且SGD在低样本场景中不仅更快速 convergence,还能够超过alternating minimization和GD的性能。
    Abstract We consider regression of a max-affine model that produces a piecewise linear model by combining affine models via the max function. The max-affine model ubiquitously arises in applications in signal processing and statistics including multiclass classification, auction problems, and convex regression. It also generalizes phase retrieval and learning rectifier linear unit activation functions. We present a non-asymptotic convergence analysis of gradient descent (GD) and mini-batch stochastic gradient descent (SGD) for max-affine regression when the model is observed at random locations following the sub-Gaussianity and an anti-concentration with additive sub-Gaussian noise. Under these assumptions, a suitably initialized GD and SGD converge linearly to a neighborhood of the ground truth specified by the corresponding error bound. We provide numerical results that corroborate the theoretical finding. Importantly, SGD not only converges faster in run time with fewer observations than alternating minimization and GD in the noiseless scenario but also outperforms them in low-sample scenarios with noise.
    摘要 我们研究最大仿射(max-affine)模型的回归问题,该模型通过最大值函数组合多个仿射模型,从而得到分段线性模型。最大仿射模型广泛出现于信号处理和统计应用中,包括多类分类、拍卖问题和凸回归,并且可推广相位恢复以及ReLU激活函数的学习。我们给出了最大仿射回归中梯度下降(GD)与小批量随机梯度下降(SGD)的非渐近收敛性分析,其中模型在满足次高斯性与反集中性的随机位置上被观测,并带有加性次高斯噪声。在这些假设下,经过适当初始化的GD和SGD会线性收敛到由相应误差界刻画的真值邻域内。我们提供的数值结果印证了这一理论发现。更重要的是,在无噪声情形下,SGD不仅比交替最小化和GD在更少观测下以更短的运行时间收敛,在小样本带噪声的情形下也优于它们。
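
As a quick illustration of the estimation problem (not of the paper's analysis or its step-size choices), the sketch below fits a max-affine model y ≈ max_j (a_j·x + b_j) by plain (sub)gradient descent on synthetic Gaussian data, updating only the active affine piece at each sample. The number of pieces, initialization, and learning rate are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 2000, 5, 3                      # samples, dimension, affine pieces

# Ground-truth max-affine model y = max_j (x @ a_j + b_j) + noise
A_true = rng.normal(size=(k, d))
b_true = rng.normal(size=k)
X = rng.normal(size=(n, d))
y = (X @ A_true.T + b_true).max(axis=1) + 0.05 * rng.normal(size=n)

# (Sub)gradient descent on the squared loss; only the argmax piece is updated per sample.
A = 0.5 * rng.normal(size=(k, d))
b = np.zeros(k)
lr = 0.05
for it in range(300):
    scores = X @ A.T + b                  # (n, k)
    j = scores.argmax(axis=1)             # active piece for every sample
    pred = scores[np.arange(n), j]
    resid = pred - y
    for piece in range(k):
        mask = (j == piece)
        if mask.any():
            A[piece] -= lr * (resid[mask, None] * X[mask]).mean(axis=0)
            b[piece] -= lr * resid[mask].mean()

rmse = np.sqrt(np.mean(((X @ A.T + b).max(axis=1) - y) ** 2))
print("final RMSE:", rmse)
```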

A Reinforcement Learning Approach for Performance-aware Reduction in Power Consumption of Data Center Compute Nodes

  • paper_url: http://arxiv.org/abs/2308.08069
  • repo_url: https://github.com/akhileshraj91/generalized_rl_anl
  • paper_authors: Akhilesh Raj, Swann Perarnau, Aniruddha Gokhale
  • for: 本研究旨在透过实现协调处理器的电力消耗和应用程序的性能,以减少云端资料中心的能源需求。
  • methods: 本研究使用了强化学习(Reinforcement Learning)来设计云端处理器的电力规则,并与argoNode资源管理软件库和Intel Running Average Power Limit(RAPL)硬件控制机制搭配使用。
  • results: 经过训练的代理人可以通过调整处理器的最大供电功率,以实现协调电力消耗和应用程序性能。在使用STREAM套件进行评估时,已经示出了一个可以找到平衡点的实时执行环境。
    Abstract As Exascale computing becomes a reality, the energy needs of compute nodes in cloud data centers will continue to grow. A common approach to reducing this energy demand is to limit the power consumption of hardware components when workloads are experiencing bottlenecks elsewhere in the system. However, designing a resource controller capable of detecting and limiting power consumption on-the-fly is a complex issue and can also adversely impact application performance. In this paper, we explore the use of Reinforcement Learning (RL) to design a power capping policy on cloud compute nodes using observations on current power consumption and instantaneous application performance (heartbeats). By leveraging the Argo Node Resource Management (NRM) software stack in conjunction with the Intel Running Average Power Limit (RAPL) hardware control mechanism, we design an agent to control the maximum supplied power to processors without compromising on application performance. Employing a Proximal Policy Optimization (PPO) agent to learn an optimal policy on a mathematical model of the compute nodes, we demonstrate and evaluate using the STREAM benchmark how a trained agent running on actual hardware can take actions by balancing power consumption and application performance.
    摘要 In this paper, we explore the use of Reinforcement Learning (RL) to design a power capping policy on cloud compute nodes using observations on current power consumption and instantaneous application performance (heartbeats). By leveraging the Argo Node Resource Management (NRM) software stack in conjunction with the Intel Running Average Power Limit (RAPL) hardware control mechanism, we design an agent to control the maximum supplied power to processors without compromising on application performance.Employing a Proximal Policy Optimization (PPO) agent to learn an optimal policy on a mathematical model of the compute nodes, we demonstrate and evaluate using the STREAM benchmark how a trained agent running on actual hardware can take actions by balancing power consumption and application performance.

The Costly Dilemma: Generalization, Evaluation and Cost-Optimal Deployment of Large Language Models

  • paper_url: http://arxiv.org/abs/2308.08061
  • repo_url: None
  • paper_authors: Abi Aryan, Aakash Kumar Nain, Andrew McMahon, Lucas Augusto Meyer, Harpreet Singh Sahota
    for: 这个论文是为了解决在生产环境中部署机器学习模型时,常见的三个属性的问题。methods: 论文提出了一种框架,用于考虑这三个属性的关系,并为大语言模型的开发、部署和管理提供了新的思路。results: 论文表明,通过这种框架,可以帮助企业更好地评估大语言模型的投资,并且可以在生产环境中部署这些模型,以便更好地满足企业的需求。
    Abstract When deploying machine learning models in production for any product/application, there are three properties that are commonly desired. First, the models should be generalizable, in that we can extend it to further use cases as our knowledge of the domain area develops. Second they should be evaluable, so that there are clear metrics for performance and the calculation of those metrics in production settings are feasible. Finally, the deployment should be cost-optimal as far as possible. In this paper we propose that these three objectives (i.e. generalization, evaluation and cost-optimality) can often be relatively orthogonal and that for large language models, despite their performance over conventional NLP models, enterprises need to carefully assess all the three factors before making substantial investments in this technology. We propose a framework for generalization, evaluation and cost-modeling specifically tailored to large language models, offering insights into the intricacies of development, deployment and management for these large language models.
    摘要 在为任何产品或应用在生产环境中部署机器学习模型时,通常希望模型满足三个属性。1. 模型应具备可泛化性,使我们能够随着对领域知识理解的深入将其扩展到更多用例;2. 模型应具备可评估性,即拥有明确的性能指标,且这些指标在生产环境中可以被切实计算;3. 模型的部署应尽可能做到成本最优。本文提出,这三个目标(即泛化、评估与成本最优)往往是相对正交的;对于大语言模型而言,尽管其性能优于传统的NLP模型,企业在进行大规模投入之前仍需仔细评估这三个因素。我们提出了一个专门针对大语言模型的泛化、评估与成本建模框架,为这类模型的开发、部署和管理提供深入的洞察。

Robust Bayesian Tensor Factorization with Zero-Inflated Poisson Model and Consensus Aggregation

  • paper_url: http://arxiv.org/abs/2308.08060
  • repo_url: https://github.com/klarman-cell-observatory/scbtf_experiments
  • paper_authors: Daniel Chafamo, Vignesh Shanmugam, Neriman Tokcan
  • for: 这个论文的目的是提出一种新的tensor因子分解方法,以处理高维计数数据中的空值偏好。
  • methods: 这个论文使用的方法包括Zero Inflated Poisson Tensor Factorization(ZIPTF)和Consensus Zero Inflated Poisson Tensor Factorization(C-ZIPTF),它们都是基于tensor因子分解的新方法,可以处理高维计数数据中的空值偏好。
  • results: 实验结果表明,ZIPTF和C-ZIPTF在各种 sintetic和实际的Single-cell RNA sequencing(scRNA-seq)数据中都能够成功地重建known和生物学意义的蛋白表达计划。特别是,当数据中存在高概率的空值时,ZIPTF可以达到2.4倍的准确率提高。此外,C-ZIPTF可以提高因子化结果的一致性和准确率。
    Abstract Tensor factorizations (TF) are powerful tools for the efficient representation and analysis of multidimensional data. However, classic TF methods based on maximum likelihood estimation underperform when applied to zero-inflated count data, such as single-cell RNA sequencing (scRNA-seq) data. Additionally, the stochasticity inherent in TFs results in factors that vary across repeated runs, making interpretation and reproducibility of the results challenging. In this paper, we introduce Zero Inflated Poisson Tensor Factorization (ZIPTF), a novel approach for the factorization of high-dimensional count data with excess zeros. To address the challenge of stochasticity, we introduce Consensus Zero Inflated Poisson Tensor Factorization (C-ZIPTF), which combines ZIPTF with a consensus-based meta-analysis. We evaluate our proposed ZIPTF and C-ZIPTF on synthetic zero-inflated count data and synthetic and real scRNA-seq data. ZIPTF consistently outperforms baseline matrix and tensor factorization methods in terms of reconstruction accuracy for zero-inflated data. When the probability of excess zeros is high, ZIPTF achieves up to $2.4\times$ better accuracy. Additionally, C-ZIPTF significantly improves the consistency and accuracy of the factorization. When tested on both synthetic and real scRNA-seq data, ZIPTF and C-ZIPTF consistently recover known and biologically meaningful gene expression programs.
    摘要 张量分解(TF)是高效表示和分析多维数据的有力工具。然而,基于最大似然估计的经典张量分解方法在零膨胀计数数据(如单细胞RNA测序,scRNA-seq)上表现不佳。此外,张量分解固有的随机性使得多次运行得到的因子各不相同,给结果的解释和可复现性带来困难。在本文中,我们提出零膨胀泊松张量分解(ZIPTF),一种针对含大量零值的高维计数数据的新型分解方法。为应对随机性带来的挑战,我们进一步提出共识零膨胀泊松张量分解(C-ZIPTF),它将ZIPTF与基于共识的元分析相结合。我们在合成的零膨胀计数数据以及合成和真实的scRNA-seq数据上评估了ZIPTF和C-ZIPTF。在零膨胀数据的重建精度上,ZIPTF始终优于基线矩阵与张量分解方法;当出现过多零值的概率较高时,精度最高可提升2.4倍。此外,C-ZIPTF显著提升了分解结果的一致性和准确性。在合成与真实scRNA-seq数据上的测试中,ZIPTF和C-ZIPTF都能够一致地恢复已知且具有生物学意义的基因表达程序。
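
The ingredient that distinguishes ZIPTF from plain Poisson factorization is the zero-inflated Poisson likelihood. The sketch below evaluates that log-likelihood for a CP-style rate tensor on counts with excess zeros and contrasts it with the ordinary Poisson log-likelihood; the rank, tensor shape, and dropout probability are illustrative, and no Bayesian inference or consensus aggregation is performed.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(0)

def zip_loglik(x, lam, pi):
    """Zero-inflated Poisson log-likelihood, summed over all entries.
    P(0) = pi + (1-pi) e^{-lam};  P(x>0) = (1-pi) Poisson(x; lam)."""
    pois_logpmf = x * np.log(lam) - lam - gammaln(x + 1)
    ll_zero = np.log(pi + (1 - pi) * np.exp(-lam))
    ll_pos = np.log(1 - pi) + pois_logpmf
    return np.where(x == 0, ll_zero, ll_pos).sum()

# A small 3-way count tensor from a rank-2 CP model, with extra (structural) zeros.
r, shape = 2, (20, 15, 10)
factors = [rng.gamma(2.0, 1.0, size=(s, r)) for s in shape]
rate = np.einsum("ir,jr,kr->ijk", *factors)
counts = rng.poisson(rate)
dropout = rng.random(shape) < 0.4            # 40% excess zeros
counts[dropout] = 0

print("ZIP  log-likelihood:", zip_loglik(counts, rate, pi=0.4))
print("Pois log-likelihood:", zip_loglik(counts, rate, pi=1e-12))  # ~ plain Poisson
```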

Simple online learning with consistency oracle

  • paper_url: http://arxiv.org/abs/2308.08055
  • repo_url: None
  • paper_authors: Alexander Kozachinskiy, Tomasz Steifer
  • for: 本研究旨在提出一种在模型中进行在线学习的算法,该模型允许学习算法只能通过一个一致性oracle访问类。
  • methods: 本研究使用了一种新的算法,该算法可以在类的Littlestone维度为$d$时 makest at most $O(256^d)$ mistake。我们的证明比前一个算法简单得多,只需要使用了基本的Littlestone维度属性。
  • results: 本研究的结果是,无论类的Littlestone维度如何大,我们的算法都可以在类中 makest at most $O(256^d)$ mistake。此外,我们还证明了Hasrati和Ben-David(ALT’23)的开题,即每个有 recursively enumerable representation 的类都可以有一个可计算的在线学习算法(可能是undefined on unrealizable samples)。
    Abstract We consider online learning in the model where a learning algorithm can access the class only via the consistency oracle -- an oracle, that, at any moment, can give a function from the class that agrees with all examples seen so far. This model was recently considered by Assos et al. (COLT'23). It is motivated by the fact that standard methods of online learning rely on computing the Littlestone dimension of subclasses, a problem that is computationally intractable. Assos et al. gave an online learning algorithm in this model that makes at most $C^d$ mistakes on classes of Littlestone dimension $d$, for some absolute unspecified constant $C > 0$. We give a novel algorithm that makes at most $O(256^d)$ mistakes. Our proof is significantly simpler and uses only very basic properties of the Littlestone dimension. We also observe that there exists no algorithm in this model that makes at most $2^{d+1}-2$ mistakes. We also observe that our algorithm (as well as the algorithm of Assos et al.) solves an open problem by Hasrati and Ben-David (ALT'23). Namely, it demonstrates that every class of finite Littlestone dimension with recursively enumerable representation admits a computable online learner (that may be undefined on unrealizable samples).
    摘要 我们考虑在模型中进行在网络学习,其中学习算法可以通过一个一致性 oracle 访问类别。这个 oracle 可以在任何时刻给出一个对应到所有已经看过的例子的函数。这个模型最近由Assos et al. (COLT'23) 考虑过。它是由于标准的在网络学习方法需要 Computing Littlestone 维度的问题是 computationally intractable 而提出的。Assos et al. 提出了一个在这个模型中的线上学习算法,它在类别的 Littlestone 维度为 $d$ 时会误导最多 $C^d$ 次。我们提出了一个新的算法,它在类别的 Littlestone 维度为 $d$ 时会误导最多 $O(256^d)$ 次。我们的证明比较简单,只需要使用一些非常基本的 Littlestone 维度的性质。我们还观察到,在这个模型中没有任何算法可以在类别的 Littlestone 维度为 $d$ 时误导最多 $2^{d+1}-2$ 次。此外,我们的算法(以及Assos et al. 的算法)解决了 Hasrati 和 Ben-David (ALT'23) 的开问题。即,它证明了所有具有 finite Littlestone 维度的类别,都存在可 computable 的线上学习者(可能是 undefined 的 samples)。

Natural Evolution Strategies as a Black Box Estimator for Stochastic Variational Inference

  • paper_url: http://arxiv.org/abs/2308.08053
  • repo_url: None
  • paper_authors: Ahmad Ayaz Amin
  • for: 用于超越 varyational autoencoders(VAE)中的权重估计问题,以实现 Bayesian 推断在大数据集上的有效进行。
  • methods: 使用自然进化策略(Natural Evolution Strategies)提出的一种alternative estimator,不假设使用的分布类型,allowing for the creation of models that would otherwise not have been possible under the VAE framework。
  • results: 提出的estimator不受权重估计问题的限制,可以创建不同类型的模型,并且可以在VAE中实现更高的效果。
    Abstract Stochastic variational inference and its derivatives in the form of variational autoencoders enjoy the ability to perform Bayesian inference on large datasets in an efficient manner. However, performing inference with a VAE requires a certain design choice (i.e. reparameterization trick) to allow unbiased and low variance gradient estimation, restricting the types of models that can be created. To overcome this challenge, an alternative estimator based on natural evolution strategies is proposed. This estimator does not make assumptions about the kind of distributions used, allowing for the creation of models that would otherwise not have been possible under the VAE framework.
    摘要 随机变分推断及其以变分自编码器(VAE)形式出现的衍生方法,能够高效地在大规模数据集上进行贝叶斯推断。然而,使用VAE进行推断需要特定的设计选择(即重参数化技巧)才能获得无偏且低方差的梯度估计,这限制了可构建的模型类型。为克服这一挑战,本文提出了一种基于自然进化策略的替代估计器。该估计器不对所使用的分布类型作任何假设,从而允许构建在VAE框架下原本无法实现的模型。
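
The core of the proposal is replacing reparameterized gradients with a natural-evolution-strategies estimator. A minimal version of that estimator is sketched below on a toy objective: perturb the parameters with Gaussian noise, weight the perturbations by the observed objective values, and average. The objective, population size, and step size are invented for illustration; in a VAE-like model the (possibly non-reparameterizable) ELBO would play the role of f.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(theta):
    """Stand-in black-box objective (e.g. a per-batch ELBO); no gradients needed."""
    return -np.sum((theta - 3.0) ** 2)

def nes_gradient(theta, f, pop=100, sigma=0.1):
    """Vanilla evolution-strategies estimate of the gradient of E[f(theta + sigma*eps)]."""
    eps = rng.normal(size=(pop, theta.size))
    fitness = np.array([f(theta + sigma * e) for e in eps])
    fitness = (fitness - fitness.mean()) / (fitness.std() + 1e-8)   # variance reduction
    return (eps.T @ fitness) / (pop * sigma)

theta = np.zeros(5)
lr = 0.05
for step in range(300):
    theta += lr * nes_gradient(theta, f)     # ascend the estimated gradient

print("recovered parameters (target is 3.0):", np.round(theta, 2))
```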

Unbiased Decisions Reduce Regret: Adversarial Domain Adaptation for the Bank Loan Problem

  • paper_url: http://arxiv.org/abs/2308.08051
  • repo_url: None
  • paper_authors: Elena Gal, Shaun Singh, Aldo Pacchiano, Ben Walker, Terry Lyons, Jakob Foerster
  • for: 本研究targets 实际世界中的二分类决策问题,即基于有限数据和实时决策的借款申请等问题。
  • methods: 本研究使用了对抗优化(AdOpt)来直接Address bias in the training set,通过对适应域适应来学习不偏且有用的表示。
  • results: AdOptsignificantly exceeds state-of-the-art performance on a set of challenging benchmark problems, and our experiments also provide initial evidence that the introduction of adversarial domain adaptation improves fairness in this setting.
    Abstract In many real world settings binary classification decisions are made based on limited data in near real-time, e.g. when assessing a loan application. We focus on a class of these problems that share a common feature: the true label is only observed when a data point is assigned a positive label by the principal, e.g. we only find out whether an applicant defaults if we accepted their loan application. As a consequence, the false rejections become self-reinforcing and cause the labelled training set, that is being continuously updated by the model decisions, to accumulate bias. Prior work mitigates this effect by injecting optimism into the model, however this comes at the cost of increased false acceptance rate. We introduce adversarial optimism (AdOpt) to directly address bias in the training set using adversarial domain adaptation. The goal of AdOpt is to learn an unbiased but informative representation of past data, by reducing the distributional shift between the set of accepted data points and all data points seen thus far. AdOpt significantly exceeds state-of-the-art performance on a set of challenging benchmark problems. Our experiments also provide initial evidence that the introduction of adversarial domain adaptation improves fairness in this setting.
    摘要 在许多现实场景中,二分类决策需要基于有限数据近乎实时地做出,例如审核贷款申请。我们关注其中一类具有共同特点的问题:只有当主体(principal)给某个数据点作出正类判定时才能观测到其真实标签,例如只有在批准贷款申请后才能得知申请人是否违约。其后果是,错误拒绝会自我强化,使由模型决策不断更新的有标签训练集逐渐累积偏差。已有工作通过向模型注入乐观性来缓解这一效应,但代价是错误接受率上升。我们提出对抗乐观(AdOpt)方法,借助对抗域自适应直接纠正训练集中的偏差。AdOpt的目标是通过缩小已接受数据点与迄今所见全部数据点之间的分布偏移,学习一个无偏但仍具信息量的历史数据表示。在一组具有挑战性的基准问题上,AdOpt显著超越了现有最佳方法。我们的实验还提供了初步证据,表明引入对抗域自适应能够改善该场景下的公平性。

Regret Lower Bounds in Multi-agent Multi-armed Bandit

  • paper_url: http://arxiv.org/abs/2308.08046
  • repo_url: None
  • paper_authors: Mengfan Xu, Diego Klabjan
  • for: 本研究的目的是提供多臂投注机制下的证明Upper bound和Lower bound的方法,以及对这些方法的分析和比较。
  • methods: 本研究使用了多臂投注机制,并提供了一系列的算法和方法来解决这些问题。
  • results: 本研究提供了一系列的Lower bound,包括适用于各种设定下的下界,以及与之前的研究中的Upper bound之间的差异。
    Abstract Multi-armed Bandit motivates methods with provable upper bounds on regret, and the counterpart lower bounds have also been extensively studied in this context. Recently, Multi-agent Multi-armed Bandit has gained significant traction in various domains, where individual clients face bandit problems in a distributed manner and the objective is the overall system performance, typically measured by regret. While efficient algorithms with regret upper bounds have emerged, limited attention has been given to the corresponding regret lower bounds, except for a recent lower bound for the adversarial setting, which, however, has a gap with the known upper bounds. To this end, we herein provide the first comprehensive study on regret lower bounds across different settings and establish their tightness. Specifically, when the graphs exhibit good connectivity properties and the rewards are stochastically distributed, we demonstrate a lower bound of order $O(\log T)$ for instance-dependent bounds and $\sqrt{T}$ for mean-gap independent bounds, both of which are tight. Assuming adversarial rewards, we establish a lower bound of $O(T^{\frac{2}{3}})$ for connected graphs, thereby bridging the gap between the lower and upper bound in the prior work. We also show a linear regret lower bound when the graph is disconnected. While previous works have explored these settings with upper bounds, we provide a thorough study on tight lower bounds.
    摘要 多臂猎手驱动方法的提高方法拥有可证明的上界 regret,同时对应的下界Bound也得到了广泛的研究。在这个 Setting 中,Recently, Multi-agent Multi-armed Bandit 在不同领域中得到了广泛应用,每个客户端面临分布式的猎手问题,系统性能通常由 regret 来衡量。虽然有效的算法出现了,但对应的 regret 下界却受到了限制的注意。为此,我们在这里提供了首次对 regret 下界的全面研究,并证明其紧致性。 Specifically, 当图表现出好连接性和奖励是随机分布的时候,我们展示了一个下界 bound 的规模为 $\order{ \log T}$ 的实例依赖性 bounds 和 $\sqrt{T}$ 的不相互独立 bounds,这些下界都是紧致的。在对抗性奖励情况下,我们提出了一个下界 bound 的规模为 $O(T^{ \frac{2}{3})$,这个 bound 可以将上下界之间的差异bridged。此外,我们还证明了连接图时的线性下界 bound。而在之前的工作中,只有对 upper bounds 进行了研究。我们对这些设定进行了全面的研究,并证明了其下界的紧致性。

A Comparative Analysis of the Capabilities of Nature-inspired Feature Selection Algorithms in Predicting Student Performance

  • paper_url: http://arxiv.org/abs/2308.08574
  • repo_url: None
  • paper_authors: Thomas Trask
  • for: 预测学生表现,以便对高风险学生及时采取有效的干预措施,帮助他们避免落后。
  • methods: 使用12种自然引导的算法,包括特征选择和传统机器学习算法,对3个数据集进行预测,包括单元操作数据、单一课程表现数据和同时攻读多门课程表现数据。
  • results: 结果表明,对于所有数据集,使用NIAs进行特征选择和传统机器学习算法进行分类,可以提高预测精度,同时减少特征集的大小。
    Abstract Predicting student performance is key in leveraging effective pre-failure interventions for at-risk students. In this paper, I have analyzed the relative performance of a suite of 12 nature-inspired algorithms when used to predict student performance across 3 datasets consisting of instance-based clickstream data, intra-course single-course performance, and performance when taking multiple courses simultaneously. I found that, for all datasets, leveraging an ensemble approach using NIAs for feature selection and traditional ML algorithms for classification increased predictive accuracy while also reducing feature set size by 2/3.
    摘要 预测学生表现是对高风险学生及时实施有效干预的关键。在这篇论文中,我分析了12种自然启发算法在预测学生表现方面的相对性能,实验涉及3个数据集,分别包含基于实例的点击流数据、单门课程内的表现数据以及同时修读多门课程时的表现数据。我发现,对于所有数据集,采用集成方法,即利用自然启发算法(NIA)进行特征选择、再用传统机器学习算法进行分类,既能提高预测精度,又能将特征集的规模缩减约三分之二。
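
To ground the idea of the ensemble, the sketch below wraps one representative nature-inspired algorithm, a simple genetic algorithm, around a conventional classifier for feature selection and then evaluates the classifier on the selected subset. The synthetic dataset, GA settings, and the choice of logistic regression are illustrative assumptions and do not correspond to the twelve NIAs or the educational datasets analyzed in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=30, n_informative=8,
                           n_redundant=10, random_state=0)
clf = LogisticRegression(max_iter=1000)

def fitness(mask):
    """CV accuracy of the classifier restricted to the selected features."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# A small genetic algorithm over binary feature masks.
pop_size, n_gen, n_feat = 20, 15, X.shape[1]
pop = rng.random((pop_size, n_feat)) < 0.5
for gen in range(n_gen):
    scores = np.array([fitness(ind) for ind in pop])
    order = np.argsort(scores)[::-1]
    parents = pop[order[: pop_size // 2]]                 # truncation selection
    children = []
    for _ in range(pop_size - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_feat)                      # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_feat) < 0.05                   # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
print("accuracy, all features     :", cross_val_score(clf, X, y, cv=3).mean())
print("accuracy, selected features:", cross_val_score(clf, X[:, best], y, cv=3).mean())
```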

Classification of Data Generated by Gaussian Mixture Models Using Deep ReLU Networks

  • paper_url: http://arxiv.org/abs/2308.08030
  • repo_url: None
  • paper_authors: Tian-Yi Zhou, Xiaoming Huo
  • for: 这个论文研究了使用深度ReLU神经网络进行 ${\mathbb R}^d$ 上二分类问题的解决方案,不受模型参数的限制。
  • methods: 我们使用深度ReLU神经网络,并为一般解析函数给出了一个新的ReLU网络近似误差界,以支撑分析。
  • results: 我们得到了超额风险(误分类误差)的非渐近上界与收敛速率,且这些收敛速率不依赖于维度 $d$,表明深度ReLU神经网络能够在分类中克服维度灾难。
    Abstract This paper studies the binary classification of unbounded data from ${\mathbb R}^d$ generated under Gaussian Mixture Models (GMMs) using deep ReLU neural networks. We obtain, for the first time, non-asymptotic upper bounds and convergence rates of the excess risk (excess misclassification error) for the classification without restrictions on model parameters. The convergence rates we derive do not depend on the dimension $d$, demonstrating that deep ReLU networks can overcome the curse of dimensionality in classification. While the majority of existing generalization analyses of classification algorithms rely on a bounded domain, we consider an unbounded domain by leveraging the analyticity and fast decay of Gaussian distributions. To facilitate our analysis, we give a novel approximation error bound for general analytic functions using ReLU networks, which may be of independent interest. Gaussian distributions can be adopted nicely to model data arising in applications such as speech, images, and text; our results provide a theoretical verification of the observed efficiency of deep neural networks in practical classification problems.
    摘要 本文研究利用深度ReLU神经网络,对由高斯混合模型(GMM)生成的 ${\mathbb R}^d$ 上无界数据进行二分类的问题。我们首次在不限制模型参数的情况下,给出了超额风险(误分类误差)的非渐近上界与收敛速率。所推导的收敛速率不依赖于维度 $d$,表明深度ReLU网络能够在分类问题中克服维度灾难。现有针对分类算法的泛化分析大多依赖于有界定义域,而我们利用高斯分布的解析性与快速衰减性,考虑了无界定义域的情形。为便于分析,我们给出了用ReLU网络逼近一般解析函数的新误差界,该结果可能具有独立的价值。高斯分布能够很好地刻画语音、图像和文本等应用中的数据;我们的结果为深度神经网络在实际分类问题中表现出的高效性提供了理论验证。
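
A toy version of the analyzed setting is easy to simulate: draw each class from a Gaussian mixture on an unbounded domain and fit a small ReLU network. The sketch below does this with scikit-learn's MLPClassifier; the mixture parameters and network width are arbitrary choices, and the experiment only illustrates the setup rather than verifying the paper's rates.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
d, n_per_class = 10, 2000

def sample_gmm(means, n):
    """Draw n points from an equal-weight Gaussian mixture with identity covariance."""
    comps = rng.integers(len(means), size=n)
    return rng.normal(size=(n, d)) + np.asarray(means)[comps]

# Class 0 and class 1 are each two-component GMMs in R^d (unbounded support).
X0 = sample_gmm([np.full(d, -2.0), np.full(d, -0.5)], n_per_class)
X1 = sample_gmm([np.full(d, +2.0), np.full(d, +0.5)], n_per_class)
X = np.vstack([X0, X1])
y = np.array([0] * n_per_class + [1] * n_per_class)

# Deep ReLU network (two hidden layers) trained on the mixture data.
net = MLPClassifier(hidden_layer_sizes=(64, 64), activation="relu",
                    max_iter=500, random_state=0)
net.fit(X, y)

X0_test = sample_gmm([np.full(d, -2.0), np.full(d, -0.5)], 1000)
X1_test = sample_gmm([np.full(d, +2.0), np.full(d, +0.5)], 1000)
X_test = np.vstack([X0_test, X1_test])
y_test = np.array([0] * 1000 + [1] * 1000)
print("test misclassification error:", 1.0 - net.score(X_test, y_test))
```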

Planning to Learn: A Novel Algorithm for Active Learning during Model-Based Planning

  • paper_url: http://arxiv.org/abs/2308.08029
  • repo_url: https://github.com/rowanlibr/sophisticated-learning
  • paper_authors: Rowan Hodson, Bruce Bassett, Charel van Hoof, Benjamin Rosman, Mark Solms, Jonathan P. Shock, Ryan Smith
  • for: 这 paper 的目的是比较 Active Inference 和 Bayesian reinforcement learning schemes 在解决类似问题上的性能,以及提出一种 incorporating active learning during planning 的方法。
  • methods: 这 paper 使用了 Sophisticated Inference (SI) 算法和 Sophisticated Learning (SL) 算法,SI 使用了回归搜索来解决多步计划问题,而 SL 维护了对模型参数的信念变化的想法,以实现对未来观测的反思式学习。
  • results: simulations 表明,SL 在一种生物学上levant的环境中表现出色,比 Bayes-adaptive RL 和 upper confidence bound algorithms 更高效,这些算法都使用了类似的原则(例如,导向探索和对未来观测的反思)来解决多步计划问题。
    Abstract Active Inference is a recent framework for modeling planning under uncertainty. Empirical and theoretical work have now begun to evaluate the strengths and weaknesses of this approach and how it might be improved. A recent extension - the sophisticated inference (SI) algorithm - improves performance on multi-step planning problems through recursive decision tree search. However, little work to date has been done to compare SI to other established planning algorithms. SI was also developed with a focus on inference as opposed to learning. The present paper has two aims. First, we compare performance of SI to Bayesian reinforcement learning (RL) schemes designed to solve similar problems. Second, we present an extension of SI - sophisticated learning (SL) - that more fully incorporates active learning during planning. SL maintains beliefs about how model parameters would change under the future observations expected under each policy. This allows a form of counterfactual retrospective inference in which the agent considers what could be learned from current or past observations given different future observations. To accomplish these aims, we make use of a novel, biologically inspired environment designed to highlight the problem structure for which SL offers a unique solution. Here, an agent must continually search for available (but changing) resources in the presence of competing affordances for information gain. Our simulations show that SL outperforms all other algorithms in this context - most notably, Bayes-adaptive RL and upper confidence bound algorithms, which aim to solve multi-step planning problems using similar principles (i.e., directed exploration and counterfactual reasoning). These results provide added support for the utility of Active Inference in solving this class of biologically-relevant problems and offer added tools for testing hypotheses about human cognition.
    摘要 主动推断(Active Inference)是近年来提出的用于刻画不确定性下规划的框架。实证与理论研究已开始评估该方法的优缺点以及可能的改进方向。其最近的一个扩展,即复杂推断(Sophisticated Inference,SI)算法,通过递归决策树搜索提升了多步规划问题上的性能。然而,目前鲜有工作将SI与其他成熟的规划算法进行比较,而且SI的设计侧重于推断而非学习。本文有两个目标:其一,将SI与为解决类似问题而设计的贝叶斯强化学习(RL)方案进行性能比较;其二,提出SI的扩展,即复杂学习(Sophisticated Learning,SL),使规划过程更充分地纳入主动学习。SL维护关于模型参数在各策略所预期的未来观测下将如何变化的信念,从而支持一种反事实的回溯推断:智能体可以考虑在不同的未来观测下,从当前或过去的观测中能够学到什么。为实现上述目标,我们使用了一个新颖的、受生物启发的环境,用以凸显SL能够提供独特解法的问题结构:在该环境中,智能体必须在存在相互竞争的信息增益机会的情况下,持续搜寻可用但不断变化的资源。仿真结果表明,SL在此情境下优于所有其他算法,尤其是采用类似原则(即定向探索与反事实推理)来求解多步规划问题的贝叶斯自适应RL和上置信界算法。这些结果进一步支持了主动推断在求解这类与生物相关问题上的价值,并为检验关于人类认知的假设提供了额外的工具。

Potential Energy Advantage of Quantum Economy

  • paper_url: http://arxiv.org/abs/2308.08025
  • repo_url: None
  • paper_authors: Junyu Liu, Hansheng Jiang, Zuo-Jun Max Shen
  • for: 本研究旨在探讨量子计算在能源效率方面的优势,并证明量子计算可以在能源消耗方面比经典计算更高效。
  • methods: 我们使用 Cournot 竞争模型,将能源消耗作为约束条件,并通过 Nash 均衡来证明量子计算机器可以在财务收益和能源效率两个方面比经典计算机器更高效。
  • results: 我们发现,量子计算在大规模计算中可以获得更高的能源效率优势,并且需要在大规模的操作范围内进行计算。基于实际物理参数,我们还证明了实现这种能源效率优势所需的规模的尺度。
    Abstract Energy cost is increasingly crucial in the modern computing industry with the wide deployment of large-scale machine learning models and language models. For the firms that provide computing services, low energy consumption is important both from the perspective of their own market growth and the government's regulations. In this paper, we study the energy benefits of quantum computing vis-a-vis classical computing. Deviating from the conventional notion of quantum advantage based solely on computational complexity, we redefine advantage in an energy efficiency context. Through a Cournot competition model constrained by energy usage, we demonstrate quantum computing firms can outperform classical counterparts in both profitability and energy efficiency at Nash equilibrium. Therefore quantum computing may represent a more sustainable pathway for the computing industry. Moreover, we discover that the energy benefits of quantum computing economies are contingent on large-scale computation. Based on real physical parameters, we further illustrate the scale of operation necessary for realizing this energy efficiency advantage.
    摘要 随着大规模机器学习模型和语言模型的广泛部署,能源成本在现代计算行业中日益重要。对提供计算服务的公司而言,低能耗既关系到自身的市场增长,也关系到政府监管。本文研究量子计算相对于经典计算的能源效益。我们不再沿用仅基于计算复杂度的量子优势概念,而是在能源效率的语境下重新定义"优势"。通过一个受能源使用约束的古诺(Cournot)竞争模型,我们证明在纳什均衡下,量子计算公司在盈利能力和能源效率两方面均可超越经典对手,因此量子计算可能是计算行业更可持续的发展路径。此外,我们发现量子计算经济的能源效益依赖于大规模计算,并基于真实物理参数进一步说明了实现这种能效优势所需的运营规模。
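
The core mechanism here is a quantity-setting (Cournot) game whose Nash equilibrium is compared across energy-cost structures. The sketch below is a minimal, hypothetical illustration of that idea in Python: two firms with different marginal (energy-dominated) costs reach a Cournot-Nash equilibrium by best-response iteration. The demand curve, cost values, and the "quantum-like vs classical" labels are assumptions for illustration only, not the paper's calibration or real physical parameters.

```python
# Toy Cournot duopoly with energy-dependent costs (illustrative only; all
# parameters are hypothetical and unrelated to the paper's calibration).
import numpy as np

# Inverse demand: price p(Q) = a - b * Q, with total quantity Q = q1 + q2.
a, b = 100.0, 1.0

# Marginal costs dominated by energy: firm 1 ("quantum-like") is assumed to
# have a lower per-unit energy cost than firm 2 ("classical").
c1, c2 = 10.0, 30.0

def best_response(q_other, c):
    """Maximiser of profit (a - b*(q + q_other))*q - c*q, clipped at zero."""
    return max(0.0, (a - c - b * q_other) / (2 * b))

# Simultaneous best-response iteration converges to the Nash equilibrium
# of this linear-quadratic game.
q1, q2 = 1.0, 1.0
for _ in range(200):
    q1, q2 = best_response(q2, c1), best_response(q1, c2)

price = a - b * (q1 + q2)
profit1 = (price - c1) * q1
profit2 = (price - c2) * q2
print(f"q1={q1:.2f}, q2={q2:.2f}, price={price:.2f}")
print(f"profit1={profit1:.2f}, profit2={profit2:.2f}")
```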

Active Inverse Learning in Stackelberg Trajectory Games

  • paper_url: http://arxiv.org/abs/2308.08017
  • repo_url: None
  • paper_authors: Yue Yu, Jacob Levy, Negar Mehr, David Fridovich-Keil, Ufuk Topcu
  • for: 本文研究的目的是使用游戏理论学习来推断玩家的目标函数。
  • methods: 本文提出一种主动式逆向学习方法,使领导者能够从有限的候选假设中推断哪一个假设描述了追随者的目标函数。与现有方法依赖被动观测到的轨迹不同,该方法主动最大化追随者在不同假设下轨迹之间的差异,从而加速领导者的推断。
  • results: 在滚动时域重复轨迹博弈中,与均匀随机输入相比,所提方法给出的领导者输入能将以追随者轨迹为条件的各假设概率的收敛速度提升数个数量级。
    Abstract Game-theoretic inverse learning is the problem of inferring the players' objectives from their actions. We formulate an inverse learning problem in a Stackelberg game between a leader and a follower, where each player's action is the trajectory of a dynamical system. We propose an active inverse learning method for the leader to infer which hypothesis among a finite set of candidates describes the follower's objective function. Instead of using passively observed trajectories like existing methods, the proposed method actively maximizes the differences in the follower's trajectories under different hypotheses to accelerate the leader's inference. We demonstrate the proposed method in a receding-horizon repeated trajectory game. Compared with uniformly random inputs, the leader inputs provided by the proposed method accelerate the convergence of the probability of different hypotheses conditioned on the follower's trajectory by orders of magnitude.
    摘要 博弈论反学习问题是从玩家的行为中推断其目标。我们在领导者与追随者组成的 Stackelberg 博弈中构建了一个反学习问题,其中每个玩家的行为是一条动力系统轨迹。我们提出一种主动反学习方法,使领导者能够从有限的候选假设中推断哪一个假设描述了追随者的目标函数。与依赖被动观测轨迹的现有方法不同,该方法主动最大化追随者在不同假设下轨迹之间的差异,从而加速领导者的推断。我们在滚动时域重复轨迹博弈中验证了该方法:与均匀随机输入相比,所提方法给出的领导者输入能使以追随者轨迹为条件的各假设概率的收敛速度提升数个数量级。

GRINN: A Physics-Informed Neural Network for solving hydrodynamic systems in the presence of self-gravity

  • paper_url: http://arxiv.org/abs/2308.08010
  • repo_url: None
  • paper_authors: Sayantan Auddy, Ramit Dey, Neal J. Turner, Shantanu Basu
  • for: 模拟三维自引力气体流动,以回答天体物理中的许多基础问题,例如行星形成盘、恒星形成云、星系形成以及宇宙大尺度结构的演化。
  • methods: 利用物理信息神经网络(PINN)的通用逼近能力,在无网格(mesh-free)框架中求解随时间演化的偏微分方程(PDEs)。
  • results: 在线性阶段与解析解的偏差在 1% 以内,在扰动进入非线性阶段后与传统网格代码解的偏差在 5% 以内;GRINN 的计算时间不随维度数增长,在一维和二维计算中慢于网格代码,但在三维计算中以相近精度快约一个数量级。
    Abstract Modeling self-gravitating gas flows is essential to answering many fundamental questions in astrophysics. This spans many topics including planet-forming disks, star-forming clouds, galaxy formation, and the development of large-scale structures in the Universe. However, the nonlinear interaction between gravity and fluid dynamics offers a formidable challenge to solving the resulting time-dependent partial differential equations (PDEs) in three dimensions (3D). By leveraging the universal approximation capabilities of a neural network within a mesh-free framework, physics informed neural networks (PINNs) offer a new way of addressing this challenge. We introduce the gravity-informed neural network (GRINN), a PINN-based code, to simulate 3D self-gravitating hydrodynamic systems. Here, we specifically study gravitational instability and wave propagation in an isothermal gas. Our results match a linear analytic solution to within 1\% in the linear regime and a conventional grid code solution to within 5\% as the disturbance grows into the nonlinear regime. We find that the computation time of the GRINN does not scale with the number of dimensions. This is in contrast to the scaling of the grid-based code for the hydrodynamic and self-gravity calculations as the number of dimensions is increased. Our results show that the GRINN computation time is longer than the grid code in one- and two- dimensional calculations but is an order of magnitude lesser than the grid code in 3D with similar accuracy. Physics-informed neural networks like GRINN thus show promise for advancing our ability to model 3D astrophysical flows.
    摘要 对自引力气体流动进行建模是回答天体物理中许多基础问题的关键,涉及行星形成盘、恒星形成云、星系形成以及宇宙大尺度结构的演化等课题。然而,引力与流体动力学之间的非线性相互作用使得在三维空间中求解相应的含时偏微分方程(PDEs)极具挑战。物理信息神经网络(PINNs)借助神经网络的通用逼近能力,在无网格框架中为这一挑战提供了新的途径。我们提出 GRINN,一种基于 PINN 的代码,用于模拟三维自引力流体系统,并重点研究等温气体中的引力不稳定性与波的传播。结果显示,在线性阶段 GRINN 与解析解的偏差在 1% 以内,在扰动进入非线性阶段后与传统网格代码解的偏差在 5% 以内。GRINN 的计算时间不随维度数增长:在一维和二维计算中其耗时长于网格代码,但在三维计算中以相近精度快约一个数量级。这类物理信息神经网络因此有望推进我们对三维天体物理流动的建模能力。
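
Since GRINN builds on the physics-informed neural network idea, a minimal PINN sketch may help make the residual-loss mechanism concrete. The toy below (PyTorch, assumed available) trains a small network to satisfy a 1-D Poisson equation, a stand-in for the self-gravity part of the problem, with the boundary condition enforced through the loss; it is not the GRINN code, and the architecture, sampling, and loss weights are arbitrary illustrative choices.

```python
# Minimal physics-informed neural network (PINN) sketch for a 1-D Poisson
# problem, d^2(phi)/dx^2 = rho(x): a toy stand-in for the self-gravity part
# of a hydro solver, with the analytic solution phi(x) = x(x-1)/2 for rho=1.
import torch

torch.manual_seed(0)

net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

def rho(x):                      # uniform source term
    return torch.ones_like(x)

for step in range(3000):
    x = torch.rand(128, 1, requires_grad=True)           # interior collocation points
    phi = net(x)
    dphi = torch.autograd.grad(phi, x, torch.ones_like(phi), create_graph=True)[0]
    d2phi = torch.autograd.grad(dphi, x, torch.ones_like(dphi), create_graph=True)[0]
    pde_loss = ((d2phi - rho(x)) ** 2).mean()             # PDE residual

    xb = torch.tensor([[0.0], [1.0]])                     # boundary: phi(0) = phi(1) = 0
    bc_loss = (net(xb) ** 2).mean()

    loss = pde_loss + 10.0 * bc_loss
    opt.zero_grad(); loss.backward(); opt.step()

# Compare the trained network against the analytic solution at a few points.
xt = torch.linspace(0, 1, 5).reshape(-1, 1)
print(torch.cat([net(xt), xt * (xt - 1) / 2], dim=1).detach())
```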

BI-LAVA: Biocuration with Hierarchical Image Labeling through Active Learning and Visual Analysis

  • paper_url: http://arxiv.org/abs/2308.08003
  • repo_url: None
  • paper_authors: Juan Trelles, Andrew Wentzel, William Berrios, G. Elisabeta Marai
  • for: 本研究旨在提高生物数据整理(biocuration)中数据的质量和可用性,通过可视分析与主动学习策略,帮助模型构建者应对不完整的标注数据和层次化分类体系带来的挑战。
  • methods: 本研究结合迭代式可视分析与主动学习,具有以下特点:(1)利用少量图像标签、层次化的图像分类器和主动学习来处理不完整的真实标签,并对大量未标注图像进行分类;(2)使用自定义编码展示数据分布、分类体系、图像投影及图像缩略图邻域;(3)借助可视分析帮助模型构建者探索不熟悉的图像数据集和分类体系,并修正与生成标签。
  • results: 评估结果表明,这种人机结合的方法能够有效帮助领域专家理解分类体系中各类别的特征,并验证和提升已标注与未标注集合的数据质量。
    Abstract In the biomedical domain, taxonomies organize the acquisition modalities of scientific images in hierarchical structures. Such taxonomies leverage large sets of correct image labels and provide essential information about the importance of a scientific publication, which could then be used in biocuration tasks. However, the hierarchical nature of the labels, the overhead of processing images, the absence or incompleteness of labeled data, and the expertise required to label this type of data impede the creation of useful datasets for biocuration. From a multi-year collaboration with biocurators and text-mining researchers, we derive an iterative visual analytics and active learning strategy to address these challenges. We implement this strategy in a system called BI-LAVA Biocuration with Hierarchical Image Labeling through Active Learning and Visual Analysis. BI-LAVA leverages a small set of image labels, a hierarchical set of image classifiers, and active learning to help model builders deal with incomplete ground-truth labels, target a hierarchical taxonomy of image modalities, and classify a large pool of unlabeled images. BI-LAVA's front end uses custom encodings to represent data distributions, taxonomies, image projections, and neighborhoods of image thumbnails, which help model builders explore an unfamiliar image dataset and taxonomy and correct and generate labels. An evaluation with machine learning practitioners shows that our mixed human-machine approach successfully supports domain experts in understanding the characteristics of classes within the taxonomy, as well as validating and improving data quality in labeled and unlabeled collections.
    摘要 在生物医学领域,分类体系(taxonomy)将科学图像的获取方式组织成层次结构。这类分类体系依赖大量正确的图像标签,提供关于科学出版物重要性的关键信息,可用于生物数据整理(biocuration)任务。然而,标签的层次性、图像处理的开销、标注数据的缺失或不完整,以及标注此类数据所需的专业知识,使得构建可用于生物数据整理的数据集十分困难。基于与生物数据管理员和文本挖掘研究者多年的合作,我们提出了一种迭代式可视分析与主动学习策略,并在 BI-LAVA 系统中实现了这一策略。BI-LAVA 利用少量图像标签、层次化的图像分类器和主动学习,帮助模型构建者处理不完整的真实标签、面向层次化的图像模态分类体系,并对大量未标注图像进行分类。BI-LAVA 的前端使用自定义编码来展示数据分布、分类体系、图像投影以及图像缩略图的邻域,帮助模型构建者探索不熟悉的图像数据集和分类体系,并修正与生成标签。与机器学习从业者的评估表明,这种人机结合的方法能够有效支持领域专家理解分类体系中各类别的特征,并验证和提升已标注与未标注集合的数据质量。

A physics-informed machine learning model for reconstruction of dynamic loads

  • paper_url: http://arxiv.org/abs/2308.08571
  • repo_url: None
  • paper_authors: Gledson Rodrigo Tondo, Igor Kavrakov, Guido Morgenthal
  • For: This paper aims to develop a probabilistic physics-informed machine-learning framework for reconstructing dynamic forces on long-span bridges based on measured deflections, velocities, or accelerations.* Methods: The proposed framework uses Gaussian process regression to model the relationship between the measured data and the dynamic forces, and can handle incomplete and contaminated data.* Results: The developed framework is applied to an aerodynamic analysis of the Great Belt East Bridge, and the results show good agreement between the applied and predicted dynamic load. The framework can be used for validation of design models and assumptions, as well as prognosis of responses to assist in damage detection and structural health monitoring. Here is the same information in Simplified Chinese:* For: 这篇论文旨在开发一种概率式、物理信息融合的机器学习框架,基于测得的挠度、速度或加速度重建长跨桥上的动态荷载。* Methods: 所提框架使用高斯过程回归来建模测量数据与动态荷载之间的关系,并能处理不完整和含噪声的数据。* Results: 该框架应用于大贝尔特东桥(Great Belt East Bridge)的气动分析,结果显示施加荷载与预测荷载之间吻合良好;该框架可用于验证设计模型与假设,以及响应预测以辅助损伤检测和结构健康监测。
    Abstract Long-span bridges are subjected to a multitude of dynamic excitations during their lifespan. To account for their effects on the structural system, several load models are used during design to simulate the conditions the structure is likely to experience. These models are based on different simplifying assumptions and are generally guided by parameters that are stochastically identified from measurement data, making their outputs inherently uncertain. This paper presents a probabilistic physics-informed machine-learning framework based on Gaussian process regression for reconstructing dynamic forces based on measured deflections, velocities, or accelerations. The model can work with incomplete and contaminated data and offers a natural regularization approach to account for noise in the measurement system. An application of the developed framework is given by an aerodynamic analysis of the Great Belt East Bridge. The aerodynamic response is calculated numerically based on the quasi-steady model, and the underlying forces are reconstructed using sparse and noisy measurements. Results indicate a good agreement between the applied and the predicted dynamic load and can be extended to calculate global responses and the resulting internal forces. Uses of the developed framework include validation of design models and assumptions, as well as prognosis of responses to assist in damage detection and structural health monitoring.
    摘要 长跨桥在其使用寿命内会受到多种动态激励。为考虑这些激励对结构系统的影响,设计阶段会使用多种荷载模型来模拟结构可能经历的工况。这些模型基于不同的简化假设,其参数通常由测量数据随机识别得到,因此输出具有内在的不确定性。本文提出一种基于高斯过程回归的概率式、物理信息融合的机器学习框架,用于依据测得的挠度、速度或加速度重建动态荷载。该模型可以处理不完整和被污染的数据,并提供一种自然的正则化方式来考虑测量系统中的噪声。我们将该框架应用于大贝尔特东桥(Great Belt East Bridge)的气动分析:基于准定常模型数值计算气动响应,并利用稀疏且含噪声的测量重建其背后的荷载。结果显示施加荷载与预测荷载之间吻合良好,并可进一步用于计算整体响应及由此产生的内力。该框架可用于验证设计模型与假设,以及响应预测,以辅助损伤检测和结构健康监测。
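
The backbone of the framework described above is Gaussian process regression applied to sparse, noisy measurements, with measurement noise handled through the kernel. The sketch below shows that building block with scikit-learn on synthetic data; the signal, kernel choice, and noise level are assumptions for illustration and do not reproduce the paper's force-reconstruction model.

```python
# Sketch of GP regression on sparse, noisy response measurements. The
# RBF + white-noise kernel shows how measurement noise is absorbed as a
# natural regulariser; the "measurements" are synthetic.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Sparse, noisy samples of an unknown dynamic signal (e.g. a deflection record).
t_train = np.sort(rng.uniform(0, 10, 25))[:, None]
y_train = np.sin(t_train).ravel() + 0.1 * rng.standard_normal(25)

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(t_train, y_train)

# Dense reconstruction with calibrated uncertainty bands.
t_test = np.linspace(0, 10, 200)[:, None]
mean, std = gp.predict(t_test, return_std=True)
print("learned kernel:", gp.kernel_)
print("max posterior std:", std.max())
```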

Monte Carlo guided Diffusion for Bayesian linear inverse problems

  • paper_url: http://arxiv.org/abs/2308.07983
  • repo_url: None
  • paper_authors: Gabriel Cardoso, Yazid Janati El Idrissi, Sylvain Le Corff, Eric Moulines
  • for: 这篇论文旨在解决结合前向测量模型与先验模型的病态(ill-posed)线性逆问题,此类问题广泛出现在从计算摄影到医学成像等多种应用中。
  • methods: 这篇论文使用分数基生成模型(SGM)来解决这些问题,特别是在图像修复(inpainting)问题中。
  • results: 该论文在贝叶斯框架下将恢复问题表述为一个 Feynman-Kac 模型,并使用顺序 Monte Carlo 方法求解。数值实验表明,所提算法 MCGdiff 在处理病态逆问题时优于竞争基线。
    Abstract Ill-posed linear inverse problems that combine knowledge of the forward measurement model with prior models arise frequently in various applications, from computational photography to medical imaging. Recent research has focused on solving these problems with score-based generative models (SGMs) that produce perceptually plausible images, especially in inpainting problems. In this study, we exploit the particular structure of the prior defined in the SGM to formulate recovery in a Bayesian framework as a Feynman--Kac model adapted from the forward diffusion model used to construct score-based diffusion. To solve this Feynman--Kac problem, we propose the use of Sequential Monte Carlo methods. The proposed algorithm, MCGdiff, is shown to be theoretically grounded and we provide numerical simulations showing that it outperforms competing baselines when dealing with ill-posed inverse problems.
    摘要 结合前向测量模型知识与先验模型的病态线性逆问题,在计算摄影和医学成像等多个应用中经常出现。近期研究着重使用基于得分的生成模型(SGM)来求解这些问题,以生成感知上合理的图像,尤其是在图像修复问题中。本研究利用 SGM 中先验的特殊结构,在贝叶斯框架下将恢复问题表述为由构造得分扩散所用的前向扩散模型改造而来的 Feynman-Kac 模型,并提出使用顺序 Monte Carlo 方法求解。所提算法 MCGdiff 具有理论依据,数值模拟表明其在处理病态逆问题时优于竞争基线。
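
MCGdiff relies on Sequential Monte Carlo applied to a Feynman-Kac model. As a reference point for how the propagate/weight/resample loop works, here is a generic bootstrap particle filter on a 1-D linear-Gaussian state-space model; it is not MCGdiff itself, and the model parameters are arbitrary.

```python
# Generic bootstrap particle filter (sequential Monte Carlo) on a 1-D
# linear-Gaussian state-space model, illustrating the propagate / weight /
# resample loop that Feynman-Kac samplers build on.
import numpy as np

rng = np.random.default_rng(0)
T, N = 50, 1000                      # time steps, particles
a, sig_x, sig_y = 0.9, 0.5, 0.3      # transition / observation parameters

# Simulate a ground-truth trajectory and observations y_t = x_t + noise.
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = a * x_true[t - 1] + sig_x * rng.standard_normal()
y = x_true + sig_y * rng.standard_normal(T)

particles = rng.standard_normal(N)
estimates = np.zeros(T)
for t in range(T):
    # 1) Propagate particles through the transition kernel.
    particles = a * particles + sig_x * rng.standard_normal(N)
    # 2) Weight by the observation likelihood N(y_t; x, sig_y^2).
    logw = -0.5 * ((y[t] - particles) / sig_y) ** 2
    w = np.exp(logw - logw.max())
    w /= w.sum()
    estimates[t] = np.sum(w * particles)
    # 3) Multinomial resampling to avoid weight degeneracy.
    particles = particles[rng.choice(N, size=N, p=w)]

print("RMSE of filtered mean:", np.sqrt(np.mean((estimates - x_true) ** 2)))
```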

An Adaptive Approach for Probabilistic Wind Power Forecasting Based on Meta-Learning

  • paper_url: http://arxiv.org/abs/2308.07980
  • repo_url: None
  • paper_authors: Zichao Meng, Ye Guo, Hongbin Sun
  • for: 这个论文研究了一种适应性approach for probabilistic wind power forecasting (WPF), 包括Offline和Online学习过程。
  • methods: 在Offline学习阶段,使用内部和外部循环更新的meta-学习方法来训练基础预测模型,使其具有不同预测任务中的优秀适应性,如 probabilistic WPF with different lead times or locations。在Online学习阶段,基础预测模型被应用于在线预测,并与增量学习技术相结合。
  • results: 对于不同的预测任务和地点,提出了两种应用:一是针对不同的领先时间(temporal adaptation),二是针对新建的风力电站(spatial adaptation)。数据集实验结果表明,提出的方法具有优秀的适应性,与现有的方法相比。
    Abstract This paper studies an adaptive approach for probabilistic wind power forecasting (WPF) including offline and online learning procedures. In the offline learning stage, a base forecast model is trained via inner and outer loop updates of meta-learning, which endows the base forecast model with excellent adaptability to different forecast tasks, i.e., probabilistic WPF with different lead times or locations. In the online learning stage, the base forecast model is applied to online forecasting combined with incremental learning techniques. On this basis, the online forecast takes full advantage of recent information and the adaptability of the base forecast model. Two applications are developed based on our proposed approach concerning forecasting with different lead times (temporal adaptation) and forecasting for newly established wind farms (spatial adaptation), respectively. Numerical tests were conducted on real-world wind power data sets. Simulation results validate the advantages in adaptivity of the proposed methods compared with existing alternatives.
    摘要

MultiSChuBERT: Effective Multimodal Fusion for Scholarly Document Quality Prediction

  • paper_url: http://arxiv.org/abs/2308.07971
  • repo_url: None
  • paper_authors: Gideon Maillette de Buy Wenniger, Thomas van Dongen, Lambert Schomaker
  • for: 本研究旨在提高学术文献质量预测 task 的性能,通过将文本信息和视觉信息结合起来进行预测。
  • methods: 本研究使用了一种多模态预测模型 MultiSChuBERT,它将文本模型基于块化全文本和计算BERT块编码(SChuBERT),与视觉模型基于 Inception V3 结合在一起。
  • results: 研究发现,将视觉信息与文本信息结合可以显著改善预测结果;逐步解冻(gradual unfreezing)视觉子模型的权重可以降低其过拟合倾向,进一步提升结果;此外,采用更新的文本嵌入模型可以带来额外的性能提升,其中 SPECTER2.0 嵌入表现最佳。
    Abstract Automatic assessment of the quality of scholarly documents is a difficult task with high potential impact. Multimodality, in particular the addition of visual information next to text, has been shown to improve the performance on scholarly document quality prediction (SDQP) tasks. We propose the multimodal predictive model MultiSChuBERT. It combines a textual model based on chunking full paper text and aggregating computed BERT chunk-encodings (SChuBERT), with a visual model based on Inception V3. Our work contributes to the current state-of-the-art in SDQP in three ways. First, we show that the method of combining visual and textual embeddings can substantially influence the results. Second, we demonstrate that gradual-unfreezing of the weights of the visual sub-model reduces its tendency to overfit the data, improving results. Third, we show the retained benefit of multimodality when replacing standard BERT$_{\textrm{BASE}}$ embeddings with more recent state-of-the-art text embedding models. Using BERT$_{\textrm{BASE}}$ embeddings, on the (log) number of citations prediction task with the ACL-BiblioMetry dataset, our MultiSChuBERT (text+visual) model obtains an $R^{2}$ score of 0.454 compared to 0.432 for the SChuBERT (text only) model. Similar improvements are obtained on the PeerRead accept/reject prediction task. In our experiments using SciBERT, scincl, SPECTER and SPECTER2.0 embeddings, we show that each of these tailored embeddings adds further improvements over the standard BERT$_{\textrm{BASE}}$ embeddings, with the SPECTER2.0 embeddings performing best.
    摘要 自动评估学术文献的质量是一项困难且具有高潜在影响力的任务。多模态方法,特别是在文本之外加入视觉信息,已被证明能提升学术文献质量预测(SDQP)任务的性能。我们提出多模态预测模型 MultiSChuBERT:它将基于全文分块并聚合 BERT 分块编码的文本模型(SChuBERT),与基于 Inception V3 的视觉模型结合。我们的工作在三个方面推进了 SDQP 的研究。首先,我们表明视觉与文本嵌入的组合方式会显著影响结果。其次,我们证明逐步解冻视觉子模型的权重可以降低其过拟合倾向,从而提升结果。最后,我们表明在用更新的先进文本嵌入模型替换标准 BERT$_{\textrm{BASE}}$ 嵌入时,多模态带来的收益依然保留。使用 BERT$_{\textrm{BASE}}$ 嵌入,在 ACL-BiblioMetry 数据集的(对数)引用数预测任务上,MultiSChuBERT(文本+视觉)模型取得 0.454 的 $R^{2}$ 分数,高于 SChuBERT(仅文本)模型的 0.432;在 PeerRead 接收/拒稿预测任务上也获得了类似的提升。在使用 SciBERT、scincl、SPECTER 和 SPECTER2.0 嵌入的实验中,这些定制嵌入均在标准 BERT$_{\textrm{BASE}}$ 嵌入之上带来进一步改进,其中 SPECTER2.0 表现最佳。

RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models

  • paper_url: http://arxiv.org/abs/2308.07922
  • repo_url: None
  • paper_authors: Jie Huang, Wei Ping, Peng Xu, Mohammad Shoeybi, Kevin Chen-Chuan Chang, Bryan Catanzaro
  • for: 本研究探讨了基于搜索增强encoder-decoder语言模型的上下文学习能力。
  • methods: 我们首先对state-of-the-art ATLAS模型进行了全面分析,并发现其在上下文学习中存在一些局限性,主要是预训练和测试之间的匹配性不佳,以及上下文长度的限制。为了解决这些问题,我们提出了RAVEN模型,它将搜索增强的masked语言模型和前缀语言模型相结合。此外,我们还提出了Fusion-in-Context Learning来提高几何学习性能,使模型可以在不需要额外训练或模型修改的情况下,更好地利用上下文中的更多示例。
  • results: 通过广泛的实验,我们证明了RAVEN模型在certain情况下可以明显超越ATLAS模型,并在一些情况下与当前最先进的语言模型匹配。此外,我们还发现RAVEN模型在几何学习中的表现比ATLAS模型更好,尽管它具有更少的参数。我们的研究见证了 Retrieval-augmented encoder-decoder语言模型在上下文学习中的潜力,并鼓励了进一步的研究。
    Abstract In this paper, we investigate the in-context learning ability of retrieval-augmented encoder-decoder language models. We first conduct a comprehensive analysis of the state-of-the-art ATLAS model and identify its limitations in in-context learning, primarily due to a mismatch between pretraining and testing, as well as a restricted context length. To address these issues, we propose RAVEN, a model that combines retrieval-augmented masked language modeling and prefix language modeling. We further introduce Fusion-in-Context Learning to enhance the few-shot performance by enabling the model to leverage more in-context examples without requiring additional training or model modifications. Through extensive experiments, we demonstrate that RAVEN significantly outperforms ATLAS and achieves results comparable to the most advanced language models in certain scenarios, despite having substantially fewer parameters. Our work underscores the potential of retrieval-augmented encoder-decoder language models for in-context learning and encourages further research in this direction.
    摘要 在这篇论文中,我们研究了检索增强的 encoder-decoder 语言模型的上下文学习(in-context learning)能力。我们首先对当前领先的 ATLAS 模型进行了全面分析,发现其在上下文学习中存在局限,主要源于预训练与测试之间的不匹配以及受限的上下文长度。为解决这些问题,我们提出了 RAVEN 模型,它将检索增强的掩码语言建模与前缀语言建模相结合。我们还引入了 Fusion-in-Context Learning,使模型无需额外训练或修改即可利用更多的上下文示例,从而提升少样本性能。大量实验表明,尽管参数量明显更少,RAVEN 仍显著优于 ATLAS,并在某些场景下达到与最先进语言模型相当的结果。我们的工作凸显了检索增强 encoder-decoder 语言模型在上下文学习方面的潜力,并鼓励这一方向的进一步研究。

The Regular Expression Inference Challenge

  • paper_url: http://arxiv.org/abs/2308.07899
  • repo_url: None
  • paper_authors: Mojtaba Valizadeh, Philip John Gorinski, Ignacio Iacobacci, Martin Berger
  • for: 本文将正则表达式推断(regular expression inference, REI)作为代码/语言建模乃至更广泛机器学习领域的一个挑战问题提出。
  • methods: 本文使用的方法包括程序综合(program synthesis)技术和 GPU 加速。
  • results: 本文发布了首个大规模 REI 数据集,并给出了若干初步的机器学习基线。
    Abstract We propose \emph{regular expression inference (REI)} as a challenge for code/language modelling, and the wider machine learning community. REI is a supervised machine learning (ML) and program synthesis task, and poses the problem of finding minimal regular expressions from examples: Given two finite sets of strings $P$ and $N$ and a cost function $\text{cost}(\cdot)$, the task is to generate an expression $r$ that accepts all strings in $P$ and rejects all strings in $N$, while no other such expression $r'$ exists with $\text{cost}(r')<\text{cost}(r)$. REI has advantages as a challenge problem: (i) regular expressions are well-known, widely used, and a natural idealisation of code; (ii) REI's asymptotic worst-case complexity is well understood; (iii) REI has a small number of easy to understand parameters (e.g.~$P$ or $N$ cardinality, string lengths of examples, or the cost function); this lets us easily finetune REI-hardness; (iv) REI is an unsolved problem for deep learning based ML. Recently, an REI solver was implemented on GPUs, using program synthesis techniques. This enabled, for the first time, fast generation of minimal expressions for complex REI instances. Building on this advance, we generate and publish the first large-scale datasets for REI, and devise and evaluate several initial heuristic and machine learning baselines. We invite the community to participate and explore ML methods that learn to solve REI problems. We believe that progress in REI directly translates to code/language modelling.
    摘要 我们提出正则表达式推断(regular expression inference, REI),作为面向代码/语言建模以及更广泛机器学习社区的一个挑战问题。REI 是一个监督机器学习(ML)与程序综合任务,要求从示例中找出最小的正则表达式:给定两个有限字符串集合 $P$ 和 $N$ 以及一个代价函数 $\text{cost}(\cdot)$,生成一个接受 $P$ 中所有字符串、拒绝 $N$ 中所有字符串的表达式 $r$,并且不存在满足同样条件且 $\text{cost}(r')<\text{cost}(r)$ 的其他表达式 $r'$。REI 作为挑战问题有以下优点:(一)正则表达式广为人知、应用广泛,是代码的一种自然的理想化;(二)REI 的最坏情况渐近复杂度已被很好地理解;(三)REI 只有少量易于理解的参数(例如 $P$ 或 $N$ 的基数、示例字符串的长度、代价函数),便于我们调节 REI 的难度;(四)对基于深度学习的机器学习而言,REI 仍是一个未解决的问题。最近,一个基于程序综合技术的 REI 求解器在 GPU 上得以实现,首次能够为复杂的 REI 实例快速生成最小表达式。在此基础上,我们生成并发布了首批大规模 REI 数据集,并设计和评估了若干初步的启发式与机器学习基线。我们邀请社区参与,探索能够学习求解 REI 问题的机器学习方法。我们相信 REI 上的进展可直接转化为代码/语言建模的进步。
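
The REI task statement above is concrete enough to illustrate with a tiny brute-force solver: enumerate candidate expressions from a restricted set of atoms and keep the cheapest one that accepts every string in P and rejects every string in N. The atom set and the cost function (pattern length) below are simplifying assumptions, not the challenge's official grammar or cost.

```python
# Brute-force regular expression inference over a tiny fragment of regex
# syntax: candidates are concatenations of the atoms below, and the cost of a
# candidate is simply its pattern length.
import itertools
import re

ATOMS = ["0", "1", "0*", "1*", "(0|1)", "(0|1)*"]

def infer_regex(P, N, max_atoms=3):
    best = None
    for k in range(1, max_atoms + 1):
        for combo in itertools.product(ATOMS, repeat=k):
            pattern = "".join(combo)
            if best is not None and len(pattern) >= len(best):
                continue            # cannot beat the current best cost
            try:
                ok = (all(re.fullmatch(pattern, s) for s in P)
                      and not any(re.fullmatch(pattern, s) for s in N))
            except re.error:
                continue
            if ok:
                best = pattern
    return best

P = {"0", "00", "000"}
N = {"1", "01"}
print(infer_regex(P, N))   # e.g. "0*"
```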

SciRE-Solver: Efficient Sampling of Diffusion Probabilistic Models by Score-integrand Solver with Recursive Derivative Estimation

  • paper_url: http://arxiv.org/abs/2308.07896
  • repo_url: None
  • paper_authors: Shigui Li, Wei Chen, Delu Zeng
  • for: This paper proposes a high-efficiency sampler for Diffusion Probabilistic Models (DPMs), which are powerful generative models known for their ability to generate high-fidelity image samples.
  • methods: The proposed method uses a score-based exact solution paradigm for the diffusion ODEs corresponding to the sampling process of DPMs, and introduces a new perspective on developing numerical algorithms for solving diffusion ODEs. The method also uses a recursive derivative estimation (RDE) method to reduce the estimation error.
  • results: The proposed method, called SciRE-Solver, achieves state-of-the-art (SOTA) sampling performance with a limited number of score function evaluations (NFE) on both discrete-time and continuous-time DPMs. Specifically, the method achieves $3.48$ FID with $12$ NFE and $2.42$ FID with $20$ NFE for continuous-time DPMs on CIFAR10, respectively. The method also reaches SOTA values of $2.40$ FID with $100$ NFE for continuous-time DPM and of $3.15$ FID with $84$ NFE for discrete-time DPM on CIFAR-10, as well as of $2.17$ ($2.02$) FID with $18$ ($50$) NFE for discrete-time DPM on CelebA 64$\times$64.
    Abstract Diffusion probabilistic models (DPMs) are a powerful class of generative models known for their ability to generate high-fidelity image samples. A major challenge in the implementation of DPMs is the slow sampling process. In this work, we bring a high-efficiency sampler for DPMs. Specifically, we propose a score-based exact solution paradigm for the diffusion ODEs corresponding to the sampling process of DPMs, which introduces a new perspective on developing numerical algorithms for solving diffusion ODEs. To achieve an efficient sampler, we propose a recursive derivative estimation (RDE) method to reduce the estimation error. With our proposed solution paradigm and RDE method, we propose the score-integrand solver with the convergence order guarantee as efficient solver (SciRE-Solver) for solving diffusion ODEs. The SciRE-Solver attains state-of-the-art (SOTA) sampling performance with a limited number of score function evaluations (NFE) on both discrete-time and continuous-time DPMs in comparison to existing training-free sampling algorithms. Such as, we achieve $3.48$ FID with $12$ NFE and $2.42$ FID with $20$ NFE for continuous-time DPMs on CIFAR10, respectively. Different from other samplers, SciRE-Solver has the promising potential to surpass the FIDs achieved in the original papers of some pre-trained models with a small NFEs. For example, we reach SOTA value of $2.40$ FID with $100$ NFE for continuous-time DPM and of $3.15$ FID with $84$ NFE for discrete-time DPM on CIFAR-10, as well as of $2.17$ ($2.02$) FID with $18$ ($50$) NFE for discrete-time DPM on CelebA 64$\times$64.
    摘要 扩散概率模型(DPMs)是一类强大的生成模型,以能够生成高保真图像样本著称。DPMs 实现中的一大挑战是采样过程缓慢。在这项工作中,我们为 DPMs 提出了一种高效的采样器。具体而言,我们针对 DPMs 采样过程所对应的扩散 ODE 提出了一种基于得分的精确解范式,为设计求解扩散 ODE 的数值算法提供了新的视角。为了实现高效采样,我们提出递归导数估计(RDE)方法以降低估计误差。基于上述解范式与 RDE 方法,我们提出了带有收敛阶保证的得分被积函数求解器(SciRE-Solver)来求解扩散 ODE。与现有免训练采样算法相比,SciRE-Solver 在离散时间和连续时间 DPMs 上均以有限的得分函数评估次数(NFE)达到了最先进(SOTA)的采样性能:例如在 CIFAR10 上,连续时间 DPMs 分别以 $12$ 和 $20$ 次 NFE 取得 $3.48$ 和 $2.42$ 的 FID。与其他采样器不同,SciRE-Solver 有潜力在较少的 NFE 下超过一些预训练模型原论文所报告的 FID:例如在 CIFAR-10 上,连续时间 DPM 以 $100$ NFE 达到 $2.40$ FID 的 SOTA 值,离散时间 DPM 以 $84$ NFE 达到 $3.15$ FID;在 CelebA 64$\times$64 上,离散时间 DPM 以 $18$($50$)NFE 达到 $2.17$($2.02$)FID。

On regularized Radon-Nikodym differentiation

  • paper_url: http://arxiv.org/abs/2308.07887
  • repo_url: None
  • paper_authors: Duc Hoan Nguyen, Werner Zellinger, Sergei V. Pereverzyev
  • for: 本文研究 Radon-Nikodym 导数的估计问题,该问题出现在多种应用中,例如协变量偏移(covariate shift)适应、似然比检验、互信息估计以及条件概率估计。
  • methods: 本文在再生核希尔伯特空间中采用一般正则化方案求解上述问题,并结合导数的光滑性与估计所在空间的容量,确定了相应正则化算法的收敛速率。
  • results: 本文的理论结果通过数值实验加以验证,表明可以在任意给定点高精度地重建 Radon-Nikodym 导数。
    Abstract We discuss the problem of estimating Radon-Nikodym derivatives. This problem appears in various applications, such as covariate shift adaptation, likelihood-ratio testing, mutual information estimation, and conditional probability estimation. To address the above problem, we employ the general regularization scheme in reproducing kernel Hilbert spaces. The convergence rate of the corresponding regularized algorithm is established by taking into account both the smoothness of the derivative and the capacity of the space in which it is estimated. This is done in terms of general source conditions and the regularized Christoffel functions. We also find that the reconstruction of Radon-Nikodym derivatives at any particular point can be done with high order of accuracy. Our theoretical results are illustrated by numerical simulations.
    摘要 我们讨论 Radon-Nikodym 导数的估计问题。该问题出现在多种应用中,例如协变量偏移适应、似然比检验、互信息估计和条件概率估计。为解决这一问题,我们在再生核希尔伯特空间中采用一般正则化方案,并通过综合考虑该导数的光滑性与估计所在空间的容量(借助一般源条件与正则化 Christoffel 函数),确定了相应正则化算法的收敛速率。我们还发现,在任意给定点重建 Radon-Nikodym 导数可以达到高阶精度。我们的理论结果通过数值模拟加以说明。
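
Estimating a Radon-Nikodym derivative from samples is a density-ratio estimation problem. The sketch below uses a standard kernel-based, ridge-regularized least-squares fit of the ratio (uLSIF-style) in plain NumPy, which is in the same regularized-kernel spirit as the scheme above but is not the paper's algorithm; the kernel width, regularization parameter, and test distributions are assumptions.

```python
# Kernel density-ratio sketch: estimate r(x) = dP/dQ from samples of P and Q
# by minimising 0.5*E_Q[r(x)^2] - E_P[r(x)] + 0.5*lam*||alpha||^2 over a
# Gaussian-kernel model r(x) = sum_l alpha_l k(x, c_l).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x_p = rng.normal(0.0, 1.0, 300)      # samples from P = N(0, 1)
x_q = rng.normal(0.5, 1.2, 300)      # samples from Q = N(0.5, 1.2^2)

def gauss_kernel(x, centers, sigma=0.7):
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * sigma ** 2))

centers = x_p[:100]                              # kernel centres
Phi_q = gauss_kernel(x_q, centers)               # features on Q-samples
Phi_p = gauss_kernel(x_p, centers)               # features on P-samples

H = Phi_q.T @ Phi_q / len(x_q)                   # empirical E_Q[k k^T]
h = Phi_p.mean(axis=0)                           # empirical E_P[k]
lam = 0.1                                        # ridge regulariser
alpha = np.linalg.solve(H + lam * np.eye(len(centers)), h)

def ratio(x):
    return np.maximum(gauss_kernel(np.atleast_1d(x), centers) @ alpha, 0.0)

# Compare with the true ratio of the two Gaussian densities at a few points.
xs = np.array([-1.0, 0.0, 1.0])
print("estimated:", ratio(xs))
print("true     :", norm.pdf(xs, 0, 1) / norm.pdf(xs, 0.5, 1.2))
```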

Back to Basics: A Sanity Check on Modern Time Series Classification Algorithms

  • paper_url: http://arxiv.org/abs/2308.07886
  • repo_url: https://github.com/mlgig/tabularmodelsfortsc
  • paper_authors: Bhaskar Dhariyal, Thach Le Nguyen, Georgiana Ifrim
  • for: 这篇论文的目的是为了评估时序分类方法的基线性能。
  • methods: 这篇论文比较了经典机器学习算法(如岭回归 Ridge、LDA、Random Forest)构成的表格(tabular)模型与 ROCKET 家族分类器(如 Rocket、MiniRocket、MultiRocket)。
  • results: 研究发现,表格模型在 UCR/UEA 基准中约 19% 的单变量数据集和约 28% 的多变量数据集上优于 ROCKET 家族分类器,并且在约 50% 的数据集上准确率与之相差不超过 10 个百分点。这些结果表明,在开发时序分类器时,应将简单的表格模型作为基线加以考虑:这类模型速度快、实现简单,可能与更复杂的方法同样有效,而且更易于理解和部署。
    Abstract The state-of-the-art in time series classification has come a long way, from the 1NN-DTW algorithm to the ROCKET family of classifiers. However, in the current fast-paced development of new classifiers, taking a step back and performing simple baseline checks is essential. These checks are often overlooked, as researchers are focused on establishing new state-of-the-art results, developing scalable algorithms, and making models explainable. Nevertheless, there are many datasets that look like time series at first glance, but classic algorithms such as tabular methods with no time ordering may perform better on such problems. For example, for spectroscopy datasets, tabular methods tend to significantly outperform recent time series methods. In this study, we compare the performance of tabular models using classic machine learning approaches (e.g., Ridge, LDA, RandomForest) with the ROCKET family of classifiers (e.g., Rocket, MiniRocket, MultiRocket). Tabular models are simple and very efficient, while the ROCKET family of classifiers are more complex and have state-of-the-art accuracy and efficiency among recent time series classifiers. We find that tabular models outperform the ROCKET family of classifiers on approximately 19% of univariate and 28% of multivariate datasets in the UCR/UEA benchmark and achieve accuracy within 10 percentage points on about 50% of datasets. Our results suggest that it is important to consider simple tabular models as baselines when developing time series classifiers. These models are very fast, can be as effective as more complex methods and may be easier to understand and deploy.
    摘要 时间序列分类的最先进技术已经走过了很长的路,从 1NN-DTW 算法发展到 ROCKET 家族分类器。然而,在新分类器快速涌现的当下,退一步进行简单的基线检查十分必要。这类检查常被忽视,因为研究者更关注创造新的最先进结果、开发可扩展的算法以及提升模型的可解释性。然而,许多数据集乍看像时间序列,但不依赖时间顺序的经典表格方法可能在这类问题上表现更好;例如在光谱(spectroscopy)数据集上,表格方法往往显著优于近期的时间序列方法。在本研究中,我们比较了采用经典机器学习方法(如 Ridge、LDA、RandomForest)的表格模型与 ROCKET 家族分类器(如 Rocket、MiniRocket、MultiRocket)的性能。表格模型简单且非常高效,而 ROCKET 家族分类器更为复杂,在近期时间序列分类器中兼具最先进的准确率与效率。我们发现,表格模型在 UCR/UEA 基准中约 19% 的单变量数据集和 28% 的多变量数据集上优于 ROCKET 家族分类器,并在约 50% 的数据集上准确率与之相差不超过 10 个百分点。我们的结果表明,在开发时间序列分类器时应将简单的表格模型作为基线:这些模型速度很快,可能与更复杂的方法同样有效,并且更易于理解和部署。
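
A concrete version of the tabular baseline the paper argues for: flatten each fixed-length series into a feature vector and fit a regularized linear classifier. The data below are synthetic; with a UCR/UEA dataset loaded as an (n_samples, series_length) array, the same two lines of model code would apply.

```python
# Tabular baseline for time series classification: treat each series as a
# flat feature vector and fit a ridge classifier with cross-validated alpha.
import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, length = 400, 100
t = np.linspace(0, 1, length)

# Two synthetic classes that differ in spectral content, plus noise.
X0 = np.sin(2 * np.pi * 3 * t) + 0.5 * rng.standard_normal((n // 2, length))
X1 = np.sin(2 * np.pi * 5 * t) + 0.5 * rng.standard_normal((n // 2, length))
X = np.vstack([X0, X1])
y = np.array([0] * (n // 2) + [1] * (n // 2))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10)).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```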

The Challenge of Fetal Cardiac MRI Reconstruction Using Deep Learning

  • paper_url: http://arxiv.org/abs/2308.07885
  • repo_url: None
  • paper_authors: Denis Prokopenko, Kerstin Hammernik, Thomas Roberts, David F A Lloyd, Daniel Rueckert, Joseph V Hajnal
  • for: 这个研究旨在提高非扫描kt-SENSE重建质量,使用深度学习方法来恢复不完整扫描的数据。
  • methods: 这个研究使用了深度学习网络来重建kt-SENSE样式所获取的数据,并对不同网络架构和训练策略进行了研究。
  • results: 研究发现,使用多架构和训练策略可以提高模型的性能,但是这些模型仍然不能准确地捕捉婴儿心脏的动态特征。
    Abstract Dynamic free-breathing fetal cardiac MRI is one of the most challenging modalities, which requires high temporal and spatial resolution to depict rapid changes in a small fetal heart. The ability of deep learning methods to recover undersampled data could help to optimise the kt-SENSE acquisition strategy and improve non-gated kt-SENSE reconstruction quality. In this work, we explore supervised deep learning networks for reconstruction of kt-SENSE style acquired data using an extensive in vivo dataset. Having access to fully-sampled low-resolution multi-coil fetal cardiac MRI, we study the performance of the networks to recover fully-sampled data from undersampled data. We consider model architectures together with training strategies taking into account their application in the real clinical setup used to collect the dataset to enable networks to recover prospectively undersampled data. We explore a set of modifications to form a baseline performance evaluation for dynamic fetal cardiac MRI on real data. We systematically evaluate the models on coil-combined data to reveal the effect of the suggested changes to the architecture in the context of fetal heart properties. We show that the best-performers recover a detailed depiction of the maternal anatomy on a large scale, but the dynamic properties of the fetal heart are under-represented. Training directly on multi-coil data improves the performance of the models, allows their prospective application to undersampled data and makes them outperform CTFNet introduced for adult cardiac cine MRI. However, these models deliver similar qualitative performances recovering the maternal body very well but underestimating the dynamic properties of fetal heart. This dynamic feature of fast change of fetal heart that is highly localised suggests both more targeted training and evaluation methods might be needed for fetal heart application.
    摘要 动态自由呼吸胎儿心脏 MRI 是最具挑战性的成像方式之一,需要很高的时间与空间分辨率才能呈现胎儿小心脏中的快速变化。深度学习方法恢复欠采样数据的能力,有助于优化 kt-SENSE 采集策略并提升非门控 kt-SENSE 的重建质量。在这项工作中,我们利用一个大规模在体数据集,探索用监督式深度学习网络重建以 kt-SENSE 方式采集的数据。借助完整采样的低分辨率多线圈胎儿心脏 MRI 数据,我们研究这些网络从欠采样数据中恢复完整采样数据的能力,并结合采集该数据集的真实临床流程来考虑模型架构与训练策略,使网络能够应用于前瞻性欠采样数据。我们在线圈合成数据上系统评估了多种改动,以建立动态胎儿心脏 MRI 在真实数据上的基线性能。结果显示,表现最好的模型能够在大尺度上细致地重建母体解剖结构,但胎儿心脏的动态特性仍未得到充分呈现。直接在多线圈数据上训练可提升模型性能,使其能够前瞻性地应用于欠采样数据,并优于为成人心脏电影 MRI 提出的 CTFNet;然而这些模型在定性上表现相近:母体部分恢复得很好,却低估了胎儿心脏的动态特性。胎儿心脏快速变化且高度局部化的特点表明,针对胎儿心脏的应用可能需要更有针对性的训练与评估方法。

A Trustable LSTM-Autoencoder Network for Cyberbullying Detection on Social Media Using Synthetic Data

  • paper_url: http://arxiv.org/abs/2308.09722
  • repo_url: None
  • paper_authors: Mst Shapna Akter, Hossain Shahriar, Alfredo Cuzzocrea
  • for: 本研究旨在提出一种可信赖的 LSTM-Autoencoder 网络,用于利用合成数据在社交媒体上检测网络欺凌(cyberbullying)。
  • methods: 本研究使用了人工生成的数据集,并对英语、印地语和孟加拉语三种语言进行实验验证。使用了LSTM、BiLSTM、LSTM-Autoencoder、Word2vec、BERT和GPT-2等模型进行比较。
  • results: 研究发现,提出的LSTM-Autoencoder网络在所有数据集上表现最佳,具有95%的准确率。与前一些相关研究相比,本研究的结果具有状态的前进。
    Abstract Social media cyberbullying has a detrimental effect on human life. As online social networking grows daily, the amount of hate speech also increases. Such terrible content can cause depression and actions related to suicide. This paper proposes a trustable LSTM-Autoencoder Network for cyberbullying detection on social media using synthetic data. We have demonstrated a cutting-edge method to address data availability difficulties by producing machine-translated data. However, several languages such as Hindi and Bangla still lack adequate investigations due to a lack of datasets. We carried out experimental identification of aggressive comments on Hindi, Bangla, and English datasets using the proposed model and traditional models, including Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), LSTM-Autoencoder, Word2vec, Bidirectional Encoder Representations from Transformers (BERT), and Generative Pre-trained Transformer 2 (GPT-2) models. We employed evaluation metrics such as f1-score, accuracy, precision, and recall to assess the models performance. Our proposed model outperformed all the models on all datasets, achieving the highest accuracy of 95%. Our model achieves state-of-the-art results among all the previous works on the dataset we used in this paper.
    摘要 社交媒体上的网络欺凌对人类生活有着有害影响。随着在线社交网络的日益增长,仇恨言论的数量也在增加,这类恶性内容可能导致抑郁乃至与自杀相关的行为。本文提出一种可信赖的 LSTM-Autoencoder 网络,利用合成数据在社交媒体上检测网络欺凌。我们展示了一种前沿方法,通过生成机器翻译数据来缓解数据可用性难题;然而,印地语、孟加拉语等语言由于缺乏数据集,相关研究仍然不足。我们在印地语、孟加拉语和英语数据集上,使用所提模型以及传统模型(包括 Long Short-Term Memory (LSTM)、Bidirectional Long Short-Term Memory (BiLSTM)、LSTM-Autoencoder、Word2vec、Bidirectional Encoder Representations from Transformers (BERT) 和 Generative Pre-trained Transformer 2 (GPT-2))开展了攻击性评论识别实验,并采用 f1-score、准确率、精确率和召回率等指标评估模型表现。所提模型在所有数据集上均优于其他模型,取得了 95% 的最高准确率,在我们所用数据集上达到了此前相关工作中的最佳水平。

Towards Temporal Edge Regression: A Case Study on Agriculture Trade Between Nations

  • paper_url: http://arxiv.org/abs/2308.07883
  • repo_url: https://github.com/scylj1/gnn_edge_regression
  • paper_authors: Lekang Jiang, Caiqi Zhang, Farimah Poursafaei, Shenyang Huang
  • for: This paper is written for exploring the application of Graph Neural Networks (GNNs) to edge regression tasks in both static and dynamic settings, specifically focusing on predicting food and agriculture trade values between nations.
  • methods: The paper introduces three simple yet strong baselines and comprehensively evaluates one static and three dynamic GNN models using the UN Trade dataset.
  • results: The experimental results show that the baselines exhibit remarkably strong performance across various settings, highlighting the inadequacy of existing GNNs. Additionally, the paper finds that TGN outperforms other GNN models, suggesting TGN is a more appropriate choice for edge regression tasks. Furthermore, the proportion of negative edges in the training samples significantly affects the test performance.Here is the information in Simplified Chinese text:
  • for: 这篇论文是为了探索图神经网络(GNNs)在边 regression 任务中的应用,特别是针对食品和农业贸易值 между国家。
  • methods: 论文引入三种简单强大的基线,并对一个静态和三个动态 GNN 模型进行了全面的评估,使用 UN 贸易 dataset。
  • results: 实验结果显示,基线在不同设置下表现出色,强调现有 GNN 的不足。此外,论文发现 TGN 在边 regression 任务中表现更出色,建议 TGN 是更适合的选择。此外,论文发现训练样本中负边的比例对测试性能产生了显著的影响。
    Abstract Recently, Graph Neural Networks (GNNs) have shown promising performance in tasks on dynamic graphs such as node classification, link prediction and graph regression. However, few work has studied the temporal edge regression task which has important real-world applications. In this paper, we explore the application of GNNs to edge regression tasks in both static and dynamic settings, focusing on predicting food and agriculture trade values between nations. We introduce three simple yet strong baselines and comprehensively evaluate one static and three dynamic GNN models using the UN Trade dataset. Our experimental results reveal that the baselines exhibit remarkably strong performance across various settings, highlighting the inadequacy of existing GNNs. We also find that TGN outperforms other GNN models, suggesting TGN is a more appropriate choice for edge regression tasks. Moreover, we note that the proportion of negative edges in the training samples significantly affects the test performance. The companion source code can be found at: https://github.com/scylj1/GNN_Edge_Regression.
    摘要 近期,图神经网络(GNNs)在动态图任务中表现出色,包括节点分类、链接预测和图回归。然而,很少有工作研究具有重要现实应用的时序边回归任务。在这篇论文中,我们探讨了 GNNs 在静态和动态设置下的边回归任务,专注于预测国家之间的粮食与农产品贸易额。我们提出三种简单而强大的基线,并使用 UN Trade 数据集对一个静态 GNN 模型和三个动态 GNN 模型进行了全面评估。实验结果表明,这些基线在各种设置下表现非常强,凸显了现有 GNNs 的不足;同时 TGN 优于其他 GNN 模型,表明其更适合边回归任务。此外,我们发现训练样本中负边的比例会显著影响测试性能。相关源代码见:https://github.com/scylj1/GNN_Edge_Regression 。
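
The paper stresses that simple baselines are strong for temporal edge regression. Two generic examples of such baselines, persistence and per-edge historical mean, are sketched below on made-up trade records; the paper's actual three baselines are not specified here, so treat these as illustrative stand-ins.

```python
# Simple baselines for temporal edge regression: "persistence" repeats the
# last observed value for an edge, "historical mean" averages all past values.
# The toy records are made up; the UN Trade dataset would provide
# (year, exporter, importer, value) rows of the same shape.
from collections import defaultdict
import numpy as np

# (timestep, source, destination, edge value)
records = [
    (0, "A", "B", 10.0), (0, "A", "C", 4.0),
    (1, "A", "B", 12.0), (1, "A", "C", 5.0),
    (2, "A", "B", 11.0), (2, "A", "C", 7.0),
]
test = [(3, "A", "B", 13.0), (3, "A", "C", 6.0)]

last = {}
history = defaultdict(list)
for t, u, v, w in records:
    last[(u, v)] = w
    history[(u, v)].append(w)

def rmse(preds, truth):
    return float(np.sqrt(np.mean((np.array(preds) - np.array(truth)) ** 2)))

preds_persist, preds_mean, truth = [], [], []
for t, u, v, w in test:
    preds_persist.append(last[(u, v)])
    preds_mean.append(float(np.mean(history[(u, v)])))
    truth.append(w)

print("persistence RMSE:", rmse(preds_persist, truth))
print("historical-mean RMSE:", rmse(preds_mean, truth))
```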

Synthesizing Political Zero-Shot Relation Classification via Codebook Knowledge, NLI, and ChatGPT

  • paper_url: http://arxiv.org/abs/2308.07876
  • repo_url: https://github.com/snowood1/zero-shot-plover
  • paper_authors: Yibo Hu, Erick Skorupa Parolin, Latifur Khan, Patrick T. Brandt, Javier Osorio, Vito J. D’Orazio
  • for: 本研究旨在提高政治事件代码分类的效率和可扩展性,通过利用知识从已有的注释编目中提取 transferred learning 和已有专家知识。
  • methods: 本研究使用了 Zero-shot approach,包括一种基于自然语言理解 (NLI) 的新方法 named ZSP,它采用了树查询框架,将任务分解成上下文、Modalities 和类别异常级别。
  • results: 对于细化的 Rootcode 分类,ZSP 实现了40%的 F1 分数提升,与 supervised BERT 模型相比,ZSP 的性能相对稳定,可用于事件记录验证和 ontology 开发。
    Abstract Recent supervised models for event coding vastly outperform pattern-matching methods. However, their reliance solely on new annotations disregards the vast knowledge within expert databases, hindering their applicability to fine-grained classification. To address these limitations, we explore zero-shot approaches for political event ontology relation classification, by leveraging knowledge from established annotation codebooks. Our study encompasses both ChatGPT and a novel natural language inference (NLI) based approach named ZSP. ZSP adopts a tree-query framework that deconstructs the task into context, modality, and class disambiguation levels. This framework improves interpretability, efficiency, and adaptability to schema changes. By conducting extensive experiments on our newly curated datasets, we pinpoint the instability issues within ChatGPT and highlight the superior performance of ZSP. ZSP achieves an impressive 40% improvement in F1 score for fine-grained Rootcode classification. ZSP demonstrates competitive performance compared to supervised BERT models, positioning it as a valuable tool for event record validation and ontology development. Our work underscores the potential of leveraging transfer learning and existing expertise to enhance the efficiency and scalability of research in the field.
    摘要

Emotion Embeddings $\unicode{x2014}$ Learning Stable and Homogeneous Abstractions from Heterogeneous Affective Datasets

  • paper_url: http://arxiv.org/abs/2308.07871
  • repo_url: None
  • paper_authors: Sven Buechel, Udo Hahn
  • for: 这篇论文旨在提出一种统一计算模型,用于处理不同表达形式和标签类型的人类情感表达。
  • methods: 该模型通过一种训练程序学习情感的共享潜在表示(emotion embeddings),使其独立于不同的自然语言、交流模态、媒体和情感标签格式,甚至可用于不同的模型架构。
  • results: 实验结果表明,该方法可以实现数据和标签类型之间的协同性,无需增加预测质量的损害,同时提供了可重用、可解释和灵活的模型。
    Abstract Human emotion is expressed in many communication modalities and media formats and so their computational study is equally diversified into natural language processing, audio signal analysis, computer vision, etc. Similarly, the large variety of representation formats used in previous research to describe emotions (polarity scales, basic emotion categories, dimensional approaches, appraisal theory, etc.) have led to an ever proliferating diversity of datasets, predictive models, and software tools for emotion analysis. Because of these two distinct types of heterogeneity, at the expressional and representational level, there is a dire need to unify previous work on increasingly diverging data and label types. This article presents such a unifying computational model. We propose a training procedure that learns a shared latent representation for emotions, so-called emotion embeddings, independent of different natural languages, communication modalities, media or representation label formats, and even disparate model architectures. Experiments on a wide range of heterogeneous affective datasets indicate that this approach yields the desired interoperability for the sake of reusability, interpretability and flexibility, without penalizing prediction quality. Code and data are archived under https://doi.org/10.5281/zenodo.7405327 .
    摘要 人类情感表达在多种通信modalities和媒体格式中表现出来,因此计算研究也在自然语言处理、音频信号分析、计算机视觉等领域得到了多样化的应用。在过去的研究中,用于描述情感的不同格式(如负责度维度、基本情绪类别、维度方法、评估理论等)导致了数据集、预测模型和软件工具的总体化,从而产生了不断增长的多样化问题。为了解决这两种不同的多样性,即表达层次和表示层次的多样性,这篇文章提出了一种统一的计算模型。我们提议一种在不同的自然语言、通信modalities、媒体和表示格式之间学习共享的潜在表达(emotion embeddings)的训练方法,无论是不同的语言、模式、媒体或表示格式,都可以学习到共享的表达特征。在各种多样化的情感数据集上进行了广泛的实验,结果表明,这种方法可以实现数据集之间的可 reuse、可 interpretability和灵活性,而不会增加预测质量的损失。代码和数据可以在https://doi.org/10.5281/zenodo.7405327上找到。

Brain-Inspired Computational Intelligence via Predictive Coding

  • paper_url: http://arxiv.org/abs/2308.07870
  • repo_url: None
  • paper_authors: Tommaso Salvatori, Ankur Mali, Christopher L. Buckley, Thomas Lukasiewicz, Rajesh P. N. Rao, Karl Friston, Alexander Ororbia
  • for: This paper aims to explore the potential of predictive coding (PC) in addressing the limitations of deep neural networks in machine learning and computational intelligence.
  • methods: The paper surveys the literature on PC and its applications in cognitive control, robotics, and variational inference, highlighting its exciting properties and potential value for the machine learning community.
  • results: The paper hopes to foreground research in PC-inspired machine learning and encourage further exploration of its potential in the future of computational intelligence.
    Abstract Artificial intelligence (AI) is rapidly becoming one of the key technologies of this century. The majority of results in AI thus far have been achieved using deep neural networks trained with the error backpropagation learning algorithm. However, the ubiquitous adoption of this approach has highlighted some important limitations such as substantial computational cost, difficulty in quantifying uncertainty, lack of robustness, unreliability, and biological implausibility. It is possible that addressing these limitations may require schemes that are inspired and guided by neuroscience theories. One such theory, called predictive coding (PC), has shown promising performance in machine intelligence tasks, exhibiting exciting properties that make it potentially valuable for the machine learning community: PC can model information processing in different brain areas, can be used in cognitive control and robotics, and has a solid mathematical grounding in variational inference, offering a powerful inversion scheme for a specific class of continuous-state generative models. With the hope of foregrounding research in this direction, we survey the literature that has contributed to this perspective, highlighting the many ways that PC might play a role in the future of machine learning and computational intelligence at large.
    摘要 人工智能(AI)正迅速成为本世纪的关键技术之一。迄今为止,AI 的大多数成果都是通过误差反向传播算法训练的深度神经网络取得的。然而,这种方法的普遍采用也凸显出一些重要局限,包括高昂的计算成本、难以量化不确定性、鲁棒性不足、不够可靠以及缺乏生物学合理性。解决这些局限可能需要受神经科学理论启发和指导的方案。其中一种理论是预测编码(predictive coding, PC),它已在机器智能任务中展现出令人期待的性能,并具有一些对机器学习社区可能颇有价值的特性:PC 可以建模不同脑区的信息处理,可用于认知控制和机器人领域,并且在变分推断中有坚实的数学基础,为一类连续状态生成模型提供了强大的反演方案。为推动这一方向的研究,我们综述了促成这一视角的相关文献,并强调 PC 在未来机器学习与广义计算智能中可能扮演的多种角色。
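
Since predictive coding is the central mechanism discussed above, a minimal numerical sketch may help: for a linear Gaussian generative model, inference is gradient descent on a prediction-error energy using only local error signals, and it converges to the MAP estimate. The model, sizes, and step size below are arbitrary illustrative choices, not a model from the survey.

```python
# Minimal predictive-coding (PC) sketch for a linear Gaussian generative model
# x = W z + noise with a unit-variance prior on z. Inference descends the
# prediction-error energy F(z) = 0.5*||x - W z||^2 + 0.5*||z||^2 using only
# local error signals.
import numpy as np

rng = np.random.default_rng(0)
d_obs, d_lat = 8, 3
W = rng.standard_normal((d_obs, d_lat))
z_true = rng.standard_normal(d_lat)
x = W @ z_true + 0.05 * rng.standard_normal(d_obs)

z = np.zeros(d_lat)
for _ in range(200):
    eps_x = x - W @ z                    # prediction error at the observation layer
    eps_z = z                            # error w.r.t. the zero-mean prior
    z += 0.05 * (W.T @ eps_x - eps_z)    # dz = -dF/dz

# The relaxation converges to the MAP estimate (W^T W + I)^{-1} W^T x.
z_map = np.linalg.solve(W.T @ W + np.eye(d_lat), W.T @ x)
print("PC inference:", np.round(z, 3))
print("closed form :", np.round(z_map, 3))
print("ground truth:", np.round(z_true, 3))
# Learning in PC would reuse the same local errors, e.g. dW proportional to
# outer(eps_x, z), instead of a backpropagated global gradient.
```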

Graph-Structured Kernel Design for Power Flow Learning using Gaussian Processes

  • paper_url: http://arxiv.org/abs/2308.07867
  • repo_url: None
  • paper_authors: Parikshit Pareek, Deepjyoti Deka, Sidhant Misra
  • for: 这篇论文旨在开发一种受物理启发的图结构核函数,用于基于高斯过程(GP)的电力潮流(power flow)学习。
  • methods: 该kernel,称为顶点度kernel(VDK),基于网络图或结构的 latent decomposition 来描述电压注入关系。而VDK的设计不需要解决优化问题,从而提高效率。此外,作者还提出了一种图reducible方法,用于获得VDK表示形式中的更少项。
  • results: 作者通过实验表明,提案的 VDK-GP 可以在中等规模500-Bus和大规模1354-Bus电力系统上实现更 than two fold 样本复杂度减少,相比整个 GP。此外,作者还提出了一种新的网络滑块活动学习算法,可以快速地适应 VDK 的学习。在测试预测中,该算法可以比Random Trial 的平均性能提高两倍,在中等规模500-Bus系统上,并在大规模1354-Bus系统上达到最佳性能的 10% 。此外,作者还证明了提案的方法在不同数据集上的uncertainty quantification应用中的性能。
    Abstract This paper presents a physics-inspired graph-structured kernel designed for power flow learning using Gaussian Process (GP). The kernel, named the vertex-degree kernel (VDK), relies on latent decomposition of voltage-injection relationship based on the network graph or topology. Notably, VDK design avoids the need to solve optimization problems for kernel search. To enhance efficiency, we also explore a graph-reduction approach to obtain a VDK representation with lesser terms. Additionally, we propose a novel network-swipe active learning scheme, which intelligently selects sequential training inputs to accelerate the learning of VDK. Leveraging the additive structure of VDK, the active learning algorithm performs a block-descent type procedure on GP's predictive variance, serving as a proxy for information gain. Simulations demonstrate that the proposed VDK-GP achieves more than two fold sample complexity reduction, compared to full GP on medium scale 500-Bus and large scale 1354-Bus power systems. The network-swipe algorithm outperforms mean performance of 500 random trials on test predictions by two fold for medium-sized 500-Bus systems and best performance of 25 random trials for large-scale 1354-Bus systems by 10%. Moreover, we demonstrate that the proposed method's performance for uncertainty quantification applications with distributionally shifted testing data sets.
    摘要

Impression-Aware Recommender Systems

  • paper_url: http://arxiv.org/abs/2308.07857
  • repo_url: None
  • paper_authors: Fernando B. Pérez Maurera, Maurizio Ferrari Dacrema, Pablo Castells, Paolo Cremonesi
  • for: 这个论文旨在探讨基于印象(过去推荐的项目)的推荐系统,以提高推荐系统的质量。
  • methods: 本文使用系统性文献综述方法,从推荐器、数据集和评价方法三个基本的研究角度考察使用印象的推荐系统。
  • results: 本文对各种使用印象的推荐系统进行了详细的介绍,还分析了各种数据集和评价方法。最后,本文提出了一些未解决的问题和未来研究方向,强调在文献中缺失的方面可以在未来的研究中进行深入探讨。
    Abstract Novel data sources bring new opportunities to improve the quality of recommender systems. Impressions are a novel data source containing past recommendations (shown items) and traditional interactions. Researchers may use impressions to refine user preferences and overcome the current limitations in recommender systems research. The relevance and interest of impressions have increased over the years; hence, the need for a review of relevant work on this type of recommenders. We present a systematic literature review on recommender systems using impressions, focusing on three fundamental angles in research: recommenders, datasets, and evaluation methodologies. We provide three categorizations of papers describing recommenders using impressions, present each reviewed paper in detail, describe datasets with impressions, and analyze the existing evaluation methodologies. Lastly, we present open questions and future directions of interest, highlighting aspects missing in the literature that can be addressed in future works.
    摘要 新的数据源带来了改善推荐系统质量的新机遇。印象是一种新的数据源,包含过去的推荐(显示的项目)和传统的交互。研究人员可以使用印象来细化用户喜好,超越当前的推荐系统研究的限制。在几年内,印象的相关性和兴趣度有所增加,因此需要对这类推荐器进行系统性的文献回顾。本文提供了对推荐系统使用印象的系统性文献回顾,重点是三个基本的研究方向:推荐器、数据集和评估方法ologies。我们提供了三种描述推荐器使用印象的论文分类,详细介绍每篇评论文章,描述含印象的数据集,并分析现有的评估方法。最后,我们提出了未解决的问题和未来方向,强调文献中缺失的方面,可以在未来的研究中进行深入探究。