cs.LG - 2023-07-24

QAmplifyNet: Pushing the Boundaries of Supply Chain Backorder Prediction Using Interpretable Hybrid Quantum-Classical Neural Network

  • paper_url: http://arxiv.org/abs/2307.12906
  • repo_url: None
  • paper_authors: Md Abrar Jahin, Md Sakib Hossain Shovon, Md. Saiful Islam, Jungpil Shin, M. F. Mridha, Yuichi Okuyama
  • for: To provide accurate backorder prediction for supply chain management systems, enabling optimized inventory control, reduced costs, and improved customer satisfaction.
  • methods: The study proposes a novel methodological framework, QAmplifyNet, that applies quantum-inspired techniques within a hybrid quantum-classical neural network to predict backorders effectively on short and imbalanced datasets.
  • results: Experiments show that QAmplifyNet outperforms classical models, quantum ensembles, quantum neural networks, and deep reinforcement learning on short and imbalanced datasets; its interpretability and ability to integrate into real-world systems make it well suited for supply chain management.
    Abstract Supply chain management relies on accurate backorder prediction for optimizing inventory control, reducing costs, and enhancing customer satisfaction. However, traditional machine-learning models struggle with large-scale datasets and complex relationships, hindering real-world data collection. This research introduces a novel methodological framework for supply chain backorder prediction, addressing the challenge of handling large datasets. Our proposed model, QAmplifyNet, employs quantum-inspired techniques within a quantum-classical neural network to predict backorders effectively on short and imbalanced datasets. Experimental evaluations on a benchmark dataset demonstrate QAmplifyNet's superiority over classical models, quantum ensembles, quantum neural networks, and deep reinforcement learning. Its proficiency in handling short, imbalanced datasets makes it an ideal solution for supply chain management. To enhance model interpretability, we use Explainable Artificial Intelligence techniques. Practical implications include improved inventory control, reduced backorders, and enhanced operational efficiency. QAmplifyNet seamlessly integrates into real-world supply chain management systems, enabling proactive decision-making and efficient resource allocation. Future work involves exploring additional quantum-inspired techniques, expanding the dataset, and investigating other supply chain applications. This research unlocks the potential of quantum computing in supply chain optimization and paves the way for further exploration of quantum-inspired machine learning models in supply chain management. Our framework and QAmplifyNet model offer a breakthrough approach to supply chain backorder prediction, providing superior performance and opening new avenues for leveraging quantum-inspired techniques in supply chain management.
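
The abstract does not detail QAmplifyNet's circuit. As a rough illustration of the hybrid quantum-classical pattern it describes, the sketch below wires a small parameterised quantum circuit between classical layers using PennyLane and PyTorch; the qubit count, embedding, ansatz, and layer sizes are assumptions for the example, not the authors' architecture.

```python
# Minimal hybrid quantum-classical binary classifier (illustrative only;
# not the QAmplifyNet architecture from the paper).
import torch
import pennylane as qml

n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev, interface="torch")
def circuit(inputs, weights):
    # Encode classical features as rotation angles, then apply a trainable ansatz.
    qml.AngleEmbedding(inputs, wires=range(n_qubits))
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

weight_shapes = {"weights": (n_layers, n_qubits, 3)}
model = torch.nn.Sequential(
    torch.nn.Linear(8, n_qubits),                # classical pre-processing layer
    torch.nn.Tanh(),
    qml.qnn.TorchLayer(circuit, weight_shapes),  # quantum layer
    torch.nn.Linear(n_qubits, 1),                # classical read-out
)

x = torch.randn(16, 8)                           # toy batch of 8 tabular features
logits = model(x).squeeze(-1)
loss = torch.nn.functional.binary_cross_entropy_with_logits(
    logits, torch.randint(0, 2, (16,)).float())
loss.backward()                                  # gradients flow through the circuit
```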

Universal Approximation Theorem and error bounds for quantum neural networks and quantum reservoirs

  • paper_url: http://arxiv.org/abs/2307.12904
  • repo_url: None
  • paper_authors: Lukas Gonon, Antoine Jacquier
  • for: To prove that quantum neural networks can approximate classes of classical functions with quantifiable accuracy.
  • methods: Parameterised quantum circuits and randomised quantum circuits, mimicking classical reservoir neural networks, are used to approximate classical functions.
  • results: The paper provides precise error bounds, showing that a quantum neural network with $\mathcal{O}(\varepsilon^{-2})$ weights and $\mathcal{O} (\lceil \log_2(\varepsilon^{-1}) \rceil)$ qubits suffices to achieve accuracy $\varepsilon>0$ when approximating functions with integrable Fourier transform.
    Abstract Universal approximation theorems are the foundations of classical neural networks, providing theoretical guarantees that the latter are able to approximate maps of interest. Recent results have shown that this can also be achieved in a quantum setting, whereby classical functions can be approximated by parameterised quantum circuits. We provide here precise error bounds for specific classes of functions and extend these results to the interesting new setup of randomised quantum circuits, mimicking classical reservoir neural networks. Our results show in particular that a quantum neural network with $\mathcal{O}(\varepsilon^{-2})$ weights and $\mathcal{O} (\lceil \log_2(\varepsilon^{-1}) \rceil)$ qubits suffices to achieve accuracy $\varepsilon>0$ when approximating functions with integrable Fourier transform.

Anytime Model Selection in Linear Bandits

  • paper_url: http://arxiv.org/abs/2307.12897
  • repo_url: None
  • paper_authors: Parnian Kassraie, Aldo Pacchiano, Nicolas Emmenegger, Andreas Krause
  • for: This paper addresses model selection in the context of bandit optimization, a challenging problem that requires balancing exploration and exploitation not only for action selection but also for model selection.
  • methods: The paper proposes a new method called ALEXP, which uses online learning algorithms that treat different models as experts and emulates full-information feedback to the online learner with a favorable bias-variance trade-off.
  • results: The paper shows that ALEXP has an exponentially improved ($\log M$) dependence on the number of models $M$ for its regret, and has anytime guarantees on its regret without requiring knowledge of the horizon $n$ or relying on an initial purely exploratory stage.
    Abstract Model selection in the context of bandit optimization is a challenging problem, as it requires balancing exploration and exploitation not only for action selection, but also for model selection. One natural approach is to rely on online learning algorithms that treat different models as experts. Existing methods, however, scale poorly ($\text{poly}M$) with the number of models $M$ in terms of their regret. Our key insight is that, for model selection in linear bandits, we can emulate full-information feedback to the online learner with a favorable bias-variance trade-off. This allows us to develop ALEXP, which has an exponentially improved ($\log M$) dependence on $M$ for its regret. ALEXP has anytime guarantees on its regret, and neither requires knowledge of the horizon $n$, nor relies on an initial purely exploratory stage. Our approach utilizes a novel time-uniform analysis of the Lasso, establishing a new connection between online learning and high-dimensional statistics.
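
ALEXP's key ingredients are treating the $M$ candidate models as experts and feeding the online learner emulated full-information losses (obtained in the paper via a time-uniform Lasso analysis). The toy loop below shows only the exponential-weights part with simulated loss estimates; it is not the ALEXP algorithm itself.

```python
# Exponential-weights aggregation over M candidate models (illustrative; the
# "full-information" losses here are placeholders, not ALEXP's Lasso-based estimates).
import numpy as np

rng = np.random.default_rng(0)
M, T, eta = 10, 500, 0.1               # number of candidate models, rounds, learning rate
log_w = np.zeros(M)                    # log-weights over the model class

for t in range(T):
    probs = np.exp(log_w - log_w.max())
    probs /= probs.sum()
    model_t = rng.choice(M, p=probs)   # model proposed at round t (would drive action choice)
    # Placeholder full-information feedback: a loss estimate for *every* model.
    est_losses = rng.normal(loc=np.linspace(0.2, 0.8, M), scale=0.1)
    log_w -= eta * est_losses          # multiplicative-weights update

probs = np.exp(log_w - log_w.max())
probs /= probs.sum()
print("final weights concentrate on the best model:", np.round(probs, 3))
```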

A Statistical View of Column Subset Selection

  • paper_url: http://arxiv.org/abs/2307.12892
  • repo_url: https://github.com/anavsood/css
  • paper_authors: Anav Sood, Trevor Hastie
  • for: Selecting a small subset of representative variables from a large dataset.
  • methods: Dimensionality reduction is framed statistically, showing that Column Subset Selection (CSS) and information-maximizing Principal Variables selection both amount to maximum likelihood estimation within a certain semi-parametric model.
  • results: The paper shows that the two approaches are equivalent and uses these connections to perform CSS efficiently using only summary statistics from the original dataset, to perform CSS in the presence of missing and/or censored data, and to select the subset size for CSS in a hypothesis testing framework.
    Abstract We consider the problem of selecting a small subset of representative variables from a large dataset. In the computer science literature, this dimensionality reduction problem is typically formalized as Column Subset Selection (CSS). Meanwhile, the typical statistical formalization is to find an information-maximizing set of Principal Variables. This paper shows that these two approaches are equivalent, and moreover, both can be viewed as maximum likelihood estimation within a certain semi-parametric model. Using these connections, we show how to efficiently (1) perform CSS using only summary statistics from the original dataset; (2) perform CSS in the presence of missing and/or censored data; and (3) select the subset size for CSS in a hypothesis testing framework.
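
One practical takeaway is that CSS can be run from summary statistics alone. The sketch below is a generic greedy column subset selection that touches nothing but the covariance matrix; it is a simplification for illustration and does not implement the paper's semi-parametric estimator or its hypothesis test for the subset size (see the linked repository for the authors' code).

```python
import numpy as np

def greedy_css_from_cov(S, k):
    """Greedily pick k column indices using only the covariance matrix S.

    At each step, choose the column that explains the most remaining variance,
    then deflate S by that column (Schur complement).
    """
    S = S.copy().astype(float)
    selected = []
    for _ in range(k):
        diag = np.diag(S)
        # Score each column by the variance it explains: ||S[:, j]||^2 / S[j, j].
        scores = np.where(diag > 1e-12,
                          (S ** 2).sum(axis=0) / np.maximum(diag, 1e-12), -np.inf)
        scores[selected] = -np.inf
        j = int(np.argmax(scores))
        selected.append(j)
        S = S - np.outer(S[:, j], S[:, j]) / S[j, j]   # residual covariance
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)   # column 3 nearly duplicates column 0
cov = np.cov(X, rowvar=False)                    # the only "summary statistic" used
print(greedy_css_from_cov(cov, k=3))
```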

Interpretable Stereotype Identification through Reasoning

  • paper_url: http://arxiv.org/abs/2308.00071
  • repo_url: None
  • paper_authors: Jacob-Junqi Tian, Omkar Dige, David Emerson, Faiza Khan Khattak
  • for: To examine and address biases in language models, integrating fairness into their development so that these models are equitable and free from bias.
  • methods: Vicuna-13B-v1.3 is used for zero-shot stereotype identification, comparing the performance gain from scaling from 13B to 33B with the gain from reasoning.
  • results: Reasoning helps language models improve accuracy on out-of-domain tasks such as stereotype identification, and it also enhances the interpretability of their decisions.
    Abstract Given that language models are trained on vast datasets that may contain inherent biases, there is a potential danger of inadvertently perpetuating systemic discrimination. Consequently, it becomes essential to examine and address biases in language models, integrating fairness into their development to ensure these models are equitable and free from bias. In this work, we demonstrate the importance of reasoning in zero-shot stereotype identification based on Vicuna-13B-v1.3. While we do observe improved accuracy by scaling from 13B to 33B, we show that the performance gain from reasoning significantly exceeds the gain from scaling up. Our findings suggest that reasoning could be a key factor that enables LLMs to transcend the scaling law on out-of-domain tasks such as stereotype identification. Additionally, through a qualitative analysis of select reasoning traces, we highlight how reasoning enhances not just accuracy but also the interpretability of the decision.

Data-free Black-box Attack based on Diffusion Model

  • paper_url: http://arxiv.org/abs/2307.12872
  • repo_url: None
  • paper_authors: Mingwen Shao, Lingzhuang Meng, Yuanjian Qiao, Lixu Zhang, Wangmeng Zuo
  • for: To improve the efficiency and accuracy of data-free black-box attacks by using a diffusion model to generate data for training the substitute model.
  • methods: A diffusion model generates the training data, and a Latent Code Augmentation (LCA) method is proposed to guide the diffusion model so that the generated data meets the discriminative criteria of the target model while remaining highly diverse.
  • results: For different target models, LCA achieves higher attack success rates and requires fewer query budgets than GAN-based schemes, as demonstrated by extensive experiments.
    Abstract Since the training data for the target model in a data-free black-box attack is not available, most recent schemes utilize GANs to generate data for training substitute model. However, these GANs-based schemes suffer from low training efficiency as the generator needs to be retrained for each target model during the substitute training process, as well as low generation quality. To overcome these limitations, we consider utilizing the diffusion model to generate data, and propose a data-free black-box attack scheme based on diffusion model to improve the efficiency and accuracy of substitute training. Despite the data generated by the diffusion model exhibits high quality, it presents diverse domain distributions and contains many samples that do not meet the discriminative criteria of the target model. To further facilitate the diffusion model to generate data suitable for the target model, we propose a Latent Code Augmentation (LCA) method to guide the diffusion model in generating data. With the guidance of LCA, the data generated by the diffusion model not only meets the discriminative criteria of the target model but also exhibits high diversity. By utilizing this data, it is possible to train substitute model that closely resemble the target model more efficiently. Extensive experiments demonstrate that our LCA achieves higher attack success rates and requires fewer query budgets compared to GANs-based schemes for different target models.

Stochastic Step-wise Feature Selection for Exponential Random Graph Models (ERGMs)

  • paper_url: http://arxiv.org/abs/2307.12862
  • repo_url: None
  • paper_authors: Helal El-Zaatari, Fei Yu, Michael R Kosorok
  • for: To improve exponential random graph models (ERGMs) so that they better capture the dependencies observed in social networks.
  • methods: A new approach based on endogenous variable selection within ERGMs is proposed to address the degeneracy problem and reduce the computational burden of network modeling.
  • results: Empirical testing shows that the method helps avoid ERGM degeneracy and improves the accuracy and reliability of the fitted network models.
    Abstract Statistical analysis of social networks provides valuable insights into complex network interactions across various scientific disciplines. However, accurate modeling of networks remains challenging due to the heavy computational burden and the need to account for observed network dependencies. Exponential Random Graph Models (ERGMs) have emerged as a promising technique used in social network modeling to capture network dependencies by incorporating endogenous variables. Nevertheless, using ERGMs poses multiple challenges, including the occurrence of ERGM degeneracy, which generates unrealistic and meaningless network structures. To address these challenges and enhance the modeling of collaboration networks, we propose and test a novel approach that focuses on endogenous variable selection within ERGMs. Our method aims to overcome the computational burden and improve the accommodation of observed network dependencies, thereby facilitating more accurate and meaningful interpretations of network phenomena in various scientific fields. We conduct empirical testing and rigorous analysis to contribute to the advancement of statistical techniques and offer practical insights for network analysis.

A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

  • paper_url: http://arxiv.org/abs/2307.12856
  • repo_url: None
  • paper_authors: Izzeddin Gur, Hiroki Furuta, Austin Huang, Mustafa Safdari, Yutaka Matsuo, Douglas Eck, Aleksandra Faust
  • for: To improve the performance of autonomous web agents that follow natural language instructions on real-world websites.
  • methods: The authors propose WebAgent, which completes tasks on real websites by decomposing instructions into canonical sub-instructions, summarizing long HTML documents into task-relevant snippets, and acting through generated Python programs. It combines Flan-U-PaLM for grounded code generation with HTML-T5, a new pre-trained LLM for long HTML documents that uses local and global attention mechanisms and a mixture of long-span denoising objectives for planning and summarization.
  • results: WebAgent improves the success rate on a real website by over 50%, and HTML-T5 is the best model for HTML-based tasks, achieving a 14.9% higher success rate than the prior SoTA on the MiniWoB web navigation benchmark and better accuracy on offline task planning evaluation.
    Abstract Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web navigation. However, the performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that can complete the tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via generated Python programs from those. We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents using local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our recipe improves the success on a real website by over 50%, and that HTML-T5 is the best model to solve HTML-based tasks; achieving 14.9% higher success rate than prior SoTA on the MiniWoB web navigation benchmark and better accuracy on offline task planning evaluation.

Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization

  • paper_url: http://arxiv.org/abs/2307.12851
  • repo_url: None
  • paper_authors: Hancheng Min, René Vidal, Enrique Mallada
  • for: This paper studies the training of a two-layer ReLU network for binary classification using gradient flow with small initialization.
  • methods: The analysis considers a training set with well-separated input vectors and carefully tracks the directional dynamics of the neurons, providing an $\mathcal{O}(\frac{\log n}{\sqrt{\mu}})$ upper bound on the time it takes for all neurons to align well with the input data, where $n$ is the number of data points and $\mu$ measures how well the data are separated.
  • results: During the early phase of training, first-layer neurons align with either the positive or the negative data; afterwards the loss converges to zero at a $\mathcal{O}(\frac{1}{t})$ rate and the first-layer weight matrix becomes approximately low-rank. Numerical experiments on MNIST illustrate the theoretical findings.
    Abstract This paper studies the problem of training a two-layer ReLU network for binary classification using gradient flow with small initialization. We consider a training dataset with well-separated input vectors: Any pair of input data with the same label are positively correlated, and any pair with different labels are negatively correlated. Our analysis shows that, during the early phase of training, neurons in the first layer try to align with either the positive data or the negative data, depending on its corresponding weight on the second layer. A careful analysis of the neurons' directional dynamics allows us to provide an $\mathcal{O}(\frac{\log n}{\sqrt{\mu}})$ upper bound on the time it takes for all neurons to achieve good alignment with the input data, where $n$ is the number of data points and $\mu$ measures how well the data are separated. After the early alignment phase, the loss converges to zero at a $\mathcal{O}(\frac{1}{t})$ rate, and the weight matrix on the first layer is approximately low-rank. Numerical experiments on the MNIST dataset illustrate our theoretical findings.
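
The early-alignment phenomenon is easy to observe numerically. The toy experiment below trains a two-layer ReLU network with tiny initialization on two well-separated clusters and tracks how first-layer neurons rotate toward the cluster directions; the data, width, and step size are arbitrary choices for illustration, and small-step SGD stands in for gradient flow.

```python
import torch

torch.manual_seed(0)
n, d, width = 100, 2, 50
# Two well-separated clusters: label +1 around mu, label -1 around -mu.
mu = torch.tensor([2.0, 2.0])
X = torch.cat([mu + 0.1 * torch.randn(n // 2, d), -mu + 0.1 * torch.randn(n // 2, d)])
y = torch.cat([torch.ones(n // 2), -torch.ones(n // 2)])

W = 1e-3 * torch.randn(width, d, requires_grad=True)   # small initialization
v = 1e-3 * torch.randn(width, requires_grad=True)
opt = torch.optim.SGD([W, v], lr=0.05)                  # proxy for gradient flow

for step in range(2001):
    out = torch.relu(X @ W.T) @ v
    loss = torch.nn.functional.softplus(-y * out).mean()   # logistic loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        # Cosine of each first-layer neuron with the positive-cluster direction.
        cos = (W.detach() @ mu) / (W.detach().norm(dim=1) * mu.norm() + 1e-12)
        frac_aligned = (cos.abs() > 0.99).float().mean().item()
        print(f"step {step:5d}  loss {loss.item():.4f}  frac. aligned neurons {frac_aligned:.2f}")
```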

Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials

  • paper_url: http://arxiv.org/abs/2307.12840
  • repo_url: None
  • paper_authors: Ilias Diakonikolas, Daniel M. Kane
  • for: PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
  • methods: Tensor decomposition is used to identify a subspace such that all $O(k)$-order moments are small in the orthogonal directions; the analysis relies on the theory of Schur polynomials.
  • results: An efficient algorithm with sample and computational complexity $(dk/\epsilon)^{O(k)}$, improving on prior work whose complexity $(dk/\epsilon)^{h(k)}$ scales super-polynomially in $k$.
    Abstract We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy. Prior work had given an algorithm for this problem with complexity $(dk/\epsilon)^{h(k)}$, where the function $h(k)$ scales super-polynomially in $k$. Interestingly, the complexity of our algorithm is near-optimal within the class of Correlational Statistical Query algorithms. At a high-level, our algorithm uses tensor decomposition to identify a subspace such that all the $O(k)$-order moments are small in the orthogonal directions. Its analysis makes essential use of the theory of Schur polynomials to show that the higher-moment error tensors are small given that the lower-order ones are.

Learning Provably Robust Estimators for Inverse Problems via Jittering

  • paper_url: http://arxiv.org/abs/2307.12822
  • repo_url: https://github.com/mli-lab/robust_reconstructors_via_jittering
  • paper_authors: Anselm Krainovic, Mahdi Soltanolkotabi, Reinhard Heckel
  • for: This paper investigates whether jittering, a simple regularization technique, can be used to train deep neural networks to be worst-case robust for inverse problems.
  • methods: Jittering (adding isotropic Gaussian noise) is applied during training, and the paper presents a novel analytical characterization of the optimal $\ell_2$-worst-case robust estimator for linear denoising.
  • results: Jittering significantly enhances worst-case robustness but can be suboptimal for inverse problems beyond denoising; training on real data, which often contains slight noise, is also somewhat robustness enhancing.
    Abstract Deep neural networks provide excellent performance for inverse problems such as denoising. However, neural networks can be sensitive to adversarial or worst-case perturbations. This raises the question of whether such networks can be trained efficiently to be worst-case robust. In this paper, we investigate whether jittering, a simple regularization technique that adds isotropic Gaussian noise during training, is effective for learning worst-case robust estimators for inverse problems. While well studied for prediction in classification tasks, the effectiveness of jittering for inverse problems has not been systematically investigated. In this paper, we present a novel analytical characterization of the optimal $\ell_2$-worst-case robust estimator for linear denoising and show that jittering yields optimal robust denoisers. Furthermore, we examine jittering empirically via training deep neural networks (U-nets) for natural image denoising, deconvolution, and accelerated magnetic resonance imaging (MRI). The results show that jittering significantly enhances the worst-case robustness, but can be suboptimal for inverse problems beyond denoising. Moreover, our results imply that training on real data which often contains slight noise is somewhat robustness enhancing.
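
Jittering, as used here, simply means adding fresh isotropic Gaussian noise to the network input at every training step. The sketch below shows that idea for a denoising setup; the architecture, noise levels, and data are placeholders rather than the paper's configuration (the authors' code is linked above).

```python
import torch

def train_step(model, optimizer, clean_batch, meas_noise_std=0.1, jitter_std=0.05):
    """One denoiser training step with jittering.

    The measurement is the clean image plus noise; jittering adds *extra*
    Gaussian noise to the network input during training only.
    """
    noisy = clean_batch + meas_noise_std * torch.randn_like(clean_batch)
    jittered = noisy + jitter_std * torch.randn_like(noisy)   # the jittering term
    recon = model(jittered)
    loss = torch.nn.functional.mse_loss(recon, clean_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with a small convolutional denoiser on random "images".
model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):
    batch = torch.rand(8, 1, 32, 32)
    print(train_step(model, opt, batch))
```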

Maximal Independent Sets for Pooling in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.13011
  • repo_url: None
  • paper_authors: Stevan Stanovic, Benoit Gaüzère, Luc Brun
  • for: Advancing classification on graph-structured data by bringing the benefits of image pooling to graph neural networks.
  • methods: Three graph pooling methods based on the notion of maximal independent sets.
  • results: Experimental results confirm the relevance of maximal independent set constraints for graph pooling.
    Abstract Convolutional Neural Networks (CNNs) have enabled major advances in image classification through convolution and pooling. In particular, image pooling transforms a connected discrete lattice into a reduced lattice with the same connectivity and allows reduction functions to consider all pixels in an image. However, there is no pooling that satisfies these properties for graphs. In fact, traditional graph pooling methods suffer from at least one of the following drawbacks: Graph disconnection or overconnection, low decimation ratio, and deletion of large parts of graphs. In this paper, we present three pooling methods based on the notion of maximal independent sets that avoid these pitfalls. Our experimental results confirm the relevance of maximal independent set constraints for graph pooling.
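
A maximal independent set can be computed greedily, and a simple pooling operator then merges every non-selected node into an adjacent selected one. The NetworkX sketch below shows this coarsening step only; it does not reproduce the three pooling methods proposed in the paper.

```python
import networkx as nx

def mis_pooling(G, seed=0):
    """Coarsen a graph by keeping a maximal independent set as pooled nodes.

    Every removed node is merged into one of its neighbours in the MIS, and two
    pooled nodes are connected if any of their merged members were adjacent.
    Simple illustration only, not the paper's pooling operators.
    """
    mis = set(nx.maximal_independent_set(G, seed=seed))
    assign = {}
    for v in G.nodes:
        if v in mis:
            assign[v] = v
        else:
            mis_neighbours = [u for u in G.neighbors(v) if u in mis]
            assign[v] = mis_neighbours[0]   # maximality guarantees at least one
    pooled = nx.Graph()
    pooled.add_nodes_from(mis)
    for u, v in G.edges:
        if assign[u] != assign[v]:
            pooled.add_edge(assign[u], assign[v])
    return pooled, assign

G = nx.grid_2d_graph(6, 6)
pooled, assign = mis_pooling(G)
print(len(G), "nodes ->", len(pooled), "pooled nodes")
```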

Causal Fair Machine Learning via Rank-Preserving Interventional Distributions

  • paper_url: http://arxiv.org/abs/2307.12797
  • repo_url: https://github.com/slds-lmu/paper_2023_cfml
  • paper_authors: Ludwig Bothmann, Susanne Dandl, Michael Schomaker
  • for: To design machine learning models that mitigate unfairness in automated decision-making systems.
  • methods: The paper takes a causal view of protected attributes: individuals are defined as normatively equal if they are equal in a fictitious, normatively desired (FiND) world in which the protected attribute has no (direct or indirect) causal effect on the target. Rank-preserving interventional distributions define an estimand of this FiND world, and a warping method is used for estimation.
  • results: Evaluation criteria for the method and the resulting model are validated through simulations and empirical data, showing that the warping approach identifies the most discriminated individuals and mitigates unfairness.
    Abstract A decision can be defined as fair if equal individuals are treated equally and unequals unequally. Adopting this definition, the task of designing machine learning models that mitigate unfairness in automated decision-making systems must include causal thinking when introducing protected attributes. Following a recent proposal, we define individuals as being normatively equal if they are equal in a fictitious, normatively desired (FiND) world, where the protected attribute has no (direct or indirect) causal effect on the target. We propose rank-preserving interventional distributions to define an estimand of this FiND world and a warping method for estimation. Evaluation criteria for both the method and resulting model are presented and validated through simulations and empirical data. With this, we show that our warping approach effectively identifies the most discriminated individuals and mitigates unfairness.
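
The "rank-preserving" idea can be illustrated with a simple quantile warping: each individual keeps its rank within its protected group while the values are mapped onto a common reference distribution. This is only a schematic analogue of the paper's warping method, run on invented toy data (the authors' code is in the linked repository).

```python
import numpy as np

def rank_preserving_warp(scores, groups, reference=None):
    """Map each group's scores onto a reference distribution, preserving ranks.

    Within every protected group the ordering of individuals is unchanged; only
    the values are moved onto the reference quantiles. Toy analogue of a
    rank-preserving warping, not the paper's estimator.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    if reference is None:
        reference = scores                      # pooled scores as the reference proxy
    warped = np.empty_like(scores)
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        ranks = scores[idx].argsort().argsort()          # rank within the group
        quantiles = (ranks + 0.5) / len(idx)
        warped[idx] = np.quantile(reference, quantiles)  # same rank, reference scale
    return warped

rng = np.random.default_rng(0)
groups = np.repeat(["A", "B"], 500)
scores = np.concatenate([rng.normal(0.0, 1.0, 500),      # group A
                         rng.normal(1.0, 1.0, 500)])     # group B: shifted upwards
warped = rank_preserving_warp(scores, groups)
print("group means before:", scores[:500].mean().round(2), scores[500:].mean().round(2))
print("group means after: ", warped[:500].mean().round(2), warped[500:].mean().round(2))
```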

Compact & Capable: Harnessing Graph Neural Networks and Edge Convolution for Medical Image Classification

  • paper_url: http://arxiv.org/abs/2307.12790
  • repo_url: https://github.com/anonrepo-keeper/gcnn-ec
  • paper_authors: Aryan Singh, Pepijn Van de Ven, Ciarán Eising, Patrick Denny
  • for: This study explores the potential of Graph Neural Networks (GNNs) for medical image classification.
  • methods: A novel model combines GNNs with edge convolution, leveraging the interconnectedness of RGB channel feature values to strongly represent connections between crucial graph nodes.
  • results: The model performs on par with state-of-the-art Deep Neural Networks (DNNs) while using 1000 times fewer parameters, reducing training time and data requirements. A comparison with pre-trained DNNs on the MedMNIST dataset shows promising prospects for GNNs in medical image analysis and motivates further exploration of advanced graph-based models such as Graph Attention Networks (GAT) and graph auto-encoders in the medical imaging domain.
    Abstract Graph-based neural network models are gaining traction in the field of representation learning due to their ability to uncover latent topological relationships between entities that are otherwise challenging to identify. These models have been employed across a diverse range of domains, encompassing drug discovery, protein interactions, semantic segmentation, and fluid dynamics research. In this study, we investigate the potential of Graph Neural Networks (GNNs) for medical image classification. We introduce a novel model that combines GNNs and edge convolution, leveraging the interconnectedness of RGB channel feature values to strongly represent connections between crucial graph nodes. Our proposed model not only performs on par with state-of-the-art Deep Neural Networks (DNNs) but does so with 1000 times fewer parameters, resulting in reduced training time and data requirements. We compare our Graph Convolutional Neural Network (GCNN) to pre-trained DNNs for classifying MedMNIST dataset classes, revealing promising prospects for GNNs in medical image analysis. Our results also encourage further exploration of advanced graph-based models such as Graph Attention Networks (GAT) and Graph Auto-Encoders in the medical imaging domain. The proposed model yields more reliable, interpretable, and accurate outcomes for tasks like semantic segmentation and image classification compared to simpler GCNNs
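
A compact PyTorch Geometric sketch of a GNN that mixes graph convolution with edge convolution over region features is shown below; the layer sizes, the toy graph construction, and the readout are assumptions for illustration and do not reproduce the authors' model (their repository is linked above).

```python
import torch
from torch_geometric.nn import GCNConv, EdgeConv, global_mean_pool

class TinyGCNNEC(torch.nn.Module):
    """Small GCN + EdgeConv classifier over a graph of image regions (illustrative)."""
    def __init__(self, in_dim=3, hidden=32, n_classes=9):
        super().__init__()
        self.gcn = GCNConv(in_dim, hidden)
        # EdgeConv applies an MLP to [x_i, x_j - x_i] for every edge, then aggregates.
        self.edge = EdgeConv(torch.nn.Sequential(
            torch.nn.Linear(2 * hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden)))
        self.head = torch.nn.Linear(hidden, n_classes)

    def forward(self, x, edge_index, batch):
        x = torch.relu(self.gcn(x, edge_index))
        x = torch.relu(self.edge(x, edge_index))
        return self.head(global_mean_pool(x, batch))

# Toy graph: 20 "regions" with mean-RGB features and random connectivity.
x = torch.rand(20, 3)
edge_index = torch.randint(0, 20, (2, 60))
batch = torch.zeros(20, dtype=torch.long)          # all nodes belong to one image
model = TinyGCNNEC()
print(model(x, edge_index, batch).shape)           # -> torch.Size([1, 9])
```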

Deep neural network improves the estimation of polygenic risk scores for breast cancer

  • paper_url: http://arxiv.org/abs/2307.13010
  • repo_url: None
  • paper_authors: Adrien Badré, Li Zhang, Wellington Muchero, Justin C. Reynolds, Chongle Pan
  • for: Comparing a series of computational models for estimating polygenic risk scores (PRS) for breast cancer.
  • methods: A deep neural network (DNN) is compared with other machine learning techniques and established statistical algorithms, including BLUP, BayesA, and LDpred.
  • results: The DNN outperformed the alternatives, with an AUC of 67.4% in the test cohort. It also separated the case population into high- and normal-genetic-risk sub-populations, achieving 18.8% recall at 90% precision, indicating that DNNs can better predict breast cancer risk.
    Abstract Polygenic risk scores (PRS) estimate the genetic risk of an individual for a complex disease based on many genetic variants across the whole genome. In this study, we compared a series of computational models for estimation of breast cancer PRS. A deep neural network (DNN) was found to outperform alternative machine learning techniques and established statistical algorithms, including BLUP, BayesA and LDpred. In the test cohort with 50% prevalence, the Area Under the receiver operating characteristic Curve (AUC) were 67.4% for DNN, 64.2% for BLUP, 64.5% for BayesA, and 62.4% for LDpred. BLUP, BayesA, and LPpred all generated PRS that followed a normal distribution in the case population. However, the PRS generated by DNN in the case population followed a bi-modal distribution composed of two normal distributions with distinctly different means. This suggests that DNN was able to separate the case population into a high-genetic-risk case sub-population with an average PRS significantly higher than the control population and a normal-genetic-risk case sub-population with an average PRS similar to the control population. This allowed DNN to achieve 18.8% recall at 90% precision in the test cohort with 50% prevalence, which can be extrapolated to 65.4% recall at 20% precision in a general population with 12% prevalence. Interpretation of the DNN model identified salient variants that were assigned insignificant p-values by association studies, but were important for DNN prediction. These variants may be associated with the phenotype through non-linear relationships.
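
As a rough illustration of the modelling setup, the sketch below fits a small feed-forward network on a matrix of genotype dosages (0/1/2) against case-control labels and reports a held-out AUC; the simulated data, network width, and training schedule are invented for the example and are unrelated to the study's cohort or its comparison methods.

```python
import numpy as np
import torch
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, p = 2000, 500                                              # individuals x SNPs (toy scale)
G = rng.binomial(2, 0.3, size=(n, p)).astype(np.float32)      # genotype dosages 0/1/2
beta = np.zeros(p)
beta[:20] = rng.normal(0, 0.4, 20)                            # a few causal SNPs
y = (rng.random(n) < 1 / (1 + np.exp(-(G @ beta - 2)))).astype(np.float32)

X, labels = torch.tensor(G), torch.tensor(y)
model = torch.nn.Sequential(torch.nn.Linear(p, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
for _ in range(200):
    logits = model(X[:1500]).squeeze(-1)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, labels[:1500])
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    prs = model(X[1500:]).squeeze(-1).numpy()                 # the network's risk score
print("held-out AUC:", round(roc_auc_score(y[1500:], prs), 3))
```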

Analyzing the Strategy of Propaganda using Inverse Reinforcement Learning: Evidence from the 2022 Russian Invasion of Ukraine

  • paper_url: http://arxiv.org/abs/2307.12788
  • repo_url: None
  • paper_authors: Dominique Geissler, Stefan Feuerriegel
  • for: To analyze the strategy behind the pro-Russian social media propaganda campaign accompanying the 2022 Russian invasion of Ukraine.
  • methods: An inverse reinforcement learning (IRL) approach is used to model propagandists' behavior on social media.
  • results: Bots and humans follow different strategies: bots respond predominantly to pro-invasion messages, suggesting that they seek to drive virality, while humans respond mainly to opposing messages, suggesting that they tend to engage in critical discussions.
    Abstract The 2022 Russian invasion of Ukraine was accompanied by a large-scale, pro-Russian propaganda campaign on social media. However, the strategy behind the dissemination of propaganda has remained unclear, particularly how the online discourse was strategically shaped by the propagandists' community. Here, we analyze the strategy of the Twitter community using an inverse reinforcement learning (IRL) approach. Specifically, IRL allows us to model online behavior as a Markov decision process, where the goal is to infer the underlying reward structure that guides propagandists when interacting with users with a supporting or opposing stance toward the invasion. Thereby, we aim to understand empirically whether and how between-user interactions are strategically used to promote the proliferation of Russian propaganda. For this, we leverage a large-scale dataset with 349,455 posts with pro-Russian propaganda from 132,131 users. We show that bots and humans follow a different strategy: bots respond predominantly to pro-invasion messages, suggesting that they seek to drive virality; while messages indicating opposition primarily elicit responses from humans, suggesting that they tend to engage in critical discussions. To the best of our knowledge, this is the first study analyzing the strategy behind propaganda from the 2022 Russian invasion of Ukraine through the lens of IRL.

Is attention all you need in medical image analysis? A review

  • paper_url: http://arxiv.org/abs/2307.12775
  • repo_url: None
  • paper_authors: Giorgos Papanastasiou, Nikolaos Dikaios, Jiahao Huang, Chengjia Wang, Guang Yang
  • for: This paper reviews and analyzes existing hybrid CNN-Transf/Attention models for medical image analysis (MIA) and discusses their generalization opportunities for scientific and clinical impact.
  • methods: A comprehensive analysis framework is used to evaluate the architectural designs, breakthroughs, and opportunities of hybrid CNN-Transf/Attention models in MIA.
  • results: A systematic review of existing hybrid CNN-Transf/Attention models, with discussion of their strengths and limitations in terms of generalization ability and clinical impact.
    Abstract Medical imaging is a key component in clinical diagnosis, treatment planning and clinical trial design, accounting for almost 90% of all healthcare data. CNNs achieved performance gains in medical image analysis (MIA) over the last years. CNNs can efficiently model local pixel interactions and be trained on small-scale MI data. The main disadvantage of typical CNN models is that they ignore global pixel relationships within images, which limits their generalisation ability to understand out-of-distribution data with different 'global' information. The recent progress of Artificial Intelligence gave rise to Transformers, which can learn global relationships from data. However, full Transformer models need to be trained on large-scale data and involve tremendous computational complexity. Attention and Transformer compartments (Transf/Attention) which can well maintain properties for modelling global relationships, have been proposed as lighter alternatives of full Transformers. Recently, there is an increasing trend to co-pollinate complementary local-global properties from CNN and Transf/Attention architectures, which led to a new era of hybrid models. The past years have witnessed substantial growth in hybrid CNN-Transf/Attention models across diverse MIA problems. In this systematic review, we survey existing hybrid CNN-Transf/Attention models, review and unravel key architectural designs, analyse breakthroughs, and evaluate current and future opportunities as well as challenges. We also introduced a comprehensive analysis framework on generalisation opportunities of scientific and clinical impact, based on which new data-driven domain generalisation and adaptation methods can be stimulated.

Detecting disturbances in network-coupled dynamical systems with machine learning

  • paper_url: http://arxiv.org/abs/2307.12771
  • repo_url: None
  • paper_authors: Per Sebastian Skardal, Juan G. Restrepo
  • for: identifying disturbances in network-coupled dynamical systems without knowledge of the disturbances or underlying dynamics
  • methods: model-free method based on machine learning using prior observations of the system when forced by a known training function
  • results: able to identify the locations and properties of many different types of unknown disturbances using a variety of known forcing functions, both with linear and nonlinear disturbances using food web and neuronal activity models.
    Abstract Identifying disturbances in network-coupled dynamical systems without knowledge of the disturbances or underlying dynamics is a problem with a wide range of applications. For example, one might want to know which nodes in the network are being disturbed and identify the type of disturbance. Here we present a model-free method based on machine learning to identify such unknown disturbances based only on prior observations of the system when forced by a known training function. We find that this method is able to identify the locations and properties of many different types of unknown disturbances using a variety of known forcing functions. We illustrate our results both with linear and nonlinear disturbances using food web and neuronal activity models. Finally, we discuss how to scale our method to large networks.

Nonparametric Linear Feature Learning in Regression Through Regularisation

  • paper_url: http://arxiv.org/abs/2307.12754
  • repo_url: https://github.com/bertillefollain/regfeal
  • paper_authors: Bertille Follain, Umut Simsekli, Francis Bach
  • for: To propose a new non-parametric method for linear feature learning that aids prediction, computation, and interpretation in high-dimensional data.
  • methods: Empirical risk minimization is augmented with a penalty on function derivatives to ensure versatility; leveraging the orthogonality and rotation invariance of Hermite polynomials, the authors introduce an estimator named RegFeaL, fitted by alternating minimization that iteratively rotates the data toward the leading directions.
  • results: The method yields a consistent estimator of the prediction function with explicit rates, and empirical results demonstrate the performance of RegFeaL across a variety of experiments.
    Abstract Representation learning plays a crucial role in automated feature selection, particularly in the context of high-dimensional data, where non-parametric methods often struggle. In this study, we focus on supervised learning scenarios where the pertinent information resides within a lower-dimensional linear subspace of the data, namely the multi-index model. If this subspace were known, it would greatly enhance prediction, computation, and interpretation. To address this challenge, we propose a novel method for linear feature learning with non-parametric prediction, which simultaneously estimates the prediction function and the linear subspace. Our approach employs empirical risk minimisation, augmented with a penalty on function derivatives, ensuring versatility. Leveraging the orthogonality and rotation invariance properties of Hermite polynomials, we introduce our estimator, named RegFeaL. By utilising alternative minimisation, we iteratively rotate the data to improve alignment with leading directions and accurately estimate the relevant dimension in practical settings. We establish that our method yields a consistent estimator of the prediction function with explicit rates. Additionally, we provide empirical results demonstrating the performance of RegFeaL in various experiments.

Concept-based explainability for an EEG transformer model

  • paper_url: http://arxiv.org/abs/2307.12745
  • repo_url: https://github.com/andersgmadsen/tcav-bendr
  • paper_authors: Anders Gjølbye Madsen, William Theodor Lehn-Schiøler, Áshildur Jónsdóttir, Bergdís Arnardóttir, Lars Kai Hansen
  • for: To explain the internal states of deep learning models in terms of human-aligned concepts, in order to better understand how they process data.
  • methods: Concept Activation Vectors (CAVs) are used, which express internal states as directions in latent space identified with linear discriminants; the method is applied to BENDR, a large-scale EEG transformer model.
  • results: Concepts are defined both from externally labeled EEG datasets and from anatomically defined regions, and both approaches are shown to yield valuable insights into the representations learned by deep EEG models.
    Abstract Deep learning models are complex due to their size, structure, and inherent randomness in training procedures. Additional complexity arises from the selection of datasets and inductive biases. Addressing these challenges for explainability, Kim et al. (2018) introduced Concept Activation Vectors (CAVs), which aim to understand deep models' internal states in terms of human-aligned concepts. These concepts correspond to directions in latent space, identified using linear discriminants. Although this method was first applied to image classification, it was later adapted to other domains, including natural language processing. In this work, we attempt to apply the method to electroencephalogram (EEG) data for explainability in Kostas et al.'s BENDR (2021), a large-scale transformer model. A crucial part of this endeavor involves defining the explanatory concepts and selecting relevant datasets to ground concepts in the latent space. Our focus is on two mechanisms for EEG concept formation: the use of externally labeled EEG datasets, and the application of anatomically defined concepts. The former approach is a straightforward generalization of methods used in image classification, while the latter is novel and specific to EEG. We present evidence that both approaches to concept formation yield valuable insights into the representations learned by deep EEG models.
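
Concept Activation Vectors are straightforward to compute: collect network activations for concept versus random examples, fit a linear classifier, and take its normal vector as the concept direction; TCAV-style scores then count how often gradients align with that direction. The sketch below uses random placeholder activations rather than BENDR features; the authors' EEG pipeline is in the linked repository.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(concept_acts, random_acts):
    """Fit a linear separator in activation space and return its unit normal (the CAV)."""
    X = np.vstack([concept_acts, random_acts])
    y = np.r_[np.ones(len(concept_acts)), np.zeros(len(random_acts))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_.ravel()
    return v / np.linalg.norm(v)

def tcav_score(grads, cav):
    """Fraction of examples whose gradient has positive alignment with the CAV."""
    return float((grads @ cav > 0).mean())

rng = np.random.default_rng(0)
d = 128                                                # dimensionality of a hidden layer
concept_acts = rng.normal(0.5, 1.0, size=(200, d))     # activations for concept examples
random_acts = rng.normal(0.0, 1.0, size=(200, d))      # activations for random examples
grads = rng.normal(size=(300, d))                      # per-example gradients at that layer
cav = concept_activation_vector(concept_acts, random_acts)
print("TCAV score:", tcav_score(grads, cav))
```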

Sparse-firing regularization methods for spiking neural networks with time-to-first spike coding

  • paper_url: http://arxiv.org/abs/2307.13007
  • repo_url: None
  • paper_authors: Yusuke Sakemi, Kakei Yamamoto, Takeo Hosomi, Kazuyuki Aihara
  • for: To improve the training of multilayer spiking neural networks (SNNs), in particular error backpropagation methods that use firing times to realize ideal temporal coding.
  • methods: Time-to-first-spike (TTFS) coding is used, in which each neuron fires at most once; this restriction allows information to be processed at very low firing frequencies. Two spike-timing-based sparse-firing (SSR) regularization methods are proposed to reduce the firing frequency further: the membrane potential-aware M-SSR method and the firing condition-aware F-SSR method.
  • results: The regularization methods further reduce the firing frequency of TTFS-coded SNNs, as investigated on the MNIST, Fashion-MNIST, and CIFAR-10 datasets with multilayer perceptron and convolutional neural network architectures.
    Abstract The training of multilayer spiking neural networks (SNNs) using the error backpropagation algorithm has made significant progress in recent years. Among the various training schemes, the error backpropagation method that directly uses the firing time of neurons has attracted considerable attention because it can realize ideal temporal coding. This method uses time-to-first spike (TTFS) coding, in which each neuron fires at most once, and this restriction on the number of firings enables information to be processed at a very low firing frequency. This low firing frequency increases the energy efficiency of information processing in SNNs, which is important not only because of its similarity with information processing in the brain, but also from an engineering point of view. However, only an upper limit has been provided for TTFS-coded SNNs, and the information-processing capability of SNNs at lower firing frequencies has not been fully investigated. In this paper, we propose two spike timing-based sparse-firing (SSR) regularization methods to further reduce the firing frequency of TTFS-coded SNNs. The first is the membrane potential-aware SSR (M-SSR) method, which has been derived as an extreme form of the loss function of the membrane potential value. The second is the firing condition-aware SSR (F-SSR) method, which is a regularization function obtained from the firing conditions. Both methods are characterized by the fact that they only require information about the firing timing and associated weights. The effects of these regularization methods were investigated on the MNIST, Fashion-MNIST, and CIFAR-10 datasets using multilayer perceptron networks and convolutional neural network structures.

Safety Performance of Neural Networks in the Presence of Covariate Shift

  • paper_url: http://arxiv.org/abs/2307.12716
  • repo_url: None
  • paper_authors: Chih-Hong Cheng, Harald Ruess, Konstantinos Theodorou
  • for: This paper aims to address the issue of covariate shift’s impact on the operational safety performance of neural networks, and proposes a method to reshape the initial test set based on an approximation of the operational data.
  • methods: The proposed method uses finite binning and static dataflow analysis to derive conservative bounds on the values of neurons, and formulates a mixed integer linear programming (MILP) constraint to construct the minimum set of data points to be removed in the test set.
  • results: The proposed method can re-evaluate the safety performance of neural networks in the presence of covariate shift by using the reshaped test set, and can potentially reduce the need for collecting new operational data and creating corresponding ground truth labels.
    Abstract Covariate shift may impact the operational safety performance of neural networks. A re-evaluation of the safety performance, however, requires collecting new operational data and creating corresponding ground truth labels, which often is not possible during operation. We are therefore proposing to reshape the initial test set, as used for the safety performance evaluation prior to deployment, based on an approximation of the operational data. This approximation is obtained by observing and learning the distribution of activation patterns of neurons in the network during operation. The reshaped test set reflects the distribution of neuron activation values as observed during operation, and may therefore be used for re-evaluating safety performance in the presence of covariate shift. First, we derive conservative bounds on the values of neurons by applying finite binning and static dataflow analysis. Second, we formulate a mixed integer linear programming (MILP) constraint for constructing the minimum set of data points to be removed in the test set, such that the difference between the discretized test and operational distributions is bounded. We discuss potential benefits and limitations of this constraint-based approach based on our initial experience with an implemented research prototype.
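
A core ingredient of the approach is comparing finitely binned neuron-activation distributions between the pre-deployment test set and operation. The sketch below performs only that binning and measures the discrepancy between the two histograms; the conservative bounds from static dataflow analysis and the MILP selection of points to remove are not reproduced.

```python
import numpy as np

def binned_discrepancy(test_acts, op_acts, n_bins=20):
    """Finite binning of one neuron's activation values and the total-variation
    style distance between the test and operational histograms."""
    lo = min(test_acts.min(), op_acts.min())
    hi = max(test_acts.max(), op_acts.max())
    edges = np.linspace(lo, hi, n_bins + 1)
    p, _ = np.histogram(test_acts, bins=edges)
    q, _ = np.histogram(op_acts, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * np.abs(p - q).sum()

rng = np.random.default_rng(0)
test_acts = rng.normal(0.0, 1.0, 5000)   # activations observed on the test set
op_acts = rng.normal(0.4, 1.2, 5000)     # shifted activations during operation
print("discrepancy:", round(binned_discrepancy(test_acts, op_acts), 3))
```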

Policy Gradient Optimal Correlation Search for Variance Reduction in Monte Carlo simulation and Maximum Optimal Transport

  • paper_url: http://arxiv.org/abs/2307.12703
  • repo_url: None
  • paper_authors: Pierre Bras, Gilles Pagès
  • for: Estimating $f(X_T)$, where $X$ is the solution of a stochastic differential equation and $f$ is a test function.
  • methods: The new estimator is $(f(X^1_T) + f(X^2_T))/2$, where $X^1$ and $X^2$ have the same marginal law as $X$ but are pathwise correlated so as to reduce the variance; the optimal correlation function $\rho$ is approximated by a deep neural network.
  • results: Policy gradient and reinforcement learning techniques calibrate $\rho$ along the trajectories of $(X^1, X^2)$, achieving the targeted variance reduction; finding an optimal coupling given the marginal laws is linked to maximum optimal transport.
    Abstract We propose a new algorithm for variance reduction when estimating $f(X_T)$ where $X$ is the solution to some stochastic differential equation and $f$ is a test function. The new estimator is $(f(X^1_T) + f(X^2_T))/2$, where $X^1$ and $X^2$ have same marginal law as $X$ but are pathwise correlated so that to reduce the variance. The optimal correlation function $\rho$ is approximated by a deep neural network and is calibrated along the trajectories of $(X^1, X^2)$ by policy gradient and reinforcement learning techniques. Finding an optimal coupling given marginal laws has links with maximum optimal transport.
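
The simplest instance of the idea is the classical antithetic-variates estimator, in which the two paths are driven by Brownian increments of opposite sign (a constant correlation of $-1$ rather than the learned $\rho$). The toy example below applies it to a geometric Brownian motion; learning $\rho$ with a deep network and policy gradients, as the paper does, is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 100, 20_000
mu, sigma, x0 = 0.05, 0.4, 1.0
dt = T / n_steps
f = lambda x: np.maximum(x - 1.0, 0.0)            # test function (a call payoff)

def terminal_values(dW):
    """Euler scheme for dX = mu*X dt + sigma*X dW, returning X_T for each path."""
    x = np.full(dW.shape[0], x0)
    for k in range(n_steps):
        x = x + mu * x * dt + sigma * x * dW[:, k]
    return x

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
plain = f(terminal_values(dW))                                       # crude Monte Carlo
paired = 0.5 * (f(terminal_values(dW)) + f(terminal_values(-dW)))    # correlated pair

print("crude MC   : mean %.4f  var %.5f" % (plain.mean(), plain.var()))
print("antithetic : mean %.4f  var %.5f" % (paired.mean(), paired.var()))
```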

MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features

  • paper_url: http://arxiv.org/abs/2307.12698
  • repo_url: None
  • paper_authors: Adrien Bardes, Jean Ponce, Yann LeCun
  • for: To jointly learn visual representations and optical flow estimation, showing that the two objectives benefit from each other so that the learned content features incorporate motion information.
  • methods: The paper introduces MC-JEPA, a joint-embedding predictive architecture and self-supervised learning approach that learns optical flow and content features within a shared encoder.
  • results: MC-JEPA achieves performance on par with existing unsupervised optical flow benchmarks and with common self-supervised learning approaches on downstream tasks such as semantic segmentation of images and videos.
    Abstract Self-supervised learning of visual representations has been focusing on learning content features, which do not capture object motion or location, and focus on identifying and differentiating objects in images and videos. On the other hand, optical flow estimation is a task that does not involve understanding the content of the images on which it is estimated. We unify the two approaches and introduce MC-JEPA, a joint-embedding predictive architecture and self-supervised learning approach to jointly learn optical flow and content features within a shared encoder, demonstrating that the two associated objectives; the optical flow estimation objective and the self-supervised learning objective; benefit from each other and thus learn content features that incorporate motion information. The proposed approach achieves performance on-par with existing unsupervised optical flow benchmarks, as well as with common self-supervised learning approaches on downstream tasks such as semantic segmentation of images and videos.

Addressing the Impact of Localized Training Data in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.12689
  • repo_url: https://github.com/akanshaaga/reg_appnp
  • paper_authors: Singh Akansha
  • for: To assess the impact of training graph neural networks (GNNs) on localized subsets of a graph.
  • methods: A regularization method for common GNN models is proposed that treats localized training data as an out-of-distribution problem and minimizes distributional discrepancies between the training data and graph inference.
  • results: Extensive tests on three citation GNN benchmark datasets show significant performance improvements.
    Abstract Graph Neural Networks (GNNs) have achieved notable success in learning from graph-structured data, owing to their ability to capture intricate dependencies and relationships between nodes. They excel in various applications, including semi-supervised node classification, link prediction, and graph generation. However, it is important to acknowledge that the majority of state-of-the-art GNN models are built upon the assumption of an in-distribution setting, which hinders their performance on real-world graphs with dynamic structures. In this article, we aim to assess the impact of training GNNs on localized subsets of the graph. Such restricted training data may lead to a model that performs well in the specific region it was trained on but fails to generalize and make accurate predictions for the entire graph. In the context of graph-based semi-supervised learning (SSL), resource constraints often lead to scenarios where the dataset is large, but only a portion of it can be labeled, affecting the model's performance. This limitation affects tasks like anomaly detection or spam detection when labeling processes are biased or influenced by human subjectivity. To tackle the challenges posed by localized training data, we approach the problem as an out-of-distribution (OOD) data issue by by aligning the distributions between the training data, which represents a small portion of labeled data, and the graph inference process that involves making predictions for the entire graph. We propose a regularization method to minimize distributional discrepancies between localized training data and graph inference, improving model performance on OOD data. Extensive tests on popular GNN models show significant performance improvement on three citation GNN benchmark datasets. The regularization approach effectively enhances model adaptation and generalization, overcoming challenges posed by OOD data.
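
The abstract above describes a regularizer that aligns the distribution of the small labelled (localized) portion of the graph with the distribution encountered at full-graph inference, without specifying the exact discrepancy measure. Below is a minimal sketch of that general idea, assuming a kernel MMD penalty as the discrepancy and using random tensors in place of real GNN embeddings; the weight `lam` is illustrative, not taken from the paper.

```python
import torch

def rbf_mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy between two
    batches of embeddings under an RBF kernel; a differentiable measure of
    distributional discrepancy that can be added to a training loss."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

torch.manual_seed(0)
z_localized = torch.randn(64, 16)         # embeddings of the labelled training nodes
z_inference = torch.randn(256, 16) + 0.5  # embeddings of nodes seen at inference time
supervised_loss = torch.tensor(0.7)       # placeholder for the node-classification loss
lam = 0.1                                 # regularization strength (illustrative)
total_loss = supervised_loss + lam * rbf_mmd2(z_localized, z_inference)
print(float(total_loss))
```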

An Estimator for the Sensitivity to Perturbations of Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.12679
  • repo_url: None
  • paper_authors: Naman Maheshwari, Nicholas Malaya, Scott Moe, Jaydeep P. Kulkarni, Sudhanva Gurumurthi
  • for: 这篇论文的目的是为了评估深度神经网络(DNNs)在安全关键应用中的稳定性,如自动驾驶车和疾病诊断。
  • methods: 这篇论文使用了一种能够预测DNN对输入和模型参数的敏感性的估计器。该估计器基于不等式和矩阵范数,其结果类似于神经网络的condition number。
  • results: 在测试了AlexNet和VGG-19 convolutional neural networks(CNNs)以及ImageNet dataset时,这种估计器能够准确地预测DNN对输入和模型参数的敏感性。此外,通过随机偏移和攻击测试,这种估计器的紧密性也得到了证明。
    Abstract For Deep Neural Networks (DNNs) to become useful in safety-critical applications, such as self-driving cars and disease diagnosis, they must be stable to perturbations in input and model parameters. Characterizing the sensitivity of a DNN to perturbations is necessary to determine minimal bit-width precision that may be used to safely represent the network. However, no general result exists that is capable of predicting the sensitivity of a given DNN to round-off error, noise, or other perturbations in input. This paper derives an estimator that can predict such quantities. The estimator is derived via inequalities and matrix norms, and the resulting quantity is roughly analogous to a condition number for the entire neural network. An approximation of the estimator is tested on two Convolutional Neural Networks, AlexNet and VGG-19, using the ImageNet dataset. For each of these networks, the tightness of the estimator is explored via random perturbations and adversarial attacks.
    摘要 深度神经网络（DNN）必须在安全关键应用（如自动驾驶汽车和疾病诊断）中对扰动保持稳定。因此，必须了解 DNN 对输入和模型参数扰动的敏感度，以确定安全表示网络所需的最小位宽精度。然而，目前没有通用的结果可以预测给定 DNN 对舍入误差、噪声或其他输入扰动的敏感度。这篇文章提出了一个可以预测这些量的估计器。该估计器通过不等式和矩阵范数推导，其结果大致类似于整个神经网络的 condition number。该估计器的近似形式在 ImageNet 数据集上对 AlexNet 和 VGG-19 两个卷积神经网络进行了测试，并通过随机扰动和对抗攻击探索了其紧密性。
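
The paper derives its estimator from inequalities and matrix norms; the exact construction is not reproduced here. As a rough illustration of why such norm-based quantities bound perturbation sensitivity, the sketch below compares the product of layer spectral norms of a tiny random ReLU network (a standard Lipschitz-style bound, not the paper's estimator) with its empirically measured sensitivity to small input perturbations.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny random ReLU network standing in for a trained DNN.
W1 = rng.normal(size=(32, 16)) / 4.0
W2 = rng.normal(size=(10, 32)) / 4.0

def forward(x):
    return W2 @ np.maximum(W1 @ x, 0.0)

# ReLU is 1-Lipschitz, so the output change is at most the product of the
# layers' spectral norms times the input change (a condition-number-like bound).
norm_product_bound = np.linalg.norm(W1, 2) * np.linalg.norm(W2, 2)

# Empirical sensitivity under small random input perturbations.
x = rng.normal(size=16)
eps = 1e-3
ratios = [
    np.linalg.norm(forward(x + d) - forward(x)) / eps
    for d in (eps * v / np.linalg.norm(v) for v in rng.normal(size=(200, 16)))
]
print(f"norm-product bound:        {norm_product_bound:.2f}")
print(f"max empirical sensitivity: {max(ratios):.2f}")
```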

Global k-Space Interpolation for Dynamic MRI Reconstruction using Masked Image Modeling

  • paper_url: http://arxiv.org/abs/2307.12672
  • repo_url: None
  • paper_authors: Jiazhen Pan, Suprosanna Shit, Özgün Turgut, Wenqi Huang, Hongwei Bran Li, Nil Stolt-Ansó, Thomas Küstner, Kerstin Hammernik, Daniel Rueckert
  • for: 这篇论文的目的是为了提高动力磁共振成像(MRI)中的数据探测率,以解决因时间限制而导致的抽象项目残影。
  • methods: 本文使用的方法是将受测空间探测短缺的数据进行插值,并且使用一个新的Transformer-based k-space Global Interpolation Network(k-GIN)来学习全球的低频和高频成像结构。此外,我们还提出了一个k-space Iterative Refinement Module(k-IRM)来强化高频成像的学习。
  • results: 我们的方法与基准方法相比,在92个内部2D+t心脏MRI试验中表现出了优化的成像质量和更高的类别化能力。特别是在具有高度受测空间探测短缺的情况下,我们的方法具有更高的类别化能力和普遍性。
    Abstract In dynamic Magnetic Resonance Imaging (MRI), k-space is typically undersampled due to limited scan time, resulting in aliasing artifacts in the image domain. Hence, dynamic MR reconstruction requires not only modeling spatial frequency components in the x and y directions of k-space but also considering temporal redundancy. Most previous works rely on image-domain regularizers (priors) to conduct MR reconstruction. In contrast, we focus on interpolating the undersampled k-space before obtaining images with Fourier transform. In this work, we connect masked image modeling with k-space interpolation and propose a novel Transformer-based k-space Global Interpolation Network, termed k-GIN. Our k-GIN learns global dependencies among low- and high-frequency components of 2D+t k-space and uses it to interpolate unsampled data. Further, we propose a novel k-space Iterative Refinement Module (k-IRM) to enhance the high-frequency components learning. We evaluate our approach on 92 in-house 2D+t cardiac MR subjects and compare it to MR reconstruction methods with image-domain regularizers. Experiments show that our proposed k-space interpolation method quantitatively and qualitatively outperforms baseline methods. Importantly, the proposed approach achieves substantially higher robustness and generalizability in cases of highly-undersampled MR data.
    摘要 在动态磁共振成像(MRI)中,通常因为扫描时间有限,会导致卷积空间下折射样本受到假象 artifacts。因此,动态MR重建需要不仅考虑 x 和 y 方向的空间频率组件,还需要考虑时间重复性。大多数前一些工作都是通过图像领域的正则化(约束)来进行MR重建。相比之下,我们注意到 interpolating 未折射的卷积空间,并提出了一种基于 Transformer 的全域卷积global interpolation network,称之为 k-GIN。我们的 k-GIN 学习了 2D+t 卷积空间中低频和高频组件之间的全局依赖关系,并使用其来 interpolate 未折射数据。此外,我们还提出了一种 k-space 迭代优化模块(k-IRM),以提高高频组件的学习。我们对 92 个室内 2D+t 心脏 MRI 测试数据进行了评估,并与使用图像领域正则化的 MR 重建方法进行比较。实验表明,我们提出的方法在量化和质量上都有显著提高,并且在高度受折射影响的 MR 数据中具有更高的robustness和普适性。

Control and Monitoring of Artificial Intelligence Algorithms

  • paper_url: http://arxiv.org/abs/2307.13705
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Carlos Mario Braga Ortuño, Blanza Martinez Donoso, Belén Muñiz Villanueva
  • for: 这篇论文强调了在人工智能模型部署后进行监管和评估数据分布的变化。
  • methods: 文章介绍了数据漂移和概念漂移的概念,以及他们的基础分布。同时,文章还提出了一些用于评估模型性能对于时间变化的指标。
  • results: 文章通过介绍不同的指标和方法,探讨了模型在不同情况下的性能。
    Abstract This paper elucidates the importance of governing an artificial intelligence model post-deployment and overseeing potential fluctuations in the distribution of present data in contrast to the training data. The concepts of data drift and concept drift are explicated, along with their respective foundational distributions. Furthermore, a range of metrics is introduced, which can be utilized to scrutinize the model's performance concerning potential temporal variations.
    摘要 这篇论文强调了在人工智能模型部署后的管理和监测数据分布的可能变化，而不是只是在训练数据上。文中介绍了数据漂移和概念漂移的概念，并详细介绍了它们的基础分布。此外，文中还提出了一些指标，可以用来评估模型在可能时间变化的情况下的性能。
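
The paper discusses data drift and concept drift and metrics for monitoring them without prescribing specific formulas. As a concrete illustration, the sketch below applies two widely used drift checks, the Population Stability Index and a two-sample Kolmogorov-Smirnov test, to a synthetic feature whose distribution has shifted after deployment.

```python
import numpy as np
from scipy.stats import ks_2samp

def psi(reference, live, bins=10):
    """Population Stability Index between a reference (training-time) sample
    and a live (post-deployment) sample of one feature."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    p_ref = np.histogram(np.clip(reference, edges[0], edges[-1]), bins=edges)[0] / len(reference)
    p_live = np.histogram(np.clip(live, edges[0], edges[-1]), bins=edges)[0] / len(live)
    p_ref, p_live = np.clip(p_ref, 1e-6, None), np.clip(p_live, 1e-6, None)
    return float(np.sum((p_live - p_ref) * np.log(p_live / p_ref)))

rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 5000)  # distribution seen during training
live_feature = rng.normal(0.4, 1.2, 5000)   # shifted distribution in production

print("PSI:", round(psi(train_feature, live_feature), 3))           # > 0.2 is a common drift flag
print("KS p-value:", ks_2samp(train_feature, live_feature).pvalue)  # small p-value -> drift
```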

TransFusion: Generating Long, High Fidelity Time Series using Diffusion Models with Transformers

  • paper_url: http://arxiv.org/abs/2307.12667
  • repo_url: https://github.com/fahim-sikder/TransFusion
  • paper_authors: Md Fahim Sikder, Resmi Ramachandranpillai, Fredrik Heintz
  • for: 本研究旨在生成高质量、长序时间序数据,应用广泛。
  • methods: 我们提出了一种基于协同扩散和变换器的生成模型,称为TransFusion。
  • results: TransFusion可以生成高质量的长序列时间序列数据，并且在许多视觉和实证指标上显著优于之前的最先进方法。
    Abstract The generation of high-quality, long-sequenced time-series data is essential due to its wide range of applications. In the past, standalone Recurrent and Convolutional Neural Network-based Generative Adversarial Networks (GAN) were used to synthesize time-series data. However, they are inadequate for generating long sequences of time-series data due to limitations in the architecture. Furthermore, GANs are well known for their training instability and mode collapse problem. To address this, we propose TransFusion, a diffusion, and transformers-based generative model to generate high-quality long-sequence time-series data. We have stretched the sequence length to 384, and generated high-quality synthetic data. To the best of our knowledge, this is the first study that has been done with this long-sequence length. Also, we introduce two evaluation metrics to evaluate the quality of the synthetic data as well as its predictive characteristics. We evaluate TransFusion with a wide variety of visual and empirical metrics, and TransFusion outperforms the previous state-of-the-art by a significant margin.
    摘要 “高质量、长序时间序数据的生成是非常重要,因为它具有广泛的应用领域。在过去,单独的循环神经网和卷积神经网基于的生成对抗网(GAN)被用来合成时间序数据。但是,它们因架构限制而无法生成长序时间序数据,并且GAN在训练时会出现不稳定和模式崩溃问题。为解决这问题,我们提出了TransFusion,一个扩散和卷积变数基于的生成模型,可以生成高质量的长序时间序数据。我们已经将序列长度延长到384,并生成了高质量的 sintetic数据。到目前为止,这是第一篇使用这长序长度的研究。此外,我们也引入了两个评估 metric 来评估生成的质量和预测特性。我们将TransFusion评估使用广泛的视觉和实验 metric,并证明TransFusion在前一代的state-of-the-art上出现著标准的差异。”

Online Continual Learning in Keyword Spotting for Low-Resource Devices via Pooling High-Order Temporal Statistics

  • paper_url: http://arxiv.org/abs/2307.12660
  • repo_url: https://github.com/umbertomichieli/tap-slda
  • paper_authors: Umberto Michieli, Pablo Peso Parada, Mete Ozay
  • for: 本研究旨在提高附加在设备上的语音识别(KWS)模型的适应速度,使其能够快速适应用户定义的新词语,而不会忘记之前的词语。
  • methods: 本研究提出了一种名为Temporal Aware Pooling(TAP)的方法,它在采用冻结的后向传播模型(backbone)的基础上,通过计算高阶语音特征的时间相关特征空间来扩充特征空间。然后,对这个扩充后的特征空间进行Gaussian模型的更新,以便更好地利用语音表示。
  • results: 实验分析表明,TAP-SLDA方法在几个设置、后向传播模型和基础上都显示出了明显的优异性,相对于竞争者的平均提升率为11.3%。
    Abstract Keyword Spotting (KWS) models on embedded devices should adapt fast to new user-defined words without forgetting previous ones. Embedded devices have limited storage and computational resources, thus, they cannot save samples or update large models. We consider the setup of embedded online continual learning (EOCL), where KWS models with frozen backbone are trained to incrementally recognize new words from a non-repeated stream of samples, seen one at a time. To this end, we propose Temporal Aware Pooling (TAP) which constructs an enriched feature space computing high-order moments of speech features extracted by a pre-trained backbone. Our method, TAP-SLDA, updates a Gaussian model for each class on the enriched feature space to effectively use audio representations. In experimental analyses, TAP-SLDA outperforms competitors on several setups, backbones, and baselines, bringing a relative average gain of 11.3% on the GSC dataset.
    摘要 关键词检测（KWS）模型在嵌入式设备上应该快速适应新的用户定义词语，而不会忘记之前的词语。嵌入式设备具有有限的存储和计算资源，因此无法保存样本或更新大型模型。我们考虑了嵌入式在线持续学习（EOCL）的设置，其中带有冻结骨干网络的KWS模型从一个不重复的样本流中逐个接收样本，增量地识别新的词语。为此，我们提议使用时间感知汇聚（TAP），它在预训练骨干网络提取的语音特征上计算高阶矩，构建一个更丰富的特征空间。我们的方法TAP-SLDA在该增强特征空间上为每个类别更新一个高斯模型，以有效利用音频表示。在实验分析中，TAP-SLDA在多种设置、骨干网络和基线上均优于竞争方法，在GSC数据集上取得了11.3%的相对平均提升。
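
The bullets above summarise TAP-SLDA at a high level. The sketch below only illustrates the core idea, assuming the "backbone features" are random arrays and replacing SLDA's streaming update with plain per-class diagonal Gaussians, so it is a rough approximation of the method rather than the authors' implementation.

```python
import numpy as np
from scipy.stats import skew, kurtosis, multivariate_normal

def temporal_moment_pooling(feats):
    """Pool a (time, dim) feature sequence into one enriched vector of
    high-order temporal statistics: mean, std, skewness, kurtosis."""
    return np.concatenate([feats.mean(0), feats.std(0),
                           skew(feats, axis=0), kurtosis(feats, axis=0)])

rng = np.random.default_rng(0)
dim, n_classes = 8, 3
# Pretend frozen-backbone features for 20 utterances per keyword class.
train = {c: [rng.normal(0.3 * c, 1.0, size=(50, dim)) for _ in range(20)] for c in range(n_classes)}

# Fit one Gaussian per class on the pooled (enriched) feature space.
gaussians = {}
for c, seqs in train.items():
    pooled = np.stack([temporal_moment_pooling(s) for s in seqs])
    cov = np.diag(pooled.var(0) + 1e-3)   # diagonal covariance as a simplification
    gaussians[c] = multivariate_normal(pooled.mean(0), cov)

# Classify a new utterance by highest class log-likelihood.
query = temporal_moment_pooling(rng.normal(0.6, 1.0, size=(50, dim)))
pred = max(gaussians, key=lambda c: gaussians[c].logpdf(query))
print("predicted keyword class:", pred)
```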

Remote Bio-Sensing: Open Source Benchmark Framework for Fair Evaluation of rPPG

  • paper_url: http://arxiv.org/abs/2307.12644
  • repo_url: https://github.com/remotebiosensing/rppg
  • paper_authors: Dae-Yeol Kim, Eunsu Goh, KwangKee Lee, JongEui Chae, JongHyeon Mun, Junyeong Na, Chae-bong Sohn, Do-Yup Kim
  • for: This study provides a benchmarking framework for evaluating the performance of remote photoplethysmography (rPPG) techniques across a wide range of datasets, to ensure fair and meaningful comparison and progress in the field.
  • methods: The study evaluates both conventional non-deep neural network (non-DNN) and deep neural network (DNN) rPPG methods across a variety of datasets to provide a comprehensive benchmarking framework.
  • results: The framework enables fair and reproducible evaluation of rPPG techniques, addressing challenges such as skin color, camera characteristics, ambient lighting, and other sources of noise and artifacts.
    Abstract rPPG (Remote photoplethysmography) is a technology that measures and analyzes BVP (Blood Volume Pulse) by using the light absorption characteristics of hemoglobin captured through a camera. Analyzing the measured BVP can derive various physiological signals such as heart rate, stress level, and blood pressure, which can be applied to various applications such as telemedicine, remote patient monitoring, and early prediction of cardiovascular disease. rPPG is rapidly evolving and attracting great attention from both academia and industry by providing great usability and convenience as it can measure biosignals using a camera-equipped device without medical or wearable devices. Despite extensive efforts and advances in this field, serious challenges remain, including issues related to skin color, camera characteristics, ambient lighting, and other sources of noise and artifacts, which degrade accuracy performance. We argue that fair and evaluable benchmarking is urgently required to overcome these challenges and make meaningful progress from both academic and commercial perspectives. In most existing work, models are trained, tested, and validated only on limited datasets. Even worse, some studies lack available code or reproducibility, making it difficult to fairly evaluate and compare performance. Therefore, the purpose of this study is to provide a benchmarking framework to evaluate various rPPG techniques across a wide range of datasets for fair evaluation and comparison, including both conventional non-deep neural network (non-DNN) and deep neural network (DNN) methods. GitHub URL: https://github.com/remotebiosensing/rppg
    摘要 rPPG (远程血液抑血光谱) 是一种技术,通过使用摄像头捕捉光谱特性来测量和分析血液脉冲(BVP)。通过分析测量的BVP,可以 derivate 多种生物physiological signals,如心率、剂量压力和 стресс水平,这些信号可以应用于telemedicine、远程病人监测和早期心血管疾病预测等领域。rPPG 在学术和产业界 rapidly evolving 和吸引广泛关注,因为它可以通过摄像头设备测量生物信号,无需医疗或佩戴设备,提供了很好的可用性和便利性。然而,它还存在严重的挑战,包括皮肤颜色、摄像头特性、 ambient 照明和其他干扰和噪声的问题,这些问题会降低精度性能。我们认为,准确和评估可能的benchmarking是 urgently required ,以超越这些挑战和取得学术和商业上的进步。现有的大多数工作都是在有限的数据集上进行训练、测试和验证,甚至有些研究缺乏可用的代码或可重现性,使得准确评估和比较困难。因此,本研究的目的是提供一个 benchmarking 框架,以评估不同的 rPPG 技术在各种数据集上的性能,包括非深度神经网络(non-DNN)和深度神经网络(DNN)方法。GitHub URL:https://github.com/remotebiosensing/rppg
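
As a concrete example of the kind of conventional, non-DNN rPPG baseline such a benchmark covers, the sketch below recovers a heart-rate estimate from a synthetic mean green-channel trace via the spectral peak in the plausible heart-rate band. The trace is simulated; real pipelines add face tracking, detrending and more robust signal extraction (e.g. POS or CHROM), as implemented in the linked repository.

```python
import numpy as np

fs = 30.0                       # camera frame rate (Hz)
t = np.arange(0, 30, 1 / fs)    # 30 s of frames
rng = np.random.default_rng(0)
# Synthetic mean green-channel trace of a face region: a weak pulse at
# 72 bpm (1.2 Hz) buried in noise, standing in for real video data.
green = 0.05 * np.sin(2 * np.pi * 1.2 * t) + rng.normal(0, 0.2, t.size)

# Simple spectral estimate: remove the mean, then pick the strongest
# frequency inside the plausible heart-rate band (0.7-4 Hz = 42-240 bpm).
signal = green - green.mean()
freqs = np.fft.rfftfreq(signal.size, 1 / fs)
power = np.abs(np.fft.rfft(signal)) ** 2
band = (freqs >= 0.7) & (freqs <= 4.0)
hr_bpm = 60.0 * freqs[band][np.argmax(power[band])]
print(f"estimated heart rate: {hr_bpm:.1f} bpm")
```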

Fake News Detection Through Graph-based Neural Networks: A Survey

  • paper_url: http://arxiv.org/abs/2307.12639
  • repo_url: None
  • paper_authors: Shuzhi Gong, Richard O. Sinnott, Jianzhong Qi, Cecile Paris
  • for: 本研究评估了基于图структуры的新闻假消息检测方法和深度学习方法,以及它们在新闻传播过程中的应用。
  • methods: 本研究分类了现有的图结构基于的新闻假消息检测方法,包括知识驱动方法、传播基于方法和多元社交 контекст基于方法。
  • results: 本研究评估了现有的图结构基于的新闻假消息检测方法,并提出了未来研究方向。
    Abstract The popularity of online social networks has enabled rapid dissemination of information. People now can share and consume information much more rapidly than ever before. However, low-quality and/or accidentally/deliberately fake information can also spread rapidly. This can lead to considerable and negative impacts on society. Identifying, labelling and debunking online misinformation as early as possible has become an increasingly urgent problem. Many methods have been proposed to detect fake news including many deep learning and graph-based approaches. In recent years, graph-based methods have yielded strong results, as they can closely model the social context and propagation process of online news. In this paper, we present a systematic review of fake news detection studies based on graph-based and deep learning-based techniques. We classify existing graph-based methods into knowledge-driven methods, propagation-based methods, and heterogeneous social context-based methods, depending on how a graph structure is constructed to model news related information flows. We further discuss the challenges and open problems in graph-based fake news detection and identify future research directions.
    摘要 在线社交网络的流行使得信息得以快速传播，人们可以比以往更快地分享和消费信息。但是，低质量和/或意外或故意伪造的信息也会快速散布，给社会带来严重的负面影响。尽早识别、标注和驳斥网络虚假信息已成为一个日益紧迫的问题。许多方法已被提出来检测假新闻，其中包括许多深度学习和基于图的方法。近年来，基于图的方法取得了优秀的成绩，因为它们能够更精确地建模在线新闻的社交语境和传播过程。在这篇文章中，我们对基于图和深度学习技术的假新闻检测研究进行了系统性综述，按照图结构的构建方式对现有基于图的方法进行分类，并讨论了基于图的假新闻检测中的挑战和开放问题，指出了未来研究方向。

Identifying drivers and mitigators for congestion and redispatch in the German electric power system with explainable AI

  • paper_url: http://arxiv.org/abs/2307.12636
  • repo_url: None
  • paper_authors: Maurizio Titz, Sebastian Pütz, Dirk Witthaut
  • for: 这篇论文旨在分析德国传输电网中的压力峰值和对负面影响,以及可能的市场设计变更以缓解压力峰值。
  • methods: 该论文使用可解释的机器学习模型来预测每小时的重新配置和对贸易量。模型分析了压力峰值的驱动因素和缓解因素,并评估了它们的影响。
  • results: 研究发现,风力电力生产是压力峰值的主要驱动因素,而水力电力和跨国电力贸易也扮演着重要的缓解作用。然而,太阳能电力没有缓解压力峰值的效果。结果表明,市场设计的变更可以缓解压力峰值。
    Abstract The transition to a sustainable energy supply challenges the operation of electric power systems in manifold ways. Transmission grid loads increase as wind and solar power are often installed far away from the consumers. In extreme cases, system operators must intervene via countertrading or redispatch to ensure grid stability. In this article, we provide a data-driven analysis of congestion in the German transmission grid. We develop an explainable machine learning model to predict the volume of redispatch and countertrade on an hourly basis. The model reveals factors that drive or mitigate grid congestion and quantifies their impact. We show that, as expected, wind power generation is the main driver, but hydropower and cross-border electricity trading also play an essential role. Solar power, on the other hand, has no mitigating effect. Our results suggest that a change to the market design would alleviate congestion.
    摘要 向可持续能源供应的转型给电力系统运行带来多种挑战。由于风电和光伏往往安装在远离用电负荷的地方，输电网络的负载不断增加。在极端情况下，系统运营商必须通过反向交易或再调度来确保电网稳定。在这篇文章中，我们对德国输电网的拥堵进行了数据驱动分析，并建立了一个可解释的机器学习模型，用于逐小时预测再调度和反向交易的规模。该模型揭示了驱动或缓解电网拥堵的因素并量化其影响。结果表明，风力发电是主要驱动因素，而水力发电和跨境电力交易也发挥着重要作用；太阳能发电则没有缓解效果。我们的结果表明，调整市场设计可以缓解拥堵。
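
The paper trains an explainable model on real German grid data; the toy sketch below only mimics that analysis pattern on synthetic data, using gradient boosting plus permutation importance as a generic attribution method. The feature names and the target function are invented for illustration and are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 3000
X = np.column_stack([
    rng.uniform(0, 1, n),    # wind generation
    rng.uniform(0, 1, n),    # solar generation
    rng.uniform(0, 1, n),    # hydro generation
    rng.uniform(-1, 1, n),   # cross-border trade balance
])
# Toy redispatch volume: wind drives it, hydro and imports mitigate it, solar is irrelevant.
y = 2.0 * X[:, 0] - 0.8 * X[:, 2] - 0.5 * X[:, 3] + rng.normal(0, 0.1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

imp = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for name, score in zip(["wind", "solar", "hydro", "trade"], imp.importances_mean):
    print(f"{name:6s} importance: {score:.3f}")
```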

De-confounding Representation Learning for Counterfactual Inference on Continuous Treatment via Generative Adversarial Network

  • paper_url: http://arxiv.org/abs/2307.12625
  • repo_url: None
  • paper_authors: Yonghe Zhao, Qiang Huang, Haolong Zeng, Yun Pen, Huiyan Sun
  • for: This paper aims to address the problem of counterfactual inference for continuous treatment variables, which is more common in real-world causal inference tasks.
  • methods: The proposed method is called de-confounding representation learning (DRL), which generates representations of covariates that are disentangled from the treatment variables. The DRL model is a non-parametric model that eliminates both linear and nonlinear dependence between treatment and covariates.
  • results: The DRL model outperforms state-of-the-art counterfactual inference models for continuous treatment variables in extensive experiments on synthetic datasets. Additionally, the DRL model is applied to a real-world medical dataset MIMIC and demonstrates a detailed causal relationship between red cell width distribution and mortality.
    Abstract Counterfactual inference for continuous rather than binary treatment variables is more common in real-world causal inference tasks. While there are already some sample reweighting methods based on Marginal Structural Model for eliminating the confounding bias, they generally focus on removing the treatment's linear dependence on confounders and rely on the accuracy of the assumed parametric models, which are usually unverifiable. In this paper, we propose a de-confounding representation learning (DRL) framework for counterfactual outcome estimation of continuous treatment by generating the representations of covariates disentangled with the treatment variables. The DRL is a non-parametric model that eliminates both linear and nonlinear dependence between treatment and covariates. Specifically, we train the correlations between the de-confounded representations and the treatment variables against the correlations between the covariate representations and the treatment variables to eliminate confounding bias. Further, a counterfactual inference network is embedded into the framework to make the learned representations serve both de-confounding and trusted inference. Extensive experiments on synthetic datasets show that the DRL model performs superiorly in learning de-confounding representations and outperforms state-of-the-art counterfactual inference models for continuous treatment variables. In addition, we apply the DRL model to a real-world medical dataset MIMIC and demonstrate a detailed causal relationship between red cell width distribution and mortality.
    摘要 常用的Counterfactual推论 для连续而不是二进制的治疗变量更常见在实际世界的 causal推论任务中。现有一些基于Marginal Structural Model的样本重新权重方法,可以消除干扰的偏见,但这些方法通常假设了治疗变量和干扰变量之间的线性关系,并且这些模型通常是不可证明的。在这篇论文中,我们提出了一种基于de-confounding representation learning(DRL)的推论框架,用于连续治疗变量的 counterfactual 结果估计。DRL 是一种非 Parametric 模型,可以消除治疗变量和干扰变量之间的线性和非线性关系。具体来说,我们在框架中训练了干扰变量和治疗变量之间的相关性和干扰变量和治疗变量之间的相关性,以消除干扰偏见。此外,我们还将 counterfactual 推论网络 embedding 到框架中,以使得学习的表示可以用于 both de-confounding 和可靠的推论。我们在 synthetic 数据上进行了广泛的实验,发现 DRL 模型在学习 de-confounding 表示方面表现出色,并且超过了当前的 counterfactual 推论模型。此外,我们还应用了 DRL 模型到实际的医疗数据集 MIMIC,并显示了红细胞宽度分布和死亡的明确 causal 关系。

Predicting Ordinary Differential Equations with Transformers

  • paper_url: http://arxiv.org/abs/2307.12617
  • repo_url: None
  • paper_authors: Sören Becker, Michal Klein, Alexander Neitz, Giambattista Parascandolo, Niki Kilbertus
  • for: recuperates scalar ordinary differential equations (ODEs) in symbolic form from irregularly sampled and noisy observations of a single solution trajectory
  • methods: transformer-based sequence-to-sequence model
  • results: better or on par with existing methods in terms of accurate recovery, and efficiently scalable after one-time pretraining on a large set of ODEs
    Abstract We develop a transformer-based sequence-to-sequence model that recovers scalar ordinary differential equations (ODEs) in symbolic form from irregularly sampled and noisy observations of a single solution trajectory. We demonstrate in extensive empirical evaluations that our model performs better or on par with existing methods in terms of accurate recovery across various settings. Moreover, our method is efficiently scalable: after one-time pretraining on a large set of ODEs, we can infer the governing law of a new observed solution in a few forward passes of the model.
    摘要 我们开发了一种基于转换器的序列到序列模型,可以从不规则采样和噪声观测数据中精确地回归Scalar常微方程(ODEs)的符号形式。我们在广泛的实验中证明了我们的模型与现有方法相比,在不同的设置下都能够更高效地回归精度。另外,我们的方法可以高效扩展:只需一次预训练于大量ODEs后,我们就可以在几个前向传播中快速地推断新观测数据的管理法律。

ExWarp: Extrapolation and Warping-based Temporal Supersampling for High-frequency Displays

  • paper_url: http://arxiv.org/abs/2307.12607
  • repo_url: None
  • paper_authors: Akanksha Dixit, Yashashwee Chakrabarty, Smruti R. Sarangi
  • for: 提高高频显示器的帧率，提供更平滑、更具响应性的用户体验。
  • methods: 提出ExWarp，利用强化学习（RL）在较慢的基于深度神经网络（DNN）的外推方法与较快的基于运动矢量的扭曲方法之间智能选择，以提高帧率并尽量保持图像质量。
  • results: 与传统方法相比，ExWarp可将帧率提高4倍，且感知图像质量几乎不受影响。
    Abstract High-frequency displays are gaining immense popularity because of their increasing use in video games and virtual reality applications. However, the issue is that the underlying GPUs cannot continuously generate frames at this high rate -- this results in a less smooth and responsive experience. Furthermore, if the frame rate is not synchronized with the refresh rate, the user may experience screen tearing and stuttering. Previous works propose increasing the frame rate to provide a smooth experience on modern displays by predicting new frames based on past or future frames. Interpolation and extrapolation are two widely used algorithms that predict new frames. Interpolation requires waiting for the future frame to make a prediction, which adds additional latency. On the other hand, extrapolation provides a better quality of experience because it relies solely on past frames -- it does not incur any additional latency. The simplest method to extrapolate a frame is to warp the previous frame using motion vectors; however, the warped frame may contain improperly rendered visual artifacts due to dynamic objects -- this makes it very challenging to design such a scheme. Past work has used DNNs to get good accuracy, however, these approaches are slow. This paper proposes Exwarp -- an approach based on reinforcement learning (RL) to intelligently choose between the slower DNN-based extrapolation and faster warping-based methods to increase the frame rate by 4x with an almost negligible reduction in the perceived image quality.
    摘要 高频显示器目前在游戏和虚拟现实应用中得到了广泛的推广,但是这些显示器的后置GPU无法持续生成这高的帧率,这会导致用户体验不平滑和不响应。此外,如果帧率与刷新率不同步,用户可能会经历屏渲染和颤抖现象。以往的工作建议通过预测新帧来提高现代显示器的帧率,以提供柔顺的用户体验。插值和拟合是两种广泛使用的预测算法。插值需要等待未来帧来作预测,这会添加额外的延迟。拟合则提供了更高质量的用户体验,因为它仅基于过去帧进行预测,不增加额外的延迟。最简单的拟合方法是通过运动向量来扭曲上一帧,以生成下一帧。但是,扭曲后的帧可能包含不正确渲染的视觉artifacts,这使得设计这种方案非常困难。过去的工作使用深度神经网络(DNN)来获得高精度,但这些方法较慢。这篇论文提出了Exwarp方法,基于强化学习(RL)来智能选择 slower DNN-based extrapolation和 faster warping-based方法,以提高帧率4倍,并且几乎无法感受到图像质量的下降。
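
ExWarp's contribution is the RL policy that picks between DNN-based extrapolation and warping; the sketch below only illustrates the warping half, with a minimal nearest-neighbour backward warp driven by per-pixel motion vectors. A production pipeline would use higher-quality resampling and handle disocclusions, which is exactly where the visual artifacts mentioned above come from.

```python
import numpy as np

def warp_frame(prev_frame, motion):
    """Nearest-neighbour backward warp: each pixel of the extrapolated frame is
    fetched from the previous frame at the location its motion vector points
    back to; out-of-frame lookups are clamped to the border."""
    h, w = prev_frame.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_y = np.clip(np.round(ys - motion[..., 1]).astype(int), 0, h - 1)
    src_x = np.clip(np.round(xs - motion[..., 0]).astype(int), 0, w - 1)
    return prev_frame[src_y, src_x]

# Toy example: a bright square moving 3 px right and 1 px down per frame.
frame = np.zeros((64, 64), dtype=np.float32)
frame[20:30, 20:30] = 1.0
motion = np.zeros((64, 64, 2), dtype=np.float32)
motion[..., 0], motion[..., 1] = 3.0, 1.0       # (dx, dy) for every pixel

extrapolated = warp_frame(frame, motion)
print("square centre moved to (row, col):", np.argwhere(extrapolated > 0).mean(axis=0))
```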

Concept backpropagation: An Explainable AI approach for visualising learned concepts in neural network models

  • paper_url: http://arxiv.org/abs/2307.12601
  • repo_url: https://github.com/patrik-ha/concept-backpropagation
  • paper_authors: Patrik Hammersborg, Inga Strümke
  • for: This paper aims to provide a method for visualizing the information that a neural network model depends on to represent a given concept.
  • methods: The method used in this paper is called concept backpropagation, which involves perturbing the model input in a way that maximizes the detected concept.
  • results: The paper presents results for this method applied to a variety of input modalities, and discusses how the method can be used to visualize the information that trained concept probes use and the degree to which the representation of the probed concept is entangled within the neural network model.
    Abstract Neural network models are widely used in a variety of domains, often as black-box solutions, since they are not directly interpretable for humans. The field of explainable artificial intelligence aims at developing explanation methods to address this challenge, and several approaches have been developed over the recent years, including methods for investigating what type of knowledge these models internalise during the training process. Among these, the method of concept detection, investigates which \emph{concepts} neural network models learn to represent in order to complete their tasks. In this work, we present an extension to the method of concept detection, named \emph{concept backpropagation}, which provides a way of analysing how the information representing a given concept is internalised in a given neural network model. In this approach, the model input is perturbed in a manner guided by a trained concept probe for the described model, such that the concept of interest is maximised. This allows for the visualisation of the detected concept directly in the input space of the model, which in turn makes it possible to see what information the model depends on for representing the described concept. We present results for this method applied to a various set of input modalities, and discuss how our proposed method can be used to visualise what information trained concept probes use, and the degree as to which the representation of the probed concept is entangled within the neural network model itself.
    摘要
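
The authors' repository implements the actual method; the sketch below only illustrates the core perturb-the-input-to-maximise-a-concept-probe idea, with an untrained toy backbone and probe standing in for trained ones.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Frozen feature extractor and linear concept probe (random weights here,
# standing in for a pre-trained model and a trained probe).
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
probe = nn.Linear(8, 1)
for p in list(backbone.parameters()) + list(probe.parameters()):
    p.requires_grad_(False)

# Perturb the input so that the detected concept is maximised; the resulting
# change in input space shows what information the probe relies on.
x = torch.rand(1, 3, 32, 32, requires_grad=True)
optimizer = torch.optim.Adam([x], lr=0.05)
for _ in range(50):
    optimizer.zero_grad()
    concept_score = probe(backbone(x)).squeeze()
    (-concept_score).backward()      # gradient ascent on the concept score
    optimizer.step()
    x.data.clamp_(0, 1)              # keep the perturbed input a valid image

print("final concept score:", float(probe(backbone(x))))
```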

Optimized data collection and analysis process for studying solar-thermal desalination by machine learning

  • paper_url: http://arxiv.org/abs/2307.12594
  • repo_url: None
  • paper_authors: Guilong Peng, Senshan Sun, Yangjun Qin, Zhenwei Xu, Juxin Du, Swellam W. sharshir, A. W. Kandel, A. E. Kabeel, Nuo Yang
  • for: 这个研究的目的是提高机器学习在太阳蒸馏净水方面的应用,通过大量的实验数据收集和分析。
  • methods: 这个研究使用了修改后的实验数据收集和分析过程,通过加速数据收集和减少时间83.3%来收集超过一千个实验数据,比前一个研究的平均数据量大得多。同时,研究者使用了三种算法,包括人工神经网络、多ivariate 回归和随机森林,来研究数据特征的影响。
  • results: 研究结果表明,使用人工神经网络和随机森林算法时,大量数据可以显著提高预测精度。此外,研究还发现数据规模和范围对预测精度和影响因素排名的影响很大。同时,研究发现人工神经网络在推广范围上的描述性能受到数据范围的影响。这些结果表明,大量的实验数据收集和分析,以及数据特征的影响分析是机器学习在太阳蒸馏净水领域的重要步骤,可以推广机器学习在这个领域的应用。
    Abstract An effective interdisciplinary study between machine learning and solar-thermal desalination requires a sufficiently large and well-analyzed experimental datasets. This study develops a modified dataset collection and analysis process for studying solar-thermal desalination by machine learning. Based on the optimized water condensation and collection process, the proposed experimental method collects over one thousand datasets, which is ten times more than the average number of datasets in previous works, by accelerating data collection and reducing the time by 83.3%. On the other hand, the effects of dataset features are investigated by using three different algorithms, including artificial neural networks, multiple linear regressions, and random forests. The investigation focuses on the effects of dataset size and range on prediction accuracy, factor importance ranking, and the model's generalization ability. The results demonstrate that a larger dataset can significantly improve prediction accuracy when using artificial neural networks and random forests. Additionally, the study highlights the significant impact of dataset size and range on ranking the importance of influence factors. Furthermore, the study reveals that the extrapolation data range significantly affects the extrapolation accuracy of artificial neural networks. Based on the results, massive dataset collection and analysis of dataset feature effects are important steps in an effective and consistent machine learning process flow for solar-thermal desalination, which can promote machine learning as a more general tool in the field of solar-thermal desalination.
    摘要 要有效地结合机器学习和太阳蒸馈淡水,需要一个足够大、且具有分析力的实验数据集。本研究提出了一种修改后的数据采集和分析过程,用于通过机器学习研究太阳蒸馈淡水。基于优化的水蒸馈和收集过程,该方法收集了超过一千个数据集,比前一个平均数据集的十倍多,并将采集时间减少了83.3%。而且,该研究通过使用三种不同的算法,包括人工神经网络、多元线性回归和随机森林,研究数据集大小和范围对预测精度、因素重要性排名和模型泛化能力的影响。结果表明,大量数据集可以在使用人工神经网络和随机森林时显著提高预测精度。此外,研究还发现数据集大小和范围对因素重要性排名产生了重要影响。此外,研究还发现人工神经网络抽象数据范围对抽象预测精度产生了重要影响。根据结果,大量数据采集和分析数据集特征效果是机器学习过程中不可或缺的一步,可以推动机器学习在太阳蒸馈淡水领域的普遍应用。
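
The study compares artificial neural networks, multivariate regression and random forests as the dataset grows. Below is a toy sketch of that kind of dataset-size analysis on synthetic data; the feature names and response function are invented, and off-the-shelf sklearn models stand in for the paper's implementations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Synthetic nonlinear "solar still" response standing in for real experiments.
X = rng.uniform(0, 1, size=(1200, 4))   # e.g. irradiance, ambient temperature, flow, salinity
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.3 * X[:, 3] + rng.normal(0, 0.05, 1200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=200, random_state=0)

models = {
    "multivariate regression": LinearRegression(),
    "random forest": RandomForestRegressor(random_state=0),
    "neural network": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
}
for n in (50, 200, 1000):               # growing training-set size
    scores = {name: round(r2_score(y_te, m.fit(X_tr[:n], y_tr[:n]).predict(X_te)), 3)
              for name, m in models.items()}
    print(n, scores)
```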

InVAErt networks: a data-driven framework for emulation, inference and identifiability analysis

  • paper_url: http://arxiv.org/abs/2307.12586
  • repo_url: None
  • paper_authors: Guoxiang Grayson Tong, Carlos A. Sing Long, Daniele E. Schiavazzi
  • for: 本研究旨在推广使用生成模型和深度学习来解决物理系统的设计和分析问题,而不仅仅是模拟任务。
  • methods: 该研究提出了一种名为inVAErt网络的框架,该框架使用确定性编码器和解码器来表示前向和反向解决 Map,使用流变换模型来捕捉系统输出的概率分布,并使用变量编码器来学习减少输入和输出之间的不一致性。
  • results: 研究人员通过数值实验证明了inVAErt网络的可行性和灵活性,并发现选择罚分 coefficient和积分空间抽取策略对训练和测试性能有重要影响。
    Abstract Use of generative models and deep learning for physics-based systems is currently dominated by the task of emulation. However, the remarkable flexibility offered by data-driven architectures would suggest to extend this representation to other aspects of system synthesis including model inversion and identifiability. We introduce inVAErt (pronounced \emph{invert}) networks, a comprehensive framework for data-driven analysis and synthesis of parametric physical systems which uses a deterministic encoder and decoder to represent the forward and inverse solution maps, normalizing flow to capture the probabilistic distribution of system outputs, and a variational encoder designed to learn a compact latent representation for the lack of bijectivity between inputs and outputs. We formally investigate the selection of penalty coefficients in the loss function and strategies for latent space sampling, since we find that these significantly affect both training and testing performance. We validate our framework through extensive numerical examples, including simple linear, nonlinear, and periodic maps, dynamical systems, and spatio-temporal PDEs.
    摘要 使用生成模型和深度学习来处理物理系统的应用主要是 emulator。然而,这些数据驱动架构的灵活性表示可以扩展到其他系统设计方面,包括模型反转和可识别性。我们介绍inVAErt(pronounced inverse)网络,一个涵盖数据驱动分析和设计参数物理系统的框架,使用决定性编码器和解码器表示前向和反向解决Map,使用正态流捕捉系统输出的概率分布,并使用可变编码器学习减少输入和输出之间的不一致。我们正式调查损害征素在损失函数中的选择和latent空间抽样策略,因为我们发现这些对训练和测试性能有很大影响。我们通过大量的数字例子验证了我们的框架,包括简单的线性、非线性和 periodic maps,动力系统和时空PDEs。

Self-refining of Pseudo Labels for Music Source Separation with Noisy Labeled Data

  • paper_url: http://arxiv.org/abs/2307.12576
  • repo_url: None
  • paper_authors: Junghyun Koo, Yunkee Chae, Chang-Bin Jeon, Kyogu Lee
  • for: 提高音乐源分离(MSS)性能,增加大数据集来改进MSS模型的训练
  • methods: 自动地对含有噪声标签的数据集进行自我反射,提高MSS模型的识别精度
  • results: 使用自我反射的数据集可以达到与使用干净标签的数据集相同的识别精度,而且在只有噪声标签数据集的情况下,MSS模型训练在自我反射数据集上可以超过使用干净标签数据集训练的性能。
    Abstract Music source separation (MSS) faces challenges due to the limited availability of correctly-labeled individual instrument tracks. With the push to acquire larger datasets to improve MSS performance, the inevitability of encountering mislabeled individual instrument tracks becomes a significant challenge to address. This paper introduces an automated technique for refining the labels in a partially mislabeled dataset. Our proposed self-refining technique, employed with a noisy-labeled dataset, results in only a 1% accuracy degradation in multi-label instrument recognition compared to a classifier trained on a clean-labeled dataset. The study demonstrates the importance of refining noisy-labeled data in MSS model training and shows that utilizing the refined dataset leads to comparable results derived from a clean-labeled dataset. Notably, upon only access to a noisy dataset, MSS models trained on a self-refined dataset even outperform those trained on a dataset refined with a classifier trained on clean labels.
    摘要 音乐源分离(MSS)面临限量正确标注个 instrumente 轨迹的问题。随着提高 MSS性能的努力,遇到带有错误标注的个 instrumente 轨迹的可能性变得非常重要。这篇文章介绍了一种自动刷新标注的技术,可以在带有噪声标注的 dataset 上进行刷新。我们的提议的自我刷新技术与噪声标注 dataset 上的类ifier 结合使用,对多个标签 instrumente 识别中的准确率进行了1%的下降。这种研究表明了刷新噪声标注数据的重要性,并证明了使用刷新后的数据可以达到与清晰标注数据相同的结果。甚至只有带有噪声标注的数据,MSS模型在使用自我刷新数据进行训练后会比使用刷新后的数据进行训练后更高的性能。
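
The paper's self-refining technique targets multi-label instrument annotations for source separation; the sketch below only shows the generic flavour of the idea, refining noisy labels by overwriting the ones a model confidently contradicts, on toy binary data rather than audio.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y_true = make_classification(n_samples=2000, n_informative=8, random_state=0)
y_noisy = y_true.copy()
flip = rng.random(len(y_noisy)) < 0.2          # 20% of the labels are wrong
y_noisy[flip] = 1 - y_noisy[flip]

labels = y_noisy.copy()
for it in range(3):                            # iterative self-refinement
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    proba = clf.predict_proba(X)
    pred = proba.argmax(axis=1)
    relabel = (proba.max(axis=1) > 0.9) & (pred != labels)   # confidently contradicted labels
    labels[relabel] = pred[relabel]
    print(f"iter {it}: refined {relabel.sum():4d} labels, "
          f"label accuracy {np.mean(labels == y_true):.3f}")
```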

Towards Generalising Neural Topical Representations

  • paper_url: http://arxiv.org/abs/2307.12564
  • repo_url: None
  • paper_authors: Xiaohao Yang, He Zhao, Dinh Phung, Lan Du
  • for: 提高 neural topic model(NTM)的通用能力,使其可以在不同的资料集中具有可靠的泛化能力。
  • methods: 在训练NTM时使用数据增强构造相似文档，并利用层次话题传输距离（HOTT）计算最优传输（OT）距离，最小化相似文档之间的语义距离。
  • results: 对NTMs进行了扩展,使其在不同的资料集中具有显著提高的泛化能力。
    Abstract Topic models have evolved from conventional Bayesian probabilistic models to Neural Topic Models (NTMs) over the last two decays. Although NTMs have achieved promising performance when trained and tested on a specific corpus, their generalisation ability across corpora is rarely studied. In practice, we often expect that an NTM trained on a source corpus can still produce quality topical representation for documents in a different target corpus without retraining. In this work, we aim to improve NTMs further so that their benefits generalise reliably across corpora and tasks. To do so, we propose to model similar documents by minimising their semantical distance when training NTMs. Specifically, similar documents are created by data augmentation during training; The semantical distance between documents is measured by the Hierarchical Topic Transport Distance (HOTT), which computes the Optimal Transport (OT) distance between the topical representations. Our framework can be readily applied to most NTMs as a plug-and-play module. Extensive experiments show that our framework significantly improves the generalisation ability regarding neural topical representation across corpora.
    摘要
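
HOTT as used for the training objective involves hierarchical optimal transport over topic and word representations; the sketch below shows only its core ingredient, an entropy-regularised OT (Sinkhorn) distance between two documents' topical representations, with random vectors standing in for topic embeddings.

```python
import numpy as np

def sinkhorn_distance(a, b, M, reg=0.1, n_iters=500):
    """Entropy-regularised optimal transport cost between discrete
    distributions a and b, given a ground cost matrix M (Sinkhorn iterations)."""
    K = np.exp(-M / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return float((plan * M).sum())

rng = np.random.default_rng(0)
topic_emb = rng.normal(size=(5, 16))                          # stand-in topic embeddings
M = np.linalg.norm(topic_emb[:, None] - topic_emb[None, :], axis=-1)
M /= M.max()                                                  # normalise the ground cost

doc_a = np.array([0.60, 0.20, 0.10, 0.05, 0.05])   # topical representation of document A
doc_b = np.array([0.50, 0.30, 0.10, 0.05, 0.05])   # semantically similar document
doc_c = np.array([0.05, 0.05, 0.10, 0.20, 0.60])   # dissimilar document
print("d(A, similar)    =", round(sinkhorn_distance(doc_a, doc_b, M), 4))
print("d(A, dissimilar) =", round(sinkhorn_distance(doc_a, doc_c, M), 4))
```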

DeepGATGO: A Hierarchical Pretraining-Based Graph-Attention Model for Automatic Protein Function Prediction

  • paper_url: http://arxiv.org/abs/2307.13004
  • repo_url: None
  • paper_authors: Zihao Li, Changkun Jiang, Jianqiang Li
  • for: automatic protein function prediction (AFP)
  • methods: sequence-based hierarchical prediction method using graph attention networks (GATs) and contrastive learning
  • results: better scalability in GO term enrichment analysis on large-scale datasets
    Abstract Automatic protein function prediction (AFP) is classified as a large-scale multi-label classification problem aimed at automating protein enrichment analysis to eliminate the current reliance on labor-intensive wet-lab methods. Currently, popular methods primarily combine protein-related information and Gene Ontology (GO) terms to generate final functional predictions. For example, protein sequences, structural information, and protein-protein interaction networks are integrated as prior knowledge to fuse with GO term embeddings and generate the ultimate prediction results. However, these methods are limited by the difficulty in obtaining structural information or network topology information, as well as the accuracy of such data. Therefore, more and more methods that only use protein sequences for protein function prediction have been proposed, which is a more reliable and computationally cheaper approach. However, the existing methods fail to fully extract feature information from protein sequences or label data because they do not adequately consider the intrinsic characteristics of the data itself. Therefore, we propose a sequence-based hierarchical prediction method, DeepGATGO, which processes protein sequences and GO term labels hierarchically, and utilizes graph attention networks (GATs) and contrastive learning for protein function prediction. Specifically, we compute embeddings of the sequence and label data using pre-trained models to reduce computational costs and improve the embedding accuracy. Then, we use GATs to dynamically extract the structural information of non-Euclidean data, and learn general features of the label dataset with contrastive learning by constructing positive and negative example samples. Experimental results demonstrate that our proposed model exhibits better scalability in GO term enrichment analysis on large-scale datasets.
    摘要 自动蛋白功能预测(AFP)被分类为大规模多标签分类问题,旨在自动化蛋白聚集分析,以消除现有的人工劳动密集方法。现有的popular方法主要结合蛋白质相关信息和生物学功能 ontology(GO)标签来生成最终的功能预测结果。例如,蛋白序列、结构信息和蛋白蛋白交互网络被融合到GO标签嵌入中,以生成最终的预测结果。然而,这些方法受到蛋白质结构信息或网络拓扑信息的困难性和准确性的限制。因此,越来越多的方法只使用蛋白序列进行蛋白功能预测,这是一种更可靠和计算成本更低的方法。然而,现有的方法无法充分EXTRACT蛋白序列和标签数据中的特征信息。因此,我们提出了一种遵循蛋白序列层次预测方法,深度GATGO,该方法可以处理蛋白序列和GO标签数据层次,并使用图注意力网络(GATs)和对比学习来进行蛋白功能预测。具体来说,我们使用预训练模型计算蛋白序列和标签数据的嵌入,以降低计算成本并提高嵌入精度。然后,我们使用GATs动态提取蛋白序列非几何数据的结构信息,并通过对比学习学习标签数据的通用特征。实验结果表明,我们提出的模型在大规模GO标签浸泡分析中展现出较好的扩展性。

Homophily-Driven Sanitation View for Robust Graph Contrastive Learning

  • paper_url: http://arxiv.org/abs/2307.12555
  • repo_url: https://github.com/htytewx/softcam
  • paper_authors: Yulin Zhu, Xing Ai, Yevgeniy Vorobeychik, Kai Zhou
  • for: 这个论文旨在探讨Graph Contrastive Learning(GCL)对于结构攻击的 adversarial robustness。
  • methods: 这篇论文通过全面的实证和理论分析，揭示了现有攻击如何以及为何降低GCL的性能。在此基础上，提出了一种鲁棒的GCL框架，将homophily驱动的净化视图与对比学习联合训练；针对净化目标不可导带来的挑战，提出了一系列技术以实现基于梯度的端到端鲁棒GCL。
  • results: 实验结果表明，GCHS（Graph Contrastive Learning with Homophily-driven Sanitation View）在两种最先进的结构攻击下均优于所有基线，在生成节点embedding的质量和两个重要下游任务的性能上都表现出色。
    Abstract We investigate adversarial robustness of unsupervised Graph Contrastive Learning (GCL) against structural attacks. First, we provide a comprehensive empirical and theoretical analysis of existing attacks, revealing how and why they downgrade the performance of GCL. Inspired by our analytic results, we present a robust GCL framework that integrates a homophily-driven sanitation view, which can be learned jointly with contrastive learning. A key challenge this poses, however, is the non-differentiable nature of the sanitation objective. To address this challenge, we propose a series of techniques to enable gradient-based end-to-end robust GCL. Moreover, we develop a fully unsupervised hyperparameter tuning method which, unlike prior approaches, does not require knowledge of node labels. We conduct extensive experiments to evaluate the performance of our proposed model, GCHS (Graph Contrastive Learning with Homophily-driven Sanitation View), against two state of the art structural attacks on GCL. Our results demonstrate that GCHS consistently outperforms all state of the art baselines in terms of the quality of generated node embeddings as well as performance on two important downstream tasks.
    摘要 我们研究无监督图对比学习（GCL）对结构攻击的对抗鲁棒性。首先，我们提供了全面的实证和理论分析，揭示了现有攻击如何以及为何降低GCL的性能。Inspired by our analytic results, we present a robust GCL framework that integrates a homophily-driven sanitation view, which can be learned jointly with contrastive learning. However, the non-differentiable nature of the sanitation objective poses a key challenge. To address this challenge, we propose a series of techniques to enable gradient-based end-to-end robust GCL. Moreover, we develop a fully unsupervised hyperparameter tuning method, which unlike prior approaches, does not require knowledge of node labels. We conduct extensive experiments to evaluate the performance of our proposed model, GCHS (Graph Contrastive Learning with Homophily-driven Sanitation View), against two state-of-the-art structural attacks on GCL. Our results demonstrate that GCHS consistently outperforms all state-of-the-art baselines in terms of the quality of generated node embeddings as well as performance on two important downstream tasks.
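
GCHS learns its sanitation view jointly with the contrastive objective; as a much simpler, non-learned illustration of the homophily intuition behind it, the sketch below scores each edge by the cosine similarity of its endpoints' features and drops the least homophilous ones on a synthetic two-community graph.

```python
import numpy as np

def sanitize_edges(features, edges, keep_ratio=0.7):
    """Keep only the most homophilous edges, scored by cosine similarity
    of the endpoint feature vectors."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    scores = np.einsum("ij,ij->i", f[edges[:, 0]], f[edges[:, 1]])
    keep = np.argsort(scores)[-int(len(edges) * keep_ratio):]
    return edges[keep]

rng = np.random.default_rng(0)
# Two feature communities: nodes 0-49 and nodes 50-99.
features = np.vstack([rng.normal(1.0, 1.0, size=(50, 16)),
                      rng.normal(-1.0, 1.0, size=(50, 16))])
clean_edges = np.column_stack([rng.integers(0, 50, 200), rng.integers(0, 50, 200)])
attack_edges = np.column_stack([rng.integers(0, 50, 100), rng.integers(50, 100, 100)])
edges = np.vstack([clean_edges, attack_edges])

sanitized = sanitize_edges(features, edges)
print("edges kept:", len(sanitized), "of", len(edges))
print("adversarial cross-community edges surviving:", int(np.sum(sanitized[:, 1] >= 50)))
```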

Continuation Path Learning for Homotopy Optimization

  • paper_url: http://arxiv.org/abs/2307.12551
  • repo_url: https://github.com/xi-l/cpl
  • paper_authors: Xi Lin, Zhiyuan Yang, Xiaoyuan Zhang, Qingfu Zhang
  • for: 提高Homotopy优化的效果和可用性,并提供更多的解决方案选择机会。
  • methods: 提出了一种基于模型的方法,可以同时优化原始问题和所有优化子问题,并实时生成任意中间解决方案。
  • results: 实验表明,该方法可以明显提高Homotopy优化的性能,并提供更多的有用信息支持更好的决策。
    Abstract Homotopy optimization is a traditional method to deal with a complicated optimization problem by solving a sequence of easy-to-hard surrogate subproblems. However, this method can be very sensitive to the continuation schedule design and might lead to a suboptimal solution to the original problem. In addition, the intermediate solutions, often ignored by classic homotopy optimization, could be useful for many real-world applications. In this work, we propose a novel model-based approach to learn the whole continuation path for homotopy optimization, which contains infinite intermediate solutions for any surrogate subproblems. Rather than the classic unidirectional easy-to-hard optimization, our method can simultaneously optimize the original problem and all surrogate subproblems in a collaborative manner. The proposed model also supports real-time generation of any intermediate solution, which could be desirable for many applications. Experimental studies on different problems show that our proposed method can significantly improve the performance of homotopy optimization and provide extra helpful information to support better decision-making.
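
CPL learns the whole continuation path with a model; for context, the sketch below shows the classic homotopy optimization it builds on, an easy-to-hard schedule of surrogate subproblems solved with warm-started gradient descent on a toy 1-D objective (the objective and schedule are purely illustrative).

```python
import numpy as np

def grad(x, t):
    """Gradient of the homotopy objective h(x, t) = (1 - t) * easy(x) + t * hard(x),
    with easy(x) = x**2 (convex) and hard(x) = x**2 + 3 * sin(5 * x) (multi-modal),
    which simplifies to 2 * x + 15 * t * cos(5 * x)."""
    return 2 * x + 15 * t * np.cos(5 * x)

x = 3.0                                  # warm start carried along the path
path = []
for t in np.linspace(0.0, 1.0, 21):      # continuation schedule from easy to hard
    for _ in range(200):                 # inner gradient descent on the surrogate at this t
        x -= 0.01 * grad(x, t)
    path.append((round(t, 2), round(x, 3)))

print("intermediate solutions along the continuation path:", path[::5])
print("solution of the hard problem:", path[-1][1])
```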

On the Connection between Pre-training Data Diversity and Fine-tuning Robustness

  • paper_url: http://arxiv.org/abs/2307.12532
  • repo_url: None
  • paper_authors: Vivek Ramanujan, Thao Nguyen, Sewoong Oh, Ludwig Schmidt, Ali Farhadi
  • for: 了解预训练策略对下游模型的泛化性质的影响。
  • methods: 研究预训练分布的属性对下游模型的可靠性的影响,包括标签空间、标签 semantics、图像多样性、数据领域和数据量等因素。
  • results: 发现预训练数据量是下游模型的可靠性的关键因素,其他因素具有有限的影响。例如,将 ImageNet 预训练类减少到 4 倍,同时将每个类的图像数量增加到 4 倍(即保持总数据量不变)不会影响 fine-tuned 模型的可靠性。通过使用不同的自然和Synthetic 数据源预训练分布,主要通过 iWildCam-WILDS 分布转换测试下游模型的可靠性。
    Abstract Pre-training has been widely adopted in deep learning to improve model performance, especially when the training data for a target task is limited. In our work, we seek to understand the implications of this training strategy on the generalization properties of downstream models. More specifically, we ask the following question: how do properties of the pre-training distribution affect the robustness of a fine-tuned model? The properties we explore include the label space, label semantics, image diversity, data domains, and data quantity of the pre-training distribution. We find that the primary factor influencing downstream effective robustness (Taori et al., 2020) is data quantity, while other factors have limited significance. For example, reducing the number of ImageNet pre-training classes by 4x while increasing the number of images per class by 4x (that is, keeping total data quantity fixed) does not impact the robustness of fine-tuned models. We demonstrate our findings on pre-training distributions drawn from various natural and synthetic data sources, primarily using the iWildCam-WILDS distribution shift as a test for downstream robustness.
    摘要 <>将文本翻译成简化中文。>预训练已广泛应用于深度学习中,以提高模型性能,特别是当目标任务的训练数据scarce时。在我们的工作中,我们想要了解预训练策略对下游模型的泛化性质产生的影响。更 Specifically,我们问的问题是:预训练分布的属性如何影响下游模型的可靠性?我们探讨的属性包括标签空间、标签 semantics、图像多样性、数据领域和数据量。我们发现预训练数据量是下游可靠性的主要因素,而其他因素具有有限的意义。例如,将 ImageNet 预训练类别数量减少到 4 倍,同时图像每类数量增加 4 倍(即保持总数据量不变),不会影响 Fine-tune 模型的可靠性。我们通过不同的自然和 sintetic 数据源中的预训练分布,主要使用 iWildCam-WILDS 分布转换为下游可靠性的测试。

Rethinking Medical Report Generation: Disease Revealing Enhancement with Knowledge Graph

  • paper_url: http://arxiv.org/abs/2307.12526
  • repo_url: https://github.com/wangyixinxin/mrg-kg
  • paper_authors: Yixin Wang, Zihao Lin, Haoyu Dong
  • for: 这个研究旨在提高医疗报告生成(MRG)的品质,特别是透过知识图(KG)来导向生成过程。
  • methods: 本研究使用了一个完整的KG,包括137种疾病和问题,并导入了一个新的增强描述疾病类型的增强策略,以解决长条形分布问题。
  • results: 研究发现,提案的两阶段生成框架和增强策略可以提高生成的多样性和准确性,并有着显著的改善效果。
    Abstract Knowledge Graph (KG) plays a crucial role in Medical Report Generation (MRG) because it reveals the relations among diseases and thus can be utilized to guide the generation process. However, constructing a comprehensive KG is labor-intensive and its applications on the MRG process are under-explored. In this study, we establish a complete KG on chest X-ray imaging that includes 137 types of diseases and abnormalities. Based on this KG, we find that the current MRG data sets exhibit a long-tailed problem in disease distribution. To mitigate this problem, we introduce a novel augmentation strategy that enhances the representation of disease types in the tail-end of the distribution. We further design a two-stage MRG approach, where a classifier is first trained to detect whether the input images exhibit any abnormalities. The classified images are then independently fed into two transformer-based generators, namely, ``disease-specific generator" and ``disease-free generator" to generate the corresponding reports. To enhance the clinical evaluation of whether the generated reports correctly describe the diseases appearing in the input image, we propose diverse sensitivity (DS), a new metric that checks whether generated diseases match ground truth and measures the diversity of all generated diseases. Results show that the proposed two-stage generation framework and augmentation strategies improve DS by a considerable margin, indicating a notable reduction in the long-tailed problem associated with under-represented diseases.
    摘要 医疗报告生成(MRG)中知识图(KG)扮演着关键性的角色,因为它揭示疾病之间的关系,可以用于导航生成过程。然而,建立全面的KG是劳动密集的,而其在MRG过程中的应用还尚未得到了充分的探索。本研究中,我们建立了包含137种疾病和异常的完整KG,基于这个KG,我们发现现有的MRG数据集具有长尾分布问题。为了解决这个问题,我们提出了一种新的增强策略,增强疾病类型在分布尾部的表示。此外,我们设计了两个阶段的MRG方法,其中第一阶段使用分类器来检测输入图像是否具有任何异常。经过分类后,图像分别被独立地传递到两个基于转换器的生成器,即“疾病特定生成器”和“疾病无效生成器”,以生成对应的报告。为了提高生成报告的临床评估,我们提出了多样性敏感度(DS),一种新的指标,用于检查生成的疾病与实际情况是否匹配,并测量所有生成的疾病的多样性。结果显示,我们的两个阶段生成框架和增强策略可以大幅提高DS, indicating a considerable reduction in the long-tailed problem associated with under-represented diseases.

Landslide Surface Displacement Prediction Based on VSXC-LSTM Algorithm

  • paper_url: http://arxiv.org/abs/2307.12524
  • repo_url: None
  • paper_authors: Menglin Kong, Ruichen Li, Fan Liu, Xingquan Li, Juan Cheng, Muzhou Hou, Cong Cao
  • for: 预测地面滑坡表面变位
  • methods: 基于变形模式分解(VMD)、SegSigmoid函数、XGBoost算法和嵌入LSTM neural network的时序预测框架(VSXC-LSTM)
  • results: 在测试集上,模型表现良好,除了随机项 subsequences 难以适应外, periodic item subsequence 和 trend item subsequence 的 RMSE 和 MAPE 都小于 0.1, periodic item prediction module 基于 XGBoost 的 RMSE 为 0.006。
    Abstract Landslide is a natural disaster that can easily threaten local ecology, people's lives and property. In this paper, we conduct modelling research on real unidirectional surface displacement data of recent landslides in the research area and propose a time series prediction framework named VMD-SegSigmoid-XGBoost-ClusterLSTM (VSXC-LSTM) based on variational mode decomposition, which can predict the landslide surface displacement more accurately. The model performs well on the test set. Except for the random item subsequence that is hard to fit, the root mean square error (RMSE) and the mean absolute percentage error (MAPE) of the trend item subsequence and the periodic item subsequence are both less than 0.1, and the RMSE is as low as 0.006 for the periodic item prediction module based on XGBoost\footnote{Accepted in ICANN2023}.
    摘要 地面滑坡是自然灾害,容易威胁当地生态、人们生命和财产。在这篇论文中,我们基于实际的单向表面偏移数据进行模拟研究,并提出了一种基于变幅模式分解的时间序列预测框架,称为VMD-SegSigmoid-XGBoost-ClusterLSTM(VSXC-LSTM)。这种模型可以更准确地预测滑坡表面偏移。测试集上的表现良好,只有随机项子序列难以适应,RMSE和MAPE值均小于0.1, periodic item prediction module based on XGBoost的RMSE值为0.006( Accepted in ICANN2023)。

Lost In Translation: Generating Adversarial Examples Robust to Round-Trip Translation

  • paper_url: http://arxiv.org/abs/2307.12520
  • repo_url: https://github.com/neelbhandari6/nmt_text_attack
  • paper_authors: Neel Bhandari, Pin-Yu Chen
  • for: 研究文章探讨了现有文本 adversarial 攻击的稳定性,特别是对于保持了 considerable similarity 的文本 adversarial examples。
  • methods: 文章使用了 six state-of-the-art text-based adversarial attacks,并对它们进行了 round-trip translation 测试。此外,文章还提出了一种基于 machine translation 的解决方案,以增强 adversarial example 的稳定性。
  • results: 研究发现,six state-of-the-art text-based adversarial attacks 在 round-trip translation 下失效,而 integrate machine translation into adversarial example generation 可以提高稳定性。这些结果表明,找到可以抗 translation 的 adversarial examples 可以帮助找到语言模型的缺陷,并促进更多关于多语言 adversarial attacks 的研究。
    Abstract Language Models today provide a high accuracy across a large number of downstream tasks. However, they remain susceptible to adversarial attacks, particularly against those where the adversarial examples maintain considerable similarity to the original text. Given the multilingual nature of text, the effectiveness of adversarial examples across translations and how machine translations can improve the robustness of adversarial examples remain largely unexplored. In this paper, we present a comprehensive study on the robustness of current text adversarial attacks to round-trip translation. We demonstrate that 6 state-of-the-art text-based adversarial attacks do not maintain their efficacy after round-trip translation. Furthermore, we introduce an intervention-based solution to this problem, by integrating Machine Translation into the process of adversarial example generation and demonstrating increased robustness to round-trip translation. Our results indicate that finding adversarial examples robust to translation can help identify the insufficiency of language models that is common across languages, and motivate further research into multilingual adversarial attacks.
    摘要 现代语言模型在许多下游任务上具有高精度。然而,它们仍然容易受到敌意攻击,特别是那些维持了原文和敌意例子之间的相似性。由于文本的多语言性,攻击者可以利用不同语言的文本来攻击语言模型。在这篇论文中,我们展示了现有的文本基于攻击的六种状态体验的不稳定性,并证明它们在翻译后不再有效。此外,我们还介绍了一种基于机器翻译的解决方案,并证明该方法可以提高攻击例子的翻译稳定性。我们的结果表明,找到可以抵抗翻译的攻击例子可以帮助我们发现语言模型的共同缺陷,并促进多语言攻击的研究。

DEPHN: Different Expression Parallel Heterogeneous Network using virtual gradient optimization for Multi-task Learning

  • paper_url: http://arxiv.org/abs/2307.12519
  • repo_url: None
  • paper_authors: Menglin Kong, Ri Su, Shaojie Zhao, Muzhou Hou
  • for: This paper proposes a new method for multi-task learning (MTL) recommendation systems to better understand user behavior in complex scenarios.
  • methods: The proposed method, called Different Expression Parallel Heterogeneous Network (DEPHN), uses different feature interaction methods to improve the generalization ability of shared information flow and adaptively adjusts the learning intensity of gated units based on task correlation.
  • results: Extensive experiments on artificial and real-world datasets demonstrate that DEPHN can capture task correlation in complex situations and achieve better performance than baseline models.
    Abstract Recommendation system algorithm based on multi-task learning (MTL) is the major method for Internet operators to understand users and predict their behaviors in the multi-behavior scenario of platform. Task correlation is an important consideration of MTL goals, traditional models use shared-bottom models and gating experts to realize shared representation learning and information differentiation. However, The relationship between real-world tasks is often more complex than existing methods do not handle properly sharing information. In this paper, we propose an Different Expression Parallel Heterogeneous Network (DEPHN) to model multiple tasks simultaneously. DEPHN constructs the experts at the bottom of the model by using different feature interaction methods to improve the generalization ability of the shared information flow. In view of the model's differentiating ability for different task information flows, DEPHN uses feature explicit mapping and virtual gradient coefficient for expert gating during the training process, and adaptively adjusts the learning intensity of the gated unit by considering the difference of gating values and task correlation. Extensive experiments on artificial and real-world datasets demonstrate that our proposed method can capture task correlation in complex situations and achieve better performance than baseline models\footnote{Accepted in IJCNN2023}.
    摘要 基于多任务学习(MTL)的推荐系统算法是互联网运营商理解用户并在平台多行为场景中预测其行为的主要方法。任务相关性是MTL目标的重要考量,传统模型通过共享底层模型和门控专家来实现共享表示学习和信息差异化。然而,现实世界中任务之间的关系往往更为复杂,现有方法无法妥善处理信息共享。在这篇论文中,我们提出了不同表达并行异构网络(DEPHN),以同时建模多个任务。DEPHN通过不同的特征交互方法构建底层专家,以提高共享信息流的泛化能力。针对模型对不同任务信息流的区分能力,DEPHN在训练过程中使用特征显式映射和虚拟梯度系数进行专家门控,并根据门控值差异和任务相关性自适应调整门控单元的学习强度。在人工和真实世界数据集上的大量实验表明,我们提出的方法能够在复杂情况下捕捉任务相关性,并取得优于基线模型的表现。
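
To make the shared-experts-plus-gating idea concrete, here is a minimal PyTorch sketch of an MMoE-style multi-task block. It only shows the generic pattern DEPHN builds on: the heterogeneous feature-interaction experts, explicit feature mapping, and virtual gradient coefficients described in the abstract are not reproduced, and all layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class GatedMultiTaskNet(nn.Module):
    """Shared experts with one softmax gate per task (MMoE-style skeleton).
    DEPHN additionally diversifies the experts and modulates the gates with
    virtual gradient coefficients, which this sketch omits."""
    def __init__(self, in_dim=32, expert_dim=16, n_experts=4, n_tasks=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, expert_dim), nn.ReLU())
             for _ in range(n_experts)])
        self.gates = nn.ModuleList(
            [nn.Linear(in_dim, n_experts) for _ in range(n_tasks)])
        self.towers = nn.ModuleList(
            [nn.Linear(expert_dim, 1) for _ in range(n_tasks)])

    def forward(self, x):
        expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, H)
        outputs = []
        for gate, tower in zip(self.gates, self.towers):
            w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)           # (B, E, 1)
            mixed = (w * expert_out).sum(dim=1)                        # (B, H)
            outputs.append(torch.sigmoid(tower(mixed)))                # per-task score
        return outputs

model = GatedMultiTaskNet()
ctr_pred, cvr_pred = model(torch.randn(8, 32))
print(ctr_pred.shape, cvr_pred.shape)
```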

FaFCNN: A General Disease Classification Framework Based on Feature Fusion Neural Networks

  • paper_url: http://arxiv.org/abs/2307.12518
  • repo_url: None
  • paper_authors: Menglin Kong, Shaojie Zhao, Juan Cheng, Xingquan Li, Ri Su, Muzhou Hou, Cong Cao
  • for: 本研究旨在解决将深度学习/机器学习方法应用于疾病分类任务时的两个基本问题:训练样本数量不足、质量不高,以及如何有效融合多源特征。
  • methods: 提出了一种基于域对抗学习的Feature-aware Fusion Correlation Neural Network (FaFCNN),具有特点增强样本相关性和特征对齐。
  • results: 实验结果表明,使用增强的特征获得融合特征后,FaFCNN可以更好地提高疾病分类性能,特别是在低质量数据集上。此外,广泛的实验还证明了模型的稳定性和每个组件的有效性。
    Abstract There are two fundamental problems in applying deep learning/machine learning methods to disease classification tasks, one is the insufficient number and poor quality of training samples; another one is how to effectively fuse multiple source features and thus train robust classification models. To address these problems, inspired by the process of human learning knowledge, we propose the Feature-aware Fusion Correlation Neural Network (FaFCNN), which introduces a feature-aware interaction module and a feature alignment module based on domain adversarial learning. This is a general framework for disease classification, and FaFCNN improves the way existing methods obtain sample correlation features. The experimental results show that training using augmented features obtained by pre-training gradient boosting decision tree yields more performance gains than random-forest based methods. On the low-quality dataset with a large amount of missing data in our setup, FaFCNN obtains a consistently optimal performance compared to competitive baselines. In addition, extensive experiments demonstrate the robustness of the proposed method and the effectiveness of each component of the model\footnote{Accepted in IEEE SMC2023}.
    摘要 将深度学习/机器学习方法应用于疾病分类任务存在两个基本问题:一是训练样本数量不足且质量不高;二是如何有效融合多源特征,从而训练出稳健的分类模型。为了解决这些问题,受人类学习知识过程的启发,我们提出了特征感知融合相关神经网络(FaFCNN),其引入了特征感知交互模块和基于域对抗学习的特征对齐模块。这是一个通用的疾病分类框架,FaFCNN改进了现有方法获取样本相关性特征的方式。实验结果显示,使用预训练梯度提升决策树得到的增强特征进行训练,比基于随机森林的方法带来更多的性能提升;在我们设置的含有大量缺失数据的低质量数据集上,FaFCNN相比有竞争力的基线取得了一致的最优性能。此外,大量实验也证明了所提方法的稳健性以及模型各组成部分的有效性。
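
FaFCNN's feature alignment module is based on domain adversarial learning. The sketch below shows the standard gradient-reversal mechanism behind that idea in PyTorch; it is a generic illustration, not the authors' exact alignment module, and the network sizes and data are arbitrary.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated (scaled) gradient in the backward
    pass -- the standard trick behind domain-adversarial feature alignment."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

feature_net = nn.Sequential(nn.Linear(20, 16), nn.ReLU())
domain_head = nn.Linear(16, 2)     # source vs. target domain classifier

x = torch.randn(8, 20)
feats = feature_net(x)
domain_logits = domain_head(grad_reverse(feats))  # features learn to fool this head
loss = nn.CrossEntropyLoss()(domain_logits, torch.randint(0, 2, (8,)))
loss.backward()
```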

An Empirical Evaluation of Temporal Graph Benchmark

  • paper_url: http://arxiv.org/abs/2307.12510
  • repo_url: https://github.com/yule-BUAA/DyGLib_TGB
  • paper_authors: Le Yu
  • for: 本研究是一个empirical evaluation of Temporal Graph Benchmark (TGB),通过扩展我们的Dynamic Graph Library (DyGLib)来对TGB进行评估。
  • methods: 本研究使用了11种流行的动态图学习方法进行更加全面的比较,其中包括TGB中报告的基线方法。
  • results: 实验发现,不同的模型在不同数据集上表现各异,这与之前的观察一致;同时,使用DyGLib可以使部分基线方法的性能显著超过TGB中报告的结果。
    Abstract In this paper, we conduct an empirical evaluation of Temporal Graph Benchmark (TGB) by extending our Dynamic Graph Library (DyGLib) to TGB. Compared with TGB, we include eleven popular dynamic graph learning methods for more exhaustive comparisons. Through the experiments, we find that (1) different models depict varying performance across various datasets, which is in line with previous observations; (2) the performance of some baselines can be significantly improved over the reported results in TGB when using DyGLib. This work aims to ease the researchers' efforts in evaluating various dynamic graph learning methods on TGB and attempts to offer results that can be directly referenced in the follow-up research. All the used resources in this project are publicly available at https://github.com/yule-BUAA/DyGLib_TGB. This work is in progress, and feedback from the community is welcomed for improvements.
    摘要 在这篇论文中,我们通过将我们的动态图库(DyGLib)扩展到Temporal Graph Benchmark(TGB),对TGB进行了实证评估。相比TGB,我们纳入了11种流行的动态图学习方法,以便进行更加详尽的比较。通过实验,我们发现:1. 不同的模型在不同的数据集上表现各异,这与之前的观察一致;2. 使用DyGLib后,一些基线方法的性能可以显著超过TGB中报告的结果。这项工作旨在减轻研究者在TGB上评估各种动态图学习方法的负担,并提供可供后续研究直接引用的结果。我们使用的所有资源均公开于 https://github.com/yule-BUAA/DyGLib_TGB。这项工作仍在进行中,欢迎社区提供反馈以便改进。

AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.12499
  • repo_url: None
  • paper_authors: Xuelong Dai, Kaisheng Liang, Bin Xiao
  • for: Research on adversarial attacks against deep learning models and defense techniques
  • methods: Using diffusion models to generate unrestricted adversarial examples, and proposing two novel adversarial guidance techniques for reverse generation
  • results: Experimental results on MNIST and ImageNet datasets show that AdvDiff is effective in generating high-quality, realistic adversarial examples that outperform GAN-based methods in attack performance and generation quality.
    Abstract Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques. They pose severe security problems for deep learning applications because they can effectively bypass defense mechanisms. However, previous attack methods often utilize Generative Adversarial Networks (GANs), which are not theoretically provable and thus generate unrealistic examples by incorporating adversarial objectives, especially for large-scale datasets like ImageNet. In this paper, we propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models. We design two novel adversarial guidance techniques to conduct adversarial sampling in the reverse generation process of diffusion models. These two techniques are effective and stable to generate high-quality, realistic adversarial examples by integrating gradients of the target classifier interpretably. Experimental results on MNIST and ImageNet datasets demonstrate that AdvDiff is effective to generate unrestricted adversarial examples, which outperforms GAN-based methods in terms of attack performance and generation quality.
    摘要 深度学习模型面临了无限制敌意攻击的威胁,这些攻击可以有效绕过防御机制。然而,先前的攻击方法 часто使用生成对抗网络(GAN),这些网络不是理论可证明的,因此会生成不实际的例子,特别是对于大规模的数据集如ImageNet。在这篇论文中,我们提出了一种新的方法,称为AdvDiff,用于生成无限制敌意例子。我们设计了两种新的对抗导航技术,用于在扩散模型的反生成过程中进行对抗采样。这两种技术可以生成高质量、实际的敌意例子,通过可视化目标分类器的梯度来 интегрирова。实验结果表明,AdvDiff在MNIST和ImageNet数据集上效果地生成了无限制敌意例子,其性能和生成质量都高于基于GAN的方法。
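
The abstract does not spell out the two guidance techniques, so the sketch below only illustrates the generic pattern of steering a reverse-diffusion step with the gradient of a target classifier. The denoiser, classifier, and update rule are toy stand-ins assumed for illustration rather than AdvDiff's actual sampler.

```python
import torch

def guided_reverse_step(x_t, t, denoiser, classifier, target, scale=1.0):
    """One toy reverse-diffusion step with classifier-gradient guidance:
    the denoised estimate is nudged toward raising the target-class
    log-probability (generic pattern, not AdvDiff's exact update)."""
    x_t = x_t.detach().requires_grad_(True)
    log_probs = torch.log_softmax(classifier(x_t), dim=-1)
    grad = torch.autograd.grad(log_probs[:, target].sum(), x_t)[0]
    with torch.no_grad():
        mean = denoiser(x_t, t)                    # model's denoised estimate
        noise = torch.randn_like(x_t) if t > 0 else 0.0
        return mean + scale * grad + 0.1 * noise   # guidance-shifted sample

# toy stand-ins for a diffusion denoiser and a target classifier
denoiser = lambda x, t: 0.9 * x
classifier = torch.nn.Linear(4, 3)

x = torch.randn(2, 4)
for t in reversed(range(5)):
    x = guided_reverse_step(x, t, denoiser, classifier, target=1)
print(x.shape)
```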

A faster and simpler algorithm for learning shallow networks

  • paper_url: http://arxiv.org/abs/2307.12496
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Sitan Chen, Shyam Narayanan
  • for: 学习一个线性组合中的ReLU活化器,给出标注的例子来自标准的$d$-维高斯分布。
  • methods: 使用Chen et al.的算法, runtime在$\text{poly}(d,1/\varepsilon)$时间内运行,并在多个阶段学习。
  • results: 显示了一个简单的一阶版本的算法可以 suffices,并且其运行时间只是 $(d/\varepsilon)^{O(k^2)} $。
    Abstract We revisit the well-studied problem of learning a linear combination of $k$ ReLU activations given labeled examples drawn from the standard $d$-dimensional Gaussian measure. Chen et al. [CDG+23] recently gave the first algorithm for this problem to run in $\text{poly}(d,1/\varepsilon)$ time when $k = O(1)$, where $\varepsilon$ is the target error. More precisely, their algorithm runs in time $(d/\varepsilon)^{\mathrm{quasipoly}(k)}$ and learns over multiple stages. Here we show that a much simpler one-stage version of their algorithm suffices, and moreover its runtime is only $(d/\varepsilon)^{O(k^2)}$.
    摘要 我们重新研究一个已被广泛研究的问题:在标准 $d$ 维高斯分布下,给定带标签样本,学习 $k$ 个 ReLU 激活函数的线性组合。Chen 等人 [CDG+23] 最近提出了首个在 $k = O(1)$ 时能在 $\text{poly}(d,1/\varepsilon)$ 时间内解决该问题的算法,其中 $\varepsilon$ 是目标误差。更确切地说,他们的算法运行时间为 $(d/\varepsilon)^{\mathrm{quasipoly}(k)}$,并且分多个阶段进行学习。我们证明其一个简单得多的单阶段版本就已足够,而且其运行时间仅为 $(d/\varepsilon)^{O(k^2)}$。

Learning Universal and Robust 3D Molecular Representations with Graph Convolutional Networks

  • paper_url: http://arxiv.org/abs/2307.12491
  • repo_url: None
  • paper_authors: Shuo Zhang, Yang Liu, Li Xie, Lei Xie
  • for: 用于学习分子的准确表示,需要考虑分子的化学和几何特征。
  • methods: 基于分子图表示的方向节点对(DNP)描述器,Robust Molecular Graph Convolutional Network(RoM-GCN)模型可以同时考虑节点和边特征。
  • results: 对蛋白质和小分子数据集进行评估,研究表明DNP描述器能够具有3D分子几何信息的Robust性,RoM-GCN模型在比较基eline上表现出色。
    Abstract To learn accurate representations of molecules, it is essential to consider both chemical and geometric features. To encode geometric information, many descriptors have been proposed in constrained circumstances for specific types of molecules and do not have the properties to be ``robust": 1. Invariant to rotations and translations; 2. Injective when embedding molecular structures. In this work, we propose a universal and robust Directional Node Pair (DNP) descriptor based on the graph representations of 3D molecules. Our DNP descriptor is robust compared to previous ones and can be applied to multiple molecular types. To combine the DNP descriptor and chemical features in molecules, we construct the Robust Molecular Graph Convolutional Network (RoM-GCN) which is capable to take both node and edge features into consideration when generating molecule representations. We evaluate our model on protein and small molecule datasets. Our results validate the superiority of the DNP descriptor in incorporating 3D geometric information of molecules. RoM-GCN outperforms all compared baselines.
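
The DNP descriptor itself is not specified in the abstract; the snippet below merely demonstrates the "robustness" property it targets (invariance to rotations and translations), using sorted pairwise distances of a toy 3D point cloud as a stand-in descriptor.

```python
import numpy as np

def pairwise_distance_descriptor(coords):
    """Stand-in geometric descriptor: sorted pairwise distances. Like any
    descriptor built from relative geometry, it is invariant to rigid
    rotations and translations of the molecule."""
    diff = coords[:, None, :] - coords[None, :, :]
    dists = np.linalg.norm(diff, axis=-1)
    iu = np.triu_indices(len(coords), k=1)
    return np.sort(dists[iu])

rng = np.random.default_rng(0)
coords = rng.normal(size=(10, 3))                  # toy 3D point cloud

# random orthogonal transform (QR of a Gaussian matrix) plus a translation
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
moved = coords @ Q.T + np.array([1.0, -2.0, 0.5])

d1 = pairwise_distance_descriptor(coords)
d2 = pairwise_distance_descriptor(moved)
print("invariant:", np.allclose(d1, d2))
```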

Learning Resource Allocation Policy: Vertex-GNN or Edge-GNN?

  • paper_url: http://arxiv.org/abs/2307.12480
  • repo_url: None
  • paper_authors: Yao Peng, Jia Guo, Chenyang Yang
  • for: 本文研究了基于图神经网络(Graph Neural Networks,GNNs)的无线资源分配策略学习。
  • methods: 本文分析了顶点神经网络(Vertex-GNNs)和边神经网络(Edge-GNNs)在学习无线策略时的表达能力。
  • results: 研究发现,顶点神经网络和边神经网络的表达能力取决于处理和组合函数的线性和输出维度。顶点神经网络在使用线性处理器时无法分辨所有通道矩阵,而边神经网络可以。在学习precoding策略时,即使使用非线性处理器,顶点神经网络的表达能力仍然有限。研究还提出了必要的条件,以确保GNNs可以好好地学习precoding策略。实验结果证明了分析结论,并表明了边神经网络可以与顶点神经网络相比,具有更低的训练和推断时间。
    Abstract Graph neural networks (GNNs) update the hidden representations of vertices (called Vertex-GNNs) or hidden representations of edges (called Edge-GNNs) by processing and pooling the information of neighboring vertices and edges and combining to incorporate graph topology. When learning resource allocation policies, GNNs cannot perform well if their expressive power are weak, i.e., if they cannot differentiate all input features such as channel matrices. In this paper, we analyze the expressive power of the Vertex-GNNs and Edge-GNNs for learning three representative wireless policies: link scheduling, power control, and precoding policies. We find that the expressive power of the GNNs depend on the linearity and output dimensions of the processing and combination functions. When linear processors are used, the Vertex-GNNs cannot differentiate all channel matrices due to the loss of channel information, while the Edge-GNNs can. When learning the precoding policy, even the Vertex-GNNs with non-linear processors may not be with strong expressive ability due to the dimension compression. We proceed to provide necessary conditions for the GNNs to well learn the precoding policy. Simulation results validate the analyses and show that the Edge-GNNs can achieve the same performance as the Vertex-GNNs with much lower training and inference time.
    摘要 图神经网络(GNN)通过处理和汇聚邻近顶点和边的信息并加以组合来利用图拓扑,从而更新顶点的隐藏表示(顶点GNN)或边的隐藏表示(边GNN)。在学习资源分配策略时,如果GNN的表达能力不足,例如无法区分所有输入特征(如信道矩阵),则无法取得好的性能。在这篇论文中,我们分析了顶点GNN和边GNN在学习三种代表性无线策略(链路调度、功率控制和预编码)时的表达能力。我们发现,GNN的表达能力取决于处理与组合函数的线性程度和输出维度。当使用线性处理器时,顶点GNN因丢失信道信息而无法区分所有信道矩阵,而边GNN则可以。在学习预编码策略时,即使采用非线性处理器,顶点GNN也可能因维度压缩而缺乏足够的表达能力。我们进一步给出了GNN能够很好地学习预编码策略的必要条件。仿真结果验证了上述分析,并表明边GNN能够以远低于顶点GNN的训练和推理时间达到相同的性能。
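
A small numerical illustration of the expressive-power argument, under the simplifying assumption that a linear Vertex-GNN layer aggregates incident channel gains by summation while an Edge-GNN keeps one hidden state per link: two distinct channel matrices with equal row and column sums become indistinguishable at the vertex level but not at the edge level.

```python
import numpy as np

# Two different 3x3 channel-gain matrices with identical row and column sums.
H1 = np.array([[1.0, 2.0, 3.0],
               [3.0, 1.0, 2.0],
               [2.0, 3.0, 1.0]])
H2 = np.array([[2.0, 1.0, 3.0],
               [3.0, 2.0, 1.0],
               [1.0, 3.0, 2.0]])

# Linear vertex-level aggregation (sum over incident edges) yields identical
# inputs, so a Vertex-GNN with linear processors cannot tell H1 and H2 apart.
print("vertex features H1:", H1.sum(axis=1), H1.sum(axis=0))
print("vertex features H2:", H2.sum(axis=1), H2.sum(axis=0))

# Edge-level representations keep one value per link and remain distinct.
print("edge features equal:", np.array_equal(H1.ravel(), H2.ravel()))
```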

Model-free generalized fiducial inference

  • paper_url: http://arxiv.org/abs/2307.12472
  • repo_url: None
  • paper_authors: Jonathan P Williams
  • for: 这项研究的目的是为了开发一种安全可靠的机器学习 uncertainty quantification 方法。
  • methods: 这项研究使用了一种 model-free 统计框架,以实现 imprecise probabilistic prediction inference。这个框架可以提供 finite sample control of type 1 errors,同时也可以提供更多的 versatile tools for imprecise probabilistic reasoning。
  • results: 这项研究提出了一种新的 preciseness probability approximation,用于 approximate belief/plausibility measure pair。这个 aproximation 是一种 optima in some sense 的 probability measure in the credal set,可以解决 imprecise probabilistic approaches to inference 的问题。
    Abstract Motivated by the need for the development of safe and reliable methods for uncertainty quantification in machine learning, I propose and develop ideas for a model-free statistical framework for imprecise probabilistic prediction inference. This framework facilitates uncertainty quantification in the form of prediction sets that offer finite sample control of type 1 errors, a property shared with conformal prediction sets, but this new approach also offers more versatile tools for imprecise probabilistic reasoning. Furthermore, I propose and consider the theoretical and empirical properties of a precise probabilistic approximation to the model-free imprecise framework. Approximating a belief/plausibility measure pair by an [optimal in some sense] probability measure in the credal set is a critical resolution needed for the broader adoption of imprecise probabilistic approaches to inference in statistical and machine learning communities. It is largely undetermined in the statistical and machine learning literatures, more generally, how to properly quantify uncertainty in that there is no generally accepted standard of accountability of stated uncertainties. The research I present in this manuscript is aimed at motivating a framework for statistical inference with reliability and accountability as the guiding principles.
    摘要 出于为机器学习开发安全可靠的不确定性量化方法的需要,我提出并发展了一个无模型的统计框架,用于不精确的概率预测推断。该框架以预测集的形式量化不确定性,能够在有限样本下控制第一类错误,这一性质与保形预测集相同,但新方法还为不精确概率推理提供了更灵活的工具。此外,我提出并考察了该无模型不精确框架的一个精确概率近似的理论与经验性质:用credal set中(某种意义上最优的)概率测度来近似信念/似然测度对,是不精确概率推断方法在统计与机器学习社区得到更广泛采用所需解决的关键问题。更一般地说,统计与机器学习文献中尚未确定应如何恰当地量化不确定性,因为对所陈述的不确定性并不存在普遍接受的问责标准。本文的研究旨在推动一个以可靠性和可问责性为指导原则的统计推断框架。
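
The paper's model-free imprecise framework is not reproduced here; as a reference point for the finite-sample type-1 error control mentioned in the abstract, the sketch below implements plain split conformal prediction intervals, which share that property.

```python
import numpy as np

def split_conformal_interval(cal_pred, cal_y, test_pred, alpha=0.1):
    """Split conformal prediction interval for regression: with probability
    at least 1 - alpha (over calibration and test draws), the interval
    covers the true test response -- a finite-sample guarantee."""
    scores = np.abs(cal_y - cal_pred)                      # nonconformity scores
    n = len(scores)
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return test_pred - q, test_pred + q

rng = np.random.default_rng(1)
cal_y = rng.normal(size=500)
cal_pred = cal_y + rng.normal(scale=0.5, size=500)         # imperfect predictor
lo, hi = split_conformal_interval(cal_pred, cal_y, test_pred=0.3)
print(f"90% prediction interval: [{lo:.2f}, {hi:.2f}]")
```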

Rethinking Data Distillation: Do Not Overlook Calibration

  • paper_url: http://arxiv.org/abs/2307.12463
  • repo_url: https://github.com/dongyaozhu/calibrate-networks-trained-on-distilled-datasets
  • paper_authors: Dongyao Zhu, Bowen Lei, Jie Zhang, Yanbo Fang, Ruqi Zhang, Yiqun Xie, Dongkuan Xu
  • for: 本研究旨在解决因数据压缩而导致的神经网络输出过于自信的问题,提出了两种新的纠正方法:Masked Temperature Scaling (MTS) 和 Masked Distillation Training (MDT)。
  • methods: 本研究使用了数据压缩后的神经网络进行训练,并采用了温度扩大和混合方法来纠正神经网络的输出。
  • results: 研究发现,使用Masked Temperature Scaling (MTS) 和 Masked Distillation Training (MDT) 可以更好地纠正数据压缩后的神经网络输出,同时保持数据压缩的效率。
    Abstract Neural networks trained on distilled data often produce over-confident output and require correction by calibration methods. Existing calibration methods such as temperature scaling and mixup work well for networks trained on original large-scale data. However, we find that these methods fail to calibrate networks trained on data distilled from large source datasets. In this paper, we show that distilled data lead to networks that are not calibratable due to (i) a more concentrated distribution of the maximum logits and (ii) the loss of information that is semantically meaningful but unrelated to classification tasks. To address this problem, we propose Masked Temperature Scaling (MTS) and Masked Distillation Training (MDT) which mitigate the limitations of distilled data and achieve better calibration results while maintaining the efficiency of dataset distillation.
    摘要 neural networks 经过精炼数据训练后通常会生成过度自信的输出,需要进行减强方法来修正。现有的减强方法,如温度Scaling 和 mixup,对于基于原始大规模数据的网络进行训练时工作良好。然而,我们发现这些方法无法调整基于大源数据集的数据精炼后的网络。在这篇论文中,我们发现了精炼数据导致的网络无法减强的两个问题:(i)精炼数据集中最大 logits 的更集中分布,以及(ii)semantic意义强度不相关的信息的丢失。为解决这问题,我们提出了Masked Temperature Scaling (MTS) 和 Masked Distillation Training (MDT),这两种方法可以缓解精炼数据的局限性,实现更好的减强结果,同时保持数据精炼的效率。
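
Masked Temperature Scaling is described only at a high level, so the sketch below shows the plain temperature-scaling baseline it modifies: fitting a single scalar $T$ on held-out logits by minimizing NLL. The masking of logits is the paper's addition and is not shown; the toy logits are synthetic.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, max_iter=50):
    """Standard temperature scaling: learn one scalar T > 0 on a validation
    set so that softmax(logits / T) is better calibrated (lower NLL)."""
    log_t = torch.zeros(1, requires_grad=True)         # optimize log T for positivity
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=max_iter)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return log_t.exp().item()

# toy over-confident validation logits
torch.manual_seed(0)
labels = torch.randint(0, 5, (200,))
logits = 5.0 * F.one_hot(labels, 5).float() + torch.randn(200, 5)
logits[::4] = torch.randn(50, 5) * 5.0                 # some confidently wrong rows
T = fit_temperature(logits, labels)
print("fitted temperature:", round(T, 3))
```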

Rates of Approximation by ReLU Shallow Neural Networks

  • paper_url: http://arxiv.org/abs/2307.12461
  • repo_url: None
  • paper_authors: Tong Mao, Ding-Xuan Zhou
  • for: The paper is written to investigate the efficiency of shallow neural networks with one hidden layer in approximating functions from Hölder spaces.
  • methods: The paper uses ReLU neural networks with $m$ hidden neurons to approximate functions from $W_\infty^r([-1, 1]^d)$ and provides rates of uniform approximation.
  • results: The paper shows that ReLU shallow neural networks can uniformly approximate functions from $W_\infty^r([-1, 1]^d)$ with rates $O((\log m)^{\frac{1}{2}+d}m^{-\frac{r}{d}\frac{d+2}{d+4}})$ when $r < d/2 + 2$, which is very close to the optimal rate $O(m^{-\frac{r}{d}})$ when the dimension $d$ is large.
    Abstract Neural networks activated by the rectified linear unit (ReLU) play a central role in the recent development of deep learning. The topic of approximating functions from H\"older spaces by these networks is crucial for understanding the efficiency of the induced learning algorithms. Although the topic has been well investigated in the setting of deep neural networks with many layers of hidden neurons, it is still open for shallow networks having only one hidden layer. In this paper, we provide rates of uniform approximation by these networks. We show that ReLU shallow neural networks with $m$ hidden neurons can uniformly approximate functions from the H\"older space $W_\infty^r([-1, 1]^d)$ with rates $O((\log m)^{\frac{1}{2}+d}m^{-\frac{r}{d}\frac{d+2}{d+4}})$ when $r < d/2 + 2$, which is very close to the optimal rate $O(m^{-\frac{r}{d}})$ when the dimension $d$ is large.
    摘要 以修正线性单元(ReLU)激活的神经网络在深度学习的最新发展中起着核心作用。用这类网络逼近 Hölder 空间中函数的问题,对理解由此产生的学习算法的效率至关重要。该问题在多隐藏层深度网络的设定下已有充分研究,但对于只有一个隐藏层的浅层网络仍是开放的。本文给出了此类网络的一致逼近速率:具有 $m$ 个隐藏神经元的 ReLU 浅层网络可以在 $r < d/2 + 2$ 时以速率 $O((\log m)^{\frac{1}{2}+d}m^{-\frac{r}{d}\frac{d+2}{d+4}})$ 一致逼近 Hölder 空间 $W_\infty^r([-1, 1]^d)$ 中的函数,该速率在维度 $d$ 较大时非常接近最优速率 $O(m^{-\frac{r}{d}})$。
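
For readability, the approximation bound quoted in the abstract can be written as follows (the constant $C_{r,d}$ is implicit in the $O(\cdot)$ notation and is not made explicit here):

```latex
% For f in the Hölder space W_\infty^r([-1,1]^d) and ReLU networks f_m with
% one hidden layer of m neurons, when r < d/2 + 2:
\inf_{f_m}\ \| f - f_m \|_{L^\infty([-1,1]^d)}
  \;\le\; C_{r,d}\,(\log m)^{\frac{1}{2}+d}\, m^{-\frac{r}{d}\cdot\frac{d+2}{d+4}},
\qquad \text{compared with the optimal rate } m^{-\frac{r}{d}}.
```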

Information-theoretic Analysis of Test Data Sensitivity in Uncertainty

  • paper_url: http://arxiv.org/abs/2307.12456
  • repo_url: None
  • paper_authors: Futoshi Futami, Tomoharu Iwata
  • for: 这篇论文的目的是对 bayesian 推断中的不确定性进行量化,并分析了这种不确定性的两种类型:aleatoric 不确定性和 epistemic 不确定性。
  • methods: 该论文使用了 bayesian 推断,并rigorously 分解了 predictive uncertainty 到 two 种不确定性。它们分别表示数据生成过程中的内在随机性和数据不充分导致的多样性。
  • results: 该论文成功地定义了 uncertainty sensitivity,并extend 了现有的 bayesian meta-learning 分析。它首次显示了任务之间的新的sensitivity。
    Abstract Bayesian inference is often utilized for uncertainty quantification tasks. A recent analysis by Xu and Raginsky 2022 rigorously decomposed the predictive uncertainty in Bayesian inference into two uncertainties, called aleatoric and epistemic uncertainties, which represent the inherent randomness in the data-generating process and the variability due to insufficient data, respectively. They analyzed those uncertainties in an information-theoretic way, assuming that the model is well-specified and treating the model's parameters as latent variables. However, the existing information-theoretic analysis of uncertainty cannot explain the widely believed property of uncertainty, known as the sensitivity between the test and training data. It implies that when test data are similar to training data in some sense, the epistemic uncertainty should become small. In this work, we study such uncertainty sensitivity using our novel decomposition method for the predictive uncertainty. Our analysis successfully defines such sensitivity using information-theoretic quantities. Furthermore, we extend the existing analysis of Bayesian meta-learning and show the novel sensitivities among tasks for the first time.
    摘要 某些任务中,泊然推理 often 用于 uncertainty quantification 任务。据 Xu 和 Raginsky (2022)的分析, Bayesian 推理中的 predictive uncertainty 可以分为两种不确定性,即 aleatoric 和 epistemic 不确定性,它们表示数据生成过程中的内在随机性和数据不充分导致的多样性。他们通过信息论方式分析这些不确定性,假设模型是正确的并将模型参数看作隐藏变量。然而,现有的信息论分析不能解释uncertainty 中的一个广泛信奉的性质,即测试数据与训练数据之间的敏感性。这意味着当测试数据与训练数据相似时, epistemic 不确定性应该减少。在这项工作中,我们通过我们的新的分解方法来研究这种敏感性。我们的分析成功地定义了这种敏感性使用信息论量表示。此外,我们将 Bayesian meta-learning 的现有分析扩展到新的任务,并首次研究这些任务之间的新的敏感性。
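
As background, the information-theoretic decomposition this line of work starts from (in the well-specified Bayesian setting) splits the predictive entropy at a test input $x$ into aleatoric and epistemic parts; the notation below is the commonly used form rather than the paper's exact statement:

```latex
\underbrace{H\big(Y \mid X = x,\ \mathcal{D}\big)}_{\text{total predictive uncertainty}}
  \;=\;
\underbrace{\mathbb{E}_{\theta \sim p(\theta \mid \mathcal{D})}\big[\, H(Y \mid X = x,\ \theta)\,\big]}_{\text{aleatoric}}
  \;+\;
\underbrace{I\big(Y ;\ \theta \mid X = x,\ \mathcal{D}\big)}_{\text{epistemic}}
```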

DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces

  • paper_url: http://arxiv.org/abs/2307.12451
  • repo_url: https://github.com/ferg-lab/diamondback
  • paper_authors: Michael S. Jones, Kirill Shmilovich, Andrew L. Ferguson
  • for: 这个论文的目的是提出一种基于杂化的分子模型,以便在长时间步骤上模拟蛋白质的均质化和折叠过程。
  • methods: 这个论文使用了一种名为Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping(DiAMoNDBack)的杂化模型,用于从粗粒度模型中恢复到原子级模型。这种模型基于一种杂化扩散过程,通过 conditioned on the Cα trace和当地的蛋白质结构来生成原子级模型。
  • results: 这个论文的实验结果表明,DiAMoNDBack模型可以在蛋白质结构模拟中实现高水平的重建性,包括正确的键形成、避免侧链冲突以及配置态的多样性。此外,模型还可以在不同的蛋白质结构和 simulate 中进行传输性。
    Abstract Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long-time scales such as aggregation and folding. The reduced resolution realizes computational accelerations but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probability model to restore all-atom details to coarse-grained protein representations retaining only C{\alpha} coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the C{\alpha} trace and previously backmapped backbone and side chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side chain all-atom configurations consistent with the coarse-grained C{\alpha} trace. We train DiAMoNDBack over 65k+ structures from Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically-disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side chain clashes, and diversity of the generated side chain configurational states. We make DiAMoNDBack model publicly available as a free and open source Python package.
    摘要 习微观模型可以访问到长度和时间尺度,不可能由所有原子模型 achieve。它允许模拟在长时间尺度上发生的过程,如聚集和折叠。减少分辨率实现计算加速,但原子尺度的表示是完整理解机制的必要条件。回映是将高级别的分辨率复制到低级别的分辨率模型中的过程。在这种工作中,我们报道了Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping(Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping,简称DiAMoNDBack)作为一种推荐模型,用于在保留Cα坐标的情况下,将高级别的分辨率详细信息还原到低级别的分辨率模型中。这种推荐过程从蛋白质的N端开始,以每个残基为单位,通过Cα轨迹和以前已经回映的背bone和副链原子来驱动。本地和自适应的特点使得模型可以转移到不同的蛋白质上。由于杂化的推荐过程,模型可以生成一个真实的蛋白质详细配置,包括背bone和副链原子的全原子配置,并且与高级别的分辨率详细信息保持一致。我们在65000多个PDB结构数据集上训练了DiAMoNDBack模型,并在一个PDB测试集上验证了它。我们还在Protein Ensemble Database(PED)中的自发布蛋白质结构数据集、DE Shaw Research的分子动力学 simulations和减少分辨率 simulation数据上应用了这种模型。我们达到了当前最佳的重建性表现,包括正确的键形成、避免副链冲突和副链配置状态的多样性。我们将DiAMoNDBack模型作为一种免费和开源的Python包公开发布。

ProtoFL: Unsupervised Federated Learning via Prototypical Distillation

  • paper_url: http://arxiv.org/abs/2307.12450
  • repo_url: None
  • paper_authors: Hansol Kim, Youngjun Kwak, Minyoung Jung, Jinho Shin, Youngsung Kim, Changick Kim
  • for: 提高数据隐私保护和验证系统性能
  • methods: 提出了基于原型表示缩短的 federated learning(ProtoFL)和基于正则化流的本地一类分类器
  • results: 在五个广泛使用的标准评估 dataset 上,证明了我们提出的框架在先前Literature中的表现优于其他方法
    Abstract Federated learning (FL) is a promising approach for enhancing data privacy preservation, particularly for authentication systems. However, limited round communications, scarce representation, and scalability pose significant challenges to its deployment, hindering its full potential. In this paper, we propose 'ProtoFL', Prototypical Representation Distillation based unsupervised Federated Learning to enhance the representation power of a global model and reduce round communication costs. Additionally, we introduce a local one-class classifier based on normalizing flows to improve performance with limited data. Our study represents the first investigation of using FL to improve one-class classification performance. We conduct extensive experiments on five widely used benchmarks, namely MNIST, CIFAR-10, CIFAR-100, ImageNet-30, and Keystroke-Dynamics, to demonstrate the superior performance of our proposed framework over previous methods in the literature.
    摘要 federated learning (FL) 是一种有前途的方法,可以增强数据隐私保护,特别是 для 身份验证系统。然而,有限的回合通信,珍贵的表示和扩展性带来了大量的挑战,这些挑战妨碍了其全面发挥。在这篇论文中,我们提议了“ProtoFL”,基于原型表示抽象的 Federated Learning,以提高全球模型的表示力并降低回合通信成本。此外,我们还引入了一种基于正规流的本地一类分类器,以提高有限数据下的性能。这是文献中第一篇使用 Federated Learning 提高一类分类性能的研究。我们在五个广泛使用的 benchmark 上进行了广泛的实验,包括 MNIST、CIFAR-10、CIFAR-100、ImageNet-30 和 Keystroke-Dynamics,以示出我们提posed框架的超过先前方法的优秀性。

WEPRO: Weight Prediction for Efficient Optimization of Hybrid Quantum-Classical Algorithms

  • paper_url: http://arxiv.org/abs/2307.12449
  • repo_url: None
  • paper_authors: Satwik Kundu, Debarshi Kundu, Swaroop Ghosh
  • for: 加速量子 neural network 和量子矩阵问题的训练,提高量子矩阵算法的精度和效率。
  • methods: 提出了一种新的方法 called WEPRO,利用量子矩阵参数 weights 的常见趋势来加速量子矩阵的训练。并提出了两种优化技术 Naive Prediction 和 Adaptive Prediction。
  • results: 通过对多个量子 neural network 模型的训练和测试,显示了 WEPRO 可以提高训练速度约 2.25 倍,同时提高精度和预测性能,且具有低存储和计算开销。在量子矩阵问题中,WEPRO 也能够提高训练速度和精度。
    Abstract The exponential run time of quantum simulators on classical machines and long queue depths and high costs of real quantum devices present significant challenges in the effective training of Variational Quantum Algorithms (VQAs) like Quantum Neural Networks (QNNs), Variational Quantum Eigensolver (VQE) and Quantum Approximate Optimization Algorithm (QAOA). To address these limitations, we propose a new approach, WEPRO (Weight Prediction), which accelerates the convergence of VQAs by exploiting regular trends in the parameter weights. We introduce two techniques for optimal prediction performance namely, Naive Prediction (NaP) and Adaptive Prediction (AdaP). Through extensive experimentation and training of multiple QNN models on various datasets, we demonstrate that WEPRO offers a speedup of approximately $2.25\times$ compared to standard training methods, while also providing improved accuracy (up to $2.3\%$ higher) and loss (up to $6.1\%$ lower) with low storage and computational overheads. We also evaluate WEPRO's effectiveness in VQE for molecular ground-state energy estimation and in QAOA for graph MaxCut. Our results show that WEPRO leads to speed improvements of up to $3.1\times$ for VQE and $2.91\times$ for QAOA, compared to traditional optimization techniques, while using up to $3.3\times$ less number of shots (i.e., repeated circuit executions) per training iteration.
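
WEPRO's Naive and Adaptive Prediction rules are only named in the abstract. The sketch below shows the general idea of periodically extrapolating each variational parameter from its recent trajectory; the linear least-squares rule, lookahead, and toy trajectory are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def extrapolate_params(history, lookahead=3):
    """Given a short history of parameter vectors (one row per optimizer step),
    linearly extrapolate each parameter 'lookahead' steps forward. A stand-in
    for WEPRO-style weight prediction; the real method adapts this rule."""
    history = np.asarray(history)
    steps = np.arange(len(history))
    slope, intercept = np.polyfit(steps, history, deg=1)   # per-parameter fit
    return slope * (len(history) - 1 + lookahead) + intercept

# toy trajectory of 4 variational parameters drifting with noise
rng = np.random.default_rng(0)
true_drift = np.array([0.05, -0.02, 0.0, 0.01])
traj = [np.zeros(4)]
for _ in range(9):
    traj.append(traj[-1] + true_drift + rng.normal(scale=0.005, size=4))

predicted = extrapolate_params(traj, lookahead=3)
print("predicted params:", np.round(predicted, 3))
```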

Multifidelity Covariance Estimation via Regression on the Manifold of Symmetric Positive Definite Matrices

  • paper_url: http://arxiv.org/abs/2307.12438
  • repo_url: None
  • paper_authors: Aimee Maurais, Terrence Alsup, Benjamin Peherstorfer, Youssef Marzouk
  • for: 这篇论文是为了提出一种多信度估计器,用于估计协方差矩阵。
  • methods: 这篇论文使用了拟合问题的方法,在协方差矩阵的拟合空间上进行估计。
  • results: 论文的实验结果表明,使用这种多信度估计器可以大幅降低估计误差,相比单信度和其他多信度估计器。此外,这种估计器还保持了正定定义性,使其适用于后续任务,如数据吸收和 метри学学习。
    Abstract We introduce a multifidelity estimator of covariance matrices formulated as the solution to a regression problem on the manifold of symmetric positive definite matrices. The estimator is positive definite by construction, and the Mahalanobis distance minimized to obtain it possesses properties which enable practical computation. We show that our manifold regression multifidelity (MRMF) covariance estimator is a maximum likelihood estimator under a certain error model on manifold tangent space. More broadly, we show that our Riemannian regression framework encompasses existing multifidelity covariance estimators constructed from control variates. We demonstrate via numerical examples that our estimator can provide significant decreases, up to one order of magnitude, in squared estimation error relative to both single-fidelity and other multifidelity covariance estimators. Furthermore, preservation of positive definiteness ensures that our estimator is compatible with downstream tasks, such as data assimilation and metric learning, in which this property is essential.
    摘要 我们介绍一个多域确度估计器,它是解决在对称正定矩阵构造的应变问题中的解。这个估计器由建构而成,并且在 Mahalanobis 距离下实现了实用的计算。我们显示了我们的数据融合多域确度估计器(MRMF)是一个最大 LIKELIHOOD 估计器,在某些错误模型上的拓扑 tangent space 上。更一般地说,我们的里敦热投影框架包含了现有的多域确度估计器,它们是由控制值构成的。我们通过数据示例显示了我们的估计器可以对单域和其他多域确度估计器的平方误差做出很大减少,达到一个次的减少。此外,保持正定性的保证,使得我们的估计器可以与下游任务,如数据融合和度量学习,进行Compatible。
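
The MRMF estimator itself is a regression on the SPD manifold and is not reproduced here; the snippet only illustrates the kind of manifold operation involved, combining covariance estimates through the matrix logarithm so that the result is symmetric positive definite by construction. The log-Euclidean combination and the weights are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import logm, expm

def log_euclidean_combination(covs, weights):
    """Weighted combination of SPD matrices in the log-Euclidean sense:
    exp(sum_i w_i log(C_i)). The output stays symmetric positive definite."""
    acc = sum(w * logm(c) for w, c in zip(weights, covs))
    acc = np.real(acc)
    acc = (acc + acc.T) / 2            # clean up numerical asymmetry
    return expm(acc)

rng = np.random.default_rng(0)
def random_spd(d=4):
    a = rng.normal(size=(d, d))
    return a @ a.T + d * np.eye(d)

covs = [random_spd(), random_spd(), random_spd()]
combined = log_euclidean_combination(covs, weights=[0.5, 0.3, 0.2])
print("min eigenvalue:", np.linalg.eigvalsh(combined).min())  # strictly positive
```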

A Generalized Schwarz-type Non-overlapping Domain Decomposition Method using Physics-constrained Neural Networks

  • paper_url: http://arxiv.org/abs/2307.12435
  • repo_url: https://github.com/hipersimlab/pecann
  • paper_authors: Shamsulhaq Basir, Inanc Senocak
  • for: The paper is written for solving forward and inverse problems involving partial differential equations (PDEs) using a meshless Schwarz-type non-overlapping domain decomposition method based on artificial neural networks.
  • methods: The paper uses a generalized Robin-type interface condition, where unique Robin parameters are assigned to each subdomain and learned to minimize the mismatch on the Robin interface condition. The method uses an independent neural network model trained to minimize the loss on the governing PDE while strictly enforcing boundary and interface conditions through an augmented Lagrangian formalism.
  • results: The paper demonstrates the versatility and performance of the proposed approach through extensive experiments on forward and inverse problems, including one-way and two-way decompositions with crosspoints. The learned Robin parameters adapt to the local behavior of the solution, domain partitioning, and subdomain location relative to the overall domain.
    Abstract We present a meshless Schwarz-type non-overlapping domain decomposition method based on artificial neural networks for solving forward and inverse problems involving partial differential equations (PDEs). To ensure the consistency of solutions across neighboring subdomains, we adopt a generalized Robin-type interface condition, assigning unique Robin parameters to each subdomain. These subdomain-specific Robin parameters are learned to minimize the mismatch on the Robin interface condition, facilitating efficient information exchange during training. Our method is applicable to both the Laplace's and Helmholtz equations. It represents local solutions by an independent neural network model which is trained to minimize the loss on the governing PDE while strictly enforcing boundary and interface conditions through an augmented Lagrangian formalism. A key strength of our method lies in its ability to learn a Robin parameter for each subdomain, thereby enhancing information exchange with its neighboring subdomains. We observe that the learned Robin parameters adapt to the local behavior of the solution, domain partitioning and subdomain location relative to the overall domain. Extensive experiments on forward and inverse problems, including one-way and two-way decompositions with crosspoints, demonstrate the versatility and performance of our proposed approach.
    摘要 我们提出了一种无缝Schwarz类非重叠域分解方法,基于人工神经网络来解决部分� differential 方程(PDEs)中的前向和反向问题。为确保邻居子域解的一致性,我们采用一种通用的Robin类型界面条件,将每个子域分配特定的Robin参数。这些子域特定的Robin参数通过在训练中最小化Robin界面条件的差异,以便有效地交换信息。我们的方法适用于拉普拉斯方程和哈尔曼方程。它使用独立的神经网络模型来表示本地解,并通过一种扩展的拉格朗日 formalism来严格执行边界和界面条件。我们发现,我们的方法可以学习每个子域的Robin参数,从而提高邻居子域之间信息交换的能力。我们在多个实验中证明了我们的提出的方法的多样性和性能。这些实验包括一个方向和二个方向的分解,以及跨点的分解。
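
A minimal PyTorch sketch of the generalized Robin interface term for two subdomain networks meeting at a single 1D interface point. The learnable per-subdomain Robin parameters follow the abstract; the 1D setting, the specific mismatch form, and the omission of the PDE residual and augmented Lagrangian terms are simplifications assumed for illustration.

```python
import torch
import torch.nn as nn

def mlp():
    return nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

u1, u2 = mlp(), mlp()                       # one network per subdomain
alpha1 = nn.Parameter(torch.tensor(1.0))    # learnable Robin parameter, subdomain 1
alpha2 = nn.Parameter(torch.tensor(1.0))    # learnable Robin parameter, subdomain 2

def value_and_normal_grad(net, x, normal):
    x = x.clone().requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    return u, normal * du                   # du/dn with outward normal +-1

def robin_interface_loss(x_gamma):
    """Mismatch of the generalized Robin condition du/dn + alpha * u,
    enforced from both sides of the shared interface points x_gamma."""
    u1_g, dn1 = value_and_normal_grad(u1, x_gamma, normal=+1.0)
    u2_g, dn2 = value_and_normal_grad(u2, x_gamma, normal=-1.0)
    r1 = dn1 + alpha1 * u1_g                # condition as seen from subdomain 1
    r2 = -dn2 + alpha1 * u2_g
    s1 = dn2 + alpha2 * u2_g                # condition as seen from subdomain 2
    s2 = -dn1 + alpha2 * u1_g
    return ((r1 - r2) ** 2).mean() + ((s1 - s2) ** 2).mean()

x_interface = torch.full((16, 1), 0.5)      # interface at x = 0.5
loss = robin_interface_loss(x_interface)
loss.backward()
print(float(loss))
```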

Augmented Box Replay: Overcoming Foreground Shift for Incremental Object Detection

  • paper_url: http://arxiv.org/abs/2307.12427
  • repo_url: https://github.com/YuyangSunshine/ABR_IOD
  • paper_authors: Liu Yuyang, Cong Yang, Goswami Dipam, Liu Xialei, Joost van de Weijer
  • for: 这篇论文的目的是解决在增量学习中的忘记问题,特别是在增量物体检测(IOD)领域中,通过重新播放之前任务的图像和当前任务的图像。
  • methods: 这篇论文提出了一种新的和高效的增量Box Replay(ABR)方法,该方法仅将前一任务的背景图像中的前景物体存储并重新播放,从而解决了背景shift问题。此外,该论文还提出了一种新的注意力捕捉的RoI填充损失,该损失使当前模型在旧模型中捕捉到重要信息。
  • results: 实验结果表明,ABR方法可以有效地避免忘记前一任务的类别,同时保持当前任务的柔软性。此外,ABR方法还可以减少存储需求,并在 Pascal-VOC 和 COCO 数据集上达到了顶尖性能。
    Abstract In incremental learning, replaying stored samples from previous tasks together with current task samples is one of the most efficient approaches to address catastrophic forgetting. However, unlike incremental classification, image replay has not been successfully applied to incremental object detection (IOD). In this paper, we identify the overlooked problem of foreground shift as the main reason for this. Foreground shift only occurs when replaying images of previous tasks and refers to the fact that their background might contain foreground objects of the current task. To overcome this problem, a novel and efficient Augmented Box Replay (ABR) method is developed that only stores and replays foreground objects and thereby circumvents the foreground shift problem. In addition, we propose an innovative Attentive RoI Distillation loss that uses spatial attention from region-of-interest (RoI) features to constrain current model to focus on the most important information from old model. ABR significantly reduces forgetting of previous classes while maintaining high plasticity in current classes. Moreover, it considerably reduces the storage requirements when compared to standard image replay. Comprehensive experiments on Pascal-VOC and COCO datasets support the state-of-the-art performance of our model.
    摘要 增量学习中,重新播放之前任务中的样本和当前任务中的样本是解决忘却折架的最有效方法之一。然而,与增量分类不同,图像重新播放在增量物体检测(IOD)中尚未得到成功。在这篇论文中,我们认为过looked problem of foreground shift是主要的原因。foreground shift只发生在重新播放之前任务的图像时,并且指的是这些背景可能包含当前任务中的前景对象。为解决这个问题,我们开发了一种新的和高效的增强盒子重新播放(ABR)方法,只将前景对象存储和重新播放,因此绕过了前景shift问题。此外,我们提出了一种创新的关注点 RoI 特征整合损失,使当前模型从老模型中提取最重要的信息,并将其用于现有模型的约束。ABR 能够减少之前类型的忘却,同时保持当前类型的高柔性。此外,它也可以significantly reduce the storage requirements when compared to standard image replay。我们在 Pascal-VOC 和 COCO 数据集上进行了全面的实验,并证明了我们的模型的状态-of-the-art表现。

  • paper_url: http://arxiv.org/abs/2307.12417
  • repo_url: None
  • paper_authors: Kasidis Arunruangsirilert, Jiro Katto
  • for: 这个论文的目的是预测5G NR网络中用户设备(UE)的未来上行吞吐量,以优化用户体验(QoE)。
  • methods: 这个论文使用ConvLSTM神经网络预测未来上行吞吐量,基于过去的上行吞吐量和RF参数。网络通过实际的5G SA网络驱动测试数据进行训练,并限制了模型只使用Android API中提供的信息。
  • results: 这个论文的结果表明,使用ConvLSTM神经网络预测未来上行吞吐量的方法可以达到98.9%的准确率, average RMSE为1.80 Mbps。
    Abstract While the 5G New Radio (NR) network promises a huge uplift of the uplink throughput, the improvement can only be seen when the User Equipment (UE) is connected to the high-frequency millimeter wave (mmWave) band. With the rise of uplink-intensive smartphone applications such as the real-time transmission of UHD 4K/8K videos, and Virtual Reality (VR)/Augmented Reality (AR) contents, uplink throughput prediction plays a huge role in maximizing the users' quality of experience (QoE). In this paper, we propose using a ConvLSTM-based neural network to predict the future uplink throughput based on past uplink throughput and RF parameters. The network is trained using the data from real-world drive tests on commercial 5G SA networks while riding commuter trains, which accounted for various frequency bands, handover, and blind spots. To make sure our model can be practically implemented, we then limited our model to only use the information available via Android API, then evaluate our model using the data from both commuter trains and other methods of transportation. The results show that our model reaches an average prediction accuracy of 98.9\% with an average RMSE of 1.80 Mbps across all unseen evaluation scenarios.
    摘要 5G新Radio(NR)网络承诺会带来巨大的上行吞吐量提高,但是这种提高只能在用户设备(UE)与高频毫米波(mmWave)频率带连接时得到。随着上行吞吐量占用应用程序如实时传输UHD 4K/8K视频和虚拟现实(VR)/增强现实(AR)内容的普及,上行吞吐量预测在maximizing用户体验质量(QoE)中扮演着关键的角色。在这篇论文中,我们提议使用ConvLSTM神经网络预测未来上行吞吐量,基于过去上行吞吐量和RF参数。网络通过实际驱动测试数据 collected from commercial 5G SA 网络而验证,该数据包括不同频率带、过渡和隐私。为确保我们的模型能够实际应用,我们然后限制了我们的模型仅使用可以通过 Android API 获得的信息。我们使用了不同交通工具进行评估,并发现我们的模型在所有未seen评估场景中达到了98.9%的预测精度,平均Relative Mean Squared Error(RMSE)为1.80 Mbps。
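
The paper trains a ConvLSTM on drive-test data; the sketch below substitutes a plain LSTM regressor over a window of past throughput and RF measurements purely to show the sequence-to-one prediction setup. The feature set, window length, and architecture sizes are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class ThroughputPredictor(nn.Module):
    """Sequence-to-one regressor: a window of past [throughput, RSRP, RSRQ, SINR]
    samples -> predicted uplink throughput for the next interval. A plain LSTM
    stands in here for the ConvLSTM used in the paper."""
    def __init__(self, n_features=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):              # x: (batch, window, n_features)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])        # (batch, 1) predicted Mbps

model = ThroughputPredictor()
window = torch.randn(8, 20, 4)         # 8 samples, 20 past measurements each
pred = model(window)
loss = nn.functional.mse_loss(pred, torch.rand(8, 1) * 100)
loss.backward()
print(pred.shape)
```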

A Machine Learning Approach to Two-Stage Adaptive Robust Optimization

  • paper_url: http://arxiv.org/abs/2307.12409
  • repo_url: https://github.com/molyswu/hand_detection
  • paper_authors: Dimitris Bertsimas, Cheol Woo Kim
  • for: 解决两阶段线性适应Robust优化问题(ARO)中的 binary 现在变量和多面uncertainty sets问题。
  • methods: 使用机器学习方法,编码优化的现在决策、最差情况相关的优化决策和等待决策为策略。使用列和约束生成算法提取优化策略,并使用机器学习模型预测高质量策略。
  • results: 应用方法到facility location、multi-item 存储控制和单位启动问题,可以快速解决ARO问题,高精度。
    Abstract We propose an approach based on machine learning to solve two-stage linear adaptive robust optimization (ARO) problems with binary here-and-now variables and polyhedral uncertainty sets. We encode the optimal here-and-now decisions, the worst-case scenarios associated with the optimal here-and-now decisions, and the optimal wait-and-see decisions into what we denote as the strategy. We solve multiple similar ARO instances in advance using the column and constraint generation algorithm and extract the optimal strategies to generate a training set. We train a machine learning model that predicts high-quality strategies for the here-and-now decisions, the worst-case scenarios associated with the optimal here-and-now decisions, and the wait-and-see decisions. We also introduce an algorithm to reduce the number of different target classes the machine learning algorithm needs to be trained on. We apply the proposed approach to the facility location, the multi-item inventory control and the unit commitment problems. Our approach solves ARO problems drastically faster than the state-of-the-art algorithms with high accuracy.
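
A toy sketch of the learning step described above: offline, instance parameters are mapped to the index of an optimal strategy (labels that, in the paper, come from solving many ARO instances with column-and-constraint generation; here they are synthetic), and online a classifier predicts the strategy for a new instance. The random-forest model is a stand-in for the machine learning model used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Offline phase (stand-in): instance parameters -> index of the optimal strategy.
X_train = rng.uniform(size=(500, 10))              # e.g. demands, costs
y_train = (X_train[:, 0] + X_train[:, 1] > 1).astype(int) + \
          2 * (X_train[:, 2] > 0.5).astype(int)    # 4 synthetic strategy classes

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Online phase: predicting a strategy takes milliseconds; the corresponding
# here-and-now decisions can then be fixed and checked for feasibility.
x_new = rng.uniform(size=(1, 10))
print("predicted strategy index:", clf.predict(x_new)[0])
```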

Optimal Control of Multiclass Fluid Queueing Networks: A Machine Learning Approach

  • paper_url: http://arxiv.org/abs/2307.12405
  • repo_url: None
  • paper_authors: Dimitris Bertsimas, Cheol Woo Kim
  • for: 这篇论文是为了提出一种机器学习方法来控制多类流体队列网络(MFQNET),并提供了明确和深入的控制策略。
  • methods: 这篇论文使用了优化类型的机器学习方法,即Optimal Classification Trees with hyperplane splits(OCT-H)来学习MFQNET的控制策略。
  • results: 实验结果表明,使用OCT-H学习的控制策略可以在大规模网络中实现100%的准确率,而在线应用只需几毫秒。
    Abstract We propose a machine learning approach to the optimal control of multiclass fluid queueing networks (MFQNETs) that provides explicit and insightful control policies. We prove that a threshold type optimal policy exists for MFQNET control problems, where the threshold curves are hyperplanes passing through the origin. We use Optimal Classification Trees with hyperplane splits (OCT-H) to learn an optimal control policy for MFQNETs. We use numerical solutions of MFQNET control problems as a training set and apply OCT-H to learn explicit control policies. We report experimental results with up to 33 servers and 99 classes that demonstrate that the learned policies achieve 100\% accuracy on the test set. While the offline training of OCT-H can take days in large networks, the online application takes milliseconds.
    摘要 我们提出了一种机器学习方法来优化多类流体队列网络(MFQNET)的控制问题,该方法提供了明确和深入的控制策略。我们证明了多类流体队列网络控制问题中存在一种阈值类型的优化策略,其阈值曲线都是通过起点的 hyperplanes。我们使用Optimal Classification Trees with hyperplane splits(OCT-H)来学习MFQNET的控制策略。我们使用 numerically solved MFQNET control problems作为训练集,并通过OCT-H来学习明确的控制策略。我们发现在33个服务器和99个类型的 эксперименталь结果中,学习的策略可以达到100%的准确率。虽然在大型网络中的离线训练可能需要几天的时间,但在线应用只需毫秒钟。

Uncertainty-aware Grounded Action Transformation towards Sim-to-Real Transfer for Traffic Signal Control

  • paper_url: http://arxiv.org/abs/2307.12388
  • repo_url: None
  • paper_authors: Longchao Da, Hao Mei, Romir Sharma, Hua Wei
  • for: 提高RL在实际道路上的应用性能
  • methods: 使用 simulations-to-real-world(sim-to-real)转移方法,动态将模拟环境中学习的策略转移到实际环境中,以抑制域的差异
  • results: 在模拟交通环境中评估了UGAT方法,并显示其在实际环境中显著提高RL策略的性能
    Abstract Traffic signal control (TSC) is a complex and important task that affects the daily lives of millions of people. Reinforcement Learning (RL) has shown promising results in optimizing traffic signal control, but current RL-based TSC methods are mainly trained in simulation and suffer from the performance gap between simulation and the real world. In this paper, we propose a simulation-to-real-world (sim-to-real) transfer approach called UGAT, which transfers a learned policy trained from a simulated environment to a real-world environment by dynamically transforming actions in the simulation with uncertainty to mitigate the domain gap of transition dynamics. We evaluate our method on a simulated traffic environment and show that it significantly improves the performance of the transferred RL policy in the real world.
    摘要 交通信号控制(TSC)是一项复杂重要的任务,影响了数百万人的日常生活。强化学习(RL)已经在优化交通信号控制方面显示了扎实的成果,但现有RL基于TSC方法主要在模拟环境中训练,它们在真实世界中的性能差距很大。在这篇论文中,我们提出了一种从模拟环境到真实世界(sim-to-real)传输方法,称为UGAT,它可以在模拟环境中学习的策略在真实世界中被转移并且在不同的环境中保持良好的性能。我们对一个模拟交通环境进行了评估,并证明了UGAT方法在真实世界中可以大幅提高RL策略的性能。

In-Context Learning in Large Language Models Learns Label Relationships but Is Not Conventional Learning

  • paper_url: http://arxiv.org/abs/2307.12375
  • repo_url: None
  • paper_authors: Jannik Kossen, Tom Rainforth, Yarin Gal
  • for: 本研究旨在 investigating large language models (LLMs) 在下游任务中的启发式学习能力,特别是如何在 Context 中提供的示例对 Label 之间的关系对 LLMs 的预测造成影响。
  • methods: 本研究使用了一种 combine 的方法,包括分析 LLMs 在预训练和 Context 中的行为,以及如何将 Context 中的示例和 Label 相互关联。
  • results: 研究发现,LLMs 通常会在 Context 中使用示例 Label 的信息,但是预训练和 Context 中的 Label 关系是不同的,并且模型不会对所有 Context 中的信息进行平等考虑。这些结论可以帮助我们更好地理解和调节 LLM 的行为。
    Abstract The performance of Large Language Models (LLMs) on downstream tasks often improves significantly when including examples of the input-label relationship in the context. However, there is currently no consensus about how this in-context learning (ICL) ability of LLMs works: for example, while Xie et al. (2021) liken ICL to a general-purpose learning algorithm, Min et al. (2022b) argue ICL does not even learn label relationships from in-context examples. In this paper, we study (1) how labels of in-context examples affect predictions, (2) how label relationships learned during pre-training interact with input-label examples provided in-context, and (3) how ICL aggregates label information across in-context examples. Our findings suggests LLMs usually incorporate information from in-context labels, but that pre-training and in-context label relationships are treated differently, and that the model does not consider all in-context information equally. Our results give insights into understanding and aligning LLM behavior.
    摘要 当上下文中包含输入-标签关系的示例时,大型语言模型(LLM)在下游任务上的表现往往会显著提升。然而,目前对于LLM的这种上下文学习(ICL)能力如何运作尚无共识:例如,Xie et al. (2021) 将ICL类比为一种通用学习算法,而 Min et al. (2022b) 则认为ICL甚至不会从上下文示例中学习标签关系。在这篇文章中,我们研究了以下几点:1. 上下文示例的标签如何影响预测结果;2. 预训练中学到的标签关系如何与上下文中提供的输入-标签示例相互作用;3. ICL如何在多个上下文示例之间聚合标签信息。我们的发现表明,LLM通常会利用上下文标签中的信息,但预训练和上下文中的标签关系被区别对待,而且模型并不会对所有上下文信息一视同仁。这些结果有助于理解并对齐LLM的行为。

Assessing Intra-class Diversity and Quality of Synthetically Generated Images in a Biomedical and Non-biomedical Setting

  • paper_url: http://arxiv.org/abs/2308.02505
  • repo_url: None
  • paper_authors: Muhammad Muneeb Saad, Mubashir Husain Rehmani, Ruairi O’Reilly
  • for: 这种研究是为了评估生成 adversarial Networks (GANs) 在生成难以 obtain 的医学影像数据 augmentation 任务中的效果。
  • methods: 这种研究使用了多Scale Structural Similarity Index Measure 和 Cosine Distance 评估生成图像的内部多样性,以及 Frechet Inception Distance 评估生成图像的质量。
  • results: 研究发现,在不同的医学影像模式下,生成图像的多样性和质量得分异常大。不同的采样大小也会影响生成图像的质量和多样性。这种研究旨在探讨生成图像的多样性和质量在医学和非医学影像模式之间的差异。
    Abstract In biomedical image analysis, data imbalance is common across several imaging modalities. Data augmentation is one of the key solutions in addressing this limitation. Generative Adversarial Networks (GANs) are increasingly being relied upon for data augmentation tasks. Biomedical image features are sensitive to evaluating the efficacy of synthetic images. These features can have a significant impact on metric scores when evaluating synthetic images across different biomedical imaging modalities. Synthetically generated images can be evaluated by comparing the diversity and quality of real images. Multi-scale Structural Similarity Index Measure and Cosine Distance are used to evaluate intra-class diversity, while Frechet Inception Distance is used to evaluate the quality of synthetic images. Assessing these metrics for biomedical and non-biomedical imaging is important to investigate an informed strategy in evaluating the diversity and quality of synthetic images. In this work, an empirical assessment of these metrics is conducted for the Deep Convolutional GAN in a biomedical and non-biomedical setting. The diversity and quality of synthetic images are evaluated using different sample sizes. This research intends to investigate the variance in diversity and quality across biomedical and non-biomedical imaging modalities. Results demonstrate that the metrics scores for diversity and quality vary significantly across biomedical-to-biomedical and biomedical-to-non-biomedical imaging modalities.
    摘要 在生物医学影像分析中,数据偏好是广泛存在的问题,而生成对抗网络(GANs)在解决这一问题上逐渐被广泛应用。生物医学影像特征对评估合成图像的效果非常敏感,这些特征可以带来评估合成图像的纪录分数变化。合成图像可以通过与真实图像进行比较来评估其多样性和质量。在不同的生物医学成像modalities中,使用多尺度结构相似性指标和夹角距离来评估内类多样性,而使用彩色征波距离来评估合成图像的质量。为了调查这些指标在生物医学和非生物医学成像modalities中的效果,这里进行了一项实验性的评估。研究表明,在不同的生物医学和非生物医学成像modalities中,多样性和质量指标的分数差异很大。

Early Prediction of Alzheimers Disease Leveraging Symptom Occurrences from Longitudinal Electronic Health Records of US Military Veterans

  • paper_url: http://arxiv.org/abs/2307.12369
  • repo_url: None
  • paper_authors: Rumeng Li, Xun Wang, Dan Berlowitz, Brian Silver, Wen Hu, Heather Keating, Raelene Goodwin, Weisong Liu, Honghuang Lin, Hong Yu
  • for: 预测阿尔茨海默病(AD)的早期诊断非常重要,以便于时间性的 intervención和治疗。这项研究使用机器学习方法分析患者的长期电子医疗纪录(EHR),以找出可以预测AD诊断的标志和症状。
  • methods: 这项研究使用了一种 случа控制设计,使用从2004年到2021年的美国卫生部卫生管理局(VHA)的长期EHR数据进行分析。实验中的患者是由2016年以后根据ICD-10-CM代码诊断出的AD患者,与年龄、性别和临床使用相同的9名控制人进行匹配。研究使用了AD相关关键词的时间序列分析,以预测AD诊断。
  • results: 研究发现,患者的AD相关关键词的时间序列分析可以预测AD诊断,特别是在诊断附近的时间段。模型的拟合度(ROCAUC)为0.997,准确性很高。研究还发现,年龄、性别和种族/民族子组的预测结果几乎一致,只有年龄 younger than 65的 subgroup的预测结果不太准确(ROCAUC 0.746)。这种机器学习模型可以使用EHR数据预测AD诊断,提供一种可靠的、便宜的方式,用于早期诊断大量人口。
    Abstract Early prediction of Alzheimer's disease (AD) is crucial for timely intervention and treatment. This study aims to use machine learning approaches to analyze longitudinal electronic health records (EHRs) of patients with AD and identify signs and symptoms that can predict AD onset earlier. We used a case-control design with longitudinal EHRs from the U.S. Department of Veterans Affairs Veterans Health Administration (VHA) from 2004 to 2021. Cases were VHA patients with AD diagnosed after 1/1/2016 based on ICD-10-CM codes, matched 1:9 with controls by age, sex and clinical utilization with replacement. We used a panel of AD-related keywords and their occurrences over time in a patient's longitudinal EHRs as predictors for AD prediction with four machine learning models. We performed subgroup analyses by age, sex, and race/ethnicity, and validated the model in a hold-out and "unseen" VHA stations group. Model discrimination, calibration, and other relevant metrics were reported for predictions up to ten years before ICD-based diagnosis. The study population included 16,701 cases and 39,097 matched controls. The average number of AD-related keywords (e.g., "concentration", "speaking") per year increased rapidly for cases as diagnosis approached, from around 10 to over 40, while remaining flat at 10 for controls. The best model achieved high discriminative accuracy (ROCAUC 0.997) for predictions using data from at least ten years before ICD-based diagnoses. The model was well-calibrated (Hosmer-Lemeshow goodness-of-fit p-value = 0.99) and consistent across subgroups of age, sex and race/ethnicity, except for patients younger than 65 (ROCAUC 0.746). Machine learning models using AD-related keywords identified from EHR notes can predict future AD diagnoses, suggesting its potential use for identifying AD risk using EHR notes, offering an affordable way for early screening on large population.
    摘要 早期预测阿尔ツ海默病(AD)是非常重要,以便在时间上采取措施和治疗。这项研究目的是使用机器学习方法分析患者的长期电子医疗记录(EHR),以确定患者患阿尔ツ海默病的预测指标。我们采用了一种case-control设计,使用2004年至2021年美国卫生部老年军人医疗管理局(VHA)的长期EHR数据,确定患者是在2016年1月1日后被诊断为阿尔ツ海默病(根据ICD-10-CM代码),并与年龄、性别和临床使用相同的9名控制人群进行匹配。我们使用了一组阿尔ツ海默病相关关键词,并跟踪这些词语在患者的长期EHR记录中的出现情况,采用4种机器学习模型进行预测。我们进行了年龄、性别和种族/民族 subgroup分析,并在一个“隐藏”的VHA站点上验证模型。模型的准确率、均衡和其他相关指标都被报告,用于预测距离ICD-基本诊断的诊断。研究人口包括16,701例患者和39,097名匹配的控制人群。患者的AD相关关键词每年增加速度很快,从约10个增加到超过40个,而控制人群保持在10个,而且在诊断 approached时,AD相关关键词的增加速度加剧。最佳模型在使用至少10年之前的ICD-基本诊断数据时,达到了0.997的报告准确率。模型均衡好(Hosmer-Lemeshow准确度测试值为0.99),并在年龄、性别和种族/民族 subgroup中保持一致,除了年龄小于65岁的患者(ROCAUC为0.746)。机器学习模型使用从EHR记录中提取的AD相关关键词可以预测未来的AD诊断,表明其可能用于通过EHR记录来预测AD风险,提供一种可靠且有效的大规模屏检方式。
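
A simplified sketch of the feature construction implied by the abstract: yearly counts of AD-related keywords per patient, fed to a classifier. The keyword list, three-year window, toy notes, and logistic-regression model are illustrative assumptions rather than the study's actual pipeline or models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

KEYWORDS = ["concentration", "speaking", "memory", "confusion"]  # illustrative

def yearly_keyword_counts(notes_by_year, keywords=KEYWORDS):
    """notes_by_year: list of note texts, one string per year (oldest first).
    Returns a flat feature vector of per-year keyword counts."""
    feats = [sum(note.lower().count(k) for k in keywords) for note in notes_by_year]
    return np.array(feats, dtype=float)

# toy cohort: 3 years of notes per patient, label 1 = later AD diagnosis
patients = [
    (["routine visit", "memory lapses noted", "confusion and trouble speaking"], 1),
    (["knee pain", "follow-up knee pain", "hypertension stable"], 0),
    (["memory ok", "reports poor concentration", "worsening memory, confusion"], 1),
    (["annual physical", "annual physical", "annual physical"], 0),
]
X = np.stack([yearly_keyword_counts(notes) for notes, _ in patients])
y = np.array([label for _, label in patients])

clf = LogisticRegression().fit(X, y)
print("in-sample predictions:", clf.predict(X))
```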