cs.LG - 2023-08-07

Improving FHB Screening in Wheat Breeding Using an Efficient Transformer Model

  • paper_url: http://arxiv.org/abs/2308.03670
  • repo_url: None
  • paper_authors: Babak Azad, Ahmed Abdalla, Kwanghee Won, Ali Mirzakhani Nafchi
  • For: Early detection of Fusarium head blight (FHB) in wheat and barley breeding programs.
  • Methods: Proposes a new Context Bridge that integrates the local representation capability of the U-Net network into the transformer model, and replaces the standard attention mechanism with Efficient Self-attention.
  • Results: Extensive experiments across typical plant image segmentation tasks demonstrate that the proposed transformer-based method is effective for FHB disease detection.
    Abstract Fusarium head blight is a devastating disease that causes significant economic losses annually on small grains. Efficiency, accuracy, and timely detection of FHB in the resistance screening are critical for wheat and barley breeding programs. In recent years, various image processing techniques have been developed using supervised machine learning algorithms for the early detection of FHB. The state-of-the-art convolutional neural network-based methods, such as U-Net, employ a series of encoding blocks to create a local representation and a series of decoding blocks to capture the semantic relations. However, these methods are often not capable of modeling long-range dependencies inside the input data, and their ability to model multi-scale objects with significant variations in texture and shape is limited. Vision transformers, as alternative architectures with innate global self-attention mechanisms for sequence-to-sequence prediction, may also have limited localization capabilities due to insufficient low-level details. To overcome these limitations, a new Context Bridge is proposed to integrate the local representation capability of the U-Net network in the transformer model. In addition, the standard attention mechanism of the original transformer is replaced with Efficient Self-attention, which is less complicated than other state-of-the-art methods. To train the proposed network, 12,000 wheat images from an FHB-inoculated wheat field at the SDSU research farm in Volga, SD, were captured. In addition to healthy and unhealthy plants, these images encompass various stages of the disease. A team of expert pathologists annotated the images for training and evaluating the developed model. As a result, the effectiveness of the transformer-based method for FHB-disease detection, through extensive experiments across typical tasks for plant image segmentation, is demonstrated.
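    The Efficient Self-attention mentioned above lowers the quadratic cost of standard attention by shrinking the key/value token grid before the attention product. A minimal PyTorch sketch of one common spatial-reduction variant is shown below; the class name, head count, and reduction ratio are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class EfficientSelfAttention(nn.Module):
    """Self-attention with spatial reduction of keys/values.

    A sketch of the general idea only; the paper's exact Efficient Self-attention
    block may differ in its reduction operator and normalization.
    """
    def __init__(self, dim, num_heads=4, reduction=4):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        self.proj = nn.Linear(dim, dim)
        # Strided conv shrinks the token grid before computing K and V,
        # reducing attention cost from O(N^2) to O(N^2 / reduction^2).
        self.sr = nn.Conv2d(dim, dim, kernel_size=reduction, stride=reduction)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, H, W):                      # x: (B, N, C) with N = H*W
        B, N, C = x.shape
        q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads).transpose(1, 2)
        x_ = x.transpose(1, 2).reshape(B, C, H, W)   # back to a feature map
        x_ = self.sr(x_).reshape(B, C, -1).transpose(1, 2)
        x_ = self.norm(x_)
        kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, C // self.num_heads)
        k, v = kv.permute(2, 0, 3, 1, 4)             # each: (B, heads, N', head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```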

Diffusion Model in Causal Inference with Unmeasured Confounders

  • paper_url: http://arxiv.org/abs/2308.03669
  • repo_url: https://github.com/tatsu432/BDCM
  • paper_authors: Tatsuhiro Shimizu
  • For: Extending the diffusion model to answer causal questions from observational data in the presence of unmeasured confounders.
  • Methods: Working in Pearl's Directed Acyclic Graph (DAG) framework, the Diffusion-based Causal Model (DCM) incorporates a diffusion model into the DAG to answer causal questions more accurately, but assumes all confounders are observed. To lift this restriction, the authors propose the Backdoor Criterion based DCM (BDCM), which uses the backdoor criterion to identify the variables in the DAG that must be included in the decoding process, extending DCM to the case with unmeasured confounders.
  • Results: Synthetic-data experiments show that the proposed model captures the counterfactual distribution more precisely than DCM under unmeasured confounders.
    Abstract We study how to extend the use of the diffusion model to answer the causal question from the observational data under the existence of unmeasured confounders. In Pearl's framework of using a Directed Acyclic Graph (DAG) to capture the causal intervention, a Diffusion-based Causal Model (DCM) was proposed incorporating the diffusion model to answer the causal questions more accurately, assuming that all of the confounders are observed. However, unmeasured confounders in practice exist, which hinders DCM from being applicable. To alleviate this limitation of DCM, we propose an extended model called Backdoor Criterion based DCM (BDCM), whose idea is rooted in the Backdoor criterion to find the variables in DAG to be included in the decoding process of the diffusion model so that we can extend DCM to the case with unmeasured confounders. Synthetic data experiment demonstrates that our proposed model captures the counterfactual distribution more precisely than DCM under the unmeasured confounders.
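    For readers unfamiliar with the backdoor criterion that BDCM builds on: once an observed adjustment set Z blocks all backdoor paths, interventional quantities can be computed from observational data as $p(y \mid do(x)) = \sum_z p(y \mid x, z)\,p(z)$. A minimal discrete-variable sketch of this adjustment (not the diffusion-based decoder itself) is given below; the toy DAG and probabilities are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy DAG: Z -> X, Z -> Y, X -> Y, with Z acting as an observed backdoor adjustment set.
n = 200_000
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, 0.2 + 0.6 * z)
y = rng.binomial(1, 0.1 + 0.3 * x + 0.4 * z)

def p_y_do_x(x_val: int) -> float:
    """Backdoor adjustment: p(y=1 | do(X=x)) = sum_z p(y=1 | x, z) p(z)."""
    total = 0.0
    for z_val in (0, 1):
        p_z = np.mean(z == z_val)
        mask = (x == x_val) & (z == z_val)
        total += np.mean(y[mask]) * p_z
    return total

# The naive conditional p(y=1 | x=1) is confounded by Z; the adjusted quantity is not.
print("p(y=1 | x=1)     =", y[x == 1].mean())
print("p(y=1 | do(x=1)) =", p_y_do_x(1))
```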

Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness

  • paper_url: http://arxiv.org/abs/2308.03666
  • repo_url: None
  • paper_authors: Shide Du, Zihan Fang, Shiyang Lan, Yanchao Tan, Manuel Günther, Shiping Wang, Wenzhong Guo
  • for: Improving the trustworthiness and multimodal learning capability of artificial intelligence systems in open-world settings
  • methods: Customized trustworthy networks, flexible learning regularizers, and open-world recognition losses are used to improve interpretability, generalization, and robustness
  • results: Significant performance improvements are achieved on open-world multimedia recognition tasks
    Abstract As researchers strive to narrow the gap between machine intelligence and human through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in open-world, which has become ubiquitous in all aspects of daily life for everyone. However, several challenges may create a crisis of trust in current artificial intelligence systems that need to be bridged: 1) Insufficient explanation of predictive results; 2) Inadequate generalization for learning models; 3) Poor adaptability to uncertain environments. Consequently, we explore a neural program to bridge trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios for readers. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings; 2) We then design environmental well-being task-interfaces via flexible learning regularizers for improving the generalization of trustworthy learning; 3) We propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Eventually, we enhance various trustworthy properties through the establishment of design-level explainability, environmental well-being task-interfaces and open-world recognition programs. These designed open-world protocols are applicable across a wide range of surroundings, under open-world multimedia recognition scenarios with significant performance improvements observed.

Distributionally Robust Classification on a Data Budget

  • paper_url: http://arxiv.org/abs/2308.03821
  • repo_url: https://github.com/penfever/vlhub
  • paper_authors: Benjamin Feuer, Ameya Joshi, Minh Pham, Chinmay Hegde
  • for: To study how to train robust deep learning models when data is limited.
  • methods: A set of new training datasets (JANuS) and carefully controlled investigations of the factors contributing to robustness in image classification, compared against a large-scale meta-analysis.
  • results: A standard ResNet-50 trained with the cross-entropy loss on 2.4 million image samples can attain robustness comparable to a CLIP ResNet-50 trained on 400 million samples; to the authors' knowledge, this is the first result showing (near) state-of-the-art distributional robustness on a limited data budget.
    Abstract Real world uses of deep learning require predictable model behavior under distribution shifts. Models such as CLIP show emergent natural distributional robustness comparable to humans, but may require hundreds of millions of training samples. Can we train robust learners in a domain where data is limited? To rigorously address this question, we introduce JANuS (Joint Annotations and Names Set), a collection of four new training datasets with images, labels, and corresponding captions, and perform a series of carefully controlled investigations of factors contributing to robustness in image classification, then compare those results to findings derived from a large-scale meta-analysis. Using this approach, we show that standard ResNet-50 trained with the cross-entropy loss on 2.4 million image samples can attain comparable robustness to a CLIP ResNet-50 trained on 400 million samples. To our knowledge, this is the first result showing (near) state-of-the-art distributional robustness on limited data budgets. Our dataset is available at \url{https://huggingface.co/datasets/penfever/JANuS_dataset}, and the code used to reproduce our experiments can be found at \url{https://github.com/penfever/vlhub/}.
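    The supervised baseline the abstract refers to is a standard ResNet-50 trained with cross-entropy. A minimal PyTorch sketch of that recipe is below; the hyperparameters and the `train_loader` placeholder (yielding image/label batches from JANuS or another dataset) are assumptions, not the authors' exact training script.

```python
import torch
import torch.nn as nn
from torchvision import models

# Standard supervised recipe: ResNet-50 + cross-entropy loss.
model = models.resnet50(num_classes=1000)   # set num_classes to the target label count
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

def train_one_epoch(model, train_loader, device="cuda"):
    """One epoch of supervised training; `train_loader` yields (images, labels)."""
    model.to(device).train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```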

Two-stage Early Prediction Framework of Remaining Useful Life for Lithium-ion Batteries

  • paper_url: http://arxiv.org/abs/2308.03664
  • repo_url: None
  • paper_authors: Dhruv Mittal, Hymalai Bello, Bo Zhou, Mayank Shekhar Jha, Sungho Suh, Paul Lukowicz
  • for: Early prediction of the remaining useful life (RUL) of lithium-ion batteries, to improve the reliability and maintainability of battery management across industries.
  • methods: A two-stage framework: a neural network-based model first determines the first prediction cycle (FPC) that marks the start of the unhealthy stage, and the degradation pattern after the FPC is then predicted to estimate the remaining useful life as a percentage.
  • results: Experimental results show that the proposed method outperforms conventional approaches in RUL prediction and shows promise for real-world scenarios, providing improved accuracy and applicability.
    Abstract Early prediction of remaining useful life (RUL) is crucial for effective battery management across various industries, ranging from household appliances to large-scale applications. Accurate RUL prediction improves the reliability and maintainability of battery technology. However, existing methods have limitations, including assumptions of data from the same sensors or distribution, foreknowledge of the end of life (EOL), and neglect to determine the first prediction cycle (FPC) to identify the start of the unhealthy stage. This paper proposes a novel method for RUL prediction of Lithium-ion batteries. The proposed framework comprises two stages: determining the FPC using a neural network-based model to divide the degradation data into distinct health states and predicting the degradation pattern after the FPC to estimate the remaining useful life as a percentage. Experimental results demonstrate that the proposed method outperforms conventional approaches in terms of RUL prediction. Furthermore, the proposed method shows promise for real-world scenarios, providing improved accuracy and applicability for battery management.

Matrix Completion in Almost-Verification Time

  • paper_url: http://arxiv.org/abs/2308.03661
  • repo_url: None
  • paper_authors: Jonathan A. Kelner, Jerry Li, Allen Liu, Aaron Sidford, Kevin Tian
  • for: A new framework for the fundamental problem of low-rank matrix completion, i.e., approximating a rank-$r$ matrix $\mathbf{M}$ from random observations.
  • methods: An algorithm that completes 99% of the rows and columns of $\mathbf{M}$ from $\approx mr$ samples under no further assumptions; assuming the row and column spans of $\mathbf{M}$ satisfy additional regularity properties, this partial completion guarantee is boosted to full matrix completion by aggregating solutions to regression problems involving the observations.
  • results: With incoherent row and column spans, $\mathbf{M}$ is completed to high precision from $mr^{2+o(1)}$ observations in $mr^{3+o(1)}$ time; under a further assumption on the spans, the sample complexity improves to $mr^{1+o(1)}$ and the runtime to $mr^{2+o(1)}$. Robust variants complete $\mathbf{M}$ to Frobenius norm distance $\approx r^{1.5}\Delta$ under noise, and the runtimes match the best known time to verify that a rank-$r$ decomposition $\mathbf{U}\mathbf{V}^\top$ agrees with the sampled observations.
    Abstract We give a new framework for solving the fundamental problem of low-rank matrix completion, i.e., approximating a rank-$r$ matrix $\mathbf{M} \in \mathbb{R}^{m \times n}$ (where $m \ge n$) from random observations. First, we provide an algorithm which completes $\mathbf{M}$ on $99\%$ of rows and columns under no further assumptions on $\mathbf{M}$ from $\approx mr$ samples and using $\approx mr^2$ time. Then, assuming the row and column spans of $\mathbf{M}$ satisfy additional regularity properties, we show how to boost this partial completion guarantee to a full matrix completion algorithm by aggregating solutions to regression problems involving the observations. In the well-studied setting where $\mathbf{M}$ has incoherent row and column spans, our algorithms complete $\mathbf{M}$ to high precision from $mr^{2+o(1)}$ observations in $mr^{3 + o(1)}$ time (omitting logarithmic factors in problem parameters), improving upon the prior state-of-the-art [JN15] which used $\approx mr^5$ samples and $\approx mr^7$ time. Under an assumption on the row and column spans of $\mathbf{M}$ we introduce (which is satisfied by random subspaces with high probability), our sample complexity improves to an almost information-theoretically optimal $mr^{1 + o(1)}$, and our runtime improves to $mr^{2 + o(1)}$. Our runtimes have the appealing property of matching the best known runtime to verify that a rank-$r$ decomposition $\mathbf{U}\mathbf{V}^\top$ agrees with the sampled observations. We also provide robust variants of our algorithms that, given random observations from $\mathbf{M} + \mathbf{N}$ with $\|\mathbf{N}\|_{F} \le \Delta$, complete $\mathbf{M}$ to Frobenius norm distance $\approx r^{1.5}\Delta$ in the same runtimes as the noiseless setting. Prior noisy matrix completion algorithms [CP10] only guaranteed a distance of $\approx \sqrt{n}\Delta$.
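    The abstract notes that the runtimes match the cost of simply verifying that a candidate rank-$r$ factorization $\mathbf{U}\mathbf{V}^\top$ agrees with the sampled entries. That verification step is easy to state; a numpy sketch under assumed shapes is shown below (the completion algorithm itself is not reproduced here).

```python
import numpy as np

def verify_decomposition(U, V, obs_rows, obs_cols, obs_vals, tol=1e-6):
    """Check whether the rank-r factorization U @ V.T matches the sampled entries.

    U: (m, r), V: (n, r); obs_rows/obs_cols/obs_vals hold the sampled (i, j, M_ij).
    Only observed entries are touched, so the cost is O(#samples * r).
    """
    preds = np.einsum("kr,kr->k", U[obs_rows], V[obs_cols])
    residual = np.linalg.norm(preds - obs_vals)
    return residual <= tol * max(1.0, np.linalg.norm(obs_vals))
```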

Generative Forests

  • paper_url: http://arxiv.org/abs/2308.03648
  • repo_url: https://github.com/AlCorreia/GeFs
  • paper_authors: Richard Nock, Mathieu Guillame-Bert
  • for: Generation and density modeling of tabular data: the paper introduces new tree-based generative models and a training algorithm that address three shortcomings of existing approaches.
  • methods: New tree-based generative models that improve on the modeling capabilities of recent proposals, together with a training algorithm that simplifies previous training settings and displays boosting-compliant convergence; the algorithm can be implemented with a few tweaks to the most popular two-class decision-tree induction scheme.
  • results: Experiments on missing-data imputation and on comparing generated data to real data demonstrate the quality of the results obtained, in particular against the state of the art.
    Abstract Tabular data represents one of the most prevalent form of data. When it comes to data generation, many approaches would learn a density for the data generation process, but would not necessarily end up with a sampler, even less so being exact with respect to the underlying density. A second issue is on models: while complex modeling based on neural nets thrives in image or text generation (etc.), less is known for powerful generative models on tabular data. A third problem is the visible chasm on tabular data between training algorithms for supervised learning with remarkable properties (e.g. boosting), and a comparative lack of guarantees when it comes to data generation. In this paper, we tackle the three problems, introducing new tree-based generative models convenient for density modeling and tabular data generation that improve on modeling capabilities of recent proposals, and a training algorithm which simplifies the training setting of previous approaches and displays boosting-compliant convergence. This algorithm has the convenient property to rely on a supervised training scheme that can be implemented by a few tweaks to the most popular induction scheme for decision tree induction with two classes. Experiments are provided on missing data imputation and comparing generated data to real data, displaying the quality of the results obtained by our approach, in particular against state of the art.

XFlow: Benchmarking Flow Behaviors over Graphs

  • paper_url: http://arxiv.org/abs/2308.03819
  • repo_url: https://github.com/xgraphing/xflow
  • paper_authors: Zijian Zhang, Zonghan Zhang, Zhiqian Chen
  • for: To provide a new benchmark suite covering tasks, baseline models, graph datasets, and evaluation tools for studying flow (diffusion) behaviors over graphs across domains.
  • methods: Baseline models together with graph-theoretic and machine learning methods are used to explore the characteristics of different propagation behaviors, within a comprehensive analytical framework that generalizes across flow-related tasks.
  • results: The empirical study highlights the strengths and weaknesses of current foundational models on different graph datasets and points to avenues for further research.
    Abstract The occurrence of diffusion on a graph is a prevalent and significant phenomenon, as evidenced by the spread of rumors, influenza-like viruses, smart grid failures, and similar events. Comprehending the behaviors of flow is a formidable task, due to the intricate interplay between the distribution of seeds that initiate flow propagation, the propagation model, and the topology of the graph. The study of networks encompasses a diverse range of academic disciplines, including mathematics, physics, social science, and computer science. This interdisciplinary nature of network research is characterized by a high degree of specialization and compartmentalization, and the cooperation facilitated by them is inadequate. From a machine learning standpoint, there is a deficiency in a cohesive platform for assessing algorithms across various domains. One of the primary obstacles to current research in this field is the absence of a comprehensive curated benchmark suite to study the flow behaviors under network scenarios. To address this disparity, we propose the implementation of a novel benchmark suite that encompasses a variety of tasks, baseline models, graph datasets, and evaluation tools. In addition, we present a comprehensive analytical framework that offers a generalized approach to numerous flow-related tasks across diverse domains, serving as a blueprint and roadmap. Drawing upon the outcomes of our empirical investigation, we analyze the advantages and disadvantages of current foundational models, and we underscore potential avenues for further study. The datasets, code, and baseline models have been made available for the public at: https://github.com/XGraphing/XFlow
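    To make the kind of task concrete, a minimal independent-cascade simulation, one standard propagation model for flows over graphs, is sketched below with networkx. The graph, seed set, and activation probability are illustrative assumptions; this is not the XFlow API.

```python
import random
import networkx as nx

def independent_cascade(G, seeds, p=0.1, seed=0):
    """Simulate one independent-cascade diffusion over graph G starting from `seeds`.

    Each newly activated node gets one chance to activate each neighbor with
    probability p. Returns the set of nodes reached by the cascade.
    """
    rng = random.Random(seed)
    active, frontier = set(seeds), list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in G.neighbors(u):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active

G = nx.erdos_renyi_graph(200, 0.05, seed=1)
print(len(independent_cascade(G, seeds=[0, 1])), "nodes reached")
```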

MedMine: Examining Pre-trained Language Models on Medication Mining

  • paper_url: http://arxiv.org/abs/2308.03629
  • repo_url: https://github.com/hecta-uom/m3
  • paper_authors: Haifa Alrdahi, Lifeng Han, Hendrik Šuvalov, Goran Nenadic
  • for: Examining how well current pre-trained language models (PLMs) perform on automatic medication mining, to inform future research.
  • methods: Fine-tuning of two PLMs, the monolingual Med7 model and the multilingual large language model XLM-RoBERTa, compared on the historical n2c2-2018 medication mining shared-task datasets.
  • results: Current PLMs show imbalanced performance across entity types and clinical events; the findings motivate combining their outputs, merging models, or improving overall accuracy through ensemble learning and data augmentation.
    Abstract Automatic medication mining from clinical and biomedical text has become a popular topic due to its real impact on healthcare applications and the recent development of powerful language models (LMs). However, fully-automatic extraction models still face obstacles to be overcome such that they can be deployed directly into clinical practice for better impacts. Such obstacles include their imbalanced performances on different entity types and clinical events. In this work, we examine current state-of-the-art pre-trained language models (PLMs) on such tasks, via fine-tuning including the monolingual model Med7 and multilingual large language model (LLM) XLM-RoBERTa. We compare their advantages and drawbacks using historical medication mining shared task data sets from n2c2-2018 challenges. We report the findings we get from these fine-tuning experiments such that they can facilitate future research on addressing them, for instance, how to combine their outputs, merge such models, or improve their overall accuracy by ensemble learning and data augmentation. MedMine is part of the M3 Initiative \url{https://github.com/HECTA-UoM/M3}
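    As a point of reference, fine-tuning a multilingual PLM such as XLM-RoBERTa for medication-entity tagging is a token-classification task. A minimal Hugging Face transformers sketch is below; the label set, hyperparameters, and the `train_ds`/`dev_ds` dataset objects are assumptions, and the n2c2-2018 data itself must be obtained separately under its data-use agreement.

```python
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          Trainer, TrainingArguments)

def build_trainer(train_ds, dev_ds, labels=("O", "B-DRUG", "I-DRUG")):
    """Build a Trainer that fine-tunes XLM-RoBERTa for medication entity tagging.

    `train_ds`/`dev_ds` are assumed to be tokenized datasets with aligned label ids;
    the label set and hyperparameters here are illustrative, not the paper's setup.
    """
    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = AutoModelForTokenClassification.from_pretrained(
        "xlm-roberta-base", num_labels=len(labels))
    args = TrainingArguments(output_dir="medmine-xlmr", learning_rate=2e-5,
                             per_device_train_batch_size=16, num_train_epochs=3)
    return Trainer(model=model, args=args, train_dataset=train_ds,
                   eval_dataset=dev_ds, tokenizer=tokenizer)

# trainer = build_trainer(train_ds, dev_ds); trainer.train()
```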

A sparse coding approach to inverse problems with application to microwave tomography imaging

  • paper_url: http://arxiv.org/abs/2308.03818
  • repo_url: None
  • paper_authors: Cesar F. Caiafa, Ramiro M. Irastorza
  • for: Solving ill-posed inverse imaging problems that arise across science and technology, from medical diagnosis to astronomical studies.
  • methods: Sparse representation of images, a realistic, compact, and effective generative model for natural images inspired by the mammalian visual system, is used to address ill-posed linear inverse problems by training the model on a large collection of images.
  • results: The application of sparse coding is extended to the non-linear and ill-posed problem of microwave tomography imaging, which could lead to a significant improvement over state-of-the-art algorithms.
    Abstract Inverse imaging problems that are ill-posed can be encountered across multiple domains of science and technology, ranging from medical diagnosis to astronomical studies. To reconstruct images from incomplete and distorted data, it is necessary to create algorithms that can take into account both, the physical mechanisms responsible for generating these measurements and the intrinsic characteristics of the images being analyzed. In this work, the sparse representation of images is reviewed, which is a realistic, compact and effective generative model for natural images inspired by the visual system of mammals. It enables us to address ill-posed linear inverse problems by training the model on a vast collection of images. Moreover, we extend the application of sparse coding to solve the non-linear and ill-posed problem in microwave tomography imaging, which could lead to a significant improvement of the state-of-the-arts algorithms.
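    Sparse coding recovers coefficients $x$ such that $y \approx Dx$ with $x$ sparse, typically by solving $\min_x \tfrac{1}{2}\|y - Dx\|_2^2 + \lambda\|x\|_1$ over a learned dictionary $D$. A minimal ISTA solver for that problem is sketched below; dictionary learning and the microwave-tomography forward model are outside the scope of this sketch, and the step sizes shown are generic assumptions.

```python
import numpy as np

def ista(D, y, lam=0.1, n_iter=200):
    """Iterative shrinkage-thresholding for  min_x 0.5*||y - D x||^2 + lam*||x||_1."""
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)             # gradient of the quadratic data term
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return x
```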

A Meta-learning based Stacked Regression Approach for Customer Lifetime Value Prediction

  • paper_url: http://arxiv.org/abs/2308.08502
  • repo_url: None
  • paper_authors: Karan Gadgil, Sukhpal Singh Gill, Ahmed M. Abdelmoniem
  • for: The paper is written to propose a new approach to estimating Customer Lifetime Value (CLV) that is both effective and interpretable, using a combination of bagging and boosting models.
  • methods: The proposed approach uses a meta-learning-based stacked regression model that combines the predictions from multiple bagging and boosting models to estimate CLV.
  • results: The paper shows the efficacy of the proposed approach through empirical tests on an openly available Online Retail dataset, demonstrating that it outperforms existing distribution-based and basic models.
    Abstract Companies across the globe are keen on targeting potential high-value customers in an attempt to expand revenue and this could be achieved only by understanding the customers more. Customer Lifetime Value (CLV) is the total monetary value of transactions/purchases made by a customer with the business over an intended period of time and is used as means to estimate future customer interactions. CLV finds application in a number of distinct business domains such as Banking, Insurance, Online-entertainment, Gaming, and E-Commerce. The existing distribution-based and basic (recency, frequency & monetary) based models face a limitation in terms of handling a wide variety of input features. Moreover, the more advanced Deep learning approaches could be superfluous and add an undesirable element of complexity in certain application areas. We, therefore, propose a system which is able to qualify both as effective, and comprehensive yet simple and interpretable. With that in mind, we develop a meta-learning-based stacked regression model which combines the predictions from bagging and boosting models that each is found to perform well individually. Empirical tests have been carried out on an openly available Online Retail dataset to evaluate various models and show the efficacy of the proposed approach.
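    The stacked-regression idea described above, bagging and boosting base learners whose out-of-fold predictions feed a meta-learner, maps directly onto scikit-learn's stacking API. The estimator choices and hyperparameters in the sketch below are illustrative assumptions, not the paper's exact configuration.

```python
from sklearn.ensemble import (RandomForestRegressor, GradientBoostingRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge

# Stacked regression: a bagging model and a boosting model as base learners,
# with a simple interpretable meta-learner on top.
clv_model = StackingRegressor(
    estimators=[
        ("bagging", RandomForestRegressor(n_estimators=300, random_state=0)),
        ("boosting", GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=Ridge(alpha=1.0),
    cv=5,                      # out-of-fold base predictions train the meta-learner
)
# clv_model.fit(X_train, y_train); clv_model.predict(X_test)
```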

Stock Market Price Prediction: A Hybrid LSTM and Sequential Self-Attention based Approach

  • paper_url: http://arxiv.org/abs/2308.04419
  • repo_url: None
  • paper_authors: Karan Pardeshi, Sukhpal Singh Gill, Ahmed M. Abdelmoniem
  • for: Predicting stock prices to help investors make the best decisions at the right time.
  • methods: A deep learning approach combining Long Short-Term Memory with a Sequential Self-Attention Mechanism (LSTM-SSAM).
  • results: Extensive experiments on three stock datasets (SBIN, HDFCBANK, and BANKBARODA) demonstrate the effectiveness and feasibility of the proposed model over existing models, with the best results on the RMSE and R-square (R2) evaluation metrics.
    Abstract One of the most enticing research areas is the stock market, and projecting stock prices may help investors profit by making the best decisions at the correct time. Deep learning strategies have emerged as a critical technique in the field of the financial market. The stock market is impacted due to two aspects, one is the geo-political, social and global events on the bases of which the price trends could be affected. Meanwhile, the second aspect purely focuses on historical price trends and seasonality, allowing us to forecast stock prices. In this paper, our aim is to focus on the second aspect and build a model that predicts future prices with minimal errors. In order to provide better prediction results of stock price, we propose a new model named Long Short-Term Memory (LSTM) with Sequential Self-Attention Mechanism (LSTM-SSAM). Finally, we conduct extensive experiments on the three stock datasets: SBIN, HDFCBANK, and BANKBARODA. The experimental results prove the effectiveness and feasibility of the proposed model compared to existing models. The experimental findings demonstrate that the root-mean-squared error (RMSE), and R-square (R2) evaluation indicators are giving the best results.
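    A minimal PyTorch sketch of the general LSTM-plus-self-attention idea is shown below: an LSTM encodes the price sequence, a self-attention layer re-weights the hidden states, and a linear head predicts the next value. The layer sizes and single-head attention are illustrative assumptions, not the authors' exact LSTM-SSAM architecture.

```python
import torch
import torch.nn as nn

class LSTMSelfAttention(nn.Module):
    """LSTM followed by self-attention over the hidden states, then a regression head.
    A sketch of the idea only; the paper's exact architecture may differ."""
    def __init__(self, n_features=1, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, seq_len, n_features)
        h, _ = self.lstm(x)                # (batch, seq_len, hidden)
        a, _ = self.attn(h, h, h)          # self-attention over the LSTM states
        return self.head(a[:, -1])         # predict from the last attended state
```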

Adaptive Semi-Supervised Segmentation of Brain Vessels with Ambiguous Labels

  • paper_url: http://arxiv.org/abs/2308.03613
  • repo_url: None
  • paper_authors: Fengming Lin, Yan Xia, Nishant Ravikumar, Qiongyao Liu, Michael MacRaild, Alejandro F Frangi
  • for: Accurate segmentation of brain vessels, which is crucial for cerebrovascular disease diagnosis and treatment.
  • methods: An adaptive semi-supervised approach combining progressive semi-supervised learning, an adaptive training strategy, and boundary enhancement.
  • results: Experiments on 3DRA datasets show superiority over other methods on mesh-based segmentation metrics; by leveraging partially and ambiguously labeled data that only annotates the main vessels, the method achieves strong segmentation performance on mislabeled fine vessels, showing its potential for clinical applications.
    Abstract Accurate segmentation of brain vessels is crucial for cerebrovascular disease diagnosis and treatment. However, existing methods face challenges in capturing small vessels and handling datasets that are partially or ambiguously annotated. In this paper, we propose an adaptive semi-supervised approach to address these challenges. Our approach incorporates innovative techniques including progressive semi-supervised learning, adaptative training strategy, and boundary enhancement. Experimental results on 3DRA datasets demonstrate the superiority of our method in terms of mesh-based segmentation metrics. By leveraging the partially and ambiguously labeled data, which only annotates the main vessels, our method achieves impressive segmentation performance on mislabeled fine vessels, showcasing its potential for clinical applications.

A machine-learning sleep-wake classification model using a reduced number of features derived from photoplethysmography and activity signals

  • paper_url: http://arxiv.org/abs/2308.05759
  • repo_url: None
  • paper_authors: Douglas A. Almeida, Felipe M. Dias, Marcelo A. F. Toledo, Diego A. C. Cardenas, Filipe A. C. Oliveira, Estela Ribeiro, Jose E. Krieger, Marco A. Gutierrez
  • for: Developing a machine learning sleep-wake classification model based on photoplethysmography (PPG) signals and activity counts, to support assessment of sleep quality and overall health.
  • methods: Sleep-stage inference using the eXtreme Gradient Boosting (XGBoost) algorithm with features extracted from the PPG signal and activity counts.
  • results: The model achieves performance comparable to current state-of-the-art methods, with a Sensitivity of 91.15 $\pm$ 1.16%, Specificity of 53.66 $\pm$ 1.12%, F1-score of 83.88 $\pm$ 0.56%, and Kappa of 48.0 $\pm$ 0.86%, while using a reduced number of features suitable for wearable devices.
    Abstract Sleep is a crucial aspect of our overall health and well-being. It plays a vital role in regulating our mental and physical health, impacting our mood, memory, and cognitive function to our physical resilience and immune system. The classification of sleep stages is a mandatory step to assess sleep quality, providing the metrics to estimate the quality of sleep and how well our body is functioning during this essential period of rest. Photoplethysmography (PPG) has been demonstrated to be an effective signal for sleep stage inference, meaning it can be used on its own or in a combination with others signals to determine sleep stage. This information is valuable in identifying potential sleep issues and developing strategies to improve sleep quality and overall health. In this work, we present a machine learning sleep-wake classification model based on the eXtreme Gradient Boosting (XGBoost) algorithm and features extracted from PPG signal and activity counts. The performance of our method was comparable to current state-of-the-art methods with a Sensitivity of 91.15 $\pm$ 1.16%, Specificity of 53.66 $\pm$ 1.12%, F1-score of 83.88 $\pm$ 0.56%, and Kappa of 48.0 $\pm$ 0.86%. Our method offers a significant improvement over other approaches as it uses a reduced number of features, making it suitable for implementation in wearable devices that have limited computational power.
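    A minimal sketch of the modeling step, XGBoost on epoch-level features, is below. The feature extraction from PPG and activity counts is assumed to happen elsewhere, and the hyperparameters shown are illustrative rather than the paper's tuned values.

```python
from xgboost import XGBClassifier

# Sleep/wake classification from hand-crafted PPG and activity-count features.
clf = XGBClassifier(
    n_estimators=300,
    max_depth=4,
    learning_rate=0.05,
    subsample=0.8,
    eval_metric="logloss",
)
# clf.fit(X_train, y_train)     # X: (n_epochs, n_features), y: 0 = wake, 1 = sleep
# y_pred = clf.predict(X_test)
```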

  • paper_url: http://arxiv.org/abs/2308.03574
  • repo_url: https://github.com/anonreposit/gesp
  • paper_authors: Etor Arza, Leni K. Le Goff, Emma Hart
  • for: Reducing the computation cost of direct policy search tasks, especially when evaluations are conducted in the physical world.
  • methods: A simple early-stopping criterion based only on the objective value at each time step, requiring no problem-specific knowledge.
  • results: Across five direct policy search environments drawn from games, robotics, and classic control, the criterion saves up to 75% of computation time; comparison with problem-specific stopping criteria shows that it performs comparably while being more generally applicable.
    Abstract Lengthy evaluation times are common in many optimization problems such as direct policy search tasks, especially when they involve conducting evaluations in the physical world, e.g. in robotics applications. Often, when evaluating a solution over a fixed time period, it becomes clear that the objective value will not increase with additional computation time (for example, when a two-wheeled robot continuously spins on the spot). In such cases, it makes sense to stop the evaluation early to save computation time. However, most approaches to stop the evaluation are problem-specific and need to be specifically designed for the task at hand. Therefore, we propose an early stopping method for direct policy search. The proposed method only looks at the objective value at each time step and requires no problem-specific knowledge. We test the introduced stopping criterion in five direct policy search environments drawn from games, robotics, and classic control domains, and show that it can save up to 75% of the computation time. We also compare it with problem-specific stopping criteria and demonstrate that it performs comparably while being more generally applicable.
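    Because the criterion only monitors the objective value over time within an evaluation, it can be wrapped as a small stateful object. The patience-based rule in the sketch below is an illustrative simplification, not necessarily the paper's exact stopping rule.

```python
class EarlyEvaluationStopper:
    """Stop a policy evaluation early once the objective has stopped improving.

    Illustrative patience-based rule (an assumption, not necessarily the paper's
    exact criterion): abort after `patience` consecutive time steps without the
    best-so-far objective improving by more than `min_delta`.
    """
    def __init__(self, patience=100, min_delta=0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.stale = float("-inf"), 0

    def step(self, objective_value: float) -> bool:
        if objective_value > self.best + self.min_delta:
            self.best, self.stale = objective_value, 0
        else:
            self.stale += 1
        return self.stale >= self.patience   # True means: stop this evaluation now

# Inside an episode: call stopper.step(current_objective) at every time step and
# break out of the rollout as soon as it returns True.
```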

When Federated Learning meets Watermarking: A Comprehensive Overview of Techniques for Intellectual Property Protection

  • paper_url: http://arxiv.org/abs/2308.03573
  • repo_url: None
  • paper_authors: Mohammed Lansari, Reda Bellafqira, Katarzyna Kapusta, Vincent Thouvenot, Olivier Bettan, Gouenou Coatrieux
  • for: An overview of recent Federated Learning (FL) watermarking techniques, including the new challenges and opportunities that arise in FL.
  • methods: A survey of DNN watermarking research from the last five years and of the ways these techniques are being adapted to FL and its unique constraints.
  • results: A summary of recent advances in FL watermarking, shedding light on the new challenges and opportunities of protecting model ownership in FL.
    Abstract Federated Learning (FL) is a technique that allows multiple participants to collaboratively train a Deep Neural Network (DNN) without the need of centralizing their data. Among other advantages, it comes with privacy-preserving properties making it attractive for application in sensitive contexts, such as health care or the military. Although the data are not explicitly exchanged, the training procedure requires sharing information about participants' models. This makes the individual models vulnerable to theft or unauthorized distribution by malicious actors. To address the issue of ownership rights protection in the context of Machine Learning (ML), DNN Watermarking methods have been developed during the last five years. Most existing works have focused on watermarking in a centralized manner, but only a few methods have been designed for FL and its unique constraints. In this paper, we provide an overview of recent advancements in Federated Learning watermarking, shedding light on the new challenges and opportunities that arise in this field.

Provably Efficient Learning in Partially Observable Contextual Bandit

  • paper_url: http://arxiv.org/abs/2308.03572
  • repo_url: None
  • paper_authors: Xueping Gong, Jiheng Zhang
  • for: Transfer learning in partially observable contextual bandits, where agents have limited knowledge from other agents and only partial information about hidden confounders; the problem is cast as identifying or partially identifying causal effects between actions and rewards through optimization problems.
  • methods: The functional constraints over unknown distributions are discretized into linear constraints, and compatible causal models are sampled by sequentially solving linear programs, yielding causal bounds that account for estimation error; the sampling algorithms come with convergence guarantees for suitable sampling distributions.
  • results: The causally enhanced algorithms provably outperform classical bandit algorithms, with improved order dependence on the size of the action set and function space (notably in the function-approximation setting that handles general context distributions), achieving orders-of-magnitude faster convergence; simulations demonstrate the efficiency of the strategy compared with current state-of-the-art methods.
    Abstract In this paper, we investigate transfer learning in partially observable contextual bandits, where agents have limited knowledge from other agents and partial information about hidden confounders. We first convert the problem to identifying or partially identifying causal effects between actions and rewards through optimization problems. To solve these optimization problems, we discretize the original functional constraints of unknown distributions into linear constraints, and sample compatible causal models via sequentially solving linear programmings to obtain causal bounds with the consideration of estimation error. Our sampling algorithms provide desirable convergence results for suitable sampling distributions. We then show how causal bounds can be applied to improving classical bandit algorithms and affect the regrets with respect to the size of action sets and function spaces. Notably, in the task with function approximation which allows us to handle general context distributions, our method improves the order dependence on function space size compared with previous literatures. We formally prove that our causally enhanced algorithms outperform classical bandit algorithms and achieve orders of magnitude faster convergence rates. Finally, we perform simulations that demonstrate the efficiency of our strategy compared to the current state-of-the-art methods. This research has the potential to enhance the performance of contextual bandit agents in real-world applications where data is scarce and costly to obtain.

Partial identification of kernel based two sample tests with mismeasured data

  • paper_url: http://arxiv.org/abs/2308.03570
  • repo_url: None
  • paper_authors: Ron Nafshi, Maggie Makar
  • for: Handling mismeasured samples when using nonparametric two-sample tests such as the Maximum Mean Discrepancy (MMD) in machine learning applications.
  • methods: Estimation of the MMD under $\epsilon$-contamination, where a possibly non-random $\epsilon$ proportion of one distribution is erroneously grouped with the other, and partial identification of the MMD.
  • results: A method for estimating sharp upper and lower bounds that contain the true, unknown MMD; on three datasets, the approach gives tight bounds with a low false coverage rate, outperforming the alternatives.
    Abstract Nonparametric two-sample tests such as the Maximum Mean Discrepancy (MMD) are often used to detect differences between two distributions in machine learning applications. However, the majority of existing literature assumes that error-free samples from the two distributions of interest are available.We relax this assumption and study the estimation of the MMD under $\epsilon$-contamination, where a possibly non-random $\epsilon$ proportion of one distribution is erroneously grouped with the other. We show that under $\epsilon$-contamination, the typical estimate of the MMD is unreliable. Instead, we study partial identification of the MMD, and characterize sharp upper and lower bounds that contain the true, unknown MMD. We propose a method to estimate these bounds, and show that it gives estimates that converge to the sharpest possible bounds on the MMD as sample size increases, with a convergence rate that is faster than alternative approaches. Using three datasets, we empirically validate that our approach is superior to the alternatives: it gives tight bounds with a low false coverage rate.
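    For reference, the quantity whose partial identification is studied is the (squared) MMD; the standard unbiased estimator with an RBF kernel is sketched below. The contamination-aware upper and lower bounds proposed in the paper are not reproduced here, and the kernel bandwidth is an assumed constant.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between sample sets A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of the squared MMD between samples X ~ P and Y ~ Q."""
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf_kernel(X, X, sigma), rbf_kernel(Y, Y, sigma), rbf_kernel(X, Y, sigma)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))   # drop diagonal terms
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()
```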

A Transfer Learning Framework for Proactive Ramp Metering Performance Assessment

  • paper_url: http://arxiv.org/abs/2308.03542
  • repo_url: None
  • paper_authors: Xiaobo Ma, Adrian Cottam, Mohammad Razaur Rahman Shaon, Yao-Jan Wu
  • for: Assessing ramp metering performance by evaluating its impact on freeway traffic mobility when deploying or expanding a ramp metering system.
  • methods: A transfer learning-based framework that learns the association between the spatial-temporal traffic-state features of before and after situations on known freeway segments, then transfers this learning to predict traffic parameters (speed, occupancy, and flow rate) for new freeway segments.
  • results: Experimental results show that the proposed method is feasible as an alternative for predicting freeway traffic parameters to proactively evaluate ramp metering performance.
    Abstract Transportation agencies need to assess ramp metering performance when deploying or expanding a ramp metering system. The evaluation of a ramp metering strategy is primarily centered around examining its impact on freeway traffic mobility. One way these effects can be explored is by comparing traffic states, such as the speed before and after the ramp metering strategy has been altered. Predicting freeway traffic states for the after scenarios following the implementation of a new ramp metering control strategy could offer valuable insights into the potential effectiveness of the target strategy. However, the use of machine learning methods in predicting the freeway traffic state for the after scenarios and evaluating the effectiveness of transportation policies or traffic control strategies such as ramp metering is somewhat limited in the current literature. To bridge the research gap, this study presents a framework for predicting freeway traffic parameters (speed, occupancy, and flow rate) for the after situations when a new ramp metering control strategy is implemented. By learning the association between the spatial-temporal features of traffic states in before and after situations for known freeway segments, the proposed framework can transfer this learning to predict the traffic parameters for new freeway segments. The proposed framework is built upon a transfer learning model. Experimental results show that the proposed framework is feasible for use as an alternative for predicting freeway traffic parameters to proactively evaluate ramp metering performance.

On-ramp and Off-ramp Traffic Flows Estimation Based on A Data-driven Transfer Learning Framework

  • paper_url: http://arxiv.org/abs/2308.03538
  • repo_url: None
  • paper_authors: Xiaobo Ma, Abolfazl Karimpour, Yao-Jan Wu
  • for: Supporting the development, monitoring, and evaluation of ramp control strategies by estimating on-ramp and off-ramp traffic flows at locations where no physical sensors are installed.
  • methods: A data-driven framework that employs a transfer learning model to accurately estimate the missing ramp flows using only data collected from loop detectors on freeway mainlines.
  • results: The framework provides high-accuracy ramp flow estimates across freeways with different traffic patterns, distributions, and characteristics, with mean absolute errors of 23.90 veh/h to 40.85 veh/h for on-ramps and 31.58 veh/h to 45.31 veh/h for off-ramps, and root mean square errors of 34.55 veh/h to 57.77 veh/h for on-ramps and 41.75 veh/h to 58.80 veh/h for off-ramps; comparison analysis shows it outperforms other conventional machine learning models.
    Abstract To develop the most appropriate control strategy and monitor, maintain, and evaluate the traffic performance of the freeway weaving areas, state and local Departments of Transportation need to have access to traffic flows at each pair of on-ramp and off-ramp. However, ramp flows are not always readily available to transportation agencies and little effort has been made to estimate these missing flows in locations where no physical sensors are installed. To bridge this research gap, a data-driven framework is proposed that can accurately estimate the missing ramp flows by solely using data collected from loop detectors on freeway mainlines. The proposed framework employs a transfer learning model. The transfer learning model relaxes the assumption that the underlying data distributions of the source and target domains must be the same. Therefore, the proposed framework can guarantee high-accuracy estimation of on-ramp and off-ramp flows on freeways with different traffic patterns, distributions, and characteristics. Based on the experimental results, the flow estimation mean absolute errors range between 23.90 veh/h to 40.85 veh/h for on-ramps, and 31.58 veh/h to 45.31 veh/h for off-ramps; the flow estimation root mean square errors range between 34.55 veh/h to 57.77 veh/h for on-ramps, and 41.75 veh/h to 58.80 veh/h for off-ramps. Further, the comparison analysis shows that the proposed framework outperforms other conventional machine learning models. The estimated ramp flows based on the proposed method can help transportation agencies to enhance the operations of their ramp control strategies for locations where physical sensors are not installed.

Deep Feature Learning for Wireless Spectrum Data

  • paper_url: http://arxiv.org/abs/2308.03530
  • repo_url: None
  • paper_authors: Ljupcho Milosheski, Gregor Cerar, Blaž Bertalanič, Carolina Fortuna, Mihael Mohorčič
  • for: Learning feature representations for wireless transmission clustering in a completely unsupervised manner, i.e., requiring no labels.
  • methods: A convolutional neural network-based model that automatically learns a reduced-dimensionality representation of the input data, with 99.3% fewer components than a baseline principal component analysis (PCA).
  • results: The automatically learned representation extracts fine-grained clusters containing the shapes of the wireless transmission bursts, while the baseline only enables general separability of the data based on the background noise.
    Abstract In recent years, the traditional feature engineering process for training machine learning models is being automated by the feature extraction layers integrated in deep learning architectures. In wireless networks, many studies were conducted in automatic learning of feature representations for domain-related challenges. However, most of the existing works assume some supervision along the learning process by using labels to optimize the model. In this paper, we investigate an approach to learning feature representations for wireless transmission clustering in a completely unsupervised manner, i.e. requiring no labels in the process. We propose a model based on convolutional neural networks that automatically learns a reduced dimensionality representation of the input data with 99.3% less components compared to a baseline principal component analysis (PCA). We show that the automatic representation learning is able to extract fine-grained clusters containing the shapes of the wireless transmission bursts, while the baseline enables only general separability of the data based on the background noise.

AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.03526
  • repo_url: None
  • paper_authors: Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Żołna, Julian Schrittwieser, David Choi, Petko Georgiev, Daniel Toyama, Aja Huang, Roman Ring, Igor Babuschkin, Timo Ewalds, Mahyar Bordbar, Sarah Henderson, Sergio Gómez Colmenarejo, Aäron van den Oord, Wojciech Marian Czarnecki, Nando de Freitas, Oriol Vinyals
  • For: Advancing offline reinforcement learning algorithms in the challenging environment of StarCraft II.
  • Methods: A new benchmark, AlphaStar Unplugged, comprising a dataset (a subset of Blizzard's release of millions of human-played games), a standardized API for machine learning methods, and an evaluation protocol, together with baseline agents including behavior cloning and offline variants of actor-critic and MuZero.
  • Results: Using only offline data, the agents achieve a 90% win rate against the previously published AlphaStar behavior cloning agent, improving the state of the art in this domain.
    Abstract StarCraft II is one of the most challenging simulated reinforcement learning environments; it is partially observable, stochastic, multi-agent, and mastering StarCraft II requires strategic planning over long time horizons with real-time low-level execution. It also has an active professional competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of its challenging nature and because Blizzard has released a massive dataset of millions of StarCraft II games played by human players. This paper leverages that and establishes a benchmark, called AlphaStar Unplugged, introducing unprecedented challenges for offline reinforcement learning. We define a dataset (a subset of Blizzard's release), tools standardizing an API for machine learning methods, and an evaluation protocol. We also present baseline agents, including behavior cloning, offline variants of actor-critic and MuZero. We improve the state of the art of agents using only offline data, and we achieve 90% win rate against previously published AlphaStar behavior cloning agent.
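    The simplest of the reported baselines is behavior cloning: supervised imitation of logged human actions. The sketch below is a generic discrete-action simplification; real StarCraft II actions are structured, and the policy network and `offline_loader` of replay data are illustrative placeholders rather than the benchmark's API.

```python
import torch
import torch.nn as nn

def behavior_cloning_epoch(policy: nn.Module, offline_loader, optimizer, device="cuda"):
    """One epoch of behavior cloning on logged (observation, action) pairs.

    `offline_loader` is assumed to yield observation tensors and integer action ids;
    `policy(obs)` is assumed to return logits of shape (batch, n_actions).
    """
    policy.to(device).train()
    loss_fn = nn.CrossEntropyLoss()          # imitate the logged discrete action
    for obs, actions in offline_loader:
        obs, actions = obs.to(device), actions.to(device)
        loss = loss_fn(policy(obs), actions)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```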

Worker Activity Recognition in Manufacturing Line Using Near-body Electric Field

  • paper_url: http://arxiv.org/abs/2308.03514
  • repo_url: None
  • paper_authors: Sungho Suh, Vitor Fortes Rey, Sizhen Bian, Yu-Chi Huang, Jože M. Rožanec, Hooman Tavakoli Ghinani, Bo Zhou, Paul Lukowicz
  • for: Improving production efficiency and product quality in manufacturing by deploying advanced sensing: recognizing worker activities on the manufacturing line.
  • methods: A novel wearable sensing prototype combining IMU and body-capacitance sensing modules; early and late sensor-data fusion approaches are proposed and compared for multi-channel time-series convolutional neural networks and deep convolutional LSTMs.
  • results: The proposed methods outperform the baselines on data collected with the prototype and Apple Watches in a manufacturing-line testbed; the prototype with the body-capacitance sensor and feature fusion improves the macro F1 score by 6.35% over the prototype without the capacitance sensor and by 9.38% over Apple Watch data.
    Abstract Manufacturing industries strive to improve production efficiency and product quality by deploying advanced sensing and control systems. Wearable sensors are emerging as a promising solution for achieving this goal, as they can provide continuous and unobtrusive monitoring of workers' activities in the manufacturing line. This paper presents a novel wearable sensing prototype that combines IMU and body capacitance sensing modules to recognize worker activities in the manufacturing line. To handle these multimodal sensor data, we propose and compare early and late sensor data fusion approaches for multi-channel time-series convolutional neural networks and deep convolutional LSTM. We evaluate the proposed hardware and neural network model by collecting and annotating sensor data using the proposed sensing prototype and Apple Watches in the testbed of the manufacturing line. Experimental results demonstrate that our proposed methods achieve superior performance compared to the baseline methods, indicating the potential of the proposed approach for real-world applications in manufacturing industries. Furthermore, the proposed sensing prototype with the body capacitance sensor and feature fusion method yields a 6.35% and a 9.38% higher macro F1 score than the same prototype without the body capacitance sensor and than the Apple Watch data, respectively.
    摘要 制造业为提高生产效率和产品质量而努力,通常会使用先进的感测和控制系统。穿戴式感测器正在成为制造业中实现这一目标的有望解决方案,因为它们可以提供不间断和不干扰的工作者活动监测。本文介绍了一种新的穿戴式感测原型,该原型结合IMU和身体电容感测模块,以认识制造线上工作者的活动。为处理这些多模式感测数据,我们提出并比较了早期和晚期感测数据融合方法,用于多渠道时间序列卷积神经网络和深度卷积LSTM。我们通过使用提议的硬件和神经网络模型,对收集和标注感测数据的Apple Watch和测试准系中的感测数据进行评估。实验结果表明,我们的提议方法在比基准方法的情况下表现出色,这表明了我们的方法在实际应用中的潜在可能性。此外,结合身体电容感测器和特征融合方法的提议感测器提高了6.35%,对比没有身体电容感测器和Apple Watch数据的情况下,提议感测器的macro F1分数提高了9.38%。
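
The abstract compares early and late fusion of the IMU and body-capacitance streams for multi-channel time-series CNNs. The sketch below contrasts the two schemes under assumed channel counts and window length; the actual network depth, channel configuration, and the deep convolutional LSTM variant from the paper are not reproduced.

```python
import torch
import torch.nn as nn

IMU_CH, CAP_CH, T, NUM_CLASSES = 6, 1, 200, 8   # assumed channel counts / window length

def conv_encoder(in_ch):
    return nn.Sequential(
        nn.Conv1d(in_ch, 32, kernel_size=5, padding=2), nn.ReLU(),
        nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    )

class EarlyFusion(nn.Module):
    """Concatenate IMU and capacitance channels before a shared CNN."""
    def __init__(self):
        super().__init__()
        self.encoder = conv_encoder(IMU_CH + CAP_CH)
        self.head = nn.Linear(64, NUM_CLASSES)
    def forward(self, imu, cap):
        return self.head(self.encoder(torch.cat([imu, cap], dim=1)))

class LateFusion(nn.Module):
    """Encode each modality separately, then fuse the feature vectors."""
    def __init__(self):
        super().__init__()
        self.imu_enc, self.cap_enc = conv_encoder(IMU_CH), conv_encoder(CAP_CH)
        self.head = nn.Linear(64 + 64, NUM_CLASSES)
    def forward(self, imu, cap):
        return self.head(torch.cat([self.imu_enc(imu), self.cap_enc(cap)], dim=1))

imu = torch.randn(4, IMU_CH, T)   # batch of IMU windows
cap = torch.randn(4, CAP_CH, T)   # matching body-capacitance windows
print(EarlyFusion()(imu, cap).shape, LateFusion()(imu, cap).shape)
```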

A data-driven approach to predict decision point choice during normal and evacuation wayfinding in multi-story buildings

  • paper_url: http://arxiv.org/abs/2308.03511
  • repo_url: None
  • paper_authors: Yan Feng, Panchamy Krishnakumari
  • for: 这项研究旨在理解和预测在多层建筑物内 pedestrian 的决策点选择行为,以确保 pedestrian 的安全。
  • methods: 该研究使用了数据驱动的方法,首先构建了indoor网络表示,然后使用了一种已知的机器学习算法,即随机森林(RF)模型来预测 pedestrian 在路线上的决策点选择。
  • results: 研究发现,使用 RF 模型可以高度准确预测 pedestrian 的决策点选择,其中最高的预测精度达到 96%。此外,研究还发现,个人特征不会影响决策点选择。这项研究表明了应用机器学习算法来研究 pedestrian 路线选择行为在复杂的indoor建筑物中的潜力。
    Abstract Understanding pedestrian route choice behavior in complex buildings is important to ensure pedestrian safety. Previous studies have mostly used traditional data collection methods and discrete choice modeling to understand the influence of different factors on pedestrian route and exit choice, particularly in simple indoor environments. However, research on pedestrian route choice in complex buildings is still limited. This paper presents a data-driven approach for understanding and predicting the pedestrian decision point choice during normal and emergency wayfinding in a multi-story building. For this, we first built an indoor network representation and proposed a data mapping technique to map VR coordinates to the indoor representation. We then used a well-established machine learning algorithm, namely the random forest (RF) model to predict pedestrian decision point choice along a route during four wayfinding tasks in a multi-story building. Pedestrian behavioral data in a multi-story building was collected by a Virtual Reality experiment. The results show a much higher prediction accuracy of decision points using the RF model (i.e., 93% on average) compared to the logistic regression model. The highest prediction accuracy was 96% for task 3. Additionally, we tested the model performance combining personal characteristics and we found that personal characteristics did not affect decision point choice. This paper demonstrates the potential of applying a machine learning algorithm to study pedestrian route choice behavior in complex indoor buildings.
    摘要 理解步行者路径选择行为在复杂的建筑物中是重要的,以确保步行者的安全。先前的研究通常使用传统的数据采集方法和精确选择模型来理解不同因素对步行者路径和出口选择的影响,特别是在简单的室内环境中。然而,关于步行者路径选择在复杂的建筑物中的研究仍然有限。本文提出了一种数据驱动的方法,用于理解和预测步行者决策点选择在正常和紧急导航中的多层建筑物中。为此,我们首先建立了一个室内网络表示,并提出了一种数据映射技术来将VR坐标映射到室内表示中。然后,我们使用一种已有的机器学习算法,即随机森林(RF)模型来预测步行者决策点选择的路径中的决策点。在一个多层建筑物中的步行者行为数据被通过虚拟现实实验收集。结果显示,使用RF模型(即93%的平均预测精度)的预测精度远高于逻辑回归模型。最高的预测精度是96%的任务3。此外,我们测试了模型性能,结果表明个人特征不会影响决策点选择。本文示范了应用机器学习算法研究步行者路径选择行为在复杂室内建筑物中的可能性。
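
Since the decision-point predictor is a standard random forest, the pipeline can be illustrated with scikit-learn. The sketch below uses synthetic stand-in features for the VR trajectory data; the real feature set, indoor-network mapping, and per-task evaluation protocol are not part of this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the VR trajectory data: each row is one arrival at a
# decision point, described by illustrative features (previous node, floor,
# distance to nearest exit, task id); the label is the chosen outgoing node.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)   # toy choice rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("decision-point accuracy:", accuracy_score(y_te, rf.predict(X_te)))
```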

Balanced Face Dataset: Guiding StyleGAN to Generate Labeled Synthetic Face Image Dataset for Underrepresented Group

  • paper_url: http://arxiv.org/abs/2308.03495
  • repo_url: None
  • paper_authors: Kidist Amde Mekonnen
  • For: This paper aims to generate a robust face image dataset that is balanced among different demographic groups, using the StyleGAN model.
  • Methods: The paper uses the StyleGAN model to generate synthetic face images, and controls the generation process to achieve a balanced distribution of the dataset among different demographic groups.
  • Results: The paper achieves a balanced distribution of the dataset among different demographic groups, and demonstrates the effectiveness of using synthetic data generation and active labeling to reduce bias in machine learning.
  • for: 这篇论文的目标是使用StyleGAN模型生成一个可靠的人脸图像数据集,该数据集在不同的民族群体中具有平衡分布。
  • methods: 论文使用StyleGAN模型生成人脸合成图像,并控制生成过程以实现数据集中不同民族群体的平衡分布。
  • results: 论文实现了数据集中不同民族群体的平衡分布,并证明了通过生成人工数据和活动标注来减少机器学习中的偏见。
    Abstract For a machine learning model to generalize effectively to unseen data within a particular problem domain, it is well-understood that the data needs to be of sufficient size and representative of real-world scenarios. Nonetheless, real-world datasets frequently have overrepresented and underrepresented groups. One solution to mitigate bias in machine learning is to leverage a diverse and representative dataset. Training a model on a dataset that covers all demographics is crucial to reducing bias in machine learning. However, collecting and labeling large-scale datasets has been challenging, prompting the use of synthetic data generation and active labeling to decrease the costs of manual labeling. The focus of this study was to generate a robust face image dataset using the StyleGAN model. In order to achieve a balanced distribution of the dataset among different demographic groups, a synthetic dataset was created by controlling the generation process of StyleGAN and annotated for different downstream tasks.
    摘要 为了让机器学习模型在特定问题领域 generaleffectively,需要确保数据够大且符合实际情况。然而,实际世界数据集经常具有过度和不足的分布。一种解决偏见问题的方法是利用多样化和代表性的数据集。训练机器学习模型需要覆盖所有民族,这有助于减少偏见。然而,收集和标注大规模数据集的成本高昂,因此常用生成数据和活动标注来减少手动标注成本。这个研究的目标是通过StyleGAN模型生成一个可靠的人脸图像集。为了实现数据集的均衡分布,我们控制了StyleGAN生成过程,并对不同下游任务进行了标注。

Exploring the Physical World Adversarial Robustness of Vehicle Detection

  • paper_url: http://arxiv.org/abs/2308.03476
  • repo_url: None
  • paper_authors: Wei Jiang, Tianyuan Zhang, Shuangcheng Liu, Weiyu Ji, Zichao Zhang, Gang Xiao
  • For: The paper is written to highlight the significance of adversarial attacks in real-world contexts and to introduce a new dataset (DCI) for evaluating the robustness of detection models under these attacks.* Methods: The paper uses an innovative instant-level data generation pipeline using the CARLA simulator to create the DCI dataset, which enables comprehensive experiments involving three detection models and three physical adversarial attacks.* Results: The paper finds that Yolo v6 demonstrates remarkable resilience to adversarial attacks, while the ASA attack yields a substantial average AP reduction of 14.51%. The study also notes that static scenes yield higher recognition AP values and that outcomes remain relatively consistent across varying weather conditions. Additionally, the study suggests that advancements in adversarial attack algorithms may be approaching their “limitation”.
    Abstract Adversarial attacks can compromise the robustness of real-world detection models. However, evaluating these models under real-world conditions poses challenges due to resource-intensive experiments. Virtual simulations offer an alternative, but the absence of standardized benchmarks hampers progress. Addressing this, we propose an innovative instant-level data generation pipeline using the CARLA simulator. Through this pipeline, we establish the Discrete and Continuous Instant-level (DCI) dataset, enabling comprehensive experiments involving three detection models and three physical adversarial attacks. Our findings highlight diverse model performances under adversarial conditions. Yolo v6 demonstrates remarkable resilience, experiencing just a marginal 6.59% average drop in average precision (AP). In contrast, the ASA attack yields a substantial 14.51% average AP reduction, twice the effect of other algorithms. We also note that static scenes yield higher recognition AP values, and outcomes remain relatively consistent across varying weather conditions. Intriguingly, our study suggests that advancements in adversarial attack algorithms may be approaching their ``limitation''. In summary, our work underscores the significance of adversarial attacks in real-world contexts and introduces the DCI dataset as a versatile benchmark. Our findings provide valuable insights for enhancing the robustness of detection models and offer guidance for future research endeavors in the realm of adversarial attacks.

How to forecast power generation in wind farms? Insights from leveraging hierarchical structure

  • paper_url: http://arxiv.org/abs/2308.03472
  • repo_url: None
  • paper_authors: Lucas English, Mahdi Abolghasemi
  • for: 预测可再生能源生产,帮助决策全球减排。
  • methods: 使用层次预测和协调,以提高预测质量。
  • results: 跨时间和空间协调预测方法可以提高预测精度,特别是在多个时间层级。 linear regression 可以在大多数水平上超过机器学习模型的性能。
    Abstract Forecasting of renewable energy generation provides key insights which may help with decision-making towards global decarbonisation. Renewable energy generation can often be represented through cross-sectional hierarchies, whereby a single farm may have multiple individual generators. Hierarchical forecasting through reconciliation has demonstrated a significant increase in the quality of forecasts both theoretically and empirically. However, it is not evident whether forecasts generated by individual temporal and cross-sectional aggregation can be superior to integrated cross-temporal forecasts and to individual forecasts on more granular data. In this study, we investigate the accuracies of different cross-sectional and cross-temporal reconciliation methods using both linear regression and gradient boosting machine learning for forecasting wind farm power generation. We found that cross-temporal reconciliation is superior to individual cross-sectional reconciliation at multiple temporal aggregations. Cross-temporally reconciled machine learning base forecasts also demonstrated a high accuracy at coarser temporal granularities, which may encourage adoption for short-term wind forecasts. We also show that linear regression can outperform machine learning models across most levels in cross-sectional wind time series.
    摘要 对可再生能源发电进行预测可以提供关键的洞见,帮助面向全球脱碳的决策。可再生能源发电通常可以用横截面层级结构来表示,例如一个风电场可以包含多台独立的发电机。通过协调的层级预测在理论和实证上都显著提升了预测质量。然而,尚不清楚由单独的时间和横截面聚合生成的预测,是否优于整合的跨时间预测以及在更细粒度数据上的单独预测。本研究使用线性回归和梯度提升机器学习模型,考察了不同横截面与跨时间协调方法在风电场发电预测中的准确性。我们发现,在多个时间聚合层级上,跨时间协调优于单独的横截面协调。经跨时间协调的机器学习基础预测在较粗的时间粒度上也表现出较高的准确性,这可能会促进其在短期风电预测中的采用。此外,我们还发现线性回归在横截面风电时间序列的大多数层级上可以超过机器学习模型。
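
Cross-sectional reconciliation can be illustrated on a toy farm-to-generator hierarchy. The sketch below applies OLS reconciliation, which projects incoherent base forecasts onto the coherent subspace defined by the summing matrix; the cross-temporal methods and MinT-style weighting studied in the paper are only noted in a comment.

```python
import numpy as np

# Toy hierarchy: one wind farm (total) with three individual generators.
# S maps bottom-level series to all levels: [total; gen1; gen2; gen3].
S = np.array([[1, 1, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

# Independent "base" forecasts for every level (deliberately incoherent:
# the generator forecasts do not sum to the farm-level forecast).
y_hat = np.array([10.0, 3.0, 4.0, 2.0])

# OLS reconciliation projects the base forecasts onto the coherent subspace:
# y_tilde = S (S'S)^{-1} S' y_hat.  MinT-style variants replace the identity
# weighting with an estimate of the base-forecast error covariance.
P = S @ np.linalg.inv(S.T @ S) @ S.T
y_tilde = P @ y_hat
print("reconciled:", y_tilde, "sum of generators:", y_tilde[1:].sum())
```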

Wide Gaps and Clustering Axioms

  • paper_url: http://arxiv.org/abs/2308.03464
  • repo_url: None
  • paper_authors: Mieczysław A. Kłopotek
  • for: 本研究旨在探讨k-means算法是否遵循克林伯格的聚类axiomaatic系统,并提出一些改进方案来使k-means更加符合这个系统。
  • methods: 本研究使用了两种新的聚类性特征:变量k-分割性和剩余k-分割性,并证明了k-means算法在欧几何或非欧几何空间中遵循克林伯格的一致性axioma。
  • results: 研究发现,k-means算法在某些情况下会violate克林伯格的一致性axioma,这是因为数据本身不符合聚类axioma。为了解决这个问题,研究提出了一些改进方案,包括一种基于变量k-分割性和剩余k-分割性的k-means算法。这些方案可以在欧几何和非欧几何空间中实现,并且可以使k-means算法更加符合克林伯格的聚类axioma。
    Abstract The widely applied k-means algorithm produces clusterings that violate our expectations with respect to high/low similarity/density and is in conflict with Kleinberg's axiomatic system for distance-based clustering algorithms, which formalizes those expectations in a natural way. In particular, k-means violates the consistency axiom. We hypothesise that this clash is due to the unstated expectation that the data themselves should have the property of being clusterable in order to expect the algorithm clustering them to fit a clustering axiomatic system. To demonstrate this, we introduce two new clusterability properties, variational k-separability and residual k-separability, and show that Kleinberg's consistency axiom then holds for k-means operating in Euclidean or non-Euclidean space. Furthermore, we propose extensions of the k-means algorithm that approximately fit Kleinberg's richness axiom, which does not hold for k-means. In this way, we reconcile k-means with Kleinberg's axiomatic framework in Euclidean and non-Euclidean settings. Besides the contribution to the theory of axiomatic frameworks of clustering and to clusterability theory, a practical contribution is the possibility to construct datasets for testing algorithms that optimize the k-means cost function. This includes a method of constructing clusterable data with a known-in-advance global optimum.
    摘要 被广泛应用的 k-means 算法所产生的聚类结果违背了我们对高/低相似度与密度的预期,并与克林伯格(Kleinberg)为基于距离的聚类算法建立的、以自然方式形式化这些预期的公理体系相冲突;k-means 尤其违背了一致性公理。我们假设这一冲突源于一个未被明确表述的前提:只有当数据本身具有可聚类性时,才能期望算法对其聚类的结果符合聚类公理体系。为证明这一点,我们引入了两个新的可聚类性性质,即变分 k-可分性与残差 k-可分性,并证明在此前提下,k-means 在欧氏或非欧氏空间中满足克林伯格的一致性公理。此外,我们提出了近似满足克林伯格丰富性公理(k-means 本身并不满足)的 k-means 扩展算法。通过这种方式,我们在欧氏与非欧氏设定下调和了 k-means 与克林伯格的公理框架。除了对聚类公理框架理论和可聚类性理论的贡献之外,本文的实际贡献在于可以构造用于测试优化 k-means 代价函数的算法的数据集,其中包括一种构造全局最优解事先已知的可聚类数据的方法。
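
The practical contribution mentioned above, constructing clusterable test data with a known global optimum, can be illustrated with a generic wide-gap construction: clusters whose centres are separated by far more than their radii, so that the planted partition is the k-means optimum. The sketch below is only that generic illustration, not the paper's formal variational/residual k-separability construction.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative "wide gap" construction: k balls of radius r whose centres are
# separated by much more than r, so the generating partition is the unique
# k-means optimum up to label permutation.
rng = np.random.default_rng(1)
k, n_per, radius, gap = 4, 100, 1.0, 20.0
centres = gap * np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])

X = np.vstack([c + radius * rng.uniform(-1, 1, size=(n_per, 2)) for c in centres])
labels = np.repeat(np.arange(k), n_per)

km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
# With wide gaps every run should recover the planted partition: each planted
# label maps to exactly one recovered label.
recovered = len(set(zip(labels, km.labels_))) == k
print("planted partition recovered:", recovered)
```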

High-Resolution Cranial Defect Reconstruction by Iterative, Low-Resolution, Point Cloud Completion Transformers

  • paper_url: http://arxiv.org/abs/2308.03813
  • repo_url: https://github.com/MWod/DeepImplant_MICCAI_2023
  • paper_authors: Marek Wodzinski, Mateusz Daniol, Daria Hemmerling, Miroslaw Socha
  • for: automatic cranial defect reconstruction
  • methods: iterative, transformer-based method
  • results: superior performance in terms of GPU memory consumption while maintaining high quality of the reconstructed defects. Here's the text in Simplified Chinese:
  • for: 自动化颅骨缺损重建
  • methods: 迭代、基于 Transformer 的方法
  • results: GPU 内存消耗量下降,并保持高品质的缺陷重建。
    Abstract Each year thousands of people suffer from various types of cranial injuries and require personalized implants whose manual design is expensive and time-consuming. Therefore, an automatic, dedicated system to increase the availability of personalized cranial reconstruction is highly desirable. The problem of the automatic cranial defect reconstruction can be formulated as the shape completion task and solved using dedicated deep networks. Currently, the most common approach is to use the volumetric representation and apply deep networks dedicated to image segmentation. However, this approach has several limitations and does not scale well into high-resolution volumes, nor takes into account the data sparsity. In our work, we reformulate the problem into a point cloud completion task. We propose an iterative, transformer-based method to reconstruct the cranial defect at any resolution while also being fast and resource-efficient during training and inference. We compare the proposed methods to the state-of-the-art volumetric approaches and show superior performance in terms of GPU memory consumption while maintaining high-quality of the reconstructed defects.
    摘要 每年数千人都会因为不同类型的头部伤害而需要个性化嵌入式设备,但 manual 的设计是贵重时间的。因此,一个自动化、专门的系统可以大幅提高个性化头部重建的可用性。我们可以将这个问题 формули为形状完成任务,并使用专门的深度网络解决。现有的最常见方法是使用积分表示法,并应用深度网络进行图像分割。但这种方法存在一些限制,并不能扩展到高分辨率的体积,同时也不考虑数据稀缺性。在我们的工作中,我们将问题重新формализова为点云完成任务。我们提议一种迭代的变换器基本方法,可以在任何分辨率下重建头部缺陷,同时也具有快速和资源高效的训练和推理特点。我们与当前状态的积分方法进行比较,并显示我们的提议方法在 GPU 内存占用量方面具有显著优势,而无需牺牲高质量的缺陷重建。

Redesigning Out-of-Distribution Detection on 3D Medical Images

  • paper_url: http://arxiv.org/abs/2308.07324
  • repo_url: None
  • paper_authors: Anton Vasiliuk, Daria Frolova, Mikhail Belyaev, Boris Shirokikh
  • for: 本研究旨在解决验证医学影像分割中的异常样本检测问题,特别是由于缺乏明确的异常数据定义,导致许多人 искусственно设定问题而无法测量临床影响。
  • methods: 本研究提出了一种根据医学影像三维数据特点和下游任务(例如分割)重新定义异常样本检测问题。通过利用下游模型的性能来定义异常样本,我们可以无需明确ID/OOD分类来衡量不同样本的影响。我们称这种方法为预期性能下降(EPD)。
  • results: 在11种CT和MRI异常样本检测挑战中,我们示出了EPD的效果,并证明EPD可以根据临床影响来排序方法。
    Abstract Detecting out-of-distribution (OOD) samples for trusted medical image segmentation remains a significant challenge. The critical issue here is the lack of a strict definition of abnormal data, which often results in artificial problem settings without measurable clinical impact. In this paper, we redesign the OOD detection problem according to the specifics of volumetric medical imaging and related downstream tasks (e.g., segmentation). We propose using the downstream model's performance as a pseudometric between images to define abnormal samples. This approach enables us to weigh different samples based on their performance impact without an explicit ID/OOD distinction. We incorporate this weighting in a new metric called Expected Performance Drop (EPD). EPD is our core contribution to the new problem design, allowing us to rank methods based on their clinical impact. We demonstrate the effectiveness of EPD-based evaluation in 11 CT and MRI OOD detection challenges.
    摘要 检测分布外(OOD)样本对于可信的医学影像分割来说仍是一个重要的挑战。关键问题在于缺乏对异常数据的严格定义,这经常导致人为设定的问题无法衡量临床影响。在这篇论文中,我们根据体积医学影像及相关下游任务(如分割)的特点重新设计了OOD检测问题。我们提议使用下游模型的性能作为图像之间的伪度量,以定义异常样本。这种方法允许我们根据不同样本的性能影响为其加权,而无需显式的ID/OOD区分。我们将这种加权纳入一个新的指标,称为预期性能下降(EPD)。EPD是我们对新问题设计的核心贡献,使我们能够根据临床影响对方法进行排序。我们在11个CT和MRI的OOD检测挑战中展示了基于EPD的评估的有效性。
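
The abstract defines abnormality through the downstream segmentation model's performance rather than an explicit ID/OOD label. A hedged sketch of how such an Expected-Performance-Drop-style score could be computed is given below; the reference performance, weighting scheme, and Dice values are placeholders, and the exact EPD formula from the paper is not reproduced.

```python
import numpy as np

def expected_performance_drop(scores_on_cases, reference_score, ood_scores):
    """Toy version of an EPD-style metric (not the paper's exact formula).

    scores_on_cases : downstream metric (e.g. Dice) of the segmentation model
                      on each test case.
    reference_score : typical in-distribution performance of the same model.
    ood_scores      : output of an OOD detector, higher = "more abnormal";
                      used to weight each case's performance drop.
    """
    drops = np.clip(reference_score - np.asarray(scores_on_cases), 0.0, None)
    weights = np.asarray(ood_scores, dtype=float)
    weights = weights / weights.sum()
    return float((weights * drops).sum())

# Placeholder numbers: three easy cases and one case the model fails on.
dice = [0.90, 0.88, 0.91, 0.35]
ood = [0.1, 0.2, 0.1, 0.9]          # detector flags the failing case
print(expected_performance_drop(dice, reference_score=0.90, ood_scores=ood))
```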

Cross-Silo Prototypical Calibration for Federated Learning with Non-IID Data

  • paper_url: http://arxiv.org/abs/2308.03457
  • repo_url: https://github.com/qizhuang-qz/FedCSPC
  • paper_authors: Zhuang Qi, Lei Meng, Zitan Chen, Han Hu, Hui Lin, Xiangxu Meng
  • for: 这个论文目的是提出一种基于联合特征 проtotypical calibration 方法 (FedCSPC),以在隐私保护下实现训练全球模型,并且能够对不同资料来源的数据进行一致性调整。
  • methods: 本论文使用了 Data Prototypical Modeling (DPM) 模组和 Cross-silo Prototypical Calibration (CSPC) 模组,DPM 模组可以帮助获取数据模式,而 CSPC 模组可以将不同来源的数据调整到一个共同的特征空间中,并且可以实现一致性调整。
  • results: 实验结果显示,FedCSPC 方法可以在不同资料来源上学习一致的特征,并且比起现有的方法有更好的性能。
    Abstract Federated Learning aims to learn a global model on the server side that generalizes to all clients in a privacy-preserving manner, by leveraging the local models from different clients. Existing solutions focus on either regularizing the objective functions among clients or improving the aggregation mechanism for the improved model generalization capability. However, their performance is typically limited by the dataset biases, such as the heterogeneous data distributions and the missing classes. To address this issue, this paper presents a cross-silo prototypical calibration method (FedCSPC), which takes additional prototype information from the clients to learn a unified feature space on the server side. Specifically, FedCSPC first employs the Data Prototypical Modeling (DPM) module to learn data patterns via clustering to aid calibration. Subsequently, the cross-silo prototypical calibration (CSPC) module develops an augmented contrastive learning method to improve the robustness of the calibration, which can effectively project cross-source features into a consistent space while maintaining clear decision boundaries. Moreover, the CSPC module's ease of implementation and plug-and-play characteristics make it even more remarkable. Experiments were conducted on four datasets in terms of performance comparison, ablation study, in-depth analysis and case study, and the results verified that FedCSPC is capable of learning the consistent features across different data sources of the same class under the guidance of calibrated model, which leads to better performance than the state-of-the-art methods. The source codes have been released at https://github.com/qizhuang-qz/FedCSPC.
    摘要 联邦学习的目标是在服务器端学习一个全局模型,该模型在保护隐私的前提下,借助不同客户端的本地模型泛化到所有客户端。现有解决方案通常是通过在客户端之间对目标函数进行正则化,或改进聚合机制来提升模型的泛化能力。然而,这些方法的性能通常受到数据偏差的限制,例如异构的数据分布和缺失类别。为解决这一问题,本文提出了跨孤岛原型校准方法(FedCSPC),该方法利用客户端提供的额外原型信息,在服务器端学习一个统一的特征空间。具体来说,FedCSPC首先使用数据原型建模(DPM)模块通过聚类学习数据模式以辅助校准;随后,跨孤岛原型校准(CSPC)模块提出了一种增强的对比学习方法以提高校准的鲁棒性,能够有效地将来自不同数据源的特征投影到一致的空间中,同时保持清晰的决策边界。此外,CSPC模块易于实现、即插即用,使其更具吸引力。我们在四个数据集上进行了性能比较、消融研究、深入分析和案例研究,结果证明了在校准模型的引导下,FedCSPC能够为同一类别的不同数据源学习一致的特征,从而取得优于现有最新方法的性能。源代码已发布于 https://github.com/qizhuang-qz/FedCSPC。

Doubly Robust Estimator for Off-Policy Evaluation with Large Action Spaces

  • paper_url: http://arxiv.org/abs/2308.03443
  • repo_url: https://github.com/tatsu432/DR-estimator-OPE-large-action
  • paper_authors: Tatsuhiro Shimizu, Laura Forastiere
  • for: Off-Policy Evaluation (OPE) in contextual bandit settings with large action spaces.
  • methods: 使用 Marginalized Inverse Propensity Scoring (MIPS) 和 Marginalized Doubly Robust (MDR) estimator.
  • results: 提供了一种更加精度的 estimator, 并且在实验中证明了其超过了现有的 estimator.
    Abstract We study Off-Policy Evaluation (OPE) in contextual bandit settings with large action spaces. The benchmark estimators suffer from severe bias and variance tradeoffs. Parametric approaches suffer from bias due to difficulty specifying the correct model, whereas ones with importance weight suffer from variance. To overcome these limitations, Marginalized Inverse Propensity Scoring (MIPS) was proposed to mitigate the estimator's variance via embeddings of an action. To make the estimator more accurate, we propose the doubly robust estimator of MIPS called the Marginalized Doubly Robust (MDR) estimator. Theoretical analysis shows that the proposed estimator is unbiased under weaker assumptions than MIPS while maintaining variance reduction against IPS, which was the main advantage of MIPS. The empirical experiment verifies the supremacy of MDR against existing estimators.
    摘要 我们研究在Contextual Bandit设置下的Off-Policy评估(OPE),它们的标准估计器受到严重的偏见和方差交易的影响。参数化方法受到模型难以准确地特定的偏见,而重要性Weighted方法受到方差的影响。为了解决这些限制,我们提出了Embeddings of an action的Marginalized Inverse Propensity Scoring(MIPS)来减少估计器的方差。为了使估计器更准确,我们提出了MIPS的双重Robust(MDR)估计器。理论分析表明,我们的估计器在较弱的假设下具有不偏性,同时维持IPS的方差减少。实验证明了MDR的超越性。
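
The estimators discussed here build on standard off-policy evaluation machinery. The sketch below implements plain IPS and the classical doubly robust estimator on toy logged bandit data; the marginalized variants (MIPS/MDR) keep the same structure but define the importance weight on an action embedding, which is only indicated in a comment rather than implemented.

```python
import numpy as np

def ips_estimate(rewards, pi_e, pi_b):
    """Inverse propensity scoring: average of w * r with w = pi_e(a|x) / pi_b(a|x)."""
    w = pi_e / pi_b
    return float(np.mean(w * rewards))

def dr_estimate(rewards, pi_e, pi_b, q_hat_logged, v_hat):
    """Doubly robust: model-based value plus an importance-weighted correction.

    q_hat_logged : regression estimate of E[r | x, a] at the logged actions.
    v_hat        : E_{a ~ pi_e}[q_hat(x, a)] for each logged context.
    The marginalized variants (MIPS / MDR) use the same structure but define
    the weight w on an action *embedding* instead of the raw action, which
    shrinks the variance when the action space is large.
    """
    w = pi_e / pi_b
    return float(np.mean(v_hat + w * (rewards - q_hat_logged)))

# Toy logged data: propensities of the behaviour / evaluation policies at the logged actions.
rng = np.random.default_rng(0)
n = 10_000
pi_b = rng.uniform(0.05, 0.5, size=n)
pi_e = rng.uniform(0.05, 0.5, size=n)
rewards = rng.binomial(1, 0.3, size=n).astype(float)
q_hat = np.full(n, 0.3)            # placeholder outcome model
v_hat = np.full(n, 0.3)
print(ips_estimate(rewards, pi_e, pi_b), dr_estimate(rewards, pi_e, pi_b, q_hat, v_hat))
```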

  • paper_url: http://arxiv.org/abs/2308.03417
  • repo_url: https://github.com/purl-sanitizer/purl
  • paper_authors: Shaoor Munir, Patrick Lee, Umar Iqbal, Zubair Shafiq, Sandra Siby
  • for: 防止追踪浏览器新增防御策略,novel tracking方法继续出现。
  • methods: 利用机器学习方法,检测和净化链接装饰中的追踪信息。
  • results: PURL可以准确地检测和净化链接装饰,比现有Countermeasure更高效和可靠,并对常见欺骗技术有较好的鲁棒性。
    Abstract While privacy-focused browsers have taken steps to block third-party cookies and browser fingerprinting, novel tracking methods that bypass existing defenses continue to emerge. Since trackers need to exfiltrate information from the client- to server-side through link decoration regardless of the tracking technique they employ, a promising orthogonal approach is to detect and sanitize tracking information in decorated links. We present PURL, a machine-learning approach that leverages a cross-layer graph representation of webpage execution to safely and effectively sanitize link decoration. Our evaluation shows that PURL significantly outperforms existing countermeasures in terms of accuracy and reducing website breakage while being robust to common evasion techniques. We use PURL to perform a measurement study on top-million websites. We find that link decorations are widely abused by well-known advertisers and trackers to exfiltrate user information collected from browser storage, email addresses, and scripts involved in fingerprinting.
    摘要 “对于隐私浏览器的尝试,第三方Cookie和浏览器指纹都已经被防止,但新的追踪方法继续出现,这些方法可以跳过现有的防护措施。因为追踪者需要将信息从客户端传到服务器端,因此一个有效的对策是检测和清理链接装饰。我们提出了PURL,一种机器学习方法,利用页面执行的跨层图表示来安全地和有效地检测和清理链接装饰。我们的评估显示,PURL比现有的对策更高度精度和减少网站损坏,同时具有对常见的逃脱技术的抗性。我们使用PURL进行了顶千个网站的测量研究,发现知名的广告商和追踪者广泛滥用链接装饰,以外泄从浏览器存储收集的用户信息、电子邮件地址以及参与指纹识别的脚本。”

Noncompact uniform universal approximation

  • paper_url: http://arxiv.org/abs/2308.03812
  • repo_url: None
  • paper_authors: Teun D. H. van Nuland
  • for: 这个论文探讨了universal approximation theorem在非 компакт输入空间 $\mathbb R^n$ 上的普遍化 convergenc。
  • methods: 这个论文使用了神经网络来对所有在 $\mathbb R^n$ 上连续函数进行uniform approximation。
  • results: 研究发现,对于所有非零 activation function $\varphi$ 和所有 $n$ 和 $l\geq2$, THEN $\mathcal{N}_\varphi^l(\mathbb R^n)$ 是一个 vector space,且对于左限和右限不同的 $\varphi$,这个 vector space 独立于 $\varphi$ 和 $l$,且等于 sigmoid compose with one-dimensional projection 的闭 span。对于左限和右限相同的 $\varphi$,这个 vector space 等于 commutative resolvent algebra,一个 C*-algebra,且独立于 $l\geq1$。
    Abstract The universal approximation theorem is generalised to uniform convergence on the (noncompact) input space $\mathbb R^n$. All continuous functions that vanish at infinity can be uniformly approximated by neural networks with one hidden layer, for all continuous activation functions $\varphi\neq0$ with asymptotically linear behaviour at $\pm\infty$. When $\varphi$ is moreover bounded, we exactly determine which functions can be uniformly approximated by neural networks, with the following unexpected results. Let $\overline{\mathcal{N}_\varphi^l(\mathbb R^n)}$ denote the vector space of functions that are uniformly approximable by neural networks with $l$ hidden layers and $n$ inputs. For all $n$ and all $l\geq2$, $\overline{\mathcal{N}_\varphi^l(\mathbb R^n)}$ turns out to be an algebra under the pointwise product. If the left limit of $\varphi$ differs from its right limit (for instance, when $\varphi$ is sigmoidal) the algebra $\overline{\mathcal{N}_\varphi^l(\mathbb R^n)}$ ($l\geq2$) is independent of $\varphi$ and $l$, and equals the closed span of products of sigmoids composed with one-dimensional projections. If the left limit of $\varphi$ equals its right limit, $\overline{\mathcal{N}_\varphi^l(\mathbb R^n)}$ ($l\geq1$) equals the (real part of the) commutative resolvent algebra, a C*-algebra which is used in mathematical approaches to quantum theory. In the latter case, the algebra is independent of $l\geq1$, whereas in the former case $\overline{\mathcal{N}_\varphi^2(\mathbb R^n)}$ is strictly bigger than $\overline{\mathcal{N}_\varphi^1(\mathbb R^n)}$.
    摘要 我们将万能逼近定理推广到(非紧致的)输入空间 $\mathbb R^n$ 上的一致收敛。对于所有在无穷远处趋于零的连续函数,只要连续激活函数 $\varphi\neq0$ 在 $\pm\infty$ 处具有渐近线性行为,就可以用单隐层神经网络对其进行一致逼近。当 $\varphi$ 进一步有界时,我们可以精确刻画哪些函数能被神经网络一致逼近,并得到如下出人意料的结果。记 $\overline{\mathcal{N}_\varphi^l(\mathbb R^n)}$ 为可被具有 $l$ 个隐层、$n$ 个输入的神经网络一致逼近的函数所构成的向量空间。对所有 $n$ 和所有 $l\geq2$,$\overline{\mathcal{N}_\varphi^l(\mathbb R^n)}$ 在逐点乘法下构成一个代数。若 $\varphi$ 的左极限与右极限不同(例如 $\varphi$ 为 sigmoid 型),则该代数($l\geq2$)与 $\varphi$ 和 $l$ 无关,等于由 sigmoid 与一维投影复合而成的函数之积的闭线性包。若 $\varphi$ 的左极限等于右极限,则 $\overline{\mathcal{N}_\varphi^l(\mathbb R^n)}$($l\geq1$)等于交换预解代数(一种用于量子理论数学刻画的 C*-代数)的实部。在后一种情形下,该代数与 $l\geq1$ 无关;而在前一种情形下,$\overline{\mathcal{N}_\varphi^2(\mathbb R^n)}$ 严格大于 $\overline{\mathcal{N}_\varphi^1(\mathbb R^n)}$。

Applied metamodelling for ATM performance simulations

  • paper_url: http://arxiv.org/abs/2308.03404
  • repo_url: None
  • paper_authors: Christoffer Riis, Francisco N. Antunes, Tatjana Bolić, Gérald Gurtner, Andrew Cook, Carlos Lima Azevedo, Francisco Câmara Pereira
  • for: 提高ATM simulator的计划和运作的决策支持
  • methods: integrate active learning和SHAP值进行模拟мета模型
  • results: 比XGBoost模型具有更好的解释能力,并且可以更好地揭示输入和输出变量之间的隐藏关系。
    Abstract The use of Air traffic management (ATM) simulators for planning and operations can be challenging due to their modelling complexity. This paper presents XALM (eXplainable Active Learning Metamodel), a three-step framework integrating active learning and SHAP (SHapley Additive exPlanations) values into simulation metamodels for supporting ATM decision-making. XALM efficiently uncovers hidden relationships among input and output variables in ATM simulators, those usually of interest in policy analysis. Our experiments show XALM's predictive performance comparable to the XGBoost metamodel with fewer simulations. Additionally, XALM exhibits superior explanatory capabilities compared to non-active learning metamodels. Using the `Mercury' (flight and passenger) ATM simulator, XALM is applied to a real-world scenario in Paris Charles de Gaulle airport, extending an arrival manager's range and scope by analysing six variables. This case study illustrates XALM's effectiveness in enhancing simulation interpretability and understanding variable interactions. By addressing computational challenges and improving explainability, XALM complements traditional simulation-based analyses. Lastly, we discuss two practical approaches for reducing the computational burden of the metamodelling further: we introduce a stopping criterion for active learning based on the inherent uncertainty of the metamodel, and we show how the simulations used for the metamodel can be reused across key performance indicators, thus decreasing the overall number of simulations needed.
    摘要 使用空交通管理(ATM)模拟器进行规划和运行可能会面临模型复杂性挑战。本文介绍XALM(可解释主动学习元模型),一个三步框架,将活动学习和SHAP(SHapley Additive exPlanations)值 integrate到 simulation元模型中,以支持ATM决策。XALM能够效率地揭示ATM模拟器中输入和输出变量之间的隐藏关系,通常是政策分析中的关键点。我们的实验表明,XALM的预测性能与XGBoost元模型相当,而且XALM的解释能力比非活动学习元模型更高。使用Mercury(飞机和乘客)ATM模拟器,XALM在法国巴黎查理·德·古尔机场的一个实际场景中应用,分析了六个变量。这个案例示出了XALM在提高模拟解释性和理解变量互动方面的效果。通过解决计算挑战和提高解释性,XALM补充了传统的模拟分析。最后,我们介绍了两种实用的计算压力减轻方法:基于元模型内在不确定性的活动学习停止 criterion,以及可以将模拟用于元模型中的 simulation reuse across key performance indicators,从而降低总的模拟数量。

Towards Machine Learning-based Fish Stock Assessment

  • paper_url: http://arxiv.org/abs/2308.03403
  • repo_url: None
  • paper_authors: Stefan Lüdtke, Maria E. Pierce
  • for: 提高可持续性渔业管理中鱼类资源的准确评估
  • methods: 使用机器学习模型改进鱼类资源参数的估计和预测
  • results: 对五种不同的鱼类资源进行实验,发现预测减降率和繁殖种群质量的准确率有很大改善
    Abstract The accurate assessment of fish stocks is crucial for sustainable fisheries management. However, existing statistical stock assessment models can have low forecast performance of relevant stock parameters like recruitment or spawning stock biomass, especially in ecosystems that are changing due to global warming and other anthropogenic stressors. In this paper, we investigate the use of machine learning models to improve the estimation and forecast of such stock parameters. We propose a hybrid model that combines classical statistical stock assessment models with supervised ML, specifically gradient boosted trees. Our hybrid model leverages the initial estimate provided by the classical model and uses the ML model to make a post-hoc correction to improve accuracy. We experiment with five different stocks and find that the forecast accuracy of recruitment and spawning stock biomass improves considerably in most cases.
    摘要 准确评估淡水鱼资源非常重要,以实现可持续的渔业管理。然而,现有的统计鱼填评估模型可能具有低预测性能,特别是在因全球变暖和其他人类压力而变化的生态系统中。在这篇论文中,我们研究了使用机器学习模型提高鱼填评估和预测的方法。我们提议一种混合模型,结合传统的统计鱼填评估模型和监督学习,具体来说是梯度提升树。我们的混合模型利用传统模型提供的初始估计,并使用ML模型进行后续更正,以提高准确性。我们对五个不同的鱼种进行实验,发现预测准确性在大多数情况下有显著提高。
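
One way to realize the proposed hybrid is to let a gradient-boosted model learn a post-hoc correction to the classical assessment output. The scikit-learn sketch below shows that residual-correction pattern on synthetic data; the covariates, the classical model, and the in-sample evaluation are placeholders rather than the paper's setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hedged sketch of the hybrid idea: take the classical stock-assessment
# estimate as a first guess and let a boosted-tree model learn a post-hoc
# correction from auxiliary covariates. Feature names and data are placeholders.
rng = np.random.default_rng(0)
n = 300
covariates = rng.normal(size=(n, 3))              # e.g. temperature, fishing effort, ...
true_recruitment = 10 + covariates[:, 0] + 0.5 * covariates[:, 1] + rng.normal(0, 0.3, n)
classical_estimate = 10 + 0.6 * covariates[:, 0]  # biased/incomplete classical model output

# Fit the ML model on the residual between observations and the classical estimate.
X = np.column_stack([classical_estimate, covariates])
residual_model = GradientBoostingRegressor().fit(X, true_recruitment - classical_estimate)

hybrid_forecast = classical_estimate + residual_model.predict(X)
print("classical MAE:", np.mean(np.abs(true_recruitment - classical_estimate)))
print("hybrid MAE   :", np.mean(np.abs(true_recruitment - hybrid_forecast)))
```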

Enhancing Nucleus Segmentation with HARU-Net: A Hybrid Attention Based Residual U-Blocks Network

  • paper_url: http://arxiv.org/abs/2308.03382
  • repo_url: None
  • paper_authors: Junzhou Chen, Qian Huang, Yulin Chen, Linyi Qian, Chengyuan Yu
  • for: 这个研究主要旨在提高核体像素化的精度和效率,以便于生物医学分析、诊断和分类中使用。
  • methods: 我们提出了一个基于双支分支网络的混合注意力残差U-块方法,可以同时预测目标信息和目标 outline。我们还提出了一个后处理方法,可以结合目标信息和目标 outline来区别遮蔽的核体和生成实例分割图像。
  • results: 我们的方法在各个数据集上进行了广泛的量化评估,结果显示我们的方法在与现有方法比较时表现出色,特别是在应用于不规则核体的情况下。
    Abstract Nucleus image segmentation is a crucial step in analysis, pathological diagnosis, and classification, all of which rely heavily on the quality of the segmentation. However, the complexity of issues such as variations in nucleus size, blurred nucleus contours, uneven staining, cell clustering, and overlapping cells poses significant challenges. Current methods for nucleus segmentation primarily rely on nuclear morphology or contour-based approaches. Nuclear morphology-based methods exhibit limited generalization ability and struggle to effectively predict irregular-shaped nuclei, while contour-based extraction methods face challenges in accurately segmenting overlapping nuclei. To address the aforementioned issues, we propose a dual-branch network using hybrid attention based residual U-blocks for nucleus instance segmentation. The network simultaneously predicts target information and target contours. Additionally, we introduce a post-processing method that combines the target information and target contours to distinguish overlapping nuclei and generate an instance segmentation image. Within the network, we propose a context fusion block (CF-block) that effectively extracts and merges contextual information from the network. Extensive quantitative evaluations are conducted to assess the performance of our method. Experimental results demonstrate the superior performance of the proposed method compared to state-of-the-art approaches on the BNS, MoNuSeg, CoNSeg, and CPM-17 datasets.
    摘要 核心图像分割是生物体分析、病理诊断和分类中的关键步骤,它的质量直接影响下游应用的结果。然而,核心图像分割过程面临着许多复杂的问题,如核心大小变化、核心渐圆、不均颜色、细胞堆叠和重叠细胞等。现有的核心图像分割方法主要基于核心形态或边沿基本方法。核心形态基本方法具有限定泛化能力,难以预测不规则形状的核心,而边沿基本方法在重叠细胞分割上存在困难。为解决以上问题,我们提出了一种基于双支网络的核心实例分割方法。该网络同时预测目标信息和目标极值。此外,我们引入了一种 combining 目标信息和目标极值的后处理方法,以分辨重叠的核心并生成实例分割图像。在网络中,我们提出了一种Context Fusion块(CF-块),可以有效地提取和融合网络中的Contextual信息。我们对方法进行了广泛的量化评估,并发现方法的性能在BNS、MoNuSeg、CoNSeg和CPM-17等数据集上都显著超过了现有方法。

A reading survey on adversarial machine learning: Adversarial attacks and their understanding

  • paper_url: http://arxiv.org/abs/2308.03363
  • repo_url: None
  • paper_authors: Shashank Kotyan
  • for: 本研究旨在探讨和理解针对神经网络的攻击方法,以系统化的方式掌握攻击方法的类别和特点。
  • methods: 本文使用了多种攻击方法,包括随机攻击、梯度攻击、缺失攻击、噪声攻击等,以测试神经网络的抵御能力。
  • results: 本文通过对多种神经网络模型进行攻击和防御测试,发现攻击方法的多样性和神经网络模型的抵御能力强度不同,并提出了一些未来研究方向。
    Abstract Deep Learning has empowered us to train neural networks for complex data with high performance. However, with the growing research, several vulnerabilities in neural networks have been exposed. A particular branch of research, Adversarial Machine Learning, exploits and understands some of the vulnerabilities that cause the neural networks to misclassify for near original input. A class of algorithms called adversarial attacks is proposed to make the neural networks misclassify for various tasks in different domains. With the extensive and growing research in adversarial attacks, it is crucial to understand the classification of adversarial attacks. This will help us understand the vulnerabilities in a systematic order and help us to mitigate the effects of adversarial attacks. This article provides a survey of existing adversarial attacks and their understanding based on different perspectives. We also provide a brief overview of existing adversarial defences and their limitations in mitigating the effect of adversarial attacks. Further, we conclude with a discussion on the future research directions in the field of adversarial machine learning.
    摘要 深度学习已经赋予我们训练复杂数据的神经网络高性能。然而,随着研究的增长,许多神经网络的漏洞也被曝光。一个特定的研究分支,敌意机器学习,利用和掌握了一些导致神经网络错分的漏洞。一类称为敌意攻击的算法被提出来使神经网络错分各种任务在不同领域。随着敌意攻击的广泛和增长的研究,我们需要理解敌意攻击的分类。这将帮助我们系统地理解漏洞,并帮助我们减轻敌意攻击的影响。本文提供了现有的敌意攻击和它们的理解基于不同的角度。我们还提供了敌意防御的简要概述和其限制在减轻敌意攻击的影响。最后,我们结束 WITH 未来机器学习领域的研究方向。

Solving Falkner-Skan type equations via Legendre and Chebyshev Neural Blocks

  • paper_url: http://arxiv.org/abs/2308.03337
  • repo_url: None
  • paper_authors: Alireza Afzal Aghaei, Kourosh Parand, Ali Nikkhah, Shakila Jaberi
  • for: 解决非线性法克-斯坦方程
  • methods: 使用Legendre和Chebyshev神经块,利用 ortogonal polynomials 在神经网络中增强人工神经网络的近似能力
  • results: 通过模拟不同的法克-斯坦方程配置,实现了提高计算效率和准确率的目的
    Abstract In this paper, a new deep-learning architecture for solving the non-linear Falkner-Skan equation is proposed. Using Legendre and Chebyshev neural blocks, this approach shows how orthogonal polynomials can be used in neural networks to increase the approximation capability of artificial neural networks. In addition, utilizing the mathematical properties of these functions, we overcome the computational complexity of the backpropagation algorithm by using the operational matrices of the derivative. The efficiency of the proposed method is demonstrated by simulating various configurations of the Falkner-Skan equation.
    摘要 在本文中,一种新的深度学习架构,用于解决非线性法克纳-斯坦方程,被提出。使用Legendre和Chebyshev神经块,这种方法展示了如何在神经网络中使用正交多项式增加人工神经网络的近似能力。此外,利用这些函数的数学性质,我们超越了反射算法的计算复杂性,使用操作矩阵的导数。提出的方法的效率被通过 simulate多种法克纳-斯坦方程的配置进行证明。
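
The core building block is an orthogonal-polynomial expansion inside the network. The PyTorch sketch below implements a Chebyshev block via the recurrence $T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$; how the paper assembles such blocks into the full Falkner-Skan solver, including the operational derivative matrices and boundary conditions, is not reproduced.

```python
import torch
import torch.nn as nn

class ChebyshevBlock(nn.Module):
    """Expand a scalar input into Chebyshev polynomials T_0..T_{degree} and take
    a learned linear combination (a sketch of an orthogonal-polynomial neural
    block, not the paper's exact architecture)."""
    def __init__(self, degree: int, out_features: int):
        super().__init__()
        self.degree = degree
        self.linear = nn.Linear(degree + 1, out_features)

    def forward(self, x):                       # x: (batch, 1), assumed scaled to [-1, 1]
        polys = [torch.ones_like(x), x]
        for _ in range(2, self.degree + 1):
            polys.append(2 * x * polys[-1] - polys[-2])   # T_{n+1} = 2x T_n - T_{n-1}
        features = torch.cat(polys[: self.degree + 1], dim=1)
        return self.linear(features)

x = torch.linspace(-1, 1, 16).unsqueeze(1)
block = ChebyshevBlock(degree=6, out_features=8)
print(block(x).shape)                           # torch.Size([16, 8])
```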

Non-Convex Bilevel Optimization with Time-Varying Objective Functions

  • paper_url: http://arxiv.org/abs/2308.03811
  • repo_url: None
  • paper_authors: Sen Lin, Daouda Sow, Kaiyi Ji, Yingbin Liang, Ness Shroff
  • for: 本研究强调在在线应用中实现精细化优化,满足流动数据和时间变化函数的需求。
  • methods: 我们提出了一种基于单 Loop 的在线双层优化器(SOBOW),通过窗口均值来更新外层决策,不需要知道过去函数。我们还开发了一种新的分析技术,用于综合分析决策变量之间的复杂 Coupling,并且精细控制了 hypergradient 估计误差。
  • results: 我们证明 SOBOW 可以在某些条件下实现幂等级的双层本地 regret。广泛的实验结果证明 SOBOW 的效果。
    Abstract Bilevel optimization has become a powerful tool in a wide variety of machine learning problems. However, the current nonconvex bilevel optimization considers an offline dataset and static functions, which may not work well in emerging online applications with streaming data and time-varying functions. In this work, we study online bilevel optimization (OBO) where the functions can be time-varying and the agent continuously updates the decisions with online streaming data. To deal with the function variations and the unavailability of the true hypergradients in OBO, we propose a single-loop online bilevel optimizer with window averaging (SOBOW), which updates the outer-level decision based on a window average of the most recent hypergradient estimations stored in the memory. Compared to existing algorithms, SOBOW is computationally efficient and does not need to know previous functions. To handle the unique technical difficulties rooted in single-loop update and function variations for OBO, we develop a novel analytical technique that disentangles the complex couplings between decision variables, and carefully controls the hypergradient estimation error. We show that SOBOW can achieve a sublinear bilevel local regret under mild conditions. Extensive experiments across multiple domains corroborate the effectiveness of SOBOW.
    摘要 bilateral 优化已成为机器学习问题中的一种强大工具。然而,当前的非凸 bilateral 优化假设了一个离线数据集和静止函数,这可能不适用于新般的在线应用程序中的流动数据和时间变化函数。在这种工作中,我们研究在线 bilateral 优化(OBO),其中函数可以是时间变化的,代理人在线流动数据中不断更新决策。为了处理函数的变化和真实的梯度不可知,我们提议了一种带窗口平均(SOBOW)的单loop在线 bilateral 优化器,其在内存中保存最近的梯度估计,并基于窗口平均更新外层决策。相比现有算法,SOBOW具有计算效率和不需要知道前一个函数的优点。为了处理单 loop 更新和函数变化对 OBO 的独特技术难点,我们开发了一种新的分析技术,可以分解决策变量之间的复杂 Coupling,并且精心控制梯度估计错误。我们表明,SOBOW 可以在某些条件下 achieve 下降的 bilateral 地方 regret。广泛的实验证明了 SOBOW 的有效性。
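
The central mechanism of SOBOW is updating the outer decision with a window average of recent hypergradient estimates. The sketch below shows only that windowed outer update on a toy drifting quadratic; the inner-level updates and the actual hypergradient estimator are abstracted into a user-supplied quantity.

```python
from collections import deque
import numpy as np

def sobow_like_update(x, hypergrad_estimate, window, lr=0.1, window_size=5):
    """Sketch of a window-averaged outer update (not the full SOBOW algorithm).

    x                  : current outer-level decision variable.
    hypergrad_estimate : estimate of the hypergradient at the current round,
                         e.g. computed from the current inner solution.
    window             : deque storing the most recent estimates.
    """
    window.append(hypergrad_estimate)
    if len(window) > window_size:
        window.popleft()
    avg_grad = np.mean(list(window), axis=0)    # window average smooths function variation
    return x - lr * avg_grad

# Toy stream of noisy hypergradients for a time-varying quadratic objective.
rng = np.random.default_rng(0)
x = np.zeros(3)
window = deque()
for t in range(50):
    target = np.array([1.0, -2.0, 0.5]) + 0.1 * np.sin(0.2 * t)   # drifting optimum
    noisy_hypergrad = (x - target) + 0.3 * rng.normal(size=3)
    x = sobow_like_update(x, noisy_hypergrad, window)
print("final outer decision:", np.round(x, 2))
```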

Expediting Neural Network Verification via Network Reduction

  • paper_url: http://arxiv.org/abs/2308.03330
  • repo_url: None
  • paper_authors: Yuyi Zhong, Ruiwei Wang, Siau-Cheng Khoo
  • for: 验证深度神经网络的安全性属性,以确保神经网络在关键应用中正常工作。
  • methods: 提出了多种验证方法,以验证神经网络的安全性。但是,许多已知的验证工具仍然无法处理复杂的网络架构和大型神经网络。本文提出了一种网络减少技术作为验证前置处理方法。
  • results: 我们在大量的 benchmark 上实验表明,提posed 的减少技术可以减少神经网络,并使现有的验证工具更快速。此外,实验结果还表明,网络减少可以提高现有验证工具对许多神经网络的可用性。
    Abstract A wide range of verification methods have been proposed to verify the safety properties of deep neural networks ensuring that the networks function correctly in critical applications. However, many well-known verification tools still struggle with complicated network architectures and large network sizes. In this work, we propose a network reduction technique as a pre-processing method prior to verification. The proposed method reduces neural networks via eliminating stable ReLU neurons, and transforming them into a sequential neural network consisting of ReLU and Affine layers which can be handled by the most verification tools. We instantiate the reduction technique on the state-of-the-art complete and incomplete verification tools, including alpha-beta-crown, VeriNet and PRIMA. Our experiments on a large set of benchmarks indicate that the proposed technique can significantly reduce neural networks and speed up existing verification tools. Furthermore, the experiment results also show that network reduction can improve the availability of existing verification tools on many networks by reducing them into sequential neural networks.
    摘要 深度神经网络的安全性特性的验证方法有很多已经被提出,以确保神经网络在关键应用中正确地工作。然而,许多知名的验证工具仍然无法处理复杂的网络架构和大型网络。在这种情况下,我们提出了一种网络减少技术作为预处理方法,以降低验证工具的难度。我们的方法利用稳定的ReLU神经元消除和转换为一个包含ReLU和Affine层的顺序神经网络,这种网络可以被大多数验证工具处理。我们在 alpha-beta-crown、VeriNet 和 PRIMA 等当今最佳实践中的完整和 incomplete 验证工具上实现了这种减少技术。我们对一组大型标准 benchmark 进行了实验,结果表明,我们的方法可以减少神经网络,并使现有的验证工具在许多网络上提高可用性。此外,实验结果还表明,网络减少可以提高现有验证工具对许多网络的验证能力。
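
One ingredient of such a reduction is identifying ReLUs that are stably active or stably inactive over the verified input region and folding them into the neighbouring affine layers. The NumPy sketch below does this for a two-layer block using simple interval bounds; it is an illustration of the idea under those assumptions, not the paper's full reduction pipeline.

```python
import numpy as np

def interval_preactivation_bounds(W, b, lo, hi):
    """Exact pre-activation bounds of an affine layer over the box [lo, hi]."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    lower = W_pos @ lo + W_neg @ hi + b
    upper = W_pos @ hi + W_neg @ lo + b
    return lower, upper

def remove_stable_relus(W1, b1, W2, b2, lo, hi):
    """Sketch of stable-ReLU elimination for a block W2 . relu(W1 x + b1) + b2.

    Stably inactive neurons (upper bound <= 0) are dropped; stably active
    neurons (lower bound >= 0) are folded into the next affine layer, since
    relu is the identity on them. Only unstable neurons keep their ReLU.
    """
    lower, upper = interval_preactivation_bounds(W1, b1, lo, hi)
    active = lower >= 0
    inactive = upper <= 0
    unstable = ~(active | inactive)

    # Fold stably active neurons: their contribution becomes purely affine.
    W2_folded = W2[:, active] @ W1[active]
    b2_folded = W2[:, active] @ b1[active] + b2
    # Keep only unstable neurons behind a ReLU; drop inactive ones entirely.
    return W1[unstable], b1[unstable], W2[:, unstable], W2_folded, b2_folded

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8) * 3
W2, b2 = rng.normal(size=(2, 8)), rng.normal(size=2)
lo, hi = -0.1 * np.ones(4), 0.1 * np.ones(4)    # small input region => many stable ReLUs
parts = remove_stable_relus(W1, b1, W2, b2, lo, hi)
print("unstable neurons kept:", parts[0].shape[0], "of", W1.shape[0])
```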

AFN: Adaptive Fusion Normalization via Encoder-Decoder Framework

  • paper_url: http://arxiv.org/abs/2308.03321
  • repo_url: https://github.com/huanranchen/ASRNorm
  • paper_authors: Zikai Zhou, Huanran Chen
  • for: 该论文目的是提出一种统一的Normalization函数,以减少不同Normalization函数的缺点。
  • methods: 该论文使用了多种Normalization函数,并通过对比这些函数的优缺点,提出了一种新的Adaptive Fusion Normalization函数。
  • results: 实验结果显示,AFN函数在领域泛化和图像分类任务中表现较好,超过了之前的Normalization技术。
    Abstract The success of deep learning is inseparable from normalization layers. Researchers have proposed various normalization functions, and each of them has both advantages and disadvantages. In response, efforts have been made to design a unified normalization function that combines all normalization procedures and mitigates their weaknesses. In this paper, we propose a new normalization function called Adaptive Fusion Normalization (AFN). Through experiments, we demonstrate that AFN outperforms previous normalization techniques in domain generalization and image classification tasks.
    摘要 深度学习的成功离不开归一化层。研究人员已经提出了多种归一化函数,每种都有其优点和缺点。为此,人们尝试设计一种统一的归一化函数,既能整合所有归一化过程,又能缓解它们的缺点。我们也提出了一种新的归一化函数,称为自适应融合归一化(AFN)。实验表明,AFN在领域泛化和图像分类任务中优于先前的归一化技术。

Binary Federated Learning with Client-Level Differential Privacy

  • paper_url: http://arxiv.org/abs/2308.03320
  • repo_url: None
  • paper_authors: Lumin Liu, Jun Zhang, Shenghui Song, Khaled B. Letaief
  • for: 提高 Federated Learning 系统的隐私保护和性能。
  • methods: 使用 binary neural networks (BNNs) 和离散噪声来实现 client-level 隐私保护,并且通过减少模型参数的精度来提高通信效率。
  • results: 实验结果基于 MNIST 和 Fashion-MNIST 数据集表明,提议的训练算法可以实现客户端级隐私保护而同时具有低通信开销的优势。
    Abstract Federated learning (FL) is a privacy-preserving collaborative learning framework, and differential privacy can be applied to further enhance its privacy protection. Existing FL systems typically adopt Federated Average (FedAvg) as the training algorithm and implement differential privacy with a Gaussian mechanism. However, the inherent privacy-utility trade-off in these systems severely degrades the training performance if a tight privacy budget is enforced. Besides, the Gaussian mechanism requires model weights to be of high precision. To improve communication efficiency and achieve a better privacy-utility trade-off, we propose a communication-efficient FL training algorithm with a differential privacy guarantee. Specifically, we propose to adopt binary neural networks (BNNs) and introduce discrete noise in the FL setting. Binary model parameters are uploaded for higher communication efficiency and discrete noise is added to achieve the client-level differential privacy protection. The achieved performance guarantee is rigorously proved, and it is shown to depend on the level of discrete noise. Experimental results based on MNIST and Fashion-MNIST datasets demonstrate that the proposed training algorithm achieves client-level privacy protection with performance gain while enjoying the benefits of low communication overhead from binary model updates.
    摘要 federated learning(FL)是一种隐私保护的协作学习框架,可以通过减少隐私泄露来进一步增强隐私保护。现有的FL系统通常采用联邦平均(FedAvg)作为训练算法,并通过高精度的模型权重来实现减少隐私的目的。然而,这种隐私性和用途之间的质量负担在这些系统中严重地降低了训练性能,特别是当强制实施严格的隐私预算时。此外,高精度的模型权重需要高精度的数据。为了提高通信效率和实现更好的隐私性和用途之间的质量负担,我们提议一种通信高效的FL训练算法,并且保证隐私性。具体来说,我们提议采用二进制神经网络(BNN)和在FL设定中引入离散噪声。二进制模型参数上传以提高通信效率,而离散噪声可以实现客户端级别的隐私保护。我们通过 teorema 证明了性能保证,并证明其取决于离散噪声的水平。实验结果基于 MNIST 和 Fashion-MNIST 数据集表明,提议的训练算法可以实现客户端级别的隐私保护,同时享受到低通信开销的 binary 模型更新的好处。
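
The communication side of the scheme can be illustrated by binarizing each client update and perturbing the uploaded signs. The sketch below uses a randomized-response-style sign flip purely for illustration; the paper's actual discrete-noise mechanism and its client-level differential-privacy accounting are not reproduced, and no privacy guarantee is claimed for this toy version.

```python
import numpy as np

def binarize_and_perturb(update, flip_prob=0.1, rng=None):
    """Sketch of a binary, perturbed client upload (illustrative only; the
    paper's discrete-noise mechanism and its privacy accounting differ).

    update    : real-valued local model update from one client.
    flip_prob : probability of flipping each uploaded sign (randomized-
                response-style discrete noise for client-level protection).
    """
    if rng is None:
        rng = np.random.default_rng()
    signs = np.sign(update)
    signs[signs == 0] = 1                       # resolve ties to +1
    flips = rng.random(update.shape) < flip_prob
    return np.where(flips, -signs, signs)       # each coordinate is +-1

def server_aggregate(binary_updates):
    """Majority-vote style aggregation of the +-1 client updates."""
    return np.sign(np.sum(binary_updates, axis=0))

rng = np.random.default_rng(0)
true_direction = np.sign(rng.normal(size=20))
clients = [binarize_and_perturb(true_direction + 0.3 * rng.normal(size=20), rng=rng)
           for _ in range(11)]
agg = server_aggregate(np.stack(clients))
print("agreement with true direction:", np.mean(agg == true_direction))
```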

HomOpt: A Homotopy-Based Hyperparameter Optimization Method

  • paper_url: http://arxiv.org/abs/2308.03317
  • repo_url: https://github.com/jeffkinnison/shadho
  • paper_authors: Sophia J. Abraham, Kehelwala D. G. Maduranga, Jeffery Kinnison, Zachariah Carmichael, Jonathan D. Hauenstein, Walter J. Scheirer
  • for: 提高机器学习模型的性能和效率,即 Hyperparameter Optimization (HPO) 问题。
  • methods: 提出一种新的数据驱动的 Hyperparameter Optimization 方法,基于 Generalized Additive Model (GAM) 函数和 homotopy 优化。
  • results: 对多种优化技术(如 Random Search、TPE、Bayes 和 SMAC)进行比较,并在多个标准机器学习Benchmark和开放集ognition任务上显示出更好的目标性能。
    Abstract Machine learning has achieved remarkable success over the past couple of decades, often attributed to a combination of algorithmic innovations and the availability of high-quality data available at scale. However, a third critical component is the fine-tuning of hyperparameters, which plays a pivotal role in achieving optimal model performance. Despite its significance, hyperparameter optimization (HPO) remains a challenging task for several reasons. Many HPO techniques rely on naive search methods or assume that the loss function is smooth and continuous, which may not always be the case. Traditional methods, like grid search and Bayesian optimization, often struggle to quickly adapt and efficiently search the loss landscape. Grid search is computationally expensive, while Bayesian optimization can be slow to prime. Since the search space for HPO is frequently high-dimensional and non-convex, it is often challenging to efficiently find a global minimum. Moreover, optimal hyperparameters can be sensitive to the specific dataset or task, further complicating the search process. To address these issues, we propose a new hyperparameter optimization method, HomOpt, using a data-driven approach based on a generalized additive model (GAM) surrogate combined with homotopy optimization. This strategy augments established optimization methodologies to boost the performance and effectiveness of any given method with faster convergence to the optimum on continuous, discrete, and categorical domain spaces. We compare the effectiveness of HomOpt applied to multiple optimization techniques (e.g., Random Search, TPE, Bayes, and SMAC) showing improved objective performance on many standardized machine learning benchmarks and challenging open-set recognition tasks.
    摘要 机器学习在过去几十年内取得了很大成功,经常归功于算法创新和大规模数据的可用性。然而,一个第三要 componenet是细化参数的调整,它在实现优化模型性能中扮演着关键的角色。尽管其重要性,但参数优化(HPO)仍然是一个具有挑战性的任务,主要因为以下几个原因:多数HPO技术利用粗暴的搜索方法,或者假设损失函数是连续的,这并不总是情况。传统的方法,如格里德搜索和bayesian优化,经常难以快速适应和有效地搜索损失函数的 landscape。格里德搜索 computationally expensive,而bayesian优化可能需要很长时间来 prime。由于搜索空间 дляHPOfrequently高维和非 convex,因此寻找全局最优点是有很大挑战。此外,优化参数可能会受到特定的数据集或任务的影响,这进一步增加了搜索过程的复杂性。为解决这些问题,我们提出了一种新的参数优化方法,HomOpt,使用基于通用添加模型(GAM)的数据驱动方法,并结合抽象优化。这种策略可以增强现有优化方法的性能和有效性,并在不同的域空间上提供更快的趋势。我们对HomOpt应用于多种优化技术(例如Random Search、TPE、Bayes和SMAC),并在许多标准化机器学习benchmark和开放集成任务上显示出了提高了目标性能。

Deep Q-Network for Stochastic Process Environments

  • paper_url: http://arxiv.org/abs/2308.03316
  • repo_url: None
  • paper_authors: Kuangheng He
  • for: 本研究用深度学习抽象环境中的束缚学习方法解决复杂问题。
  • methods: 本研究使用深度Q学习网络,并评估不同结构的网络在束缚过程环境中的性能。
  • results: 研究结果表明,使用特定网络结构可以在束缚过程环境中提高性能。
    Abstract Reinforcement learning is a powerful approach for training an optimal policy to solve complex problems in a given system. This project aims to demonstrate the application of reinforcement learning in stochastic process environments with missing information, using Flappy Bird and a newly developed stock trading environment as case studies. We evaluate various structures of Deep Q-learning networks and identify the most suitable variant for the stochastic process environment. Additionally, we discuss the current challenges and propose potential improvements for further work in environment-building and reinforcement learning techniques.
    摘要 强化学习是一种强大的方法,用于训练最优策略以解决给定系统中的复杂问题。本项目旨在以 Flappy Bird 和新开发的股票交易环境为案例,展示强化学习在带有缺失信息的随机过程环境中的应用。我们评估了不同结构的深度Q学习网络,并确定了最适合随机过程环境的变体。此外,我们还讨论了当前的挑战,并提出了在环境构建和强化学习技术方面进一步改进的可能方向。

Symmetry-Preserving Program Representations for Learning Code Semantics

  • paper_url: http://arxiv.org/abs/2308.03312
  • repo_url: None
  • paper_authors: Kexin Pei, Weichen Li, Qirui Jin, Shuyang Liu, Scott Geng, Lorenzo Cavallaro, Junfeng Yang, Suman Jana
  • for: 本研究旨在提高自动化程序理解的能力,尤其是安全任务中的核心问题。
  • methods: 我们Draw inspiration from examples of convolution layers exploiting translation symmetry,探讨如何使用代码 симметрии提高 LL M 架构。我们提出了一种正式的群理论框架,准确地定义代码 симметрии为 semantics-preserving 变换,并提供了 precisione reasoning 技术来保证在 LL M 架构中Symmetry preservation。
  • results: 我们 introduce a novel variant of self-attention that preserves program symmetries,并通过详细的实验评估,证明其在泛化和Robustness 方面的效果。总的来说,我们的代码 Symmetry 框架提供了正式和有力的理由技术,可以导向将来的特циалиzed LL M 的发展,并推动 LL M 驱动的程序理解任务的进步。
    Abstract Large Language Models (LLMs) have shown promise in automated program reasoning, a crucial aspect of many security tasks. However, existing LLM architectures for code are often borrowed from other domains like natural language processing, raising concerns about their generalization and robustness to unseen code. A key generalization challenge is to incorporate the knowledge of code semantics, including control and data flow, into the LLM architectures. Drawing inspiration from examples of convolution layers exploiting translation symmetry, we explore how code symmetries can enhance LLM architectures for program analysis and modeling. We present a rigorous group-theoretic framework that formally defines code symmetries as semantics-preserving transformations and provides techniques for precisely reasoning about symmetry preservation within LLM architectures. Using this framework, we introduce a novel variant of self-attention that preserves program symmetries, demonstrating its effectiveness in generalization and robustness through detailed experimental evaluations across different binary and source code analysis tasks. Overall, our code symmetry framework offers rigorous and powerful reasoning techniques that can guide the future development of specialized LLMs for code and advance LLM-guided program reasoning tasks.
    摘要 Inspired by the use of convolution layers that exploit translation symmetry, we explore how code symmetries can enhance LLM architectures for program analysis and modeling. We provide a rigorous group-theoretic framework that defines code symmetries as semantics-preserving transformations and provides techniques for precisely reasoning about symmetry preservation within LLM architectures.Using this framework, we introduce a novel variant of self-attention that preserves program symmetries, which we demonstrate to be effective in terms of generalization and robustness through detailed experimental evaluations across different binary and source code analysis tasks. Overall, our code symmetry framework offers rigorous and powerful reasoning techniques that can guide the future development of specialized LLMs for code and advance LLM-guided program reasoning tasks.

Implicit Graph Neural Diffusion Based on Constrained Dirichlet Energy Minimization

  • paper_url: http://arxiv.org/abs/2308.03306
  • repo_url: None
  • paper_authors: Guoji Fu, Mohammed Haroon Dupty, Yanfei Dong, Lee Wee Sun
  • for: This paper aims to address the issues of over-smoothing and limited adaptability in implicit graph neural networks (GNNs) by introducing a geometric framework for designing implicit graph diffusion layers.
  • methods: The paper proposes a parameterized graph Laplacian operator to learn the geometry of vertex and edge spaces, as well as the graph gradient operator from data. The implicit graph diffusion layer is viewed as the fixed-point solution of a Dirichlet energy minimization problem, and the authors design a solution with constraints on vertex features to trade off smoothing with the preservation of node feature information.
  • results: The paper demonstrates better performance than leading implicit and explicit GNNs on benchmark datasets for node and graph classification tasks, with substantial accuracy improvements observed for some datasets.
    Abstract Implicit graph neural networks (GNNs) have emerged as a potential approach to enable GNNs to capture long-range dependencies effectively. However, poorly designed implicit GNN layers can experience over-smoothing or may have limited adaptability to learn data geometry, potentially hindering their performance in graph learning problems. To address these issues, we introduce a geometric framework to design implicit graph diffusion layers based on a parameterized graph Laplacian operator. Our framework allows learning the geometry of vertex and edge spaces, as well as the graph gradient operator from data. We further show how implicit GNN layers can be viewed as the fixed-point solution of a Dirichlet energy minimization problem and give conditions under which it may suffer from over-smoothing. To overcome the over-smoothing problem, we design our implicit graph diffusion layer as the solution of a Dirichlet energy minimization problem with constraints on vertex features, enabling it to trade off smoothing with the preservation of node feature information. With an appropriate hyperparameter set to be larger than the largest eigenvalue of the parameterized graph Laplacian, our framework guarantees a unique equilibrium and quick convergence. Our models demonstrate better performance than leading implicit and explicit GNNs on benchmark datasets for node and graph classification tasks, with substantial accuracy improvements observed for some datasets.
    摘要 匿名图 neural networks (GNNs) 已经出现为一种可能的方法,以便 GNNs 可以有效地捕捉长距离依赖关系。然而,如果设计不当的匿名 GNN 层,可能会导致过滤或有限适应性,从而妨碍它们在图学习问题中表现。为了解决这些问题,我们提出了一个几何框架,用于设计基于参数化图拉普拉斯运算符的匿名图扩散层。我们的框架允许学习顶点和边空间的几何结构,以及图的梯度运算符从数据中学习。此外,我们还证明了匿名 GNN 层可以视为 Dirichlet 能量最小化问题的固定点解,并给出了避免过滤的条件。为了超越过滤问题,我们设计了一种基于顶点特征的 Dirichlet 能量最小化问题的约束,使得匿名图扩散层能够平衡平滑与保留顶点特征信息之间的权衡。在适当的超参数设置为大于最大 eigenvalues of 参数化图拉普拉斯运算符时,我们的框架保证唯一的平衡点和快速收敛。我们的模型在 benchmark 数据集上 для 节点和图分类任务中表现出色,与领先的匿名 GNN 和Explicit GNN 相比,具有显著的准确率提高。
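
The trade-off the layer encodes is easiest to see on the unconstrained problem $\min_X \|X - X_0\|_F^2 + \lambda\,\mathrm{tr}(X^\top L X)$, whose minimizer $(I + \lambda L)^{-1} X_0$ smooths features along edges while staying anchored to the input features. The NumPy sketch below computes this fixed point on a tiny path graph; the parameterized Laplacian, learned gradient operator, and vertex-feature constraints from the paper are omitted.

```python
import numpy as np

# Tiny undirected path graph 0-1-2-3 and its combinatorial Laplacian L = D - A.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

X0 = np.array([[1.0], [0.0], [0.0], [-1.0]])    # input node features

def diffusion_fixed_point(L, X0, lam):
    """Minimizer of ||X - X0||_F^2 + lam * tr(X^T L X), i.e. X = (I + lam*L)^{-1} X0.
    Larger lam gives smoother features (lower Dirichlet energy); smaller lam keeps
    features closer to X0. The paper's layer adds constraints on the vertex
    features to control this trade-off, which this sketch omits."""
    n = L.shape[0]
    return np.linalg.solve(np.eye(n) + lam * L, X0)

for lam in (0.1, 1.0, 10.0):
    X = diffusion_fixed_point(L, X0, lam)
    energy = (X.T @ L @ X).item()
    print(f"lam={lam:>4}: features={np.round(X.ravel(), 2)}, Dirichlet energy={energy:.3f}")
```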

Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection

  • paper_url: http://arxiv.org/abs/2308.03300
  • repo_url: https://github.com/cecile-hi/regularized-adaptive-weight-modification
  • paper_authors: Xiaohui Zhang, Jiangyan Yi, Jianhua Tao, Chenglong Wang, Chuyuan Zhang
  • for: 这个论文的目的是解决伪音标注检测算法在不同数据集上表现下降的问题。
  • methods: 我们提出了一种持续学习算法,叫做Regularized Adaptive Weight Modification(RAWM),可以避免伪阳性检测算法中的溃败性忘记。当调整检测网络时,我们的方法会根据伪音和真音的比例进行适应的 modificaitondirection。
  • results: 我们在多个数据集上进行了跨数据集实验,结果表明我们的方法可以提高伪音标注检测的表现。另外,我们还引入了一个规化因素,以保持网络对于不同的音响环境中的真音标注的记忆。
    Abstract Current fake audio detection algorithms have achieved promising performances on most datasets. However, their performance may be significantly degraded when dealing with audio of a different dataset. The orthogonal weight modification to overcome catastrophic forgetting does not consider the similarity of genuine audio across different datasets. To overcome this limitation, we propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting, called Regularized Adaptive Weight Modification (RAWM). When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine utterances and fake utterances. The adaptive modification direction ensures the network can effectively detect fake audio on the new dataset while preserving its knowledge of old model, thus mitigating catastrophic forgetting. In addition, genuine audio collected from quite different acoustic conditions may skew their feature distribution, so we introduce a regularization constraint to force the network to remember the old distribution in this regard. Our method can easily be generalized to related fields, like speech emotion recognition. We also evaluate our approach across multiple datasets and obtain a significant performance improvement on cross-dataset experiments.
    摘要 When fine-tuning a detection network, our approach adaptively computes the direction of weight modification based on the ratio of genuine utterances and fake utterances. This ensures that the network can effectively detect fake audio on the new dataset while preserving its knowledge of the old model, thus mitigating catastrophic forgetting. In addition, we introduce a regularization constraint to force the network to remember the old distribution of genuine audio in terms of feature distribution, even when faced with new audio from quite different acoustic conditions. Our method can easily be applied to related fields such as speech emotion recognition. We evaluate our approach across multiple datasets and observe a significant improvement in performance on cross-dataset experiments.

Studying Large Language Model Generalization with Influence Functions

  • paper_url: http://arxiv.org/abs/2308.03296
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman
  • for: Understanding and mitigating the risks associated with machine learning models.
  • methods: Influence functions are used to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set?
  • results: The Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation scales influence functions to large language models (LLMs). Experiments show that EK-FAC reaches accuracy similar to traditional influence-function estimators while computing the inverse-Hessian-vector product (IHVP) orders of magnitude faster.
    Abstract When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an inverse-Hessian-vector product (IHVP). We use the Eigenvalue-corrected Kronecker-Factored Approximate Curvature (EK-FAC) approximation to scale influence functions up to LLMs with up to 52 billion parameters. In our experiments, EK-FAC achieves similar accuracy to traditional influence function estimators despite the IHVP computation being orders of magnitude faster. We investigate two algorithmic techniques to reduce the cost of computing gradients of candidate training sequences: TF-IDF filtering and query batching. We use influence functions to investigate the generalization patterns of LLMs, including the sparsity of the influence patterns, increasing abstraction with scale, math and programming abilities, cross-lingual generalization, and role-playing behavior. Despite many apparently sophisticated forms of generalization, we identify a surprising limitation: influences decay to near-zero when the order of key phrases is flipped. Overall, influence functions give us a powerful new tool for studying the generalization properties of LLMs.
    摘要 当尝试更好地了解一个机器学习模型以便理解和避免相关风险时,一个有价值的证据来源是:哪些训练示例最大程度地对模型的行为做出贡献?影响函数的目的是回答一个Counterfactual问题:如果给定的序列添加到训练集中, THEN 模型的参数(以及其输出)如何改变?虽然影响函数已经生成了一些启示,但是它们难以扩展到大型自然语言模型(LLM),因为计算 inverse-Hessian-vector product(IHVP)的困难。我们使用Eigenvalue-corrected Kronecker-Factored Approximate Curvature(EK-FAC)的方法来扩展影响函数到 LLM 中,并在520亿参数下实现了类似的准确率。我们运行了两种算法技术来减少计算候选训练序列的梯度的成本:TF-IDF 筛选和查询批处理。我们使用影响函数来调查大语言模型的泛化模式,包括泛化 Patterns的稀缺性、逐渐增加的抽象级别、数学和编程能力、跨语言泛化和角色扮演行为。尽管有很多复杂的泛化形式,但我们发现一个Surprising limitation:影响的 decay 到 near-zero 当键phrase 的顺序被反转。总之,影响函数为我们研究大语言模型的泛化性质提供了一个强大的新工具。
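For readers who want a concrete picture of the quantity being approximated, the sketch below computes influence scores on a toy logistic-regression model with an explicit damped Hessian. It is only illustrative: the paper's contribution is replacing this dense inverse with the EK-FAC curvature approximation so the computation scales to LLMs, and the damping constant, model, and data here are assumptions of the sketch, not details from the paper.

```python
import numpy as np

def per_example_grads(w, X, y):
    """Per-example gradients of log(1 + exp(-y_i * <x_i, w>)) w.r.t. w."""
    s = -y / (1.0 + np.exp(y * (X @ w)))        # = -y * sigmoid(-y * <x, w>)
    return X * s[:, None]

def damped_hessian(w, X, y, damping=1e-3):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))           # sigmoid(<x, w>); y^2 = 1 drops out
    H = (X.T * (p * (1.0 - p))) @ X / len(X)
    # Damping stands in for the structured curvature approximation used at scale.
    return H + damping * np.eye(len(w))

def influence_scores(w, X_train, y_train, x_test, y_test):
    """influence_i ~ -grad_test^T H^{-1} grad_i: a first-order estimate of how
    adding training example i would change the loss on the test example."""
    g_test = per_example_grads(w, x_test[None, :], np.array([y_test]))[0]
    ihvp = np.linalg.solve(damped_hessian(w, X_train, y_train), g_test)
    return -(per_example_grads(w, X_train, y_train) @ ihvp)
```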

DOMINO: Domain-invariant Hyperdimensional Classification for Multi-Sensor Time Series Data

  • paper_url: http://arxiv.org/abs/2308.03295
  • repo_url: None
  • paper_authors: Junyao Wang, Luke Chen, Mohammad Abdullah Al Faruque
  • for: This work addresses the distribution-shift problem faced by data-driven machine learning on edge devices in the Internet of Things.
  • methods: Brain-inspired hyperdimensional computing (HDC) is used to tackle distribution shift, and a new learning framework named DOMINO is proposed that dynamically identifies and filters out domain-variant dimensions.
  • results: DOMINO achieves on average 2.04% higher accuracy than state-of-the-art DNN-based domain generalization techniques, with 16.34x faster training and 2.89x faster inference. It performs notably better when learning from partially labeled and highly imbalanced data, and provides 10.93x higher robustness against hardware noise.
    Abstract With the rapid evolution of the Internet of Things, many real-world applications utilize heterogeneously connected sensors to capture time-series information. Edge-based machine learning (ML) methodologies are often employed to analyze locally collected data. However, a fundamental issue across data-driven ML approaches is distribution shift. It occurs when a model is deployed on a data distribution different from what it was trained on, and can substantially degrade model performance. Additionally, increasingly sophisticated deep neural networks (DNNs) have been proposed to capture spatial and temporal dependencies in multi-sensor time series data, requiring intensive computational resources beyond the capacity of today's edge devices. While brain-inspired hyperdimensional computing (HDC) has been introduced as a lightweight solution for edge-based learning, existing HDCs are also vulnerable to the distribution shift challenge. In this paper, we propose DOMINO, a novel HDC learning framework addressing the distribution shift problem in noisy multi-sensor time-series data. DOMINO leverages efficient and parallel matrix operations on high-dimensional space to dynamically identify and filter out domain-variant dimensions. Our evaluation on a wide range of multi-sensor time series classification tasks shows that DOMINO achieves on average 2.04% higher accuracy than state-of-the-art (SOTA) DNN-based domain generalization techniques, and delivers 16.34x faster training and 2.89x faster inference. More importantly, DOMINO performs notably better when learning from partially labeled and highly imbalanced data, providing 10.93x higher robustness against hardware noises than SOTA DNNs.
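As a rough illustration of the hyperdimensional-computing setting, the sketch below encodes sensor windows into bipolar hypervectors, bundles class prototypes, and drops "domain-variant" dimensions. The abstract does not specify DOMINO's filtering criterion, so the cross-domain variance rule, the random-projection encoder, and all constants here are assumptions of the sketch rather than the paper's algorithm.

```python
import numpy as np

D = 10_000                                        # hyperdimensional space
rng = np.random.default_rng(0)

def encode(window, projection):
    """Encode a flattened multi-sensor window into a bipolar hypervector via a
    fixed random projection (a common HDC encoding)."""
    return np.sign(projection @ window)

def class_prototypes(H, labels, n_classes):
    """Bundle (sum) the hypervectors of each class into one prototype."""
    return np.stack([H[labels == c].sum(axis=0) for c in range(n_classes)])

def domain_variant_mask(prototypes_per_domain, keep_ratio=0.9):
    """Mark the dimensions whose class prototypes disagree most across domains
    as domain-variant and drop them (an illustrative criterion, not DOMINO's)."""
    P = np.stack(prototypes_per_domain)           # (n_domains, n_classes, D)
    disagreement = P.var(axis=0).mean(axis=0)     # per-dimension cross-domain variance
    return disagreement <= np.quantile(disagreement, keep_ratio)

def classify(h, prototypes, mask):
    """Nearest prototype using only the retained (domain-invariant) dimensions."""
    return int(np.argmax(prototypes[:, mask] @ h[mask]))
```

Here `projection` would be a fixed `(D, n_features)` Gaussian matrix and `prototypes_per_domain` a list of per-domain prototype matrices built from the encoded training windows.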

SynJax: Structured Probability Distributions for JAX

  • paper_url: http://arxiv.org/abs/2308.03291
  • repo_url: https://github.com/deepmind/synjax
  • paper_authors: Miloš Stanojević, Laurent Sartran
  • for: This paper aims to improve how deep learning models handle structured objects such as trees and segmentations.
  • methods: The library provides efficient vectorized implementations of inference algorithms for structured distributions, so that large-scale differentiable models can run on modern hardware accelerators.
  • results: With the SynJax library, large-scale differentiable models that explicitly model structure in the data, such as trees and segmentations, can be built and trained efficiently.
    Abstract The development of deep learning software libraries enabled significant progress in the field by allowing users to focus on modeling, while letting the library to take care of the tedious and time-consuming task of optimizing execution for modern hardware accelerators. However, this has benefited only particular types of deep learning models, such as Transformers, whose primitives map easily to the vectorized computation. The models that explicitly account for structured objects, such as trees and segmentations, did not benefit equally because they require custom algorithms that are difficult to implement in a vectorized form. SynJax directly addresses this problem by providing an efficient vectorized implementation of inference algorithms for structured distributions covering alignment, tagging, segmentation, constituency trees and spanning trees. With SynJax we can build large-scale differentiable models that explicitly model structure in the data. The code is available at https://github.com/deepmind/synjax.
    摘要 深度学习软件库的发展允许用户专注于模型设计,让库负责处理现代硬件加速器的繁琐和耗时任务。然而,这只有某些深度学习模型,如转换器,得到了利用。这些模型的基本 primitives 可以直接映射到 вектор化计算。不同的模型,如树和分割,因为它们需要特定的算法,很难以在 вектор化形式下实现。SynJax 直接解决了这个问题,提供了高效的 вектор化实现方式,用于推理算法,包括对适配、标记、分割、树和 span 的支持。通过 SynJax,我们可以构建大规模的可导 diferenciable 模型,直接模型数据中的结构。代码可以在 GitHub 上找到:https://github.com/deepmind/synjax。
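SynJax's own API is not reproduced here. As an illustration of the kind of inference routine the library vectorizes, the sketch below computes the log-partition function of a linear-chain sequence model with array operations instead of per-state Python loops; the scoring matrices, shapes, and random inputs are assumptions of the example.

```python
import numpy as np
from scipy.special import logsumexp

def chain_log_partition(emissions, transitions):
    """log Z for a linear-chain model.
    emissions:   (T, S) per-position label scores
    transitions: (S, S) score of moving from label i to label j
    Each step is a single vectorized logsumexp over states -- the kind of
    operation a structured-distributions library implements so that it batches
    and differentiates efficiently on accelerators."""
    alpha = emissions[0]                                           # (S,)
    for t in range(1, len(emissions)):
        alpha = logsumexp(alpha[:, None] + transitions, axis=0) + emissions[t]
    return logsumexp(alpha)

rng = np.random.default_rng(0)
log_Z = chain_log_partition(rng.normal(size=(6, 4)), rng.normal(size=(4, 4)))
```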

  • paper_url: http://arxiv.org/abs/2308.03290
  • repo_url: None
  • paper_authors: Jordan Dotzel, Gang Wu, Andrew Li, Muhammad Umar, Yun Ni, Mohamed S. Abdelfattah, Zhiru Zhang, Liqun Cheng, Martin G. Dixon, Norman P. Jouppi, Quoc V. Le, Sheng Li
  • for: This work proposes a one-shot mixed-precision quantization search aimed at producing high-quality models at low cost.
  • methods: The search covers both integer and low-precision floating-point formats and is performed in one shot, eliminating the need for retraining.
  • results: On ImageNet, the method improves the accuracy of ResNet-18 by 1.31 percentage points and of ResNet-50 by 0.90 points at equivalent model cost compared with previous methods. It also improves MobileNetV2 by up to 0.98 points over prior state-of-the-art FP8 models, and when jointly searching the quantization and neural architecture space it improves ImageNet accuracy by 2.69 points at similar model cost.
    Abstract Quantization has become a mainstream compression technique for reducing model size, computational requirements, and energy consumption for modern deep neural networks (DNNs). With the improved numerical support in recent hardware, including multiple variants of integer and floating point, mixed-precision quantization has become necessary to achieve high-quality results with low model cost. Prior mixed-precision quantization methods have performed a post-training quantization search, which compromises on accuracy, or a differentiable quantization search, which leads to high memory usage from branching. Therefore, we propose the first one-shot mixed-precision quantization search that eliminates the need for retraining in both integer and low-precision floating point models. We evaluate our floating-point and integer quantization search (FLIQS) on multiple convolutional networks and vision transformer models to discover Pareto-optimal models. Our approach discovers models that improve upon uniform precision, manual mixed-precision, and recent integer quantization search methods. With the proposed integer quantization search, we increase the accuracy of ResNet-18 on ImageNet by 1.31% points and ResNet-50 by 0.90% points with equivalent model cost over previous methods. Additionally, for the first time, we explore a novel mixed-precision floating-point search and improve MobileNetV2 by up to 0.98% points compared to prior state-of-the-art FP8 models. Finally, we extend FLIQS to simultaneously search a joint quantization and neural architecture space and improve the ImageNet accuracy by 2.69% points with similar model cost on a MobileNetV2 search space.
    摘要 快照量化已成为现代深度神经网络(DNN)的主流压缩技术,以降低模型大小、计算需求和能耗。随着当前硬件的改进 numerical support,包括多种整数和浮点数的多种变体,杂音精度量化已成为实现高质量结果的低成本模型的必要手段。先前的杂音精度量化方法通常会进行训练后量化搜索,这会妥协准确性,或者使用可导量化搜索,这会导致高内存使用率。因此,我们提出了首次一shot杂音精度量化搜索,这将消除重新训练的需要,并在整数和低精度浮点数模型中实现高质量结果。我们对多个卷积神经网络和视Transformer模型进行了评估,并发现了Pareto优质模型。我们的方法可以提高ResNet-18在ImageNet上的准确率by 1.31%点和ResNet-50上的准确率by 0.90%点,与先前方法相当。此外,我们首次探索了一种新的杂音精度浮点数搜索,并提高了MobileNetV2的性能,相比先前的FP8模型。最后,我们将FLIQS扩展到同时搜索一个量化和神经网络架构的空间,并在MobileNetV2搜索空间上提高ImageNet的准确率by 2.69%点,与相同的模型成本相似。
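The sketch below shows symmetric uniform integer quantization and a deliberately naive per-layer bit-width sweep, just to make the mixed-precision search space concrete. The paper's one-shot search optimizes accuracy and cost jointly; the error-only selection rule and the candidate bit-widths here are assumptions of the example, not the paper's method.

```python
import numpy as np

def quantize_int(x, bits):
    """Symmetric uniform integer quantization, fake-quantized back to float."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.abs(x).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def naive_mixed_precision(layers, candidate_bits=(4, 6, 8)):
    """Pick a bit-width per layer by quantization error alone -- a stand-in for
    the accuracy/cost objective a real mixed-precision search optimizes."""
    choice = {}
    for name, w in layers.items():
        errs = {b: np.mean((w - quantize_int(w, b)) ** 2) for b in candidate_bits}
        choice[name] = min(errs, key=errs.get)    # lowest error, ignoring cost
    return choice

layers = {"conv1": np.random.randn(64, 3, 3, 3), "fc": np.random.randn(1000, 512)}
print(naive_mixed_precision(layers))
```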

High-rate discretely-modulated continuous-variable quantum key distribution using quantum machine learning

  • paper_url: http://arxiv.org/abs/2308.03283
  • repo_url: None
  • paper_authors: Qin Liao, Jieyu Liu, Anqi Huang, Lei Huang, Zhuoying Fei, Xiquan Fu
  • for: This work proposes a high-rate scheme for discretely-modulated continuous-variable quantum key distribution (DM CVQKD) to improve the security and efficiency of CVQKD systems.
  • methods: Quantum machine learning techniques divide the CVQKD system into three parts: an initialization part for training and estimating the quantum classifier, a prediction part for generating highly correlated raw keys, and a data-postprocessing part. A low-complexity quantum k-nearest neighbor (QkNN) classifier predicts the lossy discretely-modulated coherent states at Bob's side.
  • results: The proposed QkNN-based CVQKD is analyzed in terms of machine learning metrics and complexity, and its theoretical security is proved with a semi-definite program (SDP). Numerical simulation shows that its secret key rate is explicitly superior to existing DM CVQKD protocols and can be further enhanced by increasing the modulation variance.
    Abstract We propose a high-rate scheme for discretely-modulated continuous-variable quantum key distribution (DM CVQKD) using quantum machine learning technologies, which divides the whole CVQKD system into three parts, i.e., the initialization part that is used for training and estimating quantum classifier, the prediction part that is used for generating highly correlated raw keys, and the data-postprocessing part that generates the final secret key string shared by Alice and Bob. To this end, a low-complexity quantum k-nearest neighbor (QkNN) classifier is designed for predicting the lossy discretely-modulated coherent states (DMCSs) at Bob's side. The performance of the proposed QkNN-based CVQKD especially in terms of machine learning metrics and complexity is analyzed, and its theoretical security is proved by using semi-definite program (SDP) method. Numerical simulation shows that the secret key rate of our proposed scheme is explicitly superior to the existing DM CVQKD protocols, and it can be further enhanced with the increase of modulation variance.
    摘要 我们提出了一种高率方案 для离散Modulated kontinuierliche variable Quantum Key Distribution(DM CVQKD),使用量子机器学习技术,将整个CVQKD系统分成三部分:初始化部分用于训练和估计量子分类器,预测部分用于生成高度相关的Raw密钥,以及数据处理部分用于生成最终由Alice和Bob共享的密钥串。为此,我们设计了一种低复杂度的量子k-最近邻(QkNN)分类器,用于预测Bob方面的失去离散Modulated coherent states(DMCSs)。我们分析了提议的QkNN-based CVQKD的性能,包括机器学习指标和复杂度,并使用半definite Program(SDP)方法证明其理论安全性。numerical simulation表明,我们提议的方案的秘密密钥率明显高于现有的DM CVQKD协议,并可以通过增加模拟幅度进一步提高。

Knowledge Distilled Ensemble Model for sEMG-based Silent Speech Interface

  • paper_url: http://arxiv.org/abs/2308.06533
  • repo_url: None
  • paper_authors: Wenqiang Lai, Qihan Yang, Ye Mao, Endong Sun, Jiangnan Ye
  • for: This paper addresses silent speech interfaces for people affected by voice disorders.
  • methods: A lightweight deep learning knowledge-distilled ensemble model (KDE-SSI) is proposed to overcome the limitations of surface electromyography-based silent speech interfaces (sEMG-based SSIs).
  • results: The model classifies a 26-class NATO phonetic alphabet dataset of 3900 samples, enabling the unambiguous generation of any English word through spelling, and reaches a test accuracy of 85.9%.
    Abstract Voice disorders affect millions of people worldwide. Surface electromyography-based Silent Speech Interfaces (sEMG-based SSIs) have been explored as a potential solution for decades. However, previous works were limited by small vocabularies and manually extracted features from raw data. To address these limitations, we propose a lightweight deep learning knowledge-distilled ensemble model for sEMG-based SSI (KDE-SSI). Our model can classify a 26 NATO phonetic alphabets dataset with 3900 data samples, enabling the unambiguous generation of any English word through spelling. Extensive experiments validate the effectiveness of KDE-SSI, achieving a test accuracy of 85.9\%. Our findings also shed light on an end-to-end system for portable, practical equipment.
    摘要 声音疾病影响全球数百万人。基于表面电 MYography(sEMG)的无声朗读界面(SSI)已经被研究了几十年。然而,之前的工作受到小词汇和从原始数据手动提取的特征所限制。为了解决这些限制,我们提议一种轻量级深度学习知识填充ensemble模型 для sEMG-based SSI(KDE-SSI)。我们的模型可以 классифицировать一个 NATO phonetic alphabet dataset,包含 3900 个数据样本,允许不 ambiguous 地生成任何英语单词 through spelling。广泛的实验证明了 KDE-SSI 的效果,实现了 85.9% 的测试精度。我们的发现还 shed light on 一个端到端系统 for portable, practical equipment.
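The KDE-SSI architecture is not detailed in the abstract; the sketch below simply shows the standard soft-target distillation loss that knowledge-distilled models of this kind typically combine with hard-label cross-entropy, here with an averaged ensemble of teacher logits. The temperature, weighting, and ensemble-averaging choices are assumptions of the sketch.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits_list, labels, T=4.0, alpha=0.5):
    """Soft-target distillation: KL divergence to the (averaged) ensemble teacher,
    plus ordinary cross-entropy to the hard labels."""
    teacher_logits = torch.stack(teacher_logits_list).mean(dim=0)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale gradients for the temperature
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```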

DSformer: A Double Sampling Transformer for Multivariate Time Series Long-term Prediction

  • paper_url: http://arxiv.org/abs/2308.03274
  • repo_url: None
  • paper_authors: Chengqing Yu, Fei Wang, Zezhi Shao, Tao Sun, Lin Wu, Yongjun Xu
  • for: Long-term prediction of multivariate time series, providing references for decision-making.
  • methods: A Double Sampling Transformer (DSformer) is proposed, consisting of a double sampling (DS) block and a temporal variable attention (TVA) block. The DS block uses down sampling and piecewise sampling to transform the original series into feature vectors that focus on global information and local information respectively; the TVA block then applies temporal attention and variable attention to mine these feature vectors along different dimensions and extract key information.
  • results: Experiments show that DSformer outperforms eight baselines on nine real-world datasets.
    Abstract Multivariate time series long-term prediction, which aims to predict the change of data in a long time, can provide references for decision-making. Although transformer-based models have made progress in this field, they usually do not make full use of three features of multivariate time series: global information, local information, and variables correlation. To effectively mine the above three features and establish a high-precision prediction model, we propose a double sampling transformer (DSformer), which consists of the double sampling (DS) block and the temporal variable attention (TVA) block. Firstly, the DS block employs down sampling and piecewise sampling to transform the original series into feature vectors that focus on global information and local information respectively. Then, TVA block uses temporal attention and variable attention to mine these feature vectors from different dimensions and extract key information. Finally, based on a parallel structure, DSformer uses multiple TVA blocks to mine and integrate different features obtained from DS blocks respectively. The integrated feature information is passed to the generative decoder based on a multi-layer perceptron to realize multivariate time series long-term prediction. Experimental results on nine real-world datasets show that DSformer can outperform eight existing baselines.
    摘要 多变量时间序列长期预测,目的是预测数据在长期内的变化,可以提供决策参考。虽然基于转换器模型在这个领域已经取得了进步,但它们通常不充分利用多变量时间序列的三个特征:全局信息、本地信息和变量相关性。为了有效利用这些特征并建立高精度预测模型,我们提议了双重采样变换器(DSformer)。DSformer包括双重采样(DS)块和时间变量注意(TVA)块。首先,DS块使用下采样和分割采样将原始序列转化为特征向量,其中专注于全局信息和本地信息。然后,TVA块使用时间注意和变量注意来挖掘这些特征向量从不同维度,提取关键信息。最后,基于并行结构,DSformer使用多个TVA块来挖掘和集成不同维度的特征信息,并将其传递给基于多层感知机器的生成解码器,实现多变量时间序列长期预测。实验结果表明,DSformer可以在九个真实世界数据集上超过八个基准值。
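The two sampling views of the DS block are easy to make concrete; the sketch below builds them with plain array slicing and reshaping. The attention blocks and the generative decoder that follow are not reproduced, and the choice of the factor k is an assumption of the example.

```python
import numpy as np

def double_sampling(x, k):
    """x: (T, C) multivariate series; returns the two views used by the DS block.
    Down sampling keeps every k-th step (k interleaved subsequences -> global view);
    piecewise sampling splits the series into k contiguous segments (local view)."""
    T, C = x.shape
    T = (T // k) * k                                # trim so T is divisible by k
    x = x[:T]
    down = np.stack([x[i::k] for i in range(k)])    # (k, T//k, C)
    piecewise = x.reshape(k, T // k, C)             # (k, T//k, C)
    return down, piecewise

x = np.random.randn(96, 7)                          # e.g. 96 steps, 7 variables
global_view, local_view = double_sampling(x, k=4)
```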

Local Structure-aware Graph Contrastive Representation Learning

  • paper_url: http://arxiv.org/abs/2308.03271
  • repo_url: None
  • paper_authors: Kai Yang, Yuan Liu, Zijuan Zhao, Peijin Ding, Wenqian Zhao
  • for: This work proposes Local Structure-aware Graph Contrastive representation Learning (LS-GCL) to model the structural information of nodes from multiple views.
  • methods: Semantic subgraphs that are not limited to first-order neighbors are constructed for each target node. A shared GNN encoder produces the target node embeddings at the subgraph level, and a pooling function then generates subgraph-level graph embeddings; the same encoder is also applied to the original graph for the global view.
  • results: Experiments on five datasets show that LS-GCL outperforms state-of-the-art graph representation learning methods on both node classification and link prediction.
    Abstract Traditional Graph Neural Network (GNN), as a graph representation learning method, is constrained by label information. However, Graph Contrastive Learning (GCL) methods, which tackle the label problem effectively, mainly focus on the feature information of the global graph or small subgraph structure (e.g., the first-order neighborhood). In the paper, we propose a Local Structure-aware Graph Contrastive representation Learning method (LS-GCL) to model the structural information of nodes from multiple views. Specifically, we construct the semantic subgraphs that are not limited to the first-order neighbors. For the local view, the semantic subgraph of each target node is input into a shared GNN encoder to obtain the target node embeddings at the subgraph-level. Then, we use a pooling function to generate the subgraph-level graph embeddings. For the global view, considering the original graph preserves indispensable semantic information of nodes, we leverage the shared GNN encoder to learn the target node embeddings at the global graph-level. The proposed LS-GCL model is optimized to maximize the common information among similar instances at three various perspectives through a multi-level contrastive loss function. Experimental results on five datasets illustrate that our method outperforms state-of-the-art graph representation learning approaches for both node classification and link prediction tasks.
    摘要 传统的图形神经网络(GNN)在图像学习中受标签信息的限制。然而,图像对比学习(GCL)方法,它们可以有效地解决标签问题,主要集中在全图或小部分图结构(例如,第一邻居)的特征信息上。在文章中,我们提出了一种本地结构意识 graph contrastive representation learning方法(LS-GCL),用于模型节点的多视图结构信息。具体来说,我们构建了不限于第一邻居的语义子图。对于本地视图,每个目标节点的语义子图输入到共享GNNEncoder中,以获得目标节点的子图级别嵌入。然后,我们使用一种池化函数生成子图级别图像嵌入。对于全球视图,因为原始图保留了节点的不可或缺semantic信息,我们利用共享GNNEncoder来学习目标节点的全图级别嵌入。我们提出的LS-GCL模型通过最大化三个不同视点上相似实例的共同信息来优化多级对比损失函数。实验结果表明,我们的方法在五个 dataset上较前state-of-the-art图形学习方法出色地进行节点类别和连接预测任务。

Simple Rule Injection for ComplEx Embeddings

  • paper_url: http://arxiv.org/abs/2308.03269
  • repo_url: None
  • paper_authors: Haodi Ma, Anthony Colas, Yuejie Wang, Ali Sadeghian, Daisy Zhe Wang
  • for: This work aims to combine logic rules with knowledge graph embeddings to improve knowledge graph reasoning.
  • methods: A mechanism named InjEx injects multiple types of rules through simple constraints, capturing definite Horn rules.
  • results: Experiments show that InjEx infuses interpretable prior knowledge into the embedding space and outperforms both baseline knowledge graph completion (KGC) models and specialized few-shot KGC (FKGC) models, while remaining scalable and efficient.
    Abstract Recent works in neural knowledge graph inference attempt to combine logic rules with knowledge graph embeddings to benefit from prior knowledge. However, they usually cannot avoid rule grounding, and injecting a diverse set of rules has still not been thoroughly explored. In this work, we propose InjEx, a mechanism to inject multiple types of rules through simple constraints, which capture definite Horn rules. To start, we theoretically prove that InjEx can inject such rules. Next, to demonstrate that InjEx infuses interpretable prior knowledge into the embedding space, we evaluate InjEx on both the knowledge graph completion (KGC) and few-shot knowledge graph completion (FKGC) settings. Our experimental results reveal that InjEx outperforms both baseline KGC models as well as specialized few-shot models while maintaining its scalability and efficiency.
    摘要 近期研究在神经知识图推理中尝试将逻辑规则与知识图嵌入结合以获得优势,但通常无法避免规则固化,并尚未全面探讨多种规则的混合。在这种工作中,我们提出了InjEx机制,可以通过简单的约束来注入多种类型的规则,这些规则捕捉了幂等规则。首先,我们理论上证明了InjEx可以注入这些规则。然后,我们通过在知识图完成(KGC)和少量知识图完成(FKGC)设置中评估InjEx,发现InjEx可以充分吸收明确的先验知识,并在缺少数据时保持高效性和扩展性。

Exploring Different Time-series-Transformer (TST) Architectures: A Case Study in Battery Life Prediction for Electric Vehicles (EVs)

  • paper_url: http://arxiv.org/abs/2308.03260
  • repo_url: None
  • paper_authors: Niranjan Sitapure, Atharva Kulkarni
  • for: The paper aims to develop accurate battery life prediction models for electric vehicles (EVs) using a data-driven approach and novel transformer-based architectures.
  • methods: The paper uses time-series-transformers (TSTs) and long short-term memory (LSTM) models to predict battery state-of-charge (SOC) and temperature in EVs, incorporating environmental, battery, vehicle driving, and heating circuit data.
  • results: The paper explores and compares novel TST architectures, including encoder TST + decoder LSTM and a hybrid TST-LSTM, to create accurate battery life prediction models for EVs.
    Abstract In recent years, battery technology for electric vehicles (EVs) has been a major focus, with a significant emphasis on developing new battery materials and chemistries. However, accurately predicting key battery parameters, such as state-of-charge (SOC) and temperature, remains a challenge for constructing advanced battery management systems (BMS). Existing battery models do not comprehensively cover all parameters affecting battery performance, including non-battery-related factors like ambient temperature, cabin temperature, elevation, and regenerative braking during EV operation. Due to the difficulty of incorporating these auxiliary parameters into traditional models, a data-driven approach is suggested. Time-series-transformers (TSTs), leveraging multiheaded attention and parallelization-friendly architecture, are explored alongside LSTM models. Novel TST architectures, including encoder TST + decoder LSTM and a hybrid TST-LSTM, are also developed and compared against existing models. A dataset comprising 72 driving trips in a BMW i3 (60 Ah) is used to address battery life prediction in EVs, aiming to create accurate TST models that incorporate environmental, battery, vehicle driving, and heating circuit data to predict SOC and battery temperature for future time steps.
    摘要 近年来,电动汽车(EV)的电池技术受到了重点关注,开发新的电池材料和化学组合也受到了重视。然而,正确预测电池参数,如充电状态(SOC)和温度,仍然是构建高级电池管理系统(BMS)的挑战。现有的电池模型没有完全覆盖所有影响电池性能的参数,包括非电池相关因素,如外部温度、车辆温度、海拔和在EV运行时的回生制动。由于将这些辅助参数 incorporated into traditional models 是困难的,一种数据驱动的方法被建议。时序列转换器(TST),利用多头注意力和并行化友好的架构,与LSTM模型一起被探讨。 Novel TST架构,包括编码TST+解码LSTM和混合TST-LSTM,也被开发并与现有模型进行比较。使用了72次开放驱动记录(BMW i3,60 Ah),用于预测EV电池寿命,目标是创建准确的TST模型, incorporating environmental, battery, vehicle driving, and heating circuit data to predict SOC and battery temperature for future time steps。

Optimal Approximation and Learning Rates for Deep Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2308.03259
  • repo_url: None
  • paper_authors: Shao-Bo Lin
  • for: This paper analyzes the approximation and learning performance of deep convolutional neural networks.
  • methods: Deep convolutional neural networks with zero-padding and max-pooling are studied.
  • results: For approximating $r$-smooth functions, deep convolutional networks of depth $L$ are shown to achieve an approximation rate of order $(L^2/\log L)^{-2r/d}$, which is optimal up to a logarithmic factor; almost optimal learning rates for empirical risk minimization over such networks are also derived.
    Abstract This paper focuses on approximation and learning performance analysis for deep convolutional neural networks with zero-padding and max-pooling. We prove that, to approximate $r$-smooth function, the approximation rates of deep convolutional neural networks with depth $L$ are of order $ (L^2/\log L)^{-2r/d} $, which is optimal up to a logarithmic factor. Furthermore, we deduce almost optimal learning rates for implementing empirical risk minimization over deep convolutional neural networks.
    摘要 这篇论文关注深度卷积神经网络(CNN)的近似和学习性能分析,包括零填充和最大池化。我们证明,用于近似 $r$-光滑函数的深度 $L$ 的 CNN 的近似率是 $(L^2/\log L)^{-2r/d}$,即最优到对数因子。此外,我们还得出了对深度 CNN 进行经验风险最小化的几乎最优学习速率。

Unsupervised Adversarial Detection without Extra Model: Training Loss Should Change

  • paper_url: http://arxiv.org/abs/2308.03243
  • repo_url: https://github.com/cyclebooster/unsupervised-adversarial-detection-without-extra-model
  • paper_authors: Chien Cheng Chyou, Hung-Ting Su, Winston H. Hsu
  • for: Improving the reliability of deep learning models against adversarial attacks.
  • methods: New training losses that reduce useless features are proposed, together with a detection method that requires no prior knowledge of the attack type.
  • results: The detection rate (true positive rate) is above 93.9% for all tested white-box attacks except unbounded ones (DF($\infty$)), while the false positive rate is barely 2.5%, with good performance across all attack types.
    Abstract Adversarial robustness poses a critical challenge in the deployment of deep learning models for real-world applications. Traditional approaches to adversarial training and supervised detection rely on prior knowledge of attack types and access to labeled training data, which is often impractical. Existing unsupervised adversarial detection methods identify whether the target model works properly, but they suffer from bad accuracies owing to the use of common cross-entropy training loss, which relies on unnecessary features and strengthens adversarial attacks. We propose new training losses to reduce useless features and the corresponding detection method without prior knowledge of adversarial attacks. The detection rate (true positive rate) against all given white-box attacks is above 93.9% except for attacks without limits (DF($\infty$)), while the false positive rate is barely 2.5%. The proposed method works well in all tested attack types and the false positive rates are even better than the methods good at certain types.
    摘要 深度学习模型在实际应用中面临严重的敌对Robustness挑战。传统的敌对训练和监督检测方法需要严格的优化目标和训练数据,这并不现实。现有的无监督敌对检测方法可以判断目标模型是否正常工作,但它们因使用共同克服极值损失函数,导致检测精度差。我们提出了新的训练损失函数,以减少无用的特征,并提出了不需要先知 adversarial 攻击的检测方法。对于所有给出的白盒攻击,检测率(真正阳性率)高于93.9%,只有不受限制的攻击(DF($\infty$))的检测率较低, False Positive率只有2.5%。我们的方法在所有攻击类型上都很好,而且对于某些类型的检测率更高。

Asynchronous Decentralized Q-Learning: Two Timescale Analysis By Persistence

  • paper_url: http://arxiv.org/abs/2308.03239
  • repo_url: None
  • paper_authors: Bora Yongacoglu, Gürdal Arslan, Serdar Yüksel
  • for: This paper addresses the non-stationarity challenge in multi-agent reinforcement learning (MARL), where agents update their behaviour as they learn.
  • methods: An asynchronous variant of the decentralized Q-learning algorithm, a recent MARL algorithm for stochastic games, is studied, and sufficient conditions are given under which the asynchronous algorithm drives play to equilibrium with high probability.
  • results: Constant learning rates in the Q-factor update are shown to be critical for relaxing the synchrony assumptions of earlier work; the analysis also applies to asynchronous generalizations of a number of other algorithms from the regret testing tradition.
    Abstract Non-stationarity is a fundamental challenge in multi-agent reinforcement learning (MARL), where agents update their behaviour as they learn. Many theoretical advances in MARL avoid the challenge of non-stationarity by coordinating the policy updates of agents in various ways, including synchronizing times at which agents are allowed to revise their policies. Synchronization enables analysis of many MARL algorithms via multi-timescale methods, but such synchrony is infeasible in many decentralized applications. In this paper, we study an asynchronous variant of the decentralized Q-learning algorithm, a recent MARL algorithm for stochastic games. We provide sufficient conditions under which the asynchronous algorithm drives play to equilibrium with high probability. Our solution utilizes constant learning rates in the Q-factor update, which we show to be critical for relaxing the synchrony assumptions of earlier work. Our analysis also applies to asynchronous generalizations of a number of other algorithms from the regret testing tradition, whose performance is analyzed by multi-timescale methods that study Markov chains obtained via policy update dynamics. This work extends the applicability of the decentralized Q-learning algorithm and its relatives to settings in which parameters are selected in an independent manner, and tames non-stationarity without imposing the coordination assumptions of prior work.

AdaER: An Adaptive Experience Replay Approach for Continual Lifelong Learning

  • paper_url: http://arxiv.org/abs/2308.03810
  • repo_url: None
  • paper_authors: Xingyu Li, Bo Tang, Haifeng Li
  • for: This paper addresses continual lifelong learning, in which a learner must keep acquiring new knowledge from a non-stationary data stream without rapidly forgetting what it has already learned.
  • methods: A new algorithm, adaptive experience replay (AdaER), is proposed with two stages: memory replay and memory update. The replay stage uses a contextually-cued memory recall (C-CMR) strategy that selectively replays the memories most conflicting with the current input in terms of both data and task, and an entropy-balanced reservoir sampling (E-BRS) strategy improves the memory buffer by maximizing information entropy.
  • results: Experiments on established supervised continual lifelong learning benchmarks, focused on class-incremental scenarios, show that AdaER outperforms existing baselines, mitigating catastrophic forgetting and improving learning performance.
    Abstract Continual lifelong learning is an machine learning framework inspired by human learning, where learners are trained to continuously acquire new knowledge in a sequential manner. However, the non-stationary nature of streaming training data poses a significant challenge known as catastrophic forgetting, which refers to the rapid forgetting of previously learned knowledge when new tasks are introduced. While some approaches, such as experience replay (ER), have been proposed to mitigate this issue, their performance remains limited, particularly in the class-incremental scenario which is considered natural and highly challenging. In this paper, we present a novel algorithm, called adaptive-experience replay (AdaER), to address the challenge of continual lifelong learning. AdaER consists of two stages: memory replay and memory update. In the memory replay stage, AdaER introduces a contextually-cued memory recall (C-CMR) strategy, which selectively replays memories that are most conflicting with the current input data in terms of both data and task. Additionally, AdaER incorporates an entropy-balanced reservoir sampling (E-BRS) strategy to enhance the performance of the memory buffer by maximizing information entropy. To evaluate the effectiveness of AdaER, we conduct experiments on established supervised continual lifelong learning benchmarks, specifically focusing on class-incremental learning scenarios. The results demonstrate that AdaER outperforms existing continual lifelong learning baselines, highlighting its efficacy in mitigating catastrophic forgetting and improving learning performance.
    摘要 AdaER包括两个阶段:记忆回顾和记忆更新。在记忆回顾阶段,AdaER使用了Contextually-cued Memory Recall(C-CMR)策略,选择ively回顾与当前输入数据和任务相关的记忆。此外,AdaER还将Entropy-balanced Reservoir Sampling(E-BRS)策略添加到记忆缓存中,以提高记忆缓存的性能,并 Maximizing information entropy。为了评估AdaER的有效性,我们对已有的超vised continual lifelong learning Benchmark进行实验,特别是对维度增量学习scenario。结果显示,AdaER在减少忘却和提高学习性能方面表现出色,较以前的持续性学习基eline高效。
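To make the buffer-maintenance side concrete, the sketch below implements a classic reservoir-sampled replay memory and approximates the entropy-balancing idea by evicting an item of the currently over-represented class. The abstract does not spell out E-BRS, so this eviction rule and the buffer interface are assumptions of the sketch.

```python
import random
from collections import Counter

class ReplayBuffer:
    """Reservoir-sampled replay memory. AdaER's entropy-balanced variant prefers
    keeping a high-entropy class distribution; here that is approximated by
    evicting a random item of the current majority class."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []          # list of (example, label)
        self.seen = 0

    def add(self, example, label):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append((example, label))
        elif random.random() < self.capacity / self.seen:   # classic reservoir rule
            counts = Counter(lbl for _, lbl in self.items)
            majority = counts.most_common(1)[0][0]
            victims = [i for i, (_, lbl) in enumerate(self.items) if lbl == majority]
            self.items[random.choice(victims)] = (example, label)
```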

G-Mix: A Generalized Mixup Learning Framework Towards Flat Minima

  • paper_url: http://arxiv.org/abs/2308.03236
  • repo_url: None
  • paper_authors: Xingyu Li, Bo Tang
  • for: Improving the generalization ability of deep neural networks (DNNs), particularly when training data is limited.
  • methods: A Generalized-Mixup (G-Mix) learning framework is proposed that combines the strengths of Mixup and Sharpness-Aware Minimization (SAM).
  • results: Two further algorithms, Binary G-Mix and Decomposed G-Mix, partition the training data by the sharpness-sensitivity of each example to address manifold intrusion in Mixup; both theory and experiments show improved generalization across multiple datasets and models, achieving state-of-the-art performance.
    Abstract Deep neural networks (DNNs) have demonstrated promising results in various complex tasks. However, current DNNs encounter challenges with over-parameterization, especially when there is limited training data available. To enhance the generalization capability of DNNs, the Mixup technique has gained popularity. Nevertheless, it still produces suboptimal outcomes. Inspired by the successful Sharpness-Aware Minimization (SAM) approach, which establishes a connection between the sharpness of the training loss landscape and model generalization, we propose a new learning framework called Generalized-Mixup, which combines the strengths of Mixup and SAM for training DNN models. The theoretical analysis provided demonstrates how the developed G-Mix framework enhances generalization. Additionally, to further optimize DNN performance with the G-Mix framework, we introduce two novel algorithms: Binary G-Mix and Decomposed G-Mix. These algorithms partition the training data into two subsets based on the sharpness-sensitivity of each example to address the issue of "manifold intrusion" in Mixup. Both theoretical explanations and experimental results reveal that the proposed BG-Mix and DG-Mix algorithms further enhance model generalization across multiple datasets and models, achieving state-of-the-art performance.
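The exact G-Mix, BG-Mix, and DG-Mix procedures are not given in the abstract; below is a hedged sketch of a single training step that simply composes the two ingredients the framework builds on: Mixup-style input/label interpolation followed by a SAM-style two-pass update. The Beta parameter, perturbation radius `rho`, and soft-label cross-entropy form are assumptions of the sketch, not the paper's algorithm.

```python
import torch
import torch.nn.functional as F

def gmix_step(model, optimizer, x, y, num_classes, alpha=0.4, rho=0.05):
    """One illustrative step: mix the batch, then ascend to a nearby point in the
    gradient direction (SAM perturbation) and descend with the gradient computed there."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_onehot = F.one_hot(y, num_classes).float()
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]

    def mixed_loss():
        logits = model(x_mix)
        return -(y_mix * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

    # First pass: gradient at the current weights.
    optimizer.zero_grad()
    mixed_loss().backward()
    grads = [p.grad.detach().clone() if p.grad is not None else None
             for p in model.parameters()]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads if g is not None))

    # Ascent step of size rho along the normalized gradient.
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            if g is not None:
                p.add_(rho * g / (norm + 1e-12))

    # Second pass: gradient at the perturbed weights, then undo the perturbation.
    optimizer.zero_grad()
    mixed_loss().backward()
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            if g is not None:
                p.sub_(rho * g / (norm + 1e-12))
    optimizer.step()
```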

Analysis of the Evolution of Advanced Transformer-Based Language Models: Experiments on Opinion Mining

  • paper_url: http://arxiv.org/abs/2308.03235
  • repo_url: https://github.com/zekaouinoureddine/Opinion-Transformers
  • paper_authors: Nour Eddine Zekaoui, Siham Yousfi, Maryem Rhanoui, Mounia Mikram
  • for: This study examines how transformer-based language models perform on opinion mining and compares them, to guide production engineers and inform future research.
  • methods: State-of-the-art transformer-based language models are applied to opinion mining tasks and compared at a high level to highlight their key particularities.
  • results: Transformer-based language models perform strongly on opinion mining, benefiting from the attention mechanism and parallel computation that make them faster and more accurate than recurrent architectures on long text.
    Abstract Opinion mining, also known as sentiment analysis, is a subfield of natural language processing (NLP) that focuses on identifying and extracting subjective information in textual material. This can include determining the overall sentiment of a piece of text (e.g., positive or negative), as well as identifying specific emotions or opinions expressed in the text, that involves the use of advanced machine and deep learning techniques. Recently, transformer-based language models make this task of human emotion analysis intuitive, thanks to the attention mechanism and parallel computation. These advantages make such models very powerful on linguistic tasks, unlike recurrent neural networks that spend a lot of time on sequential processing, making them prone to fail when it comes to processing long text. The scope of our paper aims to study the behaviour of the cutting-edge Transformer-based language models on opinion mining and provide a high-level comparison between them to highlight their key particularities. Additionally, our comparative study shows leads and paves the way for production engineers regarding the approach to focus on and is useful for researchers as it provides guidelines for future research subjects.

Imbalanced Large Graph Learning Framework for FPGA Logic Elements Packing Prediction

  • paper_url: http://arxiv.org/abs/2308.03231
  • repo_url: None
  • paper_authors: Zhixiong Di, Runzhe Tao, Lin Chen, Qiang Wu, Yibo Lin
  • for: Predicting whether FPGA logic elements will be packed after placement, in order to guide design optimization and expedite design closure.
  • methods: A large-graph learning framework, ImLG, is proposed with dedicated feature extraction and feature aggregation methods that enhance node representation learning on circuit graphs; graph oversampling and mini-batch training handle the imbalanced distribution of packed and unpacked logic elements.
  • results: The framework improves the F1 score by 42.82% over the most recent Gaussian-based prediction method, and physical design results show it helps the placer improve routed wirelength by 0.93% and SLICE occupation by 0.89%.
    Abstract Packing is a required step in a typical FPGA CAD flow. It has high impacts to the performance of FPGA placement and routing. Early prediction of packing results can guide design optimization and expedite design closure. In this work, we propose an imbalanced large graph learning framework, ImLG, for prediction of whether logic elements will be packed after placement. Specifically, we propose dedicated feature extraction and feature aggregation methods to enhance the node representation learning of circuit graphs. With imbalanced distribution of packed and unpacked logic elements, we further propose techniques such as graph oversampling and mini-batch training for this imbalanced learning task in large circuit graphs. Experimental results demonstrate that our framework can improve the F1 score by 42.82% compared to the most recent Gaussian-based prediction method. Physical design results show that the proposed method can assist the placer in improving routed wirelength by 0.93% and SLICE occupation by 0.89%.
    摘要 Packing 是FPGA CAD流程中的一个必需步骤,它对FPGA的地点和路径产生了高度的影响。 early prediction of packing results can guide design optimization and expedite design closure. 在这种工作中,我们提出了一种大图学习框架,ImLG,用于预测逻辑元素是否将被packed After placement。specifically,我们提出了专门的特征提取和特征聚合方法,以增强环 Graph的节点表示学习。 With imbalanced distribution of packed and unpacked logic elements, we further propose techniques such as graph oversampling and mini-batch training for this imbalanced learning task in large circuit graphs. Experimental results demonstrate that our framework can improve the F1 score by 42.82% compared to the most recent Gaussian-based prediction method. Physical design results show that the proposed method can assist the placer in improving routed wirelength by 0.93% and SLICE occupation by 0.89%.

Tractability of approximation by general shallow networks

  • paper_url: http://arxiv.org/abs/2308.03230
  • repo_url: None
  • paper_authors: Hrushikesh Mhaskar, Tong Mao
  • for: This paper presents sharper bounds for approximating functions of the form $x\mapsto\int_{\mathbb{Y}} G(x, y)\,d\tau(y)$, $x\in\mathbb{X}$, where $\mathbb{X}$ and $\mathbb{Y}$ are compact metric spaces.
  • methods: Approximation by $G$-networks of the form $x\mapsto \sum_{k=1}^n a_k G(x, y_k)$, with $y_1,\cdots, y_n\in\mathbb{Y}$ and $a_1,\cdots, a_n\in\mathbb{R}$.
  • results: Dimension-independent bounds on the degree of approximation in terms of $n$ are obtained, with all constants depending at most polynomially on the dimensions of $\mathbb{X}$ and $\mathbb{Y}$ defined via covering numbers. Applications include power rectified linear unit networks, zonal function networks, certain radial basis function networks, and function extension to higher-dimensional spaces.
    Abstract In this paper, we present a sharper version of the results in the paper Dimension independent bounds for general shallow networks; Neural Networks, \textbf{123} (2020), 142-152. Let $\mathbb{X}$ and $\mathbb{Y}$ be compact metric spaces. We consider approximation of functions of the form $x\mapsto\int_{\mathbb{Y}} G(x, y)\,d\tau(y)$, $x\in\mathbb{X}$, by $G$-networks of the form $x\mapsto \sum_{k=1}^n a_kG(x, y_k)$, $y_1,\cdots, y_n\in\mathbb{Y}$, $a_1,\cdots, a_n\in\mathbb{R}$. Defining the dimensions of $\mathbb{X}$ and $\mathbb{Y}$ in terms of covering numbers, we obtain dimension independent bounds on the degree of approximation in terms of $n$, where also the constants involved are all dependent at most polynomially on the dimensions. Applications include approximation by power rectified linear unit networks, zonal function networks, certain radial basis function networks as well as the important problem of function extension to higher dimensional spaces.
    摘要 在这篇论文中,我们提出了一个更为精细的版本的result,来自文章《独立维度 bounds for general shallow networks》;Neural Networks, \textbf{123} (2020), 142-152。假设 $\mathbb{X}$ 和 $\mathbb{Y}$ 是两个 компакт度度量空间。我们考虑了将函数 $x\mapsto\int_{\mathbb{Y} G( x, y)d\tau( y)$, $x\in\mathbb{X}$, 近似为 $G$-网络的形式 $x\mapsto \sum_{k=1}^n a_kG( x, y_k)$, $y_1,\cdots, y_n\in\mathbb{Y}$, $a_1,\cdots, a_n\in\mathbb{R}$.我们使用covering数量来定义 $\mathbb{X}$ 和 $\mathbb{Y}$ 的维度,得到了独立于维度的度量约束,其中涉及的常数都是仅仅受到 polynomial 幂级的影响。应用包括power rectified linear unit网络、zonal function网络、certain radial basis function网络以及高维空间中函数扩展的重要问题。
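A tiny numpy illustration of the approximant studied above, $\sum_{k=1}^n a_k G(x, y_k)$, fitting the coefficients $a_k$ by least squares with a Gaussian kernel playing the role of $G$. The kernel, sample sizes, and target function are assumptions of the example, and the sketch says nothing about how the error scales with $n$, which is the subject of the paper's bounds.

```python
import numpy as np

def fit_G_network(X, f_vals, centers, G):
    """Fit coefficients a_k so that sum_k a_k G(x, y_k) ~ f(x) in least squares.
    X: (m, d) sample points, centers: (n, d) the y_k, G: kernel function G(x, y)."""
    Phi = np.array([[G(x, y) for y in centers] for x in X])   # (m, n) design matrix
    a, *_ = np.linalg.lstsq(Phi, f_vals, rcond=None)
    return a

gaussian = lambda x, y: np.exp(-np.sum((x - y) ** 2))

# Usage: approximate f(x) = sin(3 x_0) on [0, 1]^2 with n = 20 centers.
rng = np.random.default_rng(0)
X = rng.random((200, 2))
centers = rng.random((20, 2))
a = fit_G_network(X, np.sin(3 * X[:, 0]), centers, gaussian)
```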

Why Linguistics Will Thrive in the 21st Century: A Reply to Piantadosi (2023)

  • paper_url: http://arxiv.org/abs/2308.03228
  • repo_url: None
  • paper_authors: Jordan Kodner, Sarah Payne, Jeffrey Heinz
  • for: This paper critiques Piantadosi's (2023) claim that modern language models refute Chomsky's approach to language, focusing on four main points.
  • methods: The impressive performance and utility of large language models (LLMs) are weighed against what a scientific account of language and its acquisition must explain.
  • results: The paper concludes that LLMs show little promise of solving the central mystery of language learning, namely how young children become competent, fluent speakers after exposure to orders of magnitude less data, and that LLMs cannot constitute scientific theories of language because such theories must provide interpretable explanations rather than just predictions.
    Abstract We present a critical assessment of Piantadosi's (2023) claim that "Modern language models refute Chomsky's approach to language," focusing on four main points. First, despite the impressive performance and utility of large language models (LLMs), humans achieve their capacity for language after exposure to several orders of magnitude less data. The fact that young children become competent, fluent speakers of their native languages with relatively little exposure to them is the central mystery of language learning to which Chomsky initially drew attention, and LLMs currently show little promise of solving this mystery. Second, what can the artificial reveal about the natural? Put simply, the implications of LLMs for our understanding of the cognitive structures and mechanisms underlying language and its acquisition are like the implications of airplanes for understanding how birds fly. Third, LLMs cannot constitute scientific theories of language for several reasons, not least of which is that scientific theories must provide interpretable explanations, not just predictions. This leads to our final point: to even determine whether the linguistic and cognitive capabilities of LLMs rival those of humans requires explicating what humans' capacities actually are. In other words, it requires a separate theory of language and cognition; generative linguistics provides precisely such a theory. As such, we conclude that generative linguistics as a scientific discipline will remain indispensable throughout the 21st century and beyond.
    摘要 我们提供了对Piantadosi(2023)的批判性评估,关注四个主要点。首先,虽然大型语言模型(LLM)具有印象力和实用性,但人类通过相对许多更少的数据获得语言能力是语言学习的中心谜题,Piantadosi最初吸引到了注意。LLMs current show little promise of solving this mystery. Second, what can the artificial reveal about the natural? In other words, the implications of LLMs for our understanding of the cognitive structures and mechanisms underlying language and its acquisition are like the implications of airplanes for understanding how birds fly. Third, LLMs cannot constitute scientific theories of language for several reasons, not least of which is that scientific theories must provide interpretable explanations, not just predictions. This leads to our final point: to even determine whether the linguistic and cognitive capabilities of LLMs rival those of humans requires explicating what humans' capacities actually are. In other words, it requires a separate theory of language and cognition; generative linguistics provides precisely such a theory. As such, we conclude that generative linguistics as a scientific discipline will remain indispensable throughout the 21st century and beyond.

Local Consensus Enhanced Siamese Network with Reciprocal Loss for Two-view Correspondence Learning

  • paper_url: http://arxiv.org/abs/2308.03217
  • repo_url: None
  • paper_authors: Linbo Wang, Jing Wu, Xianyong Fang, Zhengyi Liu, Chenjie Cao, Yanwei Fu
  • for: Improving the accuracy and robustness of two-view correspondence learning frameworks.
  • methods: A Local Feature Consensus (LFC) plugin block is proposed to augment the features of existing models, and existing models are extended to a Siamese network with a reciprocal loss that exploits the supervision of mutual projection between the two views.
  • results: Built upon MSA-Net, the two proposals achieve state-of-the-art performance on benchmark datasets.
    Abstract Recent studies of two-view correspondence learning usually establish an end-to-end network to jointly predict correspondence reliability and relative pose. We improve such a framework from two aspects. First, we propose a Local Feature Consensus (LFC) plugin block to augment the features of existing models. Given a correspondence feature, the block augments its neighboring features with mutual neighborhood consensus and aggregates them to produce an enhanced feature. As inliers obey a uniform cross-view transformation and share more consistent learned features than outliers, feature consensus strengthens inlier correlation and suppresses outlier distraction, which makes output features more discriminative for classifying inliers/outliers. Second, existing approaches supervise network training with the ground truth correspondences and essential matrix projecting one image to the other for an input image pair, without considering the information from the reverse mapping. We extend existing models to a Siamese network with a reciprocal loss that exploits the supervision of mutual projection, which considerably promotes the matching performance without introducing additional model parameters. Building upon MSA-Net, we implement the two proposals and experimentally achieve state-of-the-art performance on benchmark datasets.
    摘要 最近的两视匹配学习研究通常建立一个端到端网络,同时预测匹配可靠性和相对pose。我们从两个方面提高了这种框架:首先,我们提出了一个本地特征共识(LFC)插件块,用于增强现有模型的特征。给一个匹配特征,该块将其周围的特征与相互邻域共识,并将它们积累到生成一个强化特征。由于匹配点遵循均匀的双视变换,并且分享更一致的学习特征,因此特征共识强化匹配点之间的相互关系,降低干扰物的影响,使输出特征更有特征性,以便将匹配点分类为匹配/干扰物。其次,现有方法在网络训练时使用真实匹配和 Essential matrix projecting一个图像到另一个图像,而不考虑反向映射的信息。我们将现有模型扩展为siamese网络,使用相互抽象的损失函数,以便利用反向映射的supervision,大幅提高匹配性能,而无需添加更多的模型参数。基于MSA-Net,我们实现了两个提议,并在benchmark datasets上实验ally achieve state-of-the-art performance。

The Effect of SGD Batch Size on Autoencoder Learning: Sparsity, Sharpness, and Feature Learning

  • paper_url: http://arxiv.org/abs/2308.03215
  • repo_url: None
  • paper_authors: Nikhil Ghosh, Spencer Frei, Wooseok Ha, Bin Yu
  • for: 这个论文研究了使用梯度下降法(SGD)训练单神经 autoencoder 的动态性,并研究了不同批处理大小对于非对称问题的影响。
  • methods: 该论文使用了随机初始化的 SGD 算法,并研究了不同批处理大小对于解的影响。
  • results: 研究发现,无论批处理大小,SGD 都可以成功找到全球最小值,但是特定的全球最小值取决于批处理大小。在全部批处理情况下,解是 dense 的(即不含杂的),并且与初始向量高度相似,表明在这种情况下,Feature learning 不太多。相反,任何小于样本数的批处理大小都可以找到一个 sparse 的全球最小值,这种 “特征选择” 是由梯度randomness 引起的。此外,我们还发现,使用全部批处理的SGD 找到的最小值比使用更小的批处理大小找到的最小值更平滑(即距离初始向量更远),这与之前的研究不同。Note: “梯度randomness” is a term used to describe the randomness of the gradients in the stochastic gradient descent algorithm.
    Abstract In this work, we investigate the dynamics of stochastic gradient descent (SGD) when training a single-neuron autoencoder with linear or ReLU activation on orthogonal data. We show that for this non-convex problem, randomly initialized SGD with a constant step size successfully finds a global minimum for any batch size choice. However, the particular global minimum found depends upon the batch size. In the full-batch setting, we show that the solution is dense (i.e., not sparse) and is highly aligned with its initialized direction, showing that relatively little feature learning occurs. On the other hand, for any batch size strictly smaller than the number of samples, SGD finds a global minimum which is sparse and nearly orthogonal to its initialization, showing that the randomness of stochastic gradients induces a qualitatively different type of "feature selection" in this setting. Moreover, if we measure the sharpness of the minimum by the trace of the Hessian, the minima found with full batch gradient descent are flatter than those found with strictly smaller batch sizes, in contrast to previous works which suggest that large batches lead to sharper minima. To prove convergence of SGD with a constant step size, we introduce a powerful tool from the theory of non-homogeneous random walks which may be of independent interest.
    摘要 在这项工作中,我们研究了使用单 neuron autoencoder 的梯度下降法(SGD)在正交数据上的动态。我们发现,对于这个非 convex 问题,随机初始化的SGD WITH 常数步长能够成功找到全局最小值,但这个全局最小值与批处理大小有关。在完整批处理设置下,我们发现的解是密集的(即不是稀疏),与其初始方向高度相似,表明在这个设置下,相对较少的特征学习发生。相反,任何小于样本数的批处理大小都会使SGD找到全局最小值,这个全局最小值是稀疏的和初始方向几乎垂直的,这表明随机梯度的Randomness 会导致一种totally different的"特征选择"现象。此外,我们还证明了SGD WITH 常数步长的收敛性,并引入了非同homogeneous random walk 的理论工具,这可能有独立的意义。
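A small experiment sketch of the setting described above: a single-neuron ReLU autoencoder trained with constant-step SGD on orthogonal data, run once with full batches and once with batch size one so the sparsity and alignment of the two solutions can be compared. The architecture $x \mapsto \mathrm{relu}(\langle w, x\rangle)\, w$, the hyperparameters, and the identity-matrix data are assumptions of the sketch; it illustrates the phenomenon rather than the paper's proof.

```python
import torch

def train_single_neuron_ae(X, batch_size, steps=5000, lr=0.05, seed=0):
    """Single-neuron autoencoder x -> relu(<w, x>) * w, constant-step SGD."""
    torch.manual_seed(seed)
    n, d = X.shape
    w = torch.randn(d, requires_grad=True)
    w.data *= 0.1                                   # small random initialization
    for _ in range(steps):
        idx = torch.randperm(n)[:batch_size]
        xb = X[idx]
        recon = torch.relu(xb @ w)[:, None] * w
        loss = ((xb - recon) ** 2).sum(dim=1).mean()
        loss.backward()
        with torch.no_grad():
            w -= lr * w.grad
        w.grad.zero_()
    return w.detach()

X = torch.eye(8)                                    # orthogonal training data
w_full = train_single_neuron_ae(X, batch_size=8)    # full-batch gradient descent
w_sgd = train_single_neuron_ae(X, batch_size=1)     # stochastic updates
# Per the paper, w_full tends to stay dense and aligned with its initialization,
# while w_sgd tends toward a sparse, nearly orthogonal solution.
```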

Average-Hard Attention Transformers are Constant-Depth Uniform Threshold Circuits

  • paper_url: http://arxiv.org/abs/2308.03212
  • repo_url: None
  • paper_authors: Lena Strobl
  • for: This paper explores the relationship between transformer models and constant-depth threshold circuits, and demonstrates that transformers can be simulated by constant-depth threshold circuits.
  • methods: The paper uses two assumptions: average-hard attention and logarithmic precision for internal computations relative to input length.
  • results: The paper shows that both transformer models can be simulated by constant-depth threshold circuits, with the latter being more robust because it yields a uniform circuit family; the paper further extends the first result to yield uniform circuits as well.
    Abstract Transformers have emerged as a widely used neural network model for various natural language processing tasks. Previous research explored their relationship with constant-depth threshold circuits, making two assumptions: average-hard attention and logarithmic precision for internal computations relative to input length. Merrill et al. (2022) prove that average-hard attention transformers recognize languages that fall within the complexity class TC0, denoting the set of languages that can be recognized by constant-depth polynomial-size threshold circuits. Likewise, Merrill and Sabharwal (2023) show that log-precision transformers recognize languages within the class of uniform TC0. This shows that both transformer models can be simulated by constant-depth threshold circuits, with the latter being more robust due to generating a uniform circuit family. Our paper shows that the first result can be extended to yield uniform circuits as well.
    摘要 transformers 已经成为自然语言处理任务中广泛使用的神经网络模型。前期研究探讨了它们与常深度阈值电路之间的关系,假设了平均困难注意力和对内部计算的对数精度相对于输入长度。Merill et al. (2022)证明了average-hard attention transformers 可以认出TC0复杂性类型的语言,这些语言可以被表示为常深度的多阶度阈值电路。Merill 和 Sabharwal (2023)表明,log-precision transformers 可以认出 uniform TC0 类型的语言。这表明这两种 transformers 模型都可以被模拟为常深度阈值电路,其中后者更加稳定,因为它生成了一个固定深度的多阶度阈值电路家族。我们的论文表明,上一个结果可以被推广到生成固定深度的多阶度阈值电路。
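A short numpy illustration of the "average-hard" attention idealization used in this line of work: instead of a softmax, attention mass is split uniformly over the positions that attain the maximum score. The tolerance parameter is an assumption of the sketch.

```python
import numpy as np

def average_hard_attention(scores, tol=0.0):
    """scores: (..., T) attention logits. Returns weights that are uniform over the
    argmax positions (ties share mass equally) and zero elsewhere -- the idealization
    used when relating transformers to constant-depth threshold circuits."""
    m = scores.max(axis=-1, keepdims=True)
    hard = (scores >= m - tol).astype(float)
    return hard / hard.sum(axis=-1, keepdims=True)

print(average_hard_attention(np.array([1.0, 3.0, 3.0, -2.0])))   # [0. 0.5 0.5 0.]
```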

Time-Parameterized Convolutional Neural Networks for Irregularly Sampled Time Series

  • paper_url: http://arxiv.org/abs/2308.03210
  • repo_url: None
  • paper_authors: Chrysoula Kosma, Giannis Nikolentzos, Michalis Vazirgiannis
  • for: This paper focuses on modeling and forecasting irregularly sampled multivariate time series.
  • methods: A time-parameterized convolutional neural network (TPCNN) is proposed, in which convolutional kernels are parameterized by explicit, learnable functions of time so that continuous-time hidden dynamics can be learned from irregular samples.
  • results: On interpolation and classification tasks over real-world irregularly sampled multivariate time series, TPCNN is competitive with and significantly more efficient than other state-of-the-art methods, and the learnable time functions also make the input series more interpretable.
    Abstract Irregularly sampled multivariate time series are ubiquitous in several application domains, leading to sparse, not fully-observed and non-aligned observations across different variables. Standard sequential neural network architectures, such as recurrent neural networks (RNNs) and convolutional neural networks (CNNs), consider regular spacing between observation times, posing significant challenges to irregular time series modeling. While most of the proposed architectures incorporate RNN variants to handle irregular time intervals, convolutional neural networks have not been adequately studied in the irregular sampling setting. In this paper, we parameterize convolutional layers by employing time-explicitly initialized kernels. Such general functions of time enhance the learning process of continuous-time hidden dynamics and can be efficiently incorporated into convolutional kernel weights. We, thus, propose the time-parameterized convolutional neural network (TPCNN), which shares similar properties with vanilla convolutions but is carefully designed for irregularly sampled time series. We evaluate TPCNN on both interpolation and classification tasks involving real-world irregularly sampled multivariate time series datasets. Our experimental results indicate the competitive performance of the proposed TPCNN model which is also significantly more efficient than other state-of-the-art methods. At the same time, the proposed architecture allows the interpretability of the input series by leveraging the combination of learnable time functions that improve the network performance in subsequent tasks and expedite the inaugural application of convolutions in this field.
    摘要 不规则时间序列是多种应用领域中的普遍现象,导致不同变量之间的观察记录稀缺、不完全观察和不对称。标准的序列神经网络架构,如循环神经网络(RNN)和卷积神经网络(CNN),假设时间序列的均匀采样,对于不规则时间序列模型 pose significant challenges。大多数提议的架构包括RNN变体来处理不规则时间间隔,但是卷积神经网络在不规则采样 Setting 中尚未得到了充分的研究。在这篇论文中,我们将时间序列中的卷积层参数化,使用时间explicitly初始化的kernel。这种通用时间函数可以增强隐藏时间序列的学习过程,并可以高效地被包含到卷积核心的 weights 中。因此,我们提出了时间参数化卷积神经网络(TPCNN),它与普通的卷积神经网络 sharing similar properties,但是特别地设计 для不规则时间序列。我们在实验中使用TPCNN进行 interpolate 和 classification 任务,并对实际的不规则时间序列多变量数据进行评估。我们的实验结果表明,提议的 TPCNN 模型在竞争性和效率两个方面具有竞争力,而且可以更好地利用输入序列的学习可能性,通过组合学习时间函数来提高网络性能,并且可以更快地在这一领域中应用卷积神经网络。

Communication-Free Distributed GNN Training with Vertex Cut

  • paper_url: http://arxiv.org/abs/2308.03209
  • repo_url: None
  • paper_authors: Kaidi Cao, Rui Deng, Shirley Wu, Edward W Huang, Karthik Subbian, Jure Leskovec
  • for: 加速图 neural network(GNN)在实际图中的训练,以便应对实际图中的巨量数据和复杂结构。
  • methods: 提出了一种新的分布式训练框架CoFree-GNN,通过减少交互 communication来加速训练过程,并采用骨架剖分法保持图结构。
  • results: 在实际网络上进行了广泛的实验,demonstrating that CoFree-GNN can speed up GNN training by up to 10 times compared to existing state-of-the-art methods.
    Abstract Training Graph Neural Networks (GNNs) on real-world graphs consisting of billions of nodes and edges is quite challenging, primarily due to the substantial memory needed to store the graph and its intermediate node and edge features, and there is a pressing need to speed up the training process. A common approach to achieve speed up is to divide the graph into many smaller subgraphs, which are then distributed across multiple GPUs in one or more machines and processed in parallel. However, existing distributed methods require frequent and substantial cross-GPU communication, leading to significant time overhead and progressively diminishing scalability. Here, we introduce CoFree-GNN, a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training. The framework utilizes a Vertex Cut partitioning, i.e., rather than partitioning the graph by cutting the edges between partitions, the Vertex Cut partitions the edges and duplicates the node information to preserve the graph structure. Furthermore, the framework maintains high model accuracy by incorporating a reweighting mechanism to handle a distorted graph distribution that arises from the duplicated nodes. We also propose a modified DropEdge technique to further speed up the training process. Using an extensive set of experiments on real-world networks, we demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
    摘要 训练图 neural network(GNN)在实际图中包含数百亿个节点和边的情况下是非常困难的,主要是因为需要很大的内存来存储图和其间途节点和边特征的存储。随着图的规模的增长,训练过程的速度变得非常重要。现有的分布式方法需要频繁的跨GPU通信,导致训练过程中的时间开销很大,并且随着图的规模的增长,缓存的缺省值逐渐减少。在这里,我们介绍了CoFree-GNN,一种新的分布式GNN训练框架,可以快速加速GNN训练过程。该框架使用顶点割分法,而不是将图分成多个分区,然后在多个GPU上并行处理。此外,框架还保持了高精度模型,通过对填充的节点数据进行重新权重来处理受损的图分布。我们还提出了一种修改后 DropEdge 技术,以进一步加速训练过程。通过对实际网络进行了广泛的实验,我们证明了CoFree-GNN可以在实际图中加速GNN训练过程,并且可以达到现有状态的 искусственный智能训练方法的10倍速度。

Microvasculature Segmentation in Human BioMolecular Atlas Program (HuBMAP)

  • paper_url: http://arxiv.org/abs/2308.03203
  • repo_url: None
  • paper_authors: Youssef Sultan, Yongqiang Wang, James Scanlon, Lisa D’lima
  • for: This study supports the NIH Human BioMolecular Atlas Program (HuBMAP), an initiative to create detailed cellular maps of the human body, by segmenting microvascular structures in human kidneys.
  • methods: Starting from a foundational FastAI U-Net model, alternative backbone architectures, deeper models, and Feature Pyramid Networks are explored on 2D Periodic Acid-Schiff (PAS)-stained histology images.
  • results: The varied approaches are rigorously benchmarked against the baseline U-Net, providing a comprehensive exploration of segmentation techniques and valuable insights for future research.
    Abstract Image segmentation serves as a critical tool across a range of applications, encompassing autonomous driving's pedestrian detection and pre-operative tumor delineation in the medical sector. Among these applications, we focus on the National Institutes of Health's (NIH) Human BioMolecular Atlas Program (HuBMAP), a significant initiative aimed at creating detailed cellular maps of the human body. In this study, we concentrate on segmenting various microvascular structures in human kidneys, utilizing 2D Periodic Acid-Schiff (PAS)-stained histology images. Our methodology begins with a foundational FastAI U-Net model, upon which we investigate alternative backbone architectures, delve into deeper models, and experiment with Feature Pyramid Networks. We rigorously evaluate these varied approaches by benchmarking their performance against our baseline U-Net model. This study thus offers a comprehensive exploration of cutting-edge segmentation techniques, providing valuable insights for future research in the field.
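A minimal fastai setup along the lines described above is sketched below; the baseline is a U-Net learner, and swapping the architecture argument is how alternative backbones would be compared. The dataset layout, label function, and class codes are hypothetical.

```python
# Sketch of a baseline fastai U-Net for kidney microvasculature segmentation.
from fastai.vision.all import (
    SegmentationDataLoaders, unet_learner, resnet34, get_image_files, Path
)

path = Path("hubmap_kidney")                      # assumed dataset folder
fnames = get_image_files(path / "images")

def label_func(fn):
    # Assumed convention: mask stored alongside the image with the same name.
    return path / "masks" / fn.name

dls = SegmentationDataLoaders.from_label_func(
    path, fnames, label_func,
    codes=["background", "microvasculature"],     # assumed class codes
    bs=8,
)

# Baseline U-Net with a ResNet-34 encoder; deeper backbones (e.g. resnet50)
# would be passed in place of `resnet34` to reproduce the comparison.
learn = unet_learner(dls, resnet34)
learn.fine_tune(5)
```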

Source-free Domain Adaptive Human Pose Estimation

  • paper_url: http://arxiv.org/abs/2308.03202
  • repo_url: https://github.com/davidpengucf/sfdahpe
  • paper_authors: Qucheng Peng, Ce Zheng, Chen Chen
  • for: This work addresses data privacy and security concerns in human pose estimation (HPE) by introducing a new task: source-free domain adaptive HPE, i.e., cross-domain adaptation without access to source data.
  • methods: A novel framework with three models (source, intermediate, and target) tackles the task from both source-protect and target-relevant perspectives. The source-protect module preserves source information more effectively while resisting noise; the target-relevant module reduces the sparsity of spatial representations by building a novel spatial probability space, on which pose-specific contrastive learning and information maximization are applied.
  • results: Extensive experiments on several domain adaptive HPE benchmarks show that the proposed method outperforms existing approaches by a considerable margin. Code is available at https://github.com/davidpengucf/SFDAHPE.
    Abstract Human Pose Estimation (HPE) is widely used in various fields, including motion analysis, healthcare, and virtual reality. However, the great expenses of labeled real-world datasets present a significant challenge for HPE. To overcome this, one approach is to train HPE models on synthetic datasets and then perform domain adaptation (DA) on real-world data. Unfortunately, existing DA methods for HPE neglect data privacy and security by using both source and target data in the adaptation process. To this end, we propose a new task, named source-free domain adaptive HPE, which aims to address the challenges of cross-domain learning of HPE without access to source data during the adaptation process. We further propose a novel framework that consists of three models: source model, intermediate model, and target model, which explores the task from both source-protect and target-relevant perspectives. The source-protect module preserves source information more effectively while resisting noise, and the target-relevant module reduces the sparsity of spatial representations by building a novel spatial probability space, and pose-specific contrastive learning and information maximization are proposed on the basis of this space. Comprehensive experiments on several domain adaptive HPE benchmarks show that the proposed method outperforms existing approaches by a considerable margin. The codes are available at https://github.com/davidpengucf/SFDAHPE.
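The sketch below shows only how a three-model, source-free adaptation loop might be wired when the source weights (but no source data) are available; the teacher/student update shown is a common pattern and an assumption, not the paper's exact algorithm.

```python
# Structural sketch of source-free adaptation with source, intermediate
# (teacher), and target (student) models; the source model is frozen.
import copy
from itertools import cycle
import torch

def adapt_source_free(source_model, target_loader, steps=1000, ema=0.999):
    source_model.eval()                          # frozen; source data never used
    intermediate = copy.deepcopy(source_model)   # slowly-updated teacher
    target = copy.deepcopy(source_model)         # adapted student
    opt = torch.optim.Adam(target.parameters(), lr=1e-4)

    it = cycle(target_loader)                    # loader yields unlabeled image batches
    for _ in range(steps):
        images = next(it)
        with torch.no_grad():
            pseudo = intermediate(images)        # teacher heatmaps as pseudo-labels
        pred = target(images)
        loss = torch.nn.functional.mse_loss(pred, pseudo)
        opt.zero_grad(); loss.backward(); opt.step()

        # EMA update keeps the teacher close to the source model early on,
        # protecting source knowledge while it drifts toward the target domain.
        with torch.no_grad():
            for p_t, p_i in zip(target.parameters(), intermediate.parameters()):
                p_i.mul_(ema).add_(p_t, alpha=1 - ema)
    return target
```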

Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies

  • paper_url: http://arxiv.org/abs/2308.03188
  • repo_url: https://github.com/teacherpeterpan/self-correction-llm-papers
  • paper_authors: Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, William Yang Wang
  • for: This paper aims to provide a comprehensive review of techniques for self-correction in large language models (LLMs) to address undesired behaviors such as hallucination, unfaithful reasoning, and toxic content.
  • methods: The paper reviews and taxonomizes recent work utilizing self-correction techniques, including training-time, generation-time, and post-hoc correction methods.
  • results: The paper summarizes the major applications of self-correction techniques in LLMs and discusses future directions and challenges in this emerging area of research.
    Abstract Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. However, their efficacy is undermined by undesired and inconsistent behaviors, including hallucination, unfaithful reasoning, and toxic content. A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output. Techniques leveraging automated feedback -- either produced by the LLM itself or some external system -- are of particular interest as they are a promising way to make LLM-based solutions more practical and deployable with minimal human feedback. This paper presents a comprehensive review of this emerging class of techniques. We analyze and taxonomize a wide array of recent work utilizing these strategies, including training-time, generation-time, and post-hoc correction. We also summarize the major applications of this strategy and conclude by discussing future directions and challenges.
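A minimal example of the post-hoc correction pattern the survey covers is sketched below: draft, obtain automated feedback, then revise conditioned on that feedback. The `generate` callable is a placeholder for any LLM interface, not a specific library API.

```python
# Sketch of a post-hoc self-correction loop with automated feedback.
def self_correct(generate, task_prompt, max_rounds=3):
    draft = generate(task_prompt)
    for _ in range(max_rounds):
        feedback = generate(
            f"Critique the following answer to the task.\n"
            f"Task: {task_prompt}\nAnswer: {draft}\n"
            f"List factual errors or reasoning flaws, or say PASS if none."
        )
        if "PASS" in feedback:
            break                      # critic found nothing to fix
        draft = generate(
            f"Task: {task_prompt}\nPrevious answer: {draft}\n"
            f"Feedback: {feedback}\nRewrite the answer, fixing every issue."
        )
    return draft

if __name__ == "__main__":
    # Trivial stand-in generator so the sketch runs without any model.
    canned = iter(["4 + 4 = 9", "Arithmetic error: 4 + 4 is 8.", "4 + 4 = 8", "PASS"])
    print(self_correct(lambda prompt: next(canned), "What is 4 + 4?"))
```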

A Lightweight Method for Modeling Confidence in Recommendations with Learned Beta Distributions

  • paper_url: http://arxiv.org/abs/2308.03186
  • repo_url: https://github.com/nkny/confidencerecsys2023
  • paper_authors: Norman Knyazev, Harrie Oosterhuis
  • for: To provide a simple and practical recommendation method that comes with an explicit measure of confidence in its recommendations.
  • methods: Learned beta distributions (LBD) model user preferences as probability distributions on a closed interval, which can be implemented with minimal model complexity while naturally yielding a confidence measure.
  • results: LBD maintains accuracy competitive with existing methods while showing a significantly stronger correlation between its accuracy and confidence, and it performs better on a high-precision targeted recommendation task.
    Abstract Most Recommender Systems (RecSys) do not provide an indication of confidence in their decisions. Therefore, they do not distinguish between recommendations of which they are certain, and those where they are not. Existing confidence methods for RecSys are either inaccurate heuristics, conceptually complex or computationally very expensive. Consequently, real-world RecSys applications rarely adopt these methods, and thus, provide no confidence insights in their behavior. In this work, we propose learned beta distributions (LBD) as a simple and practical recommendation method with an explicit measure of confidence. Our main insight is that beta distributions predict user preferences as probability distributions that naturally model confidence on a closed interval, yet can be implemented with the minimal model-complexity. Our results show that LBD maintains competitive accuracy to existing methods while also having a significantly stronger correlation between its accuracy and confidence. Furthermore, LBD has higher performance when applied to a high-precision targeted recommendation task. Our work thus shows that confidence in RecSys is possible without sacrificing simplicity or accuracy, and without introducing heavy computational complexity. Thereby, we hope it enables better insight into real-world RecSys and opens the door for novel future applications.
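The core idea can be sketched as follows: each user-item preference is predicted as a Beta distribution whose mean is the rating estimate and whose spread gives a confidence measure. The factorization used to produce the two Beta parameters below is an assumption for illustration only.

```python
# Sketch of predicting user preferences as learned Beta distributions.
import torch
import torch.nn as nn

class LearnedBeta(nn.Module):
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_a = nn.Embedding(n_users, dim)
        self.user_b = nn.Embedding(n_users, dim)
        self.item_a = nn.Embedding(n_items, dim)
        self.item_b = nn.Embedding(n_items, dim)

    def forward(self, users, items):
        # Softplus keeps both Beta parameters strictly positive.
        alpha = nn.functional.softplus((self.user_a(users) * self.item_a(items)).sum(-1)) + 1e-4
        beta = nn.functional.softplus((self.user_b(users) * self.item_b(items)).sum(-1)) + 1e-4
        return torch.distributions.Beta(alpha, beta)

model = LearnedBeta(n_users=100, n_items=500)
dist = model(torch.tensor([3]), torch.tensor([42]))
pred = dist.mean                    # point estimate of the rating rescaled to (0, 1)
conf = 1.0 - dist.variance.sqrt()   # narrower distribution -> higher confidence
# Training would maximize the likelihood of observed ratings rescaled to (0, 1).
loss = -dist.log_prob(torch.tensor([0.8])).mean()
print(pred.item(), conf.item(), loss.item())
```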

A Critical Review of Physics-Informed Machine Learning Applications in Subsurface Energy Systems

  • paper_url: http://arxiv.org/abs/2308.04457
  • repo_url: None
  • paper_authors: Abdeldjalil Latrach, Mohamed Lamine Malki, Misael Morales, Mohamed Mehana, Minou Rabiei
  • for: The paper is written for researchers and practitioners in the field of machine learning, particularly in the area of physics-informed machine learning (PIML), to provide a comprehensive review of its applications in subsurface energy systems, such as the oil and gas industry.
  • methods: The paper uses a literature review to discuss the current state of PIML techniques and their applications in various fields, including seismic applications, reservoir simulation, hydrocarbons production forecasting, and intelligent decision-making in the exploration and production stages.
  • results: The paper highlights the successful utilization of PIML for tasks related to subsurface energy systems, demonstrating its ability to provide more accurate and reliable predictions for resource management and operational efficiency. Additionally, it shows the potential of PIML to revolutionize the oil and gas industry and other emerging areas of interest, such as carbon and hydrogen storage, and geothermal systems.
    Abstract Machine learning has emerged as a powerful tool in various fields, including computer vision, natural language processing, and speech recognition. It can unravel hidden patterns within large data sets and reveal unparalleled insights, revolutionizing many industries and disciplines. However, machine and deep learning models lack interpretability and limited domain-specific knowledge, especially in applications such as physics and engineering. Alternatively, physics-informed machine learning (PIML) techniques integrate physics principles into data-driven models. By combining deep learning with domain knowledge, PIML improves the generalization of the model, abidance by the governing physical laws, and interpretability. This paper comprehensively reviews PIML applications related to subsurface energy systems, mainly in the oil and gas industry. The review highlights the successful utilization of PIML for tasks such as seismic applications, reservoir simulation, hydrocarbons production forecasting, and intelligent decision-making in the exploration and production stages. Additionally, it demonstrates PIML's capabilities to revolutionize the oil and gas industry and other emerging areas of interest, such as carbon and hydrogen storage; and geothermal systems by providing more accurate and reliable predictions for resource management and operational efficiency.
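As a concrete illustration of the physics-informed losses the review surveys, the sketch below penalizes a network both for mismatching data and for violating a governing PDE; the 1D diffusivity equation and the equal loss weighting are chosen only as an example of embedding domain physics, not as a method from the paper.

```python
# PINN-style sketch: data misfit plus a PDE residual (p_t = c * p_xx).
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                    nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
c = 0.1  # assumed hydraulic diffusivity

def pde_residual(x, t):
    x.requires_grad_(True); t.requires_grad_(True)
    p = net(torch.cat([x, t], dim=1))
    p_t = torch.autograd.grad(p, t, torch.ones_like(p), create_graph=True)[0]
    p_x = torch.autograd.grad(p, x, torch.ones_like(p), create_graph=True)[0]
    p_xx = torch.autograd.grad(p_x, x, torch.ones_like(p_x), create_graph=True)[0]
    return p_t - c * p_xx            # zero wherever the physics is satisfied

# Synthetic stand-ins for measured data and collocation points.
x_data, t_data = torch.rand(32, 1), torch.rand(32, 1)
p_data = torch.rand(32, 1)
x_col, t_col = torch.rand(128, 1), torch.rand(128, 1)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(100):
    loss_data = ((net(torch.cat([x_data, t_data], 1)) - p_data) ** 2).mean()
    loss_phys = (pde_residual(x_col, t_col) ** 2).mean()
    loss = loss_data + loss_phys     # physics term regularizes beyond the data
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```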

Adapting Machine Learning Diagnostic Models to New Populations Using a Small Amount of Data: Results from Clinical Neuroscience

  • paper_url: http://arxiv.org/abs/2308.03175
  • repo_url: None
  • paper_authors: Rongguang Wang, Guray Erus, Pratik Chaudhari, Christos Davatzikos
  • for: This paper addresses the reproducibility problem of machine learning in healthcare, where diagnostic models often fail to generalize across patient populations and acquisition settings.
  • methods: A weighted empirical risk minimization approach optimally combines data from source groups (stratified by attributes such as sex, age group, race, and clinical cohort) with a small fraction (10%) of data from the target group to make predictions on that group.
  • results: Applied to multi-source data from 15,363 individuals across 20 neuroimaging studies, the approach builds better predictive models for AD and SZ diagnosis and brain-age estimation, remaining robust to variations in scanners, protocols, and demographic or clinical characteristics.
    Abstract Machine learning (ML) has shown great promise for revolutionizing a number of areas, including healthcare. However, it is also facing a reproducibility crisis, especially in medicine. ML models that are carefully constructed from and evaluated on a training set might not generalize well on data from different patient populations or acquisition instrument settings and protocols. We tackle this problem in the context of neuroimaging of Alzheimer's disease (AD), schizophrenia (SZ) and brain aging. We develop a weighted empirical risk minimization approach that optimally combines data from a source group, e.g., subjects are stratified by attributes such as sex, age group, race and clinical cohort to make predictions on a target group, e.g., other sex, age group, etc. using a small fraction (10%) of data from the target group. We apply this method to multi-source data of 15,363 individuals from 20 neuroimaging studies to build ML models for diagnosis of AD and SZ, and estimation of brain age. We found that this approach achieves substantially better accuracy than existing domain adaptation techniques: it obtains area under curve greater than 0.95 for AD classification, area under curve greater than 0.7 for SZ classification and mean absolute error less than 5 years for brain age prediction on all target groups, achieving robustness to variations of scanners, protocols, and demographic or clinical characteristics. In some cases, it is even better than training on all data from the target group, because it leverages the diversity and size of a larger training set. We also demonstrate the utility of our models for prognostic tasks such as predicting disease progression in individuals with mild cognitive impairment. Critically, our brain age prediction models lead to new clinical insights regarding correlations with neurophysiological tests.
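A toy version of weighted empirical risk minimization across sites is sketched below: source-group samples enter the loss with a weight selected by validation on the small (10%) labeled target split. The classifier, synthetic data, and weight grid are illustrative assumptions.

```python
# Sketch of weighted ERM combining source-group data with 10% target data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_src, y_src = rng.normal(0, 1, (1000, 20)), rng.integers(0, 2, 1000)   # source group
X_tgt, y_tgt = rng.normal(0.5, 1, (300, 20)), rng.integers(0, 2, 300)   # target group

# Only 10% of the target group is assumed labeled and available for training.
X_tgt_tr, X_tgt_te, y_tgt_tr, y_tgt_te = train_test_split(
    X_tgt, y_tgt, train_size=0.10, random_state=0)

best_w, best_acc, best_model = None, -1.0, None
for w in [0.05, 0.1, 0.25, 0.5, 1.0]:           # candidate source weights
    X = np.vstack([X_src, X_tgt_tr])
    y = np.concatenate([y_src, y_tgt_tr])
    sample_weight = np.concatenate([np.full(len(y_src), w),
                                    np.ones(len(y_tgt_tr))])
    model = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=sample_weight)
    acc = model.score(X_tgt_tr, y_tgt_tr)       # tiny target split doubles as validation
    if acc > best_acc:
        best_w, best_acc, best_model = w, acc, model

print(best_w, best_model.score(X_tgt_te, y_tgt_te))
```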

Two Sides of Miscalibration: Identifying Over and Under-Confidence Prediction for Network Calibration

  • paper_url: http://arxiv.org/abs/2308.03172
  • repo_url: https://github.com/aoshuang92/miscalibration_ts
  • paper_authors: Shuang Ao, Stefan Rueger, Advaith Siddharthan
  • for: This work targets reliable predictions from deep neural networks by improving confidence calibration, paying attention to under-confidence as well as the more commonly studied over-confidence.
  • methods: A novel miscalibration score identifies the overall and class-wise calibration status, including whether a model is over- or under-confident; the class-wise score is then used as a proxy to design a calibration technique that tackles both failure modes.
  • results: Experiments show that the proposed metric and calibration technique substantially outperform existing calibration techniques; on an automatic failure detection task, the method also improves failure detection and the trustworthiness of the model.
    Abstract Proper confidence calibration of deep neural networks is essential for reliable predictions in safety-critical tasks. Miscalibration can lead to model over-confidence and/or under-confidence; i.e., the model's confidence in its prediction can be greater or less than the model's accuracy. Recent studies have highlighted the over-confidence issue by introducing calibration techniques and demonstrated success on various tasks. However, miscalibration through under-confidence has yet to receive much attention. In this paper, we address the necessity of paying attention to the under-confidence issue. We first introduce a novel metric, a miscalibration score, to identify the overall and class-wise calibration status, including being over or under-confident. Our proposed metric reveals the pitfalls of existing calibration techniques, where they often overly calibrate the model and worsen under-confident predictions. Then we utilize the class-wise miscalibration score as a proxy to design a calibration technique that can tackle both over and under-confidence. We report extensive experiments that show our proposed methods substantially outperforming existing calibration techniques. We also validate our proposed calibration technique on an automatic failure detection task with a risk-coverage curve, reporting that our methods improve failure detection as well as trustworthiness of the model. The code is available at https://github.com/AoShuang92/miscalibration_TS.
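A signed, class-wise calibration check in the spirit of the proposed miscalibration score is sketched below: per predicted class, mean confidence is compared with accuracy, so the sign separates over-confidence from under-confidence. The exact scoring formula is the paper's; this binned variant is an illustrative assumption.

```python
# Sketch of a signed, class-wise confidence-vs-accuracy gap.
import numpy as np

def classwise_confidence_gap(probs, labels):
    """probs: (N, C) softmax outputs; labels: (N,) integer targets."""
    preds = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    gaps = {}
    for c in np.unique(preds):
        mask = preds == c
        accuracy = (labels[mask] == c).mean()
        gaps[int(c)] = float(conf[mask].mean() - accuracy)   # + over-, - under-confident
    return gaps

probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.45, 0.55]])
labels = np.array([0, 1, 1, 1])
print(classwise_confidence_gap(probs, labels))
# Class 0 predictions come out over-confident; class 1 predictions under-confident.
```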

Detection of Anomalies in Multivariate Time Series Using Ensemble Techniques

  • paper_url: http://arxiv.org/abs/2308.03171
  • repo_url: None
  • paper_authors: Anastasios Iliopoulos, John Violos, Christos Diou, Iraklis Varlamis
  • for: This paper focuses on anomaly detection in multivariate time series, a major problem across many fields.
  • methods: The paper builds on deep neural network models such as LSTMs, autoencoders, and convolutional autoencoders, which perform well on imbalanced data. Because anomalies in multivariate time series may arise from only a small subset of features, a feature-bagging technique considers subsets of features at a time and applies a nested-rotation transformation based on Principal Component Analysis (PCA); an ensemble then combines the base models for the final decision.
  • results: On the SKAB dataset, the proposed ensemble technique outperforms the basic algorithms, improving anomaly detection accuracy by 2% for unsupervised models and by at least 10% for semi-supervised models.
    Abstract Anomaly Detection in multivariate time series is a major problem in many fields. Due to their nature, anomalies sparsely occur in real data, thus making the task of anomaly detection a challenging problem for classification algorithms to solve. Methods that are based on Deep Neural Networks such as LSTM, Autoencoders, Convolutional Autoencoders etc., have shown positive results in such imbalanced data. However, the major challenge that algorithms face when applied to multivariate time series is that the anomaly can arise from a small subset of the feature set. To boost the performance of these base models, we propose a feature-bagging technique that considers only a subset of features at a time, and we further apply a transformation that is based on nested rotation computed from Principal Component Analysis (PCA) to improve the effectiveness and generalization of the approach. To further enhance the prediction performance, we propose an ensemble technique that combines multiple base models toward the final decision. In addition, a semi-supervised approach using a Logistic Regressor to combine the base models' outputs is proposed. The proposed methodology is applied to the Skoltech Anomaly Benchmark (SKAB) dataset, which contains time series data related to the flow of water in a closed circuit, and the experimental results show that the proposed ensemble technique outperforms the basic algorithms. More specifically, the performance improvement in terms of anomaly detection accuracy reaches 2% for the unsupervised and at least 10% for the semi-supervised models.
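The feature-bagging idea can be sketched as follows: each base detector sees only a random, PCA-rotated subset of features, and the ensemble averages the per-model anomaly scores. IsolationForest stands in for the paper's LSTM/autoencoder base models, and the subset size is an assumption.

```python
# Sketch of feature bagging with PCA rotation and an averaged ensemble score.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest

def feature_bagging_scores(X, n_models=5, subset_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    k = max(2, int(subset_frac * n_features))
    scores = np.zeros(X.shape[0])
    for _ in range(n_models):
        cols = rng.choice(n_features, size=k, replace=False)   # random feature subset
        Xr = PCA(n_components=k).fit_transform(X[:, cols])     # rotated subset
        det = IsolationForest(random_state=int(rng.integers(10**6))).fit(Xr)
        scores += -det.score_samples(Xr)        # higher -> more anomalous
    return scores / n_models

X = np.vstack([np.random.default_rng(1).normal(0, 1, (200, 8)),
               np.random.default_rng(2).normal(5, 1, (5, 8))])  # 5 injected anomalies
scores = feature_bagging_scores(X)
print(np.argsort(scores)[-5:])   # the injected rows 200-204 should dominate
```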

FireFly: A Synthetic Dataset for Ember Detection in Wildfire

  • paper_url: http://arxiv.org/abs/2308.03164
  • repo_url: https://github.com/ergowho/firefly2.0
  • paper_authors: Yue Hu, Xinan Ye, Yifei Liu, Souvik Kundu, Gourav Datta, Srikar Mutnuri, Namo Asavisanu, Nora Ayanian, Konstantinos Psounis, Peter Beerel
  • for: This work presents FireFly, a synthetic dataset for ember detection in wildfires, created to overcome the lack of ember-specific training resources.
  • methods: The dataset is generated with Unreal Engine 4 (UE4) using a tool that automatically produces labeled synthetic frames with adjustable parameters; a trained model is further leveraged for a semi-automatic labeling process on real-life ember frames.
  • results: A total of 19,273 generated frames are used to evaluate FireFly on four popular object detection models, yielding up to an 8.57% improvement in mean Average Precision (mAP) in real-world wildfire scenarios compared to models trained exclusively on a small real dataset.
    Abstract This paper presents "FireFly", a synthetic dataset for ember detection created using Unreal Engine 4 (UE4), designed to overcome the current lack of ember-specific training resources. To create the dataset, we present a tool that allows the automated generation of the synthetic labeled dataset with adjustable parameters, enabling data diversity from various environmental conditions, making the dataset both diverse and customizable based on user requirements. We generated a total of 19,273 frames that have been used to evaluate FireFly on four popular object detection models. Further to minimize human intervention, we leveraged a trained model to create a semi-automatic labeling process for real-life ember frames. Moreover, we demonstrated an up to 8.57% improvement in mean Average Precision (mAP) in real-world wildfire scenarios compared to models trained exclusively on a small real dataset.
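The semi-automatic labeling step can be sketched as a simple routing rule: a detector trained on the synthetic frames pre-labels real ember frames, and only low-confidence frames are queued for human review. The `detector` callable and the thresholds below are placeholders, not the authors' tooling.

```python
# Sketch of semi-automatic labeling with a trained detector and review queue.
def prelabel_frames(frames, detector, keep_thresh=0.7, review_thresh=0.3):
    auto_labels, needs_review = {}, []
    for frame_id, frame in frames:
        detections = detector(frame)                       # [((x1, y1, x2, y2), score), ...]
        confident = [(box, s) for box, s in detections if s >= keep_thresh]
        uncertain = [s for _, s in detections if review_thresh <= s < keep_thresh]
        auto_labels[frame_id] = [box for box, _ in confident]
        if uncertain or not confident:
            needs_review.append(frame_id)                  # a human checks these frames
    return auto_labels, needs_review

if __name__ == "__main__":
    fake_frames = [("f0", None), ("f1", None)]
    fake_detector = lambda frame: [((10, 10, 20, 20), 0.9), ((30, 30, 40, 40), 0.5)]
    labels, review = prelabel_frames(fake_frames, fake_detector)
    print(labels, review)   # both frames keep one confident box and are queued for review
```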