cs.LG - 2023-11-25

Testable Learning with Distribution Shift

  • paper_url: http://arxiv.org/abs/2311.15142
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan
  • for: 本研究重新强调了学习 distribuion shift 问题,即学习者只有 training distribution $D$ 上标注的样本和 test distribution $D’$ 上的无标注样本,并且需要输出一个具有低测试错误率的分类器。传统的方法是通过 bounding 损失来bounds 分类器的性能,但这些距离们显得困难计算,无法导致高效的算法。
  • methods: 我们采用了一种新的模型 called testable learning with distribution shift,可以获得可靠的算法来证明分类器在 test distribution $D’$ 上的性能。learner 输出一个具有低测试错误率的分类器,当 samples from $D$ 和 $D’$ 通过相关的测试时。此外,测试还需要接受,如果 marginal of $D$ 与 marginal of $D’$ 相同。我们提供了一些正面的结果,包括在半空间、半空间的交集和决策树上学习的情况下,当 marginal of $D$ 是 Gaussian 或 uniform on ${\pm 1}^d$ 时。在这些基本情况下,没有高效的算法,除非 $D’$ 具有强大的假设。
  • results: 我们 obtianed several positive results, including the ability to learn well-studied concept classes such as halfspaces, intersections of halfspaces, and decision trees when the marginal of $D$ is Gaussian or uniform on ${\pm 1}^d$. Additionally, we developed a moment-matching approach combined with ideas from active learning to simulate an efficient oracle for estimating disagreement regions for halfspaces in the realizable case. For the non-realizable setting, we applied recent work from testable (agnostic) learning. Furthermore, we proved that any function class with low-degree $L_2$-sandwiching polynomial approximators can be learned in our model. We used constructions from the pseudorandomness literature to obtain the required approximators.
    Abstract We revisit the fundamental problem of learning with distribution shift, in which a learner is given labeled samples from training distribution $D$, unlabeled samples from test distribution $D'$ and is asked to output a classifier with low test error. The standard approach in this setting is to bound the loss of a classifier in terms of some notion of distance between $D$ and $D'$. These distances, however, seem difficult to compute and do not lead to efficient algorithms. We depart from this paradigm and define a new model called testable learning with distribution shift, where we can obtain provably efficient algorithms for certifying the performance of a classifier on a test distribution. In this model, a learner outputs a classifier with low test error whenever samples from $D$ and $D'$ pass an associated test; moreover, the test must accept if the marginal of $D$ equals the marginal of $D'$. We give several positive results for learning well-studied concept classes such as halfspaces, intersections of halfspaces, and decision trees when the marginal of $D$ is Gaussian or uniform on $\{\pm 1\}^d$. Prior to our work, no efficient algorithms for these basic cases were known without strong assumptions on $D'$. For halfspaces in the realizable case (where there exists a halfspace consistent with both $D$ and $D'$), we combine a moment-matching approach with ideas from active learning to simulate an efficient oracle for estimating disagreement regions. To extend to the non-realizable setting, we apply recent work from testable (agnostic) learning. More generally, we prove that any function class with low-degree $L_2$-sandwiching polynomial approximators can be learned in our model. We apply constructions from the pseudorandomness literature to obtain the required approximators.
    摘要 我们重新探讨了学习对于分布差异的基本问题,在这个问题中,学者被提供了训练分布$D$中的标签样本,测试分布$D'$中的无标签样本,并要求他们生成一个低测试错误的分类器。标准的方法在这个设定中是通过将损失函数绑定到$D$和$D'$之间的一些距离之上,这些距离 however 似乎很难计算,并不会导致有效的算法。我们从这个概念中独立,定义了一个名为“测试学习对于分布差异”的新模型,在这个模型中,学者输出一个低测试错误的分类器,其中标签样本从$D$和$D'$中通过一个相关的测试而获得批准。此外,这个测试还需要确保$D$和$D'$的聚合相同。我们提供了多个正面的结果,包括在对半空间、半空间的交集和决策树等常见的概念类中学习时,当$D$的聚合为高斯或 uniform on $\{\pm 1\}^d$时,可以取得有效的算法。在这些基本情况下,以前没有无效的算法,不需要强制假设$D'$的假设。对半空间情况(其存在一个半空间与 both $D$ 和 $D'$ 兼容),我们结合了一个积分匹配方法与活动学习的想法,实现了一个高效的伪oracle,用于估计不同区域。为了扩展到非可行情况,我们应用了最近的测试(agnostic)学习的成果。更一般地,我们证明任何具有低度$L_2$矩阵拓扑的函数类别可以在我们的模型中学习。我们从 pseudorandomness 文献中取得了需要的拓扑器。

Multi-fidelity Constrained Optimization for Stochastic Black Box Simulators

  • paper_url: http://arxiv.org/abs/2311.15137
  • repo_url: None
  • paper_authors: Atul Agrawal, Kislaya Ravi, Phaedon-Stelios Koutsourelakis, Hans-Joachim Bungartz
  • for: 优化 simulate 中的参数,以提高设计过程中的性能。
  • methods: 使用 Stochastic Constrained Optimization for N dimensions(Scout-Nd)算法,能够有效地估算梯度,减少梯度估计的噪声,并应用多 fidelti 策略进一步减少计算努力。
  • results: 在标准准例中进行了 validate,demonstrating 能够有效地优化参数,表现比现有方法更好。
    Abstract Constrained optimization of the parameters of a simulator plays a crucial role in a design process. These problems become challenging when the simulator is stochastic, computationally expensive, and the parameter space is high-dimensional. One can efficiently perform optimization only by utilizing the gradient with respect to the parameters, but these gradients are unavailable in many legacy, black-box codes. We introduce the algorithm Scout-Nd (Stochastic Constrained Optimization for N dimensions) to tackle the issues mentioned earlier by efficiently estimating the gradient, reducing the noise of the gradient estimator, and applying multi-fidelity schemes to further reduce computational effort. We validate our approach on standard benchmarks, demonstrating its effectiveness in optimizing parameters highlighting better performance compared to existing methods.
    摘要 <>转换文本到简化中文。<>模拟器参数的Constrained优化在设计过程中发挥关键作用。这些问题在模拟器是随机的、计算成本高的和参数空间高维的情况下变得挑战性更大。只有通过利用参数与之关系的梯度来进行优化,但这些梯度在许多传统的黑盒代码中不可见。我们介绍了Scout-Nd(随机Constrained优化for N维)算法,以解决上述问题,效率地估计梯度,减少梯度估计的噪声,并应用多级准则来进一步减少计算努力。我们对标准准例进行验证,示出了我们的方法在优化参数时的效果,比传统方法更好。

Modelling wildland fire burn severity in California using a spatial Super Learner approach

  • paper_url: http://arxiv.org/abs/2311.16187
  • repo_url: https://github.com/Nicholas-Simafranca/Super_Learner_Wild_Fire
  • paper_authors: Nicholas Simafranca, Bryant Willoughby, Erin O’Neil, Sophie Farr, Brian J Reich, Naomi Giertych, Margaret Johnson, Madeleine Pascolini-Campbell
  • for: 预测加州西部野火燃烧严重程度
  • methods: 使用机器学习模型,使用预 Feuer remotely sensed data 来预测火灾后燃烧严重程度
  • results: 比较标准线性回传方法,Super Learner 算法在所有组合中都表现出色,能够准确预测燃烧严重程度的主要驱动因素,包括绿色、高度和火灾天气变量,这些发现可以提供实际的几何资讯,帮助社区制定缓解措施,例如早期火灾探测系统、预火季节 vegetation 清理活动和紧急 Response 资源配置。
    Abstract Given the increasing prevalence of wildland fires in the Western US, there is a critical need to develop tools to understand and accurately predict burn severity. We develop a machine learning model to predict post-fire burn severity using pre-fire remotely sensed data. Hydrological, ecological, and topographical variables collected from four regions of California - the sites of the Kincade fire (2019), the CZU Lightning Complex fire (2020), the Windy fire (2021), and the KNP Fire (2021) - are used as predictors of the difference normalized burn ratio. We hypothesize that a Super Learner (SL) algorithm that accounts for spatial autocorrelation using Vecchia's Gaussian approximation will accurately model burn severity. In all combinations of test and training sets explored, the results of our model showed the SL algorithm outperformed standard Linear Regression methods. After fitting and verifying the performance of the SL model, we use interpretable machine learning tools to determine the main drivers of severe burn damage, including greenness, elevation and fire weather variables. These findings provide actionable insights that enable communities to strategize interventions, such as early fire detection systems, pre-fire season vegetation clearing activities, and resource allocation during emergency responses. When implemented, this model has the potential to minimize the loss of human life, property, resources, and ecosystems in California.
    摘要 due to the increasing prevalence of wildland fires in the Western US, there is a critical need to develop tools to understand and accurately predict burn severity. we develop a machine learning model to predict post-fire burn severity using pre-fire remotely sensed data. hydrological, ecological, and topographical variables collected from four regions of California - the sites of the Kincade fire (2019), the CZU Lightning Complex fire (2020), the Windy fire (2021), and the KNP Fire (2021) - are used as predictors of the difference normalized burn ratio. we hypothesize that a Super Learner (SL) algorithm that accounts for spatial autocorrelation using Vecchia's Gaussian approximation will accurately model burn severity. in all combinations of test and training sets explored, the results of our model showed the SL algorithm outperformed standard linear regression methods. after fitting and verifying the performance of the SL model, we use interpretable machine learning tools to determine the main drivers of severe burn damage, including greenness, elevation, and fire weather variables. these findings provide actionable insights that enable communities to strategize interventions, such as early fire detection systems, pre-fire season vegetation clearing activities, and resource allocation during emergency responses. when implemented, this model has the potential to minimize the loss of human life, property, resources, and ecosystems in California.Note: Please note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need the translation in Traditional Chinese, please let me know.

Speech-Based Blood Pressure Estimation with Enhanced Optimization and Incremental Clustering

  • paper_url: http://arxiv.org/abs/2311.15098
  • repo_url: None
  • paper_authors: Vaishali Rajput, Preeti Mulay, Rajeev Raje
  • for: 本研究旨在探讨血压测量的准确估算方法,以帮助诊断各种健康问题。
  • methods: 本研究采用机器学习和语音信号,并提出了一种基于归一化的强化学习策略,以提高血压测量的准确性。
  • results: 研究结果表明,combined outcome of these clustering techniques enables robust BP estimation,并且通过 интегрирование高级血压估算技术和 YouTube 视频的情感维度,本研究拓宽了我们对现代媒体环境对健康影响的理解。
    Abstract Blood Pressure (BP) estimation plays a pivotal role in diagnosing various health conditions, highlighting the need for innovative approaches to overcome conventional measurement challenges. Leveraging machine learning and speech signals, this study investigates accurate BP estimation with a focus on preprocessing, feature extraction, and real-time applications. An advanced clustering-based strategy, incorporating the k-means algorithm and the proposed Fact-Finding Instructor optimization algorithm, is introduced to enhance accuracy. The combined outcome of these clustering techniques enables robust BP estimation. Moreover, extending beyond these insights, this study delves into the dynamic realm of contemporary digital content consumption. Platforms like YouTube have emerged as influential spaces, presenting an array of videos that evoke diverse emotions. From heartwarming and amusing content to intense narratives, YouTube captures a spectrum of human experiences, influencing information access and emotional engagement. Within this context, this research investigates the interplay between YouTube videos and physiological responses, particularly Blood Pressure (BP) levels. By integrating advanced BP estimation techniques with the emotional dimensions of YouTube videos, this study enriches our understanding of how modern media environments intersect with health implications.
    摘要 血压(BP)估算在诊断各种健康状况中扮演着关键角色,高亮了需要采用创新的测量方法来解决传统测量挑战。利用机器学习和语音信号,本研究探讨了准确的BP估算方法,并将重点放在预处理、特征提取和实时应用方面。本研究提出了一种基于归一化策略的高级归一化策略,结合k-means算法和提出的Fact-Finding Instructor优化算法,以提高准确性。这些归一化策略的结合效果使得BP估算更加稳定。此外,本研究还推广了这些视频的感知效果,探讨了YouTube视频平台上的视频内容对血压水平的影响。通过将高级BP估算技术与YouTube视频的情感维度结合起来,本研究扩展了我们对现代媒体环境与健康影响的理解。

AugmentTRAJ: A framework for point-based trajectory data augmentation

  • paper_url: http://arxiv.org/abs/2311.15097
  • repo_url: None
  • paper_authors: Yaksh J Haranwala
  • For: The paper is written for researchers and practitioners working with mobility data analysis, particularly those interested in leveraging data augmentation techniques to improve the performance and generalization of their models.* Methods: The paper introduces AugmenTRAJ, an open-source Python3 framework designed for trajectory data augmentation. The framework offers a variety of data augmentation techniques, including point-wise augmentation, to generate synthetic trajectories that preserve the inherent characteristics of the original data.* Results: The paper showcases the effectiveness of AugmenTRAJ in enhancing the performance and generalization of mobility data analysis models. The framework is found to be reliable and versatile, providing researchers with a practical tool for augmenting trajectory data and expanding the potential applications of data augmentation in this domain.
    Abstract Data augmentation has emerged as a powerful technique in machine learning, strengthening model robustness while mitigating overfitting and under-fitting issues by generating diverse synthetic data. Nevertheless, despite its success in other domains, data augmentation's potential remains largely untapped in mobility data analysis, primarily due to the intricate nature and unique format of trajectory data. Additionally, there is a lack of frameworks capable of point-wise data augmentation, which can reliably generate synthetic trajectories while preserving the inherent characteristics of the original data. To address these challenges, this research introduces AugmenTRAJ, an open-source Python3 framework designed explicitly for trajectory data augmentation. AugmenTRAJ offers a reliable and well-controlled approach for generating synthetic trajectories, thereby enabling the harnessing of data augmentation benefits in mobility analysis. This thesis presents a comprehensive overview of the methodologies employed in developing AugmenTRAJ and showcases the various data augmentation techniques available within the framework. AugmenTRAJ opens new possibilities for enhancing mobility data analysis models' performance and generalization capabilities by providing researchers with a practical and versatile tool for augmenting trajectory data, Its user-friendly implementation in Python3 facilitates easy integration into existing workflows, offering the community an accessible resource to leverage the full potential of data augmentation in trajectory-based applications.
    摘要 <>TRANSLATE_TEXT大数据增强技术在机器学习中得到广泛应用,提高模型的鲁棒性,同时避免过拟合和下适应问题,通过生成多样化的 sintetic 数据。然而,在其他领域的成功不withstanding,数据增强在行动数据分析中的潜力仍然未得到充分开发利用,主要是因为行动数据的特殊性和 Format。此外,Point-wise 数据增强的框架缺乏,这些框架可以可靠地生成 sintetic 行动轨迹,同时保持原始数据的基本特征。为解决这些挑战,这项研究提出了 AugmenTRAJ,一个开源的 Python3 框架,专门用于行动数据增强。AugmenTRAJ 提供了一种可靠和受控的 sintetic 行动轨迹生成方法,使得行动数据分析模型的性能和泛化能力得到进一步提高。这份论文提供了开发 AugmenTRAJ 的方法和技术的全面概述,并展示了框架中的多种数据增强技术。AugmenTRAJ 为研究人员提供了一个实用和灵活的工具,可以帮助他们在行动数据分析中更好地利用数据增强的优势。其Python3 实现的易用性使得它可以轻松地与现有的 workflow 集成,为社区提供了一个可访问的资源,以便在行动数据分析中充分发挥数据增强的作用。<>

Where2Start: Leveraging initial States for Robust and Sample-Efficient Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.15089
  • repo_url: None
  • paper_authors: Pouya Parsa, Raoof Zare Moayedi, Mohammad Bornosi, Mohammad Mahdi Bejani
  • for: The paper aims to improve the performance of reinforcement learning agents by leveraging the knowledge captured by trajectories.
  • methods: The proposed Where2Start algorithm selects the initial state in a way that maximizes the instability of the agent in the vicinity of that state, leading to a decrease in the number of trajectories needed for the agent to reach an acceptable reward.
  • results: The experiments show that Where2Start can improve sample efficiency up to 8 times and can be combined with most state-of-the-art algorithms to improve robustness and sample efficiency significantly.
    Abstract The reinforcement learning algorithms that focus on how to compute the gradient and choose next actions, are effectively improved the performance of the agents. However, these algorithms are environment-agnostic. This means that the algorithms did not use the knowledge that has been captured by trajectory. This poses that the algorithms should sample many trajectories to train the model. By considering the essence of environment and how much the agent learn from each scenario in that environment, the strategy of the learning procedure can be changed. The strategy retrieves more informative trajectories, so the agent can learn with fewer trajectory sample. We propose Where2Start algorithm that selects the initial state so that the agent has more instability in vicinity of that state. We show that this kind of selection decreases number of trajectories that should be sampled that the agent reach to acceptable reward. Our experiments shows that Where2Start can improve sample efficiency up to 8 times. Also Where2Start can combined with most of state-of-the-art algorithms and improve that robustness and sample efficiency significantly.
    摘要 эти reinforcement learning 算法,关注计算梯度和选择下一步行为,有效地提高了代理人的性能。然而,这些算法是环境无关的,这意味着这些算法不使用路径中所捕捉的知识。这会导致算法需要采样多个路径来训练模型。通过考虑环境的本质和代理人在那里学习的程度,我们可以更改学习策略。我们提出了 Where2Start 算法,该算法选择初始状态,使得代理人在该状态附近有更大的不稳定性。我们展示了这种选择可以降低代理人需要采样的轨迹数量,以达到acceptable reward。我们的实验表明,Where2Start 可以提高样本效率,达到8倍以上。此外,Where2Start 可以与大多数当前的state-of-the-art算法结合使用,并在robustness和样本效率方面提供显著改善。

A GPU-based Hydrodynamic Simulator with Boid Interactions

  • paper_url: http://arxiv.org/abs/2311.15088
  • repo_url: https://github.com/xi-liu-cs/water
  • paper_authors: Xi Liu, Gizem Kayar, Ken Perlin
  • for: simulations of virtual agent behaviors and navigation inside a smoothed particle hydrodynamical (SPH) fluid environment
  • methods: GPU compute shaders of DirectX, parallel smoothed particle hydrodynamics model, distributed boid model, surface reconstruction using marching cubes algorithm
  • results: real-time water mesh surface reconstruction, interaction between SPH and virtual boid agents, versatility for underwater navigation and remote control engineering purposes
    Abstract We present a hydrodynamic simulation system using the GPU compute shaders of DirectX for simulating virtual agent behaviors and navigation inside a smoothed particle hydrodynamical (SPH) fluid environment with real-time water mesh surface reconstruction. The current SPH literature includes interactions between SPH and heterogeneous meshes but seldom involves interactions between SPH and virtual boid agents. The contribution of the system lies in the combination of the parallel smoothed particle hydrodynamics model with the distributed boid model of virtual agents to enable agents to interact with fluids. The agents based on the boid algorithm influence the motion of SPH fluid particles, and the forces from the SPH algorithm affect the movement of the boids. To enable realistic fluid rendering and simulation in a particle-based system, it is essential to construct a mesh from the particle attributes. Our system also contributes to the surface reconstruction aspect of the pipeline, in which we performed a set of experiments with the parallel marching cubes algorithm per frame for constructing the mesh from the fluid particles in a real-time compute and memory-intensive application, producing a wide range of triangle configurations. We also demonstrate that our system is versatile enough for reinforced robotic agents instead of boid agents to interact with the fluid environment for underwater navigation and remote control engineering purposes.
    摘要 我们提出了基于 DirectX GPU compute shaders 的 hydrodynamic simulation 系统,用于虚拟代理行为和在流体环境中的导航。现有的 SPH 文献中有些包括 SPH 和不同的网格之间的交互,但几乎不包括 SPH 和虚拟代理之间的交互。我们的系统的贡献在于将平行简化particle hydrodynamics模型与分布式 boid 模型结合起来,使代理能够与流体交互。基于 boid 算法的代理影响流体粒子的运动,而流体粒子的力也影响代理的移动。为实现真实的流体渲染和模拟,在 particle-based 系统中构建一个网格是非常重要的。我们的系统还贡献了 surface reconstruction 方面的研究,在每帧使用并行的 marching cubes 算法来从流体粒子中构建网格,生成了一系列 triangle 配置。我们还示出了我们的系统可以用于水下导航和远程控制工程应用,并且可以使用 reinforced robotic agents 代替 boid agents 与流体环境交互。

Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study

  • paper_url: http://arxiv.org/abs/2311.15051
  • repo_url: None
  • paper_authors: Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun
  • for: 这篇论文的目的是为了提供有关梯度下降with momentum的更深刻的理解,并证明其在现代深度学习中的应用。
  • methods: 这篇论文使用了大量的学习率和学习率温存,并通过实验和理论的分析来证明梯度下降with momentum在训练过程中的效果。
  • results: 研究发现,使用大量的学习率和学习率温存的梯度下降with momentum可以使模型迅速 converges to flatter minima,比传统的梯度下降更好。此外,研究还提供了理论 intuition,表明梯度下降with momentum可以通过“增强”自身稳定效果来解释这种现象。
    Abstract Although gradient descent with momentum is widely used in modern deep learning, a concrete understanding of its effects on the training trajectory still remains elusive. In this work, we empirically show that momentum gradient descent with a large learning rate and learning rate warmup displays large catapults, driving the iterates towards flatter minima than those found by gradient descent. We then provide empirical evidence and theoretical intuition that the large catapult is caused by momentum "amplifying" the self-stabilization effect (Damian et al., 2023).
    摘要

Training a Hopfield Variational Autoencoder with Equilibrium Propagation

  • paper_url: http://arxiv.org/abs/2311.15047
  • repo_url: None
  • paper_authors: Tom Van Der Meersch, Johannes Deleu, Thomas Demeester
  • for: 这篇论文主要用于推广Equilibrium Propagation在生成AI中的应用。
  • methods: 这篇论文使用Equilibrium Propagation法则来训练一个变量自动编码器(VAE),并利用套管网络的对称性,将Encoder和Decoder合并为一个模型,从而减少了VAE实现所需的芯片大小。
  • results: 该研究表明,通过使用Equilibrium Propagation法则和套管网络的合并模型,可以有效地降低VAE的计算成本,并且保持模型的性能。
    Abstract On dedicated analog hardware, equilibrium propagation is an energy-efficient alternative to backpropagation. In spite of its theoretical guarantees, its application in the AI domain remains limited to the discriminative setting. Meanwhile, despite its high computational demands, generative AI is on the rise. In this paper, we demonstrate the application of Equilibrium Propagation in training a variational autoencoder (VAE) for generative modeling. Leveraging the symmetric nature of Hopfield networks, we propose using a single model to serve as both the encoder and decoder which could effectively halve the required chip size for VAE implementations, paving the way for more efficient analog hardware configurations.
    摘要 在专门的分析逻辑硬件上,均衡传播是一种能效的替代方案,它可以在AI领域中提高能效性。尽管它具有理论保证,但在推理设置中的应用仍然受限。而在这篇论文中,我们展示了使用均衡传播来训练一个变量自动编码器(VAE) для生成模型。利用惯性网络的对称性,我们提议使用单个模型作为编码器和解码器,这可以减少VAE实现所需的芯片大小,并且开创了更有效的分析硬件配置的道路。

Satellite-based feature extraction and multivariate time-series prediction of biotoxin contamination in shellfish

  • paper_url: http://arxiv.org/abs/2311.15000
  • repo_url: https://github.com/zeniSoida/pl1
  • paper_authors: Sergio Tavares, Pedro R. Costa, Ludwig Krippahl, Marta B. Lopes
    for: 这个研究的目的是evaluate the integration of satellite data in forecasting models for predicting toxin concentrations in shellfish given forecasting horizons up to four weeks.methods: 这个研究使用了Sentinel-3 satellite imagery for marine surveillance, along with shellfish biotoxin contamination data from various production areas along Portugal’s western coastline, collected by Portuguese official control. Unsupervised feature extraction was performed using autoencoders able to handle non-valid pixels caused by factors like cloud cover, land, or anomalies. Several Artificial Neural Networks models were applied to compare univariate (contamination only) and multivariate (contamination and satellite data) time-series forecasting.results: 研究发现,包含这些特征可以提高预测,特别是在lagoon production areas (RIAV) 和L5B area (oceanic) 的1-week和2-week前 horizon。这种方法可以充分利用高维数据源如Remote Sensing的信息,而不会影响预测模型的准确性。
    Abstract Shellfish production constitutes an important sector for the economy of many Portuguese coastal regions, yet the challenge of shellfish biotoxin contamination poses both public health concerns and significant economic risks. Thus, predicting shellfish contamination levels holds great potential for enhancing production management and safeguarding public health. In our study, we utilize a dataset with years of Sentinel-3 satellite imagery for marine surveillance, along with shellfish biotoxin contamination data from various production areas along Portugal's western coastline, collected by Portuguese official control. Our goal is to evaluate the integration of satellite data in forecasting models for predicting toxin concentrations in shellfish given forecasting horizons up to four weeks, which implies extracting a small set of useful features and assessing their impact on the predictive models. We framed this challenge as a time-series forecasting problem, leveraging historical contamination levels and satellite images for designated areas. While contamination measurements occurred weekly, satellite images were accessible multiple times per week. Unsupervised feature extraction was performed using autoencoders able to handle non-valid pixels caused by factors like cloud cover, land, or anomalies. Finally, several Artificial Neural Networks models were applied to compare univariate (contamination only) and multivariate (contamination and satellite data) time-series forecasting. Our findings show that incorporating these features enhances predictions, especially beyond one week in lagoon production areas (RIAV) and for the 1-week and 2-week horizons in the L5B area (oceanic). The methodology shows the feasibility of integrating information from a high-dimensional data source like remote sensing without compromising the model's predictive ability.
    摘要 欧洲南部地区著名的贝壳生产业对当地经济起着重要作用,但贝壳生物毒素污染会对公众健康和经济造成重大风险。因此,预测贝壳毒素污染水平具有很大的潜在价值,以提高生产管理和保护公众健康。在我们的研究中,我们使用了多年的Sentinel-3卫星图像数据,以及从葡萄牙西海岸不同生产区收集的贝壳毒素污染数据,并通过葡萄牙官方监测数据来验证我们的模型。我们的目标是评估在四周前的预测模型,以预测贝壳毒素污染水平,并从事件序列预测问题出发,利用历史污染水平和卫星图像数据来预测贝壳毒素污染水平。尽管污染测量每周进行一次,但卫星图像可以在每周多次获取。我们使用自动Encoder来处理非有效像素,以避免因云层、陆地或异常而导致的问题。最后,我们应用了多个人工神经网络模型,以比较单variate(污染水平)和多variate(污染水平和卫星数据)时间序列预测。我们的结果显示,将这些特征集成到预测模型中可以提高预测精度,特别是在里亚瓦(RIAV)和L5B区域(海洋)的1周和2周预测水平。我们的方法表明,可以不COMPROMISE模型预测能力来集成高维数据源如远程感知数据。

Eliminating Domain Bias for Federated Learning in Representation Space

  • paper_url: http://arxiv.org/abs/2311.14975
  • repo_url: https://github.com/tsingz0/dbe
  • paper_authors: Jianqing Zhang, Yang Hua, Jian Cao, Hao Wang, Tao Song, Zhengui Xue, Ruhui Ma, Haibing Guan
  • for: 这个论文的目的是提出一个缩小领域差异的框架,以解决在联合学习中发生的表现偏袋现象。
  • methods: 这个框架使用了一种称为� Domain Bias Eliminator(DBE)的方法,它可以在联合学习中将供应商和服务器的领域差异优化,以提高知识的传递和个性化能力。
  • results: 实验结果显示,使用DBE的联合学习方法可以大幅提高已有的个性化联合学习方法的一致性和个性化能力,并且可以大幅超越了现有的个性化联合学习方法。
    Abstract Recently, federated learning (FL) is popular for its privacy-preserving and collaborative learning abilities. However, under statistically heterogeneous scenarios, we observe that biased data domains on clients cause a representation bias phenomenon and further degenerate generic representations during local training, i.e., the representation degeneration phenomenon. To address these issues, we propose a general framework Domain Bias Eliminator (DBE) for FL. Our theoretical analysis reveals that DBE can promote bi-directional knowledge transfer between server and client, as it reduces the domain discrepancy between server and client in representation space. Besides, extensive experiments on four datasets show that DBE can greatly improve existing FL methods in both generalization and personalization abilities. The DBE-equipped FL method can outperform ten state-of-the-art personalized FL methods by a large margin. Our code is public at https://github.com/TsingZ0/DBE.
    摘要 最近,联合学习(FL)在隐私保护和合作学习能力方面受到欢迎。然而,在统计上不均衡的场景下,我们发现客户端上的数据领域偏好导致代表性偏衡现象,从而降低了服务器和客户端之间的普适表示能力。为解决这些问题,我们提出了一个通用框架域偏衡纠正器(DBE) для FL。我们的理论分析表明,DBE可以推动服务器和客户端之间的双向知识传递,因为它减少了服务器和客户端之间的领域差异在表示空间中。此外,我们在四个数据集上进行了广泛的实验,发现 DBE 可以大幅提高现有 FL 方法的普适和个性化能力。DBE 配置的 FL 方法可以在普适和个性化能力方面超越了十个当前顶尖个性化 FL 方法。我们的代码可以在 上找到。

Selective Inference for Changepoint detection by Recurrent Neural Network

  • paper_url: http://arxiv.org/abs/2311.14964
  • repo_url: https://github.com/shirara1016/si_for_cpd_by_rnn
  • paper_authors: Tomohiro Shiraishi, Daiki Miwa, Vo Nguyen Le Duy, Ichiro Takeuchi
  • for: 这个研究的主要目标是使用循环神经网络(RNN)来评估时间序列中检测到的变化点(CP)的统计可靠性。
  • methods: 这个研究使用了Selective Inference(SI)框架,通过conditioning on the event of hypothesis selection来避免选择偏见。
  • results: 研究通过人工和实际数据实验表明,提出的方法可以有效地控制 false detection 的风险,并且可以在时间序列中检测到复杂动态的变化点。
    Abstract In this study, we investigate the quantification of the statistical reliability of detected change points (CPs) in time series using a Recurrent Neural Network (RNN). Thanks to its flexibility, RNN holds the potential to effectively identify CPs in time series characterized by complex dynamics. However, there is an increased risk of erroneously detecting random noise fluctuations as CPs. The primary goal of this study is to rigorously control the risk of false detections by providing theoretically valid p-values to the CPs detected by RNN. To achieve this, we introduce a novel method based on the framework of Selective Inference (SI). SI enables valid inferences by conditioning on the event of hypothesis selection, thus mitigating selection bias. In this study, we apply SI framework to RNN-based CP detection, where characterizing the complex process of RNN selecting CPs is our main technical challenge. We demonstrate the validity and effectiveness of the proposed method through artificial and real data experiments.
    摘要 在这种研究中,我们研究了使用循环神经网络(RNN)来评估时间序列中检测到的变化点(CP)的统计可靠性。由于RNN的灵活性,它有可能有效地识别时间序列中的复杂动态中的CP。然而,随机噪声波动的风险增加了false检测的风险。我们的主要目标是通过提供有效的p值来控制false检测的风险。为此,我们提出了基于选择性推理(SI)框架的一种新方法。SI允许有效的推理,通过对假设选择事件进行条件,从而减少选择偏见。在这种研究中,我们将SI框架应用于RNN基于CP检测,并且我们的主要技术挑战是characterizing RNN在选择CP时的复杂过程。我们通过人工和实际数据实验 validate the effectiveness and validity of our proposed method。

Identification of morphological fingerprint in perinatal brains using quasi-conformal mapping and contrastive learning

  • paper_url: http://arxiv.org/abs/2311.14955
  • repo_url: None
  • paper_authors: Boyang Wang, Weihao Zheng, Ying Wang, Zhe Zhang, Yuchen Sheng, Minmin Wang
  • for: 这个研究旨在确定新生儿脑中的个体特征是否存在,以及哪些 morphological attributes 或 cortical regions 更好地特征个体差异。
  • methods: 该研究使用了深度学习框架,将三维球体的三个 morphological features( cortical thickness、mean curvature 和 sulcal depth)Project onto 二维平面 through quasi-conformal mapping,并使用了 ResNet18 和 contrastive learning 进行个体识别。
  • results: 研究使用了 682 名新生儿的横截面 estructural MRI 数据,并通过数据扩展和参数调整来训练模型。模型在 30 名 longitudinal scanned infant 数据上进行验证,实现了remarkable Top1 和 Top5 准确率为 71.37% 和 84.10%, respectively。感觉和视觉 cortices 被识别为个体识别中最重要的区域。此外,折叠 morphology 显示出更高的分类能力,可能作为新生儿脑中的 morphological fingerprint。
    Abstract The morphological fingerprint in the brain is capable of identifying the uniqueness of an individual. However, whether such individual patterns are present in perinatal brains, and which morphological attributes or cortical regions better characterize the individual differences of ne-onates remain unclear. In this study, we proposed a deep learning framework that projected three-dimensional spherical meshes of three morphological features (i.e., cortical thickness, mean curvature, and sulcal depth) onto two-dimensional planes through quasi-conformal mapping, and employed the ResNet18 and contrastive learning for individual identification. We used the cross-sectional structural MRI data of 682 infants, incorporating with data augmentation, to train the model and fine-tuned the parameters based on 60 infants who had longitudinal scans. The model was validated on 30 longitudinal scanned infant data, and remarkable Top1 and Top5 accuracies of 71.37% and 84.10% were achieved, respectively. The sensorimotor and visual cortices were recognized as the most contributive regions in individual identification. Moreover, the folding morphology demonstrated greater discriminative capability than the cortical thickness, which could serve as the morphological fingerprint in perinatal brains. These findings provided evidence for the emergence of morphological fingerprints in the brain at the beginning of the third trimester, which may hold promising implications for understanding the formation of in-dividual uniqueness in the brain during early development.
    摘要 Brain中的形态指纹可以识别个体的唯一性。然而,在生前脑中是否存在这样的个体特征,以及哪些形态特征或 cortical 区域更好地描述新生儿的唯一性,还未得到清楚的答案。在这项研究中,我们提出了一种深度学习框架,将三维球面的三个形态特征(即 cortical thickness、mean curvature 和 sulcal depth)映射到二维平面上,并使用 ResNet18 和对比学习来实现个体识别。我们使用了682名新生儿的cross-sectional structural MRI数据,并在数据增强后,根据60名新生儿的长期扫描数据进行参数调整。模型被证明在30名长期扫描新生儿数据上的 Validation 中具有remarkable Top1 和 Top5 准确率为71.37% 和 84.10%,分别。感知动作和视觉 cortices 被识别为个体识别中最重要的区域。此外,折叠 morphology 表现出更高的分类能力,可能作为新生儿脑中的形态指纹。这些发现提供了脑开始第三 trimester 时的形态指纹的诞生证据,可能对早期脑发展中个体独特性的形成产生深远的影响。

Robust Graph Neural Networks via Unbiased Aggregation

  • paper_url: http://arxiv.org/abs/2311.14934
  • repo_url: None
  • paper_authors: Ruiqi Feng, Zhichao Hou, Tyler Derr, Xiaorui Liu
  • for: 本研究探讨了图 neural network (GNN) 的 adversarial robustness 问题,并提供了一种简单且有效的图信号估计器。
  • methods: 本研究使用了代表性的Robust GNNs,并提供了一种约新颖iterative reweighted least squares算法来解决估计问题,这种算法具有理论上的收敛保证。
  • results: 实验表明,提议的模型具有强大的Robustness,而ablation study还提供了深入的理解其优势。
    Abstract The adversarial robustness of Graph Neural Networks (GNNs) has been questioned due to the false sense of security uncovered by strong adaptive attacks despite the existence of numerous defenses. In this work, we delve into the robustness analysis of representative robust GNNs and provide a unified robust estimation point of view to understand their robustness and limitations. Our novel analysis of estimation bias motivates the design of a robust and unbiased graph signal estimator. We then develop an efficient Quasi-Newton iterative reweighted least squares algorithm to solve the estimation problem, which unfolds as robust unbiased aggregation layers in GNNs with a theoretical convergence guarantee. Our comprehensive experiments confirm the strong robustness of our proposed model, and the ablation study provides a deep understanding of its advantages.
    摘要 Graph Neural Networks (GNNs) 的对抗Robustness 被质疑,因为强大的 adaptive 攻击破坏了许多防御措施的假保障。在这项工作中,我们进行了代表性的 GNNs robustness 分析,并提供了一种统一的 robust 估计视角来理解它们的强度和局限性。我们的新的估计偏见分析motivates 了设计一种强健和无偏的图信号估计器。然后,我们开发了一种高效的 quasi-Newton 迭代最小二乘算法来解决估计问题,它在 GNNs 中展开为一种robust 不偏的聚合层,并有理论上的收敛保证。我们的广泛的实验证明了我们提出的模型的强大对抗能力,并且ablation 研究提供了深入的理解。

One-Shot Transfer Learning for Nonlinear ODEs

  • paper_url: http://arxiv.org/abs/2311.14931
  • repo_url: None
  • paper_authors: Wanzhou Lei, Pavlos Protopapas, Joy Parikh
  • for: 解决非线性偏微分方程(ODE)中的单个多项式项,使用物理 Informed Neural Networks(PINNs)。
  • methods: combining perturbation method和一次转移学习,将非线性ODE转化为线性ODE系统,使用PINN训练并提供新实例的闭合形解。
  • results: 在杜卷方程中展示了方法的有效性,并建议其适用于类似结构的PDE和ODE系统。
    Abstract We introduce a generalizable approach that combines perturbation method and one-shot transfer learning to solve nonlinear ODEs with a single polynomial term, using Physics-Informed Neural Networks (PINNs). Our method transforms non-linear ODEs into linear ODE systems, trains a PINN across varied conditions, and offers a closed-form solution for new instances within the same non-linear ODE class. We demonstrate the effectiveness of this approach on the Duffing equation and suggest its applicability to similarly structured PDEs and ODE systems.
    摘要 我们提出一种通用的方法,将扰动方法和一次转移学习结合用于解决非线性偏微分方程(ODE),使用物理学 Informed Neural Networks(PINN)。我们的方法将非线性ODE转化为线性ODE系统,训练PINN在不同条件下,并提供新的实例的关闭形解。我们在杜勃方程中进行了示范,并建议其适用于类似结构的偏微分方程和ODE系统。

A latent linear model for nonlinear coupled oscillators on graphs

  • paper_url: http://arxiv.org/abs/2311.14910
  • repo_url: None
  • paper_authors: Agam Goyal, Zhaoxing Wu, Richard P. Yim, Binhao Chen, Zihong Xu, Hanbaek Lyu
  • for: 这篇论文旨在研究在任意图上的带有相互同步倾向的oscillator系统,该系统可能会在整个图上显示非线性行为。
  • methods: 作者使用了一种基于监督矩阵分解的方法,来学习这些oscillator系统的latent dynamics filters,以Linearize非线性行为。
  • results: 作者发现,可以通过将subgraph-level的动态分解成一些基本的元素动态模式,来预测整个图上的同步状态。这种方法可以与基准和黑盒分类算法竞争,即使其架构简单明了。
    Abstract A system of coupled oscillators on an arbitrary graph is locally driven by the tendency to mutual synchronization between nearby oscillators, but can and often exhibit nonlinear behavior on the whole graph. Understanding such nonlinear behavior has been a key challenge in predicting whether all oscillators in such a system will eventually synchronize. In this paper, we demonstrate that, surprisingly, such nonlinear behavior of coupled oscillators can be effectively linearized in certain latent dynamic spaces. The key insight is that there is a small number of `latent dynamics filters', each with a specific association with synchronizing and non-synchronizing dynamics on subgraphs so that any observed dynamics on subgraphs can be approximated by a suitable linear combination of such elementary dynamic patterns. Taking an ensemble of subgraph-level predictions provides an interpretable predictor for whether the system on the whole graph reaches global synchronization. We propose algorithms based on supervised matrix factorization to learn such latent dynamics filters. We demonstrate that our method performs competitively in synchronization prediction tasks against baselines and black-box classification algorithms, despite its simple and interpretable architecture.
    摘要 翻译结果:系统中的各自振荡器受到邻近振荡器的同步倾向的本地驱动,但可能在整个图上表现出非线性行为。了解这种非线性行为是预测系统是否会全局同步的关键挑战。在这篇论文中,我们发现有一些“隐藏动态空间”,其中每个空间具有特定的同步和不同步动态特征,并且可以将图上的任何动态 aproximated为这些基本动态模式的线性组合。通过 ensemble 的 subgraph 级别预测,可以获得可理解的预测器,以确定整个图上是否会达到全局同步。我们基于supervised matrix factorization提出了一种学习这些隐藏动态空间的算法。我们示出了我们的方法在同步预测任务中与基准和黑obox分类算法相比,具有竞争力,即使其架构简单明了。

mvlearnR and Shiny App for multiview learning

  • paper_url: http://arxiv.org/abs/2311.16181
  • repo_url: https://github.com/lasandrall/mvlearnr
  • paper_authors: Elise F. Palzer, Sandra E. Safo
  • for: 这个论文是为了开发一个名为mvlearnR的R packages,用于集成多个数据源或视图或模式(例如 genomics、proteomics、临床和人口数据)。
  • methods: 这个包使用统计学和机器学习方法,以及图形工具,提供了一个便捷的数据集成工作流程。
  • results: 这个方法有potential用于提供复杂疾病机制的更深入的理解。Here are the three points in Simplified Chinese text:
  • for: 这个论文是为了开发一个名为mvlearnR的R包,用于集成多个数据源或视图或模式(例如 genomics、proteomics、临床和人口数据)。
  • methods: 这个包使用统计学和机器学习方法,以及图形工具,提供了一个便捷的数据集成工作流程。
  • results: 这个方法有potential用于提供复杂疾病机制的更深入的理解。
    Abstract The package mvlearnR and accompanying Shiny App is intended for integrating data from multiple sources or views or modalities (e.g. genomics, proteomics, clinical and demographic data). Most existing software packages for multiview learning are decentralized and offer limited capabilities, making it difficult for users to perform comprehensive integrative analysis. The new package wraps statistical and machine learning methods and graphical tools, providing a convenient and easy data integration workflow. For users with limited programming language, we provide a Shiny Application to facilitate data integration anywhere and on any device. The methods have potential to offer deeper insights into complex disease mechanisms. Availability and Implementation: mvlearnR is available from the following GitHub repository: https://github.com/lasandrall/mvlearnR. The web application is hosted on shinyapps.io and available at: https://multi-viewlearn.shinyapps.io/MultiView_Modeling/
    摘要 package mvlearnR 和附加的 Shiny App 用于 integrating 数据从多个源或视图或模式(例如 genomics, proteomics, clinical 和 demographic 数据)。大多数现有的软件包 для多视图学习是分散的,提供有限的能力,使用者难以进行全面的 integrative 分析。新包 wrapping 统计和机器学习方法和图形工具,提供一个便捷的数据集成 workflow。对不具备编程语言技能的用户,我们提供了一个 Shiny 应用程序,以便在任何设备上进行数据集成。这些方法具有可能为复杂疾病机理提供更深入的理解。可用性和实施:mvlearnR 可以从以下 GitHub 仓库获取:https://github.com/lasandrall/mvlearnR。 web 应用程序被HOST在 shinyapps.io 上,可以通过以下链接访问:https://multi-viewlearn.shinyapps.io/MultiView_Modeling/

Support Vector Machine Implementation on MPI-CUDA and Tensorflow Framework

  • paper_url: http://arxiv.org/abs/2311.14908
  • repo_url: None
  • paper_authors: Islam Elgarhy
  • for: 本文旨在比较SVM算法在不同并行架构框架下的性能,以找到可以减少SVM算法解决 quadratic programming 优化问题所需的高计算成本的解决方案。
  • methods: 本文使用了多种并行架构,包括多核CPU和高可扩展的GPU,来加速SVM算法的计算。
  • results: 实验结果表明,使用MPI-CUDA实现的SVM算法在不同数据集上实现了速度提高,而TensorFlow实现提供了跨平台解决方案,可以快速移植到其他硬件组件。
    Abstract Support Vector Machine (SVM) algorithm requires a high computational cost (both in memory and time) to solve a complex quadratic programming (QP) optimization problem during the training process. Consequently, SVM necessitates high computing hardware capabilities. The central processing unit (CPU) clock frequency cannot be increased due to physical limitations in the miniaturization process. However, the potential of parallel multi-architecture, available in both multi-core CPUs and highly scalable GPUs, emerges as a promising solution to enhance algorithm performance. Therefore, there is an opportunity to reduce the high computational time required by SVM for solving the QP optimization problem. This paper presents a comparative study that implements the SVM algorithm on different parallel architecture frameworks. The experimental results show that SVM MPI-CUDA implementation achieves a speedup over SVM TensorFlow implementation on different datasets. Moreover, SVM TensorFlow implementation provides a cross-platform solution that can be migrated to alternative hardware components, which will reduces the development time.
    摘要 支持向量机 (SVM) 算法需要高度的计算成本( Both 内存和时间)来解决复杂的quadratic programming (QP) 优化问题 durante el entrenamiento proces. 因此,SVM 需要高度的计算硬件能力。 CPU 频率不能增加 due to physical limitations in the miniaturization process. 然而,可用的并行多架构,包括多核 CPU 和高可扩展 GPU, emerges as a promising solution to enhance algorithm performance. 因此,there is an opportunity to reduce the high computational time required by SVM for solving the QP optimization problem. This paper presents a comparative study that implements the SVM algorithm on different parallel architecture frameworks. The experimental results show that SVM MPI-CUDA implementation achieves a speedup over SVM TensorFlow implementation on different datasets. Moreover, SVM TensorFlow implementation provides a cross-platform solution that can be migrated to alternative hardware components, which will reduce the development time.Note: I've kept the original sentence structure and vocabulary as much as possible, but I've made some adjustments to make it more idiomatic and natural in Simplified Chinese.

LLM-Assisted Code Cleaning For Training Accurate Code Generators

  • paper_url: http://arxiv.org/abs/2311.14904
  • repo_url: None
  • paper_authors: Naman Jain, Tianjun Zhang, Wei-Lin Chiang, Joseph E. Gonzalez, Koushik Sen, Ion Stoica
  • for: 本研究旨在提高代码生成系统的性能,通过改善数据质量来提高模型的可读性和结构化性。
  • methods: 我们采用了一种新的数据清洁管道,通过重命名变量、分解复杂代码为更小的帮助子函数、并通过 LLM 基于的转换插入自然语言计划来改善代码质量。
  • results: 我们在两个复杂的算法代码生成 benchmark 上评估了我们的方法,发现 Fine-tuning CodeLLaMa-7B 在我们修改后的模块化程序上进行了最多30%的提高,并且发现使用较小的高质量数据可以达到更好的性能,甚至超过了关闭源模型。
    Abstract Natural language to code generation is an important application area of LLMs and has received wide attention from the community. The majority of relevant studies have exclusively concentrated on increasing the quantity and functional correctness of training sets while disregarding other stylistic elements of programs. More recently, data quality has garnered a lot of interest and multiple works have showcased its importance for improving performance. In this work, we investigate data quality for code and find that making the code more structured and readable leads to improved code generation performance of the system. We build a novel data-cleaning pipeline that uses these principles to transform existing programs by 1.) renaming variables, 2.) modularizing and decomposing complex code into smaller helper sub-functions, and 3.) inserting natural-language based plans via LLM based transformations. We evaluate our approach on two challenging algorithmic code generation benchmarks and find that fine-tuning CodeLLaMa-7B on our transformed modularized programs improves the performance by up to 30% compared to fine-tuning on the original dataset. Additionally, we demonstrate improved performance from using a smaller amount of higher-quality data, finding that a model fine-tuned on the entire original dataset is outperformed by a model trained on 15% of our cleaned dataset. Even in comparison to closed-source models, our models outperform the much larger AlphaCoder models.
    摘要 自然语言到代码生成是LLM的重要应用领域,社区内部广泛关注。大多数相关研究均围绕增加训练集的量和功能正确性而围绕,而忽视其他程序的风格元素。在这种情况下,我们发现了代码质量的重要性,并展示了使代码更加结构化和可读性的改进代码生成系统的效果。我们构建了一个新的数据清洁管道,使用这些原则来转换现有的程序,包括:1)重命名变量,2)归并和分解复杂代码为小型帮助子函数,3)通过LLM基于语言模型的转换插入自然语言计划。我们在两个挑战性代码生成标准 benchmark 上评估了我们的方法,发现在我们对模块化程序进行微调CodeLLaMa-7B时,性能提高了30%。此外,我们还发现使用较少量但高质量数据进行微调,模型在15%的清洁数据集上进行微调比模型在原始数据集上进行微调更高性能。而与关闭源代码模型相比,我们的模型仍然表现出优异。

A unified framework for learning with nonlinear model classes from arbitrary linear samples

  • paper_url: http://arxiv.org/abs/2311.14886
  • repo_url: None
  • paper_authors: Ben Adcock, Juan M. Cardenas, Nick Dexter
  • for: 学习未知对象从特定模型类中学习
  • methods: 引入一个笔者统一框架,允许对象在任意希尔伯特空间,使用任意类型的随机线性测量数据和非线性模型类
  • results: 提供了一系列学习保证,其中包括对模型类的变化的研究,以及对各种已知问题的推广和改进In more detail, the paper is focused on the problem of learning an unknown object from training data using a given model class. The authors introduce a unified framework that allows for objects in arbitrary Hilbert spaces, general types of (random) linear measurements as training data, and general types of nonlinear model classes. They establish a series of learning guarantees for this framework, including explicit relations between the amount of training data and properties of the model class to ensure near-best generalization bounds. The paper also introduces the key notion of the variation of a model class with respect to a distribution of sampling operators, and demonstrates the versatility of the framework by showing that it can accommodate many different types of well-known problems of interest, including matrix sketching by random sampling, compressed sensing with isotropic vectors, active learning in regression, and compressed sensing with generative models.
    Abstract This work considers the fundamental problem of learning an unknown object from training data using a given model class. We introduce a unified framework that allows for objects in arbitrary Hilbert spaces, general types of (random) linear measurements as training data and general types of nonlinear model classes. We establish a series of learning guarantees for this framework. These guarantees provide explicit relations between the amount of training data and properties of the model class to ensure near-best generalization bounds. In doing so, we also introduce and develop the key notion of the variation of a model class with respect to a distribution of sampling operators. To exhibit the versatility of this framework, we show that it can accommodate many different types of well-known problems of interest. We present examples such as matrix sketching by random sampling, compressed sensing with isotropic vectors, active learning in regression and compressed sensing with generative models. In all cases, we show how known results become straightforward corollaries of our general learning guarantees. For compressed sensing with generative models, we also present a number of generalizations and improvements of recent results. In summary, our work not only introduces a unified way to study learning unknown objects from general types of data, but also establishes a series of general theoretical guarantees which consolidate and improve various known results.
    摘要 这个工作考虑了学习未知对象从培训数据中学习的基本问题,使用给定的模型类。我们提出了一个统一框架,允许对任意希尔伯特空间中的对象进行学习,并且支持random linear measurements作为培训数据,以及一般类型的非线性模型类。我们建立了一系列学习保证,这些保证提供了培训数据的量和模型类型之间的直接关系,以确保近似最佳泛化评估。在这过程中,我们还引入了关键的模型类变化概念,即对于分布 sampling 算子的变化。我们通过展示这个框架可以涵盖许多已知的问题,如矩阵笔记、压缩感知、活动学习、压缩感知和生成模型。在这些例子中,我们显示了如何使用我们的一般学习保证,从而得到已知结果的直接推论。此外,我们还对压缩感知与生成模型进行了一些扩展和改进。总之,我们的工作不仅提出了一个统一的方法来学习未知对象,还建立了一系列一般的理论保证,这些保证总结了和改进了许多已知的结果。

Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.14885
  • repo_url: None
  • paper_authors: Melrose Roderick, Gaurav Manek, Felix Berkenkamp, J. Zico Kolter
  • for: 实现稳定的离谱策略学习(Off-policy Reinforcement Learning),减少问题发生的分布差异问题。
  • methods: 提出一新的专案确量评估(Projected Off-Policy Q-Learning,POP-QL),融合重新权重的随机样本和策略范围的限制,以降低价值估计误差和传播误差。
  • results: 在标准参考 зада例中竞争性表现,并在资料收集策略是明显不理想的情况下表现出色。
    Abstract A key problem in off-policy Reinforcement Learning (RL) is the mismatch, or distribution shift, between the dataset and the distribution over states and actions visited by the learned policy. This problem is exacerbated in the fully offline setting. The main approach to correct this shift has been through importance sampling, which leads to high-variance gradients. Other approaches, such as conservatism or behavior-regularization, regularize the policy at the cost of performance. In this paper, we propose a new approach for stable off-policy Q-Learning. Our method, Projected Off-Policy Q-Learning (POP-QL), is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error. In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also out-performs competing methods in tasks where the data-collection policy is significantly sub-optimal.
    摘要 主要问题在完全离线RL中是状态和动作访问的分布差异,这个问题在完全离线设置中更加突出。主要方法是通过重要性抽样,导致高差值的梯度。其他方法,如保守性或行为规范,将策略正则化,但会导致性能下降。在这篇论文中,我们提出了一种稳定的离线Q学习方法。我们的方法,即投影式离线Q学习(POP-QL),是一种新的actor-critic算法,同时对离线样本进行重新权重并将策略限制,以防止偏离和降低价值近似错误。在我们的实验中,POP-QL不仅在标准准 benchmark上达到竞争性性能,而且在数据采集策略是非常低效的情况下也超过竞争方法。

eess.IV - 2023-11-25

Learning graph-Fourier spectra of textured surface images for defect localization

  • paper_url: http://arxiv.org/abs/2311.15082
  • repo_url: None
  • paper_authors: Tapan Ganatma Nakkina, Adithyaa Karthikeyan, Yuhao Zhong, Ceyhun Eksin, Satish T. S. Bukkapatnam
  • for: automatic detection of surface defects in highly textured backgrounds
  • methods: graph Fourier analysis and convolutional neural network (1D-CNN)
  • results: classification accuracy of 99.4% and explainable AI method using SHAP to analyze the trained 1D-CNN modelHere’s the full text in Simplified Chinese:
  • for: 本研究旨在自动检测生产过程中的表面问题,特别是在高度文化背景下。
  • methods: Graph Fourier分析和卷积神经网络(1D-CNN)。
  • results: 对于具有高度文化背景的图像进行自动检测,以获得99.4%的准确率和可解释的AI方法使用SHAP来分析训练好的1D-CNN模型,并发现低频率的graph傅立叶波形在精确地local化表面问题中发挥重要作用。
    Abstract In the realm of industrial manufacturing, product inspection remains a significant bottleneck, with only a small fraction of manufactured items undergoing inspection for surface defects. Advances in imaging systems and AI can allow automated full inspection of manufactured surfaces. However, even the most contemporary imaging and machine learning methods perform poorly for detecting defects in images with highly textured backgrounds, that stem from diverse manufacturing processes. This paper introduces an approach based on graph Fourier analysis to automatically identify defective images, as well as crucial graph Fourier coefficients that inform the defects in images amidst highly textured backgrounds. The approach capitalizes on the ability of graph representations to capture the complex dynamics inherent in high-dimensional data, preserving crucial locality properties in a lower dimensional space. A convolutional neural network model (1D-CNN) was trained with the coefficients of the graph Fourier transform of the images as the input to identify, with classification accuracy of 99.4%, if the image contains a defect. An explainable AI method using SHAP (SHapley Additive exPlanations) was used to further analyze the trained 1D-CNN model to discern important spectral coefficients for each image. This approach sheds light on the crucial contribution of low-frequency graph eigen waveforms to precisely localize surface defects in images, thereby advancing the realization of zero-defect manufacturing.
    摘要 在工业生产领域,产品检测仍然是一个重要的瓶颈,只有一小部分生产item进行表面检测。技术进步和人工智能可以自动检测全部生产表面的瑕疵。然而,even the most contemporary imaging and machine learning methods perform poorly for detecting defects in images with highly textured backgrounds, which stem from diverse manufacturing processes.这篇论文提出了一种基于图 Fourier分析的方法,可以自动识别瑕疵图像,以及图像中的关键图 Fourier 约束 coefficient。这种方法利用图表示法 capture高维数据中的复杂动态特性,保留图像中的重要地方性质。一种基于1D-CNN的卷积神经网络模型被训练使用图 Fourier 约束的图像 coefficients 作为输入,可以准确地判断图像是否包含瑕疵。使用 SHAP 方法进行解释的 AI 方法可以进一步分析训练好的 1D-CNN 模型,以便了解每个图像中关键的spectral coefficient。这种方法显示了低频图 eigen waveforms 在准确地Localiza表面瑕疵图像中的重要贡献,从而推进零瑕疵生产的实现。

A Novel Deep Clustering Framework for Fine-Scale Parcellation of Amygdala Using dMRI Tractography

  • paper_url: http://arxiv.org/abs/2311.14935
  • repo_url: None
  • paper_authors: Haolin He, Ce Zhu, Le Zhang, Yipeng Liu, Xiao Xu, Yuqian Chen, Leo Zekelman, Jarrett Rushmore, Yogesh Rathi, Nikos Makris, Lauren J. O’Donnell, Fan Zhang
  • for: 这个研究旨在提供一种自动化、精细分割腾脑囊的方法,以便更好地理解腾脑囊的结构功能相关性。
  • methods: 这个研究使用了 diffusion MRI tractography 技术,以便估算腾脑囊的白肽结构连接性,并使用深度学习方法进行精细分割腾脑囊。
  • results: 研究结果表明,使用提议的方法可以在腾脑囊中分割出九个独特的 parcels,这些 parcels 在不同的主题中具有good correspondence。
    Abstract The amygdala plays a vital role in emotional processing and exhibits structural diversity that necessitates fine-scale parcellation for a comprehensive understanding of its anatomico-functional correlations. Diffusion MRI tractography is an advanced imaging technique that can estimate the brain's white matter structural connectivity to potentially reveal the topography of the amygdala for studying its subdivisions. In this work, we present a deep clustering pipeline to perform automated, fine-scale parcellation of the amygdala using diffusion MRI tractography. First, we incorporate a newly proposed deep learning approach to enable accurate segmentation of the amygdala directly on the dMRI data. Next, we design a novel streamline clustering-based structural connectivity feature for a robust representation of voxels within the amygdala. Finally, we improve the popular joint dimensionality reduction and k-means clustering approach to enable amygdala parcellation at a finer scale. With the proposed method, we obtain nine unique amygdala parcels. Experiments show that these parcels can be consistently identified across subjects and have good correspondence to the widely used coarse-scale amygdala parcellation.
    摘要 《amygdala的多样性和其相关功能的研究》Introduction:The amygdala is a crucial structure involved in emotional processing, and its anatomical diversity has made it challenging to study its function. Recent advances in diffusion MRI tractography have enabled the estimation of white matter structural connectivity, which can potentially reveal the topography of the amygdala and its subdivisions. In this study, we propose a deep clustering pipeline to perform automated, fine-scale parcellation of the amygdala using diffusion MRI tractography.Methodology:1. Deep Learning Approach: We incorporate a newly proposed deep learning approach to accurately segment the amygdala directly on the dMRI data.2. Streamline Clustering-based Structural Connectivity Feature: We design a novel streamline clustering-based structural connectivity feature to robustly represent voxels within the amygdala.3. Improved Joint Dimensionality Reduction and k-Means Clustering: We improve the popular joint dimensionality reduction and k-means clustering approach to enable amygdala parcellation at a finer scale.Results:With the proposed method, we obtain nine unique amygdala parcels that can be consistently identified across subjects and have good correspondence to the widely used coarse-scale amygdala parcellation.Conclusion:Our proposed method provides a fine-scale parcellation of the amygdala, which can help researchers better understand the anatomical and functional diversity of this structure and its role in emotional processing.

eess.SP - 2023-11-25

OFDMA-F$^2$L: Federated Learning With Flexible Aggregation Over an OFDMA Air Interface

  • paper_url: http://arxiv.org/abs/2311.15141
  • repo_url: None
  • paper_authors: Shuyan Hu, Xin Yuan, Wei Ni, Xin Wang, Ekram Hossain, H. Vincent Poor
  • for: 提高 Federated Learning(FL)在移动网络中的性能,解决限制参与的客户端和FL融合的问题。
  • methods: 提出一种新的flexible aggregation-based FL(F$^2$L)模型,使用orthogonal frequency division multiple-access(OFDMA)空 Interface,让选择的客户端在每个聚合轮次中训练本地模型,并在聚合轮次之前进行多次迭代。选择客户端、子频和模ulation,根据通道条件和计算能力进行适应。
  • results: 通过分析OFDMA-F$^2$L的最佳性 gap,并使用Lagrange-dual方法解决权衡积sum rate最大化的挑战性混合整数程序,发现使用“赢家当选”策略可以实现最佳客户端、子频和模ulation选择。实验中,OFDMA-F$^2$L可以提高训练的速度和精度,比如18%和5%,相比潜在的替代方案。
    Abstract Federated learning (FL) can suffer from a communication bottleneck when deployed in mobile networks, limiting participating clients and deterring FL convergence. The impact of practical air interfaces with discrete modulations on FL has not previously been studied in depth. This paper proposes a new paradigm of flexible aggregation-based FL (F$^2$L) over orthogonal frequency division multiple-access (OFDMA) air interface, termed as ``OFDMA-F$^2$L'', allowing selected clients to train local models for various numbers of iterations before uploading the models in each aggregation round. We optimize the selections of clients, subchannels and modulations, adapting to channel conditions and computing powers. Specifically, we derive an upper bound on the optimality gap of OFDMA-F$^2$L capturing the impact of the selections, and show that the upper bound is minimized by maximizing the weighted sum rate of the clients per aggregation round. A Lagrange-dual based method is developed to solve this challenging mixed integer program of weighted sum rate maximization, revealing that a ``winner-takes-all'' policy provides the almost surely optimal client, subchannel, and modulation selections. Experiments on multilayer perceptrons and convolutional neural networks show that OFDMA-F$^2$L with optimal selections can significantly improve the training convergence and accuracy, e.g., by about 18\% and 5\%, compared to potential alternatives.
    摘要 “联合学习(FL)在移动网络中部署时可能会遇到通信瓶颈,限制参与的客户端和妨碍FL的收敛。这篇论文提出了一种新的柔性聚合基于联合学习(F$^2$L)的方法,称为“OFDMA-F$^2$L”,允许参与客户端在每个聚合轮次中对本地模型进行不同数量的训练,然后将模型上传到总集。我们优化客户端、子频和模ulation的选择,适应通道条件和计算能力。 Specifically, we derive an upper bound on the optimality gap of OFDMA-F$^2$L, which captures the impact of the selections, and show that the upper bound is minimized by maximizing the weighted sum rate of the clients per aggregation round. A Lagrange-dual based method is developed to solve this challenging mixed integer program of weighted sum rate maximization, revealing that a \"winner-takes-all\" policy provides the almost surely optimal client, subchannel, and modulation selections. Experiments on multilayer perceptrons and convolutional neural networks show that OFDMA-F$^2$L with optimal selections can significantly improve the training convergence and accuracy, e.g., by about 18\% and 5\%, compared to potential alternatives.”Note: The translation is in Simplified Chinese, which is one of the two standard versions of Chinese used in mainland China and Singapore.

Quickest Change Detection with Post-Change Density Estimation

  • paper_url: http://arxiv.org/abs/2311.15128
  • repo_url: None
  • paper_authors: Yuchen Liang, Venugopal V. Veeravalli
  • for: 本文考虑了序列独立观测中快速变化探测问题。
  • methods: 本文提出了两种基于后变换密度估计的测试方法:窗口限定非参数化通用准则(NGLR)CuSum测试和非参数化窗口限定自适应(NWLA)CuSum测试。
  • results: 在满足certain smoothness conditions下,当密度估计 converge时,两种测试方法具有first-order asymptotic optimality,false alarm rate随着预测值减少到零。
    Abstract The problem of quickest change detection in a sequence of independent observations is considered. The pre-change distribution is assumed to be known, while the post-change distribution is unknown. Two tests based on post-change density estimation are developed for this problem, the window-limited non-parametric generalized likelihood ratio (NGLR) CuSum test and the non-parametric window-limited adaptive (NWLA) CuSum test. Both tests do not assume any knowledge of the post-change distribution, except that the post-change density satisfies certain smoothness conditions that allows for efficient non-parametric estimation. Also, they do not require any pre-collected post-change training samples. Under certain convergence conditions on the density estimator, it is shown that both tests are first-order asymptotically optimal, as the false alarm rate goes to zero. The analysis is validated through numerical results, where both tests are compared with baseline tests that have distributional knowledge.
    摘要 “我们考虑了一个独立观测序列中的快速变化探测问题。我们假设了预测变化的分布知道,而后换变分布不知道。我们提出了两种基于后换变分布估计的测试,分别是窗口限定非 Parametric 普通概率函数(NGLR)CuSum 测试和非 Parametric 窗口限定自适应(NWLA)CuSum 测试。这两种测试不知道后换变分布,只知道后换变分布满足 certain smoothness conditions,允许非 Parametric 估计。此外,它们不需要预先收集后换变training samples。在某些数据散布的假设下,我们证明了这两种测试在false alarm rate接近零时是first-order asymptotically optimal。我们通过numerical results validate our analysis, comparison with基eline tests that have distributional knowledge.”Note: Simplified Chinese is used here, as it is the most widely used variety of Chinese in mainland China. However, if you prefer Traditional Chinese, I can provide that version as well.

Multiuser Beamforming for Partially-Connected Millimeter Wave Massive MIMO

  • paper_url: http://arxiv.org/abs/2311.15069
  • repo_url: None
  • paper_authors: Chenhao Qi, Jinlin Hu, Yang Du, Arumugam Nallanathan
  • for: 本研究探讨了 partially-connected millimeter wave massive MIMO 系统中的多用户扩 beamforming 技术。
  • methods: 根据完美的渠道状态信息 (CSI),提出了一种低复杂度混合式 beamforming 方案,其中分为analog beamformer和digital beamformer两部分。analog beamformer 设计为一个相对位移问题,以充分利用数组增益。给出了analog beamformer,然后解决一个加重平均均方差问题来设计digital beamformer。
  • results: 对于完美 CSI 情况下,提出的方案可以减少计算复杂性,但是减少了总比特率表现的差异。对于含有误差 CSI 情况下,提出的analog-only beamformer设计方案可以有效地减少多用户干扰。
    Abstract Multiuser beamforming is considered for partially-connected millimeter wave massive MIMO systems. Based on perfect channel state information (CSI), a low-complexity hybrid beamforming scheme that decouples the analog beamformer and the digital beamformer is proposed to maximize the sum-rate. The analog beamformer design is modeled as a phase alignment problem to harvest the array gain. Given the analog beamformer, the digital beamformer is designed by solving a weighted minimum mean squared error problem. Then based on imperfect CSI, an analog-only beamformer design scheme is proposed, where the design problem aims at maximizing the desired signal power on the current user and minimizing the power on the other users to mitigate the multiuser interference. The original problem is then transformed into a series of independent beam nulling subproblems, where an efficient iterative algorithm using the majorization-minimization framework is proposed to solve the subproblems. Simulation results show that, under perfect CSI, the proposed scheme achieves almost the same sum-rate performance as the existing schemes but with lower computational complexity; and under imperfect CSI, the proposed analog-only beamforming design scheme can effectively mitigate the multiuser interference.
    摘要 (以下是简化中文版)在具有部分连接的毫米波巨量MIMO系统中,考虑多用户扫描。基于完美通道状态信息(CSI),提出一种低复杂度混合扫描方案,该方案将数字扫描器和分析扫描器分离开来,以最大化总Bit rate。分析扫描器的设计被视为相位匹配问题,以利用阵列效应。给定分析扫描器,则数字扫描器的设计问题是解一个质量因子最小二乘问题。然后,基于不完美CSI,提出一种分析只的扫描器设计方案,该方案的目标是 Maximize the desired signal power on the current user and minimize the power on the other users to mitigate the multiuser interference. 原始问题被转换成一系列独立的扫描nulling子问题,并提出一种高效的迭代算法使用大量化-最小化框架来解决子问题。实验结果显示,在完美CSI情况下,提出的方案可以与现有方案准确性相同,但计算复杂度较低;而在不完美CSI情况下,提出的分析只的扫描器设计方案可以有效地减少多用户干扰。

Beam Training and Tracking for Extremely Large-Scale MIMO Communications

  • paper_url: http://arxiv.org/abs/2311.15066
  • repo_url: None
  • paper_authors: Kangjian Chen, Chenhao Qi, Cheng-Xiang Wang, Geoffrey Ye Li
  • for: 这个论文研究了非常大规模多输入多出口通信系统中的部分连接混合结构的射频场训练和跟踪。
  • methods: 论文提出了两Stage hybrid-field射频场训练方案,在第一个阶段,每个子阵列独立地使用多个远场通道扫描向量来近似近场的射频场。在第二个阶段,基于数字结合器的设计,将 analog combiner的输出从第一个阶段组合起来以找到最佳的码WORD。
  • results: 论文提出了一种基于 stationary phase 和时空同征的 beam refinement 方案(BRPSS),并开发了一种低复杂度的近场跟踪方案,使用动态模型描述通道变化,并使用扩展 Kalman 筛选器进行跟踪。 simulate 结果验证了提出的方案的有效性。
    Abstract In this paper, beam training and beam tracking are investigated for extremely large-scale multiple-input-multiple-output communication systems with partially-connected hybrid combining structures. Firstly, we propose a two-stage hybrid-field beam training scheme for both the near field and the far field. In the first stage, each subarray independently uses multiple far-field channel steering vectors to approximate near-field ones for analog combining. To find the codeword best fitting for the channel, digital combiners in the second stage are designed to combine the outputs of the analog combiners from the first stage. Then, based on the principle of stationary phase and the time-frequency duality, the expressions of subarray signals after analog combining are analytically derived and a beam refinement based on phase shifts of subarrays~(BRPSS) scheme with closed-form solutions is proposed for high-resolution channel parameter estimation. Moreover, a low-complexity near-field beam tracking scheme is developed, where the kinematic model is adopted to characterize the channel variations and the extended Kalman filter is exploited for beam tracking. Simulation results verify the effectiveness of the proposed schemes.
    摘要 在本文中,我们 investigate了极大规模多输入多输出通信系统中的半连接混合结构。首先,我们提出了一种两阶段混合场 beam training 方案,其中第一阶段每个子阵列独立地使用多个远场通道方向射频 vectors 来近似近场 ones。在第二阶段,我们设计了用于将数字 combiners 的输出相乘的 analog combiners。然后,基于站静相对和时空对偶性,我们Derived the expressions of subarray signals after analog combining and proposed a beam refinement based on phase shifts of subarrays (BRPSS) scheme with closed-form solutions for high-resolution channel parameter estimation.此外,我们还提出了一种低复杂度近场 beam tracking 方案,其中采用了运动模型来描述通道变化,并利用了扩展 Kalman 筛来进行 beam tracking。实验结果证明了我们提出的方案的有效性。

Simultaneous Beam Training and Target Sensing in ISAC Systems with RIS

  • paper_url: http://arxiv.org/abs/2311.15062
  • repo_url: None
  • paper_authors: Kangjian Chen, Chenhao Qi, Octavia A. Dobre, Geoffrey Ye Li
  • for: 本文研究一种整合感知和通信(ISAC)系统,其中包括可重新配置智能表面(RIS)。
  • methods: 我们的同步扫描和目标检测(SBTTS)方案使得基站可以与用户终端(UT)和RIS进行同步扫描,并同时检测目标。我们发现了echoes从RIS中的能量在角度-延迟频谱中异gexponential accumulation,而目标echoes在Doppler-延迟频谱中异gexponential accumulation。SBTTS方案可以将RIS与目标的混合回声分开。
  • results: 我们提出了一种基于SBTTS和PAOE方案的位置和数组方向估计(PAOE)方案,可以对线性视野通道和非线性视野通道进行估计。通过利用扫描结果,我们计算了通道之间RIS和UT的角度 arrival和角度 departure,以实现ISAC系统的扫描平衡。实验结果证明了我们的方案的有效性。
    Abstract This paper investigates an integrated sensing and communication (ISAC) system with reconfigurable intelligent surface (RIS). Our simultaneous beam training and target sensing (SBTTS) scheme enables the base station to perform beam training with the user terminals (UTs) and the RIS, and simultaneously to sense the targets. Based on our findings, the energy of the echoes from the RIS is accumulated in the angle-delay domain while that from the targets is accumulated in the Doppler-delay domain. The SBTTS scheme can distinguish the RIS from the targets with the mixed echoes from the RIS and the targets. Then we propose a positioning and array orientation estimation (PAOE) scheme for both the line-of-sight channels and the non-line-of-sight channels based on the beam training results of SBTTS by developing a low-complexity two-dimensional fast search algorithm. Based on the SBTTS and PAOE schemes, we further compute the angle-of-arrival and angle-of-departure for the channels between the RIS and the UTs by exploiting the geometry relationship to accomplish the beam alignment of the ISAC system. Simulation results verify the effectiveness of the proposed schemes.
    摘要 (Simplified Chinese)这篇论文研究了一种集成感知和通信(ISAC)系统,其中包含可重新配置智能表面(RIS)。我们的同时扫描和目标探测(SBTTS)方案使得基站可以通过用户终端(UT)和RIS进行扫描,并同时探测目标。根据我们的发现,RIS中的回射强度在角度延迟频谱中归集,而目标的回射强度在Doppler延迟频谱中归集。SBTTS方案可以通过混合RIS和目标的混合强度来 отличиRIS和目标。然后,我们提议一种位姿和数组方向估计(PAOE)方案,以便对线路通信频道和非线路通信频道进行估计。基于SBTTS和PAOE方案,我们进一步计算了通信频道之间RIS和UT的角度 arrival和角度 departure,以便完成ISAC系统的束Alignment。 simulate结果证明了我们的方案的有效性。

SenseAI: Real-Time Inpainting for Electron Microscopy

  • paper_url: http://arxiv.org/abs/2311.15061
  • repo_url: None
  • paper_authors: Jack Wells, Amirafshar Moshtaghpour, Daniel Nicholls, Alex W. Robinson, Yalin Zheng, Jony Castagna, Nigel D. Browning
  • for: 这篇论文主要是为了解决电子顾问数据的探针学习和稀疏编码基于填充算法的实时问题。
  • methods: 这篇论文使用了joint dictionary-learning和稀疏编码基于填充算法,但由于现有的算法效率不高,因此开发了一个名为SenseAI的C++/CUDA库来解决这个问题。
  • results: SenseAI可以快速地进行 dictionary-based inpainting,并且可以提供live reconstructions、dictionary transfer和图像质量指标的实时图表。
    Abstract Despite their proven success and broad applicability to Electron Microscopy (EM) data, joint dictionary-learning and sparse-coding based inpainting algorithms have so far remained impractical for real-time usage with an Electron Microscope. For many EM applications, the reconstruction time for a single frame is orders of magnitude longer than the data acquisition time, making it impossible to perform exclusively subsampled acquisition. This limitation has led to the development of SenseAI, a C++/CUDA library capable of extremely efficient dictionary-based inpainting. SenseAI provides N-dimensional dictionary learning, live reconstructions, dictionary transfer and visualization, as well as real-time plotting of statistics, parameters, and image quality metrics.
    摘要 尽管joint字典学习和稀有编码基于填充算法在电子顾问数据中有证明的成功和广泛应用,但这些算法在实时使用电子顾问时仍然无法实现。许多电子顾问应用程序中的重建时间比数据采集时间长得多,这限制了填充的使用。这一限制导致了SenseAI的开发,这是一个基于C++/CUDA的字典学习库,可以实现EXTREMELY高效的字典基本填充。SenseAI提供了N维字典学习、实时重建、字典传输和可视化,以及实时图表化统计参数和图像质量指标。

Key Issues in Wireless Transmission for NTN-Assisted Internet of Things

  • paper_url: http://arxiv.org/abs/2311.15060
  • repo_url: None
  • paper_authors: Chenhao Qi, Jing Wang, Leyi Lyu, Lei Tan, Jinming Zhang, Geoffrey Ye Li
  • for: 本文是为了解决非地球网络(NTN)中大量连接和精准频率信息获取的挑战而写的。
  • methods: 本文使用随机访问建立无线链接,并使用多访问传输数据流。另外,本文还使用频率分配、频率共享、扫描干扰和射频讯号来有效地分配无线资源。
  • results: 本文对三个关键问题进行了全面的研究和分析,并提出了一些新的方案和技术。这些方案和技术能够有效地解决非地球网络中的大量连接和精准频率信息获取问题。
    Abstract Non-terrestrial networks (NTNs) have become appealing resolutions for seamless coverage in the next-generation wireless transmission, where a large number of Internet of Things (IoT) devices diversely distributed can be efficiently served. The explosively growing number of IoT devices brings a new challenge for massive connection. The long-distance wireless signal propagation in NTNs leads to severe path loss and large latency, where the accurate acquisition of channel state information (CSI) is another challenge, especially for fast-moving non-terrestrial base stations (NTBSs). Moreover, the scarcity of on-board resources of NTBSs is also a challenge for resource allocation. To this end, we investigate three key issues, where the existing schemes and emerging resolutions for these three key issues have been comprehensively presented. The first issue is to enable the massive connection by designing random access to establish the wireless link and multiple access to transmit data streams. The second issue is to accurately acquire CSI in various channel conditions by channel estimation and beam training, where orthogonal time frequency space modulation and dynamic codebooks are on focus. The third issue is to efficiently allocate the wireless resources, including power allocation, spectrum sharing, beam hopping, and beamforming. At the end of this article, some future research topics are identified.
    摘要 非地面网络(NTN)已成为下一代无线传输的吸引人解决方案,其可以有效地服务多种互联网件(IoT)设备。随着互联网件设备的快速增加,批量连接成为了一大挑战。非地面基站(NTBS)的远程无线信号传播导致了严重的路径损失和大量延迟,同时精准获取通道状态信息(CSI)是另一个挑战,尤其是对于高速移动的 NTBS。此外,NTBS 的board资源的缺乏也成为了资源分配的挑战。为此,我们进行了三个关键问题的研究,即:1. 通过随机访问建立无线链接和多个数据流传输。2. 在不同的通道条件下精准获取 CSI,包括通道估计和天线训练,其中寄存器时间频率空间模ulation和动态编码库在研究中备受关注。3. 有效地分配无线资源,包括功率分配、频率分享、天线跳跃和天线相机。这篇文章结束时,我们还提出了一些未来研究的主题。

Gohberg-Semencul Estimation of Toeplitz Structured Covariance Matrices and Their Inverses

  • paper_url: http://arxiv.org/abs/2311.14995
  • repo_url: None
  • paper_authors: Benedikt Böck, Dominik Semmler, Benedikt Fesl, Michael Baur, Wolfgang Utschick
  • for: 这篇论文是为了提出一种新的可靠性检查的likelihood-based估计方法,用于当只有几个数据样本时,估计具有对角线结构的协变矩阵和其逆。
  • methods: 这篇论文使用了一种新的对角线结构估计方法,基于Gohberg-Semencul(GS)参数化的反对角线矩阵估计。这种方法利用了AR过程与GS参数化之间的关系,并提出了一些实际的参数调整技术。
  • results: 实验结果显示,这种新的估计方法可以对具有对角线结构的协变矩阵和其逆进行高效地估计,并且可以确保估计结果的正定definiteness。
    Abstract When only few data samples are accessible, utilizing structural prior knowledge is essential for estimating covariance matrices and their inverses. One prominent example is knowing the covariance matrix to be Toeplitz structured, which occurs when dealing with wide sense stationary (WSS) processes. This work introduces a novel class of positive definiteness ensuring likelihood-based estimators for Toeplitz structured covariance matrices (CMs) and their inverses. In order to accomplish this, we derive positive definiteness enforcing constraint sets for the Gohberg-Semencul (GS) parameterization of inverse symmetric Toeplitz matrices. Motivated by the relationship between the GS parameterization and autoregressive (AR) processes, we propose hyperparameter tuning techniques, which enable our estimators to combine advantages from state-of-the-art likelihood and non-parametric estimators. Moreover, we present a computationally cheap closed-form estimator, which is derived by maximizing an approximate likelihood. Due to the ensured positive definiteness, our estimators perform well for both the estimation of the CM and the inverse covariance matrix (ICM). Extensive simulation results validate the proposed estimators' efficacy for several standard Toeplitz structured CMs commonly employed in a wide range of applications.
    摘要 当只有几个数据样本可用时,利用结构知识是必要的 для估计协方差矩阵和其 inverse。一个典型的例子是知道协方差矩阵是 toeplitz 结构的,这会发生在宽感Stationary (WSS) 过程中。这项工作介绍了一种新的正确性保证的可靠性基于 likelihood 的估计器,用于 toeplitz 结构协方差矩阵 (CM) 和其 inverse。为了完成这一点,我们 derivated positive definiteness 确保的约束集 для Gohberg-Semencul (GS) 参数化的 inverse symmetric toeplitz 矩阵。受到 GS 参数化和自动回归 (AR) 过程之间的关系的激励,我们提出了 hyperparameter 调整技术,这些技术使我们的估计器可以结合最好的 likelihood 和非 Parametric 估计器的优点。此外,我们提出了一种计算成本低的closed-form 估计器,它是通过最大化approximate likelihood 来 derivation。由于 Ensured positive definiteness,我们的估计器在估计 CM 和 inverse covariance matrix (ICM) 方面表现出色。我们的 simulate 结果表明,我们的估计器在一些标准的 toeplitz 结构 CM 上具有良好的性能。

Hybrid Precoding and Combining for mmWave Full-Duplex Joint Radar and Communication Systems under Self-Interference

  • paper_url: http://arxiv.org/abs/2311.14942
  • repo_url: None
  • paper_authors: Murat Bayraktar, Nuria González-Prelcic, Hao Chen
  • for: 这 paper 探讨了一种 joint radar and communication (JRC) 系统在 millimeter wave (mmWave) 频率上的设计,以实现同时感知和通信的功能。
  • methods: 该 paper 使用了 Generalized Eigenvalue-based 预编器,以满足下链用户速率、雷达增强和自遗浪谔抑制的多种需求。 hybrid 分析/_ digital 架构会削弱预编器中的自遗浪谔抑制能力,因此我们还提出了一种增强 SI 抑制的分析/_ digital 混合器。
  • results: 我们的数字实验表明,提posed 架构可以实现需要的雷达增强和自遗浪谔抑制,同时兼顾下链spectral efficiency的要求。 此外,我们的数字实验还表明,使用 OFDM 雷达处理技术可以实现高精度的距离和速度估计。
    Abstract In the context of integrated sensing and communication (ISAC), a full-duplex (FD) transceiver can operate as a monostatic radar while maintaining communication capabilities. This paper investigates the design of precoders and combiners for a joint radar and communication (JRC) system at mmWave frequencies. The primary goals of the design are to minimize self-interference (SI) caused by FD operation, while guaranteeing certain performance in terms of some sensing and communication metrics, as well as taking into account the hardware limitations coming from a hybrid MIMO architecture. Specifically, we introduce a generalized eigenvalue-based precoder that takes into account downlink user rate, radar gain, and SI suppression. Since the hybrid analog/digital architecture degrades the SI suppression capability of the precoder, we further enhance SI suppression with the analog combiner. Our numerical results demonstrate that the proposed architecture achieves the required radar gain and SI mitigation while incurring a small loss in downlink spectral efficiency. Additionally, the numerical experiments also show that the use of orthogonal frequency division multiplexing (OFDM) for radar processing with the proposed beamforming architecture results in highly accurate range and velocity estimates for detected targets.
    摘要 在 инте格рирован感知通信(ISAC)上,全双工(FD)扬送器可以作为单Statics radar 运行,保持通信能力。本文研究了 joint radar and communication(JRC)系统的 precoder 和 combiner 的设计,以实现 mmWave 频率下的自适应遮盲(SI)降低,保证感知和通信指标的一定性能,同时考虑 hybrid MIMO 架构的硬件限制。 Specifically, we introduce a generalized eigenvalue-based precoder that takes into account downlink user rate, radar gain, and SI suppression. Since the hybrid analog/digital architecture degrades the SI suppression capability of the precoder, we further enhance SI suppression with the analog combiner. Our numerical results demonstrate that the proposed architecture achieves the required radar gain and SI mitigation while incurring a small loss in downlink spectral efficiency. Additionally, the numerical experiments also show that the use of orthogonal frequency division multiplexing (OFDM) for radar processing with the proposed beamforming architecture results in highly accurate range and velocity estimates for detected targets.Note that Simplified Chinese is a more casual and informal version of Chinese, and it may not be appropriate for all situations or audiences. If you need a more formal version, you may want to consider using Traditional Chinese or Standard Chinese instead.

cs.SD - 2023-11-24

Overview Of The 2023 Icassp Sp Clarity Challenge: Speech Enhancement For Hearing Aids

  • paper_url: http://arxiv.org/abs/2311.14490
  • repo_url: None
  • paper_authors: Trevor J. Cox, Jon Barker, Will Bailey, Simone Graetzer, Michael A. Akeroyd, John F. Culling, Graham Naylor
  • for: 本文报告了 ICASP SP Clarity Challenge:听音辅助器的音响提升设计和结果。enario中有多个干扰者和听者Head rotation。
  • methods: 挑战使用 fixes the amplification stage of the hearing aid; 使用联合指标评估语音 inteligibilty和音质; 提供了基于 simulated和实际 Room Measurements两个评估集。
  • results: 五支队伍在 simulated evaluation set上进行了改进,但实际 measurement set上的表现远低于预期。 ongoing investigations 确定实际数据集和 simulated数据集之间的差异的原因。 被提出的可能的原因包括:测量中的投射器噪声、lower order Ambisonics 降低了系统利用 binAural cues的能力,以及实际和 simulated Room Impulse Responses之间的差异。
    Abstract This paper reports on the design and outcomes of the ICASSP SP Clarity Challenge: Speech Enhancement for Hearing Aids. The scenario was a listener attending to a target speaker in a noisy, domestic environment. There were multiple interferers and head rotation by the listener. The challenge extended the second Clarity Enhancement Challenge (CEC2) by fixing the amplification stage of the hearing aid; evaluating with a combined metric for speech intelligibility and quality; and providing two evaluation sets, one based on simulation and the other on real-room measurements. Five teams improved on the baseline system for the simulated evaluation set, but the performance on the measured evaluation set was much poorer. Investigations are on-going to determine the exact cause of the mismatch between the simulated and measured data sets. The presence of transducer noise in the measurements, lower order Ambisonics harming the ability for systems to exploit binaural cues and the differences between real and simulated room impulse responses are suggested causes
    摘要 Translated into Simplified Chinese:这篇文章报告了ICASSP SP Clarity Challenge:听说器增强器的设计和效果。场景是一个听众听取目标说话人在吵闹的家庭环境中。有多个干扰者和听众的头转换。挑战将第二个增强挑战(CEC2)中的增强阶段固定,并使用合并指标评估语音知能和质量。提供了基于模拟和实际测量的两个评估集。五支队伍在模拟评估集上超过基准系统,但实际测量集的性能远低。正在进行的调查是确定模拟和实际数据集之间的差异的原因。被提出的可能性包括测量中的传感器噪声、下一个Ambisonics阶段对系统利用耳后听觉信息的阻碍,以及实际和模拟房间冲击波的差异。

Allpass impulse response modelling

  • paper_url: http://arxiv.org/abs/2311.14239
  • repo_url: None
  • paper_authors: Matt R. Flax
  • for: 这篇论文是为了研究FIR系统模型的方法,这种方法仅仅基于相位引入和消除(全相差滤波器),因此数值稳定。
  • methods: 这种方法使用相位引入和消除来模拟FIR系统,不改变 маgnitude,因此处于系统的线性范围内。
  • results: 这篇论文的结果表明,这种方法可以精确地模拟FIR系统,并且数值稳定。
    Abstract This document defines a method for FIR system modelling which is very trivial as it only depends on phase introduction and removal (allpass filters). As magnitude is not altered, the processing is numerically stable. It is limited to phase alteration which maintains the time domain magnitude to force a system within its linear limits.
    摘要 这份文档定义了一种简单的FIR系统模型方法,该方法只依赖于相位引入和消除(全通过滤波器)。由于大小不变,处理是数值稳定。它只限制相位变化,以保持时域大小,使系统保持在线性限制内。Here's a word-for-word translation:这份文档定义了一种简单的FIR系统模型方法,该方法只依赖于相位引入和消除(全通过滤波器)。由于大小不变,处理是数值稳定。它只限制相位变化,以保持时域大小,使系统保持在线性限制内。

eess.AS - 2023-11-24

Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech

  • paper_url: http://arxiv.org/abs/2311.14816
  • repo_url: https://github.com/ETZET/SpeechEmotionAVLearning
  • paper_authors: Enting Zhou, You Zhang, Zhiyao Duan
  • for: 这个论文的目的是学习从语音材料中预测情绪表达的精度和豁达性。
  • methods: 这个论文使用了自然语言处理技术和深度学习算法,首先学习了一个高维、情绪相关的语音特征表示,然后使用了 anchored dimensionality reduction 将这个表示映射到了2D的情绪坐标系中。
  • results: 实验结果表明,这个方法可以在没有使用过程中的AV标注数据的情况下,与状态 искусственный进行比较,在IEMOCAP数据集上达到了类似的CCC性能。此外,对MEAD和EmoDB数据集的AV预测结果的可见化也提供了可解释的AV表示的视觉化。
    Abstract Dimensional representations of speech emotions such as the arousal-valence (AV) representation provide a continuous and fine-grained description and control than their categorical counterparts. They have wide applications in tasks such as dynamic emotion understanding and expressive text-to-speech synthesis. Existing methods that predict the dimensional emotion representation from speech cast it as a supervised regression task. These methods face data scarcity issues, as dimensional annotations are much harder to acquire than categorical labels. In this work, we propose to learn the AV representation from categorical emotion labels of speech. We start by learning a rich and emotion-relevant high-dimensional speech feature representation using self-supervised pre-training and emotion classification fine-tuning. This representation is then mapped to the 2D AV space according to psychological findings through anchored dimensionality reduction. Experiments show that our method achieves a Concordance Correlation Coefficient (CCC) performance comparable to state-of-the-art supervised regression methods on IEMOCAP without leveraging ground-truth AV annotations during training. This validates our proposed approach on AV prediction. Furthermore, visualization of AV predictions on MEAD and EmoDB datasets shows the interpretability of the learned AV representations.
    摘要 尺度表示情感的维度表示,如动力吟唱情感(AV)表示,提供了连续和精细的描述和控制,比其分类对手更多。它们在动态情感理解和表达文本到语音合成等任务中有广泛的应用。现有的方法预测AV表示从语音中的批处是一种有监督的回归任务。这些方法面临数据缺乏问题,因为维度标注更加困难于分类标签。在这项工作中,我们提议从语音的分类情感标签中学习AV表示。我们首先学习一个富有情感相关的高维语音特征表示,使用自我监督预训练和情感分类练熟。这个表示然后通过静态维度减少映射到2D AV空间,根据心理发现。实验表明,我们的方法在IEMOCAP无需在训练过程中使用真实的AV标注的情况下,可以达到与当前状态的监督回归方法相同的CCC性能。这将 validate我们的提出的方法。此外,对MEAD和EmoDB数据集的AV预测视觉显示了学习的AV表示的可读性。

cs.CV - 2023-11-24

Uncertainty Aware AI for 2D MRI Segmentation

  • paper_url: http://arxiv.org/abs/2311.14875
  • repo_url: None
  • paper_authors: Lohith Konathala
  • for: 这个研究旨在提供一个可靠且可解释的深度学习探测医疗影像的方法,以提高自动诊断疾病的精度和可靠性。
  • methods: 本研究使用了巴叶斯对应网络和注意力机制,实现了精度和可解释的探测结果。
  • results: 在使用BraTS 2020 dataset进行评估时,我们的模型得到了高的F1分数和交集遮上率(IoU)。
    Abstract Robust uncertainty estimations are necessary in safety-critical applications of Deep Learning. One such example is the semantic segmentation of medical images, whilst deep-learning approaches have high performance in such tasks they lack interpretability as they give no indication of their confidence when making classification decisions. Robust and interpretable segmentation is a critical first stage in automatically screening for pathologies hence the optimal solution is one which can provide high accuracy but also capture the underlying uncertainty. In this work, we present an uncertainty-aware segmentation model, BA U-Net, for use on MRI data that incorporates Bayesian Neural Networks and Attention Mechanisms to provide accurate and interpretable segmentations. We evaluated our model on the publicly available BraTS 2020 dataset using F1 Score and Intersection Over Union (IoU) as evaluation metrics.
    摘要 强大的不确定性估计是深度学习应用中的必需品,特别是在安全关键应用中。例如,深度学习方法在医学图像Semantic segmentation任务中表现出色,但它们缺乏可解性,因为它们不会提供分类决策时的信息。可靠和可解的分 segmentation是自动检测疾病的关键第一步,因此最佳解决方案是一个可以提供高精度而且捕捉下面的不确定性的模型。在这篇文章中,我们提出了一种不确定性意识的分 segmentation模型,BA U-Net,用于MRI数据,该模型包括抽象神经网络和注意机制,以提供准确和可解的分 segmentation。我们使用BraTS 2020 dataset进行评估,使用F1 Score和Intersection Over Union(IoU)作为评估指标。

Unified Medical Image Pre-training in Language-Guided Common Semantic Space

  • paper_url: http://arxiv.org/abs/2311.14851
  • repo_url: None
  • paper_authors: Xiaoxuan He, Yifan Yang, Xinyang Jiang, Xufang Luo, Haoji Hu, Siyun Zhao, Dongsheng Li, Yuqing Yang, Lili Qiu
  • for: 这篇论文的目的是提出一个具有扩展性的医疗影像预训架构,以掌握不同频率和维度的医疗影像数据,并将其转换为一个共同semantic空间,以便实现医疗影像分析和诠释的统一化。
  • methods: 本研究使用了诊断报告来建立共同semantic空间,从而创建了一个统一的医疗影像表示,并使用了这个表示来进行医疗影像分析和诠释。
  • results: 研究结果显示,UniMedI可以优化医疗影像分析和诠释的一致性,并且在不同频率和维度的医疗影像数据上具有出色的表现。
    Abstract Vision-Language Pre-training (VLP) has shown the merits of analysing medical images, by leveraging the semantic congruence between medical images and their corresponding reports. It efficiently learns visual representations, which in turn facilitates enhanced analysis and interpretation of intricate imaging data. However, such observation is predominantly justified on single-modality data (mostly 2D images like X-rays), adapting VLP to learning unified representations for medical images in real scenario remains an open challenge. This arises from medical images often encompass a variety of modalities, especially modalities with different various number of dimensions (e.g., 3D images like Computed Tomography). To overcome the aforementioned challenges, we propose an Unified Medical Image Pre-training framework, namely UniMedI, which utilizes diagnostic reports as common semantic space to create unified representations for diverse modalities of medical images (especially for 2D and 3D images). Under the text's guidance, we effectively uncover visual modality information, identifying the affected areas in 2D X-rays and slices containing lesion in sophisticated 3D CT scans, ultimately enhancing the consistency across various medical imaging modalities. To demonstrate the effectiveness and versatility of UniMedI, we evaluate its performance on both 2D and 3D images across 10 different datasets, covering a wide range of medical image tasks such as classification, segmentation, and retrieval. UniMedI has demonstrated superior performance in downstream tasks, showcasing its effectiveness in establishing a universal medical visual representation.
    摘要 医学图像预训练(VLP)已经证明了对医学图像进行分析,通过挖掘医学图像和其报告之间的语义相似性。它高效地学习视觉表示,从而促进了复杂的医学图像数据的分析和解释。然而,这种观察是基于单一模式的数据(主要是2D图像如X射线),将VLP学习到一个综合表示中是一个开放的挑战。这是因为医学图像通常包含多种模式,特别是不同维度的模式(例如3D图像如计算Tomography)。为了解决这些挑战,我们提出了一个综合医学图像预训练框架,称为UniMedI,该框架利用诊断报告作为共同语义空间,创建综合表示 для多种医学图像模式(特别是2D和3D图像)。在我们的指导下,我们有效地抽取视觉模式信息,在2DX射线图像中找到病changed区域,并在复杂的3DCT扫描中找到患部,从而提高了各种医学图像模式之间的一致性。为了证明UniMedI的效果和多样性,我们对2D和3D图像进行了10个不同的数据集评估,覆盖了各种医学图像任务,如分类、 segmentation和检索。UniMedI在下游任务中表现出色,证明了其在建立医学视觉共同表示方面的效iveness。

UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning

  • paper_url: http://arxiv.org/abs/2311.16477
  • repo_url: None
  • paper_authors: Zhongyu Jiang, Wenhao Chai, Lei Li, Zhuoran Zhou, Cheng-Yen Yang, Jenq-Neng Hwang
  • for: 这 paper 的目的是开发一种能够效果地结合多Modalities 信息的感知技术,以便使用更大的数据集和约束来训练,并利用每个模式中的信息。
  • methods: 该 paper 提出了一种统一的人姿估计(HPE)管道,将所有三种模式的特征都Alignment在同一个管道中,并提出了一种新的 singular value 基于的对比学习损失函数,以更好地对不同的模式进行对应。
  • results: 该 paper 的实验结果表明,UniHPE 在 Human3.6M 数据集上的 MPJPE 为 50.5 mm,在 3DPW 数据集上的 PAMPJPE 为 51.6 mm,这些表现 metric 非常出色。
    Abstract In recent times, there has been a growing interest in developing effective perception techniques for combining information from multiple modalities. This involves aligning features obtained from diverse sources to enable more efficient training with larger datasets and constraints, as well as leveraging the wealth of information contained in each modality. 2D and 3D Human Pose Estimation (HPE) are two critical perceptual tasks in computer vision, which have numerous downstream applications, such as Action Recognition, Human-Computer Interaction, Object tracking, etc. Yet, there are limited instances where the correlation between Image and 2D/3D human pose has been clearly researched using a contrastive paradigm. In this paper, we propose UniHPE, a unified Human Pose Estimation pipeline, which aligns features from all three modalities, i.e., 2D human pose estimation, lifting-based and image-based 3D human pose estimation, in the same pipeline. To align more than two modalities at the same time, we propose a novel singular value based contrastive learning loss, which better aligns different modalities and further boosts the performance. In our evaluation, UniHPE achieves remarkable performance metrics: MPJPE $50.5$mm on the Human3.6M dataset and PAMPJPE $51.6$mm on the 3DPW dataset. Our proposed method holds immense potential to advance the field of computer vision and contribute to various applications.
    摘要 近些年来,有一个增长的兴趣是开发有效的感知技术,以 combinational 多模态信息。这些技术包括将多种来源的特征进行对应,以便更高效地训练大量数据和约束,以及利用每种模态中的财富。计算机视觉中的2D和3D人姿估算(HPE)是两个关键的感知任务,它们有许多下游应用,如行为识别、人机交互、物体跟踪等。然而,关于图像和2D/3D人姿之间的相关性研究使用对比性学习方法的研究是有限的。在这篇论文中,我们提出UniHPE,一个统一的人姿估算管道,将所有三种模态的特征进行对应,即2D人姿估算、提升基于和图像基于的3D人姿估算。为了同时对多于两种模态进行对应,我们提出了一种新的 singular value 基于对比学习损失,可以更好地对不同模态进行对应,并进一步提高性能。在我们的评估中,UniHPE达到了remarkable的性能指标:MPJPE $50.5$mm on the Human3.6M dataset和PAMPJPE $51.6$mm on the 3DPW dataset。我们的提出的方法具有潜在的推动计算机视觉领域的前进,并可以应用于多种应用程序。

Benchmarking Robustness of Text-Image Composed Retrieval

  • paper_url: http://arxiv.org/abs/2311.14837
  • repo_url: None
  • paper_authors: Shitong Sun, Jindong Gu, Shaogang Gong
  • for: 这个论文目的是研究文本图像组合检索的稳定性,包括对于自然损害和文本理解的分析。
  • methods: 该论文使用了文本图像组合检索方法,并在自然损害和文本理解方面进行了系统性的分析。
  • results: 该论文通过 introduce 两个新的大规模 benchmark 数据集(CIRR-C和 FashionIQ-C)和一个新的 диагностиック数据集(CIRR-D),以及对文本图像组合检索方法的系统性分析,以探讨文本图像组合检索的稳定性和文本理解能力。
    Abstract Text-image composed retrieval aims to retrieve the target image through the composed query, which is specified in the form of an image plus some text that describes desired modifications to the input image. It has recently attracted attention due to its ability to leverage both information-rich images and concise language to precisely express the requirements for target images. However, the robustness of these approaches against real-world corruptions or further text understanding has never been studied. In this paper, we perform the first robustness study and establish three new diversified benchmarks for systematic analysis of text-image composed retrieval against natural corruptions in both vision and text and further probe textural understanding. For natural corruption analysis, we introduce two new large-scale benchmark datasets, CIRR-C and FashionIQ-C for testing in open domain and fashion domain respectively, both of which apply 15 visual corruptions and 7 textural corruptions. For textural understanding analysis, we introduce a new diagnostic dataset CIRR-D by expanding the original raw data with synthetic data, which contains modified text to better probe textual understanding ability including numerical variation, attribute variation, object removal, background variation, and fine-grained evaluation. The code and benchmark datasets are available at https://github.com/SunTongtongtong/Benchmark-Robustness-Text-Image-Compose-Retrieval.
    摘要 文本图像组合检索目标是通过指定一个图像和一些描述所需修改的文本来检索目标图像。这些方法在最近吸引了关注,因为它们可以利用图像中的信息和简短的语言来精确表达需要的图像要求。然而,这些方法对实际世界的损害或文本理解的Robustness从未被研究。在这篇论文中,我们进行了第一个Robustness研究,并建立了三个新的多样化 benchMark для系统性的分析文本图像组合检索方法对实际世界的损害和文本理解。对于实际世界的损害分析,我们引入了两个新的大规模 benchmark dataset:CIRR-C和FashionIQ-C,它们在开放领域和时尚领域分别测试15种视觉损害和7种文本损害。对于文本理解分析,我们引入了一个新的 диагностические dataset CIRR-D,通过扩展原始的 raw 数据,包括修改的文本,以更好地检测文本理解能力,包括数值变化、属性变化、对象移除、背景变化和细化评价。代码和benchMark dataset可以在 GitHub 上获取:https://github.com/SunTongtongtong/Benchmark-Robustness-Text-Image-Compose-Retrieval。

Proximal Algorithms for Accelerated Langevin Dynamics

  • paper_url: http://arxiv.org/abs/2311.14829
  • repo_url: None
  • paper_authors: Duy H. Thai, Alexander L. Young, David B. Dunson
  • for: 这个论文是为了开发一种新的MCMC算法,以实现更好地混合Markov链。
  • methods: 这个论文使用了一种带有随机噪声的奈斯特洛夫算法,并证明了这种方法可以导致一个指定的目标分布作为它的稳定分布。
  • results: 实验表明,提议的方法在不同的统计和图像处理模型中都有更好的混合性,比传统的勒拜赛末MCMC算法更好。
    Abstract We develop a novel class of MCMC algorithms based on a stochastized Nesterov scheme. With an appropriate addition of noise, the result is a time-inhomogeneous underdamped Langevin equation, which we prove emits a specified target distribution as its invariant measure. Convergence rates to stationarity under Wasserstein-2 distance are established as well. Metropolis-adjusted and stochastic gradient versions of the proposed Langevin dynamics are also provided. Experimental illustrations show superior performance of the proposed method over typical Langevin samplers for different models in statistics and image processing including better mixing of the resulting Markov chains.
    摘要 我们开发了一种新的 MCMC 算法基于偏导数 Nesterov 方案。通过适当添加噪声,得到了时间不一致的涨落征函数方程,我们证明其具有指定目标分布作为吸引器的吸引性。我们还证明了在 Wasserstein-2 距离下的收敛率。此外,我们还提供了 Metropolis 调整和随机梯度版本的提案 Langvin 动力学。实验示例显示了我们的方法在不同的统计和图像处理模型中表现更好,包括更好地混合Markov链。Note: The translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you prefer Traditional Chinese, please let me know.

Text and Click inputs for unambiguous open vocabulary instance segmentation

  • paper_url: http://arxiv.org/abs/2311.14822
  • repo_url: https://github.com/nikolaiwarner7/text-and-click-for-open-vocabulary-segmentation
  • paper_authors: Nikolai Warner, Meera Hahn, Jonathan Huang, Irfan Essa, Vighnesh Birodkar
  • for: 本研究旨在提高图像分割的精度和效率,通过 humans-in-the-loop 提供额外输入,以便更好地分割图像中的对象。
  • methods: 本研究提出了一种新的分割方法,称为 Text + Click segmentation,它使用图像、文本描述和前景单击点作为输入,并使用开放词汇图像文本模型来支持广泛的文本提示。
  • results: 对于常见的分割集合refCOCO、COCO、VOC和OpenImages,模型能够更好地分割图像中的对象,特别是在涉及到重叠或共存的semantic category时。
    Abstract Segmentation localizes objects in an image on a fine-grained per-pixel scale. Segmentation benefits by humans-in-the-loop to provide additional input of objects to segment using a combination of foreground or background clicks. Tasks include photoediting or novel dataset annotation, where human annotators leverage an existing segmentation model instead of drawing raw pixel level annotations. We propose a new segmentation process, Text + Click segmentation, where a model takes as input an image, a text phrase describing a class to segment, and a single foreground click specifying the instance to segment. Compared to previous approaches, we leverage open-vocabulary image-text models to support a wide-range of text prompts. Conditioning segmentations on text prompts improves the accuracy of segmentations on novel or unseen classes. We demonstrate that the combination of a single user-specified foreground click and a text prompt allows a model to better disambiguate overlapping or co-occurring semantic categories, such as "tie", "suit", and "person". We study these results across common segmentation datasets such as refCOCO, COCO, VOC, and OpenImages. Source code available here.
    摘要 Segmentation 可以在图像上进行细化的每个像素级别地Localize对象。Segmentation 可以通过人类在Loop中提供额外输入来提高对象的分割,包括使用背景或前景键 clicks。任务包括图像编辑或新数据集注释, где人类标注员可以利用现有的分割模型而不是直接在像素级别上绘制Raw annotations。我们提出了一种新的分割过程,文本 + 单击分割,其中模型接受图像、文本短语描述类划分和单个前景键点击。与之前的方法相比,我们利用了开放词汇图像文本模型,以支持广泛的文本提示。基于文本提示来conditioning分割提高了对 novel或未经见过的类的准确性。我们证明了 combining 单个用户指定的前景键点击和文本提示可以使模型更好地解决 overlap 或共处的语义类划分,例如“链”、“西装”和“人”。我们在 refCOCO、COCO、VOC 和 OpenImages 等常见分割数据集上进行了这些研究。代码可以在这里找到。

SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation

  • paper_url: http://arxiv.org/abs/2311.14671
  • repo_url: https://github.com/menglcool/segic
  • paper_authors: Lingchen Meng, Shiyi Lan, Hengduo Li, Jose M. Alvarez, Zuxuan Wu, Yu-Gang Jiang
  • for: 本研究旨在开发一种基于单一视觉基础模型(VFM)的结构准确分割方法,可以在少量标注样本的情况下进行准确的图像分割。
  • methods: 本研究使用了一种名为SEGIC的结构准确分割方法,该方法基于VFM模型,并且通过提取受影响的图像和准确样本之间的相似性,来学习分割规则。
  • results: 根据实验结果,SEGIC方法可以在一些一键分割 benchmark 上达到 estado del arte 的性能,而且可以轻松扩展到多个任务,如视频对象分割和开放词汇分割。
    Abstract In-context segmentation aims at segmenting novel images using a few labeled example images, termed as "in-context examples", exploring content similarities between examples and the target. The resulting models can be generalized seamlessly to novel segmentation tasks, significantly reducing the labeling and training costs compared with conventional pipelines. However, in-context segmentation is more challenging than classic ones due to its meta-learning nature, requiring the model to learn segmentation rules conditioned on a few samples, not just the segmentation. Unlike previous work with ad-hoc or non-end-to-end designs, we propose SEGIC, an end-to-end segment-in-context framework built upon a single vision foundation model (VFM). In particular, SEGIC leverages the emergent correspondence within VFM to capture dense relationships between target images and in-context samples. As such, information from in-context samples is then extracted into three types of instructions, i.e. geometric, visual, and meta instructions, serving as explicit conditions for the final mask prediction. SEGIC is a straightforward yet effective approach that yields state-of-the-art performance on one-shot segmentation benchmarks. Notably, SEGIC can be easily generalized to diverse tasks, including video object segmentation and open-vocabulary segmentation. Code will be available at \url{https://github.com/MengLcool/SEGIC}.
    摘要 宏观分割是一种将新图像分割成多个类别的技术,使用一些已经标注过的图像,称为“内容相似例子”,来探索图像的内容相似性。这种技术可以在新的分割任务上进行普适的泛化,相比于传统的管道,可以大幅减少标注和训练成本。然而,宏观分割比 класси的分割更加具有挑战性,因为它具有元学习性,需要模型学习分割规则,条件于几个样本,而不仅仅是分割。在这种情况下,我们提出了SEGIC,一种基于单一视觉基础模型(VFM)的一阶段内容分割框架。特别是,SEGIC利用VFM中的emergent对应关系,捕捉目标图像和内容相似样本之间的稠密关系。因此,从内容相似样本中提取信息,并将其转化为三种类型的指令,即几何指令、视觉指令和元指令,作为最终掩码预测的显式条件。SEGIC是一种简单 yet 有效的方法,在一阶段内容分割 benchmark 上实现了状态码的性能。另外,SEGIC可以轻松扩展到多种任务,包括视频对象分割和开放词汇分割。代码将在 \url{https://github.com/MengLcool/SEGIC} 上提供。

Understanding Self-Supervised Features for Learning Unsupervised Instance Segmentation

  • paper_url: http://arxiv.org/abs/2311.14665
  • repo_url: None
  • paper_authors: Paul Engstler, Luke Melas-Kyriazi, Christian Rupprecht, Iro Laina
  • for: 本文探索了无需人工标注的自助学习方法对实例分割任务的应用。
  • methods: 本文使用了多种自助学习方法,包括DINO和MAE等。
  • results: 研究发现,DINO的特征在实例分割任务中虽然具有良好的语义描述能力,但缺乏对实例的敏感度。MAE的特征则表现出较高的实例敏感度。
    Abstract Self-supervised learning (SSL) can be used to solve complex visual tasks without human labels. Self-supervised representations encode useful semantic information about images, and as a result, they have already been used for tasks such as unsupervised semantic segmentation. In this paper, we investigate self-supervised representations for instance segmentation without any manual annotations. We find that the features of different SSL methods vary in their level of instance-awareness. In particular, DINO features, which are known to be excellent semantic descriptors, lack behind MAE features in their sensitivity for separating instances.
    摘要 自我指导学习(SSL)可以用来解决复杂的视觉任务,无需人工标注。自我指导表示包含有用的 semantic 信息,因此它们已经在无监督Semantic Segmentation 等任务中使用。在这篇论文中,我们调查了不同 SSL 方法的自我指导表示,以便实现无需人工标注的实例分割。我们发现,DINO 的特征在实例分割方面较为脆弱,而 MAE 的特征则更加敏感于分割实例。

Continuous football player tracking from discrete broadcast data

  • paper_url: http://arxiv.org/abs/2311.14642
  • repo_url: None
  • paper_authors: Matthew J. Penn, Christl A. Donnelly, Samir Bhatt
  • for: 提供了一种方法来估算全场 continuous tracking 数据,以帮助职业足球队伍获取高质量的数据,而不需要特殊的设备或高成本。
  • methods: 该方法使用计算机视觉技术,基于广播视频的不连续数据,来估算全场 tracking 数据。
  • results: 测试结果表明,该方法可以准确地估算全场 tracking 数据,并且可以应用于大量的足球比赛。
    Abstract Player tracking data remains out of reach for many professional football teams as their video feeds are not sufficiently high quality for computer vision technologies to be used. To help bridge this gap, we present a method that can estimate continuous full-pitch tracking data from discrete data made from broadcast footage. Such data could be collected by clubs or players at a similar cost to event data, which is widely available down to semi-professional level. We test our method using open-source tracking data, and include a version that can be applied to a large set of over 200 games with such discrete data.
    摘要 <>将文本翻译成简化中文。<>职业足球队伍的玩家跟踪数据仍然无法达到许多专业队伍的手中,因为他们的视频流不够高质量,不能用计算机视觉技术进行跟踪。为了bridging这个差距,我们提出了一种方法,可以从广播影片中提取连续全场跟踪数据。这些数据可以由俱乐部或球员在类似于事件数据的成本下收集,而事件数据在 semi-professional 水平下广泛可用。我们使用开源跟踪数据进行测试,并包含一个可应用于大量超过 200 场的精简数据。

Unsupervised high-throughput segmentation of cells and cell nuclei in quantitative phase images

  • paper_url: http://arxiv.org/abs/2311.14639
  • repo_url: None
  • paper_authors: Julia Sistermanns, Ellen Emken, Gregor Weirich, Oliver Hayden, Wolfgang Utschick
  • for: 这篇论文的目的是为了提高细胞学诊断的效率和准确性,通过开发高通量数字束谱微scopic镜技术来自动分类单个细胞。
  • methods: 这篇论文使用了无监督的多stage方法来自动分类细胞,不会误分辨噪音或反射为细胞,同时能够检测细胞内部结构,特别是细胞核在不染色的细胞中。
  • results: 论文中表明,这种分类方法在多个实验中对病人样本具有一致性和可靠性,并且在合理的每个细胞分析时间内完成。
    Abstract In the effort to aid cytologic diagnostics by establishing automatic single cell screening using high throughput digital holographic microscopy for clinical studies thousands of images and millions of cells are captured. The bottleneck lies in an automatic, fast, and unsupervised segmentation technique that does not limit the types of cells which might occur. We propose an unsupervised multistage method that segments correctly without confusing noise or reflections with cells and without missing cells that also includes the detection of relevant inner structures, especially the cell nucleus in the unstained cell. In an effort to make the information reasonable and interpretable for cytopathologists, we also introduce new cytoplasmic and nuclear features of potential help for cytologic diagnoses which exploit the quantitative phase information inherent to the measurement scheme. We show that the segmentation provides consistently good results over many experiments on patient samples in a reasonable per cell analysis time.
    摘要 在努力帮助细胞诊断方面,我们提出了自动化单元筛选技术,使用高通量数字折射 Microscopy 进行临床研究,捕捉了千张图像和百万个细胞。问题在于发现一种自动、快速、无监督的分割技术,不会混淆噪音或反射与细胞,也不会错过细胞。我们提出了一种多 stage 的无监督方法,可以正确地分割细胞,同时检测细胞内部的相关结构,特别是细胞核在未染色细胞中。为使信息更加合理和可解释,我们还引入了新的细胞质和核特征,可以帮助cytopathologists进行诊断。我们展示了这种分割技术在多个实验中的一致性和可靠性。

Automated Detection and Counting of Windows using UAV Imagery based Remote Sensing

  • paper_url: http://arxiv.org/abs/2311.14635
  • repo_url: None
  • paper_authors: Dhruv Patel, Shivani Chepuri, Sarvesh Thakur, K. Harikumar, Ravi Kiran S., K. Madhava Krishna
  • for: 本研究旨在提出一种基于无人机遥感系统的窗户数量检测方法,以帮助建筑和surveying领域中的监测和评估工作。
  • methods: 该方法包括两个阶段,首先是通过无人机摄像头和其他传感器提供数据,然后是通过计算机视觉管道自动识别和计数窗户。
  • results: 实验结果表明,提出的方法可以准确地检测和计数窗户,并且比现有方法更高效。
    Abstract Despite the technological advancements in the construction and surveying sector, the inspection of salient features like windows in an under-construction or existing building is predominantly a manual process. Moreover, the number of windows present in a building is directly related to the magnitude of deformation it suffers under earthquakes. In this research, a method to accurately detect and count the number of windows of a building by deploying an Unmanned Aerial Vehicle (UAV) based remote sensing system is proposed. The proposed two-stage method automates the identification and counting of windows by developing computer vision pipelines that utilize data from UAV's onboard camera and other sensors. Quantitative and Qualitative results show the effectiveness of our proposed approach in accurately detecting and counting the windows compared to the existing method.
    摘要

CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization

  • paper_url: http://arxiv.org/abs/2311.14631
  • repo_url: None
  • paper_authors: Ruoyu Zhao, Mingrui Zhu, Shiyin Dong, Nannan Wang, Xinbo Gao
  • for: 这 paper 的目的是提出一种基于倒推的文本到图像个性化方法,以便通过文本提示生成符合个性化概念的图像。
  • methods: 该 paper 使用了一种基于倒推的方法,包括首先分解文本Encoder 在图像生成过程中的集成,然后在这个特点空间中 concatenate 表示,以学习个性化概念与基类之间的差异。
  • results: 该 paper 的实验结果表明,CatVersion 可以更好地保持个性化概念,并且允许更加灵活的编辑。同时,该方法还可以更加准确地评估个性化图像生成的结果。
    Abstract We propose CatVersion, an inversion-based method that learns the personalized concept through a handful of examples. Subsequently, users can utilize text prompts to generate images that embody the personalized concept, thereby achieving text-to-image personalization. In contrast to existing approaches that emphasize word embedding learning or parameter fine-tuning for the diffusion model, which potentially causes concept dilution or overfitting, our method concatenates embeddings on the feature-dense space of the text encoder in the diffusion model to learn the gap between the personalized concept and its base class, aiming to maximize the preservation of prior knowledge in diffusion models while restoring the personalized concepts. To this end, we first dissect the text encoder's integration in the image generation process to identify the feature-dense space of the encoder. Afterward, we concatenate embeddings on the Keys and Values in this space to learn the gap between the personalized concept and its base class. In this way, the concatenated embeddings ultimately manifest as a residual on the original attention output. To more accurately and unbiasedly quantify the results of personalized image generation, we improve the CLIP image alignment score based on masks. Qualitatively and quantitatively, CatVersion helps to restore personalization concepts more faithfully and enables more robust editing.
    摘要 我们提出了CatVersion,一种基于倒推的方法,通过一些示例学习个人化概念。然后,用户可以使用文本提示生成符合个人化概念的图像,实现文本到图像个性化。与现有方法相比,我们的方法不是通过Word embedding学习或Diffusion模型参数微调来强调概念泛化或过拟合,而是将Encoder的特征密集空间 embedding concatenated在Diffusion模型中,以学习个人化概念和基类之间的差异,以最大化Diffusion模型中的优先知识保留,同时还能够Restore个人化概念。为此,我们首先析分Encoder在图像生成过程中的集成,以便Identify特征密集空间。然后,我们在Keys和Values中 concatenated embedding,以学习个人化概念和基类之间的差异。这样, concatenated embedding最终将 manifest as原始注意力输出的差异。为更准确和不偏地衡量个性化图像生成的结果,我们改进了CLIP图像对齐分数,基于masks。从qualitative和quantitative角度来看,CatVersion可以更加忠实地Restore个人化概念,并允许更加稳定的编辑。

Neural Style Transfer for Computer Games

  • paper_url: http://arxiv.org/abs/2311.14617
  • repo_url: None
  • paper_authors: Eleftherios Ioannou, Steve Maddock
  • for: 增强3D电子游戏的视觉效果
  • methods: 在3D渲染管道中注入深度意识的样式转移技术
  • results: 实现了一种可靠、有艺术性的游戏场景样式化方法,超越了现有图像和视频样式转移方法的效果
    Abstract Neural Style Transfer (NST) research has been applied to images, videos, 3D meshes and radiance fields, but its application to 3D computer games remains relatively unexplored. Whilst image and video NST systems can be used as a post-processing effect for a computer game, this results in undesired artefacts and diminished post-processing effects. Here, we present an approach for injecting depth-aware NST as part of the 3D rendering pipeline. Qualitative and quantitative experiments are used to validate our in-game stylisation framework. We demonstrate temporally consistent results of artistically stylised game scenes, outperforming state-of-the-art image and video NST methods.
    摘要 神经风格传输(NST)研究已经应用于图像、视频、3D模型和辐射场景,但它在3D电脑游戏中的应用还相对未经探索。尽管图像和视频NST系统可以作为电脑游戏的后处理效果使用,但这会导致不想要的artefacts和降低后处理效果。在这里,我们提出了在3D渲染管道中注入深度意识的NST方法。我们使用了质量和量的实验来验证我们的游戏风格框架。我们展示了一系列艺术风格化游戏场景,超过了现状的图像和视频NST方法。

Animate124: Animating One Image to 4D Dynamic Scene

  • paper_url: http://arxiv.org/abs/2311.14603
  • repo_url: https://github.com/HeliosZhao/Animate124
  • paper_authors: Yuyang Zhao, Zhiwen Yan, Enze Xie, Lanqing Hong, Zhenguo Li, Gim Hee Lee
  • for: 这篇论文旨在实现将单张宽泛图像转换为3D视频,通过文本动作描述来控制动画。
  • methods: 该方法使用高级4D网格动态神经辐射场(NeRF)模型,在三个阶段进行优化,包括使用2D和3D扩散约束、视频扩散模型和个性化扩散约束。
  • results: 该方法可以具有显著的进步,与现有的基eline相比,根据广泛的量化和质量评估表明。
    Abstract We introduce Animate124 (Animate-one-image-to-4D), the first work to animate a single in-the-wild image into 3D video through textual motion descriptions, an underexplored problem with significant applications. Our 4D generation leverages an advanced 4D grid dynamic Neural Radiance Field (NeRF) model, optimized in three distinct stages using multiple diffusion priors. Initially, a static model is optimized using the reference image, guided by 2D and 3D diffusion priors, which serves as the initialization for the dynamic NeRF. Subsequently, a video diffusion model is employed to learn the motion specific to the subject. However, the object in the 3D videos tends to drift away from the reference image over time. This drift is mainly due to the misalignment between the text prompt and the reference image in the video diffusion model. In the final stage, a personalized diffusion prior is therefore utilized to address the semantic drift. As the pioneering image-text-to-4D generation framework, our method demonstrates significant advancements over existing baselines, evidenced by comprehensive quantitative and qualitative assessments.
    摘要 我们介绍Animate124(animate一张图像到4D),这是首个通过文本运动描述将单个现场图像转化为3D视频的工作。这是一个尚未得到足够关注的问题,具有广泛的应用前景。我们的4D生成方法利用了高级4D网格动态神经辐射场(NeRF)模型,在三个不同的阶段进行了多次扩散假设的优化。首先,使用参考图像为INITIALIZATION的静态模型进行优化,由2D和3D扩散假设引导。然后,使用视频扩散模型学习图像的运动特征。然而,在3D视频中,对象往往会偏离参考图像。这种偏离主要是由文本提示和参考图像在视频扩散模型之间的不同导致的。因此,在最后一个阶段,我们使用个性化扩散假设来解决 semantic drift。作为图像-文本-4D生成框架的先锋,我们的方法在现有的基线上显示出了显著的进步,经过了广泛的量化和质量评估。

Large Language Models as Automated Aligners for benchmarking Vision-Language Models

  • paper_url: http://arxiv.org/abs/2311.14580
  • repo_url: None
  • paper_authors: Yuanfeng Ji, Chongjian Ge, Weikai Kong, Enze Xie, Zhengying Liu, Zhengguo Li, Ping Luo
  • for: 这种论文的目的是为了评估大语言模型(LLMs)在执行复杂的认知和理解任务时的性能,并且检验这些模型是否与人类智能相符。
  • methods: 这种论文使用了自动生成数据集,使用大语言模型(如GPT-4)生成大量的问答逻辑三元组,以便评估视力语言模型(VLMs)的性能。
  • results: 研究发现,使用大语言模型进行自动评估,可以达到85%的一致率,这表明这种方法可以准确评估视力语言模型的性能。
    Abstract With the advancements in Large Language Models (LLMs), Vision-Language Models (VLMs) have reached a new level of sophistication, showing notable competence in executing intricate cognition and reasoning tasks. However, existing evaluation benchmarks, primarily relying on rigid, hand-crafted datasets to measure task-specific performance, face significant limitations in assessing the alignment of these increasingly anthropomorphic models with human intelligence. In this work, we address the limitations via Auto-Bench, which delves into exploring LLMs as proficient aligners, measuring the alignment between VLMs and human intelligence and value through automatic data curation and assessment. Specifically, for data curation, Auto-Bench utilizes LLMs (e.g., GPT-4) to automatically generate a vast set of question-answer-reasoning triplets via prompting on visual symbolic representations (e.g., captions, object locations, instance relationships, and etc.). The curated data closely matches human intent, owing to the extensive world knowledge embedded in LLMs. Through this pipeline, a total of 28.5K human-verified and 3,504K unfiltered question-answer-reasoning triplets have been curated, covering 4 primary abilities and 16 sub-abilities. We subsequently engage LLMs like GPT-3.5 to serve as judges, implementing the quantitative and qualitative automated assessments to facilitate a comprehensive evaluation of VLMs. Our validation results reveal that LLMs are proficient in both evaluation data curation and model assessment, achieving an average agreement rate of 85%. We envision Auto-Bench as a flexible, scalable, and comprehensive benchmark for evaluating the evolving sophisticated VLMs.
    摘要 随着大语言模型(LLMs)的发展,视力语言模型(VLMs)已达到了新的水平,表现出了人类智能的复杂认知和理解能力。然而,现有的评估标准,主要依靠手动制作的数据来衡量任务特定的性能,面临着评估这些越来越人化的模型与人类智能的Alignment的困难。在这种情况下,我们通过Auto-Bench来解决这些问题,它通过自动征集数据和自动评估来衡量LLMs的Alignment。具体来说, для数据征集,Auto-Bench利用LLMs(例如GPT-4)自动生成了一大量的问题答案理解 triplets,通过提示视觉符号表示(例如caption、对象位置、实例关系等)。这些征集到的数据准确反映了人类意图,因为LLMs中嵌入了广泛的世界知识。我们通过这种管道,共征集了28.5K个人验证的和3504K个未过滤的问题答案理解 triplets,覆盖了4种基本能力和16种互相关的子能力。我们接着让LLMs如GPT-3.5服为评价judge,实现了量化和质量自动评估,以便全面评估VLMs。我们的验证结果表明,LLMs在评估数据征集和模型评估方面具有强大的能力,达到了85%的一致率。我们期望Auto-Bench成为一个灵活、扩展性强的评估标准,用于评估逐渐成熟的VLMs。

From Text to Image: Exploring GPT-4Vision’s Potential in Advanced Radiological Analysis across Subspecialties

  • paper_url: http://arxiv.org/abs/2311.14777
  • repo_url: None
  • paper_authors: Felix Busch, Tianyu Han, Marcus Makowski, Daniel Truhn, Keno Bressem, Lisa Adams
  • for: 评估和比较GPT-4和GPT-4Vision在放射学任务中的表现,并提出GPT-4Vision可能可以从图像中识别放射学特征,从而提高其诊断潜力。
  • methods: 使用GPT-4和GPT-4Vision进行放射学任务,比较 их表现。
  • results: GPT-4Vision可能可以更好地识别放射学特征从图像中,提高诊断潜力。
    Abstract The study evaluates and compares GPT-4 and GPT-4Vision for radiological tasks, suggesting GPT-4Vision may recognize radiological features from images, thereby enhancing its diagnostic potential over text-based descriptions.
    摘要 研究比较了GPT-4和GPT-4视力在放射学任务中的表现,表明GPT-4视力可以从图像中识别放射学特征,从而提高其诊断潜力,比文本描述更高。Here's a breakdown of the translation:* 研究 (研究) - study* 比较 (比较) - compare* GPT-4 (GPT-4) - GPT-4* GPT-4视力 (GPT-4视力) - GPT-4Vision* 放射学任务 (放射学任务) - radiological tasks* 表明 (表明) - suggests* 识别 (识别) - recognize* 放射学特征 (放射学特征) - radiological features* 从图像中 (从图像中) - from images* 提高 (提高) - enhance* 诊断潜力 (诊断潜力) - diagnostic potential* 比文本描述 (比文本描述) - compared to text-based descriptions

ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model

  • paper_url: http://arxiv.org/abs/2311.14542
  • repo_url: None
  • paper_authors: Eslam Mohamed Bakr, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord, Patrick Perez, Mohamed Elhoseiny
  • for: 这篇论文旨在提出一种可解释的二维扩散生成模型,以增强生成过程的可读性。
  • methods: 该方法由三个简单可解释的阶段组成:生成框架、alette和细节颜色图像。这些阶段通过精心设计和优化,实现了高效率和准确性。
  • results: 对于LSUN-Churches和COCO数据集的广泛实验表明, ToddlerDiffusion 方法可以持续性地超越现有方法,并且在LSUN-Churches 数据集上与LDM(Stable-Diffusion)相当,而且三倍快速, architecture 相对较小。
    Abstract Diffusion-based generative models excel in perceptually impressive synthesis but face challenges in interpretability. This paper introduces ToddlerDiffusion, an interpretable 2D diffusion image-synthesis framework inspired by the human generation system. Unlike traditional diffusion models with opaque denoising steps, our approach decomposes the generation process into simpler, interpretable stages; generating contours, a palette, and a detailed colored image. This not only enhances overall performance but also enables robust editing and interaction capabilities. Each stage is meticulously formulated for efficiency and accuracy, surpassing Stable-Diffusion (LDM). Extensive experiments on datasets like LSUN-Churches and COCO validate our approach, consistently outperforming existing methods. ToddlerDiffusion achieves notable efficiency, matching LDM performance on LSUN-Churches while operating three times faster with a 3.76 times smaller architecture. Our source code is provided in the supplementary material and will be publicly accessible.
    摘要 Diffusion-based生成模型在视觉上表现出色,但面临解释性的挑战。这篇论文介绍了 ToddlerDiffusion,一种可解释的2D扩散图像生成框架,Draw inspiration from human generation system。不同于传统的扩散模型,我们的方法将生成过程 decomposes into simpler, more interpretable stages:生成框线,alette和细节颜色图像。这不仅提高了总性表现,还允许Robust editing and interaction capabilities。每个阶段都仔细制定了,以确保高效性和准确性,超越Stable-Diffusion(LDM)。广泛的实验表明,ToddlerDiffusion在LSUN-Churches和COCO数据集上具有显著的高效性,与LDM相当,而且操作速度三倍, architecture size为3.76倍。我们的源代码将在补充材料中提供,并将公开 accessible。

READS-V: Real-time Automated Detection of Epileptic Seizures from Surveillance Videos via Skeleton-based Spatiotemporal ViG

  • paper_url: http://arxiv.org/abs/2311.14775
  • repo_url: None
  • paper_authors: Yankun Xu, Jie Yang, Wenjie Ming, Shuang Wang, Mohamad Sawan
  • 为:开发一个高效、准确、及时的视频基于 epileptic seizure 开始探测系统,以便为患者提供更好的监测和诊断。* 方法:使用skeleton-based spatiotemporal vision graph neural network (STViG),能够快速、准确地识别患者的动作,并且可以在实时监测视频中探测 epileptic seizure 开始。* 结果:实验结果显示,STViG 在收集的患者视频数据上比前一代动作识别模型更高度准确(错误率5.9%),并且具有较低的计算复杂度(0.4G)。此外,通过结合输出概率和积累函数的决策规则,我们的 READS-V 系统实现了5.1 s EEG 开始探测延迟时间,比临床标准延迟时间提前13.1 s,并且无false detection。
    Abstract An accurate and efficient epileptic seizure onset detection system can significantly benefit patients. Traditional diagnostic methods, primarily relying on electroencephalograms (EEGs), often result in cumbersome and non-portable solutions, making continuous patient monitoring challenging. The video-based seizure detection system is expected to free patients from the constraints of scalp or implanted EEG devices and enable remote monitoring in residential settings. Previous video-based methods neither enable all-day monitoring nor provide short detection latency due to insufficient resources and ineffective patient action recognition techniques. Additionally, skeleton-based action recognition approaches remain limitations in identifying subtle seizure-related actions. To address these challenges, we propose a novel skeleton-based spatiotemporal vision graph neural network (STViG) for efficient, accurate, and timely REal-time Automated Detection of epileptic Seizures from surveillance Videos (READS-V). Our experimental results indicate STViG outperforms previous state-of-the-art action recognition models on our collected patients' video data with higher accuracy (5.9% error) and lower FLOPs (0.4G). Furthermore, by integrating a decision-making rule that combines output probabilities and an accumulative function, our READS-V system achieves a 5.1 s EEG onset detection latency, a 13.1 s advance in clinical onset detection, and zero false detection rate.
    摘要 一个精准、高效的癫痫发作起始检测系统可以对患者提供 significannot benefits。传统诊断方法通常基于电энцефалографи(EEG),往往会导致不便宜和不可搬移的解决方案,使得不间断监测患者困难。视频基本检测系统预计可以解вобо患者从头皮或植入EEG设备的限制中解放出来,并允许在家庭环境中进行远程监测。过去的视频基本方法未能实现全天候监测,也未能提供短检测延迟,因为不足的资源和不效的患者行为识别技术。此外,骨骼基本行动识别方法还存在识别癫痫发作相关行动的限制。为解决这些挑战,我们提出了一种新的骨骼基本空间视觉 graphs neuronal network(STViG),用于高效、准确、及时的REal-time Automated Detection of epileptic Seizures from surveillance Videos(READS-V)。我们的实验结果表明,STViG在我们收集的患者视频数据上表现出优于之前的状态艺术行动识别模型,误差率为5.9%,FLOPs为0.4G。此外,通过结合输出概率和积累函数的决策规则,我们的 READS-V 系统实现了5.1 s EEG 起始检测延迟,13.1 s 提前诊断起点,并无误检测。

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

  • paper_url: http://arxiv.org/abs/2311.14521
  • repo_url: None
  • paper_authors: Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang, Xiaofeng Yang, Yikai Wang, Zhongang Cai, Lei Yang, Huaping Liu, Guosheng Lin
  • For: This paper aims to improve the efficiency and control of 3D editing methods, particularly in complex scenes, by introducing a novel 3D representation called Gaussian Splatting (GS) and a new editing algorithm called GaussianEditor.* Methods: The proposed GaussianEditor algorithm uses Gaussian semantic tracing to trace the editing target throughout the training process, and Hierarchical Gaussian splatting (HGS) to achieve stabilized and fine results under stochastic generative guidance from 2D diffusion models. The algorithm also includes editing strategies for efficient object removal and integration.* Results: The paper presents comprehensive experiments that demonstrate the superior control, efficacy, and rapid performance of GaussianEditor compared to traditional 3D editing methods. The results show that GaussianEditor can effectively edit complex scenes with high precision and efficiency, and is particularly useful for tasks such as object removal and integration.
    Abstract 3D editing plays a crucial role in many areas such as gaming and virtual reality. Traditional 3D editing methods, which rely on representations like meshes and point clouds, often fall short in realistically depicting complex scenes. On the other hand, methods based on implicit 3D representations, like Neural Radiance Field (NeRF), render complex scenes effectively but suffer from slow processing speeds and limited control over specific scene areas. In response to these challenges, our paper presents GaussianEditor, an innovative and efficient 3D editing algorithm based on Gaussian Splatting (GS), a novel 3D representation. GaussianEditor enhances precision and control in editing through our proposed Gaussian semantic tracing, which traces the editing target throughout the training process. Additionally, we propose Hierarchical Gaussian splatting (HGS) to achieve stabilized and fine results under stochastic generative guidance from 2D diffusion models. We also develop editing strategies for efficient object removal and integration, a challenging task for existing methods. Our comprehensive experiments demonstrate GaussianEditor's superior control, efficacy, and rapid performance, marking a significant advancement in 3D editing. Project Page: https://buaacyw.github.io/gaussian-editor/
    摘要 三维编辑在许多领域,如游戏和虚拟现实,扮演着重要的角色。传统的三维编辑方法,例如基于三角形和点云的表示方法,经常无法实际地重现复杂的场景。然而,基于各种各样的三维表示方法,如神经辐射场(NeRF),可以有效地渲染复杂的场景,但是它们的处理速度较慢,控制特定场景区域的能力受限。为了解决这些挑战,我们的论文提出了一种有效和高效的三维编辑算法,基于 Gaussian Splatting(GS),一种新的三维表示方法。我们的 Gaussian Editor 提高了编辑精度和控制,通过我们的提议的 Gaussian semantic tracing,在训练过程中跟踪编辑目标。此外,我们还提出了层次 Gaussian splatting(HGS),以实现稳定和细腻的结果,通过从二维扩散模型获得的随机生成指导。我们还开发了高效的对象移除和集成策略,这是现有方法中的一个挑战。我们的完整的实验证明 Gaussian Editor 的精度、效果和快速性,标志着三维编辑领域的一个重要进步。项目页面:https://buaacyw.github.io/gaussian-editor/

Multi-Class Anomaly Detection based on Regularized Discriminative Coupled hypersphere-based Feature Adaptation

  • paper_url: http://arxiv.org/abs/2311.14506
  • repo_url: None
  • paper_authors: Mehdi Rafiei, Alexandros Iosifidis
  • for: 这篇论文旨在解决多类别异常检测中的问题,提出了一个新的模型,即Regularized Discriminative Coupled-hypersphere-based Feature Adaptation (RD-CFA)。
  • methods: 这篇论文使用了一种modified Regularized Discriminative Variational Auto-Encoder (RD-VAE)来获取类分布特征,然后与Coupled-hypersphere-based Feature Adaptation (CFA)组合,以实现多类别异常检测。
  • results: 经过广泛的评估,RD-CFA比起八种当前的方法表现出色,能够优化异常检测和地图化。
    Abstract In anomaly detection, identification of anomalies across diverse product categories is a complex task. This paper introduces a new model by including class discriminative properties obtained by a modified Regularized Discriminative Variational Auto-Encoder (RD-VAE) in the feature extraction process of Coupled-hypersphere-based Feature Adaptation (CFA). By doing so, the proposed Regularized Discriminative Coupled-hypersphere-based Feature Adaptation (RD-CFA), forms a solution for multi-class anomaly detection. By using the discriminative power of RD-VAE to capture intricate class distributions, combined with CFA's robust anomaly detection capability, the proposed method excels in discerning anomalies across various classes. Extensive evaluations on multi-class anomaly detection and localization using the MVTec AD and BeanTech AD datasets showcase the effectiveness of RD-CFA compared to eight leading contemporary methods.
    摘要 在异常检测中,针对不同产品类别的异常检测是一项复杂的任务。这篇论文提出了一种新的模型,该模型通过包含修改后的常规化学抑制变换器(RD-VAE)获取分类特征属性,并将其与嵌入式半球体特征适应(CFA)结合使用。这种提议的模型被称为常规化嵌入式半球体特征适应(RD-CFA),用于多类异常检测。通过使用RD-VAE捕捉复杂的分类分布,并与CFA的强大异常检测能力相结合,提议的方法在不同类别的异常检测和 lokalisierung中表现出色。经过对多类异常检测和 lokalisierung使用MVTec AD和BeanTech AD数据集的广泛评估,提议的方法与当前八种领先方法相比,显示出了更高的效果。

MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation

  • paper_url: http://arxiv.org/abs/2311.14494
  • repo_url: https://github.com/wu-cvgl/mvcontrol
  • paper_authors: Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu
  • for: 增强现有预训练多视图2D扩散模型,以生成可控多视图图像和视角一致3D内容。
  • methods: 基于MVDream模型,采用新的神经网络模块进行终端任务特定条件学习。提出一种新的条件机制,通过预测输入空间和视角条件的嵌入来控制网络的生成图像。
  • results: 通过训练MVControl模型,可以实现高质量3D内容的生成,并且可以控制生成图像的形状和视角。广泛的实验表明,我们的方法具有强大的普适性和可控性。代码可以在https://github.com/WU-CVGL/MVControl/上下载。
    Abstract We introduce MVControl, a novel neural network architecture that enhances existing pre-trained multi-view 2D diffusion models by incorporating additional input conditions, e.g. edge maps. Our approach enables the generation of controllable multi-view images and view-consistent 3D content. To achieve controllable multi-view image generation, we leverage MVDream as our base model, and train a new neural network module as additional plugin for end-to-end task-specific condition learning. To precisely control the shapes and views of generated images, we innovatively propose a new conditioning mechanism that predicts an embedding encapsulating the input spatial and view conditions, which is then injected to the network globally. Once MVControl is trained, score-distillation (SDS) loss based optimization can be performed to generate 3D content, in which process we propose to use a hybrid diffusion prior. The hybrid prior relies on a pre-trained Stable-Diffusion network and our trained MVControl for additional guidance. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content. Code available at https://github.com/WU-CVGL/MVControl/.
    摘要 我们介绍MVControl,一种新的神经网络架构,可以增强现有预训练的多视图2D扩散模型,例如添加边图等输入条件。我们的方法可以生成可控的多视图图像和视图一致的3D内容。为实现可控的多视图图像生成,我们利用MVDream作为基本模型,并训练一个新的神经网络模块作为特定任务的终端用途。为了准确地控制生成的图像的形状和视图,我们创新地提出了一种新的条件定义机制,可以预测输入空间和视图条件的嵌入,并将其注入到网络中。一旦MVControl被训练,可以使用SDS损失基于优化来生成3D内容,在这个过程中,我们提议使用混合扩散先验。混合先验基于预训练的稳定扩散网络和我们训练的MVControl,以提供额外的指导。广泛的实验表明,我们的方法可以具有robust的通用性,并且可以生成高质量的3D内容。代码可以在https://github.com/WU-CVGL/MVControl/上获取。

Set Features for Anomaly Detection

  • paper_url: http://arxiv.org/abs/2311.14773
  • repo_url: https://github.com/abhishekpatel-lpu/CICIDS-2017-intrution-detection-
  • paper_authors: Niv Cohen, Issar Tzachor, Yedid Hoshen
  • for: 本研究旨在探讨如何探测样本中不寻常的组合元素。
  • methods: 本文提出了一种基于元素分布模型的方法,通过简单的概率计算方法来计算样本的异常分数。
  • results: 本研究比较了fixed feature方法和state-of-the-art segmentation-based方法,发现本方法在图像水平的逻辑异常检测和时间序列异常检测中表现较佳。
    Abstract This paper proposes set features for detecting anomalies in samples that consist of unusual combinations of normal elements. Many leading methods discover anomalies by detecting an unusual part of a sample. For example, state-of-the-art segmentation-based approaches, first classify each element of the sample (e.g., image patch) as normal or anomalous and then classify the entire sample as anomalous if it contains anomalous elements. However, such approaches do not extend well to scenarios where the anomalies are expressed by an unusual combination of normal elements. In this paper, we overcome this limitation by proposing set features that model each sample by the distribution of its elements. We compute the anomaly score of each sample using a simple density estimation method, using fixed features. Our approach outperforms the previous state-of-the-art in image-level logical anomaly detection and sequence-level time series anomaly detection.
    摘要 这篇论文提出了一种针对含有不寻常组合的正常元素的异常检测方法。许多领先方法通过检测样本中异常部分来检测异常。例如,当前领先的分割基于方法先将每个样本中的每个元素(例如图像块)分类为正常或异常,然后将整个样本分类为异常如果它包含异常元素。但这些方法不太适用于情况下,异常表现为正常元素的不寻常组合。在这篇论文中,我们解决了这个限制,我们提出了一种基于元素分布的集合特征来模型每个样本。我们使用了一种简单的涂抹估计方法来计算每个样本的异常分数。我们的方法在图像水平逻辑异常检测和时间序列异常检测中超过了前一个领先方法。

Towards Interpretable Classification of Leukocytes based on Deep Learning

  • paper_url: http://arxiv.org/abs/2311.14485
  • repo_url: None
  • paper_authors: Stefan Röhrl, Johannes Groll, Manuel Lengl, Simon Schumann, Christian Klenk, Dominik Heim, Martin Knopp, Oliver Hayden, Klaus Diepold
  • for: 这个论文旨在提高自标量方法的精度和可靠性,以便更好地将其应用于临床决策过程中。
  • methods: 该论文使用了机器学习方法,包括自适应核算法和深度学习模型,以提高细胞类型分类的精度和可靠性。
  • results: 研究人员通过对不同enario的血细胞分析进行比较,发现了一些通用的检测模式,并证明了这些方法在不同的情况下的可行性和有用性。
    Abstract Label-free approaches are attractive in cytological imaging due to their flexibility and cost efficiency. They are supported by machine learning methods, which, despite the lack of labeling and the associated lower contrast, can classify cells with high accuracy where the human observer has little chance to discriminate cells. In order to better integrate these workflows into the clinical decision making process, this work investigates the calibration of confidence estimation for the automated classification of leukocytes. In addition, different visual explanation approaches are compared, which should bring machine decision making closer to professional healthcare applications. Furthermore, we were able to identify general detection patterns in neural networks and demonstrate the utility of the presented approaches in different scenarios of blood cell analysis.
    摘要 自适应方法在细胞成像中具有灵活性和成本效益,这些方法通过机器学习方法实现,尽管无标签和相关下降的对比度,仍可以准确地分类细胞,人类观察员很难进行分辨。为了更好地将这些工作流程integrate到临床决策过程中,本研究检验了自动分类细胞的准确性评估的准确性。此外,我们还比较了不同的视觉解释方法,以便将机器决策更近于专业医疗应用。此外,我们还发现了神经网络中的普遍探测模式,并证明了提出的方法在不同的血液细胞分析场景中的实用性。

Trainwreck: A damaging adversarial attack on image classifiers

  • paper_url: http://arxiv.org/abs/2311.14772
  • repo_url: https://github.com/janzahalka/trainwreck
  • paper_authors: Jan Zahálka
  • for: 这个论文探讨了一种新的攻击vector,即损害性 adversarial attack(DAAs),用于让计算机视觉(CV)模型受损,以达到经济抗衰的目的。
  • methods: 这篇论文提出了一种名为 Trainwreck 的训练时间攻击,用于腐蚀图像分类器的性能。 Trainwreck 使用了隐藏式(ε ≤ 8/255)的类对对 Universal 抖动来融合类似类的数据,从而使得模型在训练时受到损害。
  • results: 实验表明, Trainwreck 是一种有效的攻击,可以适用于不同的模型架构,包括 EfficientNetV2、ResNeXt-101 和 finetuned ViT-L-16。 攻击的强度可以通过毒素率参数来调整。 针对 Trainwreck 或类似 DAAs,数据重复和文件哈希/像素差可以作为可靠的防御技术。
    Abstract Adversarial attacks are an important security concern for computer vision (CV), as they enable malicious attackers to reliably manipulate CV models. Existing attacks aim to elicit an output desired by the attacker, but keep the model fully intact on clean data. With CV models becoming increasingly valuable assets in applied practice, a new attack vector is emerging: disrupting the models as a form of economic sabotage. This paper opens up the exploration of damaging adversarial attacks (DAAs) that seek to damage the target model and maximize the total cost incurred by the damage. As a pioneer DAA, this paper proposes Trainwreck, a train-time attack that poisons the training data of image classifiers to degrade their performance. Trainwreck conflates the data of similar classes using stealthy ($\epsilon \leq 8/255$) class-pair universal perturbations computed using a surrogate model. Trainwreck is a black-box, transferable attack: it requires no knowledge of the target model's architecture, and a single poisoned dataset degrades the performance of any model trained on it. The experimental evaluation on CIFAR-10 and CIFAR-100 demonstrates that Trainwreck is indeed an effective attack across various model architectures including EfficientNetV2, ResNeXt-101, and a finetuned ViT-L-16. The strength of the attack can be customized by the poison rate parameter. Finally, data redundancy with file hashing and/or pixel difference are identified as a reliable defense technique against Trainwreck or similar DAAs. The code is available at https://github.com/JanZahalka/trainwreck.
    摘要 “机器视觉(CV)模型的反攻击是一项重要的安全性应急,因为它允许攻击者随意地改变CV模型的输出。现有的攻击尝试使模型产生攻击者所需的输出,但是保持清洁数据时模型完整无损。随着CV模型在实践中的重要性增加,一新的攻击方向正在出现:破坏模型作为经济战略的攻击。本文开启了这种破坏性反攻击(DAAs)的探索,并提出了一个名为“Trainwreck”的训练时间攻击。Trainwreck使用隐藏($\epsilon \leq 8/255)”的型别对应运算,在训练过程中混乱相似类型的数据,导致模型的性能下降。Trainwreck是一个黑盒子、可转移的攻击,它不需要攻击者知道目标模型的架构,仅需将毒化的训练数据提供给任何模型,即使是不同的架构。实验评估在CIFAR-10和CIFAR-100上,显示Trainwreck是一个有效的攻击,可以适用于不同的模型架构,包括EfficientNetV2、ResNeXt-101和Finetuned ViT-L-16。攻击强度可以通过毒素率参数自动调整。最后,我们发现了使用档案哈希和/或像素差来防护Trainwreck或相似的DAAs的资料重复是一个可靠的防御技术。代码可以在https://github.com/JanZahalka/trainwreck上取得。”

Joint Diffusion: Mutual Consistency-Driven Diffusion Model for PET-MRI Co-Reconstruction

  • paper_url: http://arxiv.org/abs/2311.14473
  • repo_url: None
  • paper_authors: Taofeng Xie, Zhuo-Xu Cui, Chen Luo, Huayu Wang, Congcong Liu, Yuanzhi Zhang, Xuemei Wang, Yanjie Zhu, Qiyu Jin, Guoqing Chen, Yihang Zhou, Dong Liang, Haifeng Wang
    for: 这项研究的目的是提高PET-MRI系统中的功能和解剖成像的速度和质量。methods: 这项研究使用了一种新的MC-Diffusion模型,利用多Modal图像之间的共同信息来提高图像重建。results: 研究结果表明,MC-Diffusion模型可以在PET-MRI系统中提高图像质量和速度,超过了现有的方法。
    Abstract Positron Emission Tomography and Magnetic Resonance Imaging (PET-MRI) systems can obtain functional and anatomical scans. PET suffers from a low signal-to-noise ratio. Meanwhile, the k-space data acquisition process in MRI is time-consuming. The study aims to accelerate MRI and enhance PET image quality. Conventional approaches involve the separate reconstruction of each modality within PET-MRI systems. However, there exists complementary information among multi-modal images. The complementary information can contribute to image reconstruction. In this study, we propose a novel PET-MRI joint reconstruction model employing a mutual consistency-driven diffusion mode, namely MC-Diffusion. MC-Diffusion learns the joint probability distribution of PET and MRI for utilizing complementary information. We conducted a series of contrast experiments about LPLS, Joint ISAT-net and MC-Diffusion by the ADNI dataset. The results underscore the qualitative and quantitative improvements achieved by MC-Diffusion, surpassing the state-of-the-art method.
    摘要

CT-xCOV: a CT-scan based Explainable Framework for COVid-19 diagnosis

  • paper_url: http://arxiv.org/abs/2311.14462
  • repo_url: https://github.com/ismailelbouknify/ct-xcov
  • paper_authors: Ismail Elbouknify, Afaf Bouhoute, Khalid Fardousse, Ismail Berrada, Abdelmajid Badri
  • for: 这paper的目的是开发一个可解释的深度学习涂抹COVID-19诊断模型,以及提供可视化和文本化解释。
  • methods: 这paper使用了U-Net模型进行肺部分 segmentation,并对COVID-19检测使用了三种不同的CNN架构:标准CNN、ResNet50和DenseNet121。在检测后,使用了三种XAI技术:Grad-Cam、Integrated Gradient(IG)和LIME提供可视化和文本化解释。
  • results: 实验结果表明,使用的DL模型具有良好的性能。U-Net分割模型达到了98%的Dice指标。对于提posed的分类模型(标准CNN),使用5-fold交叉验证(准精度98.40%,准确率98.23%)。在XAI技术比较中,Grad-Cam得到了最好的解释结果,与IG和LIME相比,其达到了55%的COVID-19阳性扫描中的Dice指标。
    Abstract In this work, CT-xCOV, an explainable framework for COVID-19 diagnosis using Deep Learning (DL) on CT-scans is developed. CT-xCOV adopts an end-to-end approach from lung segmentation to COVID-19 detection and explanations of the detection model's prediction. For lung segmentation, we used the well-known U-Net model. For COVID-19 detection, we compared three different CNN architectures: a standard CNN, ResNet50, and DenseNet121. After the detection, visual and textual explanations are provided. For visual explanations, we applied three different XAI techniques, namely, Grad-Cam, Integrated Gradient (IG), and LIME. Textual explanations are added by computing the percentage of infection by lungs. To assess the performance of the used XAI techniques, we propose a ground-truth-based evaluation method, measuring the similarity between the visualization outputs and the ground-truth infections. The performed experiments show that the applied DL models achieved good results. The U-Net segmentation model achieved a high Dice coefficient (98%). The performance of our proposed classification model (standard CNN) was validated using 5-fold cross-validation (acc of 98.40% and f1-score 98.23%). Lastly, the results of the comparison of XAI techniques show that Grad-Cam gives the best explanations compared to LIME and IG, by achieving a Dice coefficient of 55%, on COVID-19 positive scans, compared to 29% and 24% obtained by IG and LIME respectively. The code and the dataset used in this paper are available in the GitHub repository [1].
    摘要 在这项工作中,我们开发了一个可解释的Deep Learning(DL)框架,用于COVID-19诊断 based on CT-scans。我们采用了一个端到端的方法,从肺部分 segmentation 到 COVID-19检测和检测模型预测的解释。用于肺部分 segmentation 的模型是 U-Net 模型。用于 COVID-19 检测,我们比较了三种不同的 Convolutional Neural Network(CNN)架构:标准 CNN、ResNet50 和 DenseNet121。检测后,我们提供了视觉和文本的解释。用于视觉解释,我们应用了三种不同的 Explainable AI(XAI)技术, namely,Grad-Cam、Integrated Gradient(IG)和 LIME。文本解释是通过计算感染部分的百分比来进行的。为了评估使用的 XAI 技术的性能,我们提出了一种基于真实数据的评估方法,测量视觉化输出和真实感染的相似度。实验结果表明,我们使用的 DL 模型具有良好的性能。U-Net segmentation 模型达到了 98% 的 Dice 系数。我们的提出的分类模型(标准 CNN)在 5-fold cross-validation 中得到了 98.40% 的准确率和 98.23% 的 F1-score。最后,我们对 XAI 技术进行了比较,结果显示 Grad-Cam 提供了最好的解释,其 Dice 系数为 COVID-19 阳性扫描中的 55%,比 LIME 和 IG 的 29% 和 24% 高出了许多。代码和数据集使用在这篇论文中,可以在 GitHub 存储库中找到。

IDD-AW: A Benchmark for Safe and Robust Segmentation of Drive Scenes in Unstructured Traffic and Adverse Weather

  • paper_url: http://arxiv.org/abs/2311.14459
  • repo_url: None
  • paper_authors: Furqan Ahmed Shaik, Abhishek Malreddy, Nikhil Reddy Billa, Kunal Chaudhary, Sunny Manchanda, Girish Varma
  • for: 本研究旨在提供一个高度Robustness的大规模自动驾驶 dataset,以满足现代自动驾驶系统的需求。
  • methods: 我们引入了一个新的捕捉和标注方法,并使用了高品质的图像和标注数据,以提高模型的准确性和可靠性。
  • results: 我们的实验结果表明,IDD-AW 是目前最加分的大规模自动驾驶 dataset,可以挑战和提高现有的模型性能。
    Abstract Large-scale deployment of fully autonomous vehicles requires a very high degree of robustness to unstructured traffic, and weather conditions, and should prevent unsafe mispredictions. While there are several datasets and benchmarks focusing on segmentation for drive scenes, they are not specifically focused on safety and robustness issues. We introduce the IDD-AW dataset, which provides 5000 pairs of high-quality images with pixel-level annotations, captured under rain, fog, low light, and snow in unstructured driving conditions. As compared to other adverse weather datasets, we provide i.) more annotated images, ii.) paired Near-Infrared (NIR) image for each frame, iii.) larger label set with a 4-level label hierarchy to capture unstructured traffic conditions. We benchmark state-of-the-art models for semantic segmentation in IDD-AW. We also propose a new metric called ''Safe mean Intersection over Union (Safe mIoU)'' for hierarchical datasets which penalizes dangerous mispredictions that are not captured in the traditional definition of mean Intersection over Union (mIoU). The results show that IDD-AW is one of the most challenging datasets to date for these tasks. The dataset and code will be available here: http://iddaw.github.io.
    摘要 大规模自动驾驶汽车部署需要非常高度的Robustness,以适应不结构化交通和天气条件,并避免 unsafe 的预测错误。当前有几个数据集和标准套件专注于驾驶场景的分割,但他们并不特地关注安全和可靠性问题。我们介绍了 IDD-AW 数据集,该数据集提供了 5000 个高质量图像,每个图像都有像素级注解,在雨、雾、低光和雪等不结构化驾驶条件下拍摄。与其他不良天气数据集相比,我们提供了以下优势:i.) 更多的注释图像ii.) 每帧图像都有对应的 Near-Infrared(NIR)图像iii.) 更大的标签集,包括4级标签层,以捕捉不结构化交通条件我们使用现状最佳模型进行 semantic segmentation 的测试,并提出了一个新的指标called ''Safe mean Intersection over Union''(Safe mIoU),用于衡量层次数据集中的安全性。结果表明,IDD-AW 是目前最有挑战性的数据集之一。数据集和代码将在以下地址公开:http://iddaw.github.io。

Segment (Almost) Nothing: Prompt-Agnostic Adversarial Attacks on Segmentation Models

  • paper_url: http://arxiv.org/abs/2311.14450
  • repo_url: None
  • paper_authors: Francesco Croce, Matthias Hein
  • for: 本研究旨在提出一种基于Latent Space的攻击方法,可以对多种提示(包括视觉和文本提示)进行攻击。
  • methods: 该方法使用了image encoder来对输入图像进行编码,然后使用embedding vectors进行攻击。
  • results: 研究发现,只需添加一定的杂音(radius=1/255)可以使得基于Latent Space的攻击方法对多种提示进行攻击,并且这些攻击可以被应用于任何输入图像。
    Abstract General purpose segmentation models are able to generate (semantic) segmentation masks from a variety of prompts, including visual (points, boxed, etc.) and textual (object names) ones. In particular, input images are pre-processed by an image encoder to obtain embedding vectors which are later used for mask predictions. Existing adversarial attacks target the end-to-end tasks, i.e. aim at altering the segmentation mask predicted for a specific image-prompt pair. However, this requires running an individual attack for each new prompt for the same image. We propose instead to generate prompt-agnostic adversarial attacks by maximizing the $\ell_2$-distance, in the latent space, between the embedding of the original and perturbed images. Since the encoding process only depends on the image, distorted image representations will cause perturbations in the segmentation masks for a variety of prompts. We show that even imperceptible $\ell_\infty$-bounded perturbations of radius $\epsilon=1/255$ are often sufficient to drastically modify the masks predicted with point, box and text prompts by recently proposed foundation models for segmentation. Moreover, we explore the possibility of creating universal, i.e. non image-specific, attacks which can be readily applied to any input without further computational cost.
    摘要 通用分割模型可以生成(semantic)分割面照,包括视觉(点、盒子等)和文本(对象名)等多种提示。具体来说,输入图像先经图像编码器处理,以获取嵌入向量,然后用于预测分割面照。现有的敌意攻击target着端到端任务,即想要对特定图像-提示对的分割面照进行修改。然而,这需要对每个新提示进行专门的攻击。我们提议代之而行的是生成提示无关的敌意攻击,通过在嵌入空间中最大化 $\ell_2$ 距离,以使得编码过程只依赖于图像,导致各种提示下的分割面照受到扰动。我们显示,甚至是可见度很低的 $\ell_\infty$ 约束Radius $\epsilon=1/255$ 的扰动也可以很快地大幅修改由最近提出的基础模型进行分割的分割面照。此外,我们探讨了创建通用的攻击,可以无需进一步的计算成本,直接应用于任何输入。

Deformable multi-modal image registration for the correlation between optical measurements and histology images

  • paper_url: http://arxiv.org/abs/2311.14414
  • repo_url: None
  • paper_authors: Lianne Feenstra, Maud Lambregts, Theo J. M Ruers, Behdad Dashtbozorg
  • for: 提高仪器技术的验证性,减少人工注册错误和不一致性。
  • methods: 使用深度学习原理自动多Modal图像对alignment,并利用手动注册图像作为准确参照。
  • results: 对比手动注册和自动注册,自动注册表现出色,Dice分数和互信息指标都高于手动注册。
    Abstract The correlation of optical measurements with a correct pathology label is often hampered by imprecise registration caused by deformations in histology images. This study explores an automated multi-modal image registration technique utilizing deep learning principles to align snapshot breast specimen images with corresponding histology images. The input images, acquired through different modalities, present challenges due to variations in intensities and structural visibility, making linear assumptions inappropriate. An unsupervised and supervised learning approach, based on the VoxelMorph model, was explored, making use of a dataset with manually registered images used as ground truth. Evaluation metrics, including Dice scores and mutual information, reveal that the unsupervised model outperforms the supervised (and manual approach) significantly, achieving superior image alignment. This automated registration approach holds promise for improving the validation of optical technologies by minimizing human errors and inconsistencies associated with manual registration.
    摘要 “对于光学测量和正确病理标签之间的相互相关,经常会受到压缩图像的干扰,这使得对压缩图像进行自动多modal镜像对接成为一个重要的研究课题。本研究探讨了使用深度学习原理来自动对接 snapshot乳腺标本图像和相应的病理图像。输入图像,通过不同modalities所取得,具有不同的强度和结构可视性,使得线性假设无法应用。本研究探讨了不supervised和supervised学习方法,基于VoxelMorph模型,并使用了手动注册图像作为参考标准。评估指标,包括 dice分数和共识度,显示了不supervised模型对supervised(以及人工方法)进行了明显的超越,实现了更好的图像对接。这自动对接方法具有改善光学技术的验证过程中的人类错误和不一致性的潜力。”

OneFormer3D: One Transformer for Unified Point Cloud Segmentation

  • paper_url: http://arxiv.org/abs/2311.14405
  • repo_url: https://github.com/filapro/oneformer3d
  • paper_authors: Maxim Kolodiazhnyi, Anna Vorontsova, Anton Konushin, Danila Rukhovich
  • for: 这个论文 targets 3D 点云Semantic, Instance, and Panoptic Segmentation tasks.
  • methods: 该论文提出了一种统一的、简单的和高效的模型,名为OneFormer3D,可以同时地处理这三种任务。该模型使用可学习的核函数,每个核函数负责生成一个实例或 semantic category 的面积掩模。这些核函数被一个基于 transformer 的解码器进行培育,并将所有实例和 semantic queries 作为输入传递给该解码器。这种设计允许在单个训练运行中培育一个模型,从而实现所有三个分 segmentation 任务的同时优秀性能。
  • results: 该论文在ScanNet 测试领先борDEX中 ranking 1st, 并设置了新的 state-of-the-art (+2.1 mAP50) 成绩。此外,该论文还在 ScanNet200 和 S3DIS 数据集上实现了 state-of-the-art 的结果。
    Abstract Semantic, instance, and panoptic segmentation of 3D point clouds have been addressed using task-specific models of distinct design. Thereby, the similarity of all segmentation tasks and the implicit relationship between them have not been utilized effectively. This paper presents a unified, simple, and effective model addressing all these tasks jointly. The model, named OneFormer3D, performs instance and semantic segmentation consistently, using a group of learnable kernels, where each kernel is responsible for generating a mask for either an instance or a semantic category. These kernels are trained with a transformer-based decoder with unified instance and semantic queries passed as an input. Such a design enables training a model end-to-end in a single run, so that it achieves top performance on all three segmentation tasks simultaneously. Specifically, our OneFormer3D ranks 1st and sets a new state-of-the-art (+2.1 mAP50) in the ScanNet test leaderboard. We also demonstrate the state-of-the-art results in semantic, instance, and panoptic segmentation of ScanNet (+21 PQ), ScanNet200 (+3.8 mAP50), and S3DIS (+0.8 mIoU) datasets.
    摘要 <>TRANSLATE_TEXTSemantic, instance, and panoptic segmentation of 3D point clouds have been addressed using task-specific models of distinct design. Thereby, the similarity of all segmentation tasks and the implicit relationship between them have not been utilized effectively. This paper presents a unified, simple, and effective model addressing all these tasks jointly. The model, named OneFormer3D, performs instance and semantic segmentation consistently, using a group of learnable kernels, where each kernel is responsible for generating a mask for either an instance or a semantic category. These kernels are trained with a transformer-based decoder with unified instance and semantic queries passed as an input. Such a design enables training a model end-to-end in a single run, so that it achieves top performance on all three segmentation tasks simultaneously. Specifically, our OneFormer3D ranks 1st and sets a new state-of-the-art (+2.1 mAP50) in the ScanNet test leaderboard. We also demonstrate the state-of-the-art results in semantic, instance, and panoptic segmentation of ScanNet (+21 PQ), ScanNet200 (+3.8 mAP50), and S3DIS (+0.8 mIoU) datasets.TRANSLATE_TEXT

Multi-scale Semantic Correlation Mining for Visible-Infrared Person Re-Identification

  • paper_url: http://arxiv.org/abs/2311.14395
  • repo_url: https://github.com/Hua-XC/MSCMNet
  • paper_authors: Ke Cheng, Xuecheng Hua, Hu Lu, Juanjuan Tu, Yuanquan Wang, Shitong Wang
  • for: 提高Visible-Infrared Person Re-Identification (VI-ReID) 任务中的匹配精度,主要是怎样提取不同模式的特征来匹配用。
  • methods: 提出了一种 Multi-scale Semantic Correlation Mining network (MSCMNet),该网络包括三个新的组成部分:首先,通过考虑多种模式信息的有效利用,设计了一个多 scales Information Correlation Mining Block (MIMB),以探索多个缩放级别的semantic correlations; 其次,为MIMB提供更多的semantic信息,设计了一个 quadruple-stream feature extractor (QFE) with non-shared parameters,从不同维度的数据集中提取信息; 最后,提出了一种 Quadruple Center Triplet Loss (QCT),以解决总特征中的信息不一致问题。
  • results: 在SYSU-MM01、RegDB和LLCM datasets上进行了广泛的实验,结果显示,提出的MSCMNet可以达到最高的匹配精度。
    Abstract The main challenge in the Visible-Infrared Person Re-Identification (VI-ReID) task lies in how to extract discriminative features from different modalities for matching purposes. While the existing well works primarily focus on minimizing the modal discrepancies, the modality information can not thoroughly be leveraged. To solve this problem, a Multi-scale Semantic Correlation Mining network (MSCMNet) is proposed to comprehensively exploit semantic features at multiple scales and simultaneously reduce modality information loss as small as possible in feature extraction. The proposed network contains three novel components. Firstly, after taking into account the effective utilization of modality information, the Multi-scale Information Correlation Mining Block (MIMB) is designed to explore semantic correlations across multiple scales. Secondly, in order to enrich the semantic information that MIMB can utilize, a quadruple-stream feature extractor (QFE) with non-shared parameters is specifically designed to extract information from different dimensions of the dataset. Finally, the Quadruple Center Triplet Loss (QCT) is further proposed to address the information discrepancy in the comprehensive features. Extensive experiments on the SYSU-MM01, RegDB, and LLCM datasets demonstrate that the proposed MSCMNet achieves the greatest accuracy.
    摘要 主要挑战在可见光谱人重识别任务中是如何提取特征特异性,以便匹配用途。现有的既有工作主要强调降低模态差异,但模态信息不能充分利用。为解决这个问题,一种多尺度semantic correlation mining网络(MSCMNet)被提出,以全面利用semantic特征,并同时尽量减少模态信息损失。该网络包括三个新成分。首先,为了考虑有效利用模态信息,一个多scale信息相关挖掘块(MIMB)被设计,以探索多个缩放级别的semantic相关性。其次,为了丰富semantic信息,特制的四个流Feature抽取器(QFE)被设计,以EXTRACT数据集中不同维度的信息。最后,一种四个中心三重损失函数(QCT)被提出,以解决全面特征之间的信息差异。EXTENSIVE experiments on SYSU-MM01, RegDB,和LLCM数据集显示,提出的MSCMNet可以获得最高的准确率。

A Parameterized Generative Adversarial Network Using Cyclic Projection for Explainable Medical Image Classification

  • paper_url: http://arxiv.org/abs/2311.14388
  • repo_url: None
  • paper_authors: Xiangyu Xiong, Yue Sun, Xiaohong Liu, ChanTong Lam, Tong Tong, Hao Chen, Qinquan Gao, Wei Ke, Tao Tan
  • for: addresses the problem of data insufficiency in small-scale medical datasets by proposing a parameterized GAN (ParaGAN) for effective domain adaptation and explainable classification.
  • methods: ParaGAN incorporates projection distance parameters in cyclic projection and projects the source images to the decision boundary to obtain the class-difference maps, which effectively controls the changes of synthetic samples among domains and highlights the attention regions for downstream classification.
  • results: the proposed ParaGAN consistently outperforms the existing augmentation methods with explainable classification on two small-scale medical datasets.
    Abstract Although current data augmentation methods are successful to alleviate the data insufficiency, conventional augmentation are primarily intra-domain while advanced generative adversarial networks (GANs) generate images remaining uncertain, particularly in small-scale datasets. In this paper, we propose a parameterized GAN (ParaGAN) that effectively controls the changes of synthetic samples among domains and highlights the attention regions for downstream classification. Specifically, ParaGAN incorporates projection distance parameters in cyclic projection and projects the source images to the decision boundary to obtain the class-difference maps. Our experiments show that ParaGAN can consistently outperform the existing augmentation methods with explainable classification on two small-scale medical datasets.
    摘要 尽管当前的数据增强方法能够解决数据不足问题,但这些方法主要是同一个领域内的增强,而高级的生成对抗网络(GANs)生成的图像仍然存在uncertainty,特别是在小规模数据集中。在这篇论文中,我们提出了一种具有参数化的GAN(ParaGAN),可以有效控制生成的样本之间域的变化和突出下游分类器的注意区域。具体来说,ParaGAN在循环投影中包含投影距离参数,将源图像投影到决策边缘,从而获得类差地图。我们的实验表明,ParaGAN可以在两个小规模医疗数据集上一致性地超越现有的增强方法,并且可以提供可解释的分类。

Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion

  • paper_url: http://arxiv.org/abs/2311.14343
  • repo_url: None
  • paper_authors: Minshan Xie, Hanyuan Liu, Chengze Li, Tien-Tsin Wong
  • for: 本研究旨在提出一种同步多帧擦除框架,以维护视觉特征和时间一致性。
  • methods: 本方法使用文本指导的影像擦除模型,并将其扩展为视频合成。同时,通过 Shared Information among Frames (SIF) 机制,确保每帧的信息与其他帧的信息相互关联,以保持视觉特征和时间一致性。
  • results: 对比于现有的视频编辑方法,本方法可以生成高质量和多样化的视频结果,并且在量化测试中显示出优异的性能。
    Abstract Text-guided video-to-video stylization transforms the visual appearance of a source video to a different appearance guided on textual prompts. Existing text-guided image diffusion models can be extended for stylized video synthesis. However, they struggle to generate videos with both highly detailed appearance and temporal consistency. In this paper, we propose a synchronized multi-frame diffusion framework to maintain both the visual details and the temporal consistency. Frames are denoised in a synchronous fashion, and more importantly, information of different frames is shared since the beginning of the denoising process. Such information sharing ensures that a consensus, in terms of the overall structure and color distribution, among frames can be reached in the early stage of the denoising process before it is too late. The optical flow from the original video serves as the connection, and hence the venue for information sharing, among frames. We demonstrate the effectiveness of our method in generating high-quality and diverse results in extensive experiments. Our method shows superior qualitative and quantitative results compared to state-of-the-art video editing methods.
    摘要

Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models

  • paper_url: http://arxiv.org/abs/2311.14339
  • repo_url: https://github.com/cristianopatricio/concept-based-interpretability-vlm
  • paper_authors: Cristiano Patrício, Luís F. Teixeira, João C. Neves
  • for: 该文章主要针对皮肤病变诊断方面的概念基型模型进行研究,以便提高诊断的可解释性。
  • methods: 该文章提出了一种基于视觉语言模型的概念embedding学习策略,通过将概念描述为文本嵌入来适应下游皮肤病变分类任务。
  • results: 实验表明,视觉语言模型不仅在使用概念为文本嵌入时可以达到更高的准确率,还可以采用更少的概念标注样本来达到相当的性能。
    Abstract Concept-based models naturally lend themselves to the development of inherently interpretable skin lesion diagnosis, as medical experts make decisions based on a set of visual patterns of the lesion. Nevertheless, the development of these models depends on the existence of concept-annotated datasets, whose availability is scarce due to the specialized knowledge and expertise required in the annotation process. In this work, we show that vision-language models can be used to alleviate the dependence on a large number of concept-annotated samples. In particular, we propose an embedding learning strategy to adapt CLIP to the downstream task of skin lesion classification using concept-based descriptions as textual embeddings. Our experiments reveal that vision-language models not only attain better accuracy when using concepts as textual embeddings, but also require a smaller number of concept-annotated samples to attain comparable performance to approaches specifically devised for automatic concept generation.
    摘要 “概念基础模型自然地适用于生成内在可解释的皮肤患病诊断,因为医疗专家做出决策 based on 皮肤患病的视觉模式。然而,这些模型的开发受到概念注解数据的有限性的限制,因为注解过程需要特殊的专业知识和技能。在这种情况下,我们表明了使用视觉语言模型可以减轻对概念注解样本的依赖。具体来说,我们提出了一种嵌入学习策略,使得 CLIP 可以通过概念基础的文本嵌入来适应皮肤患病分类任务。我们的实验表明,视觉语言模型不仅在使用概念为文本嵌入时达到更高的准确率,而且只需要一小数量的概念注解样本来达到与自动概念生成方法相比的相似水平。”

TVT: Training-Free Vision Transformer Search on Tiny Datasets

  • paper_url: http://arxiv.org/abs/2311.14337
  • repo_url: None
  • paper_authors: Zimian Wei, Hengyue Pan, Lujun Li, Peijie Dong, Zhiliang Tian, Xin Niu, Dongsheng Li
  • for: 在这篇论文中,搜索一个更好的视觉转换器(ViT),并且不需要训练。
  • methods: 这篇论文使用教师模型(ConvNet)的专长来搜索一个更好的ViT,并且使用了一个新的教师对应的评估指标和学生能力指标来搜索。
  • results: 在实验中,这篇论文的方法比过去的训练自由搜索方法表现更好,并且在不同的小资料集和搜索空间中都获得了优秀的效果。
    Abstract Training-free Vision Transformer (ViT) architecture search is presented to search for a better ViT with zero-cost proxies. While ViTs achieve significant distillation gains from CNN teacher models on small datasets, the current zero-cost proxies in ViTs do not generalize well to the distillation training paradigm according to our experimental observations. In this paper, for the first time, we investigate how to search in a training-free manner with the help of teacher models and devise an effective Training-free ViT (TVT) search framework. Firstly, we observe that the similarity of attention maps between ViT and ConvNet teachers affects distill accuracy notably. Thus, we present a teacher-aware metric conditioned on the feature attention relations between teacher and student. Additionally, TVT employs the L2-Norm of the student's weights as the student-capability metric to improve ranking consistency. Finally, TVT searches for the best ViT for distilling with ConvNet teachers via our teacher-aware metric and student-capability metric, resulting in impressive gains in efficiency and effectiveness. Extensive experiments on various tiny datasets and search spaces show that our TVT outperforms state-of-the-art training-free search methods. The code will be released.
    摘要 training-freevision transformer(ViT)架构搜索是提出了在零成本情况下搜索更好的ViT。而ViT在小数据集上达到了显著的精炼成果,但现有的零成本代理在ViT中并不好地适应精炼训练方法,根据我们的实验观察。在这篇论文中,我们第一次 investigate了如何在无需训练的情况下进行搜索,并提出了一种有效的training-free ViT(TVT)搜索框架。首先,我们发现了ViT和ConvNet教师模型之间的注意力地图相似性对精炼精度产生了明显的影响。因此,我们提出了一种基于教师模型的教师相关的度量,并使用学生模型的L2-Norm weight作为学生能力度量来提高排名的一致性。最后,TVT通过我们的教师相关度量和学生能力度量来搜索最佳的ViT模型,并在不同的小数据集和搜索空间进行了广泛的实验,得到了较高的效率和效果。我们的TVT方法超越了当前的无需训练搜索方法的状态。代码将会发布。

Maximizing Discrimination Capability of Knowledge Distillation with Energy-based Score

  • paper_url: http://arxiv.org/abs/2311.14334
  • repo_url: None
  • paper_authors: Seonghak Kim, Gyeongdo Ham, Suin Lee, Donggon Jang, Daeshik Kim
  • for: 用于应用最新的计算机视觉技术,知识储存方法(KD)是必不可少的。现有的常数温度扩展KDs采用了所有样本集中的常数温度扩展,从而限制了每个样本的知识利用。
  • methods: 我们分类dataset为两类(低能量样本和高能量样本),基于每个样本的能量分数。通过实验,我们发现低能量样本具有高信任分数,表示一定的预测,而高能量样本具有低信任分数,表示不确定的预测。为了通过调整非目标类预测来总结优质知识,我们应用高温到低能量样本,以创建平滑的分布,并应用低温到高能量样本,以实现锐化分布。
  • results: 与前期的logit-based和特征基于方法相比,我们的能量基于KD(Energy KD)在多个数据集上达到了更好的性能。尤其是在CIFAR-100-LT和ImageNet数据集上, Energy KD 表现出了显著的改善。此外,我们还提出了高能量基数据增强(HE-DA),通过对20-50%的数据集进行增强,可以实现明显的性能改善。这表明HE-DA可以在资源有限的设备上使用。根据我们所知,这是第一篇利用能量分数在KD和DA中使用的论文,我们认为它将对未来的研究产生很大的贡献。
    Abstract To apply the latest computer vision techniques that require a large computational cost in real industrial applications, knowledge distillation methods (KDs) are essential. Existing logit-based KDs apply the constant temperature scaling to all samples in dataset, limiting the utilization of knowledge inherent in each sample individually. In our approach, we classify the dataset into two categories (i.e., low energy and high energy samples) based on their energy score. Through experiments, we have confirmed that low energy samples exhibit high confidence scores, indicating certain predictions, while high energy samples yield low confidence scores, meaning uncertain predictions. To distill optimal knowledge by adjusting non-target class predictions, we apply a higher temperature to low energy samples to create smoother distributions and a lower temperature to high energy samples to achieve sharper distributions. When compared to previous logit-based and feature-based methods, our energy-based KD (Energy KD) achieves better performance on various datasets. Especially, Energy KD shows significant improvements on CIFAR-100-LT and ImageNet datasets, which contain many challenging samples. Furthermore, we propose high energy-based data augmentation (HE-DA) for further improving the performance. We demonstrate that meaningful performance improvement could be achieved by augmenting only 20-50% of dataset, suggesting that it can be employed on resource-limited devices. To the best of our knowledge, this paper represents the first attempt to make use of energy scores in KD and DA, and we believe it will greatly contribute to future research.
    摘要 <>为应用计算机视觉技术,需要大量计算资源在实际应用中。知识储存方法(KD)是必要的。现有的常数温度扩展KDs将所有样本的温度设置为常数,这限制了每个样本的知识利用。在我们的方法中,我们将数据集分类为两类(即低能量样本和高能量样本),基于它们的能量分数。我们通过实验确认,低能量样本具有高信息率,表示确定的预测,而高能量样本具有低信息率,表示不确定的预测。为了通过调整非目标类预测来把握优质知识,我们将低能量样本应用高温,以创建更平滑的分布,而高能量样本应用低温,以实现更锐化的分布。与前期的常数温度扩展KD和特征扩展方法相比,我们的能量基于KD(能量KD)实现了更好的性能在多个数据集上。特别是在CIFAR-100-LT和ImageNet数据集上,我们的方法表现出了显著的改善。此外,我们还提出了高能量基数数据增强(HE-DA),可以进一步提高性能。我们示出,只需增强20-50%的数据集,就可以获得意义性的性能改善,这表明可以在资源有限的设备上使用。根据我们所知,这是首次利用能量分数在KD和DA中使用,我们认为这将对未来的研究产生很大的贡献。

Binarized 3D Whole-body Human Mesh Recovery

  • paper_url: http://arxiv.org/abs/2311.14323
  • repo_url: https://github.com/zhitengli/bidrn
  • paper_authors: Zhiteng Li, Yulun Zhang, Jing Lin, Haotong Qin, Jinjin Gu, Xin Yuan, Linghe Kong, Xiaokang Yang
  • for: reconstruction of 3D human body, face, and hands from a single image
  • methods: Binarized Dual Residual Network (BiDRN) and Binaried BoxNet
  • results: significant improvement over state-of-the-art binarization algorithms, comparable performance with full-precision method Hand4Whole using fewer parameters and operations.Here’s the full text in Simplified Chinese:
  • for: 这个论文的目标是从单个图像中重建3D人体、面孔和手部的三维模型。
  • methods: 我们提出了一种彩色分割网络(BiDRN)和彩色盒网络(Binaried BoxNet)来实现高效的三维人体重建。
  • results: 我们的BiDRN方法在比特化方法比较中具有显著的改善,并且与全精度方法Hand4Whole的性能相似,但占用的参数和运算数量减少了22.1%和14.8%。
    Abstract 3D whole-body human mesh recovery aims to reconstruct the 3D human body, face, and hands from a single image. Although powerful deep learning models have achieved accurate estimation in this task, they require enormous memory and computational resources. Consequently, these methods can hardly be deployed on resource-limited edge devices. In this work, we propose a Binarized Dual Residual Network (BiDRN), a novel quantization method to estimate the 3D human body, face, and hands parameters efficiently. Specifically, we design a basic unit Binarized Dual Residual Block (BiDRB) composed of Local Convolution Residual (LCR) and Block Residual (BR), which can preserve full-precision information as much as possible. For LCR, we generalize it to four kinds of convolutional modules so that full-precision information can be propagated even between mismatched dimensions. We also binarize the face and hands box-prediction network as Binaried BoxNet, which can further reduce the model redundancy. Comprehensive quantitative and qualitative experiments demonstrate the effectiveness of BiDRN, which has a significant improvement over state-of-the-art binarization algorithms. Moreover, our proposed BiDRN achieves comparable performance with full-precision method Hand4Whole while using just 22.1% parameters and 14.8% operations. We will release all the code and pretrained models.
    摘要 三维全身人体重建目标是从单个图像中重建三维人体、面孔和手部。虽然强大的深度学习模型已经实现了高精度的估计,但它们需要巨大的内存和计算资源。因此,这些方法几乎无法在边缘设备上部署。在这种情况下,我们提出了一种彩色二分数差异网络(BiDRN),一种新的量化方法,用于高效地估计三维人体、面孔和手部参数。具体来说,我们设计了基本单元彩色二分数差异块(BiDRB),其包括本地径差异块(LCR)和块差异块(BR)。LCR通过四种不同的径差异模块来保留全精度信息,以便在不匹配的维度之间传递信息。此外,我们还对面孔和手部框预测网络进行了彩色化,以进一步减少模型的重复性。从量化和质量上的实验来看,BiDRN具有显著的改善,与当前的量化算法相比。此外,我们的提议的BiDRN可以与全精度方法Hand4Whole achieve相当的性能,但使用的参数和操作数量却只有22.1%和14.8%。我们将发布所有代码和预训练模型。

Stable Cluster Discrimination for Deep Clustering

  • paper_url: http://arxiv.org/abs/2311.14310
  • repo_url: https://github.com/idstcv/secu
  • paper_authors: Qi Qian
  • for: 本研究旨在提高深度嵌入 clustering 的表现,同时实现 representation learning 和 clustering 的同时进行。
  • methods: 为了解决两项目的问题,本研究提出了两阶段训练策略,包括一个预训练阶段 для representation learning,然后精确地调整得到的模型 для clustering。此外,一些一阶段方法也是主要用于 representation learning,通过设计不同的限制来避免潜在的潜在泥沼问题。
  • results: 实验结果显示,SeCu 可以在所有测试数据集上实现 state-of-the-art 的表现,证明了一阶段 deep clustering 的效iveness。
    Abstract Deep clustering can optimize representations of instances (i.e., representation learning) and explore the inherent data distribution (i.e., clustering) simultaneously, which demonstrates a superior performance over conventional clustering methods with given features. However, the coupled objective implies a trivial solution that all instances collapse to the uniform features. To tackle the challenge, a two-stage training strategy is developed for decoupling, where it introduces an additional pre-training stage for representation learning and then fine-tunes the obtained model for clustering. Meanwhile, one-stage methods are developed mainly for representation learning rather than clustering, where various constraints for cluster assignments are designed to avoid collapsing explicitly. Despite the success of these methods, an appropriate learning objective tailored for deep clustering has not been investigated sufficiently. In this work, we first show that the prevalent discrimination task in supervised learning is unstable for one-stage clustering due to the lack of ground-truth labels and positive instances for certain clusters in each mini-batch. To mitigate the issue, a novel stable cluster discrimination (SeCu) task is proposed and a new hardness-aware clustering criterion can be obtained accordingly. Moreover, a global entropy constraint for cluster assignments is studied with efficient optimization. Extensive experiments are conducted on benchmark data sets and ImageNet. SeCu achieves state-of-the-art performance on all of them, which demonstrates the effectiveness of one-stage deep clustering. Code is available at \url{https://github.com/idstcv/SeCu}.
    摘要 深度归一可以优化实例表示(即表示学习)并同时探索数据内部分布(即归一),这表明与传统归一方法相比,深度归一具有更高的表现。然而,整体目标函数隐藏了一个潜在的简单解,即所有实例都归一到共同特征。为解决这个挑战,我们提出了一种两阶段训练策略,其中首先进行表示学习预训练,然后细化获得的模型进行归一。同时,一些一阶段方法被主要用于表示学习而不是归一,这些方法通过设计各种约束来避免归一。虽然这些方法得到了成功,但是对深度归一的适应学习目标函数的研究还不充分。在这种情况下,我们首先显示了一阶段归一的普遍稳定性问题,因为每个批处中缺乏准确标签和每个类别的正例实例。为解决这个问题,我们提出了一种新的稳定归一任务(SeCu),并可以根据这个任务获得一个新的硬度感知的归一标准。此外,我们还研究了一种全局Entropy约束,以便有效地优化归一。我们在标准数据集和ImageNet上进行了广泛的实验,SeCu实现了所有数据集的状态之最好表现,这表明了深度归一的一阶段表示学习的有效性。代码可以在 \url{https://github.com/idstcv/SeCu} 中找到。

Cosine Similarity Knowledge Distillation for Individual Class Information Transfer

  • paper_url: http://arxiv.org/abs/2311.14307
  • repo_url: None
  • paper_authors: Gyeongdo Ham, Seonghak Kim, Suin Lee, Jae-Hyeok Lee, Daeshik Kim
  • for: 提高模型压缩的效果,特别是使得学生模型能够与老师模型的性能相似或更高。
  • methods: 利用批处理级别的教师和学生预测,并使用cosine相似性来衡量学生模型对老师模型知识的学习。在cosine相似性高时,降低温度缩放,以便学生模型更好地学习老师模型的知识。
  • results: 对比 existed KD 方法,提出了一种有效的 KD 方法,可以使学生模型与老师模型的性能相似或更高。经验证明,这种方法可以提高模型压缩的效果。
    Abstract Previous logits-based Knowledge Distillation (KD) have utilized predictions about multiple categories within each sample (i.e., class predictions) and have employed Kullback-Leibler (KL) divergence to reduce the discrepancy between the student and teacher predictions. Despite the proliferation of KD techniques, the student model continues to fall short of achieving a similar level as teachers. In response, we introduce a novel and effective KD method capable of achieving results on par with or superior to the teacher models performance. We utilize teacher and student predictions about multiple samples for each category (i.e., batch predictions) and apply cosine similarity, a commonly used technique in Natural Language Processing (NLP) for measuring the resemblance between text embeddings. This metric's inherent scale-invariance property, which relies solely on vector direction and not magnitude, allows the student to dynamically learn from the teacher's knowledge, rather than being bound by a fixed distribution of the teacher's knowledge. Furthermore, we propose a method called cosine similarity weighted temperature (CSWT) to improve the performance. CSWT reduces the temperature scaling in KD when the cosine similarity between the student and teacher models is high, and conversely, it increases the temperature scaling when the cosine similarity is low. This adjustment optimizes the transfer of information from the teacher to the student model. Extensive experimental results show that our proposed method serves as a viable alternative to existing methods. We anticipate that this approach will offer valuable insights for future research on model compression.
    摘要 先前的预测值基于的知识填充(KD)方法都是使用每个样本中的多个类别预测(i.e., class predictions),并使用卷积-莱布NER(KL)差异来减少学生模型和教师模型的差异。尽管KD技术的普及,学生模型仍然无法达到教师模型的相同水平。为此,我们介绍了一种新的和有效的KD方法,能够达到或超过教师模型的性能。我们使用教师和学生对每个类别的多个样本预测(i.e., batch predictions),并使用cosine相似性,一种常用的自然语言处理(NLP)中的度量手段。这个度量的自然尺度不变性,即仅仅基于向量方向而不是大小,使得学生可以从教师的知识中动态学习,而不是受到固定的教师知识分布的限制。此外,我们提出了一种called cosine相似性权重temperature(CSWT)来提高性能。CSWT在cosine相似性高时降低温度尺度的涨幅,并在cosine相似性低时增加温度尺度的涨幅。这种调整可以优化教师知识的传递给学生模型。我们的实验结果表明,我们的提议方法可以作为现有方法的可行代替。我们期望这种方法会为未来的模型压缩提供有价值的 inspirations。

GeoViT: A Versatile Vision Transformer Architecture for Geospatial Image Analysis

  • paper_url: http://arxiv.org/abs/2311.14301
  • repo_url: None
  • paper_authors: Madhav Khirwar, Ankur Narang
  • for: 本研究旨在提供更高精度的CO2和NO2排放量估算、燃料类型和气溶胶覆盖率等数据,以便促进气候变化监测和排放限制政策的发展。
  • methods: 本研究使用了一种名为GeoViT的小型视 transformer模型,可以处理卫星影像数据进行多模式分 segmentation、分类和回归任务。
  • results: 通过使用GeoViT模型,本研究实现了对CO2排放量的更高精度估算、燃料类型的识别和高分辨率NO2浓度地图的创造等任务,并超越了之前的状态对模型。
    Abstract Greenhouse gases are pivotal drivers of climate change, necessitating precise quantification and source identification to foster mitigation strategies. We introduce GeoViT, a compact vision transformer model adept in processing satellite imagery for multimodal segmentation, classification, and regression tasks targeting CO2 and NO2 emissions. Leveraging GeoViT, we attain superior accuracy in estimating power generation rates, fuel type, plume coverage for CO2, and high-resolution NO2 concentration mapping, surpassing previous state-of-the-art models while significantly reducing model size. GeoViT demonstrates the efficacy of vision transformer architectures in harnessing satellite-derived data for enhanced GHG emission insights, proving instrumental in advancing climate change monitoring and emission regulation efforts globally.
    摘要 《绿色气候变化的主要驱动者是绿色气体,因此精准量化和来源确定是必要的,以便开发 Mitigation 策略。我们介绍 GeoViT,一种具有可靠的卫星影像处理能力的小型视transformer模型,用于多模态分割、分类和回归任务,targeting CO2 和 NO2 排放。通过 GeoViT,我们实现了对发电率、燃料类型、CO2 气泡覆盖率和高分辨率 NO2 浓度地图的高精度估计,超越了先前的状态太平方法,同时具有显著减小模型大小的优势。 GeoViT 示例了视transformer 架构在使用卫星来源数据的情况下提供更高的GHG 排放情况洞察,这将对全球气候变化监测和排放规定做出重要贡献。》Note: Please note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and other countries. If you need the translation in Traditional Chinese, please let me know.

Decouple Content and Motion for Conditional Image-to-Video Generation

  • paper_url: http://arxiv.org/abs/2311.14294
  • repo_url: None
  • paper_authors: Cuifeng Shen, Yulu Gan, Chen Chen, Xiongwei Zhu, Lele Cheng, Jinzhi Wang
  • for: 创建一个真实的新视频,只准入一个图像和文本作为条件。
  • methods: 提出了一种新的方法,通过分解目标RGB像素为空间内容和时间运动两个分量来解决传统的cI2V生成方法中的限制,包括模式一致和视觉连续性。
  • results: 实验结果表明,该方法可以在不添加新结构复杂度的情况下提高效果和效率,并且在多个数据集上达到了现有方法的性能水平。
    Abstract The goal of conditional image-to-video (cI2V) generation is to create a believable new video by beginning with the condition, i.e., one image and text.The previous cI2V generation methods conventionally perform in RGB pixel space, with limitations in modeling motion consistency and visual continuity. Additionally, the efficiency of generating videos in pixel space is quite low.In this paper, we propose a novel approach to address these challenges by disentangling the target RGB pixels into two distinct components: spatial content and temporal motions. Specifically, we predict temporal motions which include motion vector and residual based on a 3D-UNet diffusion model. By explicitly modeling temporal motions and warping them to the starting image, we improve the temporal consistency of generated videos. This results in a reduction of spatial redundancy, emphasizing temporal details. Our proposed method achieves performance improvements by disentangling content and motion, all without introducing new structural complexities to the model. Extensive experiments on various datasets confirm our approach's superior performance over the majority of state-of-the-art methods in both effectiveness and efficiency.
    摘要 “目的是实现基于条件(cI2V)的图像转映,创建一个真实的新影片,从条件开始,即一幅图像和文本。现有的cI2V生成方法通常在RGB像素空间中进行,它们受到模型动作一致和视觉连续性的限制。另外,生成影片的效率在像素空间是很低。在这篇论文中,我们提出了一个新的方法,即分解目标RGB像素为两个不同的分量:空间内容和时间动作。具体来说,我们预测时间动作,包括动向 вектор和差异,透过3D-UNet扩散模型。通过Explicitly 模型时间动作,并将其扭转到起始图像,我们提高了生成影片的时间一致性。这导致缩减空间重复,强调时间细节。我们的提案方法在效能和效率方面都超过了大多数现有的方法,并且不增加新的结构层次到模型中。广泛的实验证明了我们的方法的超越性。”

Paragraph-to-Image Generation with Information-Enriched Diffusion Model

  • paper_url: http://arxiv.org/abs/2311.14284
  • repo_url: https://github.com/weijiawu/paradiffusion
  • paper_authors: Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang
  • for: 这篇论文主要target是解决长段文本(Up to 512 words)到图像生成任务中的Alignment问题。
  • methods: 该模型使用大语言模型(例如Llama V2)进行文本编码,然后通过LORA进行调整,以实现文本-图像特征空间的Alignment。
  • results: 实验表明,ParaDiffusion模型在ViLG-300和ParaPrompts上比前一代模型(SD XL、DeepFloyd IF)高出15%和45%的人工投票率,对视觉吸引力和文本准确性进行了 significiant improvement。
    Abstract Text-to-image (T2I) models have recently experienced rapid development, achieving astonishing performance in terms of fidelity and textual alignment capabilities. However, given a long paragraph (up to 512 words), these generation models still struggle to achieve strong alignment and are unable to generate images depicting complex scenes. In this paper, we introduce an information-enriched diffusion model for paragraph-to-image generation task, termed ParaDiffusion, which delves into the transference of the extensive semantic comprehension capabilities of large language models to the task of image generation. At its core is using a large language model (e.g., Llama V2) to encode long-form text, followed by fine-tuning with LORA to alignthe text-image feature spaces in the generation task. To facilitate the training of long-text semantic alignment, we also curated a high-quality paragraph-image pair dataset, namely ParaImage. This dataset contains a small amount of high-quality, meticulously annotated data, and a large-scale synthetic dataset with long text descriptions being generated using a vision-language model. Experiments demonstrate that ParaDiffusion outperforms state-of-the-art models (SD XL, DeepFloyd IF) on ViLG-300 and ParaPrompts, achieving up to 15% and 45% human voting rate improvements for visual appeal and text faithfulness, respectively. The code and dataset will be released to foster community research on long-text alignment.
    摘要 文本到图像(T2I)模型在最近几年内发展非常快,在 faithfulness 和文本对齐能力方面达到了惊人的表现。然而,对于长 paragraph(最多 512 个字),这些生成模型仍然具有差不多的对齐能力,无法生成复杂场景的图像。在这篇论文中,我们介绍了一种具有信息激活的扩展模型,称为 ParaDiffusion,该模型利用大型语言模型(例如 Llama V2)对长文本进行编码,然后通过 LORA 的 fine-tuning 将文本-图像特征空间对齐在生成任务中。为了促进长文本semantic alignment的训练,我们还精心准备了一个高质量的 paragraph-image 对应数据集,即 ParaImage。这个数据集包括一小部分高质量、精心标注的数据,以及一个大规模的Synthetic数据集,其中使用了视力语言模型来生成长文本描述。实验表明,ParaDiffusion 在 ViLG-300 和 ParaPrompts 上超过了现状的模型(SD XL、DeepFloyd IF),实现了15% 到 45% 的人类投票率提升,即 visual appeal 和 text faithfulness 的提升。代码和数据将被发布,以便社区进行长文本对齐的研究。

Image Super-Resolution with Text Prompt Diffusion

  • paper_url: http://arxiv.org/abs/2311.14282
  • repo_url: https://github.com/zhengchen1999/promptsr
  • paper_authors: Zheng Chen, Yulun Zhang, Jinjin Gu, Xin Yuan, Linghe Kong, Guihai Chen, Xiaokang Yang
  • for: 提高图像超分辨率(SR)性能,通过引入文本提示来提供质量约束。
  • methods: 使用文本生成管道将文本与SR数据集集成,通过抽象的文本受损表示方法和受损模型来实现文本描述。提出PromptSR模型,利用扩散模型和预训练语言模型(如T5和CLIP)来实现文本提示SR。
  • results: 在synthetic和实际图像上,通过引入文本提示,SR性能得到了显著提高。
    Abstract Image super-resolution (SR) methods typically model degradation to improve reconstruction accuracy in complex and unknown degradation scenarios. However, extracting degradation information from low-resolution images is challenging, which limits the model performance. To boost image SR performance, one feasible approach is to introduce additional priors. Inspired by advancements in multi-modal methods and text prompt image processing, we introduce text prompts to image SR to provide degradation priors. Specifically, we first design a text-image generation pipeline to integrate text into SR dataset through the text degradation representation and degradation model. The text representation applies a discretization manner based on the binning method to describe the degradation abstractly. This representation method can also maintain the flexibility of language. Meanwhile, we propose the PromptSR to realize the text prompt SR. The PromptSR employs the diffusion model and the pre-trained language model (e.g., T5 and CLIP). We train the model on the generated text-image dataset. Extensive experiments indicate that introducing text prompts into image SR, yields excellent results on both synthetic and real-world images. Code: https://github.com/zhengchen1999/PromptSR.
    摘要 图像超分辨率(SR)方法通常模型劣化来提高重建准确率,但是从低分辨率图像中提取劣化信息是困难的,这限制了模型性能。为了提高图像SR性能,一个可行的方法是引入额外约束。引申于多模式方法和文本描述图像处理的进步,我们引入文本提示来图像SR中。具体来说,我们首先设计了文本-图像生成管线,通过文本劣化表示和劣化模型来整合文本到SR数据集中。文本表示采用了精度方法基于桶法来描述劣化抽象。这种表示方法可以保持语言的灵活性。同时,我们提出了PromptSR,它使用了扩散模型和预训练语言模型(例如T5和CLIP)来实现文本提示SR。我们在生成的文本-图像数据集上训练了模型。广泛的实验表明,在图像SR中引入文本提示,可以获得优秀的结果,并在真实的图像上进行了证明。代码:https://github.com/zhengchen1999/PromptSR。

Multi-modal Instance Refinement for Cross-domain Action Recognition

  • paper_url: http://arxiv.org/abs/2311.14281
  • repo_url: None
  • paper_authors: Yuan Qing, Naixing Wu, Shaohua Wan, Lixin Duan
  • for: 提高跨频道动作识别的性能,减少负向传递
  • methods: 提出了多模态实例精炼(MMIR)方法,通过对每个模式进行强制学习,选择每个频道中的负样本,以减少负向传递
  • results: 在EPIC-Kitchens数据集上,与其他多个基线方法进行比较,实现了MMIR方法的超越表现,demonstrating the advantage of MMIR in reducing negative transfer.
    Abstract Unsupervised cross-domain action recognition aims at adapting the model trained on an existing labeled source domain to a new unlabeled target domain. Most existing methods solve the task by directly aligning the feature distributions of source and target domains. However, this would cause negative transfer during domain adaptation due to some negative training samples in both domains. In the source domain, some training samples are of low-relevance to target domain due to the difference in viewpoints, action styles, etc. In the target domain, there are some ambiguous training samples that can be easily classified as another type of action under the case of source domain. The problem of negative transfer has been explored in cross-domain object detection, while it remains under-explored in cross-domain action recognition. Therefore, we propose a Multi-modal Instance Refinement (MMIR) method to alleviate the negative transfer based on reinforcement learning. Specifically, a reinforcement learning agent is trained in both domains for every modality to refine the training data by selecting out negative samples from each domain. Our method finally outperforms several other state-of-the-art baselines in cross-domain action recognition on the benchmark EPIC-Kitchens dataset, which demonstrates the advantage of MMIR in reducing negative transfer.
    摘要 <>translate "Unsupervised cross-domain action recognition aims at adapting the model trained on an existing labeled source domain to a new unlabeled target domain. Most existing methods solve the task by directly aligning the feature distributions of source and target domains. However, this would cause negative transfer during domain adaptation due to some negative training samples in both domains. In the source domain, some training samples are of low-relevance to target domain due to the difference in viewpoints, action styles, etc. In the target domain, there are some ambiguous training samples that can be easily classified as another type of action under the case of source domain. The problem of negative transfer has been explored in cross-domain object detection, while it remains under-explored in cross-domain action recognition. Therefore, we propose a Multi-modal Instance Refinement (MMIR) method to alleviate the negative transfer based on reinforcement learning. Specifically, a reinforcement learning agent is trained in both domains for every modality to refine the training data by selecting out negative samples from each domain. Our method finally outperforms several other state-of-the-art baselines in cross-domain action recognition on the benchmark EPIC-Kitchens dataset, which demonstrates the advantage of MMIR in reducing negative transfer." into Simplified Chinese.cross-domain action recognition是一种无监督的领域适应任务,旨在将源频率域已经训练的模型适应到新的无标签目标频率域。大多数现有方法直接对源和目标频率域的特征分布进行对齐,这会导致域适应中的负面影响。在源频率域中,一些训练样本具有低相关性,因为视角、动作风格等因素的不同。在目标频率域中,有一些抽象的训练样本可以在源频率域中被误分类为另一种动作。域适应中的负面影响已经在跨频率域物体检测中被探讨,而在跨频率域动作识别中尚未得到充分的探讨。因此,我们提出了一种多模式实例纠正(MMIR)方法,以减少负面影响。具体来说,我们在两个频率域中训练了一个强化学习代理,用于在每个模式中纠正训练数据,选择源频率域和目标频率域中的负面样本。我们的方法最终在EPIC-Kitchens数据集上比较其他多个状态级基elines具有优势,这说明MMIR在减少负面影响方面的优势。

Latent Diffusion Prior Enhanced Deep Unfolding for Spectral Image Reconstruction

  • paper_url: http://arxiv.org/abs/2311.14280
  • repo_url: None
  • paper_authors: Zongliang Wu, Ruiying Lu, Ying Fu, Xin Yuan
  • for: 实现压缩特征图像重建,从单一具有压缩测量的二维数据中恢复三维空间特征图像。
  • methods: 使用热漂浮结构,并将实现为具有压缩特征的图像进行增强。
  • results: 提供了一个可靠且高效的方法,可以从单一具有压缩测量的二维数据中恢复高质量的三维空间特征图像,并且可以提高实现速度。
    Abstract Snapshot compressive spectral imaging reconstruction aims to reconstruct three-dimensional spatial-spectral images from a single-shot two-dimensional compressed measurement. Existing state-of-the-art methods are mostly based on deep unfolding structures but have intrinsic performance bottlenecks: $i$) the ill-posed problem of dealing with heavily degraded measurement, and $ii$) the regression loss-based reconstruction models being prone to recover images with few details. In this paper, we introduce a generative model, namely the latent diffusion model (LDM), to generate degradation-free prior to enhance the regression-based deep unfolding method. Furthermore, to overcome the large computational cost challenge in LDM, we propose a lightweight model to generate knowledge priors in deep unfolding denoiser, and integrate these priors to guide the reconstruction process for compensating high-quality spectral signal details. Numeric and visual comparisons on synthetic and real-world datasets illustrate the superiority of our proposed method in both reconstruction quality and computational efficiency. Code will be released.
    摘要 Snapshot 压缩spectral imaging重建目标是从单个两维压缩测量中重建三维空间spectral图像。现有状态之arteMethods主要基于深度 unfolding 结构,但它们具有内在的性能瓶颈:$i$) 测量受到严重损害的问题,和 $ii$) 基于回归损失的重建模型容易recover图像中的细节少。在这篇论文中,我们引入了一种生成模型,即latent diffusion model(LDM),以增强基于回归的深度 unfolding 方法。此外,为了解决LDM中的大量计算成本问题,我们提议一种轻量级的模型来生成知识先验,并将这些先验与深度 unfolding denoiser 集成,以导引重建过程以补做高质量spectral信号的细节。数字和视觉比较表明我们的提出方法在重建质量和计算效率两个方面具有优势。代码将发布。

Racing With ROS 2 A Navigation System for an Autonomous Formula Student Race Car

  • paper_url: http://arxiv.org/abs/2311.14276
  • repo_url: https://github.com/qut-motorsport/qutms_nav_integration
  • paper_authors: Alastair Bradford, Grant van Breda, Tobias Fischer
  • For: The paper is written for teams participating in autonomous racing disciplines, such as Formula Student and Society of Automotive Engineers, who are looking for an open-source solution to navigate their race cars.* Methods: The paper uses the Robot Operating System 2 (ROS2) and its open-source navigation stack to address the challenges of high-speed navigation and control in autonomous racing. The authors compare off-the-shelf navigation libraries against traditional custom-made programs developed by QUT Motorsport to evaluate their applicability in autonomous racing scenarios.* Results: The paper provides quantitative and qualitative comparisons of the navigation packages against traditional navigation solutions, with the goal of lowering the entry barrier for autonomous racing. The authors also provide a comprehensive tutorial for teams participating in similar racing disciplines and other autonomous mobile robot applications.
    Abstract The advent of autonomous vehicle technologies has significantly impacted various sectors, including motorsport, where Formula Student and Formula: Society of Automotive Engineers introduced autonomous racing classes. These offer new challenges to aspiring engineers, including the team at QUT Motorsport, but also raise the entry barrier due to the complexity of high-speed navigation and control. This paper presents an open-source solution using the Robot Operating System 2, specifically its open-source navigation stack, to address these challenges in autonomous Formula Student race cars. We compare off-the-shelf navigation libraries that this stack comprises of against traditional custom-made programs developed by QUT Motorsport to evaluate their applicability in autonomous racing scenarios and integrate them onto an autonomous race car. Our contributions include quantitative and qualitative comparisons of these packages against traditional navigation solutions, aiming to lower the entry barrier for autonomous racing. This paper also serves as a comprehensive tutorial for teams participating in similar racing disciplines and other autonomous mobile robot applications.
    摘要 自动驾驶技术的出现对各个领域产生了深远的影响,其中包括赛车, Formula Student 和 Society of Automotive Engineers 等组织也在引入自动赛车级别。这些新的挑战对年轻的工程师来说是一个重要的机遇,如 Queensland University of Technology 的车队。然而,自动赛车技术的复杂性也提高了参与门槛,特别是高速导航和控制方面。本文推出了一个开源解决方案,使用 Robot Operating System 2 的开源导航栈来解决自动赛车技术中的挑战。我们对商业 Navigation 库和 QUT Motorsport 自己开发的传统导航程序进行了比较,以评估它们在自动赛车场景中的适用性,并将它们集成到自动赛车上。我们的贡献包括对这些包装的量化和质量比较,以降低自动赛车技术的入门门槛。此外,本文还 serves as a comprehensive tutorial for参与相似赛车领域的团队和其他自动移动机器应用。

Cooperative Dual Attention for Audio-Visual Speech Enhancement with Facial Cues

  • paper_url: http://arxiv.org/abs/2311.14275
  • repo_url: None
  • paper_authors: Feixiang Wang, Shuang Yang, Shiguang Shan, Xilin Chen
    for:本文主要目标是提高Audio-Visual Speech Enhancement(AVSE)的稳定性和效果,通过利用脸部信息以外的面部特征。methods:提议一种 dual attention cooperative 框架,包括一个基于空间注意力的视觉编码器,以及一个基于自我注意力的视觉特征融合策略,以忽略非关键信息,捕捉和提高视觉信息,并与音频信号进行紧凑的融合。results:对多个 dataset 进行了严格的分析和比较,结果显示,我们的模型在多种 metric 上均超过现有方法,特别是在面部信息不可靠或缺失时。
    Abstract In this work, we focus on leveraging facial cues beyond the lip region for robust Audio-Visual Speech Enhancement (AVSE). The facial region, encompassing the lip region, reflects additional speech-related attributes such as gender, skin color, nationality, etc., which contribute to the effectiveness of AVSE. However, static and dynamic speech-unrelated attributes also exist, causing appearance changes during speech. To address these challenges, we propose a Dual Attention Cooperative Framework, DualAVSE, to ignore speech-unrelated information, capture speech-related information with facial cues, and dynamically integrate it with the audio signal for AVSE. Specifically, we introduce a spatial attention-based visual encoder to capture and enhance visual speech information beyond the lip region, incorporating global facial context and automatically ignoring speech-unrelated information for robust visual feature extraction. Additionally, a dynamic visual feature fusion strategy is introduced by integrating a temporal-dimensional self-attention module, enabling the model to robustly handle facial variations. The acoustic noise in the speaking process is variable, impacting audio quality. Therefore, a dynamic fusion strategy for both audio and visual features is introduced to address this issue. By integrating cooperative dual attention in the visual encoder and audio-visual fusion strategy, our model effectively extracts beneficial speech information from both audio and visual cues for AVSE. Thorough analysis and comparison on different datasets, including normal and challenging cases with unreliable or absent visual information, consistently show our model outperforming existing methods across multiple metrics.
    摘要 在这项工作中,我们关注利用 facial cues beyond the lip region дляrobust Audio-Visual Speech Enhancement (AVSE). facial region, including the lip region, reflects additional speech-related attributes such as gender, skin color, nationality, etc., which contribute to the effectiveness of AVSE. However, static and dynamic speech-unrelated attributes also exist, causing appearance changes during speech. To address these challenges, we propose a Dual Attention Cooperative Framework, DualAVSE, to ignore speech-unrelated information, capture speech-related information with facial cues, and dynamically integrate it with the audio signal for AVSE. Specifically, we introduce a spatial attention-based visual encoder to capture and enhance visual speech information beyond the lip region, incorporating global facial context and automatically ignoring speech-unrelated information for robust visual feature extraction. Additionally, a dynamic visual feature fusion strategy is introduced by integrating a temporal-dimensional self-attention module, enabling the model to robustly handle facial variations. The acoustic noise in the speaking process is variable, impacting audio quality. Therefore, a dynamic fusion strategy for both audio and visual features is introduced to address this issue. By integrating cooperative dual attention in the visual encoder and audio-visual fusion strategy, our model effectively extracts beneficial speech information from both audio and visual cues for AVSE. Thorough analysis and comparison on different datasets, including normal and challenging cases with unreliable or absent visual information, consistently show our model outperforming existing methods across multiple metrics.

CRISP: Hybrid Structured Sparsity for Class-aware Model Pruning

  • paper_url: http://arxiv.org/abs/2311.14272
  • repo_url: https://github.com/shivmgg/crisp
  • paper_authors: Shivam Aggarwal, Kuluhan Binici, Tulika Mitra
  • for: 提高计算效率和减少模型大小,适用于在限定类型数据上进行图像分类任务。
  • methods: 提出了一种名为CRISP的新零化框架,利用一种混合的精细结构稀疏模式,包括细致的N:M结构稀疏和块级稀疏。采用梯度导引的类域准确性分数来引导零化策略,以保留用户特定类型的权重。
  • results: CRISP实现了高准确率和最小内存占用量,对Popular模型如ResNet-50、VGG-16和MobileNetV2进行了ImageNet和CIFAR-100数据集的测试,并实现了14倍的延迟和能耗减少,与现有零化方法相比。代码可以在https://github.com/shivmgg/CRISP/查看。
    Abstract Machine learning pipelines for classification tasks often train a universal model to achieve accuracy across a broad range of classes. However, a typical user encounters only a limited selection of classes regularly. This disparity provides an opportunity to enhance computational efficiency by tailoring models to focus on user-specific classes. Existing works rely on unstructured pruning, which introduces randomly distributed non-zero values in the model, making it unsuitable for hardware acceleration. Alternatively, some approaches employ structured pruning, such as channel pruning, but these tend to provide only minimal compression and may lead to reduced model accuracy. In this work, we propose CRISP, a novel pruning framework leveraging a hybrid structured sparsity pattern that combines both fine-grained N:M structured sparsity and coarse-grained block sparsity. Our pruning strategy is guided by a gradient-based class-aware saliency score, allowing us to retain weights crucial for user-specific classes. CRISP achieves high accuracy with minimal memory consumption for popular models like ResNet-50, VGG-16, and MobileNetV2 on ImageNet and CIFAR-100 datasets. Moreover, CRISP delivers up to 14$\times$ reduction in latency and energy consumption compared to existing pruning methods while maintaining comparable accuracy. Our code is available at https://github.com/shivmgg/CRISP/.
    摘要

Segmentation-Based Parametric Painting

  • paper_url: http://arxiv.org/abs/2311.14271
  • repo_url: https://github.com/manuelladron/semantic_based_painting
  • paper_authors: Manuel Ladron de Guevara, Matthew Fisher, Aaron Hertzmann
  • for: 这个论文的目的是提出一种基于 semantic segmentation 的图像转 painting 方法,以生成大规模、高精度的画作,具有人类like的艺术品质和样式变化。
  • methods: 这个方法使用了 segmentation-based painting 过程和基于人工绘画策略的动态注意力地图方法,以优化画梭进程,使得批处理大图像时能够 capture 大规模结构和细节,同时允许采用不同的绘画风格进行控制。
  • results: 该方法可以生成高质量的画作,并且可以处理大Canvas,比前一代方法更高效和灵活。经过严格的评估,我们的方法被证明可以生成更加美观和功能上优于前一代方法的画作。代码可以在 GitHub 上找到:https://github.com/manuelladron/semantic_based_painting.git
    Abstract We introduce a novel image-to-painting method that facilitates the creation of large-scale, high-fidelity paintings with human-like quality and stylistic variation. To process large images and gain control over the painting process, we introduce a segmentation-based painting process and a dynamic attention map approach inspired by human painting strategies, allowing optimization of brush strokes to proceed in batches over different image regions, thereby capturing both large-scale structure and fine details, while also allowing stylistic control over detail. Our optimized batch processing and patch-based loss framework enable efficient handling of large canvases, ensuring our painted outputs are both aesthetically compelling and functionally superior as compared to previous methods, as confirmed by rigorous evaluations. Code available at: https://github.com/manuelladron/semantic\_based\_painting.git
    摘要 我们介绍了一种新的图像到画作方法,该方法可以生成大规模、高精度的画作,具有人类艺术品质和样式变化。为处理大图像并获得笔划控制,我们引入了分割基于笔划过程和人工绘画策略启发的动态注意力地图方法,使得笔划过程可以在不同的图像区域进行批处理,同时捕捉大规模结构和细节,并允许样式控制细节。我们优化的批处理和 patch-based 损失框架,使得我们可以高效地处理大Canvas,确保我们的涂护输出具有艺术魅力和功能优势,与之前的方法相比,经rigorous评估得出。代码可以在:https://github.com/manuelladron/semantic\_based\_painting.git 中找到。

Bursting Spikes: Efficient and High-performance SNNs for Event-based Vision

  • paper_url: http://arxiv.org/abs/2311.14265
  • repo_url: https://github.com/bic-l/burst-ann2snn
  • paper_authors: Ziqing Wang, Yuetong Fang, Jiahang Cao, Renjing Xu
  • for: 提高事件驱动视觉的效率和高速响应,使用脉冲神经网络(SNN)。
  • methods: 引入启动机制,允许每步多发脉冲,降低转换错误并实现低延迟SNN。使用 pareto 前ier-driven 算法重新分配启动模式。同时,提出一种敏感度驱动的脉冲压缩技术,自动选择层Specific的最佳阈值比率。
  • results: 比较 experiments 表明,我们的方法在分类和物体检测方面具有优秀的性能和降低了能耗。代码将在 https://github.com/bic-L/burst-ann2snn 上提供。
    Abstract Advancing event-driven vision through spiking neural networks (SNNs) is crucial to empowering high-speed and efficient perception. While directly converting the pre-trained artificial neural networks (ANNs) - by replacing the non-linear activation with spiking neurons - can provide SNNs with good performance, the resultant SNNs typically demand long timesteps and high energy consumption to achieve their optimal performance. To address this challenge, we introduce the burst-spike mechanism inspired by the biological nervous system, allowing multiple spikes per timestep to reduce conversion errors and produce low-latency SNNs. To further bolster this enhancement, we leverage the Pareto Frontier-driven algorithm to reallocate burst-firing patterns. Moreover, to reduce energy consumption during the conversion process, we propose a sensitivity-driven spike compression technique, which automatically locates the optimal threshold ratio according to layer-specific sensitivity. Extensive experiments demonstrate our approach outperforms state-of-the-art SNN methods, showcasing superior performance and reduced energy usage across classification and object detection. Our code will be available at https://github.com/bic-L/burst-ann2snn.
    摘要 Simplified Chinese:通过升级事件驱动视觉的射频神经网络(SNN)来提高高速和高效的感知是非常重要的。直接将预训练的人工神经网络(ANN)转换为SNN可以提供良好的性能,但resultant SNN通常需要长时步和高能耗来实现最佳性能。为解决这个挑战,我们引入了生物神经系统中的冲击频率机制,允许每个时步多发多个射频,从而减少转换错误和生成低延迟SNN。此外,我们利用Pareto Frontier驱动算法来重新分配冲击射频模式。此外,为了在转换过程中减少能耗,我们提议一种敏感度驱动的压缩射频技术,自动根据层pecific敏感度确定最佳阈值比率。广泛的实验证明我们的方法在分类和物体检测方面超过了状态的最佳SNN方法,展示了更高的性能和更低的能耗。我们的代码将在https://github.com/bic-L/burst-ann2snn中提供。

ZeroPS: High-quality Cross-modal Knowledge Transfer for Zero-Shot 3D Part Segmentation

  • paper_url: http://arxiv.org/abs/2311.14262
  • repo_url: None
  • paper_authors: Yuheng Xue, Nenglun Chen, Jun Liu, Wenyun Sun
  • for: 这个研究的目的是设计一个 zero-shot 3D 部件分 segmentation 管道,即 ZeroPS,以高品质地将 2D 预备模型的知识转移到 3D 点云。
  • methods: 我们的方法包括两个 ком成分:1) 自我扩展 Component 将 2D 组 FROM 单一视点扩展到空间全球级 3D 组; 2) 多modal 标签 Component 引入了二维检查机制来投票每个 2D 预测 bounding box 到最佳对应 3D 部件,并使用 Class Non-highest Vote Penalty 函数来精致化投票矩阵。
  • results: 我们的方法在 PartnetE 数据集上进行了三个 zero-shot segmentation 任务的广泛评估, achieved state-of-the-art 结果,与现有方法 (+19.6%, +5.2%, +4.9%, 分别) 有 statistically significant 优化。我们的提案不需要任何训练、 fine-tuning 或学习可变参数。它对预设�shift hardly affected。代码将会发布。
    Abstract Recently, many 2D pretrained foundational models have demonstrated impressive zero-shot prediction capabilities. In this work, we design a novel pipeline for zero-shot 3D part segmentation, called ZeroPS. It high-quality transfers knowledge from 2D pretrained foundational models to 3D point clouds. The main idea of our approach is to explore the natural relationship between multi-view correspondences and the prompt mechanism of foundational models and build bridges on it. Our pipeline consists of two components: 1) a self-extension component that extends 2D groups from a single viewpoint to spatial global-level 3D groups; 2) a multi-modal labeling component that introduces a two-dimensional checking mechanism to vote each 2D predicted bounding box to the best matching 3D part, and a Class Non-highest Vote Penalty function to refine the Vote Matrix. Additionally, a merging algorithm is included to merge part-level 3D groups. Extensive evaluation of three zero-shot segmentation tasks on PartnetE datasets, achieving state-of-the-art results with significant improvements (+19.6%, +5.2% and +4.9%, respectively) over existing methods. Our proposed approach does not need any training, fine-tuning or learnable parameters. It is hardly affected by domain shift. The code will be released.
    摘要 近些时候,许多2D预训模型已经展现出了吸引人的零批预测能力。在这项工作中,我们设计了一个新的零批3D部分 segmentation管道,称为ZeroPS。它高质量地传输了2D预训基本模型中的知识到3D点云。我们的方法的核心思想是利用多视图匹配和基础模型的提示机制之间的自然关系,建立桥梁。我们的管道包括两个组成部分:1)一个自适应组件,将单个视图中的2D组从单个视图扩展到全球水平的3D组;2)一个多模态标签组件,引入二维检查机制,将每个2D预测 bounding box 投票到最佳匹配的3D部分,并使用类非最高投票函数进行细化。此外,我们还包括一个合并算法,将部分级3D组合成。我们对三个零批 segmentation 任务进行了广泛的评估,在 PartnetE 数据集上 achievement 了现有方法的状态对应成绩 (+19.6%, +5.2% 和 +4.9%, 分别),而且我们的提posed方法不需要任何训练、微调或学习参数。它受到域shift的影响很小。我们将代码发布。

RSB-Pose: Robust Short-Baseline Binocular 3D Human Pose Estimation with Occlusion Handling

  • paper_url: http://arxiv.org/abs/2311.14242
  • repo_url: None
  • paper_authors: Xiaoyue Wan, Zhuo Chen, Yiming Bao, Xu Zhao
  • for: 短基线双目3D人姿估算,寻求更加PORTABLE的设备,同时维护 Géometric Measurement Property 以避免深度抽象。
  • methods: 提出了 Stereo Co-Keypoints Estimation 模块,通过对二视图2D点的匹配使用不同的Disparity,提高了二视图2D点的视角一致性,并通过 Stereo Volume Feature 来包含不同Disparity的二视图特征。此外,还提出了 Pre-trained Pose Transformer 模块,通过捕捉人姿协调关系,对 occlusion 进行处理。
  • results: 通过在 H36M 和 MHAD 数据集上进行了广泛的实验,以及对图像进行了视觉化,证明了我们的方法在短基线双目3D人姿估算和 occlusion 处理方面的效果。
    Abstract In the domain of 3D Human Pose Estimation, which finds widespread daily applications, the requirement for convenient acquisition equipment continues to grow. To satisfy this demand, we set our sights on a short-baseline binocular setting that offers both portability and a geometric measurement property that radically mitigates depth ambiguity. However, as the binocular baseline shortens, two serious challenges emerge: first, the robustness of 3D reconstruction against 2D errors deteriorates; and second, occlusion reoccurs due to the limited visual differences between two views. To address the first challenge, we propose the Stereo Co-Keypoints Estimation module to improve the view consistency of 2D keypoints and enhance the 3D robustness. In this module, the disparity is utilized to represent the correspondence of binocular 2D points and the Stereo Volume Feature is introduced to contain binocular features across different disparities. Through the regression of SVF, two-view 2D keypoints are simultaneously estimated in a collaborative way which restricts their view consistency. Furthermore, to deal with occlusions, a Pre-trained Pose Transformer module is introduced. Through this module, 3D poses are refined by perceiving pose coherence, a representation of joint correlations. This perception is injected by the Pose Transformer network and learned through a pre-training task that recovers iterative masked joints. Comprehensive experiments carried out on H36M and MHAD datasets, complemented by visualizations, validate the effectiveness of our approach in the short-baseline binocular 3D Human Pose Estimation and occlusion handling.
    摘要 在3D人姿估计领域中,日常应用的需求不断增长。为满足这一需求,我们选择了短基线双目设计,它提供了可携带性和深度不确定性的 geometric measurement 性能。然而,随着基线短化,存在两个严重的挑战:首先,3D重建对2D错误的Robustness下降;其次,因视场限制而导致 occlusion 重新发生。为解决第一个挑战,我们提议了stereo co-keypoints estimation模块,以提高视 consistency 的2D关键点和加强3D Robustness。在这个模块中,disparity 用于表示双目2D点之间的对应关系,并引入了stereo volume feature来包含不同disparities的双目特征。通过SVF的回归,两个视场2D关键点同时被 estimate,这限制了它们的视 consistency。此外,为处理 occlusion,我们引入了预训练 pose transformer 模块。通过这个模块,3D姿态被修正,以便 perceive pose coherence,这是通过 pose transformer 网络学习并在先期任务中恢复循环屏蔽的。通过在 H36M 和 MHAD 数据集上进行了广泛的实验,并通过视觉化,证明了我们的方法在短基线双目3D人姿估计和 occlusion 处理方面的效果。

SafeSea: Synthetic Data Generation for Adverse & Low Probability Maritime Conditions

  • paper_url: http://arxiv.org/abs/2311.14764
  • repo_url: https://github.com/martin-3240/safesea
  • paper_authors: Martin Tran, Jordan Shipard, Hermawan Mulyono, Arnold Wiliem, Clinton Fookes
  • for: 提高物体探测模型的Robustness,尤其是在恶势害海域检测中。
  • methods: 使用两个自动化筛选器,首先根据海域状况分类为不同的海State水平,然后检查输入图像中的物体是否保留。
  • results: 创建了SafeSea数据集,提供了不同的天气背景,以补充海上物体探测模型的训练。并观察到,在风暴海域背景下,物体探测模型具有较差的检测精度。
    Abstract High-quality training data is essential for enhancing the robustness of object detection models. Within the maritime domain, obtaining a diverse real image dataset is particularly challenging due to the difficulty of capturing sea images with the presence of maritime objects , especially in stormy conditions. These challenges arise due to resource limitations, in addition to the unpredictable appearance of maritime objects. Nevertheless, acquiring data from stormy conditions is essential for training effective maritime detection models, particularly for search and rescue, where real-world conditions can be unpredictable. In this work, we introduce SafeSea, which is a stepping stone towards transforming actual sea images with various Sea State backgrounds while retaining maritime objects. Compared to existing generative methods such as Stable Diffusion Inpainting~\cite{stableDiffusion}, this approach reduces the time and effort required to create synthetic datasets for training maritime object detection models. The proposed method uses two automated filters to only pass generated images that meet the criteria. In particular, these filters will first classify the sea condition according to its Sea State level and then it will check whether the objects from the input image are still preserved. This method enabled the creation of the SafeSea dataset, offering diverse weather condition backgrounds to supplement the training of maritime models. Lastly, we observed that a maritime object detection model faced challenges in detecting objects in stormy sea backgrounds, emphasizing the impact of weather conditions on detection accuracy. The code, and dataset are available at https://github.com/martin-3240/SafeSea.
    摘要 高品质的训练数据是增强物品探测模型的关键。在海上领域中,获取多样化的真实图像数据 particullay 困难,因为捕捉海洋图像时,海洋物品的存在特别困难,尤其是在飓风情况下。这些挑战的原因包括资源限制,以及海洋物品的不可预测出现方式。然而,从飓风情况中获取数据是训练有效海上物品探测模型的必要条件,特别是搜救行动中,实际世界的情况可能是不可预测的。在这个工作中,我们介绍了SafeSea,它是将实际海洋图像转换为多样化海洋背景的开始。相比于现有的生成方法,如稳定传播填充(Stable Diffusion Inpainting),这种方法可以快速创建海上物品探测模型的synthetic数据集。我们的方法使用两个自动筛选器,分别为 sea 状态水平的分类和物品从输入图像中是否仍然存在的检查。这个方法实现了创建SafeSea数据集,提供了多元的天气情况背景,供海上物品探测模型的训练。最后,我们观察到海上物品探测模型在飓风海域背景下侦测物品的问题,强调了天气情况对探测精度的影响。SafeSea 代码和数据可以在 GitHub 上获取:https://github.com/martin-3240/SafeSea。

Pseudo-label Correction for Instance-dependent Noise Using Teacher-student Framework

  • paper_url: http://arxiv.org/abs/2311.14237
  • repo_url: https://github.com/eugenekim3107/pseudo-label-correction-for-instance-dependent-noise-using-teacher-student-framework
  • paper_authors: Eugene Kim
  • for: 这篇论文旨在解决深度学习模型面临标签噪音的问题,对于标签噪音的影响使得模型对于标签的分类能力下降。
  • methods: 本研究提出了一个新的教师-学生架构,称为P-LC(伪标签修正),它利用三个Encoder来建立一个伪标签修正系统。当学生为一些图像生成伪标签时,教师将选择使用伪标签或原始标签。
  • results: 实验结果显示,P-LC在MNIST、Fashion-MNIST和SVHN等数据集上均表现出色,尤其在高噪音水平下。此外,我们还引入了一个噪音水平估计,帮助评估模型表现和决定是否需要进一步的数据清洁程序。
    Abstract The high capacity of deep learning models to learn complex patterns poses a significant challenge when confronted with label noise. The inability to differentiate clean and noisy labels ultimately results in poor generalization. We approach this problem by reassigning the label for each image using a new teacher-student based framework termed P-LC (pseudo-label correction). Traditional teacher-student networks are composed of teacher and student classifiers for knowledge distillation. In our novel approach, we reconfigure the teacher network into a triple encoder, leveraging the triplet loss to establish a pseudo-label correction system. As the student generates pseudo labels for a set of given images, the teacher learns to choose between the initially assigned labels and the pseudo labels. Experiments on MNIST, Fashion-MNIST, and SVHN demonstrate P-LC's superior performance over existing state-of-the-art methods across all noise levels, most notably in high noise. In addition, we introduce a noise level estimation to help assess model performance and inform the need for additional data cleaning procedures.
    摘要 高效的深度学习模型可以学习复杂的模式,但 Label 噪声问题却对其具有 significante challenge。由于无法区分干净和噪声标签,这 ultimately results in poor generalization。我们通过一种新的教师生学习框架(P-LC,pseudo-label correction)来解决这个问题。传统的教师生网络由教师和学生分类器组成,用于知识储存。在我们的新方法中,我们将教师网络重新配置为三重Encoder,利用 triplet loss 来建立一个pseudo-label correction系统。当学生生成一组给定图像的 pseudo labels时,教师将选择最初分配的标签还是 pseudo labels。实验表明,P-LC 在 MNIST、Fashion-MNIST 和 SVHN 上的性能都高于现有的状态之势方法,特别是在高噪声水平。此外,我们还提出了一种噪声水平估计,以便评估模型性能并提供额外数据清洁过程的指导。

cs.AI - 2023-11-24

Advancing Fluid-Based Thermal Management Systems Design: Leveraging Graph Neural Networks for Graph Regression and Efficient Enumeration Reduction

  • paper_url: http://arxiv.org/abs/2311.14874
  • repo_url: None
  • paper_authors: Saeid Bayat, Nastaran Shahmansouri, Satya RT Peddada, Alex Tessier, Adrian Butscher, James T Allison
    for: 这 paper 的目的是提出一种基于图的框架,用于快速和高效地找到优化 thermal management system 的设计方案。methods: 该 paper 使用了图学习模型(GNN),用于预测系统的性能值。首先,使用 GNN 模型对数据进行训练,然后使用这些模型来预测系统的性能值。results: 研究结果表明,使用 GNN 模型可以快速和高效地预测系统的性能值,并且可以减少约 92% 的系统动态模型化和优化控制分析时间。
    Abstract In this research, we developed a graph-based framework to represent various aspects of optimal thermal management system design, with the aim of rapidly and efficiently identifying optimal design candidates. Initially, the graph-based framework is utilized to generate diverse thermal management system architectures. The dynamics of these system architectures are modeled under various loading conditions, and an open-loop optimal controller is employed to determine each system's optimal performance. These modeled cases constitute the dataset, with the corresponding optimal performance values serving as the labels for the data. In the subsequent step, a Graph Neural Network (GNN) model is trained on 30% of the labeled data to predict the systems' performance, effectively addressing a regression problem. Utilizing this trained model, we estimate the performance values for the remaining 70% of the data, which serves as the test set. In the third step, the predicted performance values are employed to rank the test data, facilitating prioritized evaluation of the design scenarios. Specifically, a small subset of the test data with the highest estimated ranks undergoes evaluation via the open-loop optimal control solver. This targeted approach concentrates on evaluating higher-ranked designs identified by the GNN, replacing the exhaustive search (enumeration-based) of all design cases. The results demonstrate a significant average reduction of over 92% in the number of system dynamic modeling and optimal control analyses required to identify optimal design scenarios.
    摘要 在这项研究中,我们开发了一个基于图的框架来表示优化热管理系统设计的多种方面,以便快速和高效地确定优化设计方案。首先,基于图的框架用于生成多种热管理系统架构。这些系统架构的动态模型在不同的荷载条件下被模型化,并使用开箱控制器来确定每个系统的优化性能。这些模型的情况组成了数据集,其中每个数据对应的优化性能值服为标签。在接下来的步骤中,我们使用30%的标签数据训练了一个图神经网络(GNN)模型,以预测系统的性能。使用这个训练好的模型,我们对剩下的70%的数据进行预测,并将预测值用于排序测试数据。在第三步中,我们使用预测值来评价测试数据,并将其分为优化设计的小子集。这些小子集进行了优化控制的评估,从而实现了高效的优化设计评价。结果表明,使用GNN模型可以减少系统动态模型化和优化控制分析的数量超过92%。

Improving Cross-Domain Hate Speech Generalizability with Emotion Knowledge

  • paper_url: http://arxiv.org/abs/2311.14865
  • repo_url: None
  • paper_authors: Shi Yin Hong, Susan Gauch
  • for: 这项研究旨在提高仇恨言语检测系统的通用性,以便在真实世界中进行部署。
  • methods: 这项研究使用了多任务架构,利用情感知识来提高仇恨言语检测的通用性。
  • results: 研究表明,使用情感知识可以提高仇恨言语检测的通用性,并在六个公共可用的数据集上实现了18.1%的通用性提升和8.5%的平均跨领域性提升,根据F1度量。
    Abstract Reliable automatic hate speech (HS) detection systems must adapt to the in-flow of diverse new data to curtail hate speech. However, hate speech detection systems commonly lack generalizability in identifying hate speech dissimilar to data used in training, impeding their robustness in real-world deployments. In this work, we propose a hate speech generalization framework that leverages emotion knowledge in a multitask architecture to improve the generalizability of hate speech detection in a cross-domain setting. We investigate emotion corpora with varying emotion categorical scopes to determine the best corpus scope for supplying emotion knowledge to foster generalized hate speech detection. We further assess the relationship between using pretrained Transformers models adapted for hate speech and its effect on our emotion-enriched hate speech generalization model. We perform extensive experiments on six publicly available datasets sourced from different online domains and show that our emotion-enriched HS detection generalization method demonstrates consistent generalization improvement in cross-domain evaluation, increasing generalization performance up to 18.1% and average cross-domain performance up to 8.5%, according to the F1 measure.
    摘要 可靠的自动仇恨言语检测系统需要适应新数据的流入,以遏制仇恨言语的 распространение。然而,仇恨言语检测系统通常缺乏对不同于训练数据的仇恨言语识别的普适性,从而限制其在实际应用中的稳定性。在这种情况下,我们提出了一种基于情感知识的仇恨言语检测框架,以提高仇恨言语检测的普适性在跨领域设置下。我们 investigate了不同的情感领域词库,以确定最佳的词库范围来供情感知识,以促进普适的仇恨言语检测。我们进一步评估了使用预训练的Transformers模型,以适应仇恨言语检测,并对我们的情感强化仇恨言语检测模型产生了何种影响。我们在六个公共可用的数据集上进行了广泛的实验,并显示了我们的情感强化仇恨言语检测普适化方法在跨领域评估中具有可靠的普适性提升,提高总体普适性达18.1%,平均跨领域性达8.5%,根据F1度量。

Next-gen traffic surveillance: AI-assisted mobile traffic violation detection system

  • paper_url: http://arxiv.org/abs/2311.16179
  • repo_url: None
  • paper_authors: Dila Dede, Mehmet Ali Sarsıl, Ata Shaker, Olgu Altıntaş, Onur Ergen
  • for: 这篇论文的目的是探讨如何运用人工智能技术来实现精确的交通法规遵循检测系统,以减少交通事故的影响。
  • methods: 本论文使用的方法包括计算机视觉和机器学习,包括YOLOv5检测模组和强SORT追踪模组,以检测交通违法行为。
  • results: 本论文的结果显示,这些方法可以实现精确的交通违法检测,包括红灯违法、非法使用崩溃车道、违反车辆跟踪距离、违反标志法规、非法停车和停车在标志道路上。
    Abstract Road traffic accidents pose a significant global public health concern, leading to injuries, fatalities, and vehicle damage. Approximately 1,3 million people lose their lives daily due to traffic accidents [World Health Organization, 2022]. Addressing this issue requires accurate traffic law violation detection systems to ensure adherence to regulations. The integration of Artificial Intelligence algorithms, leveraging machine learning and computer vision, has facilitated the development of precise traffic rule enforcement. This paper illustrates how computer vision and machine learning enable the creation of robust algorithms for detecting various traffic violations. Our model, capable of identifying six common traffic infractions, detects red light violations, illegal use of breakdown lanes, violations of vehicle following distance, breaches of marked crosswalk laws, illegal parking, and parking on marked crosswalks. Utilizing online traffic footage and a self-mounted on-dash camera, we apply the YOLOv5 algorithm's detection module to identify traffic agents such as cars, pedestrians, and traffic signs, and the strongSORT algorithm for continuous interframe tracking. Six discrete algorithms analyze agents' behavior and trajectory to detect violations. Subsequently, an Identification Module extracts vehicle ID information, such as the license plate, to generate violation notices sent to relevant authorities.
    摘要 交通事故是全球公共健康的一大挑战,引起伤亡、死亡和车辆损害。根据世界卫生组织2022年的数据,每天有1,3万人因交通事故丧生。为解决这一问题,需要 precisetraffic law violation detection system,以确保遵守法规。通过人工智能算法,利用机器学习和计算机视觉,可以开发出高精度的交通规则执行系统。本文介绍了计算机视觉和机器学习如何为交通规则执行创造出Robust algorithms。我们的模型可以识别六种常见的交通违法行为,包括红灯违法、非法使用车辆维护道、车辆跟踪违法、人行交通违法、非法停车和停车在人行交通道上。我们使用在车上安装的自适应摄像头和网络交通视频,并应用YOLOv5检测模块和强Sort算法来识别交通代理人(包括车辆、行人和交通标志)的行为和轨迹,然后分别对代理人的行为和轨迹进行分析,并生成违法通知。最后,一个标识模块从车辆ID信息中提取车辆牌照号,以生成违法通知并发送到相关当局。

A Reusable AI-Enabled Defect Detection System for Railway Using Ensembled CNN

  • paper_url: http://arxiv.org/abs/2311.14824
  • repo_url: None
  • paper_authors: Rahatara Ferdousi, Fedwa Laamarti, Chunsheng Yang, Abdulmotaleb El Saddik
  • For: 提高智能铁路系统中的可靠性,通过检测铁路部件的缺陷。* Methods: 使用ensemble learning和转移学习模型(VGG-19、MobileNetV3和ResNet-50),并对不同阶段进行训练,以提高缺陷分类精度和鲁棒性。* Results: 对比其他状态艺术方法,实验结果表明我们的方法可以获得更好和更一致的性能, substantiating the reusability of the defect detection system for newly evolved defected rail parts。
    Abstract Accurate Defect detection is crucial for ensuring the trustworthiness of intelligent railway systems. Current approaches rely on single deep-learning models, like CNNs, which employ a large amount of data to capture underlying patterns. Training a new defect classifier with limited samples often leads to overfitting and poor performance on unseen images. To address this, researchers have advocated transfer learning and fine-tuning the pre-trained models. However, using a single backbone network in transfer learning still may cause bottleneck issues and inconsistent performance if it is not suitable for a specific problem domain. To overcome these challenges, we propose a reusable AI-enabled defect detection approach. By combining ensemble learning with transfer learning models (VGG-19, MobileNetV3, and ResNet-50), we improved the classification accuracy and achieved consistent performance at a certain phase of training. Our empirical analysis demonstrates better and more consistent performance compared to other state-of-the-art approaches. The consistency substantiates the reusability of the defect detection system for newly evolved defected rail parts. Therefore we anticipate these findings to benefit further research and development of reusable AI-enabled solutions for railway systems.
    摘要 精准的缺陷检测是智能铁路系统的可靠性保证之重要因素。现有的方法通常采用单个深度学习模型,如Convolutional Neural Networks(CNNs),利用大量数据捕捉下面的模式。然而,在训练新的缺陷分类器时,通常会导致过拟合和未看到的图像表现不佳。为解决这些挑战,研究人员已经提出了传输学习和精度调整预训练模型的方法。然而,使用单一的背部网络在传输学习中仍可能会导致瓶颈问题和不一致的表现。为了解决这些问题,我们提出了一种可重用的人工智能启用缺陷检测方法。通过将集成学习与传输学习模型(VGG-19、MobileNetV3和ResNet-50)相结合,我们提高了分类精度并在某些阶段的训练中实现了一致的表现。我们的实验分析表明,我们的方法在其他状态艺术方法的比较中表现更好和一致。这种一致性证明了缺陷检测系统的可重用性,因此我们预计这些发现将对智能铁路系统的进一步研发产生积极的影响。

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

  • paper_url: http://arxiv.org/abs/2311.15826
  • repo_url: https://github.com/mbzuai-oryx/geochat
  • paper_authors: Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer, Abhijit Das, Salman Khan, Fahad Shahbaz Khan
    for: 这篇论文的目的是提出一种可靠的远程感知大型语言模型(VLM),以便在远程感知领域进行对话。methods: 该模型使用多任务对话能力,可以处理高分辨率远程感知图像,并且可以根据用户提问的要求进行地区特定的对话。results: 模型在远程感知多任务对话中表现出色,例如图像和地区描述、视觉问答、场景分类、视觉引用检测等任务上,都有robust零基eline性表现。
    Abstract Recent advancements in Large Vision-Language Models (VLMs) have shown great promise in natural image domains, allowing users to hold a dialogue about given visual content. However, such general-domain VLMs perform poorly for Remote Sensing (RS) scenarios, leading to inaccurate or fabricated information when presented with RS domain-specific queries. Such a behavior emerges due to the unique challenges introduced by RS imagery. For example, to handle high-resolution RS imagery with diverse scale changes across categories and many small objects, region-level reasoning is necessary alongside holistic scene interpretation. Furthermore, the lack of domain-specific multimodal instruction following data as well as strong backbone models for RS make it hard for the models to align their behavior with user queries. To address these limitations, we propose GeoChat - the first versatile remote sensing VLM that offers multitask conversational capabilities with high-resolution RS images. Specifically, GeoChat can not only answer image-level queries but also accepts region inputs to hold region-specific dialogue. Furthermore, it can visually ground objects in its responses by referring to their spatial coordinates. To address the lack of domain-specific datasets, we generate a novel RS multimodal instruction-following dataset by extending image-text pairs from existing diverse RS datasets. We establish a comprehensive benchmark for RS multitask conversations and compare with a number of baseline methods. GeoChat demonstrates robust zero-shot performance on various RS tasks, e.g., image and region captioning, visual question answering, scene classification, visually grounded conversations and referring detection. Our code is available at https://github.com/mbzuai-oryx/geochat.
    摘要 近期大量视觉语言模型(VLM)的进步已经在自然图像领域显示出了极大的搭配潜力,allowing用户与图像进行对话。然而,这些通用领域VLM对远程感知(RS)场景表现糟糕,导致用户提出的RS领域特定问题时返回错误或 fabricated 信息。这种行为的原因在于RS图像具有独特的挑战,例如处理高分辨率RS图像的多种比例变化和许多小对象需要地域级别的理解。此外,RS领域缺乏特定的多模态指令遵循数据和强大的后准体模型,使模型困难匹配用户查询。为解决这些限制,我们提出了GeoChat - 首个多功能远程感知VLM,具有多任务对话功能和高分辨率RS图像。具体来说,GeoChat不仅可以回答图像级别的问题,还可以接受区域输入进行区域特定对话。此外,它可以将对象视觉匹配到其空间坐标。为了解决RS领域缺乏特定数据,我们生成了一个新的RS多模态指令遵循数据集,通过扩展现有多种RS数据集的图像文本对。我们建立了RS多任务对话的全面benchmark,并与多个基线方法进行比较。GeoChat在多种RS任务上 exhibits robust zero-shot表现,例如图像和区域captioning、视觉问题回答、场景分类、visually grounded conversations和referring detection。我们的代码可以在https://github.com/mbzuai-oryx/geochat 上获取。

Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

  • paper_url: http://arxiv.org/abs/2311.14656
  • repo_url: https://github.com/jonathan-roberts1/charting-new-territories
  • paper_authors: Jonathan Roberts, Timo Lüddecke, Rehan Sheikh, Kai Han, Samuel Albanie
  • for: This paper explores the capabilities of multimodal large language models (MLLMs) in the geographic and geospatial domains, and evaluates their performance against open-source counterparts.
  • methods: The paper uses a small-scale geographic benchmark consisting of a suite of visual tasks to challenge the models and test their abilities across a spectrum of complexity.
  • results: The analysis reveals where the models excel and where they falter, providing a balanced view of their capabilities in the geographic domain. Additionally, the benchmark will be publicly released to enable the comparison and evaluation of future models.Here’s the Chinese version of the three key points:
  • for: 这篇论文探索了大型语言模型(MLLMs)在地理和地球几何领域的知识和能力,并对开源对手进行评估。
  • methods: 这篇论文使用一个小规模的地理 benchmark,包括一系列视觉任务,用于挑战这些模型,并测试它们在复杂性谱中的能力。
  • results: 分析发现,这些模型在某些任务上 excel,甚至超过人类的性能,但也有一些任务在它们失败。这个分析提供了地理领域中模型的总体能力的平衡视图。此外,这个 benchmark 将会公开发布,以便将来的模型进行比较和评估。
    Abstract Multimodal large language models (MLLMs) have shown remarkable capabilities across a broad range of tasks but their knowledge and abilities in the geographic and geospatial domains are yet to be explored, despite potential wide-ranging benefits to navigation, environmental research, urban development, and disaster response. We conduct a series of experiments exploring various vision capabilities of MLLMs within these domains, particularly focusing on the frontier model GPT-4V, and benchmark its performance against open-source counterparts. Our methodology involves challenging these models with a small-scale geographic benchmark consisting of a suite of visual tasks, testing their abilities across a spectrum of complexity. The analysis uncovers not only where such models excel, including instances where they outperform humans, but also where they falter, providing a balanced view of their capabilities in the geographic domain. To enable the comparison and evaluation of future models, our benchmark will be publicly released.
    摘要 多模态大语言模型(MLLM)已经表现出了广泛的能力,但它们在地理和地区领域的知识和能力尚未得到了探索,尽管这些能力在航行、环境研究、城市发展和灾害应急应用中具有广泛的应用前景。我们进行了一系列实验,探索了不同的视觉能力,特别是关注最新的FRONTIER模型GPT-4V,并对开源对手进行了比较。我们的方法包括对这些模型进行小规模的地理测试,测试它们在复杂性谱上的能力。分析结果表明,这些模型在某些情况下可以超越人类,而在其他情况下则表现不佳,这为未来的模型评估和比较提供了一个平衡的视角。为便于将来的模型评估和比较,我们将在公共领域中发布我们的准则。

Evaluating Large Language Models through Gender and Racial Stereotypes

  • paper_url: http://arxiv.org/abs/2311.14788
  • repo_url: None
  • paper_authors: Ananya Malik
  • for: 这项研究是为了研究语言模型中可能存在的偏见,特别是gender和race偏见在职业场景下的表现。
  • methods: 这项研究采用了比较性研究的方法,使用了 newer和older语言模型,对两类偏见进行了评估。
  • results: 研究发现, newer模型中的gender偏见已经大幅减少,而race偏见仍然存在。
    Abstract Language Models have ushered a new age of AI gaining traction within the NLP community as well as amongst the general population. AI's ability to make predictions, generations and its applications in sensitive decision-making scenarios, makes it even more important to study these models for possible biases that may exist and that can be exaggerated. We conduct a quality comparative study and establish a framework to evaluate language models under the premise of two kinds of biases: gender and race, in a professional setting. We find out that while gender bias has reduced immensely in newer models, as compared to older ones, racial bias still exists.
    摘要 人工智能(AI)在自然语言处理(NLP)领域以及普通民众中得到了广泛的推广和应用。AI的预测、生成和敏感决策应用等能力,使得研究这些模型中可能存在的偏见变得非常重要。我们进行了质量比较研究,并建立了一个基于两种偏见(性别和种族)的评价框架,在职业场景中进行了测试。我们发现, newer models中的性别偏见已经减少了很多,而种族偏见仍然存在。

History Filtering in Imperfect Information Games: Algorithms and Complexity

  • paper_url: http://arxiv.org/abs/2311.14651
  • repo_url: None
  • paper_authors: Christopher Solinas, Douglas Rebstock, Nathan R. Sturtevant, Michael Buro
  • for: 这篇论文主要适用于不完全信息游戏中的深度有限搜索和价值函数。
  • methods: 论文使用了深度有限搜索和价值函数来解决不完全信息游戏中的问题。它们还使用了一些有强理论保证的方法,但这些方法可能需要较为复杂的计算。
  • results: 论文的实验结果表明,在使用深度有限搜索和价值函数时,可以在不完全信息游戏中实现更好的性能。它们还提供了一种基于马尔可夫链 Монте卡罗牛算法的生成算法,可以在诸如牛牛等游戏中进行更加快速的搜索。
    Abstract Historically applied exclusively to perfect information games, depth-limited search with value functions has been key to recent advances in AI for imperfect information games. Most prominent approaches with strong theoretical guarantees require subgame decomposition - a process in which a subgame is computed from public information and player beliefs. However, subgame decomposition can itself require non-trivial computations, and its tractability depends on the existence of efficient algorithms for either full enumeration or generation of the histories that form the root of the subgame. Despite this, no formal analysis of the tractability of such computations has been established in prior work, and application domains have often consisted of games, such as poker, for which enumeration is trivial on modern hardware. Applying these ideas to more complex domains requires understanding their cost. In this work, we introduce and analyze the computational aspects and tractability of filtering histories for subgame decomposition. We show that constructing a single history from the root of the subgame is generally intractable, and then provide a necessary and sufficient condition for efficient enumeration. We also introduce a novel Markov Chain Monte Carlo-based generation algorithm for trick-taking card games - a domain where enumeration is often prohibitively expensive. Our experiments demonstrate its improved scalability in the trick-taking card game Oh Hell. These contributions clarify when and how depth-limited search via subgame decomposition can be an effective tool for sequential decision-making in imperfect information settings.
    摘要

Calibrated Language Models Must Hallucinate

  • paper_url: http://arxiv.org/abs/2311.14648
  • repo_url: None
  • paper_authors: Adam Tauman Kalai, Santosh S. Vempala
  • for: 这研究旨在解释语言模型中的幻像现象,并提出了一种解释方法。
  • methods: 该研究使用了统计学方法,对语言模型的幻像现象进行分析。
  • results: 研究发现,幻像现象是因为语言模型满足一定的统计学条件而出现的,而这些条件不是由transformer语言模型架构或数据质量决定的。此外,研究还发现,对于一些特定的事实,幻像是语言模型的必要条件,以确保其能够生成有效的文本。
    Abstract Recent language models have a mysterious tendency to generate false but plausible-sounding text. Such "hallucinations" are an obstacle to the usability of language-based AI systems and can harm people who rely upon their outputs. This work shows shows that there is an inherent statistical reason that pretrained language models hallucinate certain types of facts, having nothing to do with the transformer LM architecture or data quality. For "arbitrary" facts whose veracity cannot be determined from the training data, we show that hallucination is necessary for language models that satisfy a statistical calibration condition appropriate for generative language models. Specifically, if the maximum probability of any fact is bounded, we show that the probability of generating a hallucination is close to the fraction of facts that occur exactly once in the training data (a "Good-Turing" estimate), even assuming ideal training data without errors. One conclusion is that models pretrained to be sufficiently good predictors (i.e., calibrated) may require post-training to mitigate hallucinations on the type of arbitrary facts that tend to appear once in the training set. However, our analysis also suggests that there is no statistical reason that pretraining will lead to hallucination on facts that tend to appear more than once in the training data (like references to publications such as articles and books, whose hallucinations have been particularly notable and problematic) or on systematic facts (like arithmetic calculations). Therefore, different architectures and learning algorithms may mitigate these latter types of hallucinations.
    摘要 现代语言模型有一种神秘的倾向,即生成 false but plausible-sounding 的文本。这些 "hallucination" 会使语言基于 AI 系统的可用性受到影响,并可能伤害人们对其输出的依赖。这项工作表明,这种 hallucination 有一定的统计原因,与 transformer LM 架构或数据质量无关。对于无法从训练数据中确定真假的 "arbitrary" 事实,我们显示了 hallucination 是语言模型满足统计均衡 condtion 的必要条件。Specifically, if the maximum probability of any fact is bounded, we show that the probability of generating a hallucination is close to the fraction of facts that occur exactly once in the training data (a "Good-Turing" estimate), even assuming ideal training data without errors. 这一结论是,可以在训练后使用 post-training 来减少基于训练集中单一出现的hallucination。然而,我们的分析还表明,没有统计原因使得预训练会导致 hallucination 发生在多次出现在训练数据中的事实(如文献引用如文章和书籍的hallucination)或系统性的事实(如数学计算)。因此,不同的架构和学习算法可能会解决这些类型的 hallucination。

GPT-4V Takes the Wheel: Evaluating Promise and Challenges for Pedestrian Behavior Prediction

  • paper_url: http://arxiv.org/abs/2311.14786
  • repo_url: None
  • paper_authors: Jia Huang, Peng Jiang, Alvika Gautam, Srikanth Saripalli
  • for: 预测行人行为,提高自动驾驶安全性
  • methods: 使用大型多modal模型(LMM)和语言模型GPT-4V(vision),通过semi-supervised Training来提高视觉理解和 causal 逻辑能力
  • results: 在使用公共可用数据集JAAD、PIE和WiDEVIEW进行评估时,GPT-4V(vision)表现出zero-shot行人行为预测和自动驾驶场景理解能力,但还未达到传统领域专门模型的水平,存在小人和车辆在运动时的处理困难
    Abstract Existing pedestrian behavior prediction methods rely primarily on deep neural networks that utilize features extracted from video frame sequences. Although these vision-based models have shown promising results, they face limitations in effectively capturing and utilizing the dynamic spatio-temporal interactions between the target pedestrian and its surrounding traffic elements, crucial for accurate reasoning. Additionally, training these models requires manually annotating domain-specific datasets, a process that is expensive, time-consuming, and difficult to generalize to new environments and scenarios. The recent emergence of Large Multimodal Models (LMMs) offers potential solutions to these limitations due to their superior visual understanding and causal reasoning capabilities, which can be harnessed through semi-supervised training. GPT-4V(ision), the latest iteration of the state-of-the-art Large-Language Model GPTs, now incorporates vision input capabilities. This report provides a comprehensive evaluation of the potential of GPT-4V for pedestrian behavior prediction in autonomous driving using publicly available datasets: JAAD, PIE, and WiDEVIEW. Quantitative and qualitative evaluations demonstrate GPT-4V(ision)'s promise in zero-shot pedestrian behavior prediction and driving scene understanding ability for autonomous driving. However, it still falls short of the state-of-the-art traditional domain-specific models. Challenges include difficulties in handling small pedestrians and vehicles in motion. These limitations highlight the need for further research and development in this area.
    摘要 现有的步行者行为预测方法主要依靠深度神经网络,使用视频帧序列中提取的特征进行学习。 although these vision-based models have shown promising results, they have limitations in effectively capturing and utilizing the dynamic spatio-temporal interactions between the target pedestrian and its surrounding traffic elements, which is crucial for accurate reasoning. In addition, training these models requires manually annotating domain-specific datasets, which is expensive, time-consuming, and difficult to generalize to new environments and scenarios.However, the recent emergence of Large Multimodal Models (LMMs) offers potential solutions to these limitations due to their superior visual understanding and causal reasoning capabilities, which can be harnessed through semi-supervised training. GPT-4V(ision), the latest iteration of the state-of-the-art Large-Language Model GPTs, now incorporates vision input capabilities.This report provides a comprehensive evaluation of the potential of GPT-4V for pedestrian behavior prediction in autonomous driving using publicly available datasets: JAAD, PIE, and WiDEVIEW. Quantitative and qualitative evaluations demonstrate GPT-4V(ision)'s promise in zero-shot pedestrian behavior prediction and driving scene understanding ability for autonomous driving. However, it still falls short of the state-of-the-art traditional domain-specific models. Challenges include difficulties in handling small pedestrians and vehicles in motion. These limitations highlight the need for further research and development in this area.

One Strike, You’re Out: Detecting Markush Structures in Low Signal-to-Noise Ratio Images

  • paper_url: http://arxiv.org/abs/2311.14633
  • repo_url: https://github.com/thomasjurriaans/markush-recognition-msc-thesis
  • paper_authors: Thomas Jurriaans, Kinga Szarkowska, Eric Nalisnick, Markus Schwoerer, Camilo Thorne, Saber Akhondi
  • for: 本研究旨在提出和测试一种用于分类Markush结构的新方法,以提高化学家从大量文献中提取化学信息的自动化方法的精度。
  • methods: 本研究使用了两种方法进行比较:固定特征提取法和终端学习(CNN)方法。研究发现,终端学习方法与固定特征提取法的比较显著,其Macro F1值为0.928(0.035 SD),而固定特征提取法的Macro F1值为0.701(0.052 SD)。
  • results: 研究结果表明,提出的方法可以有效地和准确地过滤Markush结构,并且可以提高化学家使用OCSR管道时的精度。
    Abstract Modern research increasingly relies on automated methods to assist researchers. An example of this is Optical Chemical Structure Recognition (OCSR), which aids chemists in retrieving information about chemicals from large amounts of documents. Markush structures are chemical structures that cannot be parsed correctly by OCSR and cause errors. The focus of this research was to propose and test a novel method for classifying Markush structures. Within this method, a comparison was made between fixed-feature extraction and end-to-end learning (CNN). The end-to-end method performed significantly better than the fixed-feature method, achieving 0.928 (0.035 SD) Macro F1 compared to the fixed-feature method's 0.701 (0.052 SD). Because of the nature of the experiment, these figures are a lower bound and can be improved further. These results suggest that Markush structures can be filtered out effectively and accurately using the proposed method. When implemented into OCSR pipelines, this method can improve their performance and use to other researchers.
    摘要 现代研究日益依靠自动化方法助长研究人员。一个例子是光学化学结构识别(OCSR),它帮助化学家从大量文献中提取化学物质的信息。马库什结构是无法正确分析的化学结构,会导致错误。本研究的重点是提议并测试一种新的马库什结构分类方法。在这种方法中,对比了固定特征提取和端到端学习(CNN)两种方法。端到端方法在比较中表现了显著的优势,达到了0.928(0.035 SD)的macro F1指标,比固定特征方法的0.701(0.052 SD)高出了许多。由于实验的性质,这些数据是下界值,可以进一步提高。这些结果表明,可以使用提案的方法有效地和准确地过滤马库什结构。将这种方法纳入OCSR管道中,可以提高其性能,并且可以用于其他研究人员。

ARIA: On the interaction between Architectures, Aggregation methods and Initializations in federated visual classification

  • paper_url: http://arxiv.org/abs/2311.14625
  • repo_url: None
  • paper_authors: Vasilis Siomos, Sergio Naval-Marimont, Jonathan Passerat-Palmbach, Giacomo Tarroni
  • For: The paper is written to investigate the effect of architecture, initialization, and aggregation (ARIA) elements on the performance of federated learning (FL) models in medical image classification tasks.* Methods: The paper uses a joint ARCHitecture-Initialization-Aggregation (ARIA) study and benchmarking approach to evaluate the performance of different ARIA element combinations across a range of medical image classification tasks.* Results: The paper finds that ARIA elements should be chosen together to achieve the best possible performance, and provides insights into good choices for each element depending on the task, the effect of normalization layers, and the utility of secure multi-party computation (SMPC) pre-training.
    Abstract Federated Learning (FL) is a collaborative training paradigm that allows for privacy-preserving learning of cross-institutional models by eliminating the exchange of sensitive data and instead relying on the exchange of model parameters between the clients and a server. Despite individual studies on how client models are aggregated, and, more recently, on the benefits of ImageNet pre-training, there is a lack of understanding of the effect the architecture chosen for the federation has, and of how the aforementioned elements interconnect. To this end, we conduct the first joint ARchitecture-Initialization-Aggregation study and benchmark ARIAs across a range of medical image classification tasks. We find that, contrary to current practices, ARIA elements have to be chosen together to achieve the best possible performance. Our results also shed light on good choices for each element depending on the task, the effect of normalisation layers, and the utility of SSL pre-training, pointing to potential directions for designing FL-specific architectures and training pipelines.
    摘要 合作学习(Federated Learning,FL)是一种协同训练模式,允许客户端在保护隐私数据的情况下进行模型训练,通过客户端和服务器之间交换模型参数而不是敏感数据。虽然有关客户端模型的汇集方法和ImageNet预训练的研究已经进行过,但是还没有很好地理解 federation 的架构选择对性能的影响,以及这些元素之间的关系。为了解决这个问题,我们进行了首次的ARchitecture-Initialization-Aggregation研究和ARIA比较,并在医学图像分类任务上进行了检验。我们发现,与现有做法不同,ARIA元素需要一起选择以达到最佳性能。我们的结果还揭示了每个元素在任务上的好选择,normalization层的影响以及SSL预训练的用处,这些结果可能指向FL特有的架构设计和训练管道的可能方向。

Eliciting Honest Information From Authors Using Sequential Review

  • paper_url: http://arxiv.org/abs/2311.14619
  • repo_url: None
  • paper_authors: Yichi Zhang, Grant Schoenebeck, Weijie Su
  • for: 提高会议审核质量和拒绝低质量纸张
  • methods: 使用sequential review机制,通过conditioning the review of the next paper on the review scores of the previous papers来评估作者的纸张质量
  • results: 1) 能够获取作者真实的排名信息,在现实 scenarios下比前作更真实; 2) 提高接受纸张质量,减少审核工作量,提高接受纸张的平均质量; 3) 鼓励作者写 fewer papers of higher quality.
    Abstract In the setting of conference peer review, the conference aims to accept high-quality papers and reject low-quality papers based on noisy review scores. A recent work proposes the isotonic mechanism, which can elicit the ranking of paper qualities from an author with multiple submissions to help improve the conference's decisions. However, the isotonic mechanism relies on the assumption that the author's utility is both an increasing and a convex function with respect to the review score, which is often violated in peer review settings (e.g.~when authors aim to maximize the number of accepted papers). In this paper, we propose a sequential review mechanism that can truthfully elicit the ranking information from authors while only assuming the agent's utility is increasing with respect to the true quality of her accepted papers. The key idea is to review the papers of an author in a sequence based on the provided ranking and conditioning the review of the next paper on the review scores of the previous papers. Advantages of the sequential review mechanism include 1) eliciting truthful ranking information in a more realistic setting than prior work; 2) improving the quality of accepted papers, reducing the reviewing workload and increasing the average quality of papers being reviewed; 3) incentivizing authors to write fewer papers of higher quality.
    摘要 在会议同仁评审设置下,会议希望接受高质量的文章,拒绝低质量的文章基于噪音评分。一项最近的工作提议使用ISO逻辑机制,以提取作者多篇提交的文章质量排名。然而,ISO逻辑机制假设作者的价值函数是增加和凸函数,这在同仁评审设置中经常被违反(例如,作者尝试最大化被接受的文章数量)。在这篇文章中,我们提议一种顺序评审机制,可以真实地提取作者的排名信息,只需要假设作者的价值函数是增加的。我们的顺序评审机制的优点包括:1)在更真实的设置下提取真实的排名信息;2)提高接受文章的质量,减少评审工作量,提高平均被接受的文章质量;3)激励作者写 fewer 但高质量的文章。

A Survey and Analysis of Evolutionary Operators for Permutations

  • paper_url: http://arxiv.org/abs/2311.14595
  • repo_url: https://github.com/cicirello/permutation-crossover-landscape-analysis
  • paper_authors: Vincent A. Cicirello
  • for: 该论文主要研究了 permutation 问题的进化算法。
  • methods: 论文使用了多种进化算法的Operator,包括交叉和突变Operator。
  • results: 论文通过实验分析了不同 permutation 特征的突变Operator。
    Abstract There are many combinatorial optimization problems whose solutions are best represented by permutations. The classic traveling salesperson seeks an optimal ordering over a set of cities. Scheduling problems often seek optimal orderings of tasks or activities. Although some evolutionary approaches to such problems utilize the bit strings of a genetic algorithm, it is more common to directly represent solutions with permutations. Evolving permutations directly requires specialized evolutionary operators. Over the years, many crossover and mutation operators have been developed for solving permutation problems with evolutionary algorithms. In this paper, we survey the breadth of evolutionary operators for permutations. We implemented all of these in Chips-n-Salsa, an open source Java library for evolutionary computation. Finally, we empirically analyze the crossover operators on artificial fitness landscapes isolating different permutation features.
    摘要 有很多 combinatorial optimization 问题的解是最佳 permutation。经典的旅行销售员寻找一个最佳 ordering 的城市集。调度问题通常寻找最佳 ordering 的任务或活动。虽然一些EVOLUTIONARY APPROACHES 使用生物学演化算法中的 bit string,但更常直接表示解为 permutation。进化 permutation 需要特殊的进化运算。过去几十年,为解 permutation 问题而开发了很多 crossing 和 mutation 运算。在这篇论文中,我们对 permutation 问题的进化运算进行了评估。我们在 Chips-n-Salsa,一个开源 Java 库 для进化计算中实现了所有这些运算。最后,我们对人工适应度 landscape 中的 permutation 特征进行了实验分析。

GPT Struct Me: Probing GPT Models on Narrative Entity Extraction

  • paper_url: http://arxiv.org/abs/2311.14583
  • repo_url: https://github.com/hmosousa/gpt_struct_me
  • paper_authors: Hugo Sousa, Nuno Guimarães, Alípio Jorge, Ricardo Campos
  • for: 本研究旨在评估两种现代自然语言处理模型(GPT-3和GPT-3.5)在新闻文本中提取结构化信息的能力。
  • methods: 本研究使用了两种state-of-the-art语言模型(GPT-3和GPT-3.5),通过对Text2Story Lusa数据集进行评估,以评估这两种模型在提取新闻文本中的事件、参与者和时间表达等结构化信息的能力。
  • results: 研究结果表明,GPT模型在提取新闻文本中的结构化信息方面具有竞争力,与现有的基eline系统相比,GPT模型可以作为有限资源的实践者提供一个综合性的解决方案。
    Abstract The importance of systems that can extract structured information from textual data becomes increasingly pronounced given the ever-increasing volume of text produced on a daily basis. Having a system that can effectively extract such information in an interoperable manner would be an asset for several domains, be it finance, health, or legal. Recent developments in natural language processing led to the production of powerful language models that can, to some degree, mimic human intelligence. Such effectiveness raises a pertinent question: Can these models be leveraged for the extraction of structured information? In this work, we address this question by evaluating the capabilities of two state-of-the-art language models -- GPT-3 and GPT-3.5, commonly known as ChatGPT -- in the extraction of narrative entities, namely events, participants, and temporal expressions. This study is conducted on the Text2Story Lusa dataset, a collection of 119 Portuguese news articles whose annotation framework includes a set of entity structures along with several tags and attribute values. We first select the best prompt template through an ablation study over prompt components that provide varying degrees of information on a subset of documents of the dataset. Subsequently, we use the best templates to evaluate the effectiveness of the models on the remaining documents. The results obtained indicate that GPT models are competitive with out-of-the-box baseline systems, presenting an all-in-one alternative for practitioners with limited resources. By studying the strengths and limitations of these models in the context of information extraction, we offer insights that can guide future improvements and avenues to explore in this field.
    摘要 随着文本数据的日常生成量不断增加,EXTRACT STRUCTURED INFORMATION FROM TEXT的重要性也在不断提高。一个可以有效地EXTRACT STRUCTURED INFORMATION的系统在各个领域都是一种资产,例如金融、医疗和法律。自然语言处理技术的发展使得可以部分模拟人类智能的语言模型出现了。这种效果提出了一个有关问题:可以使用这些模型EXTRACT STRUCTURED INFORMATION吗?在这个工作中,我们将回答这个问题,通过评估GPT-3和GPT-3.5(也称为ChatGPT)语言模型在新闻文本中EXTRACT STRUCTURED INFORMATION的能力。我们使用了Text2Story Lusa数据集,包含119篇葡萄牙文新闻文本,其中每篇文本都有一个实体结构和多个标签和属性值的注释框架。我们首先通过对一些文本的子集进行ablation研究,选择最佳提示模板。然后,我们使用最佳提示模板来评估模型在剩下的文本上的效果。结果表明,GPT模型与外部基eline系统竞争,提供一个全面的替代方案,可以为具有有限资源的实践者提供帮助。通过研究这些模型在EXTRACT STRUCTURED INFORMATION的上下文中的优劣点和局限性,我们提供了指导未来改进和探索的信息。

Counting Solutions to Conjunctive Queries: Structural and Hybrid Tractability

  • paper_url: http://arxiv.org/abs/2311.14579
  • repo_url: None
  • paper_authors: Hubie Chen, Gianluigi Greco, Stefan Mengel, Francesco Scarcello
  • for: 本研究旨在解决基本问题,即在数据库中计数搜索结果的效率。
  • methods: 本文使用 #-hypertree decompositions 方法,具体来说是利用查询的结构特性和数据库中的属性,包括键或其他弱型度Constraints,从而提高计数效率。
  • results: 本研究证明,对于具有 bounded #-hypertree width 的查询,可以在 polynomial time 内计数结果。此外,本文还证明,对于 bounded arity 查询,bounded #-hypertree width 性质 precisely delineates the frontier of tractability for the counting problem。
    Abstract Counting the number of answers to conjunctive queries is a fundamental problem in databases that, under standard assumptions, does not have an efficient solution. The issue is inherently #P-hard, extending even to classes of acyclic instances. To address this, we pinpoint tractable classes by examining the structural properties of instances and introducing the novel concept of #-hypertree decomposition. We establish the feasibility of counting answers in polynomial time for classes of queries featuring bounded #-hypertree width. Additionally, employing novel techniques from the realm of fixed-parameter computational complexity, we prove that, for bounded arity queries, the bounded #-hypertree width property precisely delineates the frontier of tractability for the counting problem. This result closes an important gap in our understanding of the complexity of such a basic problem for conjunctive queries and, equivalently, for constraint satisfaction problems (CSPs). Drawing upon #-hypertree decompositions, a ''hybrid'' decomposition method emerges. This approach leverages both the structural characteristics of the query and properties intrinsic to the input database, including keys or other (weaker) degree constraints that limit the permissible combinations of values. Intuitively, these features may introduce distinct structural properties that elude identification through the ''worst-possible database'' perspective inherent in purely structural methods.
    摘要 计算 conjunctive 查询的答案数是Database的基本问题,在标准假设下,不存在有效的解决方案。这个问题是基本 #P-hard,甚至涵盖了不向的分支实例。 To address this, we identify tractable classes by examining the structural properties of instances and introducing the novel concept of #-hypertree decomposition. We establish the feasibility of counting answers in polynomial time for classes of queries featuring bounded #-hypertree width. Additionally, employing novel techniques from the realm of fixed-parameter computational complexity, we prove that, for bounded arity queries, the bounded #-hypertree width property precisely delineates the frontier of tractability for the counting problem. This result closes an important gap in our understanding of the complexity of such a basic problem for conjunctive queries and, equivalently, for constraint satisfaction problems (CSPs). Based on #-hypertree decompositions, a ''hybrid'' decomposition method emerges. This approach leverages both the structural characteristics of the query and properties intrinsic to the input database, including keys or other (weaker) degree constraints that limit the permissible combinations of values. Intuitively, these features may introduce distinct structural properties that elude identification through the ''worst-possible database'' perspective inherent in purely structural methods.

RAISE – Radiology AI Safety, an End-to-end lifecycle approach

  • paper_url: http://arxiv.org/abs/2311.14570
  • repo_url: None
  • paper_authors: M. Jorge Cardoso, Julia Moosbauer, Tessa S. Cook, B. Selnur Erdal, Brad Genereaux, Vikash Gupta, Bennett A. Landman, Tiarna Lee, Parashkev Nachev, Elanchezhian Somasundaram, Ronald M. Summers, Khaled Younis, Sebastien Ourselin, Franz MJ Pfister
  • for: 这篇论文旨在探讨人工智能(AI)在医学影像领域的应用,以提高诊断和治疗效果,但同时也需要仔细考虑避免潜在的风险。
  • methods: 论文提出了一种遵循高标准的验证和评估过程,以确保AI模型在医疗实践中的安全、有效性和诊断能力具备最高标准。在生产环境中,实施输入和输出的“ guardrails” 来防止个别故障,并且进行不断的 postevaluation 以跟踪数据的变化、公平性和价值传递。
  • results: 论文强调了在多个层次的质量保证,包括法规、临床、技术和伦理方面,以确保AI在医疗实践中的应用。它还提出了在医疗系统、产业、学术和政府机构之间建立合作的重要性,以解决多方面的挑战。通过这种方式,开发者可以赢得用户和患者的信任,并使AI在医疗领域得到负责任的普及和应用。
    Abstract The integration of AI into radiology introduces opportunities for improved clinical care provision and efficiency but it demands a meticulous approach to mitigate potential risks as with any other new technology. Beginning with rigorous pre-deployment evaluation and validation, the focus should be on ensuring models meet the highest standards of safety, effectiveness and efficacy for their intended applications. Input and output guardrails implemented during production usage act as an additional layer of protection, identifying and addressing individual failures as they occur. Continuous post-deployment monitoring allows for tracking population-level performance (data drift), fairness, and value delivery over time. Scheduling reviews of post-deployment model performance and educating radiologists about new algorithmic-driven findings is critical for AI to be effective in clinical practice. Recognizing that no single AI solution can provide absolute assurance even when limited to its intended use, the synergistic application of quality assurance at multiple levels - regulatory, clinical, technical, and ethical - is emphasized. Collaborative efforts between stakeholders spanning healthcare systems, industry, academia, and government are imperative to address the multifaceted challenges involved. Trust in AI is an earned privilege, contingent on a broad set of goals, among them transparently demonstrating that the AI adheres to the same rigorous safety, effectiveness and efficacy standards as other established medical technologies. By doing so, developers can instil confidence among providers and patients alike, enabling the responsible scaling of AI and the realization of its potential benefits. The roadmap presented herein aims to expedite the achievement of deployable, reliable, and safe AI in radiology.
    摘要 radiology 中的 AI integreation 引入了改善临床护理和效率的机会,但它也需要一个极其小心的方法来减轻潜在的风险,与任何新技术一样。开始于严格的预部署评估和验证,我们应该确保模型达到最高的安全性、有效性和效果标准。在生产使用时,实施输入和输出的 guardrails 作为额外的保护层,并在发生个体失败时进行识别和修复。Continuous post-deployment monitoring 可以跟踪人口级别的性能(数据漂移)、公平和价值传递,并在时间上跟踪。审核 poste-deployment 模型性能和教育 radiologists 关于新算法驱动的发现是关键,以确保 AI 在临床实践中效果。认可 AI 无法提供绝对保障,即使受限于其用途,因此我们强调多级质量控制 - 法规、临床、技术和道德 - 的共同努力。医疗机构、产业、学术和政府之间的合作是必要的,以解决多方面的挑战。通过 transparent 地表明 AI 遵循同样的严格安全、有效性和效果标准,开发者可以赢得提供者和患者的信任,使 AI 负责任地扩展。以上路线图预测了在 radiology 中实现可靠、可信、安全的 AI。

Electric Vehicles coordination for grid balancing using multi-objective Harris Hawks Optimization

  • paper_url: http://arxiv.org/abs/2311.14563
  • repo_url: None
  • paper_authors: Cristina Bianca Pop, Tudor Cioara, Viorica Chifu, Ionut Anghel, Francesco Bellesini
  • for: 本研究旨在提出一种EV Fleet协调模型,以确保地方网络供应的可靠性和稳定性,并利用EV充电和充电期间的能量储存和发电。
  • methods: 本研究使用了Harris Hawks优化算法(HHO)来解决优化问题,考虑了网络能量均衡、时间使用偏好和EV驱逐器的位置。EV充电和充电时间的调整是通过探索和利用操作来实现,并确保技术和操作上的可行性。
  • results: 研究结果表明,协调充电和充电期间的EV Fleet不仅可以满足网络平衡服务要求,还与用户偏好保持相似性,差异较小。
    Abstract The rise of renewables coincides with the shift towards Electrical Vehicles (EVs) posing technical and operational challenges for the energy balance of the local grid. Nowadays, the energy grid cannot deal with a spike in EVs usage leading to a need for more coordinated and grid aware EVs charging and discharging strategies. However, coordinating power flow from multiple EVs into the grid requires sophisticated algorithms and load-balancing strategies as the complexity increases with more control variables and EVs, necessitating large optimization and decision search spaces. In this paper, we propose an EVs fleet coordination model for the day ahead aiming to ensure a reliable energy supply and maintain a stable local grid, by utilizing EVs to store surplus energy and discharge it during periods of energy deficit. The optimization problem is addressed using Harris Hawks Optimization (HHO) considering criteria related to energy grid balancing, time usage preference, and the location of EV drivers. The EVs schedules, associated with the position of individuals from the population, are adjusted through exploration and exploitation operations, and their technical and operational feasibility is ensured, while the rabbit individual is updated with a non-dominated EV schedule selected per iteration using a roulette wheel algorithm. The solution is evaluated within the framework of an e-mobility service in Terni city. The results indicate that coordinated charging and discharging of EVs not only meet balancing service requirements but also align with user preferences with minimal deviations.
    摘要 随着可再生能源的发展,电动汽车(EV)的使用也在增加,对当地电网的能源平衡造成技术和运营挑战。现在,电网无法处理大量EV的使用,导致需要更加协调和grid aware的EV充电和充电策略。然而,将多个EV的电流输入到电网中需要复杂的算法和负荷均衡策略,因为 complexity 随着更多的控制变量和EV的增加。在这篇论文中,我们提出了一种EV队伍协调模型,以确保可靠的能源供应和维护当地电网的稳定,通过使用EV存储过剩能量并在能源缺乏期间释放。这个优化问题使用Harris Hawks优化(HHO),考虑电网平衡、时间使用偏好和EV驾驶员的位置。EV的时间表,与人口中的个体相关,通过探索和利用操作进行调整,并确保技术和运营可行性。每轮更新 rabbit 个体使用不在dominated EV时间表选择。解决方案在特内利市的e-мобиility服务框架下进行评估。结果表明,协调充电和充电的EV不仅满足平衡服务要求,还与用户偏好保持一致,偏差相对较小。

Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models

  • paper_url: http://arxiv.org/abs/2311.14552
  • repo_url: https://github.com/jefferyzhan/griffon
  • paper_authors: Yufei Zhan, Yousong Zhu, Zhiyang Chen, Fan Yang, Ming Tang, Jinqiao Wang
  • for: 该研究旨在探讨现有的大量视力语言模型(LVLM)是否可以具备基本对象感知能力,并如何在不同的本地化场景下提高其性能。
  • methods: 该研究使用了现有的LVLM,并通过自定义的语言提示集和精度的位置认知任务,使模型能够准确地识别和定位对象。
  • results: 研究表明,使用LVLM可以具备基本对象感知能力,并且可以在不同的本地化场景下提高性能。此外,提出了一种基于LVLM的新的语言提示本地化集,可以帮助提高模型的性能。
    Abstract Replicating the innate human ability to detect all objects based on free-form texts at any granularity remains a formidable challenge for Vision-Language models. Current Large Vision Language Models (LVLMs) are predominantly constrained to grounding a single, pre-existing object, relying solely on data from Referring Expression Comprehension tasks. The limitation leads to a compromise in model design, necessitating the introduction of visual expert models or the integration of customized head structures. Beyond these constraints, our research delves into the untapped potential of LVLMs and uncover their inherent capability for basic object perception, allowing them to accurately identify and locate objects of interest. Building on this insight, we introduce a novel language-prompted localization dataset designed to fully unleash the capabilities of LVLMs in integrating fine-grained object perception with precise location awareness. More importantly, we present $\textbf{Griffon}$, a purely LVLM-based baseline, which does not require the introduction of any special tokens, expert models, or additional detection modules. It simply maintains a consistent structure with popular LVLMs by unifying data formats across various localization-related scenarios and is trained end-to-end through a well-designed pipeline. Comprehensive experiments demonstrate that $\textbf{Griffon}$ not only achieves state-of-the-art performance on the fine-grained RefCOCO series but also approaches the capabilities of the expert model Faster RCNN on the detection benchmark MSCOCO.
    摘要 Current large vision language models (LVLMs) can only ground a single pre-existing object based on referring expression comprehension tasks, which limits their design and requires the use of visual expert models or customized head structures. Our research explores the untapped potential of LVLMs and discovers their inherent ability to perceive objects at a fine grain level, allowing them to accurately identify and locate objects of interest. Building on this insight, we introduce a novel language-prompted localization dataset that fully utilizes the capabilities of LVLMs in integrating fine-grained object perception with precise location awareness.We also present a purely LVLM-based baseline called Griffon, which does not require the introduction of special tokens, expert models, or additional detection modules. Griffon maintains a consistent structure with popular LVLMs and is trained end-to-end through a well-designed pipeline. Comprehensive experiments show that Griffon achieves state-of-the-art performance on the fine-grained RefCOCO series and approaches the capabilities of the expert model Faster RCNN on the detection benchmark MSCOCO.

Inferring Latent Class Statistics from Text for Robust Visual Few-Shot Learning

  • paper_url: http://arxiv.org/abs/2311.14544
  • repo_url: https://github.com/ybendou/fs-text2stats
  • paper_authors: Yassir Bendou, Vincent Gripon, Bastien Pasdeloup, Giulia Lioi, Lukas Mauch, Fabien Cardinaux, Ghouthi Boukli Hacene
  • for: 提高几个shot学习中的横跨领域稳定性
  • methods: 利用文本 derive 统计来预测每个类视觉特征分布的平均和方差
  • results: 在多个数据集上显示了在 incorporating mean和covariance统计时提高几个shot学习中的分类性能
    Abstract In the realm of few-shot learning, foundation models like CLIP have proven effective but exhibit limitations in cross-domain robustness especially in few-shot settings. Recent works add text as an extra modality to enhance the performance of these models. Most of these approaches treat text as an auxiliary modality without fully exploring its potential to elucidate the underlying class visual features distribution. In this paper, we present a novel approach that leverages text-derived statistics to predict the mean and covariance of the visual feature distribution for each class. This predictive framework enriches the latent space, yielding more robust and generalizable few-shot learning models. We demonstrate the efficacy of incorporating both mean and covariance statistics in improving few-shot classification performance across various datasets. Our method shows that we can use text to predict the mean and covariance of the distribution offering promising improvements in few-shot learning scenarios.
    摘要 在几拠学习领域,基础模型如CLIP表现良好,但在跨领域稳定性方面存在限制,特别是在几拠设置下。现有工作通过文本作为附加特征来提高这些模型的表现。大多数这些方法将文本视为辅助特征而不是全面探索文本的潜在作用,即利用文本获得类视觉特征分布的下文。在这篇论文中,我们提出一种新的方法,利用文本 derive 的统计来预测每个类的视觉特征分布的平均值和方差。这种预测框架把潜在空间增强,从而提高几拠学习模型的 robustness 和普适性。我们在不同的数据集上展示了将两者(即平均值和方差统计)合并在一起的效果,并证明文本可以用来预测分布的平均值和方差,这种方法在几拠学习场景中具有承诺性。

Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language

  • paper_url: http://arxiv.org/abs/2311.14543
  • repo_url: None
  • paper_authors: Di Jin, Shikib Mehri, Devamanyu Hazarika, Aishwarya Padmakumar, Sungjin Lee, Yang Liu, Mahdi Namazifar
  • for: 本研究旨在 investigate 大语言模型(LLM)的数据效率, Specifically, 我们在1000个记录或更少的人类反馈中进行了finetuning。
  • methods: 我们使用了一个开源的LLM,如Falcon-40B-Instruct,并在人类反馈中进行了 критики和修订。
  • results: 我们发现,这种方法可以改善很多强大的LLM,如ChatGPT、BARD和Vicuna,的答案质量。例如,经过一次修订,ChatGPT的答案的赢得率可以提高至56.6%,并且可以进一步提高至65.9% после应用五次修订。
    Abstract Learning from human feedback is a prominent technique to align the output of large language models (LLMs) with human expectations. Reinforcement learning from human feedback (RLHF) leverages human preference signals that are in the form of ranking of response pairs to perform this alignment. However, human preference on LLM outputs can come in much richer forms including natural language, which may provide detailed feedback on strengths and weaknesses of a given response. In this work we investigate data efficiency of modeling human feedback that is in natural language. Specifically, we fine-tune an open-source LLM, e.g., Falcon-40B-Instruct, on a relatively small amount (1000 records or even less) of human feedback in natural language in the form of critiques and revisions of responses. We show that this model is able to improve the quality of responses from even some of the strongest LLMs such as ChatGPT, BARD, and Vicuna, through critique and revision of those responses. For instance, through one iteration of revision of ChatGPT responses, the revised responses have 56.6% win rate over the original ones, and this win rate can be further improved to 65.9% after applying the revision for five iterations.
    摘要

RDF Stream Taxonomy: Systematizing RDF Stream Types in Research and Practice

  • paper_url: http://arxiv.org/abs/2311.14540
  • repo_url: None
  • paper_authors: Piotr Sowinski, Pawel Szmeja, Maria Ganzha, Marcin Paprzycki
  • for: 本研究旨在 Addressing the critical research gap in RDF streaming 范畴的系统化和描述,提供一个 novel taxonomy (RDF-STaX) 以推动 RDF 流程的研究和实践。
  • methods: 本研究使用 OWL 2 DL ontology 和 FAIR 原则,提供了详细的文档和其他资源,以促进 ontology 的采用。 二个实现的用例显示了资源的使用方式,并且提供了一个 collaborative, living state-of-the-art 的 RDF 流程评论。
  • results: 本研究实现了一个 novel nanopublications dataset, serves as a collaborative, living state-of-the-art review of RDF streaming。 RDF-STaX 这个资源旨在推动 RDF 流程的科研讨论、合作和工具互操作性。
    Abstract Over the years, RDF streaming was explored in research and practice from many angles, resulting in a wide range of RDF stream definitions. This variety presents a major challenge in discussing and integrating streaming solutions, due to the lack of a common language. This work attempts to address this critical research gap, by systematizing RDF stream types present in the literature in a novel taxonomy. The proposed RDF Stream Taxonomy (RDF-STaX) is embodied in an OWL 2 DL ontology that follows the FAIR principles, making it readily applicable in practice. Extensive documentation and additional resources are provided, to foster the adoption of the ontology. Two realized use cases are presented, demonstrating the usefulness of the resource in discussing research works and annotating streaming datasets. Another result of this contribution is the novel nanopublications dataset, which serves as a collaborative, living state-of-the-art review of RDF streaming. The aim of RDF-STaX is to address a real need of the community for a better way to systematize and describe RDF streams. The resource is designed to help drive innovation in RDF streaming, by fostering scientific discussion, cooperation, and tool interoperability.
    摘要 The proposed RDF Stream Taxonomy (RDF-STaX) is formalized in an OWL 2 DL ontology that adheres to the FAIR principles, making it easily applicable in practice. Detailed documentation and additional resources are provided to facilitate the adoption of the ontology. Two practical use cases are presented, demonstrating the utility of the resource in discussing research works and annotating streaming datasets.Moreover, this contribution includes a novel nanopublications dataset, which serves as a collaborative, living state-of-the-art review of RDF streaming. The goal of RDF-STaX is to provide a much-needed solution for systematizing and describing RDF streams, thereby driving innovation in RDF streaming and fostering scientific discussion, cooperation, and tool interoperability.

CMed-GPT: Prompt Tuning for Entity-Aware Chinese Medical Dialogue Generation

  • paper_url: http://arxiv.org/abs/2311.14539
  • repo_url: None
  • paper_authors: Zhijie Qu, Juan Li, Zerui Ma, Jianqiang Li
  • for: 本研究旨在提高中文医疗对话生成技术的水平,以满足在线医疗咨询的需求。
  • methods: 该研究提出了基于中文医疗领域文本的 GPT 预训练语言模型(CMed-GPT),并在基本版本和大版本之间进行了比较。此外,文本中的词语和实体嵌入也被纳入对话文本中,以满足下游对话生成任务的需求。
  • results: 通过 fine-tuning 和 p-tuning,CMed-GPT 模型的 PPL 值从 8.44 降低至 7.35,证明了 CMed-GPT 模型在生成中文医疗文本方面的 Exceptional 表现。此外,研究还表明,在医疗对话生成中,包含外部信息的 incorporation 可以提高对话质量。
    Abstract Medical dialogue generation relies on natural language generation techniques to enable online medical consultations. Recently, the widespread adoption of large-scale models in the field of natural language processing has facilitated rapid advancements in this technology. Existing medical dialogue models are mostly based on BERT and pre-trained on English corpora, but there is a lack of high-performing models on the task of Chinese medical dialogue generation. To solve the above problem, this paper proposes CMed-GPT, which is the GPT pre-training language model based on Chinese medical domain text. The model is available in two versions, namely, base and large, with corresponding perplexity values of 8.64 and 8.01. Additionally, we incorporate lexical and entity embeddings into the dialogue text in a uniform manner to meet the requirements of downstream dialogue generation tasks. By applying both fine-tuning and p-tuning to CMed-GPT, we lowered the PPL from 8.44 to 7.35. This study not only confirms the exceptional performance of the CMed-GPT model in generating Chinese biomedical text but also highlights the advantages of p-tuning over traditional fine-tuning with prefix prompts. Furthermore, we validate the significance of incorporating external information in medical dialogue generation, which enhances the quality of dialogue generation.
    摘要 医疗对话生成依靠自然语言生成技术,以实现在线医疗咨询。近年来,大规模模型在自然语言处理领域的普及,使得医疗对话生成技术得到了快速的进步。现有的医疗对话模型大多基于BERT并在英语 corpus 上进行预训练,但是中文医疗对话生成模型却缺乏高性能模型。为解决这个问题,本文提出了CMed-GPT模型,它是基于中文医疗领域文本的 GPT 预训练语言模型。该模型有两个版本,即基础版和大版本,其中每个版本的 PPL 值分别为8.64和8.01。此外,我们在对话文本中uniformly embedding lexical和实体信息,以满足下游对话生成任务的需求。通过对 CMed-GPT 模型进行 fine-tuning 和 p-tuning,我们将 PPL 从8.44下降到7.35。本研究不仅证明了 CMed-GPT 模型在生成中文医疗文本方面的 exceptional 表现,还 highlights 精度调整的优势,以及在医疗对话生成中 External information 的添加对对话质量有益的效果。

Digital Twin-Native AI-Driven Service Architecture for Industrial Networks

  • paper_url: http://arxiv.org/abs/2311.14532
  • repo_url: None
  • paper_authors: Kubra Duran, Matthew Broadbent, Gokhan Yurdakul, Berk Canberk
  • for: 提高智能网络管理需求的减少,以满足大规模网络的精准监测和学习需求。
  • methods: 提议DT NATIVE AI驱动服务架构,实现了TCP基于数据流管道和RL基于学习模型。
  • results: 对Internet of Vehicles(IoV)网络进行应用,实现了约30%的处理时间减少,并测试了不同学习率组合的actor和critic网络性能。
    Abstract The dramatic increase in the connectivity demand results in an excessive amount of Internet of Things (IoT) sensors. To meet the management needs of these large-scale networks, such as accurate monitoring and learning capabilities, Digital Twin (DT) is the key enabler. However, current attempts regarding DT implementations remain insufficient due to the perpetual connectivity requirements of IoT networks. Furthermore, the sensor data streaming in IoT networks cause higher processing time than traditional methods. In addition to these, the current intelligent mechanisms cannot perform well due to the spatiotemporal changes in the implemented IoT network scenario. To handle these challenges, we propose a DT-native AI-driven service architecture in support of the concept of IoT networks. Within the proposed DT-native architecture, we implement a TCP-based data flow pipeline and a Reinforcement Learning (RL)-based learner model. We apply the proposed architecture to one of the broad concepts of IoT networks, the Internet of Vehicles (IoV). We measure the efficiency of our proposed architecture and note ~30% processing time-saving thanks to the TCP-based data flow pipeline. Moreover, we test the performance of the learner model by applying several learning rate combinations for actor and critic networks and highlight the most successive model.
    摘要 “由于互联网的需求增加,现在有大量的物联网(IoT)实现。为了管理这些大规模网络的需求,例如精准监控和学习能力,then Digital Twin(DT)是关键的启动器。但现有DT实现的尝试仍然不足,主要是因为互联网的无穷连接需求。此外,IoT网络中的感应器资料流对传统方法来处理的时间长得多。此外,目前的智能机制不能够在实现的IoT网络场景中表现良好,这是因为当前的IoT网络场景中存在着空间时间的变化。为了解决这些挑战,我们提议一个DT-Native AI驱动服务架构,用于支持IoT网络概念。在我们的提议架构中,我们实现了基于TCP的数据流水平和基于强化学习(RL)的学习模型。我们将这个架构应用到互联网网络中的一个广泛概念,即互联网交通(IoV)。我们测量了我们的提议架构的效率,发现在TCP基础的数据流水平下,可以节省约30%的处理时间。此外,我们测试了学习模型的性能,使用了不同的学习率组合 дляactor和批评网络,并评估了最成功的模型。”

FRAD: Front-Running Attacks Detection on Ethereum using Ternary Classification Model

  • paper_url: http://arxiv.org/abs/2311.14514
  • repo_url: None
  • paper_authors: Yuheng Zhang, Pin Liu, Guojun Wang, Peiqiang Li, Wanyi Gu, Houji Chen, Xuelei Liu, Jinyao Zhu
  • for: 本研究旨在提供一种准确地检测ETHEREUM上的前播攻击方法,以保护交易安全性。
  • methods: 该研究提出了一种基于ternary分类模型的FRAD(前播攻击检测模型),可以准确地分类ETHEREUM上的交易活动,并对交易执行进行检测和分类。
  • results: 实验结果表明,使用多层感知器(MLP)分类器可以达到84.59%的检测精度和84.60%的F1分数,表明FRAD模型可以高效地检测前播攻击。
    Abstract With the evolution of blockchain technology, the issue of transaction security, particularly on platforms like Ethereum, has become increasingly critical. Front-running attacks, a unique form of security threat, pose significant challenges to the integrity of blockchain transactions. In these attack scenarios, malicious actors monitor other users' transaction activities, then strategically submit their own transactions with higher fees. This ensures their transactions are executed before the monitored transactions are included in the block. The primary objective of this paper is to delve into a comprehensive classification of transactions associated with front-running attacks, which aims to equip developers with specific strategies to counter each type of attack. To achieve this, we introduce a novel detection method named FRAD (Front-Running Attacks Detection on Ethereum using Ternary Classification Model). This method is specifically tailored for transactions within decentralized applications (DApps) on Ethereum, enabling accurate classification of front-running attacks involving transaction displacement, insertion, and suppression. Our experimental validation reveals that the Multilayer Perceptron (MLP) classifier offers the best performance in detecting front-running attacks, achieving an impressive accuracy rate of 84.59% and F1-score of 84.60%.
    摘要 The primary objective of this paper is to classify transactions associated with front-running attacks in a comprehensive manner, equipping developers with specific strategies to counter each type of attack. To achieve this, we propose a novel detection method called FRAD (Front-Running Attacks Detection on Ethereum using Ternary Classification Model), which is specifically tailored for transactions within decentralized applications (DApps) on Ethereum. This method can accurately classify front-running attacks involving transaction displacement, insertion, and suppression.Our experimental validation shows that the Multilayer Perceptron (MLP) classifier offers the best performance in detecting front-running attacks, with an impressive accuracy rate of 84.59% and F1-score of 84.60%.

StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization

  • paper_url: http://arxiv.org/abs/2311.14495
  • repo_url: None
  • paper_authors: Shida Wang, Qianxiao Li
  • for: 这 paper 研究了 state-space models (SSMs) 的长期记忆学习能力,从参数化的角度出发。
  • methods: 这 paper 使用了一种批量梯度下降法来优化 SSMs,并提出了一种类型的重parameterization 技术来解决 SSMs 的记忆限制。
  • results: 这 paper 发现,使用重parameterization 技术可以解决 SSMs 的记忆限制,并且可以提高其近似能力和优化稳定性。
    Abstract In this paper, we investigate the long-term memory learning capabilities of state-space models (SSMs) from the perspective of parameterization. We prove that state-space models without any reparameterization exhibit a memory limitation similar to that of traditional RNNs: the target relationships that can be stably approximated by state-space models must have an exponential decaying memory. Our analysis identifies this "curse of memory" as a result of the recurrent weights converging to a stability boundary, suggesting that a reparameterization technique can be effective. To this end, we introduce a class of reparameterization techniques for SSMs that effectively lift its memory limitations. Besides improving approximation capabilities, we further illustrate that a principled choice of reparameterization scheme can also enhance optimization stability. We validate our findings using synthetic datasets and language models.
    摘要 在这篇论文中,我们研究了状态空间模型(SSM)的长期记忆学习能力,从参数化的角度来看。我们证明了没有重parameterization的状态空间模型会显示出类似于传统RNN的记忆限制:target关系可以被稳定地由状态空间模型 aproximate的必须有减少幅度的记忆。我们的分析表明这是由于恒等重量 converging to a stability boundary的结果,这些重量可以通过重parameterization技术得到有效地提高。为此,我们介绍了一类重parameterization技术,可以有效地提高SSM的approximation能力。此外,我们还证明了一个原则性的选择重parameterization方案可以提高优化稳定性。我们验证了我们的发现使用 sintetic数据和语言模型。

Sliding Window FastEdit: A Framework for Lesion Annotation in Whole-body PET Images

  • paper_url: http://arxiv.org/abs/2311.14482
  • repo_url: https://github.com/matt3o/autopet2-submission
  • paper_authors: Matthias Hadlich, Zdravko Marinov, Moon Kim, Enrico Nasca, Jens Kleesiek, Rainer Stiefelhagen
  • for: 这篇论文主要针对的是解决核医影像中疾病分类的准确性问题,但是需要大量的手动 voxel 注释来训练。
  • methods: 该论文提出了 SW-FastEdit 交互分割框架,通过只需要一些用户点击来加速分割,而不是原来的 voxelwise 注释。
  • results: 该模型在 AutoPET 数据集上比非静止窗口交互模型准确,并且可以通过静止窗口方式在 HECKTOR 数据集上进行推断。用户研究发现,只需要10次点击循环就能达到高质量预测,并且 NASA-TLX 工作负担较低。
    Abstract Deep learning has revolutionized the accurate segmentation of diseases in medical imaging. However, achieving such results requires training with numerous manual voxel annotations. This requirement presents a challenge for whole-body Positron Emission Tomography (PET) imaging, where lesions are scattered throughout the body. To tackle this problem, we introduce SW-FastEdit - an interactive segmentation framework that accelerates the labeling by utilizing only a few user clicks instead of voxelwise annotations. While prior interactive models crop or resize PET volumes due to memory constraints, we use the complete volume with our sliding window-based interactive scheme. Our model outperforms existing non-sliding window interactive models on the AutoPET dataset and generalizes to the previously unseen HECKTOR dataset. A user study revealed that annotators achieve high-quality predictions with only 10 click iterations and a low perceived NASA-TLX workload. Our framework is implemented using MONAI Label and is available: https://github.com/matt3o/AutoPET2-Submission/
    摘要 深度学习已经革命化医疗影像中精准的疾病分割。然而,实现这些结果需要训练数量很大的手动 voxel 注释。这个要求对整体体 Positron Emission Tomography (PET) 成像而言是一个挑战,因为疾病是全身贯穿的。为解决这个问题,我们介绍 SW-FastEdit - 一个互动性分割框架,可以通过只需要几个用户点击来加速标注。而先前的互动模型因内存限制而会裁剪或重新大小 PET Volume,我们的滑块窗口基本互动方案可以使用完整的 Volume。我们的模型在 AutoPET 数据集上超越了现有的非滑块窗口互动模型,并在之前未seen的 HECKTOR 数据集上保持了良好的一致性。一个用户研究表明,用户可以在 10 次点击 iterations 内获得高质量预测,并且 NASA-TLX 工作负担低。我们的框架使用 MONAI Label 实现,可以在 GitHub 上获取:https://github.com/matt3o/AutoPET2-Submission/

Evolutionary game theory: the mathematics of evolution and collective behaviours

  • paper_url: http://arxiv.org/abs/2311.14480
  • repo_url: None
  • paper_authors: The Anh Han
  • for: 这篇论文旨在探讨EVOLUTIONARY GAME THEORY作为集体行为进化的 poderful和统一的数学工具。
  • methods: 这些研究方向使用EVOLUTIONARY GAME THEORY方法,包括统计性质分析random evolutionary game中稳定平衡数量的性质,以及模拟高等技术发展的危险和AI技术发展竞赛中安全行为的EVOLUTION。
  • results: 这些研究得到了一些有趣的结论,例如:在random evolutionary game中,稳定平衡数量的统计性质具有某些特征,并且AI技术发展竞赛中,安全行为的EVOLUTION可以帮助减少技术发展中的风险。
    Abstract This brief discusses evolutionary game theory as a powerful and unified mathematical tool to study evolution of collective behaviours. It summarises some of my recent research directions using evolutionary game theory methods, which include i) the analysis of statistical properties of the number of (stable) equilibria in a random evolutionary game, and ii) the modelling of safety behaviours' evolution and the risk posed by advanced Artificial Intelligence technologies in a technology development race. Finally, it includes an outlook and some suggestions for future researchers.
    摘要
  1. The analysis of statistical properties of the number of (stable) equilibria in a random evolutionary game.2. The modeling of safety behaviors’ evolution and the risk posed by advanced Artificial Intelligence technologies in a technology development race.Finally, it includes an outlook and some suggestions for future researchers.Translated into Simplified Chinese:这个简报讨论了EVOLUTIONARY GAME theory作为集合行为的演化的强大和统一的数学工具。它概述了我最近的研究方向,使用EVOLUTIONARY GAME theory方法,包括:1. 随机演化游戏中稳定平衡数量的统计性质的分析。2. 技术发展竞赛中安全行为的演化和高级人工智能技术的风险的模型。最后,它包括一个看门和未来研究者的建议。Translated into Traditional Chinese:这个简报讨论了EVOLUTIONARY GAME theory作为集合行为的演化的强大和统一的数学工具。它概述了我最近的研究方向,使用EVOLUTIONARY GAME theory方法,包括:1. 随机演化游戏中稳定平衡数量的统计性质的分析。2. 技术发展竞赛中安全行为的演化和高级人工智能技术的风险的模型。最后,它包括一个看门和未来研究者的建议。

MRxaI: Black-Box Explainability for Image Classifiers in a Medical Setting

  • paper_url: http://arxiv.org/abs/2311.14471
  • repo_url: None
  • paper_authors: Nathan Blake, Hana Chockler, David A. Kelly, Santiago Calderon Pena, Akchunya Chanchal
  • for: 这 paper 是关于解释静止图像分类器输出的研究,尤其是针对医疗领域中的 MRI 图像。
  • methods: 这 paper 使用了多种黑盒方法,包括 causal explainability-based rex,以及其他一些常见的黑盒方法。
  • results: 研究发现,大多数黑盒方法不适合解释医疗领域中的静止图像分类结果,而 causal explainability-based rex 则能够与 gradcam 相比,表现很好。
    Abstract Existing tools for explaining the output of image classifiers can be divided into white-box, which rely on access to the model internals, and black-box, agnostic to the model. As the usage of AI in the medical domain grows, so too does the usage of explainability tools. Existing work on medical image explanations focuses on white-box tools, such as gradcam. However, there are clear advantages to switching to a black-box tool, including the ability to use it with any classifier and the wide selection of black-box tools available. On standard images, black-box tools are as precise as white-box. In this paper we compare the performance of several black-box methods against gradcam on a brain cancer MRI dataset. We demonstrate that most black-box tools are not suitable for explaining medical image classifications and present a detailed analysis of the reasons for their shortcomings. We also show that one black-box tool, a causal explainability-based rex, performs as well as \gradcam.
    摘要 现有的图像分类器解释工具可以分为白盒和黑盒两类,其中白盒工具需要对模型内部具有访问权,而黑盒工具则不依赖于模型。随着医疗领域中的AI应用的广泛使用,解释工具的使用也在不断增长。现有的医疗图像解释工作主要关注白盒工具,如gradcam。然而,使用黑盒工具有很多优点,包括可以与任何分类器结合使用,并且有很多黑盒工具可供选择。在标准图像上,黑盒工具的精度与白盒工具相当。在这篇论文中,我们比较了多个黑盒方法与gradcam的性能,并发现大多数黑盒工具不适用于医疗图像分类的解释,并提供了详细的分析原因。此外,我们还发现一种黑盒工具,基于 causal explainability 的 rex,与 gradcam 的性能相当。

How to ensure a safe control strategy? Towards a SRL for urban transit autonomous operation

  • paper_url: http://arxiv.org/abs/2311.14457
  • repo_url: None
  • paper_authors: Zicong Zhao
    for: This paper proposes a framework for safe and efficient decision-making in urban rail transit autonomous operation using deep reinforcement learning.methods: The proposed framework combines linear temporal logic, reinforcement learning, and Monte Carlo tree search, and consists of four main modules: a post-posed shielding, a searching tree module, a DRL framework, and an additional actor.results: The proposed framework can meet speed constraints, schedule constraints, and optimize the operation process, and its effectiveness is demonstrated through an ablation experiment and comparison with the scheduled operation plan.
    Abstract Deep reinforcement learning has gradually shown its latent decision-making ability in urban rail transit autonomous operation. However, since reinforcement learning can not neither guarantee safety during learning nor execution, this is still one of the major obstacles to the practical application of reinforcement learning. Given this drawback, reinforcement learning applied in the safety-critical autonomous operation domain remains challenging without generating a safe control command sequence that avoids overspeed operations. Therefore, a SSA-DRL framework is proposed in this paper for safe intelligent control of urban rail transit autonomous operation trains. The proposed framework is combined with linear temporal logic, reinforcement learning and Monte Carlo tree search and consists of four mainly module: a post-posed shielding, a searching tree module, a DRL framework and an additional actor. Furthermore, the output of the framework can meet speed constraint, schedule constraint and optimize the operation process. Finally, the proposed SSA-DRL framework for decision-making in urban rail transit autonomous operation is evaluated in sixteen different sections, and its effectiveness is demonstrated through an ablation experiment and comparison with the scheduled operation plan.
    摘要 深度强化学习逐渐显示了它在城市轨道交通自动化操作中的隐藏决策能力。然而,由于强化学习无法保证学习和执行过程中的安全,这还是城市轨道交通自动化操作中强化学习应用的主要障碍。因此,本文提出了一个SSA-DRL框架,用于安全智能控制城市轨道交通自动化操作列车。该框架组合了线性时间逻辑、强化学习和蒙地卡树搜索,包括四个主要模块: posterior shielding、搜索树模块、DRL框架和附加actor。此外,Output of the framework can meet speed constraint, schedule constraint and optimize the operation process。最后,本文提出的SSA-DRL框架在16个不同的段落中进行了评估,并通过ablation experiment和比较与规划操作计划进行了证明其效果。

Universal Jailbreak Backdoors from Poisoned Human Feedback

  • paper_url: http://arxiv.org/abs/2311.14455
  • repo_url: https://github.com/ethz-spylab/rlhf-poisoning
  • paper_authors: Javier Rando, Florian Tramèr
  • for: 这 paper 用于研究语音模型中的偏见和攻击,以及如何通过人工反馈学习(RLHF)来实现模型的适应和安全性。
  • methods: 这 paper 使用了RLHF方法,并调查了攻击者可能会植入“跳过增强”后门到模型中的可能性。
  • results: 研究发现,universal 跳过增强后门可以让模型生成有害的回答,而且这种后门比之前的研究中的后门更加强大和难以植入。
    Abstract Reinforcement Learning from Human Feedback (RLHF) is used to align large language models to produce helpful and harmless responses. Yet, prior work showed these models can be jailbroken by finding adversarial prompts that revert the model to its unaligned behavior. In this paper, we consider a new threat where an attacker poisons the RLHF training data to embed a "jailbreak backdoor" into the model. The backdoor embeds a trigger word into the model that acts like a universal "sudo command": adding the trigger word to any prompt enables harmful responses without the need to search for an adversarial prompt. Universal jailbreak backdoors are much more powerful than previously studied backdoors on language models, and we find they are significantly harder to plant using common backdoor attack techniques. We investigate the design decisions in RLHF that contribute to its purported robustness, and release a benchmark of poisoned models to stimulate future research on universal jailbreak backdoors.
    摘要 人工智能反馈学习(RLHF)用于让大型自然语言模型生成有用和无害的回复。然而,先前的工作表明这些模型可以通过找到攻击性提示来折衣模型的行为。在这篇论文中,我们考虑了一个新的威胁,即攻击者杀害RLHF训练数据,以嵌入一个"监狱援助后门"到模型中。这个后门包含一个触发词,可以让任何提示都会让模型生成有害的回复,不需要搜索攻击性提示。这种全局监狱后门比先前研究中的后门更加强大,我们发现它们使用常见后门攻击技术植入非常困难。我们 investigate RLHF的设计决策,并发布一个恶作用模型的标准 benchmark,以便未来的研究人员可以启发更多的研究。

Deep Learning for Automatic Strain Quantification in Arrhythmogenic Right Ventricular Cardiomyopathy

  • paper_url: http://arxiv.org/abs/2311.14448
  • repo_url: None
  • paper_authors: Laura Alvarez-Florez, Jörg Sander, Mimount Bourfiss, Fleur V. Y. Tjong, Birgitta K. Velthuis, Ivana Išgum
    for:The paper aims to develop an automatic method for quantifying cardiac motion in arrhythmogenic right ventricular cardiomyopathy (ARVC) diagnosis using cine Cardiac Magnetic Resonance Imaging (CMRI).methods:The method uses Implicit Neural Representations (INRs) and a biomechanically informed regularization inspired by the myocardial incompressibility assumption to register CMRIs from different time points of the cardiac cycle. The method also includes a rigid registration guided by the long-axis views to rectify inter-slice misalignment and an unsupervised deep learning super-resolution approach to increase the through-plane resolution.results:The proposed method significantly improves registration performance compared to using a single view or a single-frame registration method. Additionally, the method is able to quantify global and segmental strain over a cardiac cycle and compute the peak strain, which can assist in diagnosis and provide further understanding of disease-specific alterations of cardiac motion. The results show significant differences in the peak strain between ARVC patients and healthy controls, suggesting that automated motion quantification methods may be useful for diagnosis.Here’s the Chinese translation of the three points:for:这篇论文目标是使用cine Cardiac Magnetic Resonance Imaging(CMRI)诊断Arrhythmogenic right ventricular cardiomyopathy(ARVC),自动量化心肺运动。methods:该方法使用Implicit Neural Representations(INRs)和基于机能学的压缩regularization,以填充CMRI中的不同时间点的征识。此外,方法还包括由长轴视图引导的稳定registrations,以消除 междуslice的偏移。results:提议的方法可以显著提高registration性能,比单视图或单帧registration方法更为稳定。此外,方法还可以量化心肺运动的全球和分部弹性,并计算最大弹性值。结果表明,Automated motion量化方法可能是诊断ARVC的有用工具。
    Abstract Quantification of cardiac motion with cine Cardiac Magnetic Resonance Imaging (CMRI) is an integral part of arrhythmogenic right ventricular cardiomyopathy (ARVC) diagnosis. Yet, the expert evaluation of motion abnormalities with CMRI is a challenging task. To automatically assess cardiac motion, we register CMRIs from different time points of the cardiac cycle using Implicit Neural Representations (INRs) and perform a biomechanically informed regularization inspired by the myocardial incompressibility assumption. To enhance the registration performance, our method first rectifies the inter-slice misalignment inherent to CMRI by performing a rigid registration guided by the long-axis views, and then increases the through-plane resolution using an unsupervised deep learning super-resolution approach. Finally, we propose to synergically combine information from short-axis and 4-chamber long-axis views, along with an initialization to incorporate information from multiple cardiac time points. Thereafter, to quantify cardiac motion, we calculate global and segmental strain over a cardiac cycle and compute the peak strain. The evaluation of the method is performed on a dataset of cine CMRI scans from 47 ARVC patients and 67 controls. Our results show that inter-slice alignment and generation of super-resolved volumes combined with joint analysis of the two cardiac views, notably improves registration performance. Furthermore, the proposed initialization yields more physiologically plausible registrations. The significant differences in the peak strain, discerned between the ARVC patients and healthy controls suggest that automated motion quantification methods may assist in diagnosis and provide further understanding of disease-specific alterations of cardiac motion.
    摘要 干预心脏动力学诊断中,心脏运动评估是一个重要的组成部分。然而,通过人工评估心脏运动的CMRI图像是一项具有挑战性的任务。为了自动评估心脏运动,我们使用Implicit Neural Representations(INRs)进行心脏图像匹配,并采用生物力学 informed regularization,以便更好地匹配心脏的压缩性假设。为了提高匹配性能,我们的方法首先将CMRI图像中的横截轴偏移 rectify,并使用深度学习超分解技术来提高过滤平面分辨率。最后,我们提议同时使用短轴和4个胸部长轴图像,并使用初始化来汇集多个心脏时间点的信息。然后,我们计算全心和局部弹性,并计算心脏的峰弹性。我们的结果表明,通过同时使用短轴和4个胸部长轴图像,并使用初始化来汇集多个心脏时间点的信息,可以提高匹配性能。此外,我们的初始化方法可以更好地匹配心脏的动力学特征。在ARVC患者和健康群体之间的峰弹性差异显著,这表明自动动量评估方法可能可以帮助诊断和提供更深入的心脏动力学特征的理解。

GCPV: Guided Concept Projection Vectors for the Explainable Inspection of CNN Feature Spaces

  • paper_url: http://arxiv.org/abs/2311.14435
  • repo_url: None
  • paper_authors: Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, Korinna Bade
  • for: 本研究旨在提高计算机视觉深度神经网络(CNN)的解释性和可读性,以便人工检查学习的潜在表示。
  • methods: 本研究使用的方法包括:globally associate given natural language semantic concepts with representing vectors or regions in the CNN latent space,以及 hierarchical clustering。
  • results: 研究结果显示,引入了本地到全局导向概念向量(GCPV)方法后,对象检测器的性能得到了改进,并且可以具有多层概念向量的好处和强健性。此外,GCPV可以用于找到混淆的概念之根本原因,并且可以揭示概念水平的异常值。
    Abstract For debugging and verification of computer vision convolutional deep neural networks (CNNs) human inspection of the learned latent representations is imperative. Therefore, state-of-the-art eXplainable Artificial Intelligence (XAI) methods globally associate given natural language semantic concepts with representing vectors or regions in the CNN latent space supporting manual inspection. Yet, this approach comes with two major disadvantages: They are locally inaccurate when reconstructing a concept label and discard information about the distribution of concept instance representations. The latter, though, is of particular interest for debugging, like finding and understanding outliers, learned notions of sub-concepts, and concept confusion. Furthermore, current single-layer approaches neglect that information about a concept may be spread over the CNN depth. To overcome these shortcomings, we introduce the local-to-global Guided Concept Projection Vectors (GCPV) approach: It (1) generates local concept vectors that each precisely reconstruct a concept segmentation label, and then (2) generalizes these to global concept and even sub-concept vectors by means of hiearchical clustering. Our experiments on object detectors demonstrate improved performance compared to the state-of-the-art, the benefit of multi-layer concept vectors, and robustness against low-quality concept segmentation labels. Finally, we demonstrate that GCPVs can be applied to find root causes for confusion of concepts like bus and truck, and reveal interesting concept-level outliers. Thus, GCPVs pose a promising step towards interpretable model debugging and informed data improvement.
    摘要 为了调试和验证计算机视觉深度学习模型(CNN),人工检查学习的秘密表示(latent space)是必要的。因此,当前的可解释人工智能(XAI)技术将自然语言 semantic concept 与 CNN latent space 中表示vector或区域相关联。然而,这种方法有两个主要缺点:它们在恢复概念标签时是地方准确的,并且抛弃了概念实例表示的分布信息。后者对于调试,如找到和理解异常实例、学习的子概念和概念混淆而言是非常有利。此外,当前的单层方法忽略了概念信息可能在 CNN 深度中分布。为了解决这些缺点,我们提出了本地到全局导向概念投影向量(GCPV)方法:它首先生成每个精确地恢复一个概念分割标签的本地概念vector,然后通过层次归一化来将其扩展到全局概念vector和even sub-concept vector。我们在对物体检测器进行了实验,并证明了我们的方法在比state-of-the-art更高的性能、多层概念vector的利用和低质量概念分割标签的Robustness。最后,我们示示了GCPV可以用于找到混淆的概念,并发现有趣的概念水平异常。因此,GCPVposes a promising step towards interpretable model debugging和 informed data improvement。

Learning to Cooperate and Communicate Over Imperfect Channels

  • paper_url: http://arxiv.org/abs/2311.14770
  • repo_url: None
  • paper_authors: Jannis Weil, Gizem Ekinci, Heinz Koeppl, Tobias Meuser
  • for: 提高多代理系统中代理之间协作性,特别在部分可见情况下。
  • methods: 使用独立Q学习算法,允许代理在不可靠通道上进行归一化交流,并适应不同通道特性。
  • results: 在一个新的数字预测环境中,我们的方法比无适应能力的方法表现出色,并且在交通枢纽环境中有限制。
    Abstract Information exchange in multi-agent systems improves the cooperation among agents, especially in partially observable settings. In the real world, communication is often carried out over imperfect channels. This requires agents to handle uncertainty due to potential information loss. In this paper, we consider a cooperative multi-agent system where the agents act and exchange information in a decentralized manner using a limited and unreliable channel. To cope with such channel constraints, we propose a novel communication approach based on independent Q-learning. Our method allows agents to dynamically adapt how much information to share by sending messages of different sizes, depending on their local observations and the channel's properties. In addition to this message size selection, agents learn to encode and decode messages to improve their jointly trained policies. We show that our approach outperforms approaches without adaptive capabilities in a novel cooperative digit-prediction environment and discuss its limitations in the traffic junction environment.
    摘要 多智能体系中的信息交换可以提高智能体之间的合作,特别是在部分可见的设定下。在实际世界中,通信经常通过不可靠的通道进行。这需要智能体处理通信中的uncertainty,以适应可能的信息损失。在这篇论文中,我们考虑了一个合作多智能体系统,在这个系统中,智能体在分布式的方式进行行动和信息交换,使用有限和不可靠的通道进行通信。为了应对这种通道的限制,我们提出了一种新的通信方法,基于独立Q学习。我们的方法允许智能体在本地观察和通道的性质基础上选择发送信息的大小,并且学习编码和解码消息以提高其共同训练的策略。我们的方法在一个新的合作数字预测环境中表现出了超过非适应方法的优越性,并且对交通拐点环境中的局限性进行了讨论。

Human-Machine Cooperative Multimodal Learning Method for Cross-subject Olfactory Preference Recognition

  • paper_url: http://arxiv.org/abs/2311.14426
  • repo_url: None
  • paper_authors: Xiuxin Xia, Yuchen Guo, Yanwei Wang, Yuchao Yang, Yan Shi, Hong Men
  • for: 这种研究是为了开发一种跨个体嗅觉喜好认知方法,以便在食品、服装、化妆品等领域进行嗅觉评估。
  • methods: 这种方法使用了电子鼻(E-nose)和嗅觉电 Encyclopaedia(EEG)的多模态学习方法,以实现跨个体嗅觉喜好认知。
  • results: 研究结果表明,该方法可以在24名参与者中实现跨个体嗅觉喜好认知,并且认知效果比现有方法更高。此外,该方法的优势在于可以准确地捕捉嗅觉信息和个体情感信息,因此有很好的应用前景在实际嗅觉评估中。
    Abstract Odor sensory evaluation has a broad application in food, clothing, cosmetics, and other fields. Traditional artificial sensory evaluation has poor repeatability, and the machine olfaction represented by the electronic nose (E-nose) is difficult to reflect human feelings. Olfactory electroencephalogram (EEG) contains odor and individual features associated with human olfactory preference, which has unique advantages in odor sensory evaluation. However, the difficulty of cross-subject olfactory EEG recognition greatly limits its application. It is worth noting that E-nose and olfactory EEG are more advantageous in representing odor information and individual emotions, respectively. In this paper, an E-nose and olfactory EEG multimodal learning method is proposed for cross-subject olfactory preference recognition. Firstly, the olfactory EEG and E-nose multimodal data acquisition and preprocessing paradigms are established. Secondly, a complementary multimodal data mining strategy is proposed to effectively mine the common features of multimodal data representing odor information and the individual features in olfactory EEG representing individual emotional information. Finally, the cross-subject olfactory preference recognition is achieved in 24 subjects by fusing the extracted common and individual features, and the recognition effect is superior to the state-of-the-art recognition methods. Furthermore, the advantages of the proposed method in cross-subject olfactory preference recognition indicate its potential for practical odor evaluation applications.
    摘要 气味评估在食品、服装、化妆品等领域有广泛的应用。传统的人工气味评估有重复性问题,而电子鼻(E-nose)则难以反映人类情感。气味电enzephalogram(EEG)含有气味信息和个体特征,这些特征与人类气味偏好有着独特的优势。然而,跨主体气味EEG认知的困难很大限制了其应用。值得注意的是,E-nose和气味EEG各有其优势,E-nose更好地表达气味信息,而气味EEG更好地反映个体情感。在这篇论文中,一种E-nose和气味EEG多模式学习方法被提出,用于跨主体气味偏好认知。首先,确立了气味EEG和E-nose多模式数据收集和处理方法。然后,提出了一种补充多模式数据挖掘策略,以有效挖掘表达气味信息和个体情感的共同特征。最后,在24名主体中实现了跨主体气味偏好认知,并将提取的共同特征和个体特征进行融合,实现了更高的认知效果。此外,提出的方法在跨主体气味偏好认知中的优势,表明其在实际气味评估应用中具有潜在的潜力。

AdaDiff: Adaptive Step Selection for Fast Diffusion

  • paper_url: http://arxiv.org/abs/2311.14768
  • repo_url: None
  • paper_authors: Hui Zhang, Zuxuan Wu, Zhen Xing, Jie Shao, Yu-Gang Jiang
  • for: 提高 diffusion 模型中的渲染速度,适应不同输入文本的质量。
  • methods: 引入 AdaDiff 框架,学习实例特定的步骤使用策略,并使用政策梯度法优化。
  • results: 在三个图像生成和两个视频生成 benchmark 上,与基eline 使用固定 50 步骤的同等质量视觉效果,降低推理时间至少 33%,最高达 40%。
    Abstract Diffusion models, as a type of generative models, have achieved impressive results in generating images and videos conditioned on textual conditions. However, the generation process of diffusion models involves denoising for dozens of steps to produce photorealistic images/videos, which is computationally expensive. Unlike previous methods that design ``one-size-fits-all'' approaches for speed up, we argue denoising steps should be sample-specific conditioned on the richness of input texts. To this end, we introduce AdaDiff, a lightweight framework designed to learn instance-specific step usage policies, which are then used by the diffusion model for generation. AdaDiff is optimized using a policy gradient method to maximize a carefully designed reward function, balancing inference time and generation quality. We conduct experiments on three image generation and two video generation benchmarks and demonstrate that our approach achieves similar results in terms of visual quality compared to the baseline using a fixed 50 denoising steps while reducing inference time by at least 33%, going as high as 40%. Furthermore, our qualitative analysis shows that our method allocates more steps to more informative text conditions and fewer steps to simpler text conditions.
    摘要 Diffusion模型, как一种生成模型,在生成图像和视频的条件下已经实现了很好的效果。然而,Diffusion模型的生成过程中需要进行多达 dozen 步的干净过程,以生成高质量的图像和视频,这具有计算昂贵的问题。与之前的方法不同,我们认为干净步骤应该是基于输入文本的质量来conditioned,而不是采用一szone-size-fits-all的方法来加速。为此,我们介绍了 AdaDiff,一个轻量级的框架,用于学习实例特定的步骤使用策略,然后将其用于Diffusion模型的生成。AdaDiff通过使用一种政策梯度法来优化一个特殊的奖励函数,以平衡推理时间和生成质量。我们在三个图像生成和两个视频生成 benchmark 上进行了实验,并证明了我们的方法可以在保持视觉质量的情况下,降低推理时间至少33%,最高达40%。此外,我们的质量分析表明,我们的方法会在不同的文本条件下分配不同的步骤数量,对简单的文本条件分配 fewer 步骤,对更加详细的文本条件分配更多的步骤。

LLamol: A Dynamic Multi-Conditional Generative Transformer for De Novo Molecular Design

  • paper_url: http://arxiv.org/abs/2311.14407
  • repo_url: https://github.com/fraunhofer-scai/llamol
  • paper_authors: Niklas Dobberstein, Astrid Maass, Jan Hamaekers
  • for: 本研究目的是开发一种基于Transformer架构的生成模型,用于探索有机化学空间,并找到可能具有电磁活性的分子。
  • methods: 本研究使用了一种新的训练方法,称为“随机上下文学习”,以最大化模型的灵活性和可靠性。模型可以处理单个和多个条件的有机分子生成,并可以包含数字和/或字符序列在生成过程中。
  • results: 研究表明,LLamol模型可以生成有效的有机分子结构,并且可以随意地包含数字和/或字符序列进行生成。模型在各种场景中都表现非常满意。
    Abstract Generative models have demonstrated substantial promise in Natural Language Processing (NLP) and have found application in designing molecules, as seen in General Pretrained Transformer (GPT) models. In our efforts to develop such a tool for exploring the organic chemical space in search of potentially electro-active compounds, we present "LLamol", a single novel generative transformer model based on the LLama 2 architecture, which was trained on a 13M superset of organic compounds drawn from diverse public sources. To allow for a maximum flexibility in usage and robustness in view of potentially incomplete data, we introduce "Stochastic Context Learning" as a new training procedure. We demonstrate that the resulting model adeptly handles single- and multi-conditional organic molecule generation with up to four conditions, yet more are possible. The model generates valid molecular structures in SMILES notation while flexibly incorporating three numerical and/or one token sequence into the generative process, just as requested. The generated compounds are very satisfactory in all scenarios tested. In detail, we showcase the model's capability to utilize token sequences for conditioning, either individually or in combination with numerical properties, making LLamol a potent tool for de novo molecule design, easily expandable with new properties.
    摘要 “生成模型在自然语言处理(NLP)和设计分子方面已经表现出了很大的承诺,如seen in General Pretrained Transformer(GPT)模型。为了开发一个用于探索有机化学空间,搜索可能有电动活性的分子的工具,我们介绍了“LLamol”,一种单一的生成转换器模型,基于LLama 2 架构,该模型在1300万个有机分子的超集上 receive 训练。为了实现最大的灵活性和数据不完整的鲁棒性,我们提出了“随机上下文学习”的新训练方法。我们示示了该模型可以轻松处理单个和多个条件的有机分子生成,并且可以随意地包含数字和/或字符串序列在生成过程中。生成的分子结构都是有效的SMILES记录,并且可以随意地携带三个数字和/或一个字符串序列。在所有测试场景中,生成的分子都很满意。”Note: The translation is in Simplified Chinese, which is one of the two standard versions of Chinese used in mainland China and Singapore.

Prototype of deployment of Federated Learning with IoT devices

  • paper_url: http://arxiv.org/abs/2311.14401
  • repo_url: None
  • paper_authors: Pablo García Santaclara, Ana Fernández Vilas, Rebeca P. Díaz Redondo
  • For: 本研究旨在提出一种基于联合学习的解决方案,以帮助Internet of Things(IoT)设备学习和改进模型性能,而不违反数据保护法规。* Methods: 本研究使用了联合学习技术,并在raspberry pi板上实现了一个详细的详细的联合学习解决方案。* Results: 研究结果显示,联合学习解决方案在一些情况下不能达到传统方法的性能水平,但可以在敏感数据保护和各种环境下进行有效的学习和改进。
    Abstract In the age of technology, data is an increasingly important resource. This importance is growing in the field of Artificial Intelligence (AI), where sub fields such as Machine Learning (ML) need more and more data to achieve better results. Internet of Things (IoT) is the connection of sensors and smart objects to collect and exchange data, in addition to achieving many other tasks. A huge amount of the resource desired, data, is stored in mobile devices, sensors and other Internet of Things (IoT) devices, but remains there due to data protection restrictions. At the same time these devices do not have enough data or computational capacity to train good models. Moreover, transmitting, storing and processing all this data on a centralised server is problematic. Federated Learning (FL) provides an innovative solution that allows devices to learn in a collaborative way. More importantly, it accomplishes this without violating data protection laws. FL is currently growing, and there are several solutions that implement it. This article presents a prototype of a FL solution where the IoT devices used were raspberry pi boards. The results compare the performance of a solution of this type with those obtained in traditional approaches. In addition, the FL solution performance was tested in a hostile environment. A convolutional neural network (CNN) and a image data set were used. The results show the feasibility and usability of these techniques, although in many cases they do not reach the performance of traditional approaches.
    摘要 在技术时代,数据已成为一种非常重要的资源。在人工智能(AI)领域,其下的机器学习(ML)需要更多的数据来达到更好的效果。互联网物联网(IoT)是连接感知器和智能设备来收集和交换数据的方式,同时具有许多其他任务的能力。很多人希望的资源——数据——却被存储在移动设备、感知器和其他互联网设备中,但由于数据保护限制而无法使用。此外,将所有这些数据传输、存储和处理到中央服务器是一项困难的任务。 Federated Learning(FL)提供了一种创新的解决方案,允许设备之间协同学习。此外,它不产生数据保护法规的违反。FL目前在发展中,有许多实现这种解决方案的解决方案。本文描述了一种基于raspberry pi板的FL解决方案的原型。测试结果显示,这种解决方案的性能与传统方法相比,在一定程度上具有可行性和实用性,但在一些情况下并不能达到传统方法的性能。Note: The translation is in Simplified Chinese, which is the standard Chinese writing system used in mainland China. If you need the translation in Traditional Chinese, please let me know.

Low-Cost HEM with Arduino and Zigbee Technologies in the Energy Sector in Colombia

  • paper_url: http://arxiv.org/abs/2311.14767
  • repo_url: None
  • paper_authors: Zurisaddai de la Cruz Severiche Maury, Ana Fernandez Vilas, Rebeca Diaz Redondo
  • for: 降低家庭电力消耗
  • methods: 使用低成本家电管理系统(HEMS)监控家庭常用设备的电力消耗,并让用户分别监控每个设备的消耗,以设计降低家庭电力消耗策略。
  • results: 透过试验室评估,发现在安装HEMS后,每周的电力消耗降低了27%。这显示了低成本系统可以实现好的电力消耗减少。
    Abstract Since no solutions have been proposed in Colombia that seek to reduce the consumption of electricity at the residential level, this paper describes the design and implementation of a simple prototype of a low-cost home energy management system (HEMS). The objective of this plat-form is to monitor the energy consumption of typical household devices so that users can access the consumption of each device separately and then establish the strategy that allows them to reduce energy consumption at home. In order to demonstrate that our system is viable, the system has been evaluated by measuring weekly energy consumption with the on-line and off-line HEMS using a test bench with typical household devices in a Sincelejo typical household. The evaluation has shown that with the installation of this HEMS, consumption is reduced by 27%. This shows that it is possible to achieve a good reduction percentage with a low-cost system.
    摘要 自哈利逊(Sincelejo)typical household中,没有任何解决方案来减少家庭用电量。这篇文章描述了一个简单的低成本家庭能源管理系统(HEMS)的设计和实施。该平台的目标是监控家庭常用设备的能源消耗,让用户可以分别查看每个设备的能源消耗,并根据这些信息来制定减少家庭用电的策略。为证明我们的系统的可行性,我们在一个具有典型家庭设备的测试台上测试了在线和离线HEMS。测试结果表明,通过安装我们的HEMS,每周的能源消耗可以减少27%。这表明,可以通过低成本的系统实现有效的减少。

Directly Attention Loss Adjusted Prioritized Experience Replay

  • paper_url: http://arxiv.org/abs/2311.14390
  • repo_url: None
  • paper_authors: Zhuoying Chen, Huiping Li, Zhaoxu Wang
  • for: 提高强化学习算法的训练效率和稳定性
  • methods: 使用 Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP) 方法,通过自回归网络直接量化变化后的分布差异,准确补偿错误
  • results: 在值函数基本、政策梯度基本和多代强化学习算法中进行了集成,实现了提高训练效率和减少训练异谱的优势
    Abstract Prioritized Experience Replay (PER) enables the model to learn more about relatively important samples by artificially changing their accessed frequencies. However, this non-uniform sampling method shifts the state-action distribution that is originally used to estimate Q-value functions, which brings about the estimation deviation. In this article, an novel off policy reinforcement learning training framework called Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP) is proposed, which can directly quantify the changed extent of the shifted distribution through Parallel Self-Attention network, so as to accurately compensate the error. In addition, a Priority-Encouragement mechanism is designed simultaneously to optimize the sample screening criterion, and further improve the training efficiency. In order to verify the effectiveness and generality of DALAP, we integrate it with the value-function based, the policy-gradient based and multi-agent reinforcement learning algorithm, respectively. The multiple groups of comparative experiments show that DALAP has the significant advantages of both improving the convergence rate and reducing the training variance.
    摘要 通过人工变更样本的访问频率,增强体验回放(PER)可以让模型更多地学习关键样本。然而,这种不均衡采样方法会导致状态动作分布的变化,从而影响估计值函数的准确性。本文提出了一种新的不对称采样学习框架,称为直接注意力补偿增强体验回放(DALAP)。DALAP可以直接量化变换后的分布变化的程度,并通过并行自注意力网络进行补偿。此外,我们同时设计了优先顺序鼓励机制,以优化样本检选标准,进一步提高训练效率。为了证明DALAP的效果和通用性,我们将其与值函数基本、政策梯度基本和多代人reno奖学习算法相结合。多组比较实验显示,DALAP具有加速收敛率和降低训练方差的显著优势。

Potential Societal Biases of ChatGPT in Higher Education: A Scoping Review

  • paper_url: http://arxiv.org/abs/2311.14381
  • repo_url: None
  • paper_authors: Ming Li, Ariunaa Enkhtur, Beverley Anne Yamamoto, Fei Cheng
    for: This scoping review aims to examine the ethical issues involved in the use of ChatGPT and other Generative Artificial Intelligence (GAI) models in higher education settings, particularly the potential biases that may be inherited or amplified.methods: The review searches for academic articles written in English, Chinese, and Japanese across four main databases concerned with GAI usage in higher education and bias.results: The majority of articles touch on “bias” at a relatively superficial level, with few identifying the types of bias that may occur under what circumstances or discussing the possible implications for higher education, staff, faculty members, or students. There is a notable lack of empirical work in this area, and the review calls for more research to be conducted.
    Abstract ChatGPT and other Generative Artificial Intelligence (GAI) models tend to inherit and even amplify prevailing societal biases as they are trained on large amounts of existing data. Given the increasing usage of ChatGPT and other GAI by students, faculty members, and staff in higher education institutions (HEIs), there is an urgent need to examine the ethical issues involved such as its potential biases. In this scoping review, we clarify the ways in which biases related to GAI in higher education settings have been discussed in recent academic publications and identify what type of potential biases are commonly reported in this body of literature. We searched for academic articles written in English, Chinese, and Japanese across four main databases concerned with GAI usage in higher education and bias. Our findings show that while there is an awareness of potential biases around large language models (LLMs) and GAI, the majority of articles touch on ``bias'' at a relatively superficial level. Few identify what types of bias may occur under what circumstances. Neither do they discuss the possible implications for the higher education, staff, faculty members, or students. There is a notable lack of empirical work at this point, and we call for higher education researchers and AI experts to conduct more research in this area.
    摘要 chatGPT和其他生成人工智能(GAI)模型通常会继承和增强现有社会偏见,因为它们在大量数据上训练。随着学生、教师和工作人员在高等教育机构(HEI)中使用chatGPT和GAI的增加,有一项急需要评估GAI在高等教育 Setting中的伦理问题。本探讨篇中,我们清楚地解释了在高等教育设置中GAI中的偏见问题,并identify了常见的potential bias类型。我们在四个主要关于GAI使用高等教育和偏见的数据库中搜索了学术文献,our findings表明,虽然有人意识到大语言模型(LLM)和GAI中的偏见问题,但大多数文献只在“偏见”的问题上进行了 superficialexploration。 few identify了哪些情况下可能出现偏见,nor do they discuss the possible implications for higher education, staff, faculty members, or students。there is a notable lack of empirical work at this point, and we call for higher education researchers and AI experts to conduct more research in this area.

Ethical implications of ChatGPT in higher education: A scoping review

  • paper_url: http://arxiv.org/abs/2311.14378
  • repo_url: None
  • paper_authors: Ming Li, Ariunaa Enkhtur, Fei Cheng, Beverley Anne Yamamoto
  • For: This paper explores the ethical challenges of using ChatGPT in education, particularly in higher education.* Methods: The paper uses a scoping review approach, reviewing recent academic articles written in English, Chinese, and Japanese to provide a comprehensive overview of relevant research and identify gaps for future considerations.* Results: The paper identifies six main areas of ethical concern in using AI in education, including misinformation harms and human-computer interaction related harms. The majority of papers reviewed were concerned with these two areas.
    Abstract This scoping review explores the ethical challenges of using ChatGPT in education, focusing particularly on issues related to higher education. By reviewing recent academic articles written in English, Chinese, and Japanese, we aimed to provide a comprehensive overview of relevant research while identifying gaps for future considerations. Drawing on Arksey and O'Malley's (2005) five-stage scoping review framework, we identified research questions, search terms, and conducted article search from four databases in the target three languages. Each article was reviewed by at least two researchers identifying the main ethical issues of utilizing AI in education, particularly higher education. Our analysis of ethical issues followed the framework developed by DeepMind (Weiginger et al., 2021) to identify six main areas of ethical concern in Language Models. The majority of papers were concerned with misinformation harms (n=25) and/or human-computer interaction related harms (n=24). Given the rapid deployment of Generative Artificial Intelligence (GAI), it is imperative for educators to conduct more empirical studies to develop sound ethical policies for the use of GAI.
    摘要 Translation notes:* "ChatGPT" was translated as "聊天GPT" (shuò tiān GPT)* "higher education" was translated as "高等教育" (gāo děng jiào yù)* "Arksey and O'Malley's" was translated as "阿克西和奥马利的" (ā kè xī yǔ ài mǎ lì de)* "five-stage scoping review framework" was translated as "五个阶段探讨框架" (wǔ gè jiē dàng tàng zhù kōng jí)* "recent academic articles" was translated as "最新的学术文章" (zuì xīn de xué xué wén zhāng)* "in the target three languages" was translated as "目标语言中的" (mù zhǎng yǔ yán zhōng de)* "each article was reviewed by at least two researchers" was translated as "每篇文章都由至少两名研究人员审核" (měi piān wén zhāng dōu yǐ jīn yī zhī shàng liǎng míng yán jí yì zhū)* "main ethical issues of utilizing AI in education" was translated as "使用AI在教育中的主要伦理问题" (shǐ yòng AI zài jiào yù zhōng de zhōng yào liú lǐ wèn tí)* "particularly higher education" was translated as "特别是高等教育" (tè bié shì gāo děng jiào yù)* "misinformation harms" was translated as "误信害" (huì xìn hài)* "human-computer interaction related harms" was translated as "人机交互相关的害" (rén jī jiāo hù xiāng guān de hài)* "Generative Artificial Intelligence (GAI)" was translated as "生成人工智能" (shēng chéng rén gōng zhì yǎng)

Federated Transformed Learning for a Circular, Secure, and Tiny AI

  • paper_url: http://arxiv.org/abs/2311.14371
  • repo_url: None
  • paper_authors: Weisi Guo, Schyler Sun, Bin Li, Sam Blakeman
  • for: 本研究旨在实现转化的深度学习表示法,以解决新任务无法忘记之前任务的问题。
  • methods: 本研究使用了深度学习技术,包括循环深度学习、安全深度学习和微型深度学习。
  • results: 研究表明,通过跨领域的激励和深度学习变换,可以实现循环安全小型AI(CST-AI)。
    Abstract Deep Learning (DL) is penetrating into a diverse range of mass mobility, smart living, and industrial applications, rapidly transforming the way we live and work. DL is at the heart of many AI implementations. A key set of challenges is to produce AI modules that are: (1) "circular" - can solve new tasks without forgetting how to solve previous ones, (2) "secure" - have immunity to adversarial data attacks, and (3) "tiny" - implementable in low power low cost embedded hardware. Clearly it is difficult to achieve all three aspects on a single horizontal layer of platforms, as the techniques require transformed deep representations that incur different computation and communication requirements. Here we set out the vision to achieve transformed DL representations across a 5G and Beyond networked architecture. We first detail the cross-sectoral motivations for each challenge area, before demonstrating recent advances in DL research that can achieve circular, secure, and tiny AI (CST-AI). Recognising the conflicting demand of each transformed deep representation, we federate their deep learning transformations and functionalities across the network to achieve connected run-time capabilities.
    摘要 深度学习(DL)在多样化的大规模移动、智能生活和工业应用中普遍传播,快速改变我们的生活和工作方式。DL是许多人工智能实施的核心。一个关键的挑战是生成能够解决新任务而不忘记之前任务的AI模块。另外,还需要实现安全的AI模块,具有对抗恶意数据攻击的免疫力,以及能够在低功耗低成本的嵌入式硬件上实现的小型AI模块。然而,在单一的平台层上实现这三个方面的要求很 diffficult,因为这些技术需要不同的计算和通信要求。在这里,我们提出了在5G和以后网络架构下实现转换的深度学习表示法的视野。我们首先详细介绍了每个挑战领域的跨领域动机,然后示cases recent advances in DL research that can achieve circular, secure, and tiny AI (CST-AI).认识每个转换深度学习的矛盾需求,我们在网络上联合深度学习转换和功能,实现连接的运行时能力。

Comparative Analysis of Transformers for Modeling Tabular Data: A Casestudy using Industry Scale Dataset

  • paper_url: http://arxiv.org/abs/2311.14335
  • repo_url: None
  • paper_authors: Usneek Singh, Piyush Arora, Shamika Ganesan, Mohit Kumar, Siddhant Kulkarni, Salil R. Joshi
  • for: 本研究是为了研究基于转换器模型的tabular数据模型化方法,特别是在大规模业务 dataset 上进行比较分析。
  • methods: 本研究使用了多种基于转换器模型的方法,包括预训练和直接监督学习方法,对 synthetic dataset 和 default prediction Kaggle dataset (2022) 进行了广泛的比较。
  • results: 研究发现了处理高维数据、有效地预处理 categorical 和数值特征以及计算资源的权衡等挑战,并提供了优化数据预处理、管理 categorical 和数值特征以及考虑计算资源和性能之间的贸易OFF的策略。
    Abstract We perform a comparative analysis of transformer-based models designed for modeling tabular data, specifically on an industry-scale dataset. While earlier studies demonstrated promising outcomes on smaller public or synthetic datasets, the effectiveness did not extend to larger industry-scale datasets. The challenges identified include handling high-dimensional data, the necessity for efficient pre-processing of categorical and numerical features, and addressing substantial computational requirements. To overcome the identified challenges, the study conducts an extensive examination of various transformer-based models using both synthetic datasets and the default prediction Kaggle dataset (2022) from American Express. The paper presents crucial insights into optimal data pre-processing, compares pre-training and direct supervised learning methods, discusses strategies for managing categorical and numerical features, and highlights trade-offs between computational resources and performance. Focusing on temporal financial data modeling, the research aims to facilitate the systematic development and deployment of transformer-based models in real-world scenarios, emphasizing scalability.
    摘要 我们进行了对transformer模型在表格数据模型方面的比较分析,特别是在大规模业务 dataset 上。EARLIER STUDIES 表明在小型公共或人工生成 dataset 上具有扎实的成果,但这些成果并没有扩展到更大的业务dataset。 THE CHALLENGES IDENTIFIED INCLUDE HANDLING HIGH-DIMENSIONAL DATA, THE NECESSITY FOR EFFICIENT PRE-PROCESSING OF CATEGORICAL AND NUMERICAL FEATURES, AND ADDRESSING SUBSTANTIAL COMPUTATIONAL REQUIREMENTS.为了解决这些挑战,该研究使用了多种 transformer 模型,并使用了 synthetic dataset 和 Kaggle 默认预测 dataset (2022) 从美国表现。研究发现了数据预处理的优化策略、预训练和直接监督学习方法的比较、 categorical 和数字特征的处理策略,以及计算资源和性能之间的贸易OFF。 concentrate on 财务数据模型,该研究旨在推动 transformer 模型在实际场景中的系统性发展和部署,强调可扩展性。

Large Language Models as Topological Structure Enhancers for Text-Attributed Graphs

  • paper_url: http://arxiv.org/abs/2311.14324
  • repo_url: None
  • paper_authors: Shengyin Sun, Yuxiang Ren, Chen Ma, Xuecang Zhang
  • for: 本研究探讨了如何使用大型自然语言模型(LLM)改善文本关联图(TAG)中节点的 topological structure,尤其是在节点分类任务下。
  • methods: 本研究提出了两种使用 LLM 改善图 topological structure的方法:首先,使用 LLM 生成节点特征的 semantic similarity,然后根据相似性进行边删除和边添加;其次,引入 pseudo-label 协助 GNN 学习合适的边重量。
  • results: 实验结果表明,LLM-based 图 topological refinement 可以提高节点分类任务的性能(在公共标准benchmark上达到0.15%–2.47%的提升)。
    Abstract The latest advancements in large language models (LLMs) have revolutionized the field of natural language processing (NLP). Inspired by the success of LLMs in NLP tasks, some recent work has begun investigating the potential of applying LLMs in graph learning tasks. However, most of the existing work focuses on utilizing LLMs as powerful node feature augmenters, leaving employing LLMs to enhance graph topological structures an understudied problem. In this work, we explore how to leverage the information retrieval and text generation capabilities of LLMs to refine/enhance the topological structure of text-attributed graphs (TAGs) under the node classification setting. First, we propose using LLMs to help remove unreliable edges and add reliable ones in the TAG. Specifically, we first let the LLM output the semantic similarity between node attributes through delicate prompt designs, and then perform edge deletion and edge addition based on the similarity. Second, we propose using pseudo-labels generated by the LLM to improve graph topology, that is, we introduce the pseudo-label propagation as a regularization to guide the graph neural network (GNN) in learning proper edge weights. Finally, we incorporate the two aforementioned LLM-based methods for graph topological refinement into the process of GNN training, and perform extensive experiments on four real-world datasets. The experimental results demonstrate the effectiveness of LLM-based graph topology refinement (achieving a 0.15%--2.47% performance gain on public benchmarks).
    摘要 最新的大语言模型(LLM)在自然语言处理(NLP)领域已经引起了革命,而一些最近的工作则开始研究在图学任务中应用LLM。然而,大多数现有的工作都是使用LLM作为图像特征增强工具,忽略了使用LLM来提高图结构的可能性。在这项工作中,我们探讨了如何使用LLM的信息检索和文本生成能力来改善文本涂抹图(TAG)中的图结构,具体来说,我们提出了以下两种方法:1. 使用LLM输出节点特征相似性,并根据相似性进行边删除和边添加。我们首先使用灵活的提示设计让LLM输出节点特征相似性,然后根据相似性进行边删除和边添加。2. 使用LLM生成的假标签来改善图结构,即通过假标签卷积来规范图 neural network(GNN)学习合适的边权重。最后,我们将这两种LLM基于的图结构级别提升方法integrated into GNN培训过程中,并在四个实际数据集上进行了广泛的实验。实验结果表明,LLM基于的图结构级别提升方法可以在公共标准准确率(ACC)上提高0.15%--2.47%的性能。

Windformer:Bi-Directional Long-Distance Spatio-Temporal Network For Wind Speed Prediction

  • paper_url: http://arxiv.org/abs/2311.14316
  • repo_url: https://github.com/szwszwszw123/windformer
  • paper_authors: Xuewei Li, Zewen Shang, Zhiqiang Liu, Jian Yu, Wei Xiong, Mei Yu
  • for: 预测风速是风力发电管理中非常重要的一环, 由于风速的巨大变化范围和扰动效应, 可能存在较强的相关性 между远距离的风机。这种难以提取的特征已成为提高准确性的瓶颈。
  • methods: 为解决这些问题, 本文提出了风former。首先, 风former将风机群分成多个不重叠的窗口, 然后在窗口内计算相关性, 并将窗口部分地移动以提供窗口之间的连接, 最后将多个通道特征 fusions 基于详细和全局信息。
  • results: 对比其他当前先进方法, 风former 的 Mean Square Error (MSE) 在 NERL 上的两个数据集上降低了0.5% 到 15%。
    Abstract Wind speed prediction is critical to the management of wind power generation. Due to the large range of wind speed fluctuations and wake effect, there may also be strong correlations between long-distance wind turbines. This difficult-to-extract feature has become a bottleneck for improving accuracy. History and future time information includes the trend of airflow changes, whether this dynamic information can be utilized will also affect the prediction effect. In response to the above problems, this paper proposes Windformer. First, Windformer divides the wind turbine cluster into multiple non-overlapping windows and calculates correlations inside the windows, then shifts the windows partially to provide connectivity between windows, and finally fuses multi-channel features based on detailed and global information. To dynamically model the change process of wind speed, this paper extracts time series in both history and future directions simultaneously. Compared with other current-advanced methods, the Mean Square Error (MSE) of Windformer is reduced by 0.5\% to 15\% on two datasets from NERL.
    摘要 预测风速对风力发电管理是关键。由于风速波动范围很大,带动效应也可能导致远程风机之间强相关性。这种难以提取特征使得精度提高受到了各种瓶颈。历史和未来时间信息包括风暴气流变化趋势,是否能充分利用这些动态信息,将影响预测效果。为了解决这些问题,本文提出了风成器(Windformer)。首先,风成器将风机群分成多个非重叠窗口,然后在窗口内计算相关性,并将窗口部分移动以提供窗口之间连接。最后,风成器将多个通道特征 fusion,基于详细和全局信息。为了动态模型风速变化的过程,本文同时提取历史和未来时间序列。与现有先进方法相比,风成器的 Mean Square Error(MSE)在NERL dataset上降低了0.5%到15%。

Robust Domain Misinformation Detection via Multi-modal Feature Alignment

  • paper_url: http://arxiv.org/abs/2311.14315
  • repo_url: https://github.com/less-and-less-bugs/rdcm
  • paper_authors: Hui Liu, Wenya Wang, Hao Sun, Anderson Rocha, Haoliang Li
  • for: 本研究旨在提出一种robust多模态鲁棒适应(RDCM)方法,用于检测多Modal的谣言信息。
  • methods: 本方法使用了交叉模态对齐模块和交叉模态对齐模块来减少频率域的差异,并且同时考虑了频率域的适应和频率域的泛化。
  • results: 经过测试两个公共的多模态谣言检测数据集(Pheme和Twitter Datasets),结果显示提出的方法在检测多模态谣言信息方面表现出色,与传统的监督学习方法相比,它具有更好的鲁检性和泛化性。
    Abstract Social media misinformation harms individuals and societies and is potentialized by fast-growing multi-modal content (i.e., texts and images), which accounts for higher "credibility" than text-only news pieces. Although existing supervised misinformation detection methods have obtained acceptable performances in key setups, they may require large amounts of labeled data from various events, which can be time-consuming and tedious. In turn, directly training a model by leveraging a publicly available dataset may fail to generalize due to domain shifts between the training data (a.k.a. source domains) and the data from target domains. Most prior work on domain shift focuses on a single modality (e.g., text modality) and ignores the scenario where sufficient unlabeled target domain data may not be readily available in an early stage. The lack of data often happens due to the dynamic propagation trend (i.e., the number of posts related to fake news increases slowly before catching the public attention). We propose a novel robust domain and cross-modal approach (\textbf{RDCM}) for multi-modal misinformation detection. It reduces the domain shift by aligning the joint distribution of textual and visual modalities through an inter-domain alignment module and bridges the semantic gap between both modalities through a cross-modality alignment module. We also propose a framework that simultaneously considers application scenarios of domain generalization (in which the target domain data is unavailable) and domain adaptation (in which unlabeled target domain data is available). Evaluation results on two public multi-modal misinformation detection datasets (Pheme and Twitter Datasets) evince the superiority of the proposed model. The formal implementation of this paper can be found in this link: https://github.com/less-and-less-bugs/RDCM
    摘要 社交媒体谣言伤害个人和社会,而这种谣言受到快速增长的多Modal内容(即文本和图像)的支持,这种内容的 credibility 高于文本只的新闻篇幅。 Although 现有的监督式谣言检测方法在某些场景中获得了接受的表现,但它们可能需要大量的标注数据从多个事件,这可能是时间consuming 和繁琐的。 In turn, 直接使用公共可用的数据集来训练模型可能无法泛化 Due to domain shifts between the training data (即源频道) and the data from target domains。 Prior work on domain shift 主要关注单一模态(例如文本模式),而忽略了target频道数据不可避免的情况,即在早期阶段可能无法获得足够的未标注数据。 这种情况frequently happens due to the dynamic propagation trend (即谣言宣传速度慢慢增长)。 We propose a novel robust domain and cross-modal approach (\textbf{RDCM}) for multi-modal misinformation detection. It reduces the domain shift by aligning the joint distribution of textual and visual modalities through an inter-domain alignment module and bridges the semantic gap between both modalities through a cross-modality alignment module. We also propose a framework that simultaneously considers application scenarios of domain generalization (in which the target domain data is unavailable) and domain adaptation (in which unlabeled target domain data is available). Evaluation results on two public multi-modal misinformation detection datasets (Pheme和Twitter Datasets) evince the superiority of the proposed model. The formal implementation of this paper can be found in this link:

New Epochs in AI Supervision: Design and Implementation of an Autonomous Radiology AI Monitoring System

  • paper_url: http://arxiv.org/abs/2311.14305
  • repo_url: None
  • paper_authors: Vasantha Kumar Venugopal, Abhishek Gupta, Rohit Takhar, Vidur Mahajan
  • for: 监测和维护 radiology AI 分类模型在实践中的准确性和可靠性
  • methods: 提出了两个指标:predictive divergence和temporal stability,通过比较预测结果和两个补充模型的结果来评估模型准确性,并通过比较当前预测结果与历史移动平均值来评估模型的时间稳定性
  • results: 通过递归验证使用了胸部X射图数据,实现了模型可靠性的监测和维护,提供了实时的模型性能信息,为AI在医疗决策中的安全和有效使用 lay the foundation
    Abstract With the increasingly widespread adoption of AI in healthcare, maintaining the accuracy and reliability of AI models in clinical practice has become crucial. In this context, we introduce novel methods for monitoring the performance of radiology AI classification models in practice, addressing the challenges of obtaining real-time ground truth for performance monitoring. We propose two metrics - predictive divergence and temporal stability - to be used for preemptive alerts of AI performance changes. Predictive divergence, measured using Kullback-Leibler and Jensen-Shannon divergences, evaluates model accuracy by comparing predictions with those of two supplementary models. Temporal stability is assessed through a comparison of current predictions against historical moving averages, identifying potential model decay or data drift. This approach was retrospectively validated using chest X-ray data from a single-center imaging clinic, demonstrating its effectiveness in maintaining AI model reliability. By providing continuous, real-time insights into model performance, our system ensures the safe and effective use of AI in clinical decision-making, paving the way for more robust AI integration in healthcare
    摘要 We propose two metrics - predictive divergence and temporal stability - to be used for preemptive alerts of AI performance changes. Predictive divergence, measured using Kullback-Leibler and Jensen-Shannon divergences, evaluates model accuracy by comparing predictions with those of two supplementary models. Temporal stability is assessed through a comparison of current predictions against historical moving averages, identifying potential model decay or data drift.We retrospectively validated our approach using chest X-ray data from a single-center imaging clinic, and found it to be effective in maintaining AI model reliability. Our system provides continuous, real-time insights into model performance, ensuring the safe and effective use of AI in clinical decision-making, and paving the way for more robust AI integration in healthcare.

Efficient Open-world Reinforcement Learning via Knowledge Distillation and Autonomous Rule Discovery

  • paper_url: http://arxiv.org/abs/2311.14270
  • repo_url: None
  • paper_authors: Ekaterina Nikonova, Cheng Xue, Jochen Renz
  • for: 本研究旨在提高深度强化学习Agent在新环境中的适应能力。
  • methods: 我们提出了一个通用的框架,可以让Agent在新环境中自主找到任务特定的规则,并自我超visit。
  • results: 我们的实验表明,基于规则的深度Q学习Agent(RDQ)可以快速检测和适应新的情况,并比基准Agent更加鲜明。
    Abstract Deep reinforcement learning suffers from catastrophic forgetting and sample inefficiency making it less applicable to the ever-changing real world. However, the ability to use previously learned knowledge is essential for AI agents to quickly adapt to novelties. Often, certain spatial information observed by the agent in the previous interactions can be leveraged to infer task-specific rules. Inferred rules can then help the agent to avoid potentially dangerous situations in the previously unseen states and guide the learning process increasing agent's novelty adaptation speed. In this work, we propose a general framework that is applicable to deep reinforcement learning agents. Our framework provides the agent with an autonomous way to discover the task-specific rules in the novel environments and self-supervise it's learning. We provide a rule-driven deep Q-learning agent (RDQ) as one possible implementation of that framework. We show that RDQ successfully extracts task-specific rules as it interacts with the world and uses them to drastically increase its learning efficiency. In our experiments, we show that the RDQ agent is significantly more resilient to the novelties than the baseline agents, and is able to detect and adapt to novel situations faster.
    摘要 深度强化学习受到慢性学习和样本不足的影响,使其在实际世界中应用更加困难。然而,使用先前学习的知识是AI代理人在面临新事物时快速适应的重要能力。经常,特定的空间信息在先前交互中被观察到可以用来推导任务特定的规则。推导出的规则可以帮助代理人在未看过的状态中避免可能的危险,并且导引学习过程增加代理人对新事物的适应速度。在这种工作中,我们提出了一个通用的框架,可以应用于深度强化学习代理人。我们的框架为代理人提供了自主地推导任务特定的规则的能力,并且自我超级视觉。我们在实验中提供了一个基于这种框架的规则驱动深度Q学习代理人(RDQ),并证明了RDQ成功地推导出任务特定的规则,并且在交互时使用这些规则以增加其学习效率。在我们的实验中,RDQ代理人比基准代理人更加鲜明地适应新事物,并且更快地检测和适应新情况。

DemoFusion: Democratising High-Resolution Image Generation With No $$$

  • paper_url: http://arxiv.org/abs/2311.16973
  • repo_url: None
  • paper_authors: Ruoyi Du, Dongliang Chang, Timothy Hospedales, Yi-Zhe Song, Zhanyu Ma
  • for: 这篇论文的目的是通过推进高分辨率生成技术,使生成人工智能(GenAI)更加民主化,以便更多人可以访问和使用。
  • methods: 该论文使用了现有的幂展 diffusion model(LDM),并提出了一种名为 DemoFusion 的新框架,以实现更高的分辨率图像生成。 DemoFusion 使用了 Progressive Upscaling、Skip Residual 和 Dilated Sampling 机制,以提高图像生成的质量。
  • results: 论文的实验结果表明,使用 DemoFusion 可以实现更高的分辨率图像生成,而且可以在更广泛的用户群体中使用。
    Abstract High-resolution image generation with Generative Artificial Intelligence (GenAI) has immense potential but, due to the enormous capital investment required for training, it is increasingly centralised to a few large corporations, and hidden behind paywalls. This paper aims to democratise high-resolution GenAI by advancing the frontier of high-resolution generation while remaining accessible to a broad audience. We demonstrate that existing Latent Diffusion Models (LDMs) possess untapped potential for higher-resolution image generation. Our novel DemoFusion framework seamlessly extends open-source GenAI models, employing Progressive Upscaling, Skip Residual, and Dilated Sampling mechanisms to achieve higher-resolution image generation. The progressive nature of DemoFusion requires more passes, but the intermediate results can serve as "previews", facilitating rapid prompt iteration.
    摘要 高分辨率图像生成技术采用生成人工智能(GenAI)具有巨大潜力,但由于训练需要庞大资本投入,因此逐渐集中到少数大公司手中,并被隐藏在付费墙上。这篇论文希望通过推动高分辨率GenAI的前沿,使其更加民主化,让更广泛的人群有access。我们示示现有的潜在扩散模型(LDM)具有更高的分辨率图像生成潜力。我们的 DemoFusion 框架可以轻松扩展开源GenAI模型,通过进步升级、跳跃副作用和扩大采样机制实现高分辨率图像生成。进步性的 DemoFusion 需要更多的过程,但 intermediate 结果可以作为“预览”,促进快速的提示迭代。

cs.CL - 2023-11-24

Tracing Influence at Scale: A Contrastive Learning Approach to Linking Public Comments and Regulator Responses

  • paper_url: http://arxiv.org/abs/2311.14871
  • repo_url: None
  • paper_authors: Linzi Xing, Brad Hackinen, Giuseppe Carenini
  • for: 这个论文是为了解决美国联邦 regulators 每年接到一百万多封公众意见信的问题。
  • methods: 这篇论文使用了迭代对比方法,使用神经网络模型匹配公众意见信和 regulators 的回复文本。
  • results: 论文的实验结果显示,该方法可以substantially outperform 一些选择的文本匹配基准,并且与最先进的语言模型(GPT-4)的性能相当,在处理大规模的公众意见信和 regulators 回复时更加经济。
    Abstract U.S. Federal Regulators receive over one million comment letters each year from businesses, interest groups, and members of the public, all advocating for changes to proposed regulations. These comments are believed to have wide-ranging impacts on public policy. However, measuring the impact of specific comments is challenging because regulators are required to respond to comments but they do not have to specify which comments they are addressing. In this paper, we propose a simple yet effective solution to this problem by using an iterative contrastive method to train a neural model aiming for matching text from public comments to responses written by regulators. We demonstrate that our proposal substantially outperforms a set of selected text-matching baselines on a human-annotated test set. Furthermore, it delivers performance comparable to the most advanced gigantic language model (i.e., GPT-4), and is more cost-effective when handling comments and regulator responses matching in larger scale.
    摘要 美国联邦监管机构每年收到超过一百万个公众意见书籍,来自企业、利益团体和公民,强烈提出修改提案的修订。这些意见被认为具有广泛的公共政策影响。然而,评估特定意见的影响很困难,因为监管机构需要回复意见,但并不需要指定哪些意见。在这篇论文中,我们提议一种简单 yet effective的解决方案,使用迭代对照方法训练一个神经网络,以匹配公众意见和监管机构的回复。我们示出,我们的提议在一个人工标注的测试集上显著超越了一组选择的文本匹配基线。此外,它可以与最先进的巨大语言模型(即GPT-4)的性能相似,并在处理评论和监管机构回复的更大规模时更加经济。

OpusCleaner and OpusTrainer, open source toolkits for training Machine Translation and Large language models

  • paper_url: http://arxiv.org/abs/2311.14838
  • repo_url: https://github.com/hplt-project/opustrainer
  • paper_authors: Nikolay Bogoychev, Jelmer van der Linde, Graeme Nail, Barry Haddow, Jaume Zaragoza-Bernabeu, Gema Ramírez-Sánchez, Lukas Weymann, Tudor Nicolae Mateiu, Jindřich Helcl, Mikko Aulamo
  • for: 提高机器翻译系统的质量和可靠性,减少新手入门难度。
  • methods: 提供OpusCleaner和OpusTrainer两个工具,用于简化数据下载、清洁、处理和数据调度,以及实现大规模机器翻译系统和语言模型的建立。
  • results: 使用这两个工具,可以创建高质量的机器翻译模型,抗雷达User输入噪音,以及多语言模型和专业词汇模型。
    Abstract Developing high quality machine translation systems is a labour intensive, challenging and confusing process for newcomers to the field. We present a pair of tools OpusCleaner and OpusTrainer that aim to simplify the process, reduce the amount of work and lower the entry barrier for newcomers. OpusCleaner is a data downloading, cleaning, and proprocessing toolkit. It is designed to allow researchers to quickly download, visualise and preprocess bilingual (or monolingual) data that comes from many different sources, each of them with different quality, issues, and unique filtering/preprocessing requirements. OpusTrainer is a data scheduling and data augmenting tool aimed at building large scale, robust machine translation systems and large language models. It features deterministic data mixing from many different sources, on-the-fly data augmentation and more. Using these tools, we showcase how we can use it to create high quality machine translation model robust to noisy user input; multilingual models and terminology aware models.
    摘要 开发高质量机器翻译系统是一项劳动密集、挑战性强、容易困惑的过程,特别是对新手而言。我们提供了一对工具——OpusCleaner和OpusTrainer——以简化过程、减少工作量和降低新手入门难度。OpusCleaner是一个数据下载、清洁、预处理工具集。它是为研究人员快速下载、视见和预处理来自多种不同来源的双语(或单语)数据,每个来源都有不同的质量、问题和唯一的筛选/预处理要求。OpusTrainer是一个数据调度和数据增强工具,旨在建立大规模、可靠的机器翻译系统和大语言模型。它具有 deterministic 数据混合、在线数据增强等功能。使用这两个工具,我们示例如如何使其创建高质量机器翻译模型,抗骚抗噪的用户输入,多语言模型和专有词汇模型。

Custom Data Augmentation for low resource ASR using Bark and Retrieval-Based Voice Conversion

  • paper_url: http://arxiv.org/abs/2311.14836
  • repo_url: None
  • paper_authors: Anand Kamble, Aniket Tathe, Suyash Kumbharkar, Atharva Bhandare, Anirban C. Mitra
  • for: 提高低资源语言 like Hindi 的 Common Voice 数据集的定制化
  • methods: 利用 Bark 模型和 Meta 的 enCodec 和 HuBert 模型进行改进,以及 Retrieval-Based Voice Conversion (RVC) 技术
  • results: 提高 ASR 技术的发展,并为各种应用场景提供高质量、个性化的声音生成Translation:
  • for: 提高低资源语言 like Hindi 的 Common Voice 数据集的定制化
  • methods: 利用 Bark 模型和 Meta 的 enCodec 和 HuBert 模型进行改进,以及 Retrieval-Based Voice Conversion (RVC) 技术
  • results: 提高 ASR 技术的发展,并为各种应用场景提供高质量、个性化的声音生成
    Abstract This paper proposes two innovative methodologies to construct customized Common Voice datasets for low-resource languages like Hindi. The first methodology leverages Bark, a transformer-based text-to-audio model developed by Suno, and incorporates Meta's enCodec and a pre-trained HuBert model to enhance Bark's performance. The second methodology employs Retrieval-Based Voice Conversion (RVC) and uses the Ozen toolkit for data preparation. Both methodologies contribute to the advancement of ASR technology and offer valuable insights into addressing the challenges of constructing customized Common Voice datasets for under-resourced languages. Furthermore, they provide a pathway to achieving high-quality, personalized voice generation for a range of applications.
    摘要

Weak Alignment Supervision from Hybrid Model Improves End-to-end ASR

  • paper_url: http://arxiv.org/abs/2311.14835
  • repo_url: None
  • paper_authors: Jintao Jiang, Yingbo Gao, Zoltan Tuske
  • for: 这 paper 的目的是创建弱对Alignment超级vision,以帮助端到端模型。
  • methods: 作者使用现有的混合式 ASR 系统生成训练声音的 triphone 对Alignment,然后在某层的 encoder 中创建一个 cross-entropy 损失函数。
  • results: 结果表明,在第三层 encoder 中使用 label smoothing 参数值为 0.5 的 weak alignment supervision 比一般一颗 cross-entropy 损失函数和 CTC 损失函数 WITH loss weighting 更好,可以在 TED-LIUM 2 数据集上减少约 5% 的 relative WER。
    Abstract In this paper, we aim to create weak alignment supervision to aid the end-to-end modeling. Towards this end, we use the existing hybrid ASR system to produce triphone alignments of the training audios. We then create a cross-entropy loss at a certain layer of the encoder using the derived alignments. In contrast to the general one-hot cross-entropy losses with or without loss weighting, here we use a cross-entropy loss with a label smoothing parameter to regularize the supervision. As a comparison, we also conduct the experiments with one-hot cross-entropy losses and CTC losses with loss weighting. The results show that placing the weak alignment supervision with the label smoothing parameter of 0.5 at the third encoder layer outperforms the other two approaches and leads to about 5% relative WER reduction on the TED-LIUM 2 dataset over the baseline. We see similar improvements when applying the method out-of-the-box on a Tagalog end-to-end ASR system.
    摘要 在这篇论文中,我们目的是创建弱对Alignment超级vision来 помо助端到端模型。为此,我们使用现有的混合式ASR系统生成训练听力的triphone对Alignment。然后,我们在encoder层中定义一个cross-entropy损失函数,使用 derive的对Alignment来定义损失。与通常的一个hot cross-entropy损失函数不同,我们使用一个标签平滑参数来规范Supervision。为了比较,我们还进行了使用一个hot cross-entropy损失函数和CTC损失函数的实验。结果表明,在第三层encoder层上添加弱对Alignment超级vision,使用标签平滑参数0.5,可以比基eline的5%相对WRER降低。我们在Tagalog端到端ASR系统上也 observe到类似的改进。

Data-to-Text Bilingual Generation

  • paper_url: http://arxiv.org/abs/2311.14808
  • repo_url: None
  • paper_authors: Guy Lapalme
  • for: 这篇论文旨在提供一种基于pyrealb的方法,用于自动生成英文和法文两种语言的平行文本,从单一数据源开始。
  • methods: 论文使用了对象导向的方法,确保在两种语言中的文本组织和数据选择过程相似,只有语言依赖的单词和短语选择异常。
  • results: 实验结果表明,使用这种方法可以生成同一个信息在英文和法文两种语言中,无论是翻译或是同时拥有两种语言能力。此外,与GPT实例的文本生成结果进行比较,也表明这种方法的优势。
    Abstract This document illustrates the use of pyrealb for generating two parallel texts (English and French) from a single source of data. The data selection and text organisation processes are shared between the two languages. only language dependent word and phrasing choices are distinct processes. The realized texts thus convey identical information in both languages without the risk of being lost in translation. This is especially important in cases where strict and simultaneous bilingualism is required. We first present the types of applications targeted by this approach and how the pyrealb English and French realizer can be used for achieving this goal in a natural way. We describe an object-oriented organization to ensure a convenient realization in both languages. To illustrate the process, different types of applications are then briefly sketched with links to the source code. A brief comparison of the text generation is given with the output of an instance of a GPT.
    摘要 We will discuss the types of applications that can benefit from this approach and how the pyrealb English and French realizer can be used to achieve this goal in a natural way. We will also describe an object-oriented organization to make the realization process convenient for both languages.To illustrate the process, we will provide brief sketches of different types of applications and links to the source code. Finally, we will compare the text generation produced by pyrealb with the output of an instance of a GPT.

One Pass Streaming Algorithm for Super Long Token Attention Approximation in Sublinear Space

  • paper_url: http://arxiv.org/abs/2311.14652
  • repo_url: None
  • paper_authors: Raghav Addanki, Chenyang Li, Zhao Song, Chiwun Yang
  • for: 这个论文主要关注在流处理中大语言模型(LLM)的应用,尤其是在长文本分析和对话中的应用。
  • methods: 这个论文提出了一种新的算法,可以在流处理中减少大语言模型的内存使用。该算法只需要一次遍历数据,并使用低于等比数的存储空间来实现。
  • results: 论文的实验结果表明,该算法可以在流处理中高效地应用大语言模型,并且可以避免内存溢出问题。特别是当文本长度增长时,该算法可以保持减少内存使用的特点。
    Abstract Deploying Large Language Models (LLMs) in streaming applications that involve long contexts, particularly for extended dialogues and text analysis, is of paramount importance but presents two significant challenges. Firstly, the memory consumption is substantial during the decoding phase due to the caching of Key and Value states (KV) of previous tokens. Secondly, attention computation is time-consuming with a time complexity of $O(n^2)$ for the generation of each token. In recent OpenAI DevDay (Nov 6, 2023), OpenAI released a new model that is able to support a 128K-long document, in our paper, we focus on the memory-efficient issue when context length $n$ is much greater than 128K ($n \gg 2^d$). Considering a single-layer self-attention with Query, Key, and Value matrices $Q, K, V \in \mathbb{R}^{n \times d}$, the polynomial method approximates the attention output $T \in \mathbb{R}^{n \times d}$. It accomplishes this by constructing $U_1, U_2 \in \mathbb{R}^{n \times t}$ to expedite attention ${\sf Attn}(Q, K, V)$ computation within $n^{1+o(1)}$ time executions. Despite this, storing the Key and Value matrices $K, V \in \mathbb{R}^{n \times d}$ still necessitates $O( n d)$ space, leading to significant memory usage. In response to these challenges, we introduce a new algorithm that only reads one pass of the data in streaming fashion. This method employs sublinear space $o(n)$ to store three sketch matrices, alleviating the need for exact $K, V$ storage. Notably, our algorithm exhibits exceptional memory-efficient performance with super-long tokens. As the token length $n$ increases, our error guarantee diminishes while the memory usage remains nearly constant. This unique attribute underscores the potential of our technique in efficiently handling LLMs in streaming applications.
    摘要 部署大型自然语言模型(LLMs)在流处理应用程序中,特别是在长 Context 中进行长时间的对话和文本分析,是非常重要的。然而,这种部署存在两个主要挑战。首先,在解码阶段,模型的内存占用非常大,主要是因为缓存前 tokens 的 Key 和 Value 状态。其次,计算注意力的时间复杂度为 $O(n^2)$,对于每个token的生成。在OpenAI DevDay(2023年11月6日)上,OpenAI 发布了一个新的模型,可以支持128K字长的文档。在我们的论文中,我们关注的是,当 Context 长度远大于128K($n \gg 2^d)时,内存使用效率的问题。对于单层自注意的模型,我们使用多项式方法来近似注意输出 $T \in \mathbb{R}^{n \times d}$。它通过构建 $U_1, U_2 \in \mathbb{R}^{n \times t}$来加速注意力计算,从而在 $n^{1+o(1)}$ 时间内执行注意力计算。尽管如此,保存 Key 和 Value 矩阵 $K, V \in \mathbb{R}^{n \times d}$仍需要 $O(n d)$ 空间,导致内存使用增加。为了解决这些挑战,我们提出了一新的算法,只需要在流处理模式下读取一次数据。这种方法使用 sublinear 空间 $o(n)$ 存储三个笔记矩阵,从而消除了 $K, V$ 的准确存储需求。值得一提的是,我们的算法在长 token 时 exhibit 出色的内存减少性,即,随着 token 长度 $n$ 增加,我们的错误保证逐渐减少,而内存使用则保持相对常数。这种特点强调了我们的技术在流处理应用中的高效性。

Machine Translation for Ge’ez Language

  • paper_url: http://arxiv.org/abs/2311.14530
  • repo_url: None
  • paper_authors: Aman Kassahun Wassie
  • for: 此研究旨在提高非常低资源语言如格 Ethiopic 的机器翻译性能。
  • methods: 本研究使用了多种方法来提高格 Ethiopic 机器翻译,包括将相关语言的转移学习、优化共享词汇和分词方法、使用大型预训练模型和大语言模型(LLM)进行几招翻译与杂合匹配。
  • results: 我们发现,基于语言相似性的多语言神经机器翻译(MNMT)模型可以提高格 Ethiopic 机器翻译的性能,并且使用 GPT-3.5 大语言模型进行几招翻译也可以达到 remarkable BLEU 分数。然而,对于只有4k 的训练样本,NLLB-200 模型的 finsheet 表现较差。
    Abstract Machine translation (MT) for low-resource languages such as Ge'ez, an ancient language that is no longer spoken in daily life, faces challenges such as out-of-vocabulary words, domain mismatches, and lack of sufficient labeled training data. In this work, we explore various methods to improve Ge'ez MT, including transfer-learning from related languages, optimizing shared vocabulary and token segmentation approaches, finetuning large pre-trained models, and using large language models (LLMs) for few-shot translation with fuzzy matches. We develop a multilingual neural machine translation (MNMT) model based on languages relatedness, which brings an average performance improvement of about 4 BLEU compared to standard bilingual models. We also attempt to finetune the NLLB-200 model, one of the most advanced translation models available today, but find that it performs poorly with only 4k training samples for Ge'ez. Furthermore, we experiment with using GPT-3.5, a state-of-the-art LLM, for few-shot translation with fuzzy matches, which leverages embedding similarity-based retrieval to find context examples from a parallel corpus. We observe that GPT-3.5 achieves a remarkable BLEU score of 9.2 with no initial knowledge of Ge'ez, but still lower than the MNMT baseline of 15.2. Our work provides insights into the potential and limitations of different approaches for low-resource and ancient language MT.
    摘要 机器翻译(MT) для低资源语言如格'ез(Ge'ez)面临挑战,包括无法词、领域不匹配和不足的训练数据。在这个工作中,我们探索了不同的方法来改善格'езMT,包括将相关语言的转移学习应用到格'ез,优化共享词汇和分词方法,调整大型预训模型,以及使用大型自然语言模型(LLM)进行几据翻译。我们开发了一个多语言神经机器翻译(MNMT)模型,基于语言之间的相关性,带来了约4个BLEU的平均性能提升。我们还尝试了调整NLLB-200模型,但发现它对于格'ез的4000个训练数据表现不佳。此外,我们尝试使用GPT-3.5,一个现今最先进的自然语言模型,进行几据翻译,使用类似度基于的汇集搜寻获得上下文示例。我们发现GPT-3.5在无任何格'ез知识下可以获得9.2个BLEU分,但仍比MNMT基准下的15.2分低。我们的工作提供了低资源语言和古语言MT的可能性和限制。

tinyCLAP: Distilling Constrastive Language-Audio Pretrained Models

  • paper_url: http://arxiv.org/abs/2311.14517
  • repo_url: None
  • paper_authors: Francesco Paissan, Elisabetta Farella
  • for: 降低对比语音预训练模型的复杂性,以实现高效的语音识别和生成
  • methods: 基于首肯定理 derivation的单模型热退、约束梯度下降和精简
  • results: 使用 tinyCLAP 模型,只需使用原 Microsoft CLAP 参数的 6%,在三个声音事件检测数据集上实现零 shot 分类性能下降 less than 5%
    Abstract Contrastive Language-Audio Pretraining (CLAP) became of crucial importance in the field of audio and speech processing. Its employment ranges from sound event detection to text-to-audio generation. However, one of the main limitations is the considerable amount of data required in the training process and the overall computational complexity during inference. This paper investigates how we can reduce the complexity of contrastive language-audio pre-trained models, yielding an efficient model that we call tinyCLAP. We derive an unimodal distillation loss from first principles and explore how the dimensionality of the shared, multimodal latent space can be reduced via pruning. TinyCLAP uses only 6% of the original Microsoft CLAP parameters with a minimal reduction (less than 5%) in zero-shot classification performance across the three sound event detection datasets on which it was tested
    摘要 对于语音处理领域而言,对照语言-语音预训(CLAP)已经成为非常重要的一种方法。它的应用范围自声事件探测到文本-语音生成。然而,CLAP的主要限制是训练过程中需要很大量数据,以及推导过程中的总 Computational Complexity。这篇文章探讨了如何将对照语言-语音预训模型简化,实现一个高效的模型,我们称之为“tinyCLAP”。我们从基本原理开始, derivate一种单modal distillation损失函数,并考虑如何透过剪枝来降低共享多modal的 latent space 维度。 tinyCLAP 只需6%的原始 Microsoft CLAP 参数,并且在三个声事件探测数据集上进行零 shot 分类时,几乎没有损失(少于5%)。

Analysing the Impact of Removing Infrequent Words on Topic Quality in LDA Models

  • paper_url: http://arxiv.org/abs/2311.14505
  • repo_url: None
  • paper_authors: Victor Bystrov, Viktoriia Naboka-Krell, Anna Staszewska-Bystrova, Peter Winker
  • for: 这篇论文的目的是为了探讨在文本数据应用中文本预处理的一个步骤,即去掉不常见的词语,以提高计算的效率。
  • methods: 论文使用了Latent Dirichlet Allocation(LDAL)来估计主题质量。在实验中,作者使用了不同的词语去掉标准和评价指标来评估去掉不常见词语的效果。
  • results: 结果显示,去掉不常见词语可以提高主题估计的质量,并且可以去掉一 considerable amount of vocabulary。
    Abstract An initial procedure in text-as-data applications is text preprocessing. One of the typical steps, which can substantially facilitate computations, consists in removing infrequent words believed to provide limited information about the corpus. Despite popularity of vocabulary pruning, not many guidelines on how to implement it are available in the literature. The aim of the paper is to fill this gap by examining the effects of removing infrequent words for the quality of topics estimated using Latent Dirichlet Allocation. The analysis is based on Monte Carlo experiments taking into account different criteria for infrequent terms removal and various evaluation metrics. The results indicate that pruning is beneficial and that the share of vocabulary which might be eliminated can be quite considerable.
    摘要 <>文本为数据应用的初始过程之一是文本处理。其中一个常见的步骤是去掉不常用词,因为这些词据信能够提供 corpus 中的有限信息。虽然词汇剔除受欢迎,但在 литературе 中有很少关于如何实现它的指南。本文的目标是填补这个空白,通过对 Latent Dirichlet Allocation 估算的话题质量的影响进行分析。这些分析基于 Monte Carlo 实验,考虑了不同的不常用词去除 criterion 和不同的评价指标。结果表明,剔除不常用词是有利的,并且可以去掉一部分词汇的比例。Note: I used the Traditional Chinese characters for "文本" (wén tiě) and "词汇" (cí huì) to match the original text.

SER_AMPEL: A multi-source dataset for SER of Italian older adults

  • paper_url: http://arxiv.org/abs/2311.14483
  • repo_url: None
  • paper_authors: Alessandra Grossi, Francesca Gasparini
  • for: 这 paper 是为了提供一个参考 Dataset дляspeech emotion recognition(SER)Italian older adults。
  • methods: 该 Dataset 采集了不同协议,包括 acted conversations 从电影和电视剧中提取,以及自然会话中使用问题引起情感的录制。
  • results: 这 paper 预览了提出的 Dataset 的需求,并对一个子集进行了初步的分类结果分析,探讨了SER 的关键问题。
    Abstract In this paper, SER_AMPEL, a multi-source dataset for speech emotion recognition (SER) is presented. The peculiarity of the dataset is that it is collected with the aim of providing a reference for speech emotion recognition in case of Italian older adults. The dataset is collected following different protocols, in particular considering acted conversations, extracted from movies and TV series, and recording natural conversations where the emotions are elicited by proper questions. The evidence of the need for such a dataset emerges from the analysis of the state of the art. Preliminary considerations on the critical issues of SER are reported analyzing the classification results on a subset of the proposed dataset.
    摘要 在本文中,我们提出了一个多源数据集 для语音情感识别(SER),称为SER_AMPEL。该数据集的特点是集成了意大利老年人的语音情感识别参考数据集。数据集采集了不同协议,包括从电影和电视剧中提取的 acted conversations,以及通过适当问题诱发的自然对话。我们认为这样的数据集是有必要的,因为我们通过分析现状技术的报告发现了语音情感识别领域的挑战。本文的前提是对SER_AMPEL数据集的一些首要考虑。

Controlled Text Generation via Language Model Arithmetic

  • paper_url: http://arxiv.org/abs/2311.14479
  • repo_url: https://github.com/eth-sri/language-model-arithmetic
  • paper_authors: Jasper Dekoninck, Marc Fischer, Luca Beurer-Kellner, Martin Vechev
  • for: 这 paper 是为了提出一种新的推理框架,帮助在更广泛的场景中使用大型自然语言模型(LLMs)进行自定义。
  • methods: 这 paper 使用了一种名为“模型算术”的新的推理方法,可以无需再训练模型或使用高度特定的数据集来进行自定义。这种方法还允许更精细地控制生成的文本,比直接提示和先前的控制文本生成(CTG)技术更有效。
  • results: 根据这 paper,使用模型算术可以实现精细地控制生成的文本,同时超过了现有的状态对tasks of toxicity reduction。
    Abstract As Large Language Models (LLMs) are deployed more widely, customization with respect to vocabulary, style and character becomes more important. In this work we introduce model arithmetic, a novel inference framework for composing and biasing LLMs without the need for model (re)training or highly specific datasets. In addition, the framework allows for more precise control of generated text than direct prompting and prior controlled text generation (CTG) techniques. Using model arithmetic, we can express prior CTG techniques as simple formulas and naturally extend them to new and more effective formulations. Further, we show that speculative sampling, a technique for efficient LLM sampling, extends to our setting. This enables highly efficient text generation with multiple composed models with only marginal overhead over a single model. Our empirical evaluation demonstrates that model arithmetic allows fine-grained control of generated text while outperforming state-of-the-art on the task of toxicity reduction.
    摘要 As Large Language Models (LLMs) 广泛部署,自定义 vocabulary、style 和 character 变得更加重要。在这项工作中,我们介绍 model arithmetic,一种新的推理框架,可以无需模型(重)训练或特定的数据集来组合和偏迷 LLMs。此外,该框架还允许更精细地控制生成的文本,比直接提示和先前控制的文本生成(CTG)技术更为灵活。使用 model arithmetic,我们可以将先前的 CTG 技术表示为简单的公式,并自然地扩展到新的有效的表述。此外,我们发现,用于高效的 LLM 抽样的 speculative sampling 技术可以应用于我们的设置中。这使得可以使用多个组合的模型进行高效的文本生成,只需单个模型的负担。我们的实验证明,model arithmetic 允许细化控制生成的文本,而且在减少攻击性 task 上超越了当前的状态。

DP-NMT: Scalable Differentially-Private Machine Translation

  • paper_url: http://arxiv.org/abs/2311.14465
  • repo_url: https://github.com/trusthlt/dp-nmt
  • paper_authors: Timour Igamberdiev, Doan Nam Long Vu, Felix Künnecke, Zhuo Yu, Jannik Holmer, Ivan Habernal
  • for: 这个研究旨在提供一个开源框架,用于进行隐私保护的自然语言译latexmb Translation (NMT) 系统的研究。
  • methods: 本研究使用了差异encibly private stochastic gradient descent (DP-SGD) 方法来训练 NMT 模型,并提供了一个可重新使用的开源框架,以便研究人员可以轻松地实现隐私保护的 NMT 系统。
  • results: 本研究通过在不同的数据集和评估指标下进行了一系列实验,以验证 DP-NMT 框架的可行性和有效性。
    Abstract Neural machine translation (NMT) is a widely popular text generation task, yet there is a considerable research gap in the development of privacy-preserving NMT models, despite significant data privacy concerns for NMT systems. Differentially private stochastic gradient descent (DP-SGD) is a popular method for training machine learning models with concrete privacy guarantees; however, the implementation specifics of training a model with DP-SGD are not always clarified in existing models, with differing software libraries used and code bases not always being public, leading to reproducibility issues. To tackle this, we introduce DP-NMT, an open-source framework for carrying out research on privacy-preserving NMT with DP-SGD, bringing together numerous models, datasets, and evaluation metrics in one systematic software package. Our goal is to provide a platform for researchers to advance the development of privacy-preserving NMT systems, keeping the specific details of the DP-SGD algorithm transparent and intuitive to implement. We run a set of experiments on datasets from both general and privacy-related domains to demonstrate our framework in use. We make our framework publicly available and welcome feedback from the community.
    摘要 神经机器翻译(NMT)是广泛应用的文本生成任务,但是在开发隐私保护NMT模型方面还存在较大的研究差距,尽管NMT系统存在数据隐私问题。不同的隐私保护权限的权限评估(DP-SGD)是训练机器学习模型的受欢迎方法,但是在训练模型时的具体实现细节不一定是已经解释的,存在不同的软件库和代码库,导致复制问题。为了解决这问题,我们介绍DP-NMT框架,这是一个开源的框架,用于进行隐私保护NMT模型的研究,汇集了许多模型、数据集和评价指标在一个系统化的软件包中。我们的目标是提供一个平台,使研究人员可以在隐私保护NMT系统的发展中进行研究,并且在DP-SGD算法中保持简明易懂的具体细节。我们在不同的数据集上进行了一系列实验,以示DP-NMT框架的应用。我们将DP-NMT框架公开发布,欢迎社区的反馈。

ÚFAL CorPipe at CRAC 2023: Larger Context Improves Multilingual Coreference Resolution

  • paper_url: http://arxiv.org/abs/2311.14391
  • repo_url: None
  • paper_authors: Milan Straka
  • for: 本文是CRAC 2023 Shared Task on Multilingual Coreference Resolution 的获奖作品,用于提高多语言核心参照解决的性能。
  • methods: 本文使用的方法包括提取提及 span 以及在这些 span 上进行核心参照链接,通过在所有可用 corpora 上进行共同预训练,并使用共享预训练语言模型进行训练。主要改进包括输入大于 512 个子词和更改提及解码以支持 ensemble。
  • results: 本文的实验结果表明,CorPipe 在 CRAC 2023 中的得分高于其他参与者的平均分数点数 by 4.5% 之多。
    Abstract We present CorPipe, the winning entry to the CRAC 2023 Shared Task on Multilingual Coreference Resolution. Our system is an improved version of our earlier multilingual coreference pipeline, and it surpasses other participants by a large margin of 4.5 percent points. CorPipe first performs mention detection, followed by coreference linking via an antecedent-maximization approach on the retrieved spans. Both tasks are trained jointly on all available corpora using a shared pretrained language model. Our main improvements comprise inputs larger than 512 subwords and changing the mention decoding to support ensembling. The source code is available at https://github.com/ufal/crac2023-corpipe.
    摘要 我们现在介绍CorPipe,CRAC 2023共享任务中的赢家。我们的系统是之前的多语言核心引用管道的改进版本,在其他参与者之上减分4.5个百分点。CorPipe首先检测提及,然后通过 antecedent-maximization 方法对检测到的跨度进行核心关系链接。两个任务都是通过所有可用 corpora 进行共同训练,使用共享预训练语言模型。我们的主要改进包括输入大于512个子词和更改提及解码以支持集成。源代码可以在 GitHub 上找到:https://github.com/ufal/crac2023-corpipe。

Average Token Delay: A Duration-aware Latency Metric for Simultaneous Translation

  • paper_url: http://arxiv.org/abs/2311.14353
  • repo_url: None
  • paper_authors: Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
  • for: 这篇论文是关于同时翻译的评估 metric,具体来说是一种基于听说 span (EVS) 的延迟评估方法。
  • methods: 这篇论文使用了一种新的延迟评估方法called \emph{Average Token Delay} (ATD),它关注了部分翻译输出的持续时间。
  • results: 在实验中,ATD 与 EVS 之间存在高度相关性,特别在大多数情况下。
    Abstract Simultaneous translation is a task in which the translation begins before the end of an input speech segment. Its evaluation should be conducted based on latency in addition to quality, and for users, the smallest possible amount of latency is preferable. Most existing metrics measure latency based on the start timings of partial translations and ignore their duration. This means such metrics do not penalize the latency caused by long translation output, which delays the comprehension of users and subsequent translations. In this work, we propose a novel latency evaluation metric for simultaneous translation called \emph{Average Token Delay} (ATD) that focuses on the duration of partial translations. We demonstrate its effectiveness through analyses simulating user-side latency based on Ear-Voice Span (EVS). In our experiment, ATD had the highest correlation with EVS among baseline latency metrics under most conditions.
    摘要 同时翻译是一种任务,在输入语音段结束之前,翻译就开始了。其评估应该基于延迟,而不仅仅是质量。用户希望的最小化延迟。现有的度量都是基于部分翻译的开始时间,忽略其持续时间。这意味着这些度量不会负担由长翻译输出带来的延迟,这会延迟用户的理解和后续翻译。在这项工作中,我们提出了一种新的同时翻译延迟评估度量called 平均字符延迟(ATD),它关注部分翻译持续时间。我们通过 simulate user-side 延迟基于耳语间距(EVS)进行分析,并证明 ATD 在大多数情况下与基准延迟度量之间存在最高的相关性。

cs.LG - 2023-11-24

Effective Structural Encodings via Local Curvature Profiles

  • paper_url: http://arxiv.org/abs/2311.14864
  • repo_url: None
  • paper_authors: Lukas Fesser, Melanie Weber
  • for: 提高Graph Neural Networks在下游任务中的性能。
  • methods: 使用 discrete Ricci curvature(Local Curvature Profiles,简称LCP)作为结构编码方法,并与全局坐标编码相结合以提高下游性能。
  • results: 相比现有编码方法,LCP编码方法显示出显著的性能提升。此外,与rewiring技术相比,使用曲率信息进行结构编码可以实现更大的性能提升。
    Abstract Structural and Positional Encodings can significantly improve the performance of Graph Neural Networks in downstream tasks. Recent literature has begun to systematically investigate differences in the structural properties that these approaches encode, as well as performance trade-offs between them. However, the question of which structural properties yield the most effective encoding remains open. In this paper, we investigate this question from a geometric perspective. We propose a novel structural encoding based on discrete Ricci curvature (Local Curvature Profiles, short LCP) and show that it significantly outperforms existing encoding approaches. We further show that combining local structural encodings, such as LCP, with global positional encodings improves downstream performance, suggesting that they capture complementary geometric information. Finally, we compare different encoding types with (curvature-based) rewiring techniques. Rewiring has recently received a surge of interest due to its ability to improve the performance of Graph Neural Networks by mitigating over-smoothing and over-squashing effects. Our results suggest that utilizing curvature information for structural encodings delivers significantly larger performance increases than rewiring.
    摘要 graph neural networks 可以通过结构和位置编码进行显著提升下游任务的性能。 current literature 已经开始系统地探索这些方法编码的结构性质之间的差异,以及这些方法之间的性能交换。 however, 关于哪些结构特性可以提供最有效的编码仍然是一个开放的问题。 in this paper, 我们从 геометрической角度来调查这个问题。 we propose a novel structural encoding based on discrete ricci curvature (local curvature profiles, short LCP) and show that it significantly outperforms existing encoding approaches. we further show that combining local structural encodings, such as LCP, with global positional encodings improves downstream performance, suggesting that they capture complementary geometric information. finally, we compare different encoding types with (curvature-based) rewiring techniques. rewiring has recently received a surge of interest due to its ability to improve the performance of graph neural networks by mitigating over-smoothing and over-squashing effects. our results suggest that utilizing curvature information for structural encodings delivers significantly larger performance increases than rewiring.

An Empirical Investigation into Benchmarking Model Multiplicity for Trustworthy Machine Learning: A Case Study on Image Classification

  • paper_url: http://arxiv.org/abs/2311.14859
  • repo_url: None
  • paper_authors: Prakhar Ganesh
  • for: This paper aims to address the issue of model multiplicity in deep learning, which occurs when multiple models achieve similar performance but exhibit distinct underlying behaviors.
  • methods: The paper proposes a framework called “multiplicity sheets” to benchmark multiplicity in various scenarios, and translates several trustworthy metrics into accuracy under appropriate interventions.
  • results: The paper demonstrates the advantages of the proposed setup through a case study in image classification and provides actionable insights into the impact and trends of different hyperparameters on model multiplicity. Additionally, the paper shows that multiplicity persists in deep learning models even after enforcing additional specifications during model selection.
    Abstract Deep learning models have proven to be highly successful. Yet, their over-parameterization gives rise to model multiplicity, a phenomenon in which multiple models achieve similar performance but exhibit distinct underlying behaviours. This multiplicity presents a significant challenge and necessitates additional specifications in model selection to prevent unexpected failures during deployment. While prior studies have examined these concerns, they focus on individual metrics in isolation, making it difficult to obtain a comprehensive view of multiplicity in trustworthy machine learning. Our work stands out by offering a one-stop empirical benchmark of multiplicity across various dimensions of model design and its impact on a diverse set of trustworthy metrics. In this work, we establish a consistent language for studying model multiplicity by translating several trustworthy metrics into accuracy under appropriate interventions. We also develop a framework, which we call multiplicity sheets, to benchmark multiplicity in various scenarios. We demonstrate the advantages of our setup through a case study in image classification and provide actionable insights into the impact and trends of different hyperparameters on model multiplicity. Finally, we show that multiplicity persists in deep learning models even after enforcing additional specifications during model selection, highlighting the severity of over-parameterization. The concerns of under-specification thus remain, and we seek to promote a more comprehensive discussion of multiplicity in trustworthy machine learning.
    摘要 我们首先确定了一种共同语言,用于研究多样性。我们将多个可靠指标翻译成准确率,并在合适的干预下进行翻译。然后,我们开发了一个名为“多样性表”的框架,用于评估多样性在不同的场景下。我们通过一个实验study示例,证明了我们的设置的优势。最后,我们发现了多样性在深度学习模型中仍然存在,即使在选择模型时采取了额外的要求,这表明了过度参数化的问题仍然存在,而不是下pecification的问题。因此,我们呼吁更加全面地讨论多样性在可靠机器学习中的问题。

Disruption Prediction in Fusion Devices through Feature Extraction and Logistic Regression

  • paper_url: http://arxiv.org/abs/2311.14856
  • repo_url: None
  • paper_authors: Diogo R. Ferreira
  • for: 这篇论文是为了描述一种在多机器断层预测挑战中使用的方法,该挑战由ITU在2023年9月至11月在线平台Zindi上举行。
  • methods: 这篇论文使用了特征提取方法,然后对这些特征进行了логистиック回归。每个信号都被视为一个分子预测器,最终组合这些预测器达到了领导板块的第一名。
  • results: 该论文在多机器断层预测挑战中取得了第一名,表明该方法可以准确地预测断层事件。
    Abstract This document describes an approach used in the Multi-Machine Disruption Prediction Challenge for Fusion Energy by ITU, a data science competition which ran from September to November 2023, on the online platform Zindi. The competition involved data from three fusion devices - C-Mod, HL-2A, and J-TEXT - with most of the training data coming from the last two, and the test data coming from the first one. Each device has multiple diagnostics and signals, and it turns out that a critical issue in this competition was to identify which signals, and especially which features from those signals, were most relevant to achieve accurate predictions. The approach described here is based on extracting features from signals, and then applying logistic regression on top of those features. Each signal is treated as a separate predictor and, in the end, a combination of such predictors achieved the first place on the leaderboard.
    摘要 The approach described here is based on extracting features from signals and applying logistic regression on top of those features. Each signal is treated as a separate predictor, and a combination of such predictors achieved the first place on the leaderboard.

Deep convolutional encoder-decoder hierarchical neural networks for conjugate heat transfer surrogate modeling

  • paper_url: http://arxiv.org/abs/2311.17068
  • repo_url: None
  • paper_authors: Takiah Ebbs-Picken, David A. Romero, Carlos M. Da Silva, Cristina H. Amon
  • for: 这个论文主要是为了开发一种基于深度学习的高级热传输模型(CHT),用于解决计算复杂的热传输问题。
  • methods: 该论文使用了一种名为DeepEDH的神经网络模型,这种模型结合了深度学习和卷积神经网络,用于模拟热传输过程中的温度和流速场。
  • results: 根据实验结果,DeepEDH方法可以与传统的计算机模型相比,提供更高的准确率(R2),并且可以在大规模的热传输问题中实现高效的计算。
    Abstract Conjugate heat transfer (CHT) models are vital for the design of many engineering systems. However, high-fidelity CHT models are computationally intensive, which limits their use in applications such as design optimization, where hundreds to thousands of model evaluations are required. In this work, we develop a modular deep convolutional encoder-decoder hierarchical (DeepEDH) neural network, a novel deep-learning-based surrogate modeling methodology for computationally intensive CHT models. Leveraging convective temperature dependencies, we propose a two-stage temperature prediction architecture that couples velocity and temperature models. The proposed DeepEDH methodology is demonstrated by modeling the pressure, velocity, and temperature fields for a liquid-cooled cold-plate-based battery thermal management system with variable channel geometry. A computational model of the cold plate is developed and solved using the finite element method (FEM), generating a dataset of 1,500 simulations. The FEM results are transformed and scaled from unstructured to structured, image-like meshes to create training and test datasets. The DeepEDH methodology's performance is examined in relation to data scaling, training dataset size, and network depth. Our performance analysis covers the impact of the novel architecture, separate field models, output geometry masks, multi-stage temperature models, and optimizations of the hyperparameters and architecture. Furthermore, we quantify the influence of the CHT thermal boundary condition on surrogate model performance, highlighting improved temperature model performance with higher heat fluxes. Compared to other deep learning neural network surrogate models, such as U-Net and DenseED, the proposed DeepEDH methodology for CHT models exhibits up to a 65% enhancement in the coefficient of determination ($R^{2}$).
    摘要 高级热传输(CHT)模型是多种工程系统的设计中的关键。然而,高级别的CHT模型计算复杂,限制了其在设计优化等应用中的使用。在这种情况下,我们开发了一种模块化深度卷积Encoder-Decoder层次(DeepEDH)神经网络,用于高级别CHT模型的替身模型。利用热传输依赖关系,我们提出了两stage温度预测建筑,将速度和温度模型相联。我们的DeepEDH方法在模拟固体电池热管理系统中的压力、速度和温度场的问题上进行了应用。我们使用finite element法(FEM)计算冰板模型,生成了1500次的数据集。FEM结果被转换和缩放成不结构化到结构化、图像化的网格,以创建训练和测试数据集。我们对DeepEDH方法的性能进行了关于数据缩放、训练数据集大小和网络深度的分析。我们的性能分析包括新建建筑、分离场模型、输出 geometry 面积、多stage 温度模型和优化网络参数和结构的影响。此外,我们评估了热边界条件对替身模型性能的影响,并发现在高热 flux 下,温度模型性能得到了提高。与其他深度学习神经网络替身模型相比,例如U-Net和DenseED,我们的DeepEDH方法在CHT模型替身中表现出到65%的提高。

Deep Latent Force Models: ODE-based Process Convolutions for Bayesian Deep Learning

  • paper_url: http://arxiv.org/abs/2311.14828
  • repo_url: None
  • paper_authors: Thomas Baldwin-McDonald, Mauricio A. Álvarez
  • for: 该论文主要针对高度非线性动力系统中现象的模型化和uncertainty量化问题。
  • methods: 该论文提出了深度嵌入力力模型(DLFM),一种适用于各种问题的领域独特方法,包括深度 Gaussian 过程架构,其中每层的kernel来自于ordinary differential equation,使用过程 convolutions的框架。
  • results: 论文提供了两种DLFM的形式,即weight-space和variational inducing points-based Gaussian process approximations,两者均适用于双重概率变换推断。论文还提供了实验证明DLFM能够正确地模型高度非线性的多变量时间序列数据,并与其他概率模型在 benchark regression 任务上具有相似的性能。
    Abstract Effectively modeling phenomena present in highly nonlinear dynamical systems whilst also accurately quantifying uncertainty is a challenging task, which often requires problem-specific techniques. We outline the deep latent force model (DLFM), a domain-agnostic approach to tackling this problem, which consists of a deep Gaussian process architecture where the kernel at each layer is derived from an ordinary differential equation using the framework of process convolutions. Two distinct formulations of the DLFM are presented which utilise weight-space and variational inducing points-based Gaussian process approximations, both of which are amenable to doubly stochastic variational inference. We provide evidence that our model is capable of capturing highly nonlinear behaviour in real-world multivariate time series data. In addition, we find that our approach achieves comparable performance to a number of other probabilistic models on benchmark regression tasks. We also empirically assess the negative impact of the inducing points framework on the extrapolation capabilities of LFM-based models.
    摘要 高度非线性动力系统中的现象模型化是一项复杂的任务,需要专门的技术来解决。我们介绍了深层强制力模型(DLFM),这是一种适用于所有领域的方法,它基于深度 Gaussian process 架构,其中每层的kernel来自于ordinary differential equation使用过程 convolutions框架。我们提出了两种DLFM的形式,一种使用weight-space Gaussian processapproxiamtion,另一种使用variational inducing points-based Gaussian processapproxiamtion,两者都可以使用 doubly stochastic variational inference。我们提供了证据,证明我们的模型可以Capture高度非线性行为的实际世界多变量时间序列数据。此外,我们发现我们的方法与其他概率模型在Benchmark regression task上具有相似的性能。我们还employs empirical assessment to study the negative impact of the inducing points framework on the extrapolation capabilities of LFM-based models。

Revisiting Quantum Algorithms for Linear Regressions: Quadratic Speedups without Data-Dependent Parameters

  • paper_url: http://arxiv.org/abs/2311.14823
  • repo_url: None
  • paper_authors: Zhao Song, Junze Yin, Ruizhe Zhang
  • for: Linear regression problem, specifically finding $x’$ such that $|Ax’ - b|2^2 \leq (1+\epsilon)\min{x}|Ax - b|_2^2$.
  • methods: Quantum algorithm that runs in $\widetilde{O}(\epsilon^{-1}\sqrt{n}d^{1.5}) + \mathrm{poly}(d/\epsilon)$ time, providing a quadratic quantum speedup in $n$ over the classical lower bound without any dependence on data-dependent parameters.
  • results: Exponential quantum speedups over classical algorithms, with a running time of $\widetilde{O}(\epsilon^{-1}\sqrt{n}d^{1.5}) + \mathrm{poly}(d/\epsilon)$ that depends only on the condition number of $A$ and not on the size of the dataset. The result can be generalized to multiple regression and ridge linear regression.
    Abstract Linear regression is one of the most fundamental linear algebra problems. Given a dense matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b$, the goal is to find $x'$ such that $ \| Ax' - b \|_2^2 \leq (1+\epsilon) \min_{x} \| A x - b \|_2^2 $. The best classical algorithm takes $O(nd) + \mathrm{poly}(d/\epsilon)$ time [Clarkson and Woodruff STOC 2013, Nelson and Nguyen FOCS 2013]. On the other hand, quantum linear regression algorithms can achieve exponential quantum speedups, as shown in [Wang Phys. Rev. A 96, 012335, Kerenidis and Prakash ITCS 2017, Chakraborty, Gily{\'e}n and Jeffery ICALP 2019]. However, the running times of these algorithms depend on some quantum linear algebra-related parameters, such as $\kappa(A)$, the condition number of $A$. In this work, we develop a quantum algorithm that runs in $\widetilde{O}(\epsilon^{-1}\sqrt{n}d^{1.5}) + \mathrm{poly}(d/\epsilon)$ time. It provides a quadratic quantum speedup in $n$ over the classical lower bound without any dependence on data-dependent parameters. In addition, we also show our result can be generalized to multiple regression and ridge linear regression.
    摘要 Linear regression 是一个非常基本的线性代数问题。给定一个稠密矩阵 $A \in \mathbb{R}^{n \times d}$ 和一个向量 $b$, 目标是找到 $x'$ 使得 $\| Ax' - b \|_2^2 \leq (1+\epsilon) \min_{x} \| A x - b \|_2^2 $。最佳的 классический算法需要 $O(nd) + \text{poly}(d/\epsilon)$ 时间 [Clarkson 和 Woodruff STOC 2013, Nelson 和 Nguyen FOCS 2013]。然而,量子线性回归算法可以实现幂量量子加速,如在 [Wang Phys. Rev. A 96, 012335, Kerenidis 和 Prakash ITCS 2017, Chakraborty, Gily{\'e}n 和 Jeffery ICALP 2019] 中所示。但是,这些算法的运行时间取决于一些量子线性代数相关的参数,如 $A$ 的condition number $\kappa(A)$。在这种工作中,我们开发了一个量子算法,运行时间为 $\widetilde{O}(\epsilon^{-1}\sqrt{n}d^{1.5}) + \text{poly}(d/\epsilon)$。它提供了一个quadratic量子加速度,至于 $n$ 的二次量子加速度无关于数据依赖的参数。此外,我们还证明了我们的结果可以推广到多重回归和梯度回归。

Differentiable and accelerated spherical harmonic and Wigner transforms

  • paper_url: http://arxiv.org/abs/2311.14670
  • repo_url: https://github.com/astro-informatics/s2fft
  • paper_authors: Matthew A. Price, Jason D. McEwen
  • for: The paper is written for researchers and practitioners who work with data defined on spherical manifolds and require efficient computation of gradients for machine learning or other differentiable programming tasks.
  • methods: The paper presents novel algorithmic structures for accelerated and differentiable computation of generalised Fourier transforms on the sphere and rotation group, including a recursive algorithm for the calculation of Wigner $d$-functions and a hybrid automatic and manual differentiation approach.
  • results: The paper reports up to a 400-fold acceleration and very close to optimal linear scaling with increasing number of GPUs when benchmarked against alternative C codes, and exhibits an unprecedented effective linear time complexity when distributing over multiple GPUs.
    Abstract Many areas of science and engineering encounter data defined on spherical manifolds. Modelling and analysis of spherical data often necessitates spherical harmonic transforms, at high degrees, and increasingly requires efficient computation of gradients for machine learning or other differentiable programming tasks. We develop novel algorithmic structures for accelerated and differentiable computation of generalised Fourier transforms on the sphere $\mathbb{S}^2$ and rotation group $\text{SO}(3)$, i.e. spherical harmonic and Wigner transforms, respectively. We present a recursive algorithm for the calculation of Wigner $d$-functions that is both stable to high harmonic degrees and extremely parallelisable. By tightly coupling this with separable spherical transforms, we obtain algorithms that exhibit an extremely parallelisable structure that is well-suited for the high throughput computing of modern hardware accelerators (e.g. GPUs). We also develop a hybrid automatic and manual differentiation approach so that gradients can be computed efficiently. Our algorithms are implemented within the JAX differentiable programming framework in the S2FFT software code. Numerous samplings of the sphere are supported, including equiangular and HEALPix sampling. Computational errors are at the order of machine precision for spherical samplings that admit a sampling theorem. When benchmarked against alternative C codes we observe up to a 400-fold acceleration. Furthermore, when distributing over multiple GPUs we achieve very close to optimal linear scaling with increasing number of GPUs due to the highly parallelised and balanced nature of our algorithms. Provided access to sufficiently many GPUs our transforms thus exhibit an unprecedented effective linear time complexity.
    摘要 多个科学和工程领域遇到定义在球面上的数据。模型和分析球面数据时常常需要球面傅立卷变换,特别是在高度上,并且需要高效地计算导数用于机器学习或其他可导程序任务。我们开发了新的算法结构,以加速和可导计算球面上的总化傅立卷变换。我们提出一种递归算法来计算温顿-$d$函数,该算法是稳定的高傅立卷度和可并行化的。通过与分解的球面变换紧密结合,我们获得了高并行化的算法结构,非常适合现代硬件加速器(如GPU)的高通过put计算。我们还开发了一种混合自动和手动导数方法,以便高效地计算导数。我们的算法在JAX可导编程框架中的S2FFT软件代码中实现。我们支持多种球面抽象,包括均匀和HEALPix抽象。计算错误在球面抽象中的机器精度阈值内。与替换的C代码进行比较时,我们观察到最多400倍的加速。此外,当分布到多个GPU上时,我们达到了几乎optimal的直线扩展性,因为我们的算法具有高并行化和均衡的特性。只要有足够多的GPU,我们的变换就会显示出前所未有的高效性,即无穷Linear Time Complexity。

Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks

  • paper_url: http://arxiv.org/abs/2311.14658
  • repo_url: None
  • paper_authors: Zhen Qin, Xuwei Tan, Zhihui Zhu
  • for: 这个论文的目的是提供对深度神经网络训练的正交正则化的理论分析,尤其是训练正交深度线性神经网络的收敛性分析。
  • methods: 这篇论文使用了里曼尼安排 descent 算法,并提供了一种适当的初始化方法来加速训练过程。
  • results: 这篇论文的实验结果表明,在一些损失函数下,采用正交正则化的方法可以提高训练速度,并且可以在不同的隐藏层数量下进行优化。
    Abstract Enforcing orthonormal or isometric property for the weight matrices has been shown to enhance the training of deep neural networks by mitigating gradient exploding/vanishing and increasing the robustness of the learned networks. However, despite its practical performance, the theoretical analysis of orthonormality in neural networks is still lacking; for example, how orthonormality affects the convergence of the training process. In this letter, we aim to bridge this gap by providing convergence analysis for training orthonormal deep linear neural networks. Specifically, we show that Riemannian gradient descent with an appropriate initialization converges at a linear rate for training orthonormal deep linear neural networks with a class of loss functions. Unlike existing works that enforce orthonormal weight matrices for all the layers, our approach excludes this requirement for one layer, which is crucial to establish the convergence guarantee. Our results shed light on how increasing the number of hidden layers can impact the convergence speed. Experimental results validate our theoretical analysis.
    摘要 强制权重矩阵具有正交性或均匀性性质可以增强深度神经网络的训练,使得梯度爆炸/消失问题得到改善,并提高学习的网络性。然而,关于神经网络中权重矩阵正交性的理论分析仍然缺失,例如如何正交性对训练过程的收敛有什么影响。在这封信中,我们尝试填补这一空白,通过对正交深度线性神经网络的训练进行收敛分析。我们表明,在适当的初始化下,里曼射gradient descent算法可以在训练正交深度线性神经网络时达到直线收敛率,并且不需要所有层的权重矩阵具有正交性。我们的结果显示,增加隐藏层数量可以对收敛速度产生影响。实验结果证明了我们的理论分析。

JetLOV: Enhancing Jet Tree Tagging through Neural Network Learning of Optimal LundNet Variables

  • paper_url: http://arxiv.org/abs/2311.14654
  • repo_url: https://github.com/giorgiocerro/jetlov
  • paper_authors: Mauricio A. Diaz, Giorgio Cerro, Jacan Chaplais, Srinandan Dasmahapatra, Stefano Moretti
  • for: 这 paper 的目的是使用机器学习算法,尤其是深度学习,解决物理领域中复杂的分类问题,如束环核心分类。
  • methods: 这 paper 使用了两种模型:一个简单的多层感知器(MLP)和已经证明有效的 LundNet。
  • results: 研究发现,可以通过不依赖 LundNet 变量,而使用自动学习的新变量,来实现类似的束环标记性能。这些发现可能有助于解决模型依赖性问题,并通过总结和训练在多个数据集上进行模型的泛化。
    Abstract Machine learning has played a pivotal role in advancing physics, with deep learning notably contributing to solving complex classification problems such as jet tagging in the field of jet physics. In this experiment, we aim to harness the full potential of neural networks while acknowledging that, at times, we may lose sight of the underlying physics governing these models. Nevertheless, we demonstrate that we can achieve remarkable results obscuring physics knowledge and relying completely on the model's outcome. We introduce JetLOV, a composite comprising two models: a straightforward multilayer perceptron (MLP) and the well-established LundNet. Our study reveals that we can attain comparable jet tagging performance without relying on the pre-computed LundNet variables. Instead, we allow the network to autonomously learn an entirely new set of variables, devoid of a priori knowledge of the underlying physics. These findings hold promise, particularly in addressing the issue of model dependence, which can be mitigated through generalization and training on diverse data sets.
    摘要 机器学习在物理研究中发挥了关键作用,特别是深度学习在复杂分类问题上提供了突出的贡献,如jets Tagging在物理上。在这个实验中,我们希望利用神经网络的全部潜力,同时也让我们不断地掌握神经网络模型下的物理知识。然而,我们发现在某些情况下,我们可能会忽略神经网络模型下的物理知识。不过,我们的研究表明,我们可以通过不依赖 LundNet 变量来实现相似的 jet 标记性能。我们的结果表明,我们可以让神经网络自动学习一个 entirely new的变量集合,无需受到先前物理知识的限制。这些发现对于解决模型依赖性问题具有潜在的意义,特别是通过普适性和训练在多个数据集上进行加持。

Data-driven Prior Learning for Bayesian Optimisation

  • paper_url: http://arxiv.org/abs/2311.14653
  • repo_url: https://github.com/sighellan/plebo
  • paper_authors: Sigrid Passano Hellan, Christopher G. Lucas, Nigel H. Goddard
  • for: 这 paper 是为了提高 Bayesian 优化中的计算效率,而不是假设所有优化任务具有相似的优化输入。
  • methods: 这 paper 使用的方法是 Prior Learning for Bayesian Optimization - PLeBO -,它通过学习 Gaussian process 模型的 hyperparameter 来更好地近似实际函数。
  • results: experiments 表明,PLeBO 和 Prior transfer 能够在 fewer evaluations 中找到好的输入,比其他 transfer learning 方法更有效。
    Abstract Transfer learning for Bayesian optimisation has generally assumed a strong similarity between optimisation tasks, with at least a subset having similar optimal inputs. This assumption can reduce computational costs, but it is violated in a wide range of optimisation problems where transfer learning may nonetheless be useful. We replace this assumption with a weaker one only requiring the shape of the optimisation landscape to be similar, and analyse the recent method Prior Learning for Bayesian Optimisation - PLeBO - in this setting. By learning priors for the hyperparameters of the Gaussian process surrogate model we can better approximate the underlying function, especially for few function evaluations. We validate the learned priors and compare to a breadth of transfer learning approaches, using synthetic data and a recent air pollution optimisation problem as benchmarks. We show that PLeBO and prior transfer find good inputs in fewer evaluations.
    摘要 <>将文本翻译成简化中文。<>bayesian优化中的转移学习通常假设优化任务之间具有强相似性,至少有一部分优化任务的优化输入具有相似性。这个假设可以降低计算成本,但在许多优化问题中被违反。我们将这个假设改为只需要优化函数的形态相似,并分析最近的方法Prior Learning for Bayesian Optimization(PLeBO)在这种设定下的表现。通过学习GP模型的超参数的先验知识,我们可以更好地近似下面函数,特别是在少量函数评估中。我们验证学习的先验知识和多种转移学习方法,使用 sintetic数据和最新的空气污染优化问题作为标准准比。我们发现PLeBO和先验转移可以快速找到优化的输入, fewer evaluations。

Learning in Deep Factor Graphs with Gaussian Belief Propagation

  • paper_url: http://arxiv.org/abs/2311.14649
  • repo_url: None
  • paper_authors: Seth Nabarro, Mark van der Wilk, Andrew J Davison
  • for: 这种方法用于学习 Gaussian factor graphs 中的量表。
  • methods: 该方法将所有相关的量(输入、输出、参数、隐藏变量)视为图形模型中的随机变量,并视为训练和预测为推理问题,其中每个问题都可以使用信念传播(BP)进行有效地解决。
  • results: 该方法可以扩展到深度网络,并提供一种自然的方式进行连续学习:使用 BP 估计当前任务中的参数积分作为下一个任务的参数先验。在视频噪声任务上,该方法超越了类传统因子图方法,并在 MNIST 图像分类任务上表现出了激进的性能。
    Abstract We propose an approach to do learning in Gaussian factor graphs. We treat all relevant quantities (inputs, outputs, parameters, latents) as random variables in a graphical model, and view both training and prediction as inference problems with different observed nodes. Our experiments show that these problems can be efficiently solved with belief propagation (BP), whose updates are inherently local, presenting exciting opportunities for distributed and asynchronous training. Our approach can be scaled to deep networks and provides a natural means to do continual learning: use the BP-estimated parameter marginals of the current task as parameter priors for the next. On a video denoising task we demonstrate the benefit of learnable parameters over a classical factor graph approach and we show encouraging performance of deep factor graphs for continual image classification on MNIST.
    摘要 我们提出了一种在高斯因子图上进行学习的方法。我们将所有相关的量(输入、输出、参数、隐藏变量)视为图形模型中的随机变量,并将训练和预测视为两个不同的观察节点的推理问题。我们的实验表明,这些问题可以有效地使用信仰卷积(BP)解决,其更新是本地的,这为分布式和异步训练提供了激动人心的机会。我们的方法可扩展到深度网络,并提供了一种自然的升级学习方法:使用当前任务BP估计的参数积分作为下一任务参数先验。在视频噪声去噪任务上,我们证明了可学习参数的优势 над传统的因子图方法,并在MNIST continual图像分类任务上表现出了鼓励的表现。

More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory

  • paper_url: http://arxiv.org/abs/2311.14646
  • repo_url: None
  • paper_authors: James B. Simon, Dhruva Karkada, Nikhil Ghosh, Mikhail Belkin
  • for: 提供了关于过度参数化、过度适应和更多数据对随机特征(RF)回归模型的理论支持。
  • methods: 使用了随机特征回归模型,并通过优化ridge penalty来控制模型复杂度。
  • results: 提出了一种新的理论,表明随机特征回归模型的测试风险随着特征数和样本数的增加而下降,并且在某些任务下,只有在训练损失很低时才能达到近似优秀的性能。
    Abstract In our era of enormous neural networks, empirical progress has been driven by the philosophy that more is better. Recent deep learning practice has found repeatedly that larger model size, more data, and more computation (resulting in lower training loss) improves performance. In this paper, we give theoretical backing to these empirical observations by showing that these three properties hold in random feature (RF) regression, a class of models equivalent to shallow networks with only the last layer trained. Concretely, we first show that the test risk of RF regression decreases monotonically with both the number of features and the number of samples, provided the ridge penalty is tuned optimally. In particular, this implies that infinite width RF architectures are preferable to those of any finite width. We then proceed to demonstrate that, for a large class of tasks characterized by powerlaw eigenstructure, training to near-zero training loss is obligatory: near-optimal performance can only be achieved when the training error is much smaller than the test error. Grounding our theory in real-world data, we find empirically that standard computer vision tasks with convolutional neural tangent kernels clearly fall into this class. Taken together, our results tell a simple, testable story of the benefits of overparameterization, overfitting, and more data in random feature models.
    摘要 在我们的巨大神经网络时代,实践中的进步主要受到“更多是更好”的哲学影响。现代深度学习实践 repeatedly 发现,更大的模型Size,更多的数据和更多的计算(导致训练损失降低)会提高性能。在这篇论文中,我们给这些实际观察提供了理论支持,证明这三个属性在Random Feature(RF)回归中成立,这类模型与 shallow network 的最后一层 alone 相同。具体来说,我们首先显示测试风险在RF回归中随着特征数和样本数的增加而下降,只要ridge penalty 得到优化。这 imply dass infinite width RF 架构更加有利,而不是 finite width 的。然后,我们证明,对于一类 Task possessing power-law eigenstructure,在训练到训练损失接近零时,性能才能达到 Near-optimal 水平。基于实际数据,我们发现了标准计算机视觉任务中的 Convolutional Neural Tangent Kernels 明显属于这一类。总之,我们的结果告诉了一个简单、可验证的故事:Random Feature 模型中的过参数、过拟合和更多数据的好处。

A General Framework for User-Guided Bayesian Optimization

  • paper_url: http://arxiv.org/abs/2311.14645
  • repo_url: None
  • paper_authors: Carl Hvarfner, Frank Hutter, Luigi Nardi
  • for: 优化高计算成本的黑盒函数,如在科学领域中广泛存在。
  • methods: 使用 bayesian 优化,可以自动、通用和减少样本数来解决这些问题,但是它无法包含专家所信任的知识或信念来加速优化。
  • results: 我们提出了 ColaBO,第一个基于 bayesian 原理的框架,可以在不同的 Monte Carlo 获取函数和专家信念中包含专家所信任的信念。我们的实验表明,当专家信念准确时,ColaBO 能够显著加速优化,而当专家信念误导时,它能够保持约等于默认性能。
    Abstract The optimization of expensive-to-evaluate black-box functions is prevalent in various scientific disciplines. Bayesian optimization is an automatic, general and sample-efficient method to solve these problems with minimal knowledge of the underlying function dynamics. However, the ability of Bayesian optimization to incorporate prior knowledge or beliefs about the function at hand in order to accelerate the optimization is limited, which reduces its appeal for knowledgeable practitioners with tight budgets. To allow domain experts to customize the optimization routine, we propose ColaBO, the first Bayesian-principled framework for incorporating prior beliefs beyond the typical kernel structure, such as the likely location of the optimizer or the optimal value. The generality of ColaBO makes it applicable across different Monte Carlo acquisition functions and types of user beliefs. We empirically demonstrate ColaBO's ability to substantially accelerate optimization when the prior information is accurate, and to retain approximately default performance when it is misleading.
    摘要 scipy.optimize.minimize(函数)的优化是科学领域中广泛存在的问题。 bayesian 优化是一种自动、通用和效率高的方法,用于解决这些问题,只需要 minimal knowledge of the underlying function dynamics。然而,bayesian 优化的能力 incorporate prior knowledge or beliefs about the function at hand in order to accelerate the optimization is limited, which reduces its appeal for knowledgeable practitioners with tight budgets. To allow domain experts to customize the optimization routine, we propose ColaBO, the first Bayesian-principled framework for incorporating prior beliefs beyond the typical kernel structure, such as the likely location of the optimizer or the optimal value. The generality of ColaBO makes it applicable across different Monte Carlo acquisition functions and types of user beliefs. We empirically demonstrate ColaBO's ability to substantially accelerate optimization when the prior information is accurate, and to retain approximately default performance when it is misleading.Note: "scipy.optimize.minimize(函数)" is a function in the Scipy library used for optimization in Python.

Differentially Private SGD Without Clipping Bias: An Error-Feedback Approach

  • paper_url: http://arxiv.org/abs/2311.14632
  • repo_url: None
  • paper_authors: Xinwei Zhang, Zhiqi Bu, Zhiwei Steven Wu, Mingyi Hong
  • For: 提供一种可靠的权限保护机制,使得深度学习模型在使用敏感数据时能够保持数据隐私。* Methods: 使用Differentially Private Stochastic Gradient Descent with gradient clipping(DPSGD-GC)和错误反馈(EF)算法,提供了一个有理性的权限保护机制,并且可以在具有不同问题特点的场景中进行自适应调整。* Results: 在Cifar-10/100和E2E datasets上进行了实验,并证明了该算法可以在保持同等数据隐私保护的前提下,提高深度学习模型的准确率。
    Abstract Differentially Private Stochastic Gradient Descent with gradient clipping (DPSGD-GC) is a powerful tool for training deep learning models using sensitive data, providing both a solid theoretical privacy guarantee and high efficiency. However, using DPSGD-GC to ensure Differential Privacy (DP) comes at the cost of model performance degradation due to DP noise injection and gradient clipping. Existing research has extensively analyzed the theoretical convergence of DPSGD-GC, and has shown that it only converges when using large clipping thresholds that are dependent on problem-specific parameters. Unfortunately, these parameters are often unknown in practice, making it hard to choose the optimal clipping threshold. Therefore, in practice, DPSGD-GC suffers from degraded performance due to the {\it constant} bias introduced by the clipping. In our work, we propose a new error-feedback (EF) DP algorithm as an alternative to DPSGD-GC, which not only offers a diminishing utility bound without inducing a constant clipping bias, but more importantly, it allows for an arbitrary choice of clipping threshold that is independent of the problem. We establish an algorithm-specific DP analysis for our proposed algorithm, providing privacy guarantees based on R{\'e}nyi DP. Additionally, we demonstrate that under mild conditions, our algorithm can achieve nearly the same utility bound as DPSGD without gradient clipping. Our empirical results on Cifar-10/100 and E2E datasets, show that the proposed algorithm achieves higher accuracies than DPSGD while maintaining the same level of DP guarantee.
    摘要 diferencialmente privado Stochastic Gradient Descent con clipping de gradient (DPSGD-GC) es una herramienta poderosa para entrenar modelos de aprendizaje profundo utilizando datos sensibles, brindando tanto una garantía teórica sólida de privacidad como eficiencia alta. Sin embargo, utilizar DPSGD-GC para garantizar la privacidad diferencial (DP) tiene un costo de degradación de rendimiento del modelo debido al ruido de DP y clipping de gradient. La investigación existente ha analizado ampliamente la convergencia teórica de DPSGD-GC y ha demostrado que solo converge cuando se utiliza un umbral de clipping grande y dependiente de parámetros específicos del problema. Desafortunadamente, estos parámetros son a menudo desconocidos en la práctica, lo que dificulta la elección del umbral de clipping óptimo. Por lo tanto, en la práctica, DPSGD-GC sufre de una bias constante introducida por el clipping, lo que degrade su rendimiento.En nuestro trabajo, propusimos un algoritmo de retroalimentación de error (EF) como una alternativa a DPSGD-GC, que no solo ofrece una bound de utilidad diminuyente sin introducir un bias constante, sino que también permite una elección arbitraria del umbral de clipping que es independiente del problema. Establecemos un análisis de privacidad específico del algoritmo basado en la privacidad de Rényi. Además, demostramos que, bajo ciertas condiciones suaves, nuestro algoritmo puede alcanzar una utilidad similar a la de DPSGD sin clipping de gradient. Nuestros resultados empíricos en los conjuntos de datos Cifar-10/100 y E2E demuestran que el propuesto algoritmo tiene una precisión más alta que DPSGD mientras mantiene el mismo nivel de garantía de privacidad.

Analysis of the expected $L_2$ error of an over-parametrized deep neural network estimate learned by gradient descent without regularization

  • paper_url: http://arxiv.org/abs/2311.14609
  • repo_url: None
  • paper_authors: Selina Drews, Michael Kohler
  • for: 这个论文是为了证明深度神经网络可以不使用正则化项来实现好的学习效果。
  • methods: 这个论文使用的方法是应用梯度下降到带扰动的数据集上学习深度神经网络。
  • results: 研究发现,无需正则化项的深度神经网络也可以实现好的学习效果,并且在某些情况下,其学习速率比使用正则化项的情况快。此外,研究还发现,当投影函数是Holder平滑的时候,$L_2$误差的整体趋势是随着数据集大小$n$的增加而下降,并且其 converge 速率与输入维度$d$无关。
    Abstract Recent results show that estimates defined by over-parametrized deep neural networks learned by applying gradient descent to a regularized empirical $L_2$ risk are universally consistent and achieve good rates of convergence. In this paper, we show that the regularization term is not necessary to obtain similar results. In the case of a suitably chosen initialization of the network, a suitable number of gradient descent steps, and a suitable step size we show that an estimate without a regularization term is universally consistent for bounded predictor variables. Additionally, we show that if the regression function is H\"older smooth with H\"older exponent $1/2 \leq p \leq 1$, the $L_2$ error converges to zero with a convergence rate of approximately $n^{-1/(1+d)}$. Furthermore, in case of an interaction model, where the regression function consists of a sum of H\"older smooth functions with $d^*$ components, a rate of convergence is derived which does not depend on the input dimension $d$.
    摘要 最新的结果表明,使用权重参数化深度神经网络的梯度下降学习,可以得到良好的整体性和速度。在这篇论文中,我们证明了,无需正则化项,可以获得类似的结果。在网络初始化合适,步长合适,步长合适时,无正则化项的估计是全面一致的。此外,如果回归函数是Holder平滑的,那么$L_2$误差的整体趋势为零,并且错误率为$n^{-1/(1+d)}$。在交互模型中,其中回归函数为多个Holder平滑函数的总和,我们 derivated一个不依赖输入维度$d$的速度。

A Metalearned Neural Circuit for Nonparametric Bayesian Inference

  • paper_url: http://arxiv.org/abs/2311.14601
  • repo_url: https://github.com/jakesnell/neural-circuits
  • paper_authors: Jake C. Snell, Gianluca Bencomo, Thomas L. Griffiths
  • for: 这个研究是为了解决机器学习类别时遇到的关注集分布问题,即在实际世界中,类别出现的数据不仅是对称的,而且类别数量可能会 seguing 到长尾的力 Ло� distributed 分布。
  • methods: 这个研究使用了非 Parametric Bayesian 模型,并将其中的导引偏好转换到人工神经网络中。通过实际资料上的非 Parametric Bayesian 偏好,我们可以实现遍历无限多个类别的推理。
  • results: 我们的实验结果显示,将非 Parametric Bayesian 模型中的导引偏好转换到人工神经网络中,可以实现比 particle filter 方法更好的推理性能,并且比使用将 Bayesian nonparametric 推理直接包含在人工神经网络中更加快速和简单。
    Abstract Most applications of machine learning to classification assume a closed set of balanced classes. This is at odds with the real world, where class occurrence statistics often follow a long-tailed power-law distribution and it is unlikely that all classes are seen in a single sample. Nonparametric Bayesian models naturally capture this phenomenon, but have significant practical barriers to widespread adoption, namely implementation complexity and computational inefficiency. To address this, we present a method for extracting the inductive bias from a nonparametric Bayesian model and transferring it to an artificial neural network. By simulating data with a nonparametric Bayesian prior, we can metalearn a sequence model that performs inference over an unlimited set of classes. After training, this "neural circuit" has distilled the corresponding inductive bias and can successfully perform sequential inference over an open set of classes. Our experimental results show that the metalearned neural circuit achieves comparable or better performance than particle filter-based methods for inference in these models while being faster and simpler to use than methods that explicitly incorporate Bayesian nonparametric inference.
    摘要 大多数机器学习应用于分类假设了关闭的均衡类别。然而,现实中,类别发生频率经常遵循长尾力学 distribuition,而且在单个样本中可能无法见到所有类别。非Parametric Bayesian模型自然地捕捉到这种现象,但它们在实际应用中存在重要的实现复杂性和计算效率问题。为了解决这些问题,我们提出了一种方法,即从非Parametric Bayesian模型中提取 inductive bias,并将其传递到人工神经网络中。通过使用非Parametric Bayesian prior simulate数据,我们可以模拟出一个能够进行无限多个类别的推理的“神经网络”。经过训练,这个“神经网络”已经吸收了相应的 inductive bias,并可以成功地进行无限多个类别的推理。我们的实验结果表明,使用这种模板学习的神经网络可以与 particile filter-based 方法相比,在这些模型中进行推理时达到相同或更好的性能,而且比使用直接 incorporate Bayesian nonparametric inference 的方法更快和更简单。

One Fits All: Universal Time Series Analysis by Pretrained LM and Specially Designed Adaptors

  • paper_url: http://arxiv.org/abs/2311.14782
  • repo_url: https://github.com/psacfc/gpt4ts_adapter
  • paper_authors: Tian Zhou, Peisong Niu, Xue Wang, Liang Sun, Rong Jin
  • for: 本研究旨在开探预训计划模型在时间序列分析领域的应用,并提出一种基于预训计划模型的实时时间序列分析方法。
  • methods: 本研究使用预训计划模型的内置构成,包括自我注意力和传递层,以及特定设计的四个适应器,进行时间序列分析。
  • results: 本研究的实验结果显示,使用预训计划模型可以在不同的时间序列分析任务上达到顶尖性能,而且可以透过特定设计的适应器进一步提高性能。
    Abstract Despite the impressive achievements of pre-trained models in the fields of natural language processing (NLP) and computer vision (CV), progress in the domain of time series analysis has been limited. In contrast to NLP and CV, where a single model can handle various tasks, time series analysis still relies heavily on task-specific methods for activities such as classification, anomaly detection, forecasting, and few-shot learning. The primary obstacle to developing a pre-trained model for time series analysis is the scarcity of sufficient training data. In our research, we overcome this obstacle by utilizing pre-trained models from language or CV, which have been trained on billions of data points, and apply them to time series analysis. We assess the effectiveness of the pre-trained transformer model in two ways. Initially, we maintain the original structure of the self-attention and feedforward layers in the residual blocks of the pre-trained language or image model, using the Frozen Pre-trained Transformer (FPT) for time series analysis with the addition of projection matrices for input and output. Additionally, we introduce four unique adapters, designed specifically for downstream tasks based on the pre-trained model, including forecasting and anomaly detection. These adapters are further enhanced with efficient parameter tuning, resulting in superior performance compared to all state-of-the-art methods.Our comprehensive experimental studies reveal that (a) the simple FPT achieves top-tier performance across various time series analysis tasks; and (b) fine-tuning the FPT with the custom-designed adapters can further elevate its performance, outshining specialized task-specific models.
    摘要 尽管预训模型在自然语言处理(NLP)和计算机视觉(CV)领域的成就印象深刻,但时间序列分析领域的进步却有限。与NLP和CV不同,时间序列分析仍然依赖于特定任务的方法,例如分类、异常检测、预测和几何学学习。主要阻碍开发预训模型的困难在于缺乏充足的训练数据。在我们的研究中,我们利用预训CV或语言模型,这些模型在数据点上训练了数百亿个数据点,并将其应用于时间序列分析。我们使用预训转换器模型进行时间序列分析,并添加输入和输出投影矩阵。此外,我们还引入四种专门为下游任务设计的适应器,包括预测和异常检测。这些适应器通过高效的参数调整,使其性能胜过所有现有方法。我们的广泛的实验研究表明:(a)简单的FPT可以在多种时间序列分析任务中达到顶尖性能;(b)对FPT进行特定任务的定制适应器进行微调,可以进一步提高其性能,超越特定任务的模型。

Example-Based Explanations of Random Forest Predictions

  • paper_url: http://arxiv.org/abs/2311.14581
  • repo_url: None
  • paper_authors: Henrik Boström
  • for: 这篇论文主要是关于如何提供更有用的Random Forest预测解释。
  • methods: 论文提出了一种修改预测过程,只使用最重要的示例来计算预测结果,从而减少了预测中每个示例的数量,同时保持或even improve预测性能。
  • results: 实验表明,使用修改过程可以substantially reduce the number of examples used in each explanation,而且与标准预测过程相比,预测性能可以保持或even improve。
    Abstract A random forest prediction can be computed by the scalar product of the labels of the training examples and a set of weights that are determined by the leafs of the forest into which the test object falls; each prediction can hence be explained exactly by the set of training examples for which the weights are non-zero. The number of examples used in such explanations is shown to vary with the dimensionality of the training set and hyperparameters of the random forest algorithm. This means that the number of examples involved in each prediction can to some extent be controlled by varying these parameters. However, for settings that lead to a required predictive performance, the number of examples involved in each prediction may be unreasonably large, preventing the user to grasp the explanations. In order to provide more useful explanations, a modified prediction procedure is proposed, which includes only the top-weighted examples. An investigation on regression and classification tasks shows that the number of examples used in each explanation can be substantially reduced while maintaining, or even improving, predictive performance compared to the standard prediction procedure.
    摘要 一个随机森林预测可以通过标签的内积和一组由树叶决定的权重来计算;每个预测都可以由一组非零权重的训练示例进行 precisely 解释。训练集的维度和随机森林算法的Hyperparameter会影响这些例子的数量。这意味着可以通过调整这些参数来控制预测中每个例子的数量。然而,在需要的预测性能下,每个预测中可能会有过多的例子,使用者无法理解解释。为了提供更有用的解释,我们提议一种修改的预测方法,只包含权重最大的示例。我们对 regression 和 classification 任务进行了调查,发现可以大幅减少每个解释中的例子数量,保持或者even 改善预测性能与标准预测方法相比。

Predicting Failure of P2P Lending Platforms through Machine Learning: The Case in China

  • paper_url: http://arxiv.org/abs/2311.14577
  • repo_url: None
  • paper_authors: Jen-Yin Yeh, Hsin-Yu Chiu, Jhih-Huei Huang
  • for: 这种研究用机器学习模型预测中国P2P借据平台失败。
  • methods: 这种研究使用筛法和包袋法,并使用前向选择和反向减少来确定变量的有效性和重要性。
  • results: 研究发现一组可靠的变量,这些变量在不同的选择方法和模型中都出现在特征子 subsets 中,表明它们在预测平台失败方面具有可靠性和重要性。 研究发现,减少变量的数量会导致false acceptance rate 增加,但是性能指标保持稳定,AUC值约为0.96,F1 score 约为0.88。
    Abstract This study employs machine learning models to predict the failure of Peer-to-Peer (P2P) lending platforms, specifically in China. By employing the filter method and wrapper method with forward selection and backward elimination, we establish a rigorous and practical procedure that ensures the robustness and importance of variables in predicting platform failures. The research identifies a set of robust variables that consistently appear in the feature subsets across different selection methods and models, suggesting their reliability and relevance in predicting platform failures. The study highlights that reducing the number of variables in the feature subset leads to an increase in the false acceptance rate while the performance metrics remain stable, with an AUC value of approximately 0.96 and an F1 score of around 0.88. The findings of this research provide significant practical implications for regulatory authorities and investors operating in the Chinese P2P lending industry.
    摘要 Here's the translation in Simplified Chinese:这个研究使用机器学习模型预测中国Peer-to-Peer(P2P)借贷平台的失败。通过使用过滤方法和包裹方法,我们建立了一个可靠和实用的程序,确保变量的重要性和可靠性在预测平台失败方面。研究发现,减少变量的数量会导致准确接受率的增加,而性能指标保持稳定,AUC值约为0.96,F1分数约为0.88。这些研究结论对中国P2P借贷行业的规制机构和投资者有重要实践意义。

FRUITS: Feature Extraction Using Iterated Sums for Time Series Classification

  • paper_url: http://arxiv.org/abs/2311.14549
  • repo_url: https://github.com/irkri/fruits
  • paper_authors: Joscha Diehl, Richard Krieg
  • for: 这个论文是为了提出一个时间序列分类管道,该管道EXTRACTS特征基于迭代和签名(ISS),然后应用线性分类器。这些特征是非线性的,捕捉时间序列信息,并在某些设置下是时间抽象的。
  • methods: 这个管道使用迭代和签名(ISS)来EXTRACT特征,然后应用线性分类器进行分类。
  • results: 这个管道在UCAR archive上与当前最佳方法竞争,both in terms of accuracy和速度。codes are available at \url{https://github.com/irkri/fruits}.
    Abstract We introduce a pipeline for time series classification that extracts features based on the iterated-sums signature (ISS) and then applies a linear classifier. These features are intrinsically nonlinear, capture chronological information, and, under certain settings, are invariant to time-warping. We are competitive with state-of-the-art methods on the UCR archive, both in terms of accuracy and speed. We make our code available at \url{https://github.com/irkri/fruits}.
    摘要 我们提出了一个时间序列分类管道,其中提取基于迭代和积分特征(ISS),然后应用线性分类器。这些特征是内在非线性的,捕捉时间信息,并在某些设置下保持时间戳变化的不变性。我们与当前最佳方法在UCRL存档上具有同等精度和速度。我们的代码可以在 GitHub 上找到:https://github.com/irkri/fruits。Note: The translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you prefer Traditional Chinese, please let me know and I can provide the translation in that form as well.

Finding Foundation Models for Time Series Classification with a PreText Task

  • paper_url: http://arxiv.org/abs/2311.14534
  • repo_url: https://github.com/msd-irimas/domainfoundationmodelstsc
  • paper_authors: Ali Ismail-Fawaz, Maxime Devanne, Stefano Berretti, Jonathan Weber, Germain Forestier
  • for: 这篇研究旨在解决时间序列分类 tasks 中的过滤问题,尤其是在训练数据稀缺的情况下。
  • methods: 本研究提出了一种基于预训练领域的基础模型,使用了一个新的预tex task来训练模型,并在训练过程中分成了两个阶段:预训练阶段和精度训练阶段。
  • results: 实验结果显示,这种预训练策略可以对时间序列分类 tasks 中的过滤问题做出有效的改善,并且在训练数据稀缺的情况下具有较好的适应性。
    Abstract Over the past decade, Time Series Classification (TSC) has gained an increasing attention. While various methods were explored, deep learning - particularly through Convolutional Neural Networks (CNNs)-stands out as an effective approach. However, due to the limited availability of training data, defining a foundation model for TSC that overcomes the overfitting problem is still a challenging task. The UCR archive, encompassing a wide spectrum of datasets ranging from motion recognition to ECG-based heart disease detection, serves as a prime example for exploring this issue in diverse TSC scenarios. In this paper, we address the overfitting challenge by introducing pre-trained domain foundation models. A key aspect of our methodology is a novel pretext task that spans multiple datasets. This task is designed to identify the originating dataset of each time series sample, with the goal of creating flexible convolution filters that can be applied across different datasets. The research process consists of two phases: a pre-training phase where the model acquires general features through the pretext task, and a subsequent fine-tuning phase for specific dataset classifications. Our extensive experiments on the UCR archive demonstrate that this pre-training strategy significantly outperforms the conventional training approach without pre-training. This strategy effectively reduces overfitting in small datasets and provides an efficient route for adapting these models to new datasets, thus advancing the capabilities of deep learning in TSC.
    摘要 过去一个 décennial,时间序列分类(TSC)已经受到了越来越多的关注。虽然各种方法被探索,但是深度学习——特别是通过卷积神经网络(CNN)——脱颖而出,成为了有效的方法。然而,由于训练数据的有限性,为TSC定义一个基础模型,以解决过拟合问题,仍然是一个挑战。UCRC存档,包含了多种时间序列 datasets,从动作识别到心跳监测等,成为了研究这一问题的多样化 TSC 场景的示例。在这篇论文中,我们采用了预训练频率基础模型来解决过拟合问题。我们的方法包括了一个新的预测任务,该任务是识别每个时间序列样本的来源dataset,以创建可以在不同dataset上应用的灵活的卷积滤波器。我们的研究过程包括两个阶段:预训练阶段,模型通过预测任务获得通用特征;然后是精度调整阶段,用于特定dataset的分类。我们在UCRC存档上进行了广泛的实验,发现这种预训练策略在小 datasets 中明显超过了无预训练的情况。这种策略有效地降低了小 datasets 中的过拟合,并提供了一种有效的方式,以适应新的 datasets,从而推动深度学习在 TSC 中的发展。

Comparing Feature Engineering and End-to-End Deep Learning for Autism Spectrum Disorder Assessment based on Fullbody-Tracking

  • paper_url: http://arxiv.org/abs/2311.14533
  • repo_url: None
  • paper_authors: Alberto Altozano, Maria Eleonora Minissi, Mariano Alcañiz, Javier Marín-Morales
  • for: 本研究旨在评估不同方法的有效性在诊断Autism Spectrum Disorder (ASD)中,以便发现更加可靠和灵活的方法。
  • methods: 研究使用了两种方法:一种是使用手工设计的特征,另一种是使用端到端模型。
  • results: 结果显示,使用手工设计的特征在特定任务中表现较好,达到了状态之artefact的区下标(AUC)的0.90$\pm$0.06。然而,端到端模型提供了更加一致的结果,无论任务是什么,并且表现了领域总是和可靠性。
    Abstract Autism Spectrum Disorder (ASD) is characterized by challenges in social communication and restricted patterns, with motor abnormalities gaining traction for early detection. However, kinematic analysis in ASD is limited, often lacking robust validation and relying on hand-crafted features for single tasks, leading to inconsistencies across studies. Thus, end-to-end models have become promising methods to overcome the need for feature engineering. Our aim is to assess both approaches across various kinematic tasks to measure the efficacy of commonly used features in ASD assessment, while comparing them to end-to-end models. Specifically, we developed a virtual reality environment with multiple motor tasks and trained models using both classification approaches. We prioritized a reliable validation framework with repeated cross-validation. Our comparative analysis revealed that hand-crafted features outperformed our deep learning approach in specific tasks, achieving a state-of-the-art area under the curve (AUC) of 0.90$\pm$0.06. Conversely, end-to-end models provided more consistent results with less variability across all VR tasks, demonstrating domain generalization and reliability, with a maximum task AUC of 0.89$\pm$0.06. These findings show that end-to-end models enable less variable and context-independent ASD assessments without requiring domain knowledge or task specificity. However, they also recognize the effectiveness of hand-crafted features in specific task scenarios.
    摘要 自适应发展障碍(ASD)特征之一是社交通信和固定模式的挑战,以及运动异常的出现,这使得早期检测变得可能。然而,在ASD中的运动分析受到限制,通常缺乏可靠的验证和特定任务的手工特征,这导致研究中存在不一致性。因此,端到端模型成为了检测ASD的有效方法。我们的目标是评估这两种方法在不同的运动任务中的效果,并将其与手工特征进行比较。我们在虚拟现实环境中创建了多种运动任务,并使用了两种分类方法进行训练。我们重视了可靠的验证框架,并进行了重复的横排验证。我们的比较分析表明,手工特征在特定任务中表现出色,达到了状态的报告圆度(AUC)的0.90$\pm$0.06。相比之下,端到端模型在所有VR任务中提供了更加一致和无关上下文的评估结果,达到了最大任务AUC的0.89$\pm$0.06。这些发现表明,端到端模型可以提供不变性和无关上下文的ASD评估,无需培riniciples或任务特定知识。然而,它们也证明了特定任务场景中手工特征的效iveness。

Fault Detection in Telecom Networks using Bi-level Federated Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.14469
  • repo_url: None
  • paper_authors: R. Bourgerie, T. Zanouda
  • For: 这个研究旨在探讨如何透过 anomaly detection 和诊断来探测和维护5G 和以后网络中的网络异常。* Methods: 本研究提出了一个 Bi-level Federated Graph Neural Network 异常探测和诊断模型,可以透过 Privacy-preserving 的方式探测网络异常,同时对于不同的应用需求进行适应。* Results: 研究人员透过使用 real-world 数据进行实验,发现 Personalized Federated Temporal Graph Neural Networks 方法可以较好地探测网络异常,并且比较常用的技术有更好的性能。
    Abstract 5G and Beyond Networks become increasingly complex and heterogeneous, with diversified and high requirements from a wide variety of emerging applications. The complexity and diversity of Telecom networks place an increasing strain on maintenance and operation efforts. Moreover, the strict security and privacy requirements present a challenge for mobile operators to leverage network data. To detect network faults, and mitigate future failures, prior work focused on leveraging traditional ML/DL methods to locate anomalies in networks. The current approaches, although powerful, do not consider the intertwined nature of embedded and software-intensive Radio Access Network systems. In this paper, we propose a Bi-level Federated Graph Neural Network anomaly detection and diagnosis model that is able to detect anomalies in Telecom networks in a privacy-preserving manner, while minimizing communication costs. Our method revolves around conceptualizing Telecom data as a bi-level temporal Graph Neural Networks. The first graph captures the interactions between different RAN nodes that are exposed to different deployment scenarios in the network, while each individual Radio Access Network node is further elaborated into its software (SW) execution graph. Additionally, we use Federated Learning to address privacy and security limitations. Furthermore, we study the performance of anomaly detection model under three settings: (1) Centralized (2) Federated Learning and (3) Personalized Federated Learning using real-world data from an operational network. Our comprehensive experiments showed that Personalized Federated Temporal Graph Neural Networks method outperforms the most commonly used techniques for Anomaly Detection.
    摘要 “5G和以后网络逐渐变得越来越复杂和多样化,各种emerging应用的需求也越来越高。这种复杂和多样化的telecom网络对维护和运维带来增加的压力。此外,保持安全和隐私的要求也对 mobil operators来说是一个挑战。以往的工作主要利用传统的ML/DL方法来检测网络异常。然而,这些方法不考虑Radio Access Network系统中的嵌入式和软件化的特点。在这篇论文中,我们提出了一种Bi-level Federated Graph Neural Network异常检测和诊断模型,可以在保持隐私的情况下检测网络异常,同时尽可能降低通信成本。我们的方法基于 conceptualizing Telecom数据为bi-level时间Graph Neural Networks。第一个图表captures不同的RAN节点之间的互动,而每个Radio Access Network节点进一步分解成其软件(SW)执行图。此外,我们使用联合学习来解决隐私和安全限制。我们还在实际网络数据上进行了三种设置的实验:(1)中央化(2)联合学习和(3)个性化联合学习。我们的全面实验结果表明,Personalized Federated Temporal Graph Neural Networks方法在异常检测中比最常用的技术表现出色。”

Efficient Gradient Estimation via Adaptive Sampling and Importance Sampling

  • paper_url: http://arxiv.org/abs/2311.14468
  • repo_url: None
  • paper_authors: Corentin Salaün, Xingchang Huang, Iliyan Georgiev, Niloy J. Mitra, Gurprit Singh
  • for: 这 paper 是为了提高 Stochastic Gradient Descent (SGD) 优化器的效果,特别是在Estimating gradients from a mini-batch of data samples方面。
  • methods: 这 paper 使用了一种新的 Adaptive Importance Sampling (AIS) 技术,它可以减少 noise in gradient estimation,并且可以有效地 интеGRATE importance sampling into machine learning frameworks。
  • results: 这 paper 的实验结果表明,使用 AIS 技术可以提高 classification 和 regression 任务的 converge 率,而且可以减少计算负担。 且 validate the effectiveness of our adaptive and importance-sampling approach on image and point-cloud datasets。
    Abstract Machine learning problems rely heavily on stochastic gradient descent (SGD) for optimization. The effectiveness of SGD is contingent upon accurately estimating gradients from a mini-batch of data samples. Instead of the commonly used uniform sampling, adaptive or importance sampling reduces noise in gradient estimation by forming mini-batches that prioritize crucial data points. Previous research has suggested that data points should be selected with probabilities proportional to their gradient norm. Nevertheless, existing algorithms have struggled to efficiently integrate importance sampling into machine learning frameworks. In this work, we make two contributions. First, we present an algorithm that can incorporate existing importance functions into our framework. Second, we propose a simplified importance function that relies solely on the loss gradient of the output layer. By leveraging our proposed gradient estimation techniques, we observe improved convergence in classification and regression tasks with minimal computational overhead. We validate the effectiveness of our adaptive and importance-sampling approach on image and point-cloud datasets.
    摘要 (Simplified Chinese)机器学习问题强调了Stochastic Gradient Descent(SGD)优化。SGD的效果取决于正确地估计 mini-batch 中数据点的梯度。而不是通常使用均匀采样,可适应采样可以减少梯度估计中的噪声。先前的研究表明,数据点选择概率应该与梯度 нор 成正比。然而,现有的算法很难高效地将重要性采样 integrate 到机器学习框架中。在这种情况下,我们做了两个贡献。首先,我们提供了一个可以包含现有重要性函数的算法。其次,我们提议一种简单的重要性函数,它仅仅基于输出层的损失梯度。通过我们提议的梯度估计技术,我们在分类和回归任务中观察到提高了收敛性,并且具有最小的计算开销。我们验证了我们的自适应和重要性采样方法在图像和点云Dataset中的效果。

Finite Volume Features, Global Geometry Representations, and Residual Training for Deep Learning-based CFD Simulation

  • paper_url: http://arxiv.org/abs/2311.14464
  • repo_url: None
  • paper_authors: Loh Sher En Jessica, Naheed Anjum Arafat, Wei Xian Lim, Wai Lee Chan, Adams Wai Kin Kong
  • for: 提高 CFD 模拟效果和精度。
  • methods: 使用 Shortest Vector (SV) 和 Directional Integrated Distance (DID) 两种新的几何表示方法,以及 Finite Volume Features (FVF) 在图 convolution 中作为节点和边特征。
  • results: 实验结果表明,使用 SV、DID、FVF 和 residual training 可以降低 CFD 模拟结果的预测错误率,相比现有的 GNN 方法可以降低至少 41%。
    Abstract Computational fluid dynamics (CFD) simulation is an irreplaceable modelling step in many engineering designs, but it is often computationally expensive. Some graph neural network (GNN)-based CFD methods have been proposed. However, the current methods inherit the weakness of traditional numerical simulators, as well as ignore the cell characteristics in the mesh used in the finite volume method, a common method in practical CFD applications. Specifically, the input nodes in these GNN methods have very limited information about any object immersed in the simulation domain and its surrounding environment. Also, the cell characteristics of the mesh such as cell volume, face surface area, and face centroid are not included in the message-passing operations in the GNN methods. To address these weaknesses, this work proposes two novel geometric representations: Shortest Vector (SV) and Directional Integrated Distance (DID). Extracted from the mesh, the SV and DID provide global geometry perspective to each input node, thus removing the need to collect this information through message-passing. This work also introduces the use of Finite Volume Features (FVF) in the graph convolutions as node and edge attributes, enabling its message-passing operations to adjust to different nodes. Finally, this work is the first to demonstrate how residual training, with the availability of low-resolution data, can be adopted to improve the flow field prediction accuracy. Experimental results on two datasets with five different state-of-the-art GNN methods for CFD indicate that SV, DID, FVF and residual training can effectively reduce the predictive error of current GNN-based methods by as much as 41%.
    摘要 计算流体动力学(CFD)模拟是许多工程设计中不可或缺的模拟步骤,但它往往具有计算成本高的问题。一些基于图神经网络(GNN)的CFD方法已经被提出。然而,现有的方法继承了传统的数值模拟器的弱点,同时忽略了finite volume方法中的细网格特征。具体来说,输入节点在这些GNN方法中具有非常有限的对象和其周围环境的信息。此外,finite volume方法中的细网格特征,如细网格体积、面积和中心点,不包括在消息传递操作中。为解决这些弱点,本工作提出了两种新的几何表示:最短 вектор(SV)和方向积分距离(DID)。从细网格中提取出来的SV和DID为每个输入节点提供全局几何视角,因此消除了通过消息传递收集这些信息的需要。此外,本工作还引入了finite volume特征(FVF)在图 convolution 中作为节点和边属性,使其消息传递操作可以适应不同的节点。最后,本工作是首次实现了使用剩余训练,在低分辨率数据可用时,提高流体场预测精度的可能性。实验结果表明,使用SV、DID、FVF和剩余训练可以有效地降低当前GNN基于CFD方法的预测错误,最多降低41%。

Disentangling the Spectral Properties of the Hodge Laplacian: Not All Small Eigenvalues Are Equal

  • paper_url: http://arxiv.org/abs/2311.14427
  • repo_url: None
  • paper_authors: Vincent P. Grande, Michael T. Schaub
  • for: 这篇论文旨在为图论、机器学习和图信号处理等领域提供更多的信息,通过对哥德 Laplacian 的精细特征进行分析。
  • methods: 这篇论文使用了 Hodge Laplacian 作为更高级别的图模型,并对其最小特征值进行分析,从而掌握到图中的重要拓扑性质。
  • results: 这篇论文提出了一种基于 Hodge Laplacian 的 persistenteigenvector similarity 的方法,可以跟踪不同级别的图信号在不同缩放级别上的变化。此外,这篇论文还提出了一种基于这种 persistenteigenvector similarity 的图特征分类方法。
    Abstract The rich spectral information of the graph Laplacian has been instrumental in graph theory, machine learning, and graph signal processing for applications such as graph classification, clustering, or eigenmode analysis. Recently, the Hodge Laplacian has come into focus as a generalisation of the ordinary Laplacian for higher-order graph models such as simplicial and cellular complexes. Akin to the traditional analysis of graph Laplacians, many authors analyse the smallest eigenvalues of the Hodge Laplacian, which are connected to important topological properties such as homology. However, small eigenvalues of the Hodge Laplacian can carry different information depending on whether they are related to curl or gradient eigenmodes, and thus may not be comparable. We therefore introduce the notion of persistent eigenvector similarity and provide a method to track individual harmonic, curl, and gradient eigenvectors/-values through the so-called persistence filtration, leveraging the full information contained in the Hodge-Laplacian spectrum across all possible scales of a point cloud. Finally, we use our insights (a) to introduce a novel form of topological spectral clustering and (b) to classify edges and higher-order simplices based on their relationship to the smallest harmonic, curl, and gradient eigenvectors.
    摘要 “graph Laplacian的各种 спектル інформація在图论、机器学习和图信号处理中具有重要的应用,如图分类、凝集分析和eigenmode分析。在这些应用中,作者们通常分析最小的Hodge Laplacian的値,这些値值与重要的Topological Property有关,如同体积。但是,小的Hodge Laplacian値可能不同具有不同的信息,具体取决于curl或漫步eigenmode的相关性,因此可能不能比较。我们因此引入 persistenteigenvector similarity的概念,并提供了一种方法,通过called persistenc filtration,利用Hodge-Laplacian спектル中所有可能的扩展scale的点云中的全部信息,跟踪individual harmonic、curl和gradient eigenvectors/-values的变化。最后,我们使用我们的发现(a)提出了一种新的topological spectral clustering方法,以及(b)基于curl、gradient和harmonic eigenvectors的edge和高阶simplices的分类。”Note that Simplified Chinese is a written form of Chinese that uses shorter words and simpler grammar than Traditional Chinese. The translation may not be exactly the same as the original text, but it should convey the same meaning and ideas.

Approximation of Convex Envelope Using Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.14421
  • repo_url: None
  • paper_authors: Vivek S. Borkar, Adit Akarsh
  • for: 这个论文是用来估计非凸函数的凸包的问题的 Stochastic control 形式ulation。
  • methods: 该论文使用了一种基于 Q-learning 的控制优化方法来近似凸包,以及一种变体的 Q-learning 算法来实现控制的优化停止。
  • results: 该论文在一个标准的测试问题库中得到了非常有 promise 的结果。
    Abstract Oberman gave a stochastic control formulation of the problem of estimating the convex envelope of a non-convex function. Based on this, we develop a reinforcement learning scheme to approximate the convex envelope, using a variant of Q-learning for controlled optimal stopping. It shows very promising results on a standard library of test problems.
    摘要 奥伯曼提出了一种游戏理论的控制形式,用于估算非凸函数的凸包。基于这种形式,我们开发了一种基于Q学习的控制优化停止方法,用于approximate凸包。实验结果表明,这种方法在标准库中的测试问题上表现很出色。Note: "凸包" (convex envelope) in Chinese is typically translated as "凸函数的凸包" (convex envelope of a function), but since the text already uses "凸包" without any clarification, I kept the translation consistent.

A Comparison of PDF Projection with Normalizing Flows and SurVAE

  • paper_url: http://arxiv.org/abs/2311.14412
  • repo_url: None
  • paper_authors: Paul M. Baggenstoss, Felix Govaers
  • for: 这篇论文目的是为了探讨Normalizing Flows(NF)和Surjection VAE(SurVAE)等方法的应用。
  • methods: 这篇论文使用了NF和SurVAE等方法,这些方法可以实现精确的预测和探索。
  • results: 这篇论文发现了NF和SurVAE的应用是PDF projection的重新发明,这个概念已经在过去二十年内得到了更多的发展。
    Abstract Normalizing flows (NF) recently gained attention as a way to construct generative networks with exact likelihood calculation out of composable layers. However, NF is restricted to dimension-preserving transformations. Surjection VAE (SurVAE) has been proposed to extend NF to dimension-altering transformations. Such networks are desirable because they are expressive and can be precisely trained. We show that the approaches are a re-invention of PDF projection, which appeared over twenty years earlier and is much further developed.
    摘要 对�utable流(NF)在最近吸引了一些注意,因为它可以通过可composable层建立生成网络,并且可以进行精确的概率计算。然而,NF受限于维度保持变换。Sujection VAE(SurVAE)已经提议以增加NF的维度变换能力。这些网络非常表达力强,可以精确地训练。我们显示出这些方法与PDF投影已经在二十年前出现,而且已经更加发展。Note: "PDF projection" in the text refers to "dimensionality-preserving flow projection"

Unveiling The Factors of Aesthetic Preferences with Explainable AI

  • paper_url: http://arxiv.org/abs/2311.14410
  • repo_url: None
  • paper_authors: Derya Soydaner, Johan Wagemans
  • for: 本研究旨在探讨图像美学吸引人的原因,并使用机器学习模型来预测图像的美学分数。
  • methods: 我们使用了多种机器学习模型,包括Random Forest、XGBoost、Support Vector Regression和Multilayer Perceptron,并使用SHAP技术来解释图像美学分数的各个属性和它们之间的交互。
  • results: 我们在三个图像美学标准 benchmark 上进行了实验,发现不同的机器学习模型在预测图像美学分数方面的表现不同,同时使用SHAP技术可以提供更深入的解释。
    Abstract The allure of aesthetic appeal in images captivates our senses, yet the underlying intricacies of aesthetic preferences remain elusive. In this study, we pioneer a novel perspective by utilizing machine learning models that focus on aesthetic attributes known to influence preferences. Through a data mining approach, our models process these attributes as inputs to predict the aesthetic scores of images. Moreover, to delve deeper and obtain interpretable explanations regarding the factors driving aesthetic preferences, we utilize the popular Explainable AI (XAI) technique known as SHapley Additive exPlanations (SHAP). Our methodology involves employing various machine learning models, including Random Forest, XGBoost, Support Vector Regression, and Multilayer Perceptron, to compare their performances in accurately predicting aesthetic scores, and consistently observing results in conjunction with SHAP. We conduct experiments on three image aesthetic benchmarks, providing insights into the roles of attributes and their interactions. Ultimately, our study aims to shed light on the complex nature of aesthetic preferences in images through machine learning and provides a deeper understanding of the attributes that influence aesthetic judgements.
    摘要 《图像美学魅力的研究》Introduction:图像美学魅力 captivates our senses, yet the underlying intricacies of aesthetic preferences remain elusive. In this study, we pioneer a novel perspective by utilizing machine learning models that focus on aesthetic attributes known to influence preferences. Through a data mining approach, our models process these attributes as inputs to predict the aesthetic scores of images. Moreover, to delve deeper and obtain interpretable explanations regarding the factors driving aesthetic preferences, we utilize the popular Explainable AI (XAI) technique known as SHapley Additive exPlanations (SHAP).Methodology:Our methodology involves employing various machine learning models, including Random Forest, XGBoost, Support Vector Regression, and Multilayer Perceptron, to compare their performances in accurately predicting aesthetic scores, and consistently observing results in conjunction with SHAP. We conduct experiments on three image aesthetic benchmarks, providing insights into the roles of attributes and their interactions.Objectives:Our study aims to shed light on the complex nature of aesthetic preferences in images through machine learning and provides a deeper understanding of the attributes that influence aesthetic judgements. By utilizing SHAP, we can obtain interpretable explanations regarding the factors driving aesthetic preferences, allowing for a more nuanced understanding of the underlying mechanisms.Expected Outcomes:We expect our study to provide valuable insights into the aesthetic preferences of images and the factors that influence them. By comparing the performances of various machine learning models and analyzing the results with SHAP, we can gain a more comprehensive understanding of the complex nature of aesthetic preferences and the attributes that contribute to them. Ultimately, our study aims to provide a deeper understanding of the intricacies of aesthetic preferences in images and their underlying mechanisms.

BHGNN-RT: Network embedding for directed heterogeneous graphs

  • paper_url: http://arxiv.org/abs/2311.14404
  • repo_url: https://github.com/albertlordsun/bhgnn-rt
  • paper_authors: Xiyang Sun, Fumiyasu Komaki
  • for: 本研究旨在提出一种适用于指定异ogeneous网络的bidirectional heterogeneous graph neural network with random teleport(BHGNN-RT),以解决指定异ogeneous网络中的过拟合问题。
  • methods: 本研究提出了一种基于bidirectional message-passing过程和网络不同性的 embedding方法,并且通过优化 телепорport比例来解决过拟合问题。
  • results: 经验表明,BHGNN-RT在不同数据集上展现出了优于比较方法的性能,并且可以在节点分类和无监督归一类任务中达到领先水平。此外,研究还探讨了消息组件、模型层次和电PORT比例对模型性能的影响。
    Abstract Networks are one of the most valuable data structures for modeling problems in the real world. However, the most recent node embedding strategies have focused on undirected graphs, with limited attention to directed graphs, especially directed heterogeneous graphs. In this study, we first investigated the network properties of directed heterogeneous graphs. Based on network analysis, we proposed an embedding method, a bidirectional heterogeneous graph neural network with random teleport (BHGNN-RT), for directed heterogeneous graphs, that leverages bidirectional message-passing process and network heterogeneity. With the optimization of teleport proportion, BHGNN-RT is beneficial to overcome the over-smoothing problem. Extensive experiments on various datasets were conducted to verify the efficacy and efficiency of BHGNN-RT. Furthermore, we investigated the effects of message components, model layer, and teleport proportion on model performance. The performance comparison with all other baselines illustrates that BHGNN-RT achieves state-of-the-art performance, outperforming the benchmark methods in both node classification and unsupervised clustering tasks.
    摘要 网络是现实世界中一种非常有价值的数据结构,但最近的节点嵌入策略却主要关注于无向图,尚未充分关注有向图,特别是有向不同类型图。在本研究中,我们首先研究了指向不同类型图的网络性质。基于网络分析,我们提出了一种嵌入方法,即bidirectional heterogeneous graph neural network with random teleport(BHGNN-RT),用于直接不同类型图,该方法利用双向信息传递过程和网络多样性。通过优化 телепор比例,BHGNN-RT可以解决过度平滑问题。我们在多个数据集上进行了广泛的实验,以证明BHGNN-RT的效果和效率。此外,我们还研究了消息组成部分、模型层次和 телепор比例对模型性能的影响。与所有基elines相比,BHGNN-RT实现了状态革命性的表现,在节点类别预测和无监督归一类任务中都高于标准方法。

TEA: Test-time Energy Adaptation

  • paper_url: http://arxiv.org/abs/2311.14402
  • repo_url: None
  • paper_authors: Yige Yuan, Bingbing Xu, Liang Hou, Fei Sun, Huawei Shen, Xueqi Cheng
  • for: 提高模型总体化能力,尤其是在测试数据与训练数据分布不同时
  • methods: 基于能量视角,将训练过的分类器转化为能量基本模型,并将模型的分布与测试数据的分布进行对应
  • results: 对多个任务、标准准则和架构进行了广泛的实验,显示 TEA 的总体化性能较为先进,并且可以帮助模型更好地识别测试数据分布,从而提高总体化和准确性。
    Abstract Test-time adaptation (TTA) aims to improve model generalizability when test data diverges from training distribution, offering the distinct advantage of not requiring access to training data and processes, especially valuable in the context of large pre-trained models. However, current TTA methods fail to address the fundamental issue: covariate shift, i.e., the decreased generalizability can be attributed to the model's reliance on the marginal distribution of the training data, which may impair model calibration and introduce confirmation bias. To address this, we propose a novel energy-based perspective, enhancing the model's perception of target data distributions without requiring access to training data or processes. Building on this perspective, we introduce $\textbf{T}$est-time $\textbf{E}$nergy $\textbf{A}$daptation ($\textbf{TEA}$), which transforms the trained classifier into an energy-based model and aligns the model's distribution with the test data's, enhancing its ability to perceive test distributions and thus improving overall generalizability. Extensive experiments across multiple tasks, benchmarks and architectures demonstrate TEA's superior generalization performance against state-of-the-art methods. Further in-depth analyses reveal that TEA can equip the model with a comprehensive perception of test distribution, ultimately paving the way toward improved generalization and calibration.
    摘要

Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling

  • paper_url: http://arxiv.org/abs/2311.14387
  • repo_url: None
  • paper_authors: Mingze Wang, Zeping Min, Lei Wu
  • for: 本研究 investigate了梯度-based algorithm在分类 linearly separable data 上的margin-maximization bias。
  • methods: 本研究使用了深入分析normalized gradient velocity field的特性,强调其在margin maximization中的作用。基于此分析,我们提出了一种新的算法 called Progressive Rescaling Gradient Descent (PRGD),并证明PRGD可以在 exponential rate 上 maximize the margin。
  • results: 我们发现了一些数据分布下,现有的算法such as gradient descent (GD)和normalized gradient descent (NGD) provably fail 在高效地 maximize the margin。此外,我们还发现PRGD可以增强深度神经网络和线性不可分数据集上的泛化性表现。
    Abstract In this work, we investigate the margin-maximization bias exhibited by gradient-based algorithms in classifying linearly separable data. We present an in-depth analysis of the specific properties of the velocity field associated with (normalized) gradients, focusing on their role in margin maximization. Inspired by this analysis, we propose a novel algorithm called Progressive Rescaling Gradient Descent (PRGD) and show that PRGD can maximize the margin at an {\em exponential rate}. This stands in stark contrast to all existing algorithms, which maximize the margin at a slow {\em polynomial rate}. Specifically, we identify mild conditions on data distribution under which existing algorithms such as gradient descent (GD) and normalized gradient descent (NGD) {\em provably fail} in maximizing the margin efficiently. To validate our theoretical findings, we present both synthetic and real-world experiments. Notably, PRGD also shows promise in enhancing the generalization performance when applied to linearly non-separable datasets and deep neural networks.
    摘要 在这项研究中,我们调查了基于梯度的算法在分类 linearly separable 数据时的margin-maximization偏好。我们进行了深入的velocity场分析,关注把normalized梯度 associate with的特性,尤其是它们在margin maximization中的作用。 inspirited by这种分析,我们提出了一种新的算法 called Progressive Rescaling Gradient Descent (PRGD),并证明PRGD可以在 exponential rate 上 maximize the margin。这与所有现有的算法不同,他们只能在 slow polynomial rate 上 maximize the margin。我们还证明了exist 的算法如 gradient descent (GD) 和 normalized gradient descent (NGD) 在certain condition下 provably fail 在efficiently maximize the margin。为 validate our theoretical findings, we present both synthetic and real-world experiments. notable, PRGD also shows promise in enhancing the generalization performance when applied to linearly non-separable datasets and deep neural networks.

Deciphering and integrating invariants for neural operator learning with various physical mechanisms

  • paper_url: http://arxiv.org/abs/2311.14361
  • repo_url: None
  • paper_authors: Rui Zhang, Qi Meng, Zhi-Ming Ma
  • for: 本研究旨在开发一种基于神经网络的 Physical Invariant Attention Neural Operator (PIANO),用于解决基于 partial differential equation (PDE) 的物理系统模拟问题。
  • methods: PIANO 使用自我超视的学习方法提取物理知识,并使用注意力机制将其集成到动态卷积层中。
  • results: 相比现有技术,PIANO 可以降低 PDE 预测任务中相对误差的百分比值,从13.6% 至 82.2%。此外, varied 下游任务表明 PIANO 所提取的物理嵌入 aligned well 于 PDE 系统中的下面逻辑,证明 PIANO 的物理意义。
    Abstract Neural operators have been explored as surrogate models for simulating physical systems to overcome the limitations of traditional partial differential equation (PDE) solvers. However, most existing operator learning methods assume that the data originate from a single physical mechanism, limiting their applicability and performance in more realistic scenarios. To this end, we propose Physical Invariant Attention Neural Operator (PIANO) to decipher and integrate the physical invariants (PI) for operator learning from the PDE series with various physical mechanisms. PIANO employs self-supervised learning to extract physical knowledge and attention mechanisms to integrate them into dynamic convolutional layers. Compared to existing techniques, PIANO can reduce the relative error by 13.6\%-82.2\% on PDE forecasting tasks across varying coefficients, forces, or boundary conditions. Additionally, varied downstream tasks reveal that the PI embeddings deciphered by PIANO align well with the underlying invariants in the PDE systems, verifying the physical significance of PIANO. The source code will be publicly available at: https://github.com/optray/PIANO.
    摘要 <>Translate given text into Simplified Chinese.<> neuronal 算法已经被探索作为模拟物理系统的代理模型,以超越传统的partial differential equation(PDE)解决方法的局限性。然而,大多数现有的算法学习方法假设数据来自单一的物理机制,这限制了它们在更真实的场景中的可应用性和性能。为此,我们提出了物理 invariants 吸引神经算法(PIANO),以解读并 integrate 物理 invariants(PI)在 PDE 系列中的学习。PIANO 使用了自我超级vised 学习来提取物理知识,并使用了注意力机制来集成它们到动态卷积层中。相比之前的技术,PIANO 可以降低 PDE 预测任务中相对错误率 by 13.6%-82.2%,并且在不同的下游任务中,PI 嵌入被PIANO解读出来的对应于 Underlying invariants 在 PDE 系统中,这证明了PIANO 的物理意义。代码将公开在:https://github.com/optray/PIANO。

Thompson sampling for zero-inflated count outcomes with an application to the Drink Less mobile health study

  • paper_url: http://arxiv.org/abs/2311.14359
  • repo_url: None
  • paper_authors: Xueqing Liu, Nina Deliu, Tanujit Chakraborty, Lauren Bell, Bibhas Chakraborty
  • for: 本研究旨在提高远期结果,如临床状况,通过适时适量的可靠性改进。
  • methods: 本研究使用了 Contextual Bandits 框架,通过个性化时变上下文来自适应ively 调整 intervención。
  • results: 研究提出了一种将 count 数据模型 integrate 到在线决策中的方法,并在实际数据集和模拟数据集上证明了该方法的有效性。 regret bounds 也得到了理论上的证明。
    Abstract Mobile health (mHealth) technologies aim to improve distal outcomes, such as clinical conditions, by optimizing proximal outcomes through just-in-time adaptive interventions. Contextual bandits provide a suitable framework for customizing such interventions according to individual time-varying contexts, intending to maximize cumulative proximal outcomes. However, unique challenges such as modeling count outcomes within bandit frameworks have hindered the widespread application of contextual bandits to mHealth studies. The current work addresses this challenge by leveraging count data models into online decision-making approaches. Specifically, we combine four common offline count data models (Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regressions) with Thompson sampling, a popular contextual bandit algorithm. The proposed algorithms are motivated by and evaluated on a real dataset from the Drink Less trial, where they are shown to improve user engagement with the mHealth system. The proposed methods are further evaluated on simulated data, achieving improvement in maximizing cumulative proximal outcomes over existing algorithms. Theoretical results on regret bounds are also derived. A user-friendly R package countts that implements the proposed methods for assessing contextual bandit algorithms is made publicly available at https://cran.r-project.org/web/packages/countts.
    摘要 мобильные технологии здоровья (mHealth) стремятся улучшить дальнейшие результаты, такие как клинические условия, путем оптимизации ближайших результатов в реальном времени с помощью адаптивных интервенций на основе контекста. Фреймворк контекстных бандитов подходит для настройки таких интервенций в соответствии с индивидуальными контекстами в реальном времени, чтобы максимизировать суммарные ближайшие результаты. Однако, существуют уникальные проблемы, такие как моделирование количественных результатов в рамках фреймворков бандитов, что ограничивает применение контекстных бандитов в исследованиях mHealth. Наши работы преодолевают эту проблему, введя модели count data в онлайновое принятие решений. В частности, мы комбинируем четыре общепринятых офлайн-модели count data (полупановую, отрицательную биномиальную,Poissonовскую и отрицательную биномиальную регрессии) с алгоритмом Thompson sampling, популярным контекстным бандитом. Наши предложенные алгоритмы были вдохновлены и оценены на реальных данных из исследования Drink Less, где они были показаны улучшением вовлеченности пользователей в систему mHealth. Наши методы были также оценены на симулированных данных и показали улучшение в максимизации суммарных ближайших результатов по сравнению с существующими алгоритмами. Мы также получены теоретические результаты о границах regret. User-friendly R package countts, который реализует наши предложенные методы для оценки алгоритмов контекстных бандитов, является доступен для скачивания на сайте .

Cycle Invariant Positional Encoding for Graph Representation Learning

  • paper_url: http://arxiv.org/abs/2311.14333
  • repo_url: https://github.com/pkuyzy/CycleNet
  • paper_authors: Zuoyu Yan, Tengfei Ma, Liangcai Gao, Zhi Tang, Chao Chen, Yusu Wang
  • for: 增强图像学模型的图形数据中的循环元素
  • methods: 使用循环基础(一个最小的循环生成图形空间的基础)和循环结构编码模块
  • results: 在多个比较中表现更好,比如在多种benchmark中的表现Here is the full translation of the abstract in Simplified Chinese:
  • for: 本文为了增强图像学模型中的图形数据中的循环元素,提出了一种结构编码模块,即循环网络(CycleNet)。
  • methods: 我们使用循环基础(一个最小的循环生成图形空间的基础)和循环结构编码模块来编码循环信息。为保证编码是 permutation invariant,我们使用基于BasisNet的正交 проекor来编码循环信息。
  • results: 我们通过多种实验表明,增强了的模型在多个比较中表现更好,比如在多种benchmark中的表现。此外,我们还提供了一些理论理解module的表达力。
    Abstract Cycles are fundamental elements in graph-structured data and have demonstrated their effectiveness in enhancing graph learning models. To encode such information into a graph learning framework, prior works often extract a summary quantity, ranging from the number of cycles to the more sophisticated persistence diagram summaries. However, more detailed information, such as which edges are encoded in a cycle, has not yet been used in graph neural networks. In this paper, we make one step towards addressing this gap, and propose a structure encoding module, called CycleNet, that encodes cycle information via edge structure encoding in a permutation invariant manner. To efficiently encode the space of all cycles, we start with a cycle basis (i.e., a minimal set of cycles generating the cycle space) which we compute via the kernel of the 1-dimensional Hodge Laplacian of the input graph. To guarantee the encoding is invariant w.r.t. the choice of cycle basis, we encode the cycle information via the orthogonal projector of the cycle basis, which is inspired by BasisNet proposed by Lim et al. We also develop a more efficient variant which however requires that the input graph has a unique shortest cycle basis. To demonstrate the effectiveness of the proposed module, we provide some theoretical understandings of its expressive power. Moreover, we show via a range of experiments that networks enhanced by our CycleNet module perform better in various benchmarks compared to several existing SOTA models.
    摘要 “循环是很重要的元素在图structured data中,它们已经证明了它们可以增强图学习模型。在图学习框架中实现这些信息,先前的研究通常是提取一个总体量,从数量到更加复杂的对应图的实变图 summaries。但是,更详细的信息,例如哪些边被编码在循环中,尚未在图神经网络中使用过。在这篇论文中,我们做出了一步,对这个问题进行了解释,并提出了一个叫做CycleNet的结构编码模组。这个模组使用图的边结构编码,实现了图的循环信息编码,并且保持了 permutation 不变性。为了有效地编码循环空间,我们开始 WITH 一个循环基底(即循环空间的最小生成集),我们使用图的1维hodgetlaplacian的kernel来计算。为 garantuee encoding是对应给选择循环基底的不变性,我们使用循环基底的正交投影器,这是由BasisNet提出的。我们还开发了一个更有效的变iante,但是它需要输入图有唯一的最短循环基底。通过一些 teoretic 理解和实验验证,我们证明了我们的CycleNet模组可以增强图学习模型的表现。”

GATGPT: A Pre-trained Large Language Model with Graph Attention Network for Spatiotemporal Imputation

  • paper_url: http://arxiv.org/abs/2311.14332
  • repo_url: None
  • paper_authors: Yakun Chen, Xianzhi Wang, Guandong Xu
  • for: 该论文目的是提出一种基于大型自然语言模型(LLM)的协同推理框架,用于适应缺失数据的 espacio-temporal 补做。
  • methods: 该方法利用预训练的 LLM integretes a graph attention mechanism,保持大多数 LLM 参数不变,以便利用现有知识来学习时间模式,同时对特定应用进行微调。
  • results: 通过测试三个真实世界数据集,该创新方法与现有深度学习标准做比较,得到了相似的结果。
    Abstract The analysis of spatiotemporal data is increasingly utilized across diverse domains, including transportation, healthcare, and meteorology. In real-world settings, such data often contain missing elements due to issues like sensor malfunctions and data transmission errors. The objective of spatiotemporal imputation is to estimate these missing values by understanding the inherent spatial and temporal relationships in the observed multivariate time series. Traditionally, spatiotemporal imputation has relied on specific, intricate architectures designed for this purpose, which suffer from limited applicability and high computational complexity. In contrast, our approach integrates pre-trained large language models (LLMs) into spatiotemporal imputation, introducing a groundbreaking framework, GATGPT. This framework merges a graph attention mechanism with LLMs. We maintain most of the LLM parameters unchanged to leverage existing knowledge for learning temporal patterns, while fine-tuning the upper layers tailored to various applications. The graph attention component enhances the LLM's ability to understand spatial relationships. Through tests on three distinct real-world datasets, our innovative approach demonstrates comparable results to established deep learning benchmarks.
    摘要 《维度时空数据分析在不同领域中越来越普遍,如交通、医疗和气象等。在实际应用中,这些数据经常包含某些缺失元素,这可能是因为仪器故障或数据传输错误。目标是使用维度时空填充来估算这些缺失值,并且理解这些维度时空数据的自然关系。传统上,维度时空填充通常采用专门设计的复杂架构,这些架构受限于应用场景和计算复杂性。相比之下,我们的方法将预训练的大型自然语言模型(LLM) integrate into spatiotemporal imputation,提出了一种创新的框架——GATGPT。这种框架将图注意机制与LLM结合在一起,以便更好地理解空间关系。我们保留了大多数LLM参数不变,以利用现有的知识来学习时间模式,同时对上层进行微调,适应不同的应用。图注意部分可以增强LLM对空间关系的理解。经过对三个不同的实际数据集进行测试,我们的创新方法与传统的深度学习标准准确性相当。》

Reinforcement Learning from Statistical Feedback: the Journey from AB Testing to ANT Testing

  • paper_url: http://arxiv.org/abs/2311.14766
  • repo_url: None
  • paper_authors: Feiyang Han, Yimin Wei, Zhaofeng Liu, Yanxing Qi
  • for: 用于填补RLHF中商业目标和模型训练之间的空白,使用统计业务反馈来提高学习效果和性能。
  • methods: 使用AB测试来获取偏好反馈,并使用统计推理方法来计算奖励网络的偏好。
  • results: 提出了RLSF基于AB测试的方法,并通过多个数字体验 validate了方法的有效性。
    Abstract Reinforcement Learning from Human Feedback (RLHF) has played a crucial role in the success of large models such as ChatGPT. RLHF is a reinforcement learning framework which combines human feedback to improve learning effectiveness and performance. However, obtaining preferences feedback manually is quite expensive in commercial applications. Some statistical commercial indicators are usually more valuable and always ignored in RLHF. There exists a gap between commercial target and model training. In our research, we will attempt to fill this gap with statistical business feedback instead of human feedback, using AB testing which is a well-established statistical method. Reinforcement Learning from Statistical Feedback (RLSF) based on AB testing is proposed. Statistical inference methods are used to obtain preferences for training the reward network, which fine-tunes the pre-trained model in reinforcement learning framework, achieving greater business value. Furthermore, we extend AB testing with double selections at a single time-point to ANT testing with multiple selections at different feedback time points. Moreover, we design numerical experiences to validate the effectiveness of our algorithm framework.
    摘要 人工智能学习从人类反馈(RLHF)已经在大型模型如ChatGPT的成功中发挥了关键作用。 RLHF是一种结合人类反馈来提高学习效果和性能的 reinforcement learning 框架。然而,在商业应用中手动获取人类反馈是非常昂贵的。一些商业指标通常被忽略在 RLHF 中。这存在一个商业目标和模型训练之间的差距。在我们的研究中,我们尝试使用统计业务反馈来填充这个差距,使用 AB 测试,这是一种已有的统计方法。基于 AB 测试的 Reinforcement Learning from Statistical Feedback(RLSF)被提议。统计推理方法用于从统计反馈中获取训练奖网络的偏好,这些网络在 reinforcement learning 框架中进行精细调整,从而实现更大的商业价值。此外,我们扩展 AB 测试,使用 double selections 在同一时间点进行测试,并使用多个反馈时间点进行 ANT 测试。此外,我们设计了数字体验来验证我们的算法框架的效果。

AdaMedGraph: Adaboosting Graph Neural Networks for Personalized Medicine

  • paper_url: http://arxiv.org/abs/2311.14304
  • repo_url: None
  • paper_authors: Jie Lian, Xufang Luo, Caihua Shan, Dongqi Han, Varut Vardhanabhuti, Dongsheng Li
  • for: 这篇论文是针对个性化医疗 tailored to individual patients 的预测任务。
  • methods: 这篇论文使用机器学习技术处理个性化数据,包括影像、基因和评估。特别是使用建构图的方法,连结相似的病人,然后运用图神经网络(GNNs)进行预测。
  • results: 这篇论文提出了一个新的算法,可以自动选择重要的特征,建构多个病人相似图,并使用这些图进行预测。在两个实际医疗应用中,这个算法表现出色。
    Abstract Precision medicine tailored to individual patients has gained significant attention in recent times. Machine learning techniques are now employed to process personalized data from various sources, including images, genetics, and assessments. These techniques have demonstrated good outcomes in many clinical prediction tasks. Notably, the approach of constructing graphs by linking similar patients and then applying graph neural networks (GNNs) stands out, because related information from analogous patients are aggregated and considered for prediction. However, selecting the appropriate edge feature to define patient similarity and construct the graph is challenging, given that each patient is depicted by high-dimensional features from diverse sources. Previous studies rely on human expertise to select the edge feature, which is neither scalable nor efficient in pinpointing crucial edge features for complex diseases. In this paper, we propose a novel algorithm named \ours, which can automatically select important features to construct multiple patient similarity graphs, and train GNNs based on these graphs as weak learners in adaptive boosting. \ours{} is evaluated on two real-world medical scenarios and shows superiors performance.
    摘要 准精准医学在最近几年内得到了广泛关注。现代机器学习技术被用来处理个性化数据,包括图像、基因和评估等多种来源。这些技术在许多临床预测任务中表现出色。特别是通过建立相似病人之间的图并应用图 neural networks(GNNs)的方法,这种方法在汇集相似病人的相关信息并考虑这些信息 для预测中表现出色。然而,选择适当的边特征来定义病人相似性并构建图是困难的,因为每个病人都是由多维特征来自多种来源所描述。先前的研究依赖于人类专家来选择边特征,这并不是可推广的 nor efficient in identifying crucial edge features for complex diseases。在这篇论文中,我们提出了一种新的算法名为\ours,可以自动选择重要的特征来构建多个病人相似图,并基于这些图进行GNNs的训练为弱学习器在适应增强中。\ours{}在两个实际医疗场景中进行了评估,并表现出优于其他方法。

Out-of-Distribution Generalized Dynamic Graph Neural Network with Disentangled Intervention and Invariance Promotion

  • paper_url: http://arxiv.org/abs/2311.14255
  • repo_url: https://github.com/zeniSoida/pl1
  • paper_authors: Zeyang Zhang, Xin Wang, Ziwei Zhang, Haoyang Li, Wenwu Zhu
  • for: 这个 paper 的目的是对 dynamic graph neural networks (DyGNNs) 进行改进,以便在具有分布差异的动态图形上进行预测。
  • methods: 这个 paper 使用了一种名为 Disentangled Intervention-based Dynamic graph Attention networks with Invariance Promotion (I-DIDA) 的方法,它可以在动态图形上捕捉到不同的构造和特征,并将其转换为不同的几何构造和特征,以便在分布差异下进行预测。
  • results: 实验结果显示,这个方法可以在分布差异下进行预测,并且比以往的基elines 更高。这是首次研究具有分布差异的动态图形预测问题。
    Abstract Dynamic graph neural networks (DyGNNs) have demonstrated powerful predictive abilities by exploiting graph structural and temporal dynamics. However, the existing DyGNNs fail to handle distribution shifts, which naturally exist in dynamic graphs, mainly because the patterns exploited by DyGNNs may be variant with respect to labels under distribution shifts. In this paper, we propose Disentangled Intervention-based Dynamic graph Attention networks with Invariance Promotion (I-DIDA) to handle spatio-temporal distribution shifts in dynamic graphs by discovering and utilizing invariant patterns, i.e., structures and features whose predictive abilities are stable across distribution shifts. Specifically, we first propose a disentangled spatio-temporal attention network to capture the variant and invariant patterns. By utilizing the disentangled patterns, we design a spatio-temporal intervention mechanism to create multiple interventional distributions and an environment inference module to infer the latent spatio-temporal environments, and minimize the variance of predictions among these intervened distributions and environments, so that our model can make predictions based on invariant patterns with stable predictive abilities under distribution shifts. Extensive experiments demonstrate the superiority of our method over state-of-the-art baselines under distribution shifts. Our work is the first study of spatio-temporal distribution shifts in dynamic graphs, to the best of our knowledge.
    摘要 临时图 neural networks (DyGNNs) 已经表现出了强大的预测能力,利用图结构和时间动态。然而,现有的 DyGNNs 无法处理分布变化,这些变化 Naturally 存在于临时图中,主要因为 DyGNNs 所抓取的模式可能与标签之间的分布变化相关。在这篇论文中,我们提出了分离了解释-based Dynamic graph Attention networks with Invariance Promotion (I-DIDA),用于在动态图中处理空间-时间分布变化。具体来说,我们首先提出了分离的空间-时间注意力网络,用于捕捉变化和不变性模式。然后,我们设计了空间-时间干预机制,创建多个干预分布,并使用环境推理模块来推理 latent 空间-时间环境,以降低预测结果中的差异,从而使我们的模型可以基于不变性模式进行预测,并在分布变化下保持稳定的预测能力。我们的实验结果表明,我们的方法比现有的基elines 在分布变化下表现更优异。我们的工作是动态图中首次研究空间-时间分布变化,到目前为止,这是我们所知道的。

eess.IV - 2023-11-24

Wavelength-multiplexed Multi-mode EUV Reflection Ptychography based on Automatic-Differentiation

  • paper_url: http://arxiv.org/abs/2311.14780
  • repo_url: None
  • paper_authors: Yifeng Shao, Sven Weerdenburg, Jacob Seifert, H. Paul Urbach, Allard P. Mosk, Wim Coene
  • for: This paper aims to demonstrate the potential of ptychographic extreme ultraviolet (EUV) diffractive imaging as an efficient and accurate metrology tool for the semiconductor industry.
  • methods: The paper introduces a novel algorithm that enables wavelength-multiplexed reconstruction, which enhances the measurement throughput and introduces data diversity, allowing for accurate characterization of sample structures. The algorithm uses a modal approach to represent the cross-density function of the illumination by a series of mutually incoherent and independent spatial modes.
  • results: The proposed algorithm was tested on a mainstream machine learning platform, and the results demonstrate the algorithm’s capacity to accommodate experimental uncertainties and achieve a resolution approaching the diffraction limit in reflection geometry. The reconstruction of wafer samples with 20-nm high patterned gold structures on a silicon substrate highlights the ability to handle complex physical interrelations involving a multitude of parameters.
    Abstract Ptychographic extreme ultraviolet (EUV) diffractive imaging has emerged as a promising candidate for the next-generation metrology solutions in the semiconductor industry, as it can image wafer samples in reflection geometry at the nanoscale. This technique has surged attention recently, owing to the significant progress in high-harmonic generation (HHG) EUV sources and advancements in both hardware and software for computation. In this study, a novel algorithm is introduced and tested, which enables wavelength-multiplexed reconstruction that enhances the measurement throughput and introduces data diversity, allowing the accurate characterisation of sample structures. To tackle the inherent instabilities of the HHG source, a modal approach was adopted, which represents the cross-density function of the illumination by a series of mutually incoherent and independent spatial modes. The proposed algorithm was implemented on a mainstream machine learning platform, which leverages automatic differentiation to manage the drastic growth in model complexity and expedites the computation using GPU acceleration. By optimising over 200 million parameters, we demonstrate the algorithm's capacity to accommodate experimental uncertainties and achieve a resolution approaching the diffraction limit in reflection geometry. The reconstruction of wafer samples with 20-nm heigh patterned gold structures on a silicon substrate highlights our ability to handle complex physical interrelations involving a multitude of parameters. These results establish ptychography as an efficient and accurate metrology tool.
    摘要 弹性极紫外(EUV)探测技术在半导体工业中 emerged as a promising candidate for the next-generation metrology solutions, as it can image wafer samples in reflection geometry at the nanoscale. This technique has attracted significant attention recently, due to the significant progress in high-harmonic generation (HHG) EUV sources and advancements in both hardware and software for computation. In this study, a novel algorithm is introduced and tested, which enables wavelength-multiplexed reconstruction that enhances the measurement throughput and introduces data diversity, allowing the accurate characterization of sample structures. To tackle the inherent instabilities of the HHG source, a modal approach was adopted, which represents the cross-density function of the illumination by a series of mutually incoherent and independent spatial modes. The proposed algorithm was implemented on a mainstream machine learning platform, which leverages automatic differentiation to manage the drastic growth in model complexity and expedites the computation using GPU acceleration. By optimizing over 200 million parameters, we demonstrate the algorithm's capacity to accommodate experimental uncertainties and achieve a resolution approaching the diffraction limit in reflection geometry. The reconstruction of wafer samples with 20-nm height patterned gold structures on a silicon substrate highlights our ability to handle complex physical interrelations involving a multitude of parameters. These results establish ptychography as an efficient and accurate metrology tool.

Lightweight Framework for Automated Kidney Stone Detection using coronal CT images

  • paper_url: http://arxiv.org/abs/2311.14488
  • repo_url: None
  • paper_authors: Fangyijie Wang, Guenole Silvestre, Kathleen M. Curran
  • for: 寻找病人当中的肾calculi,以提高诊断效率。
  • methods: 提出了一种轻量级的混合方案,通过对卷积神经网络进行优化,实现了高效的肾calculi检测和诊断。
  • results: 实验结果表明,该方案可以在8%的原始训练数据上达到竞争力的结果,其中F1分数为96%,false negative率为4%。同时,每个CT图像的检测时间平均为0.62秒。
    Abstract Kidney stone disease results in millions of annual visits to emergency departments in the United States. Computed tomography (CT) scans serve as the standard imaging modality for efficient detection of kidney stones. Various approaches utilizing convolutional neural networks (CNNs) have been proposed to implement automatic diagnosis of kidney stones. However, there is a growing interest in employing fast and efficient CNNs on edge devices in clinical practice. In this paper, we propose a lightweight fusion framework for kidney detection and kidney stone diagnosis on coronal CT images. In our design, we aim to minimize the computational costs of training and inference while implementing an automated approach. The experimental results indicate that our framework can achieve competitive outcomes using only 8\% of the original training data. These results include an F1 score of 96\% and a False Negative (FN) error rate of 4\%. Additionally, the average detection time per CT image on a CPU is 0.62 seconds. Reproducibility: Framework implementation and models available on GitHub.
    摘要 每年美国医院访问量因肾石病已经达到数百万次。 computed tomography(CT)扫描成为了肾石的efficient检测标准影像技术。各种利用卷积神经网络(CNN)的方法已经被提议用于自动诊断肾石。然而,随着临床实践中的edge设备应用得到越来越多的关注,快速和高效的CNN在edge设备上进行训练和推理已成为一个热点。在这篇论文中,我们提出了一种轻量级融合框架,用于肾部检测和肾石诊断在横截 CT 图像上。我们的设计目标是尽量降低训练和推理的计算成本,实现自动化的方法。实验结果表明,我们的框架可以在只使用8%的原始训练数据时达到竞争力的结果,其中F1分数为96%,False Negative(FN)错误率为4%。此外,在CPU上处理每个 CT 图像的检测时间平均为0.62秒。可重现:框架实现和模型可以在 GitHub 上找到。