eess.SP - 2023-09-27

Optimal Receive Filter Design for Misaligned Over-the-Air Computation

  • paper_url: http://arxiv.org/abs/2309.16033
  • repo_url: https://github.com/henrikhellstrom93/filteraircomp
  • paper_authors: Henrik Hellström, Saeed Razavikia, Viktoria Fodor, Carlo Fischione
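  • for: This paper studies over-the-air computation (OAC), a wireless communication method for aggregating data from many devices in dense wireless networks.
  • methods: OAC exploits signal superposition to compute functions of multiple simultaneously transmitted signals. The paper analyzes a system with unknown random time delays and phase shifts, since the time and phase alignment of the superimposed signals strongly affects the quality of function computation.
  • results: The classical matched filter is shown to be suboptimal and to bias the function estimates. The authors propose a new filter design that achieves unbiased function computation under a bound on the maximum time delay, and a Tikhonov regularization problem that yields an optimal filter for a given tradeoff between bias and noise-induced variance. When the time delays are long compared to the pulse length, the proposed filter vastly outperforms the matched filter in both bias and mean-squared error (MSE); for shorter delays it gives similar MSE with reduced bias.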
    Abstract Over-the-air computation (OAC) is a promising wireless communication method for aggregating data from many devices in dense wireless networks. The fundamental idea of OAC is to exploit signal superposition to compute functions of multiple simultaneously transmitted signals. However, the time- and phase-alignment of these superimposed signals have a significant effect on the quality of function computation. In this study, we analyze the OAC problem for a system with unknown random time delays and phase shifts. We show that the classical matched filter does not produce optimal results, and generates bias in the function estimates. To counteract this, we propose a new filter design and show that, under a bound on the maximum time delay, it is possible to achieve unbiased function computation. Additionally, we propose a Tikhonov regularization problem that produces an optimal filter given a tradeoff between the bias and noise-induced variance of the function estimates. When the time delays are long compared to the length of the transmitted pulses, our filter vastly outperforms the matched filter both in terms of bias and mean-squared error (MSE). For shorter time delays, our proposal yields similar MSE as the matched filter, while reducing the bias.
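The Tikhonov formulation mentioned in the abstract can be illustrated with a generic regularized least-squares problem. The sketch below is not the paper's filter-design problem: the matrix `A`, the target `B`, and the helper names `solve` and `tikhonov` are hypothetical, chosen only to show how the regularization weight `lam` trades data fit against the size of the solution.

```python
def solve(matrix, rhs):
    """Solve a small dense linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick the largest remaining pivot in this column for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def tikhonov(a, b, lam):
    """Minimize ||a w - b||^2 + lam * ||w||^2 via the normal equations (a^T a + lam I) w = a^T b."""
    rows, cols = len(a), len(a[0])
    gram = [
        [sum(a[k][i] * a[k][j] for k in range(rows)) + (lam if i == j else 0.0)
         for j in range(cols)]
        for i in range(cols)
    ]
    atb = [sum(a[k][i] * b[k] for k in range(rows)) for i in range(cols)]
    return solve(gram, atb)


# Hypothetical toy data: a consistent system whose exact solution is w = [1, 2].
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
B = [1.0, 2.0, 3.0]

for lam in (0.0, 1.0):
    print(lam, tikhonov(A, B, lam))
```

With `lam = 0` the sketch reduces to ordinary least squares; increasing `lam` shrinks the solution toward zero, which is the generic mechanism behind a bias versus variance tradeoff.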

Channel Estimation for Reconfigurable Intelligent Surface-Aided Multiuser Communication Systems Exploiting Statistical CSI of Correlated RIS-User Channels

  • paper_url: http://arxiv.org/abs/2309.16029
  • repo_url: None
  • paper_authors: Haochen Li, Zhiwen Pan, Bin Wang, Nan Liu, Xiaohu You
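  • for: This paper investigates channel estimation for a RIS-aided multi-user multiple-input single-output (MISO) communication system with clustered users.
  • methods: A beam domain channel model describes the correlation among RIS-user channels, a pilot reuse strategy reduces the pilot overhead and decomposes the channel estimation problem into several subproblems, and an eigenspace projection (EP) algorithm that leverages the channel correlation solves each subproblem.
  • results: Simulation results show that the proposed EP channel estimation scheme achieves accurate channel estimation with lower pilot overhead than existing schemes.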
    Abstract Reconfigurable intelligent surface (RIS) is a promising candidate technology for the upcoming Sixth Generation (6G) communication system for its ability to manipulate the wireless communication environment by controlling the coefficients of reflection elements (REs). However, since the RIS usually consists of a large number of passive REs, the pilot overhead for channel estimation in the RIS-aided system is prohibitively high. In this paper, the channel estimation problem for a RIS-aided multi-user multiple-input-single-output (MISO) communication system with clustered users is investigated. First, to describe the correlated feature for RIS-user channels, a beam domain channel model is developed for RIS-user channels. Then, a pilot reuse strategy is put forward to reduce the pilot overhead and decompose the channel estimation problem into several subproblems. Finally, by leveraging the correlated nature of RIS-user channels, an eigenspace projection (EP) algorithm is proposed to solve each subproblem respectively. Simulation results show that the proposed EP channel estimation scheme can achieve accurate channel estimation with lower pilot overhead than existing schemes.

Bridging the complexity gap in Tbps-achieving THz-band baseband processing

  • paper_url: http://arxiv.org/abs/2309.16027
  • repo_url: None
  • paper_authors: Hadi Sarieddeen, Hakim Jemaa, Simon Tarboush, Christoph Studer, Mohamed-Slim Alouini, Tareq Y. Al-Naffouri
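  • for: This paper addresses the baseband processing complexity of turning the hundreds of GHz of bandwidth available at terahertz (THz) frequencies into terabit-per-second (Tbps) data rates.
  • methods: The paper emphasizes parallelization in signal processing, particularly for channel code decoding.
  • results: By leveraging structured subspaces of THz channels, the authors propose mapping bits to transmission resources using shorter code words, which extends parallelizability across all baseband processing blocks. The quasi-deterministic frequency, time, and space structure of THz channels enables efficient parallel bit mapping at the source and provides pseudo-soft bit reliability information for detection and decoding at the receiver.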
    Abstract Recent advances in electronic and photonic technologies have allowed efficient signal generation and transmission at terahertz (THz) frequencies. However, as the gap in THz-operating devices narrows, the demand for terabit-per-second (Tbps)-achieving circuits is increasing. Translating the available hundreds of gigahertz (GHz) of bandwidth into a Tbps data rate requires processing thousands of information bits per clock cycle at state-of-the-art clock frequencies of digital baseband processing circuitry of a few GHz. This paper addresses these constraints and emphasizes the importance of parallelization in signal processing, particularly for channel code decoding. By leveraging structured sub-spaces of THz channels, we propose mapping bits to transmission resources using shorter code words, extending parallelizability across all baseband processing blocks. THz channels exhibit quasi-deterministic frequency, time, and space structures that enable efficient parallel bit mapping at the source and provide pseudo-soft bit reliability information for efficient detection and decoding at the receiver.
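The clock-cycle argument in the abstract is simple arithmetic, and a rough order-of-magnitude check makes it concrete. The target rates and the 2 GHz clock below are illustrative assumptions, not figures from the paper, and the function name is hypothetical.

```python
def bits_per_clock_cycle(data_rate_bps: float, clock_hz: float) -> float:
    """Information bits that must be processed in every clock cycle to sustain the data rate."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return data_rate_bps / clock_hz


TBPS = 1e12  # bits per second in one Tbps
GHZ = 1e9    # cycles per second in one GHz

# Assumed baseband clock of 2 GHz and a few assumed target rates.
for rate_tbps in (1, 4, 10):
    bits = bits_per_clock_cycle(rate_tbps * TBPS, 2 * GHZ)
    print(f"{rate_tbps} Tbps at 2 GHz -> {bits:.0f} bits per cycle")
```

Since a single processing pipeline handles far fewer bits per cycle than this, the required throughput has to come from parallel processing, which is the motivation the abstract gives for shorter code words.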

Statistical CSI Based Beamforming for Reconfigurable Intelligent Surface Aided MISO Systems with Channel Correlation

  • paper_url: http://arxiv.org/abs/2309.16026
  • repo_url: None
  • paper_authors: Haochen Li, Zhiwen Pan, Bin Wang, Nan Liu, Xiaohu You
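  • for: This paper studies joint beamforming at the AP and the RIS in a RIS-aided multiple-input single-output (MISO) multi-user downlink system with correlated channels, with the goal of maximizing the sum ergodic spectral efficiency (ESE) of all users.
  • methods: Statistical channel state information (S-CSI) is used in place of instantaneous channel state information (I-CSI), an ESE approximation makes the problem tractable, and two joint beamforming algorithms are presented: singular value decomposition-gradient descent (SVD-GD) and fractional programming-gradient descent (FP-GD).
  • results: Simulation results show the effectiveness of the proposed algorithms and validate that a 2-bit quantizer is enough to implement the RIS phase shifts.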
    Abstract Reconfigurable intelligent surface (RIS) is a promising candidate technology for the upcoming Sixth Generation (6G) communication system because of its ability to provide unprecedented gains in spectral and energy efficiency through passive beamforming. However, it is challenging to obtain instantaneous channel state information (I-CSI) for the RIS, which obliges us to use statistical channel state information (S-CSI) to achieve passive beamforming. In this paper, a RIS-aided multiple-input single-output (MISO) multi-user downlink communication system with correlated channels is investigated. We formulate the problem of joint beamforming design at the AP and RIS to maximize the sum ergodic spectral efficiency (ESE) of all users and thereby improve the network capacity. Since the sum ESE is too hard to compute, an ESE approximation is adopted to reformulate the problem into a more tractable form. We then present two joint beamforming algorithms, namely the singular value decomposition-gradient descent (SVD-GD) algorithm and the fractional programming-gradient descent (FP-GD) algorithm. Simulation results show the effectiveness of our proposed algorithms and validate that a 2-bit quantizer is enough for implementing the RIS phase shifts.
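To make the 2-bit quantizer concrete, the sketch below maps a continuous phase shift to the nearest of four uniformly spaced levels. Uniform spacing over [0, 2π) and the function names are assumptions for illustration; the abstract does not specify the quantizer design.

```python
import math


def quantize_phase(theta: float, bits: int = 2) -> float:
    """Map a phase in radians to the nearest of 2**bits uniform levels in [0, 2*pi)."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    # Wrap into [0, 2*pi), round to the nearest level, and wrap the top level back to 0.
    index = round((theta % (2 * math.pi)) / step) % levels
    return index * step


def phase_error(theta: float, bits: int = 2) -> float:
    """Absolute circular distance between a phase and its quantized value."""
    diff = (theta - quantize_phase(theta, bits)) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)


for theta in (0.1, math.pi / 2 + 0.1, 2 * math.pi - 0.1):
    print(f"{theta:.3f} rad -> {quantize_phase(theta):.3f} rad")
```

With uniform levels, the quantization error is at most half a step, which is π/4 for a 2-bit quantizer.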

Linear Progressive Coding for Semantic Communication using Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2309.15959
  • repo_url: None
  • paper_authors: Eva Riherd, Raghu Mudumbai, Weiyu Xu
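  • for: The method targets semantic communication, i.e., transferring semantic information over noisy channels.
  • methods: The method uses progressive, hierarchical coding: part of the semantic information is first encoded into a coarse representation of the data, which is then refined by additional encodings that add further semantic information.
  • results: Experiments on the MNIST and CIFAR-10 datasets show that progressive semantic coding provides timely previews of semantic information from a small number of initial measurements, with overall accuracy and efficiency comparable to non-progressive methods.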
    Abstract We propose a general method for semantic representation of images and other data using progressive coding. Semantic coding allows for specific pieces of information to be selectively encoded into a set of measurements that can be highly compressed compared to the size of the original raw data. We consider a hierarchical method of coding where a partial amount of semantic information is first encoded into a coarse representation of the data, which is then refined by additional encodings that add additional semantic information. Such hierarchical coding is especially well-suited for semantic communication, i.e., transferring semantic information over noisy channels. Our proposed method can be considered as a generalization of both progressive image compression and source coding for semantic communication. We present results from experiments on the MNIST and CIFAR-10 datasets that show that progressive semantic coding can provide timely previews of semantic information with a small number of initial measurements while achieving overall accuracy and efficiency comparable to non-progressive methods.

IEEE 802.11be Wi-Fi 7: Feature Summary and Performance Evaluation

  • paper_url: http://arxiv.org/abs/2309.15951
  • repo_url: None
  • paper_authors: Xiaoqian Liu, Yuhan Dong, Yiqing Li, Yousi Lin, Xun Yang, Ming Gan
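  • for: This article introduces the main objectives and timeline of Wi-Fi 7 (IEEE 802.11be, Extremely High Throughput) and the techniques behind its performance improvements, including support for real-time applications.
  • methods: The article lists the latest key techniques that improve Wi-Fi 7 performance and evaluates them with system-level simulations.
  • results: System-level simulation results suggest that, by combining the new techniques, Wi-Fi 7 achieves 30 Gbps throughput and lower latency than Wi-Fi 6.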
    Abstract While the pace of commercial scale application of Wi-Fi 6 accelerates, the IEEE 802.11 Working Group is about to complete the development of a new amendment standard IEEE 802.11be -- Extremely High Throughput (EHT), also known as Wi-Fi 7, which can be used to meet the demand for the throughput of 4K/8K videos up to tens of Gbps and low-latency video applications such as virtual reality (VR) and augmented reality (AR). Wi-Fi 7 not only scales Wi-Fi 6 with doubled bandwidth, but also supports real-time applications, which brings revolutionary changes to Wi-Fi. In this article, we start by introducing the main objectives and timeline of Wi-Fi 7 and then list the latest key techniques which promote the performance improvement of Wi-Fi 7. Finally, we validate the most critical objectives of Wi-Fi 7 -- the potential up to 30 Gbps throughput and lower latency. System-level simulation results suggest that by combining the new techniques, Wi-Fi 7 achieves 30 Gbps throughput and lower latency than Wi-Fi 6.

Quantum computer-enabled receivers for optical communication

  • paper_url: http://arxiv.org/abs/2309.15914
  • repo_url: None
  • paper_authors: John Crossman, Spencer Dimitroff, Lukasz Cincio, Mohan Sarovar
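  • for: This paper proposes quantum computer-enabled receivers for optical communication that perform joint detection of multiple pulses.
  • methods: Phase information from coherent optical pulses is transferred to superconducting qubit states by optomechanical transduction, and trained short-depth variational quantum circuits then perform joint detection of communication codewords.
  • results: The receiver achieves error probabilities that surpass all classical, individual pulse detection receivers. A transduction model that captures non-idealities such as thermal noise and loss is used to determine the transduction performance needed for a quantum advantage. Running the trained circuits on an IBM-Q device, with the modeled transduced states as input, demonstrates that a quantum advantage is possible even with current levels of quantum computing hardware noise.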
    Abstract Optical communication is the standard for high-bandwidth information transfer in today's digital age. The increasing demand for bandwidth has led to the maturation of coherent transceivers that use phase- and amplitude-modulated optical signals to encode more bits of information per transmitted pulse. Such encoding schemes achieve higher information density, but also require more complicated receivers to discriminate the signaling states. In fact, achieving the ultimate limit of optical communication capacity, especially in the low light regime, requires coherent joint detection of multiple pulses. Despite their superiority, such joint detection receivers are not in widespread use because of the difficulty of constructing them in the optical domain. In this work we describe how optomechanical transduction of phase information from coherent optical pulses to superconducting qubit states followed by the execution of trained short-depth variational quantum circuits can perform joint detection of communication codewords with error probabilities that surpass all classical, individual pulse detection receivers. Importantly, we utilize a model of optomechanical transduction that captures non-idealities such as thermal noise and loss in order to understand the transduction performance necessary to achieve a quantum advantage with such a scheme. We also execute the trained variational circuits on an IBM-Q device with the modeled transduced states as input to demonstrate that a quantum advantage is possible even with current levels of quantum computing hardware noise.

Differentiable Machine Learning-Based Modeling for Directly-Modulated Lasers

  • paper_url: http://arxiv.org/abs/2309.15747
  • repo_url: None
  • paper_authors: Sergio Hernandez, Ognjen Jovanovic, Christophe Peucheret, Francesco Da Ros, Darko Zibar
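  • for: This paper develops and compares differentiable machine learning-based surrogate models of directly modulated lasers (DMLs) in the large-signal regime, so that DML-based links can be optimized with end-to-end learning.
  • methods: The surrogate models are assessed quantitatively in terms of root mean square error and training/testing time, and are then tested in a numerical equalization setup resembling a practical end-to-end scenario.
  • results: The convolutional attention transformer outperforms the other models considered.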
    Abstract End-to-end learning has become a popular method for joint transmitter and receiver optimization in optical communication systems. Such an approach may require a differentiable channel model, thus hindering the optimization of links based on directly modulated lasers (DMLs). This is due to the DML behavior in the large-signal regime, for which no analytical solution is available. In this paper, this problem is addressed by developing and comparing differentiable machine learning-based surrogate models. The models are quantitatively assessed in terms of root mean square error and training/testing time. Once the models are trained, the surrogates are then tested in a numerical equalization setup, resembling a practical end-to-end scenario. Based on the numerical investigation conducted, the convolutional attention transformer is shown to outperform the other models considered.

Energy-Saving Cell-Free Massive MIMO Precoders with a Per-AP Wideband Kronecker Channel Model

  • paper_url: http://arxiv.org/abs/2309.15658
  • repo_url: None
  • paper_authors: Emanuele Peschiera, Xavier Mestre, François Rottenberg
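  • for: This paper studies cell-free massive multiple-input multiple-output (MIMO) precoders that minimize the power consumed by the power amplifiers subject to per-user per-subcarrier rate constraints.
  • methods: Random matrix theory is used to obtain each antenna power as the solution to a fixed-point equation that depends only on the second-order statistics of the channel, rather than on the instantaneous channel coefficients.
  • results: Numerical simulations confirm the accuracy of the asymptotic approximation and show that a subset of access points should be turned off to save power, while all antennas of the active access points are used with uniform power. This reduces the consumed power by up to a factor of 9 in low-load scenarios.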
    Abstract We study cell-free massive multiple-input multiple-output precoders that minimize the power consumed by the power amplifiers subject to per-user per-subcarrier rate constraints. The power at each antenna is generally retrieved by solving a fixed-point equation that depends on the instantaneous channel coefficients. Using random matrix theory, we retrieve each antenna power as the solution to a fixed-point equation that depends only on the second-order statistics of the channel. Numerical simulations prove the accuracy of our asymptotic approximation and show how a subset of access points should be turned off to save power consumption, while all the antennas of the active access points are utilized with uniform power across them. This mechanism reduces the consumed power by up to a factor of $9\times$ in low-load scenarios.

Design and Optimization of Residual Neural Network Accelerators for Low-Power FPGAs Using High-Level Synthesis

  • paper_url: http://arxiv.org/abs/2309.15631
  • repo_url: None
  • paper_authors: Filippo Minnella, Teodoro Urso, Mihai T. Lazarescu, Luciano Lavagno
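  • for: This work implements deep learning models on field-programmable gate arrays (FPGAs), with a design flow optimized for residual neural networks (ResNets) and their skip connections.
  • methods: The work uses a high-level synthesis (HLS)-based flow with a set of design principles and optimization strategies, including a strategy that reduces the buffering overhead of the residual block.
  • results: ResNet8 and ResNet20 are implemented on Xilinx FPGAs and evaluated on the CIFAR-10 dataset. Compared to the state of the art on the Kria KV260 board, the ResNet20 implementation achieves a 2.88X speedup with 0.5% higher accuracy (91.3%), and ResNet8 accuracy improves by 2.8% to 88.7%.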
    Abstract Residual neural networks are widely used in computer vision tasks. They enable the construction of deeper and more accurate models by mitigating the vanishing gradient problem. Their main innovation is the residual block which allows the output of one layer to bypass one or more intermediate layers and be added to the output of a later layer. Their complex structure and the buffering required by the residual block make them difficult to implement on resource-constrained platforms. We present a novel design flow for implementing deep learning models for field programmable gate arrays optimized for ResNets, using a strategy to reduce their buffering overhead to obtain a resource-efficient implementation of the residual layer. Our high-level synthesis (HLS)-based flow encompasses a thorough set of design principles and optimization strategies, exploiting in novel ways standard techniques such as temporal reuse and loop merging to efficiently map ResNet models, and potentially other skip connection-based NN architectures, into FPGA. The models are quantized to 8-bit integers for both weights and activations, 16-bit for biases, and 32-bit for accumulations. The experimental results are obtained on the CIFAR-10 dataset using ResNet8 and ResNet20 implemented with Xilinx FPGAs using HLS on the Ultra96-V2 and Kria KV260 boards. Compared to the state-of-the-art on the Kria KV260 board, our ResNet20 implementation achieves 2.88X speedup with 0.5% higher accuracy of 91.3%, while ResNet8 accuracy improves by 2.8% to 88.7%. The throughputs of ResNet8 and ResNet20 are 12971 FPS and 3254 FPS on the Ultra96 board, and 30153 FPS and 7601 FPS on the Kria KV260, respectively. They Pareto-dominate state-of-the-art solutions concerning accuracy, throughput, and energy.
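The bit widths named in the abstract (8-bit weights and activations, 32-bit accumulations) can be illustrated with a small integer multiply-accumulate. This is a minimal sketch assuming symmetric quantization with power-of-two scales; the scales, the example values, and the function names are hypothetical and are not taken from the paper's HLS flow.

```python
def quantize(values, scale, bits=8):
    """Symmetric quantization: round(value / scale), clamped to the signed integer range."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [max(lo, min(hi, round(v / scale))) for v in values]


def int_dot(weights_q, acts_q, bias_q=0):
    """Integer multiply-accumulate; the running sum must fit a 32-bit signed accumulator."""
    acc = bias_q  # bias is assumed to be expressed already in the accumulator scale
    for w, a in zip(weights_q, acts_q):
        acc += w * a
        assert -(2 ** 31) <= acc < 2 ** 31, "32-bit accumulator overflow"
    return acc


W_SCALE = 1 / 64  # assumed weight scale
A_SCALE = 1 / 16  # assumed activation scale

weights_q = quantize([0.5, -0.25, 0.125], W_SCALE)  # 8-bit weights
acts_q = quantize([1.0, 2.0, -1.0], A_SCALE)        # 8-bit activations

acc = int_dot(weights_q, acts_q)
print(acc, acc * W_SCALE * A_SCALE)  # integer accumulator and its real-valued equivalent
```

Products of two 8-bit values need up to 16 bits, and summing many of them is what motivates the wider accumulator.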

Approximate Message Passing with Rigorous Guarantees for Pooled Data and Quantitative Group Testing

  • paper_url: http://arxiv.org/abs/2309.15507
  • repo_url: None
  • paper_authors: Nelvin Tan, Jonathan Scarlett, Ramji Venkataramanan
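  • for: This paper addresses the pooled data problem: identifying the categories of a large collection of items from a sequence of pooled tests.
  • methods: An approximate message passing (AMP) algorithm estimates the categories, and its performance is rigorously characterized in both the noiseless and noisy settings.
  • results: In the noiseless setting, the AMP algorithm is equivalent to one recently proposed by El Alaoui et al. For quantitative group testing (QGT), the two-category case, the AMP guarantees give precise limiting values of the false positive rate and the false negative rate. Simulations indicate that AMP outperforms the convex programming estimator for a range of QGT scenarios, while the convex program performs better for pooled data with three categories.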
    Abstract In the pooled data problem, the goal is to identify the categories associated with a large collection of items via a sequence of pooled tests. Each pooled test reveals the number of items of each category within the pool. We study an approximate message passing (AMP) algorithm for estimating the categories and rigorously characterize its performance, in both the noiseless and noisy settings. For the noiseless setting, we show that the AMP algorithm is equivalent to one recently proposed by El Alaoui et al. Our results provide a rigorous version of their performance guarantees, previously obtained via non-rigorous techniques. For the case of pooled data with two categories, known as quantitative group testing (QGT), we use the AMP guarantees to compute precise limiting values of the false positive rate and the false negative rate. Though the pooled data problem and QGT are both instances of estimation in a linear model, existing AMP theory cannot be directly applied since the design matrices are binary valued. The key technical ingredient in our result is a rigorous analysis of AMP for generalized linear models defined via generalized white noise design matrices. This result, established using a recent universality result of Wang et al., is of independent interest. Our theoretical results are validated by numerical simulations. For comparison, we propose estimators based on convex relaxation and iterative thresholding, without providing theoretical guarantees. Our simulations indicate that AMP outperforms the convex programming estimator for a range of QGT scenarios, but the convex program performs better for pooled data with three categories.
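The measurement model described in the abstract, where each pooled test reveals how many items of each category are in the pool, can be sketched as a small simulation. The Bernoulli pool membership, the default parameters, and the function names are assumptions for illustration; the sketch only generates test outcomes and does not implement AMP.

```python
import random
from collections import Counter


def pooled_test(categories, pool):
    """Return a Counter with the number of items of each category inside the pool."""
    return Counter(categories[i] for i in pool)


def simulate(num_items=10, num_categories=2, num_tests=4, pool_prob=0.5, seed=0):
    """Draw random item categories and noiseless pooled test outcomes."""
    rng = random.Random(seed)
    categories = [rng.randrange(num_categories) for _ in range(num_items)]
    results = []
    for _ in range(num_tests):
        # Binary design: each item joins the pool independently with probability pool_prob.
        pool = [i for i in range(num_items) if rng.random() < pool_prob]
        counts = pooled_test(categories, pool)
        results.append((pool, [counts.get(c, 0) for c in range(num_categories)]))
    return categories, results


categories, results = simulate()
for pool, counts in results:
    print(f"pool={pool} counts per category={counts}")
```

With `num_categories=2` this corresponds to the quantitative group testing setting, where each test effectively reports how many items in the pool belong to one of the two categories.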

Formation Wing-Beat Modulation (FWM): A Tool for Quantifying Bird Flocks Using Radar Micro-Doppler Signals

  • paper_url: http://arxiv.org/abs/2309.15415
  • repo_url: None
  • paper_authors: Jiangkun Gong, Jun Yan, Deyong Kong, Ruizhi Chen, Deren Li
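  • for: Quantifying the number of birds in a flock and studying bird flight behavior.
  • methods: X-band radar observations of the formation wing-beat modulation (FWM) effect. The micro-Doppler modulation in the radar echoes is related to the bird number, wing-beat frequency, and flight phasing strategy.
  • results: FWM signals are observed in radar signals of a seagull flock, providing tools for quantifying the bird number and estimating the mean wingbeat rate. The finding could aid research in radar ornithology and aero-ecology.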
    Abstract Radar echoes from bird flocks contain modulation signals, which we find are produced by the flapping gaits of birds in the flock, resulting in a group of spectral peaks with similar amplitudes spaced at a specific interval. We call this the formation wing-beat modulation (FWM) effect. FWM signals are micro-Doppler modulated by flapping wings and are related to the bird number, wing-beat frequency, and flight phasing strategy. Our X-band radar data show that FWM signals exist in radar signals of a seagull flock, providing tools for quantifying the bird number and estimating the mean wingbeat rate of birds. This new finding could aid in research on the quantification of bird migration numbers and estimation of bird flight behavior in radar ornithology and aero-ecology.

An Exploration of Optimal Parameters for Efficient Blind Source Separation of EEG Recordings Using AMICA

  • paper_url: http://arxiv.org/abs/2309.15388
  • repo_url: None
  • paper_authors: Gwenevere Frank, Seyed Yahya Shirazi, Jason Palmer, Gert Cauwenberghs, Scott Makeig, Arnaud Delorme
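  • for: This paper studies how to choose AMICA parameters for independent component analysis (ICA) decomposition of electroencephalography (EEG) recordings.
  • methods: AMICA decompositions are run on data from a collection of subjects while varying certain key parameters. AMICA exposes many parameters that allow precise control of the decomposition.
  • results: Running time and decomposition quality are analyzed using two metrics, Pairwise Mutual Information (PMI) and Mutual Information Reduction (MIR), and recommendations for selecting starting values for parameters are presented.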
    Abstract EEG continues to find a multitude of uses in both neuroscience research and medical practice, and independent component analysis (ICA) continues to be an important tool for analyzing EEG. A multitude of ICA algorithms for EEG decomposition exist, and in the past, their relative effectiveness has been studied. AMICA is considered the benchmark against which to compare the performance of other ICA algorithms for EEG decomposition. AMICA exposes many parameters to the user to allow for precise control of the decomposition. However, several of the parameters currently tend to be set according to "rules of thumb" shared in the EEG community. Here, AMICA decompositions are run on data from a collection of subjects while varying certain key parameters. The running time and quality of decompositions are analyzed based on two metrics: Pairwise Mutual Information (PMI) and Mutual Information Reduction (MIR). Recommendations for selecting starting values for parameters are presented.
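The mutual information underlying both metrics can be illustrated for a single pair of signals. The sketch below is a plug-in estimate for discrete-valued sequences and is an assumption for illustration; the abstract does not say how PMI is estimated on continuous EEG data, and the function name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two equal-length discrete sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum p(a, b) * log(p(a, b) / (p(a) * p(b))) over the observed pairs.
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))  # identical sequences
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # empirically independent sequences
```

Lower mutual information between pairs of components indicates that they are closer to statistically independent, which is the goal of an ICA decomposition.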