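results: The paper's experiments indicate that at least a 30% BD-rate reduction relative to the intra prediction mode of the VVC codec is achievable, suggesting that there is still great potential for improving lossy image compression.
Abstract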
Recent work has shown that Variational Autoencoders (VAEs) can be used to upper-bound the information rate-distortion (R-D) function of images, i.e., the fundamental limit of lossy image compression. In this paper, we report an improved upper bound on the R-D function of images, obtained by (1) introducing a new VAE model architecture, (2) applying variable-rate compression techniques, and (3) proposing a novel \ourfunction{} to stabilize training. We demonstrate that at least a 30\% BD-rate reduction w.r.t. the intra prediction mode of the VVC codec is achievable, suggesting that there is still great potential for improving lossy image compression. Code is made publicly available at https://github.com/duanzhiihao/lossy-vae.
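Summary (translated from Chinese)
Recent research shows that variational autoencoders (VAEs) can be used to upper-bound the information rate-distortion (R-D) function of images, that is, the fundamental limit of lossy image compression. In this paper we report a new VAE model architecture, the application of variable-rate compression techniques, and a new \ourfunction{} that stabilizes training. We show that at least a 30% BD-rate reduction relative to the intra prediction mode of the VVC codec is achievable, which indicates considerable remaining room for improving lossy image compression. Code is publicly available at https://github.com/duanzhiihao/lossy-vae.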
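The 30% figure above is a BD-rate, i.e., an average bitrate difference between two rate-distortion curves at equal quality. The sketch below illustrates the idea only and is not the paper's evaluation code. The standard Bjøntegaard calculation fits a cubic polynomial to log-rate as a function of PSNR; this sketch swaps in piecewise-linear interpolation to stay dependency-free. The function names and the sample R-D points are hypothetical.

```python
import math

def _integrate(xs, ys, lo, hi):
    """Integrate the piecewise-linear curve through (xs, ys) over [lo, hi]."""
    def value(x):
        for i in range(len(xs) - 1):
            x0, x1 = xs[i], xs[i + 1]
            if x0 <= x <= x1:
                return ys[i] + (ys[i + 1] - ys[i]) * (x - x0) / (x1 - x0)
        raise ValueError("x lies outside the curve")
    # Integration break points: interval ends plus every knot strictly inside.
    knots = [lo] + [x for x in xs if lo < x < hi] + [hi]
    return sum(0.5 * (value(a) + value(b)) * (b - a)
               for a, b in zip(knots, knots[1:]))

def bd_rate(anchor, test):
    """Average bitrate change (percent) of `test` relative to `anchor`.

    Both arguments are lists of (bitrate, psnr) points with distinct PSNR
    values. Negative results mean `test` needs fewer bits at equal PSNR.
    Assumption: piecewise-linear interpolation replaces the usual cubic fit.
    """
    a = sorted(anchor, key=lambda p: p[1])
    t = sorted(test, key=lambda p: p[1])
    a_q, a_lr = [p[1] for p in a], [math.log(p[0]) for p in a]
    t_q, t_lr = [p[1] for p in t], [math.log(p[0]) for p in t]
    lo, hi = max(a_q[0], t_q[0]), min(a_q[-1], t_q[-1])
    if hi <= lo:
        raise ValueError("the two curves share no PSNR range")
    avg_log_diff = (_integrate(t_q, t_lr, lo, hi)
                    - _integrate(a_q, a_lr, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0

# Hypothetical R-D points: the test codec uses 70% of the anchor's bits
# at every PSNR level, so the BD-rate is approximately -30.
anchor = [(0.2, 30.0), (0.4, 33.0), (0.8, 36.0), (1.6, 39.0)]
test = [(0.7 * rate, psnr) for rate, psnr in anchor]
print(round(bd_rate(anchor, test), 6))
```

Averaging in the log-rate domain is what makes the result a relative (percentage) saving rather than an absolute bitrate difference.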
Fast and High-Performance Learned Image Compression With Improved Checkerboard Context Model, Deformable Residual Module, and Knowledge Distillation
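results: Compared with the state-of-the-art learned image coding scheme, the method is about 20 times faster in encoding and 70-90 times faster in decoding, with R-D performance that is $2.3\%$ higher; it also achieves better PSNR and MS-SSIM than H.266/VVC-intra (4:4:4) and some leading learned schemes on the Kodak and Tecnick-40 datasets.
Abstract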
Deep learning-based image compression has made great progress recently. However, many leading schemes use a serial context-adaptive entropy model to improve rate-distortion (R-D) performance, which is very slow. In addition, the encoding and decoding networks are highly complex and unsuitable for many practical applications. In this paper, we introduce four techniques to balance the trade-off between complexity and performance. First, we are the first to introduce a deformable convolutional module into a compression framework, which can remove more redundancy from the input image, thereby enhancing compression performance. Second, we design a checkerboard context model with two separate distribution parameter estimation networks and different probability models, which enables parallel decoding without sacrificing performance compared to the sequential context-adaptive model. Third, we develop an improved three-step knowledge distillation and training scheme to achieve different trade-offs between the complexity and the performance of the decoder network; it transfers both the final and intermediate results of the teacher network to the student network to help its training. Fourth, we introduce $L_{1}$ regularization to make the latent representation sparser. We then encode only the non-zero channels during encoding and decoding, which greatly reduces the encoding and decoding time. Experiments show that, compared to the state-of-the-art learned image coding scheme, our method can be about 20 times faster in encoding and 70-90 times faster in decoding, and our R-D performance is also $2.3 \%$ higher. Our method outperforms the traditional approach in H.266/VVC-intra (4:4:4) and some leading learned schemes in terms of PSNR and MS-SSIM when tested on the Kodak and Tecnick-40 datasets.
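Summary (translated from Chinese)
Deep learning-based image compression has advanced considerably in recent years. However, many leading schemes still rely on a serial context-adaptive entropy model to improve rate-distortion (R-D) performance, which is slow. The encoding and decoding networks are also highly complex and unsuitable for many practical applications. This paper proposes four techniques to balance complexity against performance. First, it is the first to introduce a deformable convolutional module into a compression framework, which removes more redundancy from the input image and so improves compression performance. Second, it designs a checkerboard context model that uses two separate distribution parameter estimation networks and different probability models, enabling parallel decoding with performance comparable to the sequential context-adaptive model. Third, it develops an improved three-step knowledge distillation and training scheme that offers different trade-offs between complexity and performance. Finally, it introduces L1 regularization to make the values of the latent representation sparser, and then encodes only the non-zero channels, which greatly reduces encoding and decoding time. Experiments show that, compared with the current best learned image coding scheme, the method is about 20 times faster in encoding and 70-90 times faster in decoding, while R-D performance improves by 2.3%. When tested on the Kodak and Tecnick-40 datasets, the method achieves higher PSNR and MS-SSIM than H.266/VVC-intra (4:4:4) and some leading learned image coding schemes.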
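To illustrate why a checkerboard context model permits parallel decoding, the sketch below splits a latent grid into two passes. It assumes, for illustration only, that anchor positions are those where row + column is even and that the context of a non-anchor is its 4-connected neighbourhood; the paper's actual mask shape, its two parameter estimation networks, and its probability models are not reproduced here. All function names are hypothetical.

```python
def checkerboard_mask(height, width):
    """Grid of booleans; True marks anchor positions (decoded in pass 1)."""
    return [[(r + c) % 2 == 0 for c in range(width)] for r in range(height)]

def decoding_passes(height, width):
    """Split grid positions into the two decoding passes."""
    mask = checkerboard_mask(height, width)
    cells = [(r, c) for r in range(height) for c in range(width)]
    anchors = [(r, c) for r, c in cells if mask[r][c]]
    non_anchors = [(r, c) for r, c in cells if not mask[r][c]]
    return anchors, non_anchors

def context_neighbors(r, c, height, width):
    """4-connected neighbours of (r, c) that lie inside the grid.

    Assumption: a 4-neighbourhood stands in for the real context window.
    """
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(i, j) for i, j in candidates
            if 0 <= i < height and 0 <= j < width]

# Every neighbour of a non-anchor is an anchor, so once pass 1 has finished,
# all non-anchors can be decoded at the same time in pass 2.
height, width = 4, 6
mask = checkerboard_mask(height, width)
anchors, non_anchors = decoding_passes(height, width)
independent = all(mask[i][j]
                  for r, c in non_anchors
                  for i, j in context_neighbors(r, c, height, width))
print(len(anchors), len(non_anchors), independent)
```

A serial context model, by contrast, conditions each position on previously decoded positions in raster order, which forces one decoding step per position instead of two passes in total.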
An automated, high-resolution phenotypic assay for adult Brugia malayi and microfilaria
results: The study reports that the three anthelmintic drugs tested reduce the motility of adult Brugia malayi, with differing mechanisms and effects.
Abstract
Brugia malayi are thread-like parasitic worms and one of the etiological agents of lymphatic filariasis (LF). Existing anthelmintic drugs to treat LF are effective in reducing the larval microfilaria (mf) counts in the human bloodstream but are less effective on adult parasites. To test potential drug candidates, we report a multi-parameter phenotypic assay based on tracking the motility of adult B. malayi and mf in vitro. For adult B. malayi, motility is characterized by the centroid velocity, path curvature, angular velocity, eccentricity, extent, and Euler number. These parameters are evaluated in experiments with three anthelmintic drugs. For B. malayi mf, motility is extracted from the evolving body skeleton to yield positional data and bending angles at 74 key points. We achieved high-fidelity tracking of complex worm postures (self-occlusions, omega turns, body bending, and reversals) while providing a visual representation of pose estimates and behavioral attributes on both spatial and temporal scales.
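Summary (translated from Chinese)
Brugia malayi are thread-like parasitic worms and one of the etiological agents of lymphatic filariasis (LF). Existing anthelmintic drugs reduce the number of larval microfilariae in human blood but are less effective against adult worms. To test potential drug candidates, the paper reports a multi-parameter phenotypic assay based on tracking the motility of adult B. malayi and microfilariae. The motility of adult B. malayi is characterized by centroid velocity, path curvature, angular velocity, eccentricity, extent, and Euler number. These parameters are evaluated in experiments with three anthelmintic drugs. The motility of B. malayi microfilariae is extracted from the evolving body skeleton to obtain positional data and bending angles. The assay achieves high-fidelity tracking of complex worm postures (self-occlusions, omega turns, body bending, and reversals) and provides a visual representation of pose estimates and behavioral attributes on both spatial and temporal scales.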
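Three of the adult-worm parameters listed above (centroid velocity, angular velocity, and path curvature) can be derived from a sequence of tracked centroid positions. The sketch below uses common finite-difference definitions as an assumption; the paper's exact formulas, units, and smoothing are not given in the abstract, and the function name is hypothetical. Eccentricity, extent, and Euler number describe the segmented worm shape rather than the centroid path and are not computed here.

```python
import math

def motility_features(centroids, dt):
    """Finite-difference motility features from a centroid track.

    centroids: list of (x, y) positions, one per frame.
    dt: time between consecutive frames, in seconds.
    Returns (speeds, angular_velocities, curvatures).
    """
    steps = [(x1 - x0, y1 - y0)
             for (x0, y0), (x1, y1) in zip(centroids, centroids[1:])]
    lengths = [math.hypot(dx, dy) for dx, dy in steps]
    speeds = [d / dt for d in lengths]            # distance per unit time
    headings = [math.atan2(dy, dx) for dx, dy in steps]

    turns = []
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap the heading change into [-pi, pi) to avoid jumps at +/- pi.
        turns.append((h1 - h0 + math.pi) % (2 * math.pi) - math.pi)

    angular_velocities = [t / dt for t in turns]  # radians per unit time
    curvatures = []
    for t, l0, l1 in zip(turns, lengths, lengths[1:]):
        arc = 0.5 * (l0 + l1)                     # mean step length
        curvatures.append(t / arc if arc > 0 else 0.0)
    return speeds, angular_velocities, curvatures

# Hypothetical track: a centroid moving around a unit circle, so the
# estimated curvature should be close to 1 (the reciprocal of the radius).
track = [(math.cos(0.1 * k), math.sin(0.1 * k)) for k in range(10)]
speeds, omegas, kappas = motility_features(track, dt=0.5)
print(speeds[0], omegas[0], kappas[0])
```

Curvature here is the heading change per unit path length, so it stays the same if the worm traces the same path faster, whereas angular velocity scales with speed.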
Duration-adaptive Video Highlight Pre-caching for Vehicular Communication Network
results: Simulations based on real-world video datasets show that the proposed method significantly improves highlight entropy and jitter compared to benchmark schemes.
Abstract
Video traffic in vehicular communication networks (VCNs) faces exponential growth. However, different segments of most videos differ in attractiveness to viewers, and the pre-caching decision is greatly affected by the dynamic service duration over which edge nodes can serve mobile vehicles driving along a road. In this paper, we propose an efficient video highlight pre-caching scheme for the vehicular communication network that adapts to the service duration. Specifically, a highlight entropy model is devised that considers the segments' popularity and the continuity between segments within a period of time; based on this model, an optimization problem of video highlight pre-caching is formulated. As this problem is non-convex and lacks a closed-form expression of the objective function, we decouple multiple variables by deriving candidate highlight segmentations of videos through wavelet transform, which can significantly reduce the complexity of highlight pre-caching. The problem is then solved iteratively by a highlight-direction trimming algorithm, which is proven to be locally optimal. Simulation results based on real-world video datasets demonstrate significant improvement in highlight entropy and jitter compared to benchmark schemes.
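Summary (translated from Chinese)
Video traffic in vehicular communication networks (VCNs) is growing exponentially. However, different video segments differ in their appeal to viewers, and the service duration that edge nodes can offer to moving vehicles varies, which strongly affects pre-caching decisions. This paper proposes an efficient video highlight pre-caching scheme that adapts to changes in service duration. Specifically, it develops a highlight entropy model that accounts for the popularity of video segments and their continuity over time, and uses this model to formulate pre-caching as an optimization problem. Because the problem is non-convex and the objective function has no closed-form expression, the variables are decoupled by deriving candidate highlight segments through wavelet transform, which greatly reduces the complexity of pre-caching. The problem is then solved with a highlight-direction trimming algorithm, which is proven to be locally optimal. Experiments on real-world video datasets show that the scheme significantly improves highlight entropy and jitter compared with benchmark schemes.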
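As a rough illustration of how a wavelet transform can propose candidate highlight segmentations, the sketch below thresholds finest-scale undecimated Haar detail coefficients of a per-segment popularity curve to find candidate boundaries. This is an assumption-laden toy, not the paper's procedure: the abstract does not say which wavelet, scale, or selection rule is used, and the function names, popularity values, and threshold are hypothetical.

```python
import math

def haar_details(signal):
    """Finest-scale undecimated Haar details: (x[i+1] - x[i]) / sqrt(2)."""
    return [(b - a) / math.sqrt(2) for a, b in zip(signal, signal[1:])]

def candidate_segments(popularity, threshold):
    """Cut the video wherever the Haar detail magnitude exceeds `threshold`.

    popularity: one score per video segment (hypothetical input).
    Returns half-open index ranges (start, end) over the segment indices.
    """
    details = haar_details(popularity)
    # A large detail between segments i and i+1 marks a boundary at i+1.
    cuts = [i + 1 for i, d in enumerate(details) if abs(d) > threshold]
    bounds = [0] + cuts + [len(popularity)]
    return list(zip(bounds, bounds[1:]))

# Hypothetical popularity curve with one clearly popular stretch.
popularity = [0.1, 0.1, 0.2, 0.9, 0.8, 0.9, 0.2, 0.1]
print(candidate_segments(popularity, threshold=0.3))
# -> [(0, 3), (3, 6), (6, 8)]; the middle range is the candidate highlight
```

Restricting the search to a handful of such candidate ranges, instead of every possible start and end point, is one way the complexity reduction described above could arise.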