APSIPA Transactions on Signal and Information Processing, Vol. 12, Issue 4

Real-time Vehicle Detection and Tracking on Fisheye Traffic Monitoring Video in Compressed Domain

Sandy Ardianto, Electrical Engineering and Computer Science International Graduate Program, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Hsueh-Ming Hang, Institute of Electronics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan, hmhang@nctu.edu.tw
Wen-Huang Cheng, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
 
Suggested Citation
Sandy Ardianto, Hsueh-Ming Hang and Wen-Huang Cheng (2023), "Real-time Vehicle Detection and Tracking on Fisheye Traffic Monitoring Video in Compressed Domain", APSIPA Transactions on Signal and Information Processing: Vol. 12: No. 4, e106. http://dx.doi.org/10.1561/116.00000116

Publication Date: 06 Nov 2023
© 2023 S. Ardianto, H.-M. Hang and W.-H. Cheng
 
 
Keywords
Real-time; vehicle detection; vehicle tracking; fisheye camera; compressed domain
 


Open Access: this article is published under the terms of the CC BY-NC license.


In this article:
Introduction 
Related Works 
Dataset and Metrics 
Scheme 1: Multi-frame Detection and Tracking 
Scheme 2: Extract Objects from Detected Trails 
Compressed-domain vs. Pixel-domain Performance Comparison 
Conclusions 
References 

Abstract

Our goal is to develop real-time vehicle detection and tracking schemes for fisheye traffic monitoring video that exploit the temporal information available in the compressed domain, without decoding the entire video. Two algorithms are proposed. The first algorithm starts with a conventional single-frame detector. It then adds a multi-frame information fusion stage, implemented with a multi-modal bi-directional LSTM (MM bi-LSTM) network, to improve the final detection and tracking accuracy. The second algorithm first constructs a multi-frame motion trail image. It then applies a single-image multi-head detector designed to produce bounding boxes for each individual frame. The first scheme can be viewed as a detect-to-track design, and the second as a track-to-detect design. We tested our proposals on the ICIP2020 VIP Cup dataset in H.265 video format, applying both algorithms to the motion fields and residual images in the H.265 compressed data. Their detection and tracking performance is on par with that of their pixel-domain counterparts, and they achieve accuracy comparable to state-of-the-art conventional video object detectors and trackers. If the video decoding process is not counted, their computational complexity is much lower than that of conventional pixel-domain video object detectors and trackers.
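The abstract does not say how the multi-frame motion trail image is built. The sketch below is a rough illustration only, and it is not the authors' construction. It assumes the H.265 motion vectors have already been parsed into one grid of (dx, dy) values per frame, sampled on a fixed block grid. It then fuses several frames with a motion-history-image style rule, a decayed running maximum of motion magnitude, which is swapped in here for illustration. The names `motion_trail` and `motion_magnitude`, the `decay` and `threshold` parameters, and the block-grid input format are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical input format: one motion field per frame, stored as a grid of
# (dx, dy) motion vectors sampled on a fixed block grid (an assumption; real
# H.265 prediction units vary in size and must be resampled first).
MotionField = List[List[Tuple[float, float]]]


def motion_magnitude(field: MotionField) -> List[List[float]]:
    """Return the per-block Euclidean length of each motion vector."""
    return [[math.hypot(dx, dy) for (dx, dy) in row] for row in field]


def motion_trail(fields: List[MotionField], decay: float = 0.8,
                 threshold: float = 0.5) -> List[List[float]]:
    """Fuse several motion fields into one trail image (motion-history style).

    Each block keeps the larger of its current motion magnitude and its
    previous trail value multiplied by `decay`. Older motion therefore fades,
    while the most recent motion stays strongest. Magnitudes below
    `threshold` are treated as noise and set to zero.
    """
    if not fields:
        raise ValueError("need at least one motion field")
    rows, cols = len(fields[0]), len(fields[0][0])
    trail = [[0.0] * cols for _ in range(rows)]
    for field in fields:
        # All frames must share the same block grid.
        if len(field) != rows or any(len(row) != cols for row in field):
            raise ValueError("all motion fields must share one block grid")
        mag = motion_magnitude(field)
        for r in range(rows):
            for c in range(cols):
                m = mag[r][c] if mag[r][c] >= threshold else 0.0
                trail[r][c] = max(m, decay * trail[r][c])
    return trail
```

Consider a 1x4 grid in which a block with motion vector (3, 4) moves one cell to the right per frame over three frames. With `decay=0.5`, `motion_trail` returns `[[1.25, 2.5, 5.0, 0.0]]`. The newest position is strongest, and the fading tail encodes the direction of travel. Cues like these are the kind a single-image detector could plausibly read from one trail image.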

DOI: 10.1561/116.00000116

This article is part of the APSIPA Transactions on Signal and Information Processing Special Issue on Emerging AI Technologies for Smart Infrastructure.