Publication

Predicting Compression Modes and Split Decisions for HEVC Video Coding Using Machine Learning Techniques

Hassan, Mahitab Alaaeldin
Date
2017-05
Type
Thesis
Description
A Master of Science thesis in Computer Engineering by Mahitab Alaaeldin Hassan entitled, "Predicting Compression Modes and Split Decisions for HEVC Video Coding Using Machine Learning Techniques," submitted in May 2017. The thesis advisor is Dr. Tamer Shanableh. Soft and hard copies are available.
Abstract
The High Efficiency Video Coding (HEVC) standard offers a substantial improvement in video compression efficiency at the expense of increased computational complexity. This improvement is primarily due to the introduction of flexible quadtree-based partitioning structures for motion estimation (ME) and image transformation. However, finding the optimum coding structure requires an exhaustive rate-distortion optimization (RDO) process, which is what increases the computational complexity. In this thesis, we propose a set of early termination algorithms that reduce HEVC encoding complexity by predicting both the split decisions of Coding Units (CUs) and the coding modes of Prediction Units (PUs). A video sequence-dependent approach is used, in which frames belonging to the video being encoded are used to generate a classification model. At each CU depth level, features representing the given CU are extracted from both the current and previously encoded CUs. The feature vectors (FVs) are then used to generate dimensionality reduction and classification models. These models are in turn used at each coding depth to predict the split and mode decisions of subsequent CUs. In this work, we use stepwise regression, random forest feature importance, and Principal Component Analysis (PCA) for dimensionality reduction, and polynomial networks, random forests, and J48 decision trees for classification. Using seventeen video sequences across four spatial resolution classes, the proposed solution is assessed in terms of classification accuracy, Bjøntegaard Delta bitrate (BD-rate), BD Peak Signal-to-Noise Ratio (BD-PSNR), and computational complexity reduction (CCR). On average, the CU early termination scheme achieved a CCR of 38.5% with an average classification accuracy of 78.1%, at a negligible cost of 0.539% in BD-rate and -0.021 dB in BD-PSNR.
The PU early termination scheme attained an overall CCR of 20.9% with an average classification accuracy of 86.5%, at the cost of a BD-rate of 0.248% and a BD-PSNR of -0.01 dB. When the two schemes were implemented jointly, an overall CCR of 50.1% was achieved with a BD-rate increase of 2% and a BD-PSNR decrease of 0.079 dB.
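The CU early termination idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the `predict_split` callback is a hypothetical stand-in for the trained per-depth classifiers, the quadtree is assumed to span CU depths 0 to 3 (64x64 down to 8x8), and complexity is approximated by counting evaluated CUs rather than by measuring encoding time. The `ccr_percent` helper assumes the common definition of CCR as the relative saving against a reference.

```python
from typing import Callable

# Assumption: CU depths 0..3, i.e. a 64x64 CTU split down to 8x8 CUs.
MAX_DEPTH = 3


def count_cu_evaluations(depth: int, predict_split: Callable[[int], bool]) -> int:
    """Count the CUs evaluated from `depth` downward in one quadtree.

    `predict_split` is a hypothetical classifier stand-in: it returns True
    when the CU at the given depth is predicted to split, and False when
    the search should terminate early at this depth.
    """
    evaluations = 1  # the current CU is always evaluated
    if depth < MAX_DEPTH and predict_split(depth):
        # Recurse into the four sub-CUs of the quadtree.
        for _ in range(4):
            evaluations += count_cu_evaluations(depth + 1, predict_split)
    return evaluations


def ccr_percent(reference: float, proposed: float) -> float:
    """Relative complexity saving of `proposed` against `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference


# Exhaustive search: every CU is split until the maximum depth is reached.
exhaustive = count_cu_evaluations(0, lambda depth: True)

# Early termination: the stand-in predictor stops splitting below depth 1.
early = count_cu_evaluations(0, lambda depth: depth < 1)

print(exhaustive, early, round(ccr_percent(exhaustive, early), 1))
```

In this toy setting the predictor is a fixed rule, so the saving is far larger than the figures reported in the abstract; a real classifier makes a different decision for each CU and occasionally mispredicts, which is where the BD-rate and BD-PSNR costs come from.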