A Two-Tier Post-Processing Framework for Lightweight Video Anomaly Detection
Author: Akhtar, Muhammad Hannan
Date: 2025-11
Advisor: Dr. Tamer Shanableh
Type: Thesis
Degree: Master of Science in Computer Engineering
Description
A Master of Science thesis in Computer Engineering by Muhammad Hannan Akhtar entitled “A Two-Tier Post-Processing Framework for Lightweight Video Anomaly Detection,” submitted in November 2025. The thesis advisor is Dr. Tamer Shanableh. A soft copy is available (Thesis, Completion Certificate, Approval Signatures, and AUS Archives Consent Form).
Abstract
Video anomaly detection (VAD) is predominantly performed in the one-class/unsupervised setting, yet state-of-the-art reconstruction-based approaches remain impractical for real-time deployment due to their massive computational cost, often exceeding 150 million parameters and 500 GFLOPs per frame. This thesis introduces a two-tier cascading framework that rethinks VAD compression by distilling the behavior of heavy models rather than compressing their parameters. In Tier One, a high-capacity teacher model (ASTNet or HSTforU) is trained on normal data and used to generate per-frame log-MSE reconstruction errors, a continuous representation of its learned normality manifold. In Tier Two, an ultra-lightweight student regressor learns to predict these log-MSE values (and thus the corresponding PSNR) directly from short RGB clips, eliminating the reconstruction pipeline entirely. The Tier-2 model then uses the predicted PSNR/log-MSE to decide whether incoming frames conform to the normal training distribution. Two complementary architectures are proposed: Tier-2-ModelA, a compact 3D CNN with explicit motion cues (0.33M parameters, 2.92 GFLOPs), and Tier-2-ModelB, a deeper 3D CNN with temporal self-attention (2.44M parameters, 20.88 GFLOPs). Extensive evaluation on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets shows that traditional compression methods (quantization, pruning, and classical knowledge distillation) either catastrophically degrade accuracy or yield insufficient efficiency gains. In contrast, the proposed framework achieves up to 456× parameter reduction, 202× fewer GFLOPs, and 106× faster inference, while retaining up to 92.7% AUC, enabling real-time anomaly detection at 467 FPS on CPU-grade hardware. Crucially, the framework is architecture-agnostic and successfully compresses both CNN-based and Transformer-based teachers without modifying their internal structure.
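The regression target described above, a per-frame log-MSE and its corresponding PSNR, can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function names, the epsilon constant, and the [0, 1] pixel normalization are assumptions.

```python
import numpy as np

def frame_log_mse(reconstruction, frame, eps=1e-8):
    """Log-compressed per-frame reconstruction error.

    In Tier One, a teacher model produces `reconstruction` for each
    input `frame`; the resulting log-MSE serves as the Tier-Two
    regression target. `eps` (an assumed constant) guards log(0).
    """
    mse = np.mean((reconstruction - frame) ** 2)
    return float(np.log(mse + eps))

def psnr_from_log_mse(log_mse, max_val=1.0):
    """Recover PSNR in dB from a predicted log-MSE value."""
    mse = np.exp(log_mse)
    return 10.0 * np.log10(max_val ** 2 / mse)

# A reconstruction off by 0.1 at every pixel has MSE 0.01, i.e. a
# PSNR of about 20 dB for pixel values normalized to [0, 1].
frame = np.full((64, 64, 3), 0.5)
recon = frame + 0.1
target = frame_log_mse(recon, frame)
print(round(psnr_from_log_mse(target), 2))  # → 20.0
```

A predicted PSNR below a calibrated threshold would then flag the frame as anomalous, consistent with the decision step the abstract describes for the Tier-2 model.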
This work establishes reconstruction-error regression as a practical, general, and highly efficient methodology for deploying high-accuracy VAD models on resource-constrained edge platforms.
