Published in "Journal of Electronic Imaging 21, 2 (2012) 1-17" February 26, 2012, Submitted to SPIE International Journal on Electronic Imaging.
Keywords: video genre classification, block-level audio descriptors, action segmentation, color perception, statistics of contour geometry, video indexing.
Bogdan Ionescu, Christoph Rasche, Constantin Vertan, University «Politehnica» of Bucharest, 061071, Romania, {bionescu, rasche, cvertan}@alpha.imag.pub.ro.
Klaus Seyerlehner, DCP, Johannes Kepler University, A-4040 Austria, [email protected].
Patrick Lambert Lapi-Etti, LISTIC, Polytech Annecy-Chambery, University of Savoie, 74944 France, [email protected].
We propose an audio-visual approach to video genre classification using content descriptors that exploit audio, color, temporal, and contour information. Audio information is extracted at the block level, which has the advantage of capturing local temporal information. At the temporal structure level, we consider action content in relation to human perception. Color perception is quantified using statistics of color distribution, elementary hues, color properties, and relationships between colors. Further, we compute statistics of contour geometry and of relationships between contours. The main contribution of our work lies in harnessing the descriptive power of the combination of these descriptors for genre classification. Validation was carried out on over 91 h of video footage encompassing 7 common video genres, yielding average precision and recall ratios of 87% to 100% and 77% to 100%, respectively, and an overall average correct classification of up to 97%. In addition, an experimental comparison carried out as part of the MediaEval 2011 benchmarking campaign demonstrated the advantage of the proposed audio-visual descriptors over other existing approaches. Finally, we discuss a 3-D video browsing platform that displays movies using feature-based coordinates and thus regroups them according to genre.
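The per-genre precision and recall figures quoted above can be illustrated with a minimal sketch. It assumes the standard definitions (precision = correct detections / all detections of a genre, recall = correct detections / all true instances of that genre); the function name and the genre labels in the usage note are hypothetical and are not taken from the paper's evaluation code.

```python
from collections import Counter


def per_genre_precision_recall(true_labels, predicted_labels):
    """Return {genre: (precision, recall)} for every genre seen in either list.

    Assumption: one ground-truth genre and one predicted genre per video.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")

    true_pos = Counter()   # videos correctly assigned to a genre
    false_pos = Counter()  # videos wrongly assigned to a genre
    false_neg = Counter()  # videos of a genre that were assigned elsewhere

    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            true_pos[truth] += 1
        else:
            false_pos[guess] += 1
            false_neg[truth] += 1

    scores = {}
    for genre in set(true_labels) | set(predicted_labels):
        detected = true_pos[genre] + false_pos[genre]
        actual = true_pos[genre] + false_neg[genre]
        # Convention chosen for this sketch: report 0.0 when a ratio is undefined.
        precision = true_pos[genre] / detected if detected else 0.0
        recall = true_pos[genre] / actual if actual else 0.0
        scores[genre] = (precision, recall)
    return scores
```

For example, calling `per_genre_precision_recall(["news", "news", "sports", "music"], ["news", "sports", "sports", "music"])` gives a precision of 1.0 and a recall of 0.5 for the hypothetical "news" label, because the one video labeled as news is correct but one news video was missed.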