State-of-the-art and future challenges in video scene detection: a survey

By Manfred Del Fabro and Laszlo Böszörmenyi. In: Multimedia Systems, Vol. 19, No. 5 (2013), pp. 427-454, doi:10.1007/s00530-013-0306-4
Keywords: video segmentation; scene detection; non-sequential video; survey. Topics: multimedia information systems; computer communication networks; operating systems; data storage representation; data encryption; computer graphics. Industry sectors: electronics; IT & software; telecommunications.
Abstract
In the last 15 years much effort has been made in the field of segmentation of videos into scenes. We give a comprehensive overview of the published approaches and classify them into seven groups based on three basic classes of low-level features used for the segmentation process: (1) visual-based, (2) audio-based, (3) text-based, (4) audio-visual-based, (5) visual-textual-based, (6) audio-textual-based and (7) hybrid approaches. We try to make video scene detection approaches easier to assess and compare by categorizing the evaluation strategies used. This includes the size and type of the dataset used as well as the evaluation metrics. Furthermore, to help the reader make use of the survey, we list eight possible application scenarios, including a dedicated section on interactive video scene segmentation, and identify the algorithms that can be applied to them. Finally, current challenges for scene segmentation algorithms are discussed. The appendix summarizes the most important characteristics of the algorithms presented in this paper in table form.
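The seven groups named in the abstract correspond to the non-empty combinations of the three feature classes (2^3 - 1 = 7). The following is only an illustrative sketch of that enumeration; the identifiers FEATURE_CLASSES, GROUP_NAMES and feature_groups are hypothetical and do not come from the paper.

```python
from itertools import combinations

# The three basic classes of low-level features named in the abstract.
FEATURE_CLASSES = ("visual", "audio", "text")

# Group labels as listed in the abstract, keyed by the feature classes they combine.
# The mapping itself is an illustrative assumption, not code from the survey.
GROUP_NAMES = {
    frozenset({"visual"}): "visual-based",
    frozenset({"audio"}): "audio-based",
    frozenset({"text"}): "text-based",
    frozenset({"visual", "audio"}): "audio-visual-based",
    frozenset({"visual", "text"}): "visual-textual-based",
    frozenset({"audio", "text"}): "audio-textual-based",
    frozenset({"visual", "audio", "text"}): "hybrid",
}


def feature_groups():
    """Return the group label for every non-empty combination of feature classes."""
    groups = []
    for size in range(1, len(FEATURE_CLASSES) + 1):
        for combo in combinations(FEATURE_CLASSES, size):
            groups.append(GROUP_NAMES[frozenset(combo)])
    return groups


if __name__ == "__main__":
    for number, name in enumerate(feature_groups(), start=1):
        print(f"({number}) {name}")
```

Enumerating by combination size reproduces the order used in the abstract: the three single-feature groups first, then the three pairwise groups, then the hybrid group that uses all three classes.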
Cover date: 2013-10-01; Print ISSN: 0942-4962; Online ISSN: 1432-1882; Publisher: Springer Berlin Heidelberg.
Author Affiliations: Institute of Information Technology (ITEC), Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria
Classification of Scene Segmentation Approaches
Scene Segmentation Methods
Rule-Based Methods: 180 degree rule, action matching rule, film tempo rule, shot/reverse shot rule, establishment/breakdown rule
Graph-Based Methods
Stochastic-Based Methods
Hierarchical and Full vs. Partial Decomposition
Video Scene Segmentation: State-of-the-Art
Visual-Based Segmentation
Visual-Based Full Segmentation
Visual-Based Partial Segmentation
Visual Graph-Based Full Segmentation
Visual Stochastic-Based Full Segmentation
Audio-Based Segmentation
Audio-Based Full Segmentation
Audio-Based Partial Segmentation
Text-Based Full Segmentation
Audio-Visual Full Segmentation
Audio-Visual Graph-Based Full Segmentation
Audio-Visual Stochastic-Based Full Segmentation
Audio-Visual Stochastic-Based Partial Segmentation
Hybrid Full Segmentation
Visual-Textual Full Segmentation
Audio-Textual Full Segmentation
Hybrid Partial Segmentation
Audio-Textual Partial Segmentation
Evaluation of Video Segmentation Approaches
Datasets and Video Genres
Evaluation Methods
Strategies for Video Scene Segmentation Problems
Movies
Presented approaches for movies
Possible approaches for movies
TV series or sitcoms
News
Presented approaches for news videos
Possible approaches for news videos
Game and TV show videos
Presented approaches for game and TV show videos
Possible approaches for game and TV show videos
Sports videos
Presented approaches for sports videos
Possible approaches for sports videos
Single-shot videos
Possible approaches for single-shot videos
Black-and-white videos
Presented approaches for black-and-white videos
Possible approaches for black-and-white videos
Interactive scene segmentation
Future Challenges in Video Scene Detection
References

Author(s): Del Fabro M., Böszörmenyi L.

Language: English
Tags: Computer science and computer engineering; Media data processing; Video processing