As technology advances rapidly, text detection and recognition is receiving special attention from researchers. As a result, there are several real-time applications of video text processing that require cognitive-based methods. The main applications are (1) retrieving and indexing video based on the semantics of its content, (2) machine translation to assist foreigners, (3) assisting blind people to walk on the road freely without aid, (4) automatic vehicle driving, (5) license plate tracing to catch vehicles that violate traffic signals, (6) monitoring images posted on social media based on their text and content, (7) identifying a location from the addresses of streets, shops, etc., (8) tracing players in sports by jersey/bib number or text, and (9) similarly, tracing bib numbers in marathons and other events. For all of these applications, text detection and recognition in video and natural scene images is an integral part of the system.
Author(s): Palaiahnakote Shivakumara, Umapada Pal
Series: Cognitive Intelligence and Robotics
Publisher: Springer
Year: 2021
Language: English
Pages: 285
City: Singapore
Preface
Contents
1 Cognitively Inspired Video Text Processing
1.1 Background
1.2 Cognitively Inspired Video Processing
1.3 Video Text Processing
1.4 History of Video Text Processing
1.4.1 OCR for Camera-Based Image
1.4.2 OCR for Natural Scene and Video Image
1.5 Video Text Processing for Surveillance Applications
1.5.1 Arbitrary-Oriented Video Text Recognition
1.5.2 Multi-Type Arbitrary-Oriented Video Text Recognition
1.6 Challenges for Surveillance Applications
1.7 Video Text Processing for Forensic Applications
1.8 Challenges for Forensic Applications
1.9 Summary
References
2 Key Text Frame Selection from Video
2.1 Background
2.2 Approaches for Key Text Frame Classification
2.2.1 Edge Features for Text Frame Classification in Video
2.3 Experimental Results
2.3.1 Experiments for Edge-Based Features Method
2.4 Summary
References
3 Text and Non-text Frame Classification in Video
3.1 Background
3.2 A Mutual Nearest Neighbor-Based Symmetry for Text Frame Classification in Video
3.2.1 Related Work
3.2.2 Mutual Nearest Neighbor Symmetry (MNNS)-Based Approach
3.3 Experimental Results
3.3.1 Experiment on Text Detection Methods on Non-Text Frames
3.3.2 Experiments on only Probable Text Blocks Selection (PTBS)
3.3.3 Experiments on Mutual Nearest Neighbor-Based Symmetry (MNNS)
3.3.4 Experiments on Combined Method (PTBS + MNNS)
3.3.5 Experiment on Publicly Available Data (Hua’s + ICDAR 03 Dataset)
3.3.6 Erroneous Results
3.4 Summary
References
4 Video Text Detection
4.1 Background
4.2 Text Detection Using Delaunay Triangulation in Video Sequence
4.2.1 Delaunay Triangulation Method
4.2.2 Pruning Edges via Four Criterions
4.2.3 Candidate Text Region Verification
4.2.4 Merging Clusters to Form Text Line
4.3 Histogram Oriented Moments Descriptor for Multi-Oriented Moving Text Detection in Video
4.3.1 Related Work
4.3.2 Histogram Oriented Moments Approach
4.3.3 HOM for Text Candidates Selection
4.3.4 Text Candidates Verification
4.3.5 Moving Text Detection
4.4 Experimental Results
4.4.1 Experiments on Delaunay Triangulation
4.4.2 Experiments on Histogram Oriented Moments Based Approach
4.5 Summary
References
5 Text Detection in Images
5.1 Background
5.2 Script-Independent Approach for Multi-oriented Text Detection in Scene Image
5.2.1 Related Work
5.2.2 Script-Independent Approach for Text Detection
5.2.3 Component Formation
5.2.4 Ring Radius Transform for Text Detection
5.3 Graph Attention Network for Detecting License Plates in Crowded Street Scenes
5.3.1 Related Work
5.3.2 APSEGAT Approach for License Plate Number Detection
5.3.3 Adaptive Progressive Scalable Expansion-Based Graph Attention Network (APSEGAT)
5.3.4 End-To-End Training Mechanism for License Plate Detection in Dense Vehicle Images
5.4 Experimental Results
5.4.1 Experiments on Ring Radius Transform Method
5.4.2 Experiments on Graph Attention-Based Network
5.5 Summary
References
6 Word and Character Segmentation
6.1 Background
6.2 Laplacian Method for Arbitrarily Oriented Word Segmentation in Video
6.2.1 Word Segmentation
6.2.2 Zero Crossing Points for Seed Window
6.2.3 Horizontal and Vertical Sampling for Word Segmentations
6.3 GVF Arrow Pattern for Character Segmentation from Double Line License Plate Images
6.3.1 GVF Arrow Pattern-Based Approach for Character Segmentation
6.3.2 GVF for Seed Patch Detection for the Space Between the Lines and the Characters
6.3.3 Line and Character Segmentation Using Hough Transform
6.4 Experimental Results
6.4.1 Experiments for Sampling-Based Word Segmentation
6.4.2 Experiments on GVF Based Method for Character Segmentation
6.5 Summary
References
7 Video Text Type Classification
7.1 Background
7.2 Separation of Graphics and Scene Text in Video Frames
7.2.1 Caption and Scene Text Classification
7.2.2 Horizontal Graphics and Scene Text Separation
7.2.3 Multi-Oriented Graphics and Scene Text Separation
7.3 A Temporal Integration for Word-Wise Caption and Scene Text Identification
7.3.1 Classification of Caption and Scene Text Using Temporal Information
7.4 Experimental Results
7.4.1 Experiments for Separating Graphics and Scene Text
7.4.2 Experiments on Temporal Integration Method
7.5 Summary
References
8 Video Text Enhancement for Recognition
8.1 Background
8.2 A Blind Deconvolution Model for Scene Text Enhancement in Video
8.2.1 Related Work
8.2.2 Blind Deconvolutional Approach
8.3 Experimental Results
8.3.1 Experiments for Deconvolutional Model
8.4 Summary
References
9 Video Text Recognition
9.1 Background
9.2 Improved Ring Radius Transform-Based Reconstruction for Video Character Recognition
9.2.1 Related Work
9.2.2 Recognition Method at Character Level
9.3 A CNN-RNN-Based Method for License Plate Recognition
9.3.1 Related Work
9.3.2 Recognition at Word Level
9.4 Experimental Results
9.4.1 Experiments for IRRT
9.4.2 Experiments on Deep Learning Model for Recognition
9.5 Summary
References
10 Conclusion and Future Directions
10.1 Limitations of the Video Text Processing Methods
10.2 Future Directions of Video Text Processing
10.3 Conclusion