This book constitutes the refereed proceedings of the 5th International Workshop on Document Analysis Systems, DAS 2002, held in Princeton, NJ, USA, in August 2002 with sponsorship from IAPR.
The 44 revised full papers presented together with 14 short papers were carefully reviewed and selected for inclusion in the book. All current issues in document analysis systems are addressed. The papers are organized in topical sections on OCR features and systems, handwriting recognition, layout analysis, classifiers and learning, tables and forms, text extraction, indexing and retrieval, document engineering, and new applications.
Author(s): Elisa Barney Smith, Xiaohui Qiu (auth.), Daniel Lopresti, Jianying Hu, Ramanujan Kashi (eds.)
Series: Lecture Notes in Computer Science 2423
Edition: 1
Publisher: Springer-Verlag Berlin Heidelberg
Year: 2002
Language: English
Pages: 574
Tags: Pattern Recognition; Information Storage and Retrieval; Document Preparation and Text Processing; Image Processing and Computer Vision
Relating Statistical Image Differences and Degradation Features....Pages 1-12
Script Identification in Printed Bilingual Documents....Pages 13-24
Optimal Feature Extraction for Bilingual OCR....Pages 25-36
Machine Recognition of Printed Kannada Text....Pages 37-48
An Integrated System for the Analysis and the Recognition of Characters in Ancient Documents....Pages 49-52
A Complete Tamil Optical Character Recognition System....Pages 53-57
Distinguishing between Handwritten and Machine Printed Text in Bank Cheque Images....Pages 58-61
Multi-expert Seal Imprint Verification System for Bankcheck Processing....Pages 62-65
Automatic Reading of Traffic Tickets....Pages 66-69
A Stochastic Model Combining Discrete Symbols and Continuous Attributes and Its Application to Handwriting Recognition....Pages 70-81
Top-Down Likelihood Word Image Generation Model for Holistic Word Recognition....Pages 82-94
The Segmentation and Identification of Handwriting in Noisy Document Images....Pages 95-105
The Impact of Large Training Sets on the Recognition Rate of Off-line Japanese Kanji Character Classifiers....Pages 106-110
Automatic Completion of Korean Words for Open Vocabulary Pen Interface....Pages 111-114
Using Stroke-Number-Characteristics for Improving Efficiency of Combined Online and Offline Japanese Character Classifiers....Pages 115-118
Closing Gaps of Discontinuous Lines: A New Criterion for Choosing the Best Prolongation....Pages 119-122
Classifier Adaptation with Non-representative Training Data....Pages 123-133
A Learning Pseudo Bayes Discriminant Method Based on Difference Distribution of Feature Vectors....Pages 134-144
Increasing the Number of Classifiers in Multi-classifier Systems: A Complementarity-Based Analysis....Pages 145-156
Discovering Rules for Dynamic Configuration of Multi-classifier Systems....Pages 157-166
Multiple Classifier Combination for Character Recognition: Revisiting the Majority Voting System and Its Variations....Pages 167-178
Correcting for Variable Skew....Pages 179-187
Two Geometric Algorithms for Layout Analysis....Pages 188-199
Text/Graphics Separation Revisited....Pages 200-211
A Study on the Document Zone Content Classification Problem....Pages 212-223
Logical Labeling of Document Images Using Layout Graph Matching with Adaptive Learning....Pages 224-235
A Ground-Truthing Tool for Layout Analysis Performance Evaluation....Pages 236-244
Simple Layout Segmentation of Gray-Scale Document Images....Pages 245-248
Detecting Tables in HTML Documents....Pages 249-260
Document-Form Identification Using Constellation Matching of Keywords Abstracted by Character Recognition....Pages 261-271
Table Detection via Probability Optimization....Pages 272-282
Complex Table Form Analysis Using Graph Grammar....Pages 283-286
Detection Approaches for Table Semantics in Text....Pages 287-290
A Theoretical Foundation and a Method for Document Table Structure Extraction and Decomposition....Pages 291-294
Fuzzy Segmentation of Characters in Web Images Based on Human Colour Perception....Pages 295-306
Word and Sentence Extraction Using Irregular Pyramid....Pages 307-318
Word Searching in Document Images Using Word Portion Matching....Pages 319-328
Scene Text Extraction in Complex Images....Pages 329-340
Text Extraction in Digital News Video Using Morphology....Pages 341-352
Retrieval by Layout Similarity of Documents Represented with MXY Trees....Pages 353-364
Automatic Indexing of Newspaper Microfilm Images....Pages 365-375
Improving Document Retrieval by Automatic Query Expansion Using Collaborative Learning of Term-Based Concepts....Pages 376-387
Spotting Where to Read on Pages - Retrieval of Relevant Parts from Page Images....Pages 388-399
Mining Documents for Complex Semantic Relations by the Use of Context Classification....Pages 400-411
Hairetes: A Search Engine for OCR Documents....Pages 412-422
Text Verification in an Automated System for the Extraction of Bibliographic Data....Pages 423-432
smartFIX: A Requirements-Driven System for Document Analysis and Understanding....Pages 433-444
Machine Learning of Generalized Document Templates for Data Extraction....Pages 445-456
Machine Learning of Generalized Document Templates for Data Extraction....Pages 457-468
Configuration REcognition Model for Complex Reverse Engineering Methods: 2(CREM)....Pages 469-479
Electronic Document Publishing Using DjVu....Pages 480-490
DAN: An Automatic Segmentation and Classification Engine for Paper Documents....Pages 491-502
Document Reverse Engineering: From Paper to XML....Pages 503-506
Human Interactive Proofs and Document Image Analysis....Pages 507-518
Data GroundTruth, Complexity, and Evaluation Measures for Color Document Analysis....Pages 519-531
Exploiting WWW Resources in Experimental Document Analysis Research....Pages 532-543
An Automated Tachograph Chart Analysis System....Pages 544-555
A Multimodal System for Accessing Driving Directions....Pages 556-567