This book constitutes the thoroughly refereed proceedings of the 21st International Conference on Computer Processing of Oriental Languages, ICCPOL 2006, held in Singapore in December 2006, colocated with ISCSLP 2006, the 5th International Symposium on Chinese Spoken Language Processing.
The 36 revised full papers and 20 revised short papers presented were carefully reviewed and selected from 169 submissions. The papers are organized in topical sections on information retrieval, document classification, questions and answers, summarization, machine translation, word segmentation, chunking, abbreviation expansion, writing-system issues, parsing, semantics, and lexical resources.
Editor(s): Yuji Matsumoto, Richard Sproat, Kam-Fai Wong, Min Zhang
Series: Lecture Notes in Artificial Intelligence 4285
Edition: 1
Publisher: Springer
Year: 2006
Language: English
Pages: 556
Front matter......Page 1
Introduction......Page 14
Contextual Question Answering......Page 15
Proposed Method......Page 16
Example......Page 17
Detecting of Reference Expressions......Page 19
Narrowing Down Antecedent Candidates Using Selectional Restriction......Page 20
Updating the List of Antecedent Candidates......Page 21
Experimental Results......Page 22
Failure Analysis......Page 23
Conclusion......Page 25
Introduction......Page 26
Segmentation of Mixed Chinese/English Document Including Scattered Italic Characters......Page 27
Preprocess......Page 28
Italic Determination......Page 29
Estimation of Slant Angle......Page 30
Evaluation of Italic Determination......Page 31
Evaluation of Estimation of Slant Angle......Page 32
References......Page 33
Introduction......Page 35
Feature Structures in the Domain of Automobile Review......Page 37
The Method of Pointwise Mutual Information......Page 38
Opinion Words Collection......Page 39
Mapping Opinion Words to Features......Page 40
Conclusion......Page 42
Introduction......Page 44
Tri-training Semi-supervised Learning and Its Modifications......Page 45
Semi-supervised Tri-training Algorithm......Page 46
Modified Versions of Tri-training......Page 47
Question Data Sets and Feature Selection......Page 48
Experiments with Multiple Classifiers......Page 49
Experiments with Two Different Algorithms and Two Views......Page 51
Conclusion......Page 52
Introduction......Page 55
Mathematical Description of Queries Similarity Computation......Page 56
Improvement of Literal Similarity Computation......Page 58
Quantity Computation of Multi-feature Similar Units......Page 59
Description of Similarity Computation Algorithm of the Queries......Page 60
Experiments and Results......Page 61
References......Page 62
Introduction......Page 64
Related Work......Page 65
Candidates Expansion......Page 66
Estimation of Chinese-English Translation Probability......Page 68
Estimation of Bilingual Context Similarity......Page 69
Improvement on the Coverage......Page 70
Evaluations of Existing Techniques......Page 71
Evaluations of ICT......Page 72
Conclusions......Page 73
References......Page 74
Approaches to Exploiting Syntactic Information for SMT......Page 76
Our Work......Page 77
Syntactic Transformation......Page 78
Transformational Model......Page 79
Training......Page 80
Applying......Page 82
Training the Transformational Model......Page 83
Maximum Phrase Length......Page 84
Training-Set Size......Page 85
Conclusion......Page 86
Introduction......Page 88
Word Alignment Model......Page 89
Similarity Measure......Page 90
Matching......Page 92
Experiments Design......Page 93
Experimental Results......Page 94
Conclusion......Page 96
References......Page 97
Introduction......Page 98
Hybrid Transliteration Model and Correspondence-Based Transliteration Model......Page 100
Framework of Different Transliteration Models......Page 101
Transliteration Model-Based Validation: $S_{tm}$......Page 102
Web-Based Validation: $S_{web}$......Page 103
Experimental Results......Page 105
Conclusion......Page 107
Introduction......Page 110
Preliminaries......Page 111
Clique Percolation Method in Random Graphs......Page 112
Algorithmic Implementation of CPM......Page 114
Related Work......Page 116
Data Sets......Page 117
Performance Evaluation......Page 118
Conclusion and Future Work......Page 120
Introduction......Page 122
Related Work......Page 123
Constructing Head Extraction Model......Page 124
Supplementary Heuristics of Head Extraction Model......Page 126
Extracting Semantic-Core Element from Extracted Head......Page 127
Extracting Table-Schemata Using SCE and Table Structures......Page 128
Extracting Triple......Page 129
Experimental Results......Page 130
Conclusions......Page 131
References......Page 132
Introduction......Page 133
kNN Connection Graph......Page 135
kNN Connection Graph-Based Hierarchical Document Clustering......Page 137
Experimentation......Page 139
Conclusion......Page 142
Introduction......Page 144
Affinity Graph Building......Page 145
Information Richness Computation......Page 146
Experimental Setup......Page 147
Experimental Results......Page 148
Conclusion and Future Work......Page 150
References......Page 151
Introduction......Page 152
People with Reading Disabilities and Chinese Stem-Deriving Instruction......Page 153
Computer-Assisted Instruction......Page 154
Research Design......Page 155
Results......Page 158
Conclusion and Recommendations......Page 160
References......Page 161
Introduction......Page 162
Baseline System......Page 163
Discrimination Issues......Page 164
Feature Selection......Page 165
Comparison Experiment on Newsgroups......Page 167
Comparison Experiment on Reuters-21578......Page 168
References......Page 169
Introduction......Page 170
MMI......Page 171
FW......Page 172
HFW......Page 173
WC......Page 174
Experimental Results......Page 175
Applying Word Clusters to Chinese Syntactic Parsing......Page 176
References......Page 177
Introduction......Page 178
SUMO......Page 179
Expanding FrameNet Coverage with Synsets......Page 180
Populating Frames with Synsets by SUMO......Page 181
Evaluation on Recall and Precision......Page 182
References......Page 184
Introduction......Page 186
Outline......Page 188
Constructing the Corpus of the Domain......Page 189
Domain Specificity Estimation of Technical Terms......Page 190
Collecting Novel Technical Terms of a Domain from the Web......Page 191
Experimental Evaluation......Page 192
Concluding Remarks......Page 193
Introduction......Page 194
Extraction of Event......Page 195
Building Document Graph......Page 196
Node Scoring with PageRank for Summarization......Page 198
Experiments and Discussions......Page 199
Conclusion......Page 200
References......Page 201
Introduction......Page 202
Probabilistic Feature Based Maximum Entropy Model......Page 203
Person Name Recognition and Location Name Recognition......Page 204
Experiments and Results......Page 207
References......Page 209
Introduction......Page 210
Text Line Feature Extraction and Preprocessing......Page 211
Graph Model of Text Line Features......Page 212
Computation Time Reduction......Page 213
Experiment Result and Analysis......Page 214
Conclusion and Future Research......Page 216
Introduction......Page 218
Statistics of IPC Clusters......Page 219
Automatic Clustering......Page 220
Interpolation Model......Page 221
Cluster Expansion Model......Page 222
Cluster Expansion Model......Page 223
Conclusions......Page 224
References......Page 225
Introduction......Page 226
Overview of the Process......Page 227
Detection of Incorrect Word Candidates......Page 228
Substitution Process......Page 229
Data Set......Page 230
Experiment on Error Detection......Page 231
Experiment on Error Correction......Page 232
Conclusions......Page 233
Introduction......Page 235
Related Work......Page 236
Candidate Extraction: Transliteration Boundary Detection......Page 237
Candidate Validation: Phonetic Similarity and Joint Web Validation Models......Page 239
Joint Web Validation Model: $S_{Joint}$......Page 240
Experimental Setup......Page 242
Results......Page 243
Error Analysis......Page 244
Conclusion......Page 245
Introduction......Page 247
Mathematical Re-formulation of Harris's Hypothesis......Page 249
Precision and Recall......Page 250
Data......Page 251
Small Examples......Page 252
Larger-Scale Performance......Page 253
Results......Page 254
Discussion......Page 255
Conclusion......Page 256
Introduction and Background......Page 258
System Overview......Page 259
Abbreviation Formation Analysis......Page 261
Morphological Analysis......Page 263
Named-Entity Identification......Page 264
Comparison of Kernel Functions......Page 265
Improvement Evaluation......Page 266
Conclusion and Future Work......Page 267
References......Page 268
Introduction......Page 269
Data Set......Page 271
Combining Raw and MM-Segmented Corpora......Page 272
Combining $F_{RFB}(w_i)$ and $F_{HB}(w_i)$......Page 273
Experiments......Page 274
Perspective 1: The Spearman Coefficient of Rank Correlation......Page 275
Perspective 2: Rank Sequence Deviation......Page 276
Perspective 3: The Coverage Rate......Page 277
Sample Analysis......Page 278
Conclusion and Future Work......Page 279
Introduction......Page 281
Expanded Chunks......Page 282
System Architecture......Page 284
Using BPs and Expanded Chunks......Page 285
Error Analysis......Page 286
Comparative Experiments......Page 287
References......Page 288
Introduction......Page 290
Types of Abbreviations in Chinese......Page 292
Overview......Page 293
Generation of Expansion Candidates......Page 294
Disambiguation of Abbreviations with HMMs......Page 295
Revising Error Expansions Using Linguistic Knowledge......Page 296
Building an Abbreviation-Expanded Corpus......Page 297
Evaluation Results and Discussions......Page 298
Conclusion......Page 299
References......Page 300
Introduction......Page 301
Syllable-N-Gram-Based Approaches......Page 302
Resolving Data Sparseness and Memory Increase in Word-Unigram Model......Page 303
The Correlation Between the Observed Word and Its Category Pattern......Page 304
Basic Concepts Involved in the Category-Pattern-Based Model Construction......Page 305
Parameter Fitting Using Simulated Annealing......Page 306
Experimentation......Page 308
References......Page 310
Introduction......Page 312
Alternative Approaches to Integration and Experimental Comparisons......Page 314
Segmentation Using Word N-Grams......Page 316
Integration Schemes in the Conventional Framework......Page 317
Divide-and-Conquer Integration......Page 319
Conclusions......Page 321
References......Page 322
Introduction......Page 323
Arbitrariness of the Kanji Writing System......Page 324
A Beginner's View of Kanjis......Page 325
Kansuke Code......Page 326
Interface......Page 327
Component Trees......Page 329
Settings......Page 330
Results......Page 331
Conclusion......Page 333
Introduction......Page 334
Lexical Processing of Kanji and Hanzi......Page 335
Experiment Outline......Page 336
Results......Page 337
Pixel Difference Model......Page 339
Bag of Radicals Model......Page 340
Model Evaluation......Page 341
Confusability......Page 343
Conclusion......Page 344
Introduction......Page 346
The Proposed Method......Page 348
State Transition Cost Computation......Page 349
Determination of the Shortest Path(s)......Page 351
Experiment and Results......Page 352
Conclusion and Future Works......Page 355
References......Page 356
Introduction......Page 358
Classifying MWEs......Page 359
Outline of the Method Proposed......Page 360
Creating a New MWE and Its Possible Translation......Page 361
Prioritizing by Similarity in a Japanese Thesaurus......Page 362
Prioritizing Considered with a Number of Original MWEs......Page 364
Experimental Results and Discussion......Page 365
Conclusion......Page 367
Introduction......Page 368
A Maximum Entropy Approach to EBMT Model Training......Page 369
Feature Functions......Page 370
Experimental Setting......Page 372
Results......Page 373
References......Page 374
Introduction......Page 376
Word-to-Phrase Transfer Model for EVMT......Page 377
English Dictionary......Page 378
Bilingual English-Vietnamese Dictionary......Page 379
Vietnamese Dictionary......Page 380
References......Page 381
Introduction......Page 383
Previous Work......Page 384
Features for Sense Disambiguation from a Mono-bilingual Dictionary......Page 385
Features for Sense Disambiguation from a Target Language Corpus......Page 386
Combining Features Using Machine Learning......Page 387
Evaluation......Page 389
References......Page 390
Introduction......Page 391
Lexical-Knowledge Based Aligner......Page 392
Component for Establishing Reliable Alignment......Page 393
Component for Extending Alignment......Page 394
Automatically Building a Japanese-Chinese Dictionary......Page 396
Experiment......Page 397
Conclusion......Page 398
Processing Difficulties......Page 400
Forming a Numeral-Classifier Sequence and Its Semantics......Page 401
Syntax and Semantics of the Floating Quantifier Constructions......Page 402
Case Mismatches......Page 404
Future Work and Conclusion......Page 407
Introduction......Page 408
Various Surface Forms of Japanese Functional Expressions......Page 409
Hierarchy with Nine Abstraction Levels......Page 411
Compilation Procedure......Page 413
Related Work......Page 414
Conclusion and Future Work......Page 415
Introduction......Page 416
Categories of Honorific Misuse......Page 417
System Input and Output......Page 418
Process Flow......Page 419
Consistency Table......Page 422
System Validity Check......Page 423
Procedure......Page 424
Conclusion......Page 425
Introduction......Page 427
The Lexicon......Page 428
The Interactive Construction......Page 429
Linking Senses to CCD Synsets......Page 430
Keeping Consistency......Page 431
Inter-annotator Agreement......Page 432
Conclusion and Future Works......Page 433
References......Page 434
Introduction......Page 435
Related Works......Page 436
Named Entity Alignment for Multilingual Named Entity Translation......Page 437
Program Identifier for Program Specific Knowledge......Page 439
Objective Evaluation: BLEU and NIST Score......Page 440
Conclusions......Page 441
Introduction......Page 443
Self-Organizing Map Using Directional Similarity Measures – To Find Similarity and Hierarchical Relationships of Concepts......Page 444
Self-Organizing Map......Page 445
Construction of Hierarchy Composed from Concepts When Using CSM......Page 447
Comparison of CSM with Other Methods: Ovlp, and CSM Using Frequency Information......Page 448
Comparison of Created Hierarchies with Existing Handcrafted Thesaurus......Page 449
Conclusion......Page 453
References......Page 454
Introduction......Page 455
Estimate the Distance of Pronunciation Similarity......Page 456
Estimating Threshold Using a Support Vector Machine......Page 457
Evaluation......Page 458
Seven Vowels Test......Page 459
Conclusions......Page 461
References......Page 462
Introduction......Page 463
SVM Based Speaker Selection......Page 464
Main Procedure of Proposed Plan......Page 466
Experiment Setup......Page 467
Experimental Results......Page 468
References......Page 469
Related Work......Page 470
Exploiting Semantic-Contextual Knowledge with LSA......Page 471
Choosing Useful Features for NEs......Page 473
Experiments......Page 474
Conclusion......Page 477
Introduction......Page 479
Tri-training for Chinese Chunking......Page 480
Select Training Samples......Page 481
Related Works......Page 482
Experiment 1: Selection Methods of Tri-training......Page 483
Discussion......Page 484
Conclusions......Page 485
Introduction......Page 487
One-Against-One......Page 488
Comparison on a Real Case......Page 489
Elimination Assembling......Page 490
Experiment Setup......Page 491
Discussion I: N-Way vs. Binarization......Page 492
Conclusions......Page 493
References......Page 494
Introduction......Page 495
Extending Labeled Data......Page 497
A New Bootstrapping Algorithm......Page 498
Data......Page 499
Parameters and Results......Page 500
Conclusion......Page 502
Introduction......Page 503
Example of Dialog Using Our System......Page 504
Details of Information Retrieve Utterance Mode......Page 505
Filtering by Surface Cohesion......Page 506
Filtering by Predicate Coherence......Page 507
Ranking by Coherence of Nouns......Page 508
Dialog Examples and Discussion......Page 509
Conclusion......Page 510
Introduction......Page 511
Turkish......Page 512
Parsing Framework......Page 513
Experimental Setup......Page 514
Inflectional Groups......Page 515
Inflectional Features......Page 516
Lexicalization......Page 518
Conclusion......Page 519
Introduction......Page 522
Compositional Constituents and Non-compositional Constituents......Page 523
Characteristics of C-Constituents......Page 524
Language Model......Page 525
The Principles of Pattern-Forming......Page 526
The Number of Different SPs......Page 527
Matched Pattern Ratio and Precision......Page 529
Semantic Coverage......Page 530
References......Page 531
Introduction......Page 533
Chinese Sentence Structure and Zero Anaphora......Page 534
Text Preprocessing......Page 535
Case-Based Reasoning Module......Page 537
Experiments and Analysis......Page 541
Conclusions......Page 542
Introduction......Page 545
Collocation Net......Page 546
EAMI: Estimated Average Mutual Information......Page 548
EPMI: Estimated Pair-Wise Mutual Information......Page 549
Building a Collocation Net......Page 550
Experimentation......Page 551
Conclusion......Page 553
References......Page 554
Back matter......Page 555