Foundations of Statistical Natural Language Processing

Description: Statistical approaches to processing natural language text have become dominant in recent years. This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear. The book contains all the theory and algorithms needed for building NLP tools. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications.

Author(s): Christopher D. Manning / Hinrich Schütze
Publisher: The MIT Press
Year: 1999

Language: English
Pages: 620

Foundations of Statistical Natural Language Processing
Copyright
Brief Contents
Contents
List of Tables
List of Figures
Table of Notations
Preface
Road Map
Part1 Preliminaries
Ch1 Introduction
1.1 Rationalist & Empiricist Approaches to Language
1.2 Scientific Content
1.2.1 Questions that linguistics should answer
1.2.2 Non-categorical phenomena in language
1.2.3 Language & cognition as probabilistic phenomena
1.3 Ambiguity of Language: Why NLP is Difficult
1.4 Dirty Hands
1.4.1 Lexical resources
1.4.2 Word counts
1.4.3 Zipf's laws
1.4.4 Collocations
1.4.5 Concordances
1.5 Further Reading
1.6 Exercises
Ch2 Mathematical Foundations
2.1 Elementary Probability Theory
2.1.1 Probability spaces
2.1.2 Conditional probability & independence
2.1.3 Bayes' theorem
2.1.4 Random variables
2.1.5 Expectation & variance
2.1.6 Notation
2.1.7 Joint & conditional distributions
2.1.8 Determining P
2.1.9 Standard distributions
2.1.10 Bayesian statistics
2.1.11 Exercises
2.2 Essential Information Theory
2.2.1 Entropy
2.2.2 Joint entropy & conditional entropy
2.2.3 Mutual information
2.2.4 Noisy channel model
2.2.5 Relative entropy or Kullback-Leibler divergence
2.2.6 Relation to language: Cross entropy
2.2.7 Entropy of English
2.2.8 Perplexity
2.2.9 Exercises
2.3 Further Reading
Ch3 Linguistic Essentials
3.1 Parts of Speech & Morphology
3.1.1 Nouns & pronouns
3.1.2 Words that accompany nouns: Determiners & adjectives
3.1.3 Verbs
3.1.4 Other parts of speech
3.2 Phrase Structure
3.2.1 Phrase structure grammars
3.2.2 Dependency: Arguments & adjuncts
3.2.3 X' theory
3.2.4 Phrase structure ambiguity
3.3 Semantics & Pragmatics
3.4 Other Areas
3.5 Further Reading
3.6 Exercises
Ch4 Corpus-Based Work
4.1 Getting Set Up
4.1.1 Computers
4.1.2 Corpora
4.1.3 Software
4.2 Looking at Text
4.2.1 Low-level formatting issues
4.2.2 Tokenization: What is a word?
4.2.3 Morphology
4.2.4 Sentences
4.3 Marked-up Data
4.3.1 Markup schemes
4.3.2 Grammatical tagging
4.4 Further Reading
4.5 Exercises
Part2 Words
Ch5 Collocations
5.1 Frequency
5.2 Mean & Variance
5.3 Hypothesis Testing
5.3.1 t test
5.3.2 Hypothesis testing of differences
5.3.3 Pearson's chi-square test
5.3.4 Likelihood ratios
5.4 Mutual Information
5.5 Notion of Collocation
5.6 Further Reading
Ch6 Statistical Inference: n-gram Models over Sparse Data
6.1 Bins: Forming Equivalence Classes
6.1.1 Reliability vs discrimination
6.1.2 n-gram models
6.1.3 Building n-gram models
6.2 Statistical Estimators
6.2.1 Maximum Likelihood Estimation (MLE)
6.2.2 Laplace's law, Lidstone's law & Jeffreys-Perks law
6.2.3 Held out estimation
6.2.4 Cross-validation (deleted estimation)
6.2.5 Good-Turing estimation
6.2.6 Briefly noted
6.3 Combining Estimators
6.3.1 Simple linear interpolation
6.3.2 Katz's backing-off
6.3.3 General linear interpolation
6.3.4 Briefly noted
6.3.5 Language models for Austen
6.4 Conclusions
6.5 Further Reading
6.6 Exercises
Ch7 Word Sense Disambiguation
7.1 Methodological Preliminaries
7.1.1 Supervised & unsupervised learning
7.1.2 Pseudowords
7.1.3 Upper & lower bounds on performance
7.2 Supervised Disambiguation
7.2.1 Bayesian classification
7.2.2 Information-theoretic approach
7.3 Dictionary-Based Disambiguation
7.3.1 Disambiguation based on sense definitions
7.3.2 Thesaurus-based disambiguation
7.3.3 Disambiguation based on translations in 2nd-language corpus
7.3.4 One sense per discourse, one sense per collocation
7.4 Unsupervised Disambiguation
7.5 What is a Word Sense?
7.6 Further Reading
7.7 Exercises
Ch8 Lexical Acquisition
8.1 Evaluation Measures
8.2 Verb Subcategorization
8.3 Attachment Ambiguity
8.3.1 Hindle & Rooth (1993)
8.3.2 General remarks on PP attachment
8.4 Selectional Preferences
8.5 Semantic Similarity
8.5.1 Vector space measures
8.5.2 Probabilistic measures
8.6 Role of Lexical Acquisition in Statistical NLP
8.7 Further Reading
Part3 Grammar
Ch9 Markov Models
9.1 Markov Models
9.2 Hidden Markov Models
9.2.1 Why use HMMs?
9.2.2 General form of HMM
9.3 Three Fundamental Questions for HMMs
9.3.1 Finding probability of observation
9.3.2 Finding best state sequence
9.3.3 Third problem: Parameter estimation
9.4 HMMs: Implementation, Properties & Variants
9.4.1 Implementation
9.4.2 Variants
9.4.3 Multiple input observations
9.4.4 Initialization of parameter values
9.5 Further Reading
Ch10 Part-of-Speech Tagging
10.1 Information Sources in Tagging
10.2 Markov Model Taggers
10.2.1 Probabilistic model
10.2.2 Viterbi algorithm
10.2.3 Variations
10.3 Hidden Markov Model Taggers
10.3.1 Applying HMMs to POS tagging
10.3.2 Effect of initialization on HMM training
10.4 Transformation-Based Learning of Tags
10.4.1 Transformations
10.4.2 Learning algorithm
10.4.3 Relation to other models
10.4.4 Automata
10.4.5 Summary
10.5 Other Methods, Other Languages
10.5.1 Other approaches to tagging
10.5.2 Languages other than English
10.6 Tagging Accuracy & Uses of Taggers
10.6.1 Tagging accuracy
10.6.2 Applications of tagging
10.7 Further Reading
10.8 Exercises
Ch11 Probabilistic Context Free Grammars
11.1 Some Features of PCFGs
11.2 Questions for PCFGs
11.3 Probability of String
11.3.1 Using inside probabilities
11.3.2 Using outside probabilities
11.3.3 Finding most likely parse for sentence
11.3.4 Training PCFG
11.4 Problems with Inside-Outside Algorithm
11.5 Further Reading
11.6 Exercises
Ch12 Probabilistic Parsing
12.1 Some Concepts
12.1.1 Parsing for disambiguation
12.1.2 Treebanks
12.1.3 Parsing models vs language models
12.1.4 Weakening independence assumptions of PCFGs
12.1.5 Tree probabilities & derivational probabilities
12.1.6 There's more than one way to do it
12.1.7 Phrase structure grammars & dependency grammars
12.1.8 Evaluation
12.1.9 Equivalent models
12.1.10 Building parsers: Search methods
12.1.11 Use of geometric mean
12.2 Some Approaches
12.2.1 Non-lexicalized treebank grammars
12.2.2 Lexicalized models using derivational histories
12.2.3 Dependency-based models
12.2.4 Discussion
12.3 Further Reading
12.4 Exercises
Part4 Applications & Techniques
Ch13 Statistical Alignment & Machine Translation
13.1 Text Alignment
13.1.1 Aligning sentences & paragraphs
13.1.2 Length-based methods
13.1.3 Offset alignment by signal processing techniques
13.1.4 Lexical methods of sentence alignment
13.1.5 Summary
13.1.6 Exercises
13.2 Word Alignment
13.3 Statistical Machine Translation
13.4 Further Reading
Ch14 Clustering
14.1 Hierarchical Clustering
14.1.1 Single-link & complete-link clustering
14.1.2 Group-average agglomerative clustering
14.1.3 Application: Improving language model
14.1.4 Top-down clustering
14.2 Non-Hierarchical Clustering
14.2.1 K-means
14.2.2 EM algorithm
14.3 Further Reading
14.4 Exercises
Ch15 Topics in Information Retrieval
15.1 Some Background on Information Retrieval
15.1.1 Common design features of IR systems
15.1.2 Evaluation measures
15.1.3 Probability ranking principle (PRP)
15.2 Vector Space Model
15.2.1 Vector similarity
15.2.2 Term weighting
15.3 Term Distribution Models
15.3.1 Poisson distribution
15.3.2 Two-Poisson model
15.3.3 K mixture
15.3.4 Inverse document frequency
15.3.5 Residual inverse document frequency
15.3.6 Usage of term distribution models
15.4 Latent Semantic Indexing
15.4.1 Least-squares methods
15.4.2 Singular Value Decomposition
15.4.3 Latent Semantic Indexing in IR
15.5 Discourse Segmentation
15.5.1 TextTiling
15.6 Further Reading
15.7 Exercises
Ch16 Text Categorization
16.1 Decision Trees
16.2 Maximum Entropy Modeling
16.2.1 Generalized iterative scaling
16.2.2 Application to text categorization
16.3 Perceptrons
16.4 k Nearest Neighbor Classification
16.5 Further Reading
Tiny Statistical Tables
Bibliography
Index