Natural Language Processing


https://github.com/jacobeisenstein/gt-nlp-class/tree/master/notes

These notes are the basis for the readings in CS4650 and CS7650 ("Natural Language") at Georgia Tech. The latest version is eisenstein-nlp-notes.pdf. These notes are under contract with MIT Press and are posted here under the Creative Commons license CC-BY-NC-ND; more information about this license is available at https://creativecommons.org/licenses/. One aspect of the license is that translations require permission, which must be negotiated with MIT Press.

You are welcome to use these notes and slides for your own learning or teaching, provided you give attribution. If you use this material in another class, please tell me! You can reach me directly at the username jacobe, at the website gmail dot com. Feedback of any kind is welcome. I've gotten a few GitHub issues correcting typos, which I appreciate; please add your full name in the issue if you want to be acknowledged.

Author(s): Jacob Eisenstein
Edition: 1
Publisher: MIT Press
Year: 2018

Language: English
Pages: 591

Contents......Page 3
Background......Page 13
How to use this book......Page 14
Natural language processing and its neighbors......Page 19
Learning and knowledge......Page 24
Search and learning......Page 25
Relational, compositional, and distributional perspectives......Page 27
Learning......Page 29
The bag of words......Page 31
Naïve Bayes......Page 35
Types and tokens......Page 37
Prediction......Page 38
Estimation......Page 39
Smoothing......Page 40
Setting hyperparameters......Page 41
Discriminative learning......Page 42
Perceptron......Page 43
Loss functions and large-margin classification......Page 45
Online large margin classification......Page 48
*Derivation of the online support vector machine......Page 50
Logistic regression......Page 53
Regularization......Page 54
Optimization......Page 55
Batch optimization......Page 56
Online optimization......Page 57
Other views of logistic regression......Page 59
Summary of learning algorithms......Page 61
Nonlinear classification......Page 65
Feedforward neural networks......Page 66
Activation functions......Page 68
Network structure......Page 69
Outputs and loss functions......Page 70
Learning neural networks......Page 71
Backpropagation......Page 73
Regularization and dropout......Page 75
*Learning theory......Page 76
Tricks......Page 77
Convolutional neural networks......Page 80
Sentiment and opinion analysis......Page 87
Related problems......Page 88
Alternative approaches to sentiment analysis......Page 90
Word sense disambiguation......Page 91
How many word senses?......Page 92
Word sense disambiguation as classification......Page 93
What is a word?......Page 94
How many words?......Page 97
Evaluating classifiers......Page 98
Precision, recall, and F-measure......Page 99
Threshold-free metrics......Page 101
Classifier comparison and statistical significance......Page 102
*Multiple comparisons......Page 105
Labeling data......Page 106
Unsupervised learning......Page 113
K-means clustering......Page 114
Expectation-Maximization (EM)......Page 116
EM as an optimization algorithm......Page 120
How many clusters?......Page 121
Word sense induction......Page 122
Semi-supervised learning......Page 123
Multi-component modeling......Page 124
Semi-supervised learning......Page 125
Multi-view learning......Page 126
Graph-based algorithms......Page 127
Domain adaptation......Page 128
Supervised domain adaptation......Page 129
Unsupervised domain adaptation......Page 130
*Other approaches to learning with latent variables......Page 132
Sampling......Page 133
Spectral learning......Page 135
Sequences and trees......Page 141
Language models......Page 143
N-gram language models......Page 144
Smoothing......Page 147
Discounting and backoff......Page 148
*Interpolation......Page 149
Recurrent neural network language models......Page 151
Backpropagation through time......Page 154
Gated recurrent neural networks......Page 155
Held-out likelihood......Page 157
Perplexity......Page 158
Out-of-vocabulary words......Page 159
Sequence labeling as classification......Page 163
Sequence labeling as structure prediction......Page 165
The Viterbi algorithm......Page 167
Example......Page 170
Hidden Markov Models......Page 171
Inference......Page 173
Discriminative sequence labeling with features......Page 175
Structured support vector machines......Page 178
Conditional random fields......Page 180
Recurrent neural networks......Page 185
Character-level models......Page 187
*Unsupervised sequence labeling......Page 188
Semiring notation and the generalized Viterbi algorithm......Page 190
Part-of-speech tagging......Page 193
Parts-of-Speech......Page 194
Accurate part-of-speech tagging......Page 198
Morphosyntactic Attributes......Page 200
Named Entity Recognition......Page 201
Tokenization......Page 203
Code switching......Page 204
Dialogue acts......Page 205
Formal language theory......Page 209
Regular languages......Page 210
Finite state acceptors......Page 211
Morphology as a regular language......Page 212
Weighted finite state acceptors......Page 214
Finite state transducers......Page 219
*Learning weighted finite state automata......Page 224
Context-free languages......Page 225
Context-free grammars......Page 226
Natural language syntax as a context-free language......Page 229
A phrase-structure grammar for English......Page 231
*Mildly context-sensitive languages......Page 236
Context-sensitive phenomena in natural language......Page 237
Combinatory categorial grammar......Page 238
Context-free parsing......Page 243
Deterministic bottom-up parsing......Page 244
Non-binary productions......Page 245
Ambiguity......Page 247
Parser evaluation......Page 248
Local solutions......Page 249
Weighted Context-Free Grammars......Page 250
Parsing with weighted context-free grammars......Page 252
Probabilistic context-free grammars......Page 253
*Semiring weighted context-free grammars......Page 255
Probabilistic context-free grammars......Page 256
Feature-based parsing......Page 257
*Conditional random field parsing......Page 258
Grammar refinement......Page 260
Parent annotations and other tree transformations......Page 261
Lexicalized context-free grammars......Page 262
*Refinement grammars......Page 266
Reranking......Page 268
Transition-based parsing......Page 269
Dependency grammar......Page 275
Heads and dependents......Page 276
Labeled dependencies......Page 277
Dependency subtrees and constituents......Page 278
Graph-based dependency parsing......Page 280
Graph-based parsing algorithms......Page 282
Computing scores for dependency arcs......Page 283
Learning......Page 285
Transition-based dependency parsing......Page 286
Transition systems for dependency parsing......Page 287
Scoring functions for transition-based parsers......Page 291
Learning to parse......Page 292
Applications......Page 295
Meaning......Page 301
Logical semantics......Page 303
Meaning and denotation......Page 304
Propositional logic......Page 305
First-order logic......Page 306
Semantic parsing and the lambda calculus......Page 309
The lambda calculus......Page 310
Quantification......Page 311
Learning semantic parsers......Page 314
Learning from derivations......Page 315
Learning from logical forms......Page 317
Learning from denotations......Page 319
Predicate-argument semantics......Page 323
Semantic roles......Page 325
VerbNet......Page 326
Proto-roles and PropBank......Page 327
FrameNet......Page 328
Semantic role labeling as classification......Page 330
Semantic role labeling as constrained optimization......Page 333
Neural semantic role labeling......Page 335
Abstract Meaning Representation......Page 336
AMR Parsing......Page 339
The distributional hypothesis......Page 343
Representation......Page 345
Context......Page 346
Latent semantic analysis......Page 347
Brown clusters......Page 349
Continuous bag-of-words (CBOW)......Page 352
Computational complexity......Page 353
Word embeddings as matrix factorization......Page 355
Evaluating word embeddings......Page 356
Extrinsic evaluations......Page 357
Fairness and bias......Page 358
Word-internal structure......Page 359
Lexical semantic resources......Page 361
Purely distributional methods......Page 362
Distributional-compositional hybrids......Page 363
Hybrid distributed-symbolic representations......Page 364
Reference Resolution......Page 369
Pronouns......Page 370
Nominals......Page 375
Algorithms for coreference resolution......Page 376
Mention-pair models......Page 377
Mention-ranking models......Page 378
Transitive closure in mention-based models......Page 379
Entity-based models......Page 380
Features......Page 385
Distributed representations of mentions and entities......Page 388
Evaluating coreference resolution......Page 391
Segments......Page 397
Topic segmentation......Page 398
Entities and reference......Page 399
Centering theory......Page 400
The entity grid......Page 401
*Formal semantics beyond the sentence level......Page 402
Shallow discourse relations......Page 403
Hierarchical discourse relations......Page 407
Argumentation......Page 410
Applications of discourse relations......Page 411
Applications......Page 419
Information extraction......Page 421
Entities......Page 423
Entity linking by learning to rank......Page 424
Collective entity linking......Page 426
*Pairwise ranking loss functions......Page 427
Relations......Page 429
Pattern-based relation extraction......Page 430
Relation extraction as a classification task......Page 431
Knowledge base population......Page 434
Open information extraction......Page 437
Events......Page 438
Hedges, denials, and hypotheticals......Page 440
Formal semantics......Page 442
Machine reading......Page 443
Machine translation as a task......Page 449
Evaluating translations......Page 451
Data......Page 453
Statistical machine translation......Page 454
Statistical translation modeling......Page 455
Estimation......Page 456
Phrase-based translation......Page 457
*Syntax-based translation......Page 459
Neural machine translation......Page 460
Neural attention......Page 462
*Neural machine translation without recurrence......Page 464
Out-of-vocabulary words......Page 466
Decoding......Page 467
Training towards the evaluation metric......Page 469
Data-to-text generation......Page 475
Latent data-to-text alignment......Page 477
Neural data-to-text generation......Page 478
Neural abstractive summarization......Page 482
Sentence fusion for multi-document summarization......Page 484
Finite-state and agenda-based dialogue systems......Page 485
Markov decision processes......Page 486
Neural chatbots......Page 488
Probabilities of event combinations......Page 493
Probabilities of disjoint events......Page 494
Conditional probability and Bayes' rule......Page 495
Independence......Page 497
Random variables......Page 498
Expectations......Page 499
Modeling and estimation......Page 500
Numerical optimization......Page 503
Constrained optimization......Page 504
Example: Passive-aggressive online learning......Page 505
Bibliography......Page 507