Author: Jacob Eisenstein
Year: 2018
Language: English
Tags: CS 7650; computer science; artificial intelligence; machine learning; deep learning
Contents
Preface
Background
How to use this book
Introduction
Natural language processing and its neighbors
Three themes in natural language processing
Learning and knowledge
Search and learning
Relational, compositional, and distributional perspectives
Learning
Linear text classification
The bag of words
Naïve Bayes
Types and tokens
Prediction
Estimation
Smoothing
Setting hyperparameters
Discriminative learning
Perceptron
Averaged perceptron
Loss functions and large-margin classification
Online large margin classification
*Derivation of the online support vector machine
Logistic regression
Regularization
Gradients
Optimization
Batch optimization
Online optimization
*Additional topics in classification
Feature selection by regularization
Other views of logistic regression
Summary of learning algorithms
Nonlinear classification
Feedforward neural networks
Designing neural networks
Activation functions
Network structure
Outputs and loss functions
Inputs and lookup layers
Learning neural networks
Backpropagation
Regularization and dropout
*Learning theory
Tricks
Convolutional neural networks
Linguistic applications of classification
Sentiment and opinion analysis
Related problems
Alternative approaches to sentiment analysis
Word sense disambiguation
How many word senses?
Word sense disambiguation as classification
Design decisions for text classification
What is a word?
How many words?
Count or binary?
Evaluating classifiers
Precision, recall, and F-measure
Threshold-free metrics
Classifier comparison and statistical significance
*Multiple comparisons
Building datasets
Metadata as labels
Labeling data
Learning without supervision
Unsupervised learning
K-means clustering
Expectation-Maximization (EM)
EM as an optimization algorithm
How many clusters?
Applications of expectation-maximization
Word sense induction
Semi-supervised learning
Multi-component modeling
Semi-supervised learning
Multi-view learning
Graph-based algorithms
Domain adaptation
Supervised domain adaptation
Unsupervised domain adaptation
*Other approaches to learning with latent variables
Sampling
Spectral learning
Sequences and trees
Language models
N-gram language models
Smoothing and discounting
Smoothing
Discounting and backoff
*Interpolation
*Kneser-Ney smoothing
Recurrent neural network language models
Backpropagation through time
Hyperparameters
Gated recurrent neural networks
Evaluating language models
Held-out likelihood
Perplexity
Out-of-vocabulary words
Sequence labeling
Sequence labeling as classification
Sequence labeling as structure prediction
The Viterbi algorithm
Example
Higher-order features
Hidden Markov Models
Estimation
Inference
Discriminative sequence labeling with features
Structured perceptron
Structured support vector machines
Conditional random fields
Neural sequence labeling
Recurrent neural networks
Character-level models
Convolutional neural networks for sequence labeling
*Unsupervised sequence labeling
Linear dynamical systems
Alternative unsupervised learning methods
Semiring notation and the generalized Viterbi algorithm
Applications of sequence labeling
Part-of-speech tagging
Parts-of-speech
Accurate part-of-speech tagging
Morphosyntactic attributes
Named entity recognition
Tokenization
Code switching
Dialogue acts
Formal language theory
Regular languages
Finite state acceptors
Morphology as a regular language
Weighted finite state acceptors
Finite state transducers
*Learning weighted finite state automata
Context-free languages
Context-free grammars
Natural language syntax as a context-free language
A phrase-structure grammar for English
Grammatical ambiguity
*Mildly context-sensitive languages
Context-sensitive phenomena in natural language
Combinatory categorial grammar
Context-free parsing
Deterministic bottom-up parsing
Recovering the parse tree
Non-binary productions
Complexity
Ambiguity
Parser evaluation
Local solutions
Weighted context-free grammars
Parsing with weighted context-free grammars
Probabilistic context-free grammars
*Semiring weighted context-free grammars
Learning weighted context-free grammars
Probabilistic context-free grammars
Feature-based parsing
*Conditional random field parsing
Neural context-free grammars
Grammar refinement
Parent annotations and other tree transformations
Lexicalized context-free grammars
*Refinement grammars
Beyond context-free parsing
Reranking
Transition-based parsing
Dependency parsing
Dependency grammar
Heads and dependents
Labeled dependencies
Dependency subtrees and constituents
Graph-based dependency parsing
Graph-based parsing algorithms
Computing scores for dependency arcs
Learning
Transition-based dependency parsing
Transition systems for dependency parsing
Scoring functions for transition-based parsers
Learning to parse
Applications
Meaning
Logical semantics
Meaning and denotation
Logical representations of meaning
Propositional logic
First-order logic
Semantic parsing and the lambda calculus
The lambda calculus
Quantification
Learning semantic parsers
Learning from derivations
Learning from logical forms
Learning from denotations
Predicate-argument semantics
Semantic roles
VerbNet
Proto-roles and PropBank
FrameNet
Semantic role labeling
Semantic role labeling as classification
Semantic role labeling as constrained optimization
Neural semantic role labeling
Abstract Meaning Representation
AMR parsing
Distributional and distributed semantics
The distributional hypothesis
Design decisions for word representations
Representation
Context
Estimation
Latent semantic analysis
Brown clusters
Neural word embeddings
Continuous bag-of-words (CBOW)
Skipgrams
Computational complexity
Word embeddings as matrix factorization
Evaluating word embeddings
Intrinsic evaluations
Extrinsic evaluations
Fairness and bias
Distributed representations beyond distributional statistics
Word-internal structure
Lexical semantic resources
Distributed representations of multiword units
Purely distributional methods
Distributional-compositional hybrids
Supervised compositional methods
Hybrid distributed-symbolic representations
Reference resolution
Forms of referring expressions
Pronouns
Proper nouns
Nominals
Algorithms for coreference resolution
Mention-pair models
Mention-ranking models
Transitive closure in mention-based models
Entity-based models
Representations for coreference resolution
Features
Distributed representations of mentions and entities
Evaluating coreference resolution
Discourse
Segments
Topic segmentation
Functional segmentation
Entities and reference
Centering theory
The entity grid
*Formal semantics beyond the sentence level
Relations
Shallow discourse relations
Hierarchical discourse relations
Argumentation
Applications of discourse relations
Applications
Information extraction
Entities
Entity linking by learning to rank
Collective entity linking
*Pairwise ranking loss functions
Relations
Pattern-based relation extraction
Relation extraction as a classification task
Knowledge base population
Open information extraction
Events
Hedges, denials, and hypotheticals
Question answering and machine reading
Formal semantics
Machine reading
Machine translation
Machine translation as a task
Evaluating translations
Data
Statistical machine translation
Statistical translation modeling
Estimation
Phrase-based translation
*Syntax-based translation
Neural machine translation
Neural attention
*Neural machine translation without recurrence
Out-of-vocabulary words
Decoding
Training towards the evaluation metric
Text generation
Data-to-text generation
Latent data-to-text alignment
Neural data-to-text generation
Text-to-text generation
Neural abstractive summarization
Sentence fusion for multi-document summarization
Dialogue
Finite-state and agenda-based dialogue systems
Markov decision processes
Neural chatbots
Probability
Probabilities of event combinations
Probabilities of disjoint events
Law of total probability
Conditional probability and Bayes' rule
Independence
Random variables
Expectations
Modeling and estimation
Numerical optimization
Gradient descent
Constrained optimization
Example: Passive-aggressive online learning
Bibliography