Many books and courses tackle natural language processing (NLP) problems with toy use cases and well-defined datasets. But if you want to build, iterate, and scale NLP systems in a business setting and tailor them for particular industry verticals, this is your guide. Software engineers and data scientists will learn how to navigate the maze of options available at each step of the journey.
Through the course of the book, authors Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta, and Harshit Surana will guide you through the process of building real-world NLP solutions embedded in larger product setups. You'll learn how to adapt your solutions for different industry verticals such as healthcare, social media, and retail.
With this book, you'll:
Understand the wide spectrum of problem statements, tasks, and solution approaches within NLP
Implement and evaluate different NLP applications using machine learning and deep learning methods
Fine-tune your NLP solution based on your business problem and industry vertical
Evaluate various algorithms and approaches for NLP product tasks, datasets, and stages
Produce software solutions following best practices around release, deployment, and DevOps for NLP systems
Author(s): Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta, Harshit Surana
Publisher: O'Reilly Media, Inc.
Year: 2020
Language: English
Pages: 325
Cover
Copyright
Table of Contents
Foreword
Preface
Why We Wrote This Book
The Philosophy
Scope
Who Should Read This Book
What You Will Learn
Structure of the Book
How to Read This Book
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Further Information
Acknowledgments
Part I. Foundations
Chapter 1. NLP: A Primer
NLP in the Real World
NLP Tasks
What Is Language?
Building Blocks of Language
Why Is NLP Challenging?
Machine Learning, Deep Learning, and NLP: An Overview
Approaches to NLP
Heuristics-Based NLP
Machine Learning for NLP
Deep Learning for NLP
Why Deep Learning Is Not Yet the Silver Bullet for NLP
An NLP Walkthrough: Conversational Agents
Wrapping Up
Chapter 2. NLP Pipeline
Data Acquisition
Text Extraction and Cleanup
HTML Parsing and Cleanup
Unicode Normalization
Spelling Correction
System-Specific Error Correction
Pre-Processing
Preliminaries
Frequent Steps
Other Pre-Processing Steps
Advanced Processing
Feature Engineering
Classical NLP/ML Pipeline
DL Pipeline
Modeling
Start with Simple Heuristics
Building Your Model
Building THE Model
Evaluation
Intrinsic Evaluation
Extrinsic Evaluation
Post-Modeling Phases
Deployment
Monitoring
Model Updating
Working with Other Languages
Case Study
Wrapping Up
Chapter 3. Text Representation
Vector Space Models
Basic Vectorization Approaches
One-Hot Encoding
Bag of Words
Bag of N-Grams
TF-IDF
Distributed Representations
Word Embeddings
Going Beyond Words
Distributed Representations Beyond Words and Characters
Universal Text Representations
Visualizing Embeddings
Handcrafted Feature Representations
Wrapping Up
Part II. Essentials
Chapter 4. Text Classification
Applications
A Pipeline for Building Text Classification Systems
A Simple Classifier Without the Text Classification Pipeline
Using Existing Text Classification APIs
One Pipeline, Many Classifiers
Naive Bayes Classifier
Logistic Regression
Support Vector Machine
Using Neural Embeddings in Text Classification
Word Embeddings
Subword Embeddings and fastText
Document Embeddings
Deep Learning for Text Classification
CNNs for Text Classification
LSTMs for Text Classification
Text Classification with Large, Pre-Trained Language Models
Interpreting Text Classification Models
Explaining Classifier Predictions with Lime
Learning with No or Less Data and Adapting to New Domains
No Training Data
Less Training Data: Active Learning and Domain Adaptation
Case Study: Corporate Ticketing
Practical Advice
Wrapping Up
Chapter 5. Information Extraction
IE Applications
IE Tasks
The General Pipeline for IE
Keyphrase Extraction
Implementing KPE
Practical Advice
Named Entity Recognition
Building an NER System
NER Using an Existing Library
NER Using Active Learning
Practical Advice
Named Entity Disambiguation and Linking
NEL Using Azure API
Relationship Extraction
Approaches to RE
RE with the Watson API
Other Advanced IE Tasks
Temporal Information Extraction
Event Extraction
Template Filling
Case Study
Wrapping Up
Chapter 6. Chatbots
Applications
A Simple FAQ Bot
A Taxonomy of Chatbots
Goal-Oriented Dialog
Chitchats
A Pipeline for Building Dialog Systems
Dialog Systems in Detail
PizzaStop Chatbot
Deep Dive into Components of a Dialog System
Dialog Act Classification
Identifying Slots
Response Generation
Dialog Examples with Code Walkthrough
Other Dialog Pipelines
End-to-End Approach
Deep Reinforcement Learning for Dialogue Generation
Human-in-the-Loop
Rasa NLU
A Case Study: Recipe Recommendations
Utilizing Existing Frameworks
Open-Ended Generative Chatbots
Wrapping Up
Chapter 7. Topics in Brief
Search and Information Retrieval
Components of a Search Engine
A Typical Enterprise Search Pipeline
Setting Up a Search Engine: An Example
A Case Study: Book Store Search
Topic Modeling
Training a Topic Model: An Example
What’s Next?
Text Summarization
Summarization Use Cases
Setting Up a Summarizer: An Example
Practical Advice
Recommender Systems for Textual Data
Creating a Book Recommender System: An Example
Practical Advice
Machine Translation
Using a Machine Translation API: An Example
Practical Advice
Question-Answering Systems
Developing a Custom Question-Answering System
Looking for Deeper Answers
Wrapping Up
Part III. Applied
Chapter 8. Social Media
Applications
Unique Challenges
NLP for Social Data
Word Cloud
Tokenizer for SMTD
Trending Topics
Understanding Twitter Sentiment
Pre-Processing SMTD
Text Representation for SMTD
Customer Support on Social Channels
Memes and Fake News
Identifying Memes
Fake News
Wrapping Up
Chapter 9. E-Commerce and Retail
E-Commerce Catalog
Review Analysis
Product Search
Product Recommendations
Search in E-Commerce
Building an E-Commerce Catalog
Attribute Extraction
Product Categorization and Taxonomy
Product Enrichment
Product Deduplication and Matching
Review Analysis
Sentiment Analysis
Aspect-Level Sentiment Analysis
Connecting Overall Ratings to Aspects
Understanding Aspects
Recommendations for E-Commerce
A Case Study: Substitutes and Complements
Wrapping Up
Chapter 10. Healthcare, Finance, and Law
Healthcare
Health and Medical Records
Patient Prioritization and Billing
Pharmacovigilance
Clinical Decision Support Systems
Health Assistants
Electronic Health Records
Mental Healthcare Monitoring
Medical Information Extraction and Analysis
Finance and Law
NLP Applications in Finance
NLP and the Legal Landscape
Wrapping Up
Part IV. Bringing It All Together
Chapter 11. The End-to-End NLP Process
Revisiting the NLP Pipeline: Deploying NLP Software
An Example Scenario
Building and Maintaining a Mature System
Finding Better Features
Iterating Existing Models
Code and Model Reproducibility
Troubleshooting and Interpretability
Monitoring
Minimizing Technical Debt
Automating Machine Learning
The Data Science Process
The KDD Process
Microsoft Team Data Science Process
Making AI Succeed at Your Organization
Team
Right Problem and Right Expectations
Data and Timing
A Good Process
Other Aspects
Peeking over the Horizon
Final Words
Index
About the Authors
Colophon