Build advanced natural language understanding systems by acquiring data and selecting appropriate technology.
Key Features
• Master NLU concepts from basic text processing to advanced deep learning techniques
• Explore practical NLU applications like chatbots, sentiment analysis, and language translation
• Gain a deeper understanding of large language models like ChatGPT
Book Description
Natural language understanding (NLU) organizes and structures language, allowing computer systems to effectively process textual information for many different practical applications. Natural Language Understanding with Python will help you explore practical techniques that make use of NLU to build a wide variety of creative and useful applications.
Complete with step-by-step explanations of essential concepts and practical examples, this book begins by teaching you about NLU and its applications. You'll then explore a wide range of current NLU techniques and their most appropriate use cases. In the process, you'll be introduced to the most useful Python NLU libraries. Not only will you learn the basics of NLU, but you'll also be introduced to practical issues such as acquiring data, evaluating systems, and deploying NLU applications, along with their solutions. This book is a comprehensive guide that will help you explore the full spectrum of essential NLU techniques and resources.
By the end of this book, you will be familiar with the foundational concepts of NLU, deep learning, and large language models (LLMs). You will be well on your way to having the skills to independently apply NLU technology in your own academic and practical applications.
What you will learn
• The most important skill that readers will acquire is not just HOW to apply natural language techniques, but WHY to select particular techniques.
• The book will also cover important practical considerations concerning acquiring real data and evaluating real system performance, not just performing textbook evaluations with pre-existing corpora.
• After reading this book and studying the code, readers will be equipped to build state-of-the-art as well as practical natural language applications to solve real problems.
• How to develop and fine-tune an NLP application
• Maintaining NLP applications after deployment
Who this book is for
This book is for Python developers, computational linguists, linguists, data scientists, NLP developers, conversational AI developers, and students looking to learn about natural language understanding (NLU) and apply natural language processing (NLP) technology to real problems. Anyone interested in addressing natural language problems will find this book useful. Working knowledge of Python is a must.
Author(s): Deborah A. Dahl
Edition: 1
Publisher: Packt Publishing
Year: 2023
Language: English
Commentary: Publisher's PDF
Pages: 326
City: Birmingham, UK
Tags: Artificial Intelligence; Machine Learning; Neural Networks; Deep Learning; Natural Language Processing; Unsupervised Learning; Python; Support Vector Machines; Data Visualization; Keras; NLTK; Semantic Analysis; Natural Language Understanding; Text Classification; Regular Expressions; Transformers; Naïve Bayes; BERT; Data Exploration; spaCy; GPT-3; ChatGPT; Large Language Models
Cover
Title Page
Copyright
Dedication
Contributors
Table of Contents
Preface
Part 1: Getting Started with Natural Language Understanding Technology
Chapter 1: Natural Language Understanding, Related Technologies, and Natural Language Applications
Understanding the basics of natural language
Global considerations – languages, encodings, and translations
The relationship between conversational AI and NLP
Exploring interactive applications – chatbots and voice assistants
Generic voice assistants
Enterprise assistants
Translation
Education
Exploring non-interactive applications
Classification
Sentiment analysis
Spam and phishing detection
Fake news detection
Document retrieval
Analytics
Information extraction
Translation
Summarization, authorship, correcting grammar, and other applications
A summary of the types of applications
A look ahead – Python for NLP
Summary
Chapter 2: Identifying Practical Natural Language Understanding Problems
Identifying problems that are the appropriate level of difficulty for the technology
Looking at difficult applications of NLU
Looking at applications that don’t need NLP
Training data
Application data
Taking development costs into account
Taking maintenance costs into account
A flowchart for deciding on NLU applications
Summary
Part 2: Developing and Testing Natural Language Understanding Systems
Chapter 3: Approaches to Natural Language Understanding – Rule-Based Systems, Machine Learning, and Deep Learning
Rule-based approaches
Words and lexicons
Part-of-speech tagging
Grammar
Parsing
Semantic analysis
Pragmatic analysis
Pipelines
Traditional machine learning approaches
Representing documents
Classification
Deep learning approaches
Pre-trained models
Considerations for selecting technologies
Summary
Chapter 4: Selecting Libraries and Tools for Natural Language Understanding
Technical requirements
Installing Python
Developing software – JupyterLab and GitHub
JupyterLab
GitHub
Exploring the libraries
Using NLTK
Using spaCy
Using Keras
Learning about other NLP libraries
Choosing among NLP libraries
Learning about other packages useful for NLP
Looking at an example
Setting up JupyterLab
Processing one sentence
Looking at corpus properties
Summary
Chapter 5: Natural Language Data – Finding and Preparing Data
Finding sources of data and annotating it
Finding data for your own application
Finding data for a research project
Metadata
Generally available corpora
Ensuring privacy and observing ethical considerations
Ensuring the privacy of training data
Ensuring the privacy of runtime data
Treating human subjects ethically
Treating crowdworkers ethically
Preprocessing data
Removing non-text
Regularizing text
Spelling correction
Application-specific types of preprocessing
Substituting class labels for words and numbers
Redaction
Domain-specific stopwords
Remove HTML markup
Data imbalance
Using text preprocessing pipelines
Choosing among preprocessing techniques
Summary
Chapter 6: Exploring and Visualizing Data
Why visualize?
Text document dataset – Sentence Polarity Dataset
Data exploration
Frequency distributions
Measuring the similarities among documents
General considerations for developing visualizations
Using information from visualization to make decisions about processing
Summary
Chapter 7: Selecting Approaches and Representing Data
Selecting NLP approaches
Fitting the approach to the task
Starting with the data
Considering computational efficiency
Initial studies
Representing language for NLP applications
Symbolic representations
Representing language numerically with vectors
Understanding vectors for document representation
Representing words with context-independent vectors
Word2Vec
Representing words with context-dependent vectors
Summary
Chapter 8: Rule-Based Techniques
Rule-based techniques
Why use rules?
Exploring regular expressions
Recognizing, parsing, and replacing strings with regular expressions
General tips for using regular expressions
Word-level analysis
Lemmatization
Ontologies
Sentence-level analysis
Syntactic analysis
Semantic analysis and slot filling
Summary
Chapter 9: Machine Learning Part 1 – Statistical Machine Learning
A quick overview of evaluation
Representing documents with TF-IDF and classifying with Naïve Bayes
Summary of TF-IDF
Classifying texts with Naïve Bayes
TF-IDF/Bayes classification example
Classifying documents with Support Vector Machines (SVMs)
Slot-filling with CRFs
Representing slot-tagged data
Summary
Chapter 10: Machine Learning Part 2 – Neural Networks and Deep Learning Techniques
Basics of NNs
Example – MLP for classification
Hyperparameters and tuning
Moving beyond MLPs – RNNs
Looking at another approach – CNNs
Summary
Chapter 11: Machine Learning Part 3 – Transformers and Large Language Models
Technical requirements
Overview of transformers and LLMs
Introducing attention
Applying attention in transformers
Leveraging existing data – LLMs or pre-trained models
BERT and its variants
Using BERT – a classification example
Installing the data
Splitting the data into training, validation, and testing sets
Loading the BERT model
Defining the model for fine-tuning
Defining the loss function and metrics
Defining the optimizer and the number of epochs
Compiling the model
Training the model
Plotting the training process
Evaluating the model on the test data
Saving the model for inference
Cloud-based LLMs
ChatGPT
Applying GPT-3
Summary
Chapter 12: Applying Unsupervised Learning Approaches
What is unsupervised learning?
Topic modeling using clustering techniques and label derivation
Grouping semantically similar documents
Applying BERTopic to 20 newsgroups
After clustering and topic labeling
Making the most of data with weak supervision
Summary
Chapter 13: How Well Does It Work? – Evaluation
Why evaluate an NLU system?
Evaluation paradigms
Comparing system results on standard metrics
Evaluating language output
Leaving out part of a system – ablation
Shared tasks
Data partitioning
Evaluation metrics
Accuracy and error rate
Precision, recall, and F1
The receiver operating characteristic and area under the curve
Confusion matrix
User testing
Statistical significance of differences
Comparing three text classification methods
A small transformer system
TF-IDF evaluation
A larger BERT model
Summary
Part 3: Systems in Action – Applying Natural Language Understanding at Scale
Chapter 14: What to Do If the System Isn’t Working
Technical requirements
Figuring out that a system isn’t working
Initial development
Fixing accuracy problems
Changing data
Restructuring an application
Moving on to deployment
Problems after deployment
Summary
Chapter 15: Summary and Looking to the Future
Overview of the book
Potential for improvement – better accuracy and faster training
Better accuracy
Faster training
Other areas for improvement
Applications that are beyond the current state of the art
Processing very long documents
Understanding and creating videos
Interpreting and generating sign languages
Writing compelling fiction
Future directions in NLU technology and research
Quickly extending NLU technologies to new languages
Real-time speech-to-speech translation
Multimodal interaction
Detecting and correcting bias
Summary
Further reading
Index
About Packt
Other Books You May Enjoy