This textbook presents an up-to-date and comprehensive overview of Natural Language Processing (NLP), from basic concepts to core algorithms and key applications. It also contains seven step-by-step NLP workshops (total length: 14 hours) offering hands-on practice with essential Python tools such as NLTK, spaCy, TensorFlow Keras, Transformer, and BERT.
The objective of this book is to provide readers with a fundamental grasp of NLP and its core technologies, and to enable them to build their own NLP applications (e.g., chatbot systems) using Python-based NLP tools. It is both a textbook and an NLP tool book intended for the following readers: undergraduate students from various disciplines who want to learn NLP; lecturers and tutors who want to teach NLP or related AI courses and tutorials at the undergraduate/graduate level; and readers with various backgrounds who want to learn NLP and, more importantly, to build workable NLP applications after completing its 14 hours of Python-based workshops.
Natural Language Processing (NLP) and its related applications have become part of daily life with the exponential growth of Artificial Intelligence (AI) over the past decades. NLP applications such as Information Retrieval (IR) systems, text summarization systems, and question-and-answering (chatbot) systems have become prevalent topics in both industry and academia, reshaping daily routines and benefiting a wide array of day-to-day services.
Readers of This Book
This book is both an NLP textbook and an NLP Python implementation guide tailored for:
• Undergraduates and postgraduates from various disciplines, including AI, Computer Science, IT, and Data Science.
• Lecturers and tutors teaching NLP or related AI courses.
• NLP and AI scientists and developers who would like to learn basic NLP concepts and to practice and implement them via Python workshops.
• Readers who would like to learn NLP concepts and practice Python-based NLP workshops using various implementation tools such as NLTK, spaCy, TensorFlow Keras, BERT, and Transformer technology.
How to Use This Book?
This book can serve as a textbook for undergraduate and postgraduate courses on Natural Language Processing, and as a reference book for general readers who would like to learn key technologies and implement NLP applications with contemporary tools such as NLTK, spaCy, TensorFlow, BERT, and Transformer technology.
Author(s): Raymond S. T. Lee
Publisher: Springer
Year: 2023
Language: English
Pages: 454
Preface
Motivation of This Book
Organization and Structure of This Book
Readers of This Book
How to Use This Book?
Acknowledgements
About the Book
Contents
About the Author
Abbreviations
Part I: Concepts and Technology
Chapter 1: Natural Language Processing
1.1 Introduction
1.2 Human Language and Intelligence
1.3 Linguistic Levels of Human Language
1.4 Human Language Ambiguity
1.5 A Brief History of NLP
1.5.1 First Stage: Machine Translation (Before 1960s)
1.5.2 Second Stage: Early AI on NLP from 1960s to 1970s
1.5.3 Third Stage: Grammatical Logic on NLP (1970s–1980s)
1.5.4 Fourth Stage: AI and Machine Learning (1980s–2000s)
1.5.5 Fifth Stage: AI, Big Data, and Deep Networks (2010s–Present)
1.6 NLP and AI
1.7 Main Components of NLP
1.8 Natural Language Understanding (NLU)
1.8.1 Speech Recognition
1.8.2 Syntax Analysis
1.8.3 Semantic Analysis
1.8.4 Pragmatic Analysis
1.9 Potential Applications of NLP
1.9.1 Machine Translation (MT)
1.9.2 Information Extraction (IE)
1.9.3 Information Retrieval (IR)
1.9.4 Sentiment Analysis
1.9.5 Question-Answering (Q&A) Chatbots
References
Chapter 2: N-Gram Language Model
2.1 Introduction
2.2 N-Gram Language Model
2.2.1 Basic NLP Terminology
2.2.2 Language Modeling and Chain Rule
2.3 Markov Chain in N-Gram Model
2.4 Live Example: The Adventures of Sherlock Holmes
2.5 Shannon’s Method in N-Gram Model
2.6 Language Model Evaluation and Smoothing Techniques
2.6.1 Perplexity
2.6.2 Extrinsic Evaluation Scheme
2.6.3 Zero Counts Problems
2.6.4 Smoothing Techniques
2.6.5 Laplace (Add-One) Smoothing
2.6.6 Add-k Smoothing
2.6.7 Backoff and Interpolation Smoothing
2.6.8 Good-Turing Smoothing
References
Chapter 3: Part-of-Speech (POS) Tagging
3.1 What Is Part-of-Speech (POS)?
3.1.1 Nine Major POS in the English Language
3.2 POS Tagging
3.2.1 What Is POS Tagging in Linguistics?
3.2.2 What Is POS Tagging in NLP?
3.2.3 POS Tags Used in the PENN Treebank Project
3.2.4 Why Do We Care About POS in NLP?
3.3 Major Components in NLU
3.3.1 Computational Linguistics and POS
3.3.2 POS and Semantic Meaning
3.3.3 Morphological and Syntactic Definition of POS
3.4 Nine Key POS in English
3.4.1 English Word Classes
3.4.2 What Is a Preposition?
3.4.3 What Is a Conjunction?
3.4.4 What Is a Pronoun?
3.4.5 What Is a Verb?
3.5 Different Types of POS Tagset
3.5.1 What Is a Tagset?
3.5.2 Ambiguity in POS Tags
3.5.3 POS Tagging Using Knowledge
3.6 Approaches for POS Tagging
3.6.1 Rule-Based Approach POS Tagging
3.6.2 Example of Rule-Based POS Tagging
3.6.3 Example of Stochastic-Based POS Tagging
3.6.4 Hybrid Approach for POS Tagging Using Brill Taggers
3.6.4.1 What Is Transformation-Based Learning?
3.6.4.2 Hybrid POS Tagging: Brill Tagger
3.6.4.3 Learning Brill Tagger Transformations
3.7 Taggers Evaluations
3.7.1 How Good Is a POS Tagging Algorithm?
References
Chapter 4: Syntax and Parsing
4.1 Introduction and Motivation
4.2 Syntax Analysis
4.2.1 What Is Syntax?
4.2.2 Syntactic Rules
4.2.3 Common Syntactic Patterns
4.2.4 Importance of Syntax and Parsing in NLP
4.3 Types of Constituents in Sentences
4.3.1 What Is a Constituent?
4.3.2 Kinds of Constituents
4.3.3 Noun-Phrase (NP)
4.3.4 Verb-Phrase (VP)
4.3.5 Complexity on Simple Constituents
4.3.6 Verb Phrase Subcategorization
4.3.7 The Role of Lexicon in Parsing
4.3.8 Recursion in Grammar Rules
4.4 Context-Free Grammar (CFG)
4.4.1 What Is Context-Free Language (CFL)?
4.4.2 What Is Context-Free Grammar (CFG)?
4.4.3 Major Components of CFG
4.4.4 Derivations Using CFG
4.5 CFG Parsing
4.5.1 Morphological Parsing
4.5.2 Phonological Parsing
4.5.3 Syntactic Parsing
4.5.4 Parsing as a Kind of Tree Searching
4.5.5 CFG for Fragment of English
4.5.6 Parse Tree for “Play the Piano” for Prior CFG
4.5.7 Top-Down Parser
4.5.8 Bottom-Up Parser
4.5.9 Control of Parsing
4.5.10 Pros and Cons of Top-Down vs. Bottom-Up Parsing
4.5.10.1 Top-Down Parsing Approach
Pros
Cons
4.5.10.2 Bottom-Up Parsing Approach
Pros
Cons
4.6 Lexical and Probabilistic Parsing
4.6.1 Why Use Probabilities in Parsing?
4.6.2 Semantics with Parsing
4.6.3 What Is PCFG?
4.6.4 A Simple Example of PCFG
4.6.5 Using Probabilities for Language Modeling
4.6.6 Limitations of PCFG
4.6.7 The Fix: Lexicalized Parsing
References
Chapter 5: Meaning Representation
5.1 Introduction
5.2 What Is Meaning?
5.3 Meaning Representations
5.4 Semantic Processing
5.5 Common Meaning Representation
5.5.1 First-Order Predicate Calculus (FOPC)
5.5.2 Semantic Networks
5.5.3 Conceptual Dependency Diagram (CDD)
5.5.4 Frame-Based Representation
5.6 Requirements for Meaning Representation
5.6.1 Verifiability
5.6.2 Ambiguity
5.6.3 Vagueness
5.6.4 Canonical Forms
5.6.4.1 What Is Canonical Form?
5.6.4.2 Canonical Form in Meaning Representation
5.6.4.3 Canonical Forms: Pros and Cons
Advantages
Disadvantages
5.7 Inference
5.7.1 What Is Inference?
5.7.2 Example of Inferencing with FOPC
5.8 Fillmore’s Theory of Universal Cases
5.8.1 What Is Fillmore’s Theory of Universal Cases?
5.8.2 Major Case Roles in Fillmore’s Theory
5.8.3 Complications in Case Roles
5.8.3.1 Selectional Restrictions
5.9 First-Order Predicate Calculus
5.9.1 FOPC Representation Scheme
5.9.2 Major Elements of FOPC
5.9.3 Predicate-Argument Structure of FOPC
5.9.4 Meaning Representation Problems in FOPC
5.9.5 Inferencing Using FOPC
References
Chapter 6: Semantic Analysis
6.1 Introduction
6.1.1 What Is Semantic Analysis?
6.1.2 The Importance of Semantic Analysis in NLP
6.1.3 How Good Are Humans at Semantic Analysis?
6.2 Lexical vs. Compositional Semantic Analysis
6.2.1 What Is Lexical Semantic Analysis?
6.2.2 What Is Compositional Semantic Analysis?
6.3 Word Senses and Relations
6.3.1 What Is Word Sense?
6.3.2 Types of Lexical Semantics
6.3.2.1 Homonymy
6.3.2.2 Polysemy
6.3.2.3 Metonymy
6.3.2.4 Zeugma Test
6.3.2.5 Synonyms
6.3.2.6 Antonyms
6.3.2.7 Hyponymy and Hypernymy
6.3.2.8 Hyponyms and Instances
6.4 Word Sense Disambiguation
6.4.1 What Is Word Sense Disambiguation (WSD)?
6.4.2 Difficulties in Word Sense Disambiguation
6.4.3 Method for Word Sense Disambiguation
6.5 WordNet and Online Thesauri
6.5.1 What Is WordNet?
6.5.2 What Are Synsets?
6.5.3 Knowledge Structure of WordNet
6.5.4 What Are Major Lexical Relations Captured in WordNet?
6.5.5 Applications of WordNet and Thesauri
6.6 Other Online Thesauri: MeSH
6.6.1 What Is MeSH?
6.6.2 Uses of the MeSH Ontology
6.7 Word Similarity and Thesaurus Methods
6.8 Introduction
6.8.1 Path-based Similarity
6.8.2 Problems with Path-based Similarity
6.8.3 Information Content Similarity
6.8.4 The Resnik Method
6.8.5 The Dekang Lin Method
6.8.6 The (Extended) Lesk Algorithm
6.9 Distributed Similarity
6.9.1 Distributional Models of Meaning
6.9.2 Word Vectors
6.9.3 Term-Document Matrix
6.9.4 Point-wise Mutual Information (PMI)
6.9.5 Example of Computing PPMI on a Term-Context Matrix
6.9.6 Weighing PMI Techniques
6.9.7 K-Smoothing in PMI Computation
6.9.8 Context and Word Similarity Measurement
6.9.9 Evaluating Similarity
References
Chapter 7: Pragmatic Analysis and Discourse
7.1 Introduction
7.2 Discourse Phenomena
7.2.1 Coreference Resolution
7.2.2 Why Is It Important?
7.2.3 Coherence and Coreference
7.2.3.1 What Is Coherence?
7.2.3.2 What Is Coreference?
7.2.4 Importance of Coreference Relations
7.2.5 Entity-Based Coherence
7.3 Discourse Segmentation
7.3.1 What Is Discourse Segmentation?
7.3.2 Unsupervised Discourse Segmentation
7.3.3 Hearst’s TextTiling Method
7.3.4 TextTiling Algorithm
7.3.5 Supervised Discourse Segmentation
7.4 Discourse Coherence
7.4.1 What Makes a Text Coherent?
7.4.2 What Is Coherence Relation?
7.4.3 Types of Coherence Relations
7.4.4 Hierarchical Structure of Discourse Coherence
7.4.5 Types of Referring Expressions
7.4.6 Features for Filtering Potential Referents
7.4.7 Preferences in Pronoun Interpretation
7.5 Algorithms for Coreference Resolution
7.5.1 Introduction
7.5.2 Hobbs Algorithm
7.5.2.1 What Is Hobbs Algorithm?
7.5.2.2 Hobbs’ Algorithm
7.5.2.3 Example of Using Hobbs’ Algorithm
7.5.2.4 Performance of Hobbs’ Algorithm
7.5.3 Centering Algorithm
7.5.3.1 What Is Centering Algorithm?
7.5.3.2 Part I: Initial Setting
7.5.3.3 Part II: Constraints
7.5.3.4 Part III: Rules and Algorithm
7.5.3.5 Example of Centering Algorithm
7.5.3.6 Performance of Centering Algorithm
7.5.4 Machine Learning Method
7.5.4.1 What Is the Machine Learning Method?
7.5.4.2 Performance of Log-Linear Model
7.5.4.3 Other Advanced Machine Learning Models
7.6 Evaluation
References
Chapter 8: Transfer Learning and Transformer Technology
8.1 What Is Transfer Learning?
8.2 Motivation of Transfer Learning
8.2.1 Categories of Transfer Learning
8.3 Solutions of Transfer Learning
8.4 Recurrent Neural Network (RNN)
8.4.1 What Is RNN?
8.4.2 Motivation of RNN
8.4.3 RNN Architecture
8.4.4 Long Short-Term Memory (LSTM) Network
8.4.4.1 What Is LSTM?
8.4.4.2 LSTM Architecture
8.4.5 Gated Recurrent Unit (GRU)
8.4.5.1 What Is GRU?
8.4.5.2 GRU Inner Architecture
8.4.6 Bidirectional Recurrent Neural Networks (BRNNs)
8.4.6.1 What Is BRNN?
8.5 Transformer Technology
8.5.1 What Is Transformer?
8.5.2 Transformer Architecture
8.5.2.1 Encoder
8.5.2.2 Decoder
8.5.3 Deep Into Encoder
8.5.3.1 Positional Encoding
8.5.3.2 Self-Attention Mechanism
8.5.3.3 Multi-Head Attention
8.5.3.4 Layer Normalization of Attention Sublayer
8.5.3.5 Feedforward Layer
8.6 BERT
8.6.1 What Is BERT?
8.6.2 Architecture of BERT
8.6.3 Training of BERT
8.6.3.1 Pre-training BERT
8.6.3.2 Next Sentence Prediction (NSP)
8.6.3.3 Fine-tuning BERT
8.7 Other Related Transformer Technology
8.7.1 Transformer-XL
8.7.1.1 Motivation
8.7.1.2 Transformer-XL Technology
8.7.2 ALBERT
References
Chapter 9: Major NLP Applications
9.1 Introduction
9.2 Information Retrieval Systems
9.2.1 Introduction to IR Systems
9.2.2 Vector Space Model in IR
9.2.3 Term Distribution Models in IR
9.2.4 Latent Semantic Indexing in IR
9.2.4.1 Query-Likelihood
9.2.4.2 Document-Likelihood
9.2.5 Discourse Segmentation in IR
9.3 Text Summarization Systems
9.3.1 Introduction to Text Summarization Systems
9.3.1.1 Motivation
9.3.1.2 Task Definition
9.3.1.3 Basic Approach
9.3.1.4 Task Goals
9.3.1.5 Task Sub-processes
9.3.2 Text Summarization Datasets
9.3.3 Types of Summarization Systems
9.3.4 Query-Focused vs. Generic Summarization Systems
9.3.4.1 Query-Focused Summarization Systems
9.3.4.2 Generic Summarization Systems
9.3.5 Single and Multiple Document Summarization
9.3.5.1 Single Document Summarization
9.3.5.2 Multiple Document Summarization
9.3.6 Contemporary Text Summarization Systems
9.3.6.1 Contemporary Extractive Text Summarization (ETS) System
9.3.6.2 Graph-Based Method
9.3.6.3 Feature-Based Method
9.3.6.4 Topic-Based Method
9.3.6.5 Grammar-Based Method
9.3.6.6 Contemporary Abstractive Text Summarization (ATS) System
9.3.6.7 Aided Summarization Method
9.3.6.8 Contemporary Combined Text Summarization System
9.4 Question-and-Answering Systems
9.4.1 QA System and AI
9.4.1.1 Rule-Based QA Systems
9.4.1.2 Information Retrieval (IR)-Based QA Systems
9.4.1.3 Neural Network-Based QA Systems
9.4.2 Overview of Industrial QA Systems
9.4.2.1 AliMe QA System
9.4.2.2 Xiao Ice QA System
9.4.2.3 TransferTransfo Conversational Agents
References
Part II: Natural Language Processing Workshops with Python Implementation in 14 Hours
Chapter 10: Workshop#1 Basics of Natural Language Toolkit (Hour 1–2)
10.1 Introduction
10.2 What Is Natural Language Toolkit (NLTK)?
10.3 A Simple Text Tokenization Example Using NLTK
10.4 How to Install NLTK?
10.5 Why Use Python for NLP?
10.6 NLTK with Basic Text Processing in NLP
10.7 Simple Text Analysis with NLTK
10.8 Text Analysis Using Lexical Dispersion Plot
10.8.1 What Is a Lexical Dispersion Plot?
10.8.2 Lexical Dispersion Plot Over Context Using Sense and Sensibility
10.8.3 Lexical Dispersion Plot Over Time Using Inaugural Address Corpus
10.9 Tokenization in NLP with NLTK
10.9.1 What Is Tokenization in NLP?
10.9.2 Difference Between Tokenize() and Split()
10.9.3 Count Distinct Tokens
10.9.4 Lexical Diversity
10.9.4.1 Token Usage Frequency (Lexical Diversity)
10.9.4.2 Word Usage Frequency
10.10 Basic Statistical Tools in NLTK
10.10.1 Frequency Distribution: FreqDist()
10.10.1.1 FreqDist() as Dictionary Object
10.10.1.2 Access FreqDist of Any Token Type
10.10.1.3 Frequency Distribution Plot from NLTK
10.10.2 Rare Words: Hapax
10.10.3 Collocations
10.10.3.1 What Are Collocations?
10.10.3.2 Collocations in NLTK
References
Chapter 11: Workshop#2 N-grams in NLTK and Tokenization in spaCy (Hour 3–4)
11.1 Introduction
11.2 What Is N-Gram?
11.3 Applications of N-Grams in NLP
11.4 Generation of N-Grams in NLTK
11.5 Generation of N-Grams Statistics
11.6 spaCy in NLP
11.6.1 What Is spaCy?
11.7 How to Install spaCy?
11.8 Tokenization using spaCy
11.8.1 Step 1: Import spaCy Module
11.8.2 Step 2: Load spaCy Module "en_core_web_sm".
11.8.3 Step 3: Open and Read Text File "Adventures_Holmes.txt" Into file_handler "fholmes"
11.8.4 Step 4: Read Adventures of Sherlock Holmes
11.8.5 Step 5: Replace All Newline Symbols
11.8.6 Step 6: Simple Counting
11.8.7 Step 7: Invoke nlp() Method in spaCy
11.8.8 Step 8: Convert Text Document Into Sentence Object
11.8.9 Step 9: Directly Tokenize Text Document
References
Chapter 12: Workshop#3 POS Tagging Using NLTK (Hour 5–6)
12.1 Introduction
12.2 A Revisit on Tokenization with NLTK
12.3 Stemming Using NLTK
12.3.1 What Is Stemming?
12.3.2 Why Stemming?
12.3.3 How to Perform Stemming?
12.3.4 Porter Stemmer
12.3.5 Snowball Stemmer
12.4 Stop-Words Removal with NLTK
12.4.1 What Are Stop-Words?
12.4.2 NLTK Stop-Words List
12.4.3 Try Some Texts
12.4.4 Create Your Own Stop-Words
12.4.4.1 Step 1: Create Own Stop-Word Library List
12.4.4.2 Step 2: Check the Object Type and See That It Is a Simple List
12.4.4.3 Step 3: Study Stop-Word List
12.4.4.4 Step 4: Add New Stop-Word "sampleSW" Using Append()
12.5 Text Analysis with NLTK
12.6 Integration with WordCloud
12.6.1 What Is WordCloud?
12.7 POS Tagging with NLTK
12.7.1 What Is POS Tagging?
12.7.2 Universal POS Tagset
12.7.3 PENN Treebank Tagset (English and Chinese)
12.7.4 Applications of POS Tagging
12.8 Create Own POS Tagger with NLTK
References
Chapter 13: Workshop#4 Semantic Analysis and Word Vectors Using spaCy (Hour 7–8)
13.1 Introduction
13.2 What Are Word Vectors?
13.3 Understanding Word Vectors
13.3.1 Example: A Simple Word Vector
13.4 A Taste of Word Vectors
13.5 Analogies and Vector Operations
13.6 How to Create Word Vectors?
13.7 spaCy Pre-trained Word Vectors
13.8 Similarity Method in Semantic Analysis
13.9 Advanced Semantic Similarity Methods with spaCy
13.9.1 Understanding Semantic Similarity
13.9.2 Euclidean Distance
13.9.3 Cosine Distance and Cosine Similarity
13.9.4 Categorizing Text with Semantic Similarity
13.9.5 Extracting Key Phrases
13.9.6 Extracting and Comparing Named Entities
References
Chapter 14: Workshop#5 Sentiment Analysis and Text Classification with LSTM Using spaCy (Hour 9–10)
14.1 Introduction
14.2 Text Classification with spaCy and LSTM Technology
14.3 Technical Requirements
14.4 Text Classification in a Nutshell
14.4.1 What Is Text Classification?
14.4.2 Text Classification as AI Applications
14.5 Text Classifier with spaCy NLP Pipeline
14.5.1 TextCategorizer Class
14.5.2 Formatting Training Data for the TextCategorizer
14.5.3 System Training
14.5.4 System Testing
14.5.5 Training TextCategorizer for Multi-Label Classification
14.6 Sentiment Analysis with spaCy
14.6.1 IMDB Large Movie Review Dataset
14.6.2 Explore the Dataset
14.6.3 Training the TextClassifier
14.7 Artificial Neural Network in a Nutshell
14.8 An Overview of TensorFlow and Keras
14.9 Sequential Modeling with LSTM Technology
14.10 Keras Tokenizer in NLP
14.10.1 Embedding Words
14.11 Movie Sentiment Analysis with LSTM Using Keras and spaCy
14.11.1 Step 1: Dataset
14.11.2 Step 2: Data and Vocabulary Preparation
14.11.3 Step 3: Implement the Input Layer
14.11.4 Step 4: Implement the Embedding Layer
14.11.5 Step 5: Implement the LSTM Layer
14.11.6 Step 6: Implement the Output Layer
14.11.7 Step 7: System Compilation
14.11.8 Step 8: Model Fitting and Experiment Evaluation
References
Chapter 15: Workshop#6 Transformers with spaCy and TensorFlow (Hour 11–12)
15.1 Introduction
15.2 Technical Requirements
15.3 Transformers and Transfer Learning in a Nutshell
15.4 Why Transformers?
15.5 An Overview of BERT Technology
15.5.1 What Is BERT?
15.5.2 BERT Architecture
15.5.3 BERT Input Format
15.5.4 How to Train BERT?
15.6 Transformers with TensorFlow
15.6.1 HuggingFace Transformers
15.6.2 Using the BERT Tokenizer
15.6.3 Word Vectors in BERT
15.7 Revisit Text Classification Using BERT
15.7.1 Data Preparation
15.7.1.1 Import Related Modules
15.7.1.2 Read emails.csv Datafile
15.7.1.3 Use dropna() to Remove Records with Missing Contents
15.7.2 Start the BERT Model Construction
15.7.2.1 Import BERT Models and Tokenizer
15.7.2.2 Process Input Data with BertTokenizer
15.7.2.3 Double Check Databank to See Whether Data Has
15.7.2.4 Use BERT Tokenizer
15.7.2.5 Define Keras Model Using the Following Lines
15.7.2.6 Perform Model Fitting and Use 1 Epoch to Save Time
15.7.2.7 Review Model Summary
15.8 Transformer Pipeline Technology
15.8.1 Transformer Pipeline for Sentiment Analysis
15.8.2 Transformer Pipeline for QA System
15.9 Transformer and spaCy
References
Chapter 16: Workshop#7 Building Chatbot with TensorFlow and Transformer Technology (Hour 13–14)
16.1 Introduction
16.2 Technical Requirements
16.3 AI Chatbot in a Nutshell
16.3.1 What Is a Chatbot?
16.3.2 What Is a Wake Word in Chatbot?
16.3.2.1 Tailor-Made Wake Word
16.3.2.2 Why Embedded Word Detection?
16.3.3 NLP Components in a Chatbot
16.4 Building Movie Chatbot by Using TensorFlow and Transformer Technology
16.4.1 The Chatbot Dataset
16.4.2 Movie Dialog Preprocessing
16.4.3 Tokenization of Movie Conversation
16.4.4 Filtering and Padding Process
16.4.5 Creation of TensorFlow Movie Dataset Object (mDS)
16.4.6 Calculate Attention Learning Weights
16.4.7 Multi-Head-Attention (MHAttention)
16.4.8 System Implementation
16.4.8.1 Step 1. Implement Masking
16.4.8.2 Step 2. Implement Positional Encoding
16.4.8.3 Step 3. Implement Encoder Layer
16.4.8.4 Step 4. Implement Encoder
16.4.8.5 Step 5. Implement Decoder Layer
16.4.8.6 Step 6. Implement Decoder
16.4.8.7 Step 7. Implement Transformer
16.4.8.8 Step 8. Model Training
16.4.8.9 Step 9. Implement Model Evaluation Function
16.4.8.10 Step 10. Implement Customized Learning Rate
16.4.8.11 Step 11. Compile Chatbot Model
16.4.8.12 Step 12. System Training (Model Fitting)
16.4.8.13 Step 13. System Evaluation and Live Chatting
16.5 Related Works
References
Index