Deep Learning Approach for Natural Language Processing, Speech, and Computer Vision: Techniques and Use Cases

Deep Learning Approach for Natural Language Processing, Speech, and Computer Vision provides an overview of general deep learning methodology and its applications to Natural Language Processing (NLP), speech, and computer vision tasks. It presents the concepts of deep learning in a simple yet comprehensive manner, with full-fledged examples of deep learning models, and aims to bridge the gap between theory and applications through case studies with code, experiments, and supporting analysis.

Voice-based assistants, AI-based chatbots, and advanced driver assistance systems are examples of applications that are becoming more common in daily life. In particular, the success of deep learning across a wide variety of domains has served as a benchmark for many downstream applications in Artificial Intelligence (AI), whose application areas include NLP, speech, and computer vision. Cutting-edge deep learning models have substantially changed the perspectives of these fields. In this book, we attempt to explore the more recent developments of deep learning in NLP, speech, and computer vision. With the knowledge in this book, the reader can understand the intuition behind how natural language, speech, and computer vision applications work.

NLP is a part of AI that enables computers to interpret the meaning of human language. It uses machine learning and deep learning algorithms to derive the context behind raw text. Computer vision applications such as advanced driver assistance systems, augmented reality, virtual reality, and biometrics have advanced significantly. With the advances in deep learning and neural networks, the field of computer vision has made great strides in the last decade and can now match or exceed human performance on some tasks, such as object detection and labeling.

This book gives students, researchers, and industry practitioners, as well as anyone interested in deep learning and NLP, an accessible understanding of the fundamental concepts underlying deep learning algorithms. It serves as a source of motivation for those who want to create NLP, speech, and computer vision applications.

Features:
- Covers the latest developments in deep learning techniques as applied to audio analysis, computer vision, and Natural Language Processing
- Introduces contemporary applications of deep learning techniques as applied to audio, textual, and visual processing
- Explores deep learning frameworks and libraries for NLP, speech, and computer vision in Python
- Gives insights into using the tools and libraries in Python for real-world applications
- Provides easily accessible tutorials and real-world case studies with code to provide hands-on experience

This book is aimed at researchers and graduate students in computer engineering, image, speech, and text processing.

Author(s): L. Ashok Kumar, D. Karthika Renuka
Publisher: CRC Press
Year: 2023

Language: English
Commentary: true
Pages: 246

Cover
Half Title
Title
Copyright
Dedication
Contents
About the Authors
Preface
Acknowledgments
Chapter 1 Introduction
Learning Outcomes
1.1 Introduction
1.1.1 Subsets of Artificial Intelligence
1.1.2 Three Horizons of Deep Learning Applications
1.1.3 Natural Language Processing
1.1.4 Speech Recognition
1.1.5 Computer Vision
1.2 Machine Learning Methods for NLP, Computer Vision (CV), and Speech
1.2.1 Support Vector Machine (SVM)
1.2.2 Bagging
1.2.3 Gradient-boosted Decision Trees (GBDTs)
1.2.4 Naïve Bayes
1.2.5 Logistic Regression
1.2.6 Dimensionality Reduction Techniques
1.3 Tools, Libraries, Datasets, and Resources for the Practitioners
1.3.1 TensorFlow
1.3.2 Keras
1.3.3 Deeplearning4j
1.3.4 Caffe
1.3.5 ONNX
1.3.6 PyTorch
1.3.7 scikit-learn
1.3.8 NumPy
1.3.9 Pandas
1.3.10 NLTK
1.3.11 Gensim
1.3.12 Datasets
1.4 Summary
Bibliography
Chapter 2 Natural Language Processing
Learning Outcomes
2.1 Natural Language Processing
2.2 Generic NLP Pipeline
2.2.1 Data Acquisition
2.2.2 Text Cleaning
2.3 Text Pre-processing
2.3.1 Noise Removal
2.3.2 Stemming
2.3.3 Tokenization
2.3.4 Lemmatization
2.3.5 Stop Word Removal
2.3.6 Parts of Speech Tagging
2.4 Feature Engineering
2.5 Modeling
2.5.1 Start with Simple Heuristics
2.5.2 Building Your Model
2.5.3 Metrics to Build Model
2.6 Evaluation
2.7 Deployment
2.8 Monitoring and Model Updating
2.9 Vector Representation for NLP
2.9.1 One Hot Vector Encoding
2.9.2 Word Embeddings
2.9.3 Bag of Words
2.9.4 TF-IDF
2.9.5 N-gram
2.9.6 Word2Vec
2.9.7 GloVe
2.9.8 ELMo
2.10 Language Modeling with n-grams
2.10.1 Evaluating Language Models
2.10.2 Smoothing
2.10.3 Kneser-Ney Smoothing
2.11 Vector Semantics and Embeddings
2.11.1 Lexical Semantics
2.11.2 Vector Semantics
2.11.3 Cosine for Measuring Similarity
2.11.4 Bias and Embeddings
2.12 Summary
Bibliography
Chapter 3 State-of-the-Art Natural Language Processing
Learning Outcomes
3.1 Introduction
3.2 Sequence-to-Sequence Models
3.2.1 Sequence
3.2.2 Sequence Labeling
3.2.3 Sequence Modeling
3.3 Recurrent Neural Networks
3.3.1 Unrolling RNN
3.3.2 RNN-based POS Tagging Use Case
3.3.3 Challenges in RNN
3.4 Attention Mechanisms
3.4.1 Self-attention Mechanism
3.4.2 Multi-head Attention Mechanism
3.4.3 Bahdanau Attention
3.4.4 Luong Attention
3.4.5 Global Attention versus Local Attention
3.4.6 Hierarchical Attention
3.5 Transformer Model
3.5.1 Bidirectional Encoder Representations from Transformers (BERT)
3.5.2 GPT-3
3.6 Summary
Bibliography
Chapter 4 Applications of Natural Language Processing
Learning Outcomes
4.1 Introduction
4.2 Word Sense Disambiguation
4.2.1 Word Senses
4.2.2 WordNet: A Database of Lexical Relations
4.2.3 Approaches to Word Sense Disambiguation
4.2.4 Applications of Word Sense Disambiguation
4.3 Text Classification
4.3.1 Building the Text Classification Model
4.3.2 Applications of Text Classification
4.3.3 Other Applications
4.4 Sentiment Analysis
4.4.1 Types of Sentiment Analysis
4.5 Spam Email Classification
4.5.1 History of Spam
4.5.2 Spamming Techniques
4.5.3 Types of Spams
4.6 Question Answering
4.6.1 Components of Question Answering System
4.6.2 Information Retrieval-based Factoid Question and Answering
4.6.3 Entity Linking
4.6.4 Knowledge-based Question Answering
4.7 Chatbots and Dialog Systems
4.7.1 Properties of Human Conversation
4.7.2 Chatbots
4.7.3 The Dialog-state Architecture
4.8 Summary
Bibliography
Chapter 5 Fundamentals of Speech Recognition
Learning Outcomes
5.1 Introduction
5.2 Structure of Speech
5.3 Basic Audio Features
5.3.1 Pitch
5.3.2 Timbral Features
5.3.3 Rhythmic Features
5.3.4 MPEG-7 Features
5.4 Characteristics of Speech Recognition System
5.4.1 Pronunciations
5.4.2 Vocabulary
5.4.3 Grammars
5.4.4 Speaker Dependence
5.5 The Working of a Speech Recognition System
5.5.1 Input Speech
5.5.2 Audio Pre-processing
5.5.3 Feature Extraction
5.6 Audio Feature Extraction Techniques
5.6.1 Spectrogram
5.6.2 MFCC
5.6.3 Short-Time Fourier Transform
5.6.4 Linear Prediction Cepstral Coefficients (LPCC)
5.6.5 Discrete Wavelet Transform (DWT)
5.6.6 Perceptual Linear Prediction (PLP)
5.7 Statistical Speech Recognition
5.7.1 Acoustic Model
5.7.2 Pronunciation Model
5.7.3 Language Model
5.7.4 Conventional ASR Approaches
5.8 Speech Recognition Applications
5.8.1 In Banking
5.8.2 In-Car Systems
5.8.3 Health Care
5.8.4 Experiments by Different Speech Groups for Large-Vocabulary Speech Recognition
5.8.5 Measure of Performance
5.9 Challenges in Speech Recognition
5.9.1 Vocabulary Size
5.9.2 Speaker-Dependent or -Independent
5.9.3 Isolated, Discontinuous, and Continuous Speech
5.9.4 Phonetics
5.9.5 Adverse Conditions
5.10 Open-source Toolkits for Speech Recognition
5.10.1 Frameworks
5.10.2 Additional Tools and Libraries
5.11 Summary
Bibliography
Chapter 6 Deep Learning Models for Speech Recognition
Learning Outcomes
6.1 Traditional Methods of Speech Recognition
6.1.1 Hidden Markov Models (HMMs)
6.1.2 Gaussian Mixture Models (GMMs)
6.1.3 Artificial Neural Network (ANN)
6.1.4 HMM and ANN Acoustic Modeling
6.1.5 Deep Belief Neural Network (DBNN) for Acoustic Modelling
6.2 RNN-based Encoder–Decoder Architecture
6.3 Encoder
6.4 Decoder
6.5 Attention-based Encoder–Decoder Architecture
6.6 Challenges in Traditional ASR and the Motivation for End-to-End ASR
6.7 Summary
Bibliography
Chapter 7 End-to-End Speech Recognition Models
Learning Outcomes
7.1 End-to-End Speech Recognition Models
7.1.1 Definition of End-to-End ASR System
7.1.2 Connectionist Temporal Classification (CTC)
7.1.3 Deep Speech
7.1.4 Deep Speech 2
7.1.5 Listen, Attend, Spell (LAS) Model
7.1.6 JASPER
7.1.7 QuartzNet
7.2 Self-supervised Models for Automatic Speech Recognition
7.2.1 Wav2Vec
7.2.2 Data2Vec
7.2.3 HuBERT
7.3 Online/Streaming ASR
7.3.1 RNN-transducer-Based Streaming ASR
7.3.2 Wav2Letter for Streaming ASR
7.3.3 Conformer Model
7.4 Summary
Bibliography
Chapter 8 Computer Vision Basics
Learning Outcomes
8.1 Introduction
8.1.1 Fundamental Steps for Computer Vision
8.1.2 Fundamental Steps in Digital Image Processing
8.2 Image Segmentation
8.2.1 Steps in Image Segmentation
8.3 Feature Extraction
8.4 Image Classification
8.4.1 Image Classification Using Convolutional Neural Network (CNN)
8.4.2 Convolution Layer
8.4.3 Pooling or Down Sampling Layer
8.4.4 Flattening Layer
8.4.5 Fully Connected Layer
8.4.6 Activation Function
8.5 Tools and Libraries for Computer Vision
8.5.1 OpenCV
8.5.2 MATLAB
8.6 Applications of Computer Vision
8.6.1 Object Detection
8.6.2 Face Recognition
8.6.3 Number Plate Identification
8.6.4 Image-based Search
8.6.5 Medical Imaging
8.7 Summary
Bibliography
Chapter 9 Deep Learning Models for Computer Vision
Learning Outcomes
9.1 Deep Learning for Computer Vision
9.2 Pre-trained Architectures for Computer Vision
9.2.1 LeNet
9.2.2 AlexNet
9.2.3 VGG
9.2.4 Inception
9.2.5 R-CNN
9.2.6 Fast R-CNN
9.2.7 Faster R-CNN
9.2.8 Mask R-CNN
9.2.9 YOLO
9.3 Summary
Bibliography
Chapter 10 Applications of Computer Vision
Learning Outcomes
10.1 Introduction
10.2 Optical Character Recognition
10.2.1 Code Snippets
10.2.2 Result Analysis
10.3 Face and Facial Expression Recognition
10.3.1 Face Recognition
10.3.2 Facial Recognition System
10.3.3 Major Challenges in Recognizing Face Expression
10.3.4 Result Analysis
10.4 Visual-based Gesture Recognition
10.4.1 Framework Used
10.4.2 Code Snippets
10.4.3 Result Analysis
10.4.4 Major Challenges in Gesture Recognition
10.5 Posture Detection and Correction
10.5.1 Framework Used
10.5.2 Squats
10.5.3 Result Analysis
10.6 Summary
Bibliography
Index