Getting Started with Google BERT: Build and train state-of-the-art natural language processing models using BERT

Kickstart your NLP journey by exploring BERT and its variants such as ALBERT, RoBERTa, DistilBERT, VideoBERT, and more with Hugging Face's transformers library

Key Features
  • Explore the encoder and decoder of the transformer model
  • Become well-versed with BERT along with ALBERT, RoBERTa, and DistilBERT
  • Discover how to pre-train and fine-tune BERT models for several NLP tasks
Book Description

BERT (Bidirectional Encoder Representations from Transformers) has revolutionized the world of natural language processing (NLP) with promising results. This book is an introductory guide that will help you get to grips with Google's BERT architecture. With a detailed explanation of the transformer architecture, this book will help you understand how the transformer's encoder and decoder work.

You'll explore the BERT architecture by learning how the BERT model is pre-trained, and then learn how to use pre-trained BERT for downstream tasks by fine-tuning it for NLP tasks such as sentiment analysis and text summarization with the Hugging Face transformers library. As you advance, you'll learn about different variants of BERT such as ALBERT, RoBERTa, and ELECTRA, and look at SpanBERT, which is used for NLP tasks like question answering. You'll also cover simpler and faster BERT variants based on knowledge distillation, such as DistilBERT and TinyBERT. The book takes you through M-BERT, XLM, and XLM-R in detail and then introduces you to Sentence-BERT, which is used for obtaining sentence representations. Finally, you'll discover domain-specific BERT models such as BioBERT and ClinicalBERT, and explore an interesting variant called VideoBERT.
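
To give a flavor of the hands-on work described above, here is a minimal sketch (not taken from the book itself) of loading a pre-trained BERT checkpoint with the Hugging Face transformers library and extracting contextual embeddings; the bert-base-uncased checkpoint and the example sentence are illustrative choices.

    import torch
    from transformers import BertModel, BertTokenizer

    # Load a pre-trained BERT checkpoint and its matching WordPiece tokenizer.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased")

    # Tokenize a sentence; the tokenizer adds the [CLS] and [SEP] tokens.
    inputs = tokenizer("I love Paris", return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    # One contextual vector per token: [batch_size, num_tokens, hidden_size].
    print(outputs.last_hidden_state.shape)  # torch.Size([1, 5, 768])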

By the end of this BERT book, you'll be well-versed in using BERT and its variants for performing practical NLP tasks.

What you will learn
  • Understand the transformer model from the ground up
  • Find out how BERT works and pre-train it using the masked language modeling (MLM) and next sentence prediction (NSP) tasks
  • Get hands-on with BERT by learning to generate contextual word and sentence embeddings
  • Fine-tune BERT for downstream tasks, as sketched in the example after this list
  • Get to grips with ALBERT, RoBERTa, ELECTRA, and SpanBERT models
  • Get the hang of the BERT models based on knowledge distillation
  • Understand cross-lingual models such as XLM and XLM-R
  • Explore Sentence-BERT, VideoBERT, and BART
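
As a companion to the fine-tuning item above, the following is a hedged sketch, not the book's own code, of fine-tuning a pre-trained BERT checkpoint for sentiment classification with the Hugging Face Trainer API; the IMDB dataset, the sequence length, and the small subset sizes are illustrative assumptions.

    from datasets import load_dataset
    from transformers import (BertForSequenceClassification, BertTokenizerFast,
                              Trainer, TrainingArguments)

    # Pre-trained BERT with a freshly initialized two-class classification head.
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                           num_labels=2)

    # Illustrative dataset choice: IMDB movie reviews (positive/negative labels).
    dataset = load_dataset("imdb")

    def tokenize(batch):
        # Truncate/pad every review to a fixed length so batches stack cleanly.
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=128)

    encoded = dataset.map(tokenize, batched=True)

    args = TrainingArguments(output_dir="bert-sentiment",
                             num_train_epochs=1,
                             per_device_train_batch_size=8)

    # Small subsets keep the sketch quick; use the full splits for real training.
    trainer = Trainer(model=model, args=args,
                      train_dataset=encoded["train"].shuffle(seed=42).select(range(2000)),
                      eval_dataset=encoded["test"].select(range(500)))

    trainer.train()
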
Who this book is for

This book is for NLP professionals and data scientists looking to simplify NLP tasks to enable efficient language understanding using BERT. A basic understanding of NLP concepts and deep learning is required to get the best out of this book.

Author(s): Sudharsan Ravichandiran
Publisher: Packt Publishing Ltd
Year: 2021

Language: English
Pages: 352

Cover
Title Page
Copyright and Credits
Dedication
About Packt
Contributors
Table of Contents
Preface
Section 1 - Starting Off with BERT
Chapter 1: A Primer on Transformers
Introduction to the transformer 
Understanding the encoder of the transformer 
Self-attention mechanism 
Understanding the self-attention mechanism 
Step 1
Step 2
Step 3
Step 4
Multi-head attention mechanism 
Learning position with positional encoding 
Feedforward network
Add and norm component 
Putting all the encoder components together 
Understanding the decoder of a transformer
Masked multi-head attention 
Multi-head attention 
Feedforward network
Add and norm component 
Linear and softmax layers
Putting all the decoder components together 
Putting the encoder and decoder together 
Training the transformer
Summary
Questions
Further reading
Chapter 2: Understanding the BERT Model
Basic idea of BERT 
Working of BERT 
Configurations of BERT 
BERT-base
BERT-large
Other configurations of BERT
Pre-training the BERT model
Input data representation
Token embedding 
Segment embedding
Position embedding 
Final representation 
WordPiece tokenizer 
Pre-training strategies 
Language modeling
Auto-regressive language modeling 
Auto-encoding language modeling
Masked language modeling
Whole word masking 
Next sentence prediction 
Pre-training procedure 
Subword tokenization algorithms 
Byte pair encoding 
Tokenizing with BPE 
Byte-level byte pair encoding 
WordPiece
Summary
Questions
Further reading
Chapter 3: Getting Hands-On with BERT
Exploring the pre-trained BERT model
Extracting embeddings from pre-trained BERT 
Hugging Face transformers 
Generating BERT embeddings
Preprocessing the input 
Getting the embedding 
Extracting embeddings from all encoder layers of BERT
Extracting the embeddings 
Preprocessing the input
Getting the embeddings 
Fine-tuning BERT for downstream tasks
Text classification 
Fine-tuning BERT for sentiment analysis 
Importing the dependencies 
Loading the model and dataset
Preprocessing the dataset
Training the model 
Natural language inference 
Question-answering
Performing question-answering with fine-tuned BERT 
Preprocessing the input
Getting the answer
Named entity recognition 
Summary 
Questions
Further reading 
Section 2 - Exploring BERT Variants
Chapter 4: BERT Variants I - ALBERT, RoBERTa, ELECTRA, and SpanBERT
A Lite version of BERT 
Cross-layer parameter sharing 
Factorized embedding parameterization
Training the ALBERT model
Sentence order prediction
Comparing ALBERT with BERT 
Extracting embeddings with ALBERT
Robustly Optimized BERT pre-training Approach
Using dynamic masking instead of static masking 
Removing the NSP task
Training with more data points
Training with a large batch size 
Using BBPE as a tokenizer 
Exploring the RoBERTa tokenizer 
Understanding ELECTRA 
Understanding the replaced token detection task 
Exploring the generator and discriminator of ELECTRA 
Training the ELECTRA model
Exploring efficient training methods
Predicting span with SpanBERT
Understanding the architecture of SpanBERT
Exploring SpanBERT 
Performing Q&As with pre-trained SpanBERT 
Summary
Questions
Further reading 
Chapter 5: BERT Variants II - Based on Knowledge Distillation
Introducing knowledge distillation 
Training the student network 
DistilBERT – the distilled version of BERT 
Teacher-student architecture 
The teacher BERT
The student BERT
Training the student BERT (DistilBERT) 
Introducing TinyBERT 
Teacher-student architecture  
Understanding the teacher BERT  
Understanding the student BERT 
Distillation in TinyBERT 
Transformer layer distillation 
Attention-based distillation
Hidden state-based distillation 
Embedding layer distillation 
Prediction layer distillation
The final loss function 
Training the student BERT (TinyBERT)
General distillation 
Task-specific distillation 
The data augmentation method 
Transferring knowledge from BERT to neural networks
Teacher-student architecture 
The teacher BERT 
The student network 
Training the student network  
The data augmentation method
Understanding the masking method
Understanding the POS-guided word replacement method 
Understanding the n-gram sampling method
The data augmentation procedure
Summary
Questions
Further reading 
Section 3 - Applications of BERT
Chapter 6: Exploring BERTSUM for Text Summarization
Text summarization 
Extractive summarization
Abstractive summarization 
Fine-tuning BERT for text summarization 
Extractive summarization using BERT 
BERTSUM with a classifier 
BERTSUM with a transformer and LSTM 
BERTSUM with an inter-sentence transformer 
BERTSUM with LSTM 
Abstractive summarization using BERT 
Understanding ROUGE evaluation metrics
Understanding the ROUGE-N metric 
ROUGE-1 
ROUGE-2 
Understanding ROUGE-L  
The performance of the BERTSUM model 
Training the BERTSUM model 
Summary 
Questions
Further reading
Chapter 7: Applying BERT to Other Languages
Understanding multilingual BERT 
Evaluating M-BERT on the NLI task 
Zero-shot 
TRANSLATE-TEST 
TRANSLATE-TRAIN
TRANSLATE-TRAIN-ALL
How multilingual is multilingual BERT? 
Effect of vocabulary overlap
Generalization across scripts 
Generalization across typological features 
Effect of language similarity
Effect of code switching and transliteration
Code switching 
Transliteration 
M-BERT on code switching and transliteration 
The cross-lingual language model
Pre-training strategies 
Causal language modeling 
Masked language modeling 
Translation language modeling 
Pre-training the XLM model
Evaluation of XLM
Understanding XLM-R
Language-specific BERT 
FlauBERT for French 
Getting a representation of a French sentence with FlauBERT 
French Language Understanding Evaluation
BETO for Spanish 
Predicting masked words using BETO 
BERTje for Dutch
Next sentence prediction with BERTje
German BERT 
Chinese BERT 
Japanese BERT 
FinBERT for Finnish
UmBERTo for Italian 
BERTimbau for Portuguese 
RuBERT for Russian 
Summary
Questions
Further reading
Chapter 8: Exploring Sentence and Domain-Specific BERT
Learning about sentence representation with Sentence-BERT  
Computing sentence representation 
Understanding Sentence-BERT 
Sentence-BERT with a Siamese network 
Sentence-BERT for a sentence pair classification task
Sentence-BERT for a sentence pair regression task
Sentence-BERT with a triplet network
Exploring the sentence-transformers library 
Computing sentence representation using Sentence-BERT 
Computing sentence similarity 
Loading custom models
Finding a similar sentence with Sentence-BERT 
Learning multilingual embeddings through knowledge distillation 
Teacher-student architecture
Using the multilingual model 
Domain-specific BERT 
ClinicalBERT 
Pre-training ClinicalBERT 
Fine-tuning ClinicalBERT 
Extracting clinical word similarity 
BioBERT 
Pre-training the BioBERT model
Fine-tuning the BioBERT model 
BioBERT for NER tasks 
BioBERT for question answering 
Summary 
Questions
Further reading
Chapter 9: Working with VideoBERT, BART, and More
Learning language and video representations with VideoBERT 
Pre-training a VideoBERT model  
Cloze task 
Linguistic-visual alignment  
The final pre-training objective 
Data source and preprocessing 
Applications of VideoBERT 
Predicting the next visual tokens
Text-to-video generation 
Video captioning 
Understanding BART 
Architecture of BART 
Noising techniques 
Token masking
Token deletion
Token infilling 
Sentence shuffling 
Document rotation
Comparing different pre-training objectives 
Performing text summarization with BART 
Exploring BERT libraries 
Understanding ktrain
Sentiment analysis using ktrain
Building a document answering model 
Document summarization 
bert-as-service 
Installing the library 
Computing sentence representation
Computing contextual word representation 
Summary 
Questions 
Further reading
Assessments
Chapter 1, A Primer on Transformers
Chapter 2, Understanding the BERT Model
Chapter 3, Getting Hands-On with BERT
Chapter 4, BERT Variants I - ALBERT, RoBERTa, ELECTRA, and SpanBERT
Chapter 5, BERT Variants II - Based on Knowledge Distillation
Chapter 6, Exploring BERTSUM for Text Summarization
Chapter 7, Applying BERT to Other Languages
Chapter 8, Exploring Sentence and Domain-Specific BERT
Chapter 9, Working with VideoBERT, BART, and More
Other Books You May Enjoy
Index