Transformer, BERT, and GPT3: Including ChatGPT and Prompt Engineering

This book provides a comprehensive set of topics covering the details of the Transformer architecture, BERT models, and the GPT series, including GPT-3 and GPT-4. Spanning ten chapters, it begins with foundational concepts such as the attention mechanism and tokenization techniques, explores the nuances of the Transformer and BERT architectures, and culminates in advanced topics on the latest in the GPT series, including ChatGPT. Key chapters provide insights into the evolution and significance of attention in deep learning, the intricacies of the Transformer architecture, a two-part exploration of the BERT family, and hands-on guidance on working with GPT-3. The concluding chapters present an overview of ChatGPT, GPT-4, and visualization using generative AI. In addition to the primary topics, the book also covers influential AI organizations such as DeepMind, OpenAI, Cohere, and Hugging Face. Readers will gain a comprehensive understanding of the current landscape of NLP models, their underlying architectures, and practical applications. Companion files with numerous code samples and figures from the book are included.

Although this book is introductory in nature, some knowledge of Python 3.x will certainly be helpful for the code samples. Knowledge of other programming languages (such as Java) can also be helpful because of the exposure to programming concepts and constructs. The less technical knowledge you have, the more diligence will be required to understand the various topics that are covered. If you want to be sure that you can grasp the material in this book, glance through some of the code samples to get an idea of how much is familiar to you and how much is new for you.

Features:
Provides a comprehensive set of topics covering the details of the Transformer architecture, BERT models, and the GPT series, including GPT-3 and GPT-4.
Includes companion files with numerous code samples and figures from the book.

The target audience:
This book is intended primarily for people who have a basic knowledge of machine learning, and for software developers who are interested in working with LLMs. Specifically, this book is for readers who are accustomed to searching online for more detailed information about technical topics. If you are a beginner, there are other books that may be more suitable for you, and you can find them through an online search. This book is also intended to reach an international audience of readers with highly diverse backgrounds in various age groups; accordingly, it uses standard English rather than colloquial expressions that might be confusing to those readers. This book provides a comfortable and meaningful learning experience for the intended readers.
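
To help readers gauge that familiarity, the following is a minimal, illustrative sketch in the style of the Hugging Face code samples the book works with (sentiment analysis with DistilBERT via the transformers pipeline API). It is not taken from the book's companion files; it assumes the transformers library and the publicly available distilbert-base-uncased-finetuned-sst-2-english checkpoint.

# Illustrative only; requires: pip install transformers torch
from transformers import pipeline

# Load a sentiment-analysis pipeline backed by a fine-tuned DistilBERT checkpoint
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Classify a couple of sample sentences and print label/score pairs
for text in ["The Transformer architecture is remarkably elegant.",
             "This chapter was hard to follow."]:
    result = classifier(text)[0]
    print(f"{text!r} -> {result['label']} ({result['score']:.3f})")

If a snippet like this is readable to you, the book's hands-on material should be within comfortable reach.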

Author(s): Oswald Campesato
Publisher: Mercury Learning and Information
Year: 2023

Language: English
Pages: 379

Front Cover
Half-Title Page
LICENSE, DISCLAIMER OF LIABILITY, AND LIMITED WARRANTY
Title Page
Copyright Page
Dedication
Contents
Preface
Chapter 1 Introduction
What is Generative AI?
Conversational AI Versus Generative AI
Is DALL-E Part of Generative AI?
Are ChatGPT-3 and GPT-4 Part of Generative AI?
DeepMind
OpenAI
Cohere
Hugging Face
AI21
InflectionAI
Anthropic
What are LLMs?
What is AI Drift?
Machine Learning and Drift (Optional)
What is Attention?
Calculating Attention: A High-Level View
An Example of Self Attention
Multi-Head Attention (MHA)
Summary
Chapter 2 Tokenization
What is Pre-Tokenization?
What is Tokenization?
Word, Character, and Subword Tokenizers
Trade-Offs with Character-Based Tokenizers
Subword Tokenization
Subword Tokenization Algorithms
Hugging Face Tokenizers and Models
Hugging Face Tokenizers
Tokenization for the DistilBERT Model
Token Selection Techniques in LLMs
Summary
Chapter 3 Transformer Architecture Introduction
Sequence-to-Sequence Models
Examples of seq2seq Models
What About RNNs and LSTMs?
Encoder/Decoder Models
Examples of Encoder/Decoder Models
Autoregressive Models
Autoencoding Models
The Transformer Architecture: Introduction
The Transformer is an Encoder/Decoder Model
The Transformer Flow and Its Variants
The transformers Library from Hugging Face
Transformer Architecture Complexity
Hugging Face Transformer Code Samples
Transformer and Mask-Related Tasks
Summary
Chapter 4 Transformer Architecture in Greater Depth
An Overview of the Encoder
What are Positional Encodings?
Other Details Regarding Encoders
An Overview of the Decoder
Encoder, Decoder, or Both: How to Decide?
Delving Deeper into the Transformer Architecture
Autoencoding Transformers
The “Auto” Classes
Improved Architectures
Hugging Face Pipelines and How They Work
Hugging Face Datasets
Transformers and Sentiment Analysis
Source Code for Transformer-Based Models
Summary
Chapter 5 The BERT Family Introduction
What is Prompt Engineering?
Aspects of LLM Development
Kaplan and Under-Trained Models
What is BERT?
BERT and NLP Tasks
BERT and the Transformer Architecture
BERT and Text Processing
BERT and Data Cleaning Tasks
Three BERT Embedding Layers
Creating a BERT Model
Training and Saving a BERT Model
The Inner Workings of BERT
Summary
Chapter 6 The BERT Family in Greater Depth
A Code Sample for Special BERT Tokens
BERT-Based Tokenizers
Sentiment Analysis with DistilBERT
BERT Encoding: Sequence of Steps
Sentence Similarity in BERT
Generating BERT Tokens (1)
Generating BERT Tokens (2)
The BERT Family
Working with RoBERTa
Italian and Japanese Language Translation
Multilingual Language Models
Translation for 1,000 Languages
M-BERT
Comparing BERT-Based Models
Web-Based Tools for BERT
Topic Modeling with BERT
What is T5?
Working with PaLM
Summary
Chapter 7 Working with GPT-3 Introduction
The GPT Family: An Introduction
GPT-2 and Text Generation
What is GPT-3?
GPT-3 Models
What is the Goal of GPT-3?
What Can GPT-3 Do?
Limitations of GPT-3
GPT-3 Task Performance
How GPT-3 and BERT are Different
The GPT-3 Playground
Inference Parameters
Overview of Prompt Engineering
Details of Prompt Engineering
Few-Shot Learning and Fine-Tuning LLMs
Summary
Chapter 8 Working with GPT-3 in Greater Depth
Fine-Tuning and Reinforcement Learning (Optional)
GPT-3 and Prompt Samples
Working with Python and OpenAI APIs
Text Completion in OpenAI
The Completion() API in OpenAI
Text Completion and Temperature
Text Classification with GPT-3
Sentiment Analysis with GPT-3
GPT-3 Applications
Open-Source Variants of GPT-3
Miscellaneous Topics
Summary
Chapter 9 ChatGPT and GPT-4
What is ChatGPT?
Plugins, Code Interpreter, and Code Whisperer
Detecting Generated Text
Concerns about ChatGPT
Sample Queries and Responses from ChatGPT
ChatGPT and Medical Diagnosis
Alternatives to ChatGPT
Machine Learning and ChatGPT: Advanced Data Analytics
What is InstructGPT?
VizGPT and Data Visualization
What is GPT-4?
ChatGPT and GPT-4 Competitors
Llama 2
When Will GPT-5 Be Available?
Summary
Chapter 10 Visualization with Generative AI
Generative AI and Art and Copyrights
Generative AI and GANs
What is Diffusion?
CLIP (OpenAI)
GLIDE (OpenAI)
Text-to-Image Generation
Text-to-Image Models
The DALL-E Models
DALL-E 2
DALL-E Demos
Text-to-Video Generation
Text-to-Speech Generation
Summary
Index