This book introduces readers to Python 3 programming and the art of data visualization. It also explores cutting-edge techniques using ChatGPT/GPT-4 in harmony with Python for generating visuals that tell more compelling data stories. Chapter 1 introduces the essentials of Python, covering a wide range of topics from basic data types, loops, and functions to more advanced constructs like dictionaries, sets, and matrices. In Chapter 2, the focus shifts to NumPy and its powerful array operations, leading into data visualization using prominent libraries such as Matplotlib. Chapter 6 covers Seaborn's rich visualization tools, with examples drawn from datasets like Iris and Titanic. The book also covers other visualization tools and techniques, including SVG graphics, D3 for dynamic visualizations, and more. Chapter 7 covers the main features of ChatGPT and GPT-4, as well as some of their competitors. Chapter 8 contains examples of using ChatGPT to generate data visualizations, such as charts and graphs based on datasets (e.g., the Titanic dataset). Companion files with code, datasets, and figures are available for download. Spanning foundational Python concepts through the intricacies of data visualization, this book is intended for Python practitioners, data scientists, and anyone in the field of data analytics looking to enhance their storytelling with data through visuals. It is also suitable for educators seeking material for teaching advanced data visualization techniques.
Features:
- Explores cutting-edge techniques using ChatGPT/GPT-4 in harmony with Python for generating visuals that tell more compelling data stories
- Contains detailed tutorials that guide you through the creation of complex visuals
- Tackles actual data scenarios and builds your expertise as you apply learned concepts to real datasets
- Features data manipulation and cleaning with Pandas to prepare flawless datasets ready for visualization
- Includes companion files with source code, datasets, and figures
The target audience:
This book is intended primarily for people who have worked with Python and are interested in learning about graphics effects with Python libraries. It is also intended to reach an international audience of readers with highly diverse backgrounds and of various age groups. Consequently, it uses standard English rather than colloquial expressions that might confuse those readers. The book aims to provide a comfortable and meaningful learning experience for its intended readers.
Author(s): Oswald Campesato
Publisher: Mercury Learning and Information
Year: 2024
Language: English
Pages: 314
Cover
Title Page
Copyright Page
Dedication
Contents
Preface
Chapter 1: Introduction to Python
Tools for Python
easy_install and pip
virtualenv
IPython
Python Installation
Setting the PATH Environment Variable (Windows Only)
Launching Python on Your Machine
The Python Interactive Interpreter
Python Identifiers
Lines, Indentation, and Multi-Line Comments
Quotations and Comments in Python
Saving Your Code in a Module
Some Standard Modules in Python
The help() and dir() Functions
Compile Time and Runtime Code Checking
Simple Data Types
Working with Numbers
Working with Other Bases
The chr() Function
The round() Function
Formatting Numbers
Working with Fractions
Unicode and UTF-8
Working with Unicode
Working with Strings
Comparing Strings
Formatting Strings
Slicing and Splicing Strings
Testing for Digits and Alphabetic Characters
Search and Replace a String in Other Strings
Remove Leading and Trailing Characters
Printing Text without NewLine Characters
Text Alignment
Working with Dates
Converting Strings to Dates
Exception Handling in Python
Handling User Input
Command-Line Arguments
Summary
Chapter 2: Introduction to NumPy
What is NumPy?
Useful NumPy Features
What are NumPy Arrays?
Working with Loops
Appending Elements to Arrays (1)
Appending Elements to Arrays (2)
Multiplying Lists and Arrays
Doubling the Elements in a List
Lists and Exponents
Arrays and Exponents
Math Operations and Arrays
Working with “–1” Subranges with Vectors
Working with “–1” Subranges with Arrays
Other Useful NumPy Methods
Arrays and Vector Operations
NumPy and Dot Products (1)
NumPy and Dot Products (2)
NumPy and the Length of Vectors
NumPy and Other Operations
NumPy and the reshape() Method
Calculating the Mean and Standard Deviation
Code Sample with Mean and Standard Deviation
Trimmed Mean and Weighted Mean
Working with Lines in the Plane (Optional)
Plotting Randomized Points with NumPy and Matplotlib
Plotting a Quadratic with NumPy and Matplotlib
What is Linear Regression?
What is Multivariate Analysis?
What about Non-Linear Datasets?
The MSE (Mean Squared Error) Formula
Other Error Types
Non-Linear Least Squares
Calculating the MSE Manually
Find the Best-Fitting Line in NumPy
Calculating the MSE by Successive Approximation (1)
Calculating the MSE by Successive Approximation (2)
Google Colaboratory
Uploading CSV Files in Google Colaboratory
Summary
Chapter 3: Pandas and Data Visualization
What Is Pandas?
Pandas DataFrames
Dataframes and Data Cleaning Tasks
A Pandas DataFrame Example
Describing a Pandas DataFrame
Pandas Boolean DataFrames
Transposing a Pandas DataFrame
Pandas DataFrames and Random Numbers
Converting Categorical Data to Numeric Data
Matching and Splitting Strings in Pandas
Merging and Splitting Columns in Pandas
Combining Pandas DataFrames
Data Manipulation With Pandas DataFrames
Data Manipulation With Pandas DataFrames (2)
Data Manipulation With Pandas DataFrames (3)
Pandas DataFrames and CSV Files
Pandas DataFrames and Excel Spreadsheets
Select, Add, and Delete Columns in DataFrames
Handling Outliers in Pandas
Pandas DataFrames and Scatterplots
Pandas DataFrames and Simple Statistics
Finding Duplicate Rows in Pandas
Finding Missing Values in Pandas
Sorting DataFrames in Pandas
Working With groupby() in Pandas
Aggregate Operations With the titanic.csv Dataset
Working with apply() and mapapply() in Pandas
Useful One-Line Commands in Pandas
What is Texthero?
Data Visualization in Pandas
Summary
Chapter 4: Pandas and SQL
Pandas and Data Visualization
Pandas and Bar Charts
Pandas and Horizontally Stacked Bar Charts
Pandas and Vertically Stacked Bar Charts
Pandas and Nonstacked Area Charts
Pandas and Stacked Area Charts
What Is Fugue?
MySQL, SQLAlchemy, and Pandas
What Is SQLAlchemy?
Read MySQL Data via SQLAlchemy
Export SQL Data From Pandas to Excel
MySQL and Connector/Python
Establishing a Database Connection
Reading Data From a Database Table
Creating a Database Table
Writing Pandas Data to a MySQL Table
Read XML Data in Pandas
Read JSON Data in Pandas
Working With JSON-Based Data
Python Dictionary and JSON
Python, Pandas, and JSON
Pandas and Regular Expressions (Optional)
What Is SQLite?
SQLite Features
SQLite Installation
Create a Database and a Table
Insert, Select, and Delete Table Data
Launch SQL Files
Drop Tables and Databases
Load CSV Data Into a sqlite Table
Python and SQLite
Connect to a sqlite3 Database
Create a Table in a sqlite3 Database
Insert Data in a sqlite3 Table
Select Data From a sqlite3 Table
Populate a Pandas Dataframe From a sqlite3 Table
Histogram With Data From a sqlite3 Table (1)
Histogram With Data From a sqlite3 Table (2)
Working With sqlite3 Tools
SQLiteStudio Installation
DB Browser for SQLite Installation
SQLiteDict (Optional)
Working With Beautiful Soup
Parsing an HTML Web Page
Beautiful Soup and Pandas
Beautiful Soup and Live HTML Web Pages
Summary
Chapter 5: Matplotlib and Visualization
What is Data Visualization?
Types of Data Visualization
What is Matplotlib?
Matplotlib Styles
Display Attribute Values
Color Values in Matplotlib
Cubed Numbers in Matplotlib
Horizontal Lines in Matplotlib
Slanted Lines in Matplotlib
Parallel Slanted Lines in Matplotlib
A Grid of Points in Matplotlib
A Dotted Grid in Matplotlib
Two Lines and a Legend in Matplotlib
Loading Images in Matplotlib
A Checkerboard in Matplotlib
Randomized Data Points in Matplotlib
A Set of Line Segments in Matplotlib
Plotting Multiple Lines in Matplotlib
Trigonometric Functions in Matplotlib
A Histogram in Matplotlib
Histogram with Data from a sqlite3 Table
Plot Bar Charts in Matplotlib
Plot a Pie Chart in Matplotlib
Heat Maps in Matplotlib
Save Plot as a PNG File
Working with SweetViz
Working with Skimpy
3D Charts in Matplotlib
Plotting Financial Data with MPLFINANCE
Charts and Graphs with Data from sqlite3
Summary
Chapter 6: Seaborn for Data Visualization
Working With Seaborn
Features of Seaborn
Seaborn Dataset Names
Seaborn Built-In Datasets
The Iris Dataset in Seaborn
The Titanic Dataset in Seaborn
Extracting Data From Titanic Dataset in Seaborn (1)
Extracting Data From Titanic Dataset in Seaborn (2)
Visualizing a Pandas Dataset in Seaborn
Seaborn Heat Maps
Seaborn Pair Plots
What Is Bokeh?
Introduction to Scikit-Learn
The Digits Dataset in Scikit-learn
The Iris Dataset in Scikit-Learn
Scikit-Learn, Pandas, and the Iris Dataset
Advanced Topics in Seaborn
Summary
Chapter 7: ChatGPT and GPT-4
What is Generative AI?
Important Features of Generative AI
Popular Techniques in Generative AI
What Makes Generative AI Unique
Conversational AI Versus Generative AI
Primary Objective
Applications
Technologies Used
Training and Interaction
Evaluation
Data Requirements
Is DALL-E Part of Generative AI?
Are ChatGPT-3 and GPT-4 Part of Generative AI?
DeepMind
DeepMind and Games
Player of Games (PoG)
OpenAI
Cohere
Hugging Face
Hugging Face Libraries
Hugging Face Model Hub
AI21
InflectionAI
Anthropic
What is Prompt Engineering?
Prompts and Completions
Types of Prompts
Instruction Prompts
Reverse Prompts
System Prompts Versus Agent Prompts
Prompt Templates
Prompts for Different LLMs
Poorly Worded Prompts
What is ChatGPT?
ChatGPT: GPT-3 “on Steroids”?
ChatGPT: Google “Code Red”
ChatGPT Versus Google Search
ChatGPT Custom Instructions
ChatGPT on Mobile Devices and Browsers
ChatGPT and Prompts
GPTBot
ChatGPT Playground
Plugins, Code Interpreter, and Code Whisperer
Plugins
Advanced Data Analysis
Advanced Data Analysis Versus Claude-2
Code Whisperer
Detecting Generated Text
Concerns About ChatGPT
Code Generation and Dangerous Topics
ChatGPT Strengths and Weaknesses
Sample Queries and Responses from ChatGPT
ChatGPT and Medical Diagnosis
Alternatives to ChatGPT
Google Bard
YouChat
Pi From Inflection
Machine Learning and ChatGPT
What is InstructGPT?
VizGPT and Data Visualization
What is GPT-4?
GPT-4 and Test Scores
GPT-4 Parameters
GPT-4 Fine-Tuning
ChatGPT and GPT-4 Competitors
Bard
CoPilot (OpenAI/Microsoft)
Codex (OpenAI)
Apple GPT
PaLM-2
Med-PaLM M
Claude-2
Llama-2
How to Download Llama-2
Llama-2 Architecture Features
Fine-Tuning Llama-2
When Will GPT-5 Be Available?
Summary
Chapter 8: ChatGPT and Data Visualization
Working with Charts and Graphs
Bar Charts
Pie Charts
Line Graphs
Heat Maps
Histograms
Box Plots
Pareto Charts
Radar Charts
Treemaps
Waterfall Charts
Line Plots with Matplotlib
A Pie Chart Using Matplotlib
Box and Whisker Plots Using Matplotlib
Time Series Visualization with Matplotlib
Stacked Bar Charts with Matplotlib
Donut Charts Using Matplotlib
3D Surface Plots with Matplotlib
Radial or Spider Charts with Matplotlib
Matplotlib’s Contour Plots
Stream Plots for Vector Fields
Quiver Plots for Vector Fields
Polar Plots
Bar Charts with Seaborn
Scatterplots with a Regression Line Using Seaborn
Heat Maps for Correlation Matrices with Seaborn
Histograms with Seaborn
Violin Plots with Seaborn
Pair Plots Using Seaborn
Facet Grids with Seaborn
Hierarchical Clustering
Swarm Plots
Joint Plot for Bivariate Data
Point Plots for Factorized Views
Seaborn’s KDE Plots for Density Estimations
Seaborn’s Ridge Plots
Summary
Index