How do you turn raw, unprocessed, or malformed data into dynamic, interactive web visualizations? In this practical book, author Kyran Dale shows data scientists and analysts, as well as Python and JavaScript developers, how to create the ideal toolchain for the job. By providing engaging examples and stressing hard-earned best practices, this guide teaches you how to leverage the power of best-of-breed Python and JavaScript libraries.
Python provides accessible, powerful, and mature libraries for scraping, cleaning, and processing data. And while JavaScript is the best language when it comes to programming web visualizations, its data processing abilities can't compare with Python's. Together, the two languages complement each other perfectly in a modern web-visualization toolchain. This book gets you started.
You'll learn how to:
• Obtain the data you need programmatically, using scraping tools or web APIs: Requests, Scrapy, Beautiful Soup
• Clean and process data using Python's heavyweight data processing libraries within the NumPy ecosystem: Jupyter notebooks with pandas+Matplotlib+Seaborn
• Deliver the data to a browser with static files or by using Flask, the lightweight Python server, and a RESTful API
• Pick up enough web development skills (HTML, CSS, JS) to get your visualized data on the web
• Use the data you've mined and refined to create web charts and visualizations with Plotly, D3, Leaflet, and other libraries
Author(s): Kyran Dale
Edition: 2
Publisher: O’Reilly Media
Year: 2023
Language: English
Commentary: Publisher's PDF
Pages: 566
City: Sebastopol, CA
Tags: Python; JavaScript; Data Visualization; SQL; MongoDB; JSON; Flask; Web Scraping; BeautifulSoup; Scrapy; Requests Library; DOM; NumPy; matplotlib; pandas; D3.js; Jupyter; CSS; Seaborn; Web Design; RESTful API; plotly; HTML
Cover
Copyright
Table of Contents
Preface
The Second Edition
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
Second Edition
Introduction
Who This Book Is For
Minimal Requirements to Use This Book
Why Python and JavaScript?
Why Not Python in the Browser?
Why Python for Data Processing
Python’s Getting Better All the Time
What You’ll Learn
The Choice of Libraries
Preliminaries
The Dataviz Toolchain
1. Scraping Data with Scrapy
2. Cleaning Data with pandas
3. Exploring Data with pandas and Matplotlib
4. Delivering Your Data with Flask
5. Transforming Data into Interactive Visualizations with Plotly and D3
Smaller Libraries
Using the Book
A Little Bit of Context
Summary
Recommended Books
Part I. Basic Toolkit
Chapter 1. Development Setup
The Accompanying Code
Python
Anaconda
Installing Extra Libraries
Virtual Environments
JavaScript
Content Delivery Networks
Installing Libraries Locally
Databases
Getting MongoDB Up and Running
Easy MongoDB with Docker
Integrated Development Environments
Summary
Chapter 2. A Language-Learning Bridge Between Python and JavaScript
Similarities and Differences
Interacting with the Code
Python
JavaScript
Basic Bridge Work
Style Guidelines, PEP 8, and use strict
CamelCase Versus Underscore
Importing Modules, Including Scripts
JavaScript Modules
Keeping Your Namespaces Clean
Outputting “Hello World!”
Simple Data Processing
String Construction
Significant Whitespace Versus Curly Brackets
Comments and Doc-Strings
Declaring Variables Using let or var
Strings and Numbers
Booleans
Data Containers: dicts, objects, lists, Arrays
Functions
Iterating: for Loops and Functional Alternatives
Conditionals: if, else, elif, switch
File Input and Output
Classes and Prototypes
Differences in Practice
Method Chaining
Enumerating a List
Tuple Unpacking
Collections
Underscore
Functional Array Methods and List Comprehensions
Map, Reduce, and Filter with Python’s Lambdas
JavaScript Closures and the Module Pattern
A Cheat Sheet
Summary
Chapter 3. Reading and Writing Data with Python
Easy Does It
Passing Data Around
Working with System Files
CSV, TSV, and Row-Column Data Formats
JSON
Dealing with Dates and Times
SQL
Creating the Database Engine
Defining the Database Tables
Adding Instances with a Session
Querying the Database
Easier SQL with Dataset
MongoDB
Dealing with Dates, Times, and Complex Data
Summary
Chapter 4. Webdev 101
The Big Picture
Single-Page Apps
Tooling Up
The Myth of IDEs, Frameworks, and Tools
A Text-Editing Workhorse
Browser with Development Tools
Terminal or Command Prompt
Building a Web Page
Serving Pages with HTTP
The DOM
The HTML Skeleton
Marking Up Content
CSS
JavaScript
Data
Chrome DevTools
The Elements Tab
The Sources Tab
Other Tools
A Basic Page with Placeholders
Positioning and Sizing Containers with Flex
Filling the Placeholders with Content
Scalable Vector Graphics
The Element
Circles
Applying CSS Styles
Lines, Rectangles, and Polygons
Text
Paths
Scaling and Rotating
Working with Groups
Layering and Transparency
JavaScripted SVG
Summary
Part II. Getting Your Data
Chapter 5. Getting Data Off the Web with Python
Getting Web Data with the Requests Library
Getting Data Files with Requests
Using Python to Consume Data from a Web API
Consuming a RESTful Web API with Requests
Getting Country Data for the Nobel Dataviz
Using Libraries to Access Web APIs
Using Google Spreadsheets
Using the Twitter API with Tweepy
Scraping Data
Why We Need to Scrape
Beautiful Soup and lxml
A First Scraping Foray
Getting the Soup
Selecting Tags
Crafting Selection Patterns
Caching the Web Pages
Scraping the Winners’ Nationalities
Summary
Chapter 6. Heavyweight Scraping with Scrapy
Setting Up Scrapy
Establishing the Targets
Targeting HTML with Xpaths
Testing Xpaths with the Scrapy Shell
Selecting with Relative Xpaths
A First Scrapy Spider
Scraping the Individual Biography Pages
Chaining Requests and Yielding Data
Caching Pages
Yielding Requests
Scrapy Pipelines
Scraping Text and Images with a Pipeline
Specifying Pipelines with Multiple Spiders
Summary
Part III. Cleaning and Exploring Data with pandas
Chapter 7. Introduction to NumPy
The NumPy Array
Creating Arrays
Array Indexing and Slicing
A Few Basic Operations
Creating Array Functions
Calculating a Moving Average
Summary
Chapter 8. Introduction to pandas
Why pandas Is Tailor-Made for Dataviz
Why pandas Was Developed
Categorizing Data and Measurements
The DataFrame
Indices
Rows and Columns
Selecting Groups
Creating and Saving DataFrames
JSON
CSV
Excel Files
SQL
MongoDB
Series into DataFrames
Summary
Chapter 9. Cleaning Data with pandas
Coming Clean About Dirty Data
Inspecting the Data
Indices and pandas Data Selection
Selecting Multiple Rows
Cleaning the Data
Finding Mixed Types
Replacing Strings
Removing Rows
Finding Duplicates
Sorting Data
Removing Duplicates
Dealing with Missing Fields
Dealing with Times and Dates
The Full clean_data Function
Adding the born_in column
Merging DataFrames
Saving the Cleaned Datasets
Summary
Chapter 10. Visualizing Data with Matplotlib
pyplot and Object-Oriented Matplotlib
Starting an Interactive Session
Interactive Plotting with pyplot’s Global State
Configuring Matplotlib
Setting the Figure’s Size
Points, Not Pixels
Labels and Legends
Titles and Axes Labels
Saving Your Charts
Figures and Object-Oriented Matplotlib
Axes and Subplots
Plot Types
Bar Charts
Scatter Plots
seaborn
FacetGrids
PairGrids
Summary
Chapter 11. Exploring Data with pandas
Starting to Explore
Plotting with pandas
Gender Disparities
Unstacking Groups
Historical Trends
National Trends
Prize Winners Per Capita
Prizes by Category
Historical Trends in Prize Distribution
Age and Life Expectancy of Winners
Age at Time of Award
Life Expectancy of Winners
Increasing Life Expectancies over Time
The Nobel Diaspora
Summary
Part IV. Delivering the Data
Chapter 12. Delivering the Data
Serving the Data
Organizing Your Flask Files
Serving Data with Flask
Delivering Data Files
Dynamic Data with Flask APIs
A Simple Data API with Flask
Using Static or Dynamic Delivery
Summary
Chapter 13. RESTful Data with Flask
The Tools for a RESTful Job
Creating the Database
A Flask RESTful Data Server
Serializing with marshmallow
Adding our RESTful API Routes
Posting Data to the API
Extending the API with MethodViews
Paginating the Data Returns
Deploying the API Remotely with Heroku
CORS
Consuming the API Using JavaScript
Summary
Part V. Visualizing Your Data with D3 and Plotly
Chapter 14. Bringing Your Charts to the Web with Matplotlib and Plotly
Static Charts with Matplotlib
Adapting to Screen Sizes
Using Remote Images or Assets
Charting with Plotly
Basic Charts
Plotly Express
Plotly Graph-Objects
Mapping with Plotly
Adding Custom Controls with Plotly
From Notebook to Web with Plotly
Native JavaScript Charts with Plotly
Fetching JSON Files
User-Driven Plotly with JavaScript and HTML
Summary
Chapter 15. Imagining a Nobel Visualization
Who Is It For?
Choosing Visual Elements
Menu Bar
Prizes by Year
A Map Showing Selected Nobel Countries
A Bar Chart Showing Number of Winners by Country
A List of the Selected Winners
A Mini-Biography Box with Picture
The Complete Visualization
Summary
Chapter 16. Building a Visualization
Preliminaries
Core Components
Organizing Your Files
Serving the Data
The HTML Skeleton
CSS Styling
The JavaScript Engine
Importing the Scripts
Modular JS with Imports
Basic Data Flow
The Core Code
Initializing the Nobel Prize Visualization
Ready to Go
Data-Driven Updates
Filtering Data with Crossfilter
Running the Nobel Prize Visualization App
Summary
Chapter 17. Introducing D3—The Story of a Bar Chart
Framing the Problem
Working with Selections
Adding DOM Elements
Leveraging D3
Measuring Up with D3’s Scales
Quantitative Scales
Ordinal Scales
Unleashing the Power of D3 with Data Binding/Joining
Updating the DOM with Data
Putting the Bar Chart Together
Axes and Labels
Transitions
Updating the Bar Chart
Summary
Chapter 18. Visualizing Individual Prizes
Building the Framework
Scales
Axes
Category Labels
Nesting the Data
Adding the Winners with a Nested Data-Join
A Little Transitional Sparkle
Updating the Bar Chart
Summary
Chapter 19. Mapping with D3
Available Maps
D3’s Mapping Data Formats
GeoJSON
TopoJSON
Converting Maps to TopoJSON
D3 Geo, Projections, and Paths
Projections
Paths
graticules
Putting the Elements Together
Updating the Map
Adding Value Indicators
Our Completed Map
Building a Simple Tooltip
Updating the Map
Summary
Chapter 20. Visualizing Individual Winners
Building the List
Building the Bio-Box
Updating the Winners List
Summary
Chapter 21. The Menu Bar
Creating HTML Elements with D3
Building the Menu Bar
Building the Category Selector
Adding the Gender Selector
Adding the Country Selector
Wiring Up the Metric Radio Button
Summary
Chapter 22. Conclusion
Recap
Part I: Basic Toolkit
Part II: Getting Your Data
Part III: Cleaning and Exploring Data with pandas
Part IV: Delivering the Data
Part V: Visualizing Your Data with D3 and Plotly
Future Progress
Visualizing Social Media Networks
Machine-Learning Visualizations
Final Thoughts
Appendix A. D3’s enter/exit Pattern
The enter Method
Accessing the Bound Data
Index
About the Author
Colophon