Python Data Analytics: With Pandas, NumPy, and Matplotlib, 3rd Edition

Explore the latest Python tools and techniques to help you tackle the world of data acquisition and analysis. You'll review scientific computing with NumPy, visualization with matplotlib, and machine learning with scikit-learn. This third edition is fully updated for the latest version of Python and its related libraries, and includes coverage of social media data analysis, image analysis with OpenCV, and deep learning libraries. Each chapter includes multiple examples demonstrating how to work with each library. At its heart lies the coverage of pandas, for high-performance, easy-to-use data structures and tools for data manipulation.

Author Fabio Nelli expertly demonstrates using Python for data processing, management, and information retrieval. Later chapters apply what you've learned to handwriting recognition and extending graphical capabilities with the JavaScript D3 library. Whether you are dealing with sales data, investment data, medical data, web page usage, or other data sets, Python Data Analytics, Third Edition is an invaluable reference with its examples of storing, accessing, and analyzing data.

What You'll Learn
Understand the core concepts of data analysis and the Python ecosystem
Go in depth with pandas for reading, writing, and processing data
Use tools and techniques for data visualization and image analysis
Examine the popular deep learning libraries Keras, Theano, TensorFlow, and PyTorch

Who This Book Is For
Experienced Python developers who need to learn about Pythonic tools for data analysis

Author(s): Fabio Nelli
Edition: 3
Publisher: Apress
Year: 2023

Language: English
Pages: 455

Table of Contents
About the Author
About the Technical Reviewer
Preface
Chapter 1: An Introduction to Data Analysis
Data Analysis
Knowledge Domains of the Data Analyst
Computer Science
Mathematics and Statistics
Machine Learning and Artificial Intelligence
Professional Fields of Application
Understanding the Nature of the Data
When the Data Become Information
When the Information Becomes Knowledge
Types of Data
The Data Analysis Process
Problem Definition
Data Extraction
Data Preparation
Data Exploration/Visualization
Predictive Modeling
Model Validation
Deployment
Quantitative and Qualitative Data Analysis
Open Data
Python and Data Analysis
Conclusions
Chapter 2: Introduction to the Python World
Python—The Programming Language
The Interpreter and the Execution Phases of the Code
CPython
Cython
Pyston
Jython
IronPython
PyPy
RustPython
Installing Python
Python Distributions
Anaconda
Anaconda Navigator
Using Python
Python Shell
Run an Entire Program
Implement the Code Using an IDE
Interact with Python
Writing Python Code
Make Calculations
Import New Libraries and Functions
Data Structure
Functional Programming
Indentation
IPython
IPython Shell
The Jupyter Project
Jupyter QtConsole
Jupyter Notebook
Jupyter Lab
PyPI—The Python Package Index
The IDEs for Python
Spyder
Eclipse (pyDev)
Sublime
Liclipse
NinjaIDE
Komodo IDE
SciPy
NumPy
Pandas
matplotlib
Conclusions
Chapter 3: The NumPy Library
NumPy: A Little History
The NumPy Installation
ndarray: The Heart of the Library
Create an Array
Types of Data
The dtype Option
Intrinsic Creation of an Array
Basic Operations
Arithmetic Operators
The Matrix Product
Increment and Decrement Operators
Universal Functions (ufunc)
Aggregate Functions
Indexing, Slicing, and Iterating
Indexing
Slicing
Iterating an Array
Conditions and Boolean Arrays
Shape Manipulation
Array Manipulation
Joining Arrays
Splitting Arrays
General Concepts
Copies or Views of Objects
Vectorization
Broadcasting
Structured Arrays
Reading and Writing Array Data on Files
Loading and Saving Data in Binary Files
Reading Files with Tabular Data
Conclusions
Chapter 4: The pandas Library—An Introduction
pandas: The Python Data Analysis Library
Installation of pandas
Installation from Anaconda
Installation from PyPI
Getting Started with pandas
Introduction to pandas Data Structures
The Series
Declaring a Series
Selecting the Internal Elements
Assigning Values to the Elements
Defining a Series from NumPy Arrays and Other Series
Filtering Values
Operations and Mathematical Functions
Evaluating Values
NaN Values
Series as Dictionaries
Operations Between Series
The Dataframe
Defining a Dataframe
Selecting Elements
Assigning Values
Membership of a Value
Deleting a Column
Filtering
Dataframe from a Nested dict
Transposition of a Dataframe
The Index Objects
Methods on Index
Index with Duplicate Labels
Other Functionalities on Indexes
Reindexing
Dropping
Arithmetic and Data Alignment
Operations Between Data Structures
Flexible Arithmetic Methods
Operations Between Dataframes and Series
Function Application and Mapping
Functions by Element
Functions by Row or Column
Statistics Functions
Sorting and Ranking
Correlation and Covariance
“Not a Number” Data
Assigning a NaN Value
Filtering Out NaN Values
Filling in NaN Occurrences
Hierarchical Indexing and Leveling
Reordering and Sorting Levels
Summary Statistics with groupby Instead of with Level
Conclusions
Chapter 5: pandas: Reading and Writing Data
I/O API Tools
CSV and Textual Files
Reading Data in CSV or Text Files
Using Regexp to Parse TXT Files
Reading TXT Files Into Parts
Writing Data in CSV
Reading and Writing HTML Files
Writing Data in HTML
Reading Data from an HTML File
Reading Data from XML
Reading and Writing Data on Microsoft Excel Files
JSON Data
The HDF5 Format
Pickle—Python Object Serialization
Serialize a Python Object with cPickle
Pickling with pandas
Interacting with Databases
Loading and Writing Data with SQLite3
Loading and Writing Data with PostgreSQL in a Docker Container
Reading and Writing Data with a NoSQL Database: MongoDB
Conclusions
Chapter 6: pandas in Depth: Data Manipulation
Data Preparation
Merging
Merging on an Index
Concatenating
Combining
Pivoting
Pivoting with Hierarchical Indexing
Pivoting from “Long” to “Wide” Format
Removing
Data Transformation
Removing Duplicates
Mapping
Replacing Values via Mapping
Adding Values via Mapping
Rename the Indexes of the Axes
Discretization and Binning
Detecting and Filtering Outliers
Permutation
Random Sampling
String Manipulation
Built-in Methods for String Manipulation
Regular Expressions
Data Aggregation
GroupBy
A Practical Example
Hierarchical Grouping
Group Iteration
Chain of Transformations
Functions on Groups
Advanced Data Aggregation
Conclusions
Chapter 7: Data Visualization with matplotlib and Seaborn
The matplotlib Library
Installation
The matplotlib Architecture
Backend Layer
Artist Layer
Scripting Layer (pyplot)
pylab and pyplot
pyplot
The Plotting Window
Data Visualization with Jupyter Notebook
Set the Properties of the Plot
matplotlib and NumPy
Using kwargs
Working with Multiple Figures and Axes
Adding Elements to the Chart
Adding Text
Adding a Grid
Adding a Legend
Saving Your Charts
Saving the Code
Saving Your Notebook as an HTML File or as Other File Formats
Saving Your Chart Directly as an Image
Handling Date Values
Chart Typology
Line Charts
Line Charts with pandas
Histograms
Bar Charts
Horizontal Bar Charts
Multiserial Bar Charts
Multiseries Bar Charts with a pandas Dataframe
Multiseries Stacked Bar Charts
Stacked Bar Charts with a pandas Dataframe
Other Bar Chart Representations
Pie Charts
Pie Charts with a pandas Dataframe
Advanced Charts
Contour Plots
Polar Charts
The mplot3d Toolkit
3D Surfaces
Scatter Plots in 3D
Bar Charts in 3D
Multipanel Plots
Display Subplots Within Other Subplots
Grids of Subplots
The Seaborn Library
Conclusions
Chapter 8: Machine Learning with scikit-learn
The scikit-learn Library
Machine Learning
Supervised and Unsupervised Learning
Supervised Learning
Unsupervised Learning
Training Set and Testing Set
Supervised Learning with scikit-learn
The Iris Flower Dataset
The PCA Decomposition
K-Nearest Neighbors Classifier
Diabetes Dataset
Linear Regression: The Least Square Regression
Support Vector Machines (SVMs)
Support Vector Classification (SVC)
Nonlinear SVC
Plotting Different SVM Classifiers Using the Iris Dataset
Support Vector Regression (SVR)
Conclusions
Chapter 9: Deep Learning with TensorFlow
Artificial Intelligence, Machine Learning, and Deep Learning
Artificial Intelligence
Machine Learning Is a Branch of Artificial Intelligence
Deep Learning Is a Branch of Machine Learning
The Relationship Between Artificial Intelligence, Machine Learning, and Deep Learning
Deep Learning
Neural Networks and GPUs
Data Availability: Open Data Source, Internet of Things, and Big Data
Python
Deep Learning Python Frameworks
Artificial Neural Networks
How Artificial Neural Networks Are Structured
Single Layer Perceptron (SLP)
Multilayer Perceptron (MLP)
Correspondence Between Artificial and Biological Neural Networks
TensorFlow
TensorFlow: Google’s Framework
TensorFlow: Data Flow Graph
Start Programming with TensorFlow
TensorFlow 2.x vs TensorFlow 1.x
Installing TensorFlow
Programming with the Jupyter Notebook
Tensors
Loading Data Into a Tensor from a pandas Dataframe
Loading Data in a Tensor from a CSV File
Operation on Tensors
Developing a Deep Learning Model with TensorFlow
Model Building
Model Compiling
Model Training and Testing
Prediction Making
Practical Examples with TensorFlow 2.x
Single Layer Perceptron with TensorFlow
Before Starting
Data To Be Analyzed
Multilayer Perceptron (with One Hidden Layer) with TensorFlow
Multilayer Perceptron (with Two Hidden Layers) with TensorFlow
Conclusions
Chapter 10: An Example—Meteorological Data
A Hypothesis to Be Tested: The Influence of the Proximity of the Sea
The System in the Study: The Adriatic Sea and the Po Valley
Finding the Data Source
Data Analysis on Jupyter Notebook
Analysis of Processed Meteorological Data
The RoseWind
Calculating the Mean Distribution of the Wind Speed
Conclusions
Chapter 11: Embedding the JavaScript D3 Library in the IPython Notebook
The Open Data Source for Demographics
The JavaScript D3 Library
Drawing a Clustered Bar Chart
The Choropleth Maps
The Choropleth Map of the U.S. Population in 2022
Conclusions
Chapter 12: Recognizing Handwritten Digits
Handwriting Recognition
Recognizing Handwritten Digits with scikit-learn
The Digits Dataset
Learning and Predicting
Recognizing Handwritten Digits with TensorFlow
Learning and Predicting with an SLP
Learning and Predicting with an MLP
Conclusions
Chapter 13: Textual Data Analysis with NLTK
Text Analysis Techniques
The Natural Language Toolkit (NLTK)
Import the NLTK Library and the NLTK Downloader Tool
Search for a Word with NLTK
Analyze the Frequency of Words
Select Words from Text
Bigrams and Collocations
Preprocessing Steps
Use Text on the Network
Extract the Text from the HTML Pages
Sentiment Analysis
Conclusions
Chapter 14: Image Analysis and Computer Vision with OpenCV
Image Analysis and Computer Vision
OpenCV and Python
OpenCV and Deep Learning
Installing OpenCV
First Approaches to Image Processing and Analysis
Before Starting
Load and Display an Image
Work with Images
Save the New Image
Elementary Operations on Images
Image Blending
Image Analysis
Edge Detection and Image Gradient Analysis
Edge Detection
The Image Gradient Theory
A Practical Example of Edge Detection with the Image Gradient Analysis
A Deep Learning Example: Face Detection
Conclusions
Appendix A: Writing Mathematical Expressions with LaTeX
With matplotlib
With Jupyter Notebook in a Python Cell
With Jupyter Notebook in a Markdown Cell
Subscripts and Superscripts
Fractions, Binomials, and Stacked Numbers
Radicals
Fonts
Accents
Appendix B: Open Data Sources
Political and Government Data
Health Data
Social Data
Miscellaneous and Public Datasets
Financial Data
Climatic Data
Sports Data
Publications, Newspapers, and Books
Musical Data
Index