Take the next steps in your data science career! This friendly and hands-on guide shows you how to start mastering pandas with skills you already know from spreadsheet software.
In Pandas in Action you will learn how to:
• Import datasets, identify issues with their data structures, and optimize them for efficiency
• Sort, filter, pivot, and draw conclusions from a dataset and its subsets
• Identify trends from text-based and time-based data
• Organize, group, merge, and join separate datasets
• Use a GroupBy object to store multiple DataFrames
Pandas has rapidly become one of Python's most popular data analysis libraries. In Pandas in Action, a friendly and example-rich introduction, author Boris Paskhaver shows you how to master this versatile tool and take the next steps in your data science career. You'll learn how easy pandas makes it to efficiently sort, analyze, filter, and munge almost any type of data.
About the technology
Data analysis with Python doesn't have to be hard. If you can use a spreadsheet, you can learn pandas! While its grid-style layouts may remind you of Excel, pandas is far more flexible and powerful. This Python library quickly performs operations on millions of rows, and it interfaces easily with other tools in the Python data ecosystem. It's a perfect way to up your data game.
About the book
Pandas in Action introduces Python-based data analysis using the amazing pandas library. You'll learn to automate repetitive operations and gain deeper insights into your data that would be impractical—or impossible—in Excel. Each chapter is a self-contained tutorial. Realistic downloadable datasets help you learn from the kind of messy data you'll find in the real world.
What's inside
• Organize, group, merge, split, and join datasets
• Find trends in text-based and time-based data
• Sort, filter, pivot, optimize, and draw conclusions
• Apply aggregate operations
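To give a flavor of the operations listed above, here is a minimal sketch. It is not taken from the book: the `sales` table and its column names are invented for illustration, and it assumes only that pandas is installed.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "West", "East"],
    "product": ["A", "A", "B", "B", "A"],
    "units": [10, 4, 7, 12, 3],
})

# Sort rows by a column, largest values first.
by_units = sales.sort_values("units", ascending=False)

# Filter rows with a Boolean condition.
large_orders = sales[sales["units"] >= 7]

# Group rows by region and apply an aggregate operation.
units_per_region = sales.groupby("region")["units"].sum()

# Pivot: regions as rows, products as columns, summed units as values.
pivot = sales.pivot_table(
    index="region", columns="product", values="units", aggfunc="sum"
)

print(units_per_region)
print(pivot)
```

Each step returns a new object and leaves `sales` unchanged, so the operations can be tried in any order.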
About the reader
For readers experienced with spreadsheets and basic Python programming.
About the author
Boris Paskhaver is a software engineer, Agile consultant, and online educator. His programming courses have been taken by 300,000 students across 190 countries.
Author(s): Boris Paskhaver
Edition: 1
Publisher: Manning Publications
Year: 2021
Language: English
Commentary: Vector PDF
Pages: 440
City: Shelter Island, NY
Tags: Python; Data Visualization; pandas; Relational Algebra; Time Series Analysis; Data Exploration
Pandas in Action
contents
preface
acknowledgments
about this book
Who should read this book
How this book is organized: A road map
About the code
liveBook discussion forum
Other online resources
about the author
about the cover illustration
Part 1 Core pandas
1 Introducing pandas
1.1 Data in the 21st century
1.2 Introducing pandas
1.2.1 Pandas vs. graphical spreadsheet applications
1.2.2 Pandas vs. its competitors
1.3 A tour of pandas
1.3.1 Importing a data set
1.3.2 Manipulating a DataFrame
1.3.3 Counting values in a Series
1.3.4 Filtering a column by one or more criteria
1.3.5 Grouping data
Summary
2 The Series object
2.1 Overview of a Series
2.1.1 Classes and instances
2.1.2 Populating the Series with values
2.1.3 Customizing the Series index
2.1.4 Creating a Series with missing values
2.2 Creating a Series from Python objects
2.3 Series attributes
2.4 Retrieving the first and last rows
2.5 Mathematical operations
2.5.1 Statistical operations
2.5.2 Arithmetic operations
2.5.3 Broadcasting
2.6 Passing the Series to Python’s built-in functions
2.7 Coding challenge
2.7.1 Problems
2.7.2 Solutions
Summary
3 Series methods
3.1 Importing a data set with the read_csv function
3.2 Sorting a Series
3.2.1 Sorting by values with the sort_values method
3.2.2 Sorting by index with the sort_index method
3.2.3 Retrieving the smallest and largest values with the nsmallest and nlargest methods
3.3 Overwriting a Series with the inplace parameter
3.4 Counting values with the value_counts method
3.5 Invoking a function on every Series value with the apply method
3.6 Coding challenge
3.6.1 Problems
3.6.2 Solutions
Summary
4 The DataFrame object
4.1 Overview of a DataFrame
4.1.1 Creating a DataFrame from a dictionary
4.1.2 Creating a DataFrame from a NumPy ndarray
4.2 Similarities between Series and DataFrames
4.2.1 Importing a DataFrame with the read_csv function
4.2.2 Shared and exclusive attributes of Series and DataFrames
4.2.3 Shared methods of Series and DataFrames
4.3 Sorting a DataFrame
4.3.1 Sorting by a single column
4.3.2 Sorting by multiple columns
4.4 Sorting by index
4.4.1 Sorting by row index
4.4.2 Sorting by column index
4.5 Setting a new index
4.6 Selecting columns and rows from a DataFrame
4.6.1 Selecting a single column from a DataFrame
4.6.2 Selecting multiple columns from a DataFrame
4.7 Selecting rows from a DataFrame
4.7.1 Extracting rows by index label
4.7.2 Extracting rows by index position
4.7.3 Extracting values from specific columns
4.8 Extracting values from Series
4.9 Renaming columns or rows
4.10 Resetting an index
4.11 Coding challenge
4.11.1 Problems
4.11.2 Solutions
Summary
5 Filtering a DataFrame
5.1 Optimizing a data set for memory use
5.1.1 Converting data types with the astype method
5.2 Filtering by a single condition
5.3 Filtering by multiple conditions
5.3.1 The AND condition
5.3.2 The OR condition
5.3.3 Inversion with ~
5.3.4 Methods for Booleans
5.4 Filtering by condition
5.4.1 The isin method
5.4.2 The between method
5.4.3 The isnull and notnull methods
5.4.4 Dealing with null values
5.5 Dealing with duplicates
5.5.1 The duplicated method
5.5.2 The drop_duplicates method
5.6 Coding challenge
5.6.1 Problems
5.6.2 Solutions
Summary
Part 2 Applied pandas
6 Working with text data
6.1 Letter casing and whitespace
6.2 String slicing
6.3 String slicing and character replacement
6.4 Boolean methods
6.5 Splitting strings
6.6 Coding challenge
6.6.1 Problems
6.6.2 Solutions
6.7 A note on regular expressions
Summary
7 MultiIndex DataFrames
7.1 The MultiIndex object
7.2 MultiIndex DataFrames
7.3 Sorting a MultiIndex
7.4 Selecting with a MultiIndex
7.4.1 Extracting one or more columns
7.4.2 Extracting one or more rows with loc
7.4.3 Extracting one or more rows with iloc
7.5 Cross-sections
7.6 Manipulating the Index
7.6.1 Resetting the index
7.6.2 Setting the index
7.7 Coding challenge
7.7.1 Problems
7.7.2 Solutions
Summary
8 Reshaping and pivoting
8.1 Wide vs. narrow data
8.2 Creating a pivot table from a DataFrame
8.2.1 The pivot_table method
8.2.2 Additional options for pivot tables
8.3 Stacking and unstacking index levels
8.4 Melting a data set
8.5 Exploding a list of values
8.6 Coding challenge
8.6.1 Problems
8.6.2 Solutions
Summary
9 The GroupBy object
9.1 Creating a GroupBy object from scratch
9.2 Creating a GroupBy object from a data set
9.3 Attributes and methods of a GroupBy object
9.4 Aggregate operations
9.5 Applying a custom operation to all groups
9.6 Grouping by multiple columns
9.7 Coding challenge
9.7.1 Problems
9.7.2 Solutions
Summary
10 Merging, joining, and concatenating
10.1 Introducing the data sets
10.2 Concatenating data sets
10.3 Missing values in concatenated DataFrames
10.4 Left joins
10.5 Inner joins
10.6 Outer joins
10.7 Merging on index labels
10.8 Coding challenge
10.8.1 Problems
10.8.2 Solutions
Summary
11 Working with dates and times
11.1 Introducing the Timestamp object
11.1.1 How Python works with datetimes
11.1.2 How pandas works with datetimes
11.2 Storing multiple timestamps in a DatetimeIndex
11.3 Converting column or index values to datetimes
11.4 Using the DatetimeProperties object
11.5 Adding and subtracting durations of time
11.6 Date offsets
11.7 The Timedelta object
11.8 Coding challenge
11.8.1 Problems
11.8.2 Solutions
Summary
12 Imports and exports
12.1 Reading from and writing to JSON files
12.1.1 Loading a JSON file into a DataFrame
12.1.2 Exporting a DataFrame to a JSON file
12.2 Reading from and writing to CSV files
12.3 Reading from and writing to Excel workbooks
12.3.1 Installing the xlrd and openpyxl libraries in an Anaconda environment
12.3.2 Importing Excel workbooks
12.3.3 Exporting Excel workbooks
12.4 Coding challenge
12.4.1 Problems
12.4.2 Solutions
Summary
13 Configuring pandas
13.1 Getting and setting pandas options
13.2 Precision
13.3 Maximum column width
13.4 Chop threshold
13.5 Option context
Summary
14 Visualization
14.1 Installing matplotlib
14.2 Line charts
14.3 Bar graphs
14.4 Pie charts
Summary
appendix A Installation and setup
A.1 The Anaconda distribution
A.2 The macOS setup process
A.2.1 Installing Anaconda in macOS
A.2.2 Launching Terminal
A.2.3 Common Terminal commands
A.3 The Windows setup process
A.3.1 Installing Anaconda in Windows
A.3.2 Launching Anaconda Prompt
A.3.3 Common Anaconda Prompt commands
A.4 Creating a new Anaconda environment
A.5 Anaconda Navigator
A.6 The basics of Jupyter Notebook
appendix B Python crash course
B.1 Simple data types
B.1.1 Numbers
B.1.2 Strings
B.1.3 Booleans
B.1.4 The None object
B.2 Operators
B.2.1 Mathematical operators
B.2.2 Equality and inequality operators
B.3 Variables
B.4 Functions
B.4.1 Arguments and return values
B.4.2 Custom functions
B.5 Modules
B.6 Classes and objects
B.7 Attributes and methods
B.8 String methods
B.9 Lists
B.9.1 List iteration
B.9.2 List comprehension
B.9.3 Converting a string to a list and vice versa
B.10 Tuples
B.11 Dictionaries
B.11.1 Dictionary iteration
B.12 Sets
appendix C NumPy crash course
C.1 Dimensions
C.2 The ndarray object
C.2.1 Generating a numeric range with the arange method
C.2.2 Attributes on an ndarray object
C.2.3 The reshape method
C.2.4 The randint function
C.2.5 The randn function
C.3 The nan object
appendix D Generating fake data with Faker
D.1 Installing Faker
D.2 Getting started with Faker
D.3 Populating a DataFrame with fake values
appendix E Regular expressions
E.1 Introduction to Python’s re module
E.2 Metacharacters
E.3 Advanced search patterns
E.4 Regular expressions and pandas
index