Understand data analysis concepts to make accurate decisions based on data using Python programming and Jupyter Notebook
Key Features
• Find out how to use Python code to extract insights from data using real-world examples
• Work with structured data and free text sources to answer questions and add value using data
• Perform data analysis from scratch with the help of clear explanations for cleaning, transforming, and visualizing data
Book Description
Data literacy is the ability to read, analyze, work with, and argue using data. Data analysis is the process of cleaning and modeling your data to discover useful information. This book combines these two concepts by sharing proven techniques and hands-on examples so that you can learn how to communicate effectively using data.
After introducing you to the basics of data analysis using Jupyter Notebook and Python, the book will take you through the fundamentals of data. Packed with practical examples, this guide will teach you how to clean, wrangle, analyze, and visualize data to gain useful insights, and you'll discover how to answer questions using data with easy-to-follow steps.
Later chapters teach you about storytelling with data using charts such as histograms and scatter plots. As you advance, you'll learn how to work with unstructured data, using natural language processing (NLP) techniques to perform sentiment analysis. Together, these skills will help you discover key patterns and trends in real-world data. You will also learn how to handle data of varying complexity and analyze it efficiently using modern Python libraries.
By the end of this book, you'll have gained the practical skills you need to analyze data with confidence.
What you will learn
• Understand the importance of data literacy and how to communicate effectively using data
• Find out how to use Python packages such as NumPy, pandas, Matplotlib, and the Natural Language Toolkit (NLTK) for data analysis
• Wrangle data and create DataFrames using pandas
• Produce charts and data visualizations using time-series datasets
• Discover relationships and how to join data together using SQL
• Use NLP techniques to work with unstructured data to create sentiment analysis models
• Discover patterns in real-world datasets that provide accurate insights
Who this book is for
This book is for aspiring data analysts and data scientists looking for hands-on tutorials and real-world examples to understand data analysis concepts using SQL, Python, and Jupyter Notebook. Anyone looking to evolve their skills to become data-driven personally and professionally will also find this book useful. No prior knowledge of data analysis or programming is required to get started with this book.
Author(s): Marc Wintjen
Edition: 1
Publisher: Packt Publishing
Year: 2020
Language: English
Commentary: Vector PDF
Pages: 322
City: Birmingham, UK
Cover
Title Page
Copyright and Credits
About Packt
Foreword
Contributors
Table of Contents
Preface
Section 1: Data Analysis Essentials
Chapter 1: Fundamentals of Data Analysis
The evolution of data analysis and why it is important
What makes a good data analyst?
Know Your Data (KYD)
Voice of the Customer (VOC)
Always Be Agile (ABA)
Understanding data types and their significance
Unstructured data
Semi-structured data
Structured data
Common data types
Data classifications and data attributes explained
Data attributes
Understanding data literacy
Reading data
Working with data
Analyzing data
Arguing about the data
Summary
Further reading
Chapter 2: Overview of Python and Installing Jupyter Notebook
Technical requirements
Installing Python and using Jupyter Notebook
Installing Anaconda
Running Jupyter and installing Python packages for data analysis
Storing and retrieving data files
Hello World! – running your first Python code
Creating a project folder hierarchy
Uploading a file
Exploring Python packages
Checking for pandas
Checking for NumPy
Checking for sklearn
Checking for Matplotlib
Checking for SciPy
Summary
Further reading
Chapter 3: Getting Started with NumPy
Technical requirements
Understanding a Python NumPy array and its importance
Differences between single and multiple dimensional arrays
Making your first NumPy array
Useful array functions
Practical use cases of NumPy and arrays
Assigning values to arrays manually
Assigning values to arrays directly
Assigning values to an array using a loop
Summary
Further reading
Chapter 4: Creating Your First pandas DataFrame
Technical requirements
Techniques for manipulating tabular data
Understanding pandas and DataFrames
Handling essential data formats
CSV
XML
Data hierarchy
Defined schema
JSON
Data dictionaries and data types
Creating our first DataFrame
Summary
Further reading
Chapter 5: Gathering and Loading Data in Python
Technical requirements
Introduction to SQL and relational databases
From SQL to pandas DataFrames
Data about your data explained
Fundamental statistics
Metadata explained
The importance of data lineage
Data flow
The input stage
The data ingestion stage
The data source stage
The data target stage
Business rules
Summary
Further reading
Section 2: Solutions for Data Discovery
Chapter 6: Visualizing and Working with Time Series Data
Technical requirements
Data modeling for results
Introducing dimensions and measures
Anatomy of a chart and data viz best practices
Analyzing your data
Why pie charts have lost ground
Art versus science
What makes great data visualizations?
Comparative analysis
Date and time trends explained
The shape of the curve
Creating your first time series chart
Summary
Further reading
Chapter 7: Exploring, Cleaning, Refining, and Blending Datasets
Technical requirements
Retrieving, viewing, and storing tabular data
Retrieving
Viewing
Storing
Learning how to restrict, sort, and sift through data
Restricting
Sorting
Sifting
Cleaning, refining, and purifying data using Python
Combining and binning data
Binning
Summary
Further reading
Chapter 8: Understanding Joins, Relationships, and Aggregates
Technical requirements
Foundations of join relationships
One-to-one relationships
Many-to-one relationships
Many-to-many relationships
Left join
Right join
Inner join
Outer join
Join types in action
Explaining data aggregation
Understanding the granularity of data
Data aggregation in action
Summary statistics and outliers
Summary
Further reading
Chapter 9: Plotting, Visualization, and Storytelling
Technical requirements
Explaining distribution analysis
KYD
Shape of the curve
Understanding outliers and trends
Geoanalytical techniques and tips
Finding patterns in data
Summary
Further reading
Section 3: Working with Unstructured Big Data
Chapter 10: Exploring Text Data and Unstructured Data
Technical requirements
Preparing to work with unstructured data
Corpus in action
Tokenization explained
Tokenize in action
Counting words and exploring results
Counting words
Normalizing text techniques
Stemming and lemmatization in action
Excluding words from analysis
Summary
Further reading
Chapter 11: Practical Sentiment Analysis
Technical requirements
Why sentiment analysis is important
Elements of an NLP model
Creating a prediction output
Sentiment analysis packages
Sentiment analysis in action
Manual input
Social media file input
Summary
Further reading
Chapter 12: Bringing It All Together
Technical requirements
Discovering real-world datasets
Data.gov
The Humanitarian Data Exchange
The World Bank
Our World in Data
Reporting results
Storytelling
The Capstone project
KYD sources
Exercise
Summary
Further reading
Works Cited
Other Books You May Enjoy
Index