Data Wrangling with R: Load, explore, transform and visualize data for modeling with tidyverse libraries

In this information era, with large volumes of data generated every day, companies want a better grip on that data so they can operate more efficiently than before. This is where skillful data analysts and data scientists come into play, wrangling and exploring data to generate valuable business insights. To do that, you'll need tools that let you extract the most useful knowledge from data.

Data Wrangling with R will help you gain a deep understanding of ways to wrangle and prepare datasets for exploration, analysis, and modeling. This book helps you get your data ready for more optimized analyses, develop your first data model, and create effective data visualizations.

The book begins by teaching you how to load and explore datasets. Then, you'll get to grips with the modern concepts and tools of data wrangling. Because data wrangling and visualization are intrinsically connected, you'll go over best practices for plotting data and extracting insights from it. The chapters are designed to help you learn about modeling as you build a data science project from end to end. Along the way, you'll become familiar with RStudio, including building an application with Shiny dashboards. By the end of this book, you'll have learned how to create your first data model and build an application with Shiny in R.

What you will learn
Discover how to load datasets and explore data in R
Work with different types of variables in datasets
Create basic and advanced visualizations
Find out how to build your first data model
Create graphics step by step using ggplot2 in Microsoft Power BI
Get familiar with building an application in R with Shiny

Who this book is for
If you are a professional data analyst, data scientist, or beginner who wants to learn more about data wrangling, this book is for you. Familiarity with the basic concepts of R programming or any other object-oriented programming language will help you grasp the concepts taught in this book. Data analysts looking to improve their data manipulation and visualization skills will also benefit greatly from this book.

Author(s): Gustavo R Santos
Publisher: Packt Publishing
Year: 2023

Language: English
Pages: 385

Cover
Copyright
Contributors
Table of Contents
Preface
Part 1: Load and Explore Data
Chapter 1: Fundamentals of Data Wrangling
What is data wrangling?
Why data wrangling?
Benefits
The key steps of data wrangling
Frameworks in Data Science
Summary
Exercises
Further reading
Chapter 2: Loading and Exploring Datasets
Technical requirements
How to load files to RStudio
Loading a CSV file to R
Tibbles versus Data Frames
Saving files
A workflow for data exploration
Loading and viewing
Descriptive statistics
Missing values
Data distributions
Visualizations
Basic Web Scraping
Getting data from an API
Summary
Exercises
Further reading
Chapter 3: Basic Data Visualization
Technical requirements
Data visualization
Creating single-variable plots
Dataset
Boxplots
Density plot
Creating two-variable plots
Scatterplot
Bar plot
Line plot
Working with multiple variables
Plots side by side
Summary
Exercises
Further reading
Part 2: Data Wrangling
Chapter 4: Working with Strings
Introduction to stringr
Detecting patterns
Subset strings
Managing lengths
Mutating strings
Joining and splitting
Ordering strings
Working with regular expressions
Learning the basics
Creating frequency data summaries in R
Regexps in practice
Creating a contingency table using gmodels
Text mining
Tokenization
Stemming and lemmatization
TF-IDF
N-grams
Factors
Summary
Exercises
Further reading
Chapter 5: Working with Numbers
Technical requirements
Numbers in vectors, matrices, and data frames
Vectors
Matrices
Data frames
Math operations with variables
apply functions
Descriptive statistics
Correlation
Summary
Exercises
Further reading
Chapter 6: Working with Date and Time Objects
Technical requirements
Introduction to date and time
Date and time with lubridate
Arithmetic operations with datetime
Time zones
Date and time using regular expressions (regexps)
Practicing
Summary
Exercises
Further reading
Chapter 7: Transformations with Base R
Technical requirements
The dataset
Slicing and filtering
Slicing
Filtering
Grouping and summarizing
Replacing and filling
Arranging
Creating new variables
Binding
Using data.table
Summary
Exercises
Further reading
Chapter 8: Transformations with Tidyverse Libraries
Technical requirements
What is tidy data?
The pipe operator
Slicing and filtering
Slicing
Filtering
Grouping and summarizing data
Replacing and filling data
Arranging data
Creating new variables
The mutate function
Joining datasets
Left join
Right join
Inner join
Full join
Anti-join
Reshaping a table
Do more with tidyverse
Summary
Exercises
Further reading
Chapter 9: Exploratory Data Analysis
Technical requirements
Loading the dataset to RStudio
Understanding the data
Treating missing data
Exploring and visualizing the data
Univariate analysis
Multivariate analysis
Exploring
Analysis report
Report
Next steps
Summary
Exercises
Further reading
Part 3: Data Visualization
Chapter 10: Introduction to ggplot2
Technical requirements
The grammar of graphics
Data
Geometry
Aesthetics
Statistics
Coordinates
Facets
Themes
The basic syntax of ggplot2
Plot types
Histograms
Boxplot
Scatterplot
Bar plots
Line plots
Smooth geometry
Themes
Summary
Exercises
Further reading
Chapter 11: Enhanced Visualizations with ggplot2
Technical requirements
Facet grids
Map plots
Time series plots
3D plots
Adding interactivity to graphics
Summary
Exercises
Further reading
Chapter 12: Other Data Visualization Options
Technical requirements
Plotting graphics in Microsoft Power BI using R
Preparing data for plotting
Creating word clouds in RStudio
Summary
Exercises
Further reading
Part 4: Modeling
Chapter 13: Building a Model with R
Technical requirements
Machine learning concepts
Classification models
Regression models
Supervised and unsupervised learning
Understanding the project
The dataset
The project
The algorithm
Preparing data for modeling in R
Exploring the data with a few visualizations
Selecting the best variables
Modeling
Training
Testing and evaluating the model
Predicting
Summary
Exercises
Further reading
Chapter 14: Build an Application with Shiny in R
Technical requirements
Learning the basics of Shiny
Get started
Basic functions
Creating an application
The project
Coding
Deploying the application on the web
Summary
Exercises
Further reading
Conclusion
References
Index
Other Books You May Enjoy