Data Science for Water Utilities: Data as a Source of Value

This addition to the Data Science Series applies the principles of data science and the R language to the specific needs of water professionals. The book provides data and examples relevant to managing a water utility, drawn from the author’s extensive experience.

Data Science for Water Utilities: Data as a Source of Value is an applied, practical guide that shows water professionals how to use data science to solve urban water management problems. The content develops through four case studies. The first analyses water quality data to ensure public health. The second considers customer feedback. The third introduces smart meter data. The fourth analyses data using basic machine learning. The guide starts from basic principles, and the code increases in complexity with each case study.

Readers should be familiar with analysing data but do not need coding experience to use this book. The title is essential reading for anyone seeking a practical introduction to data science and to creating value with R.

Author(s): Peter Prevos
Series: Chapman & Hall/CRC Data Science Series
Publisher: CRC Press/Chapman & Hall
Year: 2023

Language: English
Pages: 211
City: Boca Raton

Cover
Half Title
Series Page
Title Page
Copyright Page
Contents
Preface
Foreword
1. Introduction
1.1. Who Is This Book For?
1.1.1. Prerequisites
1.2. Book Structure
1.2.1. Case Study 1: Exploring Water Quality Data
1.2.2. Case Study 2: Understanding the Customer Experience
1.2.3. Case Study 3: Digital Metering Data
1.2.4. Case Study 4: Predicting Concrete Strength
1.2.5. Closing Chapter
1.3. What Is Data Science?
1.3.1. Data Science Unicorns
1.4. What Is Good Data Science?
1.4.1. Useful Data Science
1.4.2. Sound Data Science
1.4.3. Aesthetic Data Science
1.4.4. Data Science Ethics
2. Basics of the R Language
2.1. Download and Install R and RStudio
2.2. Basics of the R language
2.2.1. Using the Console
2.2.2. The Assignment Operator
2.2.3. Variable Names
2.2.4. Doing Arithmetic
2.2.5. Vector Variables
2.2.6. Arithmetic Functions
2.2.7. Basic Visualisations
2.3. Developing R Code
2.3.1. RStudio Projects
2.3.2. Writing Elegant Code
2.3.3. Finding Help
2.3.4. Debugging Your Code
2.3.5. Reverse-Engineering
2.4. Case Study
2.4.1. Generating Sequences
2.4.2. Repeat Code with Loops
2.5. Further Study
3. Loading and Exploring Data
3.1. R Packages
3.1.1. Introducing the Tidyverse
3.2. Case Study 1: Exploring Water Quality Data
3.3. Loading Data
3.3.1. Reading CSV Files
3.3.2. Reading Tibbles
3.3.3. Reading Excel Spreadsheets
3.4. Exploring Data Frames
3.5. Variable Types
3.6. Explore the Content of a Data Frame
3.7. Filtering Data Frames
3.7.1. Logical Values
3.7.2. Filtering in Tidyverse
3.8. Counting Data
3.9. Further Study
4. Descriptive Statistics
4.1. Case Study Problem Statement
4.2. Measures of Central Tendency
4.2.1. Mean
4.2.2. Median
4.2.3. Mode
4.3. Measures of Position
4.3.1. Calculating Percentiles
4.4. Measures of Dispersion
4.4.1. Range
4.4.2. Inter-Quartile Range
4.4.3. Variance
4.4.4. Standard Deviation
4.5. Measures of Shape
4.5.1. Skewness
4.5.2. Kurtosis
4.6. Analysing Grouped Data
4.7. Basic Data Visualisation
4.7.1. Histograms
4.7.2. Box and Whisker Plots
4.8. Further Study
5. Visualising Data with ggplot2
5.1. Principles of Visualisation
5.1.1. Aesthetics of Visualisation
5.1.2. Telling Stories
5.1.3. Visualising Data in R
5.2. Telling Stories with ggplot2
5.2.1. Data Layer
5.2.2. Aesthetics Layer
5.2.3. Geometries Layer
5.2.4. Colour Aesthetics
5.2.5. Facets Layer
5.2.6. Statistics Layer
5.2.7. Coordinates Layer
5.2.8. Theme Layer
5.2.9. Sharing Visualisations
5.3. Further Study
6. Sharing Results
6.1. Data Science Workflow
6.2. Define
6.3. Prepare
6.4. Understand
6.4.1. Explore
6.4.2. Model
6.4.3. Reflect
6.5. Communicate
6.5.1. Reproducible and Replicable Research
6.5.2. Workflow for R Coding
6.6. R Markdown
6.6.1. Defining Metadata
6.6.2. Formatted Text
6.6.3. Code Chunks
6.6.4. Formatting Tables
6.6.5. Inline Code
6.6.6. Knitting the Document
6.7. Presenting Numbers
6.7.1. Pasting Results Together
6.8. Further Study
7. Managing Dirty Data
7.1. Case Study 2: Understanding the Customer Experience
7.2. Cleaning Data
7.3. Load and Explore the Data
7.4. Convert the Data Structure
7.4.1. Convert Data Types
7.4.2. Select Relevant Variables
7.4.3. Joining Data Frames
7.5. Remove Invalid Data
7.6. Refactoring Code
7.6.1. Using Tidyverse Pipes
7.6.2. Data Cleaning Workflow
7.7. Dealing with Missing Data
7.7.1. Calculating with Missing Data
7.7.2. Remove and Impute Missing Data
7.8. Tidy Data
7.9. Further Study
8. Analysing the Customer Experience
8.1. Measuring Mental States
8.1.1. Reliability and Validity
8.2. Case Study: Consumer Involvement
8.2.1. Personal Involvement Inventory
8.2.2. Preparing the Involvement Data
8.3. Measuring Reliability
8.3.1. Correlation between Responses
8.3.2. Significance Testing for Correlations
8.3.3. Measuring Survey Reliability with Cronbach's Alpha
8.4. Survey Validity
8.4.1. Exploratory Factor Analysis
8.4.2. Factor Analysis with R
8.4.3. Visualising Factor Analysis
8.5. Interpreting Consumer Involvement
8.6. Further Study
9. Basic Linear Regression
9.1. Principles of Linear Regression
9.2. Basic Linear Regression in R
9.3. The Linear Model Function
9.4. Assessing Linear Relationship Models
9.4.1. Residuals
9.4.2. Coefficients
9.4.3. Residual Standard Error
9.4.4. R-Squared
9.4.5. F-statistic
9.5. Graphical Assessment
9.5.1. Residuals versus Fitted Plot
9.5.2. Normal Q-Q Plot
9.5.3. Scale-Location Plot
9.5.4. Residuals versus Leverage Plot
9.6. Polynomial Regression
9.7. Further Study
10. Clustering Customers to Define Segments
10.1. Customer Segmentation
10.2. Clustering Analysis Example
10.3. Hierarchical Clustering
10.3.1. Pre-Process the Data
10.3.2. Scaling Variables
10.3.3. Calculating Distances
10.3.4. Clustering the Distance Matrix
10.3.5. Interpreting Hierarchical Clustering
10.4. K-means Clustering
10.4.1. K-means Clustering in R
10.4.2. Using the Elbow Method
10.5. Clustering Categorical Data
10.5.1. Processing Categorical Variables
10.5.2. Analysing Categorical Clusters
10.6. Further Study
11. Working with Dates and Times
11.1. Date Variables
11.1.1. Defining Date Variables
11.2. Time Variables
11.2.1. Defining Time Variables
11.3. The Lubridate Package
11.4. Exploring Digital Metering Data
11.4.1. Filtering and Grouping by Date and Time
11.4.2. Analysing Water Consumption
11.4.3. Linear Interpolation to Calculate Daily Flows
11.4.4. Diurnal Curves
11.5. Further Study
12. Detecting Outliers and Anomalies
12.1. Detecting Anomalies
12.1.1. Graphical Detection
12.1.2. Standard Deviations
12.1.3. Median Absolute Deviation
12.1.4. Grubbs' Test
12.1.5. Time Series Anomalies
12.1.6. Managing Outliers
12.2. Extending R with Functions
12.2.1. Functional Programming
12.2.2. Variables in Functions
12.3. Detecting Anomalous Water Consumption
12.3.1. Leak Detection
12.4. Further Study
13. Introduction to Machine Learning
13.1. What Is Machine Learning?
13.1.1. Unsupervised Machine Learning
13.1.2. Supervised Learning
13.1.3. Basic Principle of Machine Learning
13.2. Concrete Strength Case Study
13.3. Cross-Validation
13.4. Multiple Linear Regression
13.4.1. Cross-Validation for Regression Models
13.5. Decision Trees
13.5.1. Concrete Strength Case Study
13.5.2. Cross-Validation for Classification Models
13.6. Further Study
14. In Closing
14.1. Start Your Journey to Data Science
14.1.1. How to Ask For Help
14.1.2. Learning Other Languages
Bibliography
Index