This easy-to-follow guide provides R and Python recipes to help you learn and apply the top languages in the field of data analytics to your work in Microsoft Power BI. Data analytics expert and author Ryan Wade shows you how to use R and Python to perform tasks that are extremely hard, if not impossible, to do using native Power BI tools. For example, you will learn to score Power BI data using custom data science models and powerful models from Microsoft Cognitive Services.
The R and Python languages are powerful complements to Power BI. They enable advanced data transformation techniques that are difficult to perform in Power BI in its default configuration but become easier by leveraging the capabilities of R and Python. If you are a business analyst, data analyst, or a data scientist who wants to push Power BI and transform it from being just a business intelligence tool into an advanced data analytics tool, then this is the book to help you do that.
What You Will Learn
Create advanced data visualizations via R using the ggplot2 package
Ingest data using R and Python to overcome some limitations of Power Query
Apply machine learning models to your data using R and Python without the need of Power BI premium compacity
Incorporate advanced AI in Power BI without the need of Power BI premium compacity via Microsoft Cognitive Services, IBM Watson Natural Language Understanding, and pre-trained models in SQL Server Machine Learning Services
Perform advanced string manipulations not otherwise possible in Power BI using R and Python
Who This Book Is For
Power users, data analysts, and data scientists who want to go beyond Power BI’s built-in functionality to create advanced visualizations, transform data in ways not otherwise supported, and automate data ingestion from sources such as SQL Server and Excel in a more concise way
Author(s): Ryan Wade
Publisher: Apress
Year: 2020
Language: English
Pages: 440
Table of Contents
About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Part I: Creating Custom Data Visualizations Using R
Chapter 1: The Grammar of Graphics
Steps to build an R custom visual in Power BI
Step 1: Configure Power BI
Step 2: Drag the “R custom visual” icon to the Power BI canvas
Step 3: Define the data set
Step 4: Develop the visual in your default R IDE
Step 5: Use the following template to develop your visual
Step 6: Make the script functional
Recommended steps to create an R visual using ggplot2
Step 1: Load the required packages that you will need for your script
Step 2: Make any required adjustments to the data set
Step 3: Initiate the creation of the visualization with the ggplot() function
Step 4: Add desired geom(s)
Step 5: Define your titles, subtitles, and caption
Step 6: Make any necessary changes to the x and y axis
Step 7: Apply themes if needed
Step 8: Use the theme() function to change any specific non-data elements
Bonus step: Specifying specific colors for your points in your scatter plot
The importance of having “tidy” data
Popular geoms
Controlling aesthetics with scales
Themes built into ggplot2
Using R visuals in the Power BI service
Helper packages for ggplot2
Summary
Chapter 2: Creating R Custom Visuals in Power BI Using ggplot2
Callout chart
Step 1: Acquire the necessary data
Step 2: Create a slicer based on the year in the Filter pane
Step 3: Configure the R visual in Power BI
Step 4: Export data to R Studio for development
Step 5: Load the required packages
Step 6: Create the variables needed for the data validation test
Step 7: Create the data validation test
Step 8: Add additional columns to your data set that is required for the custom R visual
Step 9: Create the variables that will be used for the dynamic portions of the chart
Step 10: Start building the chart by defining the ggplot() function
Step 11: Add a column chart layer to the R visual
Step 12: Add a text layer to the R visuals
Step 13: Modify the y axis
Step 14: Convert chart from a vertical column chart to horizontal bar chart
Step 15: Add a dynamic annotation to the R visuals
Step 16: Add the dynamic titles and caption to the R visual
Step 17: Remove labels from x axis and y axis
Step 18: Remove legend
Step 19: Change the look and feel of the visual using theme_few()
Step 20: Center align titles
Step 21: Add code to Power BI
Bubble chart
Step 1: Acquire the necessary data
Step 2: Load the data into Power BI
Step 3: Create a filter slicer based on the year
Step 4: Do the initial R visual configuration
Step 5: Export data to R Studio for development
Step 6: Load the required packages
Step 7: Create the variables needed for the data validation test
Step 8: Create the data validation test
Step 9: Define the colors for the conferences and conference divisions
Step 10: Dynamically define the chart titles
Step 11: Create the chart’s data set
Step 12: Start the chart by defining the ggplot function
Step 13: Add the layer for your bubble chart using the geom_point geom
Step 14: Add labels to the bubble chart
Step 15: Change the color of the bubble’s border and change the color of the bubble’s fill
Step 16: Create the ggtitle
Step 17: Set the theme
Step 18: Add code to Power BI
Forecast
Step 1: Acquire the necessary data
Step 2: Create a slicer based on quarterback
Step 3: Configure the R visual
Step 4: Export data to R Studio for development
Step 5: Load the required packages to the script
Step 6: Create the variables needed for the data validation test
Step 7: Create the data validation test
Step 8: Create the dynamic chart title
Step 9: Create the data set that is needed to generate the forecast
Step 10: Generate the forecast
Step 11: Generate the plot
Step 12: Add code to Power BI
Line chart with shade
Step 1: Acquire the necessary data
Step 2: Load the data into Power BI
Step 3: Create the report slicers
Step 4: Configure the R visual
Step 5: Export data to R Studio for development
Step 6: Load the required packages to the script
Step 7: Create the variables needed for the data validation test
Step 8: Create the data validation test
Step 9: Create a new data frame based on the dataset data frame
Step 10: Create the variables that will be used for the dynamic portions of the chart
Step 11: Create the data sets needed for background shade
Step 12: Create the data sets needed for line chart
Step 13: Create a named character vector that will be used to color the shades
Step 14: Start the chart by defining the ggplot function
Step 15: Add a layer to create the background shade
Step 16: Add a line chart based on the statistic selected
Step 17: Reshade the background using pre-determined colors based on the political party
Step 18: Format the y axis based on the statistic selected
Step 19: Add labels to the x and y axis
Step 20: Add the dynamic titles and caption to the custom R visuals
Step 21: Apply a theme based on The Economists publication
Step 22: Add code to Power BI
Map
Step 1: Acquire the necessary data
Step 2: Load the data into Power BI
Step 3: Create a slicer based on state in the Filter pane
Step 4: Configure the R visual
Step 5: Export data to R Studio for development
Step 6: Load the required packages
Step 7: Create the variables needed for the data validation test
Step 8: Create the data validation test
Step 9: Create the variables for the chart titles
Step 10: Add the quintile column to the data set
Step 11: Define the colors that will be used to shade the map
Step 12: Define the ggplot() function
Step 13: Add the map layer
Step 14: Format the x and y axis
Step 15: Color the counties based on their quintile
Step 16: Improve the approximation of the selected state
Step 17: Add the dynamic titles and caption to the custom R visuals
Step 18: Apply the theme_map() theme
Step 19: Add code to Power BI
Quad chart
Step 1: Acquire the necessary data
Step 2: Load the data into Power BI
Step 3: Create a slicer for game type and period
Step 4: Configure an R visual on the report canvas
Step 5: Export data to R Studio for development
Step 6: Load the required packages
Step 7: Create the variables needed for the data validation test
Step 8: Create the data validation test
Step 9: Create the chart titles
Step 10: Add additional columns to the data set
Step 11: Start the chart by defining the ggplot function
Step 12: Use the geom_point() geom to plot the players on the plot
Step 13: Add the labels for each quadrant
Step 14: Draw vertical and horizontal lines through the x and y axis
Step 15: Add quadrant labels to the chart
Step 16: Add labels for the x and y axis
Step 17: Add the dynamic titles and caption to the custom R visual
Step 18: Add a theme
Step 19: Perform last minute cleanup
Step 20: Add code to Power BI
Adding regression line
Step 1: Acquire the necessary data
Step 2: Load the data into Power BI
Step 3: Configure the R visual
Step 4: Export data to R Studio for development
Step 5: Load the required packages
Step 6: Create the variables needed for the data validation test
Step 7: Create the data validation test
Step 8: Start the chart by initializing the ggplot function
Step 9: Add a scatter plot layer to the R visual
Step 10: Add regression line layer to the R visuals
Step 11: Add a title to the chart
Step 12: Change the chart’s theme
Step 13: Perform some last minute cleanup
Step 14: Add code to Power BI
Part II: Ingesting Data into the Power BI Data Model Using R and Python
Chapter 3: Reading CSV Files
Dynamically combining files
Example scenario
Picking the rolling 24 months using R
Step 1: Load the required R packages for the script
Step 2: Change your working directory to the folder that contains the Sales data sets
Step 3: Read the file names into a character vector
Step 4: Create a date vector
Step 5: Create a data frame using the two vectors
Step 6: Get the upper and lower bound of our desired date range
Step 7: Subset the data frame to only include the desired months
Step 8: Create a data frame that is based on the union of all the files
Step 9: Combine the code into one script and paste into the R editor for Power BI
Picking the rolling 24 months using Python
Step 1: Create a Python script and load the necessary libraries
Step 2: Change your working directory to the Python_Code folder
Step 3: Read the file names into a Python list
Step 4: Create a pandas data frame that will hold the information of the files to combine
Step 5: Create a new column that strips the date information from the monthly_reports column
Step 6: Get the upper and lower bound of the date range
Step 7: Subset the data frame
Step 8: Combine files into one data frame
Step 9: Add the code to Power BI
Filtering rows based on a regular expression
Leveraging regular expressions via R
Step 1: Load the required packages
Step 2: Load the file that contains the potential voters into R
Step 3: Define the regular expression
Step 4: Remove the bad email addresses from the data set
Step 5: Combine the preceding code into one script and paste in the R editor for Power BI
Leveraging regular expressions via Python
Step 1: Load the required libraries
Step 2: Load the file that contains the potential voters into Python and assign the contents to a pandas data frame
Step 3: Define the regular expression that matches the pattern of a well-formed email address
Step 4: Remove the bad email addresses from the data set
Step 5: Combine the preceding code into one script and paste in the Python editor for Power BI
Chapter 4: Reading Excel Files
Reading Excel files using R
Step 1: Import the tidyverse and readxl package
Step 2: Create the shell of the combine_sheets function
Step 3: Get the name of the sheets you need to combine in your function from the specified Excel workbook
Step 4: Convert the character vector built in Step 3 to a named character vector
Step 5: Use the mapr_df function to combine the sheets into a single data frame
Step 6: Return the data frame
Step 7: Set the working directory to the location where the Excel files are located
Step 8: Assign the file names to the excel_file_paths variable
Step 9: Use the map_dfr function to apply the combine_sheets function to each file in the working directory
Step 10: Copy the R script and paste it into the R editor via GetData in Power BI
Reading Excel files using Python
Step 1: Import the os and pandas library
Step 2: Create the shell of the combine_sheets function
Step 3: Create an Excel object based on the workbook located at the path specified in the excel_file_path variable
Step 4: Create a list of the sheet names in the workbook specified by the excel_file_path variable
Step 5: Use the read_excel method of pandas to read the data from each sheet into one data frame
Step 6: Return the data frame held in df as the output from the combine_sheets function
Step 7: Set the working directory to the location that contains the Excel data you want to combine
Step 8: Get the list of the file names in the current working directory and assign it to the excel_file_paths variable
Step 9: Create an empty data frame and name it combined_workbooks
Step 10: Create the shell of the for loop
Step 11: Combine all the data in each sheet into one data frame using the combine_sheets function
Step 12: Append the combined_workbook data frame to the combined_workbooks data frame
Step 13: Copy the Python script and paste it into the Python editor via GetData in Power BI
Chapter 5: Reading SQL Server Data
Adding AdventureWorksDW_StarSchema database to your SQL Server instance
Reading SQL Server data into Power BI using R
Step 1: Create a DSN to the SQL Server database
Step 2: Create a log table in SQL Server
Step 3: Start developing the R script to load DimDate
Step 4: Create a variable to hold the name of the table you want to import
Step 5: Create a variable to hold the sql statement that will be used to return the table
Step 6: Create a connection to SQL Server
Step 7: Retrieve the data from SQL Server and store it in a data frame
Step 8: Get the current time
Step 9: Get the number of records that were read
Step 10: Create a one record R data frame that contains the information you want to log
Step 11: Insert the information you gathered in Step 10 into the history log table
Step 12: Close your connection
Step 13: Copy the script into Power BI
Step 14: Create a script to load DimProduct based on the ReadLog_DimDate.R script
Step 15: Create a script to load DimPromotion based on the ReadLog_DimDate.R script
Step 16: Create a script to load DimSalesTerritory based on the ReadLog_DimDate.R script
Step 17: Create a script to load FactInternetSales based on the ReadLog_DimDate.R script
Reading SQL Server data using Python
Step 1: Create a DSN to the SQL Server database
Step 2: Create a log table in SQL Server
Step 3: Begin creating the script to load the DimDate table
Step 4: Create a variable that holds the name of the table that you want to read into Power BI
Step 5: Create a connection to the database using sqlalchemy
Step 6: Read in the contents of the DimDate table and store it as a data frame in the df_read variable
Step 7: Get the current date and time and store the information into the datastamp variable
Step 8: Calculate the number of records in the DimDate table
Step 9: Create a one record pandas data frame that contains the information you want to log
Step 10: Insert the information you gathered in Step 9 into the history log table
Step 11: Copy the script into Power BI
Step 12: Create a script to load DimProduct based on the ReadLog_DimDate.py script
Step 13: Create a script to load DimPromotion based on the ReadLog_DimDate.py script
Step 14: Create a script to load DimSalesTerritory based on the ReadLog_DimDate.py script
Step 15: Create a script to load FactInternetSales based on the ReadLog_DimDate.py script
Chapter 6: Reading Data into the Power BI Data Model via an API
Reading Census data into Power BI via an API using R
Step 1: Get a personal Census API key
Step 2: Load the necessary R packages
Step 3: Identify the variables you want to return from your data set
Step 4: Create a character vector of the tables that contains the variables you want to return
Step 5: Configure the get_acs function
Step 6: Give the variables (columns) meaningful names
Step 7: Copy the script into Power BI
Reading Census data into Power BI via an API using Python
Step 1: Get a personal Census API key
Step 2: Load the necessary Python libraries
Step 3: Identify the variables you want to return in your data set
Step 4: Create a variable that is based on the list variables you want
Step 5: Create a list of tuples that contains the geographies you want to in your data set
Step 6: Retrieve the data using the censusdata.download() function
Step 7: Reset the index of the data frame created in Step 6
Step 8: Define the new column names
Step 9: Rename columns
Step 10: Copy the script into Power BI
Summary
Part III: Transforming Data Using R and Python
Chapter 7: Advanced String Manipulation and Pattern Matching
Masking sensitive data
Masking sensitive data in Power BI using R
Step 1: Import tidyverse and stringr
Step 2: Create the scrub data function
Step 3: Read the comments into a data frame
Step 4: Mask the phone numbers and ssn numbers in the comment field
Step 5: Copy the script into Power BI
Masking sensitive data in Power BI using Python
Step 1: Import pandas, os, and re library
Step 2: Create the mask_text function
Step 3: Set the working directory
Step 4: Read the comments into a data frame
Step 5: Mask the phone numbers and SSNs
Step 6: Copy the script into Power BI
Counting the number of words and sentences in reviews
Counting the number of words and sentences in reviews in Power BI using R
Step 1: Import tidyverse and stringr
Step 2: Change working directory to location of the file
Step 3: Read in the Yelp data
Step 4: Subset the columns
Step 5: Add word count and sentence count columns
Step 6: Copy the script into Power BI
Counting the number of words in reviews in Power BI using Python
Step 1: Import pandas and os
Step 2: Change working directory to the location of the file
Step 3: Read the Yelp data into Python
Step 4: Create the word_count column
Step 5: Copy the script into Power BI
Removing names that are in an invalid format
Removing names that are in an invalid format in Power BI using R
Step 1: Import tidyverse and stringr
Step 2: Change working directory to location where the DimEmployee.csv file is
Step 3: Create a regular expression to match a valid name
Step 4: Read the data into a data frame
Step 5: Do an inline update of the Name column
Step 6: Copy the script into Power BI
Removing names that are in an invalid format in Power BI using Python
Step 1: Import pandas, re, and os library
Step 2: Change working directory to the location where the DimEmployee csv file is
Step 3: Read in the DImEmployee data into an R data frame
Step 4: Create a regular expression that matches a valid name
Step 5: Compile the regular expression
Step 6: Define a function that executes the name test
Step 7: Apply the function to the column to scrub the name
Step 8: Copy the script into Power BI
Identifying patterns in strings based on conditional logic
Identifying patterns in strings based on conditional logic in Power BI using R
Step 1: Import the tidyverse and stringr packages
Step 2: Change the working directory
Step 3: Create a function that identifies the products
Step 4: Read in the data from the ProductionOrders.csv file into an R data frame
Step 5: Add a column to the df named “Monitored Products”
Step 6: Copy the script into Power BI
Identifying patterns in strings based on conditional logic in Power BI using Python
Step 1: Import pandas, re, and os library
Step 2: Change working directory
Step 3: Compile the required regular expression
Step 4: Define a function that returns the monitored products
Step 5: Read in the data into Pandas data frame
Step 6: Create a new column named “Monitored Products”
Step 7: Copy the script into Power BI
Summary
Chapter 8: Calculated Columns Using R and Python
Create a Google Geocoding API key
Step 1: Log into the Google console
Step 2: Set up a billing account
Step 3: Add a new project
Step 4: Enable Geocoding API
Geocode the addresses using R
Geocode the addresses using Python
Calculate the distance with a custom function using R
Calculate the distance with a custom function using Python
Calculate distance with a pre-built function in Power BI using R
Calculate distance with a pre-built function in Power BI using Python
Summary
Part IV: Machine Learning and AI in Power BI Using R and Python
Chapter 9: Applying Machine Learning and AI to Your Power BI Data Models
Apply machine learning to a data set before bringing it into the Power BI data model
Predicting home values using R
Step 1: Have the data scientist share the model with you
Step 2: Load the tidyverse package
Step 3: Load the model object and the data set to be scored
Step 4: Subset the data frame so that it only contains the columns needed for the model
Step 5: Apply the model to your data set to predict the median home values
Step 6: Add the predictions to the original data set
Step 7: Copy the entire R script into the Power BI data model
Predicting home values using Python
Step 1: Have the data scientist share the model with you
Step 2: Load the necessary Python libraries needed for the script
Step 3: Load the model object and the Boston homes data set
Step 4: Extract the information needed from the bost_housing data frame
Step 5: Apply the model to the preprocessed data to predict the median home values
Step 6: Add the predictions to the original data set
Step 7: Copy the entire Python script into the Python script editor in Power BI
Using pre-built AI models to enhance Power BI data models
Set up Cognitive Services in Azure
The Data Science Virtual Machine (DSVM)
Performing sentiment analysis in Microsoft Cognitive Services via Python
Step 1: Get the Yelp review data from Kaggle
Step 2: Import the necessary libraries, modules, and function for the script
Step 3: Assign values to the variables used in the script
Step 4: Read in the sample of the Yelp review data into Python
Step 5: Transform the data frame to the format that is required by Microsoft Cognitive Services
Step 6: Score the reviews using the sentiment method of the text_analytics object
Step 7: Create a data frame to hold the sentiment data
Step 8: Add the data to Power BI
Applying AI to your Power BI data model using services other than Microsoft Cognitive Services
Configuring the Tone Analyzer service in IBM Watson
Step 1: Sign up for IBM Cloud account
Step 2: Log into the IBM Cloud
Step 3: Go to the Tone Analyzer page
Step 4: Define your Tone Analyzer service
Step 5: Get your API key
Writing the Python script to perform the tone analysis
Step 1: Import the required libraries and modules
Step 2: Create an instance of the IAMAuthenticator class
Step 3: Create an instance of the ToneAnalyzerV3 class
Step 4: Set the service URL of the service object
Step 5: Create a data frame that will hold the data you want to use for the tone analysis
Step 6: Create a data frame based on the scored data
Step 7: Create a looping structure to individually send the documents to IBM Watson
Step 8: Format and score the document
Step 9: Assign the results of the tone analysis and set the initial values of the tone variables
Step 10: Loop through the returned tones and assign their scores to the appropriate variable
Step 11: Create a data frame based on the listReturnedUtterance list
Step 12: Merge the dfReturnedUtterance data frame with the dfDocuments data frame
Step 13: Copy the complete script into Power BI
Chapter 10: Productionizing Data Science Models and Data Wrangling Scripts
Predicting home values in Power BI using R in SQL Server Machine Learning Services
Build the R script that adds the model to SQL Server
Step 1: Load the necessary packages
Step 2: Load the model into the R session
Step 3: Create a connection to the database
Step 4: Define the model variables
Step 5: Build the T-SQL statement to add the model to the database
Step 6: Add the code needed to execute the T-SQL statement from R
Step 7: Save the script
Use SQL Server Machine Learning Services with R to score the data
Step 1: Launch SQL Server Management Studio
Step 2: Create a connection to the server you want to use
Step 3: Add the BostonHousingInfo database to your server
Step 4: Add the model to the database
Step 5: Add the stored procedure to the database that will do the scoring
Step 6: Fetch the scored data in Power BI from SQL Server
Predicting home values in Power BI using Python in SQL Server Machine Learning Services
Create the script needed to add Python model to SQL Server
Step 1: Get the version of libraries used in this exercise
Step 2: Create a conda environment
Step 3: Create the code that pushes the model to SQL Server
Use SQL Server Machine Learning Services with Python from Power BI to score the data
Step 1: Launch SQL Server Management Studio
Step 2: Create a connection to the server you want to use
Step 3: Add the BostonHousingInfo database to your server
Step 4: Add the model to the database
Step 5: Add the stored procedure to the database that will do the scoring
Step 6: Fetch the scored data in Power BI from SQL Server
Performing sentiment analysis in Power BI using R in SQL Server Machine Learning Services
Add pre-built R models to SQL Server Machine Learning Services using PowerShell
Step 1: Check to see if the pre-trained models are installed
Step 2: Open PowerShell as administrator
Step 3: Download PowerShell script
Step 4: Run the downloaded script in PowerShell
Troubleshooting
Use pre-built R sentiment model in SQL Server Machine Learning Services to score data in Power BI
Step 1: Begin defining the stored procedures
Step 2: Define variables
Step 3: Set @Query variable
Step 4: Set @RScript variable
Step 5: Configure sp_execute_external_script
Step 6: Define the output
Step 7: Add the procedure to the database
Step 8: Call the procedure from Power BI
Performing sentiment analysis in Power BI using Python in SQL Server Machine Learning Services
Add pre-built Python models to SQL Server Machine Learning Services
Step 1: Check to see if the pre-trained models are installed
Step 2: Open PowerShell as administrator
Step 3: Download PowerShell script
Step 4: Run the downloaded script in PowerShell
Troubleshooting
Use pre-built Python sentiment model in SQL Server Machine Learning Services to score data in Power BI
Step 1: Begin defining the stored procedures
Step 2: Define variables
Step 3: Set @Query
Step 4: Set @PythonScript
Step 5: Configure sp_execute_external_script
Step 6: Define the output
Step 7: Add the procedure to the database
Step 8: Call the procedure from Power BI
Calculating distance in Power BI using R in SQL Server Machine Learning Services
Step 1: Make sure dplyr is loaded in SSMLS
Step 2: Launch SSMS and connect to a SQL Server
Step 3: Add the CalculateDistance database to the server
Step 4: Add the stored procedure that will calculate the distances
Step 5: Call the Power BI procedure from Power BI
Calculating distance in Power BI using Python in SQL Server Machine Learning Services
Step 1: Launch SSMS and connect to a SQL Server
Step 2: Add the CalculateDistance database to the server
Step 3: Add the stored procedure that will calculate the distances
Step 4: Call the Power BI procedure from Power BI
Index