SQL Server Analytical Toolkit: Using Windowing, Analytical, Ranking, and Aggregate Functions for Data and Statistical Analysis

This document was uploaded by one of our users. The uploader already confirmed that they had the permission to publish it. If you are author/publisher or own the copyright of this documents, please report to us by using this DMCA report form.

Simply click on the Download Book button.

Yes, Book downloads on Ebookily are 100% Free.

Sometimes the book is free on Amazon As well, so go ahead and hit "Search on Amazon"

Learn window function foundational concepts through a cookbook-style approach, beginning with an introduction to the OVER() clause, its various configurations in terms of how partitions and window frames are created, and how data is sorted in the partition so that the window function can operate on the partition data sets. You will build a toolkit based not only on the window functions but also on the performance tuning tools, use of Microsoft Excel to graph results, and future tools you can learn such as PowerBI, SSIS, and SSAS to enhance your data architecture skills. This book goes beyond just showing how each function works. It presents four unique use-case scenarios (sales, financial, engineering, and inventory control) related to statistical analysis, data analysis, and BI. Each section is covered in three chapters, one chapter for each of the window aggregate, ranking, and analytical function categories. Each chapter includes several TSQL code examples and is re-enforced with graphic output plus Microsoft Excel graphs created from the query output. SQL Server estimated query plans are generated and described so you can see how SQL Server processes the query. These together with IO, TIME, and PROFILE statistics are used to performance tune the query. You will know how to use indexes and when not to use indexes. You will learn how to use techniques such as creating report tables, memory enhanced tables, and creating clustered indexes to enhance performance. And you will wrap up your learning with suggested steps related to business intelligence and its relevance to other Microsoft Tools such as Power BI and Analysis Services. All code examples, including code to create and load each of the databases, are available online. What You Will Learn Use SQL Server window functions in the context of statistical and data analysis Re-purpose code so it can be modified for your unique applications Study use-case scenarios that span four critical industries Get started with statistical data analysis and data mining using TSQL queries to dive deep into data Study discussions on statistics, how to use SSMS, SSAS, performance tuning, and TSQL queries using the OVER() clause. Follow prescriptive guidance on good coding standards to improve code legibility Who This Book Is For Intermediate to advanced SQL Server developers and data architects. Technical and savvy business analysts who need to apply sophisticated data analysis for their business users and clients will also benefit. This book offers critical tools and analysis techniques they can apply to their daily job in the disciplines of data mining, data engineering, and business intelligence.

Author(s): Angelo Bobak
Publisher: Apress
Year: 2023

Language: English
Pages: 1069

Table of Contents
About the Author
About the Technical Reviewer
Introduction
Chapter 1: Partitions, Frames, and the OVER() Clause
What Are Partitions and Window Frames?
What Is an OVER() Clause?
History of the OVER() Clause and Window Functions
The Window Functions
The OVER() Clause
Syntax
Partitions and Frames
ROWS Frame Definition
RANGE Frame Definition
Example 1
ROWS and RANGE Default Behavior
Scenario 1
Scenario 2
ROWS and RANGE Window Frame Examples
Data Set
Example 2
Example 3
Example 4
Example 5
Summary
Chapter 2: Sales DW Use Case: Aggregate Functions
Sales Data Warehouse
Sales Data Warehouse Conceptual Model
A Word About Performance Tuning
Aggregate Functions
COUNT(), MAX(), MIN(), AVG(), and SUM() Functions
With OVER()
GROUPING() Function
GROUPING: Performance Tuning Considerations
STRING_AGG Function
STDEV() and STDEVP() Functions
STDEV: Performance Tuning Considerations
VAR() and VARP() Functions
SQL Server 2022: Named Window Example
Summary
Chapter 3: Sales Use Case: Analytical Functions
Analytical Functions
CUME_DIST() Function
Performance Considerations
PERCENT_RANK() Function
Performance Considerations
High-Performance Strategy
LAST_VALUE() and FIRST_VALUE()
Performance Considerations
LAG() and LEAD()
Performance Considerations
PERCENTILE_CONT() and PERCENTILE_DISC()
Performance Considerations
Using a Report Table
Summary
Chapter 4: Sales Use Case: Ranking/Window Functions
Ranking/Window Functions
RANK() vs. PERCENT_RANK()
Performance Considerations
RANK() vs. DENSE_RANK()
Performance Considerations
NTILE() Function Revisited
Performance Considerations
ROW_NUMBER() Function
Performance Considerations
Islands and Gaps Example
Summary
Chapter 5: Finance Use Case: Aggregate Functions
Aggregate Functions
COUNT() and SUM() Functions
Performance Considerations
SUM() Function
Performance Considerations
MIN() and MAX() Functions
Performance Considerations
AVG() Function
Performance Considerations
GROUPING Function
Performance Considerations
STRING_AGG() Function
STDEV() and STDEVP() Functions
Performance Considerations
VAR() and VARP() Functions
Ticker Analysis
More Non-statistical Variance
Even More Statistical Variance
Summary
Chapter 6: Finance Use Case: Ranking Functions
Ranking Functions
RANK() Function
Example 1
Performance Considerations
Example 2
Performance Considerations
DENSE_RANK() Function
Example 1
Performance Considerations
Example 2
Performance Considerations
NTILE() Function
Example 1
Performance Considerations
Example 2
Performance Considerations
ROW_NUMBER() Function
Performance Considerations
The Data Gaps and Islands Problem
Step 1: Create the First CTE
Step 2: Set Up the Second CTE to Label Gaps
Step 3: Set Up the Third CTE and Identify Start/Stop Dates of Gaps
Step 4: Generate the Report
Performance Considerations
Islands Next
Step 1: Create the First CTE Using LAG() and LEAD()
Step 2: Create the Second CTE That Labels Islands and Gaps
Step 3: Identify Island Start/Stop Dates
Step 4: Create the Final Report
Summary
Chapter 7: Finance Use Case: Analytical Functions
Analytical Functions
CUME_DIST() Function
Performance Considerations
FIRST_VALUE() and LAST_VALUE() Functions
Performance Considerations
LAG() and LEAD() Functions
LAG() Function
Performance Considerations
LEAD() Function
Performance Considerations
Memory-Optimized Strategy
PERCENT_RANK() Function
Performance Considerations
PERCENTILE_CONT() and PERCENTILE_DISC()
PERCENTILE_CONT()
Performance Considerations
PERCENTILE_DISC Function
Performance Considerations
Multi-memory-enhanced-table Strategy
Performance Considerations
Summary
Chapter 8: Plant Use Case: Aggregate Functions
Aggregate Functions
Data Model
Data Dictionaries
Entity Data Dictionary
Entity Attribute Data Dictionary
Entity Relationship Data Dictionary: Equipment Failure Subject Area
Entity Relationship Data Dictionary: Equipment Status History
Entity Relationship Data Dictionary: Plant Expense
COUNT() Function
AVG() Function
MIN() and MAX() Functions
GROUPING() Function
STRING_AGG() Function
STDEV() and STDEVP() Functions
VAR() and VARP() Functions
Example 1: Rolling Variance
Example 2: Variance by Quarter
Example 3: Variance by Year
Performance Considerations
Memory-Optimized Table Approach
Create a File and File Group
Create the Memory-Optimized Table
Load the Memory-Optimized Table
Estimated Query Plan
Seven-Million-Row Query: Performance Tuning
Summary
Chapter 9: Plant Use Case: Ranking Functions
Ranking Functions
RANK() Function
Performance Considerations
Performance Considerations
Performance Considerations
DENSE_RANK() Function
Performance Considerations
NTILE Function
Performance Considerations
ROW_NUMBER() Function
Performance Considerations
Summary
Chapter 10: Plant Use Case: Analytical Functions
Analytical Functions
CUME_DIST() Function
Performance Considerations
FIRST_VALUE() and LAST_VALUE() Functions
Performance Considerations
LAG() Function
Performance Considerations
LEAD() Function
Performance Considerations
PERCENT_RANK() Function
Performance Considerations
PERCENTILE_CONT Function
Performance Considerations
PERCENTILE_DISC() Function
Performance Considerations
Our Usual Report Table Solution
SQL Server Analysis Services
Summary
Chapter 11: Inventory Use Case: Aggregate Functions
The Inventory Database
The Inventory Data Warehouse
Loading the Inventory Data Warehouse
Aggregate Functions
COUNT(), SUM(), MAX(), MIN(), and AVG() Functions
Performance Considerations
AVG() Function
Performance Considerations
Data Warehouse Query
Performance Considerations
STDEV() Function
Performance Considerations
Data Warehouse Query
Performance Considerations
VAR() Function
Performance Considerations
Enhancing the SSIS Package
Summary
Chapter 12: Inventory Use Case: Ranking Functions
Ranking Functions
RANK() Function
Performance Considerations
Querying the Data Warehouse
DENSE_RANK() Function
Performance Considerations
NTILE() Function
Performance Considerations
ROW_NUMBER() Function
Performance Considerations
Create an SSRS Report
Report Builder Mini Tutorial
Create a Power BI Report
Summary
Chapter 13: Inventory Use Case: Analytical Functions
Analytical Functions
CUME_DIST() Function
Performance Considerations
FIRST_VALUE() and LAST_VALUE() Functions
Performance Considerations
LAG() Function
Performance Considerations
LEAD() Function
Performance Considerations
PERCENT_RANK() Function
Performance Considerations
PERCENTILE_CONT() Function
Performance Considerations
PERCENTILE_DISC() Function
Performance Considerations
Overall Performance Considerations
Report Builder Examples
Summary
Chapter 14: Summary, Conclusions, and Next Steps
Summary
Our Journey
About the Code
About the Database Folders
Data Used in the Examples
The Toolkit
SQL Server
SSMS
The Window Functions
The Visual Studio Community License
SSAS Projects
SSIS Projects
Power BI Web Scorecards, Dashboards, and Reports
Microsoft Excel Spreadsheets
SSAS Server
SSRS Server and Website
Report Builder
Performance Analysis Tools
Estimated Query Plans
Live Query Plans
DBCC
IO and TIME Statistics
STATISTICS PROFILE
Where to Get the Tools
SQL Server Developer
Visual Studio Community
SQL Server Data Tools
SQL Server SSAS Project Support
SQL Server SSIS Project Support
SQL Server SSRS Project Support
Report Builder
Power BI Desktop
Power BI Server
Microsoft Excel
SSMS
Next Steps
Thank You!
Appendix A: Function Syntax, Descriptions
The Window Frame Specifications
ROWS and RANGE Default Behavior
Scenario 1
Scenario 2
The Aggregate Functions
COUNT()
COUNT_BIG()
SUM()
MAX()
MIN()
AVG()
GROUPING()
STRING_AGG()
STDEV()
STDEVP()
VAR()
VARP()
The Analytical Functions
CUME_DIST()
FIRST_VALUE()
LAST_VALUE()
LAG()
LEAD()
PERCENT_RANK()
PERCENTILE_CONT()
PERCENTILE_DISC()
The Window/Ranking Functions
RANK()
DENSE_RANK()
NTILE()
ROW_NUMBER()
Appendix B: Statistical Functions
Standard Deviation
Variance
Normal Distribution
Mean (Average)
Median
Mode
Geometric Mean
Harmonic Mean
Weighted Mean (Average)
Summary
Index
df-Capture.PNG