Data Analysis With Python: A Modern Approach

Data Analysis with Python offers a modern approach to data analysis so that you can work with the latest and most powerful Python tools, AI techniques, and open source libraries. Industry expert David Taieb shows you how to bridge data science with the power of programming and algorithms in Python. You'll work with complex algorithms and cutting-edge AI in your data analysis, learning through hands-on examples that use Python-based tools and Jupyter Notebook. You'll find the right balance of theory and practice, with extensive code files that you can integrate right into your own data projects.

The book then applies this approach across key industry case studies. Four full projects connect you to the most critical data analysis challenges you're likely to meet today. The first is an image recognition application built with TensorFlow, reflecting the importance of AI in data analysis today. The second analyses social media trends, exploring big data issues and AI approaches to natural language processing. The third is a financial portfolio analysis application that engages you with time series analysis, which is pivotal to many data science applications. The fourth dives into graph algorithms and the power of programming in modern data science. You'll wrap up with a thoughtful look at the future of data science and how it will harness the power of algorithms and artificial intelligence.

Author(s): David Taieb
Publisher: Packt Publishing
Year: 2018

Language: English
Commentary: TruePDF
Pages: 491
Tags: Python: Data Analysis

Cover......Page 1
Copyright......Page 3
Mapt Upsell......Page 5
Contributors......Page 6
Table of Contents......Page 8
Preface......Page 12
What is data science......Page 24
Is data science here to stay?......Page 25
Why is data science on the rise?......Page 26
What does that have to do with developers?......Page 27
Putting these concepts into practice......Page 29
Deep diving into a concrete example......Page 30
Data pipeline blueprint......Page 31
What kind of skills are required to become a data scientist?......Page 33
IBM Watson DeepQA......Page 35
Back to our sentiment analysis of Twitter hashtags project......Page 38
Lessons learned from building our first enterprise-ready data pipeline......Page 42
Data science strategy......Page 43
Jupyter Notebooks at the center of our strategy......Page 45
Why are Notebooks so popular?......Page 46
Summary......Page 48
Chapter 2 - Data Science at Scale with Jupyter Notebooks and PixieDust......Page 50
Why choose Python?......Page 51
Introducing PixieDust......Page 55
SampleData – a simple API for loading data......Page 59
Wrangling data with pixiedust_rosie......Page 65
Display – a simple interactive API for data visualization......Page 72
Filtering......Page 83
Bridging the gap between developers and data scientists with PixieApps......Page 86
Architecture for operationalizing data science analytics......Page 90
Summary......Page 95
Chapter 3 - PixieApp under the Hood......Page 96
Anatomy of a PixieApp......Page 97
Routes......Page 99
Generating requests to routes......Page 102
A GitHub project tracking sample application......Page 103
Displaying the search results in a table......Page 107
Invoking the PixieDust display() API using pd_entity attribute......Page 115
Invoking arbitrary Python code with pd_script......Page 123
Making the application more responsive with pd_refresh......Page 128
Creating reusable widgets......Page 130
Summary......Page 131
Chapter 4 - Deploying PixieApps to the Web with the PixieGateway Server......Page 132
Overview of Kubernetes......Page 133
Installing and configuring the PixieGateway server......Page 135
PixieGateway server configuration......Page 139
PixieGateway architecture......Page 143
Publishing an application......Page 147
Encoding state in the PixieApp URL......Page 151
Sharing charts by publishing them as web pages......Page 152
PixieGateway admin console......Page 157
Python Console......Page 160
Displaying warmup and run code for a PixieApp......Page 161
Summary......Page 162
Chapter 5 - Best Practices and Advanced PixieDust Concepts......Page 164
Create a word cloud image with @captureOutput......Page 165
Increase modularity and code reuse......Page 168
Creating a widget with pd_widget......Page 171
PixieDust support of streaming data......Page 173
Adding streaming capabilities to your PixieApp......Page 176
Adding dashboard drill-downs with PixieApp events......Page 179
Extending PixieDust visualizations......Page 184
Debugging on the Jupyter Notebook using pdb......Page 192
Visual debugging with PixieDebugger......Page 196
Debugging PixieApp routes with PixieDebugger......Page 199
Troubleshooting issues using PixieDust logging......Page 201
Client-side debugging......Page 204
Run Node.js inside a Python Notebook......Page 206
Summary......Page 211
Chapter 6 - Image Recognition with TensorFlow......Page 212
What is machine learning?......Page 213
What is deep learning?......Page 215
Getting started with TensorFlow......Page 218
Simple classification with DNNClassifier......Page 222
Image recognition sample application......Page 234
Part 1 – Load the pretrained MobileNet model......Page 235
Part 2 – Create a PixieApp for our image recognition sample application......Page 243
Part 3 – Integrate the TensorBoard graph visualization......Page 247
Part 4 – Retrain the model with custom training data......Page 253
Summary......Page 265
Chapter 7 - Big Data Twitter Sentiment Analysis......Page 266
Apache Spark architecture......Page 267
Configuring Notebooks to work with Spark......Page 269
Twitter sentiment analysis application......Page 271
Architecture diagram for the data pipeline......Page 272
Authentication with Twitter......Page 273
Creating the Twitter stream......Page 274
Creating a Spark Streaming DataFrame......Page 278
Creating and running a structured query......Page 281
Monitoring active streaming queries......Page 283
Creating a batch DataFrame from the Parquet files......Page 285
Getting started with the IBM Watson Natural Language Understanding service......Page 288
Part 3 – Creating a real-time dashboard PixieApp......Page 296
Refactoring the analytics into their own methods......Page 297
Creating the PixieApp......Page 299
Part 4 – Adding scalability with Apache Kafka and IBM Streams Designer......Page 309
Streaming the raw tweets to Kafka......Page 311
Enriching the tweets data with the Streaming Analytics service......Page 314
Creating a Spark Streaming DataFrame with a Kafka input source......Page 321
Summary......Page 325
Chapter 8 - Financial Time Series Analysis and Forecasting......Page 326
Getting started with NumPy......Page 327
Creating a NumPy array......Page 330
Operations on ndarray......Page 333
Selections on NumPy arrays......Page 335
Broadcasting......Page 336
Statistical exploration of time series......Page 338
Hypothetical investment......Page 346
Autocorrelation function (ACF) and partial autocorrelation function (PACF)......Page 347
Putting it all together with the StockExplorer PixieApp......Page 351
BaseSubApp – base class for all the child PixieApps......Page 356
StockExploreSubApp – first child PixieApp......Page 358
MovingAverageSubApp – second child PixieApp......Page 360
AutoCorrelationSubApp – third child PixieApp......Page 364
Time series forecasting using the ARIMA model......Page 366
Build an ARIMA model for the MSFT stock time series......Page 369
StockExplorer PixieApp Part 2 – add time series forecasting using the ARIMA model......Page 378
Summary......Page 394
Chapter 9 - US Domestic Flight Data Analysis Using Graphs......Page 396
Introduction to graphs......Page 397
Graph representations......Page 398
Graph algorithms......Page 400
Graph and big data......Page 403
Getting started with the networkx graph library......Page 404
Creating a graph......Page 405
Visualizing a graph......Page 407
Part 1 – Loading the US domestic flight data into a graph......Page 408
Graph centrality......Page 417
Part 2 – Creating the USFlightsAnalysis PixieApp......Page 427
Part 3 – Adding data exploration to the USFlightsAnalysis PixieApp......Page 438
Part 4 – Creating an ARIMA model for predicting flight delays......Page 448
Summary......Page 463
Chapter 10 - Final Thoughts......Page 464
Forward thinking – what to expect for AI and data science......Page 465
References......Page 468
Annotations......Page 470
Custom HTML attributes......Page 473
Methods......Page 478
Other Books You May Enjoy......Page 480
Leave a review – let other readers know what you think......Page 482
Index......Page 484