The fast and easy way to learn Python programming and statistics
Python is a general-purpose programming language created in the late 1980s and named after Monty Python. Thousands of people use it for tasks ranging from testing microchips at Intel, to powering Instagram, to building video games with the PyGame library.
Python For Data Science For Dummies is written for people who are new to data analysis. It covers the basics of data analysis programming in Python and the statistics behind it, and it introduces Google Colab, which makes it possible to write Python code in the cloud.
Get started with data science and Python
Visualize information
Wrangle data
Learn from data
The book provides the statistical background needed to get started in data science programming, including probability, random distributions, hypothesis testing, confidence intervals, and building regression models for prediction.
Author(s): John Paul Mueller; Luca Massaron
Edition: 2
Publisher: For Dummies
Year: 2019
Language: English
Pages: 496
Cover
Introduction
About This Book
Foolish Assumptions
Icons Used in This Book
Beyond the Book
Where to Go from Here
Part 1: Getting Started with Data Science and Python
Chapter 1: Discovering the Match between Data Science and Python
Defining the Sexiest Job of the 21st Century
Creating the Data Science Pipeline
Understanding Python’s Role in Data Science
Learning to Use Python Fast
Chapter 2: Introducing Python’s Capabilities and Wonders
Why Python?
Working with Python
Performing Rapid Prototyping and Experimentation
Considering Speed of Execution
Visualizing Power
Using the Python Ecosystem for Data Science
Chapter 3: Setting Up Python for Data Science
Considering the Off-the-Shelf Cross-Platform Scientific Distributions
Installing Anaconda on Windows
Installing Anaconda on Linux
Installing Anaconda on Mac OS X
Downloading the Datasets and Example Code
Chapter 4: Working with Google Colab
Defining Google Colab
Getting a Google Account
Working with Notebooks
Performing Common Tasks
Using Hardware Acceleration
Executing the Code
Viewing Your Notebook
Sharing Your Notebook
Getting Help
Part 2: Getting Your Hands Dirty with Data
Chapter 5: Understanding the Tools
Using the Jupyter Console
Using Jupyter Notebook
Performing Multimedia and Graphic Integration
Chapter 6: Working with Real Data
Uploading, Streaming, and Sampling Data
Accessing Data in Structured Flat-File Form
Sending Data in Unstructured File Form
Managing Data from Relational Databases
Interacting with Data from NoSQL Databases
Accessing Data from the Web
Chapter 7: Conditioning Your Data
Juggling between NumPy and pandas
Validating Your Data
Manipulating Categorical Variables
Dealing with Dates in Your Data
Dealing with Missing Data
Slicing and Dicing: Filtering and Selecting Data
Concatenating and Transforming
Aggregating Data at Any Level
Chapter 8: Shaping Data
Working with HTML Pages
Working with Raw Text
Using the Bag of Words Model and Beyond
Working with Graph Data
Chapter 9: Putting What You Know in Action
Contextualizing Problems and Data
Considering the Art of Feature Creation
Performing Operations on Arrays
Part 3: Visualizing Information
Chapter 10: Getting a Crash Course in MatPlotLib
Starting with a Graph
Setting the Axis, Ticks, Grids
Defining the Line Appearance
Using Labels, Annotations, and Legends
Chapter 11: Visualizing the Data
Choosing the Right Graph
Creating Advanced Scatterplots
Plotting Time Series
Plotting Geographical Data
Visualizing Graphs
Part 4: Wrangling Data
Chapter 12: Stretching Python’s Capabilities
Playing with Scikit-learn
Performing the Hashing Trick
Considering Timing and Performance
Running in Parallel on Multiple Cores
Chapter 13: Exploring Data Analysis
The EDA Approach
Defining Descriptive Statistics for Numeric Data
Counting for Categorical Data
Creating Applied Visualization for EDA
Understanding Correlation
Modifying Data Distributions
Chapter 14: Reducing Dimensionality
Understanding SVD
Performing Factor Analysis and PCA
Understanding Some Applications
Chapter 15: Clustering
Clustering with K-means
Performing Hierarchical Clustering
Discovering New Groups with DBScan
Chapter 16: Detecting Outliers in Data
Considering Outlier Detection
Examining a Simple Univariate Method
Developing a Multivariate Approach
Part 5: Learning from Data
Chapter 17: Exploring Four Simple and Effective Algorithms
Guessing the Number: Linear Regression
Moving to Logistic Regression
Making Things as Simple as Naïve Bayes
Learning Lazily with Nearest Neighbors
Chapter 18: Performing Cross-Validation, Selection, and Optimization
Pondering the Problem of Fitting a Model
Cross-Validating
Selecting Variables Like a Pro
Pumping Up Your Hyperparameters
Chapter 19: Increasing Complexity with Linear and Nonlinear Tricks
Using Nonlinear Transformations
Regularizing Linear Models
Fighting with Big Data Chunk by Chunk
Understanding Support Vector Machines
Playing with Neural Networks
Chapter 20: Understanding the Power of the Many
Starting with a Plain Decision Tree
Making Machine Learning Accessible
Boosting Predictions
Part 6: The Part of Tens
Chapter 21: Ten Essential Data Resources
Discovering the News with Subreddit
Getting a Good Start with KDnuggets
Locating Free Learning Resources with Quora
Gaining Insights with Oracle’s Data Science Blog
Accessing the Huge List of Resources on Data Science Central
Learning New Tricks from the Aspirational Data Scientist
Obtaining the Most Authoritative Sources at Udacity
Receiving Help with Advanced Topics at Conductrics
Obtaining the Facts of Open Source Data Science from Masters
Zeroing In on Developer Resources with Jonathan Bower
Chapter 22: Ten Data Challenges You Should Take
Meeting the Data Science London + Scikit-learn Challenge
Predicting Survival on the Titanic
Finding a Kaggle Competition that Suits Your Needs
Honing Your Overfit Strategies
Trudging Through the MovieLens Dataset
Getting Rid of Spam E-mails
Working with Handwritten Information
Working with Pictures
Analyzing Amazon.com Reviews
Interacting with a Huge Graph
Index
About the Authors
Advertisement Page
Connect with Dummies
End User License Agreement