Take tiny steps to enter the big world of data science through this interesting guide

About This Book
* Learn the fundamentals of machine learning and build your own intelligent applications
* Master the art of building your own machine learning systems with this example-based practical guide
* Work with important classification and regression algorithms and other machine learning techniques

Who This Book Is For
This book is for anyone interested in entering the data science stream with machine learning. Basic familiarity with Python is assumed.

What You Will Learn
* Exploit the power of Python to handle data extraction, manipulation, and exploration techniques
* Use Python to visualize data spread across multiple dimensions and extract useful features
* Dive deep into the world of analytics to predict situations correctly
* Implement machine learning classification and regression algorithms from scratch in Python
* Be amazed to see the algorithms in action
* Evaluate the performance of a machine learning model and optimize it
* Solve interesting real-world problems using machine learning and Python as the journey unfolds

In Detail
Data science and machine learning are some of the top buzzwords in the technical world today. The resurging interest in machine learning is due to the same factors that have made data mining and Bayesian analysis more popular than ever. This book is your entry point to machine learning.

This book starts with an introduction to machine learning and the Python language and shows you how to complete the setup. Moving ahead, you will learn all the important concepts, such as exploratory data analysis, data preprocessing, feature extraction, data visualization and clustering, classification, regression, and model performance evaluation. With the help of the various projects included, you will learn the mechanics of several important machine learning algorithms and find that they are no longer as obscure as you might have thought. You will also be guided step by step to build your own models from scratch. Toward the end, you will gather a broad picture of the machine learning ecosystem and the best practices for applying machine learning techniques.

Through this book, you will learn to tackle data-driven problems and implement your solutions with the powerful yet simple language, Python. Interesting and easy-to-follow examples, such as news topic classification, spam email detection, online ad click-through prediction, and stock price forecasting, will keep you glued until you reach your goal.

Style and approach
This book is an enticing journey that starts from the very basics and gradually picks up pace as the story unfolds. Each concept is first succinctly defined in the larger context of things, followed by a detailed explanation of its application. Every concept is explained with the help of a project that solves a real-world problem and involves hands-on work, giving you a deep insight into the world of machine learning. With Python, a simple yet rich language, you will understand and be able to implement the examples with ease.

Author(s): Yuxi (Hayden) Liu
Year: 2017
Language: English
Pages: 254
Cover
Copyright
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Getting Started with Python and Machine Learning
What is machine learning and why do we need it?
A very high level overview of machine learning
A brief history of the development of machine learning algorithms
Generalizing with data
Overfitting, underfitting and the bias-variance tradeoff
Avoid overfitting with cross-validation
Avoid overfitting with regularization
Avoid overfitting with feature selection and dimensionality reduction
Preprocessing, exploration, and feature engineering
Missing values
Label encoding
One-hot-encoding
Scaling
Polynomial features
Power transformations
Binning
Combining models
Bagging
Boosting
Stacking
Blending
Voting and averaging
Installing software and setting up
Troubleshooting and asking for help
Summary
Chapter 2: Exploring the 20 Newsgroups Dataset with Text Analysis Algorithms
What is NLP?
Touring powerful NLP libraries in Python
The newsgroups data
Getting the data
Thinking about features
Visualization
Data preprocessing
Clustering
Topic modeling
Summary
Chapter 3: Spam Email Detection with Naive Bayes
Getting started with classification
Types of classification
Applications of text classification
Exploring naive Bayes
Bayes' theorem by examples
The mechanics of naive Bayes
The naive Bayes implementations
Classifier performance evaluation
Model tuning and cross-validation
Summary
Chapter 4: News Topic Classification with Support Vector Machine
Recap and inverse document frequency
Support vector machine
The mechanics of SVM
Scenario 1 - identifying the separating hyperplane
Scenario 2 - determining the optimal hyperplane
Scenario 3 - handling outliers
The implementations of SVM
Scenario 4 - dealing with more than two classes
The kernels of SVM
Scenario 5 - solving linearly non-separable problems
Choosing between the linear and RBF kernel
News topic classification with support vector machine
More examples - fetal state classification on cardiotocography with SVM
Summary
Chapter 5: Click-Through Prediction with Tree-Based Algorithms
Brief overview of advertising click-through prediction
Getting started with two types of data, numerical and categorical
Decision tree classifier
The construction of a decision tree
The metrics to measure a split
The implementations of decision tree
Click-through prediction with decision tree
Random forest - feature bagging of decision tree
Summary
Chapter 6: Click-Through Prediction with Logistic Regression
One-hot encoding - converting categorical features to numerical
Logistic regression classifier
Getting started with the logistic function
The mechanics of logistic regression
Training a logistic regression model via gradient descent
Click-through prediction with logistic regression by gradient descent
Training a logistic regression model via stochastic gradient descent
Training a logistic regression model with regularization
Training on large-scale datasets with online learning
Handling multiclass classification
Feature selection via random forest
Summary
Chapter 7: Stock Price Prediction with Regression Algorithms
Brief overview of the stock market and stock price
What is regression?
Predicting stock price with regression algorithms
Feature engineering
Data acquisition and feature generation
Linear regression
Decision tree regression
Support vector regression
Regression performance evaluation
Stock price prediction with regression algorithms
Summary
Chapter 8: Best Practices
Machine learning workflow
Best practices in the data preparation stage
Best practice 1 - completely understand the project goal
Best practice 2 - collect all fields that are relevant
Best practice 3 - maintain consistency of field values
Best practice 4 - deal with missing data
Best practices in the training sets generation stage
Best practice 5 - determine categorical features with numerical values
Best practice 6 - decide on whether or not to encode categorical features
Best practice 7 - decide on whether or not to select features and if so, how
Best practice 8 - decide on whether or not to reduce dimensionality and if so how
Best practice 9 - decide on whether or not to scale features
Best practice 10 - perform feature engineering with domain expertise
Best practice 11 - perform feature engineering without domain expertise
Best practice 12 - document how each feature is generated
Best practices in the model training, evaluation, and selection stage
Best practice 13 - choose the right algorithm(s) to start with
Naive Bayes
Logistic regression
SVM
Random forest (or decision tree)
Neural networks
Best practice 14 - reduce overfitting
Best practice 15 - diagnose overfitting and underfitting
Best practices in the deployment and monitoring stage
Best practice 16 - save, load, and reuse models
Best practice 17 - monitor model performance
Best practice 18 - update models regularly
Summary
Index