Author(s): Matthew Kirk
Publisher: O'Reilly Media
Language: English
Pages: 216
Tags: machine learning, Python
Copyright
Table of Contents
Preface
Conventions Used in This Book
Using Code Examples
O’Reilly Safari
How to Contact Us
Acknowledgments
Chapter 1. Probably Approximately Correct Software
Writing Software Right
SOLID
Testing or TDD
Refactoring
Writing the Right Software
Writing the Right Software with Machine Learning
What Exactly Is Machine Learning?
The High Interest Credit Card Debt of Machine Learning
SOLID Applied to Machine Learning
Machine Learning Code Is Complex but Not Impossible
TDD: Scientific Method 2.0
Refactoring Our Way to Knowledge
The Plan for the Book
Chapter 2. A Quick Introduction to Machine Learning
What Is Machine Learning?
Supervised Learning
Unsupervised Learning
Reinforcement Learning
What Can Machine Learning Accomplish?
Mathematical Notation Used Throughout the Book
Conclusion
Chapter 3. K-Nearest Neighbors
How Do You Determine Whether You Want to Buy a House?
How Valuable Is That House?
Hedonic Regression
What Is a Neighborhood?
K-Nearest Neighbors
Mr. K’s Nearest Neighborhood
Distances
Triangle Inequality
Geometrical Distance
Computational Distances
Statistical Distances
Curse of Dimensionality
How Do We Pick K?
Guessing K
Heuristics for Picking K
Valuing Houses in Seattle
About the Data
General Strategy
Coding and Testing Design
KNN Regressor Construction
KNN Testing
Conclusion
Chapter 4. Naive Bayesian Classification
Using Bayes’ Theorem to Find Fraudulent Orders
Conditional Probabilities
Probability Symbols
Inverse Conditional Probability (aka Bayes’ Theorem)
Naive Bayesian Classifier
The Chain Rule
Naiveté in Bayesian Reasoning
Pseudocount
Spam Filter
Setup Notes
Coding and Testing Design
Data Source
Email Class
Tokenization and Context
SpamTrainer
Error Minimization Through Cross-Validation
Conclusion
Chapter 5. Decision Trees and Random Forests
The Nuances of Mushrooms
Classifying Mushrooms Using a Folk Theorem
Finding an Optimal Switch Point
Information Gain
Gini Impurity
Variance Reduction
Pruning Trees
Ensemble Learning
Writing a Mushroom Classifier
Conclusion
Chapter 6. Hidden Markov Models
Tracking User Behavior Using State Machines
Emissions/Observations of Underlying States
Simplification Through the Markov Assumption
Using Markov Chains Instead of a Finite State Machine
Hidden Markov Model
Evaluation: Forward-Backward Algorithm
Mathematical Representation of the Forward-Backward Algorithm
Using User Behavior
The Decoding Problem Through the Viterbi Algorithm
The Learning Problem
Part-of-Speech Tagging with the Brown Corpus
Setup Notes
Coding and Testing Design
The Seam of Our Part-of-Speech Tagger: CorpusParser
Writing the Part-of-Speech Tagger
Cross-Validating to Get Confidence in the Model
How to Make This Model Better
Conclusion
Chapter 7. Support Vector Machines
Customer Happiness as a Function of What They Say
Sentiment Classification Using SVMs
The Theory Behind SVMs
Decision Boundary
Maximizing Boundaries
Kernel Trick: Feature Transformation
Optimizing with Slack
Sentiment Analyzer
Setup Notes
Coding and Testing Design
SVM Testing Strategies
Corpus Class
CorpusSet Class
Model Validation and the Sentiment Classifier
Aggregating Sentiment
Exponentially Weighted Moving Average
Mapping Sentiment to Bottom Line
Conclusion
Chapter 8. Neural Networks
What Is a Neural Network?
History of Neural Nets
Boolean Logic
Perceptrons
How to Construct Feed-Forward Neural Nets
Input Layer
Hidden Layers
Neurons
Activation Functions
Output Layer
Training Algorithms
The Delta Rule
Back Propagation
QuickProp
RProp
Building Neural Networks
How Many Hidden Layers?
How Many Neurons for Each Layer?
Tolerance for Error and Max Epochs
Using a Neural Network to Classify a Language
Setup Notes
Coding and Testing Design
The Data
Writing the Seam Test for Language
Cross-Validating Our Way to a Network Class
Tuning the Neural Network
Precision and Recall for Neural Networks
Wrap-Up of Example
Conclusion
Chapter 9. Clustering
Studying Data Without Any Bias
User Cohorts
Testing Cluster Mappings
Fitness of a Cluster
Silhouette Coefficient
Comparing Results to Ground Truth
K-Means Clustering
The K-Means Algorithm
Downside of K-Means Clustering
EM Clustering
Algorithm
The Impossibility Theorem
Example: Categorizing Music
Setup Notes
Gathering the Data
Coding Design
Analyzing the Data with K-Means
EM Clustering Our Data
The Results from the EM Jazz Clustering
Conclusion
Chapter 10. Improving Models and Data Extraction
Debate Club
Picking Better Data
Feature Selection
Exhaustive Search
Random Feature Selection
A Better Feature Selection Algorithm
Minimum Redundancy Maximum Relevance Feature Selection
Feature Transformation and Matrix Factorization
Principal Component Analysis
Independent Component Analysis
Ensemble Learning
Bagging
Boosting
Conclusion
Chapter 11. Putting It Together: Conclusion
Machine Learning Algorithms Revisited
How to Use This Information to Solve Problems
What’s Next for You?
Index
About the Author
Colophon