Not a single day passes without hearing about big data in the news media,
at technical conferences, and even in coffee shops. The ever-increasing amount of
data collected from process monitoring, research, or simple human behavior becomes
valuable only if you extract knowledge from it. Machine learning is the essential
tool to mine data for knowledge. This book covers the what, why, and how of
machine learning:
• What are the objectives and the mathematical foundations of
machine learning?
• Why is Scala the ideal programming language to implement machine
learning algorithms?
• How can you apply machine learning to solve real-world problems?
Throughout this book, machine learning algorithms are described with diagrams,
mathematical formulations, and documented snippets of Scala code, allowing you to
understand these key concepts in your own unique way.
Author(s): Patrick R. Nicolas
Edition: 2
Publisher: Packt Publishing
Year: 2017
Language: English
Pages: 740
Cover
Copyright
Credits
About the Author
About the Reviewers
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Getting Started
Mathematical notations for the curious
Why machine learning?
Classification
Prediction
Optimization
Regression
Why Scala?
Scala as a functional language
Abstraction
Higher kinded types
Functors
Monads
Scala as an object oriented language
Scala as a scalable language
Model categorization
Taxonomy of machine learning algorithms
Unsupervised learning
Clustering
Dimension reduction
Supervised learning
Generative models
Discriminative models
Semi-supervised learning
Reinforcement learning
Leveraging Java libraries
Tools and frameworks
Java
Scala
Eclipse Scala IDE
IntelliJ IDEA Scala plugin
Simple build tool
Apache Commons Math
Description
Licensing
Installation
JFreeChart
Description
Licensing
Installation
Other libraries and frameworks
Source code
Convention
Context bounds
Presentation
Primitives and implicits
Immutability
Let's kick the tires
Writing a simple workflow
Step 1 – scoping the problem
Step 2 – loading data
Step 3 – preprocessing data
Step 4 – discovering patterns
Step 5 – implementing the classifier
Step 6 – evaluating the model
Summary
Chapter 2: Data Pipelines
Modeling
What is a model?
Model versus design
Selecting features
Extracting features
Defining a methodology
Monadic data transformation
Error handling
Monads to the rescue
Implicit models
Explicit models
Workflow computational model
Supporting mathematical abstractions
Step 1 – variable declaration
Step 2 – model definition
Step 3 – instantiation
Composing mixins to build workflow
Understanding the problem
Defining modules
Instantiating the workflow
Modularizing
Profiling data
Immutable statistics
Z-score and Gauss
Assessing a model
Validation
Key quality metrics
F-score for binomial classification
F-score for multinomial classification
Area under the curves
Area under PRC
Area under ROC
Cross-validation
One-fold cross-validation
K-fold cross-validation
Bias-variance decomposition
Overfitting
Summary
Chapter 3: Data Preprocessing
Time series in Scala
Context bounds
Types and operations
Transpose operator
Differential operator
Lazy views
Moving averages
Simple moving average
Weighted moving average
Exponential moving average
Fourier analysis
Discrete Fourier transform (DFT)
DFT-based filtering
Detection of market cycles
The discrete Kalman filter
The state space estimation
The transition equation
The measurement equation
The recursive algorithm
Prediction
Correction
Kalman smoothing
Fixed lag smoothing
Experimentation
Benefits and drawbacks
Alternative preprocessing techniques
Summary
Chapter 4: Unsupervised Learning
K-means clustering
K-means
Measuring similarity
Defining the algorithm
Step 1 – Clusters configuration
Step 2 – Clusters assignment
Step 3 – Reconstruction error minimization
Step 4 – Classification
Curse of dimensionality
Evaluation
The results
Tuning the number of clusters
Validation
Expectation-Maximization (EM)
Gaussian mixture model
EM overview
Implementation
Classification
Testing
Online EM
Summary
Chapter 5: Dimension Reduction
Challenging model complexity
The divergences
The Kullback-Leibler divergence
Overview
Implementation
Testing
The mutual information
Principal components analysis (PCA)
Algorithm
Implementation
Test case
Evaluation
Extending PCA
Validation
Categorical features
Performance
Nonlinear models
Kernel PCA
Manifolds
Summary
Chapter 6: Naïve Bayes Classifiers
Probabilistic graphical models
Naïve Bayes classifiers
Introducing the multinomial Naïve Bayes
Formalism
The frequentist perspective
The predictive model
The zero-frequency problem
Implementation
Design
Training
Classification
F1 Validation
Features extraction
Testing
Multivariate Bernoulli classification
Model
Implementation
Naïve Bayes and text mining
Basics of information retrieval
Implementation
Analyzing documents
Extracting relative terms frequency
Generating the features
Testing
Retrieving textual information
Evaluating text mining classifier
Pros and cons
Summary
Chapter 7: Sequential Data Models
Markov decision processes
The Markov property
The first-order discrete Markov chain
The hidden Markov model (HMM)
Notation
The lambda model
Design
Evaluation (CF-1)
Alpha (forward pass)
Beta (backward pass)
Training (CF-2)
Baum-Welch estimator (EM)
Decoding (CF-3)
The Viterbi algorithm
Putting it all together
Test case 1 – Training
HMM as filtering technique
Conditional random fields
Introduction to CRF
Linear chain CRF
Regularized CRF and text analytics
The feature functions model
Design
Implementation
Configuring the CRF classifier
Training the CRF model
Applying the CRF model
Tests
The training convergence profile
Impact of the size of the training set
Impact of L2 regularization factor
Comparing CRF and HMM
Performance consideration
Summary
Chapter 8: Monte Carlo Inference
The purpose of sampling
Gaussian sampling
Box-Muller transform
Monte Carlo approximation
Overview
Implementation
Bootstrapping with replacement
Overview
Resampling
Implementation
Pros and cons of bootstrap
Markov Chain Monte Carlo (MCMC)
Overview
Metropolis-Hastings (MH)
Implementation
Test
Summary
Chapter 9: Regression and Regularization
Linear regression
Univariate linear regression
Implementation
Test case
Ordinary least squares (OLS) regression
Design
Implementation
Test case 1 – trending
Test case 2 – features selection
Regularization
Ln roughness penalty
Ridge regression
Design
Implementation
Test case
Numerical optimization
Logistic regression
Logistic function
Design
Training workflow
Step 1 – configuring the optimizer
Step 2 – computing the Jacobian matrix
Step 3 – managing the convergence of optimizer
Step 4 – defining the least squares problem
Step 5 – minimizing the sum of square errors
Test
Classification
Summary
Chapter 10: Multi-Layer Perceptron
Feed-forward neural networks (FFNN)
The biological background
Mathematical background
The multilayer perceptron (MLP)
Activation function
Network topology
Design
Configuration
Network components
Network topology
Input and hidden layers
Output layer
Synapses
Connections
Weights initialization
Model
Problem types (modes)
Online versus batch training
Training epoch
Step 1 – input forward propagation
Step 2 – error backpropagation
Step 3 – exit condition
Putting it all together
Training and classification
Regularization
Model generation
Fast Fisher-Yates shuffle
Prediction
Model fitness
Evaluation
Execution profile
Impact of learning rate
Impact of the momentum factor
Impact of the number of hidden layers
Test case
Implementation
Models evaluation
Impact of hidden layers' architecture
Benefits and limitations
Summary
Chapter 11: Deep Learning
Sparse autoencoder
Undercomplete autoencoder
Deterministic autoencoder
Categorization
Feed-forward sparse, undercomplete autoencoder
Sparsity updating equations
Implementation
Restricted Boltzmann Machines (RBMs)
Boltzmann machine
Binary restricted Boltzmann machines
Conditional probabilities
Sampling
Log-likelihood gradient
Contrastive divergence
Configuration parameters
Unsupervised learning
Convolution neural networks
Local receptive fields
Weight sharing
Convolution layers
Sub-sampling layers
Putting it all together
Summary
Chapter 12: Kernel Models and SVM
Kernel functions
Overview
Common discriminative kernels
Kernel monadic composition
The support vector machine (SVM)
The linear SVM
The separable case (hard margin)
The non-separable case (soft margin)
The nonlinear SVM
Max-margin classification
The kernel trick
Support vector classifier (SVC)
The binary SVC
Anomaly detection with one-class SVC
Support vector regression (SVR)
Overview
SVR versus linear regression
Performance considerations
Summary
Chapter 13: Evolutionary Computing
Evolution
The origin
NP problems
Evolutionary computing
Genetic algorithms and machine learning
Genetic algorithm components
Encodings
Value encoding
Predicate encoding
Solution encoding
The encoding scheme
Genetic operators
Selection
Crossover
Mutation
Fitness score
Implementation
Software design
Key components
Population
Chromosomes
Genes
Selection
Controlling population growth
GA configuration
Crossover
Population
Chromosomes
Genes
Mutation
Population
Chromosomes
Genes
Reproduction
Solver
GA for trading strategies
Definition of trading strategies
Trading operators
The cost function
Market signals
Trading strategies
Signal encoding
Test case – Fall 2008 market crash
Creating trading strategies
Configuring the optimizer
Finding the best trading strategy
Tests
Advantages and risks of genetic algorithms
Summary
Chapter 14: Multi-Armed Bandits
K-armed bandit
Exploration-exploitation trade-offs
Expected cumulative regret
Bayesian Bernoulli bandits
Epsilon-greedy algorithm
Thompson sampling
Bandit context
Prior/posterior beta distribution
Implementation
Simulated exploration and exploitation
Upper bound confidence
Confidence interval
Implementation
Summary
Chapter 15: Reinforcement Learning
Reinforcement learning
Understanding the challenge
A solution – Q-learning
Terminology
Concept
Value of policy
Bellman optimality equations
Temporal difference for model-free learning
Action-value iterative update
Implementation
Software design
The states and actions
The search space
The policy and action-value
The Q-learning components
The Q-learning training
Tail recursion to the rescue
Validation
The prediction
Option trading using Q-learning
Option property
Option model
Quantization
Putting it all together
Evaluation
Pros and cons of reinforcement learning
Learning classifier systems
Introduction to LCS
Combining learning and evolution
Terminology
Extended learning classifier systems
XCS components
Application to portfolio management
XCS core data
XCS rules
Covering
Example of implementation
Benefits and limitations of learning classifier systems
Summary
Chapter 16: Parallelism in Scala and Akka
Overview
Scala
Object creation
Streams
Memory on demand
Design for reusing Streams memory
Parallel collections
Processing a parallel collection
Benchmark framework
Performance evaluation
Scalability with Actors
The Actor model
Partitioning
Beyond Actors – reactive programming
Akka
Master-workers
Messages exchange
Worker Actors
The workflow controller
The master Actor
Master with routing
Distributed discrete Fourier transform
Limitations
Futures
Blocking on futures
Future callbacks
Putting it all together
Summary
Chapter 17: Apache Spark MLlib
Overview
Apache Spark core
Why Spark?
Design principles
In-memory persistency
Laziness
Transforms and actions
Shared variables
Experimenting with Spark
Deploying Spark
Using Spark shell
MLlib library
Overview
Creating RDDs
K-means using MLlib
Tests
Reusable ML pipelines
Reusable ML transforms
Encoding features
Training the model
Predictive model
Training summary statistics
Validating the model
Grid search
Apache Spark and ScalaTest
Extending Spark
Kullback-Leibler divergence
Implementation
Kullback-Leibler evaluator
Streaming engine
Why streaming?
Batch and real-time processing
Architecture overview
Discretized streams
Use case – continuous parsing
Checkpointing
Performance evaluation
Tuning parameters
Performance considerations
Pros and cons
Summary
Appendix A: Basic Concepts
Scala programming
List of libraries and tools
Code snippets format
Best practices
Encapsulation
Class constructor template
Companion objects versus case classes
Enumerations versus case classes
Overloading
Design template for immutable classifiers
Utility classes
Data extraction
Financial data sources
Documents extraction
DMatrix class
Counter
Monitor
Mathematics
Linear algebra
QR decomposition
LU factorization
LDL decomposition
Cholesky factorization
Singular Value Decomposition (SVD)
Eigenvalue decomposition
Algebraic and numerical libraries
First order predicate logic
Jacobian and Hessian matrices
Summary of optimization techniques
Gradient descent methods
Quasi-Newton algorithms
Nonlinear least squares minimization
Lagrange multipliers
Overview dynamic programming
Finances 101
Fundamental analysis
Technical analysis
Terminology
Trading data
Trading signal and strategy
Price patterns
Options trading
Financial data sources
Suggested online courses
References
Appendix B: References