Machine learning (ML) is progressively reshaping the fields of quantitative finance and algorithmic trading. ML tools are increasingly adopted by hedge funds and asset managers, notably for alpha signal generation and stock selection. The technical nature of the subject can make it hard for non-specialists to jump on the bandwagon, as the jargon and coding requirements may seem out of reach. Machine Learning for Factor Investing: Python Version bridges this gap. It provides a comprehensive tour of modern ML-based investment strategies that rely on firm characteristics. The book covers a wide array of subjects, ranging from economic rationales to rigorous portfolio backtesting, and encompasses both data processing and model interpretability. Common supervised learning algorithms such as tree models and neural networks are explained in the context of style investing, and the reader can also dig into more complex techniques like autoencoders for asset returns, Bayesian additive trees, and causal models. All topics are illustrated with self-contained Python code samples and snippets applied to a large public dataset containing over 90 predictors. The material, along with the content of the book, is available online so that readers can reproduce and enhance the examples at their convenience. If you have even a basic knowledge of quantitative finance, this combination of theoretical concepts and practical illustrations will help you learn quickly and deepen your financial and technical expertise.
Author(s): Guillaume Coqueret; Tony Guida
Publisher: CRC Press LLC
Year: 2023
Language: English
Pages: 358
I Introduction
1 Notations and data
1.1 Notations
1.2 Dataset
2 Introduction
2.1 Context
2.2 Portfolio construction: the workflow
2.3 Machine learning is no magic wand
3 Factor investing and asset pricing anomalies
3.1 Introduction
3.2 Detecting anomalies
3.2.1 Challenges
3.2.2 Simple portfolio sorts
3.2.3 Factors
3.2.4 Fama-MacBeth regressions
3.2.5 Factor competition
3.2.6 Advanced techniques
3.3 Factors or characteristics?
3.4 Hot topics: momentum, timing, and ESG
3.4.1 Factor momentum
3.4.2 Factor timing
3.4.3 The green factors
3.5 The links with machine learning
3.5.1 Short list of recent references
3.5.2 Explicit connections with asset pricing models
3.6 Coding exercises
4 Data preprocessing
4.1 Know your data
4.2 Missing data
4.3 Outlier detection
4.4 Feature engineering
4.4.1 Feature selection
4.4.2 Scaling the predictors
4.5 Labelling
4.5.1 Simple labels
4.5.2 Categorical labels
4.5.3 The triple barrier method
4.5.4 Filtering the sample
4.5.5 Return horizons
4.6 Handling persistence
4.7 Extensions
4.7.1 Transforming features
4.7.2 Macroeconomic variables
4.7.3 Active learning
4.8 Additional code and results
4.8.1 Impact of rescaling: graphical representation
4.8.2 Impact of rescaling: toy example
4.9 Coding exercises
II Common supervised algorithms
5 Penalized regressions and sparse hedging for minimum variance portfolios
5.1 Penalized regressions
5.1.1 Simple regressions
5.1.2 Forms of penalizations
5.1.3 Illustrations
5.2 Sparse hedging for minimum variance portfolios
5.2.1 Presentation and derivations
5.2.2 Example
5.3 Predictive regressions
5.3.1 Literature review and principle
5.3.2 Code and results
5.4 Coding exercise
6 Tree-based methods
6.1 Simple trees
6.1.1 Principle
6.1.2 Further details on classification
6.1.3 Pruning criteria
6.1.4 Code and interpretation
6.2 Random forests
6.2.1 Principle
6.2.2 Code and results
6.3 Boosted trees: Adaboost
6.3.1 Methodology
6.3.2 Illustration
6.4 Boosted trees: extreme gradient boosting
6.4.1 Managing loss
6.4.2 Penalization
6.4.3 Aggregation
6.4.4 Tree structure
6.4.5 Extensions
6.4.6 Code and results
6.4.7 Instance weighting
6.5 Discussion
6.6 Coding exercises
7 Neural networks
7.1 The original perceptron
7.2 Multilayer perceptron
7.2.1 Introduction and notations
7.2.2 Universal approximation
7.2.3 Learning via back-propagation
7.2.4 Further details on classification
7.3 How deep we should go and other practical issues
7.3.1 Architectural choices
7.3.2 Frequency of weight updates and learning duration
7.3.3 Penalizations and dropout
7.4 Code samples and comments for vanilla MLP
7.4.1 Regression example
7.4.2 Classification example
7.4.3 Custom losses
7.5 Recurrent networks
7.5.1 Presentation
7.5.2 Code and results
7.6 Other common architectures
7.6.1 Generative adversarial networks
7.6.2 Autoencoders
7.6.3 A word on convolutional networks
7.6.4 Advanced architectures
7.7 Coding exercise
8 Support vector machines
8.1 SVM for classification
8.2 SVM for regression
8.3 Practice
8.4 Coding exercises
9 Bayesian methods
9.1 The Bayesian framework
9.2 Bayesian sampling
9.2.1 Gibbs sampling
9.2.2 Metropolis-Hastings sampling
9.3 Bayesian linear regression
9.4 Naïve Bayes classifier
9.5 Bayesian additive trees
9.5.1 General formulation
9.5.2 Priors
9.5.3 Sampling and predictions
9.5.4 Code
III From predictions to portfolios
10 Validating and tuning
10.1 Learning metrics
10.1.1 Regression analysis
10.1.2 Classification analysis
10.2 Validation
10.2.1 The variance-bias tradeoff: theory
10.2.2 The variance-bias tradeoff: illustration
10.2.3 The risk of overfitting: principle
10.2.4 The risk of overfitting: some solutions
10.3 The search for good hyperparameters
10.3.1 Methods
10.3.2 Example: grid search
10.3.3 Example: Bayesian optimization
10.4 Short discussion on validation in backtests
11 Ensemble models
11.1 Linear ensembles
11.1.1 Principles
11.1.2 Example
11.2 Stacked ensembles
11.2.1 Two-stage training
11.2.2 Code and results
11.3 Extensions
11.3.1 Exogenous variables
11.3.2 Shrinking inter-model correlations
11.4 Exercise
12 Portfolio backtesting
12.1 Setting the protocol
12.2 Turning signals into portfolio weights
12.3 Performance metrics
12.3.1 Discussion
12.3.2 Pure performance and risk indicators
12.3.3 Factor-based evaluation
12.3.4 Risk-adjusted measures
12.3.5 Transaction costs and turnover
12.4 Common errors and issues
12.4.1 Forward looking data
12.4.2 Backtest overfitting
12.4.3 Simple safeguards
12.5 Implication of non-stationarity: forecasting is hard
12.5.1 General comments
12.5.2 The no free lunch theorem
12.6 First example: a complete backtest
12.7 Second example: backtest overfitting
12.8 Coding exercises
IV Further important topics
13 Interpretability
13.1 Global interpretations
13.1.1 Simple models as surrogates
13.1.2 Variable importance (tree-based)
13.1.3 Variable importance (agnostic)
13.1.4 Partial dependence plot
13.2 Local interpretations
13.2.1 LIME
13.2.2 Shapley values
13.2.3 Breakdown
14 Two key concepts: causality and non-stationarity
14.1 Causality
14.1.1 Granger causality
14.1.2 Causal additive models
14.1.3 Structural time series models
14.2 Dealing with changing environments
14.2.1 Non-stationarity: yet another illustration
14.2.2 Online learning
14.2.3 Homogeneous transfer learning
15 Unsupervised learning
15.1 The problem with correlated predictors
15.2 Principal component analysis and autoencoders
15.2.1 A bit of algebra
15.2.2 PCA
15.2.3 Autoencoders
15.2.4 Application
15.3 Clustering via k-means
15.4 Nearest neighbors
15.5 Coding exercise
16 Reinforcement learning
16.1 Theoretical layout
16.1.1 General framework
16.1.2 Q-learning
16.1.3 SARSA
16.2 The curse of dimensionality
16.3 Policy gradient
16.3.1 Principle
16.3.2 Extensions
16.4 Simple examples
16.4.1 Q-learning with simulations
16.4.2 Q-learning with market data
16.5 Concluding remarks
16.6 Exercises
V Appendix
17 Data description
18 Solutions to exercises
18.1 Chapter 3
18.2 Chapter 4
18.3 Chapter 5
18.4 Chapter 6
18.5 Chapter 7: the autoencoder model and universal approximation
18.6 Chapter 8
18.7 Chapter 11: ensemble neural network
18.8 Chapter 12
18.8.1 EW portfolios
18.8.2 Advanced weighting function
18.9 Chapter 15
18.10 Chapter 16
Bibliography
Index