The Art of Feature Engineering: Essentials for Machine Learning


When machine learning engineers work with datasets, they may find that the results aren't as good as they need. Instead of improving the model or collecting more data, they can use the feature engineering process to help improve results by modifying the data's features to better capture the nature of the problem. This practical guide to feature engineering is an essential addition to any data scientist's or machine learning engineer's toolbox, providing new ideas on how to improve the performance of a machine learning solution. Beginning with basic concepts and techniques, the text builds up to a unique cross-domain approach that spans data on graphs, texts, time series, and images, with fully worked-out case studies. Key topics include binning, out-of-fold estimation, feature selection, dimensionality reduction, and encoding variable-length data. The full source code for the case studies is available on a companion website as Python Jupyter notebooks.
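Since the book's case studies are in Python, here is a minimal sketch of two of the listed topics, normalization and equal-width binning (covered in Chapter 2), using NumPy on made-up data. This is an illustration of the general techniques, not code from the book itself.

```python
import numpy as np

# Hypothetical feature column with one large outlier.
values = np.array([1.0, 2.0, 2.5, 3.0, 100.0])

# Min-max normalization: rescale the feature to the [0, 1] range.
normalized = (values - values.min()) / (values.max() - values.min())

# Equal-width binning (discretization): split the range into 4 bins
# and replace each value with its bin index (0..3).
edges = np.linspace(values.min(), values.max(), 5)
binned = np.digitize(values, edges[1:-1])
```

Note how the outlier dominates both steps: almost every value lands in bin 0 and near 0.0 after scaling, which is why the book pairs these techniques with outlier handling.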

Author(s): Pablo Duboue
Edition: 1
Publisher: Cambridge University Press
Year: 2020

Language: English
Commentary: True PDF
Pages: 283
City: Cambridge, UK
Tags: Machine Learning; Deep Learning; Image Analysis; Python; Feature Engineering; Graph Data Model; Text Analysis; Dimensionality Reduction; Time Series Analysis

Contents
Preface
PART ONE FUNDAMENTALS
1 Introduction
1.1 Feature Engineering
1.2 Evaluation
1.3 Cycles
1.4 Analysis
1.5 Other Processes
1.6 Discussion
1.7 Learning More
2 Features, Combined: Normalization, Discretization and Outliers
2.1 Normalizing Features
2.2 Discretization and Binning
2.3 Descriptive Features
2.4 Dealing with Outliers
2.5 Advanced Techniques
2.6 Learning More
3 Features, Expanded: Computable Features, Imputation and Kernels
3.1 Computable Features
3.2 Imputation
3.3 Decomposing Complex Features
3.4 Kernel-Induced Feature Expansion
3.5 Learning More
4 Features, Reduced: Feature Selection, Dimensionality Reduction and Embeddings
4.1 Feature Selection
4.2 Regularization and Embedded Feature Selection
4.3 Dimensionality Reduction
4.4 Learning More
5 Advanced Topics: Variable-Length Data and Automated Feature Engineering
5.1 Variable-Length Feature Vectors
5.2 Instance-Based Engineering
5.3 Deep Learning and Feature Engineering
5.4 Automated Feature Engineering
5.5 Learning More
PART TWO CASE STUDIES
6 Graph Data
6.1 WikiCities Dataset
6.2 Exploratory Data Analysis (EDA)
6.3 First Feature Set
6.4 Second Feature Set
6.5 Final Feature Sets
6.6 Learning More
7 Timestamped Data
7.1 WikiCities: Historical Features
7.2 Time Lagged Features
7.3 Sliding Windows
7.4 Third Featurization: EMA
7.5 Historical Data as Data Expansion
7.6 Time Series
7.7 Learning More
8 Textual Data
8.1 WikiCities: Text
8.2 Exploratory Data Analysis
8.3 Numeric Tokens Only
8.4 Bag-of-Words
8.5 Stop Words and Morphological Features
8.6 Features in Context
8.7 Skip Bigrams and Feature Hashing
8.8 Dimensionality Reduction and Embeddings
8.9 Closing Remarks
8.10 Learning More
9 Image Data
9.1 WikiCities: Satellite Images
9.2 Exploratory Data Analysis
9.3 Pixels as Features
9.4 Automatic Dataset Expansion
9.5 Descriptive Features: Histograms
9.6 Local Feature Detectors: Corners
9.7 Dimensionality Reduction: HOGs
9.8 Closing Remarks
9.9 Learning More
10 Other Domains: Video, GIS and Preferences
10.1 Video
10.2 Geographical Features
10.3 Preferences
Bibliography
Index