The Kaggle Book: Data analysis and machine learning for competitive data science

Get a step ahead of your competitors with insights from over 30 Kaggle Masters and Grandmasters. Discover tips, tricks, and best practices for competing effectively on Kaggle and becoming a better data scientist. Purchase of the print or Kindle book includes a free eBook in PDF format.

Key Features
Learn how Kaggle works and how to make the most of competitions from over 30 expert Kagglers
Sharpen your modeling skills with ensembling, feature engineering, adversarial validation, and AutoML
A concise collection of smart data handling techniques for modeling and parameter tuning

Book Description
Millions of data enthusiasts from around the world compete on Kaggle, the most famous data science competition platform of them all. Participating in Kaggle competitions is a surefire way to improve your data analysis skills, network with an amazing community of data scientists, and gain valuable experience to help grow your career.

The first book of its kind, The Kaggle Book assembles in one place the techniques and skills you'll need for success in competitions, data science projects, and beyond. Two Kaggle Grandmasters walk you through modeling strategies you won't easily find elsewhere, and the knowledge they've accumulated along the way. As well as Kaggle-specific tips, you'll learn more general techniques for approaching tasks based on image, tabular, and textual data, and reinforcement learning. You'll design better validation schemes and work more comfortably with different evaluation metrics.

Whether you want to climb the ranks of Kaggle, build some more data science skills, or improve the accuracy of your existing models, this book is for you. Plus, join our Discord Community to learn along with more than 1,000 members and meet like-minded people!

What you will learn
Get acquainted with Kaggle as a competition platform
Make the most of Kaggle Notebooks, Datasets, and Discussion forums
Create a portfolio of projects and ideas to get further in your career
Design k-fold and probabilistic validation schemes
Get to grips with common and never-before-seen evaluation metrics
Understand binary and multi-class classification and object detection
Approach NLP and time series tasks more effectively
Handle simulation and optimization competitions on Kaggle

Who this book is for
This book is suitable for anyone new to Kaggle, veteran users, and anyone in between. Data analysts/scientists who are trying to do better in Kaggle competitions and secure jobs with tech giants will find this book useful.

Author(s): Konrad Banachewicz, Luca Massaron
Publisher: Packt
Year: 2023

Language: English
Pages: 534

Cover
Copyright
Foreword
Contributors
Table of Contents
Preface
Part I: Introduction to Competitions
Chapter 1: Introducing Kaggle and Other Data Science Competitions
The rise of data science competition platforms
The Kaggle competition platform
A history of Kaggle
Other competition platforms
Introducing Kaggle
Stages of a competition
Types of competitions and examples
Submission and leaderboard dynamics
Explaining the Common Task Framework paradigm
Understanding what can go wrong in a competition
Computational resources
Kaggle Notebooks
Teaming and networking
Performance tiers and rankings
Criticism and opportunities
Summary
Chapter 2: Organizing Data with Datasets
Setting up a dataset
Gathering the data
Working with datasets
Using Kaggle Datasets in Google Colab
Legal caveats
Summary
Chapter 3: Working and Learning with Kaggle Notebooks
Setting up a Notebook
Running your Notebook
Saving Notebooks to GitHub
Getting the most out of Notebooks
Upgrading to Google Cloud Platform (GCP)
One step beyond
Kaggle Learn courses
Summary
Chapter 4: Leveraging Discussion Forums
How forums work
Example discussion approaches
Netiquette
Summary
Part II: Sharpening Your Skills for Competitions
Chapter 5: Competition Tasks and Metrics
Evaluation metrics and objective functions
Basic types of tasks
Regression
Classification
Ordinal
The Meta Kaggle dataset
Handling never-before-seen metrics
Metrics for regression (standard and ordinal)
Mean squared error (MSE) and R squared
Root mean squared error (RMSE)
Root mean squared log error (RMSLE)
Mean absolute error (MAE)
Metrics for classification (label prediction and probability)
Accuracy
Precision and recall
The F1 score
Log loss and ROC-AUC
Matthews correlation coefficient (MCC)
Metrics for multi-class classification
Metrics for object detection problems
Intersection over union (IoU)
Dice
Metrics for multi-label classification and recommendation problems
MAP@{K}
Optimizing evaluation metrics
Custom metrics and custom objective functions
Post-processing your predictions
Predicted probability and its adjustment
Summary
Chapter 6: Designing Good Validation
Snooping on the leaderboard
The importance of validation in competitions
Bias and variance
Trying different splitting strategies
The basic train-test split
Probabilistic evaluation methods
k-fold cross-validation
Subsampling
The bootstrap
Tuning your model validation system
Using adversarial validation
Example implementation
Handling different distributions of training and test data
Handling leakage
Chapter 7: Modeling for Tabular Competitions
The Tabular Playground Series
Setting a random state for reproducibility
The importance of EDA
Dimensionality reduction with t-SNE and UMAP
Reducing the size of your data
Applying feature engineering
Easily derived features
Meta-features based on rows and columns
Target encoding
Using feature importance to evaluate your work
Pseudo-labeling
Denoising with autoencoders
Neural networks for tabular competitions
Summary
Chapter 8: Hyperparameter Optimization
Basic optimization techniques
Grid search
Random search
Halving search
Key parameters and how to use them
Linear models
Support-vector machines
Random forests and extremely randomized trees
Gradient tree boosting
LightGBM
XGBoost
CatBoost
HistGradientBoosting
Bayesian optimization
Using Scikit-optimize
Customizing a Bayesian optimization search
Extending Bayesian optimization to neural architecture search
Creating lighter and faster models with KerasTuner
The TPE approach in Optuna
Summary
Chapter 9: Ensembling with Blending and Stacking Solutions
A brief introduction to ensemble algorithms
Averaging models into an ensemble
Majority voting
Averaging of model predictions
Weighted averages
Averaging in your cross-validation strategy
Correcting averaging for ROC-AUC evaluations
Blending models using a meta-model
Best practices for blending
Stacking models together
Stacking variations
Creating complex stacking and blending solutions
Summary
Chapter 10: Modeling for Computer Vision
Augmentation strategies
Keras built-in augmentations
ImageDataGenerator approach
Preprocessing layers
albumentations
Classification
Object detection
Semantic segmentation
Summary
Chapter 11: Modeling for NLP
Sentiment analysis
Open domain Q&A
Text augmentation strategies
Basic techniques
nlpaug
Summary
Chapter 12: Simulation and Optimization Competitions
Connect X
Rock-paper-scissors
Santa competition 2020
The name of the game
Summary
Part III: Leveraging Competitions for Your Career
Chapter 13: Creating Your Portfolio of Projects and Ideas
Building your portfolio with Kaggle
Leveraging Notebooks and discussions
Leveraging Datasets
Arranging your online presence beyond Kaggle
Blogs and publications
GitHub
Monitoring competition updates and newsletters
Summary
Chapter 14: Finding New Professional Opportunities
Building connections with other competition data scientists
Participating in Kaggle Days and other Kaggle meetups
Getting spotted and other job opportunities
The STAR approach
Summary (and some parting words)
Other Books You May Enjoy
Index