As tech products become more prevalent, the demand for machine learning professionals continues to grow. But the responsibilities and skill sets required of ML professionals still vary drastically from company to company, making the interview process difficult to predict. In this guide, data science leader Susan Shu Chang shows you how to tackle the ML hiring process.
Having served as principal data scientist at several companies, Chang has considerable experience as both ML interviewer and interviewee. She'll take you through the highly selective recruitment process by sharing hard-won lessons she learned along the way. You'll quickly understand how to navigate typical ML interviews.
This guide shows you how to:
Explore various machine learning roles, including ML engineer, applied scientist, data scientist, and other positions
Assess your interests and skills before deciding which ML role(s) to...
Author(s): Susan Shu Chang
Publisher: O'Reilly Media
Year: 2023
Language: English
Pages: 307
Preface
Why Machine Learning Jobs?
Who This Book Is For
What This Book Is Not
Conventions Used in This Book
O’Reilly Online Learning
How to Contact Us
Acknowledgments
1. Machine Learning Roles and the Interview Process
Overview of This Book
A Brief History of Machine Learning and Data Science Job Titles
Job Titles Requiring ML Experience
Machine Learning Lifecycle
Startups
Larger ML Teams
The Three Pillars of Machine Learning Roles
Machine Learning Algorithms and Data Intuition: Ability to Adapt
Programming and Software Engineering: Ability to Build
Execution and Communication: Ability to Get Things Done in a Team
Clearing Minimum Requirements in the Three ML Pillars
Machine Learning Skills Matrix
Introduction to ML Job Interviews
Machine Learning Job-Interview Process
Applying for Jobs Through Websites or Job Boards
Resume Screening of Website or Job-Board Applications
Applying via a Referral
Preinterview Checklist
Review your notes and questions that you fumbled
Scheduling the interview
Preinterview tech prep
Recruiter Screening
Overview of Main Interview Loop
Technical interviews
Behavioral interviews
The on-site final round
Summary
2. Machine Learning Job Application and Resume
Where Are the Jobs?
ML Job Application Guide
Your Effectiveness per Application
Job Referrals
Job referral example 1: Successful intern networking and outreach
Job referral example 2: Warm outreach to learn more about a job posting
Job referral example 3: Cold message
Networking
Machine Learning Resume Guide
Take Inventory of Your Past Experience
Overview of Resume Sections
Experience
Education
Skills summary
Volunteering
Interests
Additional resume sections
Tailoring Your Resume to Your Desired Role(s)
Job posting example 1: Data scientist
Job posting example 2: Machine learning engineer
Final Resume Touch-ups
Applying to Jobs
Vetting Job Postings
Mapping Your Skills and Experience to the ML Skills Matrix
Tracking Applications
Additional Job Application Materials, Credentials, and FAQ
Do You Need a Project Portfolio?
Do Online Certifications Help?
FAQ: How Many Pages Should My Resume Be?
What are the expectations of your region?
Coming from academia? Create an industry resume instead of a CV
FAQ: Should I Format My Resume for ATS (Applicant Tracking Systems)?
Next Steps
Browsing Job Postings
Identifying the Gaps Between Your Current Skills and Target Roles
Summary
3. Technical Interview: Machine Learning Algorithms
Overview of the Machine Learning Algorithms Technical Interview
Statistical and Foundational Techniques
Summarizing Independent and Dependent Variables
Defining Models
Summarizing Linear Regression
Defining Training and Test Set Splits
Defining Model Underfitting and Overfitting
Summarizing Regularization
Sample Interview Questions on Foundational Techniques
Interview question 3-1: What is L1 versus L2 regularization?
Interview question 3-2: How do you deal with the challenges that come with an imbalanced dataset?
Interview question 3-3: Explain boosting and bagging and what they can help with.
Supervised, Unsupervised, and Reinforcement Learning
Defining Labeled Data
Summarizing Supervised Learning
Defining Unsupervised Learning
Summarizing Semisupervised and Self-Supervised Learning
Summarizing Reinforcement Learning
Sample Interview Questions on Supervised and Unsupervised Learning
Interview question 3-4: What are common algorithms in supervised learning?
Interview question 3-5: What are some common algorithms used in unsupervised learning? How do they work?
Interview question 3-6: What are the differences between supervised and unsupervised learning?
Interview question 3-7: What are scenarios where you would use supervised learning but not unsupervised learning, and vice versa? Please illustrate with some real-world examples.
Interview question 3-8: What is a common issue that you might run into while implementing supervised learning, and how would you address it?
Natural Language Processing Algorithms
Summarizing NLP Underlying Concepts
Summarizing Long Short-Term Memory Networks
Summarizing Transformer Models
Summarizing BERT Models
Summarizing GPT Models
Going Further
Sample Interview Questions on NLP
Interview question 3-9: How would you leverage pretrained models like BERT for specific downstream tasks such as sentiment analysis, chatbots, or named entity recognition?
Interview question 3-10: How do you clean/process a raw text corpus for training an NLP model? Can you name one or two techniques and the reasons behind them?
Interview question 3-11: What are some common challenges of NLP models, and how would you address them?
Interview question 3-12: What is the difference between BERT-cased and BERT-uncased? What are the advantages and disadvantages of using one over the other?
Recommender System Algorithms
Summarizing Collaborative Filtering
Summarizing Explicit and Implicit Ratings
Summarizing Content-Based Recommender Systems
User-Based/Item-Based Versus Content-Based Recommender Systems
Summarizing Matrix Factorization
Sample Interview Questions on Recommender Systems
Interview question 3-13: What’s the difference between content-based recommender systems and collaborative filtering recommender systems? When would you use one over the other?
Interview question 3-14: What are some common problems encountered in recommender systems, and how would you resolve them?
Interview question 3-15: What is the difference between explicit and implicit feedback in recommender systems? What are the trade-offs with using each type, respectively?
Interview question 3-16: How would you address imbalanced data in recommender systems?
Reinforcement Learning Algorithms
Summarizing Reinforcement Learning Agents
Summarizing Q-Learning
Summarizing Model-Based Versus Model-Free Reinforcement Learning
Summarizing Value-Based Versus Policy-Based Reinforcement Learning
Summarizing On-Policy Versus Off-Policy Reinforcement Learning
Sample Interview Questions on Reinforcement Learning
Interview question 3-17: Explain the DQN (deep Q-network) algorithm in reinforcement learning.
Interview question 3-18: As a follow-up question, could you explain the main modifications that DQN added on top of regular Q-learning?
Interview question 3-19: Explain exploration and exploitation in reinforcement learning with an example. What are the trade-offs of these two concepts? What are some ways you would balance exploration and exploitation?
Interview question 3-20: In the following scenario, you’ve found that the reinforcement learning algorithm keeps recommending an item that is incorrectly labeled as 10% of its sale price. What might have caused this, and what would you investigate, assuming that the data is all correct?
Interview question 3-21: Explain model-based or model-free reinforcement learning. What are some examples of each, and when would you choose one over the other?
Computer Vision Algorithms
Summarizing Common Image Datasets
Summarizing Convolutional Neural Networks (CNNs)
Summarizing Transfer Learning
Summarizing Generative Adversarial Networks
Summarizing Additional Computer Vision Use Cases
Super resolution summary
Object detection summary
Semantic image segmentation summary
Sample Interview Questions on Image Recognition
Interview question 3-22: What are some common techniques of preprocessing in image-recognition tasks?
Interview question 3-23: How might you handle class imbalance in image-recognition tasks?
Interview question 3-24: How would you handle overfitting in image-recognition tasks?
Interview question 3-25: How would you improve and optimize the architecture for a CNN used for image recognition?
Summary
4. Technical Interview: Model Training and Evaluation
Defining a Machine Learning Problem
Data Preprocessing and Feature Engineering
Introduction to Data Acquisition
Introduction to Exploratory Data Analysis
Introduction to Feature Engineering
Handling missing data with imputation
Handling duplicate data
Standardizing data
Data preprocessing
One-hot encoding of categorical data
Label encoding
Binning for numerical values
Feature selection
Sample Interview Questions on Data Preprocessing and Feature Engineering
Interview question 4-1: What’s the difference between feature engineering and feature selection?
Interview question 4-2: How do you prevent data leakage issues while conducting data preprocessing?
Interview question 4-3: How do you handle a skewed data distribution during feature engineering, assuming that the minority data class is required for the machine learning problem?
The Model Training Process
The Iteration Process in Model Training
Defining the ML Task
Overview of Model Selection
Overview of Model Training
Hyperparameter tuning
ML loss functions
ML optimizers
Experiment tracking
Additional resource for model training
Sample Interview Questions on Model Selection and Training
Interview question 4-4: In what scenario would you use a reinforcement learning algorithm rather than, say, a tree-based method?
Interview question 4-5: What are some common mistakes made during model training, and how would you avoid them?
Interview question 4-6: In what scenario might ensemble models be useful?
Model Evaluation
Summary of Common ML Evaluation Metrics
Classification metrics
Regression metrics
Clustering metrics
Ranking metrics
Trade-offs in Evaluation Metrics
Additional Methods for Offline Evaluation
Model Versioning
Sample Interview Questions on Model Evaluation
Interview question 4-7: What is the ROC metric, and when is it useful?
Interview question 4-8: What is the difference between precision and recall; when would you use one over the other in a classification task?
Interview question 4-9: What is the NDCG (normalized discounted cumulative gain), explained on a high level? What type of ML task is it used for?
Summary
5. Technical Interview: Coding
Starting from Scratch: Learning Roadmap If You Don’t Know Python
Pick Up a Book or Course That’s Easy to Understand
Start with Easy Questions on LeetCode, HackerRank, or Your Platform of Choice
Set a Measurable Target and Practice, Practice, Practice
Try Out ML-Related Python Packages
Coding Interview Success Tips
Think Out Loud
Control the Flow
Your Interviewer Can Help You Out
Optimize Your Environment
Interviews Require Energy!
Python Coding Interview: Data- and ML-Related Questions
Sample Data- and ML-Related Interview and Questions
Scenario
Question 5-1 (a)
Question 5-1 (b)
FAQs for Data- and ML-Focused Interviews
Resources for Data and ML Interview Questions
Python Coding Interview: Brainteaser Questions
Patterns for Brainteaser Programming Questions
Array and string manipulation
Sliding window
Question 5-2
Two pointers
Question 5-3
Resources for Brainteaser Programming Questions
Practice platforms for coding interviews
Curated study resources for coding interviews
Curated practice problems for coding interviews
SQL Coding Interview: Data-Related Questions
Resources for SQL Coding Interview Questions
Roadmaps for Preparing for Coding Interviews
Coding Interview Roadmap Example: Four Weeks, University Student
Coding Interview Roadmap Example: Six Months, Career Transition
Coding Interview Roadmap: Create Your Own!
Summary
6. Technical Interview: Model Deployment and End-to-End ML
Model Deployment
The Main Experience Gap for New Entrants into the ML Industry
Should Data Scientists and MLEs Know This?
End-to-End Machine Learning
Cloud Environments and Local Environments
Summary of local environments
Summary of cloud environments
Public cloud provider
On-premises and private cloud
Overview of Model Deployment
Introduction to Docker
Orchestrating with Kubernetes
Additional Tooling to Know
On-Device Machine Learning
Interviews for Roles Focused on Model Training
Model Monitoring
Monitoring Setups
Dashboards
Data quality checks
Alerts
ML-Related Monitoring Metrics
Overview of Cloud Providers
GCP
AWS
Microsoft Azure
Developer Best Practices for Interviews
Version Control
Dependency Management
Code Review
Tests
Additional Technical Interview Components
Machine Learning Systems Design Interview
Technical Deep-Dive Interview
Take-Home Exercise Tips
Product Sense
Sample Interview Questions on MLOps
Interview question 6-1: Can you walk through an example where you improved the scalability of ML infrastructure?
Interview question 6-2: How do you handle the monitoring and performance tracking of ML models in production?
Interview question 6-3: What kind of CI/CD pipeline for ML models have you built, and how?
Summary
7. Behavioral Interviews
Behavioral Interview Questions and Responses
Use the STAR Method to Answer Behavioral Questions
Enhance Your Answers with the Hero’s Journey Method
Best Practices and Feedback from an Interviewer’s Perspective
Common Behavioral Questions and Recommendations
Questions About Communication Skills
Questions About Collaboration and Teamwork
Questions on How You Respond to Feedback
Questions on Dealing with Challenges and Learning New Skills
Questions About the Company
Questions About Work Projects
Free-Form Questions
Behavioral Interview Best Practices
How to Answer Behavioral Questions If You Don’t Have Relevant Work Experience
If you’re a student
If you worked in another field
Get creative—create your own experience
Senior+ Behavioral Interview Tips
Specific Preparation Examples for Big Tech
Amazon
Meta/Facebook
Alphabet/Google
Netflix
Summary
8. Tying It All Together: Your Interview Roadmap
Interview Preparation Checklist
Interview Roadmap Template
Efficient Interview Preparation
Become a Better Learner
Get hands-on ASAP
Understand the system
Progress per time spent equals efficiency
Iteratively fill in knowledge gaps
Time Management and Accountability
Focus time
Use the Pomodoro Technique
Do you need an accountability buddy?
Avoid Burnout: It Is Costly
Impostor Syndrome
Summary
9. Post-Interview and Follow-up
Post-Interview Steps
Take Notes of What You Remember from the Interview
Make Sure You’re Not Missing Important Information
Should You Send a Thank-You Email to the Interviewer?
Thank-You Note Template
How Long Should You Wait After the Interview for a Response Before Following Up?
What to Do Between Interviews
How to Respond to Rejections
Template for Rejection Responses
Job Applications Are a Funnel
Update and Customize Your Resume and Test Variations
Steps of the Offer Stage
Let Other Interviews-in-Progress Know You’ve Gotten an Offer
What to Do If the Offer Response Timeline Is Very Short
Understand Your Offer
Workplace culture
Work-life balance
Base pay
Bonuses, stocks, and other kinds of compensation
Benefits
Tying it all together
First 30/60/90 Days of Your New ML Job
Gain Domain Knowledge
Gain Code Knowledge
Meet Relevant People
Help Improve the Onboarding Documentation
Keep Track of Your Achievements
Summary
Epilogue
Index