Building Recommendation Systems in Python and JAX

Implementing and designing systems that make suggestions to users are among the most popular and essential machine learning applications available. Whether you want customers to find the most appealing items at your online store, videos to enrich and entertain them, or news they need to know, recommendation systems (RecSys) provide the way.

In this practical book, authors Bryan Bischof and Hector Yee illustrate the core concepts and examples to help you create a RecSys for any industry or scale. You'll learn the math, ideas, and implementation details you need to succeed. This book includes the RecSys platform components, relevant MLOps tools in your stack, plus code examples and helpful suggestions in PySpark, SparkSQL, FastAPI, and Weights & Biases.

You'll learn:

  • The data essential for building a RecSys
  • How to frame your data and business as a RecSys problem
  • Ways to evaluate models appropriate for your system
  • Methods to...

Author(s): Bryan Bischof, Ph.D.; Hector Yee
Publisher: O'Reilly Media
Year: 2023
Language: English
Pages: 400

    Preface
    Conventions Used in This Book
    Using Code Examples
    O’Reilly Online Learning
    How to Contact Us
    Acknowledgments
    I. Warming Up
    1. Introduction
    Key Components of a Recommendation System
    Collector
    Ranker
    Server
    Simplest Possible Recommenders
    The Trivial Recommender
    Most-Popular-Item Recommender
    A Gentle Introduction to JAX
    Basic Types, Initialization, and Immutability
    Indexing and Slicing
    Broadcasting
    Random Numbers
    Just-in-Time Compilation
    Summary
    2. User-Item Ratings and Framing the Problem
    The User-Item Matrix
    User-User Versus Item-Item Collaborative Filtering
    The Netflix Challenge
    Soft Ratings
    Data Collection and User Logging
    What to Log
    Page loads
    Page views and hover
    Clicks
    Add-to-bag
    Impressions
    Collection and Instrumentation
    Funnels
    Business Insight and What People Like
    Summary
    3. Mathematical Considerations
    Zipf’s Laws in RecSys and the Matthew Effect
    Sparsity
    User Similarity for Collaborative Filtering
    Pearson Correlation
    Ratings via Similarity
    Explore-Exploit as a Recommendation System
    ϵ-greedy
    What Should ϵ Be?
    The NLP-RecSys Relationship
    Vector Search
    Nearest-Neighbors Search
    Summary
    4. System Design for Recommending
    Online Versus Offline
    Collector
    Offline Collector
    Online Collector
    Ranker
    Offline Ranker
    Online Ranker
    Server
    Offline Server
    Online Server
    Summary
    5. Putting It All Together: Content-Based Recommender
    Revision Control Software
    Python Build Systems
    Random-Item Recommender
    Obtaining the STL Dataset Images
    Convolutional Neural Network Definition
    Model Training in JAX, Flax, and Optax
    Input Pipeline
    Summary
    II. Retrieval
    6. Data Processing
    Hydrating Your System
    PySpark
    Example: User Similarity in PySpark
    DataLoaders
    Database Snapshots
    Data Structures for Learning and Inference
    Vector Search
    Approximate Nearest Neighbors
    Bloom Filters
    Fun Aside: Bloom Filters as the Recommendation System
    Feature Stores
    Summary
    7. Serving Models and Architectures
    Architectures by Recommendation Structure
    Item-to-User Recommendations
    Query-Based Recommendations
    Context-Based Recommendations
    Sequence-Based Recommendations
    Why Bother with Extra Features?
    Encoder Architectures and Cold Starting
    Deployment
    Models as APIs
    Spinning Up a Model Service
    Workflow Orchestration
    Containerization
    Scheduling
    CI/CD
    Alerting and Monitoring
    Schemas and Priors
    Integration Tests
    Observability
    Spans and traces
    Timeouts
    Evaluation in Production
    Slow Feedback
    Model Metrics
    Continuous Training and Deployment
    Model Drift
    Deployment Topologies
    Ensembles
    Shadowing
    Experimentation
    The Evaluation Flywheel
    Daily Warm Starts
    Lambda Architecture and Orchestration
    Logging
    Collector logs
    Filtering and scoring
    Ordering
    Active Learning
    Types of optimization
    Application: User sign-up
    Summary
    8. Putting It All Together: Data Processing and Counting Recommender
    Tech Stack
    Data Representation
    Big Data Frameworks
    Cluster Frameworks
    PySpark Example
    GloVe Model Definition
    GloVe Model Specification in JAX and Flax
    GloVe Model Training with Optax
    Summary
    III. Ranking
    9. Feature-Based and Counting-Based Recommendations
    Bilinear Factor Models (Metric Learning)
    Feature-Based Warm Starting
    Segmentation Models and Hybrids
    Tag-Based Recommenders
    Hybridization
    Limitations of Bilinear Models
    Counting Recommenders
    Return to the Most-Popular-Item Recommender
    Correlation Mining
    Pointwise Mutual Information via Co-occurrences
    Similarity from Co-occurrence
    Similarity-Based Recommendations
    Summary
    10. Low-Rank Methods
    Latent Spaces
    Dot Product Similarity
    Co-occurrence Models
    Reducing the Rank of a Recommender Problem
    Optimizing for MF with ALS
    Regularization for MF
    Regularized MF Implementation
    Output from HPO MF
    Prequential validation
    WSABIE
    Dimension Reduction
    Isometric Embeddings
    Nonlinear Locally Metrizable Embeddings
    Centered Kernel Alignment
    Affinity and p-sale
    Propensity Weighting for Recommendation System Evaluation
    Propensity
    Simpson’s and Mitigating Confounding
    Summary
    11. Personalized Recommendation Metrics
    Environments
    Online and Offline
    User Versus Item Metrics
    A/B Testing
    Recall and Precision
    @ k
    Precision at k
    Recall at k
    R-precision
    mAP, MRR, NDCG
    mAP
    MRR
    NDCG
    mAP Versus NDCG?
    Correlation Coefficients
    RMSE from Affinity
    Integral Forms: AUC and cAUC
    Recommendation Probabilities to AUC-ROC
    Comparison to Other Metrics
    BPR
    Summary
    12. Training for Ranking
    Where Does Ranking Fit in Recommender Systems?
    Learning to Rank
    Training an LTR Model
    Classification for Ranking
    Regression for Ranking
    Classification and Regression for Ranking
    WARP
    k-order Statistic
    BM25
    Multimodal Retrieval
    Summary
    13. Putting It All Together: Experimenting and Ranking
    Experimentation Tips
    Keep It Simple
    Debug Print Statements
    Defer Optimization
    Keep Track of Changes
    Use Feature Engineering
    Understand Metrics Versus Business Metrics
    Perform Rapid Iteration
    Spotify Million Playlist Dataset
    Building URI Dictionaries
    Building the Training Data
    Reading the Input
    Modeling the Problem
    Framing the Loss Function
    Exercises
    Summary
    IV. Serving
    14. Business Logic
    Hard Ranking
    Learned Avoids
    Hand-Tuned Weights
    Inventory Health
    Implementing Avoids
    Model-Based Avoids
    Summary
    15. Bias in Recommendation Systems
    Diversification of Recommendations
    Improving Diversity
    Applying Portfolio Optimization
    Multiobjective Functions
    Predicate Pushdown
    Fairness
    Summary
    16. Acceleration Structures
    Sharding
    Locality Sensitive Hashing
    k-d Trees
    Hierarchical k-means
    Cheaper Retrieval Methods
    Summary
    V. The Future of Recs
    17. Sequential Recommenders
    Markov Chains
    Order-Two Markov Chain
    Other Markov Models
    RNN and CNN Architectures
    Attention Architectures
    Self-Attentive Sequential Recommendation
    BERT4Rec
    Recency Sampling
    Merging Static and Sequential
    Summary
    18. What’s Next for Recs?
    Multimodal Recommendations
    Graph-Based Recommenders
    Neural Message Passing
    Applications
    Modeling user-item interactions
    Feature learning
    Cold-start problem
    Context-aware recommendations
    Random Walks
    Metapath and Heterogeneity
    LLM Applications
    LLM Recommenders
    LLM Training
    Instruct Tuning for Recommendations
    LLM Rankers
    Recommendations for AI
    Summary
    Index