Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of web page rankings, Google's PageRank and Beyond supplies the answers to these and other questions.
The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research.
The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB code listings and links to sample web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text.
Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided.
* Many illustrative examples and entertaining asides
* MATLAB code
* Accessible and informal style
* Complete and self-contained section for mathematics review
Author(s): Amy N. Langville; Carl D. Meyer
Publisher: Princeton University Press
Year: 2006
Language: English
Pages: 224
Cover
Contents
Preface
Chapter 1. Introduction to Web Search Engines
* 1.1 A Short History of Information Retrieval
* 1.2 An Overview of Traditional Information Retrieval
* 1.3 Web Information Retrieval
Chapter 2. Crawling, Indexing, and Query Processing
* 2.1 Crawling
* 2.2 The Content Index
* 2.3 Query Processing
Chapter 3. Ranking Webpages by Popularity
* 3.1 The Scene in 1998
* 3.2 Two Theses
* 3.3 Query-Independence
Chapter 4. The Mathematics of Google’s PageRank
* 4.1 The Original Summation Formula for PageRank
* 4.2 Matrix Representation of the Summation Equations
* 4.3 Problems with the Iterative Process
* 4.4 A Little Markov Chain Theory
* 4.5 Early Adjustments to the Basic Model
* 4.6 Computation of the PageRank Vector
* 4.7 Theorem and Proof for Spectrum of the Google Matrix
Chapter 5. Parameters in the PageRank Model
* 5.1 The α Factor
* 5.2 The Hyperlink Matrix H
* 5.3 The Teleportation Matrix E
Chapter 6. The Sensitivity of PageRank
* 6.1 Sensitivity with respect to α
* 6.2 Sensitivity with respect to H
* 6.3 Sensitivity with respect to v^T
* 6.4 Other Analyses of Sensitivity
* 6.5 Sensitivity Theorems and Proofs
Chapter 7. The PageRank Problem as a Linear System
* 7.1 Properties of (I – αS)
* 7.2 Properties of (I – αH)
* 7.3 Proof of the PageRank Sparse Linear System
Chapter 8. Issues in Large-Scale Implementation of PageRank
* 8.1 Storage Issues
* 8.2 Convergence Criterion
* 8.3 Accuracy
* 8.4 Dangling Nodes
* 8.5 Back Button Modeling
Chapter 9. Accelerating the Computation of PageRank
* 9.1 An Adaptive Power Method
* 9.2 Extrapolation
* 9.3 Aggregation
* 9.4 Other Numerical Methods
Chapter 10. Updating the PageRank Vector
* 10.1 The Two Updating Problems and their History
* 10.2 Restarting the Power Method
* 10.3 Approximate Updating Using Approximate Aggregation
* 10.4 Exact Aggregation
* 10.5 Exact vs. Approximate Aggregation
* 10.6 Updating with Iterative Aggregation
* 10.7 Determining the Partition
* 10.8 Conclusions
Chapter 11. The HITS Method for Ranking Webpages
* 11.1 The HITS Algorithm
* 11.2 HITS Implementation
* 11.3 HITS Convergence
* 11.4 HITS Example
* 11.5 Strengths and Weaknesses of HITS
* 11.6 HITS’s Relationship to Bibliometrics
* 11.7 Query-Independent HITS
* 11.8 Accelerating HITS
* 11.9 HITS Sensitivity
Chapter 12. Other Link Methods for Ranking Webpages
* 12.1 SALSA
* 12.2 Hybrid Ranking Methods
* 12.3 Rankings based on Traffic Flow
Chapter 13. The Future of Web Information Retrieval
* 13.1 Spam
* 13.2 Personalization
* 13.3 Clustering
* 13.4 Intelligent Agents
* 13.5 Trends and Time-Sensitive Search
* 13.6 Privacy and Censorship
* 13.7 Library Classification Schemes
* 13.8 Data Fusion
Chapter 14. Resources for Web Information Retrieval
* 14.1 Resources for Getting Started
* 14.2 Resources for Serious Study
Chapter 15. The Mathematics Guide
* 15.1 Linear Algebra
* 15.2 Perron–Frobenius Theory
* 15.3 Markov Chains
* 15.4 Perron Complementation
* 15.5 Stochastic Complementation
* 15.6 Censoring
* 15.7 Aggregation
* 15.8 Disaggregation
Chapter 16. Glossary
Bibliography
Index