An Introduction to Search Engines and Web Navigation

This book is a second edition, updated and expanded to explain the technologies that help us find information on the web. Search engines and web navigation tools have become ubiquitous in our day-to-day use of the web as an information source, a tool for commercial transactions, and a social computing tool. Moreover, through the mobile web we have access to the web's services when we are on the move. This book demystifies the tools that we use when interacting with the web, and gives the reader a detailed overview of where we are and where we are going in terms of search engine and web navigation technologies.

Author(s): Mark Levene
Edition: 2
Publisher: Wiley
Year: 2010

Language: English
Pages: 500

AN INTRODUCTION TO SEARCH ENGINES AND WEB NAVIGATION......Page 5
CONTENTS......Page 9
PREFACE......Page 16
LIST OF FIGURES......Page 19
CHAPTER 1 INTRODUCTION......Page 23
1.1 Brief Summary of Chapters......Page 24
1.2 Brief History of Hypertext and the Web......Page 25
1.3 Brief History of Search Engines......Page 28
CHAPTER 2 THE WEB AND THE PROBLEM OF SEARCH......Page 31
2.1.1 Web Size Statistics......Page 32
2.1.2 Web Usage Statistics......Page 37
2.2 Tabular Data Versus Web Data......Page 40
2.3 Structure of the Web......Page 42
2.3.1 Bow-Tie Structure of the Web......Page 43
2.3.2 Small-World Structure of the Web......Page 45
2.4.1 Direct Navigation......Page 46
2.4.2 Navigation within a Directory......Page 47
2.4.3 Navigation using a Search Engine......Page 48
2.4.4 Problems with Web Information Seeking......Page 49
2.5 Informational, Navigational, and Transactional Queries......Page 50
2.6 Comparing Web Search to Traditional Information Retrieval......Page 51
2.6.1 Recall and Precision......Page 52
2.7 Local Site Search Versus Global Web Search......Page 54
2.8 Difference Between Search and Navigation......Page 56
CHAPTER 3 THE PROBLEM OF WEB NAVIGATION......Page 60
3.1 Getting Lost in Hyperspace and the Navigation Problem......Page 61
3.2.1 The Potential Use of Machine Learning Algorithms......Page 64
3.2.2 The Naive Bayes Classifier for Categorizing Web Pages......Page 65
3.3 Trails Should be First Class Objects......Page 68
3.4.1 Markov Chains and the Markov Property......Page 71
3.4.2 Markov Chains and the Probabilities of Following Links......Page 72
3.4.3 Markov Chains and the Relevance of Links......Page 74
3.5 Conflict Between Web Site Owner and Visitor......Page 76
3.6 Conflict Between Semantics of Web Site and the Business Model......Page 79
CHAPTER 4 SEARCHING THE WEB......Page 82
4.1 Mechanics of a Typical Search......Page 83
4.2 Search Engines as Information Gatekeepers of the Web......Page 86
4.3 Search Engine Wars, is the Dust Settling?......Page 90
4.3.1 Competitor Number One: Google......Page 91
4.3.3 Competitor Number Three: Bing......Page 92
4.3.4 Other Competitors......Page 94
4.4.1 Search Engine Query Logs......Page 95
4.4.2 Search Engine Query Syntax......Page 97
4.4.3 The Most Popular Search Keywords......Page 99
4.5 Architecture of a Search Engine......Page 100
4.5.1 The Search Index......Page 101
4.5.2 The Query Engine......Page 102
4.6 Crawling the Web......Page 103
4.6.1 Crawling Algorithms......Page 104
4.6.3 The Robots Exclusion Protocol......Page 106
4.7 What Does it Take to Deliver a Global Search Service?......Page 107
CHAPTER 5 HOW DOES A SEARCH ENGINE WORK......Page 113
5.1.1 Processing Web Pages......Page 116
5.1.3 Term Frequency......Page 118
5.1.4 Inverse Document Frequency......Page 121
5.1.5 Computing Keyword TF–IDF Values......Page 122
5.1.8 Synonyms......Page 124
5.1.9 Link Text......Page 125
5.1.12 HTML Structure Weighting......Page 126
5.1.13 Spell Checking......Page 127
5.1.14 Non-English Queries......Page 128
5.1.16 Related Searches and Query Suggestions......Page 129
5.2 Link-Based Metrics......Page 130
5.2.1 Referential and Informational Links......Page 131
5.2.3 Are Links the Currency of the Web?......Page 132
5.2.4 PageRank Explained......Page 134
5.2.6 Monte Carlo Methods in PageRank Computation......Page 138
5.2.7 Hyperlink-Induced Topic Search......Page 139
5.2.8 Stochastic Approach for Link-Structure Analysis......Page 142
5.2.9 Counting Incoming Links......Page 144
5.2.11 PageRank within a Community......Page 145
5.2.12 Influence of Weblogs on PageRank......Page 146
5.2.13 Link Spam......Page 147
5.2.14 Citation Analysis......Page 149
5.2.15 The Wide Ranging Interest in PageRank......Page 151
5.3.1 Direct Hit’s Popularity Metric......Page 152
5.3.3 Using Query Log Data to Improve Search......Page 154
5.3.4 Learning to Rank......Page 155
5.3.5 BrowseRank......Page 156
5.4.2 Evaluation Metrics......Page 158
5.4.3 Performance Measures......Page 160
5.4.4 Eye Tracking Studies......Page 161
5.4.5 Test Collections......Page 163
5.4.6 Inferring Ranking Algorithms......Page 164
CHAPTER 6 DIFFERENT TYPES OF SEARCH ENGINES......Page 170
6.1 Directories and Categorization of Web Content......Page 172
6.2.1 Paid Inclusion......Page 174
6.2.3 Sponsored Search and Paid Placement......Page 175
6.2.4 Behavioral Targeting......Page 179
6.2.5 User Behavior......Page 180
6.2.6 The Trade-Off between Bias and Demand......Page 182
6.2.7 Sponsored Search Auctions......Page 183
6.2.8 Pay per Action......Page 187
6.2.9 Click Fraud and Other Forms of Advertising Fraud......Page 188
6.3 Metasearch......Page 190
6.3.1 Fusion Algorithms......Page 191
6.3.2 Operational Metasearch Engines......Page 192
6.3.3 Clustering Search Results......Page 195
6.3.4 Classifying Search Results......Page 197
6.4 Personalization......Page 200
6.4.2 Personalized Results Tool......Page 202
6.4.4 Relevance Feedback......Page 204
6.4.5 Personalized PageRank......Page 206
6.4.6 Outride’s Personalized Search......Page 208
6.5 Question Answering (Q&A) on the Web......Page 209
6.5.1 Natural Language Annotations......Page 210
6.5.2 Factual Queries......Page 212
6.5.3 Open Domain Question Answering......Page 213
6.5.4 Semantic Headers......Page 215
6.6 Image Search......Page 216
6.6.1 Text-Based Image Search......Page 217
6.6.2 Content-Based Image Search......Page 218
6.6.3 VisualRank......Page 220
6.6.5 Image Search for Finding Location-Based Information......Page 222
6.7 Special Purpose Search Engines......Page 223
CHAPTER 7 NAVIGATING THE WEB......Page 231
7.1.2 Hyperlinks and Surfing......Page 233
7.1.3 Web Site Design and Usability......Page 234
7.2.1 The Basic Browser Tools......Page 235
7.2.2 The Back and Forward Buttons......Page 236
7.2.3 Search Engine Toolbars......Page 237
7.2.4 The Bookmarks Tool......Page 238
7.2.6 Identifying Web Pages......Page 241
7.2.7 Breadcrumb Navigation......Page 243
7.2.8 Quicklinks......Page 244
7.2.9 Hypertext Orientation Tools......Page 245
7.2.10 Hypercard Programming Environment......Page 246
7.3 Navigational Metrics......Page 247
7.3.1 The Potential Gain......Page 248
7.3.2 Structural Analysis of a Web Site......Page 250
7.3.3 Measuring the Usability of Web Sites......Page 251
7.4.1 Three Perspectives on Data Mining......Page 252
7.4.2 Measuring the Success of a Web Site......Page 253
7.4.4 E-Metrics......Page 255
7.4.5 Web Analytics Tools......Page 256
7.4.6 Weblog File Analyzers......Page 257
7.4.7 Identifying the Surfer......Page 258
7.4.8 Sessionizing......Page 259
7.4.10 Markov Chain Model of Web Site Navigation......Page 260
7.4.11 Applications of Web Usage Mining......Page 265
7.4.12 Information Extraction......Page 266
7.5 The Best Trail Algorithm......Page 267
7.5.3 Developing a Trail Engine......Page 268
7.6.1 How to Visualize Navigation Patterns......Page 274
7.6.2 Overview Diagrams and Web Site Maps......Page 275
7.6.3 Fisheye Views......Page 278
7.6.4 Visualizing Trails within a Web Site......Page 279
7.6.5 Visual Search Engines......Page 280
7.6.6 Social Data Analysis......Page 281
7.7.1 Real-World Web Usage Mining......Page 284
7.7.2 The Museum Experience Recorder......Page 286
7.7.3 Navigating in the Real World......Page 287
CHAPTER 8 THE MOBILE WEB......Page 294
8.1 The Paradigm of Mobile Computing......Page 295
8.1.1 Wireless Markup Language......Page 296
8.1.2 The i-mode Service......Page 297
8.2.1 M-commerce......Page 299
8.2.2 Delivery of Personalized News......Page 300
8.2.3 Delivery of Learning Resources......Page 303
8.3.1 Mobile Web Browsers......Page 304
8.3.3 Text Entry on Mobile Devices......Page 306
8.3.4 Voice Recognition for Mobile Devices......Page 308
8.3.5 Presenting Information on a Mobile Device......Page 309
8.4.1 Click-Distance......Page 313
8.4.2 Adaptive Mobile Portals......Page 314
8.4.3 Adaptive Web Navigation......Page 316
8.5 Mobile Search......Page 317
8.5.1 Mobile Search Interfaces......Page 318
8.5.2 Search Engine Support for Mobile Devices......Page 319
8.5.3 Focused Mobile Search......Page 321
8.5.4 Laid Back Mobile Search......Page 322
8.5.5 Mobile Query Log Analysis......Page 323
8.5.6 Personalization of Mobile Search......Page 324
8.5.7 Location-Aware Mobile Search......Page 325
CHAPTER 9 SOCIAL NETWORKS......Page 331
9.1 What is a Social Network?......Page 333
9.1.1 Milgram’s Small-World Experiment......Page 334
9.1.2 Collaboration Graphs......Page 335
9.1.4 The Social Web......Page 336
9.1.5 Social Network Start-Ups......Page 338
9.2.1 Social Network Terminology......Page 342
9.2.3 Centrality......Page 344
9.2.4 Web Communities......Page 346
9.3 Peer-to-Peer Networks......Page 348
9.3.1 Centralized P2P Networks......Page 349
9.3.2 Decentralized P2P Networks......Page 350
9.3.3 Hybrid P2P Networks......Page 352
9.3.5 BitTorrent File Distribution......Page 353
9.3.7 Incentives in P2P Systems......Page 354
9.4.1 Amazon.com......Page 355
9.4.2 Collaborative Filtering Explained......Page 356
9.4.3 User-Based Collaborative Filtering......Page 357
9.4.4 Item-Based Collaborative Filtering......Page 359
9.4.6 Content-Based Recommendation Systems......Page 360
9.4.7 Evaluation of Collaborative Filtering Systems......Page 362
9.4.9 A Case Study of Amazon.co.uk......Page 363
9.4.10 The Netflix Prize......Page 364
9.4.11 Some Other Collaborative Filtering Systems......Page 368
9.5 Weblogs (Blogs)......Page 369
9.5.2 Blogspace......Page 370
9.5.4 Spreading Ideas via Blogs......Page 371
9.5.5 The Real-Time Web and Microblogging......Page 372
9.6 Power-Law Distributions in the Web......Page 374
9.6.1 Detecting Power-Law Distributions......Page 375
9.6.3 A Law of Surfing and a Law of Participation......Page 377
9.6.4 The Evolution of the Web via Preferential Attachment......Page 379
9.6.5 The Evolution of the Web as a Multiplicative Process......Page 381
9.6.6 The Evolution of the Web via HOT......Page 382
9.6.7 Small-World Networks......Page 383
9.6.8 The Robustness and Vulnerability of a Scale-Free Network......Page 388
9.7.1 Social Navigation......Page 391
9.7.2 Social Search Engines......Page 392
9.7.3 Navigation Within Social Networks......Page 395
9.7.4 Navigation Within Small-World Networks......Page 397
9.8 Social Tagging and Bookmarking......Page 401
9.8.2 YouTube—Broadcast Yourself......Page 402
9.8.4 Communities Within Content Sharing Sites......Page 404
9.8.6 Folksonomy......Page 405
9.8.7 Tag Clouds......Page 406
9.8.8 Tag Search and Browsing......Page 407
9.8.9 The Efficiency of Tagging......Page 410
9.8.10 Clustering and Classifying Tags......Page 411
9.9 Opinion Mining......Page 412
9.9.1 Feature-Based Opinion Mining......Page 413
9.9.2 Sentiment Classification......Page 414
9.10 Web 2.0 and Collective Intelligence......Page 415
9.10.1 Ajax......Page 416
9.10.2 Syndication......Page 417
9.10.3 Open APIs, Mashups, and Widgets......Page 418
9.10.5 Collective Intelligence......Page 420
9.10.6 Algorithms for Collective Intelligence......Page 423
9.10.7 Wikipedia—The World’s Largest Encyclopedia......Page 424
9.10.8 eBay—The World’s Largest Online Trading Community......Page 429
CHAPTER 10 THE FUTURE OF WEB SEARCH AND NAVIGATION......Page 441
BIBLIOGRAPHY......Page 446
INDEX......Page 485