Advances in Web Mining and Web Usage Analysis: 8th International Workshop on Knowledge Discovery on the Web, WebKDD 2006 Philadelphia, USA, August 20,

This book constitutes the thoroughly refereed post-proceedings of the 8th International Workshop on Mining Web Data, WEBKDD 2006, held in Philadelphia, PA, USA in August 2006 in conjunction with the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2006.

The 13 revised full papers presented together with a detailed preface went through two rounds of reviewing and improvement and were carefully selected for inclusion in the book. The enhanced papers show new technologies from areas like adaptive mining methods, stream mining algorithms, techniques for the Grid, especially flat texts, documents, pictures and streams, usability, e-commerce applications, personalization, and recommendation engines.

Author(s): Olfa Nasraoui, Myra Spiliopoulou, Jaideep Srivastava, Bamshad Mobasher, Brij Masand
Series: Lecture Notes in Artificial Intelligence 4811
Edition: 1
Publisher: Springer
Year: 2007

Language: English
Pages: 258

Front matter......Page 1
Introduction......Page 12
Related Work......Page 13
Definitions......Page 15
Evaluating the Quality of a Shortcutting Algorithm......Page 16
Perkowitz' Shortcutting Algorithm......Page 17
The MinPath Algorithm......Page 18
The CacheCut Algorithm......Page 19
Increasing the Size of the Underlying Cache......Page 20
CacheCut Implementation......Page 21
Motivation......Page 22
Experimental Results......Page 24
Choosing the Best Parameters for CacheCut......Page 25
Comparing CacheCut to Other Shortcutting Algorithms......Page 27
FrontCache Performance......Page 29
Conclusions and Future Work......Page 30
Introduction......Page 32
Related Work......Page 33
Background......Page 35
Usage Graph......Page 36
Distance Measure Using Floyd Warshall’s Algorithm......Page 37
Experimental Results......Page 38
Example Distances......Page 39
Evaluation Methodologies......Page 40
Comparison of Results......Page 41
Conclusions and Future Work......Page 45
References......Page 46
Motivation......Page 47
Contribution......Page 48
Related Work......Page 49
Examined Issues......Page 50
The Data Preprocessing/Discretization Step......Page 53
The Biclustering Process......Page 54
The Nearest Bicluster Algorithm......Page 56
Experimental Configuration......Page 58
Results for Tuning Nearest Biclusters......Page 59
Comparative Results for Effectiveness......Page 60
Comparative Results for Efficiency......Page 61
Examination of Additional Factors......Page 62
Conclusions......Page 64
Introduction......Page 67
Graph Document Models: An Overview......Page 69
Categorization Model Induction Based on a Hybrid Document Representation......Page 71
The Hybrid Smart Approach......Page 72
Frequent Sub-graph Extraction Problem......Page 73
Preprocessing and Representation......Page 75
Comparison of Hybrid and Bag-of-Words Representations Using the C4.5 Classifier......Page 76
Comparison of Hybrid and Bag-of-Words Representations Using Probabilistic Naïve Bayes Classifier......Page 79
References......Page 81
Introduction......Page 83
Related Work......Page 85
Score and Keyword Frequency Propagation......Page 86
Relative Content......Page 87
Intuition......Page 88
Quantifying the Degree of "OR"ness of a Keyword Vector......Page 89
Evaluating the Relative Generality......Page 90
Relative-Content Preserving Keyword Propagation Between a Pair of Entries......Page 91
Keyword Propagation Across a Complex Structure......Page 92
Setup......Page 95
Hierarchically-Informed Keyword Propagation vs. Non-Propagation-Based Context Representation Schemes......Page 97
Statistical Validation of the Ground Truth......Page 100
Conclusion......Page 101
Introduction......Page 103
Related Studies......Page 105
Research Design......Page 106
Data Analysis......Page 109
Session Lengths......Page 113
Session Durations......Page 115
Discussion......Page 116
Conclusion......Page 118
References......Page 119
Introduction......Page 121
Optimal Sequence Alignment Based Session Similarity......Page 123
Prediction Model Using Clickstream Trees......Page 124
Concept Hierarchy and Recommendations......Page 125
Concept Hierarchy......Page 126
Similarity Model......Page 127
Implementation......Page 129
Experiments and Results......Page 131
Experimental Setup......Page 132
Comparison on number of recommendations made......Page 133
Comparison on number of clusters......Page 134
References......Page 135
Introduction......Page 138
Related Work......Page 139
The Problem......Page 141
MovieLens Data Set......Page 142
Data Model: Correlation Graph......Page 143
ItemRank Algorithm......Page 145
Complexity Issues......Page 148
ItemRank as a Linear Operator and Convergence......Page 149
Experimental Results......Page 151
Conclusions......Page 155
Introduction......Page 158
The Problem Domain......Page 160
Proposed Approach......Page 161
Other CF Algorithms Considered......Page 163
Datasets......Page 168
Evaluation Metrics......Page 169
Results......Page 171
Discussion......Page 174
Conclusion......Page 175
Introduction......Page 178
Attack Models......Page 180
Attack Profile Classification......Page 181
Generic Attributes......Page 182
Model-Specific Attributes......Page 183
Methodology......Page 186
Evaluation Metrics......Page 187
Experimental Results......Page 188
Conclusion......Page 196
Introduction......Page 198
Statistical Sentiment Classification......Page 201
Extracting Web Log Posts on Topic......Page 202
Dataset Representation......Page 203
Support Vector Machines......Page 205
Feature Selection......Page 206
Experiments......Page 207
Comparing Different Machine Learning Techniques......Page 208
Comparing Different Feature Sets......Page 209
Comparing Different Categorical Constituencies......Page 212
Conclusions and Future Work......Page 213
References......Page 214
Introduction......Page 218
Quality of Query Sessions......Page 220
Data Processing Algorithms......Page 222
Data Mart......Page 223
Data Preprocessing......Page 224
Data Pre-analysis......Page 226
Clustering of the Data in Homogeneous Groups......Page 228
Analysis of Resulting Clusters for Query Sessions: User Types and Quality Profiles......Page 231
Analysis of Resulting Clusters and Subclusters: Documents......Page 232
Rule and Tree Induction on Query Dataset......Page 233
Rule and Tree Induction on Document Dataset......Page 235
Conclusions......Page 237
Introduction......Page 238
Definitions......Page 239
Web Collections......Page 241
Algorithm for Duplicate Detection......Page 242
Results About Duplicates......Page 243
Log-Based Content Evolution Study......Page 245
Algorithm Description......Page 246
Experimental Setup......Page 248
Experimental Results......Page 249
Algorithm Description......Page 252
Chilean Web Content Evolution......Page 253
Related Work......Page 255
Concluding Remarks......Page 256
Back matter......Page 258