Data Management for Multimedia Retrieval

This document was uploaded by one of our users. The uploader already confirmed that they had the permission to publish it. If you are author/publisher or own the copyright of this documents, please report to us by using this DMCA report form.

Simply click on the Download Book button.

Yes, Book downloads on Ebookily are 100% Free.

Sometimes the book is free on Amazon As well, so go ahead and hit "Search on Amazon"

Multimedia data require specialized management techniques because the representations of color, time, semantic concepts, and other underlying information can be drastically different from one another. The user's subjective judgment can also have significant impact on what data or features are relevant in a given context. These factors affect both the performance of the retrieval algorithms and their effectiveness. This textbook on multimedia data management techniques offers a unified perspective on retrieval efficiency and effectiveness. It provides a comprehensive treatment, from basic to advanced concepts, that will be useful to readers of different levels, from advanced undergraduate and graduate students to researchers and to professionals. After introducing models for multimedia data (images, video, audio, text, and web) and for their features, such as color, texture, shape, and time, the book presents data structures and algorithms that help store, index, cluster, classify, and access common data representations. The authors also introduce techniques, such as relevance feedback and collaborative filtering, for bridging the ''semantic gap'' and present the applications of these to emerging topics, including web and social networking.

Author(s): K. Selçuk Candan, Maria Luisa Sapino
Publisher: Cambridge University Press
Year: 2010

Language: English
Pages: 500

Title......Page 5
Copyright......Page 6
Contents......Page 7
Preface......Page 11
1.1 Heterogeneity......Page 13
1.1.1 Complex Media Objects......Page 14
1.1.2 Semantic Gap......Page 18
1.2.1 Reasons for Imprecision and Subjectivity......Page 20
1.2.2 Impact on Query Formulation and Processing......Page 22
1.3 Components of a Multimedia Database Management System......Page 24
1.4 Summary......Page 31
2 Models for Multimedia Data......Page 32
Schema and Constraints......Page 33
Queries, Relational Calculus, and SQL......Page 34
Relational Algebra for Query Processing......Page 36
Objects, Entities, and Encapsulation......Page 37
Object-Relational Databases......Page 39
2.1.4 Semi-Structured Models......Page 40
2.1.5 Flexible Models and RDF......Page 42
2.2.2 Distance Measures and Metrics......Page 44
2.2.3 Common Representations: Vectors, Strings, Graphs, Fuzzyand Probabilistic Representations......Page 45
2.3.1 Color Models......Page 46
RGB Model......Page 47
CIE, CIELAB, and HSV......Page 49
Color-Based Image Representation Using Histograms......Page 63
2.3.2 Texture Models......Page 64
Fractals......Page 66
Texture Histograms......Page 67
2.3.3 Shape Models......Page 69
Segmentation......Page 70
Watershed Transformation......Page 71
Describing the Boundaries of the Shapes......Page 72
Shape Histograms......Page 73
Hough Transform......Page 74
2.3.4 Local Feature Descriptors (Set-Based Models)......Page 76
Basic Timeline Model......Page 79
Extended Timeline Model......Page 80
Difference Constraints......Page 82
Causal Models......Page 83
2.3.5.3 Interval-Based Logical Models......Page 84
2.3.5.4 Hybrid Models......Page 86
2.3.5.5 Graph-Based Temporal Models......Page 87
Timed Petri Nets......Page 88
Timed Automata......Page 89
2.3.5.6 Time Series......Page 91
2.3.5.7 Temporal Similarity and Distance Measures......Page 92
Temporal Distance – Timeline Model......Page 93
Temporal Distance – Extended (Flexible) Timeline Model......Page 95
Temporal Distance/Similarity – Constraint-Based Models......Page 96
2.3.6 Spatial Models......Page 98
2.3.6.1 Fields and Their Directional and Topological Relationships......Page 99
UID-Matrix......Page 101
Nine-Intersection Matrix......Page 102
Spatial Orientation Graph......Page 103
Plane Sweep......Page 104
2.3.6.3 Exact Retrieval Based on Spatial Information......Page 105
2.3.6.4 Spatial Similarity......Page 106
Spatial Orientation Graph and Similarity Computation......Page 107
2D-String......Page 108
2D R-String......Page 109
2D G-String, 2D C-String, and 2D C+-String......Page 110
2D B-String, 2D B -String, and 2D Z-String......Page 111
2D-PIR and Topology Neighborhood Graph......Page 112
2.3.7 Audio Models......Page 114
QBIC......Page 116
VCSQL/SEMCOG......Page 120
SQL/MM......Page 121
2.5 Summary......Page 122
3.1 Vector Space Models......Page 123
3.1.1 Vector Space......Page 124
3.1.3 Comparison of Objects in the Vector Space......Page 126
3.2.1 Example Application: User Experience Sequences......Page 133
3.2.2 Edit Distance Measures......Page 134
3.3 Graphs and Trees......Page 135
3.3.1 Operations on Graphs and Trees......Page 137
3.3.2 Graph Similarity and Edit Distance......Page 138
3.4.1 Fuzzy Sets and Predicates......Page 139
3.4.2.1 Min, Product, and Average......Page 140
3.4.2.2 Properties of the Common Fuzzy Operators......Page 142
3.4.3 Relative Importance of Query Criteria......Page 143
3.4.3.1 Measuring Relative Importance......Page 144
3.4.3.2 Fagin’s Generic Importance Weighting Function......Page 145
3.5 Probabilistic Models......Page 147
3.5.1.1 Mean, Variance, and Normal Distribution......Page 148
3.5.1.2 Conditional Probability, Independence, Correlation, and Covariance......Page 150
3.5.2 Possible-Worlds Interpretation of Uncertainty......Page 151
3.5.2.1 Probabilistic Relations......Page 152
3.5.2.3 Queries in Probabilistic Databases......Page 153
Disjoint-Independence......Page 154
Tuple-Independence......Page 156
3.5.3 Bayesian Models: Bayesian Networks, Languageand Generative Models......Page 159
3.5.3.2 Language Models......Page 160
Generative Query Models......Page 161
Dirichlet Models......Page 163
3.5.4 Markovian Models......Page 164
Stationary Distribution and Proximity......Page 165
3.6 Summary......Page 166
4 Feature Quality and Independence: Why and How?......Page 167
4.1 Dimensionality Curse......Page 168
4.2 Feature Selection......Page 169
4.2.1 False Hits and Misses......Page 172
4.2.2 Feature Significance in the Information-Theoretic Sense......Page 174
4.2.3 Feature Significance in Terms of Data Distribution......Page 175
4.2.5 Intrinsic Dimensionality of a Data Set......Page 177
4.2.6 Principal Component Analysis......Page 180
4.2.6.1 Selecting the Number of Dimensions......Page 183
4.2.7 Common Factor Analysis......Page 184
4.2.8 Selecting an Independent Subset of the Original Features......Page 185
4.2.9 Dimensionality Reduction Using Fixed Basis......Page 186
4.2.9.1 Discrete Cosine Transform......Page 188
4.2.9.2 Discrete Wavelet Transform......Page 189
4.3 Mapping from Distances to a Multidimensional Space......Page 191
4.3.1 Multidimensional Scaling......Page 192
4.3.2 FastMap......Page 194
4.4.1 Singular Value Decomposition (and Latent Semantic Indexing)......Page 196
4.4.1.1 Latent Semantic Analysis (LSA)......Page 197
SVD-Update......Page 198
More General Database Updates......Page 199
4.4.2.1 Aspect Model......Page 200
4.4.3 CUR Decomposition......Page 201
4.4.4.1 Tensor Basics......Page 202
Diagonal Tensor Decompositions......Page 203
4.5 Summary......Page 204
5.1 Inverted Files......Page 205
5.1.2 Sorting and Compressing Inverted Lists for Efficient Retrieval......Page 207
5.2.1 False Positives......Page 208
Conjunctive Queries......Page 209
Blocked Signature Files......Page 211
5.2.3 Data Characteristics and Signature File Performance......Page 212
5.2.4 Word-Overlap–Based Fuzzy Retrieval Using Signature Files......Page 213
5.3 Signature- and Inverted-File Hybrids......Page 214
5.4 Sequence Matching......Page 215
5.4.2 Suffix Trees and Suffix Arrays......Page 216
Directed Acyclic Word Graphs......Page 217
Bit Parallelism......Page 218
5.5.1 Finite Automata......Page 219
5.5.2 Dynamic Programming–Based Edit Distance Computation......Page 220
5.5.3 Bit Parallelism for Direct NFA Simulation......Page 221
rho-Grams......Page 222
Locality Sensitive Hashing......Page 224
5.5.5 Compression-Based Sequence Comparison......Page 225
5.6.1 Regular Expressions......Page 226
5.6.3 RE-Trees......Page 227
5.7 Multiple Sequence Matching and Filtering......Page 228
5.7.2 Hash-Based Multiple Pattern Filtering......Page 229
5.8 Summary......Page 230
6.1.1 GraphGrep......Page 232
6.1.2.2 Graph Probes......Page 233
6.1.3.1 Step (i): MDS-Based Mapping into a Vector Space......Page 234
6.1.3.2 Step (ii): Procrustes-Based Alignment of Vector Spaces......Page 235
6.2 Tree Matching......Page 236
Ordered Trees......Page 237
Unordered Trees......Page 239
Isolated-Subtree Distance......Page 240
6.2.5 Tree Filtering......Page 241
6.2.5.4 String Encodings......Page 242
Other Encodings......Page 243
6.2.5.5 Propagation Vector-Based Tree Comparison......Page 244
6.3.1 Web Search......Page 246
6.3.1.1 Hubs and Authorities......Page 247
6.3.1.2 PageRank......Page 249
6.3.2 Associative Retrieval and Spreading Activation......Page 250
6.3.2.1 Constrained Leaky Capacitor Model......Page 251
6.3.2.3 Branch-and-Bound Spreading Activation......Page 252
6.3.3 Collaborative Filtering......Page 253
6.3.4 Social Networking......Page 254
6.3.5.2 Triangle and Bipartite Core Laws......Page 255
6.3.6 Proximity Search Queries in Graphs......Page 256
6.4 Summary......Page 257
7 Indexing, Search, and Retrieval of Vectors......Page 259
7.1 Space-Filling Curves......Page 262
7.1.1 Fractals......Page 264
7.1.2 Hilbert Curve......Page 265
7.1.3 Z-Order Curve......Page 266
7.1.4 Executing Range Queries Using Hilbert and Z-order Curves......Page 267
7.2 Multidimensional Index Structures......Page 268
7.2.1 Grid Files......Page 269
Insertion......Page 270
Deletions......Page 272
7.2.2.2 MX Quadtrees......Page 274
7.2.2.4 Summary of Quadtrees......Page 275
7.2.3 KD-Trees......Page 276
7.2.3.2 Adaptive KD-Trees......Page 277
7.2.4.2 LSD-Tree......Page 279
7.2.4.3 Hybrid-Tree......Page 280
7.2.5 R-Trees and Their Variants......Page 281
7.2.5.1 Hilbert R-Tree......Page 283
7.2.5.2 R+-Tree......Page 284
7.2.5.4 Trees with Spherical Bounding Regions......Page 285
7.2.5.6 X-Tree......Page 286
7.2.6 TV-Trees......Page 287
7.2.7 Pyramid Trees......Page 291
7.2.8 VA-Files......Page 293
7.3 Summary......Page 294
8 Clustering Techniques......Page 295
Cluster Homogeneity/Compactness......Page 296
Cluster Separation......Page 297
Clustering Modularity......Page 298
8.2.1 Connected Components......Page 299
8.2.2 Maximal Clique Clustering......Page 300
8.2.3.2 Random Walk–Based (or Markovian) Clustering......Page 301
8.2.4 Minimum Cut–Based Clustering......Page 302
8.2.5 Adaptive Thresholding......Page 303
8.3.1 Single-Pass Iterative Clustering......Page 304
Leader Selection......Page 305
8.3.3 K-Means and Iterative Improvement......Page 306
8.3.3.1 K-Means and Related Algorithms......Page 307
8.3.4.1 Probabilistic Estimation......Page 308
8.3.4.2 Incremental Selection of the Number of Clusters......Page 309
8.4 Multiconstraint Partitioning......Page 310
8.5 Mixture Model Based Clustering......Page 311
8.6.1 Confidence Clustering......Page 312
8.6.2 Perturbation-Based Clustering......Page 313
8.7.2 Clustering Using Self-Organizing Maps......Page 314
8.8 Co-clustering......Page 316
8.8.2 Information-Theoretical Co-clustering......Page 317
8.8.3 Cross-Association......Page 319
8.9 Summary......Page 320
9.1 Decision Tree Classification......Page 321
9.1.1 Selecting the Feature Dimension for Partitioninga Given Set of Observations......Page 322
9.1.3 Random Forests......Page 324
9.3.1 Linear Discriminant Analysis......Page 325
9.3.2 Overfitting and Regularization......Page 326
9.3.3 Max-Margin Classification and SVMs......Page 327
9.3.4 Lagrangian Formulation......Page 329
9.3.5 Classification with Non–"Linearly Separable'' Observations......Page 330
9.3.8 Complexity of SVMs......Page 331
9.4 Rule-Based Classification......Page 332
9.4.1 From Decision Trees to Rule-Based Classifiers......Page 333
9.4.2 Simplifying Rule-Based Classifiers......Page 334
9.5.1 Fuzzy Decision Trees for Fuzzy Classification......Page 335
9.5.2 Learning Fuzzy Classification Rules Directly from Observations......Page 336
9.6.1 Independence Assumption......Page 338
9.6.2 Relaxing the Independence Assumption......Page 339
9.7 Hidden Markov Models......Page 340
9.7.1 Markov Models with Hidden States: Formal Definition......Page 341
9.7.2.1 Forward Algorithm......Page 342
9.7.3 Classification by Predicting the Sequence of Hidden States......Page 343
9.7.4.2 Baum-Welch Method......Page 344
9.7.4.3 Expectation Maximization (EM)......Page 345
9.8 Model Selection: Overfitting Revisited......Page 346
9.8.2 Akaike's Information Criterion (AIC)......Page 347
9.9 Boosting......Page 348
9.10 Summary......Page 350
10 Ranked Retrieval......Page 351
10.1.1 Branch and Bound–Based Nearest Neighbor Search......Page 352
10.1.1.1 Nearest Neighbor Searches in Euclidean Vector Spaces......Page 353
10.1.1.2 Search Data Structures in Metric Spaces......Page 354
10.1.2 Nearest Neighbor Search without Hierarchical Partioning......Page 355
10.1.2.2 Orchard’s Algorithm......Page 356
10.1.2.3 AESA, LAESA, and TLAESA......Page 357
10.1.3 Incremental Nearest Neighbor Search......Page 358
10.1.4.2 Locality Sensitive Hashing......Page 359
10.1.5 Nearest Neighbor Search with Batch-Oriented Data Sources......Page 360
10.2 Top-k Queries......Page 361
10.2.1 Fagin’s Algorithm (FA)......Page 363
10.2.2 Threshold Algorithm (TA)......Page 365
10.2.3 No Random Access Algorithm (NRA)......Page 367
10.2.4 Partial Sorted Access Algorithm (PSA) and Minimization of Random Accesses......Page 369
10.2.5 Pre-Processing for Layer Ordering......Page 370
10.2.5.2 Robust Indexing......Page 371
10.2.6 Relaxing the Monotonicity Requirement......Page 372
Top-K Tree Pattern Query Evaluation in Weighted Graphs......Page 373
Sum-Max Monotonicity......Page 374
Horizon-Based Ranked Join (HR-Join)......Page 375
10.2.6.2 Skip-and-Prune: Cosine-Based Top-K Query Processing......Page 376
10.2.7.1 Filter-Based Implementation of Ranking Expressions......Page 378
10.2.7.2 Stop and Restart......Page 379
10.2.7.3 Specialized Top-K Join Operators......Page 381
10.2.7.4 Extended Algebraic Formulations......Page 382
10.3 Skylines......Page 384
10.3.1.1 Nested-Loops–Based Skylines......Page 385
10.3.1.2 Presorting-Based Skylines......Page 388
10.3.1.3 Divide-and-Conquer–Based Skylines......Page 389
10.3.2.1 B-trees......Page 390
10.3.2.2 Bitmap Skylines......Page 391
10.3.2.4 Branch-and-Bound Skylines......Page 392
10.3.3 Skylines with Partially Ordered Data......Page 393
10.3.3.1 Interval Mapping–Based Branch-and-Bound......Page 394
10.3.3.2 Weak Pareto Dominance and l-cuts......Page 395
10.3.4 Top-K Dominating Queries......Page 396
10.4 Optimization of Ranking Queries......Page 397
10.4.1 Cost Estimation for Multidimensional Queries and Power Law......Page 398
10.4.2 Dealing with Expensive Filtering Predicates......Page 399
10.4.3 Dealing with Expensive Join Predicates......Page 401
10.4.4 Rank-Aware Query Optimization......Page 402
10.5 Summary......Page 403
11 Evaluation of Retrieval......Page 404
11.2.1 Arithmetic and Harmonic Means......Page 405
11.2.2 Weighted Arithmetic and Harmonic Means and the Effectiveness Measure......Page 406
11.3 Systems with Ranked Results......Page 407
11.4 Single-Valued Summaries of the Precision-Recall Curve......Page 408
11.4.2 R-Precision......Page 409
11.5 Evaluating Systems Using Ranked and Graded Ground Truths......Page 410
11.5.1 Pearson's Correlation, Spearman's Rank Correlation Coefficient, and the Kendall-Tau Rank Coefficient......Page 411
11.5.3 Normalized Discounted Cumulative Gain......Page 412
11.5.4 Normalized Modified Retrieval Rank......Page 413
11.7 Statistical Significance of Assessments......Page 414
11.7.1 T-Test......Page 415
11.7.2 Wilcoxon Signed-Rank Test......Page 416
11.7.3.1 One-Way ANOVA......Page 417
11.7.3.2 Two-Way ANOVA......Page 418
11.7.3.3 ANOVA with Repeated Measures......Page 419
11.7.4 Confidence Intervals and Two-Step Sampling......Page 420
11.8 Summary......Page 421
12 User Relevance Feedback and Collaborative Filtering......Page 422
12.1 Challenges in Interpreting the User Feedback......Page 424
12.2 Alternative Ways of Using the Collected Feedback in Query Processing......Page 425
12.4 Relevance Feedback in Probabilistic Models......Page 428
12.4.1 Estimating p(oi|rel) and p(oi|rel)......Page 429
12.4.2 Estimating the Probabilities of Feature Occurrences in Relevant and Nonrelevant Objects......Page 430
12.4.3 Query Adjustment......Page 431
12.5 Relevance Feedback in Probabilistic Language Modeling......Page 432
12.5.1 Feedback using a Generative Model......Page 433
12.5.3 Negative Feedback......Page 434
12.7 Feedback Decay......Page 435
12.8 Collaborative Filtering......Page 437
Types of Collaborative Filtering Schemes......Page 438
12.8.1.1 Voting-Based CF......Page 439
12.8.1.2 Nearest Neighbor–Based CF......Page 440
12.8.1.3 Associative Retrieval–Based CF......Page 441
12.8.2.2 Clustering-Based CF......Page 442
12.8.3 Combining Model- and Memory-Based Approaches......Page 443
12.8.4 Ensemble (or Boosting) Style CF......Page 444
12.8.4.2 Combining Rankings using AdaBoost......Page 445
12.8.4.3 Combining Rankings using RankBoost......Page 447
12.9 Summary......Page 449
Bibliography......Page 451
CITED LINKS......Page 495
Index......Page 497