This volume presents recent methodological developments in data analysis and classification. It covers a wide range of topics, including methods for classification and clustering, dissimilarity analysis, graph analysis, consensus methods, conceptual analysis of data, analysis of symbolic data, statistical multivariate methods, data mining, and knowledge discovery in databases. Besides structural and theoretical results, the book presents applications in fields such as biology, micro-array analysis, cyber traffic, bank fraud detection, and text analysis. By combining new methodological advances with real applications, the volume offers researchers and practitioners new analytical tools for both theoretical research and daily practice in classification and data analysis.
Author(s): Paula Brito
Series: Studies in Classification, Data Analysis, and Knowledge Organization
Edition: 1
Publisher: Springer
Year: 2007
Language: English
Pages: 649
Cover
Studies in Classification, Data Analysis, and Knowledge Organization
Edwin Diday
Selected Contributions in Data Analysis and Classification
ISBN 9783540735588
Foreword
Preface
Contents
Part I: Analysis of Symbolic Data
Dependencies and Variation Components of Symbolic Interval-Valued Data
On the Analysis of Symbolic
Symbolic Analysis to Learn Evolving CyberTraffic
A Clustering Algorithm for Symbolic Interval Data Based on a Single Adaptive Hausdorff Distance
An Agglomerative Hierarchical Clustering Algorithm for Improving Symbolic Object Retrieval
3WaySym-Scal: Three-Way Symbolic Multidimensional Scaling
Clustering and Validation of Interval Data
Building Symbolic Objects from Data Streams
Feature Clustering Method to Detect Monotonic Chain Structures in Symbolic Data
Symbolic Markov Chains
Quality Issues in Symbolic Data Analysis
Dynamic Clustering of Histogram Data: Using the Right Metric
Part II: Clustering Methods
Beyond the Pyramids: Rigid Clustering Systems
Indirect Blockmodeling of 3-Way Networks
Clustering Methods: A History of k-Means Algorithms
Overlapping Clustering in a Graph Using k-Means and Application to Protein Interactions Networks
Species Clustering via Classical and Interval Data Representation
Looking for High Density Zones in a Graph
Block Bernoulli Parsimonious Clustering Models
Cluster Analysis Based on Posets
Hybrid k-Means: Combining Regression-Wise and Centroid-Based Criteria for QSAR
Partitioning by Particle Swarm Optimization
Part III: Conceptual Analysis of Data
Concepts of a Discrete Random Variable
Mining Description Logics Concepts with Relational Concept Analysis
Representation of Concept Description by Multivalued Taxonomic Preordonance Variables
Recent Advances in Conceptual Clustering: CLUSTER3
Symbolic Dynamics in Text: Application to Automated Construction of Concept Hierarchies
Part IV: Consensus Methods
Average Consensus and Infinite Norm Consensus: Two Methods for Ultrametric Trees
Consensus from Frequent Groupings
Consensus of Star Tree Hypergraphs
Part V: Data Analysis, Data Mining, and KDD
Knowledge Management in Environmental Sciences with IKBS: Application to Systematics of Corals of the Mascarene Archipelago
Unsupervised Learning Informational Limit in Case of Sparsely Described Examples
Data Analysis and Operations Research
Reduction of Redundant Rules in Statistical Implicative Analysis
Mining Personal Banking Data to Detect Fraud
Finding Rules in Data
Mining Biological Data Using Pyramids
Association Rules for Categorical and Tree Data
Induction Graphs for Data Mining
Part VI: Dissimilarities: Structures and Indices
Clustering of Molecules: Influence of the Similarity Measures
Group Average Representations in Euclidean Distance Cones
On Lower-Maximal Paired-Ultrametrics
A Note on Three-Way Dissimilarities and Their Relationship with Two-Way Dissimilarities
One-to-One Correspondence Between Indexed Cluster Structures and Weakly Indexed Closed Cluster Structures
Adaptive Dissimilarity Index for Gene Expression Profiles Classification
Lower (Anti-)Robinson Rank Representations for Symmetric Proximity Matrices
Density-Based Distances: a New Approach for Evaluating Proximities Between Objects. Applications in Clustering and Discriminant Analysis
Robinson Cubes
Part VII: Multivariate Statistics
Relative and Absolute Contributions to Aid Strata Interpretation
Classification and Generalized Principal Component Analysis
Locally Linear Regression and the Calibration Problem for Micro-Array Analysis
Sanskrit Manuscript Comparison for Critical Edition and Classification
Divided Switzerland
Prediction with Confidence
Which Bootstrap for Principal Axes Methods?
PCR and PLS for Clusterwise Regression on Functional Data
A New Method for Ranking n Statistical Units
About Relational Correlations
Dynamic Features Extraction in Soybean Futures Market of China
Index