Intelligent Text Categorization and Clustering

Publisher: Springer, 2009, 127 pp.
Automatic text categorization and clustering are becoming more and more important as the amount of text in electronic format grows and access to it becomes more necessary and widespread. Well-known applications are spam filtering and web search, but a large number of everyday uses exist (intelligent web search, data mining, law enforcement, etc.). Currently, researchers are employing many intelligent techniques for text categorization and clustering, ranging from support vector machines and neural networks to Bayesian inference and algebraic methods such as Latent Semantic Indexing.
This volume offers a wide spectrum of research work developed for intelligent text categorization and clustering. In the following, we give a brief introduction to the chapters included in this book.
In Chapter 1, the authors present the use of attribute selection techniques to define a subset of genes related to specific characteristics, such as the occurrence of cancer. Through combinations of search methods and evaluation procedures, the authors show that, in most of the combinations, the data mining algorithm runs faster, mining performance such as predictive accuracy improves, and the results become easier to comprehend. The authors obtained the best results with wrapper approaches and sequential search.
In Chapter 2, the authors propose a new preprocessing technique for online handwriting. The approach first removes the hooks of the strokes by using a changed-angle threshold together with a length threshold, then filters the noise with a smoothing technique that combines the cubic spline and equal-interpolation methods. Finally, the handwriting is normalised.
In Chapter 3, the authors explore the clustering of unstructured document collections. They explore a simple procedure that not only considerably reduces the dimension of the feature space, and hence the processing time, but also produces clustering performance comparable to, or even better than, that obtained with the full set of terms.
In Chapter 4, the authors investigate the application of a query expansion technique to improve cross-language information retrieval in English and Thai, as well as the potential to apply the technique to other intelligent systems such as tutoring systems. To evaluate query expansion, they attempt to find out whether the expanded terms are useful for the search.
In Chapter 5, the authors provide a fuzzy partition and a prototype for each cluster by optimizing a criterion that depends on the chosen dissimilarity function. They include experiments on benchmark data sets, carried out to compare the accuracy of each function. To analyse the results, they apply an external criterion that compares different partitions of the same data set.
In Chapter 6, the authors describe a system for cluster analysis of hypertext documents based on genetic algorithms. The experimental results demonstrate the system's effectiveness in grouping similar documents.
Gene Selection from Microarray Data
Preprocessing Techniques for Online Handwriting Recognition
A Simple and Fast Term Selection Procedure for Text Clustering
Bilingual Search Engine and Tutoring System Augmented with Query Expansion
Comparing Clustering on Symbolic Data
Exploring a Genetic Algorithm for Hypertext Documents Clustering

Author(s): Nedjah N., Macedo Mourelle L., Kacprzyk J., França F.M.G., Souza A.F. (eds.)

Language: English
Tags: Computer Science and Engineering; Artificial Intelligence; Computational Linguistics