Data Analytics for Social Microblogging Platforms

Data Analytics for Social Microblogging Platforms explores the nature of microblog datasets and the broader field concerned with information, data and knowledge in the context of natural language processing. The book investigates a range of computational techniques that enable data and computer scientists to recognize patterns in these vast datasets, including machine learning, data mining algorithms, rough set and fuzzy set theory, evolutionary computation, combinatorial pattern matching, clustering, summarization and classification. Chapters cover basic research methodologies for online microblogging data analysis, community detection, summarization application development, performance evaluation and applications in big data.

Author(s): Soumi Dutta, Asit Kumar Das, Saptarshi Ghosh, Debabrata Samanta
Series: Hybrid Computational Intelligence for Pattern Analysis and Understanding
Publisher: Academic Press
Year: 2022

Language: English
Pages: 327
City: London

Front Cover
Data Analytics for Social Microblogging Platforms
Copyright
Contents
About the authors
Preface
Acknowledgments
About the book
Part 1 Introduction of intelligent information filtering and organization systems for social microblogging sites
1 Introduction to microblogging sites
1.1 Introduction
1.2 Online social networking sites
1.3 Advantages and disadvantages of social networking
1.4 Microblogging sites
1.4.1 The best microblogging site list includes the following names
1.5 Information of social microblogging sites
1.6 Challenges in using microblogging sites
1.7 Background of the Twitter microblogging site
1.8 Motivation of research
1.8.1 Information filtering
1.8.2 Information organization
1.8.3 Clustering
1.8.3.1 HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise)
1.9 Challenges and requirements of multi-document summarization
1.10 Contributions of this research
1.10.1 Attribute selection for spam classification
1.10.2 Microblog clustering
1.10.2.1 Graph-based clustering algorithm
1.10.2.2 Genetic algorithm-based tweet clustering
1.10.2.3 Clustering based on feature selection
1.10.2.4 Clustering using dimensionality reduction techniques
1.10.2.5 Comparative analysis
1.10.3 Summarization of OSN data (Twitter data)
1.10.3.1 Motivation of ensemble summarization
1.10.3.2 Proposed ensemble summarization algorithms
1.11 Conclusion
References
2 Literature review on data analytics for social microblogging platforms
2.1 Introduction
2.2 Attribute selection and its application in spam detection
2.2.1 Attribute selection methods
2.2.1.1 Filter method
2.2.1.2 Wrapper method
2.2.1.3 Other attribute selection algorithms
2.2.2 Spam detection
2.2.2.1 Spam detection in OSM
2.2.2.2 Attribute selection for spam detection
2.2.3 Contributions of this chapter
2.3 Summarization with various methods
2.3.1 Automatic document summarization
2.3.2 Summarization of microblogs
2.3.3 Microblog summarizing with comparative study
2.3.4 Summarization validation
2.3.5 Contributions of this section
2.4 Cluster analysis of microblogs
2.4.1 Clustering algorithms
2.4.1.1 Partition-based clustering
2.4.1.2 Hierarchical clustering
2.4.1.3 Density-based clustering
2.4.1.4 Graph clustering algorithms
2.4.2 Cluster validation indices
2.4.3 Clustering in online social microblogging sites
2.5 Conclusion
References
3 Data collection using Twitter API
3.1 Introduction
3.2 Experimental dataset description
3.2.1 Experimental dataset for cluster analysis and summarization
3.2.2 Experimental dataset for attribute selection
3.2.2.1 Datasets from prior works
3.2.2.2 Collected dataset of spam tweets
3.2.2.3 Attributes for spam vs. legitimate classification
3.3 Data preprocessing
3.4 Removal of user names and URLs
3.5 Converting emojis and emoticons to words
3.6 Conclusion
References
Part 2 Microblogging dataset applications and implications
4 Attribute selection to improve spam classification
4.1 Introduction
4.2 Literature survey
4.2.1 Attribute selection methods
4.2.1.1 Filter methods
4.2.1.2 Wrapper methods
4.2.1.3 Other attribute selection algorithms
4.2.1.4 Spam detection in OSM
4.2.1.5 Attribute selection for spam detection
4.3 Methodology for classification
4.3.1 Rough set theory fundamentals
4.3.2 Attribute selection algorithm
4.4 Experimental dataset
4.4.1 Datasets from previous works
4.4.2 Collected dataset of spam tweets
4.5 Evaluating performance
4.5.1 Baseline attribute selection strategies
4.5.2 Classifiers used
4.5.3 Evaluation measures
4.5.4 Results
4.6 Conclusion
References
5 Ensemble summarization algorithms for microblog summarization
5.1 Introduction
5.2 Base summarization algorithms
5.3 Unsupervised ensemble summarization
5.3.1 Baseline: voting approach
5.3.2 EnGraphSumm: proposed ensemble algorithm
5.4 Supervised ensemble summarization
5.4.1 Baseline: weighted voting approach
5.4.2 Learn2Summ: proposed ensemble algorithm
5.5 Experiments and results
5.5.1 Experimental setup
5.5.2 Performance of base algorithms
5.5.3 Performance of unsupervised ensemble algorithms
5.5.4 Performance of supervised ensemble algorithms
5.6 Demonstrating the input and output of summarization algorithms through an example
5.7 Conclusion
References
6 Graph-based clustering technique for microblog clustering
6.1 Introduction
6.2 Related work
6.3 Background studies
6.3.1 Community detection
6.3.2 WordNet
6.3.2.1 SumBasic
6.4 Proposed methodology
6.4.1 Dataset
6.4.2 Similarity identification
6.4.3 Dataset preprocessing
6.4.4 Similarity measure
6.4.5 Graph generation
6.4.6 Summarization
6.5 Results and discussion
6.6 Conclusion
References
7 Genetic algorithm-based microblog clustering technique
7.1 Introduction
7.2 Related work
7.2.1 Clustering of tweets
7.2.2 Genetic algorithms
7.3 Clustering using genetic algorithms and K-means
7.4 Evaluating performance
7.5 Experimental dataset
7.5.1 Parameter selection for the algorithms
7.5.2 Baseline clustering algorithms
7.5.2.1 Partition-based clustering
7.5.2.2 Hierarchical clustering
7.5.2.3 Density-based clustering
7.5.2.4 Graph clustering algorithms
7.5.3 Cluster validation indices
7.5.4 Comparison of proposed methodology with baselines
7.6 Conclusion
References
Part 3 Attribute selection to improve spam classification
8 Feature selection-based microblog clustering technique
8.1 Introduction
8.2 Related work
8.3 Microblog clustering algorithms
8.3.1 Data preprocessing
8.4 Dataset for clustering algorithms
8.5 Experimental results
8.5.1 Cluster validation indices
8.5.2 Metrics for evaluating clustering
8.6 Conclusion
References
9 Dimensionality reduction techniques in microblog clustering models
9.1 Introduction
9.2 Literature survey
9.3 Proposed methodology
9.4 Dataset
9.4.1 Generating golden standard clusters
9.5 Results and discussion
9.5.1 Baseline clustering algorithms
9.5.1.1 Partition-based clustering
9.5.1.2 Hierarchical clustering
9.5.1.3 Density-based clustering
9.5.1.4 Graph clustering algorithms
9.5.2 Cluster validation indices
9.6 Conclusion
References
10 Conclusion and future directions
10.1 Introduction
10.2 Summary of contributions
10.3 Future research directions
References
Index
Back Cover