This book presents the basics and recent advancements in natural language processing and information retrieval in a single volume. It will serve as an ideal reference text for graduate students and academic researchers in interdisciplinary areas of electrical engineering, electronics engineering, computer engineering, and information technology. This text emphasizes the existing problem domains and possible new directions in natural language processing and information retrieval. It discusses the importance of information retrieval with the integration of machine learning, deep learning, and word embedding. This approach supports the quick evaluation of real-time data. It covers important topics including rumor detection techniques, sentiment analysis using graph-based techniques, social media data analysis, and language-independent text mining.
Features
• Covers aspects of information retrieval in different areas including healthcare, data analysis, and machine translation
• Discusses recent advancements in language- and domain-independent information extraction from textual and/or multimodal data
• Explains models including decision making, random walk, knowledge graphs, word embedding, n-grams, and frequent pattern mining
• Provides integrated approaches of machine learning, deep learning, and word embedding for natural language processing
• Covers the latest datasets for natural language processing and information retrieval on social media platforms such as Twitter
The text is primarily written for graduate students and academic researchers in interdisciplinary areas of electrical engineering, electronics engineering, computer engineering, and information technology.
Author(s): Muskan Garg, Sandeep Kumar, Abdul Khader Jilani Saudagar
Publisher: Taylor & Francis Group
Year: 2023
Language: English
Pages: 252
1 Federated learning for natural language processing
Sergei Ternovykh and Anastasia Nikiforova
1.1 Introduction
1.1.1 Centralized (standard) federated learning
1.1.2 Aggregation algorithms
1.1.3 DSSGD
1.1.4 FedAvg
1.1.5 FedProx
1.1.6 SCAFFOLD
1.1.7 FedOpt
1.1.8 Other algorithms
1.1.9 The cross-device setting
1.1.10 The cross-silo setting
1.1.11 Horizontal federated learning
1.1.12 Vertical federated learning
1.1.13 Federated transfer learning
1.2 Split learning and split federated learning
1.2.1 Vanilla split learning
1.2.2 Configurations of split learning
1.2.3 SplitFed learning
1.3 Decentralized (peer-to-peer) federated learning
1.4 Summary
1.5 NLP through FL
1.6 Security and privacy protection
1.7 FL platforms and datasets
1.8 Conclusion
References
2 Utility-based recommendation system for large datasets using EAHUIM
Vandna Dahiya
2.1 Introduction
2.2 Related work
2.2.1 Recommendation system
2.2.2 Recommendation systems in E-commerce
2.2.3 Recommendation system with utility itemset mining
2.2.4 Prerequisites of utility itemset mining
2.2.5 Problem definition
2.3 Proposed model
2.3.1 Design of the model – recommendation system with EAHUIM
2.4 Results and discussion
2.4.1 Setup
2.4.2 Data collection and preprocessing
2.4.3 Performance evaluation
2.4.4 Limitations of the model
2.4.5 Discussions
2.5 Conclusion
References
3 Anaphora resolution: A complete view with case study
Kalpana B. Khandale and C. Namrata Mahender
3.1 Introduction
3.1.1 Issues and challenges of an Anaphora
3.1.2 Need for anaphora in NLP applications
3.1.3 Anaphora
3.1.4 Discourse anaphora
3.2 Approaches to Anaphora resolution
3.2.1 Knowledge-rich approaches
3.2.2 Corpus based approaches
3.2.3 Knowledge-poor approaches
3.3 Case study of anaphora resolution in Marathi text
3.3.1 Development of POS tagger
3.3.2 Anaphora resolution system architecture
3.4 Case study of anaphora resolution in Marathi with Python
3.4.1 Database
3.4.2 Preprocessing
3.4.3 Post-processing
3.5 Conclusion
References
4 A review of the approaches to neural machine translation
Preetpal Kaur Buttar and Manoj Kumar Sachan
4.1 Introduction
4.2 Machine translation approaches
4.3 Formulation of the NMT task
4.4 The encoder-decoder model
4.4.1 Encoder
4.4.2 Decoder
4.5 RNNs as encoder-decoder models
4.5.1 One-hot encoding
4.5.2 Variations of RNNs
4.5.3 Discussion and inferences
4.6 LSTMs: dealing with long-term dependencies and vanishing gradients
4.6.1 GRUs
4.6.2 Limitations of LSTMs
4.7 NMT with attention
4.8 Recent developments in NMT
4.8.1 Word embeddings
4.8.2 CNN-based NMT
4.8.3 Fully attention-based NMT
4.8.4 Transformer-based pre-trained models
4.8.5 Improved transformer models
4.9 NMT in low-resource languages
4.10 Vocabulary coverage problem
4.11 Datasets for machine translation
4.12 Challenges and future scope
References
5 Evolution of question-answering system from information retrieval: A scientific time travel for Bangla
Arijit Das and Diganta Saha
5.1 Introduction
5.2 The meaning and various ways of research done in the field of semantics
5.3 Semantic text retrieval: automatic question-answering system or query system
5.4 State-of-the-art performance
5.5 Latest research works on question-answering systems in related major global languages
5.6 Latest research works on question-answering system in major Indian languages
5.7 Backend database or repository used in the question-answering system
5.8 Different approaches of algorithms used in Bangla QA system research
5.9 Different algorithms used in Bangla QA system research
5.10 Results achieved by various Bangla QA systems
5.11 Conclusion
References
6 Recent advances in textual code-switching
Sergei Ternovykh and Anastasia Nikiforova
6.1 Introduction
6.2 Background
6.2.1 Theoretical approaches and types of code-switching
6.2.2 Measuring code-switching complexity
6.3 Code-switching datasets
6.3.1 Language identification
6.3.2 POS tagging
6.3.3 Named entity recognition
6.3.4 Chunking and dependency parsing
6.3.5 Sentiment analysis
6.3.6 Question-answering
6.3.7 Conversational systems
6.3.8 Machine translation
6.3.9 Natural language inference
6.4 NLP techniques for textual code-switching
6.4.1 Language modeling
6.4.2 Language identification
6.4.3 POS tagging
6.4.4 Named entity recognition
6.4.5 Dependency parsing
6.4.6 Sentiment analysis
6.4.7 Natural language inference
6.4.8 Machine translation
6.4.9 Question-answering
6.5 Evaluation of code-switched systems
6.6 Current limitations and future work
References
7 Legal document summarization using hybrid model
Deekshitha and Nandhini K.
7.1 Introduction
7.1.1 Background
7.1.2 Motivation
7.1.3 Problem definition
7.1.4 Objectives and scopes
7.1.5 Organization
7.2 Literature review
7.2.1 Automatic text summarization in the legal domain
7.3 Methodology
7.3.1 Legal document summary
7.3.2 Evaluation
7.4 Experiments and results
7.4.1 Evaluating extractive model
7.4.2 Effects of different K values (summary length)
7.4.3 Evaluating abstractive summary model
7.4.4 Comparison with extractive summarization models
7.4.5 Comparison with abstractive model
7.5 Conclusion
References
8 Concept network using network text analysis
Md Masum Billah, Dipanita Saha, Farzana Bhuiyan, and Mohammed Kaosar
8.1 Introduction
8.2 Literature review
8.3 The concept network
8.3.1 Concept-based information retrieval
8.3.2 Concept networks and extended fuzzy concept networks
8.3.3 Applications for fuzzy concept knowledge
8.3.4 Building WikiNet: using Wikipedia as the source
8.4 Network text analysis
8.4.1 Extracting context words from training documents
8.4.2 Building bigram frequency for text classification
8.4.3 Detecting related articles by using bigrams
8.5 Conclusion and future direction
References
9 Question-answering versus machine reading comprehension: Neural machine reading comprehension using transformer models
Nisha Varghese and M. Punithavalli
9.1 Introduction
9.2 Architecture of machine reading comprehension
9.2.1 Word embedding
9.2.2 Feature extraction
9.2.3 Context-question interaction
9.2.4 Answer prediction
9.3 Machine reading comprehension tasks and classification
9.3.1 Cloze tests
9.3.2 Multiple-choice questions
9.3.3 Span extraction
9.3.4 Free-form answering
9.3.5 Attribute-based classification
9.4 Datasets
9.5 Performance evaluation metrics
9.5.1 Accuracy
9.5.2 Exact match
9.5.3 Precision and recall
9.5.4 F1 score
9.5.6 ROUGE (recall-oriented understudy for gisting evaluation)
9.5.7 BLEU (bilingual evaluation understudy)
9.5.8 METEOR (Metric for Evaluation of Translation with Explicit ORdering)
9.5.9 HEQ (human equivalence score)
9.6 Transformer and BERT
9.6.1 BERT-based models
9.7 Results and discussion
9.8 Conclusion and future enhancement
References
10 Online subjective question-answering system necessity of education system
Madhav A. Kankhar, Bharat A. Shelke, and C. Namrata Mahender
10.1 Introduction
10.1.1 Brief on NLP (natural language processing)
10.1.2 Question
10.1.3 Answer
10.1.4 Question-answering system
10.1.5 Types of question-answering
10.1.6 Approaches used for developing QA system
10.1.7 Components of question-answering
10.1.8 Need for question-answering system
10.2 Question types follow into two categories
10.2.1 Objective examination
10.2.2 Subjective examination
10.2.3 Subjective examination-related work
10.2.4 Why is subjective examination important
10.2.5 Online education system
10.3 Proposed model
10.3.1 Text document
10.3.2 Preprocessing
10.3.3 POS tagging
10.3.4 Question generation
10.3.5 User (students) and user (teacher)
10.3.6 Model answer
10.3.7 Answer
10.3.8 Evaluation
10.3.9 Result
10.4 Conclusion
References
Index