This book presents recent advances in Natural Language Processing (NLP) and speech technology, a field attracting growing interest across many domains because of its wide range of applications, such as speech-guided touchless technology. The authors present results of recent experimental research that contributes solutions to a range of issues in speech technology and speech in industry. The technologies covered include Natural Language Processing, automatic speech recognition (for under-resourced dialects), and speech synthesis, which are useful for applications such as intelligent virtual assistants. Application areas include sentiment analysis, opinion mining, and language modelling. This book is relevant for anyone interested in the latest developments in language and speech technology.
Increased access to powerful processors has made significant progress in Natural Language Processing (NLP) possible. More NLP research now targets a diverse spectrum of major industries that use voice recognition, text-to-speech (TTS) solutions, speech translation, natural language understanding (NLU), and many other applications and techniques related to these areas.
This book presents the latest research related to Natural Language Processing and speech technology and sheds light on the main topics for readers interested in this field. For TTS and automatic speech recognition, it demonstrates how transfer learning can be used to generate speech in other voices from a TTS system built for a specific language, and to improve speech recognition for non-native English. Language resources are the cornerstone for building high-quality systems; however, some languages are considered under-resourced compared to English. In addition, readers will find concepts and solutions for other NLP issues such as language modeling, question answering, dialog systems, and sentence embeddings.
Although non-native English speakers (L2) outnumber native English speakers (L1), major challenges contribute to a gap between the performance of automatic speech recognition (ASR) systems on L1 and L2 speech. This is mainly due to the influence of L1 pronunciation on the learned language and the lack of annotated L2 speech data on which ASR systems can be trained. To meet these challenges, previous work has generally followed two distinct approaches. The first is to make L2 speech representations more closely match those of L1 speech. The second leverages L2 speech data to improve model robustness. Because L2 data is scarce, this second approach requires transfer learning or domain adaptation. State-of-the-art ASR models based on self-supervised pre-training, such as wav2vec and wav2vec 2.0, offer a tantalizing starting point for applying the transfer learning approach described above, especially given the strong performance of self-trained wav2vec 2.0 models on ASR in low-resource settings, even without a language model. However, challenges remain in identifying how best to apply models such as wav2vec 2.0 in L2 fine-tuning scenarios.
Author(s): Mourad Abbas
Publisher: Springer
Year: 2023
Language: English
Pages: 217
Preface
Contents
ITAcotron 2: The Power of Transfer Learning in Expressive TTS Synthesis
1 Introduction
2 Background
3 Related Work
4 Aim and Experimental Hypotheses
5 Corpora
6 ITAcotron 2 Synthesis Pipeline
7 Evaluation Approach
8 Results
8.1 Speech Intelligibility and Naturalness
8.2 Speaker Similarity
9 Conclusion and Future Work
Appendix
References
Improving Automatic Speech Recognition for Non-native English with Transfer Learning and Language Model Decoding
1 Introduction
2 Related Work
3 Methods
3.1 Transfer Learning
3.2 CTC Decoding
4 Data
4.1 Corpus Information
4.2 Data Splits
5 Experiments
5.1 Baselines
5.2 Multi-Accent Models
5.3 Accent-Specific Models
5.4 Language Model Decoding
6 Error Analysis
7 Conclusion
References
Kabyle ASR Phonological Error and Network Analysis
1 Introduction
2 Background
2.1 ASR Modeling Units
2.2 Diacritization
2.3 Berber Language Tools
2.4 Phonological Networks
3 The Kabyle Language and Berber Writing Systems
4 Approach
4.1 Mozilla CommonVoice
4.2 Mozilla DeepSpeech
4.3 Transliterator
4.4 Sequence Alignment
5 Experimentation and Results
5.1 Experiments
5.2 Results
5.3 Phonemic Confusion Analysis
5.4 Phonological Network Analysis
6 Discussion
7 Future Work
8 Conclusion
References
ALP: An Arabic Linguistic Pipeline
1 Introduction
2 Ambiguity in Arabic
2.1 Ambiguity in Word Segmentation
2.2 Ambiguity in POS Tagging
2.2.1 Verb Ambiguities: Passive vs Active Voice
2.2.2 Verb Ambiguities: Past vs Present Tense
2.2.3 Verb Ambiguities: Imperative
2.2.4 Noun Ambiguities: Singular vs Plural
2.2.5 Noun Ambiguities: Dual vs Singular
2.2.6 Noun Ambiguities: Dual vs Plural
2.2.7 Noun Ambiguities: Feminine vs Masculine Singular
2.3 Ambiguity in Named Entity Recognition
2.3.1 Inherent Ambiguity in Named Entities
2.3.2 Ellipses
2.4 Ambiguity in Lemmatization
2.5 Ambiguity in Phrase Chunking
3 Pipeline Architecture
3.1 Preprocessing: POS, NER, and Word Segment Tagging
3.1.1 POS Tagging
3.1.2 Named Entity Recognition
3.1.3 Word Segmentation
3.2 Lemmatization
3.2.1 Learning-Based Lemmatizer
3.2.2 Dictionary-Based Lemmatizer
3.2.3 Fusion Lemmatizer
3.3 Base Chunker
4 Annotation Schema
4.1 Annotation of POS Tags
4.2 Annotation of Word Segments
4.3 Annotation of Named Entities
4.4 Annotation of Lemmas
4.5 Annotation of Base Chunks
5 Corpus Annotation
5.1 POS and Name Annotation Method
5.2 Lemma Annotation Method
5.2.1 Dictionary Lemmatizer
5.2.2 Machine Learning Lemmatizer
5.3 Base Chunking Annotation Method
6 Evaluation
6.1 Evaluation of POS Tagging
6.2 Evaluation of NER
6.3 Evaluation of Lemmatization and Base Chunking
7 Conclusion and Future Work
References
Arabic Anaphora Resolution System Using New Features: Pronominal and Verbal Cases
1 Introduction
2 Varieties of Anaphora in Arabic Text
2.1 Verb Anaphora
2.2 Lexical Anaphora
2.3 Pronominal Anaphora
2.3.1 Third-Person Personal Pronouns
2.3.2 Relative Pronouns
2.3.3 Demonstrative Pronouns
2.4 Comparative Anaphora
3 Related Work
4 Arabic Anaphoric Resolution Challenges
4.1 Lack of Diacritical Marks
4.2 Agglutination Phenomenon
4.3 Syntactic Flexibility (Words Free Order)
4.4 Ambiguity of the Referent
4.5 Hidden Referent
4.6 Lack of Annotated Corpora with Anaphoric Links
5 The A3T Architecture
5.1 Preprocessing
5.2 Anaphora and Candidate Identification
5.3 Anaphora Resolving
5.4 Automatic Text Annotation
6 Experiments and Results
7 Discussion
8 Conclusion
References
A Commonsense-Enhanced Document-Grounded Conversational Agent: A Case Study on Task-Based Dialogue
1 Introduction
2 Related Work
2.1 Task- and Goal-Oriented Dialogue
2.2 Dialogue State Tracking and Planning
2.3 Document-Grounded Dialogue
2.4 Commonsense-Enhanced Dialogue
2.5 Dialogue Management
3 Task2Dial
3.1 Data Collection Methodology
4 Dataset Analysis
5 The ChefBot Conversational Agent
6 Conclusions and Future Work
6.1 Future Work and Open Questions
References
BloomQDE: Leveraging Bloom's Taxonomy for Question Difficulty Estimation
1 Introduction
2 Related Work
3 Approach
3.1 Datasets
3.1.1 ARC
3.1.2 SQuAD
3.2 Data Preparation
3.2.1 Keyword Mapping
3.2.2 PoS Tagging
3.2.3 Class Binarization
3.2.4 Test Data
4 Experiments
4.1 Model Training
4.2 Parameter Optimization
4.3 Experimental Results
4.4 Room for Improvement
5 Conclusion and Future Work
References
A Comparative Study on Language Models for Dravidian Languages
1 Introduction
2 Related Work
3 Methodology
3.1 Dataset
3.2 Preprocessing
3.3 Tokenization and Vocabulary
3.4 Experimental Setup
4 Models and Evaluation
4.1 Word Embedding Models
4.2 Contextual Embedding Models
4.2.1 RoBERTa
4.2.2 DeBERTa
4.2.3 ELECTRA
5 Results
5.1 Word Similarity
5.2 News Article Classification
6 Conclusion
7 Future Work
References
Arabic Named Entity Recognition with a CRF Model Based on Transformer Architecture
1 Introduction
2 Background
2.1 AraBERT
2.2 AraELECTRA
2.3 RoBERTa
3 Related Works
3.1 Rule-Based Approach
3.2 Machine Learning Approach
3.3 Deep Learning Approach
3.4 Hybrid Approach
4 Transformer-Based CRF Model
4.1 Proposed Model Architecture
4.2 Linear Layer
4.3 CRF Tagging Algorithm
4.4 Calculating the NLL Function
5 Experiment
5.1 Tagging Types
5.2 Data Samples
5.2.1 ANERcorp Dataset
5.2.2 AQMAR Dataset
5.2.3 CANERCorpus Dataset
5.2.4 Our Arabic Legal Content (ALC) Dataset
5.3 Fine-Tuning Process
6 Results
7 Conclusion
References
Static Fuzzy Bag-of-Words: Exploring Static Universe Matrices for Sentence Embeddings
1 Introduction
2 Related Work
2.1 Word and Sentence Embeddings
2.2 Fuzzy Bag-of-Words and DynaMax for Sentence Embeddings
3 Static Fuzzy Bag-of-Words Model
3.1 Word Embeddings
3.2 Universe Matrix
3.2.1 Clustering
3.2.2 Identity
3.2.3 Multivariate Analysis
3.2.4 Vector Significance
4 Experiments
4.1 Word Embeddings
4.2 Universe Matrices
4.2.1 Clustering
4.2.2 Identity
4.2.3 Multivariate Analysis
4.2.4 Vector Significance
4.3 Data
4.4 Evaluation Approach
5 Results
5.1 Individual SFBoW Results
5.2 Comparison with Other Models
6 Conclusion
References
Index