Speech is the most natural means of communication between humans. Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems enable machines to interact with humans through speech as well. This capability is particularly important for building interactive robots and speech-enabled chatbots. The book first explores state-of-the-art ASR and TTS approaches based on artificial neural networks, including approaches suited to low-resource scenarios. It then examines how speech technology applies to specific domains, such as medicine and human-robot interaction. It also covers the interlinking of speech and text resources according to linguistic linked open data (LLOD) principles. The book also presents punctuation restoration techniques that help produce high-quality text transcripts. The included algorithms have low latency and can be parallelized, which makes them suitable for interactive systems. The chapter authors are professors and researchers with experience in building and using natural language processing algorithms and speech applications.
Author(s): Vasile-Florian Pais
Series: Computer Science, Technology and Applications
Publisher: Nova Science Publishers
Year: 2022
Language: English
Pages: 238
City: New York
Computer Science, Technology and Applications
Speech Recognition Technology and Applications
Contents
Preface
Chapter 1. Building an Automatic Speech Recognition System for a Low-Resource Language
Abstract
1. Introduction
1.1. Romanian as a Low-Resource Language
2. State-of-the-Art Architectures
2.1. HMM-GMM-Based Architectures
2.2. Deep Neural Network Architectures
2.3. Hybrid Architectures
2.4. Language Models
3. Method
3.1. Corpora
3.2. Automatic Grapheme-to-Phoneme Conversion
3.3. Language Models
3.4. Data Augmentation
3.5. Speech-to-Text Architectures for Romanian
3.5.1. CMUSphinx
3.5.2. DeepSpeech
3.5.3. DeepSpeech 2
3.5.4. Kaldi
3.6. Replicable Experiments with Containerization
4. Results
4.1. CMUSphinx
4.2. DeepSpeech
4.3. Kaldi
4.4. Data Augmentation and SpecAugment
5. Discussion
5.1. CMUSphinx
5.2. DeepSpeech
5.3. Kaldi
Conclusion and Future Work
Acknowledgment
References
Chapter 2. Self-Supervised Pre-Training in Speech Recognition Systems
Abstract
1. Introduction
2. Contrastive Representation Learning
2.1. Training Objectives
2.2. Essential Components
3. Pre-Trained ASR Architectures
3.1. Wav2Vec
3.2. VQ-Wav2Vec
3.3. Wav2Vec2
4. Comparison with Non-Pre-Trained Models
4.1. Dataset
4.2. Baseline Models
4.3. Pre-Trained Wav2Vec2 Models
4.4. Experimental Setup
4.5. Results
5. RELATE Integration
Conclusion
References
Chapter 3. The Impact of Speech Recognition Performance on Human-Computer Interaction
Abstract
1. Introduction
2. Architecture of a Speech-Based Dialogue System
3. Implementation Details
3.1. Automatic Speech Recognition
3.2. Dialogue Manager
3.3. Text-to-Speech
4. ASR Enhancements Leading to Increased Performance of the Overall System
4.1. End-to-End Neural ASR System
4.2. Fine-Tuning the ASR System with Domain-Specific Data
5. Impact of ASR Enhancements
5.1. Evaluation of RDM with a Fine-Tuned ASR System
5.2. Overall System Response Time
Conclusion
References
Chapter 4. The Role of Automatic Speech Recognition Systems in Developing Medical Applications
Abstract
1. Introduction
2. General Overview of ASR
3. NLP Applications in Medical Domain
3.1. Named Entity Recognition
3.2. Classification
3.3. Summarization
4. ASR Applications in Medical Domain
4.1. Digital Scribes for Medical Domain
4.1.1. Challenges of Developing Digital Scribes for the Medical Domain
4.2. Software and Platforms with ASR-Based Capabilities in the Medical Domain
4.2.1. Case Study: Amazon Medical
• Amazon Transcribe Medical
• Amazon Comprehend Medical
4.3. ASR and Vocal Biomarkers
4.4. Medical IoT and ASR
Conclusion
References
Chapter 5. Punctuation Recovery for Romanian Transcribed Documents
Abstract
1. Introduction
2. Punctuation in the Romanian Language
3. Corpora and Resources
4. Algorithms
5. Results
Conclusion
References
Chapter 6. Linguistic Linked Open Data for Speech Processing
Abstract
1. Introduction
2. Linguistic Linked Open Data
3. Romanian Resources as Linguistic Linked Open Data
4. LLOD Resources for Speech Processing
5. Romanian LLOD Resources for Speech Processing
5.1. The RoLEX Lexicon
5.2. The RTASC Corpus
6. Exploiting Multiple Resources for Advanced Usage Scenarios
Conclusion
References
Chapter 7. Transformer-Based Romanian Text-to-Speech System Using Boolean Masking for Improved Prosody
Abstract
1. Introduction
2. Related Work
2.1. Deep Neural Models for Speech Synthesis
2.2. Speech Synthesis for the Romanian Language
3. Datasets for the Romanian Language
3.1. Existing Datasets
3.2. Introducing a New Male Voice Dataset - RSS-Alex
4. FastSpeech TTS with Boolean Masking
5. Experiments
6. Results
Conclusion
Future Work
Acknowledgments
References
About the Editor
About the Contributors
Index