Speech Normalization and Data Augmentation Techniques Based on Acoustical and Physiological Constraints and Their Applications to Child Speech Recognition

Author: Gary Joseph Yeung
Publisher:
Total Pages: 90
Release: 2021
Genre:
ISBN:


Recently, adult automatic speech recognition (ASR) system performance has improved dramatically. In contrast, the performance of child ASR systems remains inadequate in an era where demand for child speech technology is on the rise. While adult speech data is abundant, publicly available child speech data is sparse due, in part, to privacy concerns. Hence, many child ASR systems are trained using adult speech data. However, child ASR systems perform poorly when trained on adult speech due to the acoustic mismatch that results from body size differences, especially the vocal folds and the vocal tract, as well as the high variability of child speech.

This research analyzes the acoustical properties of child speech across various ages and compares them to the acoustic properties of adult speech. Specifically, the subglottal resonances (SGRs), fundamental frequency (fo), and formant frequencies of vowel productions are investigated. These acoustic features are shown to be capable of predicting acoustic structures across speakers. As such, we propose feature extraction methods utilizing these properties to normalize the acoustic structure across speakers and reduce the acoustic mismatch between adult and child speech. This allows child ASR systems to leverage adult data for training and suggests a framework for a universal ASR system that need not be adult or child dependent. Furthermore, we demonstrate that when child speech data is limited, these feature normalization methods are capable of producing significant improvements in child ASR for both Gaussian mixture model (GMM) and deep neural network (DNN)-based systems.

New Era for Robust Speech Recognition

Author: Shinji Watanabe
Publisher: Springer
Total Pages: 433
Release: 2017-10-30
Genre: Computers
ISBN: 331964680X


This book covers the state of the art in deep neural-network-based methods for noise robustness in distant speech recognition applications. It provides insights and detailed descriptions of some of the new concepts and key technologies in the field, including novel architectures for speech enhancement, microphone arrays, robust features, acoustic model adaptation, training data augmentation, and training criteria. The contributed chapters also include descriptions of real-world applications, benchmark tools, and datasets widely used in the field. This book is intended for researchers and practitioners working in the field of speech processing and recognition who are interested in the latest deep learning techniques for noise robustness. It will also be of interest to graduate students in electrical engineering or computer science, who will find it a useful guide to this field of research.

Acoustical and Environmental Robustness in Automatic Speech Recognition

Author: Alex Acero
Publisher: Springer Science & Business Media
Total Pages: 216
Release: 1992-11-30
Genre: Technology & Engineering
ISBN: 9780792392842


The need for automatic speech recognition systems to be robust with respect to changes in their acoustical environment has become more widely appreciated in recent years, as more systems are finding their way into practical applications. Although the issue of environmental robustness has received only a small fraction of the attention devoted to speaker independence, even speech recognition systems that are designed to be speaker independent frequently perform very poorly when they are tested using a different type of microphone or acoustical environment from the one with which they were trained. The use of microphones other than a "close talking" headset also tends to severely degrade speech recognition performance. Even in relatively quiet office environments, speech is degraded by additive noise from fans, slamming doors, and other conversations, as well as by the effects of unknown linear filtering arising from reverberation from surface reflections in a room, or spectral shaping by microphones or the vocal tracts of individual speakers. Speech recognition systems designed for long-distance telephone lines, or applications deployed in more adverse acoustical environments such as motor vehicles, factory floors, or outdoors, demand far greater degrees of environmental robustness. There are several different ways of building acoustical robustness into speech recognition systems. Arrays of microphones can be used to develop a directionally sensitive system that resists interference from competing talkers and other noise sources that are spatially separated from the source of the desired speech signal.

Speech and Speaker Recognition

Author: Manfred Robert Schroeder
Publisher: Karger Medical and Scientific Publishers
Total Pages: 220
Release: 1985-01-01
Genre: Medical
ISBN: 9783805540124


Invariant Features and Enhanced Speaker Normalization for Automatic Speech Recognition

Author: Florian Müller
Publisher: Logos Verlag Berlin GmbH
Total Pages: 247
Release: 2013
Genre: Computers
ISBN: 3832533192


Automatic speech recognition systems have to handle various kinds of variability sufficiently well in order to achieve high recognition rates in practice. One source of variability with a major impact on performance is the vocal tract length of the speakers. Normalization of the features and adaptation of the acoustic models are commonly used methods in speech recognition systems. In contrast, a third approach follows the idea of extracting features with transforms that are invariant to vocal tract length changes. This work presents several approaches for extracting invariant features for automatic speech recognition systems. The robustness of these features under various training-test conditions is evaluated, and it is described how the robustness of the features to noise can be increased. Furthermore, it is shown how the spectral effects due to different vocal tract lengths can be estimated with a registration method and how this can be used for speaker normalization.
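For readers unfamiliar with the feature normalization mentioned in this description, the following is a minimal sketch of a piecewise-linear frequency warp of the kind often used for vocal tract length normalization. It is an illustration only and is not the invariant-feature or registration method presented in the book. The function name `vtln_warp`, the `knee` parameter, and the 8 kHz Nyquist frequency are assumptions made for this example.

```python
def vtln_warp(freq_hz, alpha, nyquist_hz=8000.0, knee=0.85):
    """Warp one frequency value with a piecewise-linear function.

    Below the break point the frequency is scaled by the warp factor alpha.
    Above it, a straight line is used so that the Nyquist frequency maps
    onto itself and the warped axis keeps the same bandwidth.
    """
    # Break point chosen (assumption) so that alpha * f_break stays below Nyquist.
    f_break = knee * nyquist_hz * min(1.0, 1.0 / alpha)
    if freq_hz <= f_break:
        return alpha * freq_hz
    # Line from (f_break, alpha * f_break) to (nyquist_hz, nyquist_hz).
    slope = (nyquist_hz - alpha * f_break) / (nyquist_hz - f_break)
    return alpha * f_break + slope * (freq_hz - f_break)


# Example: warp a few frequencies with a hypothetical warp factor of 1.2.
for f in (500.0, 2000.0, 7000.0):
    print(f, "->", round(vtln_warp(f, alpha=1.2), 1))
```

In practice the warp factor is usually estimated per speaker, for example by searching over a small grid of candidate values; that search is not shown here.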

Data-Driven Techniques in Speech Synthesis

Author: R.I. Damper
Publisher: Springer Science & Business Media
Total Pages: 328
Release: 2012-12-06
Genre: Science
ISBN: 1475734131


This first review of a new field covers all areas of speech synthesis from text, ranging from text analysis to letter-to-sound conversion. At the leading edge of current research, this concise and accessible book is written by well-respected experts in the field.

Data Augmentation for Automatic Speech Recognition for Low Resource Languages

Author: Ronit Damania
Publisher:
Total Pages: 37
Release: 2021
Genre: Automatic speech recognition
ISBN:


"In this thesis, we explore several novel data augmentation methods for improving the performance of automatic speech recognition (ASR) on low-resource languages. Using a 100-hour subset of English LibriSpeech to simulate a low-resource setting, we compare the well-known SpecAugment augmentation approach to these new methods, along with several other competitive baselines. We then apply the most promising combinations of models and augmentation methods to three genuinely under-resourced languages using the 40-hour Gujarati, Tamil, and Telugu datasets from the 2021 Interspeech Low Resource Automatic Speech Recognition Challenge for Indian Languages. Our data augmentation approaches, coupled with state-of-the-art acoustic model architectures and language models, yield reductions in word error rate over SpecAugment and other competitive baselines for the LibriSpeech-100 dataset, showing a particular advantage over prior models for the 'other', more challenging, dev and test sets. Extending this work to the low-resource Indian languages, we see large improvements over the baseline models and results comparable to large multilingual models."--Abstract.

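The abstract above uses SpecAugment as its baseline. As a rough illustration of that baseline only (not of the thesis's new methods), the sketch below applies one frequency mask and one time mask to a spectrogram stored as a list of frames. The function name `spec_augment`, the mask sizes, and the zero fill value are assumptions for this example; the published SpecAugment policy also includes time warping, which is omitted here.

```python
import random


def spec_augment(spec, max_freq_mask=8, max_time_mask=20, rng=None):
    """Return a masked copy of spec (a list of frames, each a list of bins).

    One band of consecutive frequency bins and one run of consecutive
    frames are set to zero. The input spectrogram is left unchanged.
    """
    rng = rng or random.Random()
    n_frames = len(spec)
    n_bins = len(spec[0])
    out = [frame[:] for frame in spec]

    # Frequency mask: width f and start f0 drawn uniformly at random.
    f = rng.randint(0, min(max_freq_mask, n_bins))
    f0 = rng.randint(0, n_bins - f)
    for frame in out:
        for j in range(f0, f0 + f):
            frame[j] = 0.0

    # Time mask: length t and start t0 drawn uniformly at random.
    t = rng.randint(0, min(max_time_mask, n_frames))
    t0 = rng.randint(0, n_frames - t)
    for i in range(t0, t0 + t):
        out[i] = [0.0] * n_bins

    return out


# Example: mask a dummy 100-frame, 40-bin spectrogram.
dummy = [[1.0] * 40 for _ in range(100)]
masked = spec_augment(dummy, rng=random.Random(0))
```

Because the masks are drawn anew on every call, the same utterance yields a different training example each time it is augmented.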
Speech Signal Processing Based on Deep Learning in Complex Acoustic Environments

Author: Xiao-Lei Zhang
Publisher: Elsevier
Total Pages: 282
Release: 2024-09-04
Genre: Computers
ISBN: 0443248575


Speech Signal Processing Based on Deep Learning in Complex Acoustic Environments provides a detailed discussion of deep learning-based robust speech processing and its applications. The book begins by looking at the basics of deep learning and common deep network models, followed by front-end algorithms for deep learning-based speech denoising, speech detection, single-channel speech enhancement, multi-channel speech enhancement, multi-speaker speech separation, and the applications of deep learning-based speech denoising in speaker verification and speech recognition. The book provides a comprehensive introduction to the development of deep learning-based robust speech processing; covers speech detection, speech enhancement, dereverberation, multi-speaker speech separation, robust speaker verification, and robust speech recognition; and focuses on a historical overview and then covers methods that demonstrate outstanding performance in practical applications.

Advances in Non-Linear Modeling for Speech Processing

Author: Raghunath S. Holambe
Publisher: Springer Science & Business Media
Total Pages: 109
Release: 2012-02-21
Genre: Technology & Engineering
ISBN: 1461415047


Advances in Non-Linear Modeling for Speech Processing includes advanced topics in non-linear estimation and modeling techniques along with their applications to speaker recognition. A non-linear aeroacoustic modeling approach is used to estimate the important fine-structure speech events, which are not revealed by the short-time Fourier transform (STFT). This aeroacoustic modeling approach provides the impetus for the high-resolution Teager energy operator (TEO). This operator is characterized by a time resolution that can track rapid signal energy changes within a glottal cycle. Cepstral features such as linear prediction cepstral coefficients (LPCC) and mel frequency cepstral coefficients (MFCC) are computed from the magnitude spectrum of the speech frame, and the phase spectrum is neglected. To overcome the problem of neglecting the phase spectrum, the speech production system can be represented as an amplitude modulation-frequency modulation (AM-FM) model. To demodulate the speech signal, that is, to estimate the amplitude envelope and instantaneous frequency components, the energy separation algorithm (ESA) and the Hilbert transform demodulation (HTD) algorithm are discussed. Different features derived using the above non-linear modeling techniques are used to develop a speaker identification system. Finally, it is shown that the fusion of speech production and speech perception mechanisms can lead to a robust feature set.
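As an illustration of the Teager energy operator mentioned in this description, the following minimal sketch uses the standard discrete-time form, Psi[x(n)] = x(n)^2 - x(n-1) x(n+1). The function name `teager_energy` and the test tone are assumptions made for this example and are not taken from the book.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator.

    For each interior sample n, returns x[n]^2 - x[n-1] * x[n+1].
    The output is two samples shorter than the input.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# For a pure tone A*cos(omega*n) the operator output is the constant
# A^2 * sin(omega)^2, which depends on both amplitude and frequency.
A, omega = 2.0, 0.3
tone = [A * math.cos(omega * n) for n in range(50)]
psi = teager_energy(tone)
expected = A ** 2 * math.sin(omega) ** 2
```

Because each output sample depends on only three neighboring input samples, the operator can follow energy changes on a very short time scale, which is the property the description refers to.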

Direction of Arrival Estimation and Localization of Multi-Speech Sources

Author: Nilanjan Dey
Publisher: Springer
Total Pages: 67
Release: 2017-12-23
Genre: Technology & Engineering
ISBN: 3319730592


This book presents research and applications on direction of arrival estimation (DOAE) and localization of speech sources, with the aim of giving a broad, well-established view of the field. The book first provides a brief overview of the classical direction of arrival estimation and localization techniques. It then introduces the concept and model of acoustic sources and highlights the most contemporary studies on this pervasive problem. In addition, the authors explore the use of optimization algorithms to improve DOAE techniques. The book then highlights the concept and principles of multi-DOAE approaches. Using a microphone array, the book introduces the localization and tracking problem of multiple speech and acoustic sources. It includes several applications and real-life examples of speech source localization based on DOAE approaches. The book also reports the challenges facing DOAE techniques in speech source localization. It is intended for researchers, designers, and engineers in speech processing fields.
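To make the idea of direction of arrival estimation concrete, here is a minimal sketch of the textbook far-field relation for a two-microphone pair, theta = arcsin(c * tau / d), where tau is the time difference of arrival, d the microphone spacing, and c the speed of sound. It is an illustration under those assumptions and is not a method taken from the book. The function name `doa_from_tdoa` and the default speed of sound of 343 m/s are choices made for this example.

```python
import math


def doa_from_tdoa(tdoa_s, mic_spacing_m, speed_of_sound=343.0):
    """Estimate the arrival angle in degrees for a two-microphone pair.

    Assumes a far-field (plane wave) source. 0 degrees is broadside,
    i.e. the source is equally distant from both microphones.
    """
    s = speed_of_sound * tdoa_s / mic_spacing_m
    # Clamp to [-1, 1] so measurement noise cannot push asin out of range.
    s = max(-1.0, min(1.0, s))
    return math.degrees(math.asin(s))


# Example: hypothetical delay of 0.1 ms between microphones 10 cm apart.
angle = doa_from_tdoa(1.0e-4, 0.10)
print(round(angle, 1))
```

In a real system the delay itself has to be estimated first, typically from the cross-correlation of the two microphone signals; that step is not shown here.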