Invariant Features and Enhanced Speaker Normalization for Automatic Speech Recognition

Author: Florian Müller
Publisher: Logos Verlag Berlin GmbH
Total Pages: 247
Release: 2013
Genre: Computers
ISBN: 3832533192


Automatic speech recognition systems have to handle various kinds of variability sufficiently well in order to achieve high recognition rates in practice. One source of variability that has a major impact on performance is the vocal tract length of the speakers. Normalization of the features and adaptation of the acoustic models are commonly used methods for handling this variability in speech recognition systems. In contrast, a third approach follows the idea of extracting features with transforms that are invariant to vocal tract length changes. This work presents several approaches for extracting invariant features for automatic speech recognition systems. The robustness of these features under various training-test conditions is evaluated, and it is described how the robustness of the features to noise can be increased. Furthermore, it is shown how the spectral effects due to different vocal tract lengths can be estimated with a registration method and how this can be used for speaker normalization.

Automatic Speech and Speaker Recognition

Author: Chin-Hui Lee
Publisher: Springer Science & Business Media
Total Pages: 548
Release: 1996-03-31
Genre: Technology & Engineering
ISBN: 9780792397069


Research in the field of automatic speech and speaker recognition has made a number of significant advances in the last two decades, influenced by advances in signal processing, algorithms, architectures, and hardware. These advances include: the adoption of a statistical pattern recognition paradigm; the use of the hidden Markov modeling framework to characterize both the spectral and the temporal variations in the speech signal; the use of a large set of speech utterance examples from a large population of speakers to train the hidden Markov models of some fundamental speech units; the organization of speech and language knowledge sources into a structural finite state network; and the use of dynamic-programming-based heuristic search methods to find the best word sequence in the lexical network corresponding to the spoken utterance. Automatic Speech and Speaker Recognition: Advanced Topics groups together in a single volume a number of important topics on speech and speaker recognition, topics which are of fundamental importance but not yet covered in detail in existing textbooks. Although no explicit partition is given, the book is divided into five parts: Chapters 1-2 are devoted to technology overviews; Chapters 3-12 discuss acoustic modeling of fundamental speech units and lexical modeling of words and pronunciations; Chapters 13-15 address the issues related to flexibility and robustness; Chapters 16-18 concern the theoretical and practical issues of search; Chapters 19-20 give two examples of algorithmic and implementational aspects of recognition system realization. Audience: A reference book for speech researchers and graduate students interested in pursuing potential research on the topic. May also be used as a text for advanced courses on the subject.

Fundamentals in Computer Understanding: Speech and Vision

Author: Institut national de recherche en informatique et en automatique (France)
Publisher: CUP Archive
Total Pages: 296
Release: 1987-05-07
Genre: Computers
ISBN: 9780521309837


Man-machine communication is presently undergoing an important evolution which is influenced both by technological advances and by the progress made in various fields such as signal processing, pattern recognition and artificial intelligence. This book emphasizes relevant aspects of man-machine dialogue by voice (acoustic-phonetic decoding, multi-speaker aspects, dialogue architectures, etc.) and presents analogies with the related fields of computer vision and natural language processing. It also introduces the fundamentals of knowledge-based and expert systems which are widely used in this field. The book is the result of an interdisciplinary collaboration of international experts who worked together for an advanced course sponsored by the Commission of the European Communities and Institut National de Recherche en Informatique et en Automatique. The course was held in Paris in May 1985.

Robust Automatic Speech Recognition

Author: Jinyu Li
Publisher: Academic Press
Total Pages: 308
Release: 2015-10-30
Genre: Technology & Engineering
ISBN: 0128026162


Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of classical and modern noise- and reverberation-robust techniques that have been developed over the past thirty years, with an emphasis on practical methods that have been proven to be successful and which are likely to be further developed for future applications. The strengths and weaknesses of robustness-enhancing speech recognition techniques are carefully analyzed. The book covers noise-robust techniques designed for acoustic models which are based on both Gaussian mixture models and deep neural networks. In addition, a guide to selecting the best methods for practical applications is provided. The reader will: gain a unified, deep and systematic understanding of the state-of-the-art technologies for robust speech recognition; learn the links and relationships between alternative technologies for robust speech recognition; be able to use the technology analysis and categorization detailed in the book to guide future technology development; and be able to develop new noise-robust methods in the current era of deep learning for acoustic modeling in speech recognition. It is the first book that provides a comprehensive review of noise- and reverberation-robust speech recognition methods in the era of deep neural networks; it connects robust speech recognition techniques to machine learning paradigms with rigorous mathematical treatment; it provides elegant and structured ways to categorize and analyze noise-robust speech recognition techniques; and it is written by leading researchers who have been actively working on the subject matter in both industrial and academic organizations for many years.

New Era for Robust Speech Recognition

Author: Shinji Watanabe
Publisher: Springer
Total Pages: 433
Release: 2017-10-30
Genre: Computers
ISBN: 331964680X


This book covers the state of the art in deep-neural-network-based methods for noise robustness in distant speech recognition applications. It provides insights and detailed descriptions of some of the new concepts and key technologies in the field, including novel architectures for speech enhancement, microphone arrays, robust features, acoustic model adaptation, training data augmentation, and training criteria. The contributed chapters also include descriptions of real-world applications, benchmark tools and datasets widely used in the field. This book is intended for researchers and practitioners working in the field of speech processing and recognition who are interested in the latest deep learning techniques for noise robustness. It will also be of interest to graduate students in electrical engineering or computer science, who will find it a useful guide to this field of research.

Advances in Nonlinear Speech Processing

Author: Jordi Sole-Casals
Publisher: Springer
Total Pages: 209
Release: 2010-03-10
Genre: Computers
ISBN: 3642115098


This volume contains the proceedings of NOLISP 2009, an ISCA Tutorial and Workshop on Non-Linear Speech Processing held at the University of Vic (Catalonia, Spain) during June 25-27, 2009. NOLISP 2009 was preceded by three editions of this biannual event, held in 2003 in Le Croisic (France), in 2005 in Barcelona, and in 2007 in Paris. The main idea of the NOLISP workshops is to present and discuss new ideas, techniques and results related to alternative approaches in speech processing that may depart from the mainstream. In order to work at the front-end of the subject area, the following domains of interest have been defined for NOLISP 2009: 1. Non-linear approximation and estimation; 2. Non-linear oscillators and predictors; 3. Higher-order statistics; 4. Independent component analysis; 5. Nearest neighbors; 6. Neural networks; 7. Decision trees; 8. Non-parametric models; 9. Dynamics for non-linear systems; 10. Fractal methods; 11. Chaos modeling; 12. Non-linear differential equations. The initiative to organize NOLISP 2009 at the University of Vic (UVic) came from the UVic Research Group on Signal Processing and was supported by the Hardware-Software Research Group. We would like to acknowledge the financial support obtained from the Ministry of Science and Innovation of Spain (MICINN), University of Vic, ISCA, and EURASIP. All contributions to this volume are original. They were subject to a double-blind refereeing procedure before their acceptance for the workshop and were revised after being presented at NOLISP 2009.

Automatic Speech and Speaker Recognition

Author: Joseph Keshet
Publisher: John Wiley & Sons
Total Pages: 268
Release: 2009-04-27
Genre: Technology & Engineering
ISBN: 9780470742037


This book discusses large margin and kernel methods for speech and speaker recognition. Speech and Speaker Recognition: Large Margin and Kernel Methods is a collation of research on recent advances in large margin and kernel methods, as applied to the field of speech and speaker recognition. It presents theoretical and practical foundations of these methods, from support vector machines to large margin methods for structured learning. It also provides examples of large-margin-based acoustic modelling for continuous speech recognizers, where the grounds for practical large margin sequence learning are set. Large margin methods for discriminative language modelling and text-independent speaker verification are also addressed in this book. Key Features: provides an up-to-date snapshot of the current state of research in this field; covers important aspects of extending the binary support vector machine to speech and speaker recognition applications; discusses large margin and kernel method algorithms for sequence prediction required for acoustic modeling; reviews past and present work on discriminative training of language models, and describes different large margin algorithms for the application of part-of-speech tagging; surveys recent work on the use of kernel approaches to text-independent speaker verification, and introduces the main concepts and algorithms; surveys recent work on kernel approaches to learning a similarity matrix from data. This book will be of interest to researchers, practitioners, engineers, and scientists in the speech processing and machine learning fields.

Neural Network Based Representation Learning and Modeling for Speech and Speaker Recognition

Author: Jinxi Guo
Total Pages: 127
Release: 2019


Deep learning and neural network research has grown significantly in the fields of automatic speech recognition (ASR) and speaker recognition. Compared to traditional methods, deep learning-based approaches are more powerful in learning representations from data and building complex models. In this dissertation, we focus on representation learning and modeling using neural network-based approaches for speech and speaker recognition. In the first part of the dissertation, we present two novel neural network-based methods to learn speaker-specific and phoneme-invariant features for short-utterance speaker verification. We first propose to learn a spectral feature mapping from each speech signal to the corresponding subglottal acoustic signal, which has less phoneme variation, using deep neural networks (DNNs). The estimated subglottal features show better speaker-separation ability and provide complementary information when combined with traditional speech features on speaker verification tasks. Additionally, we propose another DNN-based mapping model, which maps the speaker representation extracted from short utterances to the speaker representation extracted from long utterances of the same speaker. Two non-linear regression models using an autoencoder are proposed to learn this mapping, and they both improve speaker verification performance significantly. In the second part of the dissertation, we design several new neural network models which take raw speech features (either complex Discrete Fourier Transform (DFT) features or raw waveforms) as input, and perform feature extraction and phone classification jointly. We first propose a unified deep Highway (HW) network with a time-delayed bottleneck (TDB) layer in the middle for feature extraction. The TDB-HW networks with complex DFT features as input provide significantly lower error rates compared with hand-designed spectrum features on large-scale keyword spotting tasks. Next, we present a 1-D Convolutional Neural Network (CNN) model, which takes raw waveforms as input and uses convolutional layers to do hierarchical feature extraction. The proposed 1-D CNN model outperforms standard systems with hand-designed features. In order to further reduce the redundancy of the 1-D CNN model, we propose a filter sampling and combination (FSC) technique, which can reduce the model size by 70% and still improve the performance on ASR tasks. In the third part of the dissertation, we propose two novel neural network models for sequence modeling. We first propose an attention mechanism for acoustic sequence modeling. The attention mechanism can automatically predict the importance of each time step and select the most important information from sequences. Second, we present a sequence-to-sequence based spelling correction model for end-to-end ASR. The proposed correction model can effectively correct errors made by the ASR systems.

Acoustical and Environmental Robustness in Automatic Speech Recognition

Author: Alex Acero
Publisher: Springer Science & Business Media
Total Pages: 216
Release: 1992-11-30
Genre: Technology & Engineering
ISBN: 9780792392842


The need for automatic speech recognition systems to be robust with respect to changes in their acoustical environment has become more widely appreciated in recent years, as more systems are finding their way into practical applications. Although the issue of environmental robustness has received only a small fraction of the attention devoted to speaker independence, even speech recognition systems that are designed to be speaker independent frequently perform very poorly when they are tested using a different type of microphone or acoustical environment from the one with which they were trained. The use of microphones other than a "close talking" headset also tends to severely degrade speech recognition performance. Even in relatively quiet office environments, speech is degraded by additive noise from fans, slamming doors, and other conversations, as well as by the effects of unknown linear filtering arising from reverberation from surface reflections in a room, or spectral shaping by microphones or the vocal tracts of individual speakers. Speech recognition systems designed for long-distance telephone lines, or applications deployed in more adverse acoustical environments such as motor vehicles, factory floors, or outdoors, demand far greater degrees of environmental robustness. There are several different ways of building acoustical robustness into speech recognition systems. Arrays of microphones can be used to develop a directionally sensitive system that resists interference from competing talkers and other noise sources that are spatially separated from the source of the desired speech signal.