Ngoc Thang Vu: Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
ISBN: 978-3-8440-2892-8
Series: Informatik
Keywords: Automatic speech recognition; Multilingualism; Speech and language processing
Publication type: Dissertation
Language: English
Pages: 206
Figures: 45
Weight: 305 g
Format: 24 x 17 cm
Binding: Paperback
Price: 49.80 €
Publication date: July 2014
Abstract: This thesis explores methods to rapidly bootstrap automatic speech recognition (ASR) systems for languages that lack resources for speech and language processing, so-called low-resource languages. We focus on approaches that use data from multiple languages to improve ASR systems for those languages at different levels, such as feature extraction, acoustic modeling, and language modeling. From an application perspective, this thesis also includes research on non-native and Code-Switching speech, both of which have become more common in the modern world. The main contributions of this thesis are as follows:

Building an ASR system without transcribed audio data: We developed a multilingual unsupervised training framework that allows building ASR systems without transcribed audio data. Several existing ASR systems from different languages were used in combination with cross-language transfer techniques and unsupervised training to iteratively transcribe the audio data of the target language and thereby bootstrap ASR systems. The key contribution is a word-based confidence score called "Multilingual A-stabil", which works well not only with well-trained acoustic models but also with poorly estimated ones, such as acoustic models borrowed from other languages to bootstrap recognition of an unseen language. The experimental results showed that it is possible to build ASR systems for new languages without any transcribed data, even if the source and target languages are not related.

Multilingual Bottle-Neck features: We explored multilingual Bottle-Neck (BN) features and their application to rapid adaptation to new languages. Our results revealed that using a multilingual multilayer perceptron (MLP) to initialize the MLP training for a new language improved the MLP performance and, therefore, the ASR performance. Finally, visualizing the features with t-SNE led to a better understanding of the multilingual BN features.

Improving ASR performance on non-native speech using multilingual and crosslingual information: This part presents our exploration of multilingual and crosslingual information to improve ASR performance on non-native speech. We showed that a multilingual ASR system consistently outperforms a monolingual ASR system on non-native speech. Finally, we proposed a method called cross-lingual accent adaptation, which improves ASR performance on non-native speech without any adaptation data; with this approach, we achieved substantial improvements over the baseline system.

Multilingual deep neural network based acoustic modeling for rapid language adaptation: This thesis comprises an investigation of multilingual deep neural network (DNN) based acoustic modeling and its application to new languages. We investigated the effect of phone merging on multilingual DNNs in the context of rapid language adaptation, as well as the combination of multilingual DNNs with Kullback–Leibler divergence based acoustic modeling (KL-HMM). Our studies revealed that KL-HMM based decoding consistently outperformed conventional hybrid decoding, especially in low-resource scenarios. Furthermore, we found that multilingual DNN training benefits equally from simple phone set concatenation and from a manually derived universal phone set based on IPA.
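As a rough illustration of the KL-HMM decoding idea summarized above, the following Python sketch scores one frame of DNN phone posteriors against an HMM state's trained multinomial distribution via the Kullback–Leibler divergence. This is a minimal sketch under generic assumptions, not the thesis's implementation; the function names and the toy numbers are hypothetical.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) between two discrete distributions over phone classes."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def kl_hmm_state_cost(state_multinomial, dnn_posterior):
    """Local cost of an HMM state for one frame in a KL-HMM system:
    the divergence between the state's trained multinomial distribution
    and the phone posteriors the DNN estimated for this frame.
    (The argument order of the divergence is itself a modeling choice.)"""
    return kl_divergence(state_multinomial, dnn_posterior)

# Toy example with 3 phone classes; all numbers are made up.
state = np.array([0.7, 0.2, 0.1])   # trained state multinomial
frame = np.array([0.6, 0.3, 0.1])   # multilingual DNN output for one frame
print(kl_hmm_state_cost(state, frame))  # lower cost = better match
```

During decoding, such per-frame state costs replace the conventional hybrid DNN/HMM scores, which is what the comparison between KL-HMM based and hybrid decoding above refers to.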
Multilingual language modeling for Code-Switching speech: We investigated the integration of high-level features, such as part-of-speech tags and language identifiers, into language models for Code-Switching speech. Our results showed that using these features in state-of-the-art language modeling techniques, such as recurrent neural network and factored language models, improved the perplexity and the mixed error rate on Code-Switching speech. Moreover, the interpolation of these two LMs gave the best performance on the SEAME database (a generic sketch of such interpolation follows at the end of this abstract). Finally, we showed that Code-Switching is speaker dependent and that Code-Switching attitude dependent language modeling therefore further improved the perplexity and the mixed error rate.

We believe that our findings will have an increasing impact over time, not only for research but also for industry. The results can be used to save cost and development time when building a speech recognizer for a new language. In addition, the contributions of this thesis on non-native and Code-Switching speech will become more important with rapidly growing globalization.
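To make the LM interpolation referenced above concrete, here is a minimal Python sketch that linearly combines per-word probabilities from two language models and compares perplexities. The weight lam and the per-word probabilities are made-up placeholders, standing in only schematically for the recurrent neural network and factored LMs used in the thesis.

```python
import math

def interpolate(p_a, p_b, lam):
    """Linear interpolation of per-word probabilities from two LMs:
    P(w|h) = lam * P_a(w|h) + (1 - lam) * P_b(w|h)."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_a, p_b)]

def perplexity(word_probs):
    """Perplexity = exp(-(1/N) * sum_i log P(w_i | h_i))."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))

# Made-up per-word probabilities for a short test utterance.
p_rnn = [0.12, 0.05, 0.30, 0.08]   # stand-in for an RNN LM
p_flm = [0.10, 0.09, 0.20, 0.15]   # stand-in for a factored LM
mixed = interpolate(p_rnn, p_flm, lam=0.6)
print(perplexity(p_rnn), perplexity(p_flm), perplexity(mixed))
```

In practice the weight would be tuned on held-out data; the sketch only shows why a well-chosen mixture can yield a lower perplexity than either component model alone.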