Andrea Schnall

Speaker Adaptation for Word Prominence Detection


ISBN:

978-3-8440-5921-2

Series:

Informationstechnik

Keywords:

Prosody; Word Prominence Detection; Speaker Adaptation; fMLLR; SVM

Publication type:

Dissertation

Language:

English

Pages:

140 pages

Figures:

32 figures

Weight:

197 g

Format:

21 x 14.8 cm

Binding:

Paperback

Price:

45.80 €

Publication date:

May 2018

Abstract:

The goal of this dissertation is to investigate methods for word prominence detection in speech. In human communication, prosodic cues such as word prominence play an important role: we emphasize words to mark them as important and to indicate the informational focus of a sentence. Speech recognition systems currently do not use this information and are therefore error-prone and not very intuitive.

In this thesis, a system that distinguishes prominent from non-prominent words is presented. Several feature choices in the audio and video domains are investigated, and several classifiers with different characteristics are examined. One aspect evaluated here is the use of context information at the feature level as well as at the classifier level. It is shown that the neighboring words carry a substantial amount of information; therefore, the whole sequence should be used for classification.
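
As a rough illustration of context at the feature level, the sketch below concatenates each word's feature vector with those of its neighbors. It is not taken from the thesis: the function name `stack_context`, the zero-padding at utterance boundaries, the context width, and the example feature values are all assumptions made for illustration.

```python
from typing import List, Sequence


def stack_context(
    word_features: Sequence[Sequence[float]], width: int = 1
) -> List[List[float]]:
    """Concatenate each word's features with those of `width` neighbors per side.

    Positions that fall outside the utterance are zero-padded (an assumption;
    other padding schemes are possible).
    """
    if not word_features:
        return []
    dim = len(word_features[0])
    padding = [0.0] * dim
    stacked: List[List[float]] = []
    for i in range(len(word_features)):
        row: List[float] = []
        for offset in range(-width, width + 1):
            j = i + offset
            if 0 <= j < len(word_features):
                row.extend(word_features[j])
            else:
                row.extend(padding)
        stacked.append(row)
    return stacked


# Hypothetical per-word features, e.g. [pitch, duration]; the values are made up.
utterance = [[0.2, 0.1], [0.9, 0.4], [0.3, 0.2]]
context_vectors = stack_context(utterance, width=1)
```

Each resulting vector could then be passed to any word-level classifier, so that the decision for one word also sees its left and right neighbors.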

The study is especially concerned with the performance difference between systems trained speaker-dependently and those trained speaker-independently. To overcome the variation across a pool of speakers and the resulting loss in performance, a new adaptation method is presented. Common speaker adaptation methods used in speech processing are designed for classifiers based on Gaussian Mixture Models and Hidden Markov Models. This thesis shows that, for word prominence detection, a discriminative classifier such as a Support Vector Machine performs best, but that such classifiers have not yet been combined adequately with common speaker adaptation methods. Therefore, a new method based on Support Vector Machines with a Radial Basis Function kernel is presented and evaluated, together with two extensions. Ultimately, the thesis shows that this method can significantly improve performance for speaker-independent classification when only a small amount of speaker-specific data is available.
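
For readers unfamiliar with the Radial Basis Function kernel mentioned above, the following minimal sketch computes its standard form, exp(-gamma * ||x - y||^2). It shows only the kernel itself, not the adaptation method of the thesis; the function name `rbf_kernel` and the default `gamma` value are assumptions for illustration.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same dimension")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Identical vectors give the maximum similarity of 1.0;
# similarity decays toward 0 as the vectors move apart.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
apart = rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=1.0)
```

Because the kernel depends only on distances between feature vectors, shifts in a speaker's feature distribution directly change the similarities the classifier sees, which is one reason speaker variation matters for such a model.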