Ziyue Zhao

Contributions to Neural Network-Based Speech Processing: Nonlinear Speech Prediction, Decoder Postprocessing, and Perceptual Loss Functions



ISBN: 978-3-8440-8779-6

Series: Mitteilungen aus dem Institut für Nachrichtentechnik der Technischen Universität Braunschweig
Editors: Prof. Dr.-Ing. U. Reimers, Prof. Dr.-Ing. T. Kürner, and Prof. Dr.-Ing. T. Fingscheidt
Braunschweig

Volume:

Keywords: Speech Coding; Speech Enhancement; Neural Networks

Publication type: Dissertation

Language: English

Pages: 154

Figures: 28

Weight: 228 g

Format: 21 x 14.8 cm

Binding: Paperback

Price: 48.80 €

Publication date: October 2022

Available online documents for this title:

	Document	File type	Cost	Size
	Complete document	PDF	36.60 EUR	1.1 MB (1198555 bytes)
	Table of contents	PDF	free	256 kB (261802 bytes)

Please note that the online documents cannot be printed or edited.


Abstract:

Speech processing technologies are omnipresent in our daily communication products and services. Neural networks, as powerful data-driven models, have shown promising performance in various research fields, including speech processing. This thesis focuses on neural network-based speech processing and is divided into three parts.

First, in the field of speech prediction, a nonlinear speech predictor based on the echo state network (ESN) is proposed as a novel adaptive prediction approach. In the simulations, the proposed nonlinear predictor shows better prediction performance than all baseline prediction methods, including a predictor based on a long short-term memory (LSTM) structure.

Second, in the field of neural network-based speech enhancement, the focus is on loss functions. A novel perceptual weighting filter (PWF) loss function, motivated by the weighting filter used in code-excited linear prediction (CELP) speech coding, is proposed. A fully connected neural network (FCNN) and a convolutional neural network (CNN) are both used to evaluate the proposed loss functions, and the simulation results show their superior performance compared to the baselines.

Finally, neural network-based postprocessing for the enhancement of coded speech is studied. CNN-based postprocessors are proposed either to enhance the raw waveform directly in an end-to-end fashion or to enhance cepstral-domain features using analysis-synthesis. Furthermore, an advanced network structure, the fully convolutional recurrent network (FCRN), is used to enhance coded speech in the frequency domain, with the PWF loss function applied to advantage. The experimental results confirm the effectiveness of the proposed postprocessors, which improve speech quality.
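The abstract does not give the formulation of the PWF loss, so the following is only a minimal time-domain sketch of the underlying idea, not the implementation from the thesis. It assumes the common CELP form of the weighting filter, W(z) = A(z/γ1) / A(z/γ2), where A(z) is the linear prediction polynomial of the clean speech, and it measures the mean squared error after filtering the error signal with W(z). The function names, the prediction order, the values γ1 = 0.9 and γ2 = 0.6, and the synthetic test signal are all illustrative assumptions.

```python
import math
import random


def autocorr(x, order):
    """Autocorrelation r[0..order] of a finite sequence."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """LPC polynomial A(z) = 1 + a1 z^-1 + ... + ap z^-p as [1, a1, ..., ap]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error power
    return a


def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    return [c * gamma ** k for k, c in enumerate(a)]


def iir_filter(b, a, x):
    """y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k], assuming a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


def pwf_loss(clean, enhanced, order=10, gamma1=0.9, gamma2=0.6):
    """Mean squared error after weighting the error with W(z) (assumed form)."""
    r = autocorr(clean, order)
    r[0] *= 1.0001                          # small white-noise correction (assumption)
    a = levinson_durbin(r, order)
    num = bandwidth_expand(a, gamma1)       # A(z/gamma1)
    den = bandwidth_expand(a, gamma2)       # A(z/gamma2)
    error = [e - c for c, e in zip(clean, enhanced)]
    weighted = iir_filter(num, den, error)
    return sum(w * w for w in weighted) / len(weighted)


# Synthetic stand-in for one speech frame: a noise-driven first-order AR process.
rng = random.Random(0)
clean = [0.0]
for _ in range(399):
    clean.append(0.9 * clean[-1] + rng.gauss(0.0, 1.0))
enhanced = [c + 0.1 * rng.gauss(0.0, 1.0) for c in clean]

print("PWF loss :", pwf_loss(clean, enhanced))
print("plain MSE:", sum((e - c) ** 2 for c, e in zip(clean, enhanced)) / len(clean))
```

In this sketch the weighting de-emphasizes error energy near the spectral peaks of the clean signal, where it is more likely to be masked, relative to the spectral valleys. Setting γ1 equal to γ2 makes W(z) = 1, so the loss reduces to the plain mean squared error. A training loss for a neural network would additionally have to be written in a differentiable framework and would typically be computed per frame.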