Trainable Text-to-Speech Synthesis for European Portuguese

This dissertation implements a hidden Markov model (HMM) based text-to-speech (TTS) synthesis system for European Portuguese (EP). It reviews the main approaches to speech synthesis and the use of HMMs in the field: their history is outlined, the main techniques used to apply them to speech synthesis are explained, and the speech parameter generation algorithm is described. The implemented TTS system for EP is then presented, together with an analysis of the EP language and its phonetic inventory. The natural language processing (NLP) module is described, with a brief account of the use of maximum entropy (ME) in NLP and the results obtained for EP. The EP language-dependent module is described with reference to the contextual factors of the language and the decision tree questions used for phoneme clustering. The speech synthesis module is described, along with the training process of a synthesis system in which the speech parameters are generated from the HMMs themselves. The system is evaluated with a mean opinion score (MOS) test that assesses its acceptability and compares it with two other systems. To improve the results, a speech corpus specially designed for context-based EP TTS systems is proposed, and a subset of the complete corpus, organized as orthographic sentences together with their phonetic transcriptions, is presented in the dissertation. Two further improvements to the system are suggested: a hybrid system that uses the residual signal in place of the pulse train used for voiced sounds in the source-filter model of the speech synthesizer; and a method for handling foreign words, based on mapping EP phonemes to phonemes of other languages or dialects.