Abstract:
A Chinese text-to-speech system based on the time-domain Pitch-Synchronous Overlap-Add (PSOLA) method, a Chinese syllable dictionary, and a prosodic rule dictionary can produce very clear and natural Chinese speech. Research on the naturalness of synthetic Chinese shows that pitch, energy, syllable duration, and coarticulation between syllables are the main factors affecting naturalness, and that pitch and duration play the most important roles among them. The time-domain PSOLA scheme provides a method for modifying the pitch and duration of a speech segment directly in the time domain, which makes it possible to adjust the prosody of speech at the word level and the sentence level when synthesizing Chinese by waveform concatenation. Acoustic analysis of news broadcast speech provides the theoretical basis for building the prosodic rules. This paper presents the flowchart of the new Chinese text-to-speech system, the results of the acoustic analysis of news broadcast speech, the prosodic rules of the new system, and the evaluation results for the speech quality of the new system.
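
To illustrate how a time-domain PSOLA scheme can change pitch and duration independently, the following is a minimal sketch in Python and not the system's actual implementation. It assumes a fully voiced segment with a constant, known pitch period, so that pitch marks fall at regular intervals. A real system would need pitch-mark detection and handling of unvoiced sounds. The function name `td_psola` and the parameters `period`, `pitch_factor`, and `duration_factor` are hypothetical names chosen for this example.

```python
import numpy as np

def td_psola(x, period, pitch_factor=1.0, duration_factor=1.0):
    """Simplified TD-PSOLA (illustrative sketch).

    Assumptions: `x` is voiced, its pitch period is constant and equal to
    `period` samples, and len(x) > 2 * period.
    pitch_factor > 1 raises the pitch; duration_factor > 1 lengthens the segment.
    """
    x = np.asarray(x, dtype=float)
    # Analysis pitch marks: one per period, kept away from the signal edges.
    marks = np.arange(period, len(x) - period, period)
    # Spacing of synthesis pitch marks sets the new pitch.
    new_period = max(1, int(round(period / pitch_factor)))
    out_len = int(round(len(x) * duration_factor))
    # Each short-term signal spans two periods, centred on a pitch mark.
    window = np.hanning(2 * period + 1)
    y = np.zeros(out_len)
    norm = np.zeros(out_len)
    for t_s in range(period, out_len - period, new_period):
        # Map the synthesis instant back to the analysis time axis and
        # pick the nearest analysis pitch mark (segments repeat or drop).
        t_a = t_s / duration_factor
        m = marks[np.argmin(np.abs(marks - t_a))]
        segment = x[m - period : m + period + 1] * window
        # Overlap-add the windowed segment at the synthesis mark.
        y[t_s - period : t_s + period + 1] += segment
        norm[t_s - period : t_s + period + 1] += window
    # Compensate for the varying window overlap; avoid division by zero.
    norm[norm < 1e-8] = 1.0
    return y / norm

# Example: a 100 Hz tone sampled at 8 kHz (pitch period of 80 samples),
# raised in pitch by 25% and lengthened by 50%.
fs = 8000
tone = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)
modified = td_psola(tone, period=80, pitch_factor=1.25, duration_factor=1.5)
```

In this sketch the spacing of the synthesis marks controls the pitch, while the mapping from synthesis time back to analysis time controls the duration, so the two prosodic parameters can be adjusted separately for each concatenated syllable.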