Indexed in EI / SCOPUS / CSCD

Chinese Core Journal

A Chinese text-to-speech system with high intelligibility and high naturalness

  • Abstract: This paper describes a Chinese text-to-speech system built on the time-domain Pitch-Synchronous Overlap-Add (PSOLA) technique. The system uses Chinese monosyllables as its synthesis units and a prosodic rule base covering word-level tone patterns, stress patterns, and sentence intonation patterns, and it produces highly intelligible and natural Chinese speech. Research shows that the naturalness of synthetic Chinese speech depends mainly on how pitch and intensity vary over time, on the distribution of syllable durations, and on coarticulation between syllables; of these, pitch and duration have the strongest effect. Time-domain PSOLA provides a way to modify the pitch and duration of a speech waveform directly in the time domain, which makes word-level and sentence-level prosodic adjustment possible when Chinese is synthesized by waveform concatenation. Acoustic analysis of news broadcast speech provides the theoretical basis for the prosodic adjustment rules. The paper presents the structure and processing flow of the new system, preliminary results on the prosodic features of broadcast speech, the Chinese synthesis rules, and an evaluation of the system's speech quality.
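The abstract names time-domain PSOLA as the mechanism for changing pitch and duration, so a minimal sketch of that general technique may help. It is not the authors' implementation: the function names `hann` and `td_psola`, the two-period Hann grain, and the assumption that pitch marks are already known (one sample index per pitch period) are all illustrative choices made here.

```python
import math


def hann(n):
    """Symmetric Hann window with n samples."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def td_psola(signal, marks, pitch_scale=1.0, duration_scale=1.0):
    """Overlap-add two-period grains taken at pitch marks (hypothetical helper).

    signal: list of float samples.
    marks: sorted integer sample indices, one per pitch period (assumed given).
    pitch_scale > 1 raises the pitch; duration_scale > 1 lengthens the segment.
    """
    if len(marks) < 3:
        raise ValueError("need at least three pitch marks")
    out = [0.0] * int(round(len(signal) * duration_scale))
    inner = range(1, len(marks) - 1)  # marks that have a neighbour on each side
    t_out = marks[0] * duration_scale  # position of the next synthesis mark
    while t_out < len(out):
        # Map the synthesis time back to the input and pick the nearest mark.
        t_in = t_out / duration_scale
        k = min(inner, key=lambda i: abs(marks[i] - t_in))
        period = marks[k + 1] - marks[k]
        window = hann(2 * period + 1)
        center = int(round(t_out))
        # Copy the windowed grain around marks[k] to the synthesis position.
        for j in range(-period, period + 1):
            src, dst = marks[k] + j, center + j
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += signal[src] * window[j + period]
        # Grain spacing sets the output pitch period.
        t_out += period / pitch_scale
    return out
```

In this sketch, duration changes come from repeating or skipping grains while the grain spacing stays at the local pitch period, and pitch changes come from altering the spacing between grains. A practical system would also need pitch-mark detection and handling of unvoiced segments, which are left out here.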

     
