Indexed by EI / SCOPUS / CSCD

Chinese Core Journal

ZHOU Jian, DOU Yunfeng, LIU Rongmin, WANG Huabin, TAO Liang. Whisper to normal conversion based on low dimension feature mapping[J]. ACTA ACUSTICA, 2018, 43(5): 855-863. DOI: 10.15949/j.cnki.0371-0025.2018.05.017

Whisper to normal conversion based on low dimension feature mapping

More Information
  • PACS: 
    • 43.28  (Aeroacoustics and atmospheric sound)
    • 43.20  (General linear acoustics)
  • Received Date: May 19, 2017
  • Revised Date: July 08, 2017
  • Available Online: June 27, 2022
  • Abstract: To characterize the relationship between whispered speech and its corresponding normal speech for whisper-to-normal conversion, low-dimension features of the spectral envelopes of whisper and normal speech are extracted and represented by a sparse auto-encoder. In the low-dimension space, two BP networks are then trained: one models the spectral relation between the whisper and its corresponding normal speech, and the other models the relation between the whisper spectrum and the pitch of the normal speech. In the conversion stage, the spectral envelope of the whisper is sparsely encoded to obtain a low-dimension spectral-envelope feature. The low-dimension normal-speech feature and the pitch are then estimated through the two trained BP networks. With sparse decoding, the spectral envelope of the normal speech is obtained and used to reconstruct the normal speech. Experimental results show that the cepstral distance of the normal speech estimated by the proposed method decreases by 10% compared with that of the GMM-based method. Subjective listening tests also show that the proposed method achieves better naturalness and intelligibility.
  • Related Articles

    [1]CHEN Lele, ZHANG Xiongwei, SUN Meng, ZHANG Xingyu. Noise robust voice conversion with the fusion of Mel-spectrum enhancement and feature disentanglement[J]. ACTA ACUSTICA, 2023, 48(5): 1070-1080. DOI: 10.12395/0371-0025.2022093
    [2]LIAN Hailun, ZHOU Jian, HU Yuting, ZHENG Wenming. Whisper to normal speech conversion using deep convolutional neural networks[J]. ACTA ACUSTICA, 2020, 45(1): 137-144. DOI: 10.15949/j.cnki.0371-0025.2020.01.017
    [3]LI Na, ZENG Xiangyang, QIAO Yu, LI Zhifeng. Voice conversion using bayesian analysis and dynamic kernel features[J]. ACTA ACUSTICA, 2015, 40(3): 455-461. DOI: 10.15949/j.cnki.0371-0025.2015.03.013
    [4]LI Yangchun, YU Yibiao. Voice conversion using structured Gaussian mixture model in eigen space[J]. ACTA ACUSTICA, 2015, 40(1): 12-19. DOI: 10.15949/j.cnki.0371-0025.2015.01.002
    [5]CHEN Xueqin, ZHAO Heming. Research of whispered speech vocal tract system conversion based on universal background model and effective Gaussian components[J]. ACTA ACUSTICA, 2013, 38(2): 195-200. DOI: 10.15949/j.cnki.0371-0025.2013.02.008
    [6]TAO Zhi, ZHAO Heming, TAN Xuedan, GU Jihua, ZHANG Xiaojun, WU Di. Research of conversion from whispered speech to normal speech by the extended bilinear transformation[J]. ACTA ACUSTICA, 2012, 37(6): 651-658. DOI: 10.15949/j.cnki.0371-0025.2012.06.011
    [7]YU Yibiao, CENG Daojian, JIANG Ying. Voice conversion based on isolated speaker model[J]. ACTA ACUSTICA, 2012, 37(3): 346-352. DOI: 10.15949/j.cnki.0371-0025.2012.03.011
    [8]CHEN Cunbao, ZHAO Li, ZOU Cairong. Speaker verification using speaker model synthesis and feature mapping based on maximum-likelihood linear regression[J]. ACTA ACUSTICA, 2011, 36(1): 81-87. DOI: 10.15949/j.cnki.0371-0025.2011.01.010
    [9]WANG Huanliang, HAN Jiqing, LI Haifeng. Robust endpoint detection based on feature weighted likelihood and dimension reduction[J]. ACTA ACUSTICA, 2007, 32(1): 62-68. DOI: 10.15949/j.cnki.0371-0025.2007.01.008
    [10]KANG Yongguo, SHUANG Zhiwei, TAO Jianhua, ZHANG Wei. A hybrid method to convert acoustic features for voice conversion[J]. ACTA ACUSTICA, 2006, 31(6): 555-562. DOI: 10.15949/j.cnki.0371-0025.2006.06.014
  • Cited by

    Periodical cited type(2)

    1. ZHANG Jin, FENG Ping. Design of an automatic rhythm feature extraction system for film and animation dubbing[J]. Modern Electronics Technique, 2020(18): 59-63.
    2. PANG Cong, LIAN Hailun, ZHOU Jian, WANG Huabin, TAO Liang. A whisper-to-normal speech conversion method based on feature fusion[J]. Journal of Nanjing University of Aeronautics & Astronautics, 2020(05): 777-782.

    Other cited types(4)
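The conversion stage described in the abstract can be sketched as a small numpy pipeline. This is a minimal illustration, not the authors' implementation: the dimensions, weight matrices, and activation choices below are hypothetical placeholders standing in for the trained sparse auto-encoder and the two BP mapping networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical dimensions: a 513-bin spectral envelope, a 64-dim sparse code.
D_SPEC, D_CODE = 513, 64

# Random placeholders for weights that would come from training:
W_enc = rng.standard_normal((D_CODE, D_SPEC)) * 0.01   # sparse encoder (whisper)
W_map = rng.standard_normal((D_CODE, D_CODE)) * 0.01   # BP net: whisper code -> normal code
W_f0  = rng.standard_normal((1, D_CODE)) * 0.01        # BP net: whisper code -> pitch
W_dec = rng.standard_normal((D_SPEC, D_CODE)) * 0.01   # sparse decoder (normal speech)

def convert_frame(whisper_env):
    """Conversion stage for one spectral-envelope frame of whispered speech."""
    code_w = sigmoid(W_enc @ whisper_env)        # sparse encoding of whisper envelope
    code_n = sigmoid(W_map @ code_w)             # estimated low-dimension normal-speech feature
    f0 = float(np.exp(W_f0 @ code_w)[0])         # estimated pitch (exp keeps it positive)
    normal_env = W_dec @ code_n                  # sparse decoding back to a full envelope
    return normal_env, f0

env, f0 = convert_frame(rng.standard_normal(D_SPEC))
```

The recovered envelope and pitch would then drive a vocoder-style synthesis step to reconstruct the normal speech, which this sketch omits.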
