Abstract:
This paper proposes a method for reconstructing the time-domain speech signal from Mel-frequency cepstral coefficients (MFCCs) based on an L1/2 sparse constraint. Because estimating the speech amplitude spectrum from MFCCs is an underdetermined problem, existing methods usually estimate the amplitude spectrum either by minimizing the mean square error or by imposing an L1 regularization sparse constraint. Compared with L1 regularization, L1/2 regularization has a stronger ability to recover the sparse components of a speech signal. The proposed method therefore applies an L1/2 regularization constraint when estimating the amplitude spectrum from MFCCs. The phase spectrum is then estimated from the estimated sparse amplitude spectrum, and the time-domain speech signal is finally reconstructed from the estimated amplitude and phase spectra. Experimental results show that speech reconstructed by the proposed method has higher quality than speech reconstructed by the minimum mean square error method. In particular, the proposed method outperforms the L1 regularization method in speech quality under noisy conditions, indicating that it is robust to noise.
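
To make the L1/2-regularized estimation step concrete, the following is a minimal sketch and not the authors' implementation. It assumes that Mel filterbank energies have already been recovered from the MFCCs (for example by an inverse DCT followed by exponentiation), and that the amplitude spectrum `x` satisfies the linear model `mel_fb @ x ≈ mel_energy`. The solver shown is iterative half-thresholding, a commonly used algorithm for L1/2 regularization; the abstract does not state which solver is actually used. The names `half_threshold`, `estimate_sparse_spectrum`, `mel_fb`, `mel_energy`, `lam`, and `n_iter` are hypothetical.

```python
import numpy as np


def half_threshold(t, lam):
    """Elementwise minimizer of (x - t)**2 + lam * |x|**0.5 (half-thresholding)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    # Entries with magnitude at or below this threshold are set exactly to zero.
    thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    big = np.abs(t) > thresh
    phi = np.arccos((lam / 8.0) * (np.abs(t[big]) / 3.0) ** (-1.5))
    out[big] = (2.0 / 3.0) * t[big] * (
        1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0)
    )
    return out


def estimate_sparse_spectrum(mel_energy, mel_fb, lam=1e-2, n_iter=200):
    """Approximately minimize ||mel_fb @ x - mel_energy||^2 + lam * sum(|x|**0.5).

    Assumption: mel_fb has shape (n_mel, n_bins) with n_mel < n_bins, so the
    problem is underdetermined and the sparsity penalty selects a solution.
    Nonnegativity of the amplitude spectrum is not enforced in this sketch.
    """
    mel_fb = np.asarray(mel_fb, dtype=float)
    mel_energy = np.asarray(mel_energy, dtype=float)
    # Step size kept below 1 / ||mel_fb||_2^2 so the objective does not increase.
    step = 0.99 / np.linalg.norm(mel_fb, 2) ** 2
    x = np.zeros(mel_fb.shape[1])
    for _ in range(n_iter):
        # Gradient-style step on the data term, then half-thresholding.
        t = x + step * (mel_fb.T @ (mel_energy - mel_fb @ x))
        x = half_threshold(t, lam * step)
    return x
```

A possible usage, with a toy 4-by-8 matrix of overlapping triangular filters standing in for a real Mel filterbank, is `x_hat = estimate_sparse_spectrum(mel_fb @ x_true, mel_fb)`. Larger values of `lam` drive more spectral bins exactly to zero, which is the sparsity-promoting behavior the abstract attributes to L1/2 regularization.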