Citation: ZHANG Cong, YANG Feiran, CHEN Xianmei, YANG Jun. Convolution transfer function-based multi-channel non-negative matrix factorization using generalized Gaussian distributions[J]. ACTA ACUSTICA, 2024, 49(3): 598-610. DOI: 10.12395/0371-0025.2023009
Convolution transfer function-based multi-channel non-negative matrix factorization (CTF-MNMF) has been shown to perform well in blind source separation in highly reverberant environments, but its effectiveness may be limited by its source model. An improved version of CTF-MNMF is proposed in which the generalized Gaussian distribution (GGD) serves as the source model. A domain parameter is introduced into the NMF, and the resulting generalized NMF (GNMF) models the non-negative scale factors of the GGD. This makes the source model more robust to signal outliers and thus improves the accuracy of source estimation. An auxiliary function-based method is used to derive improved update rules for the separation matrix and the non-negative matrix parameters. Simulation results show that the proposed algorithm achieves better separation performance than the GGD-ILRMA, WPE-ILRMA, and CTF-MNMF algorithms for both speech and music input signals.
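The abstract does not state the source model in formulas, so the following is only a minimal sketch of the kind of model it describes, under stated assumptions. It assumes a circularly symmetric complex GGD with shape parameter `beta`, and scale factors `r` tied to NMF matrices `T` (bases) and `V` (activations) through a domain parameter `p` via `r**p = T @ V`, as is common in GGD-based low-rank source models such as GGD-ILRMA. The function names, symbols, and matrix sizes are hypothetical and are not taken from the paper.

```python
import math
import numpy as np


def ggd_scale(T, V, p):
    """Scale factors r with r**p = T @ V (assumed low-rank NMF model, domain parameter p)."""
    return (T @ V) ** (1.0 / p)


def ggd_nll(Y, T, V, beta, p):
    """Negative log-likelihood of complex STFT bins Y under an assumed circular GGD.

    Assumed density per time-frequency bin:
        beta / (2*pi * r**2 * Gamma(2/beta)) * exp(-(|y| / r)**beta)
    """
    r = ggd_scale(T, V, p)
    # Log of the part of the normalizing constant that does not depend on r.
    log_norm = math.log(beta) - math.log(2.0 * math.pi) - math.lgamma(2.0 / beta)
    # Per-bin terms that depend on the scale r.
    data_term = (np.abs(Y) / r) ** beta + 2.0 * np.log(r)
    return float(np.sum(data_term) - Y.size * log_norm)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_freq, n_frames, n_bases = 6, 8, 2  # illustrative sizes only
    T = rng.random((n_freq, n_bases)) + 0.1   # non-negative bases
    V = rng.random((n_bases, n_frames)) + 0.1  # non-negative activations
    Y = rng.standard_normal((n_freq, n_frames)) \
        + 1j * rng.standard_normal((n_freq, n_frames))
    print(ggd_nll(Y, T, V, beta=2.0, p=2.0))  # Gaussian special case
    print(ggd_nll(Y, T, V, beta=1.0, p=1.0))  # heavier-tailed case
```

With `beta = 2` this density reduces to the complex Gaussian; values of `beta` below 2 give heavier tails, so large-magnitude bins are penalized less, which is the sense in which a GGD source model can be more robust to outliers.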