Citation: WANG Meng, ZHANG Pengyuan. Short-time acoustic scene recognition method using multi-scale feature fusion[J]. ACTA ACUSTICA, 2022, 47(6): 717-726. DOI: 10.15949/j.cnki.0371-0025.2022.06.002
[1] Sawhney N, Maes P. Situational awareness from environmental sounds. Project Rep. for Pattie Maes, 1997:1-7
[2] Chu S, Narayanan S, Kuo C C J et al. Where am I? Scene recognition for mobile robots using audio features. 2006 IEEE International Conference on Multimedia and Expo. IEEE, 2006:885-888
[3] Eronen A J, Peltonen V T, Tuomi J T et al. Audio-based context recognition. IEEE Trans. Audio Speech Lang. Process., 2005; 14(1):321-329
[4] Ma L, Milner B, Smith D. Acoustic environment classification. ACM Trans. Speech Lang. Process., 2006; 3(2):1-22
[5] Jiang H, Bai J, Zhang S et al. SVM-based audio scene classification. 2005 International Conference on Natural Language Processing and Knowledge Engineering. IEEE, 2005:131-136
[6] Zhu Y, Ming Z. SVM-based video scene classification and segmentation. 2008 International Conference on Multimedia and Ubiquitous Engineering (MUE 2008). IEEE, 2008:407-412
[7] Dahl G E, Yu D, Deng L et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process., 2011; 20(1):30-42
[8] Li J, Dai W, Metze F et al. A comparison of deep learning methods for environmental sound detection. Proc. IEEE Int. Conf. Acoust. Speech Signal Process., IEEE, 2017:126-130
[9] Zheng W, Yi J, Xing X et al. Acoustic scene classification using deep convolutional neural network and multiple spectrograms fusion. Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2017
[10] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks. Proc. Adv. Neural Inf. Process. Syst., 2012:1097-1105
[11] Han Y, Park J, Lee K. Convolutional neural networks with binaural representations and background subtraction for acoustic scene classification. Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2017:1-5
[12] Phan H, Koch P, Hertel L et al. CNN-LTE: a class of 1-X pooling convolutional neural networks on label tree embeddings for audio scene classification. Proc. IEEE Int. Conf. Acoust. Speech Signal Process., IEEE, 2017:136-140
[13] Hershey S, Chaudhuri S, Ellis D P W et al. CNN architectures for large-scale audio classification. Proc. IEEE Int. Conf. Acoust. Speech Signal Process., IEEE, 2017:131-135
[14] Sharma J, Granmo O C, Goodwin M. Environment sound classification using multiple feature channels and attention based deep convolutional neural network. Interspeech, Shanghai, China, 2020:1186-1190
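[15] Yang L, Hu J. Audio scene recognition using deep neural networks under multiple optimization mechanisms. Journal of Signal Processing, 2021; 37(10):1969-1976 (in Chinese)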
[16] Dai W, Dai C, Qu S et al. Very deep convolutional neural networks for raw waveforms. Proc. IEEE Int. Conf. Acoust. Speech Signal Process., IEEE, 2017:421-425
[17] Lee J, Park J, Kim K L et al. Sample-level deep convolutional neural networks for music auto-tagging using raw waveforms. Sound and Music Computing Conference, 2017:220-226
[18] Gao W, McDonnell M. Acoustic scene classification using deep residual networks with focal loss and mild domain adaptation. Proc. Detection and Classification of Acoustic Scenes and Events Workshop, 2020
[19] He K, Zhang X, Ren S et al. Deep residual learning for image recognition. Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016:770-778
[20] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, PMLR, 2015:448-456
[21] Wang S, Mesaros A, Heittola T et al. A curated dataset of urban scenes for audio-visual scene analysis. Proc. IEEE Int. Conf. Acoust. Speech Signal Process., IEEE, 2021:626-630
[22] Sutskever I, Martens J, Dahl G et al. On the importance of initialization and momentum in deep learning. International Conference on Machine Learning, PMLR, 2013:1139-1147
[23] Loshchilov I, Hutter F. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016
[24] Van der Maaten L, Hinton G. Visualizing data using t-SNE. J. Mach. Learn. Res., 2008; 9:2579-2605
[25] Yang L, Chen X, Tao L. Acoustic scene classification using multi-scale features. Proc. Detection and Classification of Acoustic Scenes and Events, 2018:29-33
[26] Zhu B, Wang C, Liu F et al. Learning environmental sounds with multi-scale convolutional neural network. 2018 International Joint Conference on Neural Networks (IJCNN), IEEE, 2018:1-8
[27] Hu H, Yang C H H, Xia X et al. A two-stage approach to device-robust acoustic scene classification. Proc. IEEE Int. Conf. Acoust. Speech Signal Process., IEEE, 2021:845-849
[28] Suh S, Park S, Jeong Y et al. Designing acoustic scene classification models with CNN variants. Tech. Rep., Detection and Classification of Acoustic Scenes and Events, 2020
[29] Zhang H, Wu C, Zhang Z et al. ResNeSt: Split-attention networks. arXiv preprint arXiv:2004.08955, 2020
[30] Wang S, Heittola T, Mesaros A et al. Audio-visual scene classification: analysis of DCASE 2021 Challenge submissions. arXiv preprint arXiv:2105.13675, 2021