References
- A. Khamparia, D. Gupta, N. G. Nguyen, A. Khanna, B. Pandey and P. Tiwari, "Sound Classification Using Convolutional Neural Network and Tensor Deep Stacking Network," IEEE Access, vol. 7, pp. 7717-7727, 2019. doi: https://doi.org/10.1109/ACCESS.2018.2888882
- K. Jaiswal and D. Kalpeshbhai Patel, "Sound Classification Using Convolutional Neural Networks," 2018 IEEE International Conference on Cloud Computing in Emerging Markets (CCEM), pp. 81-84, 2018. doi: https://doi.org/10.1109/CCEM.2018.00021
- P. Tzirakis, J. Zhang and B. W. Schuller, "End-to-End Speech Emotion Recognition Using Deep Neural Networks," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5089-5093, 2018. doi: https://doi.org/10.1109/ICASSP.2018.8462677
- Z. Zhang, S. Xu, T. Qiao, S. Zhang and S. Cao, "Attention Based Convolutional Recurrent Neural Network for Environmental Sound Classification," 2019. arXiv:1907.02230
- S. Wyatt et al., "Environmental Sound Classification with Tiny Transformers in Noisy Edge Environments," 2021 IEEE 7th World Forum on Internet of Things (WF-IoT), pp. 309-314, 2021. doi: https://doi.org/10.1109/WF-IoT51360.2021.9596007
- A. Dosovitskiy, L. Beyer et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale," 2020. arXiv:2010.11929
- Z. Yue, Y. Wang, J. Duan, T. Yang, C. Huang, Y. Tong and B. Xu, "TS2Vec: Towards Universal Representation of Time Series," 2021. arXiv:2106.10466
- S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997. doi: https://doi.org/10.1162/neco.1997.9.8.1735
- B. Shi, X. Bai and C. Yao, "An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 11, pp. 2298-2304, Nov. 2017. doi: https://doi.org/10.1109/TPAMI.2016.2646371
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser and I. Polosukhin, "Attention Is All You Need," 31st Conference on Neural Information Processing Systems (NIPS 2017), 2017.
- S. Bai, J. Z. Kolter and V. Koltun, "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling," 2018. arXiv:1803.01271
- A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Communications of the ACM, vol. 60, no. 6, pp. 84-90, June 2017. doi: https://doi.org/10.1145/3065386
- Y. Bengio, J. Louradour, R. Collobert and J. Weston, "Curriculum Learning," Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pp. 41-48, June 2009.
- M. P. Kumar, B. Packer and D. Koller, "Self-Paced Learning for Latent Variable Models," Advances in Neural Information Processing Systems 23 (NIPS 2010), pp. 1189-1197, 2010.
- Q. Wen, L. Sun et al., "Time Series Data Augmentation for Deep Learning: A Survey," Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI), Survey Track, pp. 4653-4660, 2021. doi: https://doi.org/10.24963/ijcai.2021/631
- D. S. Park et al., "SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition," Proc. Interspeech 2019, pp. 2613-2617, 2019.
- D. S. Park et al., "SpecAugment on Large Scale Datasets," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6879-6883, 2020. doi: https://doi.org/10.1109/ICASSP40776.2020.9053205
- X. Song, Z. Wu, Y. Huang, D. Su and H. Meng, "SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition," 2019. arXiv:1912.05533
- H. Wang, Y. Zou and W. Wang, "SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification," 2021. arXiv:2103.16858
- B. Kang, S. Xie, M. Rohrbach, Z. Yan, A. Gordo, J. Feng and Y. Kalantidis, "Decoupling Representation and Classifier for Long-Tailed Recognition," ICLR 2020.
- Y. Cui, M. Jia, T.-Y. Lin, Y. Song and S. Belongie, "Class-Balanced Loss Based on Effective Number of Samples," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9260-9269, 2019. doi: https://doi.org/10.1109/CVPR.2019.00949
- T.-Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollár, "Focal Loss for Dense Object Detection," 2017 IEEE International Conference on Computer Vision (ICCV), pp. 2999-3007, 2017. doi: https://doi.org/10.1109/ICCV.2017.324
- K. Cao, C. Wei, A. Gaidon, N. Arechiga and T. Ma, "Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss," Proceedings of the 33rd International Conference on Neural Information Processing Systems, Article No. 140, pp. 1567-1578, December 2019.
- W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj and L. Song, "SphereFace: Deep Hypersphere Embedding for Face Recognition," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6738-6746, 2017. doi: https://doi.org/10.1109/CVPR.2017.713
- H. Wang et al., "CosFace: Large Margin Cosine Loss for Deep Face Recognition," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5265-5274, 2018. doi: https://doi.org/10.1109/CVPR.2018.00552
- J. Deng, J. Guo, N. Xue and S. Zafeiriou, "ArcFace: Additive Angular Margin Loss for Deep Face Recognition," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4685-4694, 2019. doi: https://doi.org/10.1109/CVPR.2019.00482
- J. Salamon, C. Jacoby and J. P. Bello, "A Dataset and Taxonomy for Urban Sound Research," 22nd ACM International Conference on Multimedia, Orlando, USA, Nov. 2014.
- S. R. Livingstone and F. A. Russo, "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English," PLoS ONE, vol. 13, no. 5, e0196391, 2018. doi: https://doi.org/10.1371/journal.pone.0196391
- J. J. Bosch, J. Janer, F. Fuhrmann and P. Herrera, "A Comparison of Sound Segregation Techniques for Predominant Instrument Recognition in Musical Audio Signals," Proc. ISMIR, pp. 559-564, 2012.