References
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.
- A. Fernandez-Lopez and F. M. Sukno, "Survey on automatic lip-reading in the era of deep learning," Image and Vision Computing, vol. 78, pp. 53-72, 2018.
- J. A. Gonzalez-Lopez, A. Gomez-Alanis, J. M. Martin-Donas, J. L. Perez-Cordoba, and A. M. Gomez, "Silent Speech Interfaces for Speech Restoration: A Review," IEEE Access, vol. 8, pp. 177995-178021, 2020.
- M. Hao, M. Mamut, N. Yadikar, A. Aysa, and K. Ubul, "A Survey of Research on Lipreading Technology," IEEE Access, vol. 8, pp. 204518-204544, 2020.
- S. Fenghour, D. Chen, K. Guo, B. Li, and P. Xiao, "Deep Learning-Based Automated Lip-Reading: A Survey," IEEE Access, vol. 9, pp. 121184-121205, 2021.
- K. Paliwal, K. Wojcicki, and B. Shannon, "The importance of phase in speech enhancement," Speech Communication, vol. 53, no. 4, pp. 465-494, 2011.
- M. Wollmer, B. Schuller, F. Eyben, and G. Rigoll, "Combining Long Short-Term Memory and Dynamic Bayesian Networks for Incremental Emotion-Sensitive Artificial Listening," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 5, pp. 867-881, 2010.
- J. T. Geiger, F. Weninger, J. F. Gemmeke, M. Wollmer, B. Schuller, and G. Rigoll, "Memory-Enhanced Neural Networks and NMF for Robust ASR," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 6, pp. 1037-1046, 2014.
- Y. Qian, M. Bi, T. Tan, and K. Yu, "Very deep convolutional neural networks for noise robust speech recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 12, pp. 2263-2276, 2016.
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 30-42, Jan. 2012.
- G. Trigeorgis et al., "Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Shanghai, China, 2016, pp. 5200-5204.
- H. Joo and K. Lee, "Estimating speech parameters for ultrasonic Doppler signal using LSTM recurrent neural networks," The Journal of the Acoustical Society of Korea, vol. 38, no. 4, pp. 433-441, 2019.
- Z. Zhang, N. Cummins, and B. Schuller, "Advanced Data Exploitation in Speech Analysis: An overview," IEEE Signal Processing Magazine, vol. 34, no. 4, pp. 107-129, July 2017.
- K. Lee, "An acoustic Doppler-based silent speech interface technology using generative adversarial networks'" The Journal of the Acoustical Society of Korea. vol.40, no.2, pp. 161-168, 2021.
- A. Creswell, T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, "Generative adversarial networks: An overview," IEEE Signal Processing Magazine, vol. 35, pp. 53-65, 2018.
- J. Richter, S. Welker, J.-M. Lemercier, B. Lay, and T. Gerkmann, "Speech Enhancement and Dereverberation with Diffusion-Based Generative Models," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 2351-2364, 2023.
- C.-S. Lin, S.-F. Chang, C.-C. Chang, and C.-C. Lin, "Microwave Human Vocal Vibration Signal Detection Based on Doppler Radar Technology," IEEE Transactions on Microwave Theory and Techniques, vol. 58, no. 8, pp. 2299-2306, Aug. 2010.
- B. Denby, T. Schultz, K. Honda, T. Hueber, J. M. Gilbert, and J. S. Brumberg, "Silent speech interfaces," Speech Communication, vol. 52, no. 4, pp. 270-287, 2010.
- T. Le Cornu and B. Milner, "Generating Intelligible Audio Speech from Visual Speech," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 9, pp. 1751-1761, 2017.
- K. Lee, "Ultrasonic Doppler Based Silent Speech Interface Using Perceptual Distance," Applied Sciences. 12(2), 827, 2022.
- K. Lee, "Speech enhancement using ultrasonic doppler sonar", Speech Communication, Vol. 110, pp. 21-32, July 2019.
- K. Lee, "Silent Speech Interface Using Ultrasonic Doppler Sonar," EICE Transactions on Information and Systems, vol. E103.D, no. 8, pp. 1875-1887, 2020.
- T. Toda, M. Nakagiri, and K. Shikano, "Statistical Voice Conversion Techniques for Body-Conducted Unvoiced Speech Enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 9, pp. 2505-2517, Nov. 2012.
- M. Janke and L. Diener, "EMG-to-Speech: Direct Generation of Speech From Facial Electromyographic Signals," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 12, pp. 2375-2385, Dec. 2017.
- G. Shin, J. Kim, "A Study on the Intelligent Recognition of a Various Electronic Components and Alignment Method with Vision," Journal of the Semiconductor & Display Technology, vol. 23, no. 2, pp. 1-5, 2024.
- X. Tan and B. Triggs, "Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions," IEEE Transactions on Image Processing, vol. 19, no. 6, pp. 1635-1650, 2010.
- A. Chavarin, E. Cuevas, O. Avalos, J. Galvez, and M. Perez-Cisneros, "Contrast Enhancement in Images by Homomorphic Filtering and Cluster-Chaotic Optimization," IEEE Access, vol. 11, pp. 73803-73822, 2023.
- P.-H. Lee, S.-W. Wu, and Y.-P. Hung, "Illumination Compensation Using Oriented Local Histogram Equalization and its Application to Face Recognition," IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4280-4289, Sept. 2012.
- M. Zheng, G. Qi, Z. Zhu, Y. Li, H. Wei, and Y. Liu, "Image Dehazing by an Artificial Image Fusion Method Based on Adaptive Structure Decomposition," IEEE Sensors Journal, vol. 20, no. 14, pp. 8062-8072, July 2020.
- D. Sugimura, T. Mikami, H. Yamashita, and T. Hamamoto, "Enhancing Color Images of Extremely Low Light Scenes Based on RGB/NIR Images Acquisition With Different Exposure Times," IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 3586-3597, Nov. 2015.
- Y. Kumar, R. Jain, K. M. Salik, R. R. Shah, Y. Yin, and R. Zimmermann, "Lipper: Synthesizing thy speech using multi-view lipreading," in Proc. AAAI Conf. Artif. Intell., vol. 33, 2019, pp. 2588-2595.
- K. Vougioukas, P. Ma, S. Petridis, and M. Pantic, "Video-driven speech reconstruction using generative adversarial networks," in Proc. Interspeech, Sep. 2019, pp. 4125-4129.
- M. Wand, J. Koutnik, and J. Schmidhuber, "Lipreading with Long Short-Term Memory," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Mar. 2016, pp. 6115-6119.
- A. Ephrat and S. Peleg, "Vid2Speech: Speech reconstruction from silent video," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Mar. 2017, pp. 5095-5099.
- H. Akbari, H. Arora, L. Cao, and N. Mesgarani, "Lip2Audspec: Speech reconstruction from silent lip movements video," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Apr. 2018, pp. 2516-2520.
- T. Stafylakis and G. Tzimiropoulos, "Combining residual networks with LSTMs for lipreading," in Proc. Interspeech, Aug. 2017, pp. 3652-3656.
- B. Martinez, P. Ma, S. Petridis, and M. Pantic, "Lipreading using temporal convolutional networks," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2020, pp. 6319-6323.
- K. Lee, "Improving the Performance of Automatic Lip-Reading Using Image Conversion Techniques," Electronics, 13(6), 1032, March 2024.
- M. Sadeghi, S. Leglaive, X. Alameda-Pineda, L. Girin, and R. Horaud, "Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 1788-1800, 2020.
- X. Qian, Z. Wang, J. Wang, G. Guan, and H. Li, "Audio-visual cross-attention networks for robotic speaker tracking," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 550-562, 2023.
- L. Liu, L. Liu, and H. Li, "Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 1559-1572, 2024.
- B. Shi, W. Hsu, and A. Mohamed, "Robust Self-Supervised Audio-Visual Speech Recognition," arXiv:2201.01763, 2022.
- L. Qu, C. Weber, and S. Wermter, "LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading," IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 2, pp. 2772-2782, 2024.
- T. Afouras, J. Chung, A. Senior, O. Vinyals, and A. Zisserman, "Deep Audio-Visual Speech Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 12, pp. 8717-8727, 2022.
- C. Xie and T. Toda, "Noisy-to-Noisy Voice Conversion Under Variations of Noisy Condition," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 3871-3882, 2023.
- J. Devlin, M. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), vol. 1, 2019, pp. 4171-4186.
- Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut, "ALBERT: A lite BERT for self-supervised learning of language representations," in Proc. Int. Conf. Learning Representations (ICLR), 2020, arXiv:1909.11942.
- B. Chen et al., "Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos," in Proc. IEEE/CVF Int. Conf. Computer Vision (ICCV), Montreal, QC, Canada, 2021, pp. 7992-8001.