Search | Korea Science

A Korean speech recognition based on conformer (콘포머 기반 한국어 음성인식)

Koo, Myoung-Wan
- The Journal of the Acoustical Society of Korea / v.40 no.5 / pp.488-495 / 2021
We propose a speech recognition system based on the conformer. The conformer is a convolution-augmented transformer that combines a transformer model, which captures global information, with a Convolutional Neural Network (CNN), which exploits local features effectively. The baseline is a transformer-based speech recognition system with a Long Short-Term Memory (LSTM)-based language model. The proposed system replaces the transformer with a conformer and uses a transformer-based language model. On the Electronics and Telecommunications Research Institute (ETRI) speech corpus in AI-Hub, the proposed system yields a Character Error Rate (CER) of 5.7 %, while the baseline yields a CER of 11.8 %. Even when the speech corpus is extended to another AI-Hub domain, such as the NHNdiguest speech corpus, the proposed system performs robustly across the two domains. These experiments validate the proposed system.
https://doi.org/10.7776/ASK.2021.40.5.488
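
The Character Error Rate quoted in this abstract is conventionally computed as the character-level edit distance between the reference transcript and the recognizer output, divided by the reference length. The following is a minimal sketch of that convention, not the paper's scoring script. The function names and the example strings are hypothetical, and whether spaces count as characters is an assumption that varies between evaluations.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one wrong syllable out of four reference characters.
print(character_error_rate("음성인식", "음성인시"))  # 0.25
```

Because the distance is normalized by the reference length only, a hypothesis with many insertions can produce a CER above 1.0.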

Performance of Exercise Posture Correction System Based on Deep Learning (딥러닝 기반 운동 자세 교정 시스템의 성능)

Hwang, Byungsun;Kim, Jeongho;Lee, Ye-Ram;Kyeong, Chanuk;Seon, Joonho;Sun, Young-Ghyu;Kim, Jin-Young
- The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.5 / pp.177-183 / 2022
Recently, interest in home training has grown due to COVID-19. Accordingly, research on applying human activity recognition (HAR) technology to home training has been conducted. However, existing HAR papers address static rather than dynamic activities. This paper proposes a deep learning model that can analyze dynamic exercise posture and show the accuracy of the user's exercise posture. Fitness images from AI-Hub are analyzed with BlazePose. Three types of deep learning models are compared in the experiment: recurrent neural network (RNN), long short-term memory (LSTM), and convolutional neural network (CNN). In the simulation results, the F1-scores of the RNN, LSTM, and CNN are 0.49, 0.87, and 0.98, respectively. The simulation results confirm that the CNN is more suitable for human activity recognition than the other models. More exercise postures can be analyzed by using a greater variety of training data.
https://doi.org/10.7236/JIIBC.2022.22.5.177
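
The F1-score used to compare the three models is the harmonic mean of precision and recall. The following is a minimal sketch of that definition for a single class. The function name and the example counts are hypothetical and are not taken from the paper, and returning 0.0 when there are no positive counts at all is an assumed convention.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from true positives, false positives, and false negatives.

    Equivalent to 2 * precision * recall / (precision + recall).
    """
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        # Assumed convention: no predictions and no positives gives 0.0.
        return 0.0
    return 2 * tp / denominator


# Hypothetical counts: 8 correct detections, 2 false alarms, 2 misses.
print(f1_score(tp=8, fp=2, fn=2))  # 0.8
```

For a multi-class task such as posture classification, per-class values like this are typically averaged; the abstract does not state which averaging was used.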

Deep recurrent neural networks with word embeddings for Urdu named entity recognition

Khan, Wahab;Daud, Ali;Alotaibi, Fahd;Aljohani, Naif;Arafat, Sachi
- ETRI Journal / v.42 no.1 / pp.90-100 / 2020
Named entity recognition (NER) continues to be an important task in natural language processing because it is featured as a subtask and/or subproblem in information extraction and machine translation. In Urdu language processing, it is a very difficult task. This paper proposes various deep recurrent neural network (DRNN) learning models with word embedding. Experimental results demonstrate that they improve upon current state-of-the-art NER approaches for Urdu. The DRNN models evaluated include forward and bidirectional extensions of the long short-term memory and backpropagation through time approaches. The proposed models consider both language-dependent features, such as part-of-speech tags, and language-independent features, such as the "context windows" of words. The effectiveness of the DRNN models with word embedding for NER in Urdu is demonstrated using three datasets. The results reveal that the proposed approach significantly outperforms previous conditional random field and artificial neural network approaches. The best f-measure values achieved on the three benchmark datasets using the proposed deep learning approaches are 81.1%, 79.94%, and 63.21%, respectively.
https://doi.org/10.4218/etrij.2018-0553

Emotion Recognition in Arabic Speech from Saudi Dialect Corpus Using Machine Learning and Deep Learning Algorithms

Hanaa Alamri;Hanan S. Alshanbari
- International Journal of Computer Science & Network Security / v.23 no.8 / pp.9-16 / 2023
Speech can actively elicit feelings and attitudes through words. It is important for researchers to identify both the emotional content carried by speech signals and the kind of emotion that the speech produced. In this study, we examined an emotion recognition system using an Arabic database, specifically in the Saudi dialect, collected from a YouTube channel called Telfaz11. The four emotions examined were anger, happiness, sadness, and neutral. In our experiments, we extracted features from the audio signals, such as Mel Frequency Cepstral Coefficients (MFCC) and Zero-Crossing Rate (ZCR), and then classified emotions using machine learning algorithms (Support Vector Machine (SVM) and K-Nearest Neighbor (KNN)) and deep learning algorithms (Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM)). Our experiments showed that the MFCC feature extraction method combined with the CNN model obtained the best accuracy, 95%, demonstrating the effectiveness of this classification system in recognizing emotions in spoken Arabic.
https://doi.org/10.22937/IJCSNS.2023.23.8.2
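
Of the two features named in this abstract, the Zero-Crossing Rate is the simpler: it is the fraction of adjacent sample pairs in a frame whose signs differ. The following is a minimal sketch under stated assumptions, not the authors' feature extractor. The function name is hypothetical, a sample equal to zero is treated as non-negative, and the count is normalized by the number of adjacent pairs (some toolkits normalize by the frame length instead).

```python
from typing import Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0  # no adjacent pair to compare
    crossings = sum(
        1
        for a, b in zip(frame, frame[1:])
        if (a >= 0) != (b >= 0)  # assumed convention: zero counts as non-negative
    )
    return crossings / (len(frame) - 1)


# Hypothetical frames: an alternating signal crosses zero at every step,
# while a signal that never changes sign does not cross at all.
fast = zero_crossing_rate([1.0, -1.0, 1.0, -1.0])
slow = zero_crossing_rate([0.2, 0.5, 0.9, 0.4])
```

In a speech pipeline this value would be computed per short frame, giving one ZCR value per frame alongside the MFCC vector.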

Formation of Attention and Associative Memory based on Reinforcement Learning

Kenichi, Abe;Park, Jin-Bae
- 제어로봇시스템학회:학술대회논문집 (Institute of Control, Robotics and Systems: Conference Proceedings) / 2001.10a / pp.22.3-22 / 2001
This paper employs an attention task in which context information must be extracted from the first presented pattern and then used to generate the recognition answer for the second presented pattern. An Elman-type recurrent neural network is used to extract and retain the context information. The only signal available to the system for learning is a reinforcement signal that indicates whether the answer is correct. Through this learning alone, the system came to extract and retain the necessary context information and to generate the correct answers. Furthermore, an associative memory function is observed in the feedback loop of the Elman-type neural network.

Design and Implementation of Hand Gesture Recognizer Based on Artificial Neural Network (인공신경망 기반 손동작 인식기의 설계 및 구현)

Kim, Minwoo;Jeong, Woojae;Cho, Jaechan;Jung, Yunho
- Journal of Advanced Navigation Technology / v.22 no.6 / pp.675-680 / 2018
In this paper, we propose a hand gesture recognizer using a restricted Coulomb energy (RCE) neural network and present hardware implementation results for real-time learning and recognition. Since the RCE-NN has a flexible network architecture and a low-complexity real-time learning process, it is suitable for hand recognition applications. A 3D number dataset was created using an FPGA-based test platform, and the designed hand gesture recognizer showed 98.8% recognition accuracy on it. The proposed hand gesture recognizer was implemented on an Intel-Altera Cyclone IV FPGA, where it requires 26,702 logic elements and 258 Kbit of memory. In addition, real-time learning and recognition were verified at an operating frequency of 70 MHz.
https://doi.org/10.12673/jant.2018.22.6.675

Gaussian pdfs Clustering Using a Divergence Measure-based Neural Network (발산거리 기반의 신경망에 의한 가우시안 확률 밀도 함수의 군집화)

박동철;권오현
- The Journal of Korean Institute of Communications and Information Sciences / v.29 no.5C / pp.627-631 / 2004
This paper proposes an efficient algorithm for clustering Gaussian Probability Density Functions (GPDFs) in a speech recognition model. The proposed algorithm is based on CNN with the divergence as its distance measure and is applied to speech recognition. The algorithm is compared with the conventional Divergence-based k-means (Dk-means) algorithm in a Continuous Density Hidden Markov Model (CDHMM). The results show that it reduces the number of GPDFs by about 31.3% relative to the Dk-means algorithm without any loss in recognition performance. Compared with the case in which no clustering is employed and the full set of GPDFs is used, the proposed algorithm saves about 61.8% of GPDFs while preserving recognition performance.
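
The abstract does not state which divergence is used as the distance between GPDFs. One common choice for comparing Gaussians is the symmetric Kullback-Leibler divergence, which has a closed form. The following is a minimal sketch for the univariate case, offered only as an illustration of a divergence-based distance. The function names are hypothetical, and the paper's own measure and dimensionality may differ.

```python
import math


def kl_gaussian(mu_p: float, var_p: float, mu_q: float, var_q: float) -> float:
    """KL(p || q) for univariate Gaussians p = N(mu_p, var_p), q = N(mu_q, var_q)."""
    if var_p <= 0 or var_q <= 0:
        raise ValueError("variances must be positive")
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mu_p - mu_q) ** 2) / var_q
        - 1.0
    )


def symmetric_divergence(mu_p: float, var_p: float,
                         mu_q: float, var_q: float) -> float:
    """Symmetrized divergence KL(p || q) + KL(q || p), usable as a distance."""
    return (kl_gaussian(mu_p, var_p, mu_q, var_q)
            + kl_gaussian(mu_q, var_q, mu_p, var_p))


# Hypothetical example: two unit-variance Gaussians whose means differ by 1.
distance = symmetric_divergence(0.0, 1.0, 1.0, 1.0)
```

A clustering algorithm would assign each GPDF to the cluster prototype with the smallest such distance, in place of the Euclidean distance used by ordinary k-means.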

Development of a Low-cost Industrial OCR System with an End-to-end Deep Learning Technology

Subedi, Bharat;Yunusov, Jahongir;Gaybulayev, Abdulaziz;Kim, Tae-Hyong
- IEMEK Journal of Embedded Systems and Applications / v.15 no.2 / pp.51-60 / 2020
Optical character recognition (OCR) has been studied for decades because it is useful in a wide variety of settings. OCR performance has recently improved significantly thanks to deep learning technology. As a result, there is an increasing demand for commercial-grade but affordable OCR systems. We have developed a low-cost, high-performance OCR system for industry using the cheapest embedded developer kit that supports GPU acceleration. To achieve high accuracy for industrial use on limited computing resources, we chose a state-of-the-art text recognition algorithm that uses an end-to-end deep learning network as a baseline model. The model was then improved by replacing the feature extraction network with the one best suited to our conditions. Among the various candidate networks, EfficientNet-B3 showed the best performance: excellent recognition accuracy with relatively low memory consumption. In addition, we optimized the model, written with TensorFlow's Python API, using TensorFlow-TensorRT integration and TensorFlow's C++ API.
https://doi.org/10.14372/IEMEK.2020.15.2.51

Video Representation via Fusion of Static and Motion Features Applied to Human Activity Recognition

Arif, Sheeraz;Wang, Jing;Fei, Zesong;Hussain, Fida
- KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.7 / pp.3599-3619 / 2019
In human activity recognition systems, both static and motion information play a crucial role in achieving efficient and competitive results. Most existing methods extract video features insufficiently and cannot assess the level of contribution of the static and motion components. Our work highlights this problem and proposes the Static-Motion fused features descriptor (SMFD), which leverages both static and motion features in the form of a descriptor. First, static features are learned by a two-stream 3D convolutional neural network. Second, trajectories are extracted by tracking key points, and only trajectories located in the central region of the original video frame are selected, in order to reduce irrelevant background trajectories as well as computational complexity. Then, shape and motion descriptors are obtained along with key points by using SIFT flow. Next, a Cholesky transformation is introduced to fuse the static and motion feature vectors so as to guarantee the equal contribution of all descriptors. Finally, a Long Short-Term Memory (LSTM) network is used to discover long-term temporal dependencies and make the final prediction. To confirm the effectiveness of the proposed approach, extensive experiments were conducted on three well-known datasets: UCF101, HMDB51, and YouTube. The findings show that the resulting recognition system is on par with state-of-the-art methods.
https://doi.org/10.3837/tiis.2019.07.015

Development of Real-Time Face Region Recognition System for City-Security CCTV (도심방범용 CCTV를 위한 실시간 얼굴 영역 인식 시스템)

Kim, Young-Ho;Kim, Jin-Hong
- Journal of Korea Multimedia Society / v.13 no.4 / pp.504-511 / 2010
In this paper, we propose a face region recognition system for city-security CCTV (closed-circuit television) using a hippocampal neural network, which models the hippocampus of the human brain. The system consists of feature extraction, learning, and recognition parts. The feature extraction part is built using PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis). In the learning part, the features of the input image data are labeled as reaction patterns, following the order of the hippocampal neuron structure, according to the adjustment of a good impression in the dentate gyrus, and noise is removed through the auto-associative memory in the CA3 region. The CA1 region, which receives information from CA3, forms the long-term memory learned by the neurons. Experiments measure the recognition rate under shape change and lighting change. The experimental results compare the feature extraction and learning method proposed in this paper with other methods and confirm that the proposed method is superior to existing methods.

Search Result 122, Processing Time 0.028 seconds
