Title/Summary/Keyword: AlexNet Networks


Audio and Video Bimodal Emotion Recognition in Social Networks Based on Improved AlexNet Network and Attention Mechanism

  • Liu, Min; Tang, Jun
    • Journal of Information Processing Systems, v.17 no.4, pp.754-771, 2021
  • In continuous dimensional emotion recognition, the parts that highlight emotional expression differ across modes, and the influence of each mode on the emotional state also differs. This paper therefore studies the fusion of the two most important modes in emotion recognition (voice and facial expression) and proposes a dual-modal emotion recognition method that combines an improved AlexNet network with an attention mechanism. After simple preprocessing of the audio and video signals, prior knowledge is first used to extract audio features. Facial expression features are then extracted by the improved AlexNet network. Finally, a multimodal attention mechanism fuses the facial expression and audio features, and an improved loss function mitigates the modal-missing problem, improving the robustness of the model and the performance of emotion recognition. Experimental results show that the concordance correlation coefficients (CCC) of the proposed model in the arousal and valence dimensions were 0.729 and 0.718, respectively, which are superior to several comparative algorithms.
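To make the fusion step concrete, here is a minimal PyTorch sketch of attention-weighted fusion of audio and facial-expression features. The module names, layer sizes, and two-output regression head are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of attention-based fusion of two modality features;
# dimensions are placeholders, not the paper's values.
import torch
import torch.nn as nn

class BimodalAttentionFusion(nn.Module):
    def __init__(self, audio_dim=128, face_dim=256, fused_dim=128):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, fused_dim)
        self.face_proj = nn.Linear(face_dim, fused_dim)
        self.attn = nn.Linear(fused_dim, 1)   # one score per modality
        self.head = nn.Linear(fused_dim, 2)   # arousal and valence

    def forward(self, audio_feat, face_feat):
        # Stack projected modality features: (batch, 2, fused_dim)
        feats = torch.stack([self.audio_proj(audio_feat),
                             self.face_proj(face_feat)], dim=1)
        # Softmax over the modality axis yields attention weights.
        weights = torch.softmax(self.attn(torch.tanh(feats)), dim=1)
        fused = (weights * feats).sum(dim=1)  # weighted sum of modalities
        return self.head(fused)

# Example: a batch of 4 clips with precomputed features.
model = BimodalAttentionFusion()
print(model(torch.randn(4, 128), torch.randn(4, 256)).shape)  # [4, 2]
```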

Variations of AlexNet and GoogLeNet to Improve Korean Character Recognition Performance

  • Lee, Sang-Geol; Sung, Yunsick; Kim, Yeon-Gyu; Cha, Eui-Young
    • Journal of Information Processing Systems, v.14 no.1, pp.205-217, 2018
  • Deep learning using convolutional neural networks (CNNs) is being studied in various fields of image recognition, and these studies show excellent performance. In this paper, we compare the performance of two CNN architectures, KCR-AlexNet and KCR-GoogLeNet. The experimental data are obtained from PHD08, a large-scale Korean character database containing 2,187 samples of each of 2,350 Korean character classes, for a total of 5,139,450 data samples. After the final training iteration, KCR-AlexNet showed a top-1 test accuracy of over 98% and KCR-GoogLeNet a top-1 test accuracy of over 99%. To compare classification success rates with commercial optical character recognition (OCR) programs and ensure the objectivity of the experiment, we built an additional Korean character dataset with fonts not included in PHD08. While the commercial OCR programs achieved classification success rates of 66.95% to 83.16%, KCR-AlexNet and KCR-GoogLeNet achieved average classification success rates of 90.12% and 89.14%, respectively, both higher than the commercial OCR programs. Considering the time factor, KCR-AlexNet trained faster on PHD08, whereas KCR-GoogLeNet had a faster classification speed.
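As a point of reference, the output-layer change that any AlexNet variant needs for PHD08's 2,350 classes can be sketched with torchvision. KCR-AlexNet itself is a modified architecture, so this only illustrates the class-count adaptation, not the authors' network.

```python
# A minimal sketch, assuming torchvision: re-heading AlexNet for the
# 2,350 Korean character classes of PHD08.
import torch
from torchvision import models

model = models.alexnet(weights=None)               # train from scratch on PHD08
model.classifier[6] = torch.nn.Linear(4096, 2350)  # one logit per character class

x = torch.randn(1, 3, 224, 224)  # AlexNet's standard input size
print(model(x).shape)            # torch.Size([1, 2350])
```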

Convolutional Neural Networks for Character-level Classification

  • Ko, Dae-Gun; Song, Su-Han; Kang, Ki-Min; Han, Seong-Wook
    • IEIE Transactions on Smart Processing and Computing, v.6 no.1, pp.53-59, 2017
  • Optical character recognition (OCR) automatically recognizes text in an image. OCR remains a challenging problem in computer vision, and a successful solution has important device applications, such as text-to-speech conversion and automatic document classification. In this work, we analyze character recognition performance using three current deep-learning structures: AlexNet, LeNet, and SPNet. For this, we built our own dataset containing digits and upper- and lower-case characters. We experiment in the presence of salt-and-pepper noise or Gaussian noise and report the performance comparison in terms of recognition error. Five-fold cross-validation results indicate that the SPNet structure (our approach) outperforms AlexNet and LeNet in recognition error.
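The two noise conditions described can be reproduced along the following lines; the noise amounts below are illustrative assumptions, since the abstract does not state the exact levels used.

```python
# Illustrative noise injection for robustness evaluation of OCR models.
import numpy as np

def add_salt_and_pepper(img, amount=0.05):
    """Flip a fraction of pixels to 0 (pepper) or 255 (salt)."""
    noisy = img.copy()
    mask = np.random.rand(*img.shape)
    noisy[mask < amount / 2] = 0
    noisy[mask > 1 - amount / 2] = 255
    return noisy

def add_gaussian(img, sigma=10.0):
    """Add zero-mean Gaussian noise and clip back to the valid range."""
    noisy = img.astype(np.float32) + np.random.normal(0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

img = np.random.randint(0, 256, (32, 32), dtype=np.uint8)  # dummy character image
print(add_salt_and_pepper(img).shape, add_gaussian(img).dtype)
```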

A Study on Detection Performance Comparison of Bone Plates Using Parallel Convolution Neural Networks (병렬형 합성곱 신경망을 이용한 골절합용 판의 탐지 성능 비교에 관한 연구)

  • Lee, Song Yeon; Huh, Yong Jeong
    • Journal of the Semiconductor & Display Technology, v.21 no.3, pp.63-68, 2022
  • In this study, we built defect detection models using parallel convolutional neural networks. When convolutional neural networks are arranged in parallel, detection accuracy increases and detection time decreases. We built parallel-type defect detection models using four convolutional algorithms and evaluated them with two indicators: detection accuracy and detection time. Comparing the parallel models, the model using AlexNet achieved a detection accuracy of 97% with a detection time of 0.3 seconds, confirming that the parallel AlexNet model has the highest performance.
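A parallel-type model of the kind described can be sketched as two convolutional branches whose features are concatenated before classification. The branch architectures below are small placeholders, not the paper's four algorithms.

```python
# Hypothetical sketch of a parallel (multi-branch) defect detector.
import torch
import torch.nn as nn

class ParallelDetector(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        def branch():  # a tiny stand-in for each parallel CNN branch
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.branch_a, self.branch_b = branch(), branch()
        self.classifier = nn.Linear(16 * 2, num_classes)

    def forward(self, x):
        # Both branches see the same image; features fuse by concatenation.
        feats = torch.cat([self.branch_a(x), self.branch_b(x)], dim=1)
        return self.classifier(feats)

print(ParallelDetector()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 2])
```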

Evaluation of Deep-Learning Feature Based COVID-19 Classifier in Various Neural Network (코로나바이러스 감염증19 데이터베이스에 기반을 둔 인공신경망 모델의 특성 평가)

  • Hong, Jun-Yong; Jung, Young-Jin
    • Journal of radiological science and technology, v.43 no.5, pp.397-404, 2020
  • Coronavirus disease (COVID-19) is a highly infectious disease that directly affects the lungs. Chest radiography (CXR) offers a fast way to observe the resulting clinical findings. However, diagnostic performance via CXR needs to be improved, since identifying these findings is time-consuming and prone to human error. An artificial intelligence (AI) based tool may therefore aid the diagnosis of COVID-19 via CXR. In this study, we explored various deep learning (DL) approaches to classify COVID-19, other viral pneumonia, and normal cases. For both the original dataset and a lung-segmented dataset, the pre-trained AlexNet, SqueezeNet, ResNet18, and DenseNet201 were transfer-trained and validated for the three classes. AlexNet showed the highest mean accuracy, 99.15±2.69%, and the fastest training time, 1.61±0.56 min, among the four pre-trained neural networks. We further plotted the class activation map (CAM) of each network and demonstrated that lung-segmentation pre-processing improves the performance of COVID-19 classifiers on CXR images by excluding background features.
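The transfer-training setup described maps naturally onto torchvision's pre-trained models. In this sketch the CXR batch and hyperparameters are dummy placeholders; only the three-class re-heading of pre-trained AlexNet is taken from the abstract.

```python
# A minimal transfer-learning sketch, assuming torchvision with
# ImageNet weights; data and learning rate are illustrative.
import torch
from torchvision import models

model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = torch.nn.Linear(4096, 3)  # COVID-19 / viral pneumonia / normal

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

# One illustrative training step on a dummy CXR batch.
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 3, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```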

An Approximate DRAM Architecture for Energy-efficient Deep Learning

  • Nguyen, Duy Thanh; Chang, Ik-Joon
    • Journal of Semiconductor Engineering, v.1 no.1, pp.31-37, 2020
  • We present an approximate DRAM architecture for energy-efficient deep learning. Our key premise is that by bounding memory errors to non-critical information, we can significantly reduce DRAM refresh energy without compromising the recognition accuracy of deep neural networks. To validate this premise, we run extensive Monte-Carlo simulations for several well-known convolutional neural networks, namely LeNet, ConvNet, and AlexNet, with MNIST, CIFAR-10, and ImageNet inputs, respectively. We assume that the highest-order 8 bits (in single precision) and 4 bits (in half precision) are protected from retention errors under the proposed architecture, and then randomly inject bit errors into the unprotected bits at various bit-error rates. The recognition accuracies of the above networks are successfully maintained up to bit-error rates on the order of 10^-5. Simulating DRAM energy during inference of these networks, the proposed architecture shows potential savings of 10% to 37.5% of total DRAM energy.
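The simulation idea can be illustrated with a small NumPy routine that flips only the unprotected bits of single-precision weights at a given bit-error rate. This is a sketch of the Monte-Carlo error-injection step, not the authors' code.

```python
# Bit-error injection on float32 weights with the top 8 bits protected;
# the bit-error rate here is one of the example values from the abstract.
import numpy as np

def inject_bit_errors(weights, ber=1e-5, protected_bits=8):
    bits = weights.astype(np.float32).view(np.uint32)  # reinterpret as raw bits
    unprotected = 32 - protected_bits
    for b in range(unprotected):  # bit 0 is the least significant
        flips = np.random.rand(*bits.shape) < ber
        bits = np.where(flips, bits ^ np.uint32(1 << b), bits)
    return bits.view(np.float32)

w = np.random.randn(1000).astype(np.float32)
w_noisy = inject_bit_errors(w, ber=1e-5)
print(np.abs(w - w_noisy).max())  # perturbation confined to low-order bits
```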

Emotion Transfer with Strength Control for End-to-End TTS (감정 제어 가능한 종단 간 음성합성 시스템)

  • Jeon, Yejin; Lee, Gary Geunbae
    • Annual Conference on Human and Language Technology, 2021.10a, pp.423-426, 2021
  • This paper introduces a method for controlling emotion strength based on Global Style Tokens. Previous Global Style Token studies synthesized speech using reference audio containing the desired style. However, because synthesis could only follow the style of the reference audio, fine-grained emotion control was difficult. To solve this problem, this paper replaces the reference-encoder part of the Global Style Token model with residual blocks and AlexNet, a network from computer vision. Although AlexNet consists of five convolutional layers, only four of them are used here, with one excluded. Mean Opinion Score listening tests show that the proposed method makes the strength of an emotion controllable.
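A hypothetical sketch of the modified reference encoder follows: four of AlexNet's five convolutional layers producing a style embedding from a mel-spectrogram. Channel sizes follow AlexNet; the residual blocks and GST attention are omitted for brevity, and the input/embedding shapes are assumptions.

```python
# Truncated AlexNet-style reference encoder for style embeddings.
import torch
import torch.nn as nn

class TruncatedAlexNetEncoder(nn.Module):
    def __init__(self, style_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 11, stride=4, padding=2), nn.ReLU(),
            nn.Conv2d(64, 192, 5, padding=2), nn.ReLU(),
            nn.Conv2d(192, 384, 3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(),  # 4 of AlexNet's 5 convs
            nn.AdaptiveAvgPool2d(1))
        self.proj = nn.Linear(256, style_dim)

    def forward(self, mel):  # mel: (batch, 1, n_mels, frames)
        return self.proj(self.features(mel).flatten(1))

emb = TruncatedAlexNetEncoder()(torch.randn(2, 1, 80, 200))
print(emb.shape)  # torch.Size([2, 256])
```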


Performance Enhancement and Evaluation of a Deep Learning Framework on Embedded Systems using Unified Memory (통합메모리를 이용한 임베디드 환경에서의 딥러닝 프레임워크 성능 개선과 평가)

  • Lee, Minhak; Kang, Woochul
    • KIISE Transactions on Computing Practices, v.23 no.7, pp.417-423, 2017
  • Recently, many embedded devices with the computing capability required for deep learning have become available; hence, many new applications using these devices are emerging. However, these embedded devices have an architecture different from that of PCs and high-performance servers. In this paper, we propose a method that improves the performance of a deep-learning framework by exploiting the architecture of embedded devices that share memory between the CPU and the GPU. The proposed method is implemented in Caffe, an open-source deep-learning framework, and evaluated on an NVIDIA Jetson TK1 embedded device. In the experiments, we investigate the image recognition performance of several state-of-the-art deep-learning networks, including AlexNet, VGGNet, and GoogLeNet. Our results show that the proposed method achieves significant performance gains; for instance, in AlexNet, it reduces image recognition latency by about 33% and energy consumption by about 50%.
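The unified-memory idea can be illustrated in Python with Numba's CUDA bindings, assuming a CUDA-capable device (the paper's implementation is in Caffe/C++): a managed allocation is visible to both CPU and GPU, removing the explicit host-device copies that the paper's method eliminates.

```python
# Conceptual sketch: one managed buffer shared by CPU and GPU, so no
# cudaMemcpy-style transfers are needed on shared-memory devices.
import numpy as np
from numba import cuda

@cuda.jit
def scale(buf, factor):
    i = cuda.grid(1)
    if i < buf.size:
        buf[i] *= factor

data = cuda.managed_array(1024, dtype=np.float32)  # visible to CPU and GPU
data[:] = 1.0                  # CPU writes directly into the buffer
scale[8, 128](data, 2.0)       # GPU kernel updates the same buffer in place
cuda.synchronize()
print(data[:4])                # CPU reads the GPU's result, no copy-back
```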

Interworking technology of neural network and data among deep learning frameworks

  • Park, Jaebok; Yoo, Seungmok; Yoon, Seokjin; Lee, Kyunghee; Cho, Changsik
    • ETRI Journal, v.41 no.6, pp.760-770, 2019
  • Based on the growing demand for neural network technologies, various neural network inference engines are being developed. However, each inference engine has its own neural network storage format, creating a growing demand for standardization. This study presents interworking techniques for ensuring the compatibility of neural networks and data among the various deep learning frameworks. The proposed technique standardizes the graph expression grammar and learning-data storage format using the Neural Network Exchange Format (NNEF) of Khronos. The proposed converter includes a lexical analyzer, a syntax analyzer, and a parser; the NNEF parser converts neural network information into a parse tree and quantizes data. To validate the proposed system, we verified that an MNIST task executes immediately after importing AlexNet's neural network and learned data. This study thus contributes an efficient design technique for a converter that can execute a neural network and learned data in various frameworks regardless of each framework's storage format.
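To illustrate the lexical-analysis and parsing step such a converter performs, here is a toy tokenizer and parser for an NNEF-like operation line. The real Khronos NNEF grammar is far richer than this sketch, and the node structure is an assumption.

```python
# Toy lexer/parser for a single NNEF-style statement of the form
# "output = op(arg1, arg2, ...)".
import re

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+\.\d+|\d+|[=(),\[\]]")

def parse_op(line):
    """Tokenize one operation line and build a dict-based parse node."""
    tokens = TOKEN.findall(line)
    output, op = tokens[0], tokens[2]            # tokens[1] is '='
    args = [t for t in tokens[4:-1] if t != ","]  # drop '(' ... ')' and commas
    return {"output": output, "op": op, "args": args}

node = parse_op("conv1 = conv(input, filter1, bias1)")
print(node)  # {'output': 'conv1', 'op': 'conv', 'args': ['input', 'filter1', 'bias1']}
```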

The development of food image detection and recognition model of Korean food for mobile dietary management

  • Park, Seon-Joo; Palvanov, Akmaljon; Lee, Chang-Ho; Jeong, Nanoom; Cho, Young-Im; Lee, Hae-Jeung
    • Nutrition Research and Practice, v.13 no.6, pp.521-528, 2019
  • BACKGROUND/OBJECTIVES: The aim of this study was to develop a Korean food image detection and recognition model for use in mobile devices for accurate estimation of dietary intake. MATERIALS/METHODS: We collected food images by taking pictures or searching web images and built an image dataset for training a complex recognition model for Korean food, applying augmentation techniques to increase the dataset size. The training dataset contained more than 92,000 images categorized into 23 groups of Korean food. All images were down-sampled to a fixed resolution of 150×150 and then randomly divided into training and testing groups at a ratio of 3:1, resulting in 69,000 training images and 23,000 test images. We used a deep convolutional neural network (DCNN) for the complex recognition model and compared the results with those of other large-scale image recognition networks: AlexNet, GoogLeNet, VGG (Very Deep Convolutional Networks), and ResNet. RESULTS: Our complex food recognition model, K-foodNet, had higher test accuracy (91.3%) and faster recognition time (0.4 ms) than the other networks. CONCLUSION: The results showed that K-foodNet achieved better performance in detecting and recognizing Korean food than other state-of-the-art models.
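The preprocessing pipeline described (150×150 down-sampling, 3:1 random split) can be sketched as follows. The transforms and the index-based split are assumptions standing in for the authors' pipeline; K-foodNet's architecture is not reproduced here.

```python
# Illustrative data pipeline: resize to 150x150 and split 3:1,
# assuming torchvision and PIL are available.
import torch
from torchvision import transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize((150, 150)),  # fixed 150x150 resolution
    transforms.ToTensor(),
])

img = Image.new("RGB", (640, 480))  # dummy stand-in for a food photo
print(preprocess(img).shape)        # torch.Size([3, 150, 150])

# 3:1 random split over an index set the size of the reported dataset.
n_images = 92_000
n_train = int(n_images * 3 / 4)      # 69,000 training images
perm = torch.randperm(n_images)
train_idx, test_idx = perm[:n_train], perm[n_train:]
print(len(train_idx), len(test_idx))  # 69000 23000
```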