• Title/Summary/Keyword: Deep Learning Models

Deep learning-based speech recognition for Korean elderly speech data including dementia patients (치매 환자를 포함한 한국 노인 음성 데이터 딥러닝 기반 음성인식)

  • Jeonghyeon Mun;Joonseo Kang;Kiwoong Kim;Jongbin Bae;Hyeonjun Lee;Changwon Lim
    • The Korean Journal of Applied Statistics / v.36 no.1 / pp.33-48 / 2023
  • In this paper, we consider automatic speech recognition (ASR) for Korean speech data in which elderly persons say a random sequence of words, such as names of animals and vegetables, for one minute. Most of the speakers are over 60 years old, and some of them are dementia patients. The goal is to compare deep learning-based ASR models on such data and to find models that perform well. ASR is a technology by which computers recognize spoken words and convert them into written text. Recently, many well-performing deep learning models have been developed for ASR. Training data for such models mostly consist of sentences, and the speakers in those data pronounce words accurately in most cases. In our data, however, most of the speakers are over the age of 60 and often pronounce words incorrectly. In addition, the data are Korean speech in which speakers say a random series of words, not sentences, for one minute. Pre-trained models based on typical training data may therefore be unsuitable for our data, so we train deep learning-based ASR models from scratch on our data. We also apply several data augmentation methods because the data set is small.

Automatic Extraction of References for Research Reports using Deep Learning Language Model (딥러닝 언어 모델을 이용한 연구보고서의 참고문헌 자동추출 연구)

  • Yukyung Han;Wonsuk Choi;Minchul Lee
    • Journal of the Korean Society for Information Management / v.40 no.2 / pp.115-135 / 2023
  • The purpose of this study is to assess the effectiveness of using deep learning language models to extract references automatically and create a reference database for research reports in an efficient manner. Unlike academic journals, research reports present difficulties in automatically extracting references due to variations in formatting across institutions. In this study, we addressed this issue by introducing the task of separating references from non-reference phrases, in addition to the commonly used metadata extraction task for reference extraction. The study employed datasets that included various types of references, such as those from research reports of a particular institution, academic journals, and a combination of academic journal references and non-reference texts. Two deep learning language models, namely RoBERTa+CRF and ChatGPT, were compared to evaluate their performance in automatic extraction. They were used to extract metadata, categorize data types, and separate original text. The research findings showed that the deep learning language models were highly effective, achieving maximum F1-scores of 95.41% for metadata extraction and 98.91% for categorization of data types and separation of the original text. These results provide valuable insights into the use of deep learning language models and different types of datasets for constructing reference databases for research reports including both reference and non-reference texts.
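The F1-scores quoted in this abstract are the harmonic mean of precision and recall. As a minimal sketch of that calculation, assuming per-item labels and the hypothetical label names `REF` and `NON` (the paper's actual evaluation protocol is not described here):

```python
def f1_score(gold, predicted, positive):
    """F1 for one label: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: REF = reference phrase, NON = non-reference phrase.
gold = ["REF", "REF", "NON", "REF", "NON"]
pred = ["REF", "NON", "NON", "REF", "REF"]
score = f1_score(gold, pred, "REF")
print(f"F1 = {score:.4f}")
```

Sequence-labeling evaluations often score whole entities rather than single items, so the numbers from a sketch like this are not directly comparable to the reported 95.41% and 98.91%.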

A Study on Korean Speech Animation Generation Employing Deep Learning (딥러닝을 활용한 한국어 스피치 애니메이션 생성에 관한 고찰)

  • Suk Chan Kang;Dong Ju Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.10 / pp.461-470 / 2023
  • While speech animation generation using deep learning has been actively researched for English, there has been no prior work for Korean. This paper therefore applies supervised deep learning to Korean speech animation generation for the first time. In doing so, we find a significant effect: deep learning reduces speech animation research to speech recognition research, which is the predominant technique. We also study how to make the best use of this effect for Korean speech animation generation. The effect can help revitalize the recently inactive Korean speech animation research efficiently and effectively by clarifying the top-priority research target. The paper proceeds as follows: (i) it chooses the blendshape animation technique, (ii) implements the deep learning model as a master-servant pipeline of an automatic speech recognition (ASR) module and a facial action coding (FAC) module, (iii) builds a Korean speech facial motion capture dataset, (iv) prepares two deep learning models for comparison (one adopts an English ASR module and the other a Korean ASR module, but both use the same basic structure for their FAC modules), and (v) trains the FAC module of each model dependently on its ASR module. The user study shows that the model that adopts the Korean ASR module and dependently trains its FAC module (4.2/5.0 points) generates much more natural Korean speech animations than the model that adopts the English ASR module and dependently trains its FAC module (2.7/5.0 points). The result confirms the aforementioned effect: the quality of Korean speech animation comes down to the accuracy of Korean ASR.

A Study on Preprocessing Method in Deep Learning for ICS Cyber Attack Detection (ICS 사이버 공격 탐지를 위한 딥러닝 전처리 방법 연구)

  • Seonghwan Park;Minseok Kim;Eunseo Baek;Junghoon Park
    • Smart Media Journal / v.12 no.11 / pp.36-47 / 2023
  • Industrial control systems (ICS), which control facilities at major industrial sites, are increasingly connected to other systems through networks. With this integration and the development of intelligent attacks in which a single external intrusion can paralyze an entire system, the security risks to industrial control systems and their impact are increasing. As a result, research on how to detect and defend against cyber attacks is actively underway; unsupervised deep learning models have achieved notable results, and many deep learning-based anomaly detection technologies are being introduced. In this study, we emphasize the application of preprocessing methodologies to enhance the anomaly detection performance of deep learning models on time series data. The results demonstrate the effectiveness of a Wavelet Transform (WT)-based noise reduction methodology as a preprocessing technique for deep learning-based anomaly detection. In particular, applying the Dual-Tree Complex Wavelet Transform differentially, with sensor characteristics incorporated through clustering, proves to be the most effective approach for improving the detection performance of cyber attacks.
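To illustrate the general idea of wavelet-based noise reduction as a preprocessing step, here is a minimal sketch. It uses a one-level Haar transform with soft thresholding, which is a much simpler stand-in and not the Dual-Tree Complex Wavelet Transform named in the abstract; the function name, threshold, and sample readings are all assumptions made for illustration.

```python
import math


def haar_denoise(signal, threshold):
    """One-level Haar wavelet denoising with soft thresholding.

    Illustrative stand-in only: the abstract's best method is the
    Dual-Tree Complex Wavelet Transform, not the Haar wavelet.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Forward transform: pairwise averages (approximation) and differences (detail).
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Soft thresholding: shrink detail coefficients toward zero.
    shrunk = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    # Inverse transform rebuilds the smoothed signal.
    out = []
    for a, d in zip(approx, shrunk):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


# Hypothetical sensor readings with small pairwise jitter.
noisy = [1.0, 3.0, 5.0, 5.0]
smoothed = haar_denoise(noisy, threshold=10.0)
print(smoothed)
```

With a threshold of zero the input is reconstructed unchanged; with a threshold larger than every detail coefficient, each pair of samples collapses to its mean. A practical pipeline would use several decomposition levels and a threshold chosen per sensor.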

A Comparison Study on Forecasting Models for Air Compressor Power Consumption (공압기 소비전력에 대한 예측 모형의 비교연구)

  • Juhyeon Kim;Moonsoo Jang;Yejn Kim;Yoseob Heo;Hyunsang Chung;Soyoung Park
    • Journal of the Korean Society of Industry Convergence / v.26 no.4_2 / pp.657-668 / 2023
  • Air compressors are major energy consumers in the industrial sector, accounting for 12% to 40% of total energy costs in manufacturing plants. To address this issue, forecasting models that predict the power consumption of air compressors were compared. The models incorporate variables such as flow rate, pressure, temperature, humidity, and dew point, and draw on statistical methods, machine learning, and deep learning techniques. Model performance was compared using RMSE, MAE, and SMAPE. Of the 21 models tested, the Elastic Net, a statistical method, proved the most effective in power consumption forecasting.
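The three error measures named in this abstract can be sketched as follows. SMAPE has several definitions in the literature; this version assumes the common one that divides by the mean of the absolute actual and forecast values, and the sample power values are hypothetical.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """Symmetric MAPE in percent (assumed variant: denominator (|a| + |f|) / 2)."""
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        # Treat a 0/0 term as zero error to avoid division by zero.
        terms.append(0.0 if denom == 0 else abs(a - f) / denom)
    return 100 * sum(terms) / len(terms)


# Hypothetical power readings (kW) and forecasts.
actual = [100.0, 200.0]
forecast = [110.0, 190.0]
print(rmse(actual, forecast), mae(actual, forecast), smape(actual, forecast))
```

RMSE penalizes large errors more heavily than MAE, while SMAPE is scale-free, which is one reason studies often report all three together.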

Affective Computing Among Individuals in Deep Learning

  • Kim, Seong-Kyu (Steve)
    • Journal of Multimedia Information System / v.7 no.2 / pp.115-124 / 2020
  • This paper studies deep learning, an area of artificial intelligence that has recently driven the development of many technologies. In particular, it discusses emotional computing, which has been mentioned frequently in connection with deep learning. Emotional computing covers both a passive concept, in which people scientifically analyze human sensibilities and reflect them in product development or system design, and a more active concept, which studies how devices and systems understand humans and communicate with people in different modes. The technology for emotional signal extraction and for sensitivity and psychology recognition is defined as technology that processes, analyzes, and recognizes psycho-sensitivity based on micro-small, hyper-sensor technology and on sensitive signals and information that can be sensed from the active movement of the autonomic nervous system caused by human emotional changes in everyday life. Chapter 1 gives an overview, and Chapter 2 reviews related research. Chapter 3 presents the problems and models of real emotional computing, and Chapter 4 concludes the paper.

Bark Identification Using a Deep Learning Model (심층 학습 모델을 이용한 수피 인식)

  • Kim, Min-Ki
    • Journal of Korea Multimedia Society / v.22 no.10 / pp.1133-1141 / 2019
  • Most previous studies on bark recognition have focused on extracting LBP-like statistical features. Deep learning approaches have not been well studied because of the difficulty of acquiring a large bark image dataset. To overcome this dataset problem, this study uses MobileNet trained on the ImageNet dataset. The study proposes two approaches. One extracts features by pixel-wise convolution and classifies them with an SVM. The other tunes the weights of MobileNet by flexibly freezing layers. Experimental results on two public bark datasets, BarkTex and Trunk12, show that the proposed methods are effective in bark recognition. In particular, the flexible tuning method outperforms state-of-the-art methods. In addition, it can be applied to mobile devices because MobileNet is compact compared to other deep learning models.

A Manually Captured and Modified Phone Screen Image Dataset for Widget Classification on CNNs

  • Byun, SungChul;Han, Seong-Soo;Jeong, Chang-Sung
    • Journal of Information Processing Systems / v.18 no.2 / pp.197-207 / 2022
  • The applications and user interfaces (UIs) of smart mobile devices are constantly diversifying. For example, deep learning can be an innovative solution for classifying widgets in screen images to increase convenience. To this end, the present research leverages captured images and the ReDraw dataset to build deep learning datasets for image classification. First, in validating the datasets with ResNet50 and EfficientNet, the experiments show that the dataset composed in this study is helpful for classification according to a widget's functionality. An implementation of widget detection and classification on RetinaNet and EfficientNet is then executed. Finally, the research proposes the Widg-C and Widg-D datasets, deep learning datasets for identifying the widgets of smart devices, and implements them for use with representative convolutional neural network models.

Detecting Adversarial Examples Using Edge-based Classification

  • Jaesung Shim;Kyuri Jo
    • Journal of the Korea Society of Computer and Information / v.28 no.10 / pp.67-76 / 2023
  • Although deep learning models are making innovative achievements in the field of computer vision, their vulnerability to adversarial examples continues to be raised as a problem. Adversarial examples are inputs crafted by injecting fine noise into images to induce misclassification, and they can pose a serious threat to the application of deep learning models in the real world. In this paper, we propose a model that detects adversarial examples using differences in predictive values between edge-learned classification models and underlying classification models. The simple process of extracting the edges of objects and reflecting them in learning can increase the robustness of the classification model, and detecting adversarial examples through differences in predictions between models makes economical and efficient detection possible. In our experiments, the general model showed accuracy of {49.9%, 29.84%, 18.46%, 4.95%, 3.36%} for adversarial examples (eps={0.02, 0.05, 0.1, 0.2, 0.3}), whereas the Canny edge model showed accuracy of {82.58%, 65.96%, 46.71%, 24.94%, 13.41%}, and other edge models also showed a similar level of accuracy, indicating that the edge model was more robust against adversarial examples. In addition, adversarial example detection using differences in predictions between models revealed detection rates of {85.47%, 84.64%, 91.44%, 95.47%, 87.61%} for each epsilon-specific adversarial example. This study is expected to contribute to improving the reliability of deep learning models in related research and in application industries such as medicine, autonomous driving, security, and national defense.
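The detection idea in this abstract, flagging an input when a base classifier and an edge-trained classifier disagree, can be sketched as below. The abstract does not specify how the difference in predictive values is measured, so the L1 distance between class-probability vectors, the function name `is_adversarial`, and the 0.5 threshold are all assumptions for illustration.

```python
def is_adversarial(base_probs, edge_probs, threshold=0.5):
    """Flag an input when the two models' class probabilities diverge.

    Assumption: divergence is the L1 distance between probability
    vectors; the paper's actual difference measure is not stated here.
    """
    distance = sum(abs(b - e) for b, e in zip(base_probs, edge_probs))
    return distance > threshold


# Hypothetical softmax outputs from the base model and the edge model.
clean = is_adversarial([0.90, 0.05, 0.05], [0.85, 0.10, 0.05])
attacked = is_adversarial([0.10, 0.80, 0.10], [0.70, 0.20, 0.10])
print(clean, attacked)
```

The rationale is that fine pixel noise shifts the base model's output far more than the edge model's, so a large gap between the two is evidence of tampering. In practice the threshold would be tuned on held-out clean and adversarial samples.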

Comparison of Deep Learning-based Unsupervised Domain Adaptation Models for Crop Classification (작물 분류를 위한 딥러닝 기반 비지도 도메인 적응 모델 비교)

  • Kwak, Geun-Ho;Park, No-Wook
    • Korean Journal of Remote Sensing / v.38 no.2 / pp.199-213 / 2022
  • Unsupervised domain adaptation can remove the impractical need to collect high-quality training data every year for annual crop classification. This study evaluates the applicability of deep learning-based unsupervised domain adaptation models for crop classification. Three unsupervised domain adaptation models, a deep adaptation network (DAN), a deep reconstruction-classification network, and a domain adversarial neural network (DANN), are quantitatively compared in a crop classification experiment using unmanned aerial vehicle images of Hapcheon-gun and Changnyeong-gun, the major garlic and onion cultivation areas in Korea. Convolutional neural networks (CNNs) are additionally applied as source baseline and target baseline models to evaluate the classification performance of the unsupervised domain adaptation models. The three unsupervised domain adaptation models outperformed the source baseline CNN, but their classification performance differed depending on the degree of inconsistency between the data distributions of the source and target images. The classification accuracy of DAN was higher than that of the other two models when the inconsistency between source and target images was low, whereas DANN had the best classification performance when the inconsistency was high. Therefore, the extent to which the data distributions of the source and target images match should be considered when selecting the best unsupervised domain adaptation model to generate reliable classification results.