• Title/Summary/Keyword: Deep Learning Model

Three-Dimensional Convolutional Vision Transformer for Sign Language Translation (수어 번역을 위한 3차원 컨볼루션 비전 트랜스포머)

  • Horyeor Seong;Hyeonjoong Cho
    • The Transactions of the Korea Information Processing Society / v.13 no.3 / pp.140-147 / 2024
  • In the Republic of Korea, people with hearing impairments are the second-largest demographic within the registered disability community, following those with physical disabilities. Despite this demographic significance, research on sign language translation technology is limited for several reasons, including the limited market size and the lack of adequately annotated datasets. Despite these difficulties, a few researchers continue to improve the performance of sign language translation technologies by employing recent advances in deep learning, such as the transformer architecture, since transformer-based models have demonstrated noteworthy performance in tasks such as action recognition and video classification. This study focuses on enhancing the recognition performance of sign language translation by combining transformers with 3D-CNN. Through experimental evaluations using the PHOENIX-Weather-2014T dataset [1], we show that the proposed model exhibits comparable performance to existing models in terms of Floating Point Operations Per Second (FLOPs).

Predicting Performance of Heavy Industry Firms in Korea with U.S. Trade Policy Data (미국 무역정책 변화가 국내 중공업 기업의 경영성과에 미치는 영향)

  • Park, Jinsoo;Kim, Kyoungho;Kim, Buomsoo;Suh, Jihae
    • The Journal of Society for e-Business Studies / v.22 no.4 / pp.71-101 / 2017
  • Since late 2016, protectionism has been a major trend in world trade, with Great Britain exiting the European Union and the United States electing Donald Trump as its 45th president. Consequently, there has been a huge public outcry regarding the negative prospects of heavy industry firms in Korea, which are highly dependent upon international trade with Western countries including the United States. In light of this trend and these concerns, we have tried to predict the business performance of heavy industry firms in Korea with data on the trade policy of the United States. The United States International Trade Commission (USITC) levies countervailing duties and anti-dumping duties on firms that violate its fair-trade regulations. In this study, we have performed data analysis with past records of countervailing duties and anti-dumping duties. From the results of a clustering analysis, it could be concluded that trade policy trends of the United States significantly affect the business performance of heavy industry firms in Korea. Furthermore, we have attempted to quantify such effects by employing long short-term memory (LSTM), a popular neural network model that is well suited to sequential data. Our major contribution is that we have succeeded in empirically validating the intuitive argument and in predicting the future trend with rigorous data mining techniques. With some improvements, our results are expected to be highly relevant to designing regulations for heavy industry in Korea.

A Method for Body Keypoint Localization based on Object Detection using the RGB-D information (RGB-D 정보를 이용한 객체 탐지 기반의 신체 키포인트 검출 방법)

  • Park, Seohee;Chun, Junchul
    • Journal of Internet Computing and Services / v.18 no.6 / pp.85-92 / 2017
  • Recently, in the field of video surveillance, deep learning based methods have been applied to detecting a moving person in a video and analyzing the behavior of the detected person. Human activity recognition, which is one of the fields of this intelligent image analysis technology, detects an object and then detects its body keypoints in order to recognize the behavior of the detected object. In this paper, we propose a method for Body Keypoint Localization based on Object Detection using RGB-D information. First, the moving object is segmented and detected from the background using color information and depth information generated by the two cameras. The input image, generated by rescaling the detected object region using RGB-D information, is applied to Convolutional Pose Machines (CPM) for single-person pose estimation. CPMs are used to generate Belief Maps for 14 body parts per person and to detect body keypoints based on the Belief Maps. This method provides an accurate object region in which to detect keypoints and can be extended from single Body Keypoint Localization to multiple Body Keypoint Localization through the integration of individual Body Keypoint Localization. In the future, it will be possible to generate a model for human pose estimation using the detected keypoints and contribute to the field of human activity recognition.
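The last step described in this abstract, reading body keypoints out of per-part Belief Maps, can be illustrated with a minimal sketch. It assumes each Belief Map is a small 2D grid of scores and that the keypoint is simply the highest-scoring cell; the abstract does not state the exact decoding rule, and the function names and toy maps below are hypothetical.

```python
from typing import List, Tuple

# One belief map = rows of scores for a single body part (assumed layout).
BeliefMap = List[List[float]]


def keypoint_from_belief_map(belief_map: BeliefMap) -> Tuple[int, int, float]:
    """Return (x, y, score) of the highest-scoring cell in one belief map."""
    best_x, best_y, best_score = 0, 0, float("-inf")
    for y, row in enumerate(belief_map):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y, best_score


def keypoints_from_belief_maps(
    belief_maps: List[BeliefMap],
) -> List[Tuple[int, int, float]]:
    """Decode one keypoint per body part (14 maps in the abstract's setting)."""
    return [keypoint_from_belief_map(m) for m in belief_maps]


# Two toy 3x3 maps standing in for two of the body parts.
toy_maps = [
    [[0.0, 0.1, 0.0], [0.2, 0.9, 0.1], [0.0, 0.1, 0.0]],
    [[0.7, 0.1, 0.0], [0.2, 0.3, 0.1], [0.0, 0.1, 0.0]],
]
keypoints = keypoints_from_belief_maps(toy_maps)
print(keypoints)
```

Because the maps are computed on a rescaled crop of the detected person, the decoded coordinates would still need to be mapped back to the original frame; that step depends on the rescaling used and is not shown here.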

Research on Text Classification of Research Reports using Korea National Science and Technology Standards Classification Codes (국가 과학기술 표준분류 체계 기반 연구보고서 문서의 자동 분류 연구)

  • Choi, Jong-Yun;Hahn, Hyuk;Jung, Yuchul
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.1 / pp.169-177 / 2020
  • In South Korea, the results of R&D in science and technology are submitted to the National Science and Technology Information Service (NTIS) in reports that carry Korea national science and technology standard classification codes (K-NSCC). However, because there are more than 2000 sub-categories, it is non-trivial to choose correct classification codes without a clear understanding of the K-NSCC. In addition, there are few cases of automatic document classification research based on the K-NSCC, and there are no training data in the public domain. To the best of our knowledge, this study is the first attempt to build a highly performing K-NSCC classification system based on NTIS report meta-information from the last five years (2013-2017). To this end, about 210 mid-level categories were selected, and we conducted preprocessing that considers the characteristics of research report metadata. More specifically, we propose a convolutional neural network (CNN) technique using only task names and keywords, which are the most influential fields. The proposed model is compared with several machine learning methods that show good performance in text classification (e.g., the linear support vector classifier, CNN, gated recurrent unit, etc.), and it shows a performance advantage of 1% to 7% based on a top-three F1 score.

Dual CNN Structured Sound Event Detection Algorithm Based on Real Life Acoustic Dataset (실생활 음향 데이터 기반 이중 CNN 구조를 특징으로 하는 음향 이벤트 인식 알고리즘)

  • Suh, Sangwon;Lim, Wootaek;Jeong, Youngho;Lee, Taejin;Kim, Hui Yong
    • Journal of Broadcast Engineering / v.23 no.6 / pp.855-865 / 2018
  • Sound event detection is a research area that models human auditory cognitive characteristics by recognizing events in an environment with multiple acoustic events and determining the onset and offset time of each event. DCASE, a research group on acoustic scene classification and sound event detection, runs challenges to encourage the participation of researchers and to stimulate sound event detection research. However, the dataset provided by the DCASE Challenge is relatively small compared to ImageNet, a representative dataset for visual object recognition, and there are not many openly available acoustic datasets. In this study, sound events that can occur indoors and outdoors are collected on a larger scale and annotated for dataset construction. Furthermore, to improve the performance of the sound event detection task, we developed a dual CNN structured sound event detection system by adding a supplementary neural network to a convolutional neural network to determine the presence of sound events. Finally, we conducted a comparative experiment with the baseline systems of both DCASE 2016 and 2017.

A Thoracic Spine Segmentation Technique for Automatic Extraction of VHS and Cobb Angle from X-ray Images (X-ray 영상에서 VHS와 콥 각도 자동 추출을 위한 흉추 분할 기법)

  • Ye-Eun, Lee;Seung-Hwa, Han;Dong-Gyu, Lee;Ho-Joon, Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.1 / pp.51-58 / 2023
  • In this paper, we propose an organ segmentation technique for the automatic extraction of medical diagnostic indicators from X-ray images. In order to calculate diagnostic indicators of heart disease and spinal disease such as VHS (vertebral heart scale) and Cobb angle, it is necessary to accurately segment the thoracic spine, carina, and heart in a chest X-ray image. We adopted a deep neural network model in which the high-resolution representation of the image at each layer and the structure converted into a low-resolution feature map are connected in parallel. This structure enables the relative position information in the image to be effectively reflected in the segmentation process. It is shown that learning performance can be improved by combining the OCR module, in which pixel information and object information interact in a multi-step process, with the channel attention module, which allows each channel of the network to be weighted differently. In addition, a method of augmenting the training data is presented in order to provide robust performance against changes in the position, shape, and size of the subject in the X-ray image. The effectiveness of the proposed method was evaluated through an experiment using 145 human chest X-ray images and 118 animal X-ray images.

Detecting Vehicles That Are Illegally Driving on Road Shoulders Using Faster R-CNN (Faster R-CNN을 이용한 갓길 차로 위반 차량 검출)

  • Go, MyungJin;Park, Minju;Yeo, Jiho
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.21 no.1 / pp.105-122 / 2022
  • According to statistics on fatal crashes on expressways over the last 5 years, fatalities on road shoulders have been three times as high as fatalities elsewhere on the expressways. This suggests that crashes on road shoulders tend to be fatal, and that it is important to prevent them by cracking down on vehicles that intrude onto the shoulder. Therefore, this study proposes a method to detect vehicles that violate the shoulder lane using Faster R-CNN. Vehicles were detected with Faster R-CNN, and an additional reading module was configured to determine whether a shoulder violation had occurred. For experiments and evaluations, GTAV, a simulation game that can reproduce situations similar to the real world, was used. A total of 1,800 training images and 800 evaluation samples were processed and generated, and performance was measured for ZFNet and VGG16 as the threshold value was varied. As a result, the detection rate of ZFNet was 99.2% at a threshold of 0.8 and that of VGG16 was 93.9% at a threshold of 0.7, and the average detection time was 0.0468 seconds for ZFNet and 0.16 seconds for VGG16, so the detection rate of ZFNet was about 7% higher. Its speed was also confirmed to be about 3.4 times faster. These results show that even a relatively uncomplicated network can detect vehicles that violate the shoulder lane at high speed without pre-processing the input image. This suggests that the algorithm can be used to detect violations of designated lanes if sufficient training datasets based on actual video data are obtained.
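The reported speed comparison can be checked with a few lines of arithmetic. The sketch below uses only the two per-image detection times quoted in the abstract; the variable names are illustrative, and the frames-per-second figures are a derived convenience rather than numbers the authors report.

```python
# Average detection time per image, as quoted in the abstract (seconds).
zfnet_seconds = 0.0468
vgg16_seconds = 0.16

# How many times faster ZFNet is than VGG16: about 3.42, i.e. the "3.4 times".
speedup = vgg16_seconds / zfnet_seconds

# Equivalent throughput in images per second (derived, not reported).
zfnet_fps = 1.0 / zfnet_seconds   # about 21.4
vgg16_fps = 1.0 / vgg16_seconds   # 6.25

print(f"speedup: {speedup:.2f}x")
print(f"ZFNet: {zfnet_fps:.1f} images/s, VGG16: {vgg16_fps:.2f} images/s")
```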

Image-to-Image Translation Based on U-Net with R2 and Attention (R2와 어텐션을 적용한 유넷 기반의 영상 간 변환에 관한 연구)

  • Lim, So-hyun;Chun, Jun-chul
    • Journal of Internet Computing and Services / v.21 no.4 / pp.9-16 / 2020
  • In image processing and computer vision, the problem of reconstructing one image from another or generating a new image has steadily drawn attention as hardware advances. However, computer-generated images still often look unnatural to the human eye. With the recent surge of deep learning research, image generation and enhancement based on deep learning are being actively studied, and among these approaches the Generative Adversarial Network (GAN) performs well in image generation. Various GAN models have been presented since the original GAN was proposed, allowing more natural images to be generated than in earlier image generation research. Among them, pix2pix is a conditional GAN model, a general-purpose network that shows good performance on various datasets. pix2pix is based on U-Net, but many U-Net-based networks show better performance than the original U-Net. Therefore, in this study, images are generated by replacing the U-Net of pix2pix with various networks, and the results are compared and evaluated. The generated images confirm that pix2pix models with the Attention, R2, and Attention-R2 networks perform better than the existing pix2pix model using U-Net. We also examine the limitations of the best-performing network and suggest addressing them as future work.

A Study on the stock price prediction and influence factors through NARX neural network optimization (NARX 신경망 최적화를 통한 주가 예측 및 영향 요인에 관한 연구)

  • Cheon, Min Jong;Lee, Ook
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.8 / pp.572-578 / 2020
  • The stock market is affected by unexpected factors, such as politics, society, and natural disasters, as well as by corporate performance and economic conditions. Recently, artificial intelligence has become popular, and many researchers have tried to conduct experiments with it. Our study proposes an experiment using not only stock-related data but also various other economic data. We acquired a year's worth of data on stock prices, the percentage of foreigners, interest rates, and exchange rates, and combined them in various ways. With the input data thus diversified, we fed the combined input data into a nonlinear autoregressive network with exogenous inputs (NARX) model. We then analyze the NARX model's outputs and compare them to the original data. The model is most accurate, with a root mean square error (RMSE) of 0.08, when configured with 10 neurons and two delays and given a combination of stock prices and exchange rates from the U.S., China, Europe, and Japan. This study is meaningful in showing that the exchange rate has the greatest influence on stock prices, lowering the RMSE from the 0.589 obtained when only closing data are used.
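To make the terms "delays" and "RMSE" in this abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single target series and a single exogenous series, builds the lagged input vectors a NARX model with a given number of delays would consume, and computes RMSE; the network itself is omitted, and the function names and toy numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def narx_regressors(
    y: Sequence[float], x: Sequence[float], delays: int
) -> List[Tuple[List[float], float]]:
    """Pair each target y[t] with the previous `delays` values of y and x."""
    if len(y) != len(x):
        raise ValueError("target and exogenous series must have equal length")
    samples = []
    for t in range(delays, len(y)):
        features = list(y[t - delays:t]) + list(x[t - delays:t])
        samples.append((features, y[t]))
    return samples


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Toy series standing in for closing prices and one exchange rate.
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
rates = [10.0, 20.0, 30.0, 40.0, 50.0]
samples = narx_regressors(prices, rates, delays=2)
print(samples[0])  # features [y[0], y[1], x[0], x[1]] paired with target y[2]

error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(error)
```

With two delays, each prediction sees only the two most recent values of every series, so adding exchange rates from several countries widens the feature vector without lengthening the history.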

Air-conditioning and Heating Time Prediction Based on Artificial Neural Network and Its Application in IoT System (냉난방 시간을 예측하는 인공신경망의 구축 및 IoT 시스템에서의 활용)

  • Kim, Jun-soo;Lee, Ju-ik;Kim, Dongho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.347-350 / 2018
  • For an IoT system to automatically keep the house temperature pleasant for the user, it needs to predict the optimal start-up time of the air-conditioner or heater so that the room reaches the temperature the user has set. Predicting the optimal start-up time is important because it avoids the extra cost of operating the air-conditioner or heater unnecessarily. This paper introduces an ANN (Artificial Neural Network) and an IoT system that predict the cooling and heating time in households that use an air-conditioner and heater. Many variables, such as house structure, house size, and external weather conditions, affect cooling and heating. Of these, measurable variables such as house temperature, house humidity, outdoor temperature, outdoor humidity, wind speed, wind direction, and wind chill were used to create training data for constructing the model. After constructing the ANN model, an IoT system that uses the model was developed. The IoT system comprises a main system powered by Raspberry Pi 3 and a mobile application powered by Android. The mobile device's GPS sensor and a custom-developed feature are used to predict the user's return.
