• Title/Summary/Keyword: 인과 딥러닝

Search Result 127, Processing Time 0.027 seconds

Improving the performance for Relation Networks using parameters tuning (파라미터 튜닝을 통한 Relation Networks 성능개선)

  • Lee, Hyun-Ok;Lim, Heui-Seok
    • Annual Conference of KIPS
    • /
    • 2018.05a
    • /
    • pp.377-380
    • /
    • 2018
  • 인간의 추론 능력이란 문제에 주어진 조건을 보고 문제 해결에 필요한 것이 무엇인지를 논리적으로 생각해 보는 것으로 문제 상황 속에서 일정한 규칙이나 성질을 발견하고 이를 수학적인 방법으로 법칙을 찾아내거나 해결하는 능력을 말한다. 이러한 인간인지 능력과 유사한 인공지능 시스템을 개발하는데 있어서 핵심적 도전은 비구조적 데이터(unstructured data)로부터 그 개체들(object)과 그들간의 관계(relation)에 대해 추론하는 능력을 부여하는 것이라고 할 수 있다. 지금까지 딥러닝(deep learning) 방법은 구조화 되지 않은 데이터로부터 문제를 해결하는 엄청난 진보를 가져왔지만, 명시적으로 개체간의 관계를 고려하지 않고 이를 수행해왔다. 최근 발표된 구조화되지 않은 데이터로부터 복잡한 관계 추론을 수행하는 심층신경망(deep neural networks)은 관계추론(relational reasoning)의 시도를 이해하는데 기대할 만한 접근법을 보여주고 있다. 그 첫 번째는 관계추론을 위한 간단한 신경망 모듈(A simple neural network module for relational reasoning) 인 RN(Relation Networks)이고, 두 번째는 시각적 관찰을 기반으로 실제대상의 미래 상태를 예측하는 범용 목적의 VIN(Visual Interaction Networks)이다. 관계 추론을 수행하는 이들 심층신경망(deep neural networks)은 세상을 객체(objects)와 그들의 관계(their relations)라는 체계로 분해하고, 신경망(neural networks)이 피상적으로는 매우 달라 보이지만 근본적으로는 공통관계를 갖는 장면들에 대하여 객체와 관계라는 새로운 결합(combinations)을 일반화할 수 있는 강력한 추론 능력(powerful ability to reason)을 보유할 수 있다는 것을 보여주고 있다. 본 논문에서는 관계 추론을 수행하는 심층신경망(deep neural networks) 중에서 Sort-of-CLEVR 데이터 셋(dataset)을 사용하여 RN(Relation Networks)의 성능을 재현 및 관찰해 보았으며, 더 나아가 파라미터(parameters) 튜닝을 통하여 RN(Relation Networks) 모델의 성능 개선방법을 제시하여 보았다.

Development a Meal Support System for the Visually Impaired Using YOLO Algorithm (YOLO알고리즘을 활용한 시각장애인용 식사보조 시스템 개발)

  • Lee, Gun-Ho;Moon, Mi-Kyeong
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.16 no.5
    • /
    • pp.1001-1010
    • /
    • 2021
  • Normal people are not deeply aware of their dependence on sight when eating. However, since the visually impaired do not know what kind of food is on the table, the assistant next to them holds the blind spoon and explains the position of the food in a clockwise direction, front and rear, left and right, etc. In this paper, we describe the development of a meal assistance system that recognizes each food image and announces the name of the food by voice when a visually impaired person looks at their table using a smartphone camera. This system extracts the food on which the spoon is placed through the YOLO model that has learned the image of food and tableware (spoon), recognizes what the food is, and notifies it by voice. Through this system, it is expected that the visually impaired will be able to eat without the help of a meal assistant, thereby increasing their self-reliance and satisfaction.

LSTM based sequence-to-sequence Model for Korean Automatic Word-spacing (LSTM 기반의 sequence-to-sequence 모델을 이용한 한글 자동 띄어쓰기)

  • Lee, Tae Seok;Kang, Seung Shik
    • Smart Media Journal
    • /
    • v.7 no.4
    • /
    • pp.17-23
    • /
    • 2018
  • We proposed a LSTM-based RNN model that can effectively perform the automatic spacing characteristics. For those long or noisy sentences which are known to be difficult to handle within Neural Network Learning, we defined a proper input data format and decoding data format, and added dropout, bidirectional multi-layer LSTM, layer normalization, and attention mechanism to improve the performance. Despite of the fact that Sejong corpus contains some spacing errors, a noise-robust learning model developed in this study with no overfitting through a dropout method helped training and returned meaningful results of Korean word spacing and its patterns. The experimental results showed that the performance of LSTM sequence-to-sequence model is 0.94 in F1-measure, which is better than the rule-based deep-learning method of GRU-CRF.

Transformer Network for Container's BIC-code Recognition (컨테이너 BIC-code 인식을 위한 Transformer Network)

  • Kwon, HeeJoo;Kang, HyunSoo
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.27 no.1
    • /
    • pp.19-26
    • /
    • 2022
  • This paper presents a pre-processing method to facilitate the container's BIC-code recognition. We propose a network that can find ROI(Region Of Interests) containing a BIC-code region and estimate a homography matrix for warping. Taking the structure of STN(Spatial Transformer Networks), the proposed network consists of next 3 steps, ROI detection, homography matrix estimation, and warping using the homography estimated in the previous step. It contributes to improving the accuracy of BIC-code recognition by estimating ROI and matrix using the proposed network and correcting perspective distortion of ROI using the estimated matrix. For performance evaluation, five evaluators evaluated the output image as a perfect score of 5 and received an average of 4.25 points, and when visually checked, 224 out of 312 photos are accurately and perfectly corrected, containing ROI.

Automatic Augmentation Technique of an Autoencoder-based Numerical Training Data (오토인코더 기반 수치형 학습데이터의 자동 증강 기법)

  • Jeong, Ju-Eun;Kim, Han-Joon;Chun, Jong-Hoon
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.22 no.5
    • /
    • pp.75-86
    • /
    • 2022
  • This study aims to solve the problem of class imbalance in numerical data by using a deep learning-based Variational AutoEncoder and to improve the performance of the learning model by augmenting the learning data. We propose 'D-VAE' to artificially increase the number of records for a given table data. The main features of the proposed technique go through discretization and feature selection in the preprocessing process to optimize the data. In the discretization process, K-means are applied and grouped, and then converted into one-hot vectors by one-hot encoding technique. Subsequently, for memory efficiency, sample data are generated with Variational AutoEncoder using only features that help predict with RFECV among feature selection techniques. To verify the performance of the proposed model, we demonstrate its validity by conducting experiments by data augmentation ratio.

Freeway Bus-Only Lane Enforcement System Using Infrared Image Processing Technique (적외선 영상검지 기술을 활용한 고속도로 버스전용차로 단속시스템 개발)

  • Jang, Jinhwan
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.21 no.5
    • /
    • pp.67-77
    • /
    • 2022
  • An automatic freeway bus-only lane enforcement system was developed and assessed in a real-world environment. Observation of a bus-only lane on the Youngdong freeway, South Korea, revealed that approximately 99% of the vehicles violated the high-occupancy vehicle (HOV) lane regulation. However, the current enforcement by the police not only exhibits a low enforcement rate, but also induces unnecessary safety and delay concerns. Since vehicles with six passengers or higher are permitted to enter freeway bus-only lanes, identifying the number of passengers in a vehicle is a core technology required for a freeway bus-only lane enforcement system. To that end, infrared cameras and the You Only Look Once (YOLOv5) deep learning algorithm were utilized. For assessment of the performance of the developed system, two environments, including a controlled test-bed and a real-world freeway, were used. As a result, the performances under the test-bed and the real-world environments exhibited 7% and 8% errors, respectively, indicating satisfactory outcomes. The developed system would contribute to an efficient freeway bus-only lane operations as well as eliminate safety and delay concerns caused by the current manual enforcement procedures.

Speed Prediction and Analysis of Nearby Road Causality Using Explainable Deep Graph Neural Network (설명 가능 그래프 심층 인공신경망 기반 속도 예측 및 인근 도로 영향력 분석 기법)

  • Kim, Yoo Jin;Yoon, Young
    • Journal of the Korea Convergence Society
    • /
    • v.13 no.1
    • /
    • pp.51-62
    • /
    • 2022
  • AI-based speed prediction studies have been conducted quite actively. However, while the importance of explainable AI is emerging, the study of interpreting and reasoning the AI-based speed predictions has not been carried out much. Therefore, in this paper, 'Explainable Deep Graph Neural Network (GNN)' is devised to analyze the speed prediction and assess the nearby road influence for reasoning the critical contributions to a given road situation. The model's output was explained by comparing the differences in output before and after masking the input values of the GNN model. Using TOPIS traffic speed data, we applied our GNN models for the major congested roads in Seoul. We verified our approach through a traffic flow simulation by adjusting the most influential nearby roads' speed and observing the congestion's relief on the road of interest accordingly. This is meaningful in that our approach can be applied to the transportation network and traffic flow can be improved by controlling specific nearby roads based on the inference results.

A Korean Multi-speaker Text-to-Speech System Using d-vector (d-vector를 이용한 한국어 다화자 TTS 시스템)

  • Kim, Kwang Hyeon;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology
    • /
    • v.8 no.3
    • /
    • pp.469-475
    • /
    • 2022
  • To train the model of the deep learning-based single-speaker TTS system, a speech DB of tens of hours and a lot of training time are required. This is an inefficient method in terms of time and cost to train multi-speaker or personalized TTS models. The voice cloning method uses a speaker encoder model to make the TTS model of a new speaker. Through the trained speaker encoder model, a speaker embedding vector representing the timbre of the new speaker is created from the small speech data of the new speaker that is not used for training. In this paper, we propose a multi-speaker TTS system to which voice cloning is applied. The proposed TTS system consists of a speaker encoder, synthesizer and vocoder. The speaker encoder applies the d-vector technique used in the speaker recognition field. The timbre of the new speaker is expressed by adding the d-vector derived from the trained speaker encoder as an input to the synthesizer. It can be seen that the performance of the proposed TTS system is excellent from the experimental results derived by the MOS and timbre similarity listening tests.

Semantic Segmentation of Drone Images Based on Combined Segmentation Network Using Multiple Open Datasets (개방형 다중 데이터셋을 활용한 Combined Segmentation Network 기반 드론 영상의 의미론적 분할)

  • Ahram Song
    • Korean Journal of Remote Sensing
    • /
    • v.39 no.5_3
    • /
    • pp.967-978
    • /
    • 2023
  • This study proposed and validated a combined segmentation network (CSN) designed to effectively train on multiple drone image datasets and enhance the accuracy of semantic segmentation. CSN shares the entire encoding domain to accommodate the diversity of three drone datasets, while the decoding domains are trained independently. During training, the segmentation accuracy of CSN was lower compared to U-Net and the pyramid scene parsing network (PSPNet) on single datasets because it considers loss values for all dataset simultaneously. However, when applied to domestic autonomous drone images, CSN demonstrated the ability to classify pixels into appropriate classes without requiring additional training, outperforming PSPNet. This research suggests that CSN can serve as a valuable tool for effectively training on diverse drone image datasets and improving object recognition accuracy in new regions.

A Study on Non-Contact Care Robot System through Deep Learning

  • Hyun-Sik Ham;Sae Jun Ko
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.12
    • /
    • pp.33-40
    • /
    • 2023
  • As South Korea enters the realm of an super-aging society, the demand for elderly welfare services has been steadily rising. However, the current shortage of welfare personnel has emerged as a social issue. To address this challenge, there is active research underway on elderly care robots designed to mitigate the social isolation of the elderly and provide emergency contact capabilities in critical situations. Nonetheless, these functionalities require direct user contact, which represents a limitation of conventional elderly care robots. In this paper, we propose a solution to overcome these challenges by introducing a care robot system capable of interacting with users without the need for direct physical contact. This system leverages commercialized elderly care robots and cameras. We have equipped the care robot with an edge device that incorporates facial expression recognition and action recognition models. The models were trained and validated using public available data. Experimental results demonstrate high accuracy rates, with facial expression recognition achieving 96.5% accuracy and action recognition reaching 90.9%. Furthermore, the inference times for these processes are 50ms and 350ms, respectively. These findings affirm that our proposed system offers efficient and accurate facial and action recognition, enabling seamless interaction even in non-contact situations.