• Title/Summary/Keyword: 딥러닝 시스템

Search Result 1,296, Processing Time 0.029 seconds

A Multi-speaker Speech Synthesis System Using X-vector (x-vector를 이용한 다화자 음성합성 시스템)

  • Jo, Min Su;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology
    • /
    • v.7 no.4
    • /
    • pp.675-681
    • /
    • 2021
  • With the recent growth of the AI speaker market, the demand for speech synthesis technology that enables natural conversation with users is increasing. Therefore, there is a need for a multi-speaker speech synthesis system that can generate voices of various tones. In order to synthesize natural speech, it is required to train with a large-capacity. high-quality speech DB. However, it is very difficult in terms of recording time and cost to collect a high-quality, large-capacity speech database uttered by many speakers. Therefore, it is necessary to train the speech synthesis system using the speech DB of a very large number of speakers with a small amount of training data for each speaker, and a technique for naturally expressing the tone and rhyme of multiple speakers is required. In this paper, we propose a technology for constructing a speaker encoder by applying the deep learning-based x-vector technique used in speaker recognition technology, and synthesizing a new speaker's tone with a small amount of data through the speaker encoder. In the multi-speaker speech synthesis system, the module for synthesizing mel-spectrogram from input text is composed of Tacotron2, and the vocoder generating synthesized speech consists of WaveNet with mixture of logistic distributions applied. The x-vector extracted from the trained speaker embedding neural networks is added to Tacotron2 as an input to express the desired speaker's tone.

Development of Graph based Deep Learning methods for Enhancing the Semantic Integrity of Spaces in BIM Models (BIM 모델 내 공간의 시멘틱 무결성 검증을 위한 그래프 기반 딥러닝 모델 구축에 관한 연구)

  • Lee, Wonbok;Kim, Sihyun;Yu, Youngsu;Koo, Bonsang
    • Korean Journal of Construction Engineering and Management
    • /
    • v.23 no.3
    • /
    • pp.45-55
    • /
    • 2022
  • BIM models allow building spaces to be instantiated and recognized as unique objects independently of model elements. These instantiated spaces provide the required semantics that can be leveraged for building code checking, energy analysis, and evacuation route analysis. However, theses spaces or rooms need to be designated manually, which in practice, lead to errors and omissions. Thus, most BIM models today does not guarantee the semantic integrity of space designations, limiting their potential applicability. Recent studies have explored ways to automate space allocation in BIM models using artificial intelligence algorithms, but they are limited in their scope and relatively low classification accuracy. This study explored the use of Graph Convolutional Networks, an algorithm exclusively tailored for graph data structures. The goal was to utilize not only geometry information but also the semantic relational data between spaces and elements in the BIM model. Results of the study confirmed that the accuracy was improved by about 8% compared to algorithms that only used geometric distinctions of the individual spaces.

Training of a Siamese Network to Build a Tracker without Using Tracking Labels (샴 네트워크를 사용하여 추적 레이블을 사용하지 않는 다중 객체 검출 및 추적기 학습에 관한 연구)

  • Kang, Jungyu;Song, Yoo-Seung;Min, Kyoung-Wook;Choi, Jeong Dan
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.21 no.5
    • /
    • pp.274-286
    • /
    • 2022
  • Multi-object tracking has been studied for a long time under computer vision and plays a critical role in applications such as autonomous driving and driving assistance. Multi-object tracking techniques generally consist of a detector that detects objects and a tracker that tracks the detected objects. Various publicly available datasets allow us to train a detector model without much effort. However, there are relatively few publicly available datasets for training a tracker model, and configuring own tracker datasets takes a long time compared to configuring detector datasets. Hence, the detector is often developed separately with a tracker module. However, the separated tracker should be adjusted whenever the former detector model is changed. This study proposes a system that can train a model that performs detection and tracking simultaneously using only the detector training datasets. In particular, a Siam network with augmentation is used to compose the detector and tracker. Experiments are conducted on public datasets to verify that the proposed algorithm can formulate a real-time multi-object tracker comparable to the state-of-the-art tracker models.

Development of Intelligent OCR Technology to Utilize Document Image Data (문서 이미지 데이터 활용을 위한 지능형 OCR 기술 개발)

  • Kim, Sangjun;Yu, Donghui;Hwang, Soyoung;Kim, Minho
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.212-215
    • /
    • 2022
  • In the era of so-called digital transformation today, the need for the construction and utilization of big data in various fields has increased. Today, a lot of data is produced and stored in a digital device and media-friendly manner, but the production and storage of data for a long time in the past has been dominated by print books. Therefore, the need for Optical Character Recognition (OCR) technology to utilize the vast amount of print books accumulated for a long time as big data was also required in line with the need for big data. In this study, a system for digitizing the structure and content of a document object inside a scanned book image is proposed. The proposal system largely consists of the following three steps. 1) Recognition of area information by document objects (table, equation, picture, text body) in scanned book image. 2) OCR processing for each area of the text body-table-formula module according to recognized document object areas. 3) The processed document informations gather up and returned to the JSON format. The model proposed in this study uses an open-source project that additional learning and improvement. Intelligent OCR proposed as a system in this study showed commercial OCR software-level performance in processing four types of document objects(table, equation, image, text body).

  • PDF

Binary CNN Operation Algorithm using Bit-plane Image (비트평면 영상을 이용한 이진 CNN 연산 알고리즘)

  • Choi, Jong-Ho
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.12 no.6
    • /
    • pp.567-572
    • /
    • 2019
  • In this paper, we propose an algorithm to perform convolution, pooling, and ReLU operations in CNN using binary image and binary kernel. It decomposes 256 gray-scale images into 8 bit planes and uses a binary kernel consisting of -1 and 1. The convolution operation of binary image and binary kernel is performed by addition and subtraction. Logically, it is a binary operation algorithm using the XNOR and comparator. ReLU and pooling operations are performed by using XNOR and OR logic operations, respectively. Through the experiments to verify the usefulness of the proposed algorithm, We confirm that the CNN operation can be performed by converting it to binary logic operation. It is an algorithm that can implement deep running even in a system with weak computing power. It can be applied to a variety of embedded systems such as smart phones, intelligent CCTV, IoT system, and autonomous car.

A Proposal of Deep Learning Based Semantic Segmentation to Improve Performance of Building Information Models Classification (Semantic Segmentation 기반 딥러닝을 활용한 건축 Building Information Modeling 부재 분류성능 개선 방안)

  • Lee, Ko-Eun;Yu, Young-Su;Ha, Dae-Mok;Koo, Bon-Sang;Lee, Kwan-Hoon
    • Journal of KIBIM
    • /
    • v.11 no.3
    • /
    • pp.22-33
    • /
    • 2021
  • In order to maximize the use of BIM, all data related to individual elements in the model must be correctly assigned, and it is essential to check whether it corresponds to the IFC entity classification. However, as the BIM modeling process is performed by a large number of participants, it is difficult to achieve complete integrity. To solve this problem, studies on semantic integrity verification are being conducted to examine whether elements are correctly classified or IFC mapped in the BIM model by applying an artificial intelligence algorithm to the 2D image of each element. Existing studies had a limitation in that they could not correctly classify some elements even though the geometrical differences in the images were clear. This was found to be due to the fact that the geometrical characteristics were not properly reflected in the learning process because the range of the region to be learned in the image was not clearly defined. In this study, the CRF-RNN-based semantic segmentation was applied to increase the clarity of element region within each image, and then applied to the MVCNN algorithm to improve the classification performance. As a result of applying semantic segmentation in the MVCNN learning process to 889 data composed of a total of 8 BIM element types, the classification accuracy was found to be 0.92, which is improved by 0.06 compared to the conventional MVCNN.

Keyword Retrieval-Based Korean Text Command System Using Morphological Analyzer (형태소 분석기를 이용한 키워드 검색 기반 한국어 텍스트 명령 시스템)

  • Park, Dae-Geun;Lee, Wan-Bok
    • Journal of the Korea Convergence Society
    • /
    • v.10 no.2
    • /
    • pp.159-165
    • /
    • 2019
  • Based on deep learning technology, speech recognition method has began to be applied to commercial products, but it is still difficult to be used in the area of VR contents, since there is no easy and efficient way to process the recognized text after the speech recognition module. In this paper, we propose a Korean Language Command System, which can efficiently recognize and respond to Korean speech commands. The system consists of two components. One is a morphological analyzer to analyze sentence morphemes and the other is a retrieval based model which is usually used to develop a chatbot system. Experimental results shows that the proposed system requires only 16% commands to achieve the same level of performance when compared with the conventional string comparison method. Furthermore, when working with Google Cloud Speech module, it revealed 60.1% of success rate. Experimental results show that the proposed system is more efficient than the conventional string comparison method.

License Plate Detection and Recognition Algorithm using Deep Learning (딥러닝을 이용한 번호판 검출과 인식 알고리즘)

  • Kim, Jung-Hwan;Lim, Joonhong
    • Journal of IKEEE
    • /
    • v.23 no.2
    • /
    • pp.642-651
    • /
    • 2019
  • One of the most important research topics on intelligent transportation systems in recent years is detecting and recognizing a license plate. The license plate has a unique identification data on vehicle information. The existing vehicle traffic control system is based on a stop and uses a loop coil as a method of vehicle entrance/exit recognition. The method has the disadvantage of causing traffic jams and rising maintenance costs. We propose to exploit differential image of camera background instead of loop coil as an entrance/exit recognition method of vehicles. After entrance/exit recognition, we detect the candidate images of license plate using the morphological characteristics. The license plate can finally be detected using SVM(Support Vector Machine). Letter and numbers of the detected license plate are recognized using CNN(Convolutional Neural Network). The experimental results show that the proposed algorithm has a higher recognition rate than the existing license plate recognition algorithm.

LeafNet: Plants Segmentation using CNN (LeafNet: 합성곱 신경망을 이용한 식물체 분할)

  • Jo, Jeong Won;Lee, Min Hye;Lee, Hong Ro;Chung, Yong Suk;Baek, Jeong Ho;Kim, Kyung Hwan;Lee, Chang Woo
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.24 no.4
    • /
    • pp.1-8
    • /
    • 2019
  • Plant phenomics is a technique for observing and analyzing morphological features in order to select plant varieties of excellent traits. The conventional methods is difficult to apply to the phenomics system. because the color threshold value must be manually changed according to the detection target. In this paper, we propose the convolution neural network (CNN) structure that can automatically segment plants from the background for the phenomics system. The LeafNet consists of nine convolution layers and a sigmoid activation function for determining the presence of plants. As a result of the learning using the LeafNet, we obtained a precision of 98.0% and a recall rate of 90.3% for the plant seedlings images. This confirms the applicability of the phenomics system.

Design of YOLO-based Removable System for Pet Monitoring (반려동물 모니터링을 위한 YOLO 기반의 이동식 시스템 설계)

  • Lee, Min-Hye;Kang, Jun-Young;Lim, Soon-Ja
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.1
    • /
    • pp.22-27
    • /
    • 2020
  • Recently, as the number of households raising pets increases due to the increase of single households, there is a need for a system for monitoring the status or behavior of pets. There are regional limitations in the monitoring of pets using domestic CCTVs, which requires a large number of CCTVs or restricts the behavior of pets. In this paper, we propose a mobile system for detecting and tracking cats using deep learning to solve the regional limitations of pet monitoring. We use YOLO (You Look Only Once), an object detection neural network model, to learn the characteristics of pets and apply them to Raspberry Pi to track objects detected in an image. We have designed a mobile monitoring system that connects Raspberry Pi and a laptop via wireless LAN and can check the movement and condition of cats in real time.