• 제목/요약/키워드: deep Learning

Search Result 5,763, Processing Time 0.032 seconds

Automatic gasometer reading system using selective optical character recognition (관심 문자열 인식 기술을 이용한 가스계량기 자동 검침 시스템)

  • Lee, Kyohyuk;Kim, Taeyeon;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.1-25
    • /
    • 2020
  • In this paper, we suggest an application system architecture which provides accurate, fast and efficient automatic gasometer reading function. The system captures gasometer image using mobile device camera, transmits the image to a cloud server on top of private LTE network, and analyzes the image to extract character information of device ID and gas usage amount by selective optical character recognition based on deep learning technology. In general, there are many types of character in an image and optical character recognition technology extracts all character information in an image. But some applications need to ignore non-of-interest types of character and only have to focus on some specific types of characters. For an example of the application, automatic gasometer reading system only need to extract device ID and gas usage amount character information from gasometer images to send bill to users. Non-of-interest character strings, such as device type, manufacturer, manufacturing date, specification and etc., are not valuable information to the application. Thus, the application have to analyze point of interest region and specific types of characters to extract valuable information only. We adopted CNN (Convolutional Neural Network) based object detection and CRNN (Convolutional Recurrent Neural Network) technology for selective optical character recognition which only analyze point of interest region for selective character information extraction. We build up 3 neural networks for the application system. The first is a convolutional neural network which detects point of interest region of gas usage amount and device ID information character strings, the second is another convolutional neural network which transforms spatial information of point of interest region to spatial sequential feature vectors, and the third is bi-directional long short term memory network which converts spatial sequential information to character strings using time-series analysis mapping from feature vectors to character strings. In this research, point of interest character strings are device ID and gas usage amount. Device ID consists of 12 arabic character strings and gas usage amount consists of 4 ~ 5 arabic character strings. All system components are implemented in Amazon Web Service Cloud with Intel Zeon E5-2686 v4 CPU and NVidia TESLA V100 GPU. The system architecture adopts master-lave processing structure for efficient and fast parallel processing coping with about 700,000 requests per day. Mobile device captures gasometer image and transmits to master process in AWS cloud. Master process runs on Intel Zeon CPU and pushes reading request from mobile device to an input queue with FIFO (First In First Out) structure. Slave process consists of 3 types of deep neural networks which conduct character recognition process and runs on NVidia GPU module. Slave process is always polling the input queue to get recognition request. If there are some requests from master process in the input queue, slave process converts the image in the input queue to device ID character string, gas usage amount character string and position information of the strings, returns the information to output queue, and switch to idle mode to poll the input queue. Master process gets final information form the output queue and delivers the information to the mobile device. We used total 27,120 gasometer images for training, validation and testing of 3 types of deep neural network. 22,985 images were used for training and validation, 4,135 images were used for testing. We randomly splitted 22,985 images with 8:2 ratio for training and validation respectively for each training epoch. 4,135 test image were categorized into 5 types (Normal, noise, reflex, scale and slant). Normal data is clean image data, noise means image with noise signal, relfex means image with light reflection in gasometer region, scale means images with small object size due to long-distance capturing and slant means images which is not horizontally flat. Final character string recognition accuracies for device ID and gas usage amount of normal data are 0.960 and 0.864 respectively.

Estimating Gastrointestinal Transition Location Using CNN-based Gastrointestinal Landmark Classifier (CNN 기반 위장관 랜드마크 분류기를 이용한 위장관 교차점 추정)

  • Jang, Hyeon Woong;Lim, Chang Nam;Park, Ye-Suel;Lee, Gwang Jae;Lee, Jung-Won
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.3
    • /
    • pp.101-108
    • /
    • 2020
  • Since the performance of deep learning techniques has recently been proven in the field of image processing, there are many attempts to perform classification, analysis, and detection of images using such techniques in various fields. Among them, the expectation of medical image analysis software, which can serve as a medical diagnostic assistant, is increasing. In this study, we are attention to the capsule endoscope image, which has a large data set and takes a long time to judge. The purpose of this paper is to distinguish the gastrointestinal landmarks and to estimate the gastrointestinal transition location that are common to all patients in the judging of capsule endoscopy and take a lot of time. To do this, we designed CNN-based Classifier that can identify gastrointestinal landmarks, and used it to estimate the gastrointestinal transition location by filtering the results. Then, we estimate gastrointestinal transition location about seven of eight patients entered the suspected gastrointestinal transition area. In the case of change from the stomach to the small intestine(pylorus), and change from the small intestine to the large intestine(ileocecal valve), we can check all eight patients were found to be in the suspected gastrointestinal transition area. we can found suspected gastrointestinal transition area in the range of 100 frames, and if the reader plays images at 10 frames per second, the gastrointestinal transition could be found in 10 seconds.

Image-to-Image Translation Based on U-Net with R2 and Attention (R2와 어텐션을 적용한 유넷 기반의 영상 간 변환에 관한 연구)

  • Lim, So-hyun;Chun, Jun-chul
    • Journal of Internet Computing and Services
    • /
    • v.21 no.4
    • /
    • pp.9-16
    • /
    • 2020
  • In the Image processing and computer vision, the problem of reconstructing from one image to another or generating a new image has been steadily drawing attention as hardware advances. However, the problem of computer-generated images also continues to emerge when viewed with human eyes because it is not natural. Due to the recent active research in deep learning, image generating and improvement problem using it are also actively being studied, and among them, the network called Generative Adversarial Network(GAN) is doing well in the image generating. Various models of GAN have been presented since the proposed GAN, allowing for the generation of more natural images compared to the results of research in the image generating. Among them, pix2pix is a conditional GAN model, which is a general-purpose network that shows good performance in various datasets. pix2pix is based on U-Net, but there are many networks that show better performance among U-Net based networks. Therefore, in this study, images are generated by applying various networks to U-Net of pix2pix, and the results are compared and evaluated. The images generated through each network confirm that the pix2pix model with Attention, R2, and Attention-R2 networks shows better performance than the existing pix2pix model using U-Net, and check the limitations of the most powerful network. It is suggested as a future study.

The Study on The Identification Model of Friend or Foe on Helicopter by using Binary Classification with CNN

  • Kim, Tae Wan;Kim, Jong Hwan;Moon, Ho Seok
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.3
    • /
    • pp.33-42
    • /
    • 2020
  • There has been difficulties in identifying objects by relying on the naked eye in various surveillance systems. There is a growing need for automated surveillance systems to replace soldiers in the field of military surveillance operations. Even though the object detection technology is developing rapidly in the civilian domain, but the research applied to the military is insufficient due to a lack of data and interest. Thus, in this paper, we applied one of deep learning algorithms, Convolutional Neural Network-based binary classification to develop an autonomous identification model of both friend and foe helicopters (AH-64, Mi-17) among the military weapon systems, and evaluated the model performance by considering accuracy, precision, recall and F-measure. As the result, the identification model demonstrates 97.8%, 97.3%, 98.5%, and 97.8 for accuracy, precision, recall and F-measure, respectively. In addition, we analyzed the feature map on convolution layers of the identification model in order to check which area of imagery is highly weighted. In general, rotary shaft of rotating wing, wheels, and air-intake on both of ally and foe helicopters played a major role in the performance of the identification model. This is the first study to attempt to classify images of helicopters among military weapons systems using CNN, and the model proposed in this study shows higher accuracy than the existing classification model for other weapons systems.

Development of a Simulator for Optimizing Semiconductor Manufacturing Incorporating Internet of Things (사물인터넷을 접목한 반도체 소자 공정 최적화 시뮬레이터 개발)

  • Dang, Hyun Shik;Jo, Dong Hee;Kim, Jong Seo;Jung, Taeho
    • Journal of the Korea Society for Simulation
    • /
    • v.26 no.4
    • /
    • pp.35-41
    • /
    • 2017
  • With the advances in Internet over Things, the demand in diverse electronic devices such as mobile phones and sensors has been rapidly increasing and boosting up the researches on those products. Semiconductor materials, devices, and fabrication processes are becoming more diverse and complicated, which accompanies finding parameters for an optimal fabrication process. In order to find the parameters, a process simulation before fabrication or a real-time process control system during fabrication can be used, but they lack incorporating the feedback from post-fabrication data and compatibility with older equipment. In this research, we have developed an artificial intelligence based simulator, which finds parameters for an optimal process and controls process equipment. In order to apply the control concept to all the equipment in a fabrication sequence, we have developed a prototype for a manipulator which can be installed over an existing buttons and knobs in the equipment and controls the equipment communicating with the AI over the Internet. The AI is based on the deep learning to find process parameters that will produce a device having target electrical characteristics. The proposed simulator can control existing equipment via the Internet to fabricate devices with desired performance and, therefore, it will help engineers to develop new devices efficiently and effectively.

ARP Spoofing attack scenarios and countermeasures using CoAP in IoT environment (IoT 환경에서의 CoAP을 이용한 ARP Spoofing 공격 시나리오 및 대응방안)

  • Seo, Cho-Rong;Lee, Keun-Ho
    • Journal of the Korea Convergence Society
    • /
    • v.7 no.4
    • /
    • pp.39-44
    • /
    • 2016
  • Due to the dazzling development of IT in this IT-oriented era, information delivering technology among objects, between objects and humans, and among humans has been actively performed. As information delivery technology has been actively performed, IoT became closely related to our daily lives and ubiquitous at any time and place. Therefore, IoT has become a part of our daily lives. CoAp, a web-based protocol, is mostly used in IoT environment. CoAp protocol is mostly used in the network where transmission speed is low along with the huge loss. Therefore, it is mostly used in IoT environment. However, there is a weakness on IoT that it is weak in security. If security issue occurs in IoT environment, there is a possibility for secret information of individuals or companies to be disclosed. If attackers infect the targeted device, and infected device accesses to the wireless frequently used in public areas, the relevant device sends arp spoofing to other devices in the network. Afterward, infected devices receive the packet sent by other devices in the network after occupying the packet flow in the internal network and send them to the designated hacker's server. This study suggests counter-attacks on this issues and a method of coping with them.

A Study on Deep Learning Methodology for Bigdata Mining from Smart Farm using Heterogeneous Computing (스마트팜 빅데이터 분석을 위한 이기종간 심층학습 기법 연구)

  • Min, Jae-Ki;Lee, DongHoon
    • Proceedings of the Korean Society for Agricultural Machinery Conference
    • /
    • 2017.04a
    • /
    • pp.162-162
    • /
    • 2017
  • 구글에서 공개한 Tensorflow를 이용한 여러 학문 분야의 연구가 활발하다. 농업 시설환경을 대상으로 한 빅데이터의 축적이 증가함과 아울러 실효적인 정보 획득을 위한 각종 데이터 분석 및 마이닝 기법에 대한 연구 또한 활발한 상황이다. 한편, 타 분야의 성공적인 심층학습기법 응용사례에 비하여 농업 분야에서의 응용은 초기 성장 단계라 할 수 있다. 이는 농업 현장에서 취득한 정보의 난해성 및 완성도 높은 생육/환경 모델링 정보의 부재로 실효적인 전과정 처리 기술 도출에 소요되는 시간, 비용, 연구 환경이 상대적으로 부족하기 때문일 것이다. 특히, 센서 기반 데이터 취득 기술 증가에 따라 비약적으로 방대해진 수집 데이터를 시간 복잡도가 높은 심층 학습 모델링 연산에 기계적으로 단순 적용할 경우 시간 효율적인 측면에서 성공적인 결과 도출에 애로가 있을 것이다. 매우 높은 시간 복잡도를 해결하기 위하여 제시된 하드웨어 가속 기능의 경우 일부 개발환경에 국한이 되어 있다. 일례로, 구글의 Tensorflow는 오픈소스 기반 병렬 클러스터링 기술인 MPICH를 지원하는 알고리즘을 공개하지 않고 있다. 따라서, 본 연구에서는 심층학습 기법 연구에 있어서, 예상 가능한 다양한 자원을 활용하여 최대한 연산의 결과를 빨리 도출할 수 있는 하드웨어적인 접근 방법을 모색하였다. 호스트에서 수행하는 일방적인 학습 알고리즘과 달리 이기종간 심층 학습이 가능하기 위해선 우선, NFS(Network File System)를 이용하여 데이터 계층이 상호 연결이 되어야 한다. 이를 위해서 고속 네트워크를 기반으로 한 NFS의 이용이 필수적이다. 둘째로 제한된 자원의 한계를 극복하기 위한 메모 공유 라이브러리가 필요하다. 셋째로 이기종간 프로세서에 최적화된 병렬 처리용 컴파일러를 이용해야 한다. 가장 중요한 부분은 이기종간의 처리 능력에 따른 작업을 고르게 분배할 수 있는 작업 스케쥴링이 수행되어야 하며, 이는 처리하고자 하는 데이터의 형태에 따라 매우 가변적이므로 해당 데이터 도메인에 대한 엄밀한 사전 벤치마킹이 수행되어야 한다. 이러한 요구조건을 대부분 충족하는 Open-CL ver1.2(https://www.khronos.org/opencl/)를 이용하였다. 최신의 Open-CL 버전은 2.2이나 본 연구를 위하여 준비한 4가지 이기종 시스템에서 모두 공통적으로 지원하는 버전은 1.2이다. 실험적으로 선정된 4가지 이기종 시스템은 1) Windows 10 Pro, 2) Linux-Ubuntu 16.04.4 LTS-x86_64, 3) MAC OS X 10.11 4) Linux-Ubuntu 16.04.4 LTS-ARM Cortext-A15 이다. 비교 분석을 위하여 NVIDIA 사에서 제공하는 Pascal Titan X 2식을 SLI로 구성한 시스템을 준비하였다. 개별 시스템에서 별도로 컴파일 된 바이너리의 이름을 통일하고, 개별 시스템의 코어수를 동일하게 균등 배분하여 100 Hz의 데이터로 입력이 되는 온도 정보와 조도 정보를 입력으로 하고 이를 습도정보에 Linear Gradient Descent Optimizer를 이용하여 Epoch 10,000회의 학습을 수행하였다. 4종의 이기종에서 총 32개의 코어를 이용한 학습에서 17초 내외로 연산 수행을 마쳤으나, 비교 시스템에서는 11초 내외로 연산을 마치는 결과가 나왔다. 기보유 하드웨어의 적절한 활용이 가능한 심층학습 기법에 대한 연구를 지속할 것이다

  • PDF

Spine Computed Tomography to Magnetic Resonance Image Synthesis Using Generative Adversarial Networks : A Preliminary Study

  • Lee, Jung Hwan;Han, In Ho;Kim, Dong Hwan;Yu, Seunghan;Lee, In Sook;Song, You Seon;Joo, Seongsu;Jin, Cheng-Bin;Kim, Hakil
    • Journal of Korean Neurosurgical Society
    • /
    • v.63 no.3
    • /
    • pp.386-396
    • /
    • 2020
  • Objective : To generate synthetic spine magnetic resonance (MR) images from spine computed tomography (CT) using generative adversarial networks (GANs), as well as to determine the similarities between synthesized and real MR images. Methods : GANs were trained to transform spine CT image slices into spine magnetic resonance T2 weighted (MRT2) axial image slices by combining adversarial loss and voxel-wise loss. Experiments were performed using 280 pairs of lumbar spine CT scans and MRT2 images. The MRT2 images were then synthesized from 15 other spine CT scans. To evaluate whether the synthetic MR images were realistic, two radiologists, two spine surgeons, and two residents blindly classified the real and synthetic MRT2 images. Two experienced radiologists then evaluated the similarities between subdivisions of the real and synthetic MRT2 images. Quantitative analysis of the synthetic MRT2 images was performed using the mean absolute error (MAE) and peak signal-to-noise ratio (PSNR). Results : The mean overall similarity of the synthetic MRT2 images evaluated by radiologists was 80.2%. In the blind classification of the real MRT2 images, the failure rate ranged from 0% to 40%. The MAE value of each image ranged from 13.75 to 34.24 pixels (mean, 21.19 pixels), and the PSNR of each image ranged from 61.96 to 68.16 dB (mean, 64.92 dB). Conclusion : This was the first study to apply GANs to synthesize spine MR images from CT images. Despite the small dataset of 280 pairs, the synthetic MR images were relatively well implemented. Synthesis of medical images using GANs is a new paradigm of artificial intelligence application in medical imaging. We expect that synthesis of MR images from spine CT images using GANs will improve the diagnostic usefulness of CT. To better inform the clinical applications of this technique, further studies are needed involving a large dataset, a variety of pathologies, and other MR sequence of the lumbar spine.

Application of Advertisement Filtering Model and Method for its Performance Improvement (광고 글 필터링 모델 적용 및 성능 향상 방안)

  • Park, Raegeun;Yun, Hyeok-Jin;Shin, Ui-Cheol;Ahn, Young-Jin;Jeong, Seungdo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.11
    • /
    • pp.1-8
    • /
    • 2020
  • In recent years, due to the exponential increase in internet data, many fields such as deep learning have developed, but side effects generated as commercial advertisements, such as viral marketing, have been discovered. This not only damages the essence of the internet for sharing high-quality information, but also causes problems that increase users' search times to acquire high-quality information. In this study, we define advertisement as "a text that obscures the essence of information transmission" and we propose a model for filtering information according to that definition. The proposed model consists of advertisement filtering and advertisement filtering performance improvement and is designed to continuously improve performance. We collected data for filtering advertisements and learned document classification using KorBERT. Experiments were conducted to verify the performance of this model. For data combining five topics, accuracy and precision were 89.2% and 84.3%, respectively. High performance was confirmed, even if atypical characteristics of advertisements are considered. This approach is expected to reduce wasted time and fatigue in searching for information, because our model effectively delivers high-quality information to users through a process of determining and filtering advertisement paragraphs.

A Study on Quantitative Evaluation Method for STT Engine Accuracy based on Korean Characteristics (한국어 특성 기반의 STT 엔진 정확도를 위한 정량적 평가방법 연구)

  • Min, So-Yeon;Lee, Kwang-Hyong;Lee, Dong-Seon;Ryu, Dong-Yeop
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.7
    • /
    • pp.699-707
    • /
    • 2020
  • With the development of deep learning technology, voice processing-related technology is applied to various areas, such as STT (Speech To Text), TTS (Text To Speech), ChatBOT, and intelligent personal assistant. In particular, the STT is a voice-based, relevant service that changes human languages to text, so it can be applied to various IT related services. Recently, many places, such as general private enterprises and public institutions, are attempting to introduce the relevant technology. On the other hand, in contrast to the general IT solution that can be evaluated quantitatively, the standard and methods of evaluating the accuracy of the STT engine are ambiguous, and they do not consider the characteristics of the Korean language. Therefore, it is difficult to apply the quantitative evaluation standard. This study aims to provide a guide to an evaluation of the STT engine conversion performance based on the characteristics of the Korean language, so that engine manufacturers can perform the STT conversion based on the characteristics of the Korean language, while the market could perform a more accurate evaluation. In the experiment, a 35% more accurate evaluation could be performed compared to the existing methods.