• Title/Abstract/Keywords: DATA PRE-PROCESSING


Source Code Identification Using Deep Neural Network

  • 임지수
    • KIPS Transactions on Software and Data Engineering / Vol. 8, No. 9 / pp. 373-378 / 2019
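  • Because programming source code is now openly available online, indiscriminate plagiarism and copyright problems are occurring. Source code written repeatedly by the same author may carry a unique fingerprint owing to the nature of programming. This paper identifies the author of each Google Code Jam program source by training a deep neural network. The sources are vectorized with pre-processors such as prediction-based vectors or TF-IDF, a frequency-based approach, and a deep neural network is then trained to identify the original author of each program source. The pre-processors make the learning system independent of the programming language, and the system is compared with other existing learning methods. The model combining TF-IDF with a deep neural network outperformed models using other pre-processors or other learning methods.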
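
The TF-IDF pre-processing step named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, and the function names are assumptions made for illustration. The resulting vectors are the kind of input a neural network classifier would be trained on.

```python
import math
from collections import Counter


def tokenize(source):
    # Assumption: plain whitespace tokenization, which does not depend
    # on the grammar of any particular programming language.
    return source.split()


def tfidf_vectors(documents):
    """Return (vocabulary, one TF-IDF vector per document)."""
    tokenized = [tokenize(doc) for doc in documents]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Assumption: smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty sources
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors
```

Tokens that appear in few sources receive a larger weight than tokens shared by many sources, which is what lets author-specific habits stand out.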

A System for Thermal Distortion Analysis of Hull Structures by Solar Radiation

  • 하윤석;이동훈
    • Journal of the Society of Naval Architects of Korea / Vol. 53, No. 4 / pp. 275-281 / 2016
  • Accuracy control is one of the most important factors in meeting quality requirements and the ship-production schedule. A ship is assembled by welding throughout the whole production process, so it is important to minimize losses from correction work by using engineering techniques such as reverse design, reverse setting, and margins for thermal shrinkage. These efforts are quite effective in the fabrication stages, but not in the erection stages. If a ship block made of common steel is exposed to directional solar radiation, its dimensions change considerably over time according to its thermal expansion coefficient. Therefore, measurements are often taken at dawn or in the evening, even when a very accurate device is available. In this study, an FE analysis method is developed to solve this problem. It converts measured data affected by solar thermal distortion into unaffected data, even when the ship block is measured at an arbitrary time. It uses the time of measurement, the orientation of the block, and weather records from satellites. The method is verified by comparing measured data of a ship block with the results of the suggested analysis method. Furthermore, a pre-processing system is developed for fast application of the suggested analysis method.
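
To give a sense of scale for the thermal expansion discussed here, the sketch below applies the textbook linear expansion relation, ΔL = α · L · ΔT. This is only a uniform-temperature approximation, not the FE analysis method of the paper; the coefficient for steel, the block length, and the temperature rise are assumed example values.

```python
# Assumption: a typical linear expansion coefficient for carbon steel,
# in 1/K. The paper does not state the value it used.
ALPHA_STEEL_PER_K = 12e-6


def linear_expansion(length_m, alpha_per_k, delta_t_k):
    """Length change in metres for a uniform temperature change."""
    return alpha_per_k * length_m * delta_t_k


# Hypothetical example: a 20 m block warmed uniformly by 10 K
# grows by about 2.4 mm.
growth_m = linear_expansion(20.0, ALPHA_STEEL_PER_K, 10.0)
print(f"{growth_m * 1000:.1f} mm")
```

Directional sunlight heats a block unevenly, which is why a full analysis needs the measurement time, block orientation, and weather data rather than this single formula.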

A Design and Implementation of Missing Person Identification System using face Recognition

  • Shin, Jong-Hwan;Park, Chan-Mi;Lee, Heon-Ju;Lee, Seoung-Hyeon;Lee, Jae-Kwang
    • Journal of the Korea Society of Computer and Information / Vol. 26, No. 2 / pp. 19-25 / 2021
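  • This paper proposes a method for identifying missing persons through vision technology and deep learning-based face recognition. The original image sent from a mobile device is pre-processed to suit face recognition, and missing persons are then recognized through image data augmentation, which improves recognition accuracy, and CNN-based face training and verification. When the implementation was used to identify images of hypothetical missing persons, the model trained on both the original data and blurred data performed best. In addition, models trained with pre-trained weights performed better than models without them, but showed the limitation of high bias and variance.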
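
The idea of training on original and blurred images together can be sketched as follows. The abstract does not say which blur was used, so the 3x3 box blur, the grayscale nested-list image format, and the function names are assumptions for illustration only.

```python
def box_blur(image):
    """Blur a grayscale image (list of rows) with a 3x3 mean filter.

    Border pixels average only the neighbours that exist.
    """
    height, width = len(image), len(image[0])
    blurred = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def augment_with_blur(images, labels):
    """Return the originals plus one blurred copy of each image."""
    return images + [box_blur(img) for img in images], labels + labels
```

Each blurred copy keeps the label of its source image, so the training set doubles in size without new annotation work.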

Effective Pre-rating Method Based on Users' Dichotomous Preferences and Average Ratings Fusion for Recommender Systems

  • Cheng, Shulin;Wang, Wanyan;Yang, Shan;Cheng, Xiufang
    • Journal of Information Processing Systems / Vol. 17, No. 3 / pp. 462-472 / 2021
  • With an increase in the scale of recommender systems, users' rating data tend to be extremely sparse. Some methods have been utilized to alleviate this problem; nevertheless, it has not been satisfactorily solved yet. Therefore, we propose an effective pre-rating method based on users' dichotomous preferences and average ratings fusion. First, based on a user-item ratings matrix, a new user-item preference matrix was constructed to analyze and model user preferences. The items were then divided into two categories based on a parameterized dynamic threshold. The missing ratings for items that the user was not interested in were directly filled with the lowest user rating; otherwise, fusion ratings were utilized to fill the missing ratings. Further, an optimized parameter λ was introduced to adjust their weights. Finally, we verified our method on a standard dataset. The experimental results show that our method can effectively reduce the prediction error and improve the recommendation quality. As for its application, our method is effective, but not complicated.
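
The filling rule described in this abstract can be sketched roughly as below. The abstract does not give the preference matrix, the dynamic threshold, or the fusion formula, so this sketch takes the set of interesting (user, item) pairs as a ready-made hypothetical input and assumes the fusion rating is a λ-weighted mix of the user's average and the item's average.

```python
def pre_rate(ratings, interested, lam=0.5):
    """Fill missing ratings before running a recommender.

    ratings:    dict user -> dict item -> rating (missing items absent);
                every user is assumed to have at least one rating.
    interested: set of (user, item) pairs judged interesting. Hypothetical
                input; the paper derives this from a preference matrix
                and a parameterized dynamic threshold.
    lam:        weight of the user average in the assumed fusion rating.
    """
    items = {i for user_ratings in ratings.values() for i in user_ratings}
    item_avg = {}
    for i in items:
        values = [r[i] for r in ratings.values() if i in r]
        item_avg[i] = sum(values) / len(values)

    filled = {}
    for user, user_ratings in ratings.items():
        user_avg = sum(user_ratings.values()) / len(user_ratings)
        user_min = min(user_ratings.values())
        filled[user] = dict(user_ratings)
        for i in items:
            if i in user_ratings:
                continue  # keep observed ratings untouched
            if (user, i) in interested:
                # Assumed fusion of user and item averages.
                filled[user][i] = lam * user_avg + (1 - lam) * item_avg[i]
            else:
                # Items of no interest get the user's lowest rating.
                filled[user][i] = user_min
    return filled
```

With `lam=1.0` the fill depends only on the user's own average, and with `lam=0.0` only on the item's average, which is one way to read the role of the weight parameter.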

Feature selection-based Risk Prediction for Hypertension in Korean men

  • 홍고르출;김미혜
    • Proceedings of the Korea Information Processing Society Conference / Korea Information Processing Society 2021 Spring Conference / pp. 323-325 / 2021
  • In this article, we improve hypertension prediction by applying a feature selection method to the Korean national health data of the KNHANES database. The study identified a variety of risk factors associated with chronic hypertension. The paper is divided into two modules. The first is a data pre-processing step that applies a factor analysis (FA) based feature selection method to the dataset. The second applies a predictive analysis step to detect and predict hypertension risk. We compare the mean standard error (MSE), F1-score, and area under the ROC curve (AUC) of each classification model. The test results show that the proposed FIFA-OE-NB algorithm achieves an MSE, F1-score, and AUC of 0.259, 0.460, and 64.70%, respectively. These results demonstrate that the proposed FIFA-OE method outperforms other models for hypertension risk prediction.
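
Two of the evaluation measures named in this abstract, F1-score and AUC, can be computed as in the sketch below. These are the standard definitions for binary labels (1 = hypertension, 0 = no hypertension); the function names and the pairwise AUC formulation are choices made for this illustration, not code from the paper.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Requires at least one sample per class.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```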

Investigation of light stimulated mouse brain activation in high magnetic field fMRI using image segmentation methods

  • Kim, Wook;Woo, Sang-Keun;Kang, Joo Hyun;Lim, Sang Moo
    • Journal of the Korea Society of Computer and Information / Vol. 21, No. 12 / pp. 11-18 / 2016
  • Magnetic resonance imaging (MRI) is widely used in brain research and medical imaging. In particular, functional magnetic resonance imaging (fMRI), a non-invasive technique for imaging brain activation, is used in brain studies. In this study, we investigate brain activation induced by LED light stimulation. To investigate brain activation in a small experimental animal, we used a high-field 9.4T MRI. The experimental animal was a Balb/c mouse, and fMRI was acquired using echo planar imaging (EPI). EPI takes less time than other MRI methods; for this reason, however, EPI data have low contrast. Because of the low contrast, image pre-processing is difficult and inaccurate. We planned the study protocol as what is called a block design in fMRI research. The block design has 8 LED light stimulation sessions and 8 rest sessions. Each block consists of 6 EPI images, and acquiring one EPI image slice takes 16 seconds. During a light session, LED light stimulation was applied for 1 minute 36 seconds. During a rest session, no light stimulation was applied and the light remained off for 1 minute 36 seconds. These sessions were repeated throughout the EPI scan, so the total EPI scan time was almost 26 minutes. After acquiring the EPI data, we analyzed the image data using statistical parametric mapping (SPM) software and performed image pre-processing such as realignment, co-registration, normalization, and smoothing. The fMRI data must be segmented with this software during pre-processing. However, segmentation offers 3 different methods: Gaussian nonparametric, warped modulate, and tissue probability map. We applied these 3 methods and compared how they change the fMRI analysis results. The results show that LED light stimulation activated the superior colliculus region of the mouse brain, and the highest activation value was obtained with the tissue probability map segmentation method. This study may help to improve brain activation studies using EPI and SPM analysis.
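
The timing figures in this abstract follow from simple arithmetic, which the sketch below reproduces. The order of the blocks is not stated in the abstract, so starting with a stimulation block and alternating is an assumption, as are the names used here.

```python
SECONDS_PER_IMAGE = 16      # acquisition time of one EPI image
IMAGES_PER_BLOCK = 6        # EPI images per block
BLOCKS_PER_CONDITION = 8    # 8 stimulation blocks and 8 rest blocks


def block_seconds():
    return SECONDS_PER_IMAGE * IMAGES_PER_BLOCK


def build_schedule(first="stimulation"):
    """Return (label, start_s, end_s) tuples for alternating blocks.

    Assumption: blocks strictly alternate, starting with `first`.
    """
    second = "rest" if first == "stimulation" else "stimulation"
    schedule = []
    start = 0
    for _ in range(BLOCKS_PER_CONDITION):
        for label in (first, second):
            end = start + block_seconds()
            schedule.append((label, start, end))
            start = end
    return schedule


schedule = build_schedule()
total_seconds = schedule[-1][2]
print(block_seconds())                    # 96 s, i.e. 1 min 36 s
print(total_seconds, total_seconds / 60)  # 1536 s, i.e. 25.6 min
```

The 25.6-minute total is consistent with the "almost 26 minutes" reported for the whole EPI scan.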

A Design of RSIDS using Rough Set Theory and Support Vector Machine Algorithm

  • 이병관;정은희
    • Journal of the Korea Society of Computer and Information / Vol. 17, No. 12 / pp. 179-185 / 2012
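  • This paper designs RSIDS (RST and SVM based Intrusion Detection System) using the RST (Rough Set Theory) and SVM (Support Vector Machine) algorithms. RSIDS consists of the PrePro (Preprocessing) module, the RRG (RST based Rule Generation) module, and the SAD (SVM based Attack Detection) module. The PrePro module converts the collected information into the RSIDS data format. The RRG module analyzes attack data to generate attack rules, uses those rules to extract attack information from large volumes of data, and passes the extracted attack information to the SAD module. The SAD module uses the extracted attack information to detect attacks and notifies the administrator. As a result, compared with the existing SVM, RSIDS improved the average attack detection rate from 77.71% to 85.28% and reduced the average FPR from 13.25% to 9.87%. RSIDS can therefore be said to improve on the existing SVM-based attack detection technique.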
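
The two measures reported here, attack detection rate and false positive rate (FPR), can be computed from confusion-matrix counts as sketched below. The definitions are the standard ones and are assumed to match the paper's usage; the helper names are hypothetical. The last lines restate the reported averages as percentage-point changes.

```python
def detection_rate(true_positives, false_negatives):
    """Share of actual attacks that were flagged as attacks."""
    return true_positives / (true_positives + false_negatives)


def false_positive_rate(false_positives, true_negatives):
    """Share of normal traffic that was wrongly flagged as an attack."""
    return false_positives / (false_positives + true_negatives)


def percentage_point_change(before, after):
    return round(after - before, 2)


# Averages reported in the abstract (SVM baseline vs. RSIDS).
print(percentage_point_change(77.71, 85.28))  # detection rate: +7.57
print(percentage_point_change(13.25, 9.87))   # FPR: -3.38
```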

New Medical Image Fusion Approach with Coding Based on SCD in Wireless Sensor Network

  • Zhang, De-gan;Wang, Xiang;Song, Xiao-dong
    • Journal of Electrical Engineering and Technology / Vol. 10, No. 6 / pp. 2384-2392 / 2015
  • The technical development and practical application of big data for health is a hot topic under the banner of big data, and big-data medical image fusion is one of its key problems. This paper proposes a new fusion approach with coding based on the Spherical Coordinate Domain (SCD) in a Wireless Sensor Network (WSN) for big-data medical images. In this approach, the three high-frequency coefficients in the wavelet domain of the medical image are pre-processed. This pre-processing strategy can reduce the redundancy ratio of big-data medical images. First, the high-frequency coefficients are transformed to the spherical coordinate domain to reduce the correlation within the same scale. Then, a multi-scale model product (MSMP) is used to control the shrinkage function so that small wavelet coefficients and some noise are removed. The high-frequency parts in the spherical coordinate domain are coded by an improved SPIHT algorithm. Finally, the image is fused and reconstructed based on the multi-scale edges of the medical image. Experimental results indicate that the novel approach is effective and very useful for transmitting big-data medical images, especially in a wireless environment.
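
The first pre-processing step, moving the three high-frequency wavelet coefficients at one position into spherical coordinates, might look like the sketch below. The abstract does not define the mapping, so treating the horizontal, vertical, and diagonal detail coefficients as Cartesian x, y, z and using the usual physics convention for the angles are assumptions.

```python
import math


def to_spherical(horizontal, vertical, diagonal):
    """Map three detail coefficients to (radius, polar, azimuth)."""
    radius = math.sqrt(horizontal ** 2 + vertical ** 2 + diagonal ** 2)
    if radius == 0.0:
        return 0.0, 0.0, 0.0
    # Clamp to guard against rounding just outside acos's domain.
    polar = math.acos(max(-1.0, min(1.0, diagonal / radius)))
    azimuth = math.atan2(vertical, horizontal)
    return radius, polar, azimuth


def from_spherical(radius, polar, azimuth):
    """Inverse mapping back to the three detail coefficients."""
    return (
        radius * math.sin(polar) * math.cos(azimuth),
        radius * math.sin(polar) * math.sin(azimuth),
        radius * math.cos(polar),
    )
```

After the transform, the radius gathers the combined energy of the three sub-bands at that position, so a shrinkage rule can act on one magnitude instead of three correlated values.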

Automated Geo-registration for Massive Satellite Image Processing

  • 허준;박완용;방수남
    • Proceedings of the Korea Spatial Information System Society Conference / Korea Spatial Information System Society 2005 GIS/RS Joint Spring Conference / pp. 345-349 / 2005
  • Processing massive amounts of satellite imagery, such as global or continental-level analysis and monitoring, requires automated and speedy geo-registration. There are two major automated approaches: (1) rigid mathematical modeling using a sensor model and ephemeris data; (2) heuristic co-registration with respect to an existing reference image. For ETM+, the accuracy of the first approach is known to be an RMSE of 250 m, which is far below the accuracy required for most satellite image processing. The second approach, on the other hand, finds identical points between the new image and the reference image and uses a heuristic regression model for registration. It shows better accuracy but is computationally expensive. To improve the efficiency of the co-registration approach, the author proposed a pre-qualified matching algorithm composed of feature extraction with the Canny operator and area matching with the correlation coefficient. With the pre-qualification approach, the computation time was significantly reduced and the registration accuracy was improved. A prototype was implemented and tested with the proposed algorithm. The performance test on 14 TM/ETM+ images in the U.S. showed: (1) the average RMSE of the approach was 0.47, depending on terrain and features; (2) the average number of matching points was over 15,000; (3) the processing time was 12 min per image with a 3.2 GHz Intel Pentium 4 and 1 GB of RAM.
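
Area matching with the correlation coefficient, one component of the algorithm described here, can be sketched as an exhaustive search over window positions. This is a plain illustration on small nested-list images, not the pre-qualified algorithm of the paper, which first restricts candidates using Canny edge features; the function names are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # flat patches carry no matching information
    return cov / math.sqrt(var_a * var_b)


def best_match(template, image):
    """Slide the template over the image; return (row, col, score)."""
    th, tw = len(template), len(template[0])
    flat_template = [v for row in template for v in row]
    best = (0, 0, -2.0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            window = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            score = correlation(flat_template, window)
            if score > best[2]:
                best = (r, c, score)
    return best
```

Searching every position is what makes this approach expensive on full scenes, which is the cost a pre-qualification step aims to cut.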


Performance analysis on the geometric correction algorithms using GCPs - polynomial warping and full camera modelling algorithm

  • Shin, Dong-Seok;Lee, Young-Ran
    • Proceedings of the Korean Society of Remote Sensing Conference / Korean Society of Remote Sensing 1998 Proceedings of International Symposium on Remote Sensing / pp. 252-256 / 1998
  • Accurate mapping of satellite images is one of the most important parts of many remote sensing applications. Since the position and the attitude of a satellite during image acquisition cannot be determined accurately enough, ground-mapping errors of several hundred meters are normal in systematically corrected images. Users who require pixel-level or sub-pixel-level mapping accuracy for high-resolution satellite images must use a number of Ground Control Points (GCPs). In this paper, the performance of two geometric correction algorithms is tested and compared. One is the polynomial warping algorithm, which is simple and popular enough to be implemented in most commercial satellite image processing software. The other is the full camera modelling algorithm using physical orbit-sensor-Earth geometry, which is used in satellite image data receiving, pre-processing, and distribution stations. Several criteria were considered for the performance analysis: ultimate correction accuracy, GCP representativeness, number of GCPs required, convergence speed, sensitivity to inaccurate GCPs, and usefulness of the correction results. This paper focuses on the usefulness of the precision correction algorithm for regular image pre-processing operations. This means that not only the final correction accuracy but also the number of GCPs required for an image correction and their spatial distribution are important factors. Both correction algorithms were implemented and will be used for the precision correction of KITSAT-3 images.
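
Polynomial warping from GCPs can be illustrated with a first-order (affine) polynomial fitted by least squares. The abstract does not state the polynomial order or the solver used, so the first-order model, the normal-equation solution, and all names here are assumptions; at least three non-collinear GCPs are required.

```python
def solve_linear(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_first_order(gcps):
    """Fit x = a0 + a1*col + a2*row and y = b0 + b1*col + b2*row.

    gcps: list of ((col, row), (x, y)) image-to-map point pairs.
    Returns [[a0, a1, a2], [b0, b1, b2]].
    """
    basis = [(1.0, col, row) for (col, row), _ in gcps]
    ata = [[sum(b[i] * b[j] for b in basis) for j in range(3)] for i in range(3)]
    coeffs = []
    for k in range(2):  # k = 0 fits x, k = 1 fits y
        atb = [sum(b[i] * g[1][k] for b, g in zip(basis, gcps)) for i in range(3)]
        coeffs.append(solve_linear(ata, atb))
    return coeffs


def warp(coeffs, col, row):
    """Map an image position to map coordinates."""
    return tuple(c[0] + c[1] * col + c[2] * row for c in coeffs)
```

Using more GCPs than unknowns lets the least-squares fit average out errors in individual points, which relates to the abstract's concern with the number and spatial distribution of GCPs.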
