• 제목/요약/키워드: data set

검색결과 10,939건 처리시간 0.035초

영상 데이터 특징 커버리지 기반 딥러닝 모델 검증 기법 (Deep Learning Model Validation Method Based on Image Data Feature Coverage)

  • 임창남;박예슬;이정원
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • 제10권9호
    • /
    • pp.375-384
    • /
    • 2021
  • 딥러닝 기법은 영상 처리 분야에서 높은 성능을 입증 받아 다양한 분야에서 적용되고 있다. 이러한 딥러닝 모델의 검증에 가장 널리 사용되는 방법으로는 홀드아웃 검증 방법, k-겹 교차 검증 방법, 부트스트랩 방법 등이 있다. 이러한 기존의 기법들은 데이터 셋을 분할하는 과정에서 클래스 간의 비율에 대한 균형을 고려하지만, 같은 클래스 내에서도 존재하는 다양한 특징들의 비율은 고려하지 않고 있다. 이러한 특징들을 고려하지 않을 경우, 일부 특징에 편향된 검증 결과를 얻게 될 수 있다. 따라서 본 논문에서는 기존 검증 방법들을 개선하여 영상 분류를 위한 데이터 특징 커버리지 기반의 딥러닝 모델 검증 기법을 제안한다. 제안하는 기법은 딥러닝 모델의 학습과 검증을 위한 훈련 데이터 셋과 평가 데이터 셋이 전체 데이터 셋의 특징을 얼마나 반영하고 있는지 수치로 측정할 수 있는 데이터 특징 커버리지를 제안한다. 이러한 방식은 전체 데이터 셋의 특징을 모두 포함하도록 커버리지를 보장하여 데이터 셋을 분할할 수 있고, 모델의 평가 결과를 생성한 특징 군집 단위로 분석할 수 있다. 검증결과, 훈련 데이터 셋의 데이터 특징 커버리지가 낮아질 경우, 모델이 특정 특징에 편향되게 학습하여 모델의 성능이 낮아지며, Fashion-MNIST의 경우 정확도가 8.9%까지 차이나는 것을 확인하였다.

집합 기반 POI 검색을 이용한 문장 유사도 측정 기법 (Sentence Similarity Measurement Method Using a Set-based POI Data Search)

  • 고은별;이종우
    • 정보과학회 컴퓨팅의 실제 논문지
    • /
    • 제20권12호
    • /
    • pp.711-716
    • /
    • 2014
  • 최근 논문 표절 논란과 지능형 텍스트 검색서비스에 대한 관심이 증가하면서 문장 유사도 측정의 필요성이 증가하고 있다. n-gram, 편집거리, LSA 등 기존의 다양한 방향으로 선행 연구가 있었지만 각 기법마다 장단점이 존재한다. 본 논문에서는 집합 기반 POI 검색 기법을 이용한 새로운 방향의 문장 유사도 측정 기법을 제안한다. 집합 기반 POI 검색 기법은 하드매칭에 비해 단어의 도치, 누락, 삽입, 변경에 현저한 성능 향상을 보인다. 이 기법을 이용하면 보다 정확하고 빠른 문장 유사도 측정이 가능하다. 제안하는 기법은 기존 집합 기반 POI 검색 기법의 데이터 로딩 알고리즘과 텍스트 검색 알고리즘을 변형하고 어절 연산 알고리즘을 추가하여 두 문장의 유사도를 백분율로 표현한다. 실험을 통해 본 논문에서 제시하는 기법이 정확도와 속도에서 n-gram과 기존 집합 기반 POI 검색 기법에 비해 우수함을 확인하였다.

Information Engineering and Workflow Design in a Clinical Decision Support System for Colorectal Cancer Screening in Iran

  • Maserat, Elham;Farajollah, Seiede Sedigheh Seied;Safdari, Reza;Ghazisaeedi, Marjan;Aghdaei, Hamid Asadzadeh;Zali, Mohammad Reza
    • Asian Pacific Journal of Cancer Prevention
    • /
    • 제16권15호
    • /
    • pp.6605-6608
    • /
    • 2015
  • Background: Colorectal cancer is a major cause of morbidity and mortality throughout the world. Colorectal cancer screening is an optimal way for reducing of morbidity and mortality and a clinical decision support system (CDSS) plays an important role in predicting success of screening processes. DSS is a computer-based information system that improves the delivery of preventive care services. The aim of this article was to detail engineering of information requirements and work flow design of CDSS for a colorectal cancer screening program. Materials and Methods: In the first stage a screening minimum data set was determined. Developed and developing countries were analyzed for identifying this data set. Then information deficiencies and gaps were determined by check list. The second stage was a qualitative survey with a semi-structured interview as the study tool. A total of 15 users and stakeholders' perspectives about workflow of CDSS were studied. Finally workflow of DSS of control program was designed by standard clinical practice guidelines and perspectives. Results: Screening minimum data set of national colorectal cancer screening program was defined in five sections, including colonoscopy data set, surgery, pathology, genetics and pedigree data set. Deficiencies and information gaps were analyzed. Then we designed a work process standard of screening. Finally workflow of DSS and entry stage were determined. Conclusions: A CDSS facilitates complex decision making for screening and has key roles in designing optimal interactions between colonoscopy, pathology and laboratory departments. Also workflow analysis is useful to identify data reconciliation strategies to address documentation gaps. Following recommendations of CDSS should improve quality of colorectal cancer screening.

HTTP 기반의 자바를 이용한 원격 감시 및 제어 시스템 (HTTP based remote monitoring and control system using JAVA)

  • 이경웅;최한수
    • 제어로봇시스템학회논문지
    • /
    • 제10권9호
    • /
    • pp.847-854
    • /
    • 2004
  • In this paper, It is studied to control and to monitor the remote system state using HTTP(Hyper Text Transfer Protocol) object communication. The remote control system is controlled by using a web browser or a application program. This system is organized by three different part depending on functionality-server part, client part, controller part. The java technology is used to composite the server part and the client part and C language is used for a controller. The server part is waiting for the request of client part and then the request is reached, the server part saves client data to the database and send a command set to the client part. The administrator can control the remote system just using a web browser. Remote part is worked by timer that is activated per 1 second. It gets the measurement data of the controller part, and then send the request to the server part and get a command set in the command repository of server part using the client ID. After interpreting the command set, the client part transfers the command set to the controller part. Controller part can be activated by the client part. If send command is transmitted by the client part, it sends sensor monitoring data to the client part and command set is transmitted then setting up the value of the controlled system.

A Filter Lining Scheme for Efficient Skyline Computation

  • Kim, Ji-Hyun;Kim, Myung
    • 한국멀티미디어학회논문지
    • /
    • 제14권12호
    • /
    • pp.1591-1600
    • /
    • 2011
  • The skyline of a multidimensional data set is the maximal subset whose elements are not dominated by other elements of the set. Skyline computation is considered to be very useful for a decision making system that deals with multidimensional data analyses. Recently, a great deal of interests has been shown to improve the performance of skyline computation algorithms. In order to speedup, the number of comparisons between data elements should be reduced. In this paper, we propose a filter lining scheme to accomplish such objectives. The scheme divides the multidimensional data space into angle-based partitions, and places a filter for each partition, and then connects them together in order to establish the final filter line. The filter line can be used to eliminate data, that are not part of the skyline, from the original data set in the preprocessing stage. The filter line is adaptively improved during the data scanning stage. In addition, skylines are computed for each remaining data partition, and are then merged to form the final skyline. Our scheme is an improvement of the previously reported simple preprocessing scheme using simple filters. The performance of the scheme is shown by experiments.

PREDICTION OF RESIDUAL STRESS FOR DISSIMILAR METALS WELDING AT NUCLEAR POWER PLANTS USING FUZZY NEURAL NETWORK MODELS

  • Na, Man-Gyun;Kim, Jin-Weon;Lim, Dong-Hyuk
    • Nuclear Engineering and Technology
    • /
    • 제39권4호
    • /
    • pp.337-348
    • /
    • 2007
  • A fuzzy neural network model is presented to predict residual stress for dissimilar metal welding under various welding conditions. The fuzzy neural network model, which consists of a fuzzy inference system and a neuronal training system, is optimized by a hybrid learning method that combines a genetic algorithm to optimize the membership function parameters and a least squares method to solve the consequent parameters. The data of finite element analysis are divided into four data groups, which are split according to two end-section constraints and two prediction paths. Four fuzzy neural network models were therefore applied to the numerical data obtained from the finite element analysis for the two end-section constraints and the two prediction paths. The fuzzy neural network models were trained with the aid of a data set prepared for training (training data), optimized by means of an optimization data set and verified by means of a test data set that was different (independent) from the training data and the optimization data. The accuracy of fuzzy neural network models is known to be sufficiently accurate for use in an integrity evaluation by predicting the residual stress of dissimilar metal welding zones.

An Alternative Method of Regression: Robust Modified Anti-Hebbian Learning

  • Hwang, Chang-Ha
    • Journal of the Korean Data and Information Science Society
    • /
    • 제7권2호
    • /
    • pp.203-210
    • /
    • 1996
  • A linear neural unit with a modified anti-Hebbian learning rule has been shown to be able to optimally fit curves, surfaces, and hypersurfaces by adaptively extracting the minor component of the input data set. In this paper, we study how to use the robust version of this neural fitting method for linear regression analysis. Furthermore, we compare this method with other methods when data set is contaminated by outliers.

  • PDF

Tolerance Rough Set Approaches in the Classification of Multi-Attribute Data

  • Lee, Jaeik;Suh Kapsun;Suh, Yong-Soo
    • 한국지능시스템학회:학술대회논문집
    • /
    • 한국퍼지및지능시스템학회 1997년도 추계학술대회 학술발표 논문집
    • /
    • pp.419-423
    • /
    • 1997
  • This paper is concerned about the classification of objects together with muti-attributes such as remote sensing image data by using tolerance rough set. To produce more reliable relations from given attributes in the data, we define new similarity measures by using scaling. Our Method will be applied to classify multi-spectral image data.

  • PDF

On the Estimation of Parameters in ALT under Generalized Exponential Distribution

  • Yoon, Sang-Chul
    • Journal of the Korean Data and Information Science Society
    • /
    • 제16권4호
    • /
    • pp.923-931
    • /
    • 2005
  • The two parameter generalized exponential distribution was recently introduced by Gupta and Kundu (1999). It is observed that the generalized exponential distribution can be used quite effectively to analyze skewed data set. This paper develops the accelerated life test model using generalized exponential distribution and considers maximum likelihood estimation of parameters under the tampered random variable model. To show the performance of proposed maximum likelihood estimates, some simulation will be performed. Using a real data set, an example will be given.

  • PDF

Comparison of EKF and UKF on Training the Artificial Neural Network

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society
    • /
    • 제15권2호
    • /
    • pp.499-506
    • /
    • 2004
  • The Unscented Kalman Filter is known to outperform the Extended Kalman Filter for the nonlinear state estimation with a significance advantage that it does not require the computation of Jacobian but EKF has a competitive advantage to the UKF on the performance time. We compare both algorithms on training the artificial neural network. The validation data set is used to estimate parameters which are supposed to result in better fitting for the test data set. Experimental results are presented which indicate the performance of both algorithms.

  • PDF