• Title/Abstract/Keywords: data set

10,939 search results (processing time: 0.039 s)

Study on the Improvement of Machine Learning Ability through Data Augmentation

  • 김태우;신광성
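    • Korea Institute of Information and Communication Engineering: Conference Proceedings / Korea Institute of Information and Communication Engineering, 2021 Spring Conference / pp.346-347 / 2021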
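  • In pattern recognition for machine learning, performance improves as the amount of training data grows. However, a large amount of training data cannot always be secured for the kinds of patterns that must be detected in everyday settings. A small data set therefore needs to be enlarged in a meaningful way for general machine learning. This study investigates techniques for augmenting data so that machine learning can be carried out. Transfer learning is a representative way to perform machine learning with a small data set: a model is first trained on a general-purpose data set, and the target data set is then applied at the final stage to obtain results. In this study, a model pre-trained on a general-purpose data set such as ImageNet is used as a feature extractor, together with the augmented data, to detect the desired patterns.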

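
The abstract above does not say which augmentation transforms were used. As a minimal sketch, assuming simple geometric transforms (flips and a 90-degree rotation) on a grayscale image stored as a list of rows, a small data set could be enlarged as follows. The function names are hypothetical, and the transfer-learning step is not shown.

```python
def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(column) for column in zip(*image[::-1])]


def augment(image):
    """Return the original image plus three transformed copies.

    Assumption: these transforms keep the label valid, which is not
    true for every task (for example, digits or text).
    """
    return [
        [row[:] for row in image],
        flip_horizontal(image),
        flip_vertical(image),
        rotate_90_clockwise(image),
    ]


sample = [[1, 2],
          [3, 4]]
augmented = augment(sample)
```

Each source image yields four training samples here; whether such transforms are "meaningful" in the abstract's sense depends on the pattern being detected.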

AVHRR MOSAIC IMAGE DATA SET FOR ASIAN REGION

  • Yokoyama, Ryuzo;Lei, Liping;Purevdorj, Ts.;Tanba, Sumio
    • Korean Society of Remote Sensing: Conference Proceedings / Korean Society of Remote Sensing, 1999, Proceedings of International Symposium on Remote Sensing / pp.285-289 / 1999
  • A processing system for producing cloud-free composite image data sets was developed. The process applies a fine geometric correction based on orbit parameters and ground control points, and a radiometric correction based on the 6S code. At present, a data set of 10-day composite images covering almost the whole Asian region is being produced from AVHRR image data received at Tokyo, Okinawa, Ulaanbaatar and Bangkok.


Parameter Tuning in Support Vector Regression for Large Scale Problems

  • 류지열;곽민정;윤민
    • Journal of Korean Institute of Intelligent Systems / Vol. 25, No. 1 / pp.15-21 / 2015
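  • Tuning the kernel parameters affects the generalization ability of support vector machines, and determining appropriate values for these parameters is often difficult. In support vector regression, the burden of determining these parameter values can be reduced by using ensemble learning. However, ensemble learning is generally too time-consuming to apply directly to problems with large-scale data. This paper proposes a method that decomposes the original data set into a finite number of subsets in order to reduce the burden of parameter tuning in support vector regression. The proposed method is shown to be effective for large-scale data and, in particular, for imbalanced data sets.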
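
The abstract above does not describe how the subsets are formed or combined. As a rough sketch of the general idea only, assuming a random split into disjoint subsets and a simple average of per-subset predictions, the decomposition could look like this. An ordinary least-squares line is used as a stand-in regressor (it is NOT support vector regression), and all names are hypothetical.

```python
import random
from statistics import mean


def split_into_subsets(samples, n_subsets, seed=0):
    """Shuffle the samples and deal them into n_subsets disjoint subsets."""
    if n_subsets < 1:
        raise ValueError("n_subsets must be at least 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def fit_line(points):
    """Stand-in regressor (not SVR): least-squares line through (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    return slope, y_bar - slope * x_bar


def ensemble_predict(models, x):
    """Average the predictions of the per-subset models."""
    return mean(slope * x + intercept for slope, intercept in models)


# Synthetic data on the line y = 2x + 1 (illustrative only).
data = [(float(x), 2.0 * x + 1.0) for x in range(100)]
subsets = split_into_subsets(data, n_subsets=4)
models = [fit_line(subset) for subset in subsets]
prediction = ensemble_predict(models, 10.0)
```

Each model is trained on a quarter of the data, so any per-model parameter search would run on much smaller problems than the full data set.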

Detecting differentially expressed genes from a mixed data set

  • Lee, Sun-Ho;Kim, In-Young;Kim, Sang-Cheol;Rha, Sun-Young;Chung, Hyun-Chel;Kim, Byung-Soo
    • Korean Statistical Society: Conference Proceedings / Korean Statistical Society, 2003 Autumn Conference Proceedings / pp.173-177 / 2003
  • When we have both a paired data set and two independent data sets, neither a paired t-test nor a two-sample t-test can be used to detect differences between two samples. In order to identify differentially expressed genes in a mixed data set, a new test statistic is proposed.

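
For reference, the two standard statistics named in the abstract above can be sketched as follows. This is only the textbook paired t statistic and the pooled-variance two-sample t statistic; the new test statistic proposed in the paper is not given in the abstract and is not reproduced here. Function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before, after):
    """Paired t statistic: mean difference over its standard error."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for b, a in zip(before, after)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def pooled_two_sample_t(xs, ys):
    """Two-sample t statistic with pooled variance (equal variances assumed)."""
    nx, ny = len(xs), len(ys)
    pooled_var = ((nx - 1) * variance(xs) + (ny - 1) * variance(ys)) / (nx + ny - 2)
    return (mean(xs) - mean(ys)) / sqrt(pooled_var * (1 / nx + 1 / ny))


t_paired = paired_t([1, 2, 3, 4], [2, 4, 5, 7])
t_independent = pooled_two_sample_t([1, 2, 3], [2, 3, 4])
```

A mixed data set contains both kinds of observations, which is why neither function alone uses all of the data.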

Extraction of Fuzzy Rules from Data using Rough Set

  • 조영완;노흥식;위성윤;이희진;박민용
    • Korean Institute of Intelligent Systems: Conference Proceedings / Korea Fuzzy Logic and Intelligent Systems Society, 1996 Autumn Conference Proceedings / pp.327-332 / 1996
  • Rough set theory, proposed by Pawlak, can describe the degree of relation between the condition and decision attributes of data that carry no linguistic information. In this paper, using this ability of rough set theory, we define an occupancy degree, a measure that represents the degree of relation between the condition and decision attributes of a data table. We also propose a method that finds an optimal fuzzy rule table and the membership functions of the input and output variables from data without linguistic information, and we examine the validity of the method by modeling data generated by fuzzy rules.

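
The occupancy degree itself is not defined in the abstract above. As a related illustration only, the sketch below computes the classical dependency degree from Pawlak's rough set theory: the fraction of rows whose condition-attribute values determine the decision unambiguously. The table contents and names are made up for the example.

```python
from collections import defaultdict


def dependency_degree(rows, condition_attrs, decision_attr):
    """Classical rough-set dependency degree of the decision on the conditions.

    Rows with identical condition values form an equivalence class; a class
    belongs to the positive region when all its rows share one decision value.
    """
    decisions_by_class = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        decisions_by_class[key].add(row[decision_attr])
    consistent_rows = sum(
        1
        for row in rows
        if len(decisions_by_class[tuple(row[a] for a in condition_attrs)]) == 1
    )
    return consistent_rows / len(rows)


# Hypothetical data table: two condition attributes and one decision attribute.
table = [
    {"temp": "high", "humid": "high", "fan": "on"},
    {"temp": "high", "humid": "low", "fan": "on"},
    {"temp": "low", "humid": "low", "fan": "off"},
    {"temp": "low", "humid": "low", "fan": "on"},
]
gamma_both = dependency_degree(table, ["temp", "humid"], "fan")
gamma_humid = dependency_degree(table, ["humid"], "fan")
```

A value of 1.0 would mean the conditions fully determine the decision; lower values indicate conflicting rows.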

TMY2 Weather data for Korea

  • 신기식;윤창렬;박상동
    • Korean Society for New and Renewable Energy: Conference Proceedings / Korean Society for New and Renewable Energy, 2009 Spring Conference Proceedings / pp.243-246 / 2009
  • Many building simulation programs are used to evaluate building energy performance, and their capabilities continue to grow. Despite these increased capabilities, the weather data used in building energy performance evaluation still come from the same limited set of data. This often forces users to find or calculate weather data such as illuminance, solar radiation, and ground temperature from other sources. Proper selection of a weather data set is also considered one of the important factors for a successful building energy simulation. In this paper, we describe TMY2 data, a generalized weather data format, apply it to the Seoul region, and examine the differences from existing weather data. A raw weather database covering 23 years has been developed to provide the weather data file for building energy analysis in Seoul.


Utilizing the GOA-RF hybrid model, predicting the CPT-based pile set-up parameters

  • Zhao, Zhilong;Chen, Simin;Zhang, Dengke;Peng, Bin;Li, Xuyang;Zheng, Qian
    • Geomechanics and Engineering / Vol. 31, No. 1 / pp.113-127 / 2022
  • The undrained shear strength of soil is considered one of the most significant engineering parameters in geotechnical design methods. In-situ experiments such as cone penetration tests (CPT) have been used in recent years to estimate the undrained shear strength from the characteristics of the soil. Nevertheless, most of these techniques rely on correlation assumptions, which may lead to uneven accuracy. The general aim of this research is to put forward a new hybrid soft computing model, a combination of random forest (RF) with the grasshopper optimization algorithm (GOA), for better approximation of the pile set-up parameters from CPT, based on two different types of input data. Data type 1 contains pile parameters, and data type 2 consists of soil properties. The contribution of this article is that the hybrid GOA-RF model is proposed for the first time to forecast the pile set-up parameter from CPT. To do this, CPT data and related bore log data were gathered from 70 locations across Louisiana. With R2 values greater than 0.9098, indicating an acceptable relationship between measured and predicted values, the results demonstrated that both models perform well in forecasting the set-up parameter. In both the training and testing steps, the model using data type 2 performs better than the model using data type 1, with R2 and RMSE of 0.9272 and 0.0305 for the training step and 0.9182 and 0.0415 for the testing step. Overall, the results show that the A parameter can be forecast with adequate precision from the CPT data using hybrid GOA-RF models, and that the RF model with soil features as input parameters gives the better description of the pile set-up parameters.
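
The two scores quoted in the abstract above can be computed as sketched below. This assumes R2 is the coefficient of determination (one minus the ratio of residual to total sum of squares); the abstract does not state which definition the authors used. The numbers in the example are made up and are not from the paper.

```python
from math import sqrt
from statistics import mean


def rmse(measured, predicted):
    """Root mean squared error between measured and predicted values."""
    return sqrt(mean((m - p) ** 2 for m, p in zip(measured, predicted)))


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m_bar = mean(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - m_bar) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score_rmse = rmse(measured, predicted)
score_r2 = r_squared(measured, predicted)
```

Reporting both scores on the training and testing steps, as the abstract does, helps show whether a model generalizes or only fits the training data.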

The Effect of Bias in Data Set for Conceptual Clustering Algorithms

  • Lee, Gye Sung
    • International Journal of Advanced Smart Convergence / Vol. 8, No. 3 / pp.46-53 / 2019
  • When a partitioned structure is derived from a data set using a clustering algorithm, it is not unusual to obtain a different set of outcomes when the algorithm runs on the data in a different order. This is known as the order bias problem. Many algorithms in machine learning try to achieve an optimized result from the available training and test data. Optimization is determined by an evaluation function, which also has a tendency toward a certain goal. Such a tendency in the evaluation function is inevitable, both for efficiency and for consistency of the result, but its preference for a specific goal may sometimes lead to unfavorable consequences in the final clustering. To overcome these bias problems, a first clustering process constructs an initial partition, which is expected to indicate the possible range of the number of final clusters. We apply data-centric sorting to the data objects in the clusters of the partition to rearrange them in a new order. The same clustering procedure is then reapplied to the newly arranged data set to build a new partition. We have developed an algorithm that reduces the bias resulting from how data are fed into the algorithm. Experimental results show that the algorithm helps minimize order bias effects. We have also shown that the current evaluation measure used for the clustering algorithm is biased toward favoring a smaller number of larger clusters.
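
The order bias problem described above can be reproduced with a very small example. The sketch below uses a simple incremental "leader" clustering on one-dimensional points as a stand-in; it is not the conceptual clustering algorithm or the sorting procedure from the paper, and the function name and threshold are assumptions for illustration.

```python
def leader_clustering(points, threshold):
    """Incremental clustering that depends on the order of the input.

    Each point joins the first cluster whose leader (its first member) lies
    within the threshold; otherwise it starts a new cluster.
    """
    leaders = []
    clusters = []
    for point in points:
        for index, leader in enumerate(leaders):
            if abs(point - leader) <= threshold:
                clusters[index].append(point)
                break
        else:
            # No existing leader is close enough: open a new cluster.
            leaders.append(point)
            clusters.append([point])
    return clusters


# The same three points, presented in two different orders.
partition_a = leader_clustering([1.0, 2.0, 3.0], threshold=1.0)
partition_b = leader_clustering([2.0, 1.0, 3.0], threshold=1.0)
```

Here the first order gives two clusters and the second gives one, even though the data are identical. Re-ordering the data before a second clustering pass, as the abstract describes, targets exactly this kind of sensitivity.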

Preliminary Study on Utilization of Big Data from CCTV at Child Care Centers

  • 신나리;유애형
    • Korean Journal of Childcare and Education / Vol. 13, No. 6 / pp.43-67 / 2017
  • Objective: The purpose of this study was to explore the feasibility of utilizing image data recorded and accumulated from CCTV at child care centers. Methods: Literature reviews; consultations and workshops with scholars studying child development, legal professionals, and engineers; focus group interviews with professionals working with young children; and surveys of parents, directors and teachers were carried out. Results: It was found that the big data from CCTV at child care centers can be used to make policies and conduct research as a secondary data set after anonymization. Extracting implicit and useful data from images stored on CCTV is technically feasible. Analysis of the data can also be legally permitted, provided that informed consent is acquired. Conclusion/Implications: Image data from CCTV at child care centers can likely be utilized as a secondary data set for policy development and scholarly purposes, once the obstacles of the budget for additional infrastructure and the consent of information holders are overcome.

Study on Rainfall Characteristics for the Millimeter-wave Communication Systems-Comparisons of Rainfall rate data from Several observation methods.

  • Chung, H.S.;Song, B.H.;Lee, J.H.;Park, K.M.;Lee, K.A.
    • Korean Society of Remote Sensing: Conference Proceedings / Korean Society of Remote Sensing, 1999, Proceedings of International Symposium on Remote Sensing / pp.132-134 / 1999
  • Rainfall characteristics for designing optimum millimeter-wave communication systems were analyzed from two rainfall data sets. The two data sets were compared: one-minute rainfall rate data and one-hour synoptic observation data. The data sets differ in observation method and sampling frequency. We looked for trends and agreement in quality between the two data sets. We showed several results obtained by applying the one-minute rainfall data to a millimeter-wave attenuation model. A climatological one-minute rainfall rate data set over the Korean Peninsula will be made after a data quality control procedure.

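
To compare data sets with different sampling frequencies, the one-minute data must first be brought to the hourly scale. The sketch below shows one way to do that, assuming the one-minute values are rain rates in mm/h and the hourly observations are accumulations in mm; the abstract states neither unit, and the function name is hypothetical.

```python
def hourly_totals_from_minute_rates(minute_rates_mm_per_h):
    """Convert one-minute rain rates (mm/h) into hourly accumulations (mm).

    Each minute contributes rate / 60 mm, so an hourly total is the sum of
    60 consecutive rates divided by 60.
    """
    if len(minute_rates_mm_per_h) % 60 != 0:
        raise ValueError("expected a whole number of hours of one-minute data")
    totals = []
    for start in range(0, len(minute_rates_mm_per_h), 60):
        hour = minute_rates_mm_per_h[start:start + 60]
        totals.append(sum(hour) / 60.0)
    return totals


# Hour 1: steady 6 mm/h. Hour 2: a 30-minute burst at 60 mm/h, then dry.
rates = [6.0] * 60 + [60.0] * 30 + [0.0] * 30
totals = hourly_totals_from_minute_rates(rates)
```

The second hour shows why one-minute data matter for attenuation studies: a 30 mm hourly total hides the fact that the rate was 60 mm/h for half of the hour.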