• Title/Abstract/Keywords: methods of data analysis

Search results: 19,673 items (processing time: 0.052 s)

Economic Valuation of Public Sector Data: A Case Study on Small Business Credit Guarantee Data

  • 김동성;김종우;이홍주;강만수
    • 지식경영연구 / Vol. 18, No. 1 / pp.67-81 / 2017
  • With recent breakthroughs in machine learning and artificial intelligence, interest has grown in analyzing and using the big data that underpins the field. Although the economic value of data held by corporations and public institutions is widely recognized, research on how to evaluate that value remains insufficient. In this study, we therefore measured the economic value of the data generated through the small business guarantee program of the Korean Federation of Credit Guarantee Foundations (KOREG). To this end, we reviewed domestic and international research on measuring the economic value of data and intangible assets, established evaluation methods, and conducted an empirical analysis. We measured data value using a cost-based approach, a revenue-based approach, and a market-based approach. To ensure the reliability of the values obtained from each approach, we conducted expert verification with the employees. We also derived the major considerations and issues in measuring the economic value of data. These findings can contribute to empirical methods for measuring the economic value of data in the future.

A Machine Learning Model for Predicting Silica Concentrations through Time Series Analysis of Mining Data

  • 이승훈;윤연아;정진형;심현수;장태우;김용수
    • 품질경영학회지 / Vol. 48, No. 3 / pp.511-520 / 2020
  • Purpose: The purpose of this study was to devise an accurate machine learning model for predicting silica concentrations following the addition of impurities, through time series analysis of mining data. Methods: The mining data were preprocessed and subjected to time series analysis using the machine learning model. Through correlation analysis, valid variables were selected and meaningless variables were excluded. To reflect changes over time, dependent variables at baseline were treated as independent variables at later time points. The relationship between the independent variables and the dependent variable n time points later was examined with Pearson correlation analysis. Results: The correlation (R²) was strongest after 3 hours, so the value after 3 hours was adopted as the dependent variable. According to the root mean square error (RMSE), the proposed method was superior to the other machine learning methods, and the XGBoost algorithm showed the best predictive performance. Conclusion: This study is important given the current lack of machine learning studies pertaining to the domestic mining industry. Applying time series analysis to mining data is expected to yield further improvement. Before a predictive model is established with the proposed method, predictions should be made using data with time series characteristics. Done this way, the approach should also improve prediction accuracy in other domains.
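
The lag-selection step described in this abstract (correlating the independent variables with the dependent variable n time points later) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `pearson` and `best_lag` and the toy series are assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_lag(feature, target, max_lag):
    """Return (lag, r) maximizing |r| between feature[t] and target[t + lag]."""
    scores = {}
    for lag in range(1, max_lag + 1):
        # Pair each feature value with the target observed `lag` steps later.
        scores[lag] = pearson(feature[:-lag], target[lag:])
    lag = max(scores, key=lambda k: abs(scores[k]))
    return lag, scores[lag]

# Toy series (hypothetical): the target repeats the feature 3 steps later.
feature = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 9.0, 6.0, 2.0, 4.0, 1.0]
target = [0.0, 0.0, 0.0] + feature[:-3]
lag, r = best_lag(feature, target, max_lag=5)
```

With real process data, the selected lag would then define the prediction horizon used when building the training set.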

Computational analysis of large-scale genome expression data

  • Zhang, Michael
    • 한국생물정보학회:학술대회논문집 / 한국생물정보시스템생물학회 2000 International Symposium on Bioinformatics / pp.41-44 / 2000
  • With the advent of DNA microarray and "chip" technologies, gene expression in an organism can be monitored on a genomic scale, allowing the transcription levels of many genes to be measured simultaneously. Functional interpretation of massive expression data and linking such data to DNA sequences have become new challenges for bioinformatics. I will use yeast cell cycle expression data analysis as an example to demonstrate how special databases and computational methods may be used to extract functional information. I will also briefly describe a novel clustering algorithm that has been applied to the cell cycle data.


A Study of Comparing Speech Act Data from Two Differing Data-gathering Instruments

  • Suh, Jae-Suk
    • 영어어문교육 / Vol. 13, No. 3 / pp.77-97 / 2007
  • To compare data on the speech act of requests from two different methods, a study was conducted in which both native and non-native speakers of English participated as subjects, and data were collected by means of actual e-mail writing and DCT (discourse completion test). The analysis of requests from the two different data-gathering methods showed that despite some similarities, considerable differences existed between e-mail and DCT requests in several important aspects of requests such as amount of talk, directness level, downgraders and supportive moves which play an important role in making a given request sound less imposing and more polite. Also it was shown that requests of non-native speakers differed considerably from requests of native speakers in terms of the four aspects of requests across type of data-gathering methods. Based on the findings, some suggestions were made for both further research and L2 classrooms.


Experimental Methods for the Measurement of Damping Loss Factors

  • 김관주;최승권
    • 소음진동 / Vol. 9, No. 6 / pp.1187-1192 / 1999
  • The purpose of this study is to determine the most appropriate experimental method for measuring damping loss factors (DLFs) for statistical energy analysis (SEA) calculations. Successful prediction of structural vibration levels depends critically on accurate estimation of DLFs, not only in conventional vibration analysis but especially in SEA. Unfortunately, calculating accurate DLFs is not easy, so experimental methods are used to obtain DLF values. Three experimental methods for estimating DLFs, namely the decay rate method, the half-power bandwidth method, and the power balance method, are presented, and tests are carried out on plate and cylindrical shell examples. The pros and cons of each method are reviewed. Finally, the resulting DLF values are used to estimate vibration levels with commercial SEA software, and the estimates are compared with measured vibration data.

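
Two of the three DLF estimation methods named in this abstract have simple textbook forms, sketched below: the half-power bandwidth relation η = (f2 − f1) / fn and the decay rate relation η = DR / (27.3 f), with DR in dB/s. This is an illustrative sketch rather than the authors' procedure; the function names and the assumption of a single isolated resonance peak inside the sampled range are mine, and the power balance method is not covered.

```python
import math

def loss_factor_half_power(freqs, amps):
    """Estimate eta = (f2 - f1) / fn from one isolated resonance peak.

    f1 and f2 are the frequencies on either side of the peak where the
    amplitude falls to peak / sqrt(2), located by linear interpolation.
    Assumes both crossings lie inside the sampled frequency range.
    """
    k = max(range(len(amps)), key=lambda i: amps[i])
    fn, level = freqs[k], amps[k] / math.sqrt(2)

    def crossing(i, step):
        # Walk away from the peak until the next sample is at or below the level.
        while amps[i + step] > level:
            i += step
        j = i + step
        t = (amps[i] - level) / (amps[i] - amps[j])
        return freqs[i] + t * (freqs[j] - freqs[i])

    return (crossing(k, +1) - crossing(k, -1)) / fn

def loss_factor_decay_rate(decay_rate_db_per_s, freq_hz):
    """eta = DR / (27.3 f), with DR in dB/s and f in Hz."""
    return decay_rate_db_per_s / (27.3 * freq_hz)
```

The half-power estimate is most reliable for light damping and well-separated modes; closely spaced modes would need a different peak-isolation step.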

Urban Environment change detection through landscape indices derived from Landsat TM data

  • Iisaka, Joji
    • 대한원격탐사학회:학술대회논문집 / 대한원격탐사학회 2002 Proceedings of International Symposium on Remote Sensing / pp.696-701 / 2002
  • This paper describes results of change detection in the Tokyo metropolitan area, Japan, using Landsat TM data, together with methods to quantify the ground cover classes. The changes are analyzed using not only conventional spectral classes but also a set of landscape indices that describe the spatial properties of ground cover types, such as the fractal dimension of objects and the entropy within windows defining the neighborhood of each location of interest. To eliminate seasonal radiometric effects on the TM data, an automated class labeling method is also attempted. Urban areas are delineated automatically by defining their boundaries. These procedures for urban change detection were implemented with the unified image computing methods proposed by the author; they can be automated in coherent and systematic ways, and automation of the whole procedure is anticipated. The results suggest that the Tokyo metropolitan area extended into the suburban areas along the new transportation networks and that the high-density areas of Tokyo also expanded considerably between 1985 and 1995.

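
One of the landscape indices mentioned in this abstract, the entropy of ground cover classes within a window around a location, could be computed along the following lines. This is a hedged sketch: the Shannon entropy definition, the square window, the edge clipping, and the name `window_entropy` are assumptions for illustration, and the fractal dimension index is not covered.

```python
import math
from collections import Counter

def window_entropy(class_map, row, col, half=1):
    """Shannon entropy (bits) of class labels in a square window.

    The window spans (2 * half + 1) cells per side, centered on (row, col),
    and is clipped at the edges of the map.
    """
    rows, cols = len(class_map), len(class_map[0])
    labels = [
        class_map[r][c]
        for r in range(max(0, row - half), min(rows, row + half + 1))
        for c in range(max(0, col - half), min(cols, col + half + 1))
    ]
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

# Hypothetical 4x4 classified maps: one homogeneous, one checkerboard.
uniform = [[0] * 4 for _ in range(4)]
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
```

A homogeneous neighborhood gives zero entropy, while a finely mixed one gives high entropy, so comparing the index between two dates highlights where ground cover became more fragmented.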

A Comparative Study on the Performance of Bayesian Partially Linear Models

  • Woo, Yoonsung;Choi, Taeryon;Kim, Wooseok
    • Communications for Statistical Applications and Methods / Vol. 19, No. 6 / pp.885-898 / 2012
  • In this paper, we consider Bayesian approaches to partially linear models, in which the regression function is represented in a semiparametric additive form as the sum of a parametric linear regression function and a nonparametric regression function. We compare the performance of widely used Bayesian partially linear models through empirical analysis. Specifically, we consider three Bayesian methods for estimating the nonparametric regression function: the first uses a Fourier series representation, the second is based on Gaussian process regression, and the third is based on the smoothness of the function and differencing. We compare the numerical performance of the three methods by the root mean squared error (RMSE). For the empirical analysis, we fit each of the three Bayesian methods to synthetic data from simulation studies and to real data, and compare the RMSEs.
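
To make the differencing idea and the RMSE criterion concrete, here is a small sketch for the partially linear model y = βx + f(z) + ε with a single linear covariate. It uses a plain least-squares differencing estimator, which is a non-Bayesian simplification swapped in for illustration and not the method evaluated in the paper; the function names and the synthetic data are assumptions.

```python
import math
import random

def difference_estimate(x, z, y):
    """Estimate beta in y = beta * x + f(z) + noise by first-order differencing.

    Sorting by z and differencing adjacent rows nearly cancels a smooth f,
    leaving a least-squares problem (no intercept) in the differences.
    """
    order = sorted(range(len(z)), key=lambda i: z[i])
    dx = [x[b] - x[a] for a, b in zip(order, order[1:])]
    dy = [y[b] - y[a] for a, b in zip(order, order[1:])]
    return sum(p * q for p, q in zip(dx, dy)) / sum(p * p for p in dx)

def rmse(estimates, truths):
    """Root mean squared error between two equal-length sequences."""
    n = len(estimates)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n)

# Synthetic data (hypothetical): beta = 2 and f(z) = sin(2 * pi * z).
rng = random.Random(0)
z = [rng.random() for _ in range(500)]
x = [rng.gauss(0, 1) for _ in range(500)]
y = [2.0 * xi + math.sin(2 * math.pi * zi) + rng.gauss(0, 0.1)
     for xi, zi in zip(x, z)]
beta_hat = difference_estimate(x, z, y)
```

Once β is estimated, the residuals y − β̂x can be smoothed against z to recover f, and `rmse` can score the recovered curve against the known f in a simulation.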

A review of analysis methods for secondary outcomes in case-control studies

  • Schifano, Elizabeth D.
    • Communications for Statistical Applications and Methods / Vol. 26, No. 2 / pp.103-129 / 2019
  • The main goal of a case-control study is to learn the association between various risk factors and a primary outcome (e.g., disease status). Recently, it has also become quite common to perform secondary analyses of case-control data in order to understand certain associations between the risk factors of the primary outcome. It has been repeatedly documented that, with case-control data, association studies of the risk factors that ignore the case-control sampling scheme can produce highly biased estimates of the population effects. In this article, we review the issues with naive secondary analyses that do not account for the biased sampling scheme, as well as the various methods that have been proposed to account for case-control ascertainment. We additionally compare the results of many of the discussed methods in an example examining the association of a particular genetic variant with smoking behavior, where the data were obtained from a lung cancer case-control study.

Intelligent Data Governance for the Federated Integration of Air Quality Databases in the Railway Industry

  • 김민정;원종운;박상찬;박가영
    • 품질경영학회지 / Vol. 50, No. 4 / pp.811-830 / 2022
  • Purpose: In this paper, we discuss 1) how to prioritize the databases to be integrated; 2) which data elements should be emphasized in federated database integration; and 3) the degree of efficiency of the integration. The paper aims to lay the groundwork for building data governance by presenting guidelines for database integration, using metrics to identify and evaluate the capabilities of the UK's air quality databases. Methods: We perform a relative efficiency analysis using Data Envelopment Analysis, one of the multi-criteria decision-making methods. In federated database integration, it is important to identify databases with high integration efficiency when prioritizing the databases to be integrated. Results: The aim of this paper is not to present performance indicators for implementing and evaluating data governance, but rather to discuss which criteria should be used when performing federated integration. Using Data Envelopment Analysis in the process of implementing intelligent data governance, the authors establish and present practical strategies for discovering databases with high integration efficiency. Conclusion: Through this study, it was possible to establish internal guidelines from an integrated data governance perspective. The flexibility of federated database integration under data governance practice makes it possible to integrate databases quickly, easily, and effectively. The authors anticipate that, by using the guidelines presented in this study, the process of integrating multiple databases, including the air quality databases, will evolve into intelligent data governance based on federated database integration when data governance practice is established in the railway industry.
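
The relative efficiency ranking described in this abstract can be illustrated with the simplest special case of data envelopment analysis (DEA): one input, one output, and constant returns to scale, where each unit's score reduces to its output/input ratio divided by the best ratio. This is only a sketch under those assumptions; the paper's actual inputs, outputs, and DEA model are not given in the abstract, the multi-input case requires solving a linear program per unit, and the database names and figures below are hypothetical.

```python
def ccr_efficiency(inputs, outputs):
    """Relative efficiency of each unit for ONE input and ONE output.

    Under constant returns to scale, this special case of DEA reduces to
    each unit's output/input ratio divided by the best ratio among all units.
    """
    ratios = {name: outputs[name] / inputs[name] for name in inputs}
    best = max(ratios.values())
    return {name: ratio / best for name, ratio in ratios.items()}

# Hypothetical databases: integration cost (input) and usable records (output).
cost = {"db_a": 10.0, "db_b": 20.0, "db_c": 5.0}
records = {"db_a": 50.0, "db_b": 60.0, "db_c": 30.0}
scores = ccr_efficiency(cost, records)

# Databases with the highest score would be prioritized for integration.
ranking = sorted(scores, key=scores.get, reverse=True)
```

A score of 1.0 marks a unit on the efficient frontier; lower scores show how far a database falls short of the best observed ratio.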

Methodological study on the High Dynamic Range Imaging Processing

  • 임홍수;김곤
    • KIEAE Journal / Vol. 10, No. 4 / pp.3-8 / 2010
  • Recently, various daylight evaluation methods for the visual environment have been developed, including simulation analysis, numerical calculation, and data monitoring. However, simulation analysis cannot reproduce real scenes or visualize real images exactly, and numerical calculation is considered an outdated and time-consuming means. Therefore, many studies rely on monitoring data to obtain accurate results. In particular, most studies of discomfort glare measure luminance with traditional instruments such as Minolta luminance meters. However, this method makes it difficult to measure several points at the same time because of limitations of space and time when mapping. This study therefore focuses on the potential usefulness of the High Dynamic Range (HDR) photography technique as a luminance mapping tool. To evaluate the accuracy of programs such as webHDR, Photomatix, and PHOTOLUX, an experiment was conducted using Canon EOS 5D and Nikon Coolpix 8400 digital cameras.