• Title/Summary/Keyword: random data analysis

Search Results: 1,741

Empirical Analysis on the Factors Affecting the Net Income of Regional and Industrial Fisheries Cooperatives Using Panel Data (패널자료를 이용한 지구별·업종별 수산업협동조합의 수익에 영향을 미치는 요인 분석)

  • Kim, Cheol-Hyun;Nam, Jong-Oh
    • The Journal of Fisheries Business Administration
    • /
    • v.51 no.1
    • /
    • pp.81-96
    • /
    • 2020
  • The purpose of this paper is to analyze the factors affecting the net income of regional and industrial fisheries cooperatives in South Korea using panel data. The paper applies linear and GLS regression models, namely a pooled OLS model, a fixed effects model, and a random effects model, to estimate these factors. After a series of specification tests, the random effects model is selected. The results, based on panel data covering 64 fisheries cooperatives from 2013 to 2018, indicate that capital and the area dummy variables have positive effects on net income and that employment has a negative effect, as predicted. Debt, however, runs counter to our prediction: it turns out to have a positive effect on net income even though it has been increasing. In addition, the number of members shows no significant effect on the net income of regional and industrial fisheries cooperatives in South Korea. This study is significant in that it analyzes the major factors influencing changes in net income, which have not been examined recently for fisheries cooperatives by region and industry.
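A minimal sketch of the kind of panel regression comparison described in this abstract, using the `linearmodels` package; the file name and column names (`coop_id`, `year`, `net_income`, `capital`, `debt`, `employment`, `members`, `area_dummy`) are hypothetical, not the paper's actual variables.

```python
import pandas as pd
from linearmodels.panel import PooledOLS, PanelOLS, RandomEffects

# Hypothetical panel: 64 cooperatives observed annually from 2013 to 2018.
df = pd.read_csv("coops.csv")                       # assumed file and column names
df = df.set_index(["coop_id", "year"])              # entity-time MultiIndex

y = df["net_income"]
X = df[["capital", "debt", "employment", "members", "area_dummy"]].assign(const=1.0)

pooled = PooledOLS(y, X).fit()
# The time-invariant area dummy is absorbed by entity effects, so it is
# dropped from the fixed effects specification.
fixed = PanelOLS(y, X.drop(columns=["area_dummy"]), entity_effects=True).fit()
random = RandomEffects(y, X).fit()                  # the specification the paper selects

print(random.summary)
```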

MAHA-FS : A Distributed File System for High Performance Metadata Processing and Random IO (MAHA-FS : 고성능 메타데이터 처리 및 랜덤 입출력을 위한 분산 파일 시스템)

  • Kim, Young Chang;Kim, Dong Oh;Kim, Hong Yeon;Kim, Young Kyun;Choi, Wan
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.2 no.2
    • /
    • pp.91-96
    • /
    • 2013
  • The application domain of supercomputing systems is shifting toward workloads that require both large-volume data processing and high-performance computing at the same time, such as bio-applications. These applications require a high-performance distributed file system for storage management and for efficient, high-speed processing of the large amounts of data they generate. In this paper, we introduce MAHA-FS, a distributed file system for supercomputing systems that process large amounts of data and perform high-performance computing, providing excellent metadata operation performance and IO performance. Performance analysis shows that MAHA-FS delivers excellent performance in both metadata processing and random IO processing.
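The abstract does not describe MAHA-FS internals, but the random IO figure it reports is the kind of number produced by a micro-benchmark like the following sketch: read fixed-size blocks at random offsets of a file on the mounted file system and report IOPS. The path, block size, and operation count are arbitrary assumptions, and page-cache effects are ignored.

```python
import os, random, time

PATH = "/mnt/mahafs/testfile"        # hypothetical mount point and test file
BLOCK = 4096                         # 4 KiB random reads
N_OPS = 10_000

size = os.path.getsize(PATH)
fd = os.open(PATH, os.O_RDONLY)      # note: reads may be served from the page cache
start = time.perf_counter()
for _ in range(N_OPS):
    offset = random.randrange(0, size - BLOCK)
    os.pread(fd, BLOCK, offset)      # read one block at a random offset
elapsed = time.perf_counter() - start
os.close(fd)

print(f"random-read IOPS: {N_OPS / elapsed:,.0f}")
```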

Forest Vertical Structure Mapping from Bi-Seasonal Sentinel-2 Images and UAV-Derived DSM Using Random Forest, Support Vector Machine, and XGBoost

  • Young-Woong Yoon;Hyung-Sup Jung
    • Korean Journal of Remote Sensing
    • /
    • v.40 no.2
    • /
    • pp.123-139
    • /
    • 2024
  • Forest vertical structure is vital for comprehending ecosystems and biodiversity, in addition to providing fundamental forest information. Currently, forest vertical structure is predominantly assessed via in-situ methods, which are not only difficult to apply to inaccessible locations or large areas but also costly and demanding of human resources. Therefore, mapping systems based on remote sensing data have been actively explored. Recently, research on analyzing and classifying images using machine learning techniques has been actively conducted and applied to map the vertical structure of forests accurately. In this study, Sentinel-2 and digital surface model images were obtained on two dates separated by approximately one month, and spectral index and tree height maps were generated separately. According to the acquisition time, the input data were divided into cases 1 and 2, which were then combined to generate case 3. Using these data, forest vertical structure mapping models based on random forest, support vector machine, and extreme gradient boosting (XGBoost) were generated. In total, nine models were generated, with the XGBoost model in case 3 performing best, with an average precision of 0.99 and an F1 score of 0.91. We confirmed that generating a forest vertical structure mapping model using bi-seasonal data and an appropriate model can yield an accuracy of 90% or higher.
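A minimal sketch of how the three classifiers named in the abstract can be trained and scored on a feature table; the feature matrix `X` (spectral indices and tree heights) and labels `y` (vertical-structure classes) are assumed to be prepared separately, the file names are placeholders, and `xgboost` is a third-party package.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

# X: per-pixel features (spectral indices, tree height); y: vertical-structure class labels.
X, y = np.load("features.npy"), np.load("labels.npy")   # hypothetical prepared inputs
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "XGBoost": XGBClassifier(n_estimators=300, learning_rate=0.1),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "macro F1 =", round(f1_score(y_te, model.predict(X_te), average="macro"), 3))
```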

Probabilistic Approach of Stability Analysis for Rock Wedge Failure (확률론적 해석방법을 이용한 쐐기파괴의 안정성 해석)

  • Park, Hyuck-Jin
    • Economic and Environmental Geology
    • /
    • v.33 no.4
    • /
    • pp.295-307
    • /
    • 2000
  • Probabilistic analysis is a powerful method for quantifying the variability and uncertainty common in engineering geology. In rock slope engineering, this uncertainty and variation may take the form of scatter in the orientations and geometries of discontinuities, as well as in test results. In deterministic analysis, the factor of safety used to ensure the stability of rock slopes is based on fixed representative values for each parameter, without considering the scatter in the data. In probabilistic analysis, by contrast, these discontinuity parameters are treated as random variables, and reliability and probability theory are used to evaluate the possibility of slope failure. The factor of safety is therefore considered a random variable and is replaced by the probability of failure as the measure of slope stability. In this study, the stochastic properties of the discontinuity parameters are evaluated and the stability of a rock slope is analyzed based on these random properties. The results of the deterministic and probabilistic analyses are then compared and the differences between the two methods are explained.
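A minimal Monte Carlo sketch of the idea of replacing a single factor of safety with a probability of failure. It uses a deliberately simplified dry planar-sliding criterion (FS = tan φ / tan ψ) with assumed normal distributions for the friction angle φ and the sliding-plane dip ψ, not the paper's full wedge formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Discontinuity parameters treated as random variables (assumed normal distributions).
phi = np.radians(rng.normal(35.0, 4.0, n))   # friction angle (degrees -> radians)
psi = np.radians(rng.normal(30.0, 5.0, n))   # dip of the sliding plane

fs = np.tan(phi) / np.tan(psi)               # simplified dry planar-sliding factor of safety
pf = np.mean(fs < 1.0)                       # probability of failure

print(f"mean FS = {fs.mean():.2f}, probability of failure = {pf:.3f}")
```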


Statistical Analysis of Degradation Data under a Random Coefficient Rate Model (확률계수 열화율 모형하에서 열화자료의 통계적 분석)

  • Seo, Sun-Keun;Lee, Su-Jin;Cho, You-Hee
    • Journal of Korean Society for Quality Management
    • /
    • v.34 no.3
    • /
    • pp.19-30
    • /
    • 2006
  • For highly reliable products, it is difficult to assess product lifetime with traditional life tests. A more recent approach is therefore to observe the performance degradation of the product during the test rather than the actual failure times. This study compares the performance of three methods (the approximation, analytical, and numerical methods) for estimating the parameters and quantiles of the lifetime when the time-to-failure distribution follows a Weibull or lognormal distribution under a random coefficient degradation rate model. Numerical experiments are also conducted to investigate the effects of model error, such as measurement error, in the random coefficient model.
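A minimal sketch of the approximation method for a random coefficient degradation rate model: simulate linear degradation paths with a unit-specific random slope, extrapolate each path to a failure threshold to obtain a pseudo failure time, and fit Weibull and lognormal lifetime distributions. The threshold, slope distribution, and noise level are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_units, t = 30, np.arange(1.0, 11.0)        # 30 units, 10 inspection times
threshold = 10.0                             # degradation level that defines failure

# Random coefficient rate model: y_ij = b_i * t_j + noise, with a random slope b_i per unit.
slopes = rng.lognormal(mean=0.0, sigma=0.3, size=n_units)
paths = slopes[:, None] * t + rng.normal(0.0, 0.2, size=(n_units, t.size))

# Approximation method: per-unit least-squares slope, then a pseudo failure time.
b_hat = (paths @ t) / (t @ t)
pseudo_fail = threshold / b_hat

wb_c, _, wb_scale = stats.weibull_min.fit(pseudo_fail, floc=0)
ln_s, _, ln_scale = stats.lognorm.fit(pseudo_fail, floc=0)
print("Weibull   B10 life:", round(stats.weibull_min.ppf(0.10, wb_c, scale=wb_scale), 2))
print("Lognormal B10 life:", round(stats.lognorm.ppf(0.10, ln_s, scale=ln_scale), 2))
```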

Analysis of a Random Shock Model for a System and Its Optimization

  • Park, Jeong-Hun;Choi, Seung-Kyoung;Lee, Eui-Yong
    • Proceedings of the Korean Data and Information Science Society Conference
    • /
    • 2004.10a
    • /
    • pp.33-42
    • /
    • 2004
  • In this paper, a random shock model for a system is considered. Each shock, arriving according to a Poisson process, decreases the state of the system by a random amount. A repairman, arriving according to another Poisson process of rate $\lambda$, repairs the system only if its state is below a threshold $\alpha$. After assigning various costs to the system, we calculate the long-run average cost and show that there exist a unique value of the arrival rate $\lambda$ and a unique value of the threshold $\alpha$ that minimize the long-run average cost per unit time.
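A minimal simulation sketch of the shock model described in the abstract: shocks arrive as a Poisson process and reduce the state by a random amount, a repairman arrives at rate λ and restores the system only when the state is below α, and the long-run average cost is estimated over a small grid of (λ, α). The cost coefficients and shock-size distribution are assumptions, not the paper's, and the paper derives the optimum analytically rather than by simulation.

```python
import numpy as np

rng = np.random.default_rng(2)

def average_cost(lam, alpha, T=50_000.0, shock_rate=1.0, full=1.0,
                 visit_cost=2.0, repair_cost=5.0, degrade_cost_rate=10.0):
    """Estimate the long-run average cost per unit time by simulation."""
    t, state, cost = 0.0, full, 0.0
    while t < T:
        dt_shock = rng.exponential(1.0 / shock_rate)   # time to next shock
        dt_visit = rng.exponential(1.0 / lam)          # time to next repairman arrival
        dt = min(dt_shock, dt_visit)
        cost += degrade_cost_rate * (full - state) * dt   # running cost of a degraded state
        t += dt
        if dt_shock < dt_visit:
            state = max(state - rng.exponential(0.3), 0.0)   # shock reduces the state
        else:
            cost += visit_cost                               # repairman visit
            if state < alpha:                                # repair only below the threshold
                cost += repair_cost * (full - state)
                state = full
    return cost / t

# Crude grid search for the (lambda, alpha) pair minimizing the average cost.
grid = [(lam, a) for lam in (0.2, 0.5, 1.0, 2.0) for a in (0.3, 0.5, 0.7)]
best = min(grid, key=lambda p: average_cost(*p))
print("approximately optimal (lambda, alpha):", best)
```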


A Spatial Analysis of Seismic Vulnerability of Buildings Using Statistical and Machine Learning Techniques Comparative Analysis (통계분석 기법과 머신러닝 기법의 비교분석을 통한 건물의 지진취약도 공간분석)

  • Seong H. Kim;Sang-Bin Kim;Dae-Hyeon Kim
    • Journal of Industrial Convergence
    • /
    • v.21 no.1
    • /
    • pp.159-165
    • /
    • 2023
  • While the frequency of earthquakes has been increasing recently, the domestic seismic response system remains weak; the objective of this research is therefore to compare and analyze the seismic vulnerability of buildings using statistical analysis and machine learning techniques. With the statistical technique, the prediction accuracy of the model developed through the optimal scaling method was about 87%. Among the four machine learning methods analyzed, the Random Forest method achieved the highest accuracy, 94% on the training set and 76.7% on the test set, and was therefore chosen as the final machine learning technique. Accordingly, the statistical analysis technique showed a higher accuracy of about 87%, whereas the machine learning technique showed an accuracy of about 76.7%. As the final result, among the 22,296 buildings analyzed, 1,627 (0.1%) were assessed as more vulnerable by the statistical analysis technique, 10,146 (49%) received the same rating from both techniques, and the remaining 10,523 (50%) were assessed as more vulnerable by the machine learning technique. By comparing advanced machine learning techniques with the existing statistical analysis techniques, it is hoped that these results will help prepare more reliable seismic countermeasures in spatial analysis decisions.
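A minimal sketch of the train/test accuracy comparison pattern the abstract describes, using a logistic regression as a stand-in for the statistical model (the paper's optimal scaling regression is not reproduced here) and a Random Forest classifier; the building-feature table and column names are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("buildings.csv")                        # hypothetical building-attribute table
X, y = df.drop(columns=["vulnerable"]), df["vulnerable"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in {
    "logistic regression (statistical stand-in)": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
}.items():
    model.fit(X_tr, y_tr)
    print(name,
          "| train acc =", round(accuracy_score(y_tr, model.predict(X_tr)), 3),
          "| test acc =", round(accuracy_score(y_te, model.predict(X_te)), 3))
```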

The Determinants of FDI Inflow after Reform-Opening of China (중국에서 개혁·개방이후 FDI유입에 영향을 미치는 요인들)

  • Choi, Won-Ick;Han, Jong-Soo
    • Korea Trade Review
    • /
    • v.41 no.3
    • /
    • pp.177-198
    • /
    • 2016
  • China has maintained an average economic growth rate of about 9% for more than ten years since Deng Xiaoping introduced a capitalistic market economy system in 1979. China has long attracted foreign direct investment because of its very high economic growth rate, low labor costs, and various policies favoring foreign investors. This paper analyzes the determinants of foreign direct investment inflow after the reform and opening of China using empirical methods that exploit the specific characteristics of each province and city, based on panel data from 1985 to 2013. For the empirical analysis we use a random effects model, a fixed effects model, pooled OLS, and a random coefficient model; the pooled OLS and random coefficient results are presented for comparison with the main results. A Hausman test shows that the fixed effects model performs better than the random effects model. The results show that GRDP, capital stock, and telecommunication have a positive relationship with foreign direct investment, while the expressway variable has a negative one. Surprisingly, China's education level does not attract foreign direct investment, even though it is not at a critical level. Therefore, the Chinese government should try to increase the national income level, as it signals market size; encourage domestic investment; and build high-quality telecommunication infrastructure.
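A minimal sketch of the Hausman comparison between fixed and random effects mentioned in the abstract, using `linearmodels`; the data file and column names are hypothetical, and the test statistic is computed directly from the two sets of coefficient estimates rather than with a packaged test function.

```python
import numpy as np
import pandas as pd
from scipy import stats
from linearmodels.panel import PanelOLS, RandomEffects

df = pd.read_csv("china_fdi.csv").set_index(["province", "year"])   # hypothetical panel
y = df["fdi"]
X = df[["grdp", "capital_stock", "telecom", "expressway", "education"]].assign(const=1.0)

fe = PanelOLS(y, X, entity_effects=True).fit()
re = RandomEffects(y, X).fit()

# Hausman statistic: (b_FE - b_RE)' [Var(b_FE) - Var(b_RE)]^{-1} (b_FE - b_RE)
common = [c for c in fe.params.index if c != "const"]
diff = (fe.params[common] - re.params[common]).values
var = (fe.cov.loc[common, common] - re.cov.loc[common, common]).values
h = float(diff @ np.linalg.inv(var) @ diff)
p = 1.0 - stats.chi2.cdf(h, df=len(common))
print(f"Hausman H = {h:.2f}, p = {p:.4f}")   # a small p-value favors the fixed effects model
```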


Predicting Gross Box Office Revenue for Domestic Films

  • Song, Jongwoo;Han, Suji
    • Communications for Statistical Applications and Methods
    • /
    • v.20 no.4
    • /
    • pp.301-309
    • /
    • 2013
  • This paper predicts gross box office revenue for domestic films using Korean film data from 2008 to 2011. We use three regression methods, Linear Regression, Random Forest, and Gradient Boosting, to predict the gross box office revenue. We consider only domestic films with revenue of at least KRW 500 million; relevant explanatory variables are chosen by data visualization and variable selection techniques. The key idea of the analysis is to construct meaningful explanatory variables from data sources available to the public. Some variables must be categorized for more effective analysis, and clustering methods are applied to achieve this. We choose the best model based on performance on the test set, and the important explanatory variables are discussed.
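A minimal sketch of the three-model comparison the abstract describes (linear regression, random forest, gradient boosting) evaluated on a held-out test set; the film feature table, the log-revenue target, and the RMSE criterion are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

df = pd.read_csv("films.csv")                                # hypothetical engineered feature table
X, y = df.drop(columns=["revenue"]), np.log(df["revenue"])   # modeling log revenue is an assumption

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
for name, model in {
    "linear regression": LinearRegression(),
    "random forest": RandomForestRegressor(n_estimators=500, random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}.items():
    model.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name}: test RMSE (log revenue) = {rmse:.3f}")
```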

Applicability Evaluation of a Mixed Model for the Analysis of Repeated Inventory Data : A Case Study on Quercus variabilis Stands in Gangwon Region (반복측정자료 분석을 위한 혼합모형의 적용성 검토: 강원지역 굴참나무 임분을 대상으로)

  • Pyo, Jungkee;Lee, Sangtae;Seo, Kyungwon;Lee, Kyungjae
    • Journal of Korean Society of Forest Science
    • /
    • v.104 no.1
    • /
    • pp.111-116
    • /
    • 2015
  • The purpose of this study was to evaluate a mixed model of the dbh-height relationship that contains a random effect. Data were obtained from a survey site for Quercus variabilis in the Gangwon region, and the same site was remeasured after three years. The mixed model estimated the fixed effect of the dbh-height relationship for Quercus variabilis, together with a random effect representing the correlation between survey periods. To verify the model with the random effect, the variance-covariance and residual of the repeated-measures data were estimated, which were -0.0291 and 0.1007, respectively, and the Akaike information criterion (AIC) was used to compare models. The model with the random effect (AIC = -215.5) had a lower AIC than the model with only the fixed effect (AIC = -154.4). Because the random effect associated with the categorical survey period is used in the fitting process, the model can be calibrated to repeatedly measured sites as new measurements are obtained. Therefore, the results of this study could provide a useful method for developing models from repeated measurements.
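A minimal sketch of the AIC comparison between an ordinary fixed-effects fit and a mixed model with a random effect for the repeated survey, using `statsmodels`; the data file, column names, and the log-linear dbh-height form are assumptions, since the abstract does not give the paper's actual functional form.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("quercus.csv")                 # columns assumed: dbh, height, period
df["log_dbh"] = np.log(df["dbh"])

# Fixed-effect-only model (ordinary least squares).
ols_fit = smf.ols("height ~ log_dbh", data=df).fit()

# Mixed model: the same fixed effect plus a random intercept for the survey period.
mix_fit = smf.mixedlm("height ~ log_dbh", data=df, groups=df["period"]).fit(reml=False)

# AIC for the mixed model from its log-likelihood; the parameter count covers the
# fixed effects, the random-intercept variance, and the residual variance.
k = mix_fit.fe_params.size + 2
mix_aic = 2 * k - 2 * mix_fit.llf
print("OLS AIC  :", round(ols_fit.aic, 1))
print("Mixed AIC:", round(mix_aic, 1))          # a lower AIC favors the random-effect model
```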