• Title/Summary/Keyword: forest statistics

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods / v.30 no.3 / pp.331-341 / 2023
  • Handling missing values is essential in constructing a good prediction model. The easiest way to handle missing values is to use only the complete cases, but this can lead to information loss and invalid conclusions in data analysis. Imputation is a technique that replaces missing data with alternative values obtained from information in the dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputation. Recent methods include missForest, missRanger, and mixgb, all of which use machine learning algorithms. This paper compares imputation techniques for datasets with mixed data types in various situations, varying the data size, missing ratio, and missing mechanism. To evaluate the performance of each method on mixed datasets, we propose a new imputation performance measure (IPM), a unified measure applicable to both numerical and categorical variables. We believe this metric can help identify the best imputation method. Finally, we summarize the comparison results in terms of imputation performance and computational time.
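
K-nearest-neighbor imputation, one of the conventional methods named above, fills a missing entry with an average over the most similar rows that do observe that variable. The following is a minimal Python sketch for numeric variables only, assuming Euclidean distance computed over the columns both rows observe. The function name `knn_impute` and the toy table are illustrative. This is not the implementation compared in the paper, and it covers neither categorical variables nor the proposed IPM.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries in a numeric table with the mean of that column
    over the k nearest rows that observe it (illustrative sketch).

    Distance uses only the other columns observed in both rows
    (root-mean-square difference). Categorical variables are not handled.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a donor must observe the target column
                shared = [c for c in range(len(row))
                          if c != j and row[c] is not None and other[c] is not None]
                if not shared:
                    continue  # no common information to measure similarity
                sq = sum((row[c] - other[c]) ** 2 for c in shared)
                candidates.append((math.sqrt(sq / len(shared)), other[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = candidates[:k]
            if nearest:
                filled[i][j] = sum(v for _, v in nearest) / len(nearest)
    return filled
```

For example, `knn_impute([[1.0, 2.0], [1.1, None], [1.2, 4.0], [10.0, 50.0]], k=2)` fills the gap in the second row with 3.0, the mean of its two nearest neighbors' values (2.0 and 4.0). The distant row with 50.0 is ignored.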

Development of Stem Profile and Taper Equation for Carpinus laxiflora in Jeju Experimental Forests of Korea Forest Research Institute (국립산림과학원 제주시험림의 서어나무 수간형태와 수간곡선식 추정)

  • Chung, Young-Gyo;Kim, Dae-Hyun;Kim, Cheol-Min
    • Journal of agriculture & life science / v.44 no.4 / pp.1-7 / 2010
  • Data were collected to develop an equation for predicting the stem taper of Carpinus laxiflora in the Jeju Experimental Forests. The models tested to choose the best-fit equation were Max & Burkhart's model, Kozak's model, and Lee's model. The performance of the equations in predicting stem diameter at a specific point along the stem was evaluated with fit and validation statistics and with the distribution of residuals against predicted values. As a result, all three models gave fairly good fit statistics. In plots of residuals against predicted diameter, both Max & Burkhart's model and Lee's model underestimated small diameters. Based on this analysis of the three models, Kozak's model was chosen as the best-fit stem taper equation, and its parameters are given for C. laxiflora. Kozak's model was then used to develop an outside-bark stem volume table for C. laxiflora.
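
A stem volume table follows from a taper equation by integrating the cross-sectional area along the stem. The sketch below assumes diameters in meters and uses composite Simpson's rule. The `taper` callable is a placeholder into which a fitted model such as Kozak's would be plugged; the fitted C. laxiflora parameters are not reproduced here. The cone used in the usage note is hypothetical and chosen only because its volume, πD²H/12, is known exactly.

```python
import math

def stem_volume(taper, total_height, n=100):
    """Stem volume (m^3) by integrating cross-sectional area with Simpson's rule.

    taper(h) returns the outside-bark diameter in meters at height h (m)
    above the ground; n (number of sub-intervals) must be even.
    """
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    step = total_height / n

    def area(h):
        # Cross-sectional area of a circular section with diameter taper(h).
        return math.pi * taper(h) ** 2 / 4.0

    total = area(0.0) + area(total_height)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * area(i * step)
    return total * step / 3.0
```

With the hypothetical cone `taper = lambda h: 0.3 * (1 - h / 15.0)` (0.3 m base diameter, 15 m tall), `stem_volume(taper, 15.0)` agrees with π·0.3²·15/12 ≈ 0.353 m³, because the area of a cone section is quadratic in height and Simpson's rule integrates quadratics exactly.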

Using Reliability Tools to Characterize Wood Strand Thickness of Oriented Strand Board Panels

  • Chastain, J.S.;Young, T.M.;Guess, F.M.;Leon, R.V.
    • International Journal of Reliability and Applications / v.10 no.2 / pp.89-99 / 2009
  • Oriented Strand Board (OSB) is an important engineered wood product used in housing construction that has a lower environmental impact, or "carbon footprint." In this paper, reliability and statistical tools are applied to gain insight into the strand thickness of OSB panels. An OSB panel consists of several hundred wood strands that are resinated and pressed. The variability of OSB strand thickness for six manufacturers in the Eastern United States is examined both as a whole and individually. Little research exists on OSB strand thickness across mills, even though laboratory experiments have documented that strand thickness variability greatly influences the dimensional stability of OSB panels. Our aims are to quantify and characterize strand thickness and to apply reliability techniques, such as Kaplan-Meier curves, to characterize the probability distribution of strand thickness. We further explore the thickness of the strands graphically and statistically.

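The Kaplan-Meier estimator mentioned above estimates the probability that a measurement exceeds a value t as a product of conditional survival terms, S(t) = ∏(1 - dᵢ/nᵢ) over observed values tᵢ ≤ t, where dᵢ is the number of uncensored values equal to tᵢ and nᵢ is the number still "at risk" (not yet passed). A minimal stdlib Python sketch follows. The optional censoring flags and the sample thicknesses are assumptions for illustration. With no censoring, the curve reduces to the empirical proportion of strands thicker than t.

```python
from collections import Counter

def kaplan_meier(values, observed=None):
    """Return [(t, S(t))] at each distinct observed value, where S(t)
    estimates P(X > t).

    values: measurements (e.g. strand thicknesses); observed[i] is False
    if values[i] is right-censored. Defaults to all observed.
    """
    if observed is None:
        observed = [True] * len(values)
    events = Counter(v for v, o in zip(values, observed) if o)
    leaving = Counter(values)  # everything leaving the risk set at each value
    at_risk = len(values)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]  # censored values at t still count as at risk at t
    return curve
```

For hypothetical thicknesses `[0.5, 0.6, 0.6, 0.7]` (all observed), the estimate steps down to 0.75 at 0.5, to 0.25 at 0.6, and to 0 at 0.7.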

Exploring Reliability of Oriented Strand Board's Tensile and Stiffness Strengths

  • Wang, Y.;Young, T.M.;Guess, F.M.;Leon, R.V.
    • International Journal of Reliability and Applications / v.8 no.1 / pp.111-124 / 2007
  • In this paper, we apply statistical reliability tools to manage and seek improvements in the strengths of Oriented Strand Board (OSB). As part of the OSB manufacturing process, the product undergoes destructive testing at various intervals to determine compliance with customers' specifications. Workers perform these tests on sampled cross sections of the OSB panel, loading each sample until failure to measure the tensile strength, also called internal bond (IB), in pounds per square inch (psi). Additional stiffness tests measure parallel and perpendicular elasticity indices (EI) on cross-sectional samples of the OSB panel taken parallel and perpendicular to the orientation of the wood strands. We explore these "pressures-to-failure" of OSB both graphically and statistically. We also briefly comment on reducing sources of variability in the IB and EI of OSB.


Network Classification of P2P Traffic with Various Classification Methods (다양한 분류기법을 이용한 네트워크상의 P2P 데이터 분류실험)

  • Han, Seokwan;Hwang, Jinsoo
    • The Korean Journal of Applied Statistics / v.28 no.1 / pp.1-8 / 2015
  • Security has become an issue due to the rapid increase in network traffic data. P2P traffic in particular poses a great challenge to network system administrators. Preemptive measures, such as blocking suspicious traffic, are necessary for network quality of service (QoS) and efficient resource management. Deep packet inspection (DPI) is the most accurate way to detect an intrusion, but it can raise privacy concerns and is time-consuming. We compared several machine learning methods on how accurately and how quickly they classify network traffic data. The Random Forest method showed excellent performance in both accuracy and time.

Prediction of electricity consumption in A hotel using ensemble learning with temperature (앙상블 학습과 온도 변수를 이용한 A 호텔의 전력소모량 예측)

  • Kim, Jaehwi;Kim, Jaehee
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.319-330 / 2019
  • Forecasting electricity consumption by analyzing past consumption is advantageous for energy planning and policy. Machine learning is widely used to predict electricity consumption. Among machine learning methods, ensemble learning avoids overfitting and reduces variance to improve prediction accuracy. However, when applied to daily data, ensemble learning tends to predict central values and miss peaks, a consequence of how ensemble learning works. In this study, we overcome this shortcoming of ensemble learning by considering the temperature trend. We compare nine models and propose a random forest model that incorporates the linear trend of temperature.

Exploring modern machine learning methods to improve causal-effect estimation

  • Kim, Yeji;Choi, Taehwa;Choi, Sangbum
    • Communications for Statistical Applications and Methods / v.29 no.2 / pp.177-191 / 2022
  • This paper addresses the use of machine learning methods for causal estimation of treatment effects from observational data. Although randomized experimental trials are the gold standard for revealing causal relationships, observational studies are another rich source for investigating exposure effects, for example in research on the comparative effectiveness and safety of treatments, where the causal effect can be identified if the covariates contain all confounding variables. In this context, statistical regression models for the expected outcome and for the probability of treatment are often specified, and the two can be combined to yield more efficient and robust causal estimators. Recently, targeted maximum likelihood estimation and causal random forests have been proposed and extensively studied as ways to use data-adaptive regression in estimating causal parameters. Machine learning methods are a natural choice in these settings to improve the quality of the final treatment effect estimate. We explore how to adapt the design and training of several machine learning algorithms for causal inference and study their finite-sample performance through simulation experiments under various scenarios. An application to percutaneous coronary intervention (PCI) data shows that these adaptations can improve on simple linear regression-based methods.
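
One well-known way to combine an outcome regression with a propensity-score model is the augmented inverse-probability-weighted (AIPW), or doubly robust, estimator of the average treatment effect. It remains consistent if either model is correctly specified:

ATE ≈ (1/n) Σ [ μ₁(Xᵢ) − μ₀(Xᵢ) + Aᵢ(Yᵢ − μ₁(Xᵢ))/e(Xᵢ) − (1 − Aᵢ)(Yᵢ − μ₀(Xᵢ))/(1 − e(Xᵢ)) ]

This is shown here as a sketch of the general idea only. It is not the TMLE or causal random forest procedure studied in the paper. The fitted values `mu1`, `mu0`, and `e` are assumed to come from any regression or machine learning model, ideally fitted out-of-sample (cross-fitting).

```python
def aipw_ate(y, a, mu1, mu0, e):
    """Augmented inverse-probability-weighted (doubly robust) ATE estimate.

    y: outcomes; a: 0/1 treatment indicators; mu1/mu0: predicted outcomes
    under treatment/control from an outcome model; e: estimated propensity
    scores P(A=1 | X), which must lie strictly between 0 and 1.
    """
    n = len(y)
    total = 0.0
    for yi, ai, m1, m0, ei in zip(y, a, mu1, mu0, e):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie in (0, 1)")
        # Outcome-model contrast plus propensity-weighted residual corrections.
        total += (m1 - m0
                  + ai * (yi - m1) / ei
                  - (1 - ai) * (yi - m0) / (1 - ei))
    return total / n
```

For a toy example, `aipw_ate([3, 1], [1, 0], [3, 2], [1, 1], [0.5, 0.5])` returns 1.5. Here both residuals are zero, so the estimate equals the mean outcome-model contrast. When the outcome model misses, as in `aipw_ate([4, 0], [1, 0], [3, 3], [1, 1], [0.5, 0.5])`, the weighted residual terms correct it (result 4.0).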

Content and Characteristics of Forest Cover Changes in North Korea (북한(北韓) 지역(地域) 산림면적(山林面積) 변화(變化)의 규모(規模)와 특성(特性))

  • Lee, Kyu-Sung;Joung, Mi-Reyoung;Yoon, Jung-Sook
    • Journal of Korean Society of Forest Science / v.88 no.3 / pp.352-363 / 1999
  • Reliable information on the extent of forest land in North Korea has rarely been available. Several sources of forest statistics, ranging from the first map of forest distribution on the Korean Peninsula, produced in 1910, to official data reported by the North Korean government in 1997, were gathered and analyzed to define the characteristics of forest cover change over the years. In addition, Landsat satellite data acquired from 1973 to 1993 were processed for two study areas, in the provinces of Pyungyang and Heasan, which differ significantly in topography and land use pattern. Land cover maps were produced by computer classification of three sets of multitemporal Landsat imagery. Although the forest statistics reported before 1990 are somewhat inconsistent, they show a gradual decrease over the years. Estimates from the 1991 satellite data and the recent statistics reported in 1998 show a very steep decline in forest land compared with those before 1990. The abrupt decrease in forest land after 1990 was also found in the detailed analysis of Landsat data for the Pyungyang and Heasan study areas. The rapid decline of forest land may be related to the country's poor economic situation and to continuing natural disasters such as severe flooding and drought. Unstocked forest, which was not classified as forest land, was a very distinct and pervasive land cover type that can easily be observed on satellite imagery. Since unstocked forest land in North Korea may be a critical factor in degrading environmental quality and in the continuing natural disasters, further analysis is needed to define the exact extent and physical characteristics of this cover type.

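Forest change between two dates of classified imagery is often summarized by cross-tabulating the two land cover maps pixel by pixel, a technique known as post-classification comparison. The abstract does not say exactly how the multitemporal maps were compared, so the following is a sketch under that assumption. The class labels and the 0.09 ha pixel area (a 30 m × 30 m pixel) are illustrative, not values from the study.

```python
from collections import Counter

def change_matrix(map_t1, map_t2):
    """Cross-tabulate two co-registered classified maps (lists of rows of
    class labels) into {(class_at_t1, class_at_t2): pixel_count}."""
    if len(map_t1) != len(map_t2):
        raise ValueError("maps must have the same shape")
    counts = Counter()
    for row1, row2 in zip(map_t1, map_t2):
        if len(row1) != len(row2):
            raise ValueError("maps must have the same shape")
        counts.update(zip(row1, row2))
    return counts

def class_area_change(counts, label, pixel_area_ha):
    """Net change in area (ha) of one class between the two dates."""
    before = sum(n for (c1, _), n in counts.items() if c1 == label)
    after = sum(n for (_, c2), n in counts.items() if c2 == label)
    return (after - before) * pixel_area_ha
```

On a hypothetical 2 × 2 map where two forest pixels become unstocked, `class_area_change(change_matrix(t1, t2), "forest", 0.09)` gives a net loss of about 0.18 ha. The off-diagonal entry `("forest", "unstocked")` shows where that loss went.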

Classification of Forest Vegetation Zone over Southern Part of Korean Peninsula Using Geographic Information Systems (環境因子의 空間分析을 통한 南韓지역의 山林植生帶 구분/지리정보시스템(GIS)에 의한 접근)

  • Lee, Kyu-Sung;Lee, Byong-Chun;Shin, Joon Hwan
    • The Korean Journal of Ecology / v.19 no.5 / pp.465-476 / 1996
  • Several environmental variables may influence the spatial distribution of forest vegetation. To create a map of forest vegetation zones over the southern part of the Korean Peninsula, digital map layers were produced for each environmental variable, including topography, geographic location, and climate. In addition, an extensive set of field survey data was collected in relatively undisturbed forests and entered into the GIS database with the exact coordinates of the survey sites. Preliminary statistical analysis of the survey data showed that the environmental variables differed significantly among the five previously defined forest vegetation zones. The six digital map layers representing environmental variables were classified both by a supervised classifier, using training statistics from the field survey data, and by a clustering algorithm. Although the maps from the two classifiers differed somewhat because of the different classification procedures, both showed the overall patterns of vertical and horizontal distribution of forest zones. Considering the spatial content of many ecological studies, GIS can be an important tool for managing and analyzing spatial data. This study focuses more on the generation of the digital maps and the analysis procedure than on the resulting map of forest vegetation zones.

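The abstract does not name the supervised classifier. A minimum-distance-to-means rule is one of the simplest classifiers that relies on training statistics. Each map cell is assigned to the zone whose mean environmental vector, computed from the field plots, is nearest. The sketch below assumes that rule; maximum-likelihood classification is another common choice. The zone names and feature values are hypothetical. In practice the features should be standardized first, because variables such as elevation and temperature have very different scales.

```python
import math

def train_means(samples):
    """samples: {zone: [feature vectors from field plots]} -> {zone: mean vector}."""
    means = {}
    for zone, vectors in samples.items():
        dims = len(vectors[0])
        means[zone] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]
    return means

def classify(cell, means):
    """Assign one grid cell's feature vector to the zone with the nearest mean
    (Euclidean distance; features assumed already on comparable scales)."""
    return min(means, key=lambda zone: math.dist(cell, means[zone]))
```

For instance, with hypothetical training plots `{"warm": [[14.0, 100.0], [16.0, 200.0]], "cold": [[5.0, 900.0], [7.0, 1100.0]]}` (mean temperature, elevation), the zone means are `[15.0, 150.0]` and `[6.0, 1000.0]`, and a cell at `[13.0, 300.0]` is labeled `"warm"`.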

Estimating Wood Weight Change on Air Drying Times for Three Coniferous Species of South Korea

  • Lee, Daesung;Choi, Jungkee
    • Journal of Forest and Environmental Science / v.32 no.3 / pp.262-269 / 2016
  • The purposes of this study are to calculate green and dried weight using wood discs, to determine how weight changes with air-drying time, and to develop a model of wood disc weight change for Larix kaempferi, Pinus koraiensis, and Pinus densiflora. The variables affecting weight change were investigated, and the pattern of weight change over time was characterized with linear models. When the stem green weight calculated from wood discs in this study was compared with the weight table of the Korea Forest Service, the weights did not differ significantly for L. kaempferi and P. koraiensis. In contrast, the stem dried weight differed significantly for all three species. In addition, various measurement factors were examined for their relationship with weight change, and air-drying time and disc diameter were found to be significant independent variables. Finally, two linear models were developed to estimate the air-drying time of the three species, and their fit statistics indicated that they are adequate for practical use.
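
The abstract reports linear models with air-drying time and disc diameter as significant predictors but does not give their form. Assuming a simple additive model, weight change = b0 + b1·time + b2·diameter, the coefficients can be estimated by ordinary least squares. The stdlib sketch below solves the normal equations directly. The model form, the function name `fit_ols`, and the synthetic data in the usage note are hypothetical.

```python
def fit_ols(x_rows, y):
    """Least-squares coefficients [b0, b1, ...] for y = b0 + sum(bk * xk).

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    with partial pivoting; adequate for a handful of predictors.
    """
    X = [[1.0] + list(row) for row in x_rows]  # prepend intercept column
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    # A is now diagonal in its first p columns.
    return [A[i][p] / A[i][i] for i in range(p)]
```

For example, with synthetic (time, diameter) pairs `[(0, 10), (10, 20), (20, 15), (30, 30), (5, 25)]` and responses generated exactly from 2 + 0.5·time − 0.1·diameter, `fit_ols` recovers coefficients close to `[2, 0.5, -0.1]`. Real disc data would also call for checking residuals and fit statistics, as the study did.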