• Title/Summary/Keyword: Gini mean


Application of Random Forests to Association Studies Using Mitochondrial Single Nucleotide Polymorphisms

  • Kim, Yoon-Hee;Kim, Ho
    • Genomics & Informatics / v.5 no.4 / pp.168-173 / 2007
  • In previous nuclear genomic association studies, Random Forests (RF), one of several recent machine learning methods, has been used successfully to generate evidence of association between genetic polymorphisms and diseases or other phenotypes. Compared with traditional statistical methods such as chi-square tests or logistic regression models, RF has advantages in handling large numbers of predictor variables and in examining gene-gene interactions without specifying a model. Here, we applied RF to find associations between mitochondrial single nucleotide polymorphisms (mtSNPs) and diabetes risk. The results of a chi-square test validated the use of RF for association studies using mtDNA. Variable-importance indexes such as the Gini index and the mean decrease in accuracy performed well compared with chi-square tests in identifying mtSNPs associated with a real disease, type 2 diabetes.
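The two kinds of evidence compared above can be illustrated on a single SNP. This is a minimal sketch, not the study's code: it assumes a biallelic mtSNP coded 0/1 and a case/control status coded 1/0, and the function names and toy data are hypothetical. `gini_decrease` computes the impurity decrease from one split on the SNP, the per-node quantity that a random forest accumulates over nodes and averages over trees to obtain its Gini-based importance; `chi_square_2x2` computes the Pearson chi-square statistic for the same 2x2 table.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(snp, status):
    """Impurity decrease from splitting samples on a 0/1-coded SNP."""
    n = len(status)
    left = [y for x, y in zip(snp, status) if x == 0]
    right = [y for x, y in zip(snp, status) if x == 1]
    weighted = len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
    return gini_impurity(status) - weighted


def chi_square_2x2(snp, status):
    """Pearson chi-square (no continuity correction) for a 0/1 SNP vs 0/1 status."""
    table = [[0, 0], [0, 0]]
    for x, y in zip(snp, status):
        table[x][y] += 1
    n = len(status)
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: status 1 = case, 0 = control
snp = [0, 0, 0, 0, 1, 1, 1, 1]
status = [0, 0, 0, 1, 1, 1, 1, 0]
print(gini_decrease(snp, status))   # 0.125
print(chi_square_2x2(snp, status))  # 2.0
```

In a forest, the same decrease is computed at every node that splits on the SNP, weighted by the share of samples reaching that node, and accumulated across trees, so the single split here is only one term of that sum.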

An Analysis Regarding Trends of Dualism in Korean Agriculture (A Study on Trends in the Polarization of Agricultural Production)

  • Sung, Jae-Hoon;Woo, Sung-Hwi
    • The Journal of Industrial Distribution & Business / v.8 no.6 / pp.87-95 / 2017
  • Purpose - The structural changes of Korean agriculture are complex due to heterogeneous production processes and farm characteristics. This study analyzed trends of dualism in Korean agriculture over 2000-15 based on farm-level data to clarify the specific trends of dualism in terms of farm income, farm size, and farm operators' age. The results should allow a deeper understanding of the features of structural change in Korean agriculture.
Research design, data, and methodology - We used farm-level data from South Korea: the Agricultural Census and the Farm Household Economy Survey. As measures of inequality, we used size-weighted quantiles and normalized Gini coefficients, as well as means and conventional quantiles. Size-weighted quantiles are more robust to changes in the number of small farms but more sensitive to changes in the farm-size distribution, which makes them more useful for identifying trends of dualism in Korean agriculture.
Results - The farmland distribution of crop farms became more skewed and dispersed, whereas the herd distribution of livestock farms became more concentrated. Specifically, the mean and 1st quantile of herd size increased more rapidly than the size-weighted 2nd and 3rd quantiles, and the Gini coefficients of herd distribution for livestock farms decreased by 0.1 on average. For income, polarization in farm household, agricultural, and non-agricultural income became more severe. However, the distribution of transfer income became steadily more concentrated, implying that transfer income, including subsidies, may reduce farm income polarization. Lastly, Korean farm operators aged over the study period, and their age distribution became more concentrated.
Conclusions - The structure of Korean agriculture has been changing, even though its absolute size has decreased over time. Land (herd) distribution became more dispersed (concentrated). Inequality in agricultural income became more severe, making farm household income more polarized, even though transfer income may reduce income gaps among farms. Lastly, farm operators continue to age regardless of farm type, which may affect structural change in Korean agriculture in the future.
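The contrast between conventional and size-weighted quantiles can be seen in a small example. This is a minimal sketch under stated assumptions: the abstract does not define its normalization, so `gini` assumes the common finite-sample rescaling by n/(n-1), and `size_weighted_quantile` assumes the usual definition (the farm size at which farms no larger than it hold a share q of total land). The function names and farm sizes are hypothetical.

```python
import statistics


def gini(values, normalized=True):
    """Gini coefficient from the mean absolute difference over all ordered pairs.

    If normalized, rescale by n/(n-1) so that one holder of everything scores 1
    (an assumed normalization, not necessarily the paper's)."""
    n = len(values)
    mean = sum(values) / n
    abs_diff = sum(abs(a - b) for a in values for b in values)
    g = abs_diff / (2 * n * n * mean)
    return g * n / (n - 1) if normalized else g


def size_weighted_quantile(sizes, q):
    """Smallest farm size x such that farms of size <= x hold at least share q of the total."""
    ordered = sorted(sizes)
    total = sum(ordered)
    cumulative = 0.0
    for x in ordered:
        cumulative += x
        if cumulative >= q * total:
            return x
    return ordered[-1]


farms = [1, 1, 2, 4, 12]          # hypothetical farm sizes (ha)
more_small = farms + [0.1] * 10   # the same farms plus ten very small ones

print(gini(farms))                                        # 0.625
print(statistics.median(farms), statistics.median(more_small))  # 2 0.1
print(size_weighted_quantile(farms, 0.5), size_weighted_quantile(more_small, 0.5))  # 12 12
```

Adding ten tiny farms drags the conventional median from 2 to 0.1 but leaves the size-weighted median at 12, which is the robustness to the number of small farms that the abstract describes.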

Households' Characteristics, Forest Resources Dependency and Forest Availability in Central Terai of Nepal

  • Panta, Menaka;Kim, Kyehyun;Lee, Cholyoung
    • Journal of Korean Society of Forest Science / v.98 no.5 / pp.548-557 / 2009
  • For centuries, forests have been a key component of rural livelihoods, and they are important both socially and economically in Nepal. Firewood and fodder are the basic forest products, extracted on a daily or weekly basis in most rural areas of Nepal. In this study, a field survey of 100 households was conducted in Chitwan, Nepal, to examine the degree of forest dependency, forest resource availability, household livelihood strategies, and their relationship with forest dependency. Household response indexes were constructed; the Gini coefficient, Head Count Poverty Index (HCI), and Poverty Gap Index (PGI) were calculated; and a one-way ANOVA was performed for data analysis. The data revealed that 82% and 81% of all households routinely used the forest for firewood and fodder collection, respectively, while 42% used the forest or forest fringe for grazing. The Forest Product Availability Index (FPAI) showed a sharp decline in forest resources, from 0.781 to 0.308, over a 20-year horizon, with the availability of timber noticeably lower than that of the other products. About 33% of households were below the poverty line, with a PGI of 0.0945. Income distribution among households showed a lower Gini coefficient (0.25) than landholding size (0.37). However, mean income varied significantly between income groups (rich, medium, and poor) (F = 246.348, at the 0.05 significance level). Extraction of firewood, fodder, and other forest products also differed significantly between income groups (F = 16.480, 19.930, and 29.956, respectively, at the 0.05 level). Similarly, landholding size and education differed significantly between income groups (F = 4.333 and 5.981, respectively, at the 0.05 level). These findings suggest that household income status was the major indicator of forest dependency, with poor and medium-income groups highly dependent on forests for firewood, fodder, and other products.
Forest dependency remains high, while the availability of forest products that can be extracted from the remaining forestland is decreasing. The high dependency of households on forests, coupled with other socioeconomic attributes such as education, poverty, and small landholdings, may have caused the forest degradation in Chitwan. Therefore, policy should be directed toward a pro-poor livelihood-support agenda that improves the financial condition of rural households and, together with other income-generating activities, gradually reduces the degree of forest dependency.
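The poverty indices and F-statistics reported above follow standard formulas. This is a minimal sketch assuming the usual Foster-Greer-Thorbecke definitions: HCI is the share of households with income below the poverty line z, and PGI averages the normalized shortfall (z - y)/z over all households, counting non-poor households as zero. The poverty line, incomes, group values, and function names are hypothetical, since the abstract does not state them.

```python
def poverty_indices(incomes, poverty_line):
    """Head Count Index (FGT alpha=0) and Poverty Gap Index (FGT alpha=1)."""
    n = len(incomes)
    poor = [y for y in incomes if y < poverty_line]
    hci = len(poor) / n
    # Non-poor households contribute a shortfall of zero
    pgi = sum((poverty_line - y) / poverty_line for y in poor) / n
    return hci, pgi


def one_way_anova_f(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical household incomes and poverty line
print(poverty_indices([40, 80, 100, 150, 200], poverty_line=100))
# Hypothetical values for three income groups (e.g. poor, medium, rich)
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
```

The resulting F value is then compared with the critical value of the F distribution with (k - 1, N - k) degrees of freedom at the chosen significance level, 0.05 in the study.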

Selecting the optimal threshold based on impurity index in imbalanced classification (Selecting Classification Thresholds Using Impurity Indices for Imbalanced Data)

  • Jang, Shuin;Yeo, In-Kwon
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.711-721 / 2021
  • In this paper, we propose a method of adjusting classification thresholds using impurity indices for imbalanced data. Suppose that, for imbalanced binary data, the minority category is Positive and the majority category is Negative. When categories are assigned using the commonly used threshold of 0.5, specificity tends to be high on imbalanced data while sensitivity is relatively low. Increasing sensitivity is important when correctly classifying objects in the minority category matters more. We explore how to increase sensitivity by adjusting the threshold. Existing studies have adjusted thresholds based on measures such as the G-mean and F1-score; in this paper, we instead propose selecting optimal thresholds using the chi-square statistic of CHAID, the Gini index of CART, and the entropy of C4.5. We also describe how to obtain a single value when multiple optimal thresholds are found. An empirical analysis uses classification performance metrics to show the improvements over results based on the 0.5 threshold.
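One plausible reading of the Gini-based rule is to treat the model's predicted probability of Positive as a single variable and choose the cut point CART would choose for it, that is, the one minimizing the weighted Gini impurity of the two resulting groups. The sketch below assumes that reading; the abstract gives neither the exact procedure nor the paper's tie-breaking rule, so keeping the smallest optimal threshold is an arbitrary choice here, and the scores, labels, and function names are hypothetical.

```python
def gini(labels):
    """Binary Gini impurity 2p(1-p), where p is the share of Positive (1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def gini_threshold(scores, labels):
    """Cut point on P(Positive) minimizing the size-weighted Gini impurity of the split."""
    distinct = sorted(set(scores))
    # Candidate cuts: midpoints between consecutive distinct scores
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    n = len(labels)
    best_t, best_impurity = 0.5, float("inf")
    for t in candidates:
        below = [y for s, y in zip(scores, labels) if s < t]
        above = [y for s, y in zip(scores, labels) if s >= t]
        impurity = len(below) / n * gini(below) + len(above) / n * gini(above)
        if impurity < best_impurity:  # strict '<' keeps the smallest optimal cut (assumed tie rule)
            best_t, best_impurity = t, impurity
    return best_t


def sensitivity_specificity(scores, labels, t):
    """Classify as Positive when score >= t; return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
    positives = sum(labels)
    return tp / positives, tn / (len(labels) - positives)


# Hypothetical predicted probabilities; 1 = Positive (minority), 0 = Negative
scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.45, 0.60]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

t = gini_threshold(scores, labels)
print(t)                                             # 0.25
print(sensitivity_specificity(scores, labels, 0.5))  # sensitivity 1/3, specificity 1.0
print(sensitivity_specificity(scores, labels, t))    # (1.0, 0.8)
```

On this toy data, moving the cut from 0.5 to the Gini-optimal 0.25 raises sensitivity from 1/3 to 1.0 while specificity falls from 1.0 to 0.8, the trade-off the paper targets.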

Machine Learning Based MMS Point Cloud Semantic Segmentation

  • Bae, Jaegu;Seo, Dongju;Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.38 no.5_3 / pp.939-951 / 2022
  • The most important factor in designing autonomous driving systems is recognizing the exact location of the vehicle within its surrounding environment. To date, various sensors and navigation systems have been used in autonomous driving systems; however, all have limitations. Therefore, there is a growing need for high-definition (HD) maps that provide high-precision infrastructure information for safe and convenient autonomous driving. HD maps are drawn using three-dimensional point cloud data acquired through a mobile mapping system (MMS). However, this process requires manual work because of the large numbers of points and drawing layers, which increases the cost and effort of HD mapping. The objective of this study was to improve the efficiency of HD mapping by segmenting semantic information in an MMS point cloud into six classes: roads, curbs, sidewalks, medians, lanes, and other elements. Segmentation was performed with several machine learning techniques, including random forest (RF), support vector machine (SVM), k-nearest neighbor (KNN), and gradient-boosting machine (GBM), using 11 variables covering geometry, color, intensity, and other road design features. MMS point cloud data for a 130-m section of a five-lane road near Minam Station in Busan were used to evaluate the segmentation models; the average F1 scores were 95.43% for RF, 92.1% for SVM, 91.05% for GBM, and 82.63% for KNN. The RF model showed the best segmentation performance, with F1 scores of 99.3%, 95.5%, 94.5%, 93.5%, and 90.1% for roads, sidewalks, curbs, medians, and lanes, respectively. In the RF variable importance results, the road-design variables XY dist. and Z dist. showed high mean decrease in accuracy and mean decrease in Gini, respectively. Thus, variables related to road design contributed significantly to the segmentation of semantic information.
The results of this study demonstrate the applicability of machine learning-based segmentation of MMS point cloud data and will help reduce the cost and effort associated with HD mapping.
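Per-class F1 scores like those above can be computed from reference and predicted labels with F1 = 2TP / (2TP + FP + FN), which equals the harmonic mean of precision and recall. The abstract does not state how its "average" F1 was formed; this minimal sketch assumes an unweighted (macro) mean over classes, and the point labels and function names are hypothetical.

```python
def per_class_f1(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for each class appearing in either label list."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (an assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical reference and predicted classes for six points
y_true = ["road", "road", "curb", "lane", "sidewalk", "road"]
y_pred = ["road", "road", "sidewalk", "lane", "sidewalk", "curb"]
print(per_class_f1(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

A macro mean weights a small class such as lanes as heavily as roads; a support-weighted mean would instead be dominated by the most frequent classes, so the choice affects how the reported averages should be read.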

The Trends and Causes of Income Inequality Among Gender (Trends and Causes of Income Inequality within Gender Groups)

  • Kim, Hye-Yeon;Hong, Baeg-Eui
    • Korean Journal of Social Welfare / v.61 no.2 / pp.391-415 / 2009
  • The purpose of this study is to examine trends in income inequality by gender since the 1997 economic crisis and to identify the most influential factors behind these changes for men and women. The data come from nine waves of the Korean Labor and Income Panel Study (KLIPS). Income inequality is measured by the Gini coefficient and the mean logarithmic deviation (MLD), and the MLDs are decomposed into three components to quantify within- and between-group inequality. The results show that income inequality is greater for women throughout the whole period and fluctuates more widely. Women's income inequality is mainly affected by family-related variables such as age and marital status, while men's inequality is primarily determined by labor market factors such as employment status, industry type, and occupational status. These results imply that gender-sensitive welfare policies need to be implemented and that poor women and men should be assisted through income assistance programs and labor market programs.
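The MLD suits this kind of study because it decomposes exactly into within-group and between-group parts. Writing MLD = (1/N) Σ ln(μ / y_i), one has MLD = Σ_g s_g MLD_g + Σ_g s_g ln(μ / μ_g), where s_g = N_g / N is the population share of group g and μ_g its mean income. The abstract mentions three components without defining them, so this minimal sketch shows only the standard static within/between split; group names, incomes, and function names are hypothetical, and incomes must be strictly positive for the logarithm.

```python
import math


def mld(incomes):
    """Mean logarithmic deviation: mean of ln(mu / y) over strictly positive incomes."""
    mu = sum(incomes) / len(incomes)
    return sum(math.log(mu / y) for y in incomes) / len(incomes)


def mld_decomposition(groups):
    """Split total MLD into (within, between), weighting groups by population share."""
    all_incomes = [y for g in groups.values() for y in g]
    n = len(all_incomes)
    mu = sum(all_incomes) / n
    within = sum(len(g) / n * mld(g) for g in groups.values())
    between = sum(len(g) / n * math.log(mu / (sum(g) / len(g))) for g in groups.values())
    return within, between


# Hypothetical incomes by gender
groups = {"male": [100, 200, 300, 400], "female": [50, 80, 120, 150]}
within, between = mld_decomposition(groups)
total = mld([y for g in groups.values() for y in g])
print(within, between, total)  # within + between matches total up to rounding
```

Because the between-group term uses population shares rather than income shares as weights, the within-group part is a simple population-weighted average of each group's own MLD, which is what makes the split exact.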
