Estimation of Genetic Parameters and Trends for Weaning-to-first Service Interval and Litter Traits in a Commercial Landrace-Large White Swine Population in Northern Thailand

  • Chansomboon, C.;Elzo, M.A.;Suwanasopee, T.;Koonawootrittriron, S.
    • Asian-Australasian Journal of Animal Sciences / v.23 no.5 / pp.543-555 / 2010
  • The objectives of this research were to estimate genetic parameters and trends for weaning-to-first service interval (WSI) and litter traits in a commercial swine population composed of Landrace (L), Large White (T), LT, and TL animals in Chiang Mai, Northern Thailand. The dataset contained 4,399 records of WSI, number of piglets born alive (NBA), litter weight of live piglets at birth (LBW), number of piglets at weaning (NPW), and litter weight at weaning (LWW). Variance and covariance components were estimated with REML using 2-trait analyses. An animal model was used for WSI and a sire-dam model for litter traits. Fixed effects were farrowing year-season, breed group of sow, breed group of boar (litter traits), parity, heterosis (litter traits), sow age, and lactation length (NPW and LWW). Random effects were boar (litter traits), sow, permanent environment, and residual. Heritabilities for direct genetic effects were low for WSI ($0.04 \pm 0.02$) and litter traits ($0.05 \pm 0.02$ to $0.06 \pm 0.02$). Most heritabilities for maternal litter trait effects were 20% to 50% lower than their direct counterparts. Repeatability for WSI was similar to its heritability. Repeatabilities for litter traits ranged from $0.15 \pm 0.02$ to $0.18 \pm 0.02$. Direct genetic, permanent environment, and phenotypic correlations between WSI and litter traits were near zero. Direct genetic correlations among litter traits ranged from $0.56 \pm 0.20$ to $0.95 \pm 0.05$, except for near-zero estimates between NBA and LWW, and between LBW and LWW. Maternal, permanent environment, and phenotypic correlations among litter traits followed patterns similar to those of the direct genetic correlations. Boar genetic trends were small and significant only for NBA ($-0.015 \pm 0.005$ piglets/yr, p<0.004). Sow genetic trends were small, negative, and significant ($-0.036 \pm 0.013$ d/yr, p<0.01, for WSI; $-0.017 \pm 0.005$ piglets/yr, p<0.007, for NBA; $-0.015 \pm 0.005$ kg/yr, p<0.01, for LBW; $-0.019 \pm 0.008$ piglets/yr, p<0.02, for NPW; and $-0.022 \pm 0.006$ kg/yr, p<0.003, for LWW). Permanent environmental trends were small, negative, and significant only for WSI ($-0.028 \pm 0.011$ d/yr, p<0.02). Environmental trends were positive and significant only for litter traits (p<0.01 to p<0.0003). Selection based on predicted genetic values rather than phenotypes could be advantageous in this population. A single-trait analysis could be used for WSI and a multiple-trait analysis could be implemented for litter traits.
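The heritability and repeatability estimates in the abstract above are ratios of variance components. The following is a minimal sketch in Python, assuming a plain repeatability model with additive genetic, permanent environment, and residual variances only (the sire-dam model with maternal effects used for litter traits is more involved). The function names and the numeric variance values are hypothetical and are not taken from the paper.

```python
def phenotypic_variance(var_additive, var_perm_env, var_residual):
    """Total phenotypic variance under a simple repeatability model."""
    return var_additive + var_perm_env + var_residual


def heritability(var_additive, var_perm_env, var_residual):
    """Share of phenotypic variance explained by additive genetic effects."""
    total = phenotypic_variance(var_additive, var_perm_env, var_residual)
    return var_additive / total


def repeatability(var_additive, var_perm_env, var_residual):
    """Share of phenotypic variance that is permanent to the animal
    (additive genetic plus permanent environment)."""
    total = phenotypic_variance(var_additive, var_perm_env, var_residual)
    return (var_additive + var_perm_env) / total


# Hypothetical variance components, chosen only for illustration.
components = dict(var_additive=0.5, var_perm_env=1.0, var_residual=8.5)

h2 = heritability(**components)
rep = repeatability(**components)
print(f"heritability = {h2:.2f}, repeatability = {rep:.2f}")
```

With these illustrative inputs the repeatability exceeds the heritability by the permanent environment share, which is the pattern the abstract reports for litter traits.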

Variance Components and Genetic Parameters for Milk Production and Lactation Pattern in an Ethiopian Multibreed Dairy Cattle Population

  • Gebreyohannes, Gebregziabher;Koonawootrittriron, Skorn;Elzo, Mauricio A.;Suwanasopee, Thanathip
    • Asian-Australasian Journal of Animal Sciences / v.26 no.9 / pp.1237-1246 / 2013
  • The objective of this study was to estimate variance components and genetic parameters for lactation milk yield (LY), lactation length (LL), average milk yield per day (YD), initial milk yield (IY), peak milk yield (PY), days to peak (DP), and parameters (ln(a) and c) of the modified incomplete gamma function (MIG) in an Ethiopian multibreed dairy cattle population. The dataset was composed of 5,507 lactation records collected from 1,639 cows in three locations (Bako, Debre Zeit, and Holetta) in Ethiopia from 1977 to 2010. Parameters for MIG were obtained from regression analysis of monthly test-day milk data on days in milk. The cows were purebred (Bos indicus) Boran (B) and Horro (H) and their crosses with different fractions of Friesian (F), Jersey (J), and Simmental (S). There were 23 breed groups (B, H, and their crossbreds with F, J, and S) in the population. Fixed and mixed models were used to analyse the data. The fixed model considered herd-year-season, parity, and breed group as fixed effects, and residual as random. The single- and two-trait mixed animal repeatability models considered the fixed effects of herd-year-season and parity subclasses, breed as a function of cow H, F, J, and S breed fractions, and general heterosis as a function of heterozygosity, together with the random additive animal, permanent environment, and residual effects. For the analysis of LY, LL was added as a fixed covariate to all models. Variance components and genetic parameters were estimated using average information restricted maximum likelihood procedures. The results indicated that all traits were affected (p<0.001) by the considered fixed effects. High-grade $B \times F$ cows (3/16B 13/16F) had the highest least squares means (LSM) for LY ($2{,}490 \pm 178.9$ kg), IY ($10.5 \pm 0.8$ kg), PY ($12.7 \pm 0.9$ kg), YD ($7.6 \pm 0.55$ kg), and LL ($361.4 \pm 31.2$ d), while B cows had the lowest LSM values for these traits. The LSM of LY, IY, YD, and PY tended to increase from the first to the fifth parity. Single-trait analyses yielded low heritability ($0.03 \pm 0.03$ and $0.08 \pm 0.02$) and repeatability ($0.14 \pm 0.01$ to $0.24 \pm 0.02$) estimates for LL, DP, and parameter c. Medium heritability ($0.21 \pm 0.03$ to $0.33 \pm 0.04$) and repeatability ($0.27 \pm 0.02$ to $0.53 \pm 0.01$) estimates were obtained for LY, IY, PY, YD, and ln(a). Genetic correlations among LY, IY, PY, YD, ln(a), and LL ranged from 0.59 to 0.99. Spearman's rank correlations between sire estimated breeding values for LY, LL, IY, PY, YD, ln(a), and c were positive (0.67 to 0.99, p<0.001). These results suggested that selection for IY, PY, YD, or LY would genetically improve lactation milk yield in this Ethiopian dairy cattle population.

A Prospect on the Changes in Short-term Cold Hardiness in "Campbell Early" Grapevine under the Future Warmer Winter in South Korea (남한의 겨울기온 상승 예측에 따른 포도 "캠벨얼리" 품종의 단기 내동성 변화 전망)

  • Chung, U-Ran;Yun, Jin-I.
    • Korean Journal of Agricultural and Forest Meteorology / v.10 no.3 / pp.94-101 / 2008
  • Warming trends during winter seasons in East Asian regions are expected to accelerate in the future according to the climate projection by the Intergovernmental Panel on Climate Change (IPCC). Warmer winters may affect the short-term cold hardiness of deciduous fruit trees, yet phenological observations are scant compared with the long-term climate records in these regions. Dormancy depth, which can be estimated from daily temperature, is expected to serve as a reasonable proxy for the physiological tolerance of flowering buds to low temperature in winter. To delineate the geographical pattern of short-term cold hardiness in grapevines, a selected dormancy depth model was parameterized for "Campbell Early", the major cultivar in South Korea. Gridded datasets of daily maximum and minimum temperature with a 270 m cell spacing ("High Definition Digital Temperature Map", HD-DTM) were prepared for the current climatological normal years (1971-2000) based on observations at the 56 Korea Meteorological Administration (KMA) stations and a geospatial interpolation scheme for correcting land surface effects (e.g., land use, topography, and site elevation). To generate the corresponding datasets for future climatological normal years, we combined a 25 km resolution, 2011-2100 temperature projection dataset covering South Korea (under the IPCC-SRES A2 scenario) with the 1971-2000 HD-DTM. The dormancy depth model was run with the gridded datasets to estimate the geographical pattern of change in the cold-hardiness period (the number of days between endo- and forced dormancy release) across South Korea for the normal years (1971-2000, 2011-2040, 2041-2070, and 2071-2100). Results showed that the cold-hardiness zone with a cold-tolerant period of 60 days or longer would diminish from 58% of the total land area of South Korea in 1971-2000 to 40% in 2011-2040, 14% in 2041-2070, and less than 3% in 2071-2100. This method can be applied to other deciduous fruit trees to delineate the geographical shift of the cold-hardiness zone under projected climate change, thereby providing valuable information for adaptation strategies in the fruit industry.

Classification Tree Analysis to Assess Contributing Factors Influencing Biosecurity Level on Farrow-to-Finish Pig Farms in Korea (분류 트리 기법을 이용한 국내 일괄사육 양돈장의 차단방역 수준에 영향을 미치는 기여 요인 평가)

  • Kim, Kyu-Wook;Pak, Son-Il
    • Journal of Veterinary Clinics / v.33 no.2 / pp.107-112 / 2016
  • The objective of this study was to determine potential contributing factors associated with the biosecurity level of farrow-to-finish pig farms and to develop a classification tree model to explore how these factors relate to each other. To this end, the author analyzed data (n = 193) extracted from a cross-sectional study of 344 farrow-to-finish farms, conducted between March and September 2014 to explore swine disease status at the farm level. Standardized questionnaires covering basic demographic data and management practices were completed on each farm during on-site visits by trained veterinarians. The Chi-squared Automatic Interaction Detection (CHAID) algorithm was applied to build the classification tree, with biosecurity level as the dependent variable. The misclassification risk statistic was used to evaluate the fit of the model in terms of prediction results. Categorical multivariate input data (40 variables) were used to construct the classification tree, and the target variable was biosecurity level dichotomized into low versus high. In general, the level of biosecurity was low in the majority of farms studied, mainly because of the limited implementation of basic on-farm biosecurity measures aimed at controlling the potential introduction and transmission of swine diseases. The CHAID model illustrated the relative importance of the significant predictors in explaining the level of biosecurity: maintenance of medical records of treatment and vaccination, use of dedicated clothing to enter the farm, installation of a fence around the farm perimeter, and periodic monitoring of the herd using a written biosecurity plan. The misclassification risk estimate of the prediction model was 0.145 with a standard error of 0.025, indicating that 85.5% of the cases could be classified correctly using the decision rule based on the current tree. Although the CHAID approach could provide detailed information and insight about interactions among factors associated with biosecurity level, future studies should further evaluate potential bias introduced during data collection. In addition, the findings still need to be validated with an external dataset of larger sample size to improve the external validity of the current model.
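The step from a misclassification risk of 0.145 to 85.5% correct classification in the abstract above is simple arithmetic, sketched here in Python. The standard-error formula is an assumption: the abstract does not say how the 0.025 was computed, and the sketch uses the ordinary binomial standard error with the reported n = 193. The function names are hypothetical.

```python
import math


def accuracy_from_risk(risk):
    """Proportion classified correctly is the complement of the risk."""
    return 1.0 - risk


def binomial_standard_error(proportion, n):
    """Binomial standard error of a proportion (assumed formula)."""
    return math.sqrt(proportion * (1.0 - proportion) / n)


risk = 0.145  # misclassification risk reported in the abstract
n = 193       # number of farms analyzed, as reported in the abstract

accuracy = accuracy_from_risk(risk)
se = binomial_standard_error(risk, n)
print(f"accuracy = {accuracy:.3f}, standard error = {se:.3f}")
```

Under this assumption the computed standard error rounds to the 0.025 quoted in the abstract.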

Application of single-step genomic evaluation using social genetic effect model for growth in pig

  • Hong, Joon Ki;Kim, Young Sin;Cho, Kyu Ho;Lee, Deuk Hwan;Min, Ye Jin;Cho, Eun Seok
    • Asian-Australasian Journal of Animal Sciences / v.32 no.12 / pp.1836-1843 / 2019
  • Objective: Social genetic effects (SGE) are an important genetic component for growth, group productivity, and welfare in pigs. The present study was conducted to evaluate i) the feasibility of the single-step genomic best linear unbiased prediction (ssGBLUP) approach with the inclusion of SGE in the model in pigs, and ii) the changes in the contribution of heritable SGE to the phenotypic variance with different scaling constants $\omega$ for genomic relationships. Methods: The dataset included performance-tested growth rate records (average daily gain) from 13,166 Landrace (LR) and 21,762 Yorkshire (YS) pigs. A total of 1,041 (LR) and 964 (YS) pigs were genotyped using the Illumina PorcineSNP60 v2 BeadChip panel. With the BLUPF90 software package, genetic parameters were estimated using a modified animal model for competitive traits. Giving a fixed weight to pedigree relationships ($\tau$: 1), several weights ($\omega_{xx}$, 0.1 to 1.0 in steps of 0.1) were applied to the genomic relationship, and the best model fit was identified with the Akaike information criterion (AIC). Results: The genetic variances and total heritability estimates ($T^2$) were mostly higher with ssGBLUP than in the pedigree-based analysis. The model AIC value increased with any level of $\omega$ other than 0.6 and 0.5 in LR and YS, respectively, indicating a worse fit of those models. The theoretical accuracies of direct and social breeding values increased with decreasing $\omega$ in both breeds, indicating the better accuracy of the $\omega_{0.1}$ models. Therefore, the optimal values of $\omega$ to minimize AIC and to increase theoretical accuracy were 0.6 in LR and 0.5 in YS. Conclusion: The ssGBLUP model fitting SGE showed significant improvement in accuracy compared with the pedigree-based analysis method; therefore, it could be implemented in a pig population for genomic selection based on SGE, especially in South Korean populations, with appropriate further adjustment of tuning parameters for relationship matrices.

Artificial Intelligence Techniques for Predicting Online Peer-to-Peer(P2P) Loan Default (인공지능기법을 이용한 온라인 P2P 대출거래의 채무불이행 예측에 관한 실증연구)

  • Bae, Jae Kwon;Lee, Seung Yeon;Seo, Hee Jin
    • The Journal of Society for e-Business Studies / v.23 no.3 / pp.207-224 / 2018
  • In this article, an empirical study was conducted using a public dataset from Lending Club Corporation, the largest online peer-to-peer (P2P) lending platform in the world. We explored significant predictor variables related to P2P lending default and found that housing situation, length of employment, average current balance, debt-to-income ratio, loan amount, loan purpose, interest rate, public records, number of finance trades, total credit/credit limit, number of delinquent accounts, number of mortgage accounts, and number of bank card accounts are significant factors for loans funded on the Lending Club platform. We developed online P2P lending default prediction models using discriminant analysis, logistic regression, neural networks, and decision trees (i.e., CART and C5.0). To verify the feasibility and effectiveness of the prediction models, borrower loan data and credit data were used. Empirical results indicated that neural networks outperformed the other classifiers (discriminant analysis, logistic regression, CART, and C5.0) in P2P loan default prediction.

Privacy Preserving Data Publication of Dynamic Datasets (프라이버시를 보호하는 동적 데이터의 재배포 기법)

  • Lee, Joo-Chang;Ahn, Sung-Joon;Won, Dong-Ho;Kim, Ung-Mo
    • Journal of the Korea Institute of Information Security & Cryptology / v.18 no.6A / pp.139-149 / 2008
  • The amount of personal information collected by organizations and government agencies is continuously increasing. When a data collector publishes personal information for research and other purposes, individuals' sensitive information should not be revealed. On the other hand, published data is also required to provide accurate statistical information for analysis. The k-anonymity and $\ell$-diversity models are popular approaches for privacy-preserving data publication. However, they are limited to static data release. After a dataset is updated with insertions and deletions, a data collector cannot safely release up-to-date information. Recently, the m-invariance model has been proposed to support re-publication of dynamic datasets. However, m-invariant generalization can cause high information loss. In addition, if the adversary has already obtained the sensitive values of some individuals before accessing the released information, m-invariance leads to severe privacy disclosure. In this paper, we propose a novel technique for safely releasing dynamic datasets. The proposed technique offers a simple and effective method for handling inserted and deleted records without generalization. It also gives a degree of privacy preservation equivalent to that of the m-invariance model.
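To make the static models named in the abstract above concrete, here is a minimal Python sketch that checks k-anonymity and (distinct) ℓ-diversity on a small table. It illustrates only those two background models, not the technique the paper proposes. The records, column names, and function names are hypothetical.

```python
from collections import defaultdict


def group_by_quasi_identifiers(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return groups


def is_k_anonymous(records, quasi_identifiers, k):
    """Every quasi-identifier group must contain at least k records."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(len(group) >= k for group in groups.values())


def is_l_diverse(records, quasi_identifiers, sensitive, l):
    """Every group must contain at least l distinct sensitive values."""
    groups = group_by_quasi_identifiers(records, quasi_identifiers)
    return all(
        len({record[sensitive] for record in group}) >= l
        for group in groups.values()
    )


# Hypothetical, already generalized table (not from the paper).
records = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "asthma"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]
quasi_identifiers = ["age", "zip"]

k_ok = is_k_anonymous(records, quasi_identifiers, k=2)
l_ok = is_l_diverse(records, quasi_identifiers, "disease", l=2)
print(k_ok, l_ok)
```

The toy table is 2-anonymous but not 2-diverse, because the second group holds a single sensitive value. Both checks describe one snapshot only, which is why they do not cover re-publication after insertions and deletions.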

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

  • Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.1-16 / 2015
  • Traffic accidents have been one of the major causes of death worldwide for the last several decades. According to World Health Organization statistics, approximately 1.24 million deaths occurred on the world's roads in 2010. To reduce future traffic accidents, multipronged approaches have been adopted, including traffic regulations, injury-reducing technologies, and driver training programs. Records on traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze the relationships between traffic accidents and related factors such as vehicle design, road design, weather, and driver behavior. Insights derived from these analyses can be used for accident prevention. Traffic accident data mining is the activity of finding useful knowledge about such relationships that is not well known and that may interest the user. Many studies on mining accident data have been reported over the past two decades. Most of them focused on predicting accident risk from accident-related factors, using supervised learning methods such as decision trees, logistic regression, k-nearest neighbor, and neural networks. However, the prediction models derived from these algorithms are too complex for humans to understand, because the main purpose of these algorithms is prediction, not explanation of the data. Some studies use unsupervised clustering algorithms to divide the data into several groups, but the derived groups are still not easy for humans to understand, so additional analytic work is necessary. Rule-based learning methods are appropriate when we want to derive a comprehensible form of knowledge about the target domain. They derive a set of if-then rules that represent the relationship between the target feature and other features. Rules are fairly easy for humans to understand, so they can help provide insight and comprehensible results. Association rule learning and subgroup discovery are representative rule-based learning methods for descriptive tasks. These two methods have been used in a wide range of areas, including transaction analysis, accident data analysis, detection of statistically significant patient risk groups, and discovery of key persons in social communities. We use both the association rule learning method and the subgroup discovery method to discover useful patterns in a traffic accident dataset consisting of many features, including driver profile, accident location, accident type, vehicle information, and regulation violations. The association rule learning method, which is an unsupervised learning method, searches for frequent item sets in the data and translates them into rules. In contrast, the subgroup discovery method is a supervised learning method that discovers rules for user-specified concepts satisfying a certain degree of generality and unusualness. Depending on which aspect of the data we focus on, we may combine multiple relevant features of interest into a synthetic target feature and give it to the rule learning algorithms. After a set of rules is derived, postprocessing steps make the rule set more compact and easier to understand by removing uninteresting or redundant rules. We conducted a set of experiments mining our traffic accident data in both unsupervised and supervised modes to compare these rule-based learning algorithms. The experiments reveal that association rule learning, in its pure unsupervised mode, can discover some hidden relationships among the features. Under a supervised learning setting with a combinatorial target feature, however, the subgroup discovery method finds good rules much more easily than the association rule learning method, which requires considerable effort to tune its parameters.
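The two rule-learning styles compared in the abstract above can be contrasted with a few standard rule-quality measures, sketched below in Python. Support and confidence are the usual association rule measures. Weighted relative accuracy (WRAcc) is one common subgroup discovery measure that multiplies generality by unusualness; the abstract does not name the measure the authors used, so WRAcc is an assumption here. The accident records and function names are hypothetical.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent).

    Assumes the antecedent occurs at least once in the records.
    """
    both = support(records, set(antecedent) | set(consequent))
    return both / support(records, antecedent)


def wracc(records, condition, target):
    """Weighted relative accuracy: generality times unusualness,
    P(condition) * (P(target | condition) - P(target))."""
    generality = support(records, condition)
    unusualness = confidence(records, condition, target) - support(records, target)
    return generality * unusualness


# Hypothetical accident records as sets of feature values.
accidents = [
    {"rain", "night", "severe"},
    {"rain", "night", "severe"},
    {"rain", "day", "minor"},
    {"clear", "day", "minor"},
    {"clear", "night", "minor"},
    {"clear", "day", "severe"},
    {"rain", "night", "minor"},
    {"clear", "day", "minor"},
]

rule_support = support(accidents, {"rain", "night", "severe"})
rule_confidence = confidence(accidents, {"rain", "night"}, {"severe"})
rule_wracc = wracc(accidents, {"rain", "night"}, {"severe"})
print(rule_support, rule_confidence, rule_wracc)
```

An association rule miner would enumerate every frequent item set above a support threshold. A subgroup discovery search fixes the target (here `severe`) and ranks candidate conditions by a measure such as WRAcc.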