Effect of errors in pedigree on the accuracy of estimated breeding value for carcass traits in Korean Hanwoo cattle

  • Nwogwugwu, Chiemela Peter;Kim, Yeongkuk;Chung, Yun Ji;Jang, Sung Bong;Roh, Seung Hee;Kim, Sidong;Lee, Jun Heon;Choi, Tae Jeong;Lee, Seung-Hwan
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.33 no.7
    • /
    • pp.1057-1067
    • /
    • 2020
  • Objective: This study evaluated the effect of pedigree errors (PEs) on the accuracy of estimated breeding value (EBV) and genetic gain for carcass traits in Korean Hanwoo cattle. Methods: The raw data set was based on the pedigree records of Korean Hanwoo cattle. The animals' information was obtained from Hanwoo registration records in the Korea Animal Improvement Association database. The records comprised 46,704 animals, including 1,298 sires and 38,366 dams. The traits considered were carcass weight (CWT), eye muscle area (EMA), back fat thickness (BFT), and marbling score (MS). Errors were introduced into the pedigree dataset by randomly assigning sires to progeny. The error rates applied were 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, and 80%. A simulation was performed to produce a population of 1,650 animals from the pedigree data. A restricted maximum likelihood based animal model was applied to estimate the EBV, accuracy of the EBV, expected genetic gain, variance components, and heritability (h²) for carcass traits. Correlation of the simulated data under PEs was also estimated using Pearson's method. Results: The results showed that the carcass traits per slaughter year were not consistent. The average CWT, EMA, BFT, and MS were 342.60 kg, 78.76 cm², 8.63 mm, and 3.31, respectively. When errors were introduced into the pedigree, the accuracy of EBV, genetic gain, and h² of carcass traits were reduced. In addition, the correlation of the simulation was slightly affected under PEs. Conclusion: This study reveals the effect of PEs on the accuracy of EBV and genetic parameters for carcass traits, which provides valuable information for further study in Korean Hanwoo cattle.
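
The error-injection step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the `(animal, sire, dam)` tuple layout, the function names, and the choice to replace each selected sire with a different one drawn uniformly from the existing sires are all assumptions.

```python
import random


def inject_sire_errors(pedigree, error_rate, seed=0):
    """Return a copy of `pedigree` in which a fraction of sires is wrong.

    pedigree   : list of (animal_id, sire_id, dam_id) tuples (assumed layout)
    error_rate : fraction of records to corrupt, e.g. 0.05 for 5%
    """
    rng = random.Random(seed)
    sires = sorted({sire for _, sire, _ in pedigree})
    n_errors = round(error_rate * len(pedigree))
    # Choose which records receive a wrong sire.
    wrong_rows = set(rng.sample(range(len(pedigree)), n_errors))

    corrupted = []
    for i, (animal, sire, dam) in enumerate(pedigree):
        if i in wrong_rows:
            # Assumption: the replacement must differ from the true sire.
            sire = rng.choice([s for s in sires if s != sire])
        corrupted.append((animal, sire, dam))
    return corrupted


def realised_error_rate(original, corrupted):
    """Fraction of records whose sire differs between the two pedigrees."""
    changed = sum(o[1] != c[1] for o, c in zip(original, corrupted))
    return changed / len(original)
```

Running the function once per error rate (0.05, 0.10, ..., 0.80) would give one corrupted pedigree per scenario, each of which could then be passed to the genetic evaluation.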

Conditional mean spectrum for Bucharest

  • Vacareanu, Radu;Iancovici, Mihail;Pavel, Florin
    • Earthquakes and Structures
    • /
    • v.7 no.2
    • /
    • pp.141-157
    • /
    • 2014
  • The Conditional Mean Spectrum represents a powerful link between seismic hazard information and the selection of strong ground motion records at a particular site. This paper applies, for the first time for the city of Bucharest, the method presented by Baker (2011) for obtaining the Conditional Mean Spectrum (CMS) and selects, on the basis of the CMS, a suite of strong ground motions for performing elastic and inelastic dynamic analyses of buildings and structures with fundamental periods of vibration in the vicinity of 1.0 s. The seismic hazard for Bucharest and for most of Southern and Eastern Romania is dominated by the Vrancea subcrustal seismic source. The ground motion prediction equation developed for subduction-type earthquakes and soil conditions by Youngs et al. (1997) is used for the computation of the Uniform Hazard Spectrum (UHS) and the CMS. The disaggregation of seismic hazard is then performed in order to determine the mean causal values of magnitude and source-to-site distance for a particular spectral ordinate (a spectral period T = 1.0 s in this study). The spectral period of 1.0 s is considered to be representative of the new stock of residential and office reinforced concrete (RC) buildings in Bucharest. The differences between the UHS and the CMS are discussed, taking into account the scarcity of ground motions recorded in the region of Bucharest and the frequency content characteristics of the recorded data. Moreover, a record selection based on the criteria proposed by Baker and Cornell (2006) and Baker (2011) is performed using a dataset consisting of strong ground motions recorded during seven Vrancea seismic events.
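
For orientation, the conditional mean spectrum is usually written in the form below. The notation is an assumption, since the abstract does not give it: $T^*$ is the conditioning period (1.0 s here), $M$ and $R$ are the mean causal magnitude and distance from disaggregation, $\mu_{\ln S_a}$ and $\sigma_{\ln S_a}$ are the mean and standard deviation predicted by the ground motion prediction equation, and $\rho$ is the correlation between spectral ordinates at two periods.

```latex
\mu_{\ln S_a(T_i)\,\mid\,\ln S_a(T^*)}
  = \mu_{\ln S_a}(M, R, T_i)
  + \rho(T_i, T^*)\,\varepsilon(T^*)\,\sigma_{\ln S_a}(T_i),
\qquad
\varepsilon(T^*) = \frac{\ln S_a(T^*) - \mu_{\ln S_a}(M, R, T^*)}{\sigma_{\ln S_a}(T^*)}
```

At $T_i = T^*$ the correlation equals 1, so the CMS matches the target spectral ordinate there; at other periods it falls below the UHS whenever $\varepsilon(T^*) > 0$ and $\rho < 1$.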

Patterns of Recurrence after Resection of Mass-Forming Type Intrahepatic Cholangiocarcinomas

  • Luvira, Vor;Eurboonyanun, Chalerm;Bhudhisawasdi, Vajarabhongsa;Pugkhem, Ake;Pairojkul, Chawalit;Luvira, Varisara;Sathitkarnmanee, Egapong;Somsap, Kulyada;Kamsa-ard, Supot
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.17 no.10
    • /
    • pp.4735-4739
    • /
    • 2016
  • Background: Intrahepatic cholangiocarcinoma (IHCCA) is an aggressive tumor for which surgical resection is a mainstay of treatment. However, recurrence after resection is common and is associated with a poor prognosis. Studies regarding recurrence of mass-forming IHCCA are rare; therefore, we investigated the pattern with our dataset. Methods: We retrospectively reviewed the medical and pathological records of 50 mass-forming IHCCA patients who underwent hepatic resection between January 2004 and December 2009 in order to determine the patterns of recurrence and prognosis. All demographic and operative parameters were analyzed for their effects on recurrence-free survival. Results: The median recurrence-free survival time was 188 days (95%CI: 149-299). The respective 1-, 2-, and 3-year recurrence-free survival rates were 16.2% (95%CI: 6.6-29.4), 5.4% (95%CI: 1.0-15.8) and 2.7% (95%CI: 0.2-12.0). There was an equal distribution of recurrence at solitary and multiple sites. Univariate analysis revealed no factors related to recurrence-free survival. Conclusion: The overall survival and recurrence-free survival after surgery for mass-forming IHCCA were found to be very poor. Almost all recurrences were detected within 2 years after surgery. Adjuvant chemotherapy after surgery may add benefit in the affected patients.

Data Cleaning and Integration of Multi-year Dietary Survey in the Korea National Health and Nutrition Examination Survey (KNHANES) using Database Normalization Theory (데이터베이스 정규화 이론을 이용한 국민건강영양조사 중 다년도 식이조사 자료 정제 및 통합)

  • Kwon, Namji;Suh, Jihye;Lee, Hunjoo
    • Journal of Environmental Health Sciences
    • /
    • v.43 no.4
    • /
    • pp.298-306
    • /
    • 2017
  • Objectives: Since 1998, the Korea National Health and Nutrition Examination Survey (KNHANES) has been conducted in order to investigate the health and nutritional status of Koreans. The food intake data of individuals in the KNHANES has also been utilized as a source dataset for risk assessment of chemicals via food. To improve the reliability of intake estimation and prevent missing data for infrequently reported foods, the structure of integrated long-standing datasets is important. However, it is difficult to merge multi-year survey datasets due to ineffective cleaning processes for handling extensive numbers of codes for each food item, along with changes in dietary habits over time. Therefore, this study aims at 1) cleaning abnormal data, 2) generating integrated long-standing raw data, and 3) contributing to the production of consistent dietary exposure factors. Methods: Codebooks, the guideline book, and raw intake data from KNHANES V and VI were used for analysis. Violations of the primary key constraint and of the first to third normal forms in relational database theory were tested for the codebook and the structure of the raw data, respectively. Afterwards, the cleaning process was executed for the raw data by using the integrated codes. Results: Duplication of key records and abnormality in table structures were observed. However, after adjustment according to the suggested method, the codes were corrected and integrated codes were newly created. Finally, we were able to clean the raw data provided by respondents to the KNHANES survey. Conclusion: The results of this study will contribute to the integration of the multi-year datasets and help improve the data production system by clarifying, testing, and verifying the primary key, integrity of the code, and primitive data structure according to database normalization theory in the national health data.
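
A primary key test of the kind mentioned in the abstract above can be sketched in a few lines. This is an illustrative check only, not the authors' tooling; the field names `food_code` and `food_name` and the sample rows are hypothetical.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return {key_tuple: count} for every key that occurs more than once.

    rows       : list of dicts, one per codebook record
    key_fields : names of the columns that should jointly be unique
    """
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical codebook rows: the same code is attached to two names,
# which violates the primary key constraint on `food_code`.
codebook = [
    {"food_code": "01001", "food_name": "rice"},
    {"food_code": "01001", "food_name": "white rice"},
    {"food_code": "01002", "food_name": "barley"},
]
violations = duplicate_keys(codebook, ["food_code"])
```

An empty result means the chosen columns can serve as a primary key; any entry returned points at codes that would need to be merged or reissued before multi-year data are combined.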

An Audit of 204 Histopathology Reports Over Three Years of Carcinoma of Cervix: Experience from a Tertiary Referral Centre

  • Pradhan, Anuja Prakash;Menon, Santosh;Rekhi, Bharat;Deodhar, Kedar
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.16 no.14
    • /
    • pp.5643-5645
    • /
    • 2015
  • Background: The aim was to assess compliance with minimum data set information in carcinoma cervix histopathology reports from a team of 13 pathologists, and also to analyse the distribution of parameters such as tumor size, grade, depth of cervical stromal invasion, lymph node yield and pTNM stage. Materials and Methods: All pathology reports of radical hysterectomy for carcinoma cervix operated in house within a three-year period (2010-2012; n=204) were retrieved from medical records and analyzed for the above parameters. Results: Cases of carcinoma cervix operated on in our hospital numbered 59 in 2010, 67 in 2011 and 78 in 2012. The median age was 50.5 years and the maximum T diameter was 2.8 cm in the reports of the three years. Squamous carcinoma was the commonest subtype amongst all the tumors. It was noted that 60.8% of cases had cervical stromal involvement of more than half the thickness of the cervical stroma. Parametrial involvement was seen in 4.82% of cases. pTNM staging was not mentioned in 65.06% of the cases. The mean bilateral pelvic lymph node yield in our study was 16.6 across all three years. Conclusions: Compliance with provision of a minimum dataset in our team of 13 pathologists was generally good. Lymph node yield in our hands is reasonable, but constant striving for greater numbers should be made. pTNM staging should be more meticulously documented. Use of proformas/checklists is recommended.

Analysis of the Effect of Patients' Clinical Conditions on No-Shows (외래 환자의 임상특성이 예약 부도에 미치는 영향 분석)

  • Lee, Sangbok;Park, Kitaek;Chung, Kwanghun
    • The Journal of Society for e-Business Studies
    • /
    • v.22 no.4
    • /
    • pp.53-69
    • /
    • 2017
  • This study analyzes no-shows in relation to patients' clinical characteristics, as described by the diagnoses in their medical data. A dataset of 7,055 patient records from a Veterans hospital in the United States was used to test whether no-shows differ according to each patient's diagnosed diseases and number of diagnoses. Patients with mental diseases such as drug dependence or abuse and major depression, and with chronic diseases such as hypertension, are more likely to miss appointments. With respect to the number of diagnoses, the no-show rate decreases as the number of diagnoses increases up to four and does not change significantly afterwards. We provide managerial insights on clinical operations problems from statistical analysis. We believe that our results can be used to develop appropriate solutions to no-shows in clinics.

Identification of Factors Affecting the Crash Severity and Safety Countermeasures Toward Safer Work Zone Traffic Management (공사구간 교통관리특성을 고려한 고속도로 교통사고 심각도 영향요인 분석 및 안전성 증진 방안)

  • YOON, Seok Min;OH, Cheol;PARK, Hyun Jin;CHUNG, Bong Jo
    • Journal of Korean Society of Transportation
    • /
    • v.34 no.4
    • /
    • pp.354-372
    • /
    • 2016
  • This study identified factors affecting crash severity at freeway work zones. A distinguishing feature of this study is that it takes into account the characteristics of work zone traffic management in analyzing traffic safety concerns. In addition to crash records, vehicle detection systems (VDS) data and work zone historical data were used to establish a dataset for statistical analyses based on an ordered probit model. A total of six safety improvement strategies for freeway work zones were also proposed: traffic merging method, guidance information provision, speed management, warning information systems, traffic safety facilities, and monitoring of the effectiveness of countermeasures.

A Scientometric Social Network Analysis of International Collaborative Publications of All India Institute of Medical Sciences, India

  • Nishavathi, E.;Jeyshankar, R.
    • Journal of Information Science Theory and Practice
    • /
    • v.8 no.3
    • /
    • pp.64-76
    • /
    • 2020
  • Scientometrics and social network analysis (SNA) measures were used to analyze the international scientific collaboration (ISC) of All India Institute of Medical Sciences (AIIMS) for a period of 10 years (2009-2018). The dataset consists of 19,622 records retrieved from the Scopus database. The mean degree of collaboration, 0.95, implied that researchers of AIIMS tend to collaborate domestically (80.29%) and internationally (14.67%). The data exhibit a hyper-authorship pattern, and medium-size research teams of 4 to 10 authors contributed the largest share of publications, 62.08% (12,182). 71.97% of research findings appear in journal articles. The most preferred journals published 58.55% of the medical literature. An undirected collaboration network consisting of 179 vertices (Vn) and 11,938 edges was constructed in Pajek to study the ISC of AIIMS during the period 2009-2018. The degree centrality (Dc) identified the United States of America (Dc = 54; CC = 0.99) and the United Kingdom (Dc = 41; CC = 0.98) as the most collaborative countries in the whole network as well as the most influential countries. The Louvain community detection method is used to detect influential research groups of AIIMS. The temporal evolution of the ISC of AIIMS, studied through scientometrics and SNA measures, sheds light on the structure and properties of the ISC networks of AIIMS. It revealed that AIIMS, India has taken keen steps to enrich the quality of research by extending and encouraging collaboration between institutions and industries at the international level.
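
Degree centrality in an undirected network, as used in the abstract above, can be illustrated with a short sketch. It assumes that Dc is the raw number of distinct neighbours of a vertex (the reported values 54 and 41 are consistent with that reading, but the abstract does not say so); the normalised variant and the toy edge list are additions for illustration, and the study itself used Pajek rather than this code.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Return {node: (degree, normalised_degree)} for an undirected graph.

    edges : iterable of (u, v) pairs; repeated pairs and self-loops are ignored
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # a self-loop does not count as collaboration
        neighbours[u].add(v)
        neighbours[v].add(u)

    n = len(neighbours)
    result = {}
    for node, adjacent in neighbours.items():
        degree = len(adjacent)
        # Normalise by the maximum possible degree, n - 1.
        normalised = degree / (n - 1) if n > 1 else 0.0
        result[node] = (degree, normalised)
    return result


# Toy collaboration edges (hypothetical, not taken from the study).
toy_edges = [("India", "USA"), ("India", "UK"), ("USA", "UK"), ("India", "Japan")]
toy_centrality = degree_centrality(toy_edges)
```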

Prediction of replacement period of shield TBM disc cutter using SVM (SVM 기법을 이용한 쉴드 TBM 디스크 커터 교환 주기 예측)

  • La, You-Sung;Kim, Myung-In;Kim, Bumjoo
    • Journal of Korean Tunnelling and Underground Space Association
    • /
    • v.21 no.5
    • /
    • pp.641-656
    • /
    • 2019
  • In this study, a machine learning method is proposed for predicting the optimal replacement period of shield TBM (Tunnel Boring Machine) disc cutters. To do this, a large dataset of ground conditions, disc cutter replacement records and TBM excavation-related data, collected from a shield TBM tunnel site in Korea, was built and used to construct a disc cutter replacement period prediction model using a machine learning algorithm, SVM (Support Vector Machine), and to assess the performance of the model. The results showed that the performance of the RBF (Radial Basis Function) SVM is the best among a total of three SVM classification functions (80% accuracy and 10% error rate on average). When ground types were compared, the more disc cutter replacement data existed, the better the prediction results were. From these results, machine learning methods are expected to become widely used in practice in the near future as more data are accumulated and the machine learning models continue to be fine-tuned.
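
For readers unfamiliar with the RBF function named in the abstract above, the kernel itself is simple to write down. The sketch below shows only the standard Gaussian RBF kernel, K(x, y) = exp(-gamma * ||x - y||²); the `gamma` default and the feature vectors are placeholders, and nothing here reproduces the authors' model or data.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

The kernel equals 1 for identical inputs and decays toward 0 as the inputs move apart, with `gamma` controlling how quickly similarity falls off.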

A Best Effort Classification Model For Sars-Cov-2 Carriers Using Random Forest

  • Mallick, Shrabani;Verma, Ashish Kumar;Kushwaha, Dharmender Singh
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.1
    • /
    • pp.27-33
    • /
    • 2021
  • The whole world is now dealing with Coronavirus, which has turned out to be one of the most widespread and long-lived pandemics of our times. Reports reveal that the infectious disease has taken a toll on almost 80% of the world's population. Amidst the extensive research on predicting growth and transmission through symptomatic carriers of the virus, it cannot be ignored that pre-symptomatic and asymptomatic carriers also play a crucial role in spreading the virus. Classification algorithms, ranging from simple feature-based classification to Convolutional Neural Networks (CNNs), have been widely used to classify different types of COVID-19 carriers. This research paper aims to present a novel technique using a Random Forest machine learning algorithm with hyper-parameter tuning to classify different types of COVID-19 carriers, such that these carriers can be accurately characterized and hence dealt with in a timely manner to contain the spread of the virus. The main idea for selecting Random Forest is that it works on the powerful concept of "the wisdom of the crowd", which produces an ensemble prediction. The results are quite convincing and the model records an accuracy score of 99.72%. The results have been compared with those obtained on the same dataset using K-Nearest Neighbour, logistic regression, support vector machine (SVM), and Decision Tree algorithms, for which the accuracy scores were recorded as 78.58%, 70.11%, 70.385%, and 99%, respectively, thus establishing the concreteness and suitability of our approach.