• Title/Summary/Keyword: Dataset Management

Predicting the Severity of Car Traffic Accidents on a Highway Using a Light Gradient Boosting Model (LightGBM 알고리즘을 활용한 고속도로 교통사고심각도 예측모델 구축)

  • Lee, Hyun-Mi; Jeon, Gyo-Seok; Jang, Jeong-Ah
    • The Journal of the Korea Institute of Electronic Communication Sciences, v.15 no.6, pp.1123-1130, 2020
  • This study aims to classify the severity of car crashes using five classification learning models. The dataset used in this study contains 21,013 vehicle crashes recorded by the Korea Expressway Corporation between 2015 and 2017, and LightGBM (Light Gradient Boosting Model) performed best, achieving the highest accuracy. In the LightGBM model, the number of vehicles involved, the accident type, the incident location, the incident lane type, and the types of vehicles involved were identified as the priority factors. Based on these results, a management strategy for responding to highway traffic accidents can be established through a consistent accident-severity prediction process. This study demonstrates the applicability of machine learning models for predicting the severity of highway traffic accidents and suggests various big-data-based machine learning techniques that can be used in the future.
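
A minimal sketch of the kind of workflow the abstract describes, assuming a tabular crash dataset with a severity label; the file name, column names, split, and hyperparameters are illustrative, not the authors' actual setup.

```python
# Hypothetical sketch: severity classification with LightGBM and feature importances.
import lightgbm as lgb
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("highway_crashes.csv")   # placeholder file name
X = df.drop(columns=["severity"])         # categorical columns would need encoding first
y = df["severity"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Feature importances, analogous to the priority factors reported in the study.
importance = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importance.head(10))
```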

A Study on Residual U-Net for Semantic Segmentation based on Deep Learning (딥러닝 기반의 Semantic Segmentation을 위한 Residual U-Net에 관한 연구)

  • Shin, Seokyong; Lee, SangHun; Han, HyunHo
    • Journal of Digital Convergence, v.19 no.6, pp.251-258, 2021
  • In this paper, we propose an encoder-decoder model that utilizes residual learning to improve the accuracy of U-Net-based semantic segmentation. U-Net is a deep learning-based semantic segmentation method mainly used in applications such as autonomous vehicles and medical image analysis. The conventional U-Net loses features during the compression process because of the shallow structure of its encoder. This feature loss causes a lack of the context information needed to classify objects and reduces segmentation accuracy. To address this, the proposed method efficiently extracts context information through an encoder based on residual learning, which is effective in preventing the feature-loss and vanishing-gradient problems of the conventional U-Net. Furthermore, we reduced the number of down-sampling operations in the encoder to limit the loss of spatial information contained in the feature maps. In experiments on the Cityscapes dataset, the proposed method improved segmentation results by about 12% compared with the conventional U-Net.
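
A minimal PyTorch sketch of a residual encoder block of the kind the abstract describes; the layer choices, channel sizes, and pooling layout are assumptions, not the authors' exact architecture.

```python
# Hypothetical sketch of a residual encoder block for a U-Net-style segmentation network.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection, as in residual learning."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the identity path matches the output channel count.
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.conv(x) + self.skip(x))

# Encoder stage: residual block followed by down-sampling; using fewer such stages
# preserves more spatial information, in line with the abstract's design choice.
encoder_stage = nn.Sequential(ResidualBlock(3, 64), nn.MaxPool2d(2))
out = encoder_stage(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 64, 128, 128])
```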

Optimal Algorithm and Number of Neurons in Deep Learning (딥러닝 학습에서 최적의 알고리즘과 뉴론수 탐색)

  • Jang, Ha-Young; You, Eun-Kyung; Kim, Hyeock-Jin
    • Journal of Digital Convergence, v.20 no.4, pp.389-396, 2022
  • Deep learning is based on the perceptron and is currently used in various fields such as image recognition, speech recognition, object detection, and drug development. Accordingly, a variety of learning algorithms have been proposed, and the number of neurons constituting a neural network varies greatly among researchers. This study analyzed learning characteristics according to the number of neurons for the commonly used SGD, momentum, AdaGrad, RMSProp, and Adam methods. To this end, a neural network was constructed with one input layer, three hidden layers, and one output layer; ReLU was used as the activation function, cross-entropy error (CEE) as the loss function, and MNIST as the experimental dataset. The results indicate that 100-300 neurons, the Adam algorithm, and about 200 training iterations are the most efficient for this deep learning task. This study provides a reference for choosing an algorithm and the number of neurons when new training data are given in the future.
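
A minimal sketch of the kind of comparison the abstract describes, sweeping the listed optimizers and a few hidden-layer sizes on MNIST with ReLU and cross-entropy loss; the learning rates, batch size, and data-loading details are illustrative assumptions.

```python
# Hypothetical sketch: comparing optimizers and hidden-layer sizes on MNIST.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def make_mlp(hidden):
    # One input layer, three hidden layers with ReLU, one output layer.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, 10),
    )

train_ds = datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor())
loader = DataLoader(train_ds, batch_size=128, shuffle=True)
loss_fn = nn.CrossEntropyLoss()  # cross-entropy error (CEE)

optimizers = {
    "SGD": lambda p: torch.optim.SGD(p, lr=0.01),
    "Momentum": lambda p: torch.optim.SGD(p, lr=0.01, momentum=0.9),
    "AdaGrad": lambda p: torch.optim.Adagrad(p, lr=0.01),
    "RMSProp": lambda p: torch.optim.RMSprop(p, lr=0.001),
    "Adam": lambda p: torch.optim.Adam(p, lr=0.001),
}

for name, make_opt in optimizers.items():
    for hidden in (100, 200, 300):
        model = make_mlp(hidden)
        opt = make_opt(model.parameters())
        for i, (x, y) in enumerate(loader):
            if i >= 200:  # roughly the 200 iterations the study reports as sufficient
                break
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
        print(f"{name}, hidden={hidden}, final loss={loss.item():.4f}")
```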

Genetic parameter analysis of reproductive traits in Large White pigs

  • Yu, Guanghui; Wang, Chuduan; Wang, Yuan
    • Animal Bioscience, v.35 no.11, pp.1649-1655, 2022
  • Objective: The primary objective of this study was to estimate genetic parameters for reproductive traits in Large White pigs, including total number born (TNB), number born alive (NBA), litter birth weight (LBW), average birth weight (ABW), gestation length (GL), age at first service (AFS), and age at first farrowing (AFF). Methods: The dataset consisted of 19,036 reproductive records from 4,986 sows, and a multi-trait animal model was used to estimate the genetic variance components of the seven reproductive traits. Results: Heritability estimates for these traits ranged from 0.09 to 0.26, with the highest heritabilities for GL and AFF and the lowest for NBA. Repeatabilities for TNB, NBA, LBW, ABW, and GL ranged from 0.16 to 0.34. Genetic and phenotypic correlations ranged from -0.41 to 0.99 and from -0.34 to 0.98, respectively. In particular, the correlations among TNB, NBA, and LBW, and between AFS and AFF, were strongly positive. Furthermore, for TNB, NBA, LBW, ABW, and GL, genetic correlations of the same trait between different parities were moderate to strong (0.32 to 0.97), and the correlations between adjacent parities were higher than those between nonadjacent parities. Conclusion: These results can serve as a basis for the genetic assessment of the target population. When formulating a dam-line selection index, AFS or AFF can be combined with TNB in a multiple-trait swine breeding value estimation system. Moreover, breeders are encouraged to increase the proportion of sows at parity 3-5 and to reinforce the management of sows at parity 1 and parity ≥8.
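
For reference, one standard form of the repeatability animal model and the heritability (h²) and repeatability (r) derived from it, as commonly used for traits recorded over multiple parities; this generic notation is an assumption, not the authors' exact model specification.

```latex
% Conventional repeatability animal model for a reproductive trait:
% fixed effects b, additive genetic effects a, permanent environmental effects pe, residual e.
\[
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{a} + \mathbf{W}\mathbf{pe} + \mathbf{e}
\]
\[
h^{2} = \frac{\sigma_{a}^{2}}{\sigma_{a}^{2} + \sigma_{pe}^{2} + \sigma_{e}^{2}},
\qquad
r = \frac{\sigma_{a}^{2} + \sigma_{pe}^{2}}{\sigma_{a}^{2} + \sigma_{pe}^{2} + \sigma_{e}^{2}}
\]
```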

A Study on the Improvement of Administrative Information Data Set Operation of Private Universities (사립대학 행정정보 데이터세트 운영 개선 방안)

  • Kim, Hyunjung; Bae, Sungjung
    • The Korean Journal of Archival Studies, no.74, pp.187-222, 2022
  • The aim of this study was to analyze the operational status of the administrative information datasets of private universities and to present improvement plans. For the private universities' systems, the functions, development types, and the creation, correction, and deletion of data were analyzed. The analysis showed that each university operates one or more administrative information systems and commonly uses an academic management system; that systems are often developed in-house on the university's own infrastructure; and that data are deleted by the person in charge, although the governing regulations are not clear. To address these problems, the study proposes revising the EA portal so that the current status of private universities' administrative information systems can be properly surveyed, managing records on a system basis without correcting the data, and revising internal regulations and conducting training.

Analysis of Treatment Pattern in COPD Patients Using Health Insurance Claims Data: Focusing on Inhaled Medications (건강 보험 청구 자료를 이용한 COPD 환자에서 치료제 처방 변화 분석: 흡입제를 중심으로)

  • Lim, Hana; Park, Mihai
    • Korean Journal of Clinical Pharmacy, v.32 no.3, pp.155-165, 2022
  • Background: Chronic obstructive pulmonary disease (COPD) is not completely reversible and requires long-term management with appropriate treatment. This study aimed to analyze trends in treatment regimens and medication costs for COPD patients using a national claims database. Methods: We conducted this analysis using National Patient Sample data from the Health Insurance Review and Assessment Service covering the period from 2015 to 2018. We constructed a dataset using the COPD disease classification codes J43.x and J44.x (based on KCD-7; J43.0 was excluded) and compiled a list of drugs matching current guidelines. To identify trends, we calculated frequencies, ratios, and the compound annual growth rate (CAGR) from the numbers of prescriptions and patients. Results: The number of COPD patients was 7,260 in 2018, a slight decrease from 2015. Most COPD patients were aged 60 or older, with a high proportion of males (72.2% in 2018). The number of patients prescribed inhaled medications increased gradually from 2015 to 2018 (9,227 (47.1%) in 2015; 9,285 (51.5%) in 2018), while the number of patients prescribed systemic beta-agonists and xanthines decreased over the same period (CAGR -14.7 for systemic beta-agonists; -5.8 for xanthines). The per-capita medication cost increased by 0.4% annually during the study period (from KRW 204,278 in 2015 to KRW 206,667 in 2018). Conclusion: Treatment with inhaled medications increased continuously in line with changing guidelines, but oral medications were still widely used. It is necessary to emphasize the importance of inhaled medications in treating COPD and to reduce the additional economic burden through appropriate medication use.
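
A small worked example of the CAGR calculation the abstract relies on, applied to the per-capita cost figures it reports (KRW 204,278 in 2015 to KRW 206,667 in 2018 over three annual periods); the function itself is just the standard CAGR formula.

```python
# Worked example: compound annual growth rate (CAGR) of per-capita medication cost.
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1

growth = cagr(204_278, 206_667, years=3)
print(f"CAGR: {growth:.1%}")  # ~0.4%, matching the figure reported in the study
```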

A Visitor Study of The Exhibition of Using Big Data Analysis which reflects viewing experiences

  • Kang, Ji-Su; Rhee, Bo-A
    • Journal of the Korea Society of Computer and Information, v.27 no.2, pp.81-89, 2022
  • This study aims to analyze the images of Instagram posts and to draw implications regarding the exhibition. This study crawled and collected 24,295 images from Instagram posts as a dataset. We used the Google Cloud Vision API to label the images, and a total of 212,567 collected labels were finally classified into 9 categories using Word2Vec. The museum spaces, photo zone, and architecture categories were dominant, along with the people category. In conclusion, visitors curate their experiences and memories of physical places and spaces while they experience the exhibition. This result confirms the findings of previous studies that emphasize a sense of social presence and placemaking. The convergent approach of arts management and art technology used in this study helps museum professionals gain practical insight into big-data-based visitor research.
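
A hypothetical sketch of the labeling-and-clustering pipeline the abstract outlines: Google Cloud Vision label detection, Word2Vec embeddings of the labels, and grouping into 9 categories. The file names, Word2Vec parameters, and the use of k-means for the grouping step are assumptions; the authors' exact categorization procedure is not specified in the abstract.

```python
# Hypothetical sketch: Vision API labeling followed by Word2Vec + k-means categorization.
from google.cloud import vision
from gensim.models import Word2Vec
from sklearn.cluster import KMeans
import numpy as np

client = vision.ImageAnnotatorClient()

def detect_labels(path):
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.label_detection(image=image)
    return [label.description.lower() for label in response.label_annotations]

# One "sentence" of labels per image serves as the Word2Vec training corpus.
image_paths = ["post_0001.jpg", "post_0002.jpg"]   # placeholder file names; a real run uses the full crawl
label_sentences = [detect_labels(p) for p in image_paths]

w2v = Word2Vec(sentences=label_sentences, vector_size=100, window=5, min_count=1)

# Cluster the label vectors into 9 groups, mirroring the 9 categories in the study.
labels = list(w2v.wv.index_to_key)
vectors = np.array([w2v.wv[l] for l in labels])
kmeans = KMeans(n_clusters=9, n_init=10, random_state=42).fit(vectors)
for category in range(9):
    members = [l for l, c in zip(labels, kmeans.labels_) if c == category]
    print(category, members[:10])
```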

The Methodology of the Golf Swing Similarity Measurement Using Deep Learning-Based 2D Pose Estimation

  • Jonghyuk, Park
    • Journal of the Korea Society of Computer and Information, v.28 no.1, pp.39-47, 2023
  • In this paper, we propose a method for measuring the similarity between golf swings in videos. As deep learning-based artificial intelligence has proven effective in computer vision, attempts to apply artificial intelligence to video-based sports data analysis are increasing. In this study, the joint coordinates of the person in a golf swing video were obtained using a deep learning-based pose estimation model, and the similarity of each swing segment was measured on that basis. For evaluation, driver swing videos from the GolfDB dataset were used. When swing similarity was measured across paired swing videos of 36 players, the other swing of the same player was ranked most similar for 26 of them, and the average similarity ranking was about 5th. This confirms that similarity can be measured in detail even between motions that are performed very similarly.
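
A minimal sketch of comparing two swings from 2D pose keypoints. The abstract does not state the exact similarity metric, so this uses normalized joint coordinates and a per-frame cosine similarity purely as an illustration; the joint indices follow the COCO keypoint convention and are assumptions.

```python
# Hypothetical sketch: pose-based swing similarity from (frames, joints, 2) keypoint arrays.
import numpy as np

def normalize_pose(frame):
    """Center on the hip midpoint and scale by torso length (COCO-style indices assumed)."""
    hips = (frame[11] + frame[12]) / 2.0   # left/right hip
    neck = (frame[5] + frame[6]) / 2.0     # left/right shoulder midpoint
    scale = np.linalg.norm(neck - hips) + 1e-8
    return (frame - hips) / scale

def swing_similarity(swing_a, swing_b):
    """Average per-frame cosine similarity; both swings need equal frame counts."""
    sims = []
    for fa, fb in zip(swing_a, swing_b):
        va = normalize_pose(fa).ravel()
        vb = normalize_pose(fb).ravel()
        sims.append(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
    return float(np.mean(sims))

# Random keypoints stand in for pose-estimation output (60 frames, 17 joints).
swing_a = np.random.rand(60, 17, 2)
swing_b = np.random.rand(60, 17, 2)
print(f"similarity: {swing_similarity(swing_a, swing_b):.3f}")
```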

Valid Data Conditions and Discrimination for Machine Learning: Case study on Dataset in the Public Data Portal (기계학습에 유효한 데이터 요건 및 선별: 공공데이터포털 제공 데이터 사례를 통해)

  • Oh, Hyo-Jung; Yun, Bo-Hyun
    • Journal of Internet of Things and Convergence, v.8 no.1, pp.37-43, 2022
  • The fundamental basis of AI technology is learnable data. Recently, the types and amounts of data collected and produced by governments and private companies have been increasing exponentially; however, this has not yet led to verified data that can actually be used for machine learning. This study discusses the conditions that data must meet to be usable for machine learning and identifies factors that degrade data quality through case studies. To this end, two representative cases of developing a prediction model using public big data were selected, and data for solving the actual problems were collected from the public data portal. The cases show how the results differ once valid-data screening criteria and post-processing are applied. The ultimate purpose of this study is to argue for the importance of data quality management, which must precede the development of machine learning technology, the core of artificial intelligence, and for the accumulation of valid data.
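
A hypothetical sketch of simple validity checks for a dataset downloaded from a public data portal. The paper's actual screening criteria are not listed in the abstract, so these checks (missing values, duplicates, constant columns) and the file name are illustrative only.

```python
# Hypothetical sketch: basic data-quality screening before machine learning.
import pandas as pd

def screen_dataset(df: pd.DataFrame) -> dict:
    """Return a small quality report for manual review."""
    return {
        "rows": len(df),
        "missing_ratio": df.isna().mean().round(3).to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "constant_columns": [c for c in df.columns if df[c].nunique(dropna=True) <= 1],
    }

df = pd.read_csv("public_portal_sample.csv")  # placeholder file name
print(screen_dataset(df))
```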

Implementation of Git's Commit Message Complex Classification Model for Software Maintenance

  • Choi, Ji-Hoon; Kim, Joon-Yong; Park, Seong-Hyun
    • Journal of the Korea Society of Computer and Information, v.27 no.11, pp.131-138, 2022
  • Git commit messages are closely tied to the project life cycle, and because of this they can contribute greatly to cost reduction and work efficiency by identifying risk factors and the status of project operation activities. Among related work, many studies classify commit messages by software maintenance type, and the highest accuracy reported among them is 87%. The purpose of this paper is to design and implement a composite classification model that combines several models in order to improve on the accuracy of previously published models and to increase reliability. A dataset was constructed through automated labeling and the extraction of source changes, and it was used to train a DistilBERT model. In verification, the model achieved an F1 score of 95%, 8 percentage points higher than the 87% maximum reported in previous studies, thereby securing reliability. Based on these results, it is expected that model reliability will increase and that the model can be applied to solutions such as software and project management.
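
A minimal sketch of fine-tuning DistilBERT to classify commit messages by maintenance type, using the Hugging Face transformers library. The label set, example messages, and single training step are illustrative; the paper's actual labeling scheme and composite-model design are not shown here.

```python
# Hypothetical sketch: commit-message classification with DistilBERT.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["corrective", "adaptive", "perfective"]  # assumed maintenance types
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(labels)
)

messages = ["fix null pointer exception in parser", "refactor logging module"]
targets = torch.tensor([0, 2])  # illustrative label indices for the messages above

enc = tokenizer(messages, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
out = model(**enc, labels=targets)  # the model computes the classification loss internally
out.loss.backward()
optimizer.step()
print("training loss:", out.loss.item())
```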