• Title/Summary/Keyword: machine learning models

A Study on the Performance Degradation Pattern of Caisson-type Quay Wall Port Facilities (케이슨식 안벽 항만시설의 성능저하패턴 연구)

  • Na, Yong Hyoun;Park, Mi Yeon;Jang, Shinwoo
    • Journal of the Society of Disaster Information / v.18 no.1 / pp.146-153 / 2022
  • Purpose: Many domestic port structures that have been in service for a long time have problems with safety performance and functionality due to larger ships, more frequent use, and natural disasters driven by climate change. A big data analysis method was studied to develop an approximate model that can predict the aging pattern of a port facility from its maintenance history data. Method: Member-level maintenance history data for caisson-type quay walls were collected and defined as big data, and a predictive approximation model was derived from them to estimate the aging pattern and deterioration of the facility at the project level. A state-based aging pattern prediction model generated through Gaussian process (GP) and linear interpolation (SLPT) techniques was proposed, and models suitable for big data utilization were compared through validation. Result: In the suitability check of the proposed methods, the SLPT method showed RMSE values of 0.9215 and 0.0648, and the predictive model based on the SLPT method is considered suitable. Conclusion: Research on big data-based prediction of facility performance degradation is expected to become an important part of maintenance decision-making.
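
As a rough illustration of how an interpolation-based degradation model could be scored with RMSE, the sketch below fits a piecewise-linear curve to a few points and evaluates it on held-out points. The abstract does not define SLPT beyond "linear interpolation", so the functions `linear_interpolate` and `rmse`, and all age/grade values, are hypothetical and not the authors' method or data.

```python
import math
from bisect import bisect_right


def linear_interpolate(xs, ys, x):
    """Piecewise-linear interpolation of the points (xs, ys) at x.

    xs must be sorted in ascending order; values outside the range are
    clamped to the first or last known value.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1          # index of the segment containing x
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Hypothetical data: facility age (years) vs. condition grade (made up).
train_age = [0, 10, 20, 30]
train_grade = [5.0, 4.0, 3.0, 1.0]
test_age = [5, 15, 25]
test_grade = [4.6, 3.4, 2.1]

predicted = [linear_interpolate(train_age, train_grade, a) for a in test_age]
error = rmse(test_grade, predicted)
print(f"predicted grades: {predicted}, RMSE: {error:.4f}")
```

A lower RMSE on held-out inspection records would indicate the better candidate when two such models are compared.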

Development of prediction model identifying high-risk older persons in need of long-term care (장기요양 필요 발생의 고위험 대상자 발굴을 위한 예측모형 개발)

  • Song, Mi Kyung;Park, Yeongwoo;Han, Eun-Jeong
    • The Korean Journal of Applied Statistics / v.35 no.4 / pp.457-468 / 2022
  • In an aged society, it is important to prevent older people from developing disabilities that require long-term care. The purpose of this study is to develop a prediction model to identify high-risk groups who are likely to become beneficiaries of Long-Term Care Insurance. This is a retrospective study using previously collected National Health Insurance Service (NHIS) data on the study subjects. The study subjects are the 7,724,101 people over 65 years of age registered for medical insurance. To develop the prediction model, we used logistic regression, decision tree, random forest, and multi-layer perceptron neural network. Finally, random forest was selected as the prediction model based on the performance of the models in internal and external validation. Random forest could predict about 90% of the older people in need of long-term care using the database alone, without any information from the long-term care eligibility assessment. The findings might be useful in evidence-based health management for prevention services and can contribute to preemptively identifying older people who need preventive services.

Comparative Study of Anomaly Detection Accuracy of Intrusion Detection Systems Based on Various Data Preprocessing Techniques (다양한 데이터 전처리 기법 기반 침입탐지 시스템의 이상탐지 정확도 비교 연구)

  • Park, Kyungseon;Kim, Kangseok
    • KIPS Transactions on Software and Data Engineering / v.10 no.11 / pp.449-456 / 2021
  • An intrusion detection system detects abnormal behaviors and operations that violate security and prevents attacks on the system. Existing intrusion detection systems have been designed using statistical analysis or anomaly detection techniques for traffic patterns, but because of rapidly developing technologies, modern systems generate a variety of traffic that differs from that of earlier systems, so the existing methods have limitations. To overcome these limitations, research on intrusion detection methods that apply various machine learning techniques is being actively conducted. In this study, a comparative study was conducted on data preprocessing techniques that can improve the accuracy of anomaly detection, using NGIDS-DS (Next Generation IDS Database) generated by simulation equipment for traffic in various network environments. Padding and sliding window were used for data preprocessing, and an oversampling technique with an Adversarial Auto-Encoder (AAE) was applied to address the imbalance between the proportions of normal and abnormal data. In addition, an improvement in detection accuracy was confirmed when using Skip-gram, one of the Word2Vec techniques, to extract feature vectors from the preprocessed sequence data. PCA-SVM and GRU were used as models for the comparative experiments, and the experimental results showed better performance when sliding window, Skip-gram, AAE, and GRU were applied.
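
The two preprocessing options named in the abstract, padding and sliding window, can be sketched as follows. This is a minimal illustration under assumed conventions (right-padding with 0, window step of 1); the function names `pad_sequence` and `sliding_windows` and the sample call IDs are hypothetical and are not taken from the paper.

```python
def pad_sequence(seq, length, pad_value=0):
    """Truncate or right-pad seq so that it has exactly `length` items."""
    return list(seq[:length]) + [pad_value] * max(0, length - len(seq))


def sliding_windows(seq, size, step=1, pad_value=0):
    """Split seq into fixed-size windows.

    A sequence shorter than `size` yields a single padded window, so every
    returned window has the same length.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if len(seq) < size:
        return [pad_sequence(seq, size, pad_value)]
    return [list(seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]


# Hypothetical sequence of system-call IDs from one host process.
calls = [3, 7, 7, 12, 5]
windows = sliding_windows(calls, size=3)
print(windows)
```

Padding keeps one fixed-length sample per sequence, whereas the sliding window turns one long sequence into many overlapping samples, which is one plausible reason it helps a sequence model such as a GRU.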

Study on Zero-shot based Quality Estimation (Zero-Shot 기반 기계번역 품질 예측 연구)

  • Eo, Sugyeong;Park, Chanjun;Seo, Jaehyung;Moon, Hyeonseok;Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.12 no.11 / pp.35-43 / 2021
  • Recently, there has been growing interest in zero-shot cross-lingual transfer, which leverages cross-lingual language models (CLLMs) to perform downstream tasks in a language they were not trained on. In this paper, we point out the data-centric limitations of quality estimation (QE) and perform zero-shot cross-lingual transfer even in environments where it is difficult to construct QE data. Few studies have dealt with zero-shot transfer in QE; after fine-tuning on the English-German QE dataset, we perform zero-shot transfer leveraging CLLMs. We conduct a comparative analysis of various CLLMs. We also perform zero-shot transfer on language pairs with different resource sizes and analyze the results based on the linguistic characteristics of each language. Experimental results showed the highest performance with multilingual BART and multilingual BERT, and we enabled QE to be performed even when no QE training was done for a specific language pair.

Effective Capacity Planning of Capital Market IT System: Reflecting Sentiment Index (자본시장 IT시스템 효율적 용량계획 모델: 심리지수 활용을 중심으로)

  • Lee, Kukhyung;Kim, Miyea;Park, Jaeyoung;Kim, Beomsoo
    • Knowledge Management Research / v.23 no.1 / pp.89-109 / 2022
  • Due to COVID-19 and the soaring participation of individual investors, large-scale transactions exceeding system capacity limits have been reported frequently in the capital market. Capital market IT systems, for which the impact of a system failure is critical, encountered unexpectedly large transaction volumes in 2020, resulting in a sharp increase in system failures. Although many companies maintained large-scale system capacity planning policies, the recent transaction influx suggests that a new approach to capacity planning is required. Therefore, this study developed capital market IT system capacity planning models using machine learning techniques and analyzed their performance. In addition, the performance of the best proposed model was improved by using a sentiment index that can promptly reflect the behavior of investors. The model uses empirical data including the COVID-19 period, and has high performance and stability that can be used in practice. In practical terms, this study not only maximizes the cost-efficiency of a company but also presents optimal parameters in consideration of the practical constraints involved in changing the system. Additionally, by showing that the sentiment index can be used as a major variable in system capacity planning, it suggests that the sentiment index can be actively used for various other demand forecasting tasks.

Applicability Analysis on Estimation of Spectral Induced Polarization Parameters Based on Multi-objective Optimization (다중목적함수 최적화에 기초한 광대역 유도분극 변수 예측 적용성 분석)

  • Kim, Bitnarae;Jeong, Ju Yeon;Min, Baehyun;Nam, Myung Jin
    • Geophysics and Geophysical Exploration / v.25 no.3 / pp.99-108 / 2022
  • Among induced polarization (IP) methods, spectral IP (SIP) uses alternating current as a transmission source to measure the amplitude and phase of complex electrical resistivity at each source frequency, both of which vary with source frequency. This frequency dependence, which can be explained by a relaxation model such as the Cole-Cole model or equivalent models, is analyzed to estimate SIP parameters from dispersion curves of complex resistivity using multi-objective optimization (MOO). The estimation uses a genetic algorithm to optimize two objective functions that minimize the data misfits of amplitude and phase based on the Cole-Cole model, which is the most widely used model of IP relaxation effects. The MOO-based estimation properly recovered Cole-Cole model parameters for synthetic examples but hardly fitted real laboratory measurements, which have relatively small phase values (less than about 10 mrad). The data misfits of amplitude and phase, used as parameters of the MOO method, differ in scale, so other methods such as machine learning, which can deal with these discrepancies, need to be employed to estimate SIP parameters from dispersion curves of complex resistivity.
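
To make the two objective functions concrete, the sketch below evaluates the Cole-Cole model in its common Pelton form, rho(w) = rho0 * [1 - m * (1 - 1 / (1 + (i*w*tau)^c))], and returns separate RMS misfits for amplitude and phase. The exact misfit definitions, the frequencies, and all parameter values are assumptions for illustration; the optimizer itself (the genetic algorithm) is not shown.

```python
import cmath
import math


def cole_cole_resistivity(freq_hz, rho0, m, tau, c):
    """Complex resistivity of the Cole-Cole (Pelton) model at freq_hz.

    rho0: DC resistivity, m: chargeability, tau: time constant (s),
    c: frequency exponent.
    """
    omega = 2.0 * math.pi * freq_hz
    return rho0 * (1.0 - m * (1.0 - 1.0 / (1.0 + (1j * omega * tau) ** c)))


def misfits(params, freqs, obs_amp, obs_phase):
    """Return the two objectives: RMS misfit of amplitude and of phase (rad)."""
    rho0, m, tau, c = params
    amp_sq = phase_sq = 0.0
    for f, a, p in zip(freqs, obs_amp, obs_phase):
        z = cole_cole_resistivity(f, rho0, m, tau, c)
        amp_sq += (abs(z) - a) ** 2
        phase_sq += (cmath.phase(z) - p) ** 2
    n = len(freqs)
    return math.sqrt(amp_sq / n), math.sqrt(phase_sq / n)


# Synthetic "observations" generated from hypothetical true parameters.
freqs = [0.01, 0.1, 1.0, 10.0, 100.0]
true_params = (100.0, 0.5, 0.01, 0.5)
observed = [cole_cole_resistivity(f, *true_params) for f in freqs]
obs_amp = [abs(z) for z in observed]
obs_phase = [cmath.phase(z) for z in observed]

# A trial model with the wrong chargeability: compare the two misfit scales.
amp_misfit, phase_misfit = misfits((100.0, 0.3, 0.01, 0.5), freqs, obs_amp, obs_phase)
print(f"amplitude misfit: {amp_misfit:.3f} ohm-m, phase misfit: {phase_misfit:.4f} rad")
```

In this toy setup the amplitude misfit is in ohm-m while the phase misfit is in radians, so the two objectives can differ by orders of magnitude, which is the kind of scale discrepancy the abstract refers to.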

Quantitative Estimation Method for ML Model Performance Change, Due to Concept Drift (Concept Drift에 의한 ML 모델 성능 변화의 정량적 추정 방법)

  • Soon-Hong An;Hoon-Suk Lee;Seung-Hoon Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.6 / pp.259-266 / 2023
  • It is very difficult to measure the performance of a machine learning model once it is running in a business service. As a result, the operational department cannot manage model performance effectively. Academically, various studies have been conducted on concept drift detection methods to determine whether the model status is appropriate. The operational department wants to know the performance of the operating model quantitatively, but concept drift detection can only identify the state of the model in relation to the data; it cannot estimate the model's performance quantitatively. In this study, we propose a performance prediction model (PPM) that quantitatively estimates precision from the statistics of concept drift. The proposed model induces artificial drift in sample data extracted from the training data, measures the precision on the sample data, creates a dataset of drift and precision, and learns from it. Then, the difference between the actual precision and the predicted precision is compared on the test data to correct the error of the performance prediction model. The proposed PPM was applied to two models that can be used in real business, a loan underwriting model and a credit card fraud detection model. It was confirmed that the precision was effectively predicted.
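
A toy version of the idea (induce artificial drift, measure precision, then learn a mapping from a drift statistic to precision) might look like the following. Everything here is an assumption for illustration: the one-feature data, the threshold classifier standing in for the deployed model, the mean-shift drift statistic, and the least-squares line. The abstract does not specify these choices, and the error-correction step on test data is omitted.

```python
import random
import statistics


def precision(y_true, y_pred):
    """Precision = TP / (TP + FP); returns 0.0 if nothing is predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def model(x):
    """Stand-in for the deployed classifier: a fixed decision threshold."""
    return 1 if x > 1.0 else 0


rng = random.Random(0)
# Hypothetical sample: class 1 centred at 2.0, class 0 centred at 0.0.
labels = [i % 2 for i in range(400)]
features = [rng.gauss(2.0 if y else 0.0, 1.0) for y in labels]
base_mean = statistics.fmean(features)

drift_stats, precisions = [], []
for shift in [0.0, 0.5, 1.0, 1.5, 2.0]:
    # Artificial drift: move class-0 inputs toward the decision boundary.
    drifted = [x + shift if y == 0 else x for x, y in zip(features, labels)]
    drift_stats.append(abs(statistics.fmean(drifted) - base_mean))
    precisions.append(precision(labels, [model(x) for x in drifted]))

intercept, slope = fit_line(drift_stats, precisions)
# Estimate precision for a newly observed drift statistic without labels.
estimated_precision = intercept + slope * 0.4
print(f"estimated precision at drift 0.4: {estimated_precision:.3f}")
```

The point of the sketch is that the drift statistic can be computed from unlabeled production data, so the fitted mapping gives a precision estimate even when true labels are not yet available.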

Reliability of mortar filling layer void length in in-service ballastless track-bridge system of HSR

  • Binbin He;Sheng Wen;Yulin Feng;Lizhong Jiang;Wangbao Zhou
    • Steel and Composite Structures / v.47 no.1 / pp.91-102 / 2023
  • To study the evaluation standard and control limit of the mortar filling layer void length, a train sub-model was developed in MATLAB and a track-bridge sub-model considering the mortar filling layer void was established in ANSYS. The two sub-models were assembled into a train-track-bridge coupling dynamic model through the wheel-rail contact relationship, and its validity was corroborated by comparing the coupling dynamic model with a model from the literature. Considering the randomness of fastening stiffness, mortar elastic modulus, length of mortar filling layer void, and pier settlement, the test points were designed by the Box-Behnken method using Design-Expert software. The coupled dynamic model was calculated, and a support vector regression (SVR) nonlinear mapping model of the wheel-rail system was established. Learning, prediction, and verification were then carried out. Finally, the reliable probability of the amplification coefficient distribution of the response index of the train and structure in different ranges was obtained based on the SVR nonlinear mapping model and the Latin hypercube sampling method. The limit of the length of the mortar filling layer void was thus obtained. The results show that the SVR nonlinear mapping model developed in this paper has a high fitting accuracy of 0.993, and the computational efficiency is improved by 99.86%. It can be used to calculate the dynamic response of the wheel-rail system. The length of the mortar filling layer void significantly affects the wheel-rail vertical force, wheel weight load reduction ratio, rail vertical displacement, and track plate vertical displacement. The dynamic response of the track structure has a more significant effect on the limit value of the length of the mortar filling layer void than the dynamic response of the vehicle, and the rail vertical displacement has the most obvious effect. At train running speeds of 250 km/h to 350 km/h, the limit values of grades I, II, and III of the length of the mortar filling layer void are 3.932 m, 4.337 m, and 4.766 m, respectively. The results can provide a reference for the long-term service performance reliability of the ballastless track-bridge system of HSR.
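
The Latin hypercube sampling step mentioned above can be sketched in a few lines: each variable's range is divided into as many equal strata as there are samples, and every stratum is used exactly once per variable. The function `latin_hypercube` and the variable ranges below are hypothetical placeholders, not values from the paper; in a workflow like the one described, each sampled point would then be passed to the surrogate (SVR) model.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points with one draw per stratum in every dimension.

    bounds is a list of (low, high) pairs, one per random variable.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of the n_samples equal-width strata.
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that strata are paired randomly across dimensions.
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical ranges for the four random variables named in the abstract.
hypothetical_bounds = {
    "fastening_stiffness": (20.0, 60.0),
    "mortar_elastic_modulus": (5.0, 10.0),
    "void_length": (0.0, 5.0),
    "pier_settlement": (0.0, 10.0),
}
samples = latin_hypercube(10, list(hypothetical_bounds.values()), seed=42)
for point in samples[:3]:
    print(point)
```

Compared with plain random sampling, this stratification covers each variable's range evenly with fewer samples, which matters when every sample requires a model evaluation.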

A Groundwater Potential Map for the Nakdonggang River Basin (낙동강권역의 지하수 산출 유망도 평가)

  • Soonyoung Yu;Jaehoon Jung;Jize Piao;Hee Sun Moon;Heejun Suk;Yongcheol Kim;Dong-Chan Koh;Kyung-Seok Ko;Hyoung-Chan Kim;Sang-Ho Moon;Jehyun Shin;Byoung Ohan Shim;Hanna Choi;Kyoochul Ha
    • Journal of Soil and Groundwater Environment / v.28 no.6 / pp.71-89 / 2023
  • A groundwater potential map (GPM) was built for the Nakdonggang River Basin based on ten variables: hydrogeologic unit, fault-line density, depth to groundwater, distance to surface water, lineament density, slope, stream drainage density, soil drainage, land cover, and annual rainfall. To integrate the thematic layers for the GPM, the criteria were first weighted using the Analytic Hierarchy Process (AHP) and then overlaid using the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) model. Finally, the groundwater potential was categorized into five classes (very high (VH), high (H), moderate (M), low (L), very low (VL)) and verified by examining the specific capacity of individual wells in each class. The wells in the area categorized as VH showed the highest median specific capacity (5.2 m³/day/m), while the wells with specific capacity < 1.39 m³/day/m were distributed in the areas categorized as L or VL. The accuracy of the GPM generated in this work appeared acceptable, although the specific capacity data were not sufficient to verify the GPM across the large watershed studied. To create GPMs for determining high-yield well locations, the resolution and reliability of the thematic maps should be improved. Criterion values for groundwater potential should be established when machine learning or statistical models are used in the GPM evaluation process.
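
The overlay step can be illustrated with a compact TOPSIS implementation that takes criterion weights as given (for example, from an AHP pairwise comparison, which is not shown here). The three map cells, the three criteria, their values, the weights, and the benefit/cost directions are all hypothetical; the abstract does not report them.

```python
import math


def topsis(matrix, weights, benefit):
    """Score alternatives (rows) by relative closeness to the ideal solution.

    matrix[i][j]: value of criterion j for alternative i
    weights[j]:   criterion weight, with the weights summing to 1
    benefit[j]:   True if larger is better, False if smaller is better
    Assumes no criterion column is all zeros and the rows are not all equal.
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)   # distance to the ideal solution
        d_neg = math.dist(row, anti)    # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical cells: (lineament density, slope in degrees, annual rainfall in mm).
cells = [
    [0.8, 5.0, 1400.0],
    [0.4, 15.0, 1200.0],
    [0.1, 30.0, 1000.0],
]
weights = [0.5, 0.3, 0.2]            # assumed AHP output
benefit = [True, False, True]        # assumed: gentle slopes favour recharge
scores = topsis(cells, weights, benefit)
print([round(s, 3) for s in scores])
```

Scores near 1 mark cells close to the ideal combination of criteria; binning the scores into five intervals would give classes analogous to VH through VL.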

Mapping Mammalian Species Richness Using a Machine Learning Algorithm (머신러닝 알고리즘을 이용한 포유류 종 풍부도 매핑 구축 연구)

  • Zhiying Jin;Dongkun Lee;Eunsub Kim;Jiyoung Choi;Yoonho Jeon
    • Journal of Environmental Impact Assessment / v.33 no.2 / pp.53-63 / 2024
  • Biodiversity holds significant importance within the framework of environmental impact assessment, being utilized in site selection for development, understanding the surrounding environment, and assessing the impact on species due to disturbances. The field of environmental impact assessment has seen substantial research exploring new technologies and models to evaluate and predict biodiversity more accurately. While current assessments rely on data from fieldwork and literature surveys to gauge species richness indices, limitations in spatial and temporal coverage underscore the need for high-resolution biodiversity assessments through species richness mapping. In this study, leveraging data from the 4th National Ecosystem Survey and environmental variables, we developed a species distribution model using Random Forest. This model yielded mapping results of 24 mammalian species' distribution, utilizing the species richness index to generate a 100-meter resolution map of species richness. The research findings exhibited a notably high predictive accuracy, with the species distribution model demonstrating an average AUC value of 0.82. In addition, the comparison with National Ecosystem Survey data reveals that the species richness distribution in the high-resolution species richness mapping results conforms to a normal distribution. Hence, it stands as highly reliable foundational data for environmental impact assessment. Such research and analytical outcomes could serve as pivotal new reference materials for future urban development projects, offering insights for biodiversity assessment and habitat preservation endeavors.
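
One common way to turn per-species distribution maps into a richness map is to threshold each map and sum the results cell by cell; the sketch below shows that approach together with a rank-based AUC. Whether the study stacked maps this way is an assumption, and the grids, the 0.5 threshold, and the labels and scores are made-up illustrations.

```python
def species_richness(prob_maps, threshold=0.5):
    """Sum thresholded presence maps (one 2D grid per species) into a richness grid."""
    rows, cols = len(prob_maps[0]), len(prob_maps[0][0])
    richness = [[0] * cols for _ in range(rows)]
    for grid in prob_maps:
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] >= threshold:
                    richness[r][c] += 1
    return richness


def auc(labels, scores):
    """Area under the ROC curve as the share of (presence, absence) pairs
    ranked correctly, with ties counted as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical 2x2 occurrence-probability grids for two species.
maps = [
    [[0.9, 0.2], [0.6, 0.4]],
    [[0.7, 0.8], [0.1, 0.5]],
]
richness = species_richness(maps)
model_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
print(richness, model_auc)
```

Averaging such per-species AUC values over all modeled species would yield a summary figure comparable in kind to the average AUC reported in the abstract.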