• Title/Summary/Keyword: Prediction of variables


A Study on Artificial Intelligence Model for Forecasting Daily Demand of Tourists Using Domestic Foreign Visitors Immigration Data (국내 외래객 출입국 데이터를 활용한 관광객 일별 수요 예측 인공지능 모델 연구)

  • Kim, Dong-Keon;Kim, Donghee;Jang, Seungwoo;Shyn, Sung Kuk;Kim, Kwangsu
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.35-37 / 2021
  • Analyzing and predicting foreign tourist demand is a crucial research topic in the tourism industry because it strongly influences how tourism policies are established and planned. Because foreign tourist data are influenced by various external factors, they show many subtle changes over time. Recent research has therefore sought to design prediction models that reflect external factors, such as economic variables, to predict inbound tourist demand. However, the regression analysis models and recurrent neural network models mainly used for time series prediction have not performed well when the prediction must reflect many variables. We therefore design a foreign tourist demand prediction model that addresses these limitations using a convolutional neural network. In this paper, we propose a one-dimensional convolutional neural network that predicts foreign tourist demand, taking as input variables ten years of foreign tourist data provided by the Korea Tourism Organization together with additionally collected external factors.


Estimation of Forest Productive Area of Quercus acutissima and Quercus mongolica Using Site Environmental Variables (산림 입지토양 환경요인에 의한 상수리나무와 신갈나무의 적지추정)

  • Lee, Seung-Woo;Won, Hyung-Kyu;Shin, Man-Yong;Son, Young-Mo;Lee, Yoon-Young
    • Korean Journal of Soil Science and Fertilizer / v.40 no.5 / pp.429-434 / 2007
  • This study was conducted to estimate the site productivity of Quercus acutissima and Quercus mongolica in four forest climatic zones. We used site environmental variables (28 geographical and pedological factors) and site index as the site productivity indicator, drawn from 23,315 stands nationwide. Based on multiple regression analysis between site index and the major environmental variables, best-fit multivariate models were built for each species and forest climatic zone. Most of the site index prediction models by species were regressed on seven to eight factors, including altitude, relief, soil depth, and soil moisture. To validate these models, three evaluation statistics (mean difference, standard deviation of difference, and standard error of difference) were applied to the test data set. According to these statistics, the models by climatic zone and species fitted the test data set well, with relatively low bias and variation. Counting sites above the middle of the site index range as productive, the total area of productive sites for the two Quercus spp. estimated by these models would be about 6% of the total forest area. The northern and central temperate forest zones had more productive area than the southern temperate and warm temperate forest zones. It was concluded that regression-based prediction with site environmental variables by climatic zone and species is sufficiently capable of estimating forest site productivity.
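The three evaluation statistics named in this abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions: the difference is taken as observed minus predicted, the standard deviation is the sample standard deviation, and the standard error is that standard deviation divided by the square root of the sample size. The function name and the site index values are hypothetical.

```python
import math
import statistics


def difference_statistics(observed, predicted):
    """Return (mean difference, SD of differences, SE of the mean difference).

    Assumed definitions, not taken from the paper:
    difference = observed - predicted, SD = sample standard deviation,
    SE = SD / sqrt(n).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("at least two pairs are needed for a sample SD")
    diffs = [o - p for o, p in zip(observed, predicted)]
    mean_diff = statistics.mean(diffs)        # bias of the model
    sd_diff = statistics.stdev(diffs)         # spread of the errors
    se_diff = sd_diff / math.sqrt(len(diffs)) # precision of the bias estimate
    return mean_diff, sd_diff, se_diff


# Hypothetical site index values for a held-out test set.
observed = [12.0, 14.0, 10.0, 16.0]
predicted = [11.0, 15.0, 10.0, 15.0]
mean_diff, sd_diff, se_diff = difference_statistics(observed, predicted)
print(mean_diff, sd_diff, se_diff)
```

A mean difference near zero suggests low bias, while a small standard deviation of difference suggests low variation, which is how the abstract reads the two properties.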

Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes

  • Park, Chanwoo;Jiang, Nan;Park, Taesung
    • Genomics & Informatics / v.17 no.4 / pp.47.1-47.12 / 2019
  • The achievements of genome-wide association studies have suggested ways to predict diseases, such as type 2 diabetes (T2D), using single-nucleotide polymorphisms (SNPs). Most T2D risk prediction models have used SNPs in combination with demographic variables. However, it is difficult to evaluate the pure additive contribution of genetic variants to classically used demographic models. Since prediction models include some heritable traits, such as body mass index, the contribution of SNPs using unmatched case-control samples may be underestimated. In this article, we propose a method that uses propensity score matching to avoid underestimation by matching case and control samples, thereby determining the pure additive contribution of SNPs. To illustrate the proposed propensity score matching method, we used SNP data from the Korea Association Resources project and reported SNPs from the genome-wide association study catalog. We selected various SNP sets via stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and the elastic-net (EN) algorithm. Using these SNP sets, we made predictions using SLR, LASSO, and EN as logistic regression modeling techniques. The accuracy of the predictions was compared in terms of area under the receiver operating characteristic curve (AUC). The contribution of SNPs to T2D was evaluated by the difference in the AUC between models using only demographic variables and models that included the SNPs. The largest difference among our models showed that the AUC of the model using genetic variants with demographic variables could be 0.107 higher than that of the corresponding model using only demographic variables.
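The comparison described in this abstract, the AUC of a demographic-only model against the AUC of a model that adds SNPs, can be sketched as follows. This is only an illustration of the AUC difference, not of the propensity score matching step or of the authors' pipeline. The AUC is computed as the fraction of case-control pairs in which the case receives the higher score (ties count one half); the function name, labels, and scores are hypothetical.

```python
def auc(labels, scores):
    """AUC as the share of (case, control) pairs ranked correctly.

    labels: 1 for case, 0 for control. Ties contribute 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks from two models on the same six samples.
labels = [1, 1, 1, 0, 0, 0]
demographic_scores = [0.8, 0.4, 0.6, 0.5, 0.3, 0.7]
combined_scores = [0.9, 0.6, 0.7, 0.5, 0.2, 0.65]

auc_demo = auc(labels, demographic_scores)
auc_comb = auc(labels, combined_scores)
auc_gain = auc_comb - auc_demo  # contribution attributed to the added SNPs
print(auc_demo, auc_comb, auc_gain)
```

The pairwise loop is quadratic in the sample size, which is acceptable for a small illustration; a rank-based formulation would be preferable for large cohorts.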

Data Assimilation of Aeolus/ALADIN Horizontal Line-Of-Sight Wind in the Korean Integrated Model Forecast System (KIM 예보시스템에서의 Aeolus/ALADIN 수평시선 바람 자료동화)

  • Lee, Sihye;Kwon, In-Hyuk;Kang, Jeon-Ho;Chun, Hyoung-Wook;Seol, Kyung-Hee;Jeong, Han-Byeol;Kim, Won-Ho
    • Atmosphere / v.32 no.1 / pp.27-37 / 2022
  • The Korean Integrated Model (KIM) forecast system was extended to assimilate Horizontal Line-Of-Sight (HLOS) wind observations from the Atmospheric Laser Doppler Instrument (ALADIN) on board the Atmospheric Dynamics Mission (ADM)-Aeolus satellite. Quality control procedures were developed to assess the HLOS wind data quality, and observation operators were added to the KIM three-dimensional variational data assimilation system to support the new observed variables. In a global cycling experiment, assimilation of ALADIN observations reduced the average root-mean-square error of the zonal and meridional wind analyses by 2.1% and 1.3%, respectively, when compared against European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecast System (IFS) analyses. Even though the observed variable is wind, the assimilation of ALADIN observations had an overall positive impact on the analyses of other variables, such as temperature and specific humidity. As a result, the KIM 72-hour wind forecast fields were improved in the Southern Hemisphere poleward of 30 degrees.

A Study on Developing a Prediction Model of Patent Citation Counts (특허인용 예측모형 구축에 관한 연구)

  • Yoo, Jae-Bok;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.27 no.4 / pp.239-258 / 2010
  • The purpose of this study is to develop a prediction model of patent citation counts based on the major factors that affect patent citation. To this end, we performed multiple regression analysis between patent citation counts and five explanatory variables: the number of pages, the number of claims, the reference-average-citation rate, the strength of bibliographic coupling, and document similarity, each of which had been shown to have 5% or more standardized variance ($r^2$) with patent citation counts. The analysis used a test dataset of U.S. patents in five subject fields. Our prediction models showed 58.3% to 89.6% predictability depending on the subject field and revealed that document similarity has the highest impact on citation counts among the five predictive variables in all the subject fields. A comparison between the predicted citation counts and the actual ones confirmed the usefulness of the citation prediction models built for each subject field.

Dynamic forecasts of bankruptcy with Recurrent Neural Network model (RNN(Recurrent Neural Network)을 이용한 기업부도예측모형에서 회계정보의 동적 변화 연구)

  • Kwon, Hyukkun;Lee, Dongkyu;Shin, Minsoo
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.139-153 / 2017
  • Corporate bankruptcy can cause great losses not only to stakeholders but also to many related sectors of society. Bankruptcies have increased through successive economic crises, and bankruptcy prediction models have become more and more important. Corporate bankruptcy has therefore been regarded as one of the major research topics in business management, and many studies are also under way in industry. Previous studies used various statistical methodologies, such as Multivariate Discriminant Analysis (MDA) and the Generalized Linear Model (GLM), to improve bankruptcy prediction accuracy and to resolve the overfitting problem. More recently, researchers have used machine learning methodologies such as Support Vector Machine (SVM) and Artificial Neural Network (ANN), as well as fuzzy theory and genetic algorithms. As a result, many bankruptcy models have been developed and their performance has improved. In general, a company's financial and accounting information changes over time, as does the market situation, so it is difficult to predict bankruptcy using only information from a single point in time. Although traditional research thus fails to take the time effect into account, dynamic models have not been studied much. Ignoring the time effect biases the results, so a static model may not be suitable for predicting bankruptcy, and a dynamic model may improve bankruptcy prediction. In this paper, we propose a Recurrent Neural Network (RNN), one of the deep learning methodologies, which learns from time series data and is known to perform well. Before the experiment, we selected non-financial firms listed on the KOSPI, KOSDAQ, and KONEX markets from 2010 to 2016 to estimate the bankruptcy prediction model and compare forecasting performance. To avoid predicting bankruptcy from financial information that already reflects the deterioration of a company's financial condition, the financial information was collected with a lag of two years, and the default period was defined as January to December of the year. We defined bankruptcy as delisting due to sluggish earnings and confirmed delistings at KIND, a corporate stock information website. We then selected variables from previous papers. The first set consists of Z-score variables, which have become traditional variables in predicting bankruptcy; the second is a dynamic variable set. We finally selected 240 normal companies and 226 bankrupt companies for the first variable set, and 229 normal companies and 226 bankrupt companies for the second. We created a model that reflects dynamic changes in time-series financial data and, by comparing it with existing bankruptcy prediction models, found that it could help improve the accuracy of bankruptcy predictions. We used financial data from KIS Value (a financial database) and selected MDA, GLM (logistic regression), SVM, and ANN models as benchmarks. The experiment showed that the RNN performed better than the comparative models: its accuracy was high for both sets of variables, and its Area Under the Curve (AUC) value was also high. In the hit-ratio table, the rate at which the RNN predicted a distressed company to be bankrupt was also higher than that of the other comparative models. A limitation of this paper is that overfitting occurs during RNN training, but we expect to be able to solve this problem by selecting more training data and appropriate variables. From these results, we expect this research to contribute to the development of bankruptcy prediction by proposing a new dynamic model.

Development of the KOSPI (Korea Composite Stock Price Index) forecast model using neural network and statistical methods (신경 회로망과 통계적 기법을 이용한 종합주가지수 예측 모형의 개발)

  • Lee, Eun-Jin;Min, Chul-Hong;Kim, Tae-Seon
    • Journal of the Institute of Electronics Engineers of Korea CI / v.45 no.5 / pp.95-101 / 2008
  • Modeling stock price forecasts has been considered one of the most difficult problems to solve accurately, since stock prices are highly correlated with various environmental conditions, including the economic and political situation. In this paper, we propose an agent system approach to predict the Korea Composite Stock Price Index (KOSPI) using neural network and statistical methods. To minimize the mean and the variation of the prediction error, the agent system includes sub-agent modules for feature extraction, variable selection, forecast engine selection, and analysis of forecasting results. As a first step in developing the agent system for KOSPI forecasting, twelve economic indices are selected from twenty-two basic standard economic indices using principal component analysis. From the twelve selected indices, the input variables of the prediction model are then chosen using the best-subsets regression method. Two different types of data are tested for KOSPI forecasting, and the prediction results showed a root mean squared error of 11.92 points over thirty consecutive days of prediction. The proposed agent system approach is also shown to be effective for KOSPI forecasting: because the required types and numbers of prediction variables vary over time, adaptive selection of modeling inputs and prediction engine is essential for a reliable and accurate forecast model.
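The root mean squared error reported in this abstract is measured in index points over a run of consecutive forecast days. A minimal sketch of that metric is given below; the function name and the index values are hypothetical and are not taken from the paper.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error, in the same units as the inputs."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical daily index closes and forecasts (index points).
actual = [1000.0, 1010.0, 990.0, 1005.0]
forecast = [1003.0, 1006.0, 990.0, 1000.0]
error = rmse(actual, forecast)
print(error)
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, which is why it is often reported alongside the mean error.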

A study on the impact on predicted soil moisture based on machine learning-based open-field environment variables (머신러닝 기반 노지 환경 변수에 따른 예측 토양 수분에 미치는 영향에 대한 연구)

  • Gwang Hoon Jung;Meong-Hun Lee
    • Smart Media Journal / v.12 no.10 / pp.47-54 / 2023
  • As understanding sudden climate change and agricultural productivity becomes increasingly important due to global warming, soil moisture prediction is emerging as a key topic in agriculture. Soil moisture has a significant impact on crop growth and health, and proper management and accurate prediction are key factors in improving agricultural productivity and resource management. For this reason, soil moisture prediction is receiving great attention in the agricultural and environmental fields. In this paper, we collected open-field environmental data from a pilot field and analyzed them with random forest, a machine learning algorithm, to obtain the correlation between data characteristics and soil moisture and to compare the actual and predicted values of soil moisture. The comparison confirmed a prediction accuracy of about 92%. If future research carries out soil moisture prediction with crop growth data added as variables, key factors such as crop growth speed and appropriate irrigation timing according to soil moisture can be accurately controlled to increase crop quality and improve productivity and water management efficiency. This is expected to have a positive impact on resource efficiency.

TeGCN: Transformer-embedded Graph Neural Network for Thin-filer default prediction (TeGCN:씬파일러 신용평가를 위한 트랜스포머 임베딩 기반 그래프 신경망 구조 개발)

  • Seongsu Kim;Junho Bae;Juhyeon Lee;Heejoo Jung;Hee-Woong Kim
    • Journal of Intelligence and Information Systems / v.29 no.3 / pp.419-437 / 2023
  • As the number of thin filers in Korea surpasses 12 million, there is growing interest in assessing their credit default risk more accurately in order to generate additional revenue. Specifically, researchers are actively developing default prediction models using machine learning and deep learning algorithms, in contrast to traditional statistical default prediction methods, which struggle to capture nonlinearity. Among these efforts, the Graph Neural Network (GNN) architecture is noteworthy for predicting default in situations with limited data on thin filers, because it can incorporate network information between borrowers alongside conventional credit-related data. However, prior research employing graph neural networks has been limited in its ability to handle the diverse categorical variables present in credit information. In this study, we introduce the Transformer embedded Graph Convolutional Network (TeGCN), which aims to address these limitations and enable effective default prediction for thin filers. TeGCN combines the TabTransformer, capable of extracting contextual information from categorical variables, with the Graph Convolutional Network, which captures network information between borrowers. Our TeGCN model surpasses the baseline model's performance on both the general borrower dataset and the thin filer dataset. In particular, our model achieves outstanding results in thin filer default prediction. This study achieves high default prediction accuracy through a model structure tailored to the characteristics of credit information, which contains numerous categorical variables, especially in the context of thin filers with limited data. Our study can contribute to resolving the financial exclusion faced by thin filers and facilitate additional revenue within the financial industry.

A Prediction on the Pollution Level of Outdoor Insulator with Regression Analysis (회귀분석을 활용한 옥외 절연물의 오손도 예측)

  • 최남호;구경완;한상옥
    • The Transactions of the Korean Institute of Electrical Engineers C / v.52 no.3 / pp.137-143 / 2003
  • The degree of contamination on an outdoor insulator is one of the most important factors in determining the pollution level of outdoor insulation, and sea salt is known as the most dangerous pollutant. As shown in the preceding study, the generation of salt pollutant and the pollution degree of outdoor insulators are closely related to meteorological conditions such as wind velocity, wind direction, precipitation, and so forth. In this paper, we therefore investigated a prediction method, a statistical estimation technique for the equivalent salt deposit density of outdoor insulators based on multiple linear regression analysis. The results of the analysis demonstrated the superiority of the prediction method, in which the variables had a very close correlation coefficient (about 0.9). The results could be applied to establish a Pollution Prediction System for power utilities, and the system could provide invaluable information for the design and maintenance of outdoor insulation systems.