• Title/Summary/Keyword: Linear predictive model


Apartment Price Prediction Using Deep Learning and Machine Learning (딥러닝과 머신러닝을 이용한 아파트 실거래가 예측)

  • Hakhyun Kim;Hwankyu Yoo;Hayoung Oh
    • KIPS Transactions on Software and Data Engineering / v.12 no.2 / pp.59-76 / 2023
  • Since the COVID-19 era, apartment prices have risen in unusual ways, and in this uncertain real estate market, price prediction research is very important. In this paper, we build a vast dataset of 870,000 records from 2015 to 2020 by collecting and crawling data from various real estate sites and gathering as many variables as possible, and then create a model to predict future apartment transaction prices. This study first resolved the multicollinearity problem by removing and combining variables. Five variable selection methods were then used to extract meaningful independent variables: Forward Selection, Backward Elimination, Stepwise Selection, L1 Regularization, and Principal Component Analysis (PCA). Four machine learning and deep learning algorithms, namely a deep neural network (DNN), XGBoost, CatBoost, and Linear Regression, were trained after hyperparameter optimization, and their predictive power was compared. In an additional experiment, the number of nodes and layers of the DNN was varied to find the most appropriate configuration. Finally, the best-performing model was used to predict apartment transaction prices in 2021, and the predictions were compared with the actual 2021 data. Through this, I am confident that machine learning and deep learning can help investors make the right decisions when purchasing homes in various economic situations.
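
Of the five variable selection methods listed above, forward selection is the simplest to sketch. The block below is a minimal, pure-Python illustration, not the authors' implementation: it greedily adds whichever candidate variable most lowers the residual sum of squares of an ordinary least squares fit. The stopping rule (a minimum relative SSE reduction, `min_rel_gain`) is an assumed simplification; published studies usually stop on p-values or an information criterion. The helper names `ols_sse` and `forward_selection` are hypothetical.

```python
from typing import Dict, List, Sequence


def ols_sse(X: List[List[float]], y: Sequence[float]) -> float:
    """Fit ordinary least squares with an intercept via the normal equations
    and return the residual sum of squares (SSE)."""
    n = len(y)
    A = [[1.0] + list(row) for row in X]  # prepend intercept column
    m = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [
        [sum(A[i][r] * A[i][c] for i in range(n)) for c in range(m)]
        + [sum(A[i][r] * y[i] for i in range(n))]
        for r in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear predictors)")
        for r in range(m):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    beta = [M[r][m] / M[r][r] for r in range(m)]
    preds = [sum(b * a for b, a in zip(beta, row)) for row in A]
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds))


def forward_selection(columns: Dict[str, Sequence[float]], y: Sequence[float],
                      min_rel_gain: float = 0.01) -> List[str]:
    """Greedy forward selection: repeatedly add the candidate that lowers the
    SSE the most; stop when the relative SSE reduction drops below
    min_rel_gain (an assumed rule, not a p-value or AIC test)."""
    n = len(y)
    selected: List[str] = []
    current = ols_sse([[] for _ in range(n)], y)  # intercept-only model
    while current > 1e-12:
        best_name, best_sse = None, current
        for name in columns:
            if name in selected:
                continue
            X = [[columns[c][i] for c in selected + [name]] for i in range(n)]
            try:
                sse = ols_sse(X, y)
            except ValueError:
                continue  # skip candidates collinear with the current set
            if sse < best_sse:
                best_name, best_sse = name, sse
        if best_name is None or (current - best_sse) < min_rel_gain * current:
            break
        selected.append(best_name)
        current = best_sse
    return selected
```

Backward elimination and stepwise selection follow the same pattern with removal steps added; L1 regularization and PCA work differently and are not shown here.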

A study on the prediction of korean NPL market return (한국 NPL시장 수익률 예측에 관한 연구)

  • Lee, Hyeon Su;Jeong, Seung Hwan;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.123-139 / 2019
  • The Korean NPL market was formed by the government and foreign capital shortly after the 1997 IMF crisis. However, the market has a short history, as bad debt began to increase after the 2009 global financial crisis because of the recession in the real economy. In recent years, as domestic capital-market investment began to enter the NPL market in earnest, NPL has become a major investment asset. Although the domestic NPL market has received considerable attention because of its recent overheating, research on it remains insufficient because capital-market investment in the domestic NPL market has a short history. In addition, declining profitability and price fluctuations driven by the real estate business cycle call for decision-making based on more scientific and systematic analysis. In response to this market demand, we propose a prediction model that uses NPL market data to determine whether the benchmark yield will be achieved. To build the model, we used about four years of Korean NPL data, from December 2013 to December 2017, covering a total of 2,291 properties. From 11 variables describing the characteristics of the real estate, only those related to the dependent variable were selected as independent variables, using one-to-one t-tests, stepwise logistic regression, and a decision tree. Seven independent variables were selected: purchase year, SPC (Special Purpose Company), municipality, appraisal value, purchase cost, OPB (Outstanding Principal Balance), and HP (Holding Period). The dependent variable is a binary variable indicating whether the benchmark rate of return is reached.
A binary dependent variable was used because models predicting binary variables are more accurate than models predicting continuous variables, and this accuracy is directly related to a model's effectiveness. In addition, for a special purpose company the main concern is whether or not to purchase the property, so knowing whether a certain level of return will be achieved is enough to make a decision. To check whether 12%, the standard rate of return used in the industry, is a meaningful reference value, we calculated the dependent variable at different threshold values and constructed and compared the resulting predictive models. The predictive model built on the dependent variable defined by the 12% standard rate of return had the best average hit ratio, at 64.60%. To propose an optimal prediction model based on this dependent variable and the 7 independent variables, we constructed and compared prediction models using five methodologies: discriminant analysis, logistic regression analysis, decision tree, artificial neural network, and genetic algorithm linear model. For this purpose, 10 sets of training and testing data were extracted using 10-fold cross-validation. After a model was built on each set, the hit ratios of the sets were averaged and the performance was compared. The average hit ratios of the models built with discriminant analysis, logistic regression, decision tree, artificial neural network, and genetic algorithm linear model were 64.40%, 65.12%, 63.54%, 67.40%, and 60.51%, respectively, confirming that the artificial neural network model performs best. This study demonstrates the effectiveness of using the 7 independent variables and an artificial neural network prediction model in the future NPL market.
The proposed model predicts in advance whether new properties will achieve the 12% return, which will help special purpose companies make investment decisions. Furthermore, we anticipate that the NPL market will become more active as transactions proceed at appropriate prices.
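
The evaluation procedure described above (a binary target defined by a 12% benchmark return, then the hit ratio averaged over 10 folds) can be sketched as follows. This is a minimal illustration under stated assumptions: folds are contiguous, so data should be shuffled first, and `fit_majority` is a hypothetical stand-in classifier that only marks where a discriminant analysis, logistic regression, decision tree or neural network model would plug in.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], int]


def make_labels(returns: Sequence[float], benchmark: float = 0.12) -> List[int]:
    """Binary target: 1 if the realized return reaches the benchmark, else 0."""
    return [1 if r >= benchmark else 0 for r in returns]


def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Split range(n) into k contiguous folds whose sizes differ by at most one.
    Shuffle the data beforehand if its order is not random."""
    if n < k:
        raise ValueError("need at least k observations")
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cv_hit_ratio(X, y, fit, k: int = 10) -> float:
    """Average hit ratio (share of correct predictions) over k test folds.
    fit(train_X, train_y) must return a predict(x) function."""
    ratios = []
    for test_idx in kfold_indices(len(y), k):
        test = set(test_idx)
        train = [i for i in range(len(y)) if i not in test]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(1 for i in test_idx if predict(X[i]) == y[i])
        ratios.append(hits / len(test_idx))
    return sum(ratios) / len(ratios)


def fit_majority(train_X, train_y) -> Predictor:
    """Hypothetical baseline: always predicts the majority class of the training fold."""
    majority = 1 if 2 * sum(train_y) >= len(train_y) else 0
    return lambda x: majority
```

Rerunning `make_labels` with other `benchmark` values and comparing the resulting `cv_hit_ratio` scores mirrors the threshold comparison used to justify the 12% reference value.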

Prediction of BaP and Total PAH in Soil from Pyr Concentration using Regression Analysis (회귀분석을 통한 토양 내 Pyr 농도로부터 BaP와 총 PAH의 예측기법)

  • Lee, Woo-Bum;Kim, Jongo
    • Journal of Korean Society of Environmental Engineers / v.39 no.3 / pp.118-123 / 2017
  • This study investigated the feasibility of a statistical approach for predicting BaP and total PAHs of pyrogenic origin. The regression analysis showed excellent linear and multiple correlations ($r^2$ > 0.94) between BaP (or ${\Sigma}PAH$) and Pyr concentrations. When the developed prediction equation was applied to other investigations as validation and application studies, outstanding prediction results were obtained. The predictive model showed very good correlation between the measured and calculated ${\Sigma}PAH$. These equations suggest that Pyr is an important hydrocarbon for predicting PAHs. This model might provide a potentially useful tool for calculating average BaP and ${\Sigma}PAH$ in a given region without additional tests.
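
A single-predictor prediction equation of this kind is an ordinary least-squares line, and the reported $r^2$ is the squared Pearson correlation between the two concentrations. The sketch below shows that computation in plain Python; it is an illustration only and does not reproduce the published coefficients. The function name `fit_line` is hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2
```

Given paired measurements in hypothetical arrays `pyr` and `bap`, `fit_line(pyr, bap)` yields a prediction equation for BaP from Pyr together with its $r^2$.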

Model Development for Estimating Total Arsenic Contents with Chemical Properties and Extractable Heavy Metal Contents in Paddy Soils (논토양의 이화학적 특성 및 침출성 중금속 함량을 이용한 비소의 전함량 예측)

  • Lee, Jeong-Mi;Go, Woo-Ri;Kunhikrishnan, Anitha;Yoo, Ji-Hyock;Kim, Ji-Young;Kim, Doo-Ho;Kim, Won-Il
    • Korean Journal of Soil Science and Fertilizer / v.45 no.6 / pp.920-924 / 2012
  • This study was performed to estimate the total arsenic (As) content of paddy soil adjacent to abandoned mines by stepwise multiple-regression analysis using soil chemical properties and extractable metal contents. The soil was collected from paddies near abandoned mines. Soil pH, electrical conductivity (EC), organic matter (OM), available phosphorus ($P_2O_5$), and exchangeable cations (Ca, K, Mg, Na) were measured. Total As and extractable metal contents were analyzed by ICP-OES. The stepwise analysis showed that extractable As, available phosphorus, extractable Cu, exchangeable K, exchangeable Na, and organic matter significantly influenced the total As content of the soil (p<0.001). The multiple linear regression model was established as Log (Total-As) = 0.741 + 0.716 Log (extractable-As) - 0.734 Log (avail-$P_2O_5$) + 0.334 Log (extractable-Cu) + 0.186 Log (exchangeable-K) - 0.593 Log (exchangeable-Na) + 0.558 Log (OM). The estimated total As content was significantly correlated with the measured value in soil ($R^2$=0.84196, p<0.0001). This predictive model can be applied to the numerous datasets of extractable heavy metal contents surveyed under the Soil Environmental Conservation Act before 2010.
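
Applying the published regression equation to new survey records is a direct translation of the formula above. The sketch assumes base-10 logarithms and inputs in the units of the original study, neither of which is stated in the abstract; the function and argument names are hypothetical.

```python
import math


def predict_total_as(extractable_as: float, avail_p2o5: float, extractable_cu: float,
                     exchangeable_k: float, exchangeable_na: float, om: float) -> float:
    """Total As from the published stepwise regression (base-10 logs assumed).
    All inputs must be positive and in the units used by the original study."""
    log_total = (
        0.741
        + 0.716 * math.log10(extractable_as)
        - 0.734 * math.log10(avail_p2o5)
        + 0.334 * math.log10(extractable_cu)
        + 0.186 * math.log10(exchangeable_k)
        - 0.593 * math.log10(exchangeable_na)
        + 0.558 * math.log10(om)
    )
    return 10 ** log_total
```

Because the model is log-log, each coefficient acts as an elasticity regardless of log base: doubling extractable As, with the other inputs fixed, multiplies the predicted total As by $2^{0.716} \approx 1.64$.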

Family of Cascade-correlation Learning Algorithm (캐스케이드-상관 학습 알고리즘의 패밀리)

  • Choi Myeong-Bok;Lee Sang-Un
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.1 / pp.87-91 / 2005
  • The cascade-correlation (CC) learning algorithm of Fahlman and Lebiere is one of the most influential constructive algorithms for neural networks. Cascading the hidden neurons produces a network that can represent very strong nonlinearities. Although this power is useful in principle, it can be a disadvantage when such strong nonlinearity is not required to solve the problem. Three models are presented and compared empirically. All of them are based on variants of the cascade architecture and of the output-neuron weight training of the CC algorithm. The empirical results indicate the following: (1) in pattern classification, the model that trains only the connection weights from the new hidden neuron to the output layer shows the best predictive ability; (2) in function approximation, the model that removes the input-output connections and uses a sigmoid-linear activation function has better predictive ability than the CasCor algorithm.
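
In Fahlman and Lebiere's original algorithm, each candidate hidden unit is trained to maximize $S = \sum_o \left| \sum_p (V_p - \bar{V})(E_{p,o} - \bar{E}_o) \right|$, the summed magnitude of the covariance between the candidate's output $V_p$ and the residual output errors $E_{p,o}$ over training patterns $p$. The sketch below only evaluates that score for one candidate; the training loop and the three variants compared in this paper are not reproduced, and the function name is hypothetical.

```python
from typing import Sequence


def candidate_score(v: Sequence[float], errors: Sequence[Sequence[float]]) -> float:
    """Cascade-correlation candidate score S = sum_o |sum_p (V_p - mean V)(E_po - mean E_o)|.
    v[p]: candidate unit output on pattern p; errors[p][o]: residual error of output o."""
    n = len(v)
    v_mean = sum(v) / n
    score = 0.0
    for o in range(len(errors[0])):
        e_mean = sum(errors[p][o] for p in range(n)) / n
        cov = sum((v[p] - v_mean) * (errors[p][o] - e_mean) for p in range(n))
        score += abs(cov)  # sign is ignored: only the magnitude matters
    return score
```

Among a pool of candidates, the one with the largest score is frozen and installed as the next cascaded hidden neuron.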

A Comparative Study of Speech Parameters for Speech Recognition Neural Network (음성 인식 신경망을 위한 음성 파라키터들의 성능 비교)

  • Kim, Ki-Seok;Im, Eun-Jin;Hwang, Hee-Yung
    • The Journal of the Acoustical Society of Korea / v.11 no.3 / pp.61-66 / 1992
  • Many studies have applied neural network models to automatic speech recognition, but the main focus has been on finding neural network models and learning rules suited to the task. However, the choice of input speech parameters for the neural network, as well as the neural network model itself, is a very important factor in improving the performance of automatic speech recognition systems that use neural networks. In this paper, we select 6 speech parameters from a survey of speech recognition papers that use neural networks and analyze their performance on the same data with the same neural network model. We use 8 sets of 9 Korean plosives and 18 sets of 8 Korean vowels. We use a recurrent neural network and compare the performance of the 6 speech parameters while keeping the number of nodes constant. The delta cepstrum of the linear predictive coefficients showed the best results, with recognition rates of 95.1% for the vowels and 100.0% for the plosives.
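
The winning feature, the delta cepstrum of linear predictive coefficients, is built in three textbook steps: LPC coefficients from each frame's autocorrelation via the Levinson-Durbin recursion, conversion of LPC to cepstral coefficients with the standard recursion, and a regression-based time derivative across frames. The sketch below follows those standard formulas; the frame length, LPC order, number of cepstra and delta width (`width=2`) are assumptions, since the paper's settings are not given here.

```python
from typing import List, Sequence


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Unnormalized autocorrelation r[0..max_lag] of one (windowed) speech frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Predictor coefficients a_1..a_p for x[n] ~ sum_k a_k x[n-k] (Levinson-Durbin)."""
    a = [0.0] * (order + 1)  # a[0] unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # perfect prediction reached; remaining coefficients stay 0
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """LPC cepstrum c_1..c_N of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def delta(frames: Sequence[Sequence[float]], width: int = 2) -> List[List[float]]:
    """Regression-based delta of each coefficient over time (edges padded by repetition)."""
    T = len(frames)
    denom = 2 * sum(th * th for th in range(1, width + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[t])):
            num = sum(
                th * (frames[min(t + th, T - 1)][d] - frames[max(t - th, 0)][d])
                for th in range(1, width + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out
```

Per frame, `lpc_to_cepstrum(levinson_durbin(autocorrelation(frame, p), p), n_ceps)` gives the cepstral vector; passing the sequence of those vectors to `delta` gives the delta cepstrum that would be fed to the recurrent network.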


Prediction of PAHs Concentration using Statistical Analysis for Soil Recycling (토양 재활용을 위한 통계적 분석의 PAHs 농도 예측)

  • Kim, Jongo;Lee, Manseung
    • Resources Recycling / v.26 no.4 / pp.56-61 / 2017
  • This study investigated the feasibility of a statistical approach for soil recycling through the prediction of BaP, DahA and total PAH (${\Sigma}PAH$) concentrations from BaA concentration. The regression analysis showed excellent linear correlations ($R^2$ > 0.90) between BaA and BaP (or DahA) concentrations. When the developed prediction equation was applied to other investigations as a validation study, good prediction results were obtained. The predictive model showed very good correlation between the measured and calculated BaP. These equations suggest that BaA is an important hydrocarbon for predicting PAHs. This model might provide a potentially useful tool for calculating average BaP, DahA and ${\Sigma}PAH$ without additional tests.
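
The validation step described above amounts to applying fixed prediction equations to an independent dataset and checking how well the calculated values track the measured ones. The sketch assumes each target (for example BaP or DahA) has a linear equation in BaA given as a `(slope, intercept)` pair; the published coefficients are not reproduced, and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def predict_from_baa(baa: Sequence[float],
                     equations: Dict[str, Tuple[float, float]]) -> Dict[str, List[float]]:
    """Apply target = intercept + slope * BaA; equations maps target -> (slope, intercept)."""
    return {name: [b0 + b1 * x for x in baa] for name, (b1, b0) in equations.items()}


def r_squared(measured: Sequence[float], calculated: Sequence[float]) -> float:
    """Squared Pearson correlation between measured and calculated values."""
    n = len(measured)
    mm, mc = sum(measured) / n, sum(calculated) / n
    smc = sum((a - mm) * (b - mc) for a, b in zip(measured, calculated))
    smm = sum((a - mm) ** 2 for a in measured)
    scc = sum((b - mc) ** 2 for b in calculated)
    return smc ** 2 / (smm * scc)
```

A high `r_squared` between measured BaP and the BaP calculated from BaA on data not used for fitting is the kind of evidence the validation study reports.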

Analysis of Outliers in Real Estate Prices Using Autoencoder (Autoencoder 기법을 활용한 부동산 가격 이상치 분석)

  • Kim, Yoonseo;Park, Jongchan;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.12 / pp.1739-1748 / 2021
  • Real estate prices affect countries, businesses, and households, and many studies have examined a possible real estate bubble amid recently soaring prices. However, a bubble prediction model that simply compares real estate prices, or that does not reflect key psychological variables in real estate transactions, is likely to have poor accuracy. The purpose of this study is to design a predictive model that can explain the real estate bubble situation by region using the autoencoder technique. Existing real estate bubble studies did not account for the various types of variables that affect prices, and most were based on linear models. This study therefore suggests the possibility of introducing techniques and variables that have not been used in existing real estate bubble research.
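
A common way to use an autoencoder for outlier analysis is to flag observations it reconstructs poorly. The sketch below shows only that scoring step: `reconstruct` stands for a trained autoencoder's encode-then-decode pass (training is not shown), and the mean plus `k` standard deviations threshold is an assumed rule rather than the one used in this study.

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

Reconstructor = Callable[[Sequence[float]], Sequence[float]]


def reconstruction_errors(rows: Sequence[Sequence[float]],
                          reconstruct: Reconstructor) -> List[float]:
    """Mean squared reconstruction error of each row."""
    return [mean([(a - b) ** 2 for a, b in zip(row, reconstruct(row))]) for row in rows]


def flag_outliers(rows: Sequence[Sequence[float]], reconstruct: Reconstructor,
                  k: float = 3.0) -> Tuple[List[int], float]:
    """Flag rows whose error exceeds mean + k * std of all errors (assumed rule)."""
    errs = reconstruction_errors(rows, reconstruct)
    threshold = mean(errs) + k * pstdev(errs)
    return [i for i, e in enumerate(errs) if e > threshold], threshold
```

Because the autoencoder is trained mostly on typical regional price patterns, rows it cannot reproduce well stand out with large errors; a nonlinear autoencoder can capture structure that the linear models of earlier bubble studies miss.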

Data Analysis and Mining for Fish Growth Data in Fish-Farms (양식장 어류 생육 데이터 분석 및 마이닝)

  • Seoung-Bin Ye;Jeong-Seon Park;Soon-Hee Han;Hyi-Thaek Ceong
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.1 / pp.127-142 / 2023
  • Managing size and weight, the key growth information of farmed fish, is the most basic goal in fish-farm operation. In this study, an epoch in a fish farm is defined as the period from stocking or dividing to shipment, and growth data for a total of three epochs are analyzed from a time-series perspective. Growth information such as the size and weight of farmed fish over time is compared and analyzed against water-quality and feeding information, and a model is presented based on the analysis results. Specifically, linear, exponential, and logarithmic regression models of size and weight by epoch are presented using the Box-Jenkins method with data obtained in the field.
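
The three growth-curve forms named above can each be fitted by ordinary least squares after a variable transformation: $y = a + bt$ directly, $y = a e^{bt}$ by regressing $\ln y$ on $t$, and $y = a + b \ln t$ by regressing $y$ on $\ln t$. The sketch below does this and reports each model's SSE on the original scale for comparison. It is an illustration only; it does not implement the Box-Jenkins identification procedure, and the function names are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def _ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Simple least squares y = a + b x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def _sse(t: Sequence[float], y: Sequence[float], f: Callable[[float], float]) -> float:
    return sum((yi - f(ti)) ** 2 for ti, yi in zip(t, y))


def fit_growth_models(t: Sequence[float], y: Sequence[float]) -> Dict[str, Tuple[float, float, float]]:
    """Fit linear, exponential and logarithmic growth curves by least squares on
    transformed variables; returns {model: (a, b, sse_on_original_scale)}.
    Requires y > 0 (exponential) and t > 0 (logarithmic)."""
    models = {}
    a, b = _ols(t, y)  # y = a + b t
    models["linear"] = (a, b, _sse(t, y, lambda ti: a + b * ti))
    ln_a, b = _ols(t, [math.log(v) for v in y])  # ln y = ln a + b t
    a = math.exp(ln_a)
    models["exponential"] = (a, b, _sse(t, y, lambda ti: a * math.exp(b * ti)))
    a, b = _ols([math.log(ti) for ti in t], y)  # y = a + b ln t
    models["logarithmic"] = (a, b, _sse(t, y, lambda ti: a + b * math.log(ti)))
    return models
```

Note that the exponential fit minimizes error in log space, so its SSE on the original scale is not guaranteed to be the smallest achievable; a nonlinear fit would be needed for that.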

Formulations of Job Strain and Psychological Distress: A Four-year Longitudinal Study in Japan

  • Mayumi Saiki;Timothy A. Matthews;Norito Kawakami;Wendie Robbins;Jian Li
    • Safety and Health at Work / v.15 no.1 / pp.59-65 / 2024
  • Background: Different job strain formulations based on the Job Demand-Control model have been developed. This study evaluated longitudinal associations between job strain and psychological distress and whether associations were influenced by six formulations of job strain, including quadrant (original and simplified), subtraction, quotient, logarithm quotient, and quartile based on quotient, in randomly selected Japanese workers. Methods: Data were from waves I and II of the Survey of Midlife in Japan (MIDJA), with a 4-year follow-up period. The study sample consisted of 412 participants who were working at baseline and had complete data on the variables of interest. Associations between job strain at baseline and psychological distress at follow-up were assessed via multivariable linear regression; results were expressed as β coefficients and 95% confidence intervals, and model fit was evaluated with R2 and the Akaike information criterion (AIC). Results: Crude models revealed that job strain formulations explained 6.93-10.30% of variance. The AIC ranged from 1475.87 to 1489.12. After accounting for sociodemographic and behavioral factors and psychological distress at baseline, fully-adjusted models indicated significant associations between all job strain formulations at baseline and psychological distress at follow-up: original quadrant (β: 1.16, 95% CI: 0.12, 2.21), simplified quadrant (β: 1.01, 95% CI: 0.18, 1.85), subtraction (β: 0.39, 95% CI: 0.09, 0.70), quotient (β: 0.37, 95% CI: 0.08, 0.67), logarithm quotient (β: 0.42, 95% CI: 0.12, 0.72), and quartile based on quotient (β: 1.22, 95% CI: 0.36, 2.08). Conclusion: Six job strain formulations showed robust predictive power regarding psychological distress over 4 years among Japanese workers.
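
To make the formulations concrete, the sketch below computes several of them from per-worker demand and control scores using common textbook definitions: a median-split quadrant, demands minus control, demands divided by control, its logarithm, and the quartile of the quotient. These definitions are assumptions: the study's exact scale scoring, any rescaling applied before subtraction or division, and its "simplified quadrant" rule are not given in the abstract, so the simplified quadrant is omitted. Control scores must be positive.

```python
import math
from bisect import bisect_right
from statistics import median, quantiles
from typing import Dict, List, Sequence


def strain_formulations(demands: Sequence[float],
                        control: Sequence[float]) -> List[Dict[str, object]]:
    """Compute several job-strain formulations per worker (illustrative definitions)."""
    med_d, med_c = median(demands), median(control)
    quotients = [d / c for d, c in zip(demands, control)]
    cuts = quantiles(quotients, n=4)  # three cut points of the quotient
    rows = []
    for d, c, q in zip(demands, control, quotients):
        high_d, high_c = d > med_d, c > med_c  # median split; ties count as "low"
        if high_d and not high_c:
            quadrant = "high strain"
        elif high_d and high_c:
            quadrant = "active"
        elif not high_d and not high_c:
            quadrant = "passive"
        else:
            quadrant = "low strain"
        rows.append({
            "quadrant": quadrant,
            "subtraction": d - c,
            "quotient": q,
            "log_quotient": math.log(q),
            "quotient_quartile": bisect_right(cuts, q) + 1,  # 1 (lowest) .. 4 (highest)
        })
    return rows
```

Each formulation would then enter a separate multivariable linear regression of follow-up distress, which is how the study compares their R2 and AIC.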