• Title/Summary/Keyword: Data prediction

Using Machine Learning Algorithms for Housing Price Prediction: The Case of Islamabad Housing Data

  • Imran, Imran;Zaman, Umar;Waqar, Muhammad;Zaman, Atif
    • Soft Computing and Machine Intelligence / v.1 no.1 / pp.11-23 / 2021
  • House price prediction informs significant financial decisions for people working in the housing market as well as for potential buyers. Whether investing or buying a house to live in, a person entering the housing market is interested in the potential gain. This paper applies machine learning algorithms to develop intelligent regression models for house price prediction. The proposed research methodology consists of four stages: data collection; preprocessing the collected data and transforming it into the most suitable format; developing intelligent models using machine learning algorithms; and training, testing, and validating the models on house prices in the capital, Islamabad. The data used for model validation and testing are asking prices from online property stores, which provide a reasonable estimate of the city's housing market. The prediction model can significantly assist in predicting future housing prices in Pakistan. The regression results are encouraging and point to promising directions for future prediction work on the collected dataset.
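The train/test regression workflow described in this abstract can be illustrated with a minimal sketch. It assumes a single feature (floor area) and ordinary least squares; the paper's actual features, algorithms, and data are not reproduced here, and the listings below are hypothetical values made up for illustration.

```python
import math
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_simple_linear(rows):
    """Closed-form least squares for price = a + b * area."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def rmse(model, rows):
    """Root-mean-square error of the fitted line on the given rows."""
    a, b = model
    return math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in rows) / len(rows))

# Hypothetical (area in square feet, asking price in millions of PKR) pairs;
# they are illustrative only and do not come from the paper's dataset.
listings = [(1000, 10.2), (1200, 12.1), (1500, 15.3), (1800, 17.9),
            (2000, 20.4), (2200, 21.8), (2500, 25.1), (3000, 30.2)]

train, test = train_test_split(listings)
model = fit_simple_linear(train)
print("intercept=%.3f slope=%.5f test RMSE=%.3f" % (model[0], model[1], rmse(model, test)))
```

Holding out a test split before fitting mirrors the abstract's separation of training from testing and validation; a real study would add more features and cross-validation.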

Severity-based Software Quality Prediction using Class Imbalanced Data

  • Hong, Euy-Seok;Park, Mi-Kyeong
    • Journal of the Korea Society of Computer and Information / v.21 no.4 / pp.73-80 / 2016
  • Most fault prediction models suffer from class imbalance because training data usually contain many more non-faulty modules than faulty ones. This imbalanced distribution makes it difficult for the models to learn from the minority-class data. The imbalance is even more severe in severity-based fault prediction, because high-severity fault modules are a small subset of the fault modules. In this paper, we propose severity-based models that address these problems using three sampling methods: Resample, SpreadSubSample, and SMOTE. Empirical results show that the Resample method suffers from typical overfitting problems and that the SpreadSubSample method cannot improve the prediction performance of the models. Unlike these two methods, SMOTE performs well in terms of AUC and FNR (false negative rate). In particular, the J48 decision tree model using SMOTE outperforms the other prediction models.
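The core idea of SMOTE, synthesizing new minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified variant (the base sample is drawn at random rather than iterating over every sample), not the Weka implementation used in the paper, and the module metric vectors are hypothetical.

```python
import math
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority-class samples in the style of SMOTE.

    For each synthetic sample: pick a random minority sample, pick one of its
    k nearest minority neighbours, and place a point at a random position on
    the segment between them.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest neighbours of `base` among the other minority samples (Euclidean).
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation position in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical (lines of code, cyclomatic complexity) vectors for high-severity fault modules.
fault_modules = [(120.0, 8.0), (150.0, 11.0), (90.0, 6.0), (200.0, 14.0), (170.0, 12.0)]
new_samples = smote(fault_modules, n_synthetic=10, k=2)
print(new_samples[:3])
```

Because every synthetic point lies between two real minority samples, SMOTE enlarges the minority region instead of duplicating points, which is one reason it tends to overfit less than plain resampling.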

Study on the Demand Prediction for Transportation System Utilizing Data Granulization (Data Granulization을 이용한 수송수요예측에 관한 연구)

  • 이덕규;홍태화;김학배;우광방
    • Proceedings of the KSR Conference / 1998.05a / pp.211-218 / 1998
  • Demand prediction is an essential means of using limited traffic facilities efficiently and of providing optimized schedules for a transportation system. It is one of the critical, complex management schemes for distributing transportation-service resources by means of a computer system. The prediction model is constructed using data granulization, followed by processing the raw input data and evaluating the predicted output values. Conventional prediction models, which rely only on a sequence of past data, would also need to incorporate a large number of socioeconomic parameters. The proposed prediction models are classified by their static and dynamic characteristics, and their performance is evaluated through computer simulation.
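The abstract does not define data granulization precisely. The sketch below assumes one simple reading: raw demand values are discretized into equal-width intervals (granules), and the next value is predicted as the centre of the granule that most often followed the current one. The functions and the demand series are hypothetical illustrations, not the authors' model.

```python
from collections import Counter, defaultdict

def granulize(values, n_granules):
    """Map each raw value to the index of an equal-width interval (granule)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_granules or 1.0  # guard against a constant series
    # The maximum value is clamped into the last granule.
    granules = [min(int((v - lo) / width), n_granules - 1) for v in values]
    return granules, lo, width

def fit_transitions(granules):
    """Count which granule follows which in the historical sequence."""
    table = defaultdict(Counter)
    for current, nxt in zip(granules, granules[1:]):
        table[current][nxt] += 1
    return table

def predict_next(table, current, lo, width):
    """Predict the centre of the most frequent successor granule."""
    if not table[current]:
        nxt = current  # no history: assume demand stays in the same granule
    else:
        nxt = table[current].most_common(1)[0][0]
    return lo + (nxt + 0.5) * width

# Hypothetical daily passenger demand, in thousands.
demand = [12, 15, 14, 20, 22, 21, 13, 16, 15, 21, 23]
granules, lo, width = granulize(demand, n_granules=3)
table = fit_transitions(granules)
forecast = predict_next(table, granules[-1], lo, width)
print(granules, round(forecast, 2))
```

Working on granules rather than raw values trades precision for robustness to noise; the number of granules controls that trade-off.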

Defect Type Prediction Method in Manufacturing Process Using Data Mining Technique (데이터마이닝 기법을 이용한 제조 공정내의 불량항목별 예측방법)

  • Byeon Sung-Kyu;Kang Chang-Wook;Sim Seong-Bo
    • Journal of Korean Society of Industrial and Systems Engineering / v.27 no.2 / pp.10-16 / 2004
  • Data mining is the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules. This paper uses a data mining technique to predict defect types in a manufacturing process. The purpose of this paper is to model the recognition of defect-type patterns and to predict each defect type before it occurs in the manufacturing process. The proposed model consists of data handling, defect-type analysis, and defect-type prediction stages. Performance measurements show that it achieves higher prediction accuracy than a logistic regression model.

Saturation Prediction for Crowdsensing Based Smart Parking System

  • Kim, Mihui;Yun, Junhyeok
    • Journal of Information Processing Systems / v.15 no.6 / pp.1335-1349 / 2019
  • Crowdsensing technologies can make smart parking systems more efficient than current sensor-based systems, because they have a low installation cost and are free of the restrictions imposed by sensor installation. Predicting parking lot saturation in real time requires a large amount of sensing data; in the real world, however, it is hard to collect the required amount. In this paper, we model saturation prediction by combining a time-based prediction model with a sensing-data-based prediction model. The time-based model predicts saturation from the parking lot's location and the time. The sensing-data-based model then predicts the degree of saturation of the parking lot with high accuracy, based on the degree of saturation predicted by the first model, the saturation information in the sensing data, and the number of parking spaces reported in the sensing data. We train the prediction model on real sensing data gathered from a specific parking lot, evaluate its performance, and show its efficiency and feasibility.
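The two-stage structure in this abstract (a time-based estimate refined by crowdsensed observations) can be sketched as below. The abstract does not give the combination rule, so this sketch assumes a hypothetical linear blend weighted by the fraction of the lot's spaces covered by sensing reports; the data and function names are illustrative only.

```python
from collections import defaultdict

def fit_time_model(history):
    """Stage 1: average historical saturation per (lot, hour)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lot, hour, saturation in history:
        sums[(lot, hour)] += saturation
        counts[(lot, hour)] += 1
    return {key: sums[key] / counts[key] for key in sums}

def predict_saturation(time_model, lot, hour, sensed_occupied, sensed_spaces, total_spaces):
    """Stage 2: blend the stage-1 estimate with crowdsensed observations.

    The blend weight is the fraction of the lot's spaces covered by sensing
    reports (a hypothetical weighting chosen for illustration).
    """
    prior = time_model.get((lot, hour), 0.5)  # fall back to 50% with no history
    if sensed_spaces == 0:
        return prior
    observed = sensed_occupied / sensed_spaces
    coverage = min(sensed_spaces / total_spaces, 1.0)
    return coverage * observed + (1 - coverage) * prior

# Hypothetical history: (lot id, hour of day, observed saturation ratio).
history = [("A", 9, 0.6), ("A", 9, 0.8), ("A", 18, 0.9)]
time_model = fit_time_model(history)
estimate = predict_saturation(time_model, "A", 9, sensed_occupied=18,
                              sensed_spaces=20, total_spaces=100)
print(round(estimate, 3))
```

With sparse sensing (20 of 100 spaces here), the estimate leans on the time-based prior; as coverage grows, the crowdsensed observation dominates.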

Software Quality Classification using Bayesian Classifier (베이지안 분류기를 이용한 소프트웨어 품질 분류)

  • Hong, Euy-Seok
    • Journal of Information Technology Services / v.11 no.1 / pp.211-221 / 2012
  • Many metric-based classification models have been proposed to predict the fault-proneness of software modules. This paper presents two prediction models using Bayesian classifiers, among the most popular modern classification algorithms. Bayesian models, based on Bayesian probability theory, can be a promising technique for software quality prediction because they can represent uncertainty using probabilities and can partly incorporate experts' knowledge into the training data. The two models, Naïve Bayes (NB) and a Bayesian Belief Network (BBN), are constructed, and dimensionality reduction is applied to the training and test data before model evaluation. The prediction accuracy of the models is evaluated using two prediction error measures, Type I error and Type II error, and compared with well-known prediction models: a backpropagation neural network model and a support vector machine model. The results show that the BBN model performs slightly better than NB. Although the BBN model's prediction accuracy is lower than that of the compared models on the data set with ambiguity, it outperforms them on the data set without ambiguity.
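A Naïve Bayes classifier and the Type I / Type II error measures can be sketched together. This sketch assumes continuous metrics modelled with per-class Gaussian distributions, and uses the common software-quality convention that a Type I error labels a not-fault-prone module as fault-prone and a Type II error does the reverse; the paper's discretization, dimensionality reduction, and BBN model are not reproduced, and the metric values are hypothetical.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(samples, labels):
    """Estimate class priors and per-feature Gaussian parameters."""
    model = {}
    for cls in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        columns = list(zip(*rows))
        model[cls] = (
            len(rows) / len(samples),                # prior P(class)
            [mean(c) for c in columns],              # per-feature means
            [pvariance(c) + 1e-9 for c in columns],  # variances, smoothed to avoid zero
        )
    return model

def predict(model, x):
    """Return the class with the highest log posterior, assuming independent features."""
    def log_posterior(cls):
        prior, means, variances = model[cls]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

def type_errors(model, samples, labels):
    """Type I: not-fault-prone (0) predicted fault-prone (1); Type II: the reverse."""
    type1 = sum(1 for x, y in zip(samples, labels) if y == 0 and predict(model, x) == 1)
    type2 = sum(1 for x, y in zip(samples, labels) if y == 1 and predict(model, x) == 0)
    return type1 / labels.count(0), type2 / labels.count(1)

# Hypothetical (lines of code, cyclomatic complexity) metrics; 1 = fault-prone.
train = [(50, 2), (60, 3), (55, 2), (70, 4), (300, 15), (280, 12), (350, 18), (320, 14)]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_gaussian_nb(train, train_labels)

holdout = [(65, 3), (290, 13)]
holdout_labels = [0, 1]
print(type_errors(model, holdout, holdout_labels))
```

Reporting the two error types separately matters because missing a fault-prone module (Type II) usually costs more than an unnecessary inspection (Type I).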

Comparative Study on the Accuracy of Surface Air Temperature Prediction based on selection of land use and initial meteorological data (토지이용도와 초기 기상 입력 자료의 선택에 따른 지상 기온 예측 정확도 비교 연구)

  • Hae-Dong Kim;Ha-Young Kim
    • Journal of Environmental Science International / v.33 no.6 / pp.435-442 / 2024
  • We investigated the accuracy of surface air temperature prediction according to the choice of land-use data and initial meteorological data, using the Weather Research and Forecasting (WRF) model v4.2.1. A numerical experiment was conducted at the Daegu Dyeing Industrial Complex. We used initial meteorological input data from GFS (Global Forecast System) and GDAPS (Global Data Assimilation and Prediction System). High-resolution input data were generated from the land cover data of the Ministry of Environment and the digital elevation model of the Ministry of Land, Infrastructure, and Transport, and used as input to the weather model. The experimental cases were defined by the terrestrial and topographic data (land cover data) and the meteorological data applied to the model. For the simulation using high-resolution terrestrial data (10 m) and GDAPS data (CASE 3), the calculated surface temperature was much closer to automatic weather station observations than for the simulation using low-resolution terrestrial data (900 m) and GFS data (CASE 1).
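The abstract states that CASE 3 was "much closer" to the automatic weather station observations but does not name a metric here. A minimal sketch, assuming mean bias and RMSE as the comparison measures, shows how such a case comparison is typically quantified; the temperatures below are hypothetical and are not the paper's results.

```python
import math

def mean_bias(simulated, observed):
    """Average of (simulated - observed); positive means the model runs warm."""
    return sum(s - o for s, o in zip(simulated, observed)) / len(observed)

def rmse(simulated, observed):
    """Root-mean-square error between simulated and observed temperatures."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed))

# Hypothetical hourly surface temperatures (degrees C) at one station.
observed = [24.0, 25.5, 27.0, 28.5]
case1 = [26.0, 27.5, 29.5, 31.0]   # e.g. low-resolution land use + GFS
case3 = [24.5, 25.0, 27.5, 28.0]   # e.g. high-resolution land use + GDAPS

for name, sim in (("CASE 1", case1), ("CASE 3", case3)):
    print(name, "bias=%.2f rmse=%.2f" % (mean_bias(sim, observed), rmse(sim, observed)))
```

Reporting bias alongside RMSE separates a systematic warm or cold offset from random scatter, which helps attribute errors to land-use versus initial-condition choices.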

A Prediction of Number of Patients and Risk of Disease in Each Region Based on Pharmaceutical Prescription Data (의약품 처방 데이터 기반의 지역별 예상 환자수 및 위험도 예측)

  • Chang, Jeong Hyeon;Kim, Young Jae;Choi, Jong Hyeok;Kim, Chang Su;Aziz, Nasridinov
    • Journal of Korea Multimedia Society / v.21 no.2 / pp.271-280 / 2018
  • Big data has recently been growing rapidly with advances in IT. In the medical field especially, big data is used to provide services such as patient-customized care, disease management, and disease prediction. In Korea, the National Health Insurance Corporation provides the 'National Health Alarm Service'. However, its prediction model is limited to short-term predictions within 3 days and relies on social data of questionable reliability. To solve these problems, this paper proposes a disease prediction model using medicine prescription data generated by actual patients. The model predicts the total number of patients and the risk of disease in each region, and it uses the ARIMA model for long-term predictions.
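The abstract does not state the ARIMA order used. The sketch below assumes the simplest non-trivial case, ARIMA(1,1,0): difference the series once, fit an AR(1) coefficient by least squares without a constant, then forecast and undo the differencing. The prescription counts are hypothetical.

```python
def difference(series):
    """First difference: the 'I' (integrated, d=1) part of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(diffs):
    """Least-squares estimate of phi in d_t = phi * d_{t-1} + noise (no constant)."""
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den

def forecast_arima110(series, steps):
    """Forecast `steps` values ahead with an ARIMA(1,1,0) model."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff  # AR(1) step on the differenced series
        level = level + last_diff    # integrate back to the original scale
        forecasts.append(level)
    return phi, forecasts

# Hypothetical weekly prescription counts for one region.
weekly_counts = [100, 108, 112, 114, 115]
phi, forecasts = forecast_arima110(weekly_counts, steps=3)
print(phi, forecasts)
```

Differencing removes the trend so the AR part models only the changes; in practice the order would be chosen from autocorrelation plots or an information criterion.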

Time Series Prediction Using Incremental Regression

  • Kim, Sung-Hyun;Lee, Yong-Mi;Jin, Long;Chai, Duck-Jin;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.635-638 / 2006
  • Conventional regression-based prediction techniques in data mining generate a model during the training step and apply it to new input data without any change. If such a model is applied directly to time series, prediction accuracy decreases. This paper proposes an incremental regression method for time series prediction, such as typhoon track prediction, that accounts for the fact that the characteristics of a time series may change over time. It consists of two steps. The first step performs a fractional process to apply the input data to the regression model. The second step updates the model, using the input information as new data. In addition, the model is maintained using only recent data held in a queue. This approach has two advantages. First, it stores only the minimum information needed for the model in a matrix, which reduces space complexity. Second, it prevents the error rate from increasing by updating the model over time. The accuracy of the proposed method is measured by RME (Relative Mean Error) and RMSE (Root Mean Square Error). In typhoon track prediction experiments, the proposed IMLR (Incremental Multiple Linear Regression) technique was more efficient than MLR (Multiple Linear Regression) and SVR (Support Vector Regression).
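One way to realize "a model maintained in a matrix and updated with only recent data in a queue" is a sliding-window multiple linear regression that keeps the normal-equation statistics XᵀX and Xᵀy and subtracts the oldest sample's contribution once the window is full. This is a sketch under that assumption, not the paper's IMLR; the class name and the synthetic data are hypothetical.

```python
from collections import deque

class IncrementalMLR:
    """Multiple linear regression updated one sample at a time.

    Keeps the normal-equation statistics X^T X and X^T y plus a bounded queue
    of recent samples; when the queue is full, the oldest sample's
    contribution is subtracted (a sliding window).
    """

    def __init__(self, n_features, window):
        self.p = n_features + 1  # +1 for the intercept term
        self.xtx = [[0.0] * self.p for _ in range(self.p)]
        self.xty = [0.0] * self.p
        self.window = deque()
        self.max_len = window

    def _accumulate(self, x, y, sign):
        row = [1.0] + list(x)
        for i in range(self.p):
            self.xty[i] += sign * row[i] * y
            for j in range(self.p):
                self.xtx[i][j] += sign * row[i] * row[j]

    def update(self, x, y):
        if len(self.window) == self.max_len:
            old_x, old_y = self.window.popleft()
            self._accumulate(old_x, old_y, -1.0)  # forget the oldest sample
        self.window.append((x, y))
        self._accumulate(x, y, 1.0)

    def coefficients(self):
        """Solve (X^T X) b = X^T y by Gaussian elimination with partial pivoting."""
        n = self.p
        a = [self.xtx[i][:] + [self.xty[i]] for i in range(n)]
        for col in range(n):
            pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
            a[col], a[pivot] = a[pivot], a[col]
            for r in range(col + 1, n):
                factor = a[r][col] / a[col][col]
                for c in range(col, n + 1):
                    a[r][c] -= factor * a[col][c]
        b = [0.0] * n
        for i in reversed(range(n)):
            b[i] = (a[i][n] - sum(a[i][j] * b[j] for j in range(i + 1, n))) / a[i][i]
        return b

    def predict(self, x):
        b = self.coefficients()
        return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))

# Hypothetical stream generated from y = 1 + 2*x1 + 3*x2.
model = IncrementalMLR(n_features=2, window=5)
for x1, x2 in [(0, 1), (1, 0), (2, 3), (3, 1), (4, 4), (5, 2), (6, 5)]:
    model.update((x1, x2), 1 + 2 * x1 + 3 * x2)
print([round(c, 6) for c in model.coefficients()])
```

The model state is a (p+1)×(p+1) matrix and a vector regardless of how long the stream runs, and the window bounds how far back the fit looks, so the model adapts as the series drifts.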

Wind Speed Prediction using WAsP for Complex Terrain (복합지형에 대한 WAsP의 풍속 예측성 평가)

  • Yoon, Kwang-Yong;Yoo, Neung-Soo;Paek, In-Su
    • Journal of Industrial Technology / v.28 no.B / pp.199-207 / 2008
  • A linear wind prediction program, WAsP, was used to predict wind speed at two sites in complex terrain in South Korea. Reference data obtained at locations more than 7 km from the prediction sites were used for the predictions, and the predictions from the linear model were compared with data measured at the two prediction sites. Two compensation methods, a self-prediction error method and a delta ruggedness index (ΔRIX) method, were applied to improve the WAsP wind speed predictions and showed promising results. The wind speed prediction errors were within 3.5% with the self-prediction error method and within 10% with the delta RIX method. The self-prediction error method can therefore be used as a compensation method to reduce wind speed prediction error in WAsP.
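The abstract quotes prediction errors as percentages but does not describe how the self-prediction error method computes its correction. The sketch below assumes one plausible form: WAsP's error when predicting the reference mast from its own data is applied as a multiplicative correction to the target-site prediction. The function names and wind speeds are hypothetical.

```python
def relative_error(predicted, measured):
    """Signed relative error in percent: (predicted - measured) / measured * 100."""
    return (predicted - measured) / measured * 100.0

def self_prediction_correction(target_prediction, reference_self_prediction, reference_measured):
    """Hypothetical correction: scale the target-site prediction by the ratio of
    measured to self-predicted wind speed at the reference site."""
    return target_prediction * reference_measured / reference_self_prediction

# Hypothetical mean wind speeds in m/s.
reference_measured = 6.0
reference_self_prediction = 6.6  # model over-predicts its own reference site by 10%
target_raw = 7.7
target_measured = 7.1

corrected = self_prediction_correction(target_raw, reference_self_prediction, reference_measured)
print("raw error %.2f%%, corrected error %.2f%%" % (
    relative_error(target_raw, target_measured),
    relative_error(corrected, target_measured)))
```

The assumption behind this form is that terrain-induced model bias at the reference site carries over to nearby target sites; that holds best when both sites have similar ruggedness.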
