• Title/Summary/Keyword: Decision Tree Regression


The Related Factors to Perceived gastritis or Perceived enteritis in High school seniors -the 2009 Korea Youth Risk Behavior Web-based Survey- (고등학교 3학년 학생들이 인지한 위염 및 장염 관련요인 -2009년 청소년 건강행태 온라인 조사 자료를 중심으로-)

  • Bea, Sang-Sook
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.2 / pp.668-677 / 2012
  • This study analyzed factors related to perceived gastritis or perceived enteritis among 11,753 Korean high school seniors who participated in the 2009 Korea Youth Risk Behavior Web-based Survey (KYHRBWS). Of the subjects, 5,685 (47.6%) were male and 6,068 (52.4%) were female; 8.7% of the students responded that they had suffered from gastritis or enteritis for a long time, and females had a slightly higher rate of gastritis or enteritis. Survey logistic regression models and decision tree analysis were used to calculate odds ratios and 95% confidence intervals. The results showed that stress and health behaviors affected the risk of gastritis and enteritis: a lower level of perceived health, smoking, heavy drinking or starting to drink before age 13, and a higher level of perceived stress significantly affected the risk of gastritis or enteritis in the subjects (p<.001).
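
The odds ratios and 95% confidence intervals mentioned in this abstract can be illustrated with a minimal sketch. It computes an unadjusted odds ratio with a Wald interval from a 2x2 table, which is simpler than the survey logistic regression the study reports. The function name `odds_ratio_ci` and the counts are hypothetical and are not taken from the survey.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    z=1.96 corresponds to an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration only (NOT survey data)
or_value, ci_low, ci_high = odds_ratio_ci(40, 60, 20, 80)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes 1 suggests an association at roughly the 5% level under the assumptions of this simple model.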

A study on integrating and discovery of semantic based knowledge model (의미 기반의 지식모델 통합과 탐색에 관한 연구)

  • Chun, Seung-Su
    • Journal of Internet Computing and Services / v.15 no.6 / pp.99-106 / 2014
  • In recent years, methods for generating and analyzing semantically meaningful knowledge models have been proposed, using natural language and formal language processing and artificial intelligence algorithms. Such semantic-based knowledge models have been used for effective decision trees and problem solving in specific contexts. They have been based on static generation and regression analysis, trend analysis with behavioral models, and simulation support for macroeconomic forecasting, especially in a variety of complex systems and social network analysis. In this context, this paper proposes formal methods and algorithms for integrating knowledge-based models with topic models derived from text mining. First, a method is presented for automatically converting a keyword map derived from text mining into a knowledge map and integrating it into the semantic knowledge model. The paper then proposes an algorithm for projecting a significant topic map from the keyword map and the semantically equivalent model. The result is an integrated semantic-based knowledge model.

A Case Study on Data Mining Techniques Using Association Analysis (연관분석을 이용한 데이터마이닝 기법에 관한 사례연구)

  • Ryu, Gwi-Yeol;Mun, Yeong-Su;Choi, Seung-Du
    • Korean Data and Information Science Society: Conference Proceedings (한국데이터정보과학회:학술대회논문집) / 2006.04a / pp.109-120 / 2006
  • The current computing environment generates enormous amounts of information, more than people can readily absorb. People want information that they can understand and accept easily, and they may want not only simple information but also knowledge. That is why data mining has become central to handling information. We use RFM analysis to create customer scores. Customers are classified into five groups (most excellent, excellent, common, lower, lowest) for various marketing activities. We find significant patterns in each group and classify customers, from loyal customers to customers likely to leave in the near future, using what this study calls indirect data mining (e.g. association analysis) and direct data mining (e.g. decision tree, logistic regression analysis, etc.). Our research focuses on advanced models that apply association rules in data mining. Our results indicate that indirect and direct data mining appear to produce the same outputs, but the former shows a clearer pattern than the latter.

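
The RFM scoring and five-group classification described in the preceding abstract can be sketched roughly as follows. The abstract does not say how scores were computed, so the rank-based 1 to 5 scoring, the equal weighting of recency, frequency, and monetary value, the function names, and the customer data are all assumptions made for illustration.

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value from 1 (worst) to 5 (best) by its rank.

    Ties are broken by input order, which is a simplification.
    """
    n = len(values)
    # Worst value first, so that rank 0 maps to score 1
    order = sorted(range(n), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * n
    for rank, idx in enumerate(order):
        scores[idx] = 1 + (rank * 5) // n
    return scores

GROUPS = ["lowest", "lower", "common", "excellent", "most excellent"]

def rfm_groups(customers):
    """Map customer id -> group label from (recency_days, frequency, monetary)."""
    ids = list(customers)
    recency = quintile_scores([customers[i][0] for i in ids],
                              higher_is_better=False)  # fewer days is better
    frequency = quintile_scores([customers[i][1] for i in ids])
    monetary = quintile_scores([customers[i][2] for i in ids])
    groups = {}
    for k, cid in enumerate(ids):
        total = recency[k] + frequency[k] + monetary[k]  # ranges from 3 to 15
        groups[cid] = GROUPS[(total - 3) * 5 // 13]      # bucket into 5 groups
    return groups

# Hypothetical customers: (days since last purchase, purchases, total spend)
customers = {
    "A": (3, 20, 1500.0),
    "B": (10, 15, 1100.0),
    "C": (30, 8, 600.0),
    "D": (60, 4, 250.0),
    "E": (120, 1, 50.0),
}
groups = rfm_groups(customers)
for cid, label in groups.items():
    print(cid, label)
```

In practice the three dimensions are often weighted differently, and the resulting groups would then feed the association analysis or decision tree step the abstract mentions.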

A Study on Predictors of Academic Achievement in College Students : Focused on J University (대학생의 학업성취도 예측요인 연구 : J 대학을 중심으로)

  • Son, Yo-Han;Kim, In-Gyu
    • The Journal of the Korea Contents Association / v.20 no.1 / pp.519-529 / 2020
  • The purpose of this study is to establish a model for predicting the academic achievement of college students and to reveal the interrelationship and relative influence of each factor. To this end, we surveyed the personal factors and learning strategy factors of 1,310 learners at J University, analyzed the discriminant factors and patterns of the predictors of academic achievement through decision tree analysis, a data mining method, and performed binary logistic regression analysis to examine the relative effects of each factor. As a result, the most important factor for predicting academic achievement was efficacy, and other factors such as motivation, time management, and depression were also predictive of academic achievement. High academic achievement was associated with high efficacy and time management, or with high learning motivation even when efficacy was moderate. Low efficacy and learning motivation, together with high depression, were shown to decrease academic achievement. Based on these results, the study suggested enhancing efficacy and motivation, strengthening time management education, and managing negative emotions to improve the academic achievement of college students.

Particulate Matter Prediction using Quantile Boosting (분위수 부스팅을 이용한 미세먼지 농도 예측)

  • Kwon, Jun-Hyeon;Lim, Yaeji;Oh, Hee-Seok
    • The Korean Journal of Applied Statistics / v.28 no.1 / pp.83-92 / 2015
  • For national health, it is important to develop an accurate method for predicting atmospheric particulate matter (PM), because exposure to such fine dust can trigger not only respiratory diseases but also dermatoses, ophthalmopathies and cardiovascular diseases. The National Institute of Environmental Research (NIER) employs a decision tree to predict bad weather days with a high PM concentration. However, the decision tree method, even setting aside its inherent instability, is not a suitable model for predicting bad weather days, which represent only 4% of the entire data. In this paper, while presenting the inaccuracy and inappropriateness of the method used by the NIER, we present the utility of a new prediction model that adopts boosting with quantile loss functions. We evaluate the performance of the new method over various $\tau$ values and justify the proposed method through comparison.
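
The quantile loss referred to in this abstract (often called the pinball loss) can be sketched as below. This shows only the loss for a quantile level tau, not the authors' boosting procedure; the function name and the concentration values are hypothetical.

```python
def quantile_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss for a quantile level tau in (0, 1).

    Under-prediction (y > prediction) is weighted by tau,
    over-prediction (y < prediction) by 1 - tau.
    """
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)

# Hypothetical PM concentration of 100 with two predictions that miss by 20
under = quantile_loss([100.0], [80.0], tau=0.9)   # prediction too low
over = quantile_loss([100.0], [120.0], tau=0.9)   # prediction too high
print(under, over)
```

With a high tau, missing a high-concentration day from below costs far more than overshooting, which is one reason such a loss may suit rare extreme events better than a symmetric loss.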

A Best Effort Classification Model For Sars-Cov-2 Carriers Using Random Forest

  • Mallick, Shrabani;Verma, Ashish Kumar;Kushwaha, Dharmender Singh
    • International Journal of Computer Science & Network Security / v.21 no.1 / pp.27-33 / 2021
  • The whole world is now dealing with Coronavirus, which has turned out to be one of the most widespread and long-lived pandemics of our times. Reports reveal that the infectious disease has taken a toll on almost 80% of the world's population. While a great deal of research addresses the prediction of growth and transmission through symptomatic carriers of the virus, it cannot be ignored that pre-symptomatic and asymptomatic carriers also play a crucial role in spreading the virus. Classification algorithms, ranging from simple feature-based classification to Convolutional Neural Networks (CNNs), have been widely used to classify different types of COVID-19 carriers. This research paper presents a novel technique that uses a Random Forest machine learning algorithm with hyper-parameter tuning to classify different types of COVID-19 carriers, so that these carriers can be accurately characterized and dealt with promptly to contain the spread of the virus. The main reason for selecting Random Forest is that it works on the powerful concept of "the wisdom of crowd", which produces an ensemble prediction. The results are quite convincing, and the model records an accuracy score of 99.72%. The results have been compared with those of K-Nearest Neighbour, logistic regression, support vector machine (SVM), and Decision Tree algorithms applied to the same dataset, where the accuracy scores were 78.58%, 70.11%, 70.385%, and 99% respectively, thus establishing the suitability of our approach.

A Study on the Prediction of the Surface Drifter Trajectories in the Korean Strait (대한해협에서 표층 뜰개 이동 예측 연구)

  • Ha, Seung Yun;Yoon, Han-Sam;Kim, Young-Taeg
    • Journal of Korean Society of Coastal and Ocean Engineers / v.34 no.1 / pp.11-18 / 2022
  • To improve the accuracy of particle tracking prediction techniques near the Korea Strait, this study compared and analyzed a particle tracking model based on a seawater flow numerical model and a machine-learning-based particle tracking model using field observation data. The data used in the study were the surface drifter buoy trajectory data observed in the Korea Strait, prediction data from machine learning (linear regression, decision tree) using the tide and wind data from three observation stations (Gageo Island, Geoje Island, Gyoboncho), and prediction data from numerical models (ROMS, MOHID). These three data sets were compared using three error evaluation methods: Correlation Coefficient (CC), Root Mean Square Error (RMSE), and Normalized Cumulative Lagrangian Separation (NCLS). As a final result, the decision tree model had the best prediction accuracy in CC and RMSE, and the MOHID model had the best prediction results in NCLS.
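
Two of the error measures named in this abstract, CC and RMSE, can be sketched as follows. The abstract does not state which correlation coefficient was used, so Pearson's is assumed here; NCLS is omitted, and the function names and position values are hypothetical.

```python
import math

def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def correlation_coefficient(x, y):
    """Pearson correlation coefficient (assumed form of CC)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical drifter positions along one axis (illustration only)
observed = [0.0, 1.0, 2.0, 3.0, 4.0]
predicted = [0.5, 1.5, 2.5, 3.5, 4.5]

rmse_value = rmse(observed, predicted)
cc_value = correlation_coefficient(observed, predicted)
print(f"RMSE = {rmse_value:.3f}, CC = {cc_value:.3f}")
```

The example has a constant offset, so the correlation is high even though the RMSE is not zero, which illustrates why a study might report both measures.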

Corporate Corruption Prediction Evidence From Emerging Markets

  • Kim, Yang Sok;Na, Kyunga;Kang, Young-Hee
    • Asia-Pacific Journal of Business / v.12 no.4 / pp.13-40 / 2021
  • Purpose - The purpose of this study is to predict corporate corruption in emerging markets such as Brazil, Russia, India, and China (BRIC) using different machine learning techniques. Since corruption is a significant problem that can affect corporate performance, particularly in emerging markets, it is important to correctly identify whether a company engages in corrupt practices. Design/methodology/approach - To address the research question, we employ predictive analytic techniques (machine learning methods). Using the World Bank Enterprise Survey Data, this study evaluates various predictive models generated by seven supervised learning algorithms: k-Nearest Neighbour (k-NN), Naïve Bayes (NB), Decision Tree (DT), Decision Rules (DR), Logistic Regression (LR), Support Vector Machines (SVM), and Artificial Neural Network (ANN). Findings - We find that DT, DR, SVM and ANN create highly accurate models (over 90% accuracy). Among various factors, firm age is the most significant, while several other determinants such as source of working capital, top manager experience, and the number of permanent full-time employees also contribute to company corruption. Research implications or Originality - This research demonstrates how machine learning can be applied to predict corporate corruption and also identifies the major causes of corporate corruption.

The Role of Data Technologies with Machine Learning Approaches in Makkah Religious Seasons

  • Waleed Al Shehri
    • International Journal of Computer Science & Network Security / v.23 no.8 / pp.26-32 / 2023
  • Hajj is a fundamental pillar of Islam that all Muslims must perform at least once in their lives. However, Umrah can be performed several times a year, depending on people's abilities. Every year, Muslims from all over the world travel to Saudi Arabia to perform Hajj. Hajj and Umrah pilgrims face multiple issues because large numbers of people gather at the same time and place during the event. Therefore, a system is needed to help pilgrims carry out Hajj and Umrah procedures smoothly. Multiple devices are already installed in Makkah, but it would be better to suggest data architectures with the help of machine learning approaches. The proposed system analyzes the services provided to the pilgrims with respect to gender, location, and foreign pilgrims. The proposed system addresses the research problem of analyzing the Hajj pilgrim dataset effectively. In addition, visualizations of the proposed method show the system's performance using data architectures. Machine learning algorithms classify whether male pilgrims are more significant than female pilgrims. Several algorithms were used to classify the data, including logistic regression, Naive Bayes, K-nearest neighbors, decision trees, random forests, and XGBoost. The decision tree accuracy was 62.83%, whereas K-nearest neighbors reached 62.86%; the other classifiers had lower accuracy. The open-source dataset was stored and analyzed using different data architectures, and machine learning approaches were then used to classify the dataset.

A Study on Factors of Management of Diabetes Mellitus using Data Mining (데이터 마이닝을 이용한 당뇨환자의 관리요인에 관한 연구)

  • Kim, Yoo-Mi;Chang, Dong-Min;Kim, Sung-Soo;Park, Il-Su;Kang, Sung-Hong
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.5 / pp.1100-1108 / 2009
  • Objectives: The purpose of this study is to identify the factors related to the management of DM in Korea. Methods: The subjects, selected from the 2005 National Health and Nutrition Survey (NHANS) data, were 415 adults aged 20 and older who had been diagnosed with DM. This study used data mining algorithms and validated their predictive power by comparing the performance of logistic regression, decision tree, and neural network models. On the basis of validation, the decision tree performed best among the three techniques. Results: First, awareness of DM was positively associated with age, residential area, and job. The most important factor in DM awareness is age. The awareness rate of DM among those aged 52 and over is 76.1%. Within the $\geq 52$ age group, an important factor is family history. Among patients aged 52 or over with a family history of DM, an important factor is job. The awareness rate of patients aged 52 or over with a family history of DM who are professionals is 95.0%. Second, treatment of DM was also positively associated with awareness, region, and job. The most important factor in DM treatment is DM awareness. The treatment rate of patients who are aware of DM is 84.8%. Among patients who are aware of DM, an important factor is region. The awareness rate of patients who are aware of DM in rural areas is 10.4%. Conclusion: The results of the analysis suggest that DM management programs should consider the group characteristics of DM patients.