• Title/Summary/Keyword: Decision Tree Regression

Sales Pattern and Related Product Attributes of T-shirts (티셔츠 상품의 판매패턴과 연관된 상품속성)

  • Chae, Jin Mie;Kim, Eun Hie
    • Journal of the Korean Society of Clothing and Textiles / v.44 no.6 / pp.1053-1069 / 2020
  • This study examined the relationship between sales patterns and product attributes in order to propose a sales forecasting approach for fashion products. We analyzed sales data for 537 T-shirt SKUs from a domestic sports brand using SAS. Sales patterns of fashion products fluctuated and were influenced by exogenous factors; therefore, we removed the influence of the exogenous factors, which regression analysis identified as price discounts and holiday effects. In addition, it was difficult to predict sales from the sales patterns of the same product because fashion products are released as new products every year. Therefore, a forecasting model was proposed that uses the sales patterns of related product attributes, with the attributes treated as descriptive variables. To explain the relationship between sales patterns and product attributes, we classified sales patterns using K-means clustering and built a decision tree classifier with attributes as input and sales patterns as output. As a result, the sales patterns of T-shirts were clustered into six types, each with a characteristic peak and slope shape. In the proposed sales pattern prediction model, the sales patterns were also associated with combinations of product attributes and their values.

Using Missing Values in the Model Tree to Change Performance for Predict Cholesterol Levels (모델트리의 결측치 처리 방법에 따른 콜레스테롤수치 예측의 성능 변화)

  • Jung, Yong Gyu;Won, Jae Kang;Sihn, Sung Chul
    • Journal of Service Research and Studies / v.2 no.2 / pp.35-43 / 2012
  • Data mining is of interest across many fields rather than in any single area, and it is applied heavily in a wide range of applications. It is used in decision-making, in analyzing data and correlations to uncover hidden relations, and in finding actionable information and making predictions. However, some data sets contain many missing values in their variables and do not have a large number of records. In this paper, missing values are handled in accordance with the model tree algorithm, and the approach is applied to predicting cholesterol values. For the performance analysis, experiments are carried out for each treatment of missing values. On this basis, an efficient alternative for handling missing data is presented.

Data-Driven Analysis for Future Construction Prediction : Case Study on Seoul (서울시 데이터 기반 필지별 건축행위 발생 예측)

  • Yun, Sung-Bum;Kim, Tae Hyun
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2019.11a / pp.7-8 / 2019
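  • The continued aging of buildings and the shortage of developable land drive the reconstruction of existing buildings and new construction on available sites. In Seoul, about 25,000 new buildings have been constructed over the past five years, and various support systems, including new policies, are being put in place in response. This study uses parcel-level construction activity data from 2011 to 2015, together with 43 additional variables, to build a model that predicts the parcels on which new construction will occur. The prediction model was built with CART (Classification And Regression Tree), one of the decision tree methods, which are machine learning approaches suited to identifying factors. It achieved 86.28% accuracy and identified four main factors behind new construction activity.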
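
CART, the method named in the abstract above, grows a tree by repeatedly choosing the binary split that leaves the purest child nodes. The following is a minimal sketch of that split search, assuming Gini impurity as the criterion (the one commonly associated with CART classification; the abstract does not state which criterion was used). The two features, the toy parcel values, and the labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical toy parcels: (building_age_years, floor_area_ratio).
rows = [(40, 1.2), (35, 2.0), (5, 1.1), (8, 2.1), (45, 1.9), (3, 1.0)]
# Hypothetical labels: 1 = new construction occurred on the parcel, 0 = none.
labels = [1, 1, 0, 0, 1, 0]

feature, threshold, score = best_split(rows, labels)
print(feature, threshold, score)
```

On this toy data the search picks the first feature, because splitting on building age separates the two classes completely. A full CART implementation would apply the same search recursively to each child node and then prune the tree.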

Comparison of Methodologies for Characterizing Pedestrian-Vehicle Collisions (보행자-차량 충돌사고 특성분석 방법론 비교 연구)

  • Choi, Saerona;Jeong, Eunbi;Oh, Cheol
    • Journal of Korean Society of Transportation / v.31 no.6 / pp.53-66 / 2013
  • The major purpose of this study is to evaluate methodologies for predicting the injury severity of pedestrian-vehicle collisions. The methodologies evaluated and compared in this study include Binary Logistic Regression (BLR), Ordered Probit Model (OPM), Support Vector Machine (SVM), and Decision Tree (DT) methods. Valuable insights into applying these methodologies to analyze the characteristics of pedestrian injury severity are derived. For identifying causal factors affecting injury severity, statistical approaches such as BLR and OPM are recommended. On the other hand, to achieve better prediction performance, heuristic approaches such as SVM and DT are recommended. It is expected that the outcome of this study will be useful in developing various countermeasures for enhancing pedestrian safety.

An Analysis of Nursing Needs for Hospitalized Cancer Patients;Using Data Mining Techniques (데이터 마이닝을 이용한 입원 암 환자 간호 중증도 예측모델 구축)

  • Park, Sun-A
    • Asian Oncology Nursing / v.5 no.1 / pp.3-10 / 2005
  • Background: Nurses now make up one third of all hospital human resources, so efficient management of nursing manpower is becoming more important. While it is clear that nursing workload requirement analysis and patient severity classification should be done first for the efficient allocation of the nursing workforce, these processes have been conducted manually with ad hoc rules. Purposes: This study aimed to build a prediction model for patient classification according to nursing need. We sought an easier and faster method of classifying nursing patients that can help the efficient management of nursing manpower. Methods: The nursing patient classification data of hospitalized cancer patients at one of the largest cancer centers in Korea from January 1 to December 31, 2003 were assessed by trained nurses. This study developed a prediction model and analyzed nursing needs using data mining techniques. Patients were classified by three different data mining techniques (logistic regression, decision tree, and neural network), and the results were assessed. Results: The data set was created from 165,073 records of 2,228 patients in the patient classification database. The main explanatory variables for the three data mining techniques were as follows. 1) Logistic regression: age, month, and section. 2) Decision tree: section, month, age, and tumor. 3) Neural network: section, diagnosis, age, sex, metastasis, hospital days, and month. Among these three techniques, the neural network showed the best predictive power in ROC curve verification. The patient classification prediction model developed with the neural network, based on nursing needs, had a prediction accuracy of 84.06%. Conclusion: The patient classification prediction model was developed and tested in this study using real patient data. The result can be used for more accurate calculation of required nursing staff and more effective use of the labor force.

Early Prediction of Carcass Yield Grade by Ultrasound in Hanwoo (초음파를 이용한 한우 육량등급의 조기예측)

  • Rhee, Y. J.;Seok, H. K.;Kim, S. J.;Song, Y. H.
    • Journal of Animal Science and Technology / v.45 no.2 / pp.327-334 / 2003
  • This study was carried out to make an early prediction of carcass yield grade. Sixty-six Hanwoo steers were measured by ultrasound for back fat thickness, longissimus muscle area, and body weight at 18, 21, and 24 months of age. Carcass evaluation was done after the ultrasound measurement at 24 months of age. Ultrasonic yield grades at 18, 21, and 24 months of age were predicted by regression and decision tree methods. When classified by carcass yield grade, ultrasonic back fat thickness at 18, 21, and 24 months of age differed significantly between carcass yield grades (p<0.05). The prediction accuracy of carcass yield grade by the regression method was 78.8% at 18 months, 86.4% at 21 months, and 90.9% at 24 months of age. With the decision tree method, prediction accuracies of 78.8%, 89.4%, and 89.4% were obtained at 18, 21, and 24 months of age, respectively.
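
The reported accuracies can be read as counts of correctly graded animals. The sketch below does that arithmetic, assuming that all 66 steers were evaluated at every age and by both methods (the abstract gives the sample size but not the per-method counts, so the counts printed here are inferred, not reported).

```python
N_STEERS = 66  # sample size stated in the abstract

# Prediction accuracies (%) at 18, 21 and 24 months, as reported.
reported = {
    "regression": [78.8, 86.4, 90.9],
    "decision tree": [78.8, 89.4, 89.4],
}


def implied_correct(accuracy_pct, n=N_STEERS):
    """Nearest whole number of correct predictions for a reported accuracy."""
    return round(accuracy_pct / 100 * n)


# Inferred counts of correct predictions (an assumption, see the lead-in).
implied = {
    method: [implied_correct(acc) for acc in accs]
    for method, accs in reported.items()
}

for method, counts in implied.items():
    for months, k in zip((18, 21, 24), counts):
        pct = 100 * k / N_STEERS
        print(f"{method:13s} {months} months: {k}/{N_STEERS} = {pct:.1f}%")
```

Under that assumption every reported percentage matches a whole number of animals out of 66, and the gap between the two methods at 24 months corresponds to a single steer.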

Determinants of job finding using student's characteristic information (학생정보를 이용한 대졸 취업에 미치는 영향력 분석)

  • Cho, Jang-Sik
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.849-856 / 2011
  • In this paper, we analyze the influence of admission and enrollment variables, including individual characteristic variables, on the employment of graduate students at K university. First, logistic regression analysis is used to examine the main effects of the admission and enrollment variables, including students' individual characteristics, on employment. Decision tree analysis is then used to examine the interaction effects of these variables on employment. The results of this paper may help K university design effective job-finding strategies for graduate students.

Study on the effectiveness of english-medium class (영어강의의 효과성에 대한 연구)

  • Cho, Jang Sik
    • Journal of the Korean Data and Information Science Society / v.23 no.6 / pp.1137-1144 / 2012
  • Many universities increasingly stress the importance of English-medium classes in order to improve the international competitiveness and internationalization of the university. In this paper, we compare English-medium classes with Korean classes using course evaluation scores. We also analyze the factors that affect the effectiveness of English-medium classes as measured by the course evaluation score. First, logistic regression analysis is used to examine the main effects of subjects and individual characteristics. Decision tree analysis is then used to examine the interaction effects of subjects and individual characteristics. The results of this paper are as follows. Grade, department category, class size, GPA, and screening method affect the effectiveness of English-medium classes. The group with the highest effectiveness is freshmen in the humanities department category. The group with the second-highest effectiveness is freshmen in the nature and art department category with a high GPA.

The effect of road weather factors on traffic accident - Focused on Busan area - (도로위의 기상요인이 교통사고에 미치는 영향 - 부산지역을 중심으로 -)

  • Lee, Kyeongjun;Jung, Imgook;Noh, Yunhwan;Yoon, Sanggyeong;Cho, Youngseuk
    • Journal of the Korean Data and Information Science Society / v.26 no.3 / pp.661-668 / 2015
  • Traffic accidents have increased every year as the number of vehicles and the concentration of the population have grown. In addition to driver carelessness, many road weather factors have a great influence on traffic accidents. In particular, the number of traffic accidents is affected by precipitation, visibility, humidity, cloud amount, and temperature. The purpose of this paper is to analyze the effect of road weather factors on traffic accidents. We use 2013 data on traffic accidents, AWS weather factors (precipitation, existence of rainfall, temperature, wind speed), time of day, and day of the week. We performed statistical analysis using logistic regression analysis and decision tree analysis. These prediction models may be used to predict traffic accidents according to the weather conditions.

Development of medical/electrical convergence software for classification between normal and pathological voices (장애 음성 판별을 위한 의료/전자 융복합 소프트웨어 개발)

  • Moon, Ji-Hye;Lee, JiYeoun
    • Journal of Digital Convergence / v.13 no.12 / pp.187-192 / 2015
  • Software developed to analyze speech disorders would be highly applicable across various converged fields. This paper implements a user-friendly program based on CART (Classification and Regression Trees) analysis to distinguish between normal and pathological voices using a combination of acoustical and HOS (higher-order statistics) parameters. This represents a convergence of medical information and signal processing. The acoustical parameters are Jitter (%) and Shimmer (%). The proposed HOS parameters are the means and variances of skewness (MOS and VOS) and kurtosis (MOK and VOK). The database consists of 53 normal and 173 pathological voices distributed by Kay Elemetrics. When the acoustical and proposed parameters are used together to generate the decision tree, the average accuracy is 83.11%. Finally, we developed a program with a more user-friendly interface and frameworks.