• Title/Summary/Keyword: validation study


Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important information for target marketing and personalized advertising on digital marketing channels such as email, mobile, and social media. However, collecting users' demographics has become increasingly difficult because their online activities are often anonymous. Although marketing departments can obtain demographics through online or offline surveys, these approaches are expensive and time-consuming, and the responses may include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites: each click on a webpage is logged in semi-structured website log files. Such data shows which pages users visited, how long they stayed, how often and when they usually visited, which sites they prefer, which keywords they used to find a site, whether they made a purchase, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with demographics, including search keywords, frequency and intensity of activity by time of day, day, and month, the variety of websites visited, and text information from the web pages visited. The demographic attributes to be predicted also vary across studies and include gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision trees, neural networks, logistic regression, and k-nearest neighbors, were used to build prediction models. However, previous research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated to build the best prediction model.
The objective of this study is to select the clickstream attributes most likely to be correlated with demographics, based on the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job, using 64 clickstream attributes drawn from previous research. The overall process of building the predictive models consists of four steps. In the first step, we create user profiles that include the 64 clickstream attributes and 5 demographic attributes. The second step reduces the dimensionality of the clickstream variables to address the curse of dimensionality and overfitting, using three approaches based on decision trees, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable using SVM, neural networks, and logistic regression. The last step evaluates the alternative models in terms of accuracy and selects the best model. For the experiments, we used clickstream data comprising 16,962,705 online activities by 5,000 Internet users, together with their 5 demographic attributes. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross-validation was conducted to improve the reliability of the experiments. The experimental results confirm that a specific data mining method is well suited to each demographic variable. For example, age is best predicted using decision-tree-based dimension reduction with a neural network, whereas gender and marital status are predicted most accurately by SVM without dimension reduction.
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and can therefore support digital marketing.
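The abstract describes the evaluation step (5-fold cross-validation of alternative classifiers) only at the level of the SPSS Modeler workflow. As a hedged illustration of that step, and not the authors' pipeline, the minimal pure-Python sketch below runs 5-fold cross-validation of a k-nearest-neighbors classifier (one of the methods listed from prior work) on a tiny, hypothetical set of user profiles. The two feature columns, the labels, and the helper names `knn_predict` and `k_fold_accuracy` are illustrative assumptions; dimension reduction and the SVM, neural network, and logistic regression models are omitted.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def k_fold_accuracy(X, y, n_folds=5, k=3):
    """Mean accuracy over n_folds folds; fold i holds rows i, i + n_folds, i + 2*n_folds, ..."""
    fold_scores = []
    for i in range(n_folds):
        test_idx = set(range(i, len(X), n_folds))
        train_X = [X[j] for j in range(len(X)) if j not in test_idx]
        train_y = [y[j] for j in range(len(X)) if j not in test_idx]
        correct = sum(knn_predict(train_X, train_y, X[j], k) == y[j] for j in test_idx)
        fold_scores.append(correct / len(test_idx))
    return sum(fold_scores) / n_folds


# Hypothetical profiles: (share of night-time visits, share of visits to one site category).
X = [(0.1, 0.8), (0.2, 0.7), (0.15, 0.9), (0.1, 0.75), (0.2, 0.85),
     (0.8, 0.1), (0.9, 0.2), (0.85, 0.15), (0.75, 0.1), (0.9, 0.25)]
y = ["F"] * 5 + ["M"] * 5
print(k_fold_accuracy(X, y, n_folds=5, k=3))
```

Swapping `knn_predict` for another classifier while keeping the same folds is what lets alternative models be compared on equal terms; the interleaved fold assignment is a simplification of the random partitioning usually used.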

Characterization of Traits Related to Grain Shape in Korean Rice Varieties (국내 육성 벼 품종 입형 관련 특성 분석)

  • Lee, Chang-Min;Lee, Keon-Mi;Baek, Man-Kee;Kim, Woo-Jae;Suh, Jung-Pil;Jeong, Oh-Young;Cho, Young-Chan;Park, Hyun-Su;Kim, Suk-Man
    • KOREAN JOURNAL OF CROP SCIENCE / v.65 no.3 / pp.199-213 / 2020
  • Grain size and shape are two important components contributing to rice yield and quality. To analyze traits related to grain shape, a total of 272 Korean rice varieties, comprising japonica, black japonica, and Tongil-type accessions, were evaluated in this study. Grain length (GL), grain width (GW), grain thickness (GT), length-to-width ratio (RLW), and 1000-grain weight (TGW) were measured with 10 replicates. Grain-shape-related genes (GW2, GS3, qGL3, qSW5, GS5, TGW6, GW7, and GW8) were validated in the accessions using specific DNA marker sets. K-means clustering of the accessions based on phenotypic data revealed three groups: group 1 was characterized by GW and GT and included most of the japonica-type accessions; group 2 was characterized by RLW and GL, with medium-sized, half-spindle-shaped grains; and group 3 was characterized by TGW, with long, semi-round grains. In validation tests using the marker sets, the gw2 and tgw6 alleles were each detected in fewer than 1% of the tested accessions, and two allelic types, qgl3 and gw8, were detected only in Tongil-type accessions. No variant amplicons were amplified for GW8 in any japonica accession or for GW2 in any Tongil-type accession. To suggest representative grain-shape gene combinations for each ecotype, allelic combinations were evaluated by PCR analysis. Combinations Cj1 and Cj2 (of Cj1 to Cj7) in japonica, Cj_b1 and Cj_b2 (of Cj_b1 to Cj_b3) in black japonica, and CT3 (of CT1 to CT13) in Tongil-type accessions were the dominant combinations in their respective ecotypes. In addition, the results indicated that introgression of four genes (gw2, gs3, qSW5, and GS5) would expand the diversity of grain shape in Korean japonica varieties. This gene combination information could be used in practice to understand or improve grain shape in japonica rice breeding programs.
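The abstract does not state how the K-means analysis was configured. The sketch below, a minimal pure-Python version of Lloyd's K-means algorithm, assumes that the five traits (GL, GW, GT, RLW computed as GL/GW, and TGW) are z-score standardized so that millimetre and gram scales contribute comparably, and that the first k rows serve as initial centroids. The six trait rows are hypothetical values, not data from the study.

```python
import math
import statistics


def standardize(rows):
    """Z-score each column so traits in mm and g contribute comparably to distances."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points are used as initial centroids (an assumption)."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members (kept if empty).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(statistics.mean(dim) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # converged: centroids no longer move
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical accessions: (GL mm, GW mm, GT mm, TGW g).
raw = [(5.0, 3.0, 2.1, 22), (6.5, 2.4, 1.9, 20), (6.2, 3.2, 2.3, 30),
       (5.1, 3.1, 2.1, 23), (6.6, 2.3, 1.8, 21), (6.3, 3.3, 2.2, 29)]
# Insert RLW = GL / GW as the fourth trait.
features = [(gl, gw, gt, gl / gw, tgw) for gl, gw, gt, tgw in raw]
labels, centroids = kmeans(standardize(features), k=3)
print(labels)
```

With real data, K-means results depend on initialization, so several random starts are usually compared; the fixed start here only keeps the example deterministic.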

The Alignment Evaluation for Patient Positioning System (PPS) of Gamma Knife Perfexion™ (감마나이프 퍼펙션의 자동환자이송장치에 대한 정렬됨 평가)

  • Jin, Seong Jin;Kim, Gyeong Rip;Hur, Beong Ik
    • Journal of the Korean Society of Radiology / v.14 no.3 / pp.203-209 / 2020
  • The purpose of this study is to assess the mechanical stability and alignment of the patient positioning system (PPS) of the Leksell Gamma Knife Perfexion (LGK PFX). The alignment of the PPS was evaluated by measuring the deviation between the Radiological Focus Point (RFP) and the PPS Calibration Center Point (CCP) with different weights on the couch (0, 50, 60, 70, 80, and 90 kg). Measurements were made with a service diode test tool containing three diode detectors, which is used twice a year during routine preventive maintenance. With varying weights on the PPS, the tool measured the radial deviations for all three collimators (4, 8, and 16 mm) and at three different PPS positions. To evaluate the alignment of the PPS, the radial deviations between the radiation focus and the LGK calibration center point for multiple beams were measured with the calibrated service diode test tool at three university hospitals in Busan and Gyeongnam and then averaged. Without weight on the PPS, the mean radial deviations (the validation differences) measured with the center diode for the 4, 8, and 16 mm collimators were 0.058 ± 0.023, 0.079 ± 0.023, and 0.097 ± 0.049 mm, respectively. With the 4 mm collimator, the mean radial deviations for the center, short, and long diodes were 0.058 ± 0.023, 0.078 ± 0.01, and 0.070 ± 0.023 mm, respectively. Without weight, the mean radial deviations for the 8 and 16 mm collimators were 0.07 ± 0.003 and 0.153 ± 0.002 mm on the short diode, and 0.031 ± 0.014 and 0.175 ± 0.01 mm on the long diode, respectively.
With weights of 50 to 90 kg on the PPS, the mean radial deviations on the center diode for the 4, 8, and 16 mm collimators ranged from 0.061 ± 0.041 to 0.075 ± 0.015, 0.023 ± 0.004 to 0.034 ± 0.003, and 0.158 ± 0.08 to 0.17 ± 0.043 mm, respectively. Under the same conditions, the mean radial deviations on the short diode for the 4, 8, and 16 mm collimators ranged from 0.063 ± 0.024 to 0.07 ± 0.017, 0.037 ± 0.006 to 0.059 ± 0.001, and 0.154 ± 0.03 to 0.165 ± 0.07 mm, and those on the long diode ranged from 0.102 ± 0.029 to 0.124 ± 0.036, 0.035 ± 0.004 to 0.054 ± 0.02, and 0.183 ± 0.092 to 0.202 ± 0.012 mm, respectively. All verification results were within the manufacturer's tolerance criteria. Measuring the alignment with various weights on the PPS, mimicking the actual treatment environment, showed that weight dependence was negligible. In particular, no further adjustment or recalibration of the PPS was required during the verification. The PPS verification test with various weights was therefore confirmed to be suitable for routine quality assurance of the LGK PFX.
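The summary statistics reported above (mean ± SD per collimator, diode, and couch load, compared with a tolerance) can be reproduced with a few lines of Python. In this sketch the radial deviation is assumed to be the 3D Euclidean length of the focus-to-calibration-center offset, the deviations and loads are hypothetical, and `TOLERANCE_MM` is a placeholder rather than the manufacturer's actual criterion, which the abstract does not state.

```python
import math
import statistics

TOLERANCE_MM = 0.5  # hypothetical placeholder, not the manufacturer's stated criterion


def radial_deviation(dx, dy, dz):
    """3D distance (mm) between the radiological focus point and the calibration center point."""
    return math.sqrt(dx * dx + dy * dy + dz * dz)


def summarize(deviations):
    """Mean and sample standard deviation, matching 'mean ± SD' reporting."""
    return statistics.mean(deviations), statistics.stdev(deviations)


def weight_dependence(by_weight):
    """Spread (max - min) of the mean radial deviation across couch loads, in mm."""
    means = [statistics.mean(devs) for devs in by_weight.values()]
    return max(means) - min(means)


# Hypothetical radial deviations (mm) for one collimator and diode at three couch loads (kg).
by_weight = {0: [0.05, 0.06, 0.07], 50: [0.06, 0.07, 0.08], 90: [0.07, 0.07, 0.08]}

for kg, devs in by_weight.items():
    mean, sd = summarize(devs)
    ok = max(devs) <= TOLERANCE_MM
    print(f"{kg:>2} kg: {mean:.3f} ± {sd:.3f} mm, within tolerance: {ok}")
print(f"spread of means across loads: {weight_dependence(by_weight):.3f} mm")
```

A small spread of means relative to the per-load SDs is one simple way to express "negligible weight dependence"; a formal test would need more replicates than this toy example.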

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond ratings are regarded as important for measuring companies' financial risk and for determining investors' returns. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, a major drawback of these techniques is that they rely on strict assumptions, including linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to real-world problems. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machines (SVM). SVM in particular is recognized as a new and promising method for classification and regression analysis. SVM learns a separating hyperplane that maximizes the margin between two categories. It is simple enough to be analyzed mathematically and achieves high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound on the generalization error. In addition, because SVM training is a convex optimization problem, its solution is a global optimum, and overfitting is unlikely to occur. SVM also does not require a large training sample, since it builds prediction models using only the representative samples near the class boundaries, called support vectors. A number of experimental studies have shown that SVM has been applied successfully in a variety of pattern recognition fields. However, SVM has three major drawbacks that can degrade its performance.
First, SVM was originally proposed for binary classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well on multi-class problems as SVM does on binary problems. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class problems, but they can degrade classification performance. Third, multi-class prediction problems often suffer from data imbalance, which occurs when the number of instances in one class greatly exceeds the number in the other classes. Such data sets often lead to a default classifier with a skewed decision boundary, reducing classification accuracy. SVM ensemble learning is one machine learning approach for coping with these drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms, and AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by training classifiers sequentially while increasing the weights of misclassified observations at each iteration, so observations that previous classifiers predicted incorrectly are chosen more often than those predicted correctly. Boosting thus attempts to produce new classifiers that are better able to predict the examples on which the current ensemble performs poorly. In this way, it can reinforce training on the misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to address multiclass prediction problems.
Because MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process takes into account the geometric mean-based accuracy and errors across multiple classes. To examine its feasibility, this study applies MGM-Boost to a real-world bond rating case for Korean companies. 10-fold cross-validation is performed three times with different random seeds to ensure that differences among the three classifiers do not arise by chance. In each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and each set is in turn used as the test set while the classifier is trained on the other nine. Each algorithm was thus tested independently on the cross-validated folds, yielding results for each classifier on 30 experiments. In arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) outperforms both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also outperforms AdaBoost (24.65%) and SVM (15.42%) in geometric mean-based prediction accuracy. A t-test is used to examine whether the classifiers' performance over the 30 folds differs significantly. The results indicate that the performance of MGM-Boost differs significantly from that of the AdaBoost and SVM classifiers at the 1% level. These results indicate that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
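The abstract does not give MGM-Boost's update rule, so the sketch below does not reproduce it. Instead it illustrates two ingredients the text relies on, under stated assumptions: one round of SAMME, a standard multiclass form of AdaBoost, showing how misclassified observations gain weight; and arithmetic- versus geometric-mean accuracy, assumed here to be the arithmetic and geometric means of per-class recall. Labels and predictions are toy values.

```python
import math


def samme_reweight(weights, y_true, y_pred, n_classes):
    """One SAMME (multiclass AdaBoost) round: returns the learner's vote weight alpha
    and renormalized sample weights, with misclassified samples scaled by exp(alpha)."""
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / sum(weights)
    if not 0 < err < 1 - 1 / n_classes:
        raise ValueError("SAMME needs 0 < error < 1 - 1/K for a positive, finite alpha")
    alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
    new = [w * math.exp(alpha) if t != p else w for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new)
    return alpha, [w / total for w in new]


def class_recalls(y_true, y_pred):
    """Recall (share predicted correctly) for each class, in sorted class order."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return recalls


def arithmetic_and_geometric_accuracy(y_true, y_pred):
    """Arithmetic and geometric means of the per-class recalls."""
    r = class_recalls(y_true, y_pred)
    return sum(r) / len(r), math.prod(r) ** (1 / len(r))


# Toy 3-class rating labels with an uneven class distribution.
y_true = ["A", "A", "A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "B", "A", "B", "C", "C", "B"]
alpha, weights = samme_reweight([1 / 8] * 8, y_true, y_pred, n_classes=3)
print(round(alpha, 3), [round(w, 3) for w in weights])
print(arithmetic_and_geometric_accuracy(y_true, y_pred))
print(arithmetic_and_geometric_accuracy(y_true, ["A"] * 8))  # majority-class-only predictor
```

Because the geometric mean collapses to zero when any class is never predicted correctly, it penalizes classifiers that ignore minority classes far more than the arithmetic mean does.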

Development of Analytical Method for Detection of Fungicide Validamycin A Residues in Agricultural Products Using LC-MS/MS (LC-MS/MS를 이용한 농산물 중 살균제 Validamycin A의 시험법 개발)

  • Park, Ji-Su;Do, Jung-Ah;Lee, Han Sol;Park, Shin-min;Cho, Sung Min;Shin, Hye-Sun;Jang, Dong Eun;Cho, Myong-Shik;Jung, Yong-hyun;Lee, Kangbong
    • Journal of Food Hygiene and Safety / v.34 no.1 / pp.22-29 / 2019
  • Validamycin A is an aminoglycoside fungicide, produced by Streptomyces hygroscopicus, that inhibits trehalase. The purpose of this study was to develop an analytical method for detecting validamycin A residues in agricultural products to support the establishment of maximum residue limits (MRLs) in Korea. Validamycin A residues were extracted from samples with methanol/water (50/50, v/v) and purified on a hydrophilic-lipophilic balance (HLB) cartridge. The analyte was quantified and confirmed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) in positive ion mode using multiple reaction monitoring (MRM). Matrix-matched calibration curves, prepared in blank extracts, were linear over the calibration range (0.005 to 0.5 ng) with $R^2$ > 0.99. The limits of detection and quantification were 0.005 and 0.01 mg/kg, respectively. To validate the method, recovery studies were carried out at three fortification levels (LOQ, $\mathrm{LOQ}\times 10$, and $\mathrm{LOQ}\times 50$) with five replicates at each level (n = 5). Average recoveries ranged from 72.5% to 118.3%, with relative standard deviations (RSDs) below 10.3%. All values met the criteria of the Codex guidelines (CAC/GL 40-1993, 2003) and the NIFDS (National Institute of Food and Drug Safety) guideline (2016). Therefore, the proposed analytical method is accurate, effective, and sensitive for determining validamycin A in agricultural commodities.
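The validation figures reported above (percent recovery, RSD, and calibration $R^2$) follow from simple formulas. The sketch below computes them for hypothetical replicate results; the acceptance limits in `passes` (70 to 120% recovery, RSD of at most 20%) are placeholder assumptions, not values quoted from CAC/GL 40 or the NIFDS guideline, whose level-specific criteria should be checked directly.

```python
import statistics


def recovery_stats(measured, spiked):
    """Percent recovery per replicate, then mean recovery (%) and RSD (%)."""
    recoveries = [m / spiked * 100 for m in measured]
    mean = statistics.mean(recoveries)
    rsd = statistics.stdev(recoveries) / mean * 100
    return mean, rsd


def passes(mean, rsd, rec_range=(70.0, 120.0), max_rsd=20.0):
    """Placeholder acceptance check; substitute the guideline's level-specific limits."""
    return rec_range[0] <= mean <= rec_range[1] and rsd <= max_rsd


def r_squared(x, y):
    """R^2 of an ordinary least-squares line with intercept (equal to Pearson r squared)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical: five replicates spiked at the LOQ (0.01 mg/kg).
mean, rsd = recovery_stats([0.0095, 0.0102, 0.0098, 0.0105, 0.0100], spiked=0.01)
print(f"recovery {mean:.1f}%, RSD {rsd:.1f}%, acceptable: {passes(mean, rsd)}")

# Hypothetical matrix-matched calibration: injected amount (ng) vs peak area.
amounts = [0.005, 0.01, 0.05, 0.1, 0.5]
areas = [52, 98, 510, 1003, 4990]
print(f"R^2 = {r_squared(amounts, areas):.4f}")
```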

Estimation of Fresh Weight and Leaf Area Index of Soybean (Glycine max) Using Multi-year Spectral Data (다년도 분광 데이터를 이용한 콩의 생체중, 엽면적 지수 추정)

  • Jang, Si-Hyeong;Ryu, Chan-Seok;Kang, Ye-Seong;Park, Jun-Woo;Kim, Tae-Yang;Kang, Kyung-Suk;Park, Min-Jun;Baek, Hyun-Chan;Park, Yu-hyeon;Kang, Dong-woo;Zou, Kunyan;Kim, Min-Cheol;Kwon, Yeon-Ju;Han, Seung-ah;Jun, Tae-Hwan
    • Korean Journal of Agricultural and Forest Meteorology / v.23 no.4 / pp.329-339 / 2021
  • Soybeans (Glycine max), one of the major upland crops, require precise management of environmental conditions such as temperature, water, and soil during cultivation because they are sensitive to environmental changes. Spectral technologies that remotely measure the physiological state of crops have great potential for improving soybean quality and productivity by estimating yield, physiological stress, and disease. In this study, we developed and validated a soybean growth prediction model using multispectral imagery. We conducted a linear regression analysis between vegetation indices and soybean growth data (fresh weight and LAI) obtained in fields at Miryang, and validated the linear regression model in fields at Goesan. The model based on the green ratio vegetation index (GRVI) performed best in predicting fresh weight at the calibration stage (R² = 0.74, RMSE = 246 g/m², RE = 34.2%). In the validation stage, the RMSE and RE of the model were 392 g/m² and 32%, respectively. The model's errors differed by cropping system: in single-crop fields, RMSE and RE were 315 g/m² and 26%, respectively, whereas in double-crop fields they were higher (RMSE 381 g/m², RE 31%). When fresh-weight models were developed separately for the two of the three years with similar accumulated temperature (AT) (2018 and 2020) and for the single year whose AT differed (2019), the single-year model outperformed the two-year model. Compared with the three-year model, the models divided by AT improved RMSE in single-crop fields by about 29.1%, whereas their performance in double-crop fields decreased by about 19.6%. When environmental factors are used together with spectral data, reliable soybean growth prediction can be achieved under various environmental conditions.
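A minimal sketch of the calibrate-then-validate procedure described above: fit a simple linear regression of fresh weight on GRVI at one site and compute RMSE and RE at another. It assumes GRVI is computed as NIR reflectance divided by green reflectance and that RE is RMSE expressed as a percentage of the mean observed value; the abstract defines neither. Reflectances and fresh weights are hypothetical.

```python
import math
import statistics


def grvi(nir, green):
    """Green ratio vegetation index, assumed here to be NIR / green reflectance."""
    return nir / green


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def rmse_and_re(observed, predicted):
    """RMSE, and RE as RMSE in % of the mean observed value (an assumed definition)."""
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
    return rmse, rmse / statistics.mean(observed) * 100


# Calibration site (hypothetical): (NIR, green) reflectance and fresh weight in g/m^2.
cal_x = [grvi(n, g) for n, g in [(0.40, 0.20), (0.45, 0.15), (0.48, 0.12), (0.50, 0.10)]]
cal_y = [400, 700, 1000, 1300]
intercept, slope = fit_line(cal_x, cal_y)

# Validation site (hypothetical): apply the calibration-site model unchanged.
val_x = [grvi(n, g) for n, g in [(0.50, 0.20), (0.45, 0.10)]]
val_y = [600, 1100]
rmse, rel_err = rmse_and_re(val_y, [intercept + slope * v for v in val_x])
print(f"fresh weight = {intercept:.0f} + {slope:.0f} * GRVI; RMSE {rmse:.0f} g/m^2, RE {rel_err:.1f}%")
```

Keeping the calibration coefficients fixed when scoring the second site is what makes the validation-stage RMSE a test of transferability rather than of fit.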

An accuracy analysis of Cyberknife tumor tracking radiotherapy according to unpredictable change of respiration (예측 불가능한 호흡 변화에 따른 사이버나이프 종양 추적 방사선 치료의 정확도 분석)

  • Seo, Jung Min;Lee, Chang Yeol;Huh, Hyun Do;Kim, Wan Sun
    • The Journal of Korean Society for Radiation Therapy / v.27 no.2 / pp.157-166 / 2015
  • Purpose: The CyberKnife tumor tracking system treats moving tumors by tracking them in real time. It predicts the tumor position in advance from a correlation model between the tumor position, which moves with respiration, and the real-time respiratory signal obtained from LED markers attached to the patient's body surface, and it moves the treatment device in synchronization with the tumor motion. The purpose of this study was to evaluate the accuracy of CyberKnife tumor tracking radiation therapy when breathing changes suddenly and unpredictably, for example because of coughing or sleep. Materials and Methods: Breathing log files from patients who received respiratory-gated radiotherapy and CyberKnife tracking radiosurgery were reconstructed into two patterns: a sinusoidal pattern and a sudden-change pattern. So that the CyberKnife dynamic chest phantom could reproduce motion from the reconstructed breathing logs, we built an additional driving apparatus for the existing dynamic chest phantom and developed a program that applies these breathing patterns to the phantom. The target inside the phantom (Ball Cube target) was driven in the superior-inferior direction with three motion amplitudes: 5, 10, and 20 mm. Two crossed EBT3 films were inserted into the target, and the End-to-End (E2E) test provided by the CyberKnife manufacturer was performed five times for each breathing pattern. The accuracy of the tumor tracking system was expressed as the target error, obtained by analyzing the inserted films, and the correlation error was additionally measured during the E2E tests.
Results: For the sinusoidal breathing pattern, the mean target errors for target motion amplitudes of 5, 10, and 20 mm were 1.14 ± 0.13, 1.05 ± 0.20, and 2.37 ± 0.17 mm, respectively; for the sudden-change pattern, they were 1.87 ± 0.19, 2.15 ± 0.21, and 2.44 ± 0.26 mm. The correlation error, defined as the length of the displacement vector in target tracking, averaged 0.84 ± 0.01, 0.70 ± 0.13, and 1.63 ± 0.10 mm for the sinusoidal pattern and 0.97 ± 0.06, 1.44 ± 0.11, and 1.98 ± 0.10 mm for the sudden-change pattern at the same amplitudes. For both breathing patterns, larger correlation errors were accompanied by larger target errors. With sinusoidal breathing and a target motion of 20 mm or more, both the target error and the correlation error reached or exceeded 1.5 mm, the value recommended by the CyberKnife manufacturer. Conclusion: Both errors tended to increase with the magnitude of target motion, and errors were larger for sudden breathing changes than for sinusoidal breathing. The accuracy of the tumor tracking system can therefore be judged to decrease as breathing motion becomes larger and departs from a regular sinusoidal pattern. With the CyberKnife tumor tracking algorithm, when a sudden, unpredictable change in breathing occurs during treatment, for example because the patient coughs, the treatment should be stopped, the internal target verification process repeated, and the respiratory pattern readjusted.
Treatment accuracy could also be improved by inducing patients to breathe regularly, for example by having them wear goggles with a monitor that displays their own breathing pattern during treatment.
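To make the correlation error concrete, the sketch below uses a deliberately simplified stand-in for the tracking model: a one-dimensional linear fit from an external marker signal to the tumor's superior-inferior position, with the correlation error taken as the length of the 3D displacement vector between predicted and actual target positions. The model form, the sinusoidal marker trace, and the "cough" sample are hypothetical; the vendor's actual correlation models are not described in the abstract.

```python
import math
import statistics


def fit_correlation(marker, tumor):
    """Least-squares linear model tumor ≈ a + b * marker (a simplified, assumed model form)."""
    mm, mt = statistics.mean(marker), statistics.mean(tumor)
    b = (sum((m - mm) * (t - mt) for m, t in zip(marker, tumor))
         / sum((m - mm) ** 2 for m in marker))
    return mt - b * mm, b


def correlation_error(predicted_xyz, actual_xyz):
    """Length (mm) of the 3D displacement vector between predicted and actual target positions."""
    return math.dist(predicted_xyz, actual_xyz)


# Hypothetical regular breathing: external marker signal (mm) and tumor SI position (mm).
marker = [10.0 * math.sin(2 * math.pi * k / 8) for k in range(8)]
tumor_si = [0.5 * m for m in marker]
a, b = fit_correlation(marker, tumor_si)

# Position vectors are (LR, AP, SI); only SI motion is modelled here.
regular_error = correlation_error((0.0, 0.0, a + b * 6.0), (0.0, 0.0, 3.0))
# Hypothetical cough: the marker jumps to 15 mm while the tumor moves only to 4 mm.
cough_error = correlation_error((0.0, 0.0, a + b * 15.0), (0.0, 0.0, 4.0))
print(f"regular: {regular_error:.2f} mm, cough: {cough_error:.2f} mm")
```

The regular sample lies on the fitted relationship and has essentially zero error, whereas the cough sample, where marker and tumor motion decouple, produces a 3.5 mm error, qualitatively mirroring the finding that sudden breathing changes increase tracking error.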
