• Title/Summary/Keyword: Generated Data


Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.123-138 / 2017
  • Since the stock market is driven by traders' expectations, studies have tried to predict stock price movements by analyzing various sources of text data. Research has addressed not only the relationship between text data and stock price fluctuations but also stock trading based on news articles and social media responses. Studies that predict stock price movements have also applied classification algorithms to a term-document matrix, as in other text mining approaches. Because documents contain many words, it is better to select the words that contribute most when building the term-document matrix. Words with too low a frequency or importance are removed based on word frequency. Words can also be selected according to their contribution, measured as the degree to which a word helps classify a document correctly. The basic idea of constructing a term-document matrix has been to collect all the documents to be analyzed and to select and use the words that influence the classification. In this study, we analyze the documents for each individual stock and select the words that are irrelevant to all categories as neutral words. We extract the words surrounding each selected neutral word and use them to generate the term-document matrix. The approach starts from the idea that stock movement is only weakly related to the presence of the neutral words themselves, whereas the words surrounding a neutral word are more likely to affect stock price movements. The generated term-document matrix is then applied to an algorithm that classifies stock price fluctuations. We first removed stop words and selected neutral words for each stock. We then excluded, from the selected words, those that also appear in news articles for other stocks. 
Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. We used three months of news articles as training data and applied the remaining month of articles to the model to predict the next day's stock price movements. We used SVM, boosting, and random forest to build the models and predict stock price movements. The stock market was open for a total of 80 days during the four months (2016/02/01 ~ 2016/05/31); the first 60 days were used as the training set and the remaining 20 days as the test set. The word-based algorithm proposed in this study showed better classification performance than the word selection method based on sparsity. This study predicted stock price fluctuations by collecting and analyzing news articles on the top 10 stocks by market capitalization. We used a classification model based on the term-document matrix to estimate stock price fluctuations, and compared the existing sparsity-based word extraction method with the suggested method of removing words from the term-document matrix. The suggested method differs from the existing word extraction method in that it uses not only the news articles for the corresponding stock but also other news items to determine which words to extract. In other words, it removed not only the words that appeared in both the rise and fall categories but also the words that commonly appeared in the news for other stocks. When prediction accuracy was compared, the suggested method showed higher accuracy. The limitations of this study are that stock price prediction was set up only to classify rise and fall, and that the experiment was conducted only for the top ten stocks. The 10 stocks used in the experiment do not represent the entire stock market. In addition, investment performance is difficult to demonstrate because stock price fluctuation and rate of return may differ. 
Therefore, further research is needed that uses more stocks and predicts yield through trading simulation.
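
The neutral-word idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the token-list input, and the fixed `window` size are assumptions made here for illustration, since the abstract does not say how many surrounding words were kept.

```python
from collections import Counter


def context_terms(tokens, neutral_words, window=2):
    """Return tokens lying within `window` positions of any neutral word.

    The neutral words themselves are dropped; only their neighbours are kept.
    The window size is a hypothetical parameter, not taken from the paper.
    """
    positions = set()
    for i, tok in enumerate(tokens):
        if tok in neutral_words:
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            positions.update(range(start, stop))
    return [tokens[i] for i in sorted(positions) if tokens[i] not in neutral_words]


def term_document_matrix(docs, neutral_words, window=2):
    """Build a (vocabulary, count matrix) pair from context terms only."""
    counts = [Counter(context_terms(doc, neutral_words, window)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


# Toy documents, already tokenized and stop-word filtered (made-up data).
docs = [
    ["company", "said", "profit", "surged", "today"],
    ["company", "said", "sales", "fell"],
]
vocab, matrix = term_document_matrix(docs, neutral_words={"said"}, window=1)
```

Each row of `matrix` could then be paired with a rise/fall label and passed to a classifier such as the SVM, boosting, or random forest models named in the abstract.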

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.23-46 / 2021
  • Collaborative filtering, which is often used in personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products to them based on their purchase history. However, traditional collaborative filtering has difficulty calculating similarity for new customers or products because it calculates similarities based on direct connections and common features among customers. For this reason, hybrid techniques were designed that also use content-based filtering. On the other hand, efforts have been made to solve these problems by applying the structural characteristics of social networks. This approach calculates the similarity between two customers indirectly, through the similar customers placed between them. That is, a customer network is created from purchasing data, and the similarity between two customers is calculated from the features of the network that indirectly connects them. Such similarity can be used as a measure to predict whether the target customer accepts recommendations. The centrality metrics of networks can be utilized to calculate these similarities. Different centrality metrics matter because they may have different effects on recommendation performance. Furthermore, this study considers that the effect of these centrality metrics on recommendation performance may vary depending on the recommender algorithm. In addition, recommendation techniques using network analysis can be expected to improve recommendation performance not only for new customers or products but also for all customers or products. By treating a customer's purchase of an item as a link generated between the customer and the item on the network, the prediction of user acceptance of a recommendation is solved as a prediction of whether a new link will be created between them. 
Because classification models fit the binary problem of whether a link is formed or not, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) were selected for the research. The performance evaluation used order data collected from an online shopping mall over four years and two months. The first three years and eight months of records were organized into the social network, and the next four months' records were used to train and evaluate the recommender models. Experiments with the centrality metrics applied to each model show that the recommendation acceptance rates of the centrality metrics differ across algorithms at a meaningful level. In this work, we analyzed only four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except support vector machines. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranks in the middle across all models, while betweenness centrality always ranks higher than degree centrality. Finally, closeness centrality is characterized by distinct differences in performance according to the model. It ranks first in logistic regression, artificial neural network, and decision tree with numerically high performance. However, it records very low rankings, with low performance levels, in support vector machine and k-nearest neighbors. As the experimental results reveal, in a classification model, network centrality metrics over a subnetwork that connects two nodes can effectively predict the connectivity between the two nodes in a social network. Furthermore, each metric performs differently depending on the type of classification model. 
This result implies that choosing appropriate metrics for each algorithm can lead to higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model. Closeness centrality could additionally be considered to obtain higher performance for certain models.
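
As a rough illustration of how a customer network and one centrality metric might be derived from purchase records, here is a minimal pure-Python sketch. The network construction (linking customers who bought the same item) and the function names are assumptions for illustration; the abstract does not specify how the authors built their network or subnetworks.

```python
from collections import defaultdict
from itertools import combinations


def customer_network(purchases):
    """Link two customers whenever they bought at least one common item.

    `purchases` is a list of (customer, item) pairs. This co-purchase
    projection is an assumed construction, not the paper's exact method.
    """
    buyers = defaultdict(set)
    adjacency = defaultdict(set)
    for customer, item in purchases:
        buyers[item].add(customer)
        adjacency[customer]  # register the node even if it stays isolated
    for group in buyers.values():
        for a, b in combinations(sorted(group), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def degree_centrality(adjacency):
    """Degree centrality: number of neighbours divided by (n - 1)."""
    n = len(adjacency)
    if n <= 1:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


# Made-up order data: customers a-d and items x-z.
purchases = [("a", "x"), ("b", "x"), ("b", "y"), ("c", "y"), ("d", "z")]
network = customer_network(purchases)
centrality = degree_centrality(network)
```

Centrality values computed this way could serve as input features for the classifiers listed in the abstract; betweenness, closeness, and eigenvector centrality would need their own routines or a graph library.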

The Evaluation of Clinical Usefulness on Application of Myocardial Extract in Quantitative Perfusion SPECT (QPS 프로그램에서 Myocardial extract 적용에 따른 임상적 유용성 평가)

  • Yun, Jong-Jun;Lim, Yeong-Hyeon;Lee, Mu-Seok;Song, Hyeon-Seok;Jeong, Ji-Uk;Park, Se-Yun;Kim, Jae-Hwan;Kim, Jeong-Uk
    • The Korean Journal of Nuclear Medicine Technology / v.15 no.2 / pp.88-93 / 2011
  • Purpose: Software packages used for quantitative assessment of myocardial perfusion SPECT, such as AutoQUANT, are reported to differ in their data analysis methods. Measurement differences between programs are therefore expected even when data from the same patient are analyzed quantitatively. The purpose of this study is to offer a comparative analysis of the myocardial extract method in Quantitative Perfusion SPECT (QPS). Materials and methods: We analyzed 51 patients who were examined by Tc-99m MIBI gated myocardial SPECT in the nuclear medicine department of Pusan National University Hospital from June to December 2010 (34 men, 17 women, mean age 66.5 ± 9.9). We acquired the extracted images with the myocardial extract protocol. The QPS program, which uses the AutoQUANT software, measured TID (Transient Ischemic Dilation), ESD (Extent of Stress Defect), and SSS (Summed Stress Score), and the results were then analyzed. Results: With myocardial extract applied, the correlations were TID (r=0.98), ESD (r=0.99), and SSS (r=0.99). At the 95% confidence level, there was no statistically significant difference (TID p=0.78, ESD p=0.31, SSS p=0.19). A blinded qualitative review by a physician also found no difference. Conclusion: The quantitative indices in the QPS program showed good correlation, and the results showed no statistically significant difference. The variance between methods was small; therefore, the functional parameters from each method can be used interchangeably. We also expect the approach to contribute to patient satisfaction.

The Effects of Sentiment and Readability on Useful Votes for Customer Reviews with Count Type Review Usefulness Index (온라인 리뷰의 감성과 독해 용이성이 리뷰 유용성에 미치는 영향: 가산형 리뷰 유용성 정보 활용)

  • Cruz, Ruth Angelie;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.43-61 / 2016
  • Customer reviews help potential customers make purchasing decisions. However, the prevalence of reviews on websites pushes customers to sift through them and shifts the focus from a mere search to identifying which of the available reviews are valuable and useful for the purchasing decision at hand. To identify useful reviews, websites have developed different mechanisms to give customers options when evaluating existing reviews. Websites allow users to rate a customer review as helpful or not. Amazon.com uses a ratio-type helpfulness index, while Yelp.com uses a count-type usefulness index. This usefulness index surfaces helpful reviews for future potential purchasers. This study investigated the effects of sentiment and readability on useful votes for customer reviews. Similar studies on sentiment and readability have focused on the ratio-type usefulness index utilized by websites such as Amazon.com. In this study, Yelp.com's count-type usefulness index for restaurant reviews was used to investigate the relationship between sentiment/readability and usefulness votes. Yelp.com's online customer reviews for stores in the beverage and food categories were used for the analysis. In total, 170,294 reviews containing information on a store's reputation and popularity were used. The control variables were the review length, store reputation, and popularity; the independent variables were the sentiment and readability; and the dependent variable was the number of helpful votes. The review rating is the moderating variable for the review sentiment and readability. The length is the number of characters in a review. The popularity is the number of reviews for a store, and the reputation is the average rating of all reviews for a store. The readability of a review was calculated with the Coleman-Liau index. The sentiment is a positivity score for the review as calculated by SentiWordNet. 
The review rating is a preference score selected from 1 to 5 (stars) by the review author. The dependent variable (i.e., usefulness votes) used in this study is a count variable. Therefore, the Poisson regression model, which is commonly used to account for the discrete and nonnegative nature of count data, was applied in the analyses. The increase in helpful votes was assumed to follow a Poisson distribution. Because the Poisson model assumes an equal mean and variance and the data were over-dispersed, a negative binomial distribution model that allows for over-dispersion of the count variable was used for the estimation. Zero-inflated negative binomial regression was used to model count variables with excessive zeros and over-dispersed count outcome variables. With this model, the excess zeros were assumed to be generated through a separate process from the count values and therefore should be modeled as independently as possible. The results showed that positive sentiment had a negative effect on gaining useful votes for positive reviews but no significant effect on negative reviews. Poor readability had a negative effect on gaining useful votes and was not moderated by the review star ratings. These findings yield considerable managerial implications. The results are helpful for online websites when analyzing their review guidelines and identifying useful reviews for their business. Based on this study, positive reviews are not necessarily helpful; therefore, restaurants should consider which type of positive review is helpful for their business. Second, this study is beneficial for businesses and website designers in creating review mechanisms to know which type of reviews to highlight on their websites and which type of reviews can be beneficial to the business. Moreover, this study highlights the review systems employed by websites to allow their customers to post rating reviews.
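
The readability measure named in this abstract, the Coleman-Liau index, is commonly defined as CLI = 0.0588·L − 0.296·S − 15.8, where L is the average number of letters per 100 words and S is the average number of sentences per 100 words. The sketch below applies that formula in Python; the regular-expression tokenization and the function name are assumptions, and the authors' own preprocessing is not described in the abstract.

```python
import re


def coleman_liau_index(text):
    """Compute the Coleman-Liau readability index of an English text.

    Words are runs of ASCII letters (with an optional apostrophe part) and
    sentences are counted by terminal punctuation; both are simplifying
    assumptions for this sketch.
    """
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(ch.isalpha() for word in words for ch in word)
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    letters_per_100 = letters / len(words) * 100
    sentences_per_100 = sentences / len(words) * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


# Made-up review text: 8 words, 29 letters, 2 sentences.
score = coleman_liau_index("The food was great. We will come back.")
```

A higher score corresponds to text that requires a higher reading grade level, which is the sense in which the abstract refers to poor readability.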

ERF Components Patterns of Causal Question Generation during Observation of Biological Phenomena : A MEG Study (생명현상 관찰에서 나타나는 인과적 의문 생성의 ERF 특성 : MEG 연구)

  • Kwon, Suk-Won;Kwon, Yong-Ju
    • Journal of Science Education / v.33 no.2 / pp.336-345 / 2009
  • The purpose of this study is to analyze the patterns of ERF components for causal questions generated during the observation of biological phenomena. First, a system that presents pictures evoking causal questions about biological phenomena (evoked picture system) was developed using a cognitive psychology approach. The ERF patterns of causal questions, based on time-series brain processing, were observed using MEG. The evoked picture system was developed with an R&D method by a team of science education experts and researchers. Tasks were classified into animal (A), microbe (M), and plant (P) tasks according to biological species, and into interaction (I), all (A), and part (P) based on the interaction between different species. The MEG task paradigm was developed in collaboration with the MEG team at Seoul National University Hospital. MEG data on the generation of scientific questions were collected from five female graduate students. To examine the unique characteristics of causal questions, MEG ERF components were analyzed. As a result, a total of 100 pictures were produced for the evoked picture system, and four ERF components were found: M1 (100~130 ms), M2 (220~280 ms), M3 (320~390 ms), and M4 (460~520 ms). The present study could guide personalized teaching-learning methods through the development and application of scientific question learning programs.

Independent Verification Program for High-Dose-Rate Brachytherapy Treatment Plans (고선량률 근접치료계획의 정도보증 프로그램)

  • Han Youngyih;Chu Sung Sil;Huh Seung Jae;Suh Chang-Ok
    • Radiation Oncology Journal / v.21 no.3 / pp.238-244 / 2003
  • Purpose: The planning of High-Dose-Rate (HDR) brachytherapy treatments is becoming individualized and more dependent on the treatment planning system. Therefore, computer software has been developed to perform independent point dose calculations and to display isodose distribution curves integrated into the patient anatomy images. Materials and Methods: As primary input data, the program takes patients' planning data, including the source dwell positions, dwell times, and the doses at reference points, computed by an HDR treatment planning system (TPS). Dosimetric calculations were performed in a $10\times12\times10\;\mathrm{cm}^3$ grid space using the Interstitial Collaborative Working Group (ICWG) formalism and an anisotropy table for the HDR Iridium-192 source. The computed doses at the reference points were automatically compared with the corresponding results of the TPS. The MR and simulation film images were then imported, and the isodose distributions on the axial, sagittal, and coronal planes intersecting a user-selected point were superimposed on the imported images and displayed. The accuracy of the software was tested in three benchmark plans performed by the Gamma-Med 12i TPS (MDS Nordion, Germany). Nine patients' plans generated by Plato (Nucletron Corporation, The Netherlands) were verified by the developed software. Results: The absolute doses computed by the developed software agreed with the commercial TPS results to within 2.8% in the benchmark plans. The isodose distribution plots showed excellent agreement, except in the tip region of the source's longitudinal axis, where a slight deviation was observed. In clinical plans, the secondary dose calculations deviated from the TPS plans by about 3.4% on average. 
Conclusion: Accurate validation of complicated treatment plans is possible with the developed software, and the quality of the HDR treatment plan can be improved with the isodose display integrated into the patient anatomy information.

Study on Ego states in the view of Transactional analysis, Coping style and Health states of Nursing Students (상호교류분석으로 본 간호학생의 자아상태와 스트레스 대처방법 및 건강상태에 관한 연구)

  • Won, Jeong-Sook;Kim, Jeong-Hwa
    • Journal of East-West Nursing Research / v.7 no.1 / pp.68-81 / 2002
  • The purpose of this study is to analyze the types of ego states and stress coping styles of female college students studying nursing. The study is conducted from the viewpoint of Transactional Analysis and is designed to examine descriptive correlations between the type of ego state and stress coping style. The subjects were 144 freshmen and sophomores and 138 juniors and seniors at K nursing college in Seoul. The survey period was Sept. 14, 2002 to Oct. 26, 2002. Ego states were measured with the 50-item Egogram questionnaire devised by Dusay (1997). Coping style was measured with the instrument of Folkman & Lazarus (1984), as translated and modified by Han and Oh (1990). Health state was measured with the standardized health inventory (Cornell Medical Index: CMI) adapted for Koreans by Ko and Park (1980). Means and standard deviations, t-tests, and Pearson correlations were computed using SPSS PC+. The results were as follows: 1) In both groups, the ego state profile peaked at NP (maximum value), was next highest at A, and sloped down to AC. In the comparison of ego state types between the two groups, only CP differed, with upper-year students scoring higher than lower-year students (t=2.28, p=.023). 2) The stress coping style of the students as a whole was high and affirmative. In particular, hopeful aspect (t=.67, p=.05) and relaxation of tension (t=-2.16, p=.03) differed significantly between the groups. 3) For physical health state, there was a significant difference in past history (t=2.50, p=.013); for mental health state, there was a significant difference between the lower-year group (73.52) and the upper-year group (75.11) (p<.05). 
Across all areas, state of tension (t=2.13, p=.048) differed. 4) When coping style was compared between groups with high and low ego state levels, significant differences (p<.05) were found in central point of problem for types CP, NP, and A; in hopeful aspect for type FC; and in hopeful aspect and indifference for type AC. 5) When health state was compared between low and high ego state levels, in type FC the low-level group (150.29) scored higher than the high-level group (145.19), a significant difference that also held for overall health state (p=.014); in type AC, both mental state (p=.000) and overall health state (p=.015) showed differences. 6) In the correlation analysis among all students' ego states, coping style, and health state, all types of ego state showed significant differences (p<.001). Between ego state and health state, physical state peaked in type FC, and the other types showed inverse correlations. In particular, type FC showed a significant inverse correlation (p<.05). For mental state, type NP ($\gamma=.198$, p<.001) and type A ($\gamma=.166$, p<.05) showed significant positive correlations, while type AC showed an inverse correlation ($\gamma=.282$, p<.001). Between coping style and health state, indifference ($\gamma=-.157$) and relaxation of tension ($\gamma=-.158$) were significant (p<.05). For mental state, central point of problem and search for social support showed significant positive correlations (p<.05), while hopeful aspect and indifference showed significant inverse correlations (p<.001).

The Relationship of Organizational and Job Characteristics, Empowerment, Job Satisfaction and Organizational Commitment Perceived by Hospital Administrative Staffs (병원 행정인력이 인지하는 조지.직무특성, 임파워먼트, 직무만족 및 조직몰입간의 관련성)

  • 박재산
    • Health Policy and Management / v.14 no.1 / pp.65-88 / 2004
  • In general, empowerment is defined as the motivational concept of autonomy and self-efficacy. Recently, the concept of empowerment has been applied in many organizations to improve staff job satisfaction and organizational commitment. Empowerment in service organizations such as hospitals has certainly generated more publicity than in other organizations. The objectives of this study are, first, to measure the degree of hospital employees' empowerment using Spreitzer's (1995) empowerment theory; second, to analyze the causal relationships among organizational and job characteristics, the degree of empowerment, and organizational performance (job satisfaction and organizational commitment); and third, to offer a strategy for improving job satisfaction and organizational commitment. Spreitzer holds that empowerment is composed of four dimensions (meaning, competence, self-determination, and impact) and argues that various work-related characteristics are a direct cause of empowerment and an indirect cause of job satisfaction and organizational commitment, mediated by the empowerment latent variable. For this study, data were collected by self-administered questionnaires from employees working in the administrative departments of three university hospitals in the Inchon and Kyunggi-Do region. The number of cases is 181 (response rate: 86%). The collected data were analyzed with SPSS Ver. 10.0 and AMOS Ver. 4.0. First, factor analysis was used to test the validity of the variables. Second, Cronbach's alpha coefficients were calculated to test reliability. Cronbach's alpha for the empowerment variable is 0.8323, indicating no problem with internal consistency. The Cronbach's alpha values of the other variables are 0.8301 for the degree of perceived control, 0.6705 for job characteristics, 0.8787 for compensation, 0.9254 for job satisfaction, and 0.8389 for organizational commitment. 
Two survey questions on job characteristics were deleted because they lowered reliability. Third, correlation analysis was performed to test multicollinearity and the correlations among variables. There was no problem of multicollinearity. Finally, structural equation modelling (SEM) analysis was conducted to find the causal relationships among organizational and job characteristics, empowerment, job satisfaction, and organizational commitment. Sixteen variables are included in the SEM analysis. The major results of this study are as follows. First, regarding model fit, the $\chi^2$ statistic (92.187) does not fully satisfy the fit condition, but the indices GFI (0.912), AGFI (0.863), NFI (0.917), and CFI (0.928), which should exceed 0.90, are partially satisfied. Second, in hypothesis testing, all hypotheses are accepted with positive effects at the 95% or 99% confidence level (P<0.05 or P<0.001), except the effect of the compensation variable on empowerment (P=0.082). Third, regarding the direct, indirect, and total effects of the variables, the direct effects of perceived control, task characteristics, and compensation are 0.728, 2.264, and 0.328 on job satisfaction and 0.094, 1.411, and 0.418 on organizational commitment, respectively. The indirect effects of perceived control, task characteristics, and compensation are 0.311, 0.196, and 0.028 on job satisfaction and 0.210, 0.132, and 0.019 on organizational commitment, respectively. Thus, these findings imply that various work-related factors have a direct effect on empowerment and an indirect effect on the outcome variables, job satisfaction and organizational commitment. These results also show that workplace empowerment is a significant mediating factor for employees' job satisfaction and organizational commitment.
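
In structural equation modelling, the total effect of a predictor on an outcome is conventionally the sum of its direct and indirect effects. The Python sketch below applies that convention to the direct and indirect effects quoted in this abstract. The abstract does not list the total effects, so the sums are derived here for illustration and are not figures reported by the author; the variable names are likewise assumptions.

```python
# Direct and indirect effects as quoted in the abstract, keyed by outcome.
PREDICTORS = ("perceived control", "task characteristics", "compensation")

direct = {
    "job satisfaction": (0.728, 2.264, 0.328),
    "organizational commitment": (0.094, 1.411, 0.418),
}
indirect = {
    "job satisfaction": (0.311, 0.196, 0.028),
    "organizational commitment": (0.210, 0.132, 0.019),
}


def total_effects(direct, indirect):
    """Total effect = direct effect + indirect effect (standard SEM convention)."""
    totals = {}
    for outcome, direct_values in direct.items():
        pairs = zip(PREDICTORS, direct_values, indirect[outcome])
        totals[outcome] = {name: round(d + i, 3) for name, d, i in pairs}
    return totals


totals = total_effects(direct, indirect)
for outcome, effects in totals.items():
    print(outcome, effects)
```

Under this convention the indirect component is the part of each effect that passes through the empowerment latent variable, which is how the abstract describes empowerment as a mediating factor.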

Modeling of Sensorineural Hearing Loss for the Evaluation of Digital Hearing Aid Algorithms (디지털 보청기 알고리즘 평가를 위한 감음신경성 난청의 모델링)

  • 김동욱;박영철
    • Journal of Biomedical Engineering Research / v.19 no.1 / pp.59-68 / 1998
  • Digital hearing aids offer many advantages over conventional analog hearing aids. With the advent of high-speed digital signal processing chips, new digital techniques have been introduced to digital hearing aids. However, the evaluation of new ideas in hearing aids is necessarily accompanied by intensive subject-based clinical tests, which require much time and cost. In this paper, we present an objective method to evaluate and predict the performance of hearing aid systems without the help of such subject-based tests. In the hearing impairment simulation (HIS) algorithm, a sensorineural hearing impairment model is established from auditory test data of the impaired subject being simulated. The nonlinear behavior of loudness recruitment is defined using hearing loss functions generated from the measurements. To transform the natural input sound into the impaired one, a frequency sampling filter is designed. The filter is continuously refreshed with the level-dependent frequency response function provided by the impairment model. To assess the performance, the HIS algorithm was implemented in real time using a floating-point DSP. Signals processed with the real-time system were presented to normal subjects, and their auditory data as modified by the system were measured. The sensorineural hearing impairment was simulated and tested. The hearing threshold and speech discrimination tests demonstrated the effectiveness of the system for hearing impairment simulation. Using the HIS system, we evaluated three typical hearing aid algorithms.

A Study on the Effect of Cold Water Mass on Observed Air Temperature in Busan (부산지역 기온에 미치는 냉수대의 영향에 대한 연구)

  • Park, Myung-Hee;Lee, Joon-Soo;Ahn, Ji-Suk;Suh, Young-Sang;Han, In-Seong;Kim, Hae-Dong;Bae, Hun-Kyun
    • Journal of the Korean Association of Geographic Information Studies / v.17 no.3 / pp.132-146 / 2014
  • The effects of the cold air generated by a large cold water mass in the coastal area on observed air temperature in Busan were investigated using AWS (Automatic Weather Station) data for the Busan area operated by the Korea Meteorological Administration and SST (Sea Surface Temperature) data for the Gijang and Busan area operated by the Korean National Fisheries Research Development Institute. First, the temperature difference between the coastal area and the city area was about 1°C on cold water mass days, while it was about 0.5°C when no cold water mass appeared. Second, during the daytime, the temperature in the coastal area was about 1°C lower than that in the city area on cold water mass days, but the difference was only about 0.4°C without a cold water mass. At night, on the other hand, the temperature in the coastal area was about 1.2°C lower than that in the city area on cold water mass days, and the difference was about 0.9°C without a cold water mass. As a result, temperature differences at night were larger than those in the daytime whether or not a cold water mass appeared. The larger temperature differences at night might be caused by the urban heat island phenomenon.