Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)
Journal of Intelligence and Information Systems / v.23 no.2 / pp.123-138 / 2017
Because the stock market is driven by traders' expectations, many studies have tried to predict stock price movements by analyzing various sources of text data. This research has examined not only the relationship between text data and stock price fluctuations but also stock trading based on news articles and social media responses. Like other text mining approaches, studies that predict stock price movements typically build a term-document matrix and apply classification algorithms to it. Because documents contain many words, it is better to build the term-document matrix from the words that contribute most to classification. Words with very low frequency or importance are removed on the basis of word frequency, and words can also be selected by measuring how much each one contributes to classifying documents correctly. The conventional approach is therefore to collect all the documents to be analyzed and keep the words that influence the classification. In this study, we instead analyze the documents for each individual stock and select words that are irrelevant to every category (rise and fall) as neutral words. We then extract the words surrounding each selected neutral word and use them to generate the term-document matrix. The idea is that stock movements are only weakly related to the presence of the neutral words themselves, whereas the words surrounding them are more likely to affect stock price movements. The generated term-document matrix is then fed to an algorithm that classifies stock price movements. Specifically, we first removed stop words and selected neutral words for each stock, and then excluded from the selected words those that also appear in news articles about other stocks.
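A minimal sketch of the neutral-word idea, assuming pre-tokenized documents grouped by rise/fall label for one stock. The function names (`neutral_words`, `context_terms`, `term_document_matrix`), the window size, and the plain set-membership criterion are hypothetical illustrations, not the authors' implementation; the English tokens stand in for Korean news tokens.

```python
from collections import Counter


def neutral_words(up_docs, down_docs, other_stock_docs, stop_words=frozenset()):
    """Words that occur in both rise-day and fall-day news for this stock
    (so they do not separate the categories) but never in other stocks' news.
    ASSUMPTION: plain set membership; the paper's criterion may be stricter."""
    up = {w for doc in up_docs for w in doc} - stop_words
    down = {w for doc in down_docs for w in doc} - stop_words
    other = {w for doc in other_stock_docs for w in doc}
    return (up & down) - other


def context_terms(doc, neutral, window=2):
    """Collect words within `window` positions of each neutral word,
    dropping the neutral words themselves (they carry little signal)."""
    terms = []
    for i, w in enumerate(doc):
        if w in neutral:
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            terms.extend(t for t in doc[lo:hi] if t not in neutral)
    return terms


def term_document_matrix(docs, neutral, window=2):
    """Rows = documents, columns = context terms, cells = counts."""
    rows = [Counter(context_terms(doc, neutral, window)) for doc in docs]
    vocab = sorted(set().union(*rows))
    return vocab, [[row[t] for t in vocab] for row in rows]


# Hypothetical toy data for one stock.
up = [["samsung", "earnings", "beat", "chip"]]
down = [["samsung", "chip", "weak", "demand"]]
others = [["chip", "market"]]
neutral = neutral_words(up, down, others)  # {'samsung'}
vocab, tdm = term_document_matrix(up + down, neutral)
print(vocab)  # ['beat', 'chip', 'earnings', 'weak']
print(tdm)    # [[1, 0, 1, 0], [0, 1, 0, 1]]
```

Here "chip" appears in both categories but also in another stock's news, so it is not neutral and survives as a context feature, while "samsung" becomes the anchor whose neighbours form the columns.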
Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. We used the first three months of news articles as training data and applied the model to the remaining month of articles to predict each stock's price movement on the next day. SVM, Boosting, and Random Forest were used to build the models and predict stock price movements. The stock market was open for a total of 80 days during the four months (2016/02/01 ~ 2016/05/31); the first 60 days were used as the training set and the remaining 20 days as the test set. The proposed neutral-word-based method showed better classification performance than word selection based on sparsity. In summary, this study predicted stock price movements by collecting and analyzing news articles on the top 10 stocks by market capitalization. We used a classification model based on the term-document matrix to estimate stock price fluctuations and compared the performance of the existing sparsity-based word extraction method with the proposed method of removing words from the term-document matrix. The proposed method differs from the existing word extraction method in that it uses not only the news articles on the corresponding stock but also news on other stocks to determine which words to extract. In other words, it removes not only the words that appear in both rise and fall news but also the words that commonly appear in news on other stocks. When prediction accuracy was compared, the proposed method showed higher accuracy. A limitation of this study is that stock price prediction was framed as classifying rises and falls, and the experiment covered only the top ten stocks, which do not represent the entire stock market. In addition, it is difficult to demonstrate investment performance, because the direction of price movement and the rate of return may differ.
Therefore, further research is needed that uses more stocks and predicts returns through trading simulation.
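The evaluation setup (next-day rise/fall labels, first 60 trading days for training, last 20 for testing) can be sketched as follows, assuming one feature row per trading day and daily closing prices. Treating an unchanged close as "fall" and all helper names are assumptions for illustration; an SVM, Boosting, or Random Forest classifier would take the place of the majority-class placeholder used here.

```python
def next_day_labels(closes):
    """1 = rise, 0 = fall. Day t is labelled by comparing close[t+1] with close[t].
    ASSUMPTION: an unchanged close counts as 'fall'. Returns len(closes) - 1 labels."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(closes, closes[1:])]


def chronological_split(rows, labels, n_train):
    """Split by position (time order), never at random, so every test day
    comes after every training day."""
    return (rows[:n_train], labels[:n_train]), (rows[n_train:], labels[n_train:])


def majority_baseline(train_labels, n_test):
    """Placeholder model: always predict the most frequent training label
    (ties go to 'fall')."""
    guess = 1 if sum(train_labels) * 2 > len(train_labels) else 0
    return [guess] * n_test


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy series; the study used 80 trading days split 60/20.
closes = [10.0, 11.0, 10.5, 10.5, 12.0]
labels = next_day_labels(closes)  # [1, 0, 0, 1]
rows = ["day0 news", "day1 news", "day2 news", "day3 news"]
(train_x, train_y), (test_x, test_y) = chronological_split(rows, labels, n_train=3)
pred = majority_baseline(train_y, len(test_y))
print(accuracy(test_y, pred))  # 0.0
```

Splitting by position rather than at random mirrors the 60/20 chronological split and prevents the model from training on news published after the days it is tested on.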
Data on reported cancer deaths in Gyeongsangbuk-do province from 1991 to 1998 were collected and analyzed using the public health network as well as the existing mortality reporting system, in order to provide accurate data on reported cancer deaths and to support the establishment of a high-quality district health plan. The overall crude cancer death rate in Gyeongsangbuk-do was 74.56 deaths per 100,000 persons in 1991 and increased to 79.22 in 1998. Cancer accounted for 16.7% of all deaths in 1991 and 19.3% in 1998; the proportion rose from 19.4% to 22.3% among men and from 12.4% to 15.5% among women, a larger increase among women. In 1991, the leading causes of cancer death were gastric cancer (41.5%), liver cancer (28.8%), and lung and bronchogenic carcinoma (8.7%); in 1998 the order was the same: gastric cancer (24.7%), liver cancer (22.7%), and lung and bronchogenic carcinoma (19.3%). In 1991, gastric cancer was the most common cause of cancer death among both men and women (40.2% and 44.7%, respectively), followed by liver cancer (33.7% and 16.7%, respectively) and lung and bronchogenic carcinoma (10.2% and 5.0%, respectively). In 1998, gastric cancer (27.8%) was still the most common type among both men and women, followed by liver cancer (18.5%) and lung and bronchogenic carcinoma (12.7%); gastric cancer showed the largest decrease and lung and bronchogenic carcinoma the largest increase. The age-adjusted mortality rates for gastric cancer, hepatoma, and laryngeal carcinoma decreased in both men and women, as did the rate for uterine cancer in women. The age-adjusted mortality rates for lung and bronchogenic carcinoma, pancreatic cancer, and rectal cancer increased in both men and women, as did the rate for breast cancer in women.
The overall age-adjusted death rate, calculated with the 1995 population as the standard, was 84.25 in 1991 and decreased to 77.67 in 1998. The male rate decreased significantly from 119.81 in 1991 to 101.82 in 1998, whereas the female rate increased from 48.64 in 1991 to 53.80 in 1998. A census of cancer deaths based on accurate death records is important for establishing sound, high-quality district health and medical plans and policies. Efforts to improve the accuracy of death reports through the health facility network, as attempted in this study, can be continued. Furthermore, the Health and Welfare Department needs a way to use these death reports to improve the present reporting system. Finally, additional studies are needed to determine how much the supplemented death reports in this study improved accuracy.
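The difference between crude and age-adjusted rates comes from direct standardization: each age group's observed rate is applied to a fixed standard population (here, the 1995 population) and the expected deaths are divided by that population's total. A minimal sketch, with two hypothetical age groups and invented counts purely for illustration:

```python
def crude_rate(deaths, population, per=100_000):
    """Deaths per `per` persons, ignoring age structure."""
    return deaths / population * per


def age_adjusted_rate(deaths_by_age, pop_by_age, std_pop_by_age, per=100_000):
    """Direct standardization: sum over age groups of (observed rate x standard
    population), divided by the standard population total."""
    expected = sum(d / p * s for d, p, s in
                   zip(deaths_by_age, pop_by_age, std_pop_by_age))
    return expected / sum(std_pop_by_age) * per


# Hypothetical: [younger, older] age groups.
deaths = [10, 90]
population = [10_000, 10_000]
standard = [30_000, 10_000]  # a standard population weighted toward the young
print(round(crude_rate(sum(deaths), sum(population)), 6))           # 500.0
print(round(age_adjusted_rate(deaths, population, standard), 6))    # 300.0
```

Because the hypothetical standard population is younger than the observed one, the adjusted rate falls below the crude rate; comparing rates across years on one fixed standard removes the effect of population ageing.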
Purpose: This study analyzes the process of confirming the usefulness of a latest-generation linear accelerator and newly introduced measurement equipment, and the problems that arose while preparing them for clinical use, so that it can help institutions that plan to introduce such equipment in the future. Materials and Methods: All measurements were performed on a TrueBEAM STX (Varian, USA); plans specific to each energy and irradiation condition were created, and their dose distributions were calculated with a computerized treatment planning system (Eclipse ver 10.0.39, Varian, USA). The measurement performance of MapCHECK 2 and the causes of its measurement errors were analyzed. In order to verify the performance of the MapCHECK 2, 6X, 6X-FFF, 10X, 10X-FFF, 15X field size
Recently, as information and communication technology has developed, the ubiquitous environment has been discussed from diverse perspectives. A ubiquitous environment is one in which data can be transferred through networks regardless of physical space, virtual space, time, or location. Realizing it requires Pervasive Sensing technology, which recognizes users' data without a border between physical and virtual space. It also requires other recent technologies, such as Context-Awareness technology, which constructs the context around the user by sharing the data acquired through Pervasive Sensing, and linkage technology, which prevents information loss across wired and wireless networks and databases. In particular, Pervasive Sensing is regarded as an essential technology that enables user-oriented services by recognizing users' needs even before they ask. These technologies give the ubiquitous environment many characteristics, such as ubiquity, abundance of data, mutuality, high information density, individualization, and customization. Among them, information density refers to the amount and quality of accessible information, which Pervasive Sensing allows to be stored in bulk with assured quality. Using this, companies have become able to provide personalized content (or information) to target customers. Above all, a growing number of studies address recommender systems, which provide what customers need even when the customers do not explicitly ask for it. In commerce, recommender systems are well known for expanding selling opportunities and reducing customers' search costs, because they find and provide information in advance according to customers' traits and preferences.
Recommender systems have proved their usefulness through methodologies and experiments in many different fields since the mid-1990s. However, most research on recommender systems so far has targeted products or information in internet or mobile contexts, and there is little research on recommending suitable stores to customers in a ubiquitous environment. In a ubiquitous environment, customers' behavior can be tracked even when they shop in an offline marketplace, just as it is in an online marketplace. Unlike on the existing internet, in a ubiquitous environment interest is growing in stores that provide information according to customers' traffic lines (movement paths). In other words, the same product can be purchased in several different stores, and the preferred store can differ from customer to customer depending on personal preferences such as the traffic line between stores, location, atmosphere, quality, and price. Krulwich (1997) developed Lifestyle Finder, which recommends products and stores using demographic information and purchasing information generated in internet commerce. Fano (1998) created Shopper's Eye, an information-providing system that shows information about the store closest to the customer's present location when the customer sends a to-buy list. Sadeh (2003) developed MyCampus, which recommends appropriate information and stores according to the schedule saved in a customer's mobile device. Moreover, Keegan and O'Hare (2004) proposed EasiShop, which provides suitable store information, including price, after-sales service, and accessibility, after analyzing the customer's to-buy list and current location.
However, Krulwich (1997) does not reflect the characteristics of physical space because it is based on the online commerce context; Keegan and O'Hare (2004) only provide information about stores related to a product; and Fano (1998) does not fully consider the relationship between customers' preferences for stores and the stores themselves. Sadeh (2003) proposed a recommender system that reflects context and preference information in addition to the characteristics of physical space and tested it on a campus. Yet these studies rely on customers' location and preference information, which raises potential privacy problems. According to Al-Muhtadi (2002), Beresford and Stajano (2003), and Ren (2006), invasion of privacy and of personal information is the primary point of controversy in a ubiquitous environment. Additionally, as Srivastava (2000) notes, individuals want to remain anonymous to protect their personal information. Therefore, in this paper, we propose a methodology for recommending stores in a U-market in a ubiquitous environment without using personal information, in order to protect individual information and privacy. The main idea behind our methodology is based on the Feature Matrices model (FM model; Shahabi and Banaei-Kashani, 2003), which, like Collaborative Filtering, uses clusters of customers' similar transaction data. Unlike Collaborative Filtering, however, this methodology avoids the problems of personal information and privacy because it does not need to know exactly who the customer is. We compare the methodology with a single-trait model (vector model) based on, for example, visitor logs, and examine how much the recommendations actually improve when context information is used. Because real U-market data are hard to find, we experimented with actual data from a real department store, supplemented with context information.
The U-market recommendation procedure proposed in this paper consists of four major phases. The first phase collects and preprocesses data for analyzing customers' shopping patterns; the traits of each shopping pattern are expressed as N-dimensional feature matrices. In the second phase, similar shopping patterns are grouped into clusters and a representative pattern is derived for each cluster; the distance between shopping patterns is calculated with the Projected Pure Euclidean Distance (Shahabi and Banaei-Kashani, 2003). The third phase finds the representative pattern most similar to a target customer while dynamically tracing and saving that customer's shopping information. In the fourth phase, the next store is recommended based on the physical distance between the stores of the representative pattern and the target customer's present location. We evaluated the accuracy of the recommendation method on actual data from a department store. Because tracking customers in real time is technically difficult, we extracted purchase-related information and added context information to each transaction. As a result, recommendations based on the FM model with purchasing and context information were more stable and accurate than those of the vector model. We also found that recommendations became more precise as more shopping information accumulated. Because a ubiquitous environment cannot yet be fully realized, we could not reflect every kind of context, but a more explicit analysis is expected to become possible once a practical system is implemented.
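Phases 3 and 4 can be sketched as follows, under several labelled assumptions: the feature matrices are simplified to two orders (store visit counts and store-to-store transition counts), the cluster representatives from phase 2 are taken as given store paths, the projected distance is read as a Euclidean distance over only the dimensions the customer's partial path has observed, and store coordinates are hypothetical. This is an illustration, not the paper's exact FM model or distance definition.

```python
import math


def pattern_vector(path, stores):
    """Flatten a shopping path into order-1 (visit counts) and order-2
    (store-to-store transition counts) features.
    ASSUMPTION: a simplified two-order version of the FM model."""
    idx = {s: k for k, s in enumerate(stores)}
    n = len(stores)
    vec = [0] * (n + n * n)
    for s in path:
        vec[idx[s]] += 1
    for a, b in zip(path, path[1:]):
        vec[n + idx[a] * n + idx[b]] += 1
    return vec


def projected_distance(target, pattern):
    """ASSUMED reading of the projected distance: Euclidean distance over only
    the dimensions the (partial) target pattern has observed."""
    dims = [k for k, v in enumerate(target) if v != 0]
    return math.sqrt(sum((target[k] - pattern[k]) ** 2 for k in dims))


def recommend_next(current_path, representatives, stores, coords):
    """Match the customer's partial path to the closest representative pattern,
    then recommend that pattern's physically nearest unvisited store."""
    target = pattern_vector(current_path, stores)
    best = min(representatives,
               key=lambda rep: projected_distance(target, pattern_vector(rep, stores)))
    visited = set(current_path)
    here = coords[current_path[-1]]  # present location = last store visited
    candidates = [s for s in best if s not in visited]
    if not candidates:
        return None
    return min(candidates, key=lambda s: math.dist(here, coords[s]))


# Hypothetical stores, coordinates, and cluster representatives.
stores = ["A", "B", "C", "D"]
coords = {"A": (0, 0), "B": (1, 0), "C": (5, 0), "D": (2, 0)}
reps = [["A", "B", "D"], ["C", "D", "A"]]
print(recommend_next(["A", "B"], reps, stores, coords))  # D
```

Only anonymous paths are compared, which is the privacy property the text emphasizes: the customer is matched to a cluster pattern, never to an identified profile.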
Purpose: In the early days of PET/CT, the CT component was used mainly for attenuation correction, but CT using MDCT is now commonly used and performs well for anatomical diagnosis. Our hospital has improved the accuracy and convenience of diagnosing and evaluating coronary heart disease by performing myocardial perfusion SPECT, myocardial PET with FDG, and coronary CT angiography (coronary CTA) concurrently, using a 64-slice PET/CT. This report presents the protocol and images based on about 400 coronary heart disease examinations performed since the 64-channel PET/CT was installed in July 2007. Materials and Methods: The equipment used for this examination is a Discovery VCT (DVCT), which consists of a 64-slice CT and a PET with BGO (
News articles are the most suitable medium for examining events occurring at home and abroad. In particular, as information and communication technology has produced many kinds of online news media, news about events in society has increased greatly. Automatically summarizing key events from massive amounts of news data would therefore help users see many events at a glance. In addition, an event network built on the relevance between events could greatly help readers understand current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social news articles from March 2016 to March 2017 and, in preprocessing, used NPMI and Word2Vec to keep only meaningful words and to integrate synonyms. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date, and events were detected by finding peaks in each topic's distribution. Topic modeling yielded a total of 32 topics, and the time at which each event occurred was inferred from the point at which the corresponding topic distribution surged. In total, 85 events were detected, of which 16 final events were selected and presented after filtering with Gaussian smoothing. We also calculated relevance scores between the detected events to construct the event network: the relevance between co-occurring events was computed with the cosine coefficient, and related events were connected. Finally, the event network was built by treating each event as a vertex and the relevance score between two events as the weight of the edge connecting their vertices.
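The two numerical steps above, detecting event peaks in a daily topic series after Gaussian smoothing and linking events by cosine relevance, can be sketched as follows. The peak rule (local maximum above mean + z·std), the kernel width, the relevance threshold, and the event vectors (for example, each event's weight across documents or days) are all hypothetical choices for illustration, not the parameters used in the study.

```python
import math
from itertools import combinations


def gaussian_smooth(series, sigma=1.0):
    """Smooth a daily topic-weight series with a truncated Gaussian kernel,
    renormalising the weights near the edges."""
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    out = []
    for t in range(len(series)):
        num = den = 0.0
        for j, w in enumerate(kernel):
            s = t + j - radius
            if 0 <= s < len(series):
                num += w * series[s]
                den += w
        out.append(num / den)
    return out


def detect_peaks(series, sigma=1.0, z=1.0):
    """Event days = local maxima of the smoothed series above mean + z * std.
    ASSUMPTION: this threshold rule is illustrative only."""
    smooth = gaussian_smooth(series, sigma)
    mean = sum(smooth) / len(smooth)
    std = math.sqrt(sum((x - mean) ** 2 for x in smooth) / len(smooth))
    peaks = []
    for t, x in enumerate(smooth):
        left = smooth[t - 1] if t > 0 else -math.inf
        right = smooth[t + 1] if t + 1 < len(smooth) else -math.inf
        if x > left and x >= right and x > mean + z * std:
            peaks.append(t)
    return peaks


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_event_network(event_vectors, threshold=0.5):
    """Vertices = events; an edge weighted by cosine relevance links two
    events whose score reaches the (hypothetical) threshold."""
    edges = {}
    for (a, va), (b, vb) in combinations(event_vectors.items(), 2):
        score = cosine(va, vb)
        if score >= threshold:
            edges[(a, b)] = score
    return edges


series = [0.1] * 10 + [0.9] + [0.1] * 10  # one topic surging on day 10
print(detect_peaks(series))  # [10]
events = {"E1": [1, 1, 0, 0], "E2": [1, 1, 0, 0], "E3": [0, 0, 1, 1]}
print(sorted(build_event_network(events)))  # [('E1', 'E2')]
```

Smoothing before peak picking suppresses single-day noise, which is why it can reduce a larger set of raw surges (85 in the study) to a smaller set of robust events.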
The event network constructed with our method helped us arrange the major political and social events in Korea over the past year in chronological order and, at the same time, identify which events are related to one another. Our approach differs from existing event detection methods in that LDA topic modeling makes it easy to analyze large amounts of data and to identify relationships between events that were difficult to detect with existing methods. In text preprocessing, we applied various text mining techniques and Word2Vec to improve the accuracy of extracting proper nouns and compound nouns, which has been difficult in previous analyses of Korean text. The proposed event detection and network construction techniques have the following practical advantages. First, LDA topic modeling, an unsupervised learning method, can easily derive topics, topic words, and their distributions from huge amounts of data, and the publication dates of the collected news articles allow each topic's distribution to be expressed as a time series. Second, by calculating relevance scores from the co-occurrence of topics and constructing an event network, we can present connections between events, which are difficult to grasp with existing event detection, in a summarized form. This is supported by the fact that the relevance-based event network proposed in this study was in fact ordered by time of occurrence. The event network also makes it possible to identify which event served as the starting point of a series of events. A limitation of this study is that LDA topic modeling yields different results depending on the initial parameters and the number of topics, and the topic and event names in the results must be assigned by the researcher's subjective judgment.
In addition, because each topic is assumed to be exclusive and independent, the relevance between topics is not taken into account. Subsequent studies need to calculate the relevance between events not covered in this study, or between events that belong to the same topic.
Purpose: To take advantage of each imaging modality, fused CT/MRI images are increasingly used in radiation therapy for prostate cancer. However, fusion uncertainty may cause a partial target miss or an overdose to normal organs. To address this limitation, our hospital acquired MRI images (Planning MRI) with patients set up using the same immobilization device and posture as in CT simulation. This study evaluates the usefulness of Planning MRI by comparing and analyzing diagnostic MRI images and Planning MRI images. Materials and Methods: This study included 10 patients diagnosed with prostate cancer who were prescribed definitive RT of 70 Gy in 28 fractions without hormone therapy between August 2011 and July 2013. Each patient underwent both CT and MRI simulation, and the MRI images were acquired within 30 minutes after the CT simulation. The acquired CT and MRI images were fused primarily by matching bony structures. We measured the prostate volume in the Planning MRI and diagnostic MRI images, and measured the craniocaudal, anteroposterior, and left-right diameters from the center of the prostate to compare changes in prostate shape. Results: As a result of comparing the volume of prostate in the images of Planning MRI and diagnostic MRI, they were found to be
Introduction
The purpose of this research is to develop an overall model of the effect of the franchisor's on-going support services on the franchisee's relationship quality (trust, satisfaction, and commitment) and business performance (financial and non-financial performance), and to investigate the relationships among trust, satisfaction, commitment, and financial and non-financial performance. This study also argues that a franchise business or franchise system should be based on a long-term orientation between franchisor and franchisee rather than a short-term, transactional relationship, and it proposes the most effective way for the franchisor to provide on-going support services to the franchisee through a symbiotic relationship between the two.
Research Model and Hypotheses
As Figure 1 shows, the research model includes the on-going support service variables that affect the relationship quality between franchisor and franchisee (trust, satisfaction, and commitment), and it also analyzes the effects of relationship quality on business performance, including financial and non-financial performance. We established 12 hypotheses to test, as follows.
Relationship between on-going support services and trust
H1: On-going support service factors (product category & price, logistics service, promotion, information providing & problem solving capability, supervisor's support, and education & training support) have a positive effect on franchisee's trust.
Relationship between on-going support services and satisfaction
H2: On-going support service factors (product category & price, logistics service, promotion, information providing & problem solving capability, supervisor's support, and education & training support) have a positive effect on franchisee's satisfaction.
Relationship between on-going support services and commitment
H3: On-going support service factors (product category & price, logistics service, promotion, information providing & problem solving capability, supervisor's support, and education & training support) have a positive effect on franchisee's commitment.
Relationships among relationship quality factors: trust, satisfaction, and commitment
H4: Franchisee's trust has a positive effect on franchisee's satisfaction.
H5: Franchisee's trust has a positive effect on franchisee's commitment.
H6: Franchisee's satisfaction has a positive effect on franchisee's commitment.
Relationship between relationship quality and business performance
H7: Franchisee's trust has a positive effect on franchisee's financial performance.
H8: Franchisee's trust has a positive effect on franchisee's non-financial performance.
H9: Franchisee's satisfaction has a positive effect on franchisee's financial performance.
H10: Franchisee's satisfaction has a positive effect on franchisee's non-financial performance.
H11: Franchisee's commitment has a positive effect on franchisee's financial performance.
H12: Franchisee's commitment has a positive effect on franchisee's non-financial performance.
Method
On-going support services were defined as an organized system of continuous support services provided by the franchisor to satisfy the franchisee's expectations on the basis of a long-term orientation, and they were classified into six constructs: product category & price, logistics service, promotion, providing information & problem solving capability, supervisor's support, and education & training support. Agreement with the items for the six constructs was measured on a 7-point Likert-type scale (1 = strongly disagree to 7 = strongly agree) as follows. Product category & price was measured by four items: menu variety, price of food material provided by the franchisor, and support for developing new menus.
Logistics service was measured by six items: the franchisor's distribution system, return policy for provided food materials, timeliness, the franchisor's inventory control level, accuracy of orders, and flexibility for emergency orders. Promotion was measured by five items: differentiated promotion activities, the franchisor's brand image, promotion effects such as customer increase, a long-term promotion plan, and a micro-marketing concept in promotion. Providing information & problem solving capability was measured by information on new products, information on competitors, information on cost reduction, and efforts to solve problems in the franchisee's operations. Supervisor's support was measured by supervisor operations, frequency of visits to the franchisee, support through data analysis, handling of the franchisee's suggestions, diagnosis of and solutions for the franchisee's operations, and support for increasing the franchisee's sales. Finally, education & training support was measured by recipe training by specialists, service training for store staff, a systematic training program, and tax & human resources support services.
Analysis and Results
The data were analyzed using Amos. Figure 2 and Table 1 present the results of the structural equation model.
Implications
The results of this research are as follows. Firstly, the product category and the information providing & problem solving capability factors influence only the franchisee's satisfaction and commitment. Secondly, the logistics service and supervisor's support factors influence only trust and satisfaction. Thirdly, the continuing education & training factor influences only the franchisee's trust and commitment. Fourthly, the sales promotion factor influences all three dimensions of relationship quality: trust, satisfaction, and commitment.
Fifthly, regarding the relationships among the relationship quality dimensions, trust positively influences satisfaction but does not directly influence commitment, whereas satisfaction positively affects commitment. Therefore, satisfaction mediates the relationship between trust and commitment. Sixthly, trust positively influences only financial performance, whereas satisfaction and commitment positively influence both financial and non-financial performance.
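The mediation finding can be read through the standard decomposition for linear path models. The coefficients a, b, and c' below are generic symbols, not estimates reported in the study: a significant a (trust to satisfaction) and b (satisfaction to commitment) together with a non-significant direct path c' is the pattern usually described as full mediation, which is what the results above describe.

```latex
\begin{aligned}
\text{Satisfaction} &= a\,\text{Trust} + e_1 \\
\text{Commitment} &= c'\,\text{Trust} + b\,\text{Satisfaction} + e_2 \\
\text{Total effect of Trust on Commitment} &= c' + a\,b
\end{aligned}
```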
Recommender systems have become one of the most important technologies in e-commerce. For many consumers, the ultimate reason to shop online is to reduce the effort of information search and purchase, and recommender systems are a key technology for serving these needs. Many past studies of recommender systems have been devoted to developing and improving recommendation algorithms, and collaborative filtering (CF) is known to be the most successful of these. Despite its success, however, CF has several shortcomings, such as the cold-start, sparsity, and gray sheep problems. To generate recommendations, ordinary CF algorithms require ratings or preference information directly from users, so CF cannot produce recommendations for new users who have no ratings or preference information (cold-start problem). As the numbers of products and customers increase, the user-item data grow rapidly and most of the data cells are empty; this sparse dataset makes computing recommendations extremely hard (sparsity problem). Because CF assumes that there are groups of users who share common preferences or tastes, it becomes inaccurate if there are many users with rare and unique tastes (gray sheep problem). This study proposes a new algorithm that uses Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We use 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users connected through common preferences or tastes, users with unique tastes have fewer links to other users (nodes) and are isolated from them. Therefore, gray sheep can be identified by calculating the degree centrality of each node. We divide the dataset into two parts, gray sheep and others, based on the degree centrality of the users.
Then, different similarity measures and recommendation methods are applied to these two datasets. The algorithm in more detail is as follows.
Step 1: Convert the initial data, which form a two-mode network (user to item), into a one-mode network (user to user).
Step 2: Calculate the degree centrality of each node and separate the nodes whose degree centrality is lower than a preset threshold. The threshold value is determined by simulation so that the accuracy of CF on the remaining dataset is maximized.
Step 3: Apply an ordinary CF algorithm to the remaining dataset.
Step 4: Because the separated dataset consists of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them, so a 'popular item' method is used to generate recommendations for these users.
The F measures of the two datasets are weighted by their numbers of nodes and summed to give the final performance metric. To test the performance improvement of the new algorithm, an empirical study was conducted on a publicly available dataset, the MovieLens data from the GroupLens research team. We used 100,000 ratings by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm using the 'Best-N-neighbors' and 'Cosine' similarity methods. The empirical results show that the F measure improved by about 11% on average when the proposed algorithm was used.
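Steps 1, 2, and 4 and the combined metric can be sketched as follows; Step 3 (ordinary CF on the remaining users) is omitted, and the threshold would in practice be tuned by simulation as described. The linking rule for the one-mode projection (two users share at least one item both rated 4 or higher), popularity as the number of ratings, and reading "weighted by the numbers of nodes and summed" as a node-weighted average are assumptions, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def project_to_user_network(ratings, like_threshold=4, min_common=1):
    """Step 1: project user-item (two-mode) data onto a user-user (one-mode)
    network. ASSUMPTION: users are linked when they share >= min_common items
    that both rated >= like_threshold."""
    liked = {u: {i for i, r in items.items() if r >= like_threshold}
             for u, items in ratings.items()}
    adj = {u: set() for u in ratings}
    for u, v in combinations(ratings, 2):
        if len(liked[u] & liked[v]) >= min_common:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def split_gray_sheep(adj, threshold):
    """Step 2: degree centrality = number of direct links; users whose degree
    is below the threshold are treated as gray sheep."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    gray = {u for u, d in degree.items() if d < threshold}
    return gray, set(adj) - gray


def popular_items(ratings, n, exclude=()):
    """Step 4: recommend the n most frequently rated items not in `exclude`.
    ASSUMPTION: popularity = number of ratings; ties broken by item id."""
    counts = Counter(i for items in ratings.values() for i in items)
    ranked = sorted(counts, key=lambda i: (-counts[i], i))
    return [i for i in ranked if i not in exclude][:n]


def combined_f(f_gray, n_gray, f_rest, n_rest):
    """Node-count-weighted average of the two groups' F measures."""
    return (f_gray * n_gray + f_rest * n_rest) / (n_gray + n_rest)


# Hypothetical ratings: u4 shares no liked item with anyone.
ratings = {
    "u1": {"a": 5, "b": 4},
    "u2": {"a": 5, "c": 4},
    "u3": {"b": 5, "a": 4},
    "u4": {"z": 5},
}
gray, rest = split_gray_sheep(project_to_user_network(ratings), threshold=1)
print(gray)  # {'u4'}
print(popular_items(ratings, 2, exclude=ratings["u4"]))  # ['a', 'b']
```

Separating isolated users before running CF keeps their sparse, idiosyncratic ratings from distorting neighbour selection for everyone else, while the popular-item fallback still gives them some recommendation.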