• Title/Summary/Keyword: Web performance


Prediction of Key Variables Affecting NBA Playoffs Advancement: Focusing on 3 Points and Turnover Features (미국 프로농구(NBA)의 플레이오프 진출에 영향을 미치는 주요 변수 예측: 3점과 턴오버 속성을 중심으로)

  • An, Sehwan; Kim, Youngmin
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.263-286 / 2022
  • This study acquires NBA statistics for a total of 32 years, from 1990 to 2022, using web crawling, observes variables of interest through exploratory data analysis, and generates related derived variables. Unused variables were removed during data cleansing of the input data, and correlation analysis, t-tests, and ANOVA were performed on the remaining variables. For each variable of interest, the difference in means between teams that advanced to the playoffs and teams that did not was tested, and, to complement this, the difference in means among three groups (upper/middle/lower) based on ranking was also examined. Of the input data, only the current season's data was used as the test set; the remaining data was divided into training and validation sets for 5-fold cross-validation during model training. Overfitting was ruled out by confirming that the performance metrics from cross-validation did not differ from those of the final evaluation on the test set. Because the raw data is of high quality and satisfies the statistical assumptions, most models performed well despite the small data set. Beyond predicting NBA game results or classifying playoff advancement with machine learning, this study uses the importance of the input features to examine whether the variables of interest are among the most important variables. Visualizing SHAP values overcame the limitation that feature importance results alone cannot be interpreted, and compensated for the inconsistency of importance calculations as variables were added or removed. Many of the variables related to three-pointers and turnovers, the subjects of interest in this study, were found to be among the major variables affecting playoff advancement in the NBA.
Although this study resembles existing sports data analysis research in addressing topics such as match results, playoffs, and championship predictions, and in comparing several machine learning models, it differs in that the features of interest are set in advance and statistically verified, and the verification is then compared with the machine learning results. It is also differentiated from existing studies by presenting explanatory visualizations using SHAP, an XAI (explainable AI) method.
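The evaluation design described above (current season held out as the test set, the remaining seasons used for 5-fold cross-validation) can be sketched in Python. This is a minimal illustration, not the authors' code: the record layout (a `season` key), the helper names `split_by_season` and `kfold_indices`, and the toy data are hypothetical, and model fitting is omitted.

```python
import random


def split_by_season(rows, test_season):
    """Hold out one season as the test set; the remaining rows feed cross-validation."""
    train_pool = [r for r in rows if r["season"] != test_season]
    test_set = [r for r in rows if r["season"] == test_season]
    return train_pool, test_set


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = idx[start:start + size]
        train_idx = idx[:start] + idx[start + size:]
        yield train_idx, val_idx
        start += size


if __name__ == "__main__":
    # Hypothetical team-season records: 30 teams x 5 seasons (not the study's data).
    rows = [{"season": s, "team": t} for s in range(2018, 2023) for t in range(30)]
    train_pool, test_set = split_by_season(rows, test_season=2022)
    for train_idx, val_idx in kfold_indices(len(train_pool), k=5):
        # A model would be fit on train_idx rows and scored on val_idx rows here.
        print(len(train_idx), len(val_idx))  # prints "96 24" for each of the 5 folds
```

Comparing the average validation score over the five folds with the score on the held-out season corresponds to the overfitting check the abstract describes.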

Development and Experimental Performance Evaluation of Steel Composite Girder by Turn Over Process (단면회전방법을 적용한 강합성 소수주거더 개발 및 실험적 성능 평가)

  • Kim, Sung Jae; Yi, Na Hyun; Kim, Sung Bae; Kim, Jang-Ho Jay
    • KSCE Journal of Civil and Environmental Engineering Research / v.30 no.5A / pp.407-415 / 2010
  • In Korea, more than 90% of the steel bridges built with 40~70 m spans are steel box-girder bridges. A steel box-girder bridge is suitable for long-span or curved bridges because of its outstanding flexural and torsional rigidity, as well as its good constructability and safety. However, it is uneconomical, since it requires many secondary members and much workmanship, such as stiffeners and ribs that must be welded to the flanges or webs. Therefore, in the US and Japan, plate girder bridges, which are relatively cheap and easy to construct, are generally used. One type of plate girder bridge is the two- or three-main-girder plate bridge, a composite plate girder bridge that minimizes the number of main girders by increasing the spacing between adjacent girders. In addition, to simplify the girder section, stiffeners attached to the web are not required. The two-main steel girder plate bridge, a representative type of plate girder bridge suitable for bridges with a 10 m effective width, was developed in France in the early 1960s. To ensure greater safety, two- or three-main-girder plate bridges in Korea use larger steel sections than those in Europe or Japan. Also, far fewer two- or three-main-girder plate bridges have been constructed in Korea than steel box-girder bridges, because designers are less familiar with their design detailing, which is more complex than that of a steel box-girder bridge. In this study, a new construction method called the Turn Over method is proposed to minimize the steel section size in two- or three-main-girder plate bridges, and thus reduce construction cost, by applying a prestressing force to the member using the weight of the confining concrete section.
Also, a full-scale 20 m Turn Over girder specimen and a Turn Over girder bridge specimen were tested to evaluate the constructability and structural safety of members constructed using the Turn Over process.

Characteristics and Implications of Sports Content Business of Big Tech Platform Companies : Focusing on Amazon.com (빅테크 플랫폼 기업의 스포츠콘텐츠 사업의 특징과 시사점 : 아마존을 중심으로)

  • Shin, Jae-hyoo
    • Journal of Venture Innovation / v.7 no.1 / pp.1-15 / 2024
  • This study aims to elucidate the characteristics of big tech platform companies' sports content business in an environment of rapid digital transformation. Specifically, it examines the market structure of big tech platform companies with a focus on Amazon, reveals the role of sports content within this structure through an analysis of Amazon's sports marketing business, and provides an outlook on the sports content business of big tech platform companies. Based on two-sided market platform business models, big tech platform companies incorporate sports content as a strategy to enhance the value of their platforms. Sports content thus serves as a tool to enhance platform value and to consolidate a monopoly position, maximizing profits by increasing synergy across platform ecosystems such as infrastructure. Amazon acquires popular live sports broadcasting rights on a continental or national basis and supplies them to its platforms, which not only brings in new customers and increases purchasing, but also lets Amazon provide IT solution services to sports organizations and teams while planning and supplying various promotional content, thus creating synergy across Amazon's platforms, including its advertising business. Amazon also expands its business opportunities and increases its overall value by supplying live sports content to Amazon Prime Video and Amazon Prime, providing technical services to various stakeholders through Amazon Web Services, and offering Amazon Marketing Cloud services for analyzing and predicting advertisers' advertising and marketing performance. This gives rise to a new paradigm in the sports marketing business of the digital era, stemming from the difference in market structure between big tech companies based on two-sided market platforms and legacy global companies based on one-sided markets.
The core of this new model is a business built on developing various content based on live sports streaming rights, and sports content marketing will become a major field of sports marketing alongside traditional broadcasting rights and sponsorship. Big tech platform companies such as Amazon, Apple, and Google have the potential to become new global sports marketing companies, and current sports marketing and advertising companies, as well as teams and leagues, face both crises and opportunities.

Preliminary Report of the 1998~1999 Patterns of Care Study of Radiation Therapy for Esophageal Cancer in Korea (식도암 방사선 치료에 대한 Patterns of Care Study (1998~1999)의 예비적 결과 분석)

  • Hur, Won-Joo; Choi, Young-Min; Lee, Hyung-Sik; Kim, Jeung-Kee; Kim, Il-Han; Lee, Ho-Jun; Lee, Kyu-Chan; Kim, Jung-Soo; Chun, Mi-Son; Kim, Jin-Hee; Ahn, Yong-Chan; Kim, Sang-Gi; Kim, Bo-Kyung
    • Radiation Oncology Journal / v.25 no.2 / pp.79-92 / 2007
  • Purpose: For the first time, a nationwide survey was conducted in the Republic of Korea to determine the basic parameters of esophageal cancer treatment and to establish a solid cooperative system for the Korean Patterns of Care Study (PCS) database. Materials and Methods: During 1998~1999, 246 patients with biopsy-confirmed esophageal cancer who received radiotherapy were enrolled from 23 institutions in South Korea. Random sampling was based on the power allocation method. Patient parameters and specific information on tumor characteristics and treatment methods were collected and registered through the web-based PCS system. Data were analyzed using the chi-squared test. Results: The median age of the patients was 62 years. The male-to-female ratio was about 91:9, an overwhelming male predominance. Performance status was ECOG 0 to 1 in 82.5% of the patients. Diagnostic procedures included an esophagogram (228 patients, 92.7%), endoscopy (226 patients, 91.9%), and a chest CT scan (238 patients, 96.7%). Squamous cell carcinoma was diagnosed in 96.3% of the patients; mid-thoracic esophageal cancer was the most prevalent (110 patients, 44.7%), and 135 patients presented with clinical stage III disease. Fifty-seven patients received radiotherapy alone, and 37 patients received surgery with adjuvant postoperative radiotherapy. Half of the patients (123) received chemotherapy with RT, 70 of whom (56.9%) received it as concurrent chemoradiotherapy. The most frequently used chemotherapy regimen was a combination of cisplatin and 5-FU. Most patients received radiotherapy with either 6 MV (116 patients, 47.2%) or 10 MV photons (87 patients, 35.4%). Radiotherapy was delivered through a conventional AP-PA field without CT-based planning in 206 patients (83.7%), and the median delivered dose was 3,600 cGy.
The median total dose of postoperative radiotherapy was 5,040 cGy, while that for non-operative patients was 5,970 cGy. Thirty-four patients received intraluminal brachytherapy with high-dose-rate iridium-192, delivered at a median dose of 300 cGy per fraction, typically in 3~4 fractions. The most frequently encountered complication during radiotherapy was esophagitis, in 155 patients (63.0%). Conclusion: This study will provide guidelines and benchmark data for the evaluation and treatment of esophageal cancer patients at radiation facilities in Korea, and for the solid cooperative system of the Korean PCS. Although some differences were noted between institutions, there was no major difference in treatment modalities or RT techniques.

Improved Social Network Analysis Method in SNS (SNS에서의 개선된 소셜 네트워크 분석 방법)

  • Sohn, Jong-Soo; Cho, Soo-Whan; Kwon, Kyung-Lag; Chung, In-Jeong
    • Journal of Intelligence and Information Systems / v.18 no.4 / pp.117-127 / 2012
  • Due to the recent expansion of Web 2.0-based services, along with the spread of smartphones, online social network services are becoming popular. Online social network services are online community services that enable users to communicate with each other, share information, and expand human relationships. In social network services, the relations between users are represented by a graph consisting of nodes and links. As the number of users of online social network services increases rapidly, SNS are actively used in enterprise marketing, analysis of social phenomena, and so on. Social Network Analysis (SNA) is a systematic way to analyze social relationships among the members of a social network using network theory. In general, a social network consists of nodes and arcs and is often depicted in a social network diagram, in which nodes represent individual actors within the network and arcs represent relationships between them. With SNA, we can measure relationships among people, such as the degree of intimacy, the intensity of connection, and the classification of groups. Ever since Social Networking Services (SNS) have drawn increasing attention from millions of users, numerous studies have analyzed their user relationships and messages. Typical representative SNA methods are degree centrality, betweenness centrality, and closeness centrality. Degree centrality analysis does not consider the shortest paths between nodes; however, shortest paths are a crucial factor in betweenness centrality, closeness centrality, and other SNA methods. In previous SNA research, computation time was not a major concern because the social networks were small. Unfortunately, most SNA methods require significant time to process the relevant data, which makes it difficult to apply ever-increasing SNS data in social network studies.
For instance, if an online social network has n nodes, the maximum number of links is n(n-1)/2; with 10,000 nodes, that is 49,995,000 links, which makes the analysis very expensive. Therefore, we propose a heuristic-based method for finding the shortest paths among users in the SNS user graph. Using this shortest-path method, we show the efficiency of our approach by conducting betweenness centrality analysis and closeness centrality analysis, both of which are widely used in social network studies. Moreover, we devised an enhanced method that adds best-first search and a preprocessing step to reduce computation time and rapidly find shortest paths in a huge online social network. Best-first search finds the shortest path heuristically, generalizing human experience. Because a large number of links are shared by only a few nodes in online social networks, most nodes have relatively few connections, and a node with many connections functions as a hub node. When searching for a particular node, looking first at users with numerous links, instead of searching all users indiscriminately, gives a better chance of finding the desired node quickly. In this paper, we employ the degree of user node v_n as the heuristic evaluation function in a graph G = (N, E), where N is a set of vertices and E is a set of links between pairs of distinct nodes. When this heuristic evaluation function is used, the worst case occurs when the target node is situated at the bottom of a skewed tree; a preprocessing step is conducted to remove such target nodes. Next, we efficiently find the shortest paths between nodes in the social network and then analyze the network. To verify the proposed method, we crawled 160,000 people online and constructed a social network.
We then compared the search and analysis times with those of previous methods based on best-first search and breadth-first search. The suggested method takes 240 seconds to search nodes, whereas the breadth-first-search-based method takes 1,781 seconds (7.4 times faster). Moreover, for social network analysis, the suggested method is 6.8 times faster for betweenness centrality analysis and 1.8 times faster for closeness centrality analysis. The proposed method shows the possibility of analyzing large social networks in less time. As a result, our method would improve the efficiency of social network analysis, making it particularly useful for studying social trends or phenomena.
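The contrast between breadth-first search and the degree-based best-first search described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the adjacency-list representation, the function names, and the toy graph are assumptions, and the paper's preprocessing step for the skewed-tree worst case is not reproduced. Note that greedy best-first search with a degree heuristic only prioritizes hub nodes; unlike BFS on an unweighted graph, it does not by itself guarantee a shortest path.

```python
from collections import deque
import heapq


def max_links(n):
    """Maximum number of undirected links among n nodes: n(n-1)/2."""
    return n * (n - 1) // 2


def bfs_distances(graph, source):
    """Hop distances from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness_centrality(graph, node):
    """(number of reachable nodes - 1) / (sum of shortest-path distances)."""
    dist = bfs_distances(graph, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def degree_best_first_search(graph, source, target):
    """Greedy best-first search that expands high-degree (hub) nodes first.

    Returns a source-to-target path or None. The degree heuristic only
    orders the search; the returned path is not guaranteed to be shortest.
    """
    parent = {source: None}
    # heapq is a min-heap, so the negated degree pops hubs first.
    frontier = [(-len(graph[source]), source)]
    while frontier:
        _, u = heapq.heappop(frontier)
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in graph[u]:
            if v not in parent:
                parent[v] = u
                heapq.heappush(frontier, (-len(graph[v]), v))
    return None


if __name__ == "__main__":
    # Toy undirected graph (hypothetical) stored as adjacency lists.
    graph = {
        "a": ["hub"], "b": ["hub"], "c": ["hub", "d"],
        "hub": ["a", "b", "c"], "d": ["c"],
    }
    print(max_links(10_000))                          # 49995000
    print(closeness_centrality(graph, "hub"))         # 0.8
    print(degree_best_first_search(graph, "a", "d"))  # ['a', 'hub', 'c', 'd']
```

Running BFS from every node gives the all-pairs distances that closeness and betweenness centrality need, which is why the cost of the shortest-path step dominates on large graphs.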

Visual Media Education in Visual Arts Education (미술교육에 있어서 시각적 미디어를 통한 조형교육에 관한 연구)

  • Park Ji-Sook
    • Journal of Science of Art and Design / v.7 / pp.64-104 / 2005
  • Visual media, such as photography, film, television, video, advertisements, and computer images, transmit images and information reproduced in large quantities. In response to how students will receive and recognize culture in the future, arrangements must be made for the field of visual culture studies. 'Visual Culture' refers to the cultural phenomena of visual images conveyed via visual media, which include not only the traditional arts such as painting, sculpture, print, and design, but also performance arts such as fashion shows and carnival parades, and mass and electronic media such as photography, film, television, video, advertisements, cartoons, animation, and computer images. In the world of visual media, the 'Image' functions as an essential medium of communication. Therefore, people call today's culture the 'era of Image Culture', which has shifted from an era of alphabet convergence to one of image convergence. Images conveyed via visual media have become a dominant means of communication in much of human life, so the 'Image' can be designated as a typical aspect of visual culture today. As an essential medium of communication, the image plays an important role in contemporary society. Digital images are produced in two ways. One is the conversion of an analogue image, such as an actual picture, photograph, or film, into a digital one through digitization with a digital camera or scanner acting as 'an analogue/digital converter'. The other is production with a computer, through drawing or modeling of objects, which is suited to producing pictorial and surreal images. Digital images produced in the latter way take the form of either 'Pixel' or 'Vector'. A vector is a line linking a starting point to an end point, which organizes the information; the computer stores each line's reference location and its locations relative to the other lines. Digital images show far more 'Perfectness' than any other visual media.
Digital images have been evolving in diverse directions, such as geometric or organic image compositing, interactive art, multimedia art, and web art, in which the computer is applied as an extended tool of painting. Some interpret the digitized copy, endlessly reproducible from the original, as an extension of the print. Visual art is no longer simply an activity of representation by a painter or sculptor; it is now intimately associated with the application of media. Images conveyed via visual media have some problems. First, the image via media does not reflect reality as it is, but an artificially manipulated world, that is, a virtual reality. Second, the introduction of digital effects and the development of image processing technology have heightened the spectacle of destructive and violent scenes. Third, children tend to recognize the interactive images of computer games and virtual reality as reality, or truth. Education needs not only to point out the ill effects of mass media and prevent the younger generation from being harmed by them, but also to offer the knowledge and know-how to cope actively with social and cultural circumstances. Visual media education is one of the essential methods for contemporary and future human beings amid the overflow of image information. The fostering of 'Visual Literacy' can be considered the very purpose of visual media education. It is a way to lead individuals to become discerning, active consumers and producers of visual media in life as far as possible. The elements of 'Visual Literacy' can be divided into a faculty of recognition related to visual media, a faculty of critical reception, a faculty of appropriate application, a faculty of active work, and a faculty of creative modeling, all of which are promoted simultaneously by 'visual literacy' education.
In conclusion, 'Visual Literacy' education guides students to comprehend and discriminate visual image media carefully, receive them critically, apply them properly, and produce them creatively and voluntarily. Moreover, it leads to artistic activity by means of new media. This education can be approached and enhanced through connection and integration with real life. Visual arts and visual arts education play an important role in the digital era, which depends on visual communication via image information. Today's visual media function as an essential element both in daily life and in the arts. By means of visual media, students can soundly understand today's visual phenomena and also apply them as a tool for expressing the culture of daily life. A new recognition and valuation of visual images and media education is required to cultivate the capability to deal actively and uprightly with the changes in the history of civilization. 1) Visual media education helps to cultivate a sensibility for images that reacts to and deals with circumstances. 2) It helps students comprehend contemporary arts and culture via new media. 3) It gives students a chance to experience visual modeling by means of new media. 4) It offers educational opportunities with images that have temporality and spatiality, and therefore increases the number of discerning people. 5) Modeling activity via new media leads students to remain continuously interested in the study and production of the plastic arts. 6) It raises the visual communication ability needed in an image information society. 7) Education in digital imagery is also significant for cultivating talented people for the future image information society. To respond to changing and developing social and cultural circumstances, and to the ways in which students receive and recognize them, visual arts education must establish a field of study on the new visual culture.
Besides, a more systematic and active program needs to be developed for visual media education. Educational content should be extended to the media of visual images, that is, photography, film, television, video, computer graphics, animation, music videos, computer games, and multimedia. Each medium must be approached separately, because each maintains its own modes and peculiarities according to the form in which it conveys messages. Concrete and systematic teaching methods and the quality of education must be researched and developed, centering on the development of a curriculum. Teachers' foundational teaching capability for visual media education should be cultivated. In this case, attention must be paid to the fact that the technological level of media is considered secondary, because school education does not intend to train expert, skillful producers, but to emphasize essential aesthetic engagement with visual media within its social and cultural context, from the standpoint of a consumer and a person of culture.
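The pixel/vector distinction drawn in this abstract can be made concrete with a short Python sketch, added here as an illustration rather than as part of the study. A vector shape is stored only as line segments defined by start and end points; turning it into a pixel image means rasterizing each segment onto a grid, here with Bresenham's line algorithm. The function names and the triangle example are hypothetical.

```python
def rasterize_line(x0, y0, x1, y1):
    """Bresenham's line algorithm: integer endpoints -> list of pixel coordinates."""
    pixels = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
    return pixels


def render(segments, width, height, scale=1):
    """Rasterize vector segments ((x0, y0), (x1, y1)) onto a character grid."""
    grid = [["." for _ in range(width)] for _ in range(height)]
    for (x0, y0), (x1, y1) in segments:
        # Scaling the stored endpoints first keeps lines one pixel wide at any size.
        for x, y in rasterize_line(x0 * scale, y0 * scale, x1 * scale, y1 * scale):
            if 0 <= x < width and 0 <= y < height:
                grid[y][x] = "#"
    return ["".join(row) for row in grid]


if __name__ == "__main__":
    # A "vector image": a triangle stored only as three segments.
    triangle = [((0, 0), (4, 0)), ((4, 0), (0, 4)), ((0, 4), (0, 0))]
    print("\n".join(render(triangle, 5, 5)))
    print("\n".join(render(triangle, 9, 9, scale=2)))
```

Because the vector form keeps only endpoints, enlarging it means re-rasterizing at the new size, whereas an enlarged pixel image can only repeat the pixels it already has.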


Content-based Recommendation Based on Social Network for Personalized News Services (개인화된 뉴스 서비스를 위한 소셜 네트워크 기반의 콘텐츠 추천기법)

  • Hong, Myung-Duk; Oh, Kyeong-Jin; Ga, Myung-Hyun; Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.57-71 / 2013
  • Over a billion people in the world generate new news minute by minute. Some news can be anticipated, but most news comes from unexpected events such as natural disasters, accidents, and crimes. People spend much time watching the huge amount of news delivered by many media outlets because they want to understand what is happening now, to predict what might happen in the near future, and to share and discuss the news. People make better daily decisions by obtaining useful information from the news they see. However, it is difficult for people to choose news suited to them and to obtain useful information from it, because there are so many news media, such as portal sites and broadcasters, and most news articles consist of gossip and breaking news. User interest changes over time, and many people have no interest in outdated news. Therefore, a news service must also apply users' recent interests to personalization, which means that a personalized news service should manage user profiles dynamically. In this paper, a content-based news recommendation system is proposed to provide a personalized news service. Personalization requires the user's personal information, and a social network service is used to extract it. The proposed system constructs a dynamic user profile based on recent user information from Facebook, a social network service. User information contains personal information, recent posts, and Facebook Page information. Facebook Pages are used by businesses, organizations, and brands to share their content and connect with people, and Facebook users can add a Page to indicate their interest in it. The proposed system uses this Page information to create the user profile and to match user preferences to news topics.
However, some Pages do not match news topics directly, because a Page deals with an individual object and does not provide topic information suited to news. Freebase, a large collaborative database of well-known people, places, and things, is used to match Pages to news topics using the hierarchy information of its objects. By using the recent Page information and posts of Facebook users, the proposed system can maintain a dynamic user profile, which is used to measure user preferences for news. To generate a news profile, the news categories predefined by the news media are used, and keywords are extracted after analyzing the news content, including the title, category, and script. The TF-IDF technique, which reflects how important a word is to a document in a corpus, is used to identify the keywords of each news article. User profiles and news profiles share the same format so that the similarity between user preferences and news can be measured efficiently. The proposed system calculates the similarity values between all user profiles and news profiles. Existing similarity measures in the vector space model do not cover synonyms, hypernyms, and hyponyms, because they handle only the exact words given in the vector space; the proposed system applies WordNet to the similarity calculation to overcome this limitation. The Top-N news articles with the highest similarity values for a target user are recommended to that user. To evaluate the proposed system, user profiles were generated from Facebook accounts with participants' consent, and we implemented a Web crawler to extract news information from PBS, a non-profit public broadcasting television network in the United States, and constructed news profiles. We compare the performance of the proposed method with that of two benchmark algorithms: a traditional method based on TF-IDF, and the 6Sub-Vectors method, which divides the points used to obtain keywords into six parts.
Experimental results demonstrate that, by applying the user's social network information and WordNet functions, the proposed system provides useful news to users in terms of the prediction error of recommended news.
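The TF-IDF weighting and vector-space matching steps described above can be sketched in Python. This is a simplified illustration under stated assumptions, not the proposed system: documents are pre-tokenized, non-empty word lists; the user profile is a hypothetical hand-written term-weight dict in the same format as the news profiles; cosine similarity is assumed as the vector-space measure; and the WordNet expansion and Freebase Page matching that the paper adds are omitted.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: tf-idf weight} dict per non-empty tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def top_n(user_profile, news_vectors, n=2):
    """Indices of the n news items most similar to the user profile."""
    scored = sorted(((cosine(user_profile, vec), i) for i, vec in enumerate(news_vectors)),
                    reverse=True)
    return [i for _, i in scored[:n]]


if __name__ == "__main__":
    news = [
        ["election", "vote", "senate"],
        ["storm", "flood", "rain"],
        ["vote", "senate", "budget"],
    ]
    # Hypothetical user profile, e.g. terms derived from liked Pages.
    user = {"senate": 1.0, "budget": 1.0}
    print(top_n(user, tf_idf_vectors(news), n=2))  # [2, 0]
```

Because a term must appear literally in both dicts to contribute to the dot product, a profile term such as "congress" would not match "senate" here; that gap is the limitation the paper addresses with WordNet.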

A Study on the Improvement of Recommendation Accuracy by Using Category Association Rule Mining (카테고리 연관 규칙 마이닝을 활용한 추천 정확도 향상 기법)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.27-42 / 2020
  • Traditional companies with offline stores were unable to secure large display spaces because of cost. This limitation inevitably restricted the variety of products displayed on shelves, depriving consumers of the opportunity to experience various items. Taking advantage of the virtual space of the Internet, online shopping overcomes the physical-space limitations of offline shopping and can display numerous products on web pages, satisfying consumers with a variety of needs. Paradoxically, however, this can also make it difficult for consumers to compare and evaluate too many alternatives in their purchase decision-making. To address this side effect, various consumer purchase decision support systems have been studied, such as keyword-based item search services and recommender systems. These systems can reduce the time spent searching for items, keep consumers from leaving while browsing, and contribute to increased sales for sellers. Among these systems, recommender systems based on association rule mining can effectively detect interrelated products from transaction data such as orders. The associations between products obtained by statistical analysis provide clues for predicting how interested consumers will be in another product. However, because the algorithm is based on the number of transactions, products that have not yet sold enough in the early days after launch may be excluded from the recommendation list even though they are highly likely to sell. Such missing items may not get enough exposure to consumers to record sufficient sales, and then fall into a vicious cycle of declining sales and omission from the recommendation list.
This situation is an inevitable outcome when recommendations are made based on past transaction histories rather than on potential future sales. This study started from the idea that a means of indirectly identifying this potential would help select products worth recommending. Given that the attributes of a product affect consumers' purchasing decisions, this study reflects them in the recommender system. In other words, consumers who visit a product page have shown interest in the attributes of that product and would also be interested in other products with the same attributes. Under this assumption, the recommender system can use these attributes to select recommended products with a higher acceptance rate. Because a category is one of the main attributes of a product, it can be a good indicator not only of direct associations between two items but also of potential associations yet to be revealed. Based on this idea, the study devised a recommender system that reflects associations not only between products but also between categories. Through regression analysis, the two kinds of associations were combined into a model that predicts the hit rate of recommendations. To evaluate the proposed model, another regression model was developed based only on associations between products. Comparative experiments were designed to resemble the environment in which products are actually recommended in online shopping malls. First, association rules for all possible combinations of antecedent and consequent items were generated from the order data. Then, the hit rate of each association rule was predicted by each model from the rule's support and confidence.
Comparative experiments using order data collected from an online shopping mall show that recommendation accuracy can be improved by reflecting associations not only between products but also between categories when recommending related products. The proposed model improved hit rates by 2 to 3 percent compared with the existing model. From a practical point of view, it is expected to improve consumers' purchase satisfaction and increase sellers' sales.
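The support and confidence inputs described above, computed at both the product level and the category level, can be sketched in Python. This is a minimal illustration, not the author's model: the orders, the item-to-category mapping, and the `rule_metrics` helper are hypothetical, rules are limited to single-item antecedents and consequents, and the regression that combines the two kinds of association into a predicted hit rate is not reproduced.

```python
from itertools import permutations


def rule_metrics(transactions, mapper=lambda item: item):
    """Support and confidence for every ordered pair rule X -> Y.

    `mapper` turns each item into the unit of analysis: the identity gives
    product-level rules, an item -> category lookup gives category-level rules.
    """
    n = len(transactions)
    single, pair = {}, {}
    for t in transactions:
        basket = set(map(mapper, t))  # count each unit once per transaction
        for x in basket:
            single[x] = single.get(x, 0) + 1
        for x, y in permutations(basket, 2):
            pair[(x, y)] = pair.get((x, y), 0) + 1
    # support = P(X and Y), confidence = P(Y | X)
    return {(x, y): (c / n, c / single[x]) for (x, y), c in pair.items()}


if __name__ == "__main__":
    # Hypothetical orders; "tent_v2" is a newly launched item with one sale.
    orders = [["tent", "stove"], ["tent", "stove"],
              ["sleeping_bag", "stove"], ["tent_v2", "lantern"]]
    category = {"tent": "camping", "tent_v2": "camping", "sleeping_bag": "camping",
                "stove": "cooking", "lantern": "lighting"}
    products = rule_metrics(orders)
    categories = rule_metrics(orders, mapper=category.get)
    print(products.get(("tent_v2", "stove")))  # None: no product-level rule yet
    print(categories[("camping", "cooking")])  # (0.75, 0.75)
```

In this toy data the new item has no product-level rule pointing to a stove, but its category does, which is the kind of not-yet-revealed association the abstract proposes to exploit.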

Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

  • Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.95-110
    • /
    • 2013
  • Recently, the amount of unstructured data generated through various social media has been increasing rapidly, creating a growing need to collect, store, search, analyze, and visualize these data. Such data cannot be handled appropriately with the traditional methodologies used to analyze structured data because of their vast volume and unstructured nature. Consequently, many attempts are being made to analyze unstructured data such as text files and log files with various commercial and noncommercial analytical tools. Among the contemporary issues in the literature on unstructured text analysis, the concepts and techniques of opinion mining have attracted much attention from pioneering researchers and business practitioners. Opinion mining, or sentiment analysis, refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, various opinion mining techniques are being applied to complicated issues that could not otherwise be solved by traditional approaches. One of the most representative applications of opinion mining is recent research proposing an intelligent model for predicting the direction of a stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news reports. News content published in various media is a classic example of unstructured text data. Every day, a large volume of new content is created, digitized, and distributed to us through online and offline channels. Many studies have shown that we make better decisions on political, economic, and social issues by analyzing news and other related information.
In this sense, we expect that stock market fluctuations can be partly predicted by analyzing the relationship between economic news reports and stock price patterns. So far, most studies in the opinion mining literature, including ours, have used a sentiment dictionary to elicit sentiment polarity or sentiment values from large numbers of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers consult the dictionary to determine the sentiment polarity of words, of sentences in a document, and of the whole document. However, most traditional approaches share a common limitation: they do not consider the flexibility of sentiment polarity. In a traditional sentiment dictionary, the sentiment polarity or value of a word is fixed and cannot change. In the real world, however, the sentiment polarity of a word can vary with the time, situation, and purpose of the analysis, and it can even be contradictory. This flexibility of sentiment polarity motivated the present study. In this paper, we argue that sentiment polarity should be assigned not merely on the basis of a word's inherent meaning but on the basis of its ad hoc meaning within a particular context. To implement this idea, we present an intelligent investment decision-support model based on opinion mining that scrapes and parses massive volumes of economic news on the web, tags sentiment words, classifies the sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we apply a domain-specific sentiment dictionary instead of a general-purpose one to classify each news item as either positive or negative. To evaluate performance, we conducted extensive experiments and examined the prediction accuracy of our model.
For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.
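The abstract does not say how the domain-specific dictionary is built, so the sketch below assumes one plausible scheme purely for illustration: each word is scored by the average next-day index move (+1 up, -1 down) of the training articles that contain it, each article is classified by the sign of its summed word scores, and a day's direction is a majority vote over that day's articles. The English toy headlines, the regex tokenizer (the study analyzed Korean news, which would need a morphological analyzer), `min_count`, and every function name are hypothetical. In this toy data "cut" scores as positive because it co-occurs with "rate cut" headlines on up days, the kind of context-dependent polarity a general-purpose dictionary would not capture.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (a stand-in for a real morphological analyzer)."""
    return re.findall(r"[a-z]+", text.lower())


def build_domain_dictionary(labeled_news, min_count=2):
    """Score each word by the mean next-day label of the articles containing it.

    labeled_news: iterable of (text, label), label +1 if the index rose on the
    next trading day and -1 if it fell. Words seen in fewer than min_count
    articles are dropped as too sparse to score.
    """
    doc_freq = Counter()
    label_sum = Counter()
    for text, label in labeled_news:
        for word in set(tokenize(text)):
            doc_freq[word] += 1
            label_sum[word] += label
    return {w: label_sum[w] / n for w, n in doc_freq.items() if n >= min_count}


def classify(text, dictionary):
    """Return +1 (positive), -1 (negative) or 0 (neutral) for one article."""
    score = sum(dictionary.get(w, 0.0) for w in tokenize(text))
    return (score > 0) - (score < 0)


def predict_direction(articles, dictionary):
    """Majority vote of article polarities: 'up', 'down', or None on a tie."""
    votes = sum(classify(a, dictionary) for a in articles)
    if votes > 0:
        return "up"
    if votes < 0:
        return "down"
    return None


# Hypothetical training headlines labeled with the next day's index move.
TRAIN = [
    ("central bank signals rate cut", +1),
    ("exports surge on strong demand", +1),
    ("rate cut expectations lift shares", +1),
    ("earnings fall short amid weak demand", -1),
    ("weak demand weighs on exporters", -1),
]

if __name__ == "__main__":
    d = build_domain_dictionary(TRAIN)
    today = ["analysts expect another rate cut", "rate cut hopes grow",
             "weak demand persists"]
    print(predict_direction(today, d))
```

Because scores come from labeled outcomes in the target domain, the same word can carry a different polarity in another corpus or period, which is the flexibility the abstract argues a fixed general-purpose dictionary lacks.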

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services
    • /
    • v.14 no.6
    • /
    • pp.71-84
    • /
    • 2013
  • Log data, which record the wide range of information created while computer systems operate, are used in many processes, from computer system inspection and process optimization to customized optimization for users. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data generated by banks. Most of the log data generated during banking operations come from handling clients' business. Therefore, a separate log data processing system is needed to gather, store, categorize, and analyze the log data generated while processing clients' business. However, in existing computing environments it is difficult to provide flexible storage expansion for massive amounts of unstructured log data and to execute the considerable number of functions needed to categorize and analyze the stored unstructured log data. Thus, in this study, we use cloud computing technology to build a cloud-based system for processing unstructured log data that are difficult to handle with the analysis tools and management systems of existing computing infrastructure. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources such as storage space and memory when storage must be extended or log data increase rapidly. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for fast and reliable parallel-distributed processing of massive amounts of log data.
Furthermore, because HDFS (Hadoop Distributed File System) stores data by replicating the blocks of the aggregated log data, the proposed system offers automatic restore functions so that it can continue operating after recovering from a malfunction. Finally, by establishing a distributed database with the NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data. Moreover, strict schemas like those of relational databases cannot readily expand to additional nodes when the amount of data increases rapidly and the stored data must be distributed across them. NoSQL databases do not provide the complex computations that relational databases offer, but they can easily be expanded by distributing data across nodes when the amount of data increases rapidly; they are non-relational databases with a structure suited to unstructured data. NoSQL data models are usually classified as key-value, column-oriented, and document-oriented. Of these, the proposed system uses MongoDB, a representative document-oriented database with a schema-free structure. MongoDB was adopted because its flexible schema makes unstructured log data easy to process, it facilitates flexible node expansion when the amount of data increases rapidly, and it provides an auto-sharding function that automatically expands storage. The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module.
When the log data generated across each bank's entire client business process are sent to the cloud server, the log collector module collects and classifies the data by log type and distributes them to the MongoDB module and the MySQL module. The log graph generator module takes the log analysis results of the MongoDB module, the Hadoop-based analysis module, and the MySQL module, organized by analysis time and log type, and presents them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and presented in real time by the log graph generator module. Log data aggregated per unit time are stored in the MongoDB module and plotted in graphs according to the user's analysis conditions. The aggregated log data in the MongoDB module are processed in a parallel-distributed manner by the Hadoop-based analysis module. A comparative evaluation of log data insertion and query performance against a log data processing system that uses only MySQL demonstrates the superiority of the proposed system. Moreover, an optimal chunk size is identified by evaluating MongoDB's log data insertion performance for various chunk sizes.
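The routing performed by the log collector module can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the `LogRecord` fields, the hypothetical `REALTIME_TYPES` set deciding which log types need real-time analysis, the 60-second unit time, and the document layout are all invented for the example. Real-time types become flat rows destined for the MySQL module; every other record is grouped by time window and log type into one schema-free document destined for the MongoDB module, mirroring the "aggregated log data per unit time" described above.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical: log types that need real-time analysis and go to MySQL.
REALTIME_TYPES = {"transaction_error", "login_failure"}


@dataclass
class LogRecord:
    timestamp: int  # Unix time in seconds
    log_type: str
    branch: str
    message: str


def route(records, unit_seconds=60):
    """Split records into MySQL rows and per-unit-time MongoDB documents.

    Real-time types become flat rows for the relational store; all other
    records are grouped by (time window, log type) into one document each.
    """
    mysql_rows = []
    buckets = defaultdict(list)
    for r in records:
        if r.log_type in REALTIME_TYPES:
            mysql_rows.append((r.timestamp, r.log_type, r.branch, r.message))
        else:
            window_start = r.timestamp - r.timestamp % unit_seconds
            buckets[(window_start, r.log_type)].append(
                {"branch": r.branch, "message": r.message}
            )
    mongo_docs = [
        {"window_start": w, "log_type": t, "count": len(entries), "entries": entries}
        for (w, t), entries in sorted(buckets.items())
    ]
    return mysql_rows, mongo_docs


if __name__ == "__main__":
    logs = [
        LogRecord(120, "deposit", "A", "ok"),
        LogRecord(150, "deposit", "B", "ok"),
        LogRecord(185, "deposit", "A", "ok"),
        LogRecord(130, "login_failure", "A", "bad pin"),
    ]
    rows, docs = route(logs)
    print(rows)
    print(docs)
```

In a deployment, the documents could be written with pymongo's `Collection.insert_many` and the rows with a DB-API cursor's `executemany`; keeping the `entries` list schema-free is what lets differently shaped log types share one collection.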