Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)
Journal of Intelligence and Information Systems, v.20 no.2, pp.109-122, 2014
People now create a tremendous amount of data on social network services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation, thereby greatly influencing society. This phenomenon is unprecedented in history, and we now live in the Age of Big Data. SNS data qualifies as Big Data because it satisfies the three defining conditions: the amount of data (volume), data input and output speeds (velocity), and the variety of data types (variety). Because SNS data covers the whole of society, discovering the trend of an issue in SNS Big Data can serve as an important new source of value creation. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set that corresponds to the daily ranking; (2) visualize the daily time-series graph of a topic over the duration of a month; (3) present the importance of a topic through a treemap based on the score system and frequency; (4) visualize the daily time-series graph of a keyword found by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process unrefined, unstructured data. In addition, such analysis requires the latest big data technology to rapidly process a large amount of real-time data, such as the Hadoop distributed system or NoSQL, which is an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data because Hadoop is designed to scale up from single-node computing to thousands of machines.
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no fixed schema or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind arbitrary data to the Document Object Model (DOM); interaction with the data is easy, and the library is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. Experiments were conducted on nearly 150 million tweets collected in Korea during March 2013.
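As a rough illustration of functions (1) and (4) above, the following minimal sketch counts keywords per day after stop-word removal and returns a daily time series for a searched keyword. It is not the TITS implementation: the stop-word list, the whitespace tokenizer (a stand-in for the noun extraction the abstract mentions), the sample tweets, and all function names are assumptions made for illustration, and no topic model, Hadoop job, or MongoDB storage is involved.

```python
from collections import Counter, defaultdict

# Hypothetical stop-word list; the abstract does not give the one TITS uses.
STOP_WORDS = {"the", "a", "is", "in", "on", "of", "and", "to"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words.

    A simple stand-in for the noun extraction step described in the abstract.
    """
    tokens = (w.strip(".,!?#@\"'():;").lower() for w in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]

def daily_keyword_counts(tweets):
    """Map each date string to a Counter of keyword frequencies.

    `tweets` is an iterable of (date, text) pairs, e.g. ("2013-03-01", "...").
    """
    counts = defaultdict(Counter)
    for date, text in tweets:
        counts[date].update(tokenize(text))
    return counts

def keyword_series(counts, keyword):
    """Return [(date, frequency)] for one keyword, sorted by date."""
    return [(d, counts[d][keyword]) for d in sorted(counts)]

def top_keywords(counts, date, n=3):
    """Return the n most frequent keywords for a given date."""
    return [w for w, _ in counts[date].most_common(n)]

# Made-up sample data, for illustration only.
sample = [
    ("2013-03-01", "Election debate on TV tonight"),
    ("2013-03-01", "The election is close"),
    ("2013-03-02", "Rain in Seoul, election news"),
]
counts = daily_keyword_counts(sample)
series = keyword_series(counts, "election")
```

A list such as `series` could then be handed to a charting layer (d3.js in the described system) to draw the daily time-series graph for the searched keyword.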
There is a general tendency to increase the nitrogen level in rice production to ensure an increased yield. On the other hand, the percentage of ripened grains decreases as the fertilizer level increases, and this decrease is one of the important yield-limiting factors. In particular, the newly developed rice variety 'Tongil' is characterized by a relatively low percentage of ripened grains compared with the other leading varieties. Therefore, these studies aimed to find measures for improving ripening in rice. The studies were carried out in the field and in the phytotron over the three years from 1970 to 1972 at the Crop Experiment Station in Suwon. The results obtained from the experiments can be summarized as follows: 1. The spikelet of Tongil was longer, narrower, and thinner, with a smaller grain volume and a lighter grain weight, than that of Jinheung. In Jinheung, the specific gravity of the grain was most closely correlated with grain weight, with progressively weaker relationships to thickness, width, and length, in that order. Tongil showed a different pattern from Jinheung: the relationship of specific gravity with grain weight was the greatest, followed by those with width, thickness, and length, in that order. 2. The distribution of grain weight by specific gravity differed from one variety to another. Most grains of Jinheung were distributed above a specific gravity of 1.12, with a peak at 1.18, but many grains of Tongil were distributed below 1.12, with a peak at 1.16. The brown/rough rice ratio declined sharply below a specific gravity of 1.06 in Jinheung, but that of Tongil did not decline over the range from 1.20 to 0.96. Accordingly, it seemed inappropriate to set the specific gravity criterion for ripened grains at 1.06 for the Tongil variety.
3. The pattern of grain weight increase after flowering differed among varieties. Generally speaking, rice varieties originating from cold areas showed a slow increase in grain weight, while the increase in Tongil was rapid except at lower temperatures in the late ripening stage. 4. In late-tillered or weak culms, the number of spikelets was small and the percentage of ripened grains was low. Tongil produced more late-tillered culms and had a longer flowering duration, especially at lower temperatures, resulting in a lower percentage of ripened grains. 5. The leaf blade of Tongil was short, broad, and erect, giving it a better light-receiving status for photosynthesis. The photosynthetic activity of Tongil per unit leaf area was higher than that of Jinheung at higher temperatures, but lower at lower temperatures. 6. Tongil was highly resistant to lodging because of its short culm length and thick lower internodes. Before flowering, Tongil had relatively higher amounts of sugars, phosphate, silicate, calcium, manganese, and magnesium. 7. The number of spikelets of Tongil was much greater than that of Jinheung. A negative correlation was observed between the number of spikelets and the percentage of ripened grains in Jinheung, but no correlation was found in Tongil grown at higher temperatures. Therefore, grain yield increased with an increased number of spikelets in Tongil. Anthesis did not occur below 21