Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)
-
- Journal of Intelligence and Information Systems
- /
- v.23 no.3
- /
- pp.69-94
- /
- 2017
Recently, increase of demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, development of IT and increased penetration rate of smart devices are producing a large amount of data. According to this phenomenon, data analysis technology is rapidly becoming popular. Also, attempts to acquire insights through data analysis have been continuously increasing. It means that the big data analysis will be more important in various industries for the foreseeable future. Big data analysis is generally performed by a small number of experts and delivered to each demander of analysis. However, increase of interest about big data analysis arouses activation of computer programming education and development of many programs for data analysis. Accordingly, the entry barriers of big data analysis are gradually lowering and data analysis technology being spread out. As the result, big data analysis is expected to be performed by demanders of analysis themselves. Along with this, interest about various unstructured data is continually increasing. Especially, a lot of attention is focused on using text data. Emergence of new platforms and techniques using the web bring about mass production of text data and active attempt to analyze text data. Furthermore, result of text analysis has been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Many text mining techniques are utilized in this field for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a lot of documents, identifies the documents that correspond to each issue and provides identified documents as a cluster. It is evaluated as a very useful technique in that reflect the semantic elements of the document. Traditional topic modeling is based on the distribution of key terms across the entire document. Thus, it is essential to analyze the entire document at once to identify topic of each document. This condition causes a long time in analysis process when topic modeling is applied to a lot of documents. In addition, it has a scalability problem that is an exponential increase in the processing time with the increase of analysis objects. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, divide and conquer approach can be applied to topic modeling. It means dividing a large number of documents into sub-units and deriving topics through repetition of topic modeling to each unit. This method can be used for topic modeling on a large number of documents with limited system resources, and can improve processing speed of topic modeling. It also can significantly reduce analysis time and cost through ability to analyze documents in each location or place without combining analysis object documents. However, despite many advantages, this method has two major problems. First, the relationship between local topics derived from each unit and global topics derived from entire document is unclear. It means that in each document, local topics can be identified, but global topics cannot be identified. Second, a method for measuring the accuracy of the proposed methodology should be established. That is to say, assuming that global topic is ideal answer, the difference in a local topic on a global topic needs to be measured. By those difficulties, the study in this method is not performed sufficiently, compare with other studies dealing with topic modeling. In this paper, we propose a topic modeling approach to solve the above two problems. First of all, we divide the entire document cluster(Global set) into sub-clusters(Local set), and generate the reduced entire document cluster(RGS, Reduced global set) that consist of delegated documents extracted from each local set. We try to solve the first problem by mapping RGS topics and local topics. Along with this, we verify the accuracy of the proposed methodology by detecting documents, whether to be discerned as the same topic at result of global and local set. Using 24,000 news articles, we conduct experiments to evaluate practical applicability of the proposed methodology. In addition, through additional experiment, we confirmed that the proposed methodology can provide similar results to the entire topic modeling. We also proposed a reasonable method for comparing the result of both methods.
This study was attempted to find out the value of cultural assets through the clear diagnosis and prescription of the dead and weakness factors of the Population of Retusa Fringe Trees in Pyeongji-ri, Jinan(Natural Monument No. 214), The results are as follows. First, Since the designation of 13 natural monuments in 1968, since 1973, many years have passed since then. In particular, despite the removal of some of the buried soil during the maintenance process, such as retreating from the fence of the primary school after 2010, Second, The first and third surviving tree of the designated trees also have many branches that are dead, the leaves are dull, and the amount of leaves is small. vitality of tree is 'extremely bad', and the first branch has already been faded by a large number of branches, and the amount of leaves is considerably low this year, so that only two flowers are bloomed. The second is also in a 'bad'state, with small leaves, low leaf density, and deformed water. The largest number 1 in the world is added to the concern that the s coverd oil is assumed to be paddy soils. Third, It is found that the composition ratio of silt is high because it is known as '[silty loam(SiL)]'. In addition, the pH of the northern soil at pH 1 was 6.6, which was significantly different from that of the other soil. In addition, the organic matter content was higher than the appropriate range, which is considered to reflect the result of continuous application for protection management. Fourth, It is considered that the root cause of failure and growth of Jinan pyeongji-ri Population of Retusa Fringe Trees group is chronic syndrome of serious menstrual deterioration due to covered soil. This can also be attributed to the newly planted succession and to some of the deaths. Fifthly, It is urgent to gradually remove the subsoil part, which is estimated to be the cause of the initial damage. Above all, it is almost impossible to remove the coverd soil after grasping the details of the soil, such as clayey soil, which is buried in the rootstock. After removal of the coverd soil, a pestle is installed to improve the respiration of the roots and the ground with Masato. And the dead 4th dead wood and the 5th and 6th dead wood are the best, and the lower layer vegetation is mown. The viable neck should be removed from the upper surface, and the bark defect should undergo surgery and induce the development of blindness by vestibule below the growth point. Sixth, The underground roots should be identified to prepare a method to improve the decompression of the root and the respiration of the soil. It is induced by the shortening of rotten roots by tracing the first half of the rootstock to induce the generation of new roots. Seventh, We try mulching to suppress weed occurrence, trampling pressure, and soil moisturizing effect. In addition, consideration should be given to the fertilization of the foliar fertilizer, the injection of the nutrients, and the soil management of the inorganic fertilizer for the continuous nutrition supply. Future monitoring and forecasting plans should be developed to check for changes continuously.
Introduction: Diffusion is process by which an innovation is communicated through certain channel overtime among the members of a social system(Rogers 1983). Bass(1969) suggested the Bass model describing diffusion process. The Bass model assumes potential adopters of innovation are influenced by mass-media and word-of-mouth from communication with previous adopters. Various expansions of the Bass model have been conducted. Some of them proposed a third factor affecting diffusion. Others proposed multinational diffusion model and it stressed interactive effect on diffusion among several countries. We add a spatial factor in the Bass model as a third communication factor. Because of situation where we can not control the interaction between markets, we need to consider that diffusion within certain market can be influenced by diffusion in contiguous market. The process that certain type of retail extends is a result that particular market can be described by the retail life cycle. Diffusion of retail has pattern following three phases of spatial diffusion: adoption of innovation happens in near the diffusion center first, spreads to the vicinity of the diffusing center and then adoption of innovation is completed in peripheral areas in saturation stage. So we expect spatial effect to be important to describe diffusion of domestic discount store. We define a spatial diffusion model using multinational diffusion model and apply it to the diffusion of discount store. Modeling: In this paper, we define a spatial diffusion model and apply it to the diffusion of discount store. To define a spatial diffusion model, we expand learning model(Kumar and Krishnan 2002) and separate diffusion process in diffusion center(market A) from diffusion process in the vicinity of the diffusing center(market B). The proposed spatial diffusion model is shown in equation (1a) and (1b). Equation (1a) is the diffusion process in diffusion center and equation (1b) is one in the vicinity of the diffusing center.