Efficient Topic Modeling by Mapping Global and Local Topics
- Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
Recently, growing demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, advances in IT and the increasing penetration of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming popular, and attempts to gain insights through data analysis are continuously increasing. This suggests that big data analysis will become more important in various industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to those who request the analysis. However, growing interest in big data analysis has stimulated computer programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually lowering and data analysis technology is spreading. As a result, big data analysis is expected to be performed by the people who need the analysis themselves. Along with this, interest in various kinds of unstructured data is continually increasing, and much attention is focused on text data in particular. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it. Furthermore, the results of text analysis have been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Among the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is regarded as a very useful technique in that it reflects the semantic elements of the documents.
Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, the entire document set must be analyzed at once to identify the topic of each document. This requirement makes the analysis take a long time when topic modeling is applied to a large number of documents. In addition, it causes a scalability problem: processing time increases exponentially as the number of analysis objects increases. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling. This means dividing a large number of documents into sub-units and deriving topics by repeating topic modeling on each unit. This method makes topic modeling on a large number of documents possible with limited system resources and can improve the processing speed of topic modeling. It can also significantly reduce analysis time and cost, because documents can be analyzed at each location without first being combined. However, despite these advantages, the method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear. That is, local topics can be identified for each document, but global topics cannot. Second, a method for measuring the accuracy of such a methodology needs to be established. That is, taking the global topics as the ideal answer, the deviation of a local topic from a global topic needs to be measured. Because of these difficulties, this approach has not been studied sufficiently compared with other studies dealing with topic modeling. In this paper, we propose a topic modeling approach that solves the above two problems.
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced entire document cluster (RGS, reduced global set) that consists of representative documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. Along with this, we verify the accuracy of the proposed methodology by checking whether each document is assigned to the same topic in the global and local results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. In addition, through an additional experiment, we confirm that the proposed methodology can provide results similar to topic modeling on the entire set. We also propose a reasonable method for comparing the results of the two methods.
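The mapping and verification steps described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not say how RGS topics and local topics are matched, so the sketch assumes each topic is a term-weight dictionary and that a local topic maps to the RGS topic with the highest cosine similarity. The function names (`cosine`, `map_local_to_rgs`, `agreement`) and all toy topics and documents are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def map_local_to_rgs(local_topics, rgs_topics):
    """Assumed mapping rule: each local topic goes to its most similar RGS topic."""
    return {
        local_id: max(rgs_topics, key=lambda g: cosine(local_vec, rgs_topics[g]))
        for local_id, local_vec in local_topics.items()
    }


def agreement(doc_local_topic, doc_global_topic, mapping):
    """Share of documents whose mapped local topic equals their global topic."""
    same = sum(
        1 for doc, local_id in doc_local_topic.items()
        if mapping[local_id] == doc_global_topic[doc]
    )
    return same / len(doc_local_topic)


# Hypothetical toy topics (term -> weight); not data from the paper.
rgs_topics = {
    "G_sports": {"game": 0.6, "team": 0.4},
    "G_economy": {"market": 0.7, "stock": 0.3},
}
local_topics = {
    "L1": {"team": 0.5, "game": 0.3, "coach": 0.2},
    "L2": {"stock": 0.6, "market": 0.2, "bank": 0.2},
}
mapping = map_local_to_rgs(local_topics, rgs_topics)

# Hypothetical per-document topic assignments from the local and global runs.
doc_local_topic = {"d1": "L1", "d2": "L2", "d3": "L1", "d4": "L2"}
doc_global_topic = {"d1": "G_sports", "d2": "G_economy",
                    "d3": "G_economy", "d4": "G_economy"}
score = agreement(doc_local_topic, doc_global_topic, mapping)
print(mapping, score)
```

In this toy case three of the four documents receive the same topic under both routes. The paper's own comparison method may differ from this simple agreement ratio.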
Stevia (Stevia rebaudiana Bertoni) is a perennial herb widely distributed in the mountainous areas of Paraguay. It belongs to the family Compositae and contains 6 to 12 percent stevioside in its leaves. Stevioside is a glucoside with a sweetening character similar to that of sugar, and it is approximately 300 times as sweet as sugar. Since Korea does not produce any sugar crops and synthetic sweeteners are potentially hazardous to health, it is rather urgent to develop an economical new sweetener. Consequently, the current experiments were conducted to establish cultural practices for stevia, a new sweetening herb introduced into Korea in 1973, and the results are summarized as follows: 1. Days from transplanting of cuttings to flower bud formation in 6 stevia lines were similar among daylengths of 8, 10 and 12 hours, but were much greater at daylengths of 14 or 24 hours, and varietal differences were noticeable. All lines were photosensitive, but line 77013 was the most sensitive, and 77067 and Suweon 2 were less sensitive to daylength. 2. The critical daylength of all lines seemed to be approximately 12 hours. Growth of plants was severely retarded at daylengths of less than 12 hours. 3. Cuttings responded to short daylength before rooting. The number of days from transplanting to flower bud formation for 40-day-old cuttings in the nursery bed was 20 days, and flower bud formation was delayed as the nursery duration became shorter. 4. The number of days from emergence to flower bud formation was shortest when short day treatment began 20 days after emergence. It became longer when short day treatment was initiated earlier or later than 20 days after emergence. 5. Plant height, number of branches, and top dry weight of stevia were reduced as the cutting date was delayed from March 20 to May 20. The highest yield of dry leaf was obtained at a nursery duration of 40-50 days for the March 20 cutting, 30-40 days for the April 20 cutting, and 30 days for the May 20 cutting.
6. An asymptotic relationship was observed between plant population and leaf dry weight. Yield of dry leaf increased rapidly as plant population increased from 5,000 to 10,000 plants/10a, increased at a reduced rate from 10,000 to 20,000 plants/10a, and levelled off at plant populations higher than 20,000 plants/10a. 7. Stevia was adaptable in Suweon, Chengju, Mokpo and Jeju, and drought was one of the main factors reducing the yield of dry leaf. Yield of dry leaf was reduced significantly (approximately 30%) at the June 20 transplanting compared to the optimum transplanting. 8. Yield of dry leaf was higher in a vinyl house than in the unprotected control at long daylength or natural daylength, except for the short day treatment at March 20. Higher temperature in a vinyl house did not have beneficial effects at the April 20 transplanting. 9. The highest content of stevioside was noted in the upper leaves of the plant, whereas the lowest was measured in the plant parts 20cm above ground. Leaf dry weight and stevioside yield came mainly from the plant parts 60 to 120cm above ground, but varietal differences were also significant. 10. Delaying harvest until the time of flower bud formation increased leaf dry weight remarkably. However, yield changed insignificantly when harvests were made at any time after flower bud formation. Stevioside content was highest at the time of flower bud formation and was lower at earlier or later harvests. The optimum harvesting time, determined by leaf dry weight and stevioside content, was the period from flower bud formation to just before flowering, which would be from September 10 to September 15 in the Suweon area. 11. Stevioside and rebaudioside contents in the leaves of stevia varieties ranged from 5.4% to 14.3% and from 1.5% to 8.3%, respectively. However, no definite relationship between stevioside and rebaudioside was observed in these particular experiments.
In a case in which the National Health Insurance Corporation (NHIC) pays medical care expenses to a victim of a traffic accident resulting in injury or death and then seeks reimbursement of its share of those expenses from the assailant, the precedent treats the subrogation of a claim under the National Health Insurance Act the same as that under the Industrial Accident Compensation Insurance Act. It therefore derives the range of the NHIC's compensation from the range of deduction according to the principle of deduction after offsetting, and acknowledges compensation for all medical care expenses borne by the NHIC, within the amount of compensation claimed by the victim. However, although both the National Health Insurance Act and the Industrial Accident Compensation Insurance Act are laws that regulate social insurance, medical care benefits under the National Health Insurance Act have the character of 'an underinsurance that fixes the ratio of indemnification,' while insurance benefits under the Industrial Accident Compensation Insurance Act have the character of full insurance, or focus on helping the insured who suffered an industrial accident lead a life close to that before the accident, regardless of the amount of damages, in keeping with its character as social insurance. Therefore, there is no reason to treat the subrogation of a claim under the National Health Insurance Act the same as that under the Industrial Accident Compensation Insurance Act. Since the insured, in return for receiving the insurance benefit, loses the right of claim acquired by the insurer through subrogation, the insured gains no benefit from the insurance payment within that range. Thus, in a suit in which the insured seeks compensation for damages from the assailant, there is no room to apply the legal principle of offsetting profits and losses, and the range of subrogation of a claim or the amount of deduction from compensation should be decided by the contract between the parties directly involved or by the relevant law.
Therefore, it is not reasonable for the precedent to derive the range of the NHIC's compensation from the principle of deduction after offsetting. If Clause 1, Article 58 of the National Health Insurance Act, which sets the range of the NHIC's compensation, is interpreted uniformly and systematically in combination with Clause 2 of the same article, which sets the range of exemption if the compensation is made first, it is reasonable to fix the range of the NHIC's compensation by multiplying the medical care expenses paid by the ratio of the assailant's liability. This contrasts with the range of the Korea Labor Welfare Corporation's compensation, which, under the interpretation of Clauses 1 and 2, Article 87 of the Industrial Accident Compensation Insurance Act, covers the total amount of the insured's claim within the insurance benefit paid. In the meantime, doubts remain about the principle of deduction after offsetting that the precedent took as its premise in judging the range of the NHIC's compensation: why the profit should be deducted from the amount of compensation claimed, when deducting the profit made by the victim from the amount of damages is enough to achieve the goal of not attributing to a victim more profit than the amount of damage; whether it is reasonable to attribute all the profit made by the victim to the assailant, while the damages suffered by the victim are distributed fairly; and whether there is concrete validity in actual cases. Therefore, the legal principle of the precedent concerning the range of the NHIC's compensation and the legal principle of the precedent following the principle of deduction after offsetting should be reconsidered.
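The difference between the two ranges of compensation can be made concrete with a small worked example in Python. This is only an illustrative sketch: the amounts and the 70% liability ratio are hypothetical, the damages are assumed to consist solely of medical care expenses, and the function names are invented for the illustration. The first function follows the precedent as summarized above (the NHIC recovers everything it paid, within the victim's claim after the fault offset). The second follows the interpretation argued for here (expenses paid multiplied by the assailant's liability ratio).

```python
def nhic_range_precedent(total_damages, nhic_paid, liability_pct):
    """Precedent: full NHIC outlay, capped by the victim's claim after offsetting."""
    victim_claim = total_damages * liability_pct // 100
    return min(nhic_paid, victim_claim)


def nhic_range_proposed(nhic_paid, liability_pct):
    """Proposed reading: NHIC outlay multiplied by the assailant's liability ratio."""
    return nhic_paid * liability_pct // 100


# Hypothetical figures in won; damages assumed to be medical expenses only.
total_damages = 10_000_000   # total medical care expenses
nhic_paid = 6_000_000        # share borne by the NHIC
liability_pct = 70           # assailant's liability ratio in percent

victim_claim = total_damages * liability_pct // 100
precedent = nhic_range_precedent(total_damages, nhic_paid, liability_pct)
proposed = nhic_range_proposed(nhic_paid, liability_pct)

# What remains of the victim's claim once the NHIC's subrogated part is removed.
victim_left_precedent = victim_claim - precedent
victim_left_proposed = victim_claim - proposed
print(precedent, proposed, victim_left_precedent, victim_left_proposed)
```

With these assumed figures the precedent lets the NHIC recover 6,000,000 won and leaves the victim 1,000,000 won, while the proposed reading gives the NHIC 4,200,000 won and leaves the victim 2,800,000 won.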
Purpose: This study was conducted to examine coffee consumption behaviors, dietary habits, and nutrient intakes by coffee intake amount among university students. Methods: Questionnaires were distributed to 300 university students randomly selected in Gongju. A dietary survey was conducted over two weekdays using the food record method. Results: Subjects were divided into three groups according to coffee intake amount and the distribution of subjects: NCG (non-coffee group), LCG (low coffee group, 1~2 cups/d), and HCG (high coffee group, 3 cups/d). Coffee intake frequency was significantly greater in the HCG than in the LCG (p < 0.001). The HCG was more likely than the LCG to drink dripped coffee with or without milk and/or sugar (p < 0.05). More than 80% of coffee drinkers chose their favorite coffee or accompanying snacks regardless of energy content. More than 75% of coffee drinkers did not eat accompanying snacks in place of meals, although the HCG did so more frequently than the LCG (p < 0.05). The breakfast skipping rate was high, while vegetable and fruit intakes were very low in most subjects. The number of subjects who drank carbonated drinks, sweet beverages, or alcohol was significantly greater in the LCG and HCG than in the NCG (p < 0.01). Energy intakes from coffee were
Introduction: Diffusion is the process by which an innovation is communicated through certain channels over time among the members of a social system (Rogers 1983). Bass (1969) proposed the Bass model to describe the diffusion process. The Bass model assumes that potential adopters of an innovation are influenced by mass media and by word-of-mouth communication with previous adopters. Various extensions of the Bass model have been developed. Some proposed a third factor affecting diffusion. Others proposed multinational diffusion models, which stress the interactive effect on diffusion among several countries. We add a spatial factor to the Bass model as a third communication factor. Because we cannot control the interaction between markets, we need to consider that diffusion within a certain market can be influenced by diffusion in a contiguous market. The process by which a certain type of retail expands in a particular market can be described by the retail life cycle. Diffusion of retail follows the three phases of spatial diffusion: adoption of the innovation happens near the diffusion center first, spreads to the vicinity of the diffusion center, and is then completed in peripheral areas in the saturation stage. We therefore expect the spatial effect to be important in describing the diffusion of domestic discount stores. We define a spatial diffusion model using the multinational diffusion model and apply it to the diffusion of discount stores. Modeling: In this paper, we define a spatial diffusion model and apply it to the diffusion of discount stores. To define the spatial diffusion model, we extend the learning model (Kumar and Krishnan 2002) and separate the diffusion process in the diffusion center (market A) from the diffusion process in the vicinity of the diffusion center (market B). The proposed spatial diffusion model is shown in equations (1a) and (1b).
Equation (1a) describes the diffusion process in the diffusion center, and equation (1b) describes the diffusion process in the vicinity of the diffusion center.
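Equations (1a) and (1b) are not reproduced in this text, so the following Python sketch does not implement them. It only illustrates the general idea under stated assumptions: market A follows the standard discrete-time Bass model, in which new adopters equal (p + q * N/m) * (m - N), and market B adds a hypothetical spillover term c times the adopted share in market A. The coefficient `c`, the function name `simulate_two_markets`, and all parameter values are assumptions made for illustration.

```python
def simulate_two_markets(p, q, c, m_a, m_b, periods):
    """Discrete-time Bass diffusion in market A plus an assumed spillover into B.

    p: coefficient of innovation (mass-media influence)
    q: coefficient of imitation (word-of-mouth influence)
    c: hypothetical spatial coefficient (influence of market A on market B)
    m_a, m_b: market potentials of A and B
    """
    cum_a, cum_b = 0.0, 0.0
    path = []
    for _ in range(periods):
        # Standard Bass hazard for market A, using last period's adoption share.
        hazard_a = min(p + q * cum_a / m_a, 1.0)
        # Assumed spatial extension: adoption in A raises the hazard in B.
        hazard_b = min(p + q * cum_b / m_b + c * cum_a / m_a, 1.0)
        cum_a += hazard_a * (m_a - cum_a)
        cum_b += hazard_b * (m_b - cum_b)
        path.append((cum_a, cum_b))
    return path


# Hypothetical parameters; not estimates from the paper.
path = simulate_two_markets(p=0.03, q=0.4, c=0.1, m_a=100.0, m_b=100.0, periods=30)
for period, (a, b) in enumerate(path[:5], start=1):
    print(period, round(a, 2), round(b, 2))
```

With equal market potentials and a positive `c`, cumulative adoption in market B runs ahead of a pure Bass path, which is one simple way to picture diffusion in a contiguous market being influenced by the diffusion center.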