• Title/Summary/Keyword: 시간추출 (time extraction)

Methods for Integration of Documents using Hierarchical Structure based on the Formal Concept Analysis (FCA 기반 계층적 구조를 이용한 문서 통합 기법)

  • Kim, Tae-Hwan; Jeon, Ho-Cheol; Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.17 no.3 / pp.63-77 / 2011
  • The World Wide Web is a very large distributed digital information space. From its origins in 1991, the web has grown to encompass diverse information resources such as personal home pages, online digital libraries, and virtual museums. Some estimates suggest that the web currently includes over 500 billion pages in the deep web. The ability to search and retrieve information from the web efficiently and effectively is an enabling technology for realizing its full potential. With powerful workstations and parallel processing technology, efficiency is not a bottleneck. In fact, some existing search tools sift through gigabyte-size precompiled web indexes in a fraction of a second. But retrieval effectiveness is a different matter. Current search tools retrieve too many documents, of which only a small fraction are relevant to the user query. Furthermore, the most relevant documents do not necessarily appear at the top of the query output order. Also, current search tools cannot retrieve documents related to a retrieved document from a gigantic collection of documents. The most important problem for many current search systems is to increase the quality of search, that is, to provide related documents and to reduce the number of unrelated documents in the search results as far as possible. For this problem, CiteSeer proposed ACI (Autonomous Citation Indexing) of the articles on the World Wide Web. A "citation index" indexes the links between articles that researchers make when they cite other articles. Citation indexes are very useful for a number of purposes, including literature search and analysis of the academic literature. In more detail, references contained in academic articles are used to give credit to previous work in the literature and provide a link between the "citing" and "cited" articles. A citation index indexes the citations that an article makes, linking the articles with the cited works.
Citation indexes were originally designed mainly for information retrieval. The citation links allow navigating the literature in unique ways. Papers can be located independent of language and of the words in the title, keywords, or document. A citation index allows navigation backward in time (the list of cited articles) and forward in time (which subsequent articles cite the current article?). But CiteSeer cannot index links between articles that researchers do not make, because it indexes only the links that researchers make when they cite other articles. For the same reason, CiteSeer does not scale easily. These problems motivate the design of a more effective search system. This paper presents a method that extracts the subject and predicate of each sentence in a document. A document is converted into a table in which each extracted predicate is checked against its possible subjects and objects. We build a hierarchical graph of a document from the table and then integrate the graphs of documents. From the graph of the entire document set, the area of each document is calculated relative to the integrated documents, and relations among documents are marked by comparing these areas. The paper also proposes a method for structural integration of documents that retrieves documents from the graph, which makes it easier for the user to find information. We compared the performance of the proposed approaches with the Lucene search engine using the formulas for ranking. As a result, the F-measure is about 60%, which is about 15% better.
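
The table-to-hierarchy step described in this abstract can be illustrated with a small Formal Concept Analysis sketch. This is not the authors' implementation: the function name `formal_concepts`, the toy subjects and predicates, and the brute-force enumeration are all assumptions made for illustration. It treats subjects as objects and predicates as attributes, then lists every formal concept (a maximal set of subjects sharing a maximal set of predicates).

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate formal concepts of a small context.

    context maps each object (here: a sentence subject) to the set of
    attributes (here: predicates) it was checked against in the table.
    Brute force over object subsets, so only suitable for toy inputs.
    """
    objects = list(context)
    all_attrs = set().union(*context.values()) if context else set()

    def common_attrs(objs):
        # Attributes shared by every object in objs (all attributes if objs is empty).
        attrs = set(all_attrs)
        for obj in objs:
            attrs &= context[obj]
        return attrs

    def objects_with(attrs):
        # Objects that carry every attribute in attrs.
        return {obj for obj in objects if attrs <= context[obj]}

    concepts = set()
    for size in range(len(objects) + 1):
        for objs in combinations(objects, size):
            intent = common_attrs(objs)
            extent = objects_with(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical subject/predicate table for one tiny document.
toy_context = {
    "web": {"grows", "contains"},
    "index": {"contains"},
    "user": {"searches"},
}
toy_concepts = formal_concepts(toy_context)
```

Ordering the resulting concepts by inclusion of their extents gives the kind of hierarchical graph the abstract refers to. How the paper computes document "areas" on that graph is not specified in the abstract, so it is left out here.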

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.125-148 / 2018
  • Recently, as the demand for big data analysis increases, cases of analyzing unstructured data and using the results are also increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and it is relatively easy to collect compared to other unstructured and structured data. Among the various text analysis applications, document classification, which classifies documents into predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents, have been actively studied. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In addition, much research has been done in academia on the extraction approach, which selectively provides the main elements of the document, and the abstraction approach, which extracts elements of the document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not made as much progress as techniques for automatic text summarization. Most existing studies on the quality evaluation of summarization manually summarized documents, used the results as reference documents, and measured the similarity between the automatic summary and the reference document. Specifically, automatic summarization is performed on the full text through various techniques, and the result is compared with the reference document, which is an ideal summary, to measure the quality of the automatic summarization.
Reference documents are provided in two major ways. The most common way is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in preparing the summary, it takes a lot of time and cost to write the summary, and it has the limitation that the evaluation result may differ depending on the subjectivity of the summarizer. Therefore, in order to overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. As a representative attempt, a method has recently been devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more the frequent terms of the full text appear in the summary, the better the quality of the summary is judged to be. However, since summarization essentially means reducing a large amount of content while minimizing content omissions, it is unreasonable to say that a "good summary" based only on frequency is always a "good summary" in the essential sense. In order to overcome the limitations of these previous studies of summarization evaluation, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little content is duplicated among the sentences of the summary, and completeness is defined as an element indicating how little of the content is missing from the summary. In this paper, we propose a method for automatic quality evaluation of text summarization based on the concepts of succinctness and completeness.
In order to evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor's hotel reviews, the reviews were summarized for each hotel, and the results of experiments evaluating the quality of the summaries according to the proposed methodology are presented. The paper also provides a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-score, and proposes a method to perform the optimal summarization by changing the threshold of the sentence similarity.
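
The idea of scoring a summary by completeness and succinctness and combining the two into an F-score can be sketched as follows. The abstract does not give the exact formulas, so everything here is an assumption: sentence similarity is approximated by Jaccard overlap of lowercase tokens, completeness is the share of source sentences matched by some summary sentence at or above a threshold, succinctness is the share of summary sentences that do not duplicate an earlier one, and the combined score is their harmonic mean.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity (an assumed stand-in for sentence similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def completeness(source, summary, threshold):
    """Fraction of source sentences covered by at least one summary sentence."""
    if not source:
        return 0.0
    covered = sum(
        1 for s in source if any(jaccard(s, t) >= threshold for t in summary)
    )
    return covered / len(source)


def succinctness(summary, threshold):
    """Fraction of summary sentences that do not repeat an earlier summary sentence."""
    if not summary:
        return 0.0
    unique = sum(
        1
        for i, t in enumerate(summary)
        if all(jaccard(t, prev) < threshold for prev in summary[:i])
    )
    return unique / len(summary)


def f_score(c, s):
    """Harmonic mean of completeness and succinctness."""
    return 0.0 if c + s == 0 else 2 * c * s / (c + s)


# Hypothetical review sentences and a summary with one duplicated sentence.
reviews = ["the room was clean", "the staff was friendly", "breakfast was great"]
summary = ["the room was clean", "the room was clean", "the staff was friendly"]
score = f_score(completeness(reviews, summary, 0.5), succinctness(summary, 0.5))
```

Sweeping `threshold` over a range and keeping the value with the highest `f_score` mirrors the threshold tuning mentioned in the abstract, under the assumptions above.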

Extension Method of Association Rules Using Social Network Analysis (사회연결망 분석을 활용한 연관규칙 확장기법)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.23 no.4 / pp.111-126 / 2017
  • Recommender systems based on association rule mining significantly contribute to seller's sales by reducing consumers' time to search for products that they want. Recommendations based on the frequency of transactions such as orders can effectively screen out the products that are statistically marketable among multiple products. A product with a high possibility of sales, however, can be omitted from the recommendation if it records insufficient number of transactions at the beginning of the sale. Products missing from the associated recommendations may lose the chance of exposure to consumers, which leads to a decline in the number of transactions. In turn, diminished transactions may create a vicious circle of lost opportunity to be recommended. Thus, initial sales are likely to remain stagnant for a certain period of time. Products that are susceptible to fashion or seasonality, such as clothing, may be greatly affected. This study was aimed at expanding association rules to include into the list of recommendations those products whose initial trading frequency of transactions is low despite the possibility of high sales. The particular purpose is to predict the strength of the direct connection of two unconnected items through the properties of the paths located between them. An association between two items revealed in transactions can be interpreted as the interaction between them, which can be expressed as a link in a social network whose nodes are items. The first step calculates the centralities of the nodes in the middle of the paths that indirectly connect the two nodes without direct connection. The next step identifies the number of the paths and the shortest among them. These extracts are used as independent variables in the regression analysis to predict future connection strength between the nodes. 
The strength of the connection between the two nodes of the model, which is defined by the number of nodes between the two nodes, is measured after a certain period of time. The regression analysis results confirm that the number of paths between the two products, the distance of the shortest path, and the number of neighboring items connected to the products are significantly related to their potential strength. This study used actual order transaction data collected for three months from February to April in 2016 from an online commerce company. To reduce the complexity of analytics as the scale of the network grows, the analysis was performed only on miscellaneous goods. Two consecutively purchased items were chosen from each customer's transactions to obtain a pair of antecedent and consequent, which secures a link needed for constituting a social network. The direction of the link was determined in the order in which the goods were purchased. Except for the last ten days of the data collection period, the social network of associated items was built for the extraction of independent variables. The model predicts the number of links to be connected in the next ten days from the explanatory variables. Of the 5,711 previously unconnected links, 611 were newly connected for the last ten days. Through experiments, the proposed model demonstrated excellent predictions. Of the 571 links that the proposed model predicts, 269 were confirmed to have been connected. This is 4.4 times more than the average of 61, which can be found without any prediction model. This study is expected to be useful regarding industries whose new products launch quickly with short life cycles, since their exposure time is critical. Also, it can be used to detect diseases that are rarely found in the early stages of medical treatment because of the low incidence of outbreaks. 
Since the complexity of the social networking analysis is sensitive to the number of nodes and links that make up the network, this study was conducted in a particular category of miscellaneous goods. Future research should consider that this condition may limit the opportunity to detect unexpected associations between products belonging to different categories of classification.
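
Two of the explanatory variables named in this abstract, the number of indirect paths between two unconnected items and the length of the shortest one, can be computed with a short graph sketch. The item names, the `max_hops` cutoff, and the function names are hypothetical, and the centrality and regression steps are omitted. The last lines recheck the reported figures, reading the "average of 61" as the number of hits expected from picking 571 of the 5,711 candidate links at random (an interpretation, not a statement from the paper).

```python
from collections import deque


def shortest_path_length(graph, src, dst):
    """Breadth-first search on a directed adjacency dict; hop count or None."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def count_simple_paths(graph, src, dst, max_hops):
    """Count simple directed paths from src to dst that use at most max_hops links."""

    def walk(node, visited, hops):
        if node == dst:
            return 1
        if hops == max_hops:
            return 0
        return sum(
            walk(nxt, visited | {nxt}, hops + 1)
            for nxt in graph.get(node, ())
            if nxt not in visited
        )

    return walk(src, {src}, 0)


# Hypothetical purchase-order links: antecedent item -> consequent items.
links = {
    "bag": ["belt", "wallet"],
    "belt": ["scarf"],
    "wallet": ["scarf", "belt"],
    "scarf": [],
}
n_paths = count_simple_paths(links, "bag", "scarf", max_hops=3)
shortest = shortest_path_length(links, "bag", "scarf")

# Arithmetic check of the reported result, under the random-baseline reading.
expected_random_hits = 571 * 611 / 5711
lift = 269 / expected_random_hits
```

With these numbers `expected_random_hits` is close to 61 and `lift` is close to 4.4, which agrees with the figures quoted in the abstract.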

The meaning based on Yin-Yang and Five Elements Principle in Semantic Landscape Composition of 'the Forty Eight Poems of Soswaewon' ('소쇄원(瀟灑園) 48영'의 의미경관 구성에 있어서 음양오행론적(陰陽五行論的) 의미(意味))

  • Jang, Il-Young; Shin, Sang-Sup
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.31 no.2 / pp.43-57 / 2013
  • The purpose of this study is to identify potential semantic landscape makeup of "the Forty Eight Poems of Soswaewon" according to Yin-Yang and Five Elements Principle(陰陽五行論). that speculation system between human's nature and cosmical universal order. Existing academic discussions made so far concerning this topic can be summed up as follows: 1. Among Yin-Yang-based landscape makeups of the Forty Eight Poems of Soswaewon, poetic writings for embodiment of interactions between nature and human behaviors focused on depicting dynamic aspects of a poetic narrator when he appreciates or explores hills and streams as of to live free from worldly cares. Primarily, many of those writings were created on the east and south primarily through assignment of yang. On the other hand, poetic writings for embodiment of nature and seasonal scenery - as static landscape makeup of yin - were often created on or near the north and west for many times. Those writings focusing on embodiment of nature and artificial scenery as a work are divided into two categories: One category refers to author Kim In-hu's expression of semantic landscape from seasonal scenery in nature. The other refers to his depiction of realistic garden images as they are. In the Forty Eight Poems of Soswaewon, the poetic writings show that author Kim focused on embodying seasonal scenery rather than expressing human behaviors. In addition, both Poem No. 1 and Poem No. 48(last poem; titled 'Jangwon Jeyeong') were created in a same place, which author Kim sought to understand the place as a space of beginning and end where yin and yang - i.e. the principle of natural cycle - are inherent. 2. According to construction about landscape in the Forty Eight Poems of Soswaewon on the basis of Ohaeng-ron (five natural element principle), it was found that tree(木) and fire(火) are typical examples of a world combined by emanation. 
First, many of poetic writings depicting the sentiments of tree focused on embodying seasonal scenery and were located in the place of Ogogmun(五曲門) area in the east, from overall perspective of Soswaewon. The content of these poems shows generation and curve / straightness in flexibility and simplicity. Many of poems depicting the sentiments of fire(火) focused on embodying human behaviors, and they were created in Aeyangdan area on the south of Soswaewon over which sun rises at noon. These poems are all on a status of side movement that is characterized by emanation and ascension which belong to attributes of yang. 3. With regard to Ohaeng-ron's interpretation about landscape in the Forty Eight Poems of Soswaewon, it was found that metal(金) and water(水) are typical examples of world combined by convergence. First, it was found that all of poems depicting sentiments of metal focused on embodying seasonal scenery, and were created in a bamboo grove area on the west from overall perspective of Soswaewon. They represent scenery of autumn among 4 seasons to symbolize faithfulness vested in a man of virtue(seonbi) with integrity and righteousness. Poems depicting sentiments of water were created in vicinity of Jewoldang on the north, possibly topmost of Soswaewon. They were divided into two categories: One category refers to poems embodying actions of welcoming the first full moon deep in the night after sunset, and the other refers to poems embodying natural scenery of snowscape. All of those poems focused on expressing any atmosphere of turning into yin via convergence. 4. With regard to Ohaeng-ron's interpretation of landscape in the Forty Eight Poems of Soswaewon, it was found that poems depicting sentiments of earth(土), a complex body of convergence and emanation, were created in vicinity of mountain stream around Gwangpunggak which is located in the center of Soswaewon. 
These poems focused on carrying actions of author Kim by way of natural phenomena and artificial scenery.

A Survey on Physical Complaints Related with Farmers' Syndrome of Vinylhouse and Non-vinylhouse Farmers (비닐하우스 재배농민과 일반농민의 농부증 관련 신체증상 호소율 조사)

  • Lee, Ju-Young; Park, Jung-Han; Kim, Doo-Hie
    • Journal of Preventive Medicine and Public Health / v.27 no.2 s.46 / pp.258-273 / 1994
  • To compare the physical complaints of vinylhouse farmers with those of non-vinylhouse farmers, personal interviews were conducted with 250 vinylhouse and 142 non-vinylhouse farmers selected by random sampling in Sungjoo county, Kyungpook province, from July 5 to July 10, 1993. Blood pressure of the subjects was also measured. Vinylhouse farmers had a higher average age, larger family size, shorter experience of farming, more working hours per day and working days per year, and higher annual income than the non-vinylhouse farmers. The frequency of pesticide spraying in June 1993 averaged 3.4 times for the vinylhouse farmers compared with 2.0 times for non-vinylhouse farmers, and over the preceding year it was 16.7 times for the vinylhouse farmers and 8.3 times for the non-vinylhouse farmers. While 39.6% of vinylhouse farmers experienced pesticide intoxication symptoms such as headache, nausea, vomiting, dizziness, itching, and skin irritation during the month of June, 25.4% of non-vinylhouse farmers experienced such symptoms. The most frequent symptoms among the eight symptoms that constitute the farmers' syndrome were lumbago, numbness of hand or foot, shoulder pain and dizziness, regardless of sex and type of farming. The prevalence of the farmers' syndrome among vinylhouse farmers was 22.1% for males and 43.4% for females, and the prevalence among non-vinylhouse farmers was 23.2% for males and 50.7% for females. There was no statistically significant difference in the prevalence of farmers' syndrome between vinylhouse and non-vinylhouse farmers. However, the prevalence in females was about 2 times higher than that in males. When the effects of other factors were adjusted by multiple logistic regression for farmers' syndrome, the prevalence in females was 3.0 times higher than that in males.
The prevalence of farmers' syndrome increased with the age of farmers in both vinylhouse and non-vinylhouse farmers, and the adjusted odds ratio of farmers' syndrome increased by 3% for each 1-year increase in age. The adjusted odds ratio for farmers' syndrome in farmers who experienced pesticide intoxication during the month of June was 3.1 times higher than that of farmers who did not have such experience. While the prevalence of hypertension in non-vinylhouse farmers was 22.4% for males and 13.7% for females, the prevalence in vinylhouse farmers was 13.5% for males and 12.0% for females. However, there was no association between farmers' syndrome and hypertension. It was found in this study that vinylhouse farmers are at a high risk of pesticide intoxication, which is associated with the common physical complaints. To reduce such risk, it is necessary to develop farming methods that require little or no pesticide, a safer method of pesticide spraying, and protective equipment that can be worn at high temperatures and offers better protection. Also, educating farmers on correct ventilation after pesticide spraying in the vinylhouse and on wearing protective equipment may be considered as a supportive measure. Since inappropriate posture at work and intensive labor may cause farmers' syndrome, it is recommended to develop farming tools that reduce physical burden and to rest and exercise periodically during work. It is necessary to strengthen the hypertension management program of Kyungpook province, because the prevalence of hypertension was as high as about 15%.
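
The ratios quoted in this abstract can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up, the crude odds ratio is computed directly from the reported prevalences and is therefore unadjusted (it will not match the adjusted value of 3.0 from the logistic regression), and the per-decade figure assumes the usual logistic-regression property that a per-year odds ratio compounds multiplicatively.

```python
def prevalence_ratio(p_exposed, p_reference):
    """Ratio of two prevalences given in percent."""
    return p_exposed / p_reference


def crude_odds_ratio(p_exposed, p_reference):
    """Unadjusted odds ratio from two prevalences given in percent."""
    odds_exposed = p_exposed / (100.0 - p_exposed)
    odds_reference = p_reference / (100.0 - p_reference)
    return odds_exposed / odds_reference


def compounded_odds_ratio(per_unit_or, units):
    """Odds ratio over several units when the per-unit odds ratio is constant."""
    return per_unit_or ** units


# Female vs. male prevalence of farmers' syndrome among vinylhouse farmers.
ratio_vinyl = prevalence_ratio(43.4, 22.1)
crude_or_vinyl = crude_odds_ratio(43.4, 22.1)

# A 3% increase in odds per year of age, compounded over 10 years.
or_per_decade = compounded_odds_ratio(1.03, 10)
```

The prevalence ratio for vinylhouse farmers comes out just under 2, in line with the statement that prevalence in females was about 2 times that in males.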

Effects of Crude Protein Levels in Total Mixed Rations on Dry Matter Intake, Digestibility and Nitrogen Balance in Early Pregnant Korean Black Goats (섬유질배합사료 내 조단백질 수준이 임신초기 흑염소의 건물섭취량, 소화율 및 질소출납에 미치는 영향)

  • HwangBo, Soon; Choi, Sun-Ho; Lee, Sung-Hoon; Kim, Sang-Woo; Kim, Young-Keun; Sang, Byung-Don; Jo, Ik-Hwan
    • Journal of The Korean Society of Grassland and Forage Science / v.27 no.2 / pp.93-100 / 2007
  • This study was conducted to determine the effects of different levels (10, 12 and 15%) of crude protein (CP) in total mixed ration (TMR) on dry matter intake, digestibility and nitrogen balance of Korean black goats in the stage of early pregnancy and to obtain information on their optimal dietary levels of CP. In the present study, 12 does of Korean black goats in early pregnancy were allotted to four unreplicated groups by dietary level of CP and housed in individual metabolism cages in a completely randomized design for 30 days, with a 20-day adaptation period and a 10-day collection period. Does in the control group were fed a conventional diet, and does in TMR10, TMR12 and TMR15 were fed a diet adjusted to about 10, 12 and 15% CP, respectively. Dry matter (DM) contents ranged from 89 to 91% across treatments. There were no differences in fiber contents among the three CP levels of TMR: ADF ranged from 18.57 to 19.85 and NDF from 53.41 to 54.80. Crude protein contents of the three TMR treatments were 10.61, 12.15 and 14.97%, respectively. However, non-fibrous carbohydrate (NFC) contents decreased with increasing CP levels in treatments. Meanwhile, intakes of DM, nutrients and digestible nutrients were significantly (p<0.05) higher in TMR15 and control than in TMR10 and TMR12. Moreover, DM intake per metabolic body weight and its ratio per body weight were significantly (p<0.05) higher for control and TMR15 than for the other treatments. DM digestibility was not significantly different among treatments. Ether extract digestibility of the TMR treatments was significantly (p<0.05) higher than that of control, with no significant difference among the TMR treatments. Nitrogen retention significantly (p<0.05) increased with increasing CP levels in TMR and was highest in TMR15.
Our results showed that the increasing CP levels in TMR increased DM intake and nitrogen retention and suggested that the optimal dietary CP levels under TMR feeding system in early pregnant Korean black goats could be estimated for at least 15%.

The Impact of Bladder Volume on Acute Urinary Toxicity during Radiation Therapy for Prostate Cancer (전립선암의 방사선치료시 방광 부피가 비뇨기계 부작용에 미치는 영향)

  • Lee, Ji-Hae; Suh, Hyun-Suk; Lee, Kyung-Ja; Lee, Re-Na; Kim, Myung-Soo
    • Radiation Oncology Journal / v.26 no.4 / pp.237-246 / 2008
  • Purpose: Three-dimensional conformal radiation therapy (3DCRT) and intensity-modulated radiation therapy (IMRT) were found to reduce the incidence of acute and late rectal toxicity compared with conventional radiation therapy (RT), although acute and late urinary toxicities were not reduced significantly. Acute urinary toxicity, even at a low-grade, not only has an impact on a patient's quality of life, but also can be used as a predictor for chronic urinary toxicity. With bladder filling, part of the bladder moves away from the radiation field, resulting in a small irradiated bladder volume; hence, urinary toxicity can be decreased. The purpose of this study is to evaluate the impact of bladder volume on acute urinary toxicity during RT in patients with prostate cancer. Materials and Methods: Forty two patients diagnosed with prostate cancer were treated by 3DCRT and of these, 21 patients made up a control group treated without any instruction to control the bladder volume. The remaining 21 patients in the experimental group were treated with a full bladder after drinking 450 mL of water an hour before treatment. We measured the bladder volume by CT and ultrasound at simulation to validate the accuracy of ultrasound. During the treatment period, we measured bladder volume weekly by ultrasound, for the experimental group, to evaluate the variation of the bladder volume. Results: A significant correlation between the bladder volume measured by CT and ultrasound was observed. The bladder volume in the experimental group varied with each patient despite drinking the same amount of water. Although weekly variations of the bladder volume were very high, larger initial CT volumes were associated with larger mean weekly bladder volumes. The mean bladder volume was $299{\pm}155\;mL$ in the experimental group, as opposed to $187{\pm}155\;mL$ in the control group. 
Patients in experimental group experienced less acute urinary toxicities than in control group, but the difference was not statistically significant. A trend of reduced toxicity was observed with the increase of CT bladder volume. In patients with bladder volumes greater than 150 mL at simulation, toxicity rates of all grades were significantly lower than in patients with bladder volume less than 150 mL. Also, patients with a mean bladder volume larger than 100 mL during treatment showed a slightly reduced Grade 1 urinary toxicity rate compared to patients with a mean bladder volume smaller than 100 mL. Conclusion: Despite the large variability in bladder volume during the treatment period, treating patients with a full bladder reduced acute urinary toxicities in patients with prostate cancer. We recommend that patients with prostate cancer undergo treatment with a full bladder.

A Basic Study on Spatial Recognition through Poet in Soswaewon Garden (시문을 통해 본 소쇄원의 공간인식에 관한 기초연구)

  • Lee, Won-Ho; Kim, Dong-Hyun
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.33 no.3 / pp.38-49 / 2015
  • This study examines spatial recognition of Soswaewon Garden through the poetry of garden visitors. It applies content analysis to the poetry and extracts word frequencies, organized by the relationships among the authors. The results are as follows. First, the authors who wrote Soswaewon Garden poetry were connected through companionship. In the period of Yang, San-Bo(梁山甫), poetry was written mainly by Song, Soon(宋純), Kim, Un-Geo(金彦据) and Kim, In-Hu(金麟厚). Kim, In-Hu in particular played an important role in Soswaewon Garden poetry: he wrote many poems and also remained friends with Yang, Ja-Jeong(梁子渟). In the period of Yang, Ja-Jung, the relationships of the previous generation were sustained. In addition, Ko, Gyeong-Myeong(高敬命), Kim, Seong-Won and Jeong, Chul(鄭澈) were more closely related than others because they were related by marriage. In the period of Yang, Jin-Tae(梁晋泰), he formed relationships with celebrities and took part in international activity. After Yang, Jin-Tae's period, Yang, Gyeong-Ji(梁敬之) and Yang, Chae-Ji(梁采之) sustained the relationships of the previous generation, and the people around them wrote poetry when banquets were held. Second, plants and ornaments were popular subjects of the poetry. Among plant elements, the bamboo grove and the pine tree appear with high frequency. The bamboo grove is typical of Soswaewon Garden and enclosed it. The pine tree was often a subject of poetry as a single tree. Meanwhile, among ornaments, the wall was used most frequently. Descendants wrote poems upon seeing it because Kim, In-Hu's poetry had been left. This phenomenon is related to the high frequency of expressions of respect for the ancient sages. In addition, the behavior of viewing the landscape appeared most often. Third, spatial recognition of Soswaewon Garden can be divided into landscape cognition, behavior cognition and emotional cognition. In terms of landscape cognition, early Soswaewon Garden was recognized as a pavilion.
The name 'Soswaewon Garden' has been used for the garden since Yang, Ja-Jung's period. That is to say, Soswaewon Garden expanded from a pavilion area surrounded by trees into a fully equipped garden area. Behavior cognition consisted of drinking and enjoying the landscape. In Yang, San-Bo's period, authors enjoyed drinking and viewing the landscape as well as walking, writing poetry, and viewing the moon. After Yang, San-Bo's period, however, behaviors other than drinking and enjoying the landscape appeared with low frequency. These results reflect a change from an internal place for blood relations into an external place for companionship. In Yang, San-Bo's period, emotional cognition consisted of sorrow and yearning at leaving Soswaewon Garden, along with an idle atmosphere. Pleasant emotion was sustained across all generations, and the emotion of respect for the ancient sages has appeared since Yang, Cheon-un.

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, increase of demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, development of IT and increased penetration rate of smart devices are producing a large amount of data. According to this phenomenon, data analysis technology is rapidly becoming popular. Also, attempts to acquire insights through data analysis have been continuously increasing. It means that the big data analysis will be more important in various industries for the foreseeable future. Big data analysis is generally performed by a small number of experts and delivered to each demander of analysis. However, increase of interest about big data analysis arouses activation of computer programming education and development of many programs for data analysis. Accordingly, the entry barriers of big data analysis are gradually lowering and data analysis technology being spread out. As the result, big data analysis is expected to be performed by demanders of analysis themselves. Along with this, interest about various unstructured data is continually increasing. Especially, a lot of attention is focused on using text data. Emergence of new platforms and techniques using the web bring about mass production of text data and active attempt to analyze text data. Furthermore, result of text analysis has been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Many text mining techniques are utilized in this field for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a lot of documents, identifies the documents that correspond to each issue and provides identified documents as a cluster. It is evaluated as a very useful technique in that reflect the semantic elements of the document. 
Traditional topic modeling is based on the distribution of key terms across the entire document. Thus, it is essential to analyze the entire document at once to identify topic of each document. This condition causes a long time in analysis process when topic modeling is applied to a lot of documents. In addition, it has a scalability problem that is an exponential increase in the processing time with the increase of analysis objects. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, divide and conquer approach can be applied to topic modeling. It means dividing a large number of documents into sub-units and deriving topics through repetition of topic modeling to each unit. This method can be used for topic modeling on a large number of documents with limited system resources, and can improve processing speed of topic modeling. It also can significantly reduce analysis time and cost through ability to analyze documents in each location or place without combining analysis object documents. However, despite many advantages, this method has two major problems. First, the relationship between local topics derived from each unit and global topics derived from entire document is unclear. It means that in each document, local topics can be identified, but global topics cannot be identified. Second, a method for measuring the accuracy of the proposed methodology should be established. That is to say, assuming that global topic is ideal answer, the difference in a local topic on a global topic needs to be measured. By those difficulties, the study in this method is not performed sufficiently, compare with other studies dealing with topic modeling. In this paper, we propose a topic modeling approach to solve the above two problems. 
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced entire document cluster (RGS, reduced global set) that consists of delegated documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We then verify the accuracy of the proposed methodology by checking whether documents are assigned to the same topic in the global-set and local-set results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. In addition, through an additional experiment, we confirm that the proposed methodology can provide results similar to topic modeling on the entire set. We also propose a reasonable method for comparing the results of the two methods.
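
The divide-and-conquer steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how documents are partitioned, how delegated documents are chosen, or how topics are compared, so the round-robin split, the centroid-based delegate selection, and the cosine-similarity topic mapping are all assumptions, and every function name is hypothetical. The topic modeling step itself is left out; topics are represented as plain term-weight dictionaries.

```python
from collections import Counter
from math import sqrt


def split_into_local_sets(docs, n_sets):
    """Partition the global set into n_sets local sets (round-robin; an assumption)."""
    return [docs[i::n_sets] for i in range(n_sets)]


def term_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dict-like)."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_delegates(local_set, k):
    """Pick the k documents closest to the local set's term centroid.

    Hypothetical selection rule; the abstract does not specify one.
    """
    centroid = Counter()
    for doc in local_set:
        centroid.update(term_vector(doc))
    ranked = sorted(
        local_set,
        key=lambda d: cosine(term_vector(d), centroid),
        reverse=True,
    )
    return ranked[:k]


def build_rgs(local_sets, k):
    """Build the reduced global set (RGS) from each local set's delegates."""
    rgs = []
    for local_set in local_sets:
        rgs.extend(pick_delegates(local_set, k))
    return rgs


def map_topics(local_topics, rgs_topics):
    """Map each local topic to its most similar RGS topic (cosine; an assumption)."""
    return {
        name: max(rgs_topics, key=lambda g: cosine(vec, rgs_topics[g]))
        for name, vec in local_topics.items()
    }
```

In use, a topic model would be run once on each local set and once on the RGS, and `map_topics` would then link each local topic to an RGS topic so that documents from different local sets can be compared under a shared set of topics.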

An Ontology Model for Public Service Export Platform (공공 서비스 수출 플랫폼을 위한 온톨로지 모형)

  • Lee, Gang-Won;Park, Sei-Kwon;Ryu, Seung-Wan;Shin, Dong-Cheon
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.149-161
    • /
    • 2014
  • The export of domestic public services to overseas markets faces many potential obstacles, stemming from differences in export procedures, target services, and socio-economic environments. To alleviate these problems, a business incubation platform operating as an open business ecosystem can be a powerful instrument to support the decisions taken by participants and stakeholders. In this paper, we propose an ontology model and its implementation processes for a business incubation platform with an open and pervasive architecture to support public service exports. For the conceptual model of the platform ontology, export case studies are used for requirements analysis. The conceptual model shows the basic structure, with vocabulary and its meaning, the relationships between ontologies, and key attributes. For the implementation and testing of the ontology model, the logical structure is edited using the Protégé editor. The core engine of the business incubation platform is the simulator module, where the various contexts of export businesses should be captured, defined, and shared with other modules through ontologies. It is well known that an ontology, in which concepts and their relationships are represented using a shared vocabulary, is an efficient and effective tool for organizing meta-information to develop structural frameworks in a particular domain. The proposed model consists of five ontologies derived from a requirements survey of major stakeholders and their operational scenarios: service, requirements, environment, enterprise, and country. The service ontology contains several components that can find and categorize public services through a case analysis of public service exports. The key attributes of the service ontology are grouped into categories including objective, requirements, activity, and service. 
The objective category, which has sub-attributes including operational body (organization) and user, acts as a reference to search and classify public services. The requirements category relates to the functional needs at a particular phase of system (service) design or operation. Sub-attributes of requirements are user, application, platform, architecture, and social overhead. The activity category represents business processes during the operation and maintenance phase. The activity category also has sub-attributes including facility, software, and project unit. The service category, with sub-attributes such as target, time, and place, acts as a reference to sort and classify the public services. The requirements ontology is derived from the basic and common components of public services and target countries. The key attributes of the requirements ontology are business, technology, and constraints. Business requirements represent the needs of processes and activities for public service export; technology represents the technological requirements for the operation of public services; and constraints represent the business law, regulations, or cultural characteristics of the target country. The environment ontology is derived from case studies of target countries for public service operation. Key attributes of the environment ontology are user, requirements, and activity. A user includes stakeholders in public services, from citizens to operators and managers; the requirements attribute represents the managerial and physical needs during operation; the activity attribute represents business processes in detail. The enterprise ontology is introduced from a previous study, and its attributes are activity, organization, strategy, marketing, and time. 
The country ontology is derived from the demographic and geopolitical analysis of the target country, and its key attributes are economy, social infrastructure, law, regulation, customs, population, location, and development strategies. The priority list of target services for a given country and/or the priority list of target countries for a given public service are generated by a matching algorithm. These lists are used as input seeds to simulate the consortium partners and the government's policies and programs. In the simulation, the environmental differences between Korea and the target country can be customized through a gap analysis and work-flow optimization process. When the process gap between Korea and the target country is too large for a single corporation to cover, a consortium is considered as an alternative, and various alternatives are derived from the capability index of enterprises. For financial packages, a mix of various foreign aid funds can be simulated during this stage. It is expected that the proposed ontology model and the business incubation platform can be used by various participants in the public service export market. It could be especially beneficial to small and medium businesses that have relatively few resources and little experience with public service export. We also expect that the open and pervasive service architecture in a digital business ecosystem will help stakeholders find new opportunities through information sharing and collaboration on business processes.