Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.89-105 / 2014
  • Since the emergence of the Internet, social media built on highly interactive Web 2.0 applications has provided a very user-friendly means for consumers and companies to communicate with each other. Users routinely publish content involving their opinions and interests in social media such as blogs, forums, chat rooms, and discussion boards, and the content is released in real time on the Internet. For that reason, many researchers and marketers regard social media content as a source of information for business analytics to develop business insights, and many studies have reported results on mining business intelligence from social media content. In particular, opinion mining and sentiment analysis, as techniques to extract, classify, understand, and assess the opinions implicit in text content, are frequently applied to social media content analysis because they emphasize determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques, and tools have been presented by these researchers. However, we have found some weaknesses in their methods, which are often technically complicated and not sufficiently user-friendly for supporting business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to conducting opinion mining with visual deliverables. First, we described the entire cycle of practical opinion mining using social media content, from the initial data gathering stage to the final presentation session. Our proposed approach to opinion mining consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts have to choose the target social media. Each target medium requires a different way for analysts to gain access, such as open APIs, search tools, DB2DB interfaces, and content purchasing. The second phase is pre-processing to generate useful materials for meaningful analysis. 
If we do not remove garbage data, results of social media analysis will not provide meaningful and useful business insights. To clean social media data, natural language processing techniques should be applied. The next step is the opinion mining phase where the cleansed social media content set is to be analyzed. The qualified data set includes not only user-generated contents but also content identification information such as creation date, author name, user id, content id, hit counts, review or reply, favorite, etc. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually related to market trends analysis, while sentiment analysis is utilized to conduct reputation analysis. There are also various applications, such as stock prediction, product recommendation, sales forecasting, and so on. The last phase is visualization and presentation of analysis results. The major focus and purpose of this phase are to explain results of analysis and help users to comprehend its meaning. Therefore, to the extent possible, deliverables from this phase should be made simple, clear and easy to understand, rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company. We targeted the leading company, NS Food, with 66.5% of market share; the firm has kept No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of contents including blogs, forum contents and news articles. After collecting social media content data, we generated instant noodle business specific language resources for data manipulation and analysis using natural language processing. In addition, we tried to classify contents in more detail categories such as marketing features, environment, reputation, etc. 
In this phase, we used freeware programs such as the tm, KoNLP, ggplot2, and plyr packages of the R project. As a result, we presented several useful visualization outputs, such as domain-specific lexicons, volume and sentiment graphs, topic word clouds, heat maps, and valence tree maps, to provide vivid, full-color examples built with open library packages of the R project. With a swift glance, business actors can quickly detect which areas are weak, strong, positive, negative, quiet, or loud. A heat map can explain the movement of sentiment or volume in a category-by-time matrix, in which color density indicates the value for each time period. The valence tree map, one of the most comprehensive and holistic visualization models, should be very helpful for analysts and decision makers to quickly understand the "big picture" business situation with a hierarchical structure, since a tree map can present buzz volume and sentiment for a certain period in a single visualized result. This case study offers real-world business insights from market sensing, which demonstrate to practical-minded business users how they can use these types of results for timely decision making in response to ongoing changes in the market. We believe our approach can provide a practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in the food industry but in other industries as well.
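
The analyzing phase described in this abstract can be illustrated with a minimal sketch. It assumes a simple lexicon-based scorer and whitespace tokenization; the abstract itself reports using R packages, so this Python version, the toy `LEXICON`, and the `category`/`period` fields are all hypothetical stand-ins rather than the authors' implementation. The per-cell volume and net sentiment it produces are the kind of values a heat map or tree map would display.

```python
from collections import defaultdict

# Hypothetical toy lexicon; the study built a domain-specific one for instant noodles.
LEXICON = {"delicious": 1, "good": 1, "love": 1, "salty": -1, "bad": -1, "expensive": -1}

def sentiment_score(text):
    """Sum the lexicon polarities of the whitespace-separated tokens in a text."""
    tokens = text.lower().split()
    return sum(LEXICON.get(tok, 0) for tok in tokens)

def summarize(posts):
    """Aggregate buzz volume and net sentiment per (category, period) cell."""
    table = defaultdict(lambda: {"volume": 0, "sentiment": 0})
    for post in posts:
        cell = table[(post["category"], post["period"])]
        cell["volume"] += 1
        cell["sentiment"] += sentiment_score(post["text"])
    return dict(table)

# Hypothetical cleaned posts (the output of the qualifying phase).
posts = [
    {"category": "taste", "period": "2013-01", "text": "The soup is delicious and good"},
    {"category": "taste", "period": "2013-01", "text": "Too salty"},
    {"category": "price", "period": "2013-02", "text": "Getting expensive"},
]
summary = summarize(posts)
for key, cell in summary.items():
    print(key, cell)
```

Keeping volume and sentiment in the same cell makes it straightforward to map one to tile size and the other to color when the table is later visualized.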

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, it has been employed to discover new market and/or technology opportunities and support rational decision making of business participants. The market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been a continuous demand in various fields for specific product level-market information. However, the information has been generally provided at industry level or broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than that of previously offered. We applied Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into suitable form for applying Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec and then the product groups are derived by extracting similar products names based on cosine similarity calculation. Finally, the sales data on the extracted products is summated to estimate the market size of the product groups. As an experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped in multidimensional vector space by Word2Vec training. 
We performed parameters optimization for training and then applied vector dimension of 300 and window size of 15 as optimized parameters for further experiments. We employed index words of Korean Standard Industry Classification (KSIC) as a product name dataset to more efficiently cluster product groups. The product names which are similar to KSIC indexes were extracted based on cosine similarity. The market size of extracted products as one product category was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For the performance verification, the results were compared with actual market size of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages differing from the previous studies. First, text mining and machine learning techniques were applied for the first time on market size estimation, overcoming the limitations of traditional sampling based- or multiple assumption required-methods. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing cosine similarity threshold. Furthermore, it has a high potential of practical applications since it can resolve unmet needs for detailed market size information in public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support program conducted by governmental institutions, as well as business strategies consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order in the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. 
Also, the methods of product group clustering can be changed to other types of unsupervised machine learning algorithm. Our group is currently working on subsequent studies and we expect that it can further improve the performance of the conceptually proposed basic model in this study.
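
The bottom-up estimation step in this abstract (extract product names whose vectors are similar to an index word, then sum their sales) can be sketched as follows. This is only an illustration under stated assumptions: the three-dimensional vectors, product names, sales figures, and the function `estimate_market_size` are hypothetical, and real embeddings would come from a trained Word2Vec model (the abstract reports 300 dimensions and a window size of 15).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def estimate_market_size(index_vec, product_vecs, sales, threshold):
    """Group products whose similarity to the index word meets the threshold
    and sum their sales to estimate the size of that product group."""
    members = [name for name, vec in product_vecs.items()
               if cosine_similarity(index_vec, vec) >= threshold]
    return members, sum(sales[name] for name in members)

# Hypothetical toy embeddings and sales figures.
product_vecs = {
    "instant noodle": [1.0, 0.0, 0.0],
    "cup noodle": [0.9, 0.1, 0.0],
    "steel pipe": [0.0, 1.0, 0.0],
}
sales = {"instant noodle": 100.0, "cup noodle": 40.0, "steel pipe": 500.0}
index_vec = [1.0, 0.0, 0.0]  # stands in for the vector of a KSIC index word

members, size = estimate_market_size(index_vec, product_vecs, sales, threshold=0.8)
print(members, size)
```

Raising or lowering `threshold` narrows or widens the product group, which corresponds to the adjustable market category level mentioned in the abstract.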

Mid-Term Results of 292 cases of Coronary Artery Bypass Grafting (관상동맥 우회술 292례의 중기 성적)

  • 김태윤;김응중;이원용;지현근;신윤철;김건일
    • Journal of Chest Surgery / v.35 no.9 / pp.643-652 / 2002
  • As the prevalence of coronary artery disease increases, surgical treatment has become widespread and operative outcomes have improved. We analyzed the short- and mid-term results of 292 CABGs performed in Kangdong Sacred Heart Hospital. Material and Method: From June 1994 to December 2001, 292 patients underwent coronary artery bypass grafting. There were 173 men and 119 women, and their ages ranged from 39 to 84 years with a mean of $61.8{\pm}9.1$ years. We analyzed the preoperative risk factors, operative procedures, and operative outcome. In addition, we analyzed the recurrence of symptoms, long-term mortality, and complications via outpatient follow-up of discharged patients. Result: Preoperative clinical diagnoses were unstable angina in 137 (46.9%), stable angina in 34 (11.6%), acute myocardial infarction in 40 (13.7%), non-Q myocardial infarction in 25 (8.6%), postinfarction angina in 22 (7.5%), cardiogenic shock in 30 (10.3%), and PTCA failure in 4 (1.4%) patients. Preoperative angiographic diagnoses were three-vessel disease in 157 (53.8%), two-vessel disease in 35 (12.0%), one-vessel disease in 11 (3.8%), and left main disease in 89 (30.5%) patients. We used saphenous veins in 630, internal thoracic arteries in 257, radial arteries in 50, and right gastroepiploic arteries in 2 distal anastomoses. The mean number of distal anastomoses per patient was $3.2{\pm}1.0$. There were 18 concomitant procedures: valve replacement in 8 (2.7%), left main coronary artery angioplasty in 6 (2.1%), patch closure of postinfarction ventricular septal defect (PMI-VSD) in 2 (0.7%), replacement of the ascending aorta in 1 (0.3%), and coronary endarterectomy in 1 (0.3%) patient. The mean ACC time was $96.6{\pm}35.3$ minutes and the mean CPB time was $179.2{\pm}94.6$ minutes. Total early mortality was 8.6%, but it was 3.1% in elective operations. The most common cause of early mortality was low cardiac output syndrome in 6 (2.1%) patients. 
The statistically significant risk factors for early mortality were hypertension, old age ($\geq$ 70 years), poor LV function (EF<40%), congestive heart failure, preoperative intraaortic balloon pump, emergency operation, and chronic renal failure. The most common complication was arrhythmia in 52 (17.8%) patients. The mean follow-up period was $39.0{\pm}27.0$ months. Most patients were free of symptoms during follow-up. Fourteen patients (5.8%) had recurrent symptoms and 7 patients (2.9%) died during the follow-up period. Follow-up coronary angiography was performed in 13 patients with recurrent symptoms, and they were managed by surgical and medical treatment according to the coronary angiographic result. Conclusion: The operative and late results of CABG in our hospital were acceptable. However, operative technique and postoperative management should be further refined to improve the results.

CAS 500-1/2 Image Utilization Technology and System Development: Achievement and Contribution (국토위성정보 활용기술 및 운영시스템 개발: 성과 및 의의)

  • Yoon, Sung-Joo;Son, Jonghwan;Park, Hyeongjun;Seo, Junghoon;Lee, Yoojin;Ban, Seunghwan;Choi, Jae-Seung;Kim, Byung-Guk;Lee, Hyun jik;Lee, Kyu-sung;Kweon, Ki-Eok;Lee, Kye-Dong;Jung, Hyung-sup;Choung, Yun-Jae;Choi, Hyun;Koo, Daesung;Choi, Myungjin;Shin, Yunsoo;Choi, Jaewan;Eo, Yang-Dam;Jeong, Jong-chul;Han, Youkyung;Oh, Jaehong;Rhee, Sooahm;Chang, Eunmi;Kim, Taejung
    • Korean Journal of Remote Sensing / v.36 no.5_2 / pp.867-879 / 2020
  • As the era of space technology utilization is approaching, the launch of CAS (Compact Advanced Satellite) 500-1/2 satellites is scheduled during 2021 for acquisition of high-resolution images. Accordingly, the increase of image usability and processing efficiency has been emphasized as key design concepts of the CAS 500-1/2 ground station. In this regard, "CAS 500-1/2 Image Acquisition and Utilization Technology Development" project has been carried out to develop core technologies and processing systems for CAS 500-1/2 data collecting, processing, managing and distributing. In this paper, we introduce the results of the above project. We developed an operation system to generate precision images automatically with GCP (Ground Control Point) chip DB (Database) and DEM (Digital Elevation Model) DB over the entire Korean peninsula. We also developed the system to produce ortho-rectified images indexed to 1:5,000 map grids, and hence set a foundation for ARD (Analysis Ready Data)system. In addition, we linked various application software to the operation system and systematically produce mosaic images, DSM (Digital Surface Model)/DTM (Digital Terrain Model), spatial feature thematic map, and change detection thematic map. The major contribution of the developed system and technologies includes that precision images are to be automatically generated using GCP chip DB for the first time in Korea and the various utilization product technologies incorporated into the operation system of a satellite ground station. The developed operation system has been installed on Korea Land Observation Satellite Information Center of the NGII (National Geographic Information Institute). We expect the system to contribute greatly to the center's work and provide a standard for future ground station systems of earth observation satellites.

Granulocytic Sarcoma(Chloroma) in Leukemic Patients (백혈병 환자의 과립구 육종(녹색종양))

  • Rhee, Seung-Koo;Kang, Yong-Ku;Bahk, Won-Jong;Jung, Yang-Kuk;Lee, Sang-Wook;Jeong, Ji-Ho
    • The Journal of the Korean bone and joint tumor society / v.11 no.1 / pp.54-61 / 2005
  • Purpose: Granulocytic sarcoma developing in leukemic patients is quite rare and carries a poor prognosis, but its pathogenesis and treatment are not yet established. In this study, we sought to determine its clinical course, prognosis, and the outcome of recent treatment. Materials and Methods: Twenty patients with granulocytic sarcoma, identified among 2,197 leukemic patients from April 1998 to September 2004, were treated at the leukemic center and the orthopaedic department of St. Mary's Hospital, Catholic University of Korea, and followed for 1~78 months (average 18 months). Results: The 20 cases of granulocytic sarcoma included 14 cases among 1,331 patients with acute myelocytic leukemia (AML), 4 cases among 744 patients with chronic myelocytic leukemia (CML), and only one case among 122 patients with acute biphenotypic leukemia. The occurrence rate in leukemic patients was therefore 0.91% (20 cases of granulocytic sarcoma among 2,197 leukemic patients in the same period). The average age was 28.3 years (range 4~52 years), and males (13 cases) predominated over females (7 cases). A single lesion was found in 11 cases and multiple lesions in 9 cases; the spine, brain, extremities, chest, and pelvic bone were involved, in order of frequency. Granulocytic sarcoma developed at various stages of leukemia, i.e., 8 cases in complete remission of leukemia and 12 cases during the treatment of AML. Histopathologic evaluation was performed in the 6 cases that developed in the extremities and confirmed numerous immature myeloblasts mixed with lymphocytes. 
Treatment of the granulocytic sarcomas was mainly limited to treatment of the underlying leukemia with Glivac and massive steroid therapy (19 cases), combined with bone marrow transplantation (13 cases); radiation therapy averaging 3,500 rads was also given in 15 of the 20 sarcomas. The patients were followed for an average of 17.5 months after the development of granulocytic sarcoma. The prognosis was so poor that 12 of the 20 patients (60%) had died by 6.5 months after the sarcoma developed, and granulocytic sarcoma was more fatal when it developed during the course of CML (mortality: 100%, 4/4 cases). Conclusion: The prognosis of granulocytic sarcoma in leukemic patients is very poor, and further studies of its pathogenesis and treatment are needed.

Cellular activities of osteoblast-like cells on alkali-treated titanium surface (알칼리 처리된 타이타늄 표면에 대한 골아 유사세포의 세포 활성도)

  • Park, Jin-Woo;Lee, Deog-Hye;Yeo, Shin-Il;Park, Kwang-Bum;Choi, Seok-Kyu;Suh, Jo-Young
    • Journal of Periodontal and Implant Science / v.37 no.sup2 / pp.427-445 / 2007
  • To improve osseointegration at the bone-to-implant interface, several studies have been carried out to modify the titanium surface. Variations in surface texture or microtopography may affect the cellular response to an implant. Osteoblast-like cells attach more readily to a rougher titanium surface, and synthesis of extracellular matrix and subsequent mineralization were found to be enhanced on rough or porous coated titanium. However, most studies of surfaces roughened by physical and mechanical methods have addressed the reactions of cells to micrometric topography, and little work has been performed on the reaction of cells to nanotopography. The purpose of this study was to examine the response of osteoblast-like cells cultured on blasted surfaces and alkali-treated surfaces, and to evaluate the influence of surface texture or submicro-scaled surface topography on cell attachment, cell proliferation, and the gene expression of the osteoblastic phenotype using ROS 17/2.8 cell lines. In scanning electron micrographs, the blasted, alkali-treated, and machined surfaces demonstrated microscopic differences in surface topography. The alkali-treated specimens had a submicro-scaled porous surface with a pore size of about 200 nm. The blasted surfaces showed irregularities in morphology, with small (<10 ${\mu}m$) depressions and indentations among flatter-appearing areas of various sizes. Based on profilometry, the blasted surfaces were significantly rougher than the machined and the alkali-treated surfaces (p$TiO_2$) were observed on alkali-treated surfaces, whereas not observed on machined and blasted surfaces. The attachment morphology of cells over time was observed by scanning electron microscopy. After 1 hour of incubation, the cells were in the process of adhesion and spreading on the prepared surfaces. 
After 3 hours, the cells on all prepared surfaces had spread and flattened further; however, on the blasted and alkali-treated surfaces, the cells exhibited slightly irregular shapes and some gaps or spaces were seen. After 24 hours of incubation, most cells in all groups had a flattened and polygonal shape, but the cells were more spread on the machined surfaces than on the blasted and alkali-treated surfaces. The MTT assay indicated an increase over time on machined, alkali-treated, and blasted surfaces, and the alkali-treated and blasted surfaces showed significantly higher optical density than the machined surfaces at 1 day (p<0.01). The gene expression study showed that the mRNA expression levels of ${\alpha}\;1(I)$ collagen, alkaline phosphatase, and osteopontin of the osteoblast-like cells tended to be higher on blasted and alkali-treated surfaces than on the machined surfaces, although no significant difference in these mRNA expression levels was observed among the groups. In conclusion, we suggest that the effect of submicro-scaled surfaces on the osteoblast-like cell response does not override that of surfaces with micro-scaled topography produced by blasting, although both micro-scaled and submicro-scaled surfaces can accelerate osteogenic cell attachment and function compared with machined surfaces.

Changes of Quality Characteristics of Manufactured Press Ham using Conjugated Linoleic Acid(CLA) Accumulated Pork during Storage Periods (CLA가 축적된 돈육으로 제조된 Press Ham의 저장기간중 품질변화)

  • Lee, J.I.;Ha, Y.J.;Jung, J.D.;Kang, K.H.;Hur, S.J.;Park, G.B.;Lee, J.D.;Do, C.H.
    • Journal of Animal Science and Technology / v.46 no.4 / pp.645-658 / 2004
  • This study investigated the effects of feeding a diet supplemented with conjugated linoleic acid (CLA) on CLA accumulation and on the quality characteristics of press ham manufactured from CLA-accumulated pork loin. The CLA added to the diet was chemically synthesized from corn oil by the alkaline isomerization method. Pigs were divided into 5 treatment groups (4 pigs/group) and fed one of five treatment diets (0, 1.25% CLA for 2 weeks, 2.5% CLA for 2 weeks, 1.25% CLA for 4 weeks and 2.5% CLA for 4 weeks, CLA diets; total fed diets) before slaughter. Pork loins were collected from the animals (110 kg body weight) slaughtered at a commercial slaughterhouse. Press hams manufactured from the CLA-accumulated pork loin were vacuum packaged and then stored for 1, 7, 14, 21, and 28 days at 4$^{\circ}C$. Samples were analyzed for general composition, physico-chemical properties (pH, color, shear force value), and TBARS. The pH of the CLA treatment (T4) was significantly higher than that of the control (P<0.05). The pH of the control and CLA treatments increased significantly as the storage period passed (P<0.05). The crude fat content of the CLA treatment groups was significantly higher than that of the control pork (P<0.05). Meat color(CIE $L^*$, $a^*$$b^*$

Clinical Assessment and Cephalometric Characteristics in Patients with Condylar Resorption (하악과두흡수 환자의 임상적 평가 및 악안면 골격형태에 대한 연구)

  • Koo, Seon-Ju;Kim, Kyun-Yo;Hur, Yun-Kyung;Chae, Jong-Moon;Choi, Jae-Kap
    • Journal of Oral Medicine and Pain / v.34 no.1 / pp.91-102 / 2009
  • Condylar resorption, or condylysis, can be defined as progressive alteration of condylar shape and decrease in mass. Condylar resorption is a poorly understood progressive disease that affects the TMJ and that can result in malocclusion, facial disfigurement, TMJ dysfunction, and pain. The aim of this study was to investigate the clinical assessment and cephalometric characteristics of 224 patients with condylar resorption who visited the Department of Oral Medicine, Kyungpook National University Hospital, in 2006, using panoramic, transcranial, and lateral cephalometric radiographs. The results were as follows. 1. Clinical assessment 1) A total of 2419 patients visited with chief complaints of TMD, and 224 (9.3%) of them showed condylar resorption. Among the patients with condylar resorption, 183 were female and 41 were male, so females were predominant. 2) Patient age ranged from 12 to 70 years, with a mean age of 30.6 years and a strong predominance of patients in their 10s and 20s. The age distribution was as follows: 10s, 26.3%; 20s, 34.8%; 30s, 13.8%; 40s, 11.2%; 50s, 7.1%; 60s, 6.3%; and 70s, 0.4%. 3) Most of the patients had a parafunctional habit. 4) Pain was present in 145 cases of condylar resorption and absent in 79 cases. 5) The treatment duration of the patients was relatively short. 2. Cephalometric characteristics 1) ANB, which indicates retrusion of the mandible, was significantly larger than in the normal group. The ANB of the female group was larger than that of the male group, with means of 5.05 in females and 3.57 in males. 2) SN-GoMe and FMA were increased in resorption patients, but FH-PP did not show any significant difference. The FMA of the female group was larger than that of the male group, with means of 31.69 in females and 30.44 in males. 3) Total posterior facial height was significantly smaller and total anterior facial height showed no significant increase as compared with those of the normal group. 
Condylar resorption was predominant in young females, which was attributed to a more vertical facial pattern in females than in males and to an increase in parafunctional habits at a young age. Patients with risk factors that increase compressive stress on the condyle, caused by obliquely inclined masseter and medial pterygoid muscles, were thought to show a high prevalence of condylar resorption.

The Etiologies and Initial Antimicrobial Therapy Outcomes in One Tertiary Hospital ICU-admitted Patient with Severe Community-acquired Pneumonia (국내 한 3차 병원 중환자실에 입원한 중증지역획득폐렴 환자의 원인 미생물과 경험적 항균제 치료 성적의 고찰)

  • Lee, Jae Seung;Chung, Joo Won;Koh, Yunsuck;Lim, Chae-Man;Jung, Young Joo;Oh, Youn Mok;Shim, Tae Sun;Lee, Sang Do;Kim, Woo Sung;Kim, Dong-Soon;Kim, Won Dong;Hong, Sang-Bum
    • Tuberculosis and Respiratory Diseases / v.59 no.5 / pp.522-529 / 2005
  • Background: Several national societies have published guidelines for empirical antimicrobial therapy in patients with severe community-acquired pneumonia (SCAP). This study investigated the etiologies of SCAP in the Asan Medical Center and assessed the relationship between the initial empirical antimicrobial regimen and the 30-day mortality rate. Method: A retrospective analysis was performed on patients with SCAP admitted to the ICU between March 2002 and February 2004 in the Asan Medical Center. The basic demographic data, bacteriologic study results, and initial antimicrobial regimen were examined for all patients. The clinical outcomes, including the ICU length of stay, the ICU mortality rate, and the 30-day mortality rate, were assessed by the initial antimicrobial regimen. Results: One hundred sixteen consecutive patients were admitted to the ICU (mean age 66.5 years, 81.9% male, 30-day mortality 28.4%). The microbiologic diagnosis was established in 58 patients (50%). The most common pathogens were S. pneumoniae (n=12), P. aeruginosa (n=9), K. pneumoniae (n=9), and S. aureus (n=8). The initial empirical antimicrobial regimens were classified as: ${\beta}$-lactam plus macrolide; ${\beta}$-lactam plus fluoroquinolone; anti-pseudomonal ${\beta}$-lactam plus fluoroquinolone; aminoglycoside combination regimen; ${\beta}$-lactam plus clindamycin; and ${\beta}$-lactam alone. There were no statistically significant differences in the 30-day mortality rate according to the initial antimicrobial regimen (p = 0.682). Multivariate analysis revealed that acute renal failure, acute respiratory distress syndrome, and K. pneumoniae were independent risk factors related to the 30-day mortality rate. Conclusion: S. pneumoniae, P. aeruginosa, K. pneumoniae, and S. aureus were the most common causative pathogens in patients with SCAP, and K. pneumoniae was an independent risk factor for 30-day mortality. The initial antimicrobial regimen was not associated with the 30-day mortality.

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, increasing demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, the development of IT and the increased penetration rate of smart devices are producing a large amount of data. As a result, data analysis technology is rapidly becoming popular, and attempts to acquire insights through data analysis have been continuously increasing. This means that big data analysis will become more important in various industries for the foreseeable future. Big data analysis is generally performed by a small number of experts and delivered to each demander of analysis. However, growing interest in big data analysis has stimulated computer programming education and the development of many programs for data analysis. Accordingly, the entry barriers to big data analysis are gradually lowering and data analysis technology is spreading. As a result, big data analysis is expected to be performed by the demanders of analysis themselves. Along with this, interest in various kinds of unstructured data is continually increasing, and a lot of attention is focused on text data in particular. The emergence of new platforms and techniques using the web has brought about the mass production of text data and active attempts to analyze it. Furthermore, the results of text analysis have been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Among the many text mining techniques utilized for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is regarded as a very useful technique in that it reflects the semantic elements of the documents. 
Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, it is essential to analyze the entire document set at once to identify the topic of each document. This condition makes the analysis process slow when topic modeling is applied to a large number of documents. In addition, it causes a scalability problem: the processing time increases exponentially with the number of analysis objects. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling. This means dividing a large number of documents into sub-units and deriving topics by repeating topic modeling on each unit. This method makes topic modeling on a large number of documents possible with limited system resources and can improve the processing speed of topic modeling. It can also significantly reduce analysis time and cost, because documents can be analyzed in each location without combining them. However, despite these advantages, the method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear. That is, local topics can be identified for each document, but global topics cannot. Second, a method for measuring the accuracy of such a methodology needs to be established. That is to say, taking the global topics as the ideal answer, the deviation of a local topic from a global topic needs to be measured. Because of these difficulties, this approach has not been studied sufficiently compared with other studies dealing with topic modeling. In this paper, we propose a topic modeling approach that solves the above two problems. 
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced global set (RGS) consisting of representative documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We then verify the accuracy of the proposed methodology by checking whether each document is assigned to the same topic in the global and local results. Using 24,000 news articles, we conducted experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirmed that the proposed methodology can provide results similar to topic modeling on the entire set. We also proposed a reasonable method for comparing the results of the two methods.
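
The mapping and verification steps in this abstract can be illustrated with a small sketch. The abstract does not state how RGS topics and local topics are matched, so the cosine similarity over topic-word weights used here is an assumption, as are the topic labels, word weights, and document assignments; the agreement score simply counts documents whose mapped local topic equals their global topic.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse word-weight dictionaries."""
    words = set(a) | set(b)
    dot = sum(a.get(w, 0.0) * b.get(w, 0.0) for w in words)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def map_local_to_rgs(local_topics, rgs_topics):
    """Map each local topic to its most similar RGS topic (assumed matching rule)."""
    return {
        local_id: max(rgs_topics, key=lambda gid: cosine(words, rgs_topics[gid]))
        for local_id, words in local_topics.items()
    }

def agreement(doc_local, doc_global, mapping):
    """Share of documents whose mapped local topic equals their global topic."""
    hits = sum(1 for doc, local_id in doc_local.items()
               if mapping[local_id] == doc_global[doc])
    return hits / len(doc_local)

# Hypothetical topic-word weights and per-document topic assignments.
rgs_topics = {"G1": {"election": 0.6, "vote": 0.4}, "G2": {"stock": 0.7, "market": 0.3}}
local_topics = {"L1": {"vote": 0.5, "election": 0.5}, "L2": {"market": 0.6, "stock": 0.4}}
doc_local = {"d1": "L1", "d2": "L2", "d3": "L2", "d4": "L1"}
doc_global = {"d1": "G1", "d2": "G2", "d3": "G1", "d4": "G1"}

mapping = map_local_to_rgs(local_topics, rgs_topics)
score = agreement(doc_local, doc_global, mapping)
print(mapping, score)
```

Because each local set is mapped independently, this step can run where the documents reside, which fits the divide-and-conquer motivation described above.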