• Title/Summary/Keyword: 비정형적 (unstructured)

Search Results: 1,213

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.19-41
    • /
    • 2019
  • According to the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of the analysis. Until recently, text mining research focused on applications in the second step, such as document classification, clustering, and topic modeling. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve the quality of analysis results by preserving the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be used directly with a variety of operations and traditional analysis techniques, unstructured text must first be structured into a form that the computer can understand before analysis. "Embedding" refers to mapping arbitrary objects into a space of a specific dimension while maintaining their algebraic properties, in order to structure text data. Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents in various ways. In particular, as the demand for document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into a single vector, is the most widely used. However, the traditional document embedding method represented by doc2Vec generates a vector for each document using all the words contained in the document.
As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single vector, so it is difficult to accurately represent a complex document with multiple subjects as a single vector using the traditional approach. In this paper, we propose a new multi-vector document embedding method to overcome these limitations of traditional document embedding methods. This study targets documents that explicitly separate body content and keywords. For a document without keywords, the method can be applied after extracting keywords through various analysis methods; however, since this is not the core subject of the proposed method, we present the process of applying the proposed method to documents whose keywords are predefined in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. First, all text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to overcome the limitation that the traditional document embedding method is affected by miscellaneous words as well as core words, the vectors corresponding to the keywords of each document are extracted and assembled into a keyword vector set for each document. Next, clustering is conducted on each document's keyword set to identify the multiple subjects included in the document. Finally, multiple vectors are generated from the keyword vectors constituting each cluster. Experiments on 3,147 academic papers revealed that the single-vector-based traditional approach cannot properly map complex documents because of interference among subjects within each vector.
With the proposed multi-vector based method, we ascertained that complex documents can be vectorized more accurately by eliminating the interference among subjects.
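The keyword-driven steps (3)-(5) of the pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the toy 2-D "embeddings" and the keyword list are hypothetical stand-ins for trained word2Vec vectors, and scikit-learn's KMeans stands in for the paper's keyword clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical pre-trained word vectors (in practice these would come from
# word2Vec training on a large corpus); toy 2-D vectors for illustration.
embeddings = {
    "neural":  np.array([0.9, 0.1]),
    "deep":    np.array([0.8, 0.2]),
    "finance": np.array([0.1, 0.9]),
    "stock":   np.array([0.15, 0.85]),
    "market":  np.array([0.2, 0.8]),
}

def multi_vector_embedding(keywords, n_subjects=2):
    """Steps (3)-(5): extract keyword vectors, cluster them to identify
    subjects, and return one vector (cluster centroid) per subject."""
    vecs = np.array([embeddings[k] for k in keywords if k in embeddings])
    km = KMeans(n_clusters=n_subjects, n_init=10, random_state=0).fit(vecs)
    return [vecs[km.labels_ == c].mean(axis=0) for c in range(n_subjects)]

# A "complex document" whose keywords span two subjects
doc_keywords = ["neural", "deep", "finance", "stock", "market"]
subject_vectors = multi_vector_embedding(doc_keywords)
```

Each returned centroid represents one subject of the document, which is exactly what lets a two-subject document avoid the single-vector interference described above.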

Analyzing Self-Introduction Letter of Freshmen at Korea National College of Agricultural and Fisheries by Using Semantic Network Analysis : Based on TF-IDF Analysis (언어네트워크분석을 활용한 한국농수산대학 신입생 자기소개서 분석 - TF-IDF 분석을 기초로 -)

  • Joo, J.S.;Lee, S.Y.;Kim, J.S.;Kim, S.H.;Park, N.B.
    • Journal of Practical Agriculture & Fisheries Research
    • /
    • v.23 no.1
    • /
    • pp.89-104
    • /
    • 2021
  • Based on the TF-IDF weights that evaluate the importance of key words, a semantic network analysis (SNA) was conducted on the self-introduction letters of freshmen at Korea National College of Agriculture and Fisheries (KNCAF) in 2020. The top three words by TF-IDF weight were agriculture, mathematics, and study (Q. 1); clubs, plants, and friends (Q. 2); friends, clubs, and opinions (Q. 3); and mushrooms, insects, and fathers (Q. 4). Among the relationships between words, the words with high betweenness centrality were reason, high school, and attending (Q. 1); garbage, high school, and school (Q. 2); importance, misunderstanding, and completion (Q. 3); and processing, feed, and farmhouse (Q. 4). The words with high degree centrality were high school, inquiry, and grades (Q. 1); garbage, cleanup, and class time (Q. 2); opinion, meetings, and volunteer activities (Q. 3); and processing, space, and practice (Q. 4). Word pairs with a high frequency of simultaneous appearance, that is, high correlation, were 'certification - acquisition', 'problem - solution', 'science - life', and 'misunderstanding - concession'. In the cluster analysis, the number of clusters obtained from the height of the cluster dendrogram was 2 (Q. 1), 4 (Q. 2, 4), and 5 (Q. 3). The cohesion within clusters was high, and the heterogeneity between clusters was clear.

Association Between Psychiatric Medications and Urinary Incontinence (정신과 약물과 요실금의 연관성)

  • Jaejong Lee;SeungYun Lee;Hyeran Ko;Su Im Jin;Young Kyung Moon;Kayoung Song
    • Korean Journal of Psychosomatic Medicine
    • /
    • v.31 no.2
    • /
    • pp.63-71
    • /
    • 2023
  • Urinary incontinence (UI), affecting 3%-11% of males and 25%-45% of females globally, is expected to rise with an aging population. It significantly impacts mental health, causing depression, stress, and reduced quality of life. UI can exacerbate psychiatric conditions, affecting treatment compliance and effectiveness. It is categorized into transient and chronic types. Transient UI, often reversible, is caused by factors summarized in the acronym DIAPPERS: Delirium, Infection, Atrophic urethritis/vaginitis, Psychological disorders, Pharmaceuticals, Excess urine output, Restricted mobility, Stool impaction. Chronic UI includes stress, urge, mixed, overflow, functional, and persistent incontinence. Drug-induced UI, a transient form, is frequently seen in psychiatric treatment. Antipsychotics, antidepressants, and other psychiatric medications can cause UI through various mechanisms like affecting bladder muscle tone, altering nerve reflexes, and inducing other conditions like diabetes or epilepsy. Specific drugs like lithium and valproic acid have also been linked to UI, though mechanisms are not always clear. Managing UI in psychiatric patients requires careful monitoring of urinary symptoms and judicious medication management. If a drug is identified as the cause, options include discontinuing, reducing, or adjusting the dosage. In cases where medication continuation is necessary, additional treatments like desmopressin, oxybutynin, trihexyphenidyl, or amitriptyline may be considered.

Lung Biopsy after Localization of Pulmonary Nodules with Hook Wire (Hook Wire를 이용한 폐결절의 위치선정 및 생검)

  • Kim, Jin-Sik;Hwang, Jae-Joon;Lee, Song-Am;Lee, Woo-Surng;Kim, Yo-Han;Kim, Jun-Seok;Chee, Hyun-Keun;Yi, Jeong-Geun
    • Journal of Chest Surgery
    • /
    • v.43 no.6
    • /
    • pp.681-686
    • /
    • 2010
  • Background: Chest computed tomography has become more prevalent, so small pulmonary nodules that could not be found on plain chest X-rays are detected more often. If a detected nodule is small or located deep in the pulmonary parenchyma, biopsy is difficult because the nodule is hard to locate either visually or by palpation. Thus, in our hospital, thoracoscopic pulmonary wedge resection was performed after the lesion was localized with a CT-guided hook wire. Material and Method: Thirty-one patients (17 male and 14 female) between December 2006 and June 2010 were enrolled; their 34 pulmonary nodules underwent thoracoscopic pulmonary wedge resection after CT-guided hook-wire localization. We also analyzed hook-wire dislocation, the frequency of conversion to open thoracotomy, the time from localization of the lesion to operation, operation time, postoperative complications, and the histological diagnosis of the lesions. Result: 12 of the 34 lesions were ground-glass lesions, whereas 22 were solid pulmonary lesions. The median lesion size was 8 mm (range, 3 to 23 mm), and the median depth was 12.5 mm (range, 1 to 34 mm). The median time from localization of the lesion to anesthetic induction was 86.5 minutes (range, 41 to 473 minutes), and the mean operation time was 103 minutes (range, 25 to 345 minutes). Intrathoracic wire dislocation was found in one case, but the target lesion was successfully excised. Open thoracotomy was performed in four cases because of pleural adhesion; however, there was no conversion to open thoracotomy due to failure to detect a target lesion.
In the histological diagnosis, metastatic cancer was found in 15 cases (the most common), primary lung cancer in 9, nonspecific inflammation in 3, tuberculous inflammation in 2, lymph nodes in 2, active tuberculosis in 1, atypical adenomatous hyperplasia in 1, and normal lung parenchyma in 1. Conclusion: In our hospital, in order to obtain a precise histological diagnosis of ground-glass lesions and pulmonary nodules in the lung parenchyma, pulmonary nodules were localized exactly with a hook wire under chest computed tomography, followed by lung biopsy. We conclude that this is an accurate, minimally invasive, and valuable method for minimizing complications and the increase in the cost of medical care.

Structural and functional characteristics of rock-boring clam Barnea manilensis (암석을 천공하는 돌맛조개(Barnea manilensis)의 구조 및 기능)

  • Ji Yeong Kim;Yun Jeon Ahn;Tae Jin Kim;Seung Min Won;Seung Won Lee;Jongwon Song;Jeongeun Bak
    • Korean Journal of Environmental Biology
    • /
    • v.40 no.4
    • /
    • pp.413-422
    • /
    • 2022
  • Barnea manilensis is a bivalve that bores into soft rocks, such as limestone or mudstone, in the low intertidal zone. It makes burrows with narrow entrances and wide interiors and lives in these burrows for its entire life. In this study, the morphology and microstructure of the valve of the rock-boring clam B. manilensis were observed using a stereoscopic microscope and FE-SEM, respectively. The chemical composition of specific parts of the valve was assessed by energy-dispersive X-ray spectroscopy (EDS). 3D modeling and structural dynamic analysis were used to simulate the boring behavior of B. manilensis. Microscopy showed that the valve was asymmetric, with plow-like spikes located on the anterior surface of the valve and distributed in a specific direction. The anterior parts of the valve were thicker than the posterior parts. EDS results indicated that the valve consisted mainly of calcium carbonate, while metal elements such as Al, Si, Mn, Fe, and Mg were detected on the outer surface of the anterior spikes. It is assumed that the metal elements increase the strength of the valve, helping B. manilensis bore into the substrate. The simulation showed that the spikes located on the anterior part of the valve received a load at all angles, suggesting that the anterior part of the shell bears the load while drilling rock. The boring mechanism of the asymmetric valve of B. manilensis is expected to serve as basic data for devising an efficient drilling mechanism.

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for market information at the specific-product level. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and appropriate information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and the product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data of the extracted products is summed to estimate the market size of the product groups. As experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training.
We performed parameter optimization for training and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed index words of the Korean Standard Industry Classification (KSIC) as a product-name dataset to cluster product groups more efficiently. The product names similar to the KSIC indexes were extracted based on cosine similarity, and the market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of selected items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of the market category can be easily and efficiently adjusted according to the purpose of information use by changing the cosine similarity threshold. Furthermore, it has high potential for practical application, since it can resolve unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis reports published by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be advanced by imposing a proper order on the preprocessed dataset or by combining another measure, such as Jaccard similarity, with Word2Vec. Also, other types of unsupervised machine learning algorithms could be used for product group clustering. Our group is currently working on subsequent studies, and we expect that they can further improve the performance of the basic model conceptually proposed in this study.
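The cosine-similarity grouping and bottom-up summation described above can be sketched as follows. This is an illustrative sketch only: the product vectors below are hypothetical toy values (in the paper they would come from Word2Vec training on product-name text), and the product names and sales figures are invented.

```python
import numpy as np

# Hypothetical product-name embeddings (in practice obtained by Word2Vec
# training on company product data) and per-product sales figures.
product_vecs = {
    "smartphone":   np.array([1.0, 0.0]),
    "mobile_phone": np.array([0.9, 0.1]),
    "rice":         np.array([0.0, 1.0]),
}
sales = {"smartphone": 500, "mobile_phone": 300, "rice": 200}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def market_size(seed, threshold=0.8):
    """Group products whose embedding is similar enough to the seed term,
    then sum their sales -- the bottom-up estimation idea above."""
    group = [p for p, v in product_vecs.items()
             if cosine(v, product_vecs[seed]) >= threshold]
    return sum(sales[p] for p in group)
```

Raising or lowering `threshold` widens or narrows the product group, which corresponds to the paper's point that the market-category level can be adjusted via the cosine similarity threshold.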

The Etiologies and Initial Antimicrobial Therapy Outcomes in One Tertiary Hospital ICU-admitted Patient with Severe Community-acquired Pneumonia (국내 한 3차 병원 중환자실에 입원한 중증지역획득폐렴 환자의 원인 미생물과 경험적 항균제 치료 성적의 고찰)

  • Lee, Jae Seung;Chung, Joo Won;Koh, Yunsuck;Lim, Chae-Man;Jung, Young Joo;Oh, Youn Mok;Shim, Tae Sun;Lee, Sang Do;Kim, Woo Sung;Kim, Dong-Soon;Kim, Won Dong;Hong, Sang-Bum
    • Tuberculosis and Respiratory Diseases
    • /
    • v.59 no.5
    • /
    • pp.522-529
    • /
    • 2005
  • Background: Several national societies have published guidelines for empirical antimicrobial therapy in patients with severe community-acquired pneumonia (SCAP). This study investigated the etiologies of SCAP at the Asan Medical Center and assessed the relationship between the initial empirical antimicrobial regimen and the 30-day mortality rate. Method: A retrospective analysis was performed on patients with SCAP admitted to the ICU between March 2002 and February 2004 at the Asan Medical Center. The basic demographic data, bacteriologic study results, and initial antimicrobial regimen were examined for all patients. The clinical outcomes, including ICU length of stay, ICU mortality rate, and 30-day mortality rate, were assessed according to the initial antimicrobial regimen. Results: One hundred sixteen consecutive patients were admitted to the ICU (mean age 66.5 years, 81.9% male, 30-day mortality 28.4%). A microbiologic diagnosis was established in 58 patients (50%). The most common pathogens were S. pneumoniae (n=12), P. aeruginosa (n=9), K. pneumoniae (n=9), and S. aureus (n=8). The initial empirical antimicrobial regimens were classified as: β-lactam plus macrolide; β-lactam plus fluoroquinolone; anti-pseudomonal β-lactam plus fluoroquinolone; aminoglycoside combination regimen; β-lactam plus clindamycin; and β-lactam alone. There were no statistically significant differences in the 30-day mortality rate according to the initial antimicrobial regimen (p=0.682). Multivariate analysis revealed that acute renal failure, acute respiratory distress syndrome, and K. pneumoniae were independent risk factors for 30-day mortality. Conclusion: S. pneumoniae, P. aeruginosa, K. pneumoniae, and S. aureus were the most common causative pathogens in patients with SCAP, and K. pneumoniae was an independent risk factor for 30-day mortality. The initial antimicrobial regimen was not associated with the 30-day mortality rate.

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.69-94
    • /
    • 2017
  • Recently, the increasing demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, the development of IT and the increased penetration of smart devices are producing a large amount of data. As a result, data analysis technology is rapidly becoming popular, and attempts to acquire insights through data analysis have been continuously increasing. This means that big data analysis will become even more important in various industries for the foreseeable future. Big data analysis is generally performed by a small number of experts and delivered to each requester of the analysis. However, growing interest in big data analysis has spurred computer programming education and the development of many programs for data analysis. Accordingly, the entry barriers to big data analysis are gradually lowering, and data analysis technology is spreading. As a result, big data analysis is expected to be performed by the requesters of the analysis themselves. Along with this, interest in various kinds of unstructured data is continually increasing; in particular, much attention is focused on using text data. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it, and the results of text analysis have been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Among the many text mining techniques utilized for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large set of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is evaluated as a very useful technique in that it reflects the semantic elements of the documents.
Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, it is essential to analyze the entire set at once to identify the topic of each document. This makes the analysis time-consuming when topic modeling is applied to a large number of documents. In addition, it has a scalability problem: the processing time increases exponentially with the number of analysis objects. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling: a large number of documents is divided into sub-units, and topics are derived by applying topic modeling to each unit. This method can be used for topic modeling on a large number of documents with limited system resources and can improve the processing speed of topic modeling. It can also significantly reduce analysis time and cost, because documents can be analyzed in each location without combining all the analysis object documents. However, despite many advantages, this method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear: local topics can be identified in each unit, but global topics cannot. Second, a method for measuring the accuracy of the proposed methodology needs to be established; that is, assuming that the global topics are the ideal answer, the deviation of the local topics from the global topics needs to be measured. Because of these difficulties, this approach has not been studied as thoroughly as other approaches to topic modeling. In this paper, we propose a topic modeling approach that solves the above two problems.
First, we divide the entire document cluster (the global set) into sub-clusters (local sets) and generate a reduced global set (RGS) consisting of representative documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. In addition, we verify the accuracy of the proposed methodology by detecting whether documents are assigned to the same topic in the global and local results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirmed that the proposed methodology can provide results similar to topic modeling on the entire set. We also propose a reasonable method for comparing the results of both approaches.
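The mapping step between local and global topics can be sketched as follows, assuming topic-word distributions have already been fitted (e.g. by LDA) on each local set and on the reduced global set. The distributions below are hypothetical toy values over a four-word vocabulary, not the paper's fitted models.

```python
import numpy as np

# Hypothetical topic-word distributions over a shared vocabulary, as
# produced by e.g. LDA fitted on the reduced global set and on one local set.
global_topics = np.array([[0.70, 0.20, 0.05, 0.05],
                          [0.05, 0.05, 0.60, 0.30]])
local_topics = np.array([[0.60, 0.30, 0.05, 0.05],
                         [0.10, 0.00, 0.50, 0.40],
                         [0.65, 0.25, 0.05, 0.05]])

def map_local_to_global(local_t, global_t):
    """Assign each local topic to its most similar global topic,
    using cosine similarity of the topic-word distributions."""
    g = global_t / np.linalg.norm(global_t, axis=1, keepdims=True)
    l = local_t / np.linalg.norm(local_t, axis=1, keepdims=True)
    return (l @ g.T).argmax(axis=1)

mapping = map_local_to_global(local_topics, global_topics)
```

Two local topics mapping to the same global topic is expected: several sub-units may each discover a local variant of one global issue, and this mapping is what makes the divide-and-conquer results comparable to whole-set topic modeling.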

An Analytical Study on the Stem-Growth by the Principal Component and Canonical Correlation Analyses (주성분(主成分) 및 정준상관분석(正準相關分析)에 의(依)한 수간성장(樹幹成長) 해석(解析)에 관(關)하여)

  • Lee, Kwang Nam
    • Journal of Korean Society of Forest Science
    • /
    • v.70 no.1
    • /
    • pp.7-16
    • /
    • 1985
  • To grasp the canonical correlations among the various growth factors of the stem and their underlying structure, and to characterize the stem by synthetic dispersion analysis, principal component analysis and canonical correlation analysis were applied to Larix leptolepis. The results are as follows: 1) There were high or low correlations among all factors (height ($x_1$), clear height ($x_2$), form height ($x_3$), breast height diameter (D.B.H.: $x_4$), mid diameter ($x_5$), crown diameter ($x_6$), and stem volume ($x_7$)) except the normal form factor ($x_8$). In particular, stem volume showed high correlations with D.B.H., height, and mid diameter (cf. table 1). 2) (1) The canonical correlation coefficient and canonical variates between stem volume and the composite variate of the height growth factors ($x_1$, $x_2$, and $x_3$) are ${\gamma}_{u_1,v_1}=0.82980^{**}$, $u_1=1.00000x_7$, $v_1=1.08323x_1-0.04299x_2-0.07080x_3$. (2) Those between stem volume and the composite variate of the diameter growth factors ($x_4$, $x_5$, and $x_6$) are ${\gamma}_{u_1,v_1}=0.98198^{**}$, $u_1=1.00000x_7$, $v_1=0.86433x_4+0.11996x_5+0.02917x_6$. (3) The canonical correlation between stem volume and the composite variate of all six height and diameter factors is ${\gamma}_{u_1,v_1}=0.98700^{**}$, $u_1=1.00000x_7$, $v_1=0.12948x_1+0.00291x_2+0.03076x_3+0.76707x_4+0.09107x_5+0.02576x_6$. All cases showed high canonical correlation. Height in case (1), D.B.H. in case (2), and both D.B.H. and height in case (3) make an absolute contribution to the canonical correlation, so the synthetic characteristics of each type of growth are largely determined by these factors. In case (3) in particular, the influence of D.B.H. is the most significant among the six factors (cf. table 2).
3) The canonical correlation coefficient and canonical variates between the composite variate of the height growth factors and that of the diameter growth factors are ${\gamma}_{u_1,v_1}=0.78556^{**}$, $u_1=1.20569x_1-0.04444x_2-0.21696x_3$, $v_1=1.09571x_4-0.14076x_5+0.05285x_6$. As shown above, only height and D.B.H. contributed considerably to the canonical correlation. Thus, the synthetic characteristics of height growth are determined by height, and those of growth in thickness by D.B.H. (cf. table 2). 4) The synthetic characteristics (1st-3rd principal components) derived from the eight growth factors of the stem, on the basis of an 85% accumulated proportion, are as follows: 1st principal component ($z_1$): $z_1=0.40192x_1+0.23693x_2+0.37047x_3+0.41745x_4+0.41629x_5+0.33454x_6+0.42798x_7+0.04923x_8$; 2nd principal component ($z_2$): $z_2=-0.09306x_1-0.34707x_2+0.08372x_3-0.03239x_4+0.11152x_5+0.00012x_6+0.02407x_7+0.92185x_8$; 3rd principal component ($z_3$): $z_3=0.19832x_1+0.68210x_2+0.35824x_3-0.22522x_4-0.20876x_5-0.42373x_6-0.15055x_7+0.26562x_8$. The first principal component ($z_1$), a "size factor", showed high information absorption power (63.26% proportion), and its score is determined by stem volume, D.B.H., mid diameter, and height, which have considerably high factor loadings. The second principal component ($z_2$) is a "shape factor" indicating the cubic similarity of the stem, and its score is formed under the absolute influence of the normal form factor. The third principal component ($z_3$) is a "shape factor" reflecting the relative thickness and length of the stem. These three principal components have satisfactory information absorption power, with 88.36% of the accumulated proportion of variance (cf. table 3).
5) Thus the principal component and canonical correlation analyses could be applied to the field of forest measurement, judgement of site qualities, management diagnoses for the forest management and the forest products industries, and the other fields which require the assessment of synthetical characteristics.
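The principal component step above can be reproduced with a small numpy sketch: eigendecomposing the correlation matrix of the growth factors yields the component loadings and the proportion of variance each component absorbs. The measurements below are hypothetical toy values for three factors, not the paper's eight-factor Larix data.

```python
import numpy as np

# Toy measurements for six trees -- height, D.B.H., and stem volume --
# hypothetical values standing in for the paper's eight growth factors.
X = np.array([[12.0, 10.0, 0.05],
              [15.0, 14.0, 0.10],
              [18.0, 17.0, 0.16],
              [21.0, 20.0, 0.24],
              [24.0, 24.0, 0.35],
              [27.0, 27.0, 0.46]])

def principal_components(data):
    """PCA on the correlation matrix: eigendecompose R and return the
    eigenvalues (descending) with the corresponding loading vectors."""
    R = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

eigvals, loadings = principal_components(X)
# Proportion of variance absorbed by the first component (the "size factor")
share = eigvals[0] / eigvals.sum()
```

Because these growth factors are strongly correlated, the first eigenvalue dominates, mirroring the paper's finding that a single "size factor" absorbs most of the information.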


Pulmonary tuberculosis misdiagnosed as lung Metastasis in childhood cancer patients (소아암 환자에서 암의 전이로 오인된 폐결핵)

  • Lee, Hyun-Jae;Kim, Dong-Whan;Lee, Kang-Min;Park, Kyung-Duk;Lee, Jun-Ah;Cho, Soo-Yeon;Kook, Yoon-Hoh;Kim, Hee-Youn;Kim, Dong-Ho
    • Clinical and Experimental Pediatrics
    • /
    • v.52 no.8
    • /
    • pp.904-909
    • /
    • 2009
  • Purpose: The differential diagnosis of a pulmonary nodule is challenging in cancer patients. Metastasis may be the preferential diagnosis, yet other medical conditions remain possible. Pulmonary tuberculosis should be included in the differential diagnosis of a pulmonary nodule in cancer patients in Korea. This study aimed to analyze the incidence and clinical features of pulmonary tuberculosis that was misdiagnosed as pulmonary metastasis during radiologic follow-up in pediatric cancer patients. Methods: We retrospectively studied 422 cancer patients less than 18 years old at the Korea Cancer Center Hospital from January 2001 to June 2007. We collected episodes of lung metastasis of the primary tumor and of tuberculosis during treatment or follow-up, and analyzed the medical records. Results: There were 5 cases of tuberculosis confirmed after surgery that had initially been regarded as cancer. Two patients had respiratory symptoms such as cough and sputum, but the other 3 patients did not. One patient had a family history of tuberculosis. Acid-fast M. tuberculosis was found in one case on tissue specimen analysis. Two cases were Mantoux positive, and the sputum examination was negative in all cases. Polymerase chain reaction for tuberculosis on pathologic specimens was used to differentiate M. tuberculosis from non-tuberculous mycobacteria (NTM); it was positive in one case. The lung lesions in one case showed tuberculosis concurrent with lung metastasis. One of these patients died after cancer recurrence. Conclusion: It is necessary to consider the possibility of tuberculosis when a lung mass is newly detected during treatment or follow-up in patients with childhood cancer.