• Title/Summary/Keyword: K-means clustering

Search Results: 1,107 (Processing Time: 0.031 seconds)

Study about Library and Information Center's Image of Library and Information Science Students as Workplace (문헌정보학과 학생의 직장으로서의 도서관·정보센터 이미지 분석)

  • Cho, Jane;Lee, Jiwon
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.50 no.3
    • /
    • pp.113-132
    • /
    • 2016
  • Positioning techniques, long used in marketing to analyze customer images, have also been applied to analyzing public and test-taker images of public facilities, enterprises, and universities. This study uses a positioning technique to analyze how library and information science students seeking jobs in the library field perceive diverse types of libraries and information centers. Similarity-cognition analysis by multidimensional scaling and K-means clustering found that students perceive public, national, university, and school libraries as similar to one another, while portal companies and special libraries are perceived as distinct from those types. Among jobs, user-service and technical-service jobs are recognized as separate clusters, and cultural-program jobs are perceived as dissimilar from both. Regarding image attributes, work satisfaction and employment stability rated highest for the national library, high wages for portal companies, employee growth potential for special libraries, job importance for reference-service jobs, and difficulty for content-related jobs. In workplace selection, most students rank employment stability as the top priority and accordingly prefer public libraries most. A Pearson's chi-square test showed that this concentration of preference is stronger among students at local universities than among those in the metropolitan area.
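The multidimensional scaling plus K-means pipeline this abstract describes can be sketched roughly as follows. The dissimilarity matrix, library-type labels, and cluster count below are illustrative assumptions, not the study's survey data; scikit-learn is assumed.

```python
# Sketch of a positioning analysis: MDS embeds a perceived-dissimilarity
# matrix into a 2-D "perceptual map", then K-means groups the mapped items.
# The dissimilarity values are hypothetical, not the study's data.
import numpy as np
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

library_types = ["public", "national", "university", "school",
                 "special", "portal company"]

# Hypothetical symmetric dissimilarity matrix (0 = identical perception).
D = np.array([
    [0.0, 0.2, 0.3, 0.2, 0.7, 0.9],
    [0.2, 0.0, 0.3, 0.3, 0.6, 0.9],
    [0.3, 0.3, 0.0, 0.2, 0.6, 0.8],
    [0.2, 0.3, 0.2, 0.0, 0.7, 0.9],
    [0.7, 0.6, 0.6, 0.7, 0.0, 0.5],
    [0.9, 0.9, 0.8, 0.9, 0.5, 0.0],
])

# Embed the dissimilarities into two dimensions.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(D)

# Cluster the map coordinates into perceived groups.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)
for name, lab in zip(library_types, labels):
    print(name, lab)
```

With dissimilarities shaped like the study's finding, the four "similar" library types fall into one cluster and the portal company falls outside it.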

Development of Brain Tumor Detection using Improved Clustering Method on MRI-compatible Robotic Assisted Surgery (MRI 영상 유도 수술 로봇을 위한 개선된 군집 분석 방법을 이용한 뇌종양 영역 검출 개발)

  • Kim, DaeGwan;Cha, KyoungRae;Seung, SungMin;Jeong, Semi;Choi, JongKyun;Roh, JiHyoung;Park, ChungHwan;Song, Tae-Ha
    • Journal of Biomedical Engineering Research
    • /
    • v.40 no.3
    • /
    • pp.105-115
    • /
    • 2019
  • Brain tumor surgery is difficult but critically important. Technological improvements to traditional brain tumor surgery have long focused on increasing surgical precision in this vital area of the body, and the need for precision has driven growth in robotic-assisted surgery (RAS). One challenge to the widespread acceptance of RAS in neurosurgery is accurately recognizing tumor tissue that is not visible to the surgeon. Detecting the size and location of a brain tumor is therefore important, because the surgeon aims to remove as much of the tumor as possible. In this paper, we propose brain tumor detection procedures for MRI (Magnetic Resonance Imaging) systems. An automatic brain tumor detection method is needed to accurately target the lesion during surgery and to report its location and size. In a qualitative assessment, the proposed method showed better results than other brain tumor detection methods, and comparisons across all assessment criteria indicated that it was significantly superior to the threshold method. The proposed method was effective for detecting brain tumors.
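Intensity-based K-means segmentation, the general family of methods this paper improves on, can be sketched as below. The synthetic image, cluster count, and brightest-cluster rule are assumptions for illustration; real pipelines add skull stripping, denoising, and morphological post-processing.

```python
# Minimal sketch of intensity-based K-means segmentation of an MRI slice.
# A synthetic image stands in for real data: dim background tissue plus a
# bright square "tumor"; the brightest cluster is taken as the candidate.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
image = rng.normal(0.2, 0.02, size=(64, 64))                 # background
image[20:35, 25:40] = rng.normal(0.8, 0.02, size=(15, 15))   # "tumor"

# Cluster pixel intensities; pick the cluster with the highest center.
pixels = image.reshape(-1, 1)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels)
tumor_cluster = int(np.argmax(km.cluster_centers_))
mask = (km.labels_ == tumor_cluster).reshape(image.shape)

print("tumor pixels found:", int(mask.sum()))  # ~225 (the 15 x 15 patch)
```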

Tumor Habitat Analysis Using Longitudinal Physiological MRI to Predict Tumor Recurrence After Stereotactic Radiosurgery for Brain Metastasis

  • Da Hyun Lee;Ji Eun Park;NakYoung Kim;Seo Young Park;Young-Hoon Kim;Young Hyun Cho;Jeong Hoon Kim;Ho Sung Kim
    • Korean Journal of Radiology
    • /
    • v.24 no.3
    • /
    • pp.235-246
    • /
    • 2023
  • Objective: It is difficult to predict the treatment response of tissue after stereotactic radiosurgery (SRS) because radiation necrosis (RN) and tumor recurrence can coexist. Our study aimed to predict tumor recurrence, including the recurrence site, after SRS of brain metastasis by performing a longitudinal tumor habitat analysis. Materials and Methods: Two consecutive multiparametric MRI examinations were performed for 83 adults (mean age, 59.0 years; range, 27-82 years; 44 male and 39 female) with 103 SRS-treated brain metastases. Tumor habitats based on contrast-enhanced T1- and T2-weighted images (structural habitats) and those based on apparent diffusion coefficient (ADC) and cerebral blood volume (CBV) images (physiological habitats) were defined using k-means voxel-wise clustering. The reference standard was based on pathology or the Response Assessment in Neuro-Oncology Brain Metastases (RANO-BM) criteria. The associations between single-time or longitudinal tumor habitat parameters and the time to recurrence and the site of recurrence were evaluated using Cox proportional hazards regression analysis and the Dice similarity coefficient, respectively. Results: The mean interval between the two MRI examinations was 99 days. The longitudinal analysis showed that an increase in the hypovascular cellular habitat (low ADC and low CBV) was associated with the risk of recurrence (hazard ratio [HR], 2.68; 95% confidence interval [CI], 1.46-4.91; P = 0.001). In the single-time analysis, a solid low-enhancing habitat (low T2 and low contrast-enhanced T1 signal) was associated with the risk of recurrence (HR, 1.54; 95% CI, 1.01-2.35; P = 0.045). A hypovascular cellular habitat was indicative of the future recurrence site (Dice similarity coefficient = 0.423).
Conclusion: After SRS of brain metastases, an increased hypovascular cellular habitat observed on longitudinal MRI analysis was associated with the risk of recurrence (i.e., treatment resistance) and was indicative of the recurrence site. Tumor habitat analysis may help guide future treatments for patients with brain metastases.
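The voxel-wise habitat clustering described above treats each voxel as a feature vector of co-registered parameter maps and partitions voxels with K-means. The sketch below uses synthetic ADC/CBV values and invented habitat proportions purely for illustration; the study used real co-registered multiparametric MRI volumes.

```python
# Sketch of voxel-wise habitat clustering: each voxel is an (ADC, CBV)
# feature vector; K-means partitions voxels into physiological habitats.
# Data are synthetic stand-ins for co-registered MRI parameter maps.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_voxels = 3000
# Simulate three habitats: hypovascular cellular (low ADC, low CBV),
# hypervascular (low ADC, high CBV), and necrotic (high ADC, low CBV).
adc = np.concatenate([rng.normal(0.7, 0.05, 1000),
                      rng.normal(0.7, 0.05, 1000),
                      rng.normal(2.0, 0.10, 1000)])
cbv = np.concatenate([rng.normal(1.0, 0.1, 1000),
                      rng.normal(4.0, 0.3, 1000),
                      rng.normal(1.0, 0.1, 1000)])

X = StandardScaler().fit_transform(np.column_stack([adc, cbv]))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Habitat volume fractions; a longitudinal analysis would compare these
# between the two MRI examinations to detect habitat growth.
fractions = np.bincount(labels) / n_voxels
print(fractions)
```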

Association between High Diffusion-Weighted Imaging-Derived Functional Tumor Burden of Peritoneal Carcinomatosis and Overall Survival in Patients with Advanced Ovarian Carcinoma

  • He An;Jose AU Perucho;Keith WH Chiu;Edward S Hui;Mandy MY Chu;Siew Fei Ngu;Hextan YS Ngan;Elaine YP Lee
    • Korean Journal of Radiology
    • /
    • v.23 no.5
    • /
    • pp.539-547
    • /
    • 2022
  • Objective: To investigate the association between the functional tumor burden of peritoneal carcinomatosis (PC) derived from diffusion-weighted imaging (DWI) and overall survival in patients with advanced ovarian carcinoma (OC). Materials and Methods: This prospective study was approved by the local research ethics committee, and informed consent was obtained. Fifty patients (mean age ± standard deviation, 57 ± 12 years) with stage III-IV OC scheduled for primary or interval debulking surgery (IDS) were recruited between June 2016 and December 2021. DWI (b values: 0, 400, and 800 s/mm²) was acquired with a 16-channel phased-array torso coil. The functional PC burden on DWI was derived using K-means clustering to discard fat, air, and normal tissue. A score similar to the surgical peritoneal cancer index was assigned to each abdominopelvic region, with additional scores assigned to the involvement of critical sites, denoted as the functional peritoneal cancer index (fPCI). The apparent diffusion coefficient (ADC) of the largest lesion was calculated. Patients were dichotomized by immediate surgical outcome into high- and low-risk groups (with and without residual disease, respectively), with subsequent survival analysis using the Kaplan-Meier curve and log-rank test. Multivariable Cox proportional hazards regression was used to evaluate the association between the DWI-derived results and overall survival. Results: Fifteen (30.0%) patients underwent primary debulking surgery, and 35 (70.0%) patients received neoadjuvant chemotherapy followed by IDS. Complete tumor debulking was achieved in 32 patients. Patients with residual disease after debulking surgery had reduced overall survival (p = 0.043). The fPCI/ADC was negatively associated with overall survival after accounting for clinicopathological information, with a hazard ratio of 1.254 for high fPCI/ADC (95% confidence interval, 1.007-1.560; p = 0.043).
Conclusion: A high DWI-derived functional tumor burden was associated with decreased overall survival in patients with advanced OC.
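The two DWI-derived steps above, K-means to separate tumor-like voxels from fat/air/normal tissue, then PCI-style per-region scoring, can be sketched roughly as follows. The synthetic signal values, the region names, the per-region volumes, and the score cutoffs are all illustrative assumptions, not the paper's definitions.

```python
# Hedged sketch of an fPCI-like score: K-means separates hyperintense
# tumor-like voxels on a high-b-value image from suppressed fat/air and
# normal tissue, then each abdominopelvic region gets a PCI-style score.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Synthetic high-b-value intensities: suppressed fat/air, normal tissue,
# and hyperintense tumor voxels.
signal = np.concatenate([rng.normal(0.05, 0.01, 5000),
                         rng.normal(0.30, 0.05, 4000),
                         rng.normal(0.90, 0.05, 1000)])
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(signal.reshape(-1, 1))
tumor_label = int(np.argmax(km.cluster_centers_))
n_tumor = int((km.labels_ == tumor_label).sum())
print("tumor voxels:", n_tumor)

def region_score(volume_ml):
    """Illustrative 0-3 lesion score per region (cutoffs are assumptions)."""
    if volume_ml == 0:
        return 0
    if volume_ml <= 0.5:
        return 1
    if volume_ml <= 5.0:
        return 2
    return 3

# Hypothetical per-region tumor volumes (ml) -> fPCI-like total.
volumes = {"pelvis": 6.2, "right flank": 0.3, "epigastrium": 0.0}
fpci = sum(region_score(v) for v in volumes.values())
print("fPCI-like score:", fpci)
```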

The Need for Paradigm Shift in Semantic Similarity and Semantic Relatedness : From Cognitive Semantics Perspective (의미간의 유사도 연구의 패러다임 변화의 필요성-인지 의미론적 관점에서의 고찰)

  • Choi, Youngseok;Park, Jinsoo
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.111-123
    • /
    • 2013
  • Measures of semantic similarity and relatedness between two concepts play an important role in research on system integration and database integration, and current research on keyword recommendation and tag clustering depends strongly on such measures. For this reason, researchers in fields including computer science and computational linguistics have tried to improve methods for calculating semantic similarity and relatedness. The study of similarity between concepts seeks to discover how a computational process can model the way a human determines the relationship between two concepts. Most research on calculating semantic similarity uses ready-made reference knowledge, such as a semantic network or dictionary, to measure concept similarity. The topological method calculates relatedness or similarity between concepts based on various forms of a semantic network, including hierarchical taxonomies. This approach assumes that the semantic network reflects human knowledge well. The nodes in a network represent concepts, and ways to measure the similarity between two nodes are also regarded as ways to determine the conceptual similarity of two words (i.e., two nodes in a network). Topological methods can be categorized as node-based or edge-based, also called the information-content approach and the conceptual-distance approach, respectively. The node-based approach calculates similarity between concepts based on how much information the two concepts share in terms of a semantic network or taxonomy, while the edge-based approach estimates the distance between the nodes that correspond to the concepts being compared. Both approaches have assumed that the semantic network is static; that is, the topological approach has not considered changes in the semantic relations between concepts.
However, as information and communication technologies facilitate knowledge sharing among people, the semantic relations between concepts in a semantic network may change. To explain this change, we adopt cognitive semantics, whose basic assumption is that humans judge semantic relations based on their cognition and understanding of concepts, called 'world knowledge.' World knowledge can be categorized as personal knowledge and cultural knowledge. Personal knowledge comes from personal experience, so everyone can have different personal knowledge of the same concept. Cultural knowledge is shared by people who live in the same culture or use the same language; people in the same culture have a common understanding of specific concepts. Cultural knowledge can be the starting point of a discussion about changing semantic relations: if the culture shared by people changes for some reason, their cultural knowledge may also change. Today's society and culture are changing at a fast pace, so the change of cultural knowledge is a non-negligible issue in research on semantic relationships between concepts. In this paper, we propose future directions for research on semantic similarity; in other words, we discuss how such research can reflect changes in semantic relations caused by changes in cultural knowledge. We suggest three directions. First, research should include versioning and update methodologies for semantic networks. Second, a dynamically generated semantic network can be used to calculate semantic similarity between concepts; if researchers can develop a methodology to extract a semantic network from a given knowledge base in real time, this approach can solve many problems related to changing semantic relations.
Third, a statistical approach based on corpus analysis can be an alternative to methods that use a semantic network. We believe these proposed research directions can be milestones for research on semantic relations.
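The edge-based (conceptual-distance) and node-based (information-content) approaches contrasted above can be illustrated on a toy taxonomy. The hierarchy, the corpus counts, and the Resnik-style scoring below are hand-made assumptions, not a real semantic network.

```python
# Toy illustration of the two topological approaches: edge-based similarity
# counts edges on the shortest path through the least common subsumer (LCS);
# node-based (Resnik-style) similarity uses the information content of the
# LCS. Taxonomy and counts are illustrative only.
import math

# Tiny IS-A hierarchy: child -> parent.
parent = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal", "animal": "entity",
}

def ancestors(c):
    path = [c]
    while c in parent:
        c = parent[c]
        path.append(c)
    return path

def path_distance(a, b):
    """Edge-based: edges on the path from a to b through their LCS."""
    pa, pb = ancestors(a), ancestors(b)
    for i, node in enumerate(pa):
        if node in pb:
            return i + pb.index(node), node
    raise ValueError("no common subsumer")

# Corpus-style concept counts give each concept an information content (IC).
counts = {"entity": 100, "animal": 60, "mammal": 30, "bird": 20,
          "dog": 12, "cat": 10, "sparrow": 8}
total = counts["entity"]

def resnik(a, b):
    """Node-based: IC of the least common subsumer, -log p(LCS)."""
    _, lcs = path_distance(a, b)
    return -math.log(counts[lcs] / total)

d, lcs = path_distance("dog", "cat")
print(d, lcs)                                   # 2 edges via "mammal"
print(resnik("dog", "cat") > resnik("dog", "sparrow"))  # True
```

Both measures here are computed over a static taxonomy, which is exactly the assumption the abstract argues future work should relax.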

Analysis of the Seasonal Concentration Differences of Particulate Matter According to Land Cover of Seoul - Focusing on Forest and Urbanized Area - (서울시 토지피복에 따른 계절별 미세먼지 농도 차이 분석 - 산림과 시가화지역을 중심으로 -)

  • Choi, Tae-Young;Moon, Ho-Gyeong;Kang, Da-In;Cha, Jae-Gyu
    • Journal of Environmental Impact Assessment
    • /
    • v.27 no.6
    • /
    • pp.635-646
    • /
    • 2018
  • This study sought to identify seasonal differences in particulate matter concentration by land cover types associated with particulate matter emission and reduction, namely forest and urbanized areas. PM10 and PM2.5 concentrations measured in 2016 at 23 urban air monitoring stations in Seoul were analyzed; the stations were classified into three groups based on the ratio of urbanized and forest land cover within a 3 km radius of each station, and the seasonal differences in particulate matter concentration were analyzed. The central values of urbanized and forest land cover by group were 53.4% and 34.6% in Group A, 61.8% and 16.5% in Group B, and 76.3% and 6.7% in Group C. The group-specific seasonal concentrations of PM10 and PM2.5 showed that Group A, with a high ratio of forest, had the lowest concentration in all seasons, while Group C, with a high ratio of urbanized area, had the highest concentration from spring to autumn; these inter-group differences were statistically significant. The concentration of Group C was lower than that of Group B in winter, but the difference between Groups B and C in winter was not statistically significant. Compared with the highest-concentration group in each season, Group A's concentration was lower by 8.5%, 11.2%, 8.0%, and 6.8% for PM10 in spring, summer, autumn, and winter, respectively, and by 3.5%, 10.0%, 4.1%, and 3.3% for PM2.5. For both PM10 and PM2.5, the inter-group differences were largest in summer and smallest in winter, presumably because the forests' ability to reduce particulate matter is most pronounced in summer and least pronounced in winter. The influence of urbanized areas on particulate matter concentration was lower than that of forests.
This study provides evidence that particulate matter concentrations are lower in regions with higher ratios of forest; subsequent studies are required to identify the role of green space in managing particulate matter concentrations in cities.
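The grouping-and-comparison step described above, assigning stations to groups by land-cover ratio and comparing seasonal means per group, can be sketched with pandas. The station values and group cutoffs below are illustrative assumptions, not the study's measurements.

```python
# Minimal sketch of grouping monitoring stations by land-cover ratio and
# comparing per-group seasonal PM means. Values are illustrative only.
import pandas as pd

stations = pd.DataFrame({
    "station": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "forest_pct": [35, 30, 17, 15, 7, 5],
    "urban_pct": [53, 55, 62, 64, 76, 78],
    "pm10_summer": [38, 40, 43, 44, 46, 47],   # hypothetical ug/m3
})

def assign_group(forest_pct):
    """Illustrative cutoffs; the study grouped 23 Seoul stations into A/B/C."""
    if forest_pct >= 25:
        return "A"   # forest-rich
    if forest_pct >= 10:
        return "B"
    return "C"       # most urbanized

stations["group"] = stations["forest_pct"].map(assign_group)
seasonal_means = stations.groupby("group")["pm10_summer"].mean()
print(seasonal_means)
```

A real analysis would repeat this per season and per pollutant, then test the inter-group differences for significance as the study did.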

A Hybrid Forecasting Framework based on Case-based Reasoning and Artificial Neural Network (사례기반 추론기법과 인공신경망을 이용한 서비스 수요예측 프레임워크)

  • Hwang, Yousub
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.4
    • /
    • pp.43-57
    • /
    • 2012
  • To maintain a competitive advantage in a constantly changing business environment, enterprise management must make the right decisions in many business activities based on both internal and external information, so providing accurate information plays a prominent role in management decision making. Intuitively, historical data can provide feasible estimates through forecasting models. If the service department can estimate the service quantity for the next period, it can effectively control the inventory of service-related resources such as staff, parts, and facilities, and the production department can build a load map for improving product quality. Obtaining an accurate service forecast is therefore critical for manufacturing companies. Numerous investigations of this problem have generally employed statistical methods, such as regression or autoregressive moving-average models. However, these methods are efficient only for data that are seasonal or cyclical; if the data are influenced by the special characteristics of the product, they are not feasible. In our research, we propose a forecasting framework that predicts the service demand of a manufacturing organization by combining case-based reasoning (CBR) with an unsupervised artificial neural network-based clustering analysis (i.e., Self-Organizing Maps; SOM). We believe this is one of the first attempts to apply unsupervised artificial neural network-based machine-learning techniques in the service forecasting domain. Our proposed approach has several appealing features: (1) We applied CBR and SOM in a new forecasting domain, service demand forecasting.
(2) We combined CBR and SOM to overcome the limitations of traditional statistical forecasting methods and developed a service forecasting tool based on the proposed approach. We conducted an empirical study on a real digital TV manufacturer (Company A) and evaluated the proposed approach and tool using the company's real sales and service-related data. In our experiments, we compared the performance of the proposed framework with two other service forecasting methods: a traditional CBR-based forecasting model and the existing service forecasting model used by Company A. We ran each service forecast 144 times; each time, input data were randomly sampled for each framework. To evaluate forecasting accuracy, we used the Mean Absolute Percentage Error (MAPE) as the primary performance measure and conducted a one-way ANOVA test on the 144 MAPE measurements for the three approaches. The F-ratio was 67.25 with a p-value of 0.000, meaning the difference among the MAPEs of the three approaches is significant at the 0.000 level. Given this significant difference, we conducted Tukey's HSD post hoc test to determine exactly which MAPE means differ significantly from which others.
In terms of MAPE, Tukey's HSD post hoc test grouped the three approaches into three different subsets in the following order: our proposed approach > traditional CBR-based service forecasting > the existing approach used by Company A. Our empirical experiments thus show that the proposed approach outperformed both the traditional CBR-based forecasting model and Company A's existing model. The rest of this paper is organized as follows. Section 2 provides research background, including summaries of CBR and SOM. Section 3 presents the hybrid service forecasting framework based on case-based reasoning and Self-Organizing Maps, and the empirical evaluation results are summarized in Section 4. Conclusions and future research directions are discussed in Section 5.
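The hybrid idea, cluster the historical case base first, then retrieve and reuse the most similar past case within the matching cluster, can be sketched as follows. Note the substitution: K-means stands in here for the paper's SOM, and the case features, outcomes, and MAPE check are all illustrative assumptions.

```python
# Hedged sketch of cluster-then-retrieve CBR forecasting. K-means is a
# stand-in for the paper's SOM; data and features are invented.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Historical cases: features = (units sold, avg product age in months);
# outcome = service demand in the following period.
X = np.vstack([rng.normal([100, 6], [5, 1], (30, 2)),
               rng.normal([500, 24], [20, 2], (30, 2))])
y = np.concatenate([rng.normal(20, 2, 30), rng.normal(150, 5, 30)])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

def forecast(query):
    """Retrieve the nearest case within the query's cluster; reuse its outcome."""
    c = km.predict([query])[0]
    idx = np.where(km.labels_ == c)[0]
    nearest = idx[np.argmin(np.linalg.norm(X[idx] - query, axis=1))]
    return y[nearest]

def mape(actual, predicted):
    """Mean Absolute Percentage Error, the paper's accuracy measure."""
    return np.mean(np.abs((actual - predicted) / actual)) * 100

pred = forecast([102, 6.2])
print(round(float(pred), 1))
```

Restricting retrieval to the query's cluster is what the clustering step buys: similar cases are pre-grouped, so retrieval compares against relevant history only.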

Determination of Tumor Boundaries on CT Images Using Unsupervised Clustering Algorithm (비교사적 군집화 알고리즘을 이용한 전산화 단층영상의 병소부위 결정에 관한 연구)

  • Lee, Kyung-Hoo;Ji, Young-Hoon;Lee, Dong-Han;Yoo, Seoung-Yul;Cho, Chul-Koo;Kim, Mi-Sook;Yoo, Hyung-Jun;Kwon, Soo-Il;Chun, Jun-Chul
    • Journal of Radiation Protection and Research
    • /
    • v.26 no.2
    • /
    • pp.59-66
    • /
    • 2001
  • Determining the spatial location and shape of the tumor boundary is a key issue in fractionated stereotactic radiotherapy (FSRT). Consecutive transaxial plane images were obtained from a paraffin phantom and four patients with brain tumors using helical computed tomography (HCT). A K-means classification algorithm was applied to convert the raw pixel values in the CT images into classified average pixel values. The classified images consist of five regions: tumor region (TR), normal region (NR), combination region (CR), uncommitted region (UR), and artifact region (AR). The major concern was how to separate the normal region from the tumor region within the combination area. Relative average deviation analysis was applied to reduce the average pixel values of the five regions to two regions, normal and tumor, by defining the maximum point among the average-deviation pixel values. We then drew the gross tumor volume (GTV) boundary by connecting the maximum points in the images using a semi-automatic contour method implemented in IDL (Interactive Data Language). The error limit of the ROI boundary in the homogeneous phantom was estimated to be within ±1%. For the four patients, the tumor lesions delineated by the physician and those delineated automatically by the K-means classification algorithm and relative average deviation analysis were similar. These methods can turn an uncertain boundary between normal and tumor regions into a clear one, so the procedure will be useful in CT image-based treatment planning, especially when CT images intermittently fail to visualize tumor volume compared with MRI images.
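The two-step scheme above, K-means quantization into five classes followed by merging down to normal-vs-tumor, can be sketched as below. The synthetic CT slice is an assumption, and the simple midpoint cut on class means is a simplified stand-in for the paper's relative average deviation analysis.

```python
# Sketch of two-step CT segmentation: K-means quantizes pixel values into
# five classes, then classes are merged into "tumor" vs "normal" by a cut
# on class mean intensity (a stand-in for relative average deviation).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
ct = rng.normal(40, 5, size=(128, 128))               # HU-like background
ct[50:80, 50:80] = rng.normal(70, 5, size=(30, 30))   # brighter lesion

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(ct.reshape(-1, 1))
centers = km.cluster_centers_.ravel()

# Merge five classes into two: classes whose mean exceeds the midpoint
# between the darkest and brightest class means count as "tumor".
cut = (centers.min() + centers.max()) / 2
tumor_classes = np.where(centers > cut)[0]
mask = np.isin(km.labels_, tumor_classes).reshape(ct.shape)
print("tumor fraction:", round(float(mask.mean()), 3))
```

A contouring step (like the paper's semi-automatic IDL contour) would then trace the boundary of `mask` to produce the GTV outline.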


Term Mapping Methodology between Everyday Words and Legal Terms for Law Information Search System (법령정보 검색을 위한 생활용어와 법률용어 간의 대응관계 탐색 방법론)

  • Kim, Ji Hyun;Lee, Jong-Seo;Lee, Myungjin;Kim, Wooju;Hong, June Seok
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.137-152
    • /
    • 2012
  • In the era of Web 2.0, as many users create large amounts of web content (user-created content) themselves, the World Wide Web is overflowing with information, so finding meaningful information among countless resources has become the key problem. Information retrieval is now essential across all fields, and several types of search services have been developed and are widely used to retrieve the information users really want. In particular, legal information search is an indispensable service, providing people with a convenient channel for finding the laws relevant to their present situation. Since 2009, the Office of Legislation in Korea has provided the Korean Law Information portal to search law information such as legislation, administrative rules, and judicial precedents, so people can conveniently find law-related information. However, this service is limited because current search-engine technology returns documents depending simply on whether the query terms appear in them. Despite these efforts by the Office of Legislation, it is very difficult for general users unfamiliar with legal terms to retrieve law information through simple keyword matching, because there is a huge divergence between everyday words and legal terms, which derive especially from Chinese characters. People generally try to access law information using everyday words, so they have difficulty getting the results they want. In this paper, we propose a term-mapping methodology between everyday words and legal terms for general users without sufficient legal background, and we develop a search service that can return law information from everyday-word queries.
This enables accurate law information search without knowledge of legal terminology; in other words, our research goal is a law information search system with which general users can retrieve law information using everyday words. First, we take advantage of tags on internet blogs, applying the concept of collective intelligence, to find mapping relationships between everyday words and legal terms. To achieve this, we collect tags related to an everyday word from blog posts: when users write posts, they generally add non-hierarchical keywords or terms (tags) to describe, classify, and manage those posts. Second, the collected tags are clustered using the K-means cluster analysis method. Then we find a mapping relationship between an everyday word and a legal term, using our estimation measure to select the legal term that best matches the everyday word. Selected legal terms are given a definite relationship, and the relations between everyday words and legal terms are described using SKOS, an ontology for describing knowledge related to thesauri, classification schemes, taxonomies, and subject headings. Based on the proposed mapping and search methodologies, when users try to retrieve law information using an everyday word, our system finds the legal term mapped to the query and retrieves law information using the matched legal term. Users can therefore get accurate results even without knowledge of legal terms. We expect that general users without professional legal backgrounds will be able to retrieve legal information conveniently and efficiently using everyday words.
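The tag-based mapping step above can be illustrated roughly as follows. Note the substitutions: a simple weighted-overlap score stands in for the paper's K-means clustering and estimation measure, and the tags, everyday word, and candidate legal terms are invented English stand-ins.

```python
# Hedged sketch of mapping an everyday word to a legal term via blog tags:
# tags harvested from posts about the everyday word are tallied, and the
# candidate legal term with the strongest tag overlap is selected.
from collections import Counter

# Tags from blog posts mentioning the everyday word "getting fired".
posts_tags = [
    ["job loss", "severance", "unfair dismissal"],
    ["unfair dismissal", "labor board", "severance"],
    ["resume", "job search"],
    ["unfair dismissal", "severance pay"],
]

# Candidate legal terms and the tags they co-occur with (illustrative).
legal_term_tags = {
    "wrongful termination": {"unfair dismissal", "labor board", "severance"},
    "employment contract": {"job search", "resume"},
}

tag_counts = Counter(t for tags in posts_tags for t in tags)

def best_legal_term(counts, candidates):
    """Score each legal term by weighted overlap with the collected tags."""
    def score(term):
        return sum(counts[t] for t in candidates[term])
    return max(candidates, key=score)

print(best_legal_term(tag_counts, legal_term_tags))
```

In the paper's pipeline, the selected mapping would then be recorded as a SKOS relation so the search system can translate everyday-word queries into legal-term queries.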

A Two-Stage Learning Method of CNN and K-means RGB Cluster for Sentiment Classification of Images (이미지 감성분류를 위한 CNN과 K-means RGB Cluster 이-단계 학습 방안)

  • Kim, Jeongtae;Park, Eunbi;Han, Kiwoong;Lee, Junghyun;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.139-156
    • /
    • 2021
  • The biggest advantage of using a deep learning model for image classification is that it can consider the relationships among regions by extracting each region's features from the overall information of the image. However, a CNN model may not be suitable for emotional image data that lack distinctive regional features. To address the difficulty of classifying emotion images, researchers propose CNN-based architectures tailored to emotion images every year. Studies on the relationship between color and human emotion have also shown that different emotions are induced by different colors, and some deep learning studies have applied color information to image sentiment classification: using an image's color information in addition to the image itself improves the accuracy of classifying image emotions. This study proposes two ways to increase accuracy by adjusting the result value after the model classifies an image's emotion; both methods modify the result value based on statistics derived from the colors of the picture. One method finds the two-color combination most prevalent in each test image and corrects the result values according to the distribution of that color combination over the training data. The other weights the model's result value using expressions based on the log and exponential functions. For image data we used Emotion6, classified into six emotions, and Artphoto, classified into eight categories. Densenet169, Mnasnet, Resnet101, Resnet152, and Vgg19 architectures were used for the CNN model, and performance was compared before and after applying the two-stage learning to each.
Inspired by color psychology, which deals with the relationship between colors and emotions, we studied how to improve accuracy by modifying result values based on color when building an image sentiment classifier. Sixteen colors were used: red, orange, yellow, green, blue, indigo, purple, turquoise, pink, magenta, brown, gray, silver, gold, white, and black. Using Scikit-learn clustering, the seven colors primarily distributed in each image are identified; the RGB coordinates of these colors are then compared with the RGB coordinates of the 16 colors above, that is, each is converted to the closest named color. If combinations of three or more colors are used, too many combinations occur, the distribution becomes scattered, and each combination has less influence on the result value; to avoid this problem, two-color combinations were used and weighted into the model. Before training, the most prevalent color combinations were found for all training images, and the distribution of color combinations for each class was stored in a Python dictionary for use during testing. During testing, the two-color combination most prevalent in each test image is found; we then check how that combination is distributed in the training data and correct the result accordingly. We devised several equations to weight the model's result value based on the extracted colors, as described above. The data set was randomly split 80:20, with 20% held out as a test set. The remaining 80% was split into five folds for 5-fold cross-validation, so the model was trained five times with different validation sets; finally, performance was checked on the held-out test set.
Adam was used as the optimizer, with a learning rate of 0.01. Training ran for up to 20 epochs, and if the validation loss did not decrease for five epochs, the experiment was stopped; early stopping was set to load the model with the best validation loss. Classification accuracy was better when the extracted color information was used together with the CNN than when the CNN architecture was used alone.
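The color-extraction step described above, K-means on an image's RGB pixels, with each cluster center snapped to the nearest named color, can be sketched as follows. The synthetic image and the small six-color palette are assumptions; the paper uses 16 named colors and seven clusters per image.

```python
# Sketch of dominant-color extraction: K-means clusters RGB pixels, and
# each cluster center is mapped to the nearest color in a fixed palette.
# The image is synthetic; the palette is a subset of the paper's 16 colors.
import numpy as np
from sklearn.cluster import KMeans

palette = {
    "red": (255, 0, 0), "green": (0, 128, 0), "blue": (0, 0, 255),
    "yellow": (255, 255, 0), "black": (0, 0, 0), "white": (255, 255, 255),
}

rng = np.random.default_rng(0)
# Synthetic 32x32 image: mostly blue with a small red patch.
img = np.zeros((32, 32, 3))
img[..., 2] = 230 + rng.normal(0, 5, (32, 32))
img[8:16, 8:16] = [240, 10, 10] + rng.normal(0, 5, (8, 8, 3))

pixels = img.reshape(-1, 3)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)

def nearest_named(rgb):
    """Snap an RGB cluster center to the closest palette color."""
    return min(palette, key=lambda n: np.linalg.norm(np.array(palette[n]) - rgb))

# Order cluster centers by cluster size to get the dominant combination.
order = np.argsort(-np.bincount(km.labels_))
dominant = [nearest_named(km.cluster_centers_[i]) for i in order]
print(dominant)
```

The resulting two-color combination is what the paper looks up in the per-class training distribution to reweight the CNN's output.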