• Title/Summary/Keyword: K-means clustering (K-means 군집화)

Object Image Classification Using Hierarchical Neural Network (계층적 신경망을 이용한 객체 영상 분류)

  • Kim Jong-Ho;Kim Sang-Kyoon;Shin Bum-Joo
    • Journal of Korea Society of Industrial Information Systems / v.11 no.1 / pp.77-85 / 2006
  • In this paper, we propose a hierarchical classifier of object images using neural networks for content-based image classification. The images to be classified are object images that can be divided into foreground and background. In the preprocessing step, we extract the object region and then shape-based texture features from the wavelet-transformed images. We group the image classes into clusters with similar texture features using Principal Component Analysis (PCA) and K-means. The hierarchical classifier has five layers that combine the clusters and consists of 59 neural network classifiers trained with the backpropagation algorithm. Among the various texture features, the diagonal moment was the most effective. A test with 1000 training images and 1000 test images, each set composed of 10 images from each of 100 classes, shows correct classification rates of 81.5% and 75.1%, respectively.

Factors affecting to the Quality of Korean Soybean Paste, Doenjang (한국 된장의 품질에 영향을 미치는 요인)

  • Shim, Hye-Jeoung;Yun, Jeong-hyun;Koh, Kyung-Hee
    • Journal of Applied Biological Chemistry / v.61 no.4 / pp.357-365 / 2018
  • The quality of Korean doenjang, made by the traditional method for this study, was monitored for physicochemical properties, antioxidant capacity, and sensory properties at six-month intervals for three years. The collected data were comprehensively analyzed using k-means clustering via principal component analysis (PCA) to determine the optimal intake duration and the sensory factors associated with acceptance. Doenjang samples were classified at one-year intervals based on PCA, and the classified samples were then grouped into Clusters 1, 2, and 3 by k-means clustering. In Cluster 3, doenjang aged for thirty and thirty-six months showed high total phenolic content, antioxidant capacity, superoxide dismutase-like activity, and 2,2-diphenyl-1-picrylhydrazyl radical scavenging capacity. Interestingly, along with acceptance, the levels of free amino acids and organic acids were higher in Cluster 3. The sensory factors found to be associated with acceptance included umami taste and brown color. In conclusion, this study proposes the intake of doenjang aged for thirty months based on its antioxidant activity and sensory properties, although doenjang is usually ready after twelve months of aging.

Intention-Awareness Method using Behavior Model Based User Intention (사용자 의도에 따른 행동 모델을 이용한 의도 인식 기법)

  • Kim, Geon-Su;Kim, Dong-Mun;Yun, Tae-Bok;Lee, Ji-Hyeong
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.3-6 / 2007
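  • Because people act with a specific intention, providing a service suited to the situation requires identifying the intention behind the user's current behavior. To this end, we build a model of behavior according to user intention, using the association between intentions and behaviors. Since people's everyday behaviors consist of sequences of small unit actions, analyzing the order of a user's unit actions makes it easier to build an intention-based behavior model. However, a problem with this unit-action analysis is that behaviors with the same intention do not occur as perfectly identical sequences of unit actions. The system then interprets behaviors that do not occur in the same order of actions as behaviors with different intentions. A method for identifying user intention that can solve this problem is therefore needed. In this paper, we determine the similarity of behaviors based on the user's past behavior information and use the result to identify the intention of a behavior. To do so, we divide the user's past behaviors by unit time into sequences of unit actions and represent them as sequences of clusters using K-means clustering. A hidden Markov model is trained on this transformed user behavior information, and the resulting model predicts which behavior the user's current action corresponds to, thereby identifying the user's intention.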

Method of Associative Group Using FP-Tree in Personalized Recommendation System (개인화 추천 시스템에서 FP-Tree를 이용한 연관 군집 방법)

  • Cho, Dong-Ju;Rim, Kee-Wook;Lee, Jung-Hyun;Chung, Kyung-Yong
    • The Journal of the Korea Contents Association / v.7 no.10 / pp.19-26 / 2007
  • Because collaborative filtering uses the nearest-neighbor method based on item preference, it not only fails to reflect exact contents but also suffers from sparsity and scalability problems. Item-based collaborative filtering has been used in practice to mitigate these problems, but it still does not reflect the attributes of items. In this paper, we propose an associative grouping method using the FP-Tree to solve these problems of existing recommendation systems. The proposed method finds frequent items and creates association rules using the FP-Tree without generating candidate sets. We form efficient item groups using an $\alpha$-cut according to the confidence of the association rules. To evaluate performance, the proposed method is compared with Gibbs sampling, Expectation Maximization, and K-means on the MovieLens dataset.

Distribution Analysis of Optimal Equipment Assignment Using a Genetic Algorithm (유전알고리즘을 이용하여 최적화된 방제 자원 배치안의 분포도 분석)

  • Kim, Hye-Jin;Kim, Yong-Hyuk
    • Journal of the Korea Convergence Society / v.11 no.4 / pp.11-16 / 2020
  • In planning for oil spill accidents, research that collects and analyzes optimal equipment assignments is essential. However, no previous studies have diversified and analyzed the optimal equipment assignments for responding to oil spill accidents. To address this need, we devised a genetic algorithm for optimal equipment assignment. The designed genetic algorithm yielded 10,000 optimal equipment assignments, which we clustered using the k-means algorithm. As a result, the two clusters of Yeosu, Daesan, and Ulsan, which are expected to have the largest spills, were clearly identified. We also projected the 16-dimensional data onto two dimensions via Sammon's mapping and analyzed the distribution of the projected data. We confirmed that the results of the simulation were better than those of the optimal equipment assignments included in the cluster. In the future, it will be possible to implement an approximate model with excellent performance based on this study.

Robust k-means Clustering-based High-speed Barcode Decoding Method to Blur and Illumination Variation (블러와 조명 변화에 강인한 k-means 클러스터링 기반 고속 바코드 정보 추출 방법)

  • Kim, Geun-Jun;Cho, Hosang;Kang, Bongsoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.1 / pp.58-64 / 2016
  • This paper presents a k-means clustering-based high-speed barcode decoding method that is robust to blur and illumination variation. For fast operation and robust decoding under blur, the proposed method uses adaptive local threshold binarization, which calculates threshold values separately for blurred and non-blurred regions. In addition, to prevent decoding failures caused by noise, a decoder based on the k-means clustering algorithm is implemented using area data summed over pixel-width lines of the same number of elements. In simulations using samples taken in various worst-case environments, the average success rate of the proposed method was 98.47%. It showed the highest decoding success rate among the three programs compared.

Text Region Detection Method in Mobile Phone Video (휴대전화 동영상에서의 문자 영역 검출 방법)

  • Lee, Hoon-Jae;Sull, Sang-Hoon
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.5 / pp.192-198 / 2010
  • With the popularization of mobile phones with built-in cameras, there have been many efforts to provide useful information to users by detecting and recognizing text in video captured by the phone's camera, which creates a need to detect the text regions in such video. In this paper, we propose a method to detect text regions in mobile phone video. We apply morphological operations as a preprocessing step and obtain a binarized image using modified k-means clustering. Candidate text regions are then obtained by applying connected component analysis and general text characteristic analysis. In addition, we increase the precision of text detection by examining the frequency of the candidate regions. Experimental results show that the proposed method detects text regions in mobile phone video with high precision and recall.
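
The binarization step can be illustrated with a minimal sketch of plain two-cluster k-means on grayscale intensities. The abstract does not describe how the paper's k-means is modified, so the standard variant shown here, the function name `kmeans_binarize`, and the sample pixel values are all assumptions made for illustration.

```python
def kmeans_binarize(pixels, max_iter=100):
    """Binarize grayscale values with plain 1-D k-means (k = 2).

    `pixels` is a non-empty flat list of intensities. Returns
    (threshold, bits), where bits[i] is 1 if pixels[i] falls in the
    brighter cluster and 0 otherwise.
    """
    # Hypothetical initialization: start the two centroids at the extremes.
    lo, hi = float(min(pixels)), float(max(pixels))
    for _ in range(max_iter):
        threshold = (lo + hi) / 2.0  # midpoint splits the two clusters
        dark = [p for p in pixels if p <= threshold]
        bright = [p for p in pixels if p > threshold]
        if not bright:  # all pixels identical: nothing to separate
            break
        new_lo = sum(dark) / len(dark)
        new_hi = sum(bright) / len(bright)
        if new_lo == lo and new_hi == hi:  # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    threshold = (lo + hi) / 2.0
    bits = [1 if p > threshold else 0 for p in pixels]
    return threshold, bits


if __name__ == "__main__":
    # Made-up intensities: three dark pixels and three bright ones.
    print(kmeans_binarize([10, 12, 11, 200, 210, 205]))
```

With k = 2 in one dimension, assigning each pixel to its nearest centroid is the same as thresholding at the midpoint of the two centroids, which is why the loop only tracks a single threshold.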

Curriculum Mining Analysis Using Clustering-Based Process Mining (군집화 기반 프로세스 마이닝을 이용한 커리큘럼 마이닝 분석)

  • Joo, Woo-Min;Choi, Jin Young
    • Journal of Korean Society of Industrial and Systems Engineering / v.38 no.4 / pp.45-55 / 2015
  • In this paper, we consider curriculum mining as an application of process mining in the domain of education. The basic objective of curriculum mining is to construct a registration pattern model from logs of registration data. However, students' subject registration patterns are highly unstructured and complicated, forming a so-called spaghetti model, because they contain many different cases and a high diversity of behaviors. Such registration patterns are typically difficult to model and analyze. In the literature, there has been an effort to handle this issue by clustering based on the features of students and their behaviors. However, these features are generally hard to obtain because they are private and qualitative. Therefore, in this paper, we propose a new framework for curriculum mining that applies K-means clustering based on subject attributes to solve the problems caused by the unstructured process model. Specifically, we divide each subject's attribute data into two parts: categorical and numerical. The categorical attributes are subject name, class classification, and research field, while the numerical attributes are ABEEK goal and semester information. For the categorical attributes, we suggest a method to quantify them using binarization. To determine the number of clusters for K-means clustering, we applied the elbow method using the R-squared value, which represents the ratio of variance explained by a given number of clusters. The performance of the suggested method was verified using a log of student registration data from 'A university' in terms of simplicity and fitness, which are the typical performance measures of a process model obtained in process mining.
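
The elbow method on R-squared can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: it uses plain Lloyd's k-means with random seeding, takes R-squared as 1 - WSS/TSS (within-cluster over total sum of squares), and all function names and sample points are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, seed=0, max_iter=100):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Seed with k data points drawn without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = centroid(members)
    return labels, centroids


def r_squared(points, labels, centroids):
    """Share of total variance explained by the clustering: 1 - WSS / TSS."""
    grand = centroid(points)
    total_ss = sum(sq_dist(p, grand) for p in points)
    within_ss = sum(sq_dist(p, centroids[lab])
                    for p, lab in zip(points, labels))
    return 1.0 - within_ss / total_ss


def elbow_curve(points, ks):
    """R-squared for each candidate k; the elbow is where gains level off."""
    return [(k, r_squared(points, *kmeans(points, k))) for k in ks]


if __name__ == "__main__":
    # Made-up 2-D points standing in for quantified subject attributes.
    pts = [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0], [9.0, 9.5],
           [10.0, 10.0], [9.5, 11.0], [5.0, 0.0]]
    for k, r2 in elbow_curve(pts, range(1, 6)):
        print(k, round(r2, 3))
```

R-squared is 0 for a single cluster and reaches 1 when every point is its own cluster, so the value alone always favors more clusters; the elbow heuristic instead picks the k after which additional clusters add little explained variance.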

The Evaluation Measure of Text Clustering for the Variable Number of Clusters (가변적 클러스터 개수에 대한 문서군집화 평가방법)

  • Jo, Tae-Ho
    • Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.233-237 / 2006
  • This study proposes a new measure for evaluating the performance of text clustering. When the K-means algorithm or Kohonen Networks are used for text clustering, the number of clusters is fixed in advance as a parameter, whereas with the single pass algorithm the number of clusters is not predictable. Using labeled documents, the result of text clustering with the K-means algorithm or a Kohonen Network can be evaluated by setting the number of clusters to the number of given target categories, mapping each cluster to a target category, and using the evaluation measures of text. With the single pass algorithm, however, if the number of clusters differs from the number of target categories, such measures are useless for evaluating the clustering result. This study proposes an evaluation measure of text clustering based on intra-cluster similarity and inter-cluster similarity, called CI (Clustering Index) in this article.

Performance of Korean spontaneous speech recognizers based on an extended phone set derived from acoustic data (음향 데이터로부터 얻은 확장된 음소 단위를 이용한 한국어 자유발화 음성인식기의 성능)

  • Bang, Jeong-Uk;Kim, Sang-Hun;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.11 no.3 / pp.39-47 / 2019
  • We propose a method to improve the performance of spontaneous speech recognizers by extending their phone set using speech data. In the proposed method, we first extract variable-length phoneme-level segments from broadcast speech signals and convert them to fixed-length latent vectors using a long short-term memory (LSTM) classifier. We then cluster acoustically similar latent vectors and build a new phone set by choosing the number of clusters with the lowest Davies-Bouldin index. We also update the lexicon of the speech recognizer by choosing, for each word, the pronunciation sequence with the highest conditional probability. To analyze the acoustic characteristics of the new phone set, we visualize its spectral patterns and segment durations. Through speech recognition experiments using a larger training data set than in our previous work, we confirm that the new phone set yields better performance than the conventional phoneme-based and grapheme-based units in both spontaneous speech recognition and read speech recognition.
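
The Davies-Bouldin index used to choose the number of clusters can be sketched as below. This is a minimal illustration of the standard definition with Euclidean distances, where lower values indicate more compact and better separated clusters. The function names and the toy one-dimensional points are assumptions, and the abstract does not say which clustering algorithm or distance the paper uses.

```python
import math


def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(points, labels):
    """Davies-Bouldin index of a clustering (lower is better).

    For each cluster i, take the worst ratio (S_i + S_j) / M_ij over the
    other clusters j, where S is the mean distance of members to their
    centroid and M_ij is the distance between centroids; then average.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin index needs at least two clusters")
    centroids = {lab: [sum(col) / len(m) for col in zip(*m)]
                 for lab, m in clusters.items()}
    scatter = {lab: sum(euclid(p, centroids[lab]) for p in m) / len(m)
               for lab, m in clusters.items()}
    total = 0.0
    for i in clusters:
        total += max((scatter[i] + scatter[j])
                     / euclid(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)


if __name__ == "__main__":
    # Toy 1-D "latent vectors": two tight groups far apart.
    pts = [(0.0,), (2.0,), (10.0,), (12.0,)]
    print(davies_bouldin(pts, [0, 0, 1, 1]))  # well separated grouping
    print(davies_bouldin(pts, [0, 1, 0, 1]))  # poorly separated grouping
```

To select the number of clusters in this way, one would run the clustering for each candidate count and keep the count whose labeling gives the smallest index.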