• Title/Summary/Keyword: K-Mean++ Clustering

Surface Synoptic Climatic Patterns for Heavy Snowfall Events in the Republic of Korea (우리나라 대설 시 지상 종관 기후 패턴)

  • Choi, Gwang-Yong;Kim, Jun-Su
    • Journal of the Korean Geographical Society / v.45 no.3 / pp.319-341 / 2010
  • The purposes of this study are to classify heavy snowfall types in the Republic of Korea based on fresh snowfall data and atmospheric circulation data during the last 36 snow seasons (1973/74-2008/09) and to identify typical surface synoptic climate patterns that characterize each heavy snowfall type. Four synoptic climate categories and seventeen regional heavy snowfall types are classified based on sea level pressure/surface wind vector patterns in East Asia and frequent spatial clustering patterns of heavy snowfall in the Republic of Korea, respectively. Composite analyses of multiple surface synoptic weather charts demonstrate that the locations and intensity of pressure/wind vector mean and anomaly cores in East Asia differentiate each regional heavy snowfall type in Korea. These differences in synoptic climatic fields are primarily associated with the surge of the Siberian high pressure system and the appearance of low pressure systems over the Korean Peninsula. In terms of hemispheric atmospheric circulation, synoptic climatic patterns in the negative mode of the winter Arctic Oscillation (AO) are also associated with frequent heavy snowfall in the Republic of Korea at seasonal scales. These results from long-term synoptic climatic data could contribute to improving short-range or seasonal prediction of regional heavy snowfall.

Human Action Recognition in Still Image Using Weighted Bag-of-Features and Ensemble Decision Trees (가중치 기반 Bag-of-Feature와 앙상블 결정 트리를 이용한 정지 영상에서의 인간 행동 인식)

  • Hong, June-Hyeok;Ko, Byoung-Chul;Nam, Jae-Yeal
    • The Journal of Korean Institute of Communications and Information Sciences / v.38A no.1 / pp.1-9 / 2013
  • This paper proposes a human action recognition method that uses bag-of-features (BoF) based on CS-LBP (center-symmetric local binary pattern) and a spatial pyramid, together with a random forest classifier. To construct the BoF, an image is divided into dense regular grids, and features are extracted from each patch. The code words, which form the visual vocabulary, are obtained by k-means clustering of a random subset of patches. For enhanced action discrimination, local BoF histograms are estimated from three subdivided levels of a spatial pyramid, and a weighted BoF histogram is generated by concatenating the local histograms. For action classification, a random forest, which is an ensemble of decision trees, is built to model the distribution of each action class. The random forest combined with the weighted BoF histogram is successfully applied to the Stanford 40 Actions dataset, which includes various human action images, and its classification performance is better than that of other methods. Furthermore, the proposed method allows action recognition to be performed in near real-time.
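
The histogram step described in this abstract can be illustrated with a minimal sketch. It assumes the code words have already been obtained by k-means and that each patch is represented by a plain numeric descriptor; the function names, the toy 2-D descriptors, and the level weights are hypothetical and are not taken from the paper, whose exact weighting scheme is not stated in the abstract.

```python
from math import dist

def bof_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest code word and return an
    L1-normalized histogram of the assignments."""
    counts = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda j: dist(d, codebook[j]))
        counts[nearest] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

def weighted_bof(level_histograms, level_weights):
    """Scale each pyramid-level histogram by its (assumed) weight and
    concatenate the results into one feature vector."""
    out = []
    for hist, w in zip(level_histograms, level_weights):
        out.extend(w * h for h in hist)
    return out

# Toy data: two code words and four patch descriptors (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 1.0)]
patches = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.2, 0.1)]
hist = bof_histogram(patches, codebook)

# Two pyramid levels with hypothetical weights 0.25 and 0.75.
feature = weighted_bof([hist, [1.0, 0.0]], [0.25, 0.75])
```

The resulting `feature` vector is what a classifier such as a random forest would consume; a real spatial pyramid would contribute one histogram per cell at each level rather than one per level.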

The Character Area Extraction and the Character Segmentation on the Color Document (칼라 문서에서 문자 영역 추출믹 문자분리)

  • 김의정
    • Journal of the Korean Institute of Intelligent Systems / v.9 no.4 / pp.444-450 / 1999
  • This paper presents a clustering method that uses the k-means algorithm to extract character areas from a document image, together with a distance function suited to the HIS coordinate system for clustering the image. As a preprocessing step for recognition, that is, as a character segmentation method, an algorithm that extracts individual characters using connected pixels is also proposed. This algorithm can separate characters even when they touch or overlap. Projection and edge-tracking methods have so far been used to segment such characters. With the method proposed here, however, individual characters can be extracted from connected pixels with only a single projection after the character string has been extracted, dividing the area into characters and the rest (non-character). This is significant for processing color documents rather than simple binary images, and the method has been verified to be more advanced than the previous document processing system.

Multivariate Outlier Removing for the Risk Prediction of Gas Leakage based Methane Gas (메탄 가스 기반 가스 누출 위험 예측을 위한 다변량 특이치 제거)

  • Dashdondov, Khongorzul;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.11 no.12 / pp.23-30 / 2020
  • In this study, machine learning algorithms were used to analyze the relationship between natural gas (NG) data and gas-related environmental elements in order to predict the level of gas leakage risk without directly measuring gas leakage data. The study was based on open data provided by a server using the IoT-based remote control Picarro gas sensor specification. When natural gas leaks into the air, it poses a serious problem for air pollution, the environment, and health. The proposed method is a multivariate outlier removal method based on Random Forest (RF) classification for predicting the risk of NG leaks. After unsupervised k-means clustering, the experimental dataset became imbalanced. Therefore, we focused on making the proposed models predict medium and high risk as well as possible. We compared the receiver operating characteristic (ROC) curve, accuracy, area under the ROC curve (AUC), and mean standard error (MSE) for each classification model. In our experiments, the accuracy, AUC, and MSE for MOL_RF were 99.71%, 99.57%, and 0.0016, respectively.

Scalable Collaborative Filtering Technique based on Adaptive Clustering (적응형 군집화 기반 확장 용이한 협업 필터링 기법)

  • Lee, O-Joun;Hong, Min-Sung;Lee, Won-Jin;Lee, Jae-Dong
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.73-92 / 2014
  • An Adaptive Clustering-based Collaborative Filtering Technique was proposed to solve the fundamental problems of collaborative filtering, such as cold-start, scalability, and data sparsity problems. Previous collaborative filtering techniques made recommendations based on the predicted preference of a user for a particular item, using a similar item subset and a similar user subset composed from the preferences of users for items. For this reason, if the density of the user preference matrix is low, the reliability of the recommendation system decreases rapidly, and creating a similar item subset and similar user subset becomes more difficult. In addition, as the scale of service increases, the time needed to create a similar item subset and similar user subset increases geometrically, and the response time of the recommendation system increases accordingly. To solve these problems, this paper suggests a collaborative filtering technique that adapts a condition actively to the model and adopts the concepts of a context-based filtering technique. This technique consists of four major methodologies. First, the items and users are clustered according to their feature vectors, and an inter-cluster preference between each item cluster and user cluster is then assumed. With this method, the run-time for creating a similar item subset or user subset can be reduced, the reliability of the recommendation system can be made higher than when only the user preference information is used for creating a similar item subset or similar user subset, and the cold-start problem can be partially solved. Second, recommendations are made using the previously composed item and user clusters and the inter-cluster preference between each item cluster and user cluster.
In this phase, a list of items is made for users by examining the item clusters in order of the size of the inter-cluster preference of the user cluster to which the user belongs, and then selecting and ranking the items according to the predicted or recorded user preference information. With this method, the model creation phase bears the highest load of the recommendation system, which minimizes the load at run-time. Therefore, the scalability problem can be addressed, and a large-scale recommendation system can be run with highly reliable collaborative filtering. Third, the missing user preference information is predicted using the item and user clusters. With this method, the problem caused by the low density of the user preference matrix can be mitigated. Existing studies used either an item-based prediction or a user-based prediction. In this paper, Hao Ji's idea, which uses both an item-based prediction and a user-based prediction, was improved. The reliability of the recommendation service can be improved by combining the predictive values of both techniques according to the condition of the recommendation model. By predicting the user preference based on the item or user clusters, the time required to predict the user preference can be reduced, and missing user preferences can be predicted at run-time. Fourth, the item and user feature vectors are updated by learning from user feedback as it is received. This phase applies normalized user feedback to the item and user feature vectors. This method can mitigate the problems caused by the use of the concepts of context-based filtering, such as item and user feature vectors based on the user profile and item properties. The problems with using the item and user feature vectors are due to the limitation of quantifying the qualitative features of the items and users.
Therefore, the elements of the user and item feature vectors are made to match one to one, and if user feedback on a particular item is obtained, it is applied to each feature vector using the opposite one. This method was verified by comparing its performance with existing hybrid filtering techniques. Two measures were used for verification: MAE (Mean Absolute Error) and response time. Using MAE, this technique was confirmed to improve the reliability of the recommendation system. Using the response time, this technique was found to be suitable for a large-scale recommendation system. This paper suggested an Adaptive Clustering-based Collaborative Filtering Technique with high reliability and low time complexity, but it had some limitations. This technique focused on reducing the time complexity. Hence, an improvement in reliability was not expected. The next topic will be to improve this technique by rule-based filtering.
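
For readers unfamiliar with the MAE measure used in this verification, the following is a minimal sketch of how it is commonly computed over predicted and recorded preference scores. The function name and the sample ratings are hypothetical and do not come from the paper.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and recorded preferences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# Hypothetical predicted vs. recorded ratings for three user-item pairs.
mae = mean_absolute_error([4.0, 3.0, 5.0], [5.0, 3.0, 3.0])
```

A lower MAE indicates that the predicted preferences lie closer to the recorded ones.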

Classification of Music Data using Fuzzy c-Means with Divergence Kernel (분산커널 기반의 퍼지 c-평균을 이용한 음악 데이터의 장르 분류)

  • Park, Dong-Chul
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.3 / pp.1-7 / 2009
  • An approach for the classification of music genres using Fuzzy c-Means (FcM) with a divergence-based kernel is proposed in this paper. The proposed model utilizes the mean and covariance information of feature vectors extracted from music data and modelled by a Gaussian Probability Density Function (GPDF). Furthermore, since the classifier utilizes a kernel method that can convert a complicated nonlinear classification boundary to a simpler linear one, the classifier can improve its classification accuracy over conventional algorithms. Experiments on collected music data sets demonstrate that the proposed classification scheme outperforms conventional algorithms, including FcM and SOM, by 17.73%-21.84% on average in terms of classification accuracy.

Estimation of Drought Rainfall by Regional Frequency Analysis using L and LH-Moments(I) - On the Method of L-Moments - (L 및 LH-모멘트법과 지역빈도분석에 의한 가뭄우량의 추정(I) - L-모멘트법을 중심으로 -)

  • 이순혁;윤성수;맹승진;류경식;주호길
    • Magazine of the Korean Society of Agricultural Engineers / v.45 no.5 / pp.97-109 / 2003
  • This study is mainly conducted to derive the design drought rainfall by consecutive duration using probability weighted moments with rainfall in the regional drought frequency analysis. The study is expected to suggest optimal design drought rainfall of hydraulic structures for the water requirement and the drought frequency of occurrence for the safety of water utilization. First, this study derived the optimal regionalization of the precipitation data, classifying it into climatologically and geographically homogeneous regions across Korea, excluding Cheju and Ulreung islands. Five homogeneous regions in terms of topographical and climatological aspects were identified by the K-means clustering method. Using the L-moment ratio diagram and the Kolmogorov-Smirnov test, the generalized extreme value distribution was confirmed as the best fitting one among the applied distributions. At-site and regional parameters of the generalized extreme value distribution were estimated by the method of L-moments. Design drought rainfalls using L-moments for each consecutive duration were derived by the at-site and regional analysis using the observed data and simulated data generated by Monte Carlo techniques. Relative root-mean-square error (RRMSE), relative bias (RBIAS) and relative reduction (RR) in RRMSE for the design drought rainfall derived by at-site and regional analysis of the observed and simulated data were computed and compared. It is shown that the regional frequency analysis procedure can reduce the RRMSE, RBIAS and RR in RRMSE substantially more than the at-site analysis in the prediction of design drought rainfall. Consequently, optimal design drought rainfalls for each region and consecutive duration were derived by the regional frequency analysis.

Fuzzy-based Segmentation Algorithm for Brain Images (퍼지기반의 두뇌영상 영역분할 알고리듬)

  • Lee, Hyo-Jong
    • Journal of the Institute of Electronics Engineers of Korea TC / v.46 no.12 / pp.102-107 / 2009
  • As technology develops, medical equipment is also modernized, and leading-edge systems such as PACS have become popular. Many scientists have noted the importance of medical image processing technology. Region segmentation is the first step of digital medical image processing. Segmentation helps doctors find abnormalities such as tumors, edema, and necrotic tissue early and helps them diagnose correctly. Segmentation of white matter, gray matter, and CSF in a brain image is a crucial part of this process. However, the segmentation is not easy due to ambiguous boundaries and inhomogeneous physical characteristics. The rate of incorrect segmentation is high because of these difficulties. Fuzzy-based segmentation algorithms are robust even to ambiguous boundaries. In this paper, a modified fuzzy-based segmentation algorithm is proposed to handle the noise of MR scanners. The proposed algorithm requires minimal computation of the mean and variance of neighbor pixels to adjust a new neighbor list. With the addition of this minimal computation, the modified FCM (mFCM) lowers the rate of incorrect clustering to below approximately 30% compared with the traditional FCM.

Exploratory Study on the Quality Grade of Korea Black Raspberry Wines by Using Consumer Preference Data (시판 복분자주의 기호도 분석을 통한 탐색적 등급 분류)

  • Lee, Seung-Joo
    • Korean Journal of Food Science and Technology / v.46 no.3 / pp.352-357 / 2014
  • In this study, 100 consumers (men, 50; women, 50; age group, 20-50 years) rated their overall preferences for 24 Korean raspberry wines by using a 9-point hedonic scale. Analysis of variance was conducted to evaluate the effects of gender, age, and sample on the preference scores of the wine products. Significant differences were observed in overall preferences for the 24 samples; however, no interactions based on preferences by age and gender groups were noted. Cluster analysis was performed to determine sample clustering based on the frequencies from the preference data. Three clusters were obtained, and they were well separated based on the mean overall preference scores for the samples. Discriminant analysis based on the three clusters also confirmed the same grouping of samples with 100% accuracy.

Market Segmentation Based on Types of Motivations to Visit Coffee Shops (커피전문점 방문동기유형에 따른 시장세분화)

  • Lee, Yong-Sook;Kim, Eun-Jung;Park, Heung-Jin
    • The Korean Journal of Franchise Management / v.7 no.1 / pp.21-29 / 2016
  • Purpose - The primary purpose of this study is to support effective marketing through market segmentation of coffee shops by determining how the demographic profile of visitors and the characteristics of coffee shop visits differ by motivation to visit coffee shops, so as to gain a better understanding of customers in the coffee market. Research design, data, and methodology - Data were collected using self-administered questionnaires given to coffee shop users in Daejeon, Korea. A total of 253 samples were used in the data analysis, excluding unusable responses. The data were analyzed through frequency, reliability, and factor analysis using SPSS 20.0. Factor analysis was conducted through principal component analysis with the varimax rotation method to derive factors with eigenvalues of one or more. In addition, cluster analysis, multivariate ANOVA, and cross-tab analysis were used for the market segmentation based on the types of motivation for coffee shop visits. The process of the cluster analysis is as follows. Four clusters were derived through hierarchical clustering, and k-means cluster analysis was then carried out using the mean values of the four clusters as the initial seed values. Result - The factor analysis delineated four dimensions of motivation to visit coffee shops: ostentation motivation, hedonic motivation, esthetic motivation, and utility motivation. The cluster analysis yielded four clusters: utility and esthetic seekers, hedonic seekers, utility seekers, and ostentation seekers. In order to further specify the profile of the four clusters, each cluster was cross-tabulated with socio-demographics and characteristics of coffee shop visits. The four clusters are significantly different from each other on the four types of motivation for coffee shop visits. Conclusions - This study has empirically examined the differences in the demographic profile of visitors and the characteristics of coffee shop visits by motivation to visit coffee shops.
There are significant differences according to age, educational background, marital status, occupation, and monthly income. In addition, coffee shop usage patterns, namely frequency of visits, relationship with companions, purpose of visit, information sources, brand type, average expense per visit, and important selection attributes, were significantly different depending on motivations for coffee shop visits.
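
The two-stage clustering procedure described in this abstract (a preliminary grouping whose cluster means seed a k-means run) can be sketched as follows. This is a simplified illustration in plain Python, not the SPSS procedure the authors used: the preliminary labels are assumed to come from a prior hierarchical clustering step that is not shown, and the data points, function names, and two-cluster setup are hypothetical.

```python
from math import dist

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans_from_seeds(points, seeds, max_iter=100):
    """Run Lloyd-style k-means starting from the given seed centers."""
    centers = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = centroid(members)
    return labels, centers

# Hypothetical respondent scores on two motivation factors.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (1.0, 0.0)]
# Preliminary grouping, assumed to come from hierarchical clustering.
prelim = [0, 0, 1, 1, 1]
seeds = [centroid([p for p, g in zip(points, prelim) if g == k]) for k in (0, 1)]
labels, centers = kmeans_from_seeds(points, seeds)
```

In this toy run the last point, which the preliminary grouping placed in the second cluster, is reassigned to the first cluster by the k-means refinement. Seeding from cluster means in this way makes the k-means result deterministic, unlike random initialization.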