• Title/Summary/Keyword: K-Mean++ Clustering

Probabilistic reduced K-means cluster analysis (확률적 reduced K-means 군집분석)

  • Lee, Seunghoon;Song, Juwon
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.905-922 / 2021
  • Cluster analysis is one of the unsupervised learning techniques used to discover clusters when there is no prior knowledge of group membership. K-means, one of the commonly used cluster analysis techniques, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis, that is, K-means cluster analysis after reducing the number of variables with a dimension reduction method. However, there is no guarantee that the reduced dimensions reveal the cluster structure properly. Principal component analysis may mask the cluster structure, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been suggested. This study proposes probabilistic reduced K-means, which recasts reduced K-means (De Soete and Carroll, 1994) in a probabilistic framework. Simulations show that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means forms clusters better than non-probabilistic reduced K-means. In an application to a real data set, it revealed a cluster structure similar to or better than that of other methods.
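
The masking effect that the abstract attributes to tandem analysis can be illustrated with a small sketch. It assumes synthetic two-variable data and a plain Lloyd's K-means written here for illustration (the helpers `kmeans` and `accuracy` are hypothetical names); it is not the authors' probabilistic reduced K-means.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def accuracy(labels, truth):
    """Two-cluster agreement, invariant to swapping the label names."""
    acc = np.mean(labels == truth)
    return max(acc, 1.0 - acc)

rng = np.random.default_rng(42)
n = 100  # samples per cluster (assumed)
truth = np.repeat([0, 1], n)
# Variable 0 carries the cluster structure; variable 1 is high-variance noise.
signal = np.where(truth == 0, -2.0, 2.0) + rng.normal(0.0, 0.5, 2 * n)
noise = rng.normal(0.0, 10.0, 2 * n)
X = np.column_stack([signal, noise])

# Tandem step 1: PCA via SVD of the centered data, keeping one component.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Vt[0]
scores = Xc @ pc1

# Tandem step 2: K-means on the reduced data, compared with K-means on the
# informative variable alone.
labels_tandem, _ = kmeans(scores[:, None], 2)
labels_signal, _ = kmeans(X[:, [0]], 2)
acc_tandem = accuracy(labels_tandem, truth)
acc_signal = accuracy(labels_signal, truth)
print("PC1 loadings:", np.round(pc1, 3))
print("tandem accuracy:", acc_tandem, "signal-only accuracy:", acc_signal)
```

In this toy setting the first principal component aligns with the noise variable, so clustering the reduced data recovers little of the true grouping, which is the motivation the abstract gives for reducing dimensions and clustering simultaneously.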

The Study of Genetic Diversity and Population Structure of the Korean Fleshy Shrimp, Fenneropenaeus chinensis, Using Newly Developed Microsatellite Markers (새로 개발한 미세위성체 마커를 이용한 한국 대하의 유전다양성 및 집단구조)

  • Shin, Eun-Ha;Kong, Hee Jeong;Nam, Bo-Hye;Kim, Young-Ok;Kim, Bong-Seok;Kim, Dong-Gyun;An, Cheul Min;Jung, Hyungtaek;Kim, Woo-Jin
    • Journal of Life Science / v.25 no.12 / pp.1347-1353 / 2015
  • The fleshy shrimp, Fenneropenaeus chinensis, belongs to the family Penaeidae and is one of the most economically important marine culture species in Korea. However, its genetic characteristics have never been studied. In this study, a total of 240 wild F. chinensis individuals were collected from four locations: Narodo (NRD, n = 60), Beopseongpo (BSP, n = 60), Chaesukpo (CSP, n = 60), and Cheonsuman (CSM, n = 60). Genetic variability and the relationships among the four wild F. chinensis populations were analyzed using 13 newly developed microsatellite loci. Relatively high levels of genetic variability (mean allelic richness = 16.87; mean heterozygosity = 0.845) were found among localities. Among the 52 population-locus combinations (4 populations × 13 loci), 13 showed significant deviation from Hardy–Weinberg equilibrium. Neighbor-joining, principal coordinate, and molecular variance analyses revealed the presence of three subpopulations (NRD; CSM; and BSP together with CSP), which was consistent with clustering based on genetic distance. The mean observed heterozygosity values of the NRD, CSM, BSP, and CSP populations were 0.724, 0.821, 0.814, and 0.785 over all loci, respectively. These results on genetic variability and differentiation in the four wild populations can be applied to future genetic improvement through selective breeding and to the design of suitable management guidelines for Korean F. chinensis culture.
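
The heterozygosity figures quoted in the abstract can be made concrete with a short sketch. It assumes the standard definitions (observed heterozygosity as the fraction of heterozygous individuals, expected heterozygosity as one minus the sum of squared allele frequencies, without a small-sample correction); the genotypes are hypothetical and the function name `heterozygosity` is introduced only for illustration.

```python
from collections import Counter

def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_a, allele_b) pairs, one per diploid individual.
    """
    n = len(genotypes)
    # Observed: share of individuals carrying two different alleles.
    observed = sum(a != b for a, b in genotypes) / n
    # Expected under Hardy-Weinberg: 1 - sum of squared allele frequencies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected

# Hypothetical microsatellite genotypes (allele sizes in base pairs),
# not data from the study.
sample = [(150, 152), (150, 150), (152, 154), (150, 154)]
ho, he = heterozygosity(sample)
print("observed:", ho, "expected:", he)
```

A large gap between the two values at a locus is the kind of signal that a Hardy–Weinberg test, such as the one reported in the abstract, evaluates formally.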

Multi-Objective Optimization of a Dimpled Channel Using NSGA-II (NSGA-II를 통한 딤플채널의 다중목적함수 최적화)

  • Lee, Ki-Don;Samad, Abdus;Kim, Kwang-Yong
    • Korean Society for Computational Fluids Engineering: Conference Proceedings (한국전산유체공학회:학술대회논문집) / 2008.03b / pp.113-116 / 2008
  • This work presents a numerical optimization of the design of staggered arrays of dimples printed on opposite surfaces of a cooling channel, using the fast and elitist Non-dominated Sorting Genetic Algorithm (NSGA-II) for multi-objective optimization. Because the Pareto-optimal front yields a set of optimal solutions, the trends of the objective functions with respect to the design variables are predicted by a hybrid multi-objective evolutionary algorithm. The problem is defined by three non-dimensional geometric design variables formed from the dimpled channel height, dimple print diameter, dimple spacing, and dimple depth, with the aim of maximizing the heat transfer rate while compromising with the pressure drop. Twenty designs generated by Latin hypercube sampling were evaluated with a Reynolds-averaged Navier-Stokes solver, and the evaluated objectives were used to construct the Pareto-optimal front through the hybrid multi-objective evolutionary algorithm. The optimum designs were grouped by the k-means clustering technique, and some of the clustered points were evaluated by flow analysis. As the dimple depth increases, both the heat transfer rate and the pressure drop increase, while the opposite behavior is obtained for the dimple spacing. The heat transfer performance is related to the vertical motion of the flow and the reattachment length in the dimple.
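
The two ideas combined in this abstract, extracting a Pareto-optimal front and then grouping it with k-means, can be sketched as follows. The design values are hypothetical, `pareto_mask` is an illustrative helper, and the sketch uses a brute-force dominance check rather than NSGA-II itself.

```python
import numpy as np

def pareto_mask(F):
    """Boolean mask of the non-dominated rows of F; every column is minimized."""
    mask = np.ones(len(F), dtype=bool)
    for i in range(len(F)):
        # A row dominates row i if it is no worse everywhere and strictly
        # better in at least one objective.
        dominators = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        mask[i] = not dominators.any()
    return mask

# Hypothetical designs: columns are (heat transfer, to be maximized;
# pressure drop, to be minimized).
designs = np.array([
    [1.0, 1.0], [1.5, 1.2], [1.4, 1.6],
    [2.0, 2.0], [1.9, 2.5], [2.4, 3.0],
])
# Negate the maximized objective so that both columns are minimized.
F = np.column_stack([-designs[:, 0], designs[:, 1]])
front = designs[pareto_mask(F)]
front = front[np.argsort(front[:, 0])]

# Group the front with a two-cluster K-means, initialized at its end points.
centers = front[[0, -1]].copy()
for _ in range(20):
    dist = ((front[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    centers = np.array([front[labels == j].mean(axis=0) for j in range(2)])
print(front)
print(labels)
```

Grouping the front this way lets a few representative designs per cluster be examined in detail, which matches the abstract's statement that only some of the clustered points were evaluated by flow analysis.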

VAD By Neural Network Under Wireless Communication Systems (Neural Network을 이용한 무선 통신시스템에서의 VAD)

  • Lee Hosun;Kim Sukyung;Park Sung-Kwon
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.12C / pp.1262-1267 / 2005
  • An elliptical basis function (EBF) neural network works stably in environments with high-level background noise and makes nonlinear processing possible. It can be adapted to real-time voice activity detection (VAD) with a simple design. This paper introduces a VAD implementation using EBF, and the experimental results show that the EBF VAD outperforms G.729 Annex B and RBF neural networks. The best error rates achieved by the EBF networks improved by more than 70% in speech and 50% in silence compared with those achieved by G.729 Annex B and RBF networks, respectively.

Voice Activity Detection Algorithm base on Radial Basis Function Networks with Dual Threshold (Radial Basis Function Networks를 이용한 이중 임계값 방식의 음성구간 검출기)

  • Kim Hong Ik;Park Sung Kwon
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.12C / pp.1660-1668 / 2004
  • This paper proposes a Voice Activity Detection (VAD) algorithm based on a Radial Basis Function (RBF) network using a dual threshold. K-means clustering and the Least Mean Square (LMS) algorithm are used to update the RBF network to the underlying speech condition. The inputs to the RBF network are three parameters from a Code Excited Linear Prediction (CELP) coder, which works stably under various background noise levels. A dual hangover threshold is applied in the RBF-VAD to reduce errors, because the threshold value involves a trade-off in the VAD decision. The experimental results show that the proposed VAD algorithm achieves better performance than G.729 Annex B at any noise level.
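
The combination described above, K-means to place the RBF centers and LMS to adapt the output weights, can be sketched on synthetic data. Everything specific here is an assumption: the three-dimensional features are random stand-ins for the CELP parameters, the Gaussian width and step size are arbitrary, and a single 0.5 decision threshold replaces the paper's dual hangover threshold, which is not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 3-dimensional frame features for silence and speech frames.
silence = rng.normal(0.0, 0.3, size=(200, 3))
speech = rng.normal(2.0, 0.3, size=(200, 3))
X = np.vstack([silence, speech])
y = np.concatenate([np.zeros(200), np.ones(200)])  # 1 = speech frame

def kmeans_centers(X, k, n_iter=20):
    """Place k RBF centers with plain Lloyd's K-means."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

centers = kmeans_centers(X, k=4)
width = 1.0  # assumed common Gaussian width

def hidden_layer(X):
    """Gaussian RBF activations plus a constant bias unit."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((len(X), 1))])

Phi = hidden_layer(X)
w = np.zeros(Phi.shape[1])
mu = 0.1  # assumed LMS step size
for epoch in range(20):
    for i in rng.permutation(len(X)):
        error = y[i] - w @ Phi[i]
        w += mu * error * Phi[i]  # LMS weight update

decisions = (Phi @ w > 0.5).astype(float)
accuracy = np.mean(decisions == y)
print("training accuracy:", accuracy)
```

Because LMS updates the weights one frame at a time, the same loop could in principle keep running on incoming frames, which is one way to read the abstract's remark about updating the network to the underlying speech condition.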

Performance Evaluation of Pilotless Channel Estimation with Limited Number of Data Symbols in Frequency Selective Channel

  • Wang, Hanho
    • International Journal of Contents / v.14 no.2 / pp.1-6 / 2018
  • In wireless mobile communication systems, a pilot signal has been considered necessary for estimating the changing channel between a base station and a terminal. All mobile communication systems developed so far specify the transmission of pilot signals. However, although transmitting a pilot signal makes the channel easy to estimate, pilot transmission should be minimized because it consumes radio resources that could otherwise carry data. In this paper, we propose a pilotless channel estimation (PCE) scheme that introduces a clustering method of unsupervised learning, as used in deep learning, into channel estimation. The PCE estimates the channel using only the data symbols, without using a pilot signal at all. Also, to apply PCE to a real system, we evaluated its performance on the basis of the resource block (RB), which is the resource allocation unit used in LTE. According to the results of this study, the PCE always provides better mean square error (MSE) performance than the least squares estimator using pilots, although it does not use a pilot signal at all. The MSE performance of the PCE is affected by the number of data symbols used and the frequency selectivity of the channel. In this paper, we provide simulation results that consider these effects.
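
The idea of estimating a channel from data symbols alone can be illustrated with a toy sketch. It is not the paper's scheme: it assumes a single flat (non-frequency-selective) channel coefficient, QPSK data, and a channel phase smaller than 45 degrees so that initializing K-means at the nominal constellation resolves the rotational ambiguity. All numeric values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
qpsk = np.array([1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]) / np.sqrt(2)

h_true = 0.8 * np.exp(1j * 0.3)        # hypothetical flat channel coefficient
symbols = rng.choice(qpsk, size=200)   # unknown data symbols, no pilots
noise = 0.1 * (rng.normal(size=200) + 1j * rng.normal(size=200)) / np.sqrt(2)
received = h_true * symbols + noise

# K-means in the complex plane, started from the nominal constellation.
centroids = qpsk.copy()
for _ in range(20):
    labels = np.abs(received[:, None] - centroids[None, :]).argmin(axis=1)
    for k in range(4):
        if np.any(labels == k):
            centroids[k] = received[labels == k].mean()

# Each centroid approximates h * s_k, so dividing by s_k and averaging
# gives a channel estimate.
h_est = np.mean(centroids / qpsk)
mse = np.abs(h_est - h_true) ** 2
print("estimate:", h_est, "squared error:", mse)
```

Fewer received symbols leave each centroid averaged over fewer points, which is consistent with the abstract's observation that the MSE depends on the number of data symbols used.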

The Study of Land Surface Change Detection Using Long-Term SPOT/VEGETATION (장기간 SPOT/VEGETATION 정규화 식생지수를 이용한 지면 변화 탐지 개선에 관한 연구)

  • Yeom, Jong-Min;Han, Kyung-Soo;Kim, In-Hwan
    • Journal of the Korean Association of Geographic Information Studies / v.13 no.4 / pp.111-124 / 2010
  • Monitoring land surface change is considered an important research field because it is related to land use, climate change, meteorological study, agriculture modulation, surface energy balance, and the surface environment system. For change detection, many different methods have been presented to provide more detailed information, using tools that range from ground-based measurement to satellite multi-spectral sensors. Recently, using high-resolution satellite data has been considered the most efficient way to monitor extensive land environmental systems, especially at higher spatial and temporal resolution. In this study, we use satellites with two different spatial resolutions. One is SPOT/VEGETATION, with 1 km spatial resolution, which is used to detect area change at coarse resolution and to determine an objective threshold. The other is Landsat, whose high resolution is used to identify detailed land environmental change. Depending on their spatial resolution, the two show different observation characteristics, such as repeat cycle and global coverage. By correlating the two kinds of satellites, we can detect land surface change from mid resolution to high resolution. The K-means clustering algorithm is applied to detect changed areas between two images taken at different times. When solar spectral bands are used, complicated surface reflectance scattering characteristics make surface change detection difficult. This effect can lead to serious problems when interpreting surface characteristics. For example, even when the reflectance of the surface itself is constant, the observed value can change with the relative positions of the sun and the sensor. To reduce these effects, this study uses long-term Normalized Difference Vegetation Index (NDVI) data, derived from SPOT/VEGETATION solar spectral channels after atmospheric and bi-directional correction, to provide an objective threshold value for detecting land surface change, since NDVI is less sensitive to solar geometry than the solar channels are. Surface change detection based on long-term NDVI shows better results than using Landsat alone.
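
A minimal sketch of NDVI-based change detection with K-means follows. It uses the standard NDVI definition, (NIR - red) / (NIR + red), on a synthetic scene with hypothetical reflectance values, and it clusters the absolute NDVI difference between two dates into two groups. The long-term threshold derivation and the atmospheric and bi-directional corrections described in the abstract are not modeled.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

rng = np.random.default_rng(1)
shape = (20, 20)

def noisy(level):
    """Constant reflectance plus small sensor noise (assumed values)."""
    return level + rng.normal(0.0, 0.005, shape)

# Hypothetical vegetated scene observed at two dates.
red_t1, nir_t1 = noisy(0.05), noisy(0.45)
red_t2, nir_t2 = noisy(0.05), noisy(0.45)
# A 5x5 patch is cleared between the dates: higher red, lower NIR.
red_t2[5:10, 5:10] = 0.20 + rng.normal(0.0, 0.005, (5, 5))
nir_t2[5:10, 5:10] = 0.25 + rng.normal(0.0, 0.005, (5, 5))

diff = np.abs(ndvi(nir_t2, red_t2) - ndvi(nir_t1, red_t1)).ravel()

# Two-cluster 1-D K-means on the NDVI differences, initialized at the extremes.
centers = np.array([diff.min(), diff.max()])
for _ in range(20):
    labels = np.abs(diff[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([diff[labels == j].mean() for j in range(2)])

# The cluster with the larger mean difference is taken as "changed".
changed = (labels == centers.argmax()).reshape(shape)
print("changed pixels:", int(changed.sum()))
```

Because NDVI is a ratio of band differences to band sums, a common scaling of both bands cancels out, which gives some intuition for the abstract's claim that NDVI is less sensitive to solar geometry than the individual channels.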

A Study on the Prediction System of Block Matching Rework Time (블록 정합 재작업 시수 예측 시스템에 관한 연구)

  • Jang, Moon-Seuk;Ruy, Won-Sun;Park, Chang-Kyu;Kim, Deok-Eun
    • Journal of the Society of Naval Architects of Korea / v.55 no.1 / pp.66-74 / 2018
  • To evaluate the precision of blocks on the dock, shipyards have recently started to use point cloud approaches based on 3D scanners. However, they hesitate to adopt them because of the limited time, the cost, and the laborious effort required for post-processing. Instead, they still use the more traditional electro-optical wave devices, which produce a less dense point set (usually 1 point per meter) around the contact section of two blocks. This paper tries to expand the usage of such point sets. Our approach can estimate the rework time for welding between the Pre-Erected (PE) block and the Erected (ER) block, as well as the precision of block construction. In detail, two algorithms were applied to increase the efficiency of the estimation process. The first is the K-means clustering algorithm, which is used to separate the relevant contact point set from points unrelated to the welding sections. The second is the concave hull algorithm, which separates the inner points of the contact section, used for delayed outfitting and stiffener sections, and constructs the concave outline of the contact section as the primary object for estimating the welding rework time. The main purpose of this paper is to allow the welding rework cost to be obtained easily and precisely from a defective point set. Approximating the point set on a block's outline with mathematical curves is challenging because of the many orthogonal parts and the small number of points. To solve this problem, we compared the Radial Basis Function-Multi-Layer (RBF-ML) and Akima interpolation methods. Combining the proposed methods, the paper suggests a novel point matching method for minimizing the rework time of block welding on the dock, unlike the previous approach, which paid attention only to the degree of accuracy.

Delineation of Provenance Regions of Forests Based on Climate Factors in Korea (기상인자(氣象因子)에 의한 우리 나라 산림(山林)의 산지구분(産地區分))

  • Choi, Wan Yong;Tak, Woo Sik;Yim, Kyong Bin;Jang, Suk Seong
    • Journal of Korean Society of Forest Science / v.88 no.3 / pp.379-388 / 1999
  • As a first step toward delineating the provenance regions of forest trees in Korea, horizontal zones were deduced primarily from various climatic factors: annual mean temperature, extremely low temperature, relative humidity, annual sum of possible growing days, duration of sunshine, and dry index. The delineation of the provenance regions was based on ecological regions, which was likely to be more practical than delineation based on typical provenance regions at the species level. The primary classification of the regions was based on the forest zones (sub-tropical, warm-temperate, mid-temperate, and cool-temperate) as broad geographic regions. Further classification was carried out using cluster analyses among the basic regions within each forest zone. On the basis of the clustering, a total of 19 regions were horizontally delineated: 3 sub-tropical, 6 warm-temperate, 8 mid-temperate, and 2 cool-temperate. Of the mean values of the 6 climate factors at the broad geographic region level, three factors (annual mean temperature, extremely low temperature, and annual growing days) showed directional tendencies from sub-tropical to cool-temperate, while the others did not. The values of relative humidity, duration of sunshine, and dry index varied among the provenance regions within a forest zone. These three factors might be more sensitive to micro-environmental conditions than to macro-environmental conditions. The present study aimed to delineate the primary provenance regions for tentative application to forest practices. These regions will be revised stepwise as information on genecological data accumulates.

A method for learning users' preference on fuzzy values using neural networks and k-means clustering (신경망과 k-means 클러스터링을 이용한 사용자의 퍼지값 선호도 학습 방법)

  • Yoon, Tae-Bok;Na, Hyun-Jong;Park, Doo-Kyung;Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems / v.16 no.6 / pp.716-720 / 2006
  • Fuzzy sets are good for abstracting and unifying information using natural-language-like terms. However, fuzzy sets embody vagueness, and because users may have different attitudes toward that vagueness, each user may choose a different one as the best among several fuzzy values. In this paper, we develop a method for learning a user's preference on fuzzy values and selecting the one that fits that preference. Users' preferences are modeled with artificial neural networks. We gather learning data from users by asking them to choose the better of two fuzzy values in several representative cases of comparing two fuzzy sets. To establish the representative comparison cases, we enumerate more than 600 cases and cluster them into several groups. Neural networks are trained with the users' answers and the two given fuzzy values in each case. Experiments show that the proposed method produces outputs closer to users' preferences than other methods.