• Title/Summary/Keyword: K-mean cluster analysis

Analysis of spatial mixing characteristics of water quality at the confluence using artificial intelligence (인공지능을 활용한 합류부에서 수질의 공간혼합 특성 분석)

  • Lee, Seo Gyeong;Kim, Dongsu;Kim, Kyungdong;Kim, Young Do;Lyu, Siwan
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2022.05a
    • /
    • pp.482-482
    • /
    • 2022
  • At a river confluence, waters of differing quality mix and show characteristics different from those upstream of the confluence. To manage water quality efficiently at a confluence, it is important to identify the spatial mixing characteristics of water quality. To analyze these characteristics, this study used three techniques: topological data analysis (TDA), the Self-Organizing Map (SOM), and the k-means clustering algorithm. The three techniques were compared to determine which algorithm reveals the water quality changes at the confluence most clearly. The water quality parameters compared included pH, chlorophyll, DO, and turbidity, all measured with a YSI instrument. The measurement site was the confluence of the Nakdong River and the Hwang River, where the YSI instrument was mounted on a boat and measurements were taken along cross-river transects. The three techniques were applied to the measured data in R to compare the water quality changes. TDA is used to extract meaningful information from large and complex data, and SOM performs dimensionality reduction and clustering simultaneously. The k-means algorithm is an unsupervised machine learning algorithm that groups the given data into k clusters. The main purpose of all three methods is clustering. Cluster analysis is a data mining method that classifies objects into several groups of similar nature based on the characteristics of the given data. Clustering the water quality characteristics of the confluence area with TDA, SOM, and k-means and analyzing the resulting water quality patterns should help prevent river water pollution. If the characteristics of a confluence can be identified through clustering methods, the water quality change patterns found at this confluence are expected to be applicable to other confluence areas.
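
The grouping of data into k clusters described in the abstract can be illustrated with a minimal sketch of Lloyd's k-means iteration. The study itself applied its methods in R; the Python code, the function name `kmeans`, and the two-feature sample values below are assumptions made only for illustration and are not the authors' data or implementation.

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm for k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct observations
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # assignments and centroids no longer change
        labels, centroids = new_labels, new_centroids
    return centroids, labels

# Hypothetical (pH, DO) readings from two water bodies; not measured data.
points = [(7.1, 8.0), (7.2, 8.2), (7.0, 7.9),
          (8.4, 5.1), (8.5, 5.0), (8.3, 5.3)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

In practice the features would usually be standardized first so that no single parameter dominates the distance.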

Development of SSR markers for classification of Flammulina velutipes strains (팽이버섯 (Flammulina velutipes) 계통의 분류를 위한 SSR 마커개발)

  • Woo, Sung-I;Seo, Kyoung-In;Jang, Kab yeul;Kong, Won-Sik
    • Journal of Mushroom
    • /
    • v.15 no.2
    • /
    • pp.78-83
    • /
    • 2017
  • Microsatellite (SSR) markers were developed and used to reveal the genetic diversity of 32 strains of Flammulina velutipes collected in Korea, China, and Japan. From the SSR-enriched library, 490 white colonies were randomly selected and sequenced. Among the 490 sequenced clones, 85 (17.35%) were redundant. Among the remaining 405 unique clones, 201 (49.6%) contained microsatellite sequences. We used 12 primer pairs that produced reproducible polymorphic bands for four diverse strains, and these selected markers were further characterized in the 32 Flammulina velutipes strains. A total of 34 alleles were detected using the 12 markers, with an average of 3.42 alleles, and the number of alleles ranged from two to seven per locus. The major allele frequency ranged from 0.42 (GB-FV-127) to 0.98 (GB-FV-166), and values for observed ($H_O$) and expected ($H_E$) heterozygosity ranged from 0.00 to 0.94 (mean = 0.18) and from 0.03 to 0.67 (mean = 0.32), respectively. SSR loci amplified with the GB-FV-127 marker gave the highest polymorphism information content (PIC), 0.61, with five alleles, whereas loci amplified with the GB-FV-166 marker gave the lowest values, namely 0.03 and two alleles. The mean PIC value observed in the present study was 0.29, with an average of 3.42 alleles. The genetic relationships among the 32 Flammulina velutipes strains were investigated from the SSR data by UPGMA cluster analysis. In conclusion, we succeeded in developing 12 polymorphic SSR markers from an SSR-enriched library of Flammulina velutipes. These SSRs are presently being used for phylogenetic analysis and the evaluation of genetic variation. In the future, these SSR markers will be used to clarify taxonomic relationships among Flammulina velutipes strains.
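
As an illustration of the diversity statistics reported above, the sketch below computes expected heterozygosity and PIC from the allele frequencies at one locus. It assumes the commonly used definitions, $H_E = 1 - \sum_i p_i^2$ and $PIC = 1 - \sum_i p_i^2 - \sum_{i<j} 2 p_i^2 p_j^2$. The abstract does not state which formulas or software the authors used, and the frequencies in the example are hypothetical.

```python
def expected_heterozygosity(freqs):
    """H_E = 1 - sum(p_i^2), without a small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphism information content for one locus."""
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return expected_heterozygosity(freqs) - cross

# Hypothetical allele frequencies for a three-allele locus (must sum to 1).
freqs = [0.5, 0.3, 0.2]
he_value = expected_heterozygosity(freqs)
pic_value = pic(freqs)
print(round(he_value, 4), round(pic_value, 4))
```

PIC is never larger than $H_E$ for the same locus, which is consistent with the mean values reported in the abstract (0.29 and 0.32).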

Classification of Weather Patterns in the East Asia Region using the K-means Clustering Analysis (K-평균 군집분석을 이용한 동아시아 지역 날씨유형 분류)

  • Cho, Young-Jun;Lee, Hyeon-Cheol;Lim, Byunghwan;Kim, Seung-Bum
    • Atmosphere
    • /
    • v.29 no.4
    • /
    • pp.451-461
    • /
    • 2019
  • Medium-range forecasting is highly dependent on ensemble forecast data. However, operational weather forecasters do not have enough time to digest all the detailed features revealed in ensemble forecast data. To use the ensemble data effectively in medium-range forecasting, this study defines representative weather patterns in East Asia. K-means clustering analysis is applied so that the weather patterns are defined objectively. The input data were daily Mean Sea Level Pressure (MSLP) anomalies from the ECMWF ReAnalysis-Interim (ERA-Interim) for 1981~2010 (30 years), provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). Using the Explained Variance (EV), the optimal study area is defined as 20~60°N, 100~150°E. The number of clusters determined from the Explained Cluster Variance (ECV) is thirty (k = 30). The 30 representative weather patterns and their frequencies are summarized. Weather pattern #1 occurred in all seasons, with about 56% of its occurrences in summer (June~September). The relatively rare weather pattern #30 occurred mainly in winter. Additionally, we investigate the relationship between weather patterns and extreme weather events such as heat waves, cold waves, heavy rainfall, and snowfall. The weather patterns associated with heavy rainfall exceeding 110 mm day$^{-1}$ were #1, #4, and #9, each accounting for more than 10% of such days. Heavy snowfall events exceeding 24 cm day$^{-1}$ mainly occurred in weather patterns #28 (4%) and #29 (6%). High and low temperature events (> 34℃ and < -14℃) were associated with weather patterns #1~4 (14~18%) and #28~29 (27~29%), respectively. These results suggest that the classification of weather patterns can serve as a reference for grouping ensemble forecast data, which will be useful for scenario-based medium-range ensemble forecasting in the future.
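
The abstract does not define how the Explained Cluster Variance is calculated. A minimal sketch, assuming the common definition of one minus the ratio of the within-cluster sum of squares to the total sum of squares, is given below. The function name `explained_variance` and the sample points are hypothetical.

```python
def explained_variance(points, labels):
    """1 - (within-cluster sum of squares / total sum of squares)."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sum_sq(rows, center):
        return sum(sum((x - c) ** 2 for x, c in zip(row, center)) for row in rows)

    total_ss = sum_sq(points, mean(points))
    within_ss = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        within_ss += sum_sq(members, mean(members))
    return 1.0 - within_ss / total_ss

# Hypothetical two-dimensional points forming two tight groups.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
ev_two = explained_variance(points, [0, 0, 1, 1])  # two clusters
ev_one = explained_variance(points, [0, 0, 0, 0])  # one cluster explains nothing
print(ev_two, ev_one)
```

Under this assumed definition, the score rises toward 1 as k increases, so k is typically chosen where further clusters add little.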

Segmentation of Middle and High Class Chinese Women in their 20's and 30's based on Clothing Purchasing Motive (의복구매동기에 의한 중국 20~30대 중·상류층 여성소비자시장 세분화)

  • Park Hye Won;Zhang Chun Ji
    • Journal of the Korean Home Economics Association
    • /
    • v.43 no.4 s.206
    • /
    • pp.49-63
    • /
    • 2005
  • The purposes of this study were to segment Chinese consumers by clothing purchase motive, and then to analyze and compare clothing purchasing behavior among the segmented groups. The subjects were 655 middle and high class career women in their 20's and 30's living in Beijing, Shanghai, Shenzhen, and Changchun. A total of 655 questionnaires were analyzed using frequency, mean, factor analysis, ANOVA, Duncan's multiple range test, cluster analysis, and the $\chi^2$ test. The results were as follows: 1. Chinese consumers were segmented into a clothing high-involvement group, a fashion pursuing group, a practicality pursuing group, and a characterless group. 2. Clothing purchase behavior variables such as purchasing motive, information sources used, clothing selection standards, store selection standards, purchasing place, satisfaction after purchasing clothes, price of purchase, shopping time, shopping companion, and payment method were significantly different among the 4 segmented groups. 3. Demographic variables such as city, marital status, total monthly income, and average monthly expenditure on clothing were significantly different among the 4 segmented groups.

Construction of Large Library of Protein Fragments Using Inter Alpha-carbon Distance and Binet-Cauchy Distance (내부 알파탄소간 거리와 비네-코시 거리를 사용한 대규모 단백질 조각 라이브러리 구성)

  • Chi, Sang-mun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.12
    • /
    • pp.3011-3016
    • /
    • 2015
  • Representing a protein's three-dimensional structure as a concatenated sequence of protein fragments is efficient for the analysis, modeling, search, and prediction of protein structures. This paper investigated an effective combination of distance measures that can exploit a large protein structure database to construct a protein fragment library that represents native protein structures accurately. A clustering method was used to construct the library. The initial clustering stage used the inter alpha-carbon distance, which has low time complexity, and the cluster extension stage used a combination of the inter alpha-carbon distance, the Binet-Cauchy distance, and the root mean square deviation. The protein fragment library was constructed from a large protein structure database using the proposed combination of distance measures. In experiments representing protein structures with protein fragments, the library yielded a low root mean square deviation.
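
The abstract does not give the formula for the inter alpha-carbon distance. One plausible reading, sketched below as an assumption, compares two fragments of equal length through the root mean square difference of their internal Cα-Cα distances. This measure needs no structural superposition, which would be consistent with the low time complexity mentioned above. The function names and coordinates are hypothetical.

```python
import math
from itertools import combinations

def internal_distances(fragment):
    """All pairwise distances between the C-alpha atoms of one fragment."""
    return [math.dist(a, b) for a, b in combinations(fragment, 2)]

def inter_ca_dissimilarity(frag_a, frag_b):
    """RMS difference between corresponding internal C-alpha distances."""
    da, db = internal_distances(frag_a), internal_distances(frag_b)
    if len(da) != len(db):
        raise ValueError("fragments must contain the same number of atoms")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))

# Hypothetical three-residue fragments (coordinates in angstroms).
straight = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in straight]  # same shape, translated
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]

same_shape = inter_ca_dissimilarity(straight, shifted)
different_shape = inter_ca_dissimilarity(straight, bent)
print(same_shape, different_shape)
```

Because only internal distances are compared, a rigid translation or rotation of a fragment leaves the value unchanged, whereas a change of shape increases it.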

Concentration Variation of Atmospheric Radon and Gaseous Pollutants Related to the Airflow Transport Pathways during 2010~2015 (대기 라돈 및 기체상 오염물질의 기류 이동경로별 농도변화: 2010~2015년 측정)

  • Song, Jung-Min;Kim, Ki-Ju;Bu, Jun-Oh;Kim, Won-Hyung;Kang, Chang-Hee;Chambers, S.
    • Journal of Korean Society for Atmospheric Environment
    • /
    • v.34 no.2
    • /
    • pp.321-330
    • /
    • 2018
  • Concentrations of atmospheric radon and gaseous pollutants were measured at the Gosan site on Jeju Island from 2010 to 2015 in order to observe their time-series variation and to examine how concentrations change with the airflow transport pathway. Based on real-time monitoring, the daily mean concentrations of radon ($^{222}Rn$) and the gaseous pollutants ($SO_2$, CO, $O_3$, $NO_x$) were $2,400mBq\;m^{-3}$ and 1.3, 377.6, 41.1, and 3.9 ppb, respectively. In the monthly variation of radon, the mean concentration was highest in October at $3,033mBq\;m^{-3}$, about twice that in July ($1,452mBq\;m^{-3}$). The diurnal variation of radon concentration was bimodal, with peaks in the early morning (around 7 a.m.) and near midnight, whereas the lowest concentration was recorded at around 3 p.m. Several gaseous pollutants ($SO_2$, CO, $NO_x$) showed a seasonal variation similar to that of radon, high in winter and low in summer, whereas the $O_3$ concentrations showed a somewhat different seasonal trend. According to the cluster back trajectory analysis, the frequencies of airflow pathways from continental North China, East China, Japan and the East Sea, the Korean Peninsula, and the North Pacific Ocean were 36, 37, 10, 13, and 4%, respectively. When the airflow moved to Jeju Island from continental China, the concentrations of radon and gaseous pollutants were relatively high. On the other hand, when the airflow moved from the North Pacific Ocean and the East Sea, the concentrations were much lower than those from continental China.

Types of Smoking Decision Making-Temptation in Adolescents and Related Characteristics (청소년기 흡연의사결정-유혹 유형과 유형별 흡연 관련 특성)

  • Chang, Sung-Ok;Song, Jun-Ah;Lee, Su-Jung
    • Journal of Korean Academy of Fundamentals of Nursing
    • /
    • v.15 no.1
    • /
    • pp.60-70
    • /
    • 2008
  • Purpose: This study was done to identify types of smoking decision making-temptation in adolescents and the characteristics related to each type among student smokers. Method: Data were collected from March to July 2006. A survey was administered to 275 students in 13 high schools and 15 middle schools in Seoul, South Korea. To identify the types and characteristics of smoking decision making-temptation in adolescents, cluster analysis using the K-means method was employed. Characteristics of the influential variables according to the identified types of adolescent smokers were evaluated using ANOVA. Results: Four types of smoking pattern in adolescents were identified: habitual craving (17.7%), nicotine dependence (35.8%), feeblemindedness (28.4%), and self control (18.1%). The score for nicotine dependency was higher in the habitual craving type than in any other type (F=11.79, p=.001), while the score for self efficacy for smoking abstinence was higher in the self control type (F=23.06, p=.000). Conclusions: Findings from this study suggest that effective interventions for smoking cessation in adolescents require not only active implementation of nicotine replacement therapy but also the development of individualized approaches for each person, targeting changes in the social environment that may lead to a positive smoking decisional balance.
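
The comparison of scores across the identified types relies on one-way ANOVA. The sketch below shows how the F statistic is formed from between-group and within-group variation. The function name and the score values are hypothetical and are not taken from the study.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical scores for three clusters of respondents.
scores = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value = one_way_anova_f(scores)
print(f_value)
```

A large F value indicates that the cluster means differ by more than the spread within clusters would suggest; the p-value then follows from the F distribution with (k - 1, n - k) degrees of freedom.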

Benthic algal community of Ulleungdo, East coast of Korea (동해안 울릉도 해역의 해조군집)

  • KIM, Sung-Tae;HWANG, Kangseok;PARK, Gyu-Jin;CHOI, Chang Geun
    • Journal of Fisheries and Marine Sciences Education
    • /
    • v.28 no.1
    • /
    • pp.83-90
    • /
    • 2016
  • The subtidal marine benthic algal vegetation at Ulleungdo Island, off the eastern coast of Korea, was investigated by the quadrat method at seven stations in May and August 2014 to clarify its community structure and vertical distribution. A total of 148 marine algal species were recorded: 12 green algae, 40 brown algae, and 96 red algae. Mean biomass in dry weight was $94.8g\;dry\;weight\;m^{-2}$ across the study sites, $98.1g\;dry\;weight\;m^{-2}$ at the upper tidal level, and $86.6g\;dry\;weight\;m^{-2}$ at the middle level. The R/P and (R+C)/P values, which reflect the characteristics of the flora, were 1.9 and 2.3, respectively. Cluster analysis produced three groups (one comprising Neunggeol, Daepung, and Jukdo, a second comprising Gongam and Ssangjeongcho, and a third comprising Gwaneum and Hangnam), which differed meaningfully from one another at a similarity level of about 40%. The number of marine algal species and the biomass in the Ulleungdo Island area were markedly lower than in previous studies. This result suggests that the algal vegetation may change in the future, given the physical and chemical pollution loads on the coastal marine environment of this area.

A Study on Meiofauna Community in the Subtidal Sediment outside of the Saemangeum Seadike in the West Coast of Korea (새만금 외해역 조하대 퇴적물에 서식하는 중형저서동물 군집에 관한 연구)

  • Kim, Kwang-Soo;Lee, Seunghan;Hong, Jung-Ho;Lee, Wonchoel;Park, Eun-Ok
    • Ocean and Polar Research
    • /
    • v.36 no.3
    • /
    • pp.209-223
    • /
    • 2014
  • The community structure of benthic meiofauna was investigated through seasonal surveys at seventeen stations off the Saemangeum area in 2007. Ten meiofaunal taxa were identified. Nematodes were the dominant faunal group in all seasons, and harpacticoids were dominant at only a few stations. The mean density of meiofauna was 383 indiv. $10cm^{-2}$, highest in May and November (434 indiv. $10cm^{-2}$) and lowest in February (284 indiv. $10cm^{-2}$). Mean meiofaunal biomass was $80.49{\mu}gC{\cdot}10cm^{-2}$, highest in November ($99.54{\mu}gC{\cdot}10cm^{-2}$) and lowest in February ($51.56{\mu}gC{\cdot}10cm^{-2}$). Cluster analysis revealed that the study area comprised three benthic meiofaunal communities. There were significant correlations between the major meiofaunal groups and both sediment composition and heavy metal concentrations. The abundance of harpacticoids was positively correlated with silt (0.559, p < 0.01) and clay (0.340, p < 0.01), and negatively correlated with sand (-0.548, p < 0.01). Harpacticoids also showed positive correlations with heavy metals. The community structure of meiofauna in the study area varied seasonally in response to changes in sediment composition.

Assessing Changes in Selected Soil Chemical Properties of Rice Paddy Fields in Gyeongbuk Province

  • Park, Sang-Jo;Park, Jun-Hong;Won, Jong-Gun;Seo, Dong-Hwan;Lee, Suk-Hee
    • Korean Journal of Soil Science and Fertilizer
    • /
    • v.50 no.3
    • /
    • pp.150-161
    • /
    • 2017
  • This study used monitoring data on the chemical properties of rice paddy soils in Gyeongbuk Province. The selected soil chemical properties were analyzed every 4 years from 1999 to 2015. The soil pH measured in 2015 was higher than 6.0, which was 0.3-0.4 pH units higher than in the surveys up to 2007. The mean organic matter content has exceeded $24g\;kg^{-1}$ since 2003, but 35% of soil samples remained below the recommended range ($20-30g\;kg^{-1}$) in 2015. The mean concentration of available phosphate remained about $40mg\;kg^{-1}$ above the upper limit of the recommended range ($80-120mg\;kg^{-1}$), while more than 40% of the paddy soils tested were below the recommended range during the survey period. The exchangeable K concentration ranged from 0.25 to $0.39cmol_c\;kg^{-1}$. Exchangeable Ca averaged within the optimum range ($5.0-6.0cmol_c\;kg^{-1}$) during the monitoring period. Exchangeable Mg decreased linearly ($0.02cmol_c\;kg^{-1}\;year^{-1}$) from $1.55cmol_c\;kg^{-1}$ in 1999 to below the lower limit of the recommended range ($1.5-2.0cmol_c\;kg^{-1}$). Available $SiO_2$ increased significantly from 2011, exceeding the recommended level (${\geq}157mg\;kg^{-1}$). Principal component analysis and cluster analysis showed that the soil chemical properties of rice paddy fields were influenced by topography, soil texture, soil type, and region. Therefore, an assessment of the chemical properties of rice paddy soils should take into account various soil physical conditions and agronomic practices such as fertilization and cropping system. Because nutrient levels vary widely across Gyeongbuk Province, nutrient management based on soil fertility tests is required for each farm land unit.