• Title/Summary/Keyword: hierarchical clustering method (계층적 군집방법)

Analyzing the Co-occurrence of Endangered Brackish-Water Snails with Other Species in Ecosystems Using Association Rule Learning and Clustering Analysis (연관 규칙 학습과 군집분석을 활용한 멸종위기 기수갈고둥과 생태계 내 종 간 연관성 분석)

  • Sung-Ho Lim;Yuno Do
    • Korean Journal of Ecology and Environment / v.57 no.2 / pp.83-91 / 2024
  • This study utilizes association rule learning and clustering analysis to explore the co-occurrence and relationships within ecosystems, focusing on the endangered brackish-water snail Clithon retropictum, classified as Class II endangered wildlife in Korea. The goal is to analyze co-occurrence patterns between brackish-water snails and other species to better understand their roles within the ecosystem. By examining co-occurrence patterns and relationships among species in large datasets, association rule learning aids in identifying significant relationships. Meanwhile, K-means and hierarchical clustering analyses are employed to assess ecological similarities and differences among species, facilitating their classification based on ecological characteristics. The findings reveal a significant level of relationship and co-occurrence between brackish-water snails and other species. This research underscores the importance of understanding these relationships for the conservation of endangered species like C. retropictum and for developing effective ecosystem management strategies. By emphasizing the role of a data-driven approach, this study contributes to advancing our knowledge on biodiversity conservation and ecosystem health, proposing new directions for future research in ecosystem management and conservation strategies.
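The abstract does not state which rule metrics or thresholds were used, so the following is only a minimal sketch of how co-occurrence rules are commonly scored with support, confidence, and lift. The site records, the placeholder species names, and the function name `rule_metrics` are assumptions made for illustration, not data or code from the study.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a = frozenset(antecedent)
    c = frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)         # records containing the antecedent
    n_c = sum(1 for t in transactions if c <= t)         # records containing the consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # records containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical survey sites; each set lists the species recorded at one site.
sites = [
    {"C. retropictum", "species_A", "species_B"},
    {"C. retropictum", "species_A"},
    {"species_B"},
    {"C. retropictum", "species_A", "species_C"},
    {"species_C"},
]

support, confidence, lift = rule_metrics(sites, {"species_A"}, {"C. retropictum"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the two species co-occur more often than would be expected if their occurrences were independent.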

Selecting Technique of Accident Sections using K-mean Method (K-평균법을 이용한 고속도로 사고분석구간 분할기법 개발)

  • Lee, Ki-Young;Chang, Myung-Soon
    • International Journal of Highway Engineering / v.7 no.4 s.26 / pp.211-219 / 2005
  • Selecting analysis sections for traffic accidents serves two purposes: identifying the causes of accidents clearly by grouping similar accidents, and raising the effectiveness of improvement projects by prioritizing accident locations. The existing method divides the road uniformly by mileage and does not consider similarities among accidents. Consequently, a slider-length method that considers accident types rather than road mileage has recently come into wide use. This study proposes using the K-means method, a non-hierarchical grouping technique from cluster analysis, as a way of applying the slider-length method, so that accidents occurring at the nearest mileages are classified into one group. To verify the proposed method, the K-means method was compared with division at regular intervals on data covering a total of 25.6 km of the Kyung-bu freeway in the Pusan direction; the K-means method proved effective in accounting for the similarity and adjacency of accidents.
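As a rough illustration of grouping accidents by nearby mileage, the sketch below runs plain Lloyd-style K-means on one-dimensional positions and reads off the resulting section boundaries. The accident positions, the choice of k = 3, the deterministic initialisation, and the function name `kmeans_1d` are all assumptions; the abstract does not describe the features or settings actually used.

```python
def kmeans_1d(points, k, iters=100):
    """Lloyd's algorithm in one dimension; returns (centers, labels)."""
    pts = sorted(points)
    # Deterministic initialisation: spread the starting centers over the sorted data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(p - centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Hypothetical accident locations in km along a route.
accident_km = [0.4, 0.6, 0.9, 5.1, 5.3, 5.8, 12.0, 12.2, 12.9]
centers, labels = kmeans_1d(accident_km, k=3)

# Turn cluster membership into (start_km, end_km) analysis sections.
sections = {}
for km, label in zip(accident_km, labels):
    sections.setdefault(label, []).append(km)
bounds = sorted((min(v), max(v)) for v in sections.values())
print(bounds)
```

Unlike division at regular intervals, the section boundaries here follow where accidents actually concentrate.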

Analysis of Research Trends Related to drug Repositioning Based on Machine Learning (머신러닝 기반의 신약 재창출 관련 연구 동향 분석)

  • So Yeon Yoo;Gyoo Gun Lim
    • Information Systems Review / v.24 no.1 / pp.21-37 / 2022
  • Drug repositioning, one of the methods of developing new drugs, is a useful way to discover new indications by allowing drugs that have already been approved for use in people to be used for other purposes. Recently, with the development of machine learning technology, vast amounts of biological information are increasingly being analyzed and used to develop new drugs. Applying machine learning technology to drug repositioning will help find effective treatments quickly. Currently, the world is struggling with COVID-19, a new severe acute respiratory disease caused by a coronavirus. Drug repositioning, which repurposes drugs that have already been clinically approved, could be an alternative route to therapeutics for COVID-19 patients. This study examines research trends in the field of drug repositioning using machine learning techniques. A total of 4,821 papers were collected from PubMed with the keyword 'Drug Repositioning' using web scraping. After data preprocessing, frequency analysis, LDA-based topic modeling, random forest classification analysis, and prediction performance evaluation were performed on 4,419 papers. Associated words were analyzed based on the Word2vec model; after dimensionality reduction with PCA, K-means clustering was applied to generate labels, and the structured organization of the literature was then visualized using the t-SNE algorithm. Hierarchical clustering was applied to the LDA results and visualized as a heat map. This study identified the research topics related to drug repositioning and presented a method to derive and visualize meaningful topics from a large body of literature using machine learning algorithms. The results are expected to serve as basic data for establishing research or development strategies in the field of drug repositioning.
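The abstract does not say which linkage or distance was used when hierarchical clustering was applied to the LDA results. The sketch below therefore assumes average linkage with Euclidean distance and a small, made-up set of topic weight vectors; `agglomerate` is a hypothetical helper written in plain Python so that the merge order is easy to inspect.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns (clusters, merges)."""
    clusters = [[i] for i in range(len(vectors))]  # start with one cluster per item
    merges = []  # (members_a, members_b, distance), in merge order
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(euclidean(vectors[p], vectors[q])
                        for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters, merges


# Hypothetical topic weight vectors (one row per topic).
topics = [
    [0.70, 0.20, 0.10],
    [0.65, 0.25, 0.10],
    [0.10, 0.80, 0.10],
    [0.15, 0.75, 0.10],
    [0.10, 0.10, 0.80],
]
clusters, merges = agglomerate(topics, n_clusters=3)
print(clusters)
```

The recorded merge order and distances are the information a dendrogram or a clustered heat map is drawn from.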

Korean Onomatopoeia Clustering for Sound Database (음향 DB 구축을 위한 한국어 의성어 군집화)

  • Kim, Myung-Gwan;Shin, Young-Suk;Kim, Young-Rye
    • Journal of Korea Multimedia Society / v.11 no.9 / pp.1195-1203 / 2008
  • Onomatopoeia in Korean documents renders natural or artificial sounds in human language; it can express a sound in the words closest to its source and can also serve as a standard for clustering multimedia data. In this study, we measured the frequency of onomatopoeia among the experimental subjects and selected 100 onomatopoeic words for analysis. To cluster the relations among these words, we extracted similarity features and a distance metric, and then represented the relations in a vector space using PCA. Finally, we clustered the onomatopoeia using the k-means algorithm.

A spatial analysis of Neyman-Scott rectangular pulses model using an approximate likelihood function (근사적 우도함수를 이용한 Neyman-Scott 구형펄스모형의 공간구조 분석)

  • Lee, Jeongjin;Kim, Yongku
    • Journal of the Korean Data and Information Science Society / v.27 no.5 / pp.1119-1131 / 2016
  • The Neyman-Scott Rectangular Pulses Model (NSRPM) is mainly used to construct hourly rainfall series. This model uses a modest number of parameters to represent the rainfall processes and underlying physical phenomena, such as the arrival of storms or rain cells. In the NSRPM, the method of moments has often been used because it is difficult to know the distribution of rainfall intensity. Recently, an approximate likelihood function for the NSRPM has been introduced. In this paper, we propose a hierarchical model that applies a spatial structure to the NSRPM parameters using the approximate likelihood function. The proposed method is applied to summer hourly precipitation data observed at 59 weather stations (Korea Meteorological Administration) from 1973 to 2011.

The Cognitive Development of Secondary School Students in the Republic of Korea (한국 중등학생의 지적 발달 연구)

  • Han, Jong-Ha
    • Journal of The Korean Association For Science Education / v.6 no.2 / pp.53-62 / 1986
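  • The purpose of this study is to obtain basic data needed for developing textbooks and curricula by surveying and analyzing the characteristics of the intellectual development of Korean middle and high school students. Cognitive development characteristics were examined by region, grade, age, sex, and family socioeconomic status. The subjects were male and female students from the first year of middle school through the second year of high school, selected by stratified cluster sampling with the country stratified into large cities, small and medium-sized cities, and rural areas. The sample comprised 3,164 students in 54 classes at 18 middle schools and 1,981 students in 36 classes at 18 high schools. Family socioeconomic status was divided into four strata based on the family's economic circumstances, the father's occupation, the father's education, and household income. The instrument used for the cognitive domain was a test of logical development based on Piaget's theory of cognitive development. The results are summarized as follows. First, propositional logic, probabilistic logic, combinatorial logic, and the concept of variable manipulation tend to be more developed as age and grade increase, in larger cities, and at higher socioeconomic status. Second, regarding the developmental trend of the concepts, two-factor reasoning (이원추리) and combinatorial logic tend to develop earlier than probabilistic logic and propositional logic. Third, among Korean secondary school students, 64.6% of 12-year-olds, 58.1% of 13-year-olds, 43.8% of 14-year-olds, 30.1% of 15-year-olds, and 22.6% of 16-year-olds are at the late concrete operational stage. Fourth, by grade, 69.8% of first-year middle school students, 51.1% of second-year middle school students, 47.4% of third-year middle school students, 21.6% of first-year high school students, and 21.7% of second-year high school students are at the late concrete operational level of development.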

Validation Technique of Simulation Model using Weighted F-measure with Hierarchical X-means (WF-HX) Method (계층적 X-means와 가중 F-measure를 통한 시뮬레이션 모델 검증 기법)

  • Yang, Dae-Gil;HwangBo, Hun;Cheon, Hyun-Jae;Lee, Hong-Chul
    • Journal of the Korea Academia-Industrial cooperation Society / v.13 no.2 / pp.562-574 / 2012
  • The simulation validation techniques employed in most studies are statistical analyses that validate a model using the mean or variance of throughput and resource utilization as the evaluation target. However, these methods cannot adequately ensure the reliability of individual elements of the model. To overcome this problem, the weighted F-measure method was proposed, but it also has limitations. First, it is difficult to apply to complex system environments with numerous interarrival time values, because it assigns a class to each individual interarrival time value. In addition, because the weights are unbounded, the weighted F-measure has no lower bound, which makes it difficult to determine a threshold. Therefore, this paper proposes a weighted F-measure technique combined with cluster analysis to solve these problems. The classes for the technique are defined by the clusters, which considerably reduces the number of classes and makes the technique applicable to various systems. Moreover, we improved the validation technique by assigning weights with a minimum bound without any loss of objectivity.

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.1-17 / 2019
  • Because stock price forecasting is an important issue both academically and practically, research in stock price prediction has been actively conducted. The stock price forecasting research is classified into using structured data and using unstructured data. With structured data such as historical stock price and financial statements, past studies usually used technical analysis approach and fundamental analysis. In the big data era, the amount of information has rapidly increased, and the artificial intelligence methodology that can find meaning by quantifying string information, which is an unstructured data that takes up a large amount of information, has developed rapidly. With these developments, many attempts with unstructured data are being made to predict stock prices through online news by applying text mining to stock price forecasts. The stock price prediction methodology adopted in many papers is to forecast stock prices with the news of the target companies to be forecasted. However, according to previous research, not only news of a target company affects its stock price, but news of companies that are related to the company can also affect the stock price. However, finding a highly relevant company is not easy because of the market-wide impact and random signs. Thus, existing studies have found highly relevant companies based primarily on pre-determined international industry classification standards. However, according to recent research, global industry classification standard has different homogeneity within the sectors, and it leads to a limitation that forecasting stock prices by taking them all together without considering only relevant companies can adversely affect predictive performance. To overcome the limitation, we first used random matrix theory with text mining for stock prediction. Wherever the dimension of data is large, the classical limit theorems are no longer suitable, because the statistical efficiency will be reduced. 
Therefore, a simple correlation analysis in the financial market does not reveal the true correlation. To solve the issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find a true correlation between companies. With the true correlation, we perform cluster analysis to find relevant companies. Also, based on the clustering analysis, we used a multiple kernel learning algorithm, which is an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices with features of financial news of the target firm and its relevant firms. The results of this study are as follows. (1) Following the existing research flow, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) When looking for a relevant company, looking for it in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory shows better performance than previous studies if cluster analysis is performed based on the true correlation obtained by removing market-wide effects and random signals. The contribution of this study is as follows. First, this study shows that random matrix theory, which is used mainly in econophysics, can be combined with artificial intelligence to produce good methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt physics theory. This extends the existing research that presented a methodology integrating artificial intelligence with complex system theory through transfer entropy. Second, this study stressed that finding the right companies in the stock market is an important issue. This suggests that it is important to study not only artificial intelligence algorithms but also how to theoretically adjust the input values. 
Third, we confirmed that firms classified together under the Global Industry Classification Standard (GICS) might have low relevance, and suggested that it is necessary to define relevance theoretically rather than simply taking it from the GICS.
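The abstract does not give the filtering procedure, so the following is only a sketch of the textbook random-matrix check often used in econophysics: eigenvalues of a sample correlation matrix are compared with the Marchenko-Pastur bounds expected for purely random, unit-variance data. The function names, the firm and observation counts, and the eigenvalue list are hypothetical, and the sketch assumes there are at least as many observations as firms.

```python
import math


def marchenko_pastur_bounds(n_assets, n_obs, variance=1.0):
    """Asymptotic eigenvalue range for a correlation matrix of pure noise."""
    q = n_assets / n_obs  # ratio of variables to observations, assumed <= 1
    lower = variance * (1 - math.sqrt(q)) ** 2
    upper = variance * (1 + math.sqrt(q)) ** 2
    return lower, upper


def split_eigenvalues(eigenvalues, n_assets, n_obs):
    """Separate eigenvalues above the noise band from those inside or below it."""
    _, upper = marchenko_pastur_bounds(n_assets, n_obs)
    ordered = sorted(eigenvalues, reverse=True)
    signal = [ev for ev in ordered if ev > upper]
    noise = [ev for ev in ordered if ev <= upper]
    return signal, noise


# Hypothetical setting: 100 firms observed over 400 periods.
lower, upper = marchenko_pastur_bounds(100, 400)

# Hypothetical leading eigenvalues of a sample correlation matrix.
signal, noise = split_eigenvalues([12.5, 3.1, 2.4, 1.9, 0.8, 0.3], 100, 400)
print(lower, upper, signal, noise)
```

In this line of work the largest eigenvalue is commonly read as the market-wide mode and eigenvalues inside the band as noise, which leaves the intermediate ones as candidates for group structure; whether the paper follows exactly this reading is not stated in the abstract.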

A Market Segmentation Scheme Based on Customer Information and QAP Correlation between Product Networks (고객정보와 상품네트워크 유사도를 이용한 시장세분화 기법)

  • Jeong, Seok-Bong;Shin, Yong Ho;Koo, Seo Ryong;Yoon, Hyoup-Sang
    • Journal of the Korea Society for Simulation / v.24 no.4 / pp.97-106 / 2015
  • Recently, hybrid market segmentation techniques, which conduct segmentation using both general variables and transaction-based variables, have been widely adopted. However, although their methodology and concept are easy to apply, these techniques can generate incorrect segmentation results. In this paper, we propose a novel scheme that overcomes this limitation of the hybrid techniques and takes advantage of product information obtained from customers' transaction data. In this scheme, we first divide the whole market into several unit segments based on the general variables and then agglomerate the unit segments with higher QAP correlations. Each product network represents the purchasing patterns of its corresponding segment, so the QAP correlation between the product networks of two segments is a good measure of the similarity between those segments. A case study was conducted to validate the proposed scheme. The results show that our scheme works effectively for Internet shopping malls.
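For readers unfamiliar with QAP correlation, the sketch below shows the usual idea: the Pearson correlation between corresponding off-diagonal cells of two network matrices, with a significance estimate obtained by relabelling the nodes of one matrix at random. The two product networks, the permutation count, and the function names are assumptions for illustration; the paper's own networks and settings are not given in the abstract.

```python
import math
import random


def off_diagonal(matrix, order=None):
    """Flatten the off-diagonal cells, optionally after relabelling the nodes."""
    n = len(matrix)
    order = list(range(n)) if order is None else order
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def qap_correlation(a, b, n_perm=1000, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    x = off_diagonal(a)
    observed = pearson(x, off_diagonal(b))
    n = len(a)
    hits = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)  # permute rows and columns of b together
        if pearson(x, off_diagonal(b, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical co-purchase counts among four products in two unit segments.
net_a = [[0, 5, 1, 0], [5, 0, 2, 0], [1, 2, 0, 4], [0, 0, 4, 0]]
net_b = [[0, 4, 1, 0], [4, 0, 3, 1], [1, 3, 0, 5], [0, 1, 5, 0]]

r, p = qap_correlation(net_a, net_b, n_perm=200)
r_same, _ = qap_correlation(net_a, net_a, n_perm=50)
print(round(r, 3), p)
```

Segments whose product networks correlate strongly would be the candidates for merging in an agglomeration step like the one the abstract describes.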

Characteristics of Community Structure for Forest Vegetation on Manisan, Ganghwado (강화도 마니산 산림식생의 군집구조 특성)

  • Shin, Hak-Sub;Shin, Jae-Kwon;Kim, Hye-Jin;Han, Sang-Hak;Lee, Won-Hee;Yun, Chung-Weon
    • Korean Journal of Agricultural and Forest Meteorology / v.16 no.1 / pp.11-21 / 2014
  • The purpose of this study was to furnish basic information for forest community ecology and to accumulate vegetation data related to the hierarchy of forest communities for the efficient management of forest vegetation on Mt. Mani. Data were collected from 32 relevés from August to October 2010 and analyzed using the phytosociological methodology of the Z-M school and importance value analysis. The forest vegetation was classified into 5 units in total. In descending order, the importance values were: at vegetation unit 1, Pinus densiflora 54.31 (18.10%), Quercus mongolica 39.21 (13.07%), Carpinus coreana 37.29 (12.43%); at vegetation unit 2, Quercus mongolica 89.43 (22.23%), Rhododendron mucronulatum 57.75 (14.43%), Carpinus coreana 47.19 (11.80%); at vegetation unit 3, Styrax japonica 53.97 (13.50%), Acer mono 33.60 (8.40%), Carpinus coreana 26.48 (6.62%), Quercus serrata 22.51 (5.64%); at vegetation unit 4, Carpinus coreana 47.70 (11.92%), Quercus acutissima 38.40 (9.60%); and at vegetation unit 5, Evodia daniellii 80.59 (20.14%), Robinia pseudoacacia 35.00 (8.74%), Pueraria thunbergiana 28.63 (7.15%), Quercus dentata 28.20 (7.05%).