• Title/Summary/Keyword: k-Means Clustering


Implementation of CNN-based classification model for flood risk determination (홍수 위험도 판별을 위한 CNN 기반의 분류 모델 구현)

  • Cho, Minwoo;Kim, Dongsoo;Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.3 / pp.341-346 / 2022
  • Global warming and abnormal climate are increasing the frequency of floods and the damage they cause, and the number of people exposed in flood-prone areas has increased by 25% compared to 2000. Floods cause huge financial and human losses; reducing these losses requires predicting floods in advance and deciding quickly whether to evacuate. This paper proposes a CNN-based classification model for flood risk determination so that timely evacuation decisions can be made from rainfall and water level data, the key data for flood prediction. In a comparison with a DNN-based classification model, the proposed CNN-based model showed better performance. The model is expected to serve as an initial study for determining flood risk, deciding whether to evacuate, and making evacuation decisions at the optimal time.

Factors Influencing Medical Care Utilization according to Decline of Region: Urban Decline Index and Medical Vulnerability Index as Indicators (지역쇠퇴 유형별 의료이용행태 영향요인: 도시쇠퇴 지표와 의료취약지 지표를 활용하여)

  • Jeong, Ji Yun;Jeong, Jae Yeon;Yoon, In Hye;Choi, Hwa Young;Lee, Hae Jong
    • Health Policy and Management / v.32 no.2 / pp.205-215 / 2022
  • Background: The purpose of this study is to identify the factors affecting medical care utilization from a new perspective by reclassifying administrative districts using the urban decline index and the medical vulnerability index as indicators. Methods: This study targeted 150,940 people who used medical services, drawing on the 2015 cohort database (DB), the 2010-2015 urban regeneration analysis index DB, and the 2014-2015 public health and medical statistics DB. Regional decline was classified using the urban decline index, typed with k-means clustering, and the medical vulnerability index, typed with quantile score calculation. Regression analysis was performed three times, with medical expenditure, length of stay, and the number of outpatient visits as dependent variables. Results: There were 37 stable regions (47.4%), 29 health vulnerable regions (37.2%), and 12 decline regions (15.4%). The health vulnerable regions had lower medical expenditure, fewer outpatient visits, and a longer length of stay than the stable regions. The decline regions scored higher than the stable regions on all three measures, but the differences were not significant. Conclusion: The factors that cause health disparities between regions include not only individual health behavior but also environmental factors of the local community. Therefore, there is a need for a systematic alternative that properly considers the resources within the community and reflects the characteristics of the population.

Trend Analysis of Grow-Your-Own Using Social Network Analysis: Focusing on Hashtags on Instagram

  • Park, Yumin;Shin, Yong-Wook
    • Journal of People, Plants, and Environment / v.24 no.5 / pp.451-460 / 2021
  • Background and objective: The prolonged COVID-19 pandemic has had significant impacts on mental health, which has emerged as a major public health issue around the world. This study aimed to analyze trends and the network structure of 'grow-your-own (GYO)' on Instagram, one of the most influential social media platforms, to encourage and sustain home gardening activities that promote emotional support and physical health. Methods: A total of 6,388 Instagram posts containing the keyword hashtags '#gyo' and '#growyourown' from June 13, 2020 to April 13, 2021 were collected. Word embedding was performed using the Word2Vec library, and 7 clusters were identified with K-means clustering: GYO, garden and gardening, allotment, kitchen garden, sustainability, urban gardening, etc. Moreover, we conducted social network analysis to determine the centrality of related words and visualized the results using Gephi 0.9.2. Results: The analysis showed that various combinations of words, such as #growourrownfood, #growourrownveggies, and #growwhatyoueat, revealed users' preference for and interest in GYO and appeared to encourage their activities on Instagram. In particular, #gardeningtips, #greenfingers, #goodlife, #gardeninglife, and #gardensofinstagram were found to express positive emotions and pride as a gardener through the sharing of daily gardening lives. Users were participating in urban gardening through #allotment, #raisedbeds, and #kitchengarden, and we could identify trends toward self-sufficiency and sustainable living. Conclusion: Based on these findings, it is expected that the trend data on GYO, which is a form of urban gardening, can be used as basic data for establishing urban gardening plans that consider characteristics such as the emotions, identity, and dispositions of participants.

Estimation of design flood derived by regional frequency analysis (지역빈도분석에 의한 금강유역의 설계홍수량 산정)

  • Da Ye Kim;Seung Jin Maeng
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.104-104 / 2023
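  • Recent floods, such as those in Cheongju and Cheonan in 2017, upstream of Yongdam Dam and Daecheong Dam in 2020, and the urban inundation of Cheongju and of central Seoul in 2022, have inflicted enormous property damage on local residents. To achieve flood-control objectives at the national level and to design hydraulic structures of an economically appropriate scale, reliable design floods must be provided for the main points of rivers. In particular, hydraulic facilities at gauged sites use design floods from at-site (point) frequency analysis, whereas hydraulic facilities at ungauged sites, which have no observation data, must use design floods estimated by regional frequency analysis. This study therefore compares and analyzes the design floods obtained by at-site frequency analysis and regional frequency analysis for the Geum River basin. For the regional frequency analysis, 46 of the 80 water level stations in the Geum River basin were selected: those with long-term annual maximum flood records and with continuous, reliable discharge data. Homogeneity, independence, and outlier tests were performed on the annual maximum flood series of the 46 stations, and 36 stations passed all three tests. After the basic statistics (mean, standard deviation, variance, skewness, and kurtosis) of the 36 stations were computed, the GEV, GLO, and GPA probability distributions, which the abstract groups under the three-parameter Gamma family, were applied. The parameters of each distribution were computed with the LH-moment method, varying the order of the L-moments from 0 to 4. Goodness-of-fit tests were performed using the distribution parameters estimated by the LH-moment method. For the regional frequency analysis, the 36 stations were divided into four regions using K-means clustering. Design floods were then estimated by at-site frequency analysis (per station) and regional frequency analysis (per region) according to the appropriate LH-moment order and probability distribution, and the comparison between the design floods from the two analyses is presented. This study aims to contribute to stable design conditions for hydraulic structures and to the establishment of a management system, and to propose reasonable measures that reflect economic and social factors when disaster prevention plans are established.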
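
As a rough illustration of the moment statistics behind this kind of frequency analysis, the sketch below computes ordinary sample L-moments from an annual maximum series. It covers only the order-0 case (LH-moments of order 0 coincide with L-moments); the higher orders 1 to 4 used in the abstract are not shown. The function name `sample_l_moments` and the data are hypothetical, not taken from the study.

```python
def sample_l_moments(values):
    """Return (l1, l2, t3): L-location, L-scale, and L-skewness of a sample."""
    x = sorted(values)
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    # Unbiased probability-weighted moments b0, b1, b2 (i is the 0-based rank).
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    # The first three L-moments are linear combinations of b0, b1, b2.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    # Assumes the sample is not constant, so that l2 is non-zero.
    return l1, l2, l3 / l2


# Hypothetical annual maximum flood series for one station (not real data).
annual_max = [1.0, 2.0, 3.0, 4.0, 5.0]
l1, l2, t3 = sample_l_moments(annual_max)
print(l1, l2, t3)
```

In a regional analysis, ratios such as L-CV (`l2 / l1`) and L-skewness (`t3`) are commonly compared across the stations of a candidate region.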


Market Segmentation Based on Types of Motivations to Visit Coffee Shops (커피전문점 방문동기유형에 따른 시장세분화)

  • Lee, Yong-Sook;Kim, Eun-Jung;Park, Heung-Jin
    • The Korean Journal of Franchise Management / v.7 no.1 / pp.21-29 / 2016
  • Purpose - The primary purpose of this study is to support effective marketing through market segmentation of coffee shops by determining how motivations to visit coffee shops relate to the demographic profile of visitors and the characteristics of coffee shop visits, so as to better understand customers in the coffee market. Research design, data, and methodology - Data were collected through self-administered questionnaires given to coffee shop users in Daejeon, Korea. After unusable responses were excluded, 253 samples were used in the data analysis. The data were analyzed through frequency, reliability, and factor analysis using SPSS 20.0. Factor analysis was conducted with principal component analysis and varimax rotation to derive factors with eigenvalues of one or more. In addition, cluster analysis, multivariate ANOVA, and cross-tab analysis were used for market segmentation based on the types of motivation for coffee shop visits. The cluster analysis proceeded as follows: four clusters were derived through hierarchical clustering, and k-means cluster analysis was then carried out using the mean values of the four clusters as the initial seed values. Result - The factor analysis delineated four dimensions of motivation to visit coffee shops: ostentation motivation, hedonic motivation, esthetic motivation, and utility motivation. The cluster analysis yielded four clusters: utility and esthetic seekers, hedonic seekers, utility seekers, and ostentation seekers. To further specify the profile of the four clusters, each cluster was cross-tabulated with socio-demographics and characteristics of coffee shop visits. The four clusters differ significantly from each other on the four types of motivation for coffee shop visits. Conclusions - This study has empirically examined the differences in the demographic profile of visitors and the characteristics of coffee shop visits by motivation to visit coffee shops. There are significant differences according to age, educational background, marital status, occupation, and monthly income. In addition, coffee shop usage patterns (frequency of visits, relationship with companions, purpose of visit, information sources, brand type, average expense per visit, and important selection attributes) differed significantly depending on motivations for coffee shop visits.
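
The two-stage procedure described in this abstract (hierarchical clustering first, then k-means seeded with the resulting cluster means) can be sketched roughly as follows. This is a minimal illustration rather than the authors' SPSS workflow: the hierarchical labels are assumed to be given, and the factor scores, labels, and function names are hypothetical.

```python
def mean_point(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def seeded_kmeans(points, seeds, max_iter=100):
    """Lloyd's k-means iterations starting from the given seed centroids."""
    centroids = list(seeds)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = mean_point(members)
    return labels, centroids


# Hypothetical factor scores (two motivation factors per respondent).
scores = [(0.1, 0.2), (0.0, 0.4), (0.3, 0.1), (2.0, 2.1), (2.2, 1.9), (1.9, 2.3)]
# Hypothetical preliminary labels from a hierarchical clustering step.
hier_labels = [0, 0, 1, 1, 1, 1]

# Seeds are the mean vectors of the preliminary clusters.
seeds = [
    mean_point([p for p, lab in zip(scores, hier_labels) if lab == k])
    for k in sorted(set(hier_labels))
]
labels, centroids = seeded_kmeans(scores, seeds)
print(labels)
```

Seeding k-means this way fixes the number of clusters from the hierarchical stage and removes the dependence on random initialization, while the k-means pass can still reassign points that the hierarchical stage placed poorly (the third respondent in this toy data).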

Efficient Sign Language Recognition and Classification Using African Buffalo Optimization Using Support Vector Machine System

  • Karthikeyan M. P.;Vu Cao Lam;Dac-Nhuong Le
    • International Journal of Computer Science & Network Security / v.24 no.6 / pp.8-16 / 2024
  • Communication with the deaf has always been crucial. Deaf and hard-of-hearing persons can now express their thoughts and opinions to teachers through sign language, which has become a universal language and a very effective tool. This helps to improve their education and simplifies the referral procedure between them and their teachers. Sign language uses various bodily movements, including those of the arms, legs, and face. Pure expressiveness, proximity, and shared interests are examples of nonverbal physical communication that is distinct from gestures that convey a particular message. The meanings of gestures vary with social or cultural background and are quite unique. Sign language recognition is a highly popular area of ongoing research, and the SVM has shown value there. Research in a number of fields where SVMs struggle has encouraged the development of numerous applications, such as SVM for enormous data sets, SVM for multi-classification, and SVM for unbalanced data sets. Without a precise diagnosis of the signs, the right control measures cannot be applied when they are needed. One of the methods frequently used for the identification and categorization of sign languages is image processing. African Buffalo Optimization using Support Vector Machine (ABO+SVM) classification is used in this work to help identify and categorize people's sign languages. Segmentation by K-means clustering is first used to identify the sign region, after which color and texture features are extracted. The accuracy, sensitivity, precision, specificity, and F1-score of the proposed ABO+SVM system are validated against the existing classifiers SVM, CNN, and PSO+ANN.
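
The evaluation metrics named at the end of this abstract follow directly from confusion-matrix counts. The sketch below shows the standard binary definitions. For a multi-class sign classifier they would typically be computed per class (one-vs-rest) and averaged, which is an assumption here, since the abstract does not say how the metrics were aggregated. The counts are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # also called recall or true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for one sign class versus all others.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```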

Relationship networks among nurses in acute nursing care units (종합병원 간호단위의 간호사 관계 네트워크 연구)

  • Park, Seungmi;Park, Eun-Jun
    • The Journal of Korean Academic Society of Nursing Education / v.30 no.2 / pp.182-191 / 2024
  • Purpose: The purpose of this study was to explore the characteristics of social networks among registered nurses in acute nursing care units. Methods: This study used a survey design. Four nursing units from two acute hospitals were selected using a convenience method, and 83 nurses from those nursing units participated in the study in July 2022. The positive influences among nurses included friendship, collaboration, advice, and referent networks, and the negative influences included avoidance and bullying networks. Using the NetMiner program, the k-means clustering technique was applied to create groups of nodes with similar characteristics. The general characteristics of the participants were analyzed by mean, standard deviation, frequency, and ANOVA or chi-squared test. Results: As a result of dividing the 83 nurse participants into four clusters, positive influencers, silent peers, unwelcome peers, and active bullies were identified. Positive influence group nurses were frequently mentioned in the friendship, collaboration, advice, and referent networks. On the other hand, nurses in the unwelcome group and the active bullying group were frequently mentioned in the avoidance and bullying networks. Conclusion: Social networks that have a positive or negative impact on nursing performance are created through different relationships between nurses. Nurse managers can use the findings to create a more supportive and collaborative environment. Further research is needed to develop intervention programs to improve interactions and relationships between fellow nurses.

Analyzing fashion item purchase patterns and channel transition patterns using association rules and brand loyalty in big data (빅데이터의 연관규칙과 브랜드 충성도를 활용한 패션품목 구매패턴과 구매채널 전환패턴 분석)

  • Ki Yong Kwon
    • The Research Journal of the Costume Culture / v.32 no.2 / pp.199-214 / 2024
  • Until now, research on consumers' purchasing behavior has primarily focused on psychological aspects or depended on consumer surveys. However, there may be a gap between consumers' self-reported perceptions and their observable actions. In response, this study aimed to investigate consumer purchasing behavior utilizing a big data approach. To this end, this study investigated the purchasing patterns of fashion items, both online and in retail stores, from a data-driven perspective. We also investigated whether individual consumers switched between online websites and retail establishments for making purchases. Data on 516,474 purchases were obtained from fashion companies. We used association rule analysis and K-means clustering to identify purchase patterns that were influenced by customer loyalty. Furthermore, sequential pattern analysis was applied to investigate the usage patterns of online and offline channels by consumers. The results showed that high-loyalty consumers mainly purchased infrequently bought items in the brand line, as well as high-priced items, and that these purchase patterns were similar both online and in stores. In contrast, the low-loyalty group showed different purchasing behaviors for online versus in-store purchases. In physical environments, the low-loyalty consumers tended to purchase less popular or more expensive items from the brand line, whereas in online environments, their purchases centered around items with relatively high sales volumes. Finally, we found that both high and low loyalty groups exclusively used a single preferred channel, either online or in-store. The findings help companies better understand consumer purchase patterns and build future marketing strategies around items with high brand centrality.
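
Association rule analysis of purchase data rests on three standard measures: support, confidence, and lift. The sketch below computes them for a single rule over a handful of transactions. The baskets and item names are hypothetical and stand in for the purchase records described in the abstract; the study's actual tooling is not stated.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_ac / n               # share of baskets containing both sides
    confidence = n_ac / n_a          # share of antecedent baskets with consequent
    lift = confidence / (n_c / n)    # confidence relative to the consequent's base rate
    return support, confidence, lift


# Hypothetical purchase baskets (one set of items per customer order).
baskets = [
    {"jacket", "shirt"},
    {"jacket", "shirt", "scarf"},
    {"shirt"},
    {"jacket"},
    {"scarf", "shirt"},
]
support, confidence, lift = rule_metrics(baskets, {"jacket"}, {"shirt"})
print(support, confidence, lift)
```

A lift below 1, as in this toy data, indicates that the two items are bought together less often than their individual frequencies would suggest.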

A Study on the Impact Factors of Contents Diffusion in Youtube using Integrated Content Network Analysis (일반영향요인과 댓글기반 콘텐츠 네트워크 분석을 통합한 유튜브(Youtube)상의 콘텐츠 확산 영향요인 연구)

  • Park, Byung Eun;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.19-36 / 2015
  • Social media is an emerging issue in content services and in the current business environment. YouTube is the most representative social media service in the world. It differs from conventional content services in its open user participation and its methods of content creation. To promote content on YouTube, it is important to understand how content diffuses and the structural characteristics of the network. Most previous studies analyzed the impact factors of content diffusion from the viewpoint of general behavioral factors, and some researchers have recently used network structure factors, but the two approaches have been used separately. This study analyzes the general impact factors on view count and the content-based network structure together. In addition, it builds the content-based network by analyzing user comments on 22,370 YouTube contents rather than building a network of individual users. The study statistically reconfirmed the causal relations between view count and both general factors and network factors. Analysis of the integrated research model showed that the factors affect the view count of YouTube contents in the following order: Uploader Followers, Video Age, Betweenness Centrality, Comments, Closeness Centrality, Clustering Coefficient, and Rating. Degree Centrality and Eigenvector Centrality, however, affect the view count negatively. The study suggests the following strategic points for content diffusion. First, general factors such as the number of uploader followers or subscribers, the video age, the number of comments, and the average rating need to be managed. The average rating matters less than previously thought, whereas it is necessary to increase the number of uploader followers strategically and to keep content in the service as long as possible. Second, attention should be paid to the impacts of betweenness centrality and closeness centrality among the network factors. Users seem to search for related subjects or similar content after watching a content, so it is necessary to shorten the distance to other popular contents in the service. In other words, view counts benefit from decreasing the number of search attempts and increasing similarity with many other contents, which is consistent with the result of the clustering coefficient analysis. Third, the negative impact of degree centrality and eigenvector centrality on view count should be noted. Too many connections with other contents mean that there are many similar contents, which may spread the view counts across them. Moreover, very high eigenvector centrality means that the content is connected to popular contents, and it may lose views to them. It would be better to avoid connections with overly popular contents. This study analyzed the diffusion phenomenon and verified the diffusion factors of YouTube contents using an integrated model consisting of general factors and network structure factors. In terms of social contribution, it may provide useful information to the music or movie industry and other content vendors for effective content services, and it provides basic schemes that can be applied strategically in online content marketing. One limitation is that the network structure analysis relied on a content-based network, which may be an indirect way to observe the content network structure; more varied methods could be used to establish a direct content network. Further research should include more detailed analyses by content type, domain, or the characteristics of contents or users.
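
Three of the network factors named in this abstract (degree centrality, closeness centrality, and the clustering coefficient) can be computed on a small undirected graph as sketched below. The graph is hypothetical and assumed to be connected; betweenness and eigenvector centrality are omitted for brevity, and the study's own software is not stated in the abstract.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def closeness_centrality(graph, source):
    """Reachable nodes divided by the sum of shortest-path lengths (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def clustering_coefficient(graph, v):
    """Share of pairs of v's neighbors that are themselves connected."""
    nbrs = list(graph[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in graph[nbrs[i]]
    )
    return 2 * links / (k * (k - 1))


# Hypothetical content network: an edge means two videos share commenters.
graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
print(degree_centrality(graph))
print(closeness_centrality(graph, "D"))
print(clustering_coefficient(graph, "A"))
```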

A Study on Differences of Contents and Tones of Arguments among Newspapers Using Text Mining Analysis (텍스트 마이닝을 활용한 신문사에 따른 내용 및 논조 차이점 분석)

  • Kam, Miah;Song, Min
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.53-77 / 2012
  • This study analyzes the differences in contents and tones of arguments among three major Korean newspapers: the Kyunghyang Shinmun, the HanKyoreh, and the Dong-A Ilbo. It is commonly accepted that newspapers in Korea explicitly deliver their own tone of argument when they cover sensitive issues and topics. This can be problematic if readers are unaware of a newspaper's tone, because the contents and tones of arguments can easily influence them. A tool that informs readers of a newspaper's tone of argument is therefore desirable. This study presents the results of clustering and classification techniques as part of text mining analysis. We focus on six main subjects in newspapers (Culture, Politics, International, Editorial-opinion, Eco-business, and National issues) and attempt to identify differences and similarities among the newspapers. The basic unit of text mining analysis is a paragraph of a news article. This study uses a keyword-network analysis tool and visualizes relationships among keywords to make the differences easier to see. Newspaper articles were gathered from KINDS, the Korean integrated news database system, which preserves articles of the Kyunghyang Shinmun, the HanKyoreh, and the Dong-A Ilbo and is open to the public. About 3,030 articles from 2008 to 2012 were used. The International, National issues, and Politics sections were gathered for specific issues: the International section with the keyword 'Nuclear weapon of North Korea,' the National issues section with the keyword '4-major-river,' and the Politics section with the keyword 'Tonghap-Jinbo Dang.' All articles from April 2012 to May 2012 in the Eco-business, Culture, and Editorial-opinion sections were also collected. All of the collected data were split into paragraphs. We removed stop-words using the Lucene Korean Module. We calculated keyword co-occurrence counts from the list of keyword pairs co-occurring in a paragraph and built a co-occurrence matrix from the list. Once the co-occurrence matrix was built, we used the cosine coefficient matrix as input for PFNet (Pathfinder Network). To analyze the three newspapers and find the significant keywords in each, we examined the 10 highest-frequency keywords and the keyword networks of the 20 highest-frequency keywords to closely examine the relationships and show the detailed network map among keywords. We used NodeXL software to visualize the PFNet. After drawing all the networks, we compared the results with the classification results. Classification was first carried out to identify how the tone of argument of one newspaper differs from the others. Then, to analyze tones of arguments, all the paragraphs were divided into two types of tone, Positive and Negative. A supervised learning technique was used to identify and classify the tones of all collected paragraphs and articles: the Naïve Bayesian classifier provided in the MALLET package classified all the paragraphs in the articles. After classification, precision, recall, and F-value were used to evaluate the results. Based on the results, three subjects (Culture, Eco-business, and Politics) showed some differences in contents and tones of arguments among the three newspapers. In addition, for National issues, the tones of arguments on the 4-major-rivers project differed from one another. The three newspapers appear to have their own specific tone of argument in those sections. Keyword networks also showed different shapes from one another for the same period and section, which means that the frequently appearing keywords differ across newspapers and their contents are composed of different keywords. The Positive-Negative classification showed that newspapers' tones of arguments can be classified relative to one another. These results indicate that the approach is promising as a new tool for identifying the different tones of arguments of newspapers.
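
The keyword-network preprocessing described in this abstract (paragraph-level co-occurrence counts, then a cosine coefficient matrix) can be sketched as follows. This is a minimal illustration under assumptions: the paragraphs are hypothetical, already tokenized and stop-word filtered, and the cosine is taken between rows of the co-occurrence matrix, which may differ from the exact formulation the authors used. The Pathfinder Network step is not shown.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence(paragraphs):
    """Count, for each keyword pair, the paragraphs in which both appear."""
    counts = Counter()
    for words in paragraphs:
        for a, b in combinations(sorted(set(words)), 2):
            counts[(a, b)] += 1
    return counts


def cosine_matrix(counts, vocab):
    """Cosine similarity between the co-occurrence rows of every keyword pair."""
    rows = {
        w: [0 if w == v else counts.get(tuple(sorted((w, v))), 0) for v in vocab]
        for w in vocab
    }

    def cos(u, v):
        dot = sum(p * q for p, q in zip(u, v))
        norm = sqrt(sum(p * p for p in u)) * sqrt(sum(q * q for q in v))
        return dot / norm if norm else 0.0

    return {(a, b): cos(rows[a], rows[b]) for a in vocab for b in vocab}


# Hypothetical paragraphs after tokenization and stop-word removal.
paragraphs = [
    ["river", "project", "budget"],
    ["river", "project"],
    ["budget", "election"],
]
counts = cooccurrence(paragraphs)
vocab = sorted({w for p in paragraphs for w in p})
sim = cosine_matrix(counts, vocab)
print(counts[("project", "river")], round(sim[("project", "river")], 3))
```

The resulting similarity matrix is the kind of input a network-pruning method such as PFNet would take before visualization.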