• Title/Summary/Keyword: Correlation clustering

Correlation of Cancer Incidence with Diet, Smoking and Socio-Economic Position Across 22 Districts of Tehran in 2008

  • Rohani-Rasaf, Marzieh;Abdollahi, Morteza;Jazayeri, Shima;Kalantari, Naser;Asadi-Lari, Mohsen
    • Asian Pacific Journal of Cancer Prevention / v.14 no.3 / pp.1669-1676 / 2013
  • Background: Variation in cancer incidence across geographical locations is due to different lifestyles and risk factors. Diet and socio-economic position (SEP) have been identified as important in the etiology of cancer, but patterns are changing and inconsistent. The aim of this study was to investigate correlations of the incidence of common cancers with food groups, total energy, smoking, and SEP. Materials and Methods: In an ecological study, disaggregated cancer data from the National Cancer Registry in Iran (2008) were correlated across the 22 districts of Tehran with dietary intake, smoking habits and SEP obtained through a population-based survey within the Urban Health Equity Assessment (Urban-HEART) project. Results: Energy-adjusted consumption of fruit, meat and dairy products was positively correlated with bladder, colorectal, prostate, breast and total cancers in men and women, while these cancers were inversely correlated with bread and fat intake. Prostate, breast, colorectal, bladder and ovarian cancers were also positively correlated with SEP; there was no correlation between SEP and skin cancer in either gender, or between SEP and stomach cancer in men. Conclusions: Cancer incidence was higher in some regions of Tehran, which appeared to be determined mainly by SEP rather than dietary intake. Further individual-level data are required to investigate the reasons for cancer clustering.

A Market Segmentation Scheme Based on Customer Information and QAP Correlation between Product Networks (고객정보와 상품네트워크 유사도를 이용한 시장세분화 기법)

  • Jeong, Seok-Bong;Shin, Yong Ho;Koo, Seo Ryong;Yoon, Hyoup-Sang
    • Journal of the Korea Society for Simulation / v.24 no.4 / pp.97-106 / 2015
  • Hybrid market segmentation techniques, which segment a market using both general variables and transaction-based variables, have recently been widely adopted. However, although their methodology and concept are easy to apply, these techniques can produce incorrect segmentation results. In this paper, we propose a novel scheme that overcomes this limitation and takes advantage of product information obtained from customers' transaction data. In this scheme, we first divide the whole market into several unit segments based on the general variables and then agglomerate the unit segments whose product networks have higher QAP correlations. Each product network represents the purchasing patterns of its corresponding segment; thus, the QAP correlation between the product networks of two segments is a good measure of the similarity between those segments. A case study was conducted to validate the proposed scheme. The results show that the scheme works effectively for Internet shopping malls.
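
The QAP correlation mentioned in this abstract can be illustrated with a minimal sketch. It assumes each product network is a square co-purchase matrix over the same products, and it uses the generic quadratic assignment procedure (Pearson correlation of off-diagonal cells, with a permutation test that relabels rows and columns together). The matrices, function names, and parameters are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean


def offdiag(matrix):
    """Flatten a square matrix, skipping the diagonal (self-ties)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def qap_correlation(a, b, permutations=1000, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(a)
    observed = pearson(offdiag(a), offdiag(b))
    hits = 0
    for _ in range(permutations):
        p = list(range(n))
        rng.shuffle(p)
        # Relabel the nodes of b: rows and columns are permuted together.
        permuted = [[b[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if pearson(offdiag(a), offdiag(permuted)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical co-purchase counts for four products in two unit segments.
net_a = [[0, 5, 1, 0], [5, 0, 2, 1], [1, 2, 0, 4], [0, 1, 4, 0]]
net_b = [[0, 4, 2, 0], [4, 0, 1, 1], [2, 1, 0, 5], [0, 1, 5, 0]]
r, p_value = qap_correlation(net_a, net_b, permutations=200)
```

Under this reading, unit segments whose networks give a high `r` would be candidates for merging. With only four products there are just 24 distinct relabelings, so the p-value here is coarse.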

A Study on Spatial Pattern of Impact Area of Intersection Using Digital Tachograph Data and Traffic Assignment Model (차량 운행기록정보와 통행배정 모형을 이용한 교차로 영향권의 공간적 패턴에 관한 연구)

  • PARK, Seungjun;HONG, Kiman;KIM, Taegyun;SEO, Hyeon;CHO, Joong Rae;HONG, Young Suk
    • Journal of Korean Society of Transportation / v.36 no.2 / pp.155-168 / 2018
  • As a preliminary step toward predicting short-term (for example, 5 or 10 minutes ahead) directional traffic volumes at intersections under interrupted flow, this study analyzed the directional patterns of vehicles entering an intersection from its upstream links and examined whether traffic volumes can be predicted using a traffic assignment model. Pattern similarity was investigated by cluster analysis of the ratios of traffic volume by intersection direction, divided into 2-hour periods, using one week of taxi DTG (Digital Tachograph) data. To link these results with the traffic assignment model, the impact areas within 5 minutes and 10 minutes of the center of the intersection were compared with the results of the taxi DTG analysis. For this purpose, we developed an algorithm that sets the impact area of an intersection using the taxi DTG data and the traffic assignment model. The analysis grouped the taxi intersection entry patterns into 12 clusters, and the Cubic Clustering Criterion, which indicates the confidence level of the clustering, was 6.92. In the correlation analysis with the impact areas from the traffic assignment model, the correlation coefficient for the 5-minute impact area was 0.86, a significant result. The correlation coefficient fell to 0.69 for the 10-minute impact area, which was attributed to insufficient accuracy of the O/D (Origin/Destination) travel and network data. If the accuracy of the traffic network and of time-dependent O/D traffic is improved in the future, traffic volume data calculated from a traffic assignment model are expected to be usable for controlling traffic signals at intersections.

Classification of Cultivation Region for Soybean (Glycine max [L.]) in South Korea Based on 30 Years of Weather Indices (평년기상을 활용한 우리나라의 콩 재배지역 구분)

  • Dong-Kyung Yoon;Jaesung Park;Jinhee Seo;Okjae Won;Man-Soo Choi;Hyeon Su Lee;Chaewon Lee
    • KOREAN JOURNAL OF CROP SCIENCE / v.69 no.1 / pp.49-60 / 2024
  • A region can be divided into cultivation zones based on homogeneity in the weather variables that have the greatest influence on crop growth and yield. This study classified the cultivation zones of soybean using weather indices, as a preliminary step toward classifying the agroclimatic zones of soybean. Meteorological factors affecting soybeans were determined through correlation analysis over a 10-year period (from 2013 to 2022) using data for the Miryang and Suwon regions collected from the soybean yield trial database of the Rural Development Administration, Korea, and the meteorological database of the Korea Meteorological Administration. Correlations between growth characteristics and the minimum temperature, daily temperature range, and precipitation were high during the vegetative growth stages. Moreover, correlations between yield components and the maximum temperature, daily temperature range, and precipitation were high during the reproductive growth stages. K-means clustering divided the soybean cultivation area into three zones. Zone 1 was the central inland region and southern Gyeonggi-do; Zone 2 was the southern part of the west coast, the southern part of the east coast, and the South Sea; and Zone 3 included parts of eastern Gyeonggi-do, Gangwon-do, and areas at high altitudes. Zone 1, which spans a wide latitude range, was further subdivided into three cultivation zones. The results of this study may provide useful information for estimating agrometeorological characteristics and predicting the success of soybean cultivation in South Korea.
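
As an illustration of the k-means step described in this abstract, here is a minimal sketch using plain Lloyd iterations with k = 3. The feature vectors stand in for standardized weather indices per location and are invented for the example; the study's actual variables, preprocessing, and software are not stated in the abstract.

```python
from math import dist


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points]


def kmeans(points, initial_centroids, max_iterations=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # New centroid is the coordinate-wise mean of the cluster members.
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids


# Hypothetical standardized indices per location: (temperature index, precipitation index).
locations = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.5, 0.0), (0.5, 0.1)]
zones, centers = kmeans(locations, [locations[0], locations[2], locations[4]])
```

Seeding the centroids from three chosen locations keeps the sketch deterministic; a real analysis would typically standardize the indices first and try several random initializations.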

A Study on the Prediction of Residual Probability of Fine Dust in Complex Urban Area (복잡한 도심에서의 유입된 미세먼지 잔류 가능성 예보 연구)

  • Park, Sung Ju;Seo, You Jin;Kim, Dong Wook;Choi, Hyun Jeong
    • Journal of the Korean earth science society / v.41 no.2 / pp.111-128 / 2020
  • This study examines the possibility that fine dust mass concentrations are intensified by complex urban structure, using a data mining technique and clustering analysis. The data mining technique showed no significant correlation between fine dust concentration and regional-use public urban data over Seoul. However, clustering analysis based on nationwide-use public data showed that building height (number of floors) is strongly correlated with PM10 in particular. Modeling analyses using a single canopy model and the micro-atmospheric modeling program (ENVI-met 4) indicated that restricted atmospheric convection in urban areas led to congested flow patterns that depend on the distribution and height of buildings. The complex structure of urban buildings controls convective activity, resulting in stagnant conditions and increased fine dust near the surface. Consequently, the residual effect of changes in the thermal environment caused by the shape and structure of urban buildings must be considered in the distribution of fine dust. Notably, atmospheric congestion may have important implications for providing information about the residual probability of fine dust mass concentration in complex urban areas.

Preference Prediction System using Similarity Weight granted Bayesian estimated value and Associative User Clustering (베이지안 추정치가 부여된 유사도 가중치와 연관 사용자 군집을 이용한 선호도 예측 시스템)

  • 정경용;최성용;임기욱;이정현
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.316-325 / 2003
  • Preference prediction methods based on existing collaborative filtering techniques use the nearest-neighbor method on users' item preferences and derive user similarity from the Pearson correlation coefficient. As a result, they do not reflect the content of items, nor do they solve the sparsity problem. This study proposes a preference prediction system that uses similarity weights assigned Bayesian estimated values together with associative user clustering to address the problems of existing collaborative preference prediction methods. The proposed method groups users by genre using the Association Rule Hypergraph Partitioning algorithm, and classifies a new user into one of these genres with a Naive Bayes classifier to solve the sparsity problem in the collaborative filtering system. In addition, to obtain the similarity between the users belonging to the classified genre and the new user, the method assigns different estimated values, learned through Naive Bayes, to the items that users have rated. Applying the preferences weighted by these estimated values to the existing Pearson correlation coefficient improves prediction precision by reducing the prediction error caused by missing values. To evaluate its performance, the proposed method is compared with existing collaborative filtering techniques. The results show that it improves prediction accuracy by solving the problems of existing collaborative filtering techniques.
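
The baseline that this abstract builds on, nearest-neighbor collaborative filtering with Pearson similarity, can be sketched as follows. This is the generic mean-centered weighted prediction, not the Bayesian-weighted variant the paper proposes; the rating data and function names are hypothetical.

```python
from math import sqrt


def pearson_sim(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = [item for item in ratings_a if item in ratings_b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den = sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common)
               * sum((ratings_b[i] - mean_b) ** 2 for i in common))
    return num / den if den else 0.0


def predict(ratings, user, item):
    """Predict a rating as the user's mean plus similarity-weighted neighbor deviations."""
    own = ratings[user]
    mean_user = sum(own.values()) / len(own)
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = pearson_sim(own, other_ratings)
        mean_other = sum(other_ratings.values()) / len(other_ratings)
        num += sim * (other_ratings[item] - mean_other)
        den += abs(sim)
    # With no usable neighbors (the sparsity case), fall back to the user's mean.
    return mean_user if den == 0 else mean_user + num / den


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 3, "d": 5},
    "u3": {"a": 1, "b": 5, "d": 2},
}
estimate = predict(ratings, "u1", "d")
```

The fallback branch shows where sparsity hurts: when few users share rated items, the similarity is undefined or unreliable, which is the gap the abstract's clustering and Bayesian weighting aim to fill.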

Pattern Analysis for Urban Spatial Distribution of Traffic Accidents in Jinju (진주시 교통사고의 도시공간분포패턴 분석)

  • Sung, Byeong Jun;Yoo, Hwan Hee
    • Journal of Korean Society for Geospatial Information Science / v.22 no.3 / pp.99-105 / 2014
  • Because traffic accidents, along with fires, account for the highest proportion of man-made disasters in urban areas, a more scientific analysis of their causes and a variety of prevention measures are needed. This study selected Jinju-si, a small-to-medium-sized regional city, as its research target and analyzed the temporal and spatial distribution of traffic accidents by associating data on accidents that occurred in 2013 with accident causes and with location information, including time of occurrence and seasonal features. It then examined the spatial correlation between traffic accidents and the characteristics of urban space development under land-use plans. By accident type, side right-angle collisions (car versus car) and pedestrian-crossing accidents (car versus pedestrian) showed the highest clustering in the density analysis and the average nearest neighbor analysis. In particular, traffic accidents occurred most often on roads connecting central commercial areas, high-density residential areas, and industrial areas. In addition, the highest clustering was found for accidents involving personal injury (by damage type), clear days (by weather), dry surfaces (by road condition), and three-way intersections (by road type).
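
The average nearest neighbor analysis named in this abstract is commonly computed as the ratio of the observed mean nearest-neighbor distance to the distance expected under a random pattern, 0.5 / sqrt(n / A). The sketch below assumes that standard definition, with no edge correction or significance test; the coordinates and study area are invented for illustration.

```python
from math import dist, sqrt


def average_nearest_neighbor_ratio(points, area):
    """Observed mean nearest-neighbor distance divided by the random-pattern expectation.

    A ratio below 1 suggests clustering; a ratio above 1 suggests dispersion.
    """
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    observed = sum(
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 / sqrt(n / area)
    return observed / expected


# Hypothetical accident locations (km) inside a 10 km x 10 km study area.
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
dispersed = [(2.5, 2.5), (7.5, 2.5), (2.5, 7.5), (7.5, 7.5)]
ratio_clustered = average_nearest_neighbor_ratio(clustered, area=100.0)
ratio_dispersed = average_nearest_neighbor_ratio(dispersed, area=100.0)
```

The ratio is sensitive to the study-area value, so comparisons between accident types are only meaningful when the same area is used throughout.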

A Study on the Cerber-Type Ransomware Detection Model Using Opcode and API Frequency and Correlation Coefficient (Opcode와 API의 빈도수와 상관계수를 활용한 Cerber형 랜섬웨어 탐지모델에 관한 연구)

  • Lee, Gye-Hyeok;Hwang, Min-Chae;Hyun, Dong-Yeop;Ku, Young-In;Yoo, Dong-Young
    • KIPS Transactions on Computer and Communication Systems / v.11 no.10 / pp.363-372 / 2022
  • Since the recent COVID-19 pandemic, the ransomware threat has intensified along with the expansion of remote work. Anti-virus companies are currently trying to respond to ransomware, but traditional file signature-based static analysis can be neutralized by diversification, obfuscation, variants, or the emergence of new ransomware. Various studies on ransomware detection are under way, with signature-based static analysis and behavior-based dynamic analysis as the main lines of current research. In this paper, the frequencies of opcodes in the ".text" section and of the Native APIs used in practice were extracted, and the association between the feature information selected using the K-means clustering algorithm, cosine similarity, and the Pearson correlation coefficient was analyzed. In addition, experiments that classified and detected worms, among other malware types, and Cerber-type ransomware verified that the selected feature information is specialized for detecting a specific ransomware (Cerber). Combining the finally selected feature information, applying it to machine learning, and performing hyperparameter optimization yielded a detection rate of up to 93.3%.
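
One of the measures listed in this abstract, cosine similarity between opcode frequency profiles, can be sketched as below. The opcode sequences are made up for illustration, and the paper's actual extraction from the ".text" section and its feature selection pipeline are not reproduced here.

```python
from collections import Counter
from math import sqrt


def opcode_frequencies(opcodes):
    """Count how often each opcode mnemonic appears in a sequence."""
    return Counter(opcodes)


def cosine_similarity(freq_a, freq_b):
    """Cosine of the angle between two sparse frequency vectors."""
    keys = set(freq_a) | set(freq_b)
    dot = sum(freq_a.get(k, 0) * freq_b.get(k, 0) for k in keys)
    norm_a = sqrt(sum(v * v for v in freq_a.values()))
    norm_b = sqrt(sum(v * v for v in freq_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical opcode sequences from three samples.
sample_a = opcode_frequencies(["mov", "push", "call", "mov"])
sample_b = opcode_frequencies(["mov", "mov", "push", "call"])
sample_c = opcode_frequencies(["xor", "jmp"])
similar = cosine_similarity(sample_a, sample_b)
different = cosine_similarity(sample_a, sample_c)
```

Because cosine similarity ignores vector length, two binaries of very different size can still score close to 1 when their opcode mix is proportionally the same.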

Visualized Determination for Installation Location of Monitoring Devices using CPTED (CPTED기법을 통한 모니터링 시스템 설치위치 시각화 결정법)

  • Kim, Joohwan;Nam, Doohee
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.15 no.2 / pp.145-150 / 2015
  • The safety of residents is an important need in a society characterized by urbanization, aging, and small households. People are looking for CPTED-based safety information systems and devices; accordingly, the demand for and installation of CCTV have increased steadily. However, there is no scientific analysis of the validity, systematic planning, or placement of security CCTV; devices are simply installed in areas where demand is higher. Ensuring residents' safety merely by increasing CCTV density has its limits. Crime is characterized by clustering and strong interconnectivity. Therefore, two years of crime data were geocoded as exploratory spatial data, and cluster analysis and spatial statistical analysis were carried out through GIS spatial analysis, with 18 variables divided into socio-economic, urban space, crime prevention facility, and crime occurrence indices. Nearest neighbor distance analysis and Ripley's K function showed clustering of the 5 major crimes, theft, violence, and sexual violence. Crime correlation analysis also showed strong interconnectivity among crimes. Where crime clusters are found, crime hotspots can be identified. This study therefore reviewed the concept of a hotspot and techniques for selecting hotspots, and then selected hotspots for the 5 major crimes, theft, violence, and sexual violence through Nearest Neighbor Hierarchical Spatial Clustering.

Performance Improvement of Collaborative Filtering System Using Associative User's Clustering Analysis for the Recalculation of Preference and Representative Attribute-Neighborhood (선호도 재계산을 위한 연관 사용자 군집 분석과 Representative Attribute-Neighborhood를 이용한 협력적 필터링 시스템의 성능향상)

  • Jung, Kyung-Yong;Kim, Jin-Su;Kim, Tae-Yong;Lee, Jung-Hyun
    • The KIPS Transactions:PartB / v.10B no.3 / pp.287-296 / 2003
  • Much research has focused on collaborative filtering techniques in recommender systems. However, these studies suffer from the first-rater problem and the sparsity problem. The main purpose of this paper is to solve these problems. We propose a method for predicting user preference that uses Bayesian estimated values and associative user clustering for the recalculation of preference. In addition, to compensate for the method's failure to consider item attributes, we use the Representative Attribute-Neighborhood method, which predicts preference by finding similar neighbors through extraction of the representative attributes that most affect preference. We improved efficiency by using associative user clustering analysis to calculate the preference for a specific item within the cluster item vector for the collaborative filtering algorithm. To address the sparsity and first-rater problems, associative users are clustered by genre using the Association Rule Hypergraph Partitioning algorithm, and new users are classified into one of these genres by a Naive Bayes classifier. Furthermore, to obtain the similarity between users belonging to the classified genre and new users, the method assigns different estimated values, learned through Naive Bayes, to the items that users have rated. Applying the preferences weighted by these estimated values to the Pearson correlation coefficient yields higher accuracy because errors caused by missing values are reduced. We evaluate our method on a large collaborative filtering database of user ratings, and it significantly outperforms previously proposed methods.