• Title/Summary/Keyword: Clustering sampling

Testing Independence in Contingency Tables with Clustered Data (집락자료의 분할표에서 독립성검정)

  • 정광모;이현영
    • The Korean Journal of Applied Statistics / v.17 no.2 / pp.337-346 / 2004
  • The Pearson chi-square goodness-of-fit test and the likelihood ratio test are commonly used to test independence in two-way contingency tables under random sampling. Both tests, however, can give misleading results when the contingency table is built from clustered observations. For this case we consider the generalized linear mixed model, which includes random effects for clustering in addition to the fixed effects of covariates. Both the heterogeneity between clusters and the dependence within a cluster can be explained by a generalized linear mixed model. In this paper we introduce several types of generalized linear mixed models for testing independence in contingency tables with clustered observations. We also discuss fitting these models using a real dataset.
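
As a point of reference for the abstract above, the following is a minimal sketch of the ordinary Pearson chi-square statistic for a two-way table. It assumes independent observations, which is the assumption the abstract says fails for clustered data, so it illustrates the baseline test rather than the mixed-model approach of the paper. The function name and the example counts are hypothetical.

```python
def pearson_chi_square(table):
    """Return (statistic, degrees_of_freedom) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical 2x2 table of counts (not taken from the paper).
example_table = [[10, 20], [20, 10]]
chi2, dof = pearson_chi_square(example_table)
print(chi2, dof)
```

The statistic is compared with a chi-square distribution on `dof` degrees of freedom; with clustered observations that reference distribution is no longer appropriate, which is the motivation given in the abstract.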

Multi-Objective Optimization of a Dimpled Channel Using NSGA-II (NSGA-II를 통한 딤플채널의 다중목적함수 최적화)

  • Lee, Ki-Don;Samad, Abdus;Kim, Kwang-Yong
    • Proceedings of the Korean Society for Computational Fluids Engineering Conference (한국전산유체공학회:학술대회논문집) / 2008.03b / pp.113-116 / 2008
  • This work presents a numerical optimization of staggered arrays of dimples printed on opposite surfaces of a cooling channel, using the fast and elitist Non-dominated Sorting Genetic Algorithm (NSGA-II) for multi-objective optimization. Because the Pareto-optimal front yields a set of optimal solutions, the trends of the objective functions with respect to the design variables are predicted by a hybrid multi-objective evolutionary algorithm. The problem is defined by three non-dimensional geometric design variables formed from the dimpled channel height, dimple print diameter, dimple spacing, and dimple depth, with the aim of maximizing the heat transfer rate as a trade-off against the pressure drop. Twenty designs generated by Latin hypercube sampling were evaluated with a Reynolds-averaged Navier-Stokes solver, and the evaluated objectives were used to construct the Pareto-optimal front through the hybrid multi-objective evolutionary algorithm. The optimum designs were grouped by k-means clustering, and some of the clustered points were evaluated by flow analysis. As the dimple depth increases, both the heat transfer rate and the pressure drop increase, while the opposite behavior is obtained for the dimple spacing. The heat transfer performance is related to the vertical motion of the flow and the reattachment length in the dimple.
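
The sampling step in the abstract above can be sketched as follows. This is a generic Latin hypercube sampler on the unit cube, assuming twenty designs and three design variables as stated; the function name, the fixed seed, and the scaling to [0, 1) are assumptions, since the abstract does not give the variable bounds.

```python
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims, one per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        # One random point inside each of the n_samples equal-width strata.
        strata = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so the stratum order is independent across dimensions.
        rng.shuffle(strata)
        columns.append(strata)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


rng = random.Random(0)  # arbitrary fixed seed for reproducibility
designs = latin_hypercube(20, 3, rng)
for point in designs[:3]:
    print(point)
```

Each coordinate would then be rescaled to the physical range of its design variable before the flow solver is run.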

Determinants of Consumer Preference by type of Accommodation: Two Step Cluster Analysis (이단계 군집분석에 의한 농촌관광 편의시설 유형별 소비자 선호 결정요인)

  • Park, Duk-Byeong;Yoon, Yoo-Shik;Lee, Min-Soo
    • Journal of Global Scholars of Marketing Science / v.17 no.3 / pp.1-19 / 2007
  • 1. Purpose: Rural tourism is undertaken by individuals with different characteristics, needs, and wants. It is important to have information on the characteristics and preferences of the consumers of the different types of existing rural accommodation. The study aims to identify the determinants of consumer preference by type of accommodation. 2. Methodology. 2.1 Sample: Data were collected from 1,000 people by telephone survey, using three-stage stratified random sampling in seven metropolitan areas in Korea. Respondents were chosen at a fixed sampling interval from the telephone book published in 2006. Calls were made between 4:00 and 10:30 p.m. so that the systematic sampling took respondents' life cycle into account. 2.2 Two-step cluster analysis: The study uses a two-step cluster method to classify the accommodation into a reduced number of groups, so that each group constitutes a type. This method has been suggested as appropriate for clustering large data sets with mixed attributes. It is based on a distance measure that enables data with both continuous and categorical attributes to be clustered. The measure is derived from a probabilistic model in which the distance between two clusters is equivalent to the decrease in the log-likelihood function that results from merging them. 2.3 Multinomial logit analysis: Estimating a multinomial logit model determines the characteristics of the tourists who are most likely to opt for each type of accommodation. The multinomial logit model is an appropriate framework for exploring and explaining choice processes in which the choice set consists of more than two alternatives. Because its parameters are easy and quick to estimate, the multinomial logit model has been used in many empirical studies of choice in tourism. 3. Findings: The auto-clustering algorithm indicated that a five-cluster solution was the best model, because it minimized the BIC value and the change in BIC between adjacent numbers of clusters.
The accommodation establishments can be classified into five types: Traditional House, Typical Farmhouse, Farmstay House for Group Tour, Log Cabin for Family, and Log Cabin for Individuals. Group 1 (Traditional House) includes mainly the large accommodation establishments, i.e. houses of original construction style with ondol-style rooms, meals provided, and one shower room per family. Group 2 (Typical Farmhouse) encompasses accommodation establishments with ondol rooms, a bathroom for each room, and meals provided. It includes, in other words, the tourist accommodations known as "rural houses." Group 3 (Farmstay House for Group) has accommodation establishments with ondol rooms, no meals provided, self-cooking facilities, and large rooms for more than five persons. Group 4 (Log Cabin for Family) includes mainly the popular accommodation establishments, i.e. western-style log houses with ondol-style rooms and one shower room per family. While the accommodations in this group are not defined as regards type of construction, the group does include all the original Korean-style construction. Finally, Group 5 (Log Cabin for Individuals) includes western-style wooden houses with bedrooms and a bathroom for each room. First, a multinomial logit model is estimated that includes all the explanatory variables considered and takes accommodation Group 2 as the base alternative. The results give the variables and the estimated parameter values of the model for the probability of choosing each of the five types of accommodation available in rural tourism villages in Korea, according to the socio-economic and trip-related characteristics of the individuals. An initial reading of the analysis reveals that none of the variables income, number of journeys, distance, and residential housing style is explanatory in the choice of rural accommodation. The age and companion variables are significant for accommodation establishments of Group 1.
The education and rural residential experience variables are significant for accommodation establishments of Groups 4 and 5. The expenditure and marital status variables are significant for accommodation establishments of Group 4. The gender and occupation variables are significant for accommodation establishments of Group 3. The loyalty variable is significant for accommodation establishments of Groups 3 and 4. The study indicates that significant differences exist among the individuals who choose each type of accommodation at a destination. From this investigation it is evident that several tourist profiles can be attracted to a rural destination according to the types of accommodation available there. In addition, the tourist profiles may be used as the basis for investment policy and promotion for each type of accommodation, making use in each case of the variables that indicate a greater likelihood of influencing the tourist's choice of accommodation.
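
To make the role of the base alternative in the multinomial logit model concrete, here is a minimal sketch of how choice probabilities follow from estimated coefficients. The abstract states that Group 2 is the base alternative; everything else here (the function name, the covariate vector, and the coefficient values) is hypothetical and not taken from the paper.

```python
import math


def multinomial_logit_probs(x, coefficients):
    """Choice probabilities with the base alternative's utility fixed at 0.

    x: covariate values for one individual (include 1.0 for an intercept).
    coefficients: one coefficient list per non-base alternative.
    Returns [P(base), P(alternative 1), ..., P(alternative J-1)].
    """
    utilities = [sum(b * v for b, v in zip(beta, x)) for beta in coefficients]
    # Subtract the largest utility (the base has utility 0) for numerical stability.
    shift = max([0.0] + utilities)
    exp_base = math.exp(-shift)
    exp_alts = [math.exp(u - shift) for u in utilities]
    total = exp_base + sum(exp_alts)
    return [exp_base / total] + [e / total for e in exp_alts]


# Hypothetical individual: intercept term plus one standardized covariate.
x = [1.0, 0.5]
# Hypothetical coefficients for the four non-base accommodation groups.
betas = [[0.2, -0.4], [-0.1, 0.3], [0.0, 0.8], [-0.5, 0.1]]
probs = multinomial_logit_probs(x, betas)
print([round(p, 3) for p in probs])
```

A coefficient is read relative to the base: a positive value raises the odds of choosing that group rather than the base group as the covariate increases.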

Spatial Pattern Analysis for Distribution of Migratory Insect Pests at Paddy Field in Jeolla-province (전라도 지역 논벼에서 비래해충 개체군 분포의 공간패턴분석)

  • Park, Taechul;Choe, Hojeong;Jeong, Hyoujin;Jang, Hojung;Kim, Kwang Ho;Park, Jung-Joon
    • Korean journal of applied entomology / v.57 no.4 / pp.361-372 / 2018
  • Migratory insect pest populations migrate from southern China to Korea on jet streams. In Korea, five major migratory insect species are important, namely Nilaparvata lugens, Sogatella furcifera, Laodelphax striatellus, Cnaphalocrocis medinalis, and Mythimna separata, all of which damage rice, the major crop. This study was conducted from late July to early September 2016 and from July to August 2017 in rice paddies in Jeolla-province. C. medinalis and M. separata were collected using pheromone traps, while N. lugens, S. furcifera, and L. striatellus were collected using three methods (visual surveys, sweeping surveys, and sticky traps). SADIE (Spatial Analysis by Distance IndicEs), a geostatistical method, was used to analyze the spatial distribution of the migratory insect pests through the index of aggregation $I_a$ and the clustering indices $V_i$ and $V_j$. The clustering indices were also mapped as red-blue plots. C. medinalis and M. separata showed different distributions in the SADIE spatial aggregation analysis and the red-blue plot analysis. The initial spatial distributions of L. striatellus and the other planthoppers differed by sampling location and time.

A Study on Heavy Rainfall Guidance Realized with the Aid of Neuro-Fuzzy and SVR Algorithm Using AWS Data (AWS자료 기반 SVR과 뉴로-퍼지 알고리즘 구현 호우주의보 가이던스 연구)

  • Kim, Hyun-Myung;Oh, Sung-Kwun;Kim, Yong-Hyuk;Lee, Yong-Hee
    • The Transactions of The Korean Institute of Electrical Engineers / v.63 no.4 / pp.526-533 / 2014
  • In this study, we introduce a design methodology for developing guidance on issuing heavy rainfall warnings using both RBFNNs (radial basis function neural networks) and an SVR (support vector regression) model, and then carry out a comparative study of the two pattern classifiers. Each classifier is designed as an architecture realized with the aid of optimization and pre-processing algorithms. Because the predictive performance of existing heavy rainfall forecast systems is commonly affected by the various techniques used to process meteorological data, under-sampling is used to pre-process the input data; data discretization and feature extraction are used for SVR, and FCM clustering and PSO are used for the RBFNNs. Observed AWS (Automatic Weather Station) data supplied by the KMA (Korea Meteorological Administration) are used to train and test the proposed classifiers. The proposed classifiers provide the information needed to issue a heavy rain warning 1 to 3 hours in advance, using the selected meteorological data and the precipitation accumulated over 1 to 12 hours from the AWS data. To evaluate each classifier, the ETS (Equitable Threat Score) is used as the standard verification measure of predictive ability. The comparative study of the two classifiers shows that the neuro-fuzzy method is effective in improving performance and gives stable predictions for guidance on issuing heavy rainfall warnings.
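
The verification measure named in the abstract above can be sketched from a 2x2 forecast contingency table. The code uses the standard definition of the Equitable Threat Score, in which the hits expected by chance are subtracted from both numerator and denominator; the function name and the example counts are hypothetical and not taken from the paper.

```python
def equitable_threat_score(hits, misses, false_alarms, correct_negatives):
    """ETS = (H - H_random) / (H + M + FA - H_random)."""
    total = hits + misses + false_alarms + correct_negatives
    if total == 0:
        raise ValueError("contingency table is empty")
    # Hits expected from a random forecast with the same marginal totals.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denominator = hits + misses + false_alarms - hits_random
    if denominator == 0:
        return float("nan")  # undefined, e.g. no events forecast or observed
    return (hits - hits_random) / denominator


# Hypothetical verification counts for a warning system.
ets = equitable_threat_score(hits=50, misses=20, false_alarms=30, correct_negatives=900)
print(round(ets, 4))
```

A perfect forecast scores 1, and a forecast with no skill beyond chance scores 0.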

Preference Differences in Interior Images of Restaurants according to Lifestyles (라이프스타일 유형에 따른 레스토랑 실내이미지 선호도 차이에 관한 연구)

  • Kim, Tae-Hee;Park, Young-Seok
    • Journal of the Korean Home Economics Association / v.43 no.10 s.212 / pp.69-79 / 2005
  • The purpose of this study was to determine differences in restaurant patrons' preferences for restaurant interior design styles according to their lifestyles. Written questionnaires were handed out to 500 adults in Seoul and its surroundings, with respondents selected by convenience sampling. The questionnaire covered respondents' general characteristics, lifestyles, and preferences for 10 types of interior design style. A total of 415 questionnaires were usable for data analysis, a response rate of $83\%$. To analyze the collected data, frequency analysis, factor analysis, reliability analysis, K-means quick clustering, and one-way ANOVA were conducted using SPSS 10.0. The results showed preference differences across the 10 types of restaurant interior design style according to lifestyle type, with lifestyles categorized into 4 groups. The conservative and self-convinced group showed the lowest preference scores for the 10 types of interior design style, which are the Romantic, Ethnic, Classic, High-Tech, Elegant, Country, Modern, Minimal, Natural, and Casual styles. The quality-life-pursuing group and the extroverted individuality group showed high preference scores for most of the styles, especially the Classic and Elegant styles. The realistic self-centered group showed the highest preference score for the Casual style among the 4 groups. These findings indicate that restaurants should take their patrons' lifestyles into account as a means of market segmentation, and respond to their tastes and preferences when establishing a suitable servicescape.

Assessment through Statistical Methods of Water Quality Parameters(WQPs) in the Han River in Korea

  • Kim, Jae Hyoun
    • Journal of Environmental Health Sciences / v.41 no.2 / pp.90-101 / 2015
  • Objective: This study was conducted to develop a chemical oxygen demand (COD) regression model using water quality monitoring data (January 2014) obtained from the Han River auto-monitoring stations. Methods: Surface water quality data from 198 sampling stations along the six major areas were assembled and analyzed to determine the spatial distribution and clustering of monitoring stations based on 18 WQPs, and to build regression models from selected parameters. Statistical techniques, including a combined genetic algorithm-multiple linear regression (GA-MLR), cluster analysis (CA), and principal component analysis (PCA), were used to build a COD model from the water quality data. Results: The best GA-MLR model yielded a 5-descriptor COD model with satisfactory statistical results ($r^2=92.64$, $Q^2_{LOO}=91.45$, $Q^2_{Ext}=88.17$). This approach includes variable selection among the WQPs in order to find the most important factors affecting water quality. Additionally, ordination techniques such as PCA and CA were used to classify the monitoring stations. The biplot based on the first two principal components (PCs) of the PCA model identified three distinct groups of stations that also differ in their correlation with the WQPs, which enables better interpretation of the water quality characteristics at particular stations as of January 2014. Conclusion: This data analysis procedure appears to provide an efficient means of modelling water quality by interpreting and defining its most essential variables, such as TOC and BOD. The water parameters selected in the COD model as contributing most to environmental health and water pollution can be used in water quality management strategies. At present, the river is under threat from anthropogenic disturbances during festival periods, especially in upstream areas.
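
The leave-one-out statistic $Q^2_{LOO}$ quoted in the abstract above can be illustrated with a deliberately simplified sketch. It assumes a single descriptor and ordinary least squares, whereas the paper's model has five descriptors chosen by a genetic algorithm, and it uses one common definition, $Q^2 = 1 - \mathrm{PRESS}/SS_{total}$ with the total sum of squares taken about the overall mean. The function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def q2_leave_one_out(xs, ys):
    """Q^2_LOO = 1 - PRESS / total sum of squares about the overall mean."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without observation i, then predict the held-out value.
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (intercept + slope * xs[i])) ** 2
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Hypothetical descriptor and response values (not from the paper).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]
print(round(q2_leave_one_out(xs, ys), 4))
```

Because each prediction is made for a point the model has not seen, $Q^2_{LOO}$ is typically lower than the fitted $r^2$, which matches the ordering of the values reported in the abstract.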

Geographic Variations between Jedo Venus Clam (Protothaca jedoensis, Lischke) Populations from Boryeong and Wonsan of Korea

  • Park, Gi-Sik;Yoon, Jong-Man
    • The Korean Journal of Malacology / v.24 no.1 / pp.11-24 / 2008
  • Genomic DNA (gDNA) was isolated from jedo venus clams (Protothaca jedoensis, Lischke) from Boryeong (jedo venus clam from Boryeong; JVCB) and Wonsan (jedo venus clam from Wonsan; JVCW), located on the West Sea and the East Sea of the Korean Peninsula, respectively, and clustering analyses were performed together with analyses of DNA polymorphisms and population genetic variation. In the present study, seven decamer primers generated 111 major/minor specific bands in the JVCB population and 94 specific bands in the JVCW population. The seven primers generated 176 unique bands shared within the population (an average of 25.1 per primer) in the JVCB population from Boryeong and 330 (an average of 47.1 per primer) in the JVCW population from Wonsan. The dendrogram obtained with the seven oligonucleotide primers indicates two genetic clusters. In particular, the individuals WONSAN no. 12 and BORYEONG no. 10 showed the longest genetic distance (0.537) among the individuals compared. Accordingly, RAPD analysis showed that the JVCB population was slightly more genetically diverse than the JVCW population. This result implies genetic similarity within the JVCW population owing to rearing in the same or similar circumstances, or to inbreeding. In other words, the JVCB population may have high levels of genomic DNA variability owing to the introduction of wild individuals from other sites into the sampling sites, although the geographically diverse distribution of this species may also play a part. However, this study confirmed that this was not actually the case. RAPD analysis revealed a significant genetic distance between the two Protothaca populations (P<0.001). The existence of population discrimination and genetic diversity between the two Protothaca populations was identified by RAPD analysis.

Automatic Detection of Foreign Body through Template Matching in Industrial CT Volume Data (산업용 CT 볼륨데이터에서 템플릿 매칭을 통한 이물질 자동 검출)

  • Ji, Hye-Rim;Hong, Helen
    • Journal of Korea Multimedia Society / v.16 no.12 / pp.1376-1384 / 2013
  • In this paper, we propose a method for automatically detecting foreign bodies through template matching in industrial CT volume data. Our method consists of three main steps. First, in the down-sampled data, the product region is separated from the background after noise reduction, and initial foreign-body candidates are extracted using the mean and standard deviation of the product region. Foreign-body candidates are then extracted using K-means clustering. Second, foreign bodies whose intensity differs from that of the product region are detected using template matching. Here, template matching is performed by evaluating the SSD or the joint entropy, depending on the size of the detected foreign-body candidates. Third, to improve the detection rate of foreign bodies in the original volume data, the final foreign bodies are detected using a percolation method. Industrial CT volume data and simulation data are used to evaluate the performance of our method. Visual inspection and accuracy assessment are performed, and processing time is measured. For the accuracy assessment, a density-based detection method is used for comparison and Dice's coefficient is measured.
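
Two of the quantities named in the abstract above, the SSD matching cost and Dice's coefficient, can be sketched as follows. This is a simplified 2D illustration on nested lists rather than the 3D CT volumes used in the paper, and it covers only exhaustive SSD matching, not the joint-entropy variant or the percolation step. All function names and the toy data are hypothetical.

```python
def sum_squared_differences(patch, template):
    """SSD between two equally sized 2D arrays given as nested lists."""
    return sum(
        (p - t) ** 2
        for patch_row, template_row in zip(patch, template)
        for p, t in zip(patch_row, template_row)
    )


def best_match_ssd(image, template):
    """Slide the template over the image; return ((row, col), ssd) with the lowest SSD."""
    th, tw = len(template), len(template[0])
    best = None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            ssd = sum_squared_differences(patch, template)
            if best is None or ssd < best[1]:
                best = ((r, c), ssd)
    return best


def dice_coefficient(a, b):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two sets of pixel coordinates."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy 4x4 image with a bright 2x2 region whose top-left corner is at row 1, column 2.
image = [
    [0, 0, 0, 0],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 0, 0],
]
template = [[9, 9], [9, 9]]
position, cost = best_match_ssd(image, template)

# Compare a detected region with a hypothetical ground-truth region.
detected = {(1, 2), (1, 3), (2, 2), (2, 3)}
truth = {(1, 3), (2, 2), (2, 3), (3, 3)}
overlap = dice_coefficient(detected, truth)
print(position, cost, overlap)
```

A lower SSD means a closer match, while Dice's coefficient ranges from 0 (no overlap) to 1 (identical regions).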

Relationships of Colorectal Cancer with Dietary Factors and Public Health Indicators: an Ecological Study

  • Abbastabar, Hedayat;Roustazadeh, Abazar;Alizadeh, Ali;Hamidifard, Parvin;Valipour, Mehrdad;Valipour, Ali Asghar
    • Asian Pacific Journal of Cancer Prevention / v.16 no.9 / pp.3991-3995 / 2015
  • Background: Colorectal cancer (CRC) is the third most common cancer in Iranian women and the fifth most common in men. The aim of this study was to investigate the relation of dietary factors and public health indicators to its development. Materials and Methods: The required information (2001-2006) on risk factors was obtained from the Non-Communicable Disease Surveillance Centre (NCDSC) of Iran. Risk factor data (RFD) from 89,404 individuals (15-64 years old) were gathered by questionnaire and laboratory examination in a cross-sectional study covering all provinces, using a systematic cluster sampling method. CRC incidence by age and gender was obtained from the Cancer Registry of the Ministry of Health (CRMH) of Iran. First, correlation coefficients were used for data analysis, and then multiple regression analysis was performed to control for confounding factors. Results: Colorectal cancer incidence showed a positive relationship with diabetes mellitus, hypertension, lacking or low physical activity, high education, high intake of dairy products, and non-consumption of vegetables and fruits. Conclusions: We concluded that many dietary factors and public health indicators have positive relationships with CRC and might therefore be targets of preliminary prevention. However, since this is an ecological study limited by potential ecological fallacy, the results must be interpreted with caution.