• Title/Summary/Keyword: Iris data

Design and Application of Genetic-Fuzzy System based on Grammatical Encoding (문법 코딩에 기반한 유전적 퍼지 시스템의 설계 및 응용)

  • Gil, Jun-Min;Go, Myeong-Suk;Hwang, Jong-Seon
    • Journal of KIISE: Software and Applications / v.28 no.1 / pp.31-45 / 2001
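  • When designing a fuzzy system, it is very important to select optimal fuzzy rules and to define fuzzy membership functions simply, without degrading the system's performance. To this end, this paper proposes a fuzzy rule structure that copes flexibly with growth of the input space by selecting as fuzzy rules only those that strongly influence the input space. In addition, to obtain an optimized fuzzy system structure through the evolutionary search of a genetic algorithm, it proposes a genetic-fuzzy system using grammatical encoding, in which the grammar rules that generate the structure of the fuzzy system are encoded as individuals (candidate solutions). Because grammar rules express the complex structure of fuzzy rules as a simple modular structure, encoding the grammar rules ensures fast convergence and efficient search by the genetic algorithm. The proposed method is applied to the Iris data problem, which has a large input space, and to a time series prediction problem, to demonstrate its applicability and analyze its performance. In the experiments, the proposed method outperformed other design methods that use direct encoding.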

Design of the Pattern Classifier using Fuzzy Neural Network (퍼지 신경 회로망을 이용한 패턴 분류기의 설계)

  • Kim, Moon-Hwan;Lee, Ho-Jae;Joo, Young-Hoon;Park, Jin-Bae
    • Proceedings of the KIEE Conference / 2003.07d / pp.2573-2575 / 2003
  • In this paper, we discuss a fuzzy neural network classifier with an immune algorithm. The classifier combines a fuzzy classifier with a neural network classifier based on fuzzy rules. To maximize classifier performance, the immune algorithm and the backpropagation algorithm are used. Simulation results on the Iris data demonstrate the generalized classification ability of the proposed classifier and its superiority over other classifiers.

Efficient Data Clustering using Fast Choice for Number of Clusters (빠른 클러스터 개수 선정을 통한 효율적인 데이터 클러스터링 방법)

  • Kim, Sung-Soo;Kang, Bum-Su
    • Journal of Korean Society of Industrial and Systems Engineering / v.41 no.2 / pp.1-8 / 2018
  • The K-means algorithm is one of the most popular and widely used clustering methods because it is easy to implement and very efficient. However, it must be run with a fixed number of clusters, because it evaluates clustering solutions using only the intra-cluster distance. The silhouette is a useful and stable validity index for choosing the clustering solution and the number of clusters for unsupervised data, because it considers both intra- and inter-cluster distances. However, this index carries a high computational burden because it computes a quality measure for every data object. The objective of this paper is to propose a fast and simple speed-up method that overcomes this limitation so that the silhouette can be used for effective large-scale data clustering. In the first step, the proposed method calculates and stores the distances between data objects once. In the second step, this distance matrix is used to calculate the relative distance rate ($V_j$) of each data object $j$, and this rate is used to choose a suitable number of clusters without much computation time. In the third step, the proposed efficient heuristic algorithm (group search optimization, GSO, in this paper) searches for the global optimum while saving computation, using $V_j$ probabilistically to obtain good initial solutions for the data clustering. Experiments and analysis using the Ruspini, Iris, Wine, and Breast Cancer datasets (UCI machine learning repository) show that the proposed method saves significant computation time compared with using the original silhouette alone. The improvement over the previous method is especially large for larger datasets.
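
The baseline that this abstract sets out to speed up can be sketched as follows: cache the pairwise distance matrix once, cluster for each candidate k, and keep the k with the highest mean silhouette, where each object's silhouette is (b - a) / max(a, b) for mean intra-cluster distance a and smallest mean distance b to another cluster. This is only a minimal illustration under stated assumptions. It does not reproduce the paper's relative distance rate $V_j$ or its GSO search, which the abstract does not specify; the farthest-point initialization and all function names are hypothetical choices made here for determinism.

```python
import math

def distance_matrix(points):
    """Compute every pairwise Euclidean distance once so it can be reused."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = math.dist(points[i], points[j])
    return dist

def farthest_point_init(dist, k):
    """Deterministic seeding (an assumption of this sketch, not the paper's method):
    start at point 0, then repeatedly add the point farthest from the chosen seeds."""
    chosen = [0]
    while len(chosen) < k:
        nxt = max(range(len(dist)), key=lambda i: min(dist[i][c] for c in chosen))
        chosen.append(nxt)
    return chosen

def kmeans(points, dist, k, iters=50):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    centers = [points[i] for i in farthest_point_init(dist, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def mean_silhouette(dist, labels):
    """Mean silhouette width, read entirely from the cached distance matrix."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    if len(clusters) < 2:
        return -1.0  # sentinel: the index is undefined for a single cluster
    total = 0.0
    for i, l in enumerate(labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

def choose_k(points, k_values):
    """Return the candidate k with the highest mean silhouette, plus all scores."""
    dist = distance_matrix(points)
    scores = {k: mean_silhouette(dist, kmeans(points, dist, k)) for k in k_values}
    return max(scores, key=scores.get), scores
```

Calling `choose_k(points, range(2, 6))` on a list of coordinate tuples returns the selected k and the per-k scores. The silhouette pass is quadratic in the number of objects for every candidate k, which is the cost the abstract aims to reduce.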

Hybrid Simulated Annealing for Data Clustering (데이터 클러스터링을 위한 혼합 시뮬레이티드 어닐링)

  • Kim, Sung-Soo;Baek, Jun-Young;Kang, Beom-Soo
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.2 / pp.92-98 / 2017
  • Data clustering determines groups of patterns in a dataset using a similarity measure and is one of the most important and difficult techniques in data mining. Clustering can be formally considered a particular kind of NP-hard grouping problem. The K-means algorithm, which is popular and efficient, is sensitive to initialization and can become stuck in a local optimum because it is a hill-climbing clustering method. It is also not computationally feasible in practice, especially for large datasets and large numbers of clusters. Therefore, a robust and efficient clustering algorithm is needed to find the global optimum rather than a local optimum, especially now that large amounts of data are collected from many IoT (Internet of Things) devices. The objective of this paper is to propose a new Hybrid Simulated Annealing (HSA) method that combines simulated annealing with K-means for non-hierarchical clustering of big data. Simulated annealing (SA) is useful for diversified search in a large search space, and K-means is useful for converged search in a predetermined search space. The proposed method balances intensification and diversification to find the global optimal solution in big data clustering. The performance of HSA is validated by experiment and analysis on the Iris, Wine, Glass, and Vowel datasets from the UCI machine learning repository, in comparison with previous studies. The proposed KSAK (K-means+SA+K-means) and SAK (SA+K-means) perform better than KSA (K-means+SA), SA, and K-means in our simulations. The method significantly improves the accuracy and efficiency of finding the global optimal clustering solution for complex, real-time, and costly data mining processes.
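
The K-means+SA+K-means ordering named in this abstract can be illustrated with a small sketch. The abstract does not give the cost function, neighborhood move, or cooling schedule, so everything below is an assumption: the cost is the within-cluster sum of squared errors, a move reassigns one random point to a random cluster, and the temperature decays geometrically. The function names, including `ksak`, are hypothetical labels for this illustration rather than the authors' implementation.

```python
import math
import random

def centroids(points, labels, k):
    """Mean of each cluster; an empty cluster is reported as None."""
    out = []
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        out.append(tuple(sum(x) / len(members) for x in zip(*members))
                   if members else None)
    return out

def sse(points, labels, k):
    """Within-cluster sum of squared distances to the cluster means (assumed cost)."""
    cs = centroids(points, labels, k)
    return sum(math.dist(p, cs[l]) ** 2 for p, l in zip(points, labels))

def kmeans_refine(points, labels, k, iters=100):
    """Lloyd iterations started from an existing labelling (intensification)."""
    labels = list(labels)
    for _ in range(iters):
        cs = centroids(points, labels, k)
        live = [c for c in range(k) if cs[c] is not None]
        new = [min(live, key=lambda c: math.dist(p, cs[c])) for p in points]
        if new == labels:
            break
        labels = new
    return labels

def anneal(points, labels, k, rng, t0=1.0, cooling=0.95, steps=500):
    """Simulated annealing over labellings (diversification); returns the best seen."""
    current, cur_cost = list(labels), sse(points, labels, k)
    best, best_cost = list(current), cur_cost
    t = t0
    for _ in range(steps):
        i = rng.randrange(len(points))
        old = current[i]
        current[i] = rng.randrange(k)  # neighbor: move one point to a random cluster
        new_cost = sse(points, current, k)
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if new_cost <= cur_cost or rng.random() < math.exp((cur_cost - new_cost) / t):
            cur_cost = new_cost
            if new_cost < best_cost:
                best, best_cost = list(current), new_cost
        else:
            current[i] = old  # reject the move
        t = max(t * cooling, 1e-9)  # geometric cooling with a floor
    return best

def ksak(points, k, seed=0):
    """K-means, then SA, then K-means again; returns the cheapest labelling found."""
    rng = random.Random(seed)
    start = [rng.randrange(k) for _ in points]
    first = kmeans_refine(points, start, k)
    annealed = anneal(points, first, k, rng)
    final = kmeans_refine(points, annealed, k)
    return min((first, annealed, final), key=lambda lab: sse(points, lab, k))
```

Because the sketch returns the cheapest of the three intermediate labellings, its cost is never worse than the initial K-means result from the same starting assignment. Whether it reaches a global optimum depends on the schedule and the data.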

A Study on the Distribution and Conservation Plan of Vascular Flora in Gyodong Island (교동도의 관속식물상 분포 및 보전방안 연구)

  • Yun, Ho-Geun;Kim, Sang-Jun;Lee, Jong-Won
    • Journal of Environmental Impact Assessment / v.31 no.1 / pp.11-46 / 2022
  • This study examined the vascular plants and invasive alien plants of Gyodong Island, located on the northwestern Civilian Control Line (CCL) in Ganghwa-gun, Incheon, to provide basic data for the systematic management of the identified plants and for establishing biodiversity conservation measures. The survey was conducted 13 times from April 2019 to August 2021. The vascular flora of Gyodong Island comprised 109 families, 378 genera, 641 species, 15 subspecies, 49 varieties, and 8 forms, for a total of 713 taxa, about 15.36% of the 4,641 vascular plant taxa in Korea. Northern-lineage plants of the Korean Peninsula occurring in the Gyodong Island area comprised 83 taxa, including Red-based leaf edge (Carex erythrobasis H.Lev. & Vaniot). Korean endemic plants comprised 16 taxa, such as Seoul wild-ginger [Asarum heterotropoides var. seoulense (Nakai) Kitag.], and a total of 20 taxa of rare plants designated by IUCN were observed, including the endangered-grade Beardless iris (Iris ruthenica Ker Gawl.). Floristic target species totaled 99 taxa: one taxon of grade V (Beardless iris), 8 taxa of grade IV, and 20 taxa of grade III. Invasive alien plants comprised 75 taxa, such as Verbesina alternifolia (L.) Britton ex Kearney. The naturalization rate was 10.51%, and the urbanization index was calculated as 23.29%. Because large-scale construction is currently underway on Hwagae Mt. in the surveyed area of Gyodong Island, the influx of invasive plants is expected to increase. Therefore, it is urgent to establish in-situ protection and conservation measures for notable plants such as Beardless iris and Water smartweed [Persicaria amphibia (L.) S.F.Gray].

Plurality Rule-based Density and Correlation Coefficient-based Clustering for K-NN

  • Aung, Swe Swe;Nagayama, Itaru;Tamaki, Shiro
    • IEIE Transactions on Smart Processing and Computing / v.6 no.3 / pp.183-192 / 2017
  • k-nearest neighbor (K-NN) is a well-known classification algorithm in machine learning that classifies a sample by its nearest training examples in the feature space. However, K-NN is a lazy learning method. Therefore, if a K-NN-based system depends on a huge amount of historical data to achieve accurate predictions for a particular task, its processing time gradually degrades. Many researchers consider only classification accuracy, but estimation speed also plays an essential role in real-time prediction systems. To compensate for this weakness, this paper proposes correlation coefficient-based clustering (CCC), aimed at improving the processing time of K-NN, and plurality rule-based density (PRD), aimed at improving estimation accuracy. For experiments, we used real datasets (breast cancer, breast tissue, heart, and iris) from the University of California, Irvine (UCI) machine learning repository. Real traffic data collected from Ojana Junction, Route 58, Okinawa, Japan, were also used to demonstrate the efficiency of the method. On these datasets, the new approach showed better processing-time performance than classical K-NN. In addition, in experiments on real-world datasets, we compared the prediction accuracy of our approach with that of density peaks clustering based on K-NN and principal component analysis (DPC-KNN-PCA).
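
As a point of reference for the processing-time problem described here, classical K-NN with a plurality vote can be sketched in a few lines. This shows only the baseline, not the paper's CCC or PRD methods, whose details the abstract does not give. The function name and the `(features, label)` pair layout are assumptions made for this illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by plurality vote among its k nearest training examples.

    `train` is a list of (feature_tuple, label) pairs. Nothing is learned ahead
    of time: every call measures the distance to every stored example, so the
    prediction cost grows with the size of the training set.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Each prediction scans and sorts the whole training set, which is the cost that grouping the training data beforehand, as the abstract proposes, is meant to reduce.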

Breast and Colorectal Cancer Screening and Associated Correlates among Chinese Older Women

  • Leung, Doris Y.P.;Leung, Angela Y.M.;Chi, Iris
    • Asian Pacific Journal of Cancer Prevention / v.13 no.1 / pp.283-287 / 2012
  • Objective: To explore the participation rates for breast and colorectal cancer screening and identify associated correlates among elderly women. Methods: Logistic regressions were conducted using data collected in 2006 from 1,533 elderly women aged 60 years or above who had completed a screening instrument, the Minimum Data Set-Home Care, while applying for long-term care services for the first time in Hong Kong. Results: The participation rates for breast and colorectal cancer screening among frail older Chinese women were 3.7% and 10.8% respectively. Cognitive status was inversely associated with the likelihood of participation in screening (breast: OR = 0.66, 95%CI = 0.47-0.94; colon: OR = 0.81, 95%CI = 0.66-0.99), as was educational level with the likelihood of participation in breast cancer screening (no formal education: OR = 0.20, 95%CI = 0.06-0.61; some primary education: OR = 0.31, 95%CI = 0.10-1.00). Conclusion: The delivery of cancer preventive health services to frail older women is less than ideal. Cognitive status and educational level were important factors in cancer screening behaviour. Tailor-made strategic promotion programmes targeting older women with low cognitive status and educational levels are needed to enhance awareness and acceptance within this vulnerable group.

Double K-Means Clustering (이중 K-평균 군집화)

  • 허명회
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.343-352 / 2000
  • In this study, the author proposes a nonhierarchical clustering method, called "Double K-Means Clustering", which clusters multivariate observations with the following algorithm. Step I: Carry out ordinary K-means clustering and obtain k temporary clusters with sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$, and pooled covariance matrix $S$. Step II-1: Allocate the observation $x_i$ to the cluster F if it satisfies ....., where $N$ is the total number of observations, for $i = 1, \ldots, N$. Step II-2: Update the cluster sizes $n_1, \ldots, n_k$, centroids $c_1, \ldots, c_k$, and pooled covariance matrix $S$. Step II-3: Repeat Steps II-1 and II-2 until the change becomes negligible. Double K-means clustering is nearly "optimal" under a mixture of k multivariate normal distributions with a common covariance matrix. It is also nearly affine invariant, with the data-analytic implication that variable standardization is largely unnecessary. The method is demonstrated numerically on Fisher's iris data.

Development of Data Fusion Human Identification System Based on Finger-Vein Pattern-Matching Method and Photoplethysmography Identification

  • Ko, Kuk Won;Lee, Jiyeon;Moon, Hongsuk;Lee, Sangjoon
    • International Journal of Internet, Broadcasting and Communication / v.7 no.2 / pp.149-154 / 2015
  • Biometric techniques for authentication using body features such as the fingerprint, face, iris, voice, finger vein, and photoplethysmography signal have become increasingly important in personal security, including door access control, financial security, electronic passports, and mobile devices. Finger-vein images are now used for human identification; however, they are difficult to recognize because they are captured under varying conditions, such as different temperatures and illumination, and with noise in the acquisition camera. The human photoplethysmography signal is also an important signal for human identification. In this paper, to increase the recognition rate, we develop a camera-based identification method that combines the finger-vein image and the photoplethysmography signal. We use a compact CMOS camera with a penetrating infrared LED light source to acquire finger-vein images and the photoplethysmography signal. In addition, we suggest a simple pattern-matching method to reduce the calculation time in embedded environments. The experimental results show that our simple system achieves good speed and accuracy for personal identification compared with using finger-vein images alone.

A Study on Environmental Information System for Hazard Identification of Air Pollutants (환경정보 검색 시스템의 활용에 관한 연구 : 대기오염 물질의 위험성 확인을 중심으로)

  • Kim, Sun-Jeong;Shin, Dong-Chun;Chung, Yong;Koo, Ja-Kon
    • Journal of Environmental Impact Assessment / v.5 no.1 / pp.107-121 / 1996
  • The objective of this study is to establish a method of applying environmental information systems to hazard identification for health risk assessment. To this end, fourteen hazardous chemicals were chosen and applied to databases and networks such as RTKNET (Right-to-Know Network), MSDS (Material Safety Data Sheets), TRI (Toxics Release Inventory), IRIS, and AIRS. Methods of searching for environmental information are classified into three categories: domestic commercial information companies, international database agencies, and the Internet. The importance of environmental information has recently been emphasized because the use of database systems is essential in the field of environmental studies. Most foreign research organizations communicate actively to exchange information and improve the quality of research. Data need to be accumulated and developed into databases for future research.
