• Title/Summary/Keyword: Clustering Problem


Isolated Word Recognition Using k-clustering Subspace Method and Discriminant Common Vector (k-clustering 부공간 기법과 판별 공통벡터를 이용한 고립단어 인식)

  • Nam, Myung-Woo
    • Journal of the Institute of Electronics Engineers of Korea TE
    • /
    • v.42 no.1
    • /
    • pp.13-20
    • /
    • 2005
  • In this paper, Korean isolated words are recognized using the Common Vector Extraction Method (CVEM) suggested by M. Bilginer et al. CVEM extracts the common properties of training voice signals easily, requires no complex calculation, and shows high recognition accuracy. However, CVEM has two problems: it cannot use a large number of training voices, and the extracted common vectors carry no discriminant information. To obtain optimal common vectors for a given voice class, a wide variety of voices should be used in training; but because CVEM is limited in the number of training voices it can use, it cannot sustain high recognition accuracy, and the absence of discriminant information among the common vectors can be a source of critical errors. To solve these problems and improve the recognition rate, a k-clustering subspace method and a Discriminant Common Vector Extraction Method (DCVEM) are suggested, and various experiments on a voice signal database built by ETRI were performed to prove the validity of the suggested methods. The experimental results show improved performance, and with the proposed methods all of the CVEM problems can be solved without computational difficulty.
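
The core of the common-vector idea can be illustrated briefly. The sketch below is a hedged, simplified illustration (not the authors' code): within one word class, difference vectors between training samples span a "difference subspace", and removing that subspace's component from any sample leaves the common vector shared by the whole class.

```python
import numpy as np

def common_vector(samples):
    """samples: (m, d) array of m feature vectors for one word class."""
    ref = samples[0]
    diffs = samples[1:] - ref                  # difference vectors, (m-1, d)
    # orthonormal basis of the difference subspace via SVD
    u, s, vt = np.linalg.svd(diffs, full_matrices=False)
    basis = vt[s > 1e-10]                      # rows span the difference subspace
    proj = basis.T @ (basis @ ref)             # component of ref inside that subspace
    return ref - proj                          # component common to all samples

# toy class: a shared "core" plus per-sample variation in two noise directions
rng = np.random.default_rng(0)
core = rng.normal(size=8)
noise_dirs = rng.normal(size=(2, 8))
X = (core
     + rng.normal(size=(4, 1)) * noise_dirs[0]
     + rng.normal(size=(4, 1)) * noise_dirs[1])
cv = common_vector(X)
```

Note the defining property: the result does not depend on which sample serves as the reference, since any two samples differ only by a vector inside the difference subspace.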

A Heuristic Algorithm for Designing Traffic Analysis Zone Using Geographic Information System (Vector GIS를 이용한 교통 Zone체계 알고리즘 개발 방안에 관한 연구)

  • Choi, Kee-Choo
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.3 no.1 s.5
    • /
    • pp.91-104
    • /
    • 1995
  • The spatial aggregation of data, in transportation and other planning processes, is an important theoretical consideration because the results of any analysis are not entirely independent of the delineation of zones. Moreover, using a different spatial aggregation may lead to different, and sometimes contradictory, conclusions. Two criteria have been considered important in designing zone systems: scale and aggregation. The scale problem arises from uncertainty about the number of zones needed for a study, and the aggregation problem arises from uncertainty about how the data are to be aggregated to form a given scale. In a transportation study, especially in the design of traffic analysis zones (TAZ), the scale problem is directly related to the number of zones, while the aggregation problem involves spatial clustering that meets the general requirements of a zone system, such as equal traffic generation, convexity, and consistency with political boundaries. In this study, first, a comparative review of methods for delineating spatial units is given. Second, a FORTRAN-based heuristic algorithm for designing TAZ from socio-economic data has been developed and applied to the Korean peninsula, containing 132 micro parcels. The vector-type ARC/INFO GIS topological data model was used to provide the adjacency information between parcels. The results, however, leave something to be desired in overcoming such problems as non-convexity of the agglomerated TAZ system and uneven traffic generation across TAZs.
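
The aggregation step described above can be sketched as a greedy merge of adjacent parcels. This is a hedged illustration of the general idea, not the paper's FORTRAN heuristic; the parcel ids, adjacency, and trip-generation values are illustrative assumptions.

```python
def aggregate_zones(trips, adjacency, target_zones):
    """trips: {parcel: trip generation}; adjacency: {parcel: set of neighbours}.
    Repeatedly merge the lightest zone with its lightest neighbour until the
    desired number of zones (the "scale") is reached."""
    zones = {p: {p} for p in trips}            # each parcel starts as its own zone
    load = dict(trips)
    adj = {p: set(n) for p, n in adjacency.items()}
    while len(zones) > target_zones:
        small = min(zones, key=lambda z: load[z])
        nbrs = [n for n in adj[small] if n in zones]
        if not nbrs:
            break                              # isolated zone: cannot merge further
        mate = min(nbrs, key=lambda z: load[z])
        zones[small] |= zones.pop(mate)        # absorb the neighbour
        load[small] += load.pop(mate)
        adj[small] = (adj[small] | adj.pop(mate)) - {small, mate}
        for n in adj:                          # redirect adjacency to the merged zone
            if mate in adj[n]:
                adj[n].discard(mate)
                adj[n].add(small)
    return zones, load

# toy example: five parcels along a corridor
trips = {1: 5, 2: 1, 3: 2, 4: 8, 5: 3}
adjacency = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
zones, load = aggregate_zones(trips, adjacency, target_zones=2)
```

Merging only across shared boundaries keeps each zone spatially contiguous, though (as the abstract notes) it does not guarantee convexity or perfectly even traffic generation.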


Detection of Forest Fire Damage from Sentinel-1 SAR Data through the Synergistic Use of Principal Component Analysis and K-means Clustering (Sentinel-1 SAR 영상을 이용한 주성분분석 및 K-means Clustering 기반 산불 탐지)

  • Lee, Jaese;Kim, Woohyeok;Im, Jungho;Kwon, Chunguen;Kim, Sungyong
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.5_3
    • /
    • pp.1373-1387
    • /
    • 2021
  • Forest fire poses a significant threat to the environment and society, affecting the carbon cycle and surface energy balance and resulting in socioeconomic losses. Widely used multi-spectral satellite image-based approaches for burned area detection have the problem that they do not work under cloudy conditions. Therefore, in this study, Sentinel-1 Synthetic Aperture Radar (SAR) data from the European Space Agency, which can be collected in all weather conditions, were used to identify forest fire damaged areas through a series of processes including Principal Component Analysis (PCA) and K-means clustering. Four forest fire cases were examined, which occurred in Gangneung·Donghae and Goseong·Sokcho in Gangwon-do, South Korea, and in two areas of North Korea on April 4, 2019. The estimated burned areas were evaluated using fire reference data provided by the National Institute of Forest Science (NIFOS) for the two South Korean cases, and the differenced normalized burn ratio (dNBR) for all four cases. The average accuracy using the NIFOS reference data was 86% for the Gangneung·Donghae and Goseong·Sokcho fires. Evaluation using dNBR showed an average accuracy of 84% over all four forest fire cases. It was also confirmed that the stronger the burn intensity, the higher the detection accuracy, and vice versa. Given the advantages of SAR remote sensing, the proposed statistical processing and K-means clustering-based approach can be used to quickly identify forest fire damaged areas across the Korean Peninsula, where cloud cover is frequent and small-scale forest fires occur often.
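
The PCA-then-K-means chain can be sketched in a few lines. This is a hedged toy version on simulated backscatter values (the paper's Sentinel-1 ingestion, speckle filtering, and exact pre-processing are omitted); the burn-related backscatter drop and channel statistics are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
burned = rng.random(n) < 0.3                   # 30% of pixels burn (simulated truth)
vv = rng.normal(-8.0, 0.7, n)                  # dB backscatter, two channels
vh = rng.normal(-14.0, 0.7, n)
vv[burned] -= 3.0                              # simulated post-fire backscatter drop
vh[burned] -= 3.0
X = np.column_stack([vv, vh])

# PCA: centre the data, then rotate onto the principal axes via SVD
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ vt.T                             # principal-component scores

def kmeans(data, k=2, iters=50, seed=0):
    r = np.random.default_rng(seed)
    centers = data[r.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((data[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

labels = kmeans(scores, k=2)
if (labels == burned).mean() < 0.5:            # align cluster ids with "burned"
    labels = 1 - labels
accuracy = (labels == burned).mean()
```

Because the labels are unsupervised, the final flip step maps whichever cluster matches the burned class before scoring.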

Credit Card Bad Debt Prediction Model based on Support Vector Machine (신용카드 대손회원 예측을 위한 SVM 모형)

  • Kim, Jin Woo;Jhee, Won Chul
    • Journal of Information Technology Services
    • /
    • v.11 no.4
    • /
    • pp.233-250
    • /
    • 2012
  • In this paper, credit card delinquency means the possibility that bad debt will occur in the near future on normal accounts that currently have no debt, and the problem is to predict, on a monthly basis, the occurrence of delinquency three months in advance. This is a typical binary classification problem, but it suffers from data imbalance: instances of the target class are very few. For effective prediction of bad debt occurrence, a Support Vector Machine (SVM) with the kernel trick is adopted, using credit card usage and payment patterns as its inputs. SVM is widely accepted in the data mining community because of its prediction accuracy and resistance to overfitting. However, SVM is known to be limited in its ability to process large-scale data. To resolve the difficulties in applying SVM to bad debt prediction, two-stage clustering is suggested as an effective data reduction method, and ensembles of SVM models are adopted to mitigate the data imbalance intrinsic to the target problem of this paper. In experiments with real-world data from one of the major domestic credit card companies, the suggested approach shows prediction accuracy superior to traditional data mining approaches that use neural networks, decision trees, or logistic regression. The SVM ensemble model learned from the T2 training set shows the best prediction results among the alternatives considered, and it is noteworthy that the performance of neural networks with T2 is better than that of SVM with T1. These results show that the suggested approach is very effective for both SVM training and classification under data imbalance.
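
The two ideas named above can be sketched together. This is a hedged illustration, not the paper's implementation: stage one shrinks the dominant "normal" class by clustering it and keeping only cluster representatives; stage two trains an ensemble on re-balanced subsets and votes. A nearest-centroid rule stands in for the SVM so the sketch stays dependency-free; in practice each member would be a kernel SVM.

```python
import numpy as np

rng = np.random.default_rng(1)
normal = rng.normal(0.0, 1.0, size=(1000, 4))    # majority class (no delinquency)
bad = rng.normal(2.5, 1.0, size=(50, 4))         # rare delinquency class

def cluster_reduce(data, k, iters=20, seed=0):
    """Data reduction: K-means, then keep one centroid per cluster."""
    r = np.random.default_rng(seed)
    centers = data[r.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((data[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([data[lab == j].mean(axis=0) if np.any(lab == j)
                            else centers[j] for j in range(k)])
    return centers

reduced_normal = cluster_reduce(normal, k=50)    # 1000 samples -> 50 representatives

def fit_centroids(pos, neg):                     # stand-in for training one SVM
    return pos.mean(axis=0), neg.mean(axis=0)

def predict(models, x):
    """Majority vote of the ensemble: 1 = predicted delinquent."""
    votes = [int(np.linalg.norm(x - p) < np.linalg.norm(x - n)) for p, n in models]
    return int(sum(votes) * 2 > len(votes))

# each member sees all delinquents plus a bootstrap of the reduced normals,
# so every training set is balanced despite the original imbalance
models = []
for seed in range(5):
    r = np.random.default_rng(seed)
    sub = reduced_normal[r.choice(len(reduced_normal), 50)]
    models.append(fit_centroids(bad, sub))

pred_bad = predict(models, np.array([2.4, 2.6, 2.5, 2.3]))
pred_ok = predict(models, np.array([0.0, 0.1, -0.2, 0.0]))
```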

Hopping Routing Scheme to Resolve the Hot Spot Problem of Periodic Monitoring Services in Wireless Sensor Networks (주기적 모니터링 센서 네트워크에서 핫 스팟 문제 해결을 위한 호핑 라우팅 기법)

  • Heo, Seok-Yeol;Lee, Wan-Jik;Jang, Seong-Sik;Byun, Tae-Young;Lee, Won-Yeol
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.10 no.9
    • /
    • pp.2340-2349
    • /
    • 2009
  • In this paper we propose a hopping routing scheme to resolve the hot spot problem for periodic monitoring services in wireless sensor networks. Our hopping routing scheme constructs load-balanced routing paths in which the energy consumption of every node in the sensor network is predictable. Load-balanced routing paths are obtained from a horizontal hopping transmission scheme, which balances the load among sensor nodes in the same area, and from a vertical hopping transmission scheme, which balances the load among sensor nodes in different areas. The direct-transmission counts used as the load-balancing parameter for vertical hopping transmission are derived from the energy consumption model of the sensor nodes. The experimental results show that the proposed hopping scheme resolves the hot spot problem effectively. The efficiency of the hopping routing scheme is also shown by comparison with other routing schemes such as multi-hop, direct transmission, and clustering.
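
Deriving direct-transmission counts presupposes a node energy model. A standard first-order radio model from the WSN literature (not necessarily the paper's exact model; the constants below are illustrative) charges electronics energy per bit plus amplifier energy growing with the square of distance, which makes the hot-spot imbalance easy to see:

```python
E_ELEC = 50e-9      # J/bit, transmit/receive electronics (illustrative value)
EPS_AMP = 100e-12   # J/bit/m^2, amplifier coefficient (illustrative value)

def tx_energy(bits, dist):
    """Energy to transmit `bits` over `dist` metres."""
    return bits * E_ELEC + bits * EPS_AMP * dist ** 2

def rx_energy(bits):
    """Energy to receive `bits`."""
    return bits * E_ELEC

bits = 4000
# a sink-neighbouring node pays receive + transmit for every packet it relays,
# so under pure multi-hop it drains first: the hot spot
relay_cost = rx_energy(bits) + tx_energy(bits, 50)
# a distant node sending directly pays a large amplifier term instead
direct_cost_far = tx_energy(bits, 200)
```

Mixing a calculated number of direct transmissions into the schedule shifts some of this relay burden from sink-neighbouring nodes onto distant ones, which is the role of the vertical hopping parameter.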

Cluster-Based Mobile Sink Location Management Scheme for Solar-Powered Wireless Sensor Networks

  • Oh, Eomji;Kang, Minjae;Yoon, Ikjune;Noh, Dong Kun
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.9
    • /
    • pp.33-40
    • /
    • 2017
  • In this paper, we propose a sink-location management and data-routing scheme to effectively support a mobile sink in solar-powered WSNs. Battery-based wireless sensor networks (WSNs) have a limited lifetime due to their limited energy, whereas solar-powered WSNs are recharged periodically and can operate indefinitely. Meanwhile, introducing a mobile sink into a WSN can solve the energy imbalance between sink-neighboring nodes and outer nodes, one of the major challenges in WSNs. However, additional energy must be consumed to notify each sensor node of the location of the randomly moving mobile sink. In the proposed scheme, a node that harvests enough energy in each cluster is selected as the cluster head, and the location information of the mobile sink is shared only among the cluster heads, thereby reducing the location management overhead. In addition, the overhead of setting up routing paths is removed by transferring data in the opposite direction along the path over which the sink-location information was propagated among the heads. Lastly, an access node is introduced to transmit data to the sink more reliably when the sink moves frequently.
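
The head-selection and reverse-path ideas can be shown in miniature. This is a hedged sketch (node identifiers and harvested-energy figures are illustrative assumptions, and the paper's exact election rule may differ): each cluster elects its most energy-rich node as head, and data retraces the sink-announcement path in reverse, so no separate route discovery is needed.

```python
clusters = {
    "A": {"a1": 0.8, "a2": 1.9, "a3": 1.2},   # node -> harvested solar energy (J)
    "B": {"b1": 2.4, "b2": 0.5},
}

def elect_heads(clusters):
    """Pick the node with the most harvested energy as each cluster's head."""
    return {cid: max(nodes, key=nodes.get) for cid, nodes in clusters.items()}

heads = elect_heads(clusters)

# the sink's location update travelled head-to-head along this path;
# data flows back along the reverse of the same path
update_path = ["B", "A"]
data_path = list(reversed(update_path))
```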

Learning Probabilistic Kernel from Latent Dirichlet Allocation

  • Lv, Qi;Pang, Lin;Li, Xiong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.6
    • /
    • pp.2527-2545
    • /
    • 2016
  • Measuring the similarity of given samples is a key problem in recognition, clustering, retrieval, and related applications. A number of works, e.g. kernel methods and metric learning, have addressed this problem. The challenge of similarity learning is to find a similarity measure that is robust to intra-class variance and simultaneously selective for inter-class characteristics. We observed that the similarity measure can be improved if the data distribution and hidden semantic information are exploited in a more sophisticated way. In this paper, we propose a similarity learning approach for retrieval and recognition. The approach, termed LDA-FEK, derives a free energy kernel (FEK) from Latent Dirichlet Allocation (LDA). First, it trains LDA and constructs a kernel using the parameters and variables of the trained model. Then, the unknown kernel parameters are learned by a discriminative learning approach. The main contributions of the proposed method are twofold: (1) it is computationally efficient and scalable, since the kernel parameters are determined in a staged way; (2) it exploits the data distribution and semantic-level hidden information by means of LDA. To evaluate the performance of LDA-FEK, we apply it to image retrieval on two data sets and to text categorization on four popular data sets. The results show the competitive performance of our method.
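
The general shape of an LDA-derived kernel can be illustrated simply. This is a hedged stand-in, not the paper's free energy kernel: documents are represented by their LDA topic proportions, and a kernel with learnable per-topic weights is evaluated on those proportions. The topic vectors below are assumed rather than fitted (no LDA training is shown), and the weights are fixed for illustration where the paper learns them discriminatively.

```python
import numpy as np

theta = np.array([                     # document-topic proportions (4 docs, 3 topics)
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.1, 0.2, 0.7],
])
w = np.array([1.0, 1.0, 1.0])          # per-topic kernel parameters (illustrative)

def topic_kernel(a, b, w):
    """Weighted inner product on topic proportions."""
    return float(np.sum(w * a * b))

K = np.array([[topic_kernel(a, b, w) for b in theta] for a in theta])
```

Documents dominated by the same topic score higher with each other than with documents about a different topic, which is the semantic-level selectivity the abstract describes.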

A Collaborative Filtering Method based on Associative Cluster Optimization for Recommendation System (추천시스템을 위한 연관군집 최적화 기반 협력적 필터링 방법)

  • Lee, Hyun Jin;Jee, Tae Chang
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.6 no.3
    • /
    • pp.19-29
    • /
    • 2010
  • The marketing model has shifted from customer acquisition to customer retention, toward enhancing the quality of customer interaction to add value for customers; personalization has emerged from this background. Web sites are accelerating the adoption of personalization, and in contrast to the rapid growth of data, quantitative analytical expertise is required. The automated analysis of large amounts of data, whose results must be delivered in real time, makes personalization a technical problem of interest. A recommendation algorithm implements personalization by predicting, from a database, which items a new customer is interested in or likely to purchase. As the number of users increases, however, the recommendation time of such algorithms also increases. In this paper, to solve this problem, a recommendation system based on clustering and dimensionality reduction is proposed. First, customers with similar orientations are clustered; then the customer-relationship data are reduced to a low-dimensional space. Because finding neighbors for recommendation is performed in the low-dimensional space, the computation time is greatly reduced.
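
The dimensionality-reduction and neighbor-search steps can be sketched as follows (a hedged toy illustration: the ratings are invented, the initial customer clustering step is omitted, and the paper's associative cluster optimization is not reproduced). A truncated SVD maps users into a low-dimensional latent space, and neighbors are found there instead of in the full item space.

```python
import numpy as np

R = np.array([                         # users x items rating matrix (toy data)
    [5, 4, 0, 1, 0],
    [4, 5, 1, 0, 0],
    [0, 1, 5, 4, 5],
    [1, 0, 4, 5, 4],
], dtype=float)

# truncated SVD: keep only the top-2 latent dimensions
U, s, Vt = np.linalg.svd(R, full_matrices=False)
user_factors = U[:, :2] * s[:2]        # each user as a 2-D latent vector

def nearest_neighbour(factors, target):
    """Closest other user in the low-dimensional space."""
    d = np.linalg.norm(factors - factors[target], axis=1)
    d[target] = np.inf                 # exclude the user themself
    return int(np.argmin(d))

neighbour_of_0 = nearest_neighbour(user_factors, 0)
```

Distance computations now cost O(k) per pair for k latent dimensions rather than O(items), which is the source of the speed-up the abstract claims.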

An Advanced Parallel Join Algorithm for Managing Data Skew on Hypercube Systems (하이퍼큐브 시스템에서 데이타 비대칭성을 고려한 향상된 병렬 결합 알고리즘)

  • 원영선;홍만표
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.3_4
    • /
    • pp.117-129
    • /
    • 2003
  • In this paper, we propose an advanced parallel join algorithm to efficiently process join operations on hypercube systems. The algorithm uses a broadcasting method for processing relation R that is well matched to the hypercube structure, so an optimized parallel join algorithm can be presented for that structure. The proposed algorithm completely solves two essential problems in parallelizing the join operation: load balancing and data skew. To solve these problems, the algorithm makes good use of the clustering effect, and as a result overall system performance is improved over existing algorithms. Moreover, the new algorithm has the advantage that non-equijoin operations, which are difficult to implement in hash-based algorithms, can be implemented easily. Finally, a cost model analysis shows that this algorithm performs better than existing parallel join algorithms.
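
Why broadcasting R sidesteps data skew can be shown with a small simulation (a hedged sketch, not hypercube message passing): S stays partitioned as-is, R is replicated to every node, and each node joins locally, so a skewed join-key distribution in S never forces uneven repartitioning.

```python
R = [(1, "x"), (2, "y")]                          # small relation: (key, payload)
S_partitions = [                                  # S pre-partitioned across 3 nodes,
    [(1, "a"), (1, "b"), (1, "c")],               # heavily skewed toward key 1
    [(1, "d"), (2, "e")],
    [(2, "f")],
]

def local_join(r, s_local):
    """Join the broadcast copy of R against one node's local chunk of S."""
    r_index = {}
    for k, v in r:
        r_index.setdefault(k, []).append(v)
    return [(k, rv, sv) for k, sv in s_local for rv in r_index.get(k, [])]

# every "node" holds the full broadcast copy of R and joins its own chunk
result = [t for part in S_partitions for t in local_join(R, part)]
```

Because each node scans only its own chunk of S, the per-node work tracks the existing partition sizes regardless of how the join keys are distributed; and since the local join need not hash on the key, the same pattern extends to non-equijoin predicates.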

CAD Scheme To Detect Brain Tumour In MR Images using Active Contour Models and Tree Classifiers

  • Helen, R.;Kamaraj, N.
    • Journal of Electrical Engineering and Technology
    • /
    • v.10 no.2
    • /
    • pp.670-675
    • /
    • 2015
  • Medical imaging is one of the most powerful tools for gaining information about internal organs and tissues, and developing sophisticated image analysis methods to improve the accuracy of diagnosis is a challenging task. The objective of this paper is to develop a Computer Aided Diagnostics (CAD) scheme for brain tumour detection from Magnetic Resonance Images (MRI) using active contour models, and to investigate several approaches for improving CAD performance. The problem in clinical medicine is the automatic detection of brain tumours with maximum accuracy and in less time. This work involves the following steps: i) segmentation performed by Fuzzy Clustering with Level Set Method (FCMLSM), with performance compared against snake models based on balloon force and Gradient Vector Flow (GVF), and the Distance Regularized Level Set Evolution (DRLSE) method; ii) feature extraction using shape- and texture-based features; iii) brain tumour detection performed by various tree classifiers. Based on this investigation, FCMLSM is the best-suited segmentation method and Random Forest is the optimal classifier for this problem. The method gives an accuracy of 97% with minimal classification error. The time taken to detect a tumour is approximately 2 minutes per examination (30 slices).
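
The fuzzy clustering stage inside FCMLSM can be sketched with a minimal fuzzy c-means on pixel intensities. This is a hedged toy version (two clusters, 1-D intensities, deterministic threshold initialization); the level-set evolution, feature extraction, and tree classifiers are omitted, and the "MRI slice" below is simulated.

```python
import numpy as np

def fuzzy_cmeans(x, m=2.0, iters=30):
    """Two-cluster fuzzy c-means on 1-D intensities x.
    Returns (memberships (n, 2), cluster centres (2,))."""
    # initialise memberships from a mean-intensity threshold (deterministic)
    u = np.column_stack([x < x.mean(), x >= x.mean()]).astype(float)
    for _ in range(iters):
        w = u ** m                                    # fuzzified memberships
        centers = (w * x[:, None]).sum(0) / w.sum(0)  # weighted cluster centres
        d = np.abs(x[:, None] - centers) + 1e-12      # distances to centres
        u = 1.0 / (d ** (2 / (m - 1)))                # standard FCM update
        u /= u.sum(axis=1, keepdims=True)             # memberships sum to 1
    return u, centers

# toy "slice": dark background near 0.2, bright tumour region near 0.8
rng = np.random.default_rng(3)
pixels = np.concatenate([rng.normal(0.2, 0.03, 200), rng.normal(0.8, 0.03, 40)])
u, centers = fuzzy_cmeans(pixels)
bright = int(np.argmax(centers))
tumour_mask = u[:, bright] > 0.5       # soft memberships thresholded into a mask
```

In FCMLSM this soft mask would seed the level-set contour rather than serve directly as the final segmentation.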