• Title/Summary/Keyword: 데이터 분할 평가 (data partitioning evaluation)

Search Result 502, Processing Time 0.029 seconds

A Representative Pattern Generation Algorithm Based on Evaluation And Selection (평가와 선택기법에 기반한 대표패턴 생성 알고리즘)

  • Yih, Hyeong-Il
    • Journal of the Korea Society of Computer and Information / v.14 no.3 / pp.139-147 / 2009
  • Memory-based reasoning stores either the training patterns themselves or representative patterns derived from them in memory, and classifies a test pattern by computing its distance to the stored patterns. Because the training patterns are stored whole or replaced with representative patterns, the technique requires more memory than other machine learning techniques, and classification time grows as the number of stored patterns increases. In this paper, we propose the EAS (Evaluation And Selection) algorithm to minimize memory usage and improve classification performance. After partitioning the training space, the algorithm evaluates each partitioned space using the MDL and PM methods. The partitioned space with the best evaluation result is made into a representative pattern; the remaining partitioned spaces are partitioned again and the evaluation is repeated. We verify the performance of the proposed algorithm using benchmark data sets from the UCI Machine Learning Repository.
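
The partition, evaluate, and select loop described in this abstract can be sketched roughly as follows. This is only an illustration under strong assumptions: the abstract does not define the MDL and PM evaluations, so a simple label-purity score stands in for them, the data is one-dimensional, and every function name is hypothetical.

```python
from collections import Counter

def purity(group):
    """Stand-in score (NOT the paper's MDL/PM): share of the majority label."""
    counts = Counter(label for _, label in group)
    return counts.most_common(1)[0][1] / len(group)

def representative(group):
    """Collapse a group of (x, label) samples into one (center, label) pattern."""
    label = Counter(label for _, label in group).most_common(1)[0][0]
    center = sum(x for x, _ in group) / len(group)
    return center, label

def eas_sketch(samples, low, high, max_rounds=10):
    """Repeatedly halve pending regions of [low, high), keep the best-scoring
    part as a representative pattern, and re-partition the rest."""
    pending = [(low, high, samples)] if samples else []
    representatives = []
    for _ in range(max_rounds):
        parts = []
        for lo, hi, group in pending:
            mid = (lo + hi) / 2
            for a, b in ((lo, mid), (mid, hi)):
                sub = [s for s in group if a <= s[0] < b]
                if sub:
                    parts.append((a, b, sub))
        if not parts:
            break
        # Select the part with the best evaluation (ties: larger part first).
        best = max(parts, key=lambda part: (purity(part[2]), len(part[2])))
        representatives.append(representative(best[2]))
        pending = [part for part in parts if part is not best]
    # Anything still pending after max_rounds becomes a representative as-is.
    representatives.extend(representative(group) for _, _, group in pending)
    return representatives

def classify(x, representatives):
    """Memory-based classification: label of the nearest representative."""
    return min(representatives, key=lambda rep: abs(rep[0] - x))[1]

data = [(0.1, "A"), (0.2, "A"), (0.3, "A"), (0.6, "B"), (0.7, "A"), (0.9, "B")]
patterns = eas_sketch(data, 0.0, 1.0)
```

In this toy run six training samples are reduced to four stored patterns, which is the memory saving the abstract aims for.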

An Energy-Efficient Dynamic Area Compression Scheme in Wireless Multimedia Sensor Networks (무선 멀티미디어 센서 네트워크에서 에너지 효율적인 동적 영역 압축 기법)

  • Park, Junho;Ryu, Eunkyung;Son, Ingook;Yoo, Jaesoo
    • The Journal of the Korea Contents Association / v.13 no.12 / pp.9-18 / 2013
  • In recent years, demand for multimedia data in wireless sensor networks has increased significantly for high-quality environment monitoring applications that use sensor nodes to collect multimedia data. However, because the volume of multimedia data is very large, excessive energy consumption on particular nodes significantly reduces network lifetime and network performance. In this paper, we propose an energy-efficient dynamic area compression scheme for wireless multimedia sensor networks. The proposed scheme minimizes energy consumption when transmitting large multimedia data by compressing it using the Chinese Remainder Theorem (CRT) together with a dynamic area detection and division algorithm. Our experimental results show that, on average, the proposed scheme improves the data compression ratio by about 37% and reduces the amount of transmitted data by about 56% compared with the existing scheme. In addition, it increases network lifetime by about 14% on average.
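
The abstract does not say how the CRT is applied to the image data, so the following is only a minimal sketch of the underlying arithmetic: a value is split into residues modulo pairwise-coprime moduli and then recovered exactly from those residues. The moduli (7, 11, 13) and the function names are illustrative assumptions, not taken from the paper.

```python
from math import prod

def to_residues(value, moduli):
    """Represent a value by its remainders modulo each modulus."""
    return [value % m for m in moduli]

def from_residues(residues, moduli):
    """Recover the value with the Chinese Remainder Theorem.

    Requires pairwise-coprime moduli and a value below their product.
    """
    product = prod(moduli)
    total = 0
    for residue, modulus in zip(residues, moduli):
        partial = product // modulus
        # pow(partial, -1, modulus) is the modular inverse (Python 3.8+).
        total += residue * partial * pow(partial, -1, modulus)
    return total % product

MODULI = (7, 11, 13)  # assumed example; product 1001 covers 8-bit pixel values
pixel = 200
residues = to_residues(pixel, MODULI)
restored = from_residues(residues, MODULI)
```

Each residue is smaller than its modulus, which is what makes residue-based representations attractive for splitting or shrinking transmitted values.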

A fuzzy cluster validity index for the evaluation of Fuzzy C-Means algorithm (최적 클러스터 분할을 위한 FCM 평가 인덱스)

  • 김대원;이광현
    • Proceedings of the Korean Information Science Society Conference / 2003.04c / pp.374-376 / 2003
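  • This paper proposes a validity index for the fuzzy clusters computed by the Fuzzy C-Means (FCM) algorithm. The proposed index uses the inter-cluster proximity between fuzzy clusters. Introducing inter-cluster proximity makes it possible to compute the degree of overlap between clusters, so a lower proximity value indicates that the clusters are better distributed in the space. Experiments on various data sets verify the effectiveness and reliability of the proposed index.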
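
The abstract does not give the formula of the proposed index. As a rough illustration of the idea that overlap between fuzzy clusters can be read off the membership matrix, the sketch below uses a hypothetical overlap measure (the mean of the smaller of two membership degrees per data point). It is an assumption for illustration, not the paper's index.

```python
from itertools import combinations

def pairwise_overlap(memberships, i, j):
    """Hypothetical proximity of clusters i and j: mean of min(u_i, u_j)."""
    return sum(min(row[i], row[j]) for row in memberships) / len(memberships)

def overlap_index(memberships):
    """Average pairwise overlap over all cluster pairs (lower = better separated)."""
    n_clusters = len(memberships[0])
    pairs = list(combinations(range(n_clusters), 2))
    return sum(pairwise_overlap(memberships, i, j) for i, j in pairs) / len(pairs)

# Each row holds one data point's membership degrees in two clusters.
well_separated = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
overlapping = [[0.5, 0.5], [0.6, 0.4], [0.5, 0.5]]
```

Under this stand-in measure the well-separated partition scores lower than the overlapping one, matching the abstract's statement that lower proximity means better-distributed clusters.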


Characteristics of Input-Output Spaces of Fuzzy Inference Systems by Means of Membership Functions and Performance Analyses (소속 함수에 의한 퍼지 추론 시스템의 입출력 공간 특성 및 성능 분석)

  • Park, Keon-Jun;Lee, Dong-Yoon
    • The Journal of the Korea Contents Association / v.11 no.4 / pp.74-82 / 2011
  • Fuzzy modelling of a nonlinear process requires analyzing the input-output characteristics of fuzzy inference systems according to the division of the entire input space and the fuzzy reasoning method. To this end, the fuzzy model is expressed by identifying the structure and parameters of the system by means of input variables, fuzzy partition of the input spaces, and consequence polynomial functions. In the premise part of the fuzzy rules, the Min-Max method, which uses the minimum and maximum values of the input data set, and the C-Means clustering algorithm, which groups the input data into clusters, are used to identify the fuzzy model, with triangular, Gaussian-like, and trapezoid-type membership functions. In the consequence part of the fuzzy rules, fuzzy reasoning is conducted by two types of inference: simplified and linear. The consequence parameters of each rule, namely the polynomial coefficients, are identified by the standard least squares method. Finally, we evaluate the performance and the system characteristics using the gas furnace process, which is widely used as a nonlinear process.
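
The three membership function shapes and the simplified inference named in this abstract can be sketched as below. The parameterization, the function names, and the choice of a weighted average of constant consequents for simplified inference are common textbook forms assumed here; the paper's exact definitions may differ.

```python
import math

def triangular(x, a, b, c):
    """Triangle rising on [a, b] and falling on [b, c] (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def trapezoid(x, a, b, c, d):
    """Trapezoid with a flat top on [b, c] (requires a < b <= c < d)."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

def gaussian(x, center, sigma):
    """Gaussian-like membership centered at `center` with width `sigma`."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def simplified_inference(x, rules):
    """Weighted average of constant consequents; rules = [(membership_fn, constant)]."""
    weights = [membership(x) for membership, _ in rules]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(w * const for w, (_, const) in zip(weights, rules)) / total

# Two rules over the input range [0, 10] with constant consequents 1.0 and 3.0.
rules = [
    (lambda x: triangular(x, -10, 0, 10), 1.0),
    (lambda x: triangular(x, 0, 10, 20), 3.0),
]
output = simplified_inference(2.5, rules)
```

Linear inference would replace each constant with a first-order polynomial of the inputs, whose coefficients the abstract says are fitted by least squares.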

Fuzzy Rules Generation and Inference System of Scatter Partition Method (분산 분할 방식의 퍼지 규칙 생성 및 추론 시스템)

  • Park, Keon-jun;Jang, Tae-Su;Kim, Sung-Hun;Kim, Yong-kab
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2012.10a / pp.35-36 / 2012
  • Generating fuzzy rules is unavoidable when constructing a fuzzy model, and in general the number of rules increases exponentially with the input dimension. To solve this problem, we introduce a system that generates fuzzy rules and performs inference based on the FCM clustering algorithm, which partitions the input space in scatter form. The parameters in the premise part of the fuzzy rules are determined as the membership matrix produced by the FCM clustering algorithm, and the consequence part of the fuzzy rules is expressed as a polynomial function. The proposed model is evaluated using numerical data.
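
The membership matrix mentioned in this abstract can be illustrated with the standard FCM membership update for fixed cluster centers. The sketch below assumes the usual fuzzifier m = 2 and Euclidean distance; the function name and the sample points are illustrative, and the center-update step of FCM is omitted.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Standard FCM membership degrees of each point in each cluster.

    u_ik = d_ik^(-2/(m-1)) / sum_j d_jk^(-2/(m-1)), with d the Euclidean
    distance between point k and center i. Each row sums to 1.
    """
    power = 2.0 / (m - 1.0)
    matrix = []
    for point in points:
        distances = [math.dist(point, center) for center in centers]
        if any(d == 0.0 for d in distances):
            # A point sitting exactly on a center belongs fully to that center.
            row = [1.0 if d == 0.0 else 0.0 for d in distances]
            total = sum(row)
            row = [value / total for value in row]
        else:
            inverse = [d ** -power for d in distances]
            total = sum(inverse)
            row = [value / total for value in inverse]
        matrix.append(row)
    return matrix

points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
centers = [(0.0, 0.0), (4.0, 0.0)]
memberships = fcm_memberships(points, centers)
```

Because each cluster yields one rule, the rule count follows the number of clusters rather than growing with a grid over every input dimension.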


A study on end-to-end speaker diarization system using single-label classification (단일 레이블 분류를 이용한 종단 간 화자 분할 시스템 성능 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea / v.42 no.6 / pp.536-543 / 2023
  • Speaker diarization, which labels "who spoke when" in speech with multiple speakers, has been studied with deep neural network-based end-to-end methods in order to label overlapped speech and to optimize speaker diarization models. Most deep neural network-based end-to-end speaker diarization systems treat the task as a multi-label classification problem that predicts the labels of all speakers speaking in each frame of speech. However, the performance of a multi-label model varies greatly depending on the threshold setting. This paper studies a speaker diarization system that uses single-label classification so that diarization can be performed without thresholds. The proposed model converts the speaker labels into a single label and estimates labels from the model output. To account for speaker label permutations during training, the proposed model uses a combination of Permutation Invariant Training (PIT) loss and cross-entropy loss. In addition, the paper studies how to add residual connection structures to the model for effective training of deep speaker diarization models. The experiments used simulated noisy data for two speakers generated from the LibriSpeech database. Compared with the baseline model in terms of Diarization Error Rate (DER), the proposed method can label without a threshold and improves performance by about 20.7%.
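
The abstract does not state how the speaker labels are converted into a single label. One plausible reading, assumed here, is a power-set encoding in which each combination of active speakers in a frame becomes one class, so two speakers give four classes. The sketch also shows the permutation-invariant idea with a plain frame-mismatch count in place of the paper's cross-entropy loss; all names are hypothetical.

```python
from itertools import permutations

def to_single_label(activity):
    """Encode per-speaker 0/1 activity as one class index (assumed power-set code)."""
    return sum(bit << speaker for speaker, bit in enumerate(activity))

def to_multi_label(label, n_speakers):
    """Decode a class index back into per-speaker 0/1 activity."""
    return tuple((label >> speaker) & 1 for speaker in range(n_speakers))

def permutation_invariant_errors(reference, predicted):
    """Fewest mismatched frames over all orderings of the reference speakers."""
    n_speakers = len(reference[0])
    best = len(reference)
    for order in permutations(range(n_speakers)):
        errors = sum(
            to_single_label(tuple(ref[s] for s in order)) != to_single_label(pred)
            for ref, pred in zip(reference, predicted)
        )
        best = min(best, errors)
    return best

# Frames: speaker 1 only, overlap, silence, speaker 2 only.
reference = [(1, 0), (1, 1), (0, 0), (0, 1)]
# Same content with the two speaker channels swapped.
predicted = [(0, 1), (1, 1), (0, 0), (1, 0)]
```

With a single class per frame, decoding is an argmax over classes, which is why no per-speaker threshold is needed.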

Performance Evaluation of Clustering Algorithms for Fixed-Grid Spatial Index (고정 그리드 공간 색인을 위한 클러스터링 알고리즘의 성능 평가)

  • 유진영;김진덕;김동현;홍봉희;김장수
    • Proceedings of the Korean Information Science Society Conference / 1998.10b / pp.32-134 / 1998
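  • A grid file, one type of spatial index, is built by partitioning the spatial data area into cells in a lattice form; in particular, a grid in which all cells are fixed to the same size is called a fixed grid. Because the cell size is fixed, objects often lie on cell partition lines, and such objects are referenced redundantly by more than one cell. Redundantly referenced objects increase I/O time and are a major cause of degraded query processing performance. A clustering algorithm that handles duplicate objects efficiently is therefore needed. This paper implements an object clustering algorithm for handling redundantly referenced objects and a cell clustering algorithm that clusters by cell, and evaluates the performance of each clustering technique when executing spatial queries.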
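
The redundant references described in this abstract can be seen in a minimal fixed-grid sketch: an object whose bounding rectangle crosses a cell partition line is registered in every cell it touches. The cell size, object rectangles, and function names below are illustrative assumptions, and the paper's clustering algorithms themselves are not reproduced.

```python
from collections import defaultdict

def covering_cells(mbr, cell_size):
    """Grid cells (col, row) touched by a rectangle (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = mbr
    cols = range(int(min_x // cell_size), int(max_x // cell_size) + 1)
    rows = range(int(min_y // cell_size), int(max_y // cell_size) + 1)
    return [(col, row) for col in cols for row in rows]

def build_fixed_grid(objects, cell_size):
    """Map each cell to the ids of all objects overlapping it."""
    grid = defaultdict(list)
    for object_id, mbr in objects.items():
        for cell in covering_cells(mbr, cell_size):
            grid[cell].append(object_id)
    return grid

objects = {
    "a": (1, 1, 3, 3),      # inside one cell
    "b": (8, 8, 12, 12),    # crosses both partition lines at 10
    "c": (15, 2, 18, 4),    # inside one cell
}
grid = build_fixed_grid(objects, cell_size=10)
references = sum(len(ids) for ids in grid.values())
```

Here three objects produce six cell references, because object "b" alone is stored in four cells.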

Implementation and Performance Analysis of a Parallel CBF Scheme under Cluster System Environment (클러스터 시스템 환경 하에서의 병렬 CBF 기법의 구현 및 성능 평가)

  • 박승봉;장재우
    • Proceedings of the Korean Information Science Society Conference / 2002.10c / pp.250-252 / 2002
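  • Existing indexing schemes suffer from search performance that degrades sharply as the number of dimensions increases, and the CBF scheme was proposed to overcome this. However, the search performance of the CBF scheme decreases linearly as the amount of data grows. To solve this, a parallel CBF scheme was proposed that declusters data across multiple disks using horizontal partitioning. In this paper, we implement the parallel CBF scheme in a cluster system environment built from several Linux computers and evaluate its performance in terms of insertion time, range query search time, and k-nearest-neighbor query search time. We also compare the parallel CBF scheme in the cluster system environment with the original CBF scheme and show that the parallel CBF scheme achieves better search performance.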
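
The declustering idea in this abstract can be sketched as horizontal partitioning of records across several partitions that are then scanned in parallel. The abstract does not give the assignment rule, so round-robin placement is assumed here; the CBF filtering step itself is not modeled, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def decluster(records, n_disks):
    """Horizontal partitioning: spread whole records round-robin over the disks."""
    partitions = [[] for _ in range(n_disks)]
    for index, record in enumerate(records):
        partitions[index % n_disks].append(record)
    return partitions

def scan_partition(partition, low, high):
    """Range query on one partition: keep records inside [low, high] per dimension."""
    return [
        record for record in partition
        if all(lo <= value <= hi for value, lo, hi in zip(record, low, high))
    ]

def parallel_range_query(partitions, low, high):
    """Scan every partition concurrently and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        results = pool.map(lambda part: scan_partition(part, low, high), partitions)
        return sorted(hit for hits in results for hit in hits)

records = [(1, 1), (2, 5), (3, 3), (6, 6), (7, 2), (9, 9)]
partitions = decluster(records, n_disks=3)
hits = parallel_range_query(partitions, low=(2, 2), high=(7, 6))
```

Each partition holds roughly 1/n of the data, so a scan that grows linearly with data size is divided among the disks.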


Sign Language Dataset Built from S. Korean Government Briefing on COVID-19 (대한민국 정부의 코로나 19 브리핑을 기반으로 구축된 수어 데이터셋 연구)

  • Sim, Hohyun;Sung, Horyeol;Lee, Seungjae;Cho, Hyeonjoong
    • KIPS Transactions on Software and Data Engineering / v.11 no.8 / pp.325-330 / 2022
  • This paper collects and experiments with datasets for deep learning research on Korean sign language, including sign language recognition, sign language translation, and sign language segmentation. Deep learning research on sign language faces two difficulties. First, sign languages are hard to recognize because they combine multiple modalities, including hand movements, hand directions, and facial expressions. Second, training data for deep learning research is lacking. Currently, the KETI dataset is the only known Korean sign language dataset for deep learning. Sign language datasets for deep learning research are classified into two categories: isolated sign language and continuous sign language. Although several foreign sign language datasets have been collected over time, they are also insufficient for deep learning research on sign language. Therefore, we attempted to collect a large-scale Korean sign language dataset and evaluate it using a baseline model, TSPNet, which achieves state-of-the-art (SOTA) performance in sign language translation. The collected dataset consists of a total of 11,402 images and texts. Our experiment with the baseline model on this dataset yields a BLEU-4 score of 3.63, which can serve as a baseline performance for Korean sign language datasets. We hope that our experience of collecting a Korean sign language dataset helps facilitate further research on Korean sign language.

A Study on Performance Evaluation of Hidden Markov Network Speech Recognition System (Hidden Markov Network 음성인식 시스템의 성능평가에 관한 연구)

  • 오세진;김광동;노덕규;위석오;송민규;정현열
    • Journal of the Institute of Convergence Signal Processing / v.4 no.4 / pp.30-39 / 2003
  • In this paper, we evaluate the performance of an HM-Net (Hidden Markov Network) speech recognition system on Korean speech databases. The acoustic models are built with HM-Nets, a modification of HMMs (Hidden Markov Models), which are widely used as a statistical modeling method. HM-Nets perform state splitting in the contextual and temporal domains with the PDT-SSS (Phonetic Decision Tree-based Successive State Splitting) algorithm, a modification of the original SSS algorithm. In particular, for contextual-domain state splitting, the algorithm adopts a phonetic decision tree to effectively express context information that does not appear in the training speech data. For temporal-domain state splitting, splitting is carried out to effectively represent how long each phoneme is maintained, and the optimal model network of triphone types is then constructed. Speech recognition was performed using the one-pass Viterbi beam search algorithm with a phone-pair grammar for phoneme recognition and a word-pair grammar for word recognition, and using the multi-pass search algorithm with n-gram language models for sentence recognition. A tree-structured lexicon was used to reduce the number of nodes by sharing common prefixes among words. The HM-Net speech recognition system is evaluated under various recognition conditions. The experiments verified that it has markedly better recognition performance than previously introduced recognition systems.
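
The node saving from a tree-structured lexicon mentioned in this abstract can be illustrated with a small prefix tree. In this sketch letters stand in for phoneme units and word-end markers are omitted; both simplifications, along with the function names and word list, are assumptions for illustration.

```python
def build_tree_lexicon(words):
    """Build a prefix tree as nested dicts; words sharing a prefix share nodes."""
    root = {}
    for word in words:
        node = root
        for unit in word:
            node = node.setdefault(unit, {})
    return root

def count_nodes(node):
    """Number of nodes below the root of the tree."""
    return sum(1 + count_nodes(child) for child in node.values())

words = ["star", "start", "stop"]
flat_nodes = sum(len(word) for word in words)      # one node per unit per word
tree_nodes = count_nodes(build_tree_lexicon(words))
```

For these three words the flat lexicon needs 13 nodes while the tree needs 7, because the prefixes "st" and "star" are stored once.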
