• Title/Summary/Keyword: Number of data

The Effect of the Number of Training Data on Speech Recognition

  • Lee, Chang-Young
    • The Journal of the Acoustical Society of Korea, v.28 no.2E, pp.66-71, 2009
  • In practical applications of speech recognition, a fundamental question is how much training data should be provided for a specific task. Although plentiful training data would undoubtedly enhance system performance, it comes at a heavy cost. It is therefore crucial to determine the smallest number of training data that affords a given level of accuracy. For this purpose, we investigate the effect of the number of training data on speaker-independent recognition of isolated words using FVQ/HMM. The results show that the error rate is roughly inversely proportional to the number of training data and grows linearly with the vocabulary size.
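The scaling reported in this abstract can be read as error ≈ c · V / N, where V is the vocabulary size and N the number of training data. The sketch below inverts that relation to estimate the smallest N for a target error rate. The constant `c` and both function names are hypothetical; the abstract gives no numerical values, so `c` would have to be fitted to measured error rates.

```python
import math


def predicted_error(n_train, vocab_size, c):
    """Error rate under the assumed model: error = c * vocab_size / n_train."""
    return c * vocab_size / n_train


def min_training_count(target_error, vocab_size, c):
    """Smallest N for which the assumed model predicts error <= target_error."""
    if target_error <= 0 or vocab_size <= 0 or c <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(c * vocab_size / target_error)
```

Under this model, doubling the vocabulary size doubles the amount of training data needed to hold the error rate fixed.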

Group Search Optimization Data Clustering Using Silhouette (실루엣을 적용한 그룹탐색 최적화 데이터클러스터링)

  • Kim, Sung-Soo;Baek, Jun-Young;Kang, Bum-Soo
    • Journal of the Korean Operations Research and Management Science Society, v.42 no.3, pp.25-34, 2017
  • K-means is a popular and efficient data clustering method that uses only intra-cluster distance as its validity index and requires the number of clusters to be fixed in advance. For unsupervised data, K-means is of little use without a suitable number of clusters. This paper proposes Group Search Optimization (GSO) using Silhouette to find the optimal data clustering solution together with the number of clusters for unsupervised data. Silhouette can serve as a validity index for deciding both the number of clusters and the optimal solution because it considers intra- and inter-cluster distances simultaneously. The performance of GSO using Silhouette is validated through several experiments and analyses of data sets.
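To illustrate how Silhouette weighs intra- against inter-cluster distance, here is a minimal pure-Python sketch of the standard mean silhouette score. It covers only the validity index, not the GSO search described in the abstract; the function name and the Euclidean distance are assumptions.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points; higher means tighter, better-separated clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes s(i) = 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)
```

Candidate clusterings with different numbers of clusters can then be compared by this score, with the highest value taken as the preferred solution.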

Data-Dependent Choice of Optimal Number of Lags in Variogram Estimation

  • Choi, Seung-Bae;Kang, Chang-Wan;Cho, Jang-Sik
    • The Korean Journal of Applied Statistics, v.23 no.3, pp.609-619, 2010
  • Among spatial data, geostatistical data are analyzed in three stages: (1) variogram estimation, (2) model fitting for the estimated variograms, and (3) spatial prediction using the fitted variogram model. Estimating the variogram properly is very important because the first stage (i.e., variogram estimation) affects the next two stages. In general, the variogram is estimated with the moment estimator. To estimate the variogram, we have to decide the 'lag increment' or the 'number of lags'. However, there is no established rule for selecting the number of lags in estimating the variogram. The present paper proposes a method of choosing the optimal number of lags based on the PRESS statistic. To show the usefulness of the proposed method, we perform a small simulation study and present an empirical example with air pollution data from Korea.
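To show where the number of lags enters the estimate, the following sketch implements the classical moment (Matheron) estimator with equal-width lag bins. The function name, the `max_dist` cutoff, and the half-open binning rule are assumptions; the PRESS-based selection proposed in the paper is not reproduced here.

```python
import math


def empirical_variogram(coords, values, n_lags, max_dist):
    """Moment estimator: gamma(h_k) = sum of (z_i - z_j)^2 over bin k, divided by 2 * N_k."""
    if n_lags < 1 or max_dist <= 0:
        raise ValueError("n_lags must be >= 1 and max_dist must be > 0")
    width = max_dist / n_lags  # the lag increment implied by n_lags
    sq_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])
            if h > max_dist:
                continue  # pair is beyond the largest lag considered
            k = min(int(h / width), n_lags - 1)  # bin index; h == max_dist goes to the last bin
            sq_sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    result = []
    for k in range(n_lags):
        center = (k + 0.5) * width
        gamma = sq_sums[k] / (2 * counts[k]) if counts[k] else None
        result.append((center, gamma, counts[k]))
    return result
```

With few lags each bin averages many pairs but blurs the shape of the variogram; with many lags the bins are sparse and noisy. That trade-off is what a data-dependent choice of the number of lags has to balance.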

The relation between the five critical crime of criminal law and the private security services (형법범죄 중 5대 범죄와 민간경비 간의 관계)

  • Joo, Il-Yeob;Jo, Gwang-Rae
    • Korean Security Journal, no.8, pp.361-377, 2004
  • This study examines the relationship between the big five crimes (homicide, robbery, rape, theft, and violence) and private security services. To this end, it takes as its subject the 2002 status of private security, namely the number of companies and employees by area, together with the big five crimes by area. The research data are secondary data drawn from the '2003 Crime Analysis' of the Supreme Public Prosecutors' Office and 'The private Security Related Data' of the National Police Agency. The selected data were analyzed by variable using the SPSS 10.0 statistical software. Each hypothesis was tested at the significance level $\alpha = .05$ using statistical techniques such as descriptive statistics, correlation, and regression. The results are as follows. First, the total number of the big five crimes has a significant effect on the number of security companies. Second, the number of security companies can be explained by the total number of each of the big five crimes, in the order of theft, robbery, violence, rape, and murder. Third, the total number of the big five crimes has a significant effect on the number of security employees. Fourth, the number of security employees can be explained by the total number of each of the big five crimes, in the order of theft, robbery, violence, rape, and murder.

Systematic Determination of Number of Clusters Based on Input Representation Coverage (클러스터 분석을 위한 IRC기반 클러스터 개수 자동 결정 방법)

  • 신미영
    • Journal of the Institute of Electronics Engineers of Korea CI, v.41 no.6, pp.39-46, 2004
  • One of the significant issues in cluster analysis is identifying a proper number of clusters hidden in the given data. In this paper we propose a novel approach to systematically determine the number of clusters based on Input Representation Coverage (IRC), which is newly defined as a quantified value of how well the original input data in Gaussian feature space can be captured with a certain number of clusters. Its usability and applicability are also investigated via experiments with synthetic data. Our experimental results show that the proposed approach is quite useful for approximately finding the real number of clusters implicitly contained in the data.

K-means based Clustering Method with a Fixed Number of Cluster Members

  • Yi, Faliu;Moon, Inkyu
    • Journal of Korea Multimedia Society, v.17 no.10, pp.1160-1170, 2014
  • Clustering methods are very useful in many fields such as data mining, classification, and object recognition. Both supervised and unsupervised grouping approaches can classify a series of sample data with a predefined or automatically assigned number of clusters. However, they place no constraint on the number of elements in each cluster, so the cluster sizes obtained from clustering schemes are usually random. Thus, some clusters possess a large number of elements whereas others have only a few members. In some areas such as logistics management, a fixed number of members is preferred for each cluster or logistics center. Consequently, it is necessary to design a clustering method that can automatically adjust the number of group elements. In this paper, a k-means based clustering method with a fixed number of cluster members is proposed. In the proposed method, the data samples are first clustered using the k-means algorithm. Then, the number of group elements is adjusted by employing a greedy strategy. Experimental results demonstrate that the proposed clustering scheme can classify data samples efficiently for a fixed number of cluster members.
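The two-step idea (k-means first, then a greedy size adjustment) can be sketched as follows. The abstract does not specify its greedy rule, so the one used here is an assumption: point-centroid pairs are visited in order of increasing distance, and a point joins the nearest centroid that still has capacity. Both function names are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of coordinate tuples; returns the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def balanced_assign(points, centroids, size):
    """Greedy assignment (assumed rule): closest pairs first, at most `size` points per cluster."""
    k = len(centroids)
    if size * k < len(points):
        raise ValueError("total capacity is smaller than the number of points")
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(points)
        for j, c in enumerate(centroids)
    )
    labels = [None] * len(points)
    remaining = [size] * k
    for _, i, j in pairs:
        if labels[i] is None and remaining[j] > 0:
            labels[i] = j
            remaining[j] -= 1
    return labels
```

When `size * k` equals the number of points, every cluster ends up with exactly `size` members. The price is that some points are assigned to a centroid other than their nearest one.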

Predicting the future number of failures based on the field failure summary data (필드 고장 요약 데이터를 활용한 미래 고장수의 예측)

  • Baik, Jai-Wook;Jo, Jin-Nam
    • Journal of the Korean Data and Information Science Society, v.22 no.4, pp.755-764, 2011
  • In many companies, field failure data are used to predict the future number of failures, especially when an unexpected failure mode becomes a problem. This is because companies want to predict the number of spare parts needed and the future warranty cost associated with the part, based on predictions of the future number of failures. In this paper, field summary data are used to predict the future number of failures based on an appropriate distribution. Other types of data are also investigated to identify the appropriate distribution.
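As an illustration of predicting future failures from a fitted lifetime distribution, the sketch below assumes a Weibull distribution and units that have all survived to the same age. The abstract does not name the distribution it selects, so the Weibull choice, the parameter values, and the function names are all assumptions.

```python
import math


def weibull_cdf(t, shape, scale):
    """Probability that a unit has failed by age t under a Weibull(shape, scale) lifetime."""
    return 1.0 - math.exp(-((t / scale) ** shape))


def expected_future_failures(n_surviving, age_now, age_future, shape, scale):
    """Expected failures among n_surviving units of age age_now by age age_future.

    Uses the conditional failure probability (F(t2) - F(t1)) / (1 - F(t1)).
    """
    f_now = weibull_cdf(age_now, shape, scale)
    f_future = weibull_cdf(age_future, shape, scale)
    return n_surviving * (f_future - f_now) / (1.0 - f_now)
```

In practice the shape and scale would be estimated from the field data, and units of different ages would be handled by summing this quantity over age groups.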

Statistical Estimation of Specified Concrete Strength by Applying Non-Destructive Test Data (비파괴시험 자료를 적용한 콘크리트 기준강도의 통계적 추정)

  • Paik, Inyeol
    • Journal of the Korean Society of Safety, v.30 no.1, pp.52-59, 2015
  • The aim of this paper is to introduce a statistical definition of the specified compressive strength of concrete for use in the safety evaluation of existing structures in domestic practice, and to present a practical method for obtaining the specified strength from non-destructive test data together with a limited number of core test data. The statistical definition of the specified compressive strength of concrete in the design codes is reviewed, and consistent formulations for statistically estimating the specified strength for assessment are described. To avoid estimating an unrealistically small specified strength because of the limited number of data, it is proposed that the information from the non-destructive test data be combined with that of the minimum core test data. The sample mean, standard deviation, and total number of concrete tests are obtained from the combined test data. The proposed procedures are applied to example test data composed of artificial numerical values and actual evaluation data collected from bridge assessment reports. The calculation results show that the proposed statistical estimation procedures yield reasonable values of the specified strength for assessment by applying the non-destructive test data in addition to the limited number of core test data.

Heat Transfer Correlation for the Forced Convective Flow on Single Circular Fin-tube Heat Exchanger

  • Kang Hie-Chan
    • International Journal of Air-Conditioning and Refrigeration, v.14 no.1, pp.14-18, 2006
  • This study investigated the heat transfer characteristics of the circular fin-tube heat exchanger. The paper presents experimental data for seven kinds of fin geometries. The correlation of Stasiulevicius agreed with the experimental data at high Reynolds numbers but not well at low Reynolds numbers. The Nusselt number correlated well with the Graetz number and showed a transition near Gz=10. An empirical correlation proposed in the present study agreed well with the experimental data.

Forced Convection Correlation for Single Circular Fin-tube Heat Exchanger (단일 원형휜-원형관에 대한 강제대류열전달 상관식)

  • 강희찬;강민철
    • Korean Journal of Air-Conditioning and Refrigeration Engineering, v.16 no.6, pp.584-588, 2004
  • This work investigated the heat transfer characteristics of the circular fin-tube heat exchanger. The paper presents experimental data for seven kinds of fin geometries. The correlation of Stasiulevicius agreed with the experimental data at high Reynolds numbers but not well at low Reynolds numbers. The Nusselt number correlated well with the Graetz number and showed a transition near Gz=10. An empirical correlation proposed in the present work agreed well with the experimental data.