• Title/Summary/Keyword: Probability and statistics

Document Clustering Methods using Hierarchy of Document Contents (문서 내용의 계층화를 이용한 문서 비교 방법)

  • Hwang, Myung-Gwon;Bae, Yong-Geun;Kim, Pan-Koo
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.12 / pp.2335-2342 / 2006
  • The current web is accumulating abundant information. In particular, text-based documents are a type that people use very easily and frequently. Numerous studies have therefore sought to retrieve text documents using methods such as probability, statistics, vector similarity, and Bayesian approaches. These studies, however, could not consider both the subject and the semantics of documents. To overcome these problems, we propose a document similarity method for semantic retrieval of the documents users want. This is the core method of document clustering. The method first expresses the document content as a semantic hierarchy and gives a weight to the important hierarchy domain of the document. With this, we can measure the similarity between documents using both the domain weight and the coincidence of concepts in the domain hierarchies.

Statistical division of compressive strength results on the aspect of concrete family concept

  • Jasiczak, Jozef;Kanoniczak, Marcin;Smaga, Lukasz
    • Computers and Concrete / v.14 no.2 / pp.145-161 / 2014
  • The article presents a statistical method of grouping the compressive strength results of concrete in continuous production. It describes a method of dividing a series of compressive strength results into batches with statistically stable strength parameters at specific time intervals, based on the standardized concept of a "concrete family". The article presents example calculations for two series of concrete strength results, from which sets with decreased strength parameters were separated. When assessing the quality of concrete elements and concrete road surfaces, the principal issue is the control of the compressive strength parameters of concrete. Large quantities of concrete mix manufactured continuously should be subject to continuous control. The standardized approach to assessing concrete strength proves insufficient because it does not allow detection of subsets of decreased strength results, which in turn makes it impossible to adjust the concrete manufacturing process or to identify the particular product or area on site with decreased concrete strength. In this article, two independent methods of grouping concrete test results with statistically stable strength parameters are proposed, involving verification of statistical hypotheses based on two statistical tests: Student's t-test and the Mann-Whitney U test.
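
As a rough illustration of the kind of two-sample comparison named in the abstract, the sketch below computes the Mann-Whitney U statistic for two batches of strength results, with a normal approximation for the p-value. It is not the authors' grouping procedure: the function name and the batch values are hypothetical, ties are handled with midranks, and no tie correction is applied to the variance.

```python
import math

def mann_whitney_u(x, y):
    """Return (U, z, two-sided p) for samples x and y, using a normal approximation."""
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((v, g) for g, s in ((0, x), (1, y)) for v in s)
    # Assign midranks so tied values share the average of their ranks.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = mid
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction (assumption)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, z, p

# Hypothetical compressive strengths (MPa) for two consecutive batches.
batch_a = [38.2, 39.1, 37.8, 40.0, 38.9]
batch_b = [35.1, 36.0, 34.7, 35.8, 36.4]
u, z, p = mann_whitney_u(batch_a, batch_b)
print(f"U={u}, z={z:.3f}, p={p:.4f}")
```

A small p-value would suggest that the two batches do not share the same strength distribution; for samples this small, an exact test is usually preferable to the normal approximation.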

Two-Stage Logistic Regression for Cancer Classification and Prediction from Copy-Number Changes in cDNA Microarray-Based Comparative Genomic Hybridization

  • Kim, Mi-Jung
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.847-859 / 2011
  • cDNA microarray-based comparative genomic hybridization (CGH) data include low-intensity spots, and thus a statistical strategy is needed to detect subtle differences between cancer classes. In this study, genes displaying a high frequency of alteration in one of the classes were selected from among pre-selected genes that show relatively large between-gene variation compared to total variation. Using copy-number changes of the selected genes, this study suggests a statistical approach that predicts patients' classes with increased performance by pre-classifying patients with similar genetic alteration scores. A two-stage logistic regression model (TLRM) is suggested to pre-classify homogeneous patients and predict patients' classes for cancer prediction; a decision tree (DT) was combined with logistic regression on the set of informative genes. TLRM was constructed on cDNA microarray-based CGH data from the Cancer Metastasis Research Center (CMRC) at Yonsei University; it predicted the patients' clinical diagnoses with perfect matches (except for one patient) among the patients classified as high-risk or low-risk, where prediction performance is critical due to the high sensitivity and specificity requirements for clinical treatments. Accuracy validated by leave-one-out cross-validation (LOOCV) was 83.3%, while the other classification methods used for comparison, CART and DT, performed worse than TLRM.
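
The leave-one-out cross-validation (LOOCV) mentioned in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions: the classifier is a one-dimensional nearest-centroid rule standing in for TLRM, and the function names and data are hypothetical.

```python
def nearest_centroid_fit(xs, ys):
    """Return {label: mean feature value} for 1-D features."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def nearest_centroid_predict(model, x):
    """Predict the label whose centroid is closest to x."""
    return min(model, key=lambda label: abs(x - model[label]))

def loocv_accuracy(xs, ys):
    """Hold out each sample once, train on the rest, and score the held-out sample."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = nearest_centroid_fit(train_x, train_y)
        hits += nearest_centroid_predict(model, xs[i]) == ys[i]
    return hits / len(xs)

# Hypothetical alteration scores and risk classes.
scores = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
classes = ["low", "low", "low", "high", "high", "high"]
print(loocv_accuracy(scores, classes))
```

Because every sample is predicted by a model that never saw it, the resulting accuracy is a less optimistic estimate than accuracy measured on the training data itself.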

A maximum likelihood approach to infer demographic models

  • Chung, Yujin
    • Communications for Statistical Applications and Methods / v.27 no.3 / pp.385-395 / 2020
  • We present a new maximum likelihood approach to estimate demographic history using genomic data sampled from two populations. A demographic model such as an isolation-with-migration (IM) model explains the genetic divergence of two populations that split from their common ancestral population. The standard probability model for an IM model contains a latent variable called a genealogy, which represents gene-specific evolutionary paths and links the genetic data to the IM model. Under an IM model, a genealogy consists of two kinds of evolutionary paths of genetic data: vertical inheritance paths (coalescent events) through generations and horizontal paths (migration events) between populations. The computational complexity of IM model inference is one of the major limitations in analyzing genomic data. We propose a fast maximum likelihood approach to estimate IM models from genomic data. The first step analyzes genomic data and maximizes the likelihood of a coalescent tree that contains the vertical paths of a genealogy. The second step analyzes the estimated coalescent trees and finds the parameter values of an IM model that maximize the distribution of the coalescent trees after taking possible migration events into account. We evaluate the performance of the new method by analyzing simulated data and genomic data from two subspecies of common chimpanzees in Africa.

A study on the comparison of contents in mathematics curriculums - focused on Korea, California in USA, England, Japan - (수학과 교육과정에서의 내용 비교 연구 - 우리나라, 미국의 캘리포니아주, 영국, 일본을 중심으로 -)

  • 나귀수;황혜정;임재훈
    • Journal of Educational Research in Mathematics / v.13 no.3 / pp.403-428 / 2003
  • This study compares the mathematics content included in the mathematics curricula of Korea, California in the USA, England, and Japan. The comparison shows large differences in the range, depth, and grade placement of mathematics content across the four curricula. In Korea, more content is covered in earlier grades and at a higher level than in the other countries. These features are more apparent in algebra, analysis, and geometry than in probability and statistics.

Polyclass in Data Mining (데이터 마이닝에서의 폴리클라스)

  • 구자용;박헌진;최대우
    • The Korean Journal of Applied Statistics / v.13 no.2 / pp.489-503 / 2000
  • Data mining means data analysis and model selection using various types of data in order to explore useful information and knowledge for making decisions. Examples of data mining include scoring for credit analysis of a new customer and scoring for churn management, where customers with high scores are given special attention. In this paper, scoring is interpreted as a process of modeling the conditional probability, and the polyclass scoring method is described. German credit data, data from a PC communication company, and data from a mobile communication company are used to compare the performance of the polyclass scoring method with that of a scoring method based on a tree model.

Autocovariance based estimation in the linear regression model (선형회귀 모형에서 자기공분산 기반 추정)

  • Park, Cheol-Yong
    • Journal of the Korean Data and Information Science Society / v.22 no.5 / pp.839-847 / 2011
  • In this study, we derive an autocovariance-based estimator for the regression coefficient vector in the multiple linear regression model. This method was suggested by Park (2009), and although it does not seem intuitively attractive, the estimator is unbiased for the regression coefficient vector. When the vectors of explanatory variables satisfy some regularity conditions, and under mild conditions that are satisfied when the errors come from autoregressive and moving average models, this estimator has asymptotically the same distribution as the least squares estimator and also converges in probability to the regression coefficient vector. Finally, we provide a simulation study showing that the aforementioned theoretical results hold for small samples.

Development and Effectiveness of a Spiritual Care Education Program for Nursing Students - Based on the ASSET Model (간호대학생의 영적간호교육프로그램의 개발 및 효과)

  • Chung, Mi-Ja;Eun, Young
    • Journal of Korean Academy of Nursing / v.41 no.5 / pp.673-683 / 2011
  • Purpose: This study was done to develop a spiritual care education program (SCEP) for nursing students to help increase their awareness of the essence of spirituality in care so as to enable them to promote spiritual well-being and spiritual care competence. Methods: The participants were assigned to an experimental group (n=42) or a control group (n=39). From August to October 2009, the experimental group participated in the SCEP, which was held 2 hours a week for 6 weeks. The data were analyzed using the $\chi^2$-test, Fisher's exact probability test, the paired t-test, and the t-test with the SPSS WIN 17.0 statistics program. Results: The experimental group had higher mean scores for spirituality, spiritual well-being, and spiritual care competence than the control group. Significant differences were found between the experimental group and the control group. Conclusion: The results of this study indicate that the SCEP was effective in improving spirituality, spiritual well-being, and spiritual care competence for nursing students.

GOODNESS-OF-FIT TEST USING LOCAL MAXIMUM LIKELIHOOD POLYNOMIAL ESTIMATOR FOR SPARSE MULTINOMIAL DATA

  • Baek, Jang-Sun
    • Journal of the Korean Statistical Society / v.33 no.3 / pp.313-321 / 2004
  • We consider the problem of testing cell probabilities in sparse multinomial data. Aerts et al. (2000) presented $T=\sum_{i=1}^{k}[p_i^{*}-E(p_i^{*})]^2$ as a test statistic with the local least square polynomial estimator $p_i^{*}$, and derived its asymptotic distribution. The local least square estimator may produce negative estimates for cell probabilities. The local maximum likelihood polynomial estimator $\hat{p}_i$, however, guarantees positive estimates for cell probabilities and has the same asymptotic performance as the local least square estimator (Baek and Park, 2003). When the cell probabilities differ considerably in size, giving the difference between the estimator and the hypothetical probability the same contribution at each cell, as their test statistic does, is not a proper measure of the total goodness-of-fit. We instead consider a Pearson-type goodness-of-fit test statistic, $T_1=\sum_{i=1}^{k}[p_i^{*}-E(p_i^{*})]^2/p_i$, and show that it follows an asymptotic normal distribution. We also investigate the asymptotic normality of $T_2=\sum_{i=1}^{k}[p_i^{*}-E(p_i^{*})]^2/p_i$ where the minimum expected cell frequency is very small.
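
To make the difference between the unweighted statistic $T$ and the Pearson-type statistic $T_1$ concrete, the sketch below evaluates both from given per-cell values. It does not implement the local polynomial estimators: the function names are hypothetical, and the estimates, their expectations, and the hypothesized probabilities are placeholder inputs chosen only for illustration.

```python
def t_statistic(p_star, expected):
    """Unweighted sum of squared deviations: sum_i [p*_i - E(p*_i)]^2."""
    return sum((ps - e) ** 2 for ps, e in zip(p_star, expected))

def t1_statistic(p_star, expected, p):
    """Pearson-type version: each squared deviation is divided by the cell probability p_i."""
    return sum((ps - e) ** 2 / pi for ps, e, pi in zip(p_star, expected, p))

# Placeholder values (assumptions, not taken from the paper).
p_hyp = [0.90, 0.08, 0.02]        # hypothesized cell probabilities
expected = [0.90, 0.08, 0.02]     # stand-in for E(p*_i)
p_star = [0.89, 0.08, 0.03]       # stand-in for the estimates p*_i

print(t_statistic(p_star, expected))
print(t1_statistic(p_star, expected, p_hyp))
```

Both deviations have the same absolute size (0.01), so they contribute equally to $T$, whereas in $T_1$ the deviation in the small cell is weighted far more heavily because it is divided by a much smaller $p_i$.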

ETF risk management (ETF 위험관리에 관한 연구)

  • Lee, Woosik
    • Journal of the Korean Data and Information Science Society / v.28 no.4 / pp.843-851 / 2017
  • The rise of the robo-advisor represents one of the most profound shifts in FinTech. It also raises concerns about robo-advisors' financial management. As most robo-advisors utilize ETFs, in this paper we seek to determine the appropriate risk management model for estimating 95% Value-at-Risk (VaR) and 99% VaR. The GARCH and the Markov regime switching GARCH models are evaluated in terms of the accuracy of probability, the independence of extreme event occurrences, and both jointly. The results show that the Markov regime switching GARCH can be a good ETF risk management tool, since it can reflect structural changes in the financial market in the volatility.
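
One common way to check the "accuracy of probability" of a VaR model is to compare the observed rate of VaR violations with the nominal rate (5% for 95% VaR). The sketch below uses Kupiec's proportion-of-failures likelihood ratio for this purpose. That choice is an assumption, since the abstract does not name the test used, and the function names and inputs are hypothetical.

```python
import math

def count_violations(returns, var_forecasts):
    """Count days on which the return falls below the negative VaR forecast.

    VaR forecasts are given as positive loss thresholds (assumption).
    """
    return sum(r < -v for r, v in zip(returns, var_forecasts))

def kupiec_pof(n_obs, n_viol, alpha):
    """Kupiec proportion-of-failures LR statistic.

    alpha is the expected violation rate, e.g. 0.05 for 95% VaR.
    Under correct coverage the statistic is approximately chi-square with 1 df.
    """
    def loglik(p):
        # Binomial log-likelihood, treating 0 * log(0) as 0.
        ll = 0.0
        if n_viol > 0:
            ll += n_viol * math.log(p)
        if n_obs - n_viol > 0:
            ll += (n_obs - n_viol) * math.log(1 - p)
        return ll

    p_hat = n_viol / n_obs
    return -2.0 * (loglik(alpha) - loglik(p_hat))

# Hypothetical backtest: 250 trading days, 20 violations of a 95% VaR.
lr = kupiec_pof(250, 20, 0.05)
print(f"LR = {lr:.2f}; values above 3.84 reject correct coverage at the 5% level")
```

This covers only the coverage criterion; testing the independence of violations requires a separate statistic based on how violations follow one another over time.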