• Title/Summary/Keyword: 통계 분류 (statistical classification)

A Study on the Statistical Analysis of Korea Patent Information (한국특허정보의 통계분석에 관한 연구)

  • Uhm, Dai-Ho; Chang, Young-Bae; Jeong, Eui-Seop
    • Journal of Information Management / v.41 no.3 / pp.27-44 / 2010
  • Most research on patent data analyzes technology trends using a Patent Map (PM) and reports the frequencies and trends of patents on a given topic using tables or graphs in Excel. However, more advanced analysis tools are now needed to compare trends across national and international industries. This research discusses why statistical analysis is needed to improve the reliability of PM analysis, and compares patent trends in Korea between 1990 and 2004 by year, International Patent Classification (IPC) section, and country using frequencies and a Poisson regression model. Statistical analysis is also suggested for, and applied to, R&D studies.
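As a rough illustration of the Poisson regression mentioned in the abstract above, the sketch below fits a log-linear trend, log(mu) = a + b * year, to yearly counts by Newton-Raphson. It is a minimal sketch with a single covariate, not the model specification used in the paper; the function name `fit_poisson` and the toy counts are assumptions made for illustration.

```python
import math

def fit_poisson(x, y, iters=50, tol=1e-10):
    """Fit log(mu_i) = a + b * x_i by Newton-Raphson on the Poisson log-likelihood.

    Assumes the x values are not all identical and that sum(y) > 0.
    """
    # Start from the log of the mean count with a zero slope.
    a, b = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system with the explicit inverse.
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b

# Toy example with made-up counts (not data from the paper).
years = [0, 1, 2, 3, 4, 5]        # years since the start of the series
counts = [3, 4, 5, 7, 9, 12]      # hypothetical patent counts per year
a_hat, b_hat = fit_poisson(years, counts)
```

Under this model, exp(b_hat) can be read as the estimated multiplicative change in the expected count per year.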

Variational Bayesian multinomial probit model with Gaussian process classification on mice protein expression level data (가우시안 과정 분류에 대한 변분 베이지안 다항 프로빗 모형: 쥐 단백질 발현 데이터에의 적용)

  • Donghyun Son; Beom Seuk Hwang
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.115-127 / 2023
  • The multinomial probit model is a popular model for multiclass classification and choice modeling. Markov chain Monte Carlo (MCMC) methods are widely used to estimate the multinomial probit model, but their computational cost is high. Variational Bayesian approximation, by contrast, is well known to be more computationally efficient than MCMC because it uses subsets of samples. In this study, we describe the multinomial probit model with Gaussian process classification and show how to apply variational Bayesian approximation to it. We also compare the results of the variational Bayesian multinomial probit model with those of naive Bayes, K-nearest neighbors, and support vector machines on the UCI mice protein expression level data.

A Study on the Optimal Safety Measurement Classification System by Correcting and Deriving Statistical Items for External Causes of Death (통계청 사망외인 통계항목군의 수정도출을 통한 최적의 안전측정 분류체계(안) 연구)

  • Park, Hyung-joo
    • The Journal of the Convergence on Culture Technology / v.6 no.4 / pp.535-543 / 2020
  • The KSA introduced errors in deriving statistical measurement items because the safety variables and sub-modules that make up the safety statistics framework for external causes of death were misconfigured, and because pathogenesis was labeled as a mechanism without considering the academic classification system for the fields and areas of disasters/accidents. The analysis finds that this resulted in poor statistical validity. In this study, therefore, the safety parameters are changed according to the WHO definition of safety and the sub-modules are set appropriately, so that the categories of falls, drowning accidents, and accidents involving exposure to inanimate mechanical forces are classified as accidents at industrial sites and at work in daily life. By systematically re-establishing the NSO's complex group of statistical items, deriving the disaster/accident fields according to the nature of the source of external causes of death, and setting the relevant domains on an academic basis, statistical validity improves, and the result is expected to play an important role in determining the direction of safety investment.

Integrated Clustering Method based on Syntactic Structure and Word Similarity for Statistical Machine Translation (문장구조 유사도와 단어 유사도를 이용한 클러스터링 기반의 통계기계번역)

  • Kim, Hankyong; Na, Hwi-Dong; Li, Jin-Ji; Lee, Jong-Hyeok
    • Annual Conference on Human and Language Technology / 2009.10a / pp.44-49 / 2009
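  • In statistical machine translation, one way to improve performance is to attempt domain-specific translation. To this end, sentences are clustered by type or by genre. However, no previous study has used sentence-type information and genre information at the same time. In this paper, we present a new technique that classifies sentences by type according to the similarity of their grammatical structure, distinguish document genres using word-similarity information, and integrate the two existing approaches. We improve the performance of statistical machine translation by interpolating between models extracted from the classified corpora and a model extracted from the entire corpus. A kernel and cosine similarity are applied to compute sentence-structure similarity and word similarity, respectively, and a machine learning technique similar to the K-Means algorithm is used to partition the corpus with the two similarities. In experiments on Japanese-English patent documents, we obtained a relative performance improvement of about 2.5% in the best case.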
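Two of the building blocks named in the abstract above, cosine similarity between word vectors and interpolation between a cluster-specific model and a general model, can be sketched as follows. This is an illustrative sketch only: the bag-of-words representation, the function names, and the weight `lam` are assumptions, and the structural kernel and the K-Means-like clustering step are not shown.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two token lists under a bag-of-words model."""
    va, vb = Counter(doc_a), Counter(doc_b)
    # Only words present in both documents contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0

def interpolate(p_cluster, p_general, lam=0.5):
    """Linear interpolation of a cluster-specific and a general model probability.

    lam is a hypothetical mixing weight in [0, 1]; lam = 1 trusts only the
    cluster-specific model, lam = 0 only the general one.
    """
    return lam * p_cluster + (1.0 - lam) * p_general
```

In a setup like the one described, the mixing weight would typically be tuned on held-out data rather than fixed in advance.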


Spam Filter by Using X2 Statistics and Support Vector Machines (카이제곱 통계량과 지지벡터기계를 이용한 스팸메일 필터)

  • Lee, Song-Wook
    • The KIPS Transactions:PartB / v.17B no.3 / pp.249-254 / 2010
  • We propose an automatic spam filter for e-mail data using Support Vector Machines (SVM). We use the lexical form of each word and its part-of-speech (POS) tag as features and select features by the chi-square statistic. In the experiments, each feature is represented by TF (text frequency), TF-IDF, or a binary weight. After the SVM is trained on the selected features, it classifies each e-mail as spam or not. The selected features improve the performance of our system, and we achieved an overall accuracy of 98.9% on the TREC05-p1 spam corpus.
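The chi-square feature selection step mentioned in the abstract above can be illustrated with the standard statistic for a 2x2 term/class contingency table. This is a minimal sketch, assuming binary term presence and two classes; the function names and the table layout are illustrative and not taken from the paper.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: spam messages containing the term
    b: non-spam messages containing the term
    c: spam messages without the term
    d: non-spam messages without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column total is zero, so the term carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def select_features(tables, k):
    """Return the k terms with the largest chi-square scores.

    tables maps each term to its (a, b, c, d) counts.
    """
    ranked = sorted(tables, key=lambda t: chi_square(*tables[t]), reverse=True)
    return ranked[:k]
```

A term that appears equally often in both classes scores zero, while a term confined to one class scores highest, which is why the statistic is useful for pruning uninformative features before training a classifier.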

A hybrid method to compose an optimal gene set for multi-class classification using mRMR and modified particle swarm optimization (mRMR과 수정된 입자군집화 방법을 이용한 다범주 분류를 위한 최적유전자집단 구성)

  • Lee, Sunho
    • The Korean Journal of Applied Statistics / v.33 no.6 / pp.683-696 / 2020
  • The aim of this research is to find an optimal gene set that provides highly accurate multi-class classification with a minimum number of genes. A two-stage procedure is proposed. In the first stage, based on the minimum redundancy and maximum relevance (mRMR) framework, several statistics for ranking differentially expressed genes and K-means clustering for reducing redundancy between genes are used to filter the data. In the second stage, a modified particle swarm optimization selects a small subset of informative genes. Two well-known multi-class microarray data sets, ALL and SRBCT, are analyzed to demonstrate the effectiveness of this hybrid method.
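To make the mRMR idea in the abstract above concrete, the sketch below shows the generic greedy criterion: at each step, pick the feature whose relevance to the target, minus its mean redundancy with the features already chosen, is largest. It is not the paper's two-stage procedure. Absolute Pearson correlation is swapped in for both relevance and redundancy, the target is assumed to be numeric (for example, a binary label), and all names are hypothetical.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def mrmr(features, target, k):
    """Greedy mRMR selection.

    features maps a feature name to its list of values; target is a numeric list.
    Relevance and redundancy are both measured by absolute Pearson correlation,
    which is an assumption made for this sketch.
    """
    relevance = {f: abs(pearson(v, target)) for f, v in features.items()}
    selected, remaining = [], set(features)

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(abs(pearson(features[f], features[s])) for s in selected)
        return relevance[f] - redundancy / len(selected)

    while remaining and len(selected) < k:
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

A feature that duplicates one already selected is penalized by its high redundancy, so a less correlated feature with similar relevance tends to be chosen instead.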

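The Present and Future of the Office Equipment Market: Trends in Japan's Office Equipment and Copier/Multifunction Printer Markets as Seen Through Statistics (사무기기 시장의 현재와 미래 - 통계로 보는 일본의 사무기기 및 복사기.복합기 시장 동향)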

  • Multi Function Printer, Multi Function Printer
    • The Optical Journal / s.122 / pp.20-21 / 2009
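  • These statistics do not include diazo, optical disc filing, and typewriters, which were excluded from the production statistics as of January 2002, or shredders and time recorders, which were excluded as of January 2005. Because of space constraints, the tables for offset printers (production, imports and exports) and word processors (imports and exports) are omitted, but those items are included in the totals for office equipment as a whole. Accounting machines (imports and exports) were excluded from the statistics beginning in 2007 and are not in the totals. Meanwhile, beginning with the January 2007 statistics, the copier category for imports and exports was changed to include laser multifunction devices (models with fax, printer, and scanner functions), which had not previously been counted. As a result, the figures are higher than before, and the electrostatic indirect and "other" categories were eliminated.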


Assessing Spatial Uncertainty Distributions in Classification of Remote Sensing Imagery using Spatial Statistics (공간 통계를 이용한 원격탐사 화상 분류의 공간적 불확실성 분포 추정)

  • Park No-Wook; Chi Kwang-Hoon; Kwon Byung-Doo
    • Korean Journal of Remote Sensing / v.20 no.6 / pp.383-396 / 2004
  • This paper investigates the application of spatial statistics to obtain spatial uncertainty distributions in the classification of remote sensing images. Two quantitative methods are presented for describing two kinds of uncertainty: one related to class assignment and the other related to the connection of reference samples. Three quantitative indices are addressed for the first category of uncertainty. Geostatistical simulation is applied both to integrate the exhaustive classification results with the sparse reference samples and to obtain the spatial uncertainty or accuracy distributions connected to those reference samples. To illustrate the proposed methods and to discuss operational issues, an experiment was carried out on a multi-sensor remote sensing data set for supervised land-cover classification. The experimental results show that the two quantitative methods can provide additional information for interpreting and evaluating classification results; more experiments should be carried out to verify the methods.

Comparison Studies of Classification Methods based on L1-Distance and L1-Data Depth (L1-거리와 L1-데이터뎁스를 이용한 분류방법의 비교연구)

  • Baek Soo-Jin; Hwang Jin-Soo; Kim Jean-Kyung
    • The Korean Journal of Applied Statistics / v.19 no.1 / pp.183-193 / 2006
  • We consider a new classification method (DnDclass) that combines two classification rules, one based on $L_1$-distance (L1DISTclass) and one based on $L_1$-data depth (L1DDclass). To investigate the characteristics and evaluate the performance of these classification methods, we use simulated data in various settings. The simulation study confirms that the new method, DnDclass, performs relatively well in many cases.

A comparative study of feature screening methods for ultrahigh dimensional multiclass classification (초고차원 다범주분류를 위한 변수선별 방법 비교 연구)

  • Lee, Kyungeun; Kim, Kyoung Hee; Shin, Seung Jun
    • The Korean Journal of Applied Statistics / v.30 no.5 / pp.793-808 / 2017
  • We compare various variable screening methods on multiclass classification problems when the data is ultrahigh-dimensional. Two different approaches were considered: (1) pairwise extension from binary classification via one versus one or one versus rest comparisons and (2) direct classification of multiclass responses. We conducted extensive simulation studies under different conditions: heavy tailed explanatory variables, correlated signal and noise variables, correlated joint distributions but uncorrelated marginals, and unbalanced response variables. We then analyzed real data to examine the performance of the methods. The results showed that model-free methods perform better for multiclass classification problems as well as binary ones.