• Title/Abstract/Keywords: methods of data analysis

Comparison between Word Embedding Techniques in Traditional Korean Medicine for Data Analysis: Implementation of a Natural Language Processing Method

  • 오준호
    • 대한한의학원전학회지 / Vol. 32, No. 1 / pp. 61-74 / 2019
  • Objectives: The purpose of this study is to help select an appropriate word embedding method when analyzing East Asian traditional medicine texts as data. Methods: Based on prescription data that imply traditional methods in traditional East Asian medicine, we examined 4 count-based and 2 prediction-based word embedding methods. To compare these word embedding methods intuitively, we proposed a "prescription generating game" and compared its results with those from the application of the 6 methods. Results: When adjacent vectors are extracted, the count-based word embedding methods derive the main herbs that are frequently used in conjunction with each other, whereas the prediction-based word embedding methods derive synonyms of the herbs. Conclusions: Count-based word embedding methods seem to be more effective than prediction-based word embedding methods in analyzing the use of domesticated herbs. Among count-based word embedding methods, the TF vector method tends to exaggerate the frequency effect, and hence the TF-IDF vector or co-word vector may be a more reasonable choice. The t-score vector may be recommended when searching for unusual information that cannot be found from frequency alone. Prediction-based embedding, on the other hand, seems to be effective for deriving the bases of similar meanings in context.
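
As a rough illustration of the count-based idea described in this abstract, the sketch below builds co-word (co-occurrence) vectors from a handful of made-up prescriptions and lists the herbs most often used together with a query herb. The herb names, the `prescriptions` data, and the helpers `coword_vectors` and `top_cooccurring` are all hypothetical; the abstract does not give the paper's exact vector definitions, so this is only one plausible reading.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical toy prescriptions (not data from the paper).
prescriptions = [
    ["ginseng", "licorice", "ginger"],
    ["ginseng", "licorice", "atractylodes"],
    ["ephedra", "cinnamon", "licorice"],
    ["ephedra", "cinnamon", "apricot_kernel"],
]

def coword_vectors(docs):
    """For each herb, count how many prescriptions it shares with every other herb."""
    vectors = defaultdict(Counter)
    for doc in docs:
        # set() ensures a herb listed twice in one prescription is counted once.
        for a, b in permutations(set(doc), 2):
            vectors[a][b] += 1
    return vectors

def top_cooccurring(vectors, herb, k=3):
    """Return the k herbs most often prescribed with `herb` (ties broken by name)."""
    return sorted(vectors[herb].items(), key=lambda kv: (-kv[1], kv[0]))[:k]

vectors = coword_vectors(prescriptions)
print(top_cooccurring(vectors, "ginseng"))
```

With raw counts like these, a herb that appears in almost every prescription tends to dominate every neighbor list, which is the kind of frequency effect the abstract attributes to the TF vector.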

Finding Interesting Genes Using Reliability in Various Gene Expression Models

  • Lee, Eun-Kyung;Cook, Dianne;Hoffman, Heike
    • Genomics & Informatics / Vol. 9, No. 1 / pp. 28-36 / 2011
  • Most statistical methods for finding interesting genes focus on summary values with large fold-changes or large variations. Very few methods consider the probe level data. We developed a new measure to detect reliability that incorporates the probe level data. This reliability measure is useful for exploring microarray data without ignoring the probe level data. It is easy to calculate, and it can be used alongside other statistical methods as a guideline for finding genes that are truly differentially expressed. Instead of filtering out genes before the analysis, we use all genes in the analysis and make decisions with the new reliability measure.

A Study on Comparison with the Methods of Ordered Categorical Data of Analysis

  • 김홍준;송서일
    • 산업경영시스템학회지 / Vol. 20, No. 44 / pp. 207-215 / 1997
  • This paper compares Taguchi's accumulation analysis method and the Nair test on ordered categorical data from an industrial experiment for quality improvement. Taguchi's accumulation analysis method is shown to have reasonable power for detecting location effects, while the Nair test identifies the location and dispersion effects separately. Accordingly, Taguchi's accumulation analysis needs methods for detecting dispersion effects as well as location effects. In addition, this paper recommends models for analyzing ordered categorical data, for example the cumulative logit model and the mean response model. Going forward, simple and reasonable methods that practitioners are more likely to use should be introduced.

Quadrilateral Irregular Network for Mesh-Based Interpolation

  • Tae Beom Kim;Chihyung Lee
    • 지질공학 / Vol. 33, No. 3 / pp. 439-459 / 2023
  • Numerical analysis has been adopted in nearly all modern scientific and engineering fields due to the rapid and ongoing evolution of computational technology, with the number of grid or mesh points in a given data field also increasing. Some values must be extracted from large data fields to evaluate and supplement numerical analysis results and observational data, thereby highlighting the need for a fast and effective interpolation approach. The quadrilateral irregular network (QIN) proposed in this study is a fast and reliable interpolation method that is capable of sufficiently satisfying these demands. A comparative sensitivity analysis is first performed using known test functions to assess the accuracy and computational requirements of QIN relative to conventional interpolation methods. These same interpolation methods are then employed to produce simple numerical model results for a real-world comparison. Unlike conventional interpolation methods, QIN can obtain reliable results with a guaranteed degree of accuracy since there is no need to determine the optimal parameter values. Furthermore, QIN is a computationally efficient method compared with conventional interpolation methods that require the entire data space to be evaluated during interpolation, even if only a subset of the data space requires interpolation.

Long and Short Wave Radiation and Correlation Analysis Between Downtown and Suburban Area (II) - Study on Correlation Analysis Method of Radiation Data -

  • 최동호;이부용;오호엽
    • 한국태양에너지학회 논문집 / Vol. 33, No. 4 / pp. 101-110 / 2013
  • The purpose of this study is to understand radiation phenomena and to compare two analysis methods: one analyzes same-time data and the other analyzes rank data. We confirmed that both correlation analysis methods are effective and suitable. The main results are as follows. 1) The seasonal correlation coefficient of long- and short-wave radiation is higher in winter than in summer, because high summer humidity readily produces local cloud cover. 2) Depending on the analysis method, the correlation coefficient for short-wave radiation by location during summer differs greatly, from 0.494 (same-time data) to 0.967 (rank data). These results are of significant value for solar radiation research and analysis, and the study also opens a new approach to analysis methods in solar radiation research.
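
The abstract does not define its two methods precisely, so the following is only a sketch under a stated assumption: "same-time" analysis is taken to mean a Pearson correlation of values paired by observation time, and "rank" analysis is taken to mean sorting each series independently and pairing values by rank. The sample series and the function names `pearson`, `same_time_correlation`, and `rank_paired_correlation` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def same_time_correlation(x, y):
    """Correlate values observed at the same time step."""
    return pearson(x, y)

def rank_paired_correlation(x, y):
    """ASSUMPTION: sort each series on its own, then correlate values of equal rank."""
    return pearson(sorted(x), sorted(y))

# Hypothetical radiation readings at two sites over five time steps.
downtown = [1, 3, 2, 5, 4]
suburban = [2, 1, 4, 3, 5]

r_time = same_time_correlation(downtown, suburban)
r_rank = rank_paired_correlation(downtown, suburban)
print(round(r_time, 3), round(r_rank, 3))
```

Under this reading, rank pairing removes timing mismatches such as a cloud passing one site before the other, which would explain why it yields a much higher coefficient than same-time pairing.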

An Introduction to Data Analysis

  • 박선일;이영원
    • 한국임상수의학회지 / Vol. 26, No. 3 / pp. 189-199 / 2009
  • With the growing importance of evidence-based medicine, clinical and biomedical research relies critically on the validity and reliability of data, so that the subsequent statistical inferences for medical decision-making lead to valid conclusions. Despite the widespread use of analytical techniques in papers published in the Journal of Veterinary Clinics, statistical errors, particularly in experimental design, research methodology, or data analysis methods, are commonly encountered. These flaws often lead to misinterpretation of the data and thus to inappropriate conclusions. This article is the first in a series of nontechnical introductions, designed not as a systematic review of medical statistics but to give the journal's readers an understanding of common statistical concepts, including data scale, selection of appropriate statistical methods, descriptive statistics, data transformation, confidence intervals, the principles of hypothesis testing, sampling distributions, and interpretation of results.

Functional Areas of Kwang-ju City through Analysis of the Taxi-flow Pattern

  • 김영기
    • 대한교통학회지 / Vol. 6, No. 2 / pp. 35-48 / 1988
  • Among the various methods for analyzing the internal structure of a city, factor analysis using O-D matrix data has some merits and characteristics compared with other methods. 1) By reanalyzing O-D data, which are too complex for the specific meaning or pattern of flow systems to be grasped directly, it is possible to find definite interaction and flow patterns between traffic zones within a city. 2) Traffic flow patterns can be easily visualized using suitable graphic techniques, and functional areas whose mutual interaction linkages are significantly strong can be identified. In this study, taxi traffic O-D data between 42 traffic zones in Kwang-ju city were reanalyzed by varimax-rotated factor analysis. As a result, four factors with significant factor loadings (over 0.5) and factor scores (over 1.0) were extracted; that is, four functional areas were identified in Kwang-ju city: the West, East, South, and North functional areas. These four functional areas almost coincide with citizens' general conception of community divisions and with the administrative districts. Accordingly, factor analysis using traffic data appears to be a very accurate and useful instrument for analyzing flow patterns and identifying the functional areas of a city, and is believed to provide basic information and criteria for practical urban land use planning and transportation planning.

Improving the Gumbel analysis by using M-th highest extremes

  • Cook, Nicholas J.
    • Wind and Structures / Vol. 1, No. 1 / pp. 25-42 / 1998
  • Improvements to the Gumbel method of extreme value analysis of wind data made over the last two decades are reviewed and illustrated using sample data for Jersey. A new procedure for extending the Gumbel method to include M-th highest annual extremes is shown to be less effective than the standard method, but leads to a method for calibrating peak-over-threshold methods against the standard Gumbel approach. Peak-over-threshold methods that include at least the 3rd highest annual extremes, specifically the modified Jensen and Franck method and the "Method of independent storms" are shown to give the best estimates of extremes from observations.
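
For readers unfamiliar with the basic Gumbel approach that this abstract takes as its baseline, the sketch below fits a Gumbel distribution to annual maximum wind speeds and computes a return level. It uses a method-of-moments fit, which is a simplification chosen here and not necessarily the fitting procedure reviewed in the paper; the wind-speed values and the function names are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel_moments(annual_maxima):
    """Method-of-moments estimates of the Gumbel location and scale."""
    mean = statistics.fmean(annual_maxima)
    std = statistics.stdev(annual_maxima)
    scale = std * math.sqrt(6) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale

def gumbel_cdf(x, loc, scale):
    """Probability that an annual maximum does not exceed x."""
    return math.exp(-math.exp(-(x - loc) / scale))

def return_level(loc, scale, period_years):
    """Value exceeded on average once every `period_years` years."""
    p = 1.0 - 1.0 / period_years
    return loc - scale * math.log(-math.log(p))

# Hypothetical annual maximum wind speeds in m/s (not the Jersey data).
annual_maxima = [24.1, 27.3, 22.8, 30.2, 25.6, 28.9, 23.4, 26.7, 29.5, 25.0]

loc, scale = fit_gumbel_moments(annual_maxima)
x50 = return_level(loc, scale, 50)
print(f"location={loc:.2f}, scale={scale:.2f}, 50-year level={x50:.2f}")
```

Only one value per year enters this fit. The M-th highest extremes and peak-over-threshold methods discussed in the abstract are ways of using more of each year's record.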

A Study on the Current Trends of User Study at the Knowledge Based Information Society

  • 한복희
    • 한국문헌정보학회지 / Vol. 37, No. 4 / pp. 295-310 / 2003
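  • This study presents the research methods and trends of user studies in Korea through a content analysis of user-study papers published from 1991 to 2003. The analysis examined each paper's research topic, research method, data collection method, data analysis method, and use of hypotheses. On average, 12.8 user-study papers were published per year from 1991 through autumn 2003. By year, 2001 had the most papers, with 24. By topic, the most studied were information use behavior, user studies, user interfaces, library and information use education, and online catalog use behavior, in that order. The most common research methods were literature review, survey research, and case study, in that order. For data analysis, descriptive statistics were mainly used; user-study researchers also used the chi-square test (28.0%), correlation (22.7%), the t-test (17.35), analysis of variance (14.7%), and multivariate analysis (4.0%), and about 17% of researchers set hypotheses.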

Analysis of Degradation Data Using Robust Experimental Design

  • 서순근;하천수
    • 품질경영학회지 / Vol. 32, No. 1 / pp. 113-129 / 2004
  • The reliability of a product can be improved by making it less sensitive to noise. In particular, it is important to make products robust against the various noise factors encountered in production and field environments. In this paper, a simple random coefficient degradation model is assumed for the degradation phenomenon in order to present procedures for analyzing degradation data under robust experimental design. To alleviate the weak points of previous approaches, such as Taguchi's, Wasserman's, and pseudo failure time methods, novel techniques are proposed for analyzing degradation data using a cross array that treats the amount of degradation as a dynamic characteristic over time. The analysis approaches are classified according to whether a parametric or nonparametric degradation rate (or slope) is assumed. A simulation study also demonstrates the superiority of the proposed methods over some previous works.