• Title/Summary/Keyword: methods of data analysis


Study on core herbs and herbal prescriptions from Internal medicine on Liver system in Korean Medicine (한방간계내과학 내 중요 본초 및 처방 분석 연구)

  • Kim Anna;Seo Sumin;Kim Sangkyun;Lee Sanghun;Oh Yongtaek
    • Herbal Formula Science / v.32 no.2 / pp.129-139 / 2024
  • Objective: This study aims to investigate core herbs and formulas in Internal Medicine on the Liver system (IML) to enhance the efficiency of teaching IML, Herbology, and Formula Science, as well as to increase the integration of these courses. Methods: The study employed the frequency concept, commonly utilized in previous studies, alongside network analysis. Results: This study identified frequently used herbs, herbs with high centrality, frequently combined herbs, and core formulas. The herb with the highest frequency was 'Angelicae Gigantis Radix', the herb with the highest centrality was 'Citri Unshius Pericarpium', and the most frequent herb combination was 'Zingiberis Rhizoma - Citri Unshius Pericarpium'. The network analysis revealed a total of 5 herbal combination communities. Conclusion: In this study, we identified core herbs using traditional frequency analysis. Additionally, to complement the traditional analysis, we discovered core herbal combinations and fundamental herbal prescriptions through network analysis. The results of this study can be utilized as foundational data to enhance the efficiency of education in IML, Herbology, and Formula Science courses, as well as to improve coherence and consistency between foundational and clinical subjects.
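The frequency-plus-network workflow this abstract describes can be sketched in plain Python. The prescription lists below are invented for illustration (the paper's actual data set is not reproduced here), and degree centrality stands in for whichever centrality measure the authors computed:

```python
from collections import Counter
from itertools import combinations

# Hypothetical prescriptions, each a list of constituent herbs.
prescriptions = [
    ["Angelicae Gigantis Radix", "Citri Unshius Pericarpium", "Zingiberis Rhizoma"],
    ["Citri Unshius Pericarpium", "Zingiberis Rhizoma"],
    ["Angelicae Gigantis Radix", "Glycyrrhizae Radix"],
]

# Frequency analysis: how often each herb appears across prescriptions.
frequency = Counter(h for p in prescriptions for h in p)

# Co-occurrence network: an edge links herbs prescribed together.
edges = Counter()
for p in prescriptions:
    for a, b in combinations(sorted(set(p)), 2):
        edges[(a, b)] += 1

# Degree centrality: distinct co-prescribed partners, normalized by
# the number of other nodes in the network.
nodes = set(frequency)
degree = {n: 0 for n in nodes}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1
centrality = {n: d / (len(nodes) - 1) for n, d in degree.items()}

print(frequency.most_common(3))
print(max(centrality, key=centrality.get))
```

Community detection over the `edges` graph (the paper found 5 communities) would be layered on top of this co-occurrence structure.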

Binary classification on compositional data

  • Joo, Jae Yun;Lee, Seokho
    • Communications for Statistical Applications and Methods / v.28 no.1 / pp.89-97 / 2021
  • Due to boundedness and the sum constraint, compositional data are often transformed by a logratio transformation, and the transformed data are put into traditional binary classification or discriminant analysis. However, it may be problematic to directly apply traditional multivariate approaches to the transformed data because the class distributions are not Gaussian and the Bayes decision boundary is not polynomial on the transformed space. In this study, we propose applying flexible classification approaches to the transformed data for compositional data classification. Empirical studies using synthetic and real examples demonstrate that the flexible approaches outperform traditional multivariate classification or discriminant analysis.
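A common logratio choice for such preprocessing is the centered log-ratio (clr) transform; the abstract does not specify which logratio the authors used, so the clr below is only a representative sketch of the preprocessing step:

```python
import numpy as np

def clr(x):
    """Centered log-ratio transform: maps the simplex to unconstrained space."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

# A 3-part composition (proportions summing to 1).
comp = np.array([0.2, 0.3, 0.5])
z = clr(comp)

# The transformed coordinates are unbounded but still sum to zero,
# one reason Gaussian-based classifiers can misbehave on them.
print(z, z.sum())
```

A flexible (e.g. nonparametric) classifier would then be trained on `z` rather than on the raw proportions.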

A Comparison Study on Statistical Modeling Methods (통계모델링 방법의 비교 연구)

  • Noh, Yoojeong
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.5 / pp.645-652 / 2016
  • Statistical modeling of input random variables is necessary in reliability analysis, reliability-based design optimization, and the statistical validation and calibration of analysis models of mechanical systems. Statistical modeling methods include the Akaike Information Criterion (AIC), the corrected AIC (AICc), the Bayesian Information Criterion (BIC), Maximum Likelihood Estimation (MLE), and the Bayesian method. These methods select the best-fitting distribution among candidate models by calculating their likelihood function values from a given data set; some also consider the number of data points or parameters when identifying the distribution type. Engineers in the field, however, often have difficulty selecting a statistical modeling method for their experimental data because of a lack of familiarity with these methods. In this study, commonly used statistical modeling methods were compared using statistical simulation tests, and their advantages and disadvantages were analyzed. In the simulation tests, various types of distribution were assumed as populations, and samples of different sizes were generated randomly from them. Real engineering data were then used to verify each statistical modeling method.
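The AIC-style selection the abstract refers to scores each candidate by AIC = 2k - 2 ln L, where k is the number of parameters and L the maximized likelihood, and keeps the candidate with the smallest score. A minimal sketch with two candidates whose MLEs have closed forms (the data here are simulated, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=200)  # assumed population: normal

def aic_normal(x):
    # MLE for a normal distribution: sample mean and (biased) variance.
    mu, var = x.mean(), x.var()
    loglik = -0.5 * len(x) * (np.log(2 * np.pi * var) + 1)
    return 2 * 2 - 2 * loglik  # k = 2 parameters

def aic_exponential(x):
    # MLE for an exponential distribution: rate = 1 / mean (positive data).
    lam = 1.0 / x.mean()
    loglik = len(x) * np.log(lam) - lam * x.sum()
    return 2 * 1 - 2 * loglik  # k = 1 parameter

# The candidate with the smaller AIC is preferred.
best = min([("normal", aic_normal(data)),
            ("exponential", aic_exponential(data))],
           key=lambda t: t[1])
print(best[0])
```

AICc and BIC differ only in the penalty term (adding a small-sample correction or replacing 2k with k ln n, respectively).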

A Study on the Reliability of Observational Settlement Analysis Using Data Mining (데이터마이닝을 이용한 관측적 침하해석의 신뢰성 연구)

  • 우철웅;장병욱
    • Magazine of the Korean Society of Agricultural Engineers / v.45 no.6 / pp.183-193 / 2003
  • Most construction works on soft ground adopt instrumentation to manage the settlement and stability of the embankment. Rapid progress in information technology and digital data acquisition for soft ground instrumentation has led to a fast-growing amount of data. Although valuable information about the behaviour of the soft ground may be hidden in these data, most of them are used only for the management of settlement and stability. One of the critical issues in soft ground instrumentation is long-term settlement prediction, for which several observational settlement analysis methods are used; the reliability of their results, however, remains vague. Knowledge could be discovered from the large volume of accumulated experience with observational settlement analysis. In this article, we present a database for storing settlement records and a data mining procedure. A large volume of knowledge about observational settlement prediction was collected from the database by applying a filtering algorithm and a knowledge discovery algorithm. Statistical analysis revealed that the reliability of observational settlement analysis depends on the stay duration and the estimated degree of consolidation.
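One widely used observational settlement method is the hyperbolic method (the abstract does not name which methods the authors evaluated, so this is only a representative example). It assumes s(t) = t / (a + b t), so that t/s is linear in t and the ultimate settlement is 1/b. The observation values below are invented:

```python
import numpy as np

# Hypothetical settlement observations: elapsed time (days) and settlement (cm).
t = np.array([30, 60, 90, 120, 150, 180], dtype=float)
s = np.array([12.0, 19.0, 23.5, 26.5, 28.5, 30.0])

# Hyperbolic method: fit t/s = a + b*t by least squares.
# As t -> infinity, s(t) = t/(a + b*t) -> 1/b, the predicted ultimate settlement.
b, a = np.polyfit(t, t / s, 1)
s_ultimate = 1.0 / b

print(round(s_ultimate, 1))
```

Comparing such extrapolations against later measurements, over many cases stored in a database, is the kind of reliability assessment the paper mines for.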

A Collaborative Framework for Discovering the Organizational Structure of Social Networks Using NER Based on NLP (NLP기반 NER을 이용해 소셜 네트워크의 조직 구조 탐색을 위한 협력 프레임 워크)

  • Elijorde, Frank I.;Yang, Hyun-Ho;Lee, Jae-Wan
    • Journal of Internet Computing and Services / v.13 no.2 / pp.99-108 / 2012
  • Many methods have been developed to improve the accuracy of extracting information from vast amounts of data. This paper combines a number of natural language processing methods, such as NER (named entity recognition), sentence extraction, and part-of-speech tagging, to carry out text analysis. The data source comprises texts obtained from the web using a domain-specific data extraction agent. A framework for the extraction of information from unstructured data was developed using the aforementioned natural language processing methods. We simulated the performance of our work in the extraction and analysis of texts for the detection of organizational structures. The simulation shows that our approach outperformed other NER classifiers such as MUC and CoNLL on information extraction.
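The pipeline stages named here (sentence extraction, then entity recognition) can be illustrated with a toy rule-based sketch. This is not the paper's classifier: the text is invented, and a run of capitalized tokens is a crude stand-in for a trained NER model:

```python
import re

text = ("Frank Elijorde joined Kunsan National University in 2010. "
        "He collaborated with Hyun-Ho Yang on web data extraction.")

# Sentence extraction: naive split after sentence-final punctuation.
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

# Toy NER: treat runs of two or more capitalized tokens as entity
# candidates -- a rough placeholder for a real NER classifier.
candidates = []
for sent in sentences:
    for m in re.finditer(r"\b(?:[A-Z][\w-]*\s)+[A-Z][\w-]*\b", sent):
        candidates.append(m.group().strip())

print(sentences)
print(candidates)
```

A real framework would replace the regex stage with a statistical NER model and add part-of-speech tagging before linking entities into an organizational graph.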

A Statistical Analysis of Professional Baseball Team Data: The Case of the Lotte Giants

  • Cho, Young-Seuk;Han, Jun-Tae;Park, Chan-Keun;Heo, Tae-Young
    • The Korean Journal of Applied Statistics / v.23 no.6 / pp.1191-1199 / 2010
  • Knowing which aspects of a player's ability affect the outcome of a sports game is crucial: this knowledge helps determine the relative contribution of each team member and set appropriate annual salaries. This study uses statistical analysis to investigate how much the outcome of a professional baseball game is influenced by the records of individual players. We used the Lotte Giants' data on 252 games played between 2007 and 2008, including environmental data (home or away games and opponents) as well as pitchers' and batters' data. Using SAS Enterprise Miner, we performed a logistic regression analysis and a decision tree analysis on the data. The results obtained through the two analytic methods are compared and discussed.
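The logistic-regression half of such an analysis can be sketched in numpy (the paper itself used SAS Enterprise Miner; the game records below are fabricated and the features are placeholders for the real player statistics):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical game records: [home (0/1), runs scored, runs allowed].
n = 300
X = np.column_stack([
    rng.integers(0, 2, n),   # home or away
    rng.poisson(4.5, n),     # runs scored
    rng.poisson(4.2, n),     # runs allowed
]).astype(float)
y = (X[:, 1] > X[:, 2]).astype(float)  # win if more runs scored than allowed

# Standardize the features and prepend an intercept column.
Xs = (X - X.mean(0)) / (X.std(0) + 1e-9)
Xs = np.column_stack([np.ones(n), Xs])

# Plain gradient descent on the logistic log-loss.
w = np.zeros(Xs.shape[1])
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-Xs @ w))
    w -= 0.1 * Xs.T @ (p - y) / n

accuracy = ((1.0 / (1.0 + np.exp(-Xs @ w)) > 0.5) == y).mean()
print(round(accuracy, 2))
```

A decision tree fitted to the same matrix would give the second, rule-based view of the data that the study compares against.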

Exploratory Analysis of Gene Expression Data Using Biplot (행렬도를 이용한 유전자발현자료의 탐색적 분석)

  • Park, Mi-Ra
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.355-369 / 2005
  • Genome sequencing and microarray technology produce ever-increasing amounts of complex data that need statistical analysis. Visualization is an effective analytic technique that exploits the ability of the human brain to process large amounts of data. In this study, the biplot approach is applied to microarray data to examine the relationship between genes and samples. A supplementary data method for classifying a new sample into a known category is suggested. The methods are validated by applying them to well-known microarray data sets such as Golub et al. (1999), Alizadeh et al. (2000), and Ross et al. (2000), and the results are compared to those of several clustering methods. A modified graph combining a partitioning method with the biplot is also suggested.
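A biplot places row (gene) and column (sample) markers in one low-dimensional plot via a truncated SVD of the centered data matrix. A minimal sketch with a fabricated expression matrix (the factorization, not the paper's specific variant):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical expression matrix: 6 genes (rows) x 4 samples (columns).
Xraw = rng.normal(size=(6, 4))
X = Xraw - Xraw.mean(axis=0)           # column-center before the SVD

# Rank-2 biplot: X ~ G H', with gene and sample coordinate matrices.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
alpha = 1.0                            # 1.0 weights the rows (PCA-style biplot)
G = U[:, :2] * d[:2] ** alpha          # gene coordinates
H = Vt[:2].T * d[:2] ** (1 - alpha)    # sample coordinates

# G @ H' is the best rank-2 approximation of the centered matrix,
# so the plot's inner products approximate the data.
err = np.linalg.norm(X - G @ H.T) ** 2
print(err)
```

The supplementary-point idea amounts to projecting a new sample's (centered) profile onto the same `U[:, :2]` axes without refitting.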

A Comparative Study on Spatial Lattice Data Analysis - A Case Where Outlier Exists - (공간 격자데이터 분석에 대한 우위성 비교 연구 - 이상치가 존재하는 경우 -)

  • Kim, Su-Jung;Choi, Seung-Bae;Kang, Chang-Wan;Cho, Jang-Sik
    • Communications for Statistical Applications and Methods / v.17 no.2 / pp.193-204 / 2010
  • Recently, researchers in the various fields where spatial analysis is needed have become more interested in spatial statistics. For data with spatial correlation, methodologies accounting for the correlation are required, and methods for spatial data analysis have been developed accordingly. Lattice data, one type of spatial data, are analyzed in three steps: (1) definition of the spatial neighborhood, (2) definition of the spatial weights, and (3) analysis using spatial models. The present paper shows, using the trimmed mean squared error statistic, that a spatial statistical analysis method is superior to a general statistical method in terms of estimation when analyzing spatial lattice data that contain outliers. To demonstrate the validity and usefulness of this approach, we perform a small simulation study and present an empirical example with crime data from Busanjin-gu, Korea.
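The trimming idea behind the comparison statistic is that discarding the extreme fractions of a sample blunts the influence of outliers. A minimal sketch of a symmetric trimmed mean on invented lattice-cell values (this illustrates trimming in general, not the paper's exact trimmed mean squared error statistic):

```python
import numpy as np

def trimmed_mean(x, prop=0.1):
    """Mean after dropping the lowest and highest `prop` fraction of values."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(len(x) * prop)
    return x[k:len(x) - k].mean()

# Lattice-cell values with one gross outlier injected.
values = np.array([5.1, 4.8, 5.0, 5.3, 4.9, 5.2, 5.0, 4.7, 5.1, 90.0])

plain = values.mean()                      # dragged upward by the outlier
robust = trimmed_mean(values, prop=0.1)    # close to the bulk of the data
print(round(plain, 2), round(robust, 2))
```

Applying the same trimming to squared estimation errors gives a comparison criterion that is not dominated by a few outlying cells.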

Statistical Issues in Genomic Cohort Studies (유전체 코호트 연구의 주요 통계학적 과제)

  • Park, So-Hee
    • Journal of Preventive Medicine and Public Health / v.40 no.2 / pp.108-113 / 2007
  • When conducting large-scale cohort studies, numerous statistical issues arise across study design, data collection, data analysis, and interpretation. In genomic cohort studies, these statistical problems become more complicated and need to be dealt with carefully. Rapid technical advances in genomic studies produce enormous amounts of data, and traditional statistical methods are no longer sufficient to handle them. In this paper, we review several important statistical issues that occur frequently in large-scale genomic cohort studies, including measurement error and relevant correction methods, cost-efficient design strategies for main cohort and validation studies, inflated Type I error, gene-gene and gene-environment interactions, and time-varying hazard ratios. It is very important to employ appropriate statistical methods in order to make the best use of valuable cohort data and produce valid and reliable study results.

Web-based DNA Microarray Data Analysis Tool

  • Ryu, Ki-Hyun;Park, Hee-Chang
    • Journal of the Korean Data and Information Science Society / v.17 no.4 / pp.1161-1167 / 2006
  • Since microarray data structures are varied and complicated, the data are generally stored in databases so that they can be accessed and managed effectively. However, analyzing and managing the data becomes difficult when they are spread across several database management systems. Existing analysis tools for DNA microarray data suffer from complicated instructions, dependency on data types and operating systems, and high cost. In this paper, we design and implement a web-based analysis tool for obtaining useful information from DNA microarray data. With this tool, DNA microarray data can be analyzed effectively without special knowledge of, or training in, data types and analytical methods.
