• Title/Summary/Keyword: analysis data

Search Result 85,083, Processing Time 0.075 seconds

Complex Segregation Analysis of Categorical Traits in Farm Animals: Comparison of Linear and Threshold Models

  • Kadarmideen, Haja N.;Ilahi, H.
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.18 no.8
    • /
    • pp.1088-1097
    • /
    • 2005
  • Main objectives of this study were to investigate accuracy, bias and power of linear and threshold model segregation analysis methods for detection of major genes in categorical traits in farm animals. Maximum Likelihood Linear Model (MLLM), Bayesian Linear Model (BALM) and Bayesian Threshold Model (BATM) were applied to simulated data on normal, categorical and binary scales as well as to disease data in pigs. Simulated data on the underlying normally distributed liability (NDL) were used to create categorical and binary data. MLLM method was applied to data on all scales (Normal, categorical and binary) and BATM method was developed and applied only to binary data. The MLLM analyses underestimated parameters for binary as well as categorical traits compared to normal traits; with the bias being very severe for binary traits. The accuracy of major gene and polygene parameter estimates was also very low for binary data compared with those for categorical data; the later gave results similar to normal data. When disease incidence (on binary scale) is close to 50%, segregation analysis has more accuracy and lesser bias, compared to diseases with rare incidences. NDL data were always better than categorical data. Under the MLLM method, the test statistics for categorical and binary data were consistently unusually very high (while the opposite is expected due to loss of information in categorical data), indicating high false discovery rates of major genes if linear models are applied to categorical traits. With Bayesian segregation analysis, 95% highest probability density regions of major gene variances were checked if they included the value of zero (boundary parameter); by nature of this difference between likelihood and Bayesian approaches, the Bayesian methods are likely to be more reliable for categorical data. The BATM segregation analysis of binary data also showed a significant advantage over MLLM in terms of higher accuracy. Based on the results, threshold models are recommended when the trait distributions are discontinuous. Further, segregation analysis could be used in an initial scan of the data for evidence of major genes before embarking on molecular genome mapping.

Utilization of Social Media Analysis using Big Data (빅 데이터를 이용한 소셜 미디어 분석 기법의 활용)

  • Lee, Byoung-Yup;Lim, Jong-Tae;Yoo, Jaesoo
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.2
    • /
    • pp.211-219
    • /
    • 2013
  • The analysis method using Big Data has evolved based on the Big data Management Technology. There are quite a few researching institutions anticipating new era in data analysis using Big Data and IT vendors has been sided with them launching standardized technologies for Big Data management technologies. Big Data is also affected by improvements of IT gadgets IT environment. Foreran by social media, analyzing method of unstructured data is being developed focusing on diversity of analyzing method, anticipation and optimization. In the past, data analyzing methods were confined to the optimization of structured data through data mining, OLAP, statics analysis. This data analysis was solely used for decision making for Chief Officers. In the new era of data analysis, however, are evolutions in various aspects of technologies; the diversity in analyzing method using new paradigm and the new data analysis experts and so forth. In addition, new patterns of data analysis will be found with the development of high performance computing environment and Big Data management techniques. Accordingly, this paper is dedicated to define the possible analyzing method of social media using Big Data. this paper is proposed practical use analysis for social media analysis through data mining analysis methodology.

Multi-dimension Categorical Data with Bayesian Network (베이지안 네트워크를 이용한 다차원 범주형 분석)

  • Kim, Yong-Chul
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.11 no.2
    • /
    • pp.169-174
    • /
    • 2018
  • In general, the methods of the analysis of variance(ANOVA) for the continuous data and the chi-square test for the discrete data are used for statistical analysis of the effect and the association. In multidimensional data, analysis of hierarchical structure is required and statistical linear model is adopted. The structure of the linear model requires the normality of the data. A multidimensional categorical data analysis methods are used for causal relations, interactions, and correlation analysis. In this paper, Bayesian network model using probability distribution is proposed to reduce analysis procedure and analyze interactions and causal relationships in categorical data analysis.

A Study on Design of Real-time Big Data Collection and Analysis System based on OPC-UA for Smart Manufacturing of Machine Working

  • Kim, Jaepyo;Kim, Youngjoo;Kim, Seungcheon
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.13 no.4
    • /
    • pp.121-128
    • /
    • 2021
  • In order to design a real time big data collection and analysis system of manufacturing data in a smart factory, it is important to establish an appropriate wired/wireless communication system and protocol. This paper introduces the latest communication protocol, OPC-UA (Open Platform Communication Unified Architecture) based client/server function, applied user interface technology to configure a network for real-time data collection through IoT Integration. Then, Database is designed in MES (Manufacturing Execution System) based on the analysis table that reflects the user's requirements among the data extracted from the new cutting process automation process, bush inner diameter indentation measurement system and tool monitoring/inspection system. In summary, big data analysis system introduced in this paper performs SPC (statistical Process Control) analysis and visualization analysis with interface of OPC-UA-based wired/wireless communication. Through AI learning modeling with XGBoost (eXtream Gradient Boosting) and LR (Linear Regression) algorithm, quality and visualization analysis is carried out the storage and connection to the cloud.

Analysis of Market Trajectory Data using k-NN

  • Park, So-Hyun;Ihm, Sun-Young;Park, Young-Ho
    • Journal of Multimedia Information System
    • /
    • v.5 no.3
    • /
    • pp.195-200
    • /
    • 2018
  • Recently, as the sensor and big data analysis technology have been developed, there have been a lot of researches that analyze the purchase-related data such as the trajectory information and the stay time. Such purchase-related data is usefully used for the purchase pattern prediction and the purchase time prediction. Because it is difficult to find periodic patterns in large-scale human data, it is necessary to look at actual data sets, find various feature patterns, and then apply a machine learning algorithm appropriate to the pattern and purpose. Although existing papers have been used to analyze data using various machine learning methods, there is a lack of statistical analysis such as finding feature patterns before applying the machine learning algorithm. Therefore, we analyze the purchasing data of Songjeong Maeil Market, which is a data gathering place, and finds some characteristic patterns through statistical data analysis. Based on the results of 1, we derive meaningful conclusions by applying the machine learning algorithm and present future research directions. Through the data analysis, it was confirmed that the number of visits was different according to the regional characteristics around Songjeong Maeil Market, and the distribution of time spent by consumers could be grasped.

A Statistical Analysis of Professional Baseball Team Data: The Case of the Lotte Giants

  • Cho, Young-Seuk;Han, Jun-Tae;Park, Chan-Keun;Heo, Tae-Young
    • The Korean Journal of Applied Statistics
    • /
    • v.23 no.6
    • /
    • pp.1191-1199
    • /
    • 2010
  • Knowing what factors into a player's ability to affect the outcome of a sports game is crucial. This knowledge helps determine the relative degree of contribution by each team member as well as sets appropriate annual salaries. This study uses statistical analysis to investigate how much the outcome of a professional baseball game is influenced by the records of individual players. We used the Lotte Giants' data on 252 games played between 2007 and 2008 that included environmental data(home or away games and opponents) as well as pitchers' and batters' data. Using a SAS Enterprise Miner, we performed a logistic regression analysis and decision tree analysis on the data. The results obtained through the two analytic methods are compared and discussed.

A Study on the Reliability of Observational Settlement Analysis Using Data Mining (데이터마이닝을 이용한 관측적 침하해석의 신뢰성 연구)

  • 우철웅;장병욱
    • Magazine of the Korean Society of Agricultural Engineers
    • /
    • v.45 no.6
    • /
    • pp.183-193
    • /
    • 2003
  • Most construction works on the soft ground adopt instrumentation to manage settlement and stability of the embankment. The rapid progress of the information technologies and the digital data acquisition on the soft ground instrumentation has led to the fast-growing amount of data. Although valuable information about the behaviour of the soft ground may be hiding behind the data, most of the data are used restrictedly only for the management of settlement and stability. One of the critical issues on soft ground instrumentation is the long-term settlement prediction. Some observational settlement analysis methods are used for this purpose. But the reliability of the analysis results is remained in vague. The knowledge could be discovered from a large volume of experiences on the observational settlement analysis. In this article, we present a database to store settlement records and data mining procedure. A large volume of knowledge about observational settlement prediction were collected from the database by applying the filtering algorithm and knowledge discovery algorithm. Statistical analysis revealed that the reliability of observational settlement analysis depends on stay duration and estimated degree of consolidation.

Nonlinear Canonical Correlation Analysis for Paralysis Disease Data

  • Shin, Yang-Kyu
    • Journal of the Korean Data and Information Science Society
    • /
    • v.15 no.3
    • /
    • pp.515-521
    • /
    • 2004
  • Categorical data are mostly found in oriental medical research. The nonlinear canonical correlation analysis does not assume an interval level of measurement. In this paper, we apply nonlinear canonical correlation analysis to quantification and explain how similar sets of variables are to one another for paralysis disease data.

  • PDF

The analysis of NRT-SEL of the B747-400 Aircraft by FDR(Flight Data Recorder) (비행자료기록(FDR)에 의한 B747-400 항공기의 나리타-김포 비행 분석)

  • Shin, D.W.;Lee, K.C.;Lee, J.H.;Choi, H.S.;Song, B.H.
    • Journal of the Korean Society for Aviation and Aeronautics
    • /
    • v.9 no.2
    • /
    • pp.63-76
    • /
    • 2001
  • The number of civil aviation aircraft registered in Korea has continuously increased. Accordingly the accidents accompanying loss of human or economic resources occurred repeatedly. But the analysis of flight data recorded in FDR was not done by ourselves in the accident investigation because of lack of analysis technology. Therefore, this study is performed to secure the safety of civil aviation by establishing systematic analysis ability of Flight Data Recorder. Through this study, download flight data to personal computer, editing interface file to convert from binary data to engineering data, three dimensional graphical analysis and numerical analysis are performed. For the analysis, the flight of B747-400 model aircraft between Narita(Japan) and Gimpo(Korea) was selected.

  • PDF

Analysis of Hyperspectral Dentin Data Using Independent Component Analysis

  • Jung, Sung-Hwan
    • Journal of Korea Multimedia Society
    • /
    • v.12 no.12
    • /
    • pp.1755-1760
    • /
    • 2009
  • In this research, for the first time, we tried to analyse Raman hyperspectral dentin data using Independent Component Analysis (ICA) to see its possibility of adoption for the dental analysis software. We captured hyperspectral dentin data on 569 spots on a molar with dental lesion by HR800 Micro Raman Spectrometer at UMKC-CRISP (University of Missouri at Kansas City-Center for Research on Interfacial Structure and Properties). Each spot has 1,005 hyperspectral data. We applied ICA to the captured hyperspectral data of dentin for evaluating ICA approach, and compared it with the well known multivariate analysis method, PCA. As a result of the experiment, ICA approach shows better local characteristic of dentin than the result of PCA. We confirmed that ICA also could be a good method along with PCA in the dental analysis software.

  • PDF