• Title/Abstract/Keywords: data for analysis

A Study on the Big Data Analysis and Predictive Models for Quality Issues in Defense C5ISR

  • 허형조;고수진;백승현
    • 품질경영학회지 / Vol. 51, No. 4 / pp. 551-571 / 2023
  • Purpose: This study analyzes the cause-and-effect relationship between the quality failure rate and process variables in the C5ISR domain of the defense industry in order to propose useful improvements. Methods: Data collected through in-house systems were analyzed using big data analysis. Quality data and A/S history data were analyzed following the CRISP-DM (Cross-Industry Standard Process for Data Mining) process. Results: After evaluating candidate models for the influence of inspection data and A/S history data, logistic regression was selected as the final model because it performed relatively well compared with the decision tree (accuracy 82%/67%, AUC 0.66/0.57). Based on this model, the coefficients were estimated using the data analysis tool R, and a specific variable (continuous maximum discharge current time) was found to have a statistically significant effect on the A/S quality failure rate; the analysis indicated that 82% of the failure rate could be predicted. Conclusion: As the first case of applying big data analysis to quality issues in the defense industry, this study confirms that the market failure rates of defense products can be improved by focusing on the measured values of the main failure causes derived through the big data analysis process. It also identifies limitations, such as the number of data samples and constraints on data collection, to be addressed in subsequent studies for a more reliable analysis model.
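The abstract above compares candidate models by accuracy and AUC. As a minimal sketch of how these two metrics are computed, the following uses invented labels and scores (not data from the study) and a 0.5 decision threshold, which is an assumption; the study itself reports using R.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = failure) and predicted failure probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

print(accuracy(y_true, y_pred), auc(y_true, scores))
```

Accuracy depends on the chosen threshold, while AUC is computed from the ranking of scores alone, which is why the two metrics can rank models differently.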

The Characteristics of Tools for Big Data Analysis

  • 김도관;소순후
    • 한국정보통신학회:학술대회논문집 / 2016 Fall Conference of 한국정보통신학회 / pp. 114-116 / 2016
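  • Today, big data analysis is used as an important tool for tracking new customer needs. The various sites that provide big data analysis results present them in different forms depending on their service types and characteristics. Therefore, when big data analysis is used in marketing, the types and characteristics of the results provided by each site should be considered together. To that end, this study compares and analyzes the analysis results and result types of the sites that currently provide big data analysis services.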

The Development of the Data Acquisition & Analysis System for Multi-Function Radar

  • 송준호
    • 한국군사과학기술학회지 / Vol. 14, No. 1 / pp. 106-113 / 2011
  • This paper describes the Data Acquisition & Analysis System (DAS) for analysis of the multi-function radar. The device saves various kinds of information: beam probing data, clutter map data, plot data, target tracking data, RT tracking data, radar signal processing data, and interface data. The most important aspect of data analysis is that the researcher gets a view of the whole data. The DAS integrates all of the data and provides an overall picture of the moment at which a problem occurs, which makes the problem much easier to approach. The system algorithms of the multi-function radar were improved by using this advantage. As a result, the range blank region was reduced by about 72%, and the radar is able to keep track in a jammer environment.

TRAPR: R Package for Statistical Analysis and Visualization of RNA-Seq Data

  • Lim, Jae Hyun;Lee, Soo Youn;Kim, Ju Han
    • Genomics & Informatics / Vol. 15, No. 1 / pp. 51-53 / 2017
  • High-throughput transcriptome sequencing, also known as RNA sequencing (RNA-Seq), is a standard technology for measuring gene expression with unprecedented accuracy. Numerous Bioconductor packages have been developed for the statistical analysis of RNA-Seq data. However, these tools focus on specific aspects of the data analysis pipeline and are difficult to integrate with one another because of their disparate data structures and processing methods. They also lack visualization methods to confirm the integrity of the data and the process. In this paper, we propose an R-based RNA-Seq analysis pipeline called TRAPR, an integrated tool that facilitates the statistical analysis and visualization of RNA-Seq expression data. TRAPR provides various functions for data management, the filtering of low-quality data, normalization, transformation, statistical analysis, data visualization, and result visualization that allow researchers to build customized analysis pipelines.

Analysis of Impact Between Data Analysis Performance and Database

  • Kyoungju Min;Jeongyun Cho;Manho Jung;Hyangbae Lee
    • Journal of information and communication convergence engineering / Vol. 21, No. 3 / pp. 244-251 / 2023
  • Engineering or humanities data are stored in databases and are often used for search services. While the latest deep-learning technologies, such as BART and BERT, are utilized for data analysis, humanities data still rely on traditional databases. Representative analysis methods include n-gram and lexical statistical extraction. However, when a database is used, performance limitations are often imposed on the result calculations. This study presents an experimental process using MariaDB on a PC, which is easily accessible in a laboratory, to analyze the impact of the database on data analysis performance. The findings show that the database becomes a bottleneck when analyzing large-scale text data, particularly beyond hundreds of thousands of records. To address this issue, a method was proposed to provide real-time humanities data analysis web services by leveraging the open source database, with a focus on the Seungjeongwon-Ilgy, one of the largest datasets in the humanities field.
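The abstract above names n-gram extraction as a representative analysis method. As a rough illustration of that idea only (the function name, the choice of character-level n-grams, and the sample text are assumptions, not the paper's implementation), n-gram counting in memory can be sketched as:

```python
from collections import Counter

def ngram_counts(text, n=2):
    """Count character n-grams in a text after removing spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))

# Hypothetical sample text, not drawn from the dataset in the paper.
sample = "data analysis data"
counts = ngram_counts(sample, n=2)
print(counts.most_common(3))
```

Computing such counts record by record through database queries, rather than in memory as here, is the kind of workload where the abstract reports the database becoming a bottleneck.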

Complex Segregation Analysis of Categorical Traits in Farm Animals: Comparison of Linear and Threshold Models

  • Kadarmideen, Haja N.;Ilahi, H.
    • Asian-Australasian Journal of Animal Sciences / Vol. 18, No. 8 / pp. 1088-1097 / 2005
  • The main objectives of this study were to investigate the accuracy, bias and power of linear and threshold model segregation analysis methods for detection of major genes in categorical traits in farm animals. Maximum Likelihood Linear Model (MLLM), Bayesian Linear Model (BALM) and Bayesian Threshold Model (BATM) were applied to simulated data on normal, categorical and binary scales as well as to disease data in pigs. Simulated data on the underlying normally distributed liability (NDL) were used to create categorical and binary data. The MLLM method was applied to data on all scales (normal, categorical and binary), and the BATM method was developed and applied only to binary data. The MLLM analyses underestimated parameters for binary as well as categorical traits compared to normal traits, with the bias being very severe for binary traits. The accuracy of major gene and polygene parameter estimates was also very low for binary data compared with those for categorical data; the latter gave results similar to normal data. When disease incidence (on the binary scale) is close to 50%, segregation analysis has more accuracy and less bias than for diseases with rare incidence. NDL data were always better than categorical data. Under the MLLM method, the test statistics for categorical and binary data were consistently and unusually high (while the opposite is expected due to loss of information in categorical data), indicating high false discovery rates of major genes if linear models are applied to categorical traits. With Bayesian segregation analysis, the 95% highest probability density regions of major gene variances were checked to see whether they included the value of zero (boundary parameter); by nature of this difference between likelihood and Bayesian approaches, the Bayesian methods are likely to be more reliable for categorical data. The BATM segregation analysis of binary data also showed a significant advantage over MLLM in terms of higher accuracy. Based on the results, threshold models are recommended when the trait distributions are discontinuous. Further, segregation analysis could be used in an initial scan of the data for evidence of major genes before embarking on molecular genome mapping.

Structural health monitoring data reconstruction of a concrete cable-stayed bridge based on wavelet multi-resolution analysis and support vector machine

  • Ye, X.W.;Su, Y.H.;Xi, P.S.;Liu, H.
    • Computers and Concrete / Vol. 20, No. 5 / pp. 555-562 / 2017
  • The accuracy and integrity of stress data acquired by a bridge health monitoring system are of significant importance for bridge safety assessment. However, missing and abnormal data inevitably exist in a realistic monitoring system. This paper presents a data reconstruction approach for bridge health monitoring based on wavelet multi-resolution analysis and support vector machine (SVM). The proposed method has been applied to data imputation based on the data recorded by the structural health monitoring (SHM) system instrumented on a prestressed concrete cable-stayed bridge. The effectiveness and accuracy of the proposed wavelet-based SVM prediction method are examined by comparing its prediction errors with those of the traditional autoregressive moving average (ARMA) method and the SVM prediction method without wavelet multi-resolution analysis. The data reconstruction analysis based on 5-day and 1-day continuous stress history data with obvious preternatural signals is performed to examine the effect of sample size on the accuracy of data reconstruction. The results indicate that the proposed data reconstruction approach based on wavelet multi-resolution analysis and SVM is an effective tool for missing data imputation or preternatural signal replacement, which can serve as a solid foundation for accurately evaluating the safety of bridge structures.

Evaluation for usefulness of Chukwookee Data in Rainfall Frequency Analysis

  • 김기욱;유철상;박민규;김대하;박상형;김현준
    • 한국수자원학회:학술대회논문집 / Proceedings of the 2007 Conference of 한국수자원학회 / pp. 1526-1530 / 2007
  • In this study, the chukwookee data were evaluated by applying them to historical rainfall frequency analysis. To derive a two-parameter log-normal distribution from historical and modern data, censored-data MLE and binomial censored-data MLE were applied. As a result, both the mean and the standard deviation were estimated to be smaller with the chukwookee data than with the modern data alone. This indicates that large events occurred less often during the chukwookee period than during the modern period. The frequency analysis results based on the estimated parameters were also similar to those expected. Notably, the rainfall quantiles estimated by the two methods were similar, especially for the 99% threshold. This result indicates that historical document records such as the annals of the Chosun dynasty could be valuable and effective for frequency analysis. It also extends the data available for frequency analysis.

A Study on Patent Data Analysis and Competitive Advantage Strategy using TF-IDF and Network Analysis

  • 윤석용;한경석
    • 디지털콘텐츠학회 논문지 / Vol. 19, No. 3 / pp. 529-535 / 2018
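  • Data is growing explosively, yet many companies still use data analysis only for descriptive analysis or diagnostic analysis and do not make adequate use of it for predictive analysis or for analyzing corporate technology strategy. This study presents an analysis process that applies big data analysis techniques such as network analysis and TF-IDF to openly available patent data, covering structured data such as IPC codes, inventors, and filing dates as well as unstructured data such as claims, in order to identify competitors' acquired technologies, the distribution of their core technologies, and their overseas expansion strategies, and it seeks to demonstrate this process through data analysis.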
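The abstract above applies TF-IDF to unstructured patent text such as claims. As a minimal sketch of the weighting itself, the following assumes the common variant tf = term count / document length and idf = log(N / document frequency); the exact formula used in the study is not stated, and the tokenized claims below are invented for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, already-tokenized claim texts (not data from the study).
claims = [
    ["battery", "cell", "electrode"],
    ["battery", "charging", "circuit"],
    ["antenna", "signal", "circuit"],
]
scores = tf_idf(claims)
print(sorted(scores[0].items(), key=lambda kv: -kv[1]))
```

Terms that appear in only one document, such as "electrode" here, receive a higher weight than terms shared across documents, which is what makes the measure useful for surfacing technology keywords that distinguish one patent or applicant from another.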

A Big Data-Driven Business Data Analysis System: Applications of Artificial Intelligence Techniques in Problem Solving

  • Donggeun Kim;Sangjin Kim;Juyong Ko;Jai Woo Lee
    • 한국빅데이터학회지 / Vol. 8, No. 1 / pp. 35-47 / 2023
  • It is crucial to develop effective and efficient big data analytics methods for problem-solving in the field of business in order to improve the performance of data analytics and reduce costs and risks in the analysis of customer data. In this study, a big data-driven data analysis system using artificial intelligence techniques is designed to increase the accuracy of big data analytics along with the rapid growth of the field of data science. We present a key direction for big data analysis systems through missing value imputation, outlier detection, feature extraction, utilization of explainable artificial intelligence techniques, and exploratory data analysis. Our objective is not only to develop big data analysis techniques with complex structures of business data but also to bridge the gap between the theoretical ideas in artificial intelligence methods and the analysis of real-world data in the field of business.