• Title/Summary/Keyword: methods of data analysis

Analysis of Impact Between Data Analysis Performance and Database

  • Kyoungju Min;Jeongyun Cho;Manho Jung;Hyangbae Lee
    • Journal of information and communication convergence engineering, v.21 no.3, pp.244-251, 2023
  • Engineering or humanities data are stored in databases and are often used for search services. While the latest deep-learning technologies, such as BART and BERT, are used for data analysis, humanities data still rely on traditional databases. Representative analysis methods include n-gram and lexical statistical extraction. However, when a database is used, it often limits the performance of result calculations. This study presents an experimental process using MariaDB on a PC, a setup easily accessible in a laboratory, to analyze the impact of the database on data analysis performance. The findings show that the database becomes a bottleneck when analyzing large-scale text data, particularly beyond hundreds of thousands of records. To address this issue, the study proposes a method for providing real-time humanities data analysis web services by leveraging an open-source database, with a focus on the Seungjeongwon-Ilgy, one of the largest datasets in the humanities.
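
As a rough illustration of the n-gram extraction mentioned in this abstract, the following minimal sketch counts character n-grams over records held in memory. It is not the paper's implementation: the function names and the sample records are hypothetical, and no database is involved.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> list:
    """Return all overlapping character n-grams of `text`."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_frequencies(records: list, n: int) -> Counter:
    """Count n-grams across an in-memory list of text records."""
    counts = Counter()
    for record in records:
        counts.update(char_ngrams(record, n))
    return counts


# Hypothetical sample records, not taken from the paper's dataset.
records = ["abcab", "bcabc"]
freq = ngram_frequencies(records, 2)
print(freq.most_common())
```

Counting in application memory like this is one way to see why per-record work in a database could become the bottleneck at scale, although the abstract does not say how the authors' method is built.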

A Study on Work Development Direction of Cost Analysis through Cost Analysis of Micro Satellite (초소형위성 비용분석 사례연구를 통한 비용분석 업무발전 방향에 대한 고찰)

  • Lee, Tae Hwa
    • Journal of Korean Society for Quality Management, v.51 no.3, pp.461-479, 2023
  • Purpose: This study emphasizes the importance of cost analysis for weapons systems that require enormous development costs, analyzes the problems of the cost analysis steps from a practical point of view, and presents directions for developing cost analysis work in terms of reliability, timeliness, and efficiency. Methods: The R&D cost of micro satellites, which have a complex and large-scale cost structure, is analyzed according to engineering estimation procedures; the major problems at each analysis step are derived; and work development directions are presented. Results: Problems with standards and assumptions, data collection, the cost division structure, and cost estimation methods were identified through the micro satellite cost analysis process, and development directions such as expanding common standards, standardizing basic data, standardizing cost division structures and cost items, and turning data into assets were presented. Conclusion: To develop cost analysis work in terms of reliability, timeliness, and efficiency, it is important to prepare and standardize the standards and rules for detailed tasks at each analysis stage; through this, systematic cost data with high utilization value are expected to be accumulated as assets in the future.

Two-stage imputation method to handle missing data for categorical response variable

  • Jong-Min Kim;Kee-Jae Lee;Seung-Joo Lee
    • Communications for Statistical Applications and Methods, v.30 no.6, pp.577-587, 2023
  • Conventional categorical data imputation techniques, such as mode imputation, often encounter issues related to overestimation. If the variable has too many categories, the multinomial logistic regression imputation method may be infeasible because of computational limitations. To address these limitations, we propose a two-stage imputation method. In the first stage, we apply the Boruta variable selection method to the complete dataset to identify significant variables for the target categorical variable. In the second stage, we use the selected variables to impute missing data: logistic regression for binary variables, polytomous regression for categorical variables, and predictive mean matching for quantitative variables. Through analysis of asymmetric and non-normal data, both simulated and real, we demonstrate that the two-stage imputation method outperforms imputation methods lacking variable selection, as evidenced by accuracy measures. In the analysis of real survey data, we also demonstrate that the proposed two-stage imputation method surpasses the current imputation approach in terms of accuracy.
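
One possible reading of the overestimation issue with mode imputation is that filling every gap with the most frequent category inflates that category's share. The sketch below shows only this baseline behaviour on a hypothetical toy column. It does not implement the authors' two-stage method, and the function names are illustrative assumptions.

```python
from collections import Counter
from typing import List, Optional


def mode_impute(values: List[Optional[str]]) -> List[str]:
    """Replace every missing entry (None) with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mode from")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def share(values: List[str], category: str) -> float:
    """Fraction of entries equal to `category`."""
    return sum(v == category for v in values) / len(values)


# Hypothetical toy column: 3 x "A", 2 x "B", 5 missing.
column = ["A", "A", "A", "B", "B", None, None, None, None, None]
observed = [v for v in column if v is not None]
imputed = mode_impute(column)

print(share(observed, "A"))  # share of "A" among observed values
print(share(imputed, "A"))   # share of "A" after mode imputation
```

In this toy column the share of "A" rises from 0.6 among the observed values to 0.8 after imputation, the kind of distortion that model-based imputation aims to reduce.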

Literature Review on the Statistical Methods in KSQM for 50 Years (품질경영학회 50주년 특별호: 통계적 기법 분야 연구 리뷰)

  • Lim, Yong Bin;Kim, Sang Ik;Lee, Sang Bok;Jang, Dae Heung
    • Journal of Korean Society for Quality Management, v.44 no.2, pp.221-244, 2016
  • Purpose: This research reviews the papers in the area of statistical methods published in the Journal of the Korean Society for Quality Control (KSQC) and the Journal of the Korean Society for Quality Management (KSQM) since 1965. The literature review covers four fields of statistical methods, and the published articles are categorized into several sub-areas within each field. Methods: The reviewed articles are classified into four main categories: probability model and estimation, Bayesian analysis and non-parametric analysis, regression and time series analysis, and application of data analysis. We examine the contents and relationships of the published articles in the sub-areas of each category. Results: We summarize the reviewed papers in chronological road-maps for each sub-area and outline the relations among connected papers. Some comments on the contents and contributions of the reviewed papers are also provided. Conclusion: Over the past 50 years, research on applied statistical methods has addressed a variety of issues, and many valuable works have been produced in the theory and application of statistical methods for improving quality in the manufacturing and service industries. The contents of this review can also help explore future directions for research on statistical quality management methods.

Comparing Data Access Methods in Statistical Packages (통계 패키지에서의 데이터 접근 방식 비교)

  • Kang, Gun-Seog
    • Communications for Statistical Applications and Methods, v.16 no.3, pp.437-447, 2009
  • In addition to analyzing data with appropriate statistical methods, statistical analysts in industry now face the difficulty of composing datasets suited to their analysis objectives by extracting or generating data from diverse data storage devices. In this paper we survey and compare the state-of-the-art data access technologies adopted by several commonly used statistical packages. A better understanding of these technologies will help reduce the costs incurred when analyzing large datasets, especially in data mining work, and thus leave more time for applying statistical analysis methods.

A case study to Regression Analysis using Artificial Neural Network (인공신경망을 이용한 회귀분석 사례 조사)

  • Kim, Jie-Hyun;Ree, Sang-Bok
    • Proceedings of the Korean Society for Quality Management Conference, 2010.04a, pp.402-408, 2010
  • Forecasting has qualitative and quantitative methods. Quantitative methods analyze macro-economic factors such as the exchange rate, oil price, and interest rate, and also predict micro-economic factors such as sales and demand. The choice of statistical method depends on the type of data. When data have seasonality and trend, time series analysis is appropriate; when they have a causal relation, regression analysis is appropriate. Time series and regression analysis can also be used together. This study investigates artificial neural networks, a predictive technique for causal relations, and attempts to compare the forecasting accuracy of regression analysis and artificial neural networks.

A Study on One Factorial Longitudinal Data Analysis with Informative Drop-out

  • Lee, Ki-Hoon
    • Journal of the Korean Data and Information Science Society, v.17 no.4, pp.1053-1065, 2006
  • This paper proposes a method for one-way layouts for longitudinal data with informative drop-out. When drop-outs are informative, that is, correlated with the unobserved data and/or the previously observed data, simple imputation methods such as 'last observation carried forward' (LOCF) introduce bias into the testing models. A maximum likelihood procedure combined with a logit model for the drop-out process is proposed to test treatment effects in one-factorial designs and is compared with the LOCF method in two examples.
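
For readers unfamiliar with LOCF, the comparison baseline named in this abstract, a minimal sketch follows. The function name and the sample visit values are hypothetical, and the paper's maximum likelihood procedure is not reproduced here.

```python
from typing import List, Optional


def locf(series: List[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: fill each missing value (None)
    with the most recent observed value. Leading gaps stay missing."""
    filled = []
    last = None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical subject who drops out after the second visit.
visits = [10.0, 8.0, None, None]
print(locf(visits))  # [10.0, 8.0, 8.0, 8.0]
```

LOCF freezes the trajectory at the last observed value, so if drop-out is related to what the later values would have been, the filled-in series can misrepresent the treatment effect.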

Feasibility to Expand Complex Wards for Efficient Hospital Management and Quality Improvement

  • CHOI, Eun-Mee;JUNG, Yong-Sik;KWON, Lee-Seung;KO, Sang-Kyun;LEE, Jae-Young;KIM, Myeong-Jong
    • The Journal of Industrial Distribution & Business, v.11 no.12, pp.7-15, 2020
  • Purpose: This study aims to explore the feasibility of expanding complex wards at Gangneung Medical Center (GMC) to provide efficient hospital management and high-quality medical services to local residents. Research Design, Data and Methodology: Four research designs were used to achieve the research objectives. We analyzed three months of big data from Social Network Services (SNS). A questionnaire survey was conducted on 219 patients visiting the GMC. A survey of 20 GMC employees was also conducted. The feasibility of expanding the GMC ward was assessed through a Focus Group Interview with 12 internal and external experts. The data from the various surveys were analyzed with data mining techniques, frequency analysis, and Importance-Performance Analysis, and the IBM SPSS statistical package was used for data processing. Results: The big data analysis showed that recognition of the GMC on SNS is high. 95.9% of the residents and 100.0% of the employees indicated a need for the complex ward extension. In the analysis of expert opinion on the future functions of GMC, specialized care (△3.3) and public medicine (△1.4) increased significantly. Conclusion: GMC's complex ward extension is an urgent and indispensable project for efficient hospital management and service quality.

Aspect-based Sentiment Analysis of Product Reviews using Multi-agent Deep Reinforcement Learning

  • M. Sivakumar;Srinivasulu Reddy Uyyala
    • Asia pacific journal of information systems, v.32 no.2, pp.226-248, 2022
  • Existing models for sentiment analysis of product reviews learn from past data, and new data are labeled based on that training. However, the existing systems never use the new data for making decisions. The proposed Aspect-based multi-agent Deep Reinforcement learning Sentiment Analysis (ADRSA) model learns from its very first data without the help of any training dataset and labels a sentence with an aspect category and sentiment polarity. It keeps learning from new data and updates its knowledge to improve its intelligence. The decisions of the proposed system change over time based on the new data. As a result, the accuracy of sentiment analysis using deep reinforcement learning improved over supervised and unsupervised learning methods. Hence, the sentiments of premium customers on a particular site can be conveyed to other customers effectively. A dynamic environment with a strong knowledge base helps the system remember sentences, and using the State Action Reward State Action (SARSA) algorithm with the Bidirectional Encoder Representations from Transformers (BERT) model improved the accuracy of the proposed system compared with state-of-the-art methods.
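
The SARSA algorithm named in this abstract uses the standard on-policy update Q(s, a) ← Q(s, a) + α[r + γ·Q(s', a') − Q(s, a)]. The sketch below shows that tabular update only. The state and action labels are hypothetical placeholders, and how ADRSA defines states, actions, and rewards, or combines the update with BERT, is not described in the abstract.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Apply one tabular SARSA update in place and return the new Q(state, action)."""
    td_target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical state and action labels; unseen pairs default to 0.0.
q = defaultdict(float)
q[("s1", "a1")] = 1.0
new_value = sarsa_update(q, "s0", "a0", reward=1.0,
                         next_state="s1", next_action="a1",
                         alpha=0.5, gamma=0.5)
print(new_value)  # 0.75
```

Because the target uses the action actually chosen in the next state, the estimate tracks the behaviour policy as new data arrive, in line with the abstract's description of continued learning.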

THE STUDY OF SPATIAL AND TEMPORAL VARIABILITY OF THE KUROSHIO EXTENSION USING REMOTE SENSING DATA WITH APPLICATION OF DATA-FUSION METHODS

  • Kim Woo-Jin;Park Gil-Yong;Lim Se-Han;OH Im-Sang
    • Proceedings of the KSRS Conference, 2005.10a, pp.434-436, 2005
  • Analysis using remote sensing data is one of the effective ways to study the spatial and temporal variability of mesoscale oceanic motions. Over the past several decades, many researchers have obtained comprehensive results using remote sensing data with data fusion methods in many areas of geoscience. In this study, we integrated and fused several remote sensing datasets that differ in resolution, timescale, and characteristics to improve the accuracy of the analysis of the variation of the Kuroshio Extension. Furthermore, this may provide advanced ways to understand the variability of the Kuroshio Extension, which is closely related to the spatial and temporal variation of the Kuroshio and Oyashio Currents.
