• Title/Summary/Keyword: Big data Processing


A Study on the Improvement of Large-Volume Scalable Spatial Data for VWorld Desktop (브이월드 데스크톱을 위한 대용량 공간정보 데이터 지원 방안 연구)

  • Kang, Ji-Hun;Kim, Hyeon-Deok;Kim, Jung-Ok
    • Journal of Cadastre & Land InformatiX / v.45 no.1 / pp.169-179 / 2015
  • Recently, as the amount of data has increased rapidly, IT has entered the 'Big Data' era, in which large volumes of data are handled at once. In the spatial field, a spatial data service technology is required to make use of such large and varied data. In this study, we first describe representative spatial information data services abroad, and then develop large-volume KML data processing techniques that can be applied, in KML format, to VWorld desktop. A test was conducted using a large KML dataset in order to verify the developed KML partitioning methods and tools. As a result, an index file and the partitioned files are produced, and they were displayed in VWorld desktop.
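
A minimal sketch of the kind of KML partitioning described above, assuming a simple split by Placemark count plus a JSON index file; the chunk size, file naming, and index format are illustrative assumptions, not the paper's actual tool.

```python
# Hedged sketch: split a large KML file into chunk files plus an index file,
# loosely following the partitioning idea in the abstract. File names, chunk
# size, and the JSON index format are assumptions for illustration only.
import xml.etree.ElementTree as ET
import json

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)

def split_kml(path, chunk_size=1000, prefix="part"):
    tree = ET.parse(path)
    placemarks = list(tree.getroot().iter(f"{{{KML_NS}}}Placemark"))

    index = []
    for i in range(0, len(placemarks), chunk_size):
        chunk = placemarks[i:i + chunk_size]
        # Wrap each chunk in a minimal standalone KML document.
        kml = ET.Element(f"{{{KML_NS}}}kml")
        doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
        doc.extend(chunk)
        out_name = f"{prefix}_{i // chunk_size:04d}.kml"
        ET.ElementTree(kml).write(out_name, xml_declaration=True, encoding="utf-8")
        index.append({"file": out_name, "features": len(chunk)})

    # The index file lists the partitioned files so a viewer can load them on demand.
    with open(f"{prefix}_index.json", "w", encoding="utf-8") as f:
        json.dump(index, f, indent=2)

split_kml("large_data.kml")
```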

A Case Study of Basic Data Science Education using Public Big Data Collection and Spreadsheets for Teacher Education (교사교육을 위한 공공 빅데이터 수집 및 스프레드시트 활용 기초 데이터과학 교육 사례 연구)

  • Hur, Kyeong
    • Journal of The Korean Association of Information Education / v.25 no.3 / pp.459-469 / 2021
  • In this paper, a case of basic data science education for in-service and pre-service teachers was studied. For basic data science education, spreadsheet software was used as the data collection and analysis tool. Training then covered statistics for data processing, predictive hypotheses, and verification of prediction models. In addition, an educational case of collecting and processing thousands of public big data records and verifying population prediction hypotheses and prediction models was proposed. A 34-hour, 17-week curriculum using a spreadsheet tool was presented with this basic data science content. As a tool for data collection, processing, and analysis, a spreadsheet, unlike Python, carries no burden of learning a programming language and data structures, and has the advantage of teaching the theory of processing and analyzing qualitative and quantitative data visually. As a result of this educational case study, three predictive hypothesis test cases were presented and analyzed. First, quantitative public data were collected to verify a hypothesis predicting a difference in mean values between groups of the population. Second, qualitative public data were collected to verify a hypothesis predicting an association within the qualitative data of the population. Third, quantitative public data were collected to verify a regression prediction model under a hypothesis of correlation within the quantitative data of the population. Finally, the effectiveness of this educational case for data science education was analyzed through a satisfaction survey of pre-service and in-service teachers.
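
As a rough illustration of the three hypothesis tests named in the abstract, the sketch below runs a two-sample t-test, a chi-square test of independence, and a simple linear regression with SciPy on synthetic data; the paper itself uses spreadsheet functions, so the library, data values, and thresholds here are assumptions.

```python
# Hedged sketch: the three hypothesis tests described in the abstract
# (group mean difference, association between categorical variables,
# and regression/correlation), done with SciPy on synthetic data rather
# than the paper's spreadsheet workflow.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 1) Difference in group means (two-sample t-test).
group_a = rng.normal(50, 10, 200)
group_b = rng.normal(53, 10, 200)
t_stat, p_mean = stats.ttest_ind(group_a, group_b)

# 2) Association within qualitative data (chi-square test of independence).
contingency = np.array([[120, 80],
                        [ 90, 110]])
chi2, p_assoc, dof, expected = stats.chi2_contingency(contingency)

# 3) Correlation / regression prediction model (simple linear regression).
x = rng.uniform(0, 100, 300)
y = 2.5 * x + rng.normal(0, 20, 300)
reg = stats.linregress(x, y)

print(f"t-test p={p_mean:.4f}, chi2 p={p_assoc:.4f}, "
      f"slope={reg.slope:.2f}, r={reg.rvalue:.2f}")
```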

Analysis of Factors for Korean Women's Cancer Screening through Hadoop-Based Public Medical Information Big Data Analysis (Hadoop기반의 공개의료정보 빅 데이터 분석을 통한 한국여성암 검진 요인분석 서비스)

  • Park, Min-hee;Cho, Young-bok;Kim, So Young;Park, Jong-bae;Park, Jong-hyock
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.10 / pp.1277-1286 / 2018
  • In this paper, we provide an Apache Hadoop-based cloud environment for analyzing public medical information big data, with flexible scalability of computing resources. In practice, this includes the ability to quickly and flexibly extend storage, memory, and other resources as log data accumulates or grows over time. In addition, when real-time analysis of the accumulated unstructured log data is required, the system adopts a Hadoop-based analysis module to overcome the processing limits of existing analysis tools, providing fast and reliable parallel distributed processing of large amounts of log data. For the big data analysis, frequency analysis and chi-square tests were performed. In addition, multivariate logistic regression analysis at a significance level of 0.05 was performed on the significant variables (p<0.05), and the multivariate logistic regression analysis was carried out for each of the three models.
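
A minimal Hadoop Streaming style sketch of the frequency-analysis step mentioned above, written as a mapper/reducer pair in Python; the input layout (a comma-separated log with a screening-status column) and the single-file role switch are illustrative assumptions, and the chi-square and logistic regression steps are not shown.

```python
# Hedged sketch: a Hadoop Streaming style mapper/reducer pair for the
# frequency-analysis step described in the abstract. Run roughly as:
#   hadoop jar hadoop-streaming.jar -mapper "freq.py --map" \
#       -reducer "freq.py" -input /logs -output /freq
import sys

def mapper():
    """Emit '<column_value>\t1' for the field being counted."""
    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) > 3:                 # assumed: 4th column = screening status
            print(f"{fields[3]}\t1")

def reducer():
    """Sum counts per key; input arrives sorted by key from the Hadoop shuffle."""
    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{count}")
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")

if __name__ == "__main__":
    # In practice the mapper and reducer would be two separate scripts;
    # here a command-line flag selects the role for this single-file sketch.
    mapper() if "--map" in sys.argv else reducer()
```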

Optimization Model for the Mixing Ratio of Coatings Based on the Design of Experiments Using Big Data Analysis (빅데이터 분석을 활용한 실험계획법 기반의 코팅제 배합비율 최적화 모형)

  • Noh, Seong Yeo;Kim, Young-Jin
    • KIPS Transactions on Computer and Communication Systems / v.3 no.10 / pp.383-392 / 2014
  • Research on coatings is among the most popular and active areas in the polymer industry, and coatings are growing more important in the electronics, medical, and optical fields. In particular, technical requirements for the performance and accuracy of coatings are increasing with the development of automotive and electronic parts. In addition, the industry increasingly needs more intelligent and automated systems through the introduction of IoT and of big data analysis based on environmental and context information. In this paper, we propose an optimization model for the coating mixing ratio based on the design of experiments, using Internet of Things technology and big data analytics. The mixing ratio is first derived from a design of experiments and then corrected using data from the actual production site, operator adjustments for the observed errors, and the corrected result data. Rather than applying only the existing mixing ratio as the reference data, the optimization model further corrects the reference value by leveraging big data analysis and IoT technology, using manufacturing environment and context information, with respect to color and quality, the most important factors to maintain. Based on the data obtained from experiments and analysis, the model improves the accuracy of the mixing data and shortens the working hours per LOT. It also shortens production time by reducing the processing time per delivery, and can contribute to cost reduction and a lower defect rate. Furthermore, standard data for various models can be obtained in the manufacturing process.
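
A small sketch of a design-of-experiments workflow of the kind the abstract describes: fit a quadratic response surface to a few experimental runs and search it for the mixing ratio with the best predicted quality. The factors, run data, and model form are synthetic assumptions, not the paper's coating data.

```python
# Hedged sketch: a minimal DOE-style optimization of a two-component mixing
# ratio. The factors, response values, and quadratic response-surface model
# are synthetic illustrations, not the paper's formulation data.
import numpy as np

# Assumed factors: resin ratio x1 and hardener ratio x2 (coded -1..1),
# with a measured quality score for each experimental run.
runs = np.array([
    # x1,  x2, quality
    [-1, -1, 72.0],
    [-1,  1, 78.5],
    [ 1, -1, 80.2],
    [ 1,  1, 83.1],
    [ 0,  0, 86.0],
    [-1,  0, 77.0],
    [ 1,  0, 84.5],
    [ 0, -1, 79.3],
    [ 0,  1, 82.8],
])
x1, x2, y = runs[:, 0], runs[:, 1], runs[:, 2]

# Quadratic response surface: y ~ b0 + b1*x1 + b2*x2 + b3*x1*x2 + b4*x1^2 + b5*x2^2
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Grid-search the fitted surface for the ratio that maximizes predicted quality.
grid = np.linspace(-1, 1, 201)
g1, g2 = np.meshgrid(grid, grid)
pred = (coef[0] + coef[1]*g1 + coef[2]*g2 + coef[3]*g1*g2
        + coef[4]*g1**2 + coef[5]*g2**2)
best = np.unravel_index(np.argmax(pred), pred.shape)
print(f"best coded ratio: x1={g1[best]:.2f}, x2={g2[best]:.2f}, "
      f"predicted quality={pred[best]:.1f}")
```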

Parallelism point selection in nested parallelism situations with focus on the bandwidth selection problem (평활량 선택문제 측면에서 본 중첩병렬화 상황에서 병렬처리 포인트선택)

  • Cho, Gayoung;Noh, Hohsuk
    • The Korean Journal of Applied Statistics / v.31 no.3 / pp.383-396 / 2018
  • Various parallel processing R packages are used for the fast processing and analysis of big data. Parallel processing is applied when the work can be decomposed into tasks that are not interdependent. In some cases, each task decomposed for parallel processing can itself be decomposed into non-interdependent subtasks. In such nested parallelism situations, we must choose whether to parallelize the decomposed tasks at the first level or the subtasks at the second level. This choice has a significant impact on the speed of computation; consequently, it is important to understand the nature of the work and decide where to apply the parallel processing. In this paper, we provide an idea of how to apply parallel computing effectively to such problems by illustrating how to select the parallelism point for the bandwidth selection problem in nonparametric regression.
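
The sketch below illustrates the choice of parallelism point on a bandwidth-selection problem, using Python's concurrent.futures rather than the R packages discussed in the paper; the kernel regression, fold layout, and task sizes are simplified assumptions.

```python
# Hedged sketch: nested parallelism with two candidate parallelization points.
# Outer tasks = bandwidth candidates, inner subtasks = cross-validation folds;
# only one level is parallelized at a time.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def cv_error_one_fold(args):
    """Held-out error of a Nadaraya-Watson estimate for one bandwidth and one fold."""
    h, x, y, fold_idx = args
    mask = np.ones(len(x), dtype=bool)
    mask[fold_idx] = False
    w = np.exp(-0.5 * ((x[fold_idx, None] - x[None, mask]) / h) ** 2)
    pred = (w * y[None, mask]).sum(axis=1) / w.sum(axis=1)
    return np.mean((y[fold_idx] - pred) ** 2)

def cv_error(h, x, y, folds):
    """Cross-validation error for one bandwidth (inner loop runs serially)."""
    return np.mean([cv_error_one_fold((h, x, y, f)) for f in folds])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.sort(rng.uniform(0, 1, 400))
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 400)
    folds = np.array_split(np.arange(len(x)), 5)
    bandwidths = np.linspace(0.02, 0.3, 15)

    # Option 1: parallelize the outer loop (fewer, longer tasks; less overhead).
    with ProcessPoolExecutor() as pool:
        outer = list(pool.map(cv_error, bandwidths,
                              [x] * len(bandwidths), [y] * len(bandwidths),
                              [folds] * len(bandwidths)))

    # Option 2: parallelize the inner loop (many short tasks; more scheduling overhead).
    inner = []
    for h in bandwidths:
        with ProcessPoolExecutor() as pool:
            errs = list(pool.map(cv_error_one_fold,
                                 [(h, x, y, f) for f in folds]))
        inner.append(np.mean(errs))

    print("best bandwidth (outer-parallel run):", bandwidths[int(np.argmin(outer))])
```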

Learning Performance Analysis Using Deep Learning (딥러닝기법을 활용한 학습성과분석)

  • Oh, Jeong-Hoon;Yu, Heonchang
    • Annual Conference of KIPS / 2018.10a / pp.711-714 / 2018
  • The purpose of this study is to analyze the influence of learning activity logs in a learning management system (LMS) on learning outcomes and to develop a model for predicting them. As the research method, significant variables were first selected using correlation analysis, and a prediction model was then built with deep learning. The resulting model predicted learning outcomes with about 84% accuracy on the test data set. This study is expected to provide a new perspective on applying big data and artificial intelligence in online education environments.
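
A hedged sketch of the pipeline the abstract outlines: correlation-based selection of LMS activity variables followed by a small dense network. The feature names, correlation threshold, and architecture are assumptions, and the data is synthetic.

```python
# Hedged sketch: correlation-based variable selection on LMS activity features,
# then a small dense network to predict a pass/fail learning outcome. Features,
# thresholds, and architecture are assumptions; the data here is synthetic.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(42)
n = 1000
# Assumed LMS log features: logins, content views, forum posts, quiz attempts.
X = rng.poisson(lam=[20, 50, 5, 8], size=(n, 4)).astype("float32")
score = 0.02 * X[:, 0] + 0.01 * X[:, 1] + 0.05 * X[:, 2] + rng.normal(0, 0.3, n)
y = (score > np.median(score)).astype("float32")   # pass / fail label

# Step 1: keep features whose absolute correlation with the label is meaningful.
corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
X_sel = X[:, corr > 0.1]

# Step 2: simple deep learning classifier.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(X_sel.shape[1],)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_sel, y, epochs=20, batch_size=32, validation_split=0.2, verbose=0)
```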

Applied Method of BI Service for Enterprise Company in BigData Environment (빅데이터 기업의 효율적인 BI 서비스 적용 방안)

  • Joe, Dong-Wan;Shim, Jae-Sung;Park, Seok-Cheon
    • Annual Conference of KIPS / 2015.04a / pp.343-345 / 2015
  • As modern society has become more complex, the amount of information has increased, and as competition has intensified, companies have adopted BI services as a management strategy. However, existing BI companies have had problems such as the inability to perform real-time analysis and processing or to use external data. To address this, this paper analyzes BI service trends and technologies and studies ways to apply BI services efficiently in enterprises.

A Study on Performance Analysis of MRC Algorithm Using SQuAD (SQuAD를 활용한 MRC 알고리즘 성능 분석 연구)

  • Lim, Jong-Hyuk
    • Annual Conference of KIPS / 2018.05a / pp.431-432 / 2018
  • MRC (machine reading comprehension) aims to find the answer to a given question within the accompanying passage, using a model trained on a dataset of passage, question, and answer triples. Using the SQuAD dataset, which has recently been used as a benchmark for measuring the performance of MRC systems, we compare and analyze the performance of the match-LSTM and R-NET algorithms, both of which belong to the RNN family.
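
For context on how such systems are compared on SQuAD, the sketch below re-implements the exact-match and token-level F1 scores in simplified form; the prediction strings are placeholders, and this is not the official SQuAD evaluation script.

```python
# Hedged sketch: simplified exact-match and token-level F1 scores of the kind
# used to compare MRC systems on SQuAD, applied to placeholder predictions.
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, ground_truth):
    return float(normalize(prediction) == normalize(ground_truth))

def f1_score(prediction, ground_truth):
    pred_tokens = normalize(prediction).split()
    gt_tokens = normalize(ground_truth).split()
    common = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)

# Placeholder predictions from two hypothetical models on one question.
gold = "Denver Broncos"
for name, pred in [("match-LSTM", "the Denver Broncos"), ("R-NET", "Broncos")]:
    print(name, exact_match(pred, gold), round(f1_score(pred, gold), 3))
```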