• Title/Summary/Keyword: Big data Processing


Compression Conversion and Storing of Large RDF datasets based on MapReduce (맵리듀스 기반 대량 RDF 데이터셋 압축 변환 및 저장 방법)

  • Kim, InA;Lee, Kyong-Ha;Lee, Kyu-Chul
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.4 / pp.487-494 / 2022
  • With the recent demand for data-driven analysis, knowledge graphs, the data being analyzed, have grown steadily; a knowledge graph extracted from the web reaches about 82 billion edges. Many knowledge graphs are represented in the Resource Description Framework (RDF), the W3C standard for representing metadata about web resources. Because of the characteristics of RDF, existing RDF stores incur substantial processing-time overhead when converting and storing large amounts of RDF data. To address this limitation, this paper proposes a method that uses MapReduce to compress large RDF datasets by converting their terms into integer IDs, and then partitions the data vertically and stores it. The proposed method achieved speedups of up to 25.2 times over RDF-3X and up to 3.7 times over H2RDF+.
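  The two ideas in this abstract, dictionary-encoding RDF terms as integer IDs and vertically partitioning triples by predicate, can be sketched as a single-process simulation of map, shuffle, and reduce phases. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and sample triples are hypothetical, and a real distributed job would need a scheme (such as per-reducer ID offsets) to keep IDs unique across reducers instead of sorting every term in one place.

```python
from collections import defaultdict

# Hypothetical single-process simulation of the MapReduce steps:
# build a term dictionary, then encode triples and partition them by predicate.

def map_terms(triples):
    """Map phase: emit every RDF term (subject, predicate, object) as a key."""
    for s, p, o in triples:
        for term in (s, p, o):
            yield term, 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_to_ids(groups):
    """Reduce phase: give each distinct term a consecutive integer ID."""
    return {term: term_id for term_id, term in enumerate(sorted(groups))}


def encode_and_partition(triples):
    """Replace terms with IDs and split triples into per-predicate tables."""
    dictionary = reduce_to_ids(shuffle(map_terms(triples)))
    partitions = defaultdict(list)
    for s, p, o in triples:
        # Vertical partitioning: one (subject, object) table per predicate ID.
        partitions[dictionary[p]].append((dictionary[s], dictionary[o]))
    return dictionary, dict(partitions)


triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", '"Alice"'),
]
dictionary, partitions = encode_and_partition(triples)
print(partitions)  # {4: [(1, 2), (2, 3)], 5: [(1, 0)]}
```

  Storing each predicate as a narrow two-column table of integers is what makes vertical partitioning compact: repeated string terms are stored once in the dictionary, and queries on a single predicate touch only that predicate's table.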

An EDA Analysis of Seoul Metropolitan Area's Mountain Usage Patterns of Users in Their 20~30s after COVID-19 Occurrence

  • Lee, BoBae;Yeon, PoungSik
    • Journal of People, Plants, and Environment / v.24 no.2 / pp.229-244 / 2021
  • Background and objective: The purpose of this study was to comprehensively analyze user behavior in order to respond appropriately to the increasing demand for mountain use among people in their 20s and 30s and to allocate resources efficiently. Methods: To analyze the behavior of mountain hiking users, an exploratory data analysis (EDA) was conducted on data collected by the app Tranggle. The main targets were users in their 20s and 30s who visited mountains in the Seoul metropolitan area in 2019-2020. Among these, we selected data on the top 13 mountains by visit frequency. After data pre-processing, mountain usage patterns were analyzed through statistical analysis and visualization. Results: Compared to 2019, the number of users in 2020 increased 1.36 times. The use of well-established hiking trails also increased. Mountain use was still highest on weekends (Saturday > Sunday), and the difference in use across the days of the week decreased. Outside of work hours, early-morning use increased and night-time use decreased. There was no significant change in use by activity type, level (experience points), or exercise properties. Conclusion: Since the COVID-19 outbreak, mountain use has been shifting toward low user density and short-distance trips. In the post-COVID-19 era, the function and role of forests in daily life are expected to grow. To address this, further research should take wider demographic and social characteristics into account.
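  The EDA steps described in the abstract (selecting the most frequently visited mountains, comparing user counts between years, and tallying use by day of the week) can be sketched with the Python standard library. The record layout `(user_id, visit_date, mountain)` and the sample rows are hypothetical; they are not the Tranggle data schema, and the printed ratio comes from the toy data, not from the study.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical records: (user_id, visit_date, mountain); not the Tranggle schema.
records = [
    ("u1", date(2019, 6, 1), "Bukhansan"),
    ("u2", date(2020, 6, 6), "Bukhansan"),
    ("u3", date(2020, 6, 7), "Gwanaksan"),
    ("u1", date(2020, 6, 6), "Bukhansan"),
]


def top_mountains(records, n):
    """Select the n most frequently visited mountains."""
    counts = Counter(mountain for _, _, mountain in records)
    return [mountain for mountain, _ in counts.most_common(n)]


def users_per_year(records):
    """Count distinct users per calendar year."""
    users = defaultdict(set)
    for user, visit_date, _ in records:
        users[visit_date.year].add(user)
    return {year: len(ids) for year, ids in users.items()}


def weekday_counts(records):
    """Count visits by weekday (Monday=0 ... Sunday=6)."""
    return Counter(visit_date.weekday() for _, visit_date, _ in records)


top = top_mountains(records, 2)
selected = [r for r in records if r[2] in top]  # keep only the top mountains
yearly = users_per_year(selected)
print(yearly[2020] / yearly[2019])  # 3.0 on this toy data
print(weekday_counts(selected))
```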

A Study on the Architecture Design of Road and Facility Operation Management System for 3D Spatial Data Processing (3차원 공간데이터 처리를 위한 차로 및 시설물 운영 관리 시스템 아키텍처 설계 연구)

  • KIM, Duck-Ho;KIM, Sung-Jin;LEE, Jung-Uck
    • Journal of the Korean Association of Geographic Information Studies / v.24 no.4 / pp.136-147 / 2021
  • Autonomous driving technologies are advancing step by step through successive levels of driving automation. Operation management technology for the roads on which autonomous vehicles travel must develop in line with autonomous driving technology. However, road operation management currently relies only on two-dimensional information, which limits the systematic operation management and maintenance of lane and facility information. This study proposes an operation management system architecture capable of 3D spatial information-based operation management, built on a convergence database that can process real-time big data together with high-definition road map data. When a high-definition road map-based operation management system is used for lane and facility maintenance in the future, this architecture makes it possible to visualize and manage facilities, edit and analyze data from multiple users, link to various GIS software, and efficiently process large-scale real-time data.

Development of Online Fashion Thesaurus and Taxonomy for Text Mining (텍스트마이닝을 위한 패션 속성 분류체계 및 말뭉치 웹사전 구축)

  • Seyoon Jang;Ha Youn Kim;Songmee Kim;Woojin Choi;Jin Jeong;Yuri Lee
    • Journal of the Korean Society of Clothing and Textiles / v.46 no.6 / pp.1142-1160 / 2022
  • Text data plays a significant role in understanding and analyzing trends in the consumer, business, and social sectors. Text analysis requires a corpus that reflects specific domain knowledge; however, professional corpora are lacking in the field of fashion. This study aims to develop a taxonomy and thesaurus that reflect the specific characteristics of fashion products. To this end, about 100,000 fashion vocabulary terms were collected by crawling text data from WGSN, Pantone, and online platforms, and terms were then extracted through preprocessing in Python. The taxonomy comprises seven clothing attributes: items, silhouettes, details, styles, colors, textiles, and patterns/prints. The corpus was completed by processing synonyms of terms drawn from fashion books such as dictionaries. In the end, 10,294 vocabulary words, including 1,956 standard Korean words, were classified in the taxonomy. All data were then built into a web dictionary system. Quantitative and qualitative performance tests of the results were conducted through expert reviews. The performance of the thesaurus was also verified by comparing its text mining results with those obtained using a previously developed corpus. This study contributes to establishing a text data standard and enables meaningful text mining analysis in the fashion field.
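  A thesaurus of this kind is typically applied in two lookups: map a raw term to its standard form via the synonym table, then find which taxonomy attribute the standard term belongs to. The sketch below shows that flow with a tiny hypothetical synonym table and only three of the seven attributes; the entries are illustrative and are not taken from the authors' dictionary.

```python
# Hypothetical synonym table: variant -> standard term.
SYNONYMS = {
    "tee": "t-shirt",
    "tshirt": "t-shirt",
    "flowery": "floral",
    "navy blue": "navy",
}

# Hypothetical subset of the taxonomy (three of the seven attributes).
TAXONOMY = {
    "items": {"t-shirt", "dress"},
    "patterns/prints": {"floral", "stripe"},
    "colors": {"navy", "ivory"},
}


def normalize(term):
    """Lower-case and trim a raw term, then map it to its standard form."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


def classify(term):
    """Return (attribute, standard_term); attribute is None if unclassified."""
    standard = normalize(term)
    for attribute, terms in TAXONOMY.items():
        if standard in terms:
            return attribute, standard
    return None, standard


print(classify("Tee"))        # ('items', 't-shirt')
print(classify("Flowery "))   # ('patterns/prints', 'floral')
```

  Normalizing before classification is what lets synonym processing improve text mining: counts for "tee" and "t-shirt" accumulate under one standard term instead of being split.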

Analysis of interest in non-face-to-face medical counseling of modern people in the medical industry (의료 산업에 있어 현대인의 비대면 의학 상담에 대한 관심도 분석 기법)

  • Kang, Yooseong;Park, Jong Hoon;Oh, Hayoung;Lee, Se Uk
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.11 / pp.1571-1576 / 2022
  • This study aims to analyze modern people's interest in non-face-to-face medical counseling in the medical industry. Big data were collected from two social platforms: Knowledge iN (지식인), a platform where users can receive medical counseling from experts, and YouTube. A data set was built from each platform using eight search terms in total: the five most frequent telephone-counseling keywords ("internal medicine", "general medicine", "department of neurology", "department of mental health", and "pediatrics") plus "specialist", "medical counseling", and "health information". The crawled data then went through pre-processing steps such as morpheme analysis, disease extraction, and normalization. Based on word frequencies, the data were visualized as word clouds, line graphs, quarterly graphs, and bar graphs of disease frequency. A sentiment classification model was built for the YouTube data only, and the performance of GRU- and BERT-based models was compared.
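  The disease-extraction and normalization steps that feed the frequency-based graphs can be sketched as a dictionary lookup over tokens. This sketch assumes the text has already been split into morphemes by a morphological analyzer; the disease lexicon, normalization map, and sample documents are hypothetical English stand-ins for the Korean data.

```python
from collections import Counter

# Hypothetical lexicon and normalization map; tokens are assumed to be
# already split by a morphological analyzer.
DISEASE_LEXICON = {"migraine", "insomnia", "gastritis"}
NORMALIZE = {"migraines": "migraine", "sleeplessness": "insomnia"}


def disease_frequencies(documents):
    """Count disease mentions after mapping variants to canonical terms."""
    counts = Counter()
    for tokens in documents:
        for token in tokens:
            term = token.lower()
            term = NORMALIZE.get(term, term)  # map variants to a canonical form
            if term in DISEASE_LEXICON:       # keep only disease mentions
                counts[term] += 1
    return counts


docs = [
    ["I", "have", "migraines"],
    ["Insomnia", "and", "migraine"],
    ["sleeplessness"],
]
freq = disease_frequencies(docs)
print(freq.most_common(2))  # input for a bar graph of disease frequency
```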

A Study on the Health Index Based on Degradation Patterns in Time Series Data Using ProphetNet Model (ProphetNet 모델을 활용한 시계열 데이터의 열화 패턴 기반 Health Index 연구)

  • Sun-Ju Won;Yong Soo Kim
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.3 / pp.123-138 / 2023
  • The Fourth Industrial Revolution and advances in sensor technology have increased the use of sensor data. As data complexity rises and information technology (IT) changes rapidly, extracting valuable information has become crucial. Recurrent neural networks (RNN) and long short-term memory (LSTM) models have shown remarkable performance in natural language processing (NLP) and time series prediction. Consequently, there is a strong expectation that models that excel in NLP will also excel in time series prediction. However, current research on Transformer models for time series prediction remains limited, and traditional RNN and LSTM models have outperformed Transformers in big data analysis. Nevertheless, with continuous advances in Transformer models such as GPT-2 (Generative Pre-trained Transformer 2) and ProphetNet, these models have gained attention in time series prediction. This study evaluates the classification performance and interval prediction of remaining useful life (RUL) using an advanced Transformer model. The performance of each model is used to establish a health index (HI) for cutting blades, enabling real-time monitoring of machine health. The results are expected to provide valuable insights for machine monitoring, evaluation, and management, confirming the effectiveness of advanced Transformer models for time series analysis in industrial settings.
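  The abstract does not spell out how the health index is computed, so the sketch below uses one common construction as an assumption: min-max scaling a degradation signal that grows with wear, so that HI is 1 for a healthy blade and 0 at the most degraded observation. A second helper maps a predicted RUL to an interval class, which is one simple way to frame RUL interval prediction as classification; the bounds are hypothetical.

```python
def health_index(degradation):
    """Min-max scale a degradation signal to HI in [0, 1], 1 = healthy.

    Assumes the signal increases as the blade wears (an assumption,
    not necessarily the paper's HI definition).
    """
    lo, hi = min(degradation), max(degradation)
    if hi == lo:
        return [1.0] * len(degradation)  # no observed degradation
    return [1.0 - (x - lo) / (hi - lo) for x in degradation]


def rul_interval(rul, bounds):
    """Map a predicted RUL to an interval class given ascending upper bounds."""
    for label, upper in enumerate(bounds):
        if rul < upper:
            return label
    return len(bounds)  # beyond the last bound


print(health_index([2, 4, 6]))      # [1.0, 0.5, 0.0]
print(rul_interval(35, (10, 50)))   # 1
```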

Study on Plastics Detection Technique using Terra/ASTER Data

  • Syoji, Mizuhiko;Ohkawa, Kazumichi
    • Proceedings of the KSRS Conference / 2003.11a / pp.1460-1463 / 2003
  • In this study, a plastic detection technique was developed that applies remote sensing to extract plastic waste, one of the major causes of environmental destruction. Areas where plastic waste (including polypropylene and polyethylene) is prominent can be extracted from ASTER data by exploiting the absorption characteristics of plastics in the ASTER SWIR bands. The algorithm can be applied to identifying large industrial waste disposal sites and areas where plastic greenhouses are concentrated. However, detection with ASTER SWIR data still faces open problems, including partial deviation from the reference spectrum depending on the condition of the plastic waste, and detection errors in regions mixed with vegetation and water. Comparing several detection methods on plastic waste in different conditions yielded the following results: (a) the 'spectral extraction method', which uses a single plastic spectrum as its reference, was suitable for areas where plastic waste is separated from other objects, such as coasts where plastic waste has drifted ashore; (b) on the other hand, the 'spectral extraction method' was not suitable for sites where plastic waste is mixed with vegetation and soil. Comparing the processing results for a mixed area showed that combining the 'separation method' based on spectral unmixing with the 'spectral extraction method' under an NDVI mask is the most appropriate way to extract plastic waste. We also investigated reducing the influence of vegetation and water using ASTER TIR data and successfully extracted some sites containing plastics. In conclusion, we summarize the relationship between detection techniques and plastic waste conditions and propose the practical application of remote sensing to extracting plastic waste.
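  The NDVI-masked spectral extraction described above can be sketched per pixel: compute NDVI = (NIR − Red) / (NIR + Red), skip pixels above a vegetation threshold, and compare the remaining SWIR spectra against a plastic reference spectrum. The abstract does not name its similarity measure, so this sketch swaps in the spectral angle (the angle between the pixel and reference vectors) as an assumption; the thresholds and sample reflectances are hypothetical.

```python
import math


def ndvi(red, nir):
    """Normalized difference vegetation index."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def spectral_angle(pixel, reference):
    """Angle in radians between a pixel spectrum and a reference spectrum."""
    dot = sum(a * b for a, b in zip(pixel, reference))
    norm = math.sqrt(sum(a * a for a in pixel)) * math.sqrt(sum(b * b for b in reference))
    if norm == 0:
        return math.pi / 2  # treat an all-zero spectrum as dissimilar
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp rounding error


def detect_plastic(pixels, reference, ndvi_max=0.3, angle_max=0.1):
    """Return indices of pixels matching the plastic reference after NDVI masking.

    Each pixel is (red, nir, swir_spectrum); thresholds are hypothetical.
    """
    hits = []
    for i, (red, nir, swir) in enumerate(pixels):
        if ndvi(red, nir) > ndvi_max:
            continue  # masked out as vegetation
        if spectral_angle(swir, reference) <= angle_max:
            hits.append(i)
    return hits


reference = [0.4, 0.3, 0.35]  # hypothetical plastic SWIR reference spectrum
pixels = [
    (0.1, 0.5, [0.4, 0.3, 0.35]),   # vegetated: masked by NDVI
    (0.2, 0.25, [0.8, 0.6, 0.7]),   # same shape as reference, brighter
    (0.2, 0.25, [0.1, 0.9, 0.1]),   # different spectral shape
]
print(detect_plastic(pixels, reference))  # [1]
```

  Because the spectral angle ignores overall brightness, pixel 1 matches even though it is twice as bright as the reference; this is why angle-based matching suits separated plastic waste but, as the abstract notes, struggles where mixed pixels change the spectral shape itself.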


Intelligent Sensor Technology Trend for Smart IT Convergence Platform (스마트 IT 융합 플랫폼을 위한 지능형 센서 기술 동향)

  • Kim, H.J.;Jin, H.B.;Youm, W.S.;Kim, Y.G.;Park, K.H.
    • Electronics and Telecommunications Trends / v.34 no.5 / pp.14-25 / 2019
  • As the Internet of Things, artificial intelligence, and big data have received a lot of attention as key growth engines of the fourth industrial revolution, data acquisition and utilization in mobile, automotive, robotics, manufacturing, agriculture, health care, and national defense are becoming more important. Owing to numerous data-driven industrial changes, demand for sensor technologies is exploding, especially for intelligent sensor technologies that combine control, judgement, storage, and communication functions with the sensors' own functions. Intelligent sensor technology can be defined as a convergence component technology that combines intelligent sensor units, intelligent algorithms, modules with signal processing circuits, and integrated platform technologies. Intelligent sensor technology, which can be applied to a variety of smart IT convergence services such as smart devices, smart homes, smart cars, smart factories, and smart cities, is evolving toward intelligent convergence technologies that produce new high-value information through recognition, reasoning, and judgement based on artificial intelligence. As a result, development of intelligent sensor units is accelerating, with strategies for miniaturization, low power consumption, and convergence, new form factors such as flexible and stretchable designs, and integration of high-resolution sensor arrays. In the future, these intelligent sensor technologies will lead the rapidly growing sensor industry in the era of data-based artificial intelligence and will greatly contribute to enhancing the nation's competitiveness in the global sensor market. In this report, we analyze and summarize recent trends in intelligent sensor technologies, focusing on four core technologies.

Using Mobile Phone Data, Analyzing Floating Population Near University Areas in Daegu, South Korea, before and after Covid-19 - with a focus on Comparisons with Seoul (통신사 빅데이터를 활용한 코로나 전염병 전후 대구 대학가 유동인구 분석 - 서울과의 비교를 중심으로)

  • Kim, Jae-Hun;Son, Ji-Hoon;Park, Han-Woo
    • The Journal of the Korea Contents Association / v.22 no.3 / pp.62-70 / 2022
  • This study investigates the temporal structure and movement of the floating population near university areas in Daegu Metropolitan City, South Korea, before and after Covid-19. To determine Daegu's relative position, the study compares Daegu with Seoul. The floating population is used as an index of people's various activities in the local business district surrounding each university campus. The data were provided by mobile network operators and made available through a public website managed by a municipal authority. Several statistical and visualization techniques were applied after data pre-processing. The results show that the fluctuation patterns of the floating population in both cities were similar in the first half of 2019 and 2020. When the Covid-19 spread in Daegu stabilized in the second half of 2020, the floating population in Daegu increased slightly over the previous year, while the floating population in Seoul decreased due to the second wave of Covid-19.

A Study on Classification Models for Predicting Bankruptcy Based on XAI (XAI 기반 기업부도예측 분류모델 연구)

  • Jihong Kim;Nammee Moon
    • KIPS Transactions on Software and Data Engineering / v.12 no.8 / pp.333-340 / 2023
  • Efficient prediction of corporate bankruptcy is an important part of helping financial institutions make appropriate lending decisions and reduce loan default rates. Many studies have used classification models based on artificial intelligence. In the financial industry, however, even when a new predictive model performs well, its results should be accompanied by an intuitive explanation of how they were determined. The US, the EU, and South Korea have all recently introduced a right to request explanations of algorithmic decisions, so transparency in the use of AI in the financial sector must be ensured. In this paper, an interpretable AI-based classification model is proposed using publicly available corporate bankruptcy data. First, data preprocessing and 5-fold cross-validation were performed, and the classification performance of 10 optimized supervised learning models, including logistic regression, SVM, XGBoost, and LightGBM, was compared. LightGBM was confirmed as the best-performing model, and SHAP, an explainable AI technique, was applied to provide post hoc explanations of the bankruptcy prediction process.
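  The 5-fold cross-validation used to compare the classifiers can be sketched without external libraries: split the row indices into k contiguous folds, train on k − 1 folds, score on the held-out fold, and average. This is a minimal sketch: it does not shuffle or stratify (in practice a tool such as scikit-learn's `StratifiedKFold` would usually be preferred for imbalanced bankruptcy labels), and the majority-class "model" and toy labels are hypothetical placeholders for models like LightGBM.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_validate(X, y, fit, score, k=5):
    """Average the held-out score of a model over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / len(scores)


# Hypothetical baseline: always predict the most common training label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def accuracy(model, X, y):
    return sum(1 for label in y if label == model) / len(y)


X = [[i] for i in range(10)]   # toy features
y = [0] * 8 + [1] * 2          # toy labels: 1 = bankrupt
print(cross_validate(X, y, fit_majority, accuracy))  # 0.8
```

  Because the toy folds are not shuffled, both bankrupt cases land in the last fold; that fold scores 0.0, which illustrates why stratified splitting matters when positive cases are rare.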