• Title/Summary/Keyword: Big Data Processing Technology

Search Results: 385

A Study on Applying Novel Reverse N-Gram for Construction of Natural Language Processing Dictionary for Healthcare Big Data Analysis (헬스케어 분야 빅데이터 분석을 위한 개체명 사전구축에 새로운 역 N-Gram 적용 연구)

  • KyungHyun Lee;RackJune Baek;WooSu Kim
    • The Journal of the Convergence on Culture Technology
    • /
    • v.10 no.3
    • /
    • pp.391-396
    • /
    • 2024
  • This study proposes a novel reverse N-Gram approach to overcome the limitations of traditional N-Gram methods and to enhance performance in building an entity dictionary specialized for the healthcare sector. The proposed reverse N-Gram technique allows more precise analysis and processing of the complex linguistic features of healthcare-related big data. To verify the efficiency of the proposed method, big data on healthcare and digital health announced during the Consumer Electronics Show (CES), held each January, was collected. Using the Python programming language, 2,185 news titles and summaries published from January 1 to 31, 2010, and from January 1 to 31, 2024, were preprocessed with the new reverse N-Gram method, resulting in the stable construction of a dictionary for natural language processing in the healthcare field.
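The abstract does not spell out how the reverse N-Gram differs from the standard construction. One plausible reading, n-grams taken over the reversed token sequence, can be sketched as follows (the example title is invented):

```python
def ngrams(tokens, n):
    """Standard forward n-grams over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def reverse_ngrams(tokens, n):
    """N-grams generated over the reversed token sequence (right to left) --
    one possible interpretation of the paper's reverse N-Gram."""
    rev = tokens[::-1]
    return [tuple(rev[i:i + n]) for i in range(len(rev) - n + 1)]

title = "wearable digital health sensor platform".split()
print(ngrams(title, 2))
print(reverse_ngrams(title, 2))
```

Scanning right to left changes which word pairs are grouped first, which is presumably what lets the method pick up entity boundaries the forward scan misses.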

Visualizing Unstructured Data using a Big Data Analytical Tool R Language (빅데이터 분석 도구 R 언어를 이용한 비정형 데이터 시각화)

  • Nam, Soo-Tai;Chen, Jinhui;Shin, Seong-Yoon;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2021.05a
    • /
    • pp.151-154
    • /
    • 2021
  • Big data analysis is the process of discovering meaningful new correlations, patterns, and trends in large volumes of data stored in data stores and creating new value. Accordingly, most big data analysis methods draw on data mining, machine learning, natural language processing, and pattern recognition techniques from statistics and computer science. Using the R language, a big data analysis tool, analysis results can be expressed through various visualization functions applied to pre-processed text data. The data used in this study were 21 papers published in March 2021 in the journals of the Korea Institute of Information and Communication Engineering. In the final analysis, the most frequently mentioned keyword was "Data", which ranked first with 305 occurrences. Based on the results of the analysis, the limitations of the study and its theoretical implications are presented.
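The keyword-ranking step described above (the study itself used R) amounts to tokenizing the pre-processed text and counting term frequencies. A minimal Python sketch, with invented example abstracts and an assumed stopword list:

```python
import re
from collections import Counter

def keyword_frequencies(texts, stopwords=frozenset({"the", "of", "and", "a", "in"})):
    """Tokenize pre-processed texts and rank keywords by frequency."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[A-Za-z]+", text.lower()):
            if token not in stopwords:
                counts[token] += 1
    return counts.most_common()

abstracts = ["Big data analysis discovers patterns in data",
             "Data mining and machine learning on big data"]
print(keyword_frequencies(abstracts)[:3])
```

The ranked counts are then what a visualization layer (a word cloud or bar chart in R, for instance) would consume.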


Sequential Pattern Mining for Intrusion Detection System with Feature Selection on Big Data

  • Fidalcastro, A;Baburaj, E
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.10
    • /
    • pp.5023-5038
    • /
    • 2017
  • Big data is an emerging field that deals with data sets whose size exceeds the capacity of the software tools commonly used for data processing. In a large network, a huge amount of network information is generated, consisting of both normal and abnormal activity logs in large volumes of multi-dimensional data. An Intrusion Detection System (IDS) is required to monitor the network and detect malicious nodes and activities, but the massive amount of data makes threats and attacks difficult to detect. Sequential pattern mining, an increasingly popular technique because it considers the quantities, profits, and time order of items, can be used to identify patterns of malicious activity. Here we propose a sequential pattern mining algorithm with fuzzy-logic feature selection and fuzzy weighted support for huge volumes of network logs, implemented on Apache Hadoop YARN to address speed and time constraints. Fuzzy-logic feature selection selects the important features from the feature set, while fuzzy weighted support assigns weights to the inputs and avoids multiple scans. In our simulation we use attack logs from an NS-2 MANET environment and compare the proposed algorithm with the state-of-the-art sequential pattern mining algorithm SPADE and with a Support Vector Machine in a Hadoop environment.
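The abstract does not give the exact fuzzy weighted support formula, but the idea of scoring a sequential pattern by both how often it occurs and the fuzzy weights of its events can be illustrated as follows (the event weights and the mean-weight scaling are assumptions, not the paper's definition):

```python
def is_subsequence(pattern, sequence):
    """True if the pattern's events occur in order (not necessarily contiguously)."""
    it = iter(sequence)
    return all(event in it for event in pattern)

def weighted_support(pattern, sequences, weights):
    """Fraction of sequences containing the pattern, scaled by the mean
    fuzzy weight of the pattern's events (illustrative formula only)."""
    matches = sum(is_subsequence(pattern, s) for s in sequences)
    w = sum(weights.get(e, 0.0) for e in pattern) / len(pattern)
    return w * matches / len(sequences)

logs = [["login", "scan", "drop"], ["login", "drop"], ["scan", "login", "drop"]]
weights = {"login": 0.2, "scan": 0.9, "drop": 1.0}   # assumed fuzzy event weights
print(weighted_support(["scan", "drop"], logs, weights))
```

Weighting support this way lets rare-but-suspicious event types outrank frequent benign ones, which is the motivation the abstract gives for fuzzy weights.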

IoT Big Data Processing SW Using In-Memory DB Queue (In-memory DB queue를 이용한 IoT 빅데이터 처리 SW)

  • Kang, JeongHoon;Chae, Chulseoung;Kim, HyeongGoo;Min, Su-Yeong;Lee, Myeong-Su;Park, Bu-Sik;Lee, Sang-Yeop
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2019.10a
    • /
    • pp.612-614
    • /
    • 2019
  • This paper proposes a preprocessing framework technique that uses an in-memory database as the queue of high-speed IoT big data processing software, which otherwise suffers significant data loss.
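The paper's software is not described further, but the core idea, buffering incoming IoT records in an in-memory database acting as a FIFO queue, can be sketched with SQLite's in-memory mode (a stand-in, not the authors' implementation):

```python
import sqlite3

class InMemoryQueue:
    """FIFO queue backed by an in-memory SQLite database -- an illustrative
    stand-in for the in-memory DB queue described in the paper."""
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE q (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")

    def put(self, payload):
        """Enqueue one record (e.g., a raw IoT sensor reading)."""
        self.db.execute("INSERT INTO q (payload) VALUES (?)", (payload,))
        self.db.commit()

    def get(self):
        """Dequeue the oldest record, or None if the queue is empty."""
        row = self.db.execute(
            "SELECT id, payload FROM q ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        self.db.execute("DELETE FROM q WHERE id = ?", (row[0],))
        self.db.commit()
        return row[1]

q = InMemoryQueue()
q.put('{"sensor": "temp", "value": 21.3}')
q.put('{"sensor": "hum", "value": 40}')
print(q.get())
```

Because the queue lives in memory rather than on disk, a slow preprocessing stage can fall behind ingestion without the writes themselves becoming the bottleneck, which is presumably how the technique reduces loss.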

Big Data Analytics of Construction Safety Incidents Using Text Mining (텍스트 마이닝을 활용한 건설안전사고 빅데이터 분석)

  • Jeong Uk Seo;Chie Hoon Song
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.27 no.3
    • /
    • pp.581-590
    • /
    • 2024
  • This study aims to extract key topics through text mining of incident records (incident history, post-incident measures, preventive measures) from the construction safety accident case data available on the public data portal. It also seeks to provide fundamental insights for the establishment of disaster-prevention manuals by identifying correlations between these topics. After pre-processing the input data, we used an LDA-based topic modeling technique to derive the main topics, obtaining five topics related to incident history and four topics each for post-incident measures and preventive measures. Although no dominant patterns emerged from the topic pattern analysis, the study is significant in that it provides quantitative information on the follow-up actions tied to each incident history, suggesting practical implications for building a preventive decision-making system that links accident histories to the subsequent measures for recurrence prevention.
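The study's LDA implementation is not shown (it was presumably done with a standard library). To make the topic-derivation step concrete, here is a minimal pure-Python collapsed Gibbs sampler for LDA; the toy documents and hyperparameters are invented for illustration:

```python
import random
from collections import defaultdict

def lda_topics(docs, n_topics=2, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents;
    returns the top words of each derived topic."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})              # vocabulary size
    z = []                                             # topic of each token
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    for d, doc in enumerate(docs):                     # random initialization
        zs = []
        for w in doc:
            t = rng.randrange(n_topics)
            zs.append(t); ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
        z.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]                            # remove token's assignment
                ndk[d][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                # resample topic proportional to P(topic | everything else)
                probs = [(ndk[d][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                         for k in range(n_topics)]
                r = rng.uniform(0, sum(probs))
                for k, p in enumerate(probs):
                    r -= p
                    if r <= 0:
                        t = k
                        break
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return [sorted(nkw[k], key=nkw[k].get, reverse=True)[:3] for k in range(n_topics)]

docs = [["crane", "fall", "scaffold"], ["fall", "scaffold", "harness"],
        ["concrete", "pour", "mix"], ["concrete", "mix", "cure"]]
print(lda_topics(docs))
```

On real incident records, the pre-processing the abstract mentions (tokenization, stopword removal) happens before this step, and each returned word list is then labeled as one of the study's topics.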

Building an Analytical Platform of Big Data for Quality Inspection in the Dairy Industry: A Machine Learning Approach (유제품 산업의 품질검사를 위한 빅데이터 플랫폼 개발: 머신러닝 접근법)

  • Hwang, Hyunseok;Lee, Sangil;Kim, Sunghyun;Lee, Sangwon
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.125-140
    • /
    • 2018
  • As one of the processes in the manufacturing industry, quality inspection examines intermediate or final products to separate the good-quality goods that meet the quality management standard from the defective goods that do not. Manual quality inspection in a mass production system may result in low consistency and efficiency, so the quality inspection of mass-produced products involves automatic checking and classification by machines in many processes. Although there are many preceding studies on improving or optimizing processes using the data generated in production, actual implementation has faced many constraints due to the technical limitations of processing a large volume of data in real time. Recent research on big data has improved data processing technology and enabled collecting, processing, and analyzing process data in real time. This paper proposes the process and details of applying big data to quality inspection and examines the applicability of the proposed method to the dairy industry. We review previous studies and propose a big data analysis procedure applicable to the manufacturing sector. To assess the feasibility of the proposed method, we applied two methods, a convolutional neural network and a random forest, to one of the quality inspection processes in the dairy industry. We collected, processed, and analyzed images of caps and straws in real time, and then determined whether the products were defective. The results confirmed a drastic increase in classification accuracy compared to the quality inspection performed in the past.
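Of the two classifiers the paper applied, the random forest idea (bootstrap-resampled learners combined by majority vote) can be shown in miniature. This sketch uses single-threshold decision stumps instead of full trees and invented toy image features, so it illustrates the ensemble principle rather than the authors' model:

```python
import random

def train_stump(X, y):
    """Pick the single-feature threshold split with the best accuracy."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for sign in (1, -1):
                preds = [1 if sign * (row[f] - t) > 0 else 0 for row in X]
                acc = sum(p == label for p, label in zip(preds, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, f, t, sign)
    return best[1:]                       # (feature, threshold, sign)

def train_forest(X, y, n_trees=9, seed=1):
    """Train each stump on a bootstrap resample of the data."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        stumps.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def forest_predict(stumps, x):
    """Majority vote over the ensemble's individual predictions."""
    votes = sum(1 if s * (x[f] - t) > 0 else 0 for f, t, s in stumps)
    return 1 if votes * 2 >= len(stumps) else 0

# toy "cap image" features: [mean brightness, edge count]; label 1 = defective
X = [[0.2, 5], [0.3, 6], [0.8, 20], [0.9, 25]]
y = [0, 0, 1, 1]
model = train_forest(X, y)
print(forest_predict(model, [0.85, 22]))
```

A real deployment would extract such features from the cap and straw images (or, for the CNN path, feed the pixels in directly), but the bagging-and-voting structure is the same.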

An Efficient Algorithm for Big Data Prediction of Pipelining, Concurrency (PCP) and Parallelism based on TSK Fuzzy Model (TSK 퍼지 모델 이용한 효율적인 빅 데이터 PCP 예측 알고리즘)

  • Kim, Jang-Young
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.10
    • /
    • pp.2301-2306
    • /
    • 2015
  • As the information age accelerates, the time to address exabytes of data has come, and big data transfer technology is essential for processing large amounts of data. This paper aims to transfer big data under optimal conditions using a proposed algorithm that predicts the optimal combination of Pipelining, Concurrency, and Parallelism (PCP), the major tunable functions of GridFTP. In addition, the author introduces a simple design process for a Takagi-Sugeno-Kang (TSK) fuzzy model and designs a model for predicting transfer throughput from the optimal combination of pipelining, concurrency, and parallelism. The proposed algorithm is then evaluated against the TSK model to demonstrate its superiority.
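A first-order TSK fuzzy model of the kind the paper designs computes a firing-strength-weighted average of linear rule consequents. The sketch below is generic; the rule parameters and the concurrency-to-throughput mapping are invented, not taken from the paper:

```python
import math

def gaussian(x, c, s):
    """Gaussian membership degree of x for a fuzzy set centered at c with spread s."""
    return math.exp(-((x - c) ** 2) / (2 * s ** 2))

def tsk_predict(x, rules):
    """First-order TSK model: each rule is (center, spread, (a, b)) with
    consequent y = a*x + b; output is the firing-strength-weighted average."""
    num = den = 0.0
    for c, s, (a, b) in rules:
        w = gaussian(x, c, s)     # rule firing strength
        num += w * (a * x + b)
        den += w
    return num / den

# assumed toy rules mapping a concurrency level to predicted throughput (MB/s)
rules = [(2.0, 1.0, (10.0, 5.0)),   # low-concurrency regime
         (8.0, 2.0, (2.0, 60.0))]   # high-concurrency regime
print(tsk_predict(4.0, rules))
```

Inputs near a rule's center are governed mostly by that rule's linear consequent, so the model interpolates smoothly between transfer regimes.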

Analysis of the Influence Factors of Data Loading Performance Using Apache Sqoop (아파치 스쿱을 사용한 하둡의 데이터 적재 성능 영향 요인 분석)

  • Chen, Liu;Ko, Junghyun;Yeo, Jeongmo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.4 no.2
    • /
    • pp.77-82
    • /
    • 2015
  • Big data technology has attracted much attention for its fast data processing, and research is ongoing on applying it to process large-scale structured data from relational databases (RDB) much faster. Although there are many studies measuring and analyzing processing performance, studies on the performance of loading structured data, the step prior to analysis, are very rare. Thus, in this study, we measure the performance of loading structured data from an RDB into the distributed processing platform Hadoop using Apache Sqoop. To analyze the factors that influence data loading, the test is repeated with different loading options, and loading performance is compared across RDB-based servers. Although the data loading performance of Apache Sqoop in our test environment was low, much better performance can be expected in a large-scale Hadoop cluster environment with more hardware resources. This work is intended as a basis for studies on improving data loading performance and on analyzing the performance of every step of structured data processing on the Hadoop platform.

Real-time emotion analysis service with big data-based user face recognition (빅데이터 기반 사용자 얼굴인식을 통한 실시간 감성분석 서비스)

  • Kim, Jung-Ah;Park, Roy C.;Hwang, Gi-Hyun
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.18 no.2
    • /
    • pp.49-54
    • /
    • 2017
  • In this paper, we use a face database to detect human emotion in real time. Although human emotions are defined globally, actual emotional perception comes from the subjective judgment of the observer, so judging human emotions with computer image processing technology requires advanced techniques. To recognize an emotion, the human face must first be detected accurately, and the emotion must then be recognized from the detected face. In this paper, faces are detected and emotions recognized by matching the detected faces against the Cohn-Kanade Database, one of the standard face databases.


A Study on the Web Building Assistant System Using GUI Object Detection and Large Language Model (웹 구축 보조 시스템에 대한 GUI 객체 감지 및 대규모 언어 모델 활용 연구)

  • Hyun-Cheol Jang;Hyungkuk Jang
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2024.05a
    • /
    • pp.830-833
    • /
    • 2024
  • As Large Language Models (LLMs) like OpenAI's ChatGPT[1] continue to grow in popularity, new applications and services are expected to emerge. This paper presents an experimental study of a smart web-builder application assistance system that combines computer vision-based GUI object recognition with ChatGPT (an LLM). The research strategy employed computer vision technology in conjunction with the design strategy of Microsoft's "ChatGPT for Robotics: Design Principles and Model Abilities"[2]. The research also explores the capabilities of large language models like ChatGPT in various application design tasks, specifically in assisting with web-builder tasks, and examines ChatGPT's ability to synthesize code through both directed prompts and free-form conversation strategies. The researchers further explored ChatGPT's ability to perform various tasks within the builder domain, including function synthesis, closed-loop inference, and basic logical and mathematical reasoning. Overall, this research proposes an efficient way to perform various application system tasks by combining natural language commands with computer vision technology and an LLM (ChatGPT), allowing users to build applications through natural-language interaction.