• Title/Summary/Keyword: Big data Processing


Algorithm for Extract Region of Interest Using Fast Binary Image Processing (고속 이진화 영상처리를 이용한 관심영역 추출 알고리즘)

  • Cho, Young-bok;Woo, Sung-hee
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.4
    • /
    • pp.634-640
    • /
    • 2018
  • In this paper, we propose an algorithm for automatic extraction of regions of interest (ROI) from medical x-ray images. The proposed algorithm uses segmentation, feature extraction, and reference-image matching to detect lesion sites in the input image. The extracted region is matched against lesion images in a reference DB, and the matching results are refined automatically using Kalman-filter-based fitness feedback. To extract the growth plate, the algorithm extracts the contour of the hand from the left-hand x-ray input image and creates candidate regions using multi-scale Hessian-matrix-based segmentation. As a result, the proposed algorithm completed the ROI segmentation phase rapidly in 0.02 seconds, extracted the ROI from the segmented image in 0.53 seconds, and the reinforcement phase performed very accurate image segmentation in 0.49 seconds.
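As an illustration of the fast-binarization stage described above, here is a minimal Python sketch using OpenCV; Otsu thresholding, the minimum contour area, and the file name are assumptions for illustration, and the paper's Hessian-based candidate generation, reference-DB matching, and Kalman-filter feedback are not reproduced.

```python
# Minimal sketch of a fast-binarization ROI candidate stage (OpenCV assumed).
import cv2

def extract_roi_candidates(path, min_area=500):
    """Binarize an x-ray image and return bounding boxes of large contours."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Otsu thresholding gives a fast, parameter-free binarization.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Keep only contours large enough to be plausible ROI candidates.
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]

# Example usage (hypothetical file name):
# print(extract_roi_candidates("left_hand_xray.png"))
```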

An Analysis of Existing Studies on Parallel and Distributed Processing of the Rete Algorithm (Rete 알고리즘의 병렬 및 분산 처리에 관한 기존 연구 분석)

  • Kim, Jaehoon
    • The Journal of Korean Institute of Information Technology
    • /
    • v.17 no.7
    • /
    • pp.31-45
    • /
    • 2019
  • The core technologies for today's intelligent services are deep learning, that is, neural networks, and parallel and distributed processing technologies such as GPU parallel computing and big data. However, for future intelligent services and knowledge-sharing services based on globally shared ontologies, there is a technology better suited than neural networks for representing and reasoning over knowledge: the IF-THEN knowledge representation of RIF or SWRL, the standard rule languages of the Semantic Web, over which inference can be performed efficiently using the Rete algorithm. However, when the Rete algorithm runs on a single computer and the number of rules reaches 100,000, its performance degrades severely, taking several tens of minutes, which is an obvious limitation. Therefore, in this paper we analyze past and current studies on parallel and distributed processing of the Rete algorithm and examine which aspects should be considered to implement an efficient Rete algorithm.
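To make the scaling concern concrete, the following hypothetical Python sketch indexes rule conditions the way a Rete alpha memory would, so that asserting a fact only touches the rules that mention it instead of re-scanning the whole rule set; it is a simplification and omits the Rete beta (join) network.

```python
from collections import defaultdict

# Rules are IF-THEN pairs; conditions are (attribute, value) tuples.
rules = {
    "r1": {"cond": [("species", "bird"), ("state", "healthy")], "then": "can_fly"},
    "r2": {"cond": [("species", "penguin")], "then": "cannot_fly"},
}

# Alpha-memory analogue: index rules by the condition they test, so a new
# fact only activates rules that mention that exact (attribute, value).
alpha_index = defaultdict(list)
for name, rule in rules.items():
    for cond in rule["cond"]:
        alpha_index[cond].append(name)

matched = defaultdict(set)  # rule name -> conditions satisfied so far

def assert_fact(fact, facts):
    """Add a fact to working memory and fire rules whose conditions are all met."""
    facts.add(fact)
    fired = []
    for name in alpha_index.get(fact, []):
        matched[name].add(fact)
        if matched[name] == set(rules[name]["cond"]):
            fired.append(rules[name]["then"])
    return fired

facts = set()
assert_fact(("species", "bird"), facts)
print(assert_fact(("state", "healthy"), facts))  # -> ['can_fly']
```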

S-PARAFAC: Distributed Tensor Decomposition using Apache Spark (S-PARAFAC: 아파치 스파크를 이용한 분산 텐서 분해)

  • Yang, Hye-Kyung;Yong, Hwan-Seung
    • Journal of KIISE
    • /
    • v.45 no.3
    • /
    • pp.280-287
    • /
    • 2018
  • Recently, the use of recommendation systems and tensor data analysis, which handle high-dimensional data, has been increasing, as they allow us to analyze tensors and extract latent factors and patterns. However, due to the large size and complexity of tensors, tensor data must be decomposed before analysis. Several tools such as rTensor, pyTensor, and MATLAB are used for tensor decomposition, but because they run on a single machine they cannot handle large data. Distributed tensor decomposition tools based on Hadoop can handle scalable tensors, but their computing speed is too slow. In this paper, we propose S-PARAFAC, a tensor decomposition tool based on Apache Spark that runs in a distributed in-memory environment. We converted the PARAFAC algorithm into an Apache Spark version that enables rapid processing of tensor data and compared the performance of S-PARAFAC with a Hadoop-based tensor tool. The results show that S-PARAFAC is approximately 4 to 25 times faster than the Hadoop-based tool.
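For reference, a single-machine NumPy sketch of the PARAFAC (CP-ALS) updates that S-PARAFAC distributes with Spark is shown below; the rank, tensor sizes, and iteration count are arbitrary, and this is not the authors' Spark implementation.

```python
import numpy as np

def khatri_rao(a, b):
    """Column-wise Kronecker product of (I, R) and (J, R) -> (I*J, R)."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])

def unfold(x, mode):
    """Mode-n unfolding of a 3-way tensor into a matrix (Kolda convention)."""
    return np.reshape(np.moveaxis(x, mode, 0), (x.shape[mode], -1), order="F")

def cp_als(x, rank=3, iters=50, seed=0):
    """Alternating least squares for the 3-way PARAFAC/CP decomposition."""
    rng = np.random.default_rng(seed)
    a, b, c = (rng.standard_normal((x.shape[m], rank)) for m in range(3))
    for _ in range(iters):
        a = unfold(x, 0) @ khatri_rao(c, b) @ np.linalg.pinv((c.T @ c) * (b.T @ b))
        b = unfold(x, 1) @ khatri_rao(c, a) @ np.linalg.pinv((c.T @ c) * (a.T @ a))
        c = unfold(x, 2) @ khatri_rao(b, a) @ np.linalg.pinv((b.T @ b) * (a.T @ a))
    return a, b, c

# Quick check on a synthetic rank-3 tensor (sizes are arbitrary).
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((20, 3)), rng.random((15, 3)), rng.random((10, 3))
x = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
a, b, c = cp_als(x, rank=3)
print(np.linalg.norm(x - np.einsum("ir,jr,kr->ijk", a, b, c)) / np.linalg.norm(x))
```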

FRChain: A Blockchain-based Flow-Rules-oriented Data Forwarding Security Scheme in SDN

  • Lian, Weichen;Li, Zhaobin;Guo, Chao;Wei, Zhanzhen;Peng, Xingyuan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.1
    • /
    • pp.264-284
    • /
    • 2021
  • As the next-generation network architecture, software-defined networking (SDN) has great potential, but how to forward data packets safely remains a major challenge. In SDN, packets are forwarded according to flow rules that are created and delivered by the controller; once flow rules are modified, packets may be redirected or dropped. Based on related research, we believe that the key to forwarding data flows safely is maintaining the consistency of flow rules. However, existing solutions place little emphasis on the safety of flow rules. After summarizing the shortcomings of existing solutions, we propose FRChain to secure SDN data forwarding. FRChain is a novel scheme that uses blockchain to secure flow rules in SDN and to detect compromised nodes in the network when the proportion of malicious nodes is less than one third. The scheme places flow strategies into the blockchain in the form of transactions. Once an unmatched flow rule is detected, the system raises the issue by initiating a vote, and possible attacks are deduced from the results. To simulate the scheme, we use BigchainDB, which performs well in data processing, to handle transactions. The experimental results show that the scheme is feasible and that the additional overhead in network and system performance is lower than in similar solutions. Overall, FRChain can detect suspicious behavior and identify malicious nodes to maintain the consistency of flow rules in SDN.
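The core idea of recording flow rules as chained transactions and checking installed rules against the ledger can be sketched as follows; this hypothetical Python example stands in for BigchainDB and omits the paper's voting and attack-deduction mechanisms.

```python
import hashlib, json, time

def rule_digest(rule):
    """Deterministic hash of a flow rule (dict of match fields and actions)."""
    return hashlib.sha256(json.dumps(rule, sort_keys=True).encode()).hexdigest()

class FlowRuleLedger:
    """Append-only chain of flow-rule transactions (stand-in for BigchainDB)."""
    def __init__(self):
        self.chain = []

    def record(self, switch_id, rule):
        prev = self.chain[-1]["block_hash"] if self.chain else "0" * 64
        tx = {"switch": switch_id, "rule_hash": rule_digest(rule),
              "prev": prev, "ts": time.time()}
        tx["block_hash"] = hashlib.sha256(
            json.dumps(tx, sort_keys=True).encode()).hexdigest()
        self.chain.append(tx)

    def latest_hash(self, switch_id):
        for tx in reversed(self.chain):
            if tx["switch"] == switch_id:
                return tx["rule_hash"]
        return None

    def is_consistent(self, switch_id, installed_rule):
        """True if the rule installed on the switch matches the recorded one."""
        return self.latest_hash(switch_id) == rule_digest(installed_rule)

ledger = FlowRuleLedger()
rule = {"match": {"dst": "10.0.0.2"}, "action": "output:2"}
ledger.record("s1", rule)
tampered = {"match": {"dst": "10.0.0.2"}, "action": "drop"}
print(ledger.is_consistent("s1", rule), ledger.is_consistent("s1", tampered))  # True False
```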

Predicting the Future Price of Export Items in Trade Using a Deep Regression Model (딥러닝 기반 무역 수출 가격 예측 모델)

  • Kim, Ji Hun;Lee, Jee Hang
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.10
    • /
    • pp.427-436
    • /
    • 2022
  • The Korea Trade-Investment Promotion Agency (KOTRA) annually publishes trade data for South Korea under the guidance of the Ministry of Trade, Industry and Energy. The trade data usually contain gross domestic product (GDP), customs tariffs, business scores, and the prices of export items in the previous and current year, with respect to trading items and countries. However, it is challenging to extract meaningful insights for predicting the future prices of trading items each year, owing to the significantly large amount of data accumulated over several years and the limited human and computing resources. In this context, this paper proposes a multi-layer perceptron that can predict the future price of potential trading items in the next year by training on large amounts of past years' data at low computational and human cost.
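A hedged sketch of such a multi-layer perceptron regressor, using scikit-learn on synthetic stand-ins for the GDP, tariff, business-score, and prior-price features, is shown below; the feature columns, hyperparameters, and data are illustrative rather than those used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix: [GDP, customs tariff, business score, last year's price]
rng = np.random.default_rng(0)
X = rng.random((1000, 4))
# Synthetic target: next-year price loosely tied to last year's price and GDP.
y = 0.8 * X[:, 3] + 0.1 * X[:, 0] + rng.normal(0, 0.01, 1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```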

A Predictive System for Equipment Fault Diagnosis based on Machine Learning in Smart Factory (스마트 팩토리에서 머신 러닝 기반 설비 장애진단 예측 시스템)

  • Chow, Jaehyung;Lee, Jaeoh
    • KNOM Review
    • /
    • v.24 no.1
    • /
    • pp.13-19
    • /
    • 2021
  • Recently, there has been research in the industrial field on maximizing production by preventing failures and accidents in advance through fault diagnosis/prediction and factory automation. Cloud technology for accumulating large amounts of data, big data technology for data processing, and artificial intelligence (AI) technology for easy data analysis are promising candidate technologies for accomplishing this. In addition, with the development of fault diagnosis/prediction, equipment maintenance is evolving from Time-Based Maintenance (TBM), in which equipment is maintained at regular intervals, to TBM combined with Condition-Based Maintenance (CBM), in which maintenance is performed according to the condition of the equipment. For CBM-based maintenance, the condition of the facility must be defined and analyzed. Therefore, in this paper we propose a machine-learning-based system and data model for diagnosing equipment faults, and based on this we present a case of predicting fault occurrences in advance.
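As a rough illustration of CBM-style fault prediction, the following sketch trains a classifier on hypothetical equipment-condition features; the sensor features, labels, and model choice are assumptions, not the system proposed in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical condition features: [temperature, vibration, current, runtime_hours]
rng = np.random.default_rng(42)
X = rng.random((2000, 4))
# Synthetic fault label: overheating combined with high vibration tends to fail.
y = ((X[:, 0] > 0.8) & (X[:, 1] > 0.6)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)
clf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```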

Design of Customized Research Information Service Based on Prescriptive Analytics (처방적 분석 기반의 연구자 맞춤형 연구정보 서비스 설계)

  • Lee, Jeong-Won;Oh, Yong-Sun
    • Journal of Internet of Things and Convergence
    • /
    • v.8 no.3
    • /
    • pp.69-74
    • /
    • 2022
  • Among big-data-related analysis techniques, the prescriptive analytics methodology improves the performance of passive learning models by using active learning to secure high-quality training data. Prescriptive analytics is a performance-maximizing process that enhances machine learning models and optimizes systems through active learning to secure high-quality training data, and it efficiently constructs costly labeled (category) data. To expand the value of data, we collect information on research fields, research propensity, and research activity, and we design a customized research information service for researchers based on prescriptive analysis: predicting the situation at the time of execution after data pre-processing, deriving viable alternatives, and examining the validity of the alternatives as the situation changes.
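The active-learning step that prescriptive analytics relies on to secure high-quality training data can be illustrated with a small uncertainty-sampling loop; the dataset, model, query size, and number of rounds below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Synthetic pool of unlabeled items and a small labeled seed set (both classes).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
labeled = list(np.concatenate([np.where(y == 0)[0][:10], np.where(y == 1)[0][:10]]))
pool = [i for i in range(len(X)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for round_ in range(5):
    model.fit(X[labeled], y[labeled])
    # Uncertainty sampling: query the items the model is least sure about.
    proba = model.predict_proba(X[pool])
    uncertainty = 1 - proba.max(axis=1)
    query = [pool[i] for i in np.argsort(uncertainty)[-10:]]
    labeled += query                      # in practice, send these to an annotator
    pool = [i for i in pool if i not in query]
    print(f"round {round_}: labeled={len(labeled)}, acc={model.score(X, y):.3f}")
```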

Re-defining Named Entity Type for Personal Information De-identification and A Generation method of Training Data (개인정보 비식별화를 위한 개체명 유형 재정의와 학습데이터 생성 방법)

  • Choi, Jae-hoon;Cho, Sang-hyun;Kim, Min-ho;Kwon, Hyuk-chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.206-208
    • /
    • 2022
  • As the big data industry has developed significantly in recent years, interest in privacy violations caused by personal information leakage has increased. There have been attempts to automate de-identification through named-entity recognition in natural language processing. In this paper, named-entity recognition data are constructed semi-automatically by identifying sentences that contain de-identification targets in Korean Wikipedia. Compared to using general named-entity recognition data, this reduces the cost of learning about information that is not subject to de-identification. It also has the advantage of minimizing the additional rule-based and statistical systems needed to classify de-identification information in the output. The named-entity recognition data proposed in this paper are classified into twelve categories, including de-identification information such as medical records and family relationships. In experiments on the generated dataset, KoELECTRA achieved a performance of 0.87796 and RoBERTa a performance of 0.88.
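A minimal sketch of turning sentences with known entity spans into BIO-tagged training examples, as in semi-automatic data construction, is shown below; the tokenization, example sentence, and tag names are illustrative and not the paper's twelve-category tag set.

```python
def to_bio(tokens, entities):
    """Convert a tokenized sentence and (start, end, type) token spans to BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:          # end index is exclusive
        tags[start] = f"B-{etype}"
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"
    return list(zip(tokens, tags))

# Illustrative sentence and de-identification categories.
tokens = ["Kim", "Cheolsu", "was", "treated", "for", "diabetes", "in", "Busan", "."]
entities = [(0, 2, "PERSON"), (5, 6, "MEDICAL_RECORD"), (7, 8, "LOCATION")]
for token, tag in to_bio(tokens, entities):
    print(f"{token}\t{tag}")
```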


Infrastructure Anomaly Analysis for Data-center Failure Prevention: Based on RRCF and Prophet Ensemble Analysis (데이터센터 장애 예방을 위한 인프라 이상징후 분석: RRCF와 Prophet Ensemble 분석 기반)

  • Hyun-Jong Kim;Sung-Keun Kim;Byoung-Whan Chun;Kyong-Bog Jin;Seung-Jeong Yang
    • The Journal of Bigdata
    • /
    • v.7 no.1
    • /
    • pp.113-124
    • /
    • 2022
  • Various methods using machine learning and big data have been applied to prevent failures in data centers. However, approaches that reference performance indicators of individual pieces of equipment, or that do not consider the infrastructure operating environment, have many limitations in practical use. In this study, the performance indicators of individual infrastructure equipment are monitored in an integrated manner, and the indicators of various equipment are segmented and graded into a single numerical value. Data pre-processing is based on experience in infrastructure operation, and an ensemble of RRCF (Robust Random Cut Forest) analysis and a Prophet analysis model produced reliable results in detecting anomalies. A failure analysis system was implemented so that data center operators can use it easily. It can provide a preemptive response to data center failures and an appropriate tuning time.
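A hedged sketch of such an ensemble, assuming the open-source rrcf and prophet packages and a synthetic metric with one injected spike, is shown below; the window sizes, thresholds, and the "both detectors agree" rule are illustrative choices, not the authors' configuration.

```python
import numpy as np
import pandas as pd
import rrcf                      # pip install rrcf
from prophet import Prophet      # pip install prophet

# Hypothetical minute-level metric (e.g., a graded infrastructure indicator).
ds = pd.date_range("2022-01-01", periods=500, freq="min")
y = np.sin(np.arange(500) / 30) + np.random.default_rng(0).normal(0, 0.05, 500)
y[400] += 3.0                    # injected anomaly
df = pd.DataFrame({"ds": ds, "y": y})

# Prophet part: flag points falling outside the forecast interval.
m = Prophet(interval_width=0.99)
m.fit(df)
forecast = m.predict(df)
prophet_flag = (df["y"] < forecast["yhat_lower"]) | (df["y"] > forecast["yhat_upper"])

# RRCF part: streaming collusive displacement (CoDisp) over a sliding window.
num_trees, tree_size = 40, 256
forest = [rrcf.RCTree() for _ in range(num_trees)]
codisp = np.zeros(len(y))
for i, point in enumerate(y):
    for tree in forest:
        if len(tree.leaves) > tree_size:
            tree.forget_point(i - tree_size)
        tree.insert_point(point, index=i)
        codisp[i] += tree.codisp(i) / num_trees
rrcf_flag = codisp > np.percentile(codisp, 99)

# Simple ensemble: raise an alert only when both detectors agree.
print("anomalous indices:", np.where(prophet_flag.to_numpy() & rrcf_flag)[0])
```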

A Hybrid Multi-Level Feature Selection Framework for prediction of Chronic Disease

  • G.S. Raghavendra;Shanthi Mahesh;M.V.P. Chandrasekhara Rao
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.12
    • /
    • pp.101-106
    • /
    • 2023
  • Chronic illnesses are among the most common serious problems affecting human health. Early diagnosis of chronic diseases can help avoid or mitigate their consequences, potentially decreasing mortality rates. Using machine learning algorithms to identify risk factors is a promising strategy. The issue with existing feature selection approaches is that each method provides a distinct set of features that affect model accuracy, and current methods do not perform well on huge multidimensional datasets. We introduce a novel model containing a feature selection approach that selects optimal features from big multidimensional data sets to provide reliable predictions of chronic illnesses without sacrificing data uniqueness [1]. To ensure the success of the proposed model, we balanced the classes by applying hybrid balanced-class sampling methods to the original dataset, together with data pre-processing and data transformation methods, to provide credible data for the training model. We ran and assessed our model on datasets with binary and multivalued classifications, using multiple datasets (Parkinson's disease, arrhythmia, breast cancer, kidney disease, diabetes). Suitable features are selected using a hybrid feature model consisting of LassoCV, decision tree, random forest, gradient boosting, AdaBoost, and stochastic gradient descent, with voting over the attributes that these methods select in common. The accuracy on the original dataset before applying the framework was recorded and evaluated against the accuracy on the reduced set of attributes, and the results are shown separately for comparison. Based on the result analysis, we conclude that the proposed model produced higher accuracy on multivalued-class datasets than on binary-class datasets [1].
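The voting idea behind the hybrid feature model can be sketched as follows; the selector list mirrors the abstract, but the dataset, SelectFromModel thresholds, and majority-vote cutoff are illustrative assumptions.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LassoCV, SGDClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, GradientBoostingClassifier,
                              AdaBoostClassifier)
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)
selectors = [
    LassoCV(),
    DecisionTreeClassifier(random_state=0),
    RandomForestClassifier(random_state=0),
    GradientBoostingClassifier(random_state=0),
    AdaBoostClassifier(random_state=0),
    SGDClassifier(penalty="l1", random_state=0),
]

# Each selector votes for the features it deems important.
votes = Counter()
for est in selectors:
    sfm = SelectFromModel(est).fit(X, y)
    votes.update(np.where(sfm.get_support())[0])

# Keep features selected by at least half of the methods (cutoff is illustrative).
selected = sorted(f for f, v in votes.items() if v >= len(selectors) / 2)
print("selected feature indices:", selected)
```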