• Title/Summary/Keyword: Big Data Analysis Technique


Developing an Intrusion Detection Framework for High-Speed Big Data Networks: A Comprehensive Approach

  • Siddique, Kamran;Akhtar, Zahid;Khan, Muhammad Ashfaq;Jung, Yong-Hwan;Kim, Yangwoo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.8 / pp.4021-4037 / 2018
  • In network intrusion detection research, two characteristics are generally considered vital to building efficient intrusion detection systems (IDSs): an optimal feature selection technique and robust classification schemes. However, the emergence of sophisticated network attacks and the advent of big data concepts in intrusion detection domains require two more significant aspects to be addressed: employing an appropriate big data computing framework and utilizing a contemporary dataset to deal with ongoing advancements. As such, we present a comprehensive approach to building an efficient IDS with the aim of strengthening academic anomaly detection research in real-world operational environments. The proposed system has the following four characteristics: (i) it performs optimal feature selection using information gain and branch-and-bound algorithms; (ii) it employs machine learning techniques for classification, namely, Logistic Regression, Naïve Bayes, and Random Forest; (iii) it introduces bulk synchronous parallel processing to handle the computational requirements of large-scale networks; and (iv) it utilizes a real-time contemporary dataset generated by the Information Security Centre of Excellence at the University of New Brunswick (ISCX-UNB) to validate its efficacy. Experimental analysis shows the effectiveness of the proposed framework, which achieves high accuracy, low computational cost, and reduced false alarms.
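The first step of the framework, ranking features by information gain, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it covers only the information-gain ranking (the branch-and-bound search, the classifiers, and the bulk synchronous parallel processing are not shown), assumes discrete feature values, and uses hypothetical flow features (`syn_flag`, `protocol`) in place of the ISCX-UNB attributes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the label groups induced by each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(records, labels):
    """(feature, gain) pairs sorted from most to least informative."""
    gains = {
        name: information_gain([r[name] for r in records], labels)
        for name in records[0]
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical flow records; feature names are illustrative only.
    records = [
        {"protocol": "tcp", "syn_flag": 1},
        {"protocol": "udp", "syn_flag": 1},
        {"protocol": "tcp", "syn_flag": 0},
        {"protocol": "udp", "syn_flag": 0},
    ]
    labels = ["attack", "attack", "normal", "normal"]
    print(rank_features(records, labels))
```

Features whose gain falls below some cut-off would then be dropped before training the classifiers; the abstract does not state which cut-off was used.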

A Study on the Data Visualization for Real Time Power System Operation (실시간 전력계통 운영을 위한 데이터 시각화에 관한 연구)

  • Chog, Yoon-Sung;Joung, Jinyoung
    • The Transactions of The Korean Institute of Electrical Engineers / v.62 no.10 / pp.1361-1367 / 2013
  • This paper describes and proposes data visualization for real-time power system operation based on an energy management system. Because real-time power system operation involves analyzing vast amounts of online data, operators need intuitive data visualization to find useful information in this big data. In emergency situations especially, data visualization can help operators handle a crisis quickly and efficiently. This paper therefore aims to improve the output displays of real-time power system operation by visualizing online big data. Through this study, an improved visualization technique for real-time power system operation can be developed, offering highly readable output displays and intuitive information.

Firm Classification based on MBTI Organizational Character Type: Using Firm Review Big Data (MBTI 조직성격유형화에 따른 기업분류: 기업리뷰 빅데이터를 활용하여)

  • Lee, Hanjun;Shin, Dongwon;An, Byungdae
    • Asia-Pacific Journal of Business / v.12 no.3 / pp.361-378 / 2021
  • Purpose - The purpose of this study is to classify KOSPI-listed companies according to their MBTI-based organizational character type. Design/methodology/approach - This study collected 109,989 reviews from Jobplanet, an online firm review website. Using these reviews and descriptions of organizational character, we conducted a document similarity analysis, employing the Doc2Vec technique. Findings - First, more companies belong to Extraversion (E), Intuition (N), Feeling (F), and Judging (J) than to Introversion (I), Sensing (S), Thinking (T), and Perceiving (P) as MBTI organizational character types. Second, more companies have EJ and EP as the behavior type and NT and NF as the decision-making type. Third, among the 16 types, the three most common organizational character types are ENTJ, ENFP, and ENFJ. Finally, companies in the same industry group were found to have similar organizational character. Research implications or Originality - This study provides a novel way to measure organizational character type using firm review big data and a document similarity analysis technique. The results can be used in practice by firms for organizational diagnosis and management, and are meaningful as a basic study for future research that empirically analyzes the impact of organizational character.
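The abstract does not say how similarity scores were turned into a four-letter type. One plausible reading, sketched below purely as an assumption, compares a firm's review vector with the description vector of each pole of every MBTI dimension and keeps the closer pole. In practice the vectors would come from a trained Doc2Vec model (for example via `infer_vector` on gensim's `Doc2Vec`); here tiny hand-made vectors stand in for them, and `classify_firm` is a hypothetical helper.

```python
import math

# The four MBTI dimensions, each given as a pair of opposite poles.
DIMENSIONS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_firm(firm_vector, pole_vectors):
    """For each dimension, keep the pole whose description vector is closer
    to the firm's review vector, then join the letters into a type."""
    letters = []
    for a, b in DIMENSIONS:
        if cosine(firm_vector, pole_vectors[a]) >= cosine(firm_vector, pole_vectors[b]):
            letters.append(a)
        else:
            letters.append(b)
    return "".join(letters)


if __name__ == "__main__":
    # Toy 2-D stand-ins for Doc2Vec embeddings of the pole descriptions.
    poles = {"E": [1, 0], "I": [0, 1], "S": [0, 1], "N": [1, 0],
             "T": [0, 1], "F": [1, 0], "J": [1, 0], "P": [0, 1]}
    print(classify_firm([1, 0.1], poles))
```

Counting the resulting types across all firms would give the kind of type distribution reported in the findings.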

A Prediction System for Server Performance Management (서버 성능 관리를 위한 장애 예측 시스템)

  • Lim, Bock-Chool;Kim, Soon-Gohn
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.6 / pp.684-690 / 2018
  • In a society where big data, the analysis of collected information, is recognized as a core technology, society appears to be evolving into a more intelligent one through optimized value creation based on prediction techniques. Applying big data technologies to the varied and voluminous data generated during system operation makes it possible to support stable operation and to prevent faults and failures. In this paper, we propose an environment for collecting and analyzing big data, and derive a time series prediction model that predicts failures from the server performance monitoring data collected and analyzed in that environment. The failure prediction model can help server operators keep IT systems operating stably.
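The abstract does not name its time series model. As a stand-in, the sketch below uses Holt's linear (double) exponential smoothing to forecast a server metric a few steps ahead and raises a failure warning when the forecast crosses a threshold. The metric (CPU utilization in percent), the 90% threshold, and the smoothing constants are all hypothetical.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Holt's linear exponential smoothing; returns the forecast
    `horizon` steps after the last observation."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Initialise the level with the first value and the trend with the first difference.
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend


def predict_failure(cpu_usage, threshold=90.0, horizon=5, **params):
    """Return (forecast, alert); alert is True when the forecast
    utilization reaches the hypothetical failure threshold."""
    forecast = holt_forecast(cpu_usage, horizon=horizon, **params)
    return forecast, forecast >= threshold


if __name__ == "__main__":
    samples = [50 + 4 * i for i in range(10)]  # steadily rising CPU usage
    print(predict_failure(samples))
```

A real deployment would tune `alpha` and `beta` on historical monitoring data and might use a seasonal model if load follows daily cycles.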

An SNS and Web based BDAS design for On-Line Marketing Strategy (온라인 마케팅 전략을 위한 SNS와 Web기반 BDAS(Big data Data Analysis Scheme) 설계)

  • Jeong, Yi-Na;Lee, Byung-Kwan;Park, Seok-Gyu
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.1 / pp.141-148 / 2015
  • This paper proposes the design of BDAS (Big Data Analysis Scheme), which extracts information shared in real time on SNS and the Web, rapidly analyzes the extracted data on customers, and supports the efficient formulation of an online marketing strategy. First, BDAS collects the data shared on SNS and the Web. Second, it classifies the semantics of the collected data as positive or negative and visualizes the results. Because BDAS achieves an average accuracy of 90% in judging the semantics of shared SNS and Web data, it can judge customer propensity accurately and be used efficiently for online marketing strategy.
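The abstract reports 90% accuracy for positive/negative judgments but does not describe the classifier. A lexicon-based scorer is one simple way to make such judgments; the sketch below uses it as an assumed stand-in, with a small hypothetical English lexicon, and shows how an accuracy figure would be computed against hand-labeled posts.

```python
import string

# Hypothetical polarity lexicon. A production system would need a curated,
# domain-specific dictionary (and morpheme analysis for Korean text).
LEXICON = {
    "good": 1, "great": 2, "love": 2, "fast": 1,
    "bad": -1, "slow": -1, "terrible": -2, "refund": -1,
}


def polarity(text):
    """Label a post positive or negative by summing lexicon scores.

    A score of zero is treated as positive, since the scheme is binary."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    score = sum(LEXICON.get(w, 0) for w in words)
    return "positive" if score >= 0 else "negative"


def accuracy(texts, gold_labels):
    """Share of posts whose predicted polarity matches a hand-assigned label."""
    hits = sum(polarity(t) == g for t, g in zip(texts, gold_labels))
    return hits / len(texts)


if __name__ == "__main__":
    posts = ["I love it", "bad packaging", "slow but good"]
    gold = ["positive", "negative", "negative"]
    print([polarity(p) for p in posts], accuracy(posts, gold))
```

The tie rule matters: "slow but good" scores zero and is labeled positive, which is exactly the kind of case that pulls measured accuracy down.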

Big Data Analytics of Construction Safety Incidents Using Text Mining (텍스트 마이닝을 활용한 건설안전사고 빅데이터 분석)

  • Jeong Uk Seo;Chie Hoon Song
    • Journal of the Korean Society of Industry Convergence / v.27 no.3 / pp.581-590 / 2024
  • This study aims to extract key topics through text mining of incident records (incident history, post-incident measures, and preventive measures) from construction safety accident case data available on the public data portal. It also seeks to provide fundamental insights for establishing disaster prevention manuals by identifying correlations between these topics. After pre-processing the input data, we used LDA-based topic modeling to derive the main topics. As a result, we obtained five topics related to incident history and four topics each related to post-incident measures and preventive measures. Although no dominant patterns emerged from the topic pattern analysis, the study is significant in that it provides quantitative information on the follow-up actions associated with each incident history, suggesting practical implications for establishing a preventive decision-making system that links accident history to subsequent recurrence-prevention measures.
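One way to make the linkage between incident history and follow-up measures concrete, assumed here because the abstract gives no procedure, is to take each record's dominant topic in two fields (from topic models fitted separately to each field) and cross-tabulate them as conditional proportions. The topic distributions in the usage example are invented for illustration.

```python
from collections import Counter, defaultdict


def dominant_topic(distribution):
    """Index of the most probable topic in a per-document topic distribution."""
    return max(range(len(distribution)), key=lambda k: distribution[k])


def topic_linkage(history_dists, measure_dists):
    """Cross-tabulate each record's dominant incident-history topic against
    the dominant topic of its measure text, returning
    table[history_topic][measure_topic] = P(measure topic | history topic)."""
    pair_counts = Counter()
    history_counts = Counter()
    for h_dist, m_dist in zip(history_dists, measure_dists):
        h, m = dominant_topic(h_dist), dominant_topic(m_dist)
        pair_counts[(h, m)] += 1
        history_counts[h] += 1
    table = defaultdict(dict)
    for (h, m), count in pair_counts.items():
        table[h][m] = count / history_counts[h]
    return dict(table)


if __name__ == "__main__":
    # Invented distributions: three records, 3 history topics, 4 measure topics.
    history = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]
    measures = [[0.9, 0.1, 0.0, 0.0], [0.2, 0.7, 0.1, 0.0], [0.1, 0.1, 0.1, 0.7]]
    print(topic_linkage(history, measures))
```

A table in which each history topic spreads evenly over many measure topics would correspond to the "no dominant patterns" outcome the authors report.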

Application of Social Big Data Analysis for CosMedical Cosmetics Marketing : H Company Case Study (기능성 화장품 마케팅의 소셜 빅데이터 분석 활용 : H사 사례를 중심으로)

  • Hwang, Sin-Hae;Ku, Dong-Young;Kim, Jeoung-Kun
    • Journal of Digital Convergence / v.17 no.7 / pp.35-41 / 2019
  • This study aims to analyze the cosmedical cosmetics market and the nature of its customers through social big data analysis. More than 80,000 posts were analyzed using R. After data cleansing, keyword frequency analysis and association analysis were performed to understand customer needs and competitor positioning, and several implications were formulated for refining and implementing marketing strategy. The results show that "prevention" is a new and essential attribute for appealing to target customers. Expanding the product line for the gift market is also suggested. Products that can complement each other were shown to be highly correlated. In addition to traditional marketing techniques, evidence-based social big data analysis proved useful in deriving previously unidentified characteristics of the customers and the market. The Word2vec algorithm will be beneficial to find additional.
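The study ran its association analysis in R (packages such as arules provide this). The same measures, namely support, confidence, and lift for keyword pairs that co-occur in posts, can be sketched in Python as follows; the keywords and the minimum support are hypothetical, and each post is assumed to be reduced to a set of cleansed keywords.

```python
from collections import Counter
from itertools import combinations


def keyword_associations(posts, min_support=0.1):
    """Directed keyword rules x -> y with (x, y, support, confidence, lift),
    sorted by lift. `posts` is a list of keyword sets, one per post."""
    n = len(posts)
    item_counts = Counter(k for post in posts for k in set(post))
    pair_counts = Counter(
        pair for post in posts for pair in combinations(sorted(set(post)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of posts containing both keywords
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]          # P(y | x)
            lift = confidence / (item_counts[y] / n)     # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)


if __name__ == "__main__":
    posts = [{"cream", "gift"}, {"cream", "gift"}, {"serum"}, {"toner"}]
    for rule in keyword_associations(posts):
        print(rule)
```

A lift above 1 means two keywords appear together more often than chance would predict, which is how complementary products show up in this kind of analysis.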

Unstructured Data Processing Using Keyword-Based Topic-Oriented Analysis (키워드 기반 주제중심 분석을 이용한 비정형데이터 처리)

  • Ko, Myung-Sook
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.521-526 / 2017
  • Big data is diverse in format, vast in volume, and generated very quickly, so it requires new management and analysis methods rather than traditional data processing methods. Text mining techniques can be used to extract useful information from unstructured text written in human language in online documents on social networks. Identifying trends in the political, economic, and cultural messages left on social media helps reveal which topics people are interested in. In this study, text mining was performed on online news related to a given keyword using a topic-oriented analysis technique. We use Latent Dirichlet Allocation (LDA) to extract information from web documents and to analyze which subjects attract interest in connection with a given keyword and which topics are related to which core values.
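LDA itself can be sketched with collapsed Gibbs sampling, one standard way to fit the model. This is a generic textbook version, not the author's implementation (which likely relied on an existing library); the hyperparameters are illustrative, and the keyword filtering of news articles and Korean morphological preprocessing are assumed to have happened already, so `docs` holds token lists.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA. Returns (doc_topic, topic_word):
    doc_topic[d][k] estimates P(topic k | doc d) and topic_word[k] maps
    each word to P(word | topic k)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * num_topics for _ in docs]
    n_kw = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    n_k = [0] * num_topics
    # Assign every token a random initial topic.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional for the token's topic, up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + num_topics * alpha) for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """The n most probable words of each topic."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in topic_word]


if __name__ == "__main__":
    docs = [["election", "vote", "party", "vote"], ["party", "election", "policy"],
            ["stock", "market", "price", "market"], ["price", "stock", "trade"]]
    doc_topic, topic_word = lda_gibbs(docs, num_topics=2, iterations=50, seed=1)
    print(top_words(topic_word))
```

The top words per topic are what an analyst would read to name each topic and relate it to the given keyword.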

A Study on the Classification of Unstructured Data through Morpheme Analysis

  • Kim, SungJin;Choi, NakJin;Lee, JunDong
    • Journal of the Korea Society of Computer and Information / v.26 no.4 / pp.105-112 / 2021
  • In the era of big data, interest in data is exploding. In particular, the development of the Internet and social media has led to the creation of new data, enabling the era of big data and artificial intelligence and opening a new chapter in convergence technology. There is also strong demand for analyzing data that programs could not handle in the past. In this paper, an analysis model was designed and verified for classifying unstructured data, a task often required in the era of big data. We crawled thesis abstracts, main keywords, and sub-keywords from DBPia, built a database using KoNLP's data dictionary, and tokenized words through morpheme analysis. In addition, nouns were extracted using KAIST's 9-category part-of-speech classification system, TF-IDF values were generated, and an analysis dataset was created by combining the training data with the Y values (class labels). Finally, the adequacy of classification was measured by applying three analysis algorithms (random forest, SVM, and decision tree) to the generated analysis dataset. The proposed classification model can be applied usefully in various fields beyond thesis classification, such as civil complaint classification and other text-related analysis.
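The TF-IDF step can be sketched as below. The study used KoNLP in R with the KAIST tag set; this Python sketch assumes the nouns have already been extracted, and it uses one common weighting variant (term frequency normalized by document length, unsmoothed idf = log(N / df)), which may differ from the variant the authors used.

```python
import math
from collections import Counter


def tf_idf(docs):
    """TF-IDF weight dicts for tokenized documents (lists of nouns)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def to_matrix(weights):
    """Align the per-document dicts into a dense matrix over a shared vocabulary."""
    vocab = sorted({t for w in weights for t in w})
    return vocab, [[w.get(t, 0.0) for t in vocab] for w in weights]


if __name__ == "__main__":
    docs = [["data", "analysis", "model"], ["data", "classification"], ["data", "analysis"]]
    vocab, matrix = to_matrix(tf_idf(docs))
    print(vocab)
    for row in matrix:
        print(row)
```

Each matrix row, paired with its Y value, would form one training example for the random forest, SVM, and decision tree classifiers. Note that a term found in every document ("data" here) gets weight zero, so it cannot help separate classes.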

Speed-up of the Matrix Computation on the Ridge Regression

  • Lee, Woochan;Kim, Moonseong;Park, Jaeyoung
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.10 / pp.3482-3497 / 2021
  • Artificial intelligence has emerged as the core of the Fourth Industrial Revolution, making the processing of large amounts of data, through big data technology and rapid data analysis, inevitable. The most fundamental and universal data interpretation technique is analysis through regression, which is also the basis of machine learning. Ridge regression is a regression technique that reduces sensitivity to unusual or outlying data. However, the most time-consuming part of its matrix computation essentially involves an inverse matrix, and as the matrix grows, solving it becomes a major challenge. In this paper, a new algorithm is introduced that speeds up the calculation of the ridge regression estimator through series expansion and computation recycling, without computing an inverse matrix or using other factorization methods. In addition, the performance of the proposed algorithm and the existing algorithm was compared across matrix sizes. Overall, the proposed algorithm demonstrated excellent speed-up with good accuracy.
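The ridge estimator is β̂ = (XᵀX + λI)⁻¹Xᵀy. The abstract does not give the paper's exact series, so the sketch below uses a standard scaled Neumann series as an assumed stand-in: with M = XᵀX + λI and c = 1/trace(M), every eigenvalue of cM lies in (0, 1] when λ > 0, so M⁻¹b = c·Σₖ(I − cM)ᵏb converges. Each term is produced from the previous one by a single matrix-vector product, which illustrates the general idea of avoiding an explicit inverse and reusing earlier computation; it is not the authors' algorithm.

```python
def matvec(M, v):
    """Dense matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def ridge_neumann(X, y, lam, tol=1e-12, max_terms=100_000):
    """Ridge coefficients via a scaled Neumann series, with no matrix inverse
    or factorization. Requires lam > 0 (or X with full column rank)."""
    p = len(X[0])
    # M = X^T X + lam * I and b = X^T y.
    M = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    c = 1.0 / sum(M[i][i] for i in range(p))  # 1 / trace(M)
    term = [c * bi for bi in b]               # k = 0 term: c * b
    beta = term[:]
    for _ in range(max_terms):
        # Next term = (I - c*M) @ previous term, reusing the previous result.
        Mt = matvec(M, term)
        term = [t - c * m for t, m in zip(term, Mt)]
        beta = [bt + t for bt, t in zip(beta, term)]
        if max(abs(t) for t in term) < tol:
            break
    return beta


if __name__ == "__main__":
    X = [[1, 0], [0, 1], [1, 1]]
    y = [1, 2, 3]
    print(ridge_neumann(X, y, lam=1.0))
```

Convergence slows as the smallest eigenvalue of M shrinks relative to its trace, so a larger λ (or better-conditioned data) means fewer series terms.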