• Title/Summary/Keyword: Log preprocessing

Search results: 28

Framework for Efficient Web Page Prediction using Deep Learning

  • Kim, Kyung-Chang
    • Journal of the Korea Society of Computer and Information, v.25 no.12, pp.165-172, 2020
  • Recently, owing to the exponential growth of information accessed on the web, predicting a user's next web page has become increasingly important. Deep learning is one method that can be used for this prediction. To predict the next web page, web logs are first analyzed through data preprocessing, and a deep learning algorithm then predicts the user's next web page from the preprocessed logs. In this paper, we propose a framework for web page prediction that combines web log preprocessing methods with deep learning techniques for prediction. To speed up the preprocessing of large web logs, a Hadoop-based MapReduce programming model is used. In addition, we present a web page prediction system that applies an efficient deep learning technique to the output of web log preprocessing for training and prediction. Through experiments, we show the performance improvement of our proposed method over traditional methods, as well as the accuracy of its predictions.
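The map, shuffle, and reduce steps behind this kind of log preprocessing can be sketched in plain Python. This is a minimal single-process illustration, not the authors' Hadoop implementation: the log format (`<user_id> <timestamp> <url>`), the function names, and the fixed-length history `window` are assumptions. A real job would run the mapper and reducer on a Hadoop cluster (for example via Hadoop Streaming).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical simplified log format: "<user_id> <unix_timestamp> <url>".
LOG_LINES = [
    "u1 100 /home",
    "u2 105 /home",
    "u1 110 /products",
    "u1 120 /cart",
    "u2 130 /about",
]


def mapper(line: str) -> Iterator[Tuple[str, Tuple[int, str]]]:
    """Map phase: emit (user_id, (timestamp, url)) for each well-formed line."""
    parts = line.split()
    if len(parts) == 3:
        user, ts, url = parts
        yield user, (int(ts), url)


def shuffle(pairs: Iterable[Tuple[str, Tuple[int, str]]]) -> Dict[str, List[Tuple[int, str]]]:
    """Shuffle phase: group values by key (Hadoop does this between map and reduce)."""
    groups: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(user: str, visits: List[Tuple[int, str]]) -> Tuple[str, List[str]]:
    """Reduce phase: order one user's visits by time to obtain a page sequence."""
    return user, [url for _, url in sorted(visits)]


def training_pairs(sequence: List[str], window: int = 2) -> List[Tuple[Tuple[str, ...], str]]:
    """Turn a page sequence into (previous pages, next page) examples."""
    return [
        (tuple(sequence[max(0, i - window):i]), sequence[i])
        for i in range(1, len(sequence))
    ]


mapped = (pair for line in LOG_LINES for pair in mapper(line))
sequences = dict(reducer(u, v) for u, v in shuffle(mapped).items())
print(sequences)  # {'u1': ['/home', '/products', '/cart'], 'u2': ['/home', '/about']}
print(training_pairs(sequences["u1"]))
```

The resulting (history, next page) pairs are the kind of input a sequence model is trained on; the choice of model is left open here.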

Analysis of Microbiological Hazards of Preprocessed Namuls in School Food Service and Processing Plant (학교급식에 공급되는 전처리 나물류 및 가공업체에서의 공정별 미생물학적 위해요소 분석)

  • Kwak, Soo-Jin;Kim, Su-Jin;Lkhagvasarnai, Enkhjargal;Yoon, Ki-Sun
    • Journal of Food Hygiene and Safety, v.27 no.2, pp.117-126, 2012
  • This study was conducted to assess the levels of microbiological hazards in preprocessed Namuls served in school foodservice. Nineteen preprocessed ground or root vegetables were collected from 21 schools from May to June 2011. Heavy contamination was detected in the preprocessed Namuls, with aerobic plate counts ranging from 3.39 to 8.42 log CFU/g, total coliforms from 3.16 to 7.84 log CFU/g, and Enterobacteriaceae from 2.53 to 7.55 log CFU/g. The detection rates of Escherichia coli, Staphylococcus aureus, and Bacillus cereus (emetic form) were 4.3%, 11.7%, and 2.1%, respectively. In addition, sanitary indicator bacteria were analyzed at each preprocessing step for root vegetables (lotus root, burdock root, bellflower root) and blanched Namuls (bracken, sweet potato vine, chinamul). Aerobic bacteria, coliforms, and Enterobacteriaceae were not effectively removed during preprocessing, including the washing and soaking steps. In the case of the blanched Namuls, contamination levels increased further after the drying process, and no significant reduction in microbial contamination was observed during the preprocessing steps. Thus, the effect of preprocessing steps on the microbiological hazards of Namuls must be reevaluated to improve the microbiological quality of preprocessed Namuls in school foodservice and retail markets.

User Identification and Session completion in Input Data Preprocessing for Web Mining (웹 마이닝을 위한 입력 데이타의 전처리과정에서 사용자구분과 세션보정)

  • 최영환;이상용
    • Journal of KIISE:Software and Applications, v.30 no.9, pp.843-849, 2003
  • Web usage mining is a data mining technique that analyzes web users' usage patterns from large web logs. To apply web usage mining, users and user sessions must be correctly identified during preprocessing, but they cannot be identified completely from log files in the standard web log format alone. Identifying users and user sessions is complicated by many factors, such as local caches, firewalls, ISPs, user privacy, and cookies, and there is currently no definitive method to solve these problems. In particular, the local cache problem is the most difficult obstacle to identifying the user sessions that serve as input to web mining systems. In this paper, we propose a heuristic method that solves the local cache problem using only server-side clickstream data, such as the referrer, agent, and access logs, and that identifies and completes user sessions.
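A common baseline for these preprocessing steps treats each (IP address, user agent) pair as one user, splits sessions on an inactivity timeout, and uses the referrer field to re-insert pages that were served from the local cache and therefore never reached the server log. The sketch below illustrates that generic heuristic; it is not the authors' algorithm, and the 30-minute timeout, the `Request` record, and the function names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

SESSION_TIMEOUT = 30 * 60  # assumed inactivity timeout (30 minutes), in seconds


@dataclass
class Request:
    """One access-log entry (hypothetical record; field names are illustrative)."""
    ip: str
    agent: str
    timestamp: int           # seconds
    url: str
    referrer: Optional[str]  # None when the referrer field is "-"


def identify_sessions(log: List[Request]) -> Dict[Tuple[str, str], List[List[Request]]]:
    """Treat each (IP, user agent) pair as one user and split on inactivity gaps."""
    users: Dict[Tuple[str, str], List[List[Request]]] = {}
    for req in sorted(log, key=lambda r: r.timestamp):
        sessions = users.setdefault((req.ip, req.agent), [])
        if not sessions or req.timestamp - sessions[-1][-1].timestamp > SESSION_TIMEOUT:
            sessions.append([])  # start a new session for this user
        sessions[-1].append(req)
    return users


def complete_path(session: List[Request]) -> List[str]:
    """Re-insert pages the browser served from its local cache (e.g. via Back).

    If a request's referrer is not the page just viewed but was visited earlier
    in the session, assume the user stepped back to it through cached pages.
    """
    path: List[str] = []
    for req in session:
        ref = req.referrer
        if path and ref is not None and ref != path[-1] and ref in path:
            # Index of the most recent occurrence of the referrer in the path.
            back_idx = len(path) - 1 - path[::-1].index(ref)
            # Pages revisited while stepping back from the current page to the referrer.
            path.extend(reversed(path[back_idx:-1]))
        path.append(req.url)
    return path


LOG = [
    Request("1.2.3.4", "Mozilla", 0, "/a", None),
    Request("1.2.3.4", "Mozilla", 10, "/b", "/a"),
    Request("1.2.3.4", "Mozilla", 20, "/c", "/b"),
    Request("1.2.3.4", "Mozilla", 30, "/d", "/a"),    # referrer /a, but /c was last seen
    Request("1.2.3.4", "Mozilla", 4000, "/e", None),  # gap over 30 min: new session
    Request("5.6.7.8", "Mozilla", 5, "/a", None),
]
sessions = identify_sessions(LOG)
print(complete_path(sessions[("1.2.3.4", "Mozilla")][0]))
# ['/a', '/b', '/c', '/b', '/a', '/d']
```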

Comparison of environmental sound classification performance of convolutional neural networks according to audio preprocessing methods (오디오 전처리 방법에 따른 콘벌루션 신경망의 환경음 분류 성능 비교)

  • Oh, Wongeun
    • The Journal of the Acoustical Society of Korea, v.39 no.3, pp.143-149, 2020
  • This paper presents the effect of the feature extraction methods used in audio preprocessing on the classification performance of Convolutional Neural Networks (CNNs). We extract the mel spectrogram, log mel spectrogram, Mel Frequency Cepstral Coefficients (MFCC), and delta MFCC from the UrbanSound8K dataset, which is widely used in environmental sound classification studies. We then scale the data to three different distributions. Using these data, we test four CNNs, VGG16, and MobileNetV2 networks to assess performance according to the audio features and scaling. The highest recognition rate is achieved when the unscaled log mel spectrogram is used as the audio feature. Although this result may not apply to all audio recognition problems, it is useful for classifying the environmental sounds included in UrbanSound8K.
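For reference, a log mel spectrogram can be computed with NumPy alone: frame the signal, take the power spectrum, apply triangular filters spaced evenly on the mel scale, and take the logarithm. This is a minimal sketch; the HTK mel formula, Hann window, frame sizes, and 64 mel bands are assumptions, since the abstract does not give the paper's settings.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr: int, n_fft: int, n_mels: int) -> np.ndarray:
    """Triangular filters evenly spaced in mel, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)  # rfft bin centers
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(y, sr=22050, n_fft=1024, hop=512, n_mels=64, eps=1e-10):
    """Return a (n_mels, n_frames) log mel spectrogram in dB; needs len(y) >= n_fft."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (n_frames, n_mels)
    return 10.0 * np.log10(mel + eps).T                # eps avoids log(0)


sr = 22050
y = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)  # one second of a 440 Hz tone
S = log_mel_spectrogram(y, sr=sr)
print(S.shape)  # (64, 42)
```

In practice, libraries such as librosa (`librosa.feature.melspectrogram` followed by `librosa.power_to_db`) are commonly used; their default mel scale and filter normalization differ from this sketch, so values will not match exactly.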

Effects of Preprocessing on Quality of Fermented Red Snow Crab Chionoecetes japonicus Sauce (전처리 방법에 따른 홍게(Chionoecetes japonicus) 어간장의 제조 및 품질변화)

  • Lim, Ji-Hoon;Jeong, Jee-Hee;Jeong, Min-Jung;Jeong, In-Hak;Kim, Byoung-Mok
    • Korean Journal of Fisheries and Aquatic Sciences, v.48 no.3, pp.284-292, 2015
  • We explored quality changes in red snow crab fish sauce resulting from different preprocessing methods. A control group (C) and groups treated with autolysis (A), boiling (B), enzymatic hydrolysis (E), and addition of Aspergillus oryzae (K) were prepared. The titratable acidity of the K group increased with storage time, whereas that of groups C, A, B, and E decreased. The total and amino nitrogen contents of all samples increased early in storage but decreased later. The total plate count (TPC) of the K group was initially 5.26 log CFU/mL and increased to 7.28 log CFU/mL after 3 months of storage. The TPCs of the C, A, B, and E groups were initially <5.00 log CFU/mL and decreased during storage. The lactic acid bacteria count of the K group was initially 4.80 log CFU/mL and increased until month 5, reaching approximately 6.06 log CFU/mL. The K group scored higher on sensory attributes than the other groups and maintained marketable scores for all relevant properties (color, flavor, off-odor, and overall acceptance). Furthermore, the K group had the highest free amino acid content of all groups, at approximately 3,000 mg per 100 g. These results suggest that K treatment may be beneficial in the preparation of fermented fish sauce.
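Because plate counts are reported on a log10 scale, the changes in the K group can be read as fold changes. Using only the values reported above:

$$
\begin{aligned}
\text{TPC:}\quad & 7.28 - 5.26 = 2.02\ \log\text{ CFU/mL} \;\Rightarrow\; 10^{2.02} \approx 105\text{-fold increase over 3 months} \\
\text{LAB:}\quad & 6.06 - 4.80 = 1.26\ \log\text{ CFU/mL} \;\Rightarrow\; 10^{1.26} \approx 18\text{-fold increase by month 5}
\end{aligned}
$$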

Preprocessing performance of convolutional neural networks according to characteristic of underwater targets (수중 표적 분류를 위한 합성곱 신경망의 전처리 성능 비교)

  • Park, Kyung-Min;Kim, Dooyoung
    • The Journal of the Acoustical Society of Korea, v.41 no.6, pp.629-636, 2022
  • We present a preprocessing method for an underwater target detection model based on a convolutional neural network (CNN). The acoustic characteristics of ships are expressed ambiguously because of the strong signal power at low frequencies. To address this problem, we evaluate combinations of various spectrogram methods and feature scaling methods as feature preprocessing. We define a simple CNN model and train it to measure the performance of each preprocessing combination. Through experiments, we found that combining the log Mel-spectrogram with standardization and robust scaling gave the best classification performance.
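The scaling methods named above can be sketched as follows. This is a minimal NumPy illustration that scales a whole spectrogram with global statistics; whether the paper scales globally or per frequency bin is not stated, and the function names are hypothetical. scikit-learn's `StandardScaler` and `RobustScaler` implement per-feature versions of the same formulas.

```python
import numpy as np


def standardize(x: np.ndarray) -> np.ndarray:
    """Z-score scaling: subtract the mean, divide by the standard deviation."""
    return (x - x.mean()) / x.std()


def robust_scale(x: np.ndarray) -> np.ndarray:
    """Robust scaling: subtract the median, divide by the interquartile range.

    The median and IQR are barely affected by a few very large values, such as
    strong low-frequency components dominating a ship's spectrogram.
    """
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return (x - median) / (q3 - q1)


def min_max(x: np.ndarray) -> np.ndarray:
    """Min-max scaling to [0, 1]; assumes x is not constant."""
    return (x - x.min()) / (x.max() - x.min())


x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # one large outlier
print(robust_scale(x))  # [-1.  -0.5  0.   0.5 48.5]
```

With standardization, the outlier inflates the standard deviation and compresses the other values toward each other; robust scaling keeps them spread out, which is one reason to try both.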

Designing Summary Tables for Mining Web Log Data

  • Ahn, Jeong-Yong
    • Journal of the Korean Data and Information Science Society, v.16 no.1, pp.157-163, 2005
  • On the Web, data is generally gathered automatically by Web servers and stored in server or access logs. However, as users access ever larger amounts of data, the query response time for extracting information inevitably increases. One way to address this issue is to use summary tables. In this short note, we design a prototype of summary tables that can efficiently extract information from Web log data. We also compare the performance of the summary tables with that of a sampling technique and of a method that uses the raw data.
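A summary table of this kind can be sketched with SQLite: aggregate the raw log once by day and page, then answer hit-count queries from the much smaller summary. The table and column names and the toy rows below are hypothetical; the abstract does not describe the paper's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_log (day TEXT, page TEXT, bytes INTEGER)")
conn.executemany(
    "INSERT INTO raw_log VALUES (?, ?, ?)",
    [  # toy rows for illustration only
        ("2005-01-01", "/index.html", 512),
        ("2005-01-01", "/index.html", 512),
        ("2005-01-01", "/about.html", 256),
        ("2005-01-02", "/index.html", 512),
    ],
)

# Precompute once: one row per (day, page) instead of one row per request.
conn.execute(
    """
    CREATE TABLE daily_page_summary AS
    SELECT day, page, COUNT(*) AS hits, SUM(bytes) AS total_bytes
    FROM raw_log
    GROUP BY day, page
    """
)


def hits_from_summary(page: str) -> int:
    """Answer the query from the summary table."""
    row = conn.execute(
        "SELECT COALESCE(SUM(hits), 0) FROM daily_page_summary WHERE page = ?", (page,)
    ).fetchone()
    return row[0]


def hits_from_raw(page: str) -> int:
    """Answer the same query by scanning the raw log."""
    return conn.execute("SELECT COUNT(*) FROM raw_log WHERE page = ?", (page,)).fetchone()[0]


print(hits_from_summary("/index.html"), hits_from_raw("/index.html"))  # 3 3
```

The trade-off is that the summary must be refreshed as new log records arrive, and it can only answer queries at or above its aggregation level (here, day and page).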


Improving Lookup Time Complexity of Compressed Suffix Arrays using Multi-ary Wavelet Tree

  • Wu, Zheng;Na, Joong-Chae;Kim, Min-Hwan;Kim, Dong-Kyue
    • Journal of Computing Science and Engineering, v.3 no.1, pp.1-4, 2009
  • Given a text T of size n, we often need to search it for information of interest. To support fast searching, an index must be constructed by preprocessing the text. The suffix array is one such index data structure. The compressed suffix array (CSA) is a compressed index that exploits the regularity of the suffix array and can be compressed to the $k$th-order empirical entropy. In this paper, we improve the lookup time complexity of the compressed suffix array by using a multi-ary wavelet tree, at the cost of more space. In our implementation, the lookup time complexity of the compressed suffix array is $O(\log_{\sigma}^{\varepsilon/(1-\varepsilon)} n \cdot \log_r \sigma)$, and its space is $\varepsilon^{-1} n H_k(T) + O(n \log\log n / \log_{\sigma}^{\varepsilon} n)$ bits, where $\sigma$ is the alphabet size, $H_k$ is the $k$th-order empirical entropy, $r$ is the branching factor of the multi-ary wavelet tree such that $2 \le r \le \sqrt{n}$ and $r \le O(\log_{\sigma}^{1-\varepsilon} n)$, and $0 < \varepsilon < 1/2$ is a constant.
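For orientation, the sketch below builds a plain, uncompressed suffix array and searches it by binary search. It does not implement the CSA or the multi-ary wavelet tree; it only shows the index that those structures represent in compressed form. The naive construction and the function names are illustrative assumptions.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Naive construction: sort suffix start positions by their suffixes.

    Worst case O(n^2 log n); fine for illustration, not for large texts.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Binary-search the suffix array for the block of suffixes starting with pattern.

    Truncating sorted suffixes to len(pattern) characters keeps them sorted,
    so both searches below run over a non-decreasing sequence of prefixes.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(sa)                                   # [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))    # [1, 3]
```

Each search step compares against text at position `sa[mid]`, which is exactly the "lookup" of a suffix array entry; a CSA must recover that entry from its compressed representation, which is why lookup time matters.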

Effect of zero imputation methods for log-transformation of independent variables in logistic regression

  • Seo Young Park
    • Communications for Statistical Applications and Methods, v.31 no.4, pp.409-425, 2024
  • Logistic regression models are commonly used in medical science and public health research to explain a binary health outcome variable using independent variables such as patient characteristics. Although logistic regression requires no distributional assumption for the independent variables, variables with severely right-skewed distributions, such as lab values, are often log-transformed to achieve symmetry or approximate normality. However, lab values often contain zeros due to the limit of detection, which makes log-transformation impossible. Therefore, the observations must be preprocessed to handle zeros before log-transformation. In this study, five methods for removing zeros (shift by 1, shift by half of the smallest nonzero value, shift by the square root of the smallest nonzero value, replace zeros with half of the smallest nonzero value, and replace zeros with the square root of the smallest nonzero value) are investigated in a logistic regression setting. To evaluate the performance of these methods, we performed a simulation study based on data randomly generated from a log-normal distribution and a logistic regression model. The shift-by-1 method performed worst, while the methods that shift by half of the smallest nonzero value, replace zeros with half of the smallest nonzero value, and replace zeros with the square root of the smallest nonzero value showed comparable and stable performance overall.
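The five zero-handling methods can be written down directly. This is a minimal NumPy sketch assuming nonnegative data with at least one nonzero value; the method labels are hypothetical names, not the paper's notation. The shift methods change every observation, whereas the replace methods change only the zeros.

```python
import numpy as np


def zero_handling(x, method: str) -> np.ndarray:
    """Make x strictly positive before np.log using one of five strategies.

    Assumes x is nonnegative and contains at least one nonzero value.
    """
    x = np.asarray(x, dtype=float)
    m = x[x > 0].min()  # smallest nonzero value
    if method == "shift_1":
        return x + 1.0
    if method == "shift_half_min":
        return x + m / 2.0
    if method == "shift_sqrt_min":
        return x + np.sqrt(m)
    if method == "replace_half_min":
        return np.where(x == 0, m / 2.0, x)
    if method == "replace_sqrt_min":
        return np.where(x == 0, np.sqrt(m), x)
    raise ValueError(f"unknown method: {method}")


x = np.array([0.0, 0.04, 1.0, 9.0])
for name in ["shift_1", "shift_half_min", "shift_sqrt_min", "replace_half_min", "replace_sqrt_min"]:
    print(name, np.log(zero_handling(x, name)))
```

One detail worth keeping in mind: when the smallest nonzero value is below 1, its square root is larger than it (here 0.2 versus 0.04), so zeros replaced by that square root are no longer the smallest values in the variable.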

For Improving Security Log Big Data Analysis Efficiency, A Firewall Log Data Standard Format Proposed (보안로그 빅데이터 분석 효율성 향상을 위한 방화벽 로그 데이터 표준 포맷 제안)

  • Bae, Chun-sock;Goh, Sung-cheol
    • Journal of the Korea Institute of Information Security & Cryptology, v.30 no.1, pp.157-167, 2020
  • Big data and artificial intelligence technologies, which provide the foundation for the recent Fourth Industrial Revolution, have become major drivers of business innovation across industries. In the field of information security, efforts are under way to develop and improve intelligent security systems by applying these techniques to large-scale log data, for which effective uses had previously been difficult to find. The quality of security log big data, which is the basis of AI learning for information security, is an important input factor that determines the performance of an intelligent security system. However, because log data differ across products and are complex, preprocessing such low-quality big data requires excessive time and effort. In this study, we investigate and analyze cases of log data collection from various firewalls. By proposing a standard format for firewall log data collection, we hope to contribute to the development of intelligent security systems based on security log big data.
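To illustrate why a common format reduces preprocessing effort, the sketch below normalizes log lines from two vendors into one record type. Both vendor formats, the field names, and the action mapping are hypothetical; the abstract does not list the fields of the proposed standard.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FirewallEvent:
    """Hypothetical common schema; field names are illustrative only."""
    timestamp: str
    src_ip: str
    dst_ip: str
    dst_port: int
    action: str  # normalized to "allow" or "deny"


def parse_vendor_a(line: str) -> FirewallEvent:
    """Hypothetical key=value format: 'time=... src=... dst=... dport=... action=accept'."""
    fields = dict(part.split("=", 1) for part in line.split())
    # Anything not recognized as a permit is treated as deny (an assumption).
    action = "allow" if fields["action"].lower() in ("accept", "allow", "permit") else "deny"
    return FirewallEvent(fields["time"], fields["src"], fields["dst"], int(fields["dport"]), action)


def parse_vendor_b(line: str) -> FirewallEvent:
    """Hypothetical CSV format: 'timestamp,ACTION,src,dst,port'."""
    timestamp, action, src, dst, port = line.strip().split(",")
    return FirewallEvent(timestamp, src, dst, int(port), "allow" if action.upper() == "PASS" else "deny")


PARSERS: Dict[str, Callable[[str], FirewallEvent]] = {
    "vendor_a": parse_vendor_a,
    "vendor_b": parse_vendor_b,
}


def normalize(vendor: str, line: str) -> FirewallEvent:
    """Dispatch a raw line to its vendor parser; raises KeyError for unknown vendors."""
    return PARSERS[vendor](line)


a = normalize("vendor_a", "time=2020-01-01T00:00:00 src=10.0.0.1 dst=10.0.0.2 dport=443 action=accept")
b = normalize("vendor_b", "2020-01-01T00:00:05,DROP,10.0.0.3,10.0.0.4,22")
print(a)
print(b)
```

Once every source is mapped into the same record type, downstream feature extraction and model training need only one code path, regardless of how many firewall products feed the pipeline.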