• Title/Summary/Keyword: Data Sequence


Anomaly Detection for User Action with Generative Adversarial Networks (적대적 생성 모델을 활용한 사용자 행위 이상 탐지 방법)

  • Choi, Nam woong;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.43-62 / 2019
  • Anomaly detection was once dominated by methods that determined whether an abnormality was present based on statistics derived from specific data. This was possible because data used to be low-dimensional, so classical statistical methods worked effectively. However, as data have become more complex in the era of big data, it has become harder to accurately analyze and predict the data generated throughout industry in the conventional way. Supervised learning algorithms based on SVMs and decision trees were therefore adopted. However, a supervised model predicts test data accurately only when the abnormal class is about as large as the normal class, whereas most data generated in industry has an imbalanced class distribution, so the predictions of a supervised model are not always valid. To overcome these drawbacks, many studies now use unsupervised models that are not influenced by class distribution, such as autoencoders or generative adversarial networks. In this paper, we propose a method to detect anomalies using generative adversarial networks. AnoGAN, introduced by Schlegl et al. (2017), is a model that performs anomaly detection on medical images. It is built from convolutional neural networks and has been used in the detection field. By contrast, there are few research papers on anomaly detection for sequence data using generative adversarial networks compared with image data. Li et al. (2018) proposed a model based on LSTM, a type of recurrent neural network, to classify anomalies in numerical sequence data, but it has not been applied to categorical sequence data, nor has the feature matching method of Salimans et al. (2016).
This suggests that much remains to be tried in anomaly classification of sequence data with generative adversarial networks. To learn sequence data, the generative adversarial network is built from LSTMs: the generator is a 2-layer stacked LSTM with a 32-dim hidden layer and a 64-dim hidden layer, and the discriminator is an LSTM with a 64-dim hidden layer. Existing work on anomaly detection for sequence data derives anomaly scores from the entropy of the probability of the actual data, but in this paper, as mentioned earlier, anomaly scores are derived using the feature matching technique. In addition, the process of optimizing the latent variables was designed with an LSTM to improve model performance. The modified generative adversarial model was more precise than the autoencoder in all experiments and was approximately 7% higher in accuracy. In terms of robustness, the generative adversarial network also performed better than the autoencoder: because it learns the data distribution from real categorical sequence data, it is not affected by a single normal sample, whereas the autoencoder is. The robustness test showed an accuracy of 92% for the autoencoder and 96% for the generative adversarial network, and a sensitivity of 40% for the autoencoder and 51% for the generative adversarial network. Experiments were also conducted to show how much performance changes with the structure used to optimize the latent variables; as a result, sensitivity improved by about 1%. These results offer a new perspective on optimizing latent variables, which has received relatively little attention.
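
The feature-matching anomaly score mentioned in this abstract can be illustrated with a minimal sketch in the style of AnoGAN. It is not the authors' implementation: the function name, the weight `lam`, and the toy arrays are assumptions made for illustration, and the feature vectors stand in for activations of an intermediate discriminator layer.

```python
import numpy as np

def anomaly_score(x, x_gen, f_x, f_gen, lam=0.1):
    """AnoGAN-style score (illustrative sketch, not the paper's code).

    x      : one-hot encoded real sequence, shape (steps, categories)
    x_gen  : generator output for the optimized latent variable, same shape
    f_x    : discriminator features of x (hypothetical intermediate layer)
    f_gen  : discriminator features of x_gen
    lam    : assumed weight between the two loss terms
    """
    residual = np.abs(x - x_gen).sum()        # how far the reconstruction is from x
    feature = np.abs(f_x - f_gen).sum()       # feature-matching distance
    return (1.0 - lam) * residual + lam * feature

# Toy values, chosen only to show the arithmetic.
x = np.array([[1.0, 0.0], [0.0, 1.0]])
x_gen = np.array([[0.5, 0.5], [0.0, 1.0]])
f_x = np.array([1.0, 2.0])
f_gen = np.array([1.0, 0.0])
score = anomaly_score(x, x_gen, f_x, f_gen, lam=0.5)
```

A larger score means the generator could not reproduce the sample well, which is what flags it as a candidate anomaly.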

An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

  • Karim, Md. Rezaul;Rashid, Md. Mamunur;Jeong, Byeong-Soo;Choi, Ho-Jin
    • Genomics & Informatics / v.10 no.1 / pp.51-57 / 2012
  • Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
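
To make the term concrete, here is a naive level-wise sketch of mining maximal contiguous frequent patterns. It is not the approach proposed in the paper, whose algorithm the abstract does not describe; the function name, the support definition (number of sequences containing the pattern), and the toy data are assumptions.

```python
from collections import defaultdict

def maximal_contiguous_frequent(sequences, min_support):
    """Return contiguous patterns found in at least min_support sequences
    that are not contained in any longer frequent pattern (naive sketch)."""

    def frequent_of_length(k):
        # Map each length-k substring to the set of sequences containing it.
        support = defaultdict(set)
        for idx, seq in enumerate(sequences):
            for i in range(len(seq) - k + 1):
                support[seq[i:i + k]].add(idx)
        return {p for p, ids in support.items() if len(ids) >= min_support}

    maximal = []
    k = 1
    current = frequent_of_length(k)
    while current:
        longer = frequent_of_length(k + 1)
        # A pattern is maximal if no frequent pattern one base longer contains it.
        maximal.extend(p for p in current if not any(p in q for q in longer))
        current = longer
        k += 1
    return sorted(maximal)

patterns = maximal_contiguous_frequent(["ACGTA", "TACGT", "GACGA"], min_support=3)
```

No pattern length is specified in advance: the loop simply stops when no frequent pattern of the next length exists. This brute-force version rescans every sequence at each length, so it only suits small inputs.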

A comparative analysis on Blind Adaptation Algorithms performances for User Detection in CDMA Systems (CDMA System에서 사용자 검파를 위한 Blind 적용 알고리즘에 관한 성능 비교 분석)

  • 조미령;윤석하
    • Journal of the Korea Computer Industry Society / v.2 no.4 / pp.537-546 / 2001
  • Griffiths' algorithm and LCCMA, which are single-user-detection adaptive algorithms, have been proposed to mitigate multiple access interference (MAI) and the near-far problem in direct-sequence spread-spectrum CDMA systems, and the MOE algorithm has been proposed for the minimum mean-square error (MMSE) criterion. This paper examines these three blind adaptive algorithms, which can improve system performance without requiring a training sequence. It compares and analyzes their performance as a function of the number of interfering users and of the users' data update rate. Simulation results show that LCCMA outperformed the other algorithms in both respects. Blind adaptation enables more flexible network design by eliminating the need for a training sequence.


A Study on the Semiology and Quantitative Psychological Analysis of Sequence Landscape of National Park (국립공원 Sequence 경관의 기호학과 계량심리학적 분석에 관한 연구)

  • 김세천
    • Journal of the Korean Institute of Landscape Architecture / v.19 no.3 / pp.55-71 / 1991
  • The purpose of this thesis is to provide objective basic data for environmental design through quantitative analysis of the visual quality of the physical environment of the Basemsagol valley sequence landscape. To this end, the visual volumes of physical elements were evaluated using mesh analysis, the spatial image structure of physical elements was analyzed with a factor analysis algorithm, and the degree of visual quality was measured mainly by questionnaires. The study also aims to understand semiotics and to explore the possibility of applying it to sequence landscape assessment. A semiological approach suggests a new dimension in sequence landscape assessment, in contrast to the existing scientific evaluation methods. The results of this thesis can be summarized as follows. The visual volumes of the immediate vegetation, rock, bridge, road, and distant vegetation were found to be the main factors determining visual quality. The factors covering the spatial image of the natural park sequence landscape were found to be the overall synthetic evaluation, potentiality, natural quality, spatial, appeal, and dignity. By using the control method for the number of factors, T.V. was obtained as 40.22%. The semiological approach is qualitative, open, holistic, and experiential, whereas the scientific approach is quantitative, closed, reductive, and experimental.


A Study on Digital Information Hiding Technique using Random Sequence and Hadamard Matrix (랜덤시퀀스와 Hadamard 행렬을 이용한 디지털 정보은폐 기술에 관한 연구)

  • 김장환;김규태;김은수
    • The Journal of Korean Institute of Communications and Information Sciences / v.24 no.9A / pp.1339-1345 / 1999
  • In this paper, we propose a digital information hiding technique that combines a random sequence and a Hadamard matrix to hide multiple pieces of information. Prior work multiplied the information signal by a single random sequence to lower its energy level, making it difficult for a third party to detect or jam the information signal. However, hiding multiple pieces of information in the same digital image requires orthogonal codes as hiding keys, so using only random sequences, which are not mutually uncorrelated, is problematic. We therefore present a new information hiding scheme for multiple pieces of information that uses a random sequence to spread the energy of the data to be hidden and a Hadamard matrix to make the random sequences uncorrelated.
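
The idea of combining a random sequence with a Hadamard matrix can be sketched as follows. This is an illustrative reading of the abstract, not the paper's scheme: the function names, the embedding strength, and the non-blind extraction (which assumes the original host signal is available) are all assumptions.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = np.array([[1]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def make_keys(n, seed=0):
    """Multiply every Hadamard row elementwise by one random +/-1 sequence.
    The rows look random but stay mutually orthogonal."""
    rng = np.random.default_rng(seed)
    pn = rng.choice([-1, 1], size=n)
    return hadamard(n) * pn

def embed(host, bits, keys, strength=1.0):
    """Spread each bit over the host signal with its own key row."""
    symbols = 2 * np.asarray(bits) - 1          # map {0, 1} to {-1, +1}
    return host + strength * symbols @ keys[:len(bits)]

def extract(marked, host, keys, n_bits):
    """Recover the bits by correlating the residual with each key row."""
    corr = keys[:n_bits] @ (marked - host)
    return (corr > 0).astype(int)

keys = make_keys(8, seed=1)
host = np.arange(8, dtype=float)                # stand-in for image samples
marked = embed(host, [1, 0, 1], keys, strength=0.5)
recovered = extract(marked, host, keys, 3)
```

Because the key rows are orthogonal, each correlation picks up only its own bit, which is why several messages can share one host without interfering with each other.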


The Complete Genome Sequence of Southern rice black-streaked dwarf virus Isolated from Vietnam

  • Dinh, Thi-Sau;Zhou, Cuiji;Cao, Xiuling;Han, Chenggui;Yu, Jialin;Li, Dawei;Zhang, Yongliang
    • The Plant Pathology Journal / v.28 no.4 / pp.428-432 / 2012
  • We determined the complete genome sequence of a Vietnamese isolate of Southern rice black-streaked dwarf virus (SRBSDV). Whole genome comparisons and phylogenetic analysis showed that the genome of the Vietnamese isolate shared high nucleotide sequence identities of over 97.5% with those of the reported Chinese isolates, confirming their common origin. Moreover, the greatest divergence between different SRBSDV isolates was found in segments S1, S3, S4 and S6, which differs from the sequence alignment results between SRBSDV and Rice black streaked dwarf virus (RBSDV), implying that SRBSDV evolved in a unique way independent of RBSDV. This is the first report of a complete nucleotide sequence of SRBSDV from Vietnam, and our data provide new clues for further understanding the molecular variation and epidemiology of SRBSDV in Southeast Asia.
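
For readers unfamiliar with the identity figure quoted above, a minimal sketch of percent nucleotide identity follows. It assumes the two sequences are already aligned to equal length and that gap columns are excluded from the count; the abstract does not say how the authors computed their values, and the function name and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent of matching positions between two pre-aligned sequences.
    Columns where either sequence has a gap ('-') are skipped (an assumption)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)

identity = percent_identity("ACGTACGT", "ACGTACGA")
```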

A Noble Equalizer Structure with the Variable Length of Training Sequence for Increasing the Throughput in DS-UWB

  • Chung, Se-Myoung;Kim, Eun-Jung;Jin, Ren;Lim, Myoung-Seob
    • The Journal of Korean Institute of Communications and Information Sciences / v.34 no.1C / pp.113-119 / 2009
  • A training sequence of appropriate length is needed for equalization and initial synchronization before the data itself is sent in a burst-transmission DS-UWB system. The length of the training sequence is one of the factors that reduce throughput. We propose a novel structure with a variable-length training sequence whose length is adaptively tailored to the channel conditions (CM1, CM2, CM3, CM4) in DS-UWB systems. Compared with a fixed-length training sequence sized for the worst-case channel conditions, this structure increases throughput without sacrificing performance. Simulation results under the IEEE 802.15.3a channel model show that the proposed scheme achieves higher throughput than a conventional one with a slight loss of BER performance. The structure can also reduce computational complexity and power consumption by selecting a short training sequence.

Analyzing Financial Data from Banks and Savings Banks: Application of Bioinformatical Methods (은행과 저축은행 관련 재정 지표 분석: 생물 정보학 분석 기법의 응용)

  • Pak, Ro Jin
    • The Korean Journal of Applied Statistics / v.27 no.4 / pp.577-588 / 2014
  • The collection and storage of large volumes of data are becoming easier; however, the number of variables is sometimes larger than the number of samples (objects). We now face the problem of dependency among variables (such as multicollinearity) due to the increased number of variables. Many statistical methods cannot be applied unless the independence assumption is satisfied. To overcome this drawback, we consider categorizing (or discretizing) the observations. We have a data set of financial indices from banks in Korea that contains 78 variables from 16 banks. Genetic sequence data is also a good example of large data, and numerous statistical methods exist to handle it. After transforming the bank data into categorical data that resembles genetic sequence data and applying bioinformatic techniques, we discover much useful information about the banks.
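
One way to picture the transformation described above is to discretize each numeric variable into a small alphabet, so that every bank becomes a string resembling a genetic sequence. The sketch below uses quantile cut points per variable; the cut rule, the alphabet, and the toy numbers are assumptions, since the abstract does not state how the observations were categorized.

```python
import numpy as np

def to_categorical_sequences(data, alphabet="LMH"):
    """Turn each row (one object, e.g. a bank) into a string with one letter
    per variable, using per-variable quantile bins (illustrative sketch)."""
    data = np.asarray(data, dtype=float)
    k = len(alphabet)
    # Interior quantile cut points for every column; shape (k - 1, n_variables).
    cuts = np.quantile(data, [i / k for i in range(1, k)], axis=0)
    sequences = []
    for row in data:
        codes = [int(np.searchsorted(cuts[:, j], v, side="left"))
                 for j, v in enumerate(row)]
        sequences.append("".join(alphabet[c] for c in codes))
    return sequences

# Three hypothetical objects measured on two variables.
sequences = to_categorical_sequences([[1, 10], [2, 20], [3, 30]])
```

Once the data are strings over a fixed alphabet, sequence-oriented tools such as alignment or motif counting can be applied to them.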

Smart Pallet Based Just-in-sequence Parts Delivery System (스마트 파렛트 기반 직서열 부품공급 시스템)

  • Lee, Young-Doo;Kim, Sang-Rak;Kong, Hyung-Yun;Koo, In-Soo
    • Journal of the Institute of Electronics Engineers of Korea SC / v.47 no.1 / pp.35-41 / 2010
  • To improve productivity and yield on the assembly lines of finished-goods manufacturers, fabricated parts must be supplied to the assembly line not only just in time (JIT) but also just in sequence (JIS). Parts that are not delivered just in time can delay the assembly line, and parts that are not delivered just in sequence can halt the line or result in defective products. To implement JIT and JIS, this paper proposes a smart-pallet-based just-in-sequence parts delivery system in which RFID and USN technologies are converged. Compared with a bar-code-based just-in-sequence parts delivery system, the proposed system reduces the unnecessary time spent confirming part types and sequence and the unnecessary cost of bar-code labeling and sequence-data documentation. The proposed system also overcomes drawbacks of the RFID-based just-in-sequence parts delivery system, such as limited transmission range and the difficulty of confirming part types and sequence in real time. Finally, we show an implementation of the proposed system and its practicality.

Sequence based Intrusion Detection using Similarity Matching of the Multiple Sequence Alignments (다중서열정렬의 유사도 매칭을 이용한 순서기반 침입탐지)

  • Kim Yong-Min
    • Journal of the Korea Institute of Information Security & Cryptology / v.16 no.1 / pp.115-122 / 2006
  • Most intrusion detection methods are based on misuse detection, which accumulates known intrusion information and judges whether any given behavior data constitutes an attack. However, it is very difficult to detect a new or modified attack using only the collected patterns of attack behavior. Therefore, considering that anomaly behavior detection has a high false detection rate in practice, a new approach is required for the very large number of sequence-based intrusion patterns. Such an approach can improve the chance of detecting known attacks as well as modified and unknown attacks, in addition to measuring the similarity of intrusion patterns. This paper proposes a method that applies the multiple sequence alignment technique to similarity matching of sequence-based intrusion patterns. It enables statistical analysis of sequence patterns and can be implemented easily. The method also reduces the number of detection alerts and false detections for attacks as the sequence size changes.
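
As a simplified illustration of alignment-based similarity matching, the sketch below scores two event sequences with pairwise global alignment (Needleman-Wunsch). The paper uses multiple sequence alignment, so this pairwise version is a deliberate simplification; the scoring values, the normalization, and the event names are assumptions.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two sequences (Needleman-Wunsch, linear gaps)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Score normalized by the longer length; 1.0 means identical sequences."""
    longest = max(len(a), len(b))
    return alignment_score(a, b) / longest if longest else 1.0

known_attack = ["open", "read", "write", "close"]   # hypothetical stored pattern
observed = ["open", "read", "close"]                # hypothetical observed behavior
sim = similarity(known_attack, observed)
```

A modified attack that inserts or drops a few events still aligns well with the stored pattern, so a similarity threshold can catch variants that exact pattern matching would miss.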