Search | Korea Science

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences

Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong;Lee, Byoung-Yup
- International Journal of Contents / v.3 no.2 / pp.18-24 / 2007
Biological sequences such as DNA and amino acid sequences typically contain a large number of items, and their frequent contiguous sequences often consist of hundreds of items or more. In biological sequence analysis (BSA), searching for frequent contiguous sequences is one of the most important operations. Many studies have addressed efficient sequential pattern mining, and most existing methods are based on the Apriori algorithm. In particular, the PrefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, because it grows sequential patterns from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. More recently, the MacosVSpan algorithm was proposed, building on the idea of PrefixSpan to significantly reduce its recursive process. However, it is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method for mining maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
https://doi.org/10.5392/IJoC.2007.3.2.018
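
To make the notion of a maximal frequent contiguous sequence concrete, here is a minimal Python sketch. It is not the spanning-tree method of the paper, nor MacosVSpan; it is a plain level-wise search that starts from substrings of a fixed seed length. The function names and the parameters `min_support` and `seed_len` are assumptions introduced for illustration.

```python
from collections import defaultdict


def frequent_substrings(sequences, k, min_support, allowed=None):
    """Return the length-k substrings found in at least min_support sequences.

    If `allowed` is given, a candidate is counted only when both of its
    length-(k-1) substrings are in `allowed` (anti-monotone pruning).
    """
    owners = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            sub = seq[i:i + k]
            if allowed is None or (sub[:-1] in allowed and sub[1:] in allowed):
                owners[sub].add(idx)
    return {sub for sub, ids in owners.items() if len(ids) >= min_support}


def maximal_frequent_contiguous(sequences, min_support, seed_len):
    """Illustrative level-wise search, starting at a fixed seed length."""
    maximal = set()
    k = seed_len
    current = frequent_substrings(sequences, k, min_support)
    while current:
        longer = frequent_substrings(sequences, k + 1, min_support, allowed=current)
        # A frequent substring is not maximal if a frequent substring
        # one item longer contains it as a prefix or a suffix.
        covered = {q[:-1] for q in longer} | {q[1:] for q in longer}
        maximal |= current - covered
        current, k = longer, k + 1
    return maximal


if __name__ == "__main__":
    dna = ["ACGTACGT", "TTACGTAA", "GGACGTCC"]
    print(maximal_frequent_contiguous(dna, min_support=3, seed_len=2))
```

Support here counts the number of sequences that contain a substring at least once. Patterns shorter than `seed_len` are never reported, which is the trade-off of starting from a fixed length.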

Trend-based Sequential Pattern Discovery from Time-Series Data (시계열 데이터로부터의 경향성 기반 순차패턴 탐색)

오용생;이동하;남도원;이전영
- Journal of Intelligence and Information Systems / v.7 no.1 / pp.27-45 / 2001
Sequential pattern discovery from time-series data has mainly been concerned with events or item sets. Recently, this research has started to be applied to numerical data, such as the sensor readings generated by monitoring a machine's state. Numerical data rarely repeat exactly the same values while forming patterns, so it is important to extract a suitable number of pattern features that can be transformed into events or item sets and used in sequential pattern mining tasks. The popular methods for extracting such patterns are sliding windows and clustering. The results of these methods are sensitive to the window size or the clustering parameters, which forces users to run the data mining task repeatedly and interpret the results each time. This paper suggests a method that retrieves pattern features by converting numerical data into vectors with an angle and a magnitude. Pattern features retrieved this way make the results easy to understand and make sequential patterns fast to find. We define an inclusion relation among pattern features using the angles and magnitudes of the vectors. Using this relation, we reduce the data size and can therefore find sequential patterns faster than other methods, which use all the data.
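
The abstract does not say how the angle and magnitude vectors are built, so the following Python sketch only illustrates the general idea under stated assumptions: samples are evenly spaced, each pair of consecutive samples forms one vector, and the hypothetical `step` parameter is the time distance between samples. The function name `to_trend_vectors` is also introduced here for illustration.

```python
import math


def to_trend_vectors(values, step=1.0):
    """Turn a numeric series into (angle_in_degrees, magnitude) pairs.

    Each pair describes the segment between two consecutive samples,
    assuming a constant horizontal distance `step` between them.
    """
    vectors = []
    for prev, curr in zip(values, values[1:]):
        dy = curr - prev
        angle = math.degrees(math.atan2(dy, step))  # slope direction
        magnitude = math.hypot(step, dy)            # segment length
        vectors.append((angle, magnitude))
    return vectors


if __name__ == "__main__":
    readings = [20.0, 21.0, 21.0, 19.5]
    for angle, magnitude in to_trend_vectors(readings):
        print(f"angle={angle:.1f} deg, magnitude={magnitude:.2f}")
```

A rising segment gets a positive angle, a falling one a negative angle, and a flat one an angle of zero, so segments with similar trends can be grouped into one symbolic event before sequential pattern mining.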

Continuous Moving Pattern Mining Approach in LBS Platform

LEE, J.W.;Heo, T.W.;Kim, K.S.;Lee, J.H.
- Proceedings of the KSRS Conference / 2003.11a / pp.597-599 / 2003
A moving pattern is a kind of sequential pattern that can be extracted from a large volume of location history data. This sort of knowledge is very useful for adding intelligence to location-based services (LBS) or geographic information systems (GIS). In this paper, we propose a continuous moving pattern mining approach for the LBS platform and LBS Miner. The location updates of moving objects affect the set of rules that is maintained. In our approach, we use validity thresholds that indicate the next time to invoke incremental pattern mining. The mining system will play a major role in supporting various LBS solutions.

Efficient Sequence Pattern Mining Technique for the Removal of Ambiguity in the Interval Patterns Mining (인터벌 패턴 마이닝에서 모호성 제거를 위한 효율적인 순차 패턴 마이닝 기법)

Kim, Hwan;Choi, Pilsun;Kim, Daein;Hwang, Buhyun
- KIPS Transactions on Software and Data Engineering / v.2 no.8 / pp.565-570 / 2013
Previous research on mining sequential patterns has mainly focused on discovering patterns from point-based events. In the real world, however, interval events occur that have a start point and an end point. Existing interval pattern mining methods, which discover relationships among interval events based on the Allen operators, have a problem: interval patterns with three or more interval events can be interpreted in several ways. In this paper, we propose the I_TPrefixSpan algorithm, an efficient sequence pattern mining technique that removes ambiguity in interval pattern mining. The proposed algorithm generates event sequences that have no ambiguity. The size of the generated candidate set can therefore be minimized by searching only for sequential pattern entries that exist in the event sequence. The performance evaluation shows that the proposed method is more efficient than existing methods.
https://doi.org/10.3745/KTSDE.2013.2.8.565 KSCI
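
For readers unfamiliar with the Allen operators mentioned in this abstract, the Python sketch below classifies the relation between two intervals into one of Allen's thirteen relations. It is not the I_TPrefixSpan algorithm. The function name `allen_relation` and the `(start, end)` tuple representation are assumptions made for illustration, and each interval is assumed to satisfy start < end.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b.

    Intervals are (start, end) tuples with start < end.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if b_end < a_start:
        return "after"
    if b_end == a_start:
        return "met-by"
    # From here on the two intervals share more than a single point.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    return "overlaps" if a_start < b_start else "overlapped-by"


if __name__ == "__main__":
    print(allen_relation((1, 3), (2, 5)))
    print(allen_relation((2, 3), (1, 5)))
```

A relation of this kind is defined for a pair of intervals. A pattern built by chaining such relations over three or more intervals does not by itself fix every pairwise relation, which is the ambiguity the abstract refers to.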

Learning Multidimensional Sequential Patterns Using Hellinger Entropy Function (Hellinger 엔트로피를 이용한 다차원 연속패턴의 생성방법)

Lee, Chang-Hwan
- The KIPS Transactions:PartB / v.11B no.4 / pp.477-484 / 2004
Sequential pattern mining generates a set of inter-transaction patterns residing in time-dependent data. This paper proposes a new method for generating sequential patterns using the Hellinger measure. While current methods generate single-dimensional sequential patterns within a single attribute, the proposed method can detect multi-dimensional patterns among different attributes. A number of heuristics, based on the characteristics of the Hellinger measure, are proposed to reduce the computational complexity of sequential pattern systems. Some experimental results are presented.
https://doi.org/10.3745/KIPSTB.2004.11B.4.477 KSCI
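
The abstract does not give the formula of its Hellinger measure, and the paper may use a variant. As background only, the Python sketch below computes the standard Hellinger distance between two discrete probability distributions, which is 0 for identical distributions and 1 for distributions with no overlap. The function name `hellinger_distance` is an assumption made for illustration.

```python
import math


def hellinger_distance(p, q):
    """Standard Hellinger distance between two discrete distributions.

    p and q are equal-length sequences of probabilities that each sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(squared) / math.sqrt(2.0)


if __name__ == "__main__":
    prior = [0.5, 0.5]          # distribution of an attribute overall
    conditional = [0.9, 0.1]    # distribution after observing some event
    print(hellinger_distance(prior, conditional))
```

One plausible use in pattern generation is to compare the distribution of an attribute before and after conditioning on an earlier event: a large distance suggests the earlier event is informative.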

Finding associations between genes by time-series microarray sequential patterns analysis

Nam, Ho-Jung;Lee, Do-Heon
- Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.161-164 / 2005
Data mining techniques can be applied to identify patterns of interest in gene expression data. One goal in mining gene expression data is to determine how the expression of a particular gene might affect the expression of other genes. To find relationships between different genes, association rules have been applied to gene expression data sets [1]. A notable limitation of association rule mining is that it can detect associations only within a single profile experiment. It cannot be used to find rules across different condition profiles or across experiments at different time points. With the appearance of time-series microarray data, however, it became possible to analyze the temporal relationships between genes. In this paper, we analyze time-series microarray gene expression data to extract sequential patterns, which are similar to association rules between genes across different time points in the yeast cell cycle. The sequential patterns found in our work capture associations between different genes that are expressed or repressed at different time points. We applied a sequential pattern mining method to time-series microarray gene expression data for two groups of genes (test and control), and discovered more sequential patterns in the test group (same GO term group) than in the control group (different GO term group). This result supports the potential of sequential patterns to capture biologically meaningful associations between genes.

Mining High Utility Sequential Patterns Using Sequence Utility Lists (시퀀스 유틸리티 리스트를 사용하여 높은 유틸리티 순차 패턴 탐사 기법)

Park, Jong Soo
- KIPS Transactions on Software and Data Engineering / v.7 no.2 / pp.51-62 / 2018
High utility sequential pattern (HUSP) mining is considered an important research topic in data mining. Although some algorithms have been proposed for this topic, they suffer from producing a large search space for HUSPs. A tighter utility upper bound for a sequence can prune more unpromising patterns early in the search space. In this paper, we propose the sequence expected utility (SEU) as a new utility upper bound for each sequence, defined as the maximum expected utility of a sequence and all its descendant sequences. A sequence utility list for each pattern is used as a new data structure to maintain the essential information for mining HUSPs. We devise an algorithm, high sequence utility list-span (HSUL-Span), to identify HUSPs by employing SEU. Experimental results on both synthetic and real datasets from different domains show that HSUL-Span generates considerably fewer candidate patterns and outperforms other algorithms in terms of execution time.
https://doi.org/10.3745/KTSDE.2018.7.2.51 KSCI

Extracting Maximal Similar Paths between Two XML Documents using Sequential Pattern Mining (순차 패턴 마이닝을 사용한 두 XML 문서간 최대 유사 경로 추출)

이정원;박승수
- Journal of KIISE:Databases / v.31 no.5 / pp.553-566 / 2004
The main current research areas for XML-related techniques include storing XML documents, query optimization, and indexing. We focus instead on sets of documents that have various structures and do not share a common structure such as the same DTD or XML Schema. In this case, it is essential to analyze the structural similarities and differences among many documents. For example, when documents from the Web or an EDMS (Electronic Document Management System) need to be merged or classified, it is very important to find their common structure in order to process them. In this paper, we adapt sequential pattern mining algorithms to extract maximal similar paths between two XML documents. Experiments with XML documents show that our adapted sequential pattern mining algorithms can exactly find the common structures and the maximal similar paths between them. In the analysis of the experimental results, similarity metrics based on maximal similar paths can exactly classify the types of XML documents.
KSCI

Sequential Pattern Mining for Customer Retention in Insurance Industry (보험 고객의 유지를 위한 순차 패턴 마이닝)

Lee, Jae-Sik;Jo, Yu-Jeong
- Proceedings of the Korea Intelligent Information System Society Conference / 2005.05a / pp.274-282 / 2005
Customer retention is one of the major issues in the life insurance industry, in which competition is increasingly fierce. There are many things to do to retain customers, and one of them is to stay continuously in touch with all customers. The objective of this study is to design a contact scheduling system (CSS) to support the planners who must contact customers without having subjective information about them. Support planners suffer from a lack of information that could be used to contact customers in a personal way. The CSS developed in this study generates a contact schedule by taking the existing contact history into account. CSS has a two-stage process. In the first stage, it segments customers according to their demographics and contract status data. In the second stage, it finds typical patterns and combines them with business rules for each segment. We expect that CSS will help support planners give previously uncontacted customers a positive experience.

Sequential Pattern Mining Algorithms with Quantities (정량 정보를 포함한 순차 패턴 마이닝 알고리즘)

Kim, Chul-Yun;Lim, Jong-Hwa;Ng Raymond T.;Shim Kyu-Seok
- Journal of KIISE:Databases / v.33 no.5 / pp.453-462 / 2006
Discovering sequential patterns is an important problem for many applications. Existing algorithms find sequential patterns that contain only items. However, in many applications, such as business and scientific applications, quantitative attributes are often recorded in the data; existing algorithms ignore them, although they can provide useful insight to users. In this paper, we consider the problem of mining sequential patterns with quantities. We demonstrate that naive extensions of existing sequential pattern algorithms are inefficient, as they may enumerate the search space blindly. We therefore propose hash filtering and quantity sampling techniques that significantly improve the performance of the naive extensions. Experimental results confirm that, compared with the naive extensions, these schemes not only improve execution time substantially but also scale better for sequential patterns with quantities.
KSCI

Search Result 82, Processing Time 0.019 seconds
