Search | Korea Science

Sequential Pattern Mining with Optimization Calling MapReduce Function on MapReduce Framework (맵리듀스 프레임웍 상에서 맵리듀스 함수 호출을 최적화하는 순차 패턴 마이닝 기법)

Kim, Jin-Hyun;Shim, Kyu-Seok
- The KIPS Transactions:PartD / v.18D no.2 / pp.81-88 / 2011
Sequential pattern mining, which finds frequent patterns appearing in a given set of sequences, is an important data mining problem with broad applications. For example, sequential pattern mining can find web access patterns, customers' purchase patterns, and DNA sequences related to specific diseases. In this paper, we develop sequential pattern mining algorithms using the MapReduce framework. Our algorithms distribute the input data across several machines and find frequent sequential patterns in parallel. Using synthetic data sets, we conducted a comprehensive performance study while varying several parameters. Our experimental results show that our algorithms achieve linear speedup as the number of machines increases.
https://doi.org/10.3745/KIPSTD.2011.18D.2.081
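The map/shuffle/reduce flow described in this abstract can be illustrated with a minimal single-process sketch. It is an illustration under stated assumptions, not the authors' algorithm: the helper names (`run_job`, `map_items`, `make_pair_mapper`) are hypothetical, a Python dict stands in for the framework's shuffle, and only the first two passes (frequent items, then candidate 2-sequences) are shown. Each sequence is a list of itemsets, and support counts sequences rather than occurrences.

```python
from collections import defaultdict


def run_job(database, mapper, min_support):
    """Simulate one MapReduce job: map, shuffle (group by key), reduce (sum)."""
    groups = defaultdict(list)
    for sequence in database:              # map: each sequence could sit on any machine
        for key, value in mapper(sequence):
            groups[key].append(value)      # shuffle: group intermediate pairs by key
    totals = {key: sum(values) for key, values in groups.items()}  # reduce
    return {key: total for key, total in totals.items() if total >= min_support}


def map_items(sequence):
    # Emit each item once per sequence: support counts sequences, not occurrences.
    for item in {x for itemset in sequence for x in itemset}:
        yield item, 1


def make_pair_mapper(frequent):
    """Build a mapper for candidate 2-sequences <x, y> over frequent items."""
    def map_pairs(sequence):
        seen = set()
        for i, earlier in enumerate(sequence):
            for later in sequence[i + 1:]:
                # x must appear in an itemset strictly before the one holding y.
                seen.update((x, y) for x in earlier for y in later
                            if x in frequent and y in frequent)
        for pair in seen:
            yield pair, 1
    return map_pairs
```

On a real cluster the mapper would run on each machine's share of the input and the framework would perform the grouping; here all three phases run in one process so the counting logic can be checked.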

Early Bone Marrow Edema Pattern of the Osteoporotic Vertebral Compression Fracture : Can Be Predictor of Vertebral Deformity Types and Prognosis?

Ahn, Sung Eun;Ryu, Kyung Nam;Park, Ji Seon;Jin, Wook;Park, So Young;Kim, Sung Bum
- Journal of Korean Neurosurgical Society / v.59 no.2 / pp.137-142 / 2016
Objective : To evaluate whether an early bone marrow edema pattern predicts vertebral deformity types and prognosis in osteoporotic vertebral compression fracture (OVCF). Methods : This retrospective study enrolled 64 patients with 75 acute OVCFs who underwent early MRI and follow-up MRI. On early MRI, the low signal intensity (SI) pattern of each OVCF on T1-weighted images (T1WI) was assessed and classified into 3 types (diffuse, globular or patchy, band-like). On follow-up MRI, the vertebral deformity type (anterior wedge, biconcave, crush), the degree of vertebral body height loss, and the incidence of vertebral osteonecrosis and spinal stenosis were assessed for each fracture type. Results : According to the early bone marrow edema pattern on T1WI, 26 vertebrae were type 1, 14 were type 2, and 35 were type 3. On follow-up MRI, the crush-type deformity was most frequent among type 1 OVCFs, the biconcave-type deformity among type 2 OVCFs, and the anterior wedge-type deformity among type 3 OVCFs (p<0.001). In addition, the type 1 early bone marrow edema pattern on T1WI was associated with a higher incidence of severe vertebral body height loss, vertebral osteonecrosis, and spinal stenosis on follow-up MRI. Conclusion : The early bone marrow edema pattern of OVCF on T1WI correlated significantly with the vertebral deformity type on follow-up MRI. Severe vertebral height loss, vertebral osteonecrosis, and spinal stenosis were more frequent in patients with the diffuse low SI pattern.
https://doi.org/10.3340/jkns.2016.59.2.137

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences

Kang, Tae-Ho;Yoo, Jae-Soo;Kim, Hak-Yong;Lee, Byoung-Yup
- International Journal of Contents / v.3 no.2 / pp.18-24 / 2007
Biological sequences such as DNA and amino acid sequences typically contain a large number of items. They contain contiguous sequences that often consist of hundreds or more frequent items. In biological sequence analysis (BSA), searching for frequent contiguous sequences is one of the most important operations. Many studies have addressed mining sequential patterns efficiently. Most existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, because the algorithm grows sequential patterns starting from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. More recently, the MacosVSpan algorithm was proposed, building on the idea of prefixSpan, to significantly reduce its recursive process. However, that algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method to mine maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we performed experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
https://doi.org/10.5392/IJoC.2007.3.2.018
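As a point of reference, a brute-force baseline for the underlying task (finding contiguous subsequences of a fixed length that occur in enough sequences) can be sketched as below. This is not the proposed spanning-tree method; `frequent_kmers` is a hypothetical helper, and it assumes sequences are plain strings with support counted per sequence rather than per occurrence.

```python
from collections import Counter


def frequent_kmers(sequences, k, min_support):
    """Return length-k contiguous substrings found in at least min_support sequences."""
    counts = Counter()
    for seq in sequences:
        # A set, so a substring repeated inside one sequence is counted once.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {kmer: c for kmer, c in counts.items() if c >= min_support}
```

For `["ACGTAC", "CGTACG", "TTACGT"]` with `k=3` and `min_support=3`, this returns `ACG`, `CGT`, and `TAC`, each with support 3.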

Pattern mining for large distributed dataset: A parallel approach (PMLDD)

Pal, Amrit;Kumar, Manish
- KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.11 / pp.5287-5303 / 2018
Handling the vast amount of data found in large transactional datasets is an obvious challenge for conventional data mining algorithms. To address this challenge, this paper proposes a parallel approach that decomposes the mining problem into sub-problems in order to find frequent patterns in these datasets. The proposed Pattern Mining for Large Distributed Dataset (PMLDD) approach ensures minimal dependencies as well as minimal communication among sub-problems. It aggregates the intermediate results linearly, so it can be adapted to large-scale programming models such as MapReduce. In this context, an algorithmic structure for the MapReduce programming model is presented. PMLDD guarantees efficient load balancing among the sub-problems through a specific selection criterion. Further, it reduces the number of iterations over the dataset required for mining frequent patterns compared with existing approaches. Finally, we believe that the approach is scalable enough to handle larger datasets, as the performance evaluation indicates, and the result analysis supports these claims.
https://doi.org/10.3837/tiis.2018.11.007
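The linear aggregation property mentioned above can be illustrated with a generic partition-and-sum sketch. It assumes each sub-problem is simply a horizontal slice of the transactions and enumerates every itemset up to `max_size`, which is exactly the kind of brute force that PMLDD's selection criterion and load balancing are meant to avoid; `local_counts` and `aggregate` are hypothetical names, not the paper's algorithm.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, max_size):
    """Count every itemset of up to max_size items in one partition (sub-problem)."""
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return counts


def aggregate(partials, min_support):
    """Linear aggregation: global support is the sum of per-partition supports."""
    total = sum(partials, Counter())
    return {itemset: c for itemset, c in total.items() if c >= min_support}
```

Because global support is just the sum of per-partition supports, `local_counts` can run independently on each machine and `aggregate` corresponds naturally to a MapReduce reducer.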

Frequent Pattern Mining By using a Completeness for BigData (빅데이터에 대한 Completeness를 이용한 빈발 패턴 마이닝)

Park, In-Kyu
- Journal of Korea Game Society / v.18 no.2 / pp.121-130 / 2018
Most existing studies use frequency, the number of times a pattern appears in a transaction database, as the key measure of pattern interestingness. This presupposes that any interesting pattern occupies the largest possible portion of the transactions in which it appears. In real-world scenarios, however, the completeness of a pattern is likely to vary across transactions. Hence, we should also consider the problem of finding qualified patterns with significant values of support weighted by completeness, in order to reduce the loss of information about a pattern within a transaction. In pattern recommendation applications, patterns with higher completeness may lead to higher recall, while patterns with higher frequency lead to higher precision. In this paper, we propose a measure of weighted support and completeness and an algorithm, WSCFPM (weighted support and completeness frequent pattern mining). Our algorithm handles the invalidation of the monotone or anti-monotone property, which does not hold for completeness. Extensive performance analysis shows that our algorithm is very efficient and scalable for word pattern mining.
https://doi.org/10.7583/JKGS.2018.18.2.121
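The abstract does not define the measures, so the sketch below assumes one plausible reading: the completeness of a pattern in a transaction is the fraction of the transaction's items that the pattern covers, and weighted support averages that completeness over the whole database (transactions that do not contain the pattern contribute zero). Under this assumption the example shows why anti-monotonicity fails, since a superset can score higher than its subset. `completeness` and `weighted_support` are hypothetical helpers, not WSCFPM's definitions.

```python
from fractions import Fraction


def completeness(pattern, transaction):
    """Assumed definition: share of the transaction's items covered by the pattern."""
    return Fraction(len(pattern), len(transaction))


def weighted_support(pattern, database):
    """Assumed definition: completeness summed over transactions that contain
    the pattern, divided by the number of transactions in the database."""
    pattern = frozenset(pattern)
    total = sum((completeness(pattern, t) for t in map(frozenset, database)
                 if pattern <= t), Fraction(0))
    return total / len(database)
```

With `db = [{"a", "b"}, {"a", "b", "c", "d"}]`, the pattern `{"a"}` scores 3/8 while its superset `{"a", "b"}` scores 3/4, so Apriori-style pruning on this measure alone would discard valid patterns.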

An Efficient Approach to Mining Maximal Contiguous Frequent Patterns from Large DNA Sequence Databases

Karim, Md. Rezaul;Rashid, Md. Mamunur;Jeong, Byeong-Soo;Choi, Ho-Jin
- Genomics & Informatics / v.10 no.1 / pp.51-57 / 2012
Mining interesting patterns from DNA sequences is one of the most challenging tasks in bioinformatics and computational biology. Maximal contiguous frequent patterns are preferable for expressing the function and structure of DNA sequences and hence can capture the common data characteristics among related sequences. Biologists are interested in finding frequent orderly arrangements of motifs that are responsible for similar expression of a group of genes. In order to reduce mining time and complexity, however, most existing sequence mining algorithms either focus on finding short DNA sequences or require explicit specification of sequence lengths in advance. The challenge is to find longer sequences without specifying sequence lengths in advance. In this paper, we propose an efficient approach to mining maximal contiguous frequent patterns from large DNA sequence datasets. The experimental results show that our proposed approach is memory-efficient and mines maximal contiguous frequent patterns within a reasonable time.
https://doi.org/10.5808/GI.2012.10.1.51
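The goal of mining maximal contiguous frequent patterns without fixing their length in advance can be sketched with a simple level-wise search. This is a naive illustration, not the authors' approach: it assumes sequences are strings (DNA letters or any other characters), `maximal_contiguous_patterns` is a hypothetical name, `min_support` must be at least 1, and support is recomputed by substring search, which is far too slow for genome-scale data.

```python
def maximal_contiguous_patterns(sequences, min_support):
    """Return maximal contiguous patterns (substrings) that occur in at least
    min_support sequences, without fixing the pattern length in advance.
    Assumes min_support >= 1, which guarantees termination."""
    def support(pattern):
        # Support counts sequences containing the pattern, not occurrences.
        return sum(1 for s in sequences if pattern in s)

    alphabet = {c for s in sequences for c in s}
    level = {c for c in alphabet if support(c) >= min_support}
    frequent = set(level)
    while level:
        # If p + c occurs in a sequence, so does p, so every frequent pattern
        # of length k + 1 extends some frequent pattern of length k.
        candidates = {p + c for p in level for c in alphabet}
        level = {p for p in candidates if support(p) >= min_support}
        frequent |= level
    # Maximal: not a substring of any other frequent pattern.
    return {p for p in frequent if not any(p != q and p in q for q in frequent)}
```

For `["ACGTAC", "CGTACG", "TTACGT"]` with `min_support=3`, the search stops at length 3 and returns `{"ACG", "CGT", "TAC"}`; the length is discovered rather than specified.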

Constructing Gene Regulatory Networks using Frequent Gene Expression Pattern and Chain Rules (빈발 유전자 발현 패턴과 연쇄 규칙을 이용한 유전자 조절 네트워크 구축)

Lee, Heon-Gyu;Ryu, Keun-Ho;Joung, Doo-Young
- The KIPS Transactions:PartD / v.14D no.1 s.111 / pp.9-20 / 2007
Groups of genes control the functioning of a cell through complex interactions. Such interactions of gene groups are called Gene Regulatory Networks (GRNs). Two data mining approaches, clustering and classification, have previously been used to analyze gene expression data. Although these tools are useful for determining gene membership by homology, they do not identify the regulatory relationships among genes found in the same class of molecular actions. Furthermore, we need to understand how genes relate to and regulate one another. To detect regulatory relationships among genes from time-series microarray data, we propose a novel approach using frequent pattern mining and chain rules. In this approach, we propose a method for transforming gene expression data to make it suitable for frequent pattern mining, and gene expression patterns are then detected by applying the FP-growth algorithm. Next, we construct a gene regulatory network from the frequent gene patterns using chain rules. Finally, we validate the proposed method through experimental results, which are consistent with published results.
https://doi.org/10.3745/KIPSTD.2007.14-D.1.009
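The steps before and after FP-growth can be sketched as follows, under stated assumptions. The discretization scheme (one transaction per time transition, with items such as `g1+` for up-regulation and `g1-` for down-regulation) and the reading of chain rules as linking pairwise rules whose consequent matches another rule's antecedent are both hypothetical; the FP-growth step itself is omitted, and `to_transactions` and `chain` are illustrative names rather than the authors' method.

```python
def to_transactions(expression, threshold):
    """Turn {gene: [values over time]} into one transaction per time transition.
    Items are gene + '+' (rise >= threshold) or gene + '-' (fall >= threshold);
    smaller changes are treated as steady and produce no item."""
    n_steps = len(next(iter(expression.values())))
    transactions = []
    for t in range(1, n_steps):
        items = set()
        for gene, values in expression.items():
            delta = values[t] - values[t - 1]
            if delta >= threshold:
                items.add(gene + "+")
            elif delta <= -threshold:
                items.add(gene + "-")
        transactions.append(items)
    return transactions


def chain(rules):
    """Link pairwise rules (a, b) into chains a -> b -> c. Chains start at items
    that never appear as a consequent, so purely cyclic rule sets yield nothing."""
    successors = {}
    for a, b in rules:
        successors.setdefault(a, []).append(b)
    chains = []

    def extend(path):
        nexts = [n for n in successors.get(path[-1], []) if n not in path]
        if not nexts:
            chains.append(path)
        for n in nexts:
            extend(path + [n])

    starts = set(successors) - {b for _, b in rules}
    for s in sorted(starts):
        extend([s])
    return chains
```

In a full pipeline, the transactions would be fed to an FP-growth implementation, and the resulting pairwise rules between gene items would be passed to `chain`.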

Discovering Association Rules using Item Clustering on Frequent Pattern Network (빈발 패턴 네트워크에서 아이템 클러스터링을 통한 연관규칙 발견)

Oh, Kyeong-Jin;Jung, Jin-Guk;Ha, In-Ay;Jo, Geun-Sik
- Journal of Intelligence and Information Systems / v.14 no.1 / pp.1-17 / 2008
Data mining is defined as the process of discovering meaningful and useful patterns in large volumes of data. In particular, finding association rules between items in a database of customer transactions has become an important task. Since the Apriori algorithm, several data structures and algorithms have been proposed for storing meaningful information compressed from the original database in order to find frequent itemsets. Although existing methods find all association rules, analyzing them requires considerable processing because there are too many rules. In this paper, we propose a new data structure, called a Frequent Pattern Network (FPN), which represents items as vertices and 2-itemsets as edges of the network. To use the FPN, we construct it from item frequencies. We then use a clustering method to group the vertices of the network into clusters so that intracluster similarity is maximized and intercluster similarity is minimized. We generate association rules based on these clusters. Our experiments evaluated the accuracy of clustering items on the network using confidence, correlation, and edge weight similarity methods. We also generated association rules from the clusters and compared the traditional method with ours. The results show that confidence similarity had a stronger influence on the frequent pattern network than the other measures, and that the FPN was flexible with respect to the minimum support value.
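A Frequent Pattern Network as described (items as vertices, 2-itemsets as edges) can be sketched in a few functions. Assumptions: only items and 2-itemsets meeting `min_support` are kept, the confidence similarity of an edge is taken as max(conf(a->b), conf(b->a)), and the paper's clustering method is replaced by a simple stand-in (connected components over edges whose similarity reaches a threshold). All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_fpn(transactions, min_support):
    """Vertices: frequent items with counts. Edges: frequent 2-itemsets with counts."""
    item_counts = Counter(i for t in transactions for i in set(t))
    vertices = {i: c for i, c in item_counts.items() if c >= min_support}
    pair_counts = Counter(
        pair for t in transactions
        for pair in combinations(sorted(set(t).intersection(vertices)), 2))
    edges = {p: c for p, c in pair_counts.items() if c >= min_support}
    return vertices, edges


def confidence_similarity(vertices, edges, pair):
    # Assumed: max(conf(a->b), conf(b->a)) = count(a, b) / min(count(a), count(b)).
    a, b = pair
    return edges[pair] / min(vertices[a], vertices[b])


def cluster(vertices, edges, threshold):
    """Stand-in clustering: union-find over edges whose similarity >= threshold."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for pair in edges:
        if confidence_similarity(vertices, edges, pair) >= threshold:
            parent[find(pair[0])] = find(pair[1])
    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=sorted)
```

Rules can then be generated only within each cluster, which is the mechanism the abstract relies on to cut down the number of rules to analyze.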

I-Tree: A Frequent Patterns Mining Approach without Candidate Generation or Support Constraint

Tanbeer, Syed Khairuzzaman;Sarkar, Jehad;Jeong, Byeong-Soo;Lee, Young-Koo;Lee, Sung-Young
- Proceedings of the Korea Information Processing Society Conference / 2007.05a / pp.31-33 / 2007
Devising an efficient one-pass frequent pattern mining algorithm has been an issue in data mining research in the recent past. Pattern growth algorithms such as FP-Growth, which have been found more efficient than candidate generation-and-test algorithms, still require two database scans. Moreover, the FP-growth approach requires rebuilding the base tree when mining with different support counts. In this paper we propose an item-based tree, called I-Tree, that not only mines frequent patterns efficiently with a single database scan but also provides multiple mining scopes with multiple support thresholds. The 'build-once-mine-many' property of I-Tree allows it to construct the tree only once and perform the mining operation several times with varying support count values.
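The 'build-once-mine-many' property can be illustrated with a different, well-known structure: a vertical index (item to set of transaction ids) built in a single scan and mined Eclat-style for any threshold. This is not the I-Tree; `build_index` and `mine` are hypothetical helpers used only to show one structure being built once and reused for several support counts.

```python
def build_index(transactions):
    """Single database scan: map each item to the ids of transactions containing it."""
    index = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            index.setdefault(item, set()).add(tid)
    return index


def mine(index, min_support):
    """Depth-first (Eclat-style) mining; the index is read, never rebuilt."""
    results = {}

    def grow(prefix, tids, candidates):
        for i, (item, item_tids) in enumerate(candidates):
            new_tids = tids & item_tids
            if len(new_tids) >= min_support:
                itemset = prefix + (item,)
                results[itemset] = len(new_tids)
                # Supersets of an infrequent itemset cannot be frequent, so we
                # only recurse from frequent ones.
                grow(itemset, new_tids, candidates[i + 1:])

    all_tids = set().union(*index.values())
    grow((), all_tids, sorted(index.items()))
    return results
```

Calling `mine(index, 2)` and then `mine(index, 3)` reuses the same index; no second database scan or rebuild is needed.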

An Efficient Method for Mining Frequent Patterns based on Weighted Support over Data Streams (데이터 스트림에서 가중치 지지도 기반 빈발 패턴 추출 방법)

Kim, Young-Hee;Kim, Won-Young;Kim, Ung-Mo
- Journal of the Korea Academia-Industrial cooperation Society / v.10 no.8 / pp.1998-2004 / 2009
Recently, due to technical developments in storage devices and networks, the amount of data has been increasing rapidly. Large volumes of data streams impose unique space and time constraints on the data mining process. The continuous nature of streaming data necessitates algorithms that require only one scan over the stream for knowledge discovery. Most support-based research focuses on frequent itemsets and ignores infrequent itemsets, even when they are crucial. In this paper, we propose an efficient method, WSFI-Mine (Weighted Support Frequent Itemsets Mine), to mine all frequent itemsets with one scan of the data stream. This method can discover closed frequent itemsets using the DCT (Data Stream Closed Pattern Tree). We compare the performance of our algorithm with DSM-FI and THUI-Mine under different minimum supports. The results show that WSFI-Mine not only runs significantly faster but also consumes less memory.
https://doi.org/10.5762/KAIS.2009.10.8.1998
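One-pass weighted support counting over a stream can be sketched as below. The measure is an assumption (weighted support = mean item weight times relative support, a common definition in weighted itemset mining), itemsets are enumerated only up to `max_size`, and WSFI-Mine's DCT tree and closed-itemset detection are not modeled. `StreamWeightedSupport` is a hypothetical class, and every item in the stream must have an entry in `weights`.

```python
from collections import Counter
from itertools import combinations


class StreamWeightedSupport:
    """Single pass over a transaction stream, counting itemsets up to max_size.
    Assumed measure: weighted support = mean item weight * relative support."""

    def __init__(self, weights, max_size=2):
        self.weights = weights      # item -> weight; must cover every streamed item
        self.max_size = max_size
        self.counts = Counter()
        self.seen = 0

    def add(self, transaction):
        # Each transaction is processed exactly once and then discarded.
        self.seen += 1
        items = sorted(set(transaction))
        for size in range(1, self.max_size + 1):
            self.counts.update(combinations(items, size))

    def weighted_support(self, itemset):
        itemset = tuple(sorted(itemset))
        mean_weight = sum(self.weights[i] for i in itemset) / len(itemset)
        return mean_weight * self.counts[itemset] / self.seen

    def frequent(self, min_wsupport):
        scores = {s: self.weighted_support(s) for s in self.counts}
        return {s: w for s, w in scores.items() if w >= min_wsupport}
```

Because weights enter the score, a rarely occurring but heavily weighted item can still pass the threshold, which plain support counting would miss.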
