• Title, Summary, Keyword: 클러스터링

Search Result 2,163, Processing Time 0.047 seconds

Digital Archives of Cultural Archetype Contents: Its Problems and Direction (디지털 아카이브즈의 문제점과 방향 - 문화원형 콘텐츠를 중심으로 -)

  • Hahm, Han-Hee;Park, Soon-Cheol
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.17 no.2
    • /
    • pp.23-42
    • /
    • 2006
  • This is a study of the digital archives of Culturecontent.com where 'Cultural Archetype Contents' are currently in service. One of the major purposes of our study is to point out problems in the current system and eventually propose improvements to the digital archives. The government launched a four-year project for developing the cultural archetype content sources and establishing its related business with the hope of enhancing the nation's competitiveness. More specifically, the project focuses on the production of source materials of cultural archetype contents in the subjects of Korea's history. tradition, everyday life. arts and general geographical books. In addition, through this project, the government also intends to establish a proper distribution system of digitalized culture contents and to control copyright issues. This paper analyzes the digital archives system that stores the culture content data that have been produced from 2002 to 2005 and evaluates the current system's weaknesses and strengths. The summary of our findings is as follows. First. the digital archives system does not contain a semantic search engine and therefore its full function is 1agged. Second, similar data is not classified into the same categories but into the different ones, thereby confusing and inconveniencing users. Users who want to find source materials could be disappointed by the current distributive system. Our paper suggests a better system of digital archives with text mining technology which consists of five significant intelligent process-keyword searches, summarization, clustering, classification and topic tracking. Our paper endeavors to develop the best technical environment for preserving and using culture contents data. With the new digitalized upgraded settings, users of culture contents data will discover a world of new knowledge. The technology we introduce in this paper will lead to the highest achievable digital intelligence through a new framework.

  • PDF

The Habitat Classification of mammals in Korea based on the National Ecosystem Survey (전국자연환경조사를 활용한 포유류 서식지 유형의 분류)

  • Lee, Hwajin;Ha, Jeongwook;Cha, Jinyeol;Lee, Junghyo;Yoon, Heenam;Chung, Chulun;Oh, Hongshik;Bae, Soyeon
    • Journal of Environmental Impact Assessment
    • /
    • v.26 no.2
    • /
    • pp.160-170
    • /
    • 2017
  • The purpose of this study is to perform clustering of the habitat types and to identify the characteristics of species in the habitat types using mammal data (70,562) of the 3rd National Ecosystem Survey conducted from 2006 to 2012. The 15 habitat types recorded in the field-paper of the 3rd National ecosystem survey were reclassified, which was followed by the statistical analysis of mammal habitat types. In the habitat types cluster analysis, non-hierarchical cluster analysis (k-means cluster analysis), hierarchical cluster analysis, and non-metric multidimensional scaling method were applied to 14 habitat types recorded more than 30 times. A total of 7 Orders, 16 Families, and 39 Species of mammals were identified in the 3rd National Ecosystem Survey collected nationwide. When 11 clusters were classified by habitat types, the simple structure index was the highest (ssi = 0.07). As a result of the similarities and hierarchies between habitat types suggested by the hierarchical clustering analysis, the residential areas were the most different habitat types for mammals; the next following type was a cluster together with rivers and coasts. The results of the non-metric multidimensional scaling analysis demonstrated that both Mus musculus and Rattus norvegicus restrictively appeared in a residential area, which is the most discriminating habitat type. Lutra lutra restrictively appeared in coastal and river areas. In summary, according to our results, the mammalian habitat can be divided into the following four types: (1) the forest type (using forest as the main habitat and migration route); (2) the river type (using water as the main habitat); (3) the residence habitat (living near residential area); and (4) the lowland type (consuming grain or seeds as the main feeding resource).

Personalized Recommendation System for IPTV using Ontology and K-medoids (IPTV환경에서 온톨로지와 k-medoids기법을 이용한 개인화 시스템)

  • Yun, Byeong-Dae;Kim, Jong-Woo;Cho, Yong-Seok;Kang, Sang-Gil
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.3
    • /
    • pp.147-161
    • /
    • 2010
  • As broadcasting and communication are converged recently, communication is jointed to TV. TV viewing has brought about many changes. The IPTV (Internet Protocol Television) provides information service, movie contents, broadcast, etc. through internet with live programs + VOD (Video on demand) jointed. Using communication network, it becomes an issue of new business. In addition, new technical issues have been created by imaging technology for the service, networking technology without video cuts, security technologies to protect copyright, etc. Through this IPTV network, users can watch their desired programs when they want. However, IPTV has difficulties in search approach, menu approach, or finding programs. Menu approach spends a lot of time in approaching programs desired. Search approach can't be found when title, genre, name of actors, etc. are not known. In addition, inserting letters through remote control have problems. However, the bigger problem is that many times users are not usually ware of the services they use. Thus, to resolve difficulties when selecting VOD service in IPTV, a personalized service is recommended, which enhance users' satisfaction and use your time, efficiently. This paper provides appropriate programs which are fit to individuals not to save time in order to solve IPTV's shortcomings through filtering and recommendation-related system. The proposed recommendation system collects TV program information, the user's preferred program genres and detailed genre, channel, watching program, and information on viewing time based on individual records of watching IPTV. To look for these kinds of similarities, similarities can be compared by using ontology for TV programs. The reason to use these is because the distance of program can be measured by the similarity comparison. TV program ontology we are using is one extracted from TV-Anytime metadata which represents semantic nature. Also, ontology expresses the contents and features in figures. Through world net, vocabulary similarity is determined. All the words described on the programs are expanded into upper and lower classes for word similarity decision. The average of described key words was measured. The criterion of distance calculated ties similar programs through K-medoids dividing method. K-medoids dividing method is a dividing way to divide classified groups into ones with similar characteristics. This K-medoids method sets K-unit representative objects. Here, distance from representative object sets temporary distance and colonize it. Through algorithm, when the initial n-unit objects are tried to be divided into K-units. The optimal object must be found through repeated trials after selecting representative object temporarily. Through this course, similar programs must be colonized. Selecting programs through group analysis, weight should be given to the recommendation. The way to provide weight with recommendation is as the follows. When each group recommends programs, similar programs near representative objects will be recommended to users. The formula to calculate the distance is same as measure similar distance. It will be a basic figure which determines the rankings of recommended programs. Weight is used to calculate the number of watching lists. As the more programs are, the higher weight will be loaded. This is defined as cluster weight. Through this, sub-TV programs which are representative of the groups must be selected. The final TV programs ranks must be determined. However, the group-representative TV programs include errors. Therefore, weights must be added to TV program viewing preference. They must determine the finalranks.Based on this, our customers prefer proposed to recommend contents. So, based on the proposed method this paper suggested, experiment was carried out in controlled environment. Through experiment, the superiority of the proposed method is shown, compared to existing ways.

Study on the Development of Measuring System for Fermentation Degree of Liquid Swine Manure Using Visible Ray (가시광선을 이용한 돈분뇨 액비 부숙도 측정장치 개발에 관한 연구)

  • Choi, D.Y.;Kwag, J.H.;Park, K.H.;Song, J.I.;Kim, J.H.;Kang, H.S.;Han, C.B.;Choi, S.W.;Lee, C.S.
    • Journal of Animal Environmental Science
    • /
    • v.16 no.3
    • /
    • pp.227-236
    • /
    • 2010
  • This study was conducted to develop an measuring system and method for fermentation degree of liquid swine manure by visible ray. The constituent changes of liquid swine manure were examined. pH gradually increased with time, but EC gradually decreased. Malodor strength decreased gradually with aeration treatment with time. Control needed more time to decrease malodor strength than aeration treatment. In aeration treatment, there was no germination of seeds (radish, chinese cabbage) up to 6 weeks and germination rate at 15th week was over 50%. However, in control, there was no germination up to end of experiment. Circular chromatography method showed that there was change after 10th week in aeration treatment but there was no change up to end of experiment in control. As a result, the fermentation degree of liquid swine manure would have relations among pH, EC, germination rate, malodor concentration, and reaction of circular chromatography. The simple analytical instrument for liquid swine manure consisted of a tungsten halogen and deuterium lamp for light source, a sample holder, a quartz cell, spectrometer for spectrum analyzer, a malodour measuring device, a software, etc. Results showed that the simple analytical instrument that was developed can approximately predict the fermentation degree of liquid swine manure by visible ray. Generally, the experiment proved that the simple analytical instrument was reliable, feasible and practical for analyzing the fermentation degree of liquid swine manure.

An Extraction Way of Benchmarking Ports through Tier Analysis for Korean Seaports (Tier분석을 통한 벤치마킹항만 적출방법)

  • Park, Ro-Kyung
    • Journal of Korea Port Economic Association
    • /
    • v.25 no.1
    • /
    • pp.15-28
    • /
    • 2009
  • The purpose of this paper is to show the empirical extraction way of benchmarking ports for overcoming the shortcoming which the traditional DEA method has by using 20 Korean ports in 2003 for 2 inputs (birthing capacity, cargo handling capacity) and 2 outputs(Export and Import Quantity, Number of Ship Calls). Because DEA method has produced the limited set of efficient units which are reference to inefficient units respective of their differences in efficiency scores, it is necessary to adopt the more feasible benchmarking information according to the path analysis(tier or stratification). The core empirical results of this paper are as follows. Benchmarking ports against inefficient ports according to the tier analysis are that Masan Port(Janghang$\rightarrow$Jeju$\rightarrow$Seogoipo$\rightarrow$Yeosu), Jinhae Port(Janghang$\rightarrow$Mogpo$\rightarrow$Seogoipo$\rightarrow$Wando), Pohang&DonghaePort(Janghang$\rightarrow$Samcheonpo$\rightarrow$Pyungtag$\rightarrow$Samcheog), and Sogcho Port(Janghang$\rightarrow$Mogpo$\rightarrow$Seogoipo$\rightarrow$Wando). The policy implication to the Korean seaports and planners is that Korean seaports should introduce the new methods like Tier analysis of this paper for evaluating the port performance and enhancing the efficiency in short term, mid term, and long term according to the tier 3 stage, the tier 2 stage, and the tier 1 stage with original DEA stage.

  • PDF

Predictive Clustering-based Collaborative Filtering Technique for Performance-Stability of Recommendation System (추천 시스템의 성능 안정성을 위한 예측적 군집화 기반 협업 필터링 기법)

  • Lee, O-Joun;You, Eun-Soon
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.119-142
    • /
    • 2015
  • With the explosive growth in the volume of information, Internet users are experiencing considerable difficulties in obtaining necessary information online. Against this backdrop, ever-greater importance is being placed on a recommender system that provides information catered to user preferences and tastes in an attempt to address issues associated with information overload. To this end, a number of techniques have been proposed, including content-based filtering (CBF), demographic filtering (DF) and collaborative filtering (CF). Among them, CBF and DF require external information and thus cannot be applied to a variety of domains. CF, on the other hand, is widely used since it is relatively free from the domain constraint. The CF technique is broadly classified into memory-based CF, model-based CF and hybrid CF. Model-based CF addresses the drawbacks of CF by considering the Bayesian model, clustering model or dependency network model. This filtering technique not only improves the sparsity and scalability issues but also boosts predictive performance. However, it involves expensive model-building and results in a tradeoff between performance and scalability. Such tradeoff is attributed to reduced coverage, which is a type of sparsity issues. In addition, expensive model-building may lead to performance instability since changes in the domain environment cannot be immediately incorporated into the model due to high costs involved. Cumulative changes in the domain environment that have failed to be reflected eventually undermine system performance. This study incorporates the Markov model of transition probabilities and the concept of fuzzy clustering with CBCF to propose predictive clustering-based CF (PCCF) that solves the issues of reduced coverage and of unstable performance. The method improves performance instability by tracking the changes in user preferences and bridging the gap between the static model and dynamic users. Furthermore, the issue of reduced coverage also improves by expanding the coverage based on transition probabilities and clustering probabilities. The proposed method consists of four processes. First, user preferences are normalized in preference clustering. Second, changes in user preferences are detected from review score entries during preference transition detection. Third, user propensities are normalized using patterns of changes (propensities) in user preferences in propensity clustering. Lastly, the preference prediction model is developed to predict user preferences for items during preference prediction. The proposed method has been validated by testing the robustness of performance instability and scalability-performance tradeoff. The initial test compared and analyzed the performance of individual recommender systems each enabled by IBCF, CBCF, ICFEC and PCCF under an environment where data sparsity had been minimized. The following test adjusted the optimal number of clusters in CBCF, ICFEC and PCCF for a comparative analysis of subsequent changes in the system performance. The test results revealed that the suggested method produced insignificant improvement in performance in comparison with the existing techniques. In addition, it failed to achieve significant improvement in the standard deviation that indicates the degree of data fluctuation. Notwithstanding, it resulted in marked improvement over the existing techniques in terms of range that indicates the level of performance fluctuation. The level of performance fluctuation before and after the model generation improved by 51.31% in the initial test. Then in the following test, there has been 36.05% improvement in the level of performance fluctuation driven by the changes in the number of clusters. This signifies that the proposed method, despite the slight performance improvement, clearly offers better performance stability compared to the existing techniques. Further research on this study will be directed toward enhancing the recommendation performance that failed to demonstrate significant improvement over the existing techniques. The future research will consider the introduction of a high-dimensional parameter-free clustering algorithm or deep learning-based model in order to improve performance in recommendations.

A Study on the Impact Factors of Contents Diffusion in Youtube using Integrated Content Network Analysis (일반영향요인과 댓글기반 콘텐츠 네트워크 분석을 통합한 유튜브(Youtube)상의 콘텐츠 확산 영향요인 연구)

  • Park, Byung Eun;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.3
    • /
    • pp.19-36
    • /
    • 2015
  • Social media is an emerging issue in content services and in current business environment. YouTube is the most representative social media service in the world. YouTube is different from other conventional content services in its open user participation and contents creation methods. To promote a content in YouTube, it is important to understand the diffusion phenomena of contents and the network structural characteristics. Most previous studies analyzed impact factors of contents diffusion from the view point of general behavioral factors. Currently some researchers use network structure factors. However, these two approaches have been used separately. However this study tries to analyze the general impact factors on the view count and content based network structures all together. In addition, when building a content based network, this study forms the network structure by analyzing user comments on 22,370 contents of YouTube not based on the individual user based network. From this study, we re-proved statistically the causal relations between view count and not only general factors but also network factors. Moreover by analyzing this integrated research model, we found that these factors affect the view count of YouTube according to the following order; Uploader Followers, Video Age, Betweenness Centrality, Comments, Closeness Centrality, Clustering Coefficient and Rating. However Degree Centrality and Eigenvector Centrality affect the view count negatively. From this research some strategic points for the utilizing of contents diffusion are as followings. First, it is needed to manage general factors such as the number of uploader followers or subscribers, the video age, the number of comments, average rating points, and etc. The impact of average rating points is not so much important as we thought before. However, it is needed to increase the number of uploader followers strategically and sustain the contents in the service as long as possible. Second, we need to pay attention to the impacts of betweenness centrality and closeness centrality among other network factors. Users seems to search the related subject or similar contents after watching a content. It is needed to shorten the distance between other popular contents in the service. Namely, this study showed that it is beneficial for increasing view counts by decreasing the number of search attempts and increasing similarity with many other contents. This is consistent with the result of the clustering coefficient impact analysis. Third, it is important to notice the negative impact of degree centrality and eigenvector centrality on the view count. If the number of connections with other contents is too much increased it means there are many similar contents and eventually it might distribute the view counts. Moreover, too high eigenvector centrality means that there are connections with popular contents around the content, and it might lose the view count because of the impact of the popular contents. It would be better to avoid connections with too powerful popular contents. From this study we analyzed the phenomenon and verified diffusion factors of Youtube contents by using an integrated model consisting of general factors and network structure factors. From the viewpoints of social contribution, this study might provide useful information to music or movie industry or other contents vendors for their effective contents services. This research provides basic schemes that can be applied strategically in online contents marketing. One of the limitations of this study is that this study formed a contents based network for the network structure analysis. It might be an indirect method to see the content network structure. We can use more various methods to establish direct content network. Further researches include more detailed researches like an analysis according to the types of contents or domains or characteristics of the contents or users, and etc.

Development of Customer Sentiment Pattern Map for Webtoon Content Recommendation (웹툰 콘텐츠 추천을 위한 소비자 감성 패턴 맵 개발)

  • Lee, Junsik;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.67-88
    • /
    • 2019
  • Webtoon is a Korean-style digital comics platform that distributes comics content produced using the characteristic elements of the Internet in a form that can be consumed online. With the recent rapid growth of the webtoon industry and the exponential increase in the supply of webtoon content, the need for effective webtoon content recommendation measures is growing. Webtoons are digital content products that combine pictorial, literary and digital elements. Therefore, webtoons stimulate consumer sentiment by making readers have fun and engaging and empathizing with the situations in which webtoons are produced. In this context, it can be expected that the sentiment that webtoons evoke to consumers will serve as an important criterion for consumers' choice of webtoons. However, there is a lack of research to improve webtoons' recommendation performance by utilizing consumer sentiment. This study is aimed at developing consumer sentiment pattern maps that can support effective recommendations of webtoon content, focusing on consumer sentiments that have not been fully discussed previously. Metadata and consumer sentiments data were collected for 200 works serviced on the Korean webtoon platform 'Naver Webtoon' to conduct this study. 488 sentiment terms were collected for 127 works, excluding those that did not meet the purpose of the analysis. Next, similar or duplicate terms were combined or abstracted in accordance with the bottom-up approach. As a result, we have built webtoons specialized sentiment-index, which are reduced to a total of 63 emotive adjectives. By performing exploratory factor analysis on the constructed sentiment-index, we have derived three important dimensions for classifying webtoon types. The exploratory factor analysis was performed through the Principal Component Analysis (PCA) using varimax factor rotation. The three dimensions were named 'Immersion', 'Touch' and 'Irritant' respectively. Based on this, K-Means clustering was performed and the entire webtoons were classified into four types. Each type was named 'Snack', 'Drama', 'Irritant', and 'Romance'. For each type of webtoon, we wrote webtoon-sentiment 2-Mode network graphs and looked at the characteristics of the sentiment pattern appearing for each type. In addition, through profiling analysis, we were able to derive meaningful strategic implications for each type of webtoon. First, The 'Snack' cluster is a collection of webtoons that are fast-paced and highly entertaining. Many consumers are interested in these webtoons, but they don't rate them well. Also, consumers mostly use simple expressions of sentiment when talking about these webtoons. Webtoons belonging to 'Snack' are expected to appeal to modern people who want to consume content easily and quickly during short travel time, such as commuting time. Secondly, webtoons belonging to 'Drama' are expected to evoke realistic and everyday sentiments rather than exaggerated and light comic ones. When consumers talk about webtoons belonging to a 'Drama' cluster in online, they are found to express a variety of sentiments. It is appropriate to establish an OSMU(One source multi-use) strategy to extend these webtoons to other content such as movies and TV series. Third, the sentiment pattern map of 'Irritant' shows the sentiments that discourage customer interest by stimulating discomfort. Webtoons that evoke these sentiments are hard to get public attention. Artists should pay attention to these sentiments that cause inconvenience to consumers in creating webtoons. Finally, Webtoons belonging to 'Romance' do not evoke a variety of consumer sentiments, but they are interpreted as touching consumers. They are expected to be consumed as 'healing content' targeted at consumers with high levels of stress or mental fatigue in their lives. The results of this study are meaningful in that it identifies the applicability of consumer sentiment in the areas of recommendation and classification of webtoons, and provides guidelines to help members of webtoons' ecosystem better understand consumers and formulate strategies.

Development of Beauty Experience Pattern Map Based on Consumer Emotions: Focusing on Cosmetics (소비자 감성 기반 뷰티 경험 패턴 맵 개발: 화장품을 중심으로)

  • Seo, Bong-Goon;Kim, Keon-Woo;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.179-196
    • /
    • 2019
  • Recently, the "Smart Consumer" has been emerging. He or she is increasingly inclined to search for and purchase products by taking into account personal judgment or expert reviews rather than by relying on information delivered through manufacturers' advertising. This is especially true when purchasing cosmetics. Because cosmetics act directly on the skin, consumers respond seriously to dangerous chemical elements they contain or to skin problems they may cause. Above all, cosmetics should fit well with the purchaser's skin type. In addition, changes in global cosmetics consumer trends make it necessary to study this field. The desire to find one's own individualized cosmetics is being revealed to consumers around the world and is known as "Finding the Holy Grail." Many consumers show a deep interest in customized cosmetics with the cultural boom known as "K-Beauty" (an aspect of "Han-Ryu"), the growth of personal grooming, and the emergence of "self-culture" that includes "self-beauty" and "self-interior." These trends have led to the explosive popularity of cosmetics made in Korea in the Chinese and Southeast Asian markets. In order to meet the customized cosmetics needs of consumers, cosmetics manufacturers and related companies are responding by concentrating on delivering premium services through the convergence of ICT(Information, Communication and Technology). Despite the evolution of companies' responses regarding market trends toward customized cosmetics, there is no "Intelligent Data Platform" that deals holistically with consumers' skin condition experience and thus attaches emotions to products and services. To find the Holy Grail of customized cosmetics, it is important to acquire and analyze consumer data on what they want in order to address their experiences and emotions. The emotions consumers are addressing when purchasing cosmetics varies by their age, sex, skin type, and specific skin issues and influences what price is considered reasonable. Therefore, it is necessary to classify emotions regarding cosmetics by individual consumer. Because of its importance, consumer emotion analysis has been used for both services and products. Given the trends identified above, we judge that consumer emotion analysis can be used in our study. Therefore, we collected and indexed data on consumers' emotions regarding their cosmetics experiences focusing on consumers' language. We crawled the cosmetics emotion data from SNS (blog and Twitter) according to sales ranking ($1^{st}$ to $99^{th}$), focusing on the ample/serum category. A total of 357 emotional adjectives were collected, and we combined and abstracted similar or duplicate emotional adjectives. We conducted a "Consumer Sentiment Journey" workshop to build a "Consumer Sentiment Dictionary," and this resulted in a total of 76 emotional adjectives regarding cosmetics consumer experience. Using these 76 emotional adjectives, we performed clustering with the Self-Organizing Map (SOM) method. As a result of the analysis, we derived eight final clusters of cosmetics consumer sentiments. Using the vector values of each node for each cluster, the characteristics of each cluster were derived based on the top ten most frequently appearing consumer sentiments. Different characteristics were found in consumer sentiments in each cluster. We also developed a cosmetics experience pattern map. The study results confirmed that recommendation and classification systems that consider consumer emotions and sentiments are needed because each consumer differs in what he or she pursues and prefers. Furthermore, this study reaffirms that the application of emotion and sentiment analysis can be extended to various fields other than cosmetics, and it implies that consumer insights can be derived using these methods. They can be used not only to build a specialized sentiment dictionary using scientific processes and "Design Thinking Methodology," but we also expect that these methods can help us to understand consumers' psychological reactions and cognitive behaviors. If this study is further developed, we believe that it will be able to provide solutions based on consumer experience, and therefore that it can be developed as an aspect of marketing intelligence.

Analysis of shopping website visit types and shopping pattern (쇼핑 웹사이트 탐색 유형과 방문 패턴 분석)

  • Choi, Kyungbin;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.85-107
    • /
    • 2019
  • Online consumers browse products belonging to a particular product line or brand for purchase, or simply leave a wide range of navigation without making purchase. The research on the behavior and purchase of online consumers has been steadily progressed, and related services and applications based on behavior data of consumers have been developed in practice. In recent years, customization strategies and recommendation systems of consumers have been utilized due to the development of big data technology, and attempts are being made to optimize users' shopping experience. However, even in such an attempt, it is very unlikely that online consumers will actually be able to visit the website and switch to the purchase stage. This is because online consumers do not just visit the website to purchase products but use and browse the websites differently according to their shopping motives and purposes. Therefore, it is important to analyze various types of visits as well as visits to purchase, which is important for understanding the behaviors of online consumers. In this study, we explored the clustering analysis of session based on click stream data of e-commerce company in order to explain diversity and complexity of search behavior of online consumers and typified search behavior. For the analysis, we converted data points of more than 8 million pages units into visit units' sessions, resulting in a total of over 500,000 website visit sessions. For each visit session, 12 characteristics such as page view, duration, search diversity, and page type concentration were extracted for clustering analysis. Considering the size of the data set, we performed the analysis using the Mini-Batch K-means algorithm, which has advantages in terms of learning speed and efficiency while maintaining the clustering performance similar to that of the clustering algorithm K-means. The most optimized number of clusters was derived from four, and the differences in session unit characteristics and purchasing rates were identified for each cluster. The online consumer visits the website several times and learns about the product and decides the purchase. In order to analyze the purchasing process over several visits of the online consumer, we constructed the visiting sequence data of the consumer based on the navigation patterns in the web site derived clustering analysis. The visit sequence data includes a series of visiting sequences until one purchase is made, and the items constituting one sequence become cluster labels derived from the foregoing. We have separately established a sequence data for consumers who have made purchases and data on visits for consumers who have only explored products without making purchases during the same period of time. And then sequential pattern mining was applied to extract frequent patterns from each sequence data. The minimum support is set to 10%, and frequent patterns consist of a sequence of cluster labels. While there are common derived patterns in both sequence data, there are also frequent patterns derived only from one side of sequence data. We found that the consumers who made purchases through the comparative analysis of the extracted frequent patterns showed the visiting pattern to decide to purchase the product repeatedly while searching for the specific product. The implication of this study is that we analyze the search type of online consumers by using large - scale click stream data and analyze the patterns of them to explain the behavior of purchasing process with data-driven point. Most studies that typology of online consumers have focused on the characteristics of the type and what factors are key in distinguishing that type. In this study, we carried out an analysis to type the behavior of online consumers, and further analyzed what order the types could be organized into one another and become a series of search patterns. In addition, online retailers will be able to try to improve their purchasing conversion through marketing strategies and recommendations for various types of visit and will be able to evaluate the effect of the strategy through changes in consumers' visit patterns.