• Title/Summary/Keyword: Sequential clustering

Search Result 89, Processing Time 0.027 seconds

Movement Route Generation Technique through Location Area Clustering (위치 영역 클러스터링을 통한 이동 경로 생성 기법)

  • Yoon, Chang-Pyo;Hwang, Chi-Gon
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.355-357
    • /
    • 2022
  • In this paper, as a positioning technology for predicting the movement path of a moving object using a recurrent neural network (RNN) model, which is a deep learning network, in an indoor environment, continuous location information is used to predict the path of a moving vehicle within a local path. We propose a movement path generation technique that can reduce decision errors. In the case of an indoor environment where GPS information is not available, the data set must be continuous and sequential in order to apply the RNN model. However, Wi-Fi radio fingerprint data cannot be used as RNN data because continuity is not guaranteed as characteristic information about a specific location at the time of collection. Therefore, we propose a movement path generation technique for a vehicle moving a local path in an indoor environment by giving the necessary sequential location continuity to the RNN model.

  • PDF

Analysis and Visualization for Comment Messages of Internet Posts (인터넷 게시물의 댓글 분석 및 시각화)

  • Lee, Yun-Jung;Ji, Jeong-Hoon;Woo, Gyun;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.7
    • /
    • pp.45-56
    • /
    • 2009
  • There are many internet users who collect the public opinions and express their opinions for internet news or blog articles through the replying comment on online community. But, it is hard to search and explore useful messages on web blogs since most of web blog systems show articles and their comments to the form of sequential list. Also, spam and malicious comments have become social problems as the internet users increase. In this paper, we propose a clustering and visualizing system for responding comments on large-scale weblogs, namely 'Daum AGORA,' using similarity analysis. Our system shows the comment clustering result as a simple screen view. Our system also detects spam comments using Needleman-Wunsch algorithm that is a well-known algorithm in bioinformatics.

Video Scene Detection using Shot Clustering based on Visual Features (시각적 특징을 기반한 샷 클러스터링을 통한 비디오 씬 탐지 기법)

  • Shin, Dong-Wook;Kim, Tae-Hwan;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.47-60
    • /
    • 2012
  • Video data comes in the form of the unstructured and the complex structure. As the importance of efficient management and retrieval for video data increases, studies on the video parsing based on the visual features contained in the video contents are researched to reconstruct video data as the meaningful structure. The early studies on video parsing are focused on splitting video data into shots, but detecting the shot boundary defined with the physical boundary does not cosider the semantic association of video data. Recently, studies on structuralizing video shots having the semantic association to the video scene defined with the semantic boundary by utilizing clustering methods are actively progressed. Previous studies on detecting the video scene try to detect video scenes by utilizing clustering algorithms based on the similarity measure between video shots mainly depended on color features. However, the correct identification of a video shot or scene and the detection of the gradual transitions such as dissolve, fade and wipe are difficult because color features of video data contain a noise and are abruptly changed due to the intervention of an unexpected object. In this paper, to solve these problems, we propose the Scene Detector by using Color histogram, corner Edge and Object color histogram (SDCEO) that clusters similar shots organizing same event based on visual features including the color histogram, the corner edge and the object color histogram to detect video scenes. The SDCEO is worthy of notice in a sense that it uses the edge feature with the color feature, and as a result, it effectively detects the gradual transitions as well as the abrupt transitions. The SDCEO consists of the Shot Bound Identifier and the Video Scene Detector. The Shot Bound Identifier is comprised of the Color Histogram Analysis step and the Corner Edge Analysis step. In the Color Histogram Analysis step, SDCEO uses the color histogram feature to organizing shot boundaries. The color histogram, recording the percentage of each quantized color among all pixels in a frame, are chosen for their good performance, as also reported in other work of content-based image and video analysis. To organize shot boundaries, SDCEO joins associated sequential frames into shot boundaries by measuring the similarity of the color histogram between frames. In the Corner Edge Analysis step, SDCEO identifies the final shot boundaries by using the corner edge feature. SDCEO detect associated shot boundaries comparing the corner edge feature between the last frame of previous shot boundary and the first frame of next shot boundary. In the Key-frame Extraction step, SDCEO compares each frame with all frames and measures the similarity by using histogram euclidean distance, and then select the frame the most similar with all frames contained in same shot boundary as the key-frame. Video Scene Detector clusters associated shots organizing same event by utilizing the hierarchical agglomerative clustering method based on the visual features including the color histogram and the object color histogram. After detecting video scenes, SDCEO organizes final video scene by repetitive clustering until the simiarity distance between shot boundaries less than the threshold h. In this paper, we construct the prototype of SDCEO and experiments are carried out with the baseline data that are manually constructed, and the experimental results that the precision of shot boundary detection is 93.3% and the precision of video scene detection is 83.3% are satisfactory.

A Dynamic Recommendation System Using User Log Analysis and Document Similarity in Clusters (사용자 로그 분석과 클러스터 내의 문서 유사도를 이용한 동적 추천 시스템)

  • 김진수;김태용;최준혁;임기욱;이정현
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.5
    • /
    • pp.586-594
    • /
    • 2004
  • Because web documents become creation and disappearance rapidly, users require the recommend system that offers users to browse the web document conveniently and correctly. One largely untapped source of knowledge about large data collections is contained in the cumulative experiences of individuals finding useful information in the collection. Recommendation systems attempt to extract such useful information by capturing and mining one or more measures of the usefulness of the data. The existing Information Filtering system has the shortcoming that it must have user's profile. And Collaborative Filtering system has the shortcoming that users have to rate each web document first and in high-quantity, low-quality environments, users may cover only a tiny percentage of documents available. And dynamic recommendation system using the user browsing pattern also provides users with unrelated web documents. This paper classifies these web documents using the similarity between the web documents under the web document type and extracts the user browsing sequential pattern DB using the users' session information based on the web server log file. When user approaches the web document, the proposed Dynamic recommendation system recommends Top N-associated web documents set that has high similarity between current web document and other web documents and recommends set that has sequential specificity using the extracted informations and users' session information.

A Study on Generation Adequacy Assessment Considering Probabilistic Relation Between System Load and Wind-Power (계통 부하량과 풍력발전의 확률적 관계를 고려한 발전량 적정성 평가 연구)

  • Kim, Gwang-Won;Hyun, Seung-Ho
    • Journal of the Korean Institute of Illuminating and Electrical Installation Engineers
    • /
    • v.21 no.10
    • /
    • pp.52-58
    • /
    • 2007
  • This paper presents the wind-power model for generation adequacy assessment. Both wind-power and system load depend on time of a year and show their periodic nature with similar periods. Therefore, the two quantities have some probabilistic relations, and if one of them is given, the other can be decided with some probability. In this paper, the two quantities are quantized by k-means clustering algorithm and related probabilities among the cluster centers are calculated using sequential wind-power and system load data. The proposed model is highly expected to be applied for generation adequacy assessment by Monte-Carlo simulation with state sampling method.

A framework for parallel processing in multiblock flow computations (다중블록 유동해석에서 병렬처리를 위한 시스템의 구조)

  • Park, Sang-Geun;Lee, Geon-U
    • Transactions of the Korean Society of Mechanical Engineers B
    • /
    • v.21 no.8
    • /
    • pp.1024-1033
    • /
    • 1997
  • The past several years have witnessed an ever-increasing acceptance and adoption of parallel processing, both for high performance scientific computing as well as for more general purpose applications. Furthermore with increasing needs to perform the complex flow calculations in an efficient manner, the use of the message passing model on distributed networks has emerged as an important alternative to the expensive supercomputers. This work attempts to provide a generic framework to enable the parallelization of all CFD-related works using the master-slave model. This framework consists of (1) input geometry, (2) domain decomposition, (3) grid generation, (4) flow computations, (5) flow visualization, and (6) output display as the sequential components, but performs computations for (2) to (5) in parallel on the workstation clustering. The flow computations are parallized by having multiple copies of the flow-code to solve a PDE on different spatial regions on different processors, while their flow data are exchanged across the region boundaries, and the solution is time-stepped. The Parallel Virtual Machine (PVM) is used for distributed communication in this work.

Design of a GIS-Based Distribution System with Service Consideration (서비스수준을 고려한 GIS기반의 차량 운송시스템)

  • 황흥석;조규성
    • Korean Management Science Review
    • /
    • v.18 no.2
    • /
    • pp.125-134
    • /
    • 2001
  • This paper is concerned with the development of a GIS-based distribution system with service consideration. The proposed model could be used for a wide range of logistics applications in planning, engineering and operational purpose for logistics system. This research addresses the formulation of those complex prob1ems of two-echelon logistics system to plan the incorporating supply center locations and distribution problems based on GIS. We propose an integrated logistics model for determining the optimal patterns of supply centers and inventory allocations (customers) with a three-step sequential approach. 1) First step, Developing GIS-distance model and stochastic set-covering program to determine Optimel pattern of supply center location. 2) Second step, Optimal sector-clustering to support customers. 3) Third step, Optimal vehicle rouse scheduling based on GIS, GIS-VRP In this research we developed GUI-tree program, the GIS-VRP provide the vehicle to users and freight information in real time. We applied a set of sample examples to this model and demonstrated samp1e results. It has been found that the proposed model is potentially efficient and useful in solving multi-depot problem through examples. However the proposed model can provide logistics decision makers to get the best supply schedule.

  • PDF

A Biclustering Method for Time Series Analysis

  • Lee, Jeong-Hwa;Lee, Young-Rok;Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems
    • /
    • v.9 no.2
    • /
    • pp.131-140
    • /
    • 2010
  • Biclustering is a method of finding meaningful subsets of objects and attributes simultaneously, which may not be detected by traditional clustering methods. It is popularly used for the analysis of microarray data representing the expression levels of genes by conditions. Usually, biclustering algorithms do not consider a sequential relation between attributes. For time series data, however, bicluster solutions should keep the time sequence. This paper proposes a new biclustering algorithm for time series data by modifying the plaid model. The proposed algorithm introduces a parameter controlling an interval between two selected time points. Also, the pruning step preventing an over-fitting problem is modified so as to eliminate only starting or ending points. Results from artificial data sets show that the proposed method is more suitable for the extraction of biclusters from time series data sets. Moreover, by using the proposed method, we find some interesting observations from real-world time-course microarray data sets and apartment price data sets in metropolitan areas.

Analysis of shopping website visit types and shopping pattern (쇼핑 웹사이트 탐색 유형과 방문 패턴 분석)

  • Choi, Kyungbin;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.85-107
    • /
    • 2019
  • Online consumers browse products belonging to a particular product line or brand for purchase, or simply leave a wide range of navigation without making purchase. The research on the behavior and purchase of online consumers has been steadily progressed, and related services and applications based on behavior data of consumers have been developed in practice. In recent years, customization strategies and recommendation systems of consumers have been utilized due to the development of big data technology, and attempts are being made to optimize users' shopping experience. However, even in such an attempt, it is very unlikely that online consumers will actually be able to visit the website and switch to the purchase stage. This is because online consumers do not just visit the website to purchase products but use and browse the websites differently according to their shopping motives and purposes. Therefore, it is important to analyze various types of visits as well as visits to purchase, which is important for understanding the behaviors of online consumers. In this study, we explored the clustering analysis of session based on click stream data of e-commerce company in order to explain diversity and complexity of search behavior of online consumers and typified search behavior. For the analysis, we converted data points of more than 8 million pages units into visit units' sessions, resulting in a total of over 500,000 website visit sessions. For each visit session, 12 characteristics such as page view, duration, search diversity, and page type concentration were extracted for clustering analysis. Considering the size of the data set, we performed the analysis using the Mini-Batch K-means algorithm, which has advantages in terms of learning speed and efficiency while maintaining the clustering performance similar to that of the clustering algorithm K-means. The most optimized number of clusters was derived from four, and the differences in session unit characteristics and purchasing rates were identified for each cluster. The online consumer visits the website several times and learns about the product and decides the purchase. In order to analyze the purchasing process over several visits of the online consumer, we constructed the visiting sequence data of the consumer based on the navigation patterns in the web site derived clustering analysis. The visit sequence data includes a series of visiting sequences until one purchase is made, and the items constituting one sequence become cluster labels derived from the foregoing. We have separately established a sequence data for consumers who have made purchases and data on visits for consumers who have only explored products without making purchases during the same period of time. And then sequential pattern mining was applied to extract frequent patterns from each sequence data. The minimum support is set to 10%, and frequent patterns consist of a sequence of cluster labels. While there are common derived patterns in both sequence data, there are also frequent patterns derived only from one side of sequence data. We found that the consumers who made purchases through the comparative analysis of the extracted frequent patterns showed the visiting pattern to decide to purchase the product repeatedly while searching for the specific product. The implication of this study is that we analyze the search type of online consumers by using large - scale click stream data and analyze the patterns of them to explain the behavior of purchasing process with data-driven point. Most studies that typology of online consumers have focused on the characteristics of the type and what factors are key in distinguishing that type. In this study, we carried out an analysis to type the behavior of online consumers, and further analyzed what order the types could be organized into one another and become a series of search patterns. In addition, online retailers will be able to try to improve their purchasing conversion through marketing strategies and recommendations for various types of visit and will be able to evaluate the effect of the strategy through changes in consumers' visit patterns.

A Semantic Text Model with Wikipedia-based Concept Space (위키피디어 기반 개념 공간을 가지는 시멘틱 텍스트 모델)

  • Kim, Han-Joon;Chang, Jae-Young
    • The Journal of Society for e-Business Studies
    • /
    • v.19 no.3
    • /
    • pp.107-123
    • /
    • 2014
  • Current text mining techniques suffer from the problem that the conventional text representation models cannot express the semantic or conceptual information for the textual documents written with natural languages. The conventional text models represent the textual documents as bag of words, which include vector space model, Boolean model, statistical model, and tensor space model. These models express documents only with the term literals for indexing and the frequency-based weights for their corresponding terms; that is, they ignore semantical information, sequential order information, and structural information of terms. Most of the text mining techniques have been developed assuming that the given documents are represented as 'bag-of-words' based text models. However, currently, confronting the big data era, a new paradigm of text representation model is required which can analyse huge amounts of textual documents more precisely. Our text model regards the 'concept' as an independent space equated with the 'term' and 'document' spaces used in the vector space model, and it expresses the relatedness among the three spaces. To develop the concept space, we use Wikipedia data, each of which defines a single concept. Consequently, a document collection is represented as a 3-order tensor with semantic information, and then the proposed model is called text cuboid model in our paper. Through experiments using the popular 20NewsGroup document corpus, we prove the superiority of the proposed text model in terms of document clustering and concept clustering.