• Title/Summary/Keyword: longest common subsequence (LCS)


Comparison and Analysis of Lengths of Longest Common Subsequence and Maximal Common Subsequence (최장 공통 부분 서열과 극대 공통 부분 서열의 길이 비교 및 분석)

  • Lee, DongYeop;Na, Joong Chae
    • Annual Conference of KIPS / 2021.11a / pp.15-18 / 2021
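  • The longest common subsequence (LCS) is one of the main measures of sequence similarity; absent special assumptions, computing the LCS of two strings takes time proportional to the product of their lengths. Recently, the maximal common subsequence (MCS) was proposed, which relaxes the "longest" condition to "maximal", and algorithms that find an MCS of two strings in near-linear time have been developed. Because maximal does not guarantee longest, the MCS length of two strings, unlike the LCS length, may not be unique, and an MCS of length 1 can exist even when the LCS is very long. In this paper, to assess the usefulness of the MCS computed by an existing algorithm, we compared LCS and MCS lengths on several kinds of real data, including DNA, and on randomly generated data. The MCS length was 32.1 to 60.2% of the LCS length on real data and 27.5 to 62.9% on random data. This ratio decreased as the alphabet size increased and as the strings became longer.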
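The contrast between LCS and MCS in the abstract above can be illustrated with a minimal Python sketch. It is not the near-linear algorithm the paper evaluates: `lcs_length` is the textbook quadratic dynamic program, and `is_maximal_common_subsequence` is a brute-force check written only for illustration. All function names and the example strings are assumptions.

```python
def lcs_length(x, y):
    """Length of a longest common subsequence, in O(len(x) * len(y)) time."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def is_subsequence(s, t):
    """True if s can be obtained from t by deleting characters."""
    it = iter(t)
    return all(ch in it for ch in s)


def is_maximal_common_subsequence(w, x, y):
    """Brute-force check (illustration only): w is a common subsequence of
    x and y, and inserting any single character anywhere breaks that."""
    if not (is_subsequence(w, x) and is_subsequence(w, y)):
        return False
    alphabet = set(x) & set(y)
    for i in range(len(w) + 1):
        for ch in alphabet:
            candidate = w[:i] + ch + w[i:]
            if is_subsequence(candidate, x) and is_subsequence(candidate, y):
                return False
    return True


# Hypothetical example: the LCS is "bbb", yet "a" alone is already maximal.
x, y = "bbba", "abbb"
print(lcs_length(x, y))                           # 3
print(is_maximal_common_subsequence("a", x, y))   # True
print(is_maximal_common_subsequence("b", x, y))   # False ("bb" extends it)
```

With these two strings the LCS has length 3 while an MCS of length 1 exists, which is the kind of gap the paper measures empirically.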

A Dynamic Hand Gesture Recognition System Incorporating Orientation-based Linear Extrapolation Predictor and Velocity-assisted Longest Common Subsequence Algorithm

  • Yuan, Min;Yao, Heng;Qin, Chuan;Tian, Ying
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.9 / pp.4491-4509 / 2017
  • The present paper proposes a novel dynamic system for hand gesture recognition. The approach comprises three main steps: detection, tracking and recognition. First, the gesture contour captured by a 2D camera is detected by combining the three-frame difference method and a skin-color elliptic boundary model. Then, the trajectory of the hand gesture is extracted via a gesture-tracking algorithm based on an occlusion-direction oriented linear extrapolation predictor, where the gesture coordinate in the next frame is predicted by judging the current occlusion direction. Finally, to overcome the interference of insignificant trajectory segments, the longest common subsequence (LCS) is employed with the aid of velocity information. In addition, to tackle the subgesture problem, i.e., some gestures may also be a part of others, the most probable gesture category is identified by comparing the relative LCS length of each gesture, i.e., the ratio of the LCS length to the total length of each template, rather than the raw LCS length for each gesture. The gesture dataset for the system performance test contains the digits 0 to 9, and experimental results demonstrate the robustness and effectiveness of the proposed approach.
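The relative LCS length described above can be sketched as follows. This is a minimal illustration, not the authors' system: it assumes trajectories have already been quantized into strings of direction codes (here the hypothetical codes `R` for right and `D` for down), it omits the velocity information the paper uses, and the templates are invented.

```python
def lcs_length(x, y):
    """Textbook dynamic-programming LCS length."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def classify(observed, templates):
    """Pick the template with the largest LCS length relative to its own length."""
    scores = {
        label: lcs_length(observed, template) / len(template)
        for label, template in templates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical templates: the stroke for "one" is a sub-gesture of "seven".
templates = {"one": "DDDD", "seven": "RRRDDDD"}
label, scores = classify("DDDDD", templates)
print(label, scores)
```

For the observed trajectory `"DDDDD"` the raw LCS length is 4 against both templates, so raw length cannot separate them; dividing by the template length gives 1.0 for `"one"` and 4/7 for `"seven"`, which resolves the tie in favour of the shorter gesture.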

Sequence-based Similar Music Retrieval Scheme (시퀀스 기반의 유사 음악 검색 기법)

  • Jun, Sang-Hoon;Hwang, Een-Jun
    • Journal of IKEEE / v.13 no.2 / pp.167-174 / 2009
  • Music evokes human emotions or creates musical moods through various low-level musical features. A typical music clip consists of one or more moods, which can serve as an important criterion for determining the similarity between music clips. In this paper, we propose a new music retrieval scheme based on the mood change patterns of music clips. For this, we first divide music clips into segments based on low-level musical features. Then, we apply the K-means clustering algorithm to group them into clusters with similar features. By assigning a unique mood symbol to each cluster, we can represent each music clip as a sequence of mood symbols. Finally, to estimate the similarity of music clips, we measure the similarity of their mood sequences using the Longest Common Subsequence (LCS) algorithm. To evaluate the performance of our scheme, we carried out various experiments and collected user evaluations. We report some of the results.

High-performance computing for SARS-CoV-2 RNAs clustering: a data science-based genomics approach

  • Oujja, Anas;Abid, Mohamed Riduan;Boumhidi, Jaouad;Bourhnane, Safae;Mourhir, Asmaa;Merchant, Fatima;Benhaddou, Driss
    • Genomics & Informatics / v.19 no.4 / pp.49.1-49.11 / 2021
  • Genomic data is now one of the fastest growing datasets in the world. As of 2025, it is expected to become the fourth largest source of Big Data, which calls for an adequate high-performance computing (HPC) platform for processing. With the latest unprecedented and unpredictable mutations in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the research community urgently needs ICT tools to process SARS-CoV-2 RNA data, e.g., by classifying it (i.e., clustering), thus assisting in tracking virus mutations and predicting future ones. In this paper, we present an HPC-based SARS-CoV-2 RNA clustering tool. We adopt a data science approach, from data collection, through analysis, to visualization. In the analysis step, we present how our clustering approach leverages HPC and the longest common subsequence (LCS) algorithm. The approach uses the Hadoop MapReduce programming paradigm and adapts the LCS algorithm to efficiently compute the length of the LCS for each pair of SARS-CoV-2 RNA sequences. The sequences are extracted from the U.S. National Center for Biotechnology Information (NCBI) Virus repository. The computed LCS lengths are used to measure the dissimilarities between RNA sequences in order to identify existing clusters. In addition, we present a comparative study of the LCS algorithm's performance under variable workloads and different numbers of Hadoop worker nodes.

An Automated Technique for Illegal Site Detection using the Sequence of HTML Tags (HTML 태그 순서를 이용한 불법 사이트 탐지 자동화 기술)

  • Lee, Kiryong;Lee, Heejo
    • Journal of KIISE / v.43 no.10 / pp.1173-1178 / 2016
  • Since the introduction of the BitTorrent protocol in 2001, everything can be downloaded through file sharing, including music, movies and software. As a result, copyright holders suffer from illegal sharing of copyrighted content. To address this problem, countries have enacted laws against illegal sharing, and internet service providers block pirate sites. However, illegal sites such as The Pirate Bay easily reopen by changing their domain name. Thus, we propose a technique to easily detect pirate sites that have been reopened. This automated technique collects domain names using the Google search engine and measures similarity with the Longest Common Subsequence (LCS) algorithm by comparing the tag structure of the source web page and the reopened web page. For evaluation, we collected 2,383 domains from Google search. Experimental results showed that a total of 44 pirate sites were detected among the collected domains when applying the LCS algorithm. In addition, this technique detected 23 pirate sites among 805 domains when applied to foreign pirate sites. This experiment showed that reopened pirate sites can be detected easily with an automated detection system.
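A tag-structure comparison of the kind described above might look like the following sketch. It uses the standard-library `html.parser` module to extract the sequence of opening tags and the textbook LCS dynamic program to compare two pages. The normalization `2 * LCS / (len(a) + len(b))`, the helper names, and the sample pages are assumptions; the abstract does not state which similarity formula or threshold the authors use.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every opening tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def lcs_length(x, y):
    """Textbook dynamic-programming LCS length; works on any sequences."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def tag_similarity(html_a, html_b):
    """Similarity in [0, 1]; the normalization is an assumption of this sketch."""
    a, b = tag_sequence(html_a), tag_sequence(html_b)
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


# Hypothetical pages: same layout, different link target and one extra paragraph.
source = "<html><body><div><a href='x'>t</a><p>hi</p></div></body></html>"
reopened = "<html><body><div><a href='y'>t</a><p>yo</p><p>new</p></div></body></html>"
print(tag_similarity(source, reopened))
```

Because only tag names are compared, changes to text, attributes, or the domain name leave the score high, which matches the idea of recognizing a reopened site by its page structure.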

Sketch Map System using Clustering Method of XML Documents (XML 문서의 클러스터링 기법을 이용한 스케치맵 시스템)

  • Kim, Jung-Sook;Lee, Ya-Ri;Hong, Kyung-Pyo
    • The Journal of the Korea Contents Association / v.9 no.12 / pp.19-30 / 2009
  • Map services that have recently come into the spotlight start from a map and then provide various mash-up results through its interface. Such a service can provide precise information to users, but the map is barely reusable. The sketch-map system of this paper, unlike existing large map systems, represents specific spots and routes in XML documents and then clusters the sketch-maps. The map service system is designed to show the optimum route to the destination in a simple outline map. It does so by reworking the spots presented on the map into optimum contents. Through the process of analyzing, splitting and clustering the input sketch-map XML documents, the system creates a valid form of a sketch-map. It uses the LCS (Longest Common Subsequence) algorithm for splitting and merging sketch-maps during query processing. In addition, a simulation of the system's expected effects is provided. It shows how maps that share information and knowledge assemble to form a large map, and thus presents the system's ability and role as a new research portal.

A Music Retrieval Scheme based on Variation of Musical Mood (음악 무드의 변화 기반 유사 음악 검색 기법)

  • Sanghoon Jun;Byeong-jun Han;Eenjun Hwang
    • Annual Conference of KIPS / 2008.11a / pp.760-762 / 2008
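  • In music, diverse emotions are expressed through transitions of musical mood over time. In this study, we propose a similar-music retrieval scheme based on the Longest Common Subsequence (LCS) algorithm and the k-Means algorithm. First, the flow of musical mood is divided into mood segments, and the various musical features extracted from them are classified with the k-Means algorithm to convert the music into a mood sequence. Then, to retrieve music with a similar mood flow, we define a similarity between mood sequences based on the LCS algorithm. Building on this proposal, experiments and a survey demonstrated that sequence-based retrieval is somewhat more efficient than the existing retrieval based on global features.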

Interface Mapping and Generation Methods for Intuitive User Interface and Consistency Provision (사용자 인터페이스의 직관적인 인식 및 일관성 부여를 위한 인터페이스 매핑 및 생성 기법)

  • Yoon, Hyo-Seok;Woo, Woon-Tack
    • 한국HCI학회:학술대회논문집 / 2009.02a / pp.135-139 / 2009
  • In this paper we present INCUI, a user interface based on a natural view of the physical user interfaces of target devices and services in a pervasive computing environment. We present the concept of an Intuitively Natural and Consistent User Interface (INCUI), consisting of an image of the physical user interface and a description XML file. Then we elaborate on how an INCUI template can be used to consistently map user interface components structurally and visually. We describe the process of INCUI mapping and a novel mapping method selection architecture based on domain size and the types of source and target INCUI. In particular, we developed and applied an extended LCS-based algorithm using prefix/postfix/synonym matching for similarity calculation.