• Title/Summary/Keyword: string matching

Search Result 102, Processing Time 0.022 seconds

Efficient Image Retrieval using Minimal Spatial Relationships (최소 공간관계를 이용한 효율적인 이미지 검색)

  • Lee, Soo-Cheol;Hwang, Een-Jun;Byeon, Kwang-Jun
    • Journal of KIISE:Databases
    • /
    • v.32 no.4
    • /
    • pp.383-393
    • /
    • 2005
  • Retrieval of images from image databases by spatial relationship can be effectively performed through visual interface systems. In these systems, the representation of image with 2D strings, which are derived from symbolic projections, provides an efficient and natural way to construct image index and is also an ideal representation for the visual query. With this approach, retrieval is reduced to matching two symbolic strings. However, using 2D-string representations, spatial relationships between the objects in the image might not be exactly specified. Ambiguities arise for the retrieval of images of 3D scenes. In order to remove ambiguous description of object spatial relationships, in this paper, images are referred by considering spatial relationships using the spatial location algebra for the 3D image scene. Also, we remove the repetitive spatial relationships using the several reduction rules. A reduction mechanism using these rules can be used in query processing systems that retrieve images by content. This could give better precision and flexibility in image retrieval.

(A Method to Classify and Recognize Spelling Changes between Morphemes of a Korean Word) (한국어 어절의 철자변화 현상 분류와 인식 방법)

  • 김덕봉
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.5_6
    • /
    • pp.476-486
    • /
    • 2003
  • There is no explicit spelling change information in part-of-speech tagged corpora of Korean. It causes some difficulties in acquiring the data to study Korean morphology, i.e. automatically in constructing a dictionary for morphological analysis and systematically in collecting the phenomena of the spelling changes from the corpora. To solve this problem, this paper presents a method to recognize spelling changes between morphemes of a Korean word in tagged corpora, only using a string matching, without using a dictionary and phonological rules. This method not only has an ability to robustly recognize the spelling changes because it doesn't use any phonological rules, but also can be implemented with few cost. This method has been experimented with a large tagged corpus of Korean, and recognized the 100% of spelling changes in the corpus with accuracy.

Ontology Alignment based on Parse Tree Kernel usig Structural and Semantic Information (구조 및 의미 정보를 활용한 파스 트리 커널 기반의 온톨로지 정렬 방법)

  • Son, Jeong-Woo;Park, Seong-Bae
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.4
    • /
    • pp.329-334
    • /
    • 2009
  • The ontology alignment has two kinds of major problems. First, the features used for ontology alignment are usually defined by experts, but it is highly possible for some critical features to be excluded from the feature set. Second, the semantic and the structural similarities are usually computed independently, and then they are combined in an ad-hoc way where the weights are determined heuristically. This paper proposes the modified parse tree kernel (MPTK) for ontology alignment. In order to compute the similarity between entities in the ontologies, a tree is adopted as a representation of an ontology. After transforming an ontology into a set of trees, their similarity is computed using MPTK without explicit enumeration of features. In computing the similarity between trees, the approximate string matching is adopted to naturally reflect not only the structural information but also the semantic information. According to a series of experiments with a standard data set, the kernel method outperforms other structural similarities such as GMO. In addition, the proposed method shows the state-of-the-art performance in the ontology alignment.

A Spelling Error Correction Model in Korean Using a Correction Dictionary and a Newspaper Corpus (교정사전과 신문기사 말뭉치를 이용한 한국어 철자 오류 교정 모델)

  • Lee, Se-Hee;Kim, Hark-Soo
    • The KIPS Transactions:PartB
    • /
    • v.16B no.5
    • /
    • pp.427-434
    • /
    • 2009
  • With the rapid evolution of the Internet and mobile environments, text including spelling errors such as newly-coined words and abbreviated words are widely used. These spelling errors make it difficult to develop NLP (natural language processing) applications because they decrease the readability of texts. To resolve this problem, we propose a spelling error correction model using a spelling error correction dictionary and a newspaper corpus. The proposed model has the advantage that the cost of data construction are not high because it uses a newspaper corpus, which we can easily obtain, as a training corpus. In addition, the proposed model has an advantage that additional external modules such as a morphological analyzer and a word-spacing error correction system are not required because it uses a simple string matching method based on a correction dictionary. In the experiments with a newspaper corpus and a short message corpus collected from real mobile phones, the proposed model has been shown good performances (a miss-correction rate of 7.3%, a F1-measure of 97.3%, and a false positive rate of 1.1%) in the various evaluation measures.

An Efficient Code Expansion from EM to SPARC Code (EM에서 SPARC 코드로 효율적인 코드 확장)

  • Oh, Se-Man;Yun, Young-Shick
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.10
    • /
    • pp.2596-2604
    • /
    • 1997
  • There are two kinds of backends in ACK:code generator(full-fledged backend) and code expander(fast backend). Code generators generate target code using string pattern matching and code expanders generate target code using macro expansion. ACK translates EM to SPARC code using code expander. The corresponding SPARC code sequences for a EM code are generated and then push-pop optimization is performed. But, there is the problem of maintaining hybrid stack. And code expander is not considered to passes parameters of a procedure call through register windows. The purpose of this paper is to improve SPARC code quality. We suggest a method of SPARC cod generation using EM tree. Our method is divided into two phases:EM tree building phase and code expansion phase. The EM tree building phase creates the EM tree and code expansion phase translates it into SPARC code. EM tree is designed to pass parameters of a procedure call through register windows. To remove hybrid stack, we extract an additional information from EM code. We improved many disadvantages that arise from code expander in ACK.

  • PDF

A Space Efficient Indexing Technique for DNA Sequences (공간 효율적인 DNA 시퀀스 인덱싱 방안)

  • Song, Hye-Ju;Park, Young-Ho;Loh, Woong-Kee
    • Journal of KIISE:Databases
    • /
    • v.36 no.6
    • /
    • pp.455-465
    • /
    • 2009
  • Suffix trees are widely used in similar sequence matching for DNA. They have several problems such as time consuming, large space usages of disks and memories and data skew, since DNA sequences are very large and do not fit in the main memory. Thus, in the paper, we present a space efficient indexing method called SENoM, allowing us to build trees without merging phases for the partitioned sub trees. The proposed method is constructed in two phases. In the first phase, we partition the suffixes of the input string based on a common variable-length prefix till the number of suffixes is smaller than a threshold. In the second phase, we construct a sub tree based on the disk using the suffix sets, and then write it to the disk. The proposed method, SENoM eliminates complex merging phases. We show experimentally that proposed method is effective as bellows. SENoM reduces the disk usage less than 35% and reduces the memory usage less than 20% compared with TRELLIS algorithm. SENoM is available to query efficiently using the prefix tree even when the length of query sequence is large.

An Index Structure for Substructure Searching In Chemical Databases (화학 데이타베이스에서 부분구조 검색을 위한 인덱스 구조)

  • Lee Hwangu;Cha Jaehyuk
    • Journal of KIISE:Databases
    • /
    • v.31 no.6
    • /
    • pp.641-649
    • /
    • 2004
  • The relationship between chemical structures and biological activities is researched briskly in the area of 'Medicinal Chemistry' At the base of these structure-based drug design tries, medicinal chemists search the existing drugs of similar chemical structure to target drug for the development of a new drug. Therefore, it is such necessary that an automatic system selects drug files that have a set of chemical moieties matching a user-defined query moiety. Substructure searching is the process of identifying a set of chemical moieties that match a specific query moiety. Testing for substructure searching was developed in the late 1950s. In graph theoretical terms, this problem corresponds to determining which graphs in a set are subgraph isomorphic to a specified query moiety. Testing for subgraph isomorphism has been proved, in the general case, to be an NP- complete problem. For the purpose of overcoming this difficulty, there were computational approaches. On the 1990s, a US patent has been granted on an atom-centered indexing scheme, used by the RS3 system; this has the virtue that the indexes generated can be searched by direct text comparison. This system is commercially used(http://www.acelrys.com/rs3). We define the RS3 system's drawback and present a new indexing scheme. The RS3 system treats substructure searching with substring matching by means of expressing chemical structure aspredefined strings. However, it has insufficient 'rerall' and 'precision‘ because it is impossible to index structures uniquely for same atom and same bond. To resolve this problem, we make the minimum-cost- spanning tree for one centered atom and describe a structure with paths per levels. Expressing 2D chemical structure into 1D a string has limit. Therefore, we break 2D chemical structure into 1D structure fragments. We present in this paper a new index technique to improve recall and precision surprisingly.

Application of Integrated Security Control of Artificial Intelligence Technology and Improvement of Cyber-Threat Response Process (인공지능 기술의 통합보안관제 적용 및 사이버침해대응 절차 개선 )

  • Ko, Kwang-Soo;Jo, In-June
    • The Journal of the Korea Contents Association
    • /
    • v.21 no.10
    • /
    • pp.59-66
    • /
    • 2021
  • In this paper, an improved integrated security control procedure is newly proposed by applying artificial intelligence technology to integrated security control and unifying the existing security control and AI security control response procedures. Current cyber security control is highly dependent on the level of human ability. In other words, it is practically unreasonable to analyze various logs generated by people from different types of equipment and analyze and process all of the security events that are rapidly increasing. And, the signature-based security equipment that detects by matching a string and a pattern has insufficient functions to accurately detect advanced and advanced cyberattacks such as APT (Advanced Persistent Threat). As one way to solve these pending problems, the artificial intelligence technology of supervised and unsupervised learning is applied to the detection and analysis of cyber attacks, and through this, the analysis of logs and events that occur innumerable times is automated and intelligent through this. The level of response has been raised in the overall aspect by making it possible to predict and block the continuous occurrence of cyberattacks. And after applying AI security control technology, an improved integrated security control service model was newly proposed by integrating and solving the problem of overlapping detection of AI and SIEM into a unified breach response process(procedure).

Research on Malicious code hidden website detection method through WhiteList-based Malicious code Behavior Analysis (WhiteList 기반의 악성코드 행위분석을 통한 악성코드 은닉 웹사이트 탐지 방안 연구)

  • Ha, Jung-Woo;Kim, Huy-Kang;Lim, Jong-In
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.21 no.4
    • /
    • pp.61-75
    • /
    • 2011
  • Recently, there is significant increasing of massive attacks, which try to infect PCs that visit websites containing pre-implanted malicious code. When visiting the websites, these hidden malicious codes can gain monetary profit or can send various cyber attacks such as BOTNET for DDoS attacks, personal information theft and, etc. Also, this kind of malicious activities is continuously increasing, and their evasion techniques become professional and intellectual. So far, the current signature-based detection to detect websites, which contain malicious codes has a limitation to prevent internet users from being exposed to malicious codes. Since, it is impossible to detect with only blacklist when an attacker changes the string in the malicious codes proactively. In this paper, we propose a novel approach that can detect unknown malicious code, which is not well detected by a signature-based detection. Our method can detect new malicious codes even though the codes' signatures are not in the pattern database of Anti-Virus program. Moreover, our method can overcome various obfuscation techniques such as the frequent change of the included redirection URL in the malicious codes. Finally, we confirm that our proposed system shows better detection performance rather than MC-Finder, which adopts pattern matching, Google's crawling based malware site detection, and McAfee.

Evaluation of Web Service Similarity Assessment Methods (웹서비스 유사성 평가 방법들의 실험적 평가)

  • Hwang, You-Sub
    • Journal of Intelligence and Information Systems
    • /
    • v.15 no.4
    • /
    • pp.1-22
    • /
    • 2009
  • The World Wide Web is transitioning from being a mere collection of documents that contain useful information toward providing a collection of services that perform useful tasks. The emerging Web service technology has been envisioned as the next technological wave and is expected to play an important role in this recent transformation of the Web. By providing interoperable interface standards for application-to-application communication, Web services can be combined with component based software development to promote application interaction and integration both within and across enterprises. To make Web services for service-oriented computing operational, it is important that Web service repositories not only be well-structured but also provide efficient tools for developers to find reusable Web service components that meet their needs. As the potential of Web services for service-oriented computing is being widely recognized, the demand for effective Web service discovery mechanisms is concomitantly growing. A number of techniques for Web service discovery have been proposed, but the discovery challenge has not been satisfactorily addressed. Unfortunately, most existing solutions are either too rudimentary to be useful or too domain dependent to be generalizable. In this paper, we propose a Web service organizing framework that combines clustering techniques with string matching and leverages the semantics of the XML-based service specification in WSDL documents. We believe that this is one of the first attempts at applying data mining techniques in the Web service discovery domain. Our proposed approach has several appealing features : (1) It minimizes the requirement of prior knowledge from both service consumers and publishers; (2) It avoids exploiting domain dependent ontologies; and (3) It is able to visualize the semantic relationships among Web services. We have developed a prototype system based on the proposed framework using an unsupervised artificial neural network and empirically evaluated the proposed approach and tool using real Web service descriptions drawn from operational Web service registries. We report on some preliminary results demonstrating the efficacy of the proposed approach.

  • PDF