• Title/Summary/Keyword: XML Matching

Search Result 67, Processing Time 0.023 seconds

X-tree Diff: An Efficient Change Detection Algorithm for Tree-structured Data (X-tree Diff: 트리 기반 데이터를 위한 효율적인 변화 탐지 알고리즘)

  • Lee, Suk-Kyoon;Kim, Dong-Ah
    • The KIPS Transactions:PartC
    • /
    • v.10C no.6
    • /
    • pp.683-694
    • /
    • 2003
  • We present X-tree Diff, a change detection algorithm for tree-structured data. Our work is motivated by need to monitor massive volume of web documents and detect suspicious changes, called defacement attack on web sites. From this context, our algorithm should be very efficient in speed and use of memory space. X-tree Diff uses a special ordered labeled tree, X-tree, to represent XML/HTML documents. X-tree nodes have a special field, tMD, which stores a 128-bit hash value representing the structure and data of subtrees, so match identical subtrees form the old and new versions. During this process, X-tree Diff uses the Rule of Delaying Ambiguous Matchings, implying that it perform exact matching where a node in the old version has one-to one corrspondence with the corresponding node in the new, by delaying all the others. It drastically reduces the possibility of wrong matchings. X-tree Diff propagates such exact matchings upwards in Step 2, and obtain more matchings downwsards from roots in Step 3. In step 4, nodes to ve inserted or deleted are decided, We aldo show thst X-tree Diff runs on O(n), woere n is the number of noses in X-trees, in worst case as well as in average case, This result is even better than that of BULD Diff algorithm, which is O(n log(n)) in worst case, We experimented X-tree Diff on reat data, which are about 11,000 home pages from about 20 wev sites, instead of synthetic documets manipulated for experimented for ex[erimentation. Currently, X-treeDiff algorithm is being used in a commeercial hacking detection system, called the WIDS(Web-Document Intrusion Detection System), which is to find changes occured in registered websites, and report suspicious changes to users.

Design and Implementation of A Brokering System for Ships and Cargos (선박과 화물에 대한 중개 시스템의 설계 및 구현)

  • Seo Sang-Koo;Yoon Kyung-Hyun
    • Journal of Information Technology Applications and Management
    • /
    • v.12 no.1
    • /
    • pp.49-68
    • /
    • 2005
  • It is one of the crucial components of electronic logistics systems to manage logistics information of cargos and transportation companies and to mediate appropriate brokerage between them. Due to the advance of e-Commerce technologies many kinds of logistics transactions can be handled by means of EDI or XML/EDI applications. but the brokering processing relies mostly on the traditional processes and the research in this field is still at the initial stage. In this paper we study a logistics brokering system for ships and cargos and describe the design and implementation of the system. We analyze the brokering constraints for logistics of cargos and ships and construct an optimization model for their brokering. We also suggest a brokering procedure and a simple heuristic algorithm with respect to the proposed matching criteria. The experimental result shows that the proposed greedy-based heuristic algorithm performs very well. In its response time the proposed algorithm executed within a couple of seconds independently of the number of cargos and the container capacities of ships. The output of the algorithm is very close to that of the optimal solution. showing higher than 95% of approximation. The proposed system is implemented for the Web environment using JSP and PL/SQL.

  • PDF

Efficient Image Retrieval using Minimal Spatial Relationships (최소 공간관계를 이용한 효율적인 이미지 검색)

  • Lee, Soo-Cheol;Hwang, Een-Jun;Byeon, Kwang-Jun
    • Journal of KIISE:Databases
    • /
    • v.32 no.4
    • /
    • pp.383-393
    • /
    • 2005
  • Retrieval of images from image databases by spatial relationship can be effectively performed through visual interface systems. In these systems, the representation of image with 2D strings, which are derived from symbolic projections, provides an efficient and natural way to construct image index and is also an ideal representation for the visual query. With this approach, retrieval is reduced to matching two symbolic strings. However, using 2D-string representations, spatial relationships between the objects in the image might not be exactly specified. Ambiguities arise for the retrieval of images of 3D scenes. In order to remove ambiguous description of object spatial relationships, in this paper, images are referred by considering spatial relationships using the spatial location algebra for the 3D image scene. Also, we remove the repetitive spatial relationships using the several reduction rules. A reduction mechanism using these rules can be used in query processing systems that retrieve images by content. This could give better precision and flexibility in image retrieval.

A Photo Management Model for Semantic Web (시맨틱 웹을 위한 사진 관리 모델)

  • Han Jeong-Hwan;Koo Yong-Wan
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.10 no.3
    • /
    • pp.15-20
    • /
    • 2005
  • Since the invention of Web, it became part of our daily life replacing the routine information search and effecting many activities which otherwise could have been done off-line without it. It was a natural evolution of the web technology, which had started out as a simple test based pattern matching, to be based on the optimized match process for its multi-media Web environment, like still images, music and movies that we are to face today. In this paper, we proposed and implemented the model which the multimedia resources can be efficiently shared in semantic web. After converting multimedia resource information(metadata) into RDF, for efficient management, the model separate and allotment actual multimedia resource and corresponding metadata changed to RDF to each server. The proposed model could be applied in all multimedia resources. For easy explanation and implementation, however, we applied it to the digital photograph in example.

  • PDF

Evaluation of Web Service Similarity Assessment Methods (웹서비스 유사성 평가 방법들의 실험적 평가)

  • Hwang, You-Sub
    • Journal of Intelligence and Information Systems
    • /
    • v.15 no.4
    • /
    • pp.1-22
    • /
    • 2009
  • The World Wide Web is transitioning from being a mere collection of documents that contain useful information toward providing a collection of services that perform useful tasks. The emerging Web service technology has been envisioned as the next technological wave and is expected to play an important role in this recent transformation of the Web. By providing interoperable interface standards for application-to-application communication, Web services can be combined with component based software development to promote application interaction and integration both within and across enterprises. To make Web services for service-oriented computing operational, it is important that Web service repositories not only be well-structured but also provide efficient tools for developers to find reusable Web service components that meet their needs. As the potential of Web services for service-oriented computing is being widely recognized, the demand for effective Web service discovery mechanisms is concomitantly growing. A number of techniques for Web service discovery have been proposed, but the discovery challenge has not been satisfactorily addressed. Unfortunately, most existing solutions are either too rudimentary to be useful or too domain dependent to be generalizable. In this paper, we propose a Web service organizing framework that combines clustering techniques with string matching and leverages the semantics of the XML-based service specification in WSDL documents. We believe that this is one of the first attempts at applying data mining techniques in the Web service discovery domain. Our proposed approach has several appealing features : (1) It minimizes the requirement of prior knowledge from both service consumers and publishers; (2) It avoids exploiting domain dependent ontologies; and (3) It is able to visualize the semantic relationships among Web services. We have developed a prototype system based on the proposed framework using an unsupervised artificial neural network and empirically evaluated the proposed approach and tool using real Web service descriptions drawn from operational Web service registries. We report on some preliminary results demonstrating the efficacy of the proposed approach.

  • PDF

Development of Intelligent Job Classification System based on Job Posting on Job Sites (구인구직사이트의 구인정보 기반 지능형 직무분류체계의 구축)

  • Lee, Jung Seung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.123-139
    • /
    • 2019
  • The job classification system of major job sites differs from site to site and is different from the job classification system of the 'SQF(Sectoral Qualifications Framework)' proposed by the SW field. Therefore, a new job classification system is needed for SW companies, SW job seekers, and job sites to understand. The purpose of this study is to establish a standard job classification system that reflects market demand by analyzing SQF based on job offer information of major job sites and the NCS(National Competency Standards). For this purpose, the association analysis between occupations of major job sites is conducted and the association rule between SQF and occupation is conducted to derive the association rule between occupations. Using this association rule, we proposed an intelligent job classification system based on data mapping the job classification system of major job sites and SQF and job classification system. First, major job sites are selected to obtain information on the job classification system of the SW market. Then We identify ways to collect job information from each site and collect data through open API. Focusing on the relationship between the data, filtering only the job information posted on each job site at the same time, other job information is deleted. Next, we will map the job classification system between job sites using the association rules derived from the association analysis. We will complete the mapping between these market segments, discuss with the experts, further map the SQF, and finally propose a new job classification system. As a result, more than 30,000 job listings were collected in XML format using open API in 'WORKNET,' 'JOBKOREA,' and 'saramin', which are the main job sites in Korea. After filtering out about 900 job postings simultaneously posted on multiple job sites, 800 association rules were derived by applying the Apriori algorithm, which is a frequent pattern mining. Based on 800 related rules, the job classification system of WORKNET, JOBKOREA, and saramin and the SQF job classification system were mapped and classified into 1st and 4th stages. In the new job taxonomy, the first primary class, IT consulting, computer system, network, and security related job system, consisted of three secondary classifications, five tertiary classifications, and five fourth classifications. The second primary classification, the database and the job system related to system operation, consisted of three secondary classifications, three tertiary classifications, and four fourth classifications. The third primary category, Web Planning, Web Programming, Web Design, and Game, was composed of four secondary classifications, nine tertiary classifications, and two fourth classifications. The last primary classification, job systems related to ICT management, computer and communication engineering technology, consisted of three secondary classifications and six tertiary classifications. In particular, the new job classification system has a relatively flexible stage of classification, unlike other existing classification systems. WORKNET divides jobs into third categories, JOBKOREA divides jobs into second categories, and the subdivided jobs into keywords. saramin divided the job into the second classification, and the subdivided the job into keyword form. The newly proposed standard job classification system accepts some keyword-based jobs, and treats some product names as jobs. In the classification system, not only are jobs suspended in the second classification, but there are also jobs that are subdivided into the fourth classification. This reflected the idea that not all jobs could be broken down into the same steps. We also proposed a combination of rules and experts' opinions from market data collected and conducted associative analysis. Therefore, the newly proposed job classification system can be regarded as a data-based intelligent job classification system that reflects the market demand, unlike the existing job classification system. This study is meaningful in that it suggests a new job classification system that reflects market demand by attempting mapping between occupations based on data through the association analysis between occupations rather than intuition of some experts. However, this study has a limitation in that it cannot fully reflect the market demand that changes over time because the data collection point is temporary. As market demands change over time, including seasonal factors and major corporate public recruitment timings, continuous data monitoring and repeated experiments are needed to achieve more accurate matching. The results of this study can be used to suggest the direction of improvement of SQF in the SW industry in the future, and it is expected to be transferred to other industries with the experience of success in the SW industry.

A Dynamic Management Method for FOAF Using RSS and OLAP cube (RSS와 OLAP 큐브를 이용한 FOAF의 동적 관리 기법)

  • Sohn, Jong-Soo;Chung, In-Jeong
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.2
    • /
    • pp.39-60
    • /
    • 2011
  • Since the introduction of web 2.0 technology, social network service has been recognized as the foundation of an important future information technology. The advent of web 2.0 has led to the change of content creators. In the existing web, content creators are service providers, whereas they have changed into service users in the recent web. Users share experiences with other users improving contents quality, thereby it has increased the importance of social network. As a result, diverse forms of social network service have been emerged from relations and experiences of users. Social network is a network to construct and express social relations among people who share interests and activities. Today's social network service has not merely confined itself to showing user interactions, but it has also developed into a level in which content generation and evaluation are interacting with each other. As the volume of contents generated from social network service and the number of connections between users have drastically increased, the social network extraction method becomes more complicated. Consequently the following problems for the social network extraction arise. First problem lies in insufficiency of representational power of object in the social network. Second problem is incapability of expressional power in the diverse connections among users. Third problem is the difficulty of creating dynamic change in the social network due to change in user interests. And lastly, lack of method capable of integrating and processing data efficiently in the heterogeneous distributed computing environment. The first and last problems can be solved by using FOAF, a tool for describing ontology-based user profiles for construction of social network. However, solving second and third problems require a novel technology to reflect dynamic change of user interests and relations. In this paper, we propose a novel method to overcome the above problems of existing social network extraction method by applying FOAF (a tool for describing user profiles) and RSS (a literary web work publishing mechanism) to OLAP system in order to dynamically innovate and manage FOAF. We employed data interoperability which is an important characteristic of FOAF in this paper. Next we used RSS to reflect such changes as time flow and user interests. RSS, a tool for literary web work, provides standard vocabulary for distribution at web sites and contents in the form of RDF/XML. In this paper, we collect personal information and relations of users by utilizing FOAF. We also collect user contents by utilizing RSS. Finally, collected data is inserted into the database by star schema. The system we proposed in this paper generates OLAP cube using data in the database. 'Dynamic FOAF Management Algorithm' processes generated OLAP cube. Dynamic FOAF Management Algorithm consists of two functions: one is find_id_interest() and the other is find_relation (). Find_id_interest() is used to extract user interests during the input period, and find-relation() extracts users matching user interests. Finally, the proposed system reconstructs FOAF by reflecting extracted relationships and interests of users. For the justification of the suggested idea, we showed the implemented result together with its analysis. We used C# language and MS-SQL database, and input FOAF and RSS as data collected from livejournal.com. The implemented result shows that foaf : interest of users has reached an average of 19 percent increase for four weeks. In proportion to the increased foaf : interest change, the number of foaf : knows of users has grown an average of 9 percent for four weeks. As we use FOAF and RSS as basic data which have a wide support in web 2.0 and social network service, we have a definite advantage in utilizing user data distributed in the diverse web sites and services regardless of language and types of computer. By using suggested method in this paper, we can provide better services coping with the rapid change of user interests with the automatic application of FOAF.