• Title/Summary/Keyword: Memory Structure

X-tree Diff: An Efficient Change Detection Algorithm for Tree-structured Data (X-tree Diff: 트리 기반 데이터를 위한 효율적인 변화 탐지 알고리즘)

  • Lee, Suk-Kyoon; Kim, Dong-Ah
    • The KIPS Transactions: Part C / v.10C no.6 / pp.683-694 / 2003
  • We present X-tree Diff, a change detection algorithm for tree-structured data. Our work is motivated by the need to monitor massive volumes of web documents and to detect suspicious changes, known as defacement attacks on web sites. In this context, the algorithm must be highly efficient in both speed and memory use. X-tree Diff uses a special ordered labeled tree, the X-tree, to represent XML/HTML documents. Each X-tree node has a special field, tMD, which stores a 128-bit hash value representing the structure and data of its subtree, so that identical subtrees from the old and new versions can be matched. During this process, X-tree Diff applies the Rule of Delaying Ambiguous Matchings: it performs exact matching only where a node in the old version has a one-to-one correspondence with a node in the new version, delaying all other matchings. This drastically reduces the possibility of wrong matchings. X-tree Diff propagates such exact matchings upwards in Step 2 and obtains further matchings downwards from the roots in Step 3. In Step 4, the nodes to be inserted or deleted are decided. We also show that X-tree Diff runs in O(n) time, where n is the number of nodes in the X-trees, in the worst case as well as in the average case. This result is even better than that of the BULD Diff algorithm, which is O(n log n) in the worst case. We experimented with X-tree Diff on real data, about 11,000 home pages from about 20 web sites, rather than on synthetic documents manipulated for experimentation. Currently, the X-tree Diff algorithm is used in a commercial hacking detection system, WIDS (Web-Document Intrusion Detection System), which finds changes that occur in registered web sites and reports suspicious changes to users. (A minimal sketch of the subtree-hash matching idea appears below.)
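The subtree-hash matching described in the abstract can be illustrated with a short sketch. This is a hedged approximation, not the authors' implementation: the XNode class and its fields are assumptions, and MD5 merely stands in for whatever 128-bit hash the paper uses.

```python
# Minimal sketch of the tMD idea: each node stores a 128-bit hash of its
# label, text, and children's hashes, so identical subtrees in the old and
# new trees can be matched in a single pass. Names here are hypothetical.
import hashlib
from dataclasses import dataclass, field

@dataclass
class XNode:
    label: str                       # element name, e.g. "div"
    text: str = ""                   # character data of the node
    children: list = field(default_factory=list)
    tmd: bytes = b""                 # 128-bit subtree hash (MD5 here)

def compute_tmd(node: XNode) -> bytes:
    """Post-order pass: hash the structure and data of the subtree."""
    h = hashlib.md5()
    h.update(node.label.encode())
    h.update(node.text.encode())
    for child in node.children:
        h.update(compute_tmd(child))  # children hashed in document order
    node.tmd = h.digest()
    return node.tmd

def match_identical_subtrees(old_root: XNode, new_root: XNode):
    """Match subtrees whose tMD occurs exactly once in each version;
    ambiguous candidates are delayed, as in the Rule of Delaying
    Ambiguous Matchings."""
    def index(root):
        table, stack = {}, [root]
        while stack:
            n = stack.pop()
            table.setdefault(n.tmd, []).append(n)
            stack.extend(n.children)
        return table

    compute_tmd(old_root)
    compute_tmd(new_root)
    old_idx, new_idx = index(old_root), index(new_root)
    matches = []
    for tmd, old_nodes in old_idx.items():
        new_nodes = new_idx.get(tmd, [])
        if len(old_nodes) == 1 and len(new_nodes) == 1:  # one-to-one only
            matches.append((old_nodes[0], new_nodes[0]))
    return matches

# Usage: the unchanged <p> subtree matches; the roots do not.
old = XNode("html", children=[XNode("p", "hello")])
new = XNode("html", children=[XNode("p", "hello"), XNode("p", "world")])
print(len(match_identical_subtrees(old, new)))  # -> 1
```

Restricting the first pass to one-to-one hash matches is what makes wrong matchings unlikely; the paper's later upward and downward propagation steps then resolve the delayed, ambiguous candidates.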

A Study on the Cultural Landscapes of Scenic Sites on 『Joseon myeongseungsiseon(朝鮮名勝詩選)』 at the Japanese Colonial Period - A Case of Cheonan, Chungnam Province - (일제강점기 『조선명승시선(朝鮮名勝詩選)』에 나타나는 명승고적의 문화경관 연구 - 충청남도 천안을 사례로 -)

  • Lee, Hang-Lyoul
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.37 no.2 / pp.40-53 / 2019
  • This study investigates changes in Scenic Spots by using the "Sinjeungdonggukyeojiseungnam (新增東國輿地勝覽)" and the "Joseonhwanyeoseungnam (朝鮮寰輿勝覽)" to interpret the "Joseonmyeongseungsiseon (朝鮮名勝詩選, 1915)". An examination of the historical context in which the "Joseonmyeongseungsiseon" was published shows that it documents Japanese memories of the Sino-Japanese War (淸日戰爭) of 1894, reflecting the 'policy of assimilation' pursued by the Japanese Government-General of Korea after the Japanese annexation of Korea (1910). Detailed information about the author, 'Narushima Sagimura (成島鷺村)', can be found in the preface. Within the "Joseonmyeongseungsiseon", the largest share of description is devoted to 'Anseongdo' (15 lines), a site that carries memories of war such as the First Sino-Japanese War. Thirteen Scenic Spots in the Cheonan area are mentioned in both the "Sinjeungdonggukyeojiseungnam" and the "Joseonhwanyeoseungnam", and most of the entries share a similar structure. However, 'Honggyeongwon (弘慶院)' and 'Seonghwanyeok (成歡驛)' combine the conventional Joseon Dynasty view of the landscape with additional historical context concerning the 'Jeongyujaeran (丁酉再亂)' or the First Sino-Japanese War, which in turn brings out the placeness of these Scenic Spots. Among the newly described Scenic Spots, 'Anseongdo (安城渡)' focuses on the memory of the 'Anseongcheon Battle', Japan's first victory in the Sino-Japanese War. In particular, by introducing the poetry of 'Sinobu Shunpei', it heightens appreciation by emphasizing the direct correlation between placeness and the poem itself. The Joseon Dynasty poems number ten in total; their titles and subject matter all relate to historical spots, and their appreciation likewise deepens when they are interpreted with an understanding of the historical context. However, their contextual meanings are obscured because the related structures are divided across separate pages. The scenic and historic sites take many forms depending on their location, meaning, size, and surrounding conditions: service spaces for travelers; places for sightseeing, relaxation, or return journeys; temple spaces for paying respects or holding memorial services; fortress facilities for defense and protection; fishing grounds; and old battlefields. The cultural landscape of Cheonan is notably diverse, given that it also encompasses battle sites associated with the Donghak peasant army (東學農民). It is necessary to establish policies for the preservation and restoration of local cultural assets based on these findings.

The aesthetics of irony in repetition and the difference of Oh! Soojung (<오! 수정>의 아이러니 미학 - 반복과 차이의 구조를 중심으로)

  • Suh, MyungSoo
    • 기호학연구 / no.57 / pp.121-153 / 2018
  • In terms of the story told, Oh! Soojung (Virgin Stripped Bare by Her Bachelors) is a film of the ideology of masculinity. From the point of view of the manner of presenting the story, however, Oh! Soojung is a film that aims to devalue this ideology. How is this possible? Through the principle of irony: the speaker, by saying P, wants the listener to hear Q, which devalues and contradicts P. Our study attempts to explain the process of interpreting the irony in the film. The ideology of a film arises when presupposed content becomes its subject. For example, Cendrillon, which tells the story of a girl who marries a prince, presupposes that the girl, Cendrillon, is obedient. The subject of this story is the presupposition /girls who want to be happy must be obedient/, which represents the ideology of masculinity. Presupposed content thus imposes a collective and conservative value on the public, as its enunciator belongs to the collective voice. Since ironisation occurs when the utterance itself is annulled, one must also deny or cancel the story told in Oh! Soojung: /Jeahun, who is rich, and Soojung, who is obedient and a virgin, have become lovers/. Since there is no semantic mark of irony within the utterance, irony is a voice that comes from without; this is how we understand irony in a purely pragmatic way. The outer voices are of two kinds: the way the story is built (questions of focalization, ocularization, and auricularization) and the way the story is presented (questions of order, frequency, and plot). Our study focuses on the question of frequency in Oh! Soojung, which has a repetition structure in which the memory of Jeahun and that of Soojung are presented one after the other. Since the memories of the two characters are not identical, the repetition is accompanied by differences. These differences at first allow the audience to build their own story from the diegesis of the film, then plunge the audience into a confusion in which we cannot be certain of what we see and know in the diegesis, and finally make that knowledge questionable. As for repetition, for it to remain valid in terms of the informativeness of the utterance, it must deny the existence of the previous occurrence. This is how repetition cancels itself and, consequently, the utterance. We see that the irony of Oh! Soojung arises through repetition with differences, which cancels the story of the film.

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin; Han, Seungho; Cui, Yun; Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from carrying out computer system inspection and process optimization to providing customized services for users. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data of banks. Most of the log data generated during banking operations come from handling a client's business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing the client's business, a separate log data processing system needs to be established. However, in existing computing environments it is difficult to realize flexible storage expansion for a massive amount of unstructured log data and to execute the considerable number of functions needed to categorize and analyze the stored data. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to process using the existing computing infrastructure's analysis tools and management systems. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, including the ability to expand resources such as storage space and memory under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when a real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by generating copies of the block units of the aggregated log data, the proposed system offers automatic restore functions that allow it to continue operating after recovering from a malfunction. Finally, by establishing a distributed database using NoSQL-based MongoDB, the proposed system provides methods for effectively processing unstructured log data. Relational databases such as MySQL have complex schemas that are inappropriate for processing unstructured log data; moreover, their strict schemas make it difficult to expand nodes when rapidly growing data must be distributed across various nodes. NoSQL does not provide the complex computations that relational databases may provide, but it can easily expand the database through node dispersion when the amount of data increases rapidly; it is a non-relational database with a structure appropriate for processing unstructured data. NoSQL data models are usually classified into key-value, column-oriented, and document-oriented types. Of these, the proposed system uses MongoDB, a representative document-oriented store with a free schema structure. MongoDB is adopted because its flexible schema makes unstructured log data easy to process, it facilitates node expansion when the amount of data grows rapidly, and it provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over the entire client business process of each bank are sent to the cloud server, the log collector module collects and classifies the data according to log type and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the results of the log analysis performed by the MongoDB module, the Hadoop-based analysis module, and the MySQL module, per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted as graphs according to the user's various analysis conditions. The aggregated log data in the MongoDB module are processed in parallel-distributed fashion by the Hadoop-based analysis module. A comparative evaluation against a log data processing system that uses only MySQL, measuring log data insert and query performance, demonstrates the proposed system's superiority. Moreover, an optimal chunk size is confirmed through a log data insert performance evaluation of MongoDB for various chunk sizes. (A minimal sketch of the collector's routing idea appears below.)
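As a rough illustration of the routing the abstract describes, here is a minimal sketch under stated assumptions: pymongo and a reachable mongod are available, route_to_mysql is a hypothetical stand-in for the paper's MySQL module, and all collection and field names are invented.

```python
# Sketch of the log collector path: classify incoming records and route
# real-time logs to the (stubbed) MySQL module, while aggregated
# unstructured logs land in a schema-free MongoDB collection.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
logs = client["bank_logs"]["unstructured"]        # schema-free collection

def route_to_mysql(record: dict) -> None:
    """Hypothetical stand-in for the MySQL module (real-time analysis)."""
    print("real-time path:", record)

def collect(record: dict) -> None:
    """Log collector module: classify by type and dispatch."""
    record.setdefault("ts", datetime.now(timezone.utc))
    if record.get("type") == "realtime":
        route_to_mysql(record)                    # needs immediate analysis
    else:
        logs.insert_one(record)                   # flexible schema: any fields

# Two records with different shapes land in the same MongoDB collection,
# which is the point of the document-oriented, free-schema model.
collect({"type": "batch", "branch": "Seoul", "op": "transfer", "ms": 42})
collect({"type": "batch", "teller": "t17", "error": {"code": 500}})
```

In a sharded deployment, enabling MongoDB's Auto-Sharding on this collection would give the storage expansion behavior the abstract attributes to the system.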

Performance analysis of Frequent Itemset Mining Technique based on Transaction Weight Constraints (트랜잭션 가중치 기반의 빈발 아이템셋 마이닝 기법의 성능분석)

  • Yun, Unil; Pyun, Gwangbum
    • Journal of Internet Computing and Services / v.16 no.1 / pp.67-74 / 2015
  • In recent years, frequent itemset mining that considers the importance of each item has been intensively studied as one of the important issues in the data mining field. According to the strategy used to exploit item importance, itemset mining approaches are classified as follows: weighted frequent itemset mining, frequent itemset mining using transactional weights, and utility itemset mining. In this paper, we perform an empirical analysis of frequent itemset mining algorithms based on transactional weights. These algorithms compute transactional weights from the weight of each item in large databases and discover weighted frequent itemsets on the basis of item frequency and the weight of each transaction. Consequently, the importance of a given transaction can be seen through database analysis, because a transaction's weight is higher if it contains many items with high weights. We not only analyze the advantages and disadvantages of, but also compare the performance of, the best-known algorithms in the field of frequent itemset mining based on transactional weights. As a representative of frequent itemset mining using transactional weights, WIS introduced the concept and strategies of transactional weights. In addition, there are various other state-of-the-art algorithms, WIT-FWIs, WIT-FWIs-MODIFY, and WIT-FWIs-DIFF, for extracting itemsets with weight information. To mine weighted frequent itemsets efficiently, these three algorithms use a special lattice-like data structure called the WIT-tree. Because each WIT-tree node holds item information such as the item and transaction IDs, the algorithms need no additional database scan once the WIT-tree has been constructed. In particular, whereas traditional algorithms perform many database scans to mine weighted itemsets, the WIT-tree-based algorithms avoid this overhead by reading the database only once. Additionally, the algorithms generate each new itemset of length N+1 from two different itemsets of length N. To discover new weighted itemsets, WIT-FWIs performs the itemset combination process using the information of the transactions that contain all the itemsets. WIT-FWIs-MODIFY adds a feature that reduces the operations needed to calculate the frequency of a new itemset. WIT-FWIs-DIFF utilizes a technique based on the difference of two itemsets. To compare and analyze the performance of the algorithms in various environments, we use real datasets of two types (dense and sparse) and measure runtime and maximum memory usage. Moreover, a scalability test is conducted to evaluate the stability of each algorithm as the database size changes. As a result, WIT-FWIs and WIT-FWIs-MODIFY show the best performance on the dense dataset, while on the sparse dataset WIT-FWIs-DIFF mines more efficiently than the other algorithms. Compared to the WIT-tree-based algorithms, WIS, which is based on the Apriori technique, has the worst efficiency because it requires far more computations than the others on average. (A small illustration of the transaction-weight idea appears below.)
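The transaction-weight scheme the abstract analyzes can be sketched briefly. This is a naive enumeration for illustration only, not the WIT-tree algorithms: the item weights, toy database, and threshold are invented, and a transaction's weight is taken here as the mean of its item weights, one common definition consistent with the abstract's description.

```python
# Sketch: a transaction's weight is the average of its items' weights, and
# an itemset is weighted-frequent when the summed weights of the
# transactions containing it reach a threshold. All values are made up.
from itertools import combinations

ITEM_WEIGHT = {"a": 0.9, "b": 0.6, "c": 0.4, "d": 0.8}    # assumed weights
DB = [{"a", "b"}, {"a", "b", "c"}, {"b", "c", "d"}, {"a", "d"}]

def tw(transaction: set) -> float:
    """Transaction weight: mean weight of the items it contains."""
    return sum(ITEM_WEIGHT[i] for i in transaction) / len(transaction)

def weighted_support(itemset: set) -> float:
    """Sum of transaction weights over transactions containing itemset."""
    return sum(tw(t) for t in DB if itemset <= t)

def weighted_frequent_itemsets(minws: float) -> dict:
    """Naive enumeration; the WIT-tree methods reach the same answers
    while scanning the database only once."""
    items = sorted({i for t in DB for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            ws = weighted_support(set(combo))
            if ws >= minws:
                result[combo] = round(ws, 3)
    return result

print(weighted_frequent_itemsets(minws=1.5))   # e.g. {('a',): ..., ('b',): ...}
```

The quadratic blow-up of enumerating all candidate itemsets and rescanning the database per candidate is exactly the overhead that the WIT-tree structure, with its per-node transaction-ID lists, is designed to avoid.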