Search | Korea Science

Implementation of a Parallel Web Crawler for the Odysseus Large-Scale Search Engine (오디세우스 대용량 검색 엔진을 위한 병렬 웹 크롤러의 구현)

Shin, Eun-Jeong;Kim, Yi-Reun;Heo, Jun-Seok;Whang, Kyu-Young
- Journal of KIISE:Computing Practices and Letters / v.14 no.6 / pp.567-581 / 2008
As the size of the web grows explosively, search engines are becoming increasingly important as the primary means of retrieving information from the Internet. A search engine periodically downloads web pages and stores them in a database to provide readers with up-to-date search results. The web crawler is the program that downloads and stores web pages for this purpose. A large-scale search engine uses a parallel web crawler to retrieve the collection of web pages while maximizing the download rate. However, the service architecture and experimental analysis of parallel web crawlers have not been fully discussed in the literature. In this paper, we propose an architecture for a parallel web crawler and discuss implementation issues in detail. The proposed parallel web crawler is based on the coordinator/agent model, which uses multiple machines to download web pages in parallel. The coordinator/agent model consists of multiple agent machines that collect web pages and a single coordinator machine that manages them. The parallel web crawler consists of three components: a crawling module for collecting web pages, a converting module for transforming the web pages into a database-friendly format, and a ranking module for rating web pages based on their relative importance. We explain each component of the parallel web crawler and its implementation methods in detail. Finally, we conduct extensive experiments to analyze the effectiveness of the parallel web crawler. The experimental results clarify the merit of our architecture: the proposed parallel web crawler scales with the number of web pages to crawl and the number of machines used.
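
The coordinator/agent split described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the coordinator divides work, so the host-hash policy below is an assumption, and the names `assign_agent` and `partition` are hypothetical.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit


def assign_agent(url: str, num_agents: int) -> int:
    """Map a URL to an agent index by hashing its host name (assumed policy)."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_agents


def partition(urls, num_agents):
    """Coordinator step: split the URL frontier into one work list per agent."""
    work = defaultdict(list)
    for url in urls:
        work[assign_agent(url, num_agents)].append(url)
    return dict(work)


frontier = [
    "http://example.com/a",
    "http://example.com/b",
    "http://example.org/index.html",
]
plan = partition(frontier, num_agents=4)
```

Hashing on the host keeps all pages of one site on the same agent, which is one common way to keep per-site politeness limits local to a single machine.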

Rule-base Expert System for Privacy Violation Certainty Estimation (개인정보유출 확신도 도출을 위한 전문가시스템개발)

Kim, Jin-Hyung;Lee, Alexander;Kim, Hyung-Jong;Hwang, Jun
- Journal of the Korea Institute of Information Security & Cryptology / v.19 no.4 / pp.125-135 / 2009
Logs from various security systems can reveal attempts to access private data without authorization. These logs can serve as factors for deriving the confidence that a certain IP address is involved in such an attempt. This paper presents a rule-based expert system that derives a privacy violation confidence from the logs of various security systems. Generally, a security manager analyzes and synthesizes the log information that various security systems hold about a certain IP address to find its relevance to privacy violation cases. The security managers' knowledge of handling log information can be transformed into rules that automate the log analysis and synthesis. In particular, the coverage of log analysis for personal information leakage is not very broad compared with the analysis of general intrusion attempts. Thus, the number of rules that must be authored is relatively small. In this paper, we derive the correlation among logs from the IDS, firewall, and web server from the viewpoint of privacy protection and implement a rule-based expert system based on the derived correlation. Consequently, we define a method for calculating a score that represents the relevance between an IP address and a privacy violation. The user interface (UI) of the expert system supports managing the rule set, including insertion, deletion, and update of rules.
https://doi.org/10.13089/JKIISC.2009.19.4.125
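
A rule set that scores an IP address from IDS, firewall, and web server logs might look like the sketch below. The rule names, the certainty weights, and the MYCIN-style combination formula are all assumptions made for illustration; the abstract does not give the actual rules or scoring method.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    matches: Callable[[dict], bool]
    certainty: float  # hypothetical weight in [0, 1]


# Hypothetical rules; real rules would come from the security managers.
RULES = [
    Rule("ids_sql_injection",
         lambda e: e["source"] == "ids" and e["event"] == "sql_injection", 0.6),
    Rule("firewall_denied",
         lambda e: e["source"] == "firewall" and e["event"] == "denied", 0.2),
    Rule("web_bulk_download",
         lambda e: e["source"] == "web" and e["event"] == "bulk_download", 0.5),
]


def score_ip(ip, log_entries, rules=RULES):
    """Combine the certainty of every rule that fires for this IP address."""
    score = 0.0
    fired = set()
    for entry in log_entries:
        if entry["ip"] != ip:
            continue
        for rule in rules:
            if rule.name not in fired and rule.matches(entry):
                fired.add(rule.name)
                # Assumed combination: each new rule closes part of the gap to 1.
                score = score + rule.certainty * (1.0 - score)
    return score


logs = [
    {"ip": "10.0.0.5", "source": "ids", "event": "sql_injection"},
    {"ip": "10.0.0.5", "source": "web", "event": "bulk_download"},
    {"ip": "10.0.0.9", "source": "web", "event": "page_view"},
]
suspect_score = score_ip("10.0.0.5", logs)
```

Keeping rules as data makes the insertion, deletion, and update operations mentioned in the abstract plain list edits.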

Location Prediction of Mobile Objects using the Cubic Spline Interpolation (3차 스플라인 보간법을 이용한 이동 객체의 위치 추정)

안윤애;박정석;류근호
- Journal of KIISE:Databases / v.31 no.5 / pp.479-491 / 2004
Location information of mobile objects is applied to vehicle tracking, digital battlefields, location-based services, and telematics. Their location coordinates are periodically measured and stored in the application systems. A linear function is mainly used to estimate location information that is not stored in the system for the query time point. However, a new method is needed to reduce the uncertainty of the location representation, because location estimation by a linear function introduces estimation error. This paper proposes a way of applying cubic spline interpolation to reduce the deviation of location estimation by a linear function. First, we define the location information of a mobile object moving in two-dimensional space. Next, we apply cubic spline interpolation to location estimation in the proposed data model and describe the algorithm of the estimation operation. Finally, we evaluate the precision of this estimation operation experimentally. The experiments yield more accurate results than the linear-function method, even though the proposed location estimation function uses only a small amount of information. The proposed method has the advantage of reducing the cost of data storage and communication for managing the location information of mobile objects.
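
As a rough illustration of estimating a position between samples, the sketch below fits one natural cubic spline to x(t) and another to y(t). The natural boundary condition, the sample values, and the function names are assumptions; the paper's own data model and algorithm are not given in the abstract.

```python
from bisect import bisect_right


def natural_cubic_spline(ts, ys):
    """Return a function evaluating the natural cubic spline through (ts, ys).

    ts must be strictly increasing and contain at least two points.
    """
    n = len(ts) - 1
    h = [ts[i + 1] - ts[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * (ys[i + 1] - ys[i]) / h[i] - 3.0 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the second-order coefficients.
    l = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2.0 * (ts[i + 1] - ts[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution; c[0] and c[n] stay 0 (natural boundary condition).
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(t):
        # Pick the segment containing t, clamping to the first or last segment.
        j = min(max(bisect_right(ts, t) - 1, 0), n - 1)
        dt = t - ts[j]
        return ys[j] + b[j] * dt + c[j] * dt ** 2 + d[j] * dt ** 3

    return evaluate


# Hypothetical samples: time stamps and measured (x, y) coordinates.
times = [0.0, 1.0, 2.0, 3.0]
xs = [0.0, 1.0, 4.0, 9.0]
ys = [0.0, 2.0, 4.0, 6.0]
spline_x = natural_cubic_spline(times, xs)
spline_y = natural_cubic_spline(times, ys)


def estimate_location(t):
    """Estimate the (x, y) position at a query time between samples."""
    return spline_x(t), spline_y(t)
```

Calling `estimate_location(1.5)` returns a position between the second and third samples; at the sample times the spline reproduces the stored coordinates.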

Inland Logistics Forwarding System based on Supply Chain Management : ILOF (공급사슬기반의 육상물류중개시스템 개발에 관한 연구)

박남규;최형림;김현수;박영재;손형수
- Journal of Information Technology Application / v.3 no.2 / pp.67-82 / 2001
The ILOF project addresses the needs of logistics organizations to reduce information processing time, improve the added and residual value of information, and reduce processing and transportation costs. It treats the supply chain information systems shared by vertical partners as an important entity whose performance and optimization significantly affect the efficiency and performance of logistics industries. This paper deals with logistics information exchange systems based on supply chain management, focusing on sharing databases and processes between partners such as shippers, logistics brokers, transportation companies, and shipping companies, in order to smooth the information flow, enhance consumer service, and reduce communication fees and labour costs. The contribution of this research is a model for logistics information exchange, including an entity relationship diagram, a data flow diagram, and functions, that can facilitate the formulation of a customer-driven supply chain information network, thereby enhancing the competitive edge of companies in logistics industries on a local and global basis.

Design of a Question-Answering System based on RAG Model for Domestic Companies

Gwang-Wu Yi;Soo Kyun Kim
- Journal of the Korea Society of Computer and Information / v.29 no.7 / pp.81-88 / 2024
Despite the rapid growth of the generative AI market and significant interest from domestic companies and institutions, concerns about the provision of inaccurate information and potential information leaks have emerged as major factors hindering the adoption of generative AI. To address these issues, this paper designs and implements a question-answering system based on the Retrieval-Augmented Generation (RAG) architecture. The proposed method constructs a knowledge database using Korean sentence embeddings and retrieves information relevant to queries through optimized searches, which is then provided to the generative language model. Additionally, it allows users to directly manage the knowledge database to efficiently update changing business information, and it is designed to operate in a private network to reduce the risk of corporate confidential information leakage. This study aims to serve as a useful reference for domestic companies seeking to adopt and utilize generative AI.
https://doi.org/10.9708/jksci.2024.29.07.081
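
The retrieve-then-generate flow of a RAG system can be sketched in a few lines. The word-count `embed` function below is a toy stand-in for the Korean sentence embeddings mentioned in the abstract, the knowledge base entries are invented, and no language model is called; the sketch only builds the prompt that would be passed to one.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Assemble the text handed to the generative model (not called here)."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical in-house knowledge base entries.
knowledge_base = [
    "Expense reports are due on the fifth business day of each month.",
    "The VPN must be used when working outside the office.",
    "Annual leave requests require manager approval.",
]
question = "When are expense reports due?"
top = retrieve(question, knowledge_base, k=1)
prompt = build_prompt(question, top)
```

Because the knowledge base is an ordinary list here, updating business information amounts to editing its entries, which mirrors the user-managed database described in the abstract.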

A Study on the Implementation of Law Information Retrieval System (법령 정보검색 시스템 구현에 관한 연구)

Min, Jae-Hong;Cho, Pyung-Dong;Yang, Jin-Hyuk;Park, Pyung-Koo;Chung, In-Jeong
- The Transactions of the Korea Information Processing Society / v.7 no.11S / pp.3702-3713 / 2000
Telecommunications standards comprise two types of regulations. One is law, enacted by the government, which all telecommunications-related industries must observe. The other is recommendatory standards, formulated by either a government agency or standardization organizations; observance of these standards is not obligatory. However, technical standards are strict laws and ordinances based on common judgement and various conditions for evaluating levels and limits. This paper deals with enhancing the productivity of enacting and revising technical standards. Through a database of the related information, we secure information continuity and the public property of cyberspace for the public. In this paper, we also classify recent data from domestic and foreign websites and offer four different methods for the information retrieval and management system. The four retrieval methods suggested in this paper are itemized keyword retrieval, hierarchical retrieval, regulatory keyword retrieval, and chronological keyword retrieval. These retrieval methods provide the public with information on the enactment and amendment of laws and regulations in cyberspace, thereby guaranteeing the sharing of information. Finally, the important feature of the information retrieval system implemented in this paper is its capability to update laws and regulations online through the internet.

Web-enabled Healthcare System for Hypertension: Hyperlink-based Inference Approach (고혈압관리를 위한 웹 기반의 지능정보시스템: 하이퍼링크를 이용한 추론방식으로)

Song, Yong-Uk;Ho, Seung-Hee;Chae, Young-Moon;Cho, Kyoung-Won
- Journal of Intelligence and Information Systems / v.9 no.1 / pp.91-107 / 2003
In this study, a web-enabled healthcare system for the management of hypertension was implemented through a hyperlink-based inference approach. The hyperlink-based inference platform was implemented using the hypertext capacity of HTML, which ensured accessibility, multimedia facilities, fast response, stability, ease of use and upgrade, and platform independence for expert systems. Many HTML documents, hyperlinked to each other according to expert rules, were uploaded beforehand to perform the hyperlink-based inference. The HTML documents were uploaded and maintained automatically by our proprietary tool, the Web-Based Inference System (WeBIS), which supports a graphical user interface (GUI) for entering and editing decision graphs. Nevertheless, editing the decision graph with the GUI tool is a time-consuming and tedious chore when the knowledge engineer must perform it manually. Accordingly, this research implemented an automatic generator of the decision graph for the management of hypertension. As a result, this research suggests a methodology for developing Web-enabled healthcare systems using the hyperlink-based inference approach and, as an example, implements a Web-enabled healthcare system for hypertension, a platform that performed especially well in speed and stability.
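
The idea of hyperlink-based inference, where each answer is a link to the page for the next rule, can be sketched by rendering a decision graph into static HTML pages. The graph contents, file naming, and function names below are hypothetical placeholders and carry no clinical meaning; WeBIS itself is proprietary and its format is not described in the abstract.

```python
from html import escape

# Hypothetical toy decision graph; node texts are placeholders only.
DECISION_GRAPH = {
    "start": {
        "question": "Is the measured value above the configured threshold?",
        "links": {"Yes": "confirm", "No": "done"},
    },
    "confirm": {
        "question": "Was the measurement repeated on a second visit?",
        "links": {"Yes": "refer", "No": "repeat"},
    },
    "done": {"conclusion": "Placeholder conclusion A."},
    "refer": {"conclusion": "Placeholder conclusion B."},
    "repeat": {"conclusion": "Placeholder conclusion C."},
}


def render_page(node_id, graph):
    """Render one node as an HTML page; each answer is a hyperlink."""
    node = graph[node_id]
    if "conclusion" in node:
        body = f"<p>{escape(node['conclusion'])}</p>"
    else:
        items = "".join(
            f'<li><a href="{escape(target)}.html">{escape(label)}</a></li>'
            for label, target in node["links"].items()
        )
        body = f"<p>{escape(node['question'])}</p><ul>{items}</ul>"
    return f"<html><body>{body}</body></html>"


def render_site(graph):
    """Generate every page up front, keyed by file name."""
    return {f"{node_id}.html": render_page(node_id, graph) for node_id in graph}


pages = render_site(DECISION_GRAPH)
```

Since every page is generated beforehand, following a chain of links performs the inference with no server-side logic, which is consistent with the speed and stability the abstract reports.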

Fuzzy reasoning for assessing bulk tank milk quality (Bulk tank milk의 품질평가를 위한 퍼지기반 추론)

Kim Taioun;Jung Daeyou;Jayarao Bhushan M.
- Journal of Intelligence and Information Systems / v.10 no.3 / pp.39-57 / 2004
Many dairy producers periodically receive information about their bulk tank milk with reference to bulk tank somatic cell counts, standard plate counts, and preliminary incubation counts. This information, when collected over a period of time and combined with bulk tank mastitis culture reports, can become a significant knowledge base. Several guidelines have been proposed to interpret farm bulk tank milk bacterial counts. However, many of the suggested interpretive criteria lack validation and provide little insight into the interrelationship between different groups of bacteria found in bulk tank milk. Also, the linguistic terms describing bulk tank milk quality or herd management status, such as excellent, good, or unsatisfactory, are rather vague or fuzzy. The objective of this paper was to develop a set of fuzzy descriptors to evaluate bulk tank milk quality and the herd's milking practice based on bulk tank milk microbiology test results. Thus, fuzzy-logic-based reasoning methodologies were developed on a fuzzy inference engine. Input parameters were bulk tank somatic cell counts, standard plate counts, preliminary incubation counts, laboratory pasteurization counts, non agalactiae-Streptococci and Streptococci like organisms, and Staphylococcus aureus. Based on the input data, bulk tank milk quality was classified as excellent, good, milk cooling problem, cleaning problem, environmental mastitis, or mixed with mastitis and cleaning problems. The results from fuzzy reasoning would provide a reference regarding good management practice for milk producers, dairy health consultants, and veterinarians.
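
To show how a vague term such as "excellent" or "good" becomes a fuzzy descriptor, the sketch below fuzzifies a single input, the somatic cell count. The breakpoints (in thousands of cells per mL) are invented for illustration and are not the paper's criteria or validated thresholds; the function names are hypothetical.

```python
def falling_shoulder(x, full, zero):
    """Membership 1 up to `full`, falling linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising_shoulder(x, zero, full):
    """Membership 0 up to `zero`, rising linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def triangular(x, left, peak, right):
    """Membership rising from `left` to `peak`, then falling to `right`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzify_scc(scc):
    """Degrees of membership for a somatic cell count (hypothetical breakpoints)."""
    return {
        "excellent": falling_shoulder(scc, 100.0, 200.0),
        "good": triangular(scc, 100.0, 250.0, 400.0),
        "unsatisfactory": rising_shoulder(scc, 300.0, 500.0),
    }


memberships = fuzzify_scc(150.0)
best_label = max(memberships, key=memberships.get)
```

A full inference engine would fuzzify every input parameter this way and then combine the memberships through rules; only the first step is shown here.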

Construction of the Honam Culture Information System(HCIS) using Web GIS (WebGIS를 이용한 호남문화정보시스템(HCIS) 구축)

Yang, Hea-Kun;Shin, Hye-Jin
- Journal of the Korean association of regional geographers / v.12 no.2 / pp.291-304 / 2006
Studies on culture information have so far dealt mainly with individual items of culture information and have focused on zones drawn on paper maps. As a result, intuitive map-based analysis and extremely restricted spatial measurement are of limited use in summarizing and utilizing complicated and voluminous cultural materials systematically and scientifically. The introduction of GIS can be regarded as indispensable to solving this problem, because it can analyze the temporal-spatial dynamics of culture information as a whole and support the construction of an effective management system for regional culture information. In particular, supplying two-way rather than one-way information is becoming more and more important in today's society, where values are diversified and culture changes faster owing to the advanced information industry. Accordingly, this study is meaningful in that a WebGIS-based regional culture information system allows users on the internet to perform temporal-spatial analysis and spatial analysis of various culture information. A regional culture information system such as the one for the Honam region can not only contribute to comparative studies between regions and to the creation of new information through statistical analysis across culture elements, but also allow easy and comprehensive access to regional information.

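Management of Integrated Summary Data Based on the Wavelet Transform for Internet Query Processing (인터넷 질의 처리를 위한 웨이블릿 변환에 기반한 통합 요약정보의 관리)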

Joe, Moon-Jeung;Whang, Kyu-Young;Kim, Sang-Wook;Shim, Kyu-Seok
- Journal of KIISE:Databases / v.28 no.4 / pp.702-714 / 2001
As Internet technology evolves, there is a growing need for Internet queries involving multiple information sources. Efficient processing of such queries necessitates integrated summary data that compactly represents the data distribution of the entire database scattered over many information sources. This paper presents an efficient method of managing the integrated summary data based on the wavelet transform and addresses Internet query processing using the integrated summary data. The simplest method for creating the integrated summary data would be to summarize the integrated data distribution obtained by merging the data distributions in multiple information sources. However, this method suffers from the high cost of transmitting, storing, and merging a large amount of data distribution. To overcome these drawbacks, we propose a new wavelet transform based method that creates the integrated summary data by merging multiple summary data, and an effective method for optimizing Internet queries using it. Wavelet-transformed summary data is converted to satisfy the conditions for merging. Moreover, the merging process is very simple owing to the properties of the wavelet transform. We formally derive the upper bound of the error of the wavelet-transformed integrated summary data. Compared with the histogram-based integrated summary data, the wavelet-transformed integrated summary data proves to be 1.6 to 5.5 times more accurate when used for selectivity estimation in experiments. In processing Internet top-N queries involving 56 information sources, using the integrated summary data reduces the processing cost to 1/44 of the cost of not using it.
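
One property that makes merging wavelet summaries simple is linearity: the transform of a sum of distributions equals the sum of their transforms. The sketch below demonstrates this with an unnormalized Haar transform on two invented frequency distributions. The choice of Haar, the sample data, and the function names are assumptions, and the paper's conversion step and coefficient truncation are not modeled.

```python
def haar(values):
    """Unnormalized Haar transform; len(values) must be a power of two.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    current = [float(v) for v in values]
    details = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        # Finer details go to the right of the coarser ones computed later.
        details = [(a - b) / 2 for a, b in pairs] + details
        current = [(a + b) / 2 for a, b in pairs]
    return current + details


def merge_summaries(coeffs_a, coeffs_b):
    """Merge two wavelet summaries by adding coefficients position-wise."""
    return [a + b for a, b in zip(coeffs_a, coeffs_b)]


# Hypothetical frequency distributions held by two information sources.
source_a = [4, 2, 6, 8]
source_b = [0, 2, 2, 0]

merged = merge_summaries(haar(source_a), haar(source_b))
direct = haar([a + b for a, b in zip(source_a, source_b)])
```

Here `merged` and `direct` hold the same coefficients, so each source can ship its own summary and the raw distributions never need to be transmitted or merged.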