• Title/Summary/Keyword: Web Bot

An Implementation and Performance Evaluation of Fast Web Crawler with Python

  • Kim, Cheong Ghil
    • Journal of the Semiconductor & Display Technology
    • /
    • v.18 no.3
    • /
    • pp.140-143
    • /
    • 2019
  • The Internet has expanded constantly and greatly, so that we now have a vast number of web pages with dynamic changes. In particular, the fast development of wireless communication technology and the wide spread of various smart devices enable information to be created and changed rapidly, anywhere and at any time. In this situation, web crawling, also known as web scraping, which is an organized, automated process for systematically navigating web pages residing on the web and for automatically searching and indexing information, is inevitably used broadly in many fields today. This paper aims to implement a prototype web crawler with Python and to improve its execution speed using threads on a multicore CPU. The implementation results confirmed correct operation by crawling reference web sites, and the performance improvement was verified by evaluating execution speed under different thread configurations on the multicore CPU.
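
The paper's source code is not included in the abstract; the following is a minimal sketch of the kind of thread-based crawler it describes, using the standard library's concurrent.futures and the third-party requests package. The seed URLs and worker counts are illustrative, not from the paper.

```python
# Minimal sketch of a multi-threaded crawler (illustrative, not the paper's code).
# Assumes the 'requests' package; seed URLs and thread counts are hypothetical.
import concurrent.futures
import requests

SEED_URLS = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page3",
]

def fetch(url, timeout=5):
    """Download one page and return (url, status code, body length)."""
    resp = requests.get(url, timeout=timeout)
    return url, resp.status_code, len(resp.text)

def crawl(urls, workers=8):
    """Fetch pages concurrently; more workers help because fetching is I/O-bound."""
    results = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch, u) for u in urls]
        for fut in concurrent.futures.as_completed(futures):
            try:
                results.append(fut.result())
            except requests.RequestException as exc:
                print("fetch failed:", exc)
    return results

if __name__ == "__main__":
    # Comparing wall-clock time across different 'workers' values mirrors the
    # kind of thread-configuration evaluation the abstract mentions.
    for url, status, size in crawl(SEED_URLS, workers=4):
        print(status, size, url)
```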

MultiHammer: A Virtual Auction System based on Information Agents

  • Yamada, Ryota;Hattori, Hiromitsu;Ito, Takayuki;Ozono, Tadachika;Shintani, Toramatsu
    • Proceedings of the Korea Intelligent Information Systems Society Conference
    • /
    • 2001.01a
    • /
    • pp.73-77
    • /
    • 2001
  • In this paper, we propose a virtual auction system based on information agents. We call the system MultiHammer. MultiHammer can be used for studying and analyzing online auctions, and it provides functions for implementing a meta online auction site and an experiment environment. We have been using MultiHammer as an experiment environment for BiddingBot. BiddingBot aims at assisting users to bid simultaneously in multiple online auctions. In order to analyze the behavior of BiddingBot, we would need to purchase a lot of items, and it is hard for us to prepare enough funds to show the usability and advantages of BiddingBot. MultiHammer enables us to analyze the behavior of BiddingBot effectively. MultiHammer consists of three types of agents: agents for information collecting, for data storing, and for auctioning. Agents for information collecting work as wrappers. To make agents work as wrappers, we need to realize software modules for each online auction site; implementing these modules requires a lot of time and patience, so we designed a support mechanism for developing the modules. Agents for data storing record the data gathered by the agents for information collecting. Agents for auctioning provide online auction services using the data recorded by the agents for data storing. By recording the activities in auction sites, MultiHammer can recreate any situation and trace auctions for experimentation. Users can participate in virtual auctions using the same information as in real online auctions, and they can also participate in real auctions via the wrapper agents for information collecting.
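
The abstract only names the three agent roles; the class skeleton below is a hypothetical illustration of that division of labour. None of the class or method names come from the paper.

```python
# Hypothetical skeleton of the three agent roles described in the abstract
# (collecting, storing, auctioning); names and interfaces are illustrative only.

class CollectingAgent:
    """Wrapper agent: scrapes one online auction site into a common record format."""
    def __init__(self, site_name, fetch_fn):
        self.site_name = site_name
        self.fetch_fn = fetch_fn   # site-specific module (the costly part to develop)

    def collect(self):
        return [{"site": self.site_name, **item} for item in self.fetch_fn()]

class StoringAgent:
    """Records everything the collecting agents gather, so auctions can be replayed."""
    def __init__(self):
        self.log = []

    def store(self, records):
        self.log.extend(records)

class AuctioningAgent:
    """Recreates virtual auctions from the stored log for experiments (e.g. BiddingBot runs)."""
    def __init__(self, storing_agent):
        self.storing_agent = storing_agent

    def replay(self, site_name):
        return [r for r in self.storing_agent.log if r["site"] == site_name]
```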

Information Quality Satisfaction of Web Site User (웹 사이트 사용자의 정보품질 만족에 관한 연구)

  • Ham, Bong-Jin
    • The Journal of Society for e-Business Studies
    • /
    • v.9 no.3
    • /
    • pp.169-190
    • /
    • 2004
  • This study tests constructs for measuring web site users' expectation, perceived performance regarding information quality (IQ), expectation congruency, and satisfaction, and their influence on overall satisfaction. The findings can be summarized as follows. First, Web-IQ expectation appeared to have a positive effect on Web-IQ perceived performance through expectation congruency, and a positive effect on perceived performance through an assimilation effect. Second, Web-IQ expectation did not appear to have a negative effect on Web-IQ expectation congruency; although this hypothesis was rejected, the result differs from established research, in which prior Web-IQ expectations approach congruency after post-use Web-IQ expectations are formed. Third, Web-IQ perceived performance appeared to have a positive effect on Web-IQ expectation congruency. Fourth, in the analysis of the effect of congruency on Web-IQ satisfaction, Web-IQ expectation appeared to have a positive effect on overall satisfaction.

Detecting the HTTP-GET Flood Attacks Based on the Access Behavior of Inline Objects in a Web-page Using NetFlow Data

  • Kang, Koo-Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.7
    • /
    • pp.1-8
    • /
    • 2016
  • Nowadays, distributed denial of service (DDoS) attacks on web sites reward attackers financially or politically because our daily lives depend tightly on web services such as on-line banking, e-mail, and e-commerce. One DDoS attack against web servers, the HTTP-GET flood attack, is becoming more serious. Most existing techniques run at the application layer because these attack packets use legitimate network protocols and HTTP payloads; that is, network-level intrusion detection systems cannot distinguish legitimate HTTP-GET requests from malicious requests. In this paper, we propose a practical detection technique against HTTP-GET flood attacks, based on the access behavior of inline objects in a web page, using NetFlow data. In particular, our proposed scheme works at the network layer without any application-specific deep packet inspection. We implement the proposed detection technique and evaluate its attack detection ability in a simple test environment using the NetBot attacker. Moreover, we show that our approach is applicable to real environments by presenting a test profile captured on a well-known e-commerce site. The results show that our technique can detect HTTP-GET flood attacks effectively.
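
The paper defines the exact detection rule; the snippet below is only a rough sketch of the underlying idea, flagging sources that request the base page far more often than the page's inline objects. The record format, object names, and thresholds are assumptions for illustration, and the mapping from raw NetFlow records to requested objects is glossed over here.

```python
# Rough sketch of the inline-object access-behavior idea (not the paper's algorithm).
# A record is assumed to already be (src_ip, requested_object); thresholds are illustrative.
from collections import defaultdict

INLINE_OBJECTS = {"/logo.png", "/style.css", "/app.js"}   # objects embedded in the page
BASE_PAGE = "/index.html"

def suspicious_sources(records, min_requests=20, ratio_threshold=0.2):
    """Flag sources that fetch the page repeatedly but rarely fetch its inline objects,
    which is typical of HTTP-GET flood bots rather than real browsers."""
    page_hits = defaultdict(int)
    inline_hits = defaultdict(int)
    for src_ip, obj in records:
        if obj == BASE_PAGE:
            page_hits[src_ip] += 1
        elif obj in INLINE_OBJECTS:
            inline_hits[src_ip] += 1

    flagged = []
    for src_ip, pages in page_hits.items():
        if pages < min_requests:
            continue
        ratio = inline_hits[src_ip] / pages
        if ratio < ratio_threshold:
            flagged.append((src_ip, pages, ratio))
    return flagged
```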

A Comparative Study of WWW Search Engine Performance (WWW 탐색도구의 색인 및 탐색 기능 평가에 관한 연구)

  • Chung Young-Mee;Kim Seong-Eun
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.31 no.1
    • /
    • pp.153-184
    • /
    • 1997
  • The importance of WWW search services is increasing as Internet information resources explode. An evaluation of nine current search services was first conducted by descriptively comparing their features concerning indexing, searching, and ranking of search results. Secondly, a couple of search queries were used to evaluate the search performance of those services by measures of retrieval effectiveness, the degree of overlap in searched sites, and the degree of similarity between services. In this experiment, Alta Vista, HotBot and Open Text Index showed better results for retrieval effectiveness. The level of similarity among the nine search services was extremely low.
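
As a small illustration of the two set-based measures named in the abstract, overlap and similarity between the result sets of two services can be computed as below. The site lists are hypothetical and the paper's exact formulas may differ.

```python
# Illustrative overlap / similarity measures between two search services' result sets.
def overlap(results_a, results_b):
    """Fraction of sites retrieved by both services, relative to the smaller result set."""
    a, b = set(results_a), set(results_b)
    return len(a & b) / min(len(a), len(b))

def jaccard_similarity(results_a, results_b):
    """Shared sites divided by all distinct sites retrieved by either service."""
    a, b = set(results_a), set(results_b)
    return len(a & b) / len(a | b)

# Hypothetical result sets for two services.
alta_vista = ["site1", "site2", "site3", "site4"]
hotbot = ["site2", "site3", "site5"]
print(overlap(alta_vista, hotbot), jaccard_similarity(alta_vista, hotbot))
```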

Distributed Attack Analysis and Countermeasure (분산처리 공격에 대한 방어방법 연구)

  • Shin, Miyea
    • Journal of Convergence Society for SMB
    • /
    • v.5 no.1
    • /
    • pp.19-23
    • /
    • 2015
  • A Distributed Denial of Service (DDoS) attack is a form of denial-of-service attack in which the attacker attacks a single target from a large number of attack points, in a wide variety of forms, over the network. The flood of connection attempts prevents the targeted server or client from making its services available as they would normally be used. As countermeasures against DDoS attacks, this paper proposes responses in two respects: managerial and technical.

Web Bot Accessibility Evaluation Program for A Web Site (웹 사이트의 웹봇 접근성 평가 프로그램)

  • Yun, Won-Ki;Park, Yong-Hwi;Kim, Seong-Hwan;Kim, Min-Ju;Kim, Suk-Il
    • Annual Conference of KIPS
    • /
    • 2014.11a
    • /
    • pp.672-675
    • /
    • 2014
  • As DDoS attacks have recently become more frequent, many web sites have come to block web bots and are thus no longer exposed on search pages. However, this has a negative effect on both site users and the site itself, so we built a program that evaluates a web site's accessibility to web bots. The program evaluates the web bot accessibility of a web site by analyzing its Robots.txt file and Robots meta tags. By using the evaluation results as a basis for feedback, better search results can be expected.
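
The program itself is not listed in the paper; a minimal sketch of the robots.txt side of such a check, using Python's standard urllib.robotparser, could look like the following. The site root, sample paths, and user agents are examples, and the paper's program additionally inspects Robots meta tags.

```python
# Minimal sketch of a robots.txt-based web-bot accessibility check (illustrative).
import urllib.robotparser

def bot_accessibility(site_root, paths, bots=("Googlebot", "Bingbot", "*")):
    """Return, for each crawler user agent, the fraction of sample paths it may fetch."""
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(site_root.rstrip("/") + "/robots.txt")
    rp.read()
    report = {}
    for bot in bots:
        allowed = sum(rp.can_fetch(bot, site_root.rstrip("/") + p) for p in paths)
        report[bot] = allowed / len(paths)
    return report

if __name__ == "__main__":
    # Example invocation with hypothetical paths.
    print(bot_accessibility("https://example.com", ["/", "/news/", "/search"]))
```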

A Design and Implementation of Weather Forecast Chatbot Based on Kakaotalk Open Builder (카카오톡 오픈빌더 기반의 일기 예보 챗봇 설계 및 구현)

  • Lee, Won Joo;Gim, Han Su;Cha, Dae Yun;Lee, il u;Jung, Seong Jun;Cho, Seung Yeon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2019.07a
    • /
    • pp.29-30
    • /
    • 2019
  • In this paper, we design and implement a chatbot that uses the Kakao i Open Builder API so that weather information can be obtained easily, anytime and anywhere. The chatbot can be used after adding it as a friend through Plus Friend, and through Python's Flask web framework it crawls weather data from Naver for the region the user searches, such as temperature, fine dust concentration, precipitation, UV index, and forecast information, processes the data, and serves it to the user.
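
A minimal sketch of the Flask skill-server side described in the abstract is shown below. The response JSON follows the Kakao i Open Builder "simpleText" skill layout as commonly documented (treat it as an assumption), the "region" parameter name is hypothetical, and the weather lookup is stubbed out rather than crawling Naver.

```python
# Minimal sketch of a Kakao i Open Builder skill server in Flask (illustrative).
from flask import Flask, request, jsonify

app = Flask(__name__)

def get_weather(region):
    # In the paper this step crawls Naver for temperature, fine dust, precipitation,
    # UV index, and forecast data; here it is a placeholder.
    return f"{region}: 21C, fine dust low, no rain expected"

@app.route("/weather", methods=["POST"])
def weather_skill():
    body = request.get_json()
    # The 'region' parameter name is an assumption about the bot's configured entity.
    region = body.get("action", {}).get("params", {}).get("region", "Seoul")
    return jsonify({
        "version": "2.0",
        "template": {"outputs": [{"simpleText": {"text": get_weather(region)}}]}
    })

if __name__ == "__main__":
    app.run(port=5000)
```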

Dark Web based Malicious Code Detection and Analysis (다크웹 크롤러를 사용한 악성코드 탐지 및 분석)

  • Kim, Ah-Lynne;Lee, Eun-Ji
    • Annual Conference of KIPS
    • /
    • 2020.11a
    • /
    • pp.446-449
    • /
    • 2020
  • The rate of cybercrime using the dark web is rising sharply both in Korea and abroad. However, owing to the nature of the dark web, it is difficult to find malicious code shared in this hidden region of the Internet. In particular, many services on the dark web apply various techniques to block information collection by crawler bots. Therefore, following existing research methods, we collected URLs on the dark web and additionally built a downloader to collect files of specific formats such as exe and zip. These files will next be inspected with an integrated virus scanning engine to identify suspicious files. The suspicious files will then be analyzed statically and dynamically, and detailed reports will be produced, which can yield meaningful results for analyzing the distribution and origin of malicious code on the dark web.
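
The crawler is not reproduced in the paper; below is a rough sketch of the downloader step for files with specific extensions, assuming requests routed through a local Tor SOCKS proxy. The proxy address, URL list, and extensions are examples, not details from the paper.

```python
# Rough sketch of the downloader step: fetch only files with target extensions
# from previously collected onion URLs, via a local Tor SOCKS proxy.
# Requires the 'requests[socks]' extra; proxy address and extensions are examples.
import os
import requests

TOR_PROXIES = {
    "http": "socks5h://127.0.0.1:9050",
    "https": "socks5h://127.0.0.1:9050",
}
TARGET_EXTENSIONS = (".exe", ".zip")

def download_candidates(urls, out_dir="samples"):
    """Save files whose URL ends with a target extension for later virus-engine scanning."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in urls:
        if not url.lower().endswith(TARGET_EXTENSIONS):
            continue
        try:
            resp = requests.get(url, proxies=TOR_PROXIES, timeout=60)
            resp.raise_for_status()
        except requests.RequestException:
            continue   # unreachable or blocked; skip
        path = os.path.join(out_dir, os.path.basename(url))
        with open(path, "wb") as f:
            f.write(resp.content)
        saved.append(path)
    return saved
```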

Multi-threaded Web Crawling Design using Queues (큐를 이용한 다중스레드 방식의 웹 크롤링 설계)

  • Kim, Hyo-Jong;Lee, Jun-Yun;Shin, Seung-Soo
    • Journal of Convergence for Information Technology
    • /
    • v.7 no.2
    • /
    • pp.43-51
    • /
    • 2017
  • Background/Objectives: The purpose of this study is to design and implement a multi-threaded web crawler using queues that can solve the time delay of single-process methods, the cost increase of parallel-processing methods, and the waste of manpower, by utilizing multiple bots connected over a wide area network. Methods/Statistical analysis: This study designs and analyzes applications that run on independent systems, based on a multi-threaded system configuration using queues. Findings: We propose a multi-threaded web crawler design using queues. In addition, the throughput of web documents can be analyzed per client and per thread according to the given formula, and the efficiency and the optimal number of clients can be confirmed by checking the efficiency of each thread. The proposed system is based on distributed processing; clients in independent environments provide fast and reliable web documents using queues and threads. Application/Improvements: There is a need for a system that quickly and efficiently navigates and collects various web sites by applying queues and multiple threads to a general-purpose web crawler, rather than a web crawler design that targets a particular site.
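
The abstract describes the design rather than the code; a minimal sketch of a queue-fed worker-thread crawler of the kind proposed, using the standard library plus the requests package, is shown below. The worker count and seed URLs are illustrative.

```python
# Minimal sketch of a queue-based multi-threaded crawler (illustrative of the design,
# not the paper's implementation). Worker threads pull URLs from a shared queue.
import queue
import threading
import requests

def worker(url_queue, results, lock):
    while True:
        url = url_queue.get()
        if url is None:              # sentinel: no more work for this thread
            url_queue.task_done()
            break
        try:
            resp = requests.get(url, timeout=5)
            with lock:
                results.append((url, resp.status_code, len(resp.text)))
        except requests.RequestException:
            pass                     # skip unreachable pages
        url_queue.task_done()

def crawl(urls, num_threads=4):
    url_queue = queue.Queue()
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(url_queue, results, lock))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for u in urls:
        url_queue.put(u)
    for _ in threads:
        url_queue.put(None)          # one sentinel per worker
    url_queue.join()                 # wait until every queued URL is processed
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    print(crawl(["https://example.com", "https://example.org"]))
```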