• Title/Summary/Keyword: crawling system


Web-Anti-MalWare Malware Detection System (악성코드 탐지 시스템 Web-Anti-Malware)

  • Jung, Seung-il;Kim, Hyun-Woo
    • Proceedings of the Korean Society of Computer Information Conference / 2014.07a / pp.365-367 / 2014
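  • With the recent growth of web services, malware is increasing so quickly that its volume is hard to gauge. Financial gain has become the main motive behind the malware that grows every year, and public institutions and security vendors are actively researching ways to detect it. This paper proposes a malware detection system that combines filtering capable of analyzing packets in real time with web crawling, so that a domain and its sub-URLs can be inspected automatically.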

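The abstract above mentions crawling a domain down to its sub-URLs. A minimal sketch of that one step, assuming the page HTML has already been fetched, might look like the following. The names LinkCollector and same_domain_links and the example.com URLs are hypothetical, and nothing here reproduces the paper's packet filtering or detection logic.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def same_domain_links(html, base_url):
    """Return the de-duplicated links that stay on the host of base_url."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    result = []
    for link in parser.links:
        if urlparse(link).netloc == host and link not in result:
            result.append(link)
    return result


# Hypothetical page content, used only for illustration.
page = (
    '<a href="/a">A</a>'
    '<a href="http://other.example/x">X</a>'
    '<a href="sub/b.html">B</a>'
    '<a href="/a">A again</a>'
)
print(same_domain_links(page, "http://example.com/dir/"))
```

A crawler would call this on each fetched page and queue the returned URLs for inspection; off-domain links are dropped here only to keep the sketch scoped to one domain.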

Recruitment information SNS system using crawling (크롤링을 이용한 채용정보 SNS 시스템)

  • Hur, Tai-Sung;Park, Jae-Hyun
    • Proceedings of the Korean Society of Computer Information Conference / 2021.07a / pp.467-468 / 2021
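  • This paper presents a system that uses data collection (data crawling) to make a large volume of job postings easy to access. At present it collects data from StackOverflow and stores it in a database automatically. Because there is a lot of data to collect, the system uses Celery and RabbitMQ to submit asynchronous tasks, so it can carry on with other work without waiting for an immediate response. The collected data is listed on the site, and the system was designed and implemented so that users can save time and cost and prepare for employment efficiently.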


Effective Web Crawling Orderings from Graph Search Techniques (그래프 탐색 기법을 이용한 효율적인 웹 크롤링 방법들)

  • Kim, Jin-Il;Kwon, Yoo-Jin;Kim, Jin-Wook;Kim, Sung-Ryul;Park, Kun-Soo
    • Journal of KIISE:Computer Systems and Theory / v.37 no.1 / pp.27-34 / 2010
  • Web crawlers are fundamental programs that iteratively download web pages by following links, starting from a small set of initial URLs. Several web crawling orderings have previously been proposed to crawl popular web pages in preference to other pages, but some graph search techniques whose characteristics and efficient implementations have been studied in the graph theory community have not yet been applied to web crawling orderings. In this paper we consider various graph search techniques, including lexicographic breadth-first search, lexicographic depth-first search, and maximum cardinality search, as well as the well-known breadth-first search and depth-first search, and then choose effective web crawling orderings that have linear time complexity and crawl popular pages early. In particular, for maximum cardinality search and lexicographic breadth-first search, whose implementations are non-trivial, we propose linear-time web crawling orderings by applying the partition refinement method. Experimental results show that maximum cardinality search has desirable properties in both time complexity and the quality of crawled pages.
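To make the idea of a maximum-cardinality-search ordering concrete, here is a minimal sketch under stated assumptions. It visits next the discovered page that is linked from the largest number of already visited pages, using a binary heap rather than the partition refinement method named in the abstract, so it runs in O(m log n) and is not the paper's linear-time implementation. The function name mcs_crawl_order, the toy link graph, and the alphabetical tie-breaking are all hypothetical.

```python
import heapq
from collections import defaultdict


def mcs_crawl_order(links, seed):
    """Order pages in a maximum-cardinality-search style.

    links maps each page to the pages it links to. At every step the
    unvisited page with the most links from visited pages is crawled
    next; ties are broken alphabetically by page name.
    """
    weight = defaultdict(int)  # number of visited pages linking to a page
    visited = set()
    order = []
    heap = [(0, seed)]  # entries are (-weight, page)
    while heap:
        _, page = heapq.heappop(heap)
        if page in visited:
            continue  # stale entry left over from an earlier, lower weight
        visited.add(page)
        order.append(page)
        for target in sorted(set(links.get(page, []))):
            if target not in visited:
                weight[target] += 1
                heapq.heappush(heap, (-weight[target], target))
    return order


# Hypothetical toy link graph: "e" is linked from both "b" and "c".
toy_links = {
    "a": ["b", "c", "d"],
    "b": ["e"],
    "c": ["e"],
    "d": ["f"],
    "e": [],
    "f": [],
}
print(mcs_crawl_order(toy_links, "a"))  # ['a', 'b', 'c', 'e', 'd', 'f']
```

On this graph plain breadth-first search would crawl "d" before "e"; the sketch crawls "e" first because two visited pages already link to it, which is the sense in which such an ordering favors pages that appear popular early.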

Design and Implementation of Event-driven Real-time Web Crawler to Maintain Reliability (신뢰성 유지를 위한 이벤트 기반 실시간 웹크롤러의 설계 및 구현)

  • Ahn, Yong-Hak
    • Journal of the Korea Convergence Society / v.13 no.4 / pp.1-6 / 2022
  • Real-time systems that use web crawling data must give users data that matches the remote source. To do this, the web crawler repeatedly sends HTTP (HyperText Transfer Protocol) requests to the remote server to check whether the remote data has changed. This process places network load on both the crawling server and the remote server and causes problems such as excessive traffic. To solve this problem, this paper proposes a real-time web crawling technique driven by user events, which reduces network overload while keeping the data on the crawling server consistent with the data at multiple remote locations. The proposed method performs crawling in response to events that request unit data and list data. The results show that the proposed method can reduce the network traffic overhead of existing web crawlers and secure data reliability. In the future, research on combining event-based crawling and time-based crawling is required.

Responsive web based Virus Information System using Crawling (크롤링을 통한 반응형웹 기반의 바이러스 정보 시스템)

  • Hur, Tai-Sung;Baek, Jae-Won
    • Proceedings of the Korean Society of Computer Information Conference / 2020.07a / pp.269-270 / 2020
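  • Even after the COVID-19 outbreak, many more viruses will circulate in the world. What people need in the face of these diseases is information, yet to obtain it they visit many sites, spend time searching, and still cannot find what they want quickly. To address this, we developed a responsive web system that crawls and processes data from various sites so that users can see the virus information they want at a glance and in a short time, including the current status of prevalent diseases, status by city and province, locations and stock levels of mask retailers, and places visited by infected people.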


Design and Implementation of Real-time web Crawling distributed monitoring system (실시간 웹 크롤링 분산 모니터링 시스템 설계 및 구현)

  • Kim, Yeong-A;Kim, Gea-Hee;Kim, Hyun-Ju;Kim, Chang-Geun
    • Journal of Convergence for Information Technology / v.9 no.1 / pp.45-53 / 2019
  • In this rapidly changing information era, we face problems from the excessive information served by websites. Little of that information is useful, much is useless, and we spend a lot of time selecting what we need. Many websites, including search engines, use web crawling to keep their data up to date. Web crawling is usually used to generate copies of all the pages of visited sites, which search engines then index for faster searching. For collecting wholesale and order information that changes in real time, keyword-oriented web data collection is not adequate, and no alternative for selective real-time collection of web information has been suggested. In this paper, we propose a method that collects information from restricted web sites using a real-time web crawling distributed monitoring system (R-WCMS), estimates collection time through detailed analysis of the data, and stores the data in a parallel system. Experimental results show that applying the proposed model to web site information retrieval reduces the time by 15-17%.

A Proposal of Motion Recognition-based Video Search System using Machine Learning (기계학습을 이용한 동작인식 동영상 검색시스템 제안)

  • Seo, Won-Seoung;Lee, Kang-Hee
    • Proceedings of the Korean Society of Computer Information Conference / 2019.01a / pp.463-464 / 2019
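  • This paper set out to build a search system that lets users find the videos they want on the internet more simply, using machine-learning-based recognition of the user's motions captured through an Arduino over serial communication. The system is written in Python and uses pattern classification with an SVM (Support Vector Machine), which takes the user's motion as input and predicts a character. To use the system, the user first builds a training data set by entering motions for each character; an SVM is trained on it to produce a model and classifier, and the classifier then predicts characters from recognized motions. Finally, the string assembled from the recognized motions is used to crawl the video site Youtube and find videos related to that string.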


Early Detection Assistance System for Rare Diseases based on Patient's Symptom Information (환자 증상정보 기반 희귀질환 조기 발견 보조시스템)

  • Jae-Min Choi;Sun-Yong Kim
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.2 / pp.373-378 / 2023
  • Atypical symptoms and a lack of diagnostic records make it difficult even for medical specialists to detect rare diseases. As a result, it takes a lot of time and money to get from the onset of symptoms to an accurate diagnosis, which places serious physical, mental, and economic pressure on patients. In this paper, we propose and implement an early detection assistance system for rare diseases using web crawling and text mining, which suggests the names of suspected rare diseases so that medical staff can easily recall the disease names and make a final diagnosis.

Learning Effects of Flipped Learning based on Learning Analytics in SW Coding Education (SW 코딩교육에서의 학습분석기반 플립러닝의 학습효과)

  • Pi, Su-Young
    • Journal of Digital Convergence / v.18 no.11 / pp.19-29 / 2020
  • The study aims to examine the effectiveness of flipped learning teaching methods, using learning analytics to enable effective programming learning for non-major students. After a flipped learning programming class model was designed with the ADDIE model, learning-related data from the lecture support system operated by the school was processed with crawling. Because the crawled data is presented through a dashboard that the instructor can understand easily, the instructor can design classes more efficiently and provide individually tailored learning based on it. Analysis of the learning-related data collected over a one-semester class found that department, academic year, attendance, assignment submission, and preliminary/review attendance had an effect on academic achievement. In the survey analysis, respondents said that the instructors' individualized feedback through learning analysis was very helpful for self-directed learning. The approach is expected to give instructors a foundation for enhancing their teaching activities. In the future, the contents of social network services related to learners' learning will be processed with crawling to analyze learners' learning situations.

Real-Time Ransomware Infection Detection System Based on Social Big Data Mining (소셜 빅데이터 마이닝 기반 실시간 랜섬웨어 전파 감지 시스템)

  • Kim, Mihui;Yun, Junhyeok
    • KIPS Transactions on Computer and Communication Systems / v.7 no.10 / pp.251-258 / 2018
  • Ransomware, malicious software that demands a ransom after encrypting files, is becoming more threatening as it propagates faster and grows more sophisticated. Rapid detection and risk analysis are required, but real-time analysis and reporting are lacking. In this paper, we propose a ransomware infection detection system that uses social big data mining technology to enable real-time analysis. The system analyzes the Twitter stream in real time and crawls tweets with keywords related to ransomware. It also extracts ransomware-related keywords by crawling news servers through a news feed parser, and extracts news or statistical data from the servers of security companies or search engines. The collected data is analyzed by data mining algorithms. By comparing the number of related tweets, Google Trends data (statistical information), and articles related to the WannaCry and Locky ransomware infections that spread in 2017, we show that our system has the potential to detect ransomware infections using tweets. Moreover, the performance of the proposed system is shown through entropy and chi-square analysis.
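The abstract names entropy and chi-square analysis without saying how they are applied. As a hedged illustration of the two quantities only, the sketch below computes the Shannon entropy of a set of counts and Pearson's chi-square statistic against expected counts. The function names and the daily tweet counts are hypothetical and are not taken from the paper.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the distribution given by raw counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical daily counts of ransomware-related tweets (illustration only).
daily_tweets = [10, 20, 30]
uniform_baseline = [20, 20, 20]  # assumed "no outbreak" expectation

print(shannon_entropy(daily_tweets))
print(chi_square_statistic(daily_tweets, uniform_baseline))
```

A large chi-square value against the baseline, or a drop in entropy as tweets concentrate on a few days or keywords, would be one plausible signal of a spreading infection; whether the paper uses the measures this way is an assumption.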