• Title/Abstract/Keyword: Crawling system

Search Result 110, Processing Time 0.023 seconds

Tax Judgment Analysis and Prediction using NLP and BiLSTM (NLP와 BiLSTM을 적용한 조세 결정문의 분석과 예측)

  • Lee, Yeong-Keun;Park, Koo-Rack;Lee, Hoo-Young
    • Journal of Digital Convergence / v.19 no.9 / pp.181-188 / 2021
  • Research on AI-based legal services that make difficult legal fields easier to understand and more predictable is growing in volume and importance. This study builds a model from decisions of the Tax Tribunal in the field of tax law, training it on collected and processed data, and has it answer user queries with predicted outcomes, whose accuracy is then verified. The proposed model collects tax decisions and extracts useful data through web crawling, then generates word vectors by applying the FastText algorithm, an extension of Word2Vec, to the output optimized through NLP. A total of 11,103 cases from 2017 to 2019 were collected and classified, and the model was verified at 70% accuracy. The approach could be useful across various legal systems and could help apply prior research more efficiently.
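The abstract above mentions FastText word vectors without describing the configuration. As a minimal sketch of the idea behind FastText, the function below lists the character n-grams of a word wrapped in the boundary markers `<` and `>`; FastText builds a word's vector from the vectors of such subword units. The function name `char_ngrams` and the default range of 3 to 6 characters are illustrative assumptions, not details taken from the paper, and the vector lookup and summation steps are omitted.

```python
def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"  # '<' and '>' mark the start and end of the word
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


# Illustrative call with trigrams only.
print(char_ngrams("tax", 3, 3))
```

Because the units are characters rather than whole words, the same function applies unchanged to Korean text such as the tax decisions described in the abstract.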

YouTube Channel Ranking Scheme based on Hidden Qualitative Information Analysis (유튜브 은닉 질적 정보 분석 기반 유튜브 채널 랭킹 기법)

  • Lee, Ji Hyeon;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.7 / pp.757-763 / 2019
  • YouTube has become so popular that this era is called the age of YouTube. As the number of users and the volume of content grow, so does the range of information to choose from, yet it remains difficult to select information that meets users' needs. YouTube provides recommendations based on a user's watch list. In this study, we therefore analyze channels on a user's topic of interest from several angles and propose a scheme that uses crawled channels to measure how the channels and their videos are perceived, through analysis of quantitative data and hidden qualitative data. Combining the two analyses reveals how a channel and its videos are perceived, which yields a ranking of the channels that deal with the topic. Finally, as a case study, we recommend English learning channels to users based on numerical statistics and sentiment analysis results, to maximize the flipped learning effect regardless of time and place.

Design and Analysis of Technical Management System of Personal Information Security using Web Crawler (웹 크롤러를 이용한 개인정보보호의 기술적 관리 체계 설계와 해석)

  • Park, In-pyo;Jeon, Sang-june;Kim, Jeong-ho
    • Journal of Platform Technology / v.6 no.4 / pp.69-77 / 2018
  • Awareness of the need to protect files containing personal information remains insufficient at endpoints such as personal computers, smart terminals, and personal storage devices. In this study, we use the Diffie-Hellman method to securely retrieve personal information files gathered by a web crawler, and we design a scheme that applies SEED and ARIA with hybrid slicing to protect those files against attack. The encryption performance for files collected by web crawling is compared in terms of the encryption/decryption rate by key generation and the encryption/decryption sharing by user key level. The simulation was performed on personal information files in the process of transmission to an external agency. Compared with existing methods, the detection rate improved by a factor of 4.64 and the information protection rate improved by 18.3%.
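The abstract above names the Diffie-Hellman method but gives no parameters. The following is a minimal sketch of the textbook exchange only, with toy numbers that are far too small to be secure; the helper names `dh_public` and `dh_shared` and all values are illustrative assumptions. How the paper feeds the agreed key into SEED or ARIA is not stated in the abstract and is not shown here.

```python
def dh_public(g: int, private: int, p: int) -> int:
    """Compute the public value g^private mod p."""
    return pow(g, private, p)


def dh_shared(peer_public: int, private: int, p: int) -> int:
    """Compute the shared secret peer_public^private mod p."""
    return pow(peer_public, private, p)


# Toy parameters for illustration only; real use needs a large, vetted group.
p, g = 23, 5
a_private, b_private = 6, 15  # hypothetical secrets of the two parties

# Each party publishes only its public value.
a_public = dh_public(g, a_private, p)
b_public = dh_public(g, b_private, p)

# Each side combines its own secret with the other's public value.
a_secret = dh_shared(b_public, a_private, p)
b_secret = dh_shared(a_public, b_private, p)
print(a_secret == b_secret)
```

Both parties arrive at the same secret without ever transmitting it, which is the property that makes the exchange suitable for setting up a key over an untrusted channel.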

A Web application vulnerability scoring framework by categorizing vulnerabilities according to privilege acquisition (취약점의 권한 획득 정도에 따른 웹 애플리케이션 취약성 수치화 프레임워크)

  • Cho, Sung-Young;Yoo, Su-Yeon;Jeon, Sang-Hun;Lim, Chae-Ho;Kim, Se-Hun
    • Journal of the Korea Institute of Information Security & Cryptology / v.22 no.3 / pp.601-613 / 2012
  • Secure web applications must be designed and implemented to provide safe web services. Several scoring frameworks therefore exist to measure vulnerabilities in web applications. However, because these frameworks simply sum the scores of the individual factors of a vulnerability, they do not classify vulnerabilities by seriousness. We rate and score vulnerabilities according to the probability of privilege acquisition so that vulnerabilities found in web applications can be prioritized. Our proposed framework also provides a method to score all web applications operated by an organization, showing which web application is the least secure and should be treated first. We apply the framework to data listing the vulnerabilities found in web applications by a crawling-based web scanner, and show the importance of categorizing vulnerabilities according to privilege acquisition.
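The abstract above contrasts summing factor scores with categorizing vulnerabilities by privilege acquisition, but it does not give the paper's formula. The sketch below is one hypothetical way to express the contrast: each application is scored as a tuple of counts per privilege category, most severe first, and applications are compared lexicographically. The category names, the sample applications, and the comparison rule are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical privilege categories, ordered from least to most severe.
PRIVILEGE_LEVELS = ["none", "user", "admin"]


def app_score(findings: list[str]) -> tuple[int, ...]:
    """Score an application as counts per category, most severe first."""
    counts = Counter(findings)
    return tuple(counts[level] for level in reversed(PRIVILEGE_LEVELS))


def rank_apps(apps: dict[str, list[str]]) -> list[str]:
    """Return application names, least secure first."""
    return sorted(apps, key=lambda name: app_score(apps[name]), reverse=True)


# Hypothetical scanner output: application name -> privilege level per finding.
apps = {
    "portal": ["none"] * 5,
    "billing": ["admin", "none"],
    "wiki": ["user", "user"],
}
print(rank_apps(apps))
```

Under this rule a single finding that grants administrator privileges outranks any number of findings that grant none, whereas a plain sum of per-finding scores could place the application with the most low-severity findings first.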

Development of Online Fashion Thesaurus and Taxonomy for Text Mining (텍스트마이닝을 위한 패션 속성 분류체계 및 말뭉치 웹사전 구축)

  • Seyoon Jang;Ha Youn Kim;Songmee Kim;Woojin Choi;Jin Jeong;Yuri Lee
    • Journal of the Korean Society of Clothing and Textiles / v.46 no.6 / pp.1142-1160 / 2022
  • Text data plays a significant role in understanding and analyzing trends in the consumer, business, and social sectors. Text analysis requires a corpus that reflects domain-specific knowledge, but the field of fashion lacks a sufficient professional corpus. This study aims to develop a taxonomy and thesaurus that reflect the specialized nature of fashion products. To this end, about 100,000 fashion vocabulary terms were collected by crawling text data from WGSN, Pantone, and online platforms, and the text was then extracted through preprocessing with Python. The taxonomy consists of seven attributes of clothes: items, silhouettes, details, styles, colors, textiles, and patterns/prints. The corpus was completed by processing synonyms of terms drawn from fashion reference books such as dictionaries. Finally, 10,294 vocabulary words, including 1,956 standard Korean words, were classified in the taxonomy, and all of the data was built into a web dictionary system. Quantitative and qualitative performance tests of the results were conducted through expert reviews. The performance of the thesaurus was also verified by comparing text mining results against those obtained with a previously developed corpus. This study contributes to a text data standard and enables meaningful text mining results in the fashion field.

Avocado Classification and Shipping Prediction System based on Transfer Learning Model for Rational Pricing (합리적 가격결정을 위한 전이학습모델기반 아보카도 분류 및 출하 예측 시스템)

  • Seong-Un Yu;Seung-Min Park
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.2 / pp.329-335 / 2023
  • The avocado, a superfood selected by Time magazine and a late-ripening fruit, shows a large gap between its price at origin and its domestic distribution price. Automating the avocado sorting process would make it possible to lower prices by reducing labor costs in various fields. In this paper, we aim to create an optimal classification model by building an avocado dataset through crawling and applying several deep learning-based transfer learning models. Experiments were conducted by applying the transfer learning models to a dataset split from the created dataset and fine-tuning the models' hyperparameters. Given an avocado image, the model classifies the ripeness of the avocado with an accuracy of over 99%. We thus propose a dataset and algorithm that can reduce manpower and increase accuracy for avocado producers and distributors.

Job-related analysis and visualization using big data distributed processing system (빅데이터를 활용한 직업관련 분석 및 시각화)

  • Choi, Dong-Cheol;Choi, Nakjin;Kim, Min-Seok;Park, Jun-wook;Lee, Jun-Dong
    • Proceedings of the Korean Society of Computer Information Conference / 2020.07a / pp.249-251 / 2020
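  • In this paper, we carried out job-related analysis and visualization using big data to examine how the COVID-19 outbreak affected the domestic job market. The base data came from Statistics Korea and the WorkNet Open API, and we attempted to predict outcomes after big data processing. We compared and visualized 2020 employment figures obtained through the WorkNet Open API against Statistics Korea data, and performed time series analysis and forecasting on the number of employed persons from 2008 to 2020 to anticipate the future trend. The comparison of 2019 and 2020 showed no large difference. In addition, the time series analysis showed that employment rises overall each year, with a decrease in April and an increase in July. The results indicate that, following the COVID-19 outbreak, the employment rate in transportation and communications rose because of public institutions and the videoconferencing and remote work of the contact-free era, whereas self-employment and service occupations declined sharply compared with other occupations; nevertheless, employment is predicted to increase gradually as the national economy recovers.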

Design and Implementation of Crime Prevention System Targeting Women by Using Public BigData (공공 빅데이터를 이용한 여성 대상 범죄 예방 시스템의 설계 및 구현)

  • Ko, Sung-Wook;Oh, Su-Bin;Baek, Se-In;Park, Hyeok-Ju;Park, Mee-Hwa;Lee, Kang-Woo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.10a / pp.561-564 / 2016
  • A crime map marking areas where violent crimes targeting women frequently occur would let the police prevent further crimes by concentrating patrols in expected crime zones, and would let individuals avoid harm by consulting information on those zones. In this paper, we analyze data on crimes targeting women and on offenders provided by the public open data portal, and propose a crime prevention system that calculates the danger of each location and provides user-customized danger information based on the user's location and age group. The system collects the offender data by crawling the public open data portal. For areas where sexual crimes have occurred, it calculates the crime danger from statistical crime information, including offender information, offenders' residences, the areas where sexual crimes occurred, sentences, and the number of crimes, and visualizes those areas according to the danger grade at the user's location. By interworking with GIS, the danger score calculated per location can provide crime information tailored to the user's location and age.

Prototype Design and Development of Online Recruitment System Based on Social Media and Video Interview Analysis (소셜미디어 및 면접 영상 분석 기반 온라인 채용지원시스템 프로토타입 설계 및 구현)

  • Cho, Jinhyung;Kang, Hwansoo;Yoo, Woochang;Park, Kyutae
    • Journal of Digital Convergence / v.19 no.3 / pp.203-209 / 2021
  • This study proposes a prototype design model for an online recruitment system that uses multi-dimensional data crawling and social media analysis and validates the text information and video interviews in the job application process. The study includes a comparative analysis process based on text mining to verify the authenticity of job application documents and to hire and allocate workers effectively based on their potential job capability. Using the prototype system, we conducted performance tests and analyzed the results for key performance indicators such as text mining accuracy and the recognition rate of the interview STT (speech-to-text) function. If commercialized on the basis of the design specifications and prototype results derived from this study, the system is expected to serve as the intelligent online recruitment technology required in future public and private recruitment markets.

An Analysis of the Support Policy for Small Businesses in the Post-Covid-19 Era Using the LDA Topic Model (LDA 토픽 모델을 활용한 포스트 Covid-19 시대의 소상공인 지원정책 분석)

  • Kyung-Do Suh;Jung-il Choi;Pan-Am Choi;Jaerim Jung
    • Journal of Industrial Convergence / v.22 no.6 / pp.51-59 / 2024
  • The purpose of this paper is to suggest government policies that are of practical help to small business owners in pandemic situations such as COVID-19. To this end, news articles were crawled using the keywords "COVID-19 Support for Small Businesses", "The Impact of Small Businesses by Response System to COVID-19 Infectious Diseases", and "COVID-19 Small Business Economic Policy"; keyword frequency analysis and word cloud analysis were performed on them as text mining; and major issues were identified through LDA topic modeling. In the LDA results, the support policy for small business owners formed topic labels of government cash support and financial support; the impact on small business owners of the COVID-19 response system formed topic labels of a government-led quarantine system and an individual-led quarantine system; and the COVID-19 economic policy formed a topic label of policies that help small business owners overcome economic crisis and become self-sustaining. Based on these topic labels, the study aims to provide basic data for understanding policies that reduce damage to small business owners and policies that strengthen their market competitiveness in future pandemic situations.
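The keyword frequency analysis mentioned in the abstract above can be sketched with the Python standard library alone. The sample texts, the stop-word list, and the function name `keyword_frequencies` are hypothetical stand-ins; the study presumably worked on crawled Korean news articles, which would need a morphological analyzer rather than the simple English tokenizer assumed here.

```python
import re
from collections import Counter

# Hypothetical stand-ins for crawled news article texts.
articles = [
    "Government expands cash support for small businesses",
    "Small businesses seek financial support during quarantine",
]

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"for", "during", "the", "a", "of"}


def keyword_frequencies(texts: list[str]) -> Counter:
    """Count lowercase word tokens across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


freqs = keyword_frequencies(articles)
print(freqs.most_common(3))
```

The resulting counts are the kind of input a word cloud is drawn from, and the same tokenized texts could be passed on to a topic model such as LDA.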