• Title/Summary/Keyword: Page Similarity

Improving Performance of Search Engine Using Category based Evaluation (범주 기반 평가를 이용한 검색시스템의 성능 향상)

  • Kim, Hyung-Il;Yoon, Hyun-Nim
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.1
    • /
    • pp.19-29
    • /
    • 2013
  • In the current Internet environment, where the information space is highly complex, search engines aim to provide the accurate information that users want. However, the content-based method adopted by most search engines is not an effective tool in this environment. Because the content-based method weights each web page using the morphological characteristics of its vocabulary, it has the drawback of not distinguishing web pages effectively. To resolve this problem and provide useful information to users, this paper proposes an evaluation method based on categories. The category-based evaluation method extends the query through semantic relations and measures its similarity to web pages. When weighting web pages, the method uses user responses to retrieved web pages and the categories of the query, and thus distinguishes web pages better. The proposed method has the advantage of effectively providing the information users want through search engines, and the utility of the category-based evaluation technique has been confirmed through various experiments.

A Traceback-Based Authentication Model for Active Phishing Site Detection for Service Users (서비스 사용자의 능동적 피싱 사이트 탐지를 위한 트레이스 백 기반 인증 모델)

  • Baek Yong Jin;Kim Hyun Ju
    • Convergence Security Journal
    • /
    • v.23 no.1
    • /
    • pp.19-25
    • /
    • 2023
  • The network environment has evolved from an initial one-way information provision service into a real-time interactive service. Depending on the form of web-based information sharing, various knowledge and services can be exchanged between users. However, in this web-based real-time information sharing environment, cases of damage caused by illegal attackers who exploit network vulnerabilities are increasing rapidly. In particular, an attacker attempting a phishing attack actively generates a forged web page and then induces a user who needs a specific web page service to follow a link to it. In this paper, we analyze a method in which users directly and actively determine whether a specific site is forged, rather than relying on a passive server-based detection method. Using traceback information, the method detects the disguised web page of an attacker who induces illegal web page access, which makes it possible to prevent leakage of the important personal information of general users.

Web Site Keyword Selection Method by Considering Semantic Similarity Based on Word2Vec (Word2Vec 기반의 의미적 유사도를 고려한 웹사이트 키워드 선택 기법)

  • Lee, Donghun;Kim, Kwanho
    • The Journal of Society for e-Business Studies
    • /
    • v.23 no.2
    • /
    • pp.83-96
    • /
    • 2018
  • Extracting keywords that represent documents is very important because such keywords can be used for automated services such as document search, classification, and recommendation systems, as well as for quickly conveying document information. However, when keywords are extracted based on the frequency of words appearing in web site documents, or by graph algorithms based on the co-occurrence of words, two problems arise: the web page structure may contain various words that are unrelated to the topic, and semantic keywords are difficult to extract because of the limited performance of the Korean tokenizer. In this paper, we propose a method that selects candidate keywords based on semantic similarity, addressing both the failure to extract semantic keywords and the poor accuracy of Korean tokenizer analysis. Finally, a filtering process removes inconsistent keywords to yield the final semantic keywords. Experimental results on real web pages of small businesses show that the proposed method improves performance by 34.52% over the statistical similarity based keyword selection technique. This confirms that keyword extraction from documents improves when the semantic similarity between words is considered and inconsistent keywords are removed.
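
The idea of keeping only candidate keywords that are semantically close to one another can be sketched as follows. This is a minimal illustration, not the authors' procedure: the three-dimensional vectors stand in for Word2Vec embeddings, and the function names, the mean-similarity rule, and the `threshold` value are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_keywords(vectors, threshold=0.5):
    """Keep words whose mean similarity to the other candidates meets the threshold.

    `vectors` maps each candidate keyword to its embedding. The rule and the
    default threshold are illustrative assumptions, not taken from the paper.
    """
    kept = []
    for word, vec in vectors.items():
        others = [cosine(vec, v) for w, v in vectors.items() if w != word]
        if others and sum(others) / len(others) >= threshold:
            kept.append(word)
    return kept


# Hypothetical toy embeddings: three topic words and one page-structure word.
toy_vectors = {
    "bakery": [0.9, 0.1, 0.0],
    "bread": [0.8, 0.2, 0.1],
    "cake": [0.85, 0.15, 0.05],
    "login": [0.0, 0.1, 0.9],
}

print(filter_keywords(toy_vectors))
```

In this toy data the word "login", which would typically come from page structure rather than the topic, has a low mean similarity to the other candidates and is filtered out.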

Analysis on Printing Advertisements Appearing Monthly Magazines for Korean Women (여성잡지광고의 레이아웃요소와 제품생명주기에 관한 연구)

  • Lee, Kwang-Sook
    • Journal of the Korean Graphic Arts Communication Society
    • /
    • v.20 no.2
    • /
    • pp.107-118
    • /
    • 2002
  • Two-page spread advertisements appearing in monthly women's magazines were selected and analyzed for this study. In total, 299 advertisements were sampled from five women's magazines published from January to June 2002. The results show that: 1) all layout elements in the advertisements are significantly related to the product life cycle; 2) given the similarity of layout types, differentiating the layout from competitors' advertisements is proposed; 3) realistic pictures of products, produced with advanced equipment and skill, occupied most of the ad space, which suggests the possibility of more varied illustration types; and finally, the use of foreign models has increased for both imported and domestic products, which reflects the westernized outlook of current consumers.

User Reputation Evaluation Using Co-occurrence Feature and Collective Intelligence (동시출현 자질과 집단 지성을 이용한 지식검색 문서 사용자 명성 평가)

  • Lee, Hyun-Woo;Han, Yo-Sub;Kim, Lae-Hyun;Cha, Jeong-Won
    • Korean Journal of Cognitive Science
    • /
    • v.19 no.4
    • /
    • pp.459-476
    • /
    • 2008
  • The need for users to find answers to their questions is growing fast in services that use collective intelligence. Previous research has shown that non-text information such as view counts, the number of referrers, and the number of answers is useful for evaluating answers. There have also been many works on evaluating answers using various kinds of word dictionaries. In this work, we propose a new method that evaluates answers to questions effectively using user reputation estimated from social activity. We use a modified PageRank algorithm to estimate user reputation. We also use the similarity between the question and the answer. Experimental results on the Naver GisikiN corpus show that the proposed method yields a meaningful improvement in the answer selection rate.

An Automated Technique for Illegal Site Detection using the Sequence of HTML Tags (HTML 태그 순서를 이용한 불법 사이트 탐지 자동화 기술)

  • Lee, Kiryong;Lee, Heejo
    • Journal of KIISE
    • /
    • v.43 no.10
    • /
    • pp.1173-1178
    • /
    • 2016
  • Since the introduction of the BitTorrent protocol in 2001, everything can be downloaded through file sharing, including music, movies, and software. As a result, copyright holders suffer from the illegal sharing of copyrighted content. To solve this problem, countries have enacted laws against illegal sharing, and Internet service providers block pirate sites. However, illegal sites such as The Pirate Bay easily reopen by changing their domain name. Thus, we propose a technique to easily detect pirate sites that have been reopened. This automated technique collects domain names using the Google search engine and measures similarity with the Longest Common Subsequence (LCS) algorithm by comparing the tag structure of the source web page with that of the reopened web page. For evaluation, we collected 2,383 domains from Google search. Applying the LCS algorithm to the collected domains detected a total of 44 pirate sites. In addition, the technique detected 23 pirate sites among 805 domains when applied to foreign pirate sites. The experiment shows that reopened pirate sites can be detected easily with an automated detection system.
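
The tag-structure comparison described above can be sketched with the standard-library HTML parser and a textbook LCS dynamic program. This is an assumption-laden illustration rather than the authors' implementation: the helper names are hypothetical, only opening tags are collected, and normalizing the LCS length by the longer tag sequence is one of several plausible choices that the abstract does not specify.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects opening tag names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Return the list of opening tag names found in an HTML string."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def lcs_length(a, b):
    """Length of the longest common subsequence, keeping one DP row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def tag_similarity(html_a, html_b):
    """LCS length of the two tag sequences divided by the longer length (assumed)."""
    a, b = tag_sequence(html_a), tag_sequence(html_b)
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


# Hypothetical pages: the second differs in text, attributes, and one extra tag.
source_page = "<html><body><div><a href='x'>t</a><p>hi</p></div></body></html>"
reopened_page = (
    "<html><body><div><a href='y'>u</a><span>ad</span><p>yo</p></div></body></html>"
)

print(tag_similarity(source_page, reopened_page))
```

Because only the tag order is compared, changes to text content or link targets do not lower the score, which is why a reopened site under a new domain can still look similar to its source page.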

Automatic Meeting Summary System using Enhanced TextRank Algorithm (향상된 TextRank 알고리즘을 이용한 자동 회의록 생성 시스템)

  • Bae, Young-Jun;Jang, Ho-Taek;Hong, Tae-Won;Lee, Hae-Yeoun
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.11 no.5
    • /
    • pp.467-474
    • /
    • 2018
  • Organizing and documenting the contents of meetings and discussions is very important in various tasks. In the past, however, people had to organize the contents manually. In this paper, we describe the development of a system that generates meeting minutes automatically using the TextRank algorithm. The proposed system records all of the speakers' utterances in real time and calculates similarity based on the appearance frequency of the sentences. Then, to create the meeting minutes, it extracts important words or phrases through an unsupervised learning algorithm that finds the relations between the sentences in the document data. In particular, we improved performance by introducing a keyword weighting technique into the TextRank algorithm, which adapts the PageRank algorithm to words and sentences.
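
A compact sketch of sentence ranking in the TextRank style may help make the description concrete. It assumes the word-overlap similarity from the original TextRank formulation and a PageRank-style iteration with a damping factor of 0.85; the optional `keyword_weights` argument is a hypothetical stand-in for the keyword weighting mentioned in the abstract, whose actual form is not given there.

```python
import math


def similarity(s1, s2, keyword_weights=None):
    """Weighted word overlap, normalized by the log of the sentence lengths."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if len(w1) < 2 or len(w2) < 2:
        return 0.0  # avoids a zero denominator for one-word sentences
    weights = keyword_weights or {}
    overlap = sum(weights.get(w, 1.0) for w in w1 & w2)
    return overlap / (math.log(len(w1)) + math.log(len(w2)))


def textrank(sentences, keyword_weights=None, damping=0.85, iterations=50):
    """Score sentences by a PageRank-style iteration over the similarity graph."""
    n = len(sentences)
    sim = [
        [
            0.0 if i == j else similarity(sentences[i], sentences[j], keyword_weights)
            for j in range(n)
        ]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                sim[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


# Hypothetical utterances: the third shares no words with the other two.
utterances = [
    "the budget review is due friday",
    "we must finish the budget review",
    "lunch was good today",
]

for sentence, score in zip(utterances, textrank(utterances)):
    print(round(score, 3), sentence)
```

Sentences that share vocabulary reinforce each other's scores, while an isolated sentence keeps only the baseline score of `1 - damping`.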

Analysis of N-Terminal Amino Acid Sequence of Catechol 2,3-dioxygenase from Aniline Degrading Delftia sp. JK-2 (Aniline 분해세균 Delftia sp. JK-2에서 분리된 Catechol 2,3-dioxygenase의 N-말단 아미노산 서열 분석)

  • Hwang Seon-Young;Kahng Hyung-Yeel;Oh Kye-Heon
    • Korean Journal of Microbiology
    • /
    • v.41 no.1
    • /
    • pp.13-17
    • /
    • 2005
  • The aim of this work was to investigate the N-terminal amino acid sequence of catechol 2,3-dioxygenase (C2,3O) isolated from Delftia sp. JK-2, which can utilize aniline as its sole carbon, nitrogen, and energy source. The molecular weight of the enzyme was determined to be approximately 35 kDa by SDS-PAGE. The N-terminal amino acid sequence of C2,3O from strain JK-2 was $^{1}$MGVMRIGHASLKVMDMDAAVRHYENV$^{26}$, and it exhibited high sequence similarity with that of C2,3O from Pseudomonas sp., Comamonas sp. JS765, Comamonas testosteroni, or Burkholderia sp. RP007. An approximately 950-bp C2,3O fragment was obtained by PCR using primers derived from the N-terminal amino acid sequence. Analysis of the DNA sequence yielded a deduced sequence of 296 amino acids, which showed 100% identity with C2,3O from Pseudomonas sp. AW-2 and 97% similarity with that from Comamonas sp. JS765.

Cloning and Identification of Essential Residues for Thermostable β-glucosidase (BgIB) from Thermotoga maritima (Thermotoga maritima로부터 고온성 β-glucosidase (BgIB)의 클로닝과 필수아미노산 잔기의 확인)

  • Hong, Su-Young;Cho, Kye-Man;Kim, Yong-Hee;Hong, Sun-Joo;Cho, Soo-Jeong;Cho, Yong-Un;Kim, Hoon;Yun, Han-Dae
    • Journal of Life Science
    • /
    • v.16 no.7 s.80
    • /
    • pp.1148-1157
    • /
    • 2006
  • The hyperthermophilic bacterium Thermotoga maritima produces a thermostable $\beta$-glucosidase. The gene encoding $\beta$-glucosidase from T. maritima MSB8 was cloned and expressed in Escherichia coli. The enzyme (BglB) hydrolyzed $\beta$-glucosidic linkages between glucose and alkyl, aryl, or saccharide groups in substrates such as salicin, arbutin, and $p$NPG. The insert DNA contained an ORF of 2,166 bp encoding 721 amino acids (calculated molecular mass of 80,964 and pI of 4.93). The amino acid sequence of BglB showed similarity to family 3 glycosyl hydrolases. The molecular weight of the enzyme was estimated to be approximately 81 kDa by MUG-nondenaturing PAGE (4-methylumbelliferyl $\beta$-D-glucoside-nondenaturing polyacrylamide gel electrophoresis) and SDS-PAGE. The $\beta$-glucosidase exhibited maximal activity at pH 7.0 and $80^{\circ}C$. Replacing two candidate residues (Glu-232 and Asp-242) with Ala by site-directed mutagenesis showed that both are essential for enzymatic activity.

Purification and Characterization of Laccase from Basidiomycete Fomitella fraxinea

  • Park, Kyung-Mi;Park, Sang-Shin
    • Journal of Microbiology and Biotechnology
    • /
    • v.18 no.4
    • /
    • pp.670-675
    • /
    • 2008
  • A laccase was isolated from the culture filtrate of the basidiomycete Fomitella fraxinea. The enzyme was purified to electrophoretic homogeneity using ammonium sulfate precipitation, anion-exchange chromatography, and gel-filtration chromatography. The enzyme was identified as a monomeric protein with a molecular mass of 47 kDa by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and gel-filtration chromatography, and had an isoelectric point of 3.8. The N-terminal amino acid sequence for the enzyme was ATXSNXKTLAAD, which had a very low similarity to the sequences previously reported for laccases from other basidiomycetes. The optimum pH and temperature for 2,2'-azino-bis(3-ethylbenzothiazoline-6-sulfonate) (ABTS) were 3.0 and $70^{\circ}C$, respectively. The enzyme also showed a much higher level of specific activity for ABTS and 2,6-dimethoxyphenol (DMP), where the $K_m$ values of the enzyme for ABTS and 2,6-DMP were 270 and 426 $\mu$M, respectively, and the $V_{max}$ values were 876 and 433.3 $\mu$M/min, respectively. The laccase activity was completely inhibited by L-cysteine, dithiothreitol (DTT), and sodium azide, significantly inhibited by Ni$^{+}$, Mn$^{2+}$, and Ba$^{2+}$, and slightly stimulated by K$^{+}$ and Ca$^{2+}$.
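
As a reading aid for the kinetic constants above, and assuming the enzyme follows standard Michaelis-Menten kinetics (which reporting $K_m$ and $V_{max}$ implies but the abstract does not state), the reaction rate $v$ at substrate concentration $[S]$ is:

$$v = \frac{V_{max}\,[S]}{K_m + [S]}$$

At $[S] = K_m$ the rate is half of $V_{max}$. With the reported values, that is $876 / 2 = 438\ \mu$M/min at 270 $\mu$M ABTS and $433.3 / 2 = 216.65\ \mu$M/min at 426 $\mu$M 2,6-DMP. The ratio $V_{max}/K_m$, which governs the rate at low substrate concentration, is about $876 / 270 \approx 3.24\ \text{min}^{-1}$ for ABTS and $433.3 / 426 \approx 1.02\ \text{min}^{-1}$ for 2,6-DMP.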