• Title/Summary/Keyword: Journal PageRank


The Distinct Impact Dimensions of the Prestige Indices in Author Citation Networks (저자 인용 네트워크에서 명망성 지표의 차별된 영향력 측정기준에 관한 연구)

  • Ahn, Hyerim; Park, Ji-Hong
    • Journal of the Korean Society for Information Management / v.33 no.2 / pp.61-76 / 2016
  • This study proposes three prestige indices (closeness prestige, input domain, and proximity prestige) as useful measures of the impact of a particular node in citation networks. It compares these prestige indices with other impact indices, since it is still unknown which dimensions of impact these indices actually measure. The prestige indices identify the most prominent actors in a directed network, much as centrality indices do in undirected networks. Correlation analysis and principal component analysis were conducted on an author citation network to identify how the three prestige indices differ in their implications from the existing impact indices. As the existing impact measures for comparison, we selected simple citation counts, the h-index, PageRank, and three kinds of centrality indices that assume undirected networks. The results indicate that the prestige indices capture an impact dimension distinct from that of the other impact indices: the prestige indices reflect indirect impact, whereas the others reflect direct impact.
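For concreteness, input domain and proximity prestige can be computed with the standard social-network-analysis definitions, which this sketch assumes (the abstract does not give the paper's exact formulas). In a citation graph where an edge i -> j means i cites j, the input domain of j is the set of nodes that can reach j, and proximity prestige is P_P(j) = (I_j / (g - 1)) / (Σ_i d(i, j) / I_j), where I_j is the input-domain size, g the number of nodes, and d the shortest directed distance. The graph and function names are illustrative.

```python
from collections import deque

def input_domain_distances(graph, target):
    """Return {node: shortest directed distance to target} for every node
    that can reach target. graph maps each node to the nodes it cites."""
    # Reverse the edges so a BFS from target walks citations backwards.
    reverse = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for pred in reverse.get(node, []):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)
    del dist[target]  # a node is not part of its own input domain
    return dist

def proximity_prestige(graph, target):
    """(I / (g - 1)) divided by the mean distance from the input domain."""
    nodes = set(graph)
    for dsts in graph.values():
        nodes.update(dsts)
    g = len(nodes)
    dist = input_domain_distances(graph, target)
    size = len(dist)  # I: number of nodes in the input domain
    if size == 0 or g < 2:
        return 0.0
    mean_distance = sum(dist.values()) / size
    return (size / (g - 1)) / mean_distance
```

For example, with A and B citing C and D citing A, C has an input domain of three authors at mean distance 4/3, giving a proximity prestige of 0.75.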

Implementation of Efficient Power Method on CUDA GPU (CUDA 기반 GPU에서 효율적인 Power Method의 구현)

  • Kim, Jung-Hwan; Kim, Jin-Soo
    • Journal of the Korea Society of Computer and Information / v.16 no.2 / pp.9-16 / 2011
  • GPU computing is emerging in high-performance application areas because it can exploit massive parallelism cost-effectively. The power method, which finds the dominant eigenvector of a given matrix, is widely used in applications such as PageRank, which calculates the importance of web pages. In this research, we parallelized the power method efficiently on a GPU and suggested how its performance can be further improved. The power method consists mainly of matrix-vector products, which are easily parallelized. However, it must also decide whether the eigenvector has converged and then scale the vector. These operations incur several GPU kernel calls and data movement between host and GPU memory. We improved the performance of the power method by reducing the number of GPU kernel calls, optimizing thread allocation, and improving the convergence-decision operation.
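The three per-iteration steps named above (matrix-vector product, scaling, convergence decision) can be sketched on the CPU as follows. This is a plain-Python illustration, not the authors' CUDA code; the L2 normalization and the max-absolute-difference stopping rule are assumptions. On a GPU, the norm and the difference are both reductions over the whole vector, which is one reason these steps can cost extra kernel launches.

```python
import math

def power_method(matrix, tol=1e-10, max_iter=1000):
    """Approximate the dominant eigenvector of a square matrix (list of rows)."""
    n = len(matrix)
    x = [1.0 / n] * n  # uniform starting vector
    for _ in range(max_iter):
        # 1. Matrix-vector product: the easily parallelized step.
        y = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
        # 2. Scaling: normalize to unit L2 length to avoid overflow/underflow.
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            raise ValueError("matrix maps the iterate to the zero vector")
        y = [v / norm for v in y]
        # 3. Convergence decision: stop once the iterate barely changes.
        diff = max(abs(a - b) for a, b in zip(x, y))
        x = y
        if diff < tol:
            break
    return x
```

If the dominant eigenvalue is negative, the iterate flips sign every step and this simple stopping rule never fires; PageRank matrices avoid the issue because their dominant eigenvalue is 1.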

User Reputation Evaluation Using Co-occurrence Feature and Collective Intelligence (동시출현 자질과 집단 지성을 이용한 지식검색 문서 사용자 명성 평가)

  • Lee, Hyun-Woo; Han, Yo-Sub; Kim, Lae-Hyun; Cha, Jeong-Won
    • Korean Journal of Cognitive Science / v.19 no.4 / pp.459-476 / 2008
  • Users' need to find answers to their questions is growing rapidly in knowledge-search services built on collective intelligence. Previous research showed that non-text information, such as view counts, the number of referrers, and the number of answers, is useful for evaluating answers. There have also been many studies that evaluate answers using various kinds of word dictionaries. In this work, we propose a new method that evaluates answers to questions effectively using user reputation estimated from social activity. We use a modified PageRank algorithm to estimate user reputation, and we also use the similarity between the question and the answer. Experimental results on the Naver GisikiN corpus show that the proposed method achieves meaningful performance in improving the answer selection rate.
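The abstract does not say how PageRank is modified, so the sketch below shows only standard PageRank with uniform redistribution of dangling-node mass, run on a hypothetical user graph in which an edge runs from an asker to a user who answered. The paper's modifications and its question-answer similarity term are not reproduced.

```python
def user_reputation(edges, damping=0.85, tol=1e-10, max_iter=100):
    """Standard PageRank over a hypothetical asker -> answerer graph.

    edges: list of (asker, answerer) pairs. Returns {user: score}, summing to 1.
    """
    users = sorted({u for edge in edges for u in edge})
    n = len(users)
    out_links = {u: [] for u in users}
    for asker, answerer in edges:
        out_links[asker].append(answerer)
    rank = {u: 1.0 / n for u in users}
    for _ in range(max_iter):
        # Mass held by users with no outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in users if not out_links[u])
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in users}
        for u in users:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        diff = sum(abs(new_rank[u] - rank[u]) for u in users)
        rank = new_rank
        if diff < tol:
            break
    return rank
```

With edges a -> c, b -> c, and c -> d, user d ends up above c, and c above a and b, because reputation flows along the chain.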


A Research for Web Documents Genre Classification using STW (STW를 이용한 웹 문서 장르 분류에 관한 연구)

  • Ko, Byeong-Kyu; Oh, Kun-Seok; Kim, Pan-Koo
    • Journal of Information Technology and Architecture / v.9 no.4 / pp.413-422 / 2012
  • Many researchers have studied how to make machines understand the meaning of human natural language, using text-based, PageRank-based, and other approaches. In particular, URL and HTML tag information in web documents is again attracting attention as a means of automatically analyzing huge numbers of web documents. In this paper, we propose an STW (Semantic Term Weight) approach based on the syntactic and linguistic structure of web documents to classify their genres. For training, we analyzed more than 1,000 documents from the 20-Genre-collection corpus with an SVM-based classifier. We then tested on the KI-04 corpus to evaluate the proposed method, comparing classification accuracy with and without STW. The proposed STW-based approach achieved approximately 10.2% higher accuracy than the approach without STW.
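The abstract does not define STW's weighting scheme. As a loose illustration of weighting terms by HTML structure, the sketch below counts each term with a weight taken from the most important tag enclosing it. TAG_WEIGHTS and every value in it are hypothetical; the resulting term vectors could then be fed to an SVM, a step that is omitted here.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights: terms inside these tags count more than body text.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.5, "h2": 2.0, "a": 1.5, "b": 1.2, "strong": 1.2}
VOID_TAGS = {"br", "img", "meta", "link", "hr", "input"}

class TagWeightedCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []      # stack of currently open elements
        self.weights = Counter()

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy HTML.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        # Each term gets the largest weight among its enclosing tags.
        weight = max((TAG_WEIGHTS.get(t, 1.0) for t in self.open_tags), default=1.0)
        for term in re.findall(r"\w+", data.lower()):
            self.weights[term] += weight

def tag_weighted_terms(html):
    parser = TagWeightedCounter()
    parser.feed(html)
    parser.close()
    return parser.weights
```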

Global Technical Knowledge Flow Analysis in Intelligent Information Technology : Focusing on South Korea (지능정보기술 분야에서의 글로벌 기술 지식 경쟁력 분석 : 한국을 중심으로)

  • Kwak, Gihyun; Yoon, Jungsub
    • The Journal of the Korea Contents Association / v.21 no.1 / pp.24-38 / 2021
  • This study measures Korea's global competitiveness in intelligent information technology, the core technology of the Fourth Industrial Revolution. For the analysis, we collected from PATSTAT Online the patents in each field filed at the U.S. Patent and Trademark Office (USPTO) between 2010 and 2018, together with the prior patents they cite. A global knowledge-transfer network was built by aggregating citing and cited relationships at the national level. First, in-degree centrality is used to evaluate technology acceptance, that is, the absorption of existing technological knowledge to create new knowledge in each field. Second, out-degree centrality is examined to evaluate the impact of existing technological knowledge on the creation of new knowledge. Third, we apply the PageRank algorithm to examine the importance of the relationships between countries both qualitatively and quantitatively. All the indicators confirm that the AI sector is currently the least competitive.
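Counting in-degree and out-degree on a country-level citation network can be sketched as below. The edge direction is an assumption: knowledge is taken to flow from the cited country to the citing country, so a citation adds to the citing country's in-degree (absorption) and to the cited country's out-degree (impact). Dropping within-country citations is likewise an assumption of this sketch, and the country codes are illustrative.

```python
from collections import Counter

def degree_centrality(citations):
    """citations: iterable of (citing_country, cited_country) pairs.

    Returns (in_degree, out_degree) as Counters of citation counts.
    """
    in_degree, out_degree = Counter(), Counter()
    for citing, cited in citations:
        if citing == cited:
            continue  # assumption: ignore within-country citations
        in_degree[citing] += 1   # knowledge absorbed by the citing country
        out_degree[cited] += 1   # knowledge supplied by the cited country
    return in_degree, out_degree
```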

Snippet Extraction Method using Fuzzy Implication Operator and Relevance Feedback (연관 피드백과 퍼지 함의 연산자를 이용한 스니핏 추출 방법)

  • Park, Sun; Shim, Chun-Sik; Lee, Seong-Ro
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.3 / pp.424-431 / 2012
  • In information retrieval, a search engine provides users with a ranking of web pages and a summary of each page. A snippet is a summary that represents a web page, and whether a user visits a page is affected by its snippet. Sometimes a snippet leads the user to a page that does not match the user's intention, because existing snippet extraction methods have difficulty capturing user intention accurately. To solve this problem, this paper proposes a new snippet extraction method using a fuzzy implication operator and relevance feedback. The proposed method uses relevance feedback to expand the user's query, and then applies the fuzzy implication operator between the expanded query and the web pages to extract snippets that reflect the user's semantic intention well. The experimental results demonstrate that the proposed method achieves better snippet extraction performance than the other methods.
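The abstract does not say which fuzzy implication operator is used. As an illustration only, the sketch below assumes the Łukasiewicz implication I(a, b) = min(1, 1 - a + b) and scores each candidate sentence by the mean truth of "query term weight implies sentence term weight" over the already-expanded query terms. The relevance-feedback expansion itself is omitted, and all term weights are hypothetical.

```python
def lukasiewicz(a, b):
    """Lukasiewicz fuzzy implication: truth of 'a implies b' for a, b in [0, 1]."""
    return min(1.0, 1.0 - a + b)

def sentence_score(query, sentence_terms):
    """query, sentence_terms: {term: membership in [0, 1]}.

    A query term the sentence lacks is treated as membership 0 in the sentence.
    """
    if not query:
        return 0.0
    total = sum(lukasiewicz(w, sentence_terms.get(t, 0.0)) for t, w in query.items())
    return total / len(query)

def extract_snippet(query, sentences, k=2):
    """sentences: list of (text, term_memberships); returns the k best texts."""
    ranked = sorted(sentences, key=lambda s: sentence_score(query, s[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The implication rewards sentences that contain the heavily weighted query terms: a missing term with weight 1.0 contributes 0, while a missing low-weight term is penalized only mildly.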

Korean Web Content Extraction using Tag Rank Position and Gradient Boosting (태그 서열 위치와 경사 부스팅을 활용한 한국어 웹 본문 추출)

  • Mo, Jonghoon; Yu, Jae-Myung
    • Journal of KIISE / v.44 no.6 / pp.581-586 / 2017
  • For automatic web scraping, unnecessary components such as menus and advertisements need to be removed from web pages so that the main content can be extracted automatically. A content block tends to be located in the middle of a web page. In particular, Korean web documents rarely include metadata and have complex designs, so a suitable method of content extraction is needed. Existing content extraction algorithms use the textual and structural features of content blocks, because processing visual features requires heavy computation for rendering and image processing. In this paper, we propose a new content extraction method that uses tag positions in HTML as a quasi-visual feature. In addition, we develop the tag rank position, a type of tag position that is not affected by text length, and show that gradient boosting with the tag rank position is a highly accurate content extraction method. The results show that the method can be used to collect high-quality text data automatically from various web pages.
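The abstract does not define tag rank position precisely. One plausible reading, assumed here, is a tag's ordinal index among all start tags, scaled to [0, 1]: unlike a character offset, it does not shift when a block before it contains more text. The class and function names are hypothetical, and the gradient-boosting classifier that would consume this feature is not shown.

```python
from html.parser import HTMLParser

class StartTagCollector(HTMLParser):
    """Records start tags in document order."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

def tag_rank_positions(html):
    """Return (tag, position) pairs with position = index / (n - 1).

    Hypothetical definition: position depends only on how many tags precede
    a tag, not on how much text they contain.
    """
    parser = StartTagCollector()
    parser.feed(html)
    parser.close()
    n = len(parser.tags)
    if n == 1:
        return [(parser.tags[0], 0.0)]
    return [(tag, i / (n - 1)) for i, tag in enumerate(parser.tags)]
```

Padding the first paragraph with a thousand words leaves every position unchanged, which is the length-independence property the abstract describes.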

Nonparametric procedures based on aligned method and placement for ordered alternatives in randomized block design (랜덤화 블록 모형에서 정렬방법과 위치를 이용한 순서형 대립가설에 대한 비모수 검정법)

  • Kim, Hyosook; Kim, Dongjae
    • The Korean Journal of Applied Statistics / v.29 no.4 / pp.707-717 / 2016
  • Nonparametric procedures for a randomized block design were proposed by Friedman (1937) for general alternatives, and a test for ordered alternatives was suggested by Page (1963). These methods use the ranks of the treatments within each block. In this paper, we propose nonparametric procedures for a randomized block design that use the aligned method proposed by Hodges and Lehmann (1962), to reduce among-block effects, and that are based on the placement suggested by Kim (1999). We also perform a Monte Carlo study to compare the empirical powers of the proposed procedures and the established methods.
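Page's statistic and the Hodges-Lehmann alignment step can be sketched as follows. Page's L = Σ_j j·R_j, where R_j is the sum over blocks of treatment j's within-block rank and treatments are listed in the hypothesized increasing order. The alignment shown subtracts each block's mean before ranking all observations jointly (the median is a common alternative). The placement-based procedures of Kim (1999) and the paper's proposed statistics are not reproduced.

```python
def average_ranks(values):
    """Ranks 1..n, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def page_L(data):
    """data: list of blocks, each a list of k observations in hypothesized order."""
    k = len(data[0])
    rank_sums = [0.0] * k
    for block in data:
        for j, r in enumerate(average_ranks(block)):  # within-block ranks
            rank_sums[j] += r
    return sum((j + 1) * rank_sums[j] for j in range(k))

def aligned_rank_sums(data):
    """Subtract each block's mean, rank all aligned values jointly across
    blocks, and sum the ranks per treatment."""
    aligned = [x - sum(block) / len(block) for block in data for x in block]
    ranks = average_ranks(aligned)
    k = len(data[0])
    sums = [0.0] * k
    for idx, r in enumerate(ranks):
        sums[idx % k] += r
    return sums
```

With two blocks [1, 2, 3] and [2, 4, 6], both blocks are perfectly ordered, so L reaches its maximum of 28; the aligned rank sums are [3, 7, 11].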

A Query Randomizing Technique for breaking 'Filter Bubble'

  • Joo, Sangdon; Seo, Sukyung; Yoon, Youngmi
    • Journal of the Korea Society of Computer and Information / v.22 no.12 / pp.117-123 / 2017
  • A personalized search algorithm analyzes the user's IP address, cookies, log data, and search history to recommend the information the user wants. As a result, users become isolated within the information frame the algorithm recommends, a phenomenon called the 'filter bubble'. Most personalization data can be deleted or changed by the user, but data stored on the service provider's server is difficult to access. This study suggests a way to neutralize personalization by continually sending random query words, which confuses the data accumulated on the server by performing searches with words unrelated to the user. Using a personalized account as the experimental group, we searched once with each of 500 random query words and analyzed the change in URL ranks. To verify the effect, we created a new account as the control group. We then searched the same set of queries with both accounts, stored the resulting URLs, and scored the rank variation, weighting URLs ranked higher on the page more heavily than lower-ranked URLs. At the beginning of the experiment, the difference between the two accounts' scores was insignificant; as the experiment continued and more random query words accumulated on the server, the results showed a meaningful difference.
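The abstract says only that higher-ranked URLs are weighted more. The sketch below shows one hypothetical scoring rule with that property: each URL contributes its rank difference between the two accounts divided by its better (smaller) rank, and a URL missing from one list is placed just below that list's end. The function name and the weighting are assumptions.

```python
def rank_variation_score(ranking_a, ranking_b):
    """Compare two result lists (URLs, best first) for the same query.

    Hypothetical rule: sum over URLs of |rank_a - rank_b| / min(rank_a, rank_b),
    so movement near the top of the page weighs more than movement lower down.
    """
    pos_a = {url: i + 1 for i, url in enumerate(ranking_a)}
    pos_b = {url: i + 1 for i, url in enumerate(ranking_b)}
    missing_a = len(ranking_a) + 1  # rank assigned to URLs absent from list a
    missing_b = len(ranking_b) + 1
    score = 0.0
    for url in set(pos_a) | set(pos_b):
        ra = pos_a.get(url, missing_a)
        rb = pos_b.get(url, missing_b)
        score += abs(ra - rb) / min(ra, rb)
    return score
```

Under this rule, swapping the top two results scores 2.0, while swapping the bottom two of three scores only 1.0.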

The Effective Blog Search Algorithm based on the Structural Features in the Blogspace (블로그의 구조적 특성을 고려한 효율적인 블로그 검색 알고리즘)

  • Kim, Jung-Hoon; Yoon, Tae-Bok; Lee, Jee-Hyong
    • Journal of KIISE: Software and Applications / v.36 no.7 / pp.580-589 / 2009
  • Today, most web pages are being created in, or are evolving into, the blogspace. A blog entry (blog page) has features not found in traditional web pages, such as trackback links, bloggers' authority, tags, and comments. Traditional ranking algorithms are therefore not suited to evaluating blog entries, because they do not consider these blog-specific features. In this paper, we propose a new algorithm called "Blog-Rank", which ranks blog entries by calculating bloggers' reputation scores, trackback scores, and comment scores from the features of the entries. The algorithm is also applied to searching the blogspace for information related to users' queries. Experiments show that it finds much more relevant information than traditional ranking algorithms.
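Blog-Rank's exact formula is not given in the abstract. The sketch below only illustrates the general idea of combining the three scores it names; the BlogEntry fields, the weights, the log damping, and the max-normalization are all hypothetical choices, and query relevance is left out.

```python
import math
from dataclasses import dataclass

@dataclass
class BlogEntry:
    entry_id: str
    blogger_reputation: float  # assumed already scaled to [0, 1]
    trackbacks: int
    comments: int

def blog_rank(entries, w_rep=0.5, w_tb=0.3, w_cm=0.2):
    """Sort entries by a weighted sum of three scores (hypothetical weights).

    Trackback and comment counts are log-damped and divided by the largest
    value among the candidates, so each score lies in [0, 1].
    """
    max_tb = max((math.log1p(e.trackbacks) for e in entries), default=0.0) or 1.0
    max_cm = max((math.log1p(e.comments) for e in entries), default=0.0) or 1.0

    def score(e):
        return (w_rep * e.blogger_reputation
                + w_tb * math.log1p(e.trackbacks) / max_tb
                + w_cm * math.log1p(e.comments) / max_cm)

    return sorted(entries, key=score, reverse=True)
```

The log damping keeps a single heavily commented entry from dominating purely on volume; with these weights, an entry by a modestly reputed blogger with many trackbacks and comments can still outrank one by a highly reputed blogger with none.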