• Title/Summary/Keyword: Journal PageRank

Effective Web Crawling Orderings from Graph Search Techniques (그래프 탐색 기법을 이용한 효율적인 웹 크롤링 방법들)

  • Kim, Jin-Il; Kwon, Yoo-Jin; Kim, Jin-Wook; Kim, Sung-Ryul; Park, Kun-Soo
    • Journal of KIISE: Computer Systems and Theory, v.37 no.1, pp.27-34, 2010
  • Web crawlers are fundamental programs that iteratively download web pages by following their links, starting from a small set of initial URLs. Several web crawling orderings have previously been proposed to crawl popular web pages in preference to other pages, but some graph search techniques whose characteristics and efficient implementations have been studied in the graph theory community have not yet been applied to web crawling. In this paper, we consider various graph search techniques, including lexicographic breadth-first search, lexicographic depth-first search, and maximum cardinality search as well as the well-known breadth-first and depth-first searches, and then choose effective web crawling orderings that have linear time complexity and crawl popular pages early. In particular, for maximum cardinality search and lexicographic breadth-first search, whose implementations are non-trivial, we propose linear-time web crawling orderings by applying the partition refinement method. Experimental results show that maximum cardinality search has desirable properties in terms of both time complexity and the quality of the crawled pages.
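
As a rough illustration of the maximum cardinality search (MCS) ordering, the sketch below always downloads next the discovered page that has the most links from already-crawled pages. It assumes the link graph is available up front as a dict (`links` and `mcs_crawl_order` are hypothetical names; a real crawler learns out-links only after each download) and uses a binary heap with lazy deletion, so it runs in O(E log V) rather than the linear time the paper obtains with partition refinement.

```python
import heapq
from collections import defaultdict


def mcs_crawl_order(links, seeds):
    """Return the order in which pages would be crawled under MCS.

    links: dict mapping each page to the pages it links to (hypothetical
           input; a real crawler discovers these on download).
    seeds: initial URLs, all starting with zero crawled in-links.
    Ties are broken by discovery order (earliest heap entry first).
    """
    in_crawled = defaultdict(int)  # number of links from already-crawled pages
    heap = []                      # entries: (-in_crawled, tiebreak, page)
    tiebreak = 0
    for seed in seeds:
        heapq.heappush(heap, (0, tiebreak, seed))
        tiebreak += 1

    crawled, order = set(), []
    while heap:
        neg_count, _, page = heapq.heappop(heap)
        # Skip stale entries: page already crawled, or its count has since grown.
        if page in crawled or -neg_count != in_crawled[page]:
            continue
        crawled.add(page)
        order.append(page)
        for target in links.get(page, ()):
            if target not in crawled:
                in_crawled[target] += 1
                heapq.heappush(heap, (-in_crawled[target], tiebreak, target))
                tiebreak += 1
    return order


links = {"s": ["a", "b", "c"], "a": ["d"], "b": ["e"], "c": ["e"]}
print(mcs_crawl_order(links, ["s"]))  # ['s', 'a', 'b', 'c', 'e', 'd']
```

Breadth-first order would fetch `d` before `e`; MCS promotes `e` because two crawled pages already point to it, which is the "popular pages early" behavior the abstract describes.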

A Preliminary Study on the Co-author Network Analysis of Korean Library & Information Science Research Community (공저 네트워크 분석에 관한 기초연구 - 문헌정보학 분야 4개 학술지를 중심으로 -)

  • Lee, Soo-Sang
    • Journal of Korean Library and Information Science Society, v.41 no.2, pp.297-315, 2010
  • This study investigates various statistics and measures of the coauthorship network in the Korean LIS research community, such as coauthorship patterns, structural properties, cluster types, and centrality and impact analysis. These issues are addressed mainly through a social network analysis of articles published from 2000 to 2009 (10 years) in four major Korean library and information science journals. The coauthorship network was constructed, and four centrality measures, PageRank, and effect size were calculated. The results have three implications: (1) a Pareto's law phenomenon appears in article publication counts; (2) the top authors by publication count prefer coauthored publishing to sole-authored publishing; and (3) publication counts are significantly correlated with five of the network measures but not with power centrality.
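
The first two findings can be checked with simple counts over a list of papers. The sketch below is an illustration under stated assumptions, not the study's procedure: each paper is a hypothetical list of author names, `top_share` reports the fraction of all authorships held by the most prolific 20% of authors (a Pareto-style concentration would push this toward 0.8), and `coauthored_ratio` gives the share of an author's papers that have more than one author. All function names and the toy data are invented.

```python
from collections import Counter


def publication_counts(papers):
    """Count papers per author (each paper is a list of author names)."""
    return Counter(author for authors in papers for author in set(authors))


def top_share(counts, fraction=0.2):
    """Fraction of all authorships held by the top `fraction` of authors."""
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


def coauthored_ratio(papers, author):
    """Share of `author`'s papers that have at least one coauthor."""
    own = [set(p) for p in papers if author in p]
    return sum(1 for p in own if len(p) > 1) / len(own)


papers = [["A", "B"], ["A", "C"], ["A"], ["A", "B", "C"], ["D"], ["E"]]
counts = publication_counts(papers)
print(top_share(counts))              # 0.4
print(coauthored_ratio(papers, "A"))  # 0.75
```

Real bibliographic data would need author-name disambiguation before counting.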

A Distributed Vertex Rearrangement Algorithm for Compressing and Mining Big Graphs (대용량 그래프 압축과 마이닝을 위한 그래프 정점 재배치 분산 알고리즘)

  • Park, Namyong; Park, Chiwan; Kang, U
    • Journal of KIISE, v.43 no.10, pp.1131-1143, 2016
  • How can we effectively compress big graphs composed of billions of edges? By concentrating the nonzeros of the adjacency matrix through vertex rearrangement, we can compress big graphs more efficiently and also boost the performance of several graph mining algorithms such as PageRank. SlashBurn is a state-of-the-art vertex rearrangement method that processes real-world graphs effectively by exploiting the power-law characteristic of real-world networks. However, the original SlashBurn algorithm slows down noticeably on large-scale graphs, and because it is designed to run on a single machine, it cannot be used at all when a graph is too large to fit on one machine. In this paper, we propose a distributed SlashBurn algorithm to overcome these limitations. Distributed SlashBurn processes big graphs much faster than the original SlashBurn algorithm and scales well by performing the large-scale vertex rearrangement in a distributed fashion. In our experiments on real-world big graphs, the proposed distributed SlashBurn algorithm ran more than 45 times faster than its single-machine counterpart and processed graphs 16 times larger than the original method could.
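
For orientation, the core SlashBurn idea can be sketched in memory as below: repeatedly remove the `k` highest-degree hubs and give them the smallest IDs, push the vertices of the small "spoke" components that fall off to the end of the order, and continue on the giant connected component. This is a simplified single-machine illustration under our own assumptions (undirected adjacency dict, ties broken by vertex ID, spokes ordered by decreasing size, hypothetical function names); it is not the distributed algorithm proposed in the paper.

```python
def _components(nodes, adj):
    """Connected components of the subgraph induced by the set `nodes`."""
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            comp.append(v)
            for u in adj[v]:
                if u in nodes and u not in seen:
                    seen.add(u)
                    stack.append(u)
        comps.append(comp)
    return comps


def slashburn_order(adj, k=1):
    """Return vertices in a SlashBurn-style order (hubs first, spokes last)."""
    remaining = set(adj)
    front, back = [], []
    while len(remaining) > k:
        degree = {v: sum(1 for u in adj[v] if u in remaining) for v in remaining}
        hubs = sorted(remaining, key=lambda v: (-degree[v], v))[:k]
        front.extend(hubs)
        remaining -= set(hubs)
        comps = sorted(_components(remaining, adj), key=len, reverse=True)
        if not comps:
            break
        giant, spokes = comps[0], comps[1:]
        # Spokes cut off in earlier rounds end up later in the final order.
        back = [v for comp in spokes for v in sorted(comp)] + back
        remaining = set(giant)
    front.extend(sorted(remaining))
    return front + back


# A hub (0) with three leaves and a short tail 4-5.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0, 5], 5: [4]}
print(slashburn_order(adj, k=1))  # [0, 4, 5, 1, 2, 3]
```

After relabeling vertices in this order, nonzeros tend to gather in the hub rows and columns at the top-left and in small blocks along the diagonal, which is what makes block-based compression effective.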

Analysis of the different of Interest words between Korea and Vietnam using network theory - Focusing on smart city (네트워크 이론을 이용한 한국과 베트남의 관심어 차이 분석 - 스마트시티를 중심으로)

  • Jeong, Seong Yun; Kim, Nam Gon
    • Smart Media Journal, v.11 no.8, pp.73-83, 2022
  • To help new construction engineering companies with limited access to information successfully enter the overseas construction market, this study analyzes the keywords of interest in that market and how they differ from those in Korea. For this purpose, we collected 2,473 recent news article titles and major articles on smart cities, a topic of high interest in both Korea and Vietnam. Through network construction and topic modeling, we examined the connections among the words of interest, and we measured the influence of each word of interest in the network using PageRank centrality. The analysis showed high interest in smart city-related construction, cities, and digital topics in both countries, and the differences in words of interest between Korea and Vietnam were inferred. Finally, the limitations of this study and further research directions to address them are presented.
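
A minimal sketch of the PageRank-centrality step, assuming that words appearing in the same title are linked with a weight equal to the number of titles they share. The whitespace tokenizer, the damping factor of 0.85, the fixed iteration count, and the function names are assumptions for illustration; the abstract does not describe the study's preprocessing (for example, any Korean or Vietnamese morphological analysis).

```python
from itertools import combinations


def cooccurrence_graph(docs, stopwords=frozenset()):
    """Weighted undirected graph: words that share a document are linked."""
    adj = {}
    for doc in docs:
        words = sorted({w for w in doc.lower().split() if w not in stopwords})
        for w in words:
            adj.setdefault(w, {})
        for a, b in combinations(words, 2):
            adj[a][b] = adj[a].get(b, 0) + 1
            adj[b][a] = adj[b].get(a, 0) + 1
    return adj


def pagerank(adj, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; isolated words spread rank uniformly."""
    nodes = list(adj)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    strength = {v: sum(adj[v].values()) for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if strength[v] == 0)
        new = dict.fromkeys(nodes, (1 - damping + damping * dangling) / n)
        for v in nodes:
            for u, w in adj[v].items():
                new[u] += damping * rank[v] * w / strength[v]
        rank = new
    return rank


titles = ["smart city construction", "smart city digital", "digital city plan"]
ranks = pagerank(cooccurrence_graph(titles))
print(max(ranks, key=ranks.get))  # city
```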

Blog Search Method using User Relevance Feedback and Guru Estimation (사용자 적합성 피드백과 구루 평가 점수를 고려한 블로그 검색 방법)

  • Jeong, Kyung-Seok; Park, Hyuk-Ro
    • The KIPS Transactions: Part B, v.15B no.5, pp.487-492, 2008
  • Most Web search engines use ranking methods that take both the relevance and the importance of documents into consideration. The importance of a document denotes its degree of usefulness to general users. One of the most successful methods for estimating document importance has been the PageRank algorithm, which uses the hyperlink structure of the Web. In this paper, we propose a new importance estimation algorithm for the blog environment. The proposed method first calculates the importance of each document using users' bookmarks and click counts. Then, the guru point of a blogger is computed as the sum of the importance points of all documents he or she wrote. Finally, the guru points are reflected back into document ranking. Our experiments show that the proposed method achieves a higher correlation coefficient with the correct answers than traditional methods.
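
The three-step scoring can be sketched as follows. The abstract does not give the formulas, so the linear weights for bookmarks and clicks (`w_bookmark`, `w_click`), the normalization by the top guru score, the mixing weight `mix`, and the `Post` record are all hypothetical; only the structure (document importance, guru point as a per-blogger sum, guru point fed back into ranking) follows the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Post:
    doc_id: str
    blogger: str
    bookmarks: int
    clicks: int
    relevance: float  # query relevance from the base retrieval model


def rank_posts(posts, w_bookmark=1.0, w_click=0.1, mix=0.3):
    # Step 1: document importance from user feedback (hypothetical weights).
    importance = {p.doc_id: w_bookmark * p.bookmarks + w_click * p.clicks for p in posts}
    # Step 2: a blogger's guru point is the sum over the documents they wrote.
    guru = defaultdict(float)
    for p in posts:
        guru[p.blogger] += importance[p.doc_id]
    top = max(guru.values(), default=0.0) or 1.0

    # Step 3: blend query relevance with the normalized guru point.
    def score(p):
        return (1 - mix) * p.relevance + mix * guru[p.blogger] / top

    return sorted(posts, key=score, reverse=True)


posts = [
    Post("p1", "x", bookmarks=10, clicks=100, relevance=0.5),
    Post("p2", "y", bookmarks=0, clicks=5, relevance=0.6),
    Post("p3", "x", bookmarks=2, clicks=0, relevance=0.1),
]
print([p.doc_id for p in rank_posts(posts)])  # ['p1', 'p2', 'p3']
```

With `mix=0` the ranking falls back to pure relevance (`p2`, `p1`, `p3`); raising `mix` lets a well-bookmarked blogger's posts move up.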

Scar Wars: Preferences in Breast Surgery

  • Joyce, Cormac W; Murphy, Siun; Murphy, Stephen; Kelly, Jack L; Morrison, Colin M
    • Archives of Plastic Surgery, v.42 no.5, pp.596-600, 2015
  • Background: The uptake of breast reconstruction is ever increasing, with procedures ranging from implant-based reconstructions to complex free tissue transfer. Little emphasis is placed on scarring when counselling patients, yet scars remain a significant source of morbidity and litigation. The aim of this study was to examine the scarring preferences of men and women in breast oncoplastic and reconstructive surgery. Methods: Five hundred men and women at two large Irish centres were asked to fill out a four-page questionnaire. They were asked for their opinions on scarring after breast surgery and were also asked to rank the common scarring patterns in wide local excisions, oncoplastic procedures, and breast reconstructions, as well as donor sites. Results: Fifty-eight percent of those surveyed did not feel that scars were important after breast cancer surgery. Sixty-one percent said that their partners' opinion of scars was important. The most preferred wide local excision scar was the lower lateral quadrant scar, whilst the scars from the deep inferior epigastric artery perforator (DIEP) flap were most favoured. The superior gluteal artery perforator flap had the most preferred donor site while, surprisingly, the DIEP flap had the least favoured donor site. Conclusions: Scars are often overlooked when planning breast surgery, yet the extent and position of the scar need to be outlined to patients, and scarring should play an important role in selecting a breast reconstruction option. This study highlights the need for further evaluation of patients' opinions regarding scar patterns.

Finding Influential Users in the SNS Using Interaction Concept : Focusing on the Blogosphere with Continuous Referencing Relationships (상호작용성에 의한 SNS 영향유저 선정에 관한 연구 : 연속적인 참조관계가 있는 블로고스피어를 중심으로)

  • Park, Hyunjung; Rho, Sangkyu
    • The Journal of Society for e-Business Studies, v.17 no.4, pp.69-93, 2012
  • Various influence-related relationships in Social Network Services (SNS), whether among users, among posts, or between users and posts, can be expressed as links. This research evaluates the influence of specific users or posts by analyzing the link structure of the relevant social network graphs in order to identify influential users. Rather than the voting notion of PageRank or HITS, we applied the concept of mutual interaction, originally proposed for ranking semantic web resources, to the blogosphere, one of the early forms of SNS. Through many experiments with network models in which the performance and validity of each alternative approach can be analyzed, we showed the applicability and strengths of our approach. Tuning the link weights of these network models enabled us to control experimental errors arising from differences in link weights and to compare how easily the alternatives can be implemented. An additional example of how to incorporate the content scores of commercial or spam posts into the graph-based method is also presented on a small network model. As a starting point for research on identifying influential users in SNS, this study differs from previous research in the following respects. First, various influence-related properties that are deemed important but have been disregarded, such as scraping, commenting, subscribing to RSS feeds, and trusting friends, can be considered simultaneously. Second, the framework reflects the general phenomenon whereby objects that interact with more influential objects increase their own influence. Third, treating the extent to which a blogger causes other bloggers to act after him or her as the most important factor of influence, we modeled sequential referencing relationships from a viewpoint different from that of PageRank or HITS (Hypertext Induced Topic Selection).
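
To make the contrast with one-way voting concrete, the sketch below scores users and posts so that an object gains influence from every object it interacts with, in both directions, weighted by interaction type. This is our own simplified model, not the method from the paper: the `LINK_WEIGHTS` values and `interaction_scores` are hypothetical, and adding each node's current score to its update (a shifted power iteration) is an assumption made so the iteration also converges on bipartite user-post graphs.

```python
from collections import defaultdict

# Hypothetical per-type link weights; the study tunes its own.
LINK_WEIGHTS = {"wrote": 1.0, "comment": 2.0, "scrap": 3.0, "subscribe": 1.5, "trust": 1.0}


def interaction_scores(edges, iterations=100):
    """edges: (a, b, kind) interactions among users and posts."""
    adj = defaultdict(lambda: defaultdict(float))
    for a, b, kind in edges:
        w = LINK_WEIGHTS[kind]
        adj[a][b] += w  # both endpoints benefit: a mutual interaction,
        adj[b][a] += w  # not a one-way vote
    nodes = sorted(adj)
    score = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Shifted update (own score + weighted neighbour scores) avoids the
        # oscillation plain power iteration shows on bipartite graphs.
        new = {v: score[v] + sum(w * score[u] for u, w in adj[v].items()) for v in nodes}
        top = max(new.values())
        score = {v: s / top for v, s in new.items()}
    return score


edges = [
    ("alice", "post1", "wrote"),
    ("bob", "post1", "comment"),
    ("carol", "post1", "scrap"),
    ("dave", "post2", "wrote"),
    ("alice", "bob", "trust"),
]
scores = interaction_scores(edges)
print(max(scores, key=scores.get))  # post1
```

Scores are rescaled so the most influential object gets 1.0; objects in a small, weakly interacting cluster (here `dave` and `post2`) fade toward zero.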

Comparisons of Popularity- and Expert-Based News Recommendations: Similarities and Importance (인기도 기반의 온라인 추천 뉴스 기사와 전문 편집인 기반의 지면 뉴스 기사의 유사성과 중요도 비교)

  • Suh, Kil-Soo; Lee, Seongwon; Suh, Eung-Kyo; Kang, Hyebin; Lee, Seungwon; Lee, Un-Kon
    • Asia Pacific Journal of Information Systems, v.24 no.2, pp.191-210, 2014
  • As Internet-connected mobile devices have spread and networking has become possible anytime and anywhere, the Internet has become central to the dissemination and consumption of news. Accordingly, the ways news is gathered, disseminated, and consumed have changed greatly. In traditional news media such as magazines and newspapers, expert editors determined which events were worth deploying their staff or freelancers to cover and which stories from newswires or other sources would be printed. Furthermore, they determined how these stories would be displayed in their publications in terms of page placement, space allocation, type sizes, photographs, and other graphic elements. In turn, readers (news consumers) judged the importance of news not only by its subject and content but also through subsidiary information such as its location and how it was displayed. Their judgments reflected their acceptance of the assumption that these expert editors had the knowledge and ability not only to serve as gatekeepers in determining what news was valuable and important but also to rank its value and importance. As such, news assembled, dispensed, and consumed in this manner can be called expert-based recommended news. However, in the era of Internet news, the role of expert editors as gatekeepers has been greatly diminished. Many Internet news sites offer a huge volume of news on diverse topics from many media companies, in many cases eliminating the gatekeeping role of expert editors. One result has been to turn news users from passive receptacles into active seekers of news that reflects their interests or tastes. To address information overload and make news users' searches more efficient, Internet news sites have introduced numerous recommendation techniques. Recommendations based on popularity are among the most frequently used of these techniques. 
This popularity-based approach shows a list of the news items that have been read and shared by many people, based on user behavior such as clicks, evaluations, and sharing. The "most-viewed list," "most-replied list," and "real-time issue" features found on news sites belong to this approach. Given that collective intelligence is the premise of these popularity-based recommendations, popularity-based news recommendations would be considered highly important, because stories that have been read and shared by many people are presumably more likely to be better than those preferred by only a few. However, these recommendations may reflect a popularity bias, because stories judged likely to be popular have been placed where they will be most noticeable. As a result, such stories are more likely to be continuously exposed and included in popularity-based recommended news lists. Popular news stories are not necessarily those that are most important to readers. Given that many people use popularity-based recommended news and that this approach greatly affects patterns of news use, reviewing whether popularity-based news recommendations actually reflect important news is indispensable. Therefore, in this study, the popularity-based news recommendations of an Internet news portal were compared with the top placements of news in printed newspapers, and news users' judgments of which stories were personally and socially important were analyzed. The study was conducted in two stages. In the first stage, content analysis was used to compare the popularity-based news recommendations of an Internet news site with the expert-based news recommendations of printed newspapers. Five days of news stories were collected. 
The "most-viewed list" of the Naver portal site was used as the popularity-based recommendations, and the expert-based recommendations were represented by the top news stories from five major daily newspapers (the Chosun Ilbo, the JoongAng Ilbo, the Dong-A Daily News, the Hankyoreh Shinmun, and the Kyunghyang Shinmun). In the second stage, along with the news stories collected in the first stage, some Internet news stories and some printed-newspaper stories that the two channels did not have in common were randomly extracted and used in online questionnaire surveys that asked respondents to rate the importance of the selected stories. According to our analysis, only 10.81% of the popularity-based news recommendations were similar in content to the expert-based news judgments. Therefore, the content of popularity-based news recommendations appears to be quite different from that of expert-based recommendations. The differences in importance between these two groups of news stories were analyzed, and the results indicated that whereas the two groups did not differ significantly in the personal importance of their stories, the expert-based recommendations ranked higher in social importance. This study contributes to theory by examining popularity-based news recommendations from the two theoretical viewpoints of collective intelligence and popularity bias and by using both qualitative (content analysis) and quantitative (questionnaire) methods. It also sheds light on the differing roles of media channels that fulfill an agenda-setting function and Internet news sites that treat news from a market viewpoint.

Research Trend Analysis Using Bibliographic Information and Citations of Cloud Computing Articles: Application of Social Network Analysis (클라우드 컴퓨팅 관련 논문의 서지정보 및 인용정보를 활용한 연구 동향 분석: 사회 네트워크 분석의 활용)

  • Kim, Dongsung; Kim, Jongwoo
    • Journal of Intelligence and Information Systems, v.20 no.1, pp.195-211, 2014
  • Cloud computing services provide IT resources as services on demand. This is considered a key concept that will lead to a shift from an ownership-based paradigm to a new pay-for-use paradigm, which can reduce the fixed cost of IT resources and improve flexibility and scalability. As IT services, cloud services have evolved from earlier related computing concepts such as network computing, utility computing, server-based computing, and grid computing, so research into cloud computing is closely related to and combined with various relevant computing research areas. To identify promising research issues and topics in cloud computing, it is necessary to understand its research trends more comprehensively. In this study, we collect bibliographic and citation information for cloud computing-related research papers published in major international journals from 1994 to 2012, and we analyze macroscopic trends and network changes in the citation relationships among papers and in the co-occurrence relationships of keywords using social network analysis measures. Through this analysis, we can identify the relationships and connections among research topics in cloud computing-related areas and highlight potential new research topics. In addition, we visualize dynamic changes in cloud computing research topics using a proposed cloud computing "research trend map." A research trend map visualizes the positions of research topics in a two-dimensional space, using keyword frequency (X-axis) and the rate of increase in keyword degree centrality (Y-axis) as its two dimensions. Based on the values of these two dimensions, the space of a research trend map is divided into four areas: maturation, growth, promising, and decline. 
An area with high keyword frequency but a low rate of increase in degree centrality is defined as a mature technology area; an area where both keyword frequency and the rate of increase in degree centrality are high is defined as a growth technology area; an area where keyword frequency is low but the rate of increase in degree centrality is high is defined as a promising technology area; and an area where both keyword frequency and the rate of increase in degree centrality are low is defined as a declining technology area. With this method, cloud computing research trend maps make it possible to easily grasp the main research trends in cloud computing and to explain the evolution of research topics. According to the analysis of citation relationships, research papers on security, distributed processing, and optical networking for cloud computing rank highest by the PageRank measure. From the analysis of keywords in research papers, cloud computing and grid computing showed high centrality in 2009, and keywords dealing with main elemental technologies such as data outsourcing, error detection methods, and infrastructure construction showed high centrality in 2010-2011. In 2012, security, virtualization, and resource management showed high centrality. Moreover, interest in the technical issues of cloud computing was found to increase gradually. The annual cloud computing research trend maps verified that security is located in the promising area, virtualization has moved from the promising area to the growth area, and grid computing and distributed systems have moved to the declining area. The results indicate that distributed systems and grid computing received a lot of attention as similar computing paradigms in the early stage of cloud computing research. 
The early stage of cloud computing research was a period focused on understanding and investigating cloud computing as an emergent technology and linking it to relevant established computing concepts. After the early stage, security and virtualization technologies became the main issues in cloud computing, as reflected in the movement of security and virtualization from the promising area to the growth area in the research trend maps. Moreover, this study revealed that current cloud computing research has rapidly shifted its focus from technical issues to application issues, such as SLAs (Service Level Agreements).
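
The four-area rule of the research trend map reduces to two threshold comparisons per keyword. The sketch below assumes the split points are the mean keyword frequency and the mean growth rate (the abstract does not state where the axes are divided), defines the growth rate of degree centrality as the relative change between two periods, and uses made-up keyword names, values, and function names.

```python
from statistics import mean


def centrality_growth(prev, curr):
    """Relative change in degree centrality; skips keywords absent (or zero) before."""
    return {kw: (curr[kw] - prev[kw]) / prev[kw] for kw in curr if prev.get(kw)}


def trend_areas(freq, growth, freq_cut=None, growth_cut=None):
    """Assign each keyword to maturation / growth / promising / decline."""
    freq_cut = mean(freq.values()) if freq_cut is None else freq_cut
    growth_cut = mean(growth.values()) if growth_cut is None else growth_cut
    areas = {}
    for kw in growth:
        high_freq = freq[kw] >= freq_cut
        high_growth = growth[kw] >= growth_cut
        if high_freq:
            areas[kw] = "growth" if high_growth else "maturation"
        else:
            areas[kw] = "promising" if high_growth else "decline"
    return areas


freq = {"topic_a": 30, "topic_b": 25, "topic_c": 5, "topic_d": 3}
growth = {"topic_a": 0.6, "topic_b": -0.2, "topic_c": 0.8, "topic_d": -0.1}
print(trend_areas(freq, growth))
# {'topic_a': 'growth', 'topic_b': 'maturation', 'topic_c': 'promising', 'topic_d': 'decline'}
```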

An Exploratory Study on the Competition Patterns Between Internet Sites in Korea (한국 인터넷사이트들의 산업별 경쟁유형에 대한 탐색적 연구)

  • Park, Yoonseo; Kim, Yongsik
    • Asia Marketing Journal, v.12 no.4, pp.79-111, 2011
  • The digital economy has grown rapidly, and the new business area called 'Internet business' has expanded dramatically over time. However, in Internet business, the market shares of individual companies seem to fluctuate sharply. Thus, marketing managers who operate Internet sites closely observe the competitive structure of the Internet business market and carefully analyze competitors' behavior in order to achieve their business goals. Newly created Internet businesses may differ from offline ones in management style, because their business circumstances are entirely different from those of existing offline businesses. Thus, much research is needed to identify the features of Internet business and how the management style of Internet business companies should change. Most marketing literature related to Internet business has focused on individual business markets; in particular, many researchers have studied Internet portal sites and Internet shopping mall sites, the most common forms of Internet business. In contrast, this study focuses on the entire Internet business industry to understand the competitive circumstances of the online market. This approach makes it possible not only to take a broader view of the overall e-business industry but also to understand the differences in competitive structure among Internet business markets. We used time-series data of consumers' Internet connection rates as the basic data for identifying competition patterns in Internet business markets. Specifically, the data were obtained from 'Fian', an Internet ranking site. Its ranking data are based on the web-surfing records of a pre-selected sample group, and double-counting of page views is controlled by checking for identical IP addresses. 
The ranking site offers several kinds of data that are very useful for comparing and analyzing competing sites. The Fian site divides Internet business into 34 areas and provides the daily market shares of the top five sites in each category. We collected daily market share data for Internet sites in each area from April 22, 2008 to August 5, 2008; because some data errors were found, data for 30 business areas were finally used after data cleaning. This study performed several empirical analyses focusing on the market share of each site to understand competition among sites in Korean Internet business. To identify business fields with similar competitive structures in a more statistically precise way, we applied cluster analysis to the data. The results are as follows. First, the leading sites in each area were classified into three groups based on the averages and standard deviations of their daily market shares. The first group includes the sites with the lowest market shares, which offer greater convenience to consumers by providing Internet sites as complementary services to existing offline services. The second group includes sites with medium market shares, whose users are limited to specific small groups. The third group includes sites with the highest market shares, which usually require advance online registration and make it difficult for users to switch to another site. Second, we analyzed the second-place sites in each business area, because doing so may help us understand the competitive power of the strongest competitor against the leading site. The second-place sites in each business area were classified into four groups based on the averages and standard deviations of their daily market shares. 
The four groups are sites showing consistent inferiority to the leading sites, sites with relatively high volatility and medium shares, sites with relatively low volatility and medium shares, and sites with relatively low volatility and high shares whose gaps from the leading sites are small. Except in the 'web agency' area, these second-place sites show relatively stable shares, with standard deviations below 0.1 point. Third, we also classified the types of relative strength between the leading sites and the second-place sites by applying cluster analysis to the gaps in market share between the two. These were also classified into four groups: sites with the lowest gaps, even though their standard deviations vary; sites with below-average gaps; sites with above-average gaps; and sites with relatively high gaps and low volatility. We also found that while areas with larger gaps usually have smaller standard deviations, areas with very small differences between the first- and second-place sites have a wider range of standard deviations. The practical and theoretical implications of this study are as follows. First, the results may provide current market participants with useful information for understanding the competitive circumstances of the market and building effective new business strategies for market success. They may also help potential new entrants find new business areas and set up successful competitive strategies. Second, they may help Internet marketing researchers take a macro view of the overall Internet market, making it possible to begin new studies of the overall Internet market beyond studies of individual Internet markets.
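
The grouping of sites by the average and standard deviation of their daily shares can be approximated as follows. The abstract says only that cluster analysis was applied, so plain k-means on the (mean, standard deviation) pair is our assumption, as are the deterministic initialization, the function names, and the toy share series. In practice the two features would usually be standardized first, since the mean dominates the distance here.

```python
from statistics import mean, pstdev


def share_profile(daily_shares):
    """Map each site to (mean, population std) of its daily market share."""
    return {site: (mean(s), pstdev(s)) for site, s in daily_shares.items()}


def kmeans(points, k, iterations=100):
    """Plain k-means on 2-D points; the first k sites (sorted) seed the centers."""
    sites = sorted(points)
    centers = [points[s] for s in sites[:k]]
    labels = {}

    def nearest(p):
        return min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)

    for _ in range(iterations):
        new_labels = {s: nearest(points[s]) for s in sites}
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(k):
            members = [points[s] for s in sites if labels[s] == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = (mean(p[0] for p in members), mean(p[1] for p in members))
    return labels


daily = {
    "site_a": [0.60, 0.62, 0.61],
    "site_b": [0.58, 0.60, 0.59],
    "site_c": [0.10, 0.12, 0.11],
    "site_d": [0.09, 0.10, 0.11],
}
print(kmeans(share_profile(daily), k=2))
```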
