Search | Korea Science

Kang, Minseo;Kim, Jaesung;Lee, Jaegil
- Journal of KIISE
- /
- v.43 no.6
- /
- pp.621-626
- /
- 2016
Recursive query algorithm is used in many social network services, e.g., reachability queries in social networks. Recently, the size of social network data has increased as social network services evolve. As a result, it is almost impossible to use the recursive query algorithm on a single machine. In this paper, we implement recursive query on two popular in-memory distributed platforms, Spark and Twister, to solve this problem. We evaluate the performance of two implementations using 50 machines on Amazon EC2, and real-world data sets: LiveJournal and ClueWeb. The result shows that recursive query algorithm shows better performance on Spark for the Livejournal input data set with relatively high average degree, but smaller vertices. However, recursive query on Twister is superior to Spark for the ClueWeb input data set with relatively low average degree, but many vertices.
https://doi.org/10.5626/JOK.2016.43.6.621 인용 KSCI

Kwon, Beom;Jeon, Donghyun;Kim, Jongyoo;Kim, Junghwan;Kim, Doyoung;Song, Hyewon;Lee, Sanghoon
- The Journal of Korean Institute of Communications and Information Sciences
- /
- v.41 no.11
- /
- pp.1490-1501
- /
- 2016
In this paper, we propose an image-based indoor localization system using parallel distributed computing. In order to reduce computation time for indoor localization, an scale invariant feature transform (SIFT) algorithm is performed in parallel by using Apache Spark. Toward this goal, we propose a novel image processing interface of Apache Spark. The experimental results show that the speed of the proposed system is about 3.6 times better than that of the conventional system.
https://doi.org/10.7840/kics.2016.41.11.1490 인용 PDF KSCI

Park, Ki-Sung;Choi, Jae-Hyun;Kim, Jong-Bae;Park, Jae-Won
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.21 no.1
- /
- pp.17-28
- /
- 2017
Recently, a study on data has been actively conducted because the value of the data has become more useful. Web crawler that is program of data collection recently spotlighted because it can take advantage of the various fields. Web crawler can be defined as a tool to analyze the web pages and collects the URL by traversing the web server in an automated manner. For the treatment of Big-data, distributed Web crawler is widely used which is based on the Hadoop MapReduce. But, it is difficult to use and has constraints on the performance. Apache spark that is the In-memory computing platform is an alternative to MapReduce. The search engine which is one of the main purposes of web crawler displays the information you search by keyword gathered by web crawler. If search engines implement a spark-based web crawler instead of traditional MapReduce-based web crawler, it would be a more rapid data collection.
https://doi.org/10.6109/jkiice.2017.21.1.17 인용 PDF KSCI