Search | Korea Science

Son, Siwoon;Kim, Dasol;Lee, Sujeong;Gil, Myeong-Seon;Moon, Yang-Sae
- Proceedings of the Korea Information Processing Society Conference
- /
- 2016.10a
- /
- pp.47-49
- /
- 2016
최근 SNS(social networking service)의 사용이 급증함에 따라 SNS에서 발생하는 데이터의 분석이 활발해졌다. 하지만 SNS 데이터는 빠르게 생성되며 정형화 되어 있지 않은 빅데이터이기 때문에 그대로 수집할 경우 분석하기가 어렵다. 본 논문은 분산 스트리밍 처리 기술인 Storm을 사용하여 트위터에서 실시간으로 발생하는 데이터를 수집 및 집계하고, 태그 클라우드를 사용하여 집계 결과를 동적으로 시각화하고자 한다. 또한 사용자가 쉽게 키워드를 입력하고 시각화 결과를 실시간으로 확인할 수 있도록 웹 인터페이스를 구현한다. 그리고 결과를 통해 태그 클라우드의 결과가 시간에 따라 바르게 시각화되었는지 확인한다. 본 논문은 빠르게 발생하는 SNS 데이터로부터 각 키워드와 관련된 정보를 시각화하여 각 사용자에게 제공할 수 있는 우수한 결과가 사료된다.
https://doi.org/10.3745/PKIPS.y2016m10a.47 인용 PDF

Ruchi Singh;Jongwook Woo
- Asia pacific journal of information systems
- /
- v.29 no.1
- /
- pp.35-49
- /
- 2019
The paper attempts to document the application of relevant Machine Learning (ML) models on Yelp (a crowd-sourced local business review and social networking site) dataset to analyze, predict and recommend business. Strategically using two cloud platforms to minimize the effort and time required for this project. Seven machine learning algorithms in Azure ML of which four algorithms are implemented in Databricks Spark ML. The analyzed Yelp business dataset contained 70 business attributes for more than 350,000 registered business. Additionally, review tips and likes from 500,000 users have been processed for the project. A Recommendation Model is built to provide Yelp users with recommendations for business categories based on their previous business ratings, as well as the business ratings of other users. Classification Model is implemented to predict the popularity of the business as defining the popular business to have stars greater than 3 and unpopular business to have stars less than 3. Text Analysis model is developed by comparing two algorithms, uni-gram feature extraction and n-feature extraction in Azure ML studio and logistic regression model in Spark. Comparative conclusions have been made related to efficiency of Spark ML and Azure ML for these models.
https://doi.org/10.14329/apjis.2019.29.1.35 인용 PDF

Lee, Jaemyoun;Kang, Kyungtae
- KIPS Transactions on Computer and Communication Systems
- /
- v.3 no.11
- /
- pp.409-418
- /
- 2014
The amount of data located in storage servers has dramatically increased with the growth in cloud and social networking services. Storage systems with very large capacities may suffer from poor reliability and long latency, problems which can be addressed by the use of a hybrid disk, in which mechanical and flash memory storage are combined. The Linux-based SSD(solid-state disk) uses a caching technique based on the DM-cache utility. We assess the limitations of DM-cache by evaluating its performance in diverse environments, and identify problems with the caching policy that it operates in response to various commands. This policy is effective in reducing latency when Linux is running in native mode; but when Linux is installed as a guest operating systems on a virtual machine, the overhead incurred by caching actually reduces performance.
https://doi.org/10.3745/KTCCS.2014.3.11.409 인용 PDF KSCI