• Title/Abstract/Keywords: Apache

Search Result 360, Processing Time 0.024 seconds

Design and Implementation of Collaborative Filtering Application System using Apache Mahout -Focusing on Movie Recommendation System-

  • Lee, Jun-Ho;Joo, Kyung-Soo
    • Journal of the Korea Society of Computer and Information / v.22 no.7 / pp.125-131 / 2017
  • In recent years the volume of available information has grown rapidly, making it hard for users to find the information that suits them. Recommendation systems are one way to help individuals make decisions amid so much information. Among the many recommendation methods, collaborative filtering is a representative one. In this paper, we design and implement a movie recommendation system based on the user-based collaborative filtering of Apache Mahout. The Pearson correlation coefficient is used to measure the similarity between users. For performance evaluation, we measure precision and recall on the MovieLens 100k dataset.
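
The two measures named in this abstract can be illustrated with a short sketch. The code below is plain Python rather than the Apache Mahout API, and the function names, the dictionary-based rating format, and the sample ratings are all assumptions made for illustration; it is not the authors' implementation.

```python
from math import sqrt


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated.

    Each argument is a dict mapping item id -> rating (hypothetical format).
    Returns 0.0 when fewer than two items overlap or either user has no variance.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against the relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up ratings for three users (illustrative only).
alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 2, "m3": 0}
carol = {"m1": 1, "m2": 3, "m3": 5}

sim_alice_bob = pearson_similarity(alice, bob)
sim_alice_carol = pearson_similarity(alice, carol)
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```

In a user-based recommender, similarities of this kind would typically be used to pick a neighborhood of like-minded users whose ratings drive the predictions; that step is omitted here.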

A performance comparison for Apache Spark platform on environment of limited memory

  • Song, Jun-Seok;Kim, Sang-Young;Lee, Jung-June;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2016.01a / pp.67-68 / 2016
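  • As systems built on big data have come into active use in many fields, a variety of distributed system platforms have emerged to compensate for the technical shortcomings of Hadoop, the representative big data storage and processing platform. Among them, Apache Spark is an open-source distributed data processing platform that supports in-memory processing to address the Hadoop platform's slow speed and so handles large volumes of data efficiently. However, because Apache Spark jobs depend on memory, overall job performance drops sharply in a memory-constrained environment. In this paper, we compare Apache Spark performance across memory capacities to identify the memory capacity needed to run Apache Spark properly.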

Real-time Watermarking Method for Streaming Video Data (Real-Time Large-Scale Video Streaming Technique Using Apache Kafka)

  • Yeon-Jun Yoo;Seok-Min Hong;Yong-Tae Shin
    • Proceedings of the Korea Information Processing Society Conference / 2024.05a / pp.556-558 / 2024
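  • Today, cloud computing is widely used for video and live sports events such as FIFA, WTA, F1, and MLB. According to DataM, the video streaming platform market is forecast to grow from 54.5 billion dollars to 2,523 dollars [sic]. Existing real-time streaming methods suffer performance degradation when the number of streamed videos or the number of streaming users increases. This paper proposes a large-scale video streaming technique using an Apache Kafka server. Collecting network data through an Apache Kafka server makes it possible to process large-scale data and provides data reliability and real-time processing, which makes it suitable for online video streaming. As a result, an appropriate video quality can be chosen when selecting video quality. In future work, the proposed technique will be validated in practice with more data and experiments.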

The SOFA Score to Evaluate Organ Failure and Prognosis in the Intensive Care Unit Patients

  • Kim, Su Ho;Lee, Myung Goo;Park, Sang Myeon;Park, Young Bum;Jang, Seung Hun;Kim, Cheol Hong;Jeon, Man Jo;Shin, Tae Rim;Eom, Kwang Seok;Hyun, In-Gyu;Jung, Ki-Suck;Lee, Seung-Joon
    • Tuberculosis and Respiratory Diseases / v.57 no.4 / pp.329-335 / 2004
  • Background: The Sequential Organ Failure Assessment (SOFA) score can help to assess organ failure over time and is useful for evaluating morbidity. The aim of this study is to evaluate the performance of the SOFA score as a descriptor of multiple organ failure in critically ill patients in a local unit hospital, and to compare it with the APACHE III scoring system. Methods: This study was carried out prospectively. A total of ninety-one patients admitted to the medical intensive care unit (ICU) of Chuncheon Sacred Heart Hospital from May 1 through June 30, 2000 were included. We excluded patients who stayed in the ICU for less than 2 days following a scheduled procedure, patients admitted for ECG monitoring, patients from other departments, and patients transferred to another hospital. The SOFA score and APACHE III score were calculated on admission and then every 24 hours until ICU discharge. Results: The ICU mortality rate was 20%. The non-survivors had a higher SOFA score within 24 hours after admission. The number of organ failures was associated with increased mortality. The evaluation of a subgroup of 74 patients who stayed in the ICU for at least 48 hours showed that survivors and non-survivors followed different courses. In this subgroup, the total SOFA score increased in 81% of the non-survivors but in only 21% of the survivors. Conversely, the total SOFA score decreased in 48% of the survivors compared with 6% of the non-survivors. The non-survivors also had a higher APACHE III score within 24 hours, and there was a correlation between the SOFA score and the APACHE III score. Conclusion: The SOFA score is a simple but effective method to assess organ failure and to predict mortality in critically ill patients. Regular and repeated scoring enables a patient's condition and clinical course to be monitored and better understood. The SOFA score correlates well with the APACHE III score.

SPQUSAR: A Large-Scale Qualitative Spatial Reasoner Using Apache Spark

  • Kim, Jongwhan;Kim, Jonghoon;Kim, Incheol
    • KIISE Transactions on Computing Practices / v.21 no.12 / pp.774-779 / 2015
  • In this paper, we present the design and implementation of a large-scale qualitative spatial reasoner using Apache Spark, an in-memory high speed cluster computing environment, which is effective for sequencing and iterating component reasoning jobs. The proposed reasoner can not only check the integrity of a large-scale spatial knowledge base representing topological and directional relationships between spatial objects, but also expand the given knowledge base by deriving new facts in highly efficient ways. In general, qualitative reasoning on topological and directional relationships between spatial objects includes a number of composition operations on every possible pair of disjunctive relations. The proposed reasoner enhances computational efficiency by determining the minimal set of disjunctive relations for spatial reasoning and then reducing the size of the composition table to include only that set. Additionally, in order to improve performance, the proposed reasoner is designed to minimize disk I/Os during distributed reasoning jobs, which are performed on a Hadoop cluster system. In experiments with both artificial and real spatial knowledge bases, the proposed Spark-based spatial reasoner showed higher performance than the existing MapReduce-based one.

A General Distributed Deep Learning Platform: A Review of Apache SINGA

  • Lee, Chonho;Wang, Wei;Zhang, Meihui;Ooi, Beng Chin
    • Communications of the Korean Institute of Information Scientists and Engineers / v.34 no.3 / pp.31-34 / 2016
  • This article reviews Apache SINGA, a general distributed deep learning (DL) platform. Its system components and architecture are presented, along with how to configure and run SINGA for different types of distributed training using model/data partitioning. In addition, several of its features and its performance are compared with those of other popular DL tools.

Big Data Astronomy: Let's "PySpark" the Universe (Big Data Astronomy: Analyzing Astronomical Data with PySpark)

  • Hong, Sungryong
    • The Bulletin of The Korean Astronomical Society / v.43 no.1 / pp.63.1-63.1 / 2018
  • The modern large-scale surveys and state-of-the-art cosmological simulations produce various kinds of big data composed of millions and billions of galaxies. Inevitably, we need to adopt modern Big Data platforms to properly handle such large-scale data sets. In my talk, I will briefly introduce the de facto standard of modern Big Data platform, Apache Spark, and present some examples to demonstrate how Apache Spark can be utilized for solving data-driven astronomical problems.

Framework Implementation of Image-Based Indoor Localization System Using Parallel Distributed Computing

  • Kwon, Beom;Jeon, Donghyun;Kim, Jongyoo;Kim, Junghwan;Kim, Doyoung;Song, Hyewon;Lee, Sanghoon
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.11 / pp.1490-1501 / 2016
  • In this paper, we propose an image-based indoor localization system using parallel distributed computing. To reduce the computation time for indoor localization, a scale-invariant feature transform (SIFT) algorithm is performed in parallel using Apache Spark. Toward this goal, we propose a novel image processing interface for Apache Spark. The experimental results show that the proposed system is about 3.6 times faster than the conventional system.

Clinical Aspects and Prognostic Factors Of Small Bowel Perforation After Blunt Abdominal Trauma

  • Kim, Ji-Won;Kwak, Seung-Su;Park, Mun-Ki;Koo, Yong-Pyeong
    • Journal of Trauma and Injury / v.24 no.2 / pp.82-88 / 2011
  • Background: The incidence of abdominal trauma with intra-abdominal organ injury or bowel rupture is increasing. Articles on the diagnosis, symptoms, and treatment of small bowel perforation due to blunt trauma have been published, but few reports address the relationship of mortality and morbidity to clinical prognostic factors. The purposes of this study are to evaluate the morbidity and mortality of patients with small bowel perforation after blunt abdominal trauma on the basis of clinical examination and to analyze factors associated with the prognosis for blunt abdominal trauma with small bowel perforation. Methods: The clinical data on patients with small bowel perforation due to blunt trauma who underwent emergency surgery from January 1994 to December 2009 were retrospectively analyzed. The correlation of each prognostic factor with morbidity and mortality, and the relationships among prognostic factors, were analyzed. Results: A total of 83 patients met the inclusion criteria; 81.9% were male. The mean age was 45.6 years. The mean APACHE II score was 5.75. The mean time interval between injury and surgery was 395.9 minutes. The mean surgery time was 111.1 minutes. Forty-seven patients had surgery for ileal perforations, and primary closure was done for 51 patients. The mean admission period was 15.3 days, and the mean fasting time was 4.5 days. There were 6 deaths (7.2%), and 25 patients suffered from complications. Conclusion: The patient's age and the APACHE II score on admission were important prognostic factors that affected a patient's progress. In particular, this study shows that the APACHE II score had an effect on the operation time, admission period, treatment period, fasting time, mortality rate, and complication rate.

Anomaly Detection Technique of Log Data Using Hadoop Ecosystem

  • Son, Siwoon;Gil, Myeong-Seon;Moon, Yang-Sae
    • KIISE Transactions on Computing Practices / v.23 no.2 / pp.128-133 / 2017
  • In recent years, the number of systems for analyzing large volumes of data has been increasing. Hadoop, a representative big data system, stores and processes large data in a distributed environment of multiple servers, where system-resource management is very important. The authors attempted to detect anomalies in the rapidly changing log data collected from multiple servers using simple but efficient anomaly-detection techniques. Accordingly, an Apache Hive storage architecture was designed to store the log data collected from the multiple servers in the Hadoop ecosystem. In addition, three anomaly-detection techniques were designed based on the moving-average and 3-sigma concepts. It was finally confirmed that all three techniques detected the abnormal intervals correctly, while the weighted anomaly-detection technique was more precise than the basic techniques. These results show that simple techniques offer an excellent approach to detecting log-data anomalies in the Hadoop ecosystem.
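
As a rough illustration of how a moving average and the 3-sigma rule can be combined, the sketch below flags a value when it lies more than three standard deviations from the mean of the preceding window. It is one basic variant in plain Python, not a reconstruction of any of the three techniques in the paper; the function name, the window size, and the sample series are assumptions.

```python
from statistics import mean, pstdev


def detect_anomalies(values, window=5, k=3.0):
    """Return indices whose value deviates more than k standard deviations
    from the moving average of the previous `window` values.

    The first `window` points are never flagged because they lack history,
    and a window with zero variance flags nothing.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)          # moving average of the trailing window
        sigma = pstdev(history)     # population standard deviation of the window
        if sigma > 0 and abs(values[i] - mu) > k * sigma:
            anomalies.append(i)
    return anomalies


# Made-up per-interval log counts with one spike at index 6 (illustrative only).
log_counts = [10, 11, 9, 10, 11, 10, 50, 10, 11]
flagged = detect_anomalies(log_counts, window=5)
```

A weighted variant would replace the plain mean with one that weights recent intervals more heavily; the abstract does not give the weighting scheme, so it is left out here.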