• Title/Abstract/Keywords: methods of data analysis

Search results: 19,359 items (processing time: 0.053 s)

Development of the KnowledgeMatrix as an Informetric Analysis System

  • 이방래;여운동;이준영;이창환;권오진;문영호
    • 한국콘텐츠학회논문지 / Vol. 8, No. 1 / pp. 68-74 / 2008
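  • Efforts to discover knowledge from databases and put it to use by research planners and policy decision-makers are growing worldwide. Informetrics is a representative field of this research, and analysis systems that support it are being developed mainly in advanced countries. However, foreign analysis systems do not sufficiently reflect the needs of actual users, are expensive, and do not support Korean (Hangul), which makes them difficult for domestic research planners to use. To overcome these shortcomings, the Korea Institute of Science and Technology Information developed KnowledgeMatrix, an informetric analysis system. KnowledgeMatrix is a stand-alone system designed to discover knowledge by analyzing the bibliographic information of papers and patents. Its main components are matrix generation, clustering, visualization, and data preprocessing. Compared with representative foreign information analysis systems, the KnowledgeMatrix introduced in this paper offers a wide range of functions and, in particular, can process Korean-language data in addition to English-language data.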
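The abstract lists matrix generation from bibliographic records as a core component but does not say which matrices the system builds. As a rough illustration only, the sketch below counts how often two items (for example, authors or keywords) appear in the same record. The function name `cooccurrence_counts` and the record layout are hypothetical and are not taken from KnowledgeMatrix.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(records):
    """Count how many records contain each unordered pair of items.

    `records` is a list of item lists (e.g. the authors of each paper).
    This layout is an assumption made for illustration.
    """
    counts = Counter()
    for items in records:
        # De-duplicate within a record and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    records = [["kim", "lee", "park"], ["kim", "lee"], ["park"]]
    for pair, n in sorted(cooccurrence_counts(records).items()):
        print(pair, n)
```

The resulting pair counts form the upper triangle of a symmetric co-occurrence matrix, which could then be passed to a clustering or visualization step.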

Clustering non-stationary advanced metering infrastructure data

  • Kang, Donghyun;Lim, Yaeji
    • Communications for Statistical Applications and Methods / Vol. 29, No. 2 / pp. 225-238 / 2022
  • In this paper, we propose a clustering method for advanced metering infrastructure (AMI) data in Korea. Because AMI data are non-stationary, we consider time-dependent frequency-domain principal component analysis, which is suited to locally stationary time series data. We develop a new clustering method based on time-varying eigenvectors; it yields meaningful results that differ from those obtained with conventional methods such as K-means and K-centres functional clustering. A simulation study demonstrates the superiority of the proposed approach. We further apply the clustering results to evaluate the electricity price system in South Korea and validate the reform of the progressive electricity tariff system.

Movie Recommendation Algorithm Using Social Network Analysis to Alleviate Cold-Start Problem

  • Xinchang, Khamphaphone;Vilakone, Phonexay;Park, Doo-Soon
    • Journal of Information Processing Systems / Vol. 15, No. 3 / pp. 616-631 / 2019
  • With the rapid increase of information on the World Wide Web, finding useful information on the internet has become a major problem. Recommender systems help users make decisions in complex domains where a large amount of data is available. Many methods have been proposed for recommender systems, and collaborative filtering is a popular, widely used one. However, collaborative filtering still has some problems, notably the cold-start problem. In this paper, we propose a movie recommendation system that combines social network analysis and collaborative filtering to address this problem. We use personal attributes of users, such as age, gender, and occupation, to build a relationship matrix between users, and the relationship matrix is used to cluster users through community detection based on edge betweenness centrality. The recommender system then suggests to new users the movies that previously interested users in the same group. We show, using mean absolute error, that the proposed method is very efficient.
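The evaluation metric named in this abstract, mean absolute error (MAE), and the idea of recommending to a new user from their group's past ratings can be sketched as follows. This is a minimal sketch under stated assumptions: predicting a rating as the plain group mean is an assumption, not the authors' method, and the community detection step based on edge betweenness centrality is not implemented. The function names are hypothetical.

```python
def predict_for_new_user(group_ratings):
    """Predict a new user's rating for each movie as the mean rating in their group.

    `group_ratings` holds one {movie: rating} dict per existing group member.
    Using the group mean is an illustrative assumption.
    """
    by_movie = {}
    for ratings in group_ratings:
        for movie, rating in ratings.items():
            by_movie.setdefault(movie, []).append(rating)
    return {movie: sum(r) / len(r) for movie, r in by_movie.items()}


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one rating")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    group = [{"A": 4, "B": 2}, {"A": 5}]
    predicted = predict_for_new_user(group)
    # Ratings the new user later gave to movies A and B (made-up values).
    actual = {"A": 4, "B": 3}
    movies = sorted(actual)
    print(mean_absolute_error([actual[m] for m in movies],
                              [predicted[m] for m in movies]))
```

A lower MAE means the predicted ratings sit closer to the ratings users actually gave.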

A Study on the Big Data Analysis System for Searching of the Flooded Road Areas

  • 송영미;김창수
    • 한국멀티미디어학회논문지 / Vol. 18, No. 8 / pp. 925-934 / 2015
  • The frequency of natural disasters is gradually increasing because of global warming, and the risk of flooding due to typhoons and torrential rain has also increased. In particular, roads are flooded by sudden torrential rain, causing damage to vehicles and personal injury. Because a road can become inundated within moments, rapid data collection and a quick response system need to be studied. Our research proposes a variety of methods for collecting information and a big data analysis system, based on the collected information, for searching for road areas flooded by torrential rain. The flooded-road data draw on SNS data, meteorological data, road link data, and other sources. The big data analysis system is implemented as a distributed processing system based on the Hadoop platform.

A Study on the Form Analysis Tools Based on the User's Emotional Response

  • 최민영
    • 감성과학 / Vol. 12, No. 2 / pp. 233-242 / 2009
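  • Recently, user-centered design, together with form development and analysis, has emerged as an important methodology for successful design. User-oriented form analysis requires an integrated approach to existing methods, and it needs to be examined and developed as a professional tool for designers. In particular, existing analysis tools need to visualize analysis results and present a clear direction so that they suit designers' needs, and diverse ways of applying users' emotional responses to form analysis need to be explored. In addition, a designer-friendly interface is needed to strengthen the tool's specialization and to make it easy for designers to use. This study analyzes methods and frameworks for using users' emotional responses in existing form analysis tools and, on that basis, presents a form analysis tool based on users' emotional responses. The tool is presented in terms of five concepts: integrated management, variable setting, visualization of analysis results, in-depth analysis through data mining, and stronger linkage of user-centered analysis results. It was developed as five modules: project management, analysis frame setting, data input/output, basic analysis, and in-depth analysis. The effectiveness of the proposed tool was examined through a case study of mobile phones, which verified the usability of the tool and the validity of the form analysis.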

Comparison of Sentiment Analysis from Large Twitter Datasets by Naïve Bayes and Natural Language Processing Methods

  • Back, Bong-Hyun;Ha, Il-Kyu
    • Journal of information and communication convergence engineering / Vol. 17, No. 4 / pp. 239-245 / 2019
  • Recently, efforts to obtain various information from the vast amount of social network service (SNS) big data generated in daily life have expanded. SNS big data comprise sentences classified as unstructured data, which complicates data processing. As the amount of processing increases, a rapid processing technique is required to extract valuable information from SNS big data. We herein propose a system that can extract human sentiment information from vast amounts of unstructured SNS big data using the naïve Bayes algorithm and natural language processing (NLP). Furthermore, we analyze the effectiveness of the proposed method through various experiments. In the sentiment accuracy analysis, the machine learning method using the naïve Bayes algorithm achieved 63.5% accuracy, which was lower than that of the NLP method. However, in the data processing speed analysis, the naïve Bayes method demonstrated a processing performance approximately 5.4 times higher than that of the NLP method.
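For readers unfamiliar with the naïve Bayes approach mentioned in this abstract, the following is a minimal sketch of a multinomial naïve Bayes text classifier with Laplace smoothing. It is not the authors' system: the class name, the whitespace tokenizer, the smoothing value, and the toy training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial naive Bayes over whitespace tokens (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    model = NaiveBayesSentiment().fit(
        ["good great movie", "love this great day",
         "bad awful movie", "hate this awful day"],
        ["pos", "pos", "neg", "neg"],
    )
    print(model.predict("great day"), model.predict("awful movie"))
```

Training only counts words and prediction only sums log probabilities, which is consistent with the speed advantage the abstract reports for the naïve Bayes method.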

A Study on the Traffic Flow Analysis Method by Image Processing

  • 이종달;이령욱
    • 대한교통학회지 / Vol. 12, No. 1 / pp. 97-116 / 1994
  • Today, advanced traffic management systems are required because of the large increase in traffic demand. Accordingly, the objective of this study is to use image processing systems and present image processing methods for collecting data on traffic characteristics, and then to investigate the feasibility of traffic flow analysis by comparison with measured traffic flow. Data were collected using a VTR at two places in Daegu city and on the Kyongbu expressway. Rear-view (downstream) and frontal-view (upstream) methods were employed to compare and analyze traffic characteristics, including traffic volume, speed, time headway, time occupancy, and vehicle length, obtained by analysis of measured traffic flow and by image processing, respectively. Judging from the results of this study, image processing techniques are sufficient for analyzing traffic volume; a low-level system is judged adequate for traffic volume analysis, but a frame grabber equipped with a high-speed processor is also necessary.

Analysis of English abstracts in Journal of the Korean Data & Information Science Society using topic models and social network analysis

  • 김규하;박철용
    • Journal of the Korean Data and Information Science Society / Vol. 26, No. 1 / pp. 151-159 / 2015
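  • In this paper, we analyze the English abstracts of papers published in the Journal of the Korean Data & Information Science Society using text mining techniques. We first generate term-document matrices by various methods and visualize them through social network analysis. We also use LDA (latent Dirichlet allocation) and CTM (correlated topic model) to extract topics. We compare the performance of the topic extraction models using entropy, according to the number of topics and the method used to generate the term-document matrix.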
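Two building blocks named in this abstract, the term-document matrix and entropy, can be sketched in a few lines. This is an illustration only: raw term counts are just one of the "various methods" of generating the matrix, and the abstract does not give the exact entropy definition used to compare topic models, so plain Shannon entropy (natural log) is assumed here. The function names are hypothetical.

```python
import math
from collections import Counter


def term_document_matrix(documents):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j.

    Tokenization is a lowercase whitespace split, an assumption for illustration.
    """
    tokenized = [doc.lower().split() for doc in documents]
    terms = sorted({word for doc in tokenized for word in doc})
    counts = [Counter(doc) for doc in tokenized]
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix


def entropy(probabilities):
    """Shannon entropy (natural log) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


if __name__ == "__main__":
    terms, matrix = term_document_matrix(["topic model topic", "network model"])
    for term, row in zip(terms, matrix):
        print(term, row)
    # A document spread evenly over two topics has higher entropy
    # than one concentrated on a single topic.
    print(entropy([0.5, 0.5]), entropy([1.0, 0.0]))
```

Under this definition, lower entropy of a document's topic distribution indicates that the document is assigned more decisively to a few topics.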

Finite Element Analysis of BLDC Motor Characteristic according to Magnetic Property Measurement Methods

  • 김지현;하경호;권오열;차상윤;김재관
    • 대한전기학회 (Korean Institute of Electrical Engineers): Conference Proceedings / 2008 39th Summer Conference / pp. 697-698 / 2008
  • This paper deals with finite element characteristic analysis of a brushless DC (BLDC) motor according to magnetic property measurement methods. Magnetic property data for non-oriented (NO) electrical steel for electric motors are measured by the Epstein test, which is regarded as the international standard. Data from the Epstein test may deviate from motor characteristic tests because of the innate anisotropy of NO electrical steel. Finite element analyses were performed for a BLDC motor using data from various measurement methods, namely the Epstein test, the ring test, and the single sheet test (SST), and the calculated results were compared, taking the anisotropic property conditions into account.

PhysioCover: Recovering the Missing Values in Physiological Data of Intensive Care Units

  • Kim, Sun-Hee;Yang, Hyung-Jeong;Kim, Soo-Hyung;Lee, Guee-Sang
    • International Journal of Contents / Vol. 10, No. 2 / pp. 47-58 / 2014
  • Physiological signals provide important clues in the diagnosis and prediction of disease. Analyzing these signals is important in health and medicine. In particular, data preprocessing for physiological signal analysis is a vital issue because missing values, noise, and outliers may degrade the analysis performance. In this paper, we propose PhysioCover, a system that can recover missing values of physiological signals that were monitored in real time. PhysioCover integrates a gradual method and EM-based principal component analysis (PCA). This approach can (1) more readily recover long- and short-term missing data than existing methods, such as traditional EM-based PCA, linear interpolation, 5-average, and Missing Value Singular Value Decomposition (MSVD), (2) more effectively detect hidden variables than PCA and independent component analysis (ICA), and (3) offer fast computation time through real-time processing. Experimental results with the physiological data of an intensive care unit show that the proposed method assigns more accurate missing values than previous methods.
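Linear interpolation is one of the baseline methods this abstract compares against. The sketch below shows that baseline only, not PhysioCover or EM-based PCA. Marking missing samples with `None` and filling leading or trailing gaps with the nearest observed value are assumptions made for illustration, and the function name is hypothetical.

```python
def linear_interpolate(signal):
    """Fill None gaps by linear interpolation between the nearest observed samples.

    Gaps at the start or end have only one observed neighbour, so they are
    filled by holding that value (an illustrative assumption).
    """
    filled = list(signal)
    known = [i for i, value in enumerate(filled) if value is not None]
    if not known:
        raise ValueError("signal has no observed values")
    for i in range(known[0]):                      # leading gap
        filled[i] = filled[known[0]]
    for i in range(known[-1] + 1, len(filled)):    # trailing gap
        filled[i] = filled[known[-1]]
    for left, right in zip(known, known[1:]):      # interior gaps
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


if __name__ == "__main__":
    heart_rate = [None, 60, None, None, 66, None]  # made-up samples
    print(linear_interpolate(heart_rate))
```

Interpolation of this kind draws a straight line across each gap, so it cannot reproduce variation inside a long gap, which is the case the abstract says the proposed method handles better.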