• Title/Summary/Keyword: 시계열 군집분석 (time series cluster analysis)

Big Data News Analysis in Healthcare Using Topic Modeling and Time Series Regression Analysis (토픽모델링과 시계열 회귀분석을 활용한 헬스케어 분야의 뉴스 빅데이터 분석 연구)

  • Eun-Jung Kim;Suk-Gwon Chang;Sang-Yong Tom Lee
    • Information Systems Review / v.25 no.3 / pp.163-177 / 2023
  • This research aims to identify key initiatives and a policy approach to support the industrialization of the healthcare sector. The research collected a total of 91,873 healthcare-related news items published between 2013 and 2022. A total of 20 topics were derived through topic modeling, and time series regression analysis identified 4 hot topics (Healthcare, Biopharmaceuticals, Corporate outlook·Sales, Government·Policy) and 3 cold topics (Smart devices, Stocks·Investment, Urban development·Construction) as significant. The research findings can serve as an important data source for government institutions engaged in the formulation and implementation of Korea's policies.
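
A hot/cold split of this kind is commonly obtained by regressing each topic's yearly share on time and checking the sign and significance of the slope. The sketch below is a minimal illustration of that idea, not the authors' procedure: the function names, the sample shares, and the 5% two-sided critical value for 10 yearly points (t = 2.306 with 8 degrees of freedom) are all assumptions.

```python
import math

def trend_t_stat(shares):
    """Fit share = a + b*t by ordinary least squares; return (slope, t-statistic)."""
    n = len(shares)
    ts = list(range(n))
    t_mean = sum(ts) / n
    y_mean = sum(shares) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, shares))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    # Residual sum of squares and standard error of the slope.
    sse = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(ts, shares))
    se = math.sqrt(sse / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: a flat line has no trend, any other line is "infinitely" significant.
        return slope, 0.0 if slope == 0 else math.copysign(math.inf, slope)
    return slope, slope / se

def classify_topic(shares, t_crit=2.306):
    """Label a topic 'hot', 'cold', or 'neutral' (t_crit is an assumed threshold)."""
    _, t_stat = trend_t_stat(shares)
    if t_stat > t_crit:
        return "hot"
    if t_stat < -t_crit:
        return "cold"
    return "neutral"

# Hypothetical yearly topic shares for 2013-2022 (illustrative values only).
rising = [0.020, 0.025, 0.030, 0.034, 0.041, 0.045, 0.050, 0.056, 0.060, 0.066]
falling = list(reversed(rising))
flat = [0.05, 0.04, 0.06, 0.05, 0.04, 0.06, 0.05, 0.04, 0.06, 0.05]
labels = [classify_topic(s) for s in (rising, falling, flat)]
```

Topics whose slope is not significant in either direction stay unlabeled ("neutral"), which is consistent with only 7 of the 20 topics being reported as hot or cold.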

Software Measurement by Analyzing Multiple Time-Series Patterns (다중 시계열 패턴 분석에 의한 소프트웨어 계측)

  • Kim Gye-Young
    • Journal of Internet Computing and Services / v.6 no.1 / pp.105-114 / 2005
  • This paper describes a new measurement technique based on analyzing multiple time-series patterns. The goal is to retrieve the actually measured value of the sample pattern that best matches an input time series and to calculate a difference ratio relative to that value. The proposed technique is therefore measurement rather than recognition, and is implemented in software rather than hardware. It consists of three stages: initialization, learning, and measurement. In the initialization stage, the weights of all parameters are set according to the importance assigned by an operator. In the learning stage, sample patterns are classified using the LBG and DTW algorithms, and code sequences are created for all patterns. In the measurement stage, a code sequence is created for the input time-series pattern, samples with the same code sequence are found by hashing, and the best-matching sample is selected. Finally, the technique outputs the measured value associated with that sample together with the difference ratio. For performance evaluation, we tested the technique on multiple time-series patterns obtained from an etching machine used in semiconductor manufacturing.
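
The matching step can be illustrated with a plain dynamic time warping (DTW) distance. This is a minimal sketch under stated assumptions: it covers only the DTW comparison and best-match selection, it leaves out the LBG quantization, the code sequences, the hashing, and the difference ratio, and the sample names and values are hypothetical.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match with b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats a match with a[i-1]
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def best_match(query, samples):
    """Return the name of the stored sample pattern closest to the query under DTW."""
    return min(samples, key=lambda name: dtw_distance(query, samples[name]))

# Hypothetical sample patterns and an input series that is a stretched "ramp".
samples = {"ramp": [0, 1, 2, 3, 4], "pulse": [0, 0, 5, 0, 0]}
query = [0, 0, 1, 2, 3, 3, 4]
matched = best_match(query, samples)
```

Because DTW allows one point to align with several neighbors, the stretched query matches "ramp" even though the two series have different lengths.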

A Statistical Analysis of the Causes of Marine Incidents occurring during Berthing (정박 중 발생한 준해양사고 원인에 대한 통계 분석 연구)

  • Roh, Boem-Seok;Kang, Suk-Young
    • Journal of Navigation and Port Research / v.45 no.3 / pp.95-101 / 2021
  • According to Heinrich's law, marine incidents are very important for preventing accidents. However, marine incident data are mainly qualitative and are used to prevent similar accidents through case sharing rather than statistical analysis, as can be seen in the marine incident data posted by the Korea Maritime Safety Tribunal. Therefore, this study derived quantitative results by analyzing the causes of marine incidents during berthing using various methods of statistical analysis. To this end, data on marine incidents from various shipping companies were collected and reclassified for easy analysis. The main keywords were derived via primary analysis using text mining. Only meaningful words were selected via verification by an expert group, and time series and cluster analysis were performed to predict marine incidents that may occur during berthing. Although an expert group was still required during the analysis, it was confirmed that quantitative analysis of marine incidents is feasible and can be used to provide cause and accident prevention information.

Derivation of Digital Music's Ranking Change Through Time Series Clustering (시계열 군집분석을 통한 디지털 음원의 순위 변화 패턴 분류)

  • Yoo, In-Jin;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.171-191 / 2020
  • This study focuses on digital music, which is the most valuable cultural asset in modern society and occupies a particularly important position in the flow of the Korean Wave. Digital music data were collected from the "Gaon Chart," a well-established music chart in Korea, covering the ranking changes of the music that entered the chart over 73 weeks. Patterns with similar characteristics were then derived through time series cluster analysis, and a descriptive analysis was performed on the notable features of each pattern. The research process proposed in this study is as follows. First, in the data collection stage, time series data were collected to track the ranking changes of digital music. Next, in the data processing stage, the collected data were matched with the rankings over time, and the music titles and artist names were processed. The analysis was then performed sequentially in two stages: exploratory analysis and explanatory analysis. The data collection period was limited to the period before 'the music bulk buying phenomenon', a reliability issue affecting music rankings in Korea. Specifically, it spans 73 weeks, from the first week (December 31, 2017 to January 6, 2018) to the last week (May 19, 2019 to May 25, 2019). The analysis targets were limited to digital music released in Korea. Unlike the private music charts operating in Korea, the Gaon Chart is approved by government agencies and has basic reliability, so it can be considered to have more public confidence than the ranking information provided by other services. The contents of the collected data are as follows.
Data on the period and ranking, the music title, the artist name, the album name, the Gaon index, the production company, and the distribution company were collected for the music that entered the top 100 of the chart within the collection period. Data collection yielded 7,300 chart entries (the top 100 for each of the 73 weeks). Because music frequently stays on the chart for more than two weeks, duplicates were removed during pre-processing: the number and location of the duplicated entries were checked with a duplicate check function, and the duplicates were then deleted to form the data for analysis. This yielded a list of 742 unique songs from the 7,300 entries. A total of 16 patterns were then derived through time series cluster analysis of the ranking changes. From these, two representative patterns were identified: 'Steady Seller' and 'One-Hit Wonder'. The two patterns were further subdivided into five patterns in consideration of the survival period of the music and its ranking. The important characteristics of each pattern are as follows. First, the artist's superstar effect and the bandwagon effect were strong in the One-Hit Wonder pattern, which suggests that consumers are strongly influenced by these effects when choosing digital music. Second, the Steady Seller pattern revealed the music that consumers have chosen over a very long time. In addition, we examined which patterns of music consumers selected most. Contrary to popular belief, the 'Steady Seller: mid-term' pattern, not the One-Hit Wonder pattern, received the most choices from consumers.
Particularly noteworthy is that the 'Climbing the Chart' phenomenon, which runs contrary to the existing pattern, was confirmed through the Steady Seller pattern. This study focuses on changes in music rankings over time, an area that has been relatively neglected in digital music research. In addition, it attempts a new approach to music research by subdividing the patterns of ranking change rather than predicting the success and ranking of music.
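
The pre-processing described above (collapsing weekly top-100 entries into one record per song) can be sketched as follows. This is an illustrative assumption rather than the authors' code: the row layout, the function name, and the placeholder rank of 101 for weeks in which a song is off the chart are all hypothetical.

```python
from collections import defaultdict

def rank_trajectories(rows, n_weeks, off_chart=101):
    """Collapse weekly chart rows into one rank time series per unique song.

    rows: iterable of (week_index, rank, title, artist) tuples, week_index from 0.
    off_chart: assumed placeholder rank for weeks outside the top 100.
    """
    series = defaultdict(lambda: [off_chart] * n_weeks)
    for week, rank, title, artist in rows:
        # Keying on (title, artist) removes the duplicate entries that arise
        # when the same song charts in more than one week.
        series[(title, artist)][week] = rank
    return dict(series)

# Three hypothetical chart rows covering two songs over three weeks.
rows = [
    (0, 1, "Song A", "Artist X"),
    (1, 2, "Song A", "Artist X"),
    (1, 1, "Song B", "Artist Y"),
]
trajectories = rank_trajectories(rows, n_weeks=3)
```

Each value in `trajectories` is a fixed-length rank series, which is the shape of input a time series clustering step would expect.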

A Study on Market Analysis of Seoul's Commercial Districts by Food Service Sector Using Sales per Store (점포당 매출액을 활용한 서울 소재 외식업종별 상권 분석에 관한 연구)

  • Sora Jang;Jaeho Hwang;Sooyon Seo;Moohong Min
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.398-401 / 2023
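  • This study performs time series cluster analysis on six years of sales-per-store data for food service businesses in Seoul, segmenting commercial districts by business type and region and redefining them along a range from 'growing districts' to 'declining districts'. On this basis, it analyzes indicators that prospective founders and small business owners can use to select a business type and location.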

Clustering fMRI Time Series using Self-Organizing Map (자기 조직 신경망을 이용한 기능적 뇌영상 시계열의 군집화)

  • 임종윤;장병탁;이경민
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.12a / pp.251-254 / 2001
  • In this paper, we analyzed fMRI data using a Self-Organizing Map (SOM). fMRI (functional Magnetic Resonance Imaging) is a non-invasive method for studying the human brain that has recently attracted attention. Image data were obtained from subjects performing a motor task, and clustering the data with SOM showed that the motor cortex region was clearly clustered.

A Study of Search Methodology for Efficient Clustering (효율적 군집화를 위한 탐색 방법 연구)

  • Jeon, Jin-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.10a / pp.571-573 / 2010
  • Most real-world systems, such as the world economy, management, medical, and engineering applications, involve a series of complex phenomena. A common way to understand these systems is to build a model and analyze the system's behavior. The first step is to determine the best clusters in the data; the second is to determine a model for each cluster. In this paper, we investigated heuristic search methods for efficient clustering.

Time Series Patterns and Clustering of Rotifer Community in Relation with Topographical Characteristics in Lentic Ecosystems (정수생태계의 지형적인 요인 변화와 윤충류 출현 종 수 및 개체군 밀도 변동에 대한 연구)

  • Oh, Hye-Ji;Heo, Yu-Ji;Chang, Kwang-Hyeon;Kim, Hyun-Woo
    • Korean Journal of Ecology and Environment / v.54 no.4 / pp.390-397 / 2021
  • Time series data on the rotifer community, focusing on species number and total density, were collected quarterly from 29 reservoirs in Jeonnam Province from 2008 to 2016. The reservoirs had similar weather conditions during the study period, but their sizes and water quality differed. To analyze the temporal dynamics of the rotifer community, the medians, ranges, outliers, and coefficients of variation (CV) of rotifer species number and abundance were compared. For the temporal trend analysis, the time series of each reservoir were compared and clustered using the dynamic time warping function of the R package "dtwclust". Small reservoirs showed higher variability in rotifer abundance, with more frequent outliers, than large reservoirs. In contrast, no apparent pattern was observed for rotifer species number. For the temporal pattern of rotifer density, COD, phytoplankton abundance fluctuation, and cladoceran abundance fluctuation are suggested as potential factors affecting rotifer abundance dynamics.
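
The variability comparison rests on the coefficient of variation, CV = standard deviation / mean, which makes reservoirs with different typical abundances comparable. The following is a minimal sketch with made-up abundance values; the function name, the summary fields, and the use of the sample standard deviation are assumptions, not details taken from the paper.

```python
import statistics

def summarize(values):
    """Return the median, range, and coefficient of variation of a series."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return {
        "median": statistics.median(values),
        "range": max(values) - min(values),
        # Sample standard deviation divided by the mean (unitless).
        "cv": statistics.stdev(values) / mean,
    }

# Hypothetical quarterly rotifer densities for one small and one large reservoir.
small_reservoir = [10, 200, 15, 400, 12, 30, 350, 20]
large_reservoir = [100, 120, 110, 90, 105, 115, 95, 100]
small_stats = summarize(small_reservoir)
large_stats = summarize(large_reservoir)
```

With these illustrative numbers the small reservoir has the larger CV, mirroring the direction of the reported result.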

Similarity of Sampling Sites by Water Quality (수질 관측지점 유사성 측정방법 연구)

  • Kwon, Se-Hyug;Lee, Yo-Sang
    • Communications for Statistical Applications and Methods / v.17 no.1 / pp.39-45 / 2010
  • As the value of the environment increases, water quality has become a matter of interest to the nation and its people. Water quality has been widely studied, but research has focused on geographical and river characteristics such as inflow, outflow, and the quantity and speed of water. In this paper, two approaches to measuring the similarity of sampling sites using water quality data are discussed and compared on two years of empirical data from Yongdam Dam. The existing method calculates similarities from principal component scores. The approach proposed in this paper uses the correlation matrix of water quality variables and MDS to measure similarity; it is shown to be better in the sense that it produces a clustering identical to the geographical clustering, since it can take the time series pattern of water quality into account.

Optimize TOD Time-Division with Dynamic Time Warping Distance-based Non-Hierarchical Cluster Analysis (동적 타임 워핑 거리 기반 비 계층적 군집분석을 활용한 TOD 시간분할 최적화)

  • Hwang, Jae-Yeon;Park, Minju;Kim, Yongho;Kang, Woojin
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.20 no.5 / pp.113-129 / 2021
  • Recently, traffic congestion in cities has been increasing continuously due to the expansion of living areas centered on the metropolitan area and the concentration of population in large cities. New road construction has become infeasible because of rising land prices in downtown areas and limited sites, so efficient data-based road operation is becoming increasingly important. For efficient road operation, it is essential to classify appropriate scenarios according to changes in traffic conditions and to operate optimal signals for each scenario. In this study, the Dynamic Time Warping model for cluster analysis of time series data was applied to traffic volume and speed data collected at consecutive intersections for optimal scenario classification. We propose a methodology for composing an optimal signal operation scenario by analyzing the characteristics of the scenarios for each type of data used for classification.
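
One common way to combine DTW with non-hierarchical clustering is k-medoids over a precomputed DTW distance matrix. The sketch below illustrates that combination under stated assumptions and is not the authors' implementation: the initialization (the first k series as medoids), the iteration cap, and the traffic volume profiles are hypothetical.

```python
def dtw(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + step
    return cost[len(a)][len(b)]

def k_medoids(series, k, max_iter=20):
    """Alternate assignment and medoid update until the medoids stop changing."""
    n = len(series)
    dist = [[dtw(series[i], series[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))  # assumed initialization: first k series
    labels = []
    for _ in range(max_iter):
        # Assignment step: each series joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            # Update step: the member with the smallest total distance to the others.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids

# Hypothetical traffic volume profiles: two peaked periods and two saturated periods.
profiles = [
    [1, 1, 5, 9, 5, 1],
    [9, 9, 9, 9, 9, 9],
    [1, 5, 9, 5, 1, 1],
    [8, 9, 9, 9, 8, 9],
]
labels, medoids = k_medoids(profiles, k=2)
```

Medoids are actual observed series, so each cluster can be read directly as a representative traffic pattern, and no averaging of warped series is needed.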