• Title/Summary/Keyword: K means clustering

A Study on Price Volatility and Properties of Time-series for the Tangerine Price in Jeju (제주지역 감귤가격의 시계열적 특성 및 가격변동성에 관한 연구)

  • Ko, Bong-Hyun
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.6 / pp.212-217 / 2020
  • The purpose of this study was to analyze the volatility and time-series properties of tangerine prices in Jeju using the GARCH model of Bollerslev (1986). First, the time series of the rate of change in tangerine prices was found to have thicker tails than a normal distribution: at the 1% significance level, the Jarque-Bera statistic led to rejection of the null hypothesis that the series is normally distributed. Second, the Ljung-Box Q statistic indicated high serial correlation in the series, which was statistically confirmed by the ARCH-LM test. Third, the estimates of the GARCH(1,1) model were statistically significant at the 1% level, except for the constant of the mean equation. The persistence parameter of the variance equation was estimated to be close to 1, which means that a similar level of volatility is highly likely to persist in the future. Finally, the results of this study are expected to serve as basic data for optimizing the government's tangerine supply and demand control policy.
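The two statistics at the core of this abstract, the Jarque-Bera normality test and the GARCH(1,1) variance equation, can be sketched in a few lines of Python. This is a minimal illustration, not the author's estimation code: it takes mean-equation residuals as given rather than estimating omega, alpha and beta by maximum likelihood, and the parameter values in the usage lines are hypothetical. Under normality JB is approximately chi-square with 2 degrees of freedom (1% critical value about 9.21). The persistence discussed above corresponds to alpha + beta; the closer it is to 1, the longer a volatility shock takes to decay.

```python
import math


def jarque_bera(x):
    """Jarque-Bera statistic JB = n/6 * (S^2 + (K - 3)^2 / 4); approx. chi-square(2) under normality."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variances sigma2_t = omega + alpha * e_{t-1}^2 + beta * sigma2_{t-1}."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    # Initialise with the unconditional variance omega / (1 - alpha - beta).
    sigma2 = [omega / (1.0 - alpha - beta)]
    for e in residuals[:-1]:
        sigma2.append(omega + alpha * e ** 2 + beta * sigma2[-1])
    return sigma2


def shock_half_life(alpha, beta):
    """Periods until a variance shock decays by half; grows without bound as alpha + beta -> 1."""
    return math.log(0.5) / math.log(alpha + beta)


# Hypothetical parameters: persistence 0.95 vs 0.99.
print(round(shock_half_life(0.10, 0.85), 1))  # 13.5
print(round(shock_half_life(0.10, 0.89), 1))  # 69.0
```

Moving persistence from 0.95 to 0.99 roughly quintuples the half-life of a shock, which is why an estimate near 1 implies that current volatility levels are likely to carry forward.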

Mixed-effects zero-inflated Poisson regression for analyzing the spread of COVID-19 in Daejeon (혼합효과 영과잉 포아송 회귀모형을 이용한 대전광역시 코로나 발생 동향 분석)

  • Kim, Gwanghee; Lee, Eunjee
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.375-388 / 2021
  • This paper aims to help prevent the spread of COVID-19 by analyzing confirmed COVID-19 cases in Daejeon. A high volume of visitors, downtown areas, and psychological fatigue from prolonged social distancing were considered risk factors for the spread of COVID-19. The response variable was the weekly number of confirmed cases in each administrative district. The explanatory variables were the number of passengers getting off at bus stations in each administrative district and the time elapsed since the Korean government imposed "distancing in daily life". We employed a mixed-effects zero-inflated Poisson regression model because the case counts were measured repeatedly and contained excess zeros. We conducted k-means clustering to identify three groups of administrative districts with different characteristics in terms of the number of bars, population size, and distance to the nearest college. Because the number of confirmed cases might vary with district characteristics, cluster membership was included as a categorical explanatory variable. We found that COVID-19 was more prevalent in districts with larger populations and in downtown districts, and that confirmed cases increased significantly as the number of passengers getting off in downtown districts increased.
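A zero-inflated Poisson model mixes a point mass at zero with a Poisson count, which is why it suits weekly district counts with many zero weeks. The Python sketch below writes out the probability mass function and a log link with a district-level random intercept. The function names, the single zero-inflation probability `pi`, and the covariate values are illustrative assumptions, and fitting the model (estimating the fixed effects, the random-effect variance and `pi`) is not shown.

```python
import math


def zip_pmf(y, lam, pi):
    """P(Y = y) for a zero-inflated Poisson: structural zero with prob. pi, else Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def district_rate(x, beta, b_district):
    """Poisson mean via a log link: log(lam) = x . beta + b_i, b_i a district random intercept."""
    return math.exp(sum(xk * bk for xk, bk in zip(x, beta)) + b_district)


def zip_log_likelihood(counts, rates, pi):
    """Log-likelihood of observed weekly counts given per-observation Poisson means."""
    return sum(math.log(zip_pmf(y, lam, pi)) for y, lam in zip(counts, rates))


# Hypothetical district: two covariates (scaled bus alightings, weeks elapsed).
lam = district_rate([1.2, 0.5], [0.8, -0.3], b_district=0.1)
print(zip_pmf(0, lam, pi=0.4))  # inflated probability of a zero-case week
```

A k-means cluster label would enter `x` as dummy (one-hot) columns, in the same way as the categorical explanatory variable described above.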

Online news-based stock price forecasting considering homogeneity in the industrial sector (산업군 내 동질성을 고려한 온라인 뉴스 기반 주가예측)

  • Seong, Nohyoon; Nam, Kihwan
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.1-19 / 2018
  • Because forecasting stock movements is an important issue both academically and practically, studies on stock price prediction have been actively conducted. Stock price forecasting research can be classified by whether it uses structured or unstructured data, and more specifically into technical analysis, fundamental analysis, and media effect analysis. In the big data era, research combining big data with stock price prediction is actively under way, and because it relies on large amounts of data, it focuses mainly on machine learning techniques. In particular, methods that incorporate media effects have recently attracted attention, and studies that analyze online news and use it to forecast stock prices have become mainstream. Most previous studies that predict stock prices from online news perform sentiment analysis of the news, build a separate corpus for each company, and construct a dictionary that predicts stock prices by recording responses to past stock prices. Existing studies have therefore examined the impact of online news on individual companies: for example, stock movements of Samsung Electronics are predicted only from online news about Samsung Electronics. More recently, methods that consider influences among highly related companies have also been studied; for example, stock movements of Samsung Electronics are predicted from news about both Samsung Electronics and a highly related company such as LG Electronics. These studies examine how news about an industrial sector assumed to be homogeneous affects an individual company. In them, homogeneous industries are defined by the Global Industry Classification Standard (GICS); in other words, they assume that the industries defined by GICS are homogeneous.
However, existing studies are limited in that they do not take into account influential companies with high relevance or reflect the heterogeneity that exists within a single GICS sector. Our examination of various sectors shows that some industrial sectors are not homogeneous groups. To overcome this limitation, our study proposes a methodology that captures the heterogeneous effects of the industrial sector on stock prices by applying k-means clustering. Multiple Kernel Learning, which is mainly used to integrate data with different characteristics, uses several kernels, each of which receives different data and makes its own prediction. We used Multiple Kernel Learning to incorporate the effects of the target firm and its relevant firms simultaneously: each kernel predicted stock prices from the financial news variables of either the target firm or one of the industry groups obtained by k-means cluster analysis. To verify that the proposed methodology is appropriate, we conducted experiments on three years of online news and stock prices. The results are as follows. (1) Information about the industrial sectors related to a target company also contains meaningful information for predicting the target company's stock movements, and machine learning algorithms predict better when the news of relevant companies is considered together with the target company's own news. (2) It is important to vary the number of clusters according to the level of homogeneity in the industrial sector: when stock prices within a sector are homogeneous, the relational effect should be used at the level of the whole industry group without clustering, or with a small number of clusters; when stock prices within a sector are heterogeneous, it is important to cluster firms into groups. This study contributes by showing that firms classified under the same GICS sector are heterogeneous, and by suggesting that relevance should be defined through machine learning and statistical analysis rather than simply by GICS membership. It also demonstrates the effectiveness of a prediction model that reflects this heterogeneity.
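The k-means step that splits a sector into groups can be illustrated with Lloyd's algorithm, the standard k-means procedure, in plain Python. This is a minimal sketch: the firm feature vectors in the usage lines are hypothetical stand-ins for whatever firm-level variables the study clustered, and because the abstract does not state the initialisation or distance used, the random initialisation and squared Euclidean distance here are assumptions.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k randomly chosen points as starts
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical firm features, e.g. (mean daily return %, return volatility %).
firms = [(0.1, 1.0), (0.2, 1.1), (0.15, 0.9), (1.5, 3.0), (1.6, 3.2), (1.4, 2.9)]
labels, centroids = kmeans(firms, k=2)
print(labels)
```

Running it with different `k` mirrors finding (2): a homogeneous sector would need few or no clusters, a heterogeneous one more.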

A STUDY OF MANDIBULAR DENTAL ARCH OF KOREAN ADULTS (한국 성인 유치악자의 하악 치열궁에 관한 조사)

  • Kim, Il-Han; Choi, Dae-Gyun
    • The Journal of Korean Academy of Prosthodontics / v.36 no.1 / pp.166-182 / 1998
  • The purposes of this study were to evaluate the mandibular dental arch of Korean adults and to classify its shape and size based on the incisal angle, the canine angle, and the inter-second-molar width and height. Mandibular study models were fabricated with irreversible hydrocolloid impression material from 225 volunteers with a mean age of 23.62 years (range 19-29). The models were measured with a three-dimensional measuring device, the mandibular dental arches were classified by K-means clustering and visual inspection, and the data were analyzed statistically with t-tests. The results were as follows:
    1. The average canine height was 5.19 mm (s.d. 1.17) for both sexes combined, 5.34 mm in males, and 4.95 mm in females. The difference between sexes was significant.
    2. The average second molar height was 39.81 mm (s.d. 2.44) for both sexes combined, 40.19 mm in males, and 39.21 mm in females. The difference between sexes was significant.
    3. The average inter-canine width was 27.16 mm (s.d. 1.78) for both sexes combined, 27.41 mm in males, and 26.77 mm in females. The difference between sexes was significant.
    4. The average inter-first-molar width was 46.93 mm (s.d. 2.67) for both sexes combined, 47.72 mm in males, and 45.7 mm in females. The difference between sexes was significant.
    5. The average inter-second-molar width was 56.09 mm (s.d. 3.01) for both sexes combined, 57.24 mm in males, and 54.32 mm in females. The difference between sexes was significant.
    6. The arch form was classified into three shapes based on the incisal and canine angles. The V-shape showed an incisal angle of $124.88^{\circ}$ and a canine angle of $141.64^{\circ}$, the U-shape $152.76^{\circ}$ and $125.35^{\circ}$, and the O-shape $138.03^{\circ}$ and $33.66^{\circ}$, respectively. Of the 225 study models, 14.2% were V-shaped, 14.7% U-shaped, and 71.1% O-shaped.
    7. Second molar width appeared more reasonable than second molar height for classifying dental arch size. Arch size was classified into four sizes based on the second molar width: size 1 ranged from 42.24 to 48.23 mm, size 2 from 48.24 to 54.23 mm, size 3 from 54.24 to 60.23 mm, and size 4 from 60.24 to 66.23 mm. Of the 225 study models, 1.3% were size 1, 27.1% size 2, 63.6% size 3, and 8.0% size 4.
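The four size classes in result 7 are consecutive 6-mm bins of inter-second-molar width, so assigning a study model to a class is a simple lookup. The sketch below treats each reported range as a half-open interval starting at its lower bound (42.24, 48.24, 54.24, 60.24 mm), which matches the published ranges for widths recorded to 0.01 mm. This interval convention, the function name, and returning `None` outside 42.24-66.23 mm are assumptions.

```python
import bisect

# Lower bounds (mm) of size classes 1-4, taken from the ranges reported in the abstract.
SIZE_LOWER_BOUNDS_MM = [42.24, 48.24, 54.24, 60.24]
SIZE_UPPER_LIMIT_MM = 66.24  # first value above size 4 (reported maximum 66.23 mm)


def arch_size(second_molar_width_mm):
    """Return arch size class 1-4 for an inter-second-molar width, or None if out of range."""
    if not SIZE_LOWER_BOUNDS_MM[0] <= second_molar_width_mm < SIZE_UPPER_LIMIT_MM:
        return None
    # bisect_right counts how many lower bounds are <= the width, i.e. the class number.
    return bisect.bisect_right(SIZE_LOWER_BOUNDS_MM, second_molar_width_mm)


print(arch_size(56.09))  # 3: the reported mean width falls in size 3
```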

Spatio-Temporal Clustering Analysis of HPAI Outbreaks in South Korea, 2014 (2014년 국내 발생 HPAI(고병원성 조류인플루엔자)의 시·공간 군집 분석)

  • Moon, Oun-Kyong; Cho, Seong-Beom; Bae, Sun-Hak
    • Journal of the Korean Association of Geographic Information Studies / v.18 no.3 / pp.89-101 / 2015
  • Outbreaks of highly pathogenic avian influenza (HPAI) subtype H5N8 began in Korea in January 2014 and continued for more than a year, into 2015. By May 2014, more than 5 million poultry on 196 farms had been affected. We therefore studied the spatial, temporal, and spatio-temporal patterns of the epidemic to understand how the 2014 HPAI outbreaks propagated and spread, and expressed the results using GIS. Over the study period, three epidemic waves occurred, and the outbreaks formed three spatial clusters. The first was in the adjacent areas of Chungcheongbuk-do, Chungcheongnam-do, and Gyeonggi-do; the second was the Gomso Bay area of Jeollabuk-do; and the third was Naju and Yeongam in Jeollanam-do. Most spatio-temporal clusters formed in areas of high spatial clustering. In the Gomso Bay area in particular, high spatial density and high spatio-temporal density coincided, which indicates that effective HPAI prevention activities were carried out. There were some exceptions, such as the adjacent areas of Chungcheongbuk-do, Chungcheongnam-do, and Gyeonggi-do, where outbreak density was high in space but no spatio-temporal cluster formed. This indicates that the HPAI virus continued to flow into these areas over a long period.

A Study on Recommendation Technique Using Mining and Clustering of Weighted Preference based on FRAT (마이닝과 FRAT기반 가중치 선호도 군집을 이용한 추천 기법에 관한 연구)

  • Park, Wha-Beum; Cho, Young-Sung; Ko, Hyung-Hwa
    • Journal of Digital Contents Society / v.14 no.4 / pp.419-428 / 2013
  • Real-time accessibility and agility are required in u-commerce in a ubiquitous computing environment. Most existing recommendation techniques evaluate customers based on personal profiles, which makes it difficult to analyze customers' levels of interest and tendencies accurately and also incurs costs, leaving customers unsatisfied. Research has been conducted to improve the accuracy of information such as customers' levels of interest and tendencies. However, the problem lies not in the preconstructed database but in generating the new and diverse profiles used to evaluate the existing data. It is also difficult for existing recommendation techniques to apply a recommendation method tailored to the hierarchy of each customer, given customers' diverse characteristics. Accordingly, unlike other evaluation techniques, this study used an implicit method based on purchase data that does not burden users with questions and answers. We applied the FRAT technique, which can precisely analyze the tendencies of diverse, individual customers.

A Post-Verification Method of Near-Duplicate Image Detection using SIFT Descriptor Binarization (SIFT 기술자 이진화를 이용한 근-복사 이미지 검출 후-검증 방법)

  • Lee, Yu Jin; Nang, Jongho
    • Journal of KIISE / v.42 no.6 / pp.699-706 / 2015
  • In recent years, the spread of the Internet and of image-editing technology that makes image content easy to access has caused an explosive increase in near-duplicate images, and related research has been active. However, BoF (Bag-of-Features), the most frequently used method for near-duplicate image detection, can mistake identical features for different ones, or different features for identical ones, during the quantization process that approximates high-level local features with low-level representations. A post-verification step for BoF is therefore required to overcome the limitations of vector quantization. In this paper, we proposed a post-verification method for BoF that converts SIFT (Scale Invariant Feature Transform) descriptors into 128-bit binary codes and uses those codes to compare binary distances within the short ranked list returned by BoF, and we analyzed its performance. In an experiment using 1,500 original images, near-duplicate detection accuracy improved by approximately 4% over the previous BoF method.
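The post-verification idea (binarize each 128-dimensional SIFT descriptor, then compare codes by binary distance within the BoF short list) can be sketched as follows. The abstract does not give the binarization rule, so thresholding each dimension at the descriptor's own median is an assumption, as are the Hamming-distance threshold `max_distance`, the match-count score, and all function names. Treat this as an illustration of the approach rather than the authors' method.

```python
def binarize_descriptor(desc):
    """128-D SIFT descriptor -> 128-bit int; bit i set when desc[i] exceeds the descriptor median."""
    if len(desc) != 128:
        raise ValueError("expected a 128-dimensional SIFT descriptor")
    ordered = sorted(desc)
    median = (ordered[63] + ordered[64]) / 2.0
    code = 0
    for i, value in enumerate(desc):
        if value > median:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")


def match_count(query_codes, candidate_codes, max_distance=20):
    """Query descriptors that have a candidate descriptor within max_distance bits."""
    return sum(
        1 for q in query_codes if any(hamming(q, c) <= max_distance for c in candidate_codes)
    )


def rerank(query_codes, shortlist, max_distance=20):
    """Re-order (image_id, codes) pairs from the BoF short list, best-verified first."""
    return sorted(
        shortlist,
        key=lambda item: match_count(query_codes, item[1], max_distance),
        reverse=True,
    )
```

Only the short list is re-ranked, so the cost of the bitwise comparisons stays small relative to the BoF retrieval itself.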

Genetic Diversity Based on Morphology and RAPD Analysis in Vegetable Soybean

  • Srinives, P.; Chowdhury, A.K.; Tongpamnak, P.; Saksoong, P.
    • Korean Journal of Crop Science / v.46 no.2 / pp.112-120 / 2001
  • The genetic diversity of 47 East-Asian vegetable soybean varieties was characterized using agro-morphological traits and RAPD markers. A field trial was conducted to evaluate 14 agro-morphological traits. For RAPD-based DNA analysis, a total of sixty 10-mer random primers were screened, of which 23 produced polymorphic markers in the 16 varieties used for screening. Of the 207 markers amplified, 48 were polymorphic for at least one pairwise comparison among the 47 varieties. A higher level of differentiation between varieties was observed with RAPD markers than with morphological markers. Correspondence analysis using both types of markers showed that the RAPD data could fully discriminate between all varieties, whereas the morphological markers could not. Genetic distances between varieties, estimated from simple matching coefficients, ranged from 0.0 to 0.640 (average 0.295 $\pm$ 0.131) for morphological traits and from 0.042 to 0.625 (average 0.336 $\pm$ 0.099) for RAPD data. Cluster analysis based on the genetic dissimilarity of these varieties gave rise to 4 distinct groups. The clustering based on RAPDs did not match that based on morphological traits, and the geographical distribution of most varieties within each group was not well defined. The results suggest that the level of genetic diversity within this group of East-Asian vegetable soybean varieties is sufficient for a breeding program, and that this information can be used to establish genetic relationships among varieties with unknown or unrelated pedigrees.
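A simple matching coefficient (SMC) is the share of scored positions at which two varieties agree, and the genetic distance reported here is 1 - SMC. The sketch below computes that distance for every pair of varieties. The variety names and band profiles are hypothetical, and the subsequent cluster analysis (whose linkage method the abstract does not name) is omitted. The same function works for categorical morphological traits, since it only tests equality.

```python
from itertools import combinations


def simple_matching_distance(a, b):
    """1 - SMC, where SMC = (positions where a and b agree) / (total positions)."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - matches / len(a)


def pairwise_distances(profiles):
    """Distance for every unordered pair of varieties in a {name: profile} dict."""
    return {
        (p, q): simple_matching_distance(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


# Hypothetical RAPD band profiles (1 = band present, 0 = absent).
bands = {"V1": [1, 0, 1, 1], "V2": [1, 1, 1, 0], "V3": [1, 0, 1, 1]}
print(pairwise_distances(bands))
```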

Design of RBFNN-based Emotional Lighting System Using RGBW LED (RGBW LED 이용한 RBFNN 기반 감성조명 시스템 설계)

  • Lim, Sung-Joon; Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers / v.62 no.5 / pp.696-704 / 2013
  • In this paper, we introduce an LED emotional lighting system realized with the aid of an intelligent algorithm and RGB LEDs combined with white LEDs. Illumination is generally recognized not only as a means of lighting a specific space but also as a design factor in shaping living spaces, affecting human emotion and behavior within the lit space. The LED emotional lighting system, which can express an emotional atmosphere as well as control the quantity of light, is designed using RGB LEDs to form the emotional mood and white (W) LEDs to provide a sufficient amount of light. An RBFNN (radial basis function neural network) is used as the intelligent algorithm; the network model, designed with LED control parameters (the color coordinates x and y, related to color temperature, and lux as inputs, and the RGBW currents as outputs), plays an important role in building an LED emotional lighting system that obtains an appropriate color space. Unlike conventional RBFNNs, the Fuzzy C-Means (FCM) clustering method is used to obtain the fitness values of the receptive functions, and the connection weights of the consequent part of the network are expressed as polynomial functions. The parameters of the RBFNN model are optimized using PSO (Particle Swarm Optimization). The proposed LED emotional lighting system can save energy by using LED light sources and can improve work and learning performance by creating an adequate mood under diverse surrounding conditions.
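In an FCM-based RBFNN, the receptive-field activations are Fuzzy C-Means memberships rather than Gaussian kernels. The sketch below computes the standard FCM membership of an input in each cluster centre and combines linear (first-order polynomial) consequents weighted by those memberships. The centres, coefficients, fuzzifier m = 2 and the single output are assumptions for illustration. The paper's model has three inputs (x, y, lux), four outputs (RGBW currents) and PSO-tuned parameters, none of which are reproduced here.

```python
import math


def fcm_memberships(x, centers, m=2.0):
    """Fuzzy C-Means memberships u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); they sum to 1."""
    dists = [math.dist(x, c) for c in centers]
    for i, d in enumerate(dists):
        if d == 0.0:  # input sits exactly on a centre: full membership there
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]


def rbfnn_output(x, centers, coeffs, m=2.0):
    """Membership-weighted sum of linear consequents f_i(x) = a_i0 + sum_k a_ik * x_k."""
    u = fcm_memberships(x, centers, m)
    return sum(
        ui * (a[0] + sum(ak * xk for ak, xk in zip(a[1:], x)))
        for ui, a in zip(u, coeffs)
    )


# Hypothetical 2-input example (e.g. chromaticity x, y) with two receptive fields.
centers = [(0.31, 0.32), (0.45, 0.41)]
coeffs = [(10.0, 5.0, 5.0), (20.0, -2.0, 4.0)]
print(rbfnn_output((0.35, 0.35), centers, coeffs))
```

A four-channel RGBW output would use four coefficient sets sharing the same memberships.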

Anomaly Detection in Sensor Data

  • Kim, Jong-Min; Baik, Jaiwook
    • Journal of Applied Reliability / v.18 no.1 / pp.20-32 / 2018
  • Purpose: The purpose of this study is to establish anomaly detection criteria for sensor data from a motorcycle. Methods: Five sensor values (accelerator pedal, engine rpm, transmission rpm, gear, and speed) were recorded from a motorcycle every 0.02 seconds. Exploratory data analysis was used to find patterns in the data. Traditional process control methods, such as the X control chart, and time series models were fitted to find anomalous behavior in the data. Finally, unsupervised learning algorithms such as k-means clustering were used to find anomalous spots in the sensor data. Results: According to the exploratory data analysis, the distribution of accelerator pedal sensor values is strongly skewed to the left, and the motorcycle appears to have been driven in a city at speeds below 45 km/h. Traditional control charts such as the X control chart fail because of severe autocorrelation in each sensor's data. However, the ARIMA model found three abnormal points that lie beyond the 2-sigma limits of the control chart. We also applied a copula-based Markov chain to perform statistical process control for correlated observations; the copula-based Markov model found anomalous behavior at locations similar to those found by the ARIMA model. With the unsupervised learning algorithm, large sensor values were subdivided into two, three, and four disjoint regions, so extreme sensor values are the ones to track for any sign of anomalous behavior. Conclusion: Exploratory data analysis is useful for finding patterns in the sensor data. Process control charts based on ARIMA and on Joe's copula-based Markov model both give warnings at similar places in the data. The unsupervised learning algorithm shows that extreme sensor values are the ones to track for signs of anomalous behavior.
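Because raw sensor readings are strongly autocorrelated, the abstract charts ARIMA residuals instead of the raw X values. The sketch below uses the simplest case, an AR(1) model fitted by least squares, and flags observations whose residual lies outside plus or minus 2 standard deviations, mirroring the 2-sigma limits mentioned above. The AR(1) order, the function name and the simulated series with one injected spike are assumptions. The study's fitted ARIMA orders and the copula-based Markov chart are not reproduced.

```python
import random
import statistics


def ar1_residual_alarms(x, k_sigma=2.0):
    """Fit x_t = c + phi * x_{t-1} by least squares; return t with |residual| > k_sigma * sd."""
    prev, curr = x[:-1], x[1:]
    mean_prev, mean_curr = statistics.fmean(prev), statistics.fmean(curr)
    phi = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr)) / sum(
        (a - mean_prev) ** 2 for a in prev
    )
    c = mean_curr - phi * mean_prev
    residuals = [b - (c + phi * a) for a, b in zip(prev, curr)]
    sd = statistics.pstdev(residuals)  # least-squares residuals have mean zero
    # residuals[i] belongs to observation t = i + 1
    return [i + 1 for i, r in enumerate(residuals) if abs(r) > k_sigma * sd]


# Simulated autocorrelated "sensor" series with one injected spike at t = 250.
rng = random.Random(1)
series = [0.0]
for _ in range(499):
    series.append(0.8 * series[-1] + rng.gauss(0.0, 1.0))
series[250] += 15.0
alarms = ar1_residual_alarms(series)
print(250 in alarms)
```

With 2-sigma limits, about 5% of in-control residuals also fall outside the limits under normality, so the alarm list contains more than the injected spike and each alarm still needs inspection.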