• Title/Summary/Keyword: Local clustering

A Time Series Forecasting Model with the Option to Choose between Global and Clustered Local Models for Hotel Demand Forecasting (호텔 수요 예측을 위한 전역/지역 모델을 선택적으로 활용하는 시계열 예측 모델)

  • Keehyun Park;Gyeongho Jung;Hyunchul Ahn
    • The Journal of Bigdata
    • /
    • v.9 no.1
    • /
    • pp.31-47
    • /
    • 2024
  • With the advancement of artificial intelligence, the travel and hospitality industry is adopting AI and machine learning technologies for various purposes. In the tourism industry, demand forecasting is recognized as a very important factor, as it directly affects service efficiency and revenue maximization. Demand forecasting must account for time-varying data flows, which is why statistical techniques and machine learning models are used. In recent years, variations and combinations of existing models have been studied to account for the diversity of demand forecasting data and the complexity of the natural world, and they have been reported to improve forecasting performance with respect to uncertainty and variability. This study likewise proposes a new model that integrates various machine learning approaches to improve the accuracy of hotel sales demand forecasting. Specifically, it proposes a new XGBoost-based time series forecasting model that selectively uses either a local model, built by clustering with DTW K-means, or a global model trained on the entire data to improve forecasting performance. The proposed hotel demand forecasting model, which selectively uses global and local models, is expected to have a positive impact on the growth of the hotel and travel industry and can be applied to forecasting in other business fields in the future.
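The abstract above groups series with DTW K-means before fitting local models. As a minimal sketch of the distance that such clustering relies on, the following computes a dynamic time warping (DTW) distance and assigns a series to its nearest centroid. The function names, the absolute-difference local cost, and the toy centroids are assumptions for illustration; the paper's actual implementation is not described in the abstract.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Assumption: the local cost is the absolute difference |a[i] - b[j]|.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def assign_to_cluster(series, centroids):
    """Index of the centroid with the smallest DTW distance to the series."""
    return min(range(len(centroids)),
               key=lambda k: dtw_distance(series, centroids[k]))
```

Because DTW allows one sequence to stretch against the other, `[1, 2, 3]` and `[1, 2, 2, 3]` are at distance zero, which is why it is often preferred to pointwise distance when demand curves share a shape but are shifted in time.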

An Analysis on the Spatial Spillover Patterns of Aging Population in Rural Areas (공간자기상관을 활용한 농촌지역 인구 고령화의 공간적 확산 분석)

  • Yeo, Chang-Hwan;Seo, Yun-Hee
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.17 no.3
    • /
    • pp.39-53
    • /
    • 2014
  • The Korean population is aging rapidly, and a disproportionate share of older people (aged 65 and older) live in rural areas. The rural population is aging more rapidly than the urban population. However, the majority of studies on population aging focus on urban areas rather than rural areas, and rural areas have been left out of national policy priorities. For these reasons, this study shows the level of population aging and analyzes the spatial spillover patterns of the aging population in rural areas, in order to support localized policy on population aging. The main findings can be summarized as follows. First, the level of population aging varies across localities according to socio-economic and locational characteristics. Second, there are distinct differences in spatio-temporal spillover patterns between hot spot regions (clusters of high aging index) and cold spot regions (clusters of low aging index). Through these results, this study intends to provide useful information for establishing area-specific policy on population aging.

Railway Track Extraction from Mobile Laser Scanning Data (모바일 레이저 스캐닝 데이터로부터 철도 선로 추출에 관한 연구)

  • Yoonseok, Jwa;Gunho, Sohn;Jong Un, Won;Wonchoon, Lee;Nakhyeon, Song
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.33 no.2
    • /
    • pp.111-122
    • /
    • 2015
  • This study introduces a new automated solution for detecting railway tracks and reconstructing track models from mobile laser scanning data. The proposed solution proceeds as follows. First, a potential railway region, called the Region Of Interest (ROI), is detected and the orientation of the railway track trajectory is approximated from the raw data. Next, knowledge-based detection of railway tracks is performed to localize track candidates in the first strip. Here, a strip, meaning the local track search region, is generated in the direction orthogonal to the track trajectory. Lastly, an initial track model generated over the candidate points, which are detected by GMM-EM (Gaussian Mixture Model-Expectation Maximization) based clustering, grows strip-wise to capture all track points of interest and is thus converted into a geometric track model in a tracking-by-detection framework. The proposed track tracking process has the following key features. It reduces the complexity of detecting track points by using a hypothetical track model. It also improves the efficiency of track modeling by simultaneously capturing track points and modeling tracks, which minimizes data processing time and cost. The proposed method was developed in C++ and evaluated on LiDAR data acquired from an MMS over an urban railway area with a complex railway scene.

Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model (굼벨 분포 모델을 이용한 표절 프로그램 자동 탐색 및 추적)

  • Ji, Jeong-Hoon;Woo, Gyun;Cho, Hwan-Gue
    • The KIPS Transactions:PartA
    • /
    • v.16A no.6
    • /
    • pp.453-462
    • /
    • 2009
  • Studies on software plagiarism detection, prevention, and judgement have become widespread due to growing interest in, and the importance of, protecting and authenticating software intellectual property. Many previous studies focused on comparing all pairs of submitted codes using attribute counting, token patterns, program parse trees, and similarity measuring algorithms. It is important to provide a clear-cut model for distinguishing plagiarism from collaboration. This paper proposes a source code clustering algorithm using a probability model based on an extreme value distribution. First, we propose an asymmetric distance measure pdist($P_a$, $P_b$) to measure the similarity of $P_a$ and $P_b$. Then, we construct the Plagiarism Direction Graph (PDG) for a given program set using pdist($P_a$, $P_b$) as edge weights. We then transform the PDG into a Gumbel Distance Graph (GDG) model, since we found that the pdist($P_a$, $P_b$) score distribution is similar to the well-known Gumbel distribution. Second, we newly define pseudo-plagiarism, a kind of virtual plagiarism forced by a very strong functional requirement in the specification. We conducted experiments with 18 groups of programs (more than 700 source codes) collected from the ICPC (International Collegiate Programming Contest) and KOI (Korean Olympiad for Informatics) programming contests. The experiments showed that most plagiarized codes could be detected with high sensitivity and that our algorithm successfully separated real plagiarism from pseudo-plagiarism.

Pattern Analysis for Urban Spatial Distribution of Traffic Accidents in Jinju (진주시 교통사고의 도시공간분포패턴 분석)

  • Sung, Byeong Jun;Yoo, Hwan Hee
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.22 no.3
    • /
    • pp.99-105
    • /
    • 2014
  • Since traffic accidents, along with fire, account for the highest proportion of man-made disasters in urban areas, a more scientific analysis of the causes of traffic accidents and various prevention measures are needed. This study selected Jinju-si, a small-to-medium-sized provincial city, as its research target and analyzed the temporal and spatial distribution of traffic accidents by associating data on traffic accidents that occurred in 2013 with accident causes and location information, including occurrence time and seasonal features. It then examined the spatial correlation between traffic accidents and the characteristics of urban space development according to land use plans. As a result, the distribution by accident type reveals that side right-angle collisions (car versus car) and pedestrian-crossing accidents (car versus pedestrian) showed the highest clustering in the density analysis and the average nearest neighbor analysis. In particular, traffic accidents occurred most often on roads that connect central commercial areas, high-density residential areas, and industrial areas. In addition, accidents involving human injury (by damage type), clear days (by weather), dry surfaces (by road condition), and three-way intersections (by road type) showed the highest clustering.
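The average nearest neighbor analysis mentioned above is commonly summarized as a ratio of the observed mean nearest-neighbor distance to the distance expected under complete spatial randomness. The sketch below assumes that standard formulation, with expected distance 0.5 / sqrt(n / A); the function name and the sample coordinates are hypothetical, and the abstract does not state which software or edge handling the authors used.

```python
import math


def average_nearest_neighbor_ratio(points, area):
    """Observed / expected mean nearest-neighbor distance.

    Values below 1 suggest clustering; values above 1 suggest dispersion.
    Assumes at least two points and no edge correction.
    """
    n = len(points)
    # Observed: mean distance from each point to its nearest other point.
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected under complete spatial randomness with density n / area.
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


# Hypothetical accident locations packed into a corner of a 10 x 10 study area.
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
ratio = average_nearest_neighbor_ratio(clustered, area=100.0)
```

For the tightly packed sample points the ratio falls well below 1, which is the pattern the abstract describes as "highest clustering".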

The Design of Polynomial Network Pattern Classifier based on Fuzzy Inference Mechanism and Its Optimization (퍼지 추론 메커니즘에 기반 한 다항식 네트워크 패턴 분류기의 설계와 이의 최적화)

  • Kim, Gil-Sung;Park, Byoung-Jun;Oh, Sung-Kwun
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.17 no.7
    • /
    • pp.970-976
    • /
    • 2007
  • In this study, a Polynomial Network Pattern Classifier (PNC) based on a fuzzy inference mechanism is designed, and its parameters, such as the learning rate, momentum coefficient, and fuzzification coefficient, are optimized by means of Particle Swarm Optimization. The proposed PNC employs a partition function created by Fuzzy C-means (FCM) clustering as the activation function in the hidden layer and polynomial weights between the hidden layer and the output layer. Using polynomial weights helps to improve on the linear classification characteristic of basic neural network classifiers. From the viewpoint of linguistic analysis, the proposed classifier is expressed as a collection of "If-then" fuzzy rules. Namely, the network architecture consists of three functional modules: the condition part, the conclusion part, and the inference part. The condition part corresponds to the partition function of the input space using FCM clustering. In the conclusion part, a polynomial function represents each partitioned local space. Lastly, the network output is obtained by fuzzy inference in the inference part. Because of its polynomial-based fuzzy inference, the proposed PNC generates a nonlinear discriminant function in the output space and achieves better pattern classification performance as a classifier.
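The FCM partition function used as the hidden-layer activation can be illustrated with the standard fuzzy C-means membership formula, u_i = 1 / Σ_j (d_i / d_j)^(2/(m-1)), where m is the fuzzification coefficient. The sketch below is a one-dimensional illustration under that assumption; the function name and the example centers are hypothetical and are not taken from the paper.

```python
def fcm_memberships(x, centers, m=2.0):
    """Fuzzy C-means membership degrees of a scalar x in each cluster center.

    m > 1 is the fuzzification coefficient; larger m gives softer memberships.
    """
    dists = [abs(x - c) for c in centers]
    # If x coincides with one or more centers, it belongs fully to them.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


# Memberships of x = 0.5 with respect to centers at 0 and 2.
u = fcm_memberships(0.5, [0.0, 2.0])
```

The memberships always sum to 1, so each hidden node's activation can be read as the degree to which the input falls in that node's local region of the input space.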

Classification of Clusters, Characteristics and Related Factors according to Drinking, Smoking, Exercising and Nutrition among Korean Adults (한국 성인의 음주, 흡연, 운동 및 영양행태에 대한 군집별 특성 및 관련요인)

  • Kim, Kkot-byeol;Eun, Sang Jun
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.20 no.5
    • /
    • pp.252-266
    • /
    • 2019
  • The purpose of this study was to identify the types of health behaviors among Korean adults and to identify related factors. The data used in the analysis were from the Korea Health and Nutrition Examination Survey 2014, which is representative of the Korean population. Cluster analysis was used to find patterns of clustering of smoking, drinking, exercising, and nutrition. Differences in the clustering patterns were examined first by bivariate chi-square tests and then by multinomial logit regression. Lastly, the association between the clusters of health behaviors and other behavioral risk factors was tested by chi-square tests and logistic regression. The distribution of the clusters varied not only across socioeconomic characteristics and local size, but also between individuals with certain chronic diseases and those without. The results of this study can serve as a basis for approaching health behaviors as clusters rather than individually.

New Soil Classification System Using Cone Penetration Test (콘관입시험결과를 이용한 새로운 흙분류 방법의 개발)

  • Kim, Chan-Hong;Im, Jong-Chul;Kim, Young-Sang;Joo, No-Ah
    • Journal of the Korean Geotechnical Society
    • /
    • v.24 no.10
    • /
    • pp.57-70
    • /
    • 2008
  • The advantage of the piezocone penetration test is that it guarantees continuous data, which supports reliable interpretation of the target soil layer. Much research has been carried out for several decades, and several classification charts have been developed to classify in-situ soil from cone penetration test results. Since most existing classification charts or methods were developed from data compiled around the world, excluding Korea, their feasibility for Korean soil should be verified. Furthermore, the charts sometimes give different soil classification results depending on the input parameters. Unfortunately, revising those charts is quite difficult or almost impossible. In this research, a new soil classification model is proposed using fuzzy C-means clustering and neuro-fuzzy theory, based on 5371 CPT results and soil logging results compiled from 17 local sites around Korea. The proposed neuro-fuzzy soil classification model was verified by comparing its classification results for new data, which were not used during the learning process of the neuro-fuzzy model, with real soil logs. The efficiency of the proposed neuro-fuzzy model was compared with other soft computing classification models and the Robertson method on the new data.

Research on Characterizing Urban Color Analysis based on Tourists-Shared Photos and Machine Learning - Focused on Dali City, China - (관광객 공유한 사진 및 머신 러닝을 활용한 도시 색채 특성 분석 연구 - 중국 대리시를 대상으로 -)

  • Yin, Xiaoyan;Jung, Taeyeol
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.52 no.2
    • /
    • pp.39-50
    • /
    • 2024
  • Color is an essential visual element that has a significant impact on the formation of a city's image and people's perceptions. Quantitative analysis of color in urban environments is a complex process that has been difficult to implement in the past. However, with recent rapid advances in machine learning, it has become possible to analyze city colors using photos shared by tourists. This study selected Dali City, a popular tourist destination in China, as a case study. Photos of Dali City shared by tourists were collected, and a method to measure large-scale city colors was explored by combining machine learning techniques. Specifically, the DeepLabv3+ model was first applied to perform semantic segmentation of the tourist-shared photos based on the ADE20k dataset, thereby separating the artificial elements in the photos. Next, the K-means clustering algorithm was used to extract colors from the artificial elements in Dali City, and an adjacency matrix was constructed to analyze the correlations between the dominant colors. The results indicate that orange-gray accounts for the highest percentage among the main colors of the artificial elements in Dali City. Furthermore, gray tones are often used in combination with other colors. The results also indicate that local ethnic and Buddhist cultures influence the color characteristics of artificial elements in Dali City. This research provides a new method of color analysis, and the results not only help Dali City to shape an urban color image that meets the expectations of tourists but also provide reference material for future urban color planning in Dali City.

An Effect of Aggregation of Point Features to Areal Units on K-Index (점사상의 지역단위 집계가 K-지표에 미치는 영향)

  • Lee Byoung-Kil
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.24 no.1
    • /
    • pp.131-138
    • /
    • 2006
  • Recently, data gathering and algorithm development have been in progress for GIS applications using point features. Several studies have shown that it is possible to verify spatial clustering and to evaluate inter-dependencies between events and controls. On the other hand, most point features used as GIS data are gathered by indirect methods, such as address geocoding, rather than by direct methods, such as field surveying. Furthermore, many statistics compiled by administrative district from point features carry no coordinate information for the points. In this study, to evaluate the effect of aggregating raw data on the K-index, the K-index is calculated in a GIS environment, and K-indices estimated from raw data (parcel unit), topographically aggregated data (block unit), and administratively aggregated data (administrative district unit) are compared and evaluated. As a result, a point feature that is highly clustered in a local area is largely distorted when aggregated administratively. However, the K-indices of topographically aggregated data are very similar to those of the raw data.
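The abstract does not define its K-index. Assuming it refers to Ripley's K function, a naive estimate without edge correction is K(r) = A / (n(n-1)) × (number of ordered point pairs within distance r). The sketch below implements that estimator; the function name and the sample points are hypothetical.

```python
import math


def ripley_k(points, radius, area):
    """Naive Ripley's K estimate at one radius (no edge correction).

    Under complete spatial randomness K(r) is about pi * r**2;
    larger values indicate clustering at that scale.
    """
    n = len(points)
    # Count ordered pairs (i, j), i != j, lying within the given radius.
    pairs = sum(
        1
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i != j and math.dist(p, q) <= radius
    )
    return area * pairs / (n * (n - 1))


# Hypothetical point features at the corners of a unit square in a 2 x 2 area.
corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
k_at_1 = ripley_k(corners, radius=1.0, area=4.0)
```

Aggregating points to areal units replaces each coordinate with a representative location for its unit, which changes the pair distances this estimator counts. That is one way to see why coarse administrative aggregation can distort the index more than block-level aggregation.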