Similarity Analysis of Hospitalization using Crowding Distance

  • Received : 2016.03.16
  • Accepted : 2016.04.15
  • Published : 2016.06.30


As big data and data mining come into wider use, it is worth understanding how such techniques can reveal relationships in the healthcare field. This study uses hierarchical methods of data analysis to explore similarities in hospitalization across several counties in New York State. Crowding distance is measured on age-specific hospitalization periods, where crowding distance is defined as the longest distance, that is, the least similarity, between cities. Clinton is expected to have the greatest distance, while Albany and the other cities lie closer together because they are joined at the shortest distance at each step. Similarities were stronger when hospital stays were categorized by age. Hierarchical clustering, combined with the measurement of crowding distance, can be applied to predict the similarity of hospitalization data across the 10 cities. To enhance the performance of hierarchical clustering, text is first converted to an attribute vector and crowding distance is applied, after which the congestion distances can be compared. The measure of similarity between two objects depends on the measurement method used in clustering, and similarity is distinguished from distance: the smaller the distance value, the more similar the two objects are to one another. With this technique, the crowding distance decreased consistently as the similarity between the data increased, which improved the performance of the experiments. Furthermore, the similarity of hospitalization periods by city can serve as a reference when hospital wards are built in a city: the experimental results can help predict the required scale of hospital facilities and the expected length of stay, which should be useful for managing patients efficiently in similar areas.
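
The clustering idea described in the abstract can be illustrated with a minimal sketch. It assumes single-linkage agglomerative clustering with Euclidean distance, which is one reading of "joined at the shortest distance at each step" and not necessarily the authors' exact procedure. The function name `single_linkage`, the labels "City B" and "City C", and all length-of-stay values are hypothetical and invented for illustration; they are not the study's data.

```python
from itertools import combinations
from math import dist

# Hypothetical mean length of stay (days) for three age groups per city.
# These numbers are invented for illustration; they are NOT the study's data.
stays = {
    "Albany": (3.0, 4.0, 6.0),
    "City B": (3.2, 4.1, 6.3),
    "City C": (2.9, 4.4, 5.8),
    "Clinton": (5.5, 7.0, 9.5),
}


def single_linkage(points):
    """Merge the two closest clusters at each step until one remains.

    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = [frozenset([name]) for name in points]
    history = []

    def gap(pair):
        # Single linkage: cluster distance is the smallest
        # member-to-member Euclidean distance.
        a, b = pair
        return min(dist(points[x], points[y]) for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=gap)
        history.append((sorted(a), sorted(b), gap((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history


merges = single_linkage(stays)
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d:.2f}")
```

In this toy data the city with the most dissimilar age profile joins the hierarchy last and at the largest distance, which mirrors the role the abstract assigns to Clinton. A smaller merge distance means the merged cities have more similar hospitalization periods.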

