• Title/Summary/Keyword: over-clustering

Two Paths of Korea's Clustering: Centralized De-concentration and Regionalized Concentration

  • Lee, Shi-Chul
    • World Technopolis Review / v.1 no.2 / pp.129-140 / 2012
  • This paper presents, from a broad perspective, how various types of clusters and options for regional development have evolved in Korea over the past decade, with particular emphasis on who has taken the initiative in establishing the clusters. Characterized by both progress and setbacks, two distinctive patterns have emerged: centralized de-concentration and regionalized concentration. Both the Korean government and numerous localities have continuously made efforts to create different clusters, technology parks, special districts, etc. In many cases, local or regional governments have competed intensely for clusters to be located in their jurisdictions; in particular, concerted efforts to convince the national government to set up special districts have been witnessed. On the other hand, major localities have made their own efforts to generate large- and small-scale clustering projects. It remains to be seen how the outcomes and effectiveness of these two approaches will differ in the future. Following a review of the relevant literature and practices, I examine the well-known national campaign and projects of the previous administration in Korea in the context of 'de-concentration' of economic values and resources. Thereafter, other cases initiated mostly by local governments are discussed; some of these clustering efforts and regional projects have fared well thus far, but some have not. In the case of Daegu, the progress of some critical projects, such as the Daegu Technopolis and a Free Economic Zone, is elaborated.

Learning Algorithm using a LVQ and ADALINE (LVQ와 ADALINE을 이용한 학습 알고리듬)

  • 윤석환;민준영;신용백
    • Journal of Korean Society of Industrial and Systems Engineering / v.19 no.39 / pp.47-61 / 1996
  • We propose a parallel neural network model in which patterns are clustered and the patterns in each cluster are learned by a parallel neural network. The learning algorithm used in this paper is based on the LVQ algorithm of Kohonen (1990) for clustering and the ADALINE (Adaptive Linear Neuron) network of Widrow and Hoff (1990) for parallel learning. The proposed algorithm consists of two parts. First, the N patterns to be learned are categorized into C clusters by the LVQ clustering algorithm. Second, C patterns that were selected from each of the C clusters are learned as input patterns of the ADALINE. The data used in this paper consist of 250 patterns of ASCII characters normalized into $8\times16$ and 1124 samples acquired from signals generated by 9 car models that passed an Inductive Loop Detector (ILD) at 10 points. In the ASCII character experiment, 191 (179) out of 250 patterns are recognized with 3% (5%) noise. With the 1124 car model samples, 807 were recognized, a 71.8% recognition ratio. This result is a 10.2% improvement over the backpropagation algorithm.
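
The two-stage idea in this abstract (cluster first, then train a linear unit on cluster representatives) can be illustrated with a minimal sketch. It is not the authors' implementation: the clustering step below is a label-free winner-take-all prototype update standing in for LVQ, and the data, learning rates, epoch counts, and function names are all assumptions chosen for illustration.

```python
# Sketch only: winner-take-all prototype clustering followed by an ADALINE (LMS) fit.
# Data, learning rates, and function names are illustrative assumptions.

def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    dists = [sum((p - v) ** 2 for p, v in zip(proto, x)) for proto in prototypes]
    return dists.index(min(dists))

def cluster(patterns, prototypes, rate=0.3, epochs=20):
    """Move the winning prototype toward each pattern (unsupervised update)."""
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x in patterns:
            w = nearest(protos, x)
            protos[w] = [p + rate * (v - p) for p, v in zip(protos[w], x)]
    return protos

def train_adaline(inputs, targets, rate=0.1, epochs=200):
    """Widrow-Hoff (LMS) rule: w += rate * (target - w.x) * x, with a bias input."""
    weights = [0.0] * (len(inputs[0]) + 1)
    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            xb = list(x) + [1.0]
            error = t - sum(w * v for w, v in zip(weights, xb))
            weights = [w + rate * error * v for w, v in zip(weights, xb)]
    return weights

def predict(weights, x):
    """Threshold the linear output at zero to get a class in {-1, +1}."""
    xb = list(x) + [1.0]
    return 1 if sum(w * v for w, v in zip(weights, xb)) >= 0 else -1

patterns = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9)]
protos = cluster(patterns, prototypes=[(0.2, 0.2), (0.8, 0.8)])
# One representative per cluster becomes an ADALINE training input.
weights = train_adaline(protos, targets=[-1, 1])
labels = [predict(weights, x) for x in patterns]
```

Training the linear unit on two prototypes instead of four patterns is the point of the design: the clustering stage shrinks the training set that the second stage has to learn.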

The Joint analysis of galaxy clustering and weak lensing from the Deep Lens Survey to constrain cosmology and baryonic feedback

  • Yoon, Mijin;Jee, M. James;Tyson, J. Tony
    • The Bulletin of The Korean Astronomical Society / v.44 no.1 / pp.79.2-79.2 / 2019
  • Based on three types of 2-point statistics (galaxy clustering, galaxy-galaxy lensing, and cosmic shear power spectra) from the Deep Lens Survey (DLS), we constrain cosmology and baryonic feedback. The DLS is a deep survey, often described as a precursor to LSST, reaching down to ~27th magnitude in BVRz' over 20 deg$^2$. To measure the three power spectra, we choose two lens galaxy populations centered at z ~0.27 and 0.54 and two source galaxy populations centered at z ~0.64 and 1.1, with more than 1 million galaxies. We perform a number of consistency tests to confirm the reliability of the measurements. We calibrated the photo-z estimation of the lens galaxies and validated the result with a galaxy cross-correlation measurement. The B-mode signals, indicative of potential systematics, are found to be consistent with zero. The two cosmological results independently obtained from the cosmic shear and the galaxy clustering + galaxy-galaxy lensing measurements agree well with each other. We also verify that the cosmological results from bright and faint sources are consistent. While some weak lensing surveys show a tension with Planck, the DLS constraint on $S_8$ agrees well with the Planck result. Using the HMcode approach derived from the OWLS simulation, we constrain the strength of baryonic feedback. The DLS results hint at the possibility that the actual AGN feedback may be stronger than the one implemented in the current state-of-the-art simulations.

Analysis of News Articles on Child Welfare Policies in South Korea: K-Means Clustering (대한민국 정권별 아동복지정책 관련 뉴스 기사 분석: K-평균 군집 분석)

  • Kim, Eun Joo;Kim, Seong Kwang;Park, Bit Na
    • Journal of East-West Nursing Research / v.29 no.2 / pp.185-195 / 2023
  • Purpose: The purpose of this study is to analyze changes in child welfare policies and provide insights based on the collection and classification of newspaper articles. Methods: Articles related to child welfare policies were collected from 1990, during the Kim, Young-sam administration, to May 9, 2022, under the Moon, Jae-in administration. K-Means clustering and keyword Term Frequency-Inverse Document Frequency analysis were used to cluster and analyze newspaper articles with similar themes. Results: The administrations of Kim, Young-sam, Kim, Dae-jung, Roh, Moo-hyun, and Park, Geun-hye were classified into two clusters, and the Lee, Myung-bak and Moon, Jae-in administrations were classified into three clusters. Conclusion: South Korea's child welfare policies have focused on ensuring the safety and healthy development of children through diverse policy initiatives over the years. However, challenges related to child protection and child abuse persist. Addressing them requires additional resources and budget allocation. It is important to establish a comprehensive support system for children and families, including comprehensive nursing support.
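
A rough picture of how TF-IDF weighting and K-Means fit together is given below. This is only a sketch using the Python standard library: the four-document corpus is invented, the TF-IDF variant (relative term frequency times log inverse document frequency) and the fixed starting centroids are assumptions, and the study's own tooling is not stated in the abstract.

```python
import math
from collections import Counter

# Toy corpus: hypothetical article snippets, not data from the study.
docs = [
    "child abuse protection law",
    "child abuse report protection",
    "childcare subsidy budget",
    "childcare subsidy family budget",
]

tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
doc_freq = Counter(w for doc in tokenized for w in set(doc))

def tfidf(doc):
    """Relative term frequency times log inverse document frequency, per vocab word."""
    counts = Counter(doc)
    return [counts[w] / len(doc) * math.log(len(docs) / doc_freq[w]) for w in vocab]

vectors = [tfidf(doc) for doc in tokenized]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations: assign to the nearest centroid, then recompute means."""
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                  for p in points]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Deterministic start (an assumption): seed centroids with the first and third documents.
labels = kmeans(vectors, [list(vectors[0]), list(vectors[2])])
```

With this toy input the two abuse-related snippets end up in one cluster and the two subsidy-related snippets in the other.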

Classification of Land Cover over the Korean Peninsula using MODIS Data (MODIS 자료를 이용한 한반도 지면피복 분류)

  • Kang, Jeon-Ho;Suh, Myoung-Seok;Kwak, Chong-Heum
    • Atmosphere / v.19 no.2 / pp.169-182 / 2009
  • Interest in land-atmosphere schemes has steadily increased in recent years as a way to improve the performance of climate and numerical models. For a realistic calculation of land-atmosphere interaction, high-quality land surface information is strongly required. In this study, a new land cover map over the Korean peninsula was developed using MODIS (MODerate resolution Imaging Spectroradiometer) data. A set of seven phenological variables (maximum, minimum, amplitude, average, growing period, growing rate, and shedding rate) derived from the 15-day normalized difference vegetation index (NDVI) was used as the basic input data. ISODATA (Iterative Self-Organizing Data Analysis), an unsupervised non-hierarchical clustering method, was applied to the seven phenological variables. After the clustering, a land cover type was assigned to each cluster according to the phenological characteristics of each land cover defined by the USGS (U.S. Geological Survey). Most of the Korean peninsula is occupied by deciduous broadleaf forest (46.5%), mixed forest (15.6%), and dryland crop (13%). In contrast, the dominant land cover types in South Korea are very diverse: evergreen needleleaf forest (29.9%), mixed forest (26.6%), deciduous broadleaf forest (16.2%), irrigated crop (12.6%), and dryland crop (10.7%). A database of 38 in-situ observations over South Korea, the Environment Geographic Information System, and Google Earth were used to validate the new land cover map. In general, the new land cover map over the Korean peninsula seems to be better classified than the USGS land cover map, especially for the areas labeled Savanna in the USGS land cover map.
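
To make the phenological inputs concrete, the sketch below computes NDVI from reflectances and then four of the seven variables (maximum, minimum, amplitude, average) for one pixel's time series. The growing period and the growing and shedding rates are left out because the abstract does not define them. The sample values and function names are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from near-infrared and red reflectance."""
    return (nir - red) / (nir + red)

def phenology_summary(series):
    """Four of the seven variables named in the abstract, for one pixel's NDVI series."""
    peak, trough = max(series), min(series)
    return {
        "maximum": peak,
        "minimum": trough,
        "amplitude": peak - trough,
        "average": sum(series) / len(series),
    }

# Hypothetical 15-day NDVI composites for one pixel over part of a year.
series = [0.2, 0.4, 0.8, 0.6]
summary = phenology_summary(series)
```

A clustering method such as ISODATA would then take one such feature vector per pixel as its input.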

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.93-107 / 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. 
Our approach for user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites using a crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-issue network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed with access patterns derived from web transaction logs. The double two-mode networks are then merged into a user-issue quasi-network. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence, and compare these with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network. After that, we compared the results of issue clustering from SAS with those of network analysis. The experimental dataset came from a website-ranking service and the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles of 5,000 panels over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using NetMiner. Our issue-clustering results applied the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), and are consistent with the results from SAS clustering. In spite of extensive efforts to provide user information with recommendation systems, most projects are successful only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support to decision-making in companies because it enhances user-related data from unstructured textual data.
To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
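
One common way to merge two two-mode networks that share a node set is to multiply their incidence matrices. The abstract does not say that the authors merge the networks this way, so the sketch below is an assumption, with invented matrices and a hypothetical function name.

```python
def merge_two_mode(user_article, article_issue):
    """Matrix product: entry [u][i] counts user u's visited articles tagged with issue i."""
    n_issues = len(article_issue[0])
    return [
        [sum(row[a] * article_issue[a][i] for a in range(len(row)))
         for i in range(n_issues)]
        for row in user_article
    ]

# Hypothetical data: 2 users x 3 articles, and 3 articles x 2 issues.
user_article = [[1, 1, 0],
                [0, 0, 1]]
article_issue = [[1, 0],
                 [1, 0],
                 [0, 1]]

user_issue = merge_two_mode(user_article, article_issue)
```

Here the first user read two articles on the first issue, so the merged user-issue weight is 2. Issues can then be clustered by comparing their columns across users.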

Predictive Clustering-based Collaborative Filtering Technique for Performance-Stability of Recommendation System (추천 시스템의 성능 안정성을 위한 예측적 군집화 기반 협업 필터링 기법)

  • Lee, O-Joun;You, Eun-Soon
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.119-142 / 2015
  • With the explosive growth in the volume of information, Internet users are experiencing considerable difficulties in obtaining necessary information online. Against this backdrop, ever-greater importance is being placed on recommender systems that provide information catered to user preferences and tastes in an attempt to address issues associated with information overload. To this end, a number of techniques have been proposed, including content-based filtering (CBF), demographic filtering (DF) and collaborative filtering (CF). Among them, CBF and DF require external information and thus cannot be applied to a variety of domains. CF, on the other hand, is widely used since it is relatively free from the domain constraint. The CF technique is broadly classified into memory-based CF, model-based CF and hybrid CF. Model-based CF addresses the drawbacks of CF by considering the Bayesian model, clustering model or dependency network model. This filtering technique not only alleviates the sparsity and scalability issues but also boosts predictive performance. However, it involves expensive model-building and results in a tradeoff between performance and scalability. This tradeoff is attributed to reduced coverage, which is a type of sparsity issue. In addition, expensive model-building may lead to performance instability since changes in the domain environment cannot be immediately incorporated into the model due to the high costs involved. Cumulative changes in the domain environment that have failed to be reflected eventually undermine system performance. This study incorporates the Markov model of transition probabilities and the concept of fuzzy clustering with clustering-based CF (CBCF) to propose predictive clustering-based CF (PCCF), which addresses the issues of reduced coverage and unstable performance. The method reduces performance instability by tracking the changes in user preferences and bridging the gap between the static model and dynamic users.
Furthermore, the issue of reduced coverage is also alleviated by expanding the coverage based on transition probabilities and clustering probabilities. The proposed method consists of four processes. First, user preferences are normalized in preference clustering. Second, changes in user preferences are detected from review score entries during preference transition detection. Third, user propensities are normalized using patterns of changes (propensities) in user preferences in propensity clustering. Lastly, the preference prediction model is developed to predict user preferences for items during preference prediction. The proposed method has been validated by testing its robustness against performance instability and the scalability-performance tradeoff. The initial test compared and analyzed the performance of individual recommender systems enabled by IBCF, CBCF, ICFEC and PCCF in an environment where data sparsity had been minimized. The following test adjusted the optimal number of clusters in CBCF, ICFEC and PCCF for a comparative analysis of subsequent changes in the system performance. The test results revealed that the suggested method produced only insignificant improvement in performance in comparison with the existing techniques. In addition, it failed to achieve significant improvement in the standard deviation, which indicates the degree of data fluctuation. Nevertheless, it resulted in marked improvement over the existing techniques in terms of range, which indicates the level of performance fluctuation. The level of performance fluctuation before and after the model generation improved by 51.31% in the initial test. In the following test, there was a 36.05% improvement in the level of performance fluctuation driven by the changes in the number of clusters. This signifies that the proposed method, despite the slight performance improvement, clearly offers better performance stability compared to the existing techniques.
Further research on this study will be directed toward enhancing the recommendation performance that failed to demonstrate significant improvement over the existing techniques. The future research will consider the introduction of a high-dimensional parameter-free clustering algorithm or deep learning-based model in order to improve performance in recommendations.
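
The Markov component can be pictured as a table of transition probabilities between preference clusters. The sketch below estimates such a table by simple counting over one user's sequence of cluster labels. The abstract does not describe how PCCF estimates these probabilities, so the counting estimator, the labels, and the function name are assumptions.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate P(next cluster | current cluster) by counting consecutive pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {target: c / sum(row.values()) for target, c in row.items()}
        for state, row in counts.items()
    }

# Hypothetical history of the preference cluster a user belonged to over time.
history = ["A", "A", "B", "A", "B", "B"]
probs = transition_probabilities(history)
```

A prediction step could then weight the preferences of each candidate cluster by the probability of moving into it, which is one way coverage might be extended beyond the user's current cluster.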

The Topology of Galaxy Clustering in the Sloan Digital Sky Survey Main Galaxy Sample: a Test for Galaxy Formation Models

  • Choi, Yun-Young;Park, Chang-Bom;Kim, Ju-Han;Weinberg, David H.;Kim, Sung-Soo S.;Gott III, J. Richard;Vogeley, Michael S.
    • The Bulletin of The Korean Astronomical Society / v.35 no.1 / pp.82-82 / 2010
  • We measure the topology of the galaxy distribution using the Seventh Data Release of the Sloan Digital Sky Survey (SDSS DR7), examining the dependence of galaxy clustering topology on galaxy properties. The observational results are used to test galaxy formation models. A volume-limited sample defined by $M_r<-20.19$ enables us to measure the genus curve with an amplitude of $G=378$ at $6\,h^{-1}$Mpc smoothing scale, with 4.8% uncertainty including all systematics and cosmic variance. The clustering topology over the smoothing length interval from 6 to $10\,h^{-1}$Mpc reveals a mild scale-dependence for the shift and void abundance ($A_V$) parameters of the genus curve. We find strong bias in the topology of galaxy clustering with respect to the predicted topology of the matter distribution, which is also scale-dependent. The luminosity dependence of galaxy clustering topology discovered by Park et al. (2005) is confirmed: the distribution of relatively brighter galaxies shows a greater prevalence of isolated clusters and more percolated voids. We find that galaxy clustering topology depends also on morphology and color. Even though early (late)-type galaxies show topology similar to that of red (blue) galaxies, the morphology dependence of topology is not identical to the color dependence. In particular, the void abundance parameter $A_V$ depends on morphology more strongly than on color. We test five galaxy assignment schemes applied to cosmological N-body simulations to generate mock galaxies: the Halo-Galaxy one-to-one Correspondence (HGC) model, the Halo Occupation Distribution (HOD) model, and three implementations of Semi-Analytic Models (SAMs). None of the models reproduces all aspects of the observed clustering topology; the deviations vary from one model to another but include statistically significant discrepancies in the abundance of isolated voids or isolated clusters and the amplitude and overall shift of the genus curve.
SAM predictions of the topology color-dependence are usually correct in sign but incorrect in magnitude.
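
As background that the abstract itself does not state: for a Gaussian random field the genus curve has a known analytic shape, and parameters such as the shift and the void abundance are usually read as departures from it. Here $\nu$ is the density threshold in units of the standard deviation and $A$ is the amplitude.

```latex
g(\nu) = A \left(1 - \nu^{2}\right) \exp\!\left(-\frac{\nu^{2}}{2}\right)
```

The curve is symmetric about $\nu = 0$, so a horizontal shift or an asymmetry between the low-density (void) and high-density (cluster) sides signals non-Gaussian structure.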

MOC: A Multiple-Object Clustering Scheme for High Performance of Page-out in BSD VM (MOC: 다중 오브젝트 클러스터링을 통한 BSD VM의 페이지-아웃 성능 향상)

  • Yang, Jong-Cheol;Ahn, Woo-Hyun;Oh, Jae-Won
    • Journal of KIISE:Computer Systems and Theory / v.36 no.6 / pp.476-487 / 2009
  • The virtual memory system in 4.4 BSD operating systems exploits a clustering scheme to reduce disk I/Os when paging out (or flushing) modified pages that are to be replaced in order to free up memory. Upon the page-out of a victim page, the scheme stores a cluster (or group) of modified pages contiguous with the victim in the virtual address space to the swap disk in a single disk write. However, it fails to find large clusters of contiguous pages if applications modify pages that are not adjacent to each other in the virtual address space. To address the problem, we propose a new clustering scheme called Multiple-Object Clustering (MOC), which stores multiple clusters in the virtual address space together in a single disk write instead of paging out the clusters to swap space with separate disk I/Os. This multiple-cluster transfer allows the virtual memory system to significantly decrease disk writes, thus improving the page-out performance. Our experiments on FreeBSD 6.2 show that MOC improves the execution times of realistic benchmarks such as NS2, Scimark2 SOR, and nbench LU by 9% to 45% over the traditional clustering scheme.
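
The difference between the two schemes can be seen in a toy model that only counts disk writes. This is a simplified sketch, not the BSD or MOC implementation: page numbers stand in for virtual addresses, `max_pages_per_write` is a hypothetical limit, and the greedy packing policy is an assumption.

```python
def contiguous_clusters(dirty_pages):
    """Group page numbers into runs of adjacent pages."""
    clusters = []
    for page in sorted(dirty_pages):
        if clusters and page == clusters[-1][-1] + 1:
            clusters[-1].append(page)
        else:
            clusters.append([page])
    return clusters

def writes_single_cluster(clusters):
    """Traditional scheme in this model: one disk write per contiguous cluster."""
    return len(clusters)

def writes_multi_cluster(clusters, max_pages_per_write):
    """Multi-cluster scheme: greedily pack whole clusters into one write up to a limit."""
    writes, used = 0, 0
    for cluster in clusters:
        if writes == 0 or used + len(cluster) > max_pages_per_write:
            writes += 1   # start a new disk write
            used = 0
        used += len(cluster)
    return writes

# Hypothetical dirty pages scattered across the address space.
clusters = contiguous_clusters([3, 4, 10, 20, 21, 22, 40])
single = writes_single_cluster(clusters)
packed = writes_multi_cluster(clusters, max_pages_per_write=8)
```

In this example the scattered pages form four small clusters, so the traditional scheme issues four writes while packing them together needs only one.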

A Study on the Asia Container Ports Clustering Using Hierarchical Clustering(Single, Complete, Average, Centroid Linkages) Methods with Empirical Verification of Clustering Using the Silhouette Method and the Second Stage(Type II) Cross-Efficiency Matrix Clustering Model (계층적 군집분석(최단, 최장, 평균, 중앙연결)방법에 의한 아시아 컨테이너 항만의 클러스터링 측정 및 실루엣방법과 2단계(Type II) 교차효율성 메트릭스 군집모형을 이용한 실증적 검증에 관한 연구)

  • Park, Ro-Kyung
    • Journal of Korea Port Economic Association / v.37 no.1 / pp.31-70 / 2021
  • The purpose of this paper is to measure the clustering change, analyze the empirical results, and choose the ports to cluster with Busan, Incheon, and Gwangyang ports by using hierarchical clustering (single, complete, average, and centroid linkage), Silhouette, and 2SCE [Second Stage (Type II) cross-efficiency] matrix clustering models on Asian container ports over the period 2009-2018. The models use number of cranes, depth, berth length, and total area as inputs and container TEU as output. The main empirical results are as follows. First, the ranking by efficiency increase ratio over the 10-year analysis is Silhouette (0.4052 up), hierarchical clustering (0.3097 up), and 2SCE (0.1057 up). Second, according to empirical verification with the Silhouette and 2SCE models, Busan Port should be clustered with Dubai, Hong Kong, and Tanjung Priok, while Incheon Port and Gwangyang Port should be clustered with most ports. Third, within ASEAN, it would be good to cluster Busan Port with Singapore, Incheon Port with Tanjung Priok, Tanjung Perak, Manila, Tanjung Pelepas, Laem Chabang, and Bangkok, and Gwangyang Port with Tanjung Priok, Tanjung Perak, Port Klang, Tanjung Pelepas, Laem Chabang, and Bangkok. Fourth, Wilcoxon's signed-ranks test of the models shows that all P values are significant at an average level of 0.852. This means that the average efficiency figures and ranking orders of the models match each other. The policy implication is that port policy makers and port operation managers should select benchmarking ports by introducing the models used in this study into the clustering of ports, compare and analyze the port development and operation plans of their ports, and quickly introduce and implement the parts that require benchmarking.
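
The silhouette method scores how well each item sits in its assigned cluster, which is how it can be used to check a clustering result. The sketch below computes the mean silhouette for one-dimensional values under two candidate groupings. The values are invented rather than port data, and the code assumes at least two clusters and distinct points.

```python
def mean(values):
    return sum(values) / len(values)

def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points, using absolute distance in 1-D."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        same = [abs(p - q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:  # singleton cluster: silhouette is taken as 0
            scores.append(0.0)
            continue
        a = mean(same)  # mean distance to the point's own cluster
        b = min(        # mean distance to the nearest other cluster
            mean([abs(p - q) for q, l in zip(points, labels) if l == other])
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)

efficiency = [1.0, 2.0, 10.0, 11.0]  # hypothetical 1-D scores
good = silhouette_score(efficiency, [0, 0, 1, 1])
bad = silhouette_score(efficiency, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters, while negative values, as in the second grouping here, suggest that items have been assigned to the wrong cluster.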