• Title/Summary/Keyword: agglomerative clustering


Underdetermined blind source separation using normalized spatial covariance matrix and multichannel nonnegative matrix factorization (멀티채널 비음수 행렬분해와 정규화된 공간 공분산 행렬을 이용한 미결정 블라인드 소스 분리)

  • Oh, Son-Mook;Kim, Jung-Han
    • The Journal of the Acoustical Society of Korea
    • /
    • v.39 no.2
    • /
    • pp.120-130
    • /
    • 2020
  • This paper addresses underdetermined convolutive mixtures by remedying the shortcomings of the multichannel nonnegative matrix factorization technique widely used in blind source separation. In conventional research based on the Spatial Covariance Matrix (SCM), elements composed of values such as single-channel power gain and correlation tend to degrade the quality of the separated sources because of their high variance. In this paper, level and frequency normalization is performed to cluster the estimated sources effectively, and a novel SCM together with an effective distance function for cluster pairs is proposed. The proposed SCM is used both to initialize the spatial model and for bottom-up hierarchical agglomerative clustering. The proposed algorithm was evaluated on the Signal Separation Evaluation Campaign 2008 development dataset. Using the Blind Source Separation Eval toolbox, an objective tool for verifying source separation quality, improvement in most performance indicators was confirmed; in particular, an SDR gain of 1 dB to 3.5 dB was verified.

Hierarchical Overlapping Clustering to Detect Complex Concepts (중복을 허용한 계층적 클러스터링에 의한 복합 개념 탐지 방법)

  • Hong, Su-Jeong;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.111-125
    • /
    • 2011
  • Clustering is a process of grouping similar or relevant documents into a cluster and assigning a meaningful concept to the cluster. By this process, clustering facilitates fast and correct search for relevant documents by narrowing the search range down to the collection of documents belonging to related clusters. For effective clustering, techniques are required for identifying similar documents and grouping them into a cluster, and for discovering the concept that is most relevant to the cluster. One problem that often appears in this context is the detection of a complex concept that overlaps with several simple concepts at the same hierarchical level. Previous clustering methods were unable to identify and represent a complex concept that belongs to several different clusters at the same level in the concept hierarchy, and also could not validate the semantic hierarchical relationship between a complex concept and each of the simple concepts. To solve these problems, this paper proposes a new clustering method that identifies and represents complex concepts efficiently. We developed the Hierarchical Overlapping Clustering (HOC) algorithm, which modifies the traditional agglomerative hierarchical clustering algorithm to allow overlapping clusters at the same level in the concept hierarchy. The HOC algorithm represents the clustering result not by a tree but by a lattice in order to detect complex concepts. We developed a system that employs the HOC algorithm to carry out complex concept detection. This system operates in three phases: 1) preprocessing of the documents, 2) clustering using the HOC algorithm, and 3) validation of the semantic hierarchical relationships among the concepts in the lattice obtained as a result of clustering. The preprocessing phase represents the documents as x-y coordinate values in a 2-dimensional space by considering the weights of the terms appearing in the documents.
First, it goes through a refinement process, applying stopword removal and stemming to extract index terms. Then, each index term is assigned a TF-IDF weight, and the x-y coordinate value for each document is determined by combining the TF-IDF values of its terms. The clustering phase uses the HOC algorithm, in which the similarity between documents is calculated by the Euclidean distance. Initially, a cluster is generated for each document by grouping the documents closest to it. Then, the distance between any two clusters is measured, and the closest clusters are grouped into a new cluster. This process is repeated until the root cluster is generated. In the validation phase, a feature selection method is applied to validate the appropriateness of the cluster concepts built by the HOC algorithm, i.e., whether they have meaningful hierarchical relationships. Feature selection extracts key features from a document by identifying and assigning weight values to its important and representative terms. To select key features correctly, a method is needed to determine how much each term contributes to the class of the document. Among the several methods achieving this goal, this paper adopts the $\chi^2$ statistic, which measures the degree of dependency of a term t on a class c and represents the relationship between t and c by a numerical value. To demonstrate the effectiveness of the HOC algorithm, a series of performance evaluations was carried out using the well-known Reuters-21578 news collection. The results showed that the HOC algorithm greatly contributes to detecting and producing complex concepts by generating the concept hierarchy in a lattice structure.
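The merge loop of the traditional agglomerative hierarchical clustering that HOC modifies can be sketched in a few lines. This is a generic single-link illustration in pure Python, not the paper's HOC implementation; the function name and the choice of single-link distance are my own assumptions.

```python
import math

def agglomerative(points):
    """Classic bottom-up clustering: start with singleton clusters and
    repeatedly merge the closest pair until one root cluster remains.
    Returns the root cluster and the merge history."""
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single-link distance: closest pair of members across clusters
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((list(clusters[i]), list(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters[0], history
```

For n documents this performs n-1 merges, producing the tree that HOC generalizes into a lattice by allowing a cluster to join more than one parent.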

Underdetermined Blind Source Separation from Time-delayed Mixtures Based on Prior Information Exploitation

  • Zhang, Liangjun;Yang, Jie;Guo, Zhiqiang;Zhou, Yanwei
    • Journal of Electrical Engineering and Technology
    • /
    • v.10 no.5
    • /
    • pp.2179-2188
    • /
    • 2015
  • Recently, much research has been done to solve the challenging problem of Blind Source Separation (BSS) in underdetermined cases, and the "two-step" method, which estimates the mixing matrix first and then extracts the sources, is widely used. To estimate the mixing matrix, conventional algorithms such as Single-Source-Points (SSPs) detection exploit only the sparsity of the original signals. This paper proposes a new underdetermined mixing matrix estimation method for time-delayed mixtures based on exploiting receiver prior information. The prior information is extracted from the specific structure of the complex-valued mixing matrix and used to derive a special criterion for determining the SSPs. Moreover, after selecting the SSPs, Agglomerative Hierarchical Clustering (AHC) is used to automatically cluster, suppress, and estimate all the elements of the mixing matrix. Finally, a convex-model-based subspace method is applied for signal separation. Simulation results show that the proposed algorithm can estimate the mixing matrix and extract the original source signals with higher accuracy, especially in low-SNR environments, and does not need the number of sources beforehand, which makes it more reliable in real non-cooperative environments.

Agglomerative Hierarchical Clustering Using Latent Semantic Analysis in Information Retrieval (정보 검색에서의 잠재 의미 분석 방법을 이용한 응집 계층 군집화 기법 연구)

  • Khiati, Abdel-Ilah Zakaria;Kang, Daehyun;Park, Hansaem;Kwon, Kyunglag;Chung, In-Jeong
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2014.04a
    • /
    • pp.952-955
    • /
    • 2014
  • This paper proposes a hybrid clustering method for more efficient information retrieval that compensates for the respective weaknesses of two well-known techniques in the field: latent semantic analysis and hierarchical clustering. First, latent semantic analysis is a classical method widely used in information retrieval that automatically finds the latent meanings in documents through vector operations; however, it suffers from the bag-of-words problem arising from synonymy and polysemy in language. The second method is hierarchical clustering, commonly used for document clustering; in terms of the quality of the resulting clusters, many shallow single-level clusters are still formed, so additional clustering through finer analysis is needed. To address these problems, this paper proposes a hybrid approach: an agglomerative hierarchical clustering method that incorporates latent semantic analysis. We apply the proposed method to two well-known datasets and, by comparing the results with those of existing methods, show its superiority in terms of cluster quality.

A detailed analysis of nearby young stellar moving groups

  • Lee, Jinhee
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.44 no.2
    • /
    • pp.63.3-63.3
    • /
    • 2019
  • Nearby young moving groups (NYMGs hereafter) are gravitationally unbound, loose young stellar associations located within 100 pc of the Sun. Since NYMGs are crucial laboratories for studying low-mass stars and planets, intensive searches for NYMG members have been performed. Various strategies and methods have been applied to identify NYMG members; as a result, the reliability of the claimed memberships is not uniform, and a careful membership re-assessment is required. In this study, I developed a NYMG membership probability calculation tool based on Bayesian inference (Bayesian Assessment of Moving Groups: BAMG). For its development, I constructed ellipsoidal models for nine NYMGs via iterative, self-consistent processes. Using BAMG, the memberships claimed in the literature (N ~ 2000) were evaluated, and 35 per cent of the members were confirmed as bona fide members of NYMGs. Based on the deficiency of low-mass members that appears in the mass function built from these bona fide members, low-mass members were identified from Gaia DR2: about 2000 new M dwarf and brown dwarf candidate members. The memberships of ~70 members with radial velocities from Gaia were confirmed, and an additional ~20 members were confirmed via spectroscopic observation. Without relying on prior knowledge of the existence of the nine NYMGs, unsupervised machine learning analyses were applied to the NYMG members. The K-means and agglomerative clustering algorithms yield similar grouping trends. As a result, six previously known groups (TWA, beta-Pic, Carina, Argus, AB Doradus, and Volans-Carina) were rediscovered. The three other known groups were recognized as well; however, they combined into two new separate groups (ThOr+Columba and TucHor+Columba).


Variations in the texture properties of cooked rice as a function of instrumental parameter conditions (기기적 측정조건을 달리하여 측정한 쌀밥의 조직감 특성 변화)

  • Choi, Won-Seok;Seo, Han-Seok
    • Korean Journal of Food Science and Technology
    • /
    • v.48 no.5
    • /
    • pp.521-524
    • /
    • 2016
  • This study aimed to examine variations in the texture profile analysis (TPA) of cooked rice in relation to the instrumental parameter conditions. The TPA of four types of ready-to-eat white rice products was conducted at two levels of compression ratio (30 and 70%) and cross-head speed (0.5 and 1.0 mm/s). The properties of the four rice products differed significantly or non-significantly depending on the instrumental parameter condition. Agglomerative hierarchical clustering based on the five TPA properties, namely hardness, adhesiveness, cohesiveness, chewiness, and springiness, revealed that the clustering of the four rice products varied with the instrumental parameter condition. Additionally, the ratio of adhesiveness to hardness, an index of rice texture quality, varied with the two instrumental parameter conditions. In conclusion, our findings demonstrate that the texture profile, texture-based sample clustering, and the ratio of adhesiveness to hardness vary with the compression ratio and cross-head speed in the TPA.

Video Scene Detection using Shot Clustering based on Visual Features (시각적 특징을 기반한 샷 클러스터링을 통한 비디오 씬 탐지 기법)

  • Shin, Dong-Wook;Kim, Tae-Hwan;Choi, Joong-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.47-60
    • /
    • 2012
  • Video data comes in an unstructured and complex form. As the importance of efficient management and retrieval of video data increases, studies on video parsing based on the visual features contained in the video content have been conducted to reconstruct video data into a meaningful structure. Early studies on video parsing focused on splitting video data into shots, but detecting shot boundaries defined by physical boundaries does not consider the semantic association of video data. Recently, studies that use clustering methods to structure semantically associated video shots into video scenes, defined by semantic boundaries, have been actively pursued. Previous studies on video scene detection utilize clustering algorithms based on similarity measures between video shots that depend mainly on color features. However, correctly identifying a video shot or scene and detecting gradual transitions such as dissolves, fades, and wipes is difficult, because the color features of video data contain noise and change abruptly when an unexpected object intervenes. In this paper, to solve these problems, we propose the Scene Detector using Color histogram, corner Edge and Object color histogram (SDCEO), which detects video scenes by clustering similar shots belonging to the same event based on visual features including the color histogram, the corner edge, and the object color histogram. The SDCEO is noteworthy in that it uses the edge feature together with the color feature, and as a result it effectively detects gradual as well as abrupt transitions. The SDCEO consists of the Shot Bound Identifier and the Video Scene Detector. The Shot Bound Identifier comprises the Color Histogram Analysis step and the Corner Edge Analysis step.
In the Color Histogram Analysis step, SDCEO uses the color histogram feature to organize shot boundaries. The color histogram, recording the percentage of each quantized color among all pixels in a frame, is chosen for its good performance, as also reported in other work on content-based image and video analysis. To organize shot boundaries, SDCEO joins associated sequential frames into shot boundaries by measuring the similarity of the color histograms between frames. In the Corner Edge Analysis step, SDCEO identifies the final shot boundaries using the corner edge feature: it detects associated shot boundaries by comparing the corner edge feature between the last frame of the previous shot boundary and the first frame of the next. In the Key-frame Extraction step, SDCEO compares each frame with all other frames, measures the similarity using the Euclidean distance between histograms, and then selects as the key-frame the frame most similar to all frames contained in the same shot boundary. The Video Scene Detector clusters associated shots belonging to the same event by utilizing the hierarchical agglomerative clustering method based on visual features including the color histogram and the object color histogram. After detecting video scenes, SDCEO organizes the final video scenes by repeated clustering until the similarity distance between shot boundaries is less than the threshold h. In this paper, we construct a prototype of SDCEO and carry out experiments with manually constructed baseline data; the experimental results, a precision of 93.3% for shot boundary detection and 83.3% for video scene detection, are satisfactory.
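The "cluster until the distance falls below a threshold h" stopping rule described above differs from building the full tree: merging halts as soon as no pair of clusters is close enough. A minimal sketch of that variant, assuming average-link distance and a caller-supplied pairwise distance function (both my assumptions, not details from the paper):

```python
def cluster_until_threshold(items, dist, h):
    """Merge the closest pair of clusters repeatedly, stopping once the
    smallest inter-cluster distance reaches the threshold h."""
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # average-link: mean pairwise distance between members
                d = sum(dist(a, b) for a in clusters[i] for b in clusters[j]) \
                    / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] >= h:
            break  # nothing close enough left to merge: these are the scenes
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters
```

With shots represented by feature vectors and `dist` a histogram distance, each surviving cluster would correspond to one detected scene.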

A Market Segmentation Scheme Based on Customer Information and QAP Correlation between Product Networks (고객정보와 상품네트워크 유사도를 이용한 시장세분화 기법)

  • Jeong, Seok-Bong;Shin, Yong Ho;Koo, Seo Ryong;Yoon, Hyoup-Sang
    • Journal of the Korea Society for Simulation
    • /
    • v.24 no.4
    • /
    • pp.97-106
    • /
    • 2015
  • Recently, hybrid market segmentation techniques, which conduct segmentation using both general variables and transaction-based variables, have been widely adopted. However, these techniques can generate incorrect segmentation results even though their methodology and concepts are easy to apply. In this paper, we propose a novel scheme that overcomes this limitation of the hybrid techniques and takes advantage of the product information obtained from customers' transaction data. In this scheme, we first divide the whole market into several unit segments based on the general variables and then agglomerate the unit segments with higher QAP correlations. Each product network represents the purchasing patterns of its corresponding segment; thus, the QAP correlation between the product networks of two segments is a good measure of the similarity between them. A case study was conducted to validate the proposed scheme, and the results show that it works effectively for Internet shopping malls.
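The statistic at the heart of the scheme above, the correlation between two product networks, can be illustrated with a small sketch. This computes only the observed graph correlation (the Pearson correlation of corresponding off-diagonal adjacency entries); the full QAP procedure additionally assesses significance by permuting node labels, which is omitted here. The function name and the dense-matrix representation are my own assumptions.

```python
import math

def network_correlation(A, B):
    """Pearson correlation between corresponding off-diagonal entries of two
    adjacency matrices over the same product set. This is the observed
    statistic that a full QAP test would compare against a permutation
    distribution."""
    n = len(A)
    xs = [A[i][j] for i in range(n) for j in range(n) if i != j]
    ys = [B[i][j] for i in range(n) for j in range(n) if i != j]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Two segments whose product networks have identical co-purchase structure yield a correlation of 1, making them natural candidates for agglomeration.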

Composition of Federal R&D Spending, and Regional Economy : The Case of the U.S.A

  • Lee, Si-Kyoung
    • Journal of the Korean Regional Science Association
    • /
    • v.9 no.1
    • /
    • pp.65-78
    • /
    • 1993
  • In this study, the significant and enduring concentration of federal R&D spending in metro-scale clusters across the nation is treated as evidence of the operation of a distinct industrial infrastructure defined by the ability of R&D performers to attract external funding and pursue the sophisticated project work demanded. It follows, then, that the agglomerative potential of these R&D concentrations -- performers and their support infrastructures -- requires a search for economic impacts guided by a different logic. One reason the stimulative effects attributable to federal R&D spending have been difficult to discover may be that substantial subnational economic impacts are routinely obscured and diluted by research designs that seek to discover impacts either at the level of nation-scale economic aggregates or on firms or specific industries organized spatially. Therefore, this study proceeds by seeking to link the locational clustering of federal contract R&D spending to more localized economic impacts. It tests a series of models (I-VI) designed to trace federal contract R&D spending flows to economic impacts registered at the level of metro-regional economies. By shifting the focus from funding sources to recipient types and then to sector-specific impacts, the patterns of consistent results become increasingly compelling. In general, these results indicate that federal R&D spending does indeed nurture the development of an important nation-spanning advanced industrial production and R&D infrastructure anchored primarily by two dozen or so metro-regions. However, dominated as it is by a strong defense-industrial orientation, federal contract R&D spending would appear to constitute a relatively inefficient national economic development policy, at least as registered on conventional indicators.
Federal contract R&D destined for the support of nondefense/civilian (Model I), nonprofit (Model II), and educational/research (Model III) R&D agendas is associated with substantially greater regional employment and income impacts than is R&D funding disbursed by the Department of Defense. While federal R&D support from DOD (Model I) and for-profit (Model II) and industrial-performer (Model III) contract R&D agendas is associated with positive regional economic impacts, these are substantially smaller than those associated with performers operating outside the defense industrial base. Moreover, evidence that the large-business sector mediates the small-business sector (Model VI) justifies closer scrutiny of the relative contribution to economic growth and development made by these two sectors, as well as of the primacy typically accorded to employment change as a conventional economic performance indicator. Ultimately, the regions receiving federal R&D spending have experienced measurable employment and income gains as a result. However, whether those gains could be improved by changing the composition -- and therefore the primary missions -- of federal R&D spending cannot be decided merely by citing evidence of its economic impacts of the kind reported here. Rather, that decision turns on a prior public choice relating to the trade-offs deemed acceptable between conventional employment and income gains, the strength of a nation's industrial base not reflected in such indicators, and the reigning conception of what constitutes national security -- military might or a competitive civilian economy.
