• Title/Summary/Keyword: Co-clustering

Multi-class Support Vector Machines Model Based Clustering for Hierarchical Document Categorization in Big Data Environment (빅 데이터 환경에서 계층적 문서 유형 분류를 위한 클러스터링 기반 다중 SVM 모델)

  • Kim, Young Soo;Lee, Byoung Yup
    • The Journal of the Korea Contents Association / v.17 no.11 / pp.600-608 / 2017
  • Data volumes are growing exponentially with the rapid expansion of the internet. Because users need only part of the available information, they bear a heavy workload in examining and finding the content they need. Information retrieval must therefore provide hierarchical class information and an examination priority by evaluating the similarity between the query and the documents. In this paper we propose a clustering-based multi-class support vector machine (SVM) model for hierarchical document categorization that makes semantic search possible by taking word co-occurrence measures into account. Combining hierarchical document categorization with an SVM classifier gives high performance in the analytical classification of web documents, whose number increases exponentially as the document hierarchy is extended. We expect more information retrieval systems to adopt the proposed model in their development and thereby provide accurate and rapid information retrieval services.

Clustering Scheme using Memory Restriction for Wireless Sensor Network (무선센서네트워크에서 메모리 속성을 이용한 클러스터링 기법)

  • Choi, Hae-Won;Yoo, Kee-Young
    • The Journal of Korean Institute of Communications and Information Sciences / v.34 no.1B / pp.10-15 / 2009
  • Wireless sensor networks have recently come to be regarded as one of the important technologies for the future IT industry, and their application areas are growing. Among the protocols for wireless sensor networks, those based on a hierarchical network topology are rated well for energy efficiency. LEACH is the best-known routing protocol for the hierarchical topology. However, LEACH-related protocols have a problem with the range of message broadcasting, which must be expanded to cover the whole network. This paper therefore proposes a new clustering scheme to solve the problems these protocols share. The basic idea of our scheme is to use the inherent memory restrictions of sensor nodes. The results show that the proposed scheme supports load balancing by distributing clusters with a reasonable number of member nodes, and that the network lifetime is thereby extended to about 1.8 times longer than with LEACH.

Term Clustering and Duplicate Distribution for Efficient Parallel Information Retrieval (효율적인 병렬정보검색을 위한 색인어 군집화 및 분산저장 기법)

  • 강재호;양재완;정성원;류광렬;권혁철;정상화
    • Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.129-139 / 2003
  • The PC cluster architecture is considered a cost-effective alternative to existing supercomputers for realizing a high-performance information retrieval (IR) system. To implement an efficient IR system on a PC cluster, it is essential to achieve maximum parallelism by distributing the data to the local hard disks of the PCs so that the disk I/O and the subsequent computation are spread as evenly as possible across all the PCs. If the terms in the inverted index file can be classified into clusters of closely related terms, parallelism can be maximized by distributing them to the PCs in an interleaved manner. One goal of this research is to develop methods for automatically clustering the terms based on the likelihood of their co-occurrence in the same query. In this paper we also propose a method for duplicate distribution of inverted index records among the PCs to achieve fault tolerance as well as dynamic load balancing. Experiments with a large corpus demonstrated the efficiency and effectiveness of our method.
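
The idea of clustering terms by query co-occurrence and then interleaving each cluster across PCs can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' method: the function names, the `min_count` threshold, and the use of connected components as a stand-in for their clustering algorithm are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(queries):
    """Count how often each unordered pair of terms appears in the same query."""
    counts = Counter()
    for query in queries:
        for a, b in combinations(sorted(set(query)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_terms(terms, counts, min_count=2):
    """Group terms into connected components of pairs that co-occur at least
    min_count times (a simple stand-in for a real clustering method)."""
    parent = {t: t for t in terms}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (a, b), n in counts.items():
        if n >= min_count:
            parent[find(a)] = find(b)

    clusters = {}
    for t in sorted(terms):
        clusters.setdefault(find(t), []).append(t)
    return sorted(clusters.values())


def interleave(clusters, num_pcs):
    """Assign terms to PCs round-robin, cluster by cluster, so that terms
    likely to be queried together land on different PCs."""
    placement = {}
    slot = 0
    for cluster in clusters:
        for term in cluster:
            placement[term] = slot % num_pcs
            slot += 1
    return placement


# Hypothetical query log used only to exercise the sketch.
queries = [
    ["apple", "pie"],
    ["apple", "pie", "recipe"],
    ["linux", "kernel"],
    ["linux", "kernel"],
    ["pie", "recipe"],
]
terms = {t for q in queries for t in q}
clusters = cluster_terms(terms, cooccurrence_counts(queries), min_count=2)
placement = interleave(clusters, num_pcs=3)
```

With round-robin placement, a query that touches several terms of one cluster reads their inverted index records from different disks in parallel, which is the effect the abstract attributes to interleaved distribution.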

A Study on the Structures and Characteristics of National Policy Knowledge (국가 정책지식의 구조와 특성에 관한 연구)

  • Lee, Ji-Sue;Chung, Young-Mee
    • Journal of Information Management / v.41 no.2 / pp.1-30 / 2010
  • This study analyzed research output in the dominant research areas of 19 national research institutions. Policy knowledge produced by the institutions during the past 5 years mainly concerned 10 policies dealing with economic and social issues. Similarities between the research subjects of the institutions were displayed by MDS mapping. The study also identified the issue attention cycles of 5 chosen policies and examined the correlation between the issue attention cycles and the yields of policy knowledge. The knowledge structure of each policy was mapped using co-word analysis and Ward's clustering. It was also found that institutions performing research on similar subjects demonstrated citation preferences for each other.
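
As a rough illustration of how co-word analysis can feed Ward's clustering, the sketch below builds a keyword co-occurrence matrix and then merges clusters with a naive Ward criterion (the merge that least increases the within-cluster sum of squares). The keyword lists, function names, and the choice of raw co-occurrence rows as feature vectors are assumptions for illustration; the study's actual data and preprocessing are not given in the abstract.

```python
from itertools import combinations


def coword_vectors(docs):
    """Return each keyword's row of the symmetric co-word (co-occurrence) matrix."""
    vocab = sorted({k for d in docs for k in d})
    index = {k: i for i, k in enumerate(vocab)}
    rows = {k: [0.0] * len(vocab) for k in vocab}
    for d in docs:
        for a, b in combinations(sorted(set(d)), 2):
            rows[a][index[b]] += 1
            rows[b][index[a]] += 1
    return rows


def ward_clusters(rows, k):
    """Naive Ward agglomeration down to k clusters.

    The cost of merging clusters A and B is
    |A||B| / (|A| + |B|) * ||centroid_A - centroid_B||^2.
    """
    clusters = [([name], list(vec)) for name, vec in sorted(rows.items())]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            (mi, ci), (mj, cj) = clusters[i], clusters[j]
            ni, nj = len(mi), len(mj)
            dist2 = sum((x - y) ** 2 for x, y in zip(ci, cj))
            cost = ni * nj / (ni + nj) * dist2
            if best is None or cost < best[0]:
                best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        ni, nj = len(mi), len(mj)
        centroid = [(ni * x + nj * y) / (ni + nj) for x, y in zip(ci, cj)]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((mi + mj, centroid))
    return sorted(sorted(members) for members, _ in clusters)


# Hypothetical keyword sets, one per document.
docs = [
    ["tax", "budget", "growth"],
    ["tax", "budget"],
    ["tax", "growth", "budget"],
    ["welfare", "pension", "aging"],
    ["welfare", "pension"],
    ["pension", "aging", "welfare"],
]
groups = ward_clusters(coword_vectors(docs), k=2)
```

In practice a library routine such as SciPy's hierarchical clustering would replace the quadratic loop; the pure-Python version is only meant to make the merge criterion visible.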

A Study on Intellectual Structure of Library and Information Science in Korea (문헌정보학의 지식 구조에 관한 연구)

  • Yoo, Yeong-Jun
    • Journal of the Korean Society for Information Management / v.20 no.3 / pp.277-297 / 2003
  • This study was conducted on the premise that index terms display the intellectual structure of a specific subject field. An attempt was made to grasp the intellectual structure of Library and Information Science by clustering the index terms assigned at the Library of National Assembly to the journals of the related academic societies: the Journal of the Korean Society for Information Management, the Journal of the Korean Library and Information Science Society, and the Journal of the Korean Society for Library and Information Science. Index term clusters were generated based on the linkage of the index terms and the frequency of co-occurrence. In addition, a time-period analysis was conducted, along with a study of first-appearing terms, to clarify the trends and development process of Library and Information Science. This study also analysed the difference between the two intellectual structures by comparing the structure generated by the index term clusters with the existing structure of traditional classification systems.

An Investigation of the Relationship between Revenue Water Ratio and the Operating and Maintenance Cost of Water Supply Network (상수관망 유수율과 유지관리 비용의 관계 분석)

  • Kim, Jaehee;Yoo, Kwangtae;Jun, Hwandon;Jang, Jaesun
    • Journal of Korean Society on Water Environment / v.28 no.2 / pp.202-212 / 2012
  • Due to the deterioration of water supply networks and the shortage of raw water, the water utilities of local governments have carried out various projects to improve their revenue water ratio. However, it is very difficult to estimate the cost of maintaining the revenue water ratio at a higher level after a project is completed, because local governments have different conditions affecting the operating and maintenance cost of the water supply network. The purpose of this study is to present a procedure for estimating the operating and maintenance cost required to maintain the target revenue water ratio of a water supply network. To this end, we estimated the cost used only for operation and maintenance of the water supply networks of 164 local governments with the aid of k-means clustering analysis and data from 40 representative local governments. Regression analysis was then performed to find the relationship between the revenue water ratio and the operating and maintenance cost, using two data sets generated by two classification methods: the first classifies the local governments by k-means clustering, and the second classifies them according to an index standardized by the operating and maintenance cost per unit length of water mains per revenue water ratio. The results show that the method based on the index standardized by the cost and revenue water ratio of each government produces more reliable regression equations between the revenue water ratio and the operating and maintenance cost of the water supply network alone. The estimated regression equations for each group can be used to estimate the cost required to keep the target revenue water ratio of a local government.

Personalized Size Recommender System for Online Apparel Shopping: A Collaborative Filtering Approach

  • Dongwon Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.8 / pp.39-48 / 2023
  • This study addresses the problem of sizing errors in online purchases caused by discrepancies and non-standardization in clothing sizes. This paper discusses an implementation approach for a machine learning-based recommender system capable of providing personalized sizes to online consumers. We trained multiple validated collaborative filtering algorithms, including Non-Negative Matrix Factorization (NMF), Singular Value Decomposition (SVD), k-Nearest Neighbors (KNN), and Co-Clustering, on purchasing data derived from online commerce and compared their performance. The results confirmed that the NMF algorithm outperformed the other algorithms. Although the purchase data include multiple buyers using the same account, the proposed model demonstrated sufficient accuracy. The findings of this study are expected to contribute to reducing the return rate due to sizing errors and improving the customer experience on e-commerce platforms.
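
For readers unfamiliar with co-clustering as a collaborative filtering method, the sketch below shows one common prediction rule: the co-cluster average, adjusted by how the user and the item deviate from their own cluster averages. This is an assumption about the general technique, not a description of the paper's implementation; the cluster assignments are supplied by hand rather than learned, and all names and data are hypothetical.

```python
from collections import defaultdict


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def coclustering_predict(ratings, user_cluster, item_cluster, user, item):
    """Predict a rating from fixed user/item cluster assignments.

    ratings: dict mapping (user, item) -> numeric rating.
    Prediction = co-cluster mean
                 + (user mean - user-cluster mean)
                 + (item mean - item-cluster mean).
    Falls back to the global mean for unseen users, items, or empty co-clusters.
    """
    by_user = defaultdict(list)
    by_item = defaultdict(list)
    by_ucluster = defaultdict(list)
    by_icluster = defaultdict(list)
    by_cocluster = defaultdict(list)
    for (u, i), r in ratings.items():
        by_user[u].append(r)
        by_item[i].append(r)
        by_ucluster[user_cluster[u]].append(r)
        by_icluster[item_cluster[i]].append(r)
        by_cocluster[(user_cluster[u], item_cluster[i])].append(r)

    global_mean = mean(ratings.values())
    if user not in by_user or item not in by_item:
        return global_mean
    cu, ci = user_cluster[user], item_cluster[item]
    cell = by_cocluster.get((cu, ci))
    if not cell:
        return global_mean
    return (
        mean(cell)
        + (mean(by_user[user]) - mean(by_ucluster[cu]))
        + (mean(by_item[item]) - mean(by_icluster[ci]))
    )


# Hypothetical ratings and hand-picked cluster assignments.
ratings = {
    ("u1", "a"): 4, ("u1", "b"): 5,
    ("u2", "a"): 3, ("u2", "c"): 2,
    ("u3", "c"): 1, ("u3", "b"): 4,
}
user_cluster = {"u1": 0, "u2": 0, "u3": 1}
item_cluster = {"a": 0, "b": 0, "c": 1}
prediction = coclustering_predict(ratings, user_cluster, item_cluster, "u2", "b")
```

A full co-clustering recommender would alternate between reassigning users and items to clusters and recomputing these averages; only the prediction step is shown here.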

Femtocell Subband Selection Method for Managing Cross- and Co-tier Interference in a Femtocell Overlaid Cellular Network

  • Kwon, Young Min;Choo, Hyunseung;Lee, Tae-Jin;Chung, Min Young;Kim, Mihui
    • Journal of Information Processing Systems / v.10 no.3 / pp.384-394 / 2014
  • The femtocell overlaid cellular network (FOCN) has been used to enhance the capacity of existing cellular systems. To obtain the desired system performance, both cross-tier interference and co-tier interference in an FOCN need to be managed. This paper proposes an interference management scheme that adaptively constructs a femtocell cluster, which is a group of femtocell base stations that share the same frequency band. The performance evaluation shows that the proposed scheme can enhance the performance of the macrocell tier and maintain a signal-to-interference-plus-noise ratio above the outage level for about 99% of femtocell users.

Research on Function and Policy for e-Government System using Semantic Technology (전자정부내 의미기반 기술 도입에 따른 기능 및 정책 연구)

  • Go, Gwang-Seop;Jang, Yeong-Cheol;Lee, Chang-Hun
    • 한국디지털정책학회:학술대회논문집 / 2007.06a / pp.79-87 / 2007
  • This paper aims to offer a solution based on semantic document classification to improve e-Government utilization and efficiency for people using their own information retrieval system and linguistic expressions. Generally, semantic document classification is an approach that classifies documents based on the diverse relationships between keywords in a document without fully describing the hierarchical concepts between keywords. Our approach considers the deep meanings within the context of the document and radically enhances information retrieval performance. We propose, test, and evaluate the Concept Weight Document Classification (CoWDC) method, which goes beyond existing keyword and simple thesaurus/ontology methods by fully considering the concept hierarchy of various concepts. To verify the superiority of the semantic retrieval technology through the CoWDC test results and to integrate it efficiently into e-Government, we recognized that creation of a thesaurus, management of the operating system, expansion of the knowledge base, and improvements in search service and accuracy at the national level were needed.

A Systematic Approach to Accident Scenario Analysis: Child Safety Seat Case Study (체계적 사고 시나리오 분석기법을 이용한 유아용 안전의자 사례연구)

  • Byun, Seong-Nam;Lee, Dong-Hoon
    • IE interfaces / v.15 no.2 / pp.114-125 / 2002
  • The objective of this paper is to describe a systematic accident scenario analysis method (SASA) suited to creating accident scenarios for the design of safer products. This approach was inspired by the Quality Function Deployment (QFD) method, which is conventionally used in quality management. In this study, the QFD provides a formal and systematic scheme for devising accident scenarios while maintaining objectivity. SASA consists of three key stages, broken down into a series of consecutive steps: (1) developing an accident analysis tableau, (2) devising the accident scenarios using the accident analysis tableau, (3) performing a feasibility test, a clustering process and a patterning process, and finally (4) performing quantitative evaluation of each accident scenario. The SASA was applied to a case study of child safety seats. The accident analysis tableau yielded a maximum of 2,828 accident scenarios from all possible relationships between the hazard factors and situation characteristics. Among them, 270 scenarios were derived through the feasibility test and the clustering process. The patterning process reduced them to 29 patterns representative of all accident scenarios. Based on an intensive analysis of the accident patterns, design guidelines for a safer child safety seat were recommended. The implications of the study on the child safety seat case were then discussed.