• Title/Summary/Keyword: clustering techniques


An Analysis of the Research Methodologies and Techniques in the Industrial Engineering Using Text Mining (텍스트 마이닝을 이용한 산업공학 연구기법의 분석)

  • Cho, Geun Ho;Lim, Si Yeong;Hur, Sun
    • Journal of Korean Institute of Industrial Engineers / v.40 no.1 / pp.52-59 / 2014
  • We survey 3,857 journal articles published in four domestic academic journals in the industrial engineering field between 1975 and 2012. We apply text mining to the titles, abstracts, and keywords of the papers to extract the methodologies and techniques they adopt, and then aggregate and merge similar ones to obtain a final set of 38 representative methodologies and techniques. We study trends in these methodologies and techniques by analyzing their frequencies, clustering them, and finding association rules among them. The results can shed light on the choice of tools for future education and research in industrial engineering and related areas.
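The frequency and association-rule analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `association_rules`, the thresholds, and the sample `papers` data are all assumptions made for illustration. Each paper is treated as a set of detected techniques, and pairwise rules are kept when their support and confidence reach the chosen thresholds.

```python
from collections import Counter
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) tuples for item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for items in transactions:
        unique = sorted(set(items))
        item_counts.update(unique)                    # frequency of each technique
        pair_counts.update(combinations(unique, 2))   # co-occurrence of each pair
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Hypothetical data: each set lists the techniques detected in one paper.
papers = [
    {"simulation", "queueing"},
    {"simulation", "queueing", "optimization"},
    {"optimization", "heuristics"},
    {"simulation", "optimization"},
]
rules = association_rules(papers, min_support=0.5, min_confidence=0.6)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Here support is the share of papers that contain both techniques, and confidence is the share of papers containing the antecedent that also contain the consequent.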

Systematic Review of Bug Report Processing Techniques to Improve Software Management Performance

  • Lee, Dong-Gun;Seo, Yeong-Seok
    • Journal of Information Processing Systems / v.15 no.4 / pp.967-985 / 2019
  • Bug report processing is a key element of bug fixing in modern software maintenance. Bug reports are not processed immediately after submission; they pass through several steps, such as bug report deduplication and bug report triage, before bug fixing begins. This workflow is very inefficient because all of these steps are performed manually. Software engineers have persistently highlighted the need to automate them, and many automation techniques have been proposed for bug report processing; however, the accuracy of the existing methods is not satisfactory. This study therefore surveys existing techniques for bug report processing with the aim of improving their accuracy. The review of each method consists of a description, the techniques used, the experiments, and the comparison results. The results indicate that research on bug deduplication is still lacking and that more studies integrating clustering and natural language processing are needed. They further indicate that, although all triage studies are based on machine learning, results from deep learning studies are still insufficient.

Comparisons on Clustering Methods: Use of LMS Log Variables on Academic Courses

  • Jo, Il-Hyun;PARK, Yeonjeong;SONG, Jongwoo
    • Educational Technology International / v.18 no.2 / pp.159-191 / 2017
  • Academic analytics helps university decision-makers allocate limited resources more effectively. In particular, clustering academic courses by their usage patterns and levels on a Learning Management System (LMS) helps in understanding instructors' pedagogical approaches and the level of technology integration. The clustering results can also help determine the proper range and level of financial and technical support. However, different clustering methods often produce different results. The purpose of this study is to draw implications from three clustering methods: the Gaussian Mixture Model, K-Means clustering, and hierarchical clustering. As a case study, we clustered academic courses in higher education by their LMS usage levels and patterns using these three techniques. We analyzed 2,639 courses offered during the fall 2013 semester at a large private university in South Korea, using 13 observation variables that represent the characteristics of academic courses. The results show the strengths and weaknesses of each clustering method and suggest that academic leaders and university staff should examine LMS usage levels and patterns in more detail and combine different analytic methods when making strategic decisions on LMS development.
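One common way to quantify how far two clustering methods disagree on the same items is the Rand index, sketched below. The abstract does not say which agreement measure, if any, the authors used, so this is only an illustrative assumption; the label lists `kmeans_labels` and `hierarchical_labels` are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree (same vs. different cluster)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical cluster assignments of six courses from two different methods.
kmeans_labels = [0, 0, 1, 1, 2, 2]
hierarchical_labels = [1, 1, 0, 0, 0, 2]
score = rand_index(kmeans_labels, hierarchical_labels)
print(f"Rand index: {score:.2f}")
```

The measure ignores the label names themselves, so two methods that produce the same grouping under different cluster numbers still score 1.0.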

Alert Correlation Analysis based on Clustering Technique for IDS (클러스터링 기법을 이용한 침입 탐지 시스템의 경보 데이터 상관관계 분석)

  • Shin, Moon-Sun;Moon, Ho-Sung;Ryu, Keun-Ho;Jang, Jong-Su
    • The KIPS Transactions:PartC / v.10C no.6 / pp.665-674 / 2003
  • In this paper, we propose an approach that correlates alerts using clustering analysis, a data mining technique, in order to support intrusion detection systems. Intrusion detection techniques are still far from perfect. Current intrusion detection systems cannot fully detect novel attacks or variations of known attacks without generating a large number of false alerts. In addition, all current intrusion detection systems focus on low-level attacks or anomalies. Consequently, it is difficult to understand the intrusion behind the alerts and take appropriate actions. Clustering analysis groups data objects into clusters such that objects belonging to the same cluster are similar, while those belonging to different clusters are dissimilar. By using a clustering technique, we can analyze alert data efficiently and extract high-level knowledge about attacks. In particular, it is possible to classify new types of alerts as well as existing ones. Clustering also helps in understanding the logical steps and strategies behind a series of attacks using sequences of clusters, and it can potentially be applied to predict attacks in progress.
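The idea of grouping similar alerts into clusters can be sketched with a simple single-pass (leader) clustering. This is a swapped-in illustration, not the algorithm of the paper: the alert fields (`src`, `dst`, `signature`), the similarity measure, and the threshold are all hypothetical.

```python
def similarity(alert_a, alert_b, keys):
    """Share of the given attributes on which two alerts have equal values."""
    return sum(alert_a[k] == alert_b[k] for k in keys) / len(keys)


def cluster_alerts(alerts, keys, threshold):
    """Single-pass clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of alerts; its first element is the leader
    for alert in alerts:
        for cluster in clusters:
            if similarity(alert, cluster[0], keys) >= threshold:
                cluster.append(alert)
                break
        else:
            # No existing cluster is close enough, so this alert starts a new one.
            clusters.append([alert])
    return clusters


# Hypothetical alert records.
alerts = [
    {"src": "10.0.0.5", "dst": "10.0.0.9", "signature": "port_scan"},
    {"src": "10.0.0.5", "dst": "10.0.0.9", "signature": "ssh_brute_force"},
    {"src": "192.168.1.7", "dst": "10.0.0.20", "signature": "sql_injection"},
    {"src": "10.0.0.5", "dst": "10.0.0.9", "signature": "port_scan"},
]
clusters = cluster_alerts(alerts, keys=("src", "dst", "signature"), threshold=0.6)
for index, cluster in enumerate(clusters):
    print(index, [a["signature"] for a in cluster])
```

In this toy data the scan and the brute-force attempt between the same hosts land in one cluster, which hints at how a sequence of alerts could be read as steps of one attack.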

Brain Magnetic Resonance Image Segmentation Using Adaptive Region Clustering and Fuzzy Rules (적응 영역 군집화 기법과 퍼지 규칙을 이용한 자기공명 뇌 영상의 분할)

  • 김성환;이배호
    • Proceedings of the IEEK Conference / 1999.11a / pp.525-528 / 1999
  • In this paper, we describe a segmentation method for brain magnetic resonance (MR) images that uses a region clustering technique with the statistical distribution of the gradient image and fuzzy rules. A brain MR image consists of gray matter, white matter, and cerebrospinal fluid. However, because of noise, overlap, vagueness, and various parameters, segmenting an MR image is a very difficult task. We use gradient information rather than the intensity taken directly from the MR images, and we find appropriate thresholds for region classification using gradient approximation, the Rayleigh distribution function, region clustering, and merging techniques. We then propose adaptive fuzzy rules to extract anatomical structures and diseases from brain MR image data. The experimental results show that the proposed segmentation algorithm gives better performance than traditional segmentation techniques.


Novel Techniques for Real Time Computing Critical Clearing Time SIME-B and CCS-B

  • Dinh, Hung Nguyen;Nguyen, Minh Y.;Yoon, Yong Tae
    • Journal of Electrical Engineering and Technology / v.8 no.2 / pp.197-205 / 2013
  • Real-time transient stability assessment depends mainly on real-time prediction. Unfortunately, conventional techniques based on offline analysis are too slow and unreliable in complex power systems. Hence, fast and reliable stability prediction methods and simple stability criteria must be developed for real-time purposes. In this paper, we propose two new methods for determining the critical clearing time in real time based on clustering identification. The article covers three main parts: (i) clustering generators and recognizing the critical group; (ii) replacing the multi-machine system with a two-machine dynamic equivalent and, eventually, a one-machine-infinite-bus system; and (iii) presenting a new method to predict the post-fault trajectory and two simple algorithms for calculating the critical clearing time, each based on a different transient stability criterion. The methods are expected to determine the critical clearing time within 100 ms to 150 ms with acceptable accuracy.

A Clustering Method Based on Path Similarities of XML Data (XML 데이타의 경로 유사성에 기반한 클러스터링 기법)

  • Choi Il-Hwan;Moon Bong-Ki;Kim Hyoung-Joo
    • Journal of KIISE:Databases / v.33 no.3 / pp.342-352 / 2006
  • Current studies on storing XML data focus on either mapping XML data efficiently to an existing RDBMS or developing native XML storage. Some native XML storage systems store each XML node as a parsed object. With this storage method, clustering, the physical arrangement of the objects, can be an important factor in improving performance. In this paper, we propose re-clustering techniques that store an XML document efficiently. The proposed clustering technique uses path similarities among data nodes, which can reduce page I/Os when returning query results. It can also process a path query using as few clusters as possible instead of all clusters. This enables efficient path query processing because the search space is reduced by skipping unnecessary data. Finally, we apply existing clustering techniques to XML data storage and compare their performance with that of the proposed technique. Our results show that the performance of XML storage can be improved by using a proper clustering technique.
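To see why grouping nodes by path lets a path query skip unrelated data, consider the simplified sketch below. It groups nodes whose root-to-node label paths are identical, which is a much cruder criterion than the path similarity the paper proposes; the function `cluster_by_path` and the sample document are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def cluster_by_path(root):
    """Group element nodes by their root-to-node label path (identical paths only)."""
    clusters = defaultdict(list)

    def visit(node, path):
        clusters[path].append(node)
        for child in node:
            visit(child, f"{path}/{child.tag}")

    visit(root, f"/{root.tag}")
    return dict(clusters)


# Hypothetical document.
document = ET.fromstring(
    "<lib>"
    "<book><title>A</title><author>X</author></book>"
    "<book><title>B</title></book>"
    "<journal><title>C</title></journal>"
    "</lib>"
)
clusters = cluster_by_path(document)

# A path query only has to read the one cluster that matches its path.
book_titles = [node.text for node in clusters["/lib/book/title"]]
print(book_titles)
```

In this sketch the query for book titles reads one cluster out of six and never touches the journal nodes, which is the search-space reduction the abstract describes.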

A Comparison of Clustering Algorithm in Data Mining

  • Lee, Yung-Seop;An, Mi-Young
    • Journal of the Korean Data and Information Science Society / v.14 no.4 / pp.725-736 / 2003
  • To provide the information needed to make a decision, it is important to know the relationships or patterns among the variables in a database. Grouping objects that have similar characteristics or patterns is called cluster analysis, which is one of the data mining techniques. In this study, we compare several partitioning clustering algorithms based on the statistical distance or total variance within each cluster.

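A comparison based on the total variance within each cluster can be sketched with the within-cluster sum of squares, a standard criterion for partitioning methods. Whether the paper uses exactly this quantity is an assumption; the function names and the sample points are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    count = len(points)
    return tuple(sum(coords) / count for coords in zip(*points))


def within_cluster_ss(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    total = 0.0
    for members in groups.values():
        center = centroid(members)
        total += sum(
            sum((x - c) ** 2 for x, c in zip(point, center))
            for point in members
        )
    return total


# Hypothetical data: two tight pairs of points that lie far apart.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
tight_partition = [0, 0, 1, 1]   # groups nearby points together
loose_partition = [0, 1, 0, 1]   # groups distant points together
print(within_cluster_ss(points, tight_partition))
print(within_cluster_ss(points, loose_partition))
```

A lower value means more compact clusters, so two algorithms run with the same number of clusters can be ranked by this score.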

Pre-Adjustment of Incomplete Group Variable via K-Means Clustering

  • Hwang, S.Y.;Hahn, H.E.
    • Journal of the Korean Data and Information Science Society / v.15 no.3 / pp.555-563 / 2004
  • In classification and discrimination, we often face an incomplete group variable, which typically arises from many missing values and/or unreliable cases. This paper suggests using K-means clustering to pre-adjust the incompleteness, after which classification based on generalized statistical distance is performed. To illustrate the proposed procedure, a simulation study compares it with CART in data mining and with traditional techniques that ignore the incompleteness of the group variable. The simulation study shows that our methodology outperforms these alternatives.

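One possible reading of pre-adjusting an incomplete group variable with K-means is sketched below. It is an assumption, not the authors' procedure: centroids are seeded from the cases whose group is known, the usual K-means assignment and update steps run over all cases, and only missing groups (`None`) are then filled in from the final clusters. The function `preadjust_groups` and the data are hypothetical.

```python
def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def preadjust_groups(points, groups, max_iter=50):
    """Fill None entries in `groups` using K-means seeded from the known groups."""
    names = sorted({g for g in groups if g is not None})
    if not names:
        raise ValueError("at least one case must have a known group")
    # Seed one centroid per observed group from its labeled members.
    centroids = {
        g: mean([p for p, label in zip(points, groups) if label == g])
        for g in names
    }
    clusters = None
    for _ in range(max_iter):
        # Assignment step: every case goes to its nearest centroid.
        nearest = [min(names, key=lambda g: sq_dist(p, centroids[g])) for p in points]
        if nearest == clusters:
            break  # assignments are stable, so the centroids are too
        clusters = nearest
        # Update step: recompute each centroid from its current members.
        for g in names:
            members = [p for p, c in zip(points, clusters) if c == g]
            if members:
                centroids[g] = mean(members)
    # Keep known groups as given; fill only the missing ones.
    return [label if label is not None else c for label, c in zip(groups, clusters)]


# Hypothetical data: two cases have a missing group.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
groups = ["A", "A", None, "B", None, "B"]
adjusted = preadjust_groups(points, groups)
print(adjusted)
```

The completed group variable could then be passed to a classifier; that second stage is outside this sketch.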

Biomedical Ontologies and Text Mining for Biomedicine and Healthcare: A Survey

  • Yoo, Ill-Hoi;Song, Min
    • Journal of Computing Science and Engineering / v.2 no.2 / pp.109-136 / 2008
  • In this survey paper, we discuss biomedical ontologies and the major text mining techniques applied to biomedicine and healthcare. Biomedical ontologies such as UMLS are currently being adopted in text mining approaches because they provide domain knowledge. In addition, biomedical ontologies help resolve many linguistic problems that arise when text mining approaches handle biomedical literature. As the first example of text mining, we survey document clustering. Because a document set normally covers multiple topics, text mining approaches use document clustering as a preprocessing step to group similar documents. Document clustering can also inform the biomedical literature searches required for the practice of evidence-based medicine. We introduce Swanson's UnDiscovered Public Knowledge (UDPK) model, which generates biomedical hypotheses from biomedical literature such as MEDLINE by discovering novel connections among logically related biomedical concepts. Another important area of text mining is document classification, a valuable tool for biomedical tasks that involve large amounts of text. We survey well-known classification techniques in biomedicine. As the last example of text mining in biomedicine and healthcare, we survey information extraction, the process of scanning text for information relevant to some interest, including extracting entities, relations, and events. We also address techniques for, and issues in, evaluating text mining applications in biomedicine and healthcare.