Feature Extraction of Web Document using Association Word Mining (연관 단어 마이닝을 사용한 웹문서의 특징 추출)

  • 고수정;최준혁;이정현
    • Journal of KIISE:Databases
    • v.30 no.4
    • pp.351-361
    • 2003
  • Previous studies that extract document features through word association suffer from three problems: profiles must be updated periodically, noun phrases must be handled, and probabilities must be calculated for indices. We propose a more effective feature extraction method based on association word mining. Using the Apriori algorithm, the method represents a document's features not as single words but as association-word vectors. The association words that Apriori extracts from a document depend on the confidence, the support, and the number of words composing each association word, and this paper proposes an effective way to determine these three values. Because the method does not use a profile, no profile needs updating, and noun phrases are generated automatically from the confidence and support used by Apriori, without calculating index probabilities. We apply the proposed method to document classification with a Naive Bayes classifier and compare it with information gain and TF-IDF. We also compare it with document classification methods that use index association and word association, each based on a probability model.
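The role of support, confidence, and the number of composing words in the abstract above can be illustrated with a minimal Apriori sketch. It assumes each sentence of a document is treated as one transaction of words; the function names, the toy sentences, and the thresholds are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def apriori(transactions, min_support, max_size):
    """Return {frozenset(words): support} for word sets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every distinct word.
    candidates = {frozenset([w]) for t in transactions for w in t}
    frequent = {}
    size = 1
    while candidates and size <= max_size:
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join frequent k-sets into (k+1)-set candidates and prune any
        # candidate that has an infrequent k-subset.
        size += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(s) in level for s in combinations(union, size - 1)
            ):
                candidates.add(union)
    return frequent

def confidence(frequent, antecedent, consequent):
    """confidence(A -> B) = support(A and B together) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

# Toy "document": each sentence is one transaction (an assumption).
sentences = [
    {"web", "document", "mining"},
    {"web", "document", "feature"},
    {"web", "mining"},
    {"document", "feature"},
]
frequent = apriori(sentences, min_support=0.5, max_size=2)
association_words = [sorted(s) for s in frequent if len(s) == 2]
```

Raising `min_support` or lowering `max_size` shrinks the set of association words, which is why the choice of these values affects the resulting feature vectors.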

Automatic Prostate Segmentation from Ultrasound Images using Morphological Features (형태학적 특징을 이용한 초음파 영상에서의 자동 전립선 분할)

  • Kim, Kwang Baek
    • Journal of the Korea Institute of Information and Communication Engineering
    • v.26 no.6
    • pp.865-871
    • 2022
  • In this paper, we propose a method for extracting the prostate region from ultrasound images using the morphological characteristics of the prostate. In the first step, the edge area of the prostate image is extracted: the contrast of the image is altered and its histogram is used to extract base objects for detecting the upper edge of the prostate region, and the lower edges of these base objects are then connected by monotone cubic spline interpolation to form the upper edge. In the second step, Otsu's binarization is applied to the region below the extracted upper edge to extract the lower edge of the prostate. In the last step, the upper and lower edges are connected to extract the prostate region. A comparison of the extracted region with manually measured regions showed that the morphological characteristics of the prostate in ultrasound images can be used to extract the prostate region.
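The second step relies on Otsu's binarization, which picks the gray level that maximizes the between-class variance of the histogram. The following is a minimal sketch of that one technique on a flat list of 8-bit gray levels; the toy `region` values are invented, and the contrast adjustment and spline steps of the paper are not shown.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form class 0, the rest form class 1.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in class 0
    sum0 = 0  # sum of intensities in class 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # class 0 still empty
        if w1 == 0:
            break     # class 1 empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Toy region: a dark half and a bright half (hypothetical values).
region = [10] * 50 + [200] * 50
t = otsu_threshold(region)
mask = [1 if p > t else 0 for p in region]
```

In practice the same computation would be applied only to the pixels below the extracted upper edge, as the abstract describes.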

A Collaborative Recommendation Method based on Fuzzy Associative Memory (퍼지연상기억장치에 기반한 협력 추천 방법)

  • 이동섭;고일주;김계영
    • Journal of KIISE:Software and Applications
    • v.31 no.8
    • pp.1054-1061
    • 2004
  • With the rapid evolution of the Internet, people can easily access information, and the amount of information is also increasing rapidly. Techniques that automatically extract the required information are therefore very important for reducing the time and effort spent on retrieval. In this paper, we describe a collaborative filtering system that automatically recommends high-quality information to users with similar interests in arbitrarily narrow information domains. The system asks a user to rate a gauge set of items, evaluates the user's ratings, and suggests a recommendation set of items. We interpret the evaluation process as an inference mechanism that maps a gauge set to a recommendation set, and we accomplish the mapping with a FAM (Fuzzy Associative Memory). We implemented the system on a Web server and tested its performance on the retrieval of technical papers, especially in the field of information technology. The experimental results show that it may provide reliable recommendations.
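The abstract does not state how the FAM is encoded, so the following sketch assumes a Kosko-style FAM: each stored (gauge ratings, recommendation ratings) pair is encoded by correlation-minimum, the pairs are merged by pointwise maximum, and recall uses max-min composition. The ratings, scaled to [0, 1], and all function names are hypothetical.

```python
def encode(a, b):
    """Correlation-minimum encoding: m[i][j] = min(a[i], b[j])."""
    return [[min(ai, bj) for bj in b] for ai in a]

def combine(matrices):
    """Merge several FAM matrices by pointwise maximum."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [max(m[i][j] for m in matrices) for j in range(cols)]
        for i in range(rows)
    ]

def recall(a, m):
    """Max-min composition: b[j] = max over i of min(a[i], m[i][j])."""
    return [
        max(min(ai, row[j]) for ai, row in zip(a, m))
        for j in range(len(m[0]))
    ]

# Two stored users: ratings on a 2-item gauge set and a 2-item
# recommendation set (invented values).
memory = combine([
    encode([1.0, 0.2], [0.9, 0.1]),
    encode([0.1, 1.0], [0.2, 0.8]),
])

# A new user rates only the gauge set; the FAM infers recommendation scores.
new_user_scores = recall([0.8, 0.3], memory)
```

With these toy values the new user, who resembles the first stored user, receives a higher score for the first recommendation item than for the second.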

Word Sense Disambiguation Based on Local Syntactic Relations and Sense Co-occurrence Information (국소 구문 관계 및 의미 공기 정보에 기반한 명사 의미 모호성 해소)

  • Kim, Young-Kil;Hong, Mun-Pyo;Kim, Chang-Hyun;Seo, Young-Ae;Yang, Seong-Il;Ryu, Chul;Huang, Yin-Xia;Choi, Sung-Kwon;Park, Sang-Kyu
    • Annual Conference on Human and Language Technology
    • 2002.10e
    • pp.184-188
    • 2002
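  • Unlike approaches that simply use contextual co-occurrence information among nearby words, this paper proposes a method for noun sense disambiguation based on local syntactic relations and sense co-occurrence information. Because structural analysis is difficult, existing WSD methods did not sufficiently consider the syntactic relations of a sentence and instead tried to determine a sense from co-occurrence with surrounding words. This paper instead proposes a noun sense disambiguation method that considers local syntactic relations, covering not only the semantic relations between a verb phrase and its arguments but also the semantic relations within noun phrases. Noun senses were classified according to the verbs they co-occur with, to suit the purposes of a machine translation system. We used the sense co-occurrence information of 74,880 semantic case frames annotated with noun sense codes, which serve as Korean-Chinese machine translation knowledge, and automatically extracted sense co-occurrence information and noun-phrase sense co-occurrence information from a morphologically tagged corpus in such a way that no sense ambiguity arises. In experiments, the method achieved a disambiguation accuracy of 83.9% on nouns with sense ambiguity.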

Finding Measure Position Using Combination Rules of Musical Notes in Monophonic Song (단일 음원 노래에서 음표의 조합 규칙을 이용한 마디 위치 찾기)

  • Park, En-Jong;Shin, Song-Yi;Lee, Joon-Whoan
    • The Journal of the Korea Contents Association
    • v.9 no.10
    • pp.1-12
    • 2009
  • When notes are combined within one measure, their intervals stand in regular multiple relations. This paper presents a method to find the exact measure positions in a monophonic song based on those relations. In the proposed method, the individual intervals are first segmented, and the rules that state the multiple relations are then used to find the measure positions. The measures can serve as foundational information for extracting the beat and tempo of a song, which can be used as background knowledge in an automatic music transcription system. The proposed method detected the measure positions exactly for 11 of the 12 test songs, which were monophonic vocal songs sung by men and women, and failed on one. In addition, the beat and tempo of a song can be extracted from the detected measure positions by applying music theory.

A Big Data Analysis System Model Using Field Data from Manufacturing Companies (제조기업 현장 데이터를 이용한 빅데이터 분석시스템 모델)

  • Kim, Jae-Jung;Seong, Baek-Min;Yu, Jae-Gon;Gang, Chan-U;Kim, Jong-Bae
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • 2015.05a
    • pp.741-743
    • 2015
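  • Many methods for handling multidimensional data in today's BI (Business Intelligence) systems have been proposed, making it possible to handle data of a terabyte or more. However, small and medium-sized manufacturers, which lack IT specialists and the capacity to invest in IT, find it hard to keep pace. In addition, most of these companies have not adopted a manufacturing execution system (MES), and most of the field data that does exist is kept as handwritten records or Excel files, so data analysis and decision making are carried out manually. As a result, the causes of defects and abnormal events cannot be identified clearly, which makes data analysis difficult. To strengthen the competitiveness of small and medium-sized manufacturers, this study developed a big data analysis system model that automatically collects the data used on the manufacturing floor, then cleans, processes, and stores it. In this model, data residing in ERP, MIS, and similar systems goes through an ETL (Extract, Transform, Load) process that uses each system's database functions to extract, clean, and collect the data. Information recorded on site in unstructured form (e.g., Excel) is handled by the ODE (Office Data Excavation) module, which automatically recognizes document patterns and extracts, cleans, and collects the content as structured information. The stored data is visualized with D3.js, an open-source data visualization library, whose varied charts provide strong visual effects, laying the groundwork for analyzing relationships among data and for multidimensional analysis and thereby effectively supporting decision making. In addition, the open-source Spago BI is used in place of high-priced big data solutions to provide an economical big data solution. The expected benefits of this study are as follows. First, it lays a foundation for effective decision making centered on field data. Second, association and multidimensional analysis based on integrated data improves management efficiency. Finally, building an analysis system suited to the environment of small and medium-sized manufacturers strengthens their competitiveness and productivity.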

An Automatic Classification System of Korean Documents Using Weight for Keywords of Document and Word Cluster (문서의 주제어별 가중치 부여와 단어 군집을 이용한 한국어 문서 자동 분류 시스템)

  • Hur, Jun-Hui;Choi, Jun-Hyeog;Lee, Jung-Hyun;Kim, Joong-Bae;Rim, Kee-Wook
    • The KIPS Transactions:PartB
    • v.8B no.5
    • pp.447-454
    • 2001
  • Automatic document classification assigns unlabeled documents to existing classes. It can be applied to classifying newsgroup articles, classifying web documents, and producing more precise information retrieval results by learning from users. In this paper, we use a weighted Bayesian classifier that assigns weights to the keywords of a document to improve classification accuracy. If the system cannot classify a document properly because the document has too few words to serve as features, it uses relevance word clusters to supplement the document's features. The clusters are built by automatic word clustering from a corpus. As a result, the proposed system outperformed the existing classification system in classification accuracy on Korean documents.
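One way to read "a weighted Bayesian classifier that assigns weights to keywords" is a multinomial Naive Bayes model in which each word's log-likelihood is multiplied by a keyword weight. The sketch below follows that reading as an assumption; the paper's actual weighting scheme is not given in the abstract, and the training documents, weights, and function names are invented.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (label, [words]). Returns (priors, word_counts, vocab)."""
    class_counts = Counter(label for label, _ in docs)
    word_counts = {label: Counter() for label in class_counts}
    for label, words in docs:
        word_counts[label].update(words)
    vocab = {w for _, words in docs for w in words}
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    return priors, word_counts, vocab

def classify(words, model, weights=None, default_weight=1.0):
    """Pick the class with the highest keyword-weighted log posterior."""
    priors, word_counts, vocab = model
    weights = weights or {}
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        total = sum(word_counts[c].values())
        score = math.log(prior)
        for w in words:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Laplace-smoothed estimate of P(w | c).
            p = (word_counts[c][w] + 1) / (total + len(vocab))
            # Keywords count more through a weight above 1 (assumption).
            score += weights.get(w, default_weight) * math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best

model = train([
    ("sports", ["game", "team", "score"]),
    ("politics", ["vote", "party", "score"]),
])
doc = ["team", "party"]
label_party_keyword = classify(doc, model, weights={"party": 3.0})
label_team_keyword = classify(doc, model, weights={"team": 3.0})
```

The same two-word document is assigned to different classes depending on which word is treated as the keyword, which is the effect the weighting is meant to have.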

Detection of Flaws in Air Deck using Non-Destructive Testing (비파괴 검사를 이용한 항공 갑판의 결함 검출)

  • Kim, Kwang-Baek;Cho, Jae-Hyun
    • Journal of the Korea Institute of Information and Communication Engineering
    • v.15 no.9
    • pp.1865-1870
    • 2011
  • In this paper, we propose an effective method that automatically detects flaws in an air deck by using non-destructive testing. First, a Gamma correlation transform and 7 × 7 and 13 × 13 Sobel masks are applied to the air deck image acquired by non-destructive testing in order to detect its edges. Second, the edge detection area is smoothed and corrected by a mean binarization method. Finally, the flaw regions in the air deck are detected by a labeling method after noise is removed by erosion and dilation operations. The experimental results show that the proposed detection method is effective for air decks.
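The final step, noise removal by erosion and dilation followed by labeling, can be sketched on a small binary image. The 3 × 3 structuring element, the 4-connectivity, and the toy image are assumptions made for illustration; the abstract does not specify them, and the Sobel and binarization steps are not shown.

```python
from collections import deque

OFFSETS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def erode(img):
    """Keep a pixel only if its whole 3x3 neighborhood is set."""
    h, w = len(img), len(img[0])
    return [
        [int(all(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                 for dy, dx in OFFSETS))
         for x in range(w)]
        for y in range(h)
    ]

def dilate(img):
    """Set a pixel if any pixel in its 3x3 neighborhood is set."""
    h, w = len(img), len(img[0])
    return [
        [int(any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                 for dy, dx in OFFSETS))
         for x in range(w)]
        for y in range(h)
    ]

def label_components(img):
    """4-connected component labeling by breadth-first search."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

# Toy binary image: one 3x3 "flaw" plus an isolated noise pixel.
image = [[0] * 7 for _ in range(7)]
for y in range(1, 4):
    for x in range(1, 4):
        image[y][x] = 1
image[5][5] = 1

opened = dilate(erode(image))          # erosion then dilation removes the noise
labels, count = label_components(opened)
```

Labeling the raw image would report two regions; after erosion and dilation only the 3 × 3 flaw remains as a single labeled region.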

The Detection Scheme of Graph Area from Sea Level Measurements Recording Paper Images (조위관측기록지 이미지에서 그래프 영역 검출 기법)

  • Yu, Young-Jung;Kim, Young-Ju;Park, Seong-Ho
    • Journal of the Korea Institute of Information and Communication Engineering
    • v.14 no.11
    • pp.2555-2562
    • 2010
  • In this paper, we propose a method that extracts the sea level measurement graph from an image of sea level measurement recording paper with little interaction. First, a pixel that lies in the graph area is selected. Background pixels are then determined automatically from the distance between the selected pixel and the other pixels in LAB color space. In each vertical line, the pixel nearest to the selected pixel in LAB color space is extracted, and the graph area is determined from those pixels. Experimental results show that the sea level measurement graph can be extracted from various recording paper images with little interaction.
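The per-column search described above can be sketched as follows, assuming the image has already been converted to LAB and is stored as rows of `(L, a, b)` tuples. The `max_dist` cutoff used to reject background-only columns, the function name, and the toy colors are hypothetical; the abstract does not say how the background distance is chosen.

```python
import math

def trace_graph(lab_image, seed_color, max_dist):
    """For each column, return the row whose LAB color is nearest the seed.

    Columns whose nearest pixel is farther than max_dist are treated as
    background only and reported as None.
    """
    h, w = len(lab_image), len(lab_image[0])
    rows = []
    for x in range(w):
        y_best = min(
            range(h), key=lambda y: math.dist(lab_image[y][x], seed_color)
        )
        d = math.dist(lab_image[y_best][x], seed_color)
        rows.append(y_best if d <= max_dist else None)
    return rows

PAPER = (95.0, 0.0, 0.0)   # light background (invented LAB value)
INK = (40.0, 10.0, -50.0)  # bluish graph line (invented LAB value)

lab_image = [
    [PAPER, PAPER, PAPER],
    [PAPER, INK,   PAPER],
    [INK,   PAPER, PAPER],
]
seed = (42.0, 9.0, -48.0)  # color of the pixel the user clicked
curve = trace_graph(lab_image, seed, max_dist=20.0)
```

Here `curve` holds one row index per column, with `None` for the last column, which contains no ink.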

An Experimental Approach of Keyword Extraction in Korean-Chinese Text (국한문 혼용 텍스트 색인어 추출기법 연구 『시사총보』를 중심으로)

  • Jeong, Yoo Kyung;Ban, Jae-yu
    • Journal of the Korean Society for information Management
    • v.36 no.4
    • pp.7-19
    • 2019
  • The aim of this study is to develop a keyword extraction technique for Korean-Chinese mixed text of the modern period. We considered a Korean morphological analyzer and particles in classical Chinese as possible methods. We applied our method to the journal "Sisachongbo," employing proper-noun dictionaries and a stop word list to extract index terms. The results show that our system achieved better recall and precision than a Chinese morphological analyzer. This study is the first to develop an automatic indexing system for traditional Korean-Chinese mixed text.