• Title/Summary/Keyword: Term weighting

An Image Inpainting Method using Global Information and Distance Weighting (전역적 특성과 거리가중치를 이용한 영상 인페인팅)

  • Kim, Chang-Ki;Kim, Baek-Sop
    • Journal of KIISE: Software and Applications
    • /
    • v.37 no.8
    • /
    • pp.629-640
    • /
    • 2010
  • The exemplar-based inpainting model is widely used to remove objects from natural images and to restore damaged regions. This paper presents a method that improves the performance of the conventional exemplar-based inpainting model by modifying three major parts of the model: the data term, the confidence term, and patch selection. While the conventional data term is calculated from the local gradient, the proposed method uses 16 compass masks to obtain a global gradient, making the method robust to noise. To overcome the problem that the confidence term becomes negligible inside the eliminated region, a method is proposed that makes the confidence term decrease slowly there. The patch selection procedure is modified so that closer patches receive higher weight. Experiments showed that the proposed method produced more natural images and lower reconstruction error than conventional exemplar-based inpainting.
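
The distance-weighted patch selection described above can be sketched as follows. The SSD similarity measure and the trade-off weight `lam` are illustrative assumptions, not the paper's exact formulation.

```python
def best_patch(candidates, lam=0.1):
    """Pick a source patch for exemplar-based inpainting.

    candidates: list of (ssd, distance) pairs, one per source patch,
    where ssd is the sum of squared differences against the target
    patch and distance is the spatial distance to the target region.
    Adding a distance penalty makes closer patches win when their
    similarity is comparable, as the abstract proposes.
    """
    return min(range(len(candidates)),
               key=lambda i: candidates[i][0] + lam * candidates[i][1])

# Of three candidates with comparable SSD, the nearby one (index 1) wins.
print(best_patch([(10.0, 100.0), (10.0, 5.0), (12.0, 1.0)]))  # 1
```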

A Study on Improving the Performance of Document Classification Using the Context of Terms (용어의 문맥활용을 통한 문헌 자동 분류의 성능 향상에 관한 연구)

  • Song, Sung-Jeon;Chung, Young-Mee
    • Journal of the Korean Society for Information Management
    • /
    • v.29 no.2
    • /
    • pp.205-224
    • /
    • 2012
  • One limitation of the BOW method is that each term is recognized only by its form, failing to represent the term's meaning or thematic background. To overcome this limitation, profiles for each term were defined by thematic category depending on contextual characteristics. In this study, a specific term was used as a classification feature based on its meaning or thematic background by comparing the contexts in those profiles with the term's occurrences in an actual document. The experiment was conducted in three phases: term weighting, ensemble classifier implementation, and feature selection. Classification performance was enhanced in all phases, with the ensemble classifier showing the highest performance score. The results also showed that the proposed method was effective in reducing the performance bias caused by the total number of learning documents.

A Study on Optimization of Support Vector Machine Classifier for Word Sense Disambiguation (단어 중의성 해소를 위한 SVM 분류기 최적화에 관한 연구)

  • Lee, Yong-Gu
    • Journal of Information Management
    • /
    • v.42 no.2
    • /
    • pp.193-210
    • /
    • 2011
  • This study examined context window sizes and weighting methods to obtain the best word sense disambiguation performance with a support vector machine. The context windows were a 3-word window, a sentence, a 50-byte window, and the whole document around the target word. The weighting methods were Binary, Term Frequency (TF), TF × Inverse Document Frequency (IDF), and Log TF × IDF. The 50-byte context window gave the best result, and the Binary weighting method showed the best performance among the weighting schemes.
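
The four weighting schemes compared in this abstract can be written down directly. The formulas below are common textbook forms, not necessarily the exact variants used in the paper.

```python
import math

def weight(tf, df, n_docs, scheme):
    """Term weight for one term in one context window.

    tf: raw frequency in the window, df: document frequency,
    n_docs: collection size. The four schemes mirror the ones
    compared in the abstract: Binary, TF, TF*IDF, Log TF*IDF.
    """
    idf = math.log(n_docs / df)
    if scheme == "binary":
        return 1.0 if tf > 0 else 0.0
    if scheme == "tf":
        return float(tf)
    if scheme == "tfidf":
        return tf * idf
    if scheme == "logtfidf":
        return (1 + math.log(tf)) * idf if tf > 0 else 0.0
    raise ValueError(scheme)

# e.g. a term occurring 3 times, appearing in 10 of 1000 documents
print(weight(3, 10, 1000, "binary"))  # 1.0
print(weight(3, 10, 1000, "tfidf"))
```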

A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods (용어 가중치부여 기법을 이용한 로치오 분류기의 성능 향상에 관한 연구)

  • Kim, Pan-Jun
    • Journal of the Korean Society for Information Management
    • /
    • v.25 no.1
    • /
    • pp.211-233
    • /
    • 2008
  • This study examines various weighting methods for improving the performance of automatic classification based on the Rocchio algorithm on two collections (LISA, Reuters-21578). First, three weighting factors are identified for each weighting scheme: a document factor, a document set factor, and a category factor, and the performance of each was investigated. Second, the performance of weighting methods combining the single schemes was examined. Among the single schemes, category-factor-based schemes showed the best performance, document-set-factor-based schemes the second best, and document-factor-based schemes the worst. Among the combined schemes, those combining the document set factor with the category factor (idf*cat) performed better than both the schemes combining the document factor with the category factor (tf*cat or ltf*cat) and the common schemes combining the document factor with the document set factor (tfidf or ltfidf). However, comparing single and combined schemes across the collections, the category-factor-only schemes (cat only) performed best on LISA, while the combined idf*cat schemes performed best on Reuters-21578. Therefore, practical application of these weighting methods requires careful consideration of the categories in the collection to be classified.
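
The single and combined schemes named in the abstract can be sketched as products of the three factors. Here `cat` stands in for a precomputed category-discrimination score; the paper's exact definition of the category factor may differ.

```python
import math

def combined_weight(tf, idf, cat, scheme):
    """Combine document (tf), document-set (idf) and category (cat)
    factors into one term weight, using the scheme names from the
    abstract. cat is assumed to be a category-discrimination score.
    """
    ltf = 1 + math.log(tf) if tf > 0 else 0.0
    schemes = {
        "cat": cat,              # single, category factor only
        "tfidf": tf * idf,       # common combined scheme
        "ltfidf": ltf * idf,
        "tf*cat": tf * cat,
        "ltf*cat": ltf * cat,
        "idf*cat": idf * cat,    # best on Reuters-21578 in the study
    }
    return schemes[scheme]

print(combined_weight(2, 3.0, 0.5, "idf*cat"))  # 1.5
```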

Efficient Term Weighting For Term-based Web Document Search (단어기반 웹 문서 검색을 위한 효과적인 단어 가중치의 계산)

  • 권순만;박병준
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.10a
    • /
    • pp.169-171
    • /
    • 2004
  • The Web (WWW) holds a vast amount of information, and as the Web environment has grown, so has the information on it. Accordingly, it has become important to effectively find web documents that best express the information a user is looking for. In term-based search, documents in which the user's search terms appear are shown to the user, and a weight is computed for each document from the search terms. This paper presents an effective way to compute term weights for such term-based search. While the conventional method computes a weight limited to the frequency with which a term appears, the modified method assigns differentiated weights by classifying occurrences by tag. Compared with the conventional method, the modified method proposed in this paper showed higher accuracy.
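
Weighting term occurrences differently by HTML tag, as this paper proposes, can be sketched as follows. The specific tag weights are illustrative assumptions, not the paper's values.

```python
# Hypothetical tag weights: terms in <title> or <h1> count more than body text.
TAG_WEIGHT = {"title": 3.0, "h1": 2.0, "b": 1.5, "body": 1.0}

def tag_weighted_tf(occurrences):
    """occurrences: list of (term, tag) pairs from one parsed document.
    Returns term -> tag-weighted frequency instead of a plain count,
    so a term in the title contributes more than one in the body.
    """
    scores = {}
    for term, tag in occurrences:
        scores[term] = scores.get(term, 0.0) + TAG_WEIGHT.get(tag, 1.0)
    return scores

print(tag_weighted_tf([("robot", "title"), ("robot", "body"), ("web", "body")]))
# {'robot': 4.0, 'web': 1.0}
```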

Determination of Weighting Factor in the Inverse Model for Estimating Surface Velocity from AVHRR/SST Data (AVHRR/SST로 부터 표층유속을 추정하기 위한 역행렬 모델에서 가중치의 설정)

  • Lee, Tae-Shin;Chung, Jong-Yul;Kang, Hyoun-Woo
    • Journal of the Korean Society of Oceanography
    • /
    • v.30 no.6
    • /
    • pp.543-549
    • /
    • 1995
  • The inverse method has been used to estimate a surface velocity field from sequential AVHRR/SST data. In the model, the equation system was composed of the heat equation and a horizontal divergence minimization, and the velocity field contained in the advective term of the heat equation, linearized on a grid system, was estimated. A constraint, the minimization of horizontal divergence with a weighting factor, was introduced to compensate for the null space (Menke, 1984) of the velocity solutions of the heat equation. Experiments were carried out to set up the range of the weighting factor, and the matrix equation was solved by SVD (Singular Value Decomposition). In the experiments, the scales of the horizontal temperature gradient and of the divergence of the synthetic velocity field were approximated to those of the real field. The neglected diffusive effect and the horizontal variation of heat flux in the heat equation were treated as random temperature errors. According to the results, the minimum of relative error was a more desirable criterion than the minimum of misfit for setting the weighting factor, and the error of the estimated velocity field became small when the weighting factor was of order $10^{-1}$.
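
The structure of such a weighted inverse problem can be sketched as a stacked least-squares system solved by SVD. The matrices here are placeholders for the paper's linearized heat equation (A v = b) and divergence operator (D), not its actual discretization.

```python
import numpy as np

def solve_with_weight(A, b, D, w):
    """Solve the heat-equation system A v = b augmented with the
    divergence-minimization constraint D v = 0, scaled by weighting
    factor w, as one stacked least-squares problem. np.linalg.lstsq
    solves it via SVD, matching the abstract's solution method.
    """
    G = np.vstack([A, w * D])
    d = np.concatenate([b, np.zeros(D.shape[0])])
    v, *_ = np.linalg.lstsq(G, d, rcond=None)
    return v

# Tiny example: a larger w pulls the solution toward zero divergence.
v = solve_with_weight(np.eye(2), np.array([1.0, 2.0]), np.array([[1.0, 1.0]]), 0.1)
print(v)
```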

A Risk Assessment Method for the Long-term Preservation of Electronic Records (전자기록의 장기보존을 위한 위험평가 방법의 제안)

  • Cha, Hyun Chul;Choi, Joo Ho
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.1
    • /
    • pp.79-87
    • /
    • 2019
  • Appropriate strategies are needed to ensure the long-term preservation of various types of electronic records. Proper preservation of electronic records requires decision-making processes for risk assessment, notification, and implementation of preservation measures. To support this, the task of assessing the various risk factors that impede the long-term preservation and utilization of electronic records must be done first. In this study, since electronic records are mostly stored as files, a risk assessment for file-type electronic records is performed. The risk factors required for assessing file formats are derived, and algorithms are developed for calculating the weighting factors and the risk index used to evaluate risk based on the proposed factors. In addition, the proposed methods are applied to the file formats used in Korea, a risk assessment is performed, and the results are analyzed.
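
One common way to combine risk-factor scores and their weighting factors into a single risk index is a normalized weighted sum. This is only an illustrative shape; the paper derives its own weighting and index algorithms.

```python
def risk_index(factors, weights):
    """Weighted mean of normalized risk-factor scores for one file format.

    factors: scores in [0, 1] for each risk factor (e.g. format obsolescence,
    software dependence); weights: the relative importance of each factor.
    Both lists are illustrative, not the paper's derived values.
    """
    assert len(factors) == len(weights)
    total_w = sum(weights)
    return sum(f * w for f, w in zip(factors, weights)) / total_w

print(risk_index([1.0, 0.0, 0.5], [2.0, 1.0, 1.0]))  # 0.625
```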

Localization of Mobile Robot Based on Radio Frequency Identification Devices (RFID를 이용한 이동로봇의 위치인식기술)

  • Lee Hyun-Jeong;Choi Kyu-Cheon;Lee Min-Cheol;Lee Jang-Myung
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.12 no.1
    • /
    • pp.41-46
    • /
    • 2006
  • Ubiquitous location-based services offer helpful services anytime and anywhere by using real-time location information of objects based on a ubiquitous network. In particular, autonomous mobile robots can be a solution for various applications related to ubiquitous location-based services, e.g. in hospitals, for cleaning, or at airports and railway stations. However, a meaningful and still unsolved problem for most applications is to develop a robust and cheap positioning system. A typical example of position measurement is dead reckoning, well known for providing good short-term accuracy, being inexpensive, and allowing very high sampling rates. However, the measurement always accumulates errors because the fundamental idea of dead reckoning is the integration of incremental motion information over time. On the other hand, a localization system using RFID offers the absolute position of the robot regardless of elapsed time. We construct an absolute positioning system based on RFID and investigate, through experiments measuring the location of a mobile robot, how localization can be enhanced by RFID. Tags are placed on the floor at 5 cm intervals in the shape of a square in an arbitrary space, and the accuracy of position measurement is investigated. To reduce the error and its variance, a weighting function based on a Gaussian function is used: different weighting values are applied to the position data of the tags following the Gaussian function.
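
The Gaussian weighting of detected tag positions can be sketched as a weighted mean. The spread `sigma` and the use of a provisional center estimate are illustrative assumptions, not the paper's exact procedure.

```python
import math

def gaussian_weights(tag_positions, center, sigma=5.0):
    """Estimate the robot position from detected RFID tag positions.

    Each tag position is weighted by a Gaussian of its distance from a
    provisional estimate (center), so outlying detections contribute
    less; the weighted mean is returned. sigma is in cm, matching the
    paper's 5 cm tag grid spacing.
    """
    ws, sx, sy = 0.0, 0.0, 0.0
    for (x, y) in tag_positions:
        d2 = (x - center[0]) ** 2 + (y - center[1]) ** 2
        w = math.exp(-d2 / (2 * sigma ** 2))
        ws += w
        sx += w * x
        sy += w * y
    return (sx / ws, sy / ws)

# Symmetric detections around (5, 0) average back to (5, 0).
print(gaussian_weights([(0, 0), (5, 0), (10, 0)], (5, 0)))
```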

Document classification using a deep neural network in text mining (텍스트 마이닝에서 심층 신경망을 이용한 문서 분류)

  • Lee, Bo-Hui;Lee, Su-Jin;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics
    • /
    • v.33 no.5
    • /
    • pp.615-625
    • /
    • 2020
  • A document-term frequency matrix is built from terms extracted from documents whose group information is known in text mining. In this study, we generated a document-term frequency matrix for document classification according to research field. We applied the traditional term weighting function term frequency-inverse document frequency (TF-IDF) to the generated matrix, as well as term frequency-inverse gravity moment (TF-IGM). We also generated a document-keyword weighted matrix by extracting keywords to improve document classification accuracy. Based on the extracted keyword matrix, we classified documents using a deep neural network. To find the optimal model, the classification accuracy was verified while changing the number of hidden layers and hidden nodes. The model with eight hidden layers showed the highest accuracy, and TF-IGM classification accuracy was higher than TF-IDF across all parameter changes. The deep neural network was also confirmed to be more accurate than the support vector machine. We therefore propose applying TF-IGM together with a deep neural network for document classification.
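
TF-IGM replaces the IDF part with an inverse gravity moment computed over a term's class-level frequencies. The sketch below follows the commonly cited formulation (with adjustment coefficient `lam`, often set around 7); the paper's exact variant may differ in detail.

```python
def tf_igm(tf, class_freqs, lam=7.0):
    """TF-IGM weight for one term in one document.

    tf: the term's frequency in the document; class_freqs: the term's
    frequency in each class of the training set. The class frequencies
    are ranked in descending order, and the inverse gravity moment is
    the top frequency divided by the rank-weighted sum, so terms
    concentrated in one class get higher weight.
    """
    f = sorted(class_freqs, reverse=True)
    igm = f[0] / sum(fr * r for r, fr in enumerate(f, start=1))
    return tf * (1 + lam * igm)

# A term confined to one class gets the maximum boost ...
print(tf_igm(2, [10, 0, 0]))  # 16.0
# ... while an evenly spread term is boosted far less.
print(tf_igm(2, [5, 5, 5]))
```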

A Study on Automatic Indexing of Korean Texts based on Statistical Criteria (통계적기법에 의한 한글자동색인의 연구)

  • Woo, Dong-Chin
    • Journal of the Korean Society for Information Management
    • /
    • v.4 no.1
    • /
    • pp.47-86
    • /
    • 1987
  • The purpose of this study is to present an effective automatic indexing method for Korean texts based on statistical criteria. Titles and abstracts of 299 documents randomly selected from ETRI's DOCUMENT database are used as the experimental data. The data is divided into four word groups, and each group is analyzed and evaluated by applying three automatic indexing methods: the transition phenomena of word occurrence, inverse document frequency weighting, and term discrimination weighting.
