• Title/Abstract/Keywords: K-Nearest Neighbor algorithm

Search results: 271 items (processing time: 0.033 s)

Academic Registration Text Classification Using Machine Learning

  • Alhawas, Mohammed S;Almurayziq, Tariq S
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22, No. 1
    • /
    • pp.93-96
    • /
    • 2022
  • Natural language processing (NLP) is used to understand natural text. Text analysis systems use natural language algorithms to find the meaning of large amounts of text. Text classification is a basic NLP task with a wide range of applications such as topic labeling, sentiment analysis, spam detection, and intent detection. Such an algorithm can transform a user's unstructured thoughts into more structured data. In this work, a text classifier has been developed that takes academic admission and registration texts as input, analyzes their content, and then automatically assigns relevant tags such as admission, graduate school, and registration. The well-known support vector machine (SVM) and K-nearest neighbor (kNN) algorithms are used to develop this classifier. The obtained results showed that the SVM classifier outperformed the kNN classifier with an overall accuracy of 98.9%. In addition, the mean absolute error of SVM was 0.0064, while it was 0.0098 for the kNN classifier. Based on these results, the SVM is used to implement the academic text classification in this work.

$L_1$ Shortest Paths with Isothetic Roads

  • 배상원;김재훈;좌경룡
    • 한국정보과학회:학술대회논문집
    • /
    • 한국정보과학회 2005 Fall Conference Proceedings, Vol. 32, No. 2 (1)
    • /
    • pp.976-978
    • /
    • 2005
  • We present a nearly optimal algorithm, running in $O(\nu \min(\nu, n)\, n \log n)$ time and $O(n)$ space, that constructs a shortest path map with $n$ isothetic roads of speed $\nu$ under the $L_1$ metric. The algorithm uses the continuous Dijkstra method, and its efficiency is based on a new geometric insight: the minimum in-degree of any nearest neighbor graph for points with roads of speed $\nu$ is $\Theta(\nu \min(\nu, n))$, which is first shown in this paper. The algorithm also extends naturally to the multi-source case, so that the Voronoi diagram for $m$ sites can be computed in $O(\nu \min(\nu, n)(n+m)\log(n+m))$ time and $O(n+m)$ space, which is also nearly optimal.

A Hangul Document Classification System using Case-based Reasoning

  • 이재식;이종운
    • Asia pacific journal of information systems
    • /
    • Vol. 12, No. 2
    • /
    • pp.179-195
    • /
    • 2002
  • In this research, we developed an efficient Hangul document classification system for text mining. By 'efficient' we mean maintaining acceptable classification performance while taking less computing time. In our system, given a query document, k documents are first retrieved from the document case base using the k-nearest neighbor technique, which is the main algorithm of case-based reasoning. Then the TFIDF method, the traditional vector model in information retrieval, is applied to the query document and the k retrieved documents to classify the query document. We call this procedure the 'CB_TFIDF' method. Our results showed that the classification accuracy of CB_TFIDF was similar to that of the traditional TFIDF method, while the average time for classifying one document decreased remarkably.
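
The retrieve-then-rank idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' CB_TFIDF implementation: the Jaccard retrieval step, the smoothed IDF formula, the summed-cosine vote, and the toy `CASES` data are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def jaccard(a, b):
    """Cheap set-overlap similarity used for the retrieval stage (assumed)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def build_idf(docs):
    """Smoothed inverse document frequency over a list of token lists."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """Sparse TF-IDF vector; terms unseen in the retrieved set are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(query, case_base, k=3):
    """case_base is a list of (tokens, label) pairs."""
    # Stage 1: retrieve the k nearest cases from the case base.
    retrieved = sorted(case_base, key=lambda c: jaccard(query, c[0]),
                       reverse=True)[:k]
    # Stage 2: apply TF-IDF only to the query and the k retrieved cases.
    idf = build_idf([tokens for tokens, _ in retrieved])
    q = tfidf(query, idf)
    scores = defaultdict(float)
    for tokens, label in retrieved:
        scores[label] += cosine(q, tfidf(tokens, idf))
    return max(scores, key=scores.get)

# Hypothetical toy case base (English tokens for readability).
CASES = [
    ("stock market price rise".split(), "finance"),
    ("bank interest rate cut".split(), "finance"),
    ("team wins league match".split(), "sports"),
    ("player scores goal match".split(), "sports"),
]

print(classify("goal in the match".split(), CASES, k=2))
```

Restricting the TF-IDF computation to the k retrieved cases is what would save time relative to scoring the query against the whole collection, which is consistent with the speed-up the abstract reports.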

Text-independent Speaker Identification Using Soft Bag-of-Words Feature Representation

  • Jiang, Shuangshuang;Frigui, Hichem;Calhoun, Aaron W.
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 14, No. 4
    • /
    • pp.240-248
    • /
    • 2014
  • We present a robust speaker identification algorithm that uses novel features based on a soft bag-of-words representation and a simple Naive Bayes classifier. The bag-of-words (BoW) based histogram feature descriptor is typically constructed by summarizing and identifying representative prototypes from low-level spectral features extracted from training data. In this paper, we define a generalization of the standard BoW. In particular, we define three types of BoW that are based on crisp voting, fuzzy memberships, and possibilistic memberships. We analyze our mapping with three common classifiers: Naive Bayes classifier (NB), K-nearest neighbor classifier (KNN), and support vector machines (SVM). The proposed algorithms are evaluated using large datasets that simulate medical crises. We show that the proposed soft bag-of-words feature representation achieves a significant improvement compared to state-of-the-art methods.
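
The difference between crisp voting and fuzzy memberships can be illustrated with a small sketch. It assumes the standard fuzzy c-means membership formula with fuzzifier `m`, which may differ from the paper's exact definition; the possibilistic variant is omitted, and the frames and prototypes are hypothetical values.

```python
def sq_dist(x, p):
    """Squared Euclidean distance between a frame and a prototype."""
    return sum((a - b) ** 2 for a, b in zip(x, p))

def crisp_bow(frames, prototypes):
    """Each frame casts one vote for its single nearest prototype."""
    hist = [0.0] * len(prototypes)
    for x in frames:
        nearest = min(range(len(prototypes)),
                      key=lambda j: sq_dist(x, prototypes[j]))
        hist[nearest] += 1.0
    return [h / len(frames) for h in hist]

def fuzzy_bow(frames, prototypes, m=2.0, eps=1e-12):
    """Each frame spreads one unit of vote across all prototypes.

    Uses the fuzzy c-means membership (an assumption for this sketch):
    u_j = 1 / sum_k (d_j^2 / d_k^2)^(1/(m-1)).
    """
    hist = [0.0] * len(prototypes)
    for x in frames:
        d2 = [sq_dist(x, p) + eps for p in prototypes]  # eps avoids 0-division
        for j, dj in enumerate(d2):
            hist[j] += 1.0 / sum((dj / dk) ** (1.0 / (m - 1.0)) for dk in d2)
    return [h / len(frames) for h in hist]

# Hypothetical 2-D "spectral" frames and two prototypes.
FRAMES = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0)]
PROTOTYPES = [(0.0, 0.0), (1.0, 1.0)]

print(crisp_bow(FRAMES, PROTOTYPES))
print(fuzzy_bow(FRAMES, PROTOTYPES))
```

Both histograms sum to one, but the fuzzy version lets a frame lying between prototypes contribute partially to each bin instead of forcing a hard assignment.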

Design of k-Nearest Neighbor Query Processing Algorithm Based on Order-Preserving Encryption

  • 김용기;최기석
    • 한국정보처리학회:학술대회논문집
    • /
    • 한국정보처리학회 2012 Fall Conference
    • /
    • pp.1410-1411
    • /
    • 2012
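  • Recently, to let mobile users use location-based services safely, research on protecting the location information of objects and users in outsourced databases has been actively pursued. However, existing approaches request unnecessary object information and therefore suffer from high query processing time. To address this problem, this paper proposes a k-nearest neighbor query processing algorithm that protects user and object information by using the directional information of objects and transformed distances relative to a reference POI.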

Assessment of Forest Biomass using k-Neighbor Techniques - A Case Study in the Research Forest at Kangwon National University -

  • 서환석;박동환;임종수;이정수
    • 한국산림과학회지
    • /
    • Vol. 101, No. 4
    • /
    • pp.547-557
    • /
    • 2012
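  • This study aimed to estimate forest biomass with the k-NN technique, using field survey data and Landsat TM-5 satellite imagery for the Research Forest at Kangwon National University. We examined the optimal number of reference sample plots (k) under varying conditions of forest-type stratification, minimum horizontal reference distance (HRA), and spatial filtering, and compared the resulting forest biomass estimates and their accuracy. For conifers, the minimum RMSE was obtained with $5{\times}5$ filtering, an HRA of 4 km, and k=8, with a bias of 1.8 t/ha (overestimation). For Korean pine and broadleaved trees, the minimum RMSE was obtained without filtering, at an HRA of 4 km with k=8 and an HRA of 10 km with k=6, respectively, with biases of -1.6 t/ha and -5.2 t/ha (underestimation). The total biomass estimated by the k-NN technique was 799 thousand t, and the mean forest biomass per hectare was 237 t/ha, about 1 t/ha higher than the estimate based on sample plot data.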
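
A k-NN estimate of this kind, together with the bias and RMSE measures the abstract reports, can be sketched as below. The unweighted mean over the k nearest plots, the Euclidean distance in spectral space, and the plot values are assumptions for illustration; the study's stratification, HRA constraint, and filtering steps are not modeled.

```python
import math

def knn_estimate(target, plots, k):
    """Mean biomass of the k reference plots nearest in feature space.

    plots is a list of (spectral_features, biomass_t_per_ha) pairs.
    """
    nearest = sorted(plots, key=lambda p: math.dist(target, p[0]))[:k]
    return sum(biomass for _, biomass in nearest) / len(nearest)

def bias(estimates, observations):
    """Mean of (estimate - observation); positive means overestimation."""
    return sum(e - o for e, o in zip(estimates, observations)) / len(estimates)

def rmse(estimates, observations):
    """Root mean square error of the estimates."""
    return math.sqrt(
        sum((e - o) ** 2 for e, o in zip(estimates, observations))
        / len(estimates)
    )

# Hypothetical reference plots: (two spectral band values, biomass in t/ha).
HYPOTHETICAL_PLOTS = [
    ((0.10, 0.20), 100.0),
    ((0.15, 0.25), 120.0),
    ((0.80, 0.90), 300.0),
    ((0.85, 0.95), 320.0),
]

print(knn_estimate((0.12, 0.22), HYPOTHETICAL_PLOTS, k=2))
```

With this sign convention, a positive bias corresponds to the overestimation reported for conifers and a negative bias to the underestimation reported for Korean pine and broadleaved trees.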

Clustering Techniques for XML Data Using Data Mining

  • Kim, Chun-Sik
    • 한국전자거래학회:학술대회논문집
    • /
    • 한국전자거래학회, e-Biz World Conference 2005
    • /
    • pp.189-194
    • /
    • 2005
  • Many studies have sought to classify documents and to extract useful information from them. However, most search engines rely on keyword-based methods, which do not search and classify documents effectively. Exploiting the fact that an XML document is structured, this paper identifies the structure of XML documents using the set-theoretic approach suggested by Broder, and tests the clustering of XML documents with a k-nearest neighbor algorithm. In addition, the study investigates the effectiveness of the clustering technique for large-scale data, compared to the existing bitmap method, through a test that measures differences between documents on a clause basis rather than with vectors, in order to compare similarity against the existing methods.

Gesture Recognition Using Higher Correlation Feature Information and PCA

  • Kim, Jong-Min;Lee, Kee-Jun
    • 통합자연과학논문집
    • /
    • Vol. 5, No. 2
    • /
    • pp.120-126
    • /
    • 2012
  • This paper describes an algorithm that lowers the dimensionality, maintains gesture recognition performance, and significantly reduces the eigenspace configuration time by combining higher correlation feature information with Principal Component Analysis. Because the suggested method requires less computation than methods using geometric information or stereo images, experiments show that it is well suited to building a real-time system. In addition, because the existing point-to-point method, a simple distance calculation, produces many errors, this paper reduces recognition errors and improves the recognition rate by using several successive input images as the unit of recognition with K-Nearest Neighbor, an improved class-to-class method.

An Efficient Distributed Nearest Neighbor Heuristic for the Traveling Salesman Problem

  • 김정숙;이희영
    • 한국정보처리학회:학술대회논문집
    • /
    • 한국정보처리학회 2000 Fall Conference Proceedings (Part 2)
    • /
    • pp.1373-1376
    • /
    • 2000
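  • The Traveling Salesman Problem asks, given n cities and the distance costs between them, for the minimum-cost route that starts at an initial city, visits every city exactly once, and returns to the starting city; finding the optimal solution is a typical NP-complete problem [2,4,5,8]. Much research has therefore aimed at reducing the running time. In this paper, we use the nearest neighbor heuristic to find the optimal solution of the Traveling Salesman Problem. To reduce the running time, we use a near-optimal solution obtained by a Genetic Algorithm, which performs well on optimization problems, as the initial branching function, and we exploit parallelism by distributing the work over multiple processors in a distributed processing environment based on a Local Area Network.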
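
For reference, the basic sequential nearest neighbor heuristic for the Traveling Salesman Problem can be sketched as follows. This covers only the greedy tour construction; the genetic-algorithm initial bound and the distributed execution described in the abstract are not modeled, and the distance matrix `D` is a hypothetical example.

```python
def nearest_neighbor_tour(dist, start=0):
    """Greedy tour: from the current city, always move to the closest
    unvisited city. Ties are broken by the lower city index."""
    tour = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda c: (dist[last][c], c))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_cost(dist, tour):
    """Total cost of the closed tour, including the return to the start."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

# Hypothetical symmetric distance matrix for four cities.
D = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

tour = nearest_neighbor_tour(D, start=0)
print(tour, tour_cost(D, tour))
```

The greedy tour is fast to build (quadratic in the number of cities here) but is not guaranteed to be optimal, which is why such heuristics are often paired with a bound or a search procedure.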

Systematic Approach for Detecting Text in Images Using Supervised Learning

  • Nguyen, Minh Hieu;Lee, GueeSang
    • International Journal of Contents
    • /
    • Vol. 9, No. 2
    • /
    • pp.8-13
    • /
    • 2013
  • Automatically locating text in images has been a challenging task. In this approach, we build a three-stage system for text detection. The system uses tensor voting and the Completed Local Binary Pattern (CLBP) to classify text and non-text regions. Tensor voting generates text line information, which is very useful for localizing candidate text regions, while a Nearest Neighbor classifier trained on discriminative features obtained by the CLBP-based operator is used to refine the results. The whole algorithm is implemented in MATLAB and applied to all images of the ICDAR 2011 Robust Reading Competition data set. Experiments show the promising performance of this method.