• Title/Summary/Keyword: Neighborhood search

Search Result 113, Processing Time 0.019 seconds

Isolation and identification of tick-borne pathogens in hard ticks collected in Daejeon (대전 주택가 산책로 진드기의 인수공통전염병 병원체 감염실태 조사)

  • Han, So-young;Sung, Sun-hye;Seo, Jin-woo;Kim, Jong-ho;Lee, Seok-ju;Yoo, Sang-sik
    • Korean Journal of Veterinary Service
    • /
    • v.44 no.2
    • /
    • pp.93-102
    • /
    • 2021
  • In this study, a total of 9,449 hard ticks were collected once a month from April to October 2020 from a neighborhood park in Daejeon by flagging & dragging method and CO2 manned trap method. The collected ticks were classified according to the Yamagutsi search table using a stereoscopic microscope and molecular biological analysis of four pathogens (SFTSV, Anaplasma spp., Ehrlichia spp., Borrellia spp.). As a result of the study, Haemaphysalis longicornis were collected the most in all areas of the five boroughs at a rate of 82 to 96 percent, while adults were collected the most in May to July, nymphs were collected the most in April to June, and larvae from August to October at a rate of 78 percent to 98 percent. In pathogens, three cases of SFTSV were detected, showing a minimum infection rate (MIR) of 0.46%, while Anaplasma spp. and Ehrlichia spp. were detected one each, with 0.15% and Borrelia spp. with a minimum infection rate of 0.46%. The detected SFTSV showed 99.9% homogeneity with the KF781490 detected in Cheongwon-gun, Chungbuk Province, Anaplasma spp. showed 99.0% homogeneity with JN990105 detected in China, and Erhlichia spp. showed 98.9% genetic similarity with U96436 separated from the U.S. In this study, the distribution status and pathogen infection rate of the hard ticks in the Daejeon area are analyzed and provided as basic data for the prevention of the hard tick-borne infectious disease.

A CF-based Health Functional Recommender System using Extended User Similarity Measure (확장된 사용자 유사도를 이용한 CF-기반 건강기능식품 추천 시스템)

  • Sein Hong;Euiju Jeong;Jaekyeong Kim
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.1-17
    • /
    • 2023
  • With the recent rapid development of ICT(Information and Communication Technology) and the popularization of digital devices, the size of the online market continues to grow. As a result, we live in a flood of information. Thus, customers are facing information overload problems that require a lot of time and money to select products. Therefore, a personalized recommender system has become an essential methodology to address such issues. Collaborative Filtering(CF) is the most widely used recommender system. Traditional recommender systems mainly utilize quantitative data such as rating values, resulting in poor recommendation accuracy. Quantitative data cannot fully reflect the user's preference. To solve such a problem, studies that reflect qualitative data, such as review contents, are being actively conducted these days. To quantify user review contents, text mining was used in this study. The general CF consists of the following three steps: user-item matrix generation, Top-N neighborhood group search, and Top-K recommendation list generation. In this study, we propose a recommendation algorithm that applies an extended similarity measure, which utilize quantified review contents in addition to user rating values. After calculating review similarity by applying TF-IDF, Word2Vec, and Doc2Vec techniques to review content, extended similarity is created by combining user rating similarity and quantified review contents. To verify this, we used user ratings and review data from the e-commerce site Amazon's "Health and Personal Care". The proposed recommendation model using extended similarity measure showed superior performance to the traditional recommendation model using only user rating value-based similarity measure. In addition, among the various text mining techniques, the similarity obtained using the TF-IDF technique showed the best performance when used in the neighbor group search and recommendation list generation step.

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.239-251
    • /
    • 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market predictions. However, these statistical methods have not produced superior performances. In recent years, machine learning techniques have been widely used in stock market predictions, including artificial neural network, SVM, and genetic algorithm. In particular, a case-based reasoning method, known as k-nearest neighbor is also widely used for stock price prediction. Case based reasoning retrieves several similar cases from previous cases when a new problem occurs, and combines the class labels of similar cases to create a classification for the new problem. However, case based reasoning has some problems. First, case based reasoning has a tendency to search for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the best similar neighbors for the target case. So, case based reasoning may have to take into account more cases even when there are fewer cases applicable depending on the subject. Second, case based reasoning may select neighbors that are far away from the target case. Thus, case based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and the predictability can be degraded due to a deviation from the desired similar neighbor. This paper examines how the size of learning data affects stock price predictability through k-nearest neighbor and compares the predictability of k-nearest neighbor with the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung electronics stock prices were predicted by dividing the learning dataset into two types. For the prediction of next day's closing price, we used four variables: opening value, daily high, daily low, and daily close. In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data is from January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with the random walk model using the two learning dataset. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for the k-NN for the first experiment when the learning data was small. However, the mean absolute percentage error (MAPE) for the random walk model was 1.3497 and the k-NN was 1.2928 for the second experiment when the learning data was large. These results show that the prediction power when more learning data are used is higher than when less learning data are used. Also, this paper shows that k-NN generally produces a better predictive power than random walk model for larger learning datasets and does not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting including opening price, low price, high price, and closing price. Also, to produce better results, it is recommended that the k-nearest neighbor needs to find nearest neighbors using the second step filtering method considering fundamental economic variables as well as a sufficient amount of learning data.