Title/Summary/Keyword: Analysis Algorithm


Predictive Clustering-based Collaborative Filtering Technique for Performance-Stability of Recommendation System (추천 시스템의 성능 안정성을 위한 예측적 군집화 기반 협업 필터링 기법)

  • Lee, O-Joun;You, Eun-Soon
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.119-142 / 2015
  • With the explosive growth in the volume of information, Internet users experience considerable difficulty in finding the information they need online. Against this backdrop, ever-greater importance is being placed on recommender systems that provide information catered to user preferences and tastes in an attempt to address information overload. To this end, a number of techniques have been proposed, including content-based filtering (CBF), demographic filtering (DF), and collaborative filtering (CF). Among them, CBF and DF require external information and thus cannot be applied across a variety of domains, whereas CF is widely used because it is relatively free of domain constraints. CF techniques are broadly classified into memory-based CF, model-based CF, and hybrid CF. Model-based CF addresses the drawbacks of CF by means of a Bayesian model, a clustering model, or a dependency-network model; it not only mitigates the sparsity and scalability issues but also boosts predictive performance. However, it involves expensive model building and results in a tradeoff between performance and scalability. This tradeoff is attributed to reduced coverage, a type of sparsity issue. In addition, expensive model building may lead to performance instability, since changes in the domain environment cannot be immediately incorporated into the model owing to the high cost involved; cumulative changes in the domain environment that fail to be reflected eventually undermine system performance. This study incorporates a Markov model of transition probabilities and the concept of fuzzy clustering into clustering-based CF (CBCF) to propose predictive clustering-based CF (PCCF), which addresses both reduced coverage and unstable performance. The method improves performance stability by tracking changes in user preferences and bridging the gap between the static model and dynamic users. The issue of reduced coverage is also mitigated by expanding coverage based on transition probabilities and clustering probabilities. The proposed method consists of four processes. First, user preferences are normalized in preference clustering. Second, changes in user preferences are detected from review score entries during preference transition detection. Third, user propensities are normalized using patterns of change (propensities) in user preferences in propensity clustering. Lastly, a preference prediction model is developed to predict user preferences for items during preference prediction. The proposed method was validated by testing its robustness against performance instability and the scalability-performance tradeoff. The initial test compared and analyzed the performance of recommender systems enabled by IBCF, CBCF, ICFEC, and PCCF in an environment where data sparsity had been minimized. The subsequent test adjusted the optimal number of clusters in CBCF, ICFEC, and PCCF for a comparative analysis of the resulting changes in system performance. The results revealed that the proposed method produced only an insignificant improvement in performance over the existing techniques, and it did not achieve a significant improvement in the standard deviation that indicates the degree of data fluctuation. It did, however, achieve a marked improvement over the existing techniques in terms of range, which indicates the level of performance fluctuation. The level of performance fluctuation before and after model generation improved by 51.31% in the initial test, and in the subsequent test the level of performance fluctuation driven by changes in the number of clusters improved by 36.05%. This signifies that the proposed method, despite only a slight performance improvement, clearly offers better performance stability than the existing techniques. Further research will be directed toward enhancing the recommendation performance, which failed to demonstrate significant improvement over the existing techniques, by introducing a high-dimensional parameter-free clustering algorithm or a deep learning-based model.
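As an illustration of the core mechanism described above, the sketch below combines fuzzy cluster memberships with a Markov transition matrix to anticipate a user's next preference cluster. It is a minimal toy in Python, not the authors' implementation: the centroids, the transition matrix, and the per-cluster mean ratings are all hypothetical stand-ins.

```python
import numpy as np

def fuzzy_memberships(user_vec, centroids, m=2.0):
    """Fuzzy c-means style membership of one user to each preference cluster."""
    d = np.linalg.norm(centroids - user_vec, axis=1) + 1e-9
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

def predict_next_memberships(memberships, transition):
    """Propagate current cluster memberships through the Markov transition matrix."""
    return memberships @ transition

# toy example with 3 preference clusters (all values hypothetical)
centroids = np.array([[5.0, 1.0], [3.0, 3.0], [1.0, 5.0]])   # cluster centers in rating space
transition = np.array([[0.7, 0.2, 0.1],                      # row-stochastic: P(cluster_j | cluster_i)
                       [0.1, 0.8, 0.1],
                       [0.1, 0.3, 0.6]])

u = fuzzy_memberships(np.array([4.5, 1.5]), centroids)
u_next = predict_next_memberships(u, transition)

# predicted rating = membership-weighted average of per-cluster mean ratings for an item
cluster_item_means = np.array([4.2, 3.1, 1.8])
print(float(u_next @ cluster_item_means))
```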

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has been highlighted, a great deal of research on unstructured data has been conducted. Social media on the Internet generate unstructured or semi-structured data every second, often in the natural languages we use in daily life. Many words in human languages have multiple meanings or senses, which makes it very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which can produce results far from users' intentions. Even though much progress has been made over the past years in enhancing the performance of search engines, there is still considerable room for improvement. Word sense disambiguation plays a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of the examples in existing dictionaries, thereby avoiding expensive sense-tagging processes. It tests the effectiveness of the method with a Naïve Bayes model, one of the supervised learning algorithms, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary contains approximately 57,000 sentences; the Sejong Corpus contains about 790,000 sentences tagged with both part-of-speech and sense information. For the experiments, the Korean standard unabridged dictionary and the Sejong Corpus were evaluated both combined and as separate entities using cross-validation. Only nouns, the target of word sense disambiguation here, were selected: 93,522 word senses among 265,655 nouns, along with 56,914 sentences from related proverbs and examples, were additionally combined into the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because it is tagged with the sense indices defined by that dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating the sense vectors were added to the named-entity dictionary of a Korean morphological analyzer. Using the extended named-entity dictionary, term vectors were extracted from the input sentences. Given an extracted term vector and the sense vector model built during pre-processing, the sense-tagged terms were determined by vector space model-based word sense disambiguation. The experiments show that better precision and recall are obtained with the merged corpus, demonstrating the effectiveness of merging the examples in the Korean standard unabridged dictionary with the Sejong Corpus. This suggests the method can practically enhance the performance of Internet search engines and help capture the meaning of sentences more accurately in natural language processing tasks pertinent to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem, and it assumes that all senses are independent. Even though this assumption is not realistic and ignores correlations between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of the senses in a sentence. The effectiveness of word sense disambiguation may also be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
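A minimal sketch of the supervised setup the abstract describes, training a Naïve Bayes classifier on example sentences tagged with sense indices. The toy English sentences and sense labels are hypothetical stand-ins for the dictionary and Sejong Corpus data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# toy stand-in for dictionary example sentences tagged with sense indices
train_sentences = [
    "the bank approved the loan",        # bank_1 = financial institution
    "she deposited cash at the bank",
    "they walked along the river bank",  # bank_2 = river edge
    "the bank of the stream was muddy",
]
train_senses = ["bank_1", "bank_1", "bank_2", "bank_2"]

# bag-of-words features + Naive Bayes, as in the supervised corpus-based approach
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_sentences, train_senses)

print(model.predict(["he opened an account at the bank"]))  # -> ['bank_1']
```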

Personalized Recommendation System for IPTV using Ontology and K-medoids (IPTV환경에서 온톨로지와 k-medoids기법을 이용한 개인화 시스템)

  • Yun, Byeong-Dae;Kim, Jong-Woo;Cho, Yong-Seok;Kang, Sang-Gil
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.147-161 / 2010
  • As broadcasting and communication have recently converged, communication has been joined to TV, and TV viewing has undergone many changes. IPTV (Internet Protocol Television) provides information services, movie contents, and broadcasts over the Internet, combining live programs with VOD (Video on Demand). Delivered over communication networks, it has become a new business opportunity, and it has also raised new technical issues: imaging technology for the service, networking technology for uninterrupted video, security technology to protect copyright, and so on. Through the IPTV network, users can watch the programs they want whenever they want. However, IPTV has difficulties with finding programs by search or by menu. The menu approach takes a long time to reach the desired program, while the search approach fails when the title, genre, or actors' names are unknown; entering letters through a remote control is also cumbersome. The bigger problem is that users are often not even aware of the services available to them. Thus, to resolve the difficulties of selecting VOD services in IPTV, a personalized recommendation service is proposed, which enhances user satisfaction and uses viewing time efficiently. This paper addresses IPTV's shortcomings with a filtering and recommendation system that provides programs suited to each individual, saving search time. The proposed recommendation system collects TV program information and each user's preferred genres and sub-genres, channels, watched programs, and viewing times from individual IPTV watching records. Similarities between programs are compared using an ontology of TV programs, since the distance between programs can be measured through similarity comparison. The TV program ontology used here is extracted from TV-Anytime metadata, which represents semantic information, and the ontology expresses contents and features numerically. Vocabulary similarity is determined through WordNet: all the words describing the programs are expanded into upper and lower classes for word-similarity decisions, and the average over the descriptive keywords is measured. Based on the calculated distances, similar programs are grouped by the K-medoids partitioning method, a partitioning approach that divides objects into groups with similar characteristics (see the sketch after this abstract). The K-medoids method selects K representative objects (medoids); distances from the representative objects define tentative clusters, and when the initial n objects are divided into K groups, the optimal medoids are found through repeated trials after the temporary selection of representative objects. In this way, similar programs are clustered. When selecting programs through cluster analysis, weights are applied to the recommendation as follows. When each cluster recommends programs, programs near the representative object are recommended to users; the distance, computed with the same similarity measure, provides the base score that determines the ranking of recommended programs. A weight derived from the size of the watching list is also used: the more programs a cluster contains, the higher the weight it receives. This is defined as the cluster weight. Through this, the representative sub-TV programs of each cluster are selected and the final program rankings are determined. However, the cluster-representative TV programs can include errors, so weights reflecting TV program viewing preference are added to determine the final ranks, and based on these ranks the contents that customers prefer are recommended. An experiment based on the proposed method was carried out in a controlled environment, and it demonstrated the superiority of the proposed method compared with existing approaches.
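The clustering step can be illustrated with a simple K-medoids (PAM-style) routine over a precomputed distance matrix. This is a generic sketch, assuming distances derived from ontology-based similarities; the similarity values below are hypothetical.

```python
import numpy as np

def k_medoids(dist, k, n_iter=100, seed=0):
    """Simple alternating K-medoids on a precomputed distance matrix."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)   # temporary representative objects
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)  # assign each object to nearest medoid
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members):
                # new medoid = member minimizing total within-cluster distance
                costs = dist[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):      # converged
            break
        medoids = new_medoids
    return medoids, labels

# toy usage: distances derived from hypothetical ontology-based program similarities
sim = np.array([[1.0, 0.9, 0.2, 0.1],
                [0.9, 1.0, 0.3, 0.2],
                [0.2, 0.3, 1.0, 0.8],
                [0.1, 0.2, 0.8, 1.0]])
dist = 1.0 - sim
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```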

An Implementation of Dynamic Gesture Recognizer Based on WPS and Data Glove (WPS와 장갑 장치 기반의 동적 제스처 인식기의 구현)

  • Kim, Jung-Hyun;Roh, Yong-Wan;Hong, Kwang-Seok
    • The KIPS Transactions: Part B / v.13B no.5 s.108 / pp.561-568 / 2006
  • A WPS (Wearable Personal Station) for the next-generation PC can be defined as a core terminal for ubiquitous computing that includes information-processing and network functions and overcomes spatial limitations in acquiring new information. As a way of acquiring meaningful dynamic gesture data from haptic devices, a traditional desktop-PC-based gesture recognizer using a wired communication module has several restrictions, such as spatial constraints, the complexity of the transmission media (cable elements), limitation of motion, and inconvenience of use. Accordingly, to overcome these problems, this paper implements a hand gesture recognition system using a fuzzy algorithm and a neural network for the Post PC (an embedded ubiquitous environment using a Bluetooth module and a WPS). It also proposes the most efficient and reasonable hand-gesture recognition interface for the Post PC through evaluation and analysis of the performance of each gesture recognition system. The proposed system consists of three modules: 1) a gesture input module that processes the motion of the dynamic hand into input data; 2) a Relational Database Management System (RDBMS) module that segments significant gestures from the input data; and 3) two different recognition modules, a fuzzy max-min module and a neural network module, that recognize significant gestures within continuous, dynamic gesture streams. Experimental results show an average recognition rate of 98.8% for the fuzzy max-min module and 96.7% for the neural network module on significant dynamic gestures.
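One common formulation of fuzzy max-min recognition scores a class by the max-min composition of the fuzzified input with a class template. The sketch below assumes that formulation rather than reproducing the paper's exact recognizer, and the templates and feature values are hypothetical.

```python
import numpy as np

def fuzzy_max_min(x, templates):
    """Fuzzy max-min composition: score per class = max_i min(x_i, template_i)."""
    return np.max(np.minimum(templates, x), axis=1)

templates = np.array([[0.9, 0.2, 0.1],   # hypothetical membership template for gesture "grab"
                      [0.1, 0.8, 0.3]])  # hypothetical template for gesture "release"
x = np.array([0.7, 0.3, 0.2])            # fuzzified sensor features from the data glove

scores = fuzzy_max_min(x, templates)
print(["grab", "release"][int(np.argmax(scores))])  # -> "grab"
```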

Estimation of Ground-level PM10 and PM2.5 Concentrations Using Boosting-based Machine Learning from Satellite and Numerical Weather Prediction Data (부스팅 기반 기계학습기법을 이용한 지상 미세먼지 농도 산출)

  • Park, Seohui;Kim, Miae;Im, Jungho
    • Korean Journal of Remote Sensing / v.37 no.2 / pp.321-335 / 2021
  • Particulate matter (PM10 and PM2.5, with diameters less than 10 and 2.5 ㎛, respectively) can be absorbed by the human body and adversely affect human health. Although most PM monitoring is based on ground observations, these are limited to point-based measurement sites, which leads to uncertainty in PM estimation for regions without observation sites. This spatial limitation can be overcome by using satellite data. In this study, we developed machine learning-based retrieval algorithms for ground-level PM10 and PM2.5 concentrations using aerosol parameters from the Geostationary Ocean Color Imager (GOCI) satellite and various meteorological parameters from a numerical weather prediction model for January to December 2019. Gradient Boosted Regression Trees (GBRT) and Light Gradient Boosting Machine (LightGBM) were used to estimate PM concentrations. Model performance was examined for two feature sets: all input parameters (Feature set 1) and a subset without meteorological and land-cover parameters (Feature set 2). Both models showed higher accuracy (about 10% higher in R2) with Feature set 1 than with Feature set 2. The GBRT model using Feature set 1 was chosen as the final model for further analysis (PM10: R2 = 0.82, nRMSE = 34.9%; PM2.5: R2 = 0.75, nRMSE = 35.6%). The spatial distribution of the seasonal and annual-averaged PM concentrations was similar to in-situ observations, except for the northeastern part of China with bright surface reflectance, and their spatial distribution and seasonal changes were well matched with in-situ measurements.
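A minimal sketch of the boosted-tree regression setup on synthetic data, using scikit-learn's GradientBoostingRegressor as a stand-in for GBRT; the study's actual GOCI and meteorological inputs are replaced by hypothetical features, with R2 and nRMSE computed as reported in the abstract.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
# hypothetical predictors standing in for AOD plus meteorological fields (Feature set 1 analogue)
X = rng.normal(size=(n, 5))                                        # [aod, rh, wind, blh, temp]
y = 30 + 25 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=8, size=n)  # synthetic PM10 target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
gbrt = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05, max_depth=3)
gbrt.fit(X_tr, y_tr)

pred = gbrt.predict(X_te)
r2 = r2_score(y_te, pred)
nrmse = np.sqrt(mean_squared_error(y_te, pred)) / y_te.mean() * 100  # normalized RMSE in %
print(f"R2 = {r2:.2f}, nRMSE = {nrmse:.1f} %")
```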

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction. However, these statistical methods have not produced superior performance. In recent years, machine learning techniques have been widely used in stock market prediction, including artificial neural networks, SVM, and genetic algorithms. In particular, a case-based reasoning method known as k-nearest neighbor (k-NN) is also widely used for stock price prediction. Case-based reasoning retrieves several similar cases from previous cases when a new problem occurs and combines the class labels of the similar cases to create a classification for the new problem. However, case-based reasoning has some problems. First, it tends to search for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the best similar neighbors for the target case, so it may take more cases into account even when fewer cases are actually applicable. Second, it may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability through k-NN and compares the predictability of k-NN with the random walk model according to the size of the learning data and the number of neighbors. Samsung Electronics stock prices were predicted by dividing the learning dataset into two types. For the prediction of the next day's closing price, four variables were used: opening price, daily high, daily low, and daily close. In the first experiment, data from January 1, 2000 to December 31, 2017 were used for learning; in the second experiment, data from January 1, 2015 to December 31, 2017 were used. The test data cover January 1, 2018 to August 31, 2018 in both experiments. We compared the performance of k-NN with the random walk model on the two learning datasets. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN in the second experiment, where the learning dataset was small, whereas the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN in the first experiment, where the learning dataset was large. These results show that prediction power is higher when more learning data are used, and that k-NN generally produces better predictive power than the random walk model for larger learning datasets but not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to opening, low, high, and closing prices. To produce better results, it is also recommended that k-NN find its nearest neighbors using a second-step filtering method that considers fundamental economic variables as well as a sufficient amount of learning data.
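A minimal sketch of the experimental design on synthetic prices: today's OHLC predicts tomorrow's close with k-NN, and the random walk baseline simply carries today's close forward. All data below are simulated, not the actual Samsung Electronics series.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
# synthetic OHLC series standing in for daily stock prices
close = 50000 + np.cumsum(rng.normal(scale=500, size=600))
opens, high, low = close * 1.001, close * 1.01, close * 0.99

X = np.column_stack([opens, high, low, close])[:-1]  # today's open/high/low/close
y = close[1:]                                        # tomorrow's close

split = 500                                          # train on first 500 days, test on the rest
knn = KNeighborsRegressor(n_neighbors=5).fit(X[:split], y[:split])
pred = knn.predict(X[split:])

mape = lambda actual, p: np.mean(np.abs((actual - p) / actual)) * 100
print("k-NN MAPE:       ", mape(y[split:], pred))
print("random walk MAPE:", mape(y[split:], X[split:, 3]))  # predict tomorrow = today's close
```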

Predicting Suitable Restoration Areas for Warm-Temperate Evergreen Broad-Leaved Forests of the Islands of Jeollanamdo (전라남도 섬 지역의 난온대 상록활엽수림 복원을 위한 적합지 예측)

  • Sung, Chan Yong;Kang, Hyun-Mi;Park, Seok-Gon
    • Korean Journal of Environment and Ecology / v.35 no.5 / pp.558-568 / 2021
  • Poor supervision and tourism activities have resulted in forest degradation on the islands of Korea. Since the southern coastal region of the Korean peninsula was originally dominated by warm-temperate evergreen broad-leaved forests, it is desirable to restore the forests in this region to their original vegetation. In this study, we identified areas suitable for restoration as evergreen broad-leaved forests by analyzing the environmental factors of existing evergreen broad-leaved forests on the islands of Jeollanam-do. We classified the forest lands in the study area into six vegetation types from Sentinel-2 satellite images using a deep learning algorithm and analyzed the tolerance ranges of existing evergreen broad-leaved forests by measuring the locational, topographic, and climatic attributes of the classified vegetation types. The results showed that evergreen broad-leaved forests were distributed more in areas with higher altitudes and steeper slopes, where human intervention was relatively low. This human influence also led to a higher distribution of evergreen broad-leaved forests in areas with lower annual average temperature, an unexpected but understandable result because higher-altitude areas have lower temperatures. Among the environmental factors, latitude and the average temperature of the coldest month (January) were relatively less contaminated by the effects of human intervention, enabling the identification of suitable restoration areas. The tolerance-range analysis showed that evergreen broad-leaved forests mainly grow in areas south of latitude 34.7° with a monthly average temperature of 1.7℃ or higher in the coldest month, so we predicted areas meeting these criteria to be suitable for restoring evergreen broad-leaved forests. The suitable areas cover 614.5 km2, which is 59.0% of the total forest land on the islands of Jeollanam-do and 73% of the actual forests excluding agricultural and other non-restorable forest lands. The findings of this study can help forest managers prepare restoration plans and budgets for island forests.
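The suitability criteria reported above reduce to a simple per-pixel mask; a minimal sketch, assuming hypothetical pixel attributes:

```python
import numpy as np

# hypothetical per-pixel attributes for island forest lands
lat = np.array([34.3, 34.8, 34.6, 35.1])   # latitude (deg N)
t_jan = np.array([2.1, 1.9, 1.2, 0.8])     # January mean temperature (deg C)

# suitability criteria from the tolerance-range analysis:
# south of latitude 34.7 deg and coldest-month mean temperature >= 1.7 deg C
suitable = (lat <= 34.7) & (t_jan >= 1.7)
print(suitable)  # -> [ True False False False]
```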

A Comparison between Multiple Satellite AOD Products Using AERONET Sun Photometer Observations in South Korea: Case Study of MODIS, VIIRS, Himawari-8, and Sentinel-3 (우리나라에서 AERONET 태양광도계 자료를 이용한 다종위성 AOD 산출물 비교평가: MODIS, VIIRS, Himawari-8, Sentinel-3의 사례연구)

  • Kim, Seoyeon;Jeong, Yemin;Youn, Youjeong;Cho, Subin;Kang, Jonggu;Kim, Geunah;Lee, Yangwon
    • Korean Journal of Remote Sensing / v.37 no.3 / pp.543-557 / 2021
  • Because aerosols have different spectral characteristics depending on particle size and composition as well as on the satellite sensor, a comparative analysis of aerosol products from various satellite sensors is required. In South Korea, however, a comprehensive long-period comparison of the various official satellite AOD (Aerosol Optical Depth) products is not easily found. In this paper, we assessed the performance of the AOD products from MODIS (Moderate Resolution Imaging Spectroradiometer), VIIRS (Visible Infrared Imaging Radiometer Suite), Himawari-8, and Sentinel-3 against AERONET (Aerosol Robotic Network) sun photometer observations for the period from January 2015 to December 2019, and analyzed the seasonal and geographical characteristics of satellite AOD accuracy. The MODIS products, which have been accumulated for a long time and optimized by the new MAIAC (Multiangle Implementation of Atmospheric Correction) algorithm, showed the best accuracy (CC = 0.836), followed by the products from VIIRS and Himawari-8. Sentinel-3 AOD, on the other hand, did not show good quality because, according to ESA (European Space Agency), it was launched recently and has not yet been sufficiently optimized. The AOD of MODIS, VIIRS, and Himawari-8 did not show significant differences in accuracy across seasons or between urban and non-urban regions, but the mixed-pixel problem was partly found in a few coastal regions. Because AOD is an essential component of atmospheric correction, the results of this study can serve as a reference for future work on atmospheric correction for the Korean CAS (Compact Advanced Satellite) series.
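A sketch of how such a validation is typically computed from collocated satellite-AERONET matchups; the matchup values below are hypothetical. The abstract reports only CC, while RMSE and bias are shown here as commonly reported companions.

```python
import numpy as np

# hypothetical collocated matchups: AERONET vs. satellite AOD
aeronet = np.array([0.12, 0.30, 0.55, 0.21, 0.80, 0.40])
satellite = np.array([0.10, 0.33, 0.50, 0.25, 0.72, 0.44])

cc = np.corrcoef(aeronet, satellite)[0, 1]        # correlation coefficient (CC)
rmse = np.sqrt(np.mean((satellite - aeronet) ** 2))
bias = np.mean(satellite - aeronet)
print(f"CC = {cc:.3f}, RMSE = {rmse:.3f}, bias = {bias:+.3f}")
```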

Comparative study of volumetric change in water-stored and dry-stored complete denture base (공기중과 수중에서 보관한 총의치 의치상의 체적변화에 대한 비교연구)

  • Kim, Jinseon;Lee, Younghoo;Hong, Seoung-Jin;Paek, Janghyun;Noh, Kwantae;Pae, Ahran;Kim, Hyeong-Seob;Kwon, Kung-Rock
    • The Journal of Korean Academy of Prosthodontics / v.59 no.1 / pp.18-26 / 2021
  • Purpose: Patients are generally instructed to store their dentures in water when removed from the mouth. However, few studies have reported the advantage, in terms of volumetric change, of underwater storage over dry storage. To serve as a reference for defining the proper denture storage method, this study evaluated the volumetric change and dimensional deformation under underwater and dry storage. Materials and methods: Definitive casts were scanned with a model scanner, and denture bases were designed with computer-aided design (CAD) software. Twelve denture bases (6 upper, 6 lower) were printed with a 3D printer, then invested and flasked with the heat-curing method. The 6 upper and 6 lower dentures were divided into groups A and B, each containing 3 upper and 3 lower dentures. Group A was stored dry at room temperature; group B was stored underwater. Each group was scanned every 24 hours for 28 days, and the scan data were saved as stereolithography (STL) files. These files were analyzed to measure the volumetric change over one month, and the Kruskal-Wallis test was used for statistical analysis. A best-fit algorithm was used for superimposition, and a 3-dimensional color-coded map was used to observe the changing pattern of the impression surface. Results: No significant difference in volumetric change was found between the storage methods. In the dry-stored denture bases, significant changes were found in the palate of the upper jaw and the posterior lingual border of the lower jaw in the direction away from the underlying tissue, and in the maxillary tuberosity of the upper jaw and the retromolar pad area of the lower jaw in the direction toward the underlying tissue. Conclusion: Storing a denture underwater produces less volumetric change of the impression surface than storing it in dry air.
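A minimal sketch of the reported statistical test using scipy; the volumetric-change values below are hypothetical, not the study's measurements.

```python
from scipy.stats import kruskal

# hypothetical volumetric changes (mm^3) measured from superimposed scans
dry_stored = [1.8, 2.1, 1.6, 2.4, 1.9, 2.2]
water_stored = [1.7, 1.9, 1.5, 2.0, 1.8, 2.1]

stat, p = kruskal(dry_stored, water_stored)
print(f"H = {stat:.3f}, p = {p:.3f}")  # p >= 0.05 -> no significant difference
```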

A Prediction of N-value Using Artificial Neural Network (인공신경망을 이용한 N치 예측)

  • Kim, Kwang Myung;Park, Hyoung June;Goo, Tae Hun;Kim, Hyung Chan
    • The Journal of Engineering Geology / v.30 no.4 / pp.457-468 / 2020
  • Problems arising during pile design for plant, civil, and architectural construction mostly come from the uncertainty of geotechnical characteristics. In particular, the N-value measured through the Standard Penetration Test (SPT) is the most important datum, but it is difficult to obtain N-values by drilling investigation throughout an entire target area. There are many constraints, such as licensing, time, cost, equipment access, and residential complaints, and it is often impossible to obtain geotechnical characteristics through drilling investigation within the short bidding period of overseas projects. The geotechnical characteristics at non-drilled points are usually determined by the engineer's empirical judgment, which can lead to errors in pile design and quantity calculation, causing construction delays and cost increases. This problem could be overcome if N-values could be predicted at non-drilled points using a limited minimum of drilling investigation data. This study was conducted to predict N-values using an Artificial Neural Network (ANN), one of the artificial intelligence (AI) methods. An ANN processes a limited amount of geotechnical data through a biologically inspired logic process, providing more reliable results from the input variables. The purpose of this study is to predict N-values at non-drilled points through patterns learned by a multi-layer perceptron trained with the error back-propagation algorithm on the minimum geotechnical data. The reliability of the values predicted by the AI method was reviewed against the measured values, and high reliability was confirmed. To further address geotechnical uncertainty, we will next perform sensitivity analysis of the input variables to increase the learning effect, and some technical updates of the program may be needed. We hope that this study will be helpful for design work in the future.
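A minimal sketch of the approach with scikit-learn's MLPRegressor, a multi-layer perceptron trained by back-propagation; the input variables and synthetic N-values below are hypothetical stand-ins for real SPT data.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(7)
n = 500
# hypothetical inputs at drilled points: easting, northing, depth, elevation (normalized)
X = rng.uniform(0, 1, size=(n, 4))
N = 5 + 40 * X[:, 2] + 10 * X[:, 3] + rng.normal(scale=3, size=n)  # synthetic SPT N-values

# scale inputs, then fit a multi-layer perceptron (trained via back-propagation)
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0),
)
model.fit(X, N)

# predict the N-value at a non-drilled location
print(model.predict([[0.4, 0.6, 0.5, 0.3]]))
```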