• Title/Summary/Keyword: Vector space


WebPR: A Dynamic Web Page Recommendation Algorithm Based on Mining Frequent Traversal Patterns

  • Yoon, Sun-Hee;Kim, Sam-Keun;Lee, Chang-Hoon
    • The KIPS Transactions: Part B / Vol. 11B, No. 2 / pp. 187-198 / 2004
  • The World-Wide Web is the largest distributed information space and has grown to encompass diverse information resources. However, although the Web is growing exponentially, an individual's capacity to read and digest content is essentially fixed. Web users can be confused by the explosion of Web information, by constantly changing Web environments, and by a lack of understanding of their needs. In such environments, mining traversal patterns is an important problem in Web mining, with a host of application domains including system design and information services. Conventional traversal pattern mining systems use the inter-page associations in sessions with only a very restricted mechanism (based on a vector or matrix) for generating frequent k-pagesets. We develop a family of novel algorithms (termed WebPR, for Web Page Recommend) that mine frequent traversal patterns and then the pagesets to recommend. Our algorithms provide Web users with new page views that include the recommended pagesets, so that users can traverse the Web site effectively. The main distinguishing factors are a scheme that consistently applies inter-page associations when mining frequent traversal patterns, and the proposal of the most efficient tree model. Our experiments with two real data sets, including Lady Asiana and the KBS media server site, clearly validate that our method outperforms conventional methods.

Multiple Cause Model-based Topic Extraction and Semantic Kernel Construction from Text Documents

  • 장정호;장병탁
    • Journal of KIISE: Software and Applications / Vol. 31, No. 5 / pp. 595-604 / 2004
  • Automatic analysis of concepts or semantic relations from text documents enables not only efficient acquisition of relevant information, but also comparison of documents at the concept level. We present a multiple cause model-based approach to text analysis, in which latent topics are automatically extracted from document sets and the similarity between documents is measured by semantic kernels constructed from the extracted topics. In our approach, a document is assumed to be generated by various combinations of underlying topics. A topic is defined by a set of words that are related to the same topic or co-occur frequently within a document. In a network representing a multiple-cause model, each topic is identified by a group of words having high connection weights from a latent node. To facilitate learning and inference in multiple-cause models, some approximation methods are required, and we use an approximation by Helmholtz machines. In an experiment on the TDT-2 data set, we extract sets of meaningful words, where each set contains some theme-specific terms. Using semantic kernels constructed from latent topics extracted by multiple cause models, we also achieve significant improvements over the basic vector space model in terms of retrieval effectiveness.
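
The contrast between the basic vector space model and a topic-based semantic kernel can be illustrated with a small sketch. The vocabulary, the topic weights, and the helper names `cosine` and `project` below are hypothetical and chosen only for illustration; in the paper the topic weights would come from a trained multiple-cause model, which is not reproduced here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def project(doc, topics):
    """Map a term vector to topic space: one coordinate per topic."""
    return [sum(w * x for w, x in zip(topic, doc)) for topic in topics]

# Hypothetical vocabulary: ["car", "automobile", "engine", "bank"]
doc_a = [1, 0, 1, 0]  # contains "car" and "engine"
doc_b = [0, 1, 0, 0]  # contains "automobile"

# Hypothetical topic-to-word weights (assumed, not taken from the paper).
topics = [
    [0.9, 0.9, 0.8, 0.0],  # a "vehicles" topic
    [0.0, 0.0, 0.0, 1.0],  # a "finance" topic
]

# Basic vector space model: the documents share no terms, so similarity is 0.0.
plain = cosine(doc_a, doc_b)

# Topic-based kernel (normalized): both documents load on the same topic,
# so the similarity is close to 1.0.
semantic = cosine(project(doc_a, topics), project(doc_b, topics))
```

The point of the sketch is only that documents with no shared terms can still be judged similar once they are compared in a topic space.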

Relevance Feedback using Region-of-interest in Retrieval of Satellite Images

  • Kim, Sung-Jin;Chung, Chin-Wan;Lee, Seok-Lyong;Kim, Deok-Hwan
    • Journal of KIISE: Databases / Vol. 36, No. 6 / pp. 434-445 / 2009
  • Content-based image retrieval (CBIR) is a retrieval technique that uses the contents of images. However, in contrast to text data, multimedia data are ambiguous, and there is a large gap between the system's low-level representation and the human's high-level concepts. As a result, points that are near each other in the vector space are not always similar from the user's point of view. We call this the semantic-gap problem, and it degrades image retrieval performance. Relevance feedback (RF), which uses the user's feedback information, is used to address this problem. However, existing RF does not consider the user's region of interest (ROI), so irrelevant regions are used in computing new query points. Because the system does not know the user's ROI, RF proceeds at the image level. In this paper, we propose a new ROI RF method that guides the user to select ROIs from relevant images when retrieving complex satellite images; this improves retrieval accuracy by computing more accurate query points. We also propose a pruning technique that improves retrieval accuracy by using the images not selected by the user. Experiments show the efficiency of the proposed ROI RF and the pruning technique.
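
The idea of computing a new query point from user feedback can be sketched with a Rocchio-style update, a standard relevance feedback technique that is swapped in here for illustration; the abstract does not state which update rule the authors use. The function names, the weights `alpha` and `beta`, and the feature values are assumptions. With ROI-based feedback, the `relevant` vectors would be features of the user-selected regions rather than of whole images.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def update_query(query, relevant, alpha=0.5, beta=0.5):
    """Move the query point toward the centroid of the relevant feature vectors."""
    c = centroid(relevant)
    return [alpha * q + beta * r for q, r in zip(query, c)]

# Hypothetical 2-D feature vectors for the query and two items marked relevant.
query = [0.0, 0.0]
relevant = [[1.0, 1.0], [3.0, 1.0]]

new_query = update_query(query, relevant)  # [1.0, 0.5]
```

If irrelevant regions contribute to the `relevant` vectors, the centroid drifts away from what the user meant, which is the problem the abstract attributes to image-level feedback.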

Extracting Typical Group Preferences through User-Item Optimization and User Profiles in Collaborative Filtering System

  • Ko Su-Jeong
    • Journal of KIISE: Software and Applications / Vol. 32, No. 7 / pp. 581-591 / 2005
  • Collaborative filtering systems suffer from sparsity and from providing recommendations by correlating only two users' preferences. These systems recommend items based only on preferences, without taking into account the contents of the items. As a result, the accuracy of recommendations depends on the data from user-rated items. When users rate items, not all users can be expected to do so earnestly, which lowers the accuracy of recommendations. This paper proposes a collaborative recommendation method for extracting typical group preferences using user-item matrix optimization and user profiles in collaborative filtering systems. The method excludes unproven users by using entropy based on data from user-rated items, groups users into clusters after generating user profiles, and then extracts typical group preferences. The proposed method generates collaborative user profiles by using association word mining to reflect the contents of items as well as preferences, and groups users into clusters based on the profiles by using the vector space model and the K-means algorithm. To compensate for the shortcoming of providing recommendations using correlations between only two users' preferences, the proposed method extracts typical preferences of groups using entropy theory. The typical preferences are extracted by combining user entropies with item preferences. The recommender system using typical group preferences solves the problem caused by recommendations based on preferences rated incorrectly by users and reduces the time for retrieving the most similar users in groups.
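
As a rough illustration of an entropy score computed from a user's ratings, the sketch below takes the Shannon entropy of each user's rating distribution. This is an assumption about what "entropy based on data from user-rated items" could look like; the abstract does not give the exact definition or the rule used to exclude users, and the function name `rating_entropy` is hypothetical.

```python
import math
from collections import Counter

def rating_entropy(ratings):
    """Shannon entropy (in bits) of the distribution of a user's rating values."""
    counts = Counter(ratings)
    total = len(ratings)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical users: one gives every item the same score, one uses the full scale.
uniform_rater = [5, 5, 5, 5]
varied_rater = [1, 2, 3, 4]

low = rating_entropy(uniform_rater)   # zero entropy: all ratings identical
high = rating_entropy(varied_rater)   # 2 bits: four equally likely rating values
```

How such a score is thresholded, and how it is combined with item preferences to form group preferences, is left to the paper.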

Feature Selection to Predict Very Short-term Heavy Rainfall Based on Differential Evolution

  • Seo, Jae-Hyun;Lee, Yong Hee;Kim, Yong-Hyuk
    • Journal of the Korean Institute of Intelligent Systems / Vol. 22, No. 6 / pp. 706-714 / 2012
  • The Korea Meteorological Administration provided a weather dataset covering the most recent four years for our very short-term heavy rainfall prediction. We divided the dataset into three parts: training, validation, and test sets. Through feature selection, we select only the important features among 72 features, to avoid a solution space that grows exponentially with the dimensionality. We used a differential evolution algorithm, with two classifiers serving as the fitness function of the evolutionary computation, to select a more accurate feature subset. One of the classifiers is the Support Vector Machine (SVM), which shows high performance, and the other is k-Nearest Neighbor (k-NN), which is fast in general. The test results of SVM were better than those of k-NN in our experiments. We also processed the weather data using undersampling and normalization techniques. The test results of our differential evolution algorithm were about five times better than those using all features and about 1.36 times better than those using a genetic algorithm, which is the best known. Running times with a genetic algorithm were about twenty times longer than those with the differential evolution algorithm.

Early Estimation of Rice Cultivation in Gimje-si Using Sentinel-1 and UAV Imagery

  • Lee, Kyung-do;Kim, Sook-gyeong;Ahn, Ho-yong;So, Kyu-ho;Na, Sang-il
    • Korean Journal of Remote Sensing / Vol. 37, No. 3 / pp. 503-514 / 2021
  • Rice production from an adequate cultivation area is important for decision making in rice supply and demand policy. It is essential to identify rice cultivation areas in advance in order to estimate the year's rice production. This study classified paddy rice cultivation in Gimje-si using Sentinel-1 SAR (synthetic aperture radar) and UAV imagery in early July. Time-series Sentinel-1A and 1B images acquired from early May to early July were converted into sigma naught (dB) images using the SNAP (SeNtinel Application Platform, version 8.0) toolbox provided by the European Space Agency. A farm map and a parcel map, which are vector polygon spatial data, were used to stratify the paddy field population for classifying rice paddy cultivation. To distinguish paddy rice from other crops grown in the paddy fields, we used a decision tree method with threshold levels and a random forest model. The random forest model, trained on the mainly rice cultivation area and the rice and soybean cultivation area within the UAV image area, showed the best performance, with an overall accuracy of 89.9% and a Kappa coefficient of 0.774. This confirmed the possibility of early estimation of the rice cultivation area in Gimje-si using UAV imagery.
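
For readers unfamiliar with the two reported metrics, the sketch below computes overall accuracy and Cohen's Kappa coefficient from a confusion matrix. The matrix values are hypothetical and are not the study's data; the function name `accuracy_and_kappa` is likewise an illustrative choice.

```python
def accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical 2-class result (rice vs. other crops), 100 parcels in total.
confusion = [
    [40, 10],  # true rice: 40 predicted rice, 10 predicted other
    [5, 45],   # true other: 5 predicted rice, 45 predicted other
]

accuracy, kappa = accuracy_and_kappa(confusion)  # about 0.85 and 0.70
```

Kappa is lower than accuracy because it discounts the agreement that would be expected by chance given the class totals.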

Vehicle-Bridge Interaction Analysis of Railway Bridges by Using Conventional Trains

  • Cho, Eun Sang;Kim, Hee Ju;Hwang, Won Sup
    • KSCE Journal of Civil and Environmental Engineering Research / Vol. 29, No. 1A / pp. 31-43 / 2009
  • This study presents a numerical method that can consider various train types and that, by formulating the coupled equations of motion, solves the vehicle-bridge interaction problem with a non-iterative procedure. The coupled equations of motion for the vehicle-bridge interaction are solved by the Newmark $\beta$ method, a direct integration method; by composing the effective stiffness matrix and the effective force vector at each analysis step, they can be solved in the same manner as the equilibrium equations in static analysis. The effective stiffness matrix is also restructured using the Skyline method to increase the efficiency of the analysis. Cholesky's matrix decomposition scheme is applied in the analysis procedure to minimize the numerical errors that can arise when the inverse matrix is calculated directly. The equations of motion for the conventional trains are derived, and the numerical models of the conventional trains are idealized as a set of linear springs and dashpots with 16 degrees of freedom. The bridge models are simplified using three-dimensional space frame elements based on Euler-Bernoulli theory. Rail irregularities in the vertical and lateral directions are generated from the PSD functions of the Federal Railroad Administration (FRA). The results of the vehicle-bridge interaction analysis are verified against experimental results for railway plate girder bridges with span lengths of 12 m and 18 m. A low-pass filter is applied to both the experimental and analytical data, with the filter frequency set to twice the first fundamental bending frequency of the bridge.
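
The effective-stiffness form of the Newmark $\beta$ method described above can be sketched for a single-degree-of-freedom oscillator. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `newmark_sdof`, the parameter values, and the average-acceleration choice ($\beta = 1/4$, $\gamma = 1/2$) are assumed here. In the multi-degree-of-freedom case, the scalar division by the effective stiffness becomes a linear solve, which is where a Cholesky factorization would be used.

```python
import math

def newmark_sdof(m, c, k, force, u0, v0, dt, steps, beta=0.25, gamma=0.5):
    """Integrate m*a + c*v + k*u = force(t) and return the displacement history."""
    u, v = u0, v0
    a = (force(0.0) - c * v - k * u) / m  # initial acceleration from equilibrium
    # Effective stiffness: constant for a linear system with a fixed time step.
    k_eff = k + gamma / (beta * dt) * c + m / (beta * dt * dt)
    history = [u]
    for n in range(1, steps + 1):
        # Effective force: external load plus terms carried over from step n-1.
        f_eff = (
            force(n * dt)
            + m * (u / (beta * dt * dt) + v / (beta * dt) + (0.5 / beta - 1.0) * a)
            + c * (gamma / (beta * dt) * u + (gamma / beta - 1.0) * v
                   + dt * (0.5 * gamma / beta - 1.0) * a)
        )
        u_new = f_eff / k_eff  # same form as a static solve K u = F
        a_new = ((u_new - u) / (beta * dt * dt) - v / (beta * dt)
                 - (0.5 / beta - 1.0) * a)
        v = v + dt * ((1.0 - gamma) * a + gamma * a_new)
        u, a = u_new, a_new
        history.append(u)
    return history

# Undamped free vibration with a natural period of 1 s, released from u = 1.
history = newmark_sdof(
    m=1.0, c=0.0, k=(2.0 * math.pi) ** 2,
    force=lambda t: 0.0, u0=1.0, v0=0.0, dt=0.001, steps=1000,
)
```

After one full period the displacement returns close to its initial value, and after half a period it is close to the opposite extreme, which gives a quick sanity check on the integrator.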

Comparison of the Vertical Data between Eulerian and Lagrangian Method

  • Hyeok-Jin Bae;Byung Hyuk Kwon;Sang Jin Kim;Kyung-Hun Lee;Geon-Myeong Lee;Yu-Jin Kim;Ji-Woo Seo;Yu-Jung Koo
    • The Journal of the Korea institute of electronic communication sciences / Vol. 18, No. 6 / pp. 1009-1014 / 2023
  • Comprehensive Eulerian and Lagrangian observations were performed to obtain observation data with high spatial and temporal resolution for the complex environment of a new city. The two radiosondes, which measure meteorological parameters by the Lagrangian method, produced air pressure, wind speed, and wind direction. These were generally consistent with each other even though the observation points or times differed. The temperature measured during the day by the sensor exposed to the air was relatively high as the altitude increased, owing to the influence of solar radiation. A temporal difference in wind direction and speed was found when comparing the Eulerian wind profiler data with the radiosonde data. When the wind field is horizontally inhomogeneous, this result implies that the advection component must be considered when comparing the data of the two observation methods. This study proposes a method that uses observation data at different times for each altitude section, depending on the observation period of the Eulerian method, to compare the data of the two observation methods effectively.

A Spatial-Temporal Correlation Analysis of Housing Prices in Busan Using SpVAR (Spatial Vector Autoregressive Model) and GSTAR (Generalized Space-Time Autoregressive Model)

  • Kwon, Youngwoo;Choi, Yeol
    • KSCE Journal of Civil and Environmental Engineering Research / Vol. 44, No. 2 / pp. 245-256 / 2024
  • Since 2020, quantitative easing and easy money policies have been implemented for the purpose of economic stimulus. As a result, real estate prices have skyrocketed. In this study, the relationship between sales and rental prices by housing type during the period of soaring real estate prices in Busan was analyzed spatio-temporally. Monthly data by housing type, transaction type, and district were constructed from actual transaction price data. Among the spatio-temporal analysis models, the SpVAR, which is used to understand the temporal and spatial effects of variables, and the GSTAR, which is used to understand the effects of each region on those variables, were used. As a result, the sales price of apartments had a positive effect on the sales prices of apartments, row houses, and detached houses in the surrounding areas, including the target area. On the other hand, it was confirmed that demand shifted to apartment rentals because of the increase in apartment sales prices, and that the sales price fell again over time. The spatio-temporal spillover effect of apartments was positive, but the positive effects of row houses and detached houses were concentrated in the original downtown area.

A study on the rock mass classification in boreholes for a tunnel design using machine learning algorithms

  • Lee, Je-Kyum;Choi, Won-Hyuk;Kim, Yangkyun;Lee, Sean Seungwon
    • Journal of Korean Tunnelling and Underground Space Association / Vol. 23, No. 6 / pp. 469-484 / 2021
  • Rock mass classification results have a great influence on the construction schedule and budget, as well as on tunnel stability, in tunnel design. A total of 3,526 tunnels have been constructed in Korea, and the associated tunnel design and construction techniques have been developed continuously; however, few studies have addressed how to assess rock mass quality and grade more accurately. As a result, numerous cases show large differences in results depending on inspectors' experience and judgement. This study therefore aims to propose a more reliable rock mass classification (RMR) model using machine learning algorithms, which are rapidly becoming more available, through analyses based on various rock and rock mass information collected from boring investigations. For this purpose, 11 learning parameters (depth, rock type, RQD, electrical resistivity, UCS, Vp, Vs, Young's modulus, unit weight, Poisson's ratio, RMR) from 13 local tunnel cases were selected, 337 learning data sets and 60 test data sets were prepared, and 6 machine learning algorithms (DT, SVM, ANN, PCA & ANN, RF, XGBoost) were tested with various hyperparameters for each algorithm. The results show that the mean absolute errors in RMR value for the five algorithms other than Decision Tree were less than 8, and that the Support Vector Machine model was the best. The applicability of the model established in this study was confirmed, and this prediction model can be applied for more reliable rock mass classification as additional data continue to accumulate.