Search | Korea Science

Predictive Clustering-based Collaborative Filtering Technique for Performance-Stability of Recommendation System (추천 시스템의 성능 안정성을 위한 예측적 군집화 기반 협업 필터링 기법)

Lee, O-Joun;You, Eun-Soon
- Journal of Intelligence and Information Systems / v.21 no.1 / pp.119-142 / 2015
With the explosive growth in the volume of information, Internet users have considerable difficulty finding the information they need online. Against this backdrop, ever-greater importance is being placed on recommender systems that provide information tailored to user preferences and tastes in order to address information overload. To this end, a number of techniques have been proposed, including content-based filtering (CBF), demographic filtering (DF) and collaborative filtering (CF). Among them, CBF and DF require external information and thus cannot be applied to a variety of domains. CF, on the other hand, is widely used since it is relatively free from this domain constraint. CF techniques are broadly classified into memory-based CF, model-based CF and hybrid CF. Model-based CF addresses the drawbacks of CF by using a Bayesian model, clustering model or dependency network model. This approach not only mitigates the sparsity and scalability issues but also boosts predictive performance. However, it involves expensive model-building and results in a tradeoff between performance and scalability. This tradeoff is attributed to reduced coverage, which is a type of sparsity issue. In addition, expensive model-building may lead to performance instability, since changes in the domain environment cannot be immediately incorporated into the model because of the high costs involved. Cumulative changes in the domain environment that are not reflected in the model eventually undermine system performance. This study combines a Markov model of transition probabilities and the concept of fuzzy clustering with clustering-based CF (CBCF) to propose predictive clustering-based CF (PCCF), which addresses the issues of reduced coverage and unstable performance. The method reduces performance instability by tracking changes in user preferences and bridging the gap between the static model and dynamic users.
Furthermore, reduced coverage is mitigated by expanding coverage based on transition probabilities and clustering probabilities. The proposed method consists of four processes. First, user preferences are normalized in preference clustering. Second, changes in user preferences are detected from review score entries during preference transition detection. Third, user propensities are normalized using patterns of changes (propensities) in user preferences in propensity clustering. Lastly, the preference prediction model is developed to predict user preferences for items during preference prediction. The proposed method was validated by testing its robustness against performance instability and the scalability-performance tradeoff. The initial test compared and analyzed the performance of individual recommender systems built on IBCF, CBCF, ICFEC and PCCF in an environment where data sparsity had been minimized. The following test adjusted the optimal number of clusters in CBCF, ICFEC and PCCF for a comparative analysis of the resulting changes in system performance. The test results revealed that the proposed method produced no significant improvement in performance in comparison with the existing techniques. In addition, it failed to achieve significant improvement in the standard deviation, which indicates the degree of data fluctuation. Nevertheless, it showed marked improvement over the existing techniques in terms of range, which indicates the level of performance fluctuation. The level of performance fluctuation before and after model generation improved by 51.31% in the initial test. In the following test, the level of performance fluctuation driven by changes in the number of clusters improved by 36.05%. This signifies that the proposed method, despite the slight performance improvement, clearly offers better performance stability than the existing techniques.
Further research on this study will be directed toward enhancing the recommendation performance that failed to demonstrate significant improvement over the existing techniques. The future research will consider the introduction of a high-dimensional parameter-free clustering algorithm or deep learning-based model in order to improve performance in recommendations.
https://doi.org/10.13088/jiis.2015.21.1.119
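
The transition-probability idea in the abstract above can be illustrated with a minimal sketch. It estimates first-order Markov transition probabilities between preference clusters from each user's sequence of cluster labels. The function name, the cluster labels and the sample histories are hypothetical, and this is not the authors' PCCF implementation, which also involves fuzzy clustering and propensity clustering.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities.

    `sequences` is an iterable of label sequences (one per user).
    Returns a nested dict: probs[current][next] = estimated probability.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every consecutive (current, next) pair in the sequence.
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1

    probs = {}
    for current, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        probs[current] = {nxt: n / total for nxt, n in nxt_counts.items()}
    return probs


# Hypothetical data: each list is one user's preference cluster over time.
histories = [
    ["A", "A", "B"],
    ["A", "B", "B"],
    ["B", "A", "A"],
]
P = transition_probabilities(histories)
print(P)
```

Each row of `P` sums to 1, so it can be read as the estimated chance that a user currently in one cluster moves to each other cluster at the next step.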

A Study on the Longitudinal Trajectories of Use Time and the Related factors for the Children in Community Children Centers (아동의 지역아동센터 이용시간의 종단적 변화유형과 영향요인에 관한 연구)

Kim, Dong Ha
- Korean Journal of Social Welfare Studies / v.49 no.2 / pp.159-180 / 2018
The purpose of this study is to identify trajectories in the time children spend at community children centers and to examine the predictive factors and developmental factors related to each trajectory. The data were drawn from the first wave (2014) through the third wave (2016) of the second stage of the Community Children Center Panel Survey. A total of 606 samples were selected from the fourth to sixth grades of elementary school. A latent class growth model was employed to identify the trajectories, and multinomial logistic regression and logistic regression analysis were used to examine the predictive and developmental factors. The main results identified three types of trajectory: a high-use group, a low-use group, and a high-initial-use, rapidly declining group. Sex, parental supervision, and use duration were found to be significant predictors. Regarding developmental factors, children who consistently used the community children centers were more likely to show improved academic performance and school adaptation. However, no significant results were found for aggression and delinquent behaviors. Based on these findings, this study suggests future directions for community children centers.
https://doi.org/10.16999/kasws.2018.49.2.159

A Study of Future Residential Land Use Change considering Climate Change using Land Use Equilibrium Model in Jeju (토지이용균형 모델을 이용한 기후변화에 따른 주거용 토지이용변화 - 제주 지역을 대상으로 -)

Yoo, Somin;Lee, Woo-Kyun;Yamagata, Yoshiki;Kim, Jiyoung;Kim, Moon-Il;Lim, Chul-Hee
- Journal of Climate Change Research / v.6 no.1 / pp.1-10 / 2015
Climate change is linked to environmental pollution caused by rapid economic growth and industrial development. For this reason, damage from abnormal climate events is increasing rapidly in Korea. In particular, cities emit large amounts of carbon as a result of this rapid growth. The government has therefore presented "low carbon green growth" as a policy for eco-friendly city planning. Because land use change is one of the important factors affecting climate change, it is being actively researched. In this study, we estimated land use change under each scenario using a land use equilibrium model, a kind of predictive land use model from Japan. First, we selected Jeju Island as the study area. Indicators were selected as input data, and the corresponding spatial data were built using a GIS program. Second, we established two future scenarios for the 2040s: a dispersion scenario and a compact scenario. Third, we compared the current residential area with the residential area under each future scenario. The results showed that the difference in residential area between the current state and the dispersion scenario was 1,230 ha, and the difference between the current state and the compact scenario was 1,515 ha. Finally, to compare carbon dioxide absorption between the dispersion and compact scenarios, we calculated the carbon dioxide absorption volume corresponding to the decrease in residential area under each future scenario. The results showed that the carbon dioxide absorption volume was 477,878 tons in the dispersion scenario and 588,606 tons in the compact scenario. The study therefore suggests that the land use equilibrium model can be used to improve the data needed for climate change stabilization, and that it can also be applied to city planning research in Korea.
https://doi.org/10.15531/KSCCR.2015.6.1.1
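
As a quick arithmetic check on the figures in the abstract above, the two scenarios imply nearly the same carbon dioxide absorption per hectare. The sketch below only divides the reported totals by the reported areas; the per-hectare factor is derived here and is not stated in the abstract, and the variable names are illustrative.

```python
# Figures reported in the abstract: (residential area difference in ha,
# carbon dioxide absorption volume in tons) for each scenario.
scenarios = {
    "dispersion": (1_230, 477_878),
    "compact": (1_515, 588_606),
}

# Derived quantity (an inference, not a reported value): tons per hectare.
per_hectare = {
    name: tons / area_ha for name, (area_ha, tons) in scenarios.items()
}

for name, value in per_hectare.items():
    print(f"{name}: {value:.2f} t/ha")
# dispersion: 388.52 t/ha
# compact: 388.52 t/ha
```

The matching ratios suggest that a single absorption factor of roughly 388.5 tons per hectare was applied to both scenarios, so the difference between the two totals comes entirely from the difference in area.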

Prediction of spatio-temporal AQI data

KyeongEun Kim;MiRu Ma;KyeongWon Lee
- Communications for Statistical Applications and Methods / v.30 no.2 / pp.119-133 / 2023
With the rapid growth of the economy and fossil fuel consumption, the concentration of air pollutants has increased significantly and the air pollution problem is no longer limited to small areas. We conduct statistical analysis of actual air quality data covering all of South Korea using R and Python. Factors such as SO₂, CO, O₃, NO₂, PM₁₀, precipitation, wind speed, wind direction, vapor pressure, local pressure, sea level pressure, temperature, humidity, and others are used as covariates. The main goal of this paper is to predict spatio-temporal air quality index (AQI) data. The observations in big spatio-temporal datasets like AQI data are correlated both spatially and temporally, and computing predictions or forecasts under this dependence structure is often infeasible. As such, the likelihood function based on the spatio-temporal model may be complicated, and some special modeling is useful for statistically reliable predictions. In this paper, we propose several methods for this big spatio-temporal AQI data. First, a random effects model with spatio-temporal basis functions, a classical statistical approach, is proposed. Next, a neural network model, a deep learning method based on artificial neural networks, is applied. Finally, a random forest model, a machine learning method that is closer to computational science, is introduced. We then compare the forecasting performance of the methods in terms of predictive diagnostics. As a result of the analysis, all three methods predicted the normal level of PM₂.₅ well, but the performance seems to be poor at extreme values.
https://doi.org/10.29220/CSAM.2023.30.2.119

Development of Mass Proliferation Control Algorithm of Phytoplankton Using Artificial Neural Network (인공신경망을 이용한 식물플랑크톤의 대량 증식 제어 알고리즘 개발)

Seonghwa Park;Jonggu Kim;Minsun Kwon
- Journal of the Korean Society of Marine Environment & Safety / v.29 no.5 / pp.435-444 / 2023
Suitable environmental conditions in Saemangeum frequently favor phytoplankton growth. There have been occurrences of sudden phytoplankton blooms, surpassing the algae management standards. A model was designed to prevent such blooms using scientific predictive techniques to forecast and regulate the possibility of phytoplankton blooms. We propose effective and efficient algae control measures concerning every phytoplankton species optimized through the policy control of nutrients (DIN, PO4-P) from rivers and controlling lake salinity using gate operations. The probability of phytoplankton blooms was initially forecast using an artificial neural network algorithm based on observations. The model's Kappa number fluctuated from 0.7889 to 1.0000, indicating good to excellent predictive power. The Garson algorithm was then utilized to assess the significance of explanatory variables for every species. Meanwhile, the probability of phytoplankton blooms was anticipated depending on the DIN and salinity value changes. Therefore, the model predicted the precise DIN and salinity concentrations to inhibit phytoplankton blooms for each species. Hence, the green algae model can create effective proactive measures to avoid future phytoplankton blooms in enormous artificial lakes.
https://doi.org/10.7837/kosomes.2023.29.5.435
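
The kappa values quoted in the abstract above (0.7889 to 1.0000) measure agreement between predicted and observed classes beyond what chance would give. A minimal sketch of the calculation follows, assuming the abstract's "Kappa number" is Cohen's kappa; the confusion matrix is hypothetical and does not come from the paper.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix.

    Rows are observed classes, columns are predicted classes.
    Undefined (division by zero) when chance agreement equals 1.
    """
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    # Chance agreement from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical bloom / no-bloom results for 100 observations.
matrix = [
    [45, 5],   # observed bloom: 45 predicted bloom, 5 predicted no bloom
    [5, 45],   # observed no bloom: 5 predicted bloom, 45 predicted no bloom
]
kappa = cohens_kappa(matrix)
print(round(kappa, 4))
```

With 90% raw agreement and 50% chance agreement, this example gives a kappa of 0.8, which sits inside the range the abstract reports; a matrix with no off-diagonal entries gives 1.0.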

Application of Predictive Microbiology for Shelf-life Estimation of Tteokgalbi Containing Dietary Fiber from Rice Bran (예측미생물학을 활용한 미강 식이섬유 함유 떡갈비의 유통기한 설정)

Heo, Chan;Kim, Hyoun-Wook;Choi, Yun-Sang;Kim, Cheon-Jei;Paik, Hyun-Dong
- Food Science of Animal Resources / v.28 no.2 / pp.232-239 / 2008
The objective of this study is to estimate the shelf-life of Tteokgalbi containing dietary fiber extracted from rice bran by using predictive microbiology. The Tteokgalbi was made with 0%, 1%, 2%, and 3% dietary fiber. The numbers of total viable cells, anaerobic, psychrotrophic, and heat-stable bacteria and coliforms were measured during 15 days of storage at 4±1°C, and the data obtained were fitted to the Baranyi function. The evaluation of fit between predicted and observed data showed that they matched satisfactorily. Heat-stable bacteria were detected at below 1 log CFU/g, and coliforms were not detected during storage. Total viable cells and psychrotrophic bacteria in the Tteokgalbi increased gradually at first and then increased dramatically after 3 days of storage. The models of total viable cells and anaerobic bacteria showed very similar growth trends and growth parameter values. The shelf-life of each Tteokgalbi was estimated from the predictive model of total viable cells and was 1.7, 2.3, 2.3, and 2.4 days, respectively. The results suggest that the prediction of bacterial growth could be used to evaluate the microbiological safety and determine the shelf-life of Tteokgalbi as a ready-to-eat food in the local market.
https://doi.org/10.5851/kosfa.2008.28.2.232

Development of Examination Model of Weather Factors on Garlic Yield Using Big Data Analysis (빅데이터 분석을 활용한 마늘 생산에 미치는 날씨 요인에 관한 영향 조사 모형 개발)

Kim, Shinkon
- Journal of the Korea Academia-Industrial cooperation Society / v.19 no.5 / pp.480-488 / 2018
The development of information and communication technology has been carried out actively in the field of agriculture to generate valuable information from large amounts of data and apply big data technology to utilize it. Crops and their varieties are determined by the influence of the natural environment such as temperature, precipitation, and sunshine hours. This paper derives the climatic factors affecting the production of crops using the garlic growth process and daily meteorological variables. A prediction model was also developed for the production of garlic per unit area. A big data analysis technique considering the growth stage of garlic was used. In the exploratory data analysis process, various agricultural production data, such as the production volume, wholesale market load, and growth data were provided from the National Statistical Office, the Rural Development Administration, and Korea Rural Economic Institute. Various meteorological data, such as AWS, ASOS, and special status data, were collected and utilized from the Korea Meteorological Agency. The correlation analysis process was designed by comparing the prediction power of the models and fitness of models derived from the variable selection, candidate model derivation, model diagnosis, and scenario prediction. Numerous weather factor variables were selected as descriptive variables by factor analysis to reduce the dimensions. Using this method, it was possible to effectively control the multicollinearity and low degree of freedom that can occur in regression analysis and improve the fitness and predictive power of regression analysis.
https://doi.org/10.5762/KAIS.2018.19.5.480

Development of Predictive Models for Rights Issues Using Financial Analysis Indices and Decision Tree Technique (경영분석지표와 의사결정나무기법을 이용한 유상증자 예측모형 개발)

Kim, Myeong-Kyun;Cho, Yoonho
- Journal of Intelligence and Information Systems / v.18 no.4 / pp.59-77 / 2012
This study focuses on predicting which firms will increase capital by issuing new stocks in the near future. Many stakeholders, including banks, credit rating agencies and investors, perform a variety of analyses of firms' growth, profitability, stability, activity, productivity, etc., and regularly report the firms' financial analysis indices. In this paper, we develop predictive models for rights issues using these financial analysis indices and data mining techniques. The study builds the predictive models from two analytical perspectives. The first is the analysis period. We divide the analysis period into before and after the IMF financial crisis, and examine whether there is a difference between the two periods. The second is the prediction time. In order to predict when firms increase capital by issuing new stocks, the prediction time is categorized as one year, two years and three years later. A total of six prediction models are therefore developed and analyzed. In this paper, we employ the decision tree technique to build the prediction models for rights issues. The decision tree is the most widely used prediction method, which builds decision trees to label or categorize cases into a set of known classes. In contrast to neural networks, logistic regression and SVM, decision tree techniques are well suited for high-dimensional applications and have strong explanation capabilities. There are well-known decision tree induction algorithms such as CHAID, CART, QUEST, C5.0, etc. Among them, we use the C5.0 algorithm, which is the most recently developed algorithm and yields better performance than the other algorithms. We obtained data on rights issues and financial analysis from TS2000 of the Korea Listed Companies Association. A record of financial analysis data consists of 89 variables, which include 9 growth indices, 30 profitability indices, 23 stability indices, 6 activity indices and 8 productivity indices.
For model building and testing, we used 10,925 financial analysis records from a total of 658 listed firms. PASW Modeler 13 was used to build C5.0 decision trees for the six prediction models. A total of 84 variables among the financial analysis data are selected as the input variables of each model, and the rights issue status (issued or not issued) is defined as the output variable. To develop prediction models using the C5.0 node (Node Options: Output type = Rule set, Use boosting = false, Cross-validate = false, Mode = Simple, Favor = Generality), we used 60% of the data for model building and 40% of the data for model testing. The results of the experimental analysis show that the prediction accuracies for data after the IMF financial crisis (68.78% to 71.41%) are about 10 percentage points higher than those before the IMF financial crisis (59.04% to 60.43%). These results indicate that since the IMF financial crisis, the reliability of financial analysis indices has increased and firms' intentions regarding rights issues have become more obvious. The experimental results also show that the stability-related indices have a major impact on conducting rights issues in the case of short-term prediction. On the other hand, the long-term prediction of conducting rights issues is affected by financial analysis indices on profitability, stability, activity and productivity. All the prediction models include the industry code as one of the significant variables. This means that companies in different types of industries show different patterns of rights issues. We conclude that it is desirable for stakeholders to take into account stability-related indices for short-term prediction and a wider variety of financial analysis indices for long-term prediction. The current study has several limitations. First, we need to compare the differences in accuracy by using different data mining techniques such as neural networks, logistic regression and SVM.
Second, new prediction models should be developed and evaluated that include variables identified in research on capital structure theory as relevant to rights issues.
https://doi.org/10.13088/jiis.2012.18.4.059

Evaluating Distress Prediction Models for Food Service Franchise Industry (외식프랜차이즈기업 부실예측모형 예측력 평가)

KIM, Si-Joong
- Journal of Distribution Science / v.17 no.11 / pp.73-79 / 2019
Purpose: The purpose of this study is to compare the predictive power of distress prediction models built with discriminant analysis and logit analysis for the food service franchise industry in Korea. Research design, data and methodology: Forty-six food service franchise firms with high sales volume in 2017 were selected as the sample. Fourteen financial ratios were calculated from the 2017 statements of financial position and income statements of the forty-six firms. The fourteen financial ratios were used as sample data and analyzed by t-test. As a result, seven statistically significant independent variables were chosen. The distress prediction models were then estimated by logit analysis and multiple discriminant analysis. Results: Differences in the mean values of the fourteen financial ratios were tested by t-test in order to extract the variables that distinguish top-level from failing food service franchise firms. The univariate tests showed that the variables differentiating top-level firms from failing firms are income to stockholders' equity, operating income to sales, current ratio, net income to assets, cash flows from operating activities, growth rate of operating income, and total assets turnover. The statistical significance of these seven financial ratios as independent variables was also confirmed by logit analysis and discriminant analysis.
Conclusions: The prediction accuracy of the distress prediction model was 84.8% with the discriminant analysis method and 89.1% with the logit analysis method, indicating that logit analysis predicts distress better than discriminant analysis. Compared with previous studies, in which the prediction accuracy of discriminant analysis and logit analysis ranges from 75% to 85%, the accuracy in this study (84.8% for discriminant analysis and 89.1% for logit analysis) falls at or above the upper end of that range and is considered high.
https://doi.org/10.13106/ijidb.2019.vol10.no11.73
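
The two accuracy figures in the abstract above can be related to the sample size with simple arithmetic. The sketch below assumes accuracy was computed over all forty-six firms; the counts of 39 and 41 correctly classified firms are inferred from the percentages and are not reported in the abstract.

```python
def accuracy_pct(correct, total):
    """Classification accuracy as a percentage, rounded to one decimal."""
    return round(100 * correct / total, 1)


N_FIRMS = 46  # sample size stated in the abstract

# Inferred counts (assumption): firms classified correctly by each model.
discriminant = accuracy_pct(39, N_FIRMS)
logit = accuracy_pct(41, N_FIRMS)

print(discriminant)  # 84.8
print(logit)         # 89.1
```

Under that assumption, the gap between the two methods corresponds to two firms out of forty-six, which is worth keeping in mind when reading the 4.3-point difference.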

Development of the Surface Forest Fire Behavior Prediction Model Using GIS (GIS를 이용한 지표화 확산예측모델의 개발)

Lee, Byungdoo;Chung, Joosang;Lee, Myung-Bo
- Journal of Korean Society of Forest Science / v.94 no.6 / pp.481-487 / 2005
In this study, a GIS model to simulate the behavior of surface forest fires was developed on the basis of a forest fire growth prediction algorithm. The model consists of three modules for data-handling, simulation and report writing. The data-handling module was designed to interpret forest fire environment factors such as terrain, fuel and weather and to provide the sets of data required for analyzing fire behavior. The simulation module simulates the fire and determines spread velocity, fire intensity and burnt area over time in relation to terrain slope, wind, effective humidity and fuel condition factors such as fuel depth, fuel loading and moisture content for fire extinction. The module is equipped with functions to infer the fuel condition factors from information extracted from digital vegetation maps and the fuel moisture from weather conditions including effective humidity, maximum temperature, precipitation and hourly irradiation. The report writer provides the results of a series of analyses for fire prediction. A performance test of the model with the 2002 Chungyang forest fire showed a predictive accuracy of 61% in spread rate.
