Scalable Collaborative Filtering Technique based on Adaptive Clustering (적응형 군집화 기반 확장 용이한 협업 필터링 기법)
Journal of Intelligence and Information Systems, v.20 no.2, pp.73-92, 2014
An adaptive clustering-based collaborative filtering technique is proposed to address the fundamental problems of collaborative filtering: the cold-start, scalability, and data sparsity problems. Previous collaborative filtering techniques make recommendations from the predicted preference of a user for a particular item, using a similar-item subset and a similar-user subset built from users' preferences for items. For this reason, if the density of the user preference matrix is low, it becomes harder to build the similar-item and similar-user subsets, and the reliability of the recommendation system drops rapidly. In addition, as the scale of the service increases, the time needed to create the similar-item and similar-user subsets increases geometrically, and the response time of the recommendation system increases with it. To solve these problems, this paper suggests a collaborative filtering technique that actively adapts the model to conditions and adopts concepts from context-based filtering. The technique consists of four major methodologies. First, items and users are clustered according to their feature vectors, and an inter-cluster preference between each item cluster and user cluster is then estimated. With this method, the run time for creating a similar-item or similar-user subset can be reduced, the reliability of the recommendation system can be made higher than when only the user preference information is used to create those subsets, and the cold-start problem can be partially solved. Second, recommendations are made using the previously composed item and user clusters and the inter-cluster preference between each item cluster and user cluster.
In this phase, a list of items is made for a user by examining the item clusters in descending order of the inter-cluster preference of the user cluster to which the user belongs, and by selecting and ranking items according to the predicted or recorded user preference information. With this method, the model-creation phase bears the highest load of the recommendation system, which minimizes the load at run time. Therefore, the scalability problem is mitigated, and a large-scale recommendation system can be operated with highly reliable collaborative filtering. Third, missing user preference information is predicted using the item and user clusters. This mitigates the problem caused by the low density of the user preference matrix. Existing studies used either an item-based prediction or a user-based prediction. This paper improves on Hao Ji's idea, which uses both an item-based and a user-based prediction. The reliability of the recommendation service can be improved by combining the predicted values of both techniques according to the condition of the recommendation model. By predicting the user preference from the item or user clusters, the time required for prediction is reduced, and missing user preferences can be predicted at run time. Fourth, the item and user feature vectors are made to learn from subsequent user feedback. This phase applies normalized user feedback to the item and user feature vectors. This method can mitigate the problems caused by adopting the concepts of context-based filtering, such as item and user feature vectors based on the user profile and item properties. The problems with using item and user feature vectors stem from the limits of quantifying the qualitative features of items and users.
Therefore, the elements of the user and item feature vectors are made to match one to one, and when user feedback on a particular item is obtained, it is applied to each feature vector using the opposite one. The method was verified by comparing its performance with existing hybrid filtering techniques. Two measures were used for verification: MAE (Mean Absolute Error) and response time. The MAE results confirmed that the technique improves the reliability of the recommendation system. The response-time results showed that the technique is suitable for a large-scale recommendation system. This paper suggested an adaptive clustering-based collaborative filtering technique with high reliability and low time complexity, but it has some limitations. The technique focused on reducing time complexity; hence, an improvement in reliability was not expected. The next topic will be to improve this technique with rule-based filtering.
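The cluster-level recommendation step and the MAE measure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the cluster assignments are supplied by hand rather than derived from feature vectors, the inter-cluster preference is assumed to be a plain average of recorded ratings, only items the user has not yet rated are listed, and all names (`inter_cluster_preference`, `recommend`, the toy data) are hypothetical.

```python
from collections import defaultdict


def inter_cluster_preference(ratings, user_cluster, item_cluster):
    """Average the recorded ratings for every (user cluster, item cluster) pair."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (user, item), rating in ratings.items():
        key = (user_cluster[user], item_cluster[item])
        sums[key] += rating
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def recommend(user, ratings, user_cluster, item_cluster, pref, top_n=3):
    """List unrated items, visiting item clusters from highest to lowest preference."""
    uc = user_cluster[user]
    # Unknown cluster pairs default to 0.0 (an assumption); ties break on the label.
    clusters = sorted(set(item_cluster.values()),
                      key=lambda c: (-pref.get((uc, c), 0.0), c))
    picked = []
    for cluster in clusters:
        for item in sorted(i for i, c in item_cluster.items() if c == cluster):
            if (user, item) not in ratings:
                picked.append(item)
                if len(picked) == top_n:
                    return picked
    return picked


def mean_absolute_error(predicted, actual):
    """MAE: mean of the absolute differences between predictions and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Toy data (hypothetical): ratings keyed by (user, item).
ratings = {("u1", "a"): 5, ("u1", "b"): 4, ("u2", "a"): 4,
           ("u2", "c"): 1, ("u3", "c"): 5, ("u3", "d"): 4}
user_cluster = {"u1": 0, "u2": 0, "u3": 1}
item_cluster = {"a": "X", "b": "X", "c": "Y", "d": "Y", "e": "X"}

pref = inter_cluster_preference(ratings, user_cluster, item_cluster)
print(recommend("u2", ratings, user_cluster, item_cluster, pref))  # ['b', 'e', 'd']
```

Because the per-pair averages are computed once when the model is built, the run-time work per request is limited to sorting a handful of clusters, which is the load shift the abstract describes.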
The objective of this study is to develop a computer program that arranges a forest-road network so as to maximize the investment effect of forest-road construction, taking into account factors such as terrain, forest physiognomy, management plan, logging system, cost of forest-road construction, capacity of input labor, and capacity of timber production. The program was developed on Korean Windows 95/98 with Microsoft Visual Basic ver. 5.0. The user interface was designed with a systematic structure and is presented as a GUI (graphical user interface). The program outputs the most suitable forest-road arrangement and a suitable forest-road density calculated from the cost of logging, the cost of forest-road construction, the diversion ratio of the forest road, and the cost of walking in the forest. The most suitable arrangement is the forest-road network that maximizes the investment effect by minimizing the sum of the cost of logging and the cost of forest-road construction. Input data are divided into map data and control data. The digital terrain model, the division of the forest-road layout plan, the division of forest function, and the existing road network are obtained from the map data. The cost of logging by terrain division, the diversion ratio of forest roads and working roads, the cost of forest-road construction, the cost of walking, the cost of labor, walking speed, capacity of input labor, capacity of timber production, and total distance of forest road are supplied as control data. Map data are entered by the mesh method as a common matrix. The program can be used to plan a new forest road, or a secondary forest road that complements an existing forest road, for functional forestry.
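The idea of choosing a road density that minimizes the sum of logging cost and road construction cost can be illustrated with a deliberately simplified model. The abstract does not give the program's actual cost functions, so the sketch below assumes that construction cost grows linearly with density and that extraction cost is inversely proportional to it; the diversion ratio, walking cost, and similar inputs are folded into a single hypothetical coefficient `extraction_coeff`.

```python
import math


def total_cost(density, road_cost_per_m, extraction_coeff):
    """Cost per hectare under the assumed model: construction rises with road
    density (m/ha) while extraction (logging, walking) falls with it."""
    return road_cost_per_m * density + extraction_coeff / density


def optimal_density(road_cost_per_m, extraction_coeff):
    """Density where d(cost)/d(density) = 0, i.e. sqrt(extraction_coeff / road_cost)."""
    return math.sqrt(extraction_coeff / road_cost_per_m)


# Hypothetical inputs: 20 cost units per metre of road, extraction coefficient 8000.
best = optimal_density(20.0, 8000.0)
print(best, total_cost(best, 20.0, 8000.0))  # 20.0 800.0
```

At the optimum the two cost components are equal, which is a quick sanity check on any density the simplified model returns.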
CREAMS II carried out an intensive hydrographic survey covering almost the entire East Sea in 1999. Hydrographic data from a total of 203 stations were released to the public on the internet. This paper summarizes the results of water mass analysis by the OMP (Optimum Multiparameter) method, which uses temperature, salinity, dissolved oxygen, pH, alkalinity, silicate, nitrate, phosphate, and location data as the input data matrix. A total of eight source water types are identified in the East Sea: four surface water types (North Korea Surface Water, Tatar Surface Cold Water, East Korean Coastal Water, Modified Tsushima Surface Water), two intermediate water types (Tsushima Middle Water, Liman Cold Water), and two deep water types (East Sea Intermediate Water, East Sea Proper Water). Of these, NKSW, MTSW, and TSCW are newly reported as source water types. The distribution of each water type reveals several interesting hydrographic features. A few noteworthy ones are summarized as follows: The Tsushima Warm Current enters the East Sea as three branches; East Korea Coastal Water propagates north along the coast around
Purpose: The purpose of this study was to investigate the current status of nuclear medicine quality control in Korea and to test selected quality control protocols for nuclear medicine counting systems and gamma cameras. Materials and Methods: Fifty-three hospitals were included to investigate the current status of nuclear medicine quality control in Korea. The precision of the dose calibrator and the thyroid uptake system was measured with Tc-99m 35.52 MBq for 2 minutes and Tc-99m 5.14 MBq for 10 sec every one minute, respectively. The sensitivity of CeraSPECT
The purpose of this study is to investigate the present state of nursing inputs, the conversion process, and outputs in Oriental University Medical Centers in Korea, in order to achieve high-quality Oriental nursing outcomes, which is the ultimate purpose of Oriental nursing management, and to develop a matrix of the Oriental nursing management system on that basis. The subjects for nursing input and output contents were eighteen nursing directors in eleven Oriental University Medical Centers and two hundred thirty-nine nurses with three or more years of experience in an Oriental medical center. The subjects for Oriental nursing organization, personnel management, and control function were nineteen Oriental medical centers within Oriental University Medical Centers in Korea. Data were collected by questionnaire from November 2002 to February 2003 and analyzed with the SPSS PC+ 12 program. Frequency, percentage, and minimum/maximum values were used for input contents, and frequency and percentage were used for the conversion process and output contents.
1. The input factors of the Oriental nursing management system: One hundred seventy-five respondents (73.2%) had more than five years of experience in Western hospitals. Nursing in-service education was performed in fourteen hospitals (77.8%). Two hundred respondents (83.7%) were in favor of an Oriental nurse system. Only four hospitals (22.2%) had an independent budget for the nursing division. The ratio of nursing staff allocation to beds ranged from 2.8:1 to 9.06:1, with a large gap between hospitals.
2. The conversion factors of the Oriental nursing system
1) Oriental nursing system: The nursing organization was independent in ten of the eighteen hospitals. Recruitment of nurses, a vital role of the hospital nursing division, was mostly open (79%). Education for developing nursing personnel was provided through in-service education in 97.4%.
Education on Oriental nursing and management was provided in 42.1% (eight hospitals), and education for reserves in 36.8% (seven hospitals). Nursing education was administered by the nursing division in 68.5% (thirteen hospitals). Post-education evaluation was done by report submission in 36.8% (seven hospitals), by written examination in 26.3%, by questionnaire in 21.1%, and by lecture presentation in 15.8%. Nursing directors attended the directors' meeting in 84.2% (sixteen hospitals); the meeting type was the medical executive and support division executive meeting in 55.6% (ten hospitals) and the personnel management meeting in 39.6% (seven hospitals).
2) The actual conditions of Oriental nursing personnel management: The reason for working in an Oriental hospital was voluntary choice in 67.1% (one hundred sixty persons), an order from the nursing department in 28.0% (sixty-seven persons), and other reasons in 5.0% (twelve persons). The shift pattern was three shifts in 94.7% (eighteen hospitals) and two shifts in only one hospital. Duty assignment was functional in 52.6% (ten hospitals) and combined team and functional in 26.3% (five hospitals); no hospital used team assignment alone. A promotion manual existed in 68.4% (thirteen hospitals), and the competency criteria comprised performance evaluation in 79%, followed by interview, written examination, training results, and study results. No labor union existed in 79% (fifteen hospitals).
3) Oriental nursing preceptor system: Five Oriental hospitals (27.7%) operated a preceptor model, a lower rate than among the twenty-two university hospitals in Seoul, of which fifteen (72.7%) had the system. Asked whether an Oriental nurse system is necessary, two hundred (83.7%) of the two hundred thirty-nine respondents with more than three years of experience in an Oriental hospital answered positively.
4) The control of Oriental nursing: Evaluation results were not disclosed in 89.4% of the Oriental hospitals. Thirteen hospitals (68.3%) used evaluation by direct managers, followed by three hospitals (15.8%) that used evaluation by direct managers and self-evaluation. One hospital each (5.3%) used evaluation by peers and superiors, by peers, and by subordinates; no hospital used evaluation by superiors, peers, subordinates, and self, or by superiors, peers, and self. Nursing QI activities were nursing service evaluation in 42.1% (eight hospitals), ECSI survey in 36.8%, ICSI survey in 26.3%, medical visit rate in 15.8%, and hospital standardization inspection in 10%.
3. The output factors of the Oriental nursing management system: Job satisfaction was good in general: very good in thirty-seven persons (15.7%), good in one hundred fourteen persons (48.3%), and fair in eighty-five persons (36.0%).
System trading has recently become more popular among Korean traders. System traders use automatic order systems based on system-generated buy and sell signals. These signals are generated from predetermined entry and exit rules coded by the system traders. Most research on system trading has focused on designing profitable entry and exit rules using technical indicators. However, market conditions, strategy characteristics, and money management also influence the profitability of system trading. Unexpected price deviations from the predetermined trading rules can cause large losses for system traders. Therefore, most professional traders use strategy portfolios rather than a single strategy. Building a good strategy portfolio is important because trading performance depends on it. Despite the importance of designing strategy portfolios, rule-of-thumb methods have been used to select trading strategies. In this study, we propose an SVM-based strategy portfolio management system. SVM was introduced by Vapnik and is known to be effective in data mining. It can build good portfolios within a very short period of time. Since SVM minimizes structural risk, it is well suited to the futures trading market, in which prices do not move exactly as they did in the past. Our system trading strategies include the moving-average cross system, MACD cross system, trend-following system, buy-dips-and-sell-rallies system, DMI system, Keltner channel system, Bollinger Bands system, and Fibonacci system. These strategies are well known and frequently used by many professional traders. We program these strategies to generate automated system signals for entry and exit. We propose an SVM-based strategy selection system and a portfolio construction and order routing system. The strategy selection system is a portfolio training system. It generates training data and builds an SVM model using the optimal portfolio. We make
Recently, with the development of deep learning technology, research on unstructured data analysis has been actively conducted, showing remarkable results in various fields such as classification, summarization, and generation. Among the various text analysis fields, text classification is the most widely used in academia and industry. Text classification includes binary classification, which assigns one label from two classes; multi-class classification, which assigns one label from several classes; and multi-label classification, which assigns multiple labels from several classes. In particular, multi-label classification requires a different training method from binary and multi-class classification because each instance has multiple labels. In addition, since the number of labels to be predicted grows as the number of labels and classes increases, performance is difficult to improve because prediction becomes harder. To overcome these limitations, research on label embedding is being actively conducted; label embedding (i) compresses the initially given high-dimensional label space into a low-dimensional latent label space, (ii) trains a model to predict the compressed label, and (iii) restores the predicted label to the high-dimensional original label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, because these techniques consider only linear relationships between labels or compress the labels by random transformation, they cannot capture non-linear relationships between labels and therefore cannot create a latent label space that sufficiently contains the information of the original labels.
Recently, there have been increasing attempts to improve performance by applying deep learning to label embedding. A representative approach is label embedding using an autoencoder, a deep learning model that is effective for data compression and restoration. However, traditional autoencoder-based label embedding loses a large amount of information when compressing a high-dimensional label space with a very large number of classes into a low-dimensional latent label space. This can be traced to the gradient loss (vanishing gradient) problem that occurs during backpropagation. To solve this problem, the skip connection was devised: by adding the input of a layer to its output, it prevents gradient loss during backpropagation, so efficient learning is possible even when the network is deep. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies using skip connections in autoencoders or in the label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. The proposed methodology was applied to actual paper keywords to derive the high-dimensional keyword label space and the low-dimensional latent label space. Using these, we conducted an experiment that predicts the compressed keyword vector in the latent label space from the paper abstract and evaluates multi-label classification by restoring the predicted keyword vector to the original label space. As a result, the proposed methodology showed far superior accuracy, precision, recall, and F1 score in multi-label classification compared with traditional multi-label classification methods.
This shows that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of multi-label classification itself. In addition, the utility of the proposed methodology was examined by comparing its performance across domain characteristics and across the number of dimensions of the latent label space.
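The skip connection used in this abstract, adding a layer's input to its output, can be shown with a small forward pass. The sketch below is only an illustration of that mechanism in plain Python: the layer widths, the ReLU activation, the hand-written weights, and the function names are assumptions, and no training is performed.

```python
def relu(vector):
    return [max(0.0, x) for x in vector]


def linear(weights, bias, x):
    """Dense layer; `weights` holds one row of coefficients per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def skip_block(weights, bias, x):
    """Same-width layer whose input is added back to its output (skip connection)."""
    hidden = relu(linear(weights, bias, x))
    return [h + xi for h, xi in zip(hidden, x)]


def encode(x, block_w, block_b, proj_w, proj_b):
    """Skip block followed by a projection into the low-dimensional latent label space."""
    return linear(proj_w, proj_b, skip_block(block_w, block_b, x))


# A 4-dimensional multi-hot label vector compressed to 2 latent dimensions.
labels = [1.0, 0.0, 1.0, 0.0]
zero_w = [[0.0] * 4 for _ in range(4)]
zero_b = [0.0] * 4
proj_w = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
proj_b = [0.0, 0.0]

print(skip_block(zero_w, zero_b, labels))              # [1.0, 0.0, 1.0, 0.0]
print(encode(labels, zero_w, zero_b, proj_w, proj_b))  # [1.0, 1.0]
```

With all block weights at zero the input still passes through unchanged, which is the property that keeps label information (and gradients) flowing through deep encoders and decoders.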
In this paper, we develop a deep learning structure for a complex microbial incubator that applies deep learning prediction results. The proposed work consists of preprocessing of complex microbial data, conversion of the data structure, design of the deep learning network, training of the designed network, and development of a GUI applied to the prototype. In the preprocessing step, one-hot encoding is applied to the amounts of molasses, nutrients, plant extract, salt, etc. required for microbial culture, and min-max normalization is applied to the pH and the number of microbial cells measured as culture results. In the data structure conversion step, the preprocessed data are converted into a graph structure by connecting the water temperature and the number of microbial cells, and then expressed as an adjacency matrix and attribute information to be used as input to the deep learning network. In the network design step, a graph convolutional network, which is specialized for graph structures, is designed to learn the complex microbial data. The designed network uses a cosine loss function and is trained to minimize the error that occurs during learning. The GUI applied to the prototype shows the target pH (3.8 or less) and the number of cells (10^8 or more) of complex microorganisms in an order suitable for culturing according to the water temperature selected by the user. To evaluate the performance of the proposed microbial incubator, experiments were conducted by authorized testing institutes; the average pH was 3.7 and the number of cells of complex microorganisms was 1.7 × 10^8.
Therefore, the effectiveness of the deep learning structure proposed in this paper for the complex microbial incubator, which applies deep learning prediction results, was demonstrated.
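The preprocessing and graph-conversion steps named in this abstract (one-hot encoding, min-max normalization, adjacency matrix) can be sketched as below. The category levels, sample values, and edge list are hypothetical; the abstract does not specify how nodes are defined or connected, so the undirected edges here are only a stand-in.

```python
def one_hot(value, categories):
    """Encode a categorical level (e.g. an ingredient amount) as a 0/1 vector."""
    return [1 if value == category else 0 for category in categories]


def min_max(values):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def adjacency(n_nodes, edges):
    """Symmetric 0/1 adjacency matrix for an undirected graph."""
    matrix = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


print(one_hot("medium", ["low", "medium", "high"]))  # [0, 1, 0]
print(min_max([3.0, 3.5, 4.0]))                      # [0.0, 0.5, 1.0]
print(adjacency(3, [(0, 1), (1, 2)]))                # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```

The adjacency matrix and the per-node attribute vectors produced this way are the two inputs a graph convolutional network typically expects.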
One of the characteristics of South Korea's economic structure is that it is highly dependent on exports. Thus, many businesses are closely tied to the global economy and the diplomatic situation. In addition, small and medium-sized enterprises (SMEs) specialized in exporting are struggling due to the spread of COVID-19. Therefore, this study aimed to develop a model that forecasts exports for the next year to support SMEs' export strategy and decision making. This study also proposed a strategy for recommending promising export countries for each item based on the forecasting model. We analyzed important variables used in previous studies, such as country-specific, item-specific, and macro-economic variables, and collected them to train our prediction model. Next, exploratory data analysis (EDA) showed that exports, the target variable, have a highly skewed distribution. To deal with this issue and improve predictive performance, we suggest a separated learning method. In a separated learning method, the whole dataset is divided into homogeneous subgroups and a prediction algorithm is applied to each group. Thus, the characteristics of each group can be learned more precisely using different input variables and algorithms. In this study, we divided the dataset into five subgroups based on exports to decrease the skewness of the target variable. After the separation, we found that each group has different characteristics in countries and goods. For example, in Group 1, most of the exporting countries are developing countries and the majority of exported goods are low-value products such as glass and prints. On the other hand, South Korea's major export destinations, such as China, the USA, and Vietnam, are included in Group 4 and Group 5, and most exported goods in these groups are high-value products. We then used LightGBM (LGBM) and the Exponential Moving Average (EMA) for prediction.
Considering the characteristics of each group, models were built using LGBM for Groups 1 to 4 and EMA for Group 5. To evaluate the performance of the model, we compared different model structures and algorithms. The separated learning model performed best among the models compared. After the model was built, we also provided the variable importance of each group using SHAP values to add explainability to our model. Based on the prediction model, we proposed a two-stage recommendation strategy for potential export countries. In the first phase, the BCG matrix was used to find Star and Question Mark markets that are expected to grow rapidly. In the second phase, we calculated scores for each country, and recommendations were made according to the ranking. Using this recommendation framework, potential export countries were selected, and information about those countries was presented for each item. This study has several implications. First, most preceding studies examined a specific situation or country, whereas this study uses various variables and develops a machine learning model for a wide range of countries and items. Second, to our knowledge, it is the first attempt to adopt a separated learning method for export prediction. By separating the dataset into five homogeneous subgroups, we could enhance the predictive performance of the model. In addition, a more detailed explanation of the models by group is provided using SHAP values. Lastly, this study has several practical implications. Some platforms, including KOTRA, provide trade information, but most of them are based on past data, so it is not easy for companies to predict future trends. By using the model and recommendation strategy in this research, trade-related services on each platform can be improved so that companies, including SMEs, can fully use them when making export strategies and decisions.
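Three building blocks named in this abstract, splitting records into subgroups by export value, an EMA forecast, and BCG-matrix screening, can be sketched as follows. The group boundaries, smoothing factor, and growth/share cut-offs are hypothetical placeholders; the abstract does not report the values the authors used.

```python
def assign_group(export_value, boundaries):
    """Return the 1-based subgroup for an export value, given ascending upper bounds."""
    for index, upper in enumerate(boundaries, start=1):
        if export_value < upper:
            return index
    return len(boundaries) + 1


def ema_forecast(series, alpha=0.5):
    """Exponential moving average; the final smoothed value is the next-period forecast."""
    smoothed = series[0]
    for value in series[1:]:
        smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed


def bcg_quadrant(growth, share, growth_cut, share_cut):
    """Classify a market by growth rate and market share (standard BCG quadrants)."""
    if growth >= growth_cut:
        return "Star" if share >= share_cut else "Question Mark"
    return "Cash Cow" if share >= share_cut else "Dog"


bounds = [10, 100, 1000, 10000]  # four cut points give five subgroups
print(assign_group(50, bounds), assign_group(50000, bounds))  # 2 5
print(ema_forecast([100.0, 200.0, 400.0]))                    # 275.0
print(bcg_quadrant(0.2, 0.05, growth_cut=0.1, share_cut=0.1)) # Question Mark
```

In a separated learning setup, each subgroup returned by `assign_group` would get its own model, for example a gradient-boosted model for the lower groups and the EMA forecast for the top group, as the abstract describes.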
The wall shear stress in the vicinity of end-to-end anastomoses under steady flow conditions was measured using a flush-mounted hot-film anemometer (FMHFA) probe. The experimental measurements were in good agreement with numerical results except in flow with low Reynolds numbers. The wall shear stress increased proximal to the anastomosis in flow from the Penrose tubing (simulating an artery) to the PTFE graft. In flow from the PTFE graft to the Penrose tubing, low wall shear stress was observed distal to the anastomosis. Abnormal distributions of wall shear stress in the vicinity of the anastomosis, resulting from the compliance mismatch between the graft and the host artery, might be an important factor in the formation of anastomotic neointimal fibrous hyperplasia (ANFH) and in graft failure. The present study suggests a correlation between regions of low wall shear stress and the development of ANFH in end-to-end anastomoses.
Air pressure decay (APD) rate and ultrafiltration rate (UFR) tests were performed on new and saline-rinsed dialyzers as well as those reused in patients several times. C-DAK 4000 (Cordis Dow) and CF 15-11 (Baxter Travenol) reused dialyzers obtained from the dialysis clinic were used in the present study. The new dialyzers exhibited a relatively flat APD, whereas saline-rinsed and reused dialyzers showed a considerable amount of decay. C-DAK dialyzers had a larger APD(11.70