Performance analysis of Frequent Itemset Mining Technique based on Transaction Weight Constraints (트랜잭션 가중치 기반의 빈발 아이템셋 마이닝 기법의 성능분석)
Journal of Internet Computing and Services, v.16 no.1, pp.67-74, 2015
In recent years, frequent itemset mining that considers the importance of each item has been intensively studied as one of the important issues in the data mining field. According to how they use item importance, itemset mining approaches are classified as follows: weighted frequent itemset mining, frequent itemset mining using transactional weights, and utility itemset mining. In this paper, we perform an empirical analysis of frequent itemset mining algorithms based on transactional weights. These algorithms compute transactional weights from the weight of each item in large databases, and they discover weighted frequent itemsets on the basis of the item frequency and the weight of each transaction. Consequently, database analysis reveals the importance of a given transaction, because the weight of a transaction is higher if it contains many items with high weights. We not only analyze the advantages and disadvantages of the best-known algorithms in the field of frequent itemset mining based on transactional weights but also compare their performance. As a representative of frequent itemset mining using transactional weights, WIS introduces the concept and strategies of transactional weights. In addition, there are several state-of-the-art algorithms, WIT-FWIs, WIT-FWIs-MODIFY, and WIT-FWIs-DIFF, for extracting itemsets with the weight information. To mine weighted frequent itemsets efficiently, these three algorithms use a special lattice-like data structure called the WIT-tree. They do not need an additional database scan after the WIT-tree has been constructed, since each node of the WIT-tree holds item information such as item and transaction IDs.
In particular, the traditional algorithms perform a number of database scans to mine weighted itemsets, whereas the algorithms based on the WIT-tree avoid this overhead by reading the database only once. Additionally, the algorithms generate each new itemset of length N+1 from two different itemsets of length N. To discover new weighted itemsets, WIT-FWIs performs the itemset combination by using the information of the transactions that contain all the itemsets. WIT-FWIs-MODIFY has a unique feature that reduces the operations needed to calculate the frequency of a new itemset. WIT-FWIs-DIFF uses a technique based on the difference of two itemsets. To compare and analyze the performance of the algorithms in various environments, we use real datasets of two types (i.e., dense and sparse) and measure runtime and maximum memory usage. Moreover, a scalability test is conducted to evaluate the stability of each algorithm as the size of the database changes. As a result, WIT-FWIs and WIT-FWIs-MODIFY show the best performance on the dense dataset, and on the sparse dataset WIT-FWIs-DIFF is more efficient than the other algorithms. Compared with the algorithms using the WIT-tree, WIS, which is based on the Apriori technique, has the worst efficiency because on average it requires many more computations than the others.
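The ideas summarized in this abstract (transaction weights derived from item weights, a single database scan that records transaction IDs, and building an itemset of length N+1 from two itemsets of length N) can be illustrated with a minimal Python sketch. It is not the WIS or WIT-FWIs implementation. The abstract gives no formulas, so the sketch assumes that a transaction's weight is the mean of its item weights and that weighted support is the weight of the transactions containing an itemset divided by the total transaction weight. All function names are hypothetical, and a plain dictionary of transaction-ID sets stands in for the WIT-tree.

```python
from itertools import combinations


def transaction_weights(db, item_weights):
    """Weight of each transaction, ASSUMED here to be the mean of its item weights."""
    return {tid: sum(item_weights[i] for i in items) / len(items)
            for tid, items in db.items()}


def weighted_support(tids, tw, total_tw):
    """Share of the total transaction weight carried by the given transactions."""
    return sum(tw[t] for t in tids) / total_tw


def weighted_support_via_diffset(tids_px, tids_py, tw, total_tw):
    """Same value as the support of the intersection, computed from a difference set.

    Subtracts the weight of the transactions that contain PX but not PY, in the
    spirit of the difference technique the abstract attributes to WIT-FWIs-DIFF.
    """
    diff = tids_px - tids_py
    return (sum(tw[t] for t in tids_px) - sum(tw[t] for t in diff)) / total_tw


def mine_weighted_itemsets(db, item_weights, min_ws):
    """Return {itemset tuple: weighted support} for itemsets with support >= min_ws."""
    tw = transaction_weights(db, item_weights)
    total_tw = sum(tw.values())

    # Single database scan: record which transactions contain each item.
    tidsets = {}
    for tid, items in db.items():
        for item in items:
            tidsets.setdefault((item,), set()).add(tid)

    level = {k: v for k, v in tidsets.items()
             if weighted_support(v, tw, total_tw) >= min_ws}
    result = {}
    while level:
        for itemset, tids in level.items():
            result[itemset] = weighted_support(tids, tw, total_tw)
        next_level = {}
        # Join two length-N itemsets that share their first N-1 items.
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:
                tids = level[a] & level[b]  # no further database scan needed
                if tids and weighted_support(tids, tw, total_tw) >= min_ws:
                    next_level[a + (b[-1],)] = tids
        level = next_level
    return result


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    db = {1: ["a", "b", "d"], 2: ["a", "b", "c"], 3: ["b", "c"], 4: ["a", "b"]}
    weights = {"a": 0.5, "b": 1.0, "c": 0.75, "d": 0.25}
    for itemset, ws in sorted(mine_weighted_itemsets(db, weights, 0.5).items()):
        print(itemset, round(ws, 3))
```

Because every transaction weight is positive under this assumption, the weighted support of an itemset can never exceed that of its subsets, which is what lets the sketch prune candidates level by level.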
Purpose: Tissue inhomogeneity, such as lung, affects tumor dose as well as transmission dose in the new concept of on-line dosimetry, which estimates tumor dose from transmission dose using a new algorithm. This study was carried out to confirm the accuracy of the tissue-density correction in tumor dose estimation based on transmission dose. Methods: Cork phantom (CP, density
The study was carried out to investigate the changes in the various chemical components and the microflora during the aging period of Korean native Kochuzang (red pepper soybean paste). Korean native maeju loaves were separated into surface and inner parts. Three kinds of Korean native Kochuzang were prepared from the surface part, the inner part, and ordinary maeju. The selection and identification of high enzyme-producing strains from the microflora and the characteristics of their enzymes were studied. I. The changes in the various chemical components during the aging period of Kochuzang. 1) The pH of the 3 kinds of Kochuzang decreased rapidly for the first 10 days after preparation and gradually until 60 days, but increased slightly over the next 30 days. The pH of the surface part Kochuzang was lower than that of the inner part or ordinary Kochuzang. 2) The total acid content of the 3 kinds of Kochuzang increased gradually until 60 days and slowly decreased thereafter. 3) The total nitrogen content of the 3 kinds of Kochuzang increased gradually up to 60 days and decreased slightly thereafter. 4) The trichloroacetic acid soluble nitrogen in the 3 kinds of Kochuzang increased remarkably for the first 10 days and gradually thereafter. 5) The increase in amino nitrogen content in the 3 kinds of Kochuzang was remarkable for the first 30 days and less remarkable thereafter. 6) The reducing sugar content of the 3 kinds of Kochuzang increased remarkably for the first 50 days and slowly decreased thereafter. II. The changes in microflora during the aging period of Kochuzang. 1) Aerobic bacteria, anaerobic bacteria, and mold in the 3 kinds of Kochuzang increased for the first 30 to 40 days and decreased thereafter. 2) No yeast appeared in the three kinds of Kochuzang during the first 20 days.
Yeasts were found to grow when the pH fell below 5.4 after 30 days. Yeasts in the surface part and ordinary Kochuzang gradually increased, and those in the inner part Kochuzang decreased, as aging progressed. III. The selection and identification of high amylase- and protease-producing strains from the microflora during the aging period of Kochuzang. 1) The strains from the microflora that produced high levels of amylase and protease were identified as Bacillus subtilis-P, Bacillus subtilis-G, Bacillus licheniformis-K, and Aspergillus oryzae-B. 2) The amylase activity of Aspergillus oryzae-B was the highest among the strains, followed in descending order by Bacillus subtilis-P, Bacillus licheniformis-K, and Bacillus subtilis-G. The protease activities of Aspergillus oryzae-B and Bacillus subtilis-P were about the same, followed in descending order by Bacillus licheniformis-K and Bacillus subtilis-G. 3) Amylase activity was inhibited by NaCl more than protease activity was. At 15 percent NaCl, the average NaCl concentration in Kochuzang, amylase activity was inhibited by 45 to 65 percent and protease activity by 40 to 46 percent.
The purpose of this study is to find desirable directions for home economics by examining the requirements and perceptions of parents of high school students who have completed the home economics course. About 600 parents were surveyed in Seoul-Pusan, Ganwon, Ghynggi province, Choongcheong-Gyungsang province, and Cheonla and Jeju provinces; of the 600 responses, only 560 were judged suitable for analysis. The questionnaire included 61 items on requirements for home economics content, one item on the content that should be retained whenever possible, one item on the perception of the aims of home economics, and 11 items on the perception of home economics courses and their management. The collected data were analyzed by frequency, percentage, mean, standard deviation, and t-test using the SAS program. The results are summarized as follows. 1. Agreement was high with boys and girls learning together about the ideas of healthy living and desirable character formation, along with the related knowledge. 2. Among the teaching aims of home economics, the item on scientific principles and knowledge for improving home life scored 15.7% below the average value. 3. Regarding the nature of home economics, respondents perceived it as closely related to real life; regarding the system, they perceived the class periods and content of the home economics field as lacking; and regarding the guiding content, the accomplishment and application qualities were rated higher regardless of sex. 4. The area that should be emphasized most in the subject of home economics is the family area. 5. Among the first-grade content requirements, at the middle-unit level, growth and development scored highest at 4.11, whereas the basic principle and actuality scored lowest at 3.70. 6. Among the second-grade content requirements, at the middle-unit level, young people and consumer life scored highest at 4.09. 7. Among the third-grade content requirements, at the middle-unit level, the choice of future direction and job ethics scored highest at 4.16, and preparing meals and evaluation scored lowest at 3.50.
Recommender systems have become one of the most important technologies in e-commerce. For many consumers, the ultimate reason to shop online is to reduce the effort of searching for information and making purchases. Recommender systems are a key technology for serving these needs. Many past studies of recommender systems have been devoted to developing and improving recommendation algorithms, and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings, such as the cold-start, sparsity, and gray sheep problems. To generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. Therefore, CF cannot produce recommendations for new users who have no evaluations or preference information (cold-start problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. This sparse dataset makes computation for recommendation extremely hard (sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (gray sheep problem). This study proposes a new algorithm that uses Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We use 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and are isolated from them. Therefore, gray sheep can be identified by calculating the degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users.
Then, different similarity measures and recommendation methods are applied to these two datasets. The algorithm in more detail is as follows: Step 1: Convert the initial data, which is a two-mode network (user to item), into a one-mode network (user to user). Step 2: Calculate the degree centrality of each node and separate out the nodes whose degree centrality is lower than a pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: An ordinary CF algorithm is applied to the remaining dataset. Step 4: Since the separated dataset consists of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them. A 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. To test the performance improvement from this new algorithm, an empirical study was conducted using a publicly available dataset, the MovieLens data from the GroupLens research team. We used 100,000 evaluations by 943 users of 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm using the 'Best-N-neighbors' and 'Cosine' similarity methods. The empirical results show that the F measure improved by about 11% on average when the proposed algorithm was used.
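Steps 1, 2, and 4 and the combined F measure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation. The abstract does not say how two users become linked, so the sketch assumes a link exists when they share at least `min_common` liked items. It also assumes degree centrality is normalized by the number of other users and that the final metric is the node-weighted average of the two F measures. All function and parameter names are hypothetical, and the ordinary CF of Step 3 is omitted.

```python
from collections import Counter
from itertools import combinations


def user_graph(likes, min_common=1):
    """Step 1: project user-item data onto a user-user network.

    ASSUMPTION: two users are linked if they share at least min_common items.
    """
    neighbors = {u: set() for u in likes}
    for u, v in combinations(likes, 2):
        if len(likes[u] & likes[v]) >= min_common:
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors


def degree_centrality(neighbors):
    """Number of direct links per user, normalized by the number of other users."""
    n = len(neighbors)
    if n <= 1:
        return {u: 0.0 for u in neighbors}
    return {u: len(nb) / (n - 1) for u, nb in neighbors.items()}


def split_gray_sheep(likes, threshold, min_common=1):
    """Step 2: users whose centrality is below the threshold are treated as gray sheep."""
    centrality = degree_centrality(user_graph(likes, min_common))
    gray = {u for u, c in centrality.items() if c < threshold}
    return gray, set(likes) - gray


def popular_items(likes, user, k=1):
    """Step 4: recommend the most-liked items that the user has not liked yet."""
    counts = Counter(item for items in likes.values() for item in items)
    return [i for i, _ in counts.most_common() if i not in likes[user]][:k]


def combined_f(f_main, n_main, f_gray, n_gray):
    """Node-weighted average of the two F measures (normalization is an assumption)."""
    return (f_main * n_main + f_gray * n_gray) / (n_main + n_gray)


if __name__ == "__main__":
    # Toy preferences, invented for illustration only.
    likes = {
        "u1": {"m1", "m2"},
        "u2": {"m1", "m2", "m3"},
        "u3": {"m2", "m3"},
        "u4": {"m9"},
    }
    gray, others = split_gray_sheep(likes, threshold=0.5)
    print("gray sheep:", sorted(gray))
    for user in sorted(gray):
        print(user, "->", popular_items(likes, user))
```

In the study the threshold is chosen by simulation so that CF accuracy on the remaining users is maximized. The sketch takes it as a plain parameter instead.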