• Title/Summary/Keyword: Accuracy improvement

Search Result 2,373, Processing Time 0.026 seconds

Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung;Im, Il
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.137-148
    • /
    • 2014
  • Recommender system has become one of the most important technologies in e-commerce in these days. The ultimate reason to shop online, for many consumers, is to reduce the efforts for information search and purchase. Recommender system is a key technology to serve these needs. Many of the past studies about recommender systems have been devoted to developing and improving recommendation algorithms and collaborative filtering (CF) is known to be the most successful one. Despite its success, however, CF has several shortcomings such as cold-start, sparsity, gray sheep problems. In order to be able to generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. For new users who do not have any evaluations or preference information, therefore, CF cannot come up with recommendations (Cold-star problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. This sparse dataset makes computation for recommendation extremely hard (Sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (Gray sheep problem). This study proposes a new algorithm that utilizes Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We utilize 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and they are isolated from other users. Therefore, gray sheep can be identified by calculating degree centrality of each node. We divide the dataset into two, gray sheep and others, based on the degree centrality of the users. Then, different similarity measures and recommendation methods are applied to these two datasets. More detail algorithm is as follows: Step 1: Convert the initial data which is a two-mode network (user to item) into an one-mode network (user to user). Step 2: Calculate degree centrality of each node and separate those nodes having degree centrality values lower than the pre-set threshold. The threshold value is determined by simulations such that the accuracy of CF for the remaining dataset is maximized. Step 3: Ordinary CF algorithm is applied to the remaining dataset. Step 4: Since the separated dataset consist of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them. A 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to be used as the final performance metric. In order to test performance improvement by this new algorithm, an empirical study was conducted using a publically available dataset - the MovieLens data by GroupLens research team. We used 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm utilizing 'Best-N-neighbors' and 'Cosine' similarity method. The empirical results show that F measure was improved about 11% on average when the proposed algorithm was used

    . Past studies to improve CF performance typically used additional information other than users' evaluations such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it used SNA to separate dataset. This study shows that performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. This study empirically shows that the characteristics of dataset can affect the performance of CF recommender systems. This helps researchers understand factors affecting performance of CF. This study also opens a door for future studies in the area of applying SNA to CF to analyze characteristics of dataset. In practice, this study provides guidelines to improve performance of CF recommender systems with a simple modification.

  • A Study on the Characteristics of Enterprise R&D Capabilities Using Data Mining (데이터마이닝을 활용한 기업 R&D역량 특성에 관한 탐색 연구)

    • Kim, Sang-Gook;Lim, Jung-Sun;Park, Wan
      • Journal of Intelligence and Information Systems
      • /
      • v.27 no.1
      • /
      • pp.1-21
      • /
      • 2021
    • As the global business environment changes, uncertainties in technology development and market needs increase, and competition among companies intensifies, interests and demands for R&D activities of individual companies are increasing. In order to cope with these environmental changes, R&D companies are strengthening R&D investment as one of the means to enhance the qualitative competitiveness of R&D while paying more attention to facility investment. As a result, facilities or R&D investment elements are inevitably a burden for R&D companies to bear future uncertainties. It is true that the management strategy of increasing investment in R&D as a means of enhancing R&D capability is highly uncertain in terms of corporate performance. In this study, the structural factors that influence the R&D capabilities of companies are explored in terms of technology management capabilities, R&D capabilities, and corporate classification attributes by utilizing data mining techniques, and the characteristics these individual factors present according to the level of R&D capabilities are analyzed. This study also showed cluster analysis and experimental results based on evidence data for all domestic R&D companies, and is expected to provide important implications for corporate management strategies to enhance R&D capabilities of individual companies. For each of the three viewpoints, detailed evaluation indexes were composed of 7, 2, and 4, respectively, to quantitatively measure individual levels in the corresponding area. In the case of technology management capability and R&D capability, the sub-item evaluation indexes that are being used by current domestic technology evaluation agencies were referenced, and the final detailed evaluation index was newly constructed in consideration of whether data could be obtained quantitatively. In the case of corporate classification attributes, the most basic corporate classification profile information is considered. In particular, in order to grasp the homogeneity of the R&D competency level, a comprehensive score for each company was given using detailed evaluation indicators of technology management capability and R&D capability, and the competency level was classified into five grades and compared with the cluster analysis results. In order to give the meaning according to the comparative evaluation between the analyzed cluster and the competency level grade, the clusters with high and low trends in R&D competency level were searched for each cluster. Afterwards, characteristics according to detailed evaluation indicators were analyzed in the cluster. Through this method of conducting research, two groups with high R&D competency and one with low level of R&D competency were analyzed, and the remaining two clusters were similar with almost high incidence. As a result, in this study, individual characteristics according to detailed evaluation indexes were analyzed for two clusters with high competency level and one cluster with low competency level. The implications of the results of this study are that the faster the replacement cycle of professional managers who can effectively respond to changes in technology and market demand, the more likely they will contribute to enhancing R&D capabilities. In the case of a private company, it is necessary to increase the intensity of input of R&D capabilities by enhancing the sense of belonging of R&D personnel to the company through conversion to a corporate company, and to provide the accuracy of responsibility and authority through the organization of the team unit. Since the number of technical commercialization achievements and technology certifications are occurring both in the case of contributing to capacity improvement and in case of not, it was confirmed that there is a limit in reviewing it as an important factor for enhancing R&D capacity from the perspective of management. Lastly, the experience of utility model filing was identified as a factor that has an important influence on R&D capability, and it was confirmed the need to provide motivation to encourage utility model filings in order to enhance R&D capability. As such, the results of this study are expected to provide important implications for corporate management strategies to enhance individual companies' R&D capabilities.

    A Study on the Establishment of Acceptable Range for Internal Quality Control of Radioimmunoassay (핵의학 검체검사 내부정도관리 허용범위 설정에 관한 고찰)

    • Young Ji, LEE;So Young, LEE;Sun Ho, LEE
      • The Korean Journal of Nuclear Medicine Technology
      • /
      • v.26 no.2
      • /
      • pp.43-47
      • /
      • 2022
    • Purpose Radioimmunoassay implement quality control by systematizing the internal quality control system for quality assurance of test results. This study aims to contribute to the quality assurance of radioimmunoassay results and to implement systematic quality control by measuring the average CV of internal quality control and external quality control by plenty of institutions for reference when setting the laboratory's own acceptable range. Materials and Methods We measured the average CV of internal quality control and the bounce rate of more than 10.0% for a total of 42 items from October 2020 to December 2021. According to the CV result, we classified and compared the upper group (5.0% or less), the middle group (5.0~10.0%) and the lower group (10.0% or more). The bounce rate of 10.0% or more was compared by classifying the item of five or more institutions into tumor markers, thyroid hormones and other hormones. The average CV was measured by the overall average and standard deviation of the external quality control results for 28 items from the first quarter to the fourth quarter of 2021. In addition, the average CV was measured by the overall average and standard deviation of the proficiency results between institutions for 13 items in the first half and the second half of 2021. The average CV of internal quality control and external quality control was compared by item so we compared and analyzed the items that implement well to quality control and the items that require attention to quality control. Results As a result of measuring the precision average of internal quality control for 42 items of six institutions, the top group (5.0% or less) are Ferritin, HGH, SHBG, and 25-OH-VitD, while the bottom group (≤10.0%) are cortisol, ATA, AMA, renin, and estradiol. When comparing more than 10.0% bounce rate of CV for tumor markers, CA-125 (6.7%), CA-19-9 (9.8%) implemented well, while SCC-Ag (24.3%), CA-15-3 (26.7%) were among the items that require attention to control. As a result of comparing the bounce rate of more than 10.0% of CV for thyroid hormones examination, free T4 (2.1%), T3 (9.3%) showed excellent performance and AMA (39.6%), ATA (51.6%) required attention to control. When comparing the bounce rate of 10.0% or more of CV for other hormones, IGF-1 (8.8%), FSH (9.1%), prolactin (9.2%) showed excellent performance, however estradiol (37.3%), testosterone (37.7%), cortisol (44.4%) required attention to control. As a result of measuring the average CV of the whole institutions participating at external quality control for 28 items, HGH and SCC-Ag were included in the top group (≤10.0%), however ATA, estradiol, TSI, and thyroglobulin included in bottom group (≥30.0%). Conclusion As a result of evaluating 42 items of six institutions, the average CV was 3.7~12.2% showing a 3.3 times difference between the upper group and the lower group. Cortisol, ATA, AMA, Renin and estradiol tests with high CV will require continuous improvement activities to improve precision. In addition, we measured and compared the overall average CV of the internal quality control, the external quality control and the proficiency between institutions participating of six institutions for 41 items excluding HBs-Ab. As a result, ATA, AMA, Renin and estradiol belong to the same subgroup so we require attention to control and consider setting a higher acceptable range. It is recommended to set and control the acceptable range standard of internal quality control CV in consideration of many things in the laboratory due to the different reagents and instruments, and the results vary depending on the test's proficiency and quality control materials. It is thought that the accuracy and reliability of radioimmunoassay results can be improved if systematic quality control is implemented based on the set acceptable range.


    (34141) Korea Institute of Science and Technology Information, 245, Daehak-ro, Yuseong-gu, Daejeon
    Copyright (C) KISTI. All Rights Reserved.