Search | Korea Science

A Topic Modeling-based Recommender System Considering Changes in User Preferences (고객 선호 변화를 고려한 토픽 모델링 기반 추천 시스템)

Kang, So Young;Kim, Jae Kyeong;Choi, Il Young;Kang, Chang Dong
- Journal of Intelligence and Information Systems / v.26 no.2 / pp.43-56 / 2020
Recommender systems help users make the best choice among various options. They play an especially important role on internet sites, where digital information is generated in vast quantities every second. Many studies on recommender systems have focused on accurate recommendation. However, several problems must be overcome for a recommender system to be commercially successful. First, recommender systems lack transparency: users cannot know why products are recommended. Second, recommender systems cannot immediately reflect changes in user preferences: although a user's preference for products changes over time, the recommender system must rebuild its model to reflect the new preference. Therefore, in this study, we propose a recommendation methodology that uses topic modeling and sequential association rule mining on review data to solve these problems. Product reviews provide useful information for recommendation because they include not only product ratings but also varied content such as user experiences and emotional states, so reviews imply user preferences for products. Topic modeling is therefore useful for explaining why items are recommended to users, and sequential association rule mining is useful for identifying changes in user preferences. The proposed methodology is divided into two phases. The first phase creates user profiles based on topic modeling: topics are extracted from user reviews of products, and a user profile over topics is created. The second phase recommends products using sequential rules that appear in users' buying behaviors over time. The buying behaviors are derived from changes in each user's topics.
A collaborative filtering-based recommendation system was developed as a benchmark, and we compared the performance of the proposed methodology with that of the benchmark using Amazon's review dataset. Accuracy, recall, precision, and F1 were used as evaluation metrics. For topic modeling, collapsed Gibbs sampling was conducted, and 15 topics were extracted. Among the main topics, topic 1, topic 3, topic 4, topic 7, topic 9, topic 13, and topic 14 are related to "comedy shows", "high-teen drama series", "crime investigation drama", "horror theme", "British drama", "medical drama", and "science fiction drama", respectively. In the comparative analysis, the proposed methodology outperformed the collaborative filtering-based recommendation system. The results show that the period just prior to the recommendation is very important for inferring changes in user preference. Therefore, the proposed methodology can both secure the transparency of the recommender system and reflect user preferences that change over time. However, the proposed methodology has some limitations. It cannot recommend products precisely if the number of products included in a topic is large. In addition, the number of sequential patterns is small because the number of topics is small. Future research needs to address these limitations.
https://doi.org/10.13088/jiis.2020.26.2.043
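The abstract above names recall, precision, and F1 as evaluation metrics but does not say how they were computed. As a rough sketch only, one common set-based definition compares a recommendation list against the items a user actually bought. The function name `precision_recall_f1` and the sample item IDs are hypothetical and not taken from the paper.

```python
def precision_recall_f1(recommended, relevant):
    """Set-based precision, recall, and F1 for one recommendation list.

    recommended: items the system suggested (assumed input format)
    relevant:    items the user actually chose (assumed input format)
    """
    rec = set(recommended)
    rel = set(relevant)
    hits = len(rec & rel)  # recommended items that were actually relevant

    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(rel) if rel else 0.0
    # F1 is the harmonic mean of precision and recall; define it as 0 with no hits.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical item IDs for illustration.
    p, r, f = precision_recall_f1(["a", "b", "c", "d"], ["b", "d", "e"])
    print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Averaging these values over all test users is one way to obtain system-level scores; whether the authors averaged per user or pooled all recommendations is not stated in the abstract.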

Development of User Based Recommender System using Social Network for u-Healthcare (사회 네트워크를 이용한 사용자 기반 유헬스케어 서비스 추천 시스템 개발)

Kim, Hyea-Kyeong;Choi, Il-Young;Ha, Ki-Mok;Kim, Jae-Kyeong
- Journal of Intelligence and Information Systems / v.16 no.3 / pp.181-199 / 2010
With the rapid aging of the population and strong interest in health, the demand for new healthcare services is increasing. Until now, healthcare services have provided after-the-fact treatment face to face. However, according to related research, proactive treatment is more effective for preventing diseases. In particular, existing healthcare services are limited in preventing and managing metabolic syndrome, a lifestyle disease, because the cause of metabolic syndrome is related to life habits. With the advent of ubiquitous technology, patients with metabolic syndrome can improve life habits such as poor eating habits and physical inactivity without the constraints of time and space through u-healthcare services. Therefore, much research on u-healthcare services focuses on providing personalized healthcare services for preventing and managing metabolic syndrome. For example, Kim et al. (2010) proposed a healthcare model that provides customized calories and rates of nutrition factors by analyzing the user's food preferences. Lee et al. (2010) suggested a customized diet recommendation service that considers basic information, vital signs, family history of diseases, and food preferences to prevent and manage coronary heart disease. Kim and Han (2004) demonstrated that web-based nutrition counseling affects the food intake and lipids of patients with hyperlipidemia. However, existing research on u-healthcare services focuses on providing predefined, one-way services, so users tend to lose interest in improving their life habits easily. To solve this problem, this research suggests a u-healthcare recommender system based on the collaborative filtering principle and social networks.
This research follows the principle of collaborative filtering but preserves local networks (each consisting of a small group of similar neighbors) for target users in order to recommend context-aware healthcare services. Our research consists of the following five steps. In the first step, a user profile is created from the usage history data on life habit improvement. A set of users known as neighbors is then formed according to the degree of similarity between users, which is calculated with the Pearson correlation coefficient. In the second step, the target user obtains service information from his/her neighbors. In the third step, a top-N recommendation list of services is generated for the target user. In making the list, we use multi-filtering based on the user's psychological context information and body mass index (BMI) information for detailed recommendation. In the fourth step, the personal information, which is the service usage history, is updated when the target user uses a recommended service. In the final step, the social network is re-formed to continually provide qualified recommendations. For example, neighbors may be excluded from the social network if the target user does not like the recommendation list received from them. That is, this step updates each user's neighbors locally, so that up-to-date local neighbors are always maintained to give context-aware recommendations in real time. The characteristics of our research are as follows. First, we develop a u-healthcare recommender system for improving life habits such as poor eating habits and physical inactivity. Second, the proposed recommender system uses autonomous collaboration, which helps prevent users from dropping out and losing interest in improving their life habits. Third, the re-formation of the social network is automated to maintain the quality of recommendation.
Finally, this research has implemented a mobile prototype system using Java and Microsoft Access 2007 to recommend the prescribed foods and exercises for chronic disease prevention, which are provided by A university medical center. This research intends to prevent diseases such as chronic illnesses and to improve users' lifestyles by providing context-aware and personalized food and exercise services with the help of similar users' experience and knowledge. We expect that users of this system can improve their life habits with the help of a handheld smartphone, because the system uses autonomous collaboration to arouse interest in healthcare.
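The neighbor-formation step described in this abstract (similarity by Pearson correlation coefficient) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: the profile format (a dict mapping service name to a usage score), the function names `pearson` and `nearest_neighbors`, and the sample data are all hypothetical. The multi-filtering by psychological context and BMI is not shown.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over the services both users have scores for.

    a, b: dicts mapping service name -> usage score (assumed profile format).
    Returns 0.0 when the correlation is undefined (too few common services
    or zero variance).
    """
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return 0.0
    xs = [a[k] for k in common]
    ys = [b[k] for k in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def nearest_neighbors(target, profiles, k=2):
    """Return the k users most similar to `target` as (user, similarity) pairs."""
    scores = [
        (other, pearson(profiles[target], profile))
        for other, profile in profiles.items()
        if other != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


if __name__ == "__main__":
    # Hypothetical usage-history profiles.
    profiles = {
        "u1": {"walk": 5, "diet": 3, "yoga": 1},
        "u2": {"walk": 4, "diet": 2, "yoga": 0},
        "u3": {"walk": 1, "diet": 3, "yoga": 5},
    }
    print(nearest_neighbors("u1", profiles, k=1))
```

Re-running `nearest_neighbors` after each profile update is one simple way to mimic the local neighbor refresh in the final step; the abstract does not specify how the authors trigger or bound that update.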

The Relationship between Hair Zinc and Lead Levels and Clinical Features of Attention-Deficit Hyperactivity Disorder

Shin, Dong-Won;Kim, Eun-Ji;Oh, Kang-Seob;Shin, Young-Chul;Lim, Se-Won
- Journal of the Korean Academy of Child and Adolescent Psychiatry / v.25 no.1 / pp.28-36 / 2014
Objectives : The goal of this study was to examine the association between zinc and lead level and symptoms of attention-deficit hyperactivity disorder (ADHD) among Korean children. Methods : A total of 89 clinic-referred children participated in the study (ADHD group=45, control group=44). The participants were 5-15 years old, and were mainly from urban areas of Seoul, Korea. ADHD was diagnosed using the Kiddie-Schedule for Affective Disorders and Schizophrenia-Present and Lifetime Version. We excluded children with a comorbid psychiatric disorder, medical illness requiring medication, or a prior history of taking ADHD medication. In order to evaluate the severity of ADHD symptoms, parents' Korean ADHD Rating Scale (K-ARS) was used. The ADHD diagnostic system (ADS) was used for evaluation of the severity of inattention and impulsivity. All participants completed the intelligence test and hair mineral analysis. Multiple regression analysis was used to examine the effect of hair zinc and lead levels on the K-ARS and ADS. We measured the predictive ability of the zinc and lead levels using logistic regression analysis. Results : The lead level explained the score for omission errors, commission errors, and response time SD in visual ADS in the ADHD group (adjusted $R^2$=.243, p<.01, adjusted $R^2$=.362, p<.01, and adjusted $R^2$=.275, p<.01), the score for omission errors of auditory ADS in ADHD group (adjusted $R^2$=.407, p<.01) and the entire group (adjusted $R^2$=.292, p<.01). Zinc was significantly explanatory for the K-ARS scores for the entire group (adjusted $R^2$=.248, p<.001) and the ADHD group (adjusted $R^2$=.247, p<.05). Conclusion : These findings suggest a possible role of zinc and lead in ADHD. Lead concentration in hair samples affected the ADS scores, and this was more prominent in children with ADHD. Children with ADHD had a lower zinc concentration in their hair, and the zinc concentration in hair showed negative correlation with the K-ARS score.
https://doi.org/10.5765/jkacap.2014.25.1.28

A Study on the Improvement Scheme of University's Software Education

Lee, Won Joo
- Journal of the Korea Society of Computer and Information / v.25 no.3 / pp.243-250 / 2020
In this paper, we propose an effective software education scheme for universities. The key idea of the scheme is to analyze the software curricula of the QS World University Rankings top 10 universities, SW-oriented universities, and major regional national universities. Based on the results, we propose five improvements for effective SW education at universities. First, adaptability to industry should be enhanced by developing courses based on SW developer job analysis during curriculum development. Second, it is necessary to strengthen the curriculum on the core technologies of the 4th industrial revolution (cloud computing, big data, virtual/augmented reality, Internet of Things, etc.) and integrate them with various fields such as medical, bio, sensor, human, and cognitive science. Third, after basic syntax education, programming language education should be included in software convergence courses so that students implement projects in various fields. In addition, the curriculum for training system programming developers and back-end developers should be strengthened rather than that for application program developers. Fourth, universities should offer opportunities to participate in industrial projects by reinforcing courses such as capstone design and comprehensive design, which enable product-based self-directed learning. Fifth, it is necessary to develop university-specific curricula based on local industry by reinforcing internships or industry-academic programs through which students can acquire skills in local industry fields.
https://doi.org/10.9708/jksci.2020.25.03.243

Discovery of Market Convergence Opportunity Combining Text Mining and Social Network Analysis: Evidence from Large-Scale Product Databases (B2B 전자상거래 정보를 활용한 시장 융합 기회 발굴 방법론)

Kim, Ji-Eun;Hyun, Yoonjin;Choi, Yun-Jeong
- Journal of Intelligence and Information Systems / v.22 no.4 / pp.87-107 / 2016
Understanding market convergence has become essential for small and mid-size enterprises. Identifying convergence items among heterogeneous markets could lead to product innovation and successful market introduction. Previous research has two limitations. First, traditional studies focusing on patent databases are suitable for detecting technology convergence but fail to recognize market demand. Second, most studies concentrate on identifying relationships between existing products or technologies. This study presents a platform for identifying market convergence opportunities using product databases from a global B2B marketplace. We also attempt to identify convergence opportunities across different industries by applying structural hole theory. This paper shows the mechanisms for discovering market convergence: attribute extraction of products and services using text mining, association analysis among attributes, and network analysis based on structural holes. To discover market demand, we analyzed 240,002 e-catalogs from January 2013 to July 2016.
https://doi.org/10.13088/jiis.2016.22.4.087

Privacy-Preserving Language Model Fine-Tuning Using Offsite Tuning (프라이버시 보호를 위한 오프사이트 튜닝 기반 언어모델 미세 조정 방법론)

Jinmyung Jeong;Namgyu Kim
- Journal of Intelligence and Information Systems / v.29 no.4 / pp.165-184 / 2023
Recently, deep learning analysis of unstructured text data using language models such as Google's BERT and OpenAI's GPT has shown remarkable results in various applications. Most language models learn generalized linguistic information from pre-training data and then update their weights for downstream tasks through a fine-tuning process. However, concerns have been raised that privacy may be violated in the process of using these language models: data privacy may be violated when the data owner provides large amounts of data to the model owner for fine-tuning. Conversely, when the model owner discloses the entire model to the data owner, the structure and weights of the model are disclosed, which may violate the privacy of the model. The concept of offsite tuning has recently been proposed to fine-tune language models while protecting privacy in such situations. However, that study does not provide a concrete way to apply the proposed methodology to text classification models. In this study, we propose a concrete method that applies offsite tuning with an additional classifier to protect the privacy of both the model and the data when performing multi-class classification fine-tuning on Korean documents. To evaluate the performance of the proposed methodology, we conducted experiments on about 200,000 Korean documents from five major fields (ICT, electrical, electronic, mechanical, and medical) provided by AIHub, and found that the proposed plug-in model outperforms the zero-shot model and the offsite model in terms of classification accuracy.
https://doi.org/10.13088/jiis.2023.29.4.165

Analysis of News Agenda Using Text mining and Semantic Network Analysis: Focused on COVID-19 Emotions (텍스트 마이닝과 의미 네트워크 분석을 활용한 뉴스 의제 분석: 코로나 19 관련 감정을 중심으로)

Yoo, So-yeon;Lim, Gyoo-gun
- Journal of Intelligence and Information Systems / v.27 no.1 / pp.47-64 / 2021
The global spread of COVID-19 has not only affected many parts of our daily life but also had a huge impact on many areas, including the economy and society. As the number of confirmed cases and deaths increases, medical staff and the public are said to be experiencing psychological problems such as anxiety, depression, and stress. The collective tragedy that accompanies the epidemic raises fear and anxiety, which are known to cause enormous disruptions to the behavior and psychological well-being of many people. Long-term negative emotions can reduce people's immunity and destroy their physical balance, so it is essential to understand people's psychological state in relation to COVID-19. This study suggests a method of monitoring media news at a time when the prolonged COVID-19 situation requires striving for psychological as well as physical quarantine. It also presents how a simpler method of analyzing social media networks applies to such cases. The aim of this study is to assist health policymakers in fast and complex decision-making processes. News plays a major role in setting the policy agenda. Among the various major media, news headlines are considered important in the field of communication science as a summary of the core content that the media wants to convey to its audience. The news data used in this study were collected using "Bigkinds", a service built by integrating big data technology. From the collected news data, keywords were classified through text mining, and the relationships between words were visualized through semantic network analysis of the keywords. Using the KrKwic program, a Korean semantic network analysis tool, text mining was performed and word frequencies were calculated to identify keywords easily.
The frequency of words appearing in the keywords of articles related to COVID-19 emotions was checked and visualized as a word cloud. 'China', 'anxiety', 'situation', 'mind', 'social', and 'health' appeared frequently in relation to COVID-19 emotions. In addition, UCINET, a specialized social network analysis program, was used for connection centrality and cluster analysis, and the graph was visualized using NetDraw. Analysis of the connection centrality showed that the most central keywords in the keyword network were 'psychology', 'COVID-19', 'blue', and 'anxiety'. The co-occurrence frequency network of the keywords appearing in the news headlines was visualized as a graph. The thickness of a line on the graph is proportional to the co-occurrence frequency, so two words that frequently appear together are connected by a thick line. The 'COVID-blue' pair is displayed with the thickest line, and the 'COVID-emotion' and 'COVID-anxiety' pairs are displayed with relatively thick lines. 'Blue' in relation to COVID-19 is a word that means depression, and it was confirmed that COVID-19 and depression are keywords that deserve attention now. The research methodology used in this study has the advantage of measuring social phenomena and changes quickly while reducing costs. By analyzing news headlines, we were able to identify people's feelings and perceptions on issues related to COVID-19 depression and to identify the main agendas to be analyzed by deriving important keywords. By presenting and visualizing the subjects and important keywords related to COVID-19 emotions at a glance, the study can provide medical policy managers with a variety of perspectives when identifying and researching the phenomenon in question.
The results are expected to serve as basic data for support, treatment, and service development for psychological quarantine issues related to COVID-19.
https://doi.org/10.13088/jiis.2021.27.1.047

Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being (주관적 웰빙 상태 측정을 위한 비정형 데이터의 상황기반 긍부정성 분석 방법)

Choi, Sukjae;Song, Yeongeun;Kwon, Ohbyung
- Journal of Intelligence and Information Systems / v.22 no.1 / pp.83-105 / 2016
Measuring an individual's subjective wellbeing in an accurate, unobtrusive, and cost-effective manner is a core success factor of the wellbeing support system, which is a type of medical IT service. However, measurements with a self-report questionnaire and wearable sensors are cost-intensive and obtrusive when the wellbeing support system should be running in real-time, despite being very accurate. Recently, reasoning the state of subjective wellbeing with conventional sentiment analysis and unstructured data has been proposed as an alternative to resolve the drawbacks of the self-report questionnaire and wearable sensors. However, this approach does not consider contextual polarity, which results in lower measurement accuracy. Moreover, there is no sentimental word net or ontology for the subjective wellbeing area. Hence, this paper proposes a method to extract keywords and their contextual polarity representing the subjective wellbeing state from the unstructured text in online websites in order to improve the reasoning accuracy of the sentiment analysis. The proposed method is as follows. First, a set of general sentimental words is proposed. SentiWordNet was adopted; this is the most widely used dictionary and contains about 100,000 words such as nouns, verbs, adjectives, and adverbs with polarities from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text. A survey was conducted to prepare a learning dataset that includes an individual's opinion and the level of self-report wellness, such as stress and depression. The participants were asked to respond with their feelings about online news on two topics. Next, three data sources were extracted from the SWB corpora: demographic information, psychographic information, and the structural characteristics of the text (e.g., the number of words used in the text, simple statistics on the special characters used). 
These were considered to adjust the level of a specific SWB. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate the SWB of an individual based on the text written by the individual. The experimental results suggested that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improved the estimation accuracy compared to conventional sentiment analysis methods incorporating SentiWordNet. Even though literature is available on Korean sentiment analysis, such studies used only a limited set of sentimental words. Due to the small number of words, many sentences are overlooked and ignored when estimating the level of sentiment. However, the proposed method can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a specific type of senti-word dictionary containing contextual polarity needs to be constructed along with a dictionary based on common sense such as SenticNet. These efforts will enrich and enlarge the application area of sentic computing. The study is helpful to practitioners and managers of wellness services in that a couple of characteristics of unstructured text have been identified for improving SWB measurement. Consistent with the literature, the results showed that gender and age affect the SWB state when individuals are exposed to an identical cue from the online text. In addition, the length of the textual response and the usage pattern of special characters were found to indicate the individual's SWB. These findings imply that better SWB measurement should involve collecting the textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automated identification of contextual polarity in order to enlarge the vocabulary in a cost-effective manner.
https://doi.org/10.13088/jiis.2016.22.1.083

A Regression-Model-based Method for Combining Interestingness Measures of Association Rule Mining (연관상품 추천을 위한 회귀분석모형 기반 연관 규칙 척도 결합기법)

Lee, Dongwon
- Journal of Intelligence and Information Systems / v.23 no.1 / pp.127-141 / 2017
Advances in Internet technologies and the proliferation of mobile devices have enabled consumers to access a wide range of goods and services, but with the adverse effect that consumers have a hard time finding items that suit them even when they devote much time to searching. Accordingly, businesses are using recommender systems to give consumers tools for finding the desired items more easily. Association Rule Mining (ARM) is advantageous to recommender systems in that it provides an intuitive rule form with interestingness measures (support, confidence, and lift) describing the relationship between items. Given an item, its relevant items can be distinguished with the help of the measures that show the strength of the relationship between items. Based on that strength, the most pertinent items can be chosen and displayed on the given item's web page. However, the diversity of the measures can make it unclear which items are more recommendable. Given two rules, for example, one rule's support and confidence may not both be superior to the other rule's. Such disagreement among the measures in ranking one rule above another may cause difficulty in selecting proper items for recommendation. In addition, in an online environment where a web page or mobile screen can show only a limited number of recommendations that attract consumer interest, the prudent selection of items for the recommendation list is very important. The exposure of items of little interest may lead consumers to ignore the recommendations, and such consumers may then stop paying attention to other forms of marketing activities. Therefore, the measures should be aligned with the probability that consumers accept recommendations. For this reason, this study proposes a model-based approach that combines those measures into one unified measure that can consistently determine the ranking of recommended items.
A regression model was designed to describe how well the measures (independent variables: support, confidence, and lift) explain consumers' acceptance of recommendations (dependent variable: hit rate of recommended items). The model is intuitive to understand and easy to use in that the equation consists of the commonly used ARM measures and can be used to estimate hit rates. An experiment using transaction data from one of Korea's largest online shopping malls was conducted to show that the proposed model can improve the hit rates of recommendations. From the top of the list to 13th place, recommended items in the higher rankings from the proposed model show higher hit rates than those from the competing model. The result shows that the proposed model's performance is superior to the competing model's in an online recommendation environment. On a web page, consumers are shown around ten recommendations, a range in which the proposed model outperforms. Moreover, a mobile device cannot show many items simultaneously due to its limited screen size. Therefore, the result shows that the newly devised recommendation technique is suitable for mobile recommender systems. While this study covers cross-selling in online shopping malls that handle merchandise, the proposed method can be expected to apply in various situations where association rules apply. For example, the model can be applied to medical diagnostic systems that predict candidate diseases from a patient's symptoms. To increase the efficiency of the model, additional variables will need to be considered in future studies. For example, price can be a good candidate for an explanatory variable because it has a major impact on consumer purchase decisions. If the prices of recommended items are much higher than those of the items in which a consumer is interested, the consumer may hesitate to accept the recommendations.
https://doi.org/10.13088/jiis.2017.23.1.127
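For readers unfamiliar with the three interestingness measures named in this abstract, the sketch below computes support, confidence, and lift for a single rule using their standard definitions. It does not reproduce the paper's regression model, whose coefficients are not given in the abstract. The function name `rule_measures` and the sample baskets are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent.

    transactions: iterable of item collections (one per basket)
    antecedent, consequent: collections of items
    """
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    if n == 0:
        raise ValueError("transactions must not be empty")
    a = set(antecedent)
    c = set(consequent)

    count_a = sum(1 for t in baskets if a <= t)          # baskets containing A
    count_c = sum(1 for t in baskets if c <= t)          # baskets containing C
    count_ac = sum(1 for t in baskets if (a | c) <= t)   # baskets containing both

    support = count_ac / n                                # P(A and C)
    confidence = count_ac / count_a if count_a else 0.0   # P(C | A)
    # lift = P(C | A) / P(C); values above 1 suggest a positive association.
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


if __name__ == "__main__":
    # Hypothetical shopping baskets.
    baskets = [
        {"milk", "bread"},
        {"milk", "bread", "eggs"},
        {"milk"},
        {"bread"},
    ]
    s, conf, lift = rule_measures(baskets, {"milk"}, {"bread"})
    print(f"support={s:.3f} confidence={conf:.3f} lift={lift:.3f}")
```

As the abstract notes, two rules can rank differently under these three measures, which is the motivation for combining them into a single score.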

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
- Journal of Intelligence and Information Systems / v.26 no.1 / pp.23-45 / 2020
Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales sites, and SNS, and dataset characteristics are correspondingly diverse. To secure their competitiveness, companies need to improve decision-making capacity using classification algorithms. However, most companies do not have sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm is appropriate for the characteristics of a dataset has been a task that requires expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features reflecting the characteristics of multi-class datasets. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally a market concentration index, to replace the imbalance ratio (IR). We also added a newly developed index, the Reverse ReLU Silhouette Score, to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), and Contraceptive Method Choice) were selected. The classes of each dataset were predicted using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross-validation.
Oversampling from 10% to 100% was applied to each fold, and the meta-features of the dataset were measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The HHI meta-feature proposed in this study was significant for classification performance. (2) The number of variables has a significant effect on classification performance and, unlike the number of classes, the effect is positive. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at a significance level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by classification algorithm were consistent. In the regression analysis by classification algorithm, the number of variables does not have a significant effect for the Naïve Bayes algorithm, unlike for the other classification algorithms. This study has two theoretical contributions: (1) two new meta-features (HHI and Reverse ReLU Silhouette Score) were shown to be significant; (2) the effects of data characteristics on classification performance were investigated using meta-features. The practical contributions are as follows. (1) The results can be utilized in developing a classification algorithm recommendation system that takes dataset characteristics into account.
(2) Because data characteristics differ, many data scientists repeatedly test and adjust algorithm parameters to find the optimal algorithm for a given situation, a process that wastes hardware, cost, time, and manpower. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The paper consists of an introduction, related research, the research model, experiments, and a conclusion and discussion.
https://doi.org/10.13088/jiis.2020.26.1.023
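To illustrate how the Herfindahl-Hirschman Index mentioned in this abstract can describe class imbalance, the sketch below applies the standard HHI definition (sum of squared shares) to class proportions. This is an assumption about usage: the abstract does not state the scale or normalization the authors applied, and the function name `class_hhi` is hypothetical.

```python
from collections import Counter


def class_hhi(labels):
    """Herfindahl-Hirschman Index of a class distribution.

    Uses class proportions in [0, 1], so a perfectly balanced dataset with
    k classes gives 1/k and a single-class dataset gives 1.0. The 0-1 scale
    is an assumption; market analyses often use percentages instead.
    """
    total = len(labels)
    if total == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return sum((count / total) ** 2 for count in counts.values())


if __name__ == "__main__":
    balanced = ["a", "b", "c", "d"]
    skewed = ["a"] * 9 + ["b"]
    print(class_hhi(balanced))  # lower value: classes evenly represented
    print(class_hhi(skewed))    # higher value: one class dominates
```

Unlike an imbalance ratio based only on the largest and smallest classes, this index reflects the shares of all classes, which may be why it suits multi-class datasets.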
