Search | Korea Science

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

Kim, Jeonghun; Kim, Min Yong; Kwon, Ohbyung
- Journal of Intelligence and Information Systems, v.26 no.1, pp.23-45, 2020
Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales, and SNS, and dataset characteristics are correspondingly diverse. To remain competitive, companies need to improve their decision-making capacity using classification algorithms. However, most companies do not have sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm suits the characteristics of a dataset has been a task that requires expertise and effort. This is because the relationship between the characteristics of datasets (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features that reflect the characteristics of multi-class data. Therefore, the purpose of this study is to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. In this study, meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally a measure of market concentration, to replace IR (Imbalanced Ratio). We also developed a new index, the Reverse ReLU Silhouette Score, and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), and Contraceptive Method Choice) were selected. The classes of each dataset were predicted using the classification algorithms selected for the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross-validation. Oversampling from 10% to 100% was applied to each fold, and the meta-features of the dataset were measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The meta-feature HHI proposed in this study has a significant effect on classification performance. (2) The number of variables has a significant effect on classification performance and, unlike the number of classes, its effect is positive. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at the 0.01 significance level. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. In addition, the results of the analysis by classification algorithm were consistent. In the regression analysis by classification algorithm, the number of variables does not have a significant effect for Naïve Bayes, unlike for the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI and Reverse ReLU Silhouette Score) were shown to be significant, and (2) the effects of data characteristics on classification performance were investigated using meta-features. There are also two practical contributions. (1) The results can be used in developing a system that recommends classification algorithms according to the characteristics of datasets. (2) Because the characteristics of data differ, data scientists often test repeatedly, adjusting algorithm parameters to find the optimal algorithm for the situation, and this process wastes considerable hardware, cost, time, and manpower. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The paper consists of an introduction, related research, research model, experiment, and conclusion and discussion.
https://doi.org/10.13088/jiis.2020.26.1.023
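As an illustration of two of the meta-features named in the abstract above, the following is a minimal sketch of HHI and entropy computed over class labels. The abstract does not give the authors' exact formulas, so several things here are assumptions: HHI is taken as the sum of squared class shares on a 0 to 1 scale (the standard market-concentration definition applied to class proportions), entropy is taken as the Shannon entropy of the class distribution in bits, and the function names and toy label lists are hypothetical.

```python
from collections import Counter
from math import log2


def class_shares(labels):
    """Return the proportion of samples in each class (hypothetical helper)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [count / total for count in counts.values()]


def hhi(labels):
    """Sum of squared class shares; assumed form of the HHI meta-feature.

    Ranges from 1/k (k equally sized classes) up to 1.0 (a single class).
    """
    return sum(share * share for share in class_shares(labels))


def entropy(labels):
    """Shannon entropy of the class distribution in bits (assumed definition)."""
    return -sum(share * log2(share) for share in class_shares(labels) if share > 0)


# Toy label sets, not taken from the UCI datasets used in the study.
balanced = ["a", "b", "c", "d"] * 25           # four classes, 25 samples each
skewed = ["a"] * 97 + ["b", "c", "d"]          # one dominant class

print(hhi(balanced), entropy(balanced))
print(hhi(skewed), entropy(skewed))
```

Under these assumptions, a more imbalanced label set gives a higher HHI and a lower entropy, which is why a concentration index can stand in for an imbalance ratio when there are more than two classes.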

A Study of the Images of General Supers and a Department Store in a Local City (지방도시에 입점하고 있는 종합슈퍼와 백화점에 대한 점포이미지 비교 분석)

Kim, Chang-Gon
- Journal of Distribution Science, v.10 no.6, pp.17-26, 2012
Suncheon is a city comprising rural and urban areas, and it has four types of large stores. Studies have shown that there are too many large stores for a local population of just 300,000. Geographically, however, Suncheon is a transportation hub that borders the cities of Gwangyang and Yeosu as well as the counties of Boseong and Gurae, and residents of these areas can reach the stores within an hour's drive. The managers of the four stores therefore regard residents of these areas as valued customers and try to create a differentiated image among them. In this study, 13 store images were used to measure the public's opinions and feelings toward these stores, and the differences were analyzed. The store images measured overall store impression, diversity of products, the quality of products displayed at the store, accessibility, atmosphere, customer service, and so on. These images are evaluated subjectively by each customer and are major factors in their decision to revisit a store. The images are grouped into five main categories comprising 13 sub-categories. Three factor images were extracted from the store images in the five main categories by factor analysis using SPSS Ver. 19. The first factor image was extracted from the images of convenience, atmosphere, and service in the main categories and is called the sub-service factor of the store in this study. Accessibility to the store was classified as a convenience image in the main categories and was extracted as a common factor along with diversity and the price of goods. Such differences are expected to depend on store location, that is, on whether a store is located in a large city or a small local city, and on the nature of the survey respondents. The results show significant differences between the stores' images with regard to accessibility, the price of products, brand image, and lighting/sound image. This study has the following limitations. First, the survey sample was restricted to residents of a small local city that includes rural and urban populations. Differences between store images regarding traffic and accessibility depend on store location, that is, on whether the stores are located in a large or a small city, as well as on the economic situation of these cities. Second, only customers of large-scale stores were included as survey respondents. Relatively large traditional markets are held every five days in local cities, and large-scale stores compete with traditional markets on diversity and the price of goods. Customers of large-scale stores and customers of traditional markets could be expected to hold different store images. In future studies, images of stores in large cities should be compared with images of stores located in small local cities. In addition, customer behavior when buying goods in large-scale stores should be compared with behavior when buying goods in traditional markets.

A Study on the Linkage and Development of the BRM Based National Tasks and the Policy Information Contents (BRM기반 국정과제와 정책정보콘텐츠 연계 및 구축방안에 관한 연구)

Younghee, Noh; Inho, Chang; Hyojung, Sim; Woojung, Kwak
- Journal of the Korean Society for Information Management, v.39 no.4, pp.191-213, 2022
To provide a high-quality policy information service that goes beyond the existing national task service of the national policy information portal (POINT) of the National Library of Korea, Sejong, the policy data needed to implement the new national tasks must be provided effectively. Accordingly, this study seeks a way to link and develop the BRM-based national tasks and policy information contents. To this end, first, the types of national tasks and the contents of each field and area of the government function classification system were analyzed, with a focus on the 120 national tasks of the new administration. Furthermore, by comparing the national tasks of the previous administration with current information, the content that should be reflected in developing national task-related contents was identified. Second, a method for linking and collecting policy information was sought based on an analysis of the current status of policy information and the national information portal. The results are as follows. First, an examination of the first-stage BRM of the national tasks showed that there were 21 tasks for social welfare, 14 for unification and diplomacy, 17 for small and medium-sized businesses in industry and trade, 12 for general public administration, 8 for the economy, taxation and finance, 6 each for culture, sports and tourism, for science and technology, and for education, 5 each for communication and for public order and safety, 4 each for health, for transportation and logistics, and for environment, 3 for agriculture and forestry, 2 each for national defense and for regional development, and 1 for maritime and fisheries, among others. For the new administration, science and technology and IT are evidently important, and this should be considered when developing information services for the core national tasks. Second, to link the database with external organizations, it would be necessary to form a linked operation council, link and collect information on the national tasks, and link and provide national task-related information for POINT.
https://doi.org/10.3743/KOSIM.2022.39.4.191

