Search | Korea Science

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
- Journal of Intelligence and Information Systems, v.19 no.3, pp.1-23, 2013
To discover significant social issues that modern society urgently needs to solve, such as unemployment, economic crisis, and social welfare, researchers usually collect opinions from professional experts and scholars through online or offline surveys. However, this method is not always effective. Because of the expense, a large number of survey replies is seldom gathered. In some cases it is also hard to find professionals who deal with a specific social issue. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach totally different conclusions about the same social issue because each has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which of them are really important. To overcome the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects keywords representing social issues and problems from about 1.3 million news articles published by about 10 major domestic news outlets in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting news articles and extracting their texts, (2) identifying only news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic.
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 as "Unemployment Problem". In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop a matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. Specifically, given a set of text documents, we segment each text document into paragraphs. Meanwhile, using LDA, we extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to the topic that it best matches. As a result, each topic has several best-matched paragraphs. For instance, suppose there are a topic (e.g., Unemployment Problem) and its best-matched paragraph (e.g., "Up to 300 workers lost their jobs in XXX company at Seoul"). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly.
Through this prototype system, we have detected various social issues appearing in our society and have also shown the effectiveness of our proposed methods in our experimental results. Our proof-of-concept system is available at http://dslab.snu.ac.kr/demo.html.
https://doi.org/10.13088/jiis.2013.19.3.001
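The abstract above does not give the matching formula, so the following is only a minimal sketch of one way to score a paragraph against a topic from (i) the topic terms and (ii) their probabilities. It assumes a unigram log-likelihood with a small floor probability for words outside the topic's term list. The topic tables, the `floor` value, and the function names are all hypothetical and are not taken from the paper.

```python
import math
import re

# Hypothetical topics: label -> {term: probability}, as an LDA-style model
# plus human labeling might produce. The values are illustrative only.
TOPICS = {
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    "Social Welfare": {"welfare": 0.5, "pension": 0.3, "care": 0.2},
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def log_likelihood(paragraph, topic_terms, floor=1e-6):
    """Unigram log-probability of a paragraph under one topic.

    Words missing from the topic's term list receive `floor` (an assumption
    made here so that unseen words do not zero out the score).
    """
    return sum(math.log(topic_terms.get(tok, floor)) for tok in tokenize(paragraph))


def best_topic(paragraph, topics):
    """Return the label of the topic under which the paragraph scores highest."""
    return max(topics, key=lambda label: log_likelihood(paragraph, topics[label]))


def group_by_topic(paragraphs, topics):
    """Assign every paragraph to its best-matching topic."""
    groups = {label: [] for label in topics}
    for paragraph in paragraphs:
        groups[best_topic(paragraph, topics)].append(paragraph)
    return groups
```

Because all topics are compared on the same paragraph, paragraph length does not bias the assignment. Ranking paragraphs of different lengths within one topic would need a length normalization, which this sketch leaves out.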

The Effects of Evaluation Attributes of Cultural Tourism Festivals on Satisfaction and Behavioral Intention (문화관광축제 방문객의 평가속성 만족과 행동의도에 관한 연구 - 2006 광주김치대축제를 중심으로 -)

Kim, Jung-Hoon
- Journal of Global Scholars of Marketing Science, v.17 no.2, pp.55-73, 2007
Festivals are an indispensable feature of cultural tourism (Formica & Uysal, 1998). Cultural tourism festivals are increasingly being used as instruments for promoting tourism and boosting the regional economy, so much research on festivals has been undertaken from a variety of perspectives. Plans to revisit a particular festival have been viewed as an important research topic both in academia and in the tourism industry. Festivals have therefore frequently been labeled as cultural events. Cultural tourism festivals have become a crucial component of the attractiveness of tourism destinations (Prentice, 2001). As a result, a considerable number of tourist studies have been carried out on diverse cultural tourism festivals (Backman et al., 1995; Crompton & Mckay, 1997; Park, 1998; Clawson & Knetch, 1996). Much of the previous literature empirically shows a close link between tourist satisfaction and behavioral intention at festivals. The main objective of this study is to investigate the effects of evaluation attributes of cultural tourism festivals on satisfaction and behavioral intention. To accomplish this objective, evaluation items of cultural tourism festivals were identified through a literature review and an empirical study. Using a varimax rotation with Kaiser normalization, the research obtained four factors from the 18 evaluation attributes of cultural tourism festivals. Some empirical studies have examined the relationship between behavioral intention and actual behavior. To understand the relationship between tourist satisfaction and behavioral intention, this study proposes five hypotheses and a hypothesized model. The analysis is based on primary data collected from visitors who participated in the '2006 Gwangju Kimchi Festival'. In total, 700 self-administered questionnaires were distributed and 561 usable questionnaires were obtained. Respondents rated the 18 satisfaction items on a scale from 1 (strongly disagree) to 7 (strongly agree).
Dimensionality and stability of the scale were evaluated by a factor analysis with varimax rotation. Four factors emerged with eigenvalues greater than 1; they explained 66.40% of the total variance, with Cronbach's alpha ranging from 0.774 to 0.876. The four factors were named advertisement and guides, programs, food and souvenirs, and convenient facilities. To test and estimate the hypothesized model, a two-step approach to Structural Equation Modeling was used, with an initial measurement model and a subsequent structural model. The AMOS 4.0 analysis package was used to conduct the analysis, and the maximum likelihood procedure was used to estimate the model. This study used the Chi-square test, which is the most common model goodness-of-fit test. In addition, following the literature on Structural Equation Modeling, this study used further model fit indexes to determine the adequacy of the suggested model: the goodness-of-fit index (GFI) and root mean square error of approximation (RMSEA) as absolute fit indexes, and the normed-fit index (NFI) and non-normed-fit index (NNFI) as incremental fit indexes. The results of t-tests and ANOVAs revealed significant differences (at the 0.05 level); therefore H1 (tourist satisfaction level differs by demographic traits) is supported. According to the multiple regression analysis and AMOS, H2 (tourist satisfaction positively influences revisit intention), H3 (tourist satisfaction positively influences word of mouth), H4 (evaluation attributes of cultural tourism festivals influence tourist satisfaction), and H5 (tourist satisfaction positively influences behavioral intention) are also supported. The conclusions of this study are as follows. First, satisfaction levels differed according to the demographic information of visitors. Not all visitors had the same degree of satisfaction with their cultural tourism festival experience.
Therefore, it is necessary to understand the satisfaction of tourists if the experiences provided are to meet their expectations. In making festival plans, organizers should consider demographic variables when explaining and segmenting visitors to cultural tourism festivals. Second, satisfaction with the evaluation attributes of cultural tourism festivals had a significant direct impact on visitors' intention to revisit such festivals and on the word-of-mouth publicity they shared. The results indicated that visitor satisfaction is a significant antecedent of the intention to revisit such festivals. Festival organizers should strive to forge long-term relationships with visitors. In addition, it is necessary to understand how the intention to revisit a festival changes over time and to identify the critical satisfaction factors. Third, it is confirmed that behavioral intention was enhanced by satisfaction. The strong link between satisfaction and behavioral intentions of visitors is ensured by high-quality advertisement and guides, programs, food and souvenirs, and convenient facilities. Thus, examining revisit intention from a temporal viewpoint may be of great significance for both practical and theoretical reasons. Additionally, festival organizers should give special attention to visitor satisfaction, as satisfied visitors are more likely to return sooner. The findings of this research have several practical implications for festival managers. The promotion of cultural festivals should be based on an understanding of tourist satisfaction for the long-term success of tourism. This study can help managers carry out this task in a more informed and strategic manner by examining the effects of demographic traits on the level of tourist satisfaction and on behavioral intention. In other words, differentiated marketing strategies should be stressed and executed by the relevant parties.
The limitations of this study are as follows. The results cannot be generalized to other cultural tourism festivals because we have not explored the many different kinds of festivals. A future study should provide a comparative analysis of other festivals with different visitor segments. Also, further efforts should be directed toward developing more comprehensive temporal models that can explain the behavioral intentions of tourists.
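The abstract reports Cronbach's alpha as the reliability measure for each factor. As a rough illustration of how that statistic is computed, the sketch below applies the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), to a small made-up response matrix. The data and the function name are hypothetical and unrelated to the festival survey.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Uses sample variances (n - 1 denominator) for both the individual items
    and the per-respondent total scores.
    """
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical 7-point Likert responses: 5 respondents x 3 items of one factor.
responses = [
    [6, 7, 6],
    [4, 4, 5],
    [2, 3, 2],
    [5, 5, 6],
    [3, 2, 3],
]
alpha = cronbach_alpha(responses)
```

Values closer to 1 indicate that the items of a factor move together; the range of 0.774 to 0.876 reported above would conventionally be read as acceptable to good internal consistency.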

Development of Yóukè Mining System with Yóukè's Travel Demand and Insight Based on Web Search Traffic Information (웹검색 트래픽 정보를 활용한 유커 인바운드 여행 수요 예측 모형 및 유커마이닝 시스템 개발)

Choi, Youji;Park, Do-Hyung
- Journal of Intelligence and Information Systems, v.23 no.3, pp.155-175, 2017
As social data come into the spotlight, mainstream web search engines provide data indicating how many people searched for a specific keyword: web search traffic data. Web search traffic information aggregates the crowd of users who search for a specific keyword. In various areas, web search traffic can be used as a useful variable that represents the attention of ordinary users to specific interests. Many studies use web search traffic data to nowcast or forecast social phenomena, for example in epidemic prediction, consumer pattern analysis, product life cycle analysis, and financial investment modeling. Web search traffic data have also begun to be applied to predicting inbound tourism. Proper demand prediction is needed because tourism is a high value-added industry that increases employment and foreign exchange earnings. Among inbound tourists, the number of Chinese tourists, or Youke, is growing continuously; Youke have been the largest inbound tourist group in Korean tourism for many years, and tourism profits per Youke are the largest as well. Research into proper demand prediction approaches for Youke is therefore important in both the public and private sectors. Accurate tourism demand prediction is important for efficient decision making with limited resources. This study suggests an improved model that reflects the latest social issues by capturing the attention of groups of individuals. A trip abroad is generally a high-involvement activity, so potential tourists are likely to search deeply for information about their trip. Web search traffic data capture tourists' attention while they prepare for their journey in an instantaneous and dynamic way. This study therefore attempted to select keywords that potential Chinese tourists are likely to search for on the internet. Baidu, the largest web search engine in China with a share of over 80%, provides users with access to web search traffic data.
Qualitative interviews with potential tourists helped us understand information search behavior before a trip and identify the keywords for this study. The selected keywords were categorized into three levels according to how directly they relate to "Korean Tourism". This classification helps to find out which keywords can explain Youke inbound demand, from the closest category to the farthest. Web search traffic data for each keyword were gathered by a web crawler developed to crawl web search data from Baidu Index. Using the automatically gathered variables, a linear model was built by multiple regression analysis, which suits operational decision and policy making because the relationships among variables are easy to explain. After the linear regression models were built, a model composed of traditional variables was compared with a model that adds web search traffic variables to the traditional model, in terms of significance and R squared. After comparing the performance of the models, the final model was composed. The final regression model has improved explanatory power and offers real-time immediacy and convenience compared with the traditional model. Furthermore, this study demonstrates a system that is intuitively visualized for general use: the Youke Mining solution has several functions that support tourism decision making, including the embedded final regression model. The Youke Mining solution has algorithms based on data science and a well-designed, simple interface. Finally, this research has implications in three respects: theoretical, practical, and policy. Theoretically, the Youke Mining system and the model in this research are a first step in Youke inbound prediction using an interactive and instant variable: web search traffic information, which represents tourists' attention while they prepare their trip. Baidu accounts for more than 80% of the web search engine market.
Practically, Baidu data can represent in real time the attention of potential tourists who are preparing their own tour. Finally, in terms of policy, the Chinese tourist demand prediction model based on web search traffic can be used in tourism decision making for efficient resource management and for optimizing opportunities for successful policy.
https://doi.org/10.13088/jiis.2017.23.3.155

An Empirical Investigation Into the Effect of Organizational Capabilities on Service Innovation in Knowledge Intensive Business Firms (지식서비스기업의 서비스 혁신에 영향을 미치는 조직의 역량에 관한 연구)

Yoon, Bo Sung;Kim, Yong Jin;Jin, Seung Hye
- Asia pacific journal of information systems, v.23 no.1, pp.87-106, 2013
In the service-oriented economy, knowledge and skills are considered core resources for securing competitive advantage and service innovation. Knowledge management capability, which facilitates producing, sharing, accumulating, and reusing knowledge, becomes as important as knowledge itself in creating service value. Along with knowledge management capability, dynamic capability and operational capability are the key capabilities related to managing service delivery processes. Previous studies indicated that these three capabilities are related to service innovation, although they investigated the capabilities separately. The purpose of this study is 1) to define variables that affect service innovation, including knowledge management capability, dynamic capability, and operational capability, and 2) to empirically test the relationships among these variables. In this study, knowledge management capability is defined as the capability to manage knowledge processes. Dynamic capability is regarded as the firm's ability to integrate, build, and reconfigure internal and external competences to address rapidly changing environments. Operational capability refers to a high-level routine that, together with its implementing input flows, confers upon an organization's management a set of decision options for producing significant outputs of a particular type. The proposed research model was tested against data collected through a survey. The survey questionnaire was distributed to managers who participated in an educational program for management consulting. Each individual who answered the questionnaire represented a knowledge-based service firm. About 212 questionnaires were sent via e-mail or delivered directly to respondents. The number of usable responses was 93. Measurement items were adapted from previous studies to reflect the characteristics of the industry each informant worked in.
All measurement items used a 5-point Likert scale with anchors ranging from strongly disagree (1) to strongly agree (5). Of the 93 respondents, about 81% were male and 82% were in their 30s. In terms of jobs, 39.78% were managers, 24.73% were professionals/technicians, 12.90% were researchers, and 10.75% were salespeople. The largest groups of respondents worked for medium-sized enterprises (47.31%), for firms with fewer than 30 employees (46.24%), and for firms with a sales volume of less than 10 million USD (65.59%). To test the proposed research model, structural equation modeling (SEM) was used (SPSS 16.0 and AMOS version 5). We found that the three organizational capabilities influence service innovation directly or indirectly. Knowledge management capability directly affects dynamic capability and service innovation but affects operational capability indirectly through dynamic capability. Dynamic capability has no direct impact on service innovation but influences it indirectly through operational capability. Operational capability was found to positively affect service innovation. In sum, the three organizational capabilities (knowledge management capability, dynamic capability, and operational capability) need to be strategically managed at the firm level because they are significantly related to service innovation. An interesting result is that dynamic capability has a positive effect on service innovation only indirectly through operational capability. This result indicates that service innovation might have characteristics similar to process innovation rather than product orientation. The results also show that the organizational capabilities are inter-correlated and influence each other. Dynamic capability enables effective resource management, arrangement, and integration. Through these activities, which are affected by dynamic capability, strategic agility and responsiveness are strengthened.
Knowledge management capability intensifies dynamic capability and service innovation, and it is the basis of dynamic capability as well. The theoretical and practical implications are discussed further in the conclusion section.

Evaluation of Water Quality Impacts of Forest Fragmentation at Doam-Dam Watershed using GIS-based Modeling System (GIS 기반의 모형을 이용한 도암댐 유역의 산림 파편화에 따른 수(水)환경 영향 평가)

Heo, Sung-Gu;Kim, Ki-Sung;Ahn, Jae-Hun;Yoon, Jong-Suk;Lim, Kyoungjae;Choi, Joongdae;Shin, Yong-Chul;Lyou, Chang-Won
- Journal of the Korean Association of Geographic Information Studies, v.9 no.4, pp.81-94, 2006
This study evaluated the water quality impacts of forest fragmentation in the Doam-dam watershed. To this end, a watershed-scale model, the Soil and Water Assessment Tool (SWAT), was used. Because precipitation differed significantly between 1985 and 2000, the same weather data from 1985 were used to exclude the effects of different weather magnitudes and patterns. The water quality impacts of forest fragmentation were analyzed temporally and spatially because of their nature. In the S1 subwatershed, which experienced the most forest fragmentation within the Doam-dam watershed, the flow rates for Winter and Spring increased with forest fragmentation by 8,366 $m^3$/month and 72,763 $m^3$/month, respectively. For Summer and Fall, the flow rate increased by 149,901 $m^3$/month and 107,109 $m^3$/month, respectively. It is believed that the increased flow rates contributed significant amounts of soil erosion and diffuse nonpoint source pollutants to the receiving water bodies. With the forest fragmentation in the S1 subwatershed, the average sediment concentration for Winter and Spring increased by 5.448 mg/L and 13.354 mg/L, respectively. It is believed that the agricultural areas, which were forest before the fragmentation, are responsible for increased soil erosion and sediment yield during the spring thaw and snowmelt. For Summer and Fall, the sediment concentration increased by 20.680 mg/L and 24.680 mg/L, respectively. Compared with Winter and Spring, the increased precipitation during Summer and Fall contributed more soil erosion and a higher sediment concentration in the stream. Based on the results of this analysis, stream flow and sediment concentration increased with forest fragmentation within the S1 subwatershed. The increased flow and soil erosion could contribute to eutrophication in the receiving water bodies.
These results show that the natural functions of the forest, such as flood control, soil erosion protection, and water quality improvement, can easily be lost with ongoing forest fragmentation within the watershed. Thus, to minimize the negative impacts of forest fragmentation, comprehensive land use planning at the watershed scale needs to be developed and implemented based on the results obtained in this research.

The Role of Digital Knowledge Richness in Green Technology Adoption: A Digital Option Theory Perspective (그린기술 채택에의 디지털 지식풍부성의 역할: 디지털 옵션 이론 관점에서)

Yoo, Hosun;Lee, Namyeon;Kwon, Ohbyung
- The Journal of Information Systems, v.24 no.2, pp.23-52, 2015
Purpose: This study aims to understand the role of digital knowledge in the acceptance of green technology. It combines digital option theory with the second version of the Unified Theory of Acceptance and Use of Technology (UTAUT2). Contrary to other studies in which the UTAUT2 is used to explain IT adoption behavior, we look at the relationship between IT and the UTAUT2 from a new angle, incorporating an important aspect of IT, digitized knowledge richness, as a determinant of the UTAUT2. Design/methodology/approach: Grounded in the UTAUT2, a content analysis was conducted to investigate novel constructs dedicated to explaining green technology adoption. An amended version of the UTAUT2 specific to green technology is offered that better explains the green technology adoption behavior of consumers. Using the items identified by content analysis, we developed a questionnaire with 36 survey items, all measured on a seven-point Likert-type scale. We randomly selected 402 survey respondents from a set of panel data. After a pilot study, we analyzed the main survey data using PLS 2.0M3 and SPSS 20.0 and employed structural equation modeling to test the hypotheses. Findings: The results suggest that the UTAUT2 can be extended to technologies other than conventional IT. Social influence is more significant than conventional utilitarian and hedonic constructs, such as those used in the UTAUT and UTAUT2, in explaining adoption behavior in the context of green technologies. The hypothesized connection between digitized knowledge richness and adoption intention was supported by the results of studies on the role of IT in the formation of attitudes toward eco-friendly production. The results also indicate that digital knowledge can encourage people to try green technology when they learn that their peers are already using the technology successfully.
https://doi.org/10.5859/KAIS.2015.24.2.23

A Study of Guide System for Cerebrovascular Intervention (뇌혈관 중재시술 지원 가이드 시스템에 관한 연구)

Lee, Sung-Gwon;Jeong, Chang-Won;Yoon, Kwon-Ha;Joo, Su-Chong
- Journal of Internet Computing and Services, v.17 no.1, pp.101-107, 2016
Owing to recent advances in digital imaging technology, the development of intervention equipment has become widespread. Image-guided intervention is a procedure in which a tiny catheter and a guide wire are inserted into the body, so high-quality x-ray images should be used to enhance the effectiveness and safety of the treatment. However, the resulting increase in radiation has become a problem. Therefore, studies to improve the performance of x-ray detectors are being actively pursued. Moreover, this intervention relies on angiographic imaging and 3D medical image processing as references. In this paper, we propose a guidance system to support this intervention. The system addresses the limitations of existing 2D medical images of vessels affected by cerebrovascular disease, and it provides real-time tracking of the intervention catheter and guide wire together with an optimal route to the target lesion. The system consists of a medical image acquisition unit, an image processing unit, and a display device. As the experimental environment for the guide services provided by the proposed system, a Brain Phantom (complete intracranial model with aneurysms, ref H+N-S-A-010) was imaged with x-ray and tested. A reference image was generated by applying Laplacian-based image processing and the volume ray casting technique to DICOM data derived from the cerebral blood vessel model. The $A^*$ algorithm was used to provide a tracking path for the catheter and guide wire. Finally, the results show the location of the catheter and guide wire in the proposed system, which is expected to provide a useful guide for future intervention services.
https://doi.org/10.7472/jksii.2016.17.1.101
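The abstract above names the A* algorithm for the guide path but gives no details of the search space. As a generic illustration only, the sketch below runs A* on a small 2D occupancy grid with 4-connected moves, unit step cost, and a Manhattan-distance heuristic. The grid, the cost model, and the function name are assumptions made for this example and do not describe the authors' vessel model.

```python
import heapq


def a_star(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) and 1 (blocked) cells.

    Returns the path as a list of (row, col) cells from start to goal,
    or None when the goal cannot be reached.
    """
    def heuristic(cell):
        # Manhattan distance never overestimates the cost with unit steps.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # (estimated total, cost so far, cell)
    came_from = {start: None}
    best_cost = {start: 0}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cost > best_cost[cell]:
            continue  # stale queue entry; a cheaper route was already found
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        row, col = cell
        for nxt in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            r, c = nxt
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


# Hypothetical map: 0 = open lumen, 1 = wall.
vessel_map = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
route = a_star(vessel_map, (0, 0), (2, 0))
```

In a real guidance setting the nodes would more plausibly come from a vessel centerline graph extracted from the 3D image, with edge costs reflecting vessel geometry. That is a design choice the abstract does not specify.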

DISEASE DIAGNOSED AND DESCRIBED BY NIRS

Tsenkova, Roumiana N.
- Proceedings of the Korean Society of Near Infrared Spectroscopy Conference, 2001.06a, pp.1031-1031, 2001
The mammary gland is made up of remarkably sensitive tissue that can produce a large volume of secretion, milk, under normal or healthy conditions. When bacteria enter the gland and establish an infection (mastitis), inflammation is initiated, accompanied by an influx of white cells from the blood stream, altered secretory function, and changes in the volume and composition of the secretion. Cell numbers in milk are closely associated with inflammation and udder health. These somatic cell counts (SCC) are accepted as the international standard measurement of milk quality in dairy and for mastitis diagnosis. NIR spectra of unhomogenized composite milk samples from 14 cows (healthy and mastitic) were measured 7 days after parturition and during the next 30 days of lactation. Different multivariate analysis techniques were used to diagnose the disease at a very early stage and to determine how the spectral properties of milk vary with its composition and animal health. A PLS model for predicting SCC from NIR milk spectra was built. The best accuracy of determination for the 1100-2500 nm range was found using smoothed absorbance data and 10 PLS factors. For the independent validation set of samples, the standard error of prediction was 0.382, the correlation coefficient 0.854, and the coefficient of variation 7.63%. SCC determination by NIR milk spectra was found to be indirect and based on the related changes in milk composition. From the spectral changes, we learned that when mastitis occurred, the most significant factors that simultaneously influenced milk spectra were alteration of milk proteins and changes in the ionic concentration of milk. This was consistent with the results we obtained later when we applied 2DCOS. Two-dimensional correlation analysis of NIR milk spectra was done to assess the changes in milk composition that occur when SCC levels vary.
The synchronous correlation map revealed that when SCC increases, protein levels increase while water and lactose levels decrease. Results from the analysis of the asynchronous plot indicated that changes in water and fat absorptions occur before those of other milk components. In addition, the technique was used to assess the changes in milk during a period when SCC levels do not vary appreciably. Results indicated that milk components are in equilibrium and that no appreciable change in a given component was seen with respect to another. This was found in both healthy and mastitic animals. However, milk components were found to vary with SCC content regardless of the range considered. This important finding demonstrates that 2-D correlation analysis may be used to track even subtle changes in milk composition in individual cows. To find the right SCC threshold for mastitis diagnosis at cow level, classification of milk samples was performed using soft independent modeling of class analogy (SIMCA) and different spectral data pretreatments. Two levels of SCC, 200 000 cells/mL and 300 000 cells/mL, were set up and compared as thresholds to discriminate between healthy and mastitic cows. The best detection accuracy was found with 200 000 cells/mL as the threshold for mastitis and smoothed absorbance data: 98% of the milk samples in the calibration set and 87% of the samples in the independent test set were correctly classified. When the spectral information was studied, it was found that successful mastitis diagnosis was based on revealing the spectral changes related to the corresponding changes in milk composition. NIRS combined with different ways of spectral data mining can provide a faster and nondestructive alternative to current methods for mastitis diagnosis and a new insight into disease understanding at the molecular level.

Backward Path Tracking Control of a Trailer Type Robot Using an RCGA-Based Model (RCGA 기반의 모델을 이용한 트레일러형 로봇의 후방경로 추종제어)

Wi, Yong-Uk;Kim, Heon-Hui;Ha, Yun-Su;Jin, Gang-Gyu
- Journal of Institute of Control, Robotics and Systems, v.7 no.9, pp.717-722, 2001
This paper presents a methodology for the backward path tracking control of a trailer type robot that consists of two parts: a tractor and a trailer. It is difficult to control the motion of a trailer vehicle because its dynamics are non-holonomic. Therefore, this paper proposes modeling and parameter estimation of the system using a real-coded genetic algorithm (RCGA), and a backward path tracking control algorithm is then obtained based on the linearized model. Experimental results verify the effectiveness of the proposed method.

The Effect of Meta-Features of Multiclass Datasets on the Performance of Classification Algorithms (다중 클래스 데이터셋의 메타특징이 판별 알고리즘의 성능에 미치는 영향 연구)

Kim, Jeonghun;Kim, Min Yong;Kwon, Ohbyung
- Journal of Intelligence and Information Systems
- /
- v.26 no.1
- /
- pp.23-45
- /
- 2020
Big data is being created in a wide variety of fields such as medical care, manufacturing, logistics, sales, and SNS, and dataset characteristics are correspondingly diverse. To secure competitiveness, companies need to improve their decision-making capacity using classification algorithms. However, most do not have sufficient knowledge of which classification algorithm is appropriate for a specific problem area. In other words, determining which classification algorithm suits the characteristics of a dataset has been a task requiring expertise and effort. This is because the relationship between dataset characteristics (called meta-features) and the performance of classification algorithms has not been fully understood. Moreover, there has been little research on meta-features that reflect multi-class characteristics. The purpose of this study is therefore to empirically analyze whether meta-features of multi-class datasets have a significant effect on the performance of classification algorithms. Meta-features of multi-class datasets were grouped into two factors (data structure and data complexity), and seven representative meta-features were selected. Among those, we included the Herfindahl-Hirschman Index (HHI), originally a market concentration index, to replace the IR (Imbalance Ratio). We also developed a new index, the Reverse ReLU Silhouette Score, and added it to the meta-feature set. From the UCI Machine Learning Repository, six representative datasets (Balance Scale, PageBlocks, Car Evaluation, User Knowledge-Modeling, Wine Quality (red), Contraceptive Method Choice) were selected. Each dataset was classified using the classification algorithms selected in the study (KNN, Logistic Regression, Naïve Bayes, Random Forest, and SVM). For each dataset, we applied 10-fold cross-validation. 
Oversampling of 10% to 100% was applied to each fold, and the meta-features of the dataset were measured. The meta-features selected are HHI, Number of Classes, Number of Features, Entropy, Reverse ReLU Silhouette Score, Nonlinearity of Linear Classifier, and Hub Score. F1-score was selected as the dependent variable. The results showed that six meta-features, including the Reverse ReLU Silhouette Score and HHI proposed in this study, have a significant effect on classification performance. (1) The HHI meta-feature proposed in this study was significant for classification performance. (2) The number of variables has a significant effect on classification performance and, unlike the number of classes, the effect is positive. (3) The number of classes has a negative effect on classification performance. (4) Entropy has a significant effect on classification performance. (5) The Reverse ReLU Silhouette Score also significantly affects classification performance at a significance level of 0.01. (6) The nonlinearity of linear classifiers has a significant negative effect on classification performance. The results of the analysis by classification algorithm were also consistent. In the regression analysis by classification algorithm, the number of variables does not have a significant effect for the Naïve Bayes algorithm, unlike the other classification algorithms. This study makes two theoretical contributions: (1) two new meta-features (HHI and Reverse ReLU Silhouette Score) were shown to be significant, and (2) the effects of data characteristics on classification performance were investigated using meta-features. The practical contributions are as follows: (1) the results can be utilized in developing a classification algorithm recommendation system based on dataset characteristics. 
(2) Because data characteristics differ, many data scientists repeatedly test and adjust algorithm parameters to find the optimal algorithm for a given situation. In this process, resources are wasted excessively in terms of hardware, cost, time, and manpower. This study is expected to be useful for machine learning and data mining researchers, practitioners, and developers of machine learning-based systems. The paper consists of an introduction, related research, research model, experiment, and conclusion and discussion.
https://doi.org/10.13088/jiis.2020.26.1.023 Cite PDF KSCI
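
To make the class-distribution meta-features in the abstract above concrete, here is a minimal sketch that computes HHI, the imbalance ratio it replaces, and class entropy from a list of labels. It assumes the usual definitions: HHI as the sum of squared class shares, IR as the majority count divided by the minority count, and Shannon entropy in bits over the class shares. The abstract does not spell out the authors' exact formulas, and the function names are hypothetical. The Reverse ReLU Silhouette Score is not sketched because the abstract does not define it.

```python
import math
from collections import Counter


def class_shares(labels):
    """Fraction of samples belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: count / total for cls, count in counts.items()}


def hhi(labels):
    """Herfindahl-Hirschman Index: sum of squared class shares (1/k .. 1)."""
    return sum(share ** 2 for share in class_shares(labels).values())


def imbalance_ratio(labels):
    """Majority class count divided by minority class count."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


def class_entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    return -sum(s * math.log2(s) for s in class_shares(labels).values())


balanced = ["a", "b", "c"] * 10          # three equally sized classes
skewed = ["a"] * 8 + ["b"] + ["c"]       # one dominant class

print(hhi(balanced), hhi(skewed))
print(imbalance_ratio(skewed), class_entropy(balanced))
```

For k equally sized classes HHI equals 1/k, and it approaches 1 as a single class dominates. Unlike IR, which looks only at the largest and smallest classes, HHI uses every class share, which is one plausible reason to prefer it for multi-class data.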
