Search | Korea Science

A Processing of Progressive Aspect "te-iru" in Japanese-Korean Machine Translation (일한기계번역에서 진행형 "ている"의 번역처리)

Kim, Jeong-In;Mun, Gyeong-Hui;Lee, Jong-Hyeok
The KIPS Transactions:PartB / v.8B no.6 / pp.685-692 / 2001
This paper describes how to disambiguate the aspectual meaning of the Japanese expression "-te iru" in Japanese-Korean machine translation. Because the two languages are grammatically similar, almost all Japanese-Korean MT systems have been developed under the direct MT strategy, in which lexical disambiguation is essential to high-quality translation. Japanese has a progressive aspectual marker, "-te iru", which is difficult to translate into Korean because Korean has two different progressive aspectual markers: "-ko issta" for "action progressive" and "-e issta" for "state progressive". Moreover, the aspectual systems of the two languages do not fully coincide, so the Korean progressive aspect cannot be determined from the Japanese meaning of "-te iru" alone. The progressive aspectual meaning may be partially determined by the meaning of the predicate, and the semantic meaning of the predicate may in turn be partially restricted by adverbials. Japanese predicates are therefore classified into five classes: the 1st class is used only for "action progressive"; the 2nd generally for "action progressive" but occasionally for "state progressive"; the 3rd only for "state progressive"; the 4th generally for "state progressive" but occasionally for "action progressive"; and the 5th for all others. Heuristic rules based on adverbs and adverbial phrases are defined to disambiguate verbs of the 2nd and 4th classes. In an experimental evaluation using more than 15,000 sentences from the Asahi newspaper, the proposed method improved translation quality by about 5%, which shows that it is effective in disambiguating "-te iru" for Japanese-Korean machine translation.
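The five-class scheme and the adverb-based heuristics can be sketched as a lexicon lookup followed by a few rules. This is a minimal illustration, not the authors' implementation: the verb lexicon (`VERB_CLASSES`), the class assignments of the example verbs, and the adverbial cue sets are hypothetical placeholders; a real system would need a full lexicon and the paper's actual rules.

```python
from enum import Enum
from typing import Optional, Set


class VerbClass(Enum):
    ACTION_ONLY = 1    # class 1: only "action progressive"
    MOSTLY_ACTION = 2  # class 2: usually action, occasionally state
    STATE_ONLY = 3     # class 3: only "state progressive"
    MOSTLY_STATE = 4   # class 4: usually state, occasionally action
    OTHER = 5          # class 5: everything else


# Hypothetical lexicon: romanized verb stems with illustrative class assignments.
VERB_CLASSES = {
    "taberu": VerbClass.ACTION_ONLY,   # eat
    "shinu": VerbClass.STATE_ONLY,     # die
    "suwaru": VerbClass.MOSTLY_STATE,  # sit (down)
}

# Hypothetical adverbial cues that override the default reading of class 2 and 4 verbs.
STATE_CUES = {"mou", "sudeni"}      # "already"
ACTION_CUES = {"ima", "yukkuri"}    # "now", "slowly"


def korean_progressive(verb: str, adverbs: Set[str]) -> Optional[str]:
    """Choose '-ko issta' (action) or '-e issta' (state) for a '-te iru' verb."""
    cls = VERB_CLASSES.get(verb, VerbClass.OTHER)
    if cls is VerbClass.ACTION_ONLY:
        return "-ko issta"
    if cls is VerbClass.STATE_ONLY:
        return "-e issta"
    if cls is VerbClass.MOSTLY_ACTION:
        # Default to action unless an adverb signals a resulting state.
        return "-e issta" if adverbs & STATE_CUES else "-ko issta"
    if cls is VerbClass.MOSTLY_STATE:
        # Default to state unless an adverb signals an ongoing action.
        return "-ko issta" if adverbs & ACTION_CUES else "-e issta"
    return None  # class 5: left to other rules
```

For example, `korean_progressive("suwaru", {"yukkuri"})` takes the action reading because of the manner adverb, while `korean_progressive("suwaru", set())` falls back to the default state reading.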

Different Look, Different Feel: Social Robot Design Evaluation Model Based on ABOT Attributes and Consumer Emotions (각인각색, 각봇각색: ABOT 속성과 소비자 감성 기반 소셜로봇 디자인평가 모형 개발)

Ha, Sangjip;Lee, Junsik;Yoo, In-Jin;Park, Do-Hyung
Journal of Intelligence and Information Systems / v.27 no.2 / pp.55-78 / 2021
To solve complex and diverse social problems and ensure individuals' quality of life, social robots that can interact with humans are attracting attention. In the past, robots were regarded as providers of labor, deployed at industrial sites on behalf of humans. With the advent of smart technology, which is considered an important driver in most industries, the concept of the robot has been extended to social robots that coexist with humans and enable social interaction. Examples include service robots that respond to customers, robots designed for edutainment, and emotional robots that can interact intimately with humans. However, despite today's ICT service environment and the 4th industrial revolution, robots have not yet become widely popular. Because social interaction with users is a key function of social robots, factors beyond the robots' technology must also be considered. In particular, a robot's design elements matter more than other factors in leading consumers to actually purchase a social robot. Yet existing studies on social robots either propose a "robot development methodology" or test, piecemeal, the effects that social robots have on users. Meanwhile, the consumer emotions evoked by a robot's appearance strongly influence how users form perceptions, reasoning, evaluations, and expectations, and they can further affect attitudes toward the robot, liking, and inferences about its performance. This study therefore aims to verify the effects of social robots' appearance and of consumer emotions on consumers' attitudes toward social robots. To do so, a social robot design evaluation model is constructed by combining heterogeneous data from different sources.
Specifically, three quantitative indicators of social robot appearance from the ABOT Database are included in the model. Consumer emotions toward social robot design were collected through (1) the existing design evaluation literature, (2) online buzz such as product reviews and blogs, and (3) qualitative interviews on social robot design. We then collected scores for consumer emotions and attitudes toward various social robots through a large-scale consumer survey. First, we derived six major dimensions of consumer emotion from 23 detailed emotions using a dimension reduction method. Then, statistical analysis was performed to verify the effect of the derived consumer emotions on attitudes toward social robots. Finally, moderated regression analysis was performed to verify whether the quantitatively collected indicators of social robot appearance moderate the relationship between consumer emotions and attitudes toward social robots. Interestingly, several significant moderation effects were identified; these are visualized as two-way interaction effects so that they can be interpreted from multidisciplinary perspectives. Theoretically, this study contributes by linking data from heterogeneous sources to empirically verify every stage from the robots' technical properties to consumers' emotions and attitudes toward social robots. Practically, the results can help develop design guidelines based on consumer emotions at the design stage of social robot development.
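Moderated regression tests whether a moderator changes the slope between a predictor and an outcome by adding an interaction term. The sketch below assumes one emotion dimension as the predictor `x`, one ABOT appearance indicator as the moderator `m`, and attitude as the outcome `y`; the function name and the mean-centering step are illustrative choices, not details reported in the abstract.

```python
import numpy as np


def moderated_regression(x, m, y):
    """Fit y = b0 + b1*x + b2*m + b3*(x*m) by ordinary least squares.

    x: predictor (e.g. an emotion dimension score), m: moderator (e.g. an
    appearance indicator), y: outcome (e.g. attitude). Returns [b0, b1, b2, b3].
    """
    x, m, y = (np.asarray(v, dtype=float) for v in (x, m, y))
    # Mean-center predictor and moderator so b1 and b2 are slopes at the mean.
    xc, mc = x - x.mean(), m - m.mean()
    design = np.column_stack([np.ones_like(xc), xc, mc, xc * mc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # b3 is the moderation (interaction) effect
```

A non-zero `b3` means the emotion-attitude slope depends on the appearance indicator, which is what a two-way interaction plot visualizes; in practice its significance would be judged with a standard error from a full regression package.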
https://doi.org/10.13088/jiis.2021.27.2.055

The Behavior Analysis of Exhibition Visitors using Data Mining Technique at the KIDS & EDU EXPO for Children (유아교육 박람회에서 데이터마이닝 기법을 이용한 전시 관람 행동 패턴 분석)

Jung, Min-Kyu;Kim, Hyea-Kyeong;Choi, Il-Young;Lee, Kyoung-Jun;Kim, Jae-Kyeong
Journal of Intelligence and Information Systems / v.17 no.2 / pp.77-96 / 2011
An exhibition is defined as a market event of specific duration at which exhibitors present their main products to business or private visitors, and it plays a key role as an effective marketing channel. As exhibitions have grown in importance, the domestic exhibition industry has achieved great quantitative growth. In contrast, however, its qualitative growth has not kept pace. To improve the quality of exhibitions, we need to understand visitors' preferences and behavioral characteristics and, through that understanding, raise visitors' attention and satisfaction. In this paper, therefore, we used the observation survey method, a kind of field research, to understand visitors and to collect real data for analyzing behavior patterns. The proposed methodology framework consists of three steps. The first step is to select a suitable exhibition to which to apply the method. The second step is to carry out the observation survey and collect real data for further analysis; we conducted the observation survey at the KIDS & EDU EXPO for Children at SETEC, covering 160 visitors and 78 booths from November 4th to 6th, 2010. The last step is to analyze the data recorded through observation. In this step, we first analyze the features of the exhibition using the demographic characteristics collected by the observation survey, and then analyze individual booth features using the records of booths visited. Through the analysis of individual booth features, we can identify which events attract visitors' attention and which marketing activities affect visitors' behavior patterns.
However, because previous research considered only individual features influenced by the exhibition, correlations among features have rarely been studied. This research therefore supplements existing work with additional analysis using data mining techniques, analyzing the relationships among booths to uncover visitors' behavior patterns. Two data mining techniques are used: clustering analysis and ARM (Association Rule Mining) analysis. For clustering analysis, we use the K-means algorithm to identify correlations among booths. Through these techniques, we find two important features that affect visitors' behavior patterns at an exhibition: the geographical features of booths and the exhibit contents of booths. These features should be considered when the organizer plans the next exhibition. The results of our analysis are therefore expected to provide guidelines for understanding visitors and valuable insights from the earliest phases of exhibition planning, and to help improve visitor satisfaction. Visitors' movement paths, booth locations, and distances between booths can be considered in advance when planning the next exhibition. Because this research was conducted only at the KIDS & EDU EXPO for Children at SETEC (Seoul Trade Exhibition & Convention), its results cannot be applied directly to other exhibitions, and they were derived from a limited number of data samples. To obtain more accurate and reliable results, further experiments with larger data samples and with exhibitions across a variety of genres are needed.
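Association rule mining over booth visits can be illustrated by computing support and confidence for rules between pairs of booths. This is a simplified sketch under the assumption that each visitor's record is reduced to the set of booths visited; it enumerates only single-booth rules (A -> B) rather than running a full Apriori search, the thresholds are hypothetical, and the K-means clustering step is not shown.

```python
from itertools import permutations


def pairwise_rules(visits, min_support=0.1, min_confidence=0.5):
    """Mine rules A -> B between booths from per-visitor sets of visited booths.

    Returns a list of (a, b, support, confidence) tuples, where
    support = P(A and B) and confidence = P(B | A).
    """
    n = len(visits)
    booths = set().union(*visits)
    # Number of visitors who stopped at each booth.
    count = {b: sum(b in v for v in visits) for b in booths}
    rules = []
    for a, b in permutations(sorted(booths), 2):
        both = sum(a in v and b in v for v in visits)
        support = both / n
        if support < min_support:
            continue
        confidence = both / count[a]
        if confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules
```

A rule such as `("B", "A", 0.5, 1.0)` would read: half of all visitors went to both booths, and every visitor who went to booth B also went to booth A. Paired with booth locations, rules like this can inform the layout decisions mentioned above.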
https://doi.org/10.13088/jiis.2011.17.2.077

The Brand Personality Effect: Communicating Brand Personality on Twitter and its Influence on Online Community Engagement (브랜드 개성 효과: 트위터 상의 브랜드 개성 전달이 온라인 커뮤니티 참여에 미치는 영향)

Cruz, Ruth Angelie B.;Lee, Hong Joo
Journal of Intelligence and Information Systems / v.20 no.1 / pp.67-101 / 2014
The use of new technology greatly shapes the marketing strategies companies use to engage their consumers. Among these technologies, social media is used to reach an organization's audience online. One of the most popular social media channels to date is the microblogging platform Twitter. With an average of 500 million tweets sent daily, it is a rich source of data for researchers and a lucrative marketing medium for companies. Nonetheless, one challenge for companies developing an effective Twitter campaign is the limited theoretical and empirical evidence on proper organizational use of Twitter, despite its potential advantages for a firm's external communications. This study aims to provide empirical evidence on how firms can use Twitter effectively in their marketing communications, building on the association between brand personality and brand engagement proposed by several branding researchers. The study extends Aaker's earlier empirical work on brand personality by applying the Brand Personality Scale to explore whether Twitter brand communities convey distinctive brand personalities online, and how these personalities influence the communities' level or intensity of consumer engagement and sentiment quality. The moderating effect of product involvement on consumer engagement is also measured. Collecting data over eight weeks through the publicly available Twitter application programming interface (API) from 23 Twitter-verified business-to-consumer (B2C) brand accounts, we test the paper's hypotheses using computerized content analysis and opinion mining. The study is the first to compare Twitter marketing across organizations using the brand personality concept.
It demonstrates a potential basis for Twitter strategies and discusses their benefits, thus providing a framework of analysis for Twitter practice and strategic direction for companies developing their use of Twitter to communicate with their followers. This study has four specific research objectives. The first is to examine the applicability of the brand personality dimensions used in marketing research to online brand communities on Twitter. The second is to establish whether congruence between offline and online brand personalities contributes to building a successful social media brand community. Third, we test the moderating effect of product involvement on the effect of brand personality on brand community engagement. Lastly, we investigate the sentiment quality of consumer messages to firms that succeed in communicating their brands' personalities on Twitter.
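Computerized content analysis of brand personality is often done with a dictionary that maps words to personality dimensions. The sketch below uses Aaker's five dimensions (sincerity, excitement, competence, sophistication, ruggedness), but the cue words are hypothetical placeholders, not the dictionary used in the study, and the scoring rule (share of dictionary hits per dimension) is an assumed simplification.

```python
import re
from collections import Counter

# Aaker's five dimensions; the cue words are hypothetical placeholders.
PERSONALITY_LEXICON = {
    "sincerity": {"honest", "friendly", "family", "real"},
    "excitement": {"exciting", "cool", "new", "amazing"},
    "competence": {"reliable", "secure", "smart", "quality"},
    "sophistication": {"elegant", "luxury", "charming", "glamorous"},
    "ruggedness": {"tough", "outdoor", "rugged", "strong"},
}


def personality_profile(tweets):
    """Return each dimension's share of all lexicon hits across a brand's tweets."""
    hits = Counter()
    for tweet in tweets:
        # Lowercase and split into simple word tokens.
        for token in re.findall(r"[a-z']+", tweet.lower()):
            for dim, words in PERSONALITY_LEXICON.items():
                if token in words:
                    hits[dim] += 1
    total = sum(hits.values())
    return {dim: (hits[dim] / total if total else 0.0) for dim in PERSONALITY_LEXICON}
```

Profiles computed per brand account could then be compared across brands, or against offline brand personality scores, to address the first two research objectives.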
https://doi.org/10.13088/jiis.2014.20.1.067

A Study on Improving Scheme and An Investigation into the Actual Condition about Components of Physical Distribution System (물류시스템 구성요인에 관한 실태분석과 개선방안에 관한 연구)

Kim, Kyeong-Cho
Journal of Distribution Science / v.7 no.4 / pp.47-56 / 2009
The purpose of this study is to propose ways to improve the efficiency and rationality of physical distribution system management, which is influenced by many factors. To achieve this purpose, the study relies on both documentary research and a survey. The major components (elements) of a physical distribution system include the warehouse·storage system, the transportation system, the inventory system, and the physical distribution information system. The factors used in this study are ① product (quality·A/S·added value of product·adaptation of product·technical competitiveness relative to other enterprises), ② market (market channel·kinds of customers·physical distribution share), ③ warehouse·storage (warehouse design·size·direction·storage capacity·warehouse quality), ④ transportation (promptness·reliability·responsibility·kinds of transportation·cooperative joint transportation system·national transportation network), ⑤ packaging (packaging design·material·education program·pollution measurement program), ⑥ inventory (ordinary inventory criteria·consistency of inventory records), ⑦ unloading (unloading equipment·equipment ownership ratio), ⑧ information system (physical distribution quantity analysis·usable computer part), ⑨ physical distribution cost (ratio to sales), and ⑩ physical distribution system (physical distribution center, etc.). The implications of this study can be summarized as follows: ① In firms that have not adopted a systems-integrative approach, physical distribution is a fragmented and often uncoordinated set of activities spread across various functions, each with its own priorities and measurements. ② Physical distribution is recognized more as an important strategic factor than as a simple cost reduction factor. ③ It can be used by enterprises as a strategic competitive tool.

A Study on the Characteristics of the Atmospheric Environment in Suwon Based on GIS Data and Measured Meteorological Data and Fine Particle Concentrations (GIS 자료와 지상측정 기상·미세먼지 자료에 기반한 수원시 지역의 도시대기환경 특성 연구)

Wang, Jang-Woon;Han, Sang-Cheol;Mun, Da-Som;Yang, Minjune;Choi, Seok-Hwan;Kang, Eunha;Kim, Jae-Jin
Korean Journal of Remote Sensing / v.37 no.6_2 / pp.1849-1858 / 2021
We analyzed the monthly and annual trends of meteorological factors (wind speed, wind direction, and air temperature) measured at an automated synoptic observation system (ASOS) and of fine particle (PM₁₀ and PM₂.₅) concentrations measured at the air quality monitoring systems (AQMSs) in Suwon. In addition, we investigated how fine particle concentrations were related to the meteorological factors and to urban morphological parameters (the fractions of building volume and road area). We calculated the total building volume and the total road area within a 2 km × 2 km area centered on each AQMS using the geographic information system and the environmental geographic information system. The analysis of the meteorological factors showed that the dominant wind directions at the ASOS were westerly and northwesterly and that the average wind speed was relatively high in spring. Measured fine particle concentrations were low in summer and early autumn (July to September) and high in spring and winter. At most AQMSs, the annual mean fine particle concentration was lowest in 2020. Fine particle concentrations were weakly and negatively correlated with measured wind speed and air temperature (the correlation between PM₂.₅ concentration and air temperature was relatively strong). In Suwon, at least at six AQMSs (all except RAQMS 131116 and AQMS 131118), PM₁₀ concentrations were affected mainly by transport from outside rather than by primary emissions from mobile sources or by wind speed reduction caused by buildings; for PM₂.₅, the reverse was true.
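The abstract does not name the correlation coefficient used. Assuming Pearson's r, the relationship between a concentration series and a meteorological series (for example, daily PM₂.₅ concentration and air temperature at one AQMS) could be computed as below; the function name is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    # Covariance numerator and the two standard-deviation numerators.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)
```

A "weak negative" correlation in the sense of the abstract would be a value of r somewhat below zero; paired observations with missing values would need to be dropped before calling the function.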
https://doi.org/10.7780/kjrs.2021.37.6.2.7

A Intelligent Diagnostic Model that base on Case-Based Reasoning according to Korea - International Financial Reporting Standards (K-IFRS에 따른 사례기반추론에 기반한 지능형 기업 진단 모형)

Lee, Hyoung-Yong
Journal of Intelligence and Information Systems / v.20 no.4 / pp.141-154 / 2014
The adoption of International Financial Reporting Standards (IFRS) is one of the important issues in recent accounting research because the change from local GAAP (Generally Accepted Accounting Principles) to IFRS has a substantial effect on accounting information. Over 100 countries, including Australia, China, Canada, and the European Union member countries, have adopted IFRS for financial reporting purposes, and several more, including the United States and Japan, are considering adoption. In Korea, 61 firms voluntarily adopted Korean International Financial Reporting Standards (K-IFRS) in 2009 and 2010, and all listed firms were required to adopt K-IFRS in 2011. The adoption of IFRS is expected to increase the comparability of financial statements, improve corporate transparency, and raise the quality of financial reporting, thereby benefiting investors. This study investigates whether accounts receivable discounting (AR discounting) recognized under K-IFRS is more value-relevant than AR discounting disclosed under Korean Generally Accepted Accounting Principles (K-GAAP). Because more rigorous standards apply to the derecognition of AR discounting under K-IFRS, most AR discounting is recognized as short-term debt rather than disclosed as a contingent liability, unless all risks and rewards are transferred. In this research, I examine how industry responded to the changes in accounting rules for accounts receivable, which moved toward stricter standards for the recognition of sales with the adoption of K-IFRS.
This study examines whether accounting information, in particular information on AR discounting, is more value-relevant under K-IFRS. First, note that AR discounting involves the transfer of financial assets. Under K-GAAP, when firms discount AR to banks before maturity, they conventionally remove the AR from the balance sheet, report losses from AR discounting, and disclose and explain the transactions in the footnotes. Under K-IFRS, however, most firms keep the AR and add a short-term debt equal to the discounted AR. This increases firms' leverage ratios and raises firms' concerns about investors' reactions to a worsening capital structure, since investors' perception of the firm's risk may change. In the study sample, AR discounting averages 75.3 billion won (maximum 3.6 trillion won, minimum 18 million won), which on average is 7.0% of assets (maximum 38.6%, minimum 0.002%), 26.2% of accounts receivable (maximum 92.5%, minimum 0.003%), and 13.5% of total liabilities (maximum 69.5%, minimum 0.004%). After the adoption of K-IFRS, total liabilities increase by 13%p on average (maximum 103%p, minimum 0.004%p) attributable to AR discounting. The leverage ratio (total liabilities/total assets) increases by an average of 2.4%p (maximum 16%p, minimum 0.001%p), and the debt-to-equity ratio increases by an average of 14.6%p (maximum 134%p, minimum 0.006%) attributable to the recognition of AR discounting as short-term debt. The debt and equity structures of companies engaging in factoring transactions are therefore likely to be affected by the change in accounting rules.
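The effect on the ratios can be checked with simple arithmetic. Under K-GAAP the discounted AR leaves the balance sheet; under K-IFRS the AR stays and an equal short-term debt is added, so total assets and total liabilities both grow by the discounted amount while equity is unchanged. The sketch below assumes this simplified treatment (ignoring discount losses and interest), and the function name and figures are hypothetical.

```python
def leverage_after_recognition(assets, liabilities, discounted_ar):
    """Compare ratios when discounted AR is derecognized (K-GAAP style)
    versus kept on the books with an equal short-term debt (K-IFRS style).

    `assets` and `liabilities` are the balances with the AR derecognized.
    """
    equity = assets - liabilities
    derecognized = {
        "leverage": liabilities / assets,
        "debt_to_equity": liabilities / equity,
    }
    # Keeping the AR and booking an equal borrowing grows both sides equally.
    a2, l2 = assets + discounted_ar, liabilities + discounted_ar
    recognized = {"leverage": l2 / a2, "debt_to_equity": l2 / equity}
    return derecognized, recognized
```

For example, with total assets of 1,000, liabilities of 500, and discounted AR of 100, the leverage ratio rises from 50% to about 54.5% and the debt-to-equity ratio from 100% to 120%. The debt-to-equity ratio moves more because its denominator (equity) does not grow at all.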
I suggest that the changes in accounting provisions following K-IFRS adoption significantly influenced the structure of firms' assets and liabilities. Because of these changes, the treatment of accounts receivable discounting has become critical. This paper proposes an intelligent diagnostic system that estimates the negative impact on stock value using self-organizing maps and case-based reasoning. To validate the usefulness of the proposed model, real data were analyzed, and several models were compared with the research model to assess its significance. I found that the proposed model provides satisfactory results compared with the other models.
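The retrieval step of case-based reasoning can be sketched as a nearest-neighbour search over stored firm cases. This is a minimal illustration, assuming each case is a dict with a numeric `features` tuple (for example, AR discounting as a share of assets and the resulting leverage change) and an observed `outcome` (for example, a stock-price impact); these field names and function names are hypothetical, and the self-organizing-map step that the paper combines with CBR is omitted.

```python
import math


def retrieve_similar_cases(case_base, query, k=3):
    """Return the k stored cases closest to the query (Euclidean distance)."""
    return sorted(case_base, key=lambda case: math.dist(case["features"], query))[:k]


def estimate_impact(case_base, query, k=3):
    """Predict the outcome for a new firm as the mean outcome of its k nearest cases."""
    neighbours = retrieve_similar_cases(case_base, query, k)
    return sum(c["outcome"] for c in neighbours) / len(neighbours)
```

In a combined design, a self-organizing map would first assign the query to a cluster, and retrieval would then search only the cases in that cluster; features on different scales would also usually be normalized before computing distances.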
https://doi.org/10.13088/jiis.2014.20.4.141

A Study on the Design of Case-based Reasoning Office Knowledge Recommender System for Office Professionals (사례기반추론을 이용한 사무지식 추천시스템)

Kim, Myong-Ok;Na, Jung-Ah
Journal of Intelligence and Information Systems / v.17 no.3 / pp.131-146 / 2011
It is becoming more essential than ever for office professionals to be competent in information gathering and problem solving in today's global business society. In particular, office professionals do not merely handle simple chores; they must also make decisions as quickly and efficiently as possible in problematic situations that can result in either profit or loss for their company. Since office professionals rely heavily on tacit knowledge to solve problems that arise in everyday business situations, it is helpful and efficient to refer to similar past business cases and to share or reuse that business knowledge for better performance. Case-based reasoning (CBR) is a problem-solving method that uses similar previous cases to solve problems. With CBR, the case closest to the current business situation can be searched for and retrieved from the case or knowledge base and referred to for a new solution, which reduces the time and resources needed and increases the probability of success. The main purpose of this study is to design a system called COKRS (Case-based reasoning Office Knowledge Recommender System) and to develop a prototype of it. COKRS manages cases and their metadata, accepts keywords from the user, searches the case base for the past case most similar to the input keywords, and communicates with users to collect information about the quality of the cases provided, continuously applying that information to update the values in the similarity table. Core concepts such as the system architecture, the definition of a case, the meta database, and the similarity table are introduced, and an algorithm to retrieve all similar cases from past work history is proposed. In this research, a case is best defined as a work experience in office administration. In practice, however, defining a case in office administration was not an easy task.
We surveyed 10 office professionals to learn how to define a case in office administration and found that, in most cases, any type of office work is recorded digitally and/or non-digitally. We therefore defined a case in COKRS as a record or document. The similarity table was composed of the items from a job analysis of office professionals conducted in previous research. Values between items in the similarity table were initially set based on the researchers' experience and a literature review. The results of this study could also be used for knowledge sharing in other areas of business wherever it is necessary and beneficial to share and learn from past experiences. We expect this research to serve as a reference for researchers and developers working in, or interested in, office knowledge recommendation systems based on CBR. A focus group interview (FGI) was conducted with ten administrative assistants carefully selected from various areas of business. They were given the chance to try COKRS in an actual work setting and to make suggestions for future improvement. The FGI identified the user interface for saving cases and searching them by keyword as the most positive aspect of COKRS, and identified the most urgently needed improvement as converting tacit knowledge and know-how into recorded documents more efficiently. The focus group also noted that it is essential to secure sufficient support, encouragement, and rewards from the company and to promote a positive attitude and atmosphere for knowledge sharing for everyone's benefit.
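COKRS ranks past cases with a similarity table and refines that table from user feedback. The sketch below is one possible reading of that loop, not the system's actual algorithm: it assumes each case has a job-analysis `item` and a set of `keywords`, scores a case by item similarity times the share of query keywords it contains, and nudges a table entry toward a user's 0-to-1 rating using a hypothetical learning rate `rate`. All function and field names are hypothetical.

```python
def item_similarity(table, a, b):
    """Symmetric similarity between two job-analysis items (1.0 if identical)."""
    if a == b:
        return 1.0
    return table.get((a, b), table.get((b, a), 0.0))


def rank_cases(cases, table, query_item, query_keywords):
    """Rank cases by item similarity weighted by the share of matching query keywords."""
    def score(case):
        if not query_keywords:
            return 0.0
        overlap = len(query_keywords & case["keywords"]) / len(query_keywords)
        return item_similarity(table, query_item, case["item"]) * overlap
    return sorted(cases, key=score, reverse=True)


def update_similarity(table, a, b, user_rating, rate=0.2):
    """Move the stored similarity for (a, b) toward the user's rating of a retrieved case."""
    if a == b:
        return  # identical items keep similarity 1.0
    key = (a, b) if (a, b) in table else (b, a)
    old = table.get(key, 0.0)
    table[key] = old + rate * (user_rating - old)
```

The initial table values would come from the researchers' experience and the literature, as described above; the feedback update then lets frequently useful cross-item matches gain weight over time.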
https://doi.org/10.13088/jiis.2011.17.3.131

A Hybrid Forecasting Framework based on Case-based Reasoning and Artificial Neural Network (사례기반 추론기법과 인공신경망을 이용한 서비스 수요예측 프레임워크)

Hwang, Yousub
Journal of Intelligence and Information Systems / v.18 no.4 / pp.43-57 / 2012
To enhance its competitive advantage in a constantly changing business environment, an enterprise's management must make the right decisions in many business activities based on both internal and external information. Providing accurate information therefore plays a prominent role in management decision making. Intuitively, historical data can provide a feasible estimate through forecasting models. If the service department can estimate the service quantity for the next period, it can effectively control the inventory of service-related resources such as personnel, parts, and other facilities. In addition, the production department can make a road map for improving product quality. Obtaining an accurate service forecast therefore appears to be critical for manufacturing companies. Numerous studies addressing this problem have employed statistical methods such as regression or autoregressive moving average simulation. However, these methods are efficient only for data that are seasonal or cyclical; they are not feasible when the data are influenced by the special characteristics of a product. In this research, we propose a forecasting framework that predicts the service demand of a manufacturing organization by combining case-based reasoning (CBR) with clustering analysis based on an unsupervised artificial neural network (i.e., Self-Organizing Maps; SOM). We believe that this is one of the first attempts to apply unsupervised artificial neural network-based machine learning techniques in the service forecasting domain. Our proposed approach has several appealing features: (1) We applied CBR and SOM in a new forecasting domain, service demand forecasting.
(2) We proposed a combined CBR and SOM approach to overcome the limitations of traditional statistical forecasting methods, and we developed a service forecasting tool based on this approach using an unsupervised artificial neural network and case-based reasoning. We conducted an empirical study at a real digital TV manufacturer (i.e., Company A) and evaluated the proposed approach and tool using the company's real sales and service data. In our experiments, we compare the performance of the proposed service forecasting framework with that of two other service forecasting methods: a traditional CBR-based forecasting model and the existing service forecasting model used by Company A. We ran each service forecasting method 144 times, each time with randomly sampled input data. To evaluate forecasting accuracy, we used the Mean Absolute Percentage Error (MAPE) as the primary performance measure. We conducted a one-way ANOVA on the 144 MAPE measurements for the three service forecasting approaches. The F-ratio of MAPE across the three approaches is 67.25 with a reported p-value of 0.000 (p < 0.001), meaning that the differences in MAPE among the three approaches are statistically significant. Since there is a significant difference among the approaches, we conducted Tukey's HSD post hoc test to determine exactly which MAPE means differ significantly from which others.
In terms of MAPE, Tukey's HSD post hoc test grouped the three service forecasting approaches into three distinct subsets, ordered from best to worst as follows: our proposed approach > the traditional CBR-based service forecasting approach > the existing forecasting approach used by Company A. Our empirical experiments thus show that the proposed approach outperformed both the traditional CBR-based forecasting model and Company A's existing service forecasting model. The rest of this paper is organized as follows. Section 2 provides research background, including summaries of CBR and SOM. Section 3 presents a hybrid service forecasting framework based on case-based reasoning and self-organizing maps, and Section 4 summarizes the empirical evaluation results. Section 5 discusses conclusions and future research directions.
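MAPE, the primary measure in these experiments, averages the absolute forecast error as a percentage of the actual value. A minimal sketch, with an illustrative function name:

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent; actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)
```

For example, forecasting 110 and 180 against actual demand of 100 and 200 gives a 10% error on each period and therefore a MAPE of 10%. Because the measure divides by the actual value, periods with zero service demand must be excluded or handled by a different metric.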
https://doi.org/10.13088/jiis.2012.18.4.043

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
Journal of Intelligence and Information Systems / v.24 no.3 / pp.21-44 / 2018
In recent years, the rapid development of internet technology and the popularization of smart devices have produced massive amounts of text data, generated and distributed through media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, a problem that has drawn the interest of many researchers seeking to manage it. Managing it also requires the ability to classify relevant information, which is the role of text classification. Text classification, the task of assigning a text document to one or more predefined categories or classes, is a challenging task in modern data analysis. Various techniques are available for it, such as K-Nearest Neighbor, the Naïve Bayes algorithm, Support Vector Machines, Decision Trees, and Artificial Neural Networks. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary depending on the types of words used in the corpus and the types of features created for classification. Most attempts to improve performance have proposed a new algorithm or modified an existing one, and this line of research may already have reached its limits. In this study, rather than proposing or modifying an algorithm, we focus on modifying how the data are used. It is widely known that classifier performance depends on the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, and such noisy data can affect the decisions made by classifiers built from them.
In this study, we consider that data from different domains, that is, heterogeneous data, may have noise-like characteristics that can be exploited in the classification process. Machine learning algorithms for building a classifier assume that the characteristics of the training data and the target data are the same or very similar. For unstructured data such as text, however, the features are determined by the vocabulary contained in the documents, so if the learning data and the target data are written from different viewpoints, their features may differ. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various kinds of sources are likely to be formatted differently. This causes difficulties for traditional machine learning algorithms, which were not developed to recognize different types of data representation at once and generalize over them together. Therefore, to use heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data can potentially degrade the performance of the document classifier. We therefore propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contribute to improving the classifier's accuracy. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision.
In this paper, three types of real-world data sources were used: news, Twitter, and blogs.
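RSESLA's details are not given in the abstract, so the sketch below shows only the general idea of ensemble-based selection in semi-supervised learning: an unlabeled document is added to the training set with a pseudo-label only if every classifier in the ensemble agrees on its label with high confidence. The classifier interface (a callable returning a label and a confidence), the function name, and the threshold are assumptions, and this is not the rule-selection procedure of RSESLA itself.

```python
def select_confident(unlabeled, classifiers, threshold=0.9):
    """Keep unlabeled documents on which every classifier agrees with high confidence.

    Each classifier is a hypothetical callable: doc -> (label, confidence).
    Returns (doc, pseudo_label) pairs to add to the training set.
    """
    selected = []
    for doc in unlabeled:
        votes = [clf(doc) for clf in classifiers]
        labels = {label for label, _ in votes}
        # Require unanimous agreement and a minimum confidence from every model.
        if len(labels) == 1 and min(conf for _, conf in votes) >= threshold:
            selected.append((doc, labels.pop()))
    return selected
```

The classifiers would be retrained on the enlarged training set and the selection repeated; filtering on agreement is one common way to limit the risk, noted above, that unlabeled data degrades the classifier.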
https://doi.org/10.13088/jiis.2018.24.3.021
