• Title/Summary/Keyword: Rule Based System

A Study on Improving Operating System of an Intangible Cultural Heritage by an Ecological Perspective (생태계적 방식에 의한 무형문화유산 체계 연구 - 자생력 강화방안을 중심으로 -)

  • Oh, Jung-Shim
    • Korean Journal of Heritage: History & Science / v.48 no.3 / pp.30-45 / 2015
  • This study aims to show that the transmission of intangible cultural heritage in Korea may be cut off because the heritage is separated from its human and social environment and is protected and managed under a national system. It also criticizes, from an ecological perspective, the dichotomy of concepts and methods in the current institution, and considers the problem that intangible cultural heritage is transmitted mainly by designated holders of skills and accomplishments, who are distinguished from everyone else. Finally, it suggests a policy direction that emphasizes establishing an environment in which local people can nurture, change, and develop the heritage, which may ensure continuous transmission, and it proposes a transmission system for intangible cultural heritage that operates on the ecological principles of self-recovery and natural rule. The findings show that seven problems can be identified by reviewing, from the ecological perspective, how the Cultural Properties Protection Law defines intangible cultural heritage and provides for its protection and transmission. Protection and transmission methods based on a cultural heritage ecology are then suggested by applying ecological theory. The intangible cultural heritage ecology defined in this paper means 'a sustainable community consisting of intangible cultural heritage, subject of activity and physical environment.' Because it operates according to principles that reflect the rules and features of natural ecology, it can sustain itself through self-recovery without external intervention, as a natural ecology does.

Development of a Feature Catalogue for Marine Geographic Information (해양 지리정보 피쳐 카탈로그 작성에 관한 연구)

  • Hong, Sang-Ki;Yun, Suk-Bum
    • Journal of Korea Spatial Information System Society / v.6 no.1 s.11 / pp.101-117 / 2004
  • Standards are essential to the efficient use of GIS data. International standards such as ISO TC211's 19100 series and various technical specifications from the OpenGIS Consortium are examples of efforts to maintain interoperability among GIS applications. Marine GIS is no exception to this rule, and in this context developing standards for marine GIS is urgently needed. Using the same meaning and definition for the features commonly found in marine GIS applications is one way to increase interoperability among systems. A key requirement for maintaining standard meanings for features is to build a common feature catalogue. This paper examines the concept of a feature catalogue and describes the ways in which a feature catalogue can be organized. To identify the common features found in various marine GIS applications, a comprehensive search was made to collect and analyze the features used in those applications. To maintain interoperability with the National GIS (NGIS) system, the features used in various NGIS applications were analyzed as well. The results of these analyses are used to create a comprehensive list of common features for marine GIS. The paper then explains the common feature catalogue for marine GIS and provides appropriate classification and coding systems for the common features. In addition, a registration tool for registering the common features into the standard registry was developed in this study. This Web-based tool can be used by various applications to input features into the feature catalogue and by standards agencies to maintain a standard-compliant feature catalogue.

Pareto Ratio and Inequality Level of Knowledge Sharing in Virtual Knowledge Collaboration: Analysis of Behaviors on Wikipedia (지식 공유의 파레토 비율 및 불평등 정도와 가상 지식 협업: 위키피디아 행위 데이터 분석)

  • Park, Hyun-Jung;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems / v.20 no.3 / pp.19-43 / 2014
  • The Pareto principle, also known as the 80-20 rule, states that roughly 80% of the effects come from 20% of the causes for many events including natural phenomena. It has been recognized as a golden rule in business with a wide application of such discovery like 20 percent of customers resulting in 80 percent of total sales. On the other hand, the Long Tail theory, pointing out that "the trivial many" produces more value than "the vital few," has gained popularity in recent times with a tremendous reduction of distribution and inventory costs through the development of ICT(Information and Communication Technology). This study started with a view to illuminating how these two primary business paradigms-Pareto principle and Long Tail theory-relates to the success of virtual knowledge collaboration. The importance of virtual knowledge collaboration is soaring in this era of globalization and virtualization transcending geographical and temporal constraints. Many previous studies on knowledge sharing have focused on the factors to affect knowledge sharing, seeking to boost individual knowledge sharing and resolve the social dilemma caused from the fact that rational individuals are likely to rather consume than contribute knowledge. Knowledge collaboration can be defined as the creation of knowledge by not only sharing knowledge, but also by transforming and integrating such knowledge. In this perspective of knowledge collaboration, the relative distribution of knowledge sharing among participants can count as much as the absolute amounts of individual knowledge sharing. In particular, whether the more contribution of the upper 20 percent of participants in knowledge sharing will enhance the efficiency of overall knowledge collaboration is an issue of interest. This study deals with the effect of this sort of knowledge sharing distribution on the efficiency of knowledge collaboration and is extended to reflect the work characteristics. 
All analyses were conducted based on actual data instead of self-reported questionnaire surveys. More specifically, we analyzed the collaborative behaviors of editors of 2,978 English Wikipedia featured articles, which are the best quality grade of articles in English Wikipedia. We adopted Pareto ratio, the ratio of the number of knowledge contribution of the upper 20 percent of participants to the total number of knowledge contribution made by the total participants of an article group, to examine the effect of Pareto principle. In addition, Gini coefficient, which represents the inequality of income among a group of people, was applied to reveal the effect of inequality of knowledge contribution. Hypotheses were set up based on the assumption that the higher ratio of knowledge contribution by more highly motivated participants will lead to the higher collaboration efficiency, but if the ratio gets too high, the collaboration efficiency will be exacerbated because overall informational diversity is threatened and knowledge contribution of less motivated participants is intimidated. Cox regression models were formulated for each of the focal variables-Pareto ratio and Gini coefficient-with seven control variables such as the number of editors involved in an article, the average time length between successive edits of an article, the number of sections a featured article has, etc. The dependent variable of the Cox models is the time spent from article initiation to promotion to the featured article level, indicating the efficiency of knowledge collaboration. To examine whether the effects of the focal variables vary depending on the characteristics of a group task, we classified 2,978 featured articles into two categories: Academic and Non-academic. Academic articles refer to at least one paper published at an SCI, SSCI, A&HCI, or SCIE journal. 
We assumed that academic articles are more complex, entail more information processing and problem solving, and thus require more skill variety and expertise. The analysis results indicate the following. First, the Pareto ratio and the inequality of knowledge sharing relate in a curvilinear fashion to collaboration efficiency in an online community, promoting it up to an optimal point and undermining it thereafter. Second, the curvilinear effect of the Pareto ratio and the inequality of knowledge sharing on collaboration efficiency is more sensitive for more academic tasks in an online community.
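
The two focal variables defined in this abstract can be illustrated with a minimal Python sketch. It assumes the input is a plain list of per-editor edit counts for one article group. The function names, the rounding used to pick the "upper 20 percent", and the rank-based Gini formula are illustrative choices, not details taken from the paper.

```python
def pareto_ratio(contributions, top_share=0.2):
    """Share of all contributions made by the top `top_share` of participants."""
    counts = sorted(contributions, reverse=True)
    total = sum(counts)
    if not counts or total == 0:
        raise ValueError("need at least one non-zero contribution")
    # Assumption: the size of the top group is rounded to the nearest integer,
    # with at least one participant.
    k = max(1, round(top_share * len(counts)))
    return sum(counts[:k]) / total


def gini(contributions):
    """Gini coefficient of the contribution counts (0 means perfectly equal)."""
    xs = sorted(contributions)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero contribution")
    # Standard rank-based formula on ascending values x_1 <= ... <= x_n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    edits = [50, 30, 5, 5, 3, 2, 2, 1, 1, 1]  # hypothetical edit counts
    print(pareto_ratio(edits))
    print(gini(edits))
```

In this hypothetical group of ten editors, the top two account for 80 of 100 edits, so the Pareto ratio is 0.8. Both values could then serve as covariates in a survival model such as the Cox regression the abstract describes.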

A Study on Classification System for Gong-Po-Do Style in Tomb Wall Paintings of Koguryo (고구려 고분벽화 공포도 형식의 분류체계에 관한 연구)

  • Hwang, Se-Ok
    • Korean Journal of Heritage: History & Science / v.49 no.2 / pp.20-55 / 2016
  • Koguryo's tomb mural paintings in North Korea are a precious cultural heritage, designated as a UNESCO World Heritage property and highly praised under the following criteria: i) exceptional human creativity, ii) representative value showing a stage of development in the construction history of East Asia, iii) aesthetic superiority, and iv) uniqueness of building construction, including the tombs' ceilings. Mural paintings have been found in almost 100 tombs of the Koguryo dynasty, out of 130 scattered across Huanren County, Liaoning Province, and Ji'an, Jilin Province, in China and Pyongyang in North Korea. Most of them are concentrated in Pyongyang and date from the 4th and 5th centuries. Notably, some were constructed before King Jangsu's transfer of the capital to Pyongyang (AD 427). This suggests that the Pyongyang territory was already under Koguryo control and was to become the new capital in the near future. The dense emergence of such tombs after the capital moved from Gungnae City to Pyongyang during the reign of Jangsu is closely linked to the construction of tombs for rulers under Jangsu's strengthened royal authority and centralized system of authoritarian rule. Tomb mural paintings depict the owner faithfully, as he was in life. The general lifestyles of ruling powers and sovereigns can be seen in wall paintings portraying buildings of various styles, figures, and manners of living, which are considered to reflect the political and social life the tomb owner led. Although there is not enough evidence to confirm that the architecture or "Gong-Po" depicted in the tomb wall paintings is that of Koguryo, the attribution is persuasive given that the interior of the tomb was painted and decorated like a real wooden building in order to reenact and continue the tomb owner's luxurious life after death. 
"Du-Gong-Po-Zak" appeared together with Koguryo tomb murals and can be found in most of them. The emergence of substantial "Gong-Po-Do" can be dated more than a century before the figures in the murals. This is a reasonable assumption, given that the time Koguryo tomb murals appeared matches the production period of Gahyungmyunggi(家形明器) and of the Hwasangseok(畵像石) and Hwasangjeon(畵像塼) designs in the mural paintings of the East-Han(東漢) ancient tombs in China. In this study, the architectural "Gong-Po" depicted in Koguryo tomb murals are broadly categorized as "Bi(non)-Po-Zak-kye", "Jun(semi)-Po-Zak-kye", and "Po-Zak-kye" based on the presence of "Ju-Du", "Cheom-Cha", and "Cheom-Cha-Sal-Mi" and on their developmental aspect, and "Po-zak" is subdivided into "Bi(non)-Cheul-Mok" and "Cheul-Mok" types.

The Prediction of Export Credit Guarantee Accident using Machine Learning (기계학습을 이용한 수출신용보증 사고예측)

  • Cho, Jaeyoung;Joo, Jihwan;Han, Ingoo
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.83-102 / 2021
  • The government recently announced various policies for developing big-data and artificial intelligence fields to provide a great opportunity to the public with respect to disclosure of high-quality data within public institutions. KSURE(Korea Trade Insurance Corporation) is a major public institution for financial policy in Korea, and thus the company is strongly committed to backing export companies with various systems. Nevertheless, there are still fewer cases of realized business model based on big-data analyses. In this situation, this paper aims to develop a new business model which can be applied to an ex-ante prediction for the likelihood of the insurance accident of credit guarantee. We utilize internal data from KSURE which supports export companies in Korea and apply machine learning models. Then, we conduct performance comparison among the predictive models including Logistic Regression, Random Forest, XGBoost, LightGBM, and DNN(Deep Neural Network). For decades, many researchers have tried to find better models which can help to predict bankruptcy since the ex-ante prediction is crucial for corporate managers, investors, creditors, and other stakeholders. The development of the prediction for financial distress or bankruptcy was originated from Smith(1930), Fitzpatrick(1932), or Merwin(1942). One of the most famous models is the Altman's Z-score model(Altman, 1968) which was based on the multiple discriminant analysis. This model is widely used in both research and practice by this time. The author suggests the score model that utilizes five key financial ratios to predict the probability of bankruptcy in the next two years. Ohlson(1980) introduces logit model to complement some limitations of previous models. Furthermore, Elmer and Borowski(1988) develop and examine a rule-based, automated system which conducts the financial analysis of savings and loans. 
Since the 1980s, researchers in Korea have examined the prediction of financial distress or bankruptcy. Kim(1987) analyzes financial ratios and develops a prediction model. Han et al.(1995, 1996, 1997, 2003, 2005, 2006) construct prediction models using various techniques including artificial neural networks. Yang(1996) introduces multiple discriminant analysis and a logit model. Kim and Kim(2001) utilize artificial neural network techniques for ex-ante prediction of insolvent enterprises. Since then, many scholars have tried to predict financial distress or bankruptcy more precisely using diverse models such as Random Forest or SVM. One major distinction of our research from previous research is that we examine the predicted probability of default for each sample case, rather than only the classification accuracy of each model over the entire sample. Most predictive models in this paper show a classification accuracy of about 70% on the entire sample. Specifically, the LightGBM model shows the highest accuracy of 71.1% and the Logit model the lowest accuracy of 69%. However, we find that these results are open to multiple interpretations. In the business context, more emphasis must be placed on minimizing type 2 error, which causes more harmful operating losses for the guaranty company. Thus, we also compare classification accuracy by splitting the predicted probability of default into ten equal intervals. When we examine the classification accuracy for each interval, the Logit model has the highest accuracy of 100% for the 0~10% interval of the predicted probability of default, but a relatively lower accuracy of 61.5% for the 90~100% interval. 
On the other hand, Random Forest, XGBoost, LightGBM, and DNN indicate more desirable results since they indicate a higher level of accuracy for both 0~10% and 90~100% of the predicted probability of the default but have a lower level of accuracy around 50% of the predicted probability of the default. When it comes to the distribution of samples for each predicted probability of the default, both LightGBM and XGBoost models have a relatively large number of samples for both 0~10% and 90~100% of the predicted probability of the default. Although Random Forest model has an advantage with regard to the perspective of classification accuracy with small number of cases, LightGBM or XGBoost could become a more desirable model since they classify large number of cases into the two extreme intervals of the predicted probability of the default, even allowing for their relatively low classification accuracy. Considering the importance of type 2 error and total prediction accuracy, XGBoost and DNN show superior performance. Next, Random Forest and LightGBM show good results, but logistic regression shows the worst performance. However, each predictive model has a comparative advantage in terms of various evaluation standards. For instance, Random Forest model shows almost 100% accuracy for samples which are expected to have a high level of the probability of default. Collectively, we can construct more comprehensive ensemble models which contain multiple classification machine learning models and conduct majority voting for maximizing its overall performance.
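
The per-interval accuracy comparison and the majority-voting ensemble mentioned in this abstract can be sketched as follows. This is an illustrative example only: it assumes each model outputs a predicted probability of default per case, that true labels are 0/1, and that a 0.5 cut-off converts a probability into a class. The function names and the sample data are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def accuracy_by_interval(probs, labels, n_bins=10, threshold=0.5):
    """Classification accuracy within each equal-width probability interval.

    Returns {bin_index: accuracy}, where bin 0 covers 0~10%, bin 9 covers 90~100%.
    Intervals that contain no cases are omitted.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for p, y in zip(probs, labels):
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        predicted = 1 if p >= threshold else 0
        totals[b] += 1
        hits[b] += int(predicted == y)
    return {b: hits[b] / totals[b] for b in sorted(totals)}


def majority_vote(predictions_per_model):
    """Combine 0/1 predictions from several models by strict majority."""
    n_models = len(predictions_per_model)
    return [
        1 if 2 * sum(votes) > n_models else 0
        for votes in zip(*predictions_per_model)
    ]


if __name__ == "__main__":
    probs = [0.05, 0.07, 0.45, 0.55, 0.95, 0.97]  # hypothetical model output
    labels = [0, 0, 1, 1, 1, 0]                   # hypothetical true outcomes
    print(accuracy_by_interval(probs, labels))
    print(majority_vote([[1, 0, 1], [1, 1, 0], [0, 0, 1]]))
```

Reading accuracy per interval rather than over the whole sample shows where a model's confident predictions (the two extreme intervals) can be trusted, which is the comparison the abstract emphasizes.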

Building a Korean Sentiment Lexicon Using Collective Intelligence (집단지성을 이용한 한글 감성어 사전 구축)

  • An, Jungkook;Kim, Hee-Woong
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.49-67 / 2015
  • Recently, the emergence of big data and social media has brought about a big bang of data. Social networking services are widely used by people around the world and have become a major communication tool for all ages. Over the last decade, as online social networking sites have become increasingly popular, companies have tended to focus on advanced social media analysis for their marketing strategies. In addition to social media analysis, companies are mainly concerned about the propagation of negative opinions on social networking sites such as Facebook and Twitter, as well as on e-commerce sites. Online word of mouth (WOM) such as product ratings, product reviews, and product recommendations is very influential, and negative opinions have a significant impact on product sales. This trend has increased researchers' attention to natural language processing tasks such as sentiment analysis. Sentiment analysis, also referred to as opinion mining, is the process of identifying the polarity of subjective information and has been applied to various research and practical fields. However, obstacles arise when the Korean language (Hangul) is used in natural language processing, because it is an agglutinative language whose rich morphology poses problems. As a result, there is a lack of Korean natural language processing resources such as sentiment lexicons, which has resulted in significant limitations for researchers and practitioners who are considering sentiment analysis. Our study builds a Korean sentiment lexicon with collective intelligence and provides an API (Application Programming Interface) service to open and share the sentiment lexicon data with the public (www.openhangul.com). For pre-processing, we created a Korean lexicon database with over 517,178 words and classified them into sentiment and non-sentiment words. 
To classify them, we first identified stop words, which are quite likely to play a negative role in sentiment analysis, and excluded them from our sentiment scoring. In general, sentiment words are nouns, adjectives, verbs, and adverbs, as they can carry sentimental expressions such as positive, neutral, and negative. On the other hand, non-sentiment words are interjections, determiners, numerals, postpositions, etc., as they generally carry no sentimental expression. To build a reliable sentiment lexicon, we adopted the concept of collective intelligence as a model for crowdsourcing. In addition, the concept of folksonomy was implemented in the taxonomy process to support collective intelligence. To make up for an inherent weakness of folksonomy, we adopted majority rule by building a voting system. Participants, as voters, were offered three voting options to choose from (positivity, negativity, and neutrality), and the voting was conducted on one of the largest social networking sites for college students in Korea. More than 35,000 votes have been cast by college students in Korea, and we keep this voting system open by maintaining the project as a perpetual study. Moreover, any change in the sentiment score of a word can be an important observation because it enables us to keep track of temporal changes in Korean as a natural language. Lastly, our study offers a RESTful, JSON-based API service through a web platform to make access easier for users such as researchers, companies, and developers. Our study makes important contributions to both research and practice. In terms of research, our Korean sentiment lexicon plays an important role as a resource for Korean natural language processing. In terms of practice, practitioners such as managers and marketers can implement sentiment analysis effectively by using the Korean sentiment lexicon we built. 
Moreover, our study sheds new light on the value of folksonomy by combining collective intelligence, and we also expect to give a new direction and a new start to the development of Korean natural language processing.
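
The majority-rule voting step described in this abstract might look like the following minimal sketch. It assumes each word collects a list of votes drawn from the three options named in the abstract. The function name, the plurality interpretation of "majority rule", and the choice to fall back to "neutral" on a tie or when no votes exist are assumptions made for illustration, not the authors' documented behavior.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def majority_label(votes, default="neutral"):
    """Return the label with the most votes for one word.

    Assumption: ties between the top labels, or an empty vote list,
    resolve to `default`.
    """
    unknown = set(votes) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    if not votes:
        return default
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return default  # top two labels are tied
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical votes for a single lexicon entry.
    print(majority_label(["positive", "positive", "negative"]))
```

Because the voting stays open, re-running the tally over time would also expose the temporal shifts in word sentiment that the abstract mentions.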

The Influence of the Restrictions in Chinese economic growth on Korean commercial environment (중국 경제성장의 제약요인이 한국 통상환경에 미치는 영향)

  • Shong, Il-Ho;Lee, Gye-Young
    • International Commerce and Information Review / v.15 no.4 / pp.457-479 / 2013
  • With the rise of China, the Chinese dream of becoming a world great power is being realized. According to the outlook of the World Bank and the IMF, around 2030 China will become a great power with an economy larger than that of the United States. The rise of China will have a huge impact on the whole world. China is expanding its influence as a global manufacturing base and a global market. China faces many constraints in realizing its 'Peaceful Rise' strategy. Chinese society is facing many difficult social problems caused by the side effects of rapid development, such as the spread of corruption, a severe wealth gap, environmental degradation, and energy shortages. Internationally, there are containment by the hegemon in the form of the so-called 'China threat' dispute, the Taiwan issue, and territorial disputes. Western countries are hostile to China for two reasons: one is China's socialist system, and the other is the expectation that a rising China will compete for supremacy with Europe and America. The recent emergence of Chinese nationalism and containment by neighboring countries are also serious limiting factors. Domestically, China faces rampant corruption in the bureaucracy, the weakened capacity of Communist rule, wealth disparity due to a discriminatory economic development strategy, serious rural problems, social instability, a lack of social security systems, the development gap between the eastern coastal areas and the western inland areas, ethnic minority problems, and constraints on sustainable development due to lack of resources, environmental pollution, and energy constraints. Like the former Soviet Union, China may face dismantlement. After its rise, China may face the possibility of a war between great powers or a collapse of Chinese society caused by deepening internal conflict. 
Serious economic polarization would lead peasants and urban workers, who are socially vulnerable, to turn their backs on the Communist Party and would threaten the legitimacy of the ruling party. The Chinese government will regard threats to the system from within as a more formidable risk factor than threats from the hegemon. The decline of a great country comes from internal rather than external causes. To achieve a peaceful rise, unification with Taiwan is an essential prerequisite. The Taiwan issue is a complex problem involving both international and domestic factors. The lack of energy resources and environmental pollution in China will bring economic crisis to Korean enterprises. An important influence on the Korean economy will be the change in China's method of economic development. It will shift the balance of investment and consumption, moving from GDP-centered growth to consumption- and environment-centered growth. Service industries including finance, environment, culture, education, health care, and social welfare will grow. The change in China's growth model will pose a great challenge to the intermediate goods industry in Korea. Korea should reduce the share of its exports to China that is centered on machinery, automobiles, semiconductors, steel, and chemicals, and should increase the share of the service industry.

Optimal Monetary Policy System for Both Macroeconomics and Financial Stability (거시경제와 금융안정을 종합 고려한 최적 통화정책체계 연구)

  • Joonyoung Hur;Hyoung Seok Oh
    • KDI Journal of Economic Policy / v.46 no.1 / pp.91-129 / 2024
  • The Bank of Korea, through a legal amendment in 2011 following the financial crisis, was entrusted with the additional responsibility of financial stability beyond its existing mandate of price stability. Since then, concerns have been raised about the prolonged increase in household debt compared to income conditions, which could constrain consumption and growth and increase the possibility of a crisis in the event of negative economic shocks. The current accumulation of financial imbalances suggests a critical period for the government and central bank to be more vigilant, ensuring it does not impede the stable flow of our financial and economic systems. This study examines the applicability of the Integrated Inflation Targeting (IIT) framework proposed by the Bank for International Settlements (BIS) for macro-financial stability in promoting long-term economic stability. Using VAR models, the study reveals a clear increase in risk appetite following interest rate cuts after the financial crisis, leading to a rise in household debt. Additionally, analyzing the central bank's conduct of monetary policy from 2000 to 2021 through DSGE models indicates that the Bank of Korea has operated with a form of IIT, considering both inflation and growth in its policy decisions, with some responsiveness to the increase in household debt. However, the estimation of a high interest rate smoothing coefficient suggests a cautious approach to interest rate adjustments. Furthermore, estimating the optimal interest rate rule to minimize the central bank's loss function reveals that a policy considering inflation, growth, and being mindful of household credit conditions is superior. It suggests that the policy of actively adjusting the benchmark interest rate in response to changes in economic conditions and being attentive to household credit situations when household debt is increasing rapidly compared to income conditions has been analyzed as a desirable policy approach. 
Based on these findings, we conclude that the integrated inflation targeting framework proposed by the BIS could be considered as an alternative policy system that supports the stable growth of the economy in the medium to long term.
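
To make the idea of an interest rate rule with smoothing more concrete, here is a generic Taylor-type rule written as a small Python function. This is not the specification estimated in the paper: the functional form, the parameter names, and every default coefficient below are hypothetical and chosen only to illustrate how a high smoothing coefficient slows rate adjustments while the rule still responds to inflation, growth, and household credit.

```python
def policy_rate(prev_rate, inflation_gap, output_gap, credit_gap,
                neutral_rate=2.0, rho=0.8,
                phi_pi=1.5, phi_y=0.5, phi_credit=0.1):
    """One step of a smoothed, Taylor-type interest rate rule (all in percent).

    rho is the interest rate smoothing coefficient: the closer it is to 1,
    the more the new rate stays near the previous rate.
    All default values are illustrative assumptions, not estimates.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    # Rate the rule would set without smoothing.
    target = (neutral_rate
              + phi_pi * inflation_gap
              + phi_y * output_gap
              + phi_credit * credit_gap)
    # Partial adjustment from the previous rate toward the target.
    return rho * prev_rate + (1.0 - rho) * target


if __name__ == "__main__":
    # Hypothetical case: inflation 1 point above target, other gaps zero.
    print(policy_rate(prev_rate=2.0, inflation_gap=1.0,
                      output_gap=0.0, credit_gap=0.0))
```

With these illustrative defaults the unsmoothed target would be 3.5 percent, but the smoothed rule moves only from 2.0 to about 2.3 percent in one step, which is the kind of cautious adjustment a high smoothing coefficient implies.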

The Ruling System of Silla to Gangneung Area Judged from Archaeological Resources in 5th to 6th Century (고고자료로 본 5~6세기 신라의 강릉지역 지배방식)

  • Shim, Hyun Yong
    • Korean Journal of Heritage: History & Science / v.42 no.3 / pp.4-24 / 2009
  • This paper examines archaeological resources that show how Silla entered the Gangneung area, the most actively excavated coastal region along the East Sea. Silla expanded its territories while organizing its system as an ancient state and acquired several independent townships in various regions, extending its forces to the East Sea area faster than any other ancient state of the time. In particular, many early relics and heritages of Silla have been found in Gangneung, the center of the East Sea area. Many archaeological resources attest to the circumstances of that time and supplement the brief texts that are valuable for our interpretation of historical facts. In this respect, it was possible for me to examine these resources to answer my question as to why early relics and heritages of Silla are found in the Gangneung area. Based on my research on Silla's advancement into the Gangneung area, I have acquired the following results. How did Silla rule this area after conquering Yeguk in the Gangneung area? After conquering the Gangneung area, Silla attempted indirect rule at first. Later, Silla adopted a direct ruling system. I divided the indirect ruling period into two phases: introduction and settlement. In detail, Silla's earthenware and stone chamber tombs first appeared in Hasi-dong in the fourth quarter of the 4th Century, and the tombs spread to Chodang-dong in the second quarter of the 5th Century. A belt with dragon pattern openwork, which seems to be from the second quarter of the 5th Century, was found, telling us that the Gangneung region began receiving rewards from Silla during this time. Thus, the period from the fourth quarter of the 4th Century to the second quarter of the 5th Century is designated as the 1st Phase (Introduction) of indirect ruling in terms of archaeological findings. This is when Silla first advanced into the Gangneung area and tolerated independent administration by the conquered. 
In the third and fourth quarters of the 5th Century, old mound tombs appeared and burials of relics that symbolized power emerged. In the third quarter of the 5th Century, stone chamber tombs were prevalent, but wooden chamber tombs, stone mounded wooden chamber tombs, and lateral entrance stone chamber tombs began to emerge. Also, tombs that were clustered in Hasi-dong and Chodang-dong began to scatter to Byeongsan-dong, Yeongjin-ri, and Bangnae-ri nearby. Steel pots were the symbol of power that emerged at this time. In the fourth quarter of the 5th Century, stone chamber tombs were still dominating, but wooden chamber tombs, stone mounded wooden chamber tombs, and lateral entrance stone chamber tombs became more popular. More crowns, crown ornaments, big daggers, and belts were bestowed by Silla, mostly in Chodang-dong and Byeongsan-dong. The period from the third quarter to the fourth quarter of the 5th Century was designated as the 2nd Phase (Settlement) of indirect ruling in terms of archaeological findings. At this time, Silla bestowed items of power on the ruling class of the Gangneung area and gave equal power to the rulers of Chodang-dong and Byeongsan-dong to keep them restrained by each other. However, Silla converted the ruling system to direct ruling once it recognized the Gangneung area as the base of its expedition of conquest to the north. In the first quarter of the 6th Century, old mound tombs disappeared and small/medium-sized mounds appeared in the western inlands and the northern areas. In this period, the tunnel entrance stone chamber tombs had doors and were large enough for people to enter. A cluster of several tunnel entrance stone chamber tombs was formed in Yeongjin-ri and Bangnae-ri at this time, probably under the influence of Silla's direct ruling. 
In the first quarter of the 6th Century, Silla dispatched officers from the central government to complete the local administration system and replaced the ruling class of Chodang-dong and Byeongsan-dong with that of Silla-friendly Yeongjin-ri and Bangnae-ri to reorganize the local administration system and gain full control of the Gangneung area.

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.21-44
    • /
    • 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have produced massive amounts of text data. These data are produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, a problem that has drawn the interest of many researchers in managing it. It also calls for professionals capable of classifying relevant information, and hence text classification was introduced. Text classification is a challenging task in modern data analysis in which a text document must be assigned to one or more predefined categories or classes. Various techniques are available in the text classification field, such as K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary with the type of words used in the corpus and the type of features created for classification. Most previous attempts have proposed a new algorithm or modified an existing one, and this line of research can be said to have reached its limits for further improvement. In this study, rather than proposing a new algorithm or modifying an existing one, we focus on finding a way to modify the use of data. It is widely known that classifier performance is influenced by the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, and this noisy data can affect the decisions made by the classifiers built from them.
In this study, we consider that data from different domains, that is, heterogeneous data, may have noise-like characteristics that can be utilized in the classification process. Machine learning algorithms build a classifier on the assumption that the characteristics of the training data and the target data are the same or very similar. However, in the case of unstructured data such as text, the features are determined by the vocabulary included in the documents. If the viewpoints of the training data and the target data differ, the features of the two may differ as well. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various kinds of sources are likely to be formatted differently. This causes difficulties for traditional machine learning algorithms, because they were not developed to recognize different types of data representation at one time and to combine them in the same generalization. Therefore, in order to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning. However, unlabeled data may degrade the performance of the document classifier. Therefore, we further propose a method called Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contribute to improving the accuracy of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making.
In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.
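
The abstract does not spell out how RSESLA works internally, so the following is only a minimal sketch of the general idea it describes: train an ensemble over several feature views, and pseudo-label an unlabeled document only when every view agrees with high confidence. The Naive Bayes learner, the two views (`words`, `stems`), the `threshold` value, and the toy documents are all assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on tokenized documents."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {tok for counts in word_counts.values() for tok in counts}
    return class_counts, word_counts, vocab

def predict_proba(model, tokens):
    """Return {label: posterior} with Laplace smoothing; unseen tokens are skipped."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label, n_docs in class_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for tok in tokens:
            if tok in vocab:
                score += math.log((counts[tok] + 1) / denom)
        log_scores[label] = score
    peak = max(log_scores.values())  # subtract the max for numerical stability
    weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}

# Two hypothetical feature views of the same text (assumptions, not from the paper).
def words(text):
    return text.lower().split()

def stems(text):
    return [w[:4] for w in words(text)]  # crude 4-character prefix "stemming"

def self_train(labeled, unlabeled, views, threshold=0.8, max_rounds=3):
    """Pseudo-label documents on which all views agree with confidence >= threshold."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        targets = [y for _, y in labeled]
        models = [train_nb([view(t) for t, _ in labeled], targets) for view in views]
        remaining, added = [], 0
        for text in pool:
            votes = []
            for view, model in zip(views, models):
                proba = predict_proba(model, view(text))
                best = max(proba, key=proba.get)
                votes.append((best, proba[best]))
            agreed = len({label for label, _ in votes}) == 1
            if agreed and min(p for _, p in votes) >= threshold:
                labeled.append((text, votes[0][0]))  # accept the pseudo-label
                added += 1
            else:
                remaining.append(text)  # too uncertain; leave unlabeled
        pool = remaining
        if added == 0:  # nothing new was learned, so stop early
            break
    return labeled, pool

# Toy data: labeled "news" sentences plus unlabeled text from other sources.
labeled = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sports"),
    ("player scores winning goal", "sports"),
]
unlabeled = [
    "football match tonight goal",
    "interest rates and stock market",
    "great day today",
]
final, leftover = self_train(labeled, unlabeled, [words, stems])
```

In this toy run the two on-topic documents are pseudo-labeled and added to the training set, while the off-topic one stays in `leftover`. Filtering of this kind is one way to keep unlabeled data from degrading the classifier.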