• Title/Summary/Keyword: Knowledge Structures

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has gained attention, a great deal of research has been conducted on unstructured data. Social media on the Internet generate unstructured or semi-structured data every second, often written in the natural languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which can return results far from users' intentions. Although much progress has been made in recent years in enhancing search engine performance to provide users with appropriate results, there is still considerable room for improvement. Word sense disambiguation plays a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, avoiding expensive sense-tagging processes. It evaluates the effectiveness of the method with a Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. In the experiment, the Korean standard unabridged dictionary and the Sejong Corpus were evaluated both in combination and as separate entities using cross validation. Only nouns, the target subjects in word sense disambiguation, were selected. 93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because the Sejong Corpus was tagged based on sense indices defined by that dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, terms were extracted from the input sentences and term vectors for the sentences were created. Given the extracted term vector and the sense vector model built during the pre-processing stage, the sense-tagged terms were determined by vector space model based word sense disambiguation. In addition, this study shows the effectiveness of the merged corpus built from examples in the Korean standard unabridged dictionary and the Sejong Corpus. The experiment shows that better precision and recall are obtained with the merged corpus. The study suggests that the method can practically enhance the performance of internet search engines and help capture the meaning of a sentence more accurately in natural language processing tasks pertinent to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. The Naïve Bayes classifier assumes that all senses are independent. Even though this assumption is not realistic and ignores the correlation between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence. Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
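
The supervised step described in this abstract can be illustrated with a minimal sketch, assuming the textbook Naïve Bayes formulation in which context words are treated as conditionally independent given the sense. The toy English examples, the sense labels, and the function names `train` and `disambiguate` are hypothetical and are not taken from the paper, which works on Korean dictionary examples and the Sejong Corpus.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Count senses and context words from (sense, context_tokens) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, tokens in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab

def disambiguate(model, tokens):
    """Return the sense with the highest log posterior (add-one smoothing)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][t] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical sense-tagged examples standing in for dictionary example sentences.
examples = [
    ("bank/finance", ["deposit", "money", "account"]),
    ("bank/finance", ["loan", "money", "interest"]),
    ("bank/river", ["river", "water", "fishing"]),
    ("bank/river", ["muddy", "river", "shore"]),
]
model = train(examples)
print(disambiguate(model, ["money", "loan"]))
print(disambiguate(model, ["river", "shore"]))
```

In this sketch, dictionary example sentences would play the role of `examples`, which is how a sense-tagged training set could be obtained without manual tagging.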

Fluid-mud deposits in the Early Cretaceous McMurray Formation, Alberta, Canada (캐나다 앨버타주 전기 백악기 맥머레이층의 유성이토 퇴적층)

  • Oh, Juhyeon;Jo, Hyung Rae
    • Journal of the Geological Society of Korea / v.54 no.5 / pp.477-488 / 2018
  • Fluid muds commonly occur in estuarine environments, but ancient examples have rarely been studied in terms of depositional characteristics and processes. Cores of estuarine channel deposits of the Early Cretaceous McMurray Formation, Alberta, Canada, show various mudstone layers that possess depositional characteristics of high clay-concentration flows. These mudstone layers are examined in detail through microscopic observation of thin sections and classified into three microfacies (<1 to 25 mm thick) on the basis of sedimentary texture and structures. Structureless mudstone (Microfacies 1) consists mainly of clay particles and contains randomly dispersed coarser grains (coarse silt to fine sand). This microfacies is interpreted as having been deposited by cohesive mud flows, i.e., fluid muds, which possessed sufficient strength to support suspended coarser grains (quasi-laminar plug flow). Silt-streaked mudstone (Microfacies 2) mainly comprises mudstone with dispersed coarse grains and includes very thin, discontinuous silt streaks of coarse-silt to very-fine-sand grains. Its texture, similar to that of Microfacies 1, indicates that Microfacies 2 was also deposited by cohesive fluid muds. The silt streaks, however, suggest the presence of intermittent weak turbulence under the plug (upper transitional plug flow). Heterolithic laminated mudstone (Microfacies 3) is characterized by alternation of relatively thick silt laminae and much thinner clay laminae. It is either parallel-laminated or low-angle cross-laminated, occasionally showing low-amplitude ripple forms. The heterolithic laminae are interpreted as the result of shear sorting in the basal turbulent zone under a cohesive plug. They may represent low-amplitude bed-waves formed under lower transitional plug flows. These three microfacies reflect a range of flow phases of fluid muds, which change with flow velocity and suspended mud concentration. The results of this study provide important knowledge for recognizing fluid-mud deposits in ancient sequences and for better understanding the depositional processes of mudstones.

A Review of the Changes Made to the Sites of Hwangnyongsa Temple during the Unified Silla and Goryeo Periods (통일신라~고려시대 황룡사 사역의 변화과정 검토)

  • JEONG, Yeoseon
    • Korean Journal of Heritage: History & Science / v.55 no.1 / pp.265-280 / 2022
  • Hwangnyongsa Temple was a large Buddhist monastery of Silla that existed for about 685 years. The temple site underwent a series of excavations from 1976 to 1983, during which it was discovered that its layout consisted of one pagoda and three main dharma halls. This discovery also led to the production of four artistic depictions of the temple at various times from its foundation to its final phase. Previous studies on the architectural layout of Hwangnyongsa Temple have largely focused on the inner sanctuary ("Buddha's Land"). A focus on the temple's main architectural structures may be natural for those interested in the origins of and background to its establishment, but studies of its outer sanctuary ("Sangha's Land") must come first to acquire a deeper knowledge of the architectural layout of the temple as a whole. To gain a comprehensive understanding of the entire layout of Buddhist monasteries of the Silla dynasty, including both their inner and outer sanctuaries, studies of Hwangnyongsa Temple are essential, as it was once the kingdom's most highly honored temple. Studies of Korean Buddhist monasteries of the Three Kingdoms Period have produced only a limited amount of information concerning the outer sanctuary, resulting in little evidence about the exact scope of a temple's sanctuary. Meanwhile, the excavations of the Hwangnyongsa Temple site have revealed the archaeological features of the walls that divided the monastery from its neighboring facilities, thus helping to delineate the size of the temple site. The excavations have revealed the boundaries between the inner and outer sanctuaries of Hwangnyongsa Temple, as well as the entire temple precincts and the exterior, providing valuable information about the changes made to the layout of the temple. This study focuses on the changes made to the sanctuary of Hwangnyongsa Temple during the Unified Silla and Goryeo Periods, particularly in relation to the architectural layout of the temple. The discussion is based on a review of the periods in which the Nammunji (South Gate site) was built, which provides tangible evidence of the expansion of the temple to the south, and of the walls enclosing the temple precincts on the four sides and the changes that occurred afterwards. As a result, the study concludes that both the inner and outer sanctuaries of the temple probably changed through the 1st and 3rd. It also concludes that the changes made to the architectural layout of Hwangnyongsa Temple were intended not only to alter the scope of the temple but were also closely associated with the politico-geographical significance of its location at the center of the royal capital of Silla and the urban archaeological remains around it.

Export Prediction Using Separated Learning Method and Recommendation of Potential Export Countries (분리학습 모델을 이용한 수출액 예측 및 수출 유망국가 추천)

  • Jang, Yeongjin;Won, Jongkwan;Lee, Chaerok
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.69-88 / 2022
  • One of the characteristics of South Korea's economic structure is that it is highly dependent on exports. Thus, many businesses are closely tied to the global economy and the diplomatic situation. In addition, small and medium-sized enterprises (SMEs) specializing in exports are struggling due to the spread of COVID-19. Therefore, this study aimed to develop a model that forecasts exports for the following year to support SMEs' export strategy and decision making. The study also proposed a strategy for recommending promising export countries for each item based on the forecasting model. We analyzed important variables used in previous studies, such as country-specific, item-specific, and macro-economic variables, and collected those variables to train our prediction model. Next, exploratory data analysis (EDA) showed that exports, the target variable, have a highly skewed distribution. To deal with this issue and improve predictive performance, we suggest a separated learning method. In a separated learning method, the whole dataset is divided into homogeneous subgroups and a prediction algorithm is applied to each group. Thus, the characteristics of each group can be learned more precisely using different input variables and algorithms. In this study, we divided the dataset into five subgroups based on exports to decrease the skewness of the target variable. After the separation, we found that each group has different characteristics in countries and goods. For example, in Group 1, most of the exporting countries are developing countries and the majority of exported goods are low-value products such as glass and prints. On the other hand, South Korea's major export destinations such as China, the USA, and Vietnam are included in Group 4 and Group 5, and most exported goods in these groups are high-value products. We then used LightGBM (LGBM) and Exponential Moving Average (EMA) for prediction. Considering the characteristics of each group, models were built using LGBM for Groups 1 to 4 and EMA for Group 5. To evaluate the performance of the model, we compared different model structures and algorithms. As a result, the separated learning model was found to have the best performance compared with the other models. After the model was built, we also provided the variable importance for each group using SHAP values to add explainability to our model. Based on the prediction model, we proposed a two-stage recommendation strategy for potential export countries. In the first phase, the BCG matrix was used to find Star and Question Mark markets that are expected to grow rapidly. In the second phase, we calculated scores for each country and made recommendations according to ranking. Using this recommendation framework, potential export countries were selected and information about those countries for each item was presented. There are several implications of this study. First, most preceding studies have examined a specific situation or country, whereas this study uses various variables and develops a machine learning model for a wide range of countries and items. Second, to our knowledge, it is the first attempt to adopt a separated learning method for export prediction. By separating the dataset into five homogeneous subgroups, we could enhance the predictive performance of the model. A more detailed explanation of the models by group is also provided using SHAP values. Lastly, this study has several practical implications. There are platforms that provide trade information, including KOTRA, but most of them are based on past data, so it is not easy for companies to predict future trends. By utilizing the model and recommendation strategy in this research, trade-related services on each platform can be improved so that companies, including SMEs, can fully utilize the service when making export strategies and decisions.
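
The separated learning idea and the first recommendation phase described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the five groups were cut, so quantile thresholds are assumed here; a naive last-value forecast stands in for the LightGBM models of Groups 1 to 4; and the BCG cut-off values are hypothetical. All function names are invented for the example.

```python
from bisect import bisect_right

def quantile_thresholds(values, n_groups=5):
    """Cut points splitting the values into n_groups roughly equal bins (assumed scheme)."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_groups] for k in range(1, n_groups)]

def assign_group(value, thresholds):
    """Return a 1-based group index for an export amount."""
    return bisect_right(thresholds, value) + 1

def ema(series, alpha=0.5):
    """Exponential moving average; the final smoothed value serves as the forecast."""
    smoothed = series[0]
    for x in series[1:]:
        smoothed = alpha * x + (1 - alpha) * smoothed
    return smoothed

def forecast(history, thresholds, top_group=5):
    """Dispatch to a per-group predictor based on the latest export amount."""
    group = assign_group(history[-1], thresholds)
    if group == top_group:
        return group, ema(history)
    # Naive last-value stand-in; the paper trains LightGBM for these groups.
    return group, history[-1]

def bcg_quadrant(growth, share, growth_cut=0.1, share_cut=1.0):
    """Classic BCG quadrants; both cut-offs are hypothetical."""
    if growth >= growth_cut:
        return "Star" if share >= share_cut else "Question Mark"
    return "Cash Cow" if share >= share_cut else "Dog"

thresholds = quantile_thresholds(range(1, 11))
print(thresholds)
print(forecast([100, 200, 300], thresholds))
print(bcg_quadrant(growth=0.2, share=0.5))
```

Splitting on the target before modelling lets each subgroup be fitted on a narrower, less skewed range, which is the motivation the abstract gives for the method.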

Present Status and Prospect of Valuation for Tangible Fixed Asset in South Korea (유형고정자산 가치평가 현황: 우리나라 사례를 중심으로)

  • Jin-Hyung Cho;Hyun-Seung O;Sae-Jae Lee
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.1 / pp.91-104 / 2023
  • The records system is believed to have started in Italy in the 14th century in line with trade developments in Europe. In 1491, Luca Pacioli, a mathematician and Italian Franciscan monk, wrote the first book that described double-entry accounting processes. In many countries, including Korea, government accounting standards used single-entry bookkeeping rather than double-entry bookkeeping, which can be aggregated by account subject. The cash-based, single-entry bookkeeping used by the government in the past had limitations in providing clear information on financial status and in establishing a performance-oriented financial management system. Accordingly, the National Accounting Act (promulgated in October 2007) stipulated the introduction of double-entry bookkeeping and accrual accounting in the government sector from January 1, 2009. Furthermore, the Korean government has also introduced International Financial Reporting Standards (IFRS) and the System of National Accounts (SNA). Since 2014, Korea has had five national accounts. In Korea, valuation began with the 1968 National Wealth Statistics Survey. The national wealth statistics, which had been compiled through due diligence every 10 years since 1968, have their academic origins in the 'Engineering Valuation' of Professor Marston of the Department of Industrial Engineering at Iowa State University in the 1930s. This field has since spread to economics and other disciplines. In economics, it became the basis of capital stock estimation for positive economics such as econometrics. Valuation by the National Wealth Statistics Survey contributed greatly to converting the book value of accounting data into vintage data. In 2000, the National Statistical Office collected actual disposal data for the 1-digit asset classes and obtained the average service life (ASL) using the Iowa curve. Then, with the data on fixed capital formation, the national wealth statistics were prepared by the Perpetual Inventory Method (PIM), centered on the National B/S Team of the Bank of Korea. Assets were classified into 59 types, including 2 types of residential buildings, 4 types of non-residential buildings, 14 types of structures, 9 types of transportation equipment, 28 types of machinery, and 2 types of intangible fixed assets. The tables of useful lives of tangible fixed assets published by the Korea Appraisal Board in 1999 and 2013 were produced by the Iowa curve method. In Korea, the Iowa curve method has been adopted as the method of ASL estimation. There are three types of Iowa curve method. Of the three, the retirement rate method is the best because it is based on the collection and compilation of data on all properties in service during a recent period, both those retired and those still in service. We hope that the retirement rate method, rather than the individual unit method, will be used in the estimation of ASL. The Korean government's accounting system has recently been developed further. Because revenue expenditure and capital expenditure were mixed under the former single-entry bookkeeping, we suggest that the BOK and the National Statistical Office accumulate knowledge on a rational distinction between revenue expenditure and capital expenditure. This is particularly important when capital stock is estimated by PIM. Korea also needs an empirical study on economic depreciation, like Hulten & Wykoff Catalog A of the US BEA.
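
The Perpetual Inventory Method mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest textbook form, in which the stock at the end of each period equals the depreciated previous stock plus new fixed capital formation, with a constant geometric depreciation rate. The national statistics described above rely on service lives estimated from Iowa curves instead, so the rate, the figures, and the function name `perpetual_inventory` are purely illustrative.

```python
def perpetual_inventory(investments, depreciation_rate, initial_stock=0.0):
    """Accumulate capital stock: K_t = (1 - d) * K_{t-1} + I_t.

    investments: fixed capital formation per period (hypothetical figures)
    depreciation_rate: constant geometric rate d, an assumption of this sketch
    """
    stock = initial_stock
    path = []
    for investment in investments:
        stock = (1.0 - depreciation_rate) * stock + investment
        path.append(stock)
    return path

# Three periods of equal investment at a 10% depreciation rate.
for year, stock in enumerate(perpetual_inventory([100.0, 100.0, 100.0], 0.1), start=1):
    print(year, round(stock, 2))
```

The recursion shows why the split between revenue expenditure and capital expenditure matters: only capital expenditure enters `investments`, so a misclassification carries forward into every later period's stock.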