• Title/Summary/Keyword: input system

Search Result 11,545, Processing Time 0.051 seconds

A Three-year Study on the Leaf and Soil Nitrogen Contents Influenced by Irrigation Frequency, Clipping Return or Removal and Nitrogen Rate in a Creeping Bentgrass Fairway (크리핑 벤트그라스 훼어웨이에서 관수회수.예지물과 질소시비수준이 엽조직 및 토양 질소함유량에 미치는 효과)

  • 김경남;로버트쉬어만
    • Asian Journal of Turfgrass Science
    • /
    • v.11 no.2
    • /
    • pp.105-115
    • /
    • 1997
  • Responses of 'Penncross' creeping bentgrass turf to various fairway cultural practices are not well-established or supported by research results. This study was initiated to evaluate the effects of irrigation frequency, clipping return or removal, and nitrogen rate on leaf and soil nitrogen con-tent in the 'Penncross' creeping bentgrass (Agrostis palustris Huds.) turf. A 'Penncross' creeping bentgrass turf was established in 1988 on a Sharpsburg silty-clay loam (Typic Argiudoll). The experiment was conducted from 1989 to 1991 under nontraffic conditions. A split-split-plot experimental design was used. Daily or biweekly irrigation, clipping return or removal, and 5, 15, or 25 g N $m-^2$ $yr-^1$ were the main-, sub-, and sub-sub-plot treatments, respectively. Treatments were replicated 3 times in a randomized complete block design. The turf was mowed 4 times weekly at a l3 mm height of cut. Leaf tissue nitrogen content was analyzed twice in 1989 and three times in both 1990 and 1991. Leaf samples were collected from turfgrass plants in the treatment plots, dried immediately at 70˚C for 48 hours, and evaluated for total-N content, using the Kjeldahl method. Concurrently, six soil cores (18mm diam. by 200 mm depth) were collected, air dried, and analyzed for total-N content. Nitrogen analysis on the soil and leaf samples were made in the Soil and Plant Analyical Laboratory, at the University of Nebraska, Lincoln, USA. Data were analyzed as a split-split-plot with analysis of variance (ANOVA), using the General Linear Model procedures of the Statistical Analysis System. The nitrogen content of the leaf tissue is variable in creeping bentgrass fairway turf with clip-ping recycles, nitrogen application rate and time after establishment. Leaf tissue nitrogen content increased with clipping return and nitrogen rate. Plots treated with clipping return had 8% and 5% more nitrogen content in the leaf tissue in 1989 and 1990, respectively, as compared to plots treated with clipping removal. Plots applied with high-N level (25g N $m-^2$ $yr-^1$)had 10%, 17%, and 13% more nitrogen content in leaf tissue in 1989, 1990, and 1991, respectively, when compared with plots applied with low-N level (5g N $m-^2$ $yr-^1$). Overall observations during the study indicated that leaf tissue nitrogen content increased at any nitrogen rate with time after establishment. At the low-N level treatment (5g N $m-^2$ $yr-^1$ ), plots sampled in 1991 had 15% more leaf nitrogen content, as compared to plots sampled in 1989. Similar responses were also found from the high-N level treatment (25g N $m-^2$ $yr-^1$ ).Plots analyzed in 1991 were 18% higher than that of plots analyzed in 1989. No significant treatment effects were observed for soil nitrogen content over the first 3 years after establishment. Strategic management application is necessary for the golf course turf, depending on whether clippings return or not. Different approaches should be addressed to turf fertilization program from a standpoint of clipping recycles. It is recommended that regular analysis of the soil and leaf tissue of golf course turf must be made and fertilization program should be developed through the interpretation of its analytic data result. In golf courses where clippings are recycled, the fertilization program need to be adjusted, being 20% to 30% less nitrogen input over the clipping-removed areas. Key words: Agrostis palustris Huds., 'Penncross' creeping bentgrass fairway, Irrigation frequency, Clipping return, Nitrogen rate, Leaf nitrogen content, Soil nitrogen content.

  • PDF

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.133-148
    • /
    • 2014
  • Recently, with the advent of various information channels, the number of has continued to grow. The main cause of this phenomenon can be found in the significant increase of unstructured data, as the use of smart devices enables users to create data in the form of text, audio, images, and video. In various types of unstructured data, the user's opinion and a variety of information is clearly expressed in text data such as news, reports, papers, and various articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, they not only use text documents as input data, but also use many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict a positive or negative opinion contained in some documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification. Thus, in order to distinguish between the two, a precise definition of each method is needed. In this paper, we found that it is very difficult to distinguish between the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion to distinguish text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon. We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy. We then analyzed 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy compared to the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for the documents with strong certainty was higher than that for the documents with weak certainty. Most of all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Additionally, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. However, this research contributes a performance and comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.

Ecoclimatic Map over North-East Asia Using SPOT/VEGETATION 10-day Synthesis Data (SPOT/VEGETATION NDVI 자료를 이용한 동북아시아의 생태기후지도)

  • Park Youn-Young;Han Kyung-Soo
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.8 no.2
    • /
    • pp.86-96
    • /
    • 2006
  • Ecoclimap-1, a new complete surface parameter global database at a 1-km resolution, was previously presented. It is intended to be used to initialize the soil-vegetation- atmosphere transfer schemes in meteorological and climate models. Surface parameters in the Ecoclimap-1 database are provided in the form of a per-class value by an ecoclimatic base map from a simple merging of land cover and climate maps. The principal objective of this ecoclimatic map is to consider intra-class variability of life cycle that the usual land cover map cannot describe. Although the ecoclimatic map considering land cover and climate is used, the intra-class variability was still too high inside some classes. In this study, a new strategy is defined; the idea is to use the information contained in S10 NDVI SPOT/VEGETATION profiles to split a land cover into more homogeneous sub-classes. This utilizes an intra-class unsupervised sub-clustering methodology instead of simple merging. This study was performed to provide a new ecolimatic map over Northeast Asia in the framework of Ecoclimap-2 global database construction for surface parameters. We used the University of Maryland's 1km Global Land Cover Database (UMD) and a climate map to determine the initial number of clusters for intra-class sub-clustering. An unsupervised classification process using six years of NDVI profiles allows the discrimination of different behavior for each land cover class. We checked the spatial coherence of the classes and, if necessary, carried out an aggregation step of the clusters having a similar NDVI time series profile. From the mapping system, 29 ecosystems resulted for the study area. In terms of climate-related studies, this new ecosystem map may be useful as a base map to construct an Ecoclimap-2 database and to improve the surface climatology quality in the climate model.

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.1-13
    • /
    • 2015
  • As opinion mining in big data applications has been highlighted, a lot of research on unstructured data has made. Lots of social media on the Internet generate unstructured or semi-structured data every second and they are often made by natural or human languages we use in daily life. Many words in human languages have multiple meanings or senses. In this result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, resulting in incorrect search results which are far from users' intentions. Even though a lot of progress in enhancing the performance of search engines has made over the last years in order to provide users with appropriate results, there is still so much to improve it. Word sense disambiguation can play a very important role in dealing with natural language processing and is considered as one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-base, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method which automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries and avoids expensive sense tagging processes. It experiments the effectiveness of the method based on Naïve Bayes Model, which is one of supervised learning algorithms, by using Korean standard unabridged dictionary and Sejong Corpus. Korean standard unabridged dictionary has approximately 57,000 sentences. Sejong Corpus has about 790,000 sentences tagged with part-of-speech and senses all together. For the experiment of this study, Korean standard unabridged dictionary and Sejong Corpus were experimented as a combination and separate entities using cross validation. Only nouns, target subjects in word sense disambiguation, were selected. 93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. Sejong Corpus was easily merged with Korean standard unabridged dictionary because Sejong Corpus was tagged based on sense indices defined by Korean standard unabridged dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating sense vectors were added in the named entity dictionary of Korean morphological analyzer. By using the extended named entity dictionary, term vectors were extracted from the input sentences and then term vectors for the sentences were created. Given the extracted term vector and the sense vector model made during the pre-processing stage, the sense-tagged terms were determined by the vector space model based word sense disambiguation. In addition, this study shows the effectiveness of merged corpus from examples in Korean standard unabridged dictionary and Sejong Corpus. The experiment shows the better results in precision and recall are found with the merged corpus. This study suggests it can practically enhance the performance of internet search engines and help us to understand more accurate meaning of a sentence in natural language processing pertinent to search engines, opinion mining, and text mining. Naïve Bayes classifier used in this study represents a supervised learning algorithm and uses Bayes theorem. Naïve Bayes classifier has an assumption that all senses are independent. Even though the assumption of Naïve Bayes classifier is not realistic and ignores the correlation between attributes, Naïve Bayes classifier is widely used because of its simplicity and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research need to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence. Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.

Ontology-Based Process-Oriented Knowledge Map Enabling Referential Navigation between Knowledge (지식 간 상호참조적 네비게이션이 가능한 온톨로지 기반 프로세스 중심 지식지도)

  • Yoo, Kee-Dong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.61-83
    • /
    • 2012
  • A knowledge map describes the network of related knowledge into the form of a diagram, and therefore underpins the structure of knowledge categorizing and archiving by defining the relationship of the referential navigation between knowledge. The referential navigation between knowledge means the relationship of cross-referencing exhibited when a piece of knowledge is utilized by a user. To understand the contents of the knowledge, a user usually requires additionally information or knowledge related with each other in the relation of cause and effect. This relation can be expanded as the effective connection between knowledge increases, and finally forms the network of knowledge. A network display of knowledge using nodes and links to arrange and to represent the relationship between concepts can provide a more complex knowledge structure than a hierarchical display. Moreover, it can facilitate a user to infer through the links shown on the network. For this reason, building a knowledge map based on the ontology technology has been emphasized to formally as well as objectively describe the knowledge and its relationships. As the necessity to build a knowledge map based on the structure of the ontology has been emphasized, not a few researches have been proposed to fulfill the needs. However, most of those researches to apply the ontology to build the knowledge map just focused on formally expressing knowledge and its relationships with other knowledge to promote the possibility of knowledge reuse. Although many types of knowledge maps based on the structure of the ontology were proposed, no researches have tried to design and implement the referential navigation-enabled knowledge map. This paper addresses a methodology to build the ontology-based knowledge map enabling the referential navigation between knowledge. The ontology-based knowledge map resulted from the proposed methodology can not only express the referential navigation between knowledge but also infer additional relationships among knowledge based on the referential relationships. The most highlighted benefits that can be delivered by applying the ontology technology to the knowledge map include; formal expression about knowledge and its relationships with others, automatic identification of the knowledge network based on the function of self-inference on the referential relationships, and automatic expansion of the knowledge-base designed to categorize and store knowledge according to the network between knowledge. To enable the referential navigation between knowledge included in the knowledge map, and therefore to form the knowledge map in the format of a network, the ontology must describe knowledge according to the relation with the process and task. A process is composed of component tasks, while a task is activated after any required knowledge is inputted. Since the relation of cause and effect between knowledge can be inherently determined by the sequence of tasks, the referential relationship between knowledge can be circuitously implemented if the knowledge is modeled to be one of input or output of each task. To describe the knowledge with respect to related process and task, the Protege-OWL, an editor that enables users to build ontologies for the Semantic Web, is used. An OWL ontology-based knowledge map includes descriptions of classes (process, task, and knowledge), properties (relationships between process and task, task and knowledge), and their instances. Given such an ontology, the OWL formal semantics specifies how to derive its logical consequences, i.e. facts not literally present in the ontology, but entailed by the semantics. Therefore a knowledge network can be automatically formulated based on the defined relationships, and the referential navigation between knowledge is enabled. To verify the validity of the proposed concepts, two real business process-oriented knowledge maps are exemplified: the knowledge map of the process of 'Business Trip Application' and 'Purchase Management'. By applying the 'DL-Query' provided by the Protege-OWL as a plug-in module, the performance of the implemented ontology-based knowledge map has been examined. Two kinds of queries to check whether the knowledge is networked with respect to the referential relations as well as the ontology-based knowledge network can infer further facts that are not literally described were tested. The test results show that not only the referential navigation between knowledge has been correctly realized, but also the additional inference has been accurately performed.

Bankruptcy Prediction Modeling Using Qualitative Information Based on Big Data Analytics (빅데이터 기반의 정성 정보를 활용한 부도 예측 모형 구축)

  • Jo, Nam-ok;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.2
    • /
    • pp.33-56
    • /
    • 2016
  • Many researchers have focused on developing bankruptcy prediction models using modeling techniques, such as statistical methods including multiple discriminant analysis (MDA) and logit analysis or artificial intelligence techniques containing artificial neural networks (ANN), decision trees, and support vector machines (SVM), to secure enhanced performance. Most of the bankruptcy prediction models in academic studies have used financial ratios as main input variables. The bankruptcy of firms is associated with firm's financial states and the external economic situation. However, the inclusion of qualitative information, such as the economic atmosphere, has not been actively discussed despite the fact that exploiting only financial ratios has some drawbacks. Accounting information, such as financial ratios, is based on past data, and it is usually determined one year before bankruptcy. Thus, a time lag exists between the point of closing financial statements and the point of credit evaluation. In addition, financial ratios do not contain environmental factors, such as external economic situations. Therefore, using only financial ratios may be insufficient in constructing a bankruptcy prediction model, because they essentially reflect past corporate internal accounting information while neglecting recent information. Thus, qualitative information must be added to the conventional bankruptcy prediction model to supplement accounting information. Due to the lack of an analytic mechanism for obtaining and processing qualitative information from various information sources, previous studies have only used qualitative information. However, recently, big data analytics, such as text mining techniques, have been drawing much attention in academia and industry, with an increasing amount of unstructured text data available on the web. A few previous studies have sought to adopt big data analytics in business prediction modeling. Nevertheless, the use of qualitative information on the web for business prediction modeling is still deemed to be in the primary stage, restricted to limited applications, such as stock prediction and movie revenue prediction applications. Thus, it is necessary to apply big data analytics techniques, such as text mining, to various business prediction problems, including credit risk evaluation. Analytic methods are required for processing qualitative information represented in unstructured text form due to the complexity of managing and processing unstructured text data. This study proposes a bankruptcy prediction model for Korean small- and medium-sized construction firms using both quantitative information, such as financial ratios, and qualitative information acquired from economic news articles. The performance of the proposed method depends on how well information types are transformed from qualitative into quantitative information that is suitable for incorporating into the bankruptcy prediction model. We employ big data analytics techniques, especially text mining, as a mechanism for processing qualitative information. The sentiment index is provided at the industry level by extracting from a large amount of text data to quantify the external economic atmosphere represented in the media. The proposed method involves keyword-based sentiment analysis using a domain-specific sentiment lexicon to extract sentiment from economic news articles. The generated sentiment lexicon is designed to represent sentiment for the construction business by considering the relationship between the occurring term and the actual situation with respect to the economic condition of the industry rather than the inherent semantics of the term. The experimental results proved that incorporating qualitative information based on big data analytics into the traditional bankruptcy prediction model based on accounting information is effective for enhancing the predictive performance. The sentiment variable extracted from economic news articles had an impact on corporate bankruptcy. In particular, a negative sentiment variable improved the accuracy of corporate bankruptcy prediction because the corporate bankruptcy of construction firms is sensitive to poor economic conditions. The bankruptcy prediction model using qualitative information based on big data analytics contributes to the field, in that it reflects not only relatively recent information but also environmental factors, such as external economic conditions.

Measurement of Backscattering Coefficients of Rice Canopy Using a Ground Polarimetric Scatterometer System (지상관측 레이다 산란계를 이용한 벼 군락의 후방산란계수 측정)

  • Hong, Jin-Young;Kim, Yi-Hyun;Oh, Yi-Sok;Hong, Suk-Young
    • Korean Journal of Remote Sensing
    • /
    • v.23 no.2
    • /
    • pp.145-152
    • /
    • 2007
  • The polarimetric backscattering coefficients of a wet-land rice field which is an experimental plot belong to National Institute of Agricultural Science and Technology in Suwon are measured using ground-based polarimetric scatterometers at 1.8 and 5.3 GHz throughout a growth year from transplanting period to harvest period (May to October in 2006). The polarimetric scatterometers consist of a vector network analyzer with time-gating function and polarimetric antenna set, and are well calibrated to get VV-, HV-, VH-, HH-polarized backscattering coefficients from the measurements, based on single target calibration technique using a trihedral corner reflector. The polarimetric backscattering coefficients are measured at $30^{\circ},\;40^{\circ},\;50^{\circ}\;and\;60^{\circ}$ with 30 independent samples for each incidence angle at each frequency. In the measurement periods the ground truth data including fresh and dry biomass, plant height, stem density, leaf area, specific leaf area, and moisture contents are also collected for each measurement. The temporal variations of the measured backscattering coefficients as well as the measured plant height, LAI (leaf area index) and biomass are analyzed. Then, the measured polarimetric backscattering coefficients are compared with the rice growth parameters. The measured plant height increases monotonically while the measured LAI increases only till the ripening period and decreases after the ripening period. The measured backscattering coefficientsare fitted with polynomial expressions as functions of growth age, plant LAI and plant height for each polarization, frequency, and incidence angle. As the incidence angle is bigger, correlations of L band signature to the rice growth was higher than that of C band signatures. It is found that the HH-polarized backscattering coefficients are more sensitive than the VV-polarized backscattering coefficients to growth age and other input parameters. It is necessary to divide the data according to the growth period which shows the qualitative changes of growth such as panicale initiation, flowering or heading to derive functions to estimate rice growth.

A study of Artificial Intelligence (AI) Speaker's Development Process in Terms of Social Constructivism: Focused on the Products and Periodic Co-revolution Process (인공지능(AI) 스피커에 대한 사회구성 차원의 발달과정 연구: 제품과 시기별 공진화 과정을 중심으로)

  • Cha, Hyeon-ju;Kweon, Sang-hee
    • Journal of Internet Computing and Services
    • /
    • v.22 no.1
    • /
    • pp.109-135
    • /
    • 2021
  • his study classified the development process of artificial intelligence (AI) speakers through analysis of the news text of artificial intelligence (AI) speakers shown in traditional news reports, and identified the characteristics of each product by period. The theoretical background used in the analysis are news frames and topic frames. As analysis methods, topic modeling and semantic network analysis using the LDA method were used. The research method was a content analysis method. From 2014 to 2019, 2710 news related to AI speakers were first collected, and secondly, topic frames were analyzed using Nodexl algorithm. The result of this study is that, first, the trend of topic frames by AI speaker provider type was different according to the characteristics of the four operators (communication service provider, online platform, OS provider, and IT device manufacturer). Specifically, online platform operators (Google, Naver, Amazon, Kakao) appeared as a frame that uses AI speakers as'search or input devices'. On the other hand, telecommunications operators (SKT, KT) showed prominent frames for IPTV, which is the parent company's flagship business, and 'auxiliary device' of the telecommunication business. Furthermore, the frame of "personalization of products and voice service" was remarkable for OS operators (MS, Apple), and the frame for IT device manufacturers (Samsung) was "Internet of Things (IoT) Integrated Intelligence System". The econd, result id that the trend of the topic frame by AI speaker development period (by year) showed a tendency to develop around AI technology in the first phase (2014-2016), and in the second phase (2017-2018), the social relationship between AI technology and users It was related to interaction, and in the third phase (2019), there was a trend of shifting from AI technology-centered to user-centered. As a result of QAP analysis, it was found that news frames by business operator and development period in AI speaker development are socially constituted by determinants of media discourse. The implication of this study was that the evolution of AI speakers was found by the characteristics of the parent company and the process of co-evolution due to interactions between users by business operator and development period. The implications of this study are that the results of this study are important indicators for predicting the future prospects of AI speakers and presenting directions accordingly.

A Study on the Designer's Post-Evaluation of Gyeongui Line Forest Park Based on Ground Theory - Focused on Yeonnam-dong Section - (근거이론을 활용한 설계자의 경의선숲길공원 사후평가 - 연남동 구간을 중심으로 -)

  • Kim, Eun-Young;Hong, Youn-Soon
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.47 no.3
    • /
    • pp.39-48
    • /
    • 2019
  • This research is based on the analysis of in-depth interviews of designers who participated in the design of the Yeonnam-dong section, which was completed in 2016. The case study site has received many domestic and foreign awards and is receiving very positive reviews from actual users. 53 concepts were derived from the open coding of the ground theory methodology. Thirty-four higher categories incorporated the concepts and 18 higher categories that reintegrated them. Later, the six categories of the ground theory were interpreted as the paradigm, and it was determined that the aspects of 'will of client' and 'work efficiency', 'site resources' and 'field manager's specialty' were the categories that had the greatest positive impact on the park construction. The key category of this park's construction was interpreted as "a park-construction model with active empathy and communication." The results of the study and are linked to the following research proposals. First, the need to improve the trust between the client and the landscape designer and the need to improve the customary administrative procedures; second, the importance of the input of landscape experts into the park construction process; third, the importance of all efforts to develop the design; fourth, the importance of on-site circular resources and landscape preservation; and fifth active social participation to increase the opportunity. This study, which seeks to grasp the facts that existed behind the park's construction, which received excellent internal and external evaluations, and has a qualitative, objective and structural interpretation of the social network related to the park's construction, in contrast to the conventional quantitative post-evaluation. It is expected that the administration and system improvements related to landscaping will be further improved through the continuation of in-depth post-evaluation studies.

Label Embedding for Improving Classification Accuracy UsingAutoEncoderwithSkip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.175-197
    • /
    • 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis is being actively conducted, and it is showing remarkable results in various fields such as classification, summary, and generation. Among various text analysis fields, text classification is the most widely used technology in academia and industry. Text classification includes binary class classification with one label among two classes, multi-class classification with one label among several classes, and multi-label classification with multiple labels among several classes. In particular, multi-label classification requires a different training method from binary class classification and multi-class classification because of the characteristic of having multiple labels. In addition, since the number of labels to be predicted increases as the number of labels and classes increases, there is a limitation in that performance improvement is difficult due to an increase in prediction difficulty. To overcome these limitations, (i) compressing the initially given high-dimensional label space into a low-dimensional latent label space, (ii) after performing training to predict the compressed label, (iii) restoring the predicted label to the high-dimensional original label space, research on label embedding is being actively conducted. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, since these techniques consider only the linear relationship between labels or compress the labels by random transformation, it is difficult to understand the non-linear relationship between labels, so there is a limitation in that it is not possible to create a latent label space sufficiently containing the information of the original label. Recently, there have been increasing attempts to improve performance by applying deep learning technology to label embedding. Label embedding using an autoencoder, a deep learning model that is effective for data compression and restoration, is representative. However, the traditional autoencoder-based label embedding has a limitation in that a large amount of information loss occurs when compressing a high-dimensional label space having a myriad of classes into a low-dimensional latent label space. This can be found in the gradient loss problem that occurs in the backpropagation process of learning. To solve this problem, skip connection was devised, and by adding the input of the layer to the output to prevent gradient loss during backpropagation, efficient learning is possible even when the layer is deep. Skip connection is mainly used for image feature extraction in convolutional neural networks, but studies using skip connection in autoencoder or label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to each of the encoder and decoder to form a low-dimensional latent label space that reflects the information of the high-dimensional label space well. In addition, the proposed methodology was applied to actual paper keywords to derive the high-dimensional keyword label space and the low-dimensional latent label space. Using this, we conducted an experiment to predict the compressed keyword vector existing in the latent label space from the paper abstract and to evaluate the multi-label classification by restoring the predicted keyword vector back to the original label space. As a result, the accuracy, precision, recall, and F1 score used as performance indicators showed far superior performance in multi-label classification based on the proposed methodology compared to traditional multi-label classification methods. This can be seen that the low-dimensional latent label space derived through the proposed methodology well reflected the information of the high-dimensional label space, which ultimately led to the improvement of the performance of the multi-label classification itself. In addition, the utility of the proposed methodology was identified by comparing the performance of the proposed methodology according to the domain characteristics and the number of dimensions of the latent label space.