• Title/Summary/Keyword: Attempts


Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.19-41
    • /
    • 2019
  • According to the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of the analysis. Until recently, text mining research focused on applications of the second step. However, with the recognition that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve analysis quality by preserving the meaning of words and documents when representing text data as vectors. Unlike structured data, to which a variety of operations and traditional analysis techniques can be applied directly, unstructured text must first be structured into a form that the computer can process. "Embedding" refers to mapping arbitrary objects into a space of a specific dimension while maintaining their algebraic properties. Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into a single vector, is the most widely used. However, traditional document embedding methods represented by doc2Vec generate a vector for each document using all the words contained in the document.
This causes the limitation that the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single vector, so it is difficult to accurately represent a complex document covering multiple subjects. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords; for a document without keywords, the method can be applied after extracting keywords through various analysis techniques. Since keyword extraction is not the core subject of the proposed method, however, we describe the process as applied to documents with predefined keywords. The proposed method consists of (1) parsing, (2) word embedding, (3) keyword vector extraction, (4) keyword clustering, and (5) multiple-vector generation. The specific process is as follows. All text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to avoid the influence of miscellaneous words, the vectors corresponding to each document's keywords are extracted to form a set of keyword vectors per document. Next, clustering is conducted on each document's keyword set to identify the multiple subjects included in the document. Finally, multiple vectors are generated from the keyword vectors constituting each cluster. Experiments on 3,147 academic papers revealed that the single-vector-based traditional approach cannot properly map complex documents because of interference among subjects within each vector.
Using the proposed multi-vector-based method, we confirmed that complex documents can be vectorized more accurately because interference among subjects is eliminated.
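The five-step pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: random vectors stand in for a trained word2vec model, and the names (`word_vec`, `doc_keywords`) and the assumed number of subjects (k = 2) are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
N = 50  # embedding dimension

# (1)-(2) Parsing + word embedding: map each token to an N-dim real vector.
# (Random vectors here; a real pipeline would use a trained word2vec model.)
vocab = ["neural", "network", "embedding", "archive", "record", "disclosure"]
word_vec = {w: rng.normal(size=N) for w in vocab}

# (3) Keyword vector extraction: keep only this document's keyword vectors,
# so miscellaneous words do not influence the document representation.
doc_keywords = ["neural", "network", "embedding", "archive", "record"]
keyword_vecs = np.stack([word_vec[w] for w in doc_keywords])

# (4) Keyword clustering: identify the multiple subjects in the document.
k = 2  # assumed number of subjects in this toy document
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(keyword_vecs)

# (5) Multiple-vector generation: one vector per cluster (mean of its keywords).
doc_vectors = np.stack([keyword_vecs[labels == c].mean(axis=0) for c in range(k)])
print(doc_vectors.shape)  # one N-dim vector per detected subject
```

The key design point is step (5): a complex document ends up represented by k vectors, one per subject cluster, instead of one vector averaged over everything.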

An Examination of Dongbeomwas of Convex Roofing Tiles (수막새의 동범와(同范瓦)에 대한 검토 - 월성해자 출토 단판연화문 수막새를 중심으로 -)

  • Lee, Seonhui
    • Korean Journal of Heritage: History & Science
    • /
    • v.39
    • /
    • pp.59-93
    • /
    • 2006
  • Wolseong in Gyeongju is a historic fortress site of Silla, constructed under the reign of Pasanisageum, that played politically and militarily important roles. The moat surrounding Wolseong protected the fortress in wartime but became part of a garden in the Unified Silla era. Many relics have been excavated from the Wolseong moat since 1985. Among them, convex roofing tiles of great number and variety are regarded as invaluable sources showing different aspects of Silla from its earlier period through unification and beyond. Roofing tiles were widely used for state buildings such as royal palaces, temples, and fortresses, as well as for other common architecture, and have been excavated in far greater numbers than any other relics. Research on them, however, has been scarce. Vigorous study is now in progress as the number of roofing tiles from recent excavations increases, though it has been limited to the general genealogy of patterns and manufacturing processes. This essay therefore seeks to identify which of the convex tiles with a unilobed lotus-flower pattern excavated from the Wolseong moat are dongbeomwas, roofing tiles from the same mold. It also attempts to distinguish dongbeomwas by examining detailed characteristics of roofing tiles that have been confusingly termed yusawa (similar roofing tiles) or donghyeongwa (roofing tiles of the same shape). The significance of identifying dongbeomwas lies in what such research yields: correct methods of identification, their chronological sequence, and their excavation sites. In conclusion, dongbeomwas were identified among many kinds of convex tiles. Those excavated from the same site share common features, and the sites where they were found indicate what changes occurred over time and what relations they had with the neighboring Anapji.
Since roofing-tile molds have not yet been found, the only way to identify dongbeomwas is to examine the details of the roofing tiles themselves. Dongbeomwas excavated in the Wolseong moat help to date each district of the moat. Meanwhile, it should be noted that the term 'dongbeomwa' should be used only after careful examination.

A Study on Institutional Reliability of Open Record Information in the Information Disclosure System (정보공개제도에서 공개 기록정보의 제도적 신뢰성에 관한 연구)

  • Lee, Bo-ram;Lee, Young-hak
    • The Korean Journal of Archival Studies
    • /
    • no.35
    • /
    • pp.41-91
    • /
    • 2013
  • The policy system has grown through numerous stages since its legal systematization with the enactment of the Act on Information Disclosure by Public Institutions in 1996 and the Act on Records Management by Public Institutions in 1999, along with infrastructure advancement led by government bodies, but the information disclosure system and records management still show insufficiency in some respects. In particular, the reliability of record information disclosed through the information disclosure system has been questioned, and the institutional base of legal and technical devices to ensure that reliability is not well prepared. The government has attempted to enact laws and regulations guaranteeing the public's right to know through information disclosure and records management, and to establish a national system that advances the infrastructure for encouraging participation in state affairs and utilization of national record information resources. However, by focusing on system expansion and disclosing information quantitatively, these efforts lack internal stability and overlook the impact and significance of record information itself. Much disclosed record information tends to be falsified, forged, extracted, or manufactured by information disclosure staff, or provided in a form other than an official document or draft. In addition, disclosure or non-disclosure decisions made without consistency or criteria, due to a lack of information disclosure staff or a merely titular supervising authority, are likely to lead to social confusion. There are also frequent cases in which reliability is damaged by arbitrary decisions, false responses, or non-responses depending on the agent requesting information disclosure. In other cases, vague requests by applicants or requests in civil complaint form are likely to hinder the reliability of record information.
Thus it is essential to ensure the reliability of record information by establishing and amending relevant laws and regulations, improving the system through organizational development and staff expertise, supplementing the information disclosure system and process, and changing social perceptions of information disclosure. Reliable record information is expected to contribute to genuine governance-oriented administration as well as to the accountability of government bodies and public organizations. In conclusion, beyond previous efforts that treated record information systematically as records and focused on disclosing more information and the external development of the system, numerous attempts are needed to ensure the reliability of the record information to be disclosed in the future.

Rethinking the Records of the Japan's Korean Colonial Rule and the Post-War Compensation : Focusing on the Dual Decision Making System and the Sources of the Documents (제국의 식민지·점령지 지배와 '전후보상' 기록의 재인식 조선의 식민지지배·보상처리 결재구조와 원본출처를 중심으로)

  • Kim, Kyung-Nam
    • The Korean Journal of Archival Studies
    • /
    • no.39
    • /
    • pp.281-318
    • /
    • 2014
  • This article inquires into the decision-making systems of Imperial Japan, colonial Chosun, GHQ, and occupied Japan, and into the sources of the original documents they produced, with respect to the post-war treatment of compensation for Japanese colonial rule. It covers the period from 1910 to 1952 from the perspectives of history and archival science. The article attempts to lay a foundation for a broader understanding of the documents produced in Imperial Japan, its colony, and the occupied territory by placing colonial rule and compensation for it on a single continuous line. The records of Japan's forced occupation of Korea during 1910-1945, and the original records documenting the decision-making process of post-war compensation under GHQ during 1945-1952, have been dispersed across Korea, Japan, and the United States. This dispersed preservation was mainly due to the complicated decision-making process among the Government-General of Chosun, the Japanese imperial government, and GHQ. It was a top-down, dual decision-making system in which critical policies, personnel, and budgets were decided in the imperial homeland while their implementation took place in the colonies. As a result, the records documenting the whole process of domination have been preserved dispersedly in Japan and its colonies. In particular, the unpaid wages of Korean workers forcibly mobilized to Japan during the colonial period, now emerging as a diplomatic conflict between Korea and Japan, were dealt with in the decrees of the Japanese government and the policy-making of GHQ, and had already been reframed from a 'debt' into a matter of 'economic cooperation'. The critical records for post-war compensation were likewise preserved dispersedly in the United States and Japan under the top-down decision-making process of GHQ and Japan.
Therefore, the dispersed records of 1910-1952 concerning colonial rule by Imperial Japan and post-war compensation for it must be re-investigated for adequate documentation in their context of time and space.

Predicting Forest Gross Primary Production Using Machine Learning Algorithms (머신러닝 기법의 산림 총일차생산성 예측 모델 비교)

  • Lee, Bora;Jang, Keunchang;Kim, Eunsook;Kang, Minseok;Chun, Jung-Hwa;Lim, Jong-Hwan
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.21 no.1
    • /
    • pp.29-41
    • /
    • 2019
  • Terrestrial gross primary production (GPP) is the largest global carbon flux, and forest ecosystems are important because of their ability to store much greater amounts of carbon than other terrestrial ecosystems. There have been several attempts to estimate GPP using mechanism-based models. However, mechanism-based models, which incorporate biological, chemical, and physical processes, are limited by a lack of flexibility in predicting non-stationary ecological processes caused by local and global change. Instead, mechanism-free methods are strongly recommended for estimating nonlinear dynamics such as GPP that occur in nature. Therefore, we used mechanism-free machine learning techniques to estimate daily GPP. In this study, a support vector machine (SVM), random forest (RF), and artificial neural network (ANN) were used and compared with a traditional multiple linear regression model (LM). MODIS products and meteorological parameters from eddy covariance data were employed to train the machine learning and LM models from 2006 to 2013. The GPP prediction models were compared with daily GPP from eddy covariance measurements in a deciduous forest in South Korea in 2014 and 2015. Statistical measures including the correlation coefficient (R), root mean square error (RMSE), and mean squared error (MSE) were used to evaluate the performance of the models. In general, the machine-learning models (R = 0.85 - 0.93, MSE = 1.00 - 2.05, p < 0.001) showed better performance than the linear regression model (R = 0.82 - 0.92, MSE = 1.24 - 2.45, p < 0.001). These results provide insight into the high predictability of mechanism-free machine-learning models combined with remote sensing, and into their potential for predicting non-stationary ecological processes such as seasonal GPP.
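As a rough illustration of the comparison described above, the following sketch fits a random forest and a multiple linear regression on synthetic data with a nonlinear target; the three features and the target formula are invented stand-ins for the MODIS and eddy covariance inputs, not the paper's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
n = 500
# Three synthetic drivers (stand-ins for e.g. radiation, temperature, VPD).
X = rng.uniform(0, 1, size=(n, 3))
# A nonlinear "GPP-like" target: a mechanism-free model should capture
# the sinusoid and the interaction term that a linear model cannot.
y = np.sin(6 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(0, 0.05, n)

X_train, X_test, y_train, y_test = X[:400], X[400:], y[:400], y[400:]

lm = LinearRegression().fit(X_train, y_train)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

mse_lm = mean_squared_error(y_test, lm.predict(X_test))
mse_rf = mean_squared_error(y_test, rf.predict(X_test))
print(f"LM MSE: {mse_lm:.3f}  RF MSE: {mse_rf:.3f}")
```

On data like this, the random forest's lower test MSE mirrors the paper's finding that mechanism-free models outperform the linear baseline when the underlying process is nonlinear.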

Does the Availability of Various Types and Quantities of Food Limit the Community Structure of the Benthos (Mollusks) Inhabiting the Hard-bottom Subtidal Area? (먹이생물의 종류와 양이 암반 조하대 저서동물(연체동물) 군집구조 결정요소가 될 수 있는가?)

  • SON, MIN-HO;KIM, HYUN-JUNG;KANG, CHANG-KEUN;HWANG, IN-SUH;KIM, YOUNG-NAM;MOON, CHANG-HO;HWANG, JUNG-MIN;HAN, SU-JIN;LEE, WON-HAENG
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY
    • /
    • v.24 no.1
    • /
    • pp.128-138
    • /
    • 2019
  • The effects of feeding type and food resource availability on the community structure of mollusks inhabiting hard-bottom subtidal areas were investigated. Following several references, the mollusks observed in this study were divided into five groups according to feeding type: 1) grazing, 2) filter feeding, 3) deposit feeding, 4) omnivory, and 5) predation. Grazers and filter feeders were the most numerous: grazing accounted for 47.9% in the East Sea; in the South Sea, grazing accounted for 32.6% and filter feeding for 29.6%; and filter feeding was the dominant type in the Yellow Sea, accounting for 42.3%. The results showed distinctive differences in community structure depending on feeding type and on the geographical areas where sampling took place. We then examined whether community structure could be explained by feeding type or food availability, and found that it depended heavily on food resource availability. In the East Sea, where marine algal density was high, thick-leathery and sheet-form algal communities often occurred in highly transparent water that provides a proper environment for growth. In the South Sea, where grazing and filter feeding were similarly predominant, algal density was high but phytoplankton density was the highest of the three seas. The Yellow Sea showed the lowest algal biomass compared with the East and South Seas, while its phytoplankton density was similar to theirs, which may provide a more adequate environment for filter feeders than for grazers. This study concluded that mollusk communities with high abundance were present where the availability of food resources, in both type and quantity, was high.

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.175-197
    • /
    • 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis has been actively conducted, showing remarkable results in fields such as classification, summarization, and generation. Among the various text analysis fields, text classification is the most widely used technology in academia and industry. Text classification includes binary classification with one label from two classes, multi-class classification with one label from several classes, and multi-label classification with multiple labels from several classes. Multi-label classification in particular requires a different training method from binary and multi-class classification because each instance can carry multiple labels. Moreover, since the number of labels to be predicted grows as the numbers of labels and classes increase, performance improvement is difficult due to the increased prediction difficulty. To overcome this limitation, label embedding is being actively studied: (i) the initially given high-dimensional label space is compressed into a low-dimensional latent label space, (ii) a model is trained to predict the compressed label, and (iii) the predicted label is restored to the high-dimensional original label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, because these techniques consider only linear relationships between labels or compress the labels by random transformation, they cannot capture non-linear relationships between labels, and thus cannot create a latent label space that sufficiently contains the information of the original labels.
Recently, there have been increasing attempts to improve performance by applying deep learning to label embedding. Label embedding using an autoencoder, a deep learning model effective for data compression and restoration, is representative. However, traditional autoencoder-based label embedding suffers large information loss when compressing a high-dimensional label space with a myriad of classes into a low-dimensional latent label space. This is related to the vanishing-gradient problem that occurs during backpropagation. To solve this problem, the skip connection was devised: by adding a layer's input to its output, it prevents gradient loss during backpropagation and enables efficient learning even in deep networks. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies using them in autoencoders or in the label embedding process are still lacking. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder to form a low-dimensional latent label space that well reflects the information of the high-dimensional label space. The proposed methodology was applied to actual paper keywords to derive a high-dimensional keyword label space and a low-dimensional latent label space. We then conducted an experiment predicting the compressed keyword vector in the latent label space from the paper abstract, and evaluating the multi-label classification obtained by restoring the predicted keyword vector to the original label space. As a result, the accuracy, precision, recall, and F1 score used as performance indicators showed far superior performance for multi-label classification based on the proposed methodology compared with traditional multi-label classification methods.
This suggests that the low-dimensional latent label space derived through the proposed methodology reflected the information of the high-dimensional label space well, which ultimately improved the performance of the multi-label classification itself. In addition, the utility of the proposed methodology was confirmed by comparing its performance across domain characteristics and numbers of dimensions of the latent label space.
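A structural sketch of the kind of skip-connection autoencoder described above is shown below (forward pass only, untrained); all dimensions, the weight initialization, and the choice of same-width layers so the skip addition is shape-compatible are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
L, H = 1000, 32                    # label-space and latent-label dimensions

def relu(x):
    return np.maximum(x, 0.0)

W_enc1 = rng.normal(0, 0.02, (L, L))   # encoder hidden layer (same width -> skip add works)
W_enc2 = rng.normal(0, 0.02, (L, H))   # projection into the latent label space
W_dec1 = rng.normal(0, 0.02, (H, H))   # decoder hidden layer (same width -> skip add works)
W_dec2 = rng.normal(0, 0.02, (H, L))   # projection back to the original label space

y = (rng.uniform(size=L) < 0.01).astype(float)   # a sparse multi-label vector

h = relu(y @ W_enc1) + y           # encoder skip connection: add the layer input back
z = h @ W_enc2                     # compressed latent label vector
d = relu(z @ W_dec1) + z           # decoder skip connection
y_hat = 1 / (1 + np.exp(-(d @ W_dec2)))  # reconstructed label probabilities

print(z.shape, y_hat.shape)
```

The two `+` terms are the skip connections: during training, gradients flow through them unattenuated, which is what mitigates the vanishing-gradient loss the abstract describes.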

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.157-173
    • /
    • 2021
  • With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to easily find what they seek. Users want a system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are an increasingly important technology that is essential to business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar users' interests and preferences. However, limitations do exist. Sparsity, which occurs when user-item preference information is insufficient, is the main limitation of collaborative filtering. The evaluation values in the user-item matrix may be distorted depending on the popularity of a product, and new users may not yet have rated anything. This lack of historical data for identifying consumer preferences is referred to as data sparsity, and various methods have been studied to address it. However, most attempts to solve the sparsity problem are not generally applicable because they can only be used when additional data such as users' personal information, social networks, or item characteristics are available. Another problem is that real-world rating data are mostly biased toward high scores, resulting in severe imbalance. One cause of this imbalanced distribution is purchasing bias: mainly users who would rate a product highly purchase it, while those who would rate it low are less likely to purchase it and thus do not leave negative reviews. Due to these characteristics, reviews by purchasers are more likely to be positive than most users' actual preferences would suggest.
As a result, models over-learn the high-frequency classes in the biased rating data, distorting the results. Applying collaborative filtering to such imbalanced data leads to poor recommendation performance due to excessive learning of the biased classes. Traditional oversampling techniques for this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning and reduces recommendation performance. In addition, most existing pre-processing methods for data imbalance are designed for binary classes. Binary-class imbalance techniques are difficult to apply to multi-class problems because they cannot model phenomena such as objects at cross-class boundaries or objects overlapping multiple classes. To work around this, research has been conducted on converting multi-class problems into binary-class problems. However, such simplification can cause classification errors when the results of classifiers learned on sub-problems are combined, losing important information about relationships beyond the selected items. Therefore, more effective methods for multi-class imbalance problems are needed. We propose a collaborative filtering model that uses a CGAN to generate realistic virtual data to populate the empty user-item matrix. The conditional vector y identifies the distributions of minority classes so that the generated data reflect their characteristics. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process improves the accuracy of the model by addressing the sparsity problem of collaborative filtering while mitigating the data imbalance found in real data. Our model shows superior recommendation performance over existing oversampling techniques on real-world data with data sparsity.
SMOTE, Borderline-SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models, and our model achieved the best prediction accuracy on the RMSE and MAE evaluation metrics. This study suggests that deep learning-based oversampling can further refine the performance of recommendation systems on actual data and can be used to build business recommendation systems.
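The two problems motivating the model above, sparsity of the user-item matrix and imbalance of the observed ratings, can be illustrated on a synthetic matrix; the 5% observation rate and the positively skewed rating distribution below are assumptions chosen to mimic the purchasing bias described in the abstract, not real data.

```python
import numpy as np

rng = np.random.default_rng(7)
n_users, n_items = 200, 100

ratings = np.zeros((n_users, n_items), dtype=int)   # 0 marks an unrated entry
observed = rng.uniform(size=(n_users, n_items)) < 0.05   # ~5% of entries rated

# Positively skewed ratings: high scores far outnumber low ones,
# mimicking the purchasing bias where dissatisfied users rarely rate.
ratings[observed] = rng.choice([1, 2, 3, 4, 5], size=observed.sum(),
                               p=[0.03, 0.05, 0.12, 0.35, 0.45])

sparsity = 1 - observed.mean()
counts = np.bincount(ratings[observed], minlength=6)[1:]  # counts for ratings 1..5
print(f"sparsity: {sparsity:.2%}")
print("rating counts 1..5:", counts)
```

A CGAN-based approach as described above would condition on the rating class (the vector y) to generate minority-class (low-rating) entries, rather than duplicating existing rows the way SMOTE-style oversampling does.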

Extracorporeal Membrane Oxygenation for Coronavirus Disease 2019: Expert Recommendations from The Korean Society for Thoracic and Cardiovascular Surgery

  • Jeong, In Seok;Kim, Woong-Han;Baek, Jong Hyun;Choi, Chang-Hyu;Choi, Chang Woo;Chung, Euy Suk;Jang, Jae Seok;Jang, Woo Sung;Jung, Hanna;Jung, Jae-Seung;Kang, Pil Je;Kim, Dong Jung;Kim, Do Wan;Kim, Hyoung Soo;Kim, Jae Bum;Kim, Woo-Shik;Kim, Young Sam;Kwak, Jae Gun;Lee, Haeyoung;Lee, Seok In;Lim, Jae Woong;Oh, Se Jin;Oh, Tak-Hyuck;Park, Chun Soo;Ryu, Kyoung Min;Shim, Man-Shik;Son, Joohyung;Son, Kuk Hui;Song, Seunghwan;The Korean Society for Thoracic and Cardiovascular Surgery COVID-19 ECMO Task Force Team
    • Journal of Chest Surgery
    • /
    • v.54 no.1
    • /
    • pp.2-8
    • /
    • 2021
  • Since the first reported case of coronavirus disease 2019 (COVID-19) in December 2019, the numbers of confirmed cases and deaths have continued to increase exponentially despite multi-factorial efforts. Although various attempts have been made to improve the level of evidence for extracorporeal membrane oxygenation (ECMO) treatment over the past 10 years, most experts still hesitate to take an active position on whether to apply ECMO in COVID-19 patients. Several ECMO management guidelines have been published recently, but they reflect some important differences from the Korean medical system and aspects of real-world medical practice in Korea. We aimed to find evidence on the efficacy of ECMO for COVID-19 patients by reviewing the published literature and to propose expert recommendations by analyzing the Korean COVID-19 ECMO registry data.

A Study on the Transmission Process of Yeoju-Palkyung in Old Poems and Maps (팔경시와 고지도에 투영된 여주팔경의 전승양상)

  • Rho, Jae-Hyun
    • Journal of the Korean Institute of Traditional Landscape Architecture
    • /
    • v.29 no.1
    • /
    • pp.14-27
    • /
    • 2011
  • The study reviewed the content and meaning of the present Yeoju Palkyung (eight sceneries) by analyzing and interpreting the Palkyung poems, old maps and paintings, and classical materials transmitted in the Yeoju area, and investigated the transmission process. Although five scenes of the Yeoju Palkyung illustrate abstract landscapes derived from the Sosang Palkyung, they are mixed with local sceneries showing famous historical ruins of the area and the local life of the Yeogang (驪江: river). The seunggyeong (勝景, celebrated scenery) of Yeoju highlighted in old paintings has been emphasized through duplication of the objects and viewpoints of the Yeoju Palyong (驪州八詠), typically symbolized by sailing boats along the Yeogang, the forests around Cheongshimru, and the storied Jeontap and Maam above Shinreuksa (神勒寺) Dongdae (東臺). While it is hardly certain that the Yeoju Palyong of Choi Sukjeong and Seo Geojung is the direct source of the present Yeoju Palkyung, the present version is found entirely within the Cheonggijeongsipyoung (淸奇亭十詠) of Cho Moonsu from the 17th century onward, which shows that the Cheonggijeongsipyoung played an important role in the transmission of the Yeoju Palkyung. It is also concluded that the Yeoju Palyong recorded in the Yeojidoseo (與地圖書) collects the same landscapes as the present Yeoju Palkyung, which would date it back at least to the mid-18th century. In addition, given that the studied old maps consistently record the eight sceneries Sachoneohwa, Shinreukmojong, Yeontanguibum, Paldaejangrim, Yangdonagan, Ibanchungam, Pasagwau, and Yongmoonjeukchui in the same order, the eight scenic points in the old maps had apparently been established as the typical copy of the Yeoju Palkyung by the 18th century. Therefore, the transmission of the Yeoju Palkyung follows two separate routes: one from the Yeoju Palyong (Choi Sukjeong, Seo Geojung) through the Cheonggijeongpalyong and the Yeoju Palyong of the Yeojidoseo to the present Yeoju Palkyung, and the other from the Yeoju Palyong and Geumsa Palyong (金沙八詠) through the old-map Palkyung to the Yeoju Palkyung of the late 18th century. These two transmission processes share cultural sceneries of the same origin, differing only in the perspective from which they cover the representative scenic landscapes of Yeoju and Geumsa.