Korean Morphological Analysis Method Based on BERT-Fused Transformer Model (BERT-Fused Transformer 모델에 기반한 한국어 형태소 분석 기법)
KIPS Transactions on Software and Data Engineering, vol. 11, no. 4, pp. 169-178, 2022
Morphemes are the most primitive units in a language that lose their original meaning when segmented into smaller parts. In Korean, a sentence is a sequence of eojeols (words) separated by spaces, and each eojeol comprises one or more morphemes. Korean morphological analysis (KMA) is the task of dividing the eojeols in a given Korean sentence into morpheme units and assigning an appropriate part-of-speech (POS) tag to each resulting morpheme. KMA is one of the most important tasks in Korean natural language processing (NLP), and improving its performance is closely tied to improving the performance of downstream Korean NLP tasks. Recent research on KMA has begun to adopt the approach of machine translation (MT) models. MT converts a sequence (sentence) of units of one domain into a sequence (sentence) of units of another domain, and neural machine translation (NMT) refers to MT approaches that exploit neural network models. From the MT perspective, KMA transforms an input sequence of units in the eojeol domain into a sequence of units in the morpheme domain. In this paper, we propose a deep learning model for KMA. The backbone of our model is the BERT-fused model, which was shown to achieve high performance on NMT. The BERT-fused model combines Transformer, a representative model employed in NMT, with BERT, a language representation model that has enabled significant advances in NLP. The experimental results show that our model achieves an F1-score of 98.24.
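The BERT-fused architecture referenced above feeds frozen BERT representations of the source sentence into each encoder and decoder layer through an extra attention module whose output is averaged with the ordinary attention output. Below is a minimal PyTorch sketch of one decoder layer with this fusion, as it might be used while the decoder generates morpheme/POS tokens; the class name, dimensions, and hyperparameters are illustrative assumptions, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class BertFusedDecoderLayer(nn.Module):
    """One decoder layer in the style of the BERT-fused model: the self-attention
    output is passed through two parallel cross-attentions, one over the Transformer
    encoder states and one over frozen BERT representations, and the two results are
    averaged before the feed-forward block. Hyperparameters are illustrative only."""

    def __init__(self, d_model=512, nhead=8, dim_ff=2048, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.enc_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.bert_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Linear(dim_ff, d_model))
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))
        self.drop = nn.Dropout(dropout)

    def forward(self, tgt, enc_out, bert_out, tgt_mask=None):
        # Masked self-attention over the morpheme/POS tokens generated so far.
        x = self.norm1(tgt + self.drop(self.self_attn(tgt, tgt, tgt, attn_mask=tgt_mask)[0]))
        # Parallel cross-attentions: one over encoder states, one over BERT states.
        a_enc = self.enc_attn(x, enc_out, enc_out)[0]
        a_bert = self.bert_attn(x, bert_out, bert_out)[0]
        x = self.norm2(x + self.drop(0.5 * (a_enc + a_bert)))  # average the two attention outputs
        # Position-wise feed-forward block.
        return self.norm3(x + self.drop(self.ff(x)))
```

In the full model, the encoder layers fuse BERT representations in the same averaged-attention manner, and the decoder's final states feed a softmax over the morpheme/POS vocabulary.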
Peroxisomes, also known as microbodies, are a class of morphologically similar subcellular organelles found in most eukaryotic cells. They are 0.2-1.8 μm in diameter and are bounded by a single membrane. The matrix is usually finely granular, but crystalline or fibrillar inclusions are occasionally observed. They characteristically contain hydrogen peroxide (H2O2)-generating oxidases together with the enzyme catalase, thus confining the metabolism of the poisonous H2O2 within these organelles. These organelles are therefore highly dynamic in both morphology and metabolism. Plant peroxisomes, in particular, are associated with numerous metabolic processes, including β-oxidation, the glyoxylate cycle, and photorespiration. Furthermore, plant peroxisomes are involved in development as well as in stress responses, including the synthesis of the important phytohormones auxin, salicylic acid, and jasmonic acid. In the past few decades, substantial progress has been made in the study of peroxisome biogenesis in eukaryotic organisms, mainly in animals and yeasts. Advances in molecular biology techniques and the widening range of genomic applications have led to the identification of most peroxisomal genes and proteins (peroxins, PEXs). Furthermore, recent proteome studies have produced fundamental information on the biogenesis of plant peroxisomes and have improved our understanding of peroxisomal protein targeting, regulation, and degradation. Nonetheless, despite this progress, much remains to be explained about how peroxisomes originate from the endoplasmic reticulum (ER) and then assemble and divide. Peroxisomes play dynamic roles in many phases of plant development, and in this review we focus on the latest progress in our understanding of plant peroxisome functions, biogenesis, and dynamics.
Purpose: Dual-energy X-ray absorptiometry (DXA) has generally been used for the evaluation and treatment of osteoporosis. Recently, interest in obesity has grown, and percent body fat testing is increasing. Conventional body fat measurement requires scanning the whole body, but with improved software, whole-body fat can be estimated from measurements of only the lumbar spine and hip. This study examines whether such partial measurements, rather than whole-body measurements, are valid by comparing them, through correlation analysis, with values obtained by bioelectrical impedance analysis (BIA) and body mass index (BMI). Materials and Methods: From March to August 2010, percent body fat was measured by DXA and BIA in 90 female examinees aged 40 years or older who visited the Asan Medical Center health screening center. BMI was calculated automatically from the height and weight recorded by the body measuring instrument in the examinee information. In addition, subjects were classified as underweight (
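For reference, the BMI values mentioned above follow from the recorded height and weight by the standard formula weight (kg) divided by height (m) squared; the short sketch below shows that arithmetic with purely illustrative values and is not part of the study's measurement software.

```python
def body_mass_index(weight_kg, height_cm):
    """BMI = weight (kg) / height (m)^2, the quantity the measuring instrument
    computes automatically from the recorded height and weight."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2

# Illustrative values only: 60 kg at 158 cm gives a BMI of about 24.0.
print(round(body_mass_index(60, 158), 1))
```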
The ultimate goal of preserving and maintaining records is to use them practically. The effective use of records should be supported by reasonable recordkeeping systems and access standards. In this report, I examined the Korean laws and administrative systems related to public records access. After pointing out major problems of the access laws, chiefly the Government Information Opening Act (GIOA), and problems in practice, I suggested some alternatives for the betterment of the access system. The GIOA established "eight standards of exemption to access" under which certain information may be withheld to protect national interests and privacy. The Public Records Management Act (PRMA) applies to archives transferred to "professional archives." The two laws show fundamental differences in how they open public records to the public. First, the GIOA deals with all information (records) that public institutions keep and maintain, while the PRMA deals with records that have been transferred to the Government Archives. Second, the GIOA provides a legal procedure for opening public records and the standards for deciding whether to open them, while the PRMA allows the Government Archives to decide whether the transferred records should be opened. Third, the GIOA applies to record-producing agencies, while the PRMA applies to public archival institutions. One of the most critical inadequacies of the PRMA is that there are no standards for judging whether to open archives through the reclassification procedure. The GIOA, likewise, only specifies the types of information that are not accessible; it does not specify how long records may remain closed. The GARS does not include records less than 30 years old among the objects of reclassification. To facilitate the opening of archives, we need to revise the GIOA and the PRMA. It is necessary to clearly divide the respective domains of the GIOA and the PRMA regarding access to archives. The PRMA should clarify the principles of reclassification as well as the reclassification method and its exceptions. The exemption standards of the GIOA should be revised to restrict abuse of the exemption clauses, and they should not be applied indiscriminately and unconditionally to the archives held by the GARS.
The purpose of this study is to extract topics from experimental data using topic modeling methods (LDA, Top2Vec, and BERTopic) and to compare the characteristics and differences among these models. The experimental data consist of 55,442 papers published in 85 academic journals in the field of library and information science, indexed in the Web of Science (WoS). The experimental process was as follows: the first topic modeling results were obtained using the default parameters for each model, and the second topic modeling results were obtained by setting the same optimal number of topics for each model. In the first stage of topic modeling, the LDA, Top2Vec, and BERTopic models generated significantly different numbers of topics (100, 350, and 550, respectively); the Top2Vec and BERTopic models thus appeared to divide the topics approximately three to five times more finely than the LDA model. There were substantial differences among the models in the average and standard deviation of documents per topic. The LDA model assigned many documents to a relatively small number of topics, while the BERTopic model showed the opposite trend. In the second stage of topic modeling, generating the same 25 topics for all models, the Top2Vec model tended to assign more documents on average per topic and showed small deviations between topics, resulting in an even distribution across the 25 topics. When comparing the creation of similar topics across models, the LDA and Top2Vec models generated 18 similar topics (72%) out of 25; this high percentage suggests that the Top2Vec model is the more similar to the LDA model. For a more comprehensive comparative analysis, expert evaluation is needed to determine whether the documents assigned to each topic in the topic modeling results are thematically accurate.
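As a concrete illustration of the two-stage procedure described above, the sketch below runs the three models first with default-style settings and then with a common topic count of 25. It is a minimal sketch under stated assumptions only: the 20 Newsgroups corpus stands in for the WoS abstracts, the preprocessing is naive, and none of the study's actual parameter choices are reproduced.

```python
# Minimal sketch of the two-stage comparison; the 20 Newsgroups corpus stands in
# for the 55,442 WoS abstracts, and preprocessing/parameters are illustrative only.
from sklearn.datasets import fetch_20newsgroups
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from top2vec import Top2Vec
from bertopic import BERTopic

docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes")).data
tokens = [d.lower().split() for d in docs]          # naive tokenization for LDA

# Stage 1: each model with its own (default-style) topic count.
dictionary = Dictionary(tokens)
corpus = [dictionary.doc2bow(t) for t in tokens]
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=100)  # LDA needs an explicit count
t2v = Top2Vec(docs)                                 # Top2Vec chooses its own number of topics
btm = BERTopic()
bt_topics, _ = btm.fit_transform(docs)              # BERTopic also chooses its own number

# Stage 2: force a common number of topics (25 in the study) before comparing.
lda_25 = LdaModel(corpus=corpus, id2word=dictionary, num_topics=25)
t2v.hierarchical_topic_reduction(num_topics=25)
btm.reduce_topics(docs, nr_topics=25)
```

The per-topic document counts and topic-word lists from each model can then be compared for size distribution and topic overlap, as the study does.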
Recommender systems have become one of the most important technologies in e-commerce these days. The ultimate reason to shop online, for many consumers, is to reduce the effort of information search and purchase, and recommender systems are a key technology for serving these needs. Many past studies on recommender systems have been devoted to developing and improving recommendation algorithms, and collaborative filtering (CF) is known to be the most successful approach. Despite its success, however, CF has several shortcomings, such as the cold-start, sparsity, and gray sheep problems. To generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users; for new users who have no such information, CF cannot come up with recommendations (cold-start problem). As the numbers of products and customers increase, the scale of the data grows exponentially and most of the data cells are empty; this sparse dataset makes the computation for recommendation extremely hard (sparsity problem). Since CF is based on the assumption that there are groups of users sharing common preferences or tastes, CF becomes inaccurate if there are many users with rare and unique tastes (gray sheep problem). This study proposes a new algorithm that utilizes social network analysis (SNA) techniques to resolve the gray sheep problem. We utilize 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality in SNA refers to the number of direct links to and from a node. In a network of users connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and are isolated from the others; therefore, gray sheep can be identified by calculating the degree centrality of each node. We divide the dataset into two parts, gray sheep and others, based on the degree centrality of the users, and then apply different similarity measures and recommendation methods to the two datasets. The detailed algorithm is as follows (a code sketch appears below). Step 1: Convert the initial data, which is a two-mode network (user to item), into a one-mode network (user to user). Step 2: Calculate the degree centrality of each node and separate the nodes whose degree centrality is lower than a pre-set threshold; the threshold value is determined by simulations such that the accuracy of CF on the remaining dataset is maximized. Step 3: Apply an ordinary CF algorithm to the remaining dataset. Step 4: Since the separated dataset consists of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them, so a 'popular item' method is used for these users instead. The F measures of the two datasets are weighted by their numbers of nodes and summed to form the final performance metric. To test the performance improvement of the new algorithm, an empirical study was conducted using a publicly available dataset, the MovieLens data released by the GroupLens research team, consisting of 100,000 evaluations by 943 users on 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm using the 'best-N-neighbors' and cosine similarity methods. The empirical results show that the F measure improved by about 11% on average when the proposed algorithm was used.
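The following is a minimal Python sketch of Steps 1, 2, and 4 as described above, assuming MovieLens-style ratings loaded into a pandas DataFrame with user, item, and rating columns. The co-rating criterion for linking two users, the fixed degree threshold, and the function names are illustrative assumptions; in the study the threshold is tuned by simulation, and Step 3 applies ordinary CF with best-N-neighbors and cosine similarity to the non-gray-sheep split.

```python
import pandas as pd

def split_gray_sheep(ratings, min_corated=3, degree_threshold=5):
    """Separate gray-sheep users from the rest based on degree centrality."""
    # Step 1: project the two-mode user-item network onto a one-mode user-user
    # network: two users are linked if they rated at least `min_corated` common items.
    user_items = ratings.groupby("user")["item"].apply(set)
    users = user_items.index.to_list()
    degree = {u: 0 for u in users}
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            if len(user_items[u] & user_items[v]) >= min_corated:
                degree[u] += 1      # Step 2: degree centrality = number of direct links
                degree[v] += 1
    # Users whose degree falls below the (illustrative) threshold are gray sheep.
    gray_users = {u for u, d in degree.items() if d < degree_threshold}
    is_gray = ratings["user"].isin(gray_users)
    return ratings[is_gray], ratings[~is_gray]

def popular_item_recommendations(ratings, top_n=10):
    # Step 4: fallback for gray sheep -- recommend the most frequently rated items.
    return ratings["item"].value_counts().head(top_n).index.to_list()

# Example usage (hypothetical file path for the MovieLens 100K ratings file):
# ratings = pd.read_csv("u.data", sep="\t", names=["user", "item", "rating", "ts"])
# gray, others = split_gray_sheep(ratings)
# fallback_items = popular_item_recommendations(others)
# Step 3 (ordinary CF with cosine similarity and best-N neighbours) runs on `others`.
```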