• Title/Summary/Keyword: sentence meaning comparison (문장 의미 비교)

A Morpheme Analyzer based on Transformer using Morpheme Tokens and User Dictionary (사용자 사전과 형태소 토큰을 사용한 트랜스포머 기반 형태소 분석기)

  • DongHyun Kim;Do-Guk Kim;ChulHui Kim;MyungSun Shin;Young-Duk Seo
    • Smart Media Journal / v.12 no.9 / pp.19-27 / 2023
  • Because morphemes are the smallest units of meaning in Korean, an accurate morpheme analyzer is needed to improve the performance of Korean language models. Most existing analyzers, however, learn from word-unit tokens as input. Korean words consist of a root with postpositions and affixes attached, so words that share a root can differ in meaning depending on those postpositions or affixes. Learning morphemes from word-unit tokens can therefore lead to misclassification of postpositions or affixes. In this paper, we use morpheme-level tokens to capture the inherent meaning of Korean sentences and propose a morpheme analyzer based on sequence generation with a Transformer. In addition, a user dictionary is constructed from corpus data to address the out-of-vocabulary problem. In the experiments, the morphemes and morpheme tags output by each analyzer were compared with the correct-answer data, and the proposed analyzer performed better than existing morpheme analyzers.

Development of a Ranking System for Tourist Destination Using BERT-based Semantic Search (BERT 기반 의미론적 검색을 활용한 관광지 순위 시스템 개발)

  • KangWoo Lee;MyeongSeon Kim;Soon Goo Hong;SuGyeong Roh
    • Journal of Korea Society of Industrial Information Systems / v.29 no.4 / pp.91-103 / 2024
  • A tourist destination ranking system was designed that employs semantic search to extract information with reasonable accuracy. The process involves collecting data, preprocessing text reviews of tourist spots, and embedding the corpus and queries with SBERT. We calculate the similarity between data points, filter out those below a specified threshold, and then rank the remaining tourist destinations using a count-based algorithm to align them semantically with the query. To assess the efficacy of the ranking algorithm, experiments were conducted with four queries. Furthermore, 58,175 sentences were manually labeled to ascertain their semantic relevance to the third query, 'crowdedness'. Notably, the human-labeled data for crowdedness showed similar results. Despite challenges including threshold optimization and imbalanced data, this study shows that semantic search is a powerful method for understanding user intent and recommending tourist destinations with less time and cost.
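
The filter-then-count step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy 3-dimensional vectors stand in for SBERT embeddings, and `rank_destinations`, the destination names, and the threshold value are all hypothetical. It also assumes that "count-based" means counting, per destination, the review sentences whose similarity to the query clears the threshold.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_destinations(query_vec, review_vecs, threshold):
    """Rank destinations by how many review sentences match the query.

    review_vecs is a list of (destination, embedding) pairs, one per
    review sentence. Sentences below the threshold are filtered out.
    """
    counts = Counter()
    for destination, vec in review_vecs:
        if cosine(query_vec, vec) >= threshold:
            counts[destination] += 1
    return counts.most_common()


# Toy embeddings (assumption): real vectors would come from an SBERT model.
query = [1.0, 0.0, 0.0]
reviews = [
    ("Beach A", [0.9, 0.1, 0.0]),
    ("Beach A", [0.8, 0.2, 0.1]),
    ("Temple B", [0.7, 0.7, 0.0]),
    ("Temple B", [0.0, 1.0, 0.0]),
    ("Market C", [0.1, 0.0, 1.0]),
]
ranking = rank_destinations(query, reviews, threshold=0.6)
print(ranking)
```

With this toy data, two "Beach A" sentences and one "Temple B" sentence pass the threshold, so "Beach A" ranks first. The threshold controls the trade-off the abstract mentions: set it too low and unrelated sentences inflate the counts, too high and relevant destinations drop out.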

A Global-Interdependence Pairwise Approach to Entity Linking Using RDF Knowledge Graph (개체 링킹을 위한 RDF 지식그래프 기반의 포괄적 상호의존성 짝 연결 접근법)

  • Shim, Yongsun;Yang, Sungkwon;Kim, Hong-Gee
    • KIPS Transactions on Software and Data Engineering / v.8 no.3 / pp.129-136 / 2019
  • Natural language contains a variety of entities, such as people, organizations, places, and products, and a single entity can have many meanings. Resolving this ambiguity is a very challenging task in natural language processing. Entity Linking (EL) is the task of linking an entity in text to the appropriate entity in a knowledge base. The pairwise approach, a representative method for EL, uses the association between two entities in a sentence. Because it considers only the interdependence between entities appearing in the same sentence, it is limited in capturing global interdependence. In this paper, we developed an Entity2vec model that applies Word2vec to an RDF knowledge base in order to solve EL, and we applied algorithms that use the generated model to rank each entity. To overcome the limitations of the pairwise approach, we devised a pairwise approach based on comprehensive interdependency and compared the results.
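
One way to picture pairwise linking with document-wide interdependence is the sketch below. It is an assumption-laden illustration rather than the paper's algorithm: the 2-dimensional vectors stand in for Entity2vec embeddings, the entity identifiers are made up, and `link_entities` simply scores each candidate by its best similarity to the candidates of every other mention in the text, not only those in the same sentence.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_entities(mentions, vectors):
    """Pick one knowledge-base entity per mention.

    mentions maps a mention string to its candidate entity ids;
    vectors maps an entity id to its embedding. A candidate's score is
    the sum, over all other mentions, of its best pairwise similarity
    to any of that mention's candidates.
    """
    linked = {}
    for mention, candidates in mentions.items():
        def support(candidate):
            total = 0.0
            for other, other_candidates in mentions.items():
                if other == mention:
                    continue
                total += max(
                    cosine(vectors[candidate], vectors[o])
                    for o in other_candidates
                )
            return total

        linked[mention] = max(candidates, key=support)
    return linked


# Hypothetical embeddings and candidate lists, for illustration only.
vectors = {
    "Apple_Inc": [1.0, 0.1],
    "Apple_fruit": [0.1, 1.0],
    "Steve_Jobs": [0.9, 0.2],
    "iPhone": [0.95, 0.05],
}
mentions = {
    "Apple": ["Apple_Inc", "Apple_fruit"],
    "Jobs": ["Steve_Jobs"],
    "iPhone": ["iPhone"],
}
linked = link_entities(mentions, vectors)
print(linked)
```

Here the ambiguous mention "Apple" resolves to the company because that candidate lies close to both of the other entities in the text.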

Legal search method using S-BERT

  • Park, Gil-sik;Kim, Jun-tae
    • Journal of the Korea Society of Computer and Information / v.27 no.11 / pp.57-66 / 2022
  • In this paper, we propose a legal document search method that uses the Sentence-BERT model. Members of the general public who want to use a legal search service have difficulty finding relevant precedents because they lack an understanding of legal terms and structures. In addition, existing keyword- and text-mining-based legal search methods are limited in the quality of their results for two reasons: they lack information on the context of the judgment, and they fail to distinguish homonyms and polysemous words. As a result, the accuracy of legal document search results is often unsatisfactory or questionable. This paper therefore aims to improve the efficacy of the general public's legal search over the Supreme Court precedent and Legal Aid Counseling case databases. The Sentence-BERT model embeds contextual information from precedents and counseling data, which better preserves the meaning of phrases and sentences. Our initial research has shown that the Sentence-BERT search method yields higher accuracy than the Doc2Vec or TF-IDF search methods.

Korean Students' Achievement in Scientific Literacy (우리 나라 학생들의 과학적 소양 성취도)

  • Shin, Dong-Hee;Ro, Koog-Hyang
    • Journal of The Korean Association For Science Education / v.22 no.1 / pp.76-92 / 2002
  • OECD/PISA (Programme for International Student Assessment) is significant in that it is the first international comparative study assessing 15-year-old students' scientific literacy. Korean students' percent-correct results on 35 science items revealed the following characteristics. First, in terms of science application area, Korean students showed the highest achievement in 'science in technology', followed by 'science in life and health' and 'science in earth and environment'. Male students achieved significantly better than their female counterparts in all three areas. Second, achievement on science knowledge items was significantly higher than on scientific process items, and the difference between the two was larger for male students. Third, in terms of application context, Korean students showed the highest achievement in the historical context and the lowest in the personal context. Fourth, in terms of item format, Korean students performed significantly better on open-constructed items than on multiple-choice items. Fifth, Korean students showed low performance on items about biotechnology and environment-related issues, which was more pronounced for female students. Sixth, although male students performed significantly better than female students in most aspects, it is noteworthy that there was no significant gender difference on scientific process items and that female students performed significantly better than male students on open-constructed items that require long written answers.

A Comparative Study of South and North Korea on Mathematics Textbook and the Development of Unified Mathematics Curriculum for South and North Korea (II) - Focusing on the Elementary School Textbooks of South and Those of North Korea - (남북한 수학 교과서 영역별 분석 및 표준 수학 교육과정안 개발 연구 (II): 남북한 초등학교 수학교과서의 구성과 전개방법 비교)

  • 임재훈;이경화;박경미
    • School Mathematics / v.5 no.1 / pp.43-58 / 2003
  • This study compares the structure of contents and the way concepts are developed in the mathematics textbooks of South and North Korea. A thorough investigation of the textbooks identified the following three characteristics. First, South Korean mathematics textbooks tend to spread contents across several grades, while North Korean textbooks tend to concentrate them. Second, in South Korean textbooks, mathematical concepts are embedded in real-world situations, and students gradually acquire those concepts mostly through activities. This differs from the approach of North Korean textbooks, in which various problems play a key role in explaining concepts. Third, the main strategy for introducing contents is 'guidance' in South Korean textbooks and 'explanation' in North Korean textbooks. Exploratory questions leading to the concepts are emphasized more in South Korean textbooks, whereas meaningful explanations play an important role in North Korean textbooks.

Sensitivity Identification Method for New Words of Social Media based on Naive Bayes Classification (나이브 베이즈 기반 소셜 미디어 상의 신조어 감성 판별 기법)

  • Kim, Jeong In;Park, Sang Jin;Kim, Hyoung Ju;Choi, Jun Ho;Kim, Han Il;Kim, Pan Koo
    • Smart Media Journal / v.9 no.1 / pp.51-59 / 2020
  • From the era of PC communication through the development of the internet, new terms have been coined on social media; with the spread of smartphones a social media culture has formed, and newly coined words have become part of that culture. With social networking sites and smartphones serving as a bridge, the amount of data is growing in real time. New words have advantages: for example, they allow short sentences that fit within the character limits of various messengers and reduce data volume. However, new words have no dictionary meaning, which limits and degrades algorithms such as data mining. Therefore, in this paper, we collect data through web crawling, extract the new words contained in the text, and build a sentiment classification to determine the opinion expressed in a document. The experiment proceeds in three stages. First, new words collected from social media are trained as positive or negative. Next, to derive and verify sentiment values using standard-language documents, TF-IDF is used to score the sentiment of nouns and assign sentiment values to the data. As with the new words, the classified sentiment values are applied to verify that sentiment is classified correctly in standard-language documents. Finally, the sentiment values of the newly coined words and of the standard language are combined to perform a comparative analysis of the method.
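
The positive/negative training stage named in the title can be illustrated with a small multinomial Naive Bayes classifier. This is a generic textbook sketch with Laplace smoothing, not the paper's pipeline: the four training documents are invented, and the tokens `kkuljaem` and `nojaem` are placeholder romanizations of slang used only as example new words.

```python
import math
from collections import Counter


def train(docs):
    """Count documents per label and word occurrences per label.

    docs is a list of (tokens, label) pairs.
    """
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    return label_counts, word_counts, vocab


def classify(tokens, model):
    """Return the label with the highest log posterior for the tokens."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for w in tokens:
            # Laplace (add-one) smoothing keeps unseen words from
            # driving the probability to zero.
            score += math.log(
                (word_counts[label][w] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy corpus; the slang tokens are illustrative placeholders.
docs = [
    (["kkuljaem", "fun"], "positive"),
    (["kkuljaem", "great"], "positive"),
    (["nojaem", "boring"], "negative"),
    (["nojaem", "bad"], "negative"),
]
model = train(docs)
print(classify(["kkuljaem", "movie"], model))
print(classify(["nojaem"], model))
```

Because the new words carry no dictionary meaning, their sentiment comes entirely from the labeled contexts they appear in, which is what the per-label word counts capture.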

Scalarization of HPF FORALL Construct (HPF FORALL 구조의 스칼라화(Scalarization))

  • Koo, Mi-Soon
    • Journal of the Korea Society of Computer and Information / v.12 no.5 / pp.121-129 / 2007
  • Scalarization is the process of converting a parallel construct, such as a Fortran 90 array statement or an HPF FORALL, into sequential loops that preserve the correct semantics. Most compilers for HPF, which is recognized as a standard data-parallel language, convert an HPF program into a Fortran 77 program with inserted message-passing primitives. During scalarization, a FORALL parallel construct must be translated into Fortran 77 DO loops that preserve the semantics of FORALL. In this paper, we propose a scalarization algorithm that converts a FORALL construct into a DO loop with improved performance. To do so, we define and use a relation distance vector to keep the necessary dependence information. We then evaluate the execution times of the code generated by our method and by the PARADIGM compiler's method for various array sizes.
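
Why scalarization needs dependence information can be shown with a small example. The sketch below uses Python lists in place of Fortran arrays and does not reproduce the paper's relation distance vector; it only contrasts FORALL semantics (every right-hand side is evaluated before any assignment) with a naive sequential loop, a temporary-array translation, and a loop reversal that avoids the temporary. All function names are illustrative.

```python
def forall_reference(a):
    """FORALL (i=2:n-1) a(i) = a(i-1) + a(i+1), with FORALL semantics:
    evaluate every right-hand side first, then assign."""
    rhs = [a[i - 1] + a[i + 1] for i in range(1, len(a) - 1)]
    out = a[:]
    out[1:-1] = rhs
    return out


def naive_loop(a):
    """Incorrect translation: iteration i reads the already-updated
    out[i-1], which violates FORALL semantics."""
    out = a[:]
    for i in range(1, len(out) - 1):
        out[i] = out[i - 1] + out[i + 1]
    return out


def scalarized_with_temp(a):
    """Safe translation: compute into a temporary, then copy back."""
    out = a[:]
    tmp = [0] * len(out)
    for i in range(1, len(out) - 1):
        tmp[i] = out[i - 1] + out[i + 1]
    for i in range(1, len(out) - 1):
        out[i] = tmp[i]
    return out


def shift_reversed(a):
    """FORALL (i=2:n) a(i) = a(i-1): the only dependence has distance 1,
    so running the loop in descending order needs no temporary."""
    out = a[:]
    for i in range(len(out) - 1, 0, -1):
        out[i] = out[i - 1]
    return out


a = [1, 2, 3, 4, 5]
print(forall_reference(a))      # [1, 4, 6, 8, 5]
print(naive_loop(a))            # [1, 4, 8, 13, 5]
print(scalarized_with_temp(a))  # [1, 4, 6, 8, 5]
print(shift_reversed(a))        # [1, 1, 2, 3, 4]
```

The temporary array is always correct but costs an extra copy loop and extra memory; knowing the direction and distance of each dependence lets a compiler pick a loop order that avoids it, which is the kind of saving a scalarization algorithm aims for.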

An E-Mail Question Answering System using Question Generation Model (질의생성 모델을 이용한 전자우편 질의응답 시스템)

  • Zhang, Jeong-Sun;Kim, Sang-Bum;Seo, Hee-Chul;Rim, Hae-Chang
    • Annual Conference on Human and Language Technology / 2002.10e / pp.176-183 / 2002
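  • For long natural-language queries that follow a regular format, such as e-mail, we propose a method for weighting the words in a user query, together with an e-mail question answering system that retrieves similar questions from an existing question-answer collection and returns their answers to the user. Given a long natural-language query, the proposed question generation model uses the query's category and sentence importance information to compute the probability that each word in the query is used as a topic word, and assigns importance weights based on that probability. The similarity between the user query and previously submitted e-mail queries is computed from three components: lexical similarity based on word frequency, semantic similarity based on a Korean thesaurus, and topic similarity based on the proposed question generation model. Experiments used a question-answer collection in real-world use; we compared the contribution of each similarity measure and evaluated the effect of the proposed question generation model on performance.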
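
The idea of weighting query words before retrieving a stored question can be sketched as below. This is a simplified illustration under stated assumptions: the per-word weights are supplied by hand rather than computed by the paper's question generation model, only a single weighted-overlap similarity is shown instead of the three combined measures, and `weighted_overlap`, `best_answer`, and the sample data are hypothetical.

```python
def weighted_overlap(query_weights, stored_tokens):
    """Share of the query's total term weight matched by a stored question."""
    total = sum(query_weights.values())
    if total == 0:
        return 0.0
    matched = sum(w for term, w in query_weights.items() if term in stored_tokens)
    return matched / total


def best_answer(query_weights, qa_pairs):
    """Return the answer of the stored question most similar to the query.

    qa_pairs is a list of (question_tokens, answer) pairs.
    """
    best = max(qa_pairs, key=lambda qa: weighted_overlap(query_weights, set(qa[0])))
    return best[1]


# Hand-assigned weights (assumption): topic words get high weight,
# greetings and closings typical of e-mail get low weight.
query_weights = {"password": 0.6, "reset": 0.3, "hello": 0.05, "thanks": 0.05}
qa_pairs = [
    (["hello", "thanks", "refund", "order"], "How to request a refund"),
    (["password", "reset", "login"], "How to reset your password"),
]
print(best_answer(query_weights, qa_pairs))
```

Without the weights, the greeting words would count as much as the topic words, and a long e-mail full of pleasantries could match the wrong stored question.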

English Learning Applications Using Big Data Development (빅데이터를 활용한 영어학습 애플리케이션 설계 및 구현)

  • Lee, Jae-hoon;Kim, Seung-beom;Kim, Chang-young;Yang, Won-seok;Kim, Do-woo
    • Proceedings of the Korea Information Processing Society Conference / 2020.11a / pp.644-647 / 2020
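  • Interest has recently been growing in EduTech, which refers to using IT to innovate education. A new educational system is needed that, rather than simply transmitting knowledge, adapts learning to the user's level and lets users monitor their own learning. This paper therefore proposes an English learning application that uses big data. The application was designed and implemented to use big data extracted from English news articles to analyze useful sentences matched to the user's level and automatically generate questions, and to compare the user's voice data against native-speaker pronunciation with a stress analysis algorithm so that pronunciation and stress can be corrected.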