• Title/Summary/Keyword: vocabulary compression (어휘 압축)

Search Result 11, Processing Time 0.025 seconds

Improving Clustered Sense Labels for Word Sense Disambiguation (단어 의미 모호성 해소를 위한 군집화된 의미 어휘의 품질 향상)

  • Jeongyeon Park;Hyeong Jin Shin;Jae Sung Lee
    • Annual Conference on Human and Language Technology / 2022.10a / pp.268-271 / 2022
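  • Word sense disambiguation is the task of determining the meaning of a homograph according to its context. In recent work, compressing the sense vocabulary with a thesaurus to cope with sparse data has performed well [1]. To improve methods that compress the sense vocabulary with a clustering algorithm instead of a thesaurus, this study proposes two techniques. First, category information, that is, sets of semantically similar sense labels, is used to create the initial clusters. Second, high-quality vectors trained on large and varied contextual information are used for clustering. Trained on the English SemCor data and evaluated on five Senseval and SemEval datasets, the proposed method achieved an average F1 of 70.6%, which is 1.5%p higher than previous work.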
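
The first technique in the abstract above (seeding the clustering with category information) can be illustrated with a small sketch. This is not the authors' implementation: it assumes a plain k-means loop with squared Euclidean distance, and the sense labels, category names, and 2-d vectors are hypothetical toy data chosen for readability.

```python
from collections import defaultdict

def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def seeded_kmeans(vectors, categories, iterations=10):
    """Cluster sense vectors, seeding one centroid per category.

    vectors:    {sense_label: [float, ...]}
    categories: {sense_label: category_name} for the senses that have one
    Returns {sense_label: cluster_name}.
    """
    # Initial clusters: group the categorised senses by their category.
    groups = defaultdict(list)
    for sense, cat in categories.items():
        groups[cat].append(vectors[sense])
    centroids = {cat: mean(vs) for cat, vs in groups.items()}

    assignment = {}
    for _ in range(iterations):
        # Assignment step: each sense goes to its nearest centroid.
        new_assignment = {
            sense: min(centroids, key=lambda c: sq_dist(vec, centroids[c]))
            for sense, vec in vectors.items()
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: recompute centroids; an empty cluster keeps its old one.
        members = defaultdict(list)
        for sense, cluster in assignment.items():
            members[cluster].append(vectors[sense])
        for cluster, vs in members.items():
            centroids[cluster] = mean(vs)
    return assignment

# Hypothetical toy data, not taken from the paper.
vectors = {
    "bank%finance": [0.9, 0.1],
    "loan%finance": [0.8, 0.2],
    "bank%river":   [0.1, 0.9],
    "shore%river":  [0.2, 0.8],
    "credit%?":     [0.7, 0.3],  # no category: placed by distance alone
}
categories = {
    "bank%finance": "FINANCE", "loan%finance": "FINANCE",
    "bank%river": "NATURE", "shore%river": "NATURE",
}
clusters = seeded_kmeans(vectors, categories)
```

In this sketch the uncategorised sense ends up in the cluster whose seed centroid is closest, so the category information decides where the clusters start rather than leaving that to random initialisation.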

Word Sense Disambiguation Using Korean Word Definition Vectors (한국어 단어 정의 벡터를 이용한 단어 의미 모호성 해소)

  • Park, Jeong Yeon;Lee, Jae Sung
    • Annual Conference on Human and Language Technology / 2021.10a / pp.195-198 / 2021
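  • According to previous research, word sense disambiguation improves when it uses sense vocabulary tags compressed on the basis of the hierarchical relations in a thesaurus. This paper proposes a method that creates compressed sense vocabulary tags without a thesaurus, by clustering the sense definitions of words in a Korean dictionary. It also proposes a BERT-based deep learning model that uses these tags to disambiguate word senses efficiently. In experiments on the Korean Sejong sense-tagged corpus, the proposed method achieved an F1 of 97.21%, an improvement of 1.63%p over the 95.58% F1 of the existing method.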

The Vocabulary Recognition Optimize using Acoustic and Lexical Search (음향학적 및 언어적 탐색을 이용한 어휘 인식 최적화)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of Korea Multimedia Society / v.13 no.4 / pp.496-503 / 2010
  • Speech recognition systems are developed as standalone systems, and on a mobile terminal they show a low recognition rate because of limits on memory size and audio compression. This study proposes a system that improves vocabulary recognition performance by separating acoustic search from lexical search. Acoustic search is carried out on the mobile terminal, and lexical search is carried out on the server. Feature vectors are extracted from the speech signal and used for GMM-based phoneme recognition, the recognized phoneme list is transmitted to the server, and the server performs lexical search with the Lexical Tree Search algorithm. In the performance evaluation, the system showed a vocabulary-dependent recognition rate of 98.01%, a vocabulary-independent recognition rate of 97.71%, and a recognition speed of 1.58 seconds.
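
The server-side lexical search described above can be pictured as a lookup in a prefix tree keyed by phonemes. The sketch below is an assumption about the general idea only: it does exact matching, whereas the paper's Lexical Tree Search algorithm presumably also scores imperfect phoneme lists. The lexicon and phoneme symbols are hypothetical.

```python
def build_lexical_tree(lexicon):
    """Build a prefix tree keyed by phonemes.

    lexicon: {word: [phoneme, ...]}
    Each node is a dict mapping a phoneme to a child node. The reserved
    key "#" (assumed never to be a phoneme) lists the words ending there.
    """
    root = {}
    for word, phonemes in lexicon.items():
        node = root
        for p in phonemes:
            node = node.setdefault(p, {})
        node.setdefault("#", []).append(word)
    return root

def lexical_search(tree, phonemes):
    """Return the words whose pronunciation equals the phoneme list."""
    node = tree
    for p in phonemes:
        if p not in node:
            return []  # no word starts with this phoneme prefix
        node = node[p]
    return node.get("#", [])

# Hypothetical lexicon; in the paper's setting the phoneme list would be
# produced on the mobile terminal and sent to the server.
lexicon = {
    "seoul": ["s", "eo", "u", "l"],
    "seo":   ["s", "eo"],
    "suwon": ["s", "u", "w", "o", "n"],
}
tree = build_lexical_tree(lexicon)
hits = lexical_search(tree, ["s", "eo", "u", "l"])
misses = lexical_search(tree, ["s", "a"])
```

Words that share a prefix share tree nodes, so the cost of a lookup depends on the length of the phoneme list rather than on the size of the vocabulary.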

Emotion Analysis Using a Bidirectional LSTM for Word Sense Disambiguation (양방향 LSTM을 적용한 단어의미 중의성 해소 감정분석)

  • Ki, Ho-Yeon;Shin, Kyung-shik
    • The Journal of Bigdata / v.5 no.1 / pp.197-208 / 2020
  • Lexical ambiguity means that a word can be interpreted in two or more ways, as with homonyms and polysemous words, and words expressing emotion are often ambiguous in this sense. Because such words project human psychology, they convey specific and rich contexts, which gives rise to lexical ambiguity. In this study, we propose an emotion classification model that disambiguates word senses using a bidirectional LSTM. It rests on the assumption that if the surrounding context is fully reflected, lexical ambiguity can be resolved and the emotion a sentence is meant to express can be narrowed to one. Bidirectional LSTM is frequently used in natural language processing research that requires contextual information, and it is used here to learn context. GloVe embeddings are used as the embedding layer of the model, and its performance was verified by comparison with models using LSTM and RNN algorithms. Such a framework could contribute to various fields, including marketing, where the emotions of SNS users could be linked to their desire to consume.

A Novel VLSI Architecture for Parallel Adaptive Dictionary-Base Text Compression (가변 적응형 사전을 이용한 텍스트 압축방식의 병렬 처리를 위한 VLSI 구조)

  • Lee, Yong-Doo;Kim, Hie-Cheol;Kim, Jung-Gyu
    • The Transactions of the Korea Information Processing Society / v.4 no.6 / pp.1495-1507 / 1997
  • Among the many approaches to text compression, adaptive dictionary schemes based on a sliding window have been used very frequently because of their high performance. The LZ77 algorithm is the most efficient algorithm that implements such adaptive schemes for practical text compression. This paper presents a VLSI architecture designed to process the LZ77 algorithm in parallel. Compared with other VLSI architectures developed so far, the proposed architecture offers a more viable route to high performance in terms of throughput, efficient implementation of the VLSI systolic arrays, and hardware scalability. Regardless of the size of the sliding window, the system has O(N) complexity for both compression and decompression, where N is the size of the input text, and requires only a small wafer area.
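
For readers unfamiliar with the sliding-window scheme, here is a minimal sequential LZ77 sketch. It is a software illustration only, not the parallel VLSI architecture the paper proposes. The triple format `(offset, length, next_char)` and the default window size are assumptions of this sketch, and its brute-force match search costs time proportional to the window size per position.

```python
def lz77_compress(text, window=16):
    """Encode text as (offset, length, next_char) triples.

    offset counts back from the current position into the sliding window;
    (0, 0, c) is a literal character c with no match.
    """
    out, i = [], 0
    while i < len(text):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one character early so next_char always exists.
            while (i + length < len(text) - 1
                   and text[j + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, text[i + best_len]))
        i += best_len + 1
    return out

def lz77_decompress(triples):
    """Rebuild the text from (offset, length, next_char) triples."""
    buf = []
    for off, length, ch in triples:
        start = len(buf) - off
        for k in range(length):
            buf.append(buf[start + k])  # a copy may overlap its own output
        buf.append(ch)
    return "".join(buf)

sample = "abracadabra abracadabra"
encoded = lz77_compress(sample)
decoded = lz77_decompress(encoded)
```

The inner loop over the window is the part a systolic array can evaluate in parallel, which is how hardware designs avoid the dependence on window size that this sequential version has.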

A Design and Implementation of WML Compiler for WAP Gateway for Wireless Internet Services (무선 인터넷 서비스를 위한 WAP 게이트웨이용 WML 컴파일러의 설계 및 구현)

  • Choi, Eun-Jeong;Han, Dong-Won;Lim, Kyung-Shik
    • Journal of KIISE:Computing Practices and Letters / v.7 no.2 / pp.165-182 / 2001
  • In this paper, we describe the design and implementation of a Wireless Markup Language (WML) compiler for deploying wireless Internet services effectively. The WML compiler translates textual WML decks into binary ones in order to reduce traffic on wireless links, which have low bandwidth compared with wireline links, and to mitigate the overhead of processing WML decks on wireless terminals, which have low processing power compared with fixed workstations. In addition, it takes over the overhead of the eXtensible Markup Language (XML) well-formedness and validation processes. The WML compiler consists of lexical analyzer and parser modules. The grammar for the WML parser module is an LALR(1) context-free grammar designed on the basis of the XML 1.0 and WML 1.2 DTD (Document Type Definition), with consideration of the Wireless Application Protocol Binary XML grammar. The grammar description is converted into a C program that parses the grammar by using a parser generator. Even if the tags in WML are extended or the WML DTD is upgraded, this approach remains flexible because the program is regenerated by modifying only the changed parts. We verified the functionality of the WML compiler by using a public-domain WML decompiler and by using the Nokia WAP Toolkit as a WAP client. To measure the compression gain of the WML compiler, we tested a large number of textual WML decks and obtained a maximum of 85%. Because the effect of compression diminishes as the share of general text strings grows relative to tags and attributes in a WML deck, an extended encoding method might be needed for specific applications, such as compiling WML decks that are translated dynamically from Hyper Text Markup Language documents.
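
The closing remark of the abstract above, that the gain shrinks as free text grows relative to tags, can be checked with a toy encoder. The sketch is heavily simplified: the token values are hypothetical and are NOT the real WAP Binary XML code page, the deck omits the XML declaration and DOCTYPE, and the parse events are written by hand instead of coming from an LALR(1) parser.

```python
# Hypothetical token table; NOT the real WBXML token assignments.
TAG_TOKENS = {"wml": 0x01, "card": 0x02, "p": 0x03}
END, STR = 0xFE, 0xFF  # hypothetical markers: end of element, inline string

def encode(events):
    """Encode parse events as bytes.

    events: ("start", tag) | ("end", tag) | ("text", string)
    A start tag becomes one token byte, an end tag one END byte, and
    text is emitted as STR + UTF-8 bytes + a NUL terminator.
    """
    out = bytearray()
    for kind, value in events:
        if kind == "start":
            out.append(TAG_TOKENS[value])
        elif kind == "end":
            out.append(END)
        else:
            out.append(STR)
            out += value.encode("utf-8") + b"\x00"
    return bytes(out)

def deck(text):
    """Return (textual markup, parse events) for a one-paragraph deck."""
    source = f"<wml><card><p>{text}</p></card></wml>"
    events = [("start", "wml"), ("start", "card"), ("start", "p"),
              ("text", text),
              ("end", "p"), ("end", "card"), ("end", "wml")]
    return source, events

def saving(text):
    """Fraction of bytes saved by the binary form for a given paragraph."""
    source, events = deck(text)
    return 1 - len(encode(events)) / len(source.encode("utf-8"))

short_gain = saving("Hi")       # markup dominates the deck
long_gain = saving("x" * 200)   # free text dominates the deck
```

With a two-character paragraph most of the deck is markup and the saving is large; with 200 characters of text the tags are a small share and the saving drops sharply, because this sketch copies text through uncompressed.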

A Sentence Reduction Method using Part-of-Speech Information and Templates (품사 정보와 템플릿을 이용한 문장 축소 방법)

  • Lee, Seung-Soo;Yeom, Ki-Won;Park, Ji-Hyung;Cho, Sung-Bae
    • Journal of KIISE:Software and Applications / v.35 no.5 / pp.313-324 / 2008
  • Sentence reduction is an information compression process that removes extraneous words and phrases while retaining the basic meaning of the original sentence. Most research on sentence reduction has required large lexical and syntactic resources and has focused on extracting or removing extraneous constituents such as words, phrases, and clauses through a complicated parsing process. However, this research has some problems. First, the lexical resources that can be obtained from training data are very limited. Second, it is difficult to reduce sentences in languages that lack a reliable method of syntactic parsing, because of ambiguity and exceptional expressions in sentences. To solve these problems, we propose a sentence reduction method that uses templates and POS (part-of-speech) information without a parsing process. In the proposed method, we create a new sentence using both Sentence Reduction Templates, which decide the form of the reduced sentence, and Grammatical POS-based Reduction Rules, which compose the grammatical sentence structure. In addition, we use the Viterbi algorithm with an HMM (Hidden Markov Model) to avoid the exponential computation that arises when applying the Sentence Reduction Templates. Finally, our experiments show that the proposed method achieves acceptable results in comparison with previous sentence reduction methods.
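
To show why Viterbi decoding avoids the exponential blow-up mentioned above, here is a generic Viterbi sketch over a toy HMM. The KEEP/DROP states, the POS observations, and all probabilities are hypothetical and chosen for illustration; the paper's actual states are presumably tied to its Sentence Reduction Templates.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence."""
    # best[s] = (probability, path) of the best path ending in state s.
    best = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the best predecessor for state s.
            prob, path = max(
                ((best[prev][0] * trans_p[prev][s], best[prev][1])
                 for prev in states),
                key=lambda t: t[0],
            )
            step[s] = (prob * emit_p[s][obs], path + [s])
        best = step
    return max(best.values(), key=lambda t: t[0])

# Hypothetical model: decide per word whether to keep or drop it.
states = ["KEEP", "DROP"]
start_p = {"KEEP": 0.6, "DROP": 0.4}
trans_p = {
    "KEEP": {"KEEP": 0.7, "DROP": 0.3},
    "DROP": {"KEEP": 0.6, "DROP": 0.4},
}
emit_p = {
    "KEEP": {"NOUN": 0.5, "VERB": 0.4, "ADJ": 0.05, "ADV": 0.05},
    "DROP": {"NOUN": 0.1, "VERB": 0.1, "ADJ": 0.4, "ADV": 0.4},
}
pos_tags = ["ADJ", "NOUN", "ADV", "VERB"]
probability, path = viterbi(pos_tags, states, start_p, trans_p, emit_p)
```

Enumerating every state sequence would cost S^T evaluations for S states and T words, whereas the dynamic program above does on the order of T·S² work by keeping only the best path into each state at each step.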

Multi-Document Summarization Method Based on Semantic Relationship using VAE (VAE를 이용한 의미적 연결 관계 기반 다중 문서 요약 기법)

  • Baek, Su-Jin
    • Journal of Digital Convergence / v.15 no.12 / pp.341-347 / 2017
  • As the amount of document data increases, users need summarized information to understand documents. However, existing document summarization methods rely on overly simple statistics, and there has been insufficient research on multi-document summarization that addresses sentence ambiguity and generates meaningful sentences. In this paper, we investigate semantic connection and a preprocessing step that handles unnecessary information. Based on lexical semantic pattern information, we propose a multi-document summarization method that uses a VAE to strengthen the semantic connectivity between sentences. Using sentence word vectors, we reconstruct sentences after learning from the compressed information and the attribute discriminators generated as latent variables, and semantic connection processing generates a natural summary sentence. Compared with other document summarization methods, the proposed method showed a small improvement in performance, which indicates that semantic sentence generation and connectivity can be increased. In future work, we will study how to extend semantic connections by experimenting with various attribute settings.

An Extraction Algorithm of Compound Field-associated Terms for Korean Document Classifications (한글문서 분류용으로 이용할 복합어로 구성된 분야연상어의 추출법)

  • Lee, Samuel Sang-kon
    • Journal of KIISE:Software and Applications / v.32 no.7 / pp.636-649 / 2005
  • Field-associated terms themselves carry field information, so they determine the field of a document much as a human reader perceives it. For Korean, we organized and tested such terms by collecting approximately 15,999 documents classified into 180 fields. Extraction precision was high: 88,782 single field-associated terms were contracted into 8,405, a compression rate of approximately 9%, with recall above 0.77 (average 0.85) and precision above 0.90 (average 0.94). Applying the established field-associated terms to the initial determination of a document's field and comparing the result with field determination by humans gave more than approximately 90% correct answers. The results can serve as groundwork for document retrieval in multilingual environments and for multilingual information retrieval.
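
The compression rate quoted above can be reproduced from the two term counts in the abstract. The snippet assumes that "compression rate" means the number of terms after contraction divided by the number before, which matches the reported figure.

```python
single_terms = 88_782      # single field-associated terms before contraction
contracted_terms = 8_405   # terms remaining after contraction

# Assumed definition: size after contraction relative to size before.
compression_rate = contracted_terms / single_terms
percent = round(compression_rate * 100, 1)
```

Under that definition the ratio comes to about 9.5%, consistent with the abstract's "approximately 9%".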

Hwaseo Lee Hang-ro's View of Scholarship and the World of his Poetry (화서 이항로의 학문관과 시세계)

  • Lee, Hoon
    • (The)Study of the Eastern Classic / no.69 / pp.259-296 / 2017
  • This thesis examines Hwaseo's view of scholarship, which was established through various academic roots, and explains the writing techniques of his poetic world. The results are as follows. With roots including the Five Books of Confucianism, the Five Classics, the books of Zhuxi, and Songjadaejeon, Hwaseo established his own scholarship based on the principle of 'keeping of piety and acting with prudence(持敬致愼)'. He pursued a practical discipline in parallel with 'understanding the utmost principle and extension of knowledge(致知)' and 'diligent self-cultivation(力行)'. The characteristic techniques of his poetry are the use of quotation and the interweaving of narrative with discussion. He did not stop at borrowing or varying quotations, but gave them new meanings beyond the accepted ones, and even went as far as coining new words. He composed pithy narratives around historical events and the deeds of historical figures, put forward discussion, and then expressed his emotion. In particular, there are poems that interweave narrative with discussion on the basis of Mencius's historical view of 'repetition of peaceful times and troublous times one by one(一治一亂)', which can be regarded as the works most representative of his literary value as well as the essence of his scholarship and ideas.