• Title/Summary/Keyword: 자질 생성 (feature generation)

Search Result 102, Processing Time 0.031 seconds

Morpheme Recovery Based on Naïve Bayes Model (NB 모델을 이용한 형태소 복원)

  • Kim, Jae-Hoon;Jeon, Kil-Ho
    • The KIPS Transactions:PartB / v.19B no.3 / pp.195-200 / 2012
  • In Korean morphological analysis, surface forms altered by spelling changes must be recovered into their base forms, and because Korean is agglutinative, part-of-speech (POS) tagging is difficult without such analysis. This is a notorious problem in Korean morphological analysis and has traditionally been handled by morpheme recovery rules, which generate morphological ambiguity that is then resolved by POS tagging. In this paper, we propose a morpheme recovery scheme based on machine learning methods such as Naïve Bayes models. The input features of the models are the surrounding context of the syllable in which the spelling change occurs, and the categories of the models are the recovered syllables. The POS tagging system with the proposed model achieved an $F_1$-score of 97.5% on the ETRI tree-tagged corpus. We therefore conclude that the proposed model is very useful for morpheme recovery in Korean.
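
    The classification setup described in this abstract (context of the changed syllable as features, recovered syllables as categories) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class `NaiveBayes`, the helper `context_features`, the window size, the add-one smoothing, and the three toy training words are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Naive Bayes over categorical features with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.vocab = set()

    def fit(self, samples):
        for features, label in samples:
            self.class_counts[label] += 1
            for f in features:
                self.feature_counts[label][f] += 1
                self.vocab.add(f)

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in features:
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def context_features(syllables, i, window=1):
    """Position-tagged syllables around index i, including syllable i itself."""
    feats = []
    for offset in range(-window, window + 1):
        j = i + offset
        s = syllables[j] if 0 <= j < len(syllables) else "<PAD>"
        feats.append(f"{offset}:{s}")
    return feats


# Hypothetical toy data: (word, index of changed syllable, recovered form).
toy = [("공부했다", 2, "하+었"), ("말했다", 1, "하+었"), ("갔다", 0, "가+았")]
model = NaiveBayes()
model.fit([(context_features(list(w), i), label) for w, i, label in toy])

prediction = model.predict(context_features(list("일했다"), 1))
```

    On this toy data the model picks the recovered form whose training contexts best match the surface syllable and its neighbours; a real system would train on an annotated corpus.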

An Experimental Study on Opinion Classification Using Supervised Latent Semantic Indexing(LSI) (지도적 잠재의미색인(LSI)기법을 이용한 의견 문서 자동 분류에 관한 실험적 연구)

  • Lee, Ji-Hye;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.26 no.3 / pp.451-462 / 2009
  • The aim of this study is to apply latent semantic indexing (LSI) techniques to the efficient automatic classification of opinionated documents. For the experiments, we collected 1,000 opinionated documents such as reviews and news, with 500 labelled as positive and the remaining 500 as negative. Sets of content words and sentiment words were extracted using a POS tagger in order to identify the optimal feature set for opinion classification. The findings showed that LSI techniques were more effective than a term indexing method in sentiment classification. The best performance was achieved by a supervised LSI technique.

Extraction of Relationships between Scientific Terms based on Composite Kernels (혼합 커널을 활용한 과학기술분야 용어간 관계 추출)

  • Choi, Sung-Pil;Choi, Yun-Soo;Jeong, Chang-Hoo;Myaeng, Sung-Hyon
    • Journal of KIISE:Computing Practices and Letters / v.15 no.12 / pp.988-992 / 2009
  • In this paper, we attempt to extract binary relations between terms using composite kernels consisting of convolution parse tree kernels and WordNet verb synset vector kernels, which capture the semantic relationships between two entities in a sentence. To evaluate the performance of our system, we used three domain-specific test collections. The experimental results demonstrate the superiority of our system on all the targeted collections. In particular, the 8% increase in F1 on KREC 2008 shows that the core contexts around the entities play an important role in boosting the overall performance of relation extraction.

A Semi-automatic Construction method of a Named Entity Dictionary Based on Wikipedia (위키피디아 기반 개체명 사전 반자동 구축 방법)

  • Song, Yeongkil;Jeong, Seokwon;Kim, Harksoo
    • Journal of KIISE / v.42 no.11 / pp.1397-1403 / 2015
  • A named entity (NE) dictionary is an important resource for the performance of NE recognition. However, it is not easy to construct an NE dictionary manually, since human annotation is time-consuming and labor-intensive. To save construction time and reduce human labor, we propose a semi-automatic system for the construction of an NE dictionary. The proposed system constructs a pseudo-document with Wiki-categories per NE class by using an active learning technique. Then, it calculates similarities between Wiki entries and pseudo-documents using the BM25 model, a well-known information retrieval model. Finally, it classifies each Wiki entry into NE classes based on the similarities. In experiments with three different types of NE class sets, the proposed system showed high performance (a macro-average F1-score of 0.9028 and a micro-average F1-score of 0.9554).
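
    The similarity step in this abstract can be illustrated with a small Okapi BM25 scorer. This is a sketch under stated assumptions, not the system from the paper: the function name `bm25_scores`, the parameter values `k1=1.2` and `b=0.75`, the idf variant, and the toy pseudo-documents and entry categories are all hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(d) for d in documents) / n_docs
    doc_freq = Counter()
    for d in documents:
        doc_freq.update(set(d))  # number of documents containing each term
    scores = []
    for d in documents:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical pseudo-documents: one bag of Wiki-category words per NE class.
pseudo_docs = {
    "PERSON": ["births", "actors", "singers", "people"],
    "LOCATION": ["cities", "countries", "rivers", "capitals"],
    "ORGANIZATION": ["companies", "universities", "agencies", "bands"],
}
entry_categories = ["capitals", "cities"]  # categories of one hypothetical Wiki entry

scores = bm25_scores(entry_categories, list(pseudo_docs.values()))
predicted = max(zip(pseudo_docs, scores), key=lambda pair: pair[1])[0]
```

    Here the Wiki entry is treated as the query and each class's pseudo-document as a candidate; the class with the highest score is taken as the label.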

Voice-to-voice conversion using transformer network (Transformer 네트워크를 이용한 음성신호 변환)

  • Kim, June-Woo;Jung, Ho-Young
    • Phonetics and Speech Sciences / v.12 no.3 / pp.55-63 / 2020
  • Voice conversion can be applied to various voice processing applications. It can also play an important role in data augmentation for speech recognition. The conventional method combines voice conversion with a speech synthesis architecture, using the Mel filter bank as the main parameter. The Mel filter bank is well suited to fast neural network computation but cannot be converted into a high-quality waveform without the aid of a vocoder. Further, it is not effective for obtaining data for speech recognition. In this paper, we focus on performing voice-to-voice conversion using only the raw spectrum. We propose a deep learning model based on the transformer network, which quickly learns the voice conversion properties using an attention mechanism between source and target spectral components. The experiments were performed on TIDIGITS data, a series of numbers spoken by an English speaker. The converted voices were evaluated for naturalness and similarity using the mean opinion score (MOS) obtained from 30 participants. Our final results yielded 3.52±0.22 for naturalness and 3.89±0.19 for similarity.

An Attention Method-based Deep Learning Encoder for the Sentiment Classification of Documents (문서의 감정 분류를 위한 주목 방법 기반의 딥러닝 인코더)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • KIISE Transactions on Computing Practices / v.23 no.4 / pp.268-273 / 2017
  • Recently, deep learning encoder-based approaches have been actively applied in the field of sentiment classification. However, the Long Short-Term Memory (LSTM) network encoder, the most commonly used architecture, yields lower-quality vector representations as documents grow longer. In this study, for effective classification of sentiment documents, we suggest an attention method-based deep learning encoder that generates a document vector representation as a weighted sum of the LSTM outputs, weighted by importance. In addition, we propose two modifications that adapt the attention method-based encoder to sentiment classification: a window attention method and an attention weight adjustment. In the window attention method, the weights are obtained in window units to effectively recognize sentiment features that consist of more than one word. In the attention weight adjustment, the learned weights are smoothed. Experimental results show that the proposed method outperforms the LSTM network encoder, achieving an accuracy of 89.67%.
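
    The weighted-sum idea in this abstract can be sketched in a few lines. The abstract does not give the exact formulation, so everything below is one possible reading: the attention scores are supplied by hand rather than learned, `window_scores` averages neighbouring scores as a stand-in for the window attention method, and all function names and numbers are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_scores(scores, window=3):
    """Average each score with its neighbours so adjacent words share weight."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def attend(hidden_states, scores):
    """Document vector as the attention-weighted sum of per-word hidden states."""
    weights = softmax(scores)
    dim = len(hidden_states[0])
    return [sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)]


# Hypothetical encoder outputs for a three-word document (2-dimensional states).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
raw_scores = [0.0, 3.0, 0.0]

doc_vector = attend(hidden_states, window_scores(raw_scores))
```

    With equal scores the result reduces to the plain average of the hidden states; uneven scores shift the document vector toward the more important words.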

A Study on the Reliable Video Transmission Through Source/Channel Combined Optimal Quantizer for EREC Based Bitstream (EREC 기반 비트열을 위한 Source-Channel 결합 최적 양자화기 설계 및 이를 통한 안정적 영상 전송에 관한 연구)

  • 김용구;송진규;최윤식
    • The Journal of Korean Institute of Communications and Information Sciences / v.25 no.12B / pp.2094-2108 / 2000
  • Demand for multimedia data applications over error-prone communication networks has recently surged. Implementing them, however, raises many problems, one of which is handling errors that occur in transmitted video data. The difficulty is that errors in a compressed bitstream propagate severely in the spatial and temporal directions of the video. To mitigate this error propagation, this paper applies an error-limiting technique known as EREC and analyzes the error propagation characteristics of the applied EREC. From this analysis, we estimated the probability that a compression-coded basic block (macroblock) is decoded erroneously and, by approximating the estimated probability, predicted the video quality degradation at both ends (transmitter and receiver). The approximation models the probability that a basic block is decoded erroneously as a function of the number of bits generated in that block, using linear regression with a simple first-order equation, so the quality degradation at both ends could be predicted effectively with a simple method. To make the coded bitstream more robust to transmission errors, we applied the quality degradation model developed in this paper to quantizer selection and thereby proposed a new optimal quantization technique. Unlike existing quantizer optimization techniques, the proposed technique performs quantization so that the quality of the reconstructed video at the decoder is optimal at a given bit rate. Experimental results of applying the proposed quantization technique to the H.263 video compression standard confirm that it greatly improves both objective and subjective quality at the cost of a very small computational load.
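
    The first-order linear regression mentioned in this abstract, which relates the number of bits in a block to its probability of erroneous decoding, can be illustrated with an ordinary least-squares line fit. This is only a sketch: the function `fit_line` and the sample values are hypothetical and are not measurements from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: bits generated per block vs. estimated error probability.
bits_per_block = [100, 200, 300, 400]
error_probability = [0.02, 0.04, 0.06, 0.08]

slope, intercept = fit_line(bits_per_block, error_probability)

# Predicted error probability for a block that produces 250 bits.
predicted = slope * 250 + intercept
```

    Once such a line is fitted, the encoder side can estimate a block's error probability from its bit count alone, which is what makes the degradation model cheap enough to use during quantizer selection.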


Multiple Cause Model-based Topic Extraction and Semantic Kernel Construction from Text Documents (다중요인모델에 기반한 텍스트 문서에서의 토픽 추출 및 의미 커널 구축)

  • 장정호;장병탁
    • Journal of KIISE:Software and Applications / v.31 no.5 / pp.595-604 / 2004
  • Automatic analysis of concepts or semantic relations from text documents enables not only efficient acquisition of relevant information but also comparison of documents at the concept level. We present a multiple cause model-based approach to text analysis, in which latent topics are automatically extracted from document sets and the similarity between documents is measured by semantic kernels constructed from the extracted topics. In our approach, a document is assumed to be generated by various combinations of underlying topics. A topic is defined by a set of words that are related to the same theme or co-occur frequently within a document. In a network representing a multiple-cause model, each topic is identified by a group of words having high connection weights from a latent node. To facilitate learning and inference in multiple-cause models, some approximation methods are required, and we use an approximation by Helmholtz machines. In an experiment on the TDT-2 data set, we extract sets of meaningful words in which each set contains some theme-specific terms. Using semantic kernels constructed from latent topics extracted by multiple cause models, we also achieve significant improvements over the basic vector space model in terms of retrieval effectiveness.
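
    A semantic kernel built from topics can be sketched as a dot product taken after projecting documents into topic space. The abstract does not specify the kernel's exact form, so this is an assumed, commonly used construction; the vocabulary, the topic-word weights, and the function names are all hypothetical, and the topics here are written by hand rather than learned with a multiple cause model.

```python
def project(doc_vector, topic_word):
    """Map a bag-of-words vector to topic space (one value per topic)."""
    return [sum(w * x for w, x in zip(topic, doc_vector)) for topic in topic_word]


def semantic_kernel(d1, d2, topic_word):
    """Similarity of two documents measured in topic space."""
    p1, p2 = project(d1, topic_word), project(d2, topic_word)
    return sum(a * b for a, b in zip(p1, p2))


def plain_dot(d1, d2):
    """Basic vector space similarity, for comparison."""
    return sum(a * b for a, b in zip(d1, d2))


# Hypothetical vocabulary order: election, vote, goal, match.
topic_word = [
    [1.0, 1.0, 0.0, 0.0],  # hand-written "politics" topic
    [0.0, 0.0, 1.0, 1.0],  # hand-written "sports" topic
]
doc_election = [1, 0, 0, 0]
doc_vote = [0, 1, 0, 0]
doc_goal = [0, 0, 1, 0]

same_topic = semantic_kernel(doc_election, doc_vote, topic_word)
different_topic = semantic_kernel(doc_election, doc_goal, topic_word)
```

    The two politics documents share no words, so their plain dot product is zero, yet the topic-space kernel still treats them as similar; that is the kind of gain over the basic vector space model that the abstract reports.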

A Comparative Study of Text analysis and Network embedding Methods for Effective Fake News Detection (효과적인 가짜 뉴스 탐지를 위한 텍스트 분석과 네트워크 임베딩 방법의 비교 연구)

  • Park, Sung Soo;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.5 / pp.137-143 / 2019
  • Fake news is a form of misinformation that takes advantage of the rapid spread of information on media platforms that users interact with, such as social media. The recent increase in fake news has caused many social problems. In this paper, we propose a method to detect such fake news. Previous research on fake news detection has mainly focused on text analysis. This research focuses on the network over which social media news spreads, generates features with DeepWalk, a network embedding method, and classifies fake news using logistic regression analysis. We conducted an experiment on fake news detection using 211 news articles on the Internet and 1.2 million items of news diffusion network data. The results show that the accuracy of fake news detection using network embedding is 10.6% higher than that of text analysis. In addition, fake news detection that combines text analysis and network embedding does not show an increase in accuracy over network embedding alone. The results of this study can be effectively applied to the detection of fake news that organizations spread online.
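
    DeepWalk learns node embeddings by sampling truncated random walks over a graph and feeding them to a skip-gram model. The sketch below covers only the walk-sampling stage; the skip-gram training and the logistic regression classifier are omitted. The function `random_walks`, its parameters, and the tiny diffusion graph are hypothetical and do not come from the paper.

```python
import random


def random_walks(graph, walks_per_node=2, walk_length=5, seed=0):
    """Sample fixed-length random walks starting from every node."""
    rng = random.Random(seed)
    nodes = list(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical diffusion graph: news items linked to the users who shared them.
graph = {
    "news1": ["userA", "userB"],
    "news2": ["userA"],
    "userA": ["news1", "news2"],
    "userB": ["news1"],
}

walks = random_walks(graph)
```

    Each walk plays the role of a sentence whose words are node identifiers; training a skip-gram model on these walks would yield one vector per node, and the vectors of the news nodes could then serve as classifier features.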

The Centra of the aesthetics and the aspect of existence of the Sijo in the Choseon Dynasty (조선조(朝鮮朝) 시조(時調)의 미적주체(美的主體)와 그 존재양상(存在樣相))

  • Choi, Dong-Kook
    • Sijohaknonchong / v.23 / pp.107-131 / 2005
  • The purpose of this study is to establish what the scholars of the Choseon Dynasty regarded as the object of aesthetics and which psychological elements they used to clarify the center of aesthetics. In addition, I consider how they understood the human aesthetic point of view. The literary men of Choseon regarded the five senses as an obstacle that prevented one from seeing the nature of all things. They therefore set up a spiritual condition as their aesthetic subject, suppressing the pleasure gained through the five senses. The goal of calming one's mind was to let one's spirit act freely. The condition of calmness was considered a process of recovering the natural human spirit, which was connected to the poetic spirit and became the aesthetic subject. At the same time, the literary men of that time set up the condition of calmness as the subject of objective consciousness and regarded common things as objective existences, so that all the materials of the poem and the poet could be unified. This aesthetic center put a premium on the feelings aroused by contact with natural things and adopted 'Xing' as a method of expression. 'Xing' is not a feeling based on one's experience but a feeling that arises of itself. The expression method of 'Xing' led readers to feel, express, encourage, and sublimate what they felt through the poem, and also purified their minds.
