• Title/Summary/Keyword: word similarity

Topic based Web Document Clustering using Named Entities (개체명을 이용한 주제기반 웹 문서 클러스터링)

  • Sung, Ki-Youn;Yun, Bo-Hyun
    • The Journal of the Korea Contents Association / v.10 no.5 / pp.29-36 / 2010
  • Past clustering research has focused on extracting keywords to group documents by word similarity. However, the large number of candidates that must be compared and computed leads to high complexity, low speed, and low accuracy. To overcome these weaknesses, this paper proposes a topic-based web document clustering model that uses not only keywords but also named entities such as person names, organizations, and locations. Through several experiments, we demonstrate the effectiveness of our model compared with a traditional keyword-only model and analyze how the effects differ according to the characteristics of the document collection.
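To make the idea of mixing keyword and named-entity evidence concrete, here is a minimal sketch, not the authors' model. It assumes each document has already been reduced to a keyword bag (`kw`) and a named-entity bag (`ne`), combines their cosine similarities with a hypothetical weight `alpha`, and groups documents with a simple single-pass threshold clustering; `alpha`, `threshold`, and the clustering scheme are all illustrative assumptions.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def doc_similarity(d1: dict, d2: dict, alpha: float = 0.5) -> float:
    """Weighted mix of keyword and named-entity similarity (alpha is an assumed weight)."""
    return alpha * cosine(d1["kw"], d2["kw"]) + (1 - alpha) * cosine(d1["ne"], d2["ne"])

def cluster(docs: list, threshold: float = 0.3, alpha: float = 0.5) -> list:
    """Single-pass clustering: join the most similar cluster seed, else open a new cluster."""
    clusters = []  # each cluster is a list of doc indices; index 0 is the seed document
    for i, d in enumerate(docs):
        best, best_sim = None, threshold
        for c in clusters:
            sim = doc_similarity(docs[c[0]], d, alpha)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters
```

Two documents that share few keywords but mention the same entities (for example the same city and organization) can still land in one cluster, which is the kind of effect the abstract attributes to named entities.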

Speaker Adaptation Using i-Vector Based Clustering

  • Kim, Minsoo;Jang, Gil-Jin;Kim, Ji-Hwan;Lee, Minho
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.7 / pp.2785-2799 / 2020
  • We propose a novel speaker adaptation method using acoustic model clustering. The similarity between speakers is defined by the cosine distance between their i-vectors (intermediate vectors), and various efficient clustering algorithms are applied to obtain speaker subsets with different characteristics. The speaker-independent model is then retrained with the training data of each speaker subset produced by the clustering, and an unknown utterance is recognized by the retrained model of the closest cluster. The proposed method is applied to a large-scale speech recognition system implemented with a hybrid hidden Markov model and deep neural network framework. Word error rates were evaluated on the Resource Management database. Compared with the conventional speaker-independent speech recognition model, the proposed i-vector-based clustering adaptation improved performance by up to 12.2% relative for the conventional fully connected neural network and by up to 10.5% relative for the bidirectional long short-term memory.
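The clustering and cluster-selection steps described above can be sketched as follows. This is an illustrative assumption, not the paper's pipeline: it uses spherical k-means (k-means on length-normalized vectors with cosine similarity) as one example of "various efficient clustering algorithms", plain Python lists stand in for real i-vectors, and the acoustic-model retraining itself is not shown.

```python
import math
import random

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def spherical_kmeans(ivecs, k, iters=20, seed=0):
    """Cluster speakers' i-vectors by cosine similarity; returns labels and unit centroids."""
    rng = random.Random(seed)
    data = [normalize(v) for v in ivecs]
    centroids = [list(c) for c in rng.sample(data, k)]
    labels = [0] * len(data)
    for _ in range(iters):
        # Assignment: each speaker joins the centroid with the highest cosine similarity.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in data]
        # Update: mean direction of each cluster's members, re-normalized to unit length.
        for j in range(k):
            members = [v for v, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = normalize([sum(col) / len(members) for col in zip(*members)])
    return labels, centroids

def closest_cluster(ivec, centroids):
    """Index of the cluster whose retrained model would decode an unseen utterance."""
    return max(range(len(centroids)), key=lambda j: cosine(ivec, centroids[j]))
```

In a full system, each label group would supply the training data used to retrain a copy of the speaker-independent model, and `closest_cluster` would choose which retrained model decodes a new utterance.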

Integrated Clustering Method based on Syntactic Structure and Word Similarity for Statistical Machine Translation (문장구조 유사도와 단어 유사도를 이용한 클러스터링 기반의 통계기계번역)

  • Kim, Hankyong;Na, Hwi-Dong;Li, Jin-Ji;Lee, Jong-Hyeok
    • Annual Conference on Human and Language Technology / 2009.10a / pp.44-49 / 2009
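  • In statistical machine translation, one way to improve performance is to perform domain-specific translation, and to this end sentences are clustered by sentence type or genre. However, no previous study has used sentence-type information and genre information at the same time. In this paper, we present a new technique that classifies sentences by type according to the syntactic structural similarity between them, and we distinguish document genres using word-similarity information, thereby integrating the two existing techniques. Statistical machine translation performance was improved by interpolating between models extracted from the classified corpora and a model extracted from the entire corpus. A kernel was applied to compute syntactic structure similarity and cosine similarity was applied to compute word similarity, and the corpus was partitioned using both similarities with a machine learning technique similar to the K-Means algorithm. Experiments on Japanese-English patent documents yielded a relative performance improvement of about 2.5% in the best case.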
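A minimal sketch of two ingredients named above, under stated assumptions: routing an input sentence to a cluster by cosine word similarity, and linearly interpolating a cluster-specific model with the whole-corpus model. The syntactic-structure kernel is omitted, models are represented as plain probability dictionaries (keys could be phrase pairs), and the weight `lam` is a hypothetical value that would normally be tuned on held-out data.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def route(tokens, cluster_profiles):
    """Pick the cluster whose word profile is most similar (word similarity only)."""
    vec = Counter(tokens)
    return max(cluster_profiles, key=lambda name: cosine(vec, cluster_profiles[name]))

def interpolate(p_cluster: dict, p_general: dict, lam: float = 0.5) -> dict:
    """Linear interpolation: lam * p_cluster(x) + (1 - lam) * p_general(x)."""
    keys = p_cluster.keys() | p_general.keys()
    return {x: lam * p_cluster.get(x, 0.0) + (1 - lam) * p_general.get(x, 0.0) for x in keys}
```

Interpolating, rather than using the cluster model alone, keeps probability mass for entries the smaller clustered corpus never saw.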

Analyzing Customer Feedback Differences between VOCs and External Channels (VOC와 외부채널간의 고객 피드백 차이 분석)

  • Ahn, Sang Hyeon;Baek, Dong Hyun
    • Journal of Korean Society of Industrial and Systems Engineering / v.41 no.3 / pp.129-137 / 2018
  • VOCs (voice of the customer) have been used as the most definitive resource for reflecting customer feedback when developing products and services. However, with the development of the Internet and the emergence of SNS, VOC is no longer the only channel that represents customer opinions, and a number of studies show that many customers express complaints through channels other than VOCs. In this paper, we analyze the differences between official VOC data and data collected through external channels, and suggest ways to reflect the various opinions of customers. To do this, the study uses keyword analysis through social network analysis to identify frequency-based differences, modularity analysis to distinguish topics according to centrality and similarity, and sentiment analysis to determine word polarity (positive and negative). The results show that customer opinions differed between VOCs and external channels. In particular, results can vary with channel characteristics even among channels of the same type, so collecting VOC data only from certain channels may not capture what customers actually require. Therefore, data collected through external channels as well as VOCs must be used to reflect the full range of customer feedback.
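As a rough illustration of comparing channels by keyword frequency and word polarity, the sketch below computes a per-channel frequency table, a mean lexicon polarity, and the keywords whose relative frequency differs most. It is an assumption-laden simplification: the tiny `POLARITY` lexicon is hypothetical, whitespace tokenization stands in for real morphological analysis, and the social-network and modularity analyses are not shown.

```python
from collections import Counter

# Hypothetical polarity lexicon; a real study would use a curated sentiment dictionary.
POLARITY = {"good": 1, "fast": 1, "kind": 1, "slow": -1, "broken": -1, "rude": -1}

def channel_profile(posts):
    """Keyword frequencies and mean lexicon polarity for one channel's posts."""
    freq = Counter(tok for post in posts for tok in post.lower().split())
    scored = [POLARITY[t] for t in freq.elements() if t in POLARITY]
    polarity = sum(scored) / len(scored) if scored else 0.0
    return freq, polarity

def top_differences(freq_a, freq_b, n=3):
    """Keywords whose relative frequency differs most between two channels."""
    total_a = sum(freq_a.values()) or 1
    total_b = sum(freq_b.values()) or 1
    diff = {t: freq_a[t] / total_a - freq_b[t] / total_b for t in freq_a.keys() | freq_b.keys()}
    return sorted(diff, key=lambda t: abs(diff[t]), reverse=True)[:n]
```

Normalizing by each channel's total token count keeps a large channel from dominating the comparison simply because it has more posts.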

Design and Implementation of Computational Model Simulating Language Phenomena in Lexical Decision Task (어휘판단 과제 시 보이는 언어현상의 계산주의적 모델 설계 및 구현)

  • Park, Kinam;Lim, Heuiseok;Nam, Kichun
    • The Journal of Korean Association of Computer Education / v.9 no.2 / pp.89-99 / 2006
  • This paper proposes a computational model that simulates distinctive language phenomena observed in the human lexical decision task (LDT). The model is designed to mimic major language phenomena such as the frequency effect, the lexical status effect, word similarity, and the semantic priming effect. The experimental results show that the proposed model replicated these phenomena and performed similarly to humans in the LDT.

A Study on Analysis of Source Code for Program Protection in ICT Environment (ICT 환경에서 프로그램보호를 위한 소스코드 분석 사례 연구)

  • Lee, Seong-Hoon;Lee, Dong-Woo
    • Journal of Convergence for Information Technology / v.7 no.4 / pp.69-74 / 2017
  • ICT (Information and Communication Technology) is a keyword in today's society. Various government support programs have brought many quantitative and qualitative changes to the software industry. Software consists of instructions (computer programs) and data structures, and can be divided into application programs and system programs. Application programs are developed to perform special functions or to provide entertainment. With the rapid growth of the software industry, program copyright has become a significant issue. In this paper, we describe a method for analyzing program similarity based on source code.
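One common way to measure source-code similarity, shown here only as a sketch and not as the authors' analysis method, is to tokenize both programs, discard comments and layout, replace identifiers with a placeholder so that renaming does not hide copying, and compare the token sequences. The example assumes Python source and uses the standard-library `tokenize` and `difflib.SequenceMatcher`.

```python
import difflib
import io
import keyword
import tokenize

# Token types that carry layout or commentary rather than program structure.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}

def code_tokens(source: str) -> list:
    """Token stream of Python source with comments/layout dropped and identifiers abstracted."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")  # renamed variables or functions still compare as equal
        else:
            out.append(tok.string)
    return out

def program_similarity(src_a: str, src_b: str) -> float:
    """Similarity in [0, 1] from difflib's matching-block ratio over token sequences."""
    return difflib.SequenceMatcher(None, code_tokens(src_a), code_tokens(src_b)).ratio()
```

For example, `def add(x, y): return x + y` and a copy with renamed identifiers and an added comment score 1.0, while an unrelated one-line assignment scores low.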

A Method to Measure the Self-Supplied News Volumes of Internet Newspaper Company

  • Kim, Dong-Joo;Lee, Won Joo
    • Journal of the Korea Society of Computer and Information / v.20 no.10 / pp.99-105 / 2015
  • The growth of internet infrastructure and the tremendous increase in internet users have led to the founding of many internet newspaper companies that dig up and publish their own news articles. Despite this quantitative growth, however, their qualitative growth has not kept pace. Therefore, to require social responsibility and build a healthy media environment, the Korean government has enforced a registration system for internet newspaper companies. Under this system, internet newspaper companies must produce in-house more than 30 percent of their weekly publications, which increases the need to verify compliance. This paper investigates technologies for measuring the self-supplied news volumes of internet newspaper companies, examines their validity, and presents an appropriate measurement method. To compare huge numbers of news articles rapidly, the presented method is based on a modified edit distance that reflects human cognition of words and related empirical information. To demonstrate the correctness of the method, we show experimental results on real internet news articles.
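The core measurement can be sketched with an ordinary word-level edit distance. This is the standard, unmodified Levenshtein distance over word tokens, not the paper's modified version, and the near-copy `threshold` of 0.3 is a hypothetical value. An article counts as self-supplied when it is not a near-copy of any external article, and the resulting ratio could then be checked against the 30 percent requirement.

```python
def word_edit_distance(a: list, b: list) -> int:
    """Levenshtein distance over word tokens (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete wa
                           cur[j - 1] + 1,              # insert wb
                           prev[j - 1] + (wa != wb)))   # substitute (free if words match)
        prev = cur
    return prev[-1]

def normalized_distance(a: list, b: list) -> float:
    """Edit distance scaled to [0, 1] by the longer article's word count."""
    return word_edit_distance(a, b) / max(len(a), len(b), 1)

def self_supply_ratio(own: list, external: list, threshold: float = 0.3) -> float:
    """Share of a company's articles that are not near-copies of any external article."""
    own_made = sum(
        all(normalized_distance(o.split(), e.split()) > threshold for e in external)
        for o in own
    )
    return own_made / len(own) if own else 0.0
```

Comparing at the word level rather than the character level keeps each comparison short, which matters when huge numbers of article pairs must be checked.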

Comparative Study on Consumers' Perceptive Attitude and Origins of 'Tattoo' and 'Moonsin' (태투(Tattoo)와 문신(文身)에 관한 소비자인지도 및 유래에 나타난 차이점 비교)

  • Song, Nam-Kyung;Park, Sook-Hyun
    • Journal of the Korean Society of Clothing and Textiles / v.31 no.1 s.160 / pp.107-118 / 2007
  • The purpose of this study is to examine, through empirical field research, the confused use of the terms 'tattoo' and 'moonsin'. The paper investigates the differences in the origins and etymological meanings of 'tattoo' and 'moonsin' by examining the related literature. By clarifying the definitions of 'tattoo' and 'moonsin', this research aims to help fashion consumers use these terms distinctly. To determine consumers' perceptive attitudes, this study conducted a questionnaire survey and analyzed how frequently the two terms are used. 1. The term-preference survey shows that consumers prefer 'tattoo' to 'moonsin'; however, a considerable number of them use the two terms indiscriminately. 2. The study of perceptions shows that 'tattoo' is often associated with positive images (fashionable, charming, and sexy), whereas 'moonsin' is associated with negative ones (violent, anti-social, and demonic). 3. 'Tattoo' and 'moonsin' share the practice of engraving patterns on the skin and coloring them. 4. 'Tattoo' is originally derived from the Polynesian word 'tatau', which means 'artistic'. 'Tatau' is a kind of ethnic art practiced on the skin of Polynesian people, and its design patterns and techniques are very similar to those on the Polynesian earthenware called 'Lapita'.

Research on Subjective-type Grading System Using Syntactic-Semantic Tree Comparator (구문의미트리 비교기를 이용한 주관식 문항 채점 시스템에 대한 연구)

  • Kang, WonSeog
    • The Journal of Korean Association of Computer Education / v.21 no.6 / pp.83-92 / 2018
  • Subjective (open-ended) questions are appropriate for evaluating deep thinking, but they are not easy to score. Because different graders can produce different scores even under the same scoring criteria, an objective automatic evaluation system is needed. However, such a system faces problems in Korean language analysis and answer comparison. This paper proposes a Korean syntactic analysis method and a subjective-question grading system that uses a syntactic-semantic tree comparator. The system is a hybrid that combines word-based grading with syntactic-semantic tree-based grading, and it grades answers to subjective questions using the syntactic-semantic tree comparator. The proposed system showed good results. It can be utilized for Korean syntactic-semantic analysis, subjective question grading, and document classification.
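To show why a hybrid of word-based and tree-based scoring helps, here is a hypothetical sketch, not the paper's comparator. It assumes each parsed answer is reduced to a set of (head, relation, dependent) triples, scores word overlap and triple overlap with F1, and mixes them with an assumed weight `w`; the triple representation, `w`, and `max_points` are all illustrative.

```python
def f1_overlap(reference: set, answer: set) -> float:
    """F1 of set overlap: 1.0 when identical, 0.0 when disjoint or either set is empty."""
    if not reference or not answer:
        return 0.0
    common = len(reference & answer)
    if common == 0:
        return 0.0
    precision, recall = common / len(answer), common / len(reference)
    return 2 * precision * recall / (precision + recall)

def grade(ref_words, ans_words, ref_triples, ans_triples, max_points=10, w=0.4):
    """Hybrid score: w * word overlap + (1 - w) * tree-triple overlap, scaled to max_points."""
    word_score = f1_overlap(set(ref_words), set(ans_words))
    tree_score = f1_overlap(set(ref_triples), set(ans_triples))
    return round(max_points * (w * word_score + (1 - w) * tree_score), 2)
```

An answer such as "man bites dog" graded against "dog bites man" has identical words but swapped subject and object triples, so it receives only the word-based share of the points (4.0 of 10 with these assumed settings).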

Malware API Classification Technology Using LSTM Deep Learning Algorithm (LSTM 딥러닝 알고리즘을 활용한 악성코드 API 분류 기술 연구)

  • Kim, Jinha;Park, Wonhyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.259-261 / 2022
  • Recently, malicious code no longer relies on a single technique; instead, several techniques are combined and merged, and only their essential parts are extracted. As new malicious code is created and transformed, attack patterns and attack targets are becoming increasingly diverse. In particular, the number of corporate security incidents caused by malicious actions is increasing over time. However, even when attackers combine several pieces of malicious code, the APIs for each type of malicious code are used repeatedly, so the patterns and names of those APIs are likely to be similar. For this reason, this paper proposes a classification technique that finds patterns of APIs frequently used in malicious code, calculates the meaning and similarity of the APIs, and determines the level of risk.
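The intuition that repeated API call patterns reveal malicious code can be illustrated without a neural network. The sketch below is a deliberately simplified stand-in for the paper's LSTM classifier: it represents an API call trace as bigram counts, takes its best cosine similarity to known malicious traces, and maps that score to a coarse risk level. The `high` and `medium` cut-offs are hypothetical.

```python
from collections import Counter
from math import sqrt

def api_ngrams(calls: list, n: int = 2) -> Counter:
    """Count consecutive API-call n-grams in a trace."""
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def risk_level(calls, known_malicious, high=0.7, medium=0.4):
    """Map the best similarity to any known-malicious trace to a coarse risk label."""
    profile = api_ngrams(calls)
    best = max((cosine(profile, api_ngrams(m)) for m in known_malicious), default=0.0)
    if best >= high:
        return "high", best
    if best >= medium:
        return "medium", best
    return "low", best
```

A trace replaying a classic process-injection sequence (`OpenProcess`, `VirtualAllocEx`, `WriteProcessMemory`, `CreateRemoteThread`) scores high against a stored sample of that sequence, while ordinary file I/O calls score low. An LSTM, as used in the paper, can additionally learn longer-range ordering that fixed bigrams miss.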
