• Title/Summary/Keyword: Embedding vector

Super Resolution by Learning Sparse-Neighbor Image Representation (Sparse-Neighbor 영상 표현 학습에 의한 초해상도)

  • Eum, Kyoung-Bae;Choi, Young-Hee;Lee, Jong-Chan
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.12 / pp.2946-2952 / 2014
  • Among example-based super-resolution (SR) techniques, neighbor embedding (NE) was inspired by manifold learning, particularly locally linear embedding. However, NE generalizes poorly, which degrades its performance, and its local training sets are always too small to improve that performance. To solve this problem, we propose Learning Sparse-Neighbor Image Representation based on support vector regression (SVR), which has excellent generalization ability. Given a low-resolution image, we first synthesize a high-resolution version using bicubic interpolation. We then extract patches from this synthesized image and determine whether each patch corresponds to a region with high or low spatial frequency. After obtaining the weight of each patch with our method, we use the patches to learn separate SVR models. Finally, we update the pixel values using the learned SVRs. Experimental results confirm, both quantitatively and qualitatively, that the proposed algorithm improves on conventional interpolation methods and NE.
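
A minimal sketch of the patch-classification step, assuming the low-resolution image has already been upscaled with bicubic interpolation and is given as a list of rows of grey values. The abstract does not say how spatial frequency is measured, so this sketch assumes mean squared gradient energy with an arbitrary threshold; `classify_patches`, `gradient_energy`, `extract_patches`, and the threshold of 100 are hypothetical. The SVR training and pixel-update steps are not shown.

```python
# Hypothetical sketch: label non-overlapping square patches of an upscaled
# image as high- or low-frequency. Assumption: frequency is approximated by
# mean squared finite differences; the paper's real criterion is not stated.

def extract_patches(image, size):
    """Yield (row, col, patch) for each non-overlapping size x size patch."""
    h, w = len(image), len(image[0])
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            patch = [row[c:c + size] for row in image[r:r + size]]
            yield r, c, patch

def gradient_energy(patch):
    """Mean squared horizontal and vertical difference inside the patch."""
    total, count = 0.0, 0
    n = len(patch)
    for i in range(n):
        for j in range(n):
            if j + 1 < n:
                total += (patch[i][j + 1] - patch[i][j]) ** 2
                count += 1
            if i + 1 < n:
                total += (patch[i + 1][j] - patch[i][j]) ** 2
                count += 1
    return total / count if count else 0.0

def classify_patches(image, size=3, threshold=100.0):
    """Return {(row, col): 'high' or 'low'}; each class would train its own SVR."""
    return {
        (r, c): "high" if gradient_energy(p) > threshold else "low"
        for r, c, p in extract_patches(image, size)
    }
```

Each label group would then supply training data for its own SVR model; gradient energy is just one simple, common proxy for local frequency content.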

Assignment Semantic Category of a Word using Word Embedding and Synonyms (워드 임베딩과 유의어를 활용한 단어 의미 범주 할당)

  • Park, Da-Sol;Cha, Jeong-Won
    • Journal of KIISE / v.44 no.9 / pp.946-953 / 2017
  • Semantic role decision identifies the semantic relationship between a predicate and its arguments in natural language processing (NLP). Making semantic role decisions requires both semantic role information and semantic category information. The Sejong Electronic Dictionary contains frame information used to determine semantic roles. In this paper, we propose a method to extend the Sejong Electronic Dictionary using word embeddings and synonyms, and we run the same experiment with both existing word-embedding vectors and retrofitted vectors. For words that do not appear in the Sejong Electronic Dictionary, semantic category assignment with word-embedding vectors achieves 32.19%, and extended semantic category assignment achieves 51.14%. With retrofitted vectors, the corresponding figures are 33.33% and 53.88%. We also show that assigning semantic categories to new words that lack them helps extend the semantic category vocabulary of the Sejong Electronic Dictionary.
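
One way to picture semantic category assignment for an unlisted word is a nearest-neighbour vote in embedding space. The sketch below is an illustration, not the authors' method: it assumes toy vectors, cosine similarity, a k-nearest-neighbour majority vote, and that a synonym already in the dictionary can donate its category directly. `assign_category` and its parameters are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def assign_category(word, vectors, categories, synonyms=None, k=3):
    """Guess a semantic category for a word missing from the dictionary.

    vectors:    word -> embedding (plain word embedding or retrofitted)
    categories: dictionary word -> semantic category
    synonyms:   optional word -> list of synonyms; a synonym that is already
                in the dictionary is trusted directly (assumption of this sketch)
    """
    for syn in (synonyms or {}).get(word, []):
        if syn in categories:
            return categories[syn]
    if word not in vectors:
        return None
    # Rank categorized words by similarity and let the top k vote.
    neighbours = sorted(
        (w for w in categories if w in vectors),
        key=lambda w: cosine(vectors[word], vectors[w]),
        reverse=True,
    )[:k]
    votes = Counter(categories[w] for w in neighbours)
    return votes.most_common(1)[0][0] if votes else None
```

The same function works whether `vectors` holds plain word-embedding vectors or retrofitted ones, which mirrors the two settings compared above.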

A Word Embedding used Word Sense and Feature Mirror Model (단어 의미와 자질 거울 모델을 이용한 단어 임베딩)

  • Lee, JuSang;Shin, JoonChoul;Ock, CheolYoung
    • KIISE Transactions on Computing Practices / v.23 no.4 / pp.226-231 / 2017
  • Word representation, an important area of machine-learning-based natural language processing (NLP), represents a word not as text but as a distinguishable symbol. Existing word embedding methods use large corpora so that words occurring near each other in text are positioned near each other in the vector space. However, corpus-based word embedding requires ever more corpora because of word occurrence frequencies and the growing number of words. In this paper, word embedding is instead performed using dictionary definitions and semantic relationship information (hypernyms and antonyms). Words are trained with the feature mirror model (FMM), a modified Skip-Gram (Word2Vec). Words with similar senses receive similar vectors, and the vectors of antonyms can also be distinguished.
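
The abstract does not spell out the feature mirror model, so the sketch below shows only the generic Skip-Gram machinery it modifies: turning a dictionary entry into (target, context, label) pairs and applying one skip-gram-with-negative-sampling update. Treating definition words and hypernyms as positive contexts and antonyms as negatives is an assumption made here for illustration; `training_pairs`, `sgns_step`, and the entry format are hypothetical.

```python
import math

def training_pairs(entries):
    """Turn dictionary entries into (target, context, label) triples.

    entries: word -> {"definition": [...], "hypernyms": [...], "antonyms": [...]}
    Label 1: definition words and hypernyms (pulled together).
    Label 0: antonyms, treated like negative samples (an assumption).
    """
    pairs = []
    for word, info in entries.items():
        for ctx in info.get("definition", []) + info.get("hypernyms", []):
            pairs.append((word, ctx, 1))
        for ant in info.get("antonyms", []):
            pairs.append((word, ant, 0))
    return pairs

def sgns_step(w_vec, c_vec, label, lr=0.1):
    """One skip-gram negative-sampling gradient-ascent update, in place."""
    score = sum(a * b for a, b in zip(w_vec, c_vec))
    # d/d(score) of label*log(sigmoid) + (1-label)*log(1-sigmoid)
    grad = label - 1.0 / (1.0 + math.exp(-score))
    for i in range(len(w_vec)):
        w_i, c_i = w_vec[i], c_vec[i]
        w_vec[i] += lr * grad * c_i
        c_vec[i] += lr * grad * w_i
```

Positive pairs raise the dot product of the two vectors and label-0 pairs lower it, which is how antonym vectors could end up distinguishable even though antonyms share much of their definitional context.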

High-Capacity and Robust Watermarking Scheme for Small-Scale Vector Data

  • Tong, Deyu;Zhu, Changqing;Ren, Na;Shi, Wenzhong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.12 / pp.6190-6213 / 2019
  • For small-scale vector data, limited watermark capacity and robustness restrict the use of watermarking for copyright protection. This paper presents a watermarking scheme, based on robust geometric features and a capacity-maximization strategy, that improves capacity and robustness simultaneously. The distance ratio and angle of adjacent vertices are chosen as the watermark domain because they resist vertex and geometric attacks. For watermark embedding and extraction, we propose a capacity-improved strategy based on quantization index modulation (QIM) that divides the quantization range into more intervals so that sufficient watermark bits can be carried. Taking into account the error tolerance of the vector map and numerical accuracy, we optimize this strategy to maximize the number of watermark bits embedded in each vertex. Experimental results demonstrate that the map distortion caused by the watermark is small, far below the map tolerance. In addition, the proposed scheme can embed a 1024-bit copyright image into vector data with 150 vertices, reaching a capacity of approximately 14 bits/vertex, and shows strong robustness against vertex and geometric attacks for small-scale vector data.
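
A minimal sketch of multi-bit quantization index modulation on a single scalar, such as one vertex's distance ratio. It assumes a fixed quantization step and splits each step-wide cell into `2**bits` sub-intervals, which is one simple way to carry several bits per value; the paper's actual interval layout and its optimization against map tolerance are not reproduced. `qim_embed`, `qim_extract`, and the parameter values are hypothetical.

```python
import math

def qim_embed(x, message, bits, step):
    """Embed an integer message (0 <= message < 2**bits) into scalar x.

    The cell of width `step` containing x is split into 2**bits
    sub-intervals, and x is moved to the centre of sub-interval `message`.
    """
    levels = 2 ** bits
    if not 0 <= message < levels:
        raise ValueError("message does not fit in the given number of bits")
    sub = step / levels
    cell_start = math.floor(x / step) * step
    return cell_start + (message + 0.5) * sub

def qim_extract(y, bits, step):
    """Recover the message from a (possibly slightly perturbed) value."""
    levels = 2 ** bits
    sub = step / levels
    return int(math.floor((y % step) / sub)) % levels
```

The capacity/robustness trade-off is visible in the code: each extra bit halves the sub-interval width `step / 2**bits`, so the embedded value is only guaranteed to survive perturbations smaller than half that width, while the embedding distortion stays below `step`.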

KoELMo: Deep Contextualized word representations for Korean (KoELMo: 한국어를 위한 문맥화된 단어 표상)

  • Hong, Seung-Yean;Na, Seung-Hoon;Shin, Jong-Hoon;Kim, Young-Kil
    • Annual Conference on Human and Language Technology / 2018.10a / pp.296-298 / 2018
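  • Existing word embedding models such as Word2Vec and GloVe assign each word a fixed vector regardless of context. ELMo solves this problem by computing word embeddings for each sentence with a trained Bi-LSTM model. In morphologically complex languages such as Korean, however, a very large number of words can be derived, which limits approaches that obtain word embedding vectors directly. In this paper, we therefore propose an ELMo that builds word representations by combining morpheme representations. Applying the resulting ELMo word embeddings to a biaffine attention parsing model improved performance over previous results, reaching 91.39% UAS and 90.79% LAS.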
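
The two ideas in this abstract, building a word vector from its morpheme vectors and ELMo's learned combination of biLM layers, can be sketched as follows. Averaging morpheme vectors is an assumption, since the abstract does not say how morpheme representations are combined; the scalar mix follows the ELMo formulation gamma * sum_j softmax(s)_j * h_j. For instance, a Korean word such as 먹었다 ('ate') could be represented by combining vectors for the morphemes 먹, 었, and 다. Function names are hypothetical.

```python
import math

def word_from_morphemes(morpheme_vectors):
    """Combine morpheme vectors into one word vector by averaging (assumption)."""
    n = len(morpheme_vectors)
    return [sum(vals) / n for vals in zip(*morpheme_vectors)]

def scalar_mix(layer_outputs, weights, gamma=1.0):
    """ELMo-style combination: gamma * sum_j softmax(weights)_j * h_j."""
    m = max(weights)  # subtract the max for a numerically stable softmax
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    probs = [e / total for e in exps]
    dim = len(layer_outputs[0])
    return [gamma * sum(p * h[i] for p, h in zip(probs, layer_outputs))
            for i in range(dim)]
```

In a full model the layer outputs would come from the character or morpheme layer and each Bi-LSTM layer for the same word position, and `weights` and `gamma` would be learned by the downstream parser.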

Compound Noun Decomposition by using Syllable-based Embedding and Deep Learning (음절 단위 임베딩과 딥러닝 기법을 이용한 복합명사 분해)

  • Lee, Hyun Young;Kang, Seung Shik
    • Smart Media Journal / v.8 no.2 / pp.74-79 / 2019
  • Traditional compound noun decomposition algorithms often fail to split a compound noun into its unit nouns when it contains an unregistered unit noun. Such approaches find this very difficult to handle, because it is impossible to register every unit noun, including proper nouns, coined words, and foreign words, in the dictionary in advance. To solve this problem, this paper defines compound noun decomposition as a tag sequence labeling problem and proposes a decomposition method that uses syllable-unit embeddings and deep learning. To recognize unregistered unit nouns without constructing a unit noun dictionary, each syllable of a compound noun is represented in a continuous vector space, and an LSTM with a linear-chain CRF decomposes the compound noun into unit nouns.
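
Casting decomposition as sequence labeling means predicting one tag per syllable. The sketch below assumes a simple B/I scheme ('B' starts a unit noun, 'I' continues it), which may differ from the paper's tag set, and shows only how gold tags are produced for training and how predicted tags are turned back into unit nouns; the LSTM-CRF tagger itself is omitted. Function names are hypothetical.

```python
def tags_from_segmentation(units):
    """Gold tags: 'B' on the first syllable of each unit noun, 'I' elsewhere."""
    return [tag for unit in units for tag in ["B"] + ["I"] * (len(unit) - 1)]

def decompose(compound, tags):
    """Split a compound noun into unit nouns given one B/I tag per syllable."""
    if len(compound) != len(tags):
        raise ValueError("need exactly one tag per syllable")
    units = []
    for syllable, tag in zip(compound, tags):
        if tag == "B" or not units:
            units.append(syllable)   # start a new unit noun
        else:
            units[-1] += syllable    # continue the current unit noun
    return units
```

For example, 정보검색시스템 ('information retrieval system') splits into 정보, 검색, and 시스템. Because each precomposed Hangul syllable is a single Python character, iterating over the string walks syllable by syllable.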

Developing a Graph Convolutional Network-based Recommender System Using Explicit and Implicit Feedback (명시적 및 암시적 피드백을 활용한 그래프 컨볼루션 네트워크 기반 추천 시스템 개발)

  • Xinzhe Li;Dongeon Kim;Qinglong Li;Jaekyeong Kim
    • Journal of Information Technology Services / v.22 no.1 / pp.43-56 / 2023
  • With the growth of the e-commerce market, various types of products continue to be released, and customers face an information overload problem when making purchasing decisions. Personalized recommendation has therefore become an essential service for offering customers suitable products. Recently, many studies on graph convolutional network (GCN)-based recommender systems have been conducted. Such methods address the inability of earlier approaches to effectively reflect customer-product interactions in the embedding process. However, previous studies have mainly used implicit feedback data in their experiments. Although implicit feedback data alleviates the data scarcity problem, it cannot represent customers' preferences for specific products. This study therefore proposes a novel model that combines explicit and implicit feedback to address this limitation. It treats the average ratings of customers and products as their features and converts them into high-dimensional feature vectors. It then combines ID embedding vectors and feature vectors in the embedding layer to learn customer-product interactions effectively. To evaluate recommendation performance, we conducted various experiments on the MovieLens dataset. The results show that the proposed model outperforms state-of-the-art models. The proposed model can therefore provide an enhanced recommendation service that helps customers cope with information overload.
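
A rough sketch of the embedding layer described above, under stated assumptions: an average rating is projected to a higher-dimensional feature vector by a hypothetical linear layer, concatenated with the ID embedding, and then propagated with one LightGCN-style normalized neighbour sum. The abstract does not name the GCN variant, so the propagation rule is an assumption, and training is omitted; all names are hypothetical.

```python
def feature_vector(avg_rating, projection):
    """Map a scalar average rating to a higher-dimensional feature vector.

    projection: list of (weight, bias) pairs, one per output dimension
    (a hypothetical learned linear layer; its training is not shown).
    """
    return [w * avg_rating + b for w, b in projection]

def initial_embedding(id_embedding, avg_rating, projection):
    """Concatenate the ID embedding with the rating-derived feature vector."""
    return id_embedding + feature_vector(avg_rating, projection)

def propagate(embeddings, neighbours):
    """One LightGCN-style layer: e_u' = sum_v e_v / sqrt(|N(u)| * |N(v)|)."""
    out = {}
    for node, emb in embeddings.items():
        acc = [0.0] * len(emb)
        for nb in neighbours.get(node, []):
            norm = (len(neighbours[node]) * len(neighbours[nb])) ** 0.5
            for i, x in enumerate(embeddings[nb]):
                acc[i] += x / norm
        out[node] = acc
    return out

def score(user_emb, item_emb):
    """Predicted preference as the dot product of customer and product vectors."""
    return sum(a * b for a, b in zip(user_emb, item_emb))
```

`neighbours` is assumed to list both directions of every customer-product edge, so each node's neighbours are also keys in the mapping.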

A Multi-speaker Speech Synthesis System Using X-vector (x-vector를 이용한 다화자 음성합성 시스템)

  • Jo, Min Su;Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / v.7 no.4 / pp.675-681 / 2021
  • With the recent growth of the AI speaker market, demand is increasing for speech synthesis technology that enables natural conversation with users, and hence for multi-speaker speech synthesis systems that can generate voices with various tones. Synthesizing natural speech requires training on a large, high-quality speech database (DB). However, collecting a high-quality, large-capacity speech database uttered by many speakers is very costly in recording time and money. It is therefore necessary to train the speech synthesis system on a speech DB containing a very large number of speakers with only a small amount of training data per speaker, together with a technique that naturally expresses the tone and prosody of multiple speakers. In this paper, we propose building a speaker encoder by applying the deep-learning-based x-vector technique used in speaker recognition, and using this encoder to synthesize a new speaker's tone from a small amount of data. In the multi-speaker speech synthesis system, the module that synthesizes a mel-spectrogram from input text is Tacotron2, and the vocoder that generates the synthesized speech is a WaveNet with a mixture of logistic distributions. The x-vector extracted from the trained speaker embedding neural network is fed into Tacotron2 as an additional input to express the desired speaker's tone.
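
Two pieces of this pipeline can be sketched compactly: the statistics pooling at the heart of an x-vector network, which turns a variable number of frame-level features into a fixed-length vector, and conditioning Tacotron2 on the speaker by appending that vector to every encoder time step. The abstract only says the x-vector is added as an input; concatenation with the encoder outputs is an assumption, and a real x-vector is taken from layers after pooling, which this sketch skips. Function names are hypothetical.

```python
import math

def statistics_pooling(frames):
    """x-vector style pooling: per-dimension mean and std over all frames."""
    n = len(frames)
    dims = list(zip(*frames))
    means = [sum(d) / n for d in dims]
    stds = [math.sqrt(sum((x - m) ** 2 for x in d) / n)
            for d, m in zip(dims, means)]
    return means + stds  # fixed length regardless of utterance duration

def condition_on_speaker(encoder_outputs, speaker_vec):
    """Append the same speaker embedding to every encoder time step."""
    return [step + speaker_vec for step in encoder_outputs]
```

Because pooling removes the time axis, a few seconds of a new speaker's audio still yield a vector of the same size, which is what lets the encoder generalize to speakers with little data.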

Word Embeddings-Based Pseudo Relevance Feedback Using Deep Averaging Networks for Arabic Document Retrieval

  • Farhan, Yasir Hadi;Noah, Shahrul Azman Mohd;Mohd, Masnizah;Atwan, Jaffar
    • Journal of Information Science Theory and Practice / v.9 no.2 / pp.1-17 / 2021
  • Pseudo relevance feedback (PRF) is a powerful query expansion (QE) technique that takes the top k pseudo-relevant documents and selects expansion terms from them. Traditional PRF frameworks handle vocabulary mismatch between user queries and relevant documents robustly; however, they select expansion terms without regard to their similarity to the original query terms. Word embedding (WE) techniques are of significant interest for QE in information retrieval. Deep averaging networks (DANs) pass the average of word embeddings through multiple feed-forward layers. An entire query can thus be represented by the average vector of its terms, and this vector can be used to find expansion terms relevant to the query as a whole. In this study, we propose a DAN-based technique that augments PRF frameworks with WE similarities for Arabic information retrieval. The technique is based on the principle that the top pseudo-relevant document set is examined to determine the distribution of candidate terms, and expansion terms are selected according to their similarity to the average vector of the original query terms. Experiments use the Word2Vec model on the standard Arabic TREC 2001/2002 collection. Most evaluations indicate that the proposed PRF approach significantly outperforms the baseline PRF frameworks.
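
A minimal sketch of the expansion-term selection, assuming toy embeddings, a PRF weight per candidate term, and linear interpolation between that weight and cosine similarity to the averaged query vector. Only the averaging stage of a DAN is shown; the feed-forward layers and the PRF model that produces `prf_weight` are omitted, and `expansion_terms` and `alpha` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def average_vector(terms, vectors):
    """Average the embeddings of the query terms that have vectors."""
    known = [vectors[t] for t in terms if t in vectors]
    if not known:
        return None
    return [sum(vals) / len(known) for vals in zip(*known)]

def expansion_terms(query, candidates, vectors, prf_weight, alpha=0.5, n=2):
    """Rank PRF candidate terms by PRF weight mixed with query similarity.

    candidates: terms drawn from the top-k pseudo-relevant documents
    prf_weight: term -> weight assigned by the PRF model
    alpha:      hypothetical interpolation weight between the two signals
    """
    q = average_vector(query, vectors)

    def combined(term):
        sim = cosine(q, vectors[term]) if q and term in vectors else 0.0
        return alpha * prf_weight.get(term, 0.0) + (1 - alpha) * sim

    ranked = sorted((t for t in candidates if t not in query),
                    key=combined, reverse=True)
    return ranked[:n]
```

With equal PRF weights, the candidates closest to the averaged query vector win; a large `alpha` lets a strongly weighted but semantically distant term through, which is the behaviour the abstract attributes to traditional PRF.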

Robust Multi-Watermarking Method Based on Vector Quantization Using Index Transform Function (인덱스 변환 함수를 이용한 벡터 양자화 기반의 견고한 다중 워터마킹 방법)

  • Bae Sung-Ho;Song Kun-Woen
    • The KIPS Transactions:PartB / v.12B no.5 s.101 / pp.513-520 / 2005
  • In this paper, we propose a robust multi-watermarking method based on vector quantization (VQ) using an index transform function. Unlike conventional watermark embedding methods, which embed only one watermark at a time into the original image, the proposed method embeds multiple watermarks for copyright protection. The index transform function improves robustness by minimizing changes in the vector quantization indices under various attacks. Experimental results show that the proposed method is more robust against various attacks than the conventional VQ-based multi-watermarking method.
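
The index transform function is not described in the abstract, so the sketch below shows a generic VQ watermarking baseline instead: each image block is encoded as its nearest codeword, restricted to codewords whose index parity equals the watermark bit. This parity rule is a swapped-in illustrative technique, not the authors' method, and it carries a single bit stream rather than multiple watermarks; names are hypothetical.

```python
def nearest_index(block, codebook, allowed=None):
    """Index of the codeword with the smallest squared error to `block`."""
    indices = allowed if allowed is not None else range(len(codebook))
    return min(indices,
               key=lambda i: sum((a - b) ** 2 for a, b in zip(block, codebook[i])))

def embed_bits(blocks, codebook, bits):
    """VQ-encode blocks so that each chosen index's parity carries one bit."""
    indices = []
    for block, bit in zip(blocks, bits):
        allowed = [i for i in range(len(codebook)) if i % 2 == bit]
        indices.append(nearest_index(block, codebook, allowed))
    return indices

def extract_bits(indices):
    """Read the watermark back from the index parities."""
    return [i % 2 for i in indices]
```

Parity embedding is fragile: an attack that pushes a block to a neighbouring codeword with the other parity flips the bit, which is the kind of index change the proposed transform function is meant to minimize.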