• Title/Summary/Keyword: Embedding method

A Method for Learning the Specialized Meaning of Terminology through Mixed Word Embedding (혼합 임베딩을 통한 전문 용어 의미 학습 방안)

  • Kim, Byung Tae;Kim, Nam Gyu
    • The Journal of Information Systems
    • /
    • v.30 no.2
    • /
    • pp.57-78
    • /
    • 2021
  • Purpose In this study, we first aim to produce embedding results that reflect the characteristics of both specialized and general documents. In addition, when disparate documents are combined as training material for natural language processing, we propose a method that can quantitatively measure how strongly the characteristics of each individual domain are reflected. Approach For this study, Korean Supreme Court precedent documents and the Korean Wikipedia were selected as the specialized and general documents, respectively. After extracting the most similar word pairs and their similarities for unique words observed only in the specialized documents, we observed how those values changed when the specialized documents were embedded together with the general documents. Findings According to the measurement methods proposed in this study, the specificity of the specialized documents was diluted in the process of combining them with the general documents, and the degree of dilution was positively correlated with the size of the general corpus.
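The quantity described in the abstract can be sketched concretely: train word2vec on the specialized corpus alone, then on the mixed corpus, and track how the similarity of a domain-specific word pair changes. The sketch below assumes gensim; the corpora and the word pair are toy placeholders, not the paper's data.

```python
# Minimal sketch (not the authors' code) of the measurement idea:
# compare a word-pair similarity before and after mixing in a general corpus.
from gensim.models import Word2Vec

specialized = [["plaintiff", "filed", "an", "appeal"],
               ["the", "appeal", "was", "dismissed"],
               ["plaintiff", "claims", "were", "dismissed"]]   # stand-in for precedent sentences
general = [["the", "city", "filed", "a", "report"],
           ["an", "appeal", "to", "readers", "was", "made"]]   # stand-in for Wikipedia sentences

def pair_similarity(corpus, w1, w2):
    model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, seed=1, workers=1)
    return model.wv.similarity(w1, w2)

before = pair_similarity(specialized, "appeal", "dismissed")
after = pair_similarity(specialized + general, "appeal", "dismissed")
print(f"similarity in specialized corpus only: {before:.3f}")
print(f"similarity after mixing:               {after:.3f}")  # the change approximates the 'dilution'
```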

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.19-41
    • /
    • 2019
  • With the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of analysis. Until recently, text mining research has focused on the second step. However, with the recognition that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to preserve the meaning of words and documents when text data are represented as vectors. Unlike structured data, which can be fed directly into a variety of operations and traditional analysis techniques, unstructured text must first be transformed into a form that a computer can process. Mapping arbitrary objects into a space of fixed dimension while maintaining their algebraic properties is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding grows rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into a single vector, is the most widely used. However, traditional document embedding methods represented by doc2Vec generate a vector for each document from all the words the document contains, so the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document to a single vector, which makes it difficult to accurately represent a complex document covering multiple subjects. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords; for a document without keywords, the method can be applied after keywords are extracted by other analysis techniques, but since keyword extraction is not the core subject of the proposed method, we describe the process for documents whose keywords are predefined in the text. The proposed method consists of (1) parsing, (2) word embedding, (3) keyword vector extraction, (4) keyword clustering, and (5) multiple-vector generation. The specific process is as follows. All text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to overcome the limitation that traditional document embedding is affected by miscellaneous words as well as core words, the vectors corresponding to the keywords of each document are extracted to form a keyword vector set for that document. Next, clustering is conducted on each document's keyword vector set to identify the multiple subjects included in the document. Finally, a multi-vector representation is generated from the keyword vectors constituting each cluster. Experiments on 3,147 academic papers revealed that the single-vector traditional approach cannot properly map complex documents because of interference among subjects within each vector. With the proposed multi-vector method, we confirmed that complex documents can be vectorized more accurately by eliminating this interference.
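As a rough illustration of steps (2)-(5) above, the sketch below embeds words, extracts keyword vectors, clusters them, and emits one vector per cluster instead of a single document vector. It is an assumption-laden toy (gensim + scikit-learn, made-up keywords and cluster count), not the paper's implementation.

```python
# Sketch of the multi-vector idea: one embedding vector per keyword cluster.
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

tokenized_docs = [["deep", "learning", "image", "model", "stock", "price", "forecast"],
                  ["graph", "embedding", "node", "vector"]]
doc_keywords = [["image", "model", "stock", "forecast"], ["graph", "node"]]   # predefined keywords

w2v = Word2Vec(tokenized_docs, vector_size=32, min_count=1, seed=1, workers=1)  # word embedding

def multi_vector_embedding(keywords, n_clusters=2):
    vecs = np.array([w2v.wv[k] for k in keywords])                  # keyword vector extraction
    n = min(n_clusters, len(vecs))
    labels = KMeans(n_clusters=n, n_init=10, random_state=1).fit_predict(vecs)  # keyword clustering
    return [vecs[labels == c].mean(axis=0) for c in range(n)]       # one vector per subject

for kws in doc_keywords:
    print([v.shape for v in multi_vector_embedding(kws)])
```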

Lossless Data Hiding Using Modification of Histogram in Wavelet Domain (웨이블릿 영역에서 히스토그램 수정을 이용한 무손실 정보은닉)

  • Jeong Cheol-Ho;Eom Il-Kyu;Kim Yoo-Shin
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.43 no.4 s.310
    • /
    • pp.27-36
    • /
    • 2006
  • Lossless data embedding is a method of inserting information into a host image that guarantees complete restoration of the original image once the embedded data has been extracted. In this paper, we propose a novel reversible data embedding algorithm for images in the wavelet domain. The proposed technique, which modifies the histogram of the wavelet coefficients, is composed of two embedding steps. In the first step, data are embedded into the wavelet coefficients by modifying their histogram. The second step compensates for the distortion caused by the first step while hiding additional information, so a higher embedding capacity is achieved. In terms of the relationship between embedding capacity and PSNR, the proposed method shows considerably higher performance than existing reversible data embedding methods.
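The paper's two-step wavelet-domain algorithm is not reproduced here, but the underlying histogram-modification idea can be sketched generically: shift the histogram beside a peak bin to free a neighboring bin, hide bits in the peak-valued coefficients, and invert both operations on extraction. The sketch below applies this to a stand-in integer array, not to actual wavelet coefficients.

```python
# Generic histogram-shifting sketch of reversible embedding (not the paper's algorithm).
import numpy as np

def hs_embed(coeffs, bits, peak):
    out = coeffs.copy()
    out[out > peak] += 1                     # shift to free the bin at peak+1
    it = iter(bits)
    for idx in np.flatnonzero(coeffs == peak):
        b = next(it, None)
        if b is None:
            break
        out[idx] = peak + b                  # bit 0 -> peak, bit 1 -> peak+1
    return out

def hs_extract(stego, peak, n_bits):
    bits = [int(v == peak + 1) for v in stego if v in (peak, peak + 1)][:n_bits]
    restored = stego.copy()
    restored[restored == peak + 1] = peak    # undo the bit embedding
    restored[restored > peak + 1] -= 1       # undo the histogram shift
    return bits, restored

coeffs = np.array([0, 1, 1, 2, 1, 3, 1, 0])  # stand-in for integer coefficients
stego = hs_embed(coeffs, [1, 0, 1], peak=1)
bits, restored = hs_extract(stego, peak=1, n_bits=3)
assert (restored == coeffs).all() and bits == [1, 0, 1]   # lossless: host fully restored
```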

A Graph Embedding Technique for Weighted Graphs Based on LSTM Autoencoders

  • Seo, Minji;Lee, Ki Yong
    • Journal of Information Processing Systems
    • /
    • v.16 no.6
    • /
    • pp.1407-1423
    • /
    • 2020
  • A graph is a data structure consisting of nodes and edges between these nodes. Graph embedding is the task of generating a low-dimensional vector for a given graph that best represents the characteristics of that graph. Recently, there have been studies on graph embedding, especially using deep learning techniques. However, until now, most deep learning-based graph embedding techniques have focused on unweighted graphs. Therefore, in this paper, we propose a graph embedding technique for weighted graphs based on long short-term memory (LSTM) autoencoders. Given weighted graphs, we traverse each graph to extract node-weight sequences, each of which represents a path in the graph consisting of nodes and the weights between them. We then train an LSTM autoencoder on the extracted node-weight sequences and encode each node-weight sequence into a fixed-length vector using the trained autoencoder. Finally, for each graph, we collect the encoding vectors obtained from that graph and combine them to generate its final embedding vector. These embedding vectors can be used to classify weighted graphs or to search for similar weighted graphs. Experiments on synthetic and real datasets show that the proposed method is effective in measuring the similarity between weighted graphs.
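A condensed sketch of the described pipeline follows: random walks over a toy weighted graph yield node-weight sequences, an LSTM autoencoder compresses each sequence to a fixed-length vector, and the mean of those vectors serves as the graph embedding. Walk length, hidden size, and the graph itself are illustrative assumptions, not the authors' settings.

```python
# Sketch: node-weight sequences -> LSTM autoencoder -> averaged graph embedding.
import random
import torch
import torch.nn as nn

def node_weight_walks(edges, n_walks=10, length=4):
    """edges: dict node -> list of (neighbor, weight). Returns (n_walks, length, 2) tensor."""
    walks = []
    for _ in range(n_walks):
        node = random.choice(list(edges))
        seq = []
        for _ in range(length):
            nbr, w = random.choice(edges[node])
            seq.append([float(node), float(w)])   # node id paired with outgoing edge weight
            node = nbr
        walks.append(seq)
    return torch.tensor(walks)

class LSTMAutoencoder(nn.Module):
    def __init__(self, dim=2, hidden=16):
        super().__init__()
        self.encoder = nn.LSTM(dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, dim, batch_first=True)
    def forward(self, x):
        _, (h, _) = self.encoder(x)
        z = h[-1]                                             # fixed-length encoding per sequence
        rec, _ = self.decoder(z.unsqueeze(1).repeat(1, x.size(1), 1))
        return rec, z

graph = {0: [(1, 0.5), (2, 2.0)], 1: [(0, 0.5)], 2: [(0, 2.0)]}   # toy weighted graph
walks = node_weight_walks(graph)
model = LSTMAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(50):                                           # train to reconstruct the walks
    rec, _ = model(walks)
    loss = nn.functional.mse_loss(rec, walks)
    opt.zero_grad(); loss.backward(); opt.step()
with torch.no_grad():
    _, encodings = model(walks)
graph_vector = encodings.mean(dim=0)                          # final embedding for this graph
```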

Research Trends on the Thread Embedding Therapy of Neck pain in Traditional Chinese Medicine (경항통에 대한 매선 임상연구의 중국 현황 분석)

  • Jun, Purumea;Kim, Su Ran;Liu, Yan;Park, Ji Eun;Jung, So Young;Han, Chang Hyun
    • Journal of Physiology & Pathology in Korean Medicine
    • /
    • v.31 no.5
    • /
    • pp.284-293
    • /
    • 2017
  • Thread embedding therapy is increasingly used for various diseases, including neck pain. However, the evidence for thread embedding therapy on neck pain and the assessment of its methodology are still limited. This study aimed to investigate the clinical research methodology of thread embedding therapy for neck pain. A total of 31 studies were included in the analysis. Thread embedding therapy was usually applied once a week (32.3%), once per 10 days (29.0%), or once per two weeks (25.8%). The most common concurrent treatment used with thread embedding therapy was Chinese medicine. Among acupuncture points, EX-B2 (61.3%) was most commonly used, followed by GV14 (45.2%), GB20 (29.0%), and GB21 (22.6%). For the control group, acupuncture was most commonly used (58.1%). All studies reported that thread embedding therapy was more effective than the control treatment, and 11 studies reported side effects. Only 13 studies (41.9%) reported an appropriate randomization method, and the mean Jadad score of the included studies was 1.52. The clinical trials included in this review showed an effect of thread embedding therapy for neck pain; however, the quality of the studies was not high, and further rigorous clinical trials are needed to assess its effect.

Twitter Hashtags Clustering with Word Embedding (Word Embedding기반 Twitter 해시 태그 클러스터링)

  • Nguyen, Tien Anh;Yang, Hyung-Jeong
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2019.05a
    • /
    • pp.179-180
    • /
    • 2019
  • Nowadays, clustering algorithms are considered a promising solution to the lack of human-labeled data in the massive data of social media sites for numerous machine learning tasks. Many researchers have proposed disaster event detection systems that can identify specific local events, such as missing people or public transport damage, by clustering similar tweets and hashtags together. In this paper, we extend the tweet hashtag feature definition by applying word embedding. The experimental results show that word embedding achieves better performance than the reference method.
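A brief sketch of the described approach, under the assumption that each hashtag is represented by its word2vec vector learned from tweet tokens and the hashtag vectors are then clustered; the tweets and hashtags below are toy placeholders.

```python
# Sketch: hashtag clustering on word-embedding features.
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

tweets = [["#flood", "road", "closed", "downtown"],
          ["#flood", "water", "rising", "near", "river"],
          ["#concert", "tickets", "on", "sale"],
          ["#concert", "stage", "lights", "amazing"]]
model = Word2Vec(tweets, vector_size=16, window=3, min_count=1, seed=1, workers=1)

hashtags = ["#flood", "#concert"]
vectors = [model.wv[h] for h in hashtags]                     # hashtag feature = its embedding
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(dict(zip(hashtags, labels)))
```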

Knowledge Embedding Method for Implementing a Generative Question-Answering Chat System (생성 기반 질의응답 채팅 시스템 구현을 위한 지식 임베딩 방법)

  • Kim, Sihyung;Lee, Hyeon-gu;Kim, Harksoo
    • Journal of KIISE
    • /
    • v.45 no.2
    • /
    • pp.134-140
    • /
    • 2018
  • A chat system is a computer program that understands a user's miscellaneous utterances and generates appropriate responses. Sometimes a chat system needs to answer users' simple information-seeking questions. However, previous generative chat systems do not consider how to embed knowledge entities (i.e., the subjects and objects in knowledge triples), which are essential elements for question answering. Previous chat models have the disadvantage of generating the same responses even when the knowledge entities in users' utterances change. To alleviate this problem, we propose a knowledge entity embedding method for improving the question-answering accuracy of a generative chat system. The proposed method uses a Siamese recurrent neural network to embed knowledge entities and their synonyms. For the experiments, we implemented a sequence-to-sequence model in which subjects and predicates are encoded and objects are decoded. The proposed embedding method showed 12.48% higher accuracy than a conventional embedding method based on a convolutional neural network.
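The Siamese-RNN idea can be sketched schematically: encode two entity surface forms with a shared recurrent encoder and pull synonym pairs together with a contrastive objective. The sketch below uses a GRU and a cosine embedding loss as stand-ins; the character vocabulary and training pairs are toy assumptions, not the paper's model.

```python
# Schematic Siamese encoder for entity/synonym embedding (illustrative only).
import torch
import torch.nn as nn

chars = "abcdefghijklmnopqrstuvwxyz "
char_to_id = {c: i + 1 for i, c in enumerate(chars)}          # 0 is reserved for padding

def encode_text(text, max_len=12):
    ids = [char_to_id.get(c, 0) for c in text.lower()[:max_len]]
    return torch.tensor(ids + [0] * (max_len - len(ids)))

class SiameseEncoder(nn.Module):
    def __init__(self, vocab=len(chars) + 1, emb=16, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb, padding_idx=0)
        self.gru = nn.GRU(emb, hidden, batch_first=True)      # shared encoder for both inputs
    def forward(self, x):
        _, h = self.gru(self.emb(x))
        return h[-1]                                          # entity embedding

pairs = [("seoul", "seoul city", 1), ("seoul", "tokyo", -1)]  # 1 = synonym, -1 = not
model = SiameseEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CosineEmbeddingLoss(margin=0.2)

for _ in range(100):
    a = torch.stack([encode_text(p[0]) for p in pairs])
    b = torch.stack([encode_text(p[1]) for p in pairs])
    y = torch.tensor([float(p[2]) for p in pairs])
    loss = loss_fn(model(a), model(b), y)                     # pull synonyms together
    opt.zero_grad(); loss.backward(); opt.step()
```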

Adaptive Image Interpolation Using Pixel Embedding (화소 삽입을 이용한 적응적 영상보간)

  • Han, Kyu-Phil;Oh, Gil-Ho
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.12
    • /
    • pp.1393-1401
    • /
    • 2014
  • This paper presents an adaptive image interpolation method using pixel-based neighbor embedding, modified from the patch-based neighbor embedding of contemporary super-resolution algorithms. Conventional interpolation methods for high resolution detect edges in at least 16 directions in order to remove zig-zagging artifacts and selectively choose the interpolation strategy according to the direction and strength of each edge; thus, they require much computation and have high complexity. In order to develop a simple interpolation method that preserves the directional shape of edges, the proposed algorithm adopts the simplest Haar wavelet and introduces a new pixel-based embedding scheme. First, a low-quality but high-resolution image, magnified by one octave, is acquired using an adaptive 8-directional interpolation based on the high-frequency coefficients of the wavelet transform. Thereafter, the pixel embedding process updates each high-resolution pixel of the magnified image with a weighted sum of the best-matched pixel values, which are searched for in the low-resolution image. As a result, the proposed scheme is simple and removes zig-zagging artifacts without any additional processing.
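A greatly simplified sketch of the pixel embedding step follows: after an initial 2x upscale, a new pixel is blended toward a weighted sum of the best-matching pixels found by patch search in the low-resolution image. The patch size, the weighting, and the nearest-neighbor upscale stand-in are illustrative assumptions, not the authors' Haar-wavelet pipeline.

```python
# Simplified pixel-embedding refinement: blend an upscaled pixel toward
# the centers of its best-matching low-resolution patches.
import numpy as np

def refine_pixel(up, low, y, x, patch=3, k=4):
    half = patch // 2
    target = up[y - half:y + half + 1, x - half:x + half + 1]
    candidates = []
    for j in range(half, low.shape[0] - half):
        for i in range(half, low.shape[1] - half):
            ref = low[j - half:j + half + 1, i - half:i + half + 1]
            candidates.append((np.sum((target - ref) ** 2), low[j, i]))
    candidates.sort(key=lambda t: t[0])                       # best matches first
    best = candidates[:k]
    weights = np.array([1.0 / (1.0 + d) for d, _ in best])
    values = np.array([v for _, v in best])
    return 0.5 * up[y, x] + 0.5 * np.dot(weights, values) / weights.sum()

low = np.random.rand(16, 16)
up = np.kron(low, np.ones((2, 2)))        # stand-in for the initial wavelet-based 2x upscale
refined = refine_pixel(up, low, y=8, x=8)
```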

HTML Tag Depth Embedding: An Input Embedding Method of the BERT Model for Improving Web Document Reading Comprehension Performance (HTML 태그 깊이 임베딩: 웹 문서 기계 독해 성능 개선을 위한 BERT 모델의 입력 임베딩 기법)

  • Mok, Jin-Wang;Jang, Hyun Jae;Lee, Hyun-Seob
    • Journal of Internet of Things and Convergence
    • /
    • v.8 no.5
    • /
    • pp.17-25
    • /
    • 2022
  • Recently, a massive amount of data has been generated as the number of edge devices increases, and in particular the number of raw unstructured HTML documents has grown. Therefore, MRC (Machine Reading Comprehension), in which a natural language processing model finds the important information within an HTML document, is becoming more important. In this paper, we propose HTDE (HTML Tag Depth Embedding), a method that allows BERT to learn the depth of the HTML document structure. HTDE builds a tag stack from the HTML document for each input token of BERT and extracts the depth information from it. We then add an HTML embedding layer that takes the depth of each token as input at the input-embedding step of BERT. Since tokenization using HTDE identifies the HTML document structure through the relationships of surrounding tokens, HTDE improves the accuracy of BERT on HTML documents. Finally, we demonstrate that the proposed idea achieves higher accuracy than the conventional BERT input embedding.
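A simplified sketch of the HTDE idea follows: track a tag stack while scanning an HTML string so that each token receives a depth, then add a learned depth embedding to the token embedding before it enters the encoder, in the same additive way BERT combines its segment and position embeddings. The tokenizer, vocabulary, and sizes below are toy assumptions, not the paper's implementation.

```python
# Sketch: per-token HTML tag depth added as an extra input embedding.
import re
import torch
import torch.nn as nn

def tokens_with_depth(html):
    depth, out = 0, []
    for piece in re.findall(r"</?\w+>|[^<>\s]+", html):
        if piece.startswith("</"):
            depth -= 1                      # closing tag pops the stack
        elif piece.startswith("<"):
            depth += 1                      # opening tag pushes the stack
        else:
            out.append((piece, depth))      # plain token keeps the current depth
    return out

class DepthAwareEmbedding(nn.Module):
    def __init__(self, vocab_size=100, hidden=32, max_depth=16):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.depth = nn.Embedding(max_depth, hidden)
    def forward(self, token_ids, depths):
        return self.tok(token_ids) + self.depth(depths)   # summed like segment/position embeddings

pairs = tokens_with_depth("<html><body><div>hello world</div></body></html>")
vocab = {w: i for i, (w, _) in enumerate(pairs)}
ids = torch.tensor([vocab[w] for w, _ in pairs])
depths = torch.tensor([d for _, d in pairs])
embedded = DepthAwareEmbedding()(ids, depths)             # would replace the plain input embedding
```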

Case report: Correction of nasolabial fold with needle embedding therapy (매선침법을 이용한 비순구 주름 개선 5례)

  • Yun, Young-Hee;Cho, Seung-Pil;Choi, In-Hwa
    • The Journal of Korean Medicine Ophthalmology and Otolaryngology and Dermatology
    • /
    • v.24 no.3
    • /
    • pp.154-161
    • /
    • 2011
  • Objective : Needle embedding therapy is a newly introduced therapy that uses specialized tools. The purpose of this study is to report the nasolabial fold correction effect of needle embedding therapy in 5 cases. Method : Five women with no underlying disease received needle embedding therapy four to six times over one to two months. Photos of each patient were taken before and after the treatment, and the Glogau photoaging classification was assessed before and after the treatment. Result and conclusion : Needle embedding therapy may be an effective and safe treatment for the correction of nasolabial folds.