• Title/Summary/Keyword: Embedding Dimension

Pairwise Neural Networks for Predicting Compound-Protein Interaction (약물-표적 단백질 연관관계 예측모델을 위한 쌍 기반 뉴럴네트워크)

  • Lee, Munhwan;Kim, Eunghee;Kim, Hong-Gee
    • Korean Journal of Cognitive Science / v.28 no.4 / pp.299-314 / 2017
  • Predicting compound-protein interactions in silico is important for drug discovery. In this paper, we propose a scalable machine learning model to predict compound-protein interactions. The key ideas of this model are a pairwise neural network architecture and a feature embedding method that works from raw data, especially for proteins. This method automatically extracts features without additional knowledge of the compound or protein. In addition, the pairwise architecture improves the expressiveness and compactness of the feature representation by preventing the biased learning that arises from differences in the dimension and type of features. Five-fold cross-validation results on a large-scale database show that the pairwise neural network improves the performance of compound-protein interaction prediction compared with previous prediction models.

A Hybrid System of Joint Time-Frequency Filtering Methods and Neural Network Techniques for Foreign Exchange Rate Forecasting (환율예측을 위한 신호처리분석 및 인공신경망기법의 통합시스템 구축)

  • 신택수;한인구
    • Journal of Intelligence and Information Systems / v.5 no.1 / pp.103-123 / 1999
  • Input filtering as a preprocessing method is crucial for good performance in time series forecasting. There are several preprocessing methods for handling time series (e.g., ARMA outputs as time-domain filters, and the Fourier transform or wavelet transform as time-frequency domain filters). In particular, the time-frequency domain filters describe the fractal structure of financial markets better than the time-domain filters because, in theory, they carry additional frequency information. Therefore, we first describe and analyze the effectiveness of different filtering methods from the viewpoint of neural network based forecasting performance. We then discuss neural network model architecture issues, for example, which type of neural network learning architecture to select for our time series forecasting and what input size to apply to a model. In this study, the input selection problem is limited to selecting the number of lagged input variables. To solve this problem, we analyze and compare several neural networks with different model architectures, and also use an embedding dimension measure from chaotic time series analysis (nonlinear dynamic analysis) to reduce the dimensionality of the models (i.e., the number of time-delayed input variables). Throughout our study, experiments on methods that integrate joint time-frequency analysis and neural network techniques are applied to a case study of daily Korean won / U.S. dollar exchange returns, and finally we suggest an integration framework for future research based on our experimental results.
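
As an illustrative aside (not code from the paper), the lagged-input idea in the abstract above can be sketched as a time-delay embedding: each input vector holds `dim` values of the series spaced `tau` steps apart, so the chosen embedding dimension fixes the input size of the forecasting network. The function name `delay_embed`, its parameters, and the sample return values are assumptions made for this sketch.

```python
def delay_embed(series, dim, tau=1):
    """Build delay vectors [x[t], x[t+tau], ..., x[t+(dim-1)*tau]].

    `dim` plays the role of the embedding dimension (number of lagged
    inputs) and `tau` is the delay between lags. Both names are
    hypothetical choices for this sketch.
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    span = (dim - 1) * tau  # distance between first and last lag
    if len(series) <= span:
        raise ValueError("series is too short for this dim and tau")
    return [
        [series[t + k * tau] for k in range(dim)]
        for t in range(len(series) - span)
    ]


# Made-up daily returns, used only to show the shape of the result.
returns = [0.1, -0.2, 0.05, 0.3, -0.1, 0.0]
vectors = delay_embed(returns, dim=3, tau=1)
for v in vectors:
    print(v)
```

With six observations and `dim=3`, the sketch yields four input vectors; a smaller embedding dimension gives more, shorter vectors, which is the dimensionality trade-off the abstract refers to.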

Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)

  • Park, Jeong Hyun;Lee, Seo Ho;Lim, Gyu Jin;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.57-78 / 2020
  • With the development of the Internet, consumers can easily check product information through e-commerce. Product reviews used in the purchasing process are based on user experience, allowing consumers to act as producers of information as well as to consult it. From the consumer's perspective, this can make purchasing decisions more efficient, and from the seller's point of view, it can help develop products and strengthen competitiveness. However, it takes a lot of time and effort for consumers to grasp, from the vast number of reviews offered by e-commerce sites, the overall assessment of the products they want to compare and the assessment dimensions they consider important. This is because product reviews are unstructured information, and it is difficult to read the sentiment and assessment dimension of reviews at a glance. For example, consumers who want to purchase a laptop would like to compare products along each dimension, such as performance, weight, delivery, speed, and design. Therefore, in this paper, we propose a method to automatically generate multi-dimensional product assessment scores from the reviews of the products to be compared. The method consists of two phases: a preparation phase and an individual product scoring phase. In the preparation phase, a dimension classification model and a sentiment analysis model are built from reviews of a large product category. By combining word embedding and association analysis, the dimension classification model addresses a limitation of existing studies, in which word embedding methods for finding the relevance between dimensions and words consider only the distance between words in sentences. The sentiment analysis model is a CNN trained on data tagged as positive or negative at the phrase level for accurate polarity detection. In the individual product scoring phase, the prepared models are applied to reviews split into phrases. Multi-dimensional assessment scores are obtained by grouping the phrases judged to describe a specific dimension and aggregating the results by assessment dimension according to the proportion of reviews in each group. In the experiment, approximately 260,000 reviews of the large product category were collected to build the dimension classification model and the sentiment analysis model. In addition, reviews of laptops from companies S and L sold through e-commerce were collected and used as experimental data. The dimension classification model classified individual product reviews, broken down into phrases, into six assessment dimensions, combining the existing word embedding method with an association analysis that indicates the frequency between words and dimensions. Combining word embedding and association analysis increased the accuracy of the model by 13.7%. The sentiment analysis model analyzed assessments more closely when trained at the phrase level rather than the sentence level; its accuracy was 29.4% higher than that of the sentence-based model. Through this study, both sellers and consumers can expect more efficient decision making in purchasing and product development, since they can make multi-dimensional comparisons of products. In addition, text reviews, which are unstructured data, were transformed into objective values such as frequency and morpheme, and were analyzed using both word embedding and association analysis to improve the objectivity of the multi-dimensional analysis. This will be an attractive analysis model, not only because it enables more effective service deployment in an evolving and fiercely competitive e-commerce market, but also because it satisfies both sellers and customers.

Ring Embedding in (n,k)-Star Graphs with Faulty Nodes (결함 노드를 갖는 (n,K)-스타 그래프에서의 링 임베딩)

  • Chang, Jung-Hwan;Kim, Jin-Soo
    • Journal of KIISE:Computer Systems and Theory / v.29 no.1 / pp.22-34 / 2002
  • In this paper, we consider the ring embedding problem in faulty (n,k)-star graphs, which were recently proposed as an alternative interconnection network topology. By effectively using strategies such as a series of dimension expansions and an even distribution of faulty nodes among the sub-stars of the graph, we prove that it is possible to construct a maximal fault-free ring excluding only the faulty nodes when the number of faults is no more than $n-3$ and $n-k \geq 2$. We also propose an algorithm that embeds the corresponding ring in (n,k)-star graphs. These results can be applied to multicasting applications that rely on the underlying cycle properties of multi-computer systems.

A Novel Approach of Feature Extraction for Analog Circuit Fault Diagnosis Based on WPD-LLE-CSA

  • Wang, Yuehai;Ma, Yuying;Cui, Shiming;Yan, Yongzheng
    • Journal of Electrical Engineering and Technology / v.13 no.6 / pp.2485-2492 / 2018
  • The rapid development of large-scale integrated circuits has brought great challenges to circuit testing and diagnosis. Because of the lack of exact fault models, inaccurate analog component tolerances, and nonlinear factors, analog circuit fault diagnosis is still regarded as an extremely difficult problem. To cope with the difficulty of extracting fault features effectively from masses of original data in the nonlinear continuous output signal of an analog circuit, a novel approach to feature extraction and dimension reduction for analog circuit fault diagnosis is proposed, based on wavelet packet decomposition, the local linear embedding algorithm, and the clone selection algorithm (WPD-LLE-CSA). The proposed method can identify faulty components in complicated analog circuits with an accuracy above 99%. Compared with existing feature extraction methods, the proposed method significantly reduces the number of features and the time spent while maintaining a high diagnosis rate. The ratio of dimensionality reduction is also discussed. Several groups of experiments are conducted to demonstrate the efficiency of the proposed method.

A review on the t-distributed stochastic neighbors embedding (t-SNE에 대한 요약)

  • Kipoong Kim;Choongrak Kim
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.167-173 / 2023
  • This paper investigates several methods of visualizing high-dimensional data in a low-dimensional space. First, principal component analysis and multidimensional scaling are briefly introduced as linear approaches, and then kernel principal component analysis, self-organizing map, locally linear embedding, Isomap, Laplacian Eigenmaps, and local multidimensional scaling are introduced as nonlinear approaches. In particular, t-SNE, which is widely used but relatively unfamiliar in the field of statistics, is described in more detail. We also present a simple example for several methods, including t-SNE. Finally, we review several recent studies that point out the limitations of t-SNE and discuss the future research problems they raise.

A Study on prediction of patent big data using supervised learning with dimension reduction model (지도학습 기반의 차원축소 모델을 이용한 특허 빅데이터 예측에 관한 연구)

  • Lee, Juhyun;Lee, Junseok;Kang, Jiho;Park, Sangsung;Jang, Dongsik;Hong, Sungwook;Kim, Sunyoung
    • Journal of Korea Society of Digital Industry and Information Management / v.15 no.4 / pp.41-49 / 2019
  • Patents are a system for promoting industrial development by disclosing technology. The importance of patents has recently been emphasized. For this reason, companies apply for many patents and analyze them. Patent analysis helps companies protect and foster their technology. Previously, this analysis was carried out by experts. Expert-based patent analysis, however, has the disadvantage of being time-consuming and expensive. Consequently, we try to solve this problem by developing a prediction model. This paper proposes a data-based patent analysis method using quantitative indicators and textual information. We confirmed the practical applicability of the proposed method on 1,831 autonomous vehicle patents. As a result, we confirmed that technologies related to safety and lane detection are important.

A possible application of the PD detection technique using electro-optic Pockels cell with nonlinear characteristic analysis on the PD signals (포켈스 소자를 이용한 PD 신호의 검출 및 비선형적 해석에 관한 연구)

  • Lim, Y.S.;Kang, W.J.;Chang, Y.M.;Koo, J.Y.
    • Proceedings of the KIEE Conference / 2000.07c / pp.1850-1852 / 2000
  • In this paper, a new partial discharge (PD) detection technique using a Pockels cell is proposed, and the apparent chaotic characteristics of the PD signals are discussed. For this purpose, PD was generated from a needle-plane electrode in air and detected by an optical measuring system using a Pockels cell, based on a Mach-Zehnder interferometer consisting of a He-Ne laser, single-mode optical fiber, a 50/50 beam splitter, and a photodetector. A qualitative analysis was carried out by drawing a return map for the normalized time series of the detected PD signals. The results are as follows: (a) Fixed points between 0.7 and 1.0 appear clearly in the upper right area of the return map as the number of data points increases. (b) Considerable periodicity is observed, even though the exact period and length cannot be determined. (c) Self-similarity can also be observed, inasmuch as the later paths do not follow the previous ones. Accordingly, quantitative analysis of quantities such as the embedding dimension, fractal dimension, and Lyapunov exponents should be carried out to deduce the quantitative properties of PD phenomena.
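
For readers unfamiliar with the return map mentioned in the abstract above, the following is a minimal sketch, assuming a min-max normalization to [0, 1] and a lag of one sample (the abstract does not state which normalization or lag the authors used). The function names `normalize` and `return_map` and the sample amplitudes are hypothetical. Each point of the map pairs a sample with its successor; points lying on the diagonal correspond to fixed points.

```python
def normalize(series):
    """Min-max scale a series to [0, 1] (an assumed normalization)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("constant series cannot be normalized")
    return [(x - lo) / (hi - lo) for x in series]


def return_map(series, lag=1):
    """Return the (x[n], x[n+lag]) pairs that make up a return map."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must satisfy 1 <= lag < len(series)")
    return list(zip(series[:-lag], series[lag:]))


# Made-up pulse amplitudes, used only to show the construction.
amplitudes = [2.0, 4.0, 6.0, 4.0]
points = return_map(normalize(amplitudes))
print(points)
```

Plotting `points` as a scatter of x[n+1] against x[n] gives the return map; clustering near the diagonal in a particular range would correspond to the fixed points described in result (a).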

The Study on Ultra-Precision Cutting Characteristics Evaluation of Non-Ferrous Metals Using Attractor Quadrant Method (어트랙터 사분면법을 이용한 비철금속의 초정밀 절삭특성 평가에 관한 연구)

  • 고준빈;김건희;윤인식
    • Journal of the Korean Society for Precision Engineering / v.20 no.6 / pp.20-26 / 2003
  • This study proposes an attractor quadrant method for evaluating the high-precision cutting characteristics of non-ferrous metals. It also aims to find the optimal cutting conditions of a diamond turning machine by measuring surface form and roughness in cutting experiments on non-ferrous metals such as aluminum with a diamond tool. In addition, cutting was performed on the diamond turning machine under varying cutting conditions such as feed rate, and the cutting force and surface roughness were measured to characterize the cutting properties of aluminum under each condition. Trajectory changes in the attractor indicated a substantial difference in fractal characteristics and attractor quadrant characteristics. In quantitative quadrant feature extraction, 1,309 points (one quadrant) in the case of Al7075 and 1,406 points (one quadrant) in the case of brass were obtained on the basis of attractor reconstruction. The proposed attractor quadrant method can be used to evaluate the high-precision cutting characteristics of non-ferrous metals.

Semantic Visualization of Dynamic Topic Modeling (다이내믹 토픽 모델링의 의미적 시각화 방법론)

  • Yeon, Jinwook;Boo, Hyunkyung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.131-154 / 2022
  • Recently, research on unstructured data analysis has been actively conducted with the development of information and communication technology. In particular, topic modeling is a representative technique for discovering core topics from massive text data. In the early stages of topic modeling, most studies focused only on topic discovery. As the field matured, studies on how topics change over time began to be carried out. Accordingly, interest in dynamic topic modeling, which handles changes in the keywords constituting a topic, is also increasing. Dynamic topic modeling identifies major topics from the data of the initial period and manages the change and flow of topics by using topic information from the previous period to derive topics in subsequent periods. However, the results of dynamic topic modeling are very difficult to understand and interpret. The results of traditional dynamic topic modeling simply reveal changes in keywords and their rankings, and this information is insufficient to represent how the meaning of a topic has changed. Therefore, in this study, we propose a method to visualize topics by period by reflecting the meaning of the keywords in each topic. In addition, we propose a method for intuitively interpreting changes in topics and relationships among topics. The detailed method of visualizing topics by period is as follows. In the first step, dynamic topic modeling is applied to derive the top keywords of each period and their weights from text data. In the second step, we derive vectors for the top keywords of each topic from a pre-trained word embedding model and perform dimension reduction on the extracted vectors. We then formulate a semantic vector for each topic by calculating the weighted sum of its keyword vectors, using the topic weight of each keyword. In the third step, we visualize the semantic vector of each topic using matplotlib and analyze the relationships among topics based on the visualized result. The change of a topic can be interpreted in the following manner. From the result of dynamic topic modeling, we identify the top 5 rising keywords and the top 5 descending keywords for each period to show the change of the topic. Many existing topic visualization studies visualize the keywords of each topic, but our approach differs from previous studies in that it attempts to visualize each topic itself. To evaluate the practical applicability of the proposed methodology, we performed an experiment on 1,847 abstracts of artificial intelligence-related papers, divided into three periods (2016-2017, 2018-2019, 2020-2021). We selected seven topics based on the consistency score and used a pre-trained Word2vec word embedding model trained on Wikipedia, an Internet encyclopedia. Based on the proposed methodology, we generated a semantic vector for each topic and, by reflecting the meaning of keywords, visualized and interpreted the topics by period. Through these experiments, we confirmed that the rise and fall of a keyword's topic weight can be used to interpret the semantic change of the corresponding topic and to grasp the relationships among topics. In this study, to overcome the limitations of dynamic topic modeling results, we used word embedding and dimension reduction techniques to visualize topics by period. The results of this study are meaningful in that they broaden the scope of topic understanding through the visualization of dynamic topic modeling results. In addition, the study makes an academic contribution by laying the foundation for follow-up studies that use various word embedding and dimensionality reduction techniques to improve the performance of the proposed methodology.
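
The second step described in the abstract above (a topic's semantic vector as the weighted sum of its keyword vectors) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the keyword vectors are taken to be already reduced to two dimensions, and the function name `topic_vector`, the keywords, the coordinates, and the weights are all made up for the example.

```python
def topic_vector(keyword_weights, embeddings):
    """Weighted sum of keyword vectors for one topic.

    keyword_weights: {keyword: topic weight of that keyword}
    embeddings:      {keyword: low-dimensional vector (list of floats)}
    Both structures are hypothetical stand-ins for the outputs of
    dynamic topic modeling and of a reduced word embedding model.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for word, weight in keyword_weights.items():
        vec = embeddings[word]
        for i in range(dim):
            total[i] += weight * vec[i]
    return total


# Made-up 2-D keyword vectors (after dimension reduction) and weights.
embeddings = {"neural": [1.0, 0.0], "network": [0.0, 1.0]}
weights = {"neural": 0.75, "network": 0.25}
vector = topic_vector(weights, embeddings)
print(vector)
```

Computing one such vector per topic and per period yields the points that would then be plotted in the third step, so a shift in keyword weights between periods shows up as movement of the topic's point.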