• Title/Summary/Keyword: Matrix Vector

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.1 / pp.47-60 / 2010
  • Most classification research has used kNN (k-Nearest Neighbor) and SVM (Support Vector Machine), which are learning-based models, or Bayesian classifiers and neural network algorithms (NNA), which are statistics-based methods. However, these approaches face space and time limitations when classifying the enormous number of web pages on today's internet. Moreover, most classification studies use uni-gram feature representations, which poorly capture the real meaning of words. Korean web page classification faces an additional problem: Korean words are often polysemous, carrying multiple meanings. For these reasons, LSA (Latent Semantic Analysis) is proposed for classification in this environment (large data sets and word polysemy). LSA uses SVD (Singular Value Decomposition), which decomposes the original term-document matrix into three matrices and reduces their dimension. This creates a new low-dimensional semantic space for representing vectors, which makes classification efficient and allows the latent meaning of words and documents (or web pages) to be analyzed. Although LSA works well for classification, it has a drawback: when SVD reduces the dimension of the matrix and creates the new semantic space, it selects dimensions that represent the vectors well rather than dimensions that discriminate between them. This is why LSA does not improve classification performance as much as expected. In this paper, we propose a new LSA that selects optimal dimensions so as to both discriminate and represent vectors well, minimizing this drawback and improving performance. The proposed method shows better and more stable performance than other LSA variants in low-dimensional spaces. In addition, we obtain a further improvement in classification by creating and selecting features, removing stopwords, and statistically weighting specific values.
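
A minimal sketch of the LSA projection step the abstract describes: SVD of a term-document matrix followed by projection into a k-dimensional semantic space. The discriminative-dimension scoring below is a hypothetical stand-in (a Fisher-style variance ratio) for the paper's selection criterion, not the authors' exact method.

```python
# LSA sketch: SVD-based projection plus a hypothetical selective
# dimension-reduction step (between/within-class variance ratio).
import numpy as np

def lsa_project(term_doc, k):
    """Project documents into a k-dimensional latent semantic space."""
    # term_doc: (n_terms, n_docs); economy-size SVD
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Standard LSA keeps the k largest singular values.
    return (np.diag(s[:k]) @ Vt[:k]).T        # (n_docs, k) document vectors

def select_discriminative_dims(doc_vecs, labels, k):
    """Hypothetical selection: score each latent dimension by the ratio
    of between-class to within-class variance and keep the top k."""
    scores = []
    for d in range(doc_vecs.shape[1]):
        x = doc_vecs[:, d]
        means = np.array([x[labels == c].mean() for c in np.unique(labels)])
        within = np.mean([x[labels == c].var() for c in np.unique(labels)])
        scores.append(means.var() / (within + 1e-12))
    top = np.argsort(scores)[::-1][:k]
    return doc_vecs[:, top]

# toy usage: 50 terms, 12 documents, 2 classes
rng = np.random.default_rng(0)
A = rng.poisson(1.0, size=(50, 12)).astype(float)
y = np.array([0, 1] * 6)
docs = lsa_project(A, k=8)
reduced = select_discriminative_dims(docs, y, k=3)
```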

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.157-173 / 2021
  • With the development of information technology, the amount of available information increases daily, but this abundance makes it difficult for users to find the information they seek. Users want a system that reduces retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are an increasingly important technology, essential to business. Collaborative filtering is used in various fields with excellent performance because recommendations are based on the interests and preferences of similar users. However, it has limitations. Its main limitation is sparsity, which occurs when user-item preference information is insufficient: ratings in the user-item matrix may be distorted by product popularity, and new users may not yet have rated anything. This lack of historical data for identifying consumer preferences is referred to as data sparsity, and various methods have been studied to address it. Most attempts to solve the sparsity problem, however, are not generally applicable because they require additional data such as users' personal information, social networks, or item characteristics. Another problem is that real-world rating data are mostly biased toward high scores, producing severe class imbalance. One cause of this imbalance is purchasing bias: users who rate a product highly are the ones who buy it, while those who would rate it poorly are less likely to purchase it and thus leave no negative reviews. Consequently, reviews from purchasers skew more positive than users' actual preferences. Models trained on such biased rating data over-learn the high-frequency classes, distorting the results. Applying collaborative filtering to these imbalanced data leads to poor recommendation performance due to excessive learning of the biased classes. Traditional oversampling techniques are likely to cause overfitting because they repeat the same data, which acts as noise during learning and reduces recommendation performance. In addition, most existing pre-processing methods for data imbalance are designed for binary classes. Binary-class imbalance techniques are difficult to apply to multi-class problems because they cannot model phenomena such as objects at cross-class boundaries or objects overlapping multiple classes. Research has therefore converted multi-class problems into binary-class problems, but this simplification can cause classification errors when the results of classifiers learned on sub-problems are combined, losing important information about relationships beyond the selected items. More effective methods for multi-class imbalance are therefore needed. We propose a collaborative filtering model that uses a CGAN to generate realistic virtual data to fill the empty user-item matrix. The conditional vector y identifies the distributions of minority classes so that generated data reflect their characteristics. Collaborative filtering then maximizes recommendation performance via hyperparameter tuning. This process improves model accuracy by addressing the sparsity problem of collaborative filtering while mitigating the data imbalance found in real data. Our model outperforms existing oversampling techniques on sparse real-world data. Using SMOTE, Borderline-SMOTE, SVM-SMOTE, ADASYN, and GAN as comparative models, it achieves the highest prediction accuracy on the RMSE and MAE evaluation measures. This study shows that deep-learning-based oversampling can further refine the performance of recommendation systems on real data and be used to build business recommendation systems.
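
A minimal conditional GAN sketch in PyTorch, in the spirit of the abstract: the condition vector y (e.g., a one-hot rating class) is concatenated to both generator and discriminator inputs. All layer sizes, dimensions, and optimizer settings are illustrative assumptions, not the paper's architecture.

```python
# Minimal CGAN sketch: condition vector y steers generation toward
# minority rating classes. Sizes and hyperparameters are placeholders.
import torch
import torch.nn as nn

Z_DIM, Y_DIM, ITEM_DIM = 64, 5, 100    # noise, condition, user-vector sizes

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(Z_DIM + Y_DIM, 128), nn.ReLU(),
            nn.Linear(128, ITEM_DIM), nn.Sigmoid())   # ratings in [0, 1]

    def forward(self, z, y):
        return self.net(torch.cat([z, y], dim=1))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ITEM_DIM + Y_DIM, 128), nn.LeakyReLU(0.2),
            nn.Linear(128, 1), nn.Sigmoid())

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=1))

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real_rows, y):
    """One adversarial update on a batch of real user-rating rows."""
    z = torch.randn(real_rows.size(0), Z_DIM)
    fake = G(z, y)
    # discriminator: push real rows toward 1, generated rows toward 0
    d_loss = bce(D(real_rows, y), torch.ones(len(y), 1)) + \
             bce(D(fake.detach(), y), torch.zeros(len(y), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator: try to fool the updated discriminator
    g_loss = bce(D(fake, y), torch.ones(len(y), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```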

Robo-Advisor Algorithm with Intelligent View Model (지능형 전망모형을 결합한 로보어드바이저 알고리즘)

  • Kim, Sunwoong
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.39-55 / 2019
  • Recently, banks and large financial institutions have introduced many Robo-Advisor products. A Robo-Advisor produces an optimal asset-allocation portfolio for investors using financial-engineering algorithms, without any human intervention. Since its first introduction on Wall Street in 2008, the market has grown to 60 billion dollars and is expected to reach 2,000 billion dollars by 2020. Because Robo-Advisor algorithms output asset allocations to investors, mathematical or statistical asset-allocation strategies are applied. The mean-variance optimization model developed by Markowitz is the classic asset-allocation model: a simple but intuitive portfolio strategy in which assets are allocated so as to minimize portfolio risk while maximizing expected portfolio return. Despite its theoretical foundation, both academics and practitioners find that the standard mean-variance optimization portfolio is very sensitive to the expected returns estimated from past price data, and corner solutions concentrated in only a few assets are common. The Black-Litterman optimization model overcomes these problems by starting from a neutral Capital Asset Pricing Model equilibrium: implied equilibrium returns for each asset are derived from the equilibrium market portfolio through reverse optimization. The Black-Litterman model then uses a Bayesian approach to combine subjective views on the price forecasts of one or more assets with the implied equilibrium returns, yielding new estimates of risk and expected return. These new estimates feed the well-known Markowitz mean-variance optimization algorithm to produce the optimal portfolio. If the investor has no views on the asset classes, the Black-Litterman model produces the market portfolio itself. But what if the subjective views are incorrect? Surveys of the performance of stocks recommended by securities analysts show very poor results, so incorrect views combined with implied equilibrium returns may produce very poor portfolios for Black-Litterman users. This paper suggests an objective investor-view model based on Support Vector Machines (SVM), which have shown good performance in stock price forecasting. An SVM is a discriminative classifier defined by a separating hyperplane; linear, radial-basis, and polynomial kernel functions are used to learn the hyperplanes. Input variables for the SVM are the returns, standard deviations, Stochastics %K, and price-parity degree of each asset class. The SVM outputs expected stock price movements and their probabilities, which serve as input variables to the intelligent view model. The price movements are categorized into three phases: down, neutral, and up. The expected stock returns populate the P matrix and their probabilities the Q matrix. The implied equilibrium returns vector is combined with this intelligent view matrix, yielding the Black-Litterman optimal portfolio. For comparison, the Markowitz mean-variance optimization model and a risk-parity model are used, with the value-weighted and equal-weighted market portfolios as benchmark indexes. We collect the 8 KOSPI 200 sector indexes from January 2008 to December 2018, comprising 132 monthly index values. The training period is 2008 to 2015 and the testing period is 2016 to 2018. Our intelligent view model combined with implied equilibrium returns produced the optimal Black-Litterman portfolio, which outperformed the Markowitz mean-variance optimization portfolio, the risk-parity portfolio, and the market portfolio out of sample. The total return of the Black-Litterman portfolio over the three-year period is 6.4%, the highest among the compared portfolios; its maximum drawdown of -20.8% is the smallest in magnitude; and its Sharpe ratio, which measures return per unit of risk, is the highest at 0.17. Overall, our view model shows that practitioners could replace subjective analysts' views with an objective view model when applying Robo-Advisor asset-allocation algorithms in real trading.
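
A sketch of the Black-Litterman combination step described above, using the standard posterior-return formula. Here tau, delta, and the toy data are assumed placeholder values; in the paper, the view matrix P and view returns Q come from the SVM view model rather than being chosen by hand.

```python
# Black-Litterman posterior: blend implied equilibrium returns (from
# reverse optimization) with views (P, Q) under view uncertainty Omega.
import numpy as np

def implied_returns(delta, sigma, w_mkt):
    """Reverse optimization: pi = delta * Sigma * w_mkt."""
    return delta * sigma @ w_mkt

def black_litterman(pi, sigma, P, Q, omega, tau=0.05):
    """Posterior expected returns combining equilibrium and views."""
    ts_inv = np.linalg.inv(tau * sigma)
    om_inv = np.linalg.inv(omega)
    A = ts_inv + P.T @ om_inv @ P
    b = ts_inv @ pi + P.T @ om_inv @ Q
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
n = 4                                    # toy: 4 asset classes
sigma = np.cov(rng.normal(size=(n, 60))) # placeholder covariance
w_mkt = np.full(n, 1 / n)                # placeholder market weights
pi = implied_returns(2.5, sigma, w_mkt)
P = np.eye(n)[:2]                        # two absolute views
Q = np.array([0.02, -0.01])              # view returns (placeholder)
omega = np.diag([0.001, 0.002])          # view uncertainty (placeholder)
mu_bl = black_litterman(pi, sigma, P, Q, omega)
print(mu_bl)   # posterior returns feed a mean-variance optimizer
```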

Multivariate conditional tail expectations (다변량 조건부 꼬리 기대값)

  • Hong, C.S.;Kim, T.W.
    • The Korean Journal of Applied Statistics / v.29 no.7 / pp.1201-1212 / 2016
  • Value at Risk (VaR) is a favorite market-risk measure among financial companies; however, it cannot explain the size of the loss when a specific investment fails. The Conditional Tail Expectation (CTE) is an alternative risk measure, defined as the conditional expectation of loss given that the loss exceeds VaR. In real financial markets, multivariate loss rates are transformed into a univariate distribution in order to obtain and estimate the CTE of a portfolio. We instead propose multivariate CTEs based on multivariate quantile vectors and derive a relationship among them by extending the univariate CTE. Multivariate CTEs are obtained for bivariate and trivariate normal distributions, relationships among them are explored, extensibility to higher dimensions is discussed, and examples are given. The multivariate CTEs (using the variance-covariance matrix and a multivariate quantile vector) are found to be smaller than CTEs computed from the transformed univariate distribution. We therefore conclude that the proposed multivariate CTEs provide smaller estimates, representing less risk, and that a more aggressive investment is possible when a diversified strategy includes many companies in a portfolio.
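
A Monte Carlo sketch of one plausible reading of a multivariate CTE: the mean loss vector given that every component exceeds its marginal VaR (the quantile vector). The paper's exact definition may differ; the bivariate normal parameters are assumptions.

```python
# Multivariate CTE by simulation: marginal VaR vector, then the mean
# of samples lying in the joint tail region beyond that vector.
import numpy as np

def multivariate_cte(samples, alpha=0.95):
    """samples: (n, d) loss vectors; returns (VaR vector, CTE vector)."""
    var_vec = np.quantile(samples, alpha, axis=0)      # marginal VaRs
    tail = samples[(samples > var_vec).all(axis=1)]    # joint tail region
    return var_vec, tail.mean(axis=0)

rng = np.random.default_rng(2)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])               # toy bivariate normal
losses = rng.multivariate_normal([0, 0], cov, size=200_000)
var_vec, cte_vec = multivariate_cte(losses)
print("VaR vector:", var_vec, " CTE vector:", cte_vec)
```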

Analysis of Research Trends in SIAM Journal on Applied Mathematics Using Topic Modeling (토픽모델링을 활용한 SIAM Journal on Applied Mathematics의 연구 동향 분석)

  • Kim, Sung-Yeun
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.7 / pp.607-615 / 2020
  • The purpose of this study was to analyze research status and trends in industrial mathematics using text-mining techniques, with a sample of 4,910 papers published in the SIAM Journal on Applied Mathematics from 1970 to 2019. The R program was used to collect titles, abstracts, and keywords from the papers and to apply topic modeling based on the LDA algorithm. Based on coherence scores for the collected papers, 20 topics were determined to be optimal using Gibbs sampling. The main results were as follows. First, studies in industrial mathematics spanned a variety of fields, including computational mathematics, geometry, mathematical modeling, topology, discrete mathematics, and probability and statistics, with a focus on analysis and algebra. Second, time-series regression analysis identified five hot topics (mathematical biology, nonlinear partial differential equations, discrete mathematics, statistics, topology) and one cold topic (probability theory). Third, among the fields not reflected in the 2015 revised mathematics curriculum, numeral systems, matrices, vectors in space, and complex numbers were extracted as content to be covered in the high school mathematics curriculum. Finally, the study suggested strategies to promote industrial mathematics in Korea, described its limitations, and proposed directions for future research.
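
The study used R; the following is a rough Python analogue of the same pipeline, assuming a corpus of titles/abstracts/keywords. sklearn's variational LDA stands in for the Gibbs-sampling implementation used in the paper, and the toy corpus is a placeholder for the 4,910 collected papers.

```python
# Topic-modeling sketch: vectorize documents, fit an LDA model with
# 20 topics (the number the study selected via coherence scores),
# and print each topic's top terms.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # placeholder corpus; the real input is the 4,910 abstracts
    "nonlinear partial differential equation solution",
    "matrix eigenvalue perturbation bound",
    "stochastic model for population dynamics",
    "graph coloring discrete optimization",
]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=20, random_state=0)
doc_topics = lda.fit_transform(X)      # (n_docs, n_topics) mixture weights

terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[::-1][:10]]
    print(f"topic {k}: {' '.join(top)}")
```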

Filter-Bank Based Regularized Common Spatial Pattern for Classification of Motor Imagery EEG (동작 상상 EEG 분류를 위한 필터 뱅크 기반 정규화 공통 공간 패턴)

  • Park, Sang-Hoon;Kim, Ha-Young;Lee, David;Lee, Sang-Goog
    • Journal of KIISE / v.44 no.6 / pp.587-594 / 2017
  • Recently, motor imagery electroencephalogram (EEG) based Brain-Computer Interface (BCI) systems have received significant attention in various fields, including medicine and engineering. The Common Spatial Pattern (CSP) algorithm is the most commonly used method for extracting features from motor imagery EEG. However, CSP has limited applicability in Small-Sample Setting (SSS) situations because it relies on sample covariance matrices, and its performance varies greatly with the frequency bands used. To address these problems, the 4-40 Hz EEG band is divided into nine filter banks and Regularized CSP (R-CSP) is applied to each frequency band. The Mutual Information-Based Individual Feature (MIBIF) algorithm is then applied to the R-CSP features to select discriminative features, which are used as inputs to a Least Squares Support Vector Machine (LS-SVM) classifier. The proposed method yielded classification accuracies of 87.5%, 100%, 63.78%, 82.14%, and 86.11% for the five subjects ("aa", "al", "av", "aw", and "ay", respectively) of BCI competition III dataset IVa, using 18 channels in the vicinity of the motor area of the cerebral cortex. It improved the mean classification accuracy by 16.21%, 10.77%, and 3.32% over CSP, R-CSP, and FBCSP, respectively, and performs particularly well in the SSS situation.
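
A sketch of the regularized CSP step for one frequency band. Shrinkage toward the identity is a simple stand-in for the paper's R-CSP regularization, and the MIBIF selection and LS-SVM stages are omitted; trial shapes and the shrinkage value are assumptions.

```python
# Regularized CSP sketch: shrunk class covariances, spatial filters
# from a generalized eigendecomposition, log-variance features.
import numpy as np
from scipy.linalg import eigh

def rcsp_filters(trials_a, trials_b, gamma=0.1, m=3):
    """trials_*: (n_trials, n_channels, n_samples) band-passed EEG.
    Returns 2*m spatial filters as rows."""
    def avg_cov(trials):
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
        return np.mean(covs, axis=0)

    n_ch = trials_a.shape[1]
    Ca, Cb = avg_cov(trials_a), avg_cov(trials_b)
    # shrinkage regularization to cope with small-sample settings
    Ca = (1 - gamma) * Ca + gamma * np.eye(n_ch) * np.trace(Ca) / n_ch
    Cb = (1 - gamma) * Cb + gamma * np.eye(n_ch) * np.trace(Cb) / n_ch
    # generalized eigenproblem: Ca w = lambda (Ca + Cb) w
    vals, vecs = eigh(Ca, Ca + Cb)
    order = np.argsort(vals)
    keep = np.r_[order[:m], order[-m:]]    # most discriminative extremes
    return vecs[:, keep].T

def log_var_features(trial, W):
    """CSP features: normalized log-variance of filtered signals."""
    z = W @ trial
    v = z.var(axis=1)
    return np.log(v / v.sum())

# toy usage: 10 trials per class, 18 channels, 250 samples
rng = np.random.default_rng(5)
W = rcsp_filters(rng.normal(size=(10, 18, 250)),
                 rng.normal(size=(10, 18, 250)))
print(log_var_features(rng.normal(size=(18, 250)), W))
```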

Characterization of Recombinant Bovine Sperm Hyaluronidase and Identification of an Important Asn-X-Ser/Thr Motif for Its Activity

  • Park, Chaeri;Kim, Young-Hyun;Lee, Sang-Rae;Park, Soojin;Jung, Yena;Lee, Youngjeon;Kim, Ji-Su;Eom, Taekil;Kim, Ju-Sung;Lee, Dong-Mok;Song, Bong-Suk;Sim, Bo-Woong;Kim, Sun-Uk;Chang, Kyu-Tae;Kim, Ekyune
    • Journal of Microbiology and Biotechnology / v.28 no.9 / pp.1547-1553 / 2018
  • Hyaluronidases are a family of enzymes that catalyse the breakdown of hyaluronic acid, which is abundant in the extracellular matrix and the cumulus-oocyte complex. To investigate the activity of recombinant bovine sperm hyaluronidase 1 (SPAM1) and determine the effect of the Asn-X-Ser/Thr motif on its activity, the bovine SPAM1 open reading frame was cloned into the mammalian expression vector pCXN2 and transfected into the HEK293 cell line. Expression of the recombinant bovine hyaluronidase was assessed using a hyaluronidase activity assay with gel electrophoresis. The recombinant hyaluronidase could degrade highly polymeric hyaluronic acid and also dispersed the cumulus cell layer. Enzyme activity was compared between the glycosylated enzyme and the enzyme deglycosylated by N-glycosidase F treatment. Finally, mutagenesis analysis revealed that, among the five potential N-linked glycosylation sites, only three contributed significantly to hyaluronidase activity. Recombinant bovine SPAM1 thus has hyaluronan-degradation and cumulus-oocyte-complex dispersion ability, and its N-linked oligosaccharides are important for enzyme activity, providing a foundation for the commercialization of hyaluronidase.

Incremental Regression based on a Sliding Window for Stream Data Prediction (스트림 데이타 예측을 위한 슬라이딩 윈도우 기반 점진적 회귀분석)

  • Kim, Sung-Hyun;Jin, Long;Ryu, Keun-Ho
    • Journal of KIISE:Databases / v.34 no.6 / pp.483-492 / 2007
  • Conventional time-series prediction techniques use a model generated in the training step and apply it to new input data without change. If such a model is applied directly to stream data, prediction accuracy decreases over time. This paper proposes a stream-data prediction technique using a sliding window and regression, which accounts for the characteristics of time series that change over time. It consists of two steps: the first preprocesses input data for the regression model, and the second updates the model as new data arrive, keeping only recent data in a queue. This approach has two advantages: it keeps only the minimum model information in a matrix, reducing space complexity, and it prevents the error rate from growing by updating the model over time. Accuracy is measured by RME (Relative Mean Error) and RMSE (Root Mean Square Error). Experiments on stream-data prediction show that the proposed IMQR (Incremental Multiple Quadratic Regression) technique is more efficient than MLR (Multiple Linear Regression) and SVR (Support Vector Regression).
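
A sketch of sliding-window quadratic regression in the spirit of IMQR: only the most recent window of points is kept in a queue, the quadratic model is refit as each new point streams in, and a one-step-ahead prediction is made. The window size and model order are illustrative choices, not the paper's settings.

```python
# Sliding-window regression: a bounded deque holds recent points;
# each prediction refits a quadratic to the current window.
import numpy as np
from collections import deque

class SlidingQuadraticRegressor:
    def __init__(self, window=50):
        self.buf = deque(maxlen=window)   # old points fall out automatically

    def update(self, t, y):
        self.buf.append((t, y))

    def predict(self, t_next):
        t, y = np.array(self.buf).T
        X = np.vander(t, 3)               # columns: t^2, t, 1
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return np.polyval(coef, t_next)

# toy stream with a slowly drifting trend
model = SlidingQuadraticRegressor(window=30)
for i in range(200):
    model.update(i, 0.01 * i**1.5 + np.sin(i / 5))
print(model.predict(200))
```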

Fast Analysis of Fractal Antenna by Using FMM (FMM에 의한 프랙탈 안테나 고속 해석)

  • Kim, Yo-Sik;Lee, Kwang-Jae;Kim, Kun-Woo;Oh, Kyung-Hyun;Lee, Taek-Kyung;Lee, Jae-Wook
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.19 no.2 / pp.121-129 / 2008
  • In this paper, we present a fast analysis of a multilayer microstrip fractal structure using the fast multipole method (FMM). In the analysis, accurate spatial Green's functions from the real-axis integration method (RAIM) are employed to solve the mixed-potential integral equation (MPIE) with the FMM algorithm. When the Green's function is used directly, the Method of Moments (MoM) requires $O(N^2)$ operations per iteration and $O(N^2)$ memory, and the number of unknowns N can be extremely large for large-scale objects and high accuracy. The FMM addresses this problem: using the addition theorem for the Green's function, it reduces the complexity of the matrix-vector multiplication, cutting the cost of the calculation to $O(N^{1.5})$. The efficiency is demonstrated by comparing results from the moment method and the fast algorithm.
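
The following is not an FMM implementation; it is a toy demonstration of the property the FMM exploits, namely that the interaction block between two well-separated groups of points is numerically low-rank, so its matrix-vector product can be compressed far below the dense $O(N^2)$ cost. The 1/r kernel and the geometry are assumptions for illustration.

```python
# Far-field compressibility: the cross-interaction block between
# well-separated clusters has tiny numerical rank, so y = A @ x can
# be applied in O(N*k) instead of O(N^2).
import numpy as np

rng = np.random.default_rng(3)
src = rng.uniform(0.0, 1.0, 500)           # source points
obs = rng.uniform(10.0, 11.0, 500)         # well-separated observers
A = 1.0 / np.abs(obs[:, None] - src[None, :])   # 1/r interaction block

U, s, Vt = np.linalg.svd(A)
k = int(np.sum(s / s[0] > 1e-10))          # numerical rank
print(f"500x500 far-field block, numerical rank ~ {k}")

x = rng.normal(size=500)
y_dense = A @ x                            # O(N^2) direct product
y_lowrank = U[:, :k] @ (np.diag(s[:k]) @ (Vt[:k] @ x))   # O(N*k)
print("relative error:",
      np.linalg.norm(y_dense - y_lowrank) / np.linalg.norm(y_dense))
```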

Gauss-Newton Based Estimation for Moving Emitter Location Using TDOA/FDOA Measurements and Its Analysis (TDOA/FDOA 정보를 이용한 Gauss-Newton 기법 기반의 이동 신호원 위치 및 속도 추정 방법과 성능 분석)

  • Kim, Yong-Hee;Kim, Dong-Gyu;Han, Jin-Woo;Song, Kyu-Ha;Kim, Hyoung-Nam
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.6 / pp.62-71 / 2013
  • A passive emitter location method using both TDOA and FDOA measurements has higher accuracy than methods based on TDOA or FDOA alone, and it can also estimate the velocity vector of a moving platform. Recently, several non-iterative methods using a nuisance parameter have been suggested, but they require a common reference sensor for each pair of sensors and show relatively poor performance when the range between the sensor group and the emitter is long. To solve this, we derive an estimator of the position and velocity of a moving platform based on the Gauss-Newton method. In addition, to analyze the estimation performance for position and velocity separately, we decompose the CRLB matrix into the corresponding subspaces. Simulation results show the estimation performance of the derived method and the CEP planes for the given sensor geometry.
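
A Gauss-Newton sketch for a simplified TDOA-only, position-only emitter fix with a numerical Jacobian. The paper's method also uses FDOA and estimates velocity, so this is an illustration of the iterative scheme, not the authors' exact estimator; the geometry and noise level are assumptions. Residuals are expressed as range differences (TDOA times propagation speed).

```python
# Gauss-Newton iteration for TDOA localization: linearize the residuals
# at the current estimate and solve the resulting least-squares step.
import numpy as np

def tdoa_residuals(p, sensors, meas):
    """Measured minus predicted range differences vs. sensor 0."""
    r = np.linalg.norm(sensors - p, axis=1)
    return meas - (r[1:] - r[0])

def gauss_newton(p0, sensors, meas, iters=20, eps=1e-4):
    p = p0.astype(float)
    for _ in range(iters):
        f = tdoa_residuals(p, sensors, meas)
        # numerical Jacobian of the residuals w.r.t. position
        J = np.empty((len(f), len(p)))
        for j in range(len(p)):
            dp = np.zeros_like(p); dp[j] = eps
            J[:, j] = (tdoa_residuals(p + dp, sensors, meas) - f) / eps
        # step solves J * delta = -f in the least-squares sense
        p += np.linalg.lstsq(J, -f, rcond=None)[0]
    return p

sensors = np.array([[0, 0], [4000, 0], [0, 4000], [4000, 4000]], float)
true_p = np.array([1500.0, 2600.0])
r = np.linalg.norm(sensors - true_p, axis=1)
meas = (r[1:] - r[0]) + np.random.default_rng(4).normal(0, 1.0, 3)
print(gauss_newton(np.array([2000.0, 2000.0]), sensors, meas))
```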