• Title/Summary/Keyword: Improved LDA Model

Search Result 16, Processing Time 0.024 seconds

Hot Topic Discovery across Social Networks Based on Improved LDA Model

  • Liu, Chang;Hu, RuiLin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.11
    • /
    • pp.3935-3949
    • /
    • 2021
  • With the rapid development of Internet and big data technology, various online social network platforms have been established, producing massive information every day. Hot topic discovery aims to dig out meaningful content that users commonly concern about from the massive information on the Internet. Most of the existing hot topic discovery methods focus on a single network data source, and can hardly grasp hot spots as a whole, nor meet the challenges of text sparsity and topic hotness evaluation in cross-network scenarios. This paper proposes a novel hot topic discovery method across social network based on an im-proved LDA model, which first integrates the text information from multiple social network platforms into a unified data set, then obtains the potential topic distribution in the text through the improved LDA model. Finally, it adopts a heat evaluation method based on the word frequency of topic label words to take the latent topic with the highest heat value as a hot topic. This paper obtains data from the online social networks and constructs a cross-network topic discovery data set. The experimental results demonstrate the superiority of the proposed method compared to baseline methods.

The MeSH-Term Query Expansion Models using LDA Topic Models in Health Information Retrieval (MeSH 기반의 LDA 토픽 모델을 이용한 검색어 확장)

  • You, Sukjin
    • Journal of Korean Library and Information Science Society
    • /
    • v.52 no.1
    • /
    • pp.79-108
    • /
    • 2021
  • Information retrieval in the health field has several challenges. Health information terminology is difficult for consumers (laypeople) to understand. Formulating a query with professional terms is not easy for consumers because health-related terms are more familiar to health professionals. If health terms related to a query are automatically added, it would help consumers to find relevant information. The proposed query expansion (QE) models show how to expand a query using MeSH terms. The documents were represented by MeSH terms (i.e. Bag-of-MeSH), found in the full-text articles. And then the MeSH terms were used to generate LDA (Latent Dirichlet Analysis) topic models. A query and the top k retrieved documents were used to find MeSH terms as topic words related to the query. LDA topic words were filtered by threshold values of topic probability (TP) and word probability (WP). Threshold values were effective in an LDA model with a specific number of topics to increase IR performance in terms of infAP (inferred Average Precision) and infNDCG (inferred Normalized Discounted Cumulative Gain), which are common IR metrics for large data collections with incomplete judgments. The top k words were chosen by the word score based on (TP *WP) and retrieved document ranking in an LDA model with specific thresholds. The QE model with specific thresholds for TP and WP showed improved mean infAP and infNDCG scores in an LDA model, comparing with the baseline result.

Topic Extraction and Classification Method Based on Comment Sets

  • Tan, Xiaodong
    • Journal of Information Processing Systems
    • /
    • v.16 no.2
    • /
    • pp.329-342
    • /
    • 2020
  • In recent years, emotional text classification is one of the essential research contents in the field of natural language processing. It has been widely used in the sentiment analysis of commodities like hotels, and other commentary corpus. This paper proposes an improved W-LDA (weighted latent Dirichlet allocation) topic model to improve the shortcomings of traditional LDA topic models. In the process of the topic of word sampling and its word distribution expectation calculation of the Gibbs of the W-LDA topic model. An average weighted value is adopted to avoid topic-related words from being submerged by high-frequency words, to improve the distinction of the topic. It further integrates the highest classification of the algorithm of support vector machine based on the extracted high-quality document-topic distribution and topic-word vectors. Finally, an efficient integration method is constructed for the analysis and extraction of emotional words, topic distribution calculations, and sentiment classification. Through tests on real teaching evaluation data and test set of public comment set, the results show that the method proposed in the paper has distinct advantages compared with other two typical algorithms in terms of subject differentiation, classification precision, and F1-measure.

A standardization model based on image recognition for performance evaluation of an oral scanner

  • Seo, Sang-Wan;Lee, Wan-Sun;Byun, Jae-Young;Lee, Kyu-Bok
    • The Journal of Advanced Prosthodontics
    • /
    • v.9 no.6
    • /
    • pp.409-415
    • /
    • 2017
  • PURPOSE. Accurate information is essential in dentistry. The image information of missing teeth is used in optically based medical equipment in prosthodontic treatment. To evaluate oral scanners, the standardized model was examined from cases of image recognition errors of linear discriminant analysis (LDA), and a model that combines the variables with reference to ISO 12836:2015 was designed. MATERIALS AND METHODS. The basic model was fabricated by applying 4 factors to the tooth profile (chamfer, groove, curve, and square) and the bottom surface. Photo-type and video-type scanners were used to analyze 3D images after image capture. The scans were performed several times according to the prescribed sequence to distinguish the model from the one that did not form, and the results confirmed it to be the best. RESULTS. In the case of the initial basic model, a 3D shape could not be obtained by scanning even if several shots were taken. Subsequently, the recognition rate of the image was improved with every variable factor, and the difference depends on the tooth profile and the pattern of the floor surface. CONCLUSION. Based on the recognition error of the LDA, the recognition rate decreases when the model has a similar pattern. Therefore, to obtain the accurate 3D data, the difference of each class needs to be provided when developing a standardized model.

Learning Probabilistic Kernel from Latent Dirichlet Allocation

  • Lv, Qi;Pang, Lin;Li, Xiong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.10 no.6
    • /
    • pp.2527-2545
    • /
    • 2016
  • Measuring the similarity of given samples is a key problem of recognition, clustering, retrieval and related applications. A number of works, e.g. kernel method and metric learning, have been contributed to this problem. The challenge of similarity learning is to find a similarity robust to intra-class variance and simultaneously selective to inter-class characteristic. We observed that, the similarity measure can be improved if the data distribution and hidden semantic information are exploited in a more sophisticated way. In this paper, we propose a similarity learning approach for retrieval and recognition. The approach, termed as LDA-FEK, derives free energy kernel (FEK) from Latent Dirichlet Allocation (LDA). First, it trains LDA and constructs kernel using the parameters and variables of the trained model. Then, the unknown kernel parameters are learned by a discriminative learning approach. The main contributions of the proposed method are twofold: (1) the method is computationally efficient and scalable since the parameters in kernel are determined in a staged way; (2) the method exploits data distribution and semantic level hidden information by means of LDA. To evaluate the performance of LDA-FEK, we apply it for image retrieval over two data sets and for text categorization on four popular data sets. The results show the competitive performance of our method.

WV-BTM: A Technique on Improving Accuracy of Topic Model for Short Texts in SNS (WV-BTM: SNS 단문의 주제 분석을 위한 토픽 모델 정확도 개선 기법)

  • Song, Ae-Rin;Park, Young-Ho
    • Journal of Digital Contents Society
    • /
    • v.19 no.1
    • /
    • pp.51-58
    • /
    • 2018
  • As the amount of users and data of NS explosively increased, research based on SNS Big data became active. In social mining, Latent Dirichlet Allocation(LDA), which is a typical topic model technique, is used to identify the similarity of each text from non-classified large-volume SNS text big data and to extract trends therefrom. However, LDA has the limitation that it is difficult to deduce a high-level topic due to the semantic sparsity of non-frequent word occurrence in the short sentence data. The BTM study improved the limitations of this LDA through a combination of two words. However, BTM also has a limitation that it is impossible to calculate the weight considering the relation with each subject because it is influenced more by the high frequency word among the combined words. In this paper, we propose a technique to improve the accuracy of existing BTM by reflecting semantic relation between words.

Evolutionary Computing Driven Extreme Learning Machine for Objected Oriented Software Aging Prediction

  • Ahamad, Shahanawaj
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.2
    • /
    • pp.232-240
    • /
    • 2022
  • To fulfill user expectations, the rapid evolution of software techniques and approaches has necessitated reliable and flawless software operations. Aging prediction in the software under operation is becoming a basic and unavoidable requirement for ensuring the systems' availability, reliability, and operations. In this paper, an improved evolutionary computing-driven extreme learning scheme (ECD-ELM) has been suggested for object-oriented software aging prediction. To perform aging prediction, we employed a variety of metrics, including program size, McCube complexity metrics, Halstead metrics, runtime failure event metrics, and some unique aging-related metrics (ARM). In our suggested paradigm, extracting OOP software metrics is done after pre-processing, which includes outlier detection and normalization. This technique improved our proposed system's ability to deal with instances with unbalanced biases and metrics. Further, different dimensional reduction and feature selection algorithms such as principal component analysis (PCA), linear discriminant analysis (LDA), and T-Test analysis have been applied. We have suggested a single hidden layer multi-feed forward neural network (SL-MFNN) based ELM, where an adaptive genetic algorithm (AGA) has been applied to estimate the weight and bias parameters for ELM learning. Unlike the traditional neural networks model, the implementation of GA-based ELM with LDA feature selection has outperformed other aging prediction approaches in terms of prediction accuracy, precision, recall, and F-measure. The results affirm that the implementation of outlier detection, normalization of imbalanced metrics, LDA-based feature selection, and GA-based ELM can be the reliable solution for object-oriented software aging prediction.

Improvement of generalization of linear model through data augmentation based on Central Limit Theorem (데이터 증가를 통한 선형 모델의 일반화 성능 개량 (중심극한정리를 기반으로))

  • Hwang, Doohwan
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.19-31
    • /
    • 2022
  • In Machine learning, we usually divide the entire data into training data and test data, train the model using training data, and use test data to determine the accuracy and generalization performance of the model. In the case of models with low generalization performance, the prediction accuracy of newly data is significantly reduced, and the model is said to be overfit. This study is about a method of generating training data based on central limit theorem and combining it with existed training data to increase normality and using this data to train models and increase generalization performance. To this, data were generated using sample mean and standard deviation for each feature of the data by utilizing the characteristic of central limit theorem, and new training data was constructed by combining them with existed training data. To determine the degree of increase in normality, the Kolmogorov-Smirnov normality test was conducted, and it was confirmed that the new training data showed increased normality compared to the existed data. Generalization performance was measured through differences in prediction accuracy for training data and test data. As a result of measuring the degree of increase in generalization performance by applying this to K-Nearest Neighbors (KNN), Logistic Regression, and Linear Discriminant Analysis (LDA), it was confirmed that generalization performance was improved for KNN, a non-parametric technique, and LDA, which assumes normality between model building.

Analysis of Research Trends in Cloud Security Using Topic Modeling and Time-Series Analysis: Focusing on NTIS Projects (토픽모델링과 시계열 분석을 활용한 클라우드 보안 분야 연구 동향 분석 : NTIS 과제를 중심으로)

  • Sun Young Yun;Nam Wook Cho
    • Convergence Security Journal
    • /
    • v.24 no.2
    • /
    • pp.31-38
    • /
    • 2024
  • Recent expansion in cloud service usage has heightened the importance of cloud security. The purpose of this study is to analyze current research trends in the field of cloud security and to derive implications. To this end, R&D project data provided by the National Science and Technology Knowledge Information Service (NTIS) from 2010 to 2023 was utilized to analyze trends in cloud security research. Fifteen core topics in cloud security research were identified using LDA topic modeling and ARIMA time series analysis. Key areas identified in the research include AI-powered security technologies, privacy and data security, and solving security issues in IoT environments. This highlights the need for research to address security threats that may arise due to the proliferation of cloud technologies and the digital transformation of infrastructure. Based on the derived topics, the field of cloud security was divided into four categories to define a technology reference model, which was improved through expert interviews. This study is expected to guide the future direction of cloud security development and provide important guidelines for future research and investment in academia and industry.

Abnormal Behavior Recognition Based on Spatio-temporal Context

  • Yang, Yuanfeng;Li, Lin;Liu, Zhaobin;Liu, Gang
    • Journal of Information Processing Systems
    • /
    • v.16 no.3
    • /
    • pp.612-628
    • /
    • 2020
  • This paper presents a new approach for detecting abnormal behaviors in complex surveillance scenes where anomalies are subtle and difficult to distinguish due to the intricate correlations among multiple objects' behaviors. Specifically, a cascaded probabilistic topic model was put forward for learning the spatial context of local behavior and the temporal context of global behavior in two different stages. In the first stage of topic modeling, unlike the existing approaches using either optical flows or complete trajectories, spatio-temporal correlations between the trajectory fragments in video clips were modeled by the latent Dirichlet allocation (LDA) topic model based on Markov random fields to obtain the spatial context of local behavior in each video clip. The local behavior topic categories were then obtained by exploiting the spectral clustering algorithm. Based on the construction of a dictionary through the process of local behavior topic clustering, the second phase of the LDA topic model learns the correlations of global behaviors and temporal context. In particular, an abnormal behavior recognition method was developed based on the learned spatio-temporal context of behaviors. The specific identification method adopts a top-down strategy and consists of two stages: anomaly recognition of video clip and anomalous behavior recognition within each video clip. Evaluation was performed using the validity of spatio-temporal context learning for local behavior topics and abnormal behavior recognition. Furthermore, the performance of the proposed approach in abnormal behavior recognition improved effectively and significantly in complex surveillance scenes.