• Title/Abstract/Keywords: Text Network

Search results: 1,110 items (processing time: 0.025 seconds)

A Comparative Study of Word Embedding Models for Arabic Text Processing

  • Assiri, Fatmah;Alghamdi, Nuha
    • International Journal of Computer Science & Network Security, Vol. 22, No. 8, pp. 399-403, 2022
  • Natural texts are analyzed to obtain their intended meaning so that they can be classified according to the problem under study. One way to represent words is to generate vectors of real values that encode their meaning; this is called word embedding. Similarities between word representations are then measured to identify the text class. Word embeddings can be created using the word2vec technique. More recently, however, fastText was introduced to provide better results when used with classifiers. In this paper, we study the performance of well-known classifiers when using both word embedding techniques with an Arabic dataset. We applied them to real data collected from Wikipedia and found that word2vec and fastText had similar accuracy with all of the classifiers used.
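The abstract above mentions measuring similarity between word vectors. A common choice for that is cosine similarity, sketched below in plain Python. The three-dimensional vectors and the helper name `most_similar` are made up for illustration; the abstract does not say which similarity measure or which vector dimensions the authors used.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


# Toy "embeddings" with invented values (real word2vec/fastText vectors
# typically have tens to hundreds of dimensions).
embeddings = {
    "king": [0.9, 0.1, 0.3],
    "queen": [0.8, 0.2, 0.35],
    "apple": [0.1, 0.9, 0.0],
}


def most_similar(word, table):
    """Hypothetical helper: the other word whose vector is closest to `word`."""
    candidates = (w for w in table if w != word)
    return max(candidates, key=lambda w: cosine_similarity(table[word], table[w]))


nearest = most_similar("king", embeddings)
```

With these toy values, `nearest` is `"queen"`, because its vector points in nearly the same direction as the one for `"king"`.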

An End-to-End Sequence Learning Approach for Text Extraction and Recognition from Scene Image

  • Lalitha, G.;Lavanya, B.
    • International Journal of Computer Science & Network Security, Vol. 22, No. 7, pp. 220-228, 2022
  • Images often carry useful information, so detecting text in scene images is important. The purpose of the proposed work is to recognize text in scene images, for example sign boards placed along highways. Detecting text on highway sign boards plays a vital role in road safety measures. In the initial stage, preprocessing techniques are applied to sharpen the image and enhance the features it contains. Likewise, morphological operators are applied to remove the small gaps that exist between objects. We propose a two-phase algorithm for extracting and recognizing text from scene images. In phase I, text is extracted from the scene image by applying image preprocessing techniques such as blurring, erosion, and top-hat, followed by thresholding and a morphological gradient with fixed kernel sizes; a Canny edge detector is then applied to detect the text contained in the image. In phase II, the text is recognized using MSER (Maximally Stable Extremal Regions) and OCR. The work targets text in scene images from the popular dataset repositories SVT, ICDAR 2003, and MSRA-TD 500, whose images were captured under various illumination conditions and angles. The proposed algorithm achieves higher accuracy in minimal execution time compared with state-of-the-art methods.

Bibliometric Network Analysis on Low Cost Carrier Research

  • 나진성;최동현
    • Journal of the Korean Society for Aviation and Aeronautics, Vol. 23, No. 1, pp. 14-23, 2015
  • This study applied network text analysis to reveal the scope and trends of low cost carrier studies. We analyzed low cost carrier research published in Korean journals and news articles. The results showed three clusters of research topics. The first consists of articles investigating growth in the low cost carrier industry. The second is associated with service characteristics. The third has strong ties to organizational and human resource issues. We ran KrKwic, KrTitle, NetDraw, and UCINET 6.0 to conduct the network text analysis. This study suggests directions for future low cost carrier research.

Hot Topic Discovery across Social Networks Based on Improved LDA Model

  • Liu, Chang;Hu, RuiLin
    • KSII Transactions on Internet and Information Systems (TIIS), Vol. 15, No. 11, pp. 3935-3949, 2021
  • With the rapid development of Internet and big data technology, various online social network platforms have been established, producing massive amounts of information every day. Hot topic discovery aims to dig out meaningful content that users are commonly concerned about from the massive information on the Internet. Most existing hot topic discovery methods focus on a single network data source; they can hardly grasp hot spots as a whole, nor meet the challenges of text sparsity and topic hotness evaluation in cross-network scenarios. This paper proposes a novel hot topic discovery method across social networks based on an improved LDA model. It first integrates the text information from multiple social network platforms into a unified data set, then obtains the latent topic distribution in the text through the improved LDA model. Finally, it adopts a heat evaluation method based on the word frequency of topic label words and takes the latent topic with the highest heat value as the hot topic. This paper obtains data from online social networks and constructs a cross-network topic discovery data set. The experimental results demonstrate the superiority of the proposed method over baseline methods.
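The final step described above, scoring each latent topic by the frequency of its label words and picking the highest-scoring one, can be illustrated with a small sketch. The scoring rule used here (the sum of corpus-wide counts of a topic's label words) is an assumption made for illustration; the abstract does not give the authors' actual heat formula. The function names, documents, and topic labels are likewise hypothetical, and the LDA step itself is not shown.

```python
from collections import Counter


def topic_heat(label_words, word_counts):
    """Assumed heat score: total corpus frequency of a topic's label words."""
    return sum(word_counts[word] for word in label_words)


def hottest_topic(topics, documents):
    """Return (name of the topic with the highest heat, all heat values)."""
    # Naive whitespace tokenization over the merged, cross-platform corpus.
    word_counts = Counter(
        token for doc in documents for token in doc.lower().split()
    )
    heats = {name: topic_heat(words, word_counts) for name, words in topics.items()}
    return max(heats, key=heats.get), heats


# Invented posts standing in for text merged from several platforms.
documents = [
    "flood warning issued for the coast",
    "flood damage rises after storm",
    "new phone launch today",
    "storm and flood updates",
]

# Invented label words standing in for the output of a topic model.
topics = {
    "weather": ["flood", "storm"],
    "tech": ["phone", "launch"],
}

hot, heats = hottest_topic(topics, documents)
```

In this toy corpus the "weather" topic gets a heat of 5 and "tech" a heat of 2, so "weather" is selected as the hot topic.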

An Enhanced Text Mining Approach using Ensemble Algorithm for Detecting Cyber Bullying

  • Z.Sunitha Bai;Sreelatha Malempati
    • International Journal of Computer Science & Network Security, Vol. 23, No. 5, pp. 1-6, 2023
  • Text mining (TM) is widely used to process unstructured text documents and data from various domains. Text mining is also referred to as text classification. It is popular in many application areas, such as movie reviews, product reviews on e-commerce websites, sentiment analysis, topic modeling, and cyberbullying detection in social media messages. Cyberbullying is the abuse of someone using insulting language; personal abuse, sexual harassment, and other types of abuse fall under cyberbullying. Several existing systems detect bullying words based on their context in social networking sites (SNS), which have become a platform for bullying. In this paper, an Enhanced Text Mining Approach using an ensemble algorithm (ETMA) is developed to solve several problems of traditional algorithms and to improve accuracy, processing time, and result quality. ETMA analyzes bullying text on SNS such as Facebook and Twitter. It is applied to a synthetic dataset collected from various data sources, consisting of 5k messages labeled as bullying or non-bullying. Performance is reported in terms of precision, recall, F1-score, and accuracy.

YOLO, EAST: Comparison of Scene Text Detection Performance Using a Neural Network Model

  • 박찬용;임영민;정승대;조영혁;이병철;이규현;김진욱
    • KIPS Transactions on Software and Data Engineering, Vol. 11, No. 3, pp. 115-124, 2022
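  • In this paper, we apply the YOLO and EAST neural networks, which have recently been widely used in various fields, to the problem of detecting text strings in images, and we compare and analyze their performance. YOLO networks are generally known to perform poorly at detecting text regions in images. Our experiments show that YOLOv3 is relatively weak at text string detection, but the recently released YOLOv4 and YOLOv5 perform very well at detecting Korean and English text strings in images of various types. We therefore expect these YOLO-based text detection methods to be widely used in the field of character recognition in the future.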

Analysis of Nursing Start-up Trends Using Text Network Analysis

  • 김주행
    • Journal of the Korea Convergence Society, Vol. 11, No. 1, pp. 359-367, 2020
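  • This study was conducted to explore the topics of interest in nursing start-ups, the attributes of the nursing start-up experience, and the direction of nursing start-ups as they appear in the related literature. Fifty-five documents on nursing start-ups were selected from the MEDLINE, Embase, and Cochrane Library databases and analyzed using text network analysis. The analysis showed that the core keywords common to both simple occurrence frequency and degree centrality were 'business', 'care', 'nursing', 'healthcare', and 'service'. Keywords ranking high in degree centrality were 'mission', 'vision', and 'team'. The results of this study could serve as basic data for developing systematic nursing start-up education programs and nursing start-up theory.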
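The two measures named in the abstract above, simple occurrence frequency and degree centrality, can be computed on a keyword co-occurrence network as in the following sketch. The keyword lists are invented, and the rule that two keywords are linked when they appear in the same document is an assumption; the abstract does not describe how the authors built their network. Degree centrality is computed in its standard normalized form, the number of neighbors divided by (n - 1).

```python
from collections import Counter
from itertools import combinations


def keyword_frequency(docs):
    """Number of documents in which each keyword appears."""
    return Counter(keyword for keywords in docs for keyword in set(keywords))


def build_cooccurrence_edges(docs):
    """Undirected edges between keywords that share at least one document."""
    edges = set()
    for keywords in docs:
        for a, b in combinations(sorted(set(keywords)), 2):
            edges.add((a, b))
    return edges


def degree_centrality(edges):
    """Normalized degree centrality: neighbors / (n - 1).

    Only keywords that have at least one edge appear in the result.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adjacent) / (n - 1) for node, adjacent in neighbors.items()}


# Invented keyword lists, one per document.
docs = [
    ["nursing", "business", "care"],
    ["nursing", "service"],
    ["nursing", "care", "team"],
]

frequency = keyword_frequency(docs)
centrality = degree_centrality(build_cooccurrence_edges(docs))
```

In this toy network "nursing" co-occurs with every other keyword, so its centrality is 1.0, while "service" is linked only to "nursing" and scores 0.25. A keyword can rank differently on the two measures, which is why the abstract reports them separately.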

Systematic network analysis of herb formula in Traditional East Asian Medicine discloses synergistic operation of medicinal herb pairs with statistical significance

  • Lee, Jungsul;Jeon, Jongwook;Choi, Chulhee
    • CELLMED, Vol. 5, No. 2, pp. 11.1-11.5, 2015
  • Traditional East Asian Medicine (TEAM) prescriptions typically consist of several herbs, based on the assumption that the herbs operate synergistically and/or cooperate on several related pathways simultaneously. This general concept is widely accepted in TEAM, but it has not been tested systematically. To check this assumption statistically, we text-mined the Inje-ji (仁濟志, Collections of benevolent savings), a traditional Korean medicine text that contains more than 5000 herb-cocktail prescriptions. We created a herb-pairing network based on herb-herb pairing specificity and performed a systematic network analysis. Herbs were shown to be used selectively with other herbs and not randomly. Moreover, herb pairs were more specifically associated with symptoms than were single herbs. Single herbs and combinations of herbs specifically used for diabetes mellitus were successfully identified. In conclusion, herb pairings in TEAM are not randomly constructed; instead, each herb was selectively used with other herbs. In terms of statistical significance, herb pairs were more specifically associated with symptoms than were single herbs alone. Collectively, these results suggest that it may be important to understand the interactions among multiple ingredients contained in herb pairs rather than trying to identify a single compound to resolve symptoms.

The Impact of Transforming Unstructured Data into Structured Data on a Churn Prediction Model for Loan Customers

  • Jung, Hoon;Lee, Bong Gyou
    • KSII Transactions on Internet and Information Systems (TIIS), Vol. 14, No. 12, pp. 4706-4724, 2020
  • Along with various structured data, such as company size, loan balance, and savings accounts, this study analyzed the voice of customer (VOC), which is text data containing contact history and counseling details. To analyze the unstructured data, term frequency-inverse document frequency (TF-IDF) analysis, semantic network analysis, sentiment analysis, and a convolutional neural network (CNN) were implemented. A performance comparison of the models revealed that the predictive model using the CNN had the best predictive power, followed by the model using TF-IDF and then the model using semantic network analysis. In particular, a character-level CNN and a word-level CNN were developed separately, and the character-level CNN performed better in an analysis of Korean-language text. Moreover, a systematic selection model for optimal text mining techniques was proposed, suggesting which analytical technique is appropriate for analyzing text data depending on the context. This study also provides evidence that the results of previous studies, which indicate that individual customers leave when their loyalty and switching cost are low, also apply to corporate customers, and it suggests that VOC data indicating customers' needs are very effective for predicting their behavior.
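TF-IDF, the first of the text analysis techniques listed above, weights a term by how often it occurs in a document and how rare it is across the collection. The sketch below uses one common textbook variant, relative term frequency multiplied by log(N / df); the abstract does not say which variant the authors used, so this choice is an assumption. The sample messages are invented, and the whitespace tokenization is a simplification that would not suit real Korean VOC text.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Weight = (count of term in doc / doc length) * log(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append(
            {
                term: (count / total) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


# Invented customer messages for illustration.
messages = [
    "loan rate inquiry",
    "loan repayment complaint",
    "loan rate complaint complaint",
]

weights = tf_idf(messages)
```

Because "loan" appears in every message, its weight is zero everywhere, while a term that is frequent in one message but absent from others, such as "complaint" in the third message, receives a positive weight. Per-document weight vectors like these are the kind of structured features that can then be fed to a predictive model.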

Using Highly Secure Data Encryption Method for Text File Cryptography

  • Abu-Faraj, Mua'ad M.;Alqadi, Ziad A.
    • International Journal of Computer Science & Network Security, Vol. 21, No. 12, pp. 53-60, 2021
  • Many standard methods are used to encrypt secret text files and short secret messages. These methods are efficient when the text to be encrypted is small, but their efficiency decreases rapidly as the text size grows. They also sometimes provide a low level of security, which depends on the PK length, and they may sometimes be hacked. In this paper, a new method is introduced to improve the data protection level by using a changeable secret speech file to generate the PK. The Highly Secure Data Encryption (HSDE) method is implemented and tested for data quality levels to ensure that HSDE destroys the data in the encryption phase and recovers the original data in the decryption phase. Some standard data cryptography methods are also implemented, and comparisons are made to justify the enhancements provided by the proposed method.