• Title/Summary/Keyword: text detection


Emerging Topic Detection Using Text Embedding and Anomaly Pattern Detection in Text Streaming Data (텍스트 스트리밍 데이터에서 텍스트 임베딩과 이상 패턴 탐지를 이용한 신규 주제 발생 탐지)

  • Choi, Semok;Park, Cheong Hee
    • Journal of Korea Multimedia Society / v.23 no.9 / pp.1181-1190 / 2020
  • Detecting an anomaly pattern that deviates from the normal data distribution in streaming data is an important technique in many application areas. This paper proposes a method, based on text embedding and anomaly pattern detection, for detecting a newly emerging pattern in text streaming data, that is, in an ordered sequence of texts. The detection performance of the proposed method is compared across the text embedding methods BOW (Bag of Words), Word2Vec, and BERT. Experimental results show that anomaly pattern detection using BERT embedding gave an average F1 value of 0.85, with an F1 value of 1 in three of the five test cases.
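
The idea of combining a text embedding with anomaly pattern detection can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes a bag-of-words embedding, cosine distance to the centroid of a reference set, and a hypothetical rule that reports an emerging topic once `window` consecutive texts exceed `threshold`. The function names, parameters, and sample texts are all made up for the example.

```python
import math
from collections import Counter

def bow_vector(text):
    """Bag-of-words embedding: lowercase token -> count."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def detect_emerging(reference, stream, window=3, threshold=0.8):
    """Return the stream index where a run of `window` consecutive
    outlying texts begins, or None if no such run occurs.
    The run-length rule is an assumption made for this sketch."""
    centroid = Counter()
    for text in reference:
        centroid.update(bow_vector(text))
    run = 0
    for i, text in enumerate(stream):
        if cosine_distance(bow_vector(text), centroid) > threshold:
            run += 1
        else:
            run = 0
        if run >= window:
            return i - window + 1  # first text of the anomalous run
    return None

reference = ["stock market prices rise",
             "market prices fall on stock news",
             "stock news moves market"]
stream = ["stock prices rise again", "market news today",
          "volcano eruption ash cloud", "ash cloud grounds flights",
          "volcano ash flights cancelled", "eruption continues"]
onset = detect_emerging(reference, stream)
print(onset)
```

Requiring a run of outliers rather than a single one separates an anomaly pattern from isolated noisy texts. Swapping `bow_vector` for a Word2Vec or BERT encoder would only change the embedding step.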

A Novel Text Sample Selection Model for Scene Text Detection via Bootstrap Learning

  • Kong, Jun;Sun, Jinhua;Jiang, Min;Hou, Jian
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.2 / pp.771-789 / 2019
  • Text detection has been a popular research topic in the field of computer vision. Prevalent text detection algorithms find it difficult to avoid dependence on datasets. To overcome this problem, we propose a novel unsupervised text detection algorithm inspired by bootstrap learning. First, a text candidate in a novel superpixel form is proposed to improve the text recall rate through image segmentation. Second, we propose a unique text sample selection model (TSSM) to extract text samples from the current image and eliminate database dependency. Specifically, to improve the precision of the samples, we combine maximally stable extremal regions (MSERs) and the saliency map to generate sample reference maps with a double threshold scheme. Finally, a multiple kernel boosting method is developed to generate a strong text classifier by combining multiple single-kernel SVMs trained on the samples selected by the TSSM. Experimental results on standard datasets demonstrate that our text detection method is robust to complex backgrounds and multilingual text and performs stably across different standard datasets.
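
A double threshold scheme over an MSER mask and a saliency map might look like the sketch below. The abstract does not say how the two maps are combined, so the rule here is an assumption: a pixel is a text sample when it lies in an MSER and its saliency is at least `high`, a background sample when it lies outside every MSER and its saliency is at most `low`, and it is left unlabeled otherwise. The function name, thresholds, and toy arrays are hypothetical.

```python
def select_samples(mser_mask, saliency, low=0.3, high=0.7):
    """Label each pixel 1 (text sample), 0 (background sample),
    or None (ambiguous, excluded from training)."""
    labels = []
    for mask_row, sal_row in zip(mser_mask, saliency):
        row = []
        for in_mser, s in zip(mask_row, sal_row):
            if in_mser and s >= high:
                row.append(1)      # confident text sample
            elif not in_mser and s <= low:
                row.append(0)      # confident background sample
            else:
                row.append(None)   # the two cues disagree or are weak
        labels.append(row)
    return labels

mser_mask = [[1, 1, 0],
             [0, 1, 0]]
saliency = [[0.9, 0.5, 0.1],
            [0.8, 0.75, 0.2]]
labels = select_samples(mser_mask, saliency)
print(labels)
```

Leaving the ambiguous band unlabeled trades sample count for sample precision, which is the stated goal of the scheme.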

Correction of Signboard Distortion by Vertical Stroke Estimation

  • Lim, Jun Sik;Na, In Seop;Kim, Soo Hyung
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.9 / pp.2312-2325 / 2013
  • In this paper, we propose a preprocessing method that corrects the distortion of the text area in Korean signboard images in order to improve character recognition. Perspective distortion in Korean signboard text can cause a low recognition rate. The proposed method consists of four main steps and eight sub-steps; the main steps are potential vertical component detection, vertical component detection, text-boundary estimation, and distortion correction. First, potential vertical line component detection consists of four sub-steps: edge detection for each connected component, pixel distance normalization along the edge, dominant-point detection on the edge, and removal of horizontal components. Second, vertical line component detection consists of removal of diagonal components and extraction of vertical line components. Third, the text-boundary (outline) estimation step consists of left and right boundary line detection. Finally, the distortion of the text image is corrected by a bilinear transformation based on the estimated outline. We compared OCR recognition rates before and after applying the proposed algorithm. The recognition rates of the distortion-corrected signboard images are 29.63% and 21.9% higher than those of the original images at the character level and the text level, respectively.
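
The final correction step can be illustrated with a standard bilinear mapping from the unit square onto a quadrilateral. This is a sketch under assumptions: the four corners of the estimated text outline are taken as given, the corner ordering and the nearest-neighbor sampling are choices made here, and `bilinear_map` and `rectify` are hypothetical names.

```python
def bilinear_map(u, v, quad):
    """Map (u, v) in the unit square to a point inside `quad`.
    `quad` lists corners as top-left, top-right, bottom-right, bottom-left."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    x = (1 - u) * (1 - v) * x0 + u * (1 - v) * x1 + u * v * x2 + (1 - u) * v * x3
    y = (1 - u) * (1 - v) * y0 + u * (1 - v) * y1 + u * v * y2 + (1 - u) * v * y3
    return x, y

def rectify(image, quad, width, height):
    """Sample an upright width x height image from the distorted quad
    using nearest-neighbor lookup. Assumes `quad` lies inside `image`."""
    out = []
    for r in range(height):
        v = r / (height - 1) if height > 1 else 0.0
        row = []
        for c in range(width):
            u = c / (width - 1) if width > 1 else 0.0
            x, y = bilinear_map(u, v, quad)
            row.append(image[int(round(y))][int(round(x))])
        out.append(row)
    return out

# Toy 5x5 image whose pixel value encodes its position as 10*y + x.
image = [[10 * y + x for x in range(5)] for y in range(5)]
quad = [(1, 0), (4, 1), (4, 3), (0, 4)]
center = bilinear_map(0.5, 0.5, quad)
patch = rectify(image, quad, 2, 2)
print(center, patch)
```

Each output pixel looks up its source location in the distorted image, so the output has no holes regardless of how skewed the outline is.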

Extraction of Text Alignment by Tensor Voting and its Application to Text Detection (텐서보팅을 이용한 텍스트 배열정보의 획득과 이를 이용한 텍스트 검출)

  • Lee, Guee-Sang;Dinh, Toan Nguyen;Park, Jong-Hyun
    • Journal of KIISE:Software and Applications / v.36 no.11 / pp.912-919 / 2009
  • A novel algorithm using 2D tensor voting and an edge-based approach is proposed for text detection in natural scene images. Tensor voting is used because characters in a text line usually lie close together on a smooth curve, so the tokens corresponding to the centers of these characters have high curve saliency values. First, a suitable edge-based method is used to find all possible text regions. Because the edge-based method produces a high false positive rate, 2D tensor voting is then applied to remove false positives and keep only text regions. The experimental results show that our method successfully detects text regions in many complex natural scene images.
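
Why aligned character centers receive high curve saliency can be seen in a heavily simplified sketch of a single voting pass. It is not the paper's pipeline. It assumes unoriented tokens, where each voter casts the tensor n nᵀ (n being the unit normal to the line joining voter and receiver) weighted by a Gaussian distance decay with no curvature term. Curve saliency is taken as the difference of the two eigenvalues of the accumulated 2x2 tensor. The function name and `sigma` are assumptions.

```python
import math

def curve_saliency(tokens, sigma=2.0):
    """Return one curve saliency value (lambda1 - lambda2) per token."""
    saliencies = []
    for i, (px, py) in enumerate(tokens):
        a = b = c = 0.0  # accumulated symmetric tensor [[a, b], [b, c]]
        for j, (qx, qy) in enumerate(tokens):
            if i == j:
                continue
            dx, dy = px - qx, py - qy
            dist2 = dx * dx + dy * dy
            if dist2 == 0:
                continue  # coincident tokens cast no vote
            weight = math.exp(-dist2 / sigma ** 2)
            dist = math.sqrt(dist2)
            nx, ny = -dy / dist, dx / dist  # normal to the joining line
            a += weight * nx * nx
            b += weight * nx * ny
            c += weight * ny * ny
        # Eigenvalue gap of a symmetric 2x2 matrix in closed form.
        saliencies.append(2.0 * math.hypot((a - c) / 2.0, b))
    return saliencies

# Five character centers on a straight text line.
line = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
# A token surrounded by neighbors in all four directions.
cross = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
line_mid = curve_saliency(line)[2]
cross_mid = curve_saliency(cross)[0]
print(line_mid, cross_mid)
```

On the line, all votes share one normal direction and reinforce each other. In the cross layout, votes from perpendicular directions cancel in the eigenvalue gap, which is how isolated false positives can be suppressed.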

Deep-Learning Approach for Text Detection Using Fully Convolutional Networks

  • Tung, Trieu Son;Lee, Gueesang
    • International Journal of Contents / v.14 no.1 / pp.1-6 / 2018
  • Text, as one of the most influential inventions of humanity, has played an important role in human life since ancient times. The rich and precise information embodied in text is very useful in a wide range of vision-based applications; for example, text data extracted from images can support automatic annotation, indexing, language translation, and assistance systems for impaired persons. Natural-scene text detection is therefore an important and active research topic in computer vision and document analysis. Previous methods perform poorly because of numerous false-positive and false-negative regions. In this paper, a supervised method based on a fully convolutional network (FCN) is used to localize textual regions. The model was trained directly on images, with pixel values as inputs and binary ground truth as the label. The method was evaluated on the ICDAR-2013 dataset and proved to be comparable to other feature-based methods. It could expedite future research on text detection using deep-learning-based approaches.

Text Detection based on Edge Enhanced Contrast Extremal Region and Tensor Voting in Natural Scene Images

  • Pham, Van Khien;Kim, Soo-Hyung;Yang, Hyung-Jeong;Lee, Guee-Sang
    • Smart Media Journal / v.6 no.4 / pp.32-40 / 2017
  • In this paper, a robust text detection method based on the edge-enhanced contrast extremal region (CER) is proposed using the stroke width transform (SWT) and tensor voting. First, the edge-enhanced CER extracts a number of covariant regions, which are stable connected components, from input images. Next, the SWT is computed from the distance map and used to eliminate non-text regions. Then, the candidate text regions are verified by tensor voting, which takes the center points from the previous step as input to compute curve saliency values. Finally, connected component grouping is applied to cluster characters that are close to each other. The proposed method is evaluated on the ICDAR2003 and ICDAR2013 text detection competition datasets, and the experimental results show high accuracy compared to previous methods.
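
One way to derive a stroke width from a distance map is sketched below. It is an assumption-laden illustration, not the paper's SWT. It uses a two-pass city-block distance transform on a binary region mask and takes `2 * peak - 1` as a rough width estimate (exact only for straight strokes of odd thickness). The function names and toy masks are hypothetical.

```python
def distance_map(mask):
    """City-block distance from each foreground pixel to the nearest
    background pixel, computed with a forward and a backward pass."""
    rows, cols = len(mask), len(mask[0])
    far = rows + cols  # larger than any possible city-block distance
    dist = [[far if mask[r][c] else 0 for c in range(cols)]
            for r in range(rows)]
    for r in range(rows):                      # forward pass
        for c in range(cols):
            if r > 0:
                dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
            if c > 0:
                dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)
    for r in range(rows - 1, -1, -1):          # backward pass
        for c in range(cols - 1, -1, -1):
            if r < rows - 1:
                dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
            if c < cols - 1:
                dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist

def stroke_width(mask):
    """Rough stroke width of a region from the peak of its distance map."""
    peak = max(max(row) for row in distance_map(mask))
    return max(2 * peak - 1, 0)

# A stroke-like bar, 3 pixels thick, inside a 5x7 image.
bar = [[0] * 7] + [[0, 1, 1, 1, 1, 1, 0] for _ in range(3)] + [[0] * 7]
# A 5x5 blob inside a 7x7 image.
blob = [[0] * 7] + [[0, 1, 1, 1, 1, 1, 0] for _ in range(5)] + [[0] * 7]
print(stroke_width(bar), stroke_width(blob))
```

A region whose estimated width is large relative to its size, like the blob, is a candidate for rejection as non-text. The cut-off itself would have to be tuned on data.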

A Novel Video Image Text Detection Method

  • Zhou, Lin;Ping, Xijian;Gao, Haolin;Xu, Sen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.3 / pp.941-953 / 2012
  • A novel and universal method of video image text detection is proposed, implemented as a coarse-to-fine procedure. First, the spectral clustering (SC) method is adopted to coarsely detect text regions based on the stationary wavelet transform (SWT). To make full use of the available information, a multi-parameter kernel function that combines feature similarity information and spatial adjacency information is employed in the SC method. Second, 28-dimensional classification features are proposed, and a support vector machine (SVM) is used to separate text regions from non-text regions. Experimental results on video images show encouraging performance for the proposed algorithm and classification features.
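
A kernel that combines feature similarity with spatial adjacency could be built as in the sketch below. The abstract does not give the form of the multi-parameter kernel, so a product of two Gaussians with separate bandwidths `sigma_f` and `sigma_s` is assumed here. The function name, parameters, and toy data are hypothetical, and the eigen-decomposition step of spectral clustering is omitted.

```python
import math

def affinity(features, positions, sigma_f=1.0, sigma_s=2.0):
    """Affinity matrix W with
    W[i][j] = exp(-|f_i - f_j|^2 / (2 sigma_f^2))
            * exp(-|p_i - p_j|^2 / (2 sigma_s^2))."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            df = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            ds = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            w[i][j] = (math.exp(-df / (2 * sigma_f ** 2))
                       * math.exp(-ds / (2 * sigma_s ** 2)))
    return w

# Three image blocks: two similar neighbors and one distant, different block.
features = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
positions = [(0, 0), (1, 0), (8, 8)]
w = affinity(features, positions)
print(w[0][1], w[0][2])
```

With a product kernel, two blocks are strongly linked only when they are both similar in features and close in the image, so the clustering favors spatially compact regions.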

A Novel Video Image Text Detection Method

  • Zhou, Lin;Ping, Xijian;Gao, Haolin;Xu, Sen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.4 / pp.1140-1152 / 2012
  • A novel and universal method of video image text detection is proposed, implemented as a coarse-to-fine procedure. First, the spectral clustering (SC) method is adopted to coarsely detect text regions based on the stationary wavelet transform (SWT). To make full use of the available information, a multi-parameter kernel function that combines feature similarity information and spatial adjacency information is employed in the SC method. Second, 28-dimensional classification features are proposed, and a support vector machine (SVM) is used to separate text regions from non-text regions. Experimental results on video images show encouraging performance for the proposed algorithm and classification features.

The Adaptive SPAM Mail Detection System using Clustering based on Text Mining

  • Hong, Sung-Sam;Kong, Jong-Hwan;Han, Myung-Mook
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.6 / pp.2186-2196 / 2014
  • Spam mail is one of the most common mail dysfunctions and may cause psychological damage to internet users. As internet usage increases, the amount of spam mail has also gradually increased. Indiscriminate sending, in particular, occurs when spam mail is sent from smartphones or tablets connected to wireless networks. Spam mail accounts for approximately 68% of mail traffic; however, the true percentage is believed to be considerably higher. To analyze and detect spam mail, we introduce a technique based on spam mail characteristics and text mining; in particular, spam mail is detected by means of linguistic analysis and language processing. Existing spam mail is analyzed, and hidden spam signatures are extracted using text clustering. Our proposed method uses a text mining system to improve the detection and error detection rates for existing spam mail and to respond to new spam mail types.
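
Extracting shared signatures by text clustering can be sketched with a simple single-pass scheme. This is an assumption, since the abstract does not name the clustering algorithm. Each message joins the most similar existing cluster when the Jaccard similarity of their token sets reaches `threshold`, and a cluster's signature is the set of tokens shared by all of its members. The function names and sample messages are made up.

```python
def tokens(text):
    """Lowercased set of whitespace-separated tokens."""
    return set(text.lower().split())

def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_messages(messages, threshold=0.5):
    """Single-pass clustering; returns a list of
    {"signature": set of shared tokens, "members": message indices}."""
    clusters = []
    for i, message in enumerate(messages):
        t = tokens(message)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(t, cluster["signature"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"signature": set(t), "members": [i]})
        else:
            best["members"].append(i)
            best["signature"] &= t  # keep only tokens common to all members
    return clusters

messages = ["win a free prize now",
            "win a free phone now",
            "meeting moved to friday",
            "claim a free prize now"]
clusters = cluster_messages(messages)
for cluster in clusters:
    print(cluster["members"], sorted(cluster["signature"]))
```

A new message can then be scored against the stored signatures, and a message that matches none of them starts a new cluster, which is one way a system could respond to new spam types.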

Text Region Detection using Edge and Regional Minima/Maxima Transformation from Natural Scene Images (에지 및 국부적 최소/최대 변환을 이용한 자연 이미지로부터 텍스트 영역 검출)

  • Park, Jong-Cheon;Lee, Keun-Wang
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.2 / pp.358-363 / 2009
  • Text region detection in natural scene images is used in a variety of applications, and much research is still needed in this field. Recent methods detect text regions using algorithms that combine edge-based and connected-component-based approaches. This paper therefore proposes a text region detection method for natural scene images that uses edges and a regional minima/maxima transformation. The method detects the connected components of the edge and regional minima/maxima images and labels them. The labeled regions are analyzed to detect text candidate regions, and the detected candidates are combined into a single text candidate image. The final text regions are validated by comparing the similarity and adjacency of individual characters. Experimental results show that the proposed algorithm improves the accuracy of text region detection by combining edge and regional minima/maxima connected component detection.
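
The labeling step can be illustrated with a generic connected-component labeling routine. This is a sketch, not the paper's implementation. It assumes a binary map (for example, a thresholded edge or minima/maxima image) and 4-connectivity, and it returns a bounding box per component as a possible text candidate. The function name and toy mask are hypothetical.

```python
from collections import deque

def label_components(mask):
    """Label 4-connected foreground regions with a breadth-first search.
    Returns (labels, boxes); boxes maps label -> (top, left, bottom, right)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    boxes = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or labels[r][c]:
                continue  # background or already labeled
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            boxes[next_label] = (top, left, bottom, right)
    return labels, boxes

mask = [[1, 1, 0, 0, 1],
        [0, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [1, 0, 0, 1, 1]]
labels, boxes = label_components(mask)
print(boxes)
```

The bounding boxes give the size and position needed to compare the similarity and adjacency of neighboring components in a later validation step.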