• Title/Abstract/Keywords: text detection

텍스트 스트리밍 데이터에서 텍스트 임베딩과 이상 패턴 탐지를 이용한 신규 주제 발생 탐지 (Emerging Topic Detection Using Text Embedding and Anomaly Pattern Detection in Text Streaming Data)

  • 최세목;박정희
    • Journal of Korea Multimedia Society (한국멀티미디어학회논문지) / Vol. 23, No. 9 / pp. 1181-1190 / 2020
  • Detecting anomaly patterns that deviate from the normal data distribution in streaming data is an important technique in many application areas. In this paper, we propose a method, based on text embedding and anomaly pattern detection, for detecting newly emerging patterns in text streaming data, that is, an ordered sequence of texts. The detection performance of the proposed method is compared across text embedding methods such as BOW (Bag of Words), Word2Vec, and BERT. Experimental results show that anomaly pattern detection using BERT embedding achieved an average F1 score of 0.85, with an F1 score of 1 in three of the five test cases.
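The abstract compares embeddings but does not spell out the anomaly detector itself. As a minimal sketch, assuming a BOW embedding and a hypothetical rule that flags a new document when its cosine similarity to the centroid of a reference window falls below a threshold, the pipeline and the F1 score used to report results could look like this (`detect_emerging`, `threshold`, and the toy texts are illustrative, not the authors' method):

```python
import math
from collections import Counter
from typing import List


def bow_vector(text: str) -> Counter:
    """Bag-of-words vector: lower-cased whitespace tokens mapped to counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors: List[Counter]) -> Counter:
    """Mean of the count vectors in the reference window."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def detect_emerging(reference_docs: List[str], window_docs: List[str],
                    threshold: float = 0.2) -> List[bool]:
    """Hypothetical rule: flag a document as part of an emerging topic when its
    cosine similarity to the reference centroid is below `threshold`."""
    ref = centroid([bow_vector(d) for d in reference_docs])
    return [cosine(bow_vector(d), ref) < threshold for d in window_docs]


def f1_score(predicted: List[bool], actual: List[bool]) -> float:
    """F1 = harmonic mean of precision and recall over binary labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if a and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


reference = ["stock market prices rise", "market prices fall stock"]
window = ["stock prices rise again", "earthquake hits coastal city"]
print(detect_emerging(reference, window))  # [False, True]
```

Swapping `bow_vector` for Word2Vec or BERT sentence vectors would keep the same detection rule while changing only the embedding, which mirrors the comparison the abstract describes.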

A Novel Text Sample Selection Model for Scene Text Detection via Bootstrap Learning

  • Kong, Jun;Sun, Jinhua;Jiang, Min;Hou, Jian
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 2 / pp. 771-789 / 2019
  • Text detection has been a popular research topic in the field of computer vision. Prevalent text detection algorithms have difficulty avoiding dependence on datasets. To overcome this problem, we propose a novel unsupervised text detection algorithm inspired by bootstrap learning. First, text candidates in a novel superpixel form are proposed to improve the text recall rate through image segmentation. Second, we propose a unique text sample selection model (TSSM) that extracts text samples from the current image and eliminates database dependency. Specifically, to improve sample precision, we combine maximally stable extremal regions (MSERs) and the saliency map to generate sample reference maps with a double-threshold scheme. Finally, a multiple kernel boosting method is developed to build a strong text classifier by combining multiple single-kernel SVMs trained on the samples selected by TSSM. Experimental results on standard datasets demonstrate that our text detection method is robust to complex backgrounds and multilingual text and performs stably across different standard datasets.
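The abstract does not give the exact double-threshold rule used to build sample reference maps. A minimal sketch, assuming a hypothetical rule in which a pixel becomes a positive text sample only if it lies inside an MSER and its saliency reaches a high threshold, and a negative sample only if it lies outside every MSER and its saliency is at or below a low threshold (`low`, `high`, and the label constants are illustrative):

```python
from typing import List

# Hypothetical labels for the sample reference map.
TEXT, NON_TEXT, UNCERTAIN = 1, 0, -1


def sample_reference_map(saliency: List[List[float]],
                         mser_mask: List[List[bool]],
                         low: float = 0.3,
                         high: float = 0.7) -> List[List[int]]:
    """Label each pixel as a positive sample, a negative sample, or uncertain.

    Assumed rule (not taken from the paper): positives need both MSER membership
    and high saliency; negatives need both no MSER membership and low saliency.
    Pixels in between are left out of classifier training.
    """
    out = []
    for sal_row, mask_row in zip(saliency, mser_mask):
        row = []
        for s, in_mser in zip(sal_row, mask_row):
            if in_mser and s >= high:
                row.append(TEXT)
            elif not in_mser and s <= low:
                row.append(NON_TEXT)
            else:
                row.append(UNCERTAIN)
        out.append(row)
    return out


print(sample_reference_map([[0.9, 0.1], [0.5, 0.8]], [[True, False], [True, False]]))
# [[1, 0], [-1, -1]]
```

Leaving ambiguous pixels unlabeled trades sample quantity for sample precision, which is the goal the abstract states for the double-threshold scheme.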

Correction of Signboard Distortion by Vertical Stroke Estimation

  • Lim, Jun Sik;Na, In Seop;Kim, Soo Hyung
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 7, No. 9 / pp. 2312-2325 / 2013
  • In this paper, we propose a method that corrects the distortion of text areas in Korean signboard images as a preprocessing step to improve character recognition. Perspective distortion in Korean signboard text can lower the recognition rate. The proposed method consists of four main steps and eight sub-steps. The main steps are potential vertical component detection, vertical component detection, text-boundary estimation, and distortion correction. First, potential vertical line component detection consists of four sub-steps: edge detection for each connected component, pixel distance normalization along the edge, dominant-point detection on the edge, and removal of horizontal components. Second, vertical line component detection consists of removing diagonal components and extracting vertical line components. Third, the outline estimation step consists of detecting the left and right boundary lines. Finally, the distortion of the text image is corrected by a bilinear transformation based on the estimated outline. We compared OCR recognition rates before and after applying the proposed algorithm. The recognition rates of the distortion-corrected signboard images are 29.63% and 21.9% higher than those of the original images at the character level and the text level, respectively.
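The final step, correcting distortion with a bilinear transformation from the estimated outline, can be illustrated as an inverse mapping: each pixel of the rectified output is mapped back into the distorted quadrilateral by bilinearly interpolating its four corners. This is a minimal sketch, assuming the left and right boundary lines have already been reduced to four corner points and using nearest-neighbour sampling; `bilinear_point` and `rectify` are hypothetical names, and the paper's exact formulation may differ:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def bilinear_point(corners: Sequence[Point], u: float, v: float) -> Point:
    """Map normalized (u, v) in [0, 1]^2 into the quadrilateral whose corners are
    given in the order top-left, top-right, bottom-right, bottom-left."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    x = (1 - u) * (1 - v) * x0 + u * (1 - v) * x1 + u * v * x2 + (1 - u) * v * x3
    y = (1 - u) * (1 - v) * y0 + u * (1 - v) * y1 + u * v * y2 + (1 - u) * v * y3
    return x, y


def rectify(image: List[List[int]], corners: Sequence[Point],
            out_w: int, out_h: int) -> List[List[int]]:
    """Fill an out_h x out_w rectangle by sampling the source image (nearest
    neighbour) at the bilinearly mapped position of every output pixel."""
    h, w = len(image), len(image[0])
    out = []
    for r in range(out_h):
        v = r / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for c in range(out_w):
            u = c / (out_w - 1) if out_w > 1 else 0.0
            x, y = bilinear_point(corners, u, v)
            xi = min(max(round(x), 0), w - 1)  # clamp to image bounds
            yi = min(max(round(y), 0), h - 1)
            row.append(image[yi][xi])
        out.append(row)
    return out


img = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
print(rectify(img, [(1, 0), (3, 0), (3, 2), (1, 2)], 3, 3))
# [[1, 2, 3], [5, 6, 7], [9, 10, 11]]
```

Mapping output pixels back to the source (rather than pushing source pixels forward) guarantees every output pixel receives a value, which is why inverse mapping is the usual choice for geometric correction.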

텐서보팅을 이용한 텍스트 배열정보의 획득과 이를 이용한 텍스트 검출 (Extraction of Text Alignment by Tensor Voting and its Application to Text Detection)

  • 이귀상;또안;박종현
    • Journal of KIISE: Software and Applications (한국정보과학회논문지:소프트웨어및응용) / Vol. 36, No. 11 / pp. 912-919 / 2009
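  • This paper presents a new method for detecting text in natural images using two-dimensional tensor voting and an edge-based method. The characters of a text are usually arranged along a continuous, smooth curve and lie close to one another, and these properties can be detected effectively by tensor voting. Two-dimensional tensor voting computes the continuity of tokens as curve saliency, a property used in a wide variety of image analysis tasks. First, edge detection is used to find text candidate regions where text may be located in the image; the continuity of these candidate regions is then verified by tensor voting to remove noise regions and isolate the text regions. Experimental results confirm that the proposed method effectively detects text regions in complex natural images.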
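Curve saliency in 2D tensor voting is the eigenvalue gap λ1 − λ2 of the second-order tensor accumulated at each token. The sketch below is a simplified single ball-voting pass, assuming Gaussian decay with distance only (it omits the curvature term and the stick-voting pass of full tensor voting); `curve_saliency` and `sigma` are illustrative names:

```python
import math
from typing import List, Tuple


def curve_saliency(points: List[Tuple[float, float]], sigma: float = 2.0) -> List[float]:
    """Return lambda1 - lambda2 for each token after one simplified ball-voting pass.

    Each other token q votes at p with w * (I - d d^T), where d is the unit vector
    from p to q and w = exp(-|q - p|^2 / sigma^2). The tensor encodes the curve
    normal: votes from tokens lying on a line agree, so the eigenvalue gap grows,
    while votes arriving from all directions cancel out.
    """
    result = []
    for i, (px, py) in enumerate(points):
        a = b = c = 0.0  # accumulated symmetric tensor [[a, b], [b, c]]
        for j, (qx, qy) in enumerate(points):
            if i == j:
                continue
            dx, dy = qx - px, qy - py
            dist2 = dx * dx + dy * dy
            if dist2 == 0:
                continue
            w = math.exp(-dist2 / sigma ** 2)
            norm = math.sqrt(dist2)
            ux, uy = dx / norm, dy / norm
            a += w * (1 - ux * ux)
            b += w * (-ux * uy)
            c += w * (1 - uy * uy)
        # Eigenvalue gap of a symmetric 2x2 matrix.
        result.append(2 * math.sqrt(((a - c) / 2) ** 2 + b ** 2))
    return result


line = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
cross = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
print(curve_saliency(line)[2] > curve_saliency(cross)[0])  # True
```

Characters aligned along a text line behave like the `line` tokens and receive high saliency, while isolated noise surrounded by clutter behaves more like the `cross` centre, which is the intuition the abstract uses to separate text from noise regions.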

Deep-Learning Approach for Text Detection Using Fully Convolutional Networks

  • Tung, Trieu Son;Lee, Gueesang
    • International Journal of Contents / Vol. 14, No. 1 / pp. 1-6 / 2018
  • Text, as one of the most influential inventions of humanity, has played an important role in human life since ancient times. The rich and precise information embodied in text is very useful in a wide range of vision-based applications; for example, text extracted from images can support automatic annotation, indexing, language translation, and assistance systems for impaired persons. Natural-scene text detection is therefore an important and active research topic in computer vision and document analysis. Previous methods perform poorly because of numerous false-positive and false-negative regions. In this paper, a supervised fully convolutional network (FCN)-based method is used to localize text regions. The model was trained directly on images, with pixel values as inputs and binary ground truth as labels. The method was evaluated on the ICDAR 2013 dataset and shown to be comparable to other feature-based methods. It could expedite future research on deep-learning-based text detection.
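Training "with pixel values as inputs and binary ground truth as labels" means the network's output is a per-pixel text probability map compared against a binary mask. The abstract does not name the loss; a minimal sketch, assuming per-pixel binary cross-entropy (a common choice for binary segmentation, not confirmed by the source), with the hypothetical helper `pixel_bce`:

```python
import math
from typing import List


def pixel_bce(pred: List[List[float]], target: List[List[int]], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between a predicted text-probability map and a
    binary ground-truth mask (1 = text pixel, 0 = background) of the same shape."""
    total, n = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            n += 1
    return total / n


print(round(pixel_bce([[0.5, 0.5]], [[1, 0]]), 4))  # 0.6931
```

A confident, correct prediction such as `[[0.99, 0.01]]` against `[[1, 0]]` drives the loss toward zero, while uncertain predictions cost about ln 2 per pixel.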

Text Detection based on Edge Enhanced Contrast Extremal Region and Tensor Voting in Natural Scene Images

  • Pham, Van Khien;Kim, Soo-Hyung;Yang, Hyung-Jeong;Lee, Guee-Sang
    • Smart Media Journal (스마트미디어저널) / Vol. 6, No. 4 / pp. 32-40 / 2017
  • In this paper, a robust text detection method based on edge-enhanced contrasting extremal regions (CER) is proposed using the stroke width transform (SWT) and tensor voting. First, edge-enhanced CER extracts a number of covariant regions, which are stable connected components, from the input image. Next, the SWT is computed from a distance map and used to eliminate non-text regions. Then, the remaining candidate text regions are verified by tensor voting, which takes the center points from the previous step as input to compute curve saliency values. Finally, connected-component grouping is applied to cluster nearby characters. The proposed method is evaluated on the ICDAR2003 and ICDAR2013 text detection competition datasets, and the experimental results show high accuracy compared with previous methods.
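The last step, grouping connected components into clusters of nearby characters, is often done with union-find over pairwise tests. The abstract does not state its grouping criteria, so this sketch assumes common heuristics: a horizontal gap no larger than `gap_ratio` times the smaller height, heights within a factor of `height_ratio`, and vertical centres within half the smaller height. All names and thresholds are hypothetical:

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a character component


def group_characters(boxes: List[Box], gap_ratio: float = 1.0,
                     height_ratio: float = 2.0) -> List[List[int]]:
    """Group component indices into text lines using union-find."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    for i in range(len(boxes)):
        xi, yi, wi, hi = boxes[i]
        for j in range(i + 1, len(boxes)):
            xj, yj, wj, hj = boxes[j]
            # Horizontal gap between the boxes (negative when they overlap).
            gap = max(xi, xj) - min(xi + wi, xj + wj)
            similar_height = max(hi, hj) <= height_ratio * min(hi, hj)
            aligned = abs((yi + hi / 2) - (yj + hj / 2)) <= min(hi, hj) / 2
            if gap <= gap_ratio * min(hi, hj) and similar_height and aligned:
                union(i, j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


print(group_characters([(0, 0, 10, 20), (12, 0, 10, 20), (24, 1, 10, 19), (200, 0, 10, 20)]))
# [[0, 1, 2], [3]]
```

Union-find makes the grouping transitive, so a chain of closely spaced characters forms one line even when its first and last characters are far apart.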

A Novel Video Image Text Detection Method

  • Zhou, Lin;Ping, Xijian;Gao, Haolin;Xu, Sen
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 6, No. 3 / pp. 941-953 / 2012
  • A novel and universal method of video image text detection is proposed and implemented as a coarse-to-fine scheme. First, spectral clustering (SC) is adopted to coarsely detect text regions based on the stationary wavelet transform (SWT). To make full use of the available information, the SC method employs a multi-parameter kernel function that combines feature similarity information and spatial adjacency information. Second, 28-dimensional classification features are proposed, and a support vector machine (SVM) is used to separate text regions from non-text regions. Experimental results on video images show encouraging performance of the proposed algorithm and classification features.
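One plausible form of a kernel that "combines feature similarity information and spatial adjacency information" is the product of two Gaussian kernels with separate bandwidths, one over feature vectors (for example, wavelet-based features) and one over pixel or block positions. This is a minimal sketch under that assumption; the paper's actual kernel and parameters are not given in the abstract, and `combined_affinity`, `sigma_f`, and `sigma_p` are hypothetical. The resulting matrix is what a spectral clustering step would take as its precomputed affinity (the eigendecomposition is not shown):

```python
import math
from typing import List, Sequence


def combined_affinity(features: Sequence[Sequence[float]],
                      positions: Sequence[Sequence[float]],
                      sigma_f: float = 1.0,
                      sigma_p: float = 10.0) -> List[List[float]]:
    """W[i][j] = exp(-|f_i - f_j|^2 / (2 sigma_f^2)) * exp(-|p_i - p_j|^2 / (2 sigma_p^2)).

    High affinity requires both similar features and spatial proximity. The
    diagonal is left at zero, as is usual for spectral clustering affinities.
    """
    n = len(features)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            df = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            dp = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
            W[i][j] = math.exp(-df / (2 * sigma_f ** 2)) * math.exp(-dp / (2 * sigma_p ** 2))
    return W


W = combined_affinity([[0.0], [0.0], [5.0]], [[0, 0], [1, 0], [0, 1]])
print(W[0][1] > 0.99, W[0][2] < 1e-4)  # True True
```

Blocks 0 and 2 are spatial neighbours but have very different features, so their affinity is near zero; using the product means either cue alone can suppress an edge in the similarity graph.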

A Novel Video Image Text Detection Method

  • Zhou, Lin;Ping, Xijian;Gao, Haolin;Xu, Sen
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 6, No. 4 / pp. 1140-1152 / 2012
  • A novel and universal method of video image text detection is proposed and implemented as a coarse-to-fine scheme. First, spectral clustering (SC) is adopted to coarsely detect text regions based on the stationary wavelet transform (SWT). To make full use of the available information, the SC method employs a multi-parameter kernel function that combines feature similarity information and spatial adjacency information. Second, 28-dimensional classification features are proposed, and a support vector machine (SVM) is used to separate text regions from non-text regions. Experimental results on video images show encouraging performance of the proposed algorithm and classification features.

The Adaptive SPAM Mail Detection System using Clustering based on Text Mining

  • Hong, Sung-Sam;Kong, Jong-Hwan;Han, Myung-Mook
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 8, No. 6 / pp. 2186-2196 / 2014
  • Spam mail is one of the most common email problems and may cause psychological harm to internet users. As internet usage increases, the amount of spam mail has also gradually increased. Indiscriminate sending in particular occurs when spam mail is sent from smartphones or tablets connected to wireless networks. Spam mail accounts for approximately 68% of mail traffic, and the true proportion is believed to be considerably higher. To analyze and detect spam mail, we introduce a technique based on spam mail characteristics and text mining; in particular, spam mail is detected through linguistic analysis and language processing. Existing spam mail is analyzed, and hidden spam signatures are extracted using text clustering. Our proposed method uses a text mining system to improve the detection and error detection rates for existing spam mail and to respond to new types of spam mail.
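The idea of extracting hidden spam signatures through text clustering can be sketched as: cluster known spam by token overlap, take the tokens shared by every member of a cluster as its signature, and flag new mail that contains a whole signature. This minimal sketch assumes single-pass (leader) clustering with Jaccard similarity; the paper's actual clustering algorithm and signature format are not specified in the abstract, and all names here are hypothetical:

```python
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lower-cased whitespace tokens of a message."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_spam(messages: List[str], threshold: float = 0.5) -> List[List[Set[str]]]:
    """Leader clustering: each message joins the first cluster whose leader
    (first member) it resembles with Jaccard similarity >= threshold."""
    clusters: List[List[Set[str]]] = []
    for m in messages:
        t = tokens(m)
        for c in clusters:
            if jaccard(t, c[0]) >= threshold:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters


def signatures(clusters: List[List[Set[str]]], min_size: int = 2) -> List[Set[str]]:
    """Signature of a cluster = tokens shared by all of its members."""
    return [set.intersection(*c) for c in clusters if len(c) >= min_size]


def is_spam(text: str, sigs: List[Set[str]]) -> bool:
    """Flag a message that contains every token of at least one signature."""
    t = tokens(text)
    return any(sig and sig <= t for sig in sigs)


known_spam = ["win a free prize now", "win a free prize today", "cheap loans apply now"]
sigs = signatures(cluster_spam(known_spam))
print(is_spam("you win a free prize", sigs))  # True
```

Because signatures come from clusters rather than individual messages, a new variant that keeps the shared core wording is still caught, which loosely corresponds to the abstract's goal of responding to new spam types.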

에지 및 국부적 최소/최대 변환을 이용한 자연 이미지로부터 텍스트 영역 검출 (Text Region Detection using Edge and Regional Minima/Maxima Transformation from Natural Scene Images)

  • 박종천;이근왕
    • Journal of the Korea Academia-Industrial cooperation Society (한국산학기술학회논문지) / Vol. 10, No. 2 / pp. 358-363 / 2009
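  • Because text region detection in natural images is used in a wide range of applications, much research in this area is needed. Recent methods detect text regions using various algorithms that combine edge-based and connected-component-based methods. Accordingly, this paper proposes an algorithm that detects text regions by combining edges with a regional minima/maxima transformation. Edges and regional minima/maxima connected components are detected from the grayscale image and then labeled. The labeled regions are analyzed to detect text candidate regions, and the detected candidate regions are combined into a single text candidate image. The final text regions are detected by verifying the adjacency and similarity of the individual candidate characters. Experimental results show that, by combining edge component detection with regional minima/maxima connected component detection, the proposed algorithm improves the accuracy and recall of text region detection in natural images.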
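Regional maxima are connected plateaus of equal intensity whose every outside neighbour is strictly darker (regional minima are the same on the negated image). A minimal sketch of detecting and labeling them in a grayscale image, assuming 4-connectivity (the abstract does not state which connectivity is used); `regional_maxima` is a hypothetical helper:

```python
from collections import deque
from typing import List, Tuple


def regional_maxima(img: List[List[int]]) -> Tuple[List[List[int]], int]:
    """Label 4-connected regional maxima.

    Returns (labels, k): labels[r][c] is 0 for pixels outside any regional
    maximum and 1..k for the k detected regions. For regional minima, call
    this on the negated image.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    visited = [[False] * w for _ in range(h)]
    k = 0
    for sr in range(h):
        for sc in range(w):
            if visited[sr][sc]:
                continue
            value = img[sr][sc]
            plateau, is_max = [], True
            queue = deque([(sr, sc)])
            visited[sr][sc] = True
            while queue:  # flood-fill the plateau of equal intensity
                r, c = queue.popleft()
                plateau.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if 0 <= nr < h and 0 <= nc < w:
                        if img[nr][nc] == value and not visited[nr][nc]:
                            visited[nr][nc] = True
                            queue.append((nr, nc))
                        elif img[nr][nc] > value:
                            is_max = False  # a brighter neighbour disqualifies it
            if is_max:
                k += 1
                for r, c in plateau:
                    labels[r][c] = k
    return labels, k


labels, k = regional_maxima([[1, 1, 1, 1, 1], [1, 5, 1, 3, 3], [1, 1, 1, 1, 1]])
print(k)  # 2
```

Bright characters on a dark background tend to show up as regional maxima and dark characters on a light background as regional minima, which is why the abstract combines both transformations.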