• Title/Summary/Keyword: Scene text recognition

Detecting and Segmenting Text from Images for a Mobile Translator System

  • Chalidabhongse, Thanarat H.;Jeeraboon, Poonsak
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집) / 2004.08a / pp.875-878 / 2004
  • Text detection and segmentation have long been studied in the OCR field, but they can also be useful in other areas. In this report, we first propose the design of a mobile translator system that helps non-native speakers understand a foreign language using a ubiquitous mobile network and camera phones. The main focus of the paper is the algorithm for detecting and segmenting text embedded in natural scenes from captured images. The image, captured by a camera phone, is transmitted to a translator server. It first passes through preprocessing steps that smooth the image and suppress noise. A threshold is then applied to binarize the image. Afterward, edge detection and connected component analysis are performed on the filtered image to find edges and segment the components in the image. Finally, predefined layout relation constraints are used to decide which components are likely to be text. In a preliminary experiment, the system yielded a recognition rate of 94.44% on a set of 36 natural scene images containing text.
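
The binarize, segment, and filter stages described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: smoothing and edge detection are omitted, and the layout rule in `text_like` (a component must share a line with a neighbor of similar height) is a hypothetical stand-in for the paper's layout relation constraints, which the abstract does not spell out.

```python
from collections import deque

def binarize(gray, threshold):
    """Return a 0/1 image: 1 where the pixel is darker than the threshold."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def connected_components(binary):
    """Find 4-connected foreground regions.

    Returns bounding boxes as (top, left, bottom, right), inclusive,
    in raster-scan order of each region's first pixel.
    """
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] == 0 or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def text_like(boxes, max_height_ratio=1.5):
    """Hypothetical layout rule: keep a box only if some other box
    overlaps it vertically and has a similar height."""
    kept = []
    for i, (t, l, b, r) in enumerate(boxes):
        hi = b - t + 1
        for j, (t2, _, b2, _) in enumerate(boxes):
            if i == j:
                continue
            hj = b2 - t2 + 1
            overlaps = t <= b2 and t2 <= b
            similar = max(hi, hj) / min(hi, hj) <= max_height_ratio
            if overlaps and similar:
                kept.append((t, l, b, r))
                break
    return kept

W, K = 255, 0  # white background, black ink
gray = [
    [W, W, W, W, W, W, W, W],
    [W, K, W, K, K, W, W, W],
    [W, K, W, K, K, W, W, W],
    [W, K, W, K, K, W, W, W],
    [W, W, W, W, W, W, W, W],
    [W, W, W, W, W, W, W, K],
]
boxes = connected_components(binarize(gray, 128))
print(text_like(boxes))  # the isolated speck in the corner is dropped
```

Here the two tall strokes on the same line survive the filter, while the single-pixel speck has no similar neighbor and is rejected.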

Knowledge-Based Numeric Open Caption Recognition for Live Sportscast

  • Sung, Si-Hun
    • Proceedings of the IEEK Conference / 2003.07e / pp.1871-1874 / 2003
  • Knowledge-based numeric open caption recognition is proposed that recognizes numeric captions generated by a character generator (CG) and automatically superimposes a modified caption using the recognized text, only when a valid numeric caption appears in the targeted region of a live sportscast scene produced by another broadcasting station. In the proposed method, mesh features are extracted from an enhanced binary image as feature vectors, and the valuable information is then recovered from the numeric image by recognizing the characters with a multilayer perceptron (MLP) network. The result is verified using a knowledge-based rule set designed for more stable and reliable output, and the modified information is then displayed on screen by the CG. MLB Eye Caption, which is based on the proposed algorithm, has already been used for regular Major League Baseball (MLB) programs broadcast live over a Korean nationwide TV network and has received a favorable response from Korean viewers.
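
As a rough illustration of the mesh features mentioned above, the sketch below divides a binary glyph into a grid and records the foreground density of each cell. The grid size and the function name `mesh_features` are assumptions; the abstract does not give the paper's actual mesh layout. The resulting vector is the kind of input that would then be passed to an MLP classifier.

```python
def mesh_features(binary, rows, cols):
    """Split a 0/1 glyph image into rows x cols cells and return the
    fraction of foreground pixels in each cell, in row-major order.

    Assumes the image has at least `rows` rows and `cols` columns,
    so that no cell is empty.
    """
    h, w = len(binary), len(binary[0])
    feats = []
    for i in range(rows):
        r0, r1 = i * h // rows, (i + 1) * h // rows
        for j in range(cols):
            c0, c1 = j * w // cols, (j + 1) * w // cols
            area = (r1 - r0) * (c1 - c0)
            ink = sum(binary[r][c] for r in range(r0, r1) for c in range(c0, c1))
            feats.append(ink / area)
    return feats

# A crude 4x4 rendering of the digit "1".
glyph = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
]
print(mesh_features(glyph, 2, 2))  # [0.25, 0.5, 0.25, 0.75]
```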

An Ensemble Classifier Based Method to Select Optimal Image Features for License Plate Recognition (차량 번호판 인식을 위한 앙상블 학습기 기반의 최적 특징 선택 방법)

  • Jo, Jae-Ho;Kang, Dong-Joong
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.1 / pp.142-149 / 2016
  • This paper proposes a method to detect the license plates (LPs) of vehicles in indoor and outdoor parking lots. Many conventional methods exist for detecting LPs in restricted environments. However, LPs are difficult to detect in natural, complex scenes with background clutter, because complicated backgrounds always contain several patterns similar to text or LPs. To verify the performance of LP text detection in natural images, we apply the MB-LGP feature in combination with an ensemble machine learning algorithm in order to select a small number of optimal features from a huge pool. Feature selection is performed by an adaptive boosting algorithm, which shows strong performance in terms of a minimal false positive detection ratio and computing time when combined with a cascade approach. MSER is used to provide the initial text regions of the vehicle LP. In experiments using real images, the proposed method robustly extracts LPs in natural scenes as well as in controlled environments.

Character Recognition Using Projection in Video (비디오에서 프로젝션을 이용한 문자 인식)

  • Baek, Jeong-Uk;Shin, Seong-Yoon;Rhee, Yang-Won
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2009.10a / pp.196-197 / 2009
  • In video, character recognition is performed through projections on key frames generated by scene change detection. Characters are separated from one another by vertical projection. Phonemes are separated into Cho-sung (initial consonant), Jung-sung (medial vowel), and Jong-sung (final consonant) and divided into 6 types. The phoneme pattern is assigned to the appropriate one of the 6 types through horizontal projection. Phonemes are projected in the horizontal, vertical, diagonal, and reverse-diagonal directions, and each phoneme is recognized using this 4-direction projection and location information.
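
The character separation by vertical projection described above can be sketched as follows. This is a minimal example under the assumption that characters are separated by at least one empty column; the function names are illustrative and not taken from the paper.

```python
def vertical_projection(binary):
    """Count foreground pixels in each column of a 0/1 image."""
    return [sum(col) for col in zip(*binary)]

def split_runs(profile):
    """Return (start, end) inclusive index pairs for each run of
    consecutive nonzero entries in a projection profile."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs

# A text line holding three glyphs separated by blank columns.
line = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1],
]
profile = vertical_projection(line)
print(profile)              # [3, 2, 0, 0, 3, 0, 2, 3]
print(split_runs(profile))  # [(0, 1), (4, 4), (6, 7)]
```

The same `split_runs` helper could be applied to a horizontal profile (row sums) to separate the parts of a glyph stacked vertically.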

A Study on the OCR of Korean Sentence Using DeepLearning (딥러닝을 활용한 한글문장 OCR연구)

  • Park, Sun-Woo
    • Annual Conference on Human and Language Technology / 2019.10a / pp.470-474 / 2019
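  • To improve Korean OCR performance, we sought to improve the character recognition stage using deep learning models. In this paper, we generated Korean sentence image data for training deep learning models directly from fonts and dictionary data, and used it to run experiments on various model combinations for improving OCR performance on Korean sentences. The deep learning models use the STR (Scene Text Recognition) structure, with 24 model combinations formed from the transformation, feature extraction, sequence, and prediction modules. The OCR experiments showed that, for Korean sentences, the combination that uses the transformation module together with BiLSTM for the sequence module and attention for the prediction module outperformed the other combinations. Compared with earlier Korean OCR research, this paper extends the scope from the character level to the sentence level, and it is significant in that it uses data of the types frequently found in real document images to increase applicability to real applications.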
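
One way to arrive at 24 combinations from four modules is a 2 x 3 x 2 x 2 grid, sketched below. Only BiLSTM and attention are named in the abstract; the other option names (TPS, VGG, RCNN, ResNet, CTC) are assumptions borrowed from a commonly used four-stage STR framework, and the paper may have used different choices.

```python
from itertools import product

# Hypothetical options per stage. Only "BiLSTM" and "Attn" come from the
# abstract; every other name here is an assumption for illustration.
STAGES = {
    "transformation": ["None", "TPS"],
    "feature_extraction": ["VGG", "RCNN", "ResNet"],
    "sequence": ["None", "BiLSTM"],
    "prediction": ["CTC", "Attn"],
}

def model_combinations(stages):
    """Enumerate every choice of one option per stage as a dict."""
    names = list(stages)
    return [dict(zip(names, choice)) for choice in product(*stages.values())]

combos = model_combinations(STAGES)
print(len(combos))  # 2 * 3 * 2 * 2 = 24

# Combinations matching the abstract's best-performing pattern:
# transformation enabled, BiLSTM sequence model, attention prediction.
best_pattern = [
    c for c in combos
    if c["transformation"] != "None"
    and c["sequence"] == "BiLSTM"
    and c["prediction"] == "Attn"
]
print(len(best_pattern))  # one per feature extractor: 3
```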

Study on Extracting Filming Location Information in Movies Using OCR for Developing Customized Travel Content (맞춤형 여행 콘텐츠 개발을 위한 OCR 기법을 활용한 영화 속 촬영지 정보 추출 방안 제시)

  • Park, Eunbi;Shin, Yubin;Kang, Juyoung
    • The Journal of Bigdata / v.5 no.1 / pp.29-39 / 2020
  • Purpose: The growing respect for individual tastes throughout society has changed consumption trends. As a result, the travel industry is also seeing customized travel emerge as a new trend that reflects consumers' personal tastes. In particular, there is growing interest in 'film-induced tourism', one area of the travel industry. We hope to satisfy the travel motivation that individuals develop while watching movies with customized travel proposals, which we expect to be a catalyst for the continued development of the film-induced tourism industry. Design/methodology/approach: In this study, we implemented an OCR-based methodology for extracting and suggesting the film locations that viewers want to visit. First, we extract a scene from a movie selected by the user using OpenCV, a real-time image processing library. Next, we detect the location of characters in the scene image using the EAST model, a deep learning-based text region detection model. The detected images are preprocessed with built-in OpenCV functions to increase recognition accuracy. Finally, after the characters in the images are converted into machine-readable text using Tesseract, an optical character recognition engine, the Google Map API returns the actual location information. Significance: This research is significant in that it provides personalized tourism content using Fourth Industrial Revolution technology, in addition to existing film tourism. It could be used in the future to develop film-induced tourism packages with travel agencies. It also suggests potential use for inbound as well as outbound tourism.

Label Restoration Using Biquadratic Transformation

  • Le, Huy Phat;Nguyen, Toan Dinh;Lee, Guee-Sang
    • International Journal of Contents / v.6 no.1 / pp.6-11 / 2010
  • Recently, there has been research on using portable digital cameras to recognize objects in natural scene images, including labels or marks on a cylindrical surface. In many cases, the text or logo in a label is distorted by the structure or movement of the object on which the label resides. Since distortion in the label can degrade the performance of object recognition, the label should be rectified or restored from its deformation. In this paper, a new method for label detection and restoration in digital images is presented. In the detection phase, the Hough transform is employed to detect the two vertical boundaries of the label, and a horizontal edge profile is analyzed to detect its upper and lower boundaries. Then, the biquadratic transformation is used to restore the rectangular shape of the label. The proposed algorithm restores 3D objects in 2D space, and it requires neither auxiliary hardware such as a 3D camera to construct 3D models nor a multi-camera setup to capture objects from different views. Experimental results demonstrate the effectiveness of the proposed method.
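
A biquadratic mapping can be illustrated as below, assuming the tensor-product form in which each output coordinate is quadratic in both u and v and is defined by a 3x3 grid of control points. This is one common reading of "biquadratic" and may differ from the paper's formulation; the control points are made-up values imitating a label whose horizontal edges bow downward. To restore a label, each pixel (u, v) of the flat output rectangle would be filled by sampling the distorted image at the mapped position.

```python
def quad_basis(t):
    """Quadratic Lagrange basis functions for nodes t = 0, 0.5, 1."""
    return (2 * (t - 0.5) * (t - 1), 4 * t * (1 - t), 2 * t * (t - 0.5))

def biquadratic_map(control, u, v):
    """Map (u, v) in the unit square to image coordinates (x, y).

    control[i][j] is the image position of the grid node at
    v = i / 2, u = j / 2, so the map interpolates all nine nodes.
    """
    bu, bv = quad_basis(u), quad_basis(v)
    x = sum(bv[i] * bu[j] * control[i][j][0] for i in range(3) for j in range(3))
    y = sum(bv[i] * bu[j] * control[i][j][1] for i in range(3) for j in range(3))
    return x, y

# Hypothetical control grid: left and right edges are straight,
# the middle column is pushed down by 10 pixels.
control = [
    [(0, 0), (50, 10), (100, 0)],
    [(0, 50), (50, 60), (100, 50)],
    [(0, 100), (50, 110), (100, 100)],
]
print(biquadratic_map(control, 0.5, 0.0))   # top-middle node: (50.0, 10.0)
print(biquadratic_map(control, 0.25, 0.0))  # a quarter along the top edge: (25.0, 7.5)
```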

Automatic Text Extraction from News Video using Morphology and Text Shape (형태학과 문자의 모양을 이용한 뉴스 비디오에서의 자동 문자 추출)

  • Jang, In-Young;Ko, Byoung-Chul;Kim, Kil-Cheon;Byun, Hye-Ran
    • Journal of KIISE:Computing Practices and Letters / v.8 no.4 / pp.479-488 / 2002
  • In recent years, the amount of digital video in use has risen dramatically along with the increasing use of the Internet, and consequently an automated method is needed for indexing digital video databases. Textual information appearing in a digital video, both superimposed text and embedded scene text, can be a crucial clue for video indexing. In this paper, a new method is presented to extract both superimposed and embedded scene text from a freeze-frame of news video. The algorithm is summarized in the following three steps. In the first step, a color image is converted into a gray-level image, and contrast stretching is applied to enhance the contrast of the input image. Then, a modified local adaptive thresholding is applied to the contrast-stretched image. The second step is divided into three processes: eliminating text-like components by applying erosion, dilation, and (OpenClose+CloseOpen)/2 morphological operations; maintaining text components using the (OpenClose+CloseOpen)/2 operation with a new Geo-correction method; and subtracting the two result images to further eliminate false-positive components. In the third step, filtering, the characteristics of each component are used, such as the ratio of the number of pixels in each candidate component to the number of its boundary pixels and the ratio of the minor to the major axis of each bounding box. Acceptable results have been obtained using the proposed method on 300 news images, with a recognition rate of 93.6%. The proposed method also performs well on various kinds of images when the size of the structuring element is adjusted.
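
The (OpenClose+CloseOpen)/2 operation named above can be sketched in plain Python as follows, assuming gray-level morphology with a square k x k structuring element and windows clipped at the image border. These choices, and reading "OpenClose" as opening followed by closing, are assumptions; the Geo-correction step is not described in the abstract and is not reproduced.

```python
def _window_op(img, k, op):
    """Apply min or max over a k x k window, clipped at the borders."""
    h, w = len(img), len(img[0])
    r = k // 2
    return [
        [
            op(
                img[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]

def erode(img, k=3):
    return _window_op(img, k, min)

def dilate(img, k=3):
    return _window_op(img, k, max)

def opening(img, k=3):
    """Erosion then dilation: removes bright details smaller than k."""
    return dilate(erode(img, k), k)

def closing(img, k=3):
    """Dilation then erosion: fills dark details smaller than k."""
    return erode(dilate(img, k), k)

def open_close_average(img, k=3):
    """Average of open-then-close and close-then-open results."""
    oc = closing(opening(img, k), k)
    co = opening(closing(img, k), k)
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(oc, co)]

# A single bright pixel is smaller than the 3x3 element, so it is removed.
speck = [[0] * 5 for _ in range(5)]
speck[2][2] = 9
print(open_close_average(speck)[2][2])  # 0.0
```

Because the filter wipes out structures smaller than the structuring element, its output differs from the input exactly where small components such as character strokes sit, which is why the element size matters for different kinds of images.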

Weaving the realities with video in multi-media theatre centering on Schaubuhne's Hamlet and Lenea de Sombra's Amarillo (멀티미디어 공연에서 비디오를 활용한 리얼리티 구축하기 - 샤우뷔네의 <햄릿>과 리니아 드 솜브라의 <아마릴로>를 중심으로 -)

  • Choi, Young-Joo
    • Journal of Korean Theatre Studies Association / no.53 / pp.167-202 / 2014
  • When video forms part of the mise-en-scene during a performance, it reflects contemporary image culture, in which individuals join in as creators through cell phones and computers that remediate earlier video technology. It is also closely related to the contemporary theatre culture in which the video art of the 1960s and 1970s was woven into performance theatre. Against this cultural background, theatre practitioners regarded media-friendly mise-en-scene as an alternative in a cultural landscape where linear representational narrative no longer corresponded to the present culture. Nonetheless, it cannot be ignored that video in performance theatre is remediating its historical functions: to criticize social reality and to enrich aesthetic or emotional reality. I focused on how video in performance theatre can present the object through the image by providing a real-time relay, emphasizing the situation within the frame, and strengthening the reality by alluding to the object as a gesture. I therefore explored its two historical functions. First, as a critical function, video recorded the scene, communicated information, and aroused the audience's recognition of the object. Second, as an aesthetic function, video in performance theatre could redistribute modes of perception through editing methods such as close-up, slow motion, multiple perspectives, montage and collage, and transformation of the image. Bearing in mind these historical functions of video in contemporary performance theatre, I analyzed two shows, Schaubuhne's Hamlet and Lenea de Sombra's Amarillo, which were introduced to Korean audiences during the 2010 Seoul Theatre Olympics. Ostermeier is known to have treated social reality as the text and the play as the context. In doing so, he used video as a vehicle to penetrate social reality through the hero's perspective.
It is also noteworthy that Ostermeier understood Hamlet's dilemma as a tendency of today's young generation, who delay action while immersed in image culture. In addition, his use of video in the piece revitalized the aesthetic function of video through a hypermedial mode of perception. Amarillo combined documentary theatre methods with installation, physical theatre, and on-the-spot video relay, and activated the aesthetic function through intermediality, the interactive relationship between media. In this performance, video recorded and pursued the absent presence of real people who died or were lost in the desert. At the same time, it rendered as fantasy the emotional state of those people at the moment of their death, which would otherwise be opaque or inconspicuous. In conclusion, I found that video in contemporary performance theatre visualizes the rupture between media and performs their intermediality. It attempts to disturb transparent immediacy in order to draw the spectator's perception to the theatrical situation, to open up its emotional and spiritual aspects, and to recall the realities, as in Schaubuhne's Hamlet and Lenea de Sombra's Amarillo.

A Study on the Meaning and Coherence of Sosangpalkyung as a Text of Traditional Scenery (소상팔경(瀟湘八景), 전통경관 텍스트로서의 의미와 결속구조)

  • Rho, Jae-Hyun
    • Journal of the Korean Institute of Landscape Architecture / v.37 no.1 / pp.110-119 / 2009
  • Sosang Pal-Kyung(瀟湘八景), which originated in China and means the eight scenes of the So River and the Sang River, greatly influenced poetry and painting in East Asia for a long time and became a cultural phenomenon that shaped the stereotype of traditional landscapes in Korea and Japan. Studies on 'Kyung' (a scene), such as 'Pal-Kyung(八景)', have been carried out continuously, but no study has focused intensively on the meaning and form of Sosang Pal-Kyung, the origin of the domestic Pal-Kyung culture. The goal of this study is to investigate the typical form observed in Sosang Pal-Kyung-Ga(瀟湘八景歌) and Sosang Pal-Kyung-Do(瀟湘八景圖) as texts of a cultural landscape, and to clarify the coherence structure between the recognition system and the way of thinking that existed in the cultural phenomenon of Sosang Pal-Kyung. In this study, the symbolism of Pal(八) is summarized, the surface structure and the correlation of each Kyung of Sosang Pal-Kyung are explained from a semiotic perspective by segmenting the lexemes of the landscape, and the coherence structure and meaning of Sosang Pal-Kyung-Ga and Sosang Pal-Kyung-Do as texts are investigated. Sosang Pal-Kyung is based on the view of the Sun and the Moon (or Positive and Negative) and the Eight Trigrams(八卦) for divination, and is a linguistic symbol in which human life and the principle of circulation and conversion of nature are expressed as characters and picture texts. Its structure has strong coherence and cohesion, which attempt to bring the abstruse truth of nature into human consciousness by matching the grammatical structure and form of the sentences, and the implicative language emphasizing the symbolism of the words, to characteristics of similarity and contrast.
In addition, Sosang Pal-Kyung dialectically expresses human life, the processes of birth and death in nature, and their mutual response by placing various elements of the landscape within a frame of regular formality and structure. The image signs in Sosang Pal-Kyung, which emphasize the theory of circulation of human life and nature, are considered narrative scenery that one views contemplatively within the cyclical system of time and the seasons. The cultural phenomenon of Sosang Pal-Kyung in the Joseon Dynasty, handed down from the Goryeo Dynasty, became a driving force of the leading aesthetics of Joseon's art and literature by adding scenery seen from the viewpoint of Sung Confucianism. Its coherence structure changed, but its cohesion was handed down continuously, so that it became not only the basic text of the traditional cultural landscape but also the typical Korean-style stereotype of a landscape.