• Title/Summary/Keyword: text density

Text Extraction in HIS Color Space by Weighting Scheme

  • Le, Thi Khue Van;Lee, Gueesang
    • Smart Media Journal
    • /
    • v.2 no.1
    • /
    • pp.31-36
    • /
    • 2013
  • Robust and efficient text extraction is very important for the accuracy of Optical Character Recognition (OCR) systems. Natural scene images with degradations such as uneven illumination, perspective distortion, complex backgrounds, and multi-color text pose many challenges for computer vision tasks, especially text extraction. In this paper, we propose a method for extracting text from signboard images that combines the mean shift algorithm with a weighting scheme for hue and saturation in the HSI color space for the clustering algorithm. The number of clusters is determined automatically by mean shift-based density estimation, in which local clusters are estimated by repeatedly searching for higher-density points in the feature vector space. The weighting scheme for hue and saturation is used to formulate a new distance measure in cylindrical coordinates for text extraction. Experimental results on various natural scene images are presented to demonstrate the effectiveness of our approach.
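
The mode-seeking step described in this abstract can be illustrated with a minimal sketch. It assumes a flat (uniform) kernel and 1-D feature values for brevity; the function name, bandwidth, and merge rule are hypothetical, and the paper's weighted hue/saturation distance is not reproduced here.

```python
def mean_shift_modes(points, bandwidth, tol=1e-6, max_iter=100):
    """Flat-kernel mean shift on 1-D values (illustrative assumption).

    Each point climbs toward a local density peak by repeatedly moving
    to the mean of its neighbours; distinct peaks are the clusters, so
    the number of clusters is not fixed in advance.
    """
    modes = []
    for p in points:
        x = float(p)
        for _ in range(max_iter):
            # Neighbours inside the current window around x.
            window = [q for q in points if abs(q - x) <= bandwidth]
            new_x = sum(window) / len(window)
            if abs(new_x - x) < tol:
                break
            x = new_x
        # Merge with an already-found mode if it is close enough
        # (the bandwidth / 2 merge radius is an assumption).
        if not any(abs(m - x) < bandwidth / 2 for m in modes):
            modes.append(x)
    return sorted(modes)


values = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
modes = mean_shift_modes(values, bandwidth=1.0)
print(len(modes), modes)
```

With these sample values the procedure settles on two modes, one per group, without being told the cluster count.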

Research of Adaptive Transformation Method Based on Webpage Semantic Features for Small-Screen Terminals

  • Li, Hao;Liu, Qingtang;Hu, Min;Zhu, Xiaoliang
    • ETRI Journal
    • /
    • v.35 no.5
    • /
    • pp.900-910
    • /
    • 2013
  • Small-screen mobile terminals have difficulty accessing existing Web resources designed for large-screen devices. This paper presents an adaptive transformation method based on webpage semantic features to solve this problem. According to their text density and link density features, webpages are divided into two types: index and content. Our method uses an index-based webpage transformation algorithm and a content-based webpage transformation algorithm. Experimental results demonstrate that our adaptive transformation method does not depend on specific software or webpage templates and that it is capable of enhancing Web content adaptation on small-screen terminals.
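
As a rough illustration of the index/content split, the sketch below computes the two features with Python's standard `html.parser`. The definitions used (text density as characters per tag, link density as the share of text inside `<a>` elements) and both thresholds are assumptions made for this example; the abstract does not state the paper's exact formulas.

```python
from html.parser import HTMLParser


class DensityParser(HTMLParser):
    """Counts tags, text characters, and characters inside <a> elements."""

    def __init__(self):
        super().__init__()
        self.tag_count = 0
        self.text_chars = 0
        self.link_chars = 0
        self._anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        self.tag_count += 1
        if tag == "a":
            self._anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_depth > 0:
            self._anchor_depth -= 1

    def handle_data(self, data):
        n = len(data.strip())
        self.text_chars += n
        if self._anchor_depth > 0:
            self.link_chars += n


def classify_page(html, max_link_density=0.5, min_text_density=10.0):
    """Return (kind, text_density, link_density); thresholds are hypothetical."""
    parser = DensityParser()
    parser.feed(html)
    parser.close()
    text_density = parser.text_chars / max(parser.tag_count, 1)
    link_density = parser.link_chars / max(parser.text_chars, 1)
    is_index = link_density > max_link_density or text_density < min_text_density
    return ("index" if is_index else "content"), text_density, link_density


index_html = "<ul><li><a href='/a'>News</a></li><li><a href='/b'>Sports</a></li></ul>"
content_html = "<div><p>" + "word " * 50 + "</p><a href='/x'>more</a></div>"
print(classify_page(index_html)[0], classify_page(content_html)[0])
```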

The effects of pixel density, sub-pixel structure, luminance, and illumination on legibility of smartphone (화소 밀집도, 화소 하부구조, 휘도, 조명 조도가 스마트폰 가독성에 미치는 영향)

  • Park, JongJin;Li, Hyung-Chul O.;Kim, ShinWoo
    • Science of Emotion and Sensibility
    • /
    • v.17 no.3
    • /
    • pp.3-14
    • /
    • 2014
  • Since the domestic introduction of the iPhone in 2009, the use of smartphones has increased rapidly, and many tasks previously performed by various devices are now performed on smartphones. As a result, reading small text on small smartphone screens has become highly important. This research tested how smartphone display factors (pixel density, sub-pixel structure, luminance) and an environmental factor (illumination) affect legibility-related discomfort in text reading. The results indicated that legibility-related discomfort is largely affected by pixel density: people experience inconvenience when the pixel density falls below 300 PPI. Illumination has a limited effect on legibility-related discomfort. Participants reported more legibility-related discomfort when stimuli were presented under various levels of illumination rather than a single illumination level. Sub-pixel structure and luminance did not affect legibility-related discomfort. Based on the results, we suggest lower-limit resolutions for smart devices (smartphones, tablet computers) of different sizes for text legibility.
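
To make the 300 PPI threshold concrete, the sketch below uses the standard definition of pixel density (diagonal resolution in pixels divided by diagonal size in inches). The screen sizes, the 16:9 aspect ratio, and the function names are illustrative assumptions, not values taken from the study, and viewing distance is ignored.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density: diagonal pixel count divided by diagonal length."""
    return math.hypot(width_px, height_px) / diagonal_in


def min_resolution(diagonal_in, aspect=(16, 9), ppi=300):
    """Smallest whole-pixel resolution reaching `ppi` for a given screen.

    The 16:9 default aspect ratio is an assumption for illustration.
    """
    unit = ppi * diagonal_in / math.hypot(*aspect)
    return math.ceil(aspect[0] * unit), math.ceil(aspect[1] * unit)


# A hypothetical 1280x720 panel at two diagonal sizes.
print(round(pixels_per_inch(1280, 720, 4.7), 1))  # above the 300 PPI threshold
print(round(pixels_per_inch(1280, 720, 5.5), 1))  # below the 300 PPI threshold
print(min_resolution(5.0))
```

The same pixel count falls below the threshold once it is stretched over a larger diagonal, which is why a lower-limit resolution has to be stated per device size.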

Region Analysis of Business Card Images Acquired in PDA Using DCT and Information Pixel Density (DCT와 정보 화소 밀도를 이용한 PDA로 획득한 명함 영상에서의 영역 해석)

  • 김종흔;장익훈;김남철
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.8C
    • /
    • pp.1159-1174
    • /
    • 2004
  • In this paper, we present an efficient algorithm for region analysis of business card images acquired with a PDA, using the DCT and information pixel density. The proposed method consists of three parts: region segmentation, information region classification, and text region classification. In region segmentation, an input business card image is partitioned into 8×8 blocks, and the blocks are classified into information and background blocks using the normalized DCT energy in their low-frequency bands. The input image is then segmented into information and background regions by region labeling on the classified blocks. In information region classification, each information region is classified as a picture region or a text region using the ratio of the DCT energy of the horizontal and vertical edge components to that in the low-frequency band, together with the density of information pixels, which are the black pixels in its binarized region. In text region classification, each text region is classified as a large-character region or a small-character region using the density of information pixels and the average horizontal and vertical run-lengths of information pixels. Experimental results show that the proposed method performs well in region segmentation, information region classification, and text region classification for test images of several types of business cards acquired by a PDA under various surrounding conditions. In addition, the error rates of the proposed region segmentation are about 2.2-10.1% lower than those of conventional region segmentation methods. The error rate of the proposed information region classification is also shown to be about 1.7% lower than that of the conventional information region classification method.
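
Two of the features named in this abstract, information pixel density and average run-length, are simple enough to sketch directly. The version below assumes a binarized region stored as rows of 0/1 values with 1 marking an information (black) pixel; the function names are hypothetical, and the DCT energy features and the paper's decision thresholds are not covered.

```python
def pixel_density(region):
    """Fraction of information (value 1) pixels in a binarized region."""
    total = sum(len(row) for row in region)
    return sum(sum(row) for row in region) / total


def mean_run_length(lines):
    """Average length of consecutive runs of 1s along each line."""
    runs = []
    for line in lines:
        length = 0
        for px in line:
            if px:
                length += 1
            elif length:
                runs.append(length)
                length = 0
        if length:  # run that reaches the end of the line
            runs.append(length)
    return sum(runs) / len(runs) if runs else 0.0


def describe_region(region):
    """Density plus horizontal and vertical average run-lengths."""
    columns = [list(col) for col in zip(*region)]
    return {
        "density": pixel_density(region),
        "h_run": mean_run_length(region),
        "v_run": mean_run_length(columns),
    }


region = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
]
print(describe_region(region))
```

Longer average runs at a similar density would point toward thicker strokes, which is the intuition behind separating large-character from small-character regions.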

Main Content Extraction from Web Pages Based on Node Characteristics

  • Liu, Qingtang;Shao, Mingbo;Wu, Linjing;Zhao, Gang;Fan, Guilin;Li, Jun
    • Journal of Computing Science and Engineering
    • /
    • v.11 no.2
    • /
    • pp.39-48
    • /
    • 2017
  • Main content extraction from web pages is widely used in search engines, web content aggregation, and mobile Internet browsing. However, web pages contain a large amount of irrelevant information, such as advertisements, irrelevant navigation, and trash information. Such irrelevant information reduces the efficiency of web content processing in content-based applications. The purpose of this paper is to propose an automatic method for extracting the main content of web pages. In this method, we use two indicators to describe the characteristics of web pages: text density and hyperlink density. Based on the continuous distribution of similar content on a page, we use an estimation algorithm to judge whether a node is a content node or a noisy node from the characteristics of the node and its neighboring nodes. This algorithm enables us to filter out advertisement nodes and irrelevant navigation. Experimental results on 10 news websites showed that our algorithm could achieve a 96.34% average acceptable rate.
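
The idea of judging a node from its own characteristics and those of its neighbors can be sketched as follows. This is not the paper's estimation algorithm: the scoring rule (text density discounted by hyperlink density), the neighbor weights, and the threshold are all assumptions chosen for illustration, and each node is given as a hypothetical `(text_chars, link_chars, tag_count)` tuple.

```python
def node_score(text_chars, link_chars, tag_count):
    """Text density discounted by hyperlink density (assumed rule)."""
    if text_chars == 0:
        return 0.0
    text_density = text_chars / max(tag_count, 1)
    link_density = link_chars / text_chars
    return text_density * (1.0 - link_density)


def label_nodes(nodes, threshold=40.0, self_weight=0.6):
    """Label sibling nodes as 'content' or 'noise'.

    Each node's score is blended with its two neighbours so that a short
    paragraph surrounded by long ones is still kept. Missing neighbours
    at either end count as zero. Weights and threshold are hypothetical.
    """
    raw = [node_score(*n) for n in nodes]
    side = (1.0 - self_weight) / 2
    labels = []
    for i, own in enumerate(raw):
        prev_score = raw[i - 1] if i > 0 else 0.0
        next_score = raw[i + 1] if i + 1 < len(raw) else 0.0
        smoothed = self_weight * own + side * (prev_score + next_score)
        labels.append("content" if smoothed >= threshold else "noise")
    return labels


# (text_chars, link_chars, tag_count) for six sibling nodes
nodes = [(40, 40, 4), (30, 30, 3), (300, 10, 2),
         (20, 0, 1), (280, 0, 2), (25, 25, 5)]
print(label_nodes(nodes))
```

In this sample the fourth node is short enough to fall under the threshold on its own, but its neighbors pull it into the content group, while the link-only nodes at both ends stay labeled as noise.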

Factor Analysis and Content Development of Digital Text Structure for Designing Visual Experience in e-Book Interface (e-Book 인터페이스에서 시각적 경험 설계를 위한 디지털 텍스트 구조의 물리적 요인분석 및 콘텐츠 개발)

  • Sung, Eun-Mo
    • The Journal of the Korea Contents Association
    • /
    • v.11 no.11
    • /
    • pp.79-90
    • /
    • 2011
  • The purpose of this study is to explore the physical factors of digital text structure for designing an e-Book interface and to develop a prototype e-Book interface by applying these factors. To address this goal, exploratory factor analysis and confirmatory factor analysis (CFA) were employed, and 237 university students participated in the study. As a result, 29 items describing physical features of digital text structure were developed, and 9 factors of digital text structure were extracted: volume, depth, density, space, layout, format, signal, size, and length. In addition, confirmatory factor analysis was conducted to verify the structure of these 9 factors. The CFA results showed that the factor structure was supported by all of the model fit indices.

PDA-based Text Extraction System using Client/Server Architecture (Client/Server구조를 이용한 PDA기반의 문자 추출 시스템)

  • Park Anjin;Jung Keechul
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.2
    • /
    • pp.85-98
    • /
    • 2005
  • Recently, much research on mobile vision using Personal Digital Assistants (PDAs) has been attempted. Many PDA CPUs are integer CPUs with no floating-point unit. This results in slow execution of vision and image processing algorithms, which involve a large amount of floating-point computation. In this paper, to resolve this weakness, we propose a Client (PDA)/Server (PC) architecture connected by a wireless LAN, and we construct a system that pipelines the processing of an image sequence across the two CPUs of the Client (PDA) and the Server (PC). The Client (PDA) extracts tentative text regions using Edge Density (ED). The Server (PC) uses both a Multi-Layer Perceptron (MLP)-based texture classifier and Connected Component (CC)-based filtering for definite text extraction based on the Client (PDA)'s tentatively extracted results. The proposed method achieves not only efficient text extraction by using both the MLP and the CC, but also fast running time by using the Client (PDA)/Server (PC) architecture with pipelined processing.
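
A client-side edge density pass of the kind described here can be sketched with integer arithmetic only, in keeping with the integer-CPU constraint. The edge operator (absolute horizontal difference), block size, and both thresholds below are assumptions for illustration; the abstract does not specify how ED is computed in the paper.

```python
def edge_density_blocks(gray, block=4, edge_threshold=40, density_percent=25):
    """Return (block_row, block_col) of blocks likely to contain text.

    `gray` is a list of rows of integer intensities (0-255). A pixel pair
    counts as an edge when the horizontal difference reaches
    `edge_threshold`; a block is a candidate when at least
    `density_percent` percent of its pixel pairs are edges. Only integer
    operations are used. All parameter values are hypothetical.
    """
    height, width = len(gray), len(gray[0])
    pairs_per_block = block * (block - 1)
    candidates = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            edges = 0
            for y in range(top, top + block):
                for x in range(left, left + block - 1):
                    if abs(gray[y][x + 1] - gray[y][x]) >= edge_threshold:
                        edges += 1
            # Compare as integers instead of dividing to get a ratio.
            if edges * 100 >= density_percent * pairs_per_block:
                candidates.append((top // block, left // block))
    return candidates


# Left 4x4 block is flat background; right 4x4 block has strong strokes.
image = [[200] * 4 + [0, 255, 0, 255] for _ in range(4)]
print(edge_density_blocks(image))
```

Only the block with strong intensity transitions is reported, so the client would forward just that region to the server for the heavier classification step.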

A new approach for overlay text detection from complex video scene (새로운 비디오 자막 영역 검출 기법)

  • Kim, Won-Jun;Kim, Chang-Ick
    • Journal of Broadcast Engineering
    • /
    • v.13 no.4
    • /
    • pp.544-553
    • /
    • 2008
  • With the development of video editing technology, overlay text is increasingly inserted into video content to provide viewers with better visual understanding. Since the content of the scene or the editor's intention can be well represented by inserted text, it is useful for video information retrieval and indexing. Most previous approaches are based on low-level features, such as edge, color, and texture information. However, existing methods have difficulty handling text with various contrasts or text inserted in a complex background. In this paper, we propose a novel framework to localize the overlay text in a video scene. Based on our observation that there exist transient colors between inserted text and its adjacent background, a transition map is generated. Candidate regions are then extracted using the transition map, and overlay text is finally determined based on the density of state in each candidate. The proposed method is robust to the color, size, position, style, and contrast of overlay text. It is also language-independent. Text region updates between frames are also exploited to reduce the processing time. Experiments are performed on diverse videos to confirm the efficiency of the proposed method.