• Title/Abstract/Keyword: Scene Recognition

Search results: 193

지역적, 전역적 특징을 이용한 환경 인식 (Scene Recognition Using Local and Global Features)

  • 강산들;황중원;정희철;한동윤;심성대;김준모
    • 한국군사과학기술학회지 / Vol.15 No.3 / pp.298-305 / 2012
  • In this paper, we propose an integrated algorithm for scene recognition, a long-standing challenge in computer vision, with application to mobile robot localization. The proposed method uses SIFT descriptors and visual words as local-level features and GIST as a global-level feature. Because local-level and global-level features complement each other, combining them improves scene recognition performance. The resulting algorithm has low computational complexity and is robust to image distortions.
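The complementary local/global fusion described above can be illustrated with a minimal sketch (not the authors' implementation): assume each scene is stored as a precomputed visual-word histogram (`bovw`) and a GIST-like global vector (`gist`), both hypothetical placeholders for real SIFT/GIST extraction, and combine their similarities with a weighted sum.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def combined_score(query, scene, w_local=0.5, w_global=0.5):
    # Complementary fusion: weighted sum of the local-level
    # (visual-word histogram) and global-level (GIST-like) similarities.
    return (w_local * cosine(query["bovw"], scene["bovw"])
            + w_global * cosine(query["gist"], scene["gist"]))

def recognize(query, database):
    # Return the database scene with the highest fused similarity.
    return max(database, key=lambda s: combined_score(query, s))
```

A query whose local histogram is ambiguous can still be resolved by the global vector, and vice versa, which is the complementarity the abstract relies on.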

Representative Batch Normalization for Scene Text Recognition

  • Sun, Yajie;Cao, Xiaoling;Sun, Yingying
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol.16 No.7 / pp.2390-2406 / 2022
  • Scene text recognition has important application value and has attracted the interest of many researchers. Many existing methods achieve good results, but most attempt to improve performance at the image level and work well only on regular scene text. Many obstacles remain in recognizing text in low-quality images that are curved, occluded, or blurred, which makes feature extraction harder because image quality is uneven. In addition, model test results depend heavily on the training data, so there is still room for improvement. In this work, we present a natural scene text recognizer that improves recognition performance at the feature level, combining feature representation and feature enhancement. For feature representation, we propose an efficient feature extractor that combines Representative Batch Normalization with ResNet; it reduces the model's dependence on training data and improves the feature representation of different instances. For feature enhancement, we use a feature enhancement network to expand the receptive field of the feature maps so that they contain rich feature information. The enhanced representation capability helps to improve the recognition performance of the model. Experiments on 7 benchmarks show that the method is highly competitive on both regular and irregular text, achieving top-1 recognition accuracy on four benchmarks: IC03, IC13, IC15, and SVTP.
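Representative Batch Normalization augments standard batch normalization with per-instance calibration statistics. The sketch below is a heavily simplified, hedged illustration of that idea in plain Python, not the paper's implementation: each instance is partially re-centered by its own mean before the shared batch statistics are applied, and `w_inst` is a made-up calibration weight.

```python
def batch_norm(batch, eps=1e-5):
    # Standard batch normalization over a batch of 1-D feature vectors.
    n, d = len(batch), len(batch[0])
    mean = [sum(x[j] for x in batch) / n for j in range(d)]
    var = [sum((x[j] - mean[j]) ** 2 for x in batch) / n for j in range(d)]
    return [[(x[j] - mean[j]) / (var[j] + eps) ** 0.5 for j in range(d)]
            for x in batch]

def representative_bn(batch, w_inst=0.1, eps=1e-5):
    # Simplified illustration of the RBN idea: subtract a fraction of
    # each instance's own mean (centering calibration) before applying
    # batch statistics, so individual samples are less dominated by the
    # statistics shared across the whole batch.
    centered = []
    for x in batch:
        m = sum(x) / len(x)
        centered.append([v - w_inst * m for v in x])
    return batch_norm(centered, eps)
```

In the actual paper the calibration weights are learned per channel; here `w_inst` is fixed purely for illustration.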

동적 환경에 강인한 장면 인식 기반의 로봇 자율 주행 (Scene Recognition based Autonomous Robot Navigation robust to Dynamic Environments)

  • 김정호;권인소
    • 로봇학회논문지 / Vol.3 No.3 / pp.245-254 / 2008
  • Recently, many vision-based navigation methods have been introduced as intelligent robot applications. However, most of these methods focus on finding the database image that corresponds to a query image. Thus, if the environment changes, for example when objects move, the robot is unlikely to find consistent corresponding points with any of the database images. To solve this problem, we propose a novel navigation strategy that combines fast motion estimation with a practical scene recognition scheme that handles the kidnapping problem, defined as re-localizing a mobile robot after it has undergone an unknown motion or a visual occlusion. The algorithm is based on camera motion estimation for planning the robot's next movement and an efficient outlier rejection algorithm for scene recognition. Experimental results demonstrate the capability of vision-based autonomous navigation in dynamic environments.

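The outlier rejection step mentioned in the abstract can be illustrated with a generic hypothesize-and-verify (RANSAC-style) sketch; this is an assumption-laden stand-in, not the paper's algorithm. Correspondences are pairs of 2-D points, and a pure-translation motion model is fitted, which is the simplest case:

```python
import random

def ransac_translation(matches, threshold=2.0, iters=100, seed=0):
    # Hypothesize-and-verify outlier rejection: repeatedly pick one
    # correspondence, hypothesize a pure translation from it, and keep
    # the hypothesis that the most other correspondences agree with.
    rng = random.Random(seed)
    best_shift, best_inliers = None, []
    for _ in range(iters):
        (xa, ya), (xb, yb) = rng.choice(matches)
        dx, dy = xb - xa, yb - ya
        inliers = [m for m in matches
                   if abs((m[1][0] - m[0][0]) - dx) <= threshold
                   and abs((m[1][1] - m[0][1]) - dy) <= threshold]
        if len(inliers) > len(best_inliers):
            best_shift, best_inliers = (dx, dy), inliers
    return best_shift, best_inliers
```

A correspondence created by a moving object disagrees with the dominant camera motion and is rejected as an outlier, which is exactly why such a step makes recognition robust to dynamic scenes.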

모바일/임베디드 객체 및 장면 인식 기술 동향 (Recent Trends of Object and Scene Recognition Technologies for Mobile/Embedded Devices)

  • 이수웅;이근동;고종국;이승재;유원영
    • 전자통신동향분석 / Vol.34 No.6 / pp.133-144 / 2019
  • Although deep learning-based visual image recognition technology has evolved rapidly, most commonly used methods focus solely on recognition accuracy. However, the demand for low-latency, low-power image recognition with acceptable accuracy is rising for practical applications on edge devices. For example, most Internet of Things (IoT) devices have low computing power, requiring more pragmatic use of these technologies; likewise, drones and smartphones have limited battery capacity, which practical applications must take into consideration. Furthermore, some users prefer that central servers not process their private images, as high-performance server-based recognition technologies require. To address these demands, object and scene recognition technologies that enable optimized neural networks to operate in mobile and embedded environments are gaining attention. In this report, we briefly summarize the recent trends and issues of object and scene recognition technologies for mobile and embedded devices.

Arabic Words Extraction and Character Recognition from Picturesque Image Macros with Enhanced VGG-16 based Model Functionality Using Neural Networks

  • Ayed Ahmad Hamdan Al-Radaideh;Mohd Shafry bin Mohd Rahim;Wad Ghaban;Majdi Bsoul;Shahid Kamal;Naveed Abbas
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol.17 No.7 / pp.1807-1822 / 2023
  • Innovation and the rapidly increasing functionality of user-friendly smartphones have encouraged shutterbugs to capture picturesque image macros at work or during travel. Formal signboards are placed with marketing objectives and are enriched with text to attract people. Extracting and recognizing text from natural images is an emerging research issue that needs consideration. Compared to conventional optical character recognition (OCR), the complex backgrounds, implicit noise, lighting, and orientation of these scenic text photos make the problem more difficult, and Arabic scene text adds further complications. The method described in this paper uses a two-phase approach to extract Arabic text, with word-boundary awareness, from scene images with varying text orientations. The first phase uses a convolutional autoencoder, and the second uses Arabic Character Segmentation (ACS) followed by traditional two-layer neural networks for recognition. This study also describes how Arabic training and synthetic datasets can be created to exemplify the text superimposed on different scene images. For this purpose, a dataset of 10k cropped images in which Arabic text was found was created for the detection phase, and a 127k Arabic character dataset for the recognition phase. The phase-1 labels were generated from an Arabic corpus of 15k quotes and sentences. The Arabic Word Awareness Region Detection (AWARD) approach is used to detect complex Arabic scene text with high flexibility, such as text that is arbitrarily oriented, curved, or deformed. Experiments show that the system achieves 91.8% word segmentation accuracy and 94.2% character recognition accuracy. We believe that future researchers can build on this work by enhancing the functionality of the VGG-16 based model with neural networks to process scene images in any language and to improve noise handling in text images.
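One classic way to illustrate character segmentation of the kind ACS performs is a vertical projection profile: columns containing no ink separate character candidates. This is a generic sketch under that assumption, not the paper's ACS method (which must also handle connected Arabic script):

```python
def segment_by_projection(image, ink=1):
    # Split a binarized word image (list of rows, 1 = ink) into
    # character candidates by finding empty columns in the vertical
    # projection profile - a classic segmentation heuristic.
    h, w = len(image), len(image[0])
    profile = [sum(row[c] == ink for row in image) for c in range(w)]
    segments, start = [], None
    for c, p in enumerate(profile + [0]):  # sentinel flushes last run
        if p > 0 and start is None:
            start = c
        elif p == 0 and start is not None:
            segments.append((start, c))    # [start, c) column range
            start = None
    return segments
```

Each returned `(start, end)` column range can then be cropped and passed to the recognition network.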

Multimodal Attention-Based Fusion Model for Context-Aware Emotion Recognition

  • Vo, Minh-Cong;Lee, Guee-Sang
    • International Journal of Contents / Vol.18 No.3 / pp.11-20 / 2022
  • Human emotion recognition is an exciting topic that has attracted many researchers for a long time. In recent years, there has been increasing interest in exploiting contextual information for emotion recognition. Previous explorations in psychology show that emotional perception is impacted by facial expressions as well as contextual information from the scene, such as human activities, interactions, and body poses. Those explorations initiated a trend in computer vision of treating contexts as modalities for inferring emotion alongside facial expressions. However, contextual information has not been fully exploited: the scene emotion created by the surrounding environment can shape how people perceive emotion. Moreover, additive fusion in multimodal training is not practical, because the modalities do not contribute equally to the final prediction. The purpose of this paper is to contribute to this growing area of research by exploring the effectiveness of the emotional scene gist of the input image, which includes the emotions, emotional feelings, and actions or events that directly trigger emotional reactions, for inferring the emotional state of the primary target. We also present an attention-based fusion network that combines multimodal features according to their impact on the target emotional state. We demonstrate the effectiveness of the method through a significant improvement on the EMOTIC dataset.
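The attention-based fusion idea can be sketched generically: score each modality, softmax the scores into weights, and take a weighted sum of the modality features instead of a plain additive fusion. This is an illustrative simplification, not the paper's network; in practice the scores would themselves be produced by a learned sub-network.

```python
import math

def attention_fuse(features, scores):
    # Weight each modality's feature vector by a softmax over its
    # relevance score, then sum - so modalities that matter more for
    # the target emotion contribute more than in plain additive fusion.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    d = len(features[0])
    return [sum(w * f[j] for w, f in zip(weights, features))
            for j in range(d)]
```

With equal scores this reduces to averaging (additive fusion); with skewed scores one modality dominates, which is the behavior the abstract argues for.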

후보 단어 리스트와 확률 점수에 기반한 한국어 문자 인식 모델 (Candidate Word List and Probability Score Guided for Korean Scene Text Recognition)

  • 이윤지;이종민
    • 한국정보통신학회:학술대회논문집 / 한국정보통신학회 2022년도 춘계학술대회 / pp.73-75 / 2022
  • Scene text recognition is a technology used in AI fields that require automation, such as unmanned robots and autonomous vehicles; it refers to accurately recognizing text even when various obstacles are present in the surrounding environment. Unlike previous work that recognized only English, this paper shows a strong recognition rate even when diverse characters are mixed together, including English, Korean, special characters, and digits. Rather than selecting only the class with the highest probability value, we generate a candidate word list that also considers the next-ranked probabilities, and we propose a method that can thereby correct previously misrecognized words.

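The candidate-word idea in the abstract, keeping the second-ranked character probabilities rather than only the argmax, can be sketched as follows; the lexicon lookup and product scoring are illustrative assumptions, not the authors' exact procedure:

```python
from itertools import product

def candidate_words(char_probs, lexicon, top_k=2):
    # char_probs: per position, a list of (char, prob) pairs sorted by
    # probability. Instead of keeping only the argmax character at each
    # position, combine the top-k characters into candidate words,
    # score each by the product of its character probabilities, and
    # return the best-scoring candidate found in the lexicon.
    choices = [cp[:top_k] for cp in char_probs]
    best = None
    for combo in product(*choices):
        word = "".join(c for c, _ in combo)
        score = 1.0
        for _, p in combo:
            score *= p
        if word in lexicon and (best is None or score > best[1]):
            best = (word, score)
    return best
```

When the greedy argmax string is a misrecognition absent from the lexicon, a lower-ranked combination can still recover the correct word.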

비음수 텐서 분해를 이용한 차량 인식 (Vehicle Recognition using Non-negative Tensor Factorization)

  • 반재민;강현철
    • 전자공학회논문지 / Vol.52 No.5 / pp.136-146 / 2015
  • Active control based on vehicle recognition is a core technology for realizing intelligent vehicles. To recognize vehicles in urban areas, where occlusion occurs frequently, a part-based vehicle representation is required that can recognize a vehicle from only a partial view. In this paper, we represent vehicles using non-negative tensor factorization (NTF), which uses local features as basis vectors, and verify the vehicle recognition rate using the NTF decomposition coefficients as features. Experimental results show that, compared with conventional non-negative matrix factorization, the proposed method enables a more intuitive part-based representation and recognizes vehicles more robustly in urban images.
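NTF generalizes non-negative matrix factorization (NMF) from matrices to tensors; since the abstract contrasts the two, a minimal NMF with Lee-Seung multiplicative updates is sketched below as the simpler matrix analogue. This is illustrative only, in pure Python on tiny matrices, not the paper's tensor decomposition:

```python
import random

def matmul(A, B):
    # Plain triple-loop matrix product for small matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, r, iters=200, eps=1e-9, seed=0):
    # Non-negative factorization V ~ W.H via Lee-Seung multiplicative
    # updates; all factors stay non-negative, which is what yields the
    # part-based (local) representation the abstract refers to.
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    for _ in range(iters):
        WH, Wt = matmul(W, H), transpose(W)
        numH, denH = matmul(Wt, V), matmul(Wt, WH)
        H = [[H[a][b] * numH[a][b] / (denH[a][b] + eps)
              for b in range(m)] for a in range(r)]
        WH, Ht = matmul(W, H), transpose(H)
        numW, denW = matmul(V, Ht), matmul(WH, Ht)
        W = [[W[i][a] * numW[i][a] / (denW[i][a] + eps)
              for a in range(r)] for i in range(n)]
    return W, H
```

In the NTF setting the same multiplicative-update idea is applied to a 3-way tensor of image data, and the decomposition coefficients serve as recognition features.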

OCR 엔진 기반 분류기 애드온 결합을 통한 이미지 내부 텍스트 인식 성능 향상 (Scene Text Recognition Performance Improvement through an Add-on of an OCR based Classifier)

  • 채호열;석호식
    • 전기전자학회논문지 / Vol.24 No.4 / pp.1086-1092 / 2020
  • To implement an autonomous agent operating in everyday environments, the ability to recognize text in images or on objects is essential. Various deep learning models are used in the process of applying input transformation, feature recognition, and word prediction to a given image and outputting the words in the recognized text. Thanks to the remarkable object-recognition capability of deep neural networks, recognition performance has improved greatly, but it is still insufficient for real-world deployment. In this paper, to improve recognition performance, we propose an approach that adds an add-on consisting of an OCR engine and a classifier to the pipeline of text-region detection, text recognition, and word prediction, attempting to recognize text that the existing pipeline failed to recognize. Applying the proposed method to the IC13 and IC15 datasets, we confirmed that, at the character level, up to 10.92% of the characters the existing pipeline failed to recognize were recognized.
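The add-on idea, retrying with a separate OCR engine when the main pipeline's output fails a plausibility check, can be sketched as a generic fallback wrapper. Here `primary`, `addon_ocr`, and `accept` are hypothetical callables standing in for the actual pipeline, OCR engine, and classifier:

```python
def recognize_with_addon(image, primary, addon_ocr, accept):
    # Run the main recognition pipeline first; when its output fails
    # the plausibility check (e.g. empty text or low confidence), retry
    # with the add-on OCR engine and keep whichever result passes.
    text, conf = primary(image)
    if accept(text, conf):
        return text
    fallback_text, fallback_conf = addon_ocr(image)
    if accept(fallback_text, fallback_conf):
        return fallback_text
    # Neither path is confident; prefer the higher-confidence guess.
    return text if conf >= fallback_conf else fallback_text
```

The wrapper never degrades the original pipeline's accepted results; it only adds recognitions where the pipeline alone failed, matching the reported character-level gains.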

An End-to-End Sequence Learning Approach for Text Extraction and Recognition from Scene Image

  • Lalitha, G.;Lavanya, B.
    • International Journal of Computer Science & Network Security / Vol.22 No.7 / pp.220-228 / 2022
  • Images always carry useful information, so detecting text in scene images is imperative. The purpose of the proposed work is to recognize scene text images, for example signboard images placed along highways; scene text detection on highway signboards plays a vital role in road safety measures. At the initial stage, preprocessing techniques are applied to sharpen the image and improve the features it contains; similarly, morphological operators are applied to close small gaps between objects. We propose a two-phase algorithm for extracting and recognizing text from scene images. In phase I, text is extracted from the scene image by applying preprocessing techniques such as blurring, erosion, and top-hat filtering, followed by thresholding and the morphological gradient with fixed kernel sizes; the Canny edge detector is then applied to detect the text contained in the image. In phase II, the text is recognized using MSER (Maximally Stable Extremal Regions) and OCR. The proposed work detects text in scene images from the popular dataset repositories SVT, ICDAR 2003, and MSRA-TD500, captured under various illuminations and angles. The proposed algorithm achieves higher accuracy in less execution time than state-of-the-art methodologies.
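Among the preprocessing steps listed in phase I, the morphological gradient (dilation minus erosion) is the one that highlights stroke boundaries before edge detection. A minimal pure-Python version on binary images, using a 3x3 square structuring element (the paper's actual kernel sizes are not given here), looks like this:

```python
def dilate(img, k=1):
    # Max filter over a (2k+1)x(2k+1) square structuring element.
    h, w = len(img), len(img[0])
    return [[max(img[a][b]
                 for a in range(max(0, i - k), min(h, i + k + 1))
                 for b in range(max(0, j - k), min(w, j + k + 1)))
             for j in range(w)] for i in range(h)]

def erode(img, k=1):
    # Min filter over the same structuring element.
    h, w = len(img), len(img[0])
    return [[min(img[a][b]
                 for a in range(max(0, i - k), min(h, i + k + 1))
                 for b in range(max(0, j - k), min(w, j + k + 1)))
             for j in range(w)] for i in range(h)]

def morphological_gradient(img):
    # Dilation minus erosion: nonzero only near region boundaries,
    # which emphasizes text strokes before Canny edge detection.
    d, e = dilate(img), erode(img)
    return [[dv - ev for dv, ev in zip(dr, er)] for dr, er in zip(d, e)]
```

Interior pixels of a solid region and far-away background both yield 0, while pixels on or adjacent to a region boundary yield 1, producing an outline of each text stroke.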