• Title/Abstract/Keyword: AI image recognition


Large-scale Language-image Model-based Bag-of-Objects Extraction for Visual Place Recognition (영상 기반 위치 인식을 위한 대규모 언어-이미지 모델 기반의 Bag-of-Objects 표현)

  • Seung Won Jung;Byungjae Park
    • Journal of Sensor Science and Technology / v.33 no.2 / pp.78-85 / 2024
  • We propose a method for visual place recognition that represents images using objects as visual words, where the visual words correspond to the various objects present in urban environments. To detect these objects, we implemented a zero-shot detector based on a large-scale language-image model, which can detect diverse urban objects without additional training. When building histograms over the detected objects, frequency-based weighting is applied to reflect the importance of each object. Experiments on open datasets demonstrate the potential of the proposed method in comparison with an existing method, even under environmental and viewpoint changes.
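
As a rough illustration of the bag-of-objects idea described above, the sketch below builds a frequency-weighted object histogram and compares places by cosine similarity. It is a minimal sketch, not the authors' implementation: the vocabulary, the IDF-style weighting, and all function names are assumptions, and the zero-shot detector is left as a placeholder.

```python
import numpy as np

# Hypothetical vocabulary of urban object classes (the "visual words").
VOCAB = ["car", "tree", "traffic light", "bench", "street sign", "building"]

def detect_objects(image):
    """Placeholder for a zero-shot detector (e.g., a language-image model
    prompted with VOCAB). Should return a list of detected class names."""
    raise NotImplementedError

def bag_of_objects(detections, idf):
    """Build an L2-normalized, frequency-weighted histogram over VOCAB."""
    hist = np.zeros(len(VOCAB))
    for name in detections:
        hist[VOCAB.index(name)] += 1.0
    hist *= idf                          # down-weight ubiquitous objects
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def place_similarity(hist_a, hist_b):
    """Cosine similarity between two place descriptors (both normalized)."""
    return float(np.dot(hist_a, hist_b))
```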

A Study on Tools for Small Apartment Parking Management System Development (빌라 주차 관리 시스템에 관한 연구)

  • Seok-Young Chang;Do-Hyun Kang;In-Seok Park;Jeong-Pung Kim
    • Annual Conference of KIPS / 2024.10a / pp.888-889 / 2024
  • The development of parking management services plays an important role in using limited parking resources efficiently, identifying real-time parking demand, and providing convenient services to users. In this paper, we propose a system that combines an AI-based deep learning model with image processing technology to manage the parking spaces of small apartment buildings efficiently. The system uses YOLOv8 to detect vehicle objects in real time and OpenCV to recognize vacant parking spaces. In addition, an estimated departure time is provided to users by a reinforcement learning-based algorithm, thereby maximizing the efficient use of parking spaces. The server updates the parking status and departure times in real time; administrators can monitor and manage the data through a web platform, while users can check parking space availability and estimated vehicle departure times through a web or mobile application. The performance and accuracy of the proposed system were verified through various experiments, and the system is expected to contribute to the automation and convenience of parking management.
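
A minimal sketch of the vehicle-detection and vacancy-check step might look like the following, using the real ultralytics YOLOv8 API. The parking-spot coordinates, IoU threshold, and helper names are illustrative assumptions, not the paper's actual code.

```python
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # pretrained COCO weights (assumed checkpoint)
SPOTS = [(50, 120, 170, 260), (180, 120, 300, 260)]  # hypothetical spot boxes

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def vacant_spots(frame):
    """Return indices of parking spots with no detected vehicle."""
    result = model(frame)[0]
    cars = [tuple(map(int, b.xyxy[0].tolist()))
            for b in result.boxes
            if model.names[int(b.cls)] in ("car", "truck")]
    return [i for i, s in enumerate(SPOTS)
            if all(iou(s, c) < 0.3 for c in cars)]

frame = cv2.imread("parking_lot.jpg")  # placeholder camera frame
print(vacant_spots(frame))
```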

Implementation of Facility Movement Recognition Accuracy Analysis and Utilization Service using Drone Image (드론 영상 활용 시설물 이동 인식 정확도 분석 및 활용 서비스 구현)

  • Kim, Gwang-Seok;Oh, Ah-Ra;Choi, Yun-Soo
    • Journal of the Korean Institute of Gas / v.25 no.5 / pp.88-96 / 2021
  • Advanced Internet of Things (IoT) technology is being used in various ways for safety in the energy industry. At the center of these safety measures, drones act on behalf of humans, reaching places that are difficult for people to inspect because of large-scale facilities and space restrictions. In this study, recognition of the movement of dangerous facilities was tested using drone images: movement recognition accuracy was 100%, average data analysis accuracy was 95.8699%, and average completeness was 100%. Based on these experimental results, a future-oriented facility risk analysis system combined with ICT technology was implemented and presented. Further experiments under more diverse conditions and a full ICT convergence analysis system remain as future work.
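
The abstract does not detail the movement-recognition algorithm itself. As one plausible sketch, facility movement in two geo-aligned drone images can be flagged by comparing detected facility centroids against a displacement threshold; the threshold, scale factor, and names below are all assumptions for illustration.

```python
import math

THRESHOLD_M = 0.5  # assumed displacement tolerance in metres

def centroid(box):
    """Centre point of an (x1, y1, x2, y2) detection box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def moved(box_before, box_after, metres_per_pixel):
    """Flag a facility as moved if its centroid shifted beyond the threshold."""
    (x0, y0), (x1, y1) = centroid(box_before), centroid(box_after)
    shift_m = math.hypot(x1 - x0, y1 - y0) * metres_per_pixel
    return shift_m > THRESHOLD_M
```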

Video Compression Standard Prediction using Attention-based Bidirectional LSTM (어텐션 알고리듬 기반 양방향성 LSTM을 이용한 동영상의 압축 표준 예측)

  • Kim, Sangmin;Park, Bumjun;Jeong, Jechang
    • Journal of Broadcast Engineering / v.24 no.5 / pp.870-878 / 2019
  • In this paper, we propose an attention-based BLSTM for predicting the video compression standard of a video. Recently in NLP, many studies have used RNN structures to predict the next word in a sentence and to classify or translate sentences by their semantics, and these have been commercialized as chatbots, AI speakers, translator applications, and so on. LSTM was designed to solve the gradient vanishing problem of RNNs and is widely used in NLP. The proposed algorithm makes video compression standard prediction possible by applying a BLSTM and an attention mechanism, which focuses on the most important word in a sentence, to the bitstream of a video rather than to a natural-language sentence.
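
A compact PyTorch sketch of an attention-based bidirectional LSTM over bitstream bytes is shown below. The embedding size, hidden size, and class count are assumptions; the paper's exact architecture may differ.

```python
import torch
import torch.nn as nn

class AttnBLSTM(nn.Module):
    """Bidirectional LSTM with additive attention over a byte sequence."""
    def __init__(self, n_classes=4, emb=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(256, emb)          # one embedding per byte value
        self.blstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)         # scores each time step
        self.fc = nn.Linear(2 * hidden, n_classes)   # compression-standard logits

    def forward(self, byte_batch):                   # (B, T) int64 in [0, 255]
        h, _ = self.blstm(self.embed(byte_batch))    # (B, T, 2H)
        w = torch.softmax(self.attn(h), dim=1)       # attention weights (B, T, 1)
        context = (w * h).sum(dim=1)                 # weighted sum over time
        return self.fc(context)

logits = AttnBLSTM()(torch.randint(0, 256, (2, 512)))  # dummy bitstream chunks
```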

Generating Extreme Close-up Shot Dataset Based On ROI Detection For Classifying Shots Using Artificial Neural Network (인공신경망을 이용한 샷 사이즈 분류를 위한 ROI 탐지 기반의 익스트림 클로즈업 샷 데이터 셋 생성)

  • Kang, Dongwann;Lim, Yang-mi
    • Journal of Broadcast Engineering / v.24 no.6 / pp.983-991 / 2019
  • This study aims to analyze movies, which contain various stories, according to the size of their shots. This requires a dataset classified by shot size: extreme close-up shots, close-up shots, medium shots, full shots, and long shots. However, because typical video storytelling consists mainly of close-up, medium, full, and long shots, it is not easy to construct an adequate dataset of extreme close-up shots. To solve this, we propose an image cropping method based on region of interest (ROI) detection, using face detection and saliency detection to estimate the ROI. By cropping the ROI of close-up images, we generate extreme close-up images. The dataset enriched by the proposed method is then used to train a model that classifies shots by size. The study can help analyze the emotional changes of characters in video stories and predict how the composition of a story changes over time. If AI is used more actively in entertainment fields in the future, it is expected to affect the automatic adjustment and creation of characters, dialogue, and image editing.
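
A rough OpenCV sketch of the ROI-based cropping step is given below, assuming opencv-contrib's spectral-residual saliency and a Haar face detector. The face-first fallback logic and the crop size are assumptions rather than the authors' exact procedure.

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
saliency = cv2.saliency.StaticSaliencySpectralResidual_create()

def extreme_closeup(img):
    """Crop a close-up frame down to its ROI (face if found, else salient peak)."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) > 0:
        x, y, w, h = faces[0]
    else:
        ok, sal_map = saliency.computeSaliency(gray)
        _, _, _, (cx, cy) = cv2.minMaxLoc(sal_map)   # brightest = most salient
        w = h = min(img.shape[:2]) // 3              # assumed crop size
        x, y = max(cx - w // 2, 0), max(cy - h // 2, 0)
    return img[y:y + h, x:x + w]
```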

A Study on Low-Light Image Enhancement Technique for Improvement of Object Detection Accuracy in Construction Site (건설현장 내 객체검출 정확도 향상을 위한 저조도 영상 강화 기법에 관한 연구)

  • Jong-Ho Na;Jun-Ho Gong;Hyu-Soung Shin;Il-Dong Yun
    • Tunnel and Underground Space / v.34 no.3 / pp.208-217 / 2024
  • There has been considerable research effort toward developing and deploying deep learning-based surveillance systems to manage health and safety issues on construction sites. In particular, deep learning-based object detection under various environmental changes has been studied, because such changes degrade the detection performance of the model. Among these environmental variables, the accuracy of object detection models drops significantly under low illuminance, and consistent accuracy cannot be secured even when the model is trained on low-light images. Accordingly, low-light enhancement is needed to maintain performance under low illuminance. This paper therefore presents a comparative study of several deep learning-based low-light image enhancement models (GLADNet, KinD, LLFlow, Zero-DCE) using image data acquired on a construction site. The enhanced images were verified visually and analyzed quantitatively using image quality metrics such as PSNR, SSIM, and Delta-E. In the experiments, GLADNet showed excellent low-light enhancement performance in both quantitative and qualitative evaluation and was judged suitable as a low-light image enhancement model. If this low-light enhancement technique is applied as image preprocessing for a deep learning-based object detection model, consistent object detection performance can be expected even in low-light environments.
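
The quantitative comparison can be reproduced with standard metric implementations; a minimal sketch using scikit-image follows, with placeholder file names.

```python
import numpy as np
from skimage import io, color
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

ref = io.imread("reference.png")   # well-lit reference image (placeholder path)
enh = io.imread("enhanced.png")    # low-light image after enhancement

psnr = peak_signal_noise_ratio(ref, enh)
ssim = structural_similarity(ref, enh, channel_axis=-1)
# Delta-E 2000: mean perceptual colour difference in CIELAB space
delta_e = np.mean(color.deltaE_ciede2000(color.rgb2lab(ref), color.rgb2lab(enh)))
print(f"PSNR={psnr:.2f} dB  SSIM={ssim:.4f}  Delta-E={delta_e:.2f}")
```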

Digital Library Interface Research Based on EEG, Eye-Tracking, and Artificial Intelligence Technologies: Focusing on the Utilization of Implicit Relevance Feedback (뇌파, 시선추적 및 인공지능 기술에 기반한 디지털 도서관 인터페이스 연구: 암묵적 적합성 피드백 활용을 중심으로)

  • Hyun-Hee Kim;Yong-Ho Kim
    • Journal of the Korean Society for Information Management / v.41 no.1 / pp.261-282 / 2024
  • This study proposed and evaluated electroencephalography (EEG)-based and eye-tracking-based methods to determine relevance by utilizing users' implicit relevance feedback while navigating content in a digital library. For this, EEG/eye-tracking experiments were conducted on 32 participants using video, image, and text data. To assess the usefulness of the proposed methods, deep learning-based artificial intelligence (AI) techniques were used as a competitive benchmark. The evaluation results showed that EEG component-based methods (av_P600 and f_P3b components) demonstrated high classification accuracy in selecting relevant videos and images (faces/emotions). In contrast, AI-based methods, specifically object recognition and natural language processing, showed high classification accuracy for selecting images (objects) and texts (newspaper articles). Finally, guidelines for implementing a digital library interface based on EEG, eye-tracking, and artificial intelligence technologies have been proposed. Specifically, a system model based on implicit relevance feedback has been presented. Moreover, to enhance classification accuracy, methods suitable for each media type have been suggested, including EEG-based, eye-tracking-based, and AI-based approaches.
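
As an illustrative sketch of ERP-component-based relevance classification (not the authors' pipeline), mean amplitudes in assumed P3b and P600 time windows can serve as features for a simple classifier. The sampling rate, window boundaries, and data layout below are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

FS = 250  # assumed sampling rate (Hz); epochs time-locked to stimulus onset

def erp_features(epoch):
    """Mean amplitude in assumed P3b (250-500 ms) and P600 (500-800 ms) windows."""
    p3b = epoch[:, int(0.25 * FS):int(0.50 * FS)].mean()
    p600 = epoch[:, int(0.50 * FS):int(0.80 * FS)].mean()
    return [p3b, p600]

def train_relevance_classifier(epochs, labels):
    """epochs: (n_trials, n_channels, n_samples); labels: 1=relevant, 0=not."""
    X = np.array([erp_features(e) for e in epochs])
    return LogisticRegression().fit(X, labels)
```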

An Effectiveness Verification for Evaluating the Amount of WTCI Tongue Coating Using Deep Learning (딥러닝을 이용한 WTCI 설태량 평가를 위한 유효성 검증)

  • Lee, Woo-Beom
    • Journal of the Institute of Convergence Signal Processing / v.20 no.4 / pp.226-231 / 2019
  • The WTCI is an important criterion for evaluating the amount of a patient's tongue coating in tongue diagnosis. However, previous WTCI evaluation methods mostly measure the ratio between the extracted tongue coating region and the tongue body region quantitatively, which leads to non-objective measurements depending on the exposure conditions of the tongue image and the recognition performance for the tongue coating. Therefore, this paper proposes a deep learning-based WTCI for classifying the amount of tongue coating, applying AI deep learning with big data to WTCI evaluation. To verify the effectiveness of deep learning for tongue coating evaluation, we classify the amount of tongue coating into three classes (no coating, some coating, intense coating) using a CNN model. In tests on tongue coating sample images built for training and verification of the CNN model, the proposed method achieved 96.7% accuracy in classifying the amount of tongue coating.
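
A minimal PyTorch sketch of a three-class CNN like the one described follows; the layer sizes are assumptions, and the paper's model may differ.

```python
import torch.nn as nn

class TongueCoatingCNN(nn.Module):
    """Small CNN classifying tongue images into 3 coating levels."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        # 3 classes: no coating, some coating, intense coating
        self.classifier = nn.Linear(64, 3)

    def forward(self, x):                # x: (B, 3, H, W) normalized RGB
        return self.classifier(self.features(x).flatten(1))
```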

Implementation of Hair Style Recommendation System Based on Big data and Deepfakes (빅데이터와 딥페이크 기반의 헤어스타일 추천 시스템 구현)

  • Tae-Kook Kim
    • Journal of Internet of Things and Convergence / v.9 no.3 / pp.13-19 / 2023
  • In this paper, we investigate the implementation of a hairstyle recommendation system based on big data and deepfake technology. The proposed system recognizes the user's facial shape from a photo (image). Facial shapes are classified into oval, round, and square, and hairstyles that suit each facial shape are synthesized using deepfake technology and provided as videos. Hairstyles are recommended on the basis of big data, applying the latest trends and styles that suit the facial shape. With an image segmentation map and the Motion Supervised Co-Part Segmentation algorithm, elements belonging to the same category (such as hair or face) can be synthesized between images. The synthesized image with the new hairstyle and a pre-defined video are then fed to the Motion Representations for Articulated Animation algorithm to generate a video animation. The proposed system is expected to be used in various areas of the beauty industry, including virtual fitting. In future research, we plan to develop a smart mirror that recommends hairstyles and incorporates Internet of Things (IoT) functionality.
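
The synthesis steps above rely on published algorithms, so only the first step, face-shape classification, is sketched here. This is a rough heuristic over a hypothetical landmark format; the ratios and thresholds are assumptions, not the paper's method.

```python
def classify_face_shape(landmarks):
    """Very rough face-shape heuristic from assumed landmark points.

    landmarks: dict with (x, y) for 'chin', 'forehead', 'left_cheek',
    'right_cheek', plus 'jaw_width' in pixels (hypothetical format).
    """
    height = abs(landmarks["forehead"][1] - landmarks["chin"][1])
    width = abs(landmarks["right_cheek"][0] - landmarks["left_cheek"][0])
    ratio = height / max(width, 1e-9)            # elongation of the face
    jaw_to_cheek = landmarks["jaw_width"] / max(width, 1e-9)
    if ratio > 1.3:
        return "oval"
    return "square" if jaw_to_cheek > 0.9 else "round"
```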

Method of Automatically Generating Metadata through Audio Analysis of Video Content (영상 콘텐츠의 오디오 분석을 통한 메타데이터 자동 생성 방법)

  • Sung-Jung Young;Hyo-Gyeong Park;Yeon-Hwi You;Il-Young Moon
    • Journal of Advanced Navigation Technology / v.25 no.6 / pp.557-561 / 2021
  • Metadata has become an essential element for recommending video content to users, but it is currently generated manually by video content providers. In this paper, a method for generating metadata automatically is studied as an alternative to the existing manual input method. In addition to the emotion tag extraction method of the previous study, we investigate automatically generating genre and country-of-production metadata from movie audio. The genre is extracted from the audio spectrogram using a ResNet34 artificial neural network with transfer learning, and the language of the speakers in the movie is detected through speech recognition. These results confirm the possibility of automatically generating metadata through artificial intelligence.
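
A plausible sketch of the genre branch pairs a librosa log-mel spectrogram with a torchvision ResNet34 adapted for transfer learning. The genre count, sampling rate, and preprocessing are assumptions; the paper's pipeline may differ.

```python
import librosa
import torch
import torch.nn as nn
from torchvision.models import resnet34

N_GENRES = 8  # assumed number of genre labels

model = resnet34(weights="IMAGENET1K_V1")            # transfer-learning backbone
model.fc = nn.Linear(model.fc.in_features, N_GENRES)  # replace classifier head

def audio_to_input(path):
    """Load audio, build a log-mel spectrogram, tile to 3 channels for ResNet."""
    y, sr = librosa.load(path, sr=22050, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
    log_mel = librosa.power_to_db(mel)
    x = torch.tensor(log_mel, dtype=torch.float32)
    x = (x - x.mean()) / (x.std() + 1e-9)             # per-clip normalization
    return x.unsqueeze(0).repeat(3, 1, 1).unsqueeze(0)  # (1, 3, 128, T)

logits = model(audio_to_input("movie_audio.wav"))     # placeholder file path
```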