• Title/Summary/Keyword: vision AI

Noise Robust Baseball Event Detection with Multimodal Information (멀티모달 정보를 이용한 잡음에 강인한 야구 이벤트 시점 검출 방법)

  • Young-Ik Kim;Hyun Jo Jung;Minsoo Na;Younghyun Lee;Joonsoo Lee
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.11a / pp.136-138 / 2022
  • Efficiently detecting the time points of specific events in sports broadcast/media data is an important technology for information retrieval, highlight generation, and summarization. This paper proposes a method that fuses audio and visual information to robustly detect the time points of bat-hit and catch events for each pitch in baseball broadcast data. Event detection based on audio information is computationally simple and highly accurate, but there are many cases in which ambiguity cannot be resolved without the help of visual information. In particular, for baseball broadcast data, visual information about the pitcher's release of the ball can be used to further improve the accuracy of bat-hit and catch event detection. This paper proposes an audio-based deep learning model for event time-point detection together with a vision-based correction method, and describes its application to real KBO baseball broadcast data along with the experimental results.

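As a rough illustration of the audio-visual fusion idea described in the abstract above, the sketch below keeps an audio-detected hit/catch candidate only if a pitch release was detected in the video shortly before it; the function names, timestamps, and 1.5-second window are illustrative assumptions, not the authors' implementation.

```python
# Sketch of audio-visual fusion: audio_event_times are candidate hit/catch times
# (seconds) from an audio model, pitch_release_times are pitch releases detected
# from video. All values and the window length are illustrative assumptions.

def correct_audio_events(audio_event_times, pitch_release_times, window=1.5):
    """Keep only audio-detected events that occur shortly after a detected pitch."""
    corrected = []
    for t in audio_event_times:
        # An event is plausible only if some pitch was released within `window`
        # seconds before it; otherwise treat it as crowd/commentary noise.
        if any(0.0 <= t - p <= window for p in pitch_release_times):
            corrected.append(t)
    return corrected

# Example usage with made-up timestamps.
print(correct_audio_events([12.4, 30.1, 55.0], [11.2, 54.3]))  # -> [12.4, 55.0]
```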

The Effect of Background on Object Recognition of Vision AI (비전 AI의 객체 인식에 배경이 미치는 영향)

  • Wang, In-Gook;Yu, Jung-Ho
    • Proceedings of the Korean Institute of Building Construction Conference / 2023.05a / pp.127-128 / 2023
  • The construction industry is increasingly adopting vision AI technologies to improve efficiency and safety management. However, the complex and dynamic nature of construction sites can pose challenges to the accuracy of vision AI models trained on datasets that do not consider the background. This study investigates the effect of background on object recognition for vision AI at construction sites by constructing a training dataset and a test dataset with varying backgrounds. Frame scaffolding was chosen as the object of recognition due to its wide use, potential safety hazards, and difficulty of recognition. The experimental results showed that considering the background during model training significantly improved the accuracy of object recognition.

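The comparison in the study above boils down to scoring the same recognizer separately on test images grouped by background type; the minimal sketch below shows that bookkeeping, with the background labels and data structures being assumptions rather than the paper's actual dataset.

```python
# Illustrative sketch of a background-wise accuracy comparison: the same detector
# is scored separately on test samples grouped by background type.

from collections import defaultdict

def accuracy_by_background(samples):
    """samples: list of (background_label, is_correct) pairs."""
    hits, totals = defaultdict(int), defaultdict(int)
    for background, correct in samples:
        totals[background] += 1
        hits[background] += int(correct)
    return {b: hits[b] / totals[b] for b in totals}

# Hypothetical per-image results for two background conditions.
results = [("bare_site", True), ("bare_site", True),
           ("cluttered_site", True), ("cluttered_site", False)]
print(accuracy_by_background(results))  # {'bare_site': 1.0, 'cluttered_site': 0.5}
```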

A Study on Public Library Book Location Guidance System based on AI Vision Sensor

  • Soyoung Kim;Heesun Kim
    • International Journal of Internet, Broadcasting and Communication / v.16 no.3 / pp.253-261 / 2024
  • The role of the library is to serve as a public institution that provides academic information to a variety of people, including students, the general public, and researchers. As the importance of lifelong education is emphasized these days, libraries are evolving beyond simply storing and lending materials into complex cultural spaces that share knowledge and information through various educational programs and cultural events. One of the problems library users face is locating the books they want to borrow. This problem occurs because of errors in the recorded locations of books, caused by delays in updating the library database for borrowed books, incorrect labeling, and books temporarily placed in different locations. The biggest problem is that it takes users a long time to find the books they want to borrow. In this paper, we propose a system that visually displays the location of books in real time using an AI vision sensor and LEDs. The AI vision sensor-based book location guidance system generates a QR code containing the call number of the borrowed book. When the AI vision sensor recognizes this QR code, the exact location of the book is displayed visually through LEDs so that users can find it easily. We believe that the AI vision sensor-based book location guidance system dramatically improves book search and management efficiency, and this technology is expected to have great potential not only in libraries and bookstores but also in a variety of other fields.
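
A minimal sketch of the QR-code flow described above, assuming the third-party `qrcode` package and OpenCV's `QRCodeDetector`; the call-number-to-shelf mapping and the LED call are hypothetical placeholders for the paper's actual hardware.

```python
# Sketch: encode a call number as a QR label, decode it from a camera image,
# and look up the shelf position to light. Libraries and mapping are assumptions.

import cv2
import qrcode

SHELF_MAP = {"004.6-K49": ("section B", 3)}  # call number -> (shelf, row), assumed

def make_label(call_number, path="label.png"):
    qrcode.make(call_number).save(path)      # encode the call number as a QR label
    return path

def locate_book(image_path):
    img = cv2.imread(image_path)
    data, points, _ = cv2.QRCodeDetector().detectAndDecode(img)
    if data in SHELF_MAP:
        shelf, row = SHELF_MAP[data]
        print(f"Light LED at {shelf}, row {row}")  # stand-in for the LED driver
    return data

locate_book(make_label("004.6-K49"))
```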

From Masked Reconstructions to Disease Diagnostics: A Vision Transformer Approach for Fundus Images (마스크된 복원에서 질병 진단까지: 안저 영상을 위한 비전 트랜스포머 접근법)

  • Toan Duc Nguyen;Gyurin Byun;Hyunseung Choo
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.557-560 / 2023
  • In this paper, we introduce a pre-training method leveraging the capabilities of the Vision Transformer (ViT) for disease diagnosis in conventional fundus images. Recognizing the need for effective representation learning in medical images, our method combines the Vision Transformer with a Masked Autoencoder to generate meaningful and pertinent image augmentations. During pre-training, the Masked Autoencoder produces an altered version of the original image, which serves as a positive pair. The Vision Transformer then employs contrastive learning techniques with this image pair to refine its weight parameters. Our experiments demonstrate that this dual-model approach harnesses the strengths of both the ViT and the Masked Autoencoder, resulting in robust and clinically relevant feature embeddings. Preliminary results suggest significant improvements in diagnostic accuracy, underscoring the potential of our methodology in enhancing automated disease diagnosis in fundus imaging.
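
The pre-training step described above can be pictured as a contrastive loss between the embedding of an original fundus image and the embedding of its MAE-style reconstruction. The PyTorch sketch below shows such an NT-Xent-style loss with a toy linear encoder standing in for the ViT and Masked Autoencoder, which are not reproduced here.

```python
# Conceptual sketch: the MAE reconstruction acts as the positive view of each
# image and a contrastive loss pulls the two embeddings together.

import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.1):
    """Contrastive loss between embeddings of originals (z1) and reconstructions (z2)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature   # similarity of every original/reconstruction pair
    targets = torch.arange(z1.size(0))   # i-th original matches i-th reconstruction
    return F.cross_entropy(logits, targets)

encoder = torch.nn.Linear(3 * 224 * 224, 128)   # placeholder for a ViT encoder
originals = torch.randn(8, 3 * 224 * 224)
reconstructions = originals + 0.1 * torch.randn_like(originals)  # stand-in for MAE output
loss = nt_xent(encoder(originals), encoder(reconstructions))
loss.backward()
print(float(loss))
```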

Performance Evaluation of Efficient Vision Transformers on Embedded Edge Platforms (임베디드 엣지 플랫폼에서의 경량 비전 트랜스포머 성능 평가)

  • Minha Lee;Seongjae Lee;Taehyoun Kim
    • IEMEK Journal of Embedded Systems and Applications / v.18 no.3 / pp.89-100 / 2023
  • Recently, on-device artificial intelligence (AI) solutions using mobile devices and embedded edge devices have emerged in various fields, such as computer vision, to address network traffic burdens, low-energy operation, and security problems. Although vision transformer deep learning models have outperformed conventional convolutional neural network (CNN) models in computer vision, they require more computations and parameters than CNN models. Thus, they are not directly applicable to embedded edge devices with limited hardware resources. Many researchers have proposed various model compression methods and lightweight architectures for vision transformers; however, only a few studies have evaluated the effects of these model compression techniques on performance. To address this gap, this paper presents a performance evaluation of vision transformers on embedded platforms. We investigated the behavior of three vision transformers: DeiT, LeViT, and MobileViT. The performance of each model was evaluated by accuracy and inference time on edge devices using the ImageNet dataset. We assessed the effects of the quantization method applied to the models on latency improvement and accuracy degradation by profiling the proportion of response time occupied by major operations. In addition, we evaluated the performance of each model on GPU- and EdgeTPU-based edge devices. In our experimental results, LeViT showed the best performance on CPU-based edge devices, and DeiT-small showed the highest performance improvement on GPU-based edge devices. In addition, only the MobileViT models showed performance improvement on EdgeTPU. Summarizing the profiling results, the degree of performance improvement of each vision transformer model was highly dependent on the proportion of operations that could be optimized on the target edge device. In summary, to apply vision transformers to on-device AI solutions, both proper operation composition and optimizations specific to target edge devices must be considered.
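
One concrete instance of the quantization measurement described above is post-training dynamic quantization of a model's linear layers followed by a CPU latency comparison; the PyTorch sketch below uses a toy MLP as a stand-in for DeiT, LeViT, or MobileViT and is not the authors' benchmarking setup.

```python
# Sketch: quantize the linear layers of a small model to int8 and compare CPU latency.

import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(384, 1536), nn.GELU(), nn.Linear(1536, 384)).eval()
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def latency(m, runs=50):
    x = torch.randn(1, 384)
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(runs):
            m(x)
    return (time.perf_counter() - start) / runs

print(f"fp32: {latency(model) * 1e3:.2f} ms, int8: {latency(quantized) * 1e3:.2f} ms")
```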

Analysis on Lightweight Methods of On-Device AI Vision Model for Intelligent Edge Computing Devices (지능형 엣지 컴퓨팅 기기를 위한 온디바이스 AI 비전 모델의 경량화 방식 분석)

  • Hye-Hyeon Ju;Namhi Kang
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.1 / pp.1-8 / 2024
  • On-device AI technology, which can run AI models on edge devices to support real-time processing and enhance privacy, is attracting attention. As intelligent IoT is applied to various industries, services utilizing on-device AI technology are increasing significantly. However, general deep learning models require a lot of computational resources for inference and training. Therefore, various lightweighting methods such as quantization and pruning have been suggested to run deep learning models on embedded edge devices. Among these methods, this paper focuses on pruning and analyzes how to lighten deep learning models and apply them to edge computing devices. In particular, we use dynamic and static pruning techniques to evaluate the inference speed, accuracy, and memory usage of a lightweight AI vision model. The results analyzed in this paper can be used for intelligent video control systems or for video security systems in autonomous vehicles, where real-time processing is highly required. In addition, the results are expected to be useful in various other IoT services and industries.
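
As one concrete example of the pruning techniques discussed above, the sketch below applies static, magnitude-based unstructured pruning with PyTorch's pruning utilities; the toy layer and the 50% sparsity level are assumptions, and dynamic pruning (which adapts at inference time) is not shown.

```python
# Sketch: static L1-magnitude pruning of a single layer's weights.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 128)
prune.l1_unstructured(layer, name="weight", amount=0.5)  # zero the 50% smallest-magnitude weights
prune.remove(layer, "weight")                            # make the pruned weights permanent

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.2f}")                # ~0.50
```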

Quantitative evaluation of transfer learning for image recognition AI of robot vision (로봇 비전의 영상 인식 AI를 위한 전이학습 정량 평가)

  • Jae-Hak Jeong
    • The Journal of the Convergence on Culture Technology / v.10 no.3 / pp.909-914 / 2024
  • This study suggests a quantitative evaluation of transfer learning, which is widely used in various AI fields, including image recognition for robot vision. Quantitative and qualitative analyses of the results of applying transfer learning are commonly presented, but transfer learning itself is rarely discussed. Therefore, this study proposes a quantitative evaluation of transfer learning itself based on MNIST, a handwritten digit database. For a reference network, the change in recognition accuracy is tracked according to the depth of the frozen layers and the ratio of transfer-learning data to pre-training data. It is observed that when only up to the first layer is frozen and the ratio of transfer-learning data is more than 3%, a recognition accuracy of more than 90% can be stably maintained. The quantitative evaluation method of this study can be used to implement transfer learning optimized for the network structure and type of data, and will expand the scope of use of robot vision and image analysis AI in various environments.
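
The layer-freezing knob evaluated above can be sketched as follows for a generic PyTorch model: the first n parameterized layers are frozen and only the remaining layers are fine-tuned on the target data. The three-layer MLP is an illustrative stand-in for the reference network, not the study's actual architecture.

```python
# Sketch: freeze the first n parameterized layers before transfer learning.

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 64), nn.ReLU(),
                      nn.Linear(64, 10))                 # stand-in for the reference network

def freeze_up_to(model, n_layers):
    """Freeze the first n_layers parameterized layers for transfer learning."""
    frozen = 0
    for module in model:
        if any(p.requires_grad for p in module.parameters(recurse=False)):
            if frozen < n_layers:
                for p in module.parameters():
                    p.requires_grad = False
                frozen += 1

freeze_up_to(model, 1)                                   # freeze only the first layer
trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable), "trainable tensors remain")        # later layers stay trainable
```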

Intelligent Monitoring System for Solitary Senior Citizens with Vision-Based Security Architecture (영상보안 구조 기반의 지능형 독거노인 모니터링 시스템)

  • Kim, Soohee;Jeong, Youngwoo;Jeong, Yue Ri;Lee, Seung Eun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.639-641 / 2022
  • With the aging of the population, many studies on monitoring systems for solitary senior citizens are under way. In general, a monitoring system provides its service by processing vision data, sensor readings, and measurement values on a server. A design that considers data security is essential because a risk of data leakage exists in system structures that employ a server. In this paper, we propose an intelligent monitoring system for solitary senior citizens with a vision-based security architecture. The proposed system protects privacy and ensures high security through an architecture that blocks communication between the camera module and the server by employing an edge AI module. The edge AI module was designed in Verilog HDL and verified by implementing it on a Field Programmable Gate Array (FPGA). We tested the proposed system on 5,144 frames of data and demonstrated that a danger-detection signal is generated correctly when human motion is not detected for a certain period.

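The rule the FPGA edge-AI module above implements in Verilog can be sketched behaviorally as follows (in Python, for readability): assert a danger signal once no human motion has been detected for a given number of consecutive frames; the threshold value is an assumption, not taken from the paper.

```python
# Behavioral sketch: raise a danger signal after `threshold` consecutive motion-free frames.

def danger_signal(motion_flags, threshold=300):
    """motion_flags: per-frame booleans (True = human motion detected)."""
    idle = 0
    signals = []
    for detected in motion_flags:
        idle = 0 if detected else idle + 1        # count consecutive motion-free frames
        signals.append(idle >= threshold)         # assert the danger signal past the threshold
    return signals

frames = [True] * 10 + [False] * 305
print(sum(danger_signal(frames)))                 # signal asserted on the last 6 frames
```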

Development of an intelligent edge computing device equipped with on-device AI vision model (온디바이스 AI 비전 모델이 탑재된 지능형 엣지 컴퓨팅 기기 개발)

  • Kang, Namhi
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.5 / pp.17-22 / 2022
  • In this paper, we design a lightweight embedded device that can support intelligent edge computing and show that the device quickly detects objects in images input from a camera in real time. The proposed system can be applied to environments without pre-installed infrastructure, such as intelligent video control systems for industrial sites or military areas, or video security systems mounted on autonomous vehicles such as drones. On-device AI (artificial intelligence) technology is increasingly required for the widespread application of intelligent vision recognition systems. Offloading computation from an image acquisition device to a nearby edge device enables fast service with fewer network and system resources than AI services performed in the cloud. In addition, it is expected to be safely applied to various industries, as it can reduce the attack surface exposed to hacking attacks and minimize the disclosure of sensitive data.

A Framework for Computer Vision-aided Construction Safety Monitoring Using Collaborative 4D BIM

  • Tran, Si Van-Tien;Bao, Quy Lan;Nguyen, Truong Linh;Park, Chansik
    • International conference on construction engineering and project management / 2022.06a / pp.1202-1208 / 2022
  • Techniques based on computer vision are becoming increasingly important in construction safety monitoring. AI algorithms can automatically identify conceivable hazards and give feedback to stakeholders. However, construction sites involve various potential hazard situations throughout a project. Because of site complexity, many visual devices participate simultaneously in the monitoring process, which makes it challenging to develop and operate the corresponding AI detection algorithms. Safety information resulting from computer vision also needs to be organized before it is delivered to safety managers. To address these issues, this study proposes a framework for computer vision-aided construction safety monitoring using collaborative 4D BIM information, called CSM4D. The suggested framework consists of two modules: (1) a collaborative BIM information extraction module (CBIE), which extracts the spatial-temporal information and potential hazard scenarios of a specific activity, so that (2) the computer vision-aided safety monitoring module (CVSM) can apply accurate algorithms at the right workplace during the project. The proposed framework is expected to aid safety monitoring using computer vision and 4D BIM.
