• Title/Summary/Keyword: vision-based technology

Search Results: 1,063

Human Action Recognition Using Deep Data: A Fine-Grained Study

  • Rao, D. Surendra;Potturu, Sudharsana Rao;Bhagyaraju, V
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.6
    • /
    • pp.97-108
    • /
    • 2022
  • The video-assisted human action recognition [1] field is one of the most active in computer vision research. Since the depth data [2] obtained by Kinect cameras offers more benefits than traditional RGB data, research on human action recognition has recently increased thanks to the Kinect camera. In this article, we conducted a systematic study of strategies for recognizing human activity based on depth data. All methods are grouped into depth-map tactics and skeleton tactics. A comparison with some of the more traditional strategies is also covered. We then examined the specifics of different depth behavior databases and drew a straightforward distinction between them. Finally, we discuss the advantages and disadvantages of depth- and skeleton-based techniques.

Fire Detection Based on Image Learning by Collaborating CNN-SVM with Enhanced Recall

  • Yongtae Do
    • Journal of Sensor Science and Technology
    • /
    • v.33 no.3
    • /
    • pp.119-124
    • /
    • 2024
  • Effective fire sensing is important to protect lives and property from disaster. In this paper, we present an intelligent visual sensing method for detecting fires based on machine learning techniques. The proposed method involves a two-step process. In the first step, fire and non-fire images are used to train a convolutional neural network (CNN), and in the next step, feature vectors consisting of 256 values obtained from the CNN are used to train a support vector machine (SVM). Linear and nonlinear SVMs with different parameters were tested intensively. We found that the proposed hybrid method using an SVM with a linear kernel effectively increased the recall rate of fire image detection without compromising detection accuracy when an imbalanced dataset was used for learning. This is a major contribution of this study, because recall is particularly important in sensing disaster situations such as fires. In our experiments, the proposed system exhibited an accuracy of 96.9% and a recall rate of 92.9% on test image data.
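The two-step pipeline described above (CNN features feeding a linear SVM) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 256-dimensional "CNN features" here are synthetic random vectors, and the linear SVM is trained with a simple Pegasos-style subgradient method instead of a library solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the 256-dim CNN feature vectors described in the
# abstract (hypothetical data; the paper's real features come from a trained CNN).
n, d = 400, 256
X_fire = rng.normal(loc=0.5, scale=1.0, size=(n, d))
X_none = rng.normal(loc=-0.5, scale=1.0, size=(n, d))
X = np.vstack([X_fire, X_none])
y = np.hstack([np.ones(n), -np.ones(n)])  # +1 = fire, -1 = non-fire

def train_linear_svm(X, y, lam=1e-3, epochs=20):
    """Pegasos-style subgradient descent on the hinge loss (linear kernel)."""
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w) < 1:        # margin violated -> hinge gradient step
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                            # margin satisfied -> only regularize
                w = (1 - eta * lam) * w
    return w

w = train_linear_svm(X, y)
pred = np.sign(X @ w)
recall = np.mean(pred[y == 1] == 1)   # fraction of fire samples detected
print(f"recall on fire class: {recall:.3f}")
```

On imbalanced data, the decision threshold of the linear SVM can additionally be shifted to trade accuracy for recall, which is the effect the abstract reports.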

Large Multimodal Model for Context-aware Construction Safety Monitoring

  • Taegeon Kim;Seokhwan Kim;Minkyu Koo;Minwoo Jeong;Hongjo Kim
    • International conference on construction engineering and project management
    • /
    • 2024.07a
    • /
    • pp.415-422
    • /
    • 2024
  • Recent advances in construction automation have led to increased use of deep learning-based computer vision technology for construction monitoring. However, monitoring systems based on supervised learning struggle with recognizing complex risk factors in construction environments, highlighting the need for adaptable solutions. Large multimodal models, pretrained on extensive image-text datasets, present a promising solution with their capability to recognize diverse objects and extract semantic information. This paper proposes a methodology that generates training data for multimodal models, including safety-centric descriptions using GPT-4V, and fine-tunes the LLaVA model using the LoRA method. Experimental results from seven construction site hazard scenarios show that the fine-tuned model accurately assesses safety status in images. These findings underscore the proposed approach's effectiveness in enhancing construction site safety monitoring and illustrate the potential of large multimodal models to tackle domain-specific challenges.
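The LoRA fine-tuning method mentioned above can be sketched in a few lines. This is a generic illustration of the low-rank update idea, with illustrative dimensions rather than the LLaVA model's real sizes: the frozen weight W is left untouched, and only the two small factors A and B are trained.

```python
import numpy as np

rng = np.random.default_rng(1)

# Minimal sketch of the LoRA idea (dimensions are illustrative, not LLaVA's).
d_out, d_in, r, alpha = 64, 64, 4, 8
W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable low-rank factor
B = np.zeros((d_out, r))                  # zero-initialized: no change at start

def lora_forward(x, W, A, B, alpha, r):
    # Effective weight is W + (alpha / r) * B @ A; only A and B are trained,
    # so the number of trainable parameters is r * (d_in + d_out), not d_in * d_out.
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.normal(size=(2, d_in))
# Before any training step, B == 0, so the LoRA model matches the frozen model.
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)
```

This is why LoRA is attractive for adapting a large multimodal model to a domain like construction safety: the pretrained weights stay fixed and only the small factors need to be stored per task.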

Deep Learning-Based Face Recognition through Low-Light Enhancement (딥러닝 기반 저조도 향상 기술을 활용한 얼굴 인식 성능 개선)

  • Changwoo Baek;Kyeongbo Kong
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.19 no.5
    • /
    • pp.243-250
    • /
    • 2024
  • This study explores enhancing facial recognition performance in low-light environments using deep learning-based low-light enhancement techniques. Facial recognition technology is widely used in edge devices such as smartphones, smart home devices, and security systems, but low-light conditions reduce accuracy due to degraded image quality and increased noise. We reviewed the latest techniques, including Zero-DCE, Zero-DCE++, and SCI (Self-Calibrated Illumination), and applied them as preprocessing steps for facial recognition on edge devices. Using the K-face dataset, experiments on the Qualcomm QRB5165 platform showed a significant improvement in F1 score from 0.57 to 0.833 with SCI. Processing times were 0.15 ms for SCI, 0.4 ms for Zero-DCE, and 0.7 ms for Zero-DCE++, all much shorter than the 5 ms of the facial recognition model MobileFaceNet. These results indicate that these techniques can be used effectively on resource-limited edge devices, enhancing facial recognition in low-light conditions for various applications.
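Two pieces of the evaluation above are easy to make concrete: the F1 score being reported, and the idea of an enhancement step that brightens a dark image before recognition. The enhancer below is a plain gamma correction used only as a stand-in; the paper's Zero-DCE and SCI methods learn adaptive per-pixel curves instead.

```python
import numpy as np

def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# The abstract reports F1 rising from 0.57 to 0.833 with SCI; for instance,
# precision = recall = 0.833 yields exactly that F1.
assert abs(f1_score(0.833, 0.833) - 0.833) < 1e-9

# Stand-in for a learned low-light enhancer: fixed gamma correction on a
# [0, 1]-valued grayscale image (Zero-DCE/SCI learn the curve instead).
def gamma_enhance(img, gamma=0.4):
    img = np.clip(img, 0.0, 1.0)
    return img ** gamma          # gamma < 1 brightens dark pixels

dark = np.full((4, 4), 0.05)     # nearly black patch
bright = gamma_enhance(dark)
print(bright[0, 0])              # brightened toward mid-gray
```

The point of the paper's comparison is that such a preprocessing pass costs well under a millisecond on the edge platform, so the recognition model's own latency still dominates.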

Dynamic characteristics monitoring of wind turbine blades based on improved YOLOv5 deep learning model

  • W.H. Zhao;W.R. Li;M.H. Yang;N. Hong;Y.F. Du
    • Smart Structures and Systems
    • /
    • v.31 no.5
    • /
    • pp.469-483
    • /
    • 2023
  • The dynamic characteristics of wind turbine blades are usually monitored by contact sensors, which suffer from high cost, difficult installation, easy damage to the structure, and difficult signal transmission. In view of these problems, a non-contact dynamic-characteristic monitoring method for wind turbine blades is proposed based on computer vision technology and an improved YOLOv5 (You Only Look Once v5) deep learning model. First, the original YOLOv5l model with the CSP (Cross Stage Partial) structure is improved by introducing the CSP2_2 structure, which reduces the number of residual components to speed up network training. On this basis, combined with the DeepSORT algorithm, the accuracy of structural displacement monitoring is improved. Secondly, because deep learning sample datasets are difficult to collect, Blender software is used to model the wind turbine structure under varied conditions, illuminations, and other environments similar to practical engineering, and, combined with image-expansion techniques, a modeling-based dataset augmentation method is proposed. Finally, the feasibility of the proposed algorithm is verified by experiments, followed by an analysis of the influence of YOLOv5 variants, lighting conditions, and angles on the recognition results. The results show that the improved YOLOv5 deep learning model not only performs well compared with many other YOLOv5 models, but also achieves high accuracy in vibration monitoring in different environments. The method can accurately identify the dynamic characteristics of wind turbine blades and can therefore serve as a reference for evaluating blade condition.
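Once a detector-plus-tracker pipeline like the one above yields a per-frame blade position, extracting a dynamic characteristic reduces to spectral analysis of the displacement time series. The sketch below is a hypothetical stand-in: a simulated sinusoidal displacement with noise (the assumed frame rate, duration, and frequency are illustrative, not the paper's), from which the dominant vibration frequency is read off the FFT.

```python
import numpy as np

# Hypothetical sketch: the tracker (e.g. YOLOv5 + DeepSORT) yields a blade-tip
# pixel position per video frame; the dominant vibration frequency is then the
# peak of the displacement spectrum.
fs = 60.0                                # assumed camera frame rate in Hz
t = np.arange(0, 4.0, 1.0 / fs)
f_true = 2.5                             # simulated blade vibration frequency (Hz)
noise = 0.1 * np.random.default_rng(2).normal(size=t.size)
displacement = np.sin(2 * np.pi * f_true * t) + noise

# Remove the mean so the DC bin does not dominate, then take the one-sided FFT.
spectrum = np.abs(np.fft.rfft(displacement - displacement.mean()))
freqs = np.fft.rfftfreq(t.size, d=1.0 / fs)
f_est = freqs[np.argmax(spectrum)]
print(f"estimated dominant frequency: {f_est:.2f} Hz")
```

The camera frame rate bounds the highest observable frequency (Nyquist, fs/2), which is one reason vision-based monitoring targets the low-frequency structural modes of large blades.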

Accelerometer-based Gesture Recognition for Robot Interface (로봇 인터페이스 활용을 위한 가속도 센서 기반 제스처 인식)

  • Jang, Min-Su;Cho, Yong-Suk;Kim, Jae-Hong;Sohn, Joo-Chan
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.53-69
    • /
    • 2011
  • Vision- and voice-based technologies are commonly utilized for human-robot interaction. But it is widely recognized that the performance of vision- and voice-based interaction systems deteriorates by a large margin in real-world situations due to environmental and user variances. Human users need to be very cooperative to get reasonable performance, which significantly limits the usability of vision- and voice-based human-robot interaction technologies. As a result, touch screens are still the major medium of human-robot interaction in real-world applications. To improve the usability of robots for various services, alternative interaction technologies should be developed to complement the problems of vision- and voice-based technologies. In this paper, we propose the accelerometer-based gesture interface as one such alternative, because accelerometers are effective in detecting the movements of the human body, while their performance is not limited by environmental contexts such as lighting conditions or a camera's field of view. Moreover, accelerometers are widely available nowadays in many mobile devices. We tackle the problem of classifying the acceleration signal patterns of the 26 English alphabet letters, one of the essential repertoires for realizing robot-based education services. Recognizing 26 English handwriting patterns from accelerometers is a very difficult task because of the large number of pattern classes and the complexity of each pattern; the most difficult comparable problem previously undertaken was recognizing the acceleration signal patterns of 10 handwritten digits. Most previous studies dealt with sets of 8-10 simple and easily distinguishable gestures useful for controlling home appliances, computer applications, robots, etc. Good features are essential for the success of pattern recognition.
To promote discriminative power over the complex English alphabet patterns, we extracted 'motion trajectories' from the input acceleration signal and used them as the main feature. Investigative experiments showed that trajectory-based classifiers performed 3%-5% better than those using raw features, e.g., the acceleration signal itself or statistical figures. To minimize the distortion of trajectories, we applied a simple but effective set of smoothing and band-pass filters. It is well known that acceleration patterns for the same gesture vary greatly among performers. To tackle this problem, online incremental learning is applied so that our system adapts to each user's distinctive motion properties. Our system is based on instance-based learning (IBL), where each training sample is memorized as a reference pattern. Brute-force incremental learning in IBL continuously accumulates reference patterns, which is a problem because it not only slows down classification but also degrades recall performance. Regarding the latter phenomenon, we observed a tendency that as the number of reference patterns grows, some reference patterns contribute more to false-positive classification. Thus, we devised an algorithm that optimizes the reference pattern set based on the positive and negative contribution of each reference pattern. The algorithm runs periodically to remove reference patterns with a very low positive contribution or a high negative contribution. Experiments were performed on 6,500 gesture patterns collected from 50 adults aged 30-50. Each letter was performed 5 times per participant using a Nintendo® Wii™ remote. The acceleration signal was sampled at 100 Hz on 3 axes. The mean recall rate over all letters was 95.48%. Some letters recorded a very low recall rate and exhibited a very high pairwise confusion rate. Major confusion pairs were D (88%) and P (74%), I (81%) and U (75%), and N (88%) and W (100%).
Though W was recalled perfectly, it contributed much to the false-positive classification of N. By comparison with major previous results from VTT (96% for 8 control gestures), CMU (97% for 10 control gestures), and Samsung Electronics (97% for 10 digits and a control gesture), we find the performance of our system superior considering the number of pattern classes and the complexity of the patterns. Using our gesture interaction system, we conducted 2 case studies of robot-based edutainment services. The services were implemented on various robot platforms and mobile devices, including the iPhone™. The participating children exhibited improved concentration and active reaction to the service with our gesture interface. To prove the effectiveness of our gesture interface, the children took a test after experiencing an English teaching service. The test result showed that those who played with the gesture-interface-based robot content scored 10% better than those taught conventionally. We conclude that the accelerometer-based gesture interface is a promising technology for flourishing real-world robot-based services and content by complementing the limits of today's conventional interfaces, e.g., touch screen, vision, and voice.
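The pruning idea described above (dropping reference patterns by positive/negative contribution) can be sketched with a toy nearest-neighbor IBL classifier. Everything here is a simplified stand-in: the class name, the 2-D "patterns", and the pruning thresholds are illustrative, not the paper's actual algorithm or data.

```python
import numpy as np

# Hypothetical sketch: each stored reference pattern accumulates a positive
# count (times it was the nearest neighbor in a correct classification) and a
# negative count (times it caused a wrong one); high-negative patterns are dropped.
class PrunedIBL:
    def __init__(self):
        self.patterns, self.labels = [], []
        self.pos, self.neg = [], []

    def add(self, x, label):
        self.patterns.append(np.asarray(x, dtype=float))
        self.labels.append(label)
        self.pos.append(0)
        self.neg.append(0)

    def classify(self, x, true_label=None):
        d = [np.linalg.norm(np.asarray(x, dtype=float) - p) for p in self.patterns]
        i = int(np.argmin(d))
        if true_label is not None:           # online feedback updates contributions
            if self.labels[i] == true_label:
                self.pos[i] += 1
            else:
                self.neg[i] += 1
        return self.labels[i]

    def prune(self, min_pos=0, max_neg=2):
        keep = [i for i in range(len(self.patterns))
                if self.pos[i] >= min_pos and self.neg[i] <= max_neg]
        for attr in ("patterns", "labels", "pos", "neg"):
            setattr(self, attr, [getattr(self, attr)[i] for i in keep])

ibl = PrunedIBL()
ibl.add([0.0, 0.0], "N")
ibl.add([1.0, 1.0], "W")
ibl.add([0.1, 0.1], "W")      # a "W" reference sitting inside "N" territory
for _ in range(3):
    ibl.classify([0.08, 0.08], true_label="N")   # nearest is the bad "W" pattern
ibl.prune()                    # the high-negative "W" reference is removed
print(len(ibl.patterns))
```

This mirrors the N/W confusion observed in the abstract: a perfectly recalled class can still hurt its neighbor, and per-pattern contribution tracking makes the offending references identifiable.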

Comparative Analysis of Written Language and Colloquial Language for Information Communication of Multi-Modal Interface Environment (다중 인터페이스 환경에서의 문자언어와 음성언어의 차이에 관한 비교 연구)

  • Choi, In-Hwan;Lee, Kun-Pyo
    • Archives of design research
    • /
    • v.19 no.2 s.64
    • /
    • pp.91-98
    • /
    • 2006
  • Product convergence and complex application environments raise the need for multi-modal interfaces that enable us to interact with products through various human senses. The sense of vision has been used predominantly over the other senses in traditional information-gathering situations, but in a future built on digital network technology, practical use of the various senses will be desired for more convenient and rational use of information appliances. The sense of hearing, whose potential for practical use is becoming higher than ever alongside vision, will find broader and more varied uses in the future. Against this background, this study examined the characteristics of written language and colloquial language and comparatively analyzed the difference between male and female reactions to each. To this end, a literature review of the diverse components of the language system was performed. Then, some peculiar characteristics of the senses of vision and hearing were reviewed, and an appropriate experiment was planned and carried out. The results of the experiment were examined by an objective analysis method. The main results of this study are as follows: first, the reaction time for written language is shorter than for colloquial language; second, there is a partial difference between male and female reactions to the two stimuli; third, there is no selection bias between the sense of sight and the sense of hearing. Continuous development of broad and diverse studies of the various senses is needed, building on this study.


Abdominal-Deformation Measurement for a Shape-Flexible Mannequin Using the 3D Digital Image Correlation

  • Liu, Huan;Hao, Kuangrong;Ding, Yongsheng
    • Journal of Computing Science and Engineering
    • /
    • v.11 no.3
    • /
    • pp.79-91
    • /
    • 2017
  • In this paper, an abdominal-deformation measurement scheme is conducted on a shape-flexible mannequin using the DIC technique in a stereo-vision system. Firstly, during the integer-pixel displacement search, a novel fractal dimension based on an adaptive-ellipse subset area is developed to track an integer pixel between the reference and deformed images. Secondly, at the subpixel registration stage, a new mutual-learning adaptive particle swarm optimization (MLADPSO) algorithm is employed to locate the subpixel precisely. Dynamic adjustment of the particle flight velocities according to the deformation extent of each interest point is utilized to enhance the accuracy of the subpixel registration. A test is performed on the abdominal-deformation measurement of the shape-flexible mannequin. The experimental results indicate that, without any loss of measurement accuracy, the proposed scheme is significantly more time-efficient than the conventional method, particularly for a large number of interest points.
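The integer-pixel displacement search mentioned above is, at its core, a subset-matching problem. The sketch below shows the standard formulation with square subsets and zero-normalized cross-correlation (ZNCC) on synthetic data; the paper's adaptive-ellipse subsets, fractal-dimension criterion, and PSO-based subpixel refinement are deliberately omitted.

```python
import numpy as np

def zncc(a, b):
    """Zero-normalized cross-correlation between two equal-size subsets."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def integer_pixel_search(ref, deformed, top, left, size, search=5):
    """Find the integer (du, dv) offset maximizing ZNCC over a search window."""
    subset = ref[top:top + size, left:left + size]
    best, best_du, best_dv = -2.0, 0, 0
    for dv in range(-search, search + 1):
        for du in range(-search, search + 1):
            r, c = top + dv, left + du
            if 0 <= r and 0 <= c and r + size <= deformed.shape[0] \
                    and c + size <= deformed.shape[1]:
                score = zncc(subset, deformed[r:r + size, c:c + size])
                if score > best:
                    best, best_du, best_dv = score, du, dv
    return best_du, best_dv

rng = np.random.default_rng(3)
ref = rng.random((40, 40))                            # synthetic speckle texture
deformed = np.roll(ref, shift=(2, 3), axis=(0, 1))    # known shift: dv=2, du=3
du, dv = integer_pixel_search(ref, deformed, top=10, left=10, size=11, search=5)
print(du, dv)
```

The paper's contribution sits on top of this baseline: adapting the subset shape to the local texture and replacing the exhaustive subpixel stage with MLADPSO reduces the cost when many interest points must be tracked.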

The Role of the Center for Technology Fusion in Construction (첨단융합건설연구단의 역할)

  • Kim, Hyoung-Kwan;Han, Seung-Heon;Kim, Moon-Kyum
    • Proceedings of the Korean Institute Of Construction Engineering and Management
    • /
    • 2006.11a
    • /
    • pp.229-232
    • /
    • 2006
  • The Center for Technology Fusion in Construction was established on Sep. 30, 2005, with the support of the Korea Ministry of Construction and Transportation and the Korea Institute of Construction and Transportation Technology Evaluation and Planning. It aims to develop the next generation of economic growth engines through the fusion of traditional construction technology and cutting-edge emerging technologies. To achieve this vision, the center seeks to establish a system for systematic, fusion-based construction research. The center's scope focuses on improving the performance of construction projects, including planning, design, construction, and maintenance. Along with the newly developed Korea Construction Technology Road Map, the center is expected to contribute significantly to the development of innovative construction technologies for a world-class Korean society.


A Novel Approach to Enhance Dual-Energy X-Ray Images Using Region of Interest and Discrete Wavelet Transform

  • Ullah, Burhan;Khan, Aurangzeb;Fahad, Muhammad;Alam, Mahmood;Noor, Allah;Saleem, Umar;Kamran, Muhammad
    • Journal of Information Processing Systems
    • /
    • v.18 no.3
    • /
    • pp.319-331
    • /
    • 2022
  • Examining an X-ray image remains a challenging task. In this work, we propose a practical and novel algorithm based on image fusion to address issues such as background noise, blurriness, and lack of sharpness, which curb the quality of dual-energy X-ray images. X-ray scanning is the current technology used for the examination of bags and baggage; however, the incumbent technology yields blurred, low-contrast images. This paper aims to improve the quality of X-ray images for a clearer view of illegitimate or volatile substances. A dataset of 40 images was used for the experiment, but for clarity, results for only 13 images are shown. The results were evaluated using the MSE and PSNR metrics: compared to single X-ray images, the average PSNR value of the proposed system increased by 19.3%, and the MSE value decreased by 17.3%. The results show that the proposed framework will help discern threats and aid the entire scanning process.
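The MSE and PSNR metrics used in the evaluation above have standard definitions, sketched below on synthetic 8-bit images (stand-ins, not the paper's X-ray dataset). Lower MSE corresponds to higher PSNR, which is why the reported changes move in opposite directions.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two images."""
    return float(np.mean((a.astype(float) - b.astype(float)) ** 2))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images (peak = 255)."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)

rng = np.random.default_rng(4)
clean = rng.integers(0, 256, size=(64, 64)).astype(float)    # reference image
noisy = np.clip(clean + rng.normal(0, 10, size=clean.shape), 0, 255)
denoised = np.clip(clean + rng.normal(0, 5, size=clean.shape), 0, 255)

# The less-degraded image scores lower MSE and higher PSNR against the reference.
print(f"noisy:    MSE={mse(clean, noisy):.1f}, PSNR={psnr(clean, noisy):.1f} dB")
print(f"denoised: MSE={mse(clean, denoised):.1f}, PSNR={psnr(clean, denoised):.1f} dB")
```

In the paper's setting, the fused image is scored against a reference in the same way, so a 19.3% average PSNR gain directly reflects the reduction in noise and blur after wavelet-based fusion.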