• Title/Summary/Keyword: Deep Learning

Search Results: 5,800

A Study on the Improvement of Construction Site Worker Detection Performance Using YOLOv5 and OpenPose

  • Yoon, Younggeun;Oh, Taekeun
    • The Journal of the Convergence on Culture Technology / v.8 no.5 / pp.735-740 / 2022
  • Construction is the industry with the highest number of fatalities, and fatalities have not decreased despite various institutional improvements. Accordingly, real-time safety management that applies artificial intelligence (AI) to CCTV images is emerging. Although some research applies AI to construction-site images for worker detection, performance is limited by problems such as the complex backgrounds typical of construction sites. In this study, the YOLO model and the OpenPose model were fused to improve worker detection and posture estimation under various complex conditions; a minimal sketch of such a fusion follows. This is expected to be highly useful for monitoring unsafe behavior and managing the health of workers in the future.
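
As a rough illustration of the fusion described above, a minimal sketch (not the authors' code; `estimate_pose`, the confidence threshold, and the frame path are assumptions, since the exact OpenPose binding depends on the build in use):

```python
import cv2
import torch

# Pretrained YOLOv5 detector from the ultralytics hub (COCO weights).
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

def estimate_pose(person_crop):
    """Placeholder for an OpenPose forward pass; a real setup would return
    body keypoints for the cropped worker image."""
    return []  # stub: wire up the OpenPose Python API here

img = cv2.imread('site_cctv_frame.jpg')            # hypothetical CCTV frame
results = model(img[..., ::-1].copy())             # BGR -> RGB for YOLOv5
for *xyxy, conf, cls in results.xyxy[0].tolist():
    if int(cls) == 0 and conf > 0.4:               # COCO class 0 = 'person'
        x1, y1, x2, y2 = map(int, xyxy)
        keypoints = estimate_pose(img[y1:y2, x1:x2])
        # downstream: classify posture / flag unsafe behavior from keypoints
```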

A Dynamic Correction Technique of Time-Series Data using Anomaly Detection Model based on LSTM-GAN

  • Hanseok Jeong;Han-Joon Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.2 / pp.103-111 / 2023
  • This paper proposes a new data correction technique that transforms anomalies in time-series data into normal values. With the recent development of IT technology, vast amounts of time-series data are being collected through sensors. However, due to sensor failures and abnormal environments, most time-series data contain many anomalies. If we build a predictive model using the original data, anomalies included, we cannot expect highly reliable predictive performance. Therefore, we utilize the LSTM-GAN model to detect anomalies in the original time series, and combine DTW (Dynamic Time Warping) and GAN techniques to replace the anomalous data with normal data in partitioned window units; a simplified skeleton of this windowed correction is sketched below. The basic idea is to construct a GAN model serially, applying the statistical information of the adjacent windows containing normally distributed data to the DTW alignment of the window containing the detected anomalies, so as to generate normal time-series data. Through experiments using the open NAB data, we empirically show that the proposed method outperforms two conventional correction methods.
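
A highly simplified skeleton of the windowed idea (our construction under stated assumptions: an LSTM autoencoder stands in for the LSTM-GAN detector, and copying the adjacent normal window stands in for the DTW/GAN generation step):

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Reconstruction-based stand-in for the LSTM-GAN anomaly detector;
    assumed to have been trained on normal windows beforehand."""
    def __init__(self, hidden=32):
        super().__init__()
        self.enc = nn.LSTM(1, hidden, batch_first=True)
        self.dec = nn.LSTM(hidden, 1, batch_first=True)

    def forward(self, x):                 # x: (num_windows, win_len, 1)
        z, _ = self.enc(x)
        out, _ = self.dec(z)
        return out

def correct(series, win_len, model, thresh):
    # series length is assumed divisible by win_len for this sketch
    x = torch.tensor(series, dtype=torch.float32).view(-1, win_len, 1)
    err = ((model(x) - x) ** 2).mean(dim=(1, 2))    # per-window error
    fixed = x.clone()
    for i in torch.nonzero(err > thresh).flatten().tolist():
        j = i - 1 if i > 0 else i + 1               # adjacent normal window
        fixed[i] = fixed[j]  # crude replacement; the paper instead generates
                             # data matched to the neighbor via DTW + GAN
    return fixed.flatten().numpy()
```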

Performance Enhancement of Speech Declipping using Clipping Detector

  • Eunmi Seo;Jeongchan Yu;Yujin Lim;Hochong Park
    • Journal of Broadcast Engineering / v.28 no.1 / pp.132-140 / 2023
  • In this paper, we propose a method for enhancing the performance of speech declipping using a clipping detector. Clipping occurs when the input speech level exceeds the dynamic range of the microphone, and it significantly degrades speech quality. Recently, many high-performance machine-learning-based methods for speech declipping have been developed. However, when the degree of clipping is not high, they often deteriorate the speech signal because of degradation introduced in the signal reconstruction process. To solve this problem, we propose a new approach that combines a declipping network with a clipping detector, which enables a selective declipping operation depending on the clipping level and provides high-quality speech at all clipping levels; the selection logic is sketched below. We measured declipping performance using various metrics and confirmed that the proposed method improves the average performance over all clipping levels compared with conventional methods, and greatly improves performance when the clipping distortion is small.
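
The selective pipeline can be sketched in a few lines (our reading, not the authors' code; the peak-pinning detector and the 1% threshold are assumptions):

```python
import numpy as np

def clipping_ratio(x, tol=1e-3):
    """Fraction of samples pinned at the waveform's peak level,
    a simple proxy for the degree of clipping."""
    peak = np.max(np.abs(x))
    return float(np.mean(np.abs(x) >= peak - tol))

def selective_declip(x, declip_network, threshold=0.01):
    # Run the (separately trained) declipping network only when the detector
    # indicates meaningful clipping; otherwise pass the signal through
    # untouched, avoiding reconstruction damage on lightly clipped speech.
    return declip_network(x) if clipping_ratio(x) > threshold else x
```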

A Study on Reconstruction Performance of Phase-only Holograms with Varying Propagation Distance (Analysis of Phase-only Hologram Reconstruction Performance over Propagation Distance and Methods for Improving BL-ASM)

  • Jun Yeong Cha;Hyun Min Ban;Seung Mi Choi;Jin Woong Kim;Hui Yong Kim
    • Journal of Broadcast Engineering / v.28 no.1 / pp.3-20 / 2023
  • A computer-generated hologram (CGH) is a digitally calculated and recorded hologram in which the amplitude and phase information of an image is propagated through free space. A CGH takes the form of a complex hologram, but it is converted into a phase-only hologram for display on a phase-only spatial light modulator (SLM). In this paper, we showed experimentally that when the amplitude information of an object is encoded into the phase information using a technique that involves subsampling, such as DPAC, the bandwidth of the phase-only hologram increases, and as a result, aliasing that was not present in the complex hologram can occur. We also showed experimentally that a high-quality phase-only hologram can be generated by restricting the spatial-frequency range, even at distances where numerical reconstruction performance is degraded by aliasing; a standard band-limited propagation routine is sketched below.
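
For context, a sketch of standard band-limited angular spectrum propagation (the textbook BL-ASM with Matsushima's frequency limit, not the authors' specific improvement; grid size, pixel pitch, and wavelength in the usage comment are placeholders):

```python
import numpy as np

def bl_asm(field, wavelength, pitch, z):
    """Propagate a sampled complex field a distance z with the band-limited
    angular spectrum method, zeroing frequencies the sampling cannot support."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pitch)
    FX, FY = np.meshgrid(fx, fx)
    arg = np.maximum(1 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2, 0)
    H = np.exp(2j * np.pi * z / wavelength * np.sqrt(arg))
    # Matsushima's band limit suppresses aliasing of the transfer function.
    f_limit = 1 / (wavelength * np.sqrt((2 * z / (n * pitch)) ** 2 + 1))
    H[(np.abs(FX) > f_limit) | (np.abs(FY) > f_limit)] = 0
    return np.fft.ifft2(np.fft.fft2(field) * H)

# e.g., reconstruct a 1024x1024 phase-only hologram at 8 um pitch, 532 nm:
# recon = bl_asm(np.exp(1j * phase), 532e-9, 8e-6, 0.1)
```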

Ensemble-based deep learning for autonomous bridge component and damage segmentation leveraging Nested Reg-UNet

  • Abhishek Subedi;Wen Tang;Tarutal Ghosh Mondal;Rih-Teng Wu;Mohammad R. Jahanshahi
    • Smart Structures and Systems / v.31 no.4 / pp.335-349 / 2023
  • Bridges constantly undergo deterioration and damage, the most common being concrete damage and exposed rebar. Periodic inspection of bridges to identify damage can aid in quick remediation. Likewise, identifying components can provide context for damage assessment and help gauge a bridge's state of interaction with its surroundings. Current inspection techniques rely on manual site visits, which can be time-consuming and costly. More recently, robotic inspection assisted by autonomous data analytics based on Computer Vision (CV) and Artificial Intelligence (AI) has been viewed as a suitable alternative to manual inspection because of its efficiency and accuracy. To aid research in this avenue, this study performs a comparative assessment of different architectures, loss functions, and ensembling strategies for the autonomous segmentation of bridge components and damages. The experiments lead to several interesting discoveries. The Nested Reg-UNet architecture is found to outperform five other state-of-the-art architectures in both damage and component segmentation tasks. The architecture is built by combining a Nested UNet style dense configuration with a pretrained RegNet encoder (a sketch of this pairing follows). In terms of the mean Intersection over Union (mIoU) metric, the Nested Reg-UNet architecture provides an improvement of 2.86% on the damage segmentation task and 1.66% on the component segmentation task compared to the state-of-the-art UNet architecture. Furthermore, it is demonstrated that incorporating the Lovasz-Softmax loss function to counter class imbalance can boost performance by 3.44% in the component segmentation task over the most commonly employed alternative, weighted Cross-Entropy (wCE). Finally, weighted softmax ensembling is found to be quite effective when used synchronously with the Nested Reg-UNet architecture, providing mIoU improvements of 0.74% in the component segmentation task and 1.14% in the damage segmentation task over a single-architecture baseline. Overall, the best mIoU of 92.50% for the component segmentation task and 84.19% for the damage segmentation task validates the feasibility of these techniques for autonomous bridge component and damage segmentation using RGB images.
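
The described pairing maps naturally onto off-the-shelf components; a hedged sketch using segmentation_models_pytorch (the encoder name and class count are illustrative assumptions, not the authors' exact configuration):

```python
import torch
import segmentation_models_pytorch as smp
from segmentation_models_pytorch.losses import LovaszLoss

# UNet++ ("nested" dense skip connections) over a pretrained RegNet encoder.
model = smp.UnetPlusPlus(
    encoder_name="timm-regnety_032",
    encoder_weights="imagenet",
    classes=5,                        # e.g., background + bridge components
)
criterion = LovaszLoss(mode="multiclass")  # counters class imbalance

# One illustrative training step on random tensors in place of real data.
images = torch.randn(2, 3, 256, 256)
masks = torch.randint(0, 5, (2, 256, 256))
loss = criterion(model(images), masks)
loss.backward()
```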

Turbulent-image Restoration Based on a Compound Multibranch Feature Fusion Network

  • Banglian Xu;Yao Fang;Leihong Zhang;Dawei Zhang;Lulu Zheng
    • Current Optics and Photonics / v.7 no.3 / pp.237-247 / 2023
  • In middle- and long-distance imaging systems, atmospheric turbulence caused by temperature, wind speed, humidity, and so on distorts light waves propagating through the air, resulting in image-quality degradation such as geometric deformation and blurring. In remote sensing, astronomical observation, and traffic monitoring, the information lost to such degradation is costly, so effective restoration of degraded images is very important. To restore images degraded by atmospheric turbulence, an image-restoration method based on an improved compound multibranch feature fusion network (CMFNetPro) was proposed. Building on the CMFNet network, an efficient channel-attention mechanism replaces the original channel attention to improve image quality and network efficiency; a sketch of such a block follows. In the experiments, two-dimensional random distortion vector fields were used to construct two turbulent datasets with different degrees of distortion from the Google Landmarks Dataset v2. The results show that, compared to the CMFNet, DeblurGAN-v2, and MIMO-UNet models, the proposed CMFNetPro network achieves better performance in both quality and training cost of turbulent-image restoration. In mixed training, CMFNetPro exceeded CMFNet by 1.2391 dB (weak turbulence) and 0.8602 dB (strong turbulence) in peak signal-to-noise ratio, and by 0.0015 (weak turbulence) and 0.0136 (strong turbulence) in structural similarity, while training 14.4 hours faster. This provides a feasible scheme for turbulent-image restoration based on deep learning.
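
A minimal sketch of an efficient channel-attention (ECA-style) block of the kind swapped in (the kernel size is an assumption):

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: a lightweight 1D convolution across
    channel descriptors instead of a fully connected bottleneck."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                         # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                    # global average pool -> (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # local cross-channel mixing
        return x * torch.sigmoid(y)[:, :, None, None]
```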

Forecasting volatility index by temporal convolutional neural network

  • Ji Won Shin;Dong Wan Shin
    • The Korean Journal of Applied Statistics / v.36 no.2 / pp.129-139 / 2023
  • Forecasting volatility is essential for avoiding the risk caused by the uncertainties of a financial asset. Complicated features of financial volatility, such as the ambiguity between non-stationarity and stationarity, asymmetry, long memory, and sudden fairly large outlier-like values, pose great challenges to volatility forecasting. To address such complicated features implicitly, we consider machine learning models such as LSTM (1997) and GRU (2014), which are known to be suitable for time-series forecasting. However, these models suffer from vanishing gradients, an enormous amount of computation, and large memory requirements. To alleviate these problems, a causal temporal convolutional network (TCN), an advanced form of 1D CNN, is also applied; a minimal block is sketched below. It is confirmed that the overall forecasting power of the TCN model is higher than that of the RNN models in forecasting VIX, VXD, and VXN, the daily volatility indices of the S&P 500, DJIA, and Nasdaq, respectively.
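
A minimal causal TCN of the kind compared here (channel widths, kernel size, and dilations are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Conv1d):
    """Dilated 1D convolution with left-only padding, so each output
    depends only on current and past inputs (no look-ahead)."""
    def __init__(self, c_in, c_out, k, dilation):
        super().__init__(c_in, c_out, k, dilation=dilation)
        self.left_pad = (k - 1) * dilation

    def forward(self, x):
        return super().forward(F.pad(x, (self.left_pad, 0)))

tcn = nn.Sequential(
    CausalConv1d(1, 16, 3, 1), nn.ReLU(),
    CausalConv1d(16, 16, 3, 2), nn.ReLU(),
    CausalConv1d(16, 1, 3, 4),            # receptive field grows with dilation
)
vix_window = torch.randn(1, 1, 250)       # e.g., ~1 year of daily index values
next_value = tcn(vix_window)[..., -1]     # one-step-ahead forecast
```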

Estimation of Urban Traffic State Using Black Box Camera

  • Haechan Cho;Yeohwan Yoon;Hwasoo Yeo
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.22 no.2 / pp.133-146 / 2023
  • Traffic states in urban areas are essential for implementing effective traffic operation and control. However, installing traffic sensors on numerous road sections is extremely expensive. Accordingly, estimating the traffic state using vehicle-mounted cameras, which have a high penetration rate, is a more cost-effective solution. However, previously proposed methodologies using object tracking or optical flow have a high computational cost and require consecutive frames to obtain traffic states. Accordingly, we propose a method that detects vehicles and lanes with object detection networks and sets the region between lanes as a region of interest to estimate the traffic density of the corresponding area; the counting step is sketched below. The proposed method uses only less computationally expensive object detection models and can estimate traffic states from sampled frames rather than consecutive ones. In addition, the traffic density estimation accuracy was over 90% on black box videos collected from two buses with different characteristics.
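
A sketch of the ROI counting step (our construction, not the authors' code; the detector choice, class IDs, confidence threshold, and ROI handling are assumptions):

```python
import cv2
import numpy as np
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
VEHICLE_CLASSES = {2, 3, 5, 7}            # COCO: car, motorcycle, bus, truck

def roi_vehicle_count(frame, roi_polygon):
    """Count detected vehicles whose box centers fall inside the lane ROI.
    `roi_polygon` is an (N, 2) int array of pixel coordinates."""
    mask = np.zeros(frame.shape[:2], np.uint8)
    cv2.fillPoly(mask, [roi_polygon], 1)
    count = 0
    for *xyxy, conf, cls in model(frame[..., ::-1].copy()).xyxy[0].tolist():
        if int(cls) in VEHICLE_CLASSES and conf > 0.4:
            cx = int((xyxy[0] + xyxy[2]) / 2)
            cy = int((xyxy[1] + xyxy[3]) / 2)
            count += int(mask[cy, cx])
    return count  # divide by the ROI's road length to obtain density
```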

A Study on the Generation of Webtoons through Fine-Tuning of Diffusion Models

  • Kyungho Yu;Hyungju Kim;Jeongin Kim;Chanjun Chun;Pankoo Kim
    • Smart Media Journal / v.12 no.7 / pp.76-83 / 2023
  • This study proposes a method to assist webtoon artists in the webtoon creation process by utilizing a pretrained Text-to-Image model to generate webtoon images from text. The proposed approach fine-tunes a pretrained Stable Diffusion model on a webtoon dataset transformed into the desired webtoon style. Using the LoRA technique, the fine-tuning process completes in approximately 4.5 hours over 30,000 steps. The generated images render shapes and backgrounds that follow the input text, resulting in webtoon-like images; an inference sketch follows. Furthermore, quantitative evaluation using the Inception score shows that the proposed method outperforms DCGAN-based Text-to-Image models. If webtoon artists adopt the proposed Text-to-Image model for webtoon creation, it is expected to significantly reduce the time required for the creative process.
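
An inference-side sketch with the diffusers library (the base checkpoint, LoRA path, and prompt are placeholders for whatever the fine-tuning run produced):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion pipeline, then the LoRA weights produced by
# fine-tuning on a webtoon-style dataset.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/webtoon-lora")

image = pipe("a student walking to school, webtoon style").images[0]
image.save("webtoon_panel.png")
```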

Performance Evaluation of Efficient Vision Transformers on Embedded Edge Platforms

  • Minha Lee;Seongjae Lee;Taehyoun Kim
    • IEMEK Journal of Embedded Systems and Applications / v.18 no.3 / pp.89-100 / 2023
  • Recently, on-device artificial intelligence (AI) solutions using mobile devices and embedded edge devices have emerged in various fields, such as computer vision, to address network traffic burdens, low-energy operation, and security problems. Although vision transformer deep learning models have outperformed conventional convolutional neural network (CNN) models in computer vision, they require more computation and parameters than CNN models. Thus, they are not directly applicable to embedded edge devices with limited hardware resources. Many researchers have proposed various model compression methods and lightweight architectures for vision transformers; however, only a few studies evaluate the effects of these compression techniques on performance. To address this gap, this paper presents a performance evaluation of vision transformers on embedded platforms. We investigated the behaviors of three vision transformers: DeiT, LeViT, and MobileViT. Each model's performance was evaluated in terms of accuracy and inference time on edge devices using the ImageNet dataset. We assessed the effects of quantization on latency enhancement and accuracy degradation by profiling the proportion of response time occupied by major operations; a measurement sketch follows. In addition, we evaluated the performance of each model on GPU- and EdgeTPU-based edge devices. In our experimental results, LeViT showed the best performance on CPU-based edge devices, and DeiT-small showed the highest performance improvement on GPU-based edge devices. In addition, only the MobileViT models showed performance improvement on EdgeTPU. Summarizing the profiling analysis, the degree of performance improvement of each vision transformer model was highly dependent on the proportion of operations that could be optimized on the target edge device. In summary, to apply vision transformers to on-device AI solutions, both proper operation composition and optimizations specific to the target edge device must be considered.
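
A sketch of the kind of quantization-latency measurement described (the model choice, dynamic quantization of linear layers, and the timing loop are assumptions, not the paper's exact protocol):

```python
import time
import torch
from timm import create_model

# Pretrained DeiT-small; dynamic quantization converts linear layers to int8,
# which dominate a vision transformer's compute on CPU.
model = create_model("deit_small_patch16_224", pretrained=True).eval()
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 3, 224, 224)
for name, m in [("fp32", model), ("int8-dynamic", quantized)]:
    with torch.no_grad():
        start = time.perf_counter()
        for _ in range(20):
            m(x)
        print(name, (time.perf_counter() - start) / 20, "s per inference")
```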