Search | Korea Science

Deep Learning Based Monocular Depth Estimation: Survey

Lee, Chungkeun;Shim, Dongseok;Kim, H. Jin
- Journal of Positioning, Navigation, and Timing
- /
- v.10 no.4
- /
- pp.297-305
- /
- 2021
Monocular depth estimation helps the robot to understand the surrounding environments in 3D. Especially, deep-learning-based monocular depth estimation has been widely researched, because it may overcome the scale ambiguity problem, which is a main issue in classical methods. Those learning based methods can be mainly divided into three parts: supervised learning, unsupervised learning, and semi-supervised learning. Supervised learning trains the network from dense ground-truth depth information, unsupervised one trains it from images sequences and semi-supervised one trains it from stereo images and sparse ground-truth depth. We describe the basics of each method, and then explain the recent research efforts to enhance the depth estimation performance.
https://doi.org/10.11003/JPNT.2021.10.4.297 인용 PDF KSCI HTML

AdaMM-DepthNet: Unsupervised Adaptive Depth Estimation Guided by Min and Max Depth Priors for Monocular Images

Bello, Juan Luis Gonzalez;Kim, Munchurl
- Proceedings of the Korean Society of Broadcast Engineers Conference
- /
- 2020.11a
- /
- pp.252-255
- /
- 2020
Unsupervised deep learning methods have shown impressive results for the challenging monocular depth estimation task, a field of study that has gained attention in recent years. A common approach for this task is to train a deep convolutional neural network (DCNN) via an image synthesis sub-task, where additional views are utilized during training to minimize a photometric reconstruction error. Previous unsupervised depth estimation networks are trained within a fixed depth estimation range, irrespective of its possible range for a given image, leading to suboptimal estimates. To overcome this suboptimal limitation, we first propose an unsupervised adaptive depth estimation method guided by minimum and maximum (min-max) depth priors for a given input image. The incorporation of min-max depth priors can drastically reduce the depth estimation complexity and produce depth estimates with higher accuracy. Moreover, we propose a novel network architecture for adaptive depth estimation, called the AdaMM-DepthNet, which adopts the min-max depth estimation in its front side. Intensive experimental results demonstrate that the adaptive depth estimation can significantly boost up the accuracy with a fewer number of parameters over the conventional approaches with a fixed minimum and maximum depth range.
PDF

Improving Detection Range for Short Baseline Stereo Cameras Using Convolutional Neural Networks and Keypoint Matching (컨볼루션 뉴럴 네트워크와 키포인트 매칭을 이용한 짧은 베이스라인 스테레오 카메라의 거리 센싱 능력 향상)

Byungjae Park
- Journal of Sensor Science and Technology
- /
- v.33 no.2
- /
- pp.98-104
- /
- 2024
This study proposes a method to overcome the limited detection range of short-baseline stereo cameras (SBSCs). The proposed method includes two steps: (1) predicting an unscaled initial depth using monocular depth estimation (MDE) and (2) adjusting the unscaled initial depth by a scale factor. The scale factor is computed by triangulating the sparse visual keypoints extracted from the left and right images of the SBSC. The proposed method allows the use of any pre-trained MDE model without the need for additional training or data collection, making it efficient even when considering the computational constraints of small platforms. Using an open dataset, the performance of the proposed method was demonstrated by comparing it with other conventional stereo-based depth estimation methods.
https://doi.org/10.46670/JSST.2024.33.2.98 인용 PDF

3D Depth Information Extraction Algorithm Based on Motion Estimation in Monocular Video Sequence (단안 영상 시퀸스에서 움직임 추정 기반의 3차원 깊이 정보 추출 알고리즘)

Park, Jun-Ho;Jeon, Dae-Seong;Yun, Yeong-U
- The KIPS Transactions:PartB
- /
- v.8B no.5
- /
- pp.549-556
- /
- 2001
The general problems of recovering 3D for 2D imagery require the depth information for each picture element form focus. The manual creation of those 3D models is consuming time and cost expensive. The goal in this paper is to simplify the depth estimation algorithm that extracts the depth information of every region from monocular image sequence with camera translation to implement 3D video in realtime. The paper is based on the property that the motion of every point within image which taken from camera translation depends on the depth information. Full-search motion estimation based on block matching algorithm is exploited at first step and ten, motion vectors are compensated for the effect by camera rotation and zooming. We have introduced the algorithm that estimates motion of object by analysis of monocular motion picture and also calculates the averages of frame depth and relative depth of region to the average depth. Simulation results show that the depth of region belongs to a near object or a distant object is in accord with relative depth that human visual system recognizes.
PDF

Benchmark for Deep Learning based Visual Odometry and Monocular Depth Estimation (딥러닝 기반 영상 주행기록계와 단안 깊이 추정 및 기술을 위한 벤치마크)

Choi, Hyukdoo
- The Journal of Korea Robotics Society
- /
- v.14 no.2
- /
- pp.114-121
- /
- 2019
This paper presents a new benchmark system for visual odometry (VO) and monocular depth estimation (MDE). As deep learning has become a key technology in computer vision, many researchers are trying to apply deep learning to VO and MDE. Just a couple of years ago, they were independently studied in a supervised way, but now they are coupled and trained together in an unsupervised way. However, before designing fancy models and losses, we have to customize datasets to use them for training and testing. After training, the model has to be compared with the existing models, which is also a huge burden. The benchmark provides input dataset ready-to-use for VO and MDE research in 'tfrecords' format and output dataset that includes model checkpoints and inference results of the existing models. It also provides various tools for data formatting, training, and evaluation. In the experiments, the exsiting models were evaluated to verify their performances presented in the corresponding papers and we found that the evaluation result is inferior to the presented performances.
https://doi.org/10.7746/jkros.2019.14.2.114 인용 PDF KSCI

Unsupervised Monocular Depth Estimation Using Self-Attention for Autonomous Driving (자율주행을 위한 Self-Attention 기반 비지도 단안 카메라 영상 깊이 추정)

Seung-Jun Hwang;Sung-Jun Park;Joong-Hwan Baek
- Journal of Advanced Navigation Technology
- /
- v.27 no.2
- /
- pp.182-189
- /
- 2023
Depth estimation is a key technology in 3D map generation for autonomous driving of vehicles, robots, and drones. The existing sensor-based method has high accuracy but is expensive and has low resolution, while the camera-based method is more affordable with higher resolution. In this study, we propose self-attention-based unsupervised monocular depth estimation for UAV camera system. Self-Attention operation is applied to the network to improve the global feature extraction performance. In addition, we reduce the weight size of the self-attention operation for a low computational amount. The estimated depth and camera pose are transformed into point cloud. The point cloud is mapped into 3D map using the occupancy grid of Octree structure. The proposed network is evaluated using synthesized images and depth sequences from the Mid-Air dataset. Our network demonstrates a 7.69% reduction in error compared to prior studies.
https://doi.org/10.12673/jant.2023.27.2.182 인용 PDF HTML

Single Image Depth Estimation With Integration of Parametric Learning and Non-Parametric Sampling

Jung, Hyungjoo;Sohn, Kwanghoon
- Journal of Korea Multimedia Society
- /
- v.19 no.9
- /
- pp.1659-1668
- /
- 2016
Understanding 3D structure of scenes is of a great interest in various vision-related tasks. In this paper, we present a unified approach for estimating depth from a single monocular image. The key idea of our approach is to take advantages both of parametric learning and non-parametric sampling method. Using a parametric convolutional network, our approach learns the relation of various monocular cues, which make a coarse global prediction. We also leverage the local prediction to refine the global prediction. It is practically estimated in a non-parametric framework. The integration of local and global predictions is accomplished by concatenating the feature maps of the global prediction with those from local ones. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods both qualitatively and quantitatively.
https://doi.org/10.9717/kmms.2016.19.9.1659 인용 PDF KSCI KPUBS HTML

Study of Accomodation-lag using Monocular Estimation Method(MEM) (단안평가법(MEM)을 이용한 조절지체에 대한 연구)

Park, Eun-kyoo;Seo, Jung-Ick
- Journal of the Korea society of information convergence
- /
- v.6 no.2
- /
- pp.51-55
- /
- 2013
The accomodation is made to see near objects. This accomodation have different characteristics from individual to individual. Difference also occurs accommodation of the theory and real. This is accomodative-lag. Depth of focus directly affects the accomodative-lag. Depth of focus is affected by the refractive power and the size of the pupil. Depth of focus becomes deeper as the size of the pupil is small, the refractive power is increased. The accomodative-lag occur more as depth of focus is deep. In this paper, a study was made of the relationship of the accomodative-lag and refractive power. A Monocular Estimation Method(MEM) use for measuring the accomodative-lag. Results were measured by MEM, it tended to increase the refractive power so as to increase the accodative-lag. The accomodative-lag amount was measured at 0.51D. Men were measured at 0.52D, women were measured at 0.49D. The accomodative-lag by gender tended also increases the amount of refractive power increases.
PDF

High-Quality Depth Map Generation of Humans in Monocular Videos (단안 영상에서 인간 오브젝트의 고품질 깊이 정보 생성 방법)

Lee, Jungjin;Lee, Sangwoo;Park, Jongjin;Noh, Junyong
- Journal of the Korea Computer Graphics Society
- /
- v.20 no.2
- /
- pp.1-11
- /
- 2014
The quality of 2D-to-3D conversion depends on the accuracy of the assigned depth to scene objects. Manual depth painting for given objects is labor intensive as each frame is painted. Specifically, a human is one of the most challenging objects for a high-quality conversion, as a human body is an articulated figure and has many degrees of freedom (DOF). In addition, various styles of clothes, accessories, and hair create a very complex silhouette around the 2D human object. We propose an efficient method to estimate visually pleasing depths of a human at every frame in a monocular video. First, a 3D template model is matched to a person in a monocular video with a small number of specified user correspondences. Our pose estimation with sequential joint angular constraints reproduces a various range of human motions (i.e., spine bending) by allowing the utilization of a fully skinned 3D model with a large number of joints and DOFs. The initial depth of the 2D object in the video is assigned from the matched results, and then propagated toward areas where the depth is missing to produce a complete depth map. For the effective handling of the complex silhouettes and appearances, we introduce a partial depth propagation method based on color segmentation to ensure the detail of the results. We compared the result and depth maps painted by experienced artists. The comparison shows that our method produces viable depth maps of humans in monocular videos efficiently.
PDF KSCI

Pedestrian and Vehicle Distance Estimation Based on Hard Parameter Sharing (하드 파라미터 쉐어링 기반의 보행자 및 운송 수단 거리 추정)

Seo, Ji-Won;Cha, Eui-Young
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.26 no.3
- /
- pp.389-395
- /
- 2022
Because of improvement of deep learning techniques, deep learning using computer vision such as classification, detection and segmentation has also been used widely at many fields. Expecially, automatic driving is one of the major fields that applies computer vision systems. Also there are a lot of works and researches to combine multiple tasks in a single network. In this study, we propose the network that predicts the individual depth of pedestrians and vehicles. Proposed model is constructed based on YOLOv3 for object detection and Monodepth for depth estimation, and it process object detection and depth estimation consequently using encoder and decoder based on hard parameter sharing. We also used attention module to improve the accuracy of both object detection and depth estimation. Depth is predicted with monocular image, and is trained using self-supervised training method.
https://doi.org/10.6109/jkiice.2022.26.3.389 인용 PDF KSCI

Search Result 23, Processing Time 0.025 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)