• Title/Summary/Keyword: 객체기반 영상분류 (object-based image classification)


On Optimizing Dissimilarity-Based Classifications Using a DTW and Fusion Strategies (DTW와 퓨전기법을 이용한 비유사도 기반 분류법의 최적화)

  • Kim, Sang-Woon; Kim, Seung-Hwan
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.2 / pp.21-28 / 2010
  • This paper reports an experimental result on optimizing dissimilarity-based classification (DBC) by simultaneously using dynamic time warping (DTW) and a multiple fusion strategy (MFS). DBC is a way of defining classifiers among classes; they are based not on the feature measurements of individual samples, but rather on a suitable dissimilarity measure among the samples. In DTW, the dissimilarity is measured in two steps: first, we align the object samples by finding the best warping path with a correlation coefficient-based DTW technique; we then compute the dissimilarity distance between the aligned objects with conventional measures. In MFS, fusion strategies are used repeatedly, both in generating dissimilarity matrices and in designing classifiers: we first combine the dissimilarity matrices obtained with the DTW technique into a new matrix, and after training base classifiers on the new matrix, we combine the results of those base classifiers again. Our experimental results for well-known benchmark databases demonstrate that the proposed mechanism further improves classification accuracy compared with previous approaches. These results suggest that the method could also be applied to other high-dimensional tasks, such as multimedia information retrieval.
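
The two-step scheme is easy to prototype. Below is a minimal Python sketch of DBC using a plain dynamic-programming DTW, not the authors' correlation-coefficient-based variant or the MFS fusion: samples are represented by their DTW distances to a few prototypes, and a standard classifier is trained in that dissimilarity space. All names and toy data are illustrative.

```python
# Minimal sketch of dissimilarity-based classification (DBC) with DTW.
# Plain Euclidean DTW is used here; the paper's correlation-based warping
# and multiple fusion strategy (MFS) are not reproduced.
import numpy as np
from sklearn.svm import SVC

def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def dissimilarity_matrix(samples, prototypes):
    """Represent each sample by its DTW distances to a prototype set."""
    return np.array([[dtw_distance(s, p) for p in prototypes] for s in samples])

# Toy data: two classes of short, noisy time series (hypothetical).
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 30)
X = [np.sin(t) + 0.1 * rng.standard_normal(30) for _ in range(20)]
X += [np.cos(t) + 0.1 * rng.standard_normal(30) for _ in range(20)]
y = [0] * 20 + [1] * 20

prototypes = [X[0], X[5], X[20], X[25]]   # a small representative subset
D = dissimilarity_matrix(X, prototypes)   # samples now live in dissimilarity space
clf = SVC(kernel="rbf").fit(D, y)         # any base classifier works here
print(clf.score(D, y))
```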

Visual Verb and ActionNet Database for Semantic Visual Understanding (동영상 시맨틱 이해를 위한 시각 동사 도출 및 액션넷 데이터베이스 구축)

  • Bae, Changseok; Kim, Bo Kyeong
    • The Journal of Korean Institute of Next Generation Computing / v.14 no.5 / pp.19-30 / 2018
  • Visual information understanding is known as one of the most difficult and challenging problems in the realization of machine intelligence. This paper proposes deriving visual verbs and constructing the ActionNet database as a video database for video semantic understanding. Although the development of AI (artificial intelligence) algorithms has driven a large part of modern advances in AI technologies, large databases for algorithm development and testing play an equally important role. As the performance of object recognition algorithms on still images surpasses human ability, research interest is shifting to the semantic understanding of video content. This paper proposes candidate visual verbs required for the construction of ActionNet as a learning and test database for video understanding. To this end, we first investigate verb taxonomy in linguistics, and then propose visual verb candidates based on a video description database and verb frequencies. Based on the derived candidates, we define and construct the ActionNet schema and database. By expanding the usability of the ActionNet database in an open environment, we expect to contribute to the development of video understanding technologies.
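
As a rough illustration of frequency-based candidate selection, the hypothetical Python sketch below counts verbs of interest in a toy caption corpus and ranks them; the paper's taxonomy-driven criteria and the actual ActionNet schema are not reproduced, and a real pipeline would use a POS tagger rather than a fixed verb list.

```python
# Hypothetical sketch: ranking visual verb candidates by frequency in a
# video description corpus. Captions and the verb list are made up.
from collections import Counter

captions = [
    "a man throws a ball to a dog",
    "a woman runs along the beach",
    "a dog catches the ball and runs away",
]
# Assumed closed list of verbs of interest; in practice a POS tagger
# would identify verbs instead of this hand-written set.
VISUAL_VERBS = {"throw", "throws", "run", "runs", "catch", "catches", "walk"}

counts = Counter(w for c in captions for w in c.lower().split() if w in VISUAL_VERBS)
candidates = [verb for verb, freq in counts.most_common() if freq >= 1]
print(candidates)  # e.g. ['runs', 'throws', 'catches']
```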

Classification of Feature Points Required for Multi-Frame Based Building Recognition (멀티 프레임 기반 건물 인식에 필요한 특징점 분류)

  • Park, Si-young; An, Ha-eun; Lee, Gyu-cheol; Yoo, Ji-sang
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.3 / pp.317-327 / 2016
  • The extraction of significant feature points from a video is directly tied to the performance of the proposed method. In particular, feature points in occlusion regions caused by trees or people, or points extracted from background elements such as the sky or mountains rather than from objects, are insignificant and can degrade matching and recognition performance. This paper classifies the feature points required for building recognition using multiple frames in order to improve recognition performance. First, primary feature points are extracted with SIFT (scale-invariant feature transform) and mismatched feature points are removed. To categorize the feature points in occlusion regions, RANSAC (random sample consensus) is applied. Since the classified feature points are acquired through matching, a single feature point can carry multiple descriptors, so a process that consolidates them is also proposed. Experiments verified the effectiveness of the proposed method.
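
The pipeline maps naturally onto OpenCV. The sketch below (assuming the opencv-python package with SIFT available, i.e. OpenCV 4.4+) performs ratio-test matching between two frames and then uses a RANSAC homography to separate inlier building points from outlier occlusion/background points; the descriptor-consolidation step is omitted.

```python
# Sketch of the multi-frame idea: SIFT matching with a ratio test, then
# RANSAC to separate building (inlier) points from occlusion/background
# (outlier) points. Not the authors' exact classification procedure.
import cv2
import numpy as np

def classify_feature_points(img1, img2):
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Lowe's ratio test removes ambiguous (mismatched) feature points.
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC homography: inliers follow the dominant (building) geometry,
    # while outliers tend to lie on occluders or the background.
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    inliers = [g for g, keep in zip(good, mask.ravel()) if keep]
    return inliers, H
```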

Atrous Residual U-Net for Semantic Segmentation in Street Scenes based on Deep Learning (딥러닝 기반 거리 영상의 Semantic Segmentation을 위한 Atrous Residual U-Net)

  • Shin, SeokYong; Lee, SangHun; Han, HyunHo
    • Journal of Convergence for Information Technology / v.11 no.10 / pp.45-52 / 2021
  • In this paper, we propose an Atrous Residual U-Net (AR-UNet) to improve the segmentation accuracy of U-Net-based semantic segmentation. The U-Net is widely used in fields such as medical image analysis, autonomous vehicles, and remote sensing. The conventional U-Net extracts too few features because of the small number of convolution layers in its encoder. These extracted features are essential for classifying object categories, and when they are insufficient, segmentation accuracy suffers. To address this problem, we propose the AR-UNet, which applies residual learning and ASPP in the encoder. Residual learning improves feature extraction and is effective in preventing the feature loss and vanishing gradient problems caused by stacked convolutions. In addition, ASPP enables additional feature extraction without reducing the resolution of the feature map. Experiments on the Cityscapes dataset verified the effectiveness of the AR-UNet: it produced improved segmentation results compared to the conventional U-Net. AR-UNet can thus contribute to the many applications where segmentation accuracy matters.
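
A minimal PyTorch sketch of the two encoder ingredients named above, a residual block and an ASPP module with parallel atrous convolutions, follows. Channel counts and the dilation rates (1, 6, 12, 18) are illustrative, not the paper's exact configuration.

```python
# Sketch of a residual block and an ASPP module as used in U-Net-style
# encoders; sizes and rates are assumptions, not the AR-UNet's exact ones.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Identity shortcut counters feature loss and vanishing gradients.
        return self.act(x + self.body(x))

class ASPP(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        # Parallel atrous branches: same spatial size, growing receptive field.
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 64, 128, 256)          # e.g. a street-scene feature map
y = ASPP(64, 64)(ResidualBlock(64)(x))    # resolution is preserved throughout
print(y.shape)                            # torch.Size([1, 64, 128, 256])
```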

Automatic Validation of the Geometric Quality of Crowdsourcing Drone Imagery (크라우드소싱 드론 영상의 기하학적 품질 자동 검증)

  • Dongho Lee; Kyoungah Choi
    • Korean Journal of Remote Sensing / v.39 no.5_1 / pp.577-587 / 2023
  • The utilization of crowdsourced spatial data has been actively researched; however, concerns have been raised about the uncertainty of its quality. In particular, when low-quality data is mixed into drone imagery datasets, it can degrade the quality of the resulting spatial information products. To address this problem, this study presents a methodology for automatically validating the geometric quality of crowdsourced imagery. Key quality factors such as spatial resolution, resolution variation, matching-point reprojection error, and bundle adjustment results are utilized. To classify imagery suitable for spatial information generation, training and validation datasets are constructed, and a radial basis function (RBF)-based support vector machine (SVM) model is trained. The trained SVM model achieved a classification accuracy of 99.1%. To evaluate the effectiveness of the quality validation model, orthoimages were generated from drone imagery not used in training and validation, both before and after applying the model, and compared. The results confirm that applying the quality validation model reduces various distortions that can appear in orthoimages and enhances object identifiability. The proposed methodology is expected to increase the utility of crowdsourced data in spatial information generation by automatically selecting high-quality data from the mass of crowdsourced data of varying quality.
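
The quality gate itself is a standard RBF-kernel SVM over the four quality factors. A hedged scikit-learn sketch follows; the feature values and labels are made up, and the paper's feature definitions and training data are not reproduced.

```python
# Assumed sketch of the quality gate: RBF-SVM over four quality factors.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Columns (hypothetical units): spatial resolution (cm), resolution
# variation, reprojection error (px), bundle adjustment residual.
# Labels: 1 = usable for mapping, 0 = reject.
X = np.array([
    [2.1, 0.05, 0.4, 0.3],
    [2.3, 0.07, 0.5, 0.4],
    [9.8, 0.60, 2.9, 2.5],
    [8.5, 0.55, 3.4, 2.2],
])
y = np.array([1, 1, 0, 0])

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="scale"))
model.fit(X, y)
print(model.predict([[2.0, 0.06, 0.5, 0.35]]))  # -> [1], accepted for mapping
```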

Modified Weight Filter Algorithm using Pixel Matching in AWGN Environment (AWGN 환경에서 화소매칭을 이용한 변형된 가중치 필터 알고리즘)

  • Cheon, Bong-Won; Kim, Nam-Ho
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.10 / pp.1310-1316 / 2021
  • Recently, with the development of artificial intelligence and IoT technology, the importance of video processing tasks such as object tracking, medical imaging, and object recognition is increasing. In particular, the noise reduction used in preprocessing must remove noise effectively while preserving detailed features as the importance of system imagery grows. In this paper, we propose a modified weight filter based on pixel matching for the AWGN environment. The proposed algorithm uses pixel matching to preserve high-frequency components where pixel values change significantly: it detects regions in the neighborhood whose patterns are highly correlated with the center and classifies the matched pixel values required for the output calculation. To preserve edge components during filtering, the final output is obtained by weighting each matched pixel according to its similarity to, and spatial distance from, the center pixel.
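
A rough NumPy sketch of this idea follows; it amounts to a thresholded bilateral-style filter rather than the authors' exact algorithm, and all parameter values (window radius, sigmas, matching threshold) are illustrative.

```python
# Sketch: within a local window, match pixels whose values resemble the
# center pixel, then average them with weights from value similarity and
# spatial distance. Parameters are assumptions, not the paper's values.
import numpy as np

def pixel_matching_filter(img, radius=2, sigma_v=20.0, sigma_s=2.0, thresh=30.0):
    H, W = img.shape
    pad = np.pad(img.astype(np.float64), radius, mode="reflect")
    out = np.empty((H, W), dtype=np.float64)
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(yy**2 + xx**2) / (2 * sigma_s**2))  # distance weight
    for i in range(H):
        for j in range(W):
            win = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            center = pad[i + radius, j + radius]
            match = np.abs(win - center) < thresh           # pixel matching step
            w = spatial * np.exp(-(win - center) ** 2 / (2 * sigma_v**2)) * match
            out[i, j] = (w * win).sum() / w.sum()           # center always matches
    return out
```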

Noise Removal Filter Algorithm using Spatial Weight in AWGN Environment (화소값 분포패턴과 가중치 마스크를 사용한 AWGN 제거 알고리즘)

  • Cheon, Bong-Won; Kim, Nam-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.428-430 / 2022
  • Image processing plays an important part in automation and artificial intelligence systems, through tasks such as object tracking and object recognition and classification, and the importance of IoT technology and automation is being emphasized as interest in automation increases. However, systems that depend on fine detail such as image boundaries require a precise noise removal algorithm. Therefore, in this paper, we propose a filtering algorithm based on the pixel-value distribution pattern that minimizes information loss during filtering. The proposed algorithm finds the distribution pattern of neighboring pixel values around each pixel of the input image. A weight mask is then computed from the distribution pattern and applied to the filtering mask to calculate the final output. The proposed algorithm shows superior noise removal compared with existing methods and restores the image while minimizing blurring.
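
Filters of this kind are typically evaluated by corrupting a clean image with AWGN, denoising it, and comparing PSNR before and after. The small harness below is an assumed evaluation setup, not from the paper, with a simple mean filter standing in for the proposed algorithm.

```python
# Assumed AWGN evaluation harness: add Gaussian noise, denoise, compare PSNR.
import numpy as np
from scipy.ndimage import uniform_filter

def add_awgn(img, sigma=15.0, seed=0):
    rng = np.random.default_rng(seed)
    return img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)

def psnr(ref, est, peak=255.0):
    mse = np.mean((ref.astype(np.float64) - est) ** 2)
    return 10.0 * np.log10(peak**2 / mse)

clean = np.tile(np.linspace(0, 255, 64), (64, 1))   # synthetic test image
noisy = add_awgn(clean, sigma=15.0)
denoised = uniform_filter(noisy, size=3)            # stand-in denoiser
print(psnr(clean, noisy), psnr(clean, denoised))    # PSNR should increase
```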


A Study on the Applicability of Deep Learning Algorithm for Detection and Resolving of Occlusion Area (영상 폐색영역 검출 및 해결을 위한 딥러닝 알고리즘 적용 가능성 연구)

  • Bae, Kyoung-Ho; Park, Hong-Gi
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.11 / pp.305-313 / 2019
  • Recently, spatial information has been constructed actively from images obtained by drones. Because occlusion areas arise from buildings as well as from obstacles such as trees, pedestrians, and banners in urban areas, an efficient way to resolve the problem is necessary. Instead of the traditional approach, which replaces the occlusion area with other images obtained from different positions, various deep learning-based models were examined and compared. A comparison of the HOG feature descriptor with the machine learning-based SVM and the deep learning-based DNN, CNN, and RNN showed that the CNN is the most broadly used for detecting and classifying objects. To date, many studies have focused on developing and applying individual models, making it difficult to select a single optimal model. On the other hand, continued improvement of deep learning-based detection and classification is expected, because many researchers are attempting to improve model accuracy while reducing computation time. In that case, the procedures for generating spatial information will change to detect occlusion areas and replace them with simulated images automatically, improving the efficiency of time, cost, and workforce.
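
As one point in the comparison above, a HOG-plus-SVM baseline is straightforward to sketch with scikit-image and scikit-learn. The patches and labels below are synthetic placeholders; the paper's data and evaluation protocol are not reproduced.

```python
# Illustrative HOG + SVM baseline for patch classification (e.g. occlusion
# vs. non-occlusion). Data and labels are synthetic stand-ins.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

rng = np.random.default_rng(1)
patches = rng.random((40, 64, 64))        # stand-in for 64x64 image patches
labels = rng.integers(0, 2, 40)           # 1 = occlusion area (hypothetical)

features = np.array([
    hog(p, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    for p in patches
])
clf = SVC(kernel="linear").fit(features, labels)
print(clf.score(features, labels))
```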

CNN-based People Recognition for Vision Occupancy Sensors (비전 점유센서를 위한 합성곱 신경망 기반 사람 인식)

  • Lee, Seung Soo; Choi, Changyeol; Kim, Manbae
    • Journal of Broadcast Engineering / v.23 no.2 / pp.274-282 / 2018
  • Most occupancy sensors installed in buildings, households, and so forth are pyroelectric infrared (PIR) sensors. One of their disadvantages is that a PIR sensor cannot detect a stationary person, because it responds only to variations in thermal radiation. To overcome this problem, camera vision sensors have gained interest, with object tracking used to detect stationary persons. However, object tracking has inherent problems such as tracking drift, so recognizing whether a static tracker actually contains a human is an important task. In this paper, we propose CNN-based human recognition to determine whether a static tracker contains a human. Experimental results validated that humans and non-humans are classified with an accuracy of about 88% and that the proposed method can be incorporated into practical vision occupancy sensors.
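
A minimal PyTorch sketch of such a binary human/non-human classifier sitting behind a static tracker follows. The paper does not specify its architecture, input size, or training setup, so everything below is illustrative.

```python
# Sketch of a small CNN that classifies a tracker's bounding-box crop as
# human or non-human; architecture and input size are assumptions.
import torch
import torch.nn as nn

class HumanClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 2)       # two classes: human / non-human

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

crop = torch.randn(1, 3, 64, 64)           # a static tracker's crop
logits = HumanClassifier()(crop)
print(logits.softmax(dim=1))               # occupancy decision probabilities
```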

A Dual-Structured Self-Attention for improving the Performance of Vision Transformers (비전 트랜스포머 성능향상을 위한 이중 구조 셀프 어텐션)

  • Kwang-Yeob Lee; Hwang-Hee Moon; Tae-Ryong Park
    • Journal of IKEEE / v.27 no.3 / pp.251-257 / 2023
  • In this paper, we propose a dual-structured self-attention method that compensates for the weak local feature extraction of the vision transformer's self-attention. Vision transformers are more computationally efficient than convolutional neural networks in object classification, object segmentation, and video image recognition, but are relatively weak at extracting local features. To address this, many studies build on windows or shifted windows, but these methods weaken the advantages of self-attention-based transformers by increasing computational complexity through multiple levels of encoders. This paper proposes a dual-structured self-attention that combines self-attention with a neighborhood network to improve the locality inductive bias over existing methods. The neighborhood network, which extracts local context information, has much lower computational complexity than a window structure. CIFAR-10 and CIFAR-100 were used to compare the proposed dual-structured self-attention transformer with the existing transformer; the experiments showed improvements of 0.63% and 1.57% in Top-1 accuracy, respectively.
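
A speculative PyTorch sketch of a dual-structured token mixer follows: a global self-attention branch plus a lightweight neighborhood branch (here a depthwise convolution over the token grid) fused residually. This illustrates the idea only; it is not the authors' exact architecture, and the grid size and dimensions are assumptions.

```python
# Speculative dual-branch mixer: global self-attention + local neighborhood
# mixing via depthwise convolution, fused with a residual connection.
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    def __init__(self, dim, heads=4, grid=8):
        super().__init__()
        self.grid = grid
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # neighborhood mixing
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                        # x: (B, N, C), N = grid * grid
        g, _ = self.attn(x, x, x)                # global branch: full self-attention
        B, N, C = x.shape
        t = x.transpose(1, 2).reshape(B, C, self.grid, self.grid)
        l = self.local(t).flatten(2).transpose(1, 2)  # local branch, O(N) cost
        return self.norm(x + g + l)              # fuse both branches residually

tokens = torch.randn(2, 64, 96)                  # 8x8 patch tokens, dim 96
print(DualAttention(96)(tokens).shape)           # torch.Size([2, 64, 96])
```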