• Title/Abstract/Keywords: object detection system

Search Result 1,078, Processing Time 1.194 seconds

Implementation of Specific Target Detection and Tracking Technique using Re-identification Technology based on public Multi-CCTV (공공 다중CCTV 기반에서 재식별 기술을 활용한 특정대상 탐지 및 추적기법 구현)

  • Hwang, Joo-Sung;Nguyen, Thanh Hai;Kang, Soo-Kyung;Kim, Young-Kyu;Kim, Joo-Yong;Chung, Myoung-Sug;Lee, Jooyeoun
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.22 no.4 / pp.49-57 / 2022
  • The government is making great efforts to prevent crimes and incidents such as missing children by using public CCTVs. However, operating manpower is short, operators' concentration weakens over long monitoring sessions, and tracking is difficult. In addition, applying real-time object search, re-identification, and tracking through deep learning algorithms increased the number of parameters, and complex network analysis led to reduced speed and insufficient memory. In this paper, we designed the network to improve speed and save memory by applying YOLOv4, which can recognize objects in real time, together with batching and TensorRT technology. Building on research into these algorithms, we developed OSNet, re-ranking with k-reciprocal nearest neighbors for re-identification, a Jaccard distance dissimilarity measure for correlation, and related components, and used them in a CCTV-based national safety identification and tracking system. As a result, we propose a solution that recognizes, re-identifies, and tracks objects in real time in a Korean public multi-CCTV environment through this combination of algorithms.
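The abstract above names k-reciprocal nearest neighbors and a Jaccard distance for re-identification. A minimal sketch of those two ideas follows, assuming plain Euclidean distance on toy 2-D embeddings. The function names, the value of `k`, and the data are illustrative assumptions rather than the authors' implementation, and the full re-ranking procedure (which typically blends this distance with the original one) is omitted.

```python
from math import dist

def k_nearest(features, i, k):
    """Indices of the k items closest to item i (Euclidean), including i itself."""
    order = sorted(range(len(features)), key=lambda j: dist(features[i], features[j]))
    return order[:k]

def k_reciprocal(features, i, k):
    """Neighbors j of i for which i is also among the k nearest neighbors of j."""
    return {j for j in k_nearest(features, i, k) if i in k_nearest(features, j, k)}

def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

# Toy embeddings (hypothetical): items 0-2 form one cluster, items 3-4 another.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]

same_person = jaccard_distance(k_reciprocal(features, 0, 3), k_reciprocal(features, 1, 3))
different_person = jaccard_distance(k_reciprocal(features, 0, 3), k_reciprocal(features, 3, 3))
print(same_person, different_person)
```

Two detections whose k-reciprocal sets coincide get a small Jaccard distance, while detections from different clusters share no reciprocal neighbors and get a distance of 1.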

A Comparison of Image Classification System for Building Waste Data based on Deep Learning (딥러닝기반 건축폐기물 이미지 분류 시스템 비교)

  • Jae-Kyung Sung;Mincheol Yang;Kyungnam Moon;Yong-Guk Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.3 / pp.199-206 / 2023
  • This study uses deep learning algorithms to automatically classify construction waste into three categories: wood waste, plastic waste, and concrete waste. Two models were compared on this task: VGG-16, a convolutional neural network image classifier, and ViT (Vision Transformer), an NLP-derived model that processes an image as a sequence. Image data for construction waste was collected by crawling images from search engines worldwide; after excluding images that were difficult to distinguish with the naked eye or that were duplicated and would interfere with the experiment, 3,000 images were obtained, 1,000 for each category. In addition, to improve the accuracy of the models, data augmentation was performed during training with a total of 30,000 images. Despite the unstructured nature of the collected image data, the experimental results showed that VGG-16 achieved an accuracy of 91.5% and ViT achieved an accuracy of 92.7%. This suggests the possibility of practical application in actual construction waste data management work. If object detection or semantic segmentation techniques are applied on the basis of this study, more precise classification will be possible even within a single image, resulting in more accurate waste classification.

Intelligent Motion Pattern Recognition Algorithm for Abnormal Behavior Detections in Unmanned Stores (무인 점포 사용자 이상행동을 탐지하기 위한 지능형 모션 패턴 인식 알고리즘)

  • Young-june Choi;Ji-young Na;Jun-ho Ahn
    • Journal of Internet Computing and Services / v.24 no.6 / pp.73-80 / 2023
  • The recent steep increase in the minimum hourly wage has raised the burden of labor costs, and the share of unmanned stores has grown in the aftermath of COVID-19. As a result, theft targeting unmanned stores is also increasing. To prevent such theft, stores introduce "Just Walk Out" systems that use LiDAR sensors, weight sensors, and similar equipment, or rely on manual checks through continuous CCTV monitoring. However, the more expensive the sensors, the higher the initial and overall cost of operating the store, and CCTV verification is of limited use because managers cannot monitor around the clock. In this paper, we propose an AI image-processing fusion algorithm that reduces this dependence on sensors and human monitoring, detects customers who exhibit abnormal behaviors such as theft at a cost low enough for unmanned stores, and provides cloud-based notifications. In addition, this paper verifies the accuracy of each algorithm (motion capture using MediaPipe, object detection using YOLO, and the fusion algorithm) on behavior pattern data collected from unmanned stores, and demonstrates the performance of the fusion algorithm through various scenario designs.

The way to make training data for deep learning model to recognize keywords in product catalog image at E-commerce (온라인 쇼핑몰에서 상품 설명 이미지 내의 키워드 인식을 위한 딥러닝 훈련 데이터 자동 생성 방안)

  • Kim, Kitae;Oh, Wonseok;Lim, Geunwon;Cha, Eunwoo;Shin, Minyoung;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.1-23 / 2018
  • Since the start of the 21st century, various high-quality services have emerged with the growth of the internet and information and communication technologies. In particular, the E-commerce industry, in which Amazon and eBay stand out, is growing explosively. As E-commerce grows, customers can easily find what they want to buy while comparing various products, because more products are registered at online shopping malls. However, this growth has created a problem: with so many products registered, it has become difficult for customers to find what they really need. When customers search with a general keyword, too many products are returned. Conversely, few products are returned when customers type in product details, because concrete product attributes are rarely registered. In this situation, automatically recognizing text in images can be a solution. Because the bulk of product details are written in catalogs in image format, most product information cannot be found with text queries in current text-based search systems. If the information in images can be converted to text, customers can search by product details, which makes shopping more convenient. Various existing OCR (Optical Character Recognition) programs can recognize text in images, but they are hard to apply to catalogs because they struggle in certain circumstances, such as when text is small or fonts are inconsistent. Therefore, this research proposes a way to recognize keywords in catalogs with deep learning, which has been the state of the art in image recognition since the 2010s.
Single Shot Multibox Detector (SSD), a model well regarded for object detection performance, can be used with its structure redesigned to account for the differences between text and objects. However, the SSD model needs a lot of labeled training data because, as a deep learning model, it is trained by supervised learning. To collect data, one can manually label the location and class of text in catalogs, but manual collection raises many problems. Some keywords would be missed because humans make mistakes while labeling. Collection is also too time-consuming given the scale of data needed, or too costly if many workers are hired to shorten the time. Furthermore, if specific keywords need to be trained, finding images that contain those words is also difficult. To solve the data issue, this research developed a program that creates training data automatically. The program generates catalog-like images containing various keywords and pictures and saves the location information of the keywords at the same time. With this program, data can be collected efficiently and the performance of the SSD model also improves. The SSD model recorded a recognition rate of 81.99% with 20,000 samples created by the program. Moreover, this research tested the SSD model on different variations of the data to analyze which features of the data influence text recognition performance. The results show that the number of labeled keywords, the addition of overlapped keyword labels, the existence of unlabeled keywords, the spacing among keywords, and the differences in background images are related to the performance of the SSD model. These findings can guide performance improvements for the SSD model or other deep-learning-based text recognizers through high-quality data.
The SSD model redesigned to recognize text in images and the program developed for creating training data are expected to contribute to improving search systems in E-commerce. Suppliers can spend less time registering keywords for products, and customers can search for products using the details written in the catalog.
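The automatic training-data idea described above (place keywords on a catalog-like image and record their locations at the same time) can be illustrated with a small layout generator. This is only a sketch under stated assumptions: it produces bounding-box labels without rendering any pixels, and the canvas size, fixed character width, line height, and the name `make_sample` are all hypothetical choices, not details from the paper.

```python
import random

def boxes_overlap(a, b):
    """True when two (x, y, w, h) boxes intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def make_sample(keywords, canvas=(300, 300), char_w=8, line_h=16, seed=0, max_tries=100):
    """Place each keyword at a random non-overlapping spot and record its box.

    Box sizes assume a fixed-width font (char_w pixels per character); a real
    generator would measure the rendered text instead.
    """
    rng = random.Random(seed)
    width, height = canvas
    labels = []
    for word in keywords:
        w, h = char_w * len(word), line_h
        for _ in range(max_tries):
            box = (rng.randint(0, width - w), rng.randint(0, height - h), w, h)
            if not any(boxes_overlap(box, other["box"]) for other in labels):
                labels.append({"text": word, "box": box})
                break  # keyword placed; move on to the next one
    return labels

for label in make_sample(["cotton", "waterproof", "USB-C"], seed=1):
    print(label)
```

Because the generator decides where each keyword goes, the label file is exact by construction, which is the property that makes manual annotation unnecessary.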

Land-Cover Change Detection of Western DMZ and Vicinity using Spectral Mixture Analysis of Landsat Imagery (선형분광혼합화소분석을 이용한 서부지역 DMZ의 토지피복 변화 탐지)

  • Kim, Sang-Wook
    • Journal of the Korean Association of Geographic Information Studies / v.9 no.1 / pp.158-167 / 2006
  • The objective of this study is to detect land-cover change in the western DMZ and its vicinity. It was performed as a basic study toward constructing a decision support system for the conservation or sustainable development of the DMZ and its vicinity in the near future. The DMZ is 4 km wide and 250 km long; it is one of the most highly fortified boundaries in the world and also a unique thin green line. Environmentalists want to declare the DMZ a nature reserve and biodiversity zone, but with the strengthening of inter-Korean economic cooperation, some developers are trying to construct a new town or an industrial complex inside the DMZ. This study investigates the current environmental conditions, especially deforestation, of the western DMZ using remote sensing and GIS techniques. Land covers were identified through linear spectral mixture analysis (LSMA), which was used to handle the spectral mixture problem of low-spatial-resolution Landsat TM and ETM+ imagery. To analyze quantitative and spatial change of vegetation cover in the western DMZ, the GIS overlay method was used. In LSMA, to develop high-quality fraction images, three endmembers (green vegetation (GV), soil, and water) were derived from pure features in the imagery. Over 15 years, from 1987 to 2002, forest in the western DMZ and its vicinity was devastated and changed to urban land, farmland, or barren land. The northern part was more deforested than the southern part ($52.37\,\mathrm{km}^2$ of North Korean forest and $39.04\,\mathrm{km}^2$ of South Korean forest changed to other land covers). In the North Korean part, forest changed to barren land and farmland; in the South Korean part, forest changed to farmland and urban area. In particular, in the North Korean part of the DMZ and its vicinity, $56.15\,\mathrm{km}^2$ of farmland changed to barren land over the 15 years, which shows the failure of the 'Darakbat' (terraced field) project, one of North Korea's food production increase projects.
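Linear spectral mixture analysis, as named in the abstract above, models each pixel as a weighted sum of endmember spectra and solves for the weights (fractions). The sketch below is a rough illustration using unconstrained least squares via the normal equations. The four-band reflectance values for GV, soil, and water are invented for the example, and operational LSMA usually adds sum-to-one and non-negativity constraints that are not shown here.

```python
def solve_linear(a, b):
    """Solve a small square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def unmix(pixel, endmembers):
    """Least-squares fractions f minimizing ||sum_k f_k * endmember_k - pixel||."""
    k = len(endmembers)
    gram = [[sum(x * y for x, y in zip(endmembers[i], endmembers[j])) for j in range(k)]
            for i in range(k)]
    rhs = [sum(x * y for x, y in zip(endmembers[i], pixel)) for i in range(k)]
    return solve_linear(gram, rhs)

# Hypothetical endmember spectra in four bands (illustrative values only).
GV = [0.05, 0.08, 0.04, 0.50]
SOIL = [0.20, 0.25, 0.30, 0.35]
WATER = [0.06, 0.04, 0.02, 0.01]

# A synthetic mixed pixel: 60% vegetation, 30% soil, 10% water.
pixel = [0.6 * g + 0.3 * s + 0.1 * w for g, s, w in zip(GV, SOIL, WATER)]
fractions = unmix(pixel, [GV, SOIL, WATER])
print([round(f, 3) for f in fractions])
```

Applying `unmix` to every pixel yields one fraction image per endmember, and comparing the vegetation fraction images from two dates is one way to quantify the kind of forest loss the abstract reports.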

Land Cover Classification of Coastal Area by SAM from Airborne Hyperspectral Images (항공 초분광 영상으로부터 연안지역의 SAM 토지피복분류)

  • LEE, Jin-Duk;BANG, Kon-Joon;KIM, Hyun-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.21 no.1 / pp.35-45 / 2018
  • Image data collected by an airborne hyperspectral camera system are highly useful for coastline mapping, detection of facilities composed of specific materials, detailed land use analysis, change monitoring, and so forth in complex coastal areas, because the system provides almost complete spectral and spatial information for each image pixel across tens to hundreds of spectral bands. Several approaches based on SAM (Spectral Angle Mapper) supervised classification were applied to extract optimal land cover information from hyperspectral images acquired by a CASI-1500 airborne hyperspectral camera over a coastal area that includes both land and sea water. We applied three approaches: first, classification of the combined land and sea areas; second, reclassification after separating land and sea areas in the result of the combined classification; and third, land-only classification using atmospherically corrected images. We then compared the classification results and accuracies. Land cover classification was also conducted by selecting, from the 48-band hyperspectral images, four-band images with the same wavelength ranges as IKONOS, QuickBird, KOMPSAT, and GeoEye satellite images and eight-band images with the same wavelength range as WorldView-2, and the results were compared with the classification using all 48 bands. As a result, the reclassification approach after separating land and sea areas was more effective than classification of the combined land and sea areas. In the reclassification approach, the larger the number of bands, the higher the accuracy and reliability. With higher spectral resolution, asphalt and concrete roads could be classified more accurately.
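The Spectral Angle Mapper used in the study above assigns each pixel to the reference spectrum with which it forms the smallest angle, which makes the result insensitive to overall brightness. A minimal sketch follows. The reference spectra, the four-band layout, and the 0.1 rad rejection threshold are assumptions made for illustration and are not values from the paper.

```python
from math import acos, sqrt

def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard against rounding
    return acos(cosine)

def classify(pixel, references, max_angle=0.1):
    """Return the class with the smallest spectral angle, or None if none is close enough."""
    best = min(references, key=lambda name: spectral_angle(pixel, references[name]))
    return best if spectral_angle(pixel, references[best]) <= max_angle else None

# Hypothetical reference spectra in four bands (illustrative values only).
references = {
    "asphalt": [0.10, 0.11, 0.12, 0.13],
    "vegetation": [0.05, 0.08, 0.04, 0.50],
    "water": [0.06, 0.04, 0.02, 0.01],
}

# A vegetation pixel that is twice as bright still matches, since only the angle matters.
print(classify([0.10, 0.16, 0.08, 1.00], references))
```

Restricting `pixel` and the reference spectra to a subset of bands mimics the four-band and eight-band experiments described above, where fewer bands leave less angular separation between classes.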

Synthetic Data Generation with Unity 3D and Unreal Engine for Construction Hazard Scenarios: A Comparative Analysis

  • Aqsa Sabir;Rahat Hussain;Akeem Pedro;Mehrtash Soltani;Dongmin Lee;Chansik Park;Jae-Ho Pyeon
    • International conference on construction engineering and project management / 2024.07a / pp.1286-1288 / 2024
  • The construction industry, known for its inherent risks and multiple hazards, necessitates effective solutions for hazard identification and mitigation [1]. To address this need, machine learning models specializing in object detection have become increasingly important, because this approach plays a crucial role in improving worker safety by proactively recognizing potential dangers on construction sites [2], [3]. However, the challenge in training these models lies in obtaining accurately labeled datasets, as conventional methods require labor-intensive labeling or costly measurements [4]. To circumvent these challenges, synthetic data generation (SDG) has emerged as a key method for creating realistic and diverse training scenarios [5], [6]. The paper reviews the evolution of synthetic data generation tools, highlighting the shift from earlier solutions like Synthpop and Data Synthesizer to advanced game engines [7]. Among the various gaming platforms, Unity 3D and Unreal Engine stand out due to their advanced capabilities in replicating realistic construction hazard environments [8], [9]. Comparing Unity 3D and Unreal Engine is crucial for evaluating their effectiveness in SDG and helps developers select the appropriate platform for their needs. For this purpose, this paper conducts a comparative analysis of both engines, assessing their ability to create high-fidelity interactive environments. To evaluate the suitability of these engines for generating synthetic data in construction site simulations, the analysis focuses on graphical realism, developer-friendliness, and user interaction capabilities. These aspects are essential for replicating realistic construction sites, ensuring both high visual fidelity and ease of use for developers. Firstly, graphical realism is crucial for training ML models to recognize the nuanced nature of construction environments.
In this aspect, Unreal Engine stands out with its superior graphics quality compared to Unity 3D, which is typically considered to have less graphical prowess [10]. Secondly, developer-friendliness is vital for those generating synthetic data. Research indicates that Unity 3D is praised for its user-friendly interface and its use of C# scripting, which is widely used in educational settings, making it a popular choice for those new to game development or synthetic data generation. Unreal Engine, while offering powerful capabilities in terms of realistic graphics, is often viewed as more complex due to its use of C++ scripting and the Blueprint system. Although Blueprint is a visual scripting tool that does not require traditional coding, it can be intricate and may present a steeper learning curve, especially for those without prior experience in game development [11]. Lastly, regarding user interaction capabilities, Unity 3D is known for its intuitive interface and versatility, particularly in VR/AR development for various skill levels. In contrast, Unreal Engine, with its advanced graphics and Blueprint scripting, is better suited for creating high-end, immersive experiences [12]. Based on current insights, this comparative analysis underscores the user-friendly interface and adaptability of Unity 3D, which features a built-in Perception package that facilitates automatic labeling for SDG [13]. This functionality enhances accessibility and simplifies the SDG process for users. Conversely, Unreal Engine is distinguished by its advanced graphics and realistic rendering capabilities. It offers plugins like EasySynth (which does not provide automatic labeling) and NDDS for SDG [14], [15]. The development complexity associated with Unreal Engine presents challenges for novice users, whereas the more approachable platform of Unity 3D is advantageous for beginners.
This research provides an in-depth review of the latest advancements in SDG, shedding light on potential future research and development directions. The study concludes that the integration of such game engines in ML model training markedly enhances hazard recognition and decision-making skills among construction professionals, thereby significantly advancing data acquisition for machine learning in construction safety monitoring.

Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.1-25 / 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes, from written text, the sentiments consumers or the public feel about an arbitrary object. Aspect-Based Sentiment Analysis (ABSA) is a finer-grained analysis of the sentiments towards each aspect of an object. Because it has more practical business value, ABSA is drawing attention from both academic and industrial organizations. For example, given a review that says "The restaurant is expensive but the food is really fantastic", general SA evaluates the overall sentiment towards the 'restaurant' as 'positive', while ABSA identifies the restaurant's 'price' aspect as 'negative' and its 'food' aspect as 'positive'. Thus, ABSA enables a more specific and effective marketing strategy. To perform ABSA, it is necessary to identify the aspect terms or aspect categories included in the text and judge the sentiments towards them. Accordingly, there are four main areas in ABSA: aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). ABSA is usually conducted by extracting aspect terms and then performing ATSC to analyze sentiments for the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze sentiments for the given aspect category. Here, an aspect category is expressed in one or more aspect terms, or indirectly inferred from other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' included in the review. If the review sentence includes 'pasta', 'steak', or 'grilled chicken special', these can all be aspect terms for the aspect category 'food'. As such, an aspect category referred to by one or more specific aspect terms is called an explicit aspect.
On the other hand, an aspect category like 'price', which does not have any specific aspect terms but can be indirectly inferred from an emotional word such as 'expensive', is called an implicit aspect. So far, the term 'aspect category' has been used to avoid confusion with 'aspect term'. From now on, we treat 'aspect category' and 'aspect' as the same concept and mostly use the word 'aspect' for convenience. Note that ATSC analyzes the sentiment towards given aspect terms, so it deals only with explicit aspects, whereas ACSC handles both explicit and implicit aspects. This study seeks answers to the following issues, ignored in previous studies, when applying the BERT pre-trained language model to ACSC, and derives superior ACSC models. First, is it more effective to reflect the output vectors of the aspect category tokens than to use only the final output vector of the [CLS] token as the classification vector? Second, is there any performance difference between QA (Question Answering) and NLI (Natural Language Inference) types in the sentence-pair configuration of the input data? Third, is there any performance difference according to the position of the sentence containing the aspect category in the QA or NLI type sentence-pair configuration? To achieve these research objectives, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, we derived ACSC models that outperform existing studies without expanding the training dataset. In addition, it was found that reflecting the output vector of the aspect category token is more effective than using only the output vector of the [CLS] token as the classification vector. It was also found that QA type input generally provides better performance than NLI, and that the position of the sentence with the aspect category in the QA type does not affect performance.
There may be some differences depending on the characteristics of the dataset, but when using NLI type sentence-pair input, placing the sentence containing the aspect category second seems to provide better performance. The new methodology for designing the ACSC model used in this study could be similarly applied to other studies such as ATSC.
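The sentence-pair configurations compared in this abstract (QA versus NLI type, with the aspect sentence placed first or second) can be pictured with a small string builder. The templates below, including the question wording `what do you think of the ... ?` and the function name `build_pair`, are assumptions for illustration; the abstract does not state the exact auxiliary sentences, and in practice a BERT tokenizer inserts the `[CLS]` and `[SEP]` tokens itself.

```python
def build_pair(review, aspect, style="QA", aspect_first=False):
    """Assemble a BERT-style sentence pair for aspect category sentiment classification.

    style="QA" wraps the aspect in a question; style="NLI" uses the bare aspect
    as a pseudo-sentence. Both templates are hypothetical.
    """
    if style == "QA":
        auxiliary = f"what do you think of the {aspect} ?"
    elif style == "NLI":
        auxiliary = aspect
    else:
        raise ValueError(f"unknown style: {style}")
    first, second = (auxiliary, review) if aspect_first else (review, auxiliary)
    return f"[CLS] {first} [SEP] {second} [SEP]"

review = "The restaurant is expensive but the food is really fantastic"
for style in ("QA", "NLI"):
    for aspect_first in (False, True):
        print(build_pair(review, "price", style, aspect_first))
```

The four printed variants correspond to the two questions the study asks about input format: which auxiliary sentence type to use, and whether it goes before or after the review.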