• Title/Abstract/Keywords: Deep Learning Dataset

Search Result 764, Processing Time 0.025 seconds

Facial Expression Recognition through Self-supervised Learning for Predicting Face Image Sequence

  • Yoon, Yeo-Chan;Kim, Soo Kyun
    • Journal of the Korea Society of Computer and Information / v.27 no.9 / pp.41-47 / 2022
  • In this paper, we propose a new and simple self-supervised learning method that predicts the middle image of a face image sequence for automatic expression recognition. Automatic facial expression recognition can achieve high performance through deep learning methods, but it generally requires a large and expensive dataset. The size of the dataset and the performance of the algorithm tend to be proportional. The proposed method learns a latent deep representation of a face through self-supervised learning on an existing dataset, without constructing an additional dataset. It then transfers the learned parameters to a new facial expression recognition model to improve the performance of automatic expression recognition. The proposed method showed a large performance improvement on two datasets, CK+ and AFEW 8.0, demonstrating its effectiveness.
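The pretext task described above (predicting the middle image of a sequence) needs no expression labels, because the target is taken from the sequence itself. The following is a minimal sketch of how such training pairs could be built. The function name `make_middle_frame_samples`, the `context` parameter, and the choice of using neighboring frames as the input are all assumptions for illustration; the abstract does not specify them.

```python
def make_middle_frame_samples(sequence, context=1):
    """Build (input_frames, target_frame) pairs from one frame sequence.

    For each position that has `context` frames on both sides, the
    surrounding frames become the input and the middle frame the target.
    This is a hypothetical construction, not the authors' exact recipe.
    """
    samples = []
    for m in range(context, len(sequence) - context):
        before = sequence[m - context:m]
        after = sequence[m + 1:m + 1 + context]
        samples.append((before + after, sequence[m]))
    return samples


# Frames are represented by placeholder names; in practice they would be images.
pairs = make_middle_frame_samples(["f0", "f1", "f2", "f3"], context=1)
```

A network pretrained on such pairs could then be fine-tuned on a labeled expression dataset, which is the transfer step the abstract mentions.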

Deep learning-based Automatic Weed Detection on Onion Field (딥러닝을 이용한 양파 밭의 잡초 검출 연구)

  • Kim, Seo jeong;Lee, Jae Su;Kim, Hyong Suk
    • Smart Media Journal / v.7 no.3 / pp.16-21 / 2018
  • This paper presents the design and implementation of a deep learning-based automated weed detector for onion fields. The system is based on a Convolutional Neural Network that selects proposed regions. The detector is trained on a dataset collected from onion fields, after which candidate regions with a very high weed probability are classified as weeds. Non-maximum suppression helps preserve the less-overlapping bounding boxes. The dataset collected from different onion farms is evaluated with the proposed classifier. Classification accuracy is about 99% on the dataset, indicating that the proposed method performs well for weed detection in onion fields.
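The non-maximum suppression step mentioned above can be illustrated with a minimal greedy sketch. The box format `(x1, y1, x2, y2)`, the function names, and the IoU threshold of 0.5 are assumptions for illustration; the abstract does not give the detector's actual settings.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is kept only if it does not overlap any already-kept box
    by more than `iou_threshold` (an assumed value).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)  # the second box overlaps the first and is dropped
```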

Lightweight Deep Learning Model for Heart Rate Estimation from Facial Videos (얼굴 영상 기반의 심박수 추정을 위한 딥러닝 모델의 경량화 기법)

  • Gyutae Hwang;Myeonggeun Park;Sang Jun Lee
    • IEMEK Journal of Embedded Systems and Applications / v.18 no.2 / pp.51-58 / 2023
  • This paper proposes a deep learning method for estimating the heart rate from facial videos. Our proposed method estimates remote photoplethysmography (rPPG) signals to predict the heart rate. Although several methods have been proposed for estimating rPPG signals, most cannot be used on low-power single-board computers because of their computational complexity. To address this problem, we construct a lightweight student model and employ a knowledge distillation technique to limit its performance degradation relative to a deeper network model. The teacher model consists of 795k parameters, whereas the student model contains only 24k parameters, and the inference time was therefore reduced by a factor of 10. By distilling the knowledge of the intermediate feature maps of the teacher model, we improved the accuracy of the student model for estimating the heart rate. Experiments were conducted on the UBFC-rPPG dataset to demonstrate the effectiveness of the proposed method. Moreover, we collected our own dataset to verify the accuracy and processing time of the proposed method on a real-world dataset. Experimental results on an NVIDIA Jetson Nano board demonstrate that our proposed method can infer the heart rate in real time with a mean absolute error of 2.5183 bpm.
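Distilling intermediate feature maps, as described above, is commonly done by adding a term that pulls the student's features toward the teacher's. The sketch below shows one such combined loss on flattened feature vectors. The mean-squared-error form, the weight `alpha`, and all function names are assumptions for illustration; the abstract does not state the loss the authors used.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_feat, teacher_feat, student_pred, target, alpha=0.5):
    """Task loss plus a weighted feature-matching term.

    student_feat / teacher_feat: flattened intermediate feature maps,
    assumed here to have the same size (a projection layer would be
    needed otherwise). alpha is a hypothetical weighting factor.
    """
    task_term = mse(student_pred, target)
    feature_term = mse(student_feat, teacher_feat)
    return task_term + alpha * feature_term


loss = distillation_loss([1.0, 2.0], [1.0, 4.0], [0.0], [1.0], alpha=0.5)
```

With the parameter counts given in the abstract, the student is roughly 33 times smaller than the teacher (795k / 24k).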

A method of generating virtual shadow dataset of buildings for the shadow detection and removal

  • Kim, Kangjik;Chun, Junchul
    • Journal of Internet Computing and Services / v.21 no.5 / pp.49-56 / 2020
  • Detecting shadows in images and restoring or removing them is a very challenging task in computer vision. Traditional approaches used color information, edges, and thresholds to detect shadows, but they suffered from errors such as ignoring the penumbra of a shadow or detecting dark areas that are not shadows. Deep learning has been successful in various fields of computer vision, and research on applying it to shadow detection and removal has begun. However, collecting data for network training is very difficult and time-consuming, and shooting conditions are often restrictive. In particular, shadow data for buildings and satellite images are even harder to obtain, which has hindered the progress of research. In this paper, we propose a method for generating shadow data for buildings and satellite imagery using Unity3D. 3D objects that exist in the real world were placed in the virtual Unity space, and shadows were generated with lighting effects and then captured. This makes it possible to obtain all three types of images (shadow-free image, shadow image, and shadow mask) needed for shadow detection and removal when training deep learning networks. The proposed method helps advance research on building and satellite shadow detection and removal, where training deep learning networks is difficult because data are lacking, by providing large amounts of data. It can also serve as a workable, if suboptimal, alternative to real data. We believe our contribution is that virtual data can be used to test deep learning networks before real data are applied.

An active learning method with difficulty learning mechanism for crack detection

  • Shu, Jiangpeng;Li, Jun;Zhang, Jiawei;Zhao, Weijian;Duan, Yuanfeng;Zhang, Zhicheng
    • Smart Structures and Systems / v.29 no.1 / pp.195-206 / 2022
  • Crack detection is essential for the inspection of existing structures, and crack segmentation based on deep learning is a significant solution. However, datasets are usually one of the key issues. When building a new dataset for deep learning, the laborious and time-consuming annotation of a large number of crack images is an obstacle. The aim of this study is to develop an approach that automatically selects a small portion of the most informative crack images from a large pool for annotation, instead of labeling all crack images. An active learning method with a difficulty learning mechanism for crack segmentation tasks is proposed. Experiments are carried out on a crack image dataset of a steel box girder, which contains 500 images of size 320×320 for training, 100 for validation, and 190 for testing. In the active learning experiments, the 500 training images are treated as the unlabeled pool. The acquisition function in our method is compared with traditional acquisition functions, i.e., Query-By-Committee (QBC), Entropy, and Core-set. Further, comparisons are made on four common segmentation networks: U-Net, DeepLabV3, Feature Pyramid Network (FPN), and PSPNet. The results show that when trained on the 200 (40%) most informative crack images selected by our method, the four segmentation networks achieve 92%-95% of the performance obtained when trained on all 500 (100%) crack images. The acquisition function in our method measures the informativeness of unlabeled crack images more accurately than the traditional acquisition functions at most active learning stages. Our method can select the most informative images for annotation from many unlabeled crack images automatically and accurately. Additionally, the dataset built by selecting 40% of all crack images can support crack segmentation networks that achieve more than 92% of the performance obtained when all the images are used.
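To make the idea of an acquisition function concrete, the sketch below implements the Entropy baseline named above, not the paper's difficulty learning mechanism, whose details the abstract does not give. Each unlabeled image is scored by the mean per-pixel binary entropy of the model's predicted crack probabilities, and the highest-scoring images are selected for annotation. The function names and the dictionary layout are assumptions for illustration.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli distribution with probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def image_uncertainty(prob_map):
    """Mean per-pixel entropy of a flattened crack-probability map."""
    return sum(binary_entropy(p) for p in prob_map) / len(prob_map)


def select_most_informative(prob_maps, budget):
    """Return the names of the `budget` most uncertain images.

    prob_maps: dict mapping an image name to its flattened probability map.
    """
    scores = {name: image_uncertainty(pm) for name, pm in prob_maps.items()}
    return sorted(scores, key=scores.get, reverse=True)[:budget]


# Toy pool: "a" is maximally uncertain, "b" is confidently predicted.
pool = {"a": [0.5, 0.5], "b": [0.99, 0.01], "c": [0.7, 0.4]}
to_annotate = select_most_informative(pool, budget=2)
```

In the setting of the abstract, `budget` would correspond to 200 of the 500 pooled training images, typically reached over several selection rounds.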

Improved Inference for Human Attribute Recognition using Historical Video Frames

  • Ha, Hoang Van;Lee, Jong Weon;Park, Chun-Su
    • Journal of the Semiconductor & Display Technology / v.20 no.3 / pp.120-124 / 2021
  • Recently, human attribute recognition (HAR) has attracted a lot of attention due to its wide application in video surveillance systems. Recent deep-learning-based solutions for HAR require time-consuming training processes. In this paper, we propose a post-processing technique that uses historical video frames to improve prediction results without retraining or modifying existing deep-learning-based classifiers. Experimental results on a large-scale benchmark dataset show the effectiveness of our proposed method.
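The abstract does not say how the historical frames are combined, so the following is only one simple possibility: averaging per-attribute probabilities from an unchanged classifier over a sliding window of recent frames. The class name `TemporalSmoother` and the window size are hypothetical.

```python
from collections import deque


class TemporalSmoother:
    """Average per-attribute probabilities over the last `window` frames.

    This wraps the outputs of an existing classifier and needs no
    retraining; it is an assumed illustration, not the paper's method.
    """

    def __init__(self, window=5):
        self.history = deque(maxlen=window)  # oldest frame is dropped automatically

    def update(self, probs):
        """Add the current frame's probabilities and return the smoothed ones."""
        self.history.append(list(probs))
        n = len(self.history)
        return [sum(frame[i] for frame in self.history) / n
                for i in range(len(probs))]


smoother = TemporalSmoother(window=3)
first = smoother.update([1.0, 0.0])   # only one frame in history so far
second = smoother.update([0.0, 1.0])  # average of the two frames seen
```

Averaging trades a little latency for stability: a single misclassified frame has less influence on the reported attributes.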

Arabic Text Recognition with Harakat Using Deep Learning

  • Ashwag, Maghraby;Esraa, Samkari
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.41-46 / 2023
  • Because of the significant role that harakat play in Arabic text, this paper uses deep learning to extract Arabic text with its harakat from an image. Convolutional and recurrent neural networks were applied to the dataset, which contained 110 images, each representing one word. The results showed the ability to extract some letters with harakat.

Deep Learning Description Language for Referring to Analysis Model Based on Trusted Deep Learning (신뢰성있는 딥러닝 기반 분석 모델을 참조하기 위한 딥러닝 기술 언어)

  • Mun, Jong Hyeok;Kim, Do Hyung;Choi, Jong Sun;Choi, Jae Young
    • KIPS Transactions on Software and Data Engineering / v.10 no.4 / pp.133-142 / 2021
  • With the recent advancements in deep learning, companies in fields such as smart homes, healthcare, and intelligent transportation systems are using it to provide high-quality services for vehicle detection, emergency situation detection, and energy consumption control. To provide reliable services in such sensitive systems, deep learning models are required to have high accuracy. To develop a deep learning model for these services, developers should use state-of-the-art deep learning models that have already been verified to have high accuracy. Developers can verify the accuracy of a referenced model by validating it on the dataset. For this validation, the developer needs structural information to document and apply deep learning models, including metadata such as the training dataset, network architecture, and development environment. In this paper, we propose a description language that represents the network architecture of a deep learning model along with the metadata necessary to develop it. Through the proposed description language, developers can easily verify the accuracy of the referenced deep learning model. Our experiments demonstrate an application scenario of a deep learning description document that focuses on license plate recognition for the detection of illegally parked vehicles.

A Mask Wearing Detection System Based on Deep Learning

  • Yang, Shilong;Xu, Huanhuan;Yang, Zi-Yuan;Wang, Changkun
    • Journal of Multimedia Information System / v.8 no.3 / pp.159-166 / 2021
  • COVID-19 has dramatically changed people's daily lives. Wearing masks is considered a simple but effective way to limit the spread of the epidemic. Hence, a real-time and accurate mask wearing detection system is important. In this paper, a deep learning-based mask wearing detection system is developed to help people defend against the epidemic. The system provides three main functions: image detection, video detection, and real-time detection. To keep a high detection rate, a deep learning-based method is adopted to detect masks. Because of the suddenness of the epidemic, mask wearing datasets are scarce, so a mask wearing dataset is collected in this paper. In addition, to reduce the computational cost and runtime, a simple online and real-time tracking method is adopted for video detection and monitoring. Furthermore, a function is implemented that accesses the camera to perform mask wearing detection in real time. The experimental results show that the developed system performs well in the mask wearing detection task. The precision, recall, mAP, and F1 reach 86.6%, 96.7%, 96.2%, and 91.4%, respectively.
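The reported F1 can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below performs that check; the function name is chosen for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values from the abstract: precision 86.6%, recall 96.7%.
f1_percent = round(f1_score(0.866, 0.967) * 100, 1)
```

The result rounds to 91.4, which agrees with the F1 value stated in the abstract.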

Tissue Level Based Deep Learning Framework for Early Detection of Dysplasia in Oral Squamous Epithelium

  • Gupta, Rachit Kumar;Kaur, Mandeep;Manhas, Jatinder
    • Journal of Multimedia Information System / v.6 no.2 / pp.81-86 / 2019
  • Deep learning is emerging as one of the best tools for processing medical imaging data. In our research work, we propose a deep learning framework based on a CNN (Convolutional Neural Network) for the classification of dysplastic tissue images. The CNN classifies the given images into 4 classes: normal tissue, mild dysplastic tissue, moderate dysplastic tissue, and severe dysplastic tissue. The dataset used for the study consists of 672 tissue images of the epithelial squamous layer of the oral cavity, captured from the biopsy samples of 52 patients. After data pre-processing and augmentation were applied to the dataset, 2688 images were created. These 2688 images were then classified into 4 categories with the help of an expert oral pathologist. The classified data were supplied to the convolutional neural network for training and testing of the proposed framework. The framework achieved 91.65% accuracy on the training data and 89.3% accuracy on the testing data. The results produced by our proposed framework were also tested and validated by comparing them with the manual results produced by medical experts working in this area.