• Title/Summary/Keyword: Video extraction

The Kernel Trick for Content-Based Media Retrieval in Online Social Networks

  • Cha, Guang-Ho
    • Journal of Information Processing Systems / v.17 no.5 / pp.1020-1033 / 2021
  • Online and mobile social network services (SNS) are now widely used in daily life to share, disseminate, and search information instantly. In particular, services such as YouTube, Flickr, Facebook, and Amazon allow users to upload billions of images or videos and also provide a large amount of multimedia information to users. Information retrieval in multimedia-rich SNS is a very useful but challenging task. Content-based media retrieval (CBMR) is the process of obtaining the relevant image or video objects for a given query from a collection of information sources. However, CBMR suffers from the curse of dimensionality due to the inherently high-dimensional features of media data. This paper investigates the effectiveness of the kernel trick in CBMR, specifically kernel principal component analysis (KPCA) for feature extraction or dimensionality reduction. KPCA is a nonlinear extension of linear principal component analysis (LPCA) that discovers nonlinear embeddings using the kernel trick. The fundamental idea of KPCA is to map the input data into a high-dimensional feature space through a nonlinear kernel function and then compute the principal components in that mapped space. Using the Gaussian kernel in our experiments, we compute the principal components of an image dataset in the transformed space and then use them as new feature dimensions for the image dataset. Moreover, KPCA can be applied to many other domains, including CBMR, where LPCA has been used to extract features and where the nonlinear extension would be effective. Our results from extensive experiments demonstrate that KPCA is very encouraging compared with LPCA in CBMR.
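
The KPCA steps described in the abstract above (Gaussian kernel, centering in the mapped space, principal components used as new features) can be sketched in pure Python as follows. This is a minimal illustration and not the paper's implementation: the function names, the `gamma` parameter, and the use of power iteration to find only the leading component are assumptions made for the example.

```python
import math


def gaussian_kernel_matrix(points, gamma):
    """K[i][j] = exp(-gamma * ||x_i - x_j||^2) for every pair of points."""
    n = len(points)
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


def center_kernel(K):
    """Center the kernel matrix so the mapped data has zero mean in feature space."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]  # K is symmetric: row mean == column mean
    total_mean = sum(row_mean) / n
    return [
        [K[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def top_component(Kc, iters=200):
    """Leading eigenvalue and unit eigenvector of Kc by power iteration (assumed solver)."""
    n = len(Kc)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    eigval = 0.0
    for _ in range(iters):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigval = norm
    return eigval, v


def kpca_first_feature(points, gamma=1.0):
    """Project each training point onto the first kernel principal component."""
    Kc = center_kernel(gaussian_kernel_matrix(points, gamma))
    eigval, v = top_component(Kc)
    # With Kc v = eigval * v, the projection of point i is sqrt(eigval) * v[i].
    return [math.sqrt(eigval) * x for x in v]


# Hypothetical toy data: two well-separated clusters of 2D feature vectors.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
features = kpca_first_feature(data, gamma=1.0)
```

On this toy data the first component takes one sign for the first cluster and the opposite sign for the second, so a single new feature dimension already separates them.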

Arousal and Valence Classification Model Based on Long Short-Term Memory and DEAP Data for Mental Healthcare Management

  • Choi, Eun Jeong;Kim, Dong Keun
    • Healthcare Informatics Research / v.24 no.4 / pp.309-316 / 2018
  • Objectives: Both the valence and arousal components of affect are important considerations when managing mental healthcare because they are associated with affective and physiological responses. Research on arousal and valence analysis, which applies deep learning to images, texts, and physiological signals, is actively underway; research investigating how to improve the recognition rate is needed. The goal of this research was to design a deep learning framework and model to classify arousal and valence, indicating positive and negative degrees of emotion as high or low. Methods: The proposed arousal and valence classification model to analyze the affective state was tested using data from 40 channels provided by a dataset for emotion analysis using electroencephalography (EEG), physiological, and video signals (the DEAP dataset). Experiments were based on 10 selected featured central and peripheral nervous system data points, using long short-term memory (LSTM) as a deep learning method. Results: The arousal and valence were classified and visualized on a two-dimensional coordinate plane. Profiles were designed depending on the number of hidden layers, nodes, and hyperparameters according to the error rate. The experimental results show arousal and valence classification accuracies of 74.65% and 78%, respectively. The proposed model performed better than previous models. Conclusions: The proposed model appears to be effective in analyzing arousal and valence; specifically, it is expected that affective analysis using physiological signals based on LSTM will be possible without manual feature extraction. In a future study, the classification model will be adopted in mental healthcare management systems.

Video Stabilization Algorithm of Shaking image using Deep Learning (딥러닝을 활용한 흔들림 영상 안정화 알고리즘)

  • Lee, Kyung Min;Lin, Chi Ho
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.1 / pp.145-152 / 2019
  • In this paper, we propose an algorithm for stabilizing shaky video using deep learning. Unlike some 2D, 2.5D, and 3D stabilization techniques, the proposed algorithm uses deep learning. It extracts and compares features of shaky images through a CNN and an LSTM network, and transforms each image opposite to the magnitude and direction of the feature-point movement, which is obtained from the difference in feature points between the previous frame and the current frame. The CNN and LSTM structures for feature extraction and comparison of each frame are implemented with TensorFlow, and image stabilization is implemented with the open-source OpenCV library. Experimental results show that the proposed algorithm can stabilize camera shake in images shaking up, down, left, and right.
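
The compensation step described in the abstract above (moving the image opposite to the feature-point motion between the previous and current frame) can be illustrated with a short sketch. It assumes the matched feature points are already available, whereas the paper obtains them with CNN and LSTM networks, and it models the shake as a pure integer translation. All function names are hypothetical.

```python
def mean_displacement(prev_pts, curr_pts):
    """Average (dx, dy) of matched feature points from the previous to the current frame."""
    n = len(prev_pts)
    dx = sum(c[0] - p[0] for p, c in zip(prev_pts, curr_pts)) / n
    dy = sum(c[1] - p[1] for p, c in zip(prev_pts, curr_pts)) / n
    return dx, dy


def shift_frame(frame, dx, dy, fill=0):
    """Translate a 2D grayscale frame (list of rows) by integer (dx, dy)."""
    h, w = len(frame), len(frame[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy  # source pixel that lands on (x, y)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = frame[sy][sx]
    return out


def stabilize(frame, prev_pts, curr_pts):
    """Undo the estimated shake by shifting the frame opposite to the motion."""
    dx, dy = mean_displacement(prev_pts, curr_pts)
    return shift_frame(frame, -round(dx), -round(dy))


# Hypothetical 4x4 frame with one bright pixel at (x=1, y=1).
reference = [[0] * 4 for _ in range(4)]
reference[1][1] = 255
shaken = shift_frame(reference, 1, 0)  # simulate a shake of one pixel to the right
restored = stabilize(shaken, prev_pts=[(1, 1)], curr_pts=[(2, 1)])
```

A real pipeline would typically estimate a richer transform (rotation, scale) and smooth the motion over time; the translation-only model here is chosen only to keep the example small.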

A Method of Detection of Deepfake Using Bidirectional Convolutional LSTM (Bidirectional Convolutional LSTM을 이용한 Deepfake 탐지 방법)

  • Lee, Dae-hyeon;Moon, Jong-sub
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.6 / pp.1053-1065 / 2020
  • With the recent development of hardware performance and artificial intelligence technology, sophisticated fake videos that are difficult to distinguish with the human eye are increasing. Face synthesis technology using artificial intelligence is called Deepfake, and anyone with a little programming skill and deep learning knowledge can produce sophisticated fake videos using Deepfake. The number of indiscriminately produced fake videos has increased significantly, which may lead to problems such as privacy violations, fake news, and fraud. Therefore, it is necessary to detect fake video clips that cannot be discriminated by the human eye. In this paper, we propose a Deepfake detection model that applies a Bidirectional Convolutional LSTM and an Attention Module. Unlike LSTM, which considers only the forward sequential procedure, the proposed model also uses the reverse-order procedure. The Attention Module is used with a convolutional neural network model to use the characteristics of each frame for extraction. Experiments have shown that the proposed model achieves 93.5% accuracy and an AUC up to 50% higher than the results of pre-existing studies.

Endoscopic Retrograde Cholangiopancreatography in Bangladeshi Children: Experiences and Challenges in a Developing Country

  • Rashid, Rafia;Arfin, Md. Samsul;Karim, A.S.M. Bazlul;Alam, Muhammad Baharul;Mahmud, Salahuddin
    • Pediatric Gastroenterology, Hepatology & Nutrition / v.25 no.4 / pp.332-339 / 2022
  • Purpose: Although endoscopic retrograde cholangiopancreatography (ERCP) has been used for more than five decades, its use in Bangladeshi children has only recently become more common. Therefore, this manuscript aims to describe our experience in performing ERCPs in Bangladeshi children with hepatopancreaticobiliary diseases, focusing on presenting diseases, as well as the diagnostic and therapeutic efficacy. Methods: Between 2018 and 2021, 20 children underwent 30 ERCP procedures at the Bangladesh Specialized Hospital, Dhaka. A single trained adult gastroenterologist performed all procedures using a therapeutic video duodenoscope. The indications for ERCP, diagnostic findings, therapeutic procedures, and complications were documented. Results: The median age of the study patients was 10 years (range, 1.7-15 years). Successful cannulation of the papilla was achieved in 28 procedures and failed in 2 cases. Repeated ERCP was required in seven patients. Nine patients had biliary indications and 11 had pancreatic indications. Choledocholithiasis was the most common indication for ERCP in patients with biliary disease, while chronic pancreatitis was common among patients with pancreatic indications. Pancreas divisum was observed in only one patient. Pancreatic and biliary sphincterotomy was performed in 14 and 9 cases, respectively. A single pigtail or straight therapeutic stent was inserted in seven cases and removed in five cases. Stone extraction was performed in six procedures, and balloon dilatation was performed in five procedures. The post-procedural period for these patients was uneventful. Conclusion: We found that ERCP is a practical and successful therapeutic intervention for treating hepatopancreaticobiliary disorders in children when performed by experienced endoscopists.

A Research on Image Metadata Extraction through YCrCb Color Model Analysis for Media Hyper-personalization Recommendation (미디어 초개인화 추천을 위한 YCrCb 컬러 모델 분석을 통한 영상의 메타데이터 추출에 대한 연구)

  • Park, Hyo-Gyeong;Yong, Sung-Jung;You, Yeon-Hwi;Moon, Il-Young
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.277-280 / 2021
  • Recently, as various content is mass-produced thanks to high accessibility, the media content market has become more active. Users want to find content that suits their taste, and platforms are competing to provide personalized content recommendations. An efficient recommendation system requires high-quality metadata. On existing platforms, the user directly inputs the metadata of a video, which wastes time and money when processing large amounts of data. In this paper, for media hyper-personalization recommendation, keyframes are extracted from movie trailers based on the YCrCb color model of the video, and movie genres are distinguished through supervised learning. In the future, we would like to propose a plan for using this approach to generate metadata.
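
The abstract above does not state how keyframes are selected from the YCrCb representation, so the following is only a sketch under assumptions: pixels are converted with the common full-range BT.601 coefficients, and a frame becomes a keyframe when its mean Y, Cr, and Cb values differ from those of the last keyframe by more than a threshold. The threshold value and all function names are hypothetical.

```python
def rgb_to_ycrcb(r, g, b):
    """Convert one 8-bit RGB pixel to YCrCb (full-range BT.601 coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + 128
    cb = (b - y) * 0.564 + 128
    return y, cr, cb


def mean_ycrcb(frame):
    """Average Y, Cr, Cb over a frame given as a flat list of (r, g, b) pixels."""
    converted = [rgb_to_ycrcb(*px) for px in frame]
    n = len(converted)
    return tuple(sum(c[i] for c in converted) / n for i in range(3))


def select_keyframes(frames, threshold=30.0):
    """Indices of frames whose mean YCrCb differs enough from the last keyframe."""
    keyframes = []
    last = None
    for idx, frame in enumerate(frames):
        current = mean_ycrcb(frame)
        # Assumed criterion: L1 distance between mean color vectors.
        if last is None or sum(abs(a - b) for a, b in zip(current, last)) > threshold:
            keyframes.append(idx)
            last = current
    return keyframes


# Hypothetical four-frame clip: two nearly identical red frames, then two blue frames.
clip = [
    [(255, 0, 0)] * 4,
    [(250, 0, 0)] * 4,
    [(0, 0, 255)] * 4,
    [(0, 0, 255)] * 4,
]
keys = select_keyframes(clip)
```

Here the first frame and the first blue frame are selected, because only the red-to-blue cut changes the mean color by more than the threshold.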


3D Object Extraction Mechanism from Informal Natural Language Based Requirement Specifications (비정형 자연어 요구사항으로부터 3D 객체 추출 메커니즘)

  • Hyuntae Kim;Janghwan Kim;Jihoon Kong;Kidu Kim;R. Young Chul Kim
    • The Transactions of the Korea Information Processing Society / v.13 no.9 / pp.453-459 / 2024
  • Recent advances in generative AI technologies using natural language processing have critically impacted text, image, and video production. Despite these innovations, we still need to improve the consistency and reusability of AI-generated outputs. These issues are critical in cartoon creation, where the inability to consistently replicate characters and specific objects can degrade the work's quality. We propose an integrated adaption of language analysis-based requirement engineering and cartoon engineering to solve this. The proposed method applies the linguistic frameworks of Chomsky and Fillmore to analyze natural language and utilizes UML sequence models for generating consistent 3D representations of object interactions. It systematically interprets the creator's intentions from textual inputs, ensuring that each character or object, once conceptualized, is accurately replicated across various panels and episodes to preserve visual and contextual integrity. This technique enhances the accuracy and consistency of character portrayals in animated contexts, aligning closely with the initial specifications. Consequently, this method holds potential applicability in other domains requiring the translation of complex textual descriptions into visual representations.

A Study on an Open/Closed Eye Detection Algorithm for Drowsy Driver Detection (운전자 졸음 검출을 위한 눈 개폐 검출 알고리즘 연구)

  • Kim, TaeHyeong;Lim, Woong;Sim, Donggyu
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.7 / pp.67-77 / 2016
  • In this paper, we propose an algorithm for open/closed eye detection based on the modified Hausdorff distance. The proposed algorithm consists of two parts: face detection and open/closed eye detection. To detect faces in an image, the MCT (Modified Census Transform) is employed, which is based on characteristics of the local structure and uses relative pixel values in a fixed-size area. Then, the coordinates of the eyes are found and open/closed eyes are detected using the MHD (Modified Hausdorff Distance) in the detected face region. First, the face detection process creates MCT images from various face images and extracts criteria features by PCA (Principal Component Analysis) offline. After extracting the criteria features, it detects a face region by comparing features newly extracted from the input face image with the criteria features using the Euclidean distance. Afterward, the process finds the coordinates of the eyes and detects open/closed eyes using template matching based on the MHD in each eye region. In the performance evaluation, the proposed algorithm achieved 94.04% accuracy on average for open/closed eye detection on grayscale test video sequences at 30 FPS and 320×180 resolution.
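
The template-matching step in the abstract above relies on the modified Hausdorff distance between two point sets. A minimal sketch follows, using the usual definition (the larger of the two directed mean nearest-neighbor distances). The `classify_eye` helper and its templates are hypothetical and only illustrate how such a distance could drive an open/closed decision.

```python
import math


def directed_mean_distance(a, b):
    """Mean over points in a of the distance to the nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a, b):
    """Modified Hausdorff distance: the larger of the two directed mean distances."""
    return max(directed_mean_distance(a, b), directed_mean_distance(b, a))


def classify_eye(edge_points, open_template, closed_template):
    """Label an eye region by the closer template (hypothetical decision rule)."""
    d_open = modified_hausdorff(edge_points, open_template)
    d_closed = modified_hausdorff(edge_points, closed_template)
    return "open" if d_open <= d_closed else "closed"


# Hypothetical edge-point templates: a closed eye as a flat line, an open eye as a ring.
closed_tpl = [(x, 0) for x in range(5)]
open_tpl = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0), (1, -1), (2, -2), (3, -1)]
observed = [(0, 0), (1, 1), (2, 2), (3, 1), (4, 0), (2, -2)]
label = classify_eye(observed, open_tpl, closed_tpl)
```

Averaging the nearest-neighbor distances, instead of taking their maximum as in the classical Hausdorff distance, makes the measure less sensitive to a few outlier edge points.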

On-Road Car Detection System Using VD-GMM 2.0 (차량검출 GMM 2.0을 적용한 도로 위의 차량 검출 시스템 구축)

  • Lee, Okmin;Won, Insu;Lee, Sangmin;Kwon, Jangwoo
    • The Journal of Korean Institute of Communications and Information Sciences / v.40 no.11 / pp.2291-2297 / 2015
  • This paper presents a vehicle detection system that takes video of moving vehicles as its input. The input image has constraints: it must be captured from a fixed viewpoint looking obliquely downward from above the road. Road detection is required so that only the road area in the input image is used. In the introduction, we present experimental results and the limitations of the motion history image extraction method, the SIFT (Scale-Invariant Feature Transform) algorithm, and histogram analysis for detecting vehicles. To solve these problems, we propose an adapted Gaussian Mixture Model (GMM), the Vehicle Detection GMM (VDGMM). In addition, we optimize VDGMM to detect more vehicles and name the result VDGMM 2.0. In the experiments, precision, recall, and F1 are 9%, 53%, and 15%, respectively, for GMM without road detection and 85%, 77%, and 80% for VDGMM 2.0 with road detection.
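
The three reported metrics are linked by the standard definition of F1 as the harmonic mean of precision and recall. The sketch below recomputes F1 from the rounded precision and recall in the abstract above; the helper names and the detection counts in the last example are illustrative and not taken from the paper.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, as fractions.
f1_gmm = f1_score(0.09, 0.53)    # GMM without road detection
f1_vdgmm = f1_score(0.85, 0.77)  # VDGMM 2.0 with road detection

# Hypothetical counts: 8 correct detections, 2 false alarms, 2 missed vehicles.
p, r = precision_recall(tp=8, fp=2, fn=2)
```

Recomputed this way, F1 comes out at about 0.15 for plain GMM and about 0.81 for VDGMM 2.0, in line with the reported 15% and 80%; the small gap in the second figure is presumably due to rounding of the published precision and recall.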

An Algorithm of Fingerprint Image Restoration Based on an Artificial Neural Network (인공 신경망 기반의 지문 영상 복원 알고리즘)

  • Jang, Seok-Woo;Lee, Samuel;Kim, Gye-Young
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.8 / pp.530-536 / 2020
  • The use of minutiae by fingerprint readers is robust against presentation attacks, but one weakness is that the mismatch rate is high. Therefore, minutiae tend to be used with skeleton images. There have been many studies on security vulnerabilities in the characteristics of minutiae, but vulnerability studies on the skeleton are scarce, so this study analyzes the vulnerability of the skeleton to presentation attacks. To this end, we propose a skeleton-based method to recover the original fingerprint using a learning algorithm. The proposed method includes a new learning model that adds a latent vector to the existing Pix2Pix model, thereby generating a natural fingerprint. In the experiments, the original fingerprint is restored using the proposed learning model, and the restored fingerprint is then input to the fingerprint reader, where it achieves a good recognition rate. Thus, this study verifies that fingerprint readers using the skeleton are vulnerable to presentation attacks. The approach presented in this paper is expected to be useful in a variety of applications concerning fingerprint restoration, video security, and biometrics.