• Title/Summary/Keyword: Image to Speech


Blockchain Technology for Combating Deepfake and Protect Video/Image Integrity

  • Rashid, Md Mamunur;Lee, Suk-Hwan;Kwon, Ki-Ryong
    • Journal of Korea Multimedia Society / v.24 no.8 / pp.1044-1058 / 2021
  • Tampered electronic content has multiplied in the last few years, thanks to the emergence of sophisticated artificial intelligence (AI) algorithms. Deepfakes (fake footage, photos, speech, and videos) can be a frightening and destructive phenomenon with the capacity to distort facts and damage reputations by presenting a fake reality. Evidence of ownership or authentication of digital material is crucial for combating the current influx of fabricated content. Current solutions lack the capacity to track the history and provenance of digital media. Given the rise of misrepresentation created by technologies such as deepfakes, detection algorithms are required to verify the integrity of digital content. Many real-world scenarios have been claimed to benefit from blockchain's authentication capabilities. Despite scattered efforts surrounding such remedies, relatively little research has been undertaken to discover where blockchain technology can be used to tackle the deepfake problem. Recent blockchain-based innovations such as smart contracts and Hyperledger Fabric can play a vital role against the manipulation of digital content. The goal of this paper is to summarize and discuss ongoing research on blockchain's capabilities for protecting the authenticity of digital content. We also propose a blockchain (smart contract) based framework that preserves the data integrity of original content and thus helps prevent deepfakes. This study also discusses how blockchain technology can be used more effectively in deepfake prevention and highlights the current state of deepfake video detection research, including the generation process, various detection algorithms, and existing benchmarks.

Development of Character Goods Content Utilizing Marker-based Augmented Reality (마커기반 증강현실을 활용한 캐릭터 굿즈 콘텐츠 개발)

  • AHN CHAN JE
    • The Journal of the Convergence on Culture Technology / v.10 no.3 / pp.953-958 / 2024
  • Recently, there has been growing interest in the Fourth Industrial Revolution, with a particular focus on the advancement of augmented reality (AR) devices. However, there is a shortage of AR content. Augmented reality operates through marker-based and markerless methods. The marker-based approach involves using a camera to capture images that serve as markers, enhancing them through AR principles. To address the scarcity of AR content and improve the quality of character goods, this study proposes integrating AR technology into character goods. The character industry is expanding each year, leading to a diverse range of character goods. Character acrylic stands, among these goods, leverage game, webtoon, and animation character IPs for sales. To enhance the design process, we utilized the character image as a marker, allowing for the creation of content that aligns with the characteristics of the character IP. We selected a webtoon character and developed AR content, incorporating features such as voice, speech bubbles, and an introduction to the webtoon, tailored to the webtoon's characteristics. This study demonstrates the potential of AR to present visual and auditory information, paving the way for a variety of products, including diverse content. We anticipate that utilizing this research will lead to the emergence of products encompassing various contents.

Time-Synchronization Method for Dubbing Signal Using SOLA (SOLA를 이용한 더빙 신호의 시간축 동기화)

  • 이기승;지철근;차일환;윤대희
    • Journal of Broadcast Engineering / v.1 no.2 / pp.85-95 / 1996
  • The purpose of this paper is to propose a time-synchronization technique for dubbed signals based on the SOLA (Synchronized Over-Lap and Add) method, which has been widely used to modify the time scale of speech signals. In broadcast audio recording environments, the high level of background noise makes a dubbing process necessary. Since the time difference between the original and the dubbed signal is on the order of 200 milliseconds, processing is required to synchronize the dubbed signal with the corresponding image. The proposed method finds the starting point of the dubbing signal using the short-time energy of the two signals. Thereafter, LPC cepstrum analysis and DTW (Dynamic Time Warping) are applied to synchronize the phoneme positions of the two signals. After the matched points are determined by the minimum mean square error between the original and dubbed LPC cepstra, the SOLA method is applied to the dubbed signal to maintain the consistency of the corresponding phase. The effectiveness of the proposed method is verified by comparing the waveforms and spectrograms of the original and the time-synchronized dubbing signal.
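
The first two stages named in this abstract (locating the start of speech from short-time energy, then aligning frames with DTW) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the energy threshold, and the squared-Euclidean frame distance are assumptions, the per-frame feature vectors are taken as given rather than computed by LPC cepstrum analysis, and the final SOLA resynthesis step is not shown.

```python
def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples in each analysis frame."""
    return [
        sum(x * x for x in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]


def find_onset(energy, threshold_ratio=0.1):
    """Index of the first frame whose energy exceeds a fraction of the peak.

    threshold_ratio is a hypothetical tuning parameter.
    """
    peak = max(energy)
    for i, e in enumerate(energy):
        if e > threshold_ratio * peak:
            return i
    return 0


def dtw_path(a, b):
    """Classic DTW between two sequences of per-frame feature vectors.

    Returns the list of matched (index_in_a, index_in_b) pairs.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Squared Euclidean distance between the two frames (assumed metric).
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end along the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


# Toy usage: the "dubbed" sequence repeats its first frame once.
original = [[0.0], [1.0], [2.0]]
dubbed = [[0.0], [0.0], [1.0], [2.0]]
alignment = dtw_path(original, dubbed)
```

In a pipeline like the one described, the matched frame pairs would indicate where the dubbed signal needs to be stretched or compressed before a time-scale modification step is applied.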


Extracting Rules from Neural Networks with Continuous Attributes (연속형 속성을 갖는 인공 신경망의 규칙 추출)

  • Jagvaral, Batselem;Lee, Wan-Gon;Jeon, Myung-joong;Park, Hyun-Kyu;Park, Young-Tack
    • Journal of KIISE / v.45 no.1 / pp.22-29 / 2018
  • Over the decades, neural networks have been successfully used in numerous applications from speech recognition to image classification. However, these neural networks cannot explain their results and one needs to know how and why a specific conclusion was drawn. Most studies focus on extracting binary rules from neural networks, which is often impractical to do, since data sets used for machine learning applications contain continuous values. To fill the gap, this paper presents an algorithm to extract logic rules from a trained neural network for data with continuous attributes. It uses hyperplane-based linear classifiers to extract rules with numeric values from trained weights between input and hidden layers and then combines these classifiers with binary rules learned from hidden and output layers to form non-linear classification rules. Experiments with different datasets show that the proposed approach can accurately extract logical rules for data with nonlinear continuous attributes.
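
The general idea of pairing hyperplane conditions with a binary rule can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm from the paper: the weights, biases, attribute names, and the conjunction over hidden units are all hypothetical, and each hidden unit is simplified to the threshold test w·x + b > 0.

```python
def fires(weights, bias, x):
    """True when the input lies on the positive side of the unit's hyperplane."""
    return sum(w * v for w, v in zip(weights, x)) + bias > 0


def hidden_unit_rule(weights, bias, names):
    """Render the condition w.x + b > 0 as a readable numeric rule."""
    terms = " + ".join(f"{w:g}*{n}" for w, n in zip(weights, names))
    threshold = 0.0 - bias
    return f"({terms} > {threshold:g})"


def classify(x, hidden_units, conjunction):
    """Apply a binary rule (a conjunction over hidden-unit states) to input x.

    conjunction is a list of (hidden_unit_index, required_state) pairs.
    """
    states = [fires(w, b, x) for w, b in hidden_units]
    return all(states[i] == want for i, want in conjunction)


# Hypothetical trained weights: two hidden units over attributes x1, x2.
hidden_units = [([1.0, 1.0], -0.5), ([1.0, 1.0], -1.5)]
# Hypothetical binary rule from the hidden-to-output layer: h0 AND NOT h1.
conjunction = [(0, True), (1, False)]

rule_text = " AND ".join(
    ("" if want else "NOT ") + hidden_unit_rule(*hidden_units[i], ["x1", "x2"])
    for i, want in conjunction
)
```

Each hidden unit alone is a linear condition, but the conjunction selects the band between the two hyperplanes, which is a non-linear decision region expressed with numeric thresholds on continuous attributes.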

A Study on the Performance of Music Retrieval Based on the Emotion Recognition (감정 인식을 통한 음악 검색 성능 분석)

  • Seo, Jin Soo
    • The Journal of the Acoustical Society of Korea / v.34 no.3 / pp.247-255 / 2015
  • This paper presents a study on the performance of music search based on automatically recognized music-emotion labels. As with other media data, such as speech, image, and video, a song can evoke certain emotions in listeners. When people look for songs to listen to, the emotions evoked by songs can be an important consideration. However, very little study has been done on how well music-emotion labels perform in music search. In this paper, we utilize the three axes of human music perception (valence, activity, tension) and the five basic emotion labels (happiness, sadness, tenderness, anger, fear) in measuring music similarity for music search. Experiments were conducted on both genre and singer datasets. The search accuracy of the proposed emotion-based music search was up to 75% of that of the conventional feature-based music search. By combining the proposed emotion-based method with the feature-based method, we achieved up to a 14% improvement in search accuracy.
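
One simple way to combine an emotion-based similarity with a feature-based similarity is a weighted sum of the two scores, sketched below. The abstract does not state how the two methods were combined, so the cosine similarity, the weight `alpha`, and the data layout are assumptions made purely for illustration. In practice the emotion vector could hold the three axes and five labels listed above; short toy vectors are used here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, library, alpha=0.5):
    """Rank songs by a weighted sum of emotion and feature similarity.

    query   -- (emotion_vector, feature_vector) of the query song
    library -- {title: (emotion_vector, feature_vector)}
    alpha   -- hypothetical weight given to the emotion-based score
    """
    scored = []
    for title, (emotion, feature) in library.items():
        score = alpha * cosine(query[0], emotion) + (1 - alpha) * cosine(query[1], feature)
        scored.append((score, title))
    return [title for _, title in sorted(scored, reverse=True)]


# Toy library: song "A" matches the query's audio features, song "B" its emotion.
library = {
    "A": ([1.0, 0.0], [1.0, 0.0]),
    "B": ([0.0, 1.0], [0.0, 1.0]),
}
query = ([0.0, 1.0], [1.0, 0.0])
emotion_only = rank(query, library, alpha=1.0)
feature_only = rank(query, library, alpha=0.0)
```

Setting `alpha` to 1.0 or 0.0 reduces the ranking to the emotion-only or feature-only search, which makes it easy to compare the two baselines against a combined setting.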

A Study on the Development of Language Education Service Platform for Teaching Assistance Robots (교사도우미 로봇을 활용한 어학교육 서비스 플랫폼 구축방안 연구)

  • Yoo, Gab-Sang;Choi, Jong-Chon
    • Journal of Digital Convergence / v.14 no.8 / pp.223-232 / 2016
  • This study focuses on a new teaching assistance robot platform and a cloud-based education service model on the server side. On the client side, the teaching assistance robot is intended for use in elementary school classrooms as the front end of the language education service platform. Emerging IoT technology will be adopted to provide a comfortable classroom environment and various media interfaces. An extensive review of precedents and case studies was conducted to identify the basic requirements of the proposed service platform. Embedded systems and technologies for image recognition, speech recognition, autonomous movement, display, touch screen, IR sensor, GPS, and temperature-humidity sensor were extensively investigated to complete the service. The key findings of this paper are an optimized service platform with a cloud server system and the potential for a smart classroom with an intelligent robot through the adoption of IoT and BIM technology.

Research Trends for the Deep Learning-based Metabolic Rate Calculation (재실자 활동량 산출을 위한 딥러닝 기반 선행연구 동향)

  • Park, Bo-Rang;Choi, Eun-Ji;Lee, Hyo Eun;Kim, Tae-Won;Moon, Jin Woo
    • KIEAE Journal / v.17 no.5 / pp.95-100 / 2017
  • Purpose: The purpose of this study is to survey prior deep learning-based work that could be used to objectively calculate the metabolic rate, which is a subjective factor in optimal PMV control, and to plan future research on that basis. Methods: For this purpose, a theoretical and technical review and an applicability analysis were conducted using various domestic and foreign documents and data. Results: The survey of prior work showed that machine learning models based on artificial neural networks and deep learning have been used in various fields such as speech recognition, scene recognition, and image restoration. As a representative case, OpenCV Background Subtraction is a technique for separating backgrounds from objects or people. PASCAL VOC and ILSVRC were surveyed as representative technologies that can recognize people, objects, and backgrounds. From the results of previous deep learning research relevant to occupant metabolic rate, basic technologies applicable to the occupant metabolic rate calculation technology to be developed in future research were identified. A study on the development of a highly accurate activity-level calculation model is expected to follow.

Efficient Thread Allocation Method of Convolutional Neural Network based on GPGPU (GPGPU 기반 Convolutional Neural Network의 효율적인 스레드 할당 기법)

  • Kim, Mincheol;Lee, Kwangyeob
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.7 no.10 / pp.935-943 / 2017
  • The CNN (convolutional neural network), which is used for image classification and speech recognition among neural networks that learn from positive data, has been continuously developed toward high-performance structures. However, it is difficult to use in an embedded system with limited resources. We therefore use pre-learned weights together with GPGPU (General-Purpose Computing on Graphics Processing Units), which applies the GPU to general-purpose computation, but limitations remain. Since a CNN performs simple, repetitive operations, computation speed varies greatly depending on how threads are allocated and utilized on a Single Instruction Multiple Thread (SIMT) based GPGPU. To address this, we note that some threads are left idle when convolution and pooling operations are performed. Operation speed was increased by using these remaining threads for the subsequent feature map and kernel calculations.

Deconstructing the Genealogy of Orientalism in Term of a Supplement (『오리엔탈리즘』 계보학의 해체론적 재해석 "Truths are illusions which we have forgotten are illusions" (진리란 그것이 환상임을 망각하고 있는 착각이다))

  • Choi, Su
    • English & American cultural studies / v.17 no.2 / pp.29-61 / 2017
  • Said's Orientalism criticized European representations of the Middle East by theorizing orientalism as a discourse. In this text, he explored and criticized the colonial forms of knowledge and language that distorted the image of the colonized. The justification of the discourse of orientalism derives from the binary system originating with Plato, which Derrida rejects on the grounds that it always privileges one term over the other, that is, colonizer over colonized. Derrida names this traditional heritage of the Western binary system logocentrism, which regards logos (the Greek term for speech or reason) as the central principle of language and philosophy, whereas mythos derives its meaning from logos on the basis of binary oppositions. Thus, according to logocentrism, the colonized is merely the defined, which can take its meaning only from the definers, the colonizers. In this paper, utilizing Derrida's (non)concept of the supplement, which means both to add on as a surplus, a mere extra, and to make up for something missing, I propose an alternative interpretation of the critique of colonial representation by raising internal contradictions in the Platonic dichotomy between logos and mythos embedded in the Western colonial discourse of orientalism. I attempt to show that logos (colonizer) and mythos (colonized) are inseparable because they exist as supplements to each other. For this purpose, I demonstrate how the colonial binary system constituted and was constituted in terms of language. Through this paper I reinterpret the colonial rationality of privileging 'logos' over 'mythos' by substituting the supplement for the colonial binary system.

Darkness at the Heart of Anti-Imperialism: Racism in Conrad's Heart of Darkness (반제국주의 속의 어둠 -『암흑의 핵심』에 나타난 인종주의)

  • Shin, Moonsu
    • Journal of English Language & Literature / v.55 no.1 / pp.61-82 / 2009
  • This paper aims to reexamine the issue of racism in Conrad's Heart of Darkness, especially in the light of Chinua Achebe's critique of the novella, in his 1975 speech at the University of Massachusetts titled "An Image of Africa," as a racist text entrenched in European prejudices about Africa and its people. While the novella's indictment of imperial exploitation has been noted from an early stage of its critical reception, its racism had hardly been discussed until Achebe raised it. Achebe offers the canonized status of the text as a modernist classic, "the most commonly prescribed novel in twentieth-century literature courses," as one reason for its obvious manifestations of racism being glossed over. One may add that Conrad's militant denunciation of imperialist enterprises as "a sordid farce," his seemingly radical stance against imperialism, serves as an ideological constraint upon his readers, blinding them to its immanent racism. On closer inspection, the novella's attack on imperialism turns out to be contradictory, for it also endorses such liberal-humanist ideas as the civilizing mission, the work ethic, and the superiority of civilized man, all of which served to prop up European imperialism at the end of the nineteenth century. This ideological contradiction also accounts for Conrad's racist attitude, which is betrayed in his portrayal of Africans as obscure and primitive. Euro-American imperialism has frequently justified itself by recourse to racism, but racism has not always been allied with imperialism. Some staunch racists such as Robert Knox and Arthur de Gobineau opposed imperialism, and Conrad proves to be one such case, whose critique of imperialism is voiced in ways that can be characterized as racist.