• Title/Summary/Keyword: Image to Speech

A Study on Analysis of Variant Factors of Recognition Performance for Lip-reading at Dynamic Environment (동적 환경에서의 립리딩 인식성능저하 요인분석에 대한 연구)

  • 신도성;김진영;이주헌
    • The Journal of the Acoustical Society of Korea
    • /
    • v.21 no.5
    • /
    • pp.471-477
    • /
    • 2002
  • Recently, lip-reading has been actively studied as an auxiliary method for automatic speech recognition (ASR) in noisy environments. However, almost all research results have been obtained from databases constructed under indoor conditions, so it is not known how robust the developed lip-reading algorithms are to dynamic variation in the image. We have developed a lip-reading system based on an image-transform algorithm. The system recognizes 22 words and achieves a word recognition rate of up to 53.54%. In this paper we examine how stable the lip-reading system is under environmental variance and which factors contribute most to the drop in word recognition performance. To study lip-reading robustness, we consider spatial variance (translation, rotation, scaling) and illumination variance. Two kinds of test data are used: a simulated lip image database and a real dynamic database captured in a car environment. Our experiments show that spatial variance is one of the factors that degrade lip-reading performance, but it is not the most important one: illumination variance reduces recognition rates by as much as 70%. In conclusion, lip-reading algorithms that are robust to illumination variance should be developed if lip reading is to be used as a complementary method for ASR.

Crossing the "Great Fire Wall": A Study with Grounded Theory Examining How China Uses Twitter as a New Battlefield for Public Diplomacy

  • Guo, Jing
    • Journal of Public Diplomacy
    • /
    • v.1 no.2
    • /
    • pp.49-74
    • /
    • 2021
  • In this paper, I apply grounded theory to explore how Twitter became the battlefield for China's public diplomacy campaign. China's new move to global social media platforms, such as Twitter and Facebook, has been a controversial strategy in public diplomacy. This study analyzes Chinese Foreign Ministry Spokesperson Zhao Lijian's Twitter posts and the comments on them. It models China's recent diplomatic move to Twitter as a "war of words" model, with features including "leadership," "polarization," and "aggression," and with possible effects on the global community such as "resistance," "hatred," and "sarcasm." The findings show that, by failing to gauge public opinion and promote the country's positive image, China's current digital diplomacy strategy, as reflected in Zhao Lijian's tweets, has instead constructed a polarized political public sphere, contrary to the country's promoted "shared human destiny." The "war of words" model extends our understanding of China's new digital diplomacy move as a hybrid of state propaganda and self-performance. Such a strategy could spread hate speech and accelerate political polarization in cyberspace, despite improvements to China's homogeneous network building on Twitter.

A Review on Advanced Methodologies to Identify the Breast Cancer Classification using the Deep Learning Techniques

  • Bandaru, Satish Babu;Babu, G. Rama Mohan
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.4
    • /
    • pp.420-426
    • /
    • 2022
  • Breast cancer is among the cancers that can be cured if the disease is diagnosed early, before it has spread to other areas of the body. The Automatic Analysis of Diagnostic Tests (AAT) is an automated aid for physicians that can deliver reliable findings for analyzing life-threatening diseases. Deep learning, a family of machine learning methods, has grown at an astonishing pace in recent years. It is used to search and render diagnoses in fields from banking to medicine to machine learning. We attempt to create a deep learning algorithm that can reliably diagnose breast cancer in a mammogram. We want the algorithm to classify an image as cancer or not cancer, and to allow the use of a full dataset that has either strong clinical annotations in the training data or the cancer status only, in which a few images of cancers or non-cancers are annotated. Even with this technique, the images would be annotated with the condition; an optional portion of the annotated image then acts as the label. The final stage of the suggested system does not require such labels to be available during model training. Furthermore, the results of the review suggest that deep learning approaches have surpassed the previous state of the art in tumor identification, feature extraction, and classification. The paper describes three ways in which learning algorithms were applied: training the network from scratch, transplanting certain deep learning concepts and constraints into a network, and reducing the number of parameters in the trained networks, all of which help expand the scope of the networks. Researchers in economically developing countries have applied deep learning imaging devices to cancer detection; at the same time, cancer incidence has risen sharply in Africa.
A convolutional neural network (CNN) is a type of deep learning model that can be applied to a variety of tasks, such as speech recognition, image recognition, and classification. In this article, we use a CNN to categorize and identify breast cancer images from the databases available from the US Centers for Disease Control and Prevention.

Future Trends of AI-Based Smart Systems and Services: Challenges, Opportunities, and Solutions

  • Lee, Daewon;Park, Jong Hyuk
    • Journal of Information Processing Systems
    • /
    • v.15 no.4
    • /
    • pp.717-723
    • /
    • 2019
  • Smart systems and services aim to support growing urban populations and their prospects of virtual-real social behaviors, gig economies, factory automation, knowledge-based workforces, integrated societies, and modern living, among many others. To satisfy these objectives, smart systems and services must comprise a complex set of features such as security, ease of use and user friendliness, manageability, scalability, adaptivity, intelligent behavior, and personalization. Recently, artificial intelligence (AI) has come to be seen as a data-driven technology that provides efficient knowledge representation and semantic modeling and can support the cognitive behavior of a system. In this paper, an integration of AI with smart systems and services is presented to mitigate the existing challenges. Several novel research works, in terms of frameworks, architectures, paradigms, and algorithms, are discussed to provide possible solutions to the existing challenges in AI-based smart systems and services. These works involve efficient shape image retrieval, speech signal processing, dynamic thermal rating, advanced persistent threat tactics, user authentication, and so on.

A Study on Wavelet Application for Signal Analysis (신호 해석을 위한 웨이브렛 응용에 관한 연구)

  • Bae, Sang-Bum;Ryu, Ji-Goo;Kim, Nam-Ho
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2005.11a
    • /
    • pp.302-305
    • /
    • 2005
  • Recently, many methods for analyzing signals have been proposed; representative ones are the Fourier transform and the wavelet transform. The Fourier transform represents a signal as a combination of cosines and sines at all locations in the frequency domain. However, it provides no information about the time at which a particular frequency occurs in the signal and depends only on the global features of the signal. To address these limitations, the wavelet transform, which is capable of multiresolution analysis, has been applied to many fields such as speech processing, image processing, and computer vision. The wavelet transform uses a window that changes with the scale parameter and therefore provides time-frequency localization. In this paper, we propose a new approach using a wavelet of cosine and sine type and analyze the features of a signal at a limited point of the time-frequency plane.
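
The contrast drawn in this abstract between global Fourier analysis and time-frequency localization can be illustrated with a minimal sketch. It is not the authors' wavelet: it assumes a Gaussian-windowed cosine/sine atom (Morlet-like) with a fixed, hand-picked width, and the signal length `N`, frequency bin `K`, and helper names are all hypothetical choices for illustration.

```python
import cmath
import math

N = 256  # number of samples (illustrative assumption)
K = 32   # frequency bin of the test burst (illustrative assumption)


def make_burst(start, length=64):
    """Zero signal containing a sine burst at bin K, beginning at sample `start`."""
    return [
        math.sin(2 * math.pi * K * t / N) if start <= t < start + length else 0.0
        for t in range(N)
    ]


def dft_magnitude(signal, k):
    """Magnitude of one DFT bin: a global measure that carries no time information."""
    n = len(signal)
    total = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
    return abs(total) / n


def local_magnitude(signal, k, center, width=8.0):
    """Magnitude of the projection onto a Gaussian-windowed cosine/sine atom.

    The atom is centered at sample `center`; `width` is the Gaussian standard
    deviation in samples (an assumed, fixed value rather than a true scale parameter).
    """
    n = len(signal)
    total, norm = 0j, 0.0
    for t, x in enumerate(signal):
        w = math.exp(-0.5 * ((t - center) / width) ** 2)
        total += x * w * cmath.exp(-2j * math.pi * k * t / n)
        norm += w
    return abs(total) / norm


early = make_burst(32)   # burst occupies samples 32..95
late = make_burst(160)   # same burst, shifted to samples 160..223
```

Calling `dft_magnitude(early, K)` and `dft_magnitude(late, K)` gives the same value, because the DFT magnitude does not depend on where the burst occurs, whereas `local_magnitude(late, K, center=192)` is large and `local_magnitude(late, K, center=64)` is close to zero.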

Could Decimal-binary Vector be a Representative of DNA Sequence for Classification?

  • Sanjaya, Prima;Kang, Dae-Ki
    • International journal of advanced smart convergence
    • /
    • v.5 no.3
    • /
    • pp.8-15
    • /
    • 2016
  • In recent years, a deep learning model called the Deep Belief Network (DBN), which is formed by stacking restricted Boltzmann machines in a greedy fashion, has been widely used for classification and recognition. With its ability to extract features at a high level of abstraction and to handle high-dimensional data, this model has achieved outstanding results in image and speech recognition. In this research, we assess the applicability of deep learning to DNA classification. Since the training phase of a DBN is expensive, especially when dealing with DNA sequences with thousands of variables, we introduce a new encoding method that uses a decimal-binary vector to represent the sequence as input to the model, and we compare it with one-hot vector encoding on two datasets. We evaluated the proposed model with different contrastive algorithms and achieved a significant improvement in training speed with comparable classification results. This result shows the potential of using decimal-binary vectors with a DBN for DNA sequences to solve other sequence problems in bioinformatics.
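
The abstract does not define the decimal-binary encoding, so the following is only a sketch under a stated assumption: each nucleotide is mapped to a decimal index (A=0, C=1, G=2, T=3) and that index is written as a 2-bit binary pair. Under this assumption the input vector is half the length of a one-hot vector, which is one plausible source of the reported training speedup. The mapping and function names are hypothetical.

```python
# Assumed nucleotide-to-decimal mapping; the abstract does not specify one.
BASE_TO_DECIMAL = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot_encode(sequence):
    """Encode each base as a 4-element indicator vector (4 inputs per base)."""
    vector = []
    for base in sequence.upper():
        index = BASE_TO_DECIMAL[base]
        vector.extend(1 if i == index else 0 for i in range(4))
    return vector


def decimal_binary_encode(sequence):
    """Encode each base as the 2-bit binary form of its decimal index (2 inputs per base)."""
    vector = []
    for base in sequence.upper():
        index = BASE_TO_DECIMAL[base]
        vector.extend([(index >> 1) & 1, index & 1])  # high bit, then low bit
    return vector
```

For a sequence of length n, `one_hot_encode` yields 4n inputs and `decimal_binary_encode` yields 2n, so the visible layer of the first restricted Boltzmann machine would be half as wide.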

Recognition of Korean Vowels using Bayesian Classification with Mouth Shape (베이지안 분류 기반의 입 모양을 이용한 한글 모음 인식 시스템)

  • Kim, Seong-Woo;Cha, Kyung-Ae;Park, Se-Hyun
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.8
    • /
    • pp.852-859
    • /
    • 2019
  • With the development of information technology and smart devices, various applications that utilize image information are being developed. To provide an intuitive interface for pronunciation recognition, there is a growing need for research on pronunciation recognition using mouth feature values. In this paper, we propose a system that distinguishes Korean vowel pronunciations by detecting feature points of the lip region in images and applying a Bayesian learning model. Because the proposed recognition system is based on Bayes' theorem, its accuracy can be improved by accumulating input data, whether it is used in a speaker-independent or speaker-dependent setting, even with a small amount of training data. Experimental results show that Korean vowels can be distinguished effectively by applying probability-based Bayesian classification using only visual information such as mouth shape features.
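
As a rough illustration of Bayesian classification on mouth shape features, the sketch below uses a Gaussian naive Bayes classifier over two assumed features (normalized mouth width and height). The feature choice, the Gaussian likelihood, the romanized vowel labels, and every number in `TOY_SAMPLES` are invented for illustration; none of them comes from the paper.

```python
import math
from collections import defaultdict


def fit(samples):
    """Estimate a prior and per-feature Gaussian parameters for each label.

    `samples` is a list of (label, feature_tuple) pairs.
    """
    grouped = defaultdict(list)
    for label, features in samples:
        grouped[label].append(features)
    model = {}
    for label, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(samples), means, variances)
    return model


def predict(model, features):
    """Return the label with the highest log posterior under Bayes' theorem."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, m, v in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Synthetic toy values (mouth width, mouth height), not measurements from the paper.
TOY_SAMPLES = [
    ("a", (0.50, 0.45)), ("a", (0.52, 0.48)), ("a", (0.48, 0.42)),
    ("i", (0.62, 0.12)), ("i", (0.60, 0.10)), ("i", (0.64, 0.14)),
    ("u", (0.30, 0.15)), ("u", (0.28, 0.18)), ("u", (0.32, 0.16)),
]
model = fit(TOY_SAMPLES)
```

Because the model is just a set of per-class counts, means, and variances, accumulating more input data only requires refitting these statistics, which matches the abstract's point about improving accuracy as data accumulates.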

Design of Smart Device Assistive Emergency WayFinder Using Vision Based Emergency Exit Sign Detection

  • Lee, Minwoo;Mariappan, Vinayagam;Mfitumukiza, Joseph;Lee, Junghoon;Cho, Juphil;Cha, Jaesang
    • Journal of Satellite, Information and Communications
    • /
    • v.12 no.1
    • /
    • pp.101-106
    • /
    • 2017
  • Emergency exit signs are installed to mark escape routes in buildings such as shopping malls, hospitals, industrial sites, and government complexes, and in various other places, to help people escape easily during emergencies. In emergency situations involving smoke, fire, poor lighting, or crowd stampedes, it is difficult for people to recognize the emergency exit signs and emergency doors needed to leave the affected building areas. This paper proposes automatic emergency exit sign recognition to find the exit direction using a smart device. The proposed approach aims to develop a computer vision based smartphone application that detects emergency exit signs using the smart device camera and guides the user in the escape direction with visible and audible output. In this research, a CAMShift object tracking approach is used to detect the emergency exit sign, and the direction information is extracted using a template matching method. The direction information of the exit sign is stored in a text format, and the text is then synthesized into an audible acoustic signal using text-to-speech. The synthesized acoustic signal is rendered on the smart device speaker as escape guidance for the user. The research results are analyzed and conclusions are drawn from the viewpoints of visual element selection, EXIT appearance design, and EXIT placement in the building, which is valuable and can be used as a general reference for wayfinder systems.
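
The template matching and text output steps described in this abstract might look roughly like the sketch below. It assumes the sign has already been detected and cropped to a small binary patch (the CAMShift stage is omitted), uses sum of squared differences as the matching score, and stops at the text string that would be passed to a text-to-speech engine. The 5x5 arrow templates, the function names, and the guidance wording are hypothetical.

```python
# Hypothetical 5x5 binary arrow templates; a real system would use templates from actual signs.
LEFT_ARROW = ["00100", "01000", "11111", "01000", "00100"]
RIGHT_ARROW = [row[::-1] for row in LEFT_ARROW]
TEMPLATES = {"left": LEFT_ARROW, "right": RIGHT_ARROW}


def squared_difference(patch, template):
    """Sum of squared pixel differences between two equally sized binary grids."""
    return sum(
        (int(p) - int(t)) ** 2
        for patch_row, template_row in zip(patch, template)
        for p, t in zip(patch_row, template_row)
    )


def match_direction(patch):
    """Return the direction whose template is closest to the patch."""
    return min(TEMPLATES, key=lambda name: squared_difference(patch, TEMPLATES[name]))


def guidance_text(direction):
    """Build the text that would be handed to a text-to-speech engine."""
    return f"Emergency exit to the {direction}."


# A left arrow with one corrupted pixel in the top-left corner.
noisy_patch = ["10100", "01000", "11111", "01000", "00100"]
message = guidance_text(match_direction(noisy_patch))
```

Storing the result as plain text keeps the vision stage independent of the speech stage, so either can be replaced without changing the other.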

Animation OST Musical Element Analysis based on A Narrative Process Classification Model (내러티브 프로세스 분류 모델 기반 애니메이션 OST의 음악적 요소 분석)

  • Jang, Soeun;Sung, Bongsun;Lee, Jang Hoon;Kim, Jae Ho
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.10
    • /
    • pp.1239-1252
    • /
    • 2014
  • The OST (Original Sound Track) of a film plays a vital role in increasing consensus with and concentration on the storyline. Four selected animations are classified into 17 Narrative Processes (NPs) using the NP Classification Model [1]. For the NPs that have OSTs, the authors investigated six kinds of objective musical elements of the OST: sound (speech, music, effect), tonality, tempo, range, intensity, and instrumentation. It is found that 33.3% of the musical elements are common to all of the NPs with OSTs and that, among these, 71.9% of the properties of the musical elements are common. This research is meaningful in that it shows for the first time that there are common properties of objective musical elements in each NP and the corresponding OST.

Trends in Neuromorphic Software Platform for Deep Neural Network (딥 뉴럴 네트워크 지원을 위한 뉴로모픽 소프트웨어 플랫폼 기술 동향)

  • Yu, Misun;Ha, Youngmok;Kim, Taeho
    • Electronics and Telecommunications Trends
    • /
    • v.33 no.4
    • /
    • pp.14-22
    • /
    • 2018
  • Deep neural networks (DNNs) are widely used in various domains such as speech and image recognition. DNN software frameworks such as TensorFlow and Caffe have contributed to the popularity of DNNs because of their easy programming environments. In addition, many companies are developing neuromorphic processing units (NPUs) such as Tensor Processing Units (TPUs) and Graphics Processing Units (GPUs) to improve the performance of DNN processing. However, there is a large gap between NPUs and DNN software frameworks because the frameworks lack support for the various NPUs. This gap can be bridged by a DNN software platform that includes DNN-optimized compilers and DNN libraries. In this paper, we review the technical trends of DNN software platforms.