• Title/Summary/Keyword: Video representation

A Longitudinal Case Study of Late Babble and Early Speech in Southern Mandarin

  • Chen, Xiaoxiang
    • Cross-Cultural Studies / v.20 / pp.5-27 / 2010
  • This paper studies the relation between canonical/variegated babble (CB/VB) and early speech in an infant acquiring Mandarin Chinese from 9 to 17 months of age. The infant was audio- and video-taped at home almost every week. The data analyzed here come from 1,621 utterances extracted from 23 sessions, each lasting 30 minutes to one hour, recorded from age 00:09;07 to 01:05;27. The data were digitized, and segments from the 23 sessions were transcribed in narrow IPA and coded for analysis. Babble was coded from 00:09;07 to 01:00;00 and words from 01:00;00 to 01:05;27; proto-words appeared at 11 months, and some babble was still present after 01:10;00. A total of 3,821 segments were counted in CB/VB utterances, in addition to the segments found in 899 word tokens. The transcription was completed and checked by the author and rechecked by two other researchers specializing in Chinese phonetics to ensure reliability; the agreement rate was 95.65%. Mandarin Chinese is phonetically very rich in consonants, especially affricates: it has aspirated and unaspirated stops at labial, alveolar, and velar places of articulation; affricates and fricatives at alveolar, retroflex, and palatal places; /f/; labial, alveolar, and velar nasals; a lateral; [h]; and labiovelar and palatal glides. In the child's pre-speech phonetic repertoire, 7 different consonants and 10 vowels were transcribed at 00:09;07. By 00:10;16, the number of phones had more than doubled (17 consonants, 25 vowels), but the rate of increase slowed after 11 months of age. The phones from babbling remained active throughout the child's early and subsequent speech. The rank order of occurrence of the major class types for both CB and early speech was: stops, approximants, nasals, affricates, fricatives, and laterals. As expected, unaspirated stops outnumbered aspirated stops, and front stops and nasals were more frequent than back sounds in both types of utterances.
The fact that affricates outnumbered fricatives in the child's late babble indicates a pre-speech influence of the ambient language. The analysis also showed that: 1) the phonetic characteristics of CB/VB and early meaningful speech are extremely similar, and these similarities prove that the two are deeply related; 2) the infant demonstrated similar preferences for certain types of sounds in the two stages; and 3) the infant's babbling was patterned at the segmental level, and this regularity was similarly evident in her early speech. The three patterns (coronal plus front vowel, labial plus central vowel, and dorsal plus back vowel) overlapped substantially in the phonetic forms of CB/VB and early speech. Thus, the child's CB/VB at this stage already shared the basic architecture, composition, and representation of early speech. The evidence of similarity between CB/VB and early speech leaves no doubt that phones present in CB/VB are indeed precursors to early speech.
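
The abstract reports a 95.65% transcriber agreement but does not say how it was computed. A common measure is simple point-by-point percent agreement (matching segments divided by compared segments). The sketch below assumes that measure and assumes the two transcriptions are already aligned segment by segment; the function name and the sample segments are hypothetical, and the study's actual procedure may differ.

```python
def percent_agreement(transcriber_a, transcriber_b):
    """Point-by-point agreement between two aligned segment transcriptions.

    Assumes one IPA symbol per position and that alignment was done beforehand.
    """
    if len(transcriber_a) != len(transcriber_b):
        raise ValueError("transcriptions must be aligned to the same length")
    if not transcriber_a:
        raise ValueError("no segments to compare")
    matches = sum(a == b for a, b in zip(transcriber_a, transcriber_b))
    return 100.0 * matches / len(transcriber_a)


# Hypothetical segments for illustration, not data from the study.
author = ["p", "a", "t", "a", "m", "a"]
checker = ["p", "a", "t", "a", "n", "a"]
print(round(percent_agreement(author, checker), 2))  # 83.33
```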

A Theory of Intermediality and its Application in Peter Greenaway's <Prospero's Books> (상호매체성의 이론과 그 적용 - 피터 그리너웨이의 <프로스페로의 서재>를 중심으로)

  • PARK, Ki-Hyun
    • Cross-Cultural Studies / v.19 / pp.39-77 / 2010
  • The cinema of Peter Greenaway has consistently engaged questions of the relationship between the arts, particularly the relations of image and writing to cinema. When different types of images are correlated and merged with each other on the borders of painting, photography, film, video, and computer animation, the interrelationships of the distinct elements shift the notion of the image as a whole. This analysis proposes to articulate the complex relationship between the 'interartial' dimension and the 'intermedial' dimension in Peter Greenaway's film <Prospero's Books> (1991). If interartiality concerns the interaction between various arts, including the transition from one to another, intermediality articulates the same type of relationship between two or more media. The interactional relationship is the same in both cases; the relationship between art and media, however, is not symmetrical. All art is based on one or more media (the medium is a condition of art's existence), but no art can be reduced to the status of a medium. This suggests that while interartiality always involves intermediality, the converse does not hold. First, we analyse the film's self-conscious investigation into digital art and technology. Prospero's Books can be read as a daring visual essay that self-consciously investigates the technical and philosophical functions of letters, books, images, animated paintings, digital arts, and other magical illusions that have served, or will serve, as modern or postmodern media for representing the world. Greenaway uses both conventional film techniques and the resources of high-definition television to layer image upon image, superimposing a second or third frame within his frame. He uses the frame-within-frame as the cinematic equivalent of Shakespeare's play-within-the-play: it offers him the possibility of analysing the relationship between work of art, artist, and spectator.
Secondly, we analyse the relationship between the written word, the spoken word, and books. Like the written word, the spoken word turns into a visual image: the linguistic richness and nuances of Shakespeare's characters become the powerful and authoritative, but monotone, voice of Gielgud-Prospero, who speaks the Shakespearean lines aloud, shaping the characters so powerfully through his words that they are conjured before us. In particular, each book is placed over the frame of the play's action, only partially covering the image, so that virtually every frame has at least two space-time orientations. Thirdly, we try to show how Peter Greenaway uses pictorial references to illustrate the context of the Renaissance, as well as pictorial techniques and language to question the nature of artistic representation. For example, the storm is visualised through reference to a work by Botticelli, and the library around which the storm of papers swirls is constructed to look like a facsimile of Michelangelo's Laurentian Library in Florence. Greenaway's modern mannerism consists in imposing his own aesthetic vision and his questioning of art beyond the play's meta-theatricality: in other words, Shakespeare's text has been adapted without being betrayed.

Progressive occupancy network for 3D reconstruction (3차원 형상 복원을 위한 점진적 점유 예측 네트워크)

  • Kim, Yonggyu;Kim, Duksu
    • Journal of the Korea Computer Graphics Society / v.27 no.3 / pp.65-74 / 2021
  • 3D reconstruction means recovering the 3D shape of an object from an image or a video. We propose a progressive occupancy network architecture that can recover not only the overall shape of the object but also its local details. Unlike the original occupancy network, which uses a feature vector embedding information from the whole image, we extract and use different levels of image features according to receptive field size. We also propose a novel network architecture that applies these image features sequentially to the decoder blocks and thereby improves the quality of the reconstructed 3D shape progressively. In addition, we design a novel decoder block structure that properly combines the different levels of image features and uses them to update the input point feature. We trained our progressive occupancy network on ShapeNet and compared its representation power with two prior methods: the original occupancy network (ONet) and a recent work (DISN) that, like ours, uses different levels of image features. In terms of evaluation metrics, our network outperforms ONet on all metrics and achieves scores slightly better than or comparable to those of DISN. In visual comparisons, our method successfully reconstructs local details that ONet misses. Also, whereas DISN fails to reconstruct thin or occluded parts of the object, our progressive occupancy network successfully captures them. These results validate the usefulness of the proposed network architecture.
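
The core idea of feeding one level of image features into each decoder block in turn, with each block updating the point feature, can be sketched in NumPy as below. This is a structural illustration under stated assumptions, not the authors' architecture: the weights are random instead of trained on ShapeNet, each feature level is pooled to a single vector (a pixel-aligned variant would instead sample features at each point's image projection), and the class name, layer sizes, and residual update are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(in_dim, out_dim):
    """Random weights standing in for learned parameters."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


def relu(x):
    return np.maximum(x, 0.0)


class ProgressiveDecoderSketch:
    """Decoder whose blocks each consume one level of image features in turn."""

    def __init__(self, feat_dims, hidden=32):
        self.lift = dense(3, hidden)  # xyz query point -> point feature
        self.blocks = [dense(hidden + d, hidden) for d in feat_dims]
        self.head = dense(hidden, 1)  # point feature -> occupancy logit

    def __call__(self, points, image_feats):
        w, b = self.lift
        h = relu(points @ w + b)  # (N, hidden)
        for (w, b), feat in zip(self.blocks, image_feats):
            # Share this level's pooled image feature with every query point.
            f = np.broadcast_to(feat, (h.shape[0], feat.shape[0]))
            # Residual update, so each block refines the previous point feature.
            h = h + relu(np.concatenate([h, f], axis=1) @ w + b)
        w, b = self.head
        logits = (h @ w + b)[:, 0]
        return 1.0 / (1.0 + np.exp(-logits))  # occupancy probability per point


feat_dims = (256, 128, 64)  # hypothetical sizes, one per receptive-field level
decoder = ProgressiveDecoderSketch(feat_dims)
points = rng.uniform(-0.5, 0.5, size=(1024, 3))
image_feats = [rng.normal(size=d) for d in feat_dims]
occupancy = decoder(points, image_feats)
```

The residual form `h + block(...)` is one simple way to make each stage refine rather than overwrite what earlier stages computed, which matches the "progressive" intent described above; the paper's own block design may combine features differently.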

Abnormal Crowd Behavior Detection via H.264 Compression and SVDD in Video Surveillance System (H.264 압축과 SVDD를 이용한 영상 감시 시스템에서의 비정상 집단행동 탐지)

  • Oh, Seung-Geun;Lee, Jong-Uk;Chung, Yongw-Ha;Park, Dai-Hee
    • Journal of the Korea Institute of Information Security & Cryptology / v.21 no.6 / pp.183-190 / 2011
  • In this paper, we propose a prototype system for abnormal sound detection and identification, which detects and recognizes abnormal situations by analyzing audio captured in real time by CCTV cameras in a surveillance environment. The proposed system is composed of two layers. The first layer is a one-class support vector machine, i.e., support vector data description (SVDD), which rapidly detects abnormal situations and alerts the manager. The second layer classifies the detected abnormal sound into predefined classes such as 'gun', 'scream', 'siren', 'crash', and 'bomb' using a sparse representation classifier (SRC), so that emergency situations can be handled. The proposed system is designed hierarchically as a combination of SVDD and SRC, which gives it the following desirable characteristics: 1) because the SVDD is trained only on normal sounds and quickly screens them out, no unnecessary classification is performed on normal sounds; 2) it ensures reliable performance through an SRC, which has been successfully applied in the field of face recognition; and 3) owing to the intrinsic incremental learning capability of SRC, it can actively adapt to changes in the sound database. Experimental results and qualitative analysis illustrate the efficiency of the proposed method.
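
The two-layer screen-then-classify design can be sketched as follows. Two substitutions are made and should not be mistaken for the paper's method: layer 1 uses a crude hypersphere around the mean of normal-sound features (radius set at a quantile of training distances) as a stand-in for SVDD, which properly solves an optimization for the minimal enclosing sphere, usually in a kernel space; and layer 2 computes the SRC sparse code with greedy orthogonal matching pursuit instead of L1 minimization. Audio features are assumed to be fixed-length vectors already extracted; all names and the synthetic data are hypothetical.

```python
import numpy as np


def omp(D, y, k):
    """Greedy sparse code of y over the columns of D (orthogonal matching pursuit)."""
    residual, support, coef = y.copy(), [], np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:  # no new atom improves the fit
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x


def src_classify(D, labels, y, k=5):
    """Assign y to the class whose atoms alone reconstruct it with least residual."""
    x = omp(D, y, k)
    residuals = {c: np.linalg.norm(y - D @ np.where(labels == c, x, 0.0))
                 for c in np.unique(labels)}
    return str(min(residuals, key=residuals.get))


class TwoLayerDetector:
    def __init__(self, normal_feats, abnormal_feats, abnormal_labels, quantile=0.95):
        # Layer 1 (SVDD stand-in): a sphere fitted to normal sounds only.
        self.center = normal_feats.mean(axis=0)
        dists = np.linalg.norm(normal_feats - self.center, axis=1)
        self.radius = np.quantile(dists, quantile)
        # Layer 2 (SRC): the dictionary is the labelled abnormal samples themselves.
        D = abnormal_feats.T
        self.D = D / np.linalg.norm(D, axis=0)  # unit-norm atoms
        self.labels = np.asarray(abnormal_labels)

    def __call__(self, feat):
        if np.linalg.norm(feat - self.center) <= self.radius:
            return "normal"  # no classification is run for normal sounds
        return src_classify(self.D, self.labels, feat / np.linalg.norm(feat))


# Synthetic 8-dimensional "features" for illustration only.
rng = np.random.default_rng(0)
dim = 8
normal = rng.normal(0.0, 0.3, size=(200, dim))
classes = ["gun", "scream", "siren"]
centers = {c: 3.0 * np.eye(dim)[i] for i, c in enumerate(classes)}
abnormal = np.vstack([centers[c] + rng.normal(0.0, 0.3, size=(20, dim)) for c in classes])
labels = [c for c in classes for _ in range(20)]
detector = TwoLayerDetector(normal, abnormal, labels)
```

Because the SRC layer's "training" is only its dictionary, adding samples or a new abnormal class in this sketch means appending columns to `D` and entries to `labels`, which mirrors the incremental-learning property the abstract attributes to SRC.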

A Study on the Creative Process of Creative Ballet <Youth> through Motion Capture Technology (모션캡처 활용을 통한 창작발레<청춘>창작과정연구)

  • Chang, So-Jung; Park, Arum
    • The Journal of the Convergence on Culture Technology / v.9 no.5 / pp.809-814 / 2023
  • Currently, there is a lack of research that directly applies and integrates science and technology in the field of dance and translates it into creative work. In this study, the researcher applied motion capture to the creative ballet performance <Youth> and described the process of incorporating motion capture into its scenes. The study used practice-based research, which derives new knowledge and meaning from creative outcomes by analyzing the phenomena and experiences generated on-site. <Youth> consists of four scenes, and the motion-captured video in these scenes provides the highlight moments. It visually represents the image of a past ballerina while embodying the meaning of a scene that is both the 'past me' and the 'dream of the present.' The use of motion capture enhances the visual representation of the scenes and helps increase audience immersion. The dance field needs to become familiar with collaborating with scientific and technological advances such as motion capture in order to digitize intangible assets, and it is essential to pursue experimental work and ongoing training for such collaborations. Furthermore, ongoing collaborative research should extend the scope of movement through digitized processes, performances, and performance records, which will continue to confer value and meaning on the field of dance.

Analysis of the Manners of Using Scientific Models in Secondary Earth Science Classrooms: With a Focus on Lessons in the Domains of Atmospheric and Oceanic Earth Sciences (중등학교 지구과학 수업에서 과학적 모델의 활용 양상 분석: 대기 및 해양 지구과학 관련 수업을 중심으로)

  • Oh, Phil-Seok
    • Journal of The Korean Association For Science Education / v.27 no.7 / pp.645-662 / 2007
  • The purpose of this study was to explore how models are used in secondary science classrooms. A total of thirteen video recordings of science lessons dealing with the domains of atmospheric and oceanic earth sciences, together with their verbatim transcripts, were analysed both quantitatively and qualitatively. Interviews with three in-service science teachers were also conducted. Six interrelated assertions were generated as the result of the study: 1) the most frequently used models in secondary earth science classrooms include two-dimensional pictorial, symbolic, iconic, and diagrammatic ones; 2) science teachers employ models as a mode of representation to make the subject matter available to students; 3) in earth science classrooms, teachers use typical forms of models intensively; 4) students themselves deal with models on a few occasions, but they just follow similar procedures with the same models; 5) teachers rarely talk about the nature of scientific models and provide few opportunities for students to think about it; and 6) teachers in practice think that the value of using models should be appraised in consideration of the teacher's pedagogical intentions. Implications for science education and science education research are discussed.

Research on Local and Global Infrared Image Pre-Processing Methods for Deep Learning Based Guided Weapon Target Detection

  • Jae-Yong Baek;Dae-Hyeon Park;Hyuk-Jin Shin;Yong-Sang Yoo;Deok-Woong Kim;Du-Hwan Hur;SeungHwan Bae;Jun-Ho Cheon;Seung-Hwan Bae
    • Journal of the Korea Society of Computer and Information / v.29 no.7 / pp.41-51 / 2024
  • In this paper, we explore how to improve target detection accuracy in guided weapons using deep learning object detection on infrared (IR) images. Because IR images are influenced by factors such as time and temperature, it is crucial to ensure a consistent representation of object features across environments when training the model. A simple way to address this is to emphasize the features of target objects and reduce noise in the infrared images through appropriate pre-processing techniques. However, previous studies have not sufficiently discussed pre-processing methods for training deep learning models on infrared images. In this paper, we investigate the impact of image pre-processing techniques on infrared image-based training for object detection. To this end, we analyze the results of pre-processing methods that use global or local information from infrared video and images. In addition, to confirm the impact of the images produced by each pre-processing technique on object detector training, we train the YOLOX target detector on images processed by various pre-processing methods and analyze the results. In particular, CLAHE (Contrast Limited Adaptive Histogram Equalization) yields the highest detection accuracy, with a mean average precision (mAP) of 81.9%.
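
To make the "local information" idea behind CLAHE concrete, the sketch below applies contrast-limited histogram equalization separately to each image tile: each tile's histogram is clipped at a limit, the clipped excess is spread evenly over all bins, and the tile is remapped through the resulting cumulative distribution. It is a deliberate simplification, not the paper's pipeline: full CLAHE also interpolates the mappings of neighbouring tiles bilinearly to avoid seams, which this sketch omits. Libraries such as OpenCV (`cv2.createCLAHE`) provide a complete implementation. The function name, tile grid, and clip value are illustrative assumptions.

```python
import numpy as np


def tile_clahe(img, tiles=(2, 2), clip_limit=2.0):
    """Contrast-limited histogram equalization applied independently per tile.

    Simplification: no bilinear interpolation between tiles, so seams may show.
    clip_limit is a multiple of the tile's average bin count.
    """
    if img.dtype != np.uint8 or img.ndim != 2:
        raise ValueError("expects a 2-D uint8 image")
    out = np.empty_like(img)
    row_blocks = np.array_split(np.arange(img.shape[0]), tiles[0])
    col_blocks = np.array_split(np.arange(img.shape[1]), tiles[1])
    for rows in row_blocks:
        for cols in col_blocks:
            tile = img[np.ix_(rows, cols)]
            hist = np.bincount(tile.ravel(), minlength=256).astype(float)
            # Clip each bin, then redistribute the clipped excess over all bins.
            limit = clip_limit * tile.size / 256
            excess = np.maximum(hist - limit, 0.0).sum()
            hist = np.minimum(hist, limit) + excess / 256
            cdf = np.cumsum(hist)
            lut = np.round(cdf / cdf[-1] * 255).astype(np.uint8)
            out[np.ix_(rows, cols)] = lut[tile]
    return out


# A synthetic low-contrast "IR frame" with values 100..109.
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 110, size=(64, 64), dtype=np.uint8)
enhanced = tile_clahe(low_contrast)
print(low_contrast.min(), low_contrast.max(), enhanced.min(), enhanced.max())
```

The clip limit is what distinguishes this from plain histogram equalization: with `clip_limit=2.0` the narrow 100..109 range is stretched only moderately, which limits noise amplification in flat regions, a property that plausibly matters for noisy IR imagery.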

One-shot multi-speaker text-to-speech using RawNet3 speaker representation (RawNet3를 통해 추출한 화자 특성 기반 원샷 다화자 음성합성 시스템)

  • Sohee Han;Jisub Um;Hoirin Kim
    • Phonetics and Speech Sciences / v.16 no.1 / pp.67-76 / 2024
  • Recent advances in text-to-speech (TTS) technology have significantly improved the quality of synthesized speech, to the point where it can closely imitate natural human speech. In particular, TTS models offering diverse voice characteristics and personalized speech are widely used in fields such as artificial intelligence (AI) tutors, advertising, and video dubbing. Accordingly, in this paper, we propose a one-shot multi-speaker TTS system that ensures acoustic diversity and synthesizes personalized voices by generating speech from utterances of unseen target speakers. The proposed model integrates a speaker encoder into a TTS model consisting of the FastSpeech2 acoustic model and the HiFi-GAN vocoder. The speaker encoder, based on the pre-trained RawNet3, extracts speaker-specific voice features. Furthermore, the proposed approach covers not only English but also Korean one-shot multi-speaker TTS. We evaluate the naturalness and speaker similarity of the generated speech using objective and subjective metrics. In the subjective evaluation, the proposed Korean one-shot multi-speaker TTS obtained a naturalness mean opinion score (NMOS) of 3.36 and a similarity MOS (SMOS) of 3.16. In the objective evaluation, the proposed English and Korean one-shot multi-speaker TTS achieved a prediction MOS (P-MOS) of 2.54 and 3.74, respectively. These results indicate that our proposed model outperforms the baseline models in both naturalness and speaker similarity.
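
The abstract does not say how the RawNet3 speaker embedding enters FastSpeech2. One common approach in multi-speaker FastSpeech2 variants is to project the speaker embedding to the encoder's hidden size and add it to every encoder frame; the NumPy sketch below assumes that approach. The function name, dimensions, and random projection are hypothetical, and a random vector stands in for a real RawNet3 embedding.

```python
import numpy as np


def condition_on_speaker(encoder_out, speaker_emb, proj):
    """Add one projected speaker vector to every encoder frame.

    encoder_out: (T, d) hidden states from the acoustic model's encoder
    speaker_emb: (e,) embedding of a single reference utterance (one-shot)
    proj:        (e, d) projection matrix; learned in practice, random here
    """
    unit = speaker_emb / np.linalg.norm(speaker_emb)  # length-normalize embedding
    bias = unit @ proj                                # (d,)
    return encoder_out + bias                         # broadcast over all T frames


rng = np.random.default_rng(0)
T, d, e = 12, 256, 192  # hypothetical sizes, not taken from the paper
encoder_out = rng.normal(size=(T, d))
speaker_emb = rng.normal(size=e)  # stand-in for a RawNet3 speaker embedding
proj = rng.normal(0.0, 0.05, size=(e, d))
conditioned = condition_on_speaker(encoder_out, speaker_emb, proj)
```

Because the same speaker vector shifts every frame, an unseen speaker needs only one reference utterance to produce an embedding at inference time, which is what makes the setup "one-shot"; no speaker-specific fine-tuning is implied by this sketch.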

VR media aesthetics due to the evolution of visual media (시각 미디어의 진화에 따른 VR 매체 미학)

  • Lee, Dong-Eun;Son, Chang-Min
    • Cartoon and Animation Studies / s.49 / pp.633-649 / 2017
  • The purpose of this study is to conceptualize, from a genealogical perspective, the changing aspects of human freedom of observation and viewing as visual media evolve from film to 3D stereoscopic film and VR. In addition, I will identify the media-aesthetic characteristics of VR and examine its identity and ontology. Media have evolved around the most artificial of the human senses. At the center of all reproduction devices based on visual media, such as painting, film, television, and the computer, lies a third visual space called the screen. In particular, movie, television, and video screens, which are media that reproduce moving images, pursue perfect fantasy and visual satisfaction while controlling the movement of the audience. A mobilized virtual gaze was secured on the assumption of the floating nature of the so-called viewers. The audience views a cinematic illusion while seated in a fixed seat in a floating posture, passively and without doubt accepting the fantasy world beyond the screen. With the advent of the digital paradigm, however, the evolution of visual media brings a major change to the tradition of reproduction media. 3D stereoscopic film foreshadowed the disappearance of the fourth wall. The audience no longer simply sits in a fixed seat staring straight ahead. The appearance of the Z-axis in the 3D stereoscopic image reorganizes the space of the story. The viewer's gaze also extends from 'front' to 'top, bottom, left, right' and even 'front and back'. It also transforms the passive audience into an active, interactive, and experiential subject by placing viewers among the images. Going a step further, visual media in the VR era free the body of the once-captive audience.
VR secures visitors' freedom of movement and lets virtual space and physical space coexist. Therefore, the audience of VR content acquires an integrated identity premised on participation and movement. VR is not mere representation; it perfects the aesthetic system by reconstructing the space of fantasy while inheriting the screen's tradition of simulation.

TV Anytime and MPEG-21 DIA based Ubiquitous Consumption of TV Contents in Digital Home Environment (TV Anytime 및 MPEG-21 DIA 기반 콘텐츠 이동성을 이용한 디지털 홈 환경에서의 유비쿼터스 TV 콘텐츠 소비)

  • Kim Munjo;Yang Chanseok;Lim Jeongyeon;Kim Munchurl;Park Sungjin;Kim Kwanlae;Oh Yunje
    • Journal of Broadcast Engineering / v.10 no.4 s.29 / pp.557-575 / 2005
  • Much research has been done on core technologies that make ubiquitous video services possible on various kinds of user information terminals, anytime and anywhere, in the way users want to consume them. In this paper, we design a prototype system architecture for ubiquitous, preference-based consumption of TV program content via various kinds of intelligent information terminals in a digital home environment, and present implementation and test results for the prototype system. For the system design, we use the TV Anytime specification for consuming TV program content based on user preferences, and the MPEG-21 DIA (Digital Item Adaptation) tools, which provide representation schema formats for describing context information (user environments, user terminal characteristics, and user characteristics) for universal access to and consumption of preferred TV program content. The proposed ubiquitous content mobility prototype system enables a single user or multiple users to seamlessly consume the TV program content they are watching together across various kinds of user terminals. The system consists of a home server, a display TV terminal, and an intelligent information terminal. We used 42 TV programs in eight genres from four TV channels to test the prototype system.
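
The decision logic described above (rank content by user preference, adapt it to the receiving terminal's capabilities, and hand a session off between terminals) can be illustrated with a small sketch. It does not model the actual TV-Anytime or MPEG-21 DIA XML schemas: every class, field, preference weight, and stream identifier below is hypothetical, and terminal capability is reduced to a single maximum display height for brevity.

```python
from dataclasses import dataclass


@dataclass
class Program:
    title: str
    genre: str
    variants: dict  # hypothetical: vertical resolution -> stream id


@dataclass
class Terminal:
    name: str
    max_height: int  # display capability, e.g. 1080 for a TV


def rank_programs(programs, genre_prefs):
    """Order programs by the user's preference weight for their genre."""
    return sorted(programs, key=lambda p: genre_prefs.get(p.genre, 0), reverse=True)


def adapt(program, terminal):
    """Pick the highest-resolution variant the terminal can display, if any."""
    fitting = [h for h in program.variants if h <= terminal.max_height]
    return program.variants[max(fitting)] if fitting else None


def hand_off(program, position_s, new_terminal):
    """Continue the same program from the same position on another terminal."""
    return adapt(program, new_terminal), position_s


programs = [
    Program("Evening News", "news", {1080: "news_hd", 480: "news_sd"}),
    Program("Drama Ep. 3", "drama", {1080: "drama_hd", 360: "drama_mobile"}),
]
prefs = {"drama": 0.9, "news": 0.4}  # hypothetical preference values
ranked = rank_programs(programs, prefs)
phone = Terminal("phone", max_height=480)
print([p.title for p in ranked])  # ['Drama Ep. 3', 'Evening News']
print(adapt(ranked[0], phone))    # drama_mobile
```

Keeping preference ranking and terminal adaptation as separate steps mirrors the division of labour in the abstract, where TV Anytime supplies user preferences and MPEG-21 DIA describes terminal and environment context.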