• Title/Summary/Keyword: Audio and Video

Lip and Voice Synchronization Using Visual Attention (시각적 어텐션을 활용한 입술과 목소리의 동기화 연구)

  • Dongryun Yoon;Hyeonjoong Cho
    • The Transactions of the Korea Information Processing Society
    • /
    • v.13 no.4
    • /
    • pp.166-173
    • /
    • 2024
  • This study explores lip-sync detection, focusing on the synchronization between lip movements and voices in videos. Typically, lip-sync detection techniques involve cropping the facial area of a given video, utilizing the lower half of the cropped box as input for the visual encoder to extract visual features. To enhance the emphasis on the articulatory region of lips for more accurate lip-sync detection, we propose utilizing a pre-trained visual attention-based encoder. The Visual Transformer Pooling (VTP) module is employed as the visual encoder, originally designed for the lip-reading task, predicting the script based solely on visual information without audio. Our experimental results demonstrate that, despite having fewer learning parameters, our proposed method outperforms the latest model, VocaList, on the LRS2 dataset, achieving a lip-sync detection accuracy of 94.5% based on five context frames. Moreover, our approach exhibits an approximately 8% superiority over VocaList in lip-sync detection accuracy, even on an untrained dataset, Acappella.

Why A Multimedia Approach to English Education?

  • Keem, Sung-uk
    • Proceedings of the KSPS conference
    • /
    • 1997.07a
    • /
    • pp.176-178
    • /
    • 1997
  • To make a long story short, I made up my mind two years ago to experiment with a multimedia approach to my classroom presentations because my ways of giving instructions bored the pants off me as well as my students. My favorite ways used to be what are sometimes referred to as classical or traditional ones, heavily dependent on three elements: the teacher's mouth, books, and chalk. Some call it the 'MBC method'. To top it off, I tried audio-visuals such as tape recorders, cassette players, VTR, pictures, and you name it, that could help improve my teaching method. And yet I was unhappy with the results of this trial-and-error approach. I was determined to look for a better way that would ensure my satisfaction in the first place. What really turned me on was a multimedia CD ROM title, ELLIS (English Language Learning Instructional Systems), developed by Dr. Frank Otto. This is an integrated system of learning English based on advanced computer technology. Inspired by the utility and potential of such a multimedia system for regular classroom or lab instruction, I designed a simple but practical multimedia language learning laboratory in 1994 for the first time in Korea (perhaps for the first time in the world). It was high time that the conventional type of language laboratory (audio-passive) at Hahnnam be replaced because of wear and tear. Prior to this development, in 1991, I set up a first CALL (Computer Assisted Language Learning) laboratory equipped with 35 personal computers (286), where students were encouraged to practise English typing and word processing and to study English grammar, vocabulary, and composition.
The first multimedia language learning laboratory was composed of 1) a multimedia personal computer (486DX2 then, now 586), 2) VGA multipliers that enable simultaneous viewing of the screen under the control of the instructor, 3) an amplifier, 4) loudspeakers, 5) student monitors, 6) student tables to seat three students (a monitor for two students is more realistic, though), 7) student chairs, 8) an instructor table, and 9) cables. It was augmented later with an Internet hookup. The beauty of this type of multimedia language learning laboratory is the economy of furnishing and maintaining it. There is no need to darken the facilities, which is a must when an LCD/beam projector is preferred in the laboratory. It is headset free; headsets proved to exasperate students when worn for more than twenty minutes. In the previous semester I taught three different subjects: Freshman English Lab, English Phonetics, and Listening Comprehension Intermediate. I used CD ROM titles like ELLIS, Master Pronunciation, English Triple Play Plus, English Arcade, Living Books, Q-Steps, English Discoveries, and Compton's Encyclopedia. In addition, I managed to put all teaching materials into PowerPoint, where text, photo, graphic, animation, audio, and video files are stored in order as slides. It takes time for me to prepare my teaching materials in PowerPoint, but it is a wonderful tool for presentations. And it is worth trying as long as I can entertain my students in such a way. Once everything is put into the computer, I feel relaxed and a bit excited watching my students enjoy my presentations. It appears to be great fun for students because they have never experienced this type of instruction. This is how I freed myself from having to manipulate a cassette tape player and VTR and to write on the board. The student monitors in front of them seem to help them concentrate on what they see, combined with what they hear.
All I have to do is to simply click a mouse to give presentations and explanations, when necessary. I use a remote mouse, which prevents me from sitting at the instructor table. Instead, I can walk around in the room and enjoy freer interactions with students. Using this instrument, I can also have my students participate in the presentation. In particular, I invite my students to manipulate the computer using the remote mouse from the student's seat not from the instructor's seat. Every student appears to be fascinated with my multimedia approach to English teaching because of its unique nature as a new teaching tool as we face the 21st century. They all agree that the multimedia way is an interesting and fascinating way of learning to satisfy their needs. Above all, it helps lighten their drudgery in the classroom. They feel other subjects taught by other teachers should be treated in the same fashion. A multimedia approach to education is impossible without the advent of hi-tech computers, of which multi functions are integrated into a unified system, i.e., a personal computer. If you have computer-phobia, make quick friends with it; the sooner, the better. It can be a wonderful assistant to you. It is the Internet that I pay close attention to in conjunction with the multimedia approach to English education. Via e-mail system, I encourage my students to write to me in English. I encourage them to enjoy chatting with people all over the world. I also encourage them to visit the sites where they offer study courses in English conversation, vocabulary, idiomatic expressions, reading, and writing. I help them search any subject they want to via World Wide Web. Some day in the near future it will be the hub of learning for everybody. It will eventually free students from books, teachers, libraries, classrooms, and boredom. I will keep exploring better ways to give satisfying instructions to my students who deserve my entertainment.

A Design and Implementation of Multimedia Retrieval System based on MAF(Multimedia Application File Format) (MAF(Multimedia Application File Format) 기반 멀티미디어 검색 시스템의 설계 및 구현)

  • Gang Young-Mo;Park Joo-Hyoun;Bang Hyung-Gin;Nang Jong-Ho;Kim Hyung-Chul
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.9
    • /
    • pp.574-584
    • /
    • 2006
  • Recently, ISO/IEC 23000 (also known as 'MPEG-A') has proposed a new file format called 'MAF (Multimedia Application File Format)[1]', which integrates and stores the widely used compression standards for audio and video, together with metadata in MPEG-7 form, in a single file format. However, it is still very hard to verify the usefulness of MPEG-A in real applications because there is still no real system that fully implements this standard. In this paper, a design and implementation of a multimedia retrieval system based on the MPEG-A standard on PC and mobile device is presented. Furthermore, an extension of MPEG-A for describing the metadata for video is also proposed. It is selected and defined as a subset of MPEG-7 MDS[4] and TV-Anytime[5] for video that is useful and manageable in mobile environments. In order to design the multimedia retrieval system based on MPEG-A, we define the system requirements in terms of portability, extensibility, compatibility, adaptability, and efficiency. Based on these requirements, we design a system composed of three layers: Application Layer, Middleware Layer, and Platform Layer. The proposed system consists of two sub-parts, a client part and a server part. The client part consists of a MAF authoring tool, a MAF player tool, and a MAF searching tool, which allow users to create, play, and search MAF files, respectively. The server part is composed of modules to store and manage the MAF files and the metadata extracted from them. We show the usefulness of the proposed system by implementing the client system both on the MS-Windows platform on a desktop computer and on the WIPI platform on a mobile phone, and validate whether it satisfies all the system requirements. The proposed system can be used to verify the MPEG-A specification and to prove the usefulness of MPEG-A in real applications.

A Longitudinal Case Study of Late Babble and Early Speech in Southern Mandarin

  • Chen, Xiaoxiang
    • Cross-Cultural Studies
    • /
    • v.20
    • /
    • pp.5-27
    • /
    • 2010
  • This paper studies the relation between canonical/variegated babble (CB/VB) and early speech in an infant acquiring Mandarin Chinese from 9 to 17 months. The infant was audio- and video-taped in her home almost every week. The data analyzed here come from 1,621 utterances extracted from 23 sessions ranging from 30 minutes to one hour, from age 00:09;07 to 01:05;27. The data were digitized, and segments from the 23 sessions were transcribed in narrow IPA and coded for analysis. Babble was coded from age 00:09;07 to 01:00;00, and words were coded from 01:00;00 to 01:05;27. Proto-words appeared at 11 months, and some babble was still present after 01:10;00. In total, 3,821 segments were counted in CB/VB utterances, plus the segments found in 899 word tokens. The data transcription was completed and checked by the author and was rechecked by two other researchers who majored in Chinese phonetics to ensure reliability; agreement reached 95.65%. Mandarin Chinese is phonetically very rich in consonants, especially affricates: it has aspirated and unaspirated stops in labial, alveolar, and velar places of articulation; affricates and fricatives in alveolar, retroflex, and palatal places; /f/; labial, alveolar, and velar nasals; a lateral; [h]; and labiovelar and palatal glides. In the child's pre-speech phonetic repertoire, 7 different consonants and 10 vowels were transcribed at 00:09;07. By 00:10;16, the number of phones had more than doubled (17 consonants, 25 vowels), but the rate of increase slowed after 11 months of age. The phones from babbling remained active throughout the child's early and subsequent speech. The rank order of occurrence of the major class types for both CB and early speech was: stops, approximants, nasals, affricates, fricatives, and lateral. As expected, unaspirated stops outnumbered aspirated stops, and front stops and nasals were more frequent than back sounds in both types of utterances.
The fact that affricates outnumbered fricatives in the child's late babble indicates the pre-speech influence of the ambient language. The analysis of the data also showed that: 1) the phonetic characteristics of CB/VB and early meaningful speech are extremely similar, and these similarities indicate that the two are deeply related; 2) the infant demonstrated similar preferences for certain types of sounds in the two stages; 3) the infant's babbling was patterned at the segmental level, and this regularity was similarly evident in the child's early speech. Three types of combination (coronal plus front vowel, labial plus central vowel, and dorsal plus back vowel) showed much overlap in the phonetic forms of CB/VB and early speech. So the child's CB/VB at this stage already shared the basic architecture, composition, and representation of early speech. The evidence of similarity between CB/VB and early speech leaves no doubt that phones present in CB/VB are indeed precursors to early speech.

User Satisfaction of Mobile Convergence Device: The Expectation and Disconfirmation Approach (모바일 복합 단말기 사용자 만족: 기대-불일치 접근)

  • Lee, Seung-Chang;Suh, Eung-Kyo
    • Journal of Distribution Science
    • /
    • v.10 no.11
    • /
    • pp.89-99
    • /
    • 2012
  • Purpose - Mobile devices, especially mobile terminals capable of telecommunication and wireless connectivity, are leading the advancements in consumer electronics. Digital convergence drives the functions of various devices, such as cellular phones, MP3 players, personal digital assistants, and gaming devices, into a single device. This trend will continue, and applications such as digital audio and video streaming (including personalized content delivery mechanisms) will soon be available on a handheld device. As customers want mobile convergence devices, manufacturers are driving new initiatives in the emerging mobile device market. Given the roles played by device design and service content in user satisfaction with a mobile convergence device, this study focuses on identifying and measuring the constructs for the process by which user satisfaction is achieved. This study synthesizes the expectation-disconfirmation paradigm with empirical theories in user satisfaction. Device and service levels are separated, and nine key constructs for user satisfaction of mobile convergence devices are proposed. Insight into this process could help web-based businesses to improve user satisfaction, thus enhancing the effectiveness of e-commerce for sellers and buyers. Research design, data, methodology - This study draws on three users of mobile convergence devices as examples. To test the research model and hypotheses, survey questionnaires were sent to 607 mobile device users. Mobile device users were initially identified from several members, and subjects were randomly drawn. Data from 577 survey responses were finally analyzed. The unit of measurement and analysis in this research study is at a personal level. Results - The measurements for the constructs were developed and tested in a two-phase study. In the first phase, the device and service dimensions were identified, and instruments for measuring them were developed and tested.
In the second phase, using the salient dimensions of the device and service as the formulating first-order factors, instruments were developed and empirically tested to measure satisfaction with the device and service. In measuring satisfaction with mobile convergence devices, the critical tasks are to identify the key constructs of such user satisfaction and to develop validated instruments to measure them. Hence, the results of this study have immediate implications for businesses and for research in user satisfaction of mobile convergence devices. Conclusions - This study provides reliable instruments for operationalizing key constructs in the analysis of user satisfaction of mobile convergence devices within the expectation-disconfirmation paradigm. Hence, convergence device makers will be able to examine whether their websites meet their customers' expectations by examining the expectations and disconfirmation of mobile convergence device customers in both the device aspect and the service aspect. Moreover, the introduction of expectation and disconfirmation constructs brings the marketing aspect of convergence devices into focus for such retailers, an aspect crucial to the effective design of websites for online businesses. In addition, this study provides the metrics required to initiate future studies on user satisfaction of mobile convergence devices.

The Role of Fundamentalization of Education in Improving the Future Specialists Professional Training with Usage of Multimedia Technologies

  • Palshkov, Kostiantyn;Kochubei, Olena;Tsokur, Olga;Tiahur, Vasyl;Tiahur, Liubomyra;Filimonova, Tetiana;Kuzminskyi, Anatolii
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.9
    • /
    • pp.95-102
    • /
    • 2022
  • The article considers how various scientists treat the fundamentalization of education in improving future specialists' professional training with the use of multimedia technologies. Various points of view and approaches to defining the concepts of fundamentalization of education and multimedia technologies are identified. The concept of fundamentalization of professional training of a future specialist is based on the goals and functions of fundamentalization and on the ways and means of achieving it, etc. Most authors agree only that the fundamentalization of education is aimed at improving the quality of education and the education of the individual. Others involve the formation of a culture and worldview, increasing the creative and intellectual potential, forming the professional competence of a specialist and the potential for further education, and so on. The term multimedia refers to interactive systems that provide processing of moving and still video images, animated graphics, and high-quality audio and speech. It is found that professional training of a specialist by means of multimedia technologies includes not only the activities of the teacher and student, which form the learning process, but also the independent activity of the subject, self-development, and assimilation of experience by the subject through analysis, comprehension, and transformation of the field of activity in which he is included. The article reveals which approaches to the fundamentalization of higher professional education make it possible to fully present theoretical training courses and for students to complete practical training effectively, which contributes to improving the quality of training of future specialists in higher education institutions.
Theoretical analysis of scientific views indicates a fairly serious attention of scientists to the problem of professional readiness of specialists and the possibility of higher educational institutions in preparing for it. At the same time, professional readiness is considered from different positions: as an active state of a person, which manifests itself in activity; as a result of activity; as goals of activity; as a quality that characterizes the attitude to solving professional problems and social situations; as a prerequisite for purposeful activity; as a form of activity of the subject; as an integral formation of personality; as a component of socio-professional culture; as a complex professionally significant neoplasm of the individual.

Implementation of Character and Object Metadata Generation System for Media Archive Construction (미디어 아카이브 구축을 위한 등장인물, 사물 메타데이터 생성 시스템 구현)

  • Cho, Sungman;Lee, Seungju;Lee, Jaehyeon;Park, Gooman
    • Journal of Broadcast Engineering
    • /
    • v.24 no.6
    • /
    • pp.1076-1084
    • /
    • 2019
  • In this paper, we introduced a system that extracts metadata by recognizing characters and objects in media using deep learning technology. In the field of broadcasting, multimedia contents such as video, audio, image, and text have been converted to digital contents for a long time, but the unconverted resources still remain vast. Building media archives requires a lot of manual work, which is time consuming and costly. Therefore, by implementing a deep learning-based metadata generation system, it is possible to save time and cost in constructing media archives. The whole system consists of four elements: training data generation module, object recognition module, character recognition module, and API server. The deep learning network module and the face recognition module are implemented to recognize characters and objects from the media and describe them as metadata. The training data generation module was designed separately to facilitate the construction of data for training neural network, and the functions of face recognition and object recognition were configured as an API server. We trained the two neural-networks using 1500 persons and 80 kinds of object data and confirmed that the accuracy is 98% in the character test data and 42% in the object data.

A Study on the Development of Electronic Resource Management System in a University Library (대학도서관 전자자원관리시스템(ERMS) 구축에 관한 연구)

  • Kim, Yong;Cho, Su-Kyeong
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.44 no.4
    • /
    • pp.249-276
    • /
    • 2010
  • With the rapid growth and development of information technology and the Internet, the amount of information published in electronic formats such as video, audio, digitalized text, etc. and the number of users accessing information online to satisfy their information needs are growing at a tremendous rate. This study analyzes standardized components to construct ERMS and proposes a model of ERMS based on the result of the analysis. The main functions of ERMS in university libraries are: 1) ERMS can manage and control access information to various electronic resources, metadata, holdings, user resources. Also, ERMS can be compatible with an existing library system such as IR(Information Retrieval) system, linking system, or proxy system. 2) ERMS should completely be compatible with acquisition and cataloging systems for effective management and control of integrated information organization and library budget. 3) ERMS should systematically and effectively manage license information on electronic resources. 4) ERMS should provide ideal and effective environment for use and access control of electronic resources in a library and integrated tool to manage and control all of electronic resources. Additionally, this study points out the need to organize committee groups to establish standardized rules and collaborative management of electronic resources among university libraries like DLF ERMI and redesign organizations in a library and a librarian's job description.

Emission Factors of Chemical Substances and the Abatement Policies in Korea Industries (화학물질 배출량 변동 요인과 배출저감 정책의 조합)

  • Rhee, Hae-Chun
    • Environmental and Resource Economics Review
    • /
    • v.18 no.4
    • /
    • pp.653-693
    • /
    • 2009
  • Using Korean environmental input-output analysis, this paper provides the emission intensities of chemicals, especially toxic and carcinogenic substances, linking them to the structure of demand, and examines the policy mix to abate emissions of these substances. According to the results, the industries with the highest total emission intensities (TEI) of toxic substances are, in rank order: Printing and reproduction of recorded media (21), Other transportation equipment (26), Pulp and paper (11), Leather and fur products (9), and Fiber yarn and fabrics (7). The highest TEI of carcinogenic substances are found in Wood and wooden products (10), Motor vehicles and parts (25), Plastic and rubber products (15), Audio, video and communications equipment (23), etc. The economic factors that change these emissions are emission intensities and final demand. The effective combinations of policy instruments to abate these emissions vary by industry and substance. For example, the government needs to implement effective TEI management in the Fiber yarn and fabrics (7) sector, whereas in the furniture (27) sector reducing final demand is more effective.

Understanding Purposes and Functions of Students' Drawing while on Geological Field Trips and during Modeling-Based Learning Cycle (야외지질답사 및 모델링 기반 순환 학습에서 학생들이 그린 그림의 목적과 기능에 대한 이해)

  • Choi, Yoon-Sung
    • Journal of the Korean earth science society
    • /
    • v.42 no.1
    • /
    • pp.88-101
    • /
    • 2021
  • The purpose of this study was to qualitatively examine the meaning of students' drawings in outdoor classes and modeling-based learning cycles. Ten students were observed in a gifted education center in Seoul. Under the theme of the Hantan River, three outdoor classes and three modeling activities were conducted. Data were collected to document all student activities during field trips and classroom modeling activities using simultaneous video and audio recording and observation notes made by the researcher and students. Following Hatisaru (2020), drawing types were classified by modifying the representations in a learning context in geological field trips. We used deductive content analysis to describe the drawing characteristics, including students' writing. The results suggest that students have symbolic images that consist of geologic concepts, visual images that describe topographical features, and affective images that express students' emotion domains. The characteristics were classified into explanation, generality, elaboration, evidence, coherence, and state-of-mind. The characteristics and drawing types are consecutive in the modeling-based learning cycle and reflect the students' positive attitude and cognitive scientific domain. Drawing is a useful tool for reflecting students' thoughts and opinions in both outdoor class and classroom modeling activities. This study provides implications for emphasizing the importance of drawing activities.