• Title/Abstract/Keywords: Image to Speech

HearCAM Embedded Platform Design

  • 홍선학;조경순
    • 디지털산업정보학회논문지 / Vol. 10, No. 4 / pp. 79-87 / 2014
  • In this paper, we implement the HearCAM platform on the Raspberry Pi Model B+, an open-source platform. The Raspberry Pi Model B+ consists of a dual step-down (buck) power supply with a polarity protection circuit and hot-swap protection, a Broadcom BCM2835 SoC running at 700 MHz, 512 MB of RAM soldered on top of the Broadcom chip, and a Pi camera serial connector. We use the Google speech recognition engine to recognize voice characteristics, implement pattern matching with OpenCV, and extend the speech capability with SVOX TTS (text-to-speech) so that the matching result is spoken to the user. The HearCAM thus identifies the voice and pattern characteristics of target images scanned with the Pi camera while gathering temperature sensor data in an IoT environment. Speech recognition, pattern matching, and temperature sensor data logging are implemented over Wi-Fi wireless communication. We also designed the shape of the HearCAM ourselves and fabricated it with 3D printing.

Table Structure Recognition in Images for Newspaper Reader Application for the Blind

  • 김지웅;이강;김경미
    • 한국멀티미디어학회논문지 / Vol. 19, No. 11 / pp. 1837-1851 / 2016
  • Newspaper reader mobile applications that use a text-to-speech (TTS) function enable blind people to read newspaper content. However, the reader program cannot easily read tables, because most tables are stored as images. Even if OCR (optical character recognition) programs are used to recognize the letters in table images, OCR alone cannot provide a table-reading function, because the table structure is unknown to the reader. Therefore, the exact location of each table cell that contains text must be identified beforehand. In this paper, we propose an efficient image processing algorithm that recognizes all the cells in a table by identifying the columns and rows in the table image. From the cell location data provided by the column and row identification algorithm, we can generate table structure information and table reading scenarios. Our experimental results with table images commonly found in newspapers show that our cell identification approach achieves 100% accuracy for simple black-and-white table images and about 99.7% accuracy for colored and complicated tables.
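The abstract above does not give the details of its row and column identification algorithm. As a rough illustration of the general idea only, the sketch below uses projection profiles, a common technique that is an assumption here and not necessarily the authors' method: a row or column of a binarized image is treated as a ruling line when most of its pixels are ink, and cells are the boxes between consecutive lines. The names `find_ruling_lines`, `cells_from_lines`, and the `min_fill` threshold are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binarized image: 1 = ink, 0 = background


def find_ruling_lines(image: Image, min_fill: float = 0.9) -> Tuple[List[int], List[int]]:
    """Return indices of rows and columns whose ink ratio is at least min_fill."""
    height, width = len(image), len(image[0])
    # Horizontal projection: count ink pixels in each row.
    row_lines = [y for y in range(height) if sum(image[y]) >= min_fill * width]
    # Vertical projection: count ink pixels in each column.
    col_lines = [
        x for x in range(width)
        if sum(image[y][x] for y in range(height)) >= min_fill * height
    ]
    return row_lines, col_lines


def cells_from_lines(row_lines: List[int], col_lines: List[int]) -> List[Tuple[int, int, int, int]]:
    """Build (top, left, bottom, right) cell boxes between consecutive ruling lines."""
    cells = []
    for top, bottom in zip(row_lines, row_lines[1:]):
        if bottom - top <= 1:  # adjacent indices belong to one thick line
            continue
        for left, right in zip(col_lines, col_lines[1:]):
            if right - left <= 1:
                continue
            cells.append((top + 1, left + 1, bottom - 1, right - 1))
    return cells


# Hypothetical 5x7 table image with ruling lines on rows 0, 2, 4 and columns 0, 3, 6.
image = [[1 if (y % 2 == 0 or x % 3 == 0) else 0 for x in range(7)] for y in range(5)]
rows, cols = find_ruling_lines(image)
cells = cells_from_lines(rows, cols)
```

Each resulting box could then be cropped and passed to an OCR engine, with the row and column order giving the reading sequence. Real newspaper tables without full ruling lines would need a different cue, such as whitespace gaps.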

Implementation of a Multimodal Controller Combining Speech and Lip Information

  • 김철;최승호
    • 한국음향학회지 / Vol. 20, No. 6 / pp. 40-45 / 2001
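  • In this paper, we implement a multimodal system that combines speech and image information and evaluate its performance. A speech recognizer was designed using speech information and a lip recognizer using image information; both use an HMM (hidden Markov model)-based recognition engine. The speech and image recognition results were combined with weights of 8:2, respectively. The resulting multimodal recognition system was integrated with a DARC (data radio channel) system to control the application program Comdio (computer radio). TCP/IP sockets were used for information exchange between the multimodal and DARC systems and between the two recognizers within the multimodal system. The Comdio control experiments with the integrated system showed that lip recognition can serve as an auxiliary means for the speech recognizer, and applying the system to traffic information and car navigation devices is expected to broaden its range of application.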
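The 8:2 weighting described above can be illustrated with a small late-fusion sketch. The abstract does not say what kind of scores the two recognizers emit, so this example assumes each returns a normalized score per candidate command; the function name `fuse_scores` and the command names are hypothetical.

```python
from typing import Dict, Tuple


def fuse_scores(
    speech_scores: Dict[str, float],
    lip_scores: Dict[str, float],
    w_speech: float = 0.8,
    w_lip: float = 0.2,
) -> Tuple[str, Dict[str, float]]:
    """Combine per-command scores from two recognizers with fixed weights (8:2 by default)."""
    # Only commands scored by both recognizers can be fused.
    commands = speech_scores.keys() & lip_scores.keys()
    fused = {
        cmd: w_speech * speech_scores[cmd] + w_lip * lip_scores[cmd]
        for cmd in commands
    }
    best = max(fused, key=fused.get)
    return best, fused


# Hypothetical scores: speech alone slightly prefers "stop", the lips strongly suggest "play".
speech = {"play": 0.40, "stop": 0.45, "next": 0.15}
lips = {"play": 0.70, "stop": 0.10, "next": 0.20}
best, fused = fuse_scores(speech, lips)
```

With these made-up numbers the lip score tips a close speech decision, which is the auxiliary role the abstract attributes to lip recognition.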

Design of an Efficient VLSI Architecture and Verification Using FPGA Implementation for HMM (Hidden Markov Model)-Based Robust and Real-Time Lip Reading

  • 이지근;김명훈;이상설;정성태
    • 한국컴퓨터정보학회논문지 / Vol. 11, No. 2 / pp. 159-167 / 2006
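  • Lip reading has been proposed as a way to improve the performance of speech recognition systems in noisy environments. Whereas previous papers propose software lip-reading methods, this paper proposes a hardware design for real-time lip reading. For real-time processing and ease of implementation, the lip-reading system is divided into three modules: an image acquisition module, a feature vector extraction module, and a recognition module. The image acquisition module acquires input images with a CMOS image sensor. The feature vector extraction module extracts feature vectors from the input images using a parallel block matching algorithm; it was coded for an FPGA and simulated. The recognition module applies an HMM-based recognition algorithm to the extracted feature vectors to recognize the spoken word; it was coded on a DSP and simulated. The simulation results show that a real-time lip-reading system can be implemented in hardware.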
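To make the block matching step concrete, here is a minimal software sketch. It is sequential and uses the sum of absolute differences (SAD) as the matching cost, both of which are assumptions; the paper describes a parallel hardware version whose cost function and block sizes are not given in the abstract. All names (`make_frame`, `sad`, `best_displacement`) are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame as rows of pixel intensities


def make_frame(height: int, width: int, top: int, left: int, size: int, value: int = 255) -> Frame:
    """Create a black test frame containing one bright square block."""
    frame = [[0] * width for _ in range(height)]
    for y in range(top, top + size):
        for x in range(left, left + size):
            frame[y][x] = value
    return frame


def sad(ref: Frame, cur: Frame, ry: int, rx: int, cy: int, cx: int, size: int) -> int:
    """Sum of absolute differences between a block in ref and a block in cur."""
    return sum(
        abs(ref[ry + dy][rx + dx] - cur[cy + dy][cx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def best_displacement(ref: Frame, cur: Frame, top: int, left: int, size: int, search: int) -> Tuple[int, int, int]:
    """Search +/- `search` pixels around (top, left) and return (dy, dx, cost) with the lowest SAD."""
    height, width = len(cur), len(cur[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue  # candidate block would fall outside the frame
            cost = sad(ref, cur, top, left, y, x, size)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best


# Hypothetical frames: a 2x2 block moves from (2, 2) to (3, 4) between frames.
previous = make_frame(6, 6, top=2, left=2, size=2)
current = make_frame(6, 6, top=3, left=4, size=2)
motion = best_displacement(previous, current, top=2, left=2, size=2, search=2)
```

Each candidate displacement is evaluated independently of the others, which is what makes this search a natural fit for parallel hardware such as the FPGA design described above.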

A Study on the Image of Male Flight Attendant on Customer Satisfaction

  • Kim, Min-Ji;Park, Hye-Yoon;Park, So-Yeon
    • 유통과학연구 / Vol. 15, No. 8 / pp. 37-46 / 2017
  • Purpose - Many studies have shown the effects of the external image of female flight attendants on customer satisfaction. Recently, the perception of male flight attendants has become more important and positive, and airlines are hiring a significant number of male flight attendants every year. Because research on male flight attendants is lacking, however, this study investigates their image. Research design, data and methodology - The study analyzes data collected through a survey of 204 respondents. Results - The study examined whether the image of the male flight attendant affects the cognitive and emotional perceptions of customers. The focus of the present study is the external image of the male flight attendant, which was divided into the following components: hairstyle, body type, uniform, speech, and facial expression. Conclusions - The study sought to determine whether the image of the male flight attendant affects the emotional and cognitive images of airlines, and whether these images have a positive effect on customers' satisfaction with and loyalty to an airline, so that airlines can use the external image of the male flight attendant to reinforce their own image.

The Test-Retest Reliability of Subjective Visual Horizontal Testing: Comparisons between Solid and Dotted Line Images

  • Zakaria, Mohd Normani;Wahat, Nor Haniza Abdul;Zainun, Zuraida;Sakeri, Nurul Syarida Mohd;Salim, Rosdan
    • Journal of Audiology & Otology / Vol. 24, No. 2 / pp. 107-111 / 2020
  • The present study aimed to determine the test-retest reliability of subjective visual horizontal (SVH) testing with solid and dotted line images. In this repeated measures study, 36 healthy young Malaysian adults (mean age=23.3±2.3 years, 17 males and 19 females) were enrolled. None had hearing, vestibular, balance, or vision problems. The SVH angles were recorded from each participant in an upright body position using a computerized device. Participants were asked to report their perception of horizontality for solid and dotted line images (in the presence of a static black background). After 1 week, the SVH procedure was repeated. The test-retest reliability of SVH was found to be good for both the solid line [intraclass correlation (ICC)=0.80] and the dotted line (ICC=0.78). As revealed by Bland-Altman plots, for each visual image, the agreement of SVH between the two sessions was within the clinically accepted criteria (±2°). The SVH testing was found to be temporally reliable, which can be clinically beneficial. Both solid and dotted lines can be used reliably in SVH testing among young adults.
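For readers unfamiliar with Bland-Altman analysis, the sketch below computes its two standard quantities for paired test-retest measurements: the bias (mean of the session differences) and the 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The angle values are invented for illustration and are not data from the study, and the function name `bland_altman` is hypothetical. How exactly the authors compared their results with the ±2° criterion is not stated in the abstract.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(session1: Sequence[float], session2: Sequence[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement for paired measurements."""
    diffs = [a - b for a, b in zip(session1, session2)]
    bias = mean(diffs)   # systematic difference between sessions
    sd = stdev(diffs)    # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical SVH angles in degrees for five participants, one week apart.
week0 = [0.5, -0.3, 1.0, 0.2, -0.8]
week1 = [0.3, 0.1, 0.6, 0.4, -0.5]
bias, lower, upper = bland_altman(week0, week1)

# One possible reading of the criterion: both limits fall inside +/- 2 degrees.
within_criterion = -2.0 < lower and upper < 2.0
```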

Change in lip movement during speech by aging: Based on a double vowel

  • 박희준
    • 말소리와 음성과학 / Vol. 13, No. 1 / pp. 73-79 / 2021
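  • This study examines how lip movement during speech changes with aging. The participants were 15 elderly women with a mean age of 69 and 15 young women with a mean age of 22. To measure lip movement, the lips were recorded during double-vowel utterances and saved as still images, and the lengths at the minimum and maximum of lip movement were then measured manually in pixels with image analysis software and compared. For clinical usability, software applying an automated algorithm was also developed, and its results were compared with the manual results. The elderly group showed smaller ranges of horizontal and vertical lip length than the young group in the double-vowel task. The manual and automated methods showed a strong positive correlation, indicating that both are useful for extracting lip contours. These results indicate that the range of lip movement during speech decreases as aging progresses. Therefore, simply measuring lip movement before aging progresses to monitor one's own condition, and performing exercises that maintain the lip range, could help prevent pronunciation problems caused by aging.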

Classification of Pornographic Videos Using Audio Information

  • 김봉완;최대림;방만원;이용주
    • 대한음성학회:학술대회논문집 / 대한음성학회 2007년도 한국음성과학회 공동학술대회 발표논문집 / pp. 207-210 / 2007
  • As the Internet has become prevalent in our lives, harmful content on it has increased and become a very serious problem. Pornographic video in particular is harmful to children. To block it, many filtering systems rely on keyword-based or image-based methods. The main purpose of this paper is to devise a system that classifies pornographic videos based on audio information. As the feature vector we use MFCC together with Mel-Cepstrum Modulation Energy (MCME), a modulation energy calculated on the time trajectory of the Mel-frequency cepstral coefficients (MFCC), and as the classifier we use a Gaussian Mixture Model (GMM). In the experiments, the proposed system correctly classified 97.5% of the pornographic data and 99.5% of the non-pornographic data. We expect that the proposed method can serve as a component of a more accurate classification system that uses video and audio information simultaneously.

A Simulation Study of the Vocal Tract in Tracheoesophageal Speaker

  • Kim, Cheol-Soo;Wang, Soo-Geun;Roh, Hwan-Jung;Goh, Eui-Kyung;Chon, Kyong-Myong;Lee, Byung-Joo;Kwon, Soon-Bok;Lee, Suck-Hong;Kim, Hak-Jin;Yang, Byung-Gon
    • 음성과학 / Vol. 7, No. 3 / pp. 197-218 / 2000
  • Vocal tract shapes were measured in tracheoesophageal speakers during sustained phonation of the five Korean vowels /u/, /o/, /a/, /e/, /i/ using magnetic resonance imaging (MRI). The intelligibility of the subjects' original vowel utterances and of the vowels synthesized from the MR images was analyzed. The results were as follows: (1) In the subjects' actual speech, the vowels /a/, /e/, /i/ were perceived as the intended sounds, but the vowels /o/ and /u/ were perceived as /ə/ and a strained /u/, respectively. (2) The vowels /a/ and /e/ synthesized from the MR images were perceived as the intended sounds, but the vowels /u/, /o/, /i/ were perceived as different sounds. (3) For the vowel /o/, the vowel synthesized with the pharyngeal segment expanded threefold was perceived as more natural than the one synthesized with a twofold expansion. Pharyngeal areas of varied sizes should be tested to secure better speech production, because correct vocal tract shapes lead to distinct vowel production.
