What You Hear is What You See

  • Published : 2002.03.01

Abstract

This study investigates the relationship between a speaker's voice and the image information the voice carries. Whenever we hear somebody talking, we form a mental image of the speaker. Is that image accurate? Is there a relationship between the voice and the image it triggers? To answer these questions, speech samples from 8 males and 8 females were recorded. Two photos were taken of each speaker: a whole-body photo (W) showing the speaker's physical characteristics, and a face close-up (F) revealing few physical details. A total of 361 subjects were asked to match the voices with the corresponding photos. The results showed that 5 males and 5 females (with W) and 2 males and 4 females (with F) were correctly identified. More interestingly, even in the mismatches, participants showed a strong tendency to agree on which voice should correspond to which photo. The participants also agreed much more readily on their favorite voice than on their favorite photo. It seems that voice does carry certain information about the physical characteristics of the speaker in a consistent manner. These findings have some bearing on understanding the mechanisms of speech production and perception, as well as on improving speech technology.
