Abstract
One of the research objectives in the area of multimedia human-computer interaction is the application of artificial intelligence and robotics technologies to the development of computer interfaces. This involves utilizing many forms of media, integrating speed input, natural language, graphics, hand pointing gestures, and other methods for interactive dialogues. Although current human-computer communication methods include computer keyboards, mice, and other traditional devices, the two basic ways by which people communicate with each other are voice and gesture. This paper reports on research focusing on the development of an intelligent multimedia interface system modeled based on the manner in which people communicate. This work explores the interaction between humans and computers based only on the processing of speech(Work uttered by the person) and processing of images(hand pointing gestures). The purpose of the interface is to control a pan/tilt camera to point it to a location specified by the user through utterance of words and pointing of the hand, The systems utilizes another stationary camera to capture images of the users hand and a microphone to capture the users words. Upon processing of the images and sounds, the systems responds by pointing the camera. Initially, the interface uses hand pointing to locate the general position which user is referring to and then the interface uses voice command provided by user to fine-the location, and change the zooming of the camera, if requested. The image of the location is captured by the pan/tilt camera and sent to a color TV monitor to be displayed. This type of system has applications in tele-conferencing and other rmote operations, where the system must respond to users command, in a manner similar to how the user would communicate with another person. The advantage of this approach is the elimination of the traditional input devices that the user must utilize in order to control a pan/tillt camera, replacing them with more "natural" means of interaction. A number of experiments were performed to evaluate the interface system with respect to its accuracy, efficiency, reliability, and limitation.