Although much progress has been made in developing theories, models and systems in the areas of Natural Language Processing (NLP) and Vision Processing (VP), there has so far been little progress on integrating these two subareas of Artificial Intelligence (AI). This book contains a set of edited papers on recent advances in the theories, computational models and systems for the integration of NLP and VP. The volume includes original work by notable researchers: Alex Waibel outlines multimodal interfaces, including studies of speech, gesture and pointing; eye gaze, lip motion and facial expression; and handwriting, face recognition, face tracking and sound localization, all in a connectionist framework. Anthony Cohn and John Gooday use spatial relations to describe visual languages. Naoyuki Okada considers the intentions of agents in visual environments. In addition to these studies, the volume includes many recent advances from North America, Europe and Asia, demonstrating that the integration of Natural Language Processing and Vision Processing is truly an international challenge.