The Psycholinguistics Research Group at York was the first group in the UK to employ the visual world paradigm. This procedure involves monitoring participants’ eye movements around a scene as they concurrently hear a sentence that refers to objects within the scene. The ensuing eye movements can be analysed relative to the timings of particular points within the spoken sentence (our group was also the first to fully automate the analysis process). The paradigm is particularly useful for examining the manner and timing with which an unfolding sentence can guide visual attention. Our own work, described below, has demonstrated how the paradigm can also be used to explore how language is mapped onto mental representations of an external world that may in fact be absent at the time the language is heard. Establishing the ‘how and when’ of this mapping is important because it allows us to determine what kinds of information (lexical, grammatical, visual, real-world knowledge) are deployed when. It also allows us to explore the surprisingly close linkage between the interpretation of an unfolding sentence and the control of visual attention and eye movements.

The visual world paradigm is due originally to Cooper (1974) and was subsequently developed by Tanenhaus et al. (1995). In our first study (Altmann & Kamide, 1999), we presented participants with a scene depicting, for example, a boy on a floor, a toy train, a toy car, a ball, and a birthday cake. Whilst viewing this scene, participants heard ‘the boy will eat the cake’ or ‘the boy will move the cake’. We found that, during the verb, more looks were launched towards the cake when participants heard ‘eat’ than when they heard ‘move’. We argued that the human sentence processing mechanism, when mapping language onto the visual world, can use information at the verb to anticipate what would most likely be referred to next in the concurrent visual scene (‘eat’ requires an object that is edible, with the cake being the only edible object in the scene; ‘move’ was not so selective, and could ‘apply’ to any of the objects in the scene). This was the first demonstration of anticipatory eye movements – eye movements towards an entity referred to in the sentence, but launched before the critical referring expression (e.g. ‘the cake’) was actually encountered in the unfolding speech stream. This effect has engendered much interest.
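To make the time-locked nature of the analysis concrete, the sketch below computes the proportion of trials containing at least one look to an interest area during a critical word’s window (e.g. verb onset to verb offset). This is a minimal illustration under assumed data structures; the names (`Fixation`, `proportion_of_looks`) and the sample data are hypothetical, not the group’s actual analysis code.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    start_ms: int   # fixation onset, relative to sentence onset
    end_ms: int     # fixation offset
    region: str     # interest area, e.g. "cake", "train"

def looked_at(fixations, region, win_start, win_end):
    """True if any fixation on `region` overlaps the critical window."""
    return any(f.region == region and f.start_ms < win_end and f.end_ms > win_start
               for f in fixations)

def proportion_of_looks(trials, region, win_start, win_end):
    """Proportion of trials with at least one look to `region` in the window."""
    hits = sum(looked_at(t, region, win_start, win_end) for t in trials)
    return hits / len(trials)

# Hypothetical trials: suppose the verb 'eat' spans 400-800 ms.
trial_a = [Fixation(0, 400, "boy"), Fixation(450, 900, "cake")]
trial_b = [Fixation(0, 500, "boy"), Fixation(550, 900, "train")]
p_cake = proportion_of_looks([trial_a, trial_b], "cake", 400, 800)
```

Comparing such proportions across conditions (‘eat’ vs. ‘move’) during the same window is what reveals the anticipatory effect.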

Subsequently, we have reported a range of data using this paradigm, including its application to exploring the comprehension deficit in children with poor comprehension (Nation et al., 2003). In Kamide et al. (2003), we asked whether the ‘eat the cake’ result was due simply to information on the verb (i.e. it is part of the meaning of ‘eat’ that there must be something edible, and perhaps this information alone drove the eyes towards the cake), or to information derived through the combination of ‘the boy’ and ‘eat’ (i.e. the eyes were driven towards the cake because that was what the boy would most likely eat). To explore this, we showed participants scenes depicting a young girl, an adult man, a motorbike, and a fairground carousel. Participants heard either ‘the man will ride the bike’ or ‘the girl will ride the carousel’. We had established through prior norms that the man would more likely ride the bike than the carousel, and conversely for the girl. We found that, during ‘ride’, more looks were initiated towards the motorbike when the subject of the sentence had been ‘the man’ than when it had been ‘the girl’; the converse held for looks towards the carousel. For the same scene, which also depicted a beer and some sweets, we established the same pattern for ‘the man will taste the beer’ and ‘the girl will taste the sweets’ (more looks to the beer during ‘taste’ after ‘the man’, and more looks to the sweets after ‘the girl’). These data demonstrated rapid integration, at the verb, of the verb’s meaning with the meaning of the sentential subject, with the contents of the visual scene, and with general world knowledge concerning the plausibility of the alternative events.
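The logic being tested – a verb’s selectional restriction combined with agent-based plausibility to rank candidate referents in the scene – can be sketched as follows. The plausibility scores and the `anticipated_referent` helper are invented stand-ins for the norming data, purely for illustration.

```python
# Objects satisfying the selectional restriction of 'ride' (assumed).
RIDEABLE = {"motorbike", "carousel"}

# Toy agent-verb-object plausibility scores (hypothetical, not the study's norms).
plausibility = {
    ("man", "ride", "motorbike"): 0.9,
    ("man", "ride", "carousel"): 0.2,
    ("girl", "ride", "motorbike"): 0.2,
    ("girl", "ride", "carousel"): 0.9,
}

def anticipated_referent(agent, verb, scene_objects):
    """Most plausible scene object that satisfies the verb's restriction."""
    candidates = [o for o in scene_objects if o in RIDEABLE]
    return max(candidates, key=lambda o: plausibility.get((agent, verb, o), 0.0))

scene = ["motorbike", "carousel", "beer", "sweets"]
```

On this sketch, the same verb yields different anticipated referents depending on the subject, mirroring the ‘man’/‘girl’ contrast in the looking data.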

In Altmann (2004), we demonstrated that the original Altmann & Kamide (1999) result did not require the visual scene to be concurrent with the language; we showed participants the scene for a few seconds and then removed it. After a short while, with the screen still blank, participants heard the target sentence. In this case, we found looks, during the critical verb, towards where the cake had been. This was the first indication that, within this paradigm, language may be mapped onto a mental representation of the scene rather than onto the scene itself (Richardson & Spivey, 2000, had demonstrated looks towards the location where a face had earlier delivered information that was currently being repeated, but this is a different effect; the Altmann study showed that as reference to objects unfolds, or is anticipated, so the eyes move to where those objects had been located). A range of follow-up studies (Altmann & Kamide, 2004; submitted) demonstrated further the nature of this representation. In one study, participants saw a scene depicting a woman, a wine glass on the floor, a wine bottle nearby, and an empty table. Participants heard, concurrently, one of: (A) ‘The woman will put the glass on the table. Then, she will pick up the wine and pour it carefully into the glass’; or (B) ‘The woman is too lazy to put the glass on the table. Instead, she will pick up the wine and pour it carefully into the glass’. Kamide et al. (2003) had demonstrated anticipatory eye movements towards the glass during ‘it carefully’. Here, we were interested in where the eyes would move when anticipating the glass: in (A), the glass has been ‘moved’ to the table prior to the pouring event. Indeed, we found more eye movements during ‘it carefully’ towards the table in (A) than in (B). Similarly, at the final ‘glass’ there were more fixations on the table in (A) than in (B). We argued that language-mediated eye movements thus reflect the mapping of the unfolding language onto a dynamically updated mental representation of the scene. Taking all these data together, we have concluded that language makes contact with the visual world (or rather, with its representation) at the earliest opportunity.
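One way to picture the dynamically updated representation these results imply is a simple mapping from objects to their currently represented locations, revised by described events; anticipatory looks would then target an object’s represented location rather than its last seen one. The sketch below is our own illustration, under assumed names, not a model from the papers.

```python
# Hypothetical sketch: the scene representation maps objects to the locations
# at which they are currently represented. Described events update the map,
# so looks can target where an object *would now be*, even on a blank screen.
def apply_event(scene, obj, new_location):
    """Return a copy of the scene with `obj` relocated, as after hearing
    'the woman will put the glass on the table'."""
    updated = dict(scene)
    updated[obj] = new_location
    return updated

# Initial scene: glass and bottle on the floor, table empty.
scene = {"glass": "floor", "bottle": "floor"}

# Condition (A): the glass is described as moved to the table,
# so anticipating 'the glass' should drive looks to the table region.
scene_a = apply_event(scene, "glass", "table")
```

In condition (B) no movement is described, so the representation, and hence the anticipated look location, remains the floor.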

Recently, in Huettig & Altmann (2004; submitted), we have asked what information about an object in the visual scene (or in its mental representation) draws the eyes towards it as a word referring to that object unfolds in time (the efficacy of the paradigm for exploring not only sentential effects but also lexical effects was established by Tanenhaus and colleagues: Tanenhaus et al., 1995; Allopenna et al., 1998). In Huettig & Altmann (2004; submitted), participants were presented with an array of objects (e.g. a trumpet, a goat, a cabbage, a bicycle) and heard the sentence ‘At first, the man looked around, but then he saw the piano and agreed that it was beautiful’. During ‘piano’ we found more looks towards the trumpet than towards any of the other objects, and almost as many looks as when, in another condition, the trumpet was replaced by a piano. And when both the piano and the trumpet were present, the most looks were towards the piano, but there were still more looks towards the trumpet than towards the other objects. We attributed this to the equivalent of a semantic priming effect (cf. Meyer & Schvaneveldt, 1971). Moreover, the probability of looking towards the related object (e.g. the trumpet in this example) was correlated with the degree of conceptual overlap between this object and the named object (‘piano’), as determined through semantic feature norms (e.g. Cree & McRae, 2003); we have since replicated this effect in a further study still to be submitted for publication. However, in Huettig & Altmann (2004) we demonstrated that it is not just semantic overlap that drives eye movements towards ‘competitors’: we found more looks towards a green cabbage than towards other objects after hearing ‘ball’ (shape similarity) or after hearing ‘frog’ (colour similarity). And as with the original semantic competitor effect (‘piano’–trumpet), the scene had been onscreen for at least 5 seconds before target word onset, ample time for each of the four objects in the display to be recognized.
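The notion of conceptual overlap derived from feature norms can be illustrated with a toy similarity computation over binary feature sets. The feature lists below are invented for illustration only; they are not the actual norm data (cf. Cree & McRae, 2003) used in the analysis.

```python
def feature_overlap(features_a, features_b):
    """Cosine-style similarity over binary feature sets: number of shared
    features divided by the geometric mean of the two set sizes."""
    shared = len(features_a & features_b)
    return shared / ((len(features_a) * len(features_b)) ** 0.5)

# Invented feature sets (not actual norms), for illustration only.
piano = {"has_keys", "is_large", "made_of_wood", "used_for_music"}
trumpet = {"made_of_metal", "is_shiny", "used_for_music", "is_loud"}
goat = {"is_animal", "has_legs", "eats_grass", "has_horns"}
```

On this measure the trumpet overlaps with ‘piano’ more than the goat does, which is the kind of graded similarity that was found to correlate with the probability of looks to the competitor.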

There thus appears to be an automatic tendency to move the eyes towards objects that are not those intended by the speaker, but which incidentally share certain features (conceptual, shape, colour, and possibly location) with the intended object. The relevance of this finding is twofold: first, it enables us to determine in more detail the nature of the information, and the processes, that guide eye movements; and second, it suggests that some mechanism must exist to prevent an explosion of spurious eye movements during normal everyday listening.

Our overall goal has been to understand how language is mapped onto mental representations of the visual world. We view the visual scene in this paradigm as functioning (in certain of our studies) in much the same way as a discourse context functions: it sets up entities with certain properties to which the language will subsequently refer. This research thus extends a body of work stretching back to the mid-1980s and Altmann & Steedman (1988), where we demonstrated how reference (or failure to establish reference) to entities in the context could potentially influence syntactic ambiguity resolution (until then, the initial decisions of the sentence processor had been considered ‘informationally blind’ to contextual influence; e.g. Frazier, 1987). Our goal now is to extend our current work on language processing in visual contexts, and to understand better the control of language-mediated eye movements in light of the ‘spurious’ cases outlined above: eye movements towards where objects were but no longer are, and towards objects that only partially match the conceptual and physical specification associated with the meaning of a noun. Our aim is to understand both what drives language-mediated eye movements and what, in the spurious cases, prevents them.

Allopenna, P. D., Magnuson, J. S., & Tanenhaus, M. K. (1998). Tracking the time course of spoken word recognition using eye movements: Evidence for continuous mapping models. Journal of Memory and Language, 38(4), 419-439.
Altmann, G.T.M., & Steedman, M.J. (1988). Interaction with context during human sentence processing.
Cognition, 30, 191-238.
Altmann, G.T.M. (2004). Language-mediated eye movements in the absence of a visual world: The ‘blank screen paradigm’.
Cognition, 93, 79–87.
Altmann, G.T.M., & Kamide, Y. (submitted). Mediating the mapping between language and the visual world: eye movements and mental representation.
Altmann, G.T.M., & Kamide, Y. (1999). Incremental interpretation at verbs: Restricting the domain of subsequent reference.
Cognition, 73(3), 247-264.
Altmann, G.T.M., & Kamide, Y. (2004). Now you see it, now you don't: Mediating the mapping between language and the visual world. In J. M. Henderson & F. Ferreira (Eds.),
The interface of language, vision, and action: Eye movements and the visual world. New York: Psychology Press.
Cooper, R. M. (1974). The control of eye fixation by the meaning of spoken language: A new methodology for the real-time investigation of speech perception, memory, and language processing.
Cognitive Psychology, 6(1), 84-107.
Cree, G. S., & McRae, K. (2003). Analyzing the factors underlying the structure and computation of the meaning of chipmunk, cherry, chisel, cheese, and cello (and many other such concrete nouns).
Journal of Experimental Psychology: General, 132, 163-201.
Frazier, L. (1987). Sentence Processing: A Tutorial Review. In M. Coltheart (Ed.),
Attention and Performance XII: The Psychology of Reading (pp. 559-586). Hove: Erlbaum.
Huettig, F. & Altmann, G.T.M. (2004). Language-mediated eye movements and the resolution of lexical ambiguity. In M. Carreiras & C. Clifton (Eds.)
The on-line study of sentence comprehension: Eye-tracking, ERP, and beyond. Psychology Press.
Huettig, F. & Altmann, G.T.M. (submitted). Word meaning and the control of eye fixation: semantic competitor effects and the visual world paradigm.
Kamide, Y., Altmann, G. T. M., & Haywood, S. L. (2003). The time-course of prediction in incremental sentence processing: Evidence from anticipatory eye movements. Journal of Memory and Language, 49, 133–159.
Meyer, D. E., & Schvaneveldt, R. W. (1971). Facilitation in recognizing words: Evidence of a dependence upon retrieval operations.
Journal of Experimental Psychology, 90, 227-234.
Nation, K., Marshall, C., & Altmann, G.T.M. (2003). Investigating individual differences in children’s real-time sentence comprehension using language-mediated eye movements.
Journal of Experimental Child Psychology, 86, 314-329.
Richardson, D. C., & Spivey, M. J. (2000). Representation, space and Hollywood squares: looking at things that aren't there anymore.
Cognition, 76, 269-295.
Tanenhaus, M. K., Spivey-Knowlton, M. J., Eberhard, K. M., & Sedivy, J. C. (1995). Integration of visual and linguistic information in spoken language comprehension.
Science, 268(5217), 1632-1634.