Motion events in language describe the movement of an entity to another location along a path. In two eye-tracking experiments we found that comprehension of motion events involves the online construction of a spatial mental model that integrates language with the visual world. In the first experiment, participants listened to sentences describing the movement of an agent to a goal location with verbs suggesting a more upwards-oriented path (e.g., “jump”) or a more downwards-oriented path (e.g., “crawl”) while concurrently viewing a visual scene depicting the agent, the goal, and some ‘empty space’ in between. We found that in the rare event of fixating the empty-space region between agent and goal, visual attention was biased upwards or downwards depending on the kind of verb. In Experiment 2, the sentences were presented concurrently with scenes featuring a central ‘obstruction’, which not only imposed further constraints on verb-related motion paths but also increased the likelihood of fixating the area between the agent and the goal. The results of this experiment corroborated and refined the previous findings. Specifically, eye-movement effects emerged immediately after the verb was heard and were in line with data from an additional mouse-tracking task that encouraged a more explicit spatial re-enactment of the motion event. In revealing how event comprehension operates in the visual world, these findings suggest a mental simulation process whereby spatial details of motion events are mapped onto the world through visual attention. The strength and detectability of such effects in overt eye movements are constrained by the visual world and by the fact that perceivers rarely fixate regions of empty space.