Our rich, embodied visual experiences of the world involve integrating information from multiple sensory modalities, yet how the brain brings together multiple sensory reference frames to generate such experiences remains unclear. Recently, it has been demonstrated that BOLD fluctuations throughout the brain can be explained as a function of the activation pattern on the topographic map of primary visual cortex (V1). This class of ‘connective field’ models allows us to project V1’s map of visual space into the rest of the brain and discover previously unknown visual organization. Here, we extend this powerful principle to incorporate both visual and somatosensory topographies. We explain BOLD responses during naturalistic movie-watching as a function of two spatial activation patterns (connective fields), one on the surface of V1 and one on the surface of primary somatosensory cortex (S1). We show that responses in higher levels of the visual hierarchy are characterized by multimodal topographic connectivity: they can be explained as a function of spatially specific activation patterns on both the retinotopic map and the somatosensory homunculus, indicating that somatosensory cortex participates in naturalistic vision. These novel multimodal tuning profiles are in line with known visual category selectivity, for example for faces and manipulable objects. Our findings reveal multisensory tuning that is far more extensive in scale and finer in granularity than previously assumed. Inspecting the topographic tuning of these responses in S1, we find that a full band of extrastriate visual cortex, extending from retrosplenial cortex laterally to the fusiform gyrus, is tiled with somatosensory homunculi. These results demonstrate the intimate integration of information about visual coordinates and body parts in the brain, which likely supports visually guided movements and our rich, embodied experience of the world.
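To make the idea of explaining one voxel's response with two connective fields more concrete, here is a minimal illustrative sketch. It is not the fitting procedure used in this work. It assumes a common connective field formulation in which each field is a Gaussian weighting of source-vertex time courses over cortical distance. It also assumes that the V1 and S1 predictors enter a single linear model, fitted by exhaustive grid search with ordinary least squares. All function names (`cf_predictor`, `candidate_predictors`, `fit_dual_cf`) are hypothetical. The toy 1-D "maps" with distance |i - j| stand in for geodesic distances on the real V1 and S1 surfaces.

```python
import numpy as np


def cf_predictor(source, dist_row, sigma):
    """Gaussian-weighted sum of source-map time courses (one connective field).

    source:   (n_vertices, T) time courses on the source map (hypothetically V1 or S1)
    dist_row: (n_vertices,) distance from the candidate CF centre to each source vertex
    sigma:    CF size, in the same units as dist_row
    """
    weights = np.exp(-dist_row**2 / (2.0 * sigma**2))
    return weights @ source  # shape (T,)


def candidate_predictors(source, dist, sigmas):
    """Every (centre, sigma) candidate on one source map, paired with its predictor."""
    return [((c, s), cf_predictor(source, dist[c], s))
            for c in range(dist.shape[0]) for s in sigmas]


def fit_dual_cf(target, src_v1, dist_v1, src_s1, dist_s1, sigmas):
    """Grid search for one V1 CF plus one S1 CF, assuming the linear model
        target(t) ~ b0 + b_v1 * p_v1(t) + b_s1 * p_s1(t).
    Returns (R^2, (centre_v1, sigma_v1), (centre_s1, sigma_s1), betas).
    """
    n_t = target.shape[0]
    ss_tot = np.sum((target - target.mean()) ** 2)
    cands_v1 = candidate_predictors(src_v1, dist_v1, sigmas)
    cands_s1 = candidate_predictors(src_s1, dist_s1, sigmas)
    best = None
    for params_v1, x_v1 in cands_v1:
        for params_s1, x_s1 in cands_s1:
            # Design matrix: intercept, V1 CF predictor, S1 CF predictor.
            design = np.column_stack([np.ones(n_t), x_v1, x_s1])
            betas, *_ = np.linalg.lstsq(design, target, rcond=None)
            resid = target - design @ betas
            r2 = 1.0 - (resid @ resid) / ss_tot
            if best is None or r2 > best[0]:
                best = (r2, params_v1, params_s1, betas)
    return best


# Toy example: two 1-D "maps" of 20 vertices each, with distance |i - j|.
rng = np.random.default_rng(0)
n_vert, n_t = 20, 200
idx = np.arange(n_vert)
dist = np.abs(idx[:, None] - idx[None, :]).astype(float)
src_v1 = rng.standard_normal((n_vert, n_t))
src_s1 = rng.standard_normal((n_vert, n_t))

# Simulated ground truth: V1 CF at vertex 5 (sigma 2), S1 CF at vertex 12 (sigma 1).
target = (2.0 * cf_predictor(src_v1, dist[5], 2.0)
          + 1.0 * cf_predictor(src_s1, dist[12], 1.0)
          + 0.01 * rng.standard_normal(n_t))

result = fit_dual_cf(target, src_v1, dist, src_s1, dist, sigmas=[1.0, 2.0, 4.0])
r2, (c_v1, s_v1), (c_s1, s_s1), betas = result
print(f"V1 CF: centre={c_v1}, sigma={s_v1}; "
      f"S1 CF: centre={c_s1}, sigma={s_s1}; R^2={r2:.4f}")
```

The joint search costs the product of the two candidate sets. A real analysis over full cortical surfaces would therefore need a coarser grid, a staged search, or a different estimator altogether.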
Finally, we present initial data from a new, densely sampled 7T fMRI movie-watching dataset optimised to shed light on the brain basis of human action understanding.