Looking through the mind's eye via multimodal encoder-decoder networks

27 September 2024

Arman Afrasiyabi

E. L. Busch

Rahul Singh

Dhananjay Bhaskar

Laurent Caplette

Nicholas Turk-Browne

Smita Krishnaswamy


Main:6 Pages

5 Figures

Bibliography:2 Pages

Abstract

In this work, we explore the decoding of mental imagery from subjects using their fMRI measurements. To achieve this decoding, we first created a mapping between a subject's fMRI signals and the videos that elicited them. This mapping associates high-dimensional fMRI activation states with visual imagery. Next, we prompted the subjects textually, primarily with emotion labels that had no direct reference to visual objects. Then, to decode the visual imagery that may have been in a person's mind's eye, we align a latent representation of these text-prompted fMRI measurements with the corresponding video-fMRI representation, based on textual labels given to the videos themselves. This alignment overlaps the video-fMRI embedding with the text-prompted fMRI embedding, allowing us to use our fMRI-to-video mapping for decoding. Additionally, we enhance an existing fMRI dataset, initially consisting of data from five subjects, by including recordings from three more subjects gathered by our team. We demonstrate the efficacy of our model on this augmented dataset, both in accurately creating a mapping and in plausibly decoding mental imagery.

