Right. There was some fairly recent research out of UC Berkeley (I think) using fMRI(?) with visual input, which was able to capture roughly what the subject was seeing. The setup was basically a collection of YouTube videos as training data, and then new videos as the test input. The results were certainly not random, but not great either: essentially compilations of the most similar previously seen video clips, stitched together in groups of a few pixels or frames. A rough sketch of that kind of similarity-based reconstruction is below.
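Just to make the idea concrete, here is a minimal sketch of nearest-neighbor-style reconstruction: match a measured response against a library of clip responses and average the frames of the best matches. All names, shapes, and the distance metric are my own illustrative assumptions, not details from the actual study.

    import numpy as np

    def reconstruct_frame(measured_response, library_responses, library_frames, k=10):
        # Hypothetical sketch: find the k library clips whose responses are
        # closest to the measured response, then average their frames.
        dists = np.linalg.norm(library_responses - measured_response, axis=1)
        nearest = np.argsort(dists)[:k]              # indices of the k best matches
        return library_frames[nearest].mean(axis=0)  # pixel-wise average (blurry "compilation")

    # Toy usage with random data standing in for fMRI responses and video frames
    rng = np.random.default_rng(0)
    library_responses = rng.normal(size=(1000, 50))   # 1000 clips x 50 "voxels"
    library_frames = rng.random(size=(1000, 64, 64))  # one grayscale frame per clip
    measured = rng.normal(size=50)
    blurry_guess = reconstruct_frame(measured, library_responses, library_frames)

The averaging is why such reconstructions look like blurry composites of the training clips rather than anything genuinely new.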
While that seems impressive ("Can we see what people are dreaming?", etc.), it was probably just reading off the input level, and here it's probably the same: reading off the input level of text rather than some deeper linguistic representation. On the other hand, as far as I understand it, what was decoded here was features (letters) rather than raw images (e.g., a page), so that is a little deeper.