One of the main reasons I started to work on Deep Learning was to explain how humans see and understand their environment.

This, to begin with, does not have much to do with Deep Learning. What is the story?

Well, since 1997 I have studied the neuroscience of how we see: how our eyes work, how our visual cortex forms a visual hierarchy, neurons, brains, etc. There is too much literature to list here. And this story is here.

My original inspiration was the work of Thomas Serre under the supervision of Tomaso Poggio at MIT.

Recently, this work from Thomas Serre's group shows that we may not need very deep neural networks to perform feed-forward classification. What may be needed instead is a shallower neural net with recurrent processing, similar to what is proposed by Poggio's group at MIT.

In other words, to get better categorization we need more time rather than more layers: multiple recurrent (feedback) passes.
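To make the idea of "more time instead of more layers" concrete, here is a minimal sketch in plain Python. It is not the model from Serre's or Poggio's group. It is a toy with a single hidden layer whose output is fed back into itself for a chosen number of passes. The function names, layer sizes, and random weights are all assumptions made for illustration, and the weights stand in for parameters that would normally be learned.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.5):
    """Small random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    """Turn raw scores into class probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def recurrent_classify(image, w_in, w_rec, w_out, passes):
    """Run one shallow hidden layer for `passes` time steps.

    The same weights are reused at every pass, so extra passes add
    processing time but no new layers or parameters.
    """
    feedforward = matvec(w_in, image)             # bottom-up drive, computed once
    hidden = [math.tanh(f) for f in feedforward]  # pass 1: pure feed-forward
    for _ in range(passes - 1):
        feedback = matvec(w_rec, hidden)          # recurrent (feedback) input
        hidden = [math.tanh(f + r) for f, r in zip(feedforward, feedback)]
    return softmax(matvec(w_out, hidden))


# Hypothetical sizes: a 16-pixel "image", 8 hidden units, 3 categories.
rng = random.Random(0)
n_pixels, n_hidden, n_classes = 16, 8, 3
w_in = random_matrix(n_hidden, n_pixels, rng)
w_rec = random_matrix(n_hidden, n_hidden, rng)
w_out = random_matrix(n_classes, n_hidden, rng)
image = [rng.random() for _ in range(n_pixels)]

fast = recurrent_classify(image, w_in, w_rec, w_out, passes=1)  # rapid glance
slow = recurrent_classify(image, w_in, w_rec, w_out, passes=5)  # taking our time
```

With `passes=1` the function is an ordinary feed-forward classifier. Raising `passes` lets the hidden layer refine its own state using feedback, while the number of layers and weights stays fixed.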

This is in line with research on attentional mechanisms and visual search, which shows that humans use multiple fixations on an image to gather context and integrate information from different areas of the image.

Rapid categorization is useful for running away when in danger, but in most typical situations we take our time to look.