Topics in Computational Vision:

Deep Learning and Human Vision

University of Minnesota, Spring Semester, 2017

Psy 8036 (58390)

Instructor: Dan Kersten
Office: S212 Elliott

For the third time in the past sixty years, there has been a surge of interest in the science and application of artificial neural networks. Spurred in part by the availability of large datasets for machine learning and advances in computer hardware, the field has seen unprecedented successes in pattern recognition, including algorithms that approach and on occasion surpass human performance. This seminar will review the history of neural network theory, and then focus on recent technical and theoretical advances. We will look at successes and failures in explaining aspects of human visual recognition and the functional architecture of the primate visual system. We will also compare discriminative vs. generative models of human visual perception and cognition. The class format will consist of short lectures to provide overviews of upcoming themes together with discussions of journal articles led by seminar participants.

Meeting time: Tuesdays, 3:00 pm, beginning Jan 17.
Place: Elliott Hall S204

Schedule and Readings

Background material (lecture slides) and discussion papers are listed for each session.
1: Jan 17


Lecture 1 slides  
2: Jan 24

Object recognition

Lecture 2 slides

Edelman, S. (1997). Computational theories of object recognition. Trends in Cognitive Sciences, 1(8), 296–304.

DiCarlo, J. J., Zoccolan, D., & Rust, N. C. (2012). How does the brain solve visual object recognition? Neuron, 73(3), 415–434.

3: Jan 31 Classic neural network feedforward models of object recognition

Lecture 3 slides

Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025.

Serre, T., Oliva, A., & Poggio, T. (2007). A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences, 104(15), 6424–6429.

Hung, C. P., Kreiman, G., Poggio, T., & DiCarlo, J. J. (2005). Fast Readout of Object Identity from Macaque Inferior Temporal Cortex. Science, 310(5749), 863–866.

Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., & Poggio, T. (2007). Robust object recognition with cortex-like mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 411–426.

Wu, C.-T., Crouzet, S. M., Thorpe, S. J., & Fabre-Thorpe, M. (2015). At 120 msec You Can Spot the Animal but You Don't Yet Know It's a Dog. Journal of Cognitive Neuroscience, 27(1), 141–149.

4: Feb 7 Supervised learning & deep CNNs for recognition

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.

Ullman, S., Vidal-Naquet, M., & Sali, E. (2002). Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 5(7), 682–687.

Hegdé, J., Bart, E., & Kersten, D. (2008). Fragment-based learning of visual object categories. Current Biology, 18(8), 597–601.

Toshev, A., & Szegedy, C. (2014). DeepPose: Human Pose Estimation via Deep Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1653–1660.

Epshtein, B., Lifshitz, I., & Ullman, S. (2008). Image interpretation by a single bottom-up top-down cycle. Proceedings of the National Academy of Sciences, 105(38), 14298–14303.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 1097–1105.
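The back-propagation rule from the Rumelhart, Hinton, and Williams reading above fits in a few lines of NumPy. The sketch below is a minimal illustration only (a two-layer sigmoid network learning XOR, with my own variable names and hyperparameters), not the implementation from any of the listed papers.

```python
import numpy as np

# Minimal two-layer network trained by back-propagation on XOR.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0, 1, (2, 8))   # input -> hidden weights
b1 = np.zeros(8)
W2 = rng.normal(0, 1, (8, 1))   # hidden -> output weights
b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(2000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)            # hidden activations
    out = sigmoid(h @ W2 + b2)          # network output
    # Backward pass: propagate the output error down through the layers
    d_out = out - y                     # sigmoid + cross-entropy error signal
    d_h = (d_out @ W2.T) * h * (1 - h)  # chain rule through the hidden layer
    # Gradient-descent weight updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))  # should approach [0, 1, 1, 0]
```

The hidden-layer error term `d_h` is the step the 1986 paper added: errors at the output are propagated backward through the weights to assign credit to hidden units.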

5: Feb 14 Testing deep models of recognition

Hong, H., Yamins, D. L. K., Majaj, N. J., & DiCarlo, J. J. (2016). Explicit information for category-orthogonal object properties increases along the ventral stream. Nature Neuroscience, 1–18.

Eickenberg, M., Gramfort, A., Varoquaux, G., & Thirion, B. (2016). Seeing it all: Convolutional network layers map the function of the human visual system. NeuroImage, 1–39.

Khaligh-Razavi, S.-M., & Kriegeskorte, N. (2014). Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation. PLoS Computational Biology, 10(11), e1003915.

Yamins, D. L. K., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc Natl Acad Sci U S A, 111(23), 8619–8624.

6: Feb 21 Testing deep models of recognition

Ullman, S., Assif, L., Fetaya, E., & Harari, D. (2016). Atoms of recognition in human and computer vision. Proc Natl Acad Sci U S A, 113(10), 2744–2749.

Linsley, D., Eberhardt, S., Sharma, T., Gupta, P., & Serre, T. (2017). Clicktionary: A Web-based Game for Exploring the Atoms of Object Recognition. Computer Vision and Pattern Recognition.

Eberhardt, S., Cader, J. G., & Serre, T. (2016). How Deep is the Feature Analysis underlying Rapid Visual Categorization? Advances in Neural Information Processing Systems, 1100–1108.

7: Feb 28

Learning intermediate-level features: intrinsic images, textures, segmentation

Eigen, D., & Fergus, R. (2015). Predicting Depth, Surface Normals and Semantic Labels With a Common Multi-Scale Convolutional Architecture. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2650–2658.

Rock, J., Issaranon, T., Deshpande, A., & Forsyth, D. (2016, December 5). Authoring image decompositions with generative models.

Rematas, K. (2015). Deep Reflectance Maps.

Gatys, L. A., Ecker, A. S., & Bethge, M. (2015, May 27). Texture synthesis and the controlled generation of natural stimuli using convolutional neural networks.

Wang, P., & Yuille, A. L. (2015). Joint Object and Part Segmentation using Deep Learned Potentials. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 1–9.

Wang, P., & Yuille, A. (2016). DOC: Deep Occlusion Estimation from a Single Image. In B. Leibe, J. Matas, N. Sebe, & M. Welling (Eds.), Computer Vision – ECCV 2016 (Vol. 9905, pp. 545–561). Cham: Springer International Publishing.


8: Mar 7

Unsupervised learning

Anselmi, F., Leibo, J. Z., Rosasco, L., Mutch, J., Tacchetti, A., & Poggio, T. (2014). Unsupervised learning of invariant representations with low sample complexity: the magic of sensory cortex or a new framework for machine learning? (CBMM Memo No. 001) (pp. 1–23). Center for Brains, Minds & Machines.
Spring Break
9: Mar 21 Theoretical interpretations

Anselmi, F., Leibo, J. Z., Rosasco, L., Mutch, J., Tacchetti, A., & Poggio, T. (2013). Magic Materials: a theory of deep hierarchical architectures for learning sensory representations (pp. 1–187). Istituto Italiano di Tecnologia, Genova, Italy.

Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., & Liao, Q. (2016, November 2). Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review.

Lin, H. W., & Tegmark, M. (2016). Why does deep and cheap learning work so well?

10: Mar 28 Visualization & equivalence classes in deep networks

Karpathy, A., Johnson, J., & Fei-Fei, L. (2015, June 5). Visualizing and Understanding Recurrent Networks.

Nguyen, A. M., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: high confidence predictions for unrecognizable images (pp. 1–10).

11: Apr 4
Adversarial learning

Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples.
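The fast gradient sign method in Goodfellow et al. can be illustrated without a deep network at all. The sketch below applies it to a fixed logistic-regression "model" (weights, input, and epsilon are arbitrary values chosen here for illustration): nudging the input by epsilon times the sign of the input-gradient of the loss raises the loss.

```python
import numpy as np

# Fast gradient sign method (FGSM) on a fixed logistic-regression model.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.5, -2.0, 0.5])   # arbitrary fixed model weights (illustration)
x = np.array([0.2, 0.4, -0.1])   # a clean input; its true label is y = 1
y = 1.0

def loss(x):
    p = sigmoid(w @ x)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))  # cross-entropy

# Gradient of the loss w.r.t. the *input* (not the weights):
# for logistic regression, d loss / d x = (p - y) * w.
p = sigmoid(w @ x)
grad_x = (p - y) * w

eps = 0.1
x_adv = x + eps * np.sign(grad_x)    # FGSM perturbation

print(loss(x), loss(x_adv))          # the adversarial loss is larger
```

The point of the paper is that in high-dimensional input spaces this tiny, imperceptible perturbation is enough to flip a classifier's decision.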
12: Apr 11
Generative models

Salakhutdinov, R., Tenenbaum, J. B., & Torralba, A. (2013). Learning with Hierarchical-Deep Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1958–1971.

Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332–1338.

Xie, J., Lu, Y., Zhu, S. C., & Wu, Y. N. (2016). A Theory of Generative ConvNet.


13: Apr 18
Testing models of feedback

Fan, X., Wang, L., Shao, H., Kersten, D., & He, S. (2016). Temporally flexible feedback signal to foveal cortex for peripheral object recognition. Proc Natl Acad Sci U S A, 201606137.
14: Apr 25
Recurrent NNs
de Freitas, N. (2016). Learning to Learn and Compositionality with Deep Recurrent Neural Networks (pp. 3–3). Presented at the 22nd ACM SIGKDD International Conference, New York, New York, USA: ACM Press.
15: May 2
Reinforcement learning
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.  
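The deep Q-network of Mnih et al. couples a convolutional network to the classical Q-learning update; the update rule itself is small. The sketch below is a tabular toy (a five-state chain world with my own made-up parameters), not the Atari system from the paper.

```python
import numpy as np

# Tabular Q-learning on a 5-state chain: move left/right, reward 1 at the right end.
n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate
rng = np.random.default_rng(0)

for episode in range(200):
    s = 0
    while s != n_states - 1:                     # rightmost state is terminal
        # epsilon-greedy action selection
        a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == n_states - 1 else 0.0
        # Q-learning update: bootstrap from the best next-state value
        target = r + (0.0 if s2 == n_states - 1 else gamma * Q[s2].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

print(Q.argmax(axis=1))   # learned policy should move right in every non-terminal state
```

The DQN contribution was making this update stable when Q is a deep network rather than a table, via experience replay and a slowly updated target network.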

Ventral stream & cortical anatomy



Kravitz, D. J., Saleem, K. S., Baker, C. I., Ungerleider, L. G., & Mishkin, M. (2013). The ventral visual pathway: an expanded neural framework for the processing of object quality. Trends in Cognitive Sciences, 17(1), 26–49.

Markov, N. T., Vezoli, J., Chameau, P., Falchier, A., Quilodran, R., Huissoud, C., et al. (2013). Anatomy of hierarchy: Feedforward and feedback pathways in macaque visual cortex. J Comp Neurol, 522(1), 225–259.



