Instructor: Dan Kersten (kersten@umn.edu)
Office: S212 Elliott
For the third time in the past sixty years, there has been a surge of interest in the science and application of artificial neural networks. Spurred in part by the availability of large datasets for machine learning and advances in computer hardware, the field has seen unprecedented successes in pattern recognition, including algorithms that approach and on occasion surpass human performance. This seminar will review the history of neural network theory, and then focus on recent technical and theoretical advances. We will look at successes and failures in explaining aspects of human visual recognition and the functional architecture of the primate visual system. We will also compare discriminative vs. generative models of human visual perception and cognition. The class format will consist of short lectures to provide overviews of upcoming themes together with discussions of journal articles led by seminar participants.
Meeting time: Tuesdays, starting Jan 17th, 3:00 pm.
Place: Elliott Hall S204
Week | Topics | Background material | Discussion papers
1: Jan 17 | Introduction | Lecture 1 slides | |
2: Jan 24 | Object recognition | Edelman, S. (1997). Computational theories of object recognition. Trends in Cognitive Sciences, 1(8), 296–304. http://doi.org/10.1016/S1364-6613(97)01090-5 DiCarlo, J. J., Zoccolan, D., & Rust, N. C. (2012). How does the brain solve visual object recognition? Neuron, 73(3), 415–434. | |
3: Jan 31 | Classic neural network feedforward models of object recognition | Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025. Serre, T., Oliva, A., & Poggio, T. (2007). A Feedforward Architecture Accounts for Rapid Categorization. Proceedings of the National Academy of Sciences, 104(15), 6424–6429. Hung, C. P., Kreiman, G., Poggio, T., & DiCarlo, J. J. (2005). Fast Readout of Object Identity from Macaque Inferior Temporal Cortex. Science, 310(5749), 863–866. | Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., & Poggio, T. (2007). Robust object recognition with cortex-like mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 411–426. Wu, C.-T., Crouzet, S. M., Thorpe, S. J., & Fabre-Thorpe, M. (2015). At 120 msec You Can Spot the Animal but You Don't Yet Know It's a Dog. Journal of Cognitive Neuroscience, 27(1), 141–149. http://doi.org/10.1162/jocn_a_00701 |
4: Feb 7 | Supervised learning & deep CNNs for recognition | Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536. Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366. Ullman, S., Vidal-Naquet, M., & Sali, E. (2002). Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 5(7), 682–687. http://doi.org/10.1038/nn870 Hegdé, J., Bart, E., & Kersten, D. (2008). Fragment-based learning of visual object categories. Current Biology, 18(8), 597–601. http://doi.org/10.1016/j.cub.2008.03.058 Toshev, A., & Szegedy, C. (2014). DeepPose: Human Pose Estimation via Deep Neural Networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1653–1660. | Epshtein, B., Lifshitz, I., & Ullman, S. (2008). Image interpretation by a single bottom-up top-down cycle. Proceedings of the National Academy of Sciences, 105(38), 14298–14303. http://doi.org/10.1073/pnas.0800968105 Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105. |
5: Feb 14 | Testing deep models of recognition | Hong, H., Yamins, D. L. K., Majaj, N. J., & DiCarlo, J. J. (2016). Explicit information for category-orthogonal object properties increases along the ventral stream. Nature Neuroscience, 1–18. http://doi.org/10.1038/nn.4247 Eickenberg, M., Gramfort, A., Varoquaux, G., & Thirion, B. (2016). Seeing it all: Convolutional network layers map the function of the human visual system. NeuroImage, 1–39. http://doi.org/10.1016/j.neuroimage.2016.10.001 Khaligh-Razavi, S.-M., & Kriegeskorte, N. (2014). Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation. PLoS Computational Biology, 10(11), e1003915. http://doi.org/10.1371/journal.pcbi.1003915 Yamins, D. L. K., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc Natl Acad Sci U S A, 111(23), 8619–8624. http://doi.org/10.1073/pnas.1403112111 | |
6: Feb 21 | Testing deep models of recognition | Ullman, S., Assif, L., Fetaya, E., & Harari, D. (2016). Atoms of recognition in human and computer vision. Proc Natl Acad Sci U S A, 113(10), 2744–2749. http://doi.org/10.1073/pnas.1513198113 Linsley, D., Eberhardt, S., Sharma, T., Gupta, P., & Serre, T. (2017). Clicktionary: A Web-based Game for Exploring the Atoms of Object Recognition. arXiv preprint. Retrieved from http://arxiv.org/abs/1701.02704 Eberhardt, S., Cader, J. G., & Serre, T. (2016). How Deep is the Feature Analysis underlying Rapid Visual Categorization? Advances in Neural Information Processing Systems, 1100–1108. | |
7: Feb 28 | Learning intermediate-level features: intrinsic images, textures, segmentation | Eigen, D., & Fergus, R. (2015). Predicting Depth, Surface Normals and Semantic Labels With a Common Multi-Scale Convolutional Architecture, 2650–2658. Rock, J., Issaranon, T., Deshpande, A., & Forsyth, D. (2016, December 5). Authoring image decompositions with generative models. Rematas, K. (2015). Deep Reflectance Maps. Gatys, L. A., Ecker, A. S., & Bethge, M. (2015, May 27). Texture synthesis and the controlled generation of natural stimuli using convolutional neural networks. Wang, P., & Yuille, A. L. (2015). Joint Object and Part Segmentation using Deep Learned Potentials, 1–9. Wang, P., & Yuille, A. (2016). DOC: Deep Occlusion Estimation from a Single Image. In B. Leibe, J. Matas, N. Sebe, & M. Welling (Eds.), Computer Vision – ECCV 2016 (Vol. 9905, pp. 545–561). Cham: Springer International Publishing. | |
8: Mar 7 | Unsupervised learning | Anselmi, F., Leibo, J. Z., Rosasco, L., Mutch, J., Tacchetti, A., & Poggio, T. (2014). Unsupervised learning of invariant representations with low sample complexity: the magic of sensory cortex or a new framework for machine learning? (CBMM Memo No. 001) (pp. 1–23). Center for Brains, Minds & Machines. | |
Spring Break | | | |
9: Mar 21 | Theoretical interpretations | Anselmi, F., Leibo, J. Z., Rosasco, L., Mutch, J., Tacchetti, A., & Poggio, T. (2013). Magic Materials: a theory of deep hierarchical architectures for learning sensory representations (pp. 1–187). Istituto Italiano di Tecnologia, Genova, Italy. Poggio, T., Mhaskar, H., Rosasco, L., Miranda, B., & Liao, Q. (2016, November 2). Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review. Lin, H. W., & Tegmark, M. (2016). Why does deep and cheap learning work so well? arXiv.org. | |
10: Mar 28 | Visualization & equivalence classes in deep networks | Karpathy, A., Johnson, J., & Fei-Fei, L. (2015, June 5). Visualizing and Understanding Recurrent Networks. Nguyen, A. M., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: high confidence predictions for unrecognizable images (pp. 1–10). | |
11: Apr 4 | Adversarial learning | Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv.org. | |
12: Apr 11 | Generative models | Salakhutdinov, R., Tenenbaum, J. B., & Torralba, A. (2013). Learning with Hierarchical-Deep Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1958–1971. http://doi.org/10.1109/TPAMI.2012.269 Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332–1338. http://doi.org/10.1126/science.aab3050 Xie, J., Lu, Y., Zhu, S. C., & Wu, Y. N. (2016). A theory of generative convnet. arXiv.org. | |
13: Apr 18 | Testing models of feedback | Fan, X., Wang, L., Shao, H., Kersten, D., & He, S. (2016). Temporally flexible feedback signal to foveal cortex for peripheral object recognition. Proc Natl Acad Sci U S A, 113(41), 11627–11632. http://doi.org/10.1073/pnas.1606137113 | |
14: Apr 25 | Recurrent NNs | de Freitas, N. (2016). Learning to Learn and Compositionality with Deep Recurrent Neural Networks (pp. 3–3). Presented at the 22nd ACM SIGKDD International Conference, New York, NY, USA: ACM Press. | |
15: May 2 | Reinforcement learning | Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. http://doi.org/10.1038/nature14236 | |
Ventral stream & cortical anatomy
Kravitz, D. J., Saleem, K. S., Baker, C. I., Ungerleider, L. G., & Mishkin, M. (2013). The ventral visual pathway: an expanded neural framework for the processing of object quality. Trends in Cognitive Sciences, 17(1), 26–49.
Markov, N. T., Vezoli, J., Chameau, P., Falchier, A., Quilodran, R., Huissoud, C., et al. (2013). Anatomy of hierarchy: Feedforward and feedback pathways in macaque visual cortex. J Comp Neurol, 522(1), 225–259.
Anselmi, F., Leibo, J. Z., Rosasco, L., Mutch, J., Tacchetti, A., & Poggio, T. (2013). Magic Materials: a theory of deep hierarchical architectures for learning sensory representations (pp. 1–187). CBCL, McGovern Institute, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA, and Istituto Italiano di Tecnologia, Genova, Italy.
Axelrod, V., & Yovel, G. (2012). Hierarchical Processing of Face Viewpoint in Human Visual Cortex. Journal of Neuroscience, 32(7), 2442–2452. http://doi.org/10.1523/JNEUROSCI.4770-11.2012
Chen, L. C., Schwing, A. G., Yuille, A. L., & Urtasun, R. (2015). Learning deep structured models. Proc ICML.
de Freitas, N. (2016). Learning to Learn and Compositionality with Deep Recurrent Neural Networks (pp. 3–3). Presented at the 22nd ACM SIGKDD International Conference, New York, NY, USA: ACM Press. http://doi.org/10.1145/2939672.2945358
Eickenberg, M., Gramfort, A., Varoquaux, G., & Thirion, B. (2016). Seeing it all: Convolutional network layers map the function of the human visual system. NeuroImage, 1–39. http://doi.org/10.1016/j.neuroimage.2016.10.001
Eigen, D., Puhrsch, C., & Fergus, R. (2014). Depth map prediction from a single image using a multi-scale deep network. Advances in Neural Information ….
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015, May 27). Texture synthesis and the controlled generation of natural stimuli using convolutional neural networks.
George, D., & Hawkins, J. (2005). A hierarchical Bayesian model of invariant pattern recognition in the visual cortex (Vol. 3, pp. 1812–1817). Presented at the 2005 IEEE International Joint Conference on Neural Networks. http://doi.org/10.1109/IJCNN.2005.1556155
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 1026–1034.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv Preprint arXiv:1207.0580.
Hong, H., Yamins, D. L. K., Majaj, N. J., & DiCarlo, J. J. (2016). Explicit information for category-orthogonal object properties increases along the ventral stream. Nature Neuroscience, 1–18. http://doi.org/10.1038/nn.4247
Huang, X., Shen, C., Boix, X., & Zhao, Q. (2015). SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks, 262–270.
Insafutdinov, E., Pishchulin, L., Andres, B., Andriluka, M., & Schiele, B. (2016, May 10). DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model.
Jiang, M., Boix, X., Roig, G., Xu, J., Van Gool, L., & Zhao, Q. (2016). Learning to Predict Sequences of Human Visual Fixations. IEEE Transactions on Neural Networks and Learning Systems, 27(6), 1241–1252. http://doi.org/10.1109/TNNLS.2015.2496306
Karpathy, A. (2015). The unreasonable effectiveness of recurrent neural networks. Blog post.
Kavukcuoglu, K., Sermanet, P., Boureau, Y.-L., Gregor, K., Mathieu, M., & LeCun, Y. (2010). Learning Convolutional Feature Hierarchies for Visual Recognition. Advances in Neural Information Processing Systems, 23.
Khaligh-Razavi, S.-M., & Kriegeskorte, N. (2014). Deep Supervised, but Not Unsupervised, Models May Explain IT Cortical Representation. PLoS Computational Biology, 10(11), e1003915. http://doi.org/10.1371/journal.pcbi.1003915
Kok, P., Bains, L. J., van Mourik, T., Norris, D. G., & de Lange, F. P. (2016). Selective Activation of the Deep Layers of the Human Primary Visual Cortex by Top-Down Feedback. Current Biology, 26(3), 371–376. http://doi.org/10.1016/j.cub.2015.12.038
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
Kulkarni, T. D., Tenenbaum, J. B., Mansinghka, V. K., & Kohli, P. (2015). Picture: A Probabilistic Programming Language for Scene Perception. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332–1338. http://doi.org/10.1126/science.aab3050
Le, Q. V., Monga, R., Devin, M., Corrado, G., & Chen, K. (n.d.). Building high-level features using large scale unsupervised learning. arXiv preprint.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444. http://doi.org/10.1038/nature14539
Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. (2011). Unsupervised learning of hierarchical representations with convolutional deep belief networks. Communications of the ACM, 54(10), 95–103. http://doi.org/10.1145/2001269.2001295
Mahendran, A., & Vedaldi, A. (2015). Understanding deep image representations by inverting them. 2015 IEEE Conference on Computer ….
Maier, A. (2010). Distinct superficial and deep laminar domains of activity in the visual cortex during rest and stimulation. Frontiers in Systems Neuroscience, 4, 31. http://doi.org/10.3389/fnsys.2010.00031
Marblestone, A. H., Wayne, G., & Körding, K. P. (2016). Toward an Integration of Deep Learning and Neuroscience. Frontiers in Computational Neuroscience, 10, 94. http://doi.org/10.3389/fncom.2016.00094
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. http://doi.org/10.1038/nature14236
Nguyen, A. M., Yosinski, J., & Clune, J. (2015). Deep neural networks are easily fooled: high confidence predictions for unrecognizable images (pp. 1–10).
Osindero, S., & Hinton, G. E. (2008). Modeling image patches with a directed hierarchy of Markov random fields. Advances in Neural Information Processing Systems, 20, 1121–1128.
Patel, A. B., Nguyen, T., & Baraniuk, R. G. (2015, April 2). A Probabilistic Theory of Deep Learning.
Rematas, K. (2015). Deep Reflectance Maps.
Salakhutdinov, R., Tenenbaum, J. B., & Torralba, A. (2013). Learning with Hierarchical-Deep Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1958–1971. http://doi.org/10.1109/TPAMI.2012.269
Shen, C., Song, M., & Zhao, Q. (2012). Learning high-level concepts by training a deep network on eye fixations. Deep Learning and Unsupervised ….
Son, H., & Lee, S. (n.d.). Intrinsic Image Decomposition using Deep Convolutional Network. Sunw.Csail.Mit.Edu
Tang, Y., Salakhutdinov, R., & Hinton, G. (2012). Deep Lambertian Networks. arXiv Preprint arXiv:1206.6445.
Vitelli, M. (n.d.). Intrinsic Image Decomposition Using Deep Convolutional Networks. Stanford.Edu
Wang, P., & Yuille, A. L. (2015). Joint Object and Part Segmentation using Deep Learned Potentials, 1–9.
Xu, J., Jiang, M., Wang, S., Kankanhalli, M. S., & Zhao, Q. (2014). Predicting human gaze beyond pixels. Journal of Vision, 14(1), 28–28. http://doi.org/10.1167/14.1.28
Yamins, D. L. K., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3), 356–365. http://doi.org/10.1038/nn.4244
Yamins, D. L. K., Hong, H., Cadieu, C. F., Solomon, E. A., Seibert, D., & DiCarlo, J. J. (2014). Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proc Natl Acad Sci U S A, 111(23), 8619–8624. http://doi.org/10.1073/pnas.1403112111
Yamins, D. L., Hong, H., Cadieu, C., & DiCarlo, J. J. (2013). Hierarchical Modular Optimization of Convolutional Networks Achieves Representations Similar to Macaque IT and Human Ventral Stream, 3093–3101.
Yildirim, I., Kulkarni, T., & Freiwald, W. (2015). Explaining monkey face patch system as deep inverse graphics.
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? Advances in Neural Information Processing Systems, 27, 3320–3328.
Zeiler, M. D., Krishnan, D., Taylor, G. W., & Fergus, R. (2010). Deconvolutional networks, 2528–2535. http://doi.org/10.1109/CVPR.2010.5539957
Zeiler, M. D., Taylor, G. W., & Fergus, R. (2011). Adaptive deconvolutional networks for mid and high level feature learning. Computer Vision (ICCV), 2011 IEEE International Conference on, 2018–2025.
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., & Torralba, A. (2014a). Object detectors emerge in deep scene CNNs, 1–9. Retrieved from http://arxiv.org/pdf/1412.6856.pdf
Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., & Oliva, A. (2014b). Learning Deep Features for Scene Recognition using Places Database, 487–495.