Topics in Computational Vision:

Person Perception

University of Minnesota, Spring Semester, 2016

Psy 8036 (58390)

Instructor: Dan Kersten

While computer vision has made substantial progress in the development of algorithms for limited visual tasks, achieving human-like visual capabilities remains a stiff challenge. And while there has also been substantial empirical progress in understanding human vision and its relation to brain activity, we do not yet understand the brain’s algorithms underlying image interpretation. This seminar will examine the proposal that human vision achieves its high degree of competence through built-in generative knowledge of how world structure causes images. Generative knowledge provides the basis for rapid learning from a relatively small number of examples, and the flexibility to interpret almost any image.

There may be no better example of built-in knowledge than our ability to recognize and interpret images of other people, including their facial expressions, body poses, actions, and intentions. The human visual system can handle an essentially unlimited range of poses, both static and changing over time, despite large uncertainty in the resulting local patterns of retinal intensities. Gunnar Johansson's classic "point light walker" movies demonstrate our extraordinary competency at interpreting human actions and interactions from locally ambiguous measurements.

This seminar will examine the role of generative models in person perception, addressing questions such as: How can information about faces and body form be represented as compositions of parts? Is there a visual grammar for poses and actions? How is local intensity information integrated to infer body pose, given enormous variability in appearance (e.g., clothing and occlusion by other people)? Is there task prioritization, where, for example, animacy is detected first? How is visual information about body pose represented in the brain? The class format will consist of short lectures that provide overviews of upcoming themes, together with discussion of journal articles led by seminar participants.

Meeting time: First meeting Tuesday, Jan 19th, 3:00 pm. Regular time to be decided.
Place: Elliott Hall S204

Schedule and Readings

Background material & sample readings Discussion papers

1. Introduction: The generative approach to integrating local cues with global form

Discriminative vs. generative models



2. Perception: faces & expressions: What have we learned? I


Webster, M. A., & MacLeod, D. I. A. (2011). Visual adaptation and face perception. Philosophical Transactions of the Royal Society B: Biological Sciences, 366(1571), 1702–1725.


Leopold, D. A., O'Toole, A. J., Vetter, T., & Blanz, V. (2001). Prototype-referenced shape encoding revealed by high-level aftereffects. Nature Neuroscience, 4(1), 89–94. doi:10.1038/82947 (pdf)

Fang, F., & He, S. (2005). Viewer-Centered Object Representation in the Human Visual System Revealed by Viewpoint Aftereffects. Neuron, 45(5), 793–800. (pdf)


3. Perception: faces & expressions: What have we learned? II




Cunningham, D. W., Kleiner, M., Bülthoff, H. H., & Wallraven, C. (2004). The components of conversational facial expressions. APGV '04: Proceedings of the 1st Symposium on Applied Perception in Graphics and Visualization, 143–150.


Curio, C., Bülthoff, H. H., Giese, M. A., & Poggio, T. A. (2010). Dynamic Faces: Insights from Experiments and Computation. Cambridge, MA, USA: MIT Press.

Yu, H., Garrod, O. G. B., & Schyns, P. G. (2012). Perception-driven facial expression synthesis. Computers & Graphics, 36(3), 152–162.



Xu, H., Dayan, P., Lipkin, R. M., & Qian, N. (2008). Adaptation across the cortical hierarchy: Low-level curve adaptation affects high-level facial-expression judgments. Journal of Neuroscience, 28(13), 3374–3383. (pdf)

Jack, R. E., Garrod, O. G. B., & Schyns, P. G. (2014). Dynamic Facial Expressions of Emotion Transmit an Evolving Hierarchy of Signals over Time. Current Biology, 24(2), 187–192. (pdf)


4. Perception: human pose & actions I


Blake, R., & Shiffrar, M. (2007). Perception of Human Motion. Annual Review of Psychology, 58(1), 47–73. (link)


Troje, N. F. (2002). Decomposing biological motion: A framework for analysis and synthesis of human gait patterns. Journal of Vision, 2(5). (link)

Mayer, K. M., Vuong, Q. C., & Thornton, I. M. (2015). Do People “Pop Out?” PLoS ONE, 10(10), e0139618. (link)

5. Perception: human pose & actions II

Kersten, D., Mamassian, P., & Yuille, A. (2004). Object perception as Bayesian inference. Annual Review of Psychology, 55, 271–304. (link)


Hahn, C. A., O'Toole, A. J., & Phillips, P. J. (2015). Dissecting the time course of person recognition in natural viewing environments. British Journal of Psychology. (link)

Clifford, C. W. G., Mareschal, I., Otsuka, Y., & Watson, T. L. (2015). A Bayesian approach to person perception. Consciousness and Cognition, 36, 406–413. (link)




6. Computation: face recognition

Static and dynamic generative models
The problems of skin, hair.

The Digital Emily project.

Alexander, O., Rogers, M., & Lambeth, W. (2009). Creating a photoreal digital actor: The digital emily project. (link)

Koenderink, J., & Pont, S. (2003). The secret of velvety skin. Machine Vision and Applications, 14(4), 260–268.

Jones, B. (2006). Approximating the Appearance of Human Skin in Computer Graphics.

Wang, N., & Ai, H. (2011). A compositional exemplar-based model for hair segmentation. Computer Vision–ACCV 2010.

Weyrich, T., Matusik, W., Pfister, H., Bickel, B., Donner, C., Tu, C., et al. (2006). Analysis of human faces using a measurement-based skin reflectance model. ACM Transactions on Graphics (TOG) (Vol. 25, pp. 1013–1024). ACM.

Dana, K. J., van Ginneken, B., Nayar, S. K., & Koenderink, J. J. (1999). Reflectance and texture of real-world surfaces. ACM Transactions on Graphics (TOG), 18(1), 1–34.


Ghosh, A., Hawkins, T., Peers, P., Frederiksen, S., & Debevec, P. (2008). Practical modeling and acquisition of layered facial reflectance. ACM Transactions on Graphics, 27(5), 1–10. (link)


Ward, K., Bertails, F., Kim, T.-Y., Marschner, S. R., Cani, M.-P., & Lin, M. C. (2007). A survey on hair modeling: styling, simulation, and rendering. IEEE Transactions on Visualization and Computer Graphics, 13(2), 213–234. (link)




7. Computation: human form, actions

The problems of real images: clutter, multiple people, clothing variation.

Andriluka, M., Roth, S., & Schiele, B. (2009). Pictorial Structures Revisited: People Detection and Articulated Pose Estimation (pp. 1–8). Presented at the Computer Vision and Pattern Recognition, 2009. (link)

Felzenszwalb, P. F., & Huttenlocher, D. P. (2005). Pictorial Structures for Object Recognition. International Journal of Computer Vision, 61(1), 55–79.

Bourdev, L., & Malik, J. (2009). Poselets: Body part detectors trained using 3D human pose annotations. 2009 IEEE 12th International Conference on Computer Vision (ICCV), 1365–1372.

Wang, H., Hecht, F., Ramamoorthi, R., & O'Brien, J. F. (2010). Example-based wrinkle synthesis for clothing animation. ACM Transactions on Graphics (TOG), 29(4), 107. doi:10.1145/1833349.1778844

Zhu, S., & Mok, P. Y. (2015). Predicting Realistic and Precise Human Body Models Under Clothing Based on Orthogonal-view Photos. Procedia Manufacturing, 3(C), 3812–3819.

Toshev, A., & Szegedy, C. (2014). DeepPose: Human Pose Estimation via Deep Neural Networks. CVPR 2014, 1653–1660. (link)


Chen, X., & Yuille, A. L. (2014). Articulated Pose Estimation with Image-Dependent Preference on Pairwise Relations. NIPS 2014. (link)



Spring Break


8. Compositional models: learning & inference



Jin, Y., & Geman, S. (2006). Context and hierarchy in a probabilistic image model. 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2, 2145–2152.

Zhu, L., Chen, Y., & Yuille, A. (2011). Recursive Compositional Models for Vision: Description and Review of Recent Work. Journal of Mathematical Imaging and Vision, 41(1-2), 122–146.

Bienenstock, E., & Geman, S. (1997). Compositionality, MDL priors, and object recognition. Advances in Neural Information Processing Systems, 838–844.

Doumas, L. A. A., & Hummel, J. E. (2010). A computational account of the development of the generalization of shape information. Cognitive Science, 34(4), 698–712. (link)

Tervo, D. G. R., Tenenbaum, J. B., & Gershman, S. J. (2016). Toward the neural implementation of structure learning. Current Opinion in Neurobiology, 37, 99–105.

Lake, B. M., Salakhutdinov, R., & Tenenbaum, J. B. (2015). Human-level concept learning through probabilistic program induction. Science, 350(6266), 1332–1338. (link)

Gershman, S. J., Tenenbaum, J. B., & Jäkel, F. (2015). Discovering hierarchical motion structure. Vision Research, 1–10.


9. Cortical responses: faces

Kanwisher, N., & Dilks, D. D. (2012). The Functional Organization of the Ventral Visual Pathway in Humans (pp. 1–25).

Tsao, D. Y., & Livingstone, M. S. (2008). Mechanisms of Face Perception. Annual Review of Neuroscience, 31(1), 411–437.

Ohayon, S., Freiwald, W. A., & Tsao, D. Y. (2012). What Makes a Cell Face Selective? The Importance of Contrast. Neuron, 74(3), 567–581.

Mende-Siedlecki, P., Verosky, S. C., Turk-Browne, N. B., & Todorov, A. (2013). Robust Selectivity for Faces in the Human Amygdala in the Absence of Expressions. Journal of Cognitive Neuroscience, 25(12), 2086–2106.

Furl, N., van Rijsbergen, N. J., Treves, A., Friston, K. J., & Dolan, R. J. (2007). Experience-Dependent Coding of Facial Expression in Superior Temporal Sulcus. Proceedings of the National Academy of Sciences of the United States of America, 104(33), 13485–13489.

Dubois, J., de Berker, A. O., & Tsao, D. Y. (2015). Single-Unit Recordings in the Macaque Face Patch System Reveal Limitations of fMRI MVPA. Journal of Neuroscience, 35(6), 2791–2802.


Meyers, E. M., Borzello, M., Freiwald, W. A., & Tsao, D. (2015). Intelligent Information Loss: The Coding of Facial Identity, Head Pose, and Non-Face Information in the Macaque Face Patch System. The Journal of Neuroscience, 35(18), 7069–7081.


10. Cortical responses: bodies I


Downing, P. E., & Peelen, M. V. (2011). The role of occipitotemporal body-selective regions in person perception. Cognitive Neuroscience, 2(3-4), 186–203.



Orlov, T., Makin, T. R., & Zohary, E. (2010). Topographic Representation of the Human Body in the Occipitotemporal Cortex. Neuron, 68(3), 586–600. (link)

van Koningsbruggen, M. G., Peelen, M. V., & Downing, P. E. (2013). A Causal Role for the Extrastriate Body Area in Detecting People in Real-World Scenes. Journal of Neuroscience, 33(16), 7003–7010. (link)


11. Cortical responses: bodies II

Jastorff, J., & Orban, G. A. (2009). Human functional magnetic resonance imaging reveals separation and integration of shape and motion cues in biological motion processing. Journal of Neuroscience, 29(22), 7315–7329.

Jastorff, J., Popivanov, I. D., Vogels, R., Vanduffel, W., & Orban, G. A. (2012). Integration of shape and motion cues in biological motion processing in the monkey STS. NeuroImage, 60(2), 911–921. (link)

Weiner, K. S., & Grill-Spector, K. (2011). Neural representations of faces and limbs neighbor in human high-level visual cortex: evidence for a new organization principle. Psychological Research, 77(1), 74–97. (link)


12. Social interactions I

Heider and Simmel (link) (youtube link)

*Gao, T., Newman, G. E., & Scholl, B. J. (2009). The psychophysics of chasing: A case study in the perception of animacy. Cognitive Psychology, 59(2), 154–179. (link)

Scholl, B., & Tremoulet, P. (2000). Perceptual causality and animacy. Trends in Cognitive Sciences, 4(8), 299–309. (link)

Scholl, B. J., & Gao, T. (2013). Perceiving animacy and intentionality: Visual processing or higher-level judgment. Social Perception: Detection and …. MIT Press. (Amazon book link)

*Pantelis, P. C., Gerstner, T., Sanik, K., Weinstein, A., Cholewiak, S. A., Kharkwal, G., … Feldman, J. (2015). Agency and Rationality: Adopting the Intentional Stance Toward Evolved Virtual Agents. Decision, 3(1), 40–53. (link)

*Schultz, J., & Bülthoff, H. H. (2013). Parametric animacy percept evoked by a single moving dot mimicking natural stimuli. Journal of Vision, 13(4), 15. (link)

13. Social interactions II

*Schultz, J., Friston, K. J., O’Doherty, J., Wolpert, D. M., & Frith, C. D. (2005). Activation in posterior superior temporal sulcus parallels parameter inducing the percept of animacy. Neuron, 45(4), 625–635. (link)

*Gao, T., Scholl, B. J., & McCarthy, G. (2012). Dissociating the Detection of Intentionality from Animacy in the Right Posterior Superior Temporal Sulcus. The Journal of Neuroscience, 32(41), 14276–14280. (link)

Sebanz, N., Bekkering, H., & Knoblich, G. (2006). Joint action: bodies and minds moving together. Trends in Cognitive Sciences, 10(2), 70–76. (link)


*Chan, A. W.-Y., Kravitz, D. J., Truong, S., Arizpe, J., & Baker, C. I. (2010). Cortical representations of bodies and faces are strongest in commonly experienced configurations. Nature Neuroscience, 13(4), 417–418. (link)

Troje, N. F., & Westhoff, C. (2006). The inversion effect in biological motion perception: evidence for a “life detector?” Current Biology, 16(8), 821–824. (link)


14. Social interactions III

*Gao, T., & Scholl, B. J. (2011). Chasing vs. stalking: Interrupting the perception of animacy. Journal of Experimental Psychology: Human Perception and Performance, 37(3), 669–684. (link)

Petro, L. S., Smith, F. W., Schyns, P. G., & Muckli, L. (2013). Decoding face categories in diagnostic subregions of primary visual cortex. European Journal of Neuroscience, 37(7), 1130–1139. (link)

Yang, Y., Fermüller, C., Li, Y., & Aloimonos, Y. (2015). Grasp type revisited: A modern perspective on a classical feature for vision. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 400–408. (link)

Blakemore, S.-J., Boyer, P., Pachot-Clouard, M., Meltzoff, A., Segebarth, C., & Decety, J. (2003). The detection of contingency and animacy from simple animations in the human brain. Cerebral Cortex, 13(8), 837–844. (link)

Myers, A., & Sowden, P. T. (2008). Your hand or mine? The extrastriate body area. NeuroImage, 42(4), 1669–1677.

Saxe, R., Jamal, N., & Powell, L. (2006). My body or yours? The effect of visual perspective on cortical body representations. Cerebral Cortex, 16(2), 178–182.

Dayan, E., Casile, A., Levit-Binnun, N., Giese, M. A., Hendler, T., & Flash, T. (2007). Neural representations of kinematic laws of motion: evidence for action-perception coupling. Proceedings of the National Academy of Sciences, 104(51), 20582–20587.



El-Gaaly, T., Froyen, V., Elgammal, A., Feldman, J., & Singh, M. (2014). A Bayesian Approach to Perceptual 3D Object-Part Decomposition Using Skeleton-Based Representations, 1–7. Proceedings, The Twenty-Ninth AAAI Conference on Artificial Intelligence. (link)

Henriksson, L., Mur, M., & Kriegeskorte, N. (2015). Faciotopy-A face-feature map with face-like topology in the human occipital face area. Cortex, 72(C), 156–167. (link)


15. Additional topics: human hands, gestures, detecting artifacts, feedback, ...

Feldman, J., & Tremoulet, P. D. (2008). The attribution of mental architecture from motion: Towards a computational theory (Tech. Rep. No. 87). Rutgers University Center for Cognitive Science (link)

Pantelis, P. C., Baker, C. L., Cholewiak, S. A., Sanik, K., Weinstein, A., Wu, C.-C., et al. (2014). Inferring the intentional states of autonomous virtual agents. Cognition, 130(3), 360–379.

Silson, E. H., Groen, I. I. A., Kravitz, D. J., & Baker, C. I. (2016). Evaluating the correspondence between face-, scene-, and object-selectivity and retinotopic organization within lateral occipitotemporal cortex. Journal of Vision, 16(6), 14–21. (link)

Pitcher, D., Goldhaber, T., Duchaine, B., Walsh, V., & Kanwisher, N. (2012). Two Critical and Functionally Distinct Stages of Face and Body Perception. Journal of Neuroscience, 32(45), 15877–15885. (link)

Orlov, T., Porat, Y., Makin, T. R., & Zohary, E. (2014). Hands in Motion: An Upper-Limb-Selective Area in the Occipitotemporal Cortex Shows Sensitivity to Viewed Hand Kinematics. Journal of Neuroscience, 34(14), 4882–4895.

Emery, N. J., & Clayton, N. S. (2015). Do birds have the capacity for fun? Current Biology, 25(1), R16–R20.

*Bracci, S., Ietswaart, M., Peelen, M. V., & Cavina-Pratesi, C. (2010). Dissociable Neural Responses to Hands and Non-Hand Body Parts in Human Left Extrastriate Visual Cortex. Journal of Neurophysiology, 103(6), 3389–3397.




van Buren, B., Uddenberg, S., & Scholl, B. J. (2015). The automaticity of perceiving animacy: Goal-directed motion in simple shapes influences visuomotor behavior even when task-irrelevant. Psychonomic Bulletin & Review, 1–6. (link)