PSY 3031 Outline

Note: The class has 15 weeks. Exams happen during weeks 5, 10 & 15, and are paired with a review day. So there’s novel content for 12 weeks.

This outline is intended to capture testable, low-level knowledge (i.e., facts and methods). Higher-level themes that we hope to highlight throughout the semester are not necessarily present in this outline.

Each day will emphasize one of these high-level themes:
What is the difference between Sensation and Perception?

Sensation is the direct result of the transduction of a physical stimulus by sensory neurons.

Perception is interpretation of sensation. Perception is subjective and relative (not necessarily an accurate physical representation). It includes both top-down and bottom-up effects.

  • Bottom-up effects are effects that can be explained by the stimulus and are (mostly) the same all the time for everyone.
  • Top-down effects are generated by our brains. Top-down effects include contextual modulation (knowing what an object is changes how we perceive it), attention (feedback from higher brain areas can actually increase neuron firing rates in primary sensory areas), and personal experience or prior expectation (e.g., a person looking for a bird is more likely to mistake a squirrel in a tree for a bird).

Sensory organs, neurons and cortex. For this course, we count 6 sensory modalities: sight, hearing, taste, touch, smell, balance. For each of the 6 sensory modalities, there are specialized neurons for detecting the physical stimuli, specialized organs for housing those neurons, and specialized regions of the brain (sensory cortex) dedicated to processing signals unique to that modality. Sensory cortical areas contain maps of the relevant aspects of the world.

What is psychophysics?
Psychophysical methods quantify perception. Gustav Fechner (1801 - 1887), often called the father of psychophysics, is credited with the important idea of drawing quantitative connections between behavioral responses and stimuli. He described three methods for measuring absolute detection thresholds:
  • Method of Limits: increase/decrease intensity until subject can/cannot detect stimulus or stimulus change
  • Method of Adjustment: like Limits, except you give the subject the knob
  • Method of Constant Stimuli: stimuli at different intensities are presented, but in random order. Each intensity is repeated several times and percent correct is calculated to create an estimated psychometric function, which is usually s-shaped.
These different approaches can be used to measure different kinds of thresholds or perceptual levels:
    Absolute threshold: measurement of absolute detection limit. This is really a special case of the difference threshold (below) ... when the base stimulus strength is 0. Subliminal messages are stimuli that are below the absolute detection threshold.
    Difference threshold methods estimate an observer's just-noticeable difference (JND) or Difference Limen (DL) (JND & DL are the same thing) by presenting two different stimuli and asking the subject which was greater (or lesser). What we're looking to discover is the stimulus change that produces a neural response larger than the inherent noise in the system.

    What Weber discovered was: bigger changes are needed for bigger base stimuli. This is Weber's law: JND/S = k (where S is the base stimulus intensity and k is the Weber fraction).

    The strong interpretation of this law -- a linear relationship between JND and stimulus strength -- results in a logarithmic relationship between stimulus intensity and perception (see below). A weaker interpretation of this law is often more useful: the stronger a stimulus (S), the bigger a change has to be to be noticeable (JND). This weaker interpretation is compatible with a range of compressive functions. Stevens' law (discussed later) allows for expansive functions.
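    As a concrete (hypothetical) illustration of Weber's law, here is a minimal Python sketch; the Weber fraction k = 0.02 and the stimulus values are invented for the example, not measured data:

        # Hypothetical Weber fraction (k = 2%), chosen only for illustration
        k = 0.02
        for S in [50, 100, 500, 1000]:   # base stimulus intensities, arbitrary units
            jnd = k * S                  # Weber's law: JND = k * S
            print(f"base S = {S:5d}  ->  JND = {jnd:5.1f}")
        # The JND grows from 1 unit at S = 50 to 20 units at S = 1000:
        # bigger changes are needed on top of bigger base stimuli.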

    Magnitude estimation: perception of magnitude. Often described by a power law, referred to as Stevens' power law: P = kS^n
      n = 1: linear (example: length estimation)
      n < 1: compression (example: brightness perception)
      n > 1: expansion (example: electric shock perception)
    The math behind the laws ...
      For Weber's law, the idea that the response changes by the same amount for every proportional change in the stimulus results in a logarithmic stimulus-response function (if noise is the same for all response levels):
      • Think of a perceptual response function as P(S) ... perception on the y-axis and stimulus on the x-axis.
      • JND is the change in stimulus required to produce a change in P that's big enough to notice.
      • If there's the same amount of noise present for all response levels, then the change in P is the same, regardless of stimulus intensity (this is the tricky assumption).
      • So, the slope of the function (dP/dS) has to get smaller as S gets bigger (=k/S) to keep the change in P the same as the JND gets bigger.
      • dP/dS = k/S defines a logarithmic function, P = k ln(S)
      • In reality, logarithmic functions aren't quite the right shape ... but they're a good approximation in the middle of the range
      For Stevens' (power) law, if the stimulus-response function is truly a power law, then logarithms come in handy again!
      • If P = kS^n, then log(P) = log(k) + n log(S).
      • So if you plot log(P) on the y-axis and log(S) on the x-axis, you'll see a straight line with a slope of n (and an intercept of log(k)). Super convenient for interpreting data.
      Logarithms will come up many times in this class. If you're rusty using them, you can brush up at Khan Academy ... or just spend the time to make sure you understand all the examples we do in class.
      Both Weber's and Stevens' laws are too simple. Near threshold and near saturation, our perception deviates from these functions. When we get to vision, we can talk about more details. But for now, these are good first approximations.
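      To see why the logarithms are handy, here is a minimal sketch (the constants k and n are made up, roughly mimicking a compressive sense like brightness) showing that a power law plots as a straight line with slope n on log-log axes:

          import numpy as np

          k, n = 2.0, 0.33                           # hypothetical constants (compressive, n < 1)
          S = np.array([1.0, 10.0, 100.0, 1000.0])   # stimulus intensities
          P = k * S**n                               # Stevens' power law

          slopes = np.diff(np.log10(P)) / np.diff(np.log10(S))
          print(slopes)                              # every slope equals n (0.33): a straight line in log-log
          # The intercept of that line is log10(k); n < 1 means a 10x increase in S
          # produces much less than a 10x increase in P (compression).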
Psychometric functions quantify stimulus-response relationships
Put together, the concepts above teach us that perception can be quantified. Many different stimulus-response plots can be called psychometric functions, but the most common is the S-shaped curve that represents an observer's behavior around a threshold (this works for absolute or difference thresholds). The transition from "can't tell" to "can tell" is not instantaneous. Because (neural) responses are variable (both bottom-up and top-down effects have variability), observers don't give the same answer to every stimulus.
  • The "ceiling" is the best performance for the most detectable stimuli. Sometimes this is not 100% because people make mistakes in answering ("finger errors").
  • The "floor" is the worst performance. If you're doing a two-alternative forced-choice task, the worst a person can do is 50% (guessing).
  • The middle is the most interesting part. This is where a person begins to detect the changes, but not all the time.
  • You determine the threshold by picking a performance criterion (e.g., 80% correct); the threshold is the stimulus value (intensity or difference) that resulted in criterion performance.
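  The sketch below simulates this procedure in Python; the logistic shape, the noise level, and the 80% criterion are assumptions chosen for illustration, not data from the course:

      import numpy as np

      rng = np.random.default_rng(0)
      intensities = np.linspace(0.5, 3.0, 11)   # hypothetical stimulus levels (presented in random order)
      n_trials = 200                            # repeats per level (method of constant stimuli)

      def p_correct(x, thresh=1.5, slope=4.0):
          # Assumed S-shaped (logistic) psychometric function for a 2AFC task:
          # floor at 50% (guessing), ceiling near 100%.
          return 0.5 + 0.5 / (1.0 + np.exp(-slope * (x - thresh)))

      # Simulated percent correct at each stimulus level
      observed = rng.binomial(n_trials, p_correct(intensities)) / n_trials
      print(np.round(observed, 2))

      # Threshold = stimulus level where the underlying curve reaches the chosen criterion
      criterion = 0.8
      fine = np.linspace(intensities.min(), intensities.max(), 1001)
      threshold = fine[np.argmax(p_correct(fine) >= criterion)]
      print(f"threshold at {criterion:.0%} correct: ~{threshold:.2f}")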

One thing we can do is determine whether a detection or discrimination threshold is high because the average neural response to the stimulus isn't very sensitive to changes in the stimulus (slow rate of response change) or because there is a lot of variability in responses.

  • Thresholds are higher when response doesn't depend very strongly on stimulus.
  • Thresholds are also higher when responses are highly variable -- even a big change isn't reliably detected.
  • The slope of the psychometric function (for a single stimulus level) indicates the level of perceptual noise.
  • So when comparing different population groups, and one tends to have higher thresholds than the other, we can use the slope to tell whether the difference is caused by reduced response amplitude or increased noise.

A few key applications of these methods to real life are:
    Reynolds JH and Heeger DJ (2009). The normalization model of attention. Neuron 61(2):168-185.
  • Studying how the psychometric function shifts when people pay attention to or ignore a stimulus revealed one of the neural mechanisms by which attention improves performance.
Review of neuron basics (this section is not explicitly tested in this class, but is necessary background knowledge)
Basic neuron parts
    Main divisions: dendrites (receive input), soma (cell body, integrates inputs), axon (sends output).
Specializations of sensory neurons
    • Sensory neurons either have specialized nerve endings instead of dendrites or are axonless receptor cells.
    • What we can sense is determined by the properties of nerve endings and receptor cells.
    Resting membrane potential creates potential for communication.
    • When not receiving excitation or inhibition, a neuron has a -70 mV potential difference across its membrane (lower on the inside). (Membranes are phospholipid bilayers studded with proteins.)
    • This is caused by anions (negatively charged molecules) in and near the inner membrane.
    On top of the potential difference, there is also a concentration difference in key electrolytes.
    • There is 10X as much Na+ (sodium ions) outside the cell as inside. K+ (potassium) is concentrated inside the cell.
    • This concentration difference is maintained by Na/K pumps, which actively pump ions against their concentration gradients.
    The membrane potential difference and the Na+ and K+ concentration differences are important because voltage-gated ion channels are the membrane-embedded proteins that initiate action potentials when a cell receives enough excitation.
Basic properties of the nervous system
Nerves are bundles of axons. Neurons (and therefore nerves) are classified as belonging to the central, peripheral, autonomic, or enteric nervous system
    Central nervous system: brain, spinal cord and retina. In general, CNS neurons cannot regenerate after injury; e.g., spinal cord injury results in paralysis.
    Peripheral nervous system: spinal nerves & sensory neurons, cranial nerves. Generally can re-grow after injury (e.g., a cut in the finger heals, and regains normal sensation).
    Autonomic nervous system (not addressed in this class) regulates homeostasis, fight or flight ...
    Enteric nervous system: neural networks regulating gut.
Action potentials enable long-distance communication between neurons.
Action potentials are …
    ... created/carried by ionic currents and travel down axons
    ... initiated when excitation increases soma potential to a threshold (~ -50mV)
    ... propagated by Na+ ions entering the cell, down their concentration gradient, through voltage-gated sodium channels. Note: permeability is related to the number of open ion channels.
    ... terminated (automatically) by K+ running out of the cell (down its concentration gradient) through voltage-gated potassium channels.
Information is encoded by the rate of action potentials. Size (amplitude) never changes!
Synapses
Information is chemically transmitted to other cells at synapses, where chemicals are released from one cell's axon terminal onto another cell's dendrites.
    Whether a particular molecule is excitatory or inhibitory is determined by the receptor on the post-synaptic membrane.
Myelin increases conduction velocity
Many axons are coated with myelin. Myelin is layers and layers of cell membrane from a glial cell (Schwann cell in the peripheral nervous system; oligodendrocyte in the central nervous system) wrapped around the axon.
    Conduction velocity is highest in large (alpha), myelinated (A) axons. These conduct at ~100 m/s.
    Small, unmyelinated axons (C-fibers) transmit action potentials at ~1 m/s
    When you stub your toe or touch something hot, you experience two sensations. First, neurons with fat, myelinated axons let you know that something intense has happened. But then, a second or so later, a wave of pain reaches your brain. This delay is because the pain neurons have skinny, unmyelinated axons.
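    A back-of-the-envelope sketch of that delay (the 1.2 m toe-to-brain distance is an assumed, illustrative value):

        distance_m = 1.2        # assumed toe-to-brain path length, for illustration only
        v_a_fiber = 100.0       # large, myelinated axons: ~100 m/s
        v_c_fiber = 1.0         # small, unmyelinated C-fibers: ~1 m/s

        print(f"fast 'something happened' signal: {distance_m / v_a_fiber * 1000:.0f} ms")
        print(f"slow wave of pain:                {distance_m / v_c_fiber:.1f} s")
        # ~12 ms vs ~1.2 s: the pain arrives roughly a second after the first touch signal.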
Somatosensory system: sensation throughout the body
Somatosensation includes: touch (mechanical and thermal), pain (mechanical, thermal, chemical) & proprioception (sense of self).
    There are many, many types of somatosensory receptors; we only talk about a few in this class.
    Your sense of self (proprioception) includes balance (next week's topic) and limb position -- you can feel yourself move, even when you're not touching anything. Kinesthesia refers specifically to proprioception when you're moving.
    Most of your non-proprioceptive somatosensory neurons are cutaneous sensory neurons in your skin. They fall into 3 categories: Mechanical, Thermal, Noxious.
    Most cutaneous receptors are pseudo-unipolar neurons, with cell bodies in the dorsal root ganglia. The dorsal root ganglia (singular: ganglion; plural: ganglia) are lumps of nervous tissue next to the spinal cord that house the cell bodies of somatosensory neurons.
Thermal receptors
Thermal sensory neurons have free nerve endings in the dermis and epidermis (skin layers) that sense warm and cool.
    They are non-uniformly distributed, which means there are some regions of skin w/o thermal sensation
    They respond to non-damaging temperatures above (warm receptors) and below (cool receptors) skin temperature.
Mechanoreceptors
While most textbooks focus on 4 primary types, we pick just 2
    Merkel receptor: slowly adapting, low frequency, perception of fine detail
      Density is inversely related to receptive field size
      Density is related to tactile acuity.
    Pacinian corpuscle: rapidly adapting, high-frequency; senses texture and vibration.
      Large receptive field sizes.
      Blobby nerve endings are located in sub-cutaneous fat.
Mechanoreceptors have receptive fields, which are basically a territory on the skin that they are responsible for.
    Mechanoreceptors on the fingers and tongue, which are places with high tactile acuity (ability to sense fine details), have small receptive fields and are densely packed.
    Mechanoreceptors on other body parts like the back or leg, which are places with poor tactile acuity, have large receptive fields and are more spread out.
Somatosensory pathways to the brain
(Almost) all sensory information goes to thalamus before cortex. The thalamus is in the middle of the brain, at the top of the brainstem.
    Axons from mechanoreceptors and thermal receptors travel up different sides of your spinal cord (Figure 1. from Anatomy and Physiology reading for the week).
    The thalamus likely helps regulate which stimuli or parts of our environment we pay attention to.
    Center-surround receptive field organization is created by subtraction in the thalamus.
Somatosensory representations in the brain
Primary somatotopic representation (S1) is on the post-central gyrus. It is a distorted map (body parts with high receptor density get more territory). The homunculus is an illustration of the relative size of representations.
  • Touch information crosses from one side to the other in the brainstem, so S1 represents contralateral body parts (i.e., left S1 has right hand representation).
  • Thermal information crosses from one side to the other in the spinal cord (before touch crosses) but still goes to contralateral S1.
  • S1 neurons respond to features such as orientation or direction.
How do we know all of this? In the first half of the 1900s, surgeons like Penfield mapped it out through a lot of trial and error. Before that, lesion studies (strokes and bullet wounds) gave us a coarse idea of what was where. These days, we can do functional MRI studies to study healthy brains without hurting anyone!
Cortical representation beyond S1
  • Regions in parietal cortex outside S1 respond to more complex features.
  • Unattended stimuli can fail to elicit neural response, even in primary somatosensory cortex. But the effects of attention are stronger outside S1.
  • Object-selective responses are found in regions of the parietal cortex outside of primary somatosensory cortex.
Pain
Pain is unpleasant and expensive, but it protects you. For example, CIPA is an inherited disease in which people lack pain sensation in the mouth and limbs. It generally results in early death, due either to overheating (the A in CIPA is anhidrosis, an inability to sweat to cool oneself) or to repeated injury (e.g., chewing off your tongue and lips, twisting the skin off your hand trying to open a jar). CIPA happens when you get 2 defective copies of the SCN9A gene, which codes for a sub-unit of the Na+ channels in nociceptors.
Itch
We used to think itch was sub-threshold pain, but now we know that there are specific receptors for itch.
    They have unmyelinated axons with free nerve endings, but are unresponsive to mechanical or thermal insult.
    Applying histamine elicits action potentials and a sensation of itching. Reported in Schmelz et al (1997). Specific C-Receptors for Itch in Human Skin. J. Neuroscience 17(20):8003-8008.

There are three types of pain
    Nociceptive pain is mediated by cutaneous receptors which detect heat, cold, severe force and chemical insult
    Inflammatory pain happens when immune responses activate nociceptors in response to injury. Pain associated with tumors and post-injury swelling is an example of inflammatory pain.
    Neuropathic pain is caused by damage to the nervous system itself. Examples: carpal tunnel syndrome, sciatic nerve pain
Capsaicin
Capsaicin, the oil-soluble chemical in hot chile peppers that makes them hot, provides a good example of a noxious chemical that creates pain when it activates nociceptors. Chiles taste hot because capsaicin activates polymodal nociceptors which respond to heat and capsaicin.
Pain pathways and cortical representation
    Nociceptive information shares the spinothalamic tract with regular thermal information, on the opposite side of the spinal cord from touch information (which is in the medial lemniscal pathway).
    Pain projects to contralateral primary somatosensory cortex, as well as the limbic system. If you want to motivate a strong reaction from someone, the limbic system is a good target!
Physiological treatments for pain
Drugs that act on the peripheral nervous system:
    NSAIDs: non-steroidal anti-inflammatory drugs (e.g., Aspirin, Advil) act in the periphery by inhibiting the synthesis of prostaglandins, which are released in response to injury to do things like increase local blood flow and initiate immune responses. Prostaglandins are fat-soluble molecules that, among other things, regulate blood flow. Inhibiting prostaglandins reduces pain and inflammation. Recent research indicates that aspirin also facilitates production of a new family of molecules derived from omega-3 fatty acids that actually calms inflammatory responses.
    Capsaicin: over-stimulating polymodal nociceptors makes them down-regulate their responses. A local anesthetic is required (to block pain signals) while you're over-stimulating the polymodal nociceptors ...
Drugs/treatments that act in the central nervous system
    Opioids: there are opioid receptors in the brain and spinal cord that mediate analgesia and euphoria. Epidural injections (the dura is the tough membrane protecting the brain and spinal cord) target opioid receptors in the dorsal horn of the spinal cord (in the substantia gelatinosa ... who doesn't love that name?!). Morphine targets receptors in the brainstem and brain.
Psychological treatment can address either the experiential (sensory) or the cognitive (perceptual) aspect of pain.


Expectation: Knowing what to expect decreases perceived severity.

Distraction: Thinking about something else (positive) decreases severity.

    However, there is a trade-off: pain can also interfere with the distracting task.

Hypnosis can be used to help about 75% of people.

    Experiential and cognitive aspects can be separated: decreasing intensity decreases unpleasantness, but unpleasantness can be decreased while intensity stays high. Figures are from Rainville, P., et al. (1999). "Dissociation of sensory and affective dimensions of pain using hypnotic modulation." Pain 82: 159-171.

Cue combination. (This is not a unique testable concept, but something that will come up many times throughout the class.)

Very few sensory experiences are isolated; almost every perception is the combination of many sensations and expectations. For example, when you reach out and pick up an object, you are often watching your hand. In the course of this action, you experience at least 3 different sensations, which your brain integrates:
  • The visual image of your hand moving through space.
  • The kinesthetic information of feeling your hand and arm move.
  • The tactile information of your hand touching the target object or your sleeve moving against your arm as you move.
Sometimes, your sensory information conflicts. In the above example, what if your hand is numb, or you're seeing your hand underwater so the visual image is distorted? Sometimes, your sensory information is poor. What if it's dark and you can't see where your hand is? What if your hand is tired, and adaptation in your kinesthetic receptors is giving you poor feedback about body position? Your brain doesn't have the option of giving up. How do we deal with noisy and ambiguous data?
  • A weighted combination of all available sensory information is the best approach.
  • More reliable senses are given stronger weight and dominate the perceptual decision.
  • The rubber hand illusion demonstrates how visual and tactile cues can dominate proprioceptive cues under the right circumstances, leading your brain to decide that a rubber hand is your own.
Another factor is your prior experience, or expectations. When your perceptual experience is ambiguous, you often decide to perceive the thing that is most likely given your prior knowledge (past experiences, beliefs about the world)
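A minimal sketch of reliability-weighted cue combination (the visual and kinesthetic estimates and their noise levels are invented for illustration): each cue is weighted by its reliability (1/variance), so the noisier cue contributes less to the combined estimate.

    # Two estimates of hand position (cm along one axis); numbers are hypothetical.
    cues = {
        "vision":      {"estimate": 30.0, "sigma": 0.5},   # sharp image -> reliable
        "kinesthesia": {"estimate": 33.0, "sigma": 2.0},   # adapted receptors -> noisy
    }

    reliability = {name: 1.0 / c["sigma"] ** 2 for name, c in cues.items()}
    total = sum(reliability.values())
    weights = {name: r / total for name, r in reliability.items()}

    combined = sum(weights[name] * cues[name]["estimate"] for name in cues)
    print({name: round(w, 2) for name, w in weights.items()})   # vision ~0.94, kinesthesia ~0.06
    print(f"combined estimate: {combined:.1f} cm")              # pulled strongly toward the visual estimate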
Kinesthesia and proprioception
Proprioception is the ability to sense oneself.
Kinesthetic information comes from neural receptors embedded in our joints and muscles that let us know when our joints move.
    The knee jerk reflex is an excellent example of a reaction triggered by kinesthesia: a circuit in the spinal cord automatically relaxes the hamstring and tightens the quadriceps when you stretch receptors in the quadriceps by tapping the patellar tendon.
Phantom limbs
Phantom limbs are limbs that are not present (due to amputation or congenital condition) but still perceived. Not all phantom limb sensations are pain, but when phantom limb sensations are painful, it is a problem! Phantom limb pain can have many origins, which we'll break into 3 categories:
  • Neuroma: inflammation or scar tissue on a nerve. Treatment is to remove the neuroma. It's good news if your phantom pain is treated this easily!
  • Cortical cause #1: Pessimistic interpretation of sensory deprivation or sensory reorganization. Perceptual experience is created by our brain; incoming sensory information mostly just updates our mental model of what is going on in our body and in our world. Without sensory input, or if the sensory input reorganizes, the brain creates perception anyway. Especially if the last sensation coming from the missing body part was pain, the brain can stay "stuck" in that state.
    • Spontaneous activity in denervated cortex might be misinterpreted.
    • Mechanosensation (tactile or kinesthetic input) normally inhibits pain signaling in the dorsal horn of the spinal cord (see Stimulus Produced Analgesia); removal of mechanical sensation removes inhibition of pain. So spontaneous nociceptive signals might be particularly strong. Deep brain stimulation of periaqueductal gray matter can suppress pain system to compensate.
  • Cortical cause #2: Sensory reorganization. Whether it happens in the periphery or the cortex, S1 has unusual inputs and the brain's interpretation might be pain.
    • Sensory nerve endings that used to target an amputated limb can re-grow to innervate neighboring territories.
    • Cortical reorganization: input from a neighboring body region spreads to take over a region of cortex that no longer has sensory input.

Mirror boxes and virtual reality are promising treatments that train the brain to interpret sensory responses, or the lack of sensory responses, as not painful.

Controlling prosthetic limbs
Prosthetic limbs have been around for a long time, and mechanical improvements are continuously being made. Active prosthetic limbs are exciting, although they are difficult to control without sensory feedback. First we will talk about two approaches for controlling active prosthetic limbs (since directly wiring into the nervous system isn't an option, for reasons of complexity, vulnerability and stability):
  • Targeted muscle reinnervation provides control signals, once patients re-learn how their stump muscles map to bionic limb parts.
  • Brain-Machine Interface: electrophysiological signals from the brain (e.g., EEG, electroencephalography) can also be used to control bionic limbs when direct access to peripheral nerves is not an option.
      Lebedev, M. A. & Nicolelis, M. A. L. (2006). Brain–machine interfaces: past, present and future. TRENDS in Neurosciences, 29(9):536.
Active prosthetic limbs need sensory feedback to work
Targeted sensory reinnervation can provide necessary sensory feedback by delivering temperature and force cues to adjacent tissue, and letting the brain learn to interpret the sensations as coming from the prosthetic arm. This approach takes advantage of the same mechanisms that let phantom limbs live in the brain.
Vestibular transduction
Our inner ear can detect acceleration! The inner ear comprises the cochlea (for hearing) and three accelerometers: the saccule, the utricle and the semicircular canals.
    The maculae (singular: macula) of the saccule and utricle are where the neurons are found. The neurons are hair cells, named for the fact that they have cilia sticking out the top. The cilia tips are embedded in the otolithic membrane, which shifts due to linear acceleration.
    In the ampullae of the semicircular canals, hair cell tips are embedded in the cupula, which flexes due to angular acceleration
    Bending the tips of hair cells causes membrane potential change
      Hair cells have no axons (similar to taste cells that way)
      Direction-dependent: release of neurotransmitters increases or decreases depending on the direction in which the tips are deflected
The sensory organ and neurons that are uniquely dedicated to balance are the semicircular canals, saccule and utricle. However, balance is more than vestibular information! We use proprioceptive information (pressure sensors and kinesthetic information), and we use visual information. All of this needs to be integrated in order to keep our balance! Yet we have not discovered any region of the cerebral cortex dedicated to interpreting balance.
    In the brainstem, there are vestibular nuclei (near the 4th ventricle) that receive vestibular information as well as proprioceptive and optic flow information.
    Vestibular, proprioceptive and visual information also go to the cerebellum
    • The cerebellum actually has 4 times as many neurons as the cerebral cortex!
    • The cerebellum is generally responsible for sensorimotor integration (e.g., controlling gait), so it makes sense that it would be a key player in our balance system

Vertigo is a loss of balance due to inflammation or some other chronic condition in the inner ear.

The spins: elevated blood alcohol results in a change in the density of the endolymph (fluid) in the semicircular canals. Thus, linear acceleration results in a shift in the cupula of the semicircular canals, and is interpreted as angular acceleration.

Visual contributions to balance
Optic flow is coordinated motion in the visual field. It tells us about our motion when we're not accelerating (i.e., when our vestibular and kinesthetic information is useless). The point of expansion tells us our heading; this is the one place in our visual field where there is no motion when we're moving (because we're moving straight toward it).
Motion sickness

First, some background on posture, and how we use kinesthetic information to automatically maintain it. Muscles automatically contract to counterbalance shifts in “ground” … but compensation is more accurate if we know what to expect. Understanding spinal reflexes is relevant for understanding posture: see the knee jerk reflex, above.

People who are prone to motion sickness experience it when vestibular stimulation (real or perceived) is low frequency.

    Cue conflict theory: nausea results from inconsistent visual and vestibular information. Why feel ill and vomit? Perhaps an evolutionary adaptation to protect against accidental ingestion of neurotoxins?
    Postural instability theory: people who sway more are more likely to get sick. Do we throw up because we think we're getting weak (swaying) or does postural instability exacerbate cue conflict?

Topic 4 - Required Readings

Topic 4 - Optional/Background Readings

Olfactory anatomy
    Olfactory mucosa (olfactory epithelium) houses olfactory neurons
      The olfactory mucosa is located at the top of the nasal cavity.
    Olfactory sensory neurons have cilia that actually contact air.
      Because of contact with environment, olfactory neurons are damaged often and replaced every 5 - 7 weeks.
      Receptors embedded in the cilia tip membranes are activated by molecules with the right shape.
        Each neuron has one receptor type.
        A single molecule is enough to initiate a burst of action potentials.
        Each receptor/neuron responds to more than one molecule, but with different strengths.
    There are about 350 olfactory receptor types
      This might be the reason that we have an insufficient and ambiguous vocabulary for describing scent (e.g., wine tasting). There are many more dimensions (receptor types) than in other senses (350 receptor types for smell vs. 3 for vision ...)
    Olfactory bulb
      Olfactory sensory neurons make their first synapse in the olfactory bulb. These synapses occur in clusters called glomeruli (singular: glomerulus).
        Each glomerulus receives input from (neurons with) the same type of receptor
        There is an orderly spatial arrangement of odorant responses based on size and functional groups on molecules
    Olfactory cortex
      Olfaction is the exception that proves the rule: all other senses are relayed through (gated by) the thalamus, but olfactory information does not go to the thalamus. Olfactory information is also unique in that it does not cross the midline, i.e., projects to ipsilateral cortex.
        Primary olfactory cortex is piriform cortex, on the ventral aspect of the temporal lobe.
        From piriform cortex, olfactory information goes to orbitofrontal cortex (OFC).
        Like pain, olfactory information also projects directly to limbic system (notably, the amygdala). Perhaps because it is an important protective mechanism (smell and taste are the only senses that represent chemical contact with the environment).
Experiencing scent
People can improve their sensitivity with experience (practice). There is no physiological basis for the observation that women often have more developed senses of smell than men.

Predicting what something smells like:

    Unpredictable relationship between molecular structure and individual receptor activation
    Predictable relationship between odorant scent and pattern of activation among receptors & on olfactory bulb (same pattern = same scent)

Detecting vs. recognizing scents: the recognition threshold is ~3X the detection threshold.

Olfaction has a strong but paradoxical linkage with memory

    Emotional content of smell-evoked memories is stronger than verbally evoked memories
    However, semantic retrieval (remembering names) is difficult (if you live in a culture that does not talk about scents very much), and semantic labels decrease emotional content of memories.

Pheromones
Pheromones are subliminal scents used for chemical communication (mediating intra-species aggression and attraction) in animals. Evidence for pheromones in humans is controversial.
    We have no vomeronasal organ, which is what other animals use to detect pheromones.
    Stern & McClintock's study of menstrual synchrony is one of the few studies demonstrating the effects of chemosignalling in humans.
Structure of the tongue
The bumps on the tongue that we called taste buds when we were kids are called papillae (one papilla, two papillae) in this class. There are four kinds of papillae: fungiform, circumvallate, foliate, and filiform. Only the filiform (the ones that make a cat's tongue rough) have no taste buds.
    Taste buds are clusters of taste cells found on papillae.
    Saliva contacts receptors on taste cells when it enters the taste pore at the top of the taste bud.
The primary dimensions of taste correspond to different types of taste receptors
    Sweet and bitter receptors are similar to the odorant receptors in the olfactory mucosa: molecules dock on trans-membrane receptors, changing the membrane potential and initiating a neural signal.
    Salty and sour are ion sensors (H+=sour, Na+=salt)
      Miracle fruit has a chemical that binds to sweet receptors so that H+ ions (sour stimuli) activate the sweet receptors.
    The 5th dimension, umami, is detected by a receptor that responds to MSG
    There is recent evidence for a 6th dimension: fat.
Supertasters
Supertasters are people with a genetic difference that means they have an extra kind of taste cell in their taste buds, one which signals a bitter sensation in response to PROP (6-n-propylthiouracil) or PTC.
Taste pathways
    Taste cell responses are carried to the brainstem by four nerves (2 for the tongue, 2 for the rest of the mouth). Different taste cell/receptor responses travel in different bundles in nerves.
      Taste cells have no axons.
      Afferent neurons (with cell bodies in cranial nerve ganglia) make synaptic contact with taste cells and relay signals to the nucleus of the solitary tract in the brainstem, and from there to the thalamus.
    Taste in the cortex
      Taste signals are relayed through thalamus to the frontal operculum and insular cortex (primary taste area)
Flavor = Taste + Smell
Olfactory and gustatory signals combine in orbitofrontal cortex (OFC).
Perception of flavor evolves throughout the lifespan
    Infants can taste and smell at birth, and show appetitive responses to sweet and aversive responses to bitter, although apparently are not sensitive to salt.
    Number of taste cells decreases with age.
Appetite is driven by more than taste & smell
Taste response (NST reactivity) is regulated by signals such as blood sugar, appearance, and smell. Appetite is mediated by the orbitofrontal cortex, which combines information from all of the senses, as well as reward/desire; e.g., neurons in OFC respond more weakly to cream after a rat has consumed a lot of it.
Anosmia
Anosmia is the loss of olfactory sensation, which affects both the sense of smell and the perception of flavor. It is dangerous because we use smell to detect hazardous conditions, such as smoke or the t-butyl mercaptan that is mixed with natural gas. It is also unhealthy because it negatively impacts appetite.
Exam 1 has 36 multiple-choice questions, corresponding to the 36 outline sections above. In addition, there are 2 "sprint" questions (<160 characters, 2 points each) and 2 short-answer questions (3-5 sentences, 4 points each).

Topic 5 - Required Readings

Topic 5 - Optional/Background Readings

Physical vs. perceptual characteristics of sound, and Uses of Sound
What do we use sound for?
    Localization. We are 100x less accurate localizing things with hearing than with vision *but*
    • you can hear in the dark
    • you can hear behind your head

    Environmental awareness: is a bus coming? Where is that hum coming from?

    Communication and social needs. Most notably, speech and music.

      Helen Keller, who was deaf and blind, said (something like) "Blindness separates us from things; deafness separates us from people."

What can we measure about sound?
    Physical things: sound, intensity/pressure, frequency/repetition rate, spectro-temporal properties (e.g., spectrograms of speech)
    Perceptual things: auditory sensation, loudness, pitch, timbre
    There is no clear 1:1 mapping between physical and perceptual dimensions, and they're not totally separable. For example, loudness can affect pitch perception.
Sound waves are pressure changes, usually in air.
    Condensation and rarefaction describe the regions of high and low pressure, respectively, that form when something vibrates and starts a sound wave.
    The pressure changes propagate (travel) at a rate of 340 m/s (1100 ft/s) in air; 1500 m/s in water.
  • Light travels so much faster than sound (300 million m/s), it is essentially instantaneous, which is why we can use the delay between lightning and thunder to figure out how far away the lightning strike was.
    The Pascal is the standard unit for pressure (force per area)
    Atmospheric pressure (at sea level) is 101 kPa
    Conversational speech generates sound waves with pressure amplitudes of approximately 20 millipascals (mPa). So the sounds we hear are tiny modulations of the air pressure.
    An express subway train generates pressures of ~2 Pa
Loudness is most closely related to wave amplitude
The unit of loudness is the decibel (dB), which quantifies sound pressure level (SPL)
    Decibels are calculated by the following equation: SPL = 20log(P/P0)
      Intensity is proportional to the square of the pressure, and intensity is a measure of power: a 20 Pa (120 dB) sound delivers about 1 watt per square meter. So another way of writing the equation above is SPL = 10log(P^2/P0^2) = 10log(I/I0)
      A bel is log(I/I0); a decibel is one tenth of a bel
    A decibel is a relative measure -- a loudness relative to another loudness
    To calculate Sound Pressure Level, we use a reference sound intensity of 20 micropascals. Conversational speech is therefore ~60 dB SPL, and a subway train is ~100 dB SPL.
    It is very useful to remember that an increase in SPL of 20dB represents a 10-fold increase in the amplitude of the pressure wave.
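    A quick numeric check of those numbers in Python (pressures taken from the text above; this is just the SPL formula applied directly):

        import math

        P0 = 20e-6                                   # reference pressure: 20 micropascals
        for label, P in [("conversational speech", 20e-3), ("subway train", 2.0)]:
            spl = 20 * math.log10(P / P0)
            print(f"{label}: {spl:.0f} dB SPL")
        # speech: 60 dB SPL; subway: 100 dB SPL.  The 100x pressure difference
        # (20 mPa -> 2 Pa) corresponds to a 40 dB difference: 20 dB per factor of 10.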
Perception of loudness vs. stimulus intensity is a compressive function
    The use of a logarithmic function (decibels) to characterize sound pressure level makes it a little hard to see. If you plot perceived loudness against dB, you get a function that is essentially linear, maybe even scooping up a bit. However, when you plot perceived loudness against the amplitude of the pressure wave, or even against intensity (pressure squared), you see that our perception of loudness is compressive: the louder a sound is, the bigger the change needs to be before we notice it.
The pitch of a sound is most closely related to its frequency
Frequency is measured in cycles per second, or Hertz (Hz). Because all sounds travel at the same speed (roughly ...), sounds with higher frequency have shorter wavelengths (if you take shorter steps, you have to take them faster in order to keep up with someone with a long, slow stride):
    Longer wavelength = lower frequency = lower pitch
    Higher frequency = higher pitch. A doubling in frequency is a one octave increase in pitch.
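    A small sketch of the relationship (wavelength = speed / frequency), using the 340 m/s speed of sound from above; the example frequencies are arbitrary:

        speed = 340.0                             # speed of sound in air (m/s)
        for freq in [20, 440, 1000, 20000]:       # Hz (440 Hz is concert A)
            wavelength = speed / freq
            print(f"{freq:6d} Hz  ->  wavelength {wavelength:7.3f} m")
        # 20 Hz -> 17 m; 20,000 Hz -> 0.017 m (1.7 cm): higher frequency = shorter wavelength.
        # Doubling a frequency (e.g., 440 -> 880 Hz) raises the pitch by one octave.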
When you add sounds together, you can get constructive or destructive interference
Constructive interference happens when pressure peaks arrive at the same time and add together. Destructive interference happens when a pressure peak arrives at the same time as a pressure valley, and they cancel each other out.
Auditory sensitivity function
The typical human observer can hear frequencies between 20 Hz and 20,000 Hz
    We are most sensitive to sounds in the range of 1,000 - 3,000 Hz, where important information in conversational speech tends to be
    Musical notes occupy a smaller range, from about 28 Hz to just over 4,000 Hz.
    Dogs can hear above 40,000 Hz; dolphins can hear up to 150,000 Hz
Timbre characterizes the quality of complex tones
Frequency and amplitude are useful for characterizing simple sounds (pure tones, single frequencies). But sounds in the real world are more complicated. Complex (normal real-world) sounds are created by summing many, many frequencies.
    Complex tones are characterized by a fundamental frequency (this will be the perceived pitch) and many harmonics (overtones), which are separated from each other by the fundamental frequency.
    Two different musical instruments playing the same note have different timbre because they are producing complex tones with the same fundamental frequency, but different distributions of harmonics.
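    A minimal sketch of that harmonic structure (the 220 Hz fundamental and the two amplitude patterns are invented, standing in for two different instruments playing the same note):

        fundamental = 220.0                              # Hz; determines the perceived pitch
        harmonics = [fundamental * k for k in range(1, 6)]
        print(harmonics)                                 # 220, 440, 660, 880, 1100: spaced by the fundamental

        # Two hypothetical instruments: same fundamental, different harmonic amplitudes -> different timbre
        instrument_a = [1.0, 0.5, 0.25, 0.12, 0.06]      # amplitudes fall off smoothly
        instrument_b = [1.0, 0.1, 0.8, 0.1, 0.6]         # odd harmonics dominate
        for name, amps in [("instrument A", instrument_a), ("instrument B", instrument_b)]:
            print(name, list(zip(harmonics, amps)))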
There are three parts to the ear: outer, middle and inner

The outer ear comprises the pinna and the auditory canal.

    The pinna collects sound and aids localization.
    The auditory canal resonates at 1,000-4,000 Hz, amplifying sounds related to speech.
    Cerumen is the name for ear wax. Its role is to pick up particles of dust, so it protects the middle ear, but not from sound.

The middle ear comprises the tympanic membrane (ear drum) and the ossicles.

    Leverage of the ossicles (the smallest bones in the body) and the fact that the tympanic membrane is larger than the oval window amplify the pressure of sound waves.
    The primary reason sound pressure waves need to be amplified is that the energy is being transferred from air (outer ear) to liquid (inner ear/cochlea). Air and water have different impedance, which means air is easier to move with air pressure waves than water.
    There are muscles connected to the ossicles that tighten up to reduce the amplification when we're exposed to loud sounds; this is called the stapedius reflex. But it's not that effective: the reflex is too slow to protect us from abrupt sounds like gunshots, and the muscles adapt after a while.
    The Eustachian tube connects the middle ear to the throat to equalize air pressure in the middle ear. It's usually closed; it opens when we swallow. The Eustachian tube can't do its job when we have a cold or an ear infection, and the pressure build-up causes pain.

The inner ear comprises the cochlea and vestibular organs.

Sound transduction happens in the cochlea
The cochlea is a twirled-up cone, comprising fluid-filled spaces (scala tympani and scala vestibuli) on either side of the cochlear partition (also fluid-filled), in which the Organ of Corti is found. Sound pressure waves vibrate the tympanic membrane; vibrations are amplified by the ossicles to vibrate the oval window of the cochlea, which sets the basilar membrane in motion. All of the basilar membrane moves in response to every stimulus, but different parts of the basilar membrane move more in response to different frequencies (place coding).
    Higher frequencies are represented closer to the base of the cochlea, on the skinny part of the basilar membrane.
    Lower frequencies are represented closer to the apex of the cochlea, on the fat part of the basilar membrane.
    In this way, the cochlea makes an "acoustic prism": just like an optical prism spreads out the different colors in white light into a rainbow, the cochlea spreads out all the different frequencies in a sound.
Inner and outer hair cells are found in the Organ of Corti
    Auditory neurons live in the Organ of Corti, which rides on the basilar membrane. The top of the Organ of Corti is the tectorial membrane. When the basilar membrane waves, the tectorial membrane shifts sideways, relative to the basilar membrane, and stimulates the hair cells. There are 3 rows of outer hair cells and one row of inner hair cells
      The inner hair cells are the sensory neurons. We have about 3500 of them, running the whole length of the basilar membrane. Each hair cell connects to about 10 auditory nerve endings!
        When their cilia are moved by the relative motion of the tectorial and basilar membranes, they release neurotransmitters.
        The actual motion of the cilia is quite tiny! If a hair cell were as tall as the Eiffel Tower, then the displacement of the cilia would only be 1 cm.
        They have no axons. Secondary afferent neurons with cell bodies in the spiral ganglion pick up neurotransmitters from inner hair cells and send signals off to the brainstem.
      The motile response (electromotility) of the outer hair cells amplifies the vibration of the basilar membrane and sharpens frequency tuning. Prestin is the name of the molecule that makes the cell move.
      • Outer hair cells are most susceptible to damage and their loss is often the root of hearing damage -- you lose amplification for quiet sounds, and you stop being able to "hear out" sounds ... separate frequencies.
      • This feedback loop actually makes the ear generate sounds! "Otoacoustic emissions" can be measured: these are sounds coming out of your auditory canal. They can be used as hearing screening in newborn babies: play a sound, and measure echo coming back out. If you don't measure them, maybe it's just a plugged up middle ear. Noticing hearing loss early is crucial for assisting language development (most common treatment is a cochlear implant, but sign language and hearing aids are also used).
Sound waves are encoded by a combination of place coding and time coding.

Neurons do not send action potentials fast enough to encode stimuli above ~300 Hz. Many important sounds are higher than that! A timing code solves this problem: different neurons respond to different cycles, but when they do fire, they fire at the same place in the cycle. Added together, the population as a whole represents the entire waveform.

For frequencies above 3-4 kHz, however, inaccuracies in the timing of action potential initiation mean that not even phase-locking in the population code can create a train of action potentials in sync with the basilar membrane vibration. So place coding is necessary for high frequencies. The brain "knows" that neurons near the base of the cochlea (the skinny part of the basilar membrane) respond best to high frequencies.

How do we combine these 2 cues? There is some evidence that we use them together when possible, using time coding for fine distinctions between frequencies.
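A minimal sketch of the timing-code idea (numbers are illustrative, not physiological data): four model neurons each fire on only every 4th cycle of a 1000 Hz tone, staying under the ~300 spikes/s ceiling, yet the pooled spike train still marks every cycle of the waveform.

    import numpy as np

    freq = 1000.0                                # tone frequency (Hz); period = 1 ms
    period = 1.0 / freq
    cycle_times = np.arange(40) * period         # time of each pressure peak (40 cycles)

    # Four phase-locked model neurons, each firing on every 4th cycle (250 spikes/s each)
    neurons = [cycle_times[i::4] for i in range(4)]

    pooled = np.sort(np.concatenate(neurons))
    print(np.unique(np.round(np.diff(pooled), 6)))   # [0.001]: together they mark every 1 ms cycle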

Topic 6 - Required Readings

Hearing loss is common
While total deafness is relatively uncommon, hearing loss is common and almost guaranteed (98% chance) as we age. There are two main categories of hearing loss: Sensorineural ("nerve deafness") and conductive.

Hearing aids can help with hearing losses that are mild (~30dB threshold elevation), moderate (~50dB), or severe (~70dB). More profound losses are not helped much by hearing aids (other than providing greater environmental awareness), and a cochlear implant may be more appropriate.

How do we detect hearing loss?

  • In adults, audiograms are measured by asking you to raise your right or left hand (or push a button) when you hear a tone through headphones in one ear or the other.
  • In people who can't respond (babies ...), auditory brainstem response can be measured with EEG (electrodes) or otoacoustic emissions can be measured (the ear generates sound as the cochlea responds to sounds).

What are the effects of hearing loss?

    The most obvious effect is that you can't hear quiet sounds. But hearing loss is more complicated than that! People with hearing loss have more difficulty understanding speech in noise (can't "hear out" the words). And the fact that different frequencies are affected differently and dynamic range is compressed ("loudness recruitment") means people with hearing loss have a hard time enjoying music.

Conductive hearing loss is less common but more likely to be fixable. It is caused by damage to mechanical structures of the outer ear (e.g., extraordinary wax build-up or over-production of skin; both of these are uncommon) or the middle ear (e.g., ruptured tympanic membrane, which produces 30-40dB loss; fluid due to infection; damaged, missing or ossified ossicles)
    Often reparable (e.g., healing of tympanic membrane that was ruptured by ear infection or sudden pressure change)
    • Kids get ear infections more often because their Eustachian tubes are not as angled as grown-ups'.
    Impact is relatively uniform across frequencies -- it's just like turning down the sound.
    Amplification by hearing aids can provide effective compensation.
    Can be distinguished from sensorineural hearing loss by measuring hearing through bone conduction: you put a speaker against the mastoid bone behind the ear, and if hearing through bone is normal, or there's an "air-bone gap" (hearing is better through bone than air), then at least part of the loss is conductive.
Sensorineural hearing loss is generally permanent (and unfortunately more common).
Sensorineural hearing loss is caused by damage to neurons (most commonly outer hair cells, but also inner hair cells and the auditory nerve).
    Sensorineural hearing loss is caused by exposure to loud noises!
    • Frequencies in the 1000-5000Hz range are most strongly affected by environmental damage.
    • A temporary threshold shift (TTS) often precedes hearing loss: after exposure to loud noise, we can experience ~40dB loss, but thresholds generally return to normal after about 48h. This kind of research is no longer done because it can lead to permanent hearing loss!
    High frequencies (above ~5kHz) are also affected by aging. Presbycusis is the name for age-related hearing loss.
    • It appears that men experience greater age-related hearing loss than women.
    • Perhaps it is because men are historically subjected to louder environments; we won't know until a new generation ages. In one study of men and women on Easter Island, where roads and jack hammers and other common auditory insults from the urban environment are absent, elderly men did not show greater hearing loss than elderly women. But this study is not broadly accepted or replicated.
    • Otoacoustic emissions are stronger in women than in men. So we know there are genetic differences, but we don't know exactly how these play out in day-to-day hearing.
    Loss of outer hair cells drives most sensorineural hearing loss. Ruining the stereocilia on your outer hair cells has 3 effects:
    • Loss of the motile response blunts the frequency tuning, so sounds get mushed together.
    • Loss of the amplification that OHCs provide at low sound levels results in an elevation of detection thresholds
    • Loudness recruitment happens. Quiet sounds are gone, but loud sounds are still annoying! So sounds are only tolerable in a small range.
      Research note: humans can't re-grow stereocilia on OHC, but birds and amphibians can! If we could figure out why ... sensorineural hearing loss would be fixable. But after 30 years of research, we don't have a handle on that yet. But we're still trying!
    These next 2 things are FYI, not testable material:
    Loss of the stria vascularis will also cause sensorineural hearing loss. The stria vascularis is a membrane that delineates the outer edge of the cochlea and produces the endolymph that fills the central partition of the cochlea where the Organ of Corti is found. The specific ionic concentration in the endolymph creates the potential difference that drives the receptor responses in hair cells. If you damage the stria vascularis, you have an ineffective cochlea!
    Retrocochlear sensorineural loss is hearing loss due to damage to structures between the cochlea and the brain. Examples are:
    • Acoustic neuroma: benign tumor that grows slowly but eventually presses against the auditory nerve. If you meet someone with unilateral (one-sided) hearing loss, an acoustic neuroma might be to blame!
    • Acoustic neuropathy: degeneration of the acoustic nerve.
Tinnitus
Tinnitus is phantom noise ("ringing") in the ears. When sensorineural loss means that the brain is missing sensory input ... the brain imagines it's hearing noise. This is just like a phantom limb. So tinnitus is generally a sign of sensorineural damage, although it is not necessarily accompanied by hearing loss.
Hidden hearing loss
Hidden hearing loss wasn't reported before 2009 (Kujawa et al, J. Neurosci 29:14077-14085), but it is rapidly garnering broad attention. Even when the hair cells are not damaged and behavioral thresholds recover completely after loud noise exposure, 50% of the synapses are gone! But hearing thresholds as measured clinically and reported in audiograms are still the same. The technical name for hidden hearing loss is cochlear synaptopathy. How can we measure it if it doesn't show up on audiograms or other clinical tests?
    Can we detect hidden hearing loss by studying people's ability to "hear out" sounds instead of detecting sounds? This is an active research question and we don't know yet.
    Perhaps cochlear synaptopathy due to noise exposure accelerates effects of aging, so it'll show up as precocious age-related hearing loss?
    Is tinnitus (in the presence of normal hearing) an indicator of synaptopathy?
    Theoretical (computational) simulations of hearing indicate that you can lose 90% of your synapses and only have a 5dB hearing loss. So maybe we have so much redundancy built into the system that we can afford to lose half of our synapses?
    On the other hand, the middle ear reflex is affected!
    • There is dramatic attenuation of the middle ear muscle reflex (stapedius reflex) when tinnitus is present (Wojtczak et al, 2017 eNeuro).
    • If tinnitus is an indicator of synaptopathy, then this might mean that the stapedius reflex can be used as a clinical marker for cochlear synaptopathy.
Prevention of hearing loss

EAR PLUGS!!!! Be nice to your future self and wear them at concerts and loud sporting events or when operating power tools. If you don't like the muffled sound, buy some fancy ones that attenuate all frequencies equally.

Ventilation tubes can be implanted in the tympanic membrane for children who get frequent ear infections. Tubes relieve pressure from infection to avoid larger tear in membrane.

Some medicines are ototoxic, so avoid overdosing on Aspirin, and be careful with your antibiotics.

Hearing Aids

Hearing aids amplify sound, so they rely on functioning hair cells. Hearing aids do not restore normal hearing when used to treat sensorineural loss, because the effects of neural damage (above) are so complicated. They are most effective for conductive loss. Modern hearing aids have, in addition to basic sound amplification, these features:

    Amplitude compression (mimicking the natural compressive response of the inner ear, which you lose when you lose OHC; this reduces Loudness Recruitment).
    Noise reduction -- a filtering algorithm that tries to take out background noise.
    • Noise reduction improves comfort, but doesn't improve speech intelligibility ... and the algorithm doesn't know which sound you're trying to pick out, and might reduce the wrong sound.
    • Current research is trying to use brain-steering: an EEG signal can detect which speech sound you're paying attention to, and this can be used to control amplification. Maybe someday.

Only about a quarter of the people who would benefit from hearing aids actually use them.

  • They're generally quite expensive ($2k-$3k) and insurance doesn't usually cover them.
  • Adoption rates are higher (30-40%) in countries like Norway where insurance covers hearing aids.
  • New FDA regulations are coming out that might create over-the-counter hearing aids and bring down the price!
  • Another thing that keeps adoption rates down is that they really don't help in all situations.

Cochlear Implants

Cochlear implant. Electrodes are threaded through the cochlea, with the goal of stimulating secondary afferent neurons with cell bodies in the spiral ganglion. Different locations of stimulation result in a sensation of noise bands with different frequencies. Criteria for receiving an implant are stringent because the implant destroys any residual hearing. Candidates for implants need: profound, bilateral sensorineural loss; intact auditory nerve.

    More than 500,000 have been implanted.
    Early on, hearing experts thought this would never work.
    For a long time, there was vigorous debate on the ethics of giving children these implants when there were no data on long-term quality of life. Now that we have more data on long-term quality of life, the debate is less vigorous.

Critical bands and masking
Every sound, even the most complex, can be represented as the sum of sine waves (pure tones). We have Jean Baptiste Fourier to thank for this insight, which shows up everywhere (graphic equalizer on a stereo, spectrograms in linguistics ... even the basilar membrane is doing a Fourier transform!). Where do we see this in the auditory system?
    On the basilar membrane: tones are represented by the location of greatest motion (place coding).
    In auditory nerve: axons near each other carry similar frequency information
    In auditory cortex: different frequencies activate different regions of the brain.
In psychophysical studies, a tone only masks, or elevates thresholds for, nearby tones. This is taken as evidence that frequencies that are widely separated are processed separately.
    The physical origin of this phenomenon is likely the motion of the basilar membrane. Support for this idea comes from the upward spread of masking -- interference is stronger in one direction than the other, and this matches asymmetries in how the basilar membrane moves.
Pathways to the brain
Sound is unique among the senses in the amount and sophistication of the processing that is done in the brainstem. Several different nuclei are involved, in this order:
    Cochlear nucleus: inputs are ipsilateral and unilateral (each cochlear nucleus receives input from one ear only, on the same side)
    Superior olivary nucleus: processes bilateral information (i.e., combines info from both ears)
    Inferior colliculus: has circuits that aid sound localization

From the brainstem, sound stimuli are relayed through the medial geniculate nucleus in the thalamus to the cortex.

Primary auditory cortex (A1) is located on Heschl’s gyrus, in superior temporal cortex. A1 contains tonotopic maps.
    Tonotopic means that neurons that respond to similar frequencies are close to each other in cortex. This is like orientation pinwheels, on a fine scale, and retinotopy, on a coarser scale.
    These maps can shift with experience (monkeys who get really good at discriminating sounds in a particular frequency band grow fatter cortical representations of that frequency). Experienced musicians also reportedly have elaborated A1 maps.
    We don't simply perceive the tones represented in A1. For example, we perceive the missing fundamental in a series of harmonic tones.
      A1 is necessary for pitch perception (but not duration information)
      A1 is not sufficient for pitch perception, as damage in inferior temporal cortex affects our ability to identify tones.
    A1 is surrounded by a region of cortex called the Belt area.
      Neurons in the belt area respond to the combinations of frequency and details of timing that define more complex characteristics of sound.
Cortical representations beyond A1 and the belt: What and Where pathways
Sound identity is represented in a more ventral extended network; sound location is represented in a more dorsal extended network. We know this from lesion studies as well as more modern neuroimaging studies.

Topic 7 - Required Readings

Sound localization is hard
Unlike vision, where light from different places hits the retina at different places, sounds from everywhere arrive at the tympanic membrane together. We use two binaural cues (timing and level differences) and one type of monaural cue to deduce spatial information from the mess of sound that arrives at the tympanic membrane. The coordinate system that we use to describe the auditory environment is:
    azimuth: angle to the right or left of straight ahead, moving in an arc parallel to the ground. (The median plane is a vertical plane cutting through your head, perpendicular to "straight ahead".)
    elevation: angle above or below the horizontal plane
    distance: the obvious thing!
We name 3 spatial cues that support sound localization
Interaural time difference (ITD) is a binaural cue. If a sound is off to the side, it arrives at one ear before the other.
    Neurons in the superior olivary complex, the first place that signals from the two ears meet, are sensitive to ITD
    For the mathematically inclined: If the extra distance to the far ear is 23 cm (0.23 m), and sound is traveling at 340 m/s, this timing difference is at most 0.23/340 = 0.000676 seconds, or 0.676 milliseconds, or 676 microseconds. That's tiny! (See the sketch after this list.)
    The smallest difference the ears can notice is 10 microseconds!
    Neuroscience background:
    • Jeffress came up with the "coincidence counter" theory in 1948: as your brainstem processes sounds, there are a bunch of different delay stages, with coincidence counters at each stage. Each delay stage has a different ITD. The delay stage with the most hits names the ITD.
    • In 1988, the coincidence counter circuit predicted by Jeffress was discovered in the barn owl! Because these delay stages and coincidence counters are spread out in space, they produce a spatial map! (Figure: spatial map of ITD responses in the barn owl.)
    • However, in guinea pigs, we can't find these maps. The neurons that have a preferred timing have preferred timings that are way bigger than the actual ITDs a rodent might experience. But by putting the peak response off to the side, you put the strongest slope in the most important range. The result is no map, but good sensitivity to right vs. left. Harper & McAlpine (2004). (Figure: ITD responses in gerbils.)
    • We don't know how this code is built in humans. But when we look for it, we'll look in the Superior Olivary Nucleus and the Inferior Colliculus.
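For the ITD arithmetic in the list above, here's a minimal sketch in Python (the 0.23 m extra path and 340 m/s sound speed are the values assumed in that bullet, not universal constants):

```python
# Minimal sketch of the maximum interaural time difference (ITD) arithmetic.
# Assumes the simple "extra path length" picture from the notes above.
SPEED_OF_SOUND = 340.0   # m/s, approximate speed of sound in air
EXTRA_PATH = 0.23        # m, assumed extra distance sound travels to the far ear

max_itd_s = EXTRA_PATH / SPEED_OF_SOUND
print(f"Maximum ITD ~ {max_itd_s * 1e6:.0f} microseconds")   # ~676 us
```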
Interaural level difference (ILD) is a binaural cue for high-frequency sounds only
High frequency sounds have short wavelengths, so the head casts an acoustic shadow and sounds are quieter in the ear away from the sound

Below about 1000Hz, there is no ILD because the head is small compared to the wavelength of the air pressure perturbation ... the sound sweeps on by without really noticing the head.

    For the mathematically inclined: speed = wavelength x frequency, so 340 m/s = wavelength x 1500 Hz ... and the wavelength of a 1500 Hz tone is 340/1500 ≈ 0.23 meters, which is about the size of your head. (See the sketch after this list.)
    ILDs can be as big as 20dB for some frequencies; they depend both on frequency and on the direction that sounds are coming from
    ILDs are more useful at higher frequencies; ITDs stop being useful at about 1500Hz.
    Lord Rayleigh (1842-1919) was the first to state the Duplex Theory, which observes that we use ITD at low frequency and ILD at high frequency (although it gets a bit more complicated for complex tones)
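A quick numerical sketch of the wavelength-vs-head-size argument above (plain Python; the 340 m/s figure is the value used in the notes):

```python
# Wavelength = speed / frequency. When the wavelength is comparable to or
# larger than the head (~0.23 m), the head casts little acoustic shadow and
# the ILD is weak; at high frequencies the wavelength is much smaller.
SPEED_OF_SOUND = 340.0   # m/s

for freq_hz in (250, 500, 1000, 1500, 4000, 8000):
    wavelength_m = SPEED_OF_SOUND / freq_hz
    print(f"{freq_hz:5d} Hz -> wavelength {wavelength_m:.3f} m")
```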
Head-related transfer function (HRTF) is a monaural cue
Each of us has a customized HRTF because of the shape of our pinnae: different frequencies are attenuated differently at different elevations. Thus, the HRTF provides information about elevation. It's a complicated function, and one that we've learned after years of living in our heads. A study by Hofman et al. (1998) changed the shape of subjects' pinnae and discovered:
    1. Subjects lost the ability to determine the elevation of sounds when they first received some fake pinnae
    2. Subjects learned a new HRTF after a few weeks
    3. Subjects still had their old HRTF (were immediately accurate determining elevation) when the artificial pinnae were removed.
    Pretty cool! So do dogs, and other animals that move their ears, maintain multiple HRTFs and know which one to use depending on their ear position?
There's a thing called a cone of confusion, which is the cone-shaped region pointing out from the side of your head in which ITD and ILD are the same for all locations. Elevation information provided by the HRTF is needed to break this up.
Indoor spaces
Indirect sounds are the sounds that bounce off of something before they get to your ears. Many sounds bounce many times. Reverberation time tells us, in part, how big our space is, but also what kind of material it's made out of. Reverberation time is how long it takes a sound to die down to a small fraction of its original level (60 dB lower than the original level, "RT60"). (A sketch of the arithmetic follows this list.)
  • In stadiums and classrooms, you want a short reverberation time (1-2s)
  • In concert halls, you want longer reverberation times (3-4s) to blend music and make notes last. Sometimes a hall is built and then retrofitted: the Royal Festival Hall in London was retrofit with a set of speakers that actively create a longer reverberation time.
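As an illustration of the RT60 idea, here is a toy sketch (my own simplification, not anything from class): it just assumes the reverberant energy decays exponentially with some time constant tau.

```python
import math

# If reverberant power decays as P(t) = P0 * exp(-t / tau), then a 60 dB drop
# (a factor of 10**6 in power) takes t = tau * ln(10**6). The tau values below
# are arbitrary, chosen only to show how RT60 scales with the decay rate.
def rt60_from_tau(tau_seconds: float) -> float:
    return tau_seconds * math.log(1e6)

for tau in (0.1, 0.2, 0.3):
    print(f"tau = {tau:.1f} s  ->  RT60 = {rt60_from_tau(tau):.1f} s")
```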
Distance perception
One cue involves reverberation time: the direct-to-reverberant ratio. Another cue is expectation. If a whisper is loud, then it must be close; if a yell is quiet, then it must be far away.
Using primitive and schema-based strategies to separate out sounds

The cocktail party problem is a classic problem -- when you're in a crowded room and trying to listen to just one person's voice, how do you do it? Past scholarship:

  • Al Bregman wrote the book on it in 1990. "MELODY" "DISTRACTORS" example.
  • Gestalt principles laid out by early psychologists (centered on vision) guide our thinking: similarity, proximity, good continuation, common fate.

Primitive strategies are bottom-up strategies that are based on physical features that drive sensory responses (pitch, timing, timbre ...). We listened to many examples of this in class ... our perception of interleaved melodies is predicted by the fact that we group tones that are close in frequency or close in time, and these two factors compete against each other. Xylophone music demonstrated stream segregation by pitch or timbre.
Schema-based segregation strategies are top-down, employing prior knowledge (like the Mary Had a Little Lamb demo -- it's easier to pick out a melody if you know it or you know it's there).
Speech production
Producing speech sounds is incredibly hard -- precise timing is required throughout the vocal tract. Key contributors to this dance are:
  • Tongue, lips, teeth, throat, epiglottis (a flap of tissue that keeps food/water from going down your "windpipe", the trachea, to your lungs): shaping sounds and starting and stopping them.
  • Larynx (voice box) and vocal cords control pitch and vibrate to produce sound. The glottis is the section of the larynx where the vocal cords are located.
    • Throat singers form a second resonant cavity in their pharynx
Time-frequency analysis shows up a lot. For example, in analyzing EEG data, it helps us characterize when different patterns (alpha rhythms, gamma oscillations) show up.
    For speech, when we visualize the evolving frequency composition of sounds, it is called a spectrogram. For music, this is the graphic equalizer on a stereo.
On spectrograms (time-frequency plots) of speech, there are bands of power at different frequencies. These are formants. As the speaker changes the sound she is making, the power bands swoop up and down. These are transitions.
Categorical perception
Phonemes are single sounds, often shorter than a syllable. If you change a phoneme in a syllable or a word, you change the meaning of the word.
Mature listeners perceive phonemes categorically. For example, as you increase the voice onset time in the stimulus, there’s an abrupt transition from one category (“da”) to another (“ta”), even though the stimulus is being varied continuously. This is called categorical perception. (A toy sketch follows this list.)
    It's not useful to distinguish between sounds that don't have distinct meaning, so we've learned to lump things together and draw (arbitrary) boundaries between phonemes.
    Different languages draw boundaries at different locations (one example being Korean, which recognizes a third phoneme between /ba/ and /pa/ that English speakers don't hear).
    Our phonetic boundaries (categorical perception) are drawn at about 6 months of age. That's when we start losing the ability to hear phonemes from other languages.
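A toy sketch of the categorical-perception idea above (the 30 ms boundary and the steepness are made-up numbers, used only to show a continuously varying stimulus producing an abrupt change in the perceived category):

```python
import math

# Perceived category as a function of voice onset time (VOT). The stimulus
# varies smoothly, but the labeling function is nearly a step: categorical.
def prob_hear_ta(vot_ms: float, boundary_ms: float = 30.0, steepness: float = 1.5) -> float:
    return 1.0 / (1.0 + math.exp(-steepness * (vot_ms - boundary_ms)))

for vot in range(0, 61, 10):
    p = prob_hear_ta(vot)
    print(f"VOT = {vot:2d} ms -> p('ta') = {p:.2f}  heard as /{'ta' if p > 0.5 else 'da'}/")
```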
Speech is hard to understand because of ...
    Difficulty in segmenting phonemes -- where does one stop and another start? Anyone who has learned a second language knows how hard it is to know where one word stops and another starts! People who study speech call this the segmentation problem.
    Variability due to co-articulation -- phonemes look/sound slightly different in different contexts. In a spectrogram, phonemes are characterized by formants (bands of constant sound during vowels) and transitions (onsets and offsets of formants, usually associated with consonants). A single phoneme will look very different on a spectrogram, depending on what syllable, word or phrase it is part of.
    Variation in speaker styles -- we all speak at different speeds, slur things together, etc. Human listeners employ a lot of social and contextual cues (e.g., visual cues) to figure out what people are saying. Computers do not have access to this information.
    The McGurk effect shows how we use visual cues (cue combination: auditory plus visual cues) to ignore the variability and figure out which phoneme is which.
Aphasia is a loss of language skills due to cortical damage.
There are 3 primary kinds of aphasia:
    Damage to Broca’s area (expressive aphasia) results in labored (telegraphic) speech but spares word content
  • Interestingly, even though Broca's area is known for speech production, you also use it when you are reading. It's like you're reading to yourself ...
    Damage to Wernicke’s area (receptive aphasia) results in fluent but contentless speech (word salad)
    Damage to the white matter connecting language areas results in a more difficult-to-diagnose “conduction aphasia”
Light is a small portion of the electromagnetic spectrum
Electromagnetic radiation is sinusoidal electric and magnetic fields that are oriented at right angles (orthogonal) to each other, oscillate in phase, and can propagate through a vacuum. Like sound, all electromagnetic radiation travels at the same speed (300 million meters per second in a vacuum), and speed = wavelength x frequency. The portion of the electromagnetic spectrum that we call light is the portion that has wavelengths between 400 and 700 nanometers.
    We define light as the electromagnetic rays that interact with the photoreceptors in our eyes.
    A honeybee would have a different definition of light, because it can see ultraviolet rays (200-400nm), where we just think of UV as a sunburn hazard.
Light is quantized. Even though light propagates as a continuous dance between electric and magnetic fields, the energy is quantized - a photon is the smallest amount of light that can be generated or transmitted
    A blue photon has more energy (shorter wavelength, higher frequency) than a red photon
    A photon in the gamma ray segment of the spectrum conveys much more energy than a photon in the visible portion ... which is much more energetic than a radio wave (which has a wavelength of tens or hundreds of meters)
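A small numerical sketch of the photon-energy comparison above, using E = h·c / wavelength (the specific wavelengths are just representative picks, not values from the notes):

```python
# Photon energy E = h * c / wavelength: shorter wavelength -> more energy.
PLANCK_H = 6.626e-34        # J*s
SPEED_OF_LIGHT = 3.0e8      # m/s

def photon_energy_joules(wavelength_m: float) -> float:
    return PLANCK_H * SPEED_OF_LIGHT / wavelength_m

examples = [("blue light (450 nm)", 450e-9),
            ("red light (650 nm)", 650e-9),
            ("radio wave (10 m)", 10.0)]
for name, wavelength in examples:
    print(f"{name:>20}: {photon_energy_joules(wavelength):.2e} J per photon")
```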
Structure of the eye (bold indicates testable terms)
    Cornea - transparent but alive curved structure at the front of the eye, with embedded nerve endings (pain, touch and thermal sensation). It has 80% of the focusing power of the eye but is not flexible.
    Sclera: white, hard outside of the eyeball
    Aqueous humor: low viscosity fluid behind the cornea, in front of the lens
    Iris: muscular, colored tissue that surrounds (and shapes) the pupil
    Pupil: hole through which light enters the eye
    Lens: flexible, clear substance that provides 20% of the focusing power of the eye
      Only 20% of focusing power of the eyes, but important because it is flexible and gives us the ability to accommodate
      Accommodation: the ability to focus on things that are near or far away
      Presbyopia: "old eyes". The lens gets hard and can't be squished by the ciliary muscles to focus on things that are near. Eventually, your arms won't be long enough to hold reading material far enough away for your eyes to focus on it.
      Cataracts: crystallization of the lens scatters light, making it hard to perceive detail. Solution: replace the lens. (We'll talk about that more during the low vision lecture in 2 weeks). Yellowing of the lens also creates changes in color perception.
    Ciliary (or lens) muscles: muscles that control shape (curvature) of the lens
    Vitreous humor: high viscosity fluid filling the eyeball
    Retina: sheet of neurons at the back of the eye
      Blind spot: location on the retina where there are no photoreceptors (because axons heading for the optic nerve occupy that space). Your brain fills in the hole with a copy of what's around the blind spot.
      Fovea: where light from the center of gaze lands on the retina
    Pigment epithelium: black layer behind the retina where visual pigments are replenished
      Nocturnal animals have a shiny tapetum lucidum instead of an absorbent pigment epithelium
      The reflectivity of the tapetum lucidum gives photons a 2nd chance at getting detected, but decreases acuity because the reflected light is more scattered.
Near-sighted vs. far-sighted eyes
Because the cornea has 80% of focusing power of the eye, but is not flexible, it determines whether you are near-sighted or far-sighted.
    If the cornea focuses too fast or the eyeball is too long, you're near-sighted (myopic)
    If the cornea does not focus strongly enough, light from near objects focuses behind the retina and is blurry: far-sighted
    LASIK and LASEK can reshape the cornea to eliminate the need for corrective lenses for near-sightedness, far-sightedness, astigmatism and other issues
Basic neural architecture of the retina
The retina has 5 main cell types in 3 main layers. From back to front, these are:
    Input layer: photoreceptors (rods and cones). Roughly 126 million per eye.
    Processing layer: horizontal cells, amacrine cells & bipolar cells
    Output layer: retinal ganglion cells. Roughly 1 million per eye.

The outer segments of the rods and cones are mixed together to tessellate the back half of the eyeball and catch light from everywhere, with two exceptions:

    In the fovea, there are only cones.
    In the blind spot there are no photoreceptors, because all of the axons are leaving the eye.
Light transduction happens in the outer segments of the rods and cones (collectively called photoreceptors). This means that light travels through several layers (ganglion cells, bipolar and amacrine cells) before it does anything! (Figure: diagram showing the molecular structure of retinal as it goes through the cycle of catching a photon and being replenished in the pigment epithelium. By Krishnavedala - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=39504470)
    The outer segments of the rods and cones have stacks of discs (membrane pancakes) that house photosensitive proteins
      The disks are there to increase the membrane area
      The photosensitive protein in a rod is rhodopsin, a 7 transmembrane protein, with retinal attached to one of the transmembrane domains
      Cones have other opsins, with slightly different structures that make them sensitive to different wavelengths of light (red, green, blue)
      The molecule that actually absorbs energy from a photon is retinal, which is derived from Vitamin A (one half of a beta-carotene molecule).
        When light (of the right wavelength/color) hits 11-cis retinal, it changes conformation (to all-trans), acting as a switch to start an enzyme cascade in the cell, which eventually changes the rate at which photoreceptors release neurotransmitters
        After absorbing light, all-trans retinal needs to be turned back into 11-cis retinal. So it breaks off its opsin, finds its way to the pigment epithelium, gets bent back into shape, and finds its way back to an opsin. This is why photoreceptor outer segments need to be close to the pigment epithelium.
Foveal vs. peripheral vision: why eccentricity matters
The fovea is the central 1 degree of the visual field and contains only cones.
    We have high acuity (ability to see fine detail) in the fovea because there is little convergence: basically one photoreceptor for every output ganglion cell.
    Acuity is also improved by high light levels (smaller pupil, ideally 2-5mm), longer exposure times, and appropriate focusing of the image by cornea and lens
The macula is the central 5 degrees of visual angle.
The peripheral retina is dominated by rods, but also contains cones
    Rods are more sensitive to light than cones
    We have low acuity in the periphery because there is strong convergence: many rods map to a single output ganglion cell.
    Convergence (each ganglion cell pools responses from multiple photoreceptors) increases sensitivity but decreases acuity. There is lots of convergence for rods; much less convergence for cones.
Eye movements are often reflexive or automatic
It's actually impossible to hold your eyes perfectly still. When we study a picture, we move our eyes around it -- on average every few hundred milliseconds. Saccades are these fast motions of the eye, basically jumping between fixations. The reason we make saccades is to put our fovea on interesting parts of the image, so we can see it better.
Dark adaptation and photobleaching
Photobleaching occurs when the pigment epithelium cannot regenerate 11-cis retinal as fast as it is converted to all-trans retinal.
    During daylight viewing, rods are essentially completely photobleached, and cones are partially photobleached.
    It takes about 7 min. for cone visual pigments to replenish; it takes 20 - 30 min. for rod visual pigments to replenish
      Therefore, a normal observer's threshold decreases with time (sensitivity increases) in 2 phases (sketched below)
      At first, the cones are more sensitive than the rods, so sensitivity improves for a few minutes as cone visual pigments are replenished then plateaus
      After 7 or 8 minutes, the rod photopigments have replenished enough that the rods become the most sensitive cells in the retina, and sensitivity continues to improve for another 13-22 minutes (20-30 total) while the rod visual pigment finishes replenishing.
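Here's a purely illustrative sketch of the two-phase time course described above. The exponential forms, amplitudes, and time constants are all my own assumptions, tuned only so the cone branch plateaus after a few minutes and the rod branch takes over around 7-8 minutes:

```python
import math

# Toy model: each system's log threshold recovers exponentially toward its own
# floor, and the observer's threshold follows whichever system is more sensitive.
def cone_log_threshold(t_min: float) -> float:
    return 3.0 + 2.0 * math.exp(-t_min / 2.0)    # plateaus near 3 log units

def rod_log_threshold(t_min: float) -> float:
    return 6.0 * math.exp(-t_min / 11.0)         # slower, but ends up more sensitive

for t in range(0, 31, 5):
    cones, rods = cone_log_threshold(t), rod_log_threshold(t)
    limiting = "cones" if cones < rods else "rods"
    print(f"{t:2d} min in the dark: log threshold {min(cones, rods):.2f} ({limiting} limited)")
```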

Cool story about "Cats can see in the dark".
Center-surround receptive fields
The definition of a visual receptive field is the region of visual space in which a change in lightness or color will cause a change in the neuron's firing rate. Almost all receptive fields have structure -- different changes in different parts of the receptive field will have different effects on the neuron's response. Lateral inhibition in the middle processing layers of the retina results in center-surround antagonism in ganglion cell receptive fields. There are four basic types (ignoring color for now):
    Transient response, excitatory center/inhibitory surround
    Transient response, inhibitory center/excitatory surround
    Sustained response, excitatory center/inhibitory surround
    Sustained response, inhibitory center/excitatory surround
    A key function of this receptive field structure: neurons only respond to edges. When center and surround are balanced, the RGC will not change its firing rate in response to uniform illumination.
    A simple circuit with lateral inhibition explains Mach bands (the illusory appearance of light and dark lines flanking an abrupt change in brightness, like a shadow) and the gray dots that appear at the intersections of white lines in the Hermann grid illusion. (See the sketch below.)
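As an illustration of the lateral-inhibition point above, here's a toy 1D center-surround sketch (my own simplification, not a circuit from class): applied to a luminance step, it responds only at the edge, with an undershoot and an overshoot (Mach bands) and zero response to the uniform regions.

```python
# A luminance step: dark region followed by a bright region.
luminance = [1.0] * 10 + [2.0] * 10

def center_surround(signal, i):
    center = signal[i]
    surround = (signal[i - 1] + signal[i + 1]) / 2.0   # neighbors inhibit the center
    return center - surround

responses = [round(center_surround(luminance, i), 2) for i in range(1, len(luminance) - 1)]
print(responses)   # zeros everywhere except -0.5 and +0.5 right at the edge
```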
Exam 2 has 40 questions (36 multiple-choice, 2 spring, 2 short-answer), and covers hearing and the eyeball.
Causes and prevalence of vision loss
Worldwide: 2% of the population (124 million) have low vision; 0.6% are blind
    Leading cause of vision loss is cataracts
    Also on the list: glaucoma, uveitis (infection), corneal opacity or malformation (more severe than simple near-sightedness or far-sightedness), macular degeneration, trachoma (bacterial infection)
Currently, around 5.7 million people in America have low vision (acuity < 20/40), and the number is expected to rise to 9.6 million by the year 2050 (Chan, Friedman, Bradley, & Massof, 2018). Although medical treatments are improving, the number of visually impaired people is still rising. One reason is that our population is getting older, and the leading causes of low vision in America are age-related eye diseases, such as age-related macular degeneration. Another factor is the overall growth of the American population: even if the prevalence rate of visual impairment stays fixed across time, we expect more visually impaired people in the future.
Definitions of low vision and blindness
In the US: 1.3 million people are legally blind. Another estimated 9 million have low vision.
    Low vision: visual acuity in best eye worse than 20/60 even with correction
    Legally blind: worse than 20/200 (or only 20 deg. visual field)
    Check out the low vision simulations at NEI, See Now, or download an app like Tengobajavision, AiraVisionSim or ViaOptaSim.
Prevention

Glaucoma: detect pressure build-up due to blockage of flow of aqueous liquid before pressure damages optic nerve head

Macular degeneration: there are two types, wet and dry. Dry is less severe and slowly progressing. For the wet type, early detection permits injection of anti-vascular (anti-VEGF) drugs to keep elaboration of blood vessels from blocking vision

Diabetes: control of diet/blood sugar can stall peripheral degeneration and diabetic retinopathy

Trachoma: antibiotics and hygiene prevent blindness

Treatment and/or accommodation via sensory substitution
Goals (in US) focus on social experience (incl. communication) and navigation
    Surgery for optical issues: replace cornea (rare) or lens (cataract surgery)
    Gene therapy is promising for some forms of macular degeneration (mouse example for Stargardt's disease) or congenital blindness (dog example for Leber's congenital amaurosis)
    Artificial vision: sensory substitution takes many forms. Specifically mentioned in class:
      Guide dogs and canes
      BrainPort -- a device that indicates object locations with tactile stimulus on the tongue
      vOICE -- a device that converts images into "soundscapes" (e.g., horse)

Additional information provided by Walter Wu, graduate student in Psychology:

Nowadays, people with visual impairment no longer rely only on optical magnifiers to read. A number of assistive technologies have been developed to help with many daily tasks. One example is Microsoft Seeing AI (https://www.microsoft.com/en-us/ai/seeing-ai). The app can be applied in different contexts, such as reading, scene recognition, and social interaction. It can transcribe printed text into audio by using the phone camera. People can also use it to recognize people’s faces, currency, and some common scenes. It’s an application of current developments in computer vision.

People with visual impairment can also navigate with apps like BlindSquare (https://www.blindsquare.com/) and Microsoft Soundscape (https://www.microsoft.com/en-us/research/product/soundscape/). These apps cannot replace white canes, guide dogs, or orientation training for visually impaired people, but they can provide more accurate information about the surrounding environment, such as how far away a bus stop or a restaurant is. Soundscape can also present the direction of nearby targets to either your left ear or right ear, which is called Audio AR.

Another approach is the service of remote assistants, such as Aira (https://aira.io/). Users who subscribe to the service can connect with a trained assistant on their phones. The agent can see through a user’s phone camera and offer requested assistance, such as reading mail or bills, or navigation. Many public spaces in the US have adopted this service. For example, people with visual impairment can access this service in the MSP airport and many Target stores without further charges.

There are many other accessibility features built into our phones, tablets, and computers. Commonly used features include color inversion, zoom, and text-to-speech. VoiceOver is a popular accessibility function built into Apple products, such as iPhones, iPads and Macs. It can help users read texts on their devices. Windows users can use other software like JAWS. These accessibility features can help not only people with visual impairment but also normally sighted people. For example, people can use VoiceOver to listen to an e-book, and many people use the color inversion function to reduce visual fatigue caused by a white, bright display.

Visual prosthetics
Retinal and cortical implants have existed for 30 years, and progress is being made on both fronts. Both have a resolution problem: to provide even low vision, we would need to stimulate neurons with a spatial precision of 100 microns, and right now, the precision is measured in millimeters. So the current goal is to restore useful visual experiences to people who have none.
    Both approaches share the problems of biocompatibility and stability: the salt-water environment of the body is hostile to electronics, and scar tissue degrades performance of devices over time.
    Retinal implants need to contend with the fact that the eye moves rapidly and the retinal sheet is delicate.
    Cortical implants have more neural territory to work with, but permanent brain implants bring risks of infection, inflammation, and other complications you don't want in your brain.
The most exciting recent progress has been made by Second Sight's retinal implant.
Magnocellular and parvocellular pathways
There are two visual nuclei in the thalamus: the lateral geniculate nucleus (LGN) and the pulvinar. The pulvinar is huge, but we never talk about it. Neurons in the LGN have center/surround receptive fields like the retinal ganglion cells. The LGN has 6 layers, segregating inputs/outputs according to:
  • Eye of origin: which eye the information is coming from
  • On- or off-receptive field: whether the center is excitatory or inhibitory
  • Magnocellular or parvocellular pathway (more details below). The names come from the fact that retinal ganglion cells have large (magno) or small (parvo) cell bodies.

Although the picture is more complicated, it is useful to think of 2 streams of information coming from the eyes to the brain.

    The magnocellular system carries information about large, fast things (low spatial frequency information; high temporal frequency information) and is colorblind
    The parvocellular system carries information about small, slow, colorful things (high spatial frequency information; low temporal frequency information).

A koniocellular system exists too (very small cell bodies); there are koniocellular layers between magno and parvo layers in LGN ... but those details are beyond the scope of this class.

Primary visual cortex (V1) is located in the calcarine sulcus, has retinotopic organization and cortical magnification
The calcarine sulcus is in occipital cortex, on the medial aspect. If you dissect away the white matter, the gray matter can be laid out like a sheet (flat). When you do this to V1, it is roughly U-shaped, and we use this visualization a lot. This is perhaps easiest to understand when you see it on an inflated human brain. (Figure: inflated brain with V1 highlighted.)
Retinotopic organization means that neurons with receptive fields close together in visual space have cell bodies close together in cortex. Here's an interactive visualization of how the visual world is mapped to V1.
V1 also has cortical magnification. The high density of cones in the fovea means that many more V1 neurons are needed to represent the fovea, compared to the periphery. So, just like the somatosensory homunculus has big lips and fingers, the map of the visual world is distorted in V1, with foveal information blown up.
Orientation pinwheels, blobs and ocular dominance columns = hypercolumns!
V1 neurons have more complicated receptive fields than LGN neurons. Most V1 neurons respond well to short bars (likely created by summing up a line of LGN inputs). Different neurons like bars at different orientations, but almost every neuron has a preferred orientation.
    Like retina and LGN, the receptive field center can be excitatory or inhibitory.
Columnar organization is an important property of primary sensory and motor cortices. The gray matter has 2 important directions: across the surface and through the depth.
    If you sample neural responses as you move across the surface, they change. For V1, this means that, as you move across cortex, you find neurons with different orientation preferences.
    If you sample neural responses as you move through the depth, they stay the same (approximately). For V1, this means that, as you move down through cortex, you find neurons with the same orientation preferences.

In the through-depth direction, V1 has layers. All of cortex has layers -- 6 of them, defined by the types of cells that live at different depths. In primary sensory areas, the input layers are fat; in motor areas, output layers are fat.

    Input layers are in the middle (Layer 4). V1 is named striate cortex because, when the tissue is stained, the dense input from the LGN shows up as a dark band in the input layers.
    Local processing and connections to other parts of the brain are in the superficial layers (Layers 2 & 3).
    Feedback to the thalamus comes from deep layers (5 & 6). In motor cortex, it's layers 5 & 6 that send motor control signals to the body (via the thalamus).
    Layer 1 has very few neurons and lots of axons and dendrites, so lots of cortical connections get made up there.

For each region of space, there is a complete set of neurons to represent every feature V1 needs to encode.
    Left and right eye inputs are segregated into ocular dominance columns. This segregation is strongest in the input layer (4), so when we look for ODCs, we look in the middle of cortex.
    A cluster of orientation columns is called a pinwheel; there's an orientation pinwheel for each eye.
    Some V1 neurons are color-blind; some are color-selective. Color-selective neurons live in blobs (regions of cortex that stain dark when you stain for cytochrome oxidase, because they're metabolically rich); there's a blob for each pinwheel.
Hypercolumns are the same size everywhere on cortex, which means that hypercolumns in the fovea represent smaller regions of space, i.e., receptive fields are smaller here. This is a consequence of an increase in the number of projections from the fovea to cortex, but a uniform density of input connections across cortex (cortical magnification).
Uses of color
This is of course not an exhaustive list: Scene segmentation; object detection; object recognition; mate selection; threat detection; aesthetic enjoyment
Competing theories for color vision: trichromatic vs. color opponency
Proposed during mid-19th century ... and both correct!
    Trichromatic theory: any color can be matched with a combination of 3 primary colors (and it's not important exactly which three colors, but 2 is not enough and 4 is too many). (See the sketch below.)
      Remember the distinction between additive and subtractive color mixing.
    Opponent process theory: we see colors as opponent pairs: red vs. green & blue vs. yellow.

Reconciliation lies in the retina: we have 3 kinds of photoreceptor pigments, but the circuitry of the retina combines them so ganglion cells respond along a red/green axis or along a blue/yellow axis
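A toy linear-algebra sketch of the trichromatic idea above (the sensitivity numbers are invented, not real cone fundamentals): because there are exactly three cone types, the cone responses to a test light can be reproduced by solving for the intensities of three independent primaries; two primaries would give only two knobs for three equations.

```python
import numpy as np

# Each column: the (L, M, S) cone responses produced by one unit of a primary.
primaries = np.array([
    [0.90, 0.30, 0.02],   # L-cone response to red, green, blue primaries
    [0.40, 0.80, 0.10],   # M-cone response
    [0.02, 0.10, 0.90],   # S-cone response
])

target = np.array([0.60, 0.50, 0.30])   # (L, M, S) response to some test light

weights = np.linalg.solve(primaries, target)   # intensities of the 3 primaries
print("primary intensities:", weights)
print("reconstructed cone responses:", primaries @ weights)   # matches the target
```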

Color vision deficiencies
True color blindness (the lack of color sensation) is rare, found in rod monochromats (people with only rods) and as a result of some brain injuries (color-sensitive blobs are most sensitive to oxygen deprivation).

Color deficiency is more common, and results from the lack of one of the cone pigments

    Protanopia: no L (long wavelength, or red) pigment. Hard to distinguish between red and green; reds look particularly dark. A few percent of the male population; a very small fraction of a percent of the female population (since the genes for the cone pigments are on the X chromosome).
    Deuteranopia: no M (medium wavelength, or green) pigment. Hard to distinguish between red and green. A few percent of the male population; tiny fraction of a percent of the female population.
    Tritanopia: no S (short wavelength) pigment. Difficult to distinguish yellows, greens and blues. Very rare.

Check out color vision simulations at colorblindness.com or colourblindawareness.org.

Simultaneous contrast
Simultaneous contrast is the following phenomenon: a gray patch looks lighter when it's next to a darker patch. This demonstrates how fluid our perception of lightness really is.
    Neural responses representing luminance boundaries are more credible than neural responses representing uniform patches (because adaptation makes it impossible to make absolute lightness judgments, and the center/surround receptive field structure provides weak responses to uniform fields).
    Therefore, the lateral inhibition experienced at boundaries propagates across uniform textures.
Color (and luminance) constancy
There are (at least) three things that contribute to our ability to experience an object as being the same color in spite of dramatic changes in the spectral content of the illumination (and therefore the spectral content of the light being reflected to our eyes):
    Chromatic adaptation: when the light source contains a disproportionate amount of light in one section of the spectrum (e.g., the strong red content in tungsten light compared to sunlight), the responses of the relevant photoreceptors are suppressed, shifting our perception away from the dominant wavelengths.
    Local context: we (consciously and unconsciously) compare all the colors in a scene to normalize our perception. Even though the true color of a patch of material might be brown (true = viewed under sunlight on a white background), it will look red if it's next to something green, or green if it's next to something red.
      Simultaneous contrast (described above) occurs for color as well as luminance and is a low-level mechanism by which local context comparisons can be made.
      High-level visual processes can also be used to make this local context comparison
    Prior knowledge: some objects, like an orange or a banana or a coke can, are identified by their color and will often be perceived as the "true" color in spite of illumination differences.
      An exception to this rule is the illumination at night under sodium vapor lamps, which have very narrow spectral content, so we see the world in sepia tones instead of colors that approximate the true colors.
      When viewing a scene, we perceptually assign the label "white" to the lightest things in the scene (things that reflect > 70% of the light), and "black" to the blackest (things that reflect < 10% of the light).

The Dress, which inspired much discussion and scientific analysis, represents a failure of constancy.

Oculomotor depth cues
One cue we use to understand how things are moving through 3D space is proprioceptive information: for example, when our eyes track an object coming toward us, we can feel our eyes moving together. Oculomotor cues are proprioceptive information from oculomotor muscles (which rotate the eyeballs to converge at a particular depth) and ciliary muscles (which compress the lens to change focal length).
Monocular depth cues are surprisingly strong
If you've ever taken a painting class, you learn about many different techniques for creating perspective and making things look three-dimensional. In Sensation and Perception, we call these monocular depth cues. They are visual depth cues that can be perceived without stereo vision, i.e., with just one eye:
    Occlusion: stuff gets in front of stuff
    Relative height: things on the ground at a distance look like their base is higher
    Relative size: things farther away are smaller
    Perspective convergence: parallel lines look like they get closer as they get farther away
    Familiar size: if we know how big something is, it will look farther away when it seems too small, and close to us when it seems too big.
    Atmospheric perspective: far away things look hazy (this cue can be misinterpreted when hiking on a clear day!)
    Texture gradient: similar to relative size, textures look finer as they recede
    Shadows: Kersten's ball-in-a-box demo illustrates that shadows are a powerful cue for relative height and depth -- as long as they're consistent with our "light from above" assumption.
    Movement parallax: as we move, things at different depths cross over each other. YouTube video: http://www.youtube.com/watch?v=Jd3-eiid-Uw
    Accretion/deletion: when one thing moves in front of another, the amount of the thing in back that you can see gets deleted; when the front thing moves out of the way, there's accretion of the back object.
Binocular cues rely on stereo vision
To understand binocular depth cues, you need to understand disparity. Disparity refers to the fact that the two images of an object on your two retinas are in different relative locations if the object is not sitting at the depth where your eyes are focused and pointed. Vergence refers to where your eyes are pointed. Accommodation refers to the focus of your eyes. Usually these match: your eyes are verged and focused on the thing you're looking at, and that point defines the horopter:
    The horopter is an imaginary circle drawn through the thing the eyes are converged on and back to the eyes, which traces out the location of all other objects in the 3D visual field that will land on the retinae with zero disparity.
    Things closer to you than the horopter have negative (crossed) disparity. Things beyond the horopter have positive (uncrossed) disparity.
    In primary visual cortex, disparity neurons are tuned to the relative location of images of the same object on each retina. Some neurons are tuned to near; some to far.
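A small geometry sketch of the disparity sign convention above (the 6.5 cm interocular distance and the viewing distances are assumed numbers): disparity here is the difference between the vergence angle for the fixation point and for the object, so nearer-than-fixation comes out negative (crossed) and farther comes out positive (uncrossed).

```python
import math

EYE_SEPARATION = 0.065   # m, assumed interocular distance

def vergence_angle_rad(distance_m: float) -> float:
    # Angle between the two lines of sight when both eyes point at the object.
    return 2.0 * math.atan((EYE_SEPARATION / 2.0) / distance_m)

def disparity_deg(object_m: float, fixation_m: float) -> float:
    return math.degrees(vergence_angle_rad(fixation_m) - vergence_angle_rad(object_m))

fixation = 1.0   # fixating something 1 m away (a point on the horopter)
for obj in (0.8, 1.0, 1.5):
    print(f"object at {obj:.1f} m -> disparity {disparity_deg(obj, fixation):+.2f} deg")
```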
Strabismus and amblyopia affect brain development
    Strabismus is the misalignment of the two eyes. Sometimes this is due to problems with oculomotor muscles, which can be cured by training or surgery.
    Anisometropia is a condition in which the two eyes have very different refractive powers (e.g., far-sighted in one eye, near-sighted in the other).
    Amblyopia is the developmental disorder characterized by reduced spatial vision in an otherwise healthy eye, even with proper correction. Amblyopia can be a result of strabismus or anisometropia
      The brain discards the information from the non-dominant eye, creating monocular vision instead of binocular.
      Usually this is not discovered in children until 3 or 4 years of age. If treated young, stereo vision can develop normally.
      Perceptual deficits can be treated even in adults.
    If you grow up with amblyopia (due to strabismus or something else), you won't develop stereo vision.
The correspondence problem is the problem of matching information between eyes.
Once you've picked a vergence and accommodation, you still have to create a 3D world using 2D retinal images. It's not clear how the brain decides whether or not features in each retinal image belong together and are offset (creating stereo disparity). Autostereogram (Magic Eye) pictures take advantage of your brain's skill and flexibility in figuring this out. If you focus (and point) your eyes in front of or behind the image, patterns in the image are designed to have the correct disparity cues for describing a 3D world.
If you can't solve the correspondence problem, you get binocular rivalry
3D movies and stereo displays use disparity to create 3D images ... for most people
Polarized glasses (at theaters) or red/blue (anaglyph) glasses allow us to present separate images to each eye. However, about 5% of us cannot see the 3D effect because we don't have stereo vision.
Emmert's law demonstrates how perceived size is determined by a combination of retinal size and viewing distance
If you look at a bright object on a dark background (or vice versa), then close your eyes, you see a ghostly after-image. If you then look at a screen close to you, it looks like a relatively small ghost. But if you look at a screen far away, it looks like a big ghost.
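A quick numerical sketch of Emmert's law as described above (the 5-degree afterimage size and the screen distances are assumed numbers): the retinal (angular) size of the afterimage is fixed, so the size it appears to have scales with the distance of the surface you view it against.

```python
import math

RETINAL_ANGLE_DEG = 5.0   # assumed angular size of the afterimage

for screen_distance_m in (0.5, 2.0, 10.0):
    apparent_width_m = 2 * screen_distance_m * math.tan(math.radians(RETINAL_ANGLE_DEG) / 2)
    print(f"surface at {screen_distance_m:4.1f} m -> afterimage looks ~{apparent_width_m:.2f} m across")
```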
It's actually really hard to guess how far away a visual object is
As with many other visual problems, our visual system usually solves this one easily. However, a small object close to the eye and a large object far from the eye can produce images of the same size at the same place on the retina.
Our brains have learned that retinal size is not an indication of actual object size, so we're always reaching for other cues to figure out how big something is.
    The Ames room and Ponzo illusion demonstrate situations in which size perception breaks down because misleading depth cues are strong.
Think back to the rubber hand illusion, the McGurk effect (bah/dah/gah) and other examples of cue combination. Retinal size is one cue; all the different depth cues are also there. Visual networks in the brain have to combine these cues to come up with a best guess.
Babies are born legally blind but orient to faces almost immediately
Visual acuity is awful at birth -- 20/400 vision (legally blind). Acuity is reasonable at 3 months, but still improving. The primary cause of this is that humans are born with underdeveloped neurons:
    Retinal photoreceptors have big inner segments and small outer segments, so visual information is undersampled.
    Some sources say color vision is normal; some say it isn't.
    Cortical neurons have relatively sparse connections. The first 6 months of development witnesses a great elaboration of connections (note neural density is roughly the same, but connections -- axons & dendrites -- are much richer).

Infants show an immediate preference for faces, in spite of horrid visual acuity. But really, any lightbulb with eyes looks like a face, and infants appear to cue off of large-scale cues like hairline.

Many aspects of vision continue to change for the first several years of life. The contrast sensitivity function (detection threshold as a function of spatial frequency) makes good progress toward "normal" (peak sensitivity: 4 cycles/degree) during the first several years of life, but does not reach adult form until ~10 years old. Hyperacuity (the ability to perform better on a Vernier acuity task than predicted by normal visual acuity) develops after ~10 years. This skill requires extrapolation of straight lines to achieve hyperresolution.
The visual system is basically functional by 1 year of age
Depth perception begins at 3 mo. (evidence from vergence -- eyes track objects as they get closer, and stereo vision); infants understand that closer things are larger ~ 7 mo.

Object perception: infants understand occlusion and Gestalt principle of common fate (stuff that moves together belongs together) at ~3 mo.

An aside: how do we know how the visual brain is organized?

In chronological order ...
    Lesion studies: Tony Movshon gives a great talk about how high muzzle-velocity rifles have been a boon to neuroscience because they cause localized, survivable wounds, and we can see what functions survive or don't survive when different brain regions are damaged. Strokes have also provided a great deal of information about localization of function.
    Animal models: Primary visual cortex was first described in the cat by Hubel and Wiesel in the late 1950s. Since then, much has been learned about the organization of the human visual system by studying the organization of the cat and monkey visual systems.
    Neuroimaging methods (see below)

Philosophical question: visually responsive areas vs. visual areas vs. visual maps

    The most stringent test for whether a region is a visual area is whether it contains a regular map of some kind of stimulus attribute or visual feature, e.g. in V1, different orientations are represented in different places on cortex.
    A looser test for "visual area" is whether the region responds better to visual stimuli than other stimuli (like sounds or words). This test is used for many higher visual areas, in the parietal and temporal cortex.
    Many areas of the brain respond to visual stimuli (e.g. a Halle Berry cell has been discovered in somebody's entorhinal cortex), but a region that responds to words and ideas as strongly as it responds to visual stimuli should not be called a visual area.

Neuroimaging methods: non-invasive study of brain activity lets us discover new visual areas and maps every year

Neuroimaging methods are useful, but must always be interpreted in the context of behavior: having a picture of a brain does not make you right!
PET: positron emission tomography. Requires the use of small amounts of radioactive tracers, and each data point takes a long time to acquire, but this technique gives us good information about metabolic activity, or specific neurotransmitters (e.g., maps of dopamine concentrations in the brain).
EEG: electroencephalography. This measures the electric fields in the scalp that are generated by clusters of neurons that are strongly stimulated. The technique suffers from poor spatial resolution, but can detect millisecond timing differences.
MEG: magnetoencephalography. This measures the magnetic fields (perpendicular to the electric fields) that are generated by clusters of active neurons. MEG has slightly better spatial resolution (both techniques have millisecond temporal resolution), but is more difficult and much more expensive than EEG.
fMRI: functional magnetic resonance imaging. MRI is sensitive to the spatial details of the magnetic field in your head, which is related to the local concentration of deoxyhemoglobin.
A representative sampling of specialized brain regions
While we memorize a few regions as examples of specialized function, it is more important to remember that every visual stimulus is processed by every visual region. Neurons selective to particular aspects of an image will be most responsive, but faces aren't just represented in FFA; biological motion stimulates many, many brain regions. Visual experience is the sum of all of these responses, and these regions provide simultaneous, multi-dimensional representations of our complex visual world.
FFA: fusiform face area; neurons here like faces!
MT: middle temporal area; neurons here like coherent motion (in little patches)
MST: medial superior temporal area; neurons here put together coherent motion from around the scene to detect optic flow
STS: superior temporal sulcus; neurons here respond well to biological motion
VWFA: visual word form area; neurons here encode our understanding of written language
Similar to auditory stimuli, more dorsal regions of the posterior half of the brain (parietal cortex) are involved in processing information about location ("Where/how pathway" ); more ventral (temporal) regions are involved in recognizing objects ("what" pathway).
    Magnocellular information (colorblind neurons responding to large, fast things) tends to serve up the "where/how" pathway.
    Parvocellular information (color-sensitive neurons responding to small, slow things) tends to head down the "what" pathway.

Again ... a car zooming by is represented in the dorsal and ventral visual streams: we need information about both identity and location. And this information needs to be integrated. There is a massive white matter bundle (fasciculus) connecting dorsal and ventral regions. The paper by Jason Yeatman et al. about the history of the vertical occipital fasciculus is great.

Visual perception is often ambiguous
Our perception of a scene is an interpretation of the retinal images. The exact mechanisms that our visual brains use to represent and wrestle with uncertainty are open questions. Here are some visual experiences that illustrate the problem of ambiguity:
    Figure/ground segmentation. Low-level features (lines, edges, textures) provide clues about what belongs with what, but high-level interpretation (shape, scene layout) is also needed to separate foreground and background.
      Dalmatian dog.
      Vase/Face: when equal evidence exists for multiple interpretations, we experience spontaneous switching between perceptual states. Perhaps this is because feedback has not amplified a "winning" response in V1.
    Aperture problem. When we only see part of an image (through an aperture), the direction of motion of a line segment is ambiguous -- we can perceive it as moving either horizontally or vertically.
    Shading and light
      Light from above: when a scene is ambiguous, our perception of shape relies on the assumption that light is coming from above and casting shadows downward.
      Shape from shading: our perceptions of lighting, lightness, and shape are interrelated.
Gestalt principles are useful heuristics
These are not absolute rules, but common sense principles explaining why we see what we see, developed during the 1st half of the 20th century. By contrast to structuralism (whole = sum of parts), Gestalt principles describe how the whole (our perception, or interpretation of an image) can be greater than the sum of the parts.
    Prägnanz (simplicity or good form). The simplest shapes, and shapes that match their neighbors, are usually the right explanation for an image.
    Similarity. Things that look alike probably come from the same source.
    Good continuation. Contours rarely change abruptly; curves are smooth, acute angles are rare.
    Proximity. Things that are close together belong together.
    Common fate. Things that move together belong together.
    Meaningfulness or familiarity. We cluster features into familiar patterns (e.g., faces made out of rocks or branches in an image).

Another way of thinking of Gestalt principles is: rules we follow to resolve ambiguity in our environments.

The Bayesian brain
Thomas Bayes (1702-1761) articulated this understanding of how inference is done: p(A|B) x p(B) = p(B|A) x p(A)
  • p(A|B) is the probability of A, given B (given that it is snowing outside, what's the likelihood that it is cold outside?)
  • p(A) and p(B) are just the probabilities of event or observation A or B
  • For scene perception, the easiest way to write this is: p(S|I) = p(I|S)*p(S)/p(I). I is an image, S is a scene. So what we're trying to calculate is the posterior likelihood that we're seeing some scene, given some image that hit our eyeballs.
    • To calculate this, we need a generative model, p(I|S) ... the probability that a given scene might generate a given image
    • We also need to know p(S): the probability of a particular scene (which we develop from prior experience)
    • Since the probability of the image we're experiencing is the same, regardless of our scene interpretation, we generally just drop that out of the equation and say p(S|I) is proportional to p(I|S)*p(S)
Many visual illusions and perceptual experiences can be explained using Bayes' theorem. Examples from class:

Here's a guide to describing our empathy for the Boston Dynamics BigDog robot in terms of Bayesian inference (a worked numerical sketch follows the list):

  • We feel bad when the robot slips on ice because we're reacting to it like it's alive.
  • We're perceiving it as alive because our Bayesian brains computed a high posterior likelihood that the robot is living.
  • We calculated the high posterior likelihood that the robot is alive by multiplying two probabilities together: a generative model and a prior.
  • The prior, p(alive), is low. In all of our past experiences with animals, we've never seen one like this. The knees are especially weird.
    • The prior model is probably implemented in the brain in the form of firing rates from neurons in inferior temporal cortex, which have learned to recognize specific animals and objects.
  • However, the generative model tells us that this is moving just like a living thing moves!
    • In words, the generative model is "what is the probability that the motion I'm seeing is generated by a living thing".
    • In equations, the generative model is p(motion | alive), or what is the probability that I would be receiving these motion sensations from a living thing.
    • The generative model is probably implemented in the brain in the form of firing rates from neurons in STS (superior temporal sulcus, which is tuned to biological motion).
  • Therefore, even though that thing doesn't look like any animal you've ever seen, i.e., p(alive) is small, it's moving just like an animal would move, i.e., p(motion | alive) is large, so p(alive | motion) is relatively large and you react to it like it could be alive.
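To make the arithmetic concrete, here is the same argument with made-up numbers (the class doesn't assign specific values); only the relative sizes matter.

```python
# BigDog example with hypothetical probabilities, for illustration only.
p_alive           = 0.05   # prior: it doesn't look like any animal we've ever seen
p_not_alive       = 0.95
p_motion_if_alive = 0.90   # generative model: living things move exactly like this
p_motion_if_not   = 0.05   # things we know to be machines rarely move like this

# Bayes' theorem: p(alive | motion) = p(motion | alive) * p(alive) / p(motion),
# where p(motion) comes from the law of total probability.
p_motion = p_motion_if_alive * p_alive + p_motion_if_not * p_not_alive
p_alive_given_motion = p_motion_if_alive * p_alive / p_motion

print(round(p_alive_given_motion, 2))   # ~0.49: much larger than the 0.05 prior
```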
    Motion is central but ambiguous
    We've already seen several examples of the importance of motion in scene segmentation: motion parallax as a depth cue; motion for breaking camouflage; motion for grabbing attention (change blindness doesn't happen if there are motion cues). Yet, in spite of its centrality, motion is not immune to the problem of ambiguity:
      We've already talked about the aperture problem: direction of motion of straight lines is ambiguous if you can't see ends or corners.
      We've already talked about optic flow: our own motion through the world creates coordinated image motion that we use to maintain balance. But sometimes coordinated motion in the world around us is real, e.g., a train going past, so we can confuse ego motion with object motion.
      A major source of image motion is eye motion: we make a saccade every few hundred milliseconds. How does our brain handle that? Here are a few options (not testable knowledge):
      • Saccadic masking: our brains blank out all the wild motion created by saccades.
      • Corollary discharge -- a copy (the corollary) of the motor signal sent to the eyes to move them is also sent to a circuit that compares eye movement to image movement. This comparator performs an exclusive-or (XOR) comparison between the corollary discharge and the retinal image motion (sketched in code after this list):
          If the eyes move but the image does not move ... then there is real image motion (eyes are tracking moving object)
          If the image moves but the eyes do not move ... then there is real image motion (eyes are holding still and something's moving by)
          If the eyes & image are moving ... then there is no motion (eyes are creating motion).
          If neither eyes nor image move ... no need to perceive motion.
          ... In reality, eye motion isn't the only thing that causes image motion. Translation or rotation of the head will also generate motion. Therefore, a realistic comparator trying to figure out whether image motion is due to motion in the world outside our head also needs to take into account optic flow information and vestibular information.
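Here is a minimal sketch of the comparator's truth table above; the boolean inputs are a simplification (real signals are continuous and, as noted, would also need vestibular and optic-flow information).

```python
# Sketch of the corollary-discharge comparator: perceive real-world motion only
# when the eye-movement copy and the retinal image motion disagree (XOR).

def perceive_world_motion(eye_command_copy: bool, retinal_image_motion: bool) -> bool:
    """Return True when exactly one of the two signals is present."""
    return eye_command_copy != retinal_image_motion

print(perceive_world_motion(True,  False))   # eyes track a moving object -> motion
print(perceive_world_motion(False, True))    # something moves past still eyes -> motion
print(perceive_world_motion(True,  True))    # a saccade sweeps the image -> no motion
print(perceive_world_motion(False, False))   # nothing moving -> no motion
```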
    Visual area MT responds to local motion. MST likes optic flow (motion that's coordinated across the entire scene and has a heading, i.e., a focus of expansion).
      Typical laboratory stimuli vary what fraction of the dots move in the same direction (motion coherence). Neurons in MT love to play this game (see the sketch below).
      Optic flow is coherent motion + heading. That's MST's specialty.
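A sketch of the dot-coherence stimulus described above (parameter values are illustrative, not from any particular experiment):

```python
# Random-dot motion-coherence stimulus: a `coherence` fraction of dots share one
# direction of motion; the rest move in random directions at the same speed.
import numpy as np

def dot_velocities(n_dots=200, coherence=0.3, direction_deg=0.0, speed=1.0, rng=None):
    """Return an (n_dots, 2) array of dot velocity vectors."""
    rng = np.random.default_rng() if rng is None else rng
    n_coherent = int(round(coherence * n_dots))
    angles = np.full(n_dots, np.deg2rad(direction_deg))                    # coherent dots
    angles[n_coherent:] = rng.uniform(0, 2 * np.pi, n_dots - n_coherent)   # noise dots
    return speed * np.column_stack([np.cos(angles), np.sin(angles)])

v = dot_velocities(coherence=0.5)   # 50% of the dots drift rightward together
print(v.shape)                      # (200, 2)
```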
    Without attention, your eyes are drawn to salient locations, and you can get scene gist
    Bottom-up salience is defined by contrast along dimensions that V1 cares about: color, orientation, brightness, direction of motion ... (toy sketch below)
    Gist is a coarse description of the scene: indoor/outdoor, people/animals present, basic geometry
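A toy, one-dimensional sketch of "salience as local feature contrast" (the array of orientations is made up):

```python
# Toy salience map: each item's salience is its orientation contrast with its
# neighbors. The odd-one-out (80 degrees among ~10-degree items) wins.
import numpy as np

orientations = np.array([10, 12, 9, 11, 80, 10, 12, 11], dtype=float)  # degrees
neighbor_mean = (np.roll(orientations, 1) + np.roll(orientations, -1)) / 2
salience = np.abs(orientations - neighbor_mean)   # local feature contrast
print(int(salience.argmax()))                     # 4: the 80-degree item pops out
```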
    Attention can be deployed several ways and is required for conjunction search tasks
    Not everything just jumps out at you.
    • Basic features, like color, orientation, direction of motion serve as natural segmentation cues
    • More complex tasks, like "find the T among the L's", require attention (a sketch of why follows this list)
    • Parietal cortex appears to be necessary for deploying attention in these tasks.
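A sketch of why a conjunction target doesn't pop out (display contents are made up): no single feature map isolates it, so the color-orientation conjunction has to be checked with attention.

```python
# Hypothetical search display: the target is unique only as a conjunction.
items = (
    [("red", "vertical")] * 10 +       # distractors sharing the target's color
    [("green", "horizontal")] * 10 +   # distractors sharing the target's orientation
    [("red", "horizontal")]            # the conjunction target
)

red_map        = [i for i in items if i[0] == "red"]
horizontal_map = [i for i in items if i[1] == "horizontal"]
conjunction    = [i for i in items if i == ("red", "horizontal")]

print(len(red_map), len(horizontal_map), len(conjunction))   # 11 11 1
# Neither feature map is unique on its own, so finding the target requires
# binding color and orientation, i.e., attention.
```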
    Loads of psychophysical and electrophysiological experiments demonstrate three strategies:
    • Feature-based attention: you can look for a particular feature ... anywhere
    • Spatial attention: you can look in a particular place ... for anything
    • Object-based attention: you can deploy attention along an object, even if it's partially occluded
    We fail to perceive so many things that go unattended
    Classic change-blindness demos (flicker comparisons of static images, or real-world swaps of the person you're talking to) show how much of a scene we do not perceive. If you don't have some kind of bottom-up salience cue (motion, color contrast ...), you need some top-down attention allocation if you're going to be aware of something.
    Synesthesia is a mixing of the senses
    About 5% of the population experiences some kind of synesthesia. Grapheme-color synesthesia is the most commonly described (letters and numbers have colors). It can come in handy for pop-out search (it turns a conjunction search task into a simple, bottom-up search task) or for memorizing long numbers.
    The relationship between perception and action is best described by a diagram with an arrow pointing both ways
    Perception selects targets for action and helps us correct errors as we execute actions. Broadly speaking, there are two kinds of action: navigation (moving around our environment) and reaching/grabbing.
      Ian Waterman's case study dramatically illustrates the role of proprioception in action.

    Acting on unconscious sensory information is possible

      There are a few dramatic examples of action without perception, e.g., patient DF, who could mail a letter through a slot but couldn't report the slot's orientation, or patient TN, who was completely blind but could navigate cluttered hallways. TN's case makes it obvious that visual information reaches other regions of the brain (e.g., parietal cortex) even when V1 is not working.
      The fact that healthy controls accurately grasp the center object affected by a tilt illusion illustrates that all of us maintain a "raw copy" of sensory information, separate from the version that we're consciously aware of (which has been shaped by inference).
    Brain regions involved in the integration of perception and action
    MST for optic flow
    Parahippocampal place area for navigation (both in recognizing places, and in recognizing landmarks)
    Mirror neurons in premotor cortex (frontal lobe) respond to an action whether the brain-owner or someone else is doing the action
    Parietal cortex: some neurons in parietal cortex respond to the visual aspects of a given action, while others respond to motor commands
    Sometimes it is best to ignore perception
    In the curveball illusion, accidental feature binding (feature blurring) in peripheral vision gives the batter the wrong impression of the ball's trajectory if they look away from the ball. Good hitters ignore that information.
    Of the exam questions, 24 cover material from the last 5 weeks of class, and 12 are cumulative, connecting material from the last 5 weeks to the first 10. Two questions are "sprint" questions, and two are short-answer questions (4 points).
    Created: 2017. License: CC-BY 4.0