Term
| 4 things used to describe "rewards" (functional definitions) |
|
Definition
Generates approach/consummatory behaviors (reinforces an action)
Produces learning of this behavior
May engage positive emotions
Represents the positive outcomes of economic decisions |
|
|
Term
| In general, what do we use rewards for? |
|
Definition
| Use rewards to make predictions about the environment to engage in goal-directed behavior |
|
|
Term
| Examples of primary rewards... |
|
Definition
Biologically ingrained into us; needed for survival
Water, oxygen, food, sex, temperature |
|
|
Term
| Examples of secondary (social) rewards... |
|
Definition
Derive value from primary rewards; help us feel good
Shelter, praise, fame, power |
|
|
Term
| What is the behavior of schadenfreude? |
|
Definition
Used here to mean spite/vengefulness (strictly, schadenfreude is taking pleasure in another's misfortune)
When we engage in this behavior, we are willing to accept negative consequences (inflict some form of pain on ourselves) so that someone else feels more pain |
|
|
Term
| What does valence refer to when describing rewards? |
|
Definition
The positive or negative value associated with the reward
Indicates that rewards are subjective: something a person likes has a positive valence; something a person dislikes has a negative valence |
|
|
Term
| What is the main assumption underlying Decision Theory? |
|
Definition
| Main assumption is that the choice a person makes in a situation should be the option they perceive as having the most value (what they value the most); assumes that people will choose rationally |
|
|
Term
| What are the important decision variables? |
|
Definition
Subjective Value of Reward (UTILITY)
Weighted Probability of Reward (LIKELIHOOD)
Temporal Discounting (want more now) |
|
|
Term
| In terms of decision making and decision theory, where is there typically a discrepancy? |
|
Definition
| Often see discrepancy between what people should choose (NORMATIVE representation) and what they actually choose (DESCRIPTIVE representation) |
|
|
Term
| Normative vs. Descriptive Representation |
|
Definition
Normative = what people should choose
Descriptive = what people actually choose |
|
|
Term
| What is "temporal discounting"? |
|
Definition
| We tend to value rewards MORE the closer they are to the present (want more now); e.g. most people would rather take $500 now than $700 in 5 years |
|
|
Term
| Expected Value (EV) |
Definition
EV = reward x probability
According to decision theory, you should always choose the option with the HIGHEST expected value (the value of the reward times how likely it is to occur) |
|
|
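A minimal sketch of the EV rule on the card above. The two options reuse the 500k vs. 1M figures that appear elsewhere in this set; the function and variable names are made up for illustration.

```python
def expected_value(reward, probability):
    """EV = reward x probability, as defined on the card."""
    return reward * probability


# Illustrative options: a sure 500k vs. an 80% chance of 1M.
options = {
    "sure_thing": expected_value(500_000, 1.0),
    "gamble": expected_value(1_000_000, 0.8),
}

# Decision theory's normative choice: the option with the highest EV.
best = max(options, key=options.get)
print(options, "-> normative choice:", best)
```

Note that this is the normative answer only; the cards on gains vs. losses describe why actual (descriptive) choices often differ.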
Term
| What is true of people when it comes to GAINS vs. LOSSES? |
|
Definition
People are risk AVERSE when it comes to GAINS
People are risk SEEKING when it comes to LOSSES. |
|
|
Term
| Why are people risk averse when it comes to gains? |
|
Definition
| In the money example from class, the most important variable is the magnitude of the reward (subjective value/utility); we value the first 500k more than the second 500k (principle of diminishing returns as the reward magnitude increases) |
|
|
Term
| Example of diminishing returns... |
|
Definition
Given a 100% chance of winning 500k vs. an 80% chance of winning 1M, most people stick with the 100% option because they are risk averse, even though the gamble has the higher expected value (800k vs. 500k). We value the first 500k more than the second 500k, so the subjective value of the 100% option feels higher |
|
|
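One way to see the diminishing-returns argument numerically. The logarithmic utility curve below is an assumption picked only because it is concave; the notes do not specify a formula, and other concave curves may give different numbers.

```python
import math


def utility(amount):
    # ASSUMPTION: log utility stands in for "diminishing returns";
    # it is not taken from the course material.
    return math.log1p(amount)


def expected_utility(amount, probability):
    return probability * utility(amount)


# Subjective worth of the first vs. the second 500k.
first_500k = utility(500_000) - utility(0)
second_500k = utility(1_000_000) - utility(500_000)

# Subjective value of each option from the card.
sure = expected_utility(500_000, 1.0)
gamble = expected_utility(1_000_000, 0.8)

print(first_500k > second_500k)  # the first 500k is worth more
print(sure > gamble)             # the sure thing wins on utility
```

Under this assumed curve the sure 500k has the higher subjective value even though the gamble has the higher expected value in dollars.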
Term
| Why are people risk "seeking" when it comes to losses? |
|
Definition
| Because losses "hurt" more than gains (notice the steeper slope on the loss side of the graph of subjective value vs. gains/losses), we will gamble to avoid a sure loss |
|
|
Term
| Compare subjective utility for loss of 500k compared to a gain of 500k |
|
Definition
| The change in subjective utility is MUCH larger for a loss of 500k than for a gain of 500k because of the steeper slope on the loss side of the graph; this causes us to engage in risk "seeking" behavior when it comes to losses |
|
|
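A rough sketch of the gain/loss asymmetry. The curve shape and the two parameters (`alpha = 0.88`, `loss_weight = 2.25`) are commonly cited prospect-theory estimates and are assumptions here, not values from the class graph.

```python
def subjective_value(x, alpha=0.88, loss_weight=2.25):
    # ASSUMPTION: prospect-theory-style value function; the parameters
    # are illustrative and not taken from these notes.
    if x >= 0:
        return x ** alpha
    return -loss_weight * ((-x) ** alpha)


gain = subjective_value(500_000)
loss = subjective_value(-500_000)
print(abs(loss) / gain)  # how much more the loss "hurts" than the gain

# Risk seeking for losses: a sure loss of 500k vs. a 50% chance of losing 1M.
sure_loss = subjective_value(-500_000)
gamble_loss = 0.5 * subjective_value(-1_000_000)
print(gamble_loss > sure_loss)  # the gamble feels less bad
```

Both options lose 500k on average, yet with this assumed curve the gamble is subjectively less painful, which matches the risk-seeking pattern described on the cards.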
Term
| Explain S-shaped curve around probability and how it leads us to act given its subjective weighting... |
|
Definition
We tend to overweight improbable events: low-probability events are perceived as more likely than they are (convex curve); POTENTIAL minded
We tend to underweight high-probability events (concave curve); SECURITY minded |
|
|
Term
| What does the shape of the subjective weighting of probability curve describe our behavior to be? |
|
Definition
| S-shaped curve describes us to be "cautiously hopeful/optimistic" |
|
|
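A sketch of subjective probability weighting. The one-parameter weighting function and `gamma = 0.61` are borrowed from the prospect-theory literature as an assumption; the class curve may have been drawn from a different formula.

```python
def weighted_probability(p, gamma=0.61):
    # ASSUMPTION: Tversky-Kahneman-style weighting function, used only
    # to illustrate over- and underweighting; gamma is illustrative.
    numerator = p ** gamma
    denominator = (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)
    return numerator / denominator


for p in (0.01, 0.10, 0.50, 0.90, 0.99):
    print(p, round(weighted_probability(p), 3))

low = weighted_probability(0.01)   # rare event, overweighted
high = weighted_probability(0.90)  # likely event, underweighted
```

With these assumed settings, a 1% event is treated as noticeably more likely than 1% (hopeful) and a 90% event as less than 90% (cautious), which is the "cautiously hopeful" pattern on the card.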
Term
| Given similar rewards, what do humans prefer? What is this called? |
|
Definition
Given similar rewards, humans prefer those that arrive sooner (we tend to overvalue rewards that occur closer to the present)
This is known as "temporal discounting" |
|
|
Term
| How do perceived values of rewards diminish as they occur farther into the future? |
|
Definition
| Value diminishes by a hyperbolic function (value of rewards discounted hyperbolically) |
|
|
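A small sketch of hyperbolic discounting using the common form V = A / (1 + kD). The formula's parameter `k` and both of its values below are assumptions chosen for illustration; the dollar amounts reuse the $500 now vs. $700 in 5 years example from this set.

```python
def discounted_value(amount, delay_years, k=0.2):
    # ASSUMPTION: standard hyperbolic form V = A / (1 + k * D);
    # k (discount rate per year) is a made-up illustrative value.
    return amount / (1.0 + k * delay_years)


now = discounted_value(500, 0)      # 500.0
later = discounted_value(700, 5)    # 700 / 2.0 = 350.0
print(now, later)

# A shallower curve (smaller k) models better delay of gratification.
patient_later = discounted_value(700, 5, k=0.05)  # 700 / 1.25 = 560.0
print(patient_later)
```

With the steeper curve the immediate $500 wins; with the shallower curve the delayed $700 is still worth more than $500, so waiting becomes the preferred choice.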
Term
| How does our ability to delay gratification change with time? |
|
Definition
Rewards occurring farther in the future become relatively less discounted (the discounting function becomes less steep)
We learn to temporally discount rewards less over time |
|
|
Term
| Game Theory (& two examples of where it can be applied) |
|
Definition
Looks at decision making in social interactions, i.e. situations where a person's success depends on their own actions AND those of others
Examples - ultimatum game & trust game |
|
|
Term
| Evidence DA is central NT in reward systems... |
|
Definition
1. Effects of DBS
2. Drugs of abuse act on DA pathways - all increase DA release
3. Lesions to DA pathways cause lack of goal-motivated behaviors & motivation
4. Recording of DA neurons - activity scales with reward magnitude |
|
|
Term
| What happens to the baseline reading of a DA neuron (in regards to DBS) when a DA agonist vs. antagonist is added? |
|
Definition
With AGONIST - curve shifts to the left and stays the same size; because baseline DA activity is increased, a lower stimulation frequency is needed to reach the same response rate
With ANTAGONIST - curve shifts to the right and shrinks; it shifts right because tonic DA activity is decreased and shrinks because fewer DA receptors are available |
|
|
Term
| What happens to DBS if it is not tied to an action or event (i.e. cannot be controlled or predicted)? |
|
Definition
Becomes unpleasurable or even aversive; the DA release needs to be tied to something (an event/stimulus) so we can LEARN to predict/control it
We need control over our environment to enjoy things, so an event must be tied to the stimulation for us to predict its occurrence |
|
|
Term
| In what two scenarios does DA release occur? |
|
Definition
When something pleasurable occurs
When a salient stimulus is presented (something surprising) |
|
|
Term
| What happens to animals who have lesions to DA neurons? |
|
Definition
| Seem to lack motivation and cannot engage in goal directed behaviors |
|
|
Term
| What is true of the activity of DA neurons? |
|
Definition
| They are all TONICALLY ACTIVE; all DA neurons fire at some baseline rate (constant release of DA across synapse) |
|
|
Term
| Do DA neurons discriminate between the type of reward received? |
|
Definition
| No - see phasic burst after reward presentation regardless of what type of reward was received |
|
|
Term
| What does the amount of DA firing vary with? |
|
Definition
| Varies with the PREDICTED REWARD (varies with how reward presented matches up with what you thought you were going to get) |
|
|
Term
| If there is more vs. less reward presented than is expected what happens to DA neuron firing? |
|
Definition
If reward > expected, get increase in DA neuron firing after reward is presented
If reward is < expected or omitted, get cessation or decrease in DA neuron firing |
|
|
Term
| What does DA neuron firing scale with? (what does it vary according to?) |
|
Definition
| DA neuron firing scales with the MAGNITUDE OF THE REWARD (when compared to its expected value); if you get more than you thought you would, get more DA firing, for example |
|
|
Term
| What is the benefit of pairing a stimulus with a reward? |
|
Definition
| They are paired for the benefit of PREDICTION - want to be able to use the stimulus to predict the occurrence of the reward |
|
|
Term
| What 3 factors govern learning according to Learning Theory? |
|
Definition
Contiguity - need sequential presentation of the stimulus followed by the reward
Contingency - need the (conditioned) stimulus to accurately predict the reward; learning is contingent on this prediction
RPE - to learn, we need a "discrepancy" between the predicted reinforcer and the actual reinforcer (reward) received |
|
|
Term
| When referring to Learning Theory, what does "contiguity" refer to? |
|
Definition
Refers to the requirement that the stimulus be presented first, followed by the reward/reinforcer, in that specific order. This must be repeated over trials, with only a short time delay between the two presentations. |
|
|
Term
| In referring to contiguity, how can learning be accomplished faster/to a stronger degree? |
|
Definition
By shortening the time interval between presentation of the stimulus and the reward
The two need to be temporally proximal |
|
|
Term
| Differences between operant & classical conditioning? |
|
Definition
Operant = trial and error learning; don't have any environmental cues to help
Classical = use environmental cues (stimuli) to predict what behaviors elicit rewards |
|
|
Term
| What occurs to DA activity during learning "acquisition"? |
|
Definition
In initial trials, the phasic DA burst occurs after the reward is presented
As learning occurs, the DA burst instead occurs after presentation of the CS; DA is used to PREDICT the OCCURRENCE of the reward |
|
|
Term
| In terms of learning acquisition, what is the function of DA? Evidence? |
|
Definition
Once learning is occurring/has occurred, DA is used to PREDICT the occurrence of the reward
Evidence: after learning, the phasic burst of DA firing occurs after the CS is presented instead of after the reward is presented |
|
|
Term
| In terms of learning theory & contingency, what is learning proportional to? |
|
Definition
How good of a predictor the CS is for the occurrence of the reward
Strongest learning occurs if the CS predicts reward 100% of the time. |
|
|
Term
What happens to DA firing if... a) CS 100% predicts reward b) CS is neutral to reward (0%) c) CS fully predicts reward's absence |
|
Definition
a) Get an increase in DA firing after the CS is presented
b) Get tonic DA firing after the CS
c) Get cessation of DA firing (no firing) after the CS |
|
|
Term
| Reward Prediction Error (RPE) |
Definition
| RPE = difference between the predicted reward (what you think you'll get) and the actual reward received (what you got) |
|
|
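A minimal sketch tying the RPE definition to the DA firing patterns described on the neighboring cards. The function names, the tolerance, and the response labels are illustrative assumptions, not recorded data.

```python
def reward_prediction_error(actual, predicted):
    """RPE = actual reward received minus predicted reward."""
    return actual - predicted


def da_response(rpe, tolerance=1e-9):
    # Qualitative mapping taken from the cards: more than expected ->
    # phasic burst, as expected -> tonic firing, less/omitted -> pause.
    if rpe > tolerance:
        return "phasic burst"
    if rpe < -tolerance:
        return "pause"
    return "tonic"


# Hypothetical trials: (actual reward, predicted reward).
for actual, predicted in [(2.0, 1.0), (1.0, 1.0), (0.0, 1.0)]:
    rpe = reward_prediction_error(actual, predicted)
    print(actual, predicted, rpe, da_response(rpe))
```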
Term
| In operant conditioning, what does DA neuron firing respond to? |
|
Definition
| In operant conditioning, DA firing corresponds to TEMPORAL PREDICTION of the reward - there is an "expected time" window (after the action is completed) within which the reward should arrive |
|
|
Term
In operant conditioning with a learned task, what happens if: a) Reward presented at expected time b) Reward not presented c) Reward presented 1/2 second early or late |
|
Definition
a) Get tonic DA activity because the reward is presented at the time it is expected
b) Get cessation/decrease in DA activity because no reward is presented
c) Get an increase in DA firing because there is a temporal discrepancy: the reward is presented at a different time than expected |
|
|
Term
| In terms of the reinforcement learning algorithm, what do the DA neurons encode for/calculate? |
|
Definition
| Encode for the difference/change in associated value from trial to trial (this is the reward prediction error) |
|
|
Term
| Why is it beneficial that we use the reinforcement algorithm as a recursive function in everyday life, as opposed to trying to remember everything? |
|
Definition
| It lets us weight more recent events more heavily: we only need to remember Vt from trial to trial, and weighting recent outcomes more heavily lets us adapt to our current environment |
|
|
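A sketch of the recursive update these cards describe, assuming the common Rescorla-Wagner-style form V(t+1) = V(t) + learning_rate x RPE. The exact equation used in class is not given in these notes, so treat the form, the names, and the numbers as assumptions.

```python
def update_value(value, reward, learning_rate):
    # ASSUMPTION: V(t+1) = V(t) + learning_rate * (R - V(t)).
    rpe = reward - value
    return value + learning_rate * rpe, rpe


def run_trials(rewards, learning_rate, initial_value=0.0):
    # Only the current value estimate is carried from trial to trial.
    value = initial_value
    history = []
    for reward in rewards:
        value, rpe = update_value(value, reward, learning_rate)
        history.append((value, rpe))
    return history


# Hypothetical task: the same reward of 1.0 on each of 50 trials.
history = run_trials([1.0] * 50, learning_rate=0.2)
print(history[0])   # large RPE on the first trial
print(history[-1])  # value near 1.0, RPE near 0 at the plateau
```

Because each update shrinks the old estimate by (1 - learning_rate), older outcomes fade geometrically and recent ones dominate, which is the recency weighting the card refers to.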
Term
| What does a high vs. low learning rate favor? |
|
Definition
High learning rate -> favors exploration
Low learning rate -> favors exploitation |
|
|
Term
| Type of curve seen in variable vs. constant reward schedule? |
|
Definition
Variable - jagged curve which averages out to the expected value
Constant - smooth curve that plateaus at the expected value |
|
|
Term
| For a slot machine type game with "variable" reward schedules, what type of curve would be seen? |
|
Definition
| Would see a jagged curve, with the average being located at the expected value of V in the long run |
|
|
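A small simulation of the two learning curves, again assuming a simple update of the form V = V + learning_rate x (R - V). The reward sizes, the 50% payout probability, the learning rate, and the seed are all made up for illustration.

```python
import random


def learn(rewards, learning_rate=0.1, value=0.0):
    # ASSUMPTION: simple prediction-error update; returns the curve of V.
    curve = []
    for reward in rewards:
        value += learning_rate * (reward - value)
        curve.append(value)
    return curve


rng = random.Random(0)  # fixed seed so the run is repeatable
n_trials = 2000

# Constant schedule: 0.5 every trial. Variable ("slot machine") schedule:
# 1.0 half the time and 0.0 otherwise, so the expected value is also 0.5.
constant = learn([0.5] * n_trials)
variable = learn([1.0 if rng.random() < 0.5 else 0.0 for _ in range(n_trials)])

late = variable[500:]
mean_late = sum(late) / len(late)
spread_late = max(late) - min(late)
print(constant[-1], mean_late, spread_late)
```

The constant curve settles onto a flat plateau, while the variable curve keeps bouncing around the same long-run average.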
Term
| Mesocortical vs. Mesolimbic (comparison)... |
|
Definition
Mesolimbic - associative, can work in parallel, fast, intuitive, automatic, emotionally influenced
Mesocortical - works in series (one at a time), slow, effortful, not emotionally influenced, rule governed |
|
|
Term
| Alternate names for mesolimbic vs. mesocortical pathways? |
|
Definition
Mesolimbic - INTUITIVE
Mesocortical - REASONING |
|
|
Term
| In the mesolimbic vs. mesocortical pathways where do the modulatory DA neurons stem from and where do they modulate the pathway? |
|
Definition
In mesolimbic - come from VTA; modulate at the level of the striatum
In mesocortical - come from VTA; modulate at the level of the PFC (cortical) |
|
|
Term
| Name the pathway of the mesocortical system (anatomical connections)... |
|
Definition
Runs from VTA to the frontal/prefrontal cortices
Input from dlPFC to striatum (VTA modulates input at PFC level); output from GPi/SNpr, through thalamus, back up to the dlPFC |
|
|
Term
| Name the pathway of the mesolimbic system (anatomical connections)... |
|
Definition
From VTA to nucleus accumbens in STRIATUM
Input comes from mOFC, ACC, amygdala, hippocampus, insula (output eventually goes here too)
BG output comes from GPi/SNpr through thalamic relay nuclei |
|
|
Term
| Functions of the mesolimbic vs. mesocortical pathways... |
|
Definition
Mesolimbic - involved in depression, schizophrenia, addiction; "reward circuit" of the brain
Mesocortical - involved in executive function (attention, working memory), and most importantly INHIBITION |
|
|
Term
| How would humans behave without over-developed PFCs? |
|
Definition
| Would behave as very reflexive creatures - would be extremely impulsive with our actions towards our surroundings and would always strive for immediate gratification, as opposed to long-term goal directed behavior |
|
|
Term
| Name deficits/symptoms of someone with lesions to their PFC: |
|
Definition
Cognitive - short attention, impaired working memory, lack of motivation
Behavioral - overly aggressive/sexual behavior, perseveration (repeated behavior)
Emotional - angry, depressed |
|
|
Term
| What are the 3 main types of deficits seen in those with PFC lesions? |
|
Definition
| Cognitive, Behavioral, Emotional |
|
|
Term
| With application of DA antagonists to the dlPFC, what was seen? |
|
Definition
| Impaired ability to perform contralateral memory-guided saccades - suggests a role for the dlPFC in working memory |
|
|
Term
| What is the role of the dlPFC in anti-saccades? |
|
Definition
| With dlPFC lesions (DA antagonists), subjects had problems with contralateral anti-saccades - they could not inhibit the eye movement toward the cue before moving in the other direction; the dlPFC is needed to suppress contralateral reflexive responses |
|
|
Term
| Role of dlPFC in saccadic eye movements (2 points): |
|
Definition
1) Needed for working memory (problem with contralateral memory-guided saccades)
2) Needed for inhibition of reflexive saccadic eye movements (problem with contralateral anti-saccades) |
|
|
Term
| Function of the mOFC |
Definition
| Encodes and compares the relative value of REALIZED (R) and POTENTIAL (Vt+1) rewards |
|
|
Term
What are the relative firing rates of mOFC neurons when the decision is... a) a no-brainer (something vs. nothing) b) A and B have equal value c) there is a deviation from equal value |
|
Definition
a) low firing rates - because the decision is so basic, don't really need to compare relative values
b) low firing rates - because they are of equal value, either choice will suffice
c) high firing rates - this becomes subjective decision making so firing rates increase |
|
|
Term
| When do you see the highest activity from mOFC neurons? |
|
Definition
| After the offer is presented (encoding Vt+1) and after the reward is presented (encoding R) |
|
|
Term
| In the human food auction experiment, what is the function of the mOFC neurons? |
|
Definition
| These mOFC neurons encode how much the subject is "willing to pay" for the food item being auctioned; assign relative value to different foods to make decisions |
|
|
Term
| What are the 3 main functions of the ACC in humans? |
|
Definition
Error Detection - e.g. w/ typing task
Conflict Monitoring - Stroop Task
Reward Based Learning (Task Switching) |
|
|
Term
| Error Detection & ACC |
Definition
| Shown with a typing task; when a typing error is made, see a spike in error-related negativity in the ACC (hyperpolarization); the ACC monitors errors in tasks & remembers them so you can perform better in the future |
|
|
Term
| Conflict Monitoring & ACC |
|
Definition
| Shown in Stroop Task - when incongruent color/word pairings were shown (conflict present) saw highest amount of activity in ACC (conflict between 2 different mental processes used to perceive situation) |
|
|
Term
| In see-saw task, what are the 2 different signals for task switching to occur? |
|
Definition
Auditory feedback - hears a beep
Reward feedback - reward magnitude decreases by 1/2 |
|
|
Term
| What activates neurons in the ACC during the see-saw task in reward based learning? |
|
Definition
| The actual switching from one task to another activates neurons; neurons serve to predict when to switch tasks |
|
|
Term
| Most important function of the ACC in relation to reward based learning? |
|
Definition
| TASK SWITCHING - involved in changing task based on reward feedback |
|
|
Term
| How can we prove that the ACC is involved in task switching in reward based learning? |
|
Definition
| Use GABA agonist in ACC and see what happens - only see deficits in reward feedback task switching (behavior perseveration), but not in auditory feedback task switching |
|
|
Term
| Amygdalae & fear based learning |
|
Definition
Amygdalae are important for forming long-term memory associations between neutral stimuli and painful/fearful events
Ablation of the amygdalae can lead to an inability to form these associations, i.e. an inability to encode aversive/punishing outcomes |
|
|
Term
| In the ultimatum game, what happens in the mesocortical and mesolimbic systems? |
|
Definition
See more activation in the amygdala, insula & dlPFC for unfair proposals than for fair offers
Increased insular activity is associated with people rejecting unfair offers (as well as with disgusting/fearful stimuli)
Increased dlPFC, amygdala & insula activity is predictive of, & scales with, how unfair an offer is |
|
|
Term
| What factors influence the speed vs. accuracy tradeoff... |
|
Definition
Quality of Evidence (e.g. % coherence of dots)
Pre-Existing Decision Variables (expected value/probability of rewards)
Urgency until Choice (how long you have to decide) |
|
|
Term
| Does any learning occur if there is no RPE? Where on the graph is RPE shown? |
|
Definition
RPE occurs on the steep/sloped part of the graph; at the plateau RPE = 0 and NO learning is occurring
Need RPE (a difference between expected and actual reward) for learning to occur |
|
|
Term
| Size of PFC in normal person vs. ADHD? |
|
Definition
| Children with ADHD have under-developed PFCs - this causes hyperactive behaviors with attention deficits because the under-developed PFC provides too little inhibition |
|
|
Term
| What area of the brain do drugs of abuse (nicotine, meth, cocaine) influence and what is their effect? |
|
Definition
| Affect the mesolimbic circuit of the brain (VTA to NA) and can lead to addiction; all involve increasing the amount of dopamine activity in this pathway |
|
|
Term
| What area of the brain does schizophrenia affect? What treatments exist for it? |
|
Definition
Affects the mesolimbic circuits of the brain; may be caused by excess DA present in the mesolimbic circuit
Drugs to treat it function by decreasing the amount of DA present to alleviate symptoms |
|
|
Term
| What pathway is affected in ADHD? What occurs here? |
|
Definition
In ADHD the mesocortical pathway is affected; see a decrease in DA activity in the under-developed PFC
Ritalin is used to treat it: it blocks DA reuptake, which increases DA levels and so increases PFC function (more inhibition) |
|
|